Diagnosis Support

My goal is to help doctors avoid overlooking diagnoses by combining the pattern recognition of subsymbolic neural networks with the structured logic of symbolic ontologies.


Why

Misdiagnosis causes an estimated 371,000 deaths and 424,000 permanent disabilities annually in the US alone (Newman-Toker et al., 2023).

The main causes: information overload, time pressure, premature closure, complexity.


Neural Networks

During one of my projects at the University of Melbourne, I trained neural networks to flag arrhythmias in 12-lead ECG data.

Through that I learned that data quality outweighs data quantity, and that neural networks have huge potential in pattern recognition.

The project ended up receiving the highest honors, which was especially nice since I finished it from a tent during a trek through Wilsons Promontory.


Structured Data

I came across Aristotle's way of structuring knowledge: subject, predicate, object. It turns out this is the backbone of how modern knowledge graphs are built.
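
To make the idea concrete, here is a minimal sketch of subject-predicate-object triples using rdflib. The IRIs and the disease-phenotype links are purely illustrative placeholders, not actual HPO or ORDO identifiers.

```python
# Minimal knowledge-graph sketch: each fact is a (subject, predicate, object) triple.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # illustrative namespace, not a real ontology

g = Graph()
g.add((EX.Marfan_syndrome, EX.has_phenotype, EX.Arachnodactyly))
g.add((EX.Marfan_syndrome, EX.has_phenotype, EX.Ectopia_lentis))

# Query: which phenotypes are linked to Marfan syndrome?
for _, _, phenotype in g.triples((EX.Marfan_syndrome, EX.has_phenotype, None)):
    print(phenotype)
```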

My professor introduced me to prebuilt, logic-based knowledge bases for medical applications. These ontologies are logic-checked, empirically grounded graphs of diseases, findings, and treatments.

Without diving too deep into ontologies, they bring further upsides: language independence, interoperability, explainability, and updatability.


Prerequisites

Real patient data (in the future) demands the highest level of data security, so the whole system should be able to run locally, with no information leaving the doctor's office. To avoid distracting patient and doctor, the machine should be silent and small, something like a Mac Studio.

The user interface (UI) should be as intuitive as possible, and the interaction should ideally work without extra input from the doctor to keep the workflow smooth.

Because there are constantly new findings in medicine, my requirement was to be able to update the knowledge data as easily as possible.


Putting it all together

  1. Combining the two chosen ontologies. I chose HPO because of its elaborate linking between phenotypes and diseases and enriched it with HOOM, which covers more rare diseases.
  2. Embedding the ontologies. Each concept is transformed into a vector in a high-dimensional space, which enables a quick lookup of related concepts (a sketch follows this list).
  3. The knowledge is now "remembered" and ready to retrieve. The user input consists of phenotypes (e.g. laboratory results, symptoms, ...). Based on these, related information is searched in the database. In the future, the user input will be a preprocessed form of the electronic patient record.
  4. The K results with the highest similarity are returned. This is comparable to the first Google results when searching for something (only that I didn't monetize the top results through advertisements).
  5. For each returned piece of information, the directly connected information is retrieved from the ontology as context.
  6. All of these documents (main node + context) are passed to the language model together with the initial user prompt.
  7. Due to my limited computing power, I had to outsource the LLM call to a GPU server, where the final answer is computed (see the second sketch below).
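
The following is a small sketch of steps 2-5: embed every concept once, then retrieve the top-K most similar concepts for a set of input phenotypes and pull in their directly connected neighbours as context. The embedding model, the toy ontology, and its edges are assumptions for illustration, not the actual pipeline.

```python
# Sketch of embedding, top-K retrieval, and context expansion (steps 2-5).
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy "ontology": concept label -> directly connected concepts (hypothetical edges).
ontology = {
    "Arachnodactyly":       ["Marfan syndrome", "Abnormal digit morphology"],
    "Ectopia lentis":       ["Marfan syndrome", "Abnormality of the lens"],
    "Aortic root dilation": ["Marfan syndrome", "Abnormal aortic morphology"],
    "Marfan syndrome":      ["Arachnodactyly", "Ectopia lentis", "Aortic root dilation"],
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

# Step 2: embed every concept once (done offline in the real pipeline).
labels = list(ontology)
vectors = model.encode(labels, normalize_embeddings=True)

def retrieve(phenotypes, k=3):
    """Steps 3-5: embed the input, find the top-K similar concepts, add neighbours."""
    query = model.encode(", ".join(phenotypes), normalize_embeddings=True)
    scores = vectors @ query                      # cosine similarity (vectors are normalized)
    top_k = np.argsort(scores)[::-1][:k]
    return [
        {"concept": labels[i],
         "score": float(scores[i]),
         "context": ontology[labels[i]]}
        for i in top_k
    ]

print(retrieve(["long slender fingers", "dislocated lens"]))
```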

Steps 1 and 2 are computed only once. The rest repeats with every new user input.
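
And a second sketch for steps 6-7: pack the retrieved concepts plus their context into a prompt and send it to a remote LLM endpoint on the GPU server. The URL, payload format, and response schema are placeholders; the actual server interface may differ.

```python
# Sketch of prompt assembly and the remote LLM call (steps 6-7).
import requests

def ask_llm(user_input, retrieved, endpoint="http://gpu-server.local:8000/generate"):
    context_lines = [
        f"- {r['concept']} (related: {', '.join(r['context'])})" for r in retrieved
    ]
    prompt = (
        "You are assisting a physician with differential diagnosis.\n"
        f"Reported phenotypes: {user_input}\n"
        "Relevant ontology entries:\n" + "\n".join(context_lines) + "\n"
        "Suggest plausible diagnoses to consider and what to check next."
    )
    response = requests.post(endpoint, json={"prompt": prompt}, timeout=60)
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema
```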


Project Highlights