Ontology-driven Text Feature Modeling for Disease Prediction using Unstructured Radiological Notes

Gokul S. Krishnan, Sowmya Kamath S.


Clinical Decision Support Systems (CDSSs) support medical personnel by assisting with decision making and enabling timely interventions in patient care. Typically, such systems are built on structured Electronic Health Records (EHRs), which, unfortunately, currently have a very low adoption rate in developing countries. In such settings, clinical notes recorded by medical personnel, though unstructured, can be a significant source of rich patient-related information. However, converting unstructured clinical notes into a structured EHR form is a manual and time-consuming task, underscoring a critical need for more efficient, automated methods. In this paper, we propose a generic disease prediction CDSS built on unstructured radiology text reports. We incorporate word embeddings and clinical ontologies to model the textual features of the patient data, which are then used to train a feed-forward neural network for ICD-9 disease group prediction. The proposed model, built on unstructured text, outperformed the state-of-the-art model built on structured data by 9% in terms of AUROC and 23% in terms of AUPRC, thus eliminating the dependency on the availability of structured clinical data.
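The abstract does not specify which word embeddings or clinical ontology are used, how term vectors are combined into note features, or the size of the network. The following is therefore only a minimal sketch of the general idea under loudly stated assumptions: a toy random embedding table (`EMBEDDINGS`) and a hand-picked term set (`ONTOLOGY_TERMS`) stand in for pretrained embeddings and a clinical ontology; a note is represented by the mean vector of its ontology-matched terms; and a one-hidden-layer network with one sigmoid output per disease group is trained by full-batch gradient descent on binary cross-entropy. All names, notes, and labels are hypothetical and are not taken from the paper. A small pairwise `auroc` helper illustrates the AUROC metric mentioned above.

```python
import numpy as np

# Hypothetical stand-ins (not from the paper): a toy embedding table and a set
# of ontology concept terms. A real system would load pretrained embeddings
# and match terms against a clinical ontology.
EMBED_DIM = 4
_rng = np.random.default_rng(0)
VOCAB = ["pleural", "effusion", "opacity", "pneumonia", "cardiomegaly",
         "heart", "size", "normal", "no", "acute", "the", "is"]
EMBEDDINGS = {w: _rng.normal(size=EMBED_DIM) for w in VOCAB}
ONTOLOGY_TERMS = {"pleural", "effusion", "opacity", "pneumonia", "cardiomegaly"}


def note_features(note: str) -> np.ndarray:
    """Mean embedding of ontology-matched tokens (assumed pooling scheme).

    Falls back to all in-vocabulary tokens if no ontology term matches, and
    to a zero vector if no token is in the vocabulary.
    """
    tokens = [t.strip(".,;:").lower() for t in note.split()]
    known = [t for t in tokens if t in EMBEDDINGS]
    chosen = [t for t in known if t in ONTOLOGY_TERMS] or known
    if not chosen:
        return np.zeros(EMBED_DIM)
    return np.mean([EMBEDDINGS[t] for t in chosen], axis=0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def bce(P: np.ndarray, Y: np.ndarray) -> float:
    """Mean binary cross-entropy over all (note, disease group) pairs."""
    P = np.clip(P, 1e-9, 1 - 1e-9)
    return float(-np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P)))


class FeedForward:
    """One ReLU hidden layer; one sigmoid output per disease group (multi-label)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 1):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(scale=0.5, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = r.normal(scale=0.5, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X: np.ndarray) -> np.ndarray:
        self.H = np.maximum(0.0, X @ self.W1 + self.b1)
        return sigmoid(self.H @ self.W2 + self.b2)

    def train(self, X: np.ndarray, Y: np.ndarray, lr: float = 0.1, epochs: int = 2000):
        """Full-batch gradient descent on mean binary cross-entropy."""
        for _ in range(epochs):
            P = self.forward(X)
            dZ2 = (P - Y) / Y.size          # d(loss)/d(output logits)
            dH = dZ2 @ self.W2.T            # backprop before updating W2
            dZ1 = dH * (self.H > 0)         # ReLU derivative
            self.W2 -= lr * (self.H.T @ dZ2)
            self.b2 -= lr * dZ2.sum(axis=0)
            self.W1 -= lr * (X.T @ dZ1)
            self.b1 -= lr * dZ1.sum(axis=0)
        return self


def auroc(y_true, scores) -> float:
    """Pairwise (Mann-Whitney) AUROC; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


notes = [
    "Right pleural effusion with basal opacity.",
    "Pneumonia with the opacity in the left lobe.",
    "Cardiomegaly; heart size is enlarged.",
    "No acute findings; heart size normal.",
]
# Hypothetical labels for two disease groups: [respiratory, circulatory].
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 0]], dtype=float)
X = np.stack([note_features(n) for n in notes])

model = FeedForward(n_in=EMBED_DIM, n_hidden=8, n_out=2)
loss_before = bce(model.forward(X), Y)
model.train(X, Y)
P = model.forward(X)
loss_after = bce(P, Y)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
print("respiratory-group AUROC on the toy notes:", auroc(Y[:, 0], P[:, 0]))
```

Independent sigmoid outputs, rather than a single softmax, are assumed here because one patient may belong to several ICD-9 disease groups at once. With only four toy notes, the printed AUROC illustrates the computation and says nothing about real predictive performance.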


Keywords: Healthcare Informatics, Unstructured Text, Disease Prediction, Ontologies, Natural Language Processing
