Identifying symptom etiologies using syntactic patterns and large language models

Hillel Taub-Tabib, Yosi Shamay, Micah Shlain, Menny Pinhasov, Mark Polak, Aryeh Tiktinsky, Sigal Rahamimov, Dan Bareket, Ben Eyal, Moriya Kassis, Yoav Goldberg, Tal Kaminski Rosenberg, Simon Vulfsons, Maayan Ben Sasson

Research output: Contribution to journal › Article › peer-review

Abstract

Differential diagnosis is a crucial aspect of medical practice, as it guides clinicians to accurate diagnoses and effective treatment plans. Traditional resources, such as medical books and services like UpToDate, are constrained by manual curation and may miss novel or less common findings. This paper introduces and analyzes two novel methods to mine etiologies from scientific literature. The first method employs a traditional Natural Language Processing (NLP) approach based on syntactic patterns. Through a novel application of human-guided pattern bootstrapping, patterns are derived quickly and symptom etiologies are extracted with significant coverage. The second method utilizes generative models, specifically GPT-4, coupled with a fact verification pipeline, marking a pioneering application of generative techniques in etiology extraction. Analysis of this second method shows that, while highly precise, it offers lower coverage than the syntactic approach. Importantly, combining both methodologies yields synergistic outcomes, enhancing the depth and reliability of etiology mining.

Original language: English
Article number: 16190
Journal: Scientific Reports
Volume: 14
Issue number: 1
State: Published - 13 Jul 2024

Bibliographical note

Publisher Copyright: © The Author(s) 2024.
