TY - JOUR
T1 - Identifying symptom etiologies using syntactic patterns and large language models
AU - Taub-Tabib, Hillel
AU - Shamay, Yosi
AU - Shlain, Micah
AU - Pinhasov, Menny
AU - Polak, Mark
AU - Tiktinsky, Aryeh
AU - Rahamimov, Sigal
AU - Bareket, Dan
AU - Eyal, Ben
AU - Kassis, Moriya
AU - Goldberg, Yoav
AU - Kaminski Rosenberg, Tal
AU - Vulfsons, Simon
AU - Ben Sasson, Maayan
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/7/13
Y1 - 2024/7/13
AB - Differential diagnosis is a crucial aspect of medical practice, as it guides clinicians to accurate diagnoses and effective treatment plans. Traditional resources, such as medical books and services like UpToDate, are constrained by manual curation, potentially missing out on novel or less common findings. This paper introduces and analyzes two novel methods to mine etiologies from scientific literature. The first method employs a traditional Natural Language Processing (NLP) approach based on syntactic patterns. By using a novel application of human-guided pattern bootstrapping, patterns are derived quickly, and symptom etiologies are extracted with significant coverage. The second method utilizes generative models, specifically GPT-4, coupled with a fact verification pipeline, marking a pioneering application of generative techniques in etiology extraction. Analyzing this second method shows that while it is highly precise, it offers lesser coverage compared to the syntactic approach. Importantly, combining both methodologies yields synergistic outcomes, enhancing the depth and reliability of etiology mining.
UR - http://www.scopus.com/inward/record.url?scp=85198397363&partnerID=8YFLogxK
U2 - 10.1038/s41598-024-65645-6
DO - 10.1038/s41598-024-65645-6
M3 - Article
C2 - 39003296
AN - SCOPUS:85198397363
SN - 2045-2322
VL - 14
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 16190
ER -