TY - JOUR
T1 - Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics
T2 - Diagnostic Accuracy Study
AU - Engelstein, Haya
AU - Ramon-Gonen, Roni
AU - Sabbag, Avi
AU - Klang, Eyal
AU - Sudri, Karin
AU - Cohen-Shelly, Michal
AU - Barbash, Israel
N1 - Publisher Copyright: © Haya Engelstein, Roni Ramon Gonen, Avi Sabbag, Eyal Klang, Karin Sudri, Michal Cohen-Shelly, Israel Barbash.
PY - 2025/8/22
Y1 - 2025/8/22
AB - Background: Recent progress has demonstrated the potential of deep learning models in analyzing electrocardiogram (ECG) pathologies. However, this method is intricate, expensive to develop, and designed for specific purposes. Large language models show promise in medical image interpretation, and yet their effectiveness in ECG analysis remains understudied. Generative Pretrained Transformer 4 Omni (GPT-4o), a multimodal artificial intelligence model capable of processing images and text without task-specific training, may offer an accessible alternative. Objective: This study aimed to evaluate GPT-4o’s effectiveness in interpreting 12-lead ECGs, assessing classification accuracy, and exploring methods to enhance its performance. Methods: A total of 6 common ECG diagnoses were evaluated: normal ECG, ST-segment elevation myocardial infarction, atrial fibrillation, right bundle branch block, left bundle branch block, and paced rhythm, with 30 normal ECGs and 10 of each abnormal pattern, totaling 80 cases. Deidentified ECGs were analyzed using OpenAI’s GPT-4o. Our study used both zero-shot and few-shot learning methodologies to investigate three main scenarios: (1) ECG image recognition, (2) binary classification of normal versus abnormal ECGs, and (3) multiclass classification into 6 categories. Results: The model excelled in recognizing ECG images, achieving an accuracy of 100%. In the classification of normal or abnormal ECG cases, the few-shot learning approach improved GPT-4o’s accuracy by 30% from the baseline, reaching 83% (95% CI 81.8%-84.6%). However, multiclass classification for a specific pathology remained limited, achieving only 41% accuracy. Conclusions: GPT-4o effectively differentiates normal from abnormal ECGs, suggesting its potential as an accessible artificial intelligence–assisted triage tool. Although limited in diagnosing specific cardiac conditions, GPT-4o’s capability to interpret ECG images without specialized training highlights its potential for preliminary ECG interpretation in clinical and remote settings.
KW - LLMs
KW - artificial intelligence
KW - cardiology
KW - decision support systems
KW - electrocardiogram
KW - large language models
UR - https://www.scopus.com/pages/publications/105014731001
U2 - 10.2196/74426
DO - 10.2196/74426
M3 - Article
C2 - 40845836
AN - SCOPUS:105014731001
SN - 2817-1705
VL - 4
JO - JMIR AI
JF - JMIR AI
M1 - e74426
ER -