ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY
Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions
Jiale Xu and Shujun Xia contributed equally to this study.
Received date: 2024-01-19
Accepted date: 2024-03-18
Online published: 2024-11-12
Objective: This study aims to assess the performance of the Chat Generative Pre-trained Transformer (ChatGPT), specifically the GPT-3.5 and GPT-4 versions, on ultrasonography board-style questions, and to compare it with the performance of third-year radiology residents on the identical question set.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations; each question was submitted to ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The identical question set was assigned to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT’s.
Results: GPT-4 correctly answered 82.1% of questions (110 of 134), significantly surpassing the performance of GPT-3.5 (P = 0.003), which correctly answered 66.4% of questions (89 of 134). Although GPT-3.5’s performance was statistically indistinguishable from the average performance of the radiology residents (66.7%, 89.3 of 134) (P = 0.969), there was a significant difference in question-answering accuracy between GPT-4 and the residents (P = 0.004).
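The abstract does not state which statistical test produced the reported P values. As an illustrative check only, a pooled two-proportion z-test (equivalent to an uncorrected chi-square test on a 2×2 table) applied to the GPT-4 versus GPT-3.5 counts reproduces the reported P = 0.003; the function name below is a hypothetical helper, not from the paper:

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test.

    Equivalent to a Pearson chi-square test (without continuity
    correction) on the corresponding 2x2 contingency table.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# GPT-4: 110/134 correct vs GPT-3.5: 89/134 correct
z, p = two_prop_ztest(110, 134, 89, 134)
print(f"z = {z:.2f}, P = {p:.3f}")  # P = 0.003, matching the reported value
```

The residents' comparison (P = 0.004) cannot be re-derived the same way here, since the abstract reports only the residents' fractional average score (89.3 of 134), not the per-resident counts.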
Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.
Key words: Artificial intelligence; Ultrasonography; Accuracy; Medical education
Jiale Xu, MD, Shujun Xia, MD, Qing Hua, MD, Zihan Mei, MD, Yiqing Hou, MD, Minyan Wei, MD, Limei Lai, MD, Yixuan Yang, MD, Jianqiao Zhou, MD. Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions[J]. ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY, 2024, 8(4): 250-254. DOI: 10.37015/AUDT.2024.240002