Original Research

Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

  • Xu Jiale,
  • Xia Shujun,
  • Hua Qing,
  • Mei Zihan,
  • Hou Yiqing,
  • Wei Minyan,
  • Lai Limei,
  • Yang Yixuan,
  • Zhou Jianqiao
  • a Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • b College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
First author contact:

Jiale Xu and Shujun Xia contributed equally to this study.

Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Er Road, 200025 Shanghai, China; e-mail: zhousu30@126.com

Received date: 2024-01-19

Accepted date: 2024-03-18

Online published: 2024-11-12

Abstract

Objective: This study aims to assess the performance of the Chat Generative Pre-Trained Transformer (ChatGPT), specifically the GPT-3.5 and GPT-4 versions, on ultrasonography board-style questions, and to compare it with the performance of third-year radiology residents on the same question set.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations, which were entered into ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT's responses were evaluated overall, by topic, and by GPT version. The identical question set was given to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT's.
Results: GPT-4 answered 82.1% of the questions correctly (110 of 134), significantly surpassing GPT-3.5 (P = 0.003), which answered 66.4% correctly (89 of 134). GPT-3.5's performance was statistically indistinguishable from the residents' average performance (66.7%, 89.3 of 134; P = 0.969), whereas GPT-4 differed significantly from the residents in question-answering accuracy (P = 0.004).
Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.
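For readers who wish to verify the headline comparison in the Results, the sketch below recomputes the reported accuracies and compares GPT-4 against GPT-3.5. It assumes a two-proportion chi-square test on the correct/incorrect counts without continuity correction; the abstract does not state which statistical test the authors actually applied, so this is illustrative only.

```python
# A minimal sketch of the proportion comparison reported in the Results.
# Assumption: a two-proportion chi-square test without continuity correction;
# the abstract does not state which test the authors used.
from scipy.stats import chi2_contingency

N_QUESTIONS = 134
correct = {"GPT-4": 110, "GPT-3.5": 89}  # correct-answer counts from the abstract


def compare(correct_a, correct_b, n=N_QUESTIONS):
    """Chi-square test on the 2x2 table of correct/incorrect counts."""
    table = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    return chi2, p


for model, k in correct.items():
    print(f"{model}: {k}/{N_QUESTIONS} correct = {k / N_QUESTIONS:.1%}")

chi2, p = compare(correct["GPT-4"], correct["GPT-3.5"])
print(f"GPT-4 vs GPT-3.5: chi-square = {chi2:.2f}, P = {p:.3f}")
```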

Cite this article

Xu Jiale, Xia Shujun, Hua Qing, Mei Zihan, Hou Yiqing, Wei Minyan, Lai Limei, Yang Yixuan, Zhou Jianqiao. Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions[J]. ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY, 2024, 8(4): 250-254. DOI: 10.37015/AUDT.2024.240002
