Original Research

Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

  • Jiale Xu, MD,
  • Shujun Xia, MD,
  • Qing Hua, MD,
  • Zihan Mei, MD,
  • Yiqing Hou, MD,
  • Minyan Wei, MD,
  • Limei Lai, MD,
  • Yixuan Yang, MD,
  • Jianqiao Zhou, MD
  • a Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • b College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China

1 Jiale Xu and Shujun Xia contributed equally to this study.

Correspondence: Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Er Road, Shanghai 200025, China; e-mail: zhousu30@126.com

Received date: 2024-01-19

Accepted date: 2024-03-18

Online published: 2024-11-12

Abstract

Objective: This study aims to assess the performance of the Chat Generative Pre-Trained Transformer (ChatGPT), specifically the GPT-3.5 and GPT-4 versions, on ultrasonography board-style questions and to compare it with the performance of third-year radiology residents on the same set of questions.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations. Each question was entered into ChatGPT (both the GPT-3.5 and GPT-4 versions), and ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The same question set was assigned to three third-year radiology residents, enabling a direct comparison of their performance with that of ChatGPT.
Results: GPT-4 correctly answered 82.1% of the questions (110 of 134), significantly surpassing GPT-3.5 (P = 0.003), which correctly answered 66.4% (89 of 134). GPT-3.5’s performance was statistically indistinguishable from the residents’ average performance (66.7%, 89.3 of 134; P = 0.969), whereas GPT-4 differed significantly from the residents in question-answering accuracy (P = 0.004).
Conclusions: ChatGPT demonstrated considerable competence on ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.
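
A note on the statistics: the pairwise accuracy comparisons reported in the Results can be approximated with a two-proportion (chi-square) test on the counts of correct versus incorrect answers out of 134. The sketch below is a minimal illustration in Python, not the authors' actual analysis; it assumes an unpaired chi-square test without continuity correction (the abstract does not state which test was used), and the helper compare_correct_counts is a hypothetical name.

```python
# Minimal sketch: two-proportion chi-square test on correct/incorrect counts.
# Assumption: unpaired test without continuity correction; the paper's actual
# statistical method is not specified in the abstract.
from scipy.stats import chi2_contingency

TOTAL_QUESTIONS = 134  # size of the board-style question set

def compare_correct_counts(correct_a, correct_b, total=TOTAL_QUESTIONS):
    """Return the chi-square P value comparing two raters' accuracy on the set."""
    table = [
        [correct_a, total - correct_a],  # rater A: correct, incorrect
        [correct_b, total - correct_b],  # rater B: correct, incorrect
    ]
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    return p_value

# GPT-4 (110/134 correct) vs GPT-3.5 (89/134 correct):
# yields a P value of about 0.003, on the order of the value reported above.
print(round(compare_correct_counts(110, 89), 3))
```

Because ChatGPT and the residents answered the identical question set, a paired test such as McNemar's would also be a natural choice; the residents' figure (89.3 of 134) is an average over three readers, so the comparison with GPT-4 cannot be reproduced exactly from the abstract alone.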

Cite this article

Jiale Xu, MD, Shujun Xia, MD, Qing Hua, MD, Zihan Mei, MD, Yiqing Hou, MD, Minyan Wei, MD, Limei Lai, MD, Yixuan Yang, MD, Jianqiao Zhou, MD. Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions[J]. ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY, 2024, 8(4): 250-254. DOI: 10.37015/AUDT.2024.240002
