ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY
Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions
Jiale Xu and Shujun Xia contributed equally to this study.
Received date: 2024-01-19
Accepted date: 2024-03-18
Online published: 2024-11-12
Objective: This study aims to assess the performance of the Chat Generative Pre-trained Transformer (ChatGPT), specifically the GPT-3.5 and GPT-4 versions, on ultrasonography board-style questions, and to compare it with the performance of third-year radiology residents on the identical question set.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations; each question was submitted to ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The identical question set was assigned to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT’s.
Results: GPT-4 correctly answered 82.1% of questions (110 of 134), significantly surpassing the performance of GPT-3.5 (P = 0.003), which correctly answered 66.4% of questions (89 of 134). Although GPT-3.5’s performance was statistically indistinguishable from the average performance of the radiology residents (66.7%, 89.3 of 134) (P = 0.969), there was a significant difference in question-answering accuracy between GPT-4 and the residents (P = 0.004).
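The abstract does not state which statistical test produced the reported P values. As an illustrative check only, a pooled two-proportion z-test (equivalent to an uncorrected chi-square test on a 2×2 table) applied to the GPT-4 versus GPT-3.5 counts reproduces the reported P = 0.003; the function name below is a hypothetical helper, not from the paper:

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test.

    Equivalent to a Pearson chi-square test (without continuity
    correction) on the corresponding 2x2 contingency table.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# GPT-4: 110/134 correct vs GPT-3.5: 89/134 correct
z, p = two_prop_ztest(110, 134, 89, 134)
print(f"z = {z:.2f}, P = {p:.3f}")  # P = 0.003, matching the reported value
```

The residents' comparison (P = 0.004) cannot be re-derived the same way here, since the abstract reports only the residents' fractional average score (89.3 of 134), not the per-resident counts.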
Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.
Key words: Artificial intelligence; Ultrasonography; Accuracy; Medical education
Jiale Xu, MD, Shujun Xia, MD, Qing Hua, MD, Zihan Mei, MD, Yiqing Hou, MD, Minyan Wei, MD, Limei Lai, MD, Yixuan Yang, MD, Jianqiao Zhou, MD. Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions[J]. ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY, 2024, 8(4): 250-254. DOI: 10.37015/AUDT.2024.240002