
Advanced Ultrasound in Diagnosis and Therapy 2024, Vol. 8, Issue (4): 250-254. doi: 10.37015/AUDT.2024.240002


Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

Xu Jiale a,b,1, Xia Shujun a,b,1, Hua Qing a,b, Mei Zihan a,b, Hou Yiqing a,b, Wei Minyan a,b, Lai Limei a,b, Yang Yixuan a,b, Zhou Jianqiao a,b,*

  • a Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
    b College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • Received: 2024-01-19  Accepted: 2024-03-18  Online: 2024-12-30  Published: 2024-11-12
  • Contact: Zhou Jianqiao, E-mail: zhousu30@126.com
  • About the authors: 1 Jiale Xu and Shujun Xia contributed equally to this study.

Abstract:

Objective: This study aims to assess the performance of the Chat Generative Pre-Trained Transformer (ChatGPT), specifically versions GPT-3.5 and GPT-4, on ultrasonography board-style questions, and subsequently compare it with the performance of third-year radiology residents on the identical set of questions.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations; each question was entered into ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The identical question set was given to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT’s.
Results: GPT-4 correctly answered 82.1% of the questions (110 of 134), significantly surpassing GPT-3.5 (P = 0.003), which answered 66.4% correctly (89 of 134). Although GPT-3.5’s performance was statistically indistinguishable from the residents’ average performance (66.7%, 89.3 of 134) (P = 0.969), question-answering accuracy differed notably between GPT-4 and the residents (P = 0.004).
Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.

Key words: Artificial intelligence, Ultrasonography, Accuracy, Medical education

"

"

Table 1. Number (percentage) of questions answered correctly by GPT-3.5 and GPT-4, by question topic

Question topics | Number of questions | GPT-3.5, n (%) | GPT-4, n (%) | P value
Physics | 20 | 17 (85) | 18 (90) | > 0.99
Clinic | 114 | 72 (63.2) | 92 (80.7) | 0.003
Vascular technology | 20 | 11 (55) | 19 (95) | 0.004
Abdomen | 20 | 10 (50) | 15 (75) | 0.102
Obstetrics and gynecology | 20 | 15 (75) | 18 (90) | 0.405
Pediatric sonography | 20 | 10 (50) | 12 (60) | 0.525
Breast | 20 | 14 (70) | 17 (85) | 0.449
Adult echocardiography | 14 | 12 (85.7) | 11 (78.6) | > 0.99
Total | 134 | 89 (66.4) | 110 (82.1) | 0.003
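
The P values in Table 1 compare GPT-3.5 and GPT-4 on the same 134 questions. This page does not state which statistical test was used, but for two models graded on an identical question set a paired comparison such as McNemar's test is a natural choice. The sketch below is illustrative only: the per-question correctness vectors (`gpt35_correct`, `gpt4_correct`) are hypothetical placeholders, not data from the study.

```python
# Minimal sketch (not the authors' code): paired comparison of two models
# answering the same question set, using an exact McNemar test.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 0/1 correctness vectors for the same 134 questions;
# in the study, GPT-3.5 answered 89 and GPT-4 answered 110 correctly.
rng = np.random.default_rng(0)
gpt35_correct = rng.integers(0, 2, size=134)
gpt4_correct = rng.integers(0, 2, size=134)

# 2x2 table of agreement/disagreement between the two models.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(gpt35_correct, gpt4_correct):
    table[a, b] += 1

# The exact McNemar test uses only the discordant cells
# (questions one model answered correctly and the other did not).
result = mcnemar(table, exact=True)
print(f"Discordant pairs: {table[0, 1]} and {table[1, 0]}, P = {result.pvalue:.3f}")
```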

"

Table 2. Number (percentage) of questions answered correctly by the three third-year radiology residents, by question topic

Question topics | Number of questions | Resident A, n (%) | Resident B, n (%) | Resident C, n (%)
Physics | 20 | 14 (70) | 13 (65) | 12 (60)
Clinic | 114 | 72 (63.2) | 80 (70.2) | 77 (67.5)
Vascular technology | 20 | 14 (70) | 16 (80) | 15 (75)
Abdomen | 20 | 13 (65) | 12 (60) | 16 (80)
Obstetrics and gynecology | 20 | 15 (75) | 16 (80) | 15 (75)
Pediatric sonography | 20 | 9 (45) | 10 (50) | 9 (45)
Breast | 20 | 15 (75) | 13 (65) | 13 (65)
Adult echocardiography | 14 | 6 (42.8) | 13 (92.8) | 9 (64.3)
Total | 134 | 86 (64.2) | 93 (69.4) | 89 (66.4)
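
For the model-versus-resident comparisons reported in the abstract (e.g., GPT-3.5 at 66.4% versus the residents' mean of 66.7%, P = 0.969), the underlying test is likewise not specified on this page. One common framing is a comparison of proportions between the model's 134 answers and the three residents' pooled 402 answers; the sketch below assumes that framing and uses the published totals purely as an illustration, not as the authors' method.

```python
# Minimal sketch (assumed framing, not the authors' method): compare GPT-3.5's
# accuracy with the three residents' pooled accuracy using a chi-square test.
from scipy.stats import chi2_contingency

gpt35 = (89, 134 - 89)                                  # correct, incorrect out of 134
residents = (86 + 93 + 89, 3 * 134 - (86 + 93 + 89))    # pooled over three residents

chi2, p, dof, expected = chi2_contingency([gpt35, residents])
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```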