
Advanced Ultrasound in Diagnosis and Therapy 2024, Vol. 8, Issue (4): 250-254. doi: 10.37015/AUDT.2024.240002


Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

Xu Jiale a,b,1, Xia Shujun a,b,1, Hua Qing a,b, Mei Zihan a,b, Hou Yiqing a,b, Wei Minyan a,b, Lai Limei a,b, Yang Yixuan a,b, Zhou Jianqiao a,b,*

  • a Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
    b College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  • Received: 2024-01-19  Accepted: 2024-03-18  Online: 2024-12-30  Published: 2024-11-12
  • Contact: Zhou Jianqiao, E-mail: zhousu30@126.com
  • About the authors: 1 Jiale Xu and Shujun Xia contributed equally to this study.

Abstract:

Objective: This study aims to assess the performance of the Chat Generative Pre-Trained Transformer (ChatGPT), specifically versions GPT-3.5 and GPT-4, on ultrasonography board-style questions, and subsequently compare it with the performance of third-year radiology residents on the identical set of questions.
Methods: The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations; each question was entered into ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The identical question set was given to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT’s.
Results: GPT-4 correctly answered 82.1% of the questions (110 of 134), significantly surpassing GPT-3.5 (P = 0.003), which answered 66.4% correctly (89 of 134). Although GPT-3.5’s performance was statistically indistinguishable from the residents’ average performance (66.7%, 89.3 of 134) (P = 0.969), question-answering accuracy differed notably between GPT-4 and the residents (P = 0.004).
Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.

Key words: Artificial intelligence, Ultrasonography, Accuracy, Medical education

"

"

Table 1. Number (percentage) of questions answered correctly by GPT-3.5 and GPT-4, by question topic

Question topics | Number of questions | GPT-3.5, n (%) | GPT-4, n (%) | P value
Physics | 20 | 17 (85) | 18 (90) | > 0.99
Clinic | 114 | 72 (63.2) | 92 (80.7) | 0.003
Vascular technology | 20 | 11 (55) | 19 (95) | 0.004
Abdomen | 20 | 10 (50) | 15 (75) | 0.102
Obstetrics and gynecology | 20 | 15 (75) | 18 (90) | 0.405
Pediatric sonography | 20 | 10 (50) | 12 (60) | 0.525
Breast | 20 | 14 (70) | 17 (85) | 0.449
Adult echocardiography | 14 | 12 (85.7) | 11 (78.6) | > 0.99
Total | 134 | 89 (66.4) | 110 (82.1) | 0.003
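
The P values in Table 1 compare GPT-3.5 and GPT-4 on the same 134 questions. This page does not state which statistical test was used, but for two models graded on an identical question set a paired comparison such as McNemar's test is a natural choice. The sketch below is illustrative only: the per-question correctness vectors (`gpt35_correct`, `gpt4_correct`) are hypothetical placeholders, not data from the study.

```python
# Minimal sketch (not the authors' code): paired comparison of two models
# answering the same question set, using an exact McNemar test.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 0/1 correctness vectors for the same 134 questions;
# in the study, GPT-3.5 answered 89 and GPT-4 answered 110 correctly.
rng = np.random.default_rng(0)
gpt35_correct = rng.integers(0, 2, size=134)
gpt4_correct = rng.integers(0, 2, size=134)

# 2x2 table of agreement/disagreement between the two models.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(gpt35_correct, gpt4_correct):
    table[a, b] += 1

# The exact McNemar test uses only the discordant cells
# (questions one model answered correctly and the other did not).
result = mcnemar(table, exact=True)
print(f"Discordant pairs: {table[0, 1]} and {table[1, 0]}, P = {result.pvalue:.3f}")
```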

"

Table 2. Number (percentage) of questions answered correctly by the three third-year radiology residents, by question topic

Question topics | Number of questions | Resident A, n (%) | Resident B, n (%) | Resident C, n (%)
Physics | 20 | 14 (70) | 13 (65) | 12 (60)
Clinic | 114 | 72 (63.2) | 80 (70.2) | 77 (67.5)
Vascular technology | 20 | 14 (70) | 16 (80) | 15 (75)
Abdomen | 20 | 13 (65) | 12 (60) | 16 (80)
Obstetrics and gynecology | 20 | 15 (75) | 16 (80) | 15 (75)
Pediatric sonography | 20 | 9 (45) | 10 (50) | 9 (45)
Breast | 20 | 15 (75) | 13 (65) | 13 (65)
Adult echocardiography | 14 | 6 (42.8) | 13 (92.8) | 9 (64.3)
Total | 134 | 86 (64.2) | 93 (69.4) | 89 (66.4)
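
For the model-versus-resident comparisons reported in the abstract (e.g., GPT-3.5 at 66.4% versus the residents' mean of 66.7%, P = 0.969), the underlying test is likewise not specified on this page. One common framing is a comparison of proportions between the model's 134 answers and the three residents' pooled 402 answers; the sketch below assumes that framing and uses the published totals purely as an illustration, not as the authors' method.

```python
# Minimal sketch (assumed framing, not the authors' method): compare GPT-3.5's
# accuracy with the three residents' pooled accuracy using a chi-square test.
from scipy.stats import chi2_contingency

gpt35 = (89, 134 - 89)                                  # correct, incorrect out of 134
residents = (86 + 93 + 89, 3 * 134 - (86 + 93 + 89))    # pooled over three residents

chi2, p, dof, expected = chi2_contingency([gpt35, residents])
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```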