Artificial Intelligence in Ultrasound Imaging: A Review of Progress from Machine Learning to Large Language Model

  • Jin Tong,
  • Yu Xiaohu,
  • Ai Zheng,
  • Guo Hongcheng
  • a School of Life Sciences, Beijing University of Chinese Medicine, Beijing, China
  • b School of Data Science, Fudan University, Shanghai, China
1 Tong Jin and Xiaohu Yu contributed equally to this work.
Corresponding author: Hongcheng Guo, School of Data Science, Fudan University, Shanghai, China; e-mail: guohc@fudan.edu.cn.

Received date: 2025-09-30

Revised date: 2025-10-12

Accepted date: 2025-10-26

Online published: 2025-11-06

Copyright

2576-2508/© AUDT 2025

Abstract

Biomedical ultrasound imaging, one of the most common, safe, and cost-effective modalities in clinical diagnosis, has seen remarkable progress through the integration of artificial intelligence (AI). Early studies based on traditional machine learning (ML) relied on handcrafted features and classical classifiers to achieve automatic recognition and quantitative analysis of ultrasound images; however, such methods were limited in feature representation capacity and generalizability. With the advent of deep learning (DL), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and attention-based architectures have been widely applied to tasks such as segmentation, detection, and lesion classification, significantly improving diagnostic accuracy and robustness. More recently, large language models (LLMs) and multimodal foundation models have opened new avenues for intelligent ultrasound analysis. These models not only integrate imaging and textual information to support automated report generation and cross-modal reasoning but also offer enhanced interpretability and greater potential for clinical adoption. This article systematically reviews the evolution of AI in ultrasound image analysis, from traditional ML through deep learning to LLMs, outlining a complete trajectory of methodological advances.

Cite this article

Jin Tong, Yu Xiaohu, Ai Zheng, Guo Hongcheng. Artificial Intelligence in Ultrasound Imaging: A Review of Progress from Machine Learning to Large Language Model[J]. ADVANCED ULTRASOUND IN DIAGNOSIS AND THERAPY, 2025; 9(4): 483-496. DOI: 10.26599/AUDT.2025.250104
