In recent years, generative artificial intelligence (AI), particularly large language models (LLMs) and their multimodal counterparts, multimodal large language models (MM-LLMs), including vision language models (VLMs), has generated considerable interest in the global AI discourse. LLMs, or pre-trained language models (such as ChatGPT, Med-PaLM, and LLaMA), are neural network architectures trained on extensive text data that excel in language comprehension and generation. MM-LLMs, a subset of foundation models, are trained on multimodal datasets that integrate text with another modality, such as images, to better learn universal representations akin to human cognition. This versatility enables them to excel in tasks such as conversational assistance, translation, and creative writing, while facilitating knowledge sharing through transfer learning, federated learning, and synthetic data creation. Several of these models have potentially appealing applications in the medical domain, including, but not limited to, enhancing patient care by processing patient data; summarizing reports and relevant literature; providing diagnostic, treatment, and follow-up recommendations; and assisting with ancillary tasks such as coding and billing. As radiologists enter this promising but uncharted territory, it is imperative that they become familiar with the basic terminology and processes of LLMs. Herein, we present an overview of LLMs and their potential applications and challenges in the imaging domain.

ABBREVIATIONS: AI: artificial intelligence; BERT: Bidirectional Encoder Representations from Transformers; CLIP: Contrastive Language-Image Pretraining; FM: foundation model; GPT: Generative Pre-trained Transformer; LLM: large language model; NLP: natural language processing; VLM: vision language model.
© 2024 by American Journal of Neuroradiology.