Large Language Model Use in Radiology Residency Applications: Unwelcomed but Inevitable

Emile B Gordon; Charles Maxfield; Robert French; Laura J Fish; Jacob Romm; Emily Barre; Erica Kinne; Ryan Peterson; Lars J Grimm

doi:10.1016/j.jacr.2024.08.027

J Am Coll Radiol. 2024 Sep 17:S1546-1440(24)00768-3. doi: 10.1016/j.jacr.2024.08.027. Online ahead of print.

Authors

Emile B Gordon¹, Charles Maxfield², Robert French², Laura J Fish³, Jacob Romm², Emily Barre², Erica Kinne⁴, Ryan Peterson⁵, Lars J Grimm²

Affiliations

¹ Department of Radiology, Duke University Health System, Durham, North Carolina; Department of Radiology, University of California San Diego, La Jolla, California. Electronic address: emgordon@health.ucsd.edu.
² Department of Radiology, Duke University Health System, Durham, North Carolina.
³ Duke Cancer Institute, Durham, North Carolina; Department of Family Medicine and Community Health, Duke University School of Medicine, Durham, North Carolina.
⁴ Department of Radiology, Loma Linda University Medical Center, Loma Linda, California.
⁵ Department of Radiology and Imaging Sciences, Emory University, Atlanta, Georgia.

PMID: 39299618

Abstract

Objective: This study explores radiology program directors' perspectives on the impact of residency applicants' use of large language models (LLMs) to craft personal statements.

Methods: Eight program directors from the Radiology Residency Education Research Alliance participated in a mixed-methods study, which included a survey regarding impressions of artificial intelligence (AI)-generated personal statements and focus group discussions (July 2023). Each director reviewed four versions of the personal statement for five applicants, blinded to author type: the original and three versions generated by Chat Generative Pre-trained Transformer-4.0 (GPT) with varying prompts, with the GPT versions aggregated for analysis. Directors rated each statement on a 5-point Likert scale for writing quality (voice, clarity, engagement, and organization) and for perceived origin. An experienced qualitative researcher facilitated focus group discussions. Data were analyzed using a rapid analytic approach with a coding template capturing key areas related to residency applications.

Results: Ratings of GPT-generated statements were more often average or worse in quality (56%, 268 of 475) than ratings of human-authored statements (29%, 45 of 160). Although reviewers were not confident in their ability to distinguish the origin of personal statements, they did so reliably and consistently, identifying 95% (38 of 40) of the human-authored personal statements as probably or definitely original. Focus group discussions highlighted the inevitable use of AI in crafting personal statements and concerns about its impact on the authenticity and value of the personal statement in residency selection. Program directors were divided on the appropriate use and regulation of AI.

Discussion: Radiology residency program directors rated LLM-generated personal statements as lower in quality and expressed concern about the loss of the applicant's voice but acknowledged the inevitability of increased AI use in the generation of application statements.

Keywords: ChatGPT; LLM; education; medical students; residency.