Collaboration between clinicians and vision-language models in radiology report generation

Ryutaro Tanno; David G T Barrett; Andrew Sellergren; Sumedh Ghaisas; Sumanth Dathathri; Abigail See; Johannes Welbl; Charles Lau; Tao Tu; Shekoofeh Azizi; Karan Singhal; Mike Schaekermann; Rhys May; Roy Lee; SiWai Man; Sara Mahdavi; Zahra Ahmed; Yossi Matias; Joelle Barral; S M Ali Eslami; Danielle Belgrave; Yun Liu; Sreenivasa Raju Kalidindi; Shravya Shetty; Vivek Natarajan; Pushmeet Kohli; Po-Sen Huang; Alan Karthikesalingam; Ira Ktena

doi:10.1038/s41591-024-03302-1

Nat Med. 2024 Nov 7. doi: 10.1038/s41591-024-03302-1. Online ahead of print.

Authors

Ryutaro Tanno^#¹, David G T Barrett^#², Andrew Sellergren³, Sumedh Ghaisas⁴, Sumanth Dathathri⁴, Abigail See⁴, Johannes Welbl⁴, Charles Lau³, Tao Tu⁴, Shekoofeh Azizi⁴, Karan Singhal³,⁵, Mike Schaekermann³, Rhys May⁴, Roy Lee³, SiWai Man³, Sara Mahdavi⁴, Zahra Ahmed⁴, Yossi Matias³, Joelle Barral⁴, S M Ali Eslami⁴, Danielle Belgrave⁴,⁶, Yun Liu³, Sreenivasa Raju Kalidindi⁷, Shravya Shetty³, Vivek Natarajan³, Pushmeet Kohli⁴, Po-Sen Huang⁴, Alan Karthikesalingam⁸, Ira Ktena⁹

Affiliations

¹ Google DeepMind, London, UK. rtanno@google.com.
² Google DeepMind, London, UK. barrettdavid@google.com.
³ Google Research, London, UK.
⁴ Google DeepMind, London, UK.
⁵ OpenAI, San Francisco, CA, USA.
⁶ GlaxoSmithKline AI, London, UK.
⁷ Apollo Radiology International, Hyderabad, India.
⁸ Google Research, London, UK. alankarthi@google.com.
⁹ Google DeepMind, London, UK. iraktena@google.com.

^# Contributed equally.

PMID: 39511432
DOI: 10.1038/s41591-024-03302-1

Abstract

Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across the panel and across clinical settings, with 56.1% of Flamingo-CXR intensive care reports evaluated to be preferable or equivalent to clinician reports, by half or more of the panel, rising to 77.7% for in/outpatient X-rays overall and to 94% for the subset of cases with no pertinent abnormal findings. Errors were observed in human-written reports and Flamingo-CXR reports, with 24.8% of in/outpatient cases containing clinically significant errors in both report types, 22.8% in Flamingo-CXR reports only and 14.0% in human reports only. For reports that contain errors, we develop an assistive setting, a demonstration of clinician-AI collaboration for radiology report composition, indicating new possibilities for potential clinical utility.