Towards a general-purpose foundation model for computational pathology

Richard J Chen; Tong Ding; Ming Y Lu; Drew F K Williamson; Guillaume Jaume; Andrew H Song; Bowen Chen; Andrew Zhang; Daniel Shao; Muhammad Shaban; Mane Williams; Lukas Oldenburg; Luca L Weishaupt; Judy J Wang; Anurag Vaidya; Long Phi Le; Georg Gerber; Sharifa Sahai; Walt Williams; Faisal Mahmood

doi:10.1038/s41591-024-02857-3

Nat Med. 2024 Mar;30(3):850-862. doi: 10.1038/s41591-024-02857-3. Epub 2024 Mar 19.

Authors

Richard J Chen^#^{1,2,3,4,5}, Tong Ding^#^{1,6}, Ming Y Lu^#^{1,2,3,4,7}, Drew F K Williamson^#^{1,2,3}, Guillaume Jaume^{1,2,3,4}, Andrew H Song^{1,2,3,4}, Bowen Chen^{1,2}, Andrew Zhang^{1,2,3,4,8}, Daniel Shao^{1,2,3,4,8}, Muhammad Shaban^{1,2,3,4}, Mane Williams^{1,2,3,4,5}, Lukas Oldenburg¹, Luca L Weishaupt^{1,2,3,4,8}, Judy J Wang¹, Anurag Vaidya^{1,2,3,4,8}, Long Phi Le^{2,8}, Georg Gerber¹, Sharifa Sahai^{1,2,3,4,9}, Walt Williams^{1,6}, Faisal Mahmood^{10,11,12,13,14}

Affiliations

¹ Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
² Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
³ Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
⁴ Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
⁵ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
⁶ Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA.
⁷ Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA.
⁸ Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA.
⁹ Department of Systems Biology, Harvard University, Cambridge, MA, USA.
¹⁰ Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA. faisalmahmood@bwh.harvard.edu.
¹¹ Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. faisalmahmood@bwh.harvard.edu.
¹² Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA. faisalmahmood@bwh.harvard.edu.
¹³ Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA. faisalmahmood@bwh.harvard.edu.
¹⁴ Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA. faisalmahmood@bwh.harvard.edu.

^# Contributed equally.

Abstract

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.

MeSH terms

Artificial Intelligence*
Workflow
