Obtaining spatially resolved tumor purity maps using deep multiple instance learning in a pan-cancer study

Patterns (N Y). 2021 Dec 9;3(2):100399. doi: 10.1016/j.patter.2021.100399. eCollection 2022 Feb 11.

Abstract

Tumor purity is the percentage of cancer cells within a tissue section. To select samples for genomic analysis, pathologists estimate tumor purity by manually reading hematoxylin and eosin (H&E)-stained slides, a process that is tedious, time-consuming, and prone to inter-observer variability. Moreover, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model that predicts tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight cohorts from The Cancer Genome Atlas (TCGA) and in a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be used to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation of tumor purity within sections. These maps can help to better understand the tumor microenvironment.
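To make the multiple instance learning setting concrete, the following is a minimal sketch and not the model described in this work. It assumes that a slide is treated as a bag of patches, that only the slide-level genomic tumor purity is available as a label, and that patch scores are combined by simple mean pooling. The patch scorer, the feature vectors, and all function names (`patch_score`, `purity_map`, `slide_purity`, `bag_loss`) are hypothetical and chosen for illustration only.

```python
import math
from statistics import mean


def patch_score(features, weights, bias):
    """Hypothetical patch scorer: logistic function of a linear combination.

    Stands in for a deep network; returns a value in (0, 1) read as the
    estimated fraction of cancer cells in one patch.
    """
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def purity_map(patch_grid, weights, bias):
    """Score every patch of a 2D grid, giving a spatial purity map."""
    return [[patch_score(p, weights, bias) for p in row] for row in patch_grid]


def slide_purity(pmap):
    """Bag-level prediction: mean pooling over all patch scores (assumed)."""
    return mean(score for row in pmap for score in row)


def bag_loss(pmap, genomic_purity):
    """Absolute error against the slide-level label, the only supervision."""
    return abs(slide_purity(pmap) - genomic_purity)


# Toy 2x2 slide: each patch is a made-up two-element feature vector.
grid = [
    [[2.0, 0.0], [1.0, 0.0]],
    [[-1.0, 0.0], [-2.0, 0.0]],
]
weights, bias = [1.0, 0.0], 0.0  # illustrative parameters, not trained values

pmap = purity_map(grid, weights, bias)
print("purity map:", pmap)
print("slide-level purity:", slide_purity(pmap))
print("loss vs. genomic purity 0.7:", bag_loss(pmap, 0.7))
```

In this sketch the per-patch scores double as the spatial map, while training would need only the single slide-level value. A real model would replace the linear scorer with a learned feature extractor and may use a different pooling operation.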

Keywords: computational pathology; deep learning; digital histopathology; digital pathology; genomic sequencing; multiple instance learning; spatial omics; tumor microenvironment; tumor purity; whole-slide images.