The Utility of Unsupervised Machine Learning in Anatomic Pathology

Ewen D McAlpine; Pamela Michelow; Turgay Celik

doi:10.1093/ajcp/aqab085


Am J Clin Pathol. 2022 Jan 6;157(1):5-14. doi: 10.1093/ajcp/aqab085.

Authors

Ewen D McAlpine¹,²; Pamela Michelow¹,²; Turgay Celik³,⁴

Affiliations

¹ Division of Anatomical Pathology, School of Pathology, University of the Witwatersrand, Johannesburg, South Africa.
² National Health Laboratory Service, Johannesburg, South Africa.
³ School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa.
⁴ Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa.

PMID: 34302331

Abstract

Objectives: Developing accurate supervised machine learning algorithms is hampered by the lack of representative annotated datasets. Most data in anatomic pathology are unlabeled, and creating large annotated datasets is a time-consuming and laborious process. Unsupervised learning, which does not require annotated data, has the potential to help with this challenge. This review introduces the concept of unsupervised learning and illustrates how clustering, generative adversarial networks (GANs), and autoencoders could address the lack of annotated data in anatomic pathology.

Methods: A review of unsupervised learning with examples from the literature was carried out.

Results: Clustering can be used as part of semisupervised learning, where labels are propagated from a subset of annotated data points to the remaining unlabeled data points in a dataset. GANs may assist by generating large amounts of synthetic data and by performing color normalization. Autoencoders allow a network to be trained on a large, unlabeled dataset and the learned representations to be transferred to a classifier trained on a smaller, labeled subset (unsupervised pretraining).
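The cluster-then-label idea in the Results can be illustrated with a minimal sketch. It assumes plain k-means on precomputed feature vectors (for example, hypothetical embeddings of image patches) and a majority vote among the annotated points in each cluster. The functions `kmeans` and `propagate_labels`, and the convention of marking unlabeled points with -1, are illustrative assumptions, not the method of any specific study reviewed here.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X; returns cluster indices."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assignments = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            X[assignments == j].mean(axis=0) if np.any(assignments == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assignments


def propagate_labels(X, labels, k, seed=0):
    """Hypothetical helper: cluster all points, then give every point the majority
    label of the annotated points in its cluster.

    `labels` holds non-negative integer classes for annotated points and -1 for
    unlabeled ones. Points in clusters with no annotated member stay -1.
    """
    assignments = kmeans(X, k, seed=seed)
    propagated = np.full(len(X), -1)
    for j in range(k):
        members = assignments == j
        known = labels[members & (labels >= 0)]
        if known.size:
            # Majority vote among the annotated members of this cluster.
            propagated[members] = np.bincount(known).argmax()
    return propagated


# Usage: two well-separated groups of 2-D feature vectors, one annotated point each.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
              [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]], dtype=float)
labels = np.array([0, -1, -1, -1, 1, -1, -1, -1])
print(propagate_labels(X, labels, k=2))
```

Leaving points in unannotated clusters as -1, rather than forcing a guess, mirrors the practical workflow in which such clusters would be flagged for a pathologist to label.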

Conclusions: Unsupervised machine learning techniques such as clustering, GANs, and autoencoders, used individually or in combination, may help address the lack of annotated data in pathology and improve the process of developing supervised learning models.

Keywords: Autoencoder; Clustering; Digital pathology; Generative adversarial networks; Machine learning; Semisupervised learning; Unsupervised learning.

Publication types

Review

MeSH terms

Algorithms
Humans
Supervised Machine Learning*
Unsupervised Machine Learning*

Grants and funding

University of the Witwatersrand, Johannesburg