Objectives: Developing accurate supervised machine learning algorithms is hampered by the lack of representative annotated datasets. Most data in anatomic pathology are unlabeled and creating large, annotated datasets is a time consuming and laborious process. Unsupervised learning, which does not require annotated data, possesses the potential to assist with this challenge. This review aims to introduce the concept of unsupervised learning and illustrate how clustering, generative adversarial networks (GANs) and autoencoders have the potential to address the lack of annotated data in anatomic pathology.
Methods: A review of unsupervised learning with examples from the literature was carried out.
Results: Clustering can be used as part of semisupervised learning where labels are propagated from a subset of annotated data points to remaining unlabeled data points in a dataset. GANs may assist by generating large amounts of synthetic data and performing color normalization. Autoencoders allow training of a network on a large, unlabeled dataset and transferring learned representations to a classifier using a smaller, labeled subset (unsupervised pretraining).
Conclusions: Unsupervised machine learning techniques such as clustering, GANs, and autoencoders, used individually or in combination, may help address the lack of annotated data in pathology and improve the process of developing supervised learning models.
Keywords: Autoencoder; Clustering; Digital pathology; Generative adversarial networks; Machine learning; Semisupervised learning; Unsupervised learning.
© American Society for Clinical Pathology, 2021. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.