Synthetic Data and Its Utility in Pathology and Laboratory Medicine

Lab Invest. 2024 Aug;104(8):102095. doi: 10.1016/j.labinv.2024.102095. Epub 2024 Jun 24.

Abstract

In the rapidly expanding landscape of artificial intelligence, synthetic data have become a topic of great promise and also some concern. This review aimed to provide pathologists and laboratory professionals with a primer on the general concept of synthetic data, their role, and their potential to transform our field. Using synthetic data presents many advantages but also introduces a host of new obstacles and limitations. By leveraging synthetic data, we can help accelerate the development of various machine learning models and support our medical education, research, and quality study needs. This review explored the methods for generating synthetic data, including rule-based, machine learning model-based, and hybrid approaches, as they apply to pathology and laboratory medicine. We also discussed the limitations and challenges associated with synthetic data, including data quality, malicious use, and ethical concerns and bias. By understanding the potential benefits (eg, medical education, training artificial intelligence programs, and proficiency testing) and limitations of this new data realm, we can not only harness its power to improve patient outcomes, advance research, and enhance the practice of pathology but also remain aware of its intrinsic limitations.

Keywords: artificial intelligence; data simulation; generative AI; laboratory medicine; machine learning models; pathology artificial intelligence; pathology education; synthetic data.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence
  • Humans
  • Machine Learning*
  • Pathology