Restricting datasets to classifiable samples augments discovery of immune disease biomarkers

Nat Commun. 2024 Jun 26;15(1):5417. doi: 10.1038/s41467-024-49094-3.

Abstract

Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially when compared with their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation: unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker's informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
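The abstract does not specify how samples are assigned to the classifiable subset. As a rough intuition only, the toy Python sketch below assumes a single biomarker and a hypothetical rule: the reference class (assumed to be tightly regulated, as in health) defines an overlap interval running from its minimum to its maximum value; samples inside that interval are treated as unclassifiable, and samples outside it are classified as the non-reference class. The function names (restrict_dataset, restricted_precision), the min/max rule and the data are illustrative assumptions, not the authors' published procedure.

```python
from typing import List, Tuple


def restrict_dataset(values: List[float], labels: List[int],
                     reference: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into classifiable and unclassifiable sets.

    Hypothetical rule (illustration only): the reference class, assumed to
    have low dispersion, defines an overlap interval [min, max]. Samples
    inside it are unclassifiable; samples outside it are classifiable.
    """
    ref_vals = [v for v, y in zip(values, labels) if y == reference]
    lo, hi = min(ref_vals), max(ref_vals)
    classifiable = [i for i, v in enumerate(values) if v < lo or v > hi]
    unclassifiable = [i for i, v in enumerate(values) if lo <= v <= hi]
    return classifiable, unclassifiable


def restricted_precision(labels: List[int], classifiable: List[int],
                         reference: int = 0) -> float:
    """Fraction of classifiable samples that truly belong to the
    non-reference class (all classifiable samples are predicted as such)."""
    if not classifiable:
        return float("nan")
    hits = sum(1 for i in classifiable if labels[i] != reference)
    return hits / len(classifiable)


# Toy data: class 0 (reference) is tightly clustered, class 1 is dispersed.
values = [1.0, 1.2, 0.9, 1.1, 0.8, 3.5, 1.05, 4.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

classifiable, unclassifiable = restrict_dataset(values, labels)
print(classifiable)    # [4, 5, 7]
print(unclassifiable)  # [0, 1, 2, 3, 6]
print(restricted_precision(labels, classifiable))  # 1.0
```

In this toy example, restriction keeps only three of the eight samples but classifies all of them correctly, while sample 6 (a class-1 sample whose value falls within the reference range) is set aside rather than misclassified. A realistic version would estimate the interval on training data only and would likely use a more robust bound than the raw minimum and maximum.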

MeSH terms

  • Biomarkers* / metabolism
  • Flow Cytometry
  • Humans
  • Immune System Diseases / immunology
  • Immunotherapy / methods
  • Melanoma* / genetics
  • Melanoma* / immunology

Substances

  • Biomarkers