Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics

Yakir A Reshef; Laurie Rumker; Joyce B Kang; Aparna Nathan; Ilya Korsunsky; Samira Asgari; Megan B Murray; D Branch Moody; Soumya Raychaudhuri

doi:10.1038/s41587-021-01066-4

Nat Biotechnol. 2022 Mar;40(3):355-363. doi: 10.1038/s41587-021-01066-4. Epub 2021 Oct 21.

Authors

Yakir A Reshef^#^{1,2,3,4,5}, Laurie Rumker^#^{1,2,3,4,5}, Joyce B Kang^{1,2,3,4,5}, Aparna Nathan^{1,2,3,4,5}, Ilya Korsunsky^{1,2,3,4,5}, Samira Asgari^{1,2,3,4,5}, Megan B Murray⁶, D Branch Moody³, Soumya Raychaudhuri^{7,8,9,10,11,12}

Affiliations

¹ Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA.
² Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
³ Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
⁴ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
⁵ Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁶ Department of Global Health and Social Medicine, Harvard Medical School, Boston, MA, USA.
⁷ Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA. soumya@broadinstitute.org.
⁸ Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. soumya@broadinstitute.org.
⁹ Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. soumya@broadinstitute.org.
¹⁰ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. soumya@broadinstitute.org.
¹¹ Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. soumya@broadinstitute.org.
¹² Versus Arthritis Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK. soumya@broadinstitute.org.

^# Contributed equally.

Abstract

As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes, such as clinical phenotypes. Current statistical approaches typically map cells to clusters and then assess differences in cluster abundance. Here we present co-varying neighborhood analysis (CNA), an unbiased method to identify associated cell populations with greater flexibility than cluster-based approaches. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space, termed neighborhoods, that co-vary in abundance across samples, suggesting shared function or regulation. CNA performs statistical testing for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. Simulations show that CNA enables more sensitive and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, identifies monocyte populations expanded in sepsis and identifies a novel T cell population associated with progression to active tuberculosis.
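
The workflow outlined in the abstract (per-sample neighborhood abundances, dominant axes of co-variation, then an association test against a sample attribute) can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes neighborhoods are plain k-nearest-neighbor sets in a 2-D embedding, takes the axes of variation from an SVD of the centered abundance matrix, and uses a simple permutation test. The function names, parameters, and simulated data are all hypothetical.

```python
import numpy as np

def neighborhood_abundance(coords, sample_ids, n_samples, k=30):
    """Return a (samples x neighborhoods) matrix. Entry [s, m] is the fraction
    of sample s's cells that fall in the k-nearest-neighbor set of anchor cell m.
    Assumption: plain kNN sets stand in for the method's neighborhoods."""
    # Dense pairwise squared distances; acceptable only for small toy data.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(d2, axis=1)[:, :k]        # k nearest cells (anchor included)
    onehot = np.eye(n_samples)[sample_ids]     # cells x samples membership
    counts = onehot[knn].sum(axis=1).T         # samples x neighborhoods
    return counts / onehot.sum(axis=0)[:, None]

def association_test(nam, y, n_pcs=3, n_perm=1000, seed=0):
    """Permutation test of the variance in y explained by the top sample-level
    axes of the centered abundance matrix (a simplified stand-in test)."""
    rng = np.random.default_rng(seed)
    u, _, _ = np.linalg.svd(nam - nam.mean(axis=0), full_matrices=False)
    pcs = u[:, :n_pcs]                         # orthonormal sample-level axes
    yc = y - y.mean()

    def r2(v):
        beta = pcs.T @ v                       # least-squares fit on orthonormal axes
        return float(beta @ beta) / float(v @ v)

    observed = r2(yc)
    null = np.array([r2(rng.permutation(yc)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Simulated data: two cell states; "cases" carry more cells in state B.
rng = np.random.default_rng(0)
n_samples, cells_per_sample = 20, 50
phenotype = np.repeat([0.0, 1.0], n_samples // 2)
coords, sample_ids = [], []
for s in range(n_samples):
    frac_b = 0.8 if phenotype[s] == 1 else 0.2
    in_b = rng.random(cells_per_sample) < frac_b
    centers = np.where(in_b[:, None], [4.0, 0.0], [0.0, 0.0])
    coords.append(centers + rng.normal(size=(cells_per_sample, 2)))
    sample_ids.append(np.full(cells_per_sample, s))
coords = np.vstack(coords)
sample_ids = np.concatenate(sample_ids)

nam = neighborhood_abundance(coords, sample_ids, n_samples)
r2, p = association_test(nam, phenotype)
print(f"variance explained: {r2:.2f}, permutation p-value: {p:.4f}")
```

No clustering step appears anywhere in the sketch: the association is tested on axes derived from neighborhood abundances alone, which is the contrast with cluster-based approaches that the abstract draws.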
