Statistical analysis of multiple regions-of-interest in multiplexed spatial proteomics data

Sarah Samorodnitsky; Michael C Wu

doi:10.1093/bib/bbae522

Brief Bioinform. 2024 Sep 23;25(6):bbae522. doi: 10.1093/bib/bbae522.

Authors

Sarah Samorodnitsky¹,², Michael C Wu¹,²

Affiliations

¹ Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States.
² SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States.

Abstract

Multiplexed spatial proteomics reveals the spatial organization of cells in tumors, which is associated with important clinical outcomes such as survival and treatment response. This spatial organization is often summarized using spatial summary statistics, including Ripley's K and Besag's L. However, if multiple regions of the same tumor are imaged, it is unclear how to combine the resulting per-region summaries when relating them to a single patient-level endpoint. We evaluate existing approaches for accommodating multiple images when associating summary statistics with outcomes. First, we consider averaging-based approaches, in which the multiple summaries for a single sample are combined in a weighted mean. We then propose a novel class of ensemble testing approaches in which we simulate random weights used to aggregate the summaries, test each aggregate for an association with outcomes, and combine the resulting P-values. We systematically evaluate the performance of these approaches via simulation and application to data from non-small cell lung cancer, colorectal cancer, and triple negative breast cancer. We find that the optimal strategy varies, but a simple weighted average of the summary statistics based on the number of cells in each image often offers the highest power and controls type I error effectively. When the size of the imaged regions varies, incorporating this variation into the weighted aggregation may yield additional power in cases where the varying size is informative. Ensemble testing (but not resampling) offered high power and type I error control across conditions in our simulated data sets.
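The two aggregation strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: an uncorrected Ripley's K estimator (no edge correction) turned into Besag's L via L(r) = sqrt(K(r)/π); a Pearson correlation test against a continuous outcome (a survival endpoint would instead call for something like a Cox model); Dirichlet(1, …, 1) random weights for the ensemble draws; and the Cauchy combination rule for merging the per-draw P-values. All function names (`besag_l`, `cell_weighted_test`, `ensemble_test`, etc.) are hypothetical.

```python
# Hypothetical sketch, not the authors' code. Assumes: uncorrected K/L estimator,
# Pearson correlation test with a continuous outcome, Dirichlet(1,...,1) weights,
# and Cauchy combination of the per-draw p-values.
import math
import random


def besag_l(points, r, area):
    """Uncorrected Besag's L(r) = sqrt(K(r) / pi) for one ROI (no edge correction)."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                close_pairs += 2  # count ordered pairs (i, j) and (j, i)
    k_hat = area * close_pairs / (n * (n - 1))  # Ripley's K estimate
    return math.sqrt(k_hat / math.pi)


def weighted_mean(values, weights):
    """Weighted mean of per-ROI summaries."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def dirichlet_weights(k, rng):
    """Random weights summing to 1: normalized Exp(1) draws, i.e. Dirichlet(1, ..., 1)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def corr_pvalue(x, y):
    """Two-sided Pearson correlation p-value via Fisher's z (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    rho = max(min(sxy / math.sqrt(sxx * syy), 0.999999), -0.999999)
    z = math.atanh(rho) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def cauchy_combine(pvalues, eps=1e-15):
    """Equal-weight Cauchy combination of (possibly dependent) p-values."""
    clipped = [min(max(p, eps), 1 - eps) for p in pvalues]
    t = sum(math.tan((0.5 - p) * math.pi) for p in clipped) / len(clipped)
    return 0.5 - math.atan(t) / math.pi


def patient_summaries(patients, r):
    """Per-ROI L(r) values and cell counts; each patient is a list of (points, area)."""
    values = [[besag_l(pts, r, area) for pts, area in rois] for rois in patients]
    counts = [[len(pts) for pts, _ in rois] for rois in patients]
    return values, counts


def cell_weighted_test(values, counts, outcome):
    """Average each patient's ROI summaries weighted by cell count, then test."""
    x = [weighted_mean(v, c) for v, c in zip(values, counts)]
    return corr_pvalue(x, outcome)


def ensemble_test(values, outcome, n_draws=200, seed=0):
    """Random ROI weights per draw -> one p-value per draw -> Cauchy combination."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_draws):
        x = [weighted_mean(v, dirichlet_weights(len(v), rng)) for v in values]
        pvals.append(corr_pvalue(x, outcome))
    return cauchy_combine(pvals)


# Usage on simulated data: 30 patients, 2-4 ROIs each, uniform points in a unit square.
rng = random.Random(1)
patients = [
    [([(rng.random(), rng.random()) for _ in range(rng.randint(20, 60))], 1.0)
     for _ in range(rng.randint(2, 4))]
    for _ in range(30)
]
outcome = [rng.gauss(0.0, 1.0) for _ in patients]
values, counts = patient_summaries(patients, r=0.1)
p_weighted = cell_weighted_test(values, counts, outcome)
p_ensemble = ensemble_test(values, outcome)
print(f"cell-count weighted p = {p_weighted:.3f}, ensemble p = {p_ensemble:.3f}")
```

The per-ROI statistics are computed once and reused across ensemble draws, since only the weights change between draws. The Cauchy rule is used here because it tolerates strong dependence among the per-draw P-values, which all come from the same data; the paper's actual weight distribution and combination rule may differ.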

Keywords: multiplexed immunofluorescence; multiplexed spatial proteomics; regions-of-interest; single-cell data; spatial point process.

MeSH terms

Algorithms
Carcinoma, Non-Small-Cell Lung / genetics
Carcinoma, Non-Small-Cell Lung / metabolism
Carcinoma, Non-Small-Cell Lung / pathology
Colorectal Neoplasms / genetics
Colorectal Neoplasms / metabolism
Colorectal Neoplasms / pathology
Data Interpretation, Statistical
Humans
Lung Neoplasms / genetics
Lung Neoplasms / metabolism
Lung Neoplasms / pathology
Neoplasms / genetics
Neoplasms / metabolism
Proteomics* / methods
Triple Negative Breast Neoplasms / genetics
Triple Negative Breast Neoplasms / metabolism
Triple Negative Breast Neoplasms / pathology