Appropriate Statistical Methods to Assess Cross-study Diagnostic 23-Gene Expression Profile Test Performance for Cutaneous Melanocytic Neoplasms

Am J Dermatopathol. 2024 Aug 14. doi: 10.1097/DAD.0000000000002808. Online ahead of print.

Abstract

Comparing studies of molecular ancillary diagnostic tests for difficult-to-diagnose cutaneous melanocytic neoplasms presents a methodological challenge because accuracy metrics are calculated in disparate ways. A recent report by Boothby-Shoemaker et al investigating the real-world accuracy of the 23-gene expression profile (23-GEP) test highlights this difficulty, reporting lower accuracy than previously observed. However, their calculation method, in which indeterminate test results were counted as either false positives or false negatives, differed from the methods used in previous studies. We corrected for these differences and recalculated their reported accuracy metrics in the same manner as the previous studies to enable an appropriate comparison. This corrected analysis showed a sensitivity of 92.1% (95% confidence interval [CI], 82.1%-100%) and a specificity of 94.4% (95% CI, 91.6%-96.9%). We then compared these results directly with those of previous studies that included >25 benign and >25 malignant cases with outcomes and/or a concordant histopathological diagnosis by ≥3 dermatopathologists. All studies assessed had imbalanced enrollment of benign versus malignant cases (ratio, 0.8-7.0), so balanced cohorts were resampled according to the lowest common denominator to calculate point estimates and CIs for the accuracy metrics. Overall, with this unified methodological approach, we found no statistically significant differences between previous studies and the cohort from Boothby-Shoemaker et al in the ranges of 23-GEP sensitivity, 90.4%-96.3% (95% CI, 80.8%-100%); specificity, 87.3%-96.2% (95% CI, 78.2%-100%); positive predictive value, 88.5%-96.1% (95% CI, 81.5%-100%); or negative predictive value, 91.1%-96.3% (95% CI, 83.6%-100%). Rigorous standardization of calculation methods is necessary when the goal is direct cross-study comparability.
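
The two methodological points in the abstract, the handling of indeterminate results and the rebalancing of benign and malignant cohorts, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the counts are invented and are not data from any of the studies, the function names `accuracy_metrics` and `balanced_bootstrap` are hypothetical, and the resampling shown (drawing equal numbers of benign and malignant cases with replacement and taking percentile intervals) is one plausible reading of "balanced cohorts were resampled", not necessarily the procedure the authors used.

```python
import random


def accuracy_metrics(tp, fn, tn, fp, indet_malignant=0, indet_benign=0,
                     indeterminate_as_error=False):
    """Return sensitivity, specificity, PPV and NPV as fractions.

    If indeterminate_as_error is True, indeterminate results in malignant
    cases are counted as false negatives and those in benign cases as
    false positives. Otherwise indeterminate results are excluded.
    """
    if indeterminate_as_error:
        fn += indet_malignant
        fp += indet_benign
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def balanced_bootstrap(malignant_calls, benign_calls, n_per_class,
                       n_iter=2000, seed=0):
    """Resample equal-sized benign and malignant cohorts (assumed scheme).

    Each input is a list of booleans, True meaning the test called the
    lesion malignant. Returns {metric: (mean, lower, upper)} using the
    2.5th and 97.5th percentiles of the resampled values.
    """
    rng = random.Random(seed)
    draws = {"sensitivity": [], "specificity": [], "ppv": [], "npv": []}
    for _ in range(n_iter):
        m = rng.choices(malignant_calls, k=n_per_class)  # with replacement
        b = rng.choices(benign_calls, k=n_per_class)
        tp, fp = sum(m), sum(b)
        fn, tn = n_per_class - tp, n_per_class - fp
        if tp + fp == 0 or tn + fn == 0:
            continue  # PPV or NPV undefined for this draw
        for name, value in accuracy_metrics(tp, fn, tn, fp).items():
            draws[name].append(value)
    summary = {}
    for name, values in draws.items():
        values.sort()
        n = len(values)
        lower = values[int(0.025 * n)]
        upper = values[min(n - 1, int(0.975 * n))]
        summary[name] = (sum(values) / n, lower, upper)
    return summary


# Invented counts: 50 malignant and 200 benign determinate cases,
# plus 10 malignant and 30 benign cases with indeterminate results.
excluded = accuracy_metrics(45, 5, 180, 20, 10, 30)
as_error = accuracy_metrics(45, 5, 180, 20, 10, 30,
                            indeterminate_as_error=True)

# Benign-to-malignant ratio of 4.0, rebalanced to 50 cases per class.
malignant_calls = [True] * 45 + [False] * 5
benign_calls = [True] * 20 + [False] * 180
summary = balanced_bootstrap(malignant_calls, benign_calls, n_per_class=50)
```

With these invented counts, counting indeterminate results as errors lowers sensitivity from 45/50 (90%) to 45/60 (75%) and specificity from 180/200 (90%) to 180/230 (about 78%), although the underlying test calls are identical. Rebalancing matters mainly for positive and negative predictive value, which depend on the benign-to-malignant ratio of the cohort.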