BEEx Is an Open-Source Tool That Evaluates Batch Effects in Medical Images to Enable Multi-center Studies

Cancer Res. 2024 Dec 11. doi: 10.1158/0008-5472.CAN-23-3846. Online ahead of print.

Abstract

The batch effect is a nonbiological variation that arises from technical differences between batches of data during data generation, for acquisition-related reasons such as collecting images at different sites or with different scanners. This phenomenon can affect the robustness and generalizability of computational pathology- or radiology-based cancer diagnostic models, especially in multi-center studies. To address this issue, we developed an open-source platform, Batch Effect Explorer (BEEx), designed to qualitatively and quantitatively determine whether batch effects exist among medical image datasets from different sites. BEEx incorporates a suite of tools that provide visualization and quantitative metrics based on intensity, gradient, and texture features, allowing users to determine, in an unsupervised manner, whether any image variables or combinations of variables can distinguish datasets from different sites. BEEx was designed to support various medical imaging techniques, including microscopy and radiology. Four use cases clearly demonstrated the ability of BEEx to identify batch effects and validated the effectiveness of rectification methods for batch effect reduction. Overall, BEEx is a scalable and versatile framework for reading, processing, and analyzing a wide range of medical images to facilitate the identification and mitigation of batch effects, which can enhance the reliability and validity of image-based studies.
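As a rough illustration of the feature-based check described in the abstract, here is a minimal sketch that computes one intensity, one gradient, and one texture feature per image and then tests, feature by feature, whether the values differ between two sites. It is not BEEx's implementation or API. The feature definitions (mean intensity, mean forward-difference gradient magnitude, and horizontal GLCM contrast on 8 gray levels), the use of a Kruskal-Wallis H test, and every function name are assumptions chosen for illustration. The two "sites" are hypothetical synthetic noise images, with a simulated +30 intensity offset standing in for a scanner difference.

```python
import math
import random
from statistics import mean

Image = list[list[float]]


def intensity_mean(img: Image) -> float:
    """Mean pixel intensity (a first-order intensity feature)."""
    return mean(v for row in img for v in row)


def gradient_mean(img: Image) -> float:
    """Mean gradient magnitude from forward differences in x and y."""
    mags = []
    for i in range(len(img) - 1):
        for j in range(len(img[0]) - 1):
            gx = img[i][j + 1] - img[i][j]
            gy = img[i + 1][j] - img[i][j]
            mags.append(math.hypot(gx, gy))
    return mean(mags)


def glcm_contrast(img: Image, levels: int = 8) -> float:
    """GLCM contrast for the horizontal offset (0, 1) after per-image quantization."""
    flat = [v for row in img for v in row]
    lo, span = min(flat), max(flat) - min(flat)

    def quantize(v: float) -> int:
        if span == 0:
            return 0
        return min(levels - 1, int((v - lo) / span * levels))

    # Contrast = sum_{a,b} p(a, b) * (a - b)^2, which equals the mean squared
    # gray-level difference over all horizontally adjacent pixel pairs.
    return mean(
        (quantize(row[j]) - quantize(row[j + 1])) ** 2
        for row in img
        for j in range(len(row) - 1)
    )


# Hypothetical feature set; a real tool would compute many more features.
FEATURES = {
    "intensity_mean": intensity_mean,
    "gradient_mean": gradient_mean,
    "glcm_contrast": glcm_contrast,
}


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    weighted = sum(r * r / len(vals) for r, vals in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2))


def site_separation(sites: dict[str, list[Image]]) -> dict[str, tuple[float, float]]:
    """Per-feature H statistic and asymptotic p-value; exactly two sites (df = 1)."""
    if len(sites) != 2:
        raise ValueError("this sketch computes p-values for exactly two sites")
    results = {}
    for name, feature in FEATURES.items():
        groups = [[feature(img) for img in imgs] for imgs in sites.values()]
        h = kruskal_wallis_h(groups)
        results[name] = (h, chi2_sf_df1(h))
    return results


def synthetic_site(rng: random.Random, offset: float, n_images: int = 10, size: int = 16) -> list[Image]:
    """Hypothetical data: uniform-noise images around a site-specific offset."""
    return [
        [[offset + rng.uniform(-10, 10) for _ in range(size)] for _ in range(size)]
        for _ in range(n_images)
    ]


rng = random.Random(0)
sites = {"site_A": synthetic_site(rng, 100.0), "site_B": synthetic_site(rng, 130.0)}
for feature_name, (h, p) in site_separation(sites).items():
    print(f"{feature_name:15s} H={h:6.2f} p={p:.3g}")
```

Because the gradient feature and the per-image-quantized GLCM contrast are both invariant to a global intensity offset, only the intensity feature should separate the synthetic sites in this toy setup. A realistic batch-effect analysis would use many more features, correct for multiple comparisons, and also examine combinations of features (for example through dimensionality reduction and visualization), none of which this sketch attempts.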