BEEx Is an Open-Source Tool That Evaluates Batch Effects in Medical Images to Enable Multi-center Studies

Yuxin Wu; Xiongjun Xu; Yuan Cheng; Xiuming Zhang; Fanxi Liu; Zhenhui Li; Lei Hu; Anant Madabhushi; Peng Gao; Zaiyi Liu; Cheng Lu

doi:10.1158/0008-5472.CAN-23-3846

Cancer Res. 2024 Dec 11. doi: 10.1158/0008-5472.CAN-23-3846. Online ahead of print.

Authors

Yuxin Wu¹, Xiongjun Xu², Yuan Cheng¹, Xiuming Zhang³, Fanxi Liu⁴, Zhenhui Li⁵, Lei Hu¹, Anant Madabhushi⁶, Peng Gao⁷, Zaiyi Liu¹, Cheng Lu¹

Affiliations

¹ Guangdong Provincial People's Hospital, Guangzhou, Guangdong, China.
² Third Affiliated Hospital of Sun Yat-sen University, China.
³ The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.
⁴ National University of Singapore, Singapore.
⁵ Kunming Medical University, China.
⁶ Emory University, Atlanta, GA, United States.
⁷ Qilu Hospital of Shandong University, Jinan, Shandong, China.

PMID: 39661030

Abstract

A batch effect is nonbiological variation that arises from technical differences between batches of data during acquisition, such as collecting images at different sites or with different scanners. This variation can reduce the robustness and generalizability of computational pathology- or radiology-based cancer diagnostic models, especially in multi-center studies. To address this issue, we developed an open-source platform, Batch Effect Explorer (BEEx), that is designed to qualitatively and quantitatively determine whether batch effects exist among medical image datasets from different sites. BEEx incorporates a suite of tools that provide visualization and quantitative metrics based on intensity, gradient, and texture features, allowing users to determine, in an unsupervised manner, whether any image variable or combination of variables can distinguish datasets from different sites. BEEx was designed to support various medical imaging techniques, including microscopy and radiology. Four use cases demonstrated the ability of BEEx to identify batch effects and validated the effectiveness of rectification methods for batch effect reduction. Overall, BEEx is a scalable and versatile framework designed to read, process, and analyze a wide range of medical images to facilitate the identification and mitigation of batch effects, which can enhance the reliability and validity of image-based studies.
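
The idea of checking whether simple image variables separate datasets from different sites can be illustrated with a minimal Python sketch. This is not the BEEx API or its published metrics. The function names `image_features` and `standardized_difference`, the choice of three features, and the synthetic "site" images are all assumptions made for illustration, and texture features are omitted for brevity.

```python
import numpy as np

def image_features(img):
    """Hypothetical per-image features: mean intensity, intensity std,
    and mean gradient magnitude of a 2-D grayscale image."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # finite-difference gradients along rows and columns
    return np.array([img.mean(), img.std(), np.hypot(gx, gy).mean()])

def standardized_difference(feats_a, feats_b):
    """Per-feature absolute difference of batch means divided by the pooled
    standard deviation (an illustrative score, not a BEEx metric)."""
    feats_a = np.asarray(feats_a, dtype=float)
    feats_b = np.asarray(feats_b, dtype=float)
    pooled = np.sqrt((feats_a.var(axis=0, ddof=1) + feats_b.var(axis=0, ddof=1)) / 2.0)
    return np.abs(feats_a.mean(axis=0) - feats_b.mean(axis=0)) / (pooled + 1e-12)

# Synthetic stand-ins for two sites: same noise level, shifted mean intensity.
rng = np.random.default_rng(0)
site_a = [rng.normal(100.0, 10.0, size=(32, 32)) for _ in range(20)]
site_b = [rng.normal(120.0, 10.0, size=(32, 32)) for _ in range(20)]

feats_a = np.stack([image_features(im) for im in site_a])
feats_b = np.stack([image_features(im) for im in site_b])
scores = standardized_difference(feats_a, feats_b)

for name, score in zip(["mean intensity", "intensity std", "mean gradient"], scores):
    print(f"{name}: {score:.2f}")
```

In this toy setup the mean-intensity score is large because the two synthetic sites differ only by an intensity shift. A large score on real data would suggest that the corresponding feature distinguishes the sites and that a batch effect may be present.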