ENCoDE - a skin tone and clinical dataset from a prospective trial on acute care patients

Sicheng Hao; Joao Matos; Katelyn Dempsey; Mahmoud Alwakeel; Jared Houghtaling; Chuan Hong; Judy Gichoya; Warren Kibbe; Michael Pencina; Christopher E Cox; A Ian Wong

doi:10.1101/2024.08.07.24311623

medRxiv [Preprint]. 2024 Aug 8:2024.08.07.24311623. doi: 10.1101/2024.08.07.24311623.

Abstract

Background: Although skin tone is hypothesized to be the root cause of pulse oximetry disparities, skin tone and its use for improving medical therapies have yet to be studied extensively. Previous studies used self-reported race as a proxy for skin tone. However, this approach cannot account for skin tone variability within race groups and risks confounding by non-biological factors when modeling data. Therefore, to better evaluate health disparities associated with pulse oximetry, this study aimed to create a unique baseline dataset that includes skin tone and electronic health record (EHR) data.

Methods: Patients admitted to Duke University Hospital were eligible if they had at least one pulse oximetry value recorded within 5 minutes before an arterial blood gas (ABG) value. We collected skin tone data at 16 different body locations using multiple methods, including administered visual scales, colorimetric and spectrophotometric devices, and photography with mobile phone cameras. All patients' data were linked in Duke's Protected Analytics Computational Environment (PACE), converted into a common data model, and then de-identified before publication on PhysioNet.
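The eligibility rule in the Methods (a pulse oximetry value within 5 minutes before an ABG value) can be illustrated with a minimal sketch. This is not the study's code: the function names, the tuple layout, and the sample readings are hypothetical, and treating both ends of the 5-minute window as inclusive is an assumption the abstract does not specify.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # window length taken from the Methods text


def pair_spo2_with_abg(spo2_readings, abg_times, window=WINDOW):
    """Pair each ABG time with the latest SpO2 reading taken before it.

    spo2_readings: list of (timestamp, value) tuples (hypothetical layout)
    abg_times: list of datetime objects
    Returns a list of (abg_time, spo2_time, spo2_value) tuples.
    Assumption: a reading exactly at the ABG time, or exactly `window`
    before it, counts as a match.
    """
    readings = sorted(spo2_readings, key=lambda r: r[0])
    times = [t for t, _ in readings]
    pairs = []
    for abg_time in sorted(abg_times):
        # Index of the last SpO2 reading at or before this ABG time.
        i = bisect_right(times, abg_time) - 1
        if i >= 0 and abg_time - times[i] <= window:
            pairs.append((abg_time, times[i], readings[i][1]))
    return pairs


def is_eligible(spo2_readings, abg_times, window=WINDOW):
    """A patient qualifies if at least one SpO2/ABG pair exists."""
    return bool(pair_spo2_with_abg(spo2_readings, abg_times, window))


# Made-up example: one reading 12 minutes before the ABG, one 3 minutes before.
t0 = datetime(2024, 1, 1, 12, 0)
spo2 = [(t0 - timedelta(minutes=12), 97), (t0 - timedelta(minutes=3), 94)]
print(is_eligible(spo2, [t0]))
```

Only the reading taken 3 minutes before the ABG falls inside the window, so this made-up patient would count as eligible under the stated assumptions.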

Results: Skin tone data were collected from 128 patients. We assessed 167 features per skin location on each patient. We also collected more than 2000 mobile phone images, all captured in the same controlled environment. Skin tone data are linked with patients' EHR data, such as laboratory data, vital sign recordings, and demographic information.

Conclusions: Measuring different aspects of skin tone at each of the 16 body locations and linking them with patients' EHR data could assist in the development of more equitable AI models to combat disparities in healthcare associated with skin tone. A common data model format enables easy data federation with similar data from other sources, facilitating multicenter research on skin tone in healthcare.

Description: A prospectively collected, EHR-linked database of skin tone measurements in a common data model, with emphasis on pulse oximetry disparities.

Publication types

Preprint