An Experimental Comparison of Quality Models for Health Data De-Identification

Stud Health Technol Inform. 2017:245:704-708.

Abstract

When individual-level health data are shared in biomedical research, the privacy of patients must be protected. This is typically achieved with data de-identification methods, which transform data so that formal privacy requirements are met. In the process, it is important to minimize the loss of information in order to maintain data quality. Although several models have been proposed for measuring this loss, it remains unclear which model is best suited for which application. We have therefore performed an extensive experimental comparison. We first implemented several common quality models in the ARX de-identification tool for biomedical data. We then used each model to de-identify a patient discharge dataset covering almost 4 million cases, and we analyzed the outputs to measure the impact of the different quality models on real-world applications. Our results show that different models are best suited for specific applications, but that one model (Non-Uniform Entropy) is particularly well suited for general-purpose use.
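
To make the idea of a quality model concrete, the following is a minimal sketch of how a Non-Uniform Entropy style information loss could be computed for a single attribute. It follows the commonly cited definition, in which each cell contributes the negative logarithm of the probability of its original value given its generalized value. It is not taken from the paper or from the ARX code base: the function name `non_uniform_entropy`, the `generalize` callback, and the toy age column are all hypothetical, and normalization and multi-attribute aggregation are left out.

```python
import math
from collections import Counter


def non_uniform_entropy(original_column, generalize):
    """Sum of -log2 Pr(original value | generalized value) over all cells.

    original_column: list of raw attribute values (hypothetical toy input).
    generalize: function mapping a raw value to its generalized value.
    """
    # How often each raw value occurs in the column.
    orig_counts = Counter(original_column)
    # How often each generalized value occurs after transformation.
    gen_counts = Counter(generalize(v) for v in original_column)

    loss = 0.0
    for v in original_column:
        # Probability of recovering the raw value from its generalized value,
        # estimated from the empirical distribution of the column.
        p = orig_counts[v] / gen_counts[generalize(v)]
        loss += -math.log2(p)
    return loss


# Toy example: generalize ages to decades versus leaving them unchanged.
ages = [23, 27, 27, 45]


def to_decade(age):
    low = age // 10 * 10
    return f"{low}-{low + 9}"


loss_decade = non_uniform_entropy(ages, to_decade)
loss_identity = non_uniform_entropy(ages, lambda a: a)
```

In this sketch, leaving the data unchanged yields a loss of zero, and generalization is penalized less when the original values within a generalized group are already skewed toward a single value. Unlike models that only count distinct values, this measure depends on the distribution of the data.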

Keywords: Data anonymization; Personally identifiable information; Privacy.

MeSH terms

  • Biomedical Research*
  • Confidentiality
  • Data Accuracy
  • Data Anonymization*
  • Humans
  • Privacy