Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2

Amber Stubbs; Christopher Kotfila; Hua Xu; Özlem Uzuner

doi:10.1016/j.jbi.2015.07.001

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S67-S77. doi: 10.1016/j.jbi.2015.07.001. Epub 2015 Jul 22.

Authors

Amber Stubbs¹, Christopher Kotfila², Hua Xu³, Özlem Uzuner²

Affiliations

¹ School of Library and Information Science, Simmons College, Boston, MA, USA. Electronic address: stubbs@simmons.edu.
² Department of Information Studies, State University of New York at Albany, Albany, NY, USA.
³ Center for Computational Biomedicine, University of Texas Health Science Center at Houston, Houston, TX, USA.

Abstract

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules, and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.

Keywords: CAD; Clinical narratives; Diabetes; Natural language processing.

Publication types

Comparative Study
Evaluation Study
Review

MeSH terms

Aged
Boston / epidemiology
Cohort Studies
Comorbidity
Computer Security
Confidentiality
Coronary Artery Disease / diagnosis
Coronary Artery Disease / epidemiology*
Data Mining / methods*
Diabetes Complications / diagnosis
Diabetes Complications / epidemiology*
Electronic Health Records / organization & administration*
Female
Humans
Incidence
Longitudinal Studies
Male
Middle Aged
Narration*
Natural Language Processing*
Pattern Recognition, Automated / methods
Risk Assessment / methods
Vocabulary, Controlled
