Automated extraction of clinical traits of multiple sclerosis in electronic medical records

Mary F Davis; Subramaniam Sriram; William S Bush; Joshua C Denny; Jonathan L Haines

doi:10.1136/amiajnl-2013-001999

J Am Med Inform Assoc. 2013 Dec;20(e2):e334-40. doi: 10.1136/amiajnl-2013-001999. Epub 2013 Oct 22.

Authors

Mary F Davis¹, Subramaniam Sriram, William S Bush, Joshua C Denny, Jonathan L Haines

Affiliation

¹ Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Abstract

Objectives: The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time-consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course.

Materials and methods: We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers.
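To make the extraction step concrete, the following is a minimal sketch of how one trait, the EDSS score, might be pulled from free text with a regular expression. The abstract does not describe the actual patterns or tooling, so the pattern, the function name `extract_edss`, and the sample notes are all illustrative assumptions rather than the authors' method.

```python
import re

# Hypothetical pattern (not from the paper): the token "EDSS", an optional
# "score", an optional connector ("of", "is", "was", ":" or "="), then a value
# between 0 and 10 in half-point steps. The trailing lookahead rejects values
# such as 3.7 or 12 while still accepting a sentence-final "3.".
EDSS_PATTERN = re.compile(
    r"\bEDSS\b(?:\s+score)?\s*(?:of|is|was|:|=)?\s*"
    r"(10(?:\.0)?|\d(?:\.[05])?)(?!\.?\d)",
    re.IGNORECASE,
)


def extract_edss(note: str) -> list[float]:
    """Return every EDSS value mentioned in a clinical note, in order."""
    return [float(m.group(1)) for m in EDSS_PATTERN.finditer(note)]


if __name__ == "__main__":
    # Invented example notes, used only to exercise the pattern.
    notes = [
        "Patient seen in clinic. EDSS score of 3.5 today.",
        "EDSS: 6.0, timed walk unchanged.",
        "No disability score recorded at this visit.",
    ]
    for note in notes:
        print(extract_edss(note))
```

A rule of this kind is easy to audit against reviewer-validated records, which is presumably why keyword and pattern approaches suit a trait with a fixed numeric range; traits such as clinical subtype or origin of first symptom would need different rules.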

Results: We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%.
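For readers less familiar with the three reported metrics, the sketch below shows how precision, recall, and specificity are computed when algorithm output is compared with reviewer-validated labels. The names `Confusion`, `tally`, and the example labels are assumptions made for illustration; the values are invented and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # algorithm found the trait, reviewers agree
    fp: int = 0  # algorithm found the trait, reviewers disagree
    tn: int = 0  # algorithm found nothing, reviewers agree
    fn: int = 0  # algorithm missed a trait the reviewers recorded


def tally(predicted: list[bool], gold: list[bool]) -> Confusion:
    """Count agreement between algorithm output and the reviewed gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    c = Confusion()
    for p, g in zip(predicted, gold):
        if p and g:
            c.tp += 1
        elif p and not g:
            c.fp += 1
        elif not p and not g:
            c.tn += 1
        else:
            c.fn += 1
    return c


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return num / den if den else float("nan")


def precision(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    # Invented labels for five records: True means the trait was reported.
    predicted = [True, True, False, False, True]
    gold = [True, False, False, True, True]
    c = tally(predicted, gold)
    print(precision(c), recall(c), specificity(c))
```

Precision penalizes false extractions, recall penalizes missed ones, and specificity measures how often records without the trait are correctly left empty, so a trait can have high precision and specificity while its recall is lower.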

Discussion and conclusion: This collection of clinical data represents one of the largest databases of detailed clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability.

Keywords: Multiple sclerosis; electronic health records.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Adolescent
Adult
Aged
Aged, 80 and over
Algorithms*
Child
Data Mining*
Disease Progression
Electronic Health Records*
Female
Humans
Male
Middle Aged
Multiple Sclerosis / diagnosis*
Natural Language Processing*
