Accessing primary care Big Data: the development of a software algorithm to explore the rich content of consultation records

J MacRae; B Darlow; L McBain; O Jones; M Stubbe; N Turner; A Dowell

doi:10.1136/bmjopen-2015-008160

BMJ Open. 2015 Aug 21;5(8):e008160. doi: 10.1136/bmjopen-2015-008160.

Authors

J MacRae¹, B Darlow², L McBain², O Jones³, M Stubbe², N Turner⁴, A Dowell²

Affiliations

¹ Patients First, Wellington, New Zealand.
² Department of Primary Health Care and General Practice, University of Otago, Wellington, New Zealand.
³ Compass Health Wellington Trust, Wellington, New Zealand.
⁴ Department of General Practice and Primary Care, University of Auckland, Auckland, New Zealand.

Abstract

Objective: To develop a natural language processing software inference algorithm to classify the content of primary care consultations using electronic health record Big Data and subsequently test the algorithm's ability to estimate the prevalence and burden of childhood respiratory illness in primary care.

Design: Algorithm development and validation study. To classify consultations, the algorithm is designed to interrogate clinical narrative entered as free text, diagnostic (Read) codes created and medications prescribed on the day of the consultation.
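As an illustration only, the idea of combining the three evidence sources named above (free-text narrative, Read codes and same-day prescriptions) can be sketched as a simple rule-based check. This is not the authors' algorithm: the function name, keyword list, Read code prefix and medication list are all hypothetical placeholders, and the published algorithm is a trained inference method rather than a fixed rule set.

```python
import re

# All three lists below are hypothetical examples, not taken from the paper.
RESPIRATORY_TERMS = re.compile(
    r"\b(cough|wheeze|asthma|bronchiolitis|sore throat)\b", re.IGNORECASE
)
# Assumption: Read codes beginning with "H" denote respiratory system diseases.
RESPIRATORY_CODE_PREFIXES = ("H",)
RESPIRATORY_MEDICATIONS = {"salbutamol", "fluticasone"}


def classify_consultation(notes, read_codes, medications):
    """Return True if any evidence source suggests a respiratory consultation."""
    text_hit = bool(RESPIRATORY_TERMS.search(notes))
    code_hit = any(code.startswith(RESPIRATORY_CODE_PREFIXES) for code in read_codes)
    med_hit = any(med.lower() in RESPIRATORY_MEDICATIONS for med in medications)
    return text_hit or code_hit or med_hit


# Example call with made-up consultation data.
print(classify_consultation("3 day hx of cough and fever", [], []))
```

A real classifier would also need to handle negation ("no cough"), misspellings and abbreviations in the narrative, which a keyword match like this one does not.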

Setting: Thirty-six consenting primary care practices from a mixed urban and semirural region of New Zealand. Three independent sets of 1200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between 1 January 2008 and 31 December 2013 for children under 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory, and subclassified according to respiratory diagnostic categories to create three 'gold standard' sets of classified records. These three gold standard record sets were used to train, test and validate the algorithm.
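One way to draw three record sets like those described above is a single random sample split into equal parts. The sketch below assumes the sets are non-overlapping and that records are identified by integer IDs; neither detail is stated in the abstract, and the function name and seed are hypothetical.

```python
import random


def draw_gold_standard_sets(record_ids, set_size=1200, n_sets=3, seed=0):
    """Draw n_sets non-overlapping random samples of set_size record IDs."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Sampling without replacement guarantees no record appears twice.
    sample = rng.sample(record_ids, set_size * n_sets)
    return [sample[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


# Integer IDs standing in for the 754,242 consultations in the data set.
train_ids, test_ids, validation_ids = draw_gold_standard_sets(range(754242))
print(len(train_ids), len(test_ids), len(validation_ids))
```

Keeping the validation set separate from the records used for training and testing is what allows the reported accuracy figures to reflect performance on unseen consultations.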

Outcome measures: Sensitivity, specificity, positive predictive value and F-measure were calculated to illustrate the algorithm's ability to replicate judgements of expert clinicians within the 1200 record gold standard validation set.
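The outcome measures above follow from the four cells of a confusion matrix that compares algorithm output with the clinicians' classification. The sketch below uses the standard definitions and assumes the F-measure is the balanced F1 score (the harmonic mean of positive predictive value and sensitivity), which the abstract does not specify. The counts in the example are invented and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard accuracy measures from confusion-matrix counts.

    tp/fp: records the algorithm labelled respiratory, correctly/incorrectly.
    fn/tn: records the algorithm labelled non-respiratory, incorrectly/correctly.
    """
    sensitivity = tp / (tp + fn)  # share of true respiratory records found
    specificity = tn / (tn + fp)  # share of non-respiratory records excluded
    ppv = tp / (tp + fp)          # share of positive labels that are correct
    # Assumption: F-measure means F1, the harmonic mean of PPV and sensitivity.
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```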

Results: The algorithm was able to identify respiratory consultations in the 1200 record validation set with a sensitivity of 0.72 (95% CI 0.67 to 0.78) and a specificity of 0.95 (95% CI 0.93 to 0.98). The positive predictive value of algorithm respiratory classification was 0.93 (95% CI 0.89 to 0.97). The positive predictive value of the algorithm classifying consultations as being related to specific respiratory diagnostic categories ranged from 0.68 (95% CI 0.40 to 1.00; other respiratory conditions) to 0.91 (95% CI 0.79 to 1.00; throat infections).

Conclusions: A software inference algorithm that uses primary care Big Data can accurately classify the content of clinical consultations. This algorithm will enable accurate estimation of the prevalence of childhood respiratory illness in primary care and resultant service utilisation. The methodology can also be applied to other areas of clinical care.

Keywords: PRIMARY CARE.

Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Adolescent
Algorithms*
Child
Child, Preschool
Electronic Health Records / standards*
Female
Humans
Infant
Infant, Newborn
Male
Natural Language Processing
New Zealand / epidemiology
Outcome Assessment, Health Care
Primary Health Care / statistics & numerical data*
Referral and Consultation / classification*
Referral and Consultation / standards
Respiratory Tract Diseases / epidemiology*
Sensitivity and Specificity
Software*