Coding and classifying GP data: the POLAR project

Christopher Pearce; Adam McLeod; Jon Patrick; Jason Ferrigi; Michael Bainbridge; Natalie Rinehart; Anna Fragkoudi

doi:10.1136/bmjhci-2019-100009

BMJ Health Care Inform. 2019 Nov;26(1):e100009. doi: 10.1136/bmjhci-2019-100009.

Authors

Christopher Pearce¹, Adam McLeod², Jon Patrick³, Jason Ferrigi², Michael Bainbridge⁴, Natalie Rinehart², Anna Fragkoudi²

Affiliations

¹ Outcome Health, East Burwood, Victoria, Australia. Email: drchrispearce@mac.com
² Outcome Health, East Burwood, Victoria, Australia.
³ Health Language Analytics, Eveleigh, New South Wales, Australia.
⁴ School of Health Information Sciences, University of Victoria, Victoria, British Columbia, Canada.

Abstract

Background: Data, particularly 'big' data, are increasingly being used for health research. Making optimal use of data from electronic medical records requires coded data, but not all systems produce coded data.

Objective: To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice records into SNOMED CT-AU codes. Such codification will make the diagnoses clinically useful for aggregation for population health and research purposes.

Method: The method used natural language processing to code the texts automatically, followed by manual correction of the codes and a subsequent natural language processing re-computation. These steps were repeated for four iterations until 95% of the records were coded. The coded data were then aggregated into classes considered useful for population health analytics.
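The iterative loop described in the Method can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: an exact-match lexicon lookup stands in for the natural language processing step, the `reviewer` callable stands in for the manual correction step, and all function names, sample texts and codes (`CODE-HT` and so on) are hypothetical placeholders rather than real SNOMED CT-AU identifiers.

```python
from collections import Counter


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def auto_code(texts, lexicon):
    """Assumed stand-in for the NLP step: exact lookup, None if unknown."""
    return {t: lexicon.get(normalise(t)) for t in texts}


def coverage(counts, coded):
    """Fraction of records (not distinct texts) that received a code."""
    total = sum(counts.values())
    hit = sum(n for t, n in counts.items() if coded[t] is not None)
    return hit / total


def iterate_coding(records, lexicon, reviewer, target=0.95, max_rounds=4, batch=1):
    """Auto-code, manually correct the most frequent uncoded texts, re-code."""
    counts = Counter(records)
    lexicon = dict(lexicon)  # work on a copy so the caller's lexicon is unchanged
    coded = auto_code(counts, lexicon)
    history = [coverage(counts, coded)]
    rounds = 0
    while history[-1] < target and rounds < max_rounds:
        # Manual step: review the most frequent uncoded texts first.
        uncoded = [t for t, _ in counts.most_common() if coded[t] is None]
        for t in uncoded[:batch]:
            code = reviewer(t)
            if code is not None:
                lexicon[normalise(t)] = code
        # Re-computation step: re-run the automatic coder with the corrections.
        coded = auto_code(counts, lexicon)
        history.append(coverage(counts, coded))
        rounds += 1
    return coded, history


# Hypothetical toy corpus of narrative diagnoses (20 records).
records = ["Hypertension"] * 10 + ["HTN"] * 5 + ["asthma"] * 3 + ["T2DM", "xyz??"]
seed_lexicon = {"hypertension": "CODE-HT", "asthma": "CODE-AS"}
manual_decisions = {"HTN": "CODE-HT", "T2DM": "CODE-DM"}

coded, history = iterate_coding(records, seed_lexicon, manual_decisions.get)
print(history)
```

Coverage here is weighted by record count, so correcting one frequent free-text variant moves the figure more than correcting many rare ones. That is one plausible reason a small number of iterations can reach a high coverage target, although the abstract does not state how the authors prioritised manual review.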

Results: Coding the data effectively covered 95% of the corpus. Problems with the use of SNOMED CT-AU (SCT) were identified, and protocols for consistent coding were developed. These protocols can be used to guide further development of SCT. The coded values will be immensely useful for the development of population health analytics for Australia, and the lessons learnt are applicable elsewhere.

Keywords: information management; information science.

MeSH terms

Australia
Big Data*
Electronic Health Records / organization & administration*
Electronic Health Records / standards
General Practice / organization & administration*
General Practice / standards
Humans
Natural Language Processing*
Systematized Nomenclature of Medicine*