Crowdsourcing as a novel technique for retinal fundus photography classification: analysis of images in the EPIC Norfolk cohort on behalf of the UK Biobank Eye and Vision Consortium

Danny Mitry; Tunde Peto; Shabina Hayat; James E Morgan; Kay-Tee Khaw; Paul J Foster

doi:10.1371/journal.pone.0071154


PLoS One. 2013 Aug 21;8(8):e71154. doi: 10.1371/journal.pone.0071154. eCollection 2013.

Affiliation

Danny Mitry: National Institute for Health Research Biomedical Research Centre at Moorfields Eye Hospital & University College London Institute of Ophthalmology, London, United Kingdom.

Abstract

Aim: Crowdsourcing is the process of outsourcing numerous tasks to many untrained individuals. Our aim was to assess the performance and repeatability of crowdsourcing for the classification of retinal fundus photography.

Methods: One hundred retinal fundus photographs with predetermined disease criteria were selected by experts from a large cohort study. After reading brief instructions and an example classification, knowledge workers (KWs) from a crowdsourcing platform were asked to classify each image as normal or abnormal, with grades of severity. Each image was classified 20 times by different KWs. Four study designs were examined to assess the effect of varying incentive and KW experience on classification accuracy. All study designs were conducted twice to examine repeatability. Performance was assessed by comparing sensitivity, specificity and the area under the receiver operating characteristic curve (AUC).
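The performance measures named in the Methods can be illustrated with a short sketch. This is not the authors' analysis code: it assumes each KW classification is stored as a hypothetical ordinal grade (0 = normal, 1 = mild, 2 = moderate, 3 = severe), treats every individual classification as one observation, and uses invented data. The AUC is computed with the Mann-Whitney estimate (ties counted as one half), which equals the area under the empirical ROC curve for an ordinal grade.

```python
"""Sketch: sensitivity, specificity and AUC for crowd-graded images.

Assumptions (not from the study): grades use a hypothetical ordinal
scale 0 = normal, 1 = mild, 2 = moderate, 3 = severe; the reference
standard is binary (True = abnormal). All data below are invented.
"""
from typing import Sequence, Tuple


def sens_spec(grades: Sequence[int], truth: Sequence[bool],
              threshold: int = 1) -> Tuple[float, float]:
    """Call a classification 'abnormal' when grade >= threshold."""
    tp = sum(1 for g, t in zip(grades, truth) if t and g >= threshold)
    fn = sum(1 for g, t in zip(grades, truth) if t and g < threshold)
    tn = sum(1 for g, t in zip(grades, truth) if not t and g < threshold)
    fp = sum(1 for g, t in zip(grades, truth) if not t and g >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(grades: Sequence[int], truth: Sequence[bool]) -> float:
    """Mann-Whitney estimate of P(abnormal grade > normal grade); ties count 1/2."""
    pos = [g for g, t in zip(grades, truth) if t]
    neg = [g for g, t in zip(grades, truth) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pooled KW classifications (one entry per classification).
grades = [0, 0, 1, 0, 2, 1, 3, 0, 1, 2]
truth = [False, False, False, False, True, True, True, True, True, False]
print(sens_spec(grades, truth))   # -> (0.8, 0.6)
print(round(auc(grades, truth), 3))  # -> 0.72
```

Raising `threshold` to 3 would correspond to a normal versus severely abnormal comparison under the same assumed scale.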

Results: Without restriction on eligible participants, 2,000 classifications of the 100 images were received in under 24 hours at minimal cost. In trial 1, all study designs had an AUC (95% CI) of 0.701 (0.680-0.721) or greater for normal/abnormal classification. In trial 1, the highest AUC (95% CI) for normal/abnormal classification was 0.757 (0.738-0.776), for KWs with moderate experience. Comparable results were observed in trial 2. In trial 1, between 64% and 86% of abnormal images were correctly classified by more than half of the KWs; in trial 2, this ranged from 74% to 97%. Sensitivity was ≥ 96% for normal versus severely abnormal images across all trials, and sensitivity for normal versus mildly abnormal images varied between 61% and 79% across trials.
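The "correctly classified by more than half of the KWs" criterion in the Results amounts to a strict majority vote per image. The sketch below assumes a hypothetical data layout (`votes` maps each image to its binary KW calls, `is_abnormal` holds the reference label) and invented example data; it is not the study's code. An image with exactly half of its KWs calling it abnormal does not count as correct.

```python
"""Sketch: share of abnormal images correctly called by > half of the KWs.

Assumptions (hypothetical): votes[image_id] holds the binary calls
(True = 'abnormal') made by different KWs for that image, and
is_abnormal[image_id] is the reference-standard label.
"""
from typing import Dict, List


def majority_correct_rate(votes: Dict[str, List[bool]],
                          is_abnormal: Dict[str, bool]) -> float:
    abnormal_ids = [i for i, a in is_abnormal.items() if a]
    hits = 0
    for i in abnormal_ids:
        calls = votes[i]
        # Strict majority: more than half of the KWs must call it abnormal.
        if sum(calls) * 2 > len(calls):
            hits += 1
    return hits / len(abnormal_ids)


# Invented example with 20 KW calls per image.
votes = {
    "img1": [True] * 15 + [False] * 5,   # 15/20: majority
    "img2": [True] * 10 + [False] * 10,  # 10/20: exactly half, not a majority
    "img3": [True] * 11 + [False] * 9,   # 11/20: majority
    "img4": [False] * 20,                # normal image, ignored
}
is_abnormal = {"img1": True, "img2": True, "img3": True, "img4": False}
print(round(majority_correct_rate(votes, is_abnormal), 4))  # -> 0.6667
```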

Conclusions: With minimal training, crowdsourcing represents an accurate, rapid and cost-effective method of retinal image analysis which demonstrates good repeatability. Larger studies with more comprehensive participant training are needed to explore the utility of this compelling technique in large scale medical image analysis.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Aged, 80 and over
Area Under Curve
Biological Specimen Banks
Cost-Benefit Analysis
Crowdsourcing*
Diabetic Retinopathy / diagnosis
Diagnostic Techniques, Ophthalmological*
Fundus Oculi*
Humans
Middle Aged
Observer Variation
Photography / methods*
ROC Curve
Reproducibility of Results
Retinal Diseases / diagnosis*
Severity of Illness Index
Surveys and Questionnaires
United Kingdom
