Evaluating the Impact of Retinal Vessel Segmentation Metrics on Retest Reliability in a Clinical Setting: A Comparative Analysis Using AutoMorph

Samuel D Giesser; Ferhat Turgut; Amr Saad; Jay R Zoellin; Chiara Sommer; Yukun Zhou; Siegfried K Wagner; Pearse A Keane; Matthias Becker; Delia Cabrera DeBuc; Gábor Márk Somfai

doi:10.1167/iovs.65.13.24

Invest Ophthalmol Vis Sci. 2024 Nov 4;65(13):24. doi: 10.1167/iovs.65.13.24.

Authors

Samuel D Giesser^{1,2}, Ferhat Turgut^{1,2,3,4}, Amr Saad^{1,2}, Jay R Zoellin^{1,2}, Chiara Sommer^{1,2}, Yukun Zhou^{5,6,7}, Siegfried K Wagner^{5,6}, Pearse A Keane^{5,6}, Matthias Becker^{1,2,8}, Delia Cabrera DeBuc^{9,10}, Gábor Márk Somfai^{1,2,4}

Affiliations

¹ Department of Ophthalmology, Stadtspital Zürich, Zurich, Switzerland.
² Spross Research Institute, Zurich, Switzerland.
³ Gutblick Research, Pfäffikon, Switzerland.
⁴ Department of Ophthalmology, Semmelweis University, Budapest, Hungary.
⁵ NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom.
⁶ Institute of Ophthalmology, University College London, London, United Kingdom.
⁷ Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom.
⁸ Department of Ophthalmology, University of Heidelberg, Heidelberg, Germany.
⁹ Bascom Palmer Eye Institute, Miller School of Medicine, University of Miami, Miami, Florida, United States.
¹⁰ iScreen 2 Prevent LLC, Miami, FL, United States.

Abstract

Purpose: Current research on artificial intelligence-based fundus photography biomarkers has demonstrated inconsistent results. Consequently, we aimed to evaluate and predict the test-retest reliability of retinal parameters extracted from fundus photography.

Methods: Two groups of patients were recruited for the study: an intervisit group (n = 28) to assess retest reliability over a period of 1 to 14 days and an intravisit group (n = 44) to evaluate retest reliability within a single session. Using AutoMorph, we generated test and retest vessel segmentation maps; measured segmentation map agreement via accuracy, sensitivity, F1 score, and Jaccard index; and calculated 76 metrics from each fundus image. The retest reliability of each metric was analyzed in terms of the Spearman correlation coefficient, intraclass correlation coefficient (ICC), and relative percentage change. To predict the per-image median percentage difference on retest from image-quality metrics, we developed a linear model whose two input variables, contrast-to-noise ratio and fractal dimension, were chosen by a P-value-based backward selection process. This model was trained on the intravisit dataset and validated using the intervisit dataset.
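As a rough illustration of the four segmentation map agreement measures named above, the following sketch computes them from the pixel-wise confusion counts of two binary vessel maps. It is not the study's code: the function name `segmentation_agreement`, the choice of the test map as the reference, and the assumption that both maps are already aligned pixel for pixel are all hypothetical, and the tiny arrays are made-up values.

```python
import numpy as np

def segmentation_agreement(test_map, retest_map):
    """Pixel-wise agreement between two binary vessel maps.

    Assumption: the test map acts as the reference and both maps
    are already aligned, so pixels can be compared one to one.
    """
    test = np.asarray(test_map, dtype=bool)
    retest = np.asarray(retest_map, dtype=bool)
    if test.shape != retest.shape:
        raise ValueError("maps must have the same shape")

    # Confusion counts with the test map as reference.
    tp = int(np.sum(test & retest))    # vessel in both maps
    tn = int(np.sum(~test & ~retest))  # background in both maps
    fp = int(np.sum(~test & retest))   # vessel only in retest
    fn = int(np.sum(test & ~retest))   # vessel only in test

    def safe_div(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "jaccard": safe_div(tp, tp + fp + fn),
    }

# Made-up 2 x 4 maps (1 = vessel pixel), for illustration only.
test_map = np.array([[1, 1, 0, 0],
                     [0, 1, 0, 0]])
retest_map = np.array([[1, 0, 0, 0],
                       [0, 1, 1, 0]])
scores = segmentation_agreement(test_map, retest_map)
# accuracy 0.75, sensitivity 2/3, F1 2/3, Jaccard 0.5
```

Because background pixels usually dominate a fundus image, accuracy can stay high even when the vessel pixels themselves disagree noticeably, which is one reason to read it alongside the F1 score and Jaccard index.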

Results: In the intervisit group, retest reliability ranged from 0.34 to 0.99 for the Spearman correlation coefficient and from 0.31 to 0.99 for the ICC, with mean absolute percentage differences of 0.96% to 223.67%. Similarly, in the intravisit group, retest reliability ranged from 0.55 to 0.96 for the Spearman correlation coefficient and from 0.40 to 0.97 for the ICC, with mean percentage differences of 0.49% to 371.23%. Segmentation map accuracy between test and retest never dropped below 97%; the mean F1 scores were 0.85 for the intravisit dataset and 0.82 for the intervisit dataset. In both datasets, disc width achieved the highest Spearman correlation coefficient. The lowest Spearman correlation coefficients were observed for tortuosity density in the intervisit group and artery tortuosity density in the intravisit group. The intravisit group exhibited better retest reliability than the intervisit group (P < 0.001). Our linear model, with the two independent variables contrast-to-noise ratio and fractal dimension, predicted the median retest reliability per image on its validation dataset, the intervisit group, with an R² of 0.53 (P < 0.001).
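The three reliability measures reported above can be sketched for a single metric as follows. This is a hedged illustration, not the authors' analysis: the abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) is an assumption, as is expressing the percentage difference relative to the test value. All function names and the four value pairs are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1 to n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(test, retest):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(test), average_ranks(retest))[0, 1])

def icc_2_1(test, retest):
    """ICC(2,1) from a two-way ANOVA decomposition (assumed ICC form)."""
    data = np.column_stack([test, retest]).astype(float)
    n, k = data.shape  # n images, k = 2 measurements per image
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((data - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return float((ms_rows - ms_err)
                 / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))

def mean_abs_pct_diff(test, retest):
    """Mean absolute difference as a percentage of the test value (assumed)."""
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    return float(np.mean(np.abs(retest - test) / np.abs(test)) * 100)

# Made-up test and retest values of one metric across four images.
test_vals = [1.0, 2.0, 4.0, 5.0]
retest_vals = [1.1, 1.8, 4.4, 5.0]

rho = spearman(test_vals, retest_vals)
icc = icc_2_1(test_vals, retest_vals)
mapd = mean_abs_pct_diff(test_vals, retest_vals)
```

In this toy case the ordering of the images is preserved, so the Spearman coefficient is 1 even though the values shift by 7.5% on average. The measures capture different aspects of reliability, and a percentage difference can become very large when a metric's test value is close to zero.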

Conclusions: Our findings highlight considerable volatility in the reliability of some retinal biomarkers. Improving retest reliability could allow disease progression modeling in smaller datasets or an individualized treatment approach. Image quality is moderately predictive of retest reliability, and further work is warranted to better understand the reasons behind our observations and thus ensure consistent retest results.
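The train-on-one-dataset, validate-on-the-other design behind the image-quality model can be sketched as an ordinary least squares fit scored by R² on held-out data. Everything here is an assumption for illustration: the data are synthetic, the sample sizes and generating coefficients are arbitrary, the helper names are hypothetical, and the P-value-based backward selection step is not reproduced.

```python
import numpy as np

def fit_linear(features, target):
    """Ordinary least squares with an intercept term."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef  # [intercept, slope for column 0, slope for column 1]

def predict(coef, features):
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

def r_squared(observed, predicted):
    """Coefficient of determination on the given data."""
    observed = np.asarray(observed, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

def synthetic_images(rng, n):
    """Synthetic stand-ins: [contrast-to-noise ratio, fractal dimension]
    and a noisy median percentage difference. Not real measurements."""
    features = np.column_stack([rng.uniform(1.0, 5.0, n),
                                rng.uniform(1.3, 1.6, n)])
    # Arbitrary made-up relationship plus noise.
    target = (40.0 - 3.0 * features[:, 0] - 15.0 * features[:, 1]
              + rng.normal(0.0, 1.0, n))
    return features, target

rng = np.random.default_rng(0)
train_x, train_y = synthetic_images(rng, 40)  # plays the training role
valid_x, valid_y = synthetic_images(rng, 30)  # plays the validation role

coef = fit_linear(train_x, train_y)
r2_valid = r_squared(valid_y, predict(coef, valid_x))
```

Scoring R² on data the model never saw during fitting is what gives the reported validation figure its meaning; the same fit evaluated on its own training data would tend to look more optimistic.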

Publication types

Comparative Study

MeSH terms

Aged
Artificial Intelligence
Female
Fundus Oculi
Humans
Image Processing, Computer-Assisted / methods
Male
Middle Aged
Photography* / methods
Reproducibility of Results
Retinal Vessels* / diagnostic imaging