Inter-observer variability of expert-derived morphologic risk predictors in aortic dissection

Martin J Willemink; Domenico Mastrodicasa; Mohammad H Madani; Marina Codari; Leonid L Chepelev; Gabriel Mistelbauer; Kate Hanneman; Maral Ouzounian; Daniel Ocazionez; Rana O Afifi; Joan M Lacomis; Luigi Lovato; Davide Pacini; Gianluca Folesani; Ricarda Hinzpeter; Hatem Alkadhi; Arthur E Stillman; Anna M Sailer; Valery L Turner; Virginia Hinostroza; Kathrin Bäumler; Anne S Chin; Nicholas S Burris; D Craig Miller; Michael P Fischbein; Dominik Fleischmann

doi:10.1007/s00330-022-09056-z

Eur Radiol. 2023 Feb;33(2):1102-1111. doi: 10.1007/s00330-022-09056-z. Epub 2022 Aug 27.

Authors

Martin J Willemink¹, Domenico Mastrodicasa¹,², Mohammad H Madani¹, Marina Codari¹, Leonid L Chepelev¹, Gabriel Mistelbauer¹, Kate Hanneman³, Maral Ouzounian⁴, Daniel Ocazionez⁵, Rana O Afifi⁶, Joan M Lacomis⁷, Luigi Lovato⁸, Davide Pacini⁹, Gianluca Folesani⁹, Ricarda Hinzpeter¹⁰, Hatem Alkadhi¹⁰, Arthur E Stillman¹¹, Anna M Sailer¹, Valery L Turner¹, Virginia Hinostroza¹, Kathrin Bäumler¹, Anne S Chin¹², Nicholas S Burris¹³, D Craig Miller¹⁴, Michael P Fischbein¹⁴, Dominik Fleischmann¹⁵,¹⁶

Affiliations

¹ Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA.
² Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA.
³ Department of Medical Imaging, Peter Munk Cardiac Centre, Toronto General Hospital, University of Toronto, Toronto, Canada.
⁴ Department of Surgery, University of Toronto, Toronto, Canada.
⁵ Department of Radiology, McGovern Medical School at The University of Texas Health Science Center at Houston (UTHealth), Houston, TX, USA.
⁶ Department of Cardiothoracic and Vascular Surgery, McGovern Medical School at The University of Texas Health Science Center at Houston (UTHealth), Houston, TX, USA.
⁷ Department of Radiology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
⁸ Department of Radiology, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Policlinico di S. Orsola, Bologna, Italy.
⁹ Department of Cardiac Surgery, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Policlinico di S. Orsola, Bologna, Italy.
¹⁰ Institute of Diagnostic and Interventional Radiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
¹¹ Department of Radiology, Emory University, Atlanta, GA, USA.
¹² Département de Radiologie, Centre Hospitalier de l'Université de Montréal, Montreal, Canada.
¹³ Department of Radiology, University of Michigan, Ann Arbor, MI, USA.
¹⁴ Department of Cardiothoracic Surgery, Stanford University School of Medicine, Stanford, CA, USA.
¹⁵ Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA. d.fleischmann@stanford.edu.
¹⁶ Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA. d.fleischmann@stanford.edu.

Abstract

Objectives: Establishing the reproducibility of expert-derived measurements on CTA exams of aortic dissection is clinically important and paramount for ground-truth determination for machine learning.

Methods: Four independent observers retrospectively evaluated CTA exams of 72 patients with uncomplicated Stanford type B aortic dissection and assessed the reproducibility of a recently proposed combination of four morphologic risk predictors (maximum aortic diameter, false lumen circumferential angle, false lumen outflow, and intercostal arteries). For the first inter-observer variability assessment, 47 CTA scans from one aortic center were evaluated by expert-observer 1 in an unconstrained clinical assessment without a standardized workflow and compared to a composite of three expert-observers (observers 2-4) using a standardized workflow. A second inter-observer variability assessment on 30 out of the 47 CTA scans compared observers 3 and 4 with a constrained, standardized workflow. A third inter-observer variability assessment was done after specialized training and tested between observers 3 and 4 in an external population of 25 CTA scans. Inter-observer agreement was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots.
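The two agreement statistics named above can be sketched in a few lines. This is only an illustrative sketch, not the authors' analysis: the abstract does not state which ICC form was used, so the code assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), and the function names `icc_2_1` and `bland_altman` as well as the 1.96 multiplier for the limits of agreement are assumptions made for the example.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1), assumed form: two-way random effects, absolute agreement,
    single measurement. `ratings` has one row per scan and one column per
    observer."""
    n = len(ratings)       # number of scans (subjects)
    k = len(ratings[0])    # number of observers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-scan mean square
    msc = ss_cols / (k - 1)               # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements,
    using bias +/- 1.96 * SD of the differences as limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical diameters in mm for two observers; not study data.
    obs_a = [38.0, 41.5, 44.0, 36.5, 50.0]
    obs_b = [37.0, 42.0, 45.5, 36.0, 48.5]
    print(icc_2_1([list(pair) for pair in zip(obs_a, obs_b)]))
    print(bland_altman(obs_a, obs_b))
```

Under this form, a systematic offset between observers lowers the ICC (through the between-observer term) and also shows up as a non-zero Bland-Altman bias, which is why the two statistics are often reported together.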

Results: Pre-training ICCs of the four morphologic features ranged from 0.04 (-0.05 to 0.13) to 0.68 (0.49-0.81) between observer 1 and observers 2-4 and from 0.50 (0.32-0.69) to 0.89 (0.78-0.95) between observers 3 and 4. After training, ICCs improved, ranging from 0.69 (0.52-0.87) to 0.97 (0.94-0.99), and Bland-Altman analysis showed decreased bias and narrower limits of agreement.

Conclusions: Manual morphologic feature measurements on CTA images can be optimized, resulting in improved inter-observer reliability. This is essential for robust ground-truth determination for machine learning models.

Key points:
• Manual measurements of aortic CTA imaging features made in a routine clinical fashion showed poor inter-observer reproducibility.
• A standardized workflow with standardized training resulted in substantial improvements, with excellent inter-observer reproducibility.
• Robust ground-truth labels obtained manually with excellent inter-observer reproducibility are key to developing reliable machine learning models.

Keywords: Aortic dissection; Computed tomography angiography; Variability, inter-observer.

MeSH terms

Aorta
Aortic Dissection* / diagnostic imaging
Humans
Observer Variation
Reproducibility of Results
Retrospective Studies
