Interrater Reliability of Expert Electroencephalographers Identifying Seizures and Rhythmic and Periodic Patterns in EEGs

Jin Jing; Wendong Ge; Aaron F Struck; Marta Bento Fernandes; Shenda Hong; Sungtae An; Safoora Fatima; Aline Herlopian; Ioannis Karakis; Jonathan J Halford; Marcus C Ng; Emily L Johnson; Brian L Appavu; Rani A Sarkis; Gamaleldin Osman; Peter W Kaplan; Monica B Dhakar; Lakshman Arcot Jayagopal; Zubeda Sheikh; Olga Taraschenko; Sarah Schmitt; Hiba A Haider; Jennifer A Kim; Christa B Swisher; Nicolas Gaspard; Mackenzie C Cervenka; Andres A Rodriguez Ruiz; Jong Woo Lee; Mohammad Tabaeizadeh; Emily J Gilmore; Kristy Nordstrom; Ji Yeoun Yoo; Manisha G Holmes; Susan T Herman; Jennifer A Williams; Jay Pathmanathan; Fábio A Nascimento; Ziwei Fan; Samaneh Nasiri; Mouhsin M Shafi; Sydney S Cash; Daniel B Hoch; Andrew J Cole; Eric S Rosenthal; Sahar F Zafar; Jimeng Sun; M Brandon Westover

doi:10.1212/WNL.0000000000201670

Neurology. 2023 Apr 25;100(17):e1737-e1749. doi: 10.1212/WNL.0000000000201670. Epub 2022 Dec 2.

Authors

Jin Jing¹, Wendong Ge¹, Aaron F Struck¹, Marta Bento Fernandes¹, Shenda Hong¹, Sungtae An¹, Safoora Fatima¹, Aline Herlopian¹, Ioannis Karakis¹, Jonathan J Halford¹, Marcus C Ng¹, Emily L Johnson¹, Brian L Appavu¹, Rani A Sarkis¹, Gamaleldin Osman¹, Peter W Kaplan¹, Monica B Dhakar¹, Lakshman Arcot Jayagopal¹, Zubeda Sheikh¹, Olga Taraschenko¹, Sarah Schmitt¹, Hiba A Haider¹, Jennifer A Kim¹, Christa B Swisher¹, Nicolas Gaspard¹, Mackenzie C Cervenka¹, Andres A Rodriguez Ruiz¹, Jong Woo Lee¹, Mohammad Tabaeizadeh¹, Emily J Gilmore¹, Kristy Nordstrom¹, Ji Yeoun Yoo¹, Manisha G Holmes¹, Susan T Herman¹, Jennifer A Williams¹, Jay Pathmanathan¹, Fábio A Nascimento¹, Ziwei Fan¹, Samaneh Nasiri¹, Mouhsin M Shafi¹, Sydney S Cash¹, Daniel B Hoch¹, Andrew J Cole¹, Eric S Rosenthal¹, Sahar F Zafar¹, Jimeng Sun¹, M Brandon Westover²

Affiliations

¹ From the Massachusetts General Hospital/Harvard Medical School Department of Neurology (J.J., W.G., M.B.F., S.S.C., A.J.C., D.B.H., E.S.R., S.F.Z., M.B.W.), MA; Massachusetts General Hospital Clinical Data Animation Center (CDAC) (J.J., W.G., M.B.F., S.S.C., D.B.H., A.J.C., E.S.R., S.F.Z., M.B.W.), MA; University of Wisconsin-Madison Department of Neurology (A.F.S., S.F.); William S. Middleton Memorial Veterans Hospital Madison (A.F.S.), WI; National Institute of Health Data Science (S.H.), Peking University, Beijing, China; Georgia Institute of Technology (S.A.), College of Computing, Atlanta, GA; Yale University-Yale New Haven Hospital (A.H.), CT; Emory University School of Medicine (I.K.), GA; Medical University of South Carolina (J.J.H.), SC; University of Manitoba (M.C.N.), Canada; Johns Hopkins School of Medicine (E.L.J.), MD; University of Arizona College of Medicine (B.L.A.), AZ; Brigham and Women's Hospital (R.A.S.), MA; Mayo Clinic-Rochester (G.O.), MN; Warren Alpert School of Medicine of Brown University (M.B.D.), Providence, RI; University of Nebraska Medical Center (L.A.J.), NE; West Virginia University Hospitals (Z.S.), WV; University of Chicago (H.A.H.), Chicago, IL; Atrium Health (C.B.S.), NC; Université Libre de Bruxelles - Hôpital Erasme (N.G.), Belgium; Icahn School of Medicine, Mount Sinai (J.Y.Y.), NY; New York University (NYU) Grossman School of Medicine (M.G.H.), NY; Barrow Neurological Institute (S.T.H.), Phoenix, AZ; Mater Misericordiae University Hospital (J.A.W.), Dublin, Ireland; University of Pennsylvania (J.P.), PA; Beth Israel Deaconess Medical Center/Harvard Medical School (M.M.S.), MA; and University of Illinois at Urbana-Champaign (J.S.), College of Computing, Champaign, IL.
² Same affiliations as ¹. Correspondence: mwestover@mgh.harvard.edu.

Abstract

Background and objectives: The validity of brain monitoring using electroencephalography (EEG), particularly to guide care in patients with acute or critical illness, requires that experts can reliably identify seizures and other potentially harmful rhythmic and periodic brain activity, collectively referred to as "ictal-interictal-injury continuum" (IIIC). Previous interrater reliability (IRR) studies are limited by small samples and selection bias. This study was conducted to assess the reliability of experts in identifying IIIC.

Methods: This prospective analysis included 30 experts with subspecialty clinical neurophysiology training from 18 institutions. Experts independently scored varying numbers of ten-second EEG segments as "seizure (SZ)," "lateralized periodic discharges (LPDs)," "generalized periodic discharges (GPDs)," "lateralized rhythmic delta activity (LRDA)," "generalized rhythmic delta activity (GRDA)," or "other." EEGs were performed for clinical indications at Massachusetts General Hospital between 2006 and 2020. Primary outcome measures were pairwise IRR (average percent agreement [PA] between pairs of experts) and majority IRR (average PA with group consensus) for each class and beyond chance agreement (κ). Secondary outcomes were calibration of expert scoring to group consensus, and latent trait analysis to investigate contributions of bias and noise to scoring variability.
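As an illustration of the primary outcome measures, the following is a minimal sketch of pairwise percent agreement, agreement with a majority-vote consensus, and Cohen's κ. It is not the authors' code. It assumes every rater scored every segment (the study's experts scored varying numbers of segments), it breaks consensus ties by first-seen label, and it includes each rater's own vote in the consensus; the function names and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of segments on which two label sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement beyond chance between two label sequences."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def majority_labels(ratings):
    """Most common label per segment (ties go to the first-seen label)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*ratings)]


def pairwise_irr(ratings):
    """Average percent agreement over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


def majority_irr(ratings):
    """Average percent agreement of each rater with the group consensus."""
    consensus = majority_labels(ratings)
    return sum(percent_agreement(r, consensus) for r in ratings) / len(ratings)


# Hypothetical toy data: 3 raters, 6 segments.
ratings = [
    ["SZ", "LPD", "GPD", "Other", "LRDA", "GRDA"],
    ["SZ", "LPD", "GPD", "Other", "LRDA", "Other"],
    ["SZ", "GPD", "GPD", "Other", "GRDA", "GRDA"],
]
print(f"pairwise PA: {pairwise_irr(ratings):.3f}")
print(f"majority PA: {majority_irr(ratings):.3f}")
print(f"kappa (rater 1 vs 2): {cohen_kappa(ratings[0], ratings[1]):.3f}")
```

On this toy data, agreement with the consensus is higher than pairwise agreement, the same direction as the study's results; this is expected because a majority vote averages out part of each individual rater's disagreement.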

Results: Among 2,711 EEGs, 49% were from women, and the median (IQR) age was 55 (41) years. In total, experts scored 50,697 EEG segments; the median [range] number scored by each expert was 6,287.5 [1,002, 45,267]. Overall pairwise IRR was moderate (PA 52%, κ 42%), and majority IRR was substantial (PA 65%, κ 61%). Noise-bias analysis demonstrated that a single underlying receiver operating characteristic curve can account for most variation in experts' false-positive vs true-positive characteristics (median [range] of variance explained (R²): 95 [93, 98]%) and for most variation in experts' precision vs sensitivity characteristics (R²: 75 [59, 89]%). Thus, variation between experts is mostly attributable not to differences in expertise but rather to variation in decision thresholds.
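The idea that raters with equal expertise can still disagree because they apply different decision thresholds can be sketched with a textbook equal-variance Gaussian signal-detection model. This is an illustrative assumption, not the latent trait model used in the study; the shared discriminability value and the thresholds below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def operating_point(d_prime, threshold):
    """(false-positive rate, true-positive rate) for one rater.

    Assumes evidence is N(0, 1) when the pattern is absent and
    N(d_prime, 1) when it is present; the rater says "present"
    whenever the evidence exceeds the threshold.
    """
    fpr = 1 - STD_NORMAL.cdf(threshold)
    tpr = 1 - STD_NORMAL.cdf(threshold - d_prime)
    return fpr, tpr


def implied_d_prime(fpr, tpr):
    """Discriminability recovered from an operating point: z(TPR) - z(FPR)."""
    return STD_NORMAL.inv_cdf(tpr) - STD_NORMAL.inv_cdf(fpr)


SHARED_D_PRIME = 2.0  # hypothetical: every rater has the same skill

# Three hypothetical raters who differ only in how strict they are.
for threshold in (0.5, 1.0, 1.5):
    fpr, tpr = operating_point(SHARED_D_PRIME, threshold)
    print(
        f"threshold={threshold:.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}  "
        f"implied d'={implied_d_prime(fpr, tpr):.3f}"
    )
```

In this sketch the three raters report different false-positive and true-positive rates, yet all three operating points lie on the same receiver operating characteristic curve and imply the same discriminability.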

Discussion: Our results provide precise estimates of expert reliability from a large and diverse sample and a parsimonious theory to explain the origin of disagreements between experts. The results also establish a standard for how well an automated IIIC classifier must perform to match experts.

Classification of evidence: This study provides Class II evidence that an independent expert review reliably identifies ictal-interictal-injury continuum patterns on EEG compared with expert consensus.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Brain
Critical Illness
Electroencephalography* / methods
Female
Humans
Middle Aged
Reproducibility of Results
Seizures*
