Assessment of interobserver reproducibility in quantitative 18F-FDG PET and CT measurements of tumor response to therapy

Heather A Jacene; Sophie Leboulleux; Shingo Baba; Daniel Chatzifotiadis; Behnaz Goudarzi; Oleg Teytelbaum; Karen M Horton; Ihab Kamel; Katarzyna J Macura; Hua-Ling Tsai; Jeanne Kowalski; Richard L Wahl

doi:10.2967/jnumed.109.063321

J Nucl Med. 2009 Nov;50(11):1760-9. doi: 10.2967/jnumed.109.063321. Epub 2009 Oct 16.

Authors

Heather A Jacene¹, Sophie Leboulleux, Shingo Baba, Daniel Chatzifotiadis, Behnaz Goudarzi, Oleg Teytelbaum, Karen M Horton, Ihab Kamel, Katarzyna J Macura, Hua-Ling Tsai, Jeanne Kowalski, Richard L Wahl

Affiliation

¹ Division of Nuclear Medicine and Body CT, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University, Baltimore, Maryland, USA.

PMID: 19837757

Abstract

Our goal was to estimate, and compare across different readers, the reproducibility of the 18F-FDG PET standardized uptake value (SUV) and CT size measurements, and of changes in those measurements, in malignant tumors before and after therapy.

Methods: Fifty-two tumors in 25 patients were evaluated on 18F-FDG PET/CT scans. Maximum SUVs (SUV(bw) max) and CT size measurements were determined for each tumor independently on pre- and posttreatment scans by 8 different readers (4 PET, 4 CT) using routine nonautomated clinical methods. Percentage changes in SUV(bw) max and CT size between pre- and posttreatment scans were calculated. Interobserver reproducibility of SUV(bw) max, CT size, and changes in these values was described by intraclass correlation coefficients (ICCs) and estimates of variance.
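The two calculations named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not state which ICC form was used, so a one-way random-effects ICC(1,1) is assumed here, and the function names and all readings are hypothetical.

```python
from statistics import mean


def percent_change(pre, post):
    """Percentage change from the pretreatment to the posttreatment value."""
    if pre == 0:
        raise ValueError("pretreatment value must be nonzero")
    return 100.0 * (post - pre) / pre


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (an assumed form, see lead-in).

    ratings: list of rows, one row per tumor, one column per reader.
    """
    n = len(ratings)       # number of tumors
    k = len(ratings[0])    # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-tumor and within-tumor (between-reader) mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical SUV max readings: 3 tumors (rows) x 4 PET readers (columns).
pre = [[8.1, 8.0, 8.3, 8.1], [4.2, 4.4, 4.1, 4.2], [12.5, 12.4, 12.9, 12.6]]
post = [[3.9, 4.0, 4.2, 3.8], [4.0, 4.1, 3.9, 4.3], [6.1, 6.0, 6.6, 6.2]]

# Percentage change per tumor and reader, then ICC of each quantity.
changes = [
    [percent_change(a, b) for a, b in zip(row_pre, row_post)]
    for row_pre, row_post in zip(pre, post)
]
print(icc_oneway(pre), icc_oneway(post), icc_oneway(changes))
```

An ICC near 1 means that most of the variance comes from differences between tumors rather than from disagreement between readers. Other ICC forms (for example two-way models) would give somewhat different values on the same data.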

Results: The ICC was higher for the pretreatment, posttreatment, and percentage change in SUV(bw) max than the ICC for the longest CT size and the 2-dimensional CT size (before treatment, 0.93, 0.72, and 0.61, respectively; after treatment, 0.91, 0.85, and 0.45, respectively; and percentage change, 0.94, 0.70, and 0.33, respectively). The variability of SUV(bw) max was significantly lower than the variability of the longest CT size and the 2-dimensional CT size (mean ± SD before treatment, 6.3% ± 14.2%, 16.2% ± 17.8%, and 27.5% ± 26.7%, respectively, P ≤ 0.001; and after treatment, 18.4% ± 26.8%, 35.1% ± 47.5%, and 50.9% ± 51.4%, respectively, P ≤ 0.02). The variability of percentage change in SUV(bw) max (16.7% ± 36.2%) was significantly lower than that for percentage change in the longest CT size (156.3% ± 157.3%, P ≤ 0.0001) and the 2-dimensional CT size (178.4% ± 546.5%, P < 0.0001).
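The abstract reports variability as a mean ± SD percentage but does not define the measure. One plausible reading, assumed here purely for illustration, is a per-tumor coefficient of variation across readers, summarized over tumors. The function names and readings below are hypothetical.

```python
from statistics import mean, stdev


def reader_cv_percent(readings):
    """Coefficient of variation (%) of one tumor's readings across readers."""
    return 100.0 * stdev(readings) / mean(readings)


def summarize_variability(per_tumor_readings):
    """Mean and SD, over tumors, of the per-tumor reader CV (%)."""
    cvs = [reader_cv_percent(r) for r in per_tumor_readings]
    return mean(cvs), stdev(cvs)


# Hypothetical longest-diameter CT readings (cm): 3 tumors x 4 CT readers.
ct_longest = [[2.1, 2.6, 1.9, 2.4], [5.0, 4.6, 5.3, 4.9], [1.2, 1.8, 1.0, 1.5]]
cv_mean, cv_sd = summarize_variability(ct_longest)
print(f"{cv_mean:.1f}% ± {cv_sd:.1f}%")
```

Under this definition, a tumor on which all readers agree exactly contributes 0%, and a few tumors with large disagreement can push the SD above the mean.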

Conclusion: The interobserver reproducibility of SUV(bw) max for both untreated and treated tumors, and of the percentage change in SUV(bw) max, is substantially higher than that of CT size measurements and of the percentage change in CT size. Measurements of tumor metabolism by PET should be included in trials to assess response to therapy. Although PET reproducibility was high, the variability observed in analyses of identical image sets by 4 readers indicates that automated analytic tools to assess response might be helpful to further enhance reproducibility.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Female
Fluorodeoxyglucose F18* / metabolism
Humans
Male
Middle Aged
Neoplasms / diagnosis*
Neoplasms / metabolism
Neoplasms / therapy*
Observer Variation
Positron-Emission Tomography / statistics & numerical data*
Reproducibility of Results
Retrospective Studies
Time Factors
Tomography, X-Ray Computed / statistics & numerical data*
Treatment Outcome

Substances

Fluorodeoxyglucose F18

Grants and funding

P30 CA006973-43S2/CA/NCI NIH HHS/United States