Our goal was to estimate, and compare across different readers, the reproducibility of (18)F-FDG PET standardized uptake value (SUV) measurements and CT size measurements, and of changes in those measurements, in malignant tumors before and after therapy.
Methods: Fifty-two tumors in 25 patients were evaluated on (18)F-FDG PET/CT scans. Maximum SUVs (SUV(bw) max) and CT size measurements were determined for each tumor independently on pre- and posttreatment scans by 8 different readers (4 PET, 4 CT) using routine nonautomated clinical methods. Percentage changes in SUV(bw) max and CT size between pre- and posttreatment scans were calculated. Interobserver reproducibility of SUV(bw) max, CT size, and changes in these values was described by intraclass correlation coefficients (ICCs) and estimates of variance.
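As an illustration of the two calculations named in the Methods, the following is a minimal sketch, not the study's actual analysis. The abstract does not state which ICC form was used, so a one-way random-effects ICC(1,1) is assumed here; the function names and the example readings are hypothetical and are not study data.

```python
from statistics import mean


def percent_change(pre, post):
    """Percentage change from the pretreatment to the posttreatment value."""
    return 100.0 * (post - pre) / pre


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (an assumed form; the source does not specify).

    ratings: list of rows, one row per tumor, one value per reader.
    """
    n = len(ratings)       # number of tumors
    k = len(ratings[0])    # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-tumor mean square
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-tumor (between-reader) mean square
    msw = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical SUV readings: 3 tumors, each measured by 4 readers.
example_pre = [
    [8.1, 8.3, 8.0, 8.2],
    [4.5, 4.4, 4.7, 4.5],
    [12.0, 11.6, 12.3, 12.1],
]
print(icc_oneway(example_pre))
print(percent_change(8.1, 4.0))
```

An ICC near 1 means that most of the measured variance comes from real differences between tumors rather than from disagreement between readers, which is the sense in which the Results compare SUV(bw) max with CT size.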
Results: The ICC for SUV(bw) max was higher than the ICCs for the longest CT size and the 2-dimensional CT size before treatment (0.93, 0.72, and 0.61, respectively), after treatment (0.91, 0.85, and 0.45, respectively), and for percentage change (0.94, 0.70, and 0.33, respectively). The variability of SUV(bw) max was significantly lower than the variability of the longest CT size and the 2-dimensional CT size (mean ± SD before treatment, 6.3% ± 14.2%, 16.2% ± 17.8%, and 27.5% ± 26.7%, respectively, P ≤ 0.001; and after treatment, 18.4% ± 26.8%, 35.1% ± 47.5%, and 50.9% ± 51.4%, respectively, P ≤ 0.02). The variability of percentage change in SUV(bw) max (16.7% ± 36.2%) was significantly lower than that for percentage change in the longest CT size (156.3% ± 157.3%, P ≤ 0.0001) and the 2-dimensional CT size (178.4% ± 546.5%, P < 0.0001).
Conclusion: The interobserver reproducibility of SUV(bw) max for both untreated and treated tumors, and of percentage change in SUV(bw) max, is substantially higher than that of CT size measurements and of percentage change in CT size. Measurements of tumor metabolism by PET should be included in trials to assess response to therapy. Although PET reproducibility was high, the variability observed in analyses of identical image sets by 4 readers indicates that automated analytic tools to assess response might be helpful to further enhance reproducibility.