Background & aims: Liver biopsies are a critical component of pivotal studies in non-alcoholic steatohepatitis (NASH), constituting inclusion criteria, risk stratification factors and endpoints. We evaluated the reliability of NASH Clinical Research Network (CRN) scoring of liver biopsies in a NASH clinical trial.
Methods: Digitized slides of 678 biopsies from 339 patients with paired biopsies randomized into the EMMINENCE study - examining a novel insulin sensitizer (MSDC-0602K) in NASH - were read independently by 3 hepatopathologists blinded to treatment code and scored using the NASH CRN histological scoring system. Various endpoints were computed from these scores.
Results: Inter-reader linearly weighted kappas were 0.609, 0.484, 0.328, and 0.517 for steatosis, fibrosis, lobular inflammation, and ballooning, respectively. Inter-reader unweighted kappas were 0.400 for the diagnosis of NASH, 0.396 for NASH resolution without worsening fibrosis, and 0.366 for fibrosis improvement without worsening NASH. Of the patients enrolled on the basis of 1 hepatopathologist's qualifying reading, 46.3% were deemed not to meet the study's histologic inclusion criteria by at least 1 of the 3 hepatopathologists. The MSDC-0602K treatment effect was lowest for those histologic features with lower inter-reader reliability. Simulations show that the lack of reliability of endpoints and inclusion criteria can drastically reduce study power - from >90% in a well-powered study to as low as 40%.
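For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's kappa for two readers scoring the same biopsies on an ordinal scale, with either linear or unweighted (nominal) disagreement weights: kappa = 1 - (weighted observed disagreement) / (weighted disagreement expected by chance). It is an illustration under stated assumptions, not the study's analysis code: the function name `weighted_kappa`, the two-reader pairwise form, and the example scores are all hypothetical, and the abstract does not state how agreement among the 3 readers was pooled.

```python
def weighted_kappa(reader_a, reader_b, categories, weights="linear"):
    """Cohen's kappa for two readers over ordered categories.

    weights="linear" penalizes a disagreement in proportion to the distance
    between the two scores; weights="unweighted" treats every disagreement
    as equally severe.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must score the same non-empty set of biopsies")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader_a)

    # Observed joint proportions: observed[i][j] is the share of biopsies
    # scored category i by reader A and category j by reader B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader_a, reader_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each reader.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    # Chance-expected disagreement if the readers scored independently.
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - d_obs / d_exp


# Hypothetical scores for three biopsies on a 0-2 ordinal scale.
scores_a = [0, 1, 2]
scores_b = [0, 1, 1]
print(round(weighted_kappa(scores_a, scores_b, [0, 1, 2], "linear"), 3))      # 0.571
print(round(weighted_kappa(scores_a, scores_b, [0, 1, 2], "unweighted"), 3))  # 0.5
```

Linear weighting gives partial credit for near misses, which is why it suits ordinal scores such as steatosis grade or fibrosis stage, whereas the unweighted form suits binary calls such as the diagnosis of NASH.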
Conclusions: The reliability of hepatopathologists' liver biopsy evaluation using currently accepted criteria is suboptimal. This lack of reliability may affect NASH pivotal studies by introducing patients who do not meet NASH study entry criteria, misclassifying fibrosis subgroups, and attenuating apparent treatment effects.
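The attenuation of apparent treatment effects can be illustrated with a simple analytic sketch. It assumes a binary histologic endpoint read with imperfect sensitivity and specificity that are identical in both arms (non-differential misclassification) and uses a normal approximation for the power of a two-sided two-proportion test. This is not the simulation performed in the study; the function names and every numeric input below are hypothetical, and the sketch ignores the separate effect of unreliable inclusion criteria.

```python
from math import sqrt
from statistics import NormalDist


def observed_rate(true_rate, sensitivity, specificity):
    """Responder rate as read, given the true rate and reader accuracy."""
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test."""
    se = sqrt(p_control * (1.0 - p_control) / n_per_arm
              + p_treated * (1.0 - p_treated) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(p_treated - p_control) / se - z_crit)


# Hypothetical inputs: true responder rates, arm size, and reader accuracy.
p_control, p_treated, n_per_arm = 0.20, 0.40, 150
sensitivity, specificity = 0.75, 0.80

obs_control = observed_rate(p_control, sensitivity, specificity)
obs_treated = observed_rate(p_treated, sensitivity, specificity)

power_true = approx_power(p_control, p_treated, n_per_arm)
power_observed = approx_power(obs_control, obs_treated, n_per_arm)

print(f"true difference:     {p_treated - p_control:.3f}")
print(f"observed difference: {obs_treated - obs_control:.3f}")
print(f"power with perfect reading:   {power_true:.2f}")
print(f"power with imperfect reading: {power_observed:.2f}")
```

Under these assumptions the observed difference between arms equals the true difference multiplied by (sensitivity + specificity - 1), so any reader error shrinks the apparent effect toward zero and lowers power at a fixed sample size.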
Lay summary: Since liver biopsy analysis plays such an important role in clinical studies of non-alcoholic steatohepatitis, it is important to understand the reliability of hepatopathologist readings. We examined both inter- and intra-reader variability in a large data set of paired liver biopsies from a clinical trial. We found very poor inter-reader and modest intra-reader reliability. This result has important implications for entry criteria, fibrosis stratification, and the ability to measure a treatment effect in clinical trials.
Keywords: Diabetes Mellitus, Type 2; Histology; Insulin Resistance; Non-alcoholic fatty liver disease; Validation studies.
Copyright © 2020 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.