Background: Statins are widely prescribed cholesterol-lowering medications in the US, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), which lead to discontinuation. In this study, we aimed to develop and validate a clinical phenotyping algorithm for pharmacological SAMS using electronic health record (EHR) data from Minnesota Fairview.
Methods: We retrieved structured and unstructured EHR data for statin users and manually ascertained a gold standard set of SAMS cases and controls by applying the SAMS-CI tool to the clinical notes of 200 patients. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We then applied the best-performing algorithm to the statin cohort to identify SAMS cases.
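The rule-based approach described above can be illustrated with a minimal sketch. The abstract does not give the actual code lists, keywords, creatine kinase threshold, or the logic used to combine criteria, so every constant below, the `Patient` record, and the OR-combination of criteria are hypothetical placeholders rather than the study's algorithm.

```python
from dataclasses import dataclass, field

# All values below are illustrative assumptions, not taken from the study.
MUSCLE_ICD_PREFIXES = ("M79.1", "G72.0")  # hypothetical muscle-related ICD-10 codes
NOTE_KEYWORDS = ("statin myalgia", "statin-induced myopathy")  # hypothetical keywords
CK_ULN_MULTIPLE = 4.0  # hypothetical cutoff, in multiples of the upper limit of normal


@dataclass
class Patient:
    """Hypothetical per-patient record combining structured data and notes."""
    icd_codes: list = field(default_factory=list)
    statin_allergy: bool = False
    max_ck_over_uln: float = 0.0
    notes: list = field(default_factory=list)


def structured_hit(p: Patient) -> bool:
    """True if any structured-data criterion is met."""
    has_icd = any(code.startswith(MUSCLE_ICD_PREFIXES) for code in p.icd_codes)
    return has_icd or p.statin_allergy or p.max_ck_over_uln >= CK_ULN_MULTIPLE


def notes_hit(p: Patient) -> bool:
    """True if any keyword appears in the patient's clinical notes."""
    text = " ".join(p.notes).lower()
    return any(keyword in text for keyword in NOTE_KEYWORDS)


def combined_rule_based(p: Patient) -> bool:
    """Flag a patient if either source fires (an assumed combination rule)."""
    return structured_hit(p) or notes_hit(p)
```

A real implementation would also need to handle negation in notes (for example, "denies statin myalgia"), which this plain substring match does not attempt.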
Results: We identified 16,889 patients who started statins in the Fairview EHR system between 2010 and 2020. The combined rule-based (CRB) algorithm, which used criteria from both clinical notes and structured data, performed similarly to the machine learning algorithms, with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we estimated the prevalence of pharmacological SAMS at 1.9% and identified selected risk factors, including female gender, coronary artery disease, hypothyroidism, and use of immunosuppressants or fibrates.
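As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the helper name `f1_score` is ours and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall of the CRB algorithm against the gold standard.
f1 = f1_score(0.85, 0.71)
print(round(f1, 2))  # 0.77
```

The recomputed value agrees with the reported F1 score of 0.77.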
Conclusion: We developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create SAMS case/control cohorts for further analyses, such as developing a SAMS risk prediction model.
Keywords: Electronic Health Records; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Machine Learning; Phenotyping; Precision Medicine.