Hyperspectral imaging (HSI) was combined with chemometrics to rapidly determine tanshinone contents in Salvia miltiorrhiza and to identify its geographical origin. The original spectra (ORI) were preprocessed using the first derivative (D1), second derivative (D2), Savitzky-Golay filtering (SG), multiplicative scatter correction (MSC), and standard normal variate transformation (SNV). Partial least squares discriminant analysis (PLS-DA) and support vector machine (SVM) models were used to discriminate 420 Salvia miltiorrhiza samples collected from Shandong, Hebei, Shanxi, Sichuan, and Anhui Provinces. The contents of tanshinone IIA, tanshinone I, cryptotanshinone, and total tanshinones were predicted using back-propagation neural network (BPNN), partial least squares regression (PLSR), and random forest (RF) models. Finally, effective wavelengths were selected using the successive projections algorithm (SPA) and the variable iterative space shrinkage approach (VISSA). The D1-PLS-DA model performed best, with a classification accuracy of 98.97%. For content prediction, the best models were SG-BPNN for cryptotanshinone (RMSEP = 0.527, RPD = 3.25), ORI-BPNN for tanshinone IIA (RMSEP = 0.332, RPD = 3.34), MSC-PLSR for tanshinone I (RMSEP = 0.110, RPD = 4.03), and SNV-BPNN for total tanshinones (RMSEP = 0.759, RPD = 4.01). SPA and VISSA reduced the number of wavelengths to below 60 and 150, respectively, and all of the resulting models still achieved RPD > 3. The combination of HSI with chemometrics therefore provides a promising method for predicting the active ingredients of Salvia miltiorrhiza and identifying its geographical origin.
Keywords: Salvia miltiorrhiza; chemometrics; content prediction; hyperspectral imaging; traceability.
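The abstract reports RMSEP and RPD without defining them, so a minimal sketch of how SNV preprocessing and these two metrics are commonly computed may help. It is an illustration under stated assumptions and not the authors' implementation: RMSEP is taken as the root mean square error of prediction, RPD as the standard deviation of the reference values divided by RMSEP, and the sample standard deviation (ddof=1) is assumed throughout. The function names are hypothetical.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own.

    Assumes `spectra` has shape (n_samples, n_wavelengths).
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a held-out prediction set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values divided by RMSEP."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.std(y_true, ddof=1) / rmsep(y_true, y_pred))
```

Under this definition, an RPD above 3 means the prediction error is less than one third of the natural spread of the reference values, which is the threshold the abstract uses to call a model's performance good.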