[Keywords]
[Abstract]
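Objective To identify the age of ginseng (Panax ginseng) accurately, nondestructively and at low cost, a ginseng age identification method based on hyperspectral imaging combined with machine learning was developed. Methods One- to seven-year-old ginseng from the Tonghua area of Jilin was studied. Hyperspectral images were acquired in the visible-near infrared (VNIR) and short-wave infrared (SWIR) ranges, giving hyperspectral images of 84 ginseng samples and hyperspectral data from 1 680 regions of interest. In the VNIR, SWIR and fused VNIR+SWIR ranges, the spectra were preprocessed with multiplicative scatter correction (MSC), standard normal variate (SNV), Savitzky-Golay smoothing, first-order derivative (FD) and second-order derivative (SD). Each preprocessing result was then combined with partial least squares discriminant analysis (PLS-DA) and linear support vector classification (LinearSVC) to build age identification models at three classification scales: a two-class scale separating medicinal from edible ginseng, a three-class scale separating ginseng older than, younger than and exactly five years, and a seven-class scale distinguishing each of the seven ages. Results Within 410–720 nm of the VNIR range, the average spectral reflectance at a given wavelength showed an overall trend of successive decrease from one-year-old to seven-year-old ginseng. Confusion-matrix evaluation of the models showed that LinearSVC models built on FD-preprocessed SWIR and fused-band data classified best at all three age scales, with prediction-set accuracies of 99.60%, 98.41% and 95.24% for the two-, three- and seven-class models, respectively. Models built on feature bands selected with the successive projections algorithm (SPA) reached high accuracy for the two- and three-class tasks while using fewer bands, which made classification more efficient. Conclusion Hyperspectral imaging combined with machine learning and feature-band selection can identify the age of ginseng from a specific production area well, and provides a reference for applying this technique to ginseng age identification and quality control in practice.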
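The five preprocessing methods named above can be sketched as follows. This is a minimal numpy-only illustration, not the authors' code. The following are assumptions not reported in the abstract: MSC uses the mean spectrum as its reference, FD and SD are computed as Savitzky-Golay derivatives, and the window length of 11 and polynomial order of 2 are hypothetical settings. Spectra are rows and bands are columns.

```python
import math

import numpy as np


def msc(X, reference=None):
    """Multiplicative scatter correction: fit x ~ offset + slope * ref for each
    spectrum (row) and return (x - offset) / slope. ref defaults to the mean spectrum."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, offset = np.polyfit(ref, x, 1)  # polyfit returns highest degree first
        corrected[i] = (x - offset) / slope
    return corrected


def snv(X):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def savgol(X, window=11, polyorder=2, deriv=0):
    """Savitzky-Golay filter along the band axis: deriv=0 smooths, deriv=1 and
    deriv=2 return first- and second-derivative spectra (per band step)."""
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("need an odd window > polyorder >= deriv")
    X = np.asarray(X, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Local least-squares polynomial fit: row `deriv` of the pseudo-inverse gives
    # the weights for the deriv-th polynomial coefficient at the window centre.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(vander)[deriv] * math.factorial(deriv)
    # Repeat the edge bands so the output keeps the original number of bands.
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, weights, mode="valid") for row in padded])


# Hypothetical window/order settings; the abstract does not report them.
PREPROCESSORS = {
    "MSC": msc,
    "SNV": snv,
    "SG": lambda X: savgol(X, deriv=0),
    "FD": lambda X: savgol(X, deriv=1),
    "SD": lambda X: savgol(X, deriv=2),
}
```

In practice `scipy.signal.savgol_filter` (with its `deriv` argument) is the usual library routine for the smoothing and derivative steps. Its default edge handling differs from the edge padding used here, so values near the ends of the spectrum will not match exactly.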
[Keywords]
[Abstract]
Objective To achieve accurate, nondestructive and low-cost identification of Panax ginseng age, a P. ginseng age identification method based on hyperspectral imaging combined with machine learning was established in this study. Methods Hyperspectral images of one- to seven-year-old P. ginseng from Tonghua, Jilin, China, were acquired in the visible-near infrared (VNIR) and short-wave infrared (SWIR) bands, yielding hyperspectral images of 84 P. ginseng samples and hyperspectral data from 1 680 regions of interest. The hyperspectral data were preprocessed with multiplicative scatter correction (MSC), standard normal variate (SNV), Savitzky-Golay smoothing, first-order derivative (FD) and second-order derivative (SD) in the VNIR, SWIR and VNIR + SWIR fusion bands. Partial least squares discriminant analysis (PLS-DA) and linear support vector classification (LinearSVC) were then used to build P. ginseng age identification models at three classification scales: two classes distinguished by medicinal versus edible use, three classes distinguished by age greater than, less than or equal to five years, and seven classes distinguishing each of the seven ages. Results In the VNIR 410–720 nm range, the average spectral reflectance of P. ginseng at the same wavelength showed an overall trend of sequential decrease from one-year-old to seven-year-old samples. Confusion-matrix evaluation of the classification models showed that LinearSVC models with FD preprocessing in the SWIR band and the fusion band gave better classification and higher accuracy at all three age scales, with prediction-set accuracies of 99.60%, 98.41% and 95.24% for the two-, three- and seven-class models, respectively. Recognition models built on feature bands selected by the successive projections algorithm (SPA) achieved high accuracy for the two- and three-class tasks while using fewer bands, making classification and recognition more efficient. Conclusion Hyperspectral imaging combined with machine learning and feature-band selection can identify the age of P. ginseng from a specific origin well, and provides a reference for practical applications of this technology in P. ginseng age identification and quality control.
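The SPA band selection and the confusion-matrix accuracy used to compare models can be sketched as below. This is a minimal numpy-only illustration under stated assumptions, not the study's implementation. A full SPA run usually tries several starting bands and picks the number of bands by validation error. Here the starting band (`start`) and the number of bands (`n_bands`) are simply passed in, which is a simplification.

```python
import numpy as np


def spa(X, n_bands, start):
    """Successive projections algorithm (SPA) for band selection.

    Starting from band index `start` (supplied by the caller in this sketch),
    project every band onto the orthogonal complement of the most recently
    chosen, already projected band, then pick the band with the largest
    remaining norm. That band is the one least collinear with the bands chosen so far."""
    Xp = np.asarray(X, dtype=float).copy()
    if not 1 <= n_bands <= np.linalg.matrix_rank(Xp):
        raise ValueError("n_bands must be between 1 and rank(X)")
    selected = [start]
    for _ in range(n_bands - 1):
        xk = Xp[:, selected[-1]]
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -np.inf  # never pick a band twice
        selected.append(int(np.argmax(norms)))
    return selected


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()
```

The predicted labels would come from a trained classifier, which is not reproduced here. With scikit-learn, for example, you could call `LinearSVC().fit(X_train[:, bands], y_train)` and then `.predict(X_test[:, bands])`, where `bands` is the list returned by `spa`.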
[CLC number]
R282.1
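[Funding]
Innovation Team and Talents Cultivation Program of the National Administration of Traditional Chinese Medicine (ZYYCXTD-D-202005); Science and Technology Innovation Project of China Academy of Chinese Medical Sciences (CI2021A03901); Central Government Major Increase/Decrease Expenditure Project (2060302)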