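[Keywords]
[Abstract]
Objective To improve the quality of artificially cultivated Tianma (Gastrodia elata), a random forest regression model built on Group-Lasso variable screening was used to analyze the key factors that shape G. elata quality. Methods Using the Group-Lasso method, variables were screened from data on gastrodin content and production-area environmental variables reported in the G. elata quality literature from 2007 to 2022. A random forest regression model was then fitted on the screened variables, and variable importance scores were calculated. Results Nine variables were finally selected: production area, growth status, germplasm type, climate type of the production area, soil type of the production area, mean temperature of the hottest month, annual precipitation, annual sunshine hours, and frost-free period. For the random forest regression model relating the selected variables to gastrodin content, the mean square error (MSE) and mean absolute percentage error (MAPE) were 0.1032 and 14.08%, respectively. The feature importance ranking showed that annual precipitation in the production area had the greatest influence on gastrodin content, followed by soil type, frost-free period, and annual sunshine hours. Conclusion The random forest regression model has relatively low error and relatively high prediction accuracy, and is better suited to analyzing the G. elata cultivation environment and estimating gastrodin content, providing a reference for the artificial cultivation of G. elata.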
[Keywords]
[Abstract]
Objective To improve the quality of artificially cultivated Tianma (Gastrodia elata), a random forest regression model based on Group-Lasso variable screening was constructed to analyze the key factors affecting the quality of G. elata. Methods Using the Group-Lasso method, variables were screened from data on gastrodin content and production-area environmental variables reported in the G. elata quality literature from 2007 to 2022. A random forest regression model was then established on the selected variables, and variable importance scores were calculated. Results Nine variables were finally selected: production area, growth status, germplasm type, climate type of the production area, soil type of the production area, mean temperature of the hottest month, annual precipitation, annual sunshine hours, and frost-free period. A random forest regression model was established between the selected variables and gastrodin content. Its mean square error (MSE) and mean absolute percentage error (MAPE) were 0.1032 and 14.08%, respectively. The feature importance ranking showed that annual precipitation in the production area had the greatest influence on gastrodin content, followed by soil type, frost-free period, and annual sunshine hours. Conclusion The random forest regression model had relatively low error and relatively high prediction accuracy, and was better suited to analyzing the G. elata cultivation environment and estimating gastrodin content, providing a reference for the artificial cultivation of G. elata.
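The screening and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the group names (`soil_type`, `precipitation`, `noise`), the penalty weight `lam`, and the proximal-gradient solver are all assumptions chosen for illustration. It shows how Group-Lasso keeps or drops a whole block of columns (for example, all dummy columns of one categorical variable) at once, and how MSE and MAPE are computed.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient solver for
    (1 / 2n) * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.
    `groups` maps a (hypothetical) variable name to its column indices."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step
        for cols in groups.values():
            norm = np.linalg.norm(z[cols])
            thresh = step * lam * np.sqrt(len(cols))
            # Group soft-thresholding: shrink the whole block or zero it out.
            z[cols] = 0.0 if norm <= thresh else (1 - thresh / norm) * z[cols]
        beta = z
    return beta

def mse(y_true, y_pred):
    """Mean square error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Synthetic illustration only: two informative groups and one pure-noise group.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
groups = {"soil_type": [0, 1, 2], "precipitation": [3], "noise": [4, 5]}
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 1.5 * X[:, 3] + 0.1 * rng.normal(size=300)

beta = group_lasso(X, y, groups, lam=0.3)
# A group is "selected" when its coefficient block is not exactly zero.
selected = [name for name, cols in groups.items() if np.linalg.norm(beta[cols]) > 0]
print(selected)
```

In a full pipeline the columns of the selected groups would then be passed to a random forest regressor (for instance scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute gives an importance ranking), and `mse` and `mape` would be evaluated on held-out predictions.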
[CLC number]
R282.2
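[Funding]
Key Research and Development Program of the Sichuan Provincial Department of Science and Technology: Research and demonstration of key technologies and product development for the deep processing of major varieties of Sichuan genuine regional (Dao-di) medicinal materials (2020YFN0152); Research on key technologies and equipment for quality evaluation of Sichuan genuine regional (Dao-di) medicinal materials (2021YFS0045)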