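[Keywords]
[Abstract]
Objective To improve the species identification success rate in Dalbergia, and to compare machine learning methods with traditional distance-based and phylogenetic tree-based methods in order to select the optimal ITS barcode analysis method. Methods The Dalbergia ITS sequences comprised 3 obtained experimentally and 399 downloaded from NCBI, covering 96 species in total. With the ITS barcode as the molecular marker, the identification success rates of the distance method, the phylogenetic tree method and machine learning methods were compared for Dalbergia species. Results In the machine learning analyses, the average species identification success rate for Dalbergia was 39.59%. Among these methods, BLOG recognized 42 Dalbergia species with a correct sequence classification rate of 95.75%. SMO, Naïve Bayes, JRip and J48 each recognized 34 species, with correct sequence classification rates of 79.10%, 58.71%, 72.64% and 76.37%, respectively. The phylogenetic tree-based and distance-based analyses achieved identification success rates of 28.13% and 36.46%, respectively. Conclusion For identifying the botanical origin of Dalbergia from ITS barcodes, machine learning achieves a higher identification success rate and better socio-economic efficiency than the distance-based and phylogenetic tree-based methods. Machine learning methods based on ITS barcodes are recommended as the first choice for identifying the botanical origin of Dalbergia species.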
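The distance-based approach compared in this study can be illustrated with a minimal nearest-neighbour sketch. It assumes the ITS sequences are already aligned, uses the uncorrected p-distance, skips sites with a gap or N in either sequence, and counts a species as identified only when every one of its sequences is assigned to it under leave-one-out without a tie to another species. The function names, the gap/ambiguity handling and this species-level criterion are illustrative assumptions, not the exact procedure used by the authors.

```python
from collections import defaultdict


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites with a gap ('-') or ambiguous base ('N') in either sequence are
    skipped (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 1.0


def nearest_neighbour_success(records):
    """Species-level success rate of leave-one-out nearest-neighbour identification.

    records: list of (species, aligned_sequence) tuples.
    Each query takes the species of its closest other sequence; a tie between
    species counts as a failure. A species is identified only if all of its
    queries are assigned correctly (assumed criterion).
    """
    correct_by_species = defaultdict(list)
    for i, (species, seq) in enumerate(records):
        best = None
        best_species = set()
        for j, (other_species, other_seq) in enumerate(records):
            if i == j:
                continue  # leave the query itself out of the reference set
            d = p_distance(seq, other_seq)
            if best is None or d < best:
                best, best_species = d, {other_species}
            elif d == best:
                best_species.add(other_species)
        correct_by_species[species].append(best_species == {species})
    identified = sum(all(flags) for flags in correct_by_species.values())
    return identified / len(correct_by_species)
```

Under this leave-one-out scheme a species represented by a single sequence can never be identified, because its only query has no conspecific reference left.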
[Keywords]
[Abstract]
Objective To improve the species identification success rate in Dalbergia by comparing machine learning methods with traditional distance-based and phylogenetic tree-based methods, and to screen out the optimal ITS barcode analysis method. Methods A total of 402 ITS sequences representing 96 Dalbergia species were used in this study: 3 were obtained experimentally and 399 were downloaded from NCBI. With ITS as the barcode marker, the species identification success rates of the distance-based, phylogenetic tree-based and machine learning methods were compared for Dalbergia. Results Across the machine learning methods, the average species identification success rate for Dalbergia was 39.59%; BLOG recognized 42 Dalbergia species with a correct sequence classification rate of 95.75%. SMO, Naïve Bayes, JRip and J48 each recognized 34 species, with correct sequence classification rates of 79.10%, 58.71%, 72.64% and 76.37%, respectively. The distance-based and phylogenetic tree-based methods achieved species identification success rates of 36.46% and 28.13%, respectively. Conclusion ITS barcode identification of Dalbergia based on machine learning achieves a higher identification success rate and greater socio-economic efficiency than the traditional distance-based and phylogenetic tree-based methods. Machine learning methods should therefore be prioritized for identifying Dalbergia species from ITS barcodes.
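Assuming the species-level success rate is the number of identified species divided by the 96 species in the dataset, the reported percentages can be checked with a short calculation. The counts for BLOG (42) and for SMO, Naïve Bayes, JRip and J48 (34 each) come from the abstract; the counts of 35 and 27 for the distance-based and phylogenetic tree-based methods are back-calculated from 36.46% and 28.13% and are an assumption. Half-up rounding is used because Python's built-in `round` rounds the exact half 28.125 to even (28.12).

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_SPECIES = 96  # Dalbergia species in the dataset, as stated in the abstract

# Species identified per method. 42 and 34 are reported in the abstract;
# 35 and 27 are back-calculated from 36.46% and 28.13% (an assumption).
IDENTIFIED = {
    "BLOG": 42,
    "SMO / Naive Bayes / JRip / J48 (each)": 34,
    "distance-based (inferred)": 35,
    "phylogenetic tree-based (inferred)": 27,
}


def success_rate(identified: int, total: int = TOTAL_SPECIES) -> Decimal:
    """Percentage of species identified, rounded half-up to two decimals."""
    rate = Decimal(identified) * 100 / Decimal(total)
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for method, count in IDENTIFIED.items():
        print(f"{method}: {count}/{TOTAL_SPECIES} = {success_rate(count)}%")
```

On this basis, BLOG's 42 species correspond to 43.75% of the 96 species and the 34 species of the other four classifiers to 35.42%.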
[CLC number]
R286.12
[Funding]
General Program of the Natural Science Foundation, Guangdong Basic and Applied Basic Research Foundation (2022A1515011268)