[Keywords]
[Abstract]
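Objective To build machine learning models that predict anti-radiation effects using the medicinal properties of traditional Chinese medicines (TCMs) as feature descriptors, and to explain which medicinal property features matter most for anti-radiation TCMs, in order to guide clinical and everyday radiation prevention and treatment. Methods Research literature on TCMs reported to have anti-radiation effects was retrieved from databases including Yaozh (药智网), CNKI, and PubMed. Medicinal properties such as nature, taste, and meridian tropism, as recorded in authoritative works such as the Chinese Pharmacopoeia, were obtained from the SymMap database and used as feature descriptors to build a database, which was then processed into a format suitable for machine learning. Five machine learning models (random forest, support vector machine, gradient boosting, logistic regression, and a fully connected neural network) were trained on the dataset with five-fold cross-validation and their performance was evaluated. The models were then tested on an external validation set of 10 TCMs reported to have anti-radiation effects and 10 TCMs reported to have none, none of which took part in training. Finally, the SHapley Additive exPlanations (SHAP) explainer was used to visualize the medicinal property features that determine the presence or absence of an anti-radiation effect. Results A total of 136 TCMs reported to have anti-radiation effects were collected from single-herb studies, health food registrations, and compound formula studies, with a total frequency of 447. The three TCMs ranked highest by frequency of use were Lingzhi, Hongjingtian, and Gouqizi, and 10 TCMs had a frequency of ≥ 10. In the performance evaluation of the five models, random forest performed best, with an accuracy, balanced F score (F1), and area under the curve (AUC) of 0.8044, 0.7732, and 0.8798, respectively. In the external validation, the random forest model predicted the TCMs reported to have anti-radiation effects well. The SHAP explainer for the best-performing random forest model indicated that the efficacy categories "tonifying deficiency" and "clearing heat", the meridians "heart, liver, spleen, lung", the tastes "sour, sweet, bitter", and the "cold" nature contributed most to the anti-radiation effect. Conclusions This is the first study to apply TCM medicinal properties as feature descriptors in machine learning, and it achieved good predictive performance. It also offers guidance for the TCM prevention and treatment of radiation sickness: the treatment principle should be to reinforce healthy qi and eliminate pathogens, nourish yin and reduce fire, and primarily treat the heart, liver, spleen, and lung. In addition, the good predictive results of the machine learning models reflect the good interpretability and reproducibility of TCM theory in disease prevention and treatment.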
[Keywords]
[Abstract]
Objective To construct machine learning models for predicting anti-radiation effects using the medicinal properties of traditional Chinese medicine (TCM) as feature descriptors, and to identify the medicinal properties most important for anti-radiation TCMs in order to guide clinical practice and everyday radiation prevention and treatment. Methods Research literature on TCMs with reported anti-radiation effects was collected from databases including Yaozh, CNKI, and PubMed. The SymMap database was used to obtain the medicinal properties of each TCM (nature, taste, meridian tropism, and so on) as recorded in authoritative texts such as the Chinese Pharmacopoeia. These properties were used as feature descriptors to construct a database, which was then processed into a format suitable for machine learning. Five machine learning models (random forest, support vector machine, gradient boosting, logistic regression, and a fully connected neural network) were trained and evaluated on the dataset with five-fold cross-validation. An external validation set of 10 TCMs with reported anti-radiation effects and 10 TCMs reported to have no such effect, none of which were used in training, was then used to test the models. Finally, the SHapley Additive exPlanations (SHAP) explainer was used to visualize the medicinal properties that determine whether a TCM has an anti-radiation effect. Results A total of 136 TCMs with reported anti-radiation effects were collected from single-herb studies, health food registrations, and compound formula studies, with a total frequency of 447 occurrences. The three most frequently used TCMs were Lingzhi (Ganoderma), Hongjingtian (Rhodiolae Crenulatae Radix et Rhizoma), and Gouqizi (Lycii Fructus), and 10 TCMs occurred 10 or more times. Among the five machine learning models, the random forest performed best, with an accuracy, F1 score, and area under the curve (AUC) of 0.8044, 0.7732, and 0.8798, respectively. In the external validation, the random forest model also predicted the TCMs with reported anti-radiation effects well. The SHAP explainer for the best-performing random forest model identified the functions "tonifying deficiency" and "clearing heat", the meridians "heart, liver, spleen, lung", the tastes "sour, sweet, bitter", and the "cold" nature as contributing most to the anti-radiation effect. Conclusions This study is the first to apply the medicinal properties of TCM as feature descriptors in machine learning, and it achieved good predictive performance. It provides guidance for the prevention and treatment of radiation sickness with TCM, suggesting that treatment should reinforce healthy qi and eliminate pathogens, nourish yin and reduce fire, and focus on the heart, liver, spleen, and lung. Furthermore, the good predictive results of the machine learning models reflect the interpretability and reproducibility of TCM theory in disease prevention and treatment.
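The data preparation and evaluation steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' code. It assumes each TCM is represented as a multi-hot vector over its recorded properties, splits samples into five class-balanced folds, and computes accuracy and F1 for one fold. The herb names, property values, and all function names below are hypothetical placeholders, not data or identifiers from the study.

```python
import random
from collections import defaultdict

# Hypothetical records: names and property values are placeholders,
# NOT data from the study or from SymMap.
RECORDS = {
    "herb_a": {"nature": ["cold"], "taste": ["bitter"], "meridian": ["heart", "liver"]},
    "herb_b": {"nature": ["warm"], "taste": ["sweet"], "meridian": ["spleen"]},
    "herb_c": {"nature": ["cold"], "taste": ["sweet", "bitter"], "meridian": ["lung"]},
}


def build_vocabulary(records):
    """Collect every (category, value) pair, sorted for a stable column order."""
    return sorted(
        {(cat, val)
         for props in records.values()
         for cat, vals in props.items()
         for val in vals}
    )


def encode(props, vocab):
    """Multi-hot encode one herb: 1 if it has the property, else 0."""
    present = {(cat, val) for cat, vals in props.items() for val in vals}
    return [1 if column in present else 0 for column in vocab]


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin
    so that class ratios stay similar across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = reported anti-radiation effect)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


VOCAB = build_vocabulary(RECORDS)
MATRIX = {name: encode(props, VOCAB) for name, props in RECORDS.items()}
```

In a five-fold run, each fold in turn serves as the held-out set while a classifier is fitted on the other four, and the per-fold metrics are averaged. Any of the five model families named above could be plugged in at that step.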
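[CLC number]
[Funding]
Supported by the Guangxi University of Chinese Medicine "Guipai Traditional Chinese Medicine Inheritance and Innovation Team" project (2022A005)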