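[Keywords]
[Abstract]
Objective To integrate differentially expressed genes in gastric cancer from the Gene Expression Omnibus (GEO), systematically identify core targets associated with tumor progression, and use network distance to predict Chinese medicines with therapeutic potential, providing a molecular basis for precision intervention in gastric cancer that integrates traditional Chinese and Western medicine. Methods Twenty-one gastric cancer datasets (2,125 gastric cancer and 367 normal samples) were downloaded from GEO to construct an expression matrix. Differentially expressed genes (DEGs) were screened with the limma package. Weighted gene co-expression network analysis (WGCNA) was used to build a co-expression network and identify the module most strongly correlated with the disease phenotype. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses characterized the functions of the key genes. Seven machine learning models were built, and SHapley additive exPlanations (SHAP) was used to interpret feature importance. Based on the human protein-protein interaction (PPI) network, the network distance between each Chinese medicine's target module and the gastric cancer key genes was calculated to screen topologically proximal medicines, whose four properties, five flavors, meridian tropism, and efficacy were then tallied. Results A total of 455 DEGs were obtained. WGCNA partitioned the genes into 31 modules; the light-yellow module (r = 0.56, q < 0.01) contained 194 hub genes, and its intersection with the DEGs yielded 177 key genes. Enrichment analysis showed that GO biological process (BP) terms centered on extracellular matrix organization and adhesion, GO cellular component (CC) terms on collagen-containing extracellular matrix (ECM) and focal adhesion, and GO molecular function (MF) terms on integrin and growth factor binding; KEGG pathways covered regulation of the actin cytoskeleton, phosphatidylinositol-3-hydroxykinase (PI3K)-protein kinase B (Akt) signaling, and interleukin-17 (IL-17) signaling. The random forest (RF) model reached an accuracy of 0.991, and SHAP consistently identified SULF1, THY1, DNER, and SPINK7 as the top contributing genes. Network distance screening yielded the top 15 Chinese medicines, including Arctii Fructus (牛蒡子), Prunellae Spica (夏枯草), Atractylodis Rhizoma (苍术), Fritillariae Cirrhosae Bulbus (川贝), Ligustri Lucidi Fructus (女贞子), Hyperici Japonici Herba (地耳草), and Persicae Semen (桃仁); their four properties were predominantly cool, bitter ranked first among the five flavors, the meridian tropism was liver, stomach, and lung, and the efficacy was mainly heat-clearing, followed by deficiency-tonifying. Conclusion This study systematically characterizes the molecular mechanisms of gastric cancer and predicts that heat-clearing and detoxifying, yin-nourishing, and blood-activating Chinese medicines have multi-target anti-gastric cancer potential, offering new strategies for precision diagnosis and treatment of gastric cancer and for the modernization of traditional Chinese medicine.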
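The abstract does not state which network distance measure was used. As a minimal sketch, assuming the commonly used "closest" distance (the mean, over a medicine's targets, of the shortest-path length to the nearest disease gene), the calculation on an unweighted PPI graph might look like the following. The node names, the toy edges, and the function names are all hypothetical and exist only for illustration.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, medicine_targets, disease_genes):
    """Mean, over medicine targets, of the distance to the nearest disease gene.

    Targets that cannot reach any disease gene are skipped; if none can,
    the distance is reported as infinity.
    """
    disease_genes = set(disease_genes)
    nearest = []
    for target in medicine_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            nearest.append(min(reachable))
    if not nearest:
        return float("inf")
    return sum(nearest) / len(nearest)


# Toy network (hypothetical): G1, G2 stand in for disease key genes,
# T1..T3 for medicine targets, A, B, C for intermediate proteins.
TOY_EDGES = [
    ("T1", "A"), ("A", "G1"), ("T2", "G2"), ("G1", "G2"),
    ("T3", "B"), ("B", "C"), ("C", "G1"),
]
graph = build_graph(TOY_EDGES)
disease = ["G1", "G2"]

medicine_x = closest_distance(graph, ["T1", "T2"], disease)  # (2 + 1) / 2
medicine_y = closest_distance(graph, ["T3"], disease)        # 3 hops to G1
print(medicine_x, medicine_y)
```

A smaller value means the medicine's targets sit topologically closer to the disease genes, so in this toy case the first medicine would rank ahead of the second. Raw distances of this kind are often compared against a randomized reference to obtain a z-score; the abstract does not say whether that step was applied.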
[Keywords]
[Abstract]
Objective To integrate differentially expressed genes (DEGs) in gastric cancer (GC) from the Gene Expression Omnibus (GEO) database, systematically identify core targets associated with tumor progression, and predict Chinese medicines with therapeutic potential via network distance, providing a molecular basis for precision intervention in GC that integrates traditional Chinese and Western medicine. Methods A total of 21 GC datasets (2,125 GC and 367 normal samples) were downloaded from GEO to construct an expression matrix. DEGs were screened using the limma package (|log2(FC)| > 1, false discovery rate [FDR] < 0.05). Weighted gene co-expression network analysis (WGCNA) was performed to identify the module most strongly correlated with the disease phenotype. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses were conducted on the key genes. Seven machine learning models were built, and SHapley additive exPlanations (SHAP) was used to interpret feature importance. The network distance between Chinese medicine target modules and GC key genes was calculated on the human protein-protein interaction (PPI) network to screen topologically proximal medicines, and their four properties, five flavors, meridian tropism, and efficacy were tallied. Results A total of 455 DEGs were obtained. WGCNA yielded 31 modules, of which the light-yellow module (r = 0.56, q < 0.01) contained 194 hub genes; its intersection with the DEGs produced 177 key genes. Enrichment analysis showed that GO biological process (BP) terms centered on extracellular matrix organization and adhesion, GO cellular component (CC) terms on collagen-containing extracellular matrix (ECM) and focal adhesion, and GO molecular function (MF) terms on integrin and growth factor binding; KEGG pathways covered regulation of the actin cytoskeleton, phosphatidylinositol-3-hydroxykinase (PI3K)-protein kinase B (Akt) signaling, and interleukin-17 (IL-17) signaling. The random forest (RF) model achieved an accuracy of 0.991, and SHAP consistently ranked SULF1, THY1, DNER, and SPINK7 as the top contributors. Network distance screening identified the top 15 medicines, including Arctii Fructus, Prunellae Spica, Atractylodis Rhizoma, Fritillariae Cirrhosae Bulbus, Ligustri Lucidi Fructus, Hyperici Japonici Herba, and Persicae Semen. These medicines were characterized by predominantly cool/cold properties, bitter flavor, and liver, stomach, and lung tropism; their efficacy was mainly heat-clearing, followed by deficiency-tonifying. Conclusion This study systematically characterizes the molecular mechanisms of GC and predicts that heat-clearing and detoxifying, yin-nourishing, and blood-activating Chinese medicines have multi-target anti-GC potential, offering new strategies for precision diagnosis and treatment of GC and for the modernization of traditional Chinese medicine.
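To make the screening criteria concrete, the sketch below applies the stated thresholds (|log2(FC)| > 1, FDR < 0.05) to a results table and then intersects the DEGs with a set of module hub genes, mirroring the two steps described in the Methods and Results. It assumes the table carries limma's `topTable` column names `logFC` and `adj.P.Val`; the gene names, values, and function names are invented for illustration and are not taken from the study.

```python
# Hypothetical rows shaped like limma topTable output.
# Gene names and numbers are made up for this example.
TOY_TABLE = [
    {"gene": "geneA", "logFC": 2.1, "adj.P.Val": 0.001},
    {"gene": "geneB", "logFC": -1.4, "adj.P.Val": 0.03},
    {"gene": "geneC", "logFC": 0.8, "adj.P.Val": 0.0001},  # fold change too small
    {"gene": "geneD", "logFC": 1.6, "adj.P.Val": 0.20},    # not significant
]

# Hypothetical hub genes from the phenotype-associated WGCNA module.
TOY_HUB_GENES = {"geneA", "geneC", "geneE"}


def filter_degs(rows, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p-value < fdr_cutoff."""
    return {
        row["gene"]
        for row in rows
        if abs(row["logFC"]) > lfc_cutoff and row["adj.P.Val"] < fdr_cutoff
    }


def key_genes(degs, hub_genes):
    """Key genes are those that are both DEGs and module hub genes."""
    return set(degs) & set(hub_genes)


degs = filter_degs(TOY_TABLE)
keys = key_genes(degs, TOY_HUB_GENES)
print(sorted(degs))  # ['geneA', 'geneB']
print(sorted(keys))  # ['geneA']
```

Both thresholds use strict inequalities, matching the way they are written in the abstract, and the absolute value keeps up-regulated and down-regulated genes alike.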
[CLC number]
Q811.4;R285
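[Funding]
National Natural Science Foundation of China, General Program (82374621); Innovation Project of the China Academy of Chinese Medical Sciences (CI2021A05042); National Natural Science Foundation of China, General Program (82575263)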