[Keywords]
[Abstract]
Objective To identify potential therapeutic targets for idiopathic pulmonary fibrosis (IPF) by integrating bioinformatics, Mendelian randomization (MR), and machine learning, and to make a preliminary prediction of related traditional Chinese medicines (TCMs). Methods IPF microarray datasets were obtained from the GEO database, and differentially expressed genes (DEGs) were identified. MR analysis based on expression quantitative trait loci (eQTL) data and genome-wide association study (GWAS) data was used to screen for genes associated with IPF. The risk genes identified by MR analysis were intersected with the DEGs to obtain core IPF-related genes. These genes were evaluated by functional enrichment analysis, gene set enrichment analysis (GSEA), immune cell infiltration analysis, and single-cell RNA sequencing. Machine learning algorithms were applied to select the optimal diagnostic feature genes, and an independent GEO cohort was used for differential expression validation and receiver operating characteristic (ROC) analysis. In addition, candidate TCMs were predicted through database mining and molecular docking. Results A total of 916 IPF-associated DEGs were identified. Intersecting them with 224 MR risk genes yielded seven key genes: IRF7, TTC32, IFI6, and ISG15 (risk genes), and ZNF204P, ISOC1, and CTSK (protective genes). These genes were mainly enriched in interferon-beta production, the retinoic acid-inducible gene-I (RIG-I)-like receptor signaling pathway, the Toll-like receptor signaling pathway, and the type I interferon signaling pathway. Immune infiltration analysis showed fewer M1/M0 macrophages and resting mast cells, and more activated mast cells, in IPF tissues. Single-cell RNA sequencing revealed specific expression patterns of these genes in epithelial cell subpopulations. Machine learning algorithms identified ZNF204P and IRF7 as the optimal diagnostic genes. Validation in the independent dataset confirmed that both genes were significantly differentially expressed in IPF and had high diagnostic accuracy. A total of 385 TCMs related to the key genes were predicted. Their four natures were mainly cold, warm, and neutral; their five flavors were mainly bitter, sweet, and pungent; their meridian tropisms were mainly the liver, lung, stomach, spleen, and kidney meridians; and their main categories were heat-clearing drugs, tonifying drugs, blood-activating and stasis-resolving drugs, exterior-releasing drugs, and dampness-draining diuretics. Molecular docking indicated that the key active components of the candidate TCMs could form stable interactions with the proteins encoded by the core genes. Conclusion This study identified seven key genes associated with IPF. Machine learning selected ZNF204P and IRF7 as robust diagnostic biomarkers with potential as therapeutic targets. Furthermore, Zhizi (Gardeniae Fructus), Chaihu (Bupleuri Radix), Roucongrong (Cistanches Herba), and Huangqi (Astragali Radix) were predicted as potential TCMs that target the core genes associated with IPF.
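The intersection step described in the Methods can be illustrated with a minimal sketch. It assumes that each MR result is summarized by an odds ratio (OR), that OR > 1 marks a risk gene and OR < 1 a protective gene, and that no further filtering (for example, on the direction of differential expression) is applied; the abstract does not state the authors' exact criteria. The function name, gene names, and OR values below are hypothetical placeholders, not data from the study.

```python
def intersect_core_genes(degs, mr_results):
    """Keep MR-significant genes that are also DEGs and label their direction.

    degs       -- set of differentially expressed gene symbols
    mr_results -- dict mapping gene symbol to its MR odds ratio
                  (assumed to contain only genes that passed MR significance)
    Returns a dict mapping each shared gene to 'risk' or 'protective'.
    """
    core = {}
    for gene, odds_ratio in mr_results.items():
        if gene not in degs:
            continue  # MR hit that is not differentially expressed
        # Assumption: OR > 1 means higher expression raises disease risk.
        core[gene] = "risk" if odds_ratio > 1 else "protective"
    return core


# Toy input with placeholder gene names and odds ratios (hypothetical).
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mr_results = {"GENE_A": 1.32, "GENE_C": 0.81, "GENE_E": 1.10}

core_genes = intersect_core_genes(degs, mr_results)
print(sorted(core_genes.items()))
```

In this toy input, GENE_E is dropped because it is not among the DEGs, and GENE_B and GENE_D are dropped because they have no MR support.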
[CLC number]
Q811.4; R285
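[Funding]
Doctoral Student Special Program of the Young Scientific and Technological Talent Cultivation Project, China Association for Science and Technology; National Key Research and Development Program of China (2020YFC2003104); National Natural Science Foundation of China, General Program (82174347); National Natural Science Foundation of China, Young Scientists Fund (Category C) (82505509); Natural Science Foundation of Sichuan Province, Youth Fund (2025ZNSFSC1853); China Postdoctoral Science Foundation, 75th General Grant (Regional Special Support Program) (2024MD753905)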