[关键词]
[摘要]
目的 基于生物信息学与机器学习方法探讨多囊卵巢综合征(polycystic ovary syndrome,PCOS)关键基因,并在临床水平进行验证,同时筛选对相关基因起调控作用的中药。方法 应用GEO数据库获取4个数据集,使用R软件包“Limma”和加权基因共表达网络分析(weighted gene co-expression network analysis,WGCNA)筛选PCOS组与健康对照组的差异表达基因,并对其进行功能富集和细胞免疫浸润分析。应用机器学习算法获取PCOS关键基因,绘制列线图,建立受试者工作特征(receiver operating characteristic,ROC)曲线评估列线图与每个关键基因识别PCOS的能力及特异性和敏感性。收集临床PCOS患者外周血单个核细胞,对关键基因表达量与PCOS识别情况进行临床验证。通过COREMINE数据库、古今医案云平台预测潜在调控PCOS的中药,并分析其性味归经及功效。结果 4个数据集共获得42个样本,其中PCOS组21个样本,健康对照组21个样本。共获得差异基因127个,基因本体(gene ontology,GO)分析显示差异基因与肾上腺髓质素受体信号过程、细胞间桥、类固醇结合等有关;京都基因与基因组百科全书(Kyoto encyclopedia of genes and genomes,KEGG)分析显示差异基因与核因子-κB(nuclear factor-κB,NF-κB)介导的肿瘤坏死因子-α(tumor necrosis factor-α,TNF-α)信号传导、血管生成、白细胞介素-2(interleukin-2,IL-2)-信号转导和转录激活因子5(signal transducer and activator of transcription 5,STAT5)信号等相关。进一步进行细胞免疫浸润分析,发现PCOS组γδT细胞、单核细胞、激活的肥大细胞水平升高,浆细胞、CD4初始T细胞、激活的自然杀伤(natural killer,NK)细胞水平降低。最小绝对收缩和选择算子(logistic least absolute shrinkage and selection operator,LASSO)-Cox比例风险模型(Cox proportional-hazards model,COX)回归筛选出11个关键基因,包括AK4、DEPP1、DUOX2、FGG、GAREM1、PLOD2、SLC41A2、SPIN4、THNSL1、TMEM187、ZNF443,PCOS组关键基因的表达量均低于对照组(P<0.05),单个关键基因识别PCOS的曲线下面积(area under curve,AUC)为0.76~0.90,列线图识别PCOS的AUC为0.98。临床数据验证共纳入PCOS组12例,健康对照组12例,PCOS组AK4、ZNF443、DUOX2、DEPP1、FGG、SLC41A2、SPIN4与TMEM187的基因表达量均低于健康对照组(P<0.05),应用列线图验证对PCOS的识别,AUC为1。预测到与差异基因相关的中药85味,中药的四气以寒、温、平为主,五味以苦、甘、辛味为主,归经以肝、肺、胃经为主,功效以清热解毒、理气为主。结论 AK4、DEPP1、DUOX2、FGG、GAREM1、PLOD2、SLC41A2、SPIN4、THNSL1、TMEM187、ZNF443可能是识别PCOS的潜在关键生物标志物;这些基因与对潜在治疗中药的预测,共同为PCOS的诊断和治疗提供了新思路。
[Keywords]
[Abstract]
Objective To identify key genes of polycystic ovary syndrome (PCOS) using bioinformatics and machine learning methods, to validate them at the clinical level, and to screen Chinese herbal medicines that regulate the related genes. Methods Four datasets were obtained from the GEO database. The R package Limma and weighted gene co-expression network analysis (WGCNA) were used to screen differentially expressed genes between the PCOS group and the healthy control group, and functional enrichment and immune cell infiltration analyses were performed on these genes. Machine learning algorithms were applied to obtain the key genes of PCOS, a nomogram was constructed, and receiver operating characteristic (ROC) curves were plotted to evaluate the ability, specificity, and sensitivity of the nomogram and of each key gene in identifying PCOS. Peripheral blood mononuclear cells were collected from clinical PCOS patients to validate the expression levels of the key genes and their identification of PCOS. Chinese herbal medicines with the potential to regulate PCOS were predicted using the COREMINE database and the Ancient and Modern Medical Case Cloud Platform, and their properties, flavors, meridian tropism, and efficacy were analyzed. Results A total of 42 samples were obtained from the four datasets, including 21 in the PCOS group and 21 in the healthy control group. A total of 127 differentially expressed genes were identified. Gene ontology (GO) analysis showed that these genes were related to the adrenomedullin receptor signaling process, intercellular bridge, steroid binding, etc. Kyoto encyclopedia of genes and genomes (KEGG) analysis showed that they were related to nuclear factor-κB (NF-κB)-mediated tumor necrosis factor-α (TNF-α) signaling, angiogenesis, interleukin-2 (IL-2)-signal transducer and activator of transcription 5 (STAT5) signaling, etc. Immune cell infiltration analysis further showed that levels of gamma delta T cells, monocytes, and activated mast cells were increased in the PCOS group, whereas levels of plasma cells, naive CD4 T cells, and activated natural killer (NK) cells were decreased. Logistic least absolute shrinkage and selection operator (LASSO)-Cox proportional-hazards model (COX) regression identified 11 key genes: AK4, DEPP1, DUOX2, FGG, GAREM1, PLOD2, SLC41A2, SPIN4, THNSL1, TMEM187, and ZNF443. The expression levels of all key genes were lower in the PCOS group than in the control group (P < 0.05). The area under the curve (AUC) of the single key genes for identifying PCOS ranged from 0.76 to 0.90, and the AUC of the nomogram was 0.98. Clinical validation included 12 cases in the PCOS group and 12 cases in the healthy control group. The expression levels of AK4, ZNF443, DUOX2, DEPP1, FGG, SLC41A2, SPIN4, and TMEM187 were lower in the PCOS group than in the healthy control group (P < 0.05), and the nomogram identified PCOS in the validation data with an AUC of 1. A total of 85 Chinese herbal medicines related to the differentially expressed genes were predicted. Their four qi were mainly cold, warm, and neutral; their five flavors were mainly bitter, sweet, and pungent; their meridian tropism was mainly the liver, lung, and stomach meridians; and their efficacy was mainly heat-clearing and detoxifying, and qi-regulating. Conclusion AK4, DEPP1, DUOX2, FGG, GAREM1, PLOD2, SLC41A2, SPIN4, THNSL1, TMEM187, and ZNF443 may be potential key biomarkers for identifying PCOS. Together with the prediction of potential therapeutic Chinese herbal medicines, they provide new ideas for the diagnosis and treatment of PCOS.
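The single-gene AUC values reported in the abstract measure how well one gene's expression level separates the PCOS group from the healthy control group. The following is a minimal Python sketch of that calculation, not the authors' pipeline. It assumes that lower expression indicates PCOS (as the abstract reports for the key genes) and uses the rank-based (Mann-Whitney) formulation of the AUC. The function name `auc_lower_in_cases` is hypothetical, and the expression values are invented for illustration only, not taken from the study.

```python
from typing import Sequence


def auc_lower_in_cases(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC for a marker whose expression is lower in cases than in controls.

    Equals the fraction of (case, control) pairs in which the case has the
    lower value, with ties counted as one half (Mann-Whitney formulation
    of the ROC AUC).
    """
    if not cases or not controls:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x < y:
                wins += 1.0   # case correctly ranked below control
            elif x == y:
                wins += 0.5   # tie: counted as half a correct ranking
    return wins / (len(cases) * len(controls))


# Hypothetical relative expression values, invented for illustration only.
toy_pcos = [0.8, 1.1, 0.9, 1.4]
toy_control = [1.5, 1.2, 1.0, 1.6]

print(auc_lower_in_cases(toy_pcos, toy_control))  # 13 of 16 pairs -> 0.8125
```

An AUC of 0.5 would mean the gene does not separate the groups at all, while an AUC of 1 means every case ranks below every control. With small groups, such as 12 cases and 12 controls, a perfect AUC is easier to reach by chance than in a large cohort.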
[CLC number]
Q811.4;R285
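[Funding]
Natural Science Foundation of Inner Mongolia Autonomous Region (2025MS08065); Health Science and Technology Program of Inner Mongolia Autonomous Region (202201553); Natural Science Foundation of Chifeng City (SZR2025037)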