基于机器学习的胃癌关键基因筛选及预测模型构建 (cited by: 1)
Key gene screening and prediction model construction of gastric cancer based on machine learning
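Document type: Journal article
Chinese title: 基于机器学习的胃癌关键基因筛选及预测模型构建
English title: Key gene screening and prediction model construction of gastric cancer based on machine learning
Authors: Wang Zepeng (王泽朋)[1]; Li Kunpeng (李坤鹏)[1]; Zhou Yu (周玉)[1]; Li Sihai (李四海)[1]
First author: Wang Zepeng (王泽朋)
Affiliation: [1] School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730100, Gansu, China
First affiliation: School of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine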
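Year: 2024
Volume: 41
Issue: 1
Pages: 115
Journal (Chinese): 中国医学物理学杂志
Journal (English): Chinese Journal of Medical Physics
Indexed in: CSTPCD; CSCD (CSCD_E2023_2024)
Funding: Gansu Provincial Science and Technology Program (21JR1RA272); Innovation Fund Project for University Teachers of the Gansu Provincial Department of Education (2023B-105).
Language: Chinese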
Keywords (Chinese): 胃癌;基因筛选;关键基因;生物信息学;机器学习
Keywords (English): gastric cancer; gene screening; key gene; bioinformatics; machine learning
Abstract: Objective: To verify genetic features associated with gastric cancer, a hybrid feature selection method is proposed to identify target genes, analyze their significance, and establish a new diagnostic prediction model. Methods: A bioinformatics analysis of variance was performed on the original gastric cancer data, and machine learning methods including random forest, support vector machine recursive feature elimination (SVM-RFE), and the LASSO algorithm were then used to screen gastric cancer-associated genes; the intersection of their results was taken as the key gene set. Enrichment analysis was carried out, and the key genes were identified and verified. Diagnostic prediction models based on 8 machine learning classification algorithms, including multi-layer perceptron (MLP), logistic regression, and decision tree, were constructed from the key genes. Results: The key genes selected by the hybrid feature selection method were closely related to the biological processes of tumorigenesis and tumor development. Eight key genes (TXNDC5, BMP8A, ONECUT2, COL10A1, JCHAIN, INHBA, LCTL, and TRIM59) were identified as potential gastric cancer markers with good diagnostic performance. According to the ROC curves and accuracies of the 8 classification models, MLP was the best gastric cancer prediction model, with an accuracy of 97.77%, which was 3.83% higher than that of an XGBoost gastric cancer prediction model built by other researchers. Conclusion: The study identified 8 key genes for the diagnosis and prevention of gastric cancer and established the optimal prognosis model.
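The hybrid selection step described in the abstract (random forest, SVM-RFE, and LASSO, followed by an intersection of their results and an MLP classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the software, hyperparameters, or dataset, so the use of scikit-learn, the synthetic data, the placeholder gene names, `TOP_K`, and the LASSO `alpha` are all hypothetical choices. The variance-analysis pre-filter and the enrichment analysis are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (rows: samples, columns: genes).
# Assumption: the real study used gastric cancer expression data instead.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # placeholder names

# Hold out a test set first so that feature selection never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

TOP_K = 15  # hypothetical number of features kept by each ranking method

# 1) Random forest: keep the TOP_K features with the highest importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
rf_idx = set(np.argsort(rf.feature_importances_)[-TOP_K:].tolist())

# 2) SVM-RFE: recursively drop the weakest feature of a linear SVM.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=TOP_K, step=1)
rfe.fit(X_train, y_train)
svm_idx = set(np.flatnonzero(rfe.support_).tolist())

# 3) LASSO: keep the features whose coefficients are not shrunk to zero.
lasso = Lasso(alpha=0.01).fit(X_train, y_train)
lasso_idx = set(np.flatnonzero(lasso.coef_).tolist())

# Hybrid step: the key feature set is the intersection of the three results.
key_idx = sorted(rf_idx & svm_idx & lasso_idx)
if not key_idx:
    raise ValueError("The three selectors share no features; relax TOP_K or alpha.")
print("Key features:", genes[key_idx].tolist())

# Train an MLP on the key features only and report held-out accuracy.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_train[:, key_idx], y_train)
accuracy = mlp.score(X_test[:, key_idx], y_test)
print(f"MLP test accuracy: {accuracy:.4f}")
```

Taking the intersection keeps only features that all three selectors agree on, which trades recall for a smaller and more stable set. The accuracy printed here reflects the synthetic data and says nothing about the 97.77% reported in the abstract.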