甘肃省不同地区糖尿病肾脏疾病的机器学习预测模型的研究
Machine learning prediction model of diabetic kidney disease in different regions of Gansu province
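Document type: Journal article
Chinese title: 甘肃省不同地区糖尿病肾脏疾病的机器学习预测模型的研究
English title: Machine learning prediction model of diabetic kidney disease in different regions of Gansu province
Authors: Yang Jianning (杨建宁)[1]; Hong Doudou (洪豆豆)[1]; Li Yang (李杨)[2]; Yu Jing (余静)[3]; Yang Fan (杨帆)[1]; Wen Ziying (温子英)[1]; Qiao Wenjun (乔文俊)[4]; Liu Jing (刘静)[2]; Zhang Qi (张琦)[3]
First author: Yang Jianning (杨建宁)
Affiliations: [1] First Clinical Medical College, Gansu University of Chinese Medicine, Lanzhou 730000; [2] Center for Endocrinology and Metabolism Diagnosis and Treatment, Gansu Provincial People's Hospital; [3] Department of Geriatrics, Gansu Provincial People's Hospital; [4] First Clinical Medical College, Ningxia Medical University
First affiliation: Clinical Medical College, Gansu University of Chinese Medicine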
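Year: 2025
Volume: 33
Issue: 1
Pages: 8
Journal (Chinese): 中国糖尿病杂志
Journal (English): Chinese Journal of Diabetes
Indexed in: Peking University Core Journals (2023 edition)
Funding: National Natural Science Foundation of China (81960173, 82160166); Gansu Provincial Key Research and Development Program (22YF7FA096); Gansu Provincial People's Hospital Intramural Research Fund (22GSSYA-1); Science and Technology Development Project of the Lanzhou Municipal Health Commission (2021005).
Language: Chinese
Keywords: Diabetic kidney disease; Diabetes mellitus, type 2; Machine learning; Prediction model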
Abstract: Objective To construct machine learning (ML) prediction models for diabetic kidney disease (DKD) in patients with type 2 diabetes mellitus (T2DM) in the plain-sand and loess hilly areas of Gansu Province, and to analyze the interpretability of the models. Methods Multi-stage stratified random sampling was used to collect data on T2DM patients in the two areas. After key feature screening, eight ML models were constructed to predict the risk of DKD. The models were evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 index, and were interpreted with the Shapley additive explanations (SHAP) algorithm. Results A total of 1599 patients with T2DM were enrolled. After feature screening, 10 variables were included in the models for the plain-sand area; among the eight models, the gradient boosting decision tree (GBDT) model had the highest predictive performance, with a test-set AUC of 0.972, accuracy of 0.949 and F1 index of 0.884. In the loess hilly area, 12 variables were included and the best model was the random forest (RF), with a test-set AUC of 0.966, accuracy of 0.951 and F1 index of 0.861. SHAP analysis showed that, in addition to serum creatinine, age, LDL-C, HbA1c and DM duration, serum uric acid and urinary microalbumin were also associated with a high risk of DKD. Conclusions The GBDT and RF models show good predictive performance for the occurrence of DKD in the two areas and can be used to screen high-risk populations and to explore potential risk factors in depth.
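For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, the F1 index and the ROC AUC can be computed for a binary classifier. It is not the authors' code: the labels, scores and the 0.5 decision threshold below are made-up toy values chosen only for illustration, and the function names are hypothetical.

```python
# Toy illustration of accuracy, F1 and ROC AUC for a binary DKD / no-DKD label.
# All data here are synthetic assumptions, not values from the study.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_index(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as one half); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = DKD, 0 = no DKD.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]  # predicted risk
y_pred = [1 if s >= 0.5 else 0 for s in y_score]      # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_index(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
```

AUC depends only on how the model ranks patients, whereas accuracy and F1 depend on the chosen threshold, which is one reason a study may report all three together.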