Details

基于随机森林算法的秦艽龙胆苦苷含量快速检测    

Rapid detection of gentiopicin content in Gentiana macrophylla based on random forest algorithm

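Document type: Journal article

Chinese title: 基于随机森林算法的秦艽龙胆苦苷含量快速检测

English title: Rapid detection of gentiopicin content in Gentiana macrophylla based on random forest algorithm

Authors: Chen Jianguo (陈建国)[1]; Li Sihai (李四海)[1]

First author: Chen Jianguo (陈建国)

Affiliation: [1] School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou, Gansu 730000

First affiliation: School of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine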

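Year: 2023

Volume: 58

Issue: 6

Pages: 257

Journal name (Chinese): 甘肃农业大学学报

Journal name (English): Journal of Gansu Agricultural University

Indexed in: CSTPCD; CSCD [CSCD_E2023_2024]

Funding: Gansu Provincial Science and Technology Program (21JR1RA272); Lanzhou Science and Technology Program (2018-3-41).

Language: Chinese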

Chinese keywords: 秦艽; 近红外光谱; 龙胆苦苷; 随机森林; 小波变换

English keywords: Gentiana macrophylla; near infrared spectroscopy; gentiopicrin; random forest; wavelet transform

Abstract:
【Objective】To achieve rapid, accurate, and non-destructive determination of gentiopicrin content in Gentiana macrophylla using near-infrared (NIR) spectroscopy and a random forest algorithm.
【Method】The gentiopicrin content of G. macrophylla was measured by HPLC. The raw spectra were preprocessed by orthogonal signal correction combined with wavelet compression, and the extracted wavelet coefficients were used as spectral features to build a random forest quantitative analysis model relating the NIR spectra to gentiopicrin content. The prediction results of four models were compared.
【Result】With orthogonal signal correction of the raw spectra alone, the partial least squares regression model gave a root mean square error of prediction (RMSEP) of 0.2469 and a coefficient of determination (R²) of 0.9368 on the validation set, and the random forest model gave an RMSEP of 0.2075 and an R² of 0.9695. With orthogonal signal correction followed by discrete wavelet decomposition and extraction of 63 mid- and low-frequency wavelet coefficients, the partial least squares regression model gave an RMSEP of 0.2126 and an R² of 0.9503 on the validation set, and the random forest model gave an RMSEP of 0.1663 and an R² of 0.9804.
【Conclusion】Wavelet multi-scale decomposition reduced the correlation between decision trees and further improved the generalization ability and robustness of the random forest quantitative analysis model. The model can be used for rapid and accurate determination of gentiopicrin content in G. macrophylla.
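
The abstract names two steps that are easy to illustrate in isolation: compressing a spectrum into a small set of mid- and low-frequency wavelet coefficients, and scoring predictions with RMSEP and R². The following is a minimal sketch only, not the authors' implementation. The abstract does not state the wavelet family, the decomposition level, or the exact R² definition, so the Haar wavelet, the `levels` parameter, the choice to keep the final approximation plus the coarsest detail coefficients, and the 1 - SS_res/SS_tot form of R² are all assumptions made for illustration. The function names are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_compress(spectrum, levels):
    """Assumed feature extraction: run `levels` Haar steps and keep the final
    approximation (low frequency) plus the coarsest detail (mid frequency)."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    approx = [float(x) for x in spectrum]
    detail = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
    return approx + detail


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a validation set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic 64-point "spectrum", purely illustrative (not data from the paper).
    spectrum = [float(i % 7) for i in range(64)]
    features = wavelet_compress(spectrum, levels=3)
    print(len(features), "wavelet features")

    # Made-up reference and predicted values, only to exercise the metrics.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print("RMSEP:", rmsep(y_true, y_pred))
    print("R2:", r_squared(y_true, y_pred))
```

In a full pipeline, the compressed feature vectors would be the inputs to the regression model (partial least squares or random forest), and the two metrics would be computed on held-out validation samples.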

