Details
融合OMP和PLS的粮食作物近红外光谱变量选择
Selection of Near-Infrared Spectral Variables of Food Crops Combining Orthogonal Matching Tracking and Partial Least Squares
Document type: Journal article
Chinese title: 融合OMP和PLS的粮食作物近红外光谱变量选择
English title: Selection of Near-Infrared Spectral Variables of Food Crops Combining Orthogonal Matching Tracking and Partial Least Squares
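Authors: 李四海[1]; 朱刚[1]; 刘明奇[1]; 董雯[1]
First author: 李四海
Affiliation: [1] School of Medical Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000
First affiliation: School of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine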
Year: 2025
Volume: 40
Issue: 1
Pages: 220
Chinese journal name: 中国粮油学报
Foreign-language journal name: Journal of the Chinese Cereals and Oils Association
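Indexed in: Peking University Core Journals (北大核心, 2023 edition)
Funding: Gansu Provincial Science and Technology Program project (21JR1RA272); Innovation Fund Project for University Teachers of the Gansu Provincial Department of Education (2023B-105).
Language: Chinese
Chinese keywords: 近红外光谱; 变量选择; 正交匹配追踪; 偏最小二乘; 贝叶斯信息准则
English keywords: near infrared spectroscopy; variable selection; orthogonal matching pursuit; partial least squares; Bayesian information criterion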
Abstract: Orthogonal matching pursuit (OMP), when used for quantitative analysis of near-infrared spectra, tends to produce models with low bias but high variance, selects many variables, and overfits easily. To address these problems, a variable selection method that combines orthogonal matching pursuit with partial least squares regression is proposed: OMPLS (orthogonal matching pursuit based partial least squares regression). OMPLS is a forward variable selection method. It ranks the importance of spectral variables by the absolute values of their OMP regression coefficients, then uses partial least squares regression and the Bayesian information criterion (BIC) to identify the important variables among the remaining spectral variables, and finally yields an optimal variable set that satisfies a given requirement on the number of variables. Variable selection experiments were conducted on the corn dataset and the wheat kernels dataset, and the three methods PLS, OMP, and OMPLS were compared by the number of selected variables, the root mean square error of calibration (RMSEC), and the root mean square error of prediction (RMSEP). On both datasets, OMPLS selected fewer variables and gave a lower RMSEP than OMP, indicating some improvement in the generalization ability of the model. By using BIC as the model selection criterion, OMPLS balances model complexity against predictive ability. Compared with OMP, it further reduces the number of selected variables, guards against overfitting, and improves the predictive ability and interpretability of the model.
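The abstract describes the procedure only in outline, so the following is a minimal NumPy sketch of one possible reading of it, not the authors' implementation. Assumptions made here for illustration: the function names (`omp_ranking`, `pls1_rss`, `bic`, `ompls_sketch`) and parameters (`n_candidates`, `max_vars`, `n_components`) are hypothetical; BIC is taken in the common Gaussian form n·ln(RSS/n) + k·ln(n) with k equal to the number of selected variables; and a candidate is kept only if it lowers the BIC of a PLS1 fit. The paper may define any of these differently.

```python
import numpy as np

def omp_ranking(X, y, n_nonzero):
    """Greedy OMP on centered data; returns chosen columns sorted by |coefficient|."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0                      # guard against constant columns
    residual, support, coef = y.copy(), [], np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(X.T @ residual) / norms    # normalized correlation with residual
        corr[np.array(support, dtype=int)] = -np.inf   # never pick a column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    order = np.argsort(-np.abs(coef))
    return [support[i] for i in order]

def pls1_rss(X, y, n_components):
    """Residual sum of squares of a PLS1 (NIPALS) fit on the training data."""
    Xk = X - X.mean(axis=0)
    r = y - y.mean()
    for _ in range(n_components):
        w = Xk.T @ r
        w_norm = np.linalg.norm(w)
        if w_norm < 1e-12:
            break
        t = Xk @ (w / w_norm)                    # score vector
        tt = t @ t
        if tt < 1e-12:
            break
        Xk = Xk - np.outer(t, Xk.T @ t / tt)     # deflate X
        r = r - (r @ t / tt) * t                 # deflate y
    return float(r @ r)

def bic(rss, n, k):
    """Assumed Gaussian BIC: n*ln(RSS/n) + k*ln(n)."""
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n)

def ompls_sketch(X, y, n_candidates=10, max_vars=5, n_components=2):
    """Forward selection over OMP-ranked variables, accepting only BIC improvements."""
    ranked = omp_ranking(X - X.mean(axis=0), y - y.mean(), n_candidates)
    n = len(y)
    selected, best = [], np.inf
    for j in ranked:
        if len(selected) == max_vars:            # the "given number" limit
            break
        trial = selected + [j]
        score = bic(pls1_rss(X[:, trial], y, min(n_components, len(trial))), n, len(trial))
        if score < best:
            selected, best = trial, score
    return selected, best
```

Calling `ompls_sketch(X, y)` with a samples-by-wavelengths matrix `X` and a reference-value vector `y` returns the retained column indices and the final BIC. Comparing against plain OMP would additionally need RMSEC and RMSEP computed on separate calibration and prediction sets, which this sketch leaves out.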