Details

正交匹配追踪算法的近红外光谱定量分析 (indexed in SCI-EXPANDED and EI)

Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm

Document type: Journal article

Chinese title: 正交匹配追踪算法的近红外光谱定量分析

English title: Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm

Authors: 李四海 (Li Sihai) [1]; 刘东玲 (Liu Dongling) [2]

First author: 李四海 (Li Sihai)

Corresponding author: Li, SH [1]

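Affiliations: [1] College of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, Gansu, China; [2] College of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, Gansu, China

First affiliation: College of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine

Corresponding affiliation: [1] (corresponding author) Gansu Univ Chinese Med, Coll Informat Engn, Lanzhou 730000, Peoples R China | [10735] Gansu University of Chinese Medicine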

Year: 2021

Volume: 41

Issue: 4

Pages: 1097

Chinese journal name: 光谱学与光谱分析

Foreign-language journal name: Spectroscopy and Spectral Analysis

Indexed in: CSTPCD; EI (accession no. 20211610215799); Scopus; WOS: SCI-EXPANDED (accession no. WOS:000646723300017); Peking University Core Journals (北大核心2020); CSCD (CSCD2021_2022); PubMed

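Funding: Supported by the National Natural Science Foundation of China (81603407); the Natural Science Foundation of Gansu Province (1506RJZA046); the Lanzhou Science and Technology Plan Project (2018-3-41); and the Open Fund of the Provincial Key Laboratory of Chemistry and Quality Research of Traditional Chinese (Tibetan) Medicine in Gansu Universities (zzy-2018-05).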

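Language: Chinese

Chinese keywords: 近红外光谱 (near infrared spectroscopy); 变量选择 (variable selection); 压缩感知 (compressed sensing); 偏最小二乘 (partial least squares); 正交匹配追踪 (orthogonal matching pursuit)

Foreign-language keywords: Near infrared spectroscopy; Variable selection; Compressed sensing; Partial least squares; Orthogonal matching pursuit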

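Abstract: Compressed sensing (CS) is an emerging signal compression and sampling technique, and orthogonal matching pursuit (OMP) is a greedy pursuit algorithm widely used for sparse signal reconstruction in compressed sensing. Near-infrared spectral signals are high-dimensional, small-sample data with a sparsity prior. To further improve the flexibility and reliability of variable selection for small-sample near-infrared spectra, a novel spectral variable selection method based on compressed sensing theory, Orthogonal Matching Pursuit Based Variable Selection (OMPBVS), is proposed. By sparsely reconstructing the original spectral signal, OMPBVS shrinks the regression coefficients of the vast majority of variables to 0 and thereby performs spectral variable selection indirectly. Specifically, the spectral matrix serves as the sensing matrix and the predicted variable as the observation; the algorithm iteratively computes the inner products between the residual and the atoms and selects the atom with the largest inner product. At each iteration the signal is projected onto the subspace spanned by all atoms selected so far, and the coefficients of all selected atoms are updated so that the resulting residual is orthogonal to every selected atom. The residual computation is in essence a Gram-Schmidt orthogonalization, and the orthogonal projection reduces the number of iterations while preserving reconstruction accuracy. OMPBVS can reduce the spectral dimension to the scale of the sample size, and its variable selection ability is comparable to that of LASSO; compared with LASSO, however, OMPBVS optimizes its loss function by forward selection, which reduces the number of iterations and allows the number of selected variables to be controlled exactly. Variable selection experiments were carried out on the beer dataset and the Wheat kernels dataset to compare six methods: PLS, MCUVE-PLS, CARS-PLS, WMSCVS, LASSOLarsCV and OMPBVS. The beer dataset contains 60 samples, split by the Kennard Stone (KS) method into 36 training samples and 24 test samples, with Original extract concentration as the predicted variable. The Wheat kernels dataset contains 523 samples, 415 for training and 108 for testing, with protein content as the predicted value. On the beer dataset OMPBVS selected 2 variables with an RMSEC of 0.2052 and an RMSEP of 0.1598; on the Wheat kernels dataset it selected 9 variables with an RMSEC of 0.4502 and an RMSEP of 0.4125. Its variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective method for near-infrared spectral variable selection and quantitative analysis. OMPBVS generalizes well with small samples, reduces the number of selected variables and improves the robustness of variable selection. In addition, spectral preprocessing methods such as SNV and MSC can reduce the number of selected variables to some extent and improve model interpretability.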
Compressed sensing (CS) is a new technology for signal compression and sampling. Orthogonal Matching Pursuit (OMP), a greedy pursuit algorithm, is widely used for sparse signal reconstruction in the compressed sensing field. Given that near-infrared spectral signals are high-dimensional, small-sample data with a sparsity prior, a novel near-infrared spectral variable selection method named Orthogonal Matching Pursuit Based Variable Selection (OMPBVS) is proposed, based on compressed sensing theory, to further improve the flexibility and reliability of near-infrared spectral variable selection. By sparse reconstruction of the original spectral signal, OMPBVS compresses the regression coefficients of most variables to zero and thereby indirectly realizes the selection of spectral variables. In the specific process, the spectral matrix is adopted as the sensing matrix and the predictive variable as the observation variable; the inner products of the residual and the atoms are calculated iteratively, and the atom with the largest inner product is chosen. During each iteration, the signal is projected onto the subspace spanned by all selected atoms, and the coefficients of all selected atoms are then updated so that the residual is orthogonal to all selected atoms. The residual calculation is in essence a Gram-Schmidt orthogonalization, and the orthogonal projection can reduce the number of iterations while ensuring the accuracy of signal reconstruction. OMPBVS can reduce the spectral dimension to the sample size scale, and its variable selection capability is comparable to that of LASSO. However, compared with LASSO, the optimization method of the OMPBVS loss function is a forward selection algorithm, which reduces the number of iterations and can precisely control the number of selected variables. Variable selection experiments were performed on the beer dataset and the Wheat kernels dataset to compare the performance of six variable selection methods: PLS, MCUVE, CARS, WMSCVS, LASSOLarsCV, and OMPBVS. The beer dataset contained 60 samples, divided by the Kennard Stone (KS) method into 36 training samples and 24 test samples, and the prediction variable was Original extract concentration. The Wheat kernels dataset consisted of 523 samples, with 415 training samples and 108 test samples, and the predicted value was protein content. On the beer dataset, the number of variables selected by OMPBVS, the RMSEC and the RMSEP were 2, 0.2052 and 0.1598, respectively. On the Wheat kernels dataset, the number of selected variables, the RMSEC and the RMSEP were 9, 0.4502 and 0.4125, respectively. The variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective NIR spectral variable selection and quantitative analysis method. The OMPBVS variable selection method has good generalization ability in the case of small samples; it can reduce the number of selected variables and improve the robustness of variable selection. Besides, spectral preprocessing methods such as SNV and MSC can reduce the number of selected variables to a certain extent and improve the interpretability of the model.
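The iteration described in the abstract (correlate the residual with every atom, pick the strongest, refit all selected atoms by least squares so that the residual is orthogonal to them) can be illustrated with a minimal NumPy sketch. This is not the authors' OMPBVS implementation: the function name `omp_select`, the centring and column normalisation, the stopping tolerance and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def omp_select(X, y, n_nonzero):
    """Greedy OMP sketch: choose up to n_nonzero columns (atoms) of X to fit y.

    Returns (selected column indices, full coefficient vector, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: centre the data and scale columns to unit norm so that
    # inner products with the residual are comparable across variables.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    norms = np.linalg.norm(X - x_mean, axis=0)
    norms[norms == 0] = 1.0
    A = (X - x_mean) / norms
    target = y - y_mean

    residual = target.copy()
    selected = []
    coef_sel = np.zeros(0)
    for _ in range(n_nonzero):
        # Inner product of the residual with every atom.
        corr = np.abs(A.T @ residual)
        if selected:
            corr[selected] = 0.0  # never pick the same atom twice
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:  # nothing left to explain (hypothetical tolerance)
            break
        selected.append(j)
        # Least-squares refit on all selected atoms: this is the orthogonal
        # projection that makes the residual orthogonal to every selected atom.
        coef_sel, *_ = np.linalg.lstsq(A[:, selected], target, rcond=None)
        residual = target - A[:, selected] @ coef_sel

    # Map coefficients back to the original (unscaled) variables.
    coef = np.zeros(X.shape[1])
    coef[selected] = coef_sel / norms[selected]
    intercept = y_mean - x_mean @ coef
    return selected, coef, intercept


# Synthetic "high-dimensional, small-sample" demo (made-up data, not the
# beer or Wheat kernels datasets): 40 samples, 100 variables, 2 informative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 100))
y_demo = 2.0 * X_demo[:, 5] - 3.0 * X_demo[:, 17] + 1.0

selected, coef, intercept = omp_select(X_demo, y_demo, n_nonzero=2)
rmse = np.sqrt(np.mean((X_demo @ coef + intercept - y_demo) ** 2))
print("selected variables:", sorted(selected))
print("calibration RMSE:", rmse)
```

Because the number of iterations is passed in directly, the number of selected variables is controlled exactly, which is the property the abstract contrasts with LASSO. In practice a library routine such as scikit-learn's `OrthogonalMatchingPursuit` would likely be preferred over a hand-written loop.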

