TY - JOUR
T1 - A comparison between subset selection and L1 regularisation with an application in spectroscopy
AU - Guo, Yi
AU - Berman, Mark
PY - 2012
Y1 - 2012
N2 - A reflectance spectrum measures the reflectance of a material at hundreds or thousands of wavelengths. It provides chemical information about the material. Many rock samples actually contain a mixture of minerals. Because of the differing chemical compositions of the component minerals, the spectra of the rock sample often enable us to "unmix" the spectra to identify their mineral components. This is done with the aid of a fairly large library of pure spectra and a relatively simple linear mixture model, but with non-negativity constraints on some of the coefficients in the model. For many years, we have used full subset selection methods to identify the composition of millions of samples. There are several difficulties with this approach, in particular: (i) identifying the composition of large numbers of samples can be relatively slow, and (ii) estimating the number of components in a mixture is not as reliable as we would like, because both the deterministic and stochastic parts of our model are only approximations to reality, and hence classical statistical methods for deciding on the order of a linear model (e.g. F tests, AIC) do not work very well. Hence, ad hoc methods have had to be developed. In the hope of overcoming these difficulties, we have investigated the use of L1 regularisation as an alternative, because it is a convex optimisation problem and therefore there are efficient methods for finding the unique optimum. Moreover, it is straightforward to carry out L1 regularisation incorporating non-negativity constraints on some of the coefficients. Unfortunately, L1 regularisation does not work as well as full subset selection does. We briefly discuss a possible hybrid approach.
UR - http://handle.uws.edu.au:8081/1959.7/550487
U2 - 10.1016/j.chemolab.2012.08.010
DO - 10.1016/j.chemolab.2012.08.010
M3 - Article
SN - 0169-7439
VL - 118
SP - 127
EP - 138
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
ER -