TY - JOUR
T1 - Group subset selection for linear regression
AU - Guo, Yi
AU - Berman, Mark
AU - Gao, Junbin
PY - 2014
Y1 - 2014
N2 - Two fast group subset selection (GSS) algorithms for the linear regression model are proposed in this paper. GSS finds the best combinations of groups up to a specified size minimising the residual sum of squares. This imposes an l0 constraint on the regression coefficients in a group context. It is a combinatorial optimisation problem with NP complexity. To make the exhaustive search very efficient, the GSS algorithms are built on QR decomposition and branch-and-bound techniques. They are suitable for middle scale problems where finding the most accurate solution is essential. In the application motivating this research, it is natural to require that the coefficients of some of the variables within groups satisfy some constraints (e.g. non-negativity). Therefore the GSS algorithms (optionally) calculate the model coefficient estimates during the exhaustive search in order to screen combinations that do not meet the constraints. The faster of the two GSS algorithms is compared to an extension to the original group Lasso, called the constrained group Lasso (CGL), which is proposed to handle convex constraints and to remove orthogonality requirements on the variables within each group. CGL is a convex relaxation of the GSS problem and hence more straightforward to solve. Although CGL is inferior to GSS in terms of group selection accuracy, it is a fast approximation to GSS if the optimal regularisation parameter can be determined efficiently and, in some cases, it may serve as a screening procedure to reduce the number of groups.
AB - Two fast group subset selection (GSS) algorithms for the linear regression model are proposed in this paper. GSS finds the best combinations of groups up to a specified size minimising the residual sum of squares. This imposes an l0 constraint on the regression coefficients in a group context. It is a combinatorial optimisation problem with NP complexity. To make the exhaustive search very efficient, the GSS algorithms are built on QR decomposition and branch-and-bound techniques. They are suitable for middle scale problems where finding the most accurate solution is essential. In the application motivating this research, it is natural to require that the coefficients of some of the variables within groups satisfy some constraints (e.g. non-negativity). Therefore the GSS algorithms (optionally) calculate the model coefficient estimates during the exhaustive search in order to screen combinations that do not meet the constraints. The faster of the two GSS algorithms is compared to an extension to the original group Lasso, called the constrained group Lasso (CGL), which is proposed to handle convex constraints and to remove orthogonality requirements on the variables within each group. CGL is a convex relaxation of the GSS problem and hence more straightforward to solve. Although CGL is inferior to GSS in terms of group selection accuracy, it is a fast approximation to GSS if the optimal regularisation parameter can be determined efficiently and, in some cases, it may serve as a screening procedure to reduce the number of groups.
UR - http://handle.uws.edu.au:8081/1959.7/550452
U2 - 10.1016/j.csda.2014.02.005
DO - 10.1016/j.csda.2014.02.005
M3 - Article
SN - 1872-7352
VL - 75
SP - 39
EP - 52
JO - Computational Statistics & Data Analysis
JF - Computational Statistics & Data Analysis
ER -