TY - JOUR
T1 - Regularity Properties for Sparse Regression
T2 - A tribute to Professor Xiru Chen
AU - Dobriban, Edgar
AU - Fan, Jianqing
N1 - Funding Information:
Fan’s research was partially supported by NIH Grants R01GM100474-04 and NIH R01-GM072611-10 and NSF Grants DMS-1206464 and DMS-1406266. The bulk of the research was carried out while Edgar Dobriban was an undergraduate student at Princeton University.
Publisher Copyright:
© 2016, School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag Berlin Heidelberg.
PY - 2016/3/1
Y1 - 2016/3/1
N2 - Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and ℓq sensitivity properties. However, some of the central aspects of these conditions are not well understood. For instance, it is unknown if these conditions can be checked efficiently on any given dataset. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, ℓq sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.
KW - Computational complexity
KW - High-dimensional statistics
KW - Restricted eigenvalue
KW - Sparse regression
KW - ℓq sensitivity
UR - http://www.scopus.com/inward/record.url?scp=84976406289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976406289&partnerID=8YFLogxK
U2 - 10.1007/s40304-015-0078-6
DO - 10.1007/s40304-015-0078-6
M3 - Article
C2 - 27330929
AN - SCOPUS:84976406289
SN - 2194-6701
VL - 4
SP - 1
EP - 19
JO - Communications in Mathematics and Statistics
JF - Communications in Mathematics and Statistics
IS - 1
ER -