Abstract
Variable selection is an important practical problem that arises in the analysis of many high-dimensional datasets. Convex optimization procedures obtained by relaxing the NP-hard subset selection problem, such as the Lasso or the Dantzig selector, have been the focus of intense theoretical investigation. Although many efficient algorithms exist for solving these problems, finding a solution when the number of variables is large (for example, several hundred thousand in genome-wide association analysis) remains computationally challenging. A practical alternative for such high-dimensional problems is marginal regression, in which the output is regressed on each variable separately. We investigate the theoretical properties of marginal regression in a multitask framework. Our contributions include: i) a sharp analysis of marginal regression in the single-task setting with random design, ii) sufficient conditions under which multitask screening selects the relevant variables, and iii) a lower bound on the rate of convergence in Hamming distance for multitask variable selection problems. A simulation study further demonstrates the performance of marginal regression.
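To make the idea of marginal regression concrete, here is a minimal NumPy sketch; it is not the authors' code. For each task, the output is regressed on every variable separately (one-variable least squares without intercept). The multitask aggregation (summing the squared per-task coefficients), the top-k selection rule, and the function names `marginal_scores`, `marginal_screen` and `hamming_distance` are illustrative assumptions, not the statistic or procedure analyzed in the paper. The Hamming distance here simply counts variables whose selected/not-selected status is wrong.

```python
import numpy as np


def marginal_scores(Xs, ys):
    """Multitask marginal-regression score for every variable.

    Xs: list of T design matrices, each of shape (n_t, p), sharing the same p variables.
    ys: list of T response vectors, each of shape (n_t,).
    For each task, y is regressed on each column alone (least squares, no
    intercept) and the squared coefficient is added to that variable's score.
    Summing squares over tasks is an assumed aggregation rule.
    """
    p = Xs[0].shape[1]
    scores = np.zeros(p)
    for X, y in zip(Xs, ys):
        # One-variable least squares for all columns at once:
        # beta_j = <X_j, y> / <X_j, X_j>
        beta = (X.T @ y) / np.sum(X * X, axis=0)
        scores += beta ** 2
    return scores


def marginal_screen(Xs, ys, k):
    """Hypothetical screening rule: keep the k variables with the largest scores."""
    scores = marginal_scores(Xs, ys)
    return np.sort(np.argsort(scores)[::-1][:k])


def hamming_distance(selected, support, p):
    """Number of variables whose selected/not-selected status is wrong."""
    est = np.zeros(p, dtype=bool)
    true = np.zeros(p, dtype=bool)
    est[list(selected)] = True
    true[list(support)] = True
    return int(np.sum(est != true))


if __name__ == "__main__":
    # Hypothetical simulation: T tasks sharing the same relevant variables.
    rng = np.random.default_rng(0)
    T, n, p = 3, 300, 1000
    support = [0, 1, 2, 3, 4]
    Xs, ys = [], []
    for _ in range(T):
        X = rng.standard_normal((n, p))
        beta = np.zeros(p)
        beta[support] = 2.0
        Xs.append(X)
        ys.append(X @ beta + rng.standard_normal(n))
    selected = marginal_screen(Xs, ys, k=len(support))
    print(selected, hamming_distance(selected, support, p))
```

Each task costs one matrix-vector product, O(np), which is why marginal regression stays cheap even when p is in the hundreds of thousands. Choosing k (or a threshold on the scores) is a tuning decision that this sketch leaves to the caller.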
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 647-655 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 22 |
State | Published - 2012 |
Event | 15th International Conference on Artificial Intelligence and Statistics, AISTATS 2012 - La Palma, Spain |
Duration | Apr 21 2012 → Apr 23 2012 |
All Science Journal Classification (ASJC) codes
- Software
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability