## Abstract

Variable selection is an important and practical problem that arises in the analysis of many high-dimensional datasets. Convex optimization procedures that arise from relaxing the NP-hard subset selection procedure, e.g., the Lasso or Dantzig selector, have become the focus of intense theoretical investigations. Although many efficient algorithms exist that solve these problems, finding a solution when the number of variables is large, e.g., several hundreds of thousands in problems arising in genome-wide association analysis, is still computationally challenging. A practical solution for these high-dimensional problems is marginal regression, where the output is regressed on each variable separately. We investigate theoretical properties of marginal regression in a multitask framework. Our contributions include: (i) a sharp analysis of marginal regression in a single-task setting with random design, (ii) sufficient conditions for multitask screening to select the relevant variables, and (iii) a lower bound on the Hamming distance convergence for multitask variable selection problems. A simulation study further demonstrates the performance of marginal regression.
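As a rough illustration of the single-task procedure described above, the sketch below regresses the output on each variable separately and keeps the variables with the largest absolute marginal scores. The function name `marginal_regression_screen`, the choice to keep a fixed number `k` of variables, and the synthetic Gaussian design are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np


def marginal_regression_screen(X, y, k):
    """Return indices of the k variables with the largest marginal scores.

    Hypothetical helper for illustration; keeping the top k variables is an
    assumed selection rule, not necessarily the one analyzed in the paper.
    """
    Xc = X - X.mean(axis=0)  # center each column
    yc = y - y.mean()        # center the response
    col_norms = np.linalg.norm(Xc, axis=0)
    # |x_j^T y| / ||x_j|| is proportional to the absolute marginal correlation
    # between variable j and the output, computed one variable at a time.
    scores = np.abs(Xc.T @ yc) / col_norms
    top_k = np.argsort(scores)[-k:]
    return np.sort(top_k), scores


# Synthetic example (assumed setup): 3 relevant variables out of 2000.
rng = np.random.default_rng(0)
n, p = 500, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 5.0
y = X @ beta + rng.standard_normal(n)

selected, scores = marginal_regression_screen(X, y, k=3)
print(selected)
```

Each score needs only one pass over a single column, so the total cost is on the order of the size of the design matrix, which is what makes this kind of screening attractive when the number of variables is very large.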

Original language | English (US) |
---|---|
Pages (from-to) | 647-655 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 22 |
State | Published - Jan 1 2012 |
Externally published | Yes |

## All Science Journal Classification (ASJC) codes

- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence