We consider the problem of estimating the joint density of a d-dimensional random vector X = (X_1, X_2, ..., X_d) when d is large. We assume that the density is a product of a parametric component and a nonparametric component which depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called rodeo (regularization of derivative expectation operator), we propose a method to greedily select bandwidths in a kernel density estimate. It is shown empirically that the density rodeo works well even for very high-dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition, and the parametric baseline density is smooth, the approach is shown to achieve near-optimal minimax rates of convergence, and thus avoids the curse of dimensionality.
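To make the idea of greedy bandwidth selection concrete, the following is a minimal sketch of a rodeo-style loop for a product Gaussian kernel density estimate at a single query point. It is an illustration under stated assumptions, not the paper's implementation: it ignores the parametric baseline component entirely, and the function names (`gaussian_product_kde`, `density_rodeo_sketch`), the starting bandwidth `h0`, the shrink factor `beta`, and the threshold constant `c_n = log n` are all choices made here for the example. Each bandwidth starts large and is shrunk only while the derivative of the estimate with respect to that bandwidth is significantly different from zero.

```python
import numpy as np


def gaussian_product_kde(x, X, h):
    """Per-sample product Gaussian kernel weights at x; their mean is the KDE."""
    u = (x - X) / h                                      # shape (n, d)
    k = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * h)
    return k.prod(axis=1)                                # shape (n,)


def density_rodeo_sketch(x, X, h0=1.0, beta=0.9, max_iter=200):
    """Greedy per-dimension bandwidth selection at a query point x.

    Hypothetical sketch: the threshold form and c_n = log(n) are assumptions.
    """
    n, d = X.shape
    h = np.full(d, float(h0))
    active = set(range(d))
    c_n = np.log(n)                                      # assumed constant
    for _ in range(max_iter):
        if not active:
            break
        # Kernel weights at the bandwidths from the start of this sweep.
        w = gaussian_product_kde(x, X, h)
        for j in sorted(active):
            u = x[j] - X[:, j]
            # Per-sample derivative of the product kernel w.r.t. h_j
            # (closed form for the Gaussian kernel).
            z_i = w * (u ** 2 - h[j] ** 2) / h[j] ** 3
            z = z_i.mean()
            s = z_i.std(ddof=1) / np.sqrt(n)             # std. error of z
            lam = s * np.sqrt(2 * np.log(n * c_n))       # assumed threshold
            if abs(z) > lam:
                h[j] *= beta                             # still informative: shrink
            else:
                active.discard(j)                        # freeze this bandwidth
    return h, gaussian_product_kde(x, X, h).mean()
```

In this sketch, a dimension along which the density is locally flat tends to have a derivative indistinguishable from noise, so its bandwidth is frozen early and stays large; that is the sense in which such a procedure can adapt to sparsity.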
Journal of Machine Learning Research
Published - Dec 1 2007
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence