Abstract
Rates of convergence for nearest neighbor estimation are established in a general framework in terms of metric covering numbers of the underlying space. Our first result is to find explicit finite sample upper bounds for the classical independent and identically distributed (i.i.d.) random sampling problem in a separable metric space setting. The convergence rate is a function of the covering numbers of the support of the distribution. For example, for bounded subsets of ℝr, the convergence rate is O (1/n2/r). Our main result is to extend the problem to allow samples drawn from a completely arbitrary random process in a separable metric space and to examine the performance in terms of the individual sample sequences. We show that for every sequence of samples the asymptotic time-average of nearest neighbor risks equals twice the time-average of the conditional Bayes risks of the sequence. Finite sample upper bounds under arbitrary sampling are again obtained in terms of the covering numbers of the underlying space. In particular, for bounded subsets of ℝr the convergence rate of the time-averaged risk is O (1/n2/r). We then establish a consistency result for kn-nearest neighbor estimation under arbitrary sampling and prove a convergence rate matching established rates for i.i.d. sampling. Finally, we show how our arbitrary sampling results lead to some classical i.i.d. sampling results and in fact extend them to stationary sampling. Our framework and results are quite general while the proof techniques are surprisingly elementary.
Original language | English (US) |
---|---|
Pages (from-to) | 1028-1039 |
Number of pages | 12 |
Journal | IEEE Transactions on Information Theory |
Volume | 41 |
Issue number | 4 |
DOIs | |
State | Published - Jul 1995 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Nearest neighbor
- arbitrary sampling
- consistency
- covering numbers
- deterministic sampling
- metric entropy
- nonparametric regression estimation
- rates of convergence
- worst case