High-dimensional classification using features annealed independence rules

Jianqing Fan, Yingying Fan

Research output: Contribution to journal › Article

267 Scopus citations

Abstract

Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classifications is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989-1010] show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as the random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as the random guessing. Thus, it is important to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
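
The procedure outlined in the abstract (rank features by a two-sample t-statistic, keep a subset, then apply the independence rule to that subset) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the unequal-variance form of the t-statistic, the pooled per-feature variance, and the user-supplied number of features `m` are all assumptions made for illustration; the paper's data-driven choice of the number of features from an error bound is not reproduced here.

```python
import numpy as np


def two_sample_t(X1, X2):
    """Per-feature two-sample t-statistics (unequal-variance form, an assumption)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    mean_diff = X1.mean(axis=0) - X2.mean(axis=0)
    std_err = np.sqrt(X1.var(axis=0, ddof=1) / n1 + X2.var(axis=0, ddof=1) / n2)
    return mean_diff / std_err


def fit_fair_sketch(X1, X2, m):
    """Keep the m features with the largest |t| and estimate the quantities
    needed by the independence (diagonal) rule on those features.

    m is a hypothetical user-chosen parameter in this sketch.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    t = two_sample_t(X1, X2)
    keep = np.argsort(-np.abs(t))[:m]          # indices of the m largest |t|
    Z1, Z2 = X1[:, keep], X2[:, keep]
    mu1, mu2 = Z1.mean(axis=0), Z2.mean(axis=0)
    # Pooled per-feature variance: only the diagonal of the covariance is used.
    var = ((n1 - 1) * Z1.var(axis=0, ddof=1)
           + (n2 - 1) * Z2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return keep, mu1, mu2, var


def predict_fair_sketch(model, X):
    """Assign class 1 when the diagonal linear discriminant score is positive,
    class 2 otherwise."""
    keep, mu1, mu2, var = model
    Z = X[:, keep]
    score = ((Z - (mu1 + mu2) / 2.0) * (mu1 - mu2) / var).sum(axis=1)
    return np.where(score > 0, 1, 2)
```

With `X1` and `X2` as arrays of shape (samples, features) for the two classes, `fit_fair_sketch(X1, X2, m)` returns the selected feature indices together with the class centroids and variances, and `predict_fair_sketch` labels new rows. Setting `m` to the full number of features recovers the plain independence rule, which the abstract argues can suffer from noise accumulation.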

Original language: English (US)
Pages (from-to): 2605-2637
Number of pages: 33
Journal: Annals of Statistics
Volume: 36
Issue number: 6
State: Published - Dec 2008

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Classification
  • Feature extraction
  • High dimensionality
  • Independence rule
  • Misclassification rates
