Optimality driven nearest centroid classification from genomic data

Alan R. Dabney, John D. Storey

Research output: Contribution to journal (Article)

23 Scopus citations

Abstract

Nearest-centroid classifiers have recently been employed successfully in high-dimensional applications, such as genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that is instead based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest-centroid classifiers.
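
The contrast between scoring features one at a time and scoring a subset as a whole can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names are hypothetical, the subset score is plain training accuracy (a stand-in for the paper's optimality criterion), distances are unscaled squared Euclidean, and centroids are simple class means with no shrinkage.

```python
from statistics import mean

def centroids(X, y, features):
    """Per-class mean of each selected feature (the maximum likelihood centroid)."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [mean(r[j] for r in rows) for j in features]
    return cents

def predict(x, cents, features):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    # Iterate over sorted labels so that ties are broken deterministically.
    return min(sorted(cents), key=lambda label: dist(cents[label]))

def accuracy(X, y, features):
    """Assumed subset score: training accuracy using only the given features."""
    cents = centroids(X, y, features)
    hits = sum(predict(x, cents, features) == lab for x, lab in zip(X, y))
    return hits / len(y)

def greedy_select(X, y, k):
    """Forward selection: at each step add the feature that gives the best subset score."""
    selected = []
    remaining = list(range(len(X[0])))
    for _ in range(k):
        best = max(remaining, key=lambda j: accuracy(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A call such as `greedy_select(X, y, 2)`, with `X` a list of samples (each a list of feature values) and `y` the matching class labels, returns the indices of the chosen features in the order they were added. Because each candidate is scored together with the features already selected, a feature that looks useful on its own can still be passed over if it hurts the subset.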

Original language: English (US)
Article number: e1002
Journal: PLoS ONE
Volume: 2
Issue number: 10
DOIs
State: Published - Oct 3 2007

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)
  • General
