Feature selection for pairwise scoring kernels with applications to protein subcellular localization

Sun-Yuan Kung, Man Wai Mak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

In biological sequence classification, it is common to convert variable-length sequences into fixed-length vectors via pairwise sequence comparison. This pairwise approach, however, can lead to feature vectors with dimension equal to the training set size, causing the curse of dimensionality. This calls for feature selection methods that can weed out irrelevant features to reduce training and recognition time. In this paper, we propose to train an SVM using the full-feature column vectors of a pairwise scoring matrix and select the relevant features based on the support vectors of the SVM. The idea stems from the fact that pairwise scoring matrices are symmetric and support vectors are important for classification. We refer to this approach as vector-index-adaptive SVM (VIA-SVM). We compare VIA-SVM with other feature selection schemes - including SVMRFE, R-SVM, and a filter method based on symmetric divergence (SD) - in protein subcellular localization. Results show that VIA-SVM is able to automatically bound the number of selected features within a small range. We also found that fusion of VIA-SVM and SD can produce more compact feature subsets without decreasing prediction accuracy, and that while VIA-SVM is superior for large feature-set size, the combination of SD and VIA-SVM performs better at small feature-set size.
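The core idea in the abstract can be illustrated with a minimal sketch. Because the pairwise scoring matrix is symmetric, the i-th training vector and the i-th feature share an index, so the indices of the support vectors can double as the indices of the selected features. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy k-mer score stands in for a real pairwise alignment score, the small bias-free dual coordinate-descent solver stands in for a full SVM package, and the rule "keep every index with a nonzero multiplier" stands in for the paper's actual selection criterion, which the abstract does not spell out.

```python
def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def pairwise_score(a, b, k=2):
    """Toy symmetric similarity (assumption): number of shared k-mers."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(min(n, cb.get(kmer, 0)) for kmer, n in ca.items()))


def scoring_matrix(seqs):
    """N x N symmetric matrix; row i (= column i) is the feature vector of seqs[i]."""
    return [[pairwise_score(a, b) for b in seqs] for a in seqs]


def train_linear_svm_dual(X, y, C=1.0, epochs=200):
    """Dual coordinate descent for a linear SVM without a bias term.

    Returns the dual multipliers; alpha[i] > 0 marks a support vector.
    """
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    sq_norm = [sum(v * v for v in x) for x in X]
    for _ in range(epochs):
        for i in range(n):
            if sq_norm[i] == 0.0:
                continue  # an all-zero vector cannot change the weights
            grad = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            new_alpha = min(max(alpha[i] - grad / sq_norm[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, X[i])]
                alpha[i] = new_alpha
    return alpha


def via_select(S, labels, C=1.0, tol=1e-9):
    """Use support-vector indices as selected feature indices (simplified rule)."""
    alpha = train_linear_svm_dual(S, labels, C)
    return [i for i, a in enumerate(alpha) if a > tol]


def reduce_features(S, selected):
    """Keep only the selected columns of the scoring matrix."""
    return [[row[j] for j in selected] for row in S]


# Hypothetical toy data: three sequences per class, labels in {+1, -1}.
toy_seqs = ["MKTAYIAKQR", "MKTLYIAKQR", "MKTAYLAKQR",
            "GSSHHHHHHS", "GSSHHHHGHS", "GSAHHHHHHS"]
toy_labels = [1, 1, 1, -1, -1, -1]

S = scoring_matrix(toy_seqs)
selected = via_select(S, toy_labels)
reduced = reduce_features(S, selected)
print("selected feature indices:", selected)
print("reduced dimension:", len(reduced[0]), "of", len(S))
```

In this sketch the number of retained features is not fixed in advance; it follows from how many training vectors end up as support vectors, which is one way to read the abstract's remark that the method bounds the feature count automatically.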

Original language: English (US)
Title of host publication: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
DOIs: https://doi.org/10.1109/ICASSP.2007.366299
State: Published - Aug 6 2007
Event: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07 - Honolulu, HI, United States
Duration: Apr 15 2007 to Apr 20 2007

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2
ISSN (Print): 1520-6149

Other

Other: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Country: United States
City: Honolulu, HI
Period: 4/15/07 to 4/20/07

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Kung, S-Y., & Mak, M. W. (2007). Feature selection for pairwise scoring kernels with applications to protein subcellular localization. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07 [4217472] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2). https://doi.org/10.1109/ICASSP.2007.366299