TY - GEN
T1 - Feature selection for pairwise scoring kernels with applications to protein subcellular localization
AU - Kung, Sun Yuan
AU - Mak, Man Wai
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - In biological sequence classification, it is common to convert variable-length sequences into fixed-length vectors via pairwise sequence comparison. This pairwise approach, however, can lead to feature vectors with dimension equal to the training set size, causing the curse of dimensionality. This calls for feature selection methods that can weed out irrelevant features to reduce training and recognition time. In this paper, we propose to train an SVM using the full-feature column vectors of a pairwise scoring matrix and select the relevant features based on the support vectors of the SVM. The idea stems from the fact that pairwise scoring matrices are symmetric and support vectors are important for classification. We refer to this approach as vector-index-adaptive SVM (VIA-SVM). We compare VIA-SVM with other feature selection schemes (including SVM-RFE, R-SVM, and a filter method based on symmetric divergence (SD)) in protein subcellular localization. Results show that VIA-SVM is able to automatically bound the number of selected features within a small range. We also found that fusion of VIA-SVM and SD can produce more compact feature subsets without decreasing prediction accuracy, and that while VIA-SVM is superior for large feature-set sizes, the combination of SD and VIA-SVM performs better for small feature-set sizes.
AB - In biological sequence classification, it is common to convert variable-length sequences into fixed-length vectors via pairwise sequence comparison. This pairwise approach, however, can lead to feature vectors with dimension equal to the training set size, causing the curse of dimensionality. This calls for feature selection methods that can weed out irrelevant features to reduce training and recognition time. In this paper, we propose to train an SVM using the full-feature column vectors of a pairwise scoring matrix and select the relevant features based on the support vectors of the SVM. The idea stems from the fact that pairwise scoring matrices are symmetric and support vectors are important for classification. We refer to this approach as vector-index-adaptive SVM (VIA-SVM). We compare VIA-SVM with other feature selection schemes (including SVM-RFE, R-SVM, and a filter method based on symmetric divergence (SD)) in protein subcellular localization. Results show that VIA-SVM is able to automatically bound the number of selected features within a small range. We also found that fusion of VIA-SVM and SD can produce more compact feature subsets without decreasing prediction accuracy, and that while VIA-SVM is superior for large feature-set sizes, the combination of SD and VIA-SVM performs better for small feature-set sizes.
KW - Feature selection
KW - Kernel methods
KW - Pairwise scoring
KW - SVM
KW - Subcellular localization
UR - http://www.scopus.com/inward/record.url?scp=34547525054&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547525054&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2007.366299
DO - 10.1109/ICASSP.2007.366299
M3 - Conference contribution
AN - SCOPUS:34547525054
SN - 1424407281
SN - 9781424407286
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - II569-II572
BT - 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
T2 - 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
Y2 - 15 April 2007 through 20 April 2007
ER -