Abstract
The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and amino acid composition, even when most of the homologous sequences have been removed. PairProSVM was evaluated on Huang and Li's and Gardy et al.'s protein data sets. The overall accuracies on these data sets reach 75.3 percent and 91.9 percent, respectively, which are higher than or comparable to those obtained by sequence alignment and composition-based methods.
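The feature construction described in the abstract can be illustrated with a minimal sketch: each sequence is represented by the vector of its alignment scores against every training sequence. This is not the authors' implementation. The function names, the toy sequences, and the scoring parameters are hypothetical, and a plain global alignment on raw sequences stands in for the PSI-BLAST profile alignment used in the paper. The SVM training step is omitted.

```python
# Illustrative sketch only. toy_alignment_score is a hypothetical stand-in
# (Needleman-Wunsch global alignment on raw sequences) for the pairwise
# profile alignment of PSI-BLAST profiles described in the abstract.
def toy_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    # prev holds the scores of the previous row of the dynamic-programming table
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def score_vector(query, training_seqs):
    """One feature per training sequence: the query's alignment score against it."""
    return [toy_alignment_score(query, t) for t in training_seqs]


# Hypothetical toy sequences; a real training set would hold full proteins.
training_seqs = ["MKTAY", "MKTLY", "GGSWP"]

# Row i is the feature vector of training sequence i. These rows, together
# with the location labels, are what an SVM classifier would be trained on.
feature_matrix = [score_vector(s, training_seqs) for s in training_seqs]
```

A test sequence would be mapped the same way, with `score_vector(test_seq, training_seqs)`, so that it lives in the same feature space as the training vectors.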
Original language | English (US) |
---|---|
Article number | 4384576 |
Pages (from-to) | 416-422 |
Number of pages | 7 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 5 |
Issue number | 3 |
State | Published - Jul 2008 |
All Science Journal Classification (ASJC) codes
- Applied Mathematics
- Genetics
- Biotechnology
Keywords
- Kernel methods
- Profile alignment
- Protein subcellular localization
- Sequence alignment
- Support vector machines