TY - GEN
T1 - Truncation of protein sequences for fast profile alignment with application to subcellular localization
AU - Mak, Man Wai
AU - Wang, Wei
AU - Kung, Sun Yuan
PY - 2010
Y1 - 2010
N2 - We have recently found that the computation time of homology-based subcellular localization can be substantially reduced by aligning profiles up to the cleavage site positions of signal peptides, mitochondrial targeting peptides, and chloroplast transit peptides [1]. While the method can reduce the profile alignment time by as much as 20-fold, it cannot reduce the computation time spent on creating the profiles. In this paper, we propose a new approach that can reduce both the profile creation time and the profile alignment time. In the new approach, instead of cutting the profiles, we shorten the sequences by cutting them at the cleavage site locations. The shortened sequences are then presented to PSI-BLAST to compute the profiles. Experimental results and analysis of profile-alignment score matrices suggest that both profile creation time and profile alignment time can be reduced without sacrificing subcellular localization accuracy. Once a pairwise profile-alignment score matrix has been obtained, a one-vs-rest SVM classifier can be trained. To further reduce the training and recognition time of the classifier, we propose a perturbation discriminant analysis (PDA) technique. It was found that PDA has a shorter training time than the conventional SVM.
AB - We have recently found that the computation time of homology-based subcellular localization can be substantially reduced by aligning profiles up to the cleavage site positions of signal peptides, mitochondrial targeting peptides, and chloroplast transit peptides [1]. While the method can reduce the profile alignment time by as much as 20-fold, it cannot reduce the computation time spent on creating the profiles. In this paper, we propose a new approach that can reduce both the profile creation time and the profile alignment time. In the new approach, instead of cutting the profiles, we shorten the sequences by cutting them at the cleavage site locations. The shortened sequences are then presented to PSI-BLAST to compute the profiles. Experimental results and analysis of profile-alignment score matrices suggest that both profile creation time and profile alignment time can be reduced without sacrificing subcellular localization accuracy. Once a pairwise profile-alignment score matrix has been obtained, a one-vs-rest SVM classifier can be trained. To further reduce the training and recognition time of the classifier, we propose a perturbation discriminant analysis (PDA) technique. It was found that PDA has a shorter training time than the conventional SVM.
KW - Cleavage sites prediction
KW - Kernel discriminant analysis
KW - Profiles alignment
KW - Protein sequences
KW - SVM
KW - Subcellular localization
UR - http://www.scopus.com/inward/record.url?scp=79952399893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952399893&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2010.5706548
DO - 10.1109/BIBM.2010.5706548
M3 - Conference contribution
AN - SCOPUS:79952399893
SN - 9781424483075
T3 - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
SP - 115
EP - 120
BT - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
Y2 - 18 December 2010 through 21 December 2010
ER -