TY - GEN
T1 - Speeding up subcellular localization by extracting informative regions of protein sequences for profile alignment
AU - Wang, Wei
AU - Mak, Man Wai
AU - Kung, Sun Yuan
N1 - Copyright:
Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - The functions of proteins are closely related to their subcellular locations. In the post-proteomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. This paper proposes mitigating the computation burden of alignment-based approaches to subcellular localization prediction by using the information provided by the N-terminal sorting signals. To this end, a cascaded fusion of cleavage site prediction and profile alignment is proposed. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor. Then, only the informative segments are applied to a homology-based classifier for predicting the subcellular locations. Experimental results on a newly constructed dataset show that the method can make use of the best property of both approaches and can attain an accuracy higher than using the full-length sequences. Moreover, the method can reduce the computation time by 20 folds. We advocate that the method will be important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on new algorithms that involve pairwise alignments.
AB - The functions of proteins are closely related to their subcellular locations. In the post-proteomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. This paper proposes mitigating the computation burden of alignment-based approaches to subcellular localization prediction by using the information provided by the N-terminal sorting signals. To this end, a cascaded fusion of cleavage site prediction and profile alignment is proposed. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor. Then, only the informative segments are applied to a homology-based classifier for predicting the subcellular locations. Experimental results on a newly constructed dataset show that the method can make use of the best property of both approaches and can attain an accuracy higher than using the full-length sequences. Moreover, the method can reduce the computation time by 20 folds. We advocate that the method will be important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on new algorithms that involve pairwise alignments.
KW - Cleavage sites prediction
KW - Profiles alignment
KW - Protein sequences
KW - Subcellular localization
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=77955645510&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955645510&partnerID=8YFLogxK
U2 - 10.1109/CIBCB.2010.5510320
DO - 10.1109/CIBCB.2010.5510320
M3 - Conference contribution
AN - SCOPUS:77955645510
SN - 9781424467662
T3 - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
SP - 147
EP - 154
BT - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
T2 - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
Y2 - 2 May 2010 through 5 May 2010
ER -