TY - JOUR
T1 - mPLR-Loc
T2 - An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction
AU - Wan, Shibiao
AU - Mak, Man Wai
AU - Kung, Sun Yuan
N1 - Publisher Copyright: © 2014 Elsevier Inc. All rights reserved.
PY - 2015/3/15
Y1 - 2015/3/15
AB - Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer).
KW - Adaptive decision
KW - Logistic regression
KW - Multi-label classification
KW - Multi-location proteins
KW - Protein subcellular localization
UR - http://www.scopus.com/inward/record.url?scp=84961290985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961290985&partnerID=8YFLogxK
U2 - 10.1016/j.ab.2014.10.014
DO - 10.1016/j.ab.2014.10.014
M3 - Article
C2 - 25449328
AN - SCOPUS:84961290985
SN - 0003-2697
VL - 473
SP - 14
EP - 27
JO - Analytical Biochemistry
JF - Analytical Biochemistry
ER -