Abstract
Predicting the localization of chloroplast proteins at the sub-subcellular level is an essential yet challenging step to elucidate their functions. Most of the existing subchloroplast localization predictors are limited to predicting single-location proteins and ignore the multi-location chloroplast proteins. While recent studies have led to some multi-location chloroplast predictors, they usually perform poorly. This paper proposes an ensemble transductive learning method to tackle this multi-label classification problem. Specifically, given a protein in a dataset, its composition-based sequence information and profile-based evolutionary information are respectively extracted. These two kinds of features are respectively compared with those of other proteins in the dataset. The comparisons lead to two similarity vectors, which are combined by weighting to constitute an ensemble feature vector. A transductive learning model based on the least squares and nearest neighbor algorithms is proposed to process the ensemble features. We refer to the resulting predictor as EnTrans-Chlo. Experimental results on a stringent benchmark dataset and a novel dataset demonstrate that EnTrans-Chlo significantly outperforms state-of-the-art predictors and particularly gains more than 4 percent (absolute) improvement in overall actual accuracy. For readers' convenience, EnTrans-Chlo is freely available online at http://bioinfo.eie.polyu.edu.hk/EnTransChloServer/.
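The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the EnTrans-Chlo implementation: the function names, the weight `w`, the neighbor count `k`, the `threshold` rule, the toy similarity values, and the location labels are all assumptions made for illustration. The least-squares component of the published model is omitted, and a simple similarity-weighted nearest-neighbor vote stands in for the multi-label decision.

```python
from typing import Dict, List, Set


def ensemble_features(comp_sim: List[float], prof_sim: List[float], w: float) -> List[float]:
    """Weighted combination of two similarity vectors (hypothetical helper).

    comp_sim: similarities from composition-based sequence features
    prof_sim: similarities from profile-based evolutionary features
    w:        assumed mixing weight in [0, 1]
    """
    if len(comp_sim) != len(prof_sim):
        raise ValueError("similarity vectors must have the same length")
    return [w * c + (1.0 - w) * p for c, p in zip(comp_sim, prof_sim)]


def predict_labels(features: List[float], known_labels: List[Set[str]],
                   k: int, threshold: float) -> Set[str]:
    """Multi-label vote over the k most similar proteins (a stand-in rule).

    Each neighbor adds its similarity to the score of every label it carries.
    A label is predicted if its score reaches `threshold` times the best score.
    """
    # Indices of the k proteins with the highest ensemble similarity.
    top = sorted(range(len(features)), key=lambda i: features[i], reverse=True)[:k]
    scores: Dict[str, float] = {}
    for i in top:
        for label in known_labels[i]:
            scores[label] = scores.get(label, 0.0) + features[i]
    if not scores:
        return set()
    best = max(scores.values())
    return {label for label, s in scores.items() if s >= threshold * best}


# Toy data: similarities of one query protein to four labeled proteins.
comp = [0.9, 0.2, 0.8, 0.1]
prof = [0.7, 0.4, 0.6, 0.3]
labels = [{"thylakoid"}, {"stroma"}, {"thylakoid", "envelope"}, {"plastoglobule"}]

feats = ensemble_features(comp, prof, w=0.5)
print(predict_labels(feats, labels, k=2, threshold=0.4))
```

Because a label is accepted relative to the best-scoring label rather than by picking a single winner, the rule can return more than one location, which is the property a multi-label predictor needs.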
Field | Value |
---|---|
Original language | English (US) |
Article number | 7401011 |
Pages (from-to) | 212-224 |
Number of pages | 13 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 14 |
Issue number | 1 |
DOIs | |
State | Published - Jan 1 2017 |
All Science Journal Classification (ASJC) codes
- Applied Mathematics
- Genetics
- Biotechnology
Keywords
- Protein subchloroplast localization prediction
- Ensemble transductive learning
- Multi-label classification
- Profile alignment