TY - GEN
T1 - An ensemble classifier with random projection for predicting multi-label protein subcellular localization
AU - Wan, Shibiao
AU - Mak, Man Wai
AU - Zhang, Bai
AU - Wang, Yue
AU - Kung, Sun Yuan
PY - 2013
Y1 - 2013
N2 - In protein subcellular localization prediction, a predominant scenario is that the number of available features is much larger than the number of data samples. Many of these features may contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble multi-label classifier for predicting protein subcellular localization. Specifically, the frequencies of occurrences of gene-ontology terms are used as feature vectors, which are projected onto lower-dimensional spaces by random projection matrices whose elements conform to a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to make the final decision. Experimental results on two recent datasets suggest that the proposed method can reduce the feature dimensionality sixfold and remarkably improve the classification performance.
AB - In protein subcellular localization prediction, a predominant scenario is that the number of available features is much larger than the number of data samples. Many of these features may contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble multi-label classifier for predicting protein subcellular localization. Specifically, the frequencies of occurrences of gene-ontology terms are used as feature vectors, which are projected onto lower-dimensional spaces by random projection matrices whose elements conform to a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to make the final decision. Experimental results on two recent datasets suggest that the proposed method can reduce the feature dimensionality sixfold and remarkably improve the classification performance.
KW - Dimension reduction
KW - Multi-label classification
KW - Protein subcellular localization
KW - Random projection
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=84894554386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894554386&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2013.6732715
DO - 10.1109/BIBM.2013.6732715
M3 - Conference contribution
AN - SCOPUS:84894554386
SN - 9781479913091
T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
SP - 35
EP - 42
BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Y2 - 18 December 2013 through 21 December 2013
ER -