TY - GEN
T1 - Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction
AU - Wan, Shibiao
AU - Mak, Man Wai
AU - Kung, Sun Yuan
PY - 2013/10/18
Y1 - 2013/10/18
N2 - Multi-label classification has received increasing attention in computational proteomics, especially in protein subcellular localization. Many existing multi-label protein predictors suffer from over-prediction because they use a fixed decision threshold to determine the number of labels to which a query protein should be assigned. To address this problem, this paper proposes an adaptive thresholding scheme for multi-label support vector machine (SVM) classifiers. Specifically, each one-vs-rest SVM has an adaptive threshold that is a fraction of the maximum score of the one-vs-rest SVMs in the classifier. Therefore, the number of class labels of the query protein depends on the confidence of the SVMs in the classification. This scheme is integrated into our recently proposed subcellular localization predictor that uses the frequency of occurrences of gene-ontology terms as feature vectors and one-vs-rest SVMs as classifiers. Experimental results on two recent datasets suggest that the scheme can effectively avoid both over-prediction and under-prediction, resulting in performance significantly better than other gene-ontology based subcellular localization predictors.
KW - Adaptive thresholding
KW - Gene Ontology
KW - Multi-label SVM
KW - Multi-label classification
KW - Protein subcellular localization
UR - http://www.scopus.com/inward/record.url?scp=84890467045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890467045&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6638318
DO - 10.1109/ICASSP.2013.6638318
M3 - Conference contribution
AN - SCOPUS:84890467045
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3547
EP - 3551
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Y2 - 26 May 2013 through 31 May 2013
ER -