TY - GEN
T1 - Deep supervised and convolutional generative stochastic network for protein secondary structure prediction
AU - Zhou, Jian
AU - Troyanskaya, Olga G.
N1 - Publisher Copyright:
Copyright © (2014) by the International Machine Learning Society (IMLS). All rights reserved.
PY - 2014
Y1 - 2014
N2 - Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed by both low- and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.
AB - Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed by both low- and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.
UR - http://www.scopus.com/inward/record.url?scp=84919913066&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919913066&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84919913066
T3 - 31st International Conference on Machine Learning, ICML 2014
SP - 1121
EP - 1129
BT - 31st International Conference on Machine Learning, ICML 2014
PB - International Machine Learning Society (IMLS)
T2 - 31st International Conference on Machine Learning, ICML 2014
Y2 - 21 June 2014 through 26 June 2014
ER -