Deep supervised and convolutional generative stochastic network for protein secondary structure prediction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed by both low- and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.
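For readers unfamiliar with the Q8 metric quoted in the abstract, the following is a minimal sketch of how such a per-residue accuracy over eight secondary structure classes could be computed. It is not the authors' evaluation code. The function name `q8_accuracy`, the use of `L` as the symbol for the loop/irregular class, and the choice to pool residues across all proteins (rather than averaging per protein) are assumptions made for illustration.

```python
# Eight-state secondary structure alphabet in the DSSP style.
# Assumption: "L" denotes the loop/irregular class (other sources use "C" or "-").
DSSP8 = "HGIEBTSL"


def q8_accuracy(predicted, observed):
    """Return the fraction of residues whose predicted 8-state label matches.

    `predicted` and `observed` are lists of label strings, one string per
    protein. Residues are pooled across all proteins (an assumption).
    """
    correct = 0
    total = 0
    for pred, obs in zip(predicted, observed):
        if len(pred) != len(obs):
            raise ValueError("prediction and annotation lengths differ")
        for p, o in zip(pred, obs):
            if o not in DSSP8:
                raise ValueError(f"unexpected label {o!r}")
            correct += int(p == o)
            total += 1
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total


# Toy example with made-up labels: 9 of the 11 residues match.
toy_pred = ["HHHEELLT", "GGL"]
toy_obs = ["HHHEELLS", "GGT"]
print(q8_accuracy(toy_pred, toy_obs))
```

Under this definition, a reported 66.4% Q8 accuracy would mean that roughly two out of three residues in the test set receive the correct one of eight labels.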

Original language: English (US)
Title of host publication: 31st International Conference on Machine Learning, ICML 2014
Publisher: International Machine Learning Society (IMLS)
Pages: 1121-1129
Number of pages: 9
ISBN (Electronic): 9781634393973
State: Published - Jan 1 2014
Event: 31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: Jun 21 2014 → Jun 26 2014

Publication series

Name: 31st International Conference on Machine Learning, ICML 2014
Volume: 2

Other

Other: 31st International Conference on Machine Learning, ICML 2014
Country: China
City: Beijing
Period: 6/21/14 → 6/26/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software


Cite this

    Zhou, J., & Troyanskaya, O. G. (2014). Deep supervised and convolutional generative stochastic network for protein secondary structure prediction. In 31st International Conference on Machine Learning, ICML 2014 (pp. 1121-1129). (31st International Conference on Machine Learning, ICML 2014; Vol. 2). International Machine Learning Society (IMLS).