Extraction of Protein Conformational Modes from Distance Distributions Using Structurally Imputed Bayesian Data Augmentation

Xun Sun, Thomas E. Morrell, Haw Yang

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Protein conformational changes are known to play important roles in assorted biochemical and biological processes. Driven by thermal motions of surrounding solvent molecules, such structural remodeling often occurs stochastically. Yet, regardless of how random the conformational reconfiguration may appear, it could in principle be described by a linear combination of a set of orthogonal modes which, in turn, are contained in the intramolecular distance distributions. The central challenge is how to obtain the distribution. This contribution proposes a Bayesian data-augmentation scheme to extract the predominant modes from only a few distance distributions, be they from computational sampling or directly from experiments such as single-molecule Förster-type resonance energy transfer (smFRET). The inference of the complete protein structure from insufficient data was recognized as isomorphic to the missing-data problem in Bayesian statistical learning. Using smFRET data as an example, the missing coordinates were deduced, given protein structural constraints and a multiple but limited number of smFRET distances; the Boltzmann weighting of each inferred protein structure was then evaluated using computational modeling to numerically construct the posterior density for the global protein conformation. The conformational modes were then determined from the iteratively converged overall conformational distribution using principal component analysis. Two examples were presented to illustrate these basic ideas as well as their practical implementation. The scheme described herein was based on the theory behind the powerful Tanner-Wong algorithm that guarantees convergence to the true posterior density. However, instead of assuming a mathematical model to calculate the likelihood as in conventional statistical inference, here the protein structure was treated as a statistical parameter and was imputed from the numerical likelihood function based on structural information, a probability model-free method. The framework put forth here is anticipated to be generally applicable, offering a new way to articulate protein conformational changes in a quantifiable manner.
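The final step named in the abstract, determining conformational modes from a Boltzmann-weighted ensemble by principal component analysis, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `boltzmann_weighted_pca`, the synthetic six-coordinate ensemble, the random energies, and the reduced unit `kT = 1` are all assumptions made purely for illustration.

```python
import numpy as np


def boltzmann_weighted_pca(coords, energies, kT=1.0):
    """Weighted PCA of an ensemble (hypothetical helper, not from the paper).

    coords   : (n_structures, n_coordinates) array, one row per structure
    energies : (n_structures,) array of energies in the same units as kT
    """
    coords = np.asarray(coords, dtype=float)
    energies = np.asarray(energies, dtype=float)

    # Boltzmann weights; subtracting the minimum energy avoids overflow
    weights = np.exp(-(energies - energies.min()) / kT)
    weights /= weights.sum()

    # Weighted mean structure and weighted covariance matrix
    mean = weights @ coords
    centered = coords - mean
    cov = (centered * weights[:, None]).T @ centered

    # eigh returns eigenvalues in ascending order; sort descending
    variances, modes = np.linalg.eigh(cov)
    order = np.argsort(variances)[::-1]
    return variances[order], modes[:, order], weights


# Synthetic ensemble (assumption): one large-amplitude motion plus small noise
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
amplitude = rng.normal(0.0, 2.0, size=500)
coords = amplitude[:, None] * direction + rng.normal(0.0, 0.1, size=(500, 6))
energies = rng.uniform(0.0, 1.0, size=500)  # placeholder energies, in units of kT

variances, modes, weights = boltzmann_weighted_pca(coords, energies)
overlap = abs(modes[:, 0] @ direction)  # alignment of the top mode with the planted motion
```

In this toy setting the leading column of `modes` should align closely with the planted `direction`, and `variances` gives the weighted variance carried by each mode. In the scheme the abstract describes, the ensemble would instead come from the iteratively converged posterior over imputed structures.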

Original language: English (US)
Pages (from-to): 10469-10482
Number of pages: 14
Journal: Journal of Physical Chemistry B
Issue number: 40
State: Published - Oct 13 2016

All Science Journal Classification (ASJC) codes

  • Materials Chemistry
  • Surfaces, Coatings and Films
  • Physical and Theoretical Chemistry

