Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structurebased methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Modeling and Simulation
- Molecular Biology
- Cellular and Molecular Neuroscience
- Computational Theory and Mathematics