TY - JOUR
T1 - Inferring interaction partners from protein sequences
AU - Bitbol, Anne-Florence
AU - Dwyer, Robert S.
AU - Colwell, Lucy J.
AU - Wingreen, Ned S.
N1 - Funding Information:
We thank Mohamed Barakat and Philippe Ortet for sharing and discussing specifically formatted datasets built from the P2CS database. A.-F.B. acknowledges support from the Human Frontier Science Program. This research was supported, in part, by National Institutes of Health Grant R01-GM082938 (to A.-F.B. and N.S.W.), National Science Foundation Grant PHY-1305525 (to N.S.W.), Marie Curie Career Integration Grant 631609 (to L.J.C.), a Next Generation Fellowship (to L.J.C.), and the Eric and Wendy Schmidt Transformative Technology Fund.
PY - 2016/10/25
Y1 - 2016/10/25
N2 - Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.
AB - Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.
KW - Coevolution
KW - Direct coupling analysis
KW - Maximum entropy
KW - Paralogs
KW - Protein-protein interactions
UR - http://www.scopus.com/inward/record.url?scp=84992386849&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992386849&partnerID=8YFLogxK
U2 - 10.1073/pnas.1606762113
DO - 10.1073/pnas.1606762113
M3 - Article
C2 - 27663738
AN - SCOPUS:84992386849
SN - 0027-8424
VL - 113
SP - 12180
EP - 12185
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 43
ER -