TY - JOUR
T1 - Systematic domain-based aggregation of protein structures highlights DNA-, RNA- and other ligand-binding positions
AU - Kobren, Shilpa Nadimpalli
AU - Singh, Mona
N1 - Publisher Copyright: © The Author(s) 2018.
PY - 2019/1/25
Y1 - 2019/1/25
N2 - Domains are fundamental subunits of proteins, and while they play major roles in facilitating protein-DNA, protein-RNA and other protein-ligand interactions, a systematic assessment of their various interaction modes is still lacking. A comprehensive resource identifying positions within domains that tend to interact with nucleic acids, small molecules and other ligands would expand our knowledge of domain functionality as well as aid in detecting ligand-binding sites within structurally uncharacterized proteins. Here, we introduce an approach to identify per-domain-position interaction 'frequencies' by aggregating protein co-complex structures by domain and ascertaining how often residues mapping to each domain position interact with ligands. We perform this domain-based analysis on ∼91000 co-complex structures, and infer positions involved in binding DNA, RNA, peptides, ions or small molecules across 4128 domains, which we refer to collectively as the InteracDome. Cross-validation testing reveals that ligand-binding positions for 2152 domains are highly consistent and can be used to identify residues facilitating interactions in ∼63-69% of human genes. Our resource of domain-inferred ligand-binding sites should be a great aid in understanding disease etiology: whereas these sites are enriched in Mendelian-associated and cancer somatic mutations, they are depleted in polymorphisms observed across healthy populations. The InteracDome is available at http://interacdome.princeton.edu.
AB - Domains are fundamental subunits of proteins, and while they play major roles in facilitating protein-DNA, protein-RNA and other protein-ligand interactions, a systematic assessment of their various interaction modes is still lacking. A comprehensive resource identifying positions within domains that tend to interact with nucleic acids, small molecules and other ligands would expand our knowledge of domain functionality as well as aid in detecting ligand-binding sites within structurally uncharacterized proteins. Here, we introduce an approach to identify per-domain-position interaction 'frequencies' by aggregating protein co-complex structures by domain and ascertaining how often residues mapping to each domain position interact with ligands. We perform this domain-based analysis on ∼91000 co-complex structures, and infer positions involved in binding DNA, RNA, peptides, ions or small molecules across 4128 domains, which we refer to collectively as the InteracDome. Cross-validation testing reveals that ligand-binding positions for 2152 domains are highly consistent and can be used to identify residues facilitating interactions in ∼63-69% of human genes. Our resource of domain-inferred ligand-binding sites should be a great aid in understanding disease etiology: whereas these sites are enriched in Mendelian-associated and cancer somatic mutations, they are depleted in polymorphisms observed across healthy populations. The InteracDome is available at http://interacdome.princeton.edu.
UR - http://www.scopus.com/inward/record.url?scp=85060605665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060605665&partnerID=8YFLogxK
U2 - 10.1093/nar/gky1224
DO - 10.1093/nar/gky1224
M3 - Article
C2 - 30535108
AN - SCOPUS:85060605665
SN - 0305-1048
VL - 47
SP - 582
EP - 593
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 2
ER -