Identifying Evolutionary Origins of Repeat Domains in Protein Families

Chaitanya Aluru, Mona Singh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Arrays of repeat domains are critical to the proper function of a significant fraction of protein families. These repeats are easily identified in sequence, and are thought to have arisen primarily through the simultaneous duplication of multiple domains. However, for most repeat domain protein families, very little is known about the specific domain duplication events that occurred in their evolutionary histories. Here we extend existing reconciliation formulations, which use domain trees and sequence trees to infer domain duplication and loss events, to additionally consider simultaneous domain duplications under arbitrary cost models. We develop a novel integer linear programming (ILP) solution to this reconciliation problem, and demonstrate the accuracy and robustness of our approach on simulated datasets. Finally, as proof of principle, we apply our approach to an orthogroup containing the C2H2 zinc finger repeat domain, and identify simultaneous domain duplications that occurred at the onset of the primate lineage. Simulation and ILP code is available at https://github.com/Singh-Lab/treeSim.

Original language: English (US)
Title of host publication: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450379649
State: Published - Sep 21 2020
Event: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020 - Virtual, Online, United States
Duration: Sep 21 2020 to Sep 24 2020

Publication series

Name: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020

Conference

Conference: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Country: United States
City: Virtual, Online
Period: 9/21/20 to 9/24/20

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics

Keywords

  • duplications
  • phylogenetics
  • protein domains
  • reconciliation
