TY - JOUR
T1 - In silico structural and functional analysis of the human cytomegalovirus (HHV5) genome
AU - Novotny, Jiri
AU - Rigoutsos, Isidore
AU - Coleman, David
AU - Shenk, Thomas
N1 - Funding Information:
We thank R. M. Graham for generous support; F. Domingues, W. A. Koppensteiner, P. Lackner and M. Sippl for advice with the ProCeryon program; B. Hitz for valuable opinions on pseudo-energy normalization schemes; and S. Miller for his interest in X-ray crystallography work aimed at testing selected folding hypotheses. Parts of this work were supported by grants from the National Institutes of Health (CA82396, CA85786 and CA87661).
PY - 2001/7/27
Y1 - 2001/7/27
N2 - The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program, ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors, etc.); and 87 proteins were assigned to broad structural classes (e.g. all-β, 3-layer-αβα, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. An introduction of summed, per-residue-normalized scores has been essential for discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 31β1α fold with one or two disulfide bridges, present in otherwise unrelated proteins.
AB - The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program, ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors, etc.); and 87 proteins were assigned to broad structural classes (e.g. all-β, 3-layer-αβα, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. An introduction of summed, per-residue-normalized scores has been essential for discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 31β1α fold with one or two disulfide bridges, present in otherwise unrelated proteins.
KW - Cytomegalovirus
KW - Protein folds
KW - Stereochemical code
KW - Structural genomics
KW - Threading
UR - http://www.scopus.com/inward/record.url?scp=0035958740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035958740&partnerID=8YFLogxK
U2 - 10.1006/jmbi.2001.4798
DO - 10.1006/jmbi.2001.4798
M3 - Article
C2 - 11502002
AN - SCOPUS:0035958740
SN - 0022-2836
VL - 310
SP - 1151
EP - 1166
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 5
ER -