Motivation: Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part through particular patterns, or motifs, present in the amino acid sequence. Results: We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments. Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether or not the phosphoproteins containing an instance of a motif have a significant number of known interactions. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases or phosphatases.
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Molecular Biology
- Statistics and Probability
- Computer Science Applications
- Computational Theory and Mathematics