Protein family classification using sparse Markov transducers.

E. Eskin, W. N. Grundy, Yoram Singer

Research output: Contribution to journal › Article

17 Scopus citations

Abstract

In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wildcards in the conditioning sequences. Because substitutions of amino acids are common in protein families, incorporating wildcards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. We also present efficient data structures to improve the memory usage of the models. We evaluate SMTs by building protein family classifiers using the Pfam database and compare our results to previously published results.
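
The idea of conditioning on a sequence with wildcards can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how SMTs are estimated or combined, so the code below only counts next-symbol frequencies for fixed-length conditioning patterns in which `*` matches any amino acid. The names `matches`, `train`, and `predict`, the `*` symbol, and the pseudocount smoothing are all assumptions made for illustration.

```python
from collections import Counter

WILDCARD = "*"  # assumed wildcard symbol, not taken from the paper


def matches(pattern, context):
    """Return True if pattern equals context position by position,
    treating the wildcard as matching any single symbol."""
    return len(pattern) == len(context) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, context)
    )


def train(sequences, patterns):
    """Count which symbol follows each conditioning pattern.

    All patterns are assumed to have the same length k; the k symbols
    preceding each position form the context."""
    k = len(patterns[0])
    counts = {p: Counter() for p in patterns}
    for seq in sequences:
        for i in range(k, len(seq)):
            context = seq[i - k:i]
            for p in patterns:
                if matches(p, context):
                    counts[p][seq[i]] += 1
    return counts


def predict(counts, pattern, alphabet, pseudocount=1.0):
    """Smoothed distribution over the next symbol given a pattern
    (additive smoothing is an illustrative choice)."""
    c = counts[pattern]
    total = sum(c.values()) + pseudocount * len(alphabet)
    return {a: (c[a] + pseudocount) / total for a in alphabet}


# Toy sequences: the middle residue is substituted (C vs. E).
toy = ["ACDA", "AEDA", "ACDC"]
counts = train(toy, ["A*D", "ACD"])
print(counts["A*D"])  # pools evidence from both "ACD" and "AED" contexts
print(counts["ACD"])  # sees only the exact "ACD" contexts
```

In this toy data the wildcard pattern `A*D` gathers counts from contexts that differ by a single substitution, while the exact pattern `ACD` does not, which is the intuition the abstract gives for why wildcards help with protein families.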

Original language: English (US)
Pages (from-to): 134-145
Number of pages: 12
Journal: Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
Volume: 8
State: Published - Jan 1 2000
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Medicine(all)
