Protein Family Classification using Sparse Markov Transducers

Eleazar Eskin, William Noble Grundy, Yoram Singer

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

19 Scopus citations

Abstract

In this paper we present a method for classifying proteins into families using sparse Markov transducers (SMTs). Like probabilistic suffix trees, sparse Markov transducers estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing wildcards in the conditioning sequences. Because amino acid substitutions are common within protein families, incorporating wildcards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. We also present efficient data structures that reduce the memory usage of the models. We evaluate SMTs by building protein family classifiers from the Pfam database and compare our results to previously published results.
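To make the role of wildcards concrete, the following is a minimal sketch, not the authors' method. It fixes a single hand-picked sparse context pattern instead of learning a mixture of sparse trees, and it uses additive pseudocounts for smoothing; both are assumptions made for illustration. The class `SparseContextModel`, the function `classify`, the pattern syntax ("1" = use this position, "*" = wildcard), and the toy "families" are all hypothetical. One model is trained per family, and a query is assigned to the family whose model gives it the highest log-likelihood.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


class SparseContextModel:
    """Hypothetical sketch: predict the next residue from a fixed sparse context.

    The pattern is read oldest-to-newest over the preceding len(pattern)
    residues. "1" keeps that residue in the context; "*" is a wildcard, so
    sequences that differ only at wildcard positions share statistics.
    """

    def __init__(self, pattern, alphabet=ALPHABET, pseudocount=1.0):
        self.pattern = pattern
        self.alphabet = alphabet
        self.pseudocount = pseudocount
        self.counts = defaultdict(Counter)

    def _key(self, prefix):
        # Replace residues at wildcard positions with "*".
        return tuple(s if p == "1" else "*" for s, p in zip(prefix, self.pattern))

    def train(self, sequences):
        k = len(self.pattern)
        for seq in sequences:
            for i in range(k, len(seq)):
                self.counts[self._key(seq[i - k:i])][seq[i]] += 1

    def prob(self, prefix, symbol):
        # Additive smoothing so unseen contexts/symbols get nonzero probability.
        c = self.counts.get(self._key(prefix), Counter())
        total = sum(c.values())
        return (c[symbol] + self.pseudocount) / (
            total + self.pseudocount * len(self.alphabet))

    def log_likelihood(self, seq):
        k = len(self.pattern)
        return sum(math.log(self.prob(seq[i - k:i], seq[i]))
                   for i in range(k, len(seq)))


def classify(seq, models):
    """Return the family whose model assigns seq the highest log-likelihood."""
    return max(models, key=lambda fam: models[fam].log_likelihood(seq))


# Toy data (hypothetical): in fam1 the middle residue of each triplet varies.
families = {
    "fam1": ["GAVGAVGAV", "GLVGLVGLV"],
    "fam2": ["KDEKDEKDE", "KNEKNEKNE"],
}
models = {}
for fam, seqs in families.items():
    model = SparseContextModel(pattern="1*")  # keep residue i-2, wildcard i-1
    model.train(seqs)
    models[fam] = model

print(classify("GSVGSVGSV", models))
```

In this toy run the query carries a substitution (S) never seen in training, yet the wildcard context still lets the fam1 model recognize the conserved G-to-V relationship two positions apart, so the query is assigned to fam1.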

Original language: English (US)
Title of host publication: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000
Publisher: AAAI Press
Pages: 134-145
Number of pages: 12
ISBN (Electronic): 1577351150, 9781577351153
State: Published - 2000
Event: 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000 - San Diego, United States
Duration: Aug 19 2000 to Aug 23 2000

Publication series

Name: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000

Conference

Conference: 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000
Country/Territory: United States
City: San Diego
Period: 8/19/00 to 8/23/00

All Science Journal Classification (ASJC) codes

  • General Biochemistry, Genetics and Molecular Biology
  • Artificial Intelligence
  • Information Systems
