Incremental learning of system log formats

Kenny Q. Zhu, Kathleen Fisher, David Walker

Research output: Contribution to journal › Article › peer-review

18 Scopus citations


System logs come in a large and evolving variety of formats, many of which are semi-structured and/or non-standard. As a consequence, off-the-shelf tools for processing such logs often do not exist, forcing analysts to develop their own tools, which is costly and time-consuming. In this paper, we present an incremental algorithm that automatically infers the format of system log files. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.

Original language: English (US)
Pages (from-to): 85-90
Number of pages: 6
Journal: Operating Systems Review (ACM)
Issue number: 1
State: Published - Mar 12 2010

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords


  • Ad hoc data
  • Analysis of system logs
  • Domain-specific languages
  • Grammar induction
  • PADS
  • Parsing
  • Tool generation
