Abstract
System logs come in a large and evolving variety of formats, many of which are semi-structured and/or non-standard. As a consequence, off-the-shelf tools for processing such logs often do not exist, forcing analysts to develop their own tools, which is costly and time-consuming. In this paper, we present an incremental algorithm that automatically infers the format of system log files. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.
Original language | English (US) |
---|---|
Pages (from-to) | 85-90 |
Number of pages | 6 |
Journal | Operating Systems Review (ACM) |
Volume | 44 |
Issue number | 1 |
State | Published - Mar 12 2010 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Hardware and Architecture
- Computer Networks and Communications
Keywords
- Ad hoc data
- Analysis of system logs
- Domain-specific languages
- Grammar induction
- PADS
- Parsing
- Tool generation