LearnPADS ++: Incremental inference of ad hoc data formats

Kenny Q. Zhu, Kathleen Fisher, David Walker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

An ad hoc data source is any semi-structured, non-standard data source. The format of such data sources is often evolving and frequently lacking documentation. Consequently, off-the-shelf tools for processing such data often do not exist, forcing analysts to develop their own tools, a costly and time-consuming process. In this paper, we present an incremental algorithm that automatically infers the format of large-scale data sources. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale or streaming data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.

Original language: English (US)
Title of host publication: Practical Aspects of Declarative Languages - 14th International Symposium, PADL 2012, Proceedings
Pages: 168-182
Number of pages: 15
State: Published - 2012
Event: 14th International Symposium on Practical Aspects of Declarative Languages, PADL 2012 - Philadelphia, PA, United States
Duration: Jan 23 2012 – Jan 24 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7149 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Symposium on Practical Aspects of Declarative Languages, PADL 2012
Country/Territory: United States
City: Philadelphia, PA
Period: 1/23/12 – 1/24/12

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
