PADS: An end-to-end system for processing ad hoc data

Mark Daly, Yitzhak Mandelbaum, David Walker, Mary Fernández, Kathleen Fisher

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

4 Scopus citations

Abstract

Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.

Original language: English (US)
Title of host publication: SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
Pages: 727-729
Number of pages: 3
State: Published - 2006
Event: 2006 ACM SIGMOD International Conference on Management of Data - Chicago, IL, United States
Duration: Jun 27 2006 to Jun 29 2006

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2006 ACM SIGMOD International Conference on Management of Data
Country/Territory: United States
City: Chicago, IL
Period: 6/27/06 to 6/29/06

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
