TY - GEN
T1 - PADS
T2 - 2006 ACM SIGMOD International Conference on Management of Data
AU - Daly, Mark
AU - Mandelbaum, Yitzhak
AU - Walker, David
AU - Fernández, Mary
AU - Fisher, Kathleen
PY - 2006
Y1 - 2006
N2 - Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.
AB - Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.
UR - http://www.scopus.com/inward/record.url?scp=34250675342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250675342&partnerID=8YFLogxK
U2 - 10.1145/1142473.1142568
DO - 10.1145/1142473.1142568
M3 - Conference contribution
AN - SCOPUS:34250675342
SN - 1595934340
SN - 9781595934345
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 727
EP - 729
BT - SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
Y2 - 27 June 2006 through 29 June 2006
ER -