TY - GEN
T1 - Language support for processing distributed ad hoc data
AU - Zhu, Kenny Q.
AU - Dantas, Daniel S.
AU - Fisher, Kathleen
AU - Jia, Limin
AU - Mandelbaum, Yitzhak
AU - Pai, Vivek
AU - Walker, David
PY - 2009
Y1 - 2009
N2 - This paper presents the design, theory and implementation of GLOVES, a domain-specific language that allows users to specify the provenance (the derivation history starting from the origins), syntax and semantic properties of collections of distributed data sources. In particular, GLOVES specifications indicate where to locate desired data, how to obtain it, when to get it or to give up trying, and what format it will be in on arrival. The GLOVES system compiles such specifications into a suite of data-processing tools including an archiver, a provenance tracking system, a database loading tool, an alert system, an RSS feed generator and a debugging tool. In addition, the system generates description-specific libraries so that developers can create their own applications. GLOVES also provides a generic infrastructure so that advanced users can build new tools applicable to any data source with a GLOVES description. We show how GLOVES may be used to specify data sources from two domains: CoMon, a monitoring system for PlanetLab's 800+ nodes, and Arrakis, a monitoring system for an AT&T web hosting service. We show experimentally that our system can scale to distributed systems the size of CoMon. Finally, we provide a denotational semantics for GLOVES and use this semantics to prove two important theorems. The first shows that our denotational semantics respects the typing rules for the language, while the second demonstrates that our system correctly maintains the provenance.
AB - This paper presents the design, theory and implementation of GLOVES, a domain-specific language that allows users to specify the provenance (the derivation history starting from the origins), syntax and semantic properties of collections of distributed data sources. In particular, GLOVES specifications indicate where to locate desired data, how to obtain it, when to get it or to give up trying, and what format it will be in on arrival. The GLOVES system compiles such specifications into a suite of data-processing tools including an archiver, a provenance tracking system, a database loading tool, an alert system, an RSS feed generator and a debugging tool. In addition, the system generates description-specific libraries so that developers can create their own applications. GLOVES also provides a generic infrastructure so that advanced users can build new tools applicable to any data source with a GLOVES description. We show how GLOVES may be used to specify data sources from two domains: CoMon, a monitoring system for PlanetLab's 800+ nodes, and Arrakis, a monitoring system for an AT&T web hosting service. We show experimentally that our system can scale to distributed systems the size of CoMon. Finally, we provide a denotational semantics for GLOVES and use this semantics to prove two important theorems. The first shows that our denotational semantics respects the typing rules for the language, while the second demonstrates that our system correctly maintains the provenance.
KW - Languages
UR - http://www.scopus.com/inward/record.url?scp=70450265495&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70450265495&partnerID=8YFLogxK
U2 - 10.1145/1599410.1599440
DO - 10.1145/1599410.1599440
M3 - Conference contribution
AN - SCOPUS:70450265495
SN - 9781605585680
T3 - PPDP'09 - Proceedings of the 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming
SP - 243
EP - 254
BT - PPDP'09 - Proceedings of the 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming
T2 - 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, PPDP'09
Y2 - 7 September 2009 through 9 September 2009
ER -