This paper presents the design, theory and implementation of GLOVES1, a domain-specific language that allows users to specify the provenance (the derivation history starting from the origins), syntax and semantic properties of collections of distributed data sources. In particular, GLOVES specifications indicate where to locate desired data, how to obtain it, when to get it or to give up trying, and what format it will be in on arrival. The GLOVES system compiles such specification into a suite of data-processing tools including an archiver, a provenance tracking system, a database loading tool, an alert system, an RSS feed generator and a debugging tool. In addition, the system generates description-specific libraries so that developers can create their own applications. GLOVES also provides a generic infrastructure so that advanced users can build new tools applicable to any data source with a GLOVES description. We show how GLOVES may be used to specify data sources from two domains: CoMon, a monitoring system for PlanetLab's 800+ nodes, and Arrakis, a monitoring system for an AT&T web hosting service. We show experimentally that our system can scale to distributed systems the size of CoMon. Finally, we provide a de-notational semantics for GLOVES and use this semantics to prove two important theorems. The first shows that our denotational semantics respects the typing rules for the language, while the second demonstrates that our system correctly maintains the provenance.