Abstract
In the spirit of Landin, we present a calculus of dependent types to serve as the semantic foundation for a family of languages called data description languages. Such languages, which include PADS, DataScript, and PacketTypes, are designed to facilitate programming with ad hoc data, that is, data not in well-behaved relational or XML formats. In the calculus, each type describes the physical layout and semantic properties of a data source. In the semantics, we interpret types simultaneously as the in-memory representation of the data described and as parsers for the data source. The parsing functions are robust, automatically detecting and recording errors in the data stream without halting parsing. We show the parsers are type-correct, returning data whose type matches the simple-type interpretation of the specification. We also prove the parsers are error-correct, accurately reporting the number of physical and semantic errors that occur in the returned data. We use the calculus to describe the features of various data description languages, and we discuss how we have used the calculus to improve PADS.
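The idea of reading a description both as a data representation and as an error-recording parser can be illustrated with a minimal Python sketch. This is not the calculus from the paper or the PADS API: every name here (`PD`, `p_int`, `lit`, `constrain`, `seq`, `dep_pair`, `range_desc`) is hypothetical, and the error accounting is deliberately simplified. Each parser returns a value, a parse descriptor counting physical (syntactic) and semantic errors, and the next input position, and it never raises on malformed input.

```python
from dataclasses import dataclass


@dataclass
class PD:
    """Hypothetical parse descriptor: error counts for one parsed value."""
    syntactic: int = 0  # physical errors: the input did not match the layout
    semantic: int = 0   # semantic errors: a constraint on the value failed


def merge(a, b):
    # Combine the error counts of two sub-parses.
    return PD(a.syntactic + b.syntactic, a.semantic + b.semantic)


def p_int(text, pos):
    # Base description: a non-empty run of decimal digits.
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None, PD(syntactic=1), pos  # record the error, do not halt
    return int(text[pos:end]), PD(), end


def lit(s):
    # Description of a fixed literal such as a separator.
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, PD(), pos + len(s)
        return None, PD(syntactic=1), pos
    return parse


def constrain(parser, pred):
    # Attach a semantic constraint; a violation is counted, not raised.
    def parse(text, pos):
        value, pd, nxt = parser(text, pos)
        if value is not None and not pred(value):
            pd = merge(pd, PD(semantic=1))
        return value, pd, nxt
    return parse


def seq(*parsers):
    # Parse the components in order and accumulate their error counts.
    def parse(text, pos):
        values, total = [], PD()
        for p in parsers:
            value, pd, pos = p(text, pos)
            values.append(value)
            total = merge(total, pd)
        return tuple(values), total, pos
    return parse


def dep_pair(first, second_from):
    # Dependent pair: the second description is computed from the first value.
    def parse(text, pos):
        a, pd_a, pos = first(text, pos)
        b, pd_b, pos = second_from(a)(text, pos)
        return (a, b), merge(pd_a, pd_b), pos
    return parse


# Assumed example format "lo,hi", where hi must be at least lo.
range_desc = dep_pair(
    p_int,
    lambda lo: seq(lit(","), constrain(p_int, lambda hi: lo is None or hi >= lo)),
)
```

Under these assumptions, `range_desc("9,2", 0)` still returns the parsed pair but reports one semantic error, and `range_desc("3,x", 0)` returns a partial value with one syntactic error instead of aborting, which mirrors in miniature the robustness and error-counting properties the abstract describes.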
Original language | English (US) |
---|---|
Article number | 10 |
Journal | Journal of the ACM |
Volume | 57 |
Issue number | 2 |
State | Published - Jan 1 2010 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Information Systems
- Hardware and Architecture
- Artificial Intelligence
Keywords
- Ad hoc data formats
- Context-sensitive grammars
- Data description languages
- Data processing
- Data-dependent grammars
- Dependent types
- Domain-specific languages
- PADS
- Parsing