Using natural language processing to construct a National Zoning and Land Use Database

Matthew Mleczko, Matthew Desmond

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


In the United States, zoning and land use policies have been linked to high housing costs and residential segregation. Yet almost all zoning and land use data come from a handful of cross-sectional surveys, which are costly and time-intensive, suffer from low response rates and measurement error, and quickly become dated. As an alternative, we constructed a National Zoning and Land Use Database using natural language processing techniques on publicly available administrative data. We show this new database and our parsimonious measure of exclusionary zoning, the Zoning Restrictiveness Index, to be consistent with the Wharton Residential Land Use Regulatory Index (2018) and the National Longitudinal Land Use Survey (2019). Additionally, we overcome other limitations of these survey approaches, both by capturing previously omitted and important elements of land use policy and by revealing the land use regulations for a near-universe of municipalities in the San Francisco and Houston metropolitan statistical areas. We make all code and data publicly available, allowing the National Zoning and Land Use Database to be replicated in future years to ensure accurate, up-to-date and longitudinal nationwide zoning and land use data.

Original language: English (US)
Pages (from-to): 2564-2584
Number of pages: 21
Journal: Urban Studies
Issue number: 13
State: Published - Oct 2023

All Science Journal Classification (ASJC) codes

  • Environmental Science (miscellaneous)
  • Urban Studies

Keywords


  • housing
  • land use
  • natural language processing
  • zoning

