Abstract
Text as data techniques offer a great promise: the ability to inductively discover measures that are useful for testing social science theories with large collections of text. Nearly all text-based causal inferences depend on a latent representation of the text, but we show that estimating this latent representation from the data creates underacknowledged risks: we may introduce an identification problem or overfit. To address these risks, we introduce a split-sample workflow for making rigorous causal inferences with discovered measures as treatments or outcomes. We then apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic responsiveness.
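The split-sample idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`split_sample`, `discover_keyword`, `estimate_effect`) are hypothetical, the "discovered measure" is a deliberately crude stand-in (the single word whose document frequency differs most between treatment arms), and the estimator assumes a randomized binary treatment with text as the outcome. The point it illustrates is that the measure is chosen on one part of the data and the effect is estimated on the other.

```python
import random
from statistics import mean


def split_sample(n_docs, train_fraction=0.5, seed=0):
    """Randomly partition document indices into a discovery (training)
    set and an estimation (test) set."""
    indices = list(range(n_docs))
    random.Random(seed).shuffle(indices)
    cut = int(n_docs * train_fraction)
    return indices[:cut], indices[cut:]


def doc_rate(texts, word):
    """Share of documents that contain `word` (0.0 for an empty list)."""
    hits = [1.0 if word in t.lower().split() else 0.0 for t in texts]
    return mean(hits) if hits else 0.0


def discover_keyword(texts, treated):
    """Hypothetical stand-in for measure discovery: return the word whose
    document frequency differs most between arms. Run on training data only."""
    treated_texts = [t for t, z in zip(texts, treated) if z]
    control_texts = [t for t, z in zip(texts, treated) if not z]
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return max(
        vocab,
        key=lambda w: abs(doc_rate(treated_texts, w) - doc_rate(control_texts, w)),
    )


def estimate_effect(texts, treated, keyword):
    """Difference in means of the fixed measure (keyword present or not).
    Run on held-out test data only, after the measure has been fixed."""
    y = [1.0 if keyword in t.lower().split() else 0.0 for t in texts]
    y1 = [yi for yi, z in zip(y, treated) if z]
    y0 = [yi for yi, z in zip(y, treated) if not z]
    if not y1 or not y0:
        raise ValueError("both treatment arms must appear in the test set")
    return mean(y1) - mean(y0)
```

In use, one would call `split_sample` once, pass only the training documents to `discover_keyword`, and pass only the test documents to `estimate_effect`. Because the measure is fixed before the test set is examined, the estimate is not contaminated by the search over candidate measures.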
| Original language | English (US) |
|---|---|
| Article number | eabg2652 |
| Journal | Science Advances |
| Volume | 8 |
| Issue number | 42 |
| DOIs | |
| State | Published - Oct 2022 |
All Science Journal Classification (ASJC) codes
- General
Title: How to make causal inferences using texts