TY - JOUR
T1 - Embedding Regression
T2 - Models for Context-Specific Description and Inference
AU - Rodriguez, Pedro L.
AU - Spirling, Arthur
AU - Stewart, Brandon M.
N1 - Publisher Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the American Political Science Association.
PY - 2023/11/19
Y1 - 2023/11/19
N2 - Social scientists commonly seek to make statements about how word use varies over circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term immigration. Building on the success of pretrained language models, we introduce the à la carte on text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used, and thus what words mean, in different contexts. We show that it outperforms slower, more complicated alternatives and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.
AB - Social scientists commonly seek to make statements about how word use varies over circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term immigration. Building on the success of pretrained language models, we introduce the à la carte on text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used, and thus what words mean, in different contexts. We show that it outperforms slower, more complicated alternatives and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.
UR - http://www.scopus.com/inward/record.url?scp=85168925743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168925743&partnerID=8YFLogxK
U2 - 10.1017/S0003055422001228
DO - 10.1017/S0003055422001228
M3 - Article
AN - SCOPUS:85168925743
SN - 0003-0554
VL - 117
SP - 1255
EP - 1274
JO - American Political Science Review
JF - American Political Science Review
IS - 4
ER -