Advances in natural language processing provide accessible approaches to analyzing open-ended psychological data. However, comprehensive instruments for text analysis of stereotype content are missing. We developed stereotype content dictionaries using a semi-automated method based on WordNet and word embeddings. These dictionaries covered over 80% of open-ended stereotypes about salient American social groups, compared to 20% coverage from words extracted directly from the stereotype content literature. The dictionaries showed high levels of internal consistency and validity, predicting stereotype scale ratings and human judgments of online text. We developed the R package Semi-Automated Dictionary Creation for Analyzing Text (SADCAT; https://github.com/gandalfnicolas/SADCAT) to provide access to the stereotype content dictionaries and to support the creation of novel dictionaries for constructs of interest. Potential applications of the dictionaries range from advancing person perception theories through laboratory studies and analysis of online data to identifying social biases in artificial intelligence, social media, and other ubiquitous text sources.
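As a rough illustration of the coverage figures reported above, the following Python sketch computes the share of open-ended responses that match a dictionary. It is not the SADCAT implementation: the function name `coverage`, the word lists, and the responses are all hypothetical, and it assumes coverage means the proportion of single-word responses found in the dictionary after lowercasing.

```python
def coverage(responses, dictionary):
    """Proportion of non-empty responses found in the dictionary.

    Assumption: each response is a single word and matching is
    case-insensitive exact matching (no stemming or lemmatization).
    """
    vocab = {word.lower() for word in dictionary}
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        return 0.0
    hits = sum(1 for r in cleaned if r in vocab)
    return hits / len(cleaned)


# Hypothetical toy data, not taken from the paper.
seed_words = {"warm", "competent"}  # stand-in for literature-derived words
expanded_words = seed_words | {"friendly", "kind", "smart", "skilled"}  # stand-in for an expanded dictionary
responses = ["Friendly", "smart", "warm", "rich", "kind"]

seed_cov = coverage(responses, seed_words)
expanded_cov = coverage(responses, expanded_words)
print(f"seed coverage: {seed_cov:.0%}, expanded coverage: {expanded_cov:.0%}")
```

In this toy example the expanded word list matches more responses than the seed list, which is the kind of gap the coverage comparison is meant to capture.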
All Science Journal Classification (ASJC) codes
- Social Psychology
Keywords
- stereotype content
- text analysis
- word embeddings