TY - GEN
T1 - Evaluating Vector-Space Models of Word Representation, or, The Unreasonable Effectiveness of Counting Words Near Other Words
AU - Nematzadeh, Aida
AU - Meylan, Stephan C.
AU - Griffiths, Thomas L.
N1 - Publisher Copyright:
© CogSci 2017.
PY - 2017
Y1 - 2017
N2 - Vector-space models of semantics represent words as continuously-valued vectors and measure similarity based on the distance or angle between those vectors. Such representations have become increasingly popular due to the recent development of methods that allow them to be efficiently estimated from very large amounts of data. However, the idea of relating similarity to distance in a spatial representation has been criticized by cognitive scientists, as human similarity judgments have many properties that are inconsistent with the geometric constraints that a distance metric must obey. We show that two popular vector-space models, Word2Vec and GloVe, are unable to capture certain critical aspects of human word association data as a consequence of these constraints. However, a probabilistic topic model estimated from a relatively small curated corpus qualitatively reproduces the asymmetric patterns seen in the human data. We also demonstrate that a simple co-occurrence frequency model performs similarly to reduced-dimensionality vector-space models on medium-size corpora, at least for relatively frequent words.
KW - vector-space models
KW - word associations
KW - word representations
UR - http://www.scopus.com/inward/record.url?scp=85093714890&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093714890&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85093714890
T3 - CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition
SP - 859
EP - 864
BT - CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society
PB - The Cognitive Science Society
T2 - 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition, CogSci 2017
Y2 - 26 July 2017 through 29 July 2017
ER -