Abstract
Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique – eliminate the common mean vector and a few top dominating directions from the word vectors – that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones.
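For concreteness, below is a minimal numpy sketch of the postprocessing described in the abstract, assuming word vectors are stacked row-wise in a matrix. The function name, the SVD-based computation of the dominating directions, and the default number of removed directions are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def postprocess(embeddings: np.ndarray, num_directions: int = 2) -> np.ndarray:
    """Remove the common mean vector and the top dominating directions
    from a matrix of word vectors.

    embeddings: (vocab_size, dim) array, one word vector per row.
    num_directions: how many top principal components to remove
        (a tunable hyperparameter; illustrative default).
    """
    # Step 1: subtract the common mean vector.
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean

    # Step 2: find the top principal directions of the centered vectors.
    # The rows of vt from the SVD are the principal components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:num_directions]  # (num_directions, dim)

    # Step 3: remove the projection onto those dominating directions.
    return centered - centered @ top.T @ top

# Usage sketch with random stand-in vectors (real use would load
# pretrained word2vec or GloVe embeddings instead).
vecs = np.random.randn(10000, 300).astype(np.float32)
processed = postprocess(vecs, num_directions=3)
```

Subtracting the mean before the SVD ensures the recovered directions are principal components of the centered vectors; because the rows of `top` are orthonormal, the final line removes exactly the projection onto the dominating subspace.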
| Original language | English (US) |
|---|---|
| State | Published - 2018 |
| Externally published | Yes |
| Event | 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada |
| Duration | Apr 30 2018 → May 3 2018 |
Conference
| Conference | 6th International Conference on Learning Representations, ICLR 2018 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 4/30/18 → 5/3/18 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Education
- Computer Science Applications
- Linguistics and Language