Abstract
Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: the entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
| Original language | English (US) |
|---|---|
| Journal | Proceedings of Machine Learning Research |
| Volume | 119 |
| State | Published - 2020 |
| Event | 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online (Duration: Jul 13 2020 → Jul 18 2020) |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Calibration, Entropy Rates, and Memory in Language Models'. Together they form a unique fingerprint.