TY - GEN
T1 - Calibration, entropy rates, and memory in language models
AU - Braverman, Mark
AU - Chen, Xinyi
AU - Kakade, Sham
AU - Narasimhan, Karthik
AU - Zhang, Cyril
AU - Zhang, Yi
N1 - Funding Information:
MB is supported in part by the NSF Alan T. Waterman Award, Grant No. 1933331, a Packard Fellowship in Science and Engineering, and the Simons Collaboration on Algorithms and Geometry. SK is supported by NSF CCF Grant No. 1637360. KN is supported by the Princeton SEAS Innovation Grant.
Publisher Copyright:
© 2020 37th International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: The entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
AB - Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration-based approach to measure long-term discrepancies between a generative sequence model and the true distribution, and use these discrepancies to improve the model. Empirically, we show that state-of-the-art language models, including LSTMs and Transformers, are miscalibrated: The entropy rates of their generations drift dramatically upward over time. We then provide provable methods to mitigate this phenomenon. Furthermore, we show how this calibration-based approach can also be used to measure the amount of memory that language models use for prediction.
UR - http://www.scopus.com/inward/record.url?scp=85105134213&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105134213&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105134213
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 1066
EP - 1076
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -