Entropy Regularization for Population Estimation

Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation


Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate structured bandit setting. Mean reward estimation (i.e., population estimation) tasks have recently been shown to be essential for public policy settings where legal constraints often require precise estimates of population metrics. We show that leveraging entropy and KL divergence can yield a better trade-off between reward and estimator variance than existing baselines, all while remaining nearly unbiased. These properties of entropy regularization illustrate an exciting potential for bridging the optimal exploration and estimation literatures.

Original language: English (US)
Title of host publication: AAAI-23 Technical Tracks 10
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Number of pages: 7
ISBN (Electronic): 9781577358800
State: Published - Jun 27 2023
Externally published: Yes
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: Feb 7 2023 to Feb 14 2023

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023


Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

