Infomax Strategies for an Optimal Balance Between Exploration and Exploitation

Gautam Reddy, Antonio Celani, Massimo Vergassola

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Proper balance between exploitation and exploration is what makes good decisions that achieve high reward, like payoff or evolutionary fitness. The Infomax principle postulates that maximization of information directs the function of diverse systems, from living systems to artificial neural networks. While specific applications turn out to be successful, the validity of information as a proxy for reward remains unclear. Here, we consider the multi-armed bandit decision problem, which features arms (slot-machines) of unknown probabilities of success and a player trying to maximize cumulative payoff by choosing the sequence of arms to play. We show that an Infomax strategy (Info-p) which optimally gathers information on the highest probability of success among the arms, saturates known optimal bounds and compares favorably to existing policies. Conversely, gathering information on the identity of the best arm in the bandit leads to a strategy that is vastly suboptimal in terms of payoff. The nature of the quantity selected for Infomax acquisition is then crucial for effective tradeoffs between exploration and exploitation.
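The bandit setting described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only: it does not implement Info-p (the abstract does not specify it) and instead uses Thompson sampling, a standard baseline policy, as a stand-in. The function name `run_bernoulli_bandit`, the uniform Beta(1, 1) priors, and the arm probabilities in the usage note are assumptions made for this example.

```python
import random


def run_bernoulli_bandit(probs, horizon, seed=0):
    """Play a Bernoulli bandit using Thompson sampling.

    NOTE: Thompson sampling is a stand-in baseline, not the paper's Info-p.
    probs   -- true success probability of each arm (unknown to the player)
    horizon -- number of plays
    Returns (cumulative payoff, empirical regret, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    payoff = 0

    for _ in range(horizon):
        # Draw one plausible success probability per arm from its
        # Beta posterior (assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        # Play the arm whose sampled probability is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        payoff += reward

    # Shortfall relative to the expected payoff of always playing the best arm.
    regret = horizon * max(probs) - payoff
    pulls = [wins[a] + losses[a] for a in range(n_arms)]
    return payoff, regret, pulls
```

Calling, for example, `run_bernoulli_bandit([0.2, 0.5, 0.8], 2000, seed=1)` returns the cumulative payoff, the shortfall relative to always playing the best arm, and the number of times each arm was played. Early plays are spread across arms (exploration), while later plays concentrate on the arm that appears best (exploitation).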

Original language: English (US)
Pages (from-to): 1454-1476
Number of pages: 23
Journal: Journal of Statistical Physics
Volume: 163
Issue number: 6
DOIs
State: Published - Jun 1 2016
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistical and Nonlinear Physics
  • Mathematical Physics

Keywords

  • Decision and information theory
  • Exploration and exploitation
  • Infomax
  • Large deviations
  • Multi-armed bandits
