Algorithmic models of human decision making in Gaussian multi-armed bandit problems

Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

We consider a heuristic Bayesian algorithm as a model of human decision making in multi-armed bandit problems with Gaussian rewards. We derive a novel upper bound on the Gaussian inverse cumulative distribution function and use it to show that the algorithm achieves logarithmic regret. We extend the algorithm to allow for stochastic decision making using Boltzmann action selection with a dynamic temperature parameter and provide a feedback rule for tuning the temperature parameter such that the stochastic algorithm achieves logarithmic regret. The stochastic algorithm encodes many of the observed features of human decision making.
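The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a flat prior, a known sampling standard deviation `sigma_s`, an upper credible limit index computed from the Gaussian inverse CDF, and Boltzmann (softmax) selection over those indices. The credible-level schedule `1 / (c * t)`, the constant `c`, the cooling schedule `1 / log(t)`, and all function names are hypothetical choices made for illustration; in particular, the cooling schedule stands in for the paper's feedback rule, which is not reproduced here.

```python
import math
import random
from statistics import NormalDist


def credible_limit_indices(means, counts, t, sigma_s=1.0):
    """Upper credible limit for each arm under a flat prior (assumed form)."""
    c = math.sqrt(2 * math.pi * math.e)  # hypothetical constant; keeps alpha < 1
    alpha = 1.0 / (c * t)                # assumed credible-level schedule
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Posterior std of an arm's mean shrinks as sigma_s / sqrt(number of pulls).
    return [m + z * sigma_s / math.sqrt(n) for m, n in zip(means, counts)]


def boltzmann_choice(indices, temperature, rng):
    """Sample an arm with probability proportional to exp(index / temperature)."""
    top = max(indices)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in indices]
    return rng.choices(range(len(indices)), weights=weights)[0]


def run_bandit(true_means, horizon, sigma_s=1.0, stochastic=False, seed=0):
    """Simulate one run; return (cumulative expected regret, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize the estimates
        else:
            q = credible_limit_indices(means, counts, t, sigma_s)
            if stochastic:
                temperature = 1.0 / math.log(t)  # hypothetical cooling schedule
                arm = boltzmann_choice(q, temperature, rng)
            else:
                arm = max(range(n_arms), key=q.__getitem__)
        reward = rng.gauss(true_means[arm], sigma_s)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        regret += best - true_means[arm]
    return regret, counts
```

Calling `run_bandit([0.0, 1.0, 3.0], horizon=500)` runs the deterministic variant, and passing `stochastic=True` switches to Boltzmann selection. Comparing the returned regret across increasing horizons gives a rough empirical feel for the growth rate, though such a simulation does not substitute for the paper's regret analysis.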

Original language: English (US)
Title of host publication: 2014 European Control Conference, ECC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2210-2215
Number of pages: 6
ISBN (Electronic): 9783952426913
State: Published - Jul 22 2014
Event: 13th European Control Conference, ECC 2014 - Strasbourg, France
Duration: Jun 24 2014 → Jun 27 2014

Publication series

Name: 2014 European Control Conference, ECC 2014

Other

Other: 13th European Control Conference, ECC 2014
Country/Territory: France
City: Strasbourg
Period: 6/24/14 → 6/27/14

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
