TY - JOUR
T1 - Learning the opportunity cost of time in a patch-foraging task
AU - Constantino, Sara M.
AU - Daw, Nathaniel D.
N1 - Funding Information:
This research was funded by Human Frontiers Science Program Grant No. RGP0036/2009-C and by Grant No. R01MH087882 from the National Institute of Mental Health. N.D.D. is supported by a Scholar Award from the McDonnell Foundation. We thank Paul W. Glimcher for helpful discussions, and Dylan S. Simon for technical assistance.
Publisher Copyright:
© 2015, Psychonomic Society, Inc.
PY - 2015/12/1
Y1 - 2015/12/1
AB - Although most decision research concerns choice between simultaneously presented options, in many situations options are encountered serially, and the decision is whether to exploit an option or search for a better one. Such problems have a rich history in animal foraging, but we know little about the psychological processes involved. In particular, it is unknown whether learning in these problems is supported by the well-studied neurocomputational mechanisms involved in more conventional tasks. We investigated how humans learn in a foraging task, which requires deciding whether to harvest a depleting resource or switch to a replenished one. The optimal choice (given by the marginal value theorem; MVT) requires comparing the immediate return from harvesting to the opportunity cost of time, which is given by the long-run average reward. In two experiments, we varied opportunity cost across blocks, and subjects adjusted their behavior to blockwise changes in environmental characteristics. We examined how subjects learned their choice strategies by comparing choice adjustments to a learning rule suggested by the MVT (in which the opportunity cost threshold is estimated as an average over previous rewards) and to the predominant incremental-learning theory in neuroscience, temporal-difference learning (TD). Trial-by-trial decisions were explained better by the MVT threshold-learning rule. These findings expand on the foraging literature, which has focused on steady-state behavior, by elucidating a computational mechanism for learning in switching tasks that is distinct from those used in traditional tasks, and suggest connections to research on average reward rates in other domains of neuroscience.
KW - Computational model
KW - Decision making
KW - Dopamine
KW - Patch foraging
KW - Reinforcement learning
KW - Reward
UR - http://www.scopus.com/inward/record.url?scp=84947040901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947040901&partnerID=8YFLogxK
U2 - 10.3758/s13415-015-0350-y
DO - 10.3758/s13415-015-0350-y
M3 - Article
C2 - 25917000
AN - SCOPUS:84947040901
SN - 1530-7026
VL - 15
SP - 837
EP - 853
JO - Cognitive, Affective, & Behavioral Neuroscience
JF - Cognitive, Affective, & Behavioral Neuroscience
IS - 4
ER -