TY - JOUR
T1 - A decision task in a social context
T2 - Human experiments, models, and analyses of behavioral data
AU - Nedic, Andrea
AU - Tomlin, Damon
AU - Holmes, Philip
AU - Prentice, Deborah A.
AU - Cohen, J. D.
N1 - Funding Information:
Manuscript received August 18, 2010; revised June 9, 2011; accepted July 29, 2011. Date of publication November 2, 2011; date of current version February 17, 2012. This work was supported by the Air Force Office of Scientific Research (AFOSR) under Grant FA9550-07-1-0528 under the Multidisciplinary University Research Initiative. A preliminary account of part of this work appeared in J. D. Cohen et al., "Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration," Phil. Trans. Roy. Soc. Lond. B, vol. 362, pp. 933–942, 2007. A. Nedic is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]). D. Tomlin and J. D. Cohen are with the Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]; [email protected]). P. Holmes is with the Princeton Neuroscience Institute, Department of Mechanical and Aerospace Engineering, and Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]). D. A. Prentice is with the Department of Psychology, Princeton University, Princeton, NJ 08544 USA (e-mail: [email protected]).
PY - 2012/3
Y1 - 2012/3
N2 - To investigate the influence of information about fellow group members in a constrained decision-making context, we develop four two-armed bandit tasks in which subjects freely select one of two options (A or B) and are informed of the resulting reward following each choice. Rewards are determined by the fraction x of past A choices via two functions fA(x), fB(x) (unknown to the subject), which intersect at a matching point that does not generally represent globally optimal behavior. Playing individually, subjects typically remain close to the matching point, although some discover the optimum. Each task is designed to probe a different type of behavior, and subjects work in parallel in groups of five with feedback of other group members' choices, of their rewards, of both, or with no knowledge of others' behavior. We employ a soft-max choice model that emerges from a drift-diffusion process, commonly used to model perceptual decision making with noisy stimuli. Here the stimuli are replaced by estimates of expected rewards produced by a temporal-difference reinforcement-learning algorithm, augmented to include appropriate feedback terms. Models are fitted for each task and feedback condition, and we compare choice allocations averaged across subjects and individual choice sequences to highlight differences between tasks and intersubject differences. The most complex model, involving both choice and reward feedback, contains only four parameters, but nonetheless reveals significant differences in individual strategies. Strikingly, we find that reward feedback can be either detrimental or advantageous to performance, depending upon the task.
AB - To investigate the influence of information about fellow group members in a constrained decision-making context, we develop four two-armed bandit tasks in which subjects freely select one of two options (A or B) and are informed of the resulting reward following each choice. Rewards are determined by the fraction x of past A choices via two functions fA(x), fB(x) (unknown to the subject), which intersect at a matching point that does not generally represent globally optimal behavior. Playing individually, subjects typically remain close to the matching point, although some discover the optimum. Each task is designed to probe a different type of behavior, and subjects work in parallel in groups of five with feedback of other group members' choices, of their rewards, of both, or with no knowledge of others' behavior. We employ a soft-max choice model that emerges from a drift-diffusion process, commonly used to model perceptual decision making with noisy stimuli. Here the stimuli are replaced by estimates of expected rewards produced by a temporal-difference reinforcement-learning algorithm, augmented to include appropriate feedback terms. Models are fitted for each task and feedback condition, and we compare choice allocations averaged across subjects and individual choice sequences to highlight differences between tasks and intersubject differences. The most complex model, involving both choice and reward feedback, contains only four parameters, but nonetheless reveals significant differences in individual strategies. Strikingly, we find that reward feedback can be either detrimental or advantageous to performance, depending upon the task.
KW - Decision making
KW - drift-diffusion model
KW - exploitation
KW - exploration
KW - group dynamics
KW - human behavior
KW - reinforcement learning
KW - social information
KW - two-armed bandit task
UR - http://www.scopus.com/inward/record.url?scp=84857313572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857313572&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2011.2166437
DO - 10.1109/JPROC.2011.2166437
M3 - Article
AN - SCOPUS:84857313572
SN - 0018-9219
VL - 100
SP - 713
EP - 733
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
IS - 3
M1 - 06069518
ER -