TY - GEN
T1 - An exploration-exploitation model based on norepinepherine and dopamine activity
AU - McClure, Samuel M.
AU - Gilzenrat, Mark S.
AU - Cohen, Jonathan D.
PY - 2005
Y1 - 2005
AB - We propose a model by which dopamine (DA) and norepinephrine (NE) combine to alternate behavior between relatively exploratory and exploitative modes. The model is developed for a target detection task for which single-neuron recording data are available from locus coeruleus (LC) NE neurons. An exploration-exploitation trade-off is elicited by regularly switching which of the two stimuli is rewarded. DA functions within the model to change synaptic weights according to a reinforcement learning algorithm. Exploration is mediated by the state of LC firing, with higher tonic and lower phasic activity producing greater response variability. The opposite state of LC function, with a lower baseline firing rate and greater phasic responses, favors exploitative behavior. Changes in LC firing mode result from combined measures of response conflict and reward rate, where response conflict is monitored using models of anterior cingulate cortex (ACC). Increased long-term response conflict and decreased reward rate, which occur following a reward contingency switch, favor the higher tonic state of LC function and NE release. This increases exploration and facilitates discovery of the new target.
UR - http://www.scopus.com/inward/record.url?scp=34250317199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250317199&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:34250317199
SN - 9780262232531
T3 - Advances in Neural Information Processing Systems
SP - 867
EP - 874
BT - Advances in Neural Information Processing Systems 18 - Proceedings of the 2005 Conference
T2 - 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005
Y2 - 5 December 2005 through 8 December 2005
ER -