Neurophysiological recording experiments in the dopamine system by Schultz and colleagues (Science 275 (1997) 1593-1598) suggest that neurons there are involved in learning to predict rewards and to assess behaviors using the temporal-difference algorithm. One aspect of this theory that remains undeveloped and experimentally underconstrained is its assumption of an exhaustive input representing all stimuli and their history over time. We use the algorithm to model operant choice between concurrent variable-interval schedules, a key animal conditioning experiment, and show that animals' subtly suboptimal performance resembles the behavior of the algorithm with a more limited input representation. This limitation may reflect the operation of an attentional mechanism gating the inputs to the dopamine system.
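For readers unfamiliar with the temporal-difference algorithm named above, the following is a minimal illustrative sketch of a tabular TD(0) prediction update. It is not the model used in this work: the state names, the learning rate `alpha`, the discount `gamma`, and the two-step trial are all assumptions chosen only to show how a reward prediction error drives learning.

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step in place and return the prediction error."""
    # Prediction error: reward plus discounted next prediction, minus current prediction.
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta


# Hypothetical trial structure (an assumption for illustration):
# a cue state, then an outcome state that delivers reward, then a terminal state.
values = {"cue": 0.0, "outcome": 0.0, "end": 0.0}

for _ in range(200):
    td_update(values, "cue", "outcome", reward=0.0)
    td_update(values, "outcome", "end", reward=1.0)
```

Under these assumptions the prediction for the outcome state approaches the reward magnitude, and the prediction for the cue approaches its discounted value, so the prediction error on a fully learned trial shrinks toward zero.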
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence
Keywords
- Operant conditioning
- Temporal-difference learning