Cerebral Cortex July 2008;18:1485–1495 doi:10.1093/cercor/bhm198 Advance Access publication November 21, 2007

FEATURE ARTICLE Understanding the Neural Computations of Arbitrary Visuomotor Learning through fMRI and Associative Learning Theory

Andrea Brovelli¹, Nadia Laksiri¹, Bruno Nazarian², Martine Meunier¹ and Driss Boussaoud¹

Associative theory postulates that learning the consequences of our actions in a given context is represented in the brain as stimulus–response–outcome associations that evolve according to prediction-error signals (the discrepancy between the observed and predicted outcome). We tested the theory on brain functional magnetic resonance imaging data acquired from human participants learning arbitrary visuomotor associations. We developed a novel task that systematically manipulated learning and induced highly reproducible performances. This allowed us to validate the model-based results and to analyze in depth the brain signals in representative single trials. Consistent with the Rescorla–Wagner model, prediction-error signals are computed in the human brain and selectively engage the ventral striatum. In addition, we found evidence of computations not formally predicted by the Rescorla–Wagner model. The dorsal fronto-parietal network, the dorsal striatum, and the ventrolateral prefrontal cortex are activated both on the incorrect and first correct trials and may reflect the processing of relevant visuomotor mappings during the early phases of learning. The left dorsolateral prefrontal cortex is selectively activated on the first correct outcome. The results provide quantitative evidence of the neural computations mediating arbitrary visuomotor learning and suggest new directions for future computational models.

Keywords: fronto-parietal system, fronto-striatal system, model-based fMRI, prediction-error signal, reinforcement learning

Learning the consequences of our actions in their context is a fundamental cognitive ability, because it allows us and other animals to anticipate relevant events and to adapt to varying environments. If the relation between the visual stimulus, or context, the action, and its outcome is arbitrary and causal, we refer to it as arbitrary visuomotor learning (Wise and Murray 2000). In monkey neurophysiology, research on its neural bases has relied on 2 complementary approaches. One searched for covariations between the firing rate of single neurons and behavioral performance, measured by the probability of correct response. Three classes of neurons, correlating positively or negatively with the probability of correct response or with its rate of change, have been found in the dorsal premotor (PMd) and prefrontal cortex, striatum, and hippocampus (for reviews, see Wise and Murray 2000; Brasted and Wise 2005; Suzuki and Brown 2005). A complementary approach searched for changes in selectivity for the rewarded action in the average firing rate of neuronal populations (Asaad et al. 1998; Pasupathy and Miller 2005). The authors found that the selectivity strength of neurons in the prefrontal cortex and in the striatum increases and its latency decreases (occurs earlier in the trial) as learning takes place.

In human neuroimaging, a number of studies using positron emission tomography and functional magnetic resonance imaging (fMRI) have also sought to identify the underlying brain correlates. Regional cerebral blood flow has been shown to vary during learning in the frontal and parietal networks (Deiber et al. 1997) and in the prefrontal-basal ganglia pathways (Toni and Passingham 1999). Learning-related changes in the blood oxygenation level–dependent (BOLD) signal have been identified in the temporal and prefrontal areas (Toni et al. 2001), and in the parietal and frontal cortical areas (Eliassen et al. 2003). More recently, BOLD changes reflecting the probability of correct response have been found in the medial temporal lobe (MTL) as well as in the cingulate cortex and frontal lobe (Law et al. 2005). These studies have established that arbitrary visuomotor learning engages a large brain network including the frontal-parietal system, the basal ganglia, and medial temporal structures.

However, the neural computations mediating the acquisition of arbitrary visuomotor relations are not fully understood. To address this issue, we put forward hypotheses about the internal computations using animal associative learning theory (Dickinson 1980; Pearce 1997). We assumed that this ability resides in the computation of goal-directed stimulus–response–outcome associations whose strengths depend on the contingency and contiguity of the events, as well as the current goals and motivational state (Rescorla 1991; Dickinson 1994; Pearce 1997; Balleine and Dickinson 1998). We then predicted the evolution of the associative values and outcome prediction errors (i.e., the discrepancy between the observed and predicted outcome) by fitting the Rescorla–Wagner model (1972) to the behavioral performances of human participants learning arbitrary visuomotor associations. In the fMRI analysis, we tested whether the predicted computations are represented in the brain using a model-based analysis of the BOLD signals. In addition, because the newly developed task systematically manipulated learning and induced highly reproducible performances, we validated and complemented the results by analyzing the brain signals in representative single trials. The results provide novel insights into the neural computations subserving arbitrary visuomotor learning and how distributed brain networks cooperate in the early phases of learning.

© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]

¹ CNRS UMR 6193, Mediterranean Institute for Cognitive Neuroscience, 31 chemin Joseph Aiguier, 13402 Marseille, France
² Centre IRMf, IFR 131 Cerveau et Cognition, Hôpital de La Timone, Marseille, France

Materials and Methods

Subjects

Fourteen healthy subjects participated in the study (all were right-handed and 7 were female; average age 26 years). All participants gave written informed consent according to established institutional guidelines and received monetary compensation (45 euros) for their participation. The project was approved by the local ethics committee.

Task Design

Our implementation of arbitrary visuomotor learning required subjects to find by trial-and-error the correct associations between 3 colored circles and 5 finger movements (Fig. 1A). We developed a new task design to allow both a model-based analysis of the fMRI data (Corrado and Doya 2007; O'Doherty et al. 2007) and a trial-based validation of the results. Trial-based analysis is problematic when the behavioral choices are poorly reproducible across subjects and sessions, such as in the early phases of learning. To ensure the statistical power to analyze the BOLD responses in single representative trials, the novel task had to systematically manipulate learning and induce highly reproducible behavioral performances across subjects. To do so, the correct stimulus–response associations were not set a priori. Instead, they were assigned as the subject proceeded in the task.

Figure 1B shows the matrix of all the possible stimulus–response combinations, updated according to the exemplar learning session (Fig. 1A). The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the subject's motor response (Fig. 1B, from trials 1 to 3). Then, on the second presentation of stimulus S1, any untried finger movement was always considered as a correct response. Because the stimuli were presented randomly within blocks of 3 trials, stimulus S1 could occur for the second time from trials 4 to 6 (trial 4 in Fig. 1). For the second stimulus S2, the response was correct only when the subjects had performed 3 incorrect finger movements (trial 9 in Fig. 1B). For stimulus S3, the subject had to try 4 different finger movements before the correct response was found (trial 12 in Fig. 1B). In other words, the correct response was the second finger movement (different from the first tried response) for stimulus S1, the fourth finger movement for stimulus S2, and the fifth for stimulus S3. Each learning session was composed of 42 trials, 3 stimulus types (i.e., different colors, S1, S2, and S3) and 5 possible finger movements. During scanning, each subject performed 4 learning sessions, each containing new colored stimuli.

The second issue that motivated our task design was the need to dissociate the BOLD signals produced by the stimulus presentation from those due to the outcome. To do so, we introduced a variable delay ranging from 4 to 12 s (randomly drawn from a log-normal distribution) between the presentation of the colored circles and the feedback image. The next trial started after a variable delay ranging from 4 to 12 s with the presentation of another visual stimulus. The outcome image informed the subject whether the response was correct, incorrect, or too late (if the reaction time exceeded 1.5 s). "Late" trials were excluded from the analysis, because they were very rare (i.e., maximum 2 late trials per session).

The visual stimulus was projected onto the center of a screen positioned at the back of the scanner using a video projector. In the scanner, subjects could see the video reflected in a mirror (15 × 9 cm) suspended 10 cm in front of their face and subtending visual angles of 42° horizontally and 32° vertically. Finger movements and their timings were recorded using an fMRI-compatible keyboard with 5 buttons positioned under the 5 fingers of the right hand. The subjects were told that the correct stimulus–response associations were not mutually exclusive (i.e., 2 stimuli could be associated with the same finger movement), meaning that they could not infer correct associations by excluding previous correct movements.
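The on-the-fly assignment of correct responses described above can be illustrated with a minimal sketch. The class name `AdaptiveFeedback`, the finger coding (1 to 5), and the rule that an association stays fixed once assigned are assumptions made for illustration; the text does not describe the authors' actual task software.

```python
# Number of distinct finger movements that must have been tried, per stimulus,
# before the current (previously untried) movement is accepted as correct.
REQUIRED_DISTINCT = {"S1": 2, "S2": 4, "S3": 5}


class AdaptiveFeedback:
    """Assigns correct stimulus-response pairs as the session unfolds (sketch)."""

    def __init__(self, required=REQUIRED_DISTINCT):
        self.required = dict(required)
        self.tried = {s: set() for s in self.required}    # fingers tried so far
        self.assigned = {s: None for s in self.required}  # correct finger, once set

    def outcome(self, stimulus, finger):
        """Return True for a 'correct' outcome and False for 'incorrect'."""
        if self.assigned[stimulus] is not None:
            # Assumption: once assigned, the association no longer changes.
            return finger == self.assigned[stimulus]
        if finger in self.tried[stimulus]:
            return False  # repeating a movement that was already rejected
        self.tried[stimulus].add(finger)
        if len(self.tried[stimulus]) == self.required[stimulus]:
            self.assigned[stimulus] = finger  # this untried movement becomes correct
            return True
        return False
```

With this rule the first presentation of every stimulus is necessarily incorrect, and S1, S2, and S3 become correct on the second, fourth, and fifth distinct movement, respectively.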
Subjects were explicitly told that every stimulus was associated with only 1 finger movement, thus convincing them that the probability of correct response on the first presentation of each stimulus was 0.2 (1 out of 5 possible finger movements). When interviewed after the scanning session, none of the subjects had realized that learning was manipulated by the experimenter. In particular, none had realized that the first stimulus presentation was always followed by an incorrect outcome, irrespective of the response.

Figure 1. (A) Task design of an exemplar learning session. (B) Matrix of all the possible stimulus–response combinations corresponding to the exemplar session in (A). A red cross and a green tick-mark refer to incorrect and correct stimulus–response sequences, respectively. The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the motor response (from trials 1 to 3). On the second presentation of the stimulus S1 (the blue circle), any untried finger movement was always followed by a correct outcome (trial 4). The correct response for S2 and S3 (red and green circles, respectively) was found after 3 and 4 incorrect finger movements (at trials 9 and 12, respectively).

1486 Neural Representations of Arbitrary Visuomotor Learning · Brovelli et al.

Behavioral Model

Implementation of the Associative Model

Various models provide different accounts of the evolution of the associative values (Rescorla and Wagner 1972; Macintosh 1975; Pearce and Hall 1980), but all assign a central role to the computation of the outcome prediction error (i.e., the discrepancy between the observed and predicted outcome). We modeled arbitrary visuomotor learning using one of the most influential associative learning theories, the Rescorla–Wagner model (1972). The evolution of the associative strength of each action is given by

$$V_a(n+1) = V_a(n) + \eta\,\bigl(\lambda - V_a(n)\bigr) \qquad (1)$$

where n is the trial number, a = 1, ..., k (k is the number of possible actions), and η is the learning rate for a given stimulus. The asymptotic value of the association strength, λ, takes values greater than 0 after a correct response and 0 after an incorrect one. The change in associative strength at each trial is proportional to the prediction-error signal, which represents the discrepancy between the obtained and expected outcome (for a review, see Schultz 2006). The expected outcome prior to learning is formalized as the initial value of the associative strength. Because we had instructed the subjects that 1 of the 5 finger movements had to be associated with a given stimulus, the initial probability of correct response can be assumed to be 0.2. To model such initial probabilities, we set the initial value of all associative strengths to λ/k. Associative learning theory claims that the probability of performing a given action is proportional to its associative value. To model such an action selection process, we transformed the association values V(n) into probabilities according to the softmax equation, which is a standard method in reinforcement learning theory (Sutton and Barto 1998):

$$P_a(n) = \frac{\exp\bigl(\beta V_a(n)\bigr)}{\sum_{a'=1}^{k} \exp\bigl(\beta V_{a'}(n)\bigr)} \qquad (2)$$

The coefficient β is termed the inverse "temperature": low β (less than 1) causes all actions to be (nearly) equiprobable, whereas high β (greater than 1) amplifies the differences in association values. In reinforcement learning theory (Sutton and Barto 1998), this model is also referred to as the Q-learning algorithm (Watkins and Dayan 1992), in which action values are updated through the Rescorla–Wagner learning rule. To summarize, learning is described as follows: 1) the learner computes the stimulus–response–outcome association values according to the expected outcome, 2) he/she selects an action by comparing the values of multiple stimulus–response alternatives (softmax equation), and 3) he/she updates the association values according to an error-correcting process equal to the discrepancy between the obtained and estimated outcome (i.e., the prediction-error signal). Overall, learning requires the computation of the associative values and the prediction-error signals.

Estimation of the Models' Parameters

To quantify the learning computations, we need to define the free parameters of the model, namely the learning rate η, the asymptotic value of the associative strength λ, and the inverse "temperature" β in the softmax method (eq. 2). We identified the set of parameters that best fitted the behavioral data using a maximum likelihood approach. For each learning session, we varied the learning rate η from 0.1 to 1 (in steps of 0.05), the asymptotic value λ from 1 to 5 (in steps of 0.5), and β from 0.5 to 5 (in steps of 0.25). For each parameter set, we computed the log-likelihood of the actions performed by the participant as follows:

$$L = \sum_{n} \ln P_a(n) \qquad (3)$$

where P_a(n) is the probability to perform the action executed by the participant at trial n according to the model (eq. 2). If the model perfectly predicts the responses, P_a(n) is 1 and the log-likelihood L is 0. If the probability values are less than 1, L assumes negative values. Thus, we selected the set of model parameters $\hat{\theta}$ giving the best fit (i.e., highest value of L) with the behavioral data:

$$\hat{\theta} = \arg\max_{\theta} L \qquad (4)$$
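The model and its fitting procedure can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the trial format `(stimulus, action_index, correct)`, and the choice to update only the value of the performed action (as in Q-learning) are assumptions. The update uses the error-correcting form of equation 1, in which the value moves toward the asymptote by a fraction given by the learning rate.

```python
import math


def softmax(values, beta):
    """Eq. 2: convert associative values into action probabilities."""
    m = max(values)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(trials, eta, lam, beta, n_actions=5):
    """Eq. 3 for one learning session.

    trials: list of (stimulus, action_index, correct) tuples in trial order.
    """
    values = {}  # one vector of associative strengths per stimulus
    total = 0.0
    for stimulus, action, correct in trials:
        # Initial strengths are lam / n_actions, as stated in the text.
        v = values.setdefault(stimulus, [lam / n_actions] * n_actions)
        total += math.log(softmax(v, beta)[action])
        target = lam if correct else 0.0         # asymptote for this outcome
        v[action] += eta * (target - v[action])  # Eq. 1, performed action only
    return total


def frange(start, stop, step):
    """Inclusive grid built from integer steps to limit floating-point drift."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def fit_session(trials):
    """Eq. 4: exhaustive search over the parameter grid given in the text."""
    best = None
    for eta in frange(0.1, 1.0, 0.05):
        for lam in frange(1.0, 5.0, 0.5):
            for beta in frange(0.5, 5.0, 0.25):
                ll = log_likelihood(trials, eta, lam, beta)
                if best is None or ll > best[0]:
                    best = (ll, eta, lam, beta)
    return best  # (log-likelihood, eta, lam, beta)
```

The grid contains 19 × 9 × 19 parameter sets, so an exhaustive search over a 42-trial session is inexpensive and avoids the local optima a gradient-based optimizer might encounter.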

fMRI Data Acquisition and Preprocessing

Images were acquired using a 3-T whole-body imager equipped with a circular polarized head coil. For each participant, we acquired a high-resolution structural T1-weighted anatomical image (inversion-recovery sequence, 1 × 0.75 × 1.22 mm) parallel to the anterior commissure–posterior commissure plane, covering the whole brain. For functional imaging, we used a T2*-weighted echo-planar (EP) sequence at 36 interleaved 3-mm-thick axial slices with 1-mm gap (repetition time [TR] = 2400 ms, echo time = 35 ms, flip angle = 80°, field of view = 19.2 × 19.2 cm, 64 × 64 matrix of 3 × 3 mm voxels). Image analysis was conducted using SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). For each subject, all functional images were slice-time corrected to a slice acquired half-way through the volume acquisition (at TR/2) and then realigned to the first slice of each learning session. The anatomical MRI was spatially normalized to a standard T2* template and the normalization parameters were used to normalize the preprocessed functional EP images. The resulting fMRI data were then smoothed using a Gaussian kernel with a full width at half-maximum of 8 mm. Finally, intensity normalization and high-pass filtering (128 s) were applied to the data.

fMRI Data Analysis

Model-Based Analysis. Identification of Learning-Related Activations

We performed a model-based analysis of the fMRI data to test whether the computations predicted by the associative model are represented in the brain. The statistical analysis of the preprocessed event-related BOLD signals was performed using a general linear model approach. Because we aimed at dissociating 2 events per trial (the stimulus–response event and the outcome image, as in Fig. 1A), the regressors were constructed by convolving the canonical hemodynamic response function with delta functions of constant or varying amplitudes aligned on the time of stimulus or outcome presentation. The first and third regressors of the design matrix had unit amplitudes and were aligned on the time of stimulus and outcome presentation, respectively. These 2 regressors are excluded from the results, as they account for BOLD responses that did not vary during learning. Results will focus on 3 regressors whose amplitude varied according to the associative model: 1 regressor (the second in the design matrix) "searched" for the neural representations of the associative values and 2 (the fourth and fifth) for those of the prediction-error signals. The delta functions of the second regressor, with onsets at the stimulus presentations and a duration of 1.5 s, were parametrically modulated according to the associative values V_a(n). In order to explore the neural substrate of prediction-error signals, the amplitude of the fourth regressor varied according to the rate of change of V_a(n), that is V_a(n + 1) – V_a(n), and it was aligned on the outcome image. Because the prediction error is negative on the first incorrect trials, we also hypothesized that negative prediction-error signals could produce positive BOLD deflections. Therefore, the parametric modulation of the fifth regressor was equal to the absolute value of the fourth regressor, that is, the absolute value of the prediction-error signals. The regression parameters (the beta values) were estimated for each subject and design matrix, and they were taken to the random-effects level. All the fMRI statistics and P values arise from group random-effects analyses. Because we searched for significant correlations over the entire brain, we considered as activated all the voxels with a P < 0.01 after whole-brain correction for multiple comparisons (family-wise error).
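As a rough illustration of how the two outcome-locked parametric regressors could be assembled, the sketch below derives the signed and absolute prediction-error modulators from successive associative values and convolves the resulting delta functions with a hemodynamic response. The double-gamma shape and its parameters (peak, undershoot, ratio) are assumptions standing in for the canonical response function used by SPM5, and all function names are hypothetical. The sketch omits any further processing that the analysis software may apply to parametric modulators.

```python
import math


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (s); shape parameters are assumed values."""
    if t <= 0:
        return 0.0
    g1 = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    g2 = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return g1 - ratio * g2


def prediction_error_modulators(values_before, values_after):
    """Signed (fourth regressor) and absolute (fifth regressor) amplitudes.

    values_before[i] and values_after[i] are V_a(n) and V_a(n + 1) of the
    performed action on the i-th trial.
    """
    signed = [after - before for before, after in zip(values_before, values_after)]
    return signed, [abs(pe) for pe in signed]


def build_regressor(onsets, amplitudes, n_scans, tr=2.4):
    """Convolve amplitude-weighted delta functions with the HRF, one sample per scan."""
    regressor = []
    for scan in range(n_scans):
        t = scan * tr
        # Convolution with delta functions reduces to a sum of shifted, scaled HRFs.
        regressor.append(sum(a * double_gamma_hrf(t - onset)
                             for onset, a in zip(onsets, amplitudes)))
    return regressor
```

Passing the outcome onsets together with the signed or the absolute amplitudes to `build_regressor` yields one column of the design matrix for each modulator.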
In addition, we defined as activated brain regions those clusters of more than 5 voxels, thus giving a P < 0.001 corrected for cluster dimension. The Montreal Neurological Institute (MNI) coordinates of the activated voxels were transformed to Talairach space using a Matlab (MathWorks, Inc. MA) code developed by Matthew Brett (http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach). The anatomical location of each activated voxel within each activated cluster was identified using the Talairach Daemon software (Lancaster et al. 2000) (http://ric.uthscsa.edu/resources/), and the location of a cluster was defined as the brain region corresponding to the majority of activated voxels. The extent and the number of voxels in each activated cluster were defined on functional rather than anatomical data.

Trial-Based Analysis. Computation of the Grand-Average BOLD Responses and the Identification of Learning-Related Functional Networks

To validate the results of the model-based analysis and to better understand the functional role of each activated cluster, we analyzed the BOLD responses in 5 sets of representative single trials: the first set of 3 incorrect trials (trials 1–3 in each learning session, Fig. 1A), the next set of 3 incorrect trials, the first set of 3 correct trials (1 for each stimulus–response association), and the second and third sets of correct trials (3 trials each for each learning session). We first normalized the preprocessed fMRI signals by subtracting the mean value and dividing by the standard deviation calculated over the scanning session. Then, we averaged the normalized time courses over the activated voxels for each cluster. In order to attain the highest temporal resolution, we grouped all the data points across subjects for a given set of trials (168 trials: 14 subjects with 4 sessions and 3 trials each), and we analyzed the data in sliding windows of 1.2 s (TR/2) stepped every 0.1 s.
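The pooling of data points into overlapping time windows can be sketched as follows. The function name and the input format, a flat list of `(latency_s, bold_value)` pairs gathered over all trials and subjects of one trial set, are assumptions for illustration. For each window the sketch also returns the mean, the standard error of the mean, and the number of contributing points.

```python
import math


def sliding_window_average(samples, t_start, t_stop, width=1.2, step=0.1):
    """Average pooled samples in overlapping windows.

    samples: list of (latency_s, bold_value) relative to the event of interest.
    Returns a list of (window_centre, mean, sem, n) tuples.
    """
    out = []
    n_windows = int(round((t_stop - t_start - width) / step)) + 1
    for i in range(n_windows):
        lo = t_start + i * step
        hi = lo + width
        vals = [v for t, v in samples if lo <= t < hi]
        n = len(vals)
        if n == 0:
            continue  # no scan fell into this window
        mean = sum(vals) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)  # sample variance
            sem = math.sqrt(var / n)
        else:
            sem = float("nan")  # SEM is undefined for a single point
        out.append((lo + width / 2, mean, sem, n))
    return out
```

Because scan times are not locked to the events, latencies differ from trial to trial, so the pooled samples cover the time axis more densely than a single trial sampled once per TR.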
For each time window, we computed the mean BOLD response and the standard error of the mean (Supplementary Fig. 2). Because the fMRI scans were not synchronized to stimulus presentation, the BOLD signals were not sampled at exactly the same latencies with respect to the external events, but varied across trials. By grouping all the data across subjects, we could thus compute the grand-average BOLD responses for the 5 sets of representative trials at a temporal resolution finer than the repetition time of 2.4 s (Supplementary Fig. 2). On average, each window of 1.2 s contained 83.5 ± 10.6 (mean ± standard deviation) data points. We then examined whether the different clusters formed functionally homogeneous networks. To investigate grouping in our fMRI data, we used a hierarchical clustering method. Hierarchical clustering uses similarity or dissimilarity between every pair of objects in the data to create a hierarchical tree or dendrogram.

Cerebral Cortex July 2008, V 18 N 7 1487

We analyzed the concatenated grand-average BOLD responses for the 5 sets of trials, and we computed the Pearson's correlation r among all pairs of clusters to create a dissimilarity matrix. The dissimilarity or "distance" between clusters was defined as 1 − r: 2 highly correlated clusters (r close to 1) had the shortest "distance" (close to 0) and could be considered as part of the same network; very dissimilar clusters (r close to 0) had the largest "distance" and were considered as belonging to different networks. We identified agglomerations according to the minimum distance (i.e., maximal correlation) between clusters and we built a hierarchical tree by progressively merging clusters into networks using the single linkage clustering algorithm. The resulting hierarchical tree, or dendrogram, is read from bottom to top: clusters with the shortest distance (i.e., networks composed of highly correlated clusters) are initially merged together and then merged into successively larger networks. The height of each inverted U is the distance (1 − r) between the 2 clusters or networks being connected. Further information on clustering methods can be found at http://en.wikipedia.org/wiki/Data_clustering. Matlab (The MathWorks, Inc.) functions in the Statistics Toolbox were used for this analysis.
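The clustering procedure can be illustrated with a small pure-Python sketch of single linkage agglomeration on the 1 − r distance. It is a stand-in written for illustration, not the Matlab toolbox functions used in the study. The function names and the input format (a dictionary mapping each cluster name to its concatenated grand-average time course) are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_linkage(series):
    """Agglomerate time courses; returns merges as (members_a, members_b, distance)."""
    names = list(series)
    # Pairwise dissimilarity: 1 - r.
    dist = {(a, b): 1.0 - pearson_r(series[a], series[b])
            for a in names for b in names if a != b}
    groups = [frozenset([n]) for n in names]
    merges = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: group distance is the minimum pairwise distance.
                d = min(dist[(a, b)] for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(groups[i]), sorted(groups[j]), d))
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return merges
```

The list of merges, read in order, corresponds to reading the dendrogram from bottom to top, and each merge distance is the height of the corresponding inverted U.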

Results

Behavioral Results and Model's Predictions

Our task design produced reproducible learning performances across sessions and subjects. As illustrated in Figure 2A, the small standard error values provide evidence for highly reproducible learning performances for the 3 stimulus–response associations. Figure 2B shows the probability of correct response computed on the sequences of outcomes using the state-space model approach developed by Smith et al. (2004) and previously applied to animal data by Wirth et al. (2003). Similarly, the variance across sessions and subjects is small with respect to the mean value. The first association is rapidly learnt, whereas the second and third are acquired more gradually. Thus, our task design successfully manipulated learning performance and induced reproducible performances ranging from rapid to gradual acquisition. In addition, the observed learning curves are similar to those observed in previous electrophysiological and neuroimaging studies (e.g., Mitz et al. 1991; Chen and Wise 1995; Law et al. 2005).

Figure 2. Behavioral learning curves. (A) The 3 curves (1 for each stimulus) representing the probability of correct response versus the number of stimulus presentations (not according to the actual appearance during the learning session) computed on the sequence of outcomes (1 for correct and 0 for incorrect). The small standard error values provide evidence for quasi-stereotyped learning behaviors. (B) The probability of correct response computed on the sequences of outcomes using the state-space model approach developed by Smith et al. (2004).

Model's Predictions

The associative model significantly fitted the behavioral responses during learning (Supplementary Fig. 1). The learning computations predicted by the associative model are the associative value Va(t) and its rate of change, which is proportional to the prediction-error signal. Figure 3A shows the mean associative values after the presentation of the 3 different stimuli averaged over sessions and subjects. The small standard error values provide evidence for highly reproducible learning computations. The associative strengths increased rapidly after the second, fourth, and fifth presentation of stimuli 1, 2, and 3, respectively. The average increase in associative strength for stimulus 1 displayed a step-like function, whereas it was more gradual for stimulus 3. This is evident in the average prediction-error curve (Fig. 3B). On the stimulus presentations where the action was incorrect (e.g., presentations 1, 2, and 3 for stimulus 2), the prediction error of the performed (incorrect) action was negative, as predicted by equation 1. The prediction-error values were positive on the first correct trials for all stimuli (Fig. 3B).
For stimuli 2 and 3, the prediction-error signals did not reach zero on the second correct trials, but remained above zero on the first few trials (Fig. 3B inset, stimulus presentations 6–8). This explains the more gradual acquisition of the second and third association. In addition, the peak prediction-error values decreased slightly from stimulus 1 to stimulus 3. This is in line with the notion that the prediction error is higher for unexpected positive feedback (i.e., correct action after the presentation of stimulus 1) than for nearly predictable outcomes (i.e., the last possible action for stimulus 3). Figure 3C shows the average probability of correct response predicted by the associative model (eq. 2), which fits the one computed on the behavioral data shown in Figure 2B.

Figure 3. Mean evolution of the learning representations and the probability of correct response (PCR). (A) Associative values V(t) (1 for each stimulus); the values are plotted against the number of stimulus presentations and not according to the actual appearance during the learning session. (B) Average prediction-error curves. (C) Average PCR predicted by the associative model (eq. 2). The inset in (B) is a zoom-in from trials 6–10 of the average prediction-error curves.

Neuroimaging Results

We searched for the brain regions whose BOLD response correlated with the learning computations predicted by the associative model, that is, the associative strengths and the prediction-error signals. No voxels were found to correlate with the associative values at the presentation of the stimulus. By contrast, several brain structures displayed BOLD responses correlating with either the signed or the absolute values of the prediction-error signals (fourth and fifth regressors). The results are summarized in Table 1: 2 clusters (cluster numbers 1 and 2) preferentially correlated with the signed prediction-error signals during learning and 9 clusters (3–11) with its absolute value.

Table 1. Activated clusters correlating with prediction-error signals

Cluster  Cluster  Cluster  Peak MNI coordinates  Peak     Peak     Brain area               Hemisphere
name     size     P value  (x, y, z)             P value  T value

Prediction error
1        7        0.0001   9, 3, 3               0.0001   7.00     Ventral striatum         R
2        7        0.0001   45, 6, 30             0.0009   6.24     DLPFC (BA 9)             L

Absolute prediction error
3        59       0.0000   21, 15, 3             0.0000   7.60     Dorsal striatum          L
4        97       0.0000   15, 18, 0             0.0000   8.52     Dorsal striatum          R
5        17       0.0000   27, 0, 54             0.0004   6.48     PMd (BA 6)               L
6        92       0.0000   27, 6, 60             0.0000   8.07     PMd (BA 6)               R
7        16       0.0000   6, 0, 57              0.0002   6.59     SMA (BA 6)               L
8        23       0.0000   21, 72, 63            0.0012   6.17     SPL (BA 7)               L
9        9        0.0001   6, 69, 57             0.0029   5.96     SPL (BA 7)               R
10       20       0.0000   45, 42, 51            0.0009   6.26     IPL (BA 7)               R
11       12       0.0000   30, 54, 0             0.0005   6.38     Anterior VLPFC (BA 10)   R

Note: Clusters 1–2 significantly correlate with the signed value of the prediction-error signal, whereas clusters 3–11 correlate with its absolute value. The P values for each cluster (third column) arise from group random-effects analyses and they are corrected for multiple comparisons (family-wise error) and cluster size (see Methods). Abbreviations: BA, Brodmann area; DLPFC, dorsolateral prefrontal cortex.

Clusters 1 and 2 were localized in the ventral portion of the right head of the caudate nucleus (i.e., the ventral striatum) and in the left dorsolateral prefrontal cortex (DLPFC, Fig. 4). Clusters 3–11 were localized in different cortical and subcortical areas (Fig. 5): clusters 3 and 4 were localized in the left and right dorsal striatum, respectively (left and right head of the dorsal portion of the caudate nucleus and putamen); clusters 5 and 6 were in the premotor areas (left and right PMd cortices); cluster 7 in the left supplementary motor area (SMA); clusters 8 and 9 were located in the superior parietal lobule (SPL); and cluster 10 in the inferior parietal lobule (IPL); finally, cluster 11 was in the anterior portion of the ventrolateral prefrontal cortex in the right hemisphere (VLPFC).

Figure 4. Brain clusters significantly correlating with the prediction-error signals. (A) Axial and (B) coronal views of the activations. The cluster number is the same as in Table 1. Activated voxels with a P < 0.01 after whole-brain correction for multiple comparisons (family-wise error). Activated clusters were those with more than 5 voxels, thus giving a P < 0.001 corrected for cluster dimension.

In order to examine whether the different learning-related clusters formed functionally homogeneous networks, we analyzed the grand-average BOLD responses using a hierarchical clustering method (see Methods). Figure 6 shows the hierarchical tree or dendrogram obtained from this analysis. A first network originated from the 2 premotor areas (PMd left and right) before merging with the 2 superior parietal clusters and finally with the inferior parietal cluster (green in Fig. 6);

a second network formed between the clusters in the dorsal striatum and the SMA (in blue). These 2 networks then merged together with the anterior VLPFC (in yellow), the DLPF cortex (in red), and finally with the ventral striatum (in gray). Figure 7A shows the grand-average BOLD responses for the 5 clusters merging in the fronto-parietal network (in green in Fig. 7). These clusters significantly correlate with the absolute value of the prediction error (Table 1) and they display an increase on both the incorrect (Fig. 7A, left panels) and first correct trials (Fig. 7A, right panels). This is evident also in Figure 8A, which shows the histogram of the beta values of the second-level analysis (group analysis across subjects) for the 5 regressors of the design matrix: the main contribution comes from correlations with the absolute value of the prediction error (fifth column in each panel of Fig. 8). Figure 7B shows the grandaverage BOLD responses for 3 clusters forming a fronto-striatal network (blue in Fig. 6), that significantly correlated with the absolute value of the prediction-error signal (Table 1). However, a closer inspection of the histograms of the beta coefficients in Figure 8B indicates a nonzero contribution of the fourth regressor, thus indicating a small canceling effect between the signed and absolute prediction-error signals. This reflects the fact that the BOLD responses on the first set of correct trials is stronger than those on the incorrect trials. Figure 7C shows the grand-average BOLD responses in the cluster located in the anterior ventrolateral prefrontal cortex, which correlated with Cerebral Cortex July 2008, V 18 N 7 1489

Figure 5. Brain clusters significantly correlating with the absolute value of the prediction-error signals. (A) and (B) are coronal views of the activations whereas (C) and (D) are axial views. The cluster numbering is the same as in Table 1.

Figure 6. Cluster analysis. The dendrogram is read from bottom to top, starting from the shortest distance between clusters (i.e., networks composed of highly correlated clusters). The height of each inverted U represents the distance (1 - r) between the 2 clusters being connected. The cluster number is the same as in Table 1; clusters 1 and 2 are selective for the signed value of the prediction-error signals, whereas clusters 3--11 correlate with its absolute value. The corresponding cluster names are also reported.
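As a reading aid for this dendrogram, the following minimal sketch shows how a merge order can be derived from the distance 1 - r. It takes r to be the Pearson correlation between cluster time courses and uses average linkage; both are assumptions, because the caption gives only the distance. The time courses are invented toy values, and the names `pearson_r` and `agglomerate` are hypothetical helpers, not part of the original analysis.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(series):
    """Average-linkage clustering on the distance 1 - r (linkage is assumed).

    `series` maps a label to a time course. Returns the merges in the order
    they occur, as (members_a, members_b, distance) tuples, i.e. the
    dendrogram read from bottom to top.
    """
    labels = list(series)
    dist = {
        frozenset((a, b)): 1.0 - pearson_r(series[a], series[b])
        for a, b in combinations(labels, 2)
    }

    def linkage(g, h):
        # Mean pairwise distance between the members of two groups.
        total = sum(dist[frozenset((a, b))] for a in g for b in h)
        return total / (len(g) * len(h))

    groups = [(label,) for label in labels]
    merges = []
    while len(groups) > 1:
        g, h = min(combinations(groups, 2), key=lambda pair: linkage(*pair))
        merges.append((g, h, linkage(g, h)))
        groups = [x for x in groups if x not in (g, h)] + [g + h]
    return merges


# Invented toy time courses; the labels only echo region names in the text.
toy = {
    "PMd": [0.0, 1.0, 2.0, 1.0, 0.0, 0.5],
    "SPL": [0.1, 1.1, 1.9, 1.2, 0.1, 0.4],
    "SMA": [0.0, 0.4, 1.0, 1.6, 0.8, 0.2],
    "VS": [0.5, -1.0, -0.5, 1.5, 1.0, 0.0],
}
merges = agglomerate(toy)
for left, right, d in merges:
    print(left, "+", right, f"distance = {d:.3f}")
```

With these toy values the two most correlated series merge first and the least correlated one joins last. This is the sense in which a structure that merges last in the dendrogram is read as relatively independent of the others.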

the absolute values of the prediction errors (Fig. 8C). Figure 7D shows the response for the left DLPFC, which significantly correlated with the signed values of the prediction-error signal. However, this cluster shows a strong response only on the first correct trials (green curve in the right panel); in fact, the contributions from the signed and absolute values of the prediction errors are similar (Fig. 8D) and thus canceled out in the incorrect trials. Finally, the grand-average BOLD response measured in the ventral striatum (Fig. 7E) showed a negative peak around 4 s after incorrect outcome onset and a strong positive deflection on the first correct trials, thus correlating with the signed value of the prediction-error signal (Fig. 8E). Only this cluster specifically followed the signed prediction-error signal. This analysis shows that the trial-based validation of the model-based fMRI results provides a deeper insight into the dynamics of activation.

Discussion

We tested the Rescorla--Wagner model on fMRI data acquired from human participants learning arbitrary visuomotor associations. We developed a novel task that systematically manipulated learning and induced highly reproducible performances across subjects and sessions. This allowed us to perform a model-based analysis of the fMRI data and an in-depth validation of


the results using a trial-based approach. The results provide quantitative evidence of the neural computations mediating arbitrary visuomotor learning, some beyond those predicted by the Rescorla--Wagner model.

Neural Computations Predicted by the Rescorla--Wagner Model

Associative Values

No brain areas were found to reflect the stimulus--response--outcome associative strengths. Previous neuroimaging studies of arbitrary visuomotor learning found learning-related increases in the temporal and prefrontal areas (Toni et al. 2001) and in the parietal and frontal cortical areas (Eliassen et al. 2003). BOLD signals correlating with the probability of correct response, which is a relative measure of the associative values, have been found in the MTL as well as in the cingulate cortex and frontal lobe (Law et al. 2005). However, these effects are not comparable with those presented here, because they reflected the brain changes produced both by the stimulus--response pair and the outcome. Haruno and Kawato (2006) found stimulus-specific BOLD correlates of the associative strengths in the putamen, superior parietal, dorsolateral prefrontal, PMd, and occipital cortices, insula, thalamus, cerebellum, anterior cingulate cortex, and SMA. Kim et al.

Figure 7. Grand-average BOLD responses aligned on the outcome image for the 5 sets of trials (legend). The left panels show the grand-average BOLD responses for the 2 sets of incorrect trials (black and gray curves), whereas the right panels show the responses for the first, second, and third sets of correct trials (green, red, and blue curves, respectively). (A) Network including the PMd and the parietal lobule (green network in Fig. 6). (B) Network with the SMA and the dorsal striatum (blue in Fig. 6). (C) Right VLPFC cluster (yellow in Fig. 6). (D) Left dorsolateral prefrontal cortex, and (E) the ventral striatum.

(2006) also showed that the stimulus-driven activity in the orbitofrontal cortex reflects expected reward value signals. However, both studies used stochastic rather than deterministic visuomotor associations, and they also differed in the computational models and statistical tools used. From the neurophysiological point of view, several studies have shown that neurons displaying either an increase or a decrease in firing rate during learning are spatially intermixed in the brain regions mediating arbitrary visuomotor learning (Wise and Murray 2000; Hadj-Bouziane and Boussaoud 2003; Brasted and Wise 2005; Suzuki and Brown 2005). Neurons coding for action-specific reward values were also found in the striatum (Samejima et al. 2005). If we interpret such modulations as the single-neuron correlates of the present associative values, it is tempting to suggest that local canceling effects could produce null net BOLD signals, and thus explain our negative results. Overall, further work is needed to provide a global picture of the neural correlates of the associative values at the single-neuron and system levels, whether during probabilistic or deterministic learning.
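To make the quantities discussed here concrete, the following minimal sketch computes the associative value, the signed prediction error, and its absolute value for one stimulus across trials under a Rescorla--Wagner update. It is an illustration under stated assumptions, not the fitting procedure used in this study: the learning rate `alpha`, the initial value `v0` (set to chance level for 5 buttons), the 1/0 outcome coding, and the function name `rescorla_wagner` are all hypothetical choices.

```python
def rescorla_wagner(trials, alpha=0.5, v0=0.2):
    """Trial-by-trial values and prediction errors for one stimulus.

    trials: sequence of (response, outcome) pairs, with outcome 1 for a
            correct and 0 for an incorrect feedback (assumed coding).
    alpha:  learning rate (hypothetical value).
    v0:     initial associative value of every untried response
            (hypothetical; 0.2 is chance level with 5 buttons).
    """
    values = {}  # associative value of each stimulus-response pair
    trace = []
    for response, outcome in trials:
        v = values.get(response, v0)
        delta = outcome - v  # signed prediction error
        trace.append({"value": v, "signed": delta, "absolute": abs(delta)})
        values[response] = v + alpha * delta  # Rescorla-Wagner update
    return trace


# Two incorrect attempts with different fingers, then three correct trials.
trials = [(1, 0), (3, 0), (2, 1), (2, 1), (2, 1)]
trace = rescorla_wagner(trials)
for (response, outcome), t in zip(trials, trace):
    print(
        f"response={response} outcome={outcome} "
        f"V={t['value']:.2f} signed={t['signed']:+.2f} abs={t['absolute']:.2f}"
    )
```

Under these assumptions the signed error is negative on incorrect trials, largest on the first correct trial, and then decays as the association is learned, whereas the absolute error is positive on both incorrect and first correct trials.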

Prediction-Error Signals

The activity in the ventral striatum correlated with the signed value of the prediction-error signals (Table 1, Fig. 4). The grand-average BOLD responses showed a deactivation on the 2 sets of incorrect trials and an increase on the first set of correct trials (Figs 4A,B and 7E). This structure is likely to be functionally independent from the remaining network, because it merged last in the dendrogram (Fig. 6). Prediction-error signals had not previously been observed during arbitrary visuomotor learning (Deiber et al. 1997; Toni and Passingham 1999; Toni et al. 2001; Eliassen et al. 2003). Our result corroborates reports on prediction-error signals in the ventral striatum during probabilistic instrumental learning (O’Doherty et al. 2004; Kim et al. 2006; Pessiglione et al. 2006). Given that the ventral striatum receives dopaminergic afferents coding for prediction-error signals during a one-association instrumental learning task (Hollerman and Schultz 1998; Schultz 1998; Hollerman et al. 2000; Schultz and Dickinson 2000), the observed BOLD response in the ventral striatum is likely to be produced by changes in synaptic

Figure 8. Histogram of the mean ± standard error of the regression coefficients (the beta values) of the second-level analysis (group analysis across subjects). (A) Network including the PMd and the parietal lobule (green network in Fig. 6). (B) Network with the SMA and the dorsal striatum (blue in Fig. 6). (C) Right VLPFC cluster (yellow in Fig. 6). (D) Left dorsolateral prefrontal cortex, and (E) the ventral striatum.
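As a reading aid for these histograms, the worked case below shows how the coefficients on the signed and absolute prediction-error regressors combine on incorrect and correct trials. The notation is ours and is a simplification: it keeps only those 2 regressors, writes the signed prediction error as \delta, and is not the full design matrix of the study.

```latex
% Simplified two-regressor reading (our notation, an assumption):
\[
  y \;\approx\; \beta_{s}\,\delta \;+\; \beta_{a}\,\lvert\delta\rvert
\]
% Incorrect trials (\delta < 0) and correct trials (\delta > 0):
\[
  y \;\approx\;
  \begin{cases}
    (\beta_{a} - \beta_{s})\,\lvert\delta\rvert, & \delta < 0,\\[4pt]
    (\beta_{a} + \beta_{s})\,\lvert\delta\rvert, & \delta > 0.
  \end{cases}
\]
% Three limiting cases:
%   \beta_{s} \approx 0          : equal increases on incorrect and correct trials
%   \beta_{s} \approx \beta_{a}  : no response on incorrect trials, 2\beta_{a}|\delta| on correct trials
%   \beta_{a} \approx 0          : decrease on incorrect trials, increase on correct trials
```

Read this way, comparable signed and absolute coefficients cancel on incorrect trials and add on correct trials, which corresponds to the pattern described for the left DLPFC. A dominant absolute coefficient gives increases on both trial types, and a dominant signed coefficient gives the deactivation and activation pattern described for the ventral striatum.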

activity due to modulations in dopaminergic neuron activity. Our results, however, cannot rule out that other brain structures such as the medial prefrontal cortex (Matsumoto et al. 2007) could represent prediction-error signals.

Learning Computations Beyond the Rescorla--Wagner Model

Absolute Value of the Prediction-Error Signals

Several brain areas correlated with the absolute value of the prediction-error signals (Table 1). The PMd cortex, SPL, and IPL (clusters 5, 6, 8, 9 and 10 in Table 1; Fig. 5) formed a homogeneous functional network (in green in Fig. 6). Because the ventral striatum correlated with the signed prediction-error signals, this activity should be interpreted as reflecting computations other than pure prediction-error signals. An alternative model of associative learning could provide a plausible interpretation (Pearce and Hall 1980). The Pearce--Hall model is based on the assumption that animals need to attend to a stimulus or motor response only while they are learning about its relationship with its consequences (Pearce 1997; Pearce and Bouton 2001). During learning, the


amount of controlled attention, and the amount of processing of a stimulus or motor response, is proportional to the absolute prediction-error signal. Thus, extinction and acquisition are formulated differently from the Rescorla--Wagner model. In fact, extinction is seen as an additional form of conditioning, the reinforcer being the omission of the expected outcome. In other words, the model postulates that inhibitory learning occurs if the expected correct outcome is omitted (as in the first incorrect trials in our experiment) such that an association forms between the observed stimulus (or response) and the absence of the expected outcome (e.g., the frustration produced after an incorrect outcome or, more simply, the incorrect outcome image). As in the Rescorla--Wagner model, excitatory learning is thought to be responsible for the acquisition of correct associations. Therefore, we suggest that inhibitory and excitatory learning take place after the presentation of incorrect and correct outcomes, respectively. The dorsal fronto-parietal network should reflect the processing of preceding incorrect or correct visuomotor mappings while learning about their relationship with their consequences, that is, when attention is directed to them. Current hypotheses about the functional role of the dorsal fronto-parietal network

support this interpretation: previous work has highlighted its dual role in the computation of visuomotor transformations (Burnod et al. 1999; Culham and Valyear 2006) and in the control of goal-directed attention to salient stimuli and responses (Corbetta and Shulman 2002). The known cortico-cortical connections among the fronto-parietal regions additionally support the notion of a functionally homogeneous network (Wise et al. 1997; Tanne-Gariepy et al. 2002). This interpretation provides an explanation for the central role of the premotor cortex known from neurophysiological and lesion studies (Mitz et al. 1991; Chen and Wise 1995; Wise and Murray 2000; Hadj-Bouziane et al. 2003; Suzuki and Brown 2005) and of the PMd--SPL network following neuroimaging studies (Deiber et al. 1997; Eliassen et al. 2003; Law et al. 2005). A second network correlating with the absolute value of the prediction error included the dorsal striatum and the SMA (in blue in Fig. 6). The dorsal striatum clusters extend over the head of the caudate and the putamen, but no clear distinction among them could be made. This network differed from the fronto-parietal network by a relatively smaller increase in BOLD response on the incorrect trials, especially in the SMA (Fig. 7B). Previous studies showed caudate activity correlating with the absolute value of the prediction error during probabilistic instrumental learning (Haruno and Kawato 2006). Learning-related activity was found in the head of the caudate nucleus (Toni et al. 2001; Seger and Cincotta 2005; Grol et al. 2006); fMRI activity was found in the SMA (Boettiger and D’Esposito 2005) and in the putamen (Seger and Cincotta 2005) during categorization learning tasks, with robust activations when subjects perceived a contingency between their actions and the outcome (Tricomi et al. 2004).
In general, the caudate is thought to be involved in stimulus--response habit learning (Graybiel 1998, 2005; Packard and Knowlton 2002) and/or action--outcome learning (Hollerman et al. 2000; Hadj-Bouziane et al. 2003; Yin and Knowlton 2006). Consistently, Williams and Eskandar (2006) recently showed that microstimulation of the dorsal striatum at the time of reward after a correct response significantly enhanced the rate of learning. Therefore, both the fronto-parietal network and the fronto-striatal network reflect the processing of preceding incorrect or correct visuomotor and sensorimotor mappings while learning is taking place. These networks might differ in their contributions to the visuomotor (fronto-parietal circuit) and consolidation (fronto-striatal system) processes during learning. The anterior VLPFC (cluster 11) also correlated with the absolute value of the prediction error (Fig. 5). The lateral prefrontal cortex is known to be involved in conditional visuomotor associations (Toni and Passingham 1999; Passingham et al. 2000; Toni et al. 2001), maintenance of goal-relevant information in working memory, rule-based response selection, and the generation of new abstract rules (Miller 2000; Bunge 2004; Miller et al. 2004). Our fMRI result is, however, original, because it shows that this area is activated in the early phases of learning. Therefore, we suggest that the prefrontal activation reflects the retrieval from memory of the observed stimuli and previously performed actions and/or their processing during the early phases of learning.

First Correct Outcomes

In the dorsolateral prefrontal cortex, Brodmann area 9 (cluster 2 in Table 1), we found a selective BOLD activation on the first correct trials (Fig. 7D) and a comparable contribution from the

signed and absolute values of the prediction-error signals (Fig. 8D). The cluster is relatively independent from the other structures (in red in Fig. 6). In line with our result, the inferior frontal gyrus has been shown to selectively activate on first correct trials (Eliassen et al. 2003). In addition, the dorsolateral prefrontal cortex was found to decrease in activity during arbitrary visuomotor learning (Deiber et al. 1997; Law et al. 2005). The ability to rapidly learn stimulus--response--outcome associations is required for a correct implementation of learning strategies such as repeat-stay (performing the same action if previously rewarded). Another learning strategy, change-shift (selecting a different action if previously unrewarded), requires the reactivation of the incorrect associations. Thus, our results are in line with previous reports showing deficits in rapid arbitrary visuomotor learning and strategy use after lesions of the lateral and orbital prefrontal cortex (Bussey et al. 2001) and electrophysiological findings showing a selectivity in the discharge of prefrontal neurons for the type of learning strategy (Genovesio et al. 2005).

Conclusion

Overall, we showed that arbitrary visuomotor learning involves distributed areas of the "sensorimotor," "associative," and "limbic" fronto-striatal circuits that interact with the fronto-parietal network at the level of the PMd. Consistent with the Rescorla--Wagner model, prediction-error signals are computed in the human brain and selectively engage the ventral striatum. In parallel, the dorsal fronto-parietal network and the dorsal striatum activate in relation to the absolute value of the prediction error, and may reflect the selective processing of incorrect and correct visuomotor mappings while learning about their relationship with their consequences.
The right VLPFC might represent the retrieval from memory of the observed stimuli and previously performed actions and/or their processing during the early phases of learning. Interestingly, the information about the first correct outcomes, which is crucial for a rapid acquisition of arbitrary associations, selectively activates the left dorsolateral prefrontal cortex. We thus provided quantitative evidence about the neural computations underlying arbitrary visuomotor learning and novel information about the functional specialization of the fronto-parietal and fronto-striatal systems. The newly developed learning task systematically manipulated learning and induced highly reproducible performances. Most importantly, it provides a novel perspective for the development of new tasks addressing other types of learning and/or decision-making processes. In addition, the results suggest new directions for future computational models of arbitrary visuomotor learning within the scope of modern concepts in instrumental learning (Balleine and Dickinson 1998; Gallistel et al. 2004).

Supplementary Material

Supplementary material can be found at: http://www.cercor.oxfordjournals.org/.

Funding

French Ministère de la Recherche (ACI Neurosciences intégratives et computationnelles); and 2-year postdoctoral fellowship awarded by the Fondation pour la Recherche Médicale (Paris, France) to A.B.

Notes

We wish to thank Jean-Luc Anton and Muriel Roth (fMRI Centre, IFR 131 Cerveau et Cognition, Marseille, France) for support in the fMRI data acquisition, Pierre-Arnaud Coquelin (Centre de Mathématiques Appliquées, UMR 7641 CNRS-Ecole polytechnique) for help in modelling, and Anders Ledberg (Computational Neuroscience Group, Universitat Pompeu Fabra, Barcelona, Spain) for useful suggestions in data analysis. Conflict of Interest: None declared.

Address correspondence to Andrea Brovelli, PhD, UMR 6193 CNRS & Aix-Marseille University, Mediterranean Institute for Cognitive Neuroscience, 31 chemin Joseph Aiguier, 13402 Marseille, France. Email: [email protected].

References

Asaad WF, Rainer G, Miller EK. 1998. Neural activity in the primate prefrontal cortex during associative learning. Neuron. 21:1399--1407.
Balleine BW, Dickinson A. 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 37:407--419.
Boettiger CA, D’Esposito M. 2005. Frontal networks for learning and executing arbitrary stimulus-response associations. J Neurosci. 25:2723--2732.
Brasted PJ, Wise SP. 2005. The arbitrary mapping of sensory inputs to voluntary and involuntary movements: learning-dependent activity in the motor cortex and other telencephalic networks. In: Riehle A, Vaadia E, editors. Motor cortex in voluntary movements. Boca Raton: CRC Press. p. 259--296.
Bunge SA. 2004. How we use rules to select actions: a review of evidence from cognitive neuroscience. Cogn Affect Behav Neurosci. 4:564--579.
Burnod Y, Baraduc P, Battaglia-Mayer A, Guigon E, Koechlin E, Ferraina S, Lacquaniti F, Caminiti R. 1999. Parieto-frontal coding of reaching: an integrated framework. Exp Brain Res. 129:325--346.
Bussey TJ, Wise SP, Murray EA. 2001. The role of ventral and orbital prefrontal cortex in conditional visuomotor learning and strategy use in rhesus monkeys (Macaca mulatta). Behav Neurosci. 115:971--982.
Chen LL, Wise SP. 1995. Neuronal activity in the supplementary eye field during acquisition of conditional oculomotor associations. J Neurophysiol. 73:1101--1121.
Corbetta M, Shulman GL. 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci. 3:201--215.
Corrado G, Doya K. 2007. Understanding neural coding through the model-based analysis of decision making. J Neurosci. 27:8178--8180.
Culham JC, Valyear KF. 2006. Human parietal cortex in action. Curr Opin Neurobiol. 16:205--212.
Deiber MP, Wise SP, Honda M, Catalan MJ, Grafman J, Hallett M. 1997. Frontal and parietal networks for conditional motor learning: a positron emission tomography study. J Neurophysiol. 78:977--991.
Dickinson A. 1980. Contemporary animal learning theory. UK: Cambridge University Press.
Dickinson A. 1994. Instrumental conditioning. In: Mackintosh NJ, editor. Animal cognition and learning. London: Academic Press. p. 45--79.
Eliassen JC, Souza T, Sanes JN. 2003. Experience-dependent activation patterns in human brain during visual-motor associative learning. J Neurosci. 23:10540--10547.
Gallistel CR, Fairhurst S, Balsam P. 2004. The learning curve: implications of a quantitative analysis. Proc Natl Acad Sci USA. 101:13124--13131.
Genovesio A, Brasted PJ, Mitz AR, Wise SP. 2005. Prefrontal cortex activity related to abstract response strategies. Neuron. 47:307--320.
Graybiel AM. 1998. The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem. 70:119--136.
Graybiel AM. 2005. The basal ganglia: learning new tricks and loving it. Curr Opin Neurobiol. 15:638--644.
Grol MJ, de Lange FP, Verstraten FA, Passingham RE, Toni I. 2006. Cerebral changes during performance of overlearned arbitrary visuomotor associations. J Neurosci. 26:117--125.


Hadj-Bouziane F, Boussaoud D. 2003. Neuronal activity in the monkey striatum during conditional visuomotor learning. Exp Brain Res. 153:190--196.
Hadj-Bouziane F, Meunier M, Boussaoud D. 2003. Conditional visuomotor learning in primates: a key role for the basal ganglia. J Physiol Paris. 97:567--579.
Haruno M, Kawato M. 2006. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. J Neurophysiol. 95:948--959.
Hollerman JR, Schultz W. 1998. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci. 1:304--309.
Hollerman JR, Tremblay L, Schultz W. 2000. Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior. Prog Brain Res. 126:193--215.
Kim H, Shimojo S, O’Doherty JP. 2006. Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biol. 4:e233.
Lancaster JL, Woldorff MG, Parsons LM, Liotti M, Freitas CS, Rainey L, Kochunov PV, Nickerson D, Mikiten SA, Fox PT. 2000. Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp. 10:120--131.
Law JR, Flanery MA, Wirth S, Yanike M, Smith AC, Frank LM, Suzuki WA, Brown EN, Stark CE. 2005. Functional magnetic resonance imaging activity during the gradual acquisition and expression of paired-associate memory. J Neurosci. 25:5720--5729.
Mackintosh NJ. 1975. A theory of attention: variations in the associability of stimuli with reinforcement. Psychol Rev. 82:276--298.
Matsumoto M, Matsumoto K, Abe H, Tanaka K. 2007. Medial prefrontal cell activity signaling prediction errors of action values. Nat Neurosci. 10:647--656.
Miller EK. 2000. The prefrontal cortex and cognitive control. Nat Rev Neurosci. 1:59--65.
Miller EK, Freedman DJ, Wallis J. 2004. The prefrontal cortex: categories, concepts and cognition. Philos Trans R Soc Lond B. 357:1123--1136.
Mitz AR, Godschalk M, Wise SP. 1991. Learning-dependent neuronal activity in the premotor cortex: activity during the acquisition of conditional motor associations. J Neurosci. 11:1855--1872.
O’Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. 2004. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 304:452--454.
O’Doherty JP, Hampton A, Kim H. 2007. Model-based fMRI and its application to reward learning and decision making. Ann N Y Acad Sci. 1104:35--53.
Packard MG, Knowlton BJ. 2002. Learning and memory functions of the basal ganglia. Annu Rev Neurosci. 25:563--593.
Passingham RE, Toni I, Rushworth MF. 2000. Specialisation within the prefrontal cortex: the ventral prefrontal cortex and associative learning. Exp Brain Res. 133:103--113.
Pasupathy A, Miller EK. 2005. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature. 433:873--876.
Pearce JM. 1997. Animal learning and cognition. United Kingdom: Psychology Press.
Pearce JM, Bouton ME. 2001. Theories of associative learning in animals. Annu Rev Psychol. 52:111--139.
Pearce JM, Hall G. 1980. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev. 87:532--552.
Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. 2006. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 442:1042--1045.
Rescorla RA. 1991. Associative relations in instrumental learning: the eighteenth Bartlett Memorial Lecture.
Rescorla RA, Wagner AR. 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: current theory and research. New York: Appleton-Century-Crofts. p. 64--99.

Samejima K, Ueda Y, Doya K, Kimura M. 2005. Representation of action-specific reward values in the striatum. Science. 310:1337--1340.
Schultz W. 1998. Predictive reward signal of dopamine neurons. J Neurophysiol. 80:1--27.
Schultz W. 2006. Behavioral theories and the neurophysiology of reward. Annu Rev Psychol. 57:87--115.
Schultz W, Dickinson A. 2000. Neural coding of prediction errors. Annu Rev Neurosci. 23:473--500.
Seger CA, Cincotta CM. 2005. The roles of the caudate nucleus in human classification learning. J Neurosci. 25:2941--2951.
Smith AC, Frank LM, Wirth S, Yanike M, Hu D, Kubota Y, Graybiel AM, Suzuki WA, Brown EN. 2004. Dynamic analysis of learning in behavioral experiments. J Neurosci. 24:447--461.
Sutton RS, Barto AG. 1998. Reinforcement learning: an introduction. Cambridge (MA): MIT Press.
Suzuki W, Brown E. 2005. Behavioral and neurophysiological analyses of dynamic learning processes. Behav Cogn Neurosci Rev. 4:67--95.
Tanne-Gariepy J, Rouiller EM, Boussaoud D. 2002. Parietal inputs to dorsal versus ventral premotor areas in the macaque monkey: evidence for largely segregated visuomotor pathways. Exp Brain Res. 145:91--103.

Toni I, Passingham RE. 1999. Prefrontal-basal ganglia pathways are involved in the learning of arbitrary visuomotor associations: a PET study. Exp Brain Res. 127:19--32.
Toni I, Ramnani N, Josephs O, Ashburner J, Passingham RE. 2001. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. NeuroImage. 14:1048--1057.
Tricomi EM, Delgado MR, Fiez JA. 2004. Modulation of caudate activity by action contingency. Neuron. 41:281--292.
Watkins CJCH, Dayan P. 1992. Q-learning. Mach Learn. 8:279--292.
Williams ZM, Eskandar EN. 2006. Selective enhancement of associative learning by microstimulation of the anterior caudate. Nat Neurosci. 9:562--568.
Wirth S, Yanike M, Frank LM, Smith AC, Brown EN, Suzuki WA. 2003. Single neurons in the monkey hippocampus and learning of new associations. Science. 300:1578--1581.
Wise SP, Boussaoud D, Johnson PB, Caminiti R. 1997. Premotor and parietal cortex: corticocortical connectivity and combinatorial computations. Annu Rev Neurosci. 20:25--42.
Wise SP, Murray EA. 2000. Arbitrary associations between antecedents and actions. Trends Neurosci. 23:271--276.
Yin HH, Knowlton BJ. 2006. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 7:464--476.
