Reward-Related Neuronal Activity During Go-Nogo Task Performance in Primate Orbitofrontal Cortex

LÉON TREMBLAY AND WOLFRAM SCHULTZ
Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland

Tremblay, Léon and Wolfram Schultz. Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J. Neurophysiol. 83: 1864–1876, 2000. The orbitofrontal cortex appears to be involved in the control of voluntary, goal-directed behavior by motivational outcomes. This study investigated how orbitofrontal neurons process information about rewards in a task that depends on intact orbitofrontal functions. In a delayed go-nogo task, animals executed or withheld a reaching movement and obtained liquid or a conditioned sound as reinforcement. An initial instruction picture indicated the behavioral reaction to be performed (movement vs. nonmovement) and the reinforcer to be obtained (liquid vs. sound) after a subsequent trigger stimulus. We found task-related activations in 188 of 505 neurons in rostral orbitofrontal area 13, entire area 11, and lateral area 14. The principal task-related activations consisted of responses to instructions, activations preceding reinforcers, or responses to reinforcers. Most activations reflected the reinforcing event rather than other task components. Instruction responses occurred either in liquid- or sound-reinforced trials but rarely distinguished between movement and nonmovement reactions. These instruction responses reflected the predicted motivational outcome rather than the behavioral reaction necessary for obtaining that outcome. Activations preceding the reinforcer began slowly and terminated immediately after the reinforcer, even when the reinforcer occurred earlier or later than usual. These activations usually preceded the liquid reward but rarely the conditioned auditory reinforcer. The activations also preceded expected drops of liquid delivered outside the task, suggesting a primary appetitive rather than a task-reinforcing relationship that apparently was related to the expectation of reward. Responses after the reinforcer occurred in liquid- but rarely in sound-reinforced trials. Reward-preceding activations and reward responses were unrelated temporally to licking movements. Several neurons showed reward responses outside the task but instruction responses during the task, indicating a response transfer from primary reward to the reward-predicting instruction, possibly reflecting the temporal unpredictability of reward. In conclusion, orbitofrontal neurons report stimuli associated with reinforcers, are concerned with the expectation of reward, and detect reward delivery at trial end. These activities may contribute to the processing of reward information for the motivational control of goal-directed behavior.

INTRODUCTION

One of the least charted territories of the primate cortex appears to be the orbitofrontal part of the frontal lobe. Its functions are defined largely by anatomic connections to brain centers whose functions are better known and by the deficits after lesions in human patients and experimental animals, which concern altered and reduced emotional reactions to environmental changes (Butter et al. 1970; Damasio 1994; Hornak et al. 1996). Primates with orbitofrontal lesions show altered reactions to rewarding and aversive events (Baylis and Gaffan 1991; Butter and Snyder 1972; Butter et al. 1969, 1970) and impaired adaptations to changed reinforcement contingencies (Butter 1969; Dias et al. 1996; Iversen and Mishkin 1970; Jones and Mishkin 1972; Passingham 1972; Rosenkilde 1979). Medial orbitofrontal lesions in primates lead to deficits in visual discrimination and matching tests (Bachevalier and Mishkin 1986; Baylis and Gaffan 1991; Kowalska et al. 1991; Mishkin and Manning 1978; Passingham 1975; Voytko 1985). A role in reward processing is suggested by strong inputs from basal amygdala (Porrino et al. 1981; Potter and Nauta 1979) and, transsynaptically, from ventral striatum (Haber et al. 1995) with its reward-related neurons (Nishijio et al. 1988; Schultz et al. 1992). Heavy inputs arise also from medial temporal cortex structures whose roles in reward processes are less known (Barbas 1988, 1993; Carmichael and Price 1995; Seltzer and Pandya 1989; Ungerleider et al. 1989).

Neurophysiological investigations of orbitofrontal cortex addressed mnemonic functions in delayed response paradigms typical for the functions of prefrontal cortex (Jacobsen and Nissen 1937). Orbitofrontal neurons showed smaller activations during the delay periods of spatial and object matching tasks compared with dorsolateral prefrontal cortex but responded to delivery of juice reward at the end of the trial (Niki et al. 1972; Rosenkilde et al. 1981). Orbitofrontal neurons discriminated between primary and conditioned appetitive and aversive stimuli and were activated specifically in extinction or reversal trials (Thorpe et al. 1983). Neurons in the caudally adjoining orbitofrontal taste area showed specific gustatory and olfactory responses that were modified in relation to the animal's satiation (Rolls and Baylis 1994; Rolls et al.
1989, 1996). Thus orbitofrontal neurons may respond to rewards in a manner appropriate for reinforcing behavioral reactions.

To better understand neuronal mechanisms underlying the motivational control of behavior, we investigated the neuronal processing of reward information in brain structures participating in the control of behavior. After the characterization of different forms of reward processing in primate striatum (caudate nucleus, putamen, and ventral striatum) (Apicella et al. 1991, 1992; Hollerman et al. 1998; Schultz et al. 1992), we searched for inputs that could possibly contribute to striatal reward-related activations. One of the principal candidates is the orbitofrontal cortex, which strongly projects to the ventral striatum and medial caudate (Arikuni and Kubota 1986; Eblen and Graybiel 1995; Haber et al. 1995; Selemon and Goldman-Rakic 1985; Yeterian and Pandya 1991).

The present report describes how orbitofrontal neurons processed information about rewards while monkeys performed the same delayed go-nogo task that previously was used for studying reward processing in the striatum (Hollerman et al. 1998). The task allowed us to differentiate between primary reward and secondary reinforcement and between movement and nonmovement reactions. The results were presented previously in abstract form (Tremblay and Schultz 1995). The subsequent report describes how reward-related activity changed while animals learned to associate novel pictures with known reinforcers and behavioral reactions (Tremblay and Schultz 2000).

0022-3077/00 $5.00 Copyright © 2000 The American Physiological Society

METHODS

Two Macaca fascicularis monkeys (A: male, 5.4 kg; B: female, 3.2 kg) were used in the study. The activity of single neurons was recorded with moveable microelectrodes during performance of a behavioral task while monitoring arm and mouth muscle activity, licking movements, and eye movements. Electrode positions were reconstructed from small electrolytic lesions on 40-µm-thick, cresyl violet-stained histological brain sections. Most methods were similar to those described in detail for the recordings in the striatum, and the present animal A had served also for the study of striatum in the same task (animal B of Hollerman et al. 1998).

Behavioral procedures

Animals were seated in a primate chair and contacted an immovable, touch-sensitive resting key. Visual stimuli of 13 × 13° were presented as instruction or trigger stimuli on a 13-in computer monitor. A small transparent response lever was positioned centrally in a transparent vertical wall in front of the monitor immediately below the position of the visual stimuli. A 1-kHz sound with ~68 dB intensity served as conditioned reinforcer. Small quantities of apple juice (0.15–0.20 ml) delivered by a solenoid valve served as rewards. A closed-circuit video system served to continuously supervise limb movements from above. Animals were fluid- and partly food-deprived during weekdays and were returned to their home cages after each session.

In the computer-controlled delayed go-nogo task, the animal kept its right hand relaxed on the resting key and a fractal picture appeared on the screen for 1 s (Fig. 1). It served as an instruction, indicating whether the animal should execute or withhold a movement in response to an upcoming trigger stimulus and whether it would receive a liquid reward or a conditioned auditory reinforcer. Three instruction pictures were used for three trial types, comprising rewarded movement, rewarded nonmovement, or unrewarded movement. Thus each instruction served as a preparatory signal that the animal could remember and use for preparing the upcoming reaction (what), whereas the trigger determined the time of the behavioral reaction (when) without providing additional information about the nature of the required reaction. The trigger stimulus consisted of the same red square in each trial type and appeared at a random interval of 1.5–2.5 s after instruction offset.

In rewarded-movement trials, the animal released the resting key, touched the lever, and received the liquid reward 1.5 s later (Fig. 1, top). The trigger stimulus extinguished on lever touch in correctly performed trials or 1.5 s after onset if the animal failed to touch the lever. In rewarded-nonmovement trials, the animal kept its hand on the resting key for a fixed duration of 2.0 s to receive a liquid reward at 3.5 s after trigger onset (Fig. 1, middle). The trigger stimulus extinguished after 2.0 s on correctly performed trials or on key release with an erroneous movement. Unrewarded-movement trials required the same behavioral reaction as rewarded-movement trials, but the liquid drop was replaced by a sound of 100-ms duration (Fig. 1, bottom). The sound served as a signal of correct task performance and, as compared with no sound, considerably improved the animals' correct task performance and daily cooperation. Animals needed to perform this trial type correctly before advancing to a trial reinforced by liquid. To maintain the motivation of the animal, every correct unrewarded-movement trial was followed by one of the two rewarded trial types, thus predicting an upcoming reward. Thus the sound did not constitute an immediate reward but served as reinforcer and predicted a reward in the following trial, qualifying it as a secondary reinforcer.

The three trial types alternated semirandomly, with the consecutive occurrence of same trial types being restricted to three rewarded-movement trials, two nonmovement trials, and one unrewarded-movement trial. Thus a movement trial was followed by any trial type with a probability of 0.33, a nonmovement trial was followed by a movement trial type with a probability of 0.75, and an unrewarded-movement trial was followed by a rewarded trial type with a probability of 1.0, as long as trials were performed correctly. Trials lasted 11–13 s; intertrial intervals were 4–7 s. In free-liquid trials, animals received small quantities of liquid without performing any behavioral task and in the absence of phasic stimuli. Intervals between drops were irregular and >11 s.

FIG. 1. Behavioral task. Monkey sat with its right hand immobile on the immovable resting key and faced a computer monitor positioned behind a transparent wall in which a nearly transparent lever was mounted centrally. Task consisted of 3 trial types alternating semirandomly. All trials began with a 2-s control period during which the monitor was blank, followed by a 1-s presentation of a fractal instruction picture at monitor center immediately above the lever. After a random delay of 2.5–3.5 s after instruction onset, the red square trigger stimulus appeared at the center of the monitor. In rewarded (top)- and unrewarded-movement trials (bottom), the trigger elicited the movement and disappeared when the animal touched the lever after release of the resting key or stayed on for 2.0 s in erroneous trials without key release or lever touch. In rewarded-movement trials, a small quantity of liquid reward, and in unrewarded-movement trials the reinforcing sound, were presented at 1.5 s after lever touch. In nonmovement trials (middle), the same trigger stimulus was presented for 2.0 s while the animal maintained its hand on the resting key, and liquid reward was delivered after a further 1.5 s.
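The semirandom alternation of trial types described above can be illustrated with a short sketch. It assumes that the next trial type is drawn uniformly from the types still permitted by the stated constraints (maximum runs of three, two, and one trials, and a rewarded trial after every unrewarded-movement trial). The text does not specify the actual selection procedure, so the uniform draw, the function names, and the type labels are hypothetical and will not necessarily reproduce the transition probabilities reported above.

```python
import random

# Maximum run lengths of identical trial types, as stated in the text.
MAX_RUN = {
    "rewarded_movement": 3,
    "rewarded_nonmovement": 2,
    "unrewarded_movement": 1,
}
REWARDED = ("rewarded_movement", "rewarded_nonmovement")


def next_trial(history, rng):
    """Pick the next trial type given the list of preceding trial types."""
    candidates = list(MAX_RUN)
    if history:
        last = history[-1]
        # A correct unrewarded-movement trial is always followed by a rewarded trial.
        if last == "unrewarded_movement":
            candidates = list(REWARDED)
        # Count how many times the last type has just occurred in a row.
        run = 0
        for trial_type in reversed(history):
            if trial_type != last:
                break
            run += 1
        if run >= MAX_RUN[last]:
            candidates = [t for t in candidates if t != last]
    # Assumption: uniform choice among the permitted types.
    return rng.choice(candidates)


def make_sequence(n_trials, seed=0):
    """Generate a semirandom sequence of trial types (all trials assumed correct)."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        history.append(next_trial(history, rng))
    return history


if __name__ == "__main__":
    print(make_sequence(12))
```

The sketch treats every trial as correctly performed; error trials, which the text mentions only in passing, are not modeled.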

Data acquisition

After behavioral conditioning, animals were implanted under deep pentobarbital sodium anesthesia and aseptic conditions with two cylinders for head fixation and a stainless steel chamber permitting vertical access with microelectrodes to the left frontal lobe. The dura was left intact. Teflon-coated, multistranded, stainless steel wires were implanted into the right extensor digitorum communis and biceps brachii muscles for electromyographic (EMG) recordings. In animal A, Ag-AgCl electrodes were implanted into the outer, upper, and lower canthi of the orbits for the recording of electrooculograms (EOG). (In animal B, EOGs were recorded with an Iscan infrared oculometer.) The implant was fixed to the skull with stainless steel screws and several layers of dental cement.

Glass-insulated, platinum-plated tungsten microelectrodes stuck inside a metal guide cannula served to record extracellularly the activity of single neurons, using conventional electrophysiological techniques. Histological inspections revealed that the tips of all guide cannulas ended above the most dorsal parts of orbitofrontal cortex. Although guide cannulas damaged more tissue than solid microelectrodes, they permitted use of thin microelectrodes, causing very little damage to the areas investigated. Discharges from neuronal perikarya were converted into standard digital pulses by means of an adjustable Schmitt trigger. EMGs and horizontal and vertical EOGs were collected during neuronal recordings. EMGs were converted into standard digital pulses by a Schmitt trigger. Licking movements were recorded as a standard digital pulse when the tongue interrupted an infrared light beam at the liquid spout. Pulses from neuronal discharges and EMGs were sampled together with digital signals from the behavioral task by a computer, together with analogue signals from EOGs. Only data from neurons sampled by the computer for ≥30 trials using all three trial types are reported. All data from neurons suspected to covary with some task component, and occasionally from unmodulated neurons, were stored uncondensed on computer disks.
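The conversion of an analogue signal into digital pulses by a Schmitt trigger can be sketched in software. The recordings described above used a hardware device, so the following is only an illustration of the hysteresis principle on sampled data: a pulse is emitted when the signal reaches an upper threshold, and the detector re-arms only after the signal has fallen below a lower threshold. The function name, the threshold values, and the sample data are hypothetical.

```python
def schmitt_trigger(samples, high, low):
    """Return the sample indices at which a pulse is emitted.

    A pulse is emitted when the signal reaches `high` while the trigger is
    armed. The trigger re-arms only after the signal has dropped to `low`
    or below, so noise around a single threshold cannot cause repeated pulses.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    pulses = []
    armed = True
    for index, value in enumerate(samples):
        if armed and value >= high:
            pulses.append(index)
            armed = False
        elif not armed and value <= low:
            armed = True
    return pulses


if __name__ == "__main__":
    # Hypothetical amplitude samples containing two threshold crossings.
    signal = [0.0, 0.2, 1.2, 1.1, 0.9, 1.3, 0.1, 0.0, 1.5, 0.2]
    print(schmitt_trigger(signal, high=1.0, low=0.3))
```

With two thresholds, the second excursion above the upper level at sample 5 in the example is ignored because the signal never fell to the lower level in between.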

Data analysis

Onset, duration, magnitude, and statistical significance of increases of neuronal activity were assessed with a specially implemented sliding window procedure based on the nonparametric one-tailed Wilcoxon signed-rank test (Apicella et al. 1992), using a 2-s control period immediately before the instruction, and a time window of 250 ms that was moved in steps of 25 ms through the period of a suspected change. For activations preceding the instruction, the control period was placed individually for each neuron toward trial end at a position without obvious neuronal changes. Magnitudes of activations were expressed as percentage above control period activity. Peak activity was determined from the 500-ms interval with maximum neuronal activity. Depressions of activity are not reported. Latencies, durations, and magnitudes of neuronal activations were calculated for blocks of trials and compared among the three trial types using ANOVA with post hoc Fisher's PLSD test (P < 0.05).
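A minimal sketch of such a sliding window procedure is given below. It is not the authors' implementation (see Apicella et al. 1992 for that). It assumes spike times in seconds relative to instruction onset, uses the normal approximation to the Wilcoxon signed-rank statistic without tie or continuity correction, and takes a significance level of 0.01, which is an assumption because the text does not state the criterion used for this test. All function and parameter names are hypothetical.

```python
import math


def wilcoxon_one_tailed(test, control):
    """One-tailed Wilcoxon signed-rank P value for test > control (paired).

    Normal approximation, no tie or continuity correction (assumption).
    """
    diffs = [t - c for t, c in zip(test, control) if t != c]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def rate(spikes, start, stop):
    """Firing rate (impulses/s) of one trial in the interval [start, stop)."""
    return sum(start <= t < stop for t in spikes) / (stop - start)


def activation_onset(trials, search_start, search_stop,
                     control=(-2.0, 0.0), window=0.25, step=0.025, alpha=0.01):
    """Return the start time of the first significant window, or None.

    `trials` is a list of spike-time lists (s, relative to instruction onset).
    The 2-s control period, 250-ms window, and 25-ms step follow the text.
    """
    control_rates = [rate(s, *control) for s in trials]
    n_steps = int(round((search_stop - window - search_start) / step))
    for k in range(n_steps + 1):
        t0 = search_start + k * step
        window_rates = [rate(s, t0, t0 + window) for s in trials]
        if wilcoxon_one_tailed(window_rates, control_rates) < alpha:
            return t0
    return None


def percent_above_control(activation_rate, control_rate):
    """Magnitude expressed as percentage above control period activity."""
    return 100.0 * (activation_rate - control_rate) / control_rate
```

On this definition, a magnitude of 300% above control corresponds to a fourfold increase of activity, for example 8 imp/s against a control rate of 2 imp/s.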

Magnitudes of activations were compared between trial blocks with the two-tailed Mann-Whitney U test on the basis of impulse counts in individual trials, normalized for durations of comparisons (P < 0.01). Neuronal activations were considered as preferential for one or two trial types when they were statistically significant (Wilcoxon test), and their magnitudes were significantly higher than in the other trial types (Mann-Whitney U test). These included statistically significant activations occurring selectively in only one or two trial types but not in the other trial types (Wilcoxon test). Activations either preceded or followed individual task events. They were considered to follow a task event when their onset and peak latencies were <500 ms after an event and when their peak activation was closer to the preceding rather than the subsequent event. We evaluated movement parameters in terms of reaction time (from trigger onset to release of resting key), movement time (from key release to touching the response lever), and return time (from lever touch back to touch of resting key) and compared them using the Kolmogorov-Smirnov test (P < 0.001). We assessed differences in distributions of neuronal activations in orbitofrontal cortex with the χ² test, using four equidistant mediolateral levels and three rostrocaudal levels (sections a-b, c-d and e-f of Fig. 14).

RESULTS

Behavior

Both animals showed >95% correct task performance throughout the experiment (monkey A: 99.0, 99.6, and 98.0%; monkey B: 98.0, 97.0, and 99.6% for rewarded-movement, rewarded-nonmovement, and unrewarded-movement trials, respectively). Unrewarded movements did not lead to immediate reward but were followed by a conditioned auditory reinforcer and a subsequent rewarded trial.

Reaction times in both animals were significantly shorter in rewarded as compared with unrewarded-movement trials, although both trials involved reaching from the same starting position toward the same response lever (Table 1). Movement times differed inconsistently. Return times were significantly longer in rewarded as compared with unrewarded-movement trials, as both animals kept pressing the response lever after the reaching movement until the liquid reward was delivered, whereas they immediately returned to the resting key after lever press in unrewarded-movement trials. All movement differences concerned predominantly the timing of movement. Major differences in patterns of arm muscle activity or visible postural differences were not observed in electromyographic and video recordings between rewarded- and unrewarded-movement trials.

Eye movements were very similar in the three trial types and failed to show systematic differences between rewarded and unrewarded movements (Fig. 2). The instruction elicited an ocular saccade to a relatively fixed position on each instruction picture unless the gaze was already there. The trigger stimulus in both movement trials elicited a very similar saccade to the response lever. In no cases were differences of neuronal activity between trial types clearly related to differences in eye movements.

Mouth movements were not a part of the task contingencies. Tongue contacts with the spout occurred over relatively long periods in each trial. They began sporadically before and after the instruction, were unrelated to instruction onset, became more frequent during the trigger-reward interval, were reproducible and maximal after reward delivery, and occurred occasionally during the intertrial interval (Figs. 7 and 9). They also occurred sporadically in unrewarded-movement trials.

TABLE 1. Movement parameters in rewarded- and unrewarded-movement trials

                        Reaction Time           Movement Time           Return Time
                        Monkey A    Monkey B    Monkey A    Monkey B    Monkey A     Monkey B
Rewarded movements      328 ± 2     295 ± 3     634 ± 4     291 ± 3     3095 ± 15    2966 ± 17
Unrewarded movements    452 ± 8*    434 ± 8*    358 ± 5*    411 ± 5*    673 ± 19*    1113 ± 17*

All values are given in means ± SE in milliseconds. Values were obtained from 688 trials in monkey A and from 1,060 trials in monkey B. These were measured during neuronal recordings in blocks of maximally 15 trials of the same type. *P < 0.001 against rewarded movements; Kolmogorov-Smirnov test.

FIG. 2. Eye movements during performance in the 3 trial types. Each curve in the 2 top parts shows horizontal and vertical eye positions during a single trial, respectively. All recordings were obtained simultaneously with neuronal recordings. The polar plots (bottom) show superimposed eye positions during 4 s after instruction onset (10 trials). Top, upward; right, rightward.

Neuronal database

A total of 505 orbitofrontal neurons with a mean spontaneous discharge rate of 6.3 imp/s (range 0.4–38.5 imp/s) was tested during task performance. Of these, 188 neurons (37%) exhibited 260 statistically significant task-related activations. Three major task relationships were found, namely responses to instructions, activations preceding reinforcers, and responses to reinforcers (Table 2). A few neurons showed activations preceding the instructions or after the trigger stimulus.

Responses to instructions

Instruction responses occurred in 99 of the 188 task-related neurons (54%) (Table 2). Many responses reflected the type of reinforcer. They occurred preferentially in both rewarded trials irrespective of the execution or withholding of movement but not in unrewarded-movement trials (Fig. 3A) or, conversely, only in unrewarded-movement trials (Fig. 3B). Ten neurons responded preferentially in nonmovement trials (Fig. 3C). Only three neurons responded in both movement trial types irrespective of the type of reinforcer. Although some responses lasted for >1 s, only four neurons showed statistically significant sustained activations lasting until trigger onset or beyond (Fig. 3D). Instruction responses in 35 of the 99 neurons occurred unselectively in all three trial types. Responses had mean latencies ranging from 155 to 179 ms and durations of 459–562 ms in the different trial types. Response magnitudes amounted to about fourfold increases of activity (mean magnitudes of 286–320% above control activity). None of these parameters varied significantly among the three trial types (P > 0.05; ANOVA).

TABLE 2. Numbers of orbitofrontal neurons differentially influenced by the type of reinforcement

                                         Instruction   Trigger                  Reinforcement            Instruction
Trial Type                               Following     Preceding   Following    Preceding   Following    Preceding
Rewarded movement                        8             0           1            4           2            2
Nonmovement                              10            1           0            2           0            1
Unrewarded movement                      22            0           2            4           3            2
Reward (irrespective of movement)        20            1           1            41          62           6
Movement (irrespective of reinforcer)    3             0           3            0           0            0
Nonpreferential                          36            2           10           0           0            11
Total (n = 188)                          99 (53)       4 (2)       17 (9)       51 (27)     67 (36)      22 (12)

Total number of task-modulated neurons (n = 188) is smaller than the sum of table entries (n = 260) because of multiple task relationships. Activations listed under Reward occurred in both rewarded-movement and -nonmovement trials. Activations listed under Movement occurred in both rewarded- and unrewarded-movement trials. Trial type with activations preceding instructions refers to the preceding, not the current, trial. Values in parentheses are percentages.

FIG. 3. Different trial selectivities of responses to instruction stimuli of 4 orbitofrontal neurons. A: response in both rewarded trial types but absence of response in unrewarded-movement trials. B: response restricted to unrewarded-movement trials reinforced by a conditioned sound. C: response restricted to nonmovement trials. D: 1 of the rare examples of sustained activation during the instruction-trigger interval, occurring in rewarded-movement trials and, to a lesser extent, in nonmovement trials. Perievent time histograms are composed of neuronal impulses shown as dots below. Each dot denotes the time of a neuronal impulse, and distances to instruction onset correspond to real-time intervals. Each line of dots shows 1 trial. Trials alternated semirandomly during the experiment and are separated according to trial types and rearranged according to instruction-trigger intervals.

Activations preceding reinforcers

Of the 188 task-related neurons, 51 (27%) showed activations that began well before the liquid reward or the conditioned auditory reinforcer and terminated 0.5–1.0 s after these events (Table 2). Activations in 41 neurons occurred in both liquid-rewarded trial types but not in sound-reinforced trials (Fig. 4A), a few others being restricted to one rewarded trial type. Twenty-one of the 41 neurons responded also to reward delivery. Some, usually weak, activations preceded only the reinforcing sound (Fig. 4B). Most activations began in the trigger-reinforcer interval, occasionally <1 s before reinforcement (3 neurons; Fig. 5A)

but usually earlier (15 neurons; Fig. 5B). Other activations began before the trigger (17 neurons; Fig. 5C). Some activations had rather long time courses, showing sluggish onset times, lasting during major portions of the trial, and returning to baseline shortly after reinforcement (6 neurons; Fig. 5D). Activations remained present until the liquid or sound reinforcer was delivered and subsided immediately afterward, even when these events occurred before or after the usual time (Fig. 6A). Prereward activations occurred also when liquid was delivered at regular intervals in free-liquid trials outside the task, in all 10 neurons tested with task and free reward (Fig. 6B). Prereward activations in 10 neurons adapted rapidly to the last timing of reward relative to the trigger stimulus. The prereward activation in Fig. 6C began earlier after the trigger stimulus when reward had been delivered earlier in a preceding trial block (compare 4th with 2nd trial block after reward had been shifted to an earlier time in the 3rd compared with the 1st block).

Prereward activations occurred during and immediately after the trigger-reward interval during which licking movements were also frequent (Fig. 7). However, the activations were absent during other trial periods and in intertrial periods in which licking movements occurred occasionally. Activations also were absent in unrewarded-movement trials that showed considerable licking activity.

FIG. 4. Selective activations preceding reinforcers in three orbitofrontal neurons. A: activation preceding the delivery of liquid reward in the 2 rewarded trial types but not before the reinforcing sound in unrewarded-movement trials. B: weak activation preceding the reinforcing sound in unrewarded-movement trials. Trials are rank-ordered according to instruction-reinforcer intervals.

FIG. 5. Different onsets of activations preceding reward in 4 orbitofrontal neurons. From top to bottom, activations began immediately before the reward (A), during the trigger-reward interval (B), and before the trigger stimulus (C and D). Only data from rewarded-movement trials are shown. Trials are rank-ordered according to instruction-reward intervals.

Responses to reinforcers

Of the 188 task-related neurons, 67 (36%) responded to the delivery of a reinforcer (Table 2). Responses in 62 neurons occurred in both liquid-rewarded trial types irrespective of the movement and not in unrewarded-movement trials (Fig. 8, A and B). Twenty-one of the 62 neurons also showed prereward activations. Very few neurons were further selective for rewarded-movement trials. A few responses occurred only after sound reinforcement in unrewarded-movement trials and not after liquid reward. Fourteen of the 67 neurons responded also to the instructions.

Latencies of reward responses in rewarded-movement and -nonmovement trials ranged from 70 to 1,590 ms (<100 ms in 25 and 26 neurons, 100–300 ms in 13 and 17 neurons, >300 ms in 24 and 19 neurons in the 2 trial types, respectively; means of 298 and 322 ms). Durations of reward responses in these trials ranged from 120 to 2,320 ms (means of 633 and 651 ms). Response magnitudes amounted to about fivefold increases of activity (means of 381 and 418% above control activity). None of these parameters varied significantly between the two rewarded trial types (P > 0.05; ANOVA).

We delivered reward earlier or later than usual to further characterize the responses. Responses in all nine neurons tested followed the reward to the new time (Fig. 9A) and were increased in magnitude in four of them. Thus responses occurred to the reward and were not delayed trigger responses. Reward responses were restricted to the period after reward

delivery, although licking movements also occurred before reward delivery and in unrewarded-movement trials (Fig. 9, B and C). Thus reward responses appeared to be unrelated to mouth movements. The relationship of reward responses to the solenoid noise associated with reward delivery was tested in 14 responding neurons by blocking the liquid tube while maintaining the solenoid noise in free-liquid trials. Eight of these neurons failed to respond to the solenoid noise alone, suggesting a true reward response (Fig. 10), whereas the other six neurons also responded without the liquid.

FIG. 6. Temporal aspects of activations preceding reward in 2 orbitofrontal neurons. A: prolonged activations with delayed reward (top) and shortened activations with earlier reward (bottom; nonmovement trials). Trials with the usual trigger-reward interval are shown at the top. B: activations preceding regularly spaced delivery of liquid outside of any behavioral task (same neuron as in A). C: modification of activation by change in reward timing. Earlier, but not later, reward delivery leads to appearance of prereward activation (bottom trials; rewarded-movement trials). Trials with the usual trigger-reward interval are shown at the top. This neuron also responded after reward delivery. Chronological sequence is shown from top to bottom in A–C.

FIG. 7. Timing of prereward activation in 1 orbitofrontal neuron compared with lick movements. This neuron is activated before reward in rewarded-movement trials (top) and rewarded-nonmovement trials (middle), but not in unrewarded-movement trials (bottom), whereas lick movements occurred irregularly throughout the trial in all trial types. Licks were recorded simultaneously with neuronal activity and are indicated by short horizontal lines in rasters (interruption of infrared photobeam by the animal's tongue at the liquid-dispensing spout). Trials are rearranged according to instruction-reinforcer intervals.

FIG. 8. Responses to liquid reward in 2 orbitofrontal neurons. A: transient response. B: sustained response. Responses occurred in both rewarded trial types irrespective of movement but were absent in unrewarded-movement trials reinforced by the sound. Trials are rearranged according to instruction-reinforcer intervals.

FIG. 9. Control tests with reward responses. A: temporal variations of reward delivery leading to parallel displacement of reward response. Data from the usual reward time are shown in top trial block. Subsequent blocks show earlier or later reward delivery with the same neuron. Data are from nonmovement trials and were similar in rewarded-movement trials. Chronological sequence is shown from top to bottom. B: reward responses were unrelated to licking movements. Data are from rewarded-movement trials and were similar in rewarded-nonmovement trials. C: licking movements in unrewarded-movement trials not accompanied by neuronal responses (same neuron as in B). Horizontal lines in rasters in B and C indicate interruptions of infrared photobeam by the animal's tongue at the liquid-dispensing spout. Trials are rearranged according to instruction-reinforcer intervals.

FIG. 10. Influence of interruption of liquid reward flow on reward response in an orbitofrontal neuron. Response was lost in the absence of reward delivery, suggesting a true response to reward rather than to the associated noise of the solenoid liquid valve, which opened audibly when reward liquid was delivered ("liquid with solenoid noise"). For "solenoid noise only," the tube between solenoid and animal's mouth was closed, and the solenoid noise occurred alone without delivering liquid. Data are from free-liquid trials, ordered chronologically from top to bottom.

Responses to free-liquid versus task reward

A total of 76 neurons responded to reward in the behavioral task, in free-liquid trials, or in both situations. Of these, 46 neurons responded to the liquid during task performance in both rewarded trial types and in free-liquid trials in the absence of any specific task (Fig. 11A). Three neurons responded to liquid in the task but not in free-liquid trials (Fig. 11B). By contrast, 27 neurons failed to respond to liquid in the task but were activated in free-liquid trials (Fig. 11C). Sixteen of them showed instruction responses in both rewarded trials (Fig. 11D).

Activations preceding instructions

Some neurons showed an interesting type of activation that was partly also related to reinforcement. Of the 188 task-related neurons, 22 (12%) showed activations that began slowly and at varying times after the reinforcer of the preceding trial, reached their peak <500 ms before the instruction, and terminated abruptly afterward (Table 2). Because of their sluggish onset, they appeared to precede the upcoming instruction rather than to follow the past reinforcer. Activations in 6 of the 22 neurons occurred preferentially after both rewarded trial types and not after unrewarded-movement trials, whereas in 2 neurons they occurred preferentially after unrewarded trials (Fig. 12).

FIG. 11. Relationship of reward responses to task performance in 4 orbitofrontal neurons. A: reward response in both rewarded trial types in behavioral-task and free-liquid trials without any behavioral task. B: reward response in behavioral-task but not in free-liquid trials. C: reward response occurring in free-liquid trials but not in behavioral task. D: response to reward in free-liquid trials, and response to instruction but not to liquid in behavioral task. Baseline activity was increased in free-liquid trials in C and D. Task trials alternated semirandomly during the experiment and are separated according to trial types and rearranged according to instruction-trigger intervals. None of the neurons in A–D responded in unrewarded-movement trials. Free-liquid trials were run in separate blocks.

FIG. 12. Activations preceding instructions reflecting reward. Activations occurred after rewarded-movement and -nonmovement trials but not unrewarded-movement trials. In the task, any rewarded trial could be followed by an unrewarded trial, whereas correctly performed unrewarded trials were not presented consecutively. Only correctly performed trials are shown. Perievent time histograms are composed of neuronal impulses shown as dots below. Dots denote neuronal impulses aligned to instruction onset and each line shows 1 trial, the original sequence being from top to bottom. Trials alternated semirandomly during the experiment and are separated for analysis according to previous trial types and rearranged according to instruction-trigger intervals. Vertical calibration is 20 imp/bin for all histograms.

Population activity of major reinforcement-related activations

The histograms of Fig. 13 display averaged activity from neurons showing responses to instructions (A), activations preceding liquid reward (B), and responses after liquid reward (C) in both rewarded trial types. Note that averaging of activity over long task periods reduces temporally dispersed activation peaks. Therefore population responses appear lower than averages of individual activations.

FIG. 13. Average population activity of the 3 major types of reward-related activity found in orbitofrontal neurons. A: instruction responses in 20 neurons. B: activations preceding liquid reward in 20 neurons. C: responses to liquid reward in 41 neurons. For A–C, only data from neurons activated in both rewarded trial types were used. Population histograms in B and C do not comprise data from 21 neurons with activations both preceding and after the reward. For each display, histograms of all respective neurons were normalized for trial number and added together, and the resulting sum was divided by the number of neurons. n, number of neurons.

Positions of neurons

Histological reconstructions showed that rostral area 13, entire area 11, and lateral area 14 of orbitofrontal cortex were explored (Fig. 14). Neurons with instruction responses were distributed widely, being significantly more frequent in medial as compared with lateral parts of the explored region (P < 0.05). Neurons with prereward activations were found predominantly in rostral area 13, where they were significantly less frequent in its very anterior part (P < 0.001). Neurons responding to reinforcers were significantly more frequent in lateral than medial orbitofrontal cortex (P = 0.01).

FIG. 14. Positions of reward-related neurons in orbitofrontal cortex. Coronal levels a-f are indicated on lateral and ventral surface views to the left and correspond to coronal sections shown to the right. Region investigated included rostral area 13, entire area 11 and lateral area 14. Different shadings indicate relative incidence of neurons showing reward relationships indicated on top, the percentage referring to the total number of neurons showing the respective relationships. Regions shown in white were not explored. Cytoarchitectonic areas are indicated by numbers and separated by interrupted lines. AS, arcuate sulcus; PS, principal sulcus.

DISCUSSION

These data show that neurons in orbitofrontal cortex process rewards in three principal forms in a delayed go-nogo task, as transient responses to reinforcer-predicting instructions, sustained activations preceding reward, and transient responses after reward. A few neurons were activated before the initial instruction signal in relation to the reward situation in the preceding or expected upcoming trial. In contrast to other prefrontal areas, few orbitofrontal neurons showed activations related to behavioral reactions in this task. These data support the notion that orbitofrontal cortex constitutes an important component of reward circuits in the brain.

Processing of reinforcement information

PROCESSES TESTED BY THE BEHAVIORAL TASK. Delayed response tasks typically assess the functions of prefrontal cortex in the temporal organization of goal-directed behavior, working memory, and preparation of responding (Bauer and Fuster 1976; Fuster 1973; Jacobsen and Nissen 1937; Kubota et al. 1974; Niki et al. 1972; Rosenkilde et al. 1981). Go-nogo tasks test the inhibition of overt behavioral responses, and performance in them is deficient after orbitofrontal lesions (Iversen and Mishkin 1970). Performance in these tasks depends on reinforcement, which makes them suitable for investigating the role of reinforcement in goal-directed behavior. To compare primary reward with conditioned reinforcement, we added a trial type to the standard delayed go-nogo task in which movement was reinforced with a conditioned tone instead of liquid. To differentiate movement preparation from reinforcer expectation, we introduced a second delay that separated the behavioral response from the reinforcer.

RESPONSES TO REWARD-PREDICTING ENVIRONMENTAL STIMULI. According to animal learning theory (Dickinson 1980), the instructions in our task were associated with specific reinforcers through a Pavlovian procedure and had an occasion-setting function determining the movement or nonmovement reaction. Most instruction responses differentiated between liquid and sound, but very few responses differentiated between the behavioral reactions irrespective of the type of reinforcer. Thus orbitofrontal neurons reported environmental stimuli more in association with reinforcement than behavioral reaction. These instruction responses occurred in orbitofrontal areas influenced by the medial temporal cortex (Morecraft et al. 1992). Our approach was based on experiments in which neurons in dorsolateral prefrontal cortex discriminated between instruction stimuli predicting liquid reward versus no reinforcement in a less complex task (Watanabe 1990, 1992). Preceding work had shown that orbitofrontal neurons discriminate between appetitive and aversive visual stimuli (Thorpe et al. 1983). The presently observed reward relationships in both rewarded trial types would argue against a relationship of these instruction responses to visual stimulus features. In addition, the adjoining report demonstrates that reinforcement-related trial selectivities were maintained when novel visual instructions were learned despite considerable differences in visual features (Tremblay and Schultz 2000). Also, selectivities in orbitofrontal neurons remained related to rewards when multiple instruction sets were used in a spatial delayed response task (Tremblay and Schultz 1999). Thus the trial selectivities were more likely due to differences in reinforcement than visual features.

EXPECTATION OF REWARD. Sustained activations preceding reinforcement occurred mostly in trials rewarded with liquid irrespective of the behavioral reaction and were largely absent in trials reinforced by the sound. This suggests a relationship to reward and not to the end-of-trial message contained in the reinforcers. The activations began typically around the time of the trigger stimulus as the last signal preceding reward and terminated immediately after reward was delivered, irrespective of its time of occurrence. They apparently reflected the expectation of reward by coding the occurrence of reward but not its precise moment. These prereward activations occurred in orbitofrontal areas influenced by the medial temporal cortex (Morecraft et al. 1992).
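The statements about prereward activations rest on perievent time histograms (Fig. 12) and on population averages formed as described in the legend of Fig. 13: each neuron's histogram is normalized for trial number, the histograms are added, and the sum is divided by the number of neurons. The following is only a minimal sketch of that arithmetic, not the software used in the study. The function names, the millisecond units, the bin width, and the spike times are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def perievent_histogram(trials: Sequence[Sequence[float]],
                        t_start: float, t_stop: float,
                        bin_width: float) -> List[float]:
    """Impulses per bin, averaged over trials.

    `trials` holds one list of spike times per trial, in ms relative to the
    aligning event (for example instruction onset or reinforcer delivery).
    """
    n_bins = int((t_stop - t_start) // bin_width)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            idx = int((t - t_start) // bin_width)
            if 0 <= idx < n_bins:  # ignore impulses outside the window
                counts[idx] += 1
    # Normalize for trial number so neurons with many trials do not dominate.
    return [c / len(trials) for c in counts]


def population_histogram(neuron_histograms: Sequence[Sequence[float]]) -> List[float]:
    """Add trial-normalized histograms and divide by the number of neurons."""
    n_neurons = len(neuron_histograms)
    n_bins = len(neuron_histograms[0])
    summed = [sum(h[b] for h in neuron_histograms) for b in range(n_bins)]
    return [s / n_neurons for s in summed]


# Hypothetical spike times (ms relative to reward delivery), not data from the study.
neuron_1 = [[-900, -300, -100], [-400, -200]]   # 2 trials
neuron_2 = [[-100], [-50], [], [-600, -20]]     # 4 trials
hists = [perievent_histogram(n, -1000, 0, 500) for n in (neuron_1, neuron_2)]
population = population_histogram(hists)
print(hists)       # [[0.5, 2.0], [0.25, 0.75]]
print(population)  # [0.375, 1.375]
```

Aligning the same spike times to the reinforcer instead of the trigger only changes the reference time subtracted before binning, which is one way to check that an activation ends with the reward irrespective of when it is delivered.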
The expectation of reward evoked by a conditioned appetitive stimulus is a major component of the central motivational state underlying approach behavior (Bindra 1968; Dickinson 1980). Although the instruction stimuli in the present task are associated with reward and have occasion-setting properties in instrumental behavior, the trigger would have a better reward-predicting property because of temporal proximity. In line with this reasoning, most sustained activations followed the trigger rather than preceded it. The present differential prereinforcement activations resembled activities discriminating between expected appetitive and aversive reinforcers in rat orbitofrontal cortex (Schoenbaum et al. 1998). They were somewhat more variable than reward expectation-related activations in primate striatum (Apicella et al. 1992; Hikosaka et al. 1989; Hollerman et al. 1998; Schultz et al. 1992; Shidara et al. 1998).

REWARD RESPONSES. Many orbitofrontal neurons detected the delivery of liquid reward in both rewarded trials irrespective of the behavioral reaction, whereas only a few neurons responded to sound reinforcement. Most reward-driven neurons also responded to liquid outside the task, suggesting a relationship to the primary appetitive event and not to a particular reinforcing function or an end-of-trial signal. These reward responses occurred in orbitofrontal areas influenced by the amygdala (Morecraft et al. 1992). Earlier studies reported similar orbitofrontal responses to liquid reward (Niki et al. 1972; Rosenkilde et al. 1981), which discriminated against aversive liquids (Thorpe et al. 1983), whereas neurons in more caudal parts of area 13 responded to gustatory and olfactory stimuli (Rolls et al. 1990, 1996; Schoenbaum and Eichenbaum 1995; Thorpe et al. 1983). Reward responses also were found in dorsolateral prefrontal cortex (Watanabe 1989) and striatum (Apicella et al. 1991; Bowman et al. 1996; Hikosaka et al. 1989; Shidara et al. 1998). Some orbitofrontal neurons only responded to liquid outside the task. This may be because the liquid was not predicted by any phasic stimulus.
More than half of these neurons responded to the instruction during task performance, as also reported by others (Matsumoto et al. 1995). Apparently the response was transferred to the earliest liquid-predicting stimulus as in midbrain dopamine neurons (Mirenowicz and Schultz 1994). The unpredictable occurrence of reinforcement is a necessary condition for acquiring new behavioral responses (Rescorla and Wagner 1972). By contrast, the detection of fully predicted reward is necessary to prevent extinction of established behavior. Thus the orbitofrontal responses to unpredicted reward may play a role in reward-directed learning, whereas the responses to predicted reward may function in maintaining established task performance.

EXPECTATION OF INSTRUCTION. Preinstruction activations reflected the expectation of instructions acquired from the experience in the task schedules. Previous studies reported preinstruction activations in striatal and cortical neurons that were unconditional on trial type (Apicella et al. 1992), changed with regularly alternating trial types (Hikosaka et al. 1989), or reflected the employed dimensions in discriminations (Sakagami and Niki 1994). The present preinstruction activations apparently were related to the possible type of upcoming trial. As correct unrewarded trials were invariably followed by a rewarded trial type, activations preferentially following unrewarded trials may reflect the expectation of a rewarded trial. By contrast, as rewarded trials could follow each other in our asymmetric trial schedule, it is less certain which kind of expectation was reflected by activations occurring preferentially after rewarded trials.

DELAY ACTIVITY. Sustained activations of dorsolateral prefrontal neurons during the instruction-trigger delay probably reflect working memory or movement preparation (Funahashi et al. 1993). Sustained activations in orbitofrontal neurons occurred rarely in the instruction-trigger delay in our task. This contrasted sharply with the frequent occurrence of sustained delay activity in the striatum in an identical conditional delayed go-nogo task (Hollerman et al. 1998) and in spatial delayed response tasks in dorsolateral prefrontal cortex (cf. Funahashi et al. 1993). Sustained activations were presently frequent in the second, trigger-reward delay, where they may reflect the expectation of reward. An earlier spatial delayed response task used only the initial instruction-trigger delay and reported sustained delay activity in 25% of tested orbitofrontal neurons (Rosenkilde et al. 1981). As that delay ended close to the reward, some of the activations might reflect the expectation of reward. More sustained mnemonic and movement preparatory activity conceivably may occur in orbitofrontal neurons in behavioral tasks involving more elaborate memory demands and behavioral reactions.
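The learning-theoretic argument made under REWARD RESPONSES, that unpredicted reinforcement drives acquisition whereas fully predicted reinforcement maintains established behavior (Rescorla and Wagner 1972), can be illustrated with a single-stimulus simplification of the Rescorla-Wagner rule. This is only a sketch and not an analysis performed in the study. The function name, the learning rate of 0.5, and the trial sequence are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def rescorla_wagner(rewards: Sequence[float],
                    learning_rate: float = 0.5,
                    initial_value: float = 0.0) -> Tuple[List[float], List[float]]:
    """Single-stimulus Rescorla-Wagner update.

    Returns the reward prediction held before each trial and the
    prediction error (obtained minus predicted reward) on each trial.
    """
    value = initial_value
    values, errors = [], []
    for reward in rewards:
        error = reward - value          # discrepancy between outcome and prediction
        values.append(value)
        errors.append(error)
        value += learning_rate * error  # prediction moves toward the outcome
    return values, errors


# Hypothetical schedule: 20 rewarded trials followed by one reward omission.
values, errors = rescorla_wagner([1.0] * 20 + [0.0])
print(errors[0])    # 1.0: the first reward is entirely unpredicted
print(errors[1])    # 0.5: the error shrinks as the reward becomes predicted
print(errors[-1])   # negative: an omitted, predicted reward
```

In this scheme the error approaches zero once the reward is fully predicted, yet the reward must keep occurring for the prediction to be sustained, because each omission produces a negative error that lowers the prediction, as in extinction.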

Comparison with other reward-processing brain systems

The prominent relationships to reinforcers would allow the orbitofrontal cortex to be a major component of the reward system of the brain. A comparison with reward signals in closely related brain structures may help to assess the potential contributions of orbitofrontal activities to the motivational control of goal-directed behavior.

STRIATUM. Orbitofrontal neurons appear to process reward information in many ways similar to neurons in caudate nucleus, putamen, and ventral striatum. Striatal neurons are activated during the expectation of reward, respond to reward delivery, and discriminate between primary appetitive liquid and sound reinforcement (Apicella et al. 1991, 1992; Hikosaka et al. 1989; Hollerman et al. 1998; Schultz et al. 1992). However, striatal neurons show a larger variety of behavioral relationships than orbitofrontal neurons, including the expectation of external stimuli and the preparation, initiation and execution of movement (cf. Schultz et al. 1995). Many of these activities depend on the expectation of reward as opposed to secondary reinforcement (Hollerman et al. 1998). The similarity between orbitofrontal and striatal reward-related activations may suggest that orbitofrontal inputs induce the striatal reward signals. Many striatal reward-related activations occur in areas with heavy orbitofrontal projections, in particular the ventral striatum (Apicella et al. 1991, 1992; Arikuni and Kubota 1986; Bowman et al. 1996; Eblen and Graybiel 1995; Haber et al. 1995; Schultz et al. 1992; Selemon and Goldman-Rakic 1985; Shidara et al. 1998), although they also are found in more dorsal striatal regions with fewer orbitofrontal inputs (Hikosaka et al. 1989; Yeterian and Pandya 1991).

AMYGDALA. Neurons in different nuclei of amygdala respond selectively to primary foods and liquids and to conditioned stimuli associated with rewards (Nishijo et al. 1988). Amygdala neurons show sustained activations preceding behavioral reactions in a delayed response task (Nakamura et al. 1992). Without an interval between behavioral reaction and reward, some of these activations might reflect an expectation of reward, which was confirmed in rats (Schoenbaum et al. 1998).

DOPAMINE NEURONS. Dopamine neurons show entirely different forms of reward processing. They show phasic, but not sustained, activations after unpredicted rewards and conditioned, reward-predicting stimuli, and they are depressed when a predicted reward is omitted (Ljungberg et al. 1992; Mirenowicz and Schultz 1994; Romo and Schultz 1990; Schultz et al. 1993). Dopamine responses appear to report the discrepancy between an expected and an actually occurring reward (Schultz et al. 1997) and thus have the formal characteristics of reinforcement signals for acquiring new behavioral reactions (Rescorla and Wagner 1972).

We thank B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz, and F. Tinguely for expert technical assistance. The study was supported by Swiss National Science Foundation Grants 31-28591.90, 31.43331.95, and NFP38.4038-43997. L. Tremblay received a postdoctoral fellowship from the Fondation pour la Recherche Scientifique of Quebec. Present address of L. Tremblay: INSERM Unit 289, Hôpital de la Salpêtrière, 47 Boulevard de l’Hôpital, F-75651 Paris, France. Address reprint requests to W. Schultz. Received 18 February 1999; accepted in final form 29 November 1999.

REFERENCES

APICELLA, P., LJUNGBERG, T., SCARNATI, E., AND SCHULTZ, W. Responses to reward in monkey dorsal and ventral striatum. Exp. Brain Res. 85: 491–500, 1991.
APICELLA, P., SCARNATI, E., LJUNGBERG, T., AND SCHULTZ, W. Neuronal activity in monkey striatum related to the expectation of predictable environmental events. J. Neurophysiol. 68: 945–960, 1992.
ARIKUNI, T. AND KUBOTA, K. The organization of prefrontocaudate projections and their laminar origin in the macaque monkey: a retrograde study using HRP-gel. J. Comp. Neurol. 244: 492–510, 1986.
BACHEVALIER, J. AND MISHKIN, M. Visual impairment follows ventromedial but not dorsolateral prefrontal lesions in monkeys. Behav. Brain Res. 20: 249–261, 1986.

BARBAS, H. Anatomic organization of basoventral and mediodorsal visual recipient prefrontal regions in the rhesus monkey. J. Comp. Neurol. 276: 313–342, 1988.
BARBAS, H. Organization of cortical afferent input to orbitofrontal areas in the rhesus monkey. Neuroscience 56: 841–864, 1993.
BAUER, R. H. AND FUSTER, J. M. Delayed-matching and delayed-response deficit from cooling dorsolateral prefrontal cortex in monkeys. J. Comp. Physiol. Psychol. 90: 293–302, 1976.
BAYLIS, L. L. AND GAFFAN, D. Amygdalectomy and ventromedial prefrontal ablation produce similar deficits in food choice and in simple object discrimination learning for an unseen reward. Exp. Brain Res. 86: 617–622, 1991.
BINDRA, D. Neuropsychological interpretation of the effects of drive and incentive-motivation on general activity and instrumental behavior. Psychol. Rev. 75: 1–22, 1968.
BOWMAN, E. M., AIGNER, T. G., AND RICHMOND, B. J. Neural signals in the monkey ventral striatum related to motivation for juice and cocaine rewards. J. Neurophysiol. 75: 1061–1073, 1996.
BUTTER, C. M. Perseveration in extinction and in discrimination reversal tasks following selective prefrontal ablations in Macaca mulatta. Physiol. Behav. 4: 163–171, 1969.
BUTTER, C. M., MACDONALD, J. A., AND SNYDER, D. R. Orality, preference behavior, and reinforcement value of non-food objects in monkeys with orbital frontal lesions. Science 164: 1306–1307, 1969.
BUTTER, C. M. AND SNYDER, D. R. Alterations in aversive and aggressive behaviors following orbitofrontal lesions in rhesus monkeys. Acta Neurobiol. Exp. 32: 525–565, 1972.
BUTTER, C. M., SNYDER, D. R., AND MCDONALD, J. A. Effects of orbitofrontal lesions on aversive and aggressive behaviors in rhesus monkeys. J. Comp. Physiol. Psychol. 72: 132–144, 1970.
CARMICHAEL, S. T. AND PRICE, J. L. Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. J. Comp. Neurol. 363: 615–641, 1995.
DAMASIO, A. R. Descartes’ Error.
New York: Putnam, 1994.
DIAS, R., ROBBINS, T. W., AND ROBERTS, A. C. Dissociation in prefrontal cortex of affective and attentional shifts. Nature 380: 69–72, 1996.
DICKINSON, A. Contemporary Animal Learning Theory. Cambridge: Cambridge, 1980.
DICKINSON, A. AND BALLEINE, B. Motivational control of goal-directed action. Anim. Learn. Behav. 22: 1–18, 1994.
EBLEN, F. AND GRAYBIEL, A. M. Highly restricted origin of prefrontal cortical inputs to striosomes in the macaque monkey. J. Neurosci. 15: 5999–6013, 1995.
FUNAHASHI, S., CHAFEE, M. V., AND GOLDMAN-RAKIC, P. S. Prefrontal neuronal activity in rhesus monkeys performing a delayed anti-saccade task. Nature 365: 753–756, 1993.
FUSTER, J. M. Unit activity of prefrontal cortex during delayed-response performance: neuronal correlates of transient memory. J. Neurophysiol. 36: 61–78, 1973.
HABER, S., KUNISHIO, K., MIZOBUCHI, M., AND LYND-BALTA, E. The orbital and medial prefrontal circuit through the primate basal ganglia. J. Neurosci. 15: 4851–4867, 1995.
HIKOSAKA, O., SAKAMOTO, M., AND USUI, S. Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol. 61: 814–832, 1989.
HOLLERMAN, J. R., TREMBLAY, L., AND SCHULTZ, W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80: 947–963, 1998.
HORNAK, J., ROLLS, E. T., AND WADE, D. Face and voice expression identification in patients with emotional and behavioural changes following ventral frontal lobe damage. Neuropsychologia 34: 247–261, 1996.
IVERSEN, S. D. AND MISHKIN, M. Perseverative interference in monkeys following selective lesions of the inferior prefrontal convexity. Exp. Brain Res. 11: 376–386, 1970.
JACOBSEN, C. F. AND NISSEN, H. W. Studies of cerebral function in primates. IV. The effects of frontal lobe lesions on the delayed alternation habit in monkeys. J. Comp. Physiol. Psychol. 23: 101–112, 1937.
JONES, B. AND MISHKIN, M.
Limbic lesions and the problem of stimulus-reinforcement associations. Exp. Neurol. 36: 362–377, 1972.
KOWALSKA, D., BACHEVALIER, J., AND MISHKIN, M. The role of the inferior prefrontal convexity in performance of delayed nonmatching-to-sample. Neuropsychologia 29: 583–600, 1991.
KUBOTA, K., IWAMOTO, T., AND SUZUKI, H. Visuokinetic activities of primate prefrontal neurons during delayed-response performance. J. Neurophysiol. 37: 1197–1212, 1974.


LJUNGBERG, T., APICELLA, P., AND SCHULTZ, W. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 67: 145–163, 1992.
MATSUMOTO, K., NAKAMURA, K., MIKAMI, A., AND KUBOTA, K. Responses to unpredictable water delivery into the mouth of visually responsive neurons in the orbitofrontal cortex of monkeys. Abstr. IBRO Satellite Meet. Inuyama. 1995, p. 14.
MIRENOWICZ, J. AND SCHULTZ, W. Importance of unpredictability for reward responses in primate dopamine neurons. J. Neurophysiol. 72: 1024–1027, 1994.
MISHKIN, M. AND MANNING, F. J. Non-spatial memory after selective prefrontal lesions in monkeys. Brain Res. 143: 313–323, 1978.
MORECRAFT, R. J., GEULA, C., AND MESULAM, M.-M. Cytoarchitecture and neural afferents of orbitofrontal cortex in the brain of the monkey. J. Comp. Neurol. 323: 341–358, 1992.
NAKAMURA, K., MIKAMI, A., AND KUBOTA, K. Activity of single neurons in the monkey amygdala during performance of a visual discrimination task. J. Neurophysiol. 67: 1447–1463, 1992.
NIKI, H., SAKAI, M., AND KUBOTA, K. Delayed alternation performance and unit activity of the caudate head and medial orbitofrontal gyrus in the monkey. Brain Res. 38: 343–353, 1972.
NISHIJO, H., ONO, T., AND NISHINO, H. Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. J. Neurosci. 8: 3570–3583, 1988.
PASSINGHAM, R. E. Non-reversal shifts after selective prefrontal ablations in monkeys (Macaca mulatta). Neuropsychologia 10: 41–46, 1972.
PASSINGHAM, R. Delayed matching after selective prefrontal lesions in monkeys (Macaca mulatta). Brain Res. 92: 89–102, 1975.
PORRINO, L. J., CRANE, A. M., AND GOLDMAN-RAKIC, P. S. Direct and indirect pathways from the amygdala to the frontal lobe in rhesus monkeys. J. Comp. Neurol. 198: 121–136, 1981.
POTTER, H. AND NAUTA, W.J.H. A note on the problem of olfactory associations of the orbitofrontal cortex in the monkey. Neuroscience 4: 361–367, 1979.
RESCORLA, R. A. AND WAGNER, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory, edited by A. H. Black and W. F. Prokasy. New York: Appleton Century Crofts, 1972, p. 64–99.
ROLLS, E. T. AND BAYLIS, L. L. Gustatory, olfactory, and visual convergence within the primate orbitofrontal cortex. J. Neurosci. 14: 5437–5452, 1994.
ROLLS, E. T., CRITCHLEY, H. D., MASON, R., AND WAKEMAN, E. A. Orbitofrontal cortex neurons: role in olfactory and visual association learning. J. Neurophysiol. 75: 1970–1981, 1996.
ROLLS, E. T., SIENKIEWICZ, Z. J., AND YAXLEY, S. Hunger modulates the responses to gustatory stimuli of single neurons in the caudolateral orbitofrontal cortex of the macaque monkey. Eur. J. Neurosci. 1: 53–60, 1989.
ROLLS, E. T., YAXLEY, S., AND SIENKIEWICZ, Z. J. Gustatory responses of single neurons in the caudolateral orbitofrontal cortex of the macaque monkey. J. Neurophysiol. 64: 1055–1066, 1990.
ROMO, R. AND SCHULTZ, W. Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol. 63: 592–606, 1990.
ROSENKILDE, C. E. Functional heterogeneity of the prefrontal cortex in the monkey: a review. Behav. Neural Biol. 25: 301–345, 1979.
ROSENKILDE, C. E., BAUER, R. H., AND FUSTER, J. M. Single cell activity in ventral prefrontal cortex of behaving monkeys. Brain Res. 209: 375–394, 1981.
SAKAGAMI, M. AND NIKI, H. Encoding of behavioral significance of visual stimuli by primate prefrontal neurons: relation to relevant task conditions. Exp. Brain Res. 97: 423–436, 1994.
SCHOENBAUM, G., CHIBA, A. A., AND GALLAGHER, M. Orbitofrontal cortex and basolateral amygdala encode expected outcome during learning. Nat. Neurosci. 1: 155–159, 1998.
SCHOENBAUM, G. AND EICHENBAUM, H. Information coding in the rodent prefrontal cortex. I.
Single-neuron activity in orbitofrontal cortex compared with that in pyriform cortex. J. Neurophysiol. 74: 733–750, 1995.
SCHULTZ, W., APICELLA, P., AND LJUNGBERG, T. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13: 900–913, 1993.
SCHULTZ, W., APICELLA, P., ROMO, R., AND SCARNATI, E. Context-dependent activity in primate striatum reflecting past and future behavioral events. In: Models of Information Processing in the Basal Ganglia, edited by J. C. Houk, J. L. Davis, and D. G. Beiser. Cambridge: MIT, 1995, p. 11–28.


SCHULTZ, W., APICELLA, P., SCARNATI, E., AND LJUNGBERG, T. Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci. 12: 4595–4610, 1992.
SCHULTZ, W., DAYAN, P., AND MONTAGUE, P. R. A neural substrate of prediction and reward. Science 275: 1593–1599, 1997.
SELEMON, L. D. AND GOLDMAN-RAKIC, P. S. Longitudinal topography and interdigitation of corticostriatal projections in the rhesus monkey. J. Neurosci. 5: 776–794, 1985.
SELTZER, B. AND PANDYA, D. N. Frontal lobe connections of the superior temporal sulcus in the rhesus monkey. J. Comp. Neurol. 281: 97–113, 1989.
SHIDARA, M., AIGNER, T. G., AND RICHMOND, B. J. Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J. Neurosci. 18: 2613–2625, 1998.
THORPE, S. J., ROLLS, E. T., AND MADDISON, S. The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp. Brain Res. 49: 93–115, 1983.
TREMBLAY, L. AND SCHULTZ, W. Processing of reward-related information in primate orbitofrontal neurons. Soc. Neurosci. Abstr. 21: 952, 1995.
TREMBLAY, L. AND SCHULTZ, W. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704–708, 1999.

TREMBLAY, L. AND SCHULTZ, W. Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J. Neurophysiol. 83: 1877–1885, 2000.
UNGERLEIDER, L. G., GAFFAN, D., AND PELAK, V. S. Projections from inferotemporal cortex to prefrontal cortex via the uncinate fascicle in rhesus monkeys. Exp. Brain Res. 76: 473–484, 1989.
VOYTKO, M. L. Cooling orbitofrontal cortex disrupts matching-to-sample and visual discrimination learning in monkeys. Physiol. Psychol. 13: 219–229, 1985.
WATANABE, M. The appropriateness of behavioral responses coded in post-trial activity of primate prefrontal units. Neurosci. Lett. 101: 113–117, 1989.
WATANABE, M. Prefrontal unit activity during associative learning in the monkey. Exp. Brain Res. 80: 296–309, 1990.
WATANABE, M. Frontal units coding the associative significance of visual and auditory stimuli. Exp. Brain Res. 89: 233–247, 1992.
YETERIAN, E. H. AND PANDYA, D. N. Prefrontostriatal connections in relation to cortical architectonic organization in rhesus monkeys. J. Comp. Neurol. 312: 43–67, 1991.