Control of Operant Response Force

Journal of Experimental Psychology: Animal Behavior Processes, 1998, Vol. 24, No. 4, 431-438

Copyright 1998 by the American Psychological Association, Inc. 0097-7403/98/$3.00

Andrew B. Slifkin and Jasper Brener
State University of New York at Stony Brook

Differences in motor-control strategies (feedback or feedforward) engaged by rats to produce operant response force were investigated under 2 conditions of external feedback. In the immediate condition, liquid sucrose reinforcers were delivered as soon as each forelimb response met the force requirement, whereas under the terminal condition, reinforcers were delivered at response termination. When feedback control of response force was precluded by delivering reinforcers at response termination, force was adjusted by modulation of the rate of rise of force. However, under immediate reinforcer delivery, response force was controlled by adjustments of time to peak force. Such adjustments of response time to meet response requirements of increasing difficulty are consonant with expressions of the speed-accuracy trade-off commonly observed in studies of human motor control.

The coordination and control of voluntary, goal-directed motor performance has been the focus of a substantial research effort spanning the century (Woodworth, 1899). Over the same period, the processes of operant conditioning have also been vigorously investigated (Skinner, 1938; Thorndike, 1898). There would seem to be a natural affinity between these two areas because they share an interest in identifying the determinants of behavior. However, aside from a few recent acknowledgments of the potential benefits of cross-fertilization (Adams, 1987; Mazur, 1986; Mechner, 1992), motor-control and operant-conditioning research have proceeded almost independently. Although research in motor control has borrowed many technical tools from operant conditioning (Adams, 1987), it has appeared to be largely uninterested in the theoretical issues that dominate the latter field. Similarly, in operant research there have been few attempts to examine the organization and coordination of motor responses (Brener, 1986; Kelso, 1995; Powers, 1973). Instead, operant-conditioning research has focused on identifying how contingencies of reinforcement influence the allocation of time and behavior to different options in the environment (e.g., Rachlin, 1989). In this regard, the present article is atypical as it investigates the influence of contingencies of reinforcement on the engagement of motor-control strategies used in the production of operant responses.

In most operant-conditioning procedures, reinforcers are delivered as soon as the force required for the closure of a microswitch (lever or key) is achieved. Stimuli produced by the delivery of reinforcers provide "immediate feedback" about response correctness, and this may provide a basis for developing a strategy to earn reinforcers. For example, subjects may "increase force applied to the lever until the reinforcer is detected." Once such a feedback strategy has been acquired, the effectiveness of the subject's operant motor performance may remain unimpaired despite increases or decreases in the force required for reinforcer delivery. Furthermore, because immediate reinforcer delivery may guide behavior on a moment-to-moment basis during response execution (Salmoni, Schmidt, & Walter, 1984), the acquisition of a feedback strategy removes the need to learn and retain the prevailing force criterion. Thus, under conditions of immediate reinforcer delivery such as those that prevail in most operant-conditioning experiments, subjects may meet experimental force requirements without learning to encode criterion forces or decode them to generate appropriate levels of muscular activation.

However, in an experimental analysis of force learning in rats, Notterman and Mintz (1965) explicitly prevented reliance on such on-line exteroceptive feedback to regulate response force during response execution. They did this by delaying reinforcer delivery until the end of responses that met the force criterion. Under these conditions of "terminal feedback" (Schmidt, 1988), they observed, as have several other investigators since then (e.g., Brener & Mitchell, 1989; Fowler, 1987; Mitchell & Brener, 1991; Slifkin, Mitchell, & Brener, 1995), that subjects were readily able to scale their force output to force requirements. The terminal feedback contingency clearly does not permit response force to be regulated on-line by exteroceptive feedback available during response execution, that is, by the strategy "increase force to the lever until the reinforcer is detected."
Instead, the force target must be internalized and may be produced by what can be termed a feedforward process. Conventionally, the term feedforward is used to identify processes in which a set of neuromuscular commands is composed and sent to the relevant effector system in advance of response initiation (e.g., Brooks, 1986; Keele,

Andrew B. Slifkin and Jasper Brener, Department of Psychology, State University of New York at Stony Brook. This research was supported in part by National Institutes of Health Grants R01-HL42366 and F32-HD07885. This experiment was conducted by Andrew B. Slifkin and was part of his doctoral dissertation. Correspondence concerning this article should be addressed to Andrew B. Slifkin, who is now at the Departments of Biobehavioral Health and Kinesiology, Pennsylvania State University, 267P Recreation Building, University Park, Pennsylvania, 16802. Electronic mail may be sent to either Andrew B. Slifkin or Jasper Brener at [email protected] [email protected], respectively.


[Figure 1: force-time trajectories in three panels (A, B, C); horizontal axis: Time; LOW and HIGH force requirement levels marked.]

Figure 1. Three forms of force control resulting in equivalent peak-force differentiations: (A) The time to peak force at the LOW (TPF1) requirement is less than that at the HIGH requirement (TPF2), and the rate of rise of force (ΔF/ΔT1) remains constant; (B) Time to peak force (TPF1) is constant, but the rate of rise of force is greater under the HIGH (ΔF/ΔT2) than the LOW (ΔF/ΔT1) requirement; (C) Both time to peak force and the rate of rise of force are modulated, with HIGH requirement parameter values (TPF3, ΔF/ΔT3) being set to levels intermediate between those depicted in (A) and (B).

1968). This mode of control must be involved to some extent in the initiation of responses under both the immediate and terminal feedback conditions. However, we use the terms feedforward and feedback here simply to distinguish motor-control processes that do and do not involve on-line use of exteroceptive feedback to regulate response force.

Under terminal feedback conditions, exteroceptive feedback of response correctness must be processed between responses. Because the success of the response is only signaled at response termination, if the response fails to meet the force requirement, any readjustment of the force parameter can only influence subsequent responses. Unlike the within-response regulatory strategy that may be used under conditions of immediate feedback, the between-response strategy requires that criterion response forces are encoded, recalled, and reproduced (see Footnote 1).

Although the terminal feedback contingency clearly prevents the on-line regulation of response force by using external feedback, the immediate feedback contingency permits either feedback or feedforward regulation of response force. To determine whether the feedback made available during force conditioning influences the motor-control strategy adopted to produce criterion response forces, the force-time (kinetic) characteristics of responses generated under immediate and terminal feedback conditions were examined in the present study. The analysis relied on examination of the following three features of the force-time trajectory:

1. Peak force (PF), the maximum force achieved during a response.
2. Time to peak force (TPF), the time from the onset of the response to the achievement of peak force.
3. The rate of rise of force (ΔF/ΔT), the increase in force per unit time during the ascending portion of the force-time trajectory.

Figure 1 shows that the peak response force, the presumed motor target of the control process, can be increased by lengthening time to peak force (Figure 1A),

1 Much of the information processing language used in this article (e.g., feedforward and feedback) is in common use in studies on motor control and cognition. Such language and associated models originally derive from information theory (Shannon & Weaver, 1949) and cybernetics (Wiener, 1948). In the study of motor control in particular, it has been popular to speak of a "motor program," which specifies the commands, rules, and flow of information related to motor behavior.


by augmenting the rate of rise of force (Figure 1B) or by combining both strategies (Figure 1C). This analysis identifies time to peak force and the rate of rise of force as determinants of peak force.

Numerous studies on control of response force suggest that modulation of peak force by time to peak force (Figure 1A) reflects the operation of a feedback force-control process. However, modulation of peak force by the rate of rise of force (Figure 1B) reflects the operation of feedforward control processes. Thus, in a study of force learning in rats, Mitchell and Brener (1991) found that the rate of rise of force and peak force both increased systematically as a function of the force requirement, and both were highly correlated. However, time to peak force remained at a constant value as the force requirement increased (see Figure 3 in Slifkin et al., 1995). Because terminal reinforcer delivery was used in Mitchell and Brener's study, it was not possible for subjects to control response force by relying on the feedback generated by reinforcer delivery as a signal for response termination. Rather, response force had to be generated on the basis of encoded information. This encoded force information could then be updated following response execution according to the success or failure of that response. Mitchell and Brener's (1991) results therefore suggest that modulation of the rate of rise of force as a means of producing criterion peak forces reflects the operation of feedforward control of force. This inference is consonant with conclusions reached in several investigations of force control in human participants (e.g., Freund & Budingen, 1978; Ghez & Gordon, 1987; Gordon & Ghez, 1987; Gottlieb, Corcos, & Agarwal, 1988) and feline subjects (Ghez & Vicario, 1978).

On the other hand, numerous studies with human participants provide good evidence that variations of response duration in targeted limb movement reflect on-line, feedback control.
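The three kinetic measures used in this analysis (peak force, time to peak force, and the rate of rise of force) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the onset threshold, the idealized linear force ramps, and all numeric values are assumptions chosen only to show that the same increase in peak force can come from a longer time to peak force (Figure 1A) or from a steeper rate of rise (Figure 1B).

```python
from typing import List, Sequence, Tuple


def kinetic_measures(
    force_g: Sequence[float],
    sample_rate_hz: float = 1000.0,   # assumed sampling rate of the force record
    onset_threshold_g: float = 0.0,   # assumed force level that marks response onset
) -> Tuple[float, float, float]:
    """Return (peak force in g, time to peak force in s, rate of rise in g/s)."""
    onset = next(
        (i for i, f in enumerate(force_g) if f >= onset_threshold_g), None
    )
    if onset is None:
        raise ValueError("force trace never reaches the onset threshold")
    # Peak force: the maximum force achieved during the response.
    peak_index = max(range(onset, len(force_g)), key=lambda i: force_g[i])
    peak_force = force_g[peak_index]
    # Time to peak force: time from response onset to the peak sample.
    time_to_peak = (peak_index - onset) / sample_rate_hz
    # Rate of rise: increase in force per unit time over the ascending portion.
    if time_to_peak > 0:
        rate_of_rise = (peak_force - force_g[onset]) / time_to_peak
    else:
        rate_of_rise = float("nan")
    return peak_force, time_to_peak, rate_of_rise


def linear_ramp(
    rate_g_per_s: float, time_to_peak_s: float, sample_rate_hz: float = 1000.0
) -> List[float]:
    """Hypothetical response: force rises linearly from 0 g up to its peak."""
    n = int(round(time_to_peak_s * sample_rate_hz))
    return [rate_g_per_s * i / sample_rate_hz for i in range(n + 1)]


# Illustrative values only (not data from the study).
low = kinetic_measures(linear_ramp(100.0, 0.1))         # baseline response
high_by_time = kinetic_measures(linear_ramp(100.0, 0.2))  # Figure 1A: longer rise
high_by_rate = kinetic_measures(linear_ramp(200.0, 0.1))  # Figure 1B: steeper rise

for label, (pf, tpf, rate) in [
    ("low", low),
    ("high, time-modulated", high_by_time),
    ("high, rate-modulated", high_by_rate),
]:
    print(f"{label}: PF={pf:.1f} g, TPF={tpf:.3f} s, rate={rate:.1f} g/s")
```

For an idealized linear rise, peak force is simply the product of the rate of rise and the time to peak force, which is why either factor alone (or both together, as in Figure 1C) can produce the higher peak.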
In studies where the duration of response execution is left unspecified, a highly robust phenomenon, the speed-accuracy trade-off (Schmidt, 1988), occurs when subjects automatically lengthen movement time as accuracy requirements increase (viz., Fitts, 1954; see Footnote 2). Crossman and Goodeve (1963/1983) accounted for the direct relationship between response duration and response accuracy (the speed-accuracy trade-off) in terms of an iterative feedback-based error-correction model of response control. According to their data, movement time lengthens to accommodate increased visual sampling of the limb position relative to the target, and the production of corrective submovements following each visual sample. Consistent with their model and in further support of the notion that movement time lengthening reflects on-line use of exteroceptive feedback, it was found that removal of vision resulted in a reduction in the number of corrective submovements within the response (Crossman & Goodeve, 1963/1983), a reduction in movement time (Elliott, Lyons, & Dyson, 1997), and increased error in matching movement amplitude to the target (e.g., Digby et al., 1997; Keele & Posner, 1968; Klapp, 1975; Woodworth, 1899). It has also been shown that although movement time lengthens as a function of increases in task difficulty (see


Footnote 2), reaction time remains constant (Fitts & Peterson, 1964). Because reaction time is generally viewed as an index of the time taken to process information in preparation for generating the subsequent response (Henry & Rogers, 1960; Posner, 1978), these human experimental findings support the view that regulation of response amplitude by lengthening response duration reflects feedback, and not feedforward, control.

The main purpose of this experiment was to test the hypothesis that motor-control strategies for producing criterion operant response forces depend on the feedback arrangement imposed by the prevailing reinforcement contingencies. On the basis of the research on the speed-accuracy trade-off, it was expected that under immediate feedback conditions, response force would be controlled by time to peak force (e.g., Fitts, 1954; see Figure 1A), whereas under terminal feedback conditions, force would be controlled by the rate of rise of force (Gordon & Ghez, 1987; see Figure 1B). However, even when immediate feedback is available, subjects may engage feedforward instead of feedback control, in which case the response kinetic profile (Figure 1) would not differ between the reinforcer delivery conditions.

The experiment described in this article was designed to examine whether the conventional method of immediate reinforcer delivery would engender feedback control of force, as evidenced by modulation of time to peak force, or feedforward control, as evidenced by modulation of the rate of rise of force. Half of the subjects could earn food rewards by pressing a beam with forces greater than a low force requirement (2 g) and the other half by pressing with forces greater than a high force criterion (18 g). Within each force group, reinforcers were delivered to half of the subjects immediately on meeting the force requirement (immediate), and to the other half, on the termination of responses that had satisfied the force criterion (terminal).
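The difference between the two reinforcer delivery contingencies can be made concrete with a short Python sketch that reports when, within a single sampled response, a reinforcer would be delivered. This is an illustration only, not the control program used in the experiment: the function name, the treatment of "meeting" the criterion as force greater than or equal to it, the rule that a response terminates when force falls below a fixed threshold, and the example force values are all assumptions.

```python
from typing import Optional, Sequence


def reinforcer_sample(
    force_g: Sequence[float],
    criterion_g: float,
    contingency: str,
    termination_threshold_g: float = 1.0,  # assumed: response ends below this force
) -> Optional[int]:
    """Return the sample index at which a reinforcer would be delivered.

    Returns None if no reinforcer is earned within the trace.
    """
    if contingency not in ("immediate", "terminal"):
        raise ValueError("contingency must be 'immediate' or 'terminal'")
    criterion_met = False
    for i, force in enumerate(force_g):
        if force >= criterion_g:  # assumption: >= counts as meeting the criterion
            criterion_met = True
            if contingency == "immediate":
                # Immediate: delivered as soon as the criterion is met,
                # so delivery is available as a signal during the response.
                return i
        elif criterion_met and force < termination_threshold_g:
            # Terminal: delivered only once a response that met the
            # criterion has ended, so delivery cannot guide that response.
            return i
    return None


# Hypothetical single responses (force in grams, one value per sample).
successful = [0.0, 5.0, 12.0, 19.0, 20.0, 15.0, 8.0, 0.5, 0.0]
too_weak = [0.0, 5.0, 12.0, 9.0, 4.0, 0.5, 0.0]

print(reinforcer_sample(successful, 18.0, "immediate"))
print(reinforcer_sample(successful, 18.0, "terminal"))
print(reinforcer_sample(too_weak, 18.0, "terminal"))
```

In this sketch the immediate contingency returns an index on the rising portion of the response, whereas the terminal contingency can only return an index after force has fallen away again.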
It was anticipated that differences in force-control strategies would only emerge from comparisons between the two high force groups. This is because the minimal task constraints imposed by the low (2 g) force requirement could be met without modifying the natural or default response force level with which the subject entered the experiment. In the training phases of numerous experiments when the minimum force requirement of 1 g was imposed, we have repeatedly observed that mean peak forces reliably fall between 5 and 7 g. This default level may be determined by such organismic constraints as the biomechanical characteristics of the implicated effectors (e.g., Holt, Hamill, & Andres, 1990; Kugler & Turvey, 1987). Thus, it was predicted that the responses under the low force requirement would be generated in an open-loop fashion and would be insensitive to exteroceptive feedback manipulations. However, the feedback-dependent difference in the mechanism of force production should be expressed when rats are challenged to operate outside of their preferred (default) range of force production under the high force requirement.

2 In the task described by Fitts (1954) and its numerous replications, two rectangular targets of the same width were placed with their midpoints at a distance or amplitude from one another. Under different conditions, the target width and amplitude were varied. The task for the human subjects was to contact the left and right targets by making alternating contacts with the tip of a handheld stylus. During each trial, subjects had 15 s in which to make as many successful target contacts as possible. Thus, dividing trial duration by the number of responses generated during the trial provides the average movement time. Fitts (1954) specified an index of difficulty (ID) for each movement condition, ID = log2(2A/W), where the difficulty of movement was based on the ratio of the amplitude (A) of the movement to the narrowness of the target region (target width = W). The index of difficulty, expressed in binary units or bits, increases, for example, when movement amplitude remains constant and target width decreases. Such increases in the index of difficulty have been found to be reliably correlated with increases in average movement time (e.g., for a review see Schmidt, 1988). This formalization of the speed-accuracy trade-off became known as Fitts' Law.
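The two quantities defined in Footnote 2, the index of difficulty ID = log2(2A/W) and the average movement time obtained by dividing trial duration by the number of responses, can be computed directly. The following Python sketch is only a worked example: the function names and the amplitude, width, and response-count values are hypothetical and are not taken from Fitts (1954).

```python
import math


def index_of_difficulty(amplitude: float, width: float) -> float:
    """Index of difficulty in bits: ID = log2(2A / W)."""
    if amplitude <= 0 or width <= 0:
        raise ValueError("amplitude and width must be positive")
    return math.log2(2.0 * amplitude / width)


def average_movement_time(trial_duration_s: float, n_responses: int) -> float:
    """Average movement time: trial duration divided by number of responses."""
    if n_responses <= 0:
        raise ValueError("n_responses must be positive")
    return trial_duration_s / n_responses


# Hypothetical conditions: same amplitude, wide versus narrow target
# (amplitude and width in the same arbitrary length unit).
id_wide = index_of_difficulty(amplitude=8.0, width=2.0)
id_narrow = index_of_difficulty(amplitude=8.0, width=1.0)

# Hypothetical 15-s trial in which 30 target contacts were made.
mt = average_movement_time(trial_duration_s=15.0, n_responses=30)

print(f"ID wide target: {id_wide} bits")
print(f"ID narrow target: {id_narrow} bits")
print(f"average movement time: {mt} s")
```

Halving the target width at a fixed amplitude raises the index of difficulty by exactly one bit, which is the sense in which narrower targets make the task harder.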

Method

Subjects

Twenty naive male black-hooded, Long-Evans rats (Rattus norvegicus) aged from 210 to 523 days (M = 320 days) and weighing between 367 and 401 g (M = 383 g) at the beginning of the experiment served as subjects. Five subjects were assigned to each of the four experimental groups: (a) immediate reinforcer delivery-low force (2 g), (b) immediate reinforcer delivery-high force (18 g), (c) terminal reinforcer delivery-low force (2 g), (d) terminal reinforcer delivery-high force (18 g). The mean weights of subjects in the four groups were, respectively, 383 g, 385 g, 379 g, and 386 g. All animals were maintained at 85% to 90% of their preexperimental body weights by supplemental feeding with standard lab chow after each experimental session. Subjects were exposed to fluorescent lighting on a 12:12-hr light-dark cycle with lights off at 0700 and lights on at 1900 and were run 7 days a week at approximately 1500.

Apparatus

The experimental environment consisted of a Plexiglas box, 28.8 cm (wide) × 18 cm (deep) × 16 cm (high). The front panel, made of sheet metal, and the response beams were designed according to the specifications given by Notterman and Mintz (1965). The response beam was composed of an aluminum shaft and a brass manipulandum attached at the shaft's end. The shaft was mounted behind the front panel, and the manipulandum of each beam protruded into the animal compartment through small openings in the front panel. The total length of the shaft was 4.42 cm with a width of 0.66 cm. Over the 1.65-cm length most distal to the front panel, the height (thickness) of the shaft was 0.64 cm, and over the 2.77 cm more proximal length it was 0.127 cm thick. Strain gauges (1.27 by 0.64 cm) were bonded to the thinner portion of the shaft. The manipulandum was milled from a single piece of brass to form a 1.27-cm diameter ball that was leveled to provide a flat surface on which rats produced forepaw presses. The maximum distance between the leveled-top surface and the bottom of the ball was 0.95 cm. Force applied to the manipulandum caused small movements (< 1 mm) of the shaft and resulted in changes of the electrical resistance of the strain gauges. The resistance changes, which were directly related to the force applied to the disc, were converted to voltage changes and amplified by using high-stability DC amplifiers. Amplifier output was sampled at 1,000 Hz through a 12-bit analog-to-digital (A/D) converter by a 386 microcomputer. The system permitted force to be measured in units of