Slot Machines Operate On What Schedule Of Reinforcement

Gambling relates to operant conditioning because it involves positive reinforcement delivered on a variable-ratio schedule; slot machines are the classic example. On a variable-ratio schedule, reinforcement follows an unpredictable number of responses, in contrast to a fixed-ratio schedule, in which the required number of responses is constant.

doi: 10.1901/jaba.2006.109-04
PMID: 17020215

Abstract

The present experiment investigated the impact of contextually trained discriminations on gambling behavior. Nine recreational slot-machine players were initially exposed to concurrently available computerized slot machines that were each programmed on random-ratio schedules of reinforcement and differed only in color. All participants distributed responding equally across the two slot machines. A conditional discrimination procedure was then used to teach the contextual cues representing the arbitrary relations of “greater than” and “less than.” Following contextual cue training, participants were reexposed to the concurrent slot-machine task. After training of the contextual cues, a higher proportion of responses were made to the slot machine that shared formal properties (i.e., color) with the contextual cue representing “greater than.”

Keywords: choice, gambling, self-rules, verbal behavior

During the past 20 years there has been a growth in the number of states that allow legalized gambling (Ghezzi, Lyons, & Dixon, 2000). There has also been an increase from 1% to 5% of the United States population who are classified as problem or pathological gamblers (Harvard Mental Health Letter, 1996). Explanations as to why people develop into problem gamblers include sensation seeking and arousal (e.g., Anderson & Brown, 1984), genetic predispositions to gambling (e.g., Slutske et al., 2000), and the presence of specific personality disorders (e.g., Kroeber, 1992). To date, however, the impact of behavior analysis on understanding the development of gambling behavior has been minimal (see Weatherly, 2004, for a discussion).

Understanding gambling from a behavior-analytic perspective poses a unique challenge because animal models of gambling are nonexistent. Furthermore, there are several legal and ethical issues surrounding optimal research settings and participants. Commercial gaming establishments offer a variety of games (e.g., slot machines, video poker, roulette, blackjack) from which gambling behavior can be evaluated. However, these games are designed and government regulated to be purely probabilistic (i.e., based on an intermittent schedule of reinforcement). As a result, field research in which variables of interest (e.g., reinforcement magnitude, density, delays to reinforcement, or odds of winning) are manipulated and in which participants wager actual currency at a casino game are legally prohibited.

An alternative to studying gambling in commercial gaming establishments might involve the use of controllable and modifiable casino-like games (e.g., MacLin & Dixon, 2004; MacLin, Dixon, & Hayes, 1999) in which individuals who gamble recreationally can participate. In one such laboratory investigation, Dixon, Marley, and Jacobs (2003) examined the degree of discounting of delayed consequences by pathological gamblers and matched-control nongamblers. Choices between hypothetically available amounts of money that differed in size (e.g., $20, $1,000) and delay (e.g., 1 week, 1 year) were presented. Overall, results showed that the gamblers discounted the delayed rewards more steeply than did the matched-control participants.

One variable of interest in the study of gambling is predicting gamblers' choices among alternatives (e.g., to play one game or another). Such situations may be conceptualized as a concurrent-operants paradigm. Choice on concurrent schedules of reinforcement can be predicted reasonably well from the reinforcement rates and magnitudes associated with each response option (Davison & McCarthy, 1988). Such response patterns, however, typically emerge only after extensive exposure to the programmed contingencies, something that rarely happens when a person gambles. By contrast, a gambler may never experience the contingencies of winning the jackpot on a given slot machine, yet may favor that machine over concurrently available machines with similar programmed contingencies.

Although some choice responding of gamblers may be due to superstitious reinforcement (Skinner, 1953), verbal behavior also appears to have an impact on gamblers' choices, risk levels, and duration of game play. For example, roulette players may choose to wager more chips on specific options (i.e., certain numbers on a roulette board) that have no impact on game outcome if given inaccurate rules stating strategies for winning. Dixon (2000) found that preferences for such game options were reduced via delivery of accurate rules about the game, but for some players even such rules were insufficient to remove “illogical” choice making. Gamblers also tend to play games with poor probabilities of winning for longer periods of time when given inaccurate rules often found in the casino (e.g., “you have to play if you are going to win” and “the best way to win is to keep playing”) (Dixon, Hayes, & Aban, 2000). Although these studies illustrate how rules can alter gambling behavior, they do not aid in understanding how gamblers generate such rules.

The concept of a self-rule (Skinner, 1972) has been discussed as the product of relational responding to various stimuli and their discriminative functions (Hayes, Barnes-Holmes, & Roche, 2001). This conceptualization of a self-rule suggests that initially neutral or novel stimuli may acquire certain functions through direct training or transfers of functional control in the absence of differential reinforcement. For example, if a verbally competent person is trained by direct reinforcement that A is better than B and B is better than C, the person will be able to derive that C is worse than A in the absence of any direct reinforcement. Furthermore, contextual cues or higher order conditional discriminations may make the functional relations among stimuli transient (i.e., under additional discriminative control; Saunders & Williams, 1998). For example, in a matching-to-sample task, a person might be presented with a sample stimulus of the numeral 5 and given comparison response options of the numerals 8, 2, and 4. Depending on the contextual cue present, differential reinforcement might be provided for selecting the comparison stimulus that is “better than” (selecting 8) or “worse than” (selecting 2).
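To make this kind of derived relational responding concrete, here is a minimal Python sketch (our illustration, not anything from the article) that chains two directly trained "better than" relations to derive an untrained comparison; the letter names and the `better_than` helper are purely hypothetical.

```python
# Directly trained relations (arbitrary "better than" orderings): A > B, B > C.
trained = {("A", "B"), ("B", "C")}

def better_than(x, y, relations):
    """Derive untrained comparisons by chaining trained ones (transitivity),
    mirroring how a verbally competent person derives that C is worse than A."""
    if (x, y) in relations:
        return True
    return any((x, z) in relations and better_than(z, y, relations)
               for z in {b for (_, b) in relations})

print(better_than("A", "C", trained))   # True: derived without direct training
print(better_than("C", "A", trained))   # False, so C is "worse than" A
```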

The purpose of the current study was to develop a set of self-rules that would influence response allocation among concurrently available gambling options. First, we examined the degree to which recreational slot-machine players would match their responses to concurrently available random-ratio (RR) simulated slot machines that differed only by color. We then attempted to alter or enhance initial response allocations through the establishment of “greater than” and “less than” relations that were associated with specific contextual stimuli in a conditional discrimination procedure.

Method

Participants

Nine undergraduate students participated in the experiment. All participants were at least 18 years of age and reported occasionally having played slot machines at regional casinos.

Apparatus and Setting

Participation took place at a desk in a small room (3 m by 3.5 m) containing various furniture and equipment. A computer program written in Microsoft® Visual Basic 6.0 controlled the presentation of stimuli and data collection. Accuracy of data collection by the computer was checked prior to running each participant via a program debugger, which evaluated all possible trial outcomes. Each simulated slot machine displayed a “spin” button, a “credits won” window, and a “cumulative credits” window (initially set at 100). When the “spin” button was clicked, the slot-machine reels spun for approximately 3 s and displayed either a winning display (three identical symbols on the payoff line) or a losing display (any other arrangement of symbols on the payoff line).

Figure 1. Participants' choice option between the two simulated slot machines.
Figure 2. The simulated slot-machine task.

Following every winning spin, two credits were added to the participant's “credits won” and “cumulative credits” display windows; following a losing spin, no programmed consequences were delivered except that the one credit the participant had bet was removed from the “cumulative credits” window. After each spin, the participant was again given a choice of which slot machine to play (yellow or blue), as described above. To eliminate any position bias, the different-colored slot machines were randomly positioned on either side of the screen across trials. In addition, an observing response was required between all trials: the participant had to click with the mouse in the middle of the computer screen (between the two pictures) before the next trial (i.e., presentation of the two slot machines) began.

Each of the slot machines was programmed on an RR schedule of reinforcement on which the probability of reinforcement was .5 and the magnitude of reinforcement was held constant (one credit net gain or loss). To control for possible variations in reinforcement density across participants, the RR sequence was generated a priori from a pilot participant's session, and the resulting identical sequence of trial outcomes was applied to all 9 participants. In addition, regardless of which slot machine was chosen on a given trial, the outcome of the RR schedule was predetermined for every participant. That is, the program controlled credits won or lost such that every participant contacted the identical amount of reinforcement regardless of their individual choices between the two slot-machine options. Thus, given the p = .5 contingencies, each participant ended this task with 100 credits. The slot-machine pretest condition continued until 50 trials had been completed.
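The original task was implemented in Visual Basic 6.0 and is not reproduced here; the following Python sketch is only our rough illustration of the contingencies just described, showing how a single predetermined win/loss sequence (exactly half wins, per the p = .5 contingencies) yields identical obtained reinforcement no matter how responses are allocated. The function names and the alternating-choice example are assumptions.

```python
import random

def pregenerate_outcomes(n_trials=50, seed=0):
    """Pre-generate one shared win/loss sequence, mirroring the a priori
    random-ratio sequence described above. Exactly half of the trials win,
    so every participant finishes with the starting credit total."""
    outcomes = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    random.Random(seed).shuffle(outcomes)
    return outcomes

def run_session(choices, outcomes, start_credits=100):
    """Apply the predetermined outcome on each trial regardless of which
    machine (yellow or blue) the participant chooses to play."""
    credits = start_credits
    allocation = {"yellow": 0, "blue": 0}
    for choice, win in zip(choices, outcomes):
        allocation[choice] += 1
        credits -= 1          # one credit is bet on every spin
        if win:
            credits += 2      # a winning display pays back two credits
    return credits, allocation

outcomes = pregenerate_outcomes()
# A participant who simply alternates between the two machines:
choices = ["yellow" if i % 2 == 0 else "blue" for i in range(len(outcomes))]
print(run_session(choices, outcomes))   # -> (100, {'yellow': 25, 'blue': 25})
```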

Conditional Discrimination Training

Following the slot-machine pretest, conditional discrimination training was conducted to establish the relations of greater than and less than with the colors used in the slot-machine task. During this condition, participants were instructed to match a visual sample stimulus to one of three visual comparison stimuli presented on the computer screen. Six sets of five stimuli and two contextual cues were used during this procedure. Each of the six sets consisted of five images or words, approximately 8 cm by 8 cm each, that represented a continuum from least to most. Stimulus sets incorporated gambling stimuli (playing cards), monetary values (dollar bills and coins), and nonmonetary, nongambling stimuli (letter grades used in American universities, placement in a competition). Thus, although the stimuli related to different concepts (e.g., ranking, value, size), each set represented an ordering along a continuum from less to more. For example, Set B included pictures of a penny, two pennies, a nickel, a dime, and a quarter (see Figure 3).

Figure 3. Stimulus sets used during the conditional discrimination training procedure.

In addition to the six sets of comparison stimuli, two contextual cues were presented in this condition. The contextual cues were two colored rectangles (yellow or blue), approximately 40 cm by 12 cm, each presented as a rectangle behind the comparison stimulus images (see Figure 4).

Figure 4. The discrimination training and testing task.

The top image represents the sample stimulus, the lower three images represent the comparison stimuli, and the larger shaded rectangle represents the contextual stimulus.

At the beginning of the conditional discrimination training condition, the following instructions were presented on the screen:

You are going to see five images presented on your screen: one image on top, three on the bottom, and one larger image surrounding the three on the bottom. Your job is to choose one of the three images on the bottom of the screen by clicking on it with the mouse. When you are correct you will receive one point. Incorrect responses will not result in awarded points. Please try to earn as many points as you can. The more points you earn, the quicker you will finish. There will be parts of the experiment where feedback is not given. The computer is still keeping track of your responses so continue to do your best. Do you have any questions?

The experimenter answered additional questions by repeating relevant sections of the instructions. After addressing questions, the experimenter left the room.

All trials of conditional discrimination training involved the same stimulus presentation format throughout. A single sample stimulus was visible in the middle of the screen, and a colored contextual cue (yellow or blue) was presented as a rectangle behind the three comparison stimuli at the bottom of the screen. Participants responded by clicking on one of the three bottom images with the mouse.

During the training phases, a point counter was visible. The counter displayed the cumulative points earned and feedback regarding the correctness of the response (i.e., “good job” or “wrong”). Following a correct response, a 1-s chime sounded, the phrase “good job” was displayed, and one point was added to the cumulative point counter visible at the top of the computer screen. Incorrect responses resulted in a 1-s chord sounding and display of the word “wrong.” Following feedback, a 1-s intertrial interval elapsed before the next trial began with the presentation of the relevant sample and comparison stimuli. The relations of greater than and less than were trained in three separate phases using three sets of stimuli.

Phase 1: Less Than

During this phase the blue contextual cue was presented surrounding the comparison stimuli to train the relation of less than. That is, when the blue cue was presented, a response on the comparison that was less than the sample resulted in programmed positive consequences (e.g., point delivery). Using Set A as an example, if the $5 bill was presented as the sample and the comparisons were $1, $10, and $20, a correct response would be selection of the $1 bill. Stimuli from Sets A, B, and C were randomly presented six times each in an 18-trial block, and the participant was required to respond correctly to the presented stimuli and contextual cue at 89% accuracy or better to advance to the next phase. If less than 89% accuracy occurred within the block of 18 trials, the participant was presented with another block of 18 trials. Sample stimuli during less than trials included the $5 bill, two pennies, the grade of B−, the $10 bill, a nickel, and the grade of C+. Comparison stimuli included various arrangements of all remaining 15 stimuli.

Phase 2: Greater Than

During this phase, the yellow contextual cue was presented surrounding the comparison stimuli to train the relation of greater than. When the yellow cue was presented, a response on the comparison that was greater than the sample resulted in the programmed positive consequences. For example, if the $10 bill was presented as a sample, the correct response would be selection of the $20 bill over $1 and $5 options. All other stimuli and the performance criteria (i.e., 89% accuracy or better) were identical to those used in the less than phase. Sample stimuli during this phase included the $20 bill, a dime, the grade of D−, the $10 bill, a nickel, and the grade of C+. Comparison stimuli included various arrangements of all remaining 15 stimuli.
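The logic common to Phases 1 and 2 can be sketched in a few lines of Python. This is our illustration only: it assumes, as in the worked examples above, that each comparison array contains exactly one option satisfying the trained relation, and the numeric stand-ins and function names are hypothetical.

```python
def correct_choice(sample, comparisons, cue):
    """Blue cue ("less than"): the comparison smaller than the sample is
    correct; yellow cue ("greater than"): the larger comparison is correct.
    Assumes each array contains exactly one option satisfying the relation."""
    if cue == "blue":
        return min(c for c in comparisons if c < sample)
    return max(c for c in comparisons if c > sample)

def block_passed(n_correct, n_trials=18, criterion=0.89):
    """Phase 1 and 2 blocks repeat until accuracy reaches the 89% criterion
    (Phase 3 applies the same rule to 36-trial blocks)."""
    return n_correct / n_trials >= criterion

# The two worked examples from the text, with dollar values as plain numbers:
print(correct_choice(5, [1, 10, 20], "blue"))     # -> 1  (the $1 bill)
print(correct_choice(10, [1, 5, 20], "yellow"))   # -> 20 (the $20 bill)
```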

Phase 3: Mixed Less Than and Greater Than

During this phase, mixed training between the Phase 1 and Phase 2 contingencies occurred. Stimuli from Sets A, B, and C were randomly presented 12 times each in a 36-trial block, and each contextual cue (blue or yellow) was presented 18 times. As before, the participant was required to score 89% or better to advance to the next phase. If the 89% criterion was not met, the participant was presented with another block of 36 trials, after which performance was again evaluated against the 89% accuracy criterion. All sample and comparison stimulus arrangements were identical to those of Phases 1 and 2 and were presented in a randomized order.

Phase 4: Test

During this phase, a 54-trial relational test was administered. The stimuli used in the posttest included the three sets of images used during training (left side of Figure 3) as well as three sets of novel pictures (right side of Figure 3) to assess any transfer of function of the greater than and less than contextual cues to novel stimuli. The test contained 30 trials that used the sets of trained stimuli (A, B, and C) and 24 trials that used the sets of novel stimuli (D, E, and F). No feedback or points were provided at any time during this phase. Prior to the first trial of the test, the participant viewed the following instructions: “You will no longer receive feedback following your responses. Continue to do the best you can. The computer is recording your score.”

The criterion for completion of this phase was correct responding in the presence of the different stimuli and the relevant contextual cues at 85% accuracy or better (i.e., 46 of 54 trials). If less than 85% correct performance occurred, the participant was reexposed to the mixed training contingencies of Phase 3. Following completion of Phase 3, another exposure to Phase 4 occurred.
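As a sketch of how the test and retraining steps fit together (our reading of the procedure, not the authors' code), the cycle can be written as follows. The two callables stand in for the actual trial procedures, and limiting retraining to a single reexposure is an assumption based on how Participant 8 is described in the Results.

```python
def run_relational_test_phase(run_mixed_block, run_relational_test, max_retrainings=1):
    """Administer the 54-trial test; passing requires 46 or more correct (85%).
    A failure triggers reexposure to the Phase 3 mixed-training blocks
    before one retest."""
    retrainings = 0
    while True:
        if run_relational_test() >= 46:     # 54 test trials, no feedback
            return True
        if retrainings == max_retrainings:
            return False                    # participant proceeds to the posttest anyway
        while not run_mixed_block():        # repeat 36-trial blocks to the 89% criterion
            pass
        retrainings += 1
```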

Slot-machine Task Posttest

The purpose of this task was to determine whether the participants exhibited any change in preference between the two simulated slot machines following conditional discrimination training. Participants were reexposed to the exact simulated slot-machine task used during the slot-machine task pretest condition. During this condition, an additional 50 trials were conducted so that direct comparisons could be made with the pretest condition. The same programmed RR schedules (50% probability of reinforcement) from the pretest remained in place for all participants.

Results

Figure 5 displays each participant's response allocation across the two slot machines during initial exposure to the task. For most participants, responding was relatively equally distributed. No participant showed more than a 20% (10-trial) preference for either option. Across all participants, 49% of responses were allocated to the yellow machine and 51% were allocated to the blue.

Figure 5. Response allocations across the concurrently available slot machines during initial exposure to the gambling task.

All participants reached criterion responding in the conditional discrimination training and subsequently progressed to the relational test. The number of blocks required to meet criteria during Phase 1 (less than) and Phase 2 (greater than) varied from 10 blocks to one block, with the average being approximately five blocks for less than training and two blocks for greater than training. The number of training blocks required to meet criteria in the mixed training phase varied from five to one, with an average of two training blocks.

Seven of the 9 participants performed at 89% accuracy or better during exposure to the relational test. Two participants required reexposure to the mixed less than and greater than training phase (Participants 2 and 8). Both participants reached mastery criterion with 97% correct during this reexposure in one block of 36 trials. It should be noted that Participant 2 subsequently passed the relational test, whereas Participant 8 scored 70% on the mixed less than and greater than test. Figure 6 displays performance on trained, novel, and combined (i.e., trained and novel) stimulus sets across participants.

Figure 6. Percentage of correct responses made during the conditional discrimination testing condition for each participant.

Overall test performance is displayed in the first of the three bars for each participant, performance on trained stimulus sets in the middle bar, and performance on novel stimulus sets in the right bar.

A comparison of responding on the slot-machine task pretest and final slot-machine task posttest is displayed in Figure 7. Eight of the 9 participants (the exception was Participant 8) allocated a majority of their responses to the yellow machine during the posttest condition. Together, the participants allocated 81% of their responses to the yellow machine and 19% to the blue machine. A notable exception to this trend occurred with Participant 8. During the initial exposure to the gambling task, Participant 8 allocated 49% of his responses to the yellow slot machine and 51% to the blue machine, thus demonstrating almost exact matching performance. As noted above, this participant required a second exposure to the mixed training phase after failing the discrimination test. Following reexposure to the mixed training and meeting of the criteria, Participant 8 again did not pass the discrimination test. Thus, the only participant who did not show a shift in response allocation on the second slot-machine task was the 1 participant who failed the relational responding test; all other participants allocated their responding to the yellow slot machine.

Figure 7. Response allocations to the preferred (yellow) slot machine during initial and final exposures to the slot-machine task.

Discussion

The current study provided a means of initially assessing response preference for slot machines that had an equal (50%) probability of reinforcement on any given spin. These two response options differed only in color, which allowed a baseline response preference to be established. No participant had a clear preference for one option over the other. During conditional discrimination training, all participants were provided with differential reinforcement for matching a sample stimulus to one of three comparison stimuli that were either greater than or less than the sample, depending on the contextual cue present. After reexposure to the same concurrent schedule consisting of two simulated slot machines, 8 of 9 participants demonstrated a higher preference for one option (the yellow slot machine) than the other (the blue slot machine), suggesting a transformation of the stimulus functions of greater than (associated with the yellow slot machine) and less than (associated with the blue slot machine). These results are similar to previous investigations in which responding on novel tasks was altered by training the arbitrary relations of more than and less than (e.g., Dymond & Barnes, 1995).

The development and transfer of stimulus functions have been demonstrated for a variety of functions, including elicitation of fear (Dougher, 1998), interresponse times and temporal control (Rehfeldt & Hayes, 1998), contextual control (Kohlenberg, Hayes, & Hayes, 1991), and other discriminative stimulus functions (Barnes & Keenan, 1993; DeGrandpre, Bickel, & Higgins, 1992). Future research might expand on the present study to examine other transfers of function within a gambling environment. For example, a function could be attached to arbitrary stimuli so that Context X comes to represent “fast” and Context Y comes to represent “slow.” Presumably, such an arrangement could be used to differentially reinforce decreases in the frequency of gambling responses among pathological gamblers.

The current results suggest that recreational gamblers may allocate their responding almost equally across concurrently available slot machines of equal probability. It should be noted that our participants were exposed to the slot-machine task for only 50 trials; this may have not been sufficient exposure to generate steady-state responding. That is, the current participants may have been merely sampling both response options during these trials rather than showing a bias in responding toward either option. However, changes in response allocation following training suggest that a clear preference for the yellow (greater than) option developed posttraining. It is also important to note that the current participants were recreational gamblers. It is unknown if similar results would have been obtained with pathological gamblers.

Although the pretest–posttest design used in the present experiment is methodologically limited, it provided an initial foundation for evaluating the possible emergence of self-rules. Nevertheless, the design should be modified in future investigations. For example, the present methodology could be modified to incorporate a multiple baseline across subjects design in which each participant is exposed to varying numbers of pretest (baseline) choices between slot machines before progressing to conditional discrimination training and testing. Other procedural modifications could include alterations in the probabilities of reinforcement on the two simulated slot machines without informing the participant, to assess the resulting potential for insensitivity to subsequent changes in reinforcement probability. Additional manipulations might include examining the effects of significant losing streaks or winning streaks on individual participants (see Rachlin, 1990) along with temporal response patterns such as latency and engagement (Dixon & Schreiber, 2002). Also, future research might assess response allocation immediately following the pretest, such that if a preference for yellow was shown, the computer program would subsequently arrange stimuli such that blue represented greater than and vice versa. The programming structure of our experiment did not permit such changes to be made.

In summary, the present study provided an empirical foundation for the study of gambling behavior among recreational slot-machine players on concurrently available slot machines. Although a translational approach to the study of gambling may limit the implications that can be drawn from the present data, these results suggest that a preference for one type of machine over another can be empirically created through transfers of stimulus functions. Specifically, the current study led to the development of a self-rule through initially neutral or novel stimuli acquiring specific relations without direct training. Nevertheless, additional research on the development of self-rule formation is required. Although we anecdotally deduced that self-rules about which slot machine was greater than or less than were formed, this hypothesis was not directly assessed. Thus, future research might consider incorporating a variety of supplemental measures to examine rule generation. These might include think-aloud procedures in which participants speak aloud during the experiment and resulting verbal behavior is examined for relevant content, a postexperimental questionnaire as to why participants responded a certain way, or intermittently pausing the experiment and asking participants to state why they were responding one way over another.

References

  • Anderson G, Brown R.I.F. Real and laboratory gambling, sensation-seeking and arousal. British Journal of Psychology. 1984;75:401–410.
  • Barnes D, Keenan M. A transfer of functions through derived arbitrary and nonarbitrary stimulus relations. Journal of the Experimental Analysis of Behavior. 1993;59:61–81.
  • Davison M, McCarthy J. The matching law. Hillsdale, NJ: Erlbaum; 1988.
  • DeGrandpre R.J, Bickel W.K, Higgins S.T. Emergent equivalence relations between interoceptive (drug) and exteroceptive (visual) stimuli. Journal of the Experimental Analysis of Behavior. 1992;58:9–18.
  • Dixon M.R. Manipulating the “illusion of control”: Variations in risk-taking as a function of perceived control over chance outcomes. The Psychological Record. 2000;50:705–720.
  • Dixon M.R, Hayes L.J, Aban I. Examining the roles of rule-following, reinforcement, and pre-experimental histories on risk-taking behavior. The Psychological Record. 2000;50:687–704.
  • Dixon M.R, Marley J, Jacobs E. Delay discounting of pathological gamblers. Journal of Applied Behavior Analysis. 2003;36:449–458.
  • Dixon M.R, Schreiber J. Utilizing a computerized video poker simulation for the collection of experimental data on gambling behavior. The Psychological Record. 2002;52:417–428.
  • Dougher M.J. Stimulus equivalence and the untrained acquisition of stimulus functions. Behavior Therapy. 1998;29:577–591.
  • Dymond S, Barnes D. A transformation of self-discrimination response functions in accordance with the arbitrarily applicable relations of sameness, more than, and less than. Journal of the Experimental Analysis of Behavior. 1995;64:163–184.
  • Ghezzi P, Lyons C, Dixon M.R. Gambling from a socioeconomic perspective. In: Bickel W.K, Vuchinich R.E, editors. Reframing health behavior change with behavioral economics. New York: Erlbaum; 2000. pp. 313–338.
  • Harvard Mental Health Letter. Pathological gambling. 1996. Retrieved September 13, 2001, from http://firstsearch.oclc.org/FETCH:…ml/fsfull-text.htm%22:/fstx23.htm
  • Hayes S.C, Barnes-Holmes D, Roche B. Relational frame theory: A post-Skinnerian account of human language and cognition. New York: Kluwer Academic; 2001.
  • Kohlenberg B.S, Hayes S.C, Hayes L.J. The transfer of contextual control over equivalence classes through equivalence classes: A possible model of social stereotyping. Journal of the Experimental Analysis of Behavior. 1991;56:505–518.
  • Kroeber H.L. Roulette gamblers and gamblers at electronic game machines: Where are the differences? Journal of Gambling Studies. 1992;8:79–92.
  • MacLin O.H, Dixon M.R. A computerized roulette simulation to investigate the variables involved in gambling behavior. Behavior Research Methods, Instruments, and Computers. 2004;36:96–100.
  • MacLin O.H, Dixon M.R, Hayes L.J. A computerized slot machine simulation to investigate the variables involved in gambling behavior. Behavior Research Methods, Instruments, and Computers. 1999;31:731–735.
  • Rachlin H. Why do people gamble and keep gambling despite heavy losses? Psychological Science. 1990;1:294–297.
  • Rehfeldt R.A, Hayes L.J. Untrained temporal differentiation and equivalence class formation. The Psychological Record. 1998;48:481–510.
  • Saunders K, Williams D. Stimulus equivalence. In: Lattal K.A, Perone M, editors. Handbook of research methods in human operant behavior. New York: Plenum; 1998. pp. 193–228.
  • Skinner B.F. Science and human behavior. New York: Appleton-Century-Crofts; 1953.
  • Skinner B.F. About behaviorism. New York: Knopf; 1972.
  • Slutske W.S, Eisen S, True W.R, Lyons M.J, Goldberg J, Tsuang M. Common genetic vulnerability for pathological gambling and alcohol dependence in men. Archives of General Psychiatry. 2000;57:666–673.
  • Weatherly J.N. The best we have: A critique of Ladouceur, Sylvain, Boutin, and Doucet's Understanding and Treating the Pathological Gambler. The Behavior Analyst. 2004;27:119–124.

Basic Principles of Operant Conditioning: Thorndike’s Law of Effect

Thorndike’s law of effect states that behaviors are modified by their positive or negative consequences.

Learning Objectives

Relate Thorndike’s law of effect to the principles of operant conditioning

Key Takeaways

Key Points

  • The law of effect states that responses that produce a satisfying effect in a particular situation become more likely to occur again, while responses that produce a discomforting effect are less likely to be repeated.
  • Edward L. Thorndike first studied the law of effect by placing hungry cats inside puzzle boxes and observing their actions. He quickly realized that cats could learn the efficacy of certain behaviors and would repeat those behaviors that allowed them to escape faster.
  • The law of effect is at work in every human behavior as well. From a young age, we learn which actions are beneficial and which are detrimental through a similar trial and error process.
  • While the law of effect explains behavior from an external, observable point of view, it does not account for internal, unobservable processes that also affect the behavior patterns of human beings.

Key Terms

  • Law of Effect: A law developed by Edward L. Thorndike that states, “responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.”
  • behavior modification: The act of altering actions and reactions to stimuli through positive and negative reinforcement or punishment.
  • trial and error: The process of finding a solution to a problem by trying many possible solutions and learning from mistakes until a way is found.

Operant conditioning is a theory of learning that focuses on changes in an individual’s observable behaviors. In operant conditioning, new or continued behaviors are impacted by new or continued consequences. Research regarding this principle of learning first began in the late 19th century with Edward L. Thorndike, who established the law of effect.

Thorndike’s Experiments

Thorndike’s most famous work involved cats trying to navigate through various puzzle boxes. In this experiment, he placed hungry cats into homemade boxes and recorded the time it took for them to perform the necessary actions to escape and receive their food reward. Thorndike discovered that with successive trials, cats would learn from previous behavior, limit ineffective actions, and escape from the box more quickly. He observed that the cats seemed to learn, from an intricate trial and error process, which actions should be continued and which actions should be abandoned; a well-practiced cat could quickly remember and reuse actions that were successful in escaping to the food reward.

Thorndike’s puzzle box: This image shows an example of Thorndike’s puzzle box alongside a graph demonstrating the learning of a cat within the box. As the number of trials increased, the cats were able to escape more quickly by learning.

The Law of Effect

Thorndike realized not only that stimuli and responses were associated, but also that behavior could be modified by consequences. He used these findings to publish his now famous “law of effect” theory. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated. Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again.

Law of effect: Initially, cats displayed a variety of behaviors inside the box. Over successive trials, actions that were helpful in escaping the box and receiving the food reward were replicated and repeated at a higher rate.

Thorndike’s law of effect now informs much of what we know about operant conditioning and behaviorism. According to this law, behaviors are modified by their consequences, and this basic stimulus-response relationship can be learned by the operant person or animal. Once the association between behavior and consequences is established, the response is reinforced, and the association alone is held responsible for the occurrence of that behavior. Thorndike posited that learning was merely a change in behavior as a result of a consequence, and that if an action brought a reward, it was stamped into the mind and available for recall later.

From a young age, we learn which actions are beneficial and which are detrimental through a trial and error process. For example, a young child is playing with her friend on the playground and playfully pushes her friend off the swingset. Her friend falls to the ground and begins to cry, and then refuses to play with her for the rest of the day. The child’s actions (pushing her friend) are informed by their consequences (her friend refusing to play with her), and she learns not to repeat that action if she wants to continue playing with her friend.

The law of effect has been expanded to various forms of behavior modification. Because the law of effect is a key component of behaviorism, it does not include any reference to unobservable or internal states; instead, it relies solely on what can be observed in human behavior. While this theory does not account for the entirety of human behavior, it has been applied to nearly every sector of human life, particularly education and psychology.

Basic Principles of Operant Conditioning: Skinner

B. F. Skinner was a behavioral psychologist who expanded the field by defining and elaborating on operant conditioning.

Learning Objectives

Summarize Skinner’s research on operant conditioning

Key Takeaways

Key Points

  • B. F. Skinner, a behavioral psychologist and a student of E. L. Thorndike, contributed to our view of learning by expanding our understanding of conditioning to include operant conditioning.
  • Skinner theorized that if a behavior is followed by reinforcement, that behavior is more likely to be repeated, but if it is followed by punishment, it is less likely to be repeated.
  • Skinner conducted his research on rats and pigeons by presenting them with positive reinforcement, negative reinforcement, or punishment in various schedules that were designed to produce or inhibit specific target behaviors.
  • Skinner did not include room in his research for ideas such as free will or individual choice; instead, he posited that all behavior could be explained using learned, physical aspects of the world, including life history and evolution.

Key Terms

  • punishment: The act or process of imposing and/or applying a sanction for an undesired behavior when conditioning toward a desired behavior.
  • aversive: Tending to repel, causing avoidance (of a situation, a behavior, an item, etc.).
  • superstition: A belief, not based on reason or scientific knowledge, that future events may be influenced by one’s behavior in some magical or mystical way.

Operant conditioning is a theory of behaviorism that focuses on changes in an individual’s observable behaviors. In operant conditioning, new or continued behaviors are impacted by new or continued consequences. Research regarding this principle of learning was first conducted by Edward L. Thorndike in the late 1800s, then brought to popularity by B. F. Skinner in the mid-1900s. Much of this research informs current practices in human behavior and interaction.

Skinner’s Theories of Operant Conditioning

Almost half a century after Thorndike’s first publication of the principles of operant conditioning and the law of effect, Skinner attempted to prove an extension to this theory—that all behaviors are in some way a result of operant conditioning. Skinner theorized that if a behavior is followed by reinforcement, that behavior is more likely to be repeated, but if it is followed by some sort of aversive stimuli or punishment, it is less likely to be repeated. He also believed that this learned association could end, or become extinct, if the reinforcement or punishment was removed.

B. F. Skinner: Skinner was responsible for defining the segment of behaviorism known as operant conditioning—a process by which an organism learns from its physical environment.

Skinner’s Experiments

Skinner’s most famous research studies were simple reinforcement experiments conducted on lab rats and domestic pigeons, which demonstrated the most basic principles of operant conditioning. He conducted most of his research in a special operant-conditioning chamber, now referred to as a “Skinner box,” which (together with a cumulative recorder) was used to analyze the behavioral responses of his test subjects. In these boxes he would present his subjects with positive reinforcement, negative reinforcement, or aversive stimuli in various timing intervals (or “schedules”) that were designed to produce or inhibit specific target behaviors.

In his first work with rats, Skinner would place the rats in a Skinner box with a lever attached to a feeding tube. Whenever a rat pressed the lever, food would be released. After the experience of multiple trials, the rats learned the association between the lever and food and began to spend more of their time in the box procuring food than performing any other action. It was through this early work that Skinner started to understand the effects of behavioral contingencies on actions. He discovered that the rate of response—as well as changes in response features—depended on what occurred after the behavior was performed, not before. Skinner named these actions operant behaviors because they operated on the environment to produce an outcome. The process by which one could arrange the contingencies of reinforcement responsible for producing a certain behavior then came to be called operant conditioning.

To prove his idea that behaviorism was responsible for all actions, he later created a “superstitious pigeon.” He delivered food to the pigeon at fixed time intervals (every 15 seconds) regardless of its behavior and observed what the pigeon did. He found that the pigeon’s actions would change depending on what it had been doing in the moments before the food was dispensed, even though those actions had nothing to do with the dispensing of food. In this way, he discerned that the pigeon had fabricated a causal relationship between its actions and the presentation of reward. It was this development of “superstition” that led Skinner to believe all behavior could be explained as a learned reaction to specific consequences.

In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target, or desired, behavior, the process of shaping involves the reinforcement of successive approximations of the target behavior. Behavioral approximations are behaviors that, over time, grow increasingly closer to the actual desired response.

Skinner believed that all behavior is predetermined by past and present events in the objective world. He did not include room in his research for ideas such as free will or individual choice; instead, he posited that all behavior could be explained using learned, physical aspects of the world, including life history and evolution. His work remains extremely influential in the fields of psychology, behaviorism, and education.

Shaping

Shaping is a method of operant conditioning by which successive approximations of a target behavior are reinforced.

Learning Objectives

Describe how shaping is used to modify behavior

Key Takeaways

Key Points

  • B. F. Skinner used shaping —a method of training by which successive approximations toward a target behavior are reinforced—to test his theories of behavioral psychology.
  • Shaping involves a calculated reinforcement of a “target behavior”: it uses operant conditioning principles to train a subject by rewarding proper behavior and discouraging improper behavior.
  • The method requires that the subject perform behaviors that at first merely resemble the target behavior; through reinforcement, these behaviors are gradually changed or “shaped” to encourage the target behavior itself.
  • Skinner’s early experiments in operant conditioning involved the shaping of rats’ behavior so they learned to press a lever and receive a food reward.
  • Shaping is commonly used to train animals, such as dogs, to perform difficult tasks; it is also a useful learning tool for modifying human behavior.

Key Terms

  • successive approximation: An increasingly accurate estimate of a response desired by a trainer.
  • paradigm: An example serving as a model or pattern; a template, as for an experiment.
  • shaping: A method of positive reinforcement of behavior patterns in operant conditioning.

In his operant-conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target, or desired, behavior, the process of shaping involves the reinforcement of successive approximations of the target behavior. The method requires that the subject perform behaviors that at first merely resemble the target behavior; through reinforcement, these behaviors are gradually changed, or shaped, to encourage the performance of the target behavior itself. Shaping is useful because it is often unlikely that an organism will display anything but the simplest of behaviors spontaneously. It is a very useful tool for training animals, such as dogs, to perform difficult tasks.

Dog show: Dog training often uses the shaping method of operant conditioning.

How Shaping Works

In shaping, behaviors are broken down into many small, achievable steps. To test this method, B. F. Skinner performed shaping experiments on rats, which he placed in an apparatus (known as a Skinner box) that monitored their behaviors. The target behavior for the rat was to press a lever that would release food. Initially, rewards are given for even crude approximations of the target behavior—in other words, even taking a step in the right direction. Then, the trainer rewards a behavior that is one step closer, or one successive approximation nearer, to the target behavior. For example, Skinner would reward the rat for taking a step toward the lever, for standing on its hind legs, and for touching the lever—all of which were successive approximations toward the target behavior of pressing the lever.

As the subject moves through each behavior trial, rewards for old, less approximate behaviors are discontinued in order to encourage progress toward the desired behavior. For example, once the rat had touched the lever, Skinner might stop rewarding it for simply taking a step toward the lever. In Skinner’s experiment, each reward led the rat closer to the target behavior, finally culminating in the rat pressing the lever and receiving food. In this way, shaping uses operant-conditioning principles to train a subject by rewarding proper behavior and discouraging improper behavior.

In summary, the process of shaping includes the following steps:

  • Reinforce any response that resembles the target behavior.
  • Then reinforce the response that more closely resembles the target behavior. You will no longer reinforce the previously reinforced response.
  • Next, begin to reinforce the response that even more closely resembles the target behavior. Continue to reinforce closer and closer approximations of the target behavior.
  • Finally, only reinforce the target behavior.
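A minimal Python sketch of the steps listed above, assuming a simple numeric "behavior" and an arbitrary tightening rule (both invented for illustration): each reinforced response narrows the band of responses that will be reinforced next, until only the target behavior earns reinforcement.

```python
import random

def shape(respond, target, tolerance=8.0, shrink=0.5, final_tolerance=0.5):
    """Reinforce successive approximations: any response within `tolerance`
    of the target is reinforced, and each reinforcement tightens the
    criterion until only responses at (or very near) the target qualify.
    `respond(tolerance)` stands in for the learner's next behavior."""
    while tolerance > final_tolerance:
        response = respond(tolerance)
        if abs(response - target) <= tolerance:   # close enough: reinforce
            tolerance *= shrink                   # now demand a closer approximation
    return tolerance                              # only the target behavior is reinforced

# A toy learner whose responses scatter around whatever is currently being reinforced.
learner = lambda tol: random.uniform(100 - 2 * tol, 100 + 2 * tol)
shape(learner, target=100)
```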

Applications of Shaping

This process has been replicated with other animals—including humans—and is now common practice in many training and teaching methods. It is commonly used to train dogs to follow verbal commands or become house-broken: while puppies can rarely perform the target behavior automatically, they can be shaped toward this behavior by successively rewarding behaviors that come close.

Shaping is also a useful technique in human learning. For example, if a father wants his daughter to learn to clean her room, he can use shaping to help her master steps toward the goal. First, she cleans up one toy and is rewarded. Second, she cleans up five toys; then chooses whether to pick up ten toys or put her books and clothes away; then cleans up everything except two toys. Through a series of rewards, she finally learns to clean her entire room.

Reinforcement and Punishment

Reinforcement and punishment are principles of operant conditioning that increase or decrease the likelihood of a behavior.

Learning Objectives

Differentiate among primary, secondary, conditioned, and unconditioned reinforcers

Key Takeaways

Key Points

  • “Reinforcement” refers to any consequence that increases the likelihood of a particular behavioral response; “punishment” refers to a consequence that decreases the likelihood of this response.
  • Both reinforcement and punishment can be positive or negative. In operant conditioning, positive means you are adding something and negative means you are taking something away.
  • Reinforcers can be either primary (innately reinforcing) or secondary (reinforcing only through a learned association with a primary reinforcer).
  • Primary—or unconditioned—reinforcers, such as water, food, sleep, shelter, sex, touch, and pleasure, have innate reinforcing qualities.
  • Secondary—or conditioned—reinforcers (such as money) have no inherent value until they are linked or paired with a primary reinforcer.

Key Terms

  • latency: The delay between a stimulus and the response it triggers in an organism.

Reinforcement and punishment are principles that are used in operant conditioning. Reinforcement means you are increasing a behavior: it is any consequence or outcome that increases the likelihood of a particular behavioral response (and that therefore reinforces the behavior). The strengthening effect on the behavior can manifest in multiple ways, including higher frequency, longer duration, greater magnitude, and shorter latency of response. Punishment means you are decreasing a behavior: it is any consequence or outcome that decreases the likelihood of a behavioral response.

Extinction, in operant conditioning, refers to the disappearance of a previously reinforced behavior. It occurs at some point after reinforcement stops; the speed at which this happens depends on the reinforcement schedule, which is discussed in more detail in another section.

Positive and Negative Reinforcement and Punishment

Both reinforcement and punishment can be positive or negative. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something and negative means you are taking something away. All of these methods can manipulate the behavior of a subject, but each works in a unique fashion.

Operant conditioning: Whether you are reinforcing or punishing a behavior, “positive” always means you are adding a stimulus (not necessarily a good one), and “negative” always means you are removing a stimulus (not necessarily a bad one). Similarly, reinforcement always means you are increasing (or maintaining) the level of a behavior, and punishment always means you are decreasing the level of a behavior.

  • Positive reinforcers add a wanted or pleasant stimulus to increase or maintain the frequency of a behavior. For example, a child cleans her room and is rewarded with a cookie.
  • Negative reinforcers remove an aversive or unpleasant stimulus to increase or maintain the frequency of a behavior. For example, a child cleans her room and is rewarded by not having to wash the dishes that night.
  • Positive punishments add an aversive stimulus to decrease a behavior or response. For example, a child refuses to clean her room and so her parents make her wash the dishes for a week.
  • Negative punishments remove a pleasant stimulus to decrease a behavior or response. For example, a child refuses to clean her room and so her parents refuse to let her play with her friend that afternoon.

Primary and Secondary Reinforcers

The stimulus used to reinforce a certain behavior can be either primary or secondary. A primary reinforcer, also called an unconditioned reinforcer, is a stimulus that has innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, touch, and pleasure are all examples of primary reinforcers: organisms do not lose their drive for these things. Some primary reinforcers, such as drugs and alcohol, merely mimic the effects of other reinforcers. For most people, jumping into a cool lake on a very hot day is innately reinforcing: the water cools the person off (a physical need) and provides pleasure.

A secondary reinforcer, also called a conditioned reinforcer, has no inherent value and only has reinforcing qualities when linked or paired with a primary reinforcer. Before pairing, the secondary reinforcer has no meaningful effect on a subject. Money is one of the best examples of a secondary reinforcer: it is only worth something because you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers.

Schedules of Reinforcement

Reinforcement schedules determine how and when a behavior will be followed by a reinforcer.

Learning Objectives

Compare and contrast different types of reinforcement schedules

Key Takeaways

Key Points

  • A reinforcement schedule is a tool in operant conditioning that allows the trainer to control the timing and frequency of reinforcement in order to elicit a target behavior.
  • Continuous schedules reward a behavior after every performance of the desired behavior; intermittent (or partial) schedules only reward the behavior after certain ratios or intervals of responses.
  • Intermittent schedules can be either fixed (where reinforcement occurs after a set amount of time or responses) or variable (where reinforcement occurs after a varied and unpredictable amount of time or responses).
  • Intermittent schedules are also described as either interval (based on the time between reinforcements) or ratio (based on the number of responses).
  • Different schedules (fixed-interval, variable-interval, fixed-ratio, and variable-ratio) have different advantages and respond differently to extinction.
  • Compound reinforcement schedules combine two or more simple schedules, using the same reinforcer and focusing on the same target behavior.

Key Terms

  • extinction: When a behavior ceases because it is no longer reinforced.
  • interval: A period of time.
  • ratio: A number representing a comparison between two things.

A schedule of reinforcement is a tactic used in operant conditioning that influences how an operant response is learned and maintained. Each type of schedule imposes a rule or program that determines how and when a desired behavior is followed by reinforcement. Behaviors are encouraged through the use of reinforcers, discouraged through the use of punishments, and rendered extinct by the complete removal of reinforcement. Schedules vary from simple ratio- and interval-based schedules to more complicated compound schedules that combine two or more simple schedules to manipulate behavior.

Continuous vs. Intermittent Schedules

Continuous schedules reward a behavior after every performance of the desired behavior. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in teaching a new behavior. Simple intermittent (sometimes referred to as partial) schedules, on the other hand, only reward the behavior after certain ratios or intervals of responses.

Types of Intermittent Schedules

There are several different types of intermittent reinforcement schedules. These schedules are described as either fixed or variable and as either interval or ratio.

Fixed vs. Variable, Ratio vs. Interval

Fixed refers to when the number of responses between reinforcements, or the amount of time between reinforcements, is set and unchanging. Variable refers to when the number of responses or amount of time between reinforcements varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements. Simple intermittent schedules are a combination of these terms, creating the following four types of schedules:

  • A fixed-interval schedule is when behavior is rewarded after a set amount of time. This type of schedule exists in payment systems when someone is paid hourly: no matter how much work that person does in one hour (behavior), they will be paid the same amount (reinforcement).
  • With a variable-interval schedule, the subject gets the reinforcement based on varying and unpredictable amounts of time. People who like to fish experience this type of reinforcement schedule: on average, in the same location, you are likely to catch about the same number of fish in a given time period. However, you do not know exactly when those catches will occur (reinforcement) within the time period spent fishing (behavior).
  • With a fixed-ratio schedule, there are a set number of responses that must occur before the behavior is rewarded. This can be seen in payment for work such as fruit picking: pickers are paid a certain amount (reinforcement) based on the amount they pick (behavior), which encourages them to pick faster in order to make more money. In another example, Carla earns a commission for every pair of glasses she sells at an eyeglass store. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation: fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval can lead to a higher quality of output.
  • In a variable-ratio schedule, the number of responses needed for a reward varies. This is the most powerful type of intermittent reinforcement schedule. In humans, this type of schedule is used by casinos to attract gamblers: a slot machine pays out an average win ratio—say five to one—but does not guarantee that every fifth bet (behavior) will be rewarded (reinforcement) with a win.

All of these schedules have different advantages. In general, ratio schedules elicit higher response rates than interval schedules because reinforcement depends on the number of responses rather than on the passage of time. For example, if you are a factory worker who gets paid per item that you manufacture, you will be motivated to manufacture those items quickly and consistently. Variable schedules are less predictable, so they tend to resist extinction and encourage continued responding. Gamblers and fishermen alike can understand the feeling that one more pull on the slot-machine lever, or one more hour on the lake, will change their luck and elicit their respective rewards. Thus, they continue to gamble and fish, regardless of previously unsuccessful feedback.

Simple reinforcement-schedule responses: The four reinforcement schedules yield different response patterns. The variable-ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambling). A fixed-ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass sales). The variable-interval schedule is unpredictable and produces a moderate, steady response rate (e.g., fishing). The fixed-interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., hourly employment).

Extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. Among the reinforcement schedules, variable-ratio is the most resistant to extinction, while fixed-interval is the easiest to extinguish.
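To make the ratio schedules concrete, here is a small Python sketch (ours, with invented parameters) that delivers reinforcement under a variable-ratio and a fixed-ratio schedule. Over many responses the two pay off about equally often, but only the variable-ratio schedule is unpredictable from response to response, which is the property linked above to persistence and resistance to extinction.

```python
import random

def variable_ratio(mean_ratio=5, seed=1):
    """Yield True when a response is reinforced under a variable-ratio
    schedule whose required response count averages `mean_ratio`."""
    rng = random.Random(seed)
    required, count = rng.randint(1, 2 * mean_ratio - 1), 0
    while True:
        count += 1
        if count >= required:
            yield True
            required, count = rng.randint(1, 2 * mean_ratio - 1), 0
        else:
            yield False

def fixed_ratio(ratio=5):
    """Yield True on every `ratio`-th response (fixed-ratio schedule)."""
    count = 0
    while True:
        count += 1
        if count == ratio:
            yield True
            count = 0
        else:
            yield False

# 100 responses on each schedule deliver roughly the same number of reinforcers,
# but the variable-ratio deliveries are unpredictable from trial to trial.
vr, fr = variable_ratio(), fixed_ratio()
print(sum(next(vr) for _ in range(100)), sum(next(fr) for _ in range(100)))
```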

Simple vs. Compound Schedules

All of the examples described above are referred to as simple schedules. Compound schedules combine at least two simple schedules and use the same reinforcer for the same behavior. Compound schedules are often seen in the workplace: for example, if you are paid at an hourly rate (fixed-interval) but also have an incentive to receive a small commission for certain sales (fixed-ratio), you are being reinforced by a compound schedule. Additionally, if there is an end-of-year bonus given to only three employees based on a lottery system, you’d be motivated by a variable schedule.

There are many possibilities for compound schedules: for example, superimposed schedules use at least two simple schedules simultaneously. Concurrent schedules, on the other hand, provide two possible simple schedules simultaneously, but allow the participant to respond on either schedule at will. All combinations and kinds of reinforcement schedules are intended to elicit a specific target behavior.
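The workplace example above (an hourly wage plus a per-sale commission) can be expressed as a short Python sketch; the dollar figures and function name are invented for illustration.

```python
def weekly_pay(hours, sales, hourly_rate=15.0, commission=5.0):
    """Compound schedule: a time-based (interval) wage component plus a
    per-sale (fixed-ratio) commission component for the same work behavior."""
    return hours * hourly_rate + sales * commission

print(weekly_pay(hours=40, sales=12))   # 40*15 + 12*5 = 660.0
```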