PSY304 Test 2 Review: Reinforcement
Psychology 304: Learning
James T. Todd, Ph.D., Eastern Michigan University

Operant Conditioning (also Instrumental Conditioning, Trial and Error, and Type-R)
1. The reflex (S-R) is defined by association.
2. The operant is defined by its consequences.
3. A response that occurs because it was reinforced in the past is an operant.

The Operant
R —> S^R+ (Response —> Consequence)

"Law of Effect"
The future probability of a response depends on the effects of past occurrences. First formally investigated by Edward Thorndike. His 1911 book on this, Animal Intelligence, can be found free online. It is based on his 1898 monograph of the same title.

The "Three-Term Contingency"
S^D —> R —> S^R+ (Discriminative Stimulus —> Response —> Consequence)
• Discriminative Stimulus: Signals the kind of consequence.
• Response: What the organism does.
• Consequence: What happens after the organism emits the response. (We say that operant responses are "emitted.")
Based on B. F. Skinner's work, first compiled in The Behavior of Organisms: An Experimental Analysis (1938), then expanded throughout his life. Later, we will learn about "schedules of reinforcement," also based on Skinner's work in the 1950s.

Kinds of Consequences
• Positive reinforcer: increases the probability of the response by its occurrence.
• Negative reinforcer: increases the probability of the response by its removal.
• Positive punisher: decreases the probability of the response by its occurrence.
• Negative punisher: decreases the probability of the response by its removal.

Kinds of Discriminative Stimuli
• S^D or S+: Generally indicates the response will be reinforced.
• S∆ (S-delta) or S-: Generally indicates the response will be punished or not reinforced.
Effects of Single Reinforcement
One powerful reinforcer can produce hundreds of responses in extinction.

Superstitious Behavior
Superstitious behavior: Behavior that is controlled by accidental pairings of consequences and responses. Usually we think of this as behavior accidentally reinforced, but it can also be behavior accidentally punished or extinguished. Superstitious behavior is generally unstable, because the reinforcers or punishers aren't consistently paired with the responding.

Operant Basics
Extinction: Removing the reinforcing consequence to decrease the probability of the response.
[Figure: response strength declining toward zero across extinction sessions]

Real Extinction
There is usually a temporary increase in rate at the start of extinction, and occasionally increases throughout extinction. These are called "extinction bursts."
[Figure: extinction curve with several labeled bursts]

Response Shaping
Differential Reinforcement of Successive Approximations. Sometimes called "hand shaping." Now we usually do "clicker training." This is associated with Karen Pryor in the 1960s, but was first written about by Skinner in 1950. You pair a click with food using Pavlovian conditioning. Then you use the click to gradually shape the behavior you want by approximating it in steps.

Some additional reinforcement terms
• Primary reinforcer: A reinforcer that acts without previous experience with it (e.g., food for a hungry organism).
• Conditioned/secondary reinforcer: A reinforcer that acts due to previous learning or experience (e.g., money).
• Generalized reinforcer: A reinforcer that acts on a broad range of responses under a variety of conditions.
You get what the contingency is on.

Reinforcing Variability
Barry Schwartz argued that contingent reinforcement produces "behavioral stereotypy": behavior that is the same time after time and hard to change.
Schwartz suggested using more verbal instruction and fewer contingent rewards in teaching to avoid behavioral stereotypy and increase creativity. But Schwartz required that the pigeons' response patterns not only be different, but also reach the goal, thereby punishing too much variability. Thus, he didn't get variability.

Schedules of Reinforcement
Continuous reinforcement (CRF): Each response is reinforced.
• CRF produces rapid acquisition and low resistance to extinction.
• The rate of responding is controlled by the time required to consume the reinforcer.
Fixed Ratio (FR): Responses are reinforced after a fixed number are emitted.
• FR produces a "pause and run" pattern of responding.
Fixed Interval (FI): The first response after a specified interval is reinforced.
• FI produces a "scalloped," or accelerated, pattern. As the time of reinforcement approaches, responding becomes faster.
• FI responding usually begins about half-way through the interval.
Variable Ratio (VR): The number of responses required to produce a reinforcer varies according to a mathematical distribution, usually random.
• VR produces a high, steady rate of responding. Slot machines are on VR.
Variable Interval (VI): The first response after a variable interval is reinforced. The distribution of intervals is usually random, but can follow a range of mathematical functions.
• VI produces a moderate, steady rate of responding.
• VI is a useful baseline for studies of other effects because changes in response rate need not affect reinforcement rate very strongly.
[Figure: cumulative records for the basic schedules; from ProProfs flashcards]

Some Other Schedules
Fixed Time (FT): A reinforcer is delivered entirely on the basis of time, regardless of the activity of the organism.
Variable Time (VT): A reinforcer is delivered entirely on the basis of time, but the time varies according to a mathematical distribution.
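To make the ratio/interval distinction concrete, here is a minimal Python sketch of how FR and FI schedules decide when a reinforcer has been earned. The function names and example numbers are illustrative, not from the lecture.

```python
def reinforcers_fr(n_responses, ratio):
    # FR: every `ratio`-th response is reinforced, so the count of
    # reinforcers depends only on how many responses were emitted.
    return n_responses // ratio

def reinforcers_fi(response_times, interval):
    # FI: the first response after the interval elapses is reinforced;
    # the interval timer restarts when the reinforcer is delivered.
    earned, next_available = 0, interval
    for t in sorted(response_times):
        if t >= next_available:
            earned += 1
            next_available = t + interval
    return earned

# FR 20: 100 responses earn 5 reinforcers, no matter how long they took.
print(reinforcers_fr(100, 20))                      # 5
# FI 20 s: only responses after each 20 s interval count.
print(reinforcers_fi([5, 12, 21, 33, 41, 65], 20))  # 3
```

Note the asymmetry the code makes explicit: on FR, responding faster produces reinforcers faster; on FI, extra responses before the interval elapses are simply wasted, which is why FI supports the slower, scalloped pattern.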
Differential Reinforcement of Other Behavior (DRO): A reinforcer is delivered after a specified interval without a specific "target" response.
• DRO is used to eliminate a response without punishment.
Differential Reinforcement of Low Rates (DRL): A reinforcer is delivered if a response is emitted after a specified interval has elapsed.
• DRL is used to reduce the rate of a response, but not eliminate it.
Differential Reinforcement of High Rates (DRH): Reinforcement is programmed to reinforce rates above a certain value.
Progressive Ratio (PR): The value of the ratio changes systematically in one direction, up or down.
• PR schedules are used to test the motivation to emit responses: how much work the organism will do for the food.

Some Other Schedule Concepts
Limited Hold (LH): A limited period of time during which a reinforcer is available on interval schedules.
• An LH is added to increase the rate of responding and engagement with interval schedules.
• You would generally write something like "FI 20 sec (LH 5)" if a reinforcer was available for only five seconds after the main 20-second FI interval elapsed.
Adjusting Schedule: Any schedule in which the required value changes. PR is a type of adjusting schedule.
Post-Reinforcement Pause (PRP): The amount of time the organism pauses after a reinforcer is delivered.
• Usually a consideration in fixed schedules.
Local Rate of Responding: The response rate in a particular part of a schedule performance, such as the rate during the run in an FR schedule.

Matching Law
• Matching Law: Behavior is distributed among available alternatives in proportion to the relative amounts of obtained reinforcement on the alternatives.
• This means that if you get 1/3 of your reinforcement from lever A and 2/3 from lever B, you will devote 1/3 of your responding to A and 2/3 to B.
[Figure: pie chart of responding: A 33%, B 66%]
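The 1/3 vs. 2/3 lever example can be computed directly. A minimal Python sketch (the function name and reinforcement rates are my own, chosen to match the example):

```python
def matching_allocation(r_a, r_b):
    # Strict matching: each alternative gets the same share of behavior
    # as its share of the total obtained reinforcement.
    total = r_a + r_b
    return r_a / total, r_b / total

# Say lever A yields 20 reinforcers/hour and lever B yields 40.
share_a, share_b = matching_allocation(20, 40)
print(share_a, share_b)  # 1/3 of responding to A, 2/3 to B
```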
Generalized Matching Law
The Generalized Matching Law solves for the proportion of behavior accounted for by its reinforcement relative to the reinforcement for everything else:

B_target = b * [ R_target / (R_target + R_everything_else) ]^a

where b is bias and a is sensitivity.

Matching Law Facts
• If you change the reinforcement for a behavior in constant steps, the responding will change "hyperbolically": the behavior will change quickly at first, then the effects of further changes in reinforcement will taper off.
• Behavior matches various measures of reinforcement, including its duration, magnitude, quality, rate, probability, and delay. The total "hedonic value" of a consequence equals:

(Rate x Duration x Magnitude x Quality x Probability) / Delay

• If you reinforce two alternatives equally often, you sometimes see a "bias": a preference for one alternative over the other not accounted for by the schedules.
• Sometimes the change in the response is less than the change in the reinforcer. This is called "undermatching." It is due to a sensitivity (a) of less than 1.0.
• Another sign of undermatching is the organism devoting more behavior to the low-probability alternative than true matching predicts. You might have a 60%-40% ratio of reinforcement on two keys, yet your organism distributes its behavior 55%-45%.

Rachlin & Green, 1972
• Rachlin & Green confirmed that delay of reinforcement is accounted for by matching.
• Also: once the reinforcement for an alternative is great enough, you will commit to that behavior, giving up the chance to do the other.

Social Traps
Social Trap: A situation that leads to a small short-term gain, but a long-term relative loss.
Quick calculation:
• Unless you have been taught to avoid the situation, you will reliably choose the reinforcer with the greatest hedonic value.
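The bias and sensitivity parameters, and the 60/40-reinforcement-but-55/45-behavior undermatching example, can be illustrated with the ratio form of the generalized matching law (a standard alternative statement of the same law; the function name and numbers below are my own):

```python
def matching_proportion(r_target, r_other, bias=1.0, sensitivity=1.0):
    # Generalized matching law in ratio form:
    #   B_t / B_o = bias * (R_t / R_o) ** sensitivity
    # then converted to the proportion of behavior on the target.
    ratio = bias * (r_target / r_other) ** sensitivity
    return ratio / (1 + ratio)

# Strict matching (b = 1, a = 1): 60% of reinforcement -> 60% of behavior.
print(matching_proportion(60, 40))                   # 0.6
# Undermatching (a < 1) pulls allocation toward indifference:
print(matching_proportion(60, 40, sensitivity=0.5))  # about 0.55
```

With sensitivity a = 0.5, the 60/40 reinforcement split predicts roughly a 55/45 behavior split, which is exactly the undermatching pattern described in the bullet above.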
• Always think: "What is the answer to Reinforcement/Delay?"

Transaction Costs
• Transaction cost: In economics, the cost of making an economic exchange.
• For us: The response cost required to shift behavior between alternatives.
• Developed by John Commons in 1933.

Delay Discounting
• AKA temporal discounting: The reduction in the effective value of a consequence due to the passage of time.
• Delayed reinforcers are usually worth less than immediate ones.
• Impulsivity: The degree to which a response is sensitive to temporal delays in reinforcement.
• High impulsivity means you are highly affected by delays: you are impatient.
• Low impulsivity means you are less affected by delays: you are patient.
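The "Reinforcement/Delay" calculation is commonly modeled with the hyperbolic discounting equation V = A / (1 + kD), where k indexes impulsivity. A minimal Python sketch (the k values and amounts are illustrative, not from the lecture):

```python
def discounted_value(amount, delay, k):
    # Hyperbolic discounting: V = A / (1 + k * D).
    # Larger k means steeper discounting, i.e., higher impulsivity.
    return amount / (1 + k * delay)

# An impulsive chooser (k = 1.0) prefers 5 units now over 10 units in 4 s:
print(discounted_value(5, 0, k=1.0))    # 5.0
print(discounted_value(10, 4, k=1.0))   # 2.0
# A patient chooser (k = 0.1) still values the delayed reward more:
print(discounted_value(10, 4, k=0.1))   # about 7.14
```

The same function also illustrates the Rachlin & Green commitment result: as delays to both outcomes grow, their discounted values converge, so preference can reverse depending on when the choice is made.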