Study Guide
Psychology of Learning (PSY211)
Exam #2
Operant and Instrumental Conditioning: Reinforcement
Key terms
Operant (emitted) behavior: behavior that is emitted by the organism and becomes
stronger or weaker depending on its consequences.
Reflexive (elicited) behavior: a stimulus evokes an innate, often reflexive, response.
Trial and error/success: Thorndike’s Puzzle Box, Skinner’s Operant Chamber, Tolman’s
Maze.
Law of effect: behaviors followed by satisfying consequences are strengthened;
behaviors followed by unpleasant consequences are weakened.
Positive reinforcement: the presence of a stimulus increases the likelihood of the
preceding response (e.g., food, money, praise, drugs, electrical stimulation of “pleasure
centers” in the brain). Sometimes called “reward.” Adding something positive.
Negative reinforcement: the removal of a stimulus increases the likelihood of the
preceding response (e.g., remove hand from a warm stove, improve grades to lift
restriction, work hard not to get fired). Taking away something negative.
Primary reinforcer: naturally or innately reinforcing stimuli (e.g., food, water, sex).
Secondary (conditioned) reinforcer: reinforcers that are dependent on their association
with other reinforcers (e.g., praise, recognition, money).
Generalized reinforcer: secondary reinforcers that have been paired with a wide variety
of primary reinforcers (e.g., money, praise).
Superstitious conditioning: learning about rewards or negative reinforcers from
coincidental occurrences; incorrect learning after accidental rewards. For example,
walking under a ladder = bad luck, or clapping faster = closer to the goal/prize.
Successive approximation (shaping): reinforcing successively closer approximations of a
target behavior.
Chaining: performing behaviors in a sequence (e.g., ordering take-out).
 Forward Chaining: Train first-to-last
 Backward Chaining: Train last-to-first
Acquisition: the initial stage of conditioning; a gradual increase in responding
when a reinforcing stimulus follows the behavior (e.g., toilet training, athletic skills,
stupid pet tricks).
 Successive Approximation (Shaping)
Extinction: removing the reinforcer in order to stop a behavior. For example, a time out
is extinction.
Spontaneous recovery: the reappearance, after a pause, of an extinguished conditioned
response.
Resurgence: the reappearance, during extinction, of previously reinforced behavior; an
animal may run through its entire repertoire of previously learned stunts.
Primary drives: innate drives, such as hunger, thirst, and sexual desire that arise from
basic biological needs.
Secondary drives: acquired through learning; affiliation, social, achievement,
aggression, power. For example, money, grades, friends, intimacy, etc.
Escape conditioning: training an organism to remove or terminate an unpleasant
stimulus. Their behavior causes an unpleasant event to stop and so they continue that
behavior. They make the correct new response to stop delivery of the undesired stimulus.
Avoidance conditioning: increase in behavior that allows one to avoid an aversive
stimulus.
Key issues/distinctions/questions
What is the one key condition for effective reinforcement?
Behavior must have a consequence.
Identify the sequence of events that leads to reinforcement or punishment.
Why is classical conditioning termed S-S and operant conditioning termed R-S?
Classical conditioning is the pairing of two stimuli (light – air puff, bell – food).
Operant conditioning is based upon a rewarding or punishing stimulus that follows
a response (roll over – doggie treat, work late – extra pay, studying – good grades).
Generate examples of the following:
• Positive reinforcement: a father gives candy to his daughter when she picks up her
toys. If the frequency of picking up toys increases or stays the same, the candy is
a positive reinforcer.
• Negative reinforcement: turning off distracting music when trying to work. If
the work increases when the music is turned off, turning off the music is the
negative reinforcer.
• Primary reinforcer: a stimulus that does not require pairing to function as a
reinforcer and most likely obtained this function through evolution and its
role in the species’ survival. Examples include sleep, food, air, water, and sex.
• Secondary reinforcer: a stimulus or situation that has acquired its function as a
reinforcer after pairing with a stimulus that already functions as a reinforcer. For
example, the sound of a clicker, as used in clicker training. The sound of the clicker
has been associated with praise or treats, and subsequently the sound of the clicker
may function as a reinforcer. As with primary reinforcers, an organism can
experience satiation and deprivation with secondary reinforcers.
• Positive punishment: a mother yells at a child who runs into the street. If the child
stops running into the street, the yelling is a positive punisher.
• Negative punishment: a teenager comes home an hour after curfew and the
parents take away the teen’s cell phone for two days. If the frequency of coming
home after curfew decreases, the removal of the phone is negative punishment.
• Shaping: you want a sea lion to balance on a ball. First you reward the sea lion for
going near the ball, then you reward the sea lion for touching the ball, and finally
you reward the sea lion for getting on the ball.
• Chaining: train a rat to pull a string that releases a marble, have it pick up the
marble, carry the marble to the tube, then have it drop it in the tube.
• Superstitious conditioning: the organism is rewarded (or a punishment is removed)
while performing a response, and even though the response and
reward aren't related, the subject associates the two.
Example: you hurt your thumb and keep swearing until the pain goes away. The
pain eventually goes away, you assume it was because of your swearing, and
consequently you swear every time you're hurt to relieve pain. The swearing actually
did absolutely nothing, but you 'superstitiously' associate the two.
• Extinction: Pavlov stopped giving food to the dogs when he rang the bell, so the
dogs stopped salivating to the bell.
• Spontaneous recovery: after a long time of not salivating to the bell, the dog
suddenly starts salivating to the bell again.
• Escape conditioning: a rat presses a lever to turn off a shock that has already started.
• Avoidance conditioning: a rat presses the lever when a warning light comes on,
preventing the shock from being delivered at all.
What effect do the following have on the acquisition of operant behaviors (i.e., on
the speed of conditioning or the strength of the response)?
• Amount of reward: many small rewards better than a few large ones.
• Type of reward: chocolate better than raisins.
• Delay of reward: the longer you delay, the less effective it becomes; immediate
is best.
• Conditioning somatic (voluntary) behavior versus autonomic (involuntary)
behaviors: somatic behavior is easier to condition than autonomic behavior.
• Deprivation level: learning is faster and stronger when the learner is deprived of
the reward.
• Competing rewards: conditioning is slow and weak if other behaviors are also
being rewarded (focus on one at a time).
• Awareness of reward/behavior contingency: not necessary for conditioning, but
awareness leads to faster conditioning.
What effect do the following have on the extinction of operant behaviors?
• Reinforcement variability: intermittent (variable) reinforcement makes the behavior
more resistant to extinction (the partial reinforcement effect).
• Stimulus variability: training across varied stimuli increases resistance to extinction.
• Response variability: reinforcing varied forms of the response increases resistance
to extinction.
How do the following theories of reinforcement differ? What are the basic problems
with each theory?
• Drive reduction: behavior is driven by a desire to lessen drives resulting from
needs that disrupt homeostasis. Reinforcers: primary (food, water, sex), secondary
(success, popularity).
o Drive: A motivational force. Tension from unfulfilled needs or desires
 Primary Drives (e.g. hunger, thirst)
 Secondary Drives (e.g., success, popularity)
o Reinforcer: Any stimulus that reduces drive by fulfilling the needs and
desires (e.g., food, water, money)
o Difficulties with the theory:
 Some reinforcers do not reduce drives (electrical stimulation of the
brain, copulation without ejaculation).
 Some motivations do not create states of tension that need to be
reduced (exploratory behavior).
• Relative value (Premack principle):
o Reinforcers viewed as behaviors (e.g., food smell vs. chewing behavior)
o Relative value: Some behaviors are more probable (more preferred)
than others (e.g., partying vs. studying)
o Premack Principle: High probability (preferred) behavior reinforces
low probability (non-preferred) behavior
o Problems with theory:
 How to explain strong secondary reinforcers (e.g., why is verbal
praise such a powerful reward?)
 Sometimes low probability behavior reinforces high probability
behavior if the less likely behavior has been prevented
(e.g., deprivation of study time)
• Response deprivation (Timberlake and Allison): the relative value of responses
depends on relative deprivation. Behaviors that are not allowed to occur will
reinforce other, less deprived, behaviors (e.g., prohibition in the 1920s made
drinking booze a much stronger reward). A small sketch of this idea follows.
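As an illustration, here is a minimal Python sketch of the response-deprivation idea.
The activities, baseline numbers, and function name are hypothetical, and the inequality
used is one common formalization of Timberlake and Allison's hypothesis, not something
stated in this guide.

def schedule_is_reinforcing(instr_required, contingent_allowed,
                            instr_baseline, contingent_baseline):
    # A schedule ("perform I units of the instrumental behavior to earn
    # C units of the contingent behavior") reinforces the instrumental
    # behavior when it forces the contingent behavior below the
    # proportion it had under free access (its baseline).
    return (instr_required / contingent_allowed) > (instr_baseline / contingent_baseline)

# Hypothetical free-access baselines (minutes per hour): studying 5, partying 30.
# Schedule: 10 minutes of studying earn 5 minutes of partying.
print(schedule_is_reinforcing(10, 5, 5, 30))  # True: partying is now deprived,
                                              # so access to it reinforces studying

Under this rule even a low-probability behavior can serve as a reinforcer once the
schedule deprives it, which is where response deprivation improves on the Premack
principle.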
What are the two processes in the two-process theory of avoidance? Why is there a
problem with using these two processes to explain avoidance?
The two processes are classical conditioning (the warning signal becomes a CS for
fear) and operant conditioning (the avoidance response is negatively reinforced
because it reduces that fear). The problem: with extended training, animals keep
avoiding even though they no longer show fear of the CS, and the fear should
extinguish because the aversive event never occurs, yet avoidance persists.
Operant and Instrumental Conditioning: Punishment
Key terms
Positive punishment: the presence of a stimulus (usually aversive such as slap, scolding,
or a dirty look) decreases (suppresses) the likelihood of a preceding response.
Negative punishment: the removal of a stimulus (usually something pleasant such as TV
privileges or a desirable object) decreases (suppresses) the likelihood of a preceding
response. When the stimulus that is removed is a reinforcer, we call this “extinction.”
Displaced aggression: aggression redirected at a substitute target. People who are
punished at work might sabotage the workplace; people who are punished at school
might vandalize the school.
Elicited aggression: if you put two people in an unsafe, aversive environment, they
will tend to become aggressive.
Learned helplessness: the failure to escape an aversive stimulus following exposure
to an inescapable aversive stimulus.
Differential Reinforcement of Low Rate (DRL): a behavior is reinforced only if it
occurs no more than a specific number of times in a given period.
Differential Reinforcement of Zero Responding (DRO): reinforcement is contingent
on the complete absence of a behavior for a period of time.
Differential Reinforcement of Incompatible Behavior (DRI): a form of differential
reinforcement in which a behavior that is incompatible with an unwanted behavior is
systematically reinforced.
Differential Reinforcement of Alternative Behavior (DRA): a form of differential
reinforcement in which a behavior that is different from an undesired behavior is
systematically reinforced.
Key issues/distinctions/questions
What are the three necessary characteristics for punishment?
 Behavior has a consequence (e.g., crime leads to prison, cheating leads to
dismissal)
 Behavior decreases in strength or frequency (e.g., crime declines,
cheating stops)
 Reduction in behavior is a result of its consequences (e.g., criminals go
straight because of prison, cheating stops because of dismissal)
How are punishment and negative reinforcement different?
Negative reinforcement increases a behavior by removing an aversive stimulus;
punishment decreases a behavior. Punishment itself comes in two forms:
 Positive: The behavior (response) leads to the onset of some aversive event
that suppresses future responses (e.g., shock, scolding, physical blows)
 Negative: The behavior (response) leads to the offset (removal) of some
pleasant event that suppresses future responses (e.g., removal of attention, a
desired toy, previous rewards)
How are negative punishment and extinction related?
They are related in the sense that both involve taking something away to
decrease behavior. Extinction takes away the reinforcer that maintains the behavior;
negative punishment takes away an appetitive reinforcer to decrease the behavior.
What effect do the following conditions have on punishment?
• R-S contingency: Dependency of punishing event on behavior (the response must
lead directly to the punishing event).
• R-S delay: The longer the delay between response and punisher, the less effective
the punishment (e.g., immediate reprimands are better than delayed reprimands).
• Intensity of punisher: Strong punishers work better than weak punishers.
• Progressive punishment: Punishment is less effective if weak punishers are
followed by progressively stronger punishers.
• Behaviors that are both reinforced and punished
o Behaviors that are both reinforced and punished become resistant to
punishment (e.g., children who get attention [reinforced] by being
punished for misbehaving become increasingly troublesome).
o Punishment works best on behavior (e.g., criminal activities) when
alternative behaviors (e.g., community service) are reinforced.
o When the motivation to engage in a behavior is strong (because the
reinforcement was strong) punishment is less effective.
• Presence of alternative behaviors: punishment works best on behavior when
alternative behaviors are reinforced.
• Behaviors that are strongly reinforced: when the motivation to engage in a
behavior is strong (because the reinforcement was strong), punishment is less
effective.
What are the three primary theories of punishment? Which theory is the most
limited?
 Disruption Theory: Punishment suppresses responding because it
leads to a disruption of ongoing activity (e.g., jumping, freezing). Can be
dismissed rather easily.
 Two-Process Theory:
 Punishment involves both classical and operant conditioning.
 Similar to the two process theory of avoidance.
 Stimuli associated with the punisher (e.g., lever, cookie jar)
become a CS for reactions to the punisher (e.g., the sight of the
lever or the cookie jar is associated with fear).
 We avoid the CS (e.g., lever, cookie jar) and thus decrease
responses to the stimulus (e.g., don’t press lever, don’t take
cookies)
 One-Process Theory: Only operant conditioning is involved in
punishment. Punishment suppresses behavior just as reinforcement
strengthens behavior (e.g., high preference behavior reinforces low
preference behavior; low preference behavior punishes high
preference behavior).
How do the one-process and two-process theories of punishment differ?
The one-process theory holds that only operant conditioning is involved in
punishment; the two-process theory involves both classical and operant conditioning.
What are six major problems with using punishment for behavioral control?
 Temporary Effects: The effects are not long lasting.
 Escape and Avoidance: We try to escape from or avoid aversive
stimuli (e.g., running away from home, lying to parents, escaping
from prison).
 Aggression: Aversive stimuli lead to aggression.
 Displaced Aggression (e.g., sabotage, vandalism).
 Elicited Aggression.
 Apathy: Punishment suppresses other behaviors.
 Fixation: Punishment limits the range of behaviors. Animals respond only
in “safe” ways and are unwilling to try new behaviors (e.g.,
“learned helplessness”).
 Progressive Punishment can go too far (e.g., spouse abuse).
 Imitation of the Punisher (e.g., successive generations of child abuse).
What are five alternatives to aversive control?
Prevention, extinction, differential reinforcement of zero responding (DRO),
differential reinforcement of low rates (DRL) of behavior, and reinforcement of
alternative behaviors.
Why has incarceration (imprisonment) been used as a form of punishment over the
years? Has incarceration been successful?
Operant and Instrumental Conditioning: Schedules
Key terms
Cumulative responses: over time, you look at responses as they accumulate on a
cumulative record. If the slope is steep, the response rate is fast.
Response rate: the number of responses per unit of time.
Continuous reinforcement: a correct response is reinforced every time it occurs.
Intermittent (partial) reinforcement: occurs for some responses but not all.
Ratio schedule: reinforcement is based on the number of responses (the ratio of
reinforced to non-reinforced responses).
Interval schedule: reinforcement is based on the time since the last reinforced response.
Fixed ratio (FR): the number of responses required for reinforcement is fixed.
Variable ratio (VR): the number of responses required for reinforcement varies.
Post-reinforcement pause: the pauses that follow reinforcement.
Run rate: the rate at which behavior occurs once it has resumed following
reinforcement.
Fixed interval (FI): the amount of time the animal must wait until the next response is
reinforced is fixed.
Variable interval (VI): the amount of time the animal must wait until the next response
is reinforced is variable.
Fixed time (FT): reinforcer is delivered after a period of time without regard to behavior;
used to establish superstitious behavior.
Variable time (VT): reinforcers delivered at irregular intervals, regardless of behavior;
also may lead to superstitious behaviors.
Fixed duration (FD): the reinforcer is delivered if a behavior occurs continuously over a
period of time.
Variable duration (VD): required period of performance varies around some average.
Differential reinforcement of low rates (DRL): a behavior is reinforced only if it
occurs no more than a specified number of times in a given period. DRL is used to
encourage low rates of responding. It is like an interval schedule, except that premature
responses reset the time required between behaviors.
Differential reinforcement of high rates (DRH): reinforcement is delivered only after a
minimum number of times an action is performed in a given period. DRH is used to
encourage high rates of responding and produces the highest rates of behavior.
Ratio stretch: gradually increasing the ratio requirement of a schedule (e.g., FR-1 to
FR-3 to FR-5 to FR-10).
Ratio strain: disruption of the pattern of responding due to stretching the ratio of
reinforcement too abruptly or too far.
Partial reinforcement effect: the tendency of a behavior to be more resistant to extinction
when partially reinforced than when continuously reinforced.
Resistance to extinction: intermittent (partial) reinforcement schedules, compared to
continuous reinforcement schedules, make animals reluctant to give up responding when
the reinforcers stop.
Key issues/distinctions/questions
What behavioral pattern on a cumulative record do the following schedules
produce? Give examples of each (a simulation sketch follows this list):
• Fixed ratio: a rat in an operant chamber presses the bar three times, and the
third press is reinforced. This produces a staircase-like record.
• Variable ratio: for a rat in an operant chamber, the number of responses required
for reinforcement varies. This produces a high, steady rate.
• Fixed interval: the rat has to wait a fixed amount of time, presses the bar,
and then the act is reinforced. This produces a scalloped function.
• Variable interval: the rat must wait, on average, 10 seconds after the last
reinforced response before the next response is reinforced, but this time can vary.
Produces a low, steady-rate function.
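To make the ratio/interval contrast concrete, here is a minimal Python simulation
sketch. The parameter values and the constant-rate responder are illustrative
assumptions, not part of the study guide.

import random

def fixed_ratio(ratio=3, presses=30):
    # FR schedule: every `ratio`-th response is reinforced.
    return sum(1 for press in range(1, presses + 1) if press % ratio == 0)

def variable_interval(mean_interval=10.0, session=300.0, response_rate=1.0):
    # VI schedule: the first response after a randomly timed interval is
    # reinforced; the responder presses at a steady `response_rate` per second.
    random.seed(0)  # reproducible illustration
    reinforcers = 0
    t = 0.0
    next_setup = random.expovariate(1.0 / mean_interval)
    while t < session:
        t += 1.0 / response_rate            # time of the next response
        if t >= next_setup:                 # a reinforcer has "set up"
            reinforcers += 1
            next_setup = t + random.expovariate(1.0 / mean_interval)
    return reinforcers

print(fixed_ratio())        # FR-3: 10 reinforcers in 30 presses
print(variable_interval())  # VI-10: roughly 30 reinforcers in 300 seconds

The key property this shows: on a ratio schedule, reinforcement rate grows with
response rate, so fast responding pays; on an interval schedule, responding faster
than the setup rate earns little extra, which is why interval schedules produce
lower, steadier rates.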
Give examples of the following time-related schedules:
• Fixed time: FT-10 means the animal gets a reinforcer after 10 seconds no matter
what it happens to be doing.
• Variable time: VT-10 means the reinforcer is delivered every 10 seconds, on
average, sometimes more, sometimes less.
• Fixed duration: practice the violin for 30 consecutive minutes to get an ice cream
cone.
• Variable duration: if a kid is hyperactive, you reinforce sitting quietly for a
varying period of time.
Give examples of the following rate-related schedules:
• DRL: reinforce the animal for responding at a slow rate (e.g., a bar press is
reinforced only if at least five seconds have passed since the last press). Used to
help people slow down (e.g., hyperactivity). See the sketch after this list.
• DRH: reinforce animal for responding at a fast rate (e.g., press bar five times
during every 10-second interval). Used to help people speed up (e.g., dawdlers).
• Ratio stretch: start the animal out on a low ratio schedule (e.g., FR-1) then
gradually increase the ratio (FR-3, FR-5, FR-10). Stretching too fast or too far
(e.g., FR-300) creates Ratio Strain (responding is disrupted).
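Here is a minimal Python sketch of the DRL reset rule described above (the response
times are hypothetical):

def drl_reinforced(response_times, min_gap=5.0):
    # DRL: a response is reinforced only if at least `min_gap` seconds have
    # passed since the previous response; a premature response earns nothing
    # and resets the clock.
    reinforced, last = [], None
    for t in response_times:
        reinforced.append(last is not None and (t - last) >= min_gap)
        last = t  # every response, even a premature one, restarts the timer
    return reinforced

# Responses at 0, 2, 9, 13, and 20 seconds on DRL-5:
print(drl_reinforced([0, 2, 9, 13, 20]))
# [False, False, True, False, True] -- the 2-second response came too soon
# and reset the clock; the 7-second gaps before 9 and 20 earned reinforcement.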
Describe the four theories of the partial reinforcement effect. How are they
different?
 Discrimination Hypothesis: It is harder for the animal to
discriminate between an intermittent schedule and extinction than
between continuous reinforcement and extinction (i.e., the animal
can’t tell when partial reinforcement ends and extinction begins).
 Frustration Hypothesis: There is greater frustration for animals
who switch from continuous reinforcement to extinction than for
animals who switch from partial reinforcement to extinction.
Frustrated animals stop responding sooner.
 Sequential Hypothesis: The sequence of reinforced and
non-reinforced responses becomes a cue for future responding. An
animal performs longer in the absence of reinforcement following
intermittent rewards because non-reinforced trials are cues to keep on
responding.
 Response Unit Hypothesis:
• The response should not be defined as a single behavior (e.g., bar
press or key peck).
• The “response” is whatever complex actions (“units” of behavior)
lead to a reinforcement (e.g., the response unit for an FR-3
schedule is three bar presses).
• The response unit for extinction more closely resembles the
response unit for partial reinforcement than for continuous
reinforcement.
• During extinction, the animal may actually produce fewer
response units after partial reinforcement than after continuous
reinforcement.
Operant and Instrumental Conditioning: Generalization, Discrimination &
Transfer
Key terms
Stimulus generalization: the tendency for a response learned to one specific stimulus
(e.g., flirt with people with red hair) to also occur for other, similar, stimuli (e.g., flirt
with people with auburn hair).
Response generalization: if a response of one type (e.g., punching a classmate, typing
on a keyboard) is blocked, then there is a tendency to make a similar response to the
same stimulus (e.g., kick the classmate, bang on the keyboard).
Stimulus discrimination (Stimulus Control): when a response learned to one specific
stimulus does not occur to other stimuli (e.g., go at a green light, stop at a red light). The
opposite of stimulus generalization.
Response discrimination: learning not to make similar responses to the same stimuli
(e.g., shifting gears, discriminating between a bad golf swing and a good one). The
opposite of response generalization.
Stimulus control: the tendency for a behavior to occur in the presence of certain stimuli
but not in their absence.
Discriminative stimuli (S+, S−): any stimulus that signals either that a behavior will be
reinforced (an S+ or SD) or will not be reinforced (an S− or S∆).
Successive discrimination: the S+ and S− are presented one at a time, and the subject
learns to respond to the S+ but not to the S−.
Simultaneous discrimination: different stimuli are presented at the same time and a
subject chooses which one to pay attention to.
Matching to sample: a discrimination procedure in which the task is to select from two
or more comparison stimuli the one that matches a sample.
Errorless discrimination: present the S+ at full strength and the S− in a weak form, so
that errors (responses to the S−) rarely occur.
Excitatory gradient: the generalization gradient describing how strongly a new stimulus,
related to a previous excitatory stimulus (S+), evokes responding.
Inhibitory gradient: the generalization gradient describing how strongly a new stimulus,
related to a prior inhibitory stimulus (S−), suppresses responding.
Peak shift: the subject is more likely to respond to a new stimulus (S new) than to the
S+ because, at that new stimulus, excitation exceeds inhibition by the largest amount.
Basic transfer design: what we learn in one situation carries over into another
situation. To study transfer, you need at least two groups, an experimental group
and a control group:

Group          Phase 1         Phase 2
Experimental   learns Task 1   learns Task 2
Control        rests           learns Task 2

You want to know whether Task 1 helped or hurt Task 2; you can't answer that until
you compare the experimental group with the control group. If the experimental
group does better on Task 2, then Task 1 helped.
Positive transfer: the experimental group performs better on Task 2 than the control group.
Negative transfer: the experimental group performs worse on Task 2 than the control group.
Warm-up effects: you start off studying and it’s a struggle but as you progress it gets
easier.
Learning to learn: you start learning something and it’s difficult. Then, you learn new
strategies and get better at it.
Key issues/distinctions/questions
Provide examples of the following:
• Stimulus generalization: if you have a tendency to flirt with redheads, you may
begin to respond to people with similar hair colors (i.e., strawberry blond
or auburn).
• Response generalization: a classmate punches you, and then you might kick the
classmate. You are frustrated while typing on a keyboard, so you pound on
the keyboard as if it will help.
• Stimulus discrimination: you go at a green light and stop at a red light. We do one
thing when the light is green and we do not do that one thing when the light is red.
• Response discrimination: shifting gears in a car; discriminating between a bad
golf swing and a good one.
What are the essential elements of Pavlov’s Physiological Theory of discrimination?
The reinforced stimulus (S+) creates an area of excitation in the brain that
produces a response (R). The non-reinforced stimulus (S−) creates an area of
inhibition in the brain that inhibits responding and produces non-responding (NR).
What are the essential elements of Spence’s Gradient Theory of discrimination?
How does it differ from the Lashley-Wade theory?
The S+ creates a gradient of excitation, and the S− creates a gradient of
inhibition. The tendency to respond to a new stimulus reflects the net
difference between excitation and inhibition. The Lashley-Wade theory, in contrast,
holds that generalization gradients depend on experience: the animal generalizes
because it has not learned to tell the stimuli apart.
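Spence's net-difference idea can be made concrete with a small numeric sketch
(Python; the Gaussian gradient shape and all parameter values are illustrative
assumptions, not Spence's actual functions):

import math

def gradient(s, center, height, width):
    # Generalization gradient: strongest at `center`, falling off with distance.
    return height * math.exp(-((s - center) ** 2) / (2 * width ** 2))

S_PLUS, S_MINUS = 550, 540  # hypothetical training stimuli (e.g., wavelengths in nm)

def net_tendency(s):
    # Net response tendency = excitation around S+ minus inhibition around S-.
    return gradient(s, S_PLUS, 1.0, 15) - gradient(s, S_MINUS, 0.8, 15)

best = max(range(500, 610), key=net_tendency)
print(best)  # 558 with these numbers: peak responding is not at S+ (550)
             # but is shifted away from S- (540), i.e., a peak shift

Because inhibition from the S− cancels more excitation on the S− side of the S+,
the maximum of the net gradient lands beyond the S+, which is exactly the peak
shift defined above.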
Using the basic transfer design, how does a researcher know when negative transfer
has occurred? Why?
When the experimental group does worse on Task 2 than the control group. For
example, an experimental group learns how to play tennis while the control group
rests. Then both groups learn how to play racquetball. Usually, the control group
will play better because the experimental group's tennis habits interfere.
In a transfer experiment with SA-RB in Task 1 and SC-RD in Task 2, what do the A, B,
C, D subscripts refer to?
The subscripts label the particular stimuli (A, C) and responses (B, D) used in each
task; comparing them shows whether Task 2 involves the same or different stimuli
and responses as Task 1.
Which of the following conditions usually lead to positive transfer and which usually
lead to negative transfer?
• Response generalization - Positive
• Stimulus generalization - Positive
• Response facilitation/mediation - Positive
• Response interference – Negative
Supply some examples of the following transfer situations. Which produce positive
transfer and which produce negative transfer?