Instrumental Learning
• All learning in which an animal operates on its environment to obtain a reinforcement.
• Operant (Skinnerian) conditioning

1. Thorndike and the law of effect
• The animal is more likely to repeat the behavior that it emitted just before the reward.
• As Thorndike would say, “the memory becomes stamped in.”

Many different types of learning apparatus were tried between 1900 and 1945
• Escape learning for cats (Thorndike, Guthrie)
• Rat jumping stand (Myers)
• Complex maze learning (Tolman, Lashley)
• T-maze (Hull, Spence)
• Morris water maze (modern-day development)

The Common Element
• All used a fixed trial presentation method.
• Either subjects had a fixed experience across animals and the amount learned varied per animal,
• or subjects had a variable experience but had to reach a fixed criterion of learning.
• One measured the time it took to reach the goal (mazes) or the number of trials needed to reach the criterion.

Skinner and Operant Behavior
• The unique feature of operant training is that the experimenter waits until the animal makes the specified response before a reward is given. This is called free operant behavior.

Reward vs. Reinforcement
• A reward is a global state of affairs given to the whole animal (food, electric shock).
• A reinforcement is for the specific, discrete response made by the animal to obtain the reward.

Primary reinforcers
• Eating, drinking, and sex
• Addictive drugs
• For animals, the equivalent of money for humans, e.g., poker chips, marbles.
• When used in this way, these are called conditioned reinforcers.

Positive and Negative Reinforcement
• Both positive and aversive stimuli can be used to guide behavior. Both are used to increase a desired response.
• The reinforcement is delivered shortly after the desired response has been emitted.
Reinforcement & Punishment
• Concept: Positive reinforcement
• Description: Increasing the frequency of a behavior by following it with the presentation of a positive reinforcer, that is, a pleasant, positive stimulus or experience.
• Example: Saying “Good job” after someone works hard to perform a task.

Types of reinforcers
• Appetitive: usually food
• Negative: shock, air puff; stimuli that deliver pain or discomfort

Negative Reinforcement
• Concept: Negative reinforcer
• Note the following: the removal of a negative stimulus is positively reinforcing. The animal will tend to perform the behavior that removes it from the cues associated with the aversive state of affairs.

Reinforcement/Punishment

Shaping
• Shaping is the method by which one gets the animal to make the desired response in the first place.
• The final desired behavior is broken down into small steps or increments. Accomplishing the first step leads directly to the next step in the chain.
• Example: how to train a monkey to hit a key.

Continuous reinforcement
• A reinforcement is given for every desired response. If the reinforcement stops, the animal stops responding.

Intermittent reinforcement
• Intermittent reinforcement is more resistant to extinction than continuous reinforcement.

Appetitive Schedules of Reinforcement
• Schedules of reinforcement are based on one of two criteria: the number of responses or the passage of time.

Fixed Ratio (FR)
• A fixed ratio schedule delivers a reinforcement after a given number of responses have been made.

Variable Ratio (VR)
• Here the number of responses required varies about a mean value.
• The slope of the cumulative record is not quite as steep as for fixed ratio.

Fixed Interval (FI)
• Here a reinforcement is delivered for the first response made after a fixed amount of time has passed.
• Note the scalloping of the cumulative record.
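The ratio and fixed interval schedules above can be sketched as small decision rules: each one receives the time of a response and answers whether that response earns a reinforcement. This is a minimal illustration, not a model of any particular experiment. The function names, the uniform spread used for the variable ratio, and the choice to time the fixed interval from the last reinforcement are all assumptions made for the example.

```python
import random


def fixed_ratio(n):
    """FR n: reinforce every n-th response (hypothetical helper)."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """VR mean_n: the required number of responses varies about mean_n.

    Assumption: requirements are drawn uniformly from 1 .. 2*mean_n - 1,
    which averages mean_n. The source does not specify a distribution.
    """
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def respond(t):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(seconds):
    """FI: reinforce the first response made once `seconds` have passed.

    Assumption: the interval is timed from the last reinforcement
    (or from the session start at t = 0).
    """
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= seconds:
            last = t
            return True
        return False

    return respond


# One response per second for ten seconds.
fr = fixed_ratio(5)
print([t for t in range(1, 11) if fr(t)])  # [5, 10]

fi = fixed_interval(4.0)
print([t for t in range(1, 11) if fi(t)])  # [4, 8]
```

Under the fixed interval rule, extra responses early in the interval earn nothing, which is one way to see why responding tends to bunch up toward the end of each interval.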
Variable Interval (VI)
• A variable interval schedule is similar to an FI schedule, except that the time lapse between the availability of successive reinforcements is varied, for example 1, 3, 2, etc.
• The schedule is named after the mean amount of time that passes.
• Again, the reinforcement is delivered for the first response made after the interval has passed.

VI
• Note that variable interval schedules do not show the scalloping seen in FI schedules.
• The slope is not as steep as in VR or FR schedules.

Differential Reinforcement for Rate
• In ratio schedules there is a contingency between the rate of responding and the rate of reinforcement: the faster the animal responds, the faster it gets a reinforcement.
• The contingency is weaker for interval schedules but still present.

Setting up a Differential Rate
• One sets up a contingency between the number of responses within a given time interval and reinforcement.
• The key is to control the rate of response per unit time, i.e., to control the inter-response time (IRT).

Differential Reinforcement for High Rates (DRH)
• Here the animal must respond, for example, 10 times in 5 seconds. Each time this criterion is met, the animal is reinforced after the last response.

Differential Reinforcement for Low Rates of Responding (DRL)
• Here the animal must inhibit early responses to meet a criterion of, say, 10 sec. If the animal responds before the 10 sec have elapsed, a clock is reset and the animal must start the wait period over.

Current theory postulates two underlying processes
• The animal forms a temporal discrimination.
• The animal actively inhibits responding (it uses ancillary responses, not directed at the requisite key or bar, to pass the time).

DRH/DRL
• The animal must respond within a window of time: it must respond after a specific time has passed, but must not allow an upper time limit to be exceeded.
• Wyler/Prim study using single neuron

Negative Control of Behavior
• Behavior is emitted that removes an aversive state of affairs.
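The variable interval, DRH, and DRL rules described above can be sketched in the same style: each rule receives the time of a response and reports whether that response is reinforced. The function names are hypothetical. Several details are assumptions made for illustration rather than anything stated in the slides: cycling through a fixed list of intervals, starting the DRL clock at t = 0, and clearing the DRH response history after each reinforcement.

```python
from collections import deque
from itertools import cycle


def variable_interval(intervals):
    """VI: like FI, but the wait before the next reinforcement becomes
    available varies, e.g. 1, 3, 2 (mean 2). Assumption: cycle the list."""
    gaps = cycle(intervals)
    wait = next(gaps)
    last = 0.0

    def respond(t):
        nonlocal wait, last
        if t - last >= wait:
            last = t
            wait = next(gaps)
            return True
        return False

    return respond


def drl(min_irt):
    """DRL: reinforce a response only if at least `min_irt` seconds have
    passed since the previous response. An early response resets the clock."""
    last = 0.0  # assumption: the first wait is timed from the session start

    def respond(t):
        nonlocal last
        reinforced = (t - last) >= min_irt
        last = t  # every response, early or not, restarts the wait period
        return reinforced

    return respond


def drh(n, window):
    """DRH: reinforce the last of `n` responses emitted within `window` seconds."""
    recent = deque(maxlen=n)  # times of the most recent n responses

    def respond(t):
        recent.append(t)
        if len(recent) == n and t - recent[0] <= window:
            recent.clear()  # assumption: the next burst must start afresh
            return True
        return False

    return respond


vi = variable_interval([1, 3, 2])
print([t for t in range(1, 9) if vi(t)])  # [1, 4, 6, 7]

low = drl(10)
print([low(t) for t in [4, 12, 25, 30, 41]])  # [False, False, True, False, True]

high = drh(10, 5.0)
print([high(0.5 * k) for k in range(1, 11)][-1])  # True: 10 responses in 4.5 s
```

In the DRL example, the responses at 4 and 12 seconds both come too soon, and each one restarts the 10-second wait, which is the reset-the-clock rule from the slide.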
Negative reinforcer
• Description: Increasing the frequency of a behavior by following it with the removal of an unpleasant stimulus or experience.

Avoidance Conditioning
• Concept: Avoidance conditioning
• Description: Learning to make a response that avoids an unpleasant stimulus.
• Example: You slow your car to the speed limit when you spot a police car, thus avoiding being stopped and reducing the fear of a fine. Such behavior is very resistant to extinction.

1. Escape and Avoidance
The control of intrinsic behavior
• Avoidance tasks involve removing oneself from an environment that has previously been associated with a negative reinforcer.

Sidman Avoidance
• Shock-shock interval (shock every 5 sec)

Sidman Avoidance (cont.)
• Response-shock interval (time delay of the shock after a bar push)

Sidman Avoidance (cont.)
• Very, very hard to extinguish.
• VAN - chimp

VIII. Punishment: different types
• Punishment 2 (penalty)

Escape Conditioning
• Concept: Escape conditioning
• Description: Learning to make a response that removes an unpleasant stimulus.
• Example: You learn to use the mute button on the TV remote control to remove the sound of an obnoxious commercial.
• Example: A little boy learns that crying will cut short the time that he must stay in his room.

Punishment
• Concept: Punishment
• Description: Decreasing the frequency of a behavior by either presenting an unpleasant stimulus (punishment 1) or removing a pleasant one (punishment 2, penalty).
• Example: You swat the dog after it steals food from the table, or you take a favorite toy away from a child who misbehaves.
• A number of cautions should be kept in mind when using punishment (see below for an example).

Learned helplessness
• Continued punishment until the animal refuses to respond even when there is no aversive state of affairs.

Combined Operant and Classical Conditioning (C. C.)