Introduction 1 Literature
R. S. Sutton, A. G. Barto: Reinforcement Learning: An Introduction. MIT Press, 1998. http://www.cs.ualberta.ca/~sutton/book/the-book.html
E. Alpaydin: Machine Learning. MIT Press, 2004.
S. J. Russell, P. Norvig: Künstliche Intelligenz – Ein moderner Ansatz. Prentice Hall, 2004. http://aima.cs.berkeley.edu/

Introduction 2 What is Learning?
Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time (Simon, 1983).
Learning is constructing or modifying representations of what is being experienced (Michalski, 1986).

Introduction 3 Learning strategies
• Rote learning and direct implanting of new knowledge
• Learning from instruction
• Learning by analogy
• Learning from examples
• Learning from observation and discovery

Introduction 4 Learning agents

Introduction 5 Learning agents - Learning element
• Design of a learning element is affected by
  – Which components of the performance element are to be learned
  – What feedback is available to learn these components
  – What representation is used for the components
• Type of feedback:
  – Supervised learning: correct answers for each example
  – Unsupervised learning: correct answers not given
  – Reinforcement learning: occasional rewards

Introduction 6 Learning agents - Problem generator
• Suggests exploratory actions
• These will lead to new and informative experiences
• This is what scientists do when they carry out experiments

Introduction 7 What is Reinforcement Learning?
• An approach to Artificial Intelligence
• Learning from interaction
• Goal-oriented learning
• Learning about, from, and while interacting with an external environment
• Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal

Introduction 8 RL in Computer Science
[Diagram: fields related to RL]
Artificial Intelligence
Control Theory and Operations Research
Psychology
Reinforcement Learning (RL)
Neuroscience
Artificial Neural Networks

Introduction 9 Key Features of RL
• Learner is not told which actions to take
• Trial-and-error search
• Possibility of delayed reward
• Sacrifice short-term gains for greater long-term gains
• The need to explore and exploit
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment

Introduction 10 Complete Agent
• Temporally situated
• Continual learning and planning
• Agent changes its state by an action within the environment
• Environment is stochastic and uncertain
[Diagram: agent and environment, connected by action, state, and reward]

Introduction 11 Supervised Learning
Training info = desired (target) outputs
Inputs -> Supervised Learning System -> Outputs
Error = (target output - actual output)

Introduction 12 Unsupervised Learning
Inputs -> Unsupervised Learning System -> Outputs

Introduction 13 Reinforcement Learning
Training info = evaluations ("rewards" / "penalties")
Inputs -> RL System -> Outputs ("actions")
Objective: get as much reward as possible

Introduction 14 Example - Chess
• States: positions of the pieces on the board
• Actions: legal moves
• Reward: typically delivered at the end of the game (win, loss, draw)
• No comments or reward during the game
• Agent learns only by playing many games

Introduction 15 Example - Food-Seeking Agent
• Actions: forward/backward movement and right/left rotation
• Reward: food
• No prior knowledge about good movements
• No long-distance sensor (vision) to see the food from afar

Introduction 16 Example - Labyrinth
[Figure: labyrinth with states a to k, terminal states, actions, and rewards]

Introduction 17 Biological background - Literature
Anderson, J. R. (2000). Learning and Memory. New York: Wiley.
Mazur, J. E. (2006). Lernen und Verhalten. 6th updated edition. Pearson Studium.
Domjan, M. P. (2003). The Principles of Learning and Behavior, fifth edition. Thomson.

Introduction 18 Forms of Learning
• Classical (Pavlovian) conditioning: learning through the association of stimuli
• Instrumental conditioning: learning through the consequences of actions
• Modeling: learning through observation and imitation

Introduction 19-23 Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
• A process that evokes changes: in contrast to memory, which is the result of such a process
• Individual changes: in contrast to evolution
• Mediated by experiences and exercises, not by growing, getting tired, or injury
• Long-term changes: in contrast to attention, working memory, and motivation
• Behaviorism: the changes should lead to a different behavior, or at least be triggered by external events

Introduction 24 Experimental research in learning
Behaviorism: stimuli in the environment lead to reactions of the organism (= response)
Stimulus -> Response
• Behavior is determined by the environment, but can be modified
• Analysis of the stimulus-response relation
• Criteria: observable and repeatable
• Little interest in internal processes

Introduction 25-26 Classical conditioning
Ivan Pavlov (1849-1936)

Introduction 27 Classical conditioning - initial situation
Conditioned stimulus (CS) -> no response
Unconditioned stimulus (US) -> unconditioned response (UR)
(horizontal axis of the diagram: time)

Introduction 28 Classical conditioning - acquisition
Conditioning: pairing of CS and US
Conditioned stimulus (CS) + unconditioned stimulus (US) -> (un-)conditioned response (UR/CR)
Test: conditioned response to the CS alone
Conditioned stimulus (CS) -> conditioned response (CR)

Introduction 29 Classical conditioning - extinction
Initially: conditioned stimulus (CS) -> conditioned response (CR)
Later: conditioned stimulus (CS) -> no response

Introduction 30 Classical conditioning - timing

Introduction 31 Classical conditioning - temporal contiguity
Procedure                   Effectiveness
Delayed conditioning        Works pretty well
Trace conditioning          Less effective, since it requires some form of memory
Long-delayed conditioning   Only sometimes effective
Simultaneous conditioning   Often ineffective
Backward conditioning       Typically ineffective, but not always
[Figure: on/off time courses of CS and US for each procedure]

Introduction 32 Classical conditioning - contingency
The conditioned stimulus indicates that the unconditioned stimulus will appear:
P(US | CS) > P(not US | CS)
Example: P(Food | Tone) > P(no food | Tone)

Introduction 33 Classical conditioning - blocking
Two conditioned stimuli: CSA (tone) and CSB (light)
One unconditioned stimulus: US (electric shock)
Group                Phase 1      Phase 2            Test   Result
Control group        -            CSA + CSB -> US    CSB    Strong association
Experimental group   CSA -> US    CSA + CSB -> US    CSB    Weak association
CSB is not sufficiently informative; the frequency of pairings is irrelevant.

Introduction 34 Classical conditioning - blocking
Two conditioned stimuli: CSA (tone) and CSB (light)
Two unconditioned stimuli: US1 (mild electric shock), US2 (strong electric shock)
Group                Phase 1       Phase 2             Test   Result
Control group        -             CSA + CSB -> US2    CSB    Strong association
Experimental group   CSA -> US1    CSA + CSB -> US2    CSB    Strong association
CSB is now informative for US2.

Introduction 35 Classical conditioning - blocking
Two conditioned stimuli: CSA (tone) and CSB (light)
One unconditioned stimulus: US (electric shock)
Group                  Phase 1         Phase 2            Test   Result
Control group          -               CSA + CSB -> US    CSB    Strong association
Experimental group     CSA -> US       CSA + CSB -> US    CSB    Weak association
Experimental group 2   CSA -> no US    CSA + CSB -> US    CSB    Very strong association
CSB is now even more informative.

Introduction 36 Rescorla-Wagner Theory (1972)
• An organism learns if events violate its expectations.
• Expectations are developed if relevant (salient) events follow a stimulus complex.

Introduction 37 Rescorla-Wagner Model
ΔV = α (λ - V)
V = present association strength
ΔV = change of the association strength
α = learning rate
λ = maximal association strength

Introduction 38 Parameters before conditioning
• V = 0 (no conditioning at this point)
• λ = 100 (arbitrarily chosen, but depends on the strength of the US)
• α = .5 (0 < α < 1)

Introduction 39-46 Acquisition, trials 1 to 8
Trial   ΔV = α (λ - V)                 V after the trial
1       .5 * (100 - 0)     = 50        50
2       .5 * (100 - 50)    = 25        75
3       .5 * (100 - 75)    = 12.5      87.5
4       .5 * (100 - 87.5)  = 6.25      93.75
5       .5 * (100 - 93.75) = 3.125     96.88
6       .5 * (100 - 96.88) = 1.56      98.44
7       .5 * (100 - 98.44) = .78       99.22
8       .5 * (100 - 99.22) = .39       99.61
[Figure: association strength (V) over trials 0 to 8]

Introduction 47-48 Extinction, trials 1 and 2 (λ = 0)
Trial   ΔV = α (λ - V)                 V after the trial
1       .5 * (0 - 99.61) = -49.8       49.81
2       .5 * (0 - 49.8)  = -24.9       24.91
[Figure: acquisition curve followed by the extinction curve]

Introduction 49 Acquisition and extinction curves with α = .5 and λ = 100
Acquisition (trials 0 to 8): V = 0, 50, 75, 87.5, 93.75, 96.88, 98.44, 99.22, 99.61
Extinction (trials 0 to 6): V = 99.61, 49.8, 24.91, 12.45, 6.23, 3.11, 1.56
[Figure: association strength (V) over trials, acquisition and extinction panels]

Introduction 50 Acquisition and extinction curves with α = .5 and α = .2 (λ = 100)
ΔV = α (λ - V)
[Figure: association strength (V) over trials for α = .5 and α = .2, acquisition and extinction panels]

Introduction 51 Combined stimuli
If multiple stimuli are present, it is necessary to add up the individual association strengths:
Vcomb = VCS1 + VCS2
Trial 1:
ΔVTone = .2 (100 - 0) = (.2)(100) = 20
ΔVLight = .2 (100 - 0) = (.2)(100) = 20
Vcomb = current Vcomb + ΔVTone + ΔVLight = 0 + 20 + 20 = 40
Trial 2:
ΔVTone = .2 (100 - 40) = (.2)(60) = 12
ΔVLight = .2 (100 - 40) = (.2)(60) = 12
Vcomb = current Vcomb + ΔVTone + ΔVLight = 40 + 12 + 12 = 64
[Figure: association strength (V) over CS-US pairs 1 to 10, single vs. combined stimulus]
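
The trial-by-trial numbers above can be reproduced with a few lines of code. The following is only a minimal sketch of the update rule ΔV = α (λ - V) as stated on the slides; the function names `rescorla_wagner` and `compound_trials` and the stimulus labels are illustrative assumptions, not part of the lecture material.

```python
def rescorla_wagner(v, lam, alpha, trials):
    """Return the association strength after each trial, starting from v."""
    history = []
    for _ in range(trials):
        v += alpha * (lam - v)  # delta_V = alpha * (lambda - V)
        history.append(v)
    return history


def compound_trials(alphas, lam, trials):
    """Train several stimuli presented together.

    alphas maps a (hypothetical) stimulus name to its learning rate.
    All stimuli share one error term, lambda - Vcomb.
    """
    strengths = {name: 0.0 for name in alphas}
    history = []
    for _ in range(trials):
        error = lam - sum(strengths.values())  # lambda - Vcomb
        for name, alpha in alphas.items():
            strengths[name] += alpha * error
        history.append(sum(strengths.values()))  # Vcomb after this trial
    return strengths, history


# Acquisition with lambda = 100, then extinction with lambda = 0, alpha = .5.
acquisition = rescorla_wagner(v=0.0, lam=100.0, alpha=0.5, trials=8)
extinction = rescorla_wagner(v=acquisition[-1], lam=0.0, alpha=0.5, trials=6)

# Tone and light presented together, both with alpha = .2.
_, combined = compound_trials({"tone": 0.2, "light": 0.2}, lam=100.0, trials=2)

print([round(v, 2) for v in acquisition])
print([round(v, 2) for v in extinction])
print(combined)
```

The sketch keeps full floating-point precision between trials, whereas the slides round intermediate values, so the last digit of some extinction values can differ slightly (for example 49.80 instead of 49.81). `compound_trials` also accepts a different learning rate per stimulus.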
Introduction 52 Combined stimuli - Overshadowing
If the learning rates are different (since the stimuli are not equally salient), the more salient stimulus dominates the association:
Trial 1:
ΔVTone = .4 (100 - 0) = (.4)(100) = 40
ΔVLight = .1 (100 - 0) = (.1)(100) = 10
Vcomb = current Vcomb + ΔVTone + ΔVLight = 0 + 40 + 10 = 50
Trial 2:
ΔVTone = .4 (100 - 50) = (.4)(50) = 20
ΔVLight = .1 (100 - 50) = (.1)(50) = 5
Vcomb = current Vcomb + ΔVTone + ΔVLight = 50 + 20 + 5 = 75
[Figure: association strength over CS-US pairs 1 to 10, salient vs. less salient stimulus]

Introduction 53 Blocking
The conditioning of CSA (tone) in phase 1 makes up the largest proportion of Vcomb. Thus, only a small proportion of Vcomb is left for CSB (light) in phase 2.
Phase 1: Vcomb = VCS-A = 100
Phase 2: Vcomb = VCS-A + VCS-B = 100 + 0 = 100
ΔV = α (100 - Vcomb) = 0
With a larger maximal association strength λ in phase 2, CSB can be conditioned in addition to CSA.

Introduction 54 Conditioned inhibition
Two conditioned stimuli: CS+ (tone) and CS- (light)
One unconditioned stimulus: US (food)
Learning phase:
CS+ -> US
CS- + CS+ -> no US
CSA -> US
Test         Result
CS+          CR
CS- + CS+    no CR
CSA + CS-    no CR

Introduction 55 Conditioned inhibition
Two conditioned stimuli: CS+ (tone) and CS- (light)
One unconditioned stimulus: US (food)
Association with CS+: Vcs+ = 100
Association with the combination CS+ / CS-: Vcomb = Vcs+ + Vcs- = 0
Thus: Vcs- = -100
(Stimuli in the trials alternate between CS+ and CS+/CS-)
[Figure: association strengths of CS+ and CS- over trials, on an axis from -100 to 100]

Introduction 56 Problems of the Rescorla-Wagner model
Configural learning:
  CSA -> US, CSB -> US, CSA + CSB -> no US
  Solution: implement CSA + CSB as a single new stimulus CSC
Latent inhibition:
  First CS -> no US, then CS -> US results in only slow learning
  Solution: reduce the learning rate α by CS -> no US
Preferred and unprivileged conditioning:
  Taste -> Nausea works better than Light -> Nausea
  Solution: make the learning rate α dependent on CS-US combinations
The model cannot explain all observations.

Introduction 57 Thorndike's cat puzzles
E. L. Thorndike (1874 - 1949)
A hungry cat is put into a cage. If the cat shows a particular behavior (pulling a cord, turning a lock), the door opens and the cat can go outside and eat the food placed there.

Introduction 58 Thorndike's Law of Effect
E. L. Thorndike (1874 - 1949)
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. (p. 244)
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Macmillan.

Introduction 59 Operant conditioning
B. F. Skinner (1904 - 1990)
• Behavior also occurs without external stimuli.
• "Free operants" instead of reactions
• Reinforcement processes primarily shape the behavior: positive reinforcement is the strengthening of behavior by the presentation of some event, and negative reinforcement is the strengthening of behavior by the removal or avoidance of some event.

Introduction 60 Operant conditioning
"I would define operant conditioning as shaping and maintaining behavior by making sure that reinforcing consequences follow"
B. F. Skinner (1904 - 1990)

Introduction 61 Similarities between classical and instrumental conditioning
• Classical conditioning: contingency between stimulus 1 (CS) and stimulus 2 (US)
• Instrumental conditioning: contingency between stimulus 1, reaction, and stimulus 2
• Both show acquisition, extinction, and spontaneous recovery
• Both show dependence on contiguity (temporal proximity)
• In both cases contiguity alone is not sufficient
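
The contingency named in the bullets above was stated for classical conditioning as P(US | CS) > P(not US | CS). As a rough illustration, that condition can be checked by counting trials. This is only a sketch: the function names and the example session are hypothetical, and probabilities are estimated as simple relative frequencies.

```python
def p_us_given_cs(trials):
    """Estimate P(US | CS) from a list of (cs_present, us_present) pairs."""
    us_on_cs_trials = [us for cs, us in trials if cs]
    if not us_on_cs_trials:
        raise ValueError("no trials with the CS present")
    return sum(us_on_cs_trials) / len(us_on_cs_trials)


def is_contingent(trials):
    """Check the slides' condition P(US | CS) > P(not US | CS)."""
    p = p_us_given_cs(trials)
    return p > 1.0 - p


# Hypothetical session: the tone sounds on 10 trials and is followed
# by food on 8 of them; on 5 further trials neither occurs.
session = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 5

print(p_us_given_cs(session))
print(is_contingent(session))
```

The same counting applies to the instrumental case if each record marks whether the response was made after the stimulus and whether the reinforcer followed.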
Introduction 62 Contingency vs. contiguity
Classical conditioning:
The conditioned stimulus predicts the occurrence of the unconditioned stimulus:
P(US | CS) > P(not US | CS)
e.g.: P(Food | Tone) > P(no Food | Tone)
Instrumental conditioning:
The response to the stimulus increases the probability that the reinforcer appears:
P(V | R, S) > P(not V | R, S), where V is the reinforcer, R the response, and S the stimulus
e.g.: P(Food | Button press after the tone) > P(no Food | Button press after the tone)

Introduction 63 1.5 Elements of Reinforcement Learning