Introduction 1
Literature
R. S. Sutton, A. G. Barto: Reinforcement Learning: An Introduction.
MIT Press, 1998.
http://www.cs.ualberta.ca/~sutton/book/the-book.html
E. Alpaydin: Machine Learning.
MIT Press, 2004.
S. J. Russell, P. Norvig: Künstliche Intelligenz – Ein moderner Ansatz.
Prentice Hall, 2004.
http://aima.cs.berkeley.edu/
Introduction 2
What is Learning?
Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time (Simon, 1983).
Learning is constructing or modifying representations of what is being experienced (Michalski, 1986).
Introduction 3
Learning strategies
• Rote learning and direct implanting of new knowledge
• Learning from instruction
• Learning by analogy
• Learning from examples
• Learning from observation and discovery
Introduction 4
Learning agents
Introduction 5
Learning agents - Learning element
• Design of a learning element is affected by
– Which components of the performance element are to be learned
– What feedback is available to learn these components
– What representation is used for the components
• Type of feedback:
– Supervised learning: correct answers for each example
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards
Introduction 6
Learning agents - Problem generator
• Suggests exploratory actions
• These actions lead to new and informative experiences
• This is what scientists do when they carry out experiments
Introduction 7
What is Reinforcement Learning?
• An approach to Artificial Intelligence
• Learning from interaction
• Goal-oriented learning
• Learning about, from, and while interacting with an external environment
• Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
Introduction 8
RL in Computer Science
[Figure: Reinforcement Learning (RL) in the center, surrounded by the related fields Artificial Intelligence, Control Theory and Operations Research, Psychology, Neuroscience, and Artificial Neural Networks]
Introduction 9
Key Features of RL
• Learner is not told which actions to take
• Trial-and-error search
• Possibility of delayed reward
• Sacrifice short-term gains for greater long-term gains
• The need to explore and exploit
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
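The need to explore and exploit can be illustrated with an epsilon-greedy rule. This is a common textbook heuristic and not something these slides prescribe; the function name `select_action`, the list `q_values` of estimated action values, and the parameter `epsilon` are assumptions made for this sketch.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action index with an epsilon-greedy rule (illustrative sketch).

    With probability `epsilon` the agent explores (uniformly random action);
    otherwise it exploits the action with the highest estimated value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest estimate (the first one wins on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.2, 0.8, 0.5]  # hypothetical value estimates for three actions
    print(select_action(estimates, epsilon=0.0))  # epsilon = 0: purely greedy
    print(select_action(estimates, epsilon=1.0))  # epsilon = 1: purely random
```

A small `epsilon` mostly exploits what is already known, while a large one keeps gathering information about actions that currently look worse.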
Introduction 10
Complete Agent
• Temporally situated
• Continual learning and planning
• Agent changes its state by an action within the environment
• Environment is stochastic and uncertain
[Figure: agent-environment loop. The agent sends an action to the environment; the environment returns a state and a reward to the agent.]
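The interaction loop between agent and environment can be sketched in a few lines. The environment below (`CorridorEnv`, a corridor of five cells with a goal at the right end and an occasional slip) is a hypothetical toy example invented for this illustration, as are `run_episode` and the policy functions.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..4 in a row, goal at cell 4."""

    def __init__(self, length=5, slip=0.1, seed=0):
        self.length = length
        self.slip = slip                  # probability that a move is reversed
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        if self.rng.random() < self.slip:
            action = -action              # stochastic environment: the move slips
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0     # reward only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment answers
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    env = CorridorEnv()
    print(run_episode(env, policy=lambda state: 1))  # always move right
```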
Introduction 11
Supervised Learning
Training Info = desired (target) outputs
[Figure: Inputs -> Supervised Learning System -> Outputs]
Error = (target output – actual output)
Introduction 12
Unsupervised Learning
[Figure: Inputs -> Unsupervised Learning System -> Outputs]
Introduction 13
Reinforcement Learning
Training Info = evaluations (“rewards” / “penalties”)
[Figure: Inputs -> RL System -> Outputs (“actions”)]
Objective: get as much reward as possible
Introduction 14
Example - Chess
• States – positions of the pieces on the board
• Actions – legal moves
• Reward – typically delivered at the end of the game (win, loss, draw)
• No comments or reward during the game
• Agent learns only by playing many games
Introduction 15
Example – Food Seeking Agent
• Actions – forward/backward movement and right/left rotation
• Reward – food
• No prior knowledge about good movements
• No long-distance sensor (vision) to see the food from far away
Introduction 16
Example – Labyrinth
[Figure: labyrinth with states labeled a to k; figure labels: state, terminal states, actions, reward (with an "else" case)]
Introduction 17
Biological background - Literature
Anderson, J. R. (2000). Learning and Memory. New York: Wiley.
Mazur, J. E. (2006). Lernen und Verhalten. 6th, updated edition. Pearson Studium.
Domjan, M. P. (2003). The Principles of Learning and Behavior, fifth edition. Thomson.
Introduction 18
Forms of Learning
• Classical (Pavlovian) conditioning: learning through the association of stimuli
• Instrumental conditioning: learning through the consequences of actions
• Modeling: learning through observation and imitation
Introduction 19
Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
A process that evokes changes (as opposed to memory, which is the result of such a process)
Introduction 20
Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
Individual changes (as opposed to evolution)
Introduction 21
Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
Learning is mediated by
• experiences
• practice
Not by
• growing
• getting tired
• injury
Introduction 22
Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
Long-term changes, as opposed to
• attention
• working memory
• motivation
Introduction 23
Learning: Definition
Learning is a process that is mediated by experiences and evokes individual, long-term changes of behavior.
Behaviorism: changes should lead to different behavior, or at least be triggered by external events
Introduction 24
Experimental research in learning
Behaviorism: stimuli in the environment lead to reactions of the organism (= response)
[Figure: Stimulus -> Response]
• Behavior is determined by the environment, but can be modified
• Analysis of the stimulus-response relation
• Criteria: observable and repeatable
• Little interest in internal processes
Introduction 25
Classical conditioning
Introduction 26
Classical conditioning
Ivan Pavlov
1849-1936
Introduction 27
Classical conditioning – initial situation
Conditioned stimulus (CS) -> no response
Unconditioned stimulus (US) -> Unconditioned response (UR)
[Figure: stimuli and responses plotted over time]
Introduction 28
Classical conditioning – acquisition
Conditioning: pairing of CS and US
Conditioned stimulus (CS) + Unconditioned stimulus (US) -> (Un-)Conditioned response (UR/CR)
Test: conditioned response to the CS alone
Conditioned stimulus (CS) -> Conditioned response (CR)
[Figure: stimuli and responses plotted over time]
Introduction 29
Classical conditioning – extinction
Initially: Conditioned stimulus (CS) -> Conditioned response (CR)
Later: Conditioned stimulus (CS) -> no response
Introduction 30
Classical conditioning – timing
Introduction 31
Classical conditioning – temporal contiguity
[Figure: on/off time courses of CS and US for five timing schemes]
• Delayed conditioning: works pretty well
• Trace conditioning: less effective, since it requires some form of memory
• Long-delayed conditioning: only sometimes effective
• Simultaneous conditioning: often ineffective
• Backward conditioning: typically ineffective, but not always
Introduction 32
Classical conditioning – contingency
The conditioned stimulus indicates that the unconditioned stimulus
will appear:
P ( US | CS ) > P ( not US | CS )
Example:
P ( Food | Tone ) > P ( no food | Tone )
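The contingency condition can be checked on recorded trials by estimating both conditional probabilities from counts. The sketch below assumes each trial is stored as a pair of booleans `(cs_present, us_present)`; the function name and the example session are hypothetical.

```python
def us_probabilities_given_cs(trials):
    """Return (P(US | CS), P(not US | CS)) estimated from a list of trials.

    Each trial is a pair (cs_present, us_present) of booleans.
    """
    cs_trials = [us for cs, us in trials if cs]
    if not cs_trials:
        raise ValueError("no trials with the CS present")
    p_us = sum(cs_trials) / len(cs_trials)
    return p_us, 1.0 - p_us


# Hypothetical session: the tone is followed by food in 8 of 10 tone trials.
session = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 5
p_food, p_no_food = us_probabilities_given_cs(session)
print(p_food > p_no_food)  # True: the tone signals food
```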
Introduction 33
Classical conditioning – blocking
Two conditioned stimuli: CSA (Tone) and CSB (Light)
One unconditioned stimulus: US (electric shock)
Group              | Phase 1   | Phase 2         | Test | Result
Control group      | (none)    | CSA + CSB -> US | CSB  | Strong association
Experimental group | CSA -> US | CSA + CSB -> US | CSB  | Weak association
CSB is not sufficiently informative – the frequency of pairings is irrelevant
Introduction 34
Classical conditioning – blocking
Two conditioned stimuli: CSA (Tone) and CSB (Light)
Two unconditioned stimuli: US1 (mild electric shock), US2 (strong electric shock)
Group              | Phase 1    | Phase 2          | Test | Result
Control group      | (none)     | CSA + CSB -> US2 | CSB  | Strong association
Experimental group | CSA -> US1 | CSA + CSB -> US2 | CSB  | Strong association
CSB is now informative for US2
Introduction 35
Classical conditioning – blocking
Two conditioned stimuli: CSA (Tone) and CSB (Light)
One unconditioned stimulus: US (electric shock)
Group                | Phase 1      | Phase 2         | Test | Result
Control group        | (none)       | CSA + CSB -> US | CSB  | Strong association
Experimental group   | CSA -> US    | CSA + CSB -> US | CSB  | Weak association
Experimental group 2 | CSA -> no US | CSA + CSB -> US | CSB  | Very strong association
CSB is now even more informative
Introduction 36
Rescorla-Wagner Theory (1972)
• An organism learns if events violate its expectations
• Expectations are developed if relevant (salient) events follow a stimulus complex.
Introduction 37
Rescorla-Wagner Model
$\Delta V = \alpha (\lambda - V)$
$V$ = present association strength
$\Delta V$ = change of the association strength
$\alpha$ = learning rate
$\lambda$ = maximal association strength
Introduction 38
Parameters before conditioning
• $V = 0$ (no conditioning at this point)
• $\lambda = 100$ (arbitrarily chosen, but depends on the strength of the US)
• $\alpha = 0.5$ ($0 < \alpha < 1$)
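With these starting values, one application of the Rescorla-Wagner update can be written as a short sketch. The function name `rw_update` and the variable names are chosen here for illustration only.

```python
def rw_update(v, alpha, lam):
    """Return the change in association strength: delta V = alpha * (lambda - V)."""
    return alpha * (lam - v)


v = 0.0        # no conditioning yet
alpha = 0.5    # learning rate
lam = 100.0    # maximal association strength
delta_v = rw_update(v, alpha, lam)
v += delta_v
print(delta_v, v)  # 50.0 50.0
```

The update is largest when the US is most surprising (V far below λ) and shrinks as V approaches λ.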
Introduction 39
Trial 1
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 0) = 50$
[Figure: association strength (V) over trials 0 to 8; V = 50 after trial 1]
Introduction 40
Trial 2
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 50) = 25$
[Figure: association strength (V) over trials 0 to 8; V = 75 after trial 2]
Introduction 41
Trial 3
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 75) = 12.5$
[Figure: association strength (V) over trials 0 to 8; V = 87.5 after trial 3]
Introduction 42
Trial 4
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 87.5) = 6.25$
[Figure: association strength (V) over trials 0 to 8; V = 93.75 after trial 4]
Introduction 43
Trial 5
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 93.75) = 3.125$
[Figure: association strength (V) over trials 0 to 8; V = 96.88 after trial 5]
Introduction 44
Trial 6
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 96.88) = 1.56$
[Figure: association strength (V) over trials 0 to 8; V = 98.44 after trial 6]
Introduction 45
Trial 7
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 98.44) = 0.78$
[Figure: association strength (V) over trials 0 to 8; V = 99.22 after trial 7]
Introduction 46
Trial 8
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (100 - 99.22) = 0.39$
[Figure: association strength (V) over trials 0 to 8; V = 99.61 after trial 8]
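The eight acquisition trials follow a geometric pattern. As a short derivation that is not on the slides, and assuming $V_0 = 0$ with constant $\alpha$ and $\lambda$, the association strength after $n$ trials has a closed form that reproduces the value reached after trial 8:

```latex
\begin{align*}
V_{n} &= V_{n-1} + \alpha(\lambda - V_{n-1}) \\
\lambda - V_{n} &= (1-\alpha)(\lambda - V_{n-1}) = (1-\alpha)^{n}(\lambda - V_{0}) \\
V_{n} &= \lambda\left(1 - (1-\alpha)^{n}\right) \quad \text{for } V_{0} = 0 \\
V_{8} &= 100\left(1 - 0.5^{8}\right) = 100 - 0.390625 \approx 99.61
\end{align*}
```

The remaining distance to $\lambda$ shrinks by the factor $(1-\alpha)$ on every trial, which is why the curve flattens.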
Introduction 47
Extinction trial 1
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (0 - 99.61) = -49.8$ (with $\lambda = 0$, since the US is no longer presented)
[Figure: association strength (V) during acquisition (trials 0 to 8, ending at 99.61) and extinction (trials 0 to 6); V = 49.81 after extinction trial 1]
Introduction 48
Extinction trial 2
$\Delta V = \alpha (\lambda - V) = 0.5 \cdot (0 - 49.8) = -24.9$
[Figure: association strength (V) during acquisition (trials 0 to 8, ending at 99.61) and extinction (trials 0 to 6); V = 24.91 after extinction trial 2]
Introduction 49
Acquisition and extinction curves with $\alpha = 0.5$ and $\lambda = 100$
[Figure: association strength (V) over trials. Acquisition (trials 1 to 8): 50, 75, 87.5, 93.75, 96.88, 98.44, 99.22, 99.61. Extinction (trials 0 to 6): 99.61, 49.8, 24.91, 12.45, 6.23, 3.11, 1.56.]
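Both curves can be regenerated by applying the update rule repeatedly. The sketch below assumes that extinction is modeled by setting λ to 0 (no US), as in the extinction trials; the helper name `simulate` is hypothetical. Small differences in the last digit compared with the slide values come from rounding at each step on the slides.

```python
def simulate(v, alpha, lam, n_trials):
    """Apply the Rescorla-Wagner update n_trials times; return V after each trial."""
    history = []
    for _ in range(n_trials):
        v += alpha * (lam - v)
        history.append(v)
    return history


# Acquisition: CS paired with US (lambda = 100), starting from V = 0.
acquisition = simulate(0.0, alpha=0.5, lam=100.0, n_trials=8)
# Extinction: CS alone (lambda = 0), starting from the final acquisition value.
extinction = simulate(acquisition[-1], alpha=0.5, lam=0.0, n_trials=6)

print([round(v, 2) for v in acquisition])
print([round(v, 2) for v in extinction])
```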
Introduction 50
Acquisition and extinction curves with $\alpha = 0.5$ and $\alpha = 0.2$ ($\lambda = 100$)
$\Delta V = \alpha (\lambda - V)$
[Figure: two panels, Acquisition (trials 0 to 8) and Extinction (trials 0 to 6), each plotting association strength (V) for $\alpha = 0.5$ and $\alpha = 0.2$]
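One way to quantify the effect of the learning rate is to count how many acquisition trials are needed to reach a given fraction of λ. The 90% threshold and the function name `trials_to_reach` are arbitrary choices for this sketch.

```python
def trials_to_reach(fraction, alpha, lam=100.0):
    """Count acquisition trials until V reaches `fraction` of lambda (V starts at 0)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    v, n = 0.0, 0
    while v < fraction * lam:
        v += alpha * (lam - v)
        n += 1
    return n


for alpha in (0.5, 0.2):
    print(alpha, trials_to_reach(0.9, alpha))
```

With these settings the larger learning rate needs 4 trials and the smaller one 11, which matches the steeper curve for α = 0.5.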
Introduction 51
Combined stimuli
If multiple stimuli are present it is necessary to add up the individual association strengths:
$V_{comb} = V_{CS1} + V_{CS2}$
Trial 1:
$\Delta V_{Tone} = 0.2 \cdot (100 - 0) = 0.2 \cdot 100 = 20$
$\Delta V_{Light} = 0.2 \cdot (100 - 0) = 0.2 \cdot 100 = 20$
$V_{comb} = V_{comb}^{current} + \Delta V_{Tone} + \Delta V_{Light} = 0 + 20 + 20 = 40$
Trial 2:
$\Delta V_{Tone} = 0.2 \cdot (100 - 40) = 0.2 \cdot 60 = 12$
$\Delta V_{Light} = 0.2 \cdot (100 - 40) = 0.2 \cdot 60 = 12$
$V_{comb} = V_{comb}^{current} + \Delta V_{Tone} + \Delta V_{Light} = 40 + 12 + 12 = 64$
[Figure: association strength (V) over CS-US pairs 1 to 10 for a single stimulus and for the combined stimuli]
Introduction 52
Combined stimuli - Overshadowing
If the learning rates are different (since the stimuli are not equally salient), the more salient stimulus dominates the association:
Trial 1:
$\Delta V_{Tone} = 0.4 \cdot (100 - 0) = 0.4 \cdot 100 = 40$
$\Delta V_{Light} = 0.1 \cdot (100 - 0) = 0.1 \cdot 100 = 10$
$V_{comb} = V_{comb}^{current} + \Delta V_{Tone} + \Delta V_{Light} = 0 + 40 + 10 = 50$
Trial 2:
$\Delta V_{Tone} = 0.4 \cdot (100 - 50) = 0.4 \cdot 50 = 20$
$\Delta V_{Light} = 0.1 \cdot (100 - 50) = 0.1 \cdot 50 = 5$
$V_{comb} = V_{comb}^{current} + \Delta V_{Tone} + \Delta V_{Light} = 50 + 20 + 5 = 75$
[Figure: association strength (V) over CS-US pairs 1 to 10 for the salient and the less salient stimulus]
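The compound-stimulus calculations, with equal and with unequal learning rates, can be sketched with one helper in which all present stimuli share the same error term. The dictionary layout and the name `compound_trial` are assumptions made for this illustration.

```python
def compound_trial(v, alphas, lam=100.0):
    """One CS-US pairing with several stimuli present at once.

    `v` and `alphas` map stimulus names to association strengths and
    learning rates. All stimuli share the error term (lambda - V_comb).
    """
    v_comb = sum(v.values())
    error = lam - v_comb
    return {name: v[name] + alphas[name] * error for name in v}


# Equal salience: both learning rates are 0.2.
v = {"tone": 0.0, "light": 0.0}
for _ in range(2):
    v = compound_trial(v, {"tone": 0.2, "light": 0.2})
print(sum(v.values()))  # V_comb after two trials

# Overshadowing: the tone is more salient than the light.
w = {"tone": 0.0, "light": 0.0}
for _ in range(2):
    w = compound_trial(w, {"tone": 0.4, "light": 0.1})
print(w, sum(w.values()))
```

After two trials the first run gives a combined strength of about 64 and the second about 75, with the tone holding most of it (about 60 versus 15 for the light).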
Introduction 53
Blocking
The conditioning of CSA (Tone) in phase 1 makes up the largest proportion of $V_{comb}$. Thus, only a small proportion of $V_{comb}$ is left for CSB (Light) in phase 2.
Phase 1: $V_{comb} = V_{CS_A} = 100$
Phase 2: $V_{comb} = V_{CS_A} + V_{CS_B} = 100 + 0 = 100$
$\Delta V = \alpha (100 - V_{comb}) = 0$
If the maximal association strength $\lambda$ is larger in phase 2 (for example because of a stronger US), CSB can be conditioned in addition to CSA.
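The blocking effect falls out of the shared error term and can be reproduced in a small simulation. The learning rate of 0.2, the 20 trials per phase, and the function names are assumptions chosen for this sketch, not values from the experiment.

```python
def rw_trial(v, present, alpha, lam):
    """Update the stimuli listed in `present` using the shared error lambda - V_comb."""
    error = lam - sum(v[name] for name in present)
    for name in present:
        v[name] += alpha * error
    return v


def blocking_experiment(pretrain_a, alpha=0.2, lam=100.0, n_trials=20):
    """Return the association strength of CSB after phase 2."""
    v = {"A": 0.0, "B": 0.0}
    if pretrain_a:                      # phase 1: CSA -> US (experimental group)
        for _ in range(n_trials):
            rw_trial(v, ["A"], alpha, lam)
    for _ in range(n_trials):           # phase 2: CSA + CSB -> US (both groups)
        rw_trial(v, ["A", "B"], alpha, lam)
    return v["B"]


control = blocking_experiment(pretrain_a=False)
experimental = blocking_experiment(pretrain_a=True)
print(round(control, 2), round(experimental, 2))
```

In the control group CSB ends up with roughly half of λ, while in the pretrained group almost no error is left for CSB to absorb.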
Introduction 54
Conditioned inhibition
Two conditioned stimuli: CS+ (Tone) and CS- (Light)
One unconditioned stimulus: US (Food)
Learning phase:
CS+ -> US
CS- + CS+ -> no US
CSA -> US
Test and result:
CS+ -> CR
CS- + CS+ -> no CR
CSA + CS- -> no CR
Introduction 55
Conditioned inhibition
Two conditioned stimuli: CS+ (Tone) and CS- (Light)
One unconditioned stimulus: US (Food)
Association with CS+: $V_{CS+} = 100$
Association with the combination CS+ / CS-: $V_{comb} = V_{CS+} + V_{CS-} = 0$
Thus: $V_{CS-} = -100$
(Stimuli in trials alternate between CS+ and CS+/CS-)
[Figure: association strength from -100 to 100, with CS+ rising toward 100 and CS- falling toward -100]
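The negative association strength of CS- can be reproduced by alternating the two trial types under the Rescorla-Wagner rule. The learning rate of 0.2 and the 200 trial pairs are assumptions for this sketch.

```python
def conditioned_inhibition(alpha=0.2, lam=100.0, n_pairs=200):
    """Alternate CS+ -> US trials with CS+/CS- -> no US trials."""
    v_plus, v_minus = 0.0, 0.0
    for _ in range(n_pairs):
        # CS+ alone, followed by the US (target: lambda)
        v_plus += alpha * (lam - v_plus)
        # CS+ and CS- together, no US (target: 0); both share the same error
        error = 0.0 - (v_plus + v_minus)
        v_plus += alpha * error
        v_minus += alpha * error
    return v_plus, v_minus


v_plus, v_minus = conditioned_inhibition()
print(round(v_plus, 2), round(v_minus, 2))
```

The only stable point is the one where CS+ alone predicts the US and the compound predicts nothing, so CS- has to cancel CS+ exactly.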
Introduction 56
Problems of the Rescorla-Wagner model
Configural learning:
CSA -> US, CSB -> US, CSA + CSB -> no US
Solution: Implement CSA + CSB as a single new stimulus CSC
Latent inhibition:
First CS -> no US, then CS -> US results in only slow learning
Solution: Reduce the learning rate $\alpha$ through CS -> no US trials
Preferred and unprivileged conditioning:
Taste -> Nausea works better than Light -> Nausea
Solution: Make the learning rate $\alpha$ dependent on CS-US combinations
The model cannot explain all observations.
Introduction 57
Thorndike’s cat puzzles
E. L. Thorndike
(1874 - 1949)
A hungry cat is put into a cage.
If the cat shows a particular behavior (pulling a cord, turning a lock), the door opens and the cat can go outside and eat the food placed there.
Introduction 58
Thorndike’s Law of Effect
E. L. Thorndike
(1874 - 1949)
Of several responses made to the same situation,
those which are accompanied or closely
followed by satisfaction to the animal will, other
things being equal, be more firmly connected
with the situation, so that, when it recurs, they
will be more likely to recur; those which are
accompanied or closely followed by discomfort
to the animal will, other things being equal, have
their connections with that situation weakened,
so that, when it recurs, they will be less likely to
occur. (p. 244)
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Macmillan.
Introduction 59
Operant conditioning
• Behavior also occurs without external stimuli.
• “Free operants” instead of reactions
• Reinforcement processes primarily shape the behavior: positive reinforcement is the strengthening of behavior by the presentation of some event, and negative reinforcement is the strengthening of behavior by the removal or avoidance of some event.
B. F. Skinner
(1904 - 1990)
Introduction 60
Operant conditioning
“I would define operant conditioning
as shaping and maintaining behavior
by making sure that reinforcing
consequences follow”
B. F. Skinner
(1904 - 1990)
Introduction 61
Similarities between classical and instrumental conditioning
• Classical conditioning: contingency between stimulus 1 (CS) and stimulus 2 (US)
• Instrumental conditioning: contingency between stimulus 1, reaction, and stimulus 2
• Both show acquisition, extinction, and spontaneous recovery
• Both show dependence on contiguity (temporal proximity)
• In both cases contiguity alone is not sufficient
Introduction 62
Contingency vs. contiguity
Classical conditioning:
The conditioned stimulus predicts the occurrence of the unconditioned stimulus:
P ( US | CS ) > P ( not US | CS )
e.g.: P ( Food | Tone ) > P ( no Food | Tone )
Instrumental conditioning:
The response to the stimulus increases the probability that the reinforcer (V) appears:
P ( V | R, S ) > P ( not V | R, S )
e.g.: P ( Food | Button press after the tone ) > P ( no Food | Button press after the tone )
Introduction 63
1.5 Elements of Reinforcement Learning