Projective simulation with generalization
Alexey Melnikov
Institute for Theoretical Physics, University of Innsbruck
Institute for Quantum Optics and Quantum Information
Jointly with
Adi Makmal, Vedran Dunjko, and Hans J. Briegel
QI Seminar
March 11, 2015
Outline
◦ Introduction
– artificial intelligence (AI) and its applications
– the projective simulation (PS) model
◦ Generalization within PS Model
– previous approach and our motivation
– mechanism of generalization
– rules of wildcard clip creation
◦ Analytical results on performance
– learning curve
– learning time
– more than two categories
AI and intelligent agents
AI is the study of agents that receive percepts from the environment and
perform actions.* Any such AI program is called an intelligent agent.
[Figure: an intelligent agent receives percepts from the environment and performs actions on it.]
* S. Russell and P. Norvig. Artificial intelligence: A Modern Approach, 3rd edition (Prentice Hall,
2009).
AI in robotics
A robotic agent might have microphones, cameras, touch sensors and various
motors for actuators.*
[Figure: a robot exchanges sensor data (microphones, cameras, touch) and actuator commands (motors, voice) with its environment.]
Applications:
• robotics
• finance
• games
• QEC
• ...
* S. Russell and P. Norvig. Artificial intelligence: A Modern Approach, 3rd edition (Prentice Hall,
2009).
AI in finance
A trading agent perceives market rates and news, and places trades on the stock market.
[Figure: a trading agent receives rates and news from the stock market and places trades.]
AI in games
A game agent plays with you.
[Figure: a game agent observes your moves and responds with its own moves.]
AI in QEC
A QEC agent gets data from syndrome measurements and performs error
correction.*
[Figure: a QEC agent receives syndrome data from a quantum register and applies correcting unitaries.]
* J. Combes, et al. In-situ characterization of quantum devices with error correction. arXiv:1405.5656
(2014).
The PS agent
PS is a physical approach to AI. The PS agent processes information stochastically
in a directed, weighted network of clips, where each clip represents a remembered
percept, action, or sequences thereof.
[Figure: the PS agent's clip network; a percept enters as input, a random walk over weighted edges moves between percept clips, intermediate clips, and action clips, and the action clip that is hit produces the output.]
Once a percept is observed, the network is activated, invoking a random walk
between the clips, until an action clip is hit and couples out as a real action of the
agent.
* H. J. Briegel and G. De las Cuevas. Projective simulation for artificial intelligence. Scientific Reports 2, 400 (2012).
The PS agent
Each edge connects a clip $c_i$ with a clip $c_j$ and has a time-dependent weight $h^{(t)}(c_i, c_j)$, which we denote as the h-value. The h-values represent the unnormalized strengths of the edges and determine the hopping probability from clip $c_i$ to clip $c_j$ according to
$$ p^{(t)}(c_j \,|\, c_i) = \frac{h^{(t)}(c_i, c_j)}{\sum_k h^{(t)}(c_i, c_k)} . $$
The h-values are updated according to
$$ h^{(t+1)}(c_i, c_j) = h^{(t)}(c_i, c_j) - \gamma \left( h^{(t)}(c_i, c_j) - 1 \right) + \lambda , $$
where $0 \le \gamma \le 1$ is a damping parameter that allows the agent to forget its past experience, which may be useful when the environment changes, and $\lambda$ is a non-negative reward given by the environment.
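For concreteness, a minimal Python sketch of this random walk and update rule (an illustration, not the authors' implementation): the clip network is a dict of dicts of h-values, and the reward λ is assumed to be added only to the edges traversed on the current walk, while the damping acts on all edges.

```python
import random

def ps_walk(h, percept, action_clips):
    """Random walk through the clip network, starting from the percept clip.
    Hopping probability: p(c_j | c_i) = h(c_i, c_j) / sum_k h(c_i, c_k)."""
    clip, path = percept, []
    while clip not in action_clips:
        neighbours = list(h[clip])
        weights = [h[clip][c] for c in neighbours]
        nxt = random.choices(neighbours, weights=weights)[0]
        path.append((clip, nxt))
        clip = nxt
    return clip, path  # the action that couples out, and the edges that were used

def ps_update(h, path, lam, gamma):
    """h(t+1) = h(t) - gamma*(h(t) - 1) + lambda: damping acts on every edge,
    the reward lambda is added to the edges traversed on this walk."""
    for ci in h:
        for cj in h[ci]:
            h[ci][cj] -= gamma * (h[ci][cj] - 1.0)
    for ci, cj in path:
        h[ci][cj] += lam
```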
The basic PS network
An agent acts as a driver who should learn how to deal with traffic lights and arrow
signs. While driving the agent sees a traffic light with an arrow sign and should
choose between two actions: continue driving (+) or stop the car (−).
[Figure: the basic two-layer PS network; each percept clip connects with unit h-values to the two action clips + and −.]
The percepts that the agent perceives are composed of two categories, color and
direction:
S = {⇐, ⇒} × {green, red} .
For instance, at the first time step the PS agent perceives the (⇐, green) input.
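A usage sketch for this task, reusing ps_walk and ps_update from above and assuming the reward rule of phase (a) described later (reward for driving at green and stopping at red), with no forgetting (γ = 0):

```python
import itertools, random

percepts = list(itertools.product(['<=', '=>'], ['green', 'red']))
actions = {'+', '-'}
# basic two-layer network: each percept clip connects to both action clips with h = 1
h = {p: {a: 1.0 for a in actions} for p in percepts}

for t in range(1000):
    s = random.choice(percepts)                  # the environment shows a signal
    a, path = ps_walk(h, s, actions)             # the agent's random walk selects an action
    correct = (a == '+') == (s[1] == 'green')    # drive on green, stop on red
    ps_update(h, path, lam=1.0 if correct else 0.0, gamma=0.0)
```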
Generalization. Motivation
There are many tasks in which percepts are composed of several elements. Even
if two percept clips are different, they may share a common set of elements. This common set should be taken into account in order to share experience between different inputs.
Previous approach*:
+ experience is shared between the percept clips
+ the efficiency was shown to be better than that of the basic network
+ the agent is able to relearn after the meaning of the arrows changes
– a mask is used
– no notion of direction is learned
[Figure: the invasion game with color as an additional property; generalization is manifested via additional edges between percepts with the same arrow direction.]
* J. Mautner, A. Makmal, D. Manzano, M. Tiersch, and H. J. Briegel. New Generation Computing
33, 1 (2015).
Generalization
A learning agent capable of meaningful and useful generalization is expected to
have the following characteristics:
• An ability for categorization (recognizing that all red signals have a common
property, which we can refer to as redness)
• An ability to classify (a new red object is to be related to the group of objects
with the redness property)
• Optimally, only generalizations that are relevant for the survival or the success
of the agent should be learned (red signals should be treated the same, whereas
squared signals share no property that is of relevance in this context)
• Correct actions should be associated with relevant generalized properties (the
driver should stop whenever a red signal is shown)
• The generalization mechanism should be flexible
Mechanism of generalization
The key feature of this mechanism is the dynamical creation of a new kind of clip, which we call wildcard clips.
Whenever a new percept clip is created, it is compared pairwise with all existing clips. For each pair, a wildcard clip is created (if it does not yet exist) in which all differing elements are replaced by the "#" symbol. All matching clips connect to this wildcard clip with unit weights, and the wildcard clip itself connects to all other matching wildcard clips and to the action clips.
[Figure: build-up of the clip network, including wildcard (#) clips, at time steps t = 1 to t = 4.]
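A minimal sketch of this creation rule (clips are tuples of category values, and h is the dict-of-dicts network from the earlier sketch); it only illustrates the pairwise "#"-replacement and omits all other details of the full model:

```python
def wildcard(c1, c2):
    """Keep the common elements of two clips, replace differing ones by '#'."""
    return tuple(x if x == y else '#' for x, y in zip(c1, c2))

def matches(clip, wc):
    """A clip matches a wildcard clip if they agree on every non-# element."""
    return all(w == '#' or c == w for c, w in zip(clip, wc))

def add_percept(h, new_clip, actions):
    """Compare a newly created percept clip pairwise with all existing clips and
    create the resulting wildcard clips (if they do not exist yet)."""
    existing = list(h)                              # existing percept and wildcard clips
    h[new_clip] = {a: 1.0 for a in actions}         # new percept clip -> all actions
    for other in existing:
        wc = wildcard(new_clip, other)
        if '#' not in wc or wc in h:
            continue
        h[wc] = {a: 1.0 for a in actions}           # wildcard clip -> all actions
        for c in list(h):
            if c == wc:
                continue
            if matches(c, wc):
                h[c][wc] = 1.0                      # matching clips -> wildcard clip (unit weight)
            if '#' in c and matches(wc, c):
                h[wc][c] = 1.0                      # wildcard clip -> more general matching wildcards
```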
Mechanism of generalization
The reward rule of the environment changes every 1000 time steps:
(a) 1 ≤ t ≤ 1000: the agent is rewarded for stopping at a red light and for driving at a green light
(b) 1000 < t ≤ 2000: the agent is rewarded for doing the opposite
(c) 2000 < t ≤ 3000: the agent should only follow the arrows
(d) 3000 < t ≤ 4000: the environment rewards the agent whenever it chooses to drive
[Figure: snapshots of the clip network, including the wildcard clips, in each of the four phases (a)-(d).]
[Figure: the performance (efficiency E_t) of the PS agent with generalization over 4000 time steps, spanning the four phases (a)-(d) above.]
Necessity of generalization in learning
The environment shows one of n different arrows, but at each time step the background color is different. The agent can move in one of these n directions, and the environment rewards the agent whenever it follows the arrow, irrespective of its color.
[Figure: the basic two-layer PS network in the neverending-color scenario; each (arrow, color) percept clip connects to all n action clips.]
The basic PS agent's efficiency is E_basic = 1/n: it is no better than a random decision at every time step.
Necessity of generalization in learning
[Figure: the enhanced PS network as built up in the neverending-color scenario. Each percept and wildcard clip is connected to higher-level matching wildcard clips and to all n action clips. For clarity, only one-level edges to and from wildcard clips are drawn solid, while other edges are semi-transparent. The thickness of the edges does not reflect their weights.]
Asymptotic efficiency
[Figure: the converged enhanced network; the rewarded edge from the (⇐, #) wildcard clip to the correct action has effectively infinite weight, while all other edges keep unit weight.]
• We consider the efficiency E at time t → ∞
• p = 1/(n + 2) is the probability to hit the (⇐, #) clip after the first step of the random walk
• the asymptotic efficiency is independent of the value of λ

$$ E_\infty(n) = p + (1 - p)\,\frac{1}{n} = \frac{1 + 2n}{n(n + 2)} > \frac{1}{n} , \qquad p = \frac{1}{n + 2} , $$
$$ E_\infty(n) \,/\, E_\infty^{\mathrm{basic}}(n) \;\xrightarrow{\; n \to \infty \;}\; 2 . $$
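A quick Monte Carlo check of this expression under the same simplification (with probability p the walk hits the (⇐, #) clip first and then always takes the rewarded action; otherwise the chosen action is effectively uniform over the n directions):

```python
import random

def asymptotic_efficiency_mc(n, trials=200_000):
    """Monte Carlo of the simplified asymptotic model: correct with probability p,
    otherwise correct with probability 1/n."""
    p = 1.0 / (n + 2)
    hits = sum(1 for _ in range(trials)
               if random.random() < p or random.randrange(n) == 0)
    return hits / trials

for n in (2, 3, 5):
    print(n, round(asymptotic_efficiency_mc(n), 3), (1 + 2 * n) / (n * (n + 2)))
```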
Learning curve
• p_learn is the probability that the correct association has been learned
• We put λ → ∞ to simplify the analysis (λ = 1000 in the simulations)
• We assume that all wildcard clips are created at t = 1
• We assume that the edge from the (arrow, #) clip to the (#, #) clip is never rewarded

$$ E_t(n) = p_{\mathrm{learn}}(t) \, E_\infty(n) + \bigl(1 - p_{\mathrm{learn}}(t)\bigr) \frac{1}{n} , $$
$$ p_{\mathrm{learn}}(t) = 1 - \left( 1 - \frac{1}{n(n+1)(n+2)} \right)^{t-1} . $$
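The two formulas above can be evaluated directly; a small sketch:

```python
def p_learn(t, n):
    """Probability that the correct association has been learned by time step t."""
    return 1.0 - (1.0 - 1.0 / (n * (n + 1) * (n + 2))) ** (t - 1)

def learning_curve(t, n):
    """Analytic approximation E_t(n) of the efficiency at time step t."""
    e_inf = (1 + 2 * n) / (n * (n + 2))            # asymptotic efficiency from above
    return p_learn(t, n) * e_inf + (1.0 - p_learn(t, n)) / n

for n in (2, 3, 5):
    print(n, [round(learning_curve(t, n), 3) for t in (1, 100, 300, 600)])
```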
[Figure: learning curves for n = 2, 3, and 5; simulation results (λ = 1000) are compared with the asymptote E_∞(n) and the analytic approximation E_t(n) above.]
Learning time
• τ is the time at which the asymptotic efficiency is achieved
• the learning time is the expected value of τ
• p_learn(t) is the cumulative distribution function P(τ ≤ t − 1)
• the probability mass function is therefore P(τ = t) = P(τ ≤ t) − P(τ ≤ t − 1) = p_learn(t + 1) − p_learn(t)

$$ \mathbb{E}[\tau] = \sum_{t=1}^{\infty} t \, P(\tau = t) = \sum_{t=1}^{\infty} t \left[ \left( 1 - \frac{1}{n(n+1)(n+2)} \right)^{t-1} - \left( 1 - \frac{1}{n(n+1)(n+2)} \right)^{t} \right] = n(n+1)(n+2) . $$
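Since p_learn describes a geometric law, the expected learning time can be checked numerically by truncating the sum; a small sketch:

```python
def expected_learning_time(n, t_max=100_000):
    """Truncated sum of t * P(tau = t) for the geometric law implied by p_learn."""
    q = 1.0 - 1.0 / (n * (n + 1) * (n + 2))
    return sum(t * (q ** (t - 1) - q ** t) for t in range(1, t_max))

for n in (2, 3, 5):
    print(n, round(expected_learning_time(n), 2), n * (n + 1) * (n + 2))
```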
Three categories
[Figure: the enhanced PS network for percepts composed of three categories.]
• We consider the efficiency E at time t → ∞
• p = 2/(n + 4) is the probability to hit a wildcard clip that contains the arrow
• the asymptotic efficiency is independent of the value of λ

$$ E_\infty(n) = p + (1 - p) \frac{1}{n} = \frac{3}{n + 4} + \frac{2}{(n + 4)\, n} > \frac{1}{n} , $$
$$ E_\infty(n) \,/\, E_\infty^{\mathrm{basic}}(n) \;\xrightarrow{\; n \to \infty \;}\; 3 . $$
K categories
[Figure: the enhanced PS network for percepts composed of K categories.]
Network properties:
• K additional layers
• $\sum_{l=2}^{K+1} \binom{K-1}{l-2} = 2^{K-1}$ additional clips
• $\sum_{l=2}^{K} \binom{K-2}{l-2} = 2^{K-2}$ of the additional clips contain the arrow
• $p = 2^{K-2} / (n + 2^{K-1})$ is the probability to hit a wildcard clip that contains the arrow

$$ E_\infty(n, K) = p + (1 - p) \frac{1}{n} = \frac{n + (1 + n)\, 2^{K-2}}{n \, (n + 2^{K-1})} , $$
$$ E_\infty(n, K) \,/\, E_\infty^{\mathrm{basic}}(n) \;\xrightarrow{\; n \to \infty \;}\; 1 + 2^{K-2} . $$
[Figure: asymptotic efficiency E_∞(n, K) as a function of the number of categories K, for n = 2 and n = 2^10.]
• For K ≫ log₂ n the efficiency approaches (1 + 1/n)/2.
Accordingly, when the number of possible actions n is also large, in which case the performance of an agent with no generalization capabilities drops to 0, the enhanced PS agent succeeds with a probability larger than 1/2, which can be amplified to 1.
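A small sketch that evaluates E_∞(n, K) and its large-K limit (1 + 1/n)/2:

```python
def e_inf(n, K):
    """Asymptotic efficiency of the enhanced PS agent with K percept categories."""
    return (n + (1 + n) * 2 ** (K - 2)) / (n * (n + 2 ** (K - 1)))

for n in (2, 2 ** 10):
    values = [round(e_inf(n, K), 4) for K in (2, 5, 10, 20, 40)]
    print(n, values, '-> limit', (1 + 1 / n) / 2)   # approaches (1 + 1/n)/2 for K >> log2(n)
```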
Conclusion
◦ We presented a simple dynamical machinery that enables the PS model to
generalize
◦ We showed that relevant generalizations are learned, that correct actions are
associated with the relevant properties, and that the generalization
mechanism is flexible
◦ In the considered tasks, which feature a large and rapidly growing percept space, the PS model with generalization always outperforms the basic PS agent