Projective simulation with generalization

Alexey Melnikov
Institute for Theoretical Physics, University of Innsbruck
Institute for Quantum Optics and Quantum Information

Jointly with Adi Makmal, Vedran Dunjko, and Hans J. Briegel

QI Seminar, March 11, 2015

Outline

◦ Introduction
– artificial intelligence (AI) and its applications
– the projective simulation (PS) model
◦ Generalization within the PS model
– previous approach and our motivation
– mechanism of generalization
– rules of wildcard-clip creation
◦ Analytical analysis of performance
– learning curve
– learning time
– more than two categories

AI and intelligent agents

AI is the study of agents that receive percepts from the environment and perform actions.* Any AI program is called an intelligent agent.

[Figure: an intelligent agent exchanging percepts and actions with its environment.]

* S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edition (Prentice Hall, 2009).

AI in robotics

A robotic agent might have microphones, cameras and touch sensors as sensors, and various motors and a voice as actuators.*

Applications:
• robotics
• finance
• games
• QEC
• ...

AI in finance

A trading agent perceives market rates and news, and places trades on the stock market.

AI in games

A game agent plays with you: it perceives your moves and responds with its own moves.

AI in QEC

A QEC (quantum error correction) agent gets data from syndrome measurements on a quantum register and performs error correction by applying unitaries.*

* J. Combes et al., In-situ characterization of quantum devices with error correction, arXiv:1405.5656 (2014).

The PS agent

PS is a physical approach to AI. The PS agent processes information stochastically in a directed, weighted network of clips, where each clip represents a remembered percept, action, or sequences thereof.

[Figure: a clip network. Percept clips receive the input, action clips couple out the output, and intermediate clips are connected by directed, weighted edges with hopping probabilities p_ij.]

Once a percept is observed, the network is activated, invoking a random walk between the clips until an action clip is hit and couples out as a real action of the agent.*

* H. J. Briegel and G. De las Cuevas, Scientific Reports 2 (2012).
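To make the percept-action cycle concrete, here is a minimal Python sketch of the agent-environment interface (not code from the PS literature; all class and method names are illustrative). A PS agent would fill in act and learn with the random-walk deliberation and the h-value update described on the next slides.

```python
# Minimal sketch of the percept-action cycle: the environment emits
# percepts, the agent returns actions and receives a reward.
# All names here are illustrative.

import random

class InvasionEnvironment:
    """Toy environment: shows an arrow and rewards the matching move."""
    def __init__(self, arrows):
        self.arrows = arrows

    def percept(self):
        self.shown = random.choice(self.arrows)
        return self.shown

    def reward(self, action):
        return 1.0 if action == self.shown else 0.0

class Agent:
    """Placeholder agent; a PS agent would replace act/learn with a
    random walk over its clip network and an h-value update."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, percept):
        return random.choice(self.actions)  # uniform, no learning yet

    def learn(self, percept, action, reward):
        pass  # a learning agent updates its memory here

env, agent = InvasionEnvironment(["<-", "->"]), Agent(["<-", "->"])
for t in range(10):
    s = env.percept()
    a = agent.act(s)
    agent.learn(s, a, env.reward(a))
```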
The PS agent

Each edge connects a clip c_i with a clip c_j and has a time-dependent weight h^(t)(c_i, c_j), which we call the h-value. The h-values represent the unnormalized strengths of the edges and determine the hopping probability from clip c_i to clip c_j according to

p^(t)(c_j | c_i) = h^(t)(c_i, c_j) / Σ_k h^(t)(c_i, c_k).

The h-values are updated according to

h^(t+1)(c_i, c_j) = h^(t)(c_i, c_j) − γ (h^(t)(c_i, c_j) − 1) + λ,

where 0 ≤ γ ≤ 1 is a damping parameter that allows the agent to forget its past experience, which may be useful when the environment changes, and λ is a non-negative reward given by the environment.

The basic PS network

An agent acts as a driver who must learn how to deal with traffic lights and arrow signs. While driving, the agent sees a traffic light with an arrow sign and chooses between two actions: continue driving (+) or stop the car (−).

The percepts the agent perceives are composed of two categories, direction and color: S = {⇐, ⇒} × {green, red}. For instance, at the first time step the PS agent perceives the (⇐, green) input.

[Figure: the two-layer network is built up step by step. A new percept clip is added the first time the corresponding percept is encountered, and is connected to both action clips + and −.]

(A minimal code sketch of this basic network is given below.)

Generalization. Motivation

There are many tasks in which percepts are composed of several elements. Even if two percept clips are different, they may contain a common set of elements. This common set of elements should be taken into account in order to share experience between different inputs.
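The following is a minimal sketch (mine, not from the paper) of the basic two-layer PS network on the traffic-light task, implementing the hopping probabilities and h-value update rule above. Parameter values are illustrative; γ is set to 0 since the environment here is static.

```python
# Basic two-layer PS agent on the traffic-light task, implementing
# p(a|s) = h(s,a) / sum_k h(s,k) and the update
# h(t+1) = h(t) - gamma*(h(t) - 1) + lambda from the slides.

import random

PERCEPTS = [(d, c) for d in ("<=", "=>") for c in ("green", "red")]
ACTIONS = ["+", "-"]   # drive / stop
GAMMA = 0.0            # damping (forgetting); 0 for a static task

# One edge from every percept clip to every action clip,
# initialized to h = 1 (uniform behaviour before any reward).
h = {(s, a): 1.0 for s in PERCEPTS for a in ACTIONS}

def act(percept):
    """Single hop from a percept clip to an action clip."""
    weights = [h[(percept, a)] for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def update(percept, action, reward):
    """Damp all edges toward 1, then add the reward (lambda) to the
    traversed edge only."""
    for edge in h:
        h[edge] -= GAMMA * (h[edge] - 1.0)
    h[(percept, action)] += reward

# Reward rule of phase (a): drive on green, stop on red.
for t in range(1000):
    s = random.choice(PERCEPTS)
    a = act(s)
    correct = "+" if s[1] == "green" else "-"
    update(s, a, reward=1.0 if a == correct else 0.0)
```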
Previous approach*

+ the experience is shared between the percept clips
+ the efficiency was shown to be better than that of the basic network
+ the agent is able to relearn after the meaning of the arrows changes
– a mask is used
– no notion of direction is learned

The invasion game with color as an additional property.* Generalization is manifested via additional edges between percepts with the same arrow direction.

* J. Mautner, A. Makmal, D. Manzano, M. Tiersch, and H. J. Briegel, New Generation Computing 33, 1 (2015).

Generalization

A learning agent capable of a meaningful and useful generalization is expected to have the following characteristics:
• an ability for categorization (recognizing that all red signals have a common property, which we can refer to as redness)
• an ability to classify (a new red object is to be related to the group of objects with the redness property)
• optimally, only generalizations that are relevant for the survival or the success of the agent should be learned (red signals should be treated the same, whereas squared signals share no property that is of relevance in this context)
• correct actions should be associated with relevant generalized properties (the driver should stop whenever a red signal is shown)
• the generalization mechanism should be flexible

Mechanism of generalization

The key feature of this mechanism is the dynamical creation of a new kind of clip, which we call wildcard clips. Whenever a new clip is created, it is compared pairwise to all existing clips. For each pair of clips, a new clip is created (if it does not yet exist) in which all differing elements are replaced by the "#" symbol. All matching clips connect to the new clip with unit weights. The new wildcard clip connects to all other matching wildcard clips and to the actions. (A code sketch of this creation rule is given below.)

[Figure: snapshots of the network at t = 1, 2, 3, 4, showing wildcard (#) clips appearing as new percepts arrive.]

Mechanism of generalization

The agent is trained in four phases:
(a) 1 ≤ t ≤ 1000: the agent is rewarded for stopping at red light and for driving at green light
(b) 1000 < t ≤ 2000: the agent is rewarded for doing the opposite
(c) 2000 < t ≤ 3000: the agent should only follow the arrows
(d) 3000 < t ≤ 4000: the environment rewards the agent whenever it chooses to drive

[Figure: the performance of the PS agent with generalization, i.e. the efficiency E_t over the 4000 time steps spanning phases (a)-(d).]
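The wildcard-clip creation rule can be sketched in a few lines of Python. This is only an illustration of the pairwise "#"-merging described above, not code from the paper, and it omits the network wiring (the unit-weight edges to matching clips and to the action clips).

```python
# Sketch of wildcard-clip creation: a new percept clip is compared
# pairwise with every existing clip; for each pair, the clip with all
# differing elements replaced by '#' is created if it does not exist.

def wildcard(clip_a, clip_b):
    """Component-wise merge: keep common elements, '#' elsewhere."""
    return tuple(x if x == y else "#" for x, y in zip(clip_a, clip_b))

def add_clip(new_clip, clips):
    """Add a clip and, recursively, the wildcard clips it generates
    against all previously existing clips."""
    if new_clip in clips:
        return
    existing = list(clips)   # snapshot before mutation
    clips.add(new_clip)
    for other in existing:
        merged = wildcard(new_clip, other)
        if merged != new_clip and merged != other:
            add_clip(merged, clips)

clips = set()
for percept in [("<=", "green"), ("<=", "red"), ("=>", "red")]:
    add_clip(percept, clips)
# clips now also contains ("<=", "#"), ("#", "red") and ("#", "#")
```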
Necessity of generalization in learning

The environment shows one of n different arrows, but at each time step the background color is different. The agent can only move in one of these n directions, and the environment rewards the agent whenever it follows the arrow, irrespective of its color.

Because the color never repeats, every percept is new to the basic PS agent, whose efficiency is

E_basic = 1/n,

no better than a random decision at every time step.

[Figure: the enhanced PS network as built up in the neverending-color scenario. Each percept and wildcard clip is connected to higher-level matching wildcard clips and to all n action clips. For clarity, only one-level edges to and from wildcard clips are solid, while other edges are semitransparent. The thickness of the edges does not reflect their weights.]

Asymptotic efficiency

• we consider the efficiency E at time t → ∞
• p = 1/(n + 2) is the probability to hit the (⇐, #) clip after the first step of the random walk
• the asymptotic efficiency is independent of the value of λ

E_∞(n) = p + (1 − p) · 1/n = (1 + 2n) / (n(n + 2)) > 1/n,   with p = 1/(n + 2)

E_∞(n) / E_∞^basic(n) → 2 as n → ∞

Learning curve

• p_learn(t) is the probability that the correct association has been learned by time t
• we put λ → ∞ to simplify the analysis (λ = 1000 in the simulations)
• we assume all wildcard clips are created at t = 1
• we assume that the edge from the (arrow, #) clip to the (#, #) clip is never rewarded

E_t(n) = p_learn(t) E_∞(n) + (1 − p_learn(t)) · 1/n,

p_learn(t) = 1 − (1 − 1/(n(n + 1)(n + 2)))^(t−1)

[Figure: learning curves E_t(n) for n = 2, 3, 5, comparing simulations with the asymptote of Eq. (3) and the approximation of Eq. (4).]

Learning time

• τ is the time at which the asymptotic efficiency is achieved
• the learning time is the expected value of τ
• p_learn(t) is a cumulative distribution function, p_learn(t) = P(τ ≤ t − 1)
• the probability mass function P(τ = t) is therefore given by P(τ ≤ t) − P(τ ≤ t − 1) = p_learn(t + 1) − p_learn(t)

E[τ] = Σ_{t=1..∞} t P(τ = t) = Σ_{t=1..∞} t [(1 − 1/(n(n+1)(n+2)))^(t−1) − (1 − 1/(n(n+1)(n+2)))^t] = n(n + 1)(n + 2)

Three categories

• we consider the efficiency E at time t → ∞
• p = 2/(n + 4) is the probability to hit a wildcard clip with an arrow
• the asymptotic efficiency is independent of the value of λ

E_∞(n) = p + (1 − p) · 1/n = 3/(n + 4) + 2/(n(n + 4)) > 1/n

E_∞(n) / E_∞^basic(n) → 3 as n → ∞
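Before moving to the general case, a small numeric sanity check of the closed-form expressions above may help. The formulas are transcribed from the slides; the function names are illustrative, and no simulation is attempted.

```python
# Numeric sanity check of the closed-form results for the two- and
# three-category scenarios. Formulas transcribed from the slides.

def e_inf(n, p):
    """Asymptotic efficiency: hit a useful wildcard clip with
    probability p, otherwise act at random among n actions."""
    return p + (1 - p) / n

def p_learn(t, n):
    """Probability that the correct association is learned by step t."""
    return 1 - (1 - 1 / (n * (n + 1) * (n + 2))) ** (t - 1)

def e_t(t, n):
    """Learning curve E_t(n) interpolating between 1/n and E_inf."""
    e = e_inf(n, p=1 / (n + 2))
    return p_learn(t, n) * e + (1 - p_learn(t, n)) / n

for n in (2, 3, 5):
    # two categories: E_inf = (1 + 2n) / (n(n + 2))
    assert abs(e_inf(n, 1 / (n + 2)) - (1 + 2 * n) / (n * (n + 2))) < 1e-12
    # three categories: E_inf = 3/(n + 4) + 2/(n(n + 4))
    assert abs(e_inf(n, 2 / (n + 4)) - (3 / (n + 4) + 2 / (n * (n + 4)))) < 1e-12
    # learning time E[tau] = n(n+1)(n+2); E_t is near E_inf for t >> E[tau]
    print(n, n * (n + 1) * (n + 2), e_t(10 * n * (n + 1) * (n + 2), n))
```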
K categories

Network properties:
• K additional layers
• Σ_{l=2..K+1} C(K−1, l−2) = 2^(K−1) additional clips
• Σ_{l=2..K} C(K−2, l−2) = 2^(K−2) of the additional clips contain an arrow
• p = 2^(K−2) / (n + 2^(K−1)) is the probability to hit a wildcard clip with an arrow

E_∞(n, K) = p + (1 − p) · 1/n = (n + (1 + n) 2^(K−2)) / (n(n + 2^(K−1)))

E_∞(n, K) / E_∞^basic(n) → 1 + 2^(K−2) as n → ∞

K categories

[Figure: asymptotic efficiency E_∞(n, K) as a function of the number of categories K, for n = 2 and n = 2^10.]

• for K ≫ log n the efficiency goes to (1 + 1/n)/2

Accordingly, when the number of possible actions n is also large, in which case the performance of an agent with no generalization capabilities would drop to 0, the enhanced PS agent would succeed with a probability larger than 1/2, which can be amplified to 1.

Conclusion

◦ We presented a simple dynamical machinery that enables the PS model to generalize.
◦ We showed that relevant generalizations are learned, that correct actions are associated with the relevant properties, and that the generalization mechanism is flexible.
◦ In the considered tasks, with a large and rapidly growing percept space, the PS model with generalization always outperforms the basic PS agent.