Motivated Learning for Machine Intelligence_ Nov

Motivated Learning based on Goal Creation
Janusz Starzyk
School of Electrical Engineering and Computer Science, Ohio University, USA
www.ent.ohiou.edu/~starzyk
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 4 December 2009
EE141

Outline
- Embodied Intelligence (EI)
- Embodiment of Mind
- How to Motivate a Machine
- Goal Creation Hierarchy
- GCS Experiment
- Motivated Learning

Design principles of intelligent systems
From Rolf Pfeifer, "Understanding Intelligence", 1999
- Interaction with complex environment
- Cheap design
- Ecological balance
- Redundancy principle
- Parallel, loosely coupled processes
- Asynchronous
- Sensory-motor coordination
- Value principle
Agent drawing by Ciarán O'Leary, Dublin Institute of Technology

Embodied Intelligence
Definition: Embodied Intelligence (EI) is a mechanism that learns how to survive in a hostile environment.
  – Mechanism: biological, mechanical, or virtual agent with embodied sensors and actuators
  – EI acts on the environment and perceives its actions
  – Environment hostility is persistent and stimulates EI to act
  – Hostility: direct aggression, pain, scarce resources, etc.
  – EI learns, so it must have associative self-organizing memory
  – Knowledge is acquired by EI

Embodiment of a Mind
- Embodiment is a part of the environment under control of the mind
- It contains the intelligence core and sensory-motor interfaces to interact with the environment
- It is necessary for development of intelligence
- It is not necessarily constant
[Diagram labels: Embodiment, Intelligence core, Sensors channel, Actuators channel, Environment]

Embodiment of Mind
- Changes in embodiment modify the brain's self-determination
- The brain learns its own body's dynamics
- Self-awareness is a result of identification with one's own embodiment
- Embodiment can be extended by using tools and machines
- Successful operation is a function of correct perception of the environment and of one's own embodiment

How to Motivate a Machine?
A fundamental question is what motivates an agent to do anything and, in particular, to enhance its own complexity. What drives an agent to explore the environment and learn ways to interact with it effectively?

How to Motivate a Machine?
- Pfeifer claims that an agent's motivation should emerge from the developmental process.
- He called this the "motivated complexity" principle.
- Chicken-and-egg problem? An agent must have a motivation to develop, while its motivation comes from development?
- Steels suggested equipping an agent with self-motivation.
- "Flow", experienced when people perform their expert activity well, would motivate them to accomplish even more complex tasks.
- But what is the mechanism of "flow"?
- Oudeyer proposed an intrinsic motivation system.
- Motivation comes from a desire to minimize the prediction error.
- Similar to "artificial curiosity" presented by Schmidhuber.

How to Motivate a Machine?
- Although artificial curiosity helps to explore the environment, it leads to learning without a specific purpose.
- It may be compared to exploration in reinforcement learning.
- Exploration is needed in order to learn and to model the environment.
- But is exploration the only motivation we need to develop EI?
- Can we find a more efficient mechanism for learning?
- I suggest a simpler mechanism to motivate a machine.

How to Motivate a Machine?
- I suggest that it is the hostility of the environment, as stated in the definition of EI, that is the most effective motivational factor.
- It is the pain we receive that moves us.
- It is our intelligence, determined to reduce this pain, that motivates us to act, learn, and develop.
- Both are needed: hostility of the environment and intelligence that learns how to reduce the pain.
- Thus pain is good.
- Without pain we would not be motivated to develop.
Fig.
englishteachermexico.wordpress.com/

Motivated Learning
- I suggest a goal-driven mechanism to motivate a machine to act, learn, and develop.
  – A simple pain-based goal creation system.
  – It uses externally defined pain signals that are associated with primitive pains.
  – The machine is rewarded for minimizing the primitive pain signals.
- Definition: Motivated learning (ML) is learning based on the self-organizing system of goal creation in an embodied agent.
  – The machine creates abstract goals based on the primitive pain signals.
  – It receives internal rewards for satisfying its goals (both primitive and abstract).
  – ML applies to EI working in a hostile environment.

Pain-center and Goal Creation
- Simple mechanism
- Creates hierarchy of values
- Leads to formulation of complex goals
- Reinforcement:
  • Pain increase
  • Pain decrease
- Forces exploration
[Diagram labels: pain detection/goal creation center, dual pain memory, pain detection, pain increase, pain decrease, need, expectation, inhibition, stimulation, sensor activation, missing objects, reinforcement neuro-transmitter, sensory neuron, motor neuron]

Abstract Goal Creation for ML
- The goal is to reduce the primitive pain level
- Abstract goals are created if they satisfy the primitive goals
[Diagram: sensory pathway (perception, sense) and motor pathway (action, reaction). Primitive level: Stomach, Pain, Dual pain. Level I: Food, Eat, abstract pain (delayed memory of pain). Level II: Refrigerator, Open. "Food" becomes a sensory input to the abstract pain center. Legend: Association, Inhibition, Reinforcement, Connection, Planning, Expectation]

Goal Creation Experiment

PAIR #   SENSORY   MOTOR      INCREASES           DECREASES
1        Food      Eat        sugar level         food supplies
8        Grocery   Buy        food supplies       money at hand
15       Bank      Withdraw   money at hand       spending limits
22       Office    Work       spending limits     job opportunities
29       School    Study      job opportunities   -

Sensory-motor pairs and their effect on the environment

Goal Creation Experiment in ML
[Figure: pain versus discrete time (0 to 600); panels: Primitive Hunger, Lack of Food, Empty Grocery]
Pain signals in GCS simulation

Goal Creation Experiment in ML
[Figure: Goal Scatter Plot; goal ID (0 to 40) versus discrete time (0 to 600)]
Action scatters in 5 GCS simulations

Goal Creation Experiment in ML
[Figure: pain versus discrete time (0 to 600); panels: Primitive Hunger, Lack of Food, Empty Grocery, Lack of Money, Lack of Job Opportunities]
The average pain signals in 100 GCS simulations

Compare RL (TDF) and ML (GCS)
Mean primitive pain Pp value as a function of the number of iterations:
- green line for TDF
- blue line for GCS
Primitive pain ratio with pain threshold 0.1

Compare RL (TDF) and ML (GCS)
- Comparison of execution time on log-log scale
- TD-Falcon: green
- GCS: blue
- Combined efficiency of GCS is 1000 times better than that of TDF
Problem solved
Conclusion: embodied intelligence, with motivated learning based on goal creation, is an effective learning and decision-making system for dynamic environments.

Reinforcement Learning                       Motivated Learning
Single value function                        Multiple value functions (one for each goal)
Measurable rewards (can be optimized)        Internal rewards (cannot be optimized)
Predictable                                  Unpredictable
Objectives set by designer                   Sets its own objectives
Maximizes the reward (potentially unstable)  Solves minimax problem (always stable)
Learning effort increases with complexity    Learns better in complex environment than RL
Always active                                Acts when needed

Sounds like science fiction
If you're trying to look far ahead, and what you see seems like science fiction, it might be wrong. But if it doesn't seem like science fiction, it's definitely wrong.
From presentation by Foresight Institute

Questions?

Resources – Evolution of Electronics
[Figure] From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
[Figure] By Gordon E. Moore
[Figure: Clock Speed (doubles every 2.7 years)] From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006

Doubling (or Halving) times

Dynamic RAM Memory "Half Pitch" Feature Size   5.4 years
Dynamic RAM Memory (bits per dollar)           1.5 years
Average Transistor Price                       1.6 years
Microprocessor Cost per Transistor Cycle       1.1 years
Total Bits Shipped                             1.1 years
Processor Performance in MIPS                  1.8 years
Transistors in Intel Microprocessors           2.0 years
Microprocessor Clock Speed                     2.7 years

From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
[Figure] From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
[Figure] From Hans Moravec, Robot, 1999

Software or hardware?

Software                             Hardware
Sequential                           Concurrent
Error prone                          Robust
Requires programming                 Requires design
Low cost                             Significant cost
Well developed programming methods   Hardware prototypes hard to build

Future software/hardware capabilities
[Figure: number of neurons (10^4 to 10^11) versus year (2005 to 2040); curves: Analog VLSI, Hardware approach (FPGA), Software Simulation (PC based); reference level: Human brain complexity]

Why should we care?
Source: SEMATECH

Design Productivity Gap -> Low-Value Designs?
Percent of die area that must be occupied by memory to maintain SOC design productivity
[Figure: percent of die area (0% to 100%) by year (1999, 2002, 2005, 2008, 2011, 2014); series: % Area Memory, % Area Reused Logic, % Area New Logic]
Source: Japanese system-LSI industry

Self-Organizing Learning Arrays (SOLAR)
* Self-organization
* Sparse and local interconnections
* Dynamically reconfigurable
* Online data-driven learning

Integrated circuits connect transistors into a system
- millions of transistors easily assembled
- first 50 years of microelectronic revolution

Self-organizing arrays connect processors into a system
- millions of processors easily assembled
- next 50 years of microelectronic revolution
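
Returning to the Goal Creation Experiment: the sensory-motor pairs table can be read as a chain in which each action restores one resource by consuming the next one. The Python sketch below is only an illustration of that reading, not the GCS algorithm from the talk. The starting levels, the step size of 0.25, the pain formula, and the function names (pain, choose_action, apply_action, simulate) are all assumptions made for this example; only the five pairs and their INCREASES/DECREASES columns come from the table.

```python
# Illustrative sketch only: a hand-coded walk down the resource chain from the
# sensory-motor pairs table. The real GCS learns these associations itself;
# the levels, step size, and pain formula here are assumptions.

# (sensory, motor, resource increased, resource decreased)
PAIRS = [
    ("Food", "Eat", "sugar level", "food supplies"),
    ("Grocery", "Buy", "food supplies", "money at hand"),
    ("Bank", "Withdraw", "money at hand", "spending limits"),
    ("Office", "Work", "spending limits", "job opportunities"),
    ("School", "Study", "job opportunities", None),
]

STEP = 0.25  # assumed amount moved between resources by one action


def pain(level):
    """Assumed pain signal: shortfall of a resource from a full level of 1.0."""
    return max(0.0, 1.0 - level)


def choose_action(levels):
    """Start at the primitive goal (Eat) and descend while the resource
    an action would consume is too depleted to pay for it."""
    i = 0
    while PAIRS[i][3] is not None and levels[PAIRS[i][3]] < STEP:
        i += 1
    return PAIRS[i]


def apply_action(levels, pair):
    """Apply the INCREASES / DECREASES columns of one table row."""
    _, _, increased, decreased = pair
    levels[increased] = min(1.0, levels[increased] + STEP)
    if decreased is not None:
        levels[decreased] = max(0.0, levels[decreased] - STEP)


def simulate(steps):
    """Run from an all-empty environment; return the motor actions taken
    and the final resource levels."""
    levels = {increased: 0.0 for _, _, increased, _ in PAIRS}
    actions = []
    for _ in range(steps):
        pair = choose_action(levels)
        apply_action(levels, pair)
        actions.append(pair[1])
    return actions, levels


if __name__ == "__main__":
    actions, levels = simulate(5)
    print(actions)
    print({name: pain(level) for name, level in levels.items()})
```

With every resource empty at the start, the first five actions in this sketch are Study, Work, Withdraw, Buy, Eat: the hunger pain can only be reduced after each deeper shortage has been relieved, which is the sense in which abstract goals serve the primitive one.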
