Toward Grounding Knowledge in Prediction
or Toward a Computational Theory of Artificial Intelligence

Rich Sutton, AT&T Labs
with thanks to Satinder Singh and Doina Precup

It's Hard to Build Large AI Systems
• Brittleness
• Unforeseen interactions
• Scaling
• Requires too much manual complexity management
  – people must understand, intervene, patch and tune
  – like programming
• Need more autonomy
  – learning, verification
  – internal coherence of knowledge and experience

Marr's Three Levels of Understanding
• Marr proposed three levels at which any information-processing machine must be understood
  – Computational Theory Level: what is computed and why
  – Representation and Algorithm Level
  – Hardware Implementation Level
• We have little computational theory for Intelligence
  – many methods for knowledge representation, but no theory of knowledge
  – no clear problem definition
  – Logic

Reinforcement Learning provides a little Computational Theory
• Policies (controllers): $\pi : States \to \Pr(Actions)$
• Value Functions: $V^\pi : States \to \mathbb{R}$, where
  $V^\pi(s) = E\{\sum_{t=1}^{\infty} \gamma^{t-1} r_t \mid s_0 = s,\ \text{follow } \pi\}$
• 1-Step Models: $\Pr\{s_{t+1} \mid s_t, a_t\}$ and $E\{r_{t+1} \mid s_t, a_t\}$

Outline of Talk
• Experience
• Knowledge ⇒ Prediction
• Macro-Predictions
• Mental Simulation
...offering a coherent candidate computational theory of intelligence

Experience
• An AI agent should be embedded in an ongoing interaction with a world
  [Diagram: Agent sends actions to World; World sends observations to Agent. Experience = these 2 time series.]
• Enables a clear definition of the AI problem
  – let {reward_t} be a function of {observation_t}
  – choose actions to maximize total reward
  – cf. textbook definitions
• Experience provides something for knowledge to be about

What is Knowledge?
• Deny the physical world
• Deny existence of objects, people, space…
• Deny all non-answers, correspondence theories
• All we really know about is our experience
⇒ Knowledge must be in terms of experience

Grounded Knowledge
"A is always followed by B"
• if $o_t = A$ then $o_{t+1} = B$  (A, B observations)
• if $A(o_t)$ then $B(o_{t+1})$  (A, B predicates)
• history conditioning: if $A(h_t)$ then $B(h_{t+1})$, where $h_t = o_t, a_{t-1}, o_{t-1}, a_{t-2}, o_{t-2}, \ldots$
• action conditioning: if $A(h_t)$ and $C(a_t)$ then $B(h_{t+1})$
All of these are predictions.

World Knowledge ⇒ Predictions
• The world is a black box, known only by its I/O behavior (observations in response to actions)
• Therefore, all meaningful statements about the world are statements about the observations it generates
• The only observations worth talking about are future ones
• Therefore: the only meaningful things to say about the world are predictions

Non-predictive "Knowledge"
• Mathematical knowledge, theorems and proofs
  – always true, but tell us nothing about the world
  – not world knowledge
• Uninterpreted signals, e.g., useful representations
  – real and useful, but not by themselves world knowledge, only an aid to acquiring it
• Knowledge of the past
• Policies
  – could be viewed as predictions of value
  – but by themselves are more like uninterpreted signals
Predictions capture "regular", descriptive world knowledge.

Grounded Knowledge (extended)
The 1-step predictions above are still a pretty limited kind of knowledge: they can't say anything beyond one step! Generalize both the condition and the outcome:
• if $A(h_t)$ and <arbitrary experiment> then B(<outcome>), many steps later and many steps long: a macro-prediction
• the condition $A(h_t)$ provides prior grounding; the outcome B provides posterior grounding

Both Prior and Posterior Grounding are Needed
• "Classical" AI systems omit prior grounding
  – e.g., "Tweety is a bird", "John loves Mary"
  – sometimes called the "symbol grounding problem"
• Modern AI systems tend to skimp the posterior
  – supervised learning, Bayes nets, robotics…
• It is not OK to leave posterior grounding to external, human observers
  – the information is just not in the machine
  – we don't understand it; we haven't done our job!
• Yet this is such an appealing shortcut that we have almost always done it

Outline of Talk (revisited)
• Experience
• Knowledge ⇒ Prediction
• Macro-Predictions
• Mental Simulation
...offering a coherent candidate computational theory of intelligence

Macro-Predictions (Options)
a la Sutton, Precup & Singh, 1999, et al.
• Let $\pi : States \to \Pr(Actions)$ be an arbitrary policy
• Let $\beta : States \to \Pr(\{0,1\})$ be a termination condition
• Then $\langle \pi, \beta \rangle$ is a kind of experiment
  – follow $\pi$ until $\beta = 1$
  – measure something about the resulting experience
• Suppose we measure the outcome:
  – the state at the end of the experiment
  – the total reward during the experiment
• Then the macro-prediction for $\langle \pi, \beta \rangle$ would predict $\Pr(\text{end-state})$ and $E\{\text{total reward}\}$ given the start-state
• This is a very general, expressive form of prediction (a concrete code sketch follows the figures below)

Rooms Example (Sutton, Precup, & Singh, 1999)
[Figure: a four-room gridworld with hallways and goals G1, G2. The 4 stochastic primitive actions (up, down, left, right) fail 33% of the time; there are 8 multi-step options, one to each room's 2 hallways. Inset: the policy of one option, directing every cell in a room toward its target hallway.]

Planning with Macro-Predictions
[Figure: value iteration toward V(goal) = 1 in the rooms gridworld, iterations #0, #1, #2, once with cell-to-cell primitive actions and once with room-to-room options; with options, value propagates room-by-room rather than cell-by-cell.]

Learning Path-to-Goal with and without Hallway Macros (Options)
[Figure: log-log learning curves of steps per episode (1 to 1000) against episodes (10 to 10,000), for primitive actions alone, macros & actions, and macros alone.]
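The options machinery above is easy to make concrete. Below is a minimal Python sketch, not from the talk: a hand-built one-dimensional corridor stands in for a room, and the macro-prediction for an option $\langle \pi, \beta \rangle$ is estimated by Monte Carlo rollouts. The corridor, the rewards, and all names (step, pi, beta, run_option) are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of an option <pi, beta> and a
# Monte Carlo estimate of its macro-prediction in a toy corridor world.
import random
from collections import Counter

N = 10  # corridor cells 0..9; cells 0 and 9 are "hallways"

def step(s, a):
    """Toy world: move by a (+1 or -1), but fail 1/3 of the time."""
    if random.random() < 1/3:
        a = -a                          # stochastic failure: move reversed
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if s2 == N - 1 else -0.01   # goal reward, small step cost
    return s2, r

def pi(s):
    """The option's policy: always head right, toward hallway 9."""
    return +1

def beta(s):
    """Termination condition: terminate with probability 1 in a hallway."""
    return 1.0 if s in (0, N - 1) else 0.0

def run_option(s):
    """Execute <pi, beta> from s; return (end state, total reward)."""
    total = 0.0
    while random.random() >= beta(s):
        s, r = step(s, pi(s))
        total += r
    return s, total

# The macro-prediction for <pi, beta> started in state 4:
# Pr(end-state) and E{total reward}, estimated from 10,000 rollouts.
K = 10_000
ends, reward_sum = Counter(), 0.0
for _ in range(K):
    e, t = run_option(4)
    ends[e] += 1
    reward_sum += t
print("Pr(end-state) ~", {s: c / K for s, c in sorted(ends.items())})
print("E{total reward} ~", reward_sum / K)
```

Because there are two hallways, the estimated Pr(end-state) is a genuine distribution over outcomes, and E{total reward} reflects both the step cost and the chance of exiting the wrong way.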
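Once such macro-predictions are packaged as models, planning can treat options exactly like primitive actions, which is the idea behind the value-iteration figures above. The sketch below assumes hand-specified option models R (expected total reward) and P (end-state probabilities) over three abstract room-level states; the states, numbers, and option names are illustrative, standing in for models estimated as in the previous sketch.

```python
# A minimal sketch of planning with macro-predictions: value iteration in
# which options, via their models (R, P), are used exactly like actions.
STATES = ["room1", "room2", "goal"]

# Assumed model of each option: expected total reward R[s] and end-state
# probabilities P[s][s'] when the option is started in state s.
OPTIONS = {
    "to-room2": {"R": {"room1": -0.2, "room2": -0.4, "goal": 0.0},
                 "P": {"room1": {"room2": 1.0},
                       "room2": {"room2": 1.0},
                       "goal":  {"goal": 1.0}}},
    "to-goal":  {"R": {"room1": -0.9, "room2": 0.8, "goal": 0.0},
                 "P": {"room1": {"room1": 0.5, "room2": 0.5},
                       "room2": {"goal": 1.0},
                       "goal":  {"goal": 1.0}}},
}

V = {s: 0.0 for s in STATES}
for _ in range(10):   # each sweep backs up value across a whole option
    V = {s: max(o["R"][s] + sum(p * V[s2] for s2, p in o["P"][s].items())
                for o in OPTIONS.values())
         for s in STATES}
print(V)  # value propagates room-to-room in a handful of sweeps
```

Each backup jumps over an entire multi-step experiment at once, which is why planning with room-to-room options needs far fewer iterations than planning with cell-to-cell actions.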
Mental Simulation
• Knowledge can be gained from experience
  – by actually performing experiments
• But knowledge can also be gained without overt experience
  – we call this thinking, reasoning, planning, cognition…
• This can be done through "thought experiments"
  – internal simulation of experience
  – generated from predictive knowledge
  – subject to learning methods as before
• Much thought can be achieved this way...
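The slide leaves the mechanism abstract. One concrete way to realize such thought experiments, sketched below as an assumption (the talk names no specific algorithm here, though the pattern matches Sutton's own Dyna architecture), is to learn a 1-step model from real experience and then apply the same value-learning update to simulated transitions drawn from that model.

```python
# A minimal Dyna-style sketch of mental simulation: a 1-step model is
# learned from real experience, then replayed as "thought experiments"
# to update value predictions. World and names are illustrative.
import random
from collections import defaultdict

ACTIONS = [-1, +1]
N, GOAL = 10, 9

def world(s, a):                     # the real world, a black box
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

Q = defaultdict(float)               # value predictions Q[(s, a)]
model = {}                           # learned 1-step model: (s, a) -> (s2, r)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def q_update(s, a, r, s2):
    best = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

s = 0
for t in range(2000):
    a = random.choice(ACTIONS) if random.random() < EPS else \
        max(ACTIONS, key=lambda b: Q[(s, b)])
    s2, r = world(s, a)              # one step of real experience
    q_update(s, a, r, s2)            # learn from it
    model[(s, a)] = (s2, r)          # remember what the world did
    for _ in range(20):              # 20 simulated steps per real step:
        sa = random.choice(list(model))      # imagine a past situation,
        s2_, r_ = model[sa]                  # predict what would happen,
        q_update(*sa, r_, s2_)               # and learn from the thought
    s = 0 if s2 == GOAL else s2
print(max(Q[(0, b)] for b in ACTIONS))  # start-state value after planning
```

Each real step is amplified by 20 simulated updates, so most of the value function is built by "thinking" rather than by acting, exactly the economy the slide is pointing at.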
Illustration: Dynamic Mission Planning for UAVs
• Mission: fly over (observe) the most valuable sites and return to base
• Stochastic weather affects observability (cloudy or clear) of sites
• Limited fuel
• Intractable with classical optimal control methods
• Two temporal scales:
  – Tactics: which way to fly now
  – Strategies: which site to head for
• Strategies compress space and time
  – reduce the number of states from ~10^11 to ~10^6
  – reduce the tour length from ~600 to ~6
• RL planning with strategies and real-time control outperforms an optimal tour planner that assumes static weather
[Figure: map of mission sites with observation rewards (25, 15, 10, 8, 5, ?) and a Base; bar chart of expected reward per mission for RL planning with strategies versus a static replanner, under high and low fuel.]
Barto, Sutton, and Moll, Adaptive Networks Laboratory, University of Massachusetts

What to compute and Why
[Diagram: Reward, Policy, Value Functions, and Knowledge/Predictions arranged as a hierarchy.]
The ultimate goal is reward, but our AI spends most of its time with knowledge.

A Candidate Computational Theory of Artificial Intelligence
• The AI agent should be focused on finding general macro-predictions of experience
• Especially seeking predictions that enable rapid computation of values and optimal actions
• Predictions and their associated experiments are the coin of the realm
  – they have a clear semantics, can be tested & learned
  – can be combined to produce other predictions, e.g. values
• Mental simulation (plus learning)
  – makes new predictions from old
  – the start of a computational theory of knowledge use

Conclusions
• World knowledge must be expressed in terms of the data
• Such posterior grounding is challenging
  – lose expressiveness in the short term
  – lose external (human) coherence, explainability
• But it can be done step by step,
• And it brings palpable benefits
  – autonomous learning/verification/extension of knowledge
  – autonomous complexity management due to internal coherence
  – knowledge suited to a general reasoning process
  – mental simulation
• We must provide this grounding!