Learning in Worlds with Objects
Leslie Pack Kaelbling
MIT Artificial Intelligence Laboratory
With Tim Oates, Natalia Hernandez, Sarah Finney
NTT-MIT Collaboration Meeting, 2001

What is an Agent?
A system that has an ongoing interaction with an external environment
• household robot
• factory controller
• web agent
• Mars explorer
• pizza delivery robot
[Figure: diagram labeled Environment, Observation, Action]

Agents Must Learn
Learning is a crucial aspect of intelligent behavior
• human programmers lack required knowledge
• agents should work in a variety of environments
• agents should work in changing environments
What to learn?
• World dynamics: What happens when I take a particular action?
• Reward: What world states are good?

Crisis
Current state-of-the-art learning methods will not work in domains with multiple objects.
These are crucial domains for robots of the future.

Representation
Learning requires some sort of representation of states of the world.
The choice of representation affects
• what information can be represented
• what kinds of generalizations the agent can make

Attribute Vector
State-of-the-art representation for learning
temperature = 48.2
pressure = 57.9 mB
valve1 = open
valve2 = closed
time = 10:48AM
backlog = 78
volume = 32.2
production = 45.5
…

Generalization over Attribute Vectors
[Figure: surface plot of x over time and temp, and a decision tree with tests temp > 22, pressure < 3, time < 10AM and actions close valve, open valve, add reagent, increase temp]

Complex Everyday Domains
Attribute vector is impossibly big
book1-on-book2: true
book2-on-book1: false
pen-is-yellow: true
pen-is-blue: false
lamp-on: true
lamp-off: false
ink-bottle-level: 50%
lamp-in-bottle: false
bottle-on-lamp: false
paper1-color: gray
paper2-color: white
fabric-behind-lamp: true
book2-is-clear: false
book4-is-clear: false
book1-is-clear: true
block1-on-block2: false
block3-unstable: true
block2-on-table: false
block1-in-front-of-lamp: true
…

Generalization over Objects
• If book1 is on book2 and I move book2, then book1 will move
• If the cup is on the table and I move the table, then the cup will move
• If the pen is on the paper and I move the paper, then the pen will move
• If the coat is on the chair and I move the chair, then the coat will move
For all objects A and B: If A is on B and I move B, then A will move

Referring to Objects
Traditional symbolic AI has the problem of “symbol grounding”: How do I know what object is named by book1?
on(book1,book2)

Deictic Expressions
“Deixis” is Greek for “pointing”
• ima (now)
• koko (here)
• watashi-ga motteiru hako (the box I am holding)
• watashi-ga miteiru hako (the box I am looking at)

Automatic Generalization
If I have an object in my hand and I open my hand, then the object that was in my hand is now on the table.
This is true no matter what object is in my hand.

Communicating with Humans
Natural language communication
• speaks of the world in terms of objects and their relationships
• uses deictic expressions
Our robots of the future will have to be able to understand and generate human descriptions of the world.

Long-Term Research Goal
A robotic system with hand and cameras that can
• learn to achieve tasks efficiently through trial and error
• acquire natural language descriptions of the objects and their properties through “conversation” with humans

Short-Term Research Plan
Explore deictic, object-based representation for learning algorithms
• build a simulated hand-eye robot system that manipulates blocks (with real physics)
• have the simulated robot learn to carry out tasks from trial and error
Demonstrate empirically and theoretically that deictic representation is crucial for efficient learning

First Example Domain
Unreliable block stacking:
• robot is rewarded for making tall piles of blocks
• the taller a pile is, the more likely it is to fall over when another block is added
• a pile can be made more stable by building piles to its sides
Once the robot learns to do this task, keep the physics of the domain the same, but reward a more complex behavior.
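
The unreliable block-stacking domain can be sketched as a toy simulator. This is a minimal illustration, not the simulated hand-eye system with real physics named in the research plan: the fall-probability formula, the parameters `base` and `support`, the `StackingWorld` class, resetting a collapsed pile to height 0, and using the tallest pile's height as the reward are all assumptions made for the example.

```python
import random


def fall_probability(height, left, right, base=0.08, support=0.5):
    """Assumed model: risk grows with pile height and shrinks with neighbour height.

    Each neighbouring pile supports the part of the pile it stands beside,
    so only min(height, neighbour) counts as support.
    """
    supported = (min(height, left) + min(height, right)) / 2
    effective_height = max(0.0, height - support * supported)
    return min(1.0, base * effective_height)


class StackingWorld:
    """A row of piles; adding a block may make the chosen pile collapse."""

    def __init__(self, n_piles=3, seed=0):
        self.heights = [0] * n_piles
        self.rng = random.Random(seed)  # seeded for repeatable runs

    def add_block(self, pile):
        # Piles at either end of the row have no neighbour on that side.
        left = self.heights[pile - 1] if pile > 0 else 0
        right = self.heights[pile + 1] if pile < len(self.heights) - 1 else 0
        p_fall = fall_probability(self.heights[pile], left, right)
        if self.rng.random() < p_fall:
            self.heights[pile] = 0  # assumed: a falling pile collapses entirely
        else:
            self.heights[pile] += 1
        return max(self.heights)  # assumed reward: height of the tallest pile


if __name__ == "__main__":
    world = StackingWorld()
    for step in range(20):
        reward = world.add_block(step % 3)  # round-robin keeps piles supported
    print(world.heights, reward)
```

Under these assumed defaults, a free-standing pile of height 5 has a higher collapse probability than the same pile flanked by two piles of height 5, which is the property that makes building side piles worthwhile.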

Learning by Doing
Having an initial task to perform focuses the robot’s attention on aspects of the environment
• Use extension of Utree learning algorithm to select important aspects of the environment
• Generate new deictic expressions dynamically: the-block-on-top-of(the-block-I-am-looking-at)
• Extend reinforcement learning methods to apply to object-based representations

Extracting General Rules
There are too many facts that are true in any interesting environment.
Solving tasks focuses attention on
• particular objects (named with deictic expressions)
• particular properties of those objects
These objects and properties are likely of general importance: use them as input to an association-rule learning algorithm to learn facts like:
The thing that is on the thing that I am holding will probably fall off if I move

Enabling Planning
Given general rules, the agent can “think” about the consequences of its actions and decide what to do, rather than learn through trial and error.

In Future
An ambitious research project
• vision algorithms for learning segmentation and object recognition
• learning good properties and relations for characterizing the domain (“concept learning”)
• connect with natural language learning for word meanings

Don’t miss any dirt!
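
The composed expression the-block-on-top-of(the-block-I-am-looking-at) from the Learning by Doing slide can be illustrated with a small sketch. The dictionary encoding of a scene, the `focus` and `on_top` keys, and the function names are hypothetical and chosen only for this example. The sketch shows how a nested deictic expression is resolved relative to the agent, so the same expression picks out different objects in different scenes without ever naming them.

```python
def the_block_i_am_looking_at(scene):
    """Resolve the agent's current visual focus (hypothetical 'focus' key)."""
    return scene["focus"]


def the_block_on_top_of(scene, block):
    """Return the object resting directly on `block`, or None if it is clear."""
    return scene["on_top"].get(block)


def evaluate(scene):
    """the-block-on-top-of(the-block-I-am-looking-at)"""
    return the_block_on_top_of(scene, the_block_i_am_looking_at(scene))


# Two scenes with different objects; 'on_top' maps a supporter to what sits on it.
scene_a = {"focus": "book2", "on_top": {"book2": "book1"}}
scene_b = {"focus": "table", "on_top": {"table": "cup"}}

if __name__ == "__main__":
    print(evaluate(scene_a))  # object on the book the agent is looking at
    print(evaluate(scene_b))  # object on the table the agent is looking at
```

Because the expression never mentions book1 or cup by name, a rule learned in terms of it in one scene applies unchanged in the other.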