Slide 1: Learning Agents
CSE 298 / CSE 300 / CSE 333
Presented by: Huayan Gao ([email protected]), Thibaut Jahan ([email protected]), David Keil ([email protected]), Jian Lian ([email protected])
Students in CSE 333, Distributed Component Systems
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department, The University of Connecticut

Slide 2: Outline
- Agents
- Distributed computing agents
- The JADE platform
- Reinforcement learning
- UML design of agents
- The maze problem
- Conclusion and future work

Slide 3: Agents
Some general features characterizing agents:
- autonomy
- goal-orientedness
- collaboration
- flexibility
- ability to be self-starting
- temporal continuity
- character
- adaptiveness
- mobility
- capacity to learn

Slide 4: Classification of agents
- Interface agents: use AI techniques to provide assistance to the user
- Mobile agents: capable of moving around networks gathering information
- Co-operative agents: communicate with, and react to, other agents in a multi-agent system within a common environment
- Reactive agents: react to a stimulus or input governed by some state or event in the environment

Slide 5: Distributed computing agents
- Common learning goal (strong sense)
- Separate goals but information sharing (weak sense)

Slide 6: The JADE platform
JADE (Java Agent DEvelopment Framework):
- a Java software framework and middleware platform
- simplifies the implementation and deployment of multi-agent systems (MAS)
Services provided:
- AMS (Agent Management System): registration, directory, and management
- DF (Directory Facilitator): yellow-pages service
- ACC (Agent Communication Channel): message-passing service within the platform (including remote agents)

Slide 7: JADE platforms for distributed agents (figure only)

Slide 8: Agents and Markov processes
Agent type by environment type:

  Environment     Deterministic               Stochastic
  Accessible      Reflex                      Solves MDPs
  Inaccessible    Policy-based, non-Markov    Solves POMDPs*

  * Partially observable Markov decision problems

Slide 9: Learning from the environment
- The environment, especially a distributed one, may be complex and may change
- Hence the need to learn dynamically, without supervision
- Reinforcement learning: used in adaptive systems; involves finding a policy
- Q-learning, a special case of RL: computes Q-values into a Q-table; finds an optimal policy

Slide 10: Policy search
- A policy is a mapping from states to actions
- A policy, as opposed to an action sequence
- Agents that precompute action sequences cannot respond to new sensory information
- An agent that follows a policy incorporates sensory information about its state into action determination

Slide 11: Components of a learner
In learning, percepts may help improve the agent's future success in interaction. Components:
- Learning element: improves the policy
- Performance element: executes the policy
- Critic: applies a fixed performance measure
- Problem generator: suggests experimental actions that will provide information to the learning element

Slide 12: A learning agent and its environment (figure only)

Slide 13: Temporal difference learning
- Uses observed transitions and the differences between utilities of successive states to adjust utility estimates
- Update rule based on a transition from state i to state j:
    U(i) <- U(i) + alpha * (R(i) + U(j) - U(i))
  where U is the estimated utility, R is the reward, and alpha is the learning rate

Slide 14: Q-learning
Q-learning is a variant of reinforcement learning in which the agent incrementally computes a table of expected aggregate future rewards. The agent modifies the values in the table to refine its estimates. Using the temporal-difference learning approach, the update applied after the learner goes from state i to state j is:
    Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i))

Slide 15: Q-values
- Definition: Q-values are the values Q(a, i) of expected utility associated with a given action a in a given state i
- Utility of a state: U(i) = max_a Q(a, i)
- Q-values permit decision making without a transition model
- Q-values are directly learnable from reward percepts
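The following is a minimal sketch, in Java (the language of the project's Cat-Mouse code), of a tabular Q-learner implementing the update rule from slide 14 and the U(i) = max_a Q(a, i) relation from slide 15. The class and method names are illustrative assumptions, not the project's actual code, and the rule is kept undiscounted exactly as written on the slide.

// Hypothetical Q-table learner; states and actions are indexed by integers,
// alpha is the learning rate from slide 13/14.
public class QLearner {
    private final double[][] q;   // q[a][i] = Q(a, i), expected aggregate future reward
    private final double alpha;   // learning rate

    public QLearner(int numActions, int numStates, double alpha) {
        this.q = new double[numActions][numStates];
        this.alpha = alpha;
    }

    /** Q(a, i) <- Q(a, i) + alpha * (R(i) + max_a' Q(a', j) - Q(a, i)),
        applied after taking action a in state i, receiving reward r,
        and landing in state j. */
    public void update(int a, int i, int j, double r) {
        q[a][i] += alpha * (r + utility(j) - q[a][i]);
    }

    /** U(i) = max_a Q(a, i): the utility of state i under the current estimates. */
    public double utility(int i) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] row : q) best = Math.max(best, row[i]);
        return best;
    }

    /** The greedy policy: the action with the highest Q-value in state i. */
    public int bestAction(int i) {
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a][i] > q[best][i]) best = a;
        }
        return best;
    }
}

A learning agent would call update(...) after every observed transition and bestAction(...) when choosing its next move.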
Slide 16: UML design of agents
- Standard UML does not provide a complete solution for depicting the design of multi-agent systems
- Because multi-agent systems are both actors and software, their design does not follow typical UML design
- Goals, complex strategies, knowledge, etc. are often missed

Slide 17: Reactive use cases (figure only)

Slide 18: A maze problem
A simple example consists of a maze for which the learner must find a policy, where the reward is determined by eventually reaching, or not reaching, a goal location in the maze. The original problem definition may be modified by permitting multiple distributed agents that communicate, either directly or via the environment.

Slide 19: Cat and Mouse problem
An example of reinforcement learning. The rules of the Cat-and-Mouse game are:
- the cat catches the mouse;
- the mouse escapes the cat;
- the mouse catches the cheese;
- the game is over when the cat catches the mouse.
Source: T. Eden, A. Knittel, R. van Uffelen. Reinforcement learning. www.cse.unsw.edu.au/~aek/catmouse
Our project included modifying existing Java code to enable remote deployment of learning agents and to begin exploring a multi-agent version.

Slide 20: Cat-Mouse GUI (figure only)

Slide 21: Use cases in the Cat-Mouse problem (figure only)

Slide 22: Classes for the Cat-Mouse problem (figure only)

Slide 23: Sequence diagram (figure only)

Slide 24: Maze creation and registration (figure only)

Slide 25: Cat creation and registration (figure only)

Slide 26: JADE (figure only)
The cat looks up the maze via the AMS and DF services.

Slide 27: JADE (figure only)
Mouse agent creation and registration.

Slide 28: Mouse agent joins the game (figure only)

Slide 29: Game begins
The game begins, and the Maze (master) and Mouse agents exchange information via ACL messages.

Slide 30: Remote deployment of learning agents
Using JADE, we can deploy maze, mouse, and cat agents:
  jademaze maze1
  jademouse mouse1
  jadecat cat1
jademaze, jademouse, and jadecat are batch files that deploy the maze, mouse, and cat agents. If we want to create them from a remote PC, we use the following commands:
  jademaze -host hostname mazename
  jadecat -host hostname catname
  jademouse -host hostname mousename

Slide 31: Cat-Mouse in JADE
JADE allows services to be hosted and discovered in a distributed, dynamic environment. On top of these basic services, the mouse and cat agents can discover the maze/mouse/cat services on offer and join or leave the maze server they find through the DF service, as sketched below.
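To make the registration and discovery described above concrete, here is a minimal sketch of a JADE maze agent and mouse agent. It is illustrative only: the class names (MazeAgent, MouseAgent), the service type "maze", and the "join-game" message content are assumptions rather than the project's actual code; only the JADE classes used (Agent, DFService, ACLMessage, OneShotBehaviour) are standard.

import jade.core.Agent;
import jade.core.behaviours.OneShotBehaviour;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;
import jade.lang.acl.ACLMessage;

// MazeAgent.java: registers a "maze" service with the DF (yellow pages)
// so that cat and mouse agents can find the game.
public class MazeAgent extends Agent {
    protected void setup() {
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("maze");              // service type searched for by cats and mice
        sd.setName("cat-mouse-maze");
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }

    protected void takeDown() {
        try {
            DFService.deregister(this);  // remove the DF entry when the maze shuts down
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }
}

// MouseAgent.java (separate file, same imports): looks up a maze in the DF
// and asks to join by sending an ACL message to the maze agent it found.
public class MouseAgent extends Agent {
    protected void setup() {
        addBehaviour(new OneShotBehaviour() {
            public void action() {
                DFAgentDescription template = new DFAgentDescription();
                ServiceDescription sd = new ServiceDescription();
                sd.setType("maze");
                template.addServices(sd);
                try {
                    DFAgentDescription[] result = DFService.search(myAgent, template);
                    if (result.length > 0) {
                        ACLMessage join = new ACLMessage(ACLMessage.REQUEST);
                        join.addReceiver(result[0].getName());   // AID of the maze agent
                        join.setContent("join-game");
                        myAgent.send(join);
                    }
                } catch (FIPAException e) {
                    e.printStackTrace();
                }
            }
        });
    }
}

Agents of this kind would then be started in JADE containers on a local or remote host, which is presumably what the jademaze/jademouse/jadecat batch files on slide 30 do.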
Slide 32: Innovation
- A backbone for a core platform that encourages other agents to connect and join
- Access to ontologies and service descriptions, moving towards interoperability at the service level
- A baseline set of deployed agent services that application developers can use as building blocks to create innovative value-added services
- A practical test of a learning agent system complying with FIPA standards

Slide 33: Deployment Scenario
Infrastructure deployment:
- enable their agents to interact with service agents developed by others
- test applications in a realistic, distributed, open environment
Agent and service deployment:
- FIPA ACL messages to exchange information
- standard FIPA-ACL-compatible content languages
- FIPA-defined agent management services (directories, communication, and naming)

Slide 34: Conclusions
- Demonstration of a feasible research approach exploring the relationship between reinforcement learning and the deployment of component-based distributed agents
- Communication between agents
- Issues with the space complexity of Q-learning: with n = grid size, m = number of mice, and c = number of cats, the space complexity is 64 * n^(2(m+c+1)) bytes; 1 mouse + 1 cat already requires about 481 MB of storage for the Q-table (a rough worked estimate under stated assumptions appears after the references)

Slide 35: Future work
- Learning in environments that change in response to the learning agent
- Communication among learning agents; multi-agent learning
- Overcoming problems of table size under multi-agent conditions
- Security in message passing

Slide 36: Partial list of references
- S. Flake, C. Geiger, J. Kuster. Towards UML-based analysis and design of multi-agent systems. ENAIS'2001.
- T. Mitchell. Machine learning. McGraw-Hill, 1997.
- A. Printista, M. Errecalde, C. Montoya. A parallel implementation of Q-learning based on communication with cache. http://journal.info.unlp.edu.ar/journal6/papers/p4.pdf
- S. Russell, P. Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.
- S. Sen, G. Weiss. Learning in multiagent systems. In G. Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.
- R. Sutton, A. Barto. Reinforcement learning: An introduction. MIT Press, 1998.
- K. Sycara, A. Pannu, M. Williamson, D. Zeng, K. Decker. Distributed intelligent agents. IEEE Expert, December 1996.
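As a rough sanity check of the storage figure on the Conclusions slide (slide 34), the following reading is assumed rather than stated in the transcript: the factor 64 is taken as 8 actions times 8 bytes per stored Q-value, the formula is read as 64 * n^(2(m+c+1)) bytes, and the grid side is taken to be n = 14.

% Hypothetical worked estimate; n = 14 and the 8-actions-by-8-bytes reading are assumptions.
\[
  64 \cdot n^{2(m+c+1)}
  \;=\; 64 \cdot 14^{2(1+1+1)}
  \;=\; 64 \cdot 14^{6}
  \;\approx\; 4.8 \times 10^{8}\ \text{bytes}
  \;\approx\; 481\ \text{MB}
\]

Under these assumptions the exponential growth in the number of agents (m + c + 1 tracked positions) is what makes the table size prohibitive in the multi-agent case noted under Future work.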