Reinforcement Learning in Real-Time Strategy Games
Nick Imrei
Supervisors: Matthew Mitchell & Martin Dick

Outline
- Reasons and background: what this research is about
- Motivation and aim
- RTS games
- Reinforcement learning explained
- Applying RL to RTS
- This project: methodology and evaluation
- Summary

Motivation and Aims
- Problem: game AI has been a neglected area – developers have adopted a "not broken, so why fix it" philosophy
- Internet thrashing of game AI – and my own experience
- Aim: use learning to develop a human-like player
- Simulate beginner to intermediate level play
- Use RL and A-life-like techniques, e.g. Black & White, Pengi [Scott]

RTS Games – The Domain
- Two or more teams of individuals/cohorts in a war-like situation on a series of battlefields
- E.g. Command & Conquer, Starcraft, Age of Empires, Red Alert, Empire Earth
- Teams can have a variety of weapons, units, resources and buildings
- Players must manage all of the above to achieve the end goal (destroy all units, capture the flag, etc.)

Challenges offered in RTS games
- Real-time constraints on actions
- High-level strategies combined with low-level tactics
- Multiple goals and choices

The Aim and Approach
- Create a human-like opponent: realistic, with diverse (not boring) behaviour
- This is difficult to do!
- Tactics and strategy
- Agents will be reactive to the environment
- Learn rather than code: reinforcement learning

The Approach Part 1 – Reinforcement Learning
- Reward and penalty
- Action rewards/penalties: penalise being shot; reward killing a player on the other team
- Strategic rewards/penalties: securing/occupying a certain area; staying in certain group formations; destroying all enemy units
- Aim: receive maximum reward over time
- Problem: credit assignment – which rewards should be given to which behaviours?

The Approach Part 2 – Credit Assignment
- States and actions: decide on a state space and an action space
- Assign values to states, or to state–action pairs
- Train the agent in this space

Reinforcement Learning example
- (worked-example diagram slides)

Why use Reinforcement Learning?
- Well suited to problems with delayed reward (tactics and strategy)
- The trained agent acts in (worst case) linear time – it is reactive
- Problems: large state spaces (addressed by state aggregation); long training times (addressed by experience replay and shaping)

The Approach Part 3 – Getting Diversity
- A-life-like behaviour using aggregated state spaces
- (diagram: agent and agent state space)

Research Summary
- Investigate this approach using a simple RTS game
- Issues: empirical research; applying RL in a novel way; not using the entire state space
- Need to investigate appropriate reward functions and appropriate state spaces

Problems with Training
- Will need lots of trials – the propagation problem
- The number of trials can be reduced using shaping [Mahadevan] and experience replay [Lin] (see the sketch below)
- Training via self-play; other possibilities include A* and human opponents (cf. Tesauro, Samuel)
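To make the experience replay idea cited above concrete, here is a minimal sketch in Python. It is not code from the original system: the class name, learning rate, discount factor and buffer size are illustrative assumptions, and it assumes a plain tabular Q-learner. It simply shows how stored transitions can be replayed so that each real game step yields several table updates.

```python
import random
from collections import defaultdict, deque

class ReplayQLearner:
    """Tabular Q-learner that replays stored transitions to reduce the number of trials."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, buffer_size=10000):
        self.q = defaultdict(float)      # (state, action) -> estimated value
        self.actions = actions           # e.g. move/shoot in four directions
        self.alpha = alpha               # learning rate (illustrative value)
        self.gamma = gamma               # discount factor (illustrative value)
        self.buffer = deque(maxlen=buffer_size)

    def update(self, s, a, r, s_next):
        # One Q-learning backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def step(self, s, a, r, s_next, replays=8):
        """Learn from the new transition, then replay a few previously stored ones."""
        self.buffer.append((s, a, r, s_next))
        self.update(s, a, r, s_next)
        sample = random.sample(list(self.buffer), min(replays, len(self.buffer)))
        for s_, a_, r_, s2_ in sample:
            self.update(s_, a_, r_, s2_)
```

Each recorded transition is reused many times, which is the sense in which replay (and, in a different way, shaping of the reward signal) cuts down the number of trials needed before values propagate back through the state space.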
Methodology
- Hypothesis: "The combination of RL and reduced state spaces in a rich (RTS) environment will lead to human-like gameplay"
- Empirical investigation to test the hypothesis
- Evaluate system behaviour, analyse the observed results, and describe interesting phenomena

Evaluation
- Measure the diversity of strategies: how big a change (and of what type) is required to alter the behaviour – a qualitative analysis
- Success of strategies, i.e. what level of gameplay is achieved: time to win, points scored, resemblance to human play
- Compare to human strategies: the "10 requirements of a challenging and realistic opponent" [Scott]

Summary thus far
- Interested in a human-level game program
- Want to avoid brittle, predictable programmed solutions
- Search the program space for the most diverse solutions, using RL to direct the search
- Allows specification of results without needing to specify how they are achieved
- Evaluate the results

The Game – Maps and Terrain
- Two armies of equal size on an n*n map
- Terrain: grass, trees, boundary squares and swamp
- All units can move on these squares, but each terrain type affects a soldier's attributes in a different way

The Game – Soldiers
- Soldier attributes include: sight range, weapon range, fatigue, speed, health, direction
- Relation lines

Experiments Part 1: Hand-coded Strategies
- Create 8 different hand-coded strategies, incl. Horde, Disperse, Central Defense, etc.
- Test their effectiveness based on: time taken to win; time taken to eliminate an enemy once spotted; damage sustained when victorious

Results of Experiments Part 1
- Units deployed closer together resulted in quicker games
- No strategy was consistently successful against all others
- The 3 most successful were Occupy, Horde and Central Defense
- Strategies meant nothing once army sizes exceeded 150 on an 80*80 map

Experiments Part 2: Control Architectures – Centralized
- All units are controlled by one entity
- Units only do what they are commanded (no autonomous behaviour)
- View area = the central controller's viewscreen
- Group formation, unit selection and unit commanding are handled centrally

Experiments Part 2: Control Architectures – Localized
- Units are independently manoeuvred and controlled, à la artificial life
- Each unit's viewing space is only what it sees individually
- Formation via cohorts
- Unit selection and movement are done via an A-life state machine

Experiments Part 2: Control Architectures – Testing
- Given the best 3 techniques from Part 1, program them in both a centralized and a localized manner
- Assess their effectiveness using the criteria from Part 1
- Observe the realism of the 6 new hand-coded strategies

Results of Experiments Part 2
- As individual unit sight and weapon range increased, localized control performed better
- A-life control performed better on rougher terrain, whereas centralized control often got stuck
- Centralized formation takes less time, so it did better as the army size to map size ratio increased

Results of Experiments Part 2 – Realism Evaluation
- Localized control better resembles a group of soldiers
- Centralized control better resembles human gameplay
- Given its success, a localized framework is used as the template for the learning agent

Learning Agents – Architecture
- Every agent works off the same learning table
- This is expected to speed up learning: each agent learns from everyone's mistakes rather than just its own
- Agents are trained against all opponents from Parts 1 and 2

Learning Agents – Representing the World: States and Actions
- States: divide the sight range into sections; each section can contain an ally, a health spot, an enemy, or nothing; the agent itself is either on or off a health spot
- This gives 288 possible world states
- Actions: move and shoot (left, forward, right, back)

Learning Agents – Representing the World: Rewards
- Positive rewards: shooting an enemy; moving to a health spot
- Negative rewards: being shot or killed; being on a health spot when health is full
- Reinforcement: Q(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) max_{a'} Q(s',a')
- (A sketch of the shared table and state abstraction follows below)
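The sketch below illustrates how the shared learning table and the abstracted state described above could be represented. It is an assumption-laden illustration rather than the project's actual code: the section encoding, event names and reward magnitudes are invented for the example, and the update is the standard sample-based (Q-learning) approximation of the Bellman equation shown on the slide.

```python
from collections import defaultdict

# Actions from the slides: move or shoot in each of four relative directions.
ACTIONS = [f"{verb}_{d}" for verb in ("move", "shoot")
           for d in ("left", "forward", "right", "back")]

# What a single sight-range section may contain (illustrative encoding).
SECTION_CONTENTS = ("empty", "ally", "enemy", "health_spot")

def make_state(section_contents, on_health_spot):
    """Abstracted world state: one entry per sight-range section,
    plus whether the agent itself is standing on a health spot."""
    return (tuple(section_contents), bool(on_health_spot))

def reward(event, health_full):
    """Reward scheme from the slides; the numeric values are assumptions."""
    if event == "shot_enemy":
        return 1.0
    if event == "moved_to_health_spot":
        return -0.5 if health_full else 0.5   # camping a spot at full health is penalised
    if event in ("was_shot", "was_killed"):
        return -1.0
    return 0.0

# A single table shared by every unit, so each agent also learns
# from the other agents' experience.
shared_q = defaultdict(float)

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Sample-based approximation of Q(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) max_{a'} Q(s',a')."""
    best_next = max(shared_q[(s_next, a2)] for a2 in ACTIONS)
    shared_q[(s, a)] += alpha * (r + gamma * best_next - shared_q[(s, a)])
```

Because every unit reads and writes the same shared_q table, a penalty received by one soldier immediately lowers the value of that state–action pair for all soldiers, which is the speed-up claimed on the architecture slide.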
Results of Experiments Part 3
- Learning of behaviours was achieved within only a few simulations
- Agents developed the following behaviours: shoot on sight unless health is low; if health is low, move to a health spot; units form a queue at a health spot; diversion of a centralized opponent's attention
- Learning agents were consistently successful against all other strategies except the centralized Horde
- Agents were told what to do, not how to do it
- Testing against human players did not prove very successful

Conclusions
- A localized approach was found to be more successful overall than a centralized one
- Given that the game has no base-building or resource element, the all-out aggressive strategies fared best
- Learning strategies were successful against most programmed ones
- Diversion and health-spot-sharing behaviours were observed

Future Work
- Extending the RTS game so it has: resources and resource gathering; different unit types; base building and maintenance
- Testing the RL/A-life framework in other game genres, including role-playing games, sim games and sports games

References
- Bob Scott. The illusion of intelligence. In AI Game Programming Wisdom, pages 16–20, 2002.
- Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311–364, 1992.
- L. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1993.
- Mark Bishop Ring. Continual Learning in Reinforcement Environments. MIT Press, 1994.

Stay Tuned!
- For more information, see http://www.csse.monash.edu.au/~ngi/
- Thanks for listening!