Evolving Multimodal Behavior
PhD Proposal, Jacob Schrum, 11/4/09

Introduction
* Challenge: discover behavior automatically
* Simulations, video games, robotics
* Why challenging?
  - Noisy sensors
  - Complex domains
  - Continuous states/actions
  - Multiple agents, teamwork
  - Multiple objectives
  - Multimodal behavior required (focus)

What is Multimodal Behavior?
* Working definition: an agent exhibits distinct kinds of actions under different circumstances
* Examples:
  - Offensive & defensive modes in soccer
  - Searching for weapons vs. opponents in a video game
  - An animal with foraging & fleeing modes
* Very important for teams: roles correspond to modes
* Example domains will involve teamwork

Previous Approaches
* Design approaches: hand code in a structured manner
* Value-function based approaches: learn the utility of actions (RL)
* Evolutionary approaches: selectively search based on performance

Design: Subsumption Architecture (Brooks 1986)
* Hierarchical design
* Lower levels independent of higher levels
* Built incrementally
* Common in robotics
* Hand coded

Value-Function Based (1/2): MAXQ (Dietterich 1998)
* Hand-designed hierarchy
* TD learning at multiple levels
* Reduces the state space
* Taxi domain: still just a grid world
* Discrete states & actions

Value-Function Based (2/2): Basis Behaviors (Matarić 1997)
* Low-level behaviors pre-defined
* Learn high-level control
* Discrete state space with high-level features (conditions)
* Reward shaping necessary
* Applied to real robots
* Too much expert knowledge

Evolutionary (1/2): Layered Evolution (Togelius 2004)
* Evolve components of a subsumption architecture
* Applied to EvoTanks (Thompson and Levine 2008) and Unreal Tournament 2004 (van Hoorn et al. 2009)
* Must specify the hierarchy and the training tasks
* Similar to Layered Learning (Stone 2000)

Evolutionary (2/2): Neuro-Evolving Robotic Operatives (Stanley et al. 2005)
* Machine-learning game: train a robot army
* Many objectives, combined by a weighted sum (z-scores method)
* User changes weights during training
* Dynamic objective management leads to multimodal behavior

Multiple Objectives
* Multimodal problems are typically multiobjective; modes are associated with objectives
* Traditional approach: weighted sum (Cohon 1978)
  - Must tune the weights
  - Yields only one solution per weighting
  - Bad for non-convex trade-off surfaces: each solution found corresponds to one specific set of weights, and points in non-convex regions cannot be captured by any weighted sum
* Need a better formalism

Greatest Mass Sarsa (Sprague and Ballard 2003)
* Multiple MDPs with a shared action space
* Learn each component via the Sarsa(0) update rule:
  Q_i(s,a) <- Q_i(s,a) + alpha * [r_i + gamma * Q_i(s',a') - Q_i(s,a)]
* Best action maximizes the sum of the component values:
  a* = argmax_a SUM_i Q_i(s,a)
* Used in a sidewalk navigation task
* Like a weighted sum

Convex Hull Iteration (Barrett and Narayanan 2008)
* Changes the MDP formalism: vector-valued reward
* Finds optimal solutions for all possible weight vectors w, maximizing the weighted return w . r
* Results in a compact set of solutions with different trade-offs
* Cannot capture non-convex surfaces
* Discrete states/actions only
* Still need a way to capture non-convex surfaces!

Pareto-based Multiobjective Optimization (Pareto 1890)
* Imagine a game with two objectives: damage dealt and health remaining
* A solution v dominates u iff:
  1. v is at least as good as u in every objective, and
  2. v is strictly better than u in at least one objective
* The points of a population that are not dominated are the best: the Pareto front
* Front members represent trade-offs between objectives, e.g. high health but little damage dealt vs. lots of damage dealt but lots of health lost

Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000)
* Population P of size N; evaluate P
* Use mutation to get P' of size N; evaluate P'
* Calculate the non-dominated fronts of {P u P'} (size 2N)
* New population of size N taken from the highest fronts of {P u P'}

Constructive Neuroevolution
* Genetic algorithms + neural networks
* Build structure incrementally; good at generating control policies
* Three basic mutations (no crossover used): perturb weight, add connection, add node
* Other structural mutations possible (more later)

Evolution of Teamwork
* Heterogeneous teams:
  - Different roles
  - Cooperation harder to evolve
  - Team-level multimodal behavior
* Homogeneous teams:
  - Shared policy
  - Individuals know how teammates act
  - Individuals fill roles as needed: multimodal

Completed Work
* Benefits of Pareto-based multiobjective neuroevolution
* Targeting Unachieved Goals (TUG): speed up evolution with objective management
* Evolving multiple output modes: allow networks to have multiple policies/modes, which leads to multimodal behavior
* Need a domain to experiment in …

Battle Domain
* Evolved monsters (yellow) vs. a scripted fighter (green)
* Fighter approaches the nearest monster and swings its bat repeatedly
* Monsters can hurt the fighter; the bat can hurt the monsters
* Multiple objectives: deal damage, avoid damage, stay alive
* Can multimodal teamwork evolve?

Benefits of Multiobjective Neuroevolution
* Research questions:
  - Is NSGA-II better than z-scores (weighted sum)?
  - Are homogeneous or heterogeneous teams better?
* 30 trials for each combination
* Three evaluations per individual; scores averaged to overcome noisy evaluations

Incremental Evolution
* Hard to evolve against the scripted strategy directly; could easily fail to evolve interesting behavior
* Incremental evolution against increasing fighter speeds: 0%, 40%, 80%, 90%, 95%, 100%
* Increase the speed when all goals are met; end when goals are met at 100%

Goals
* Is average population performance high enough? Then increase the speed
* Each objective has a goal:
  - At least 50 damage dealt to the bot (1 kill)
  - Less than 20 damage received per monster on average (2 hits)
  - Survive at least 540 time steps (90% of a trial)
* Has the average population score on each objective met the goal value?
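The goal check above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `goals_met` helper, the dictionary layouts, and the objective names are all hypothetical stand-ins.

```python
# Sketch of the incremental-evolution goal check (hypothetical data layout):
# a goal is met when the population's AVERAGE score on that objective
# reaches the goal value; fighter speed increases only when ALL goals are met.

def goals_met(population_scores, goals):
    """population_scores: {objective: [score per individual]}
    goals: {objective: (goal_value, higher_is_better)}"""
    for obj, (goal, higher_is_better) in goals.items():
        avg = sum(population_scores[obj]) / len(population_scores[obj])
        if higher_is_better and avg < goal:
            return False
        if not higher_is_better and avg > goal:
            return False
    return True

# Goals from the slide above: deal at least 50 damage (1 kill), take at
# most ~20 damage per monster (2 hits), survive 540 steps (90% of a trial).
goals = {
    "damage_dealt": (50, True),
    "damage_received": (20, False),
    "time_alive": (540, True),
}
```

Averaging over the population (rather than taking the champion's score) is what ties the speed increase to overall population performance, as described above.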
* Goal achieved

Evolved Behaviors
* Baiting + Side-Swiping:
  - Lure the fighter; turns allow the team to catch up
  - Attacks land on the fighter's left side
* Taking Turns:
  - Hit and run; the next counter-clockwise monster rushes in
  - Fighter hit on its left side
* Multimodal behaviors!

Multiobjective Conclusions
* NSGA-II faster than z-scores
* NSGA-II more likely to generate multimodal behavior
* Many runs did not finish or were slow
* Several “successful” runs did not have multimodal behavior

Targeting Unachieved Goals (TUG)
* Research question: how to speed up evolution and make it more reliable?
* When an objective’s goal is met, stop using that objective; restore it if scores drop back below the goal
* Focuses selection on the most challenging objectives
* Combine NSGA-II with TUG

Evolved Behaviors
* Alternating Baiting:
  - Bait until another monster hits; then the baiting monster attacks
  - Fighter knocked back and forth
* Synchronized Formation:
  - Move as a group; the fighter chases one bait
  - Other monsters rush in with side-swipe attacks
* More multimodal behaviors!

TUG Conclusions
* TUG results in a huge speed-up: no wasted effort on achieved goals
* TUG runs finish more reliably
* Heterogeneous runs show more multimodal behavior than homogeneous runs
* Some runs still did not finish; some “successful” runs still did not have multimodal behavior

Fight or Flight
* Separate Fight and Flight trials; Fight = Battle Domain
* Flight: scripted prey (red) instead of the fighter
  - Has no bat; has to escape
  - Monsters confine and damage it
* New objective: deal damage in Flight
* The Flight task requires teamwork, and requires multimodal behavior

New-Mode Mutation
* Encourages multimodal behavior
* New mode takes its inputs from a preexisting mode: initially very similar
* The maximum preference node determines the active mode

Evolving Multiple Output Modes
* Research question: how to evolve teams that do well in both tasks?
* Compare 1Mode to ModeMutation
* Three evaluations in Fight and three in Flight; the same networks handle two different tasks

1Mode Behaviors
* Aggressive + Corralling:
  - Aggressive in the Fight task: take lots of damage, deal lots of damage
  - Corralling in the Flight task
* Run/Rush + Crowding:
  - Run/Rush in the Fight task: good timing on attacks; kill the fighter without taking too much damage
  - Crowding in the Flight task: get too close to the prey; knock the prey out and it escapes
* Networks can’t handle both tasks!

ModeMutation Behaviors
* Alternating Baiting + Corralling:
  - Alternating baiting in the Fight task
  - Corralling in the Flight task: spread out to prevent escape; individuals rush in to attack
* Hit into Crowd + Crowding:
  - Hitting into the crowd in the Fight task: one attacker knocks the fighter into the others
  - Crowding in the Flight task: rush the prey, ricochet back and forth; sometimes knocks the prey free
* Networks succeed at both tasks!

Mode Mutation Conclusions
* ModeMutation slower than 1Mode
* ModeMutation better at producing multimodal behaviors
* The harder task resulted in more failed runs
* Many unused output modes created: slows down execution, bloats the output layer

Proposed Work
Extensions:
1. Avoiding Stagnation by Promoting Diversity
2. Extending Evolution of Multiple Output Modes
3. Heterogeneous Teams Using Subpopulations
4. Open-Ended Evolution + TUG
* Evaluate in new tasks
* Killer app: Unreal Tournament 2004

1. Avoiding Stagnation by Promoting Diversity
* Behavioral diversity avoids stagnation
* Add a diversity objective (Mouret et al. 2009)
* Behavior vector: given a set of input vectors, concatenate the network's outputs
* Diversity objective: average distance from the other behavior vectors in the population

2. Extending Evolution of Multiple Output Modes
* Encourage mode differences: random input sources
* Probabilistic arbitration, like softmax action selection: bad modes less likely to persist
* Restrict New-Mode Mutation
* New objective: punish unused modes, reward used modes
* Delete similar modes, based on a behavior metric
* Limit the number of modes to make best use of limited resources; dynamically increase the limit?

3. Heterogeneous Teams Using Subpopulations
* Each team member comes from a different subpopulation (Yong 2007)
* Encourages division of labor across teammates
* Different roles lead to multimodal team behavior

4. Open-Ended Evolution + TUG
* Keep increasing the goals so evolution always has something to strive towards
* Preserves the benefits of TUG; does not settle early
* When to increase goals? When all goals are achieved, or as individual goals are achieved

New Tasks
* More tasks require more modes
* Investigate single-agent tasks (only teams so far)
* Investigate complementary objectives: does TUG only help with contradictory objectives? Are complementary objectives hard when combined with others?
* Tasks:
  1. Predator: opposite of Flight; partial observability
  2. Sink the Ball: very different from previous tasks; needs more distinct modes? less mode sharing?

Unreal Tournament 2004
* Commercial first-person shooter (FPS)
* Challenging domain: continuous state and action, multiobjective, partial information, multimodal behaviors required
* Programming API: Pogamut
* Competitions: Botprize, Deathmatch

Unreal Deathmatch
* Packaged bots are hand-coded; previous winners of Botprize were hand-coded
* Learning attempts:
  - Simplified version of the game (van Hoorn et al. 2009)
  - Limited to certain behaviors (Kadlec 2008)
* Multimodal behavior in the full game: not done yet

Unreal Teams
* Team Deathmatch: largely ignored?
* Capture the Flag: teams protect their own flag and bring the enemy flag to their base
  - A GP approach could not beat the UT bots (Kadlec 2008)
* Domination: king of the hill; teams defend key locations
  - An RL approach learned a group strategy against hand-coded bots (Smith et al. 2007)

Review
* A system for developing multimodal behavior:
  - Multiobjective evolution
  - Targeting Unachieved Goals
  - New-Mode Mutation
  - Behavioral diversity
  - Extending mode mutation
  - Subpopulations
  - Open-ended evolution
* Final evaluation in Unreal Tournament 2004

Conclusion
* Create a system that automatically discovers multimodal behavior:
  - No high-level hierarchy needed
  - No low-level behaviors needed
  - Works in continuous, noisy environments
  - Discovers team behavior as well
* Agents with an array of different useful behaviors
* Leads to better agents/behaviors in simulations, games, and robotics

Questions?
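The behavioral-diversity objective from proposed extension 1 can be sketched as below. This is a minimal sketch under stated assumptions: networks are modeled as plain callables from an input vector to an output vector, and the sample-input set is chosen by the caller; neither reflects the actual implementation.

```python
# Sketch of a behavioral-diversity objective (in the style of Mouret et al.
# 2009): feed every network the same sample inputs, concatenate its outputs
# into a behavior vector, then score each network by its average Euclidean
# distance from every other behavior vector in the population.
import math

def behavior_vector(network, sample_inputs):
    """network: callable mapping an input vector to an output vector."""
    vec = []
    for x in sample_inputs:
        vec.extend(network(x))
    return vec

def diversity_scores(networks, sample_inputs):
    """One diversity score per network: mean distance to all others."""
    vectors = [behavior_vector(n, sample_inputs) for n in networks]
    scores = []
    for i, v in enumerate(vectors):
        dists = [math.dist(v, u) for j, u in enumerate(vectors) if j != i]
        scores.append(sum(dists) / len(dists))
    return scores
```

Treating diversity as just another objective means NSGA-II can trade it off against task performance, which is what lets it counteract stagnation without overriding the task objectives.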
Auxiliary Slides

Design (2): Behavior Trees (Isla 2005)
* Top-down approach
* Used first in Halo 2; other commercial games since
* “Brute Force Approach to Common Sense”: add a behavior for every situation
* Hand coded

Evolutionary (0): Dangerous Foraging (Stanley et al. 2003)
* Don’t know if food is safe or poison: partial information
* Multimodal: eat food vs. avoid food

Adaptive Teams of Agents (Bryant and Miikkulainen 2003)
* Homogeneous teams: Roman legions defend a town that barbarians plunder
* Multimodal: defend the town vs. chase barbarians

Other MOEAs
* PESA-II (Corne et al. 2001): external archive and internal population; region-based selection using a squeeze factor
* SPEA2 (Zitzler and Thiele 1999): external archive and internal population; fitness based on strength and density

1. Predator
* Scripted agent is the predator: chases the nearest monster
* Monsters run to avoid damage
* Opposite behavior from Flight
* When combined with Flight, the problem is partially observable
* Complementary objectives: avoid damage, stay alive

2. Sink the Ball
* Monsters push a ball around; no scripted agent
* Monster sensors tuned to the ball
* Move the ball to the goal to sink it
* Very different from previous tasks; more distinct behavioral modes
* Complementary objectives: minimize the distance between ball and goal, minimize the time to sink the ball

Comparison Charts
* Heterogeneous z-scores vs. NSGA-II
* Homogeneous z-scores vs. NSGA-II: the z-scores method was faster
* Heterogeneous NSGA-II vs. TUG
* Homogeneous NSGA-II vs. TUG
* Heterogeneous 1Mode vs. ModeMutation: 1Mode was faster
* Homogeneous 1Mode vs. ModeMutation: 1Mode was faster
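The Pareto-dominance test that underlies NSGA-II and the archive-based MOEAs above can be sketched as follows. This is a minimal version assuming every objective is maximized; the example objective vectors are invented for illustration.

```python
def dominates(a, b):
    """a dominates b iff a is at least as good as b in every objective
    and strictly better in at least one (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Non-dominated subset of a population of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Invented Battle Domain-style scores: (damage dealt, health remaining).
# (6, 6) is dominated by (7, 7); the other three are trade-offs on the front.
pop = [(10, 3), (7, 7), (3, 10), (6, 6)]
```

NSGA-II repeatedly applies this test to peel off successive non-dominated fronts of the combined parent + child population, filling the next generation from the highest fronts.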