Download schrum.proposal - Neural Network Research Group

Evolving Multimodal Behavior PhD Proposal Jacob Schrum 11/4/09 Introduction  Challenge: Discover behavior automatically  Simulations,  video games, robotics Why challenging?  Noisy sensors  Complex domains  Continuous states/actions  Multiple agents, teamwork  Multiple objectives  Multimodal behavior required (focus) What is Multimodal Behavior?  Working definition:  Agent exhibits distinct kinds of actions under different circumstances  Examples:  Offensive & defensive modes in soccer  Search for weapons or opponents in video game  Animal with foraging & fleeing modes  Very important for teams  Roles correspond to modes  Example domains will involve teamwork Previous Approaches  Design Approaches  Hand  Value-Function Based Approaches  Learn  code in a structured manner the utility of actions (RL) Evolutionary Approaches  Selectively search based on performance Design  Subsumption Architecture (Brooks 1986)  Hierarchical design  Lower levels independent of higher levels  Built incrementally  Common in robotics  Hand coded Value-Function Based (1/2)  MAXQ (Dietterich 1998)  Hand-designed hierarchy  TD learning at multiple levels  Reduce state space  Taxi domain Still just a grid world  Discrete state & action  Value-Function Based (2/2)  Basis Behaviors (Matarić 1997)  Low-level behaviors pre-defined  Learn high-level control  Discrete state space  High-level features (conditions)  Reward shaping necessary  Applied to real robots  Too much expert knowledge Evolutionary (1/2)  Layered Evolution (Togelius 2004)  Evolve components of subsumption architecture  Applied to: EvoTanks (Thompson and Levine 2008)  Unreal Tournament 2004 (van Hoorn et al. 2009)   Must specify: Hierarchy  Training tasks   Similar to Layered Learning (Stone 2000) Evolutionary (2/2)  Neuro-Evolving Robotic Operatives (Stanley et al. 2005)  ML game  Train robot army  Many objectives Weighted sum: z-scores method  User changes weights during training   Dynamic  objective management Leads to multimodal behavior Multiple Objectives  Multimodal problems are typically multiobjective  Modes  associated with objectives Traditional: weighted sum (Cohon 1978)  Must tune the weights  Only one solution  Bad for non-convex surfaces  Need better formalism Each point corresponds to one set of specific weights Cannot be captured by weighted sum Greatest Mass Sarsa (Sprague and Ballard 2003) Multiple MDPs with shared action space  Learn via Sarsa(0) update rule:   Best value is sum of component values: Used in sidewalk navigation task  Like weighted sum  Convex Hull Iteration (Barrett and Narayanan 2008)  Changes MDP formalism:  Vector reward  Find solutions for all possible weightings where Maximize:  Results in compact set of solutions   Different trade-offs  Cannot capture non-convex surfaces Discrete states/actions only  Need a way to capture non-convex surfaces!  Pareto-based Multiobjective Optimization (Pareto 1890)  Imagine game with two objectives:   Damage Dealt Health Remaining dominates  iff 1. High health but did not deal much damage Tradeoff between objectives and 2.  Population of points not dominated are best: Pareto Front Dealt lot of damage, but lost lots of health Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000)     Population P with size N; Evaluate P Use mutation to get P´ size N; Evaluate P´ Calculate non-dominated fronts of {P P´} size 2N New population size N from highest fronts of {P  P´} Constructive Neuroevolution Genetic Algorithms + Neural Networks  Build structure incrementally  Good at generating control policies  Three basic mutations (no crossover used)  Other structural mutations possible   More later Perturb Weight Add Connection Add Node Evolution of Teamwork  Heterogeneous  Different roles  Cooperation harder to evolve  Team-level multimodal behavior  Homogeneous  Shared policy  Individuals know how teammates act  Individuals fill roles as needed: multimodal Completed Work  Benefits of Multiobjective Neuroevolution  Pareto-based  Targeting Unachieved Goals (TUG)  Speed  up evolution with objective management Evolving Multiple Output Modes  Allow  leads to multimodal behavior networks to have multiple policies/modes Need a domain to experiment in … Battle Domain   Evolved monsters (yellow) Scripted fighter (green)  Approach nearest monster  Swing bat repeatedly    Monsters can hurt fighter Bat can hurt monsters Multiple objectives  Deal damage  Avoid damage  Stay alive  Can multimodal teamwork evolve? Benefits of Multiobjective Neuroevolution  Research Questions:  NSGA-II better than z-scores (weighted sum)?  Homogeneous or heterogeneous teams better?   30 trials for each combination Three evaluations per individual  Average scores to overcome noisy evals Incremental Evolution  Hard to evolve against scripted strategy  Could  Incremental evolution against increasing speeds  0%,   easily fail to evolve interesting behavior 40%, 80%, 90%, 95%, 100% Increase speed when all goals are met End when goals met at 100% Goals  Average population performance high enough?   Then increase speed Each objective has a goal:  At least 50 damage to bot (1 kill)  Less than 20 damage per monster on average (2 hits)  Survive at least 540 time steps (90% of trial)  AVG population objective score met goal value?  Goal achieved Evolved Behaviors  Baiting + Side-Swiping  Lure fighter  Turns allow team to catch up  Attacks on left side of fighter  Taking Turns  Hit and run  Next counter-clockwise monster rushes in  Fighter hit on left side  Multimodal behaviors! Multiobjective Conclusions NSGA-II faster than z-scores  NSGA-II more likely to generate multimodal behavior  Many runs did not finish/were slow  Several “successful” runs did not have multimodal behavior  Targeting Unachieved Goals  Research Question:  How to speed up evolution, make more reliable When objective’s goal is met, stop using it  Restore objective if scores drop below goal  Focuses on the most challenging objectives  Combine NSGA-II with TUG  Tough Objectives Evolved Behaviors  Alternating Baiting  Bait until another monster hits  Then baiting monster attacks  Fighter knocked back and forth  Synchronized Formation  Move as a group  Fighter chases one bait  Other monster rushes in with side swipe attacks  More multimodal behaviors! TUG Conclusions  TUG results in huge speed-up  No     wasted effort on achieved goals TUG runs finish more reliably Heterogeneous runs have more multimodal behavior than homogeneous Some runs still did not finish Some “successful” runs still did not have multimodal behavior Fight or Flight    Separate Fight and Flight trials Fight = Battle Domain Flight:  Scripted prey (red) instead of fighter  Has no bat; has to escape  Monsters confine and damage  New objective: Deal damage in Flight   Flight task requires teamwork Requires multimodal behavior New-Mode Mutation Encourage multimodal behavior  New mode with inputs from preexisting mode   Initially  very similar Maximum preference node determines mode Evolving Multiple Output Modes  Research Question:  How   to evolve teams that do well in both tasks Compare 1Mode to ModeMutation Three evals in Fight and three in Flight  Same networks for two different tasks 1Mode Behaviors  Aggressive + Corralling  Aggressive in Fight task  Take lots of damage  Deal lots of damage  Corralling in Flight task  Run/Rush + Crowding  Run/Rush in Fight task  Good timing on attack  Kill fighter w/o taking too much damage  Crowding in Flight task  Get too close to prey  Knock prey out and it escapes  Networks can’t handle both tasks! ModeMutation Behaviors  Alternating Baiting + Corralling  Alternating baiting in Fight task  Corralling in Flight task    Spread out to prevent escape Individuals rush in to attack Hit into Crowd + Crowding  Hitting into Crowd in Fight task  One attacker knocks fighter into others  Crowding in Flight task  Rush prey, ricochet back and forth  Some times knocks prey free  Networks succeed at both tasks! Mode Mutation Conclusions ModeMutation slower than 1Mode  ModeMutation better at producing multimodal behaviors  Harder task resulted in more failed runs  Many unused output modes created   Slows down execution  Bloats output layer Proposed Work  Extensions Avoiding Stagnation by Promoting Diversity 2. Extending Evolution of Multiple Output Modes 3. Heterogeneous Teams Using Subpopulations 4. Open-Ended Evolution + TUG 1.   Evaluate in new tasks Killer App: Unreal Tournament 2004 1. Avoiding Stagnation by Promoting Diversity Behavioral diversity avoids stagnation  Add a diversity objective (Mouret et al. 2009)  Behavior vector:   Given input vectors, concatenate outputs  1.2 -2 Diversity objective: … -1 2.2 …  AVG distance from other behavior vectors in pop. 0.5 1 0.2 1.7 -2 … 1.5 -1 0.6 0.3 2 2. Extending Evolution of Multiple Output Modes  Encourage mode differences  Random  input sources Probabilistic arbitration  Bad modes less likely to persist  Like softmax action selection  Restrict New-Mode Mutation  New objective: punish unused modes, reward used modes  Delete similar modes  Based on behavior metric  Limit  modes: make best use of limited resources Dynamically increase the limit? 3. Heterogeneous Teams Using Subpopulations  Each team member from different subpopulation (Yong 2007)   Encourages division of labor across teammates Different roles leads to multimodal team behavior 4. Open-Ended Evolution + TUG  Keep increasing goals  Evolution has something to strive towards Preserves benefits of TUG  Does not settle early  When to increase goals?   When  As all goals are achieved individual goals are achieved New Tasks   More tasks require more modes Investigate single-agent tasks   Investigate complementary objectives    Only teams so far TUG only helps contradictory? Are hard when combined with others Tasks: 1. Predator • • 2. Opposite of Flight Partial observability Sink the Ball • • • Very different from previous Needs more distinct modes? Less mode sharing? Unreal Tournament 2004   Commercial First-Person Shooter (FPS) Challenging domain  Continuous state and action  Multiobjective  Partial information  Multimodal behaviors required   Programming API: Pogamut Competitions:  Botprize  Deathmatch Unreal Deathmatch    Packaged bots are hand-coded Previous winners of botprize hand-coded Learning attempts  Simplified version of game (van Hoorn et al. 2009)  Limited to certain behaviors (Kadlec 2008)  Multimodal behavior in full game: not done yet Unreal Teams  Team Deathmatch  Largely  ignored? Capture the Flag  Teams protect own flag  Bring enemy flag to base  GP approach could not beat UT bots (Kadlec 2008)  Domination  King of hill  Teams defend key locations  RL approach learned group strategy hand-coded bots (Smith et al. 2007) of Review  System for developing multimodal behavior  Multiobjective Evolution  Targeting Unachieved Goals  New-Mode Mutation  Behavioral Diversity  Extending Mode Mutation  Subpopulations  Open-Ended Evolution  Final evaluation in Unreal Tournament 2004 Conclusion  Create system:  Automatically discovers multimodal behavior  No high-level hierarchy needed  No low-level behaviors needed  Works in continuous, noisy environments  Discovers team behavior as well Agents w/array of different useful behaviors  Lead to better agents/behaviors in simulations, games and robotics  Questions? Auxiliary Slides Design (2)  Behavior Trees (Isla 2005)  Top-down approach  Used first in Halo 2  Other commercial games since  “Brute Force Approach to Common Sense”  Add a behavior for every situation  Hand coded Evolutionary (0)  Dangerous Foraging (Stanley et al. 2003)  Don’t know if food is safe or poison  Partial information  Multimodal:    Eat food Avoid food Adaptive Teams of Agents (Bryant  Roman legions defend  Homogeneous teams  Barbarians plunder  Multimodal:   Defend town Chase barbarian and Miikkulainen 2003) Other MOEAs  PESA-II (Corne et al. 2001) strength = 0 strength = 1 In archive  External archive and internal population  Region-based selection using squeeze factor  SPEA2 (Zitzler and Thiele 1999)  External archive and internal population  Fitness based on strength and density squeeze factor = 2 squeeze factor = 1 1. Predator  Scripted agent is predator  Chases nearest monster Monsters run to avoid damage  When combined with Flight, is partially observable   Opposite  behavior from Flight Complementary objectives  Avoid damage  Stay alive 2. Sink the Ball  Monsters push ball around  No scripted agent  Monster agent sensors tuned to ball   Move ball to goal to sink it Very different from previous tasks  More  distinct behavioral modes Complementary objectives  Min. distance between ball and goal  Min. time to sink goal Heterogeneous z vs. NSGA-II Homogeneous z vs. NSGA-II The z-scores method was faster Heterogeneous NSGA-II vs. TUG Homogeneous NSGA-II vs. TUG Heterogeneous 1Mode vs. ModeMutation 1Mode was faster Homogeneous 1Mode vs. ModeMutation 1Mode was faster

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download schrum.proposal - Neural Network Research Group