Download schrum.gecco2010

Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping Jacob Schrum and Risto Miikkulainen University of Texas at Austin Department of Computer Science Typical Uses of MOEAs  Where have MOEAs proven themselves?              Wireless Sensor Networks (Woehrle et al, 2010) Groundwater Management (Siegfried et al 2009) Hydrologic model calibration (Tang et al, 2006) Epoxy polymerization (Deb et al, 2004) Voltage-controlled oscillator design (Chu et al, 2004) Multi-spindle gear-box design (Deb & Jain, 2003) Foundry casting scheduling (Deb & Reddy, 2001) Multipoint airfoil design (Poloni & Pediroda, 1997) Design of aerodynamic compressor blades (Obayashi, 1997) Electromagnetic system design (Michielssen & Weile, 1995) Microprocessor design (Stanley & Mudge, 1995) Design of laminated ceramic composites (Belegundu et al, 1994) Many engineering/design problems! New Domains for MOEAs   Simulated agents often face multiple objectives Automatic discovery of intelligent behavior      Video game opponents in Unreal Tournament (van Hoorn, 2009) Predator/prey scenarios (Schrum & Miikkulainen 2009) Race car driving in TORCS (Agapitos et al, 2008) Comparatively little so far Direct application of MOEA seldom successful  Success often depends on “shaping” What is Shaping?    Term from Behavioral Psychology Identified by B. F. Skinner (1938) Task-Based Example: Train rat to press lever    First reward proximity Then any interaction with lever Then actual pressing of lever Evolutionary Shaping    Environment changes, making task harder Evolution shapes behavior across generations Example: Migration given continental drift [1]    Animals become accustomed to short migration Continental drift increases distance of migration Ability to travel increasing distances required Arctic Tern Atlantic Salmon  EC models with incremental evolution (ex. [2]) [1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975. [2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008. Fitness-Based Shaping     Not extensively used Little/no domain knowledge needed Multiobjective approach a good fit Selection criteria change   Exploiting ignored objectives (TUG) Exploiting unfilled niches (BD) Objective Space Dominated, but exploiting mostly ignored objective Uncrowded Niches Behavior Space Crowded Niches Mutiobjective Optimization   v  u iff  Pareto dominance:  i  1, , n: v  u i i  i  1, , n: v  u i i Assumes maximization Want nondominated points NSGA-II used in this work  What to evolve?     NNs as control policies Nondominated Constructive Neuroevolution     Genetic Algorithms + Neural Networks Build structure incrementally (complexification) Good at generating control policies Three basic mutations (no crossover used) Perturb Weight Add Connection Add Node Targeting Unachieved Goals  Main ideas:    “Hard” and “easy” defined in terms of goal values     Temporarily deactivate “easy” objectives Focus on “hard” objectives Easy: average fitness “persists” above goal (achieved) Hard: goal not yet achieved Objectives reactivated when no longer achieved Increase goal values when all achieved Hard Objectives TUG Example Other goals also achieved → Goals increase Noisy evaluations Goal achieved Reset recency-weighted average Behavioral Diversity  Originally developed for single-objective tasks [3]     Add behavioral diversity objective Encourage exploration of new behaviors Domain-specific behavior measure required Extensions in this work:    Senses Multiobjective task Domain independent method Only requires policy mapping ℝN to ℝ M, e.g. NNs Actions [3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009. Behavioral Diversity Details  Behavior vector:  Given input vectors, concatenate outputs 0.1 2.3 4.3 5.2 3.2 Behavior vector 0.5 5.3 7.5 3.4 2.1 2.4 4.3 0.7 4.2 … 1.3 4.2 5.6 4.5 7.7  Behavioral diversity objective:  AVG distance from other behavior vectors High average distance from other points … 2.1 3.5 Battle Domain  Evolved monsters (blue)   Scripted fighter (green)   Bat can hurt monsters Three objectives     Monsters can hurt fighter Deal damage Avoid damage Stay alive Previous work required incremental evolution to solve Experimental Comparison  NN copied to 4 monsters   In paper     Control: Plain NSGA-II TUG: NSGA-II with TUG using expert initial goals BD: NSGA-II with BD using random input vectors Additional methods since publication    Homogeneous teams TUG-Low: NSGA-II with TUG using minimal initial goals BD-Obs: NSGA-II with BD using inputs from evaluations Each repeated 30 times Attainment Surfaces [4]  Result attainment surface   Shows space dominated by single Pareto front Summary attainment surface s   Union of space dominated in at least s out of n runs Surface s weakly dominates s+1, etc. Individual surfaces intersect Surface 1 Surface 2 Surface 3 Pareto Fronts (Approximation Sets) Result Attainment Surfaces Summary Attainment Surfaces [4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005. Final Summary Attainment Surfaces Animation: worst to best summary attainment surface Control TUG TUG-Low BD BD-Obs Hypervolume Metric [5]  Hypervolume of result attainment surface   WRT reference point   Simply “volume” for 3 domain objectives Slightly less than minimum scores Pareto-compliant metric  F1  F2  HV1  HV2 Hypervolume = A + B + C + D [5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998. Hypervolume Successful Behaviors BD TUG BD-Obs TUG-Low Discussion    Control: more extreme trade-offs BD: more precise timing BD-Obs and BD similar   “Real” inputs give no advantage TUG: more teamwork  Particular initial objectives  TUG-Low more like BD than TUG  ALL are better than Control Future Work  How to combine TUG and BD   Naïve combination doesn’t work Scaling up    Many objectives More complex domains Current work in Unreal Tournament promising Conclusion   BD and TUG improve MO evolution Domain independence!   Contrast to task-based shaping Expand MOEAs to a new range of domains Questions? Email: [email protected] See movies at: http://nn.cs.utexas.edu/?fitness-shaping TUG Details  Persistence:    rt 1  rt   ( xt 1  rt ) Goals:      Recency-weighted average rt surpasses goal Initial values based on domain knowledge Or simply the minimal values for objectives Increase each goal g when all are achieved g o  g o   (omax  g o ) Goal achieved o Objectives reactivated when no longer achieved TUG Cycles

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download schrum.gecco2010