Evolving Winning Controllers for Virtual Race Cars
Yonatan Shichel & Moshe Sipper

Outline
• Introduction
  – Artificial Intelligence
  – AI in games
    • Robocode: Java-based tank-battle simulator
    • RARS: Robot Auto Racing Simulator
  – Evolutionary Computation
    • Key concepts in evolution
    • Genetic Algorithms (GA)
    • Genetic Programming (GP)
• GP-RARS: evolution of winning controllers for virtual race cars
  – Game description
  – Previous work
  – Evolutionary environment setup & calibration
  – Experiments and Results
  – Discussion
  – Result Analysis
• Concluding Remarks

Introduction

Artificial Intelligence (AI)
Definition (Russell & Norvig, 2003): “systems that [act/think] [like humans/rationally]”

AI in Games
• games are natural candidates for AI
• games provide a variety of challenges
• games allow exploration of real-world realms
• games allow comparison to human behavior
• games can be rewarding to master
• games are fun!
Robocode
• tank-battle simulation
• Java-based, open-source programming game
• simplistic physical model
• active gamer community
  – extensive online robot library
  – ongoing tournaments

RARS: Robot Auto Racing Simulator
• car-race simulation
• C++-based, open-source programming game
• sophisticated physical model
• inactive gamer community
  – limited online robot library
  – tournaments held between 1995 and 2003

Evolutionary Computation
“a family of algorithmic approaches aimed at finding optimal solutions to search problems of high complexity”

Key Concepts in Evolution
The Origin of Species (Darwin, 1859):
• a population is composed of many individuals
• individuals differ in characteristics, which are inheritable by means of sexual reproduction
• the environment consists of limited resources, leading to a struggle for survival
• fitter individuals are more likely to survive and reproduce, passing their characteristics to their offspring
• as time passes, populations slowly adapt to their surrounding environment

Genetic Algorithms (GA)
Inspired by Darwin’s evolutionary principles:
• a fixed-size population is composed of many candidate solutions to the problem at hand
• solutions are encoded in genomes
• a fitness function determines how fit each individual is
• the population is re-populated in each generation
• fitter individuals have a higher probability of being selected for the next generation
• genetic operators, crossover and mutation, are applied to selected individuals to create new individuals
• the process is repeated for many generations

A schematic flow of a basic GA:
  g = 0
  initialize population P(0)
  evaluate P(0)    // assign fitness values to individuals
  while (termination condition not met) do
    g = g + 1
    select P(g) from P(g-1)
    crossover P(g)
    mutate P(g)
    evaluate P(g)
  end while

GA customization:
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation

Genetic Programming (GP)
“an evolutionary computation approach aimed at the creation of computer programs rather than static solutions”
• an individual’s genome is composed of LISP expressions
• LISP expressions are composed of functions and terminals
• LISP expressions evaluate to numeric values, and hence represent functions
• genetic operators are defined to operate on (and return) LISP expressions

Example of a LISP expression, with functions {+, *} and terminals {1, x}:
  (+ (* x x) 1)  ==>  x^2 + 1
  (as a tree: + at the root, with the subtrees (* x x) and 1)

Evaluation of the LISP expression (+ (* x x) 1):
  x:       -2   -1    0    1    2
  value:    5    2    1    2    5

Subtree substitution crossover:
  parents:    (+ (* x x) 1) = x^2 + 1   and   (- 1 (* 1 (+ x 1))) = -x
  the subtrees (* x x) and (* 1 (+ x 1)) are swapped
  offspring:  (+ (* 1 (+ x 1)) 1) = x + 2   and   (- 1 (* x x)) = 1 - x^2

Random subtree growth mutation:
  parent:     (+ (* x x) 1) = x^2 + 1
  the second x is replaced by the randomly grown subtree (- 1 1)
  offspring:  (+ (* x (- 1 1)) 1) = 1

A schematic flow of a basic GP:
  g = 0
  initialize population P(0)
  evaluate P(0)    // assign fitness values to individuals
  while (termination condition not met) do
    g = g + 1
    while (P(g) is not full) do
      OP = choose a genetic operator
      select individual(s) from P(g-1) according to OP's inputs
      apply OP to the selected individuals
      add the resulting individuals to P(g)
    end while
    evaluate P(g)
  end while

GP-RARS: evolution of winning controllers for virtual race cars

Basic Rules
• one or more cars drive on a track for a given number of laps
• cars are damaged when colliding or driving off the track
• a car may be disabled and disqualified if its damage exceeds a certain level
• the winner is the driver that finishes first

Game Variants
• number of cars: one, two, multiple
• number of tracks: one, multiple
• race length: short, long
• controller program: generic, specialized
• driver class: reactive (c2), optimal-path (c1)

Controlling the Car
• movement: desired speed variable
• steering: wheel angle variable
• fuel & damage: pit stop request flag

Car Sensors
situation variables:
• current speed, drift speed and heading
• current track segment ID
• position on current track segment
• distances from left and right road shoulders
• distance to next track segment
• radii and lengths of current and next track segments
additional data:
• complete track layout
• nearby cars information
[Figure: some basic RARS situation variables]

The Challenge
PEAS system (Russell & Norvig, 2003):
• Performance measure
• Environment
• Actuators
• Sensors

Is the environment...   RARS                 GP-RARS
...observable?          fully                fully
...deterministic?       partially            partially
...episodic?            no                   no
...static?              either               static
...discrete?            continuous           continuous
...single agent?        single OR multiple   single

A static environment changes only through the actions of the active agent. Basic RARS can be non-static when more than one car is racing; GP-RARS is single-car and therefore fully static.

Previous Work
• planning approaches:
  – Genetic Algorithms (Eleveld, Sáez)
  – A* search (Pajala)
• reactive approaches:
  – Decision Trees (Wang)
  – Action Tables (Cleland)
  – Artificial Neural Networks (Ng, Pyeatt, Coulom)
  – Evolving Neural Networks (Stanley)

Evolutionary Setup & Calibration
• genome representation
• fitness measure
• selection method
• crossover method
• mutation method
• termination condition
• initial population creation

Genome Representation
• each individual is composed of two trees:
  – steering tree
  – throttling tree
• trees evaluate to numeric values, which are truncated to fit game-world restrictions
• trees are defined using an extensive set of functions and terminals, both simple and complex
• terminal set (simple): {cur-rad, nex-rad, to-end, nex-len, v, vn, to-lft, to-rgt, track-width, random-constant, 0, 1}
• terminal set (complex): {a, a-angle, off-center, inner-wall, outer-wall, closest-wall}
• function set (arity in parentheses): {add(2), sub(2), mul(2), div(2), abs(1), neg(1), tan(1), if-greater(4), if-positive(3), if-cur-straight(2), if-nex-straight(2)}
• a subset of these terminals and functions was chosen after a calibration process

Fitness Measure
• fitness is evaluated on a single-lap, single-car race on one track: sepang
• this track is believed to exhibit a wide variety of track features
• two fitness measures were used:
  – race distance
  – modified race time

Selection Method
• several methods were examined for a 250-individual population:
  – tournament of k, with k = {2,3,4,5,6,7}
  – fitness-proportionate selection
  – square-fitness-proportionate selection

Crossover & Mutation
• crossover: subtree substitution
• mutation: random subtree growth
• probabilities:
  – 40% reproduction
  – 50% crossover
  – 10% mutation
    • 5% random constant mutation
    • 5% structural (subtree) mutation

Initialization & Termination
• initial population creation:
  – Koza’s ‘ramped-half-and-half’ method: for each k = {4,5,6,7,8}:
    • 10% of the trees grown to a depth of up to k (grow method)
    • 10% of the trees grown to a depth of exactly k (full method)
• termination condition:
  – evolution stops after 255 generations

Experiments & Results
• several evolutionary runs were made
• the two best runs were taken, and the best driver of the last generation was extracted from each
• each driver was then tested on 10 single-lap, single-car races
best run, race-distance fitness: GP-Single-1, 160.0 ± 0.4 seconds
best run, modified-race-time fitness: GP-Single-2, 160.9 ± 0.3 seconds
...but how do they drive?
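The GP ingredients listed in the setup slides can be sketched in Python on the toy problem from the GP introduction, evolving an expression for x^2 + 1. These ingredients are tree genomes, subtree-substitution crossover, random-subtree-growth mutation, tournament selection, Koza's ramped-half-and-half initialization, and the 40/50/10 reproduction/crossover/mutation split. This is a minimal illustration, not the GP-RARS implementation. The following are all assumptions: the primitive set, the tournament size k = 3, the depth cap, the population size, the generation count, and the error-based fitness. The 10% mutation share is applied as structural mutation only, because the toy set has no random constants.

```python
import operator
import random

# Hypothetical toy primitive set: the slides' functions {+, *} plus "-", which
# their crossover and mutation examples also use, and terminals {1, x}.
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMS = ["x", 1]
# A tree is a terminal or a tuple (op, left, right):
# (+ (* x x) 1) is written ("+", ("*", "x", "x"), 1).

def evaluate(tree, x):
    """Recursively evaluate a tree for a given value of x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def depth(tree):
    """Tree depth in edges; a lone terminal has depth 0."""
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node, root first."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree whose subtree at path is replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def grow(rng, k):
    """Grow method: pick uniformly among all primitives until depth k forces a terminal."""
    if k == 0 or rng.random() < len(TERMS) / (len(TERMS) + len(FUNCS)):
        return rng.choice(TERMS)
    return (rng.choice(list(FUNCS)), grow(rng, k - 1), grow(rng, k - 1))

def full(rng, k):
    """Full method: functions only, so every leaf sits at depth exactly k."""
    if k == 0:
        return rng.choice(TERMS)
    return (rng.choice(list(FUNCS)), full(rng, k - 1), full(rng, k - 1))

def ramped_half_and_half(rng, size, depths):
    """Equal shares per depth k, half built with grow and half with full."""
    share = size // (2 * len(depths))
    pop = [make(rng, k) for k in depths for make in (grow, full) for _ in range(share)]
    return pop + [grow(rng, max(depths)) for _ in range(size - len(pop))]

def crossover(a, b, rng):
    """Subtree substitution: swap a random subtree of a with one of b."""
    pa, pb = rng.choice(list(paths(a))), rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

def mutate(tree, rng):
    """Random subtree growth: replace a random node with a freshly grown subtree."""
    return replace(tree, rng.choice(list(paths(tree))), grow(rng, 2))

def tournament(scored, rng, k=3):
    """Tournament of k over (error, tree) pairs; the lowest error wins."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]

def evolve(error, size=100, generations=20, depths=range(2, 6), max_depth=8, seed=0):
    """Generational GP with a 40/50/10 reproduction/crossover/mutation split."""
    rng = random.Random(seed)
    pop = ramped_half_and_half(rng, size, depths)
    for _ in range(generations):
        scored = [(error(t), t) for t in pop]
        nxt = []
        while len(nxt) < size:
            r = rng.random()
            if r < 0.4:
                kids = [tournament(scored, rng)]
            elif r < 0.9:
                kids = crossover(tournament(scored, rng), tournament(scored, rng), rng)
            else:
                kids = [mutate(tournament(scored, rng), rng)]
            # Depth cap (an assumption, not from the slides) to limit bloat.
            nxt += [t for t in kids if depth(t) <= max_depth]
        pop = nxt[:size]
    return min(pop, key=error)

def error(tree):
    """Hypothetical fitness: total absolute error against x^2 + 1 on x = -2..2."""
    return sum(abs(evaluate(tree, x) - (x * x + 1)) for x in range(-2, 3))

if __name__ == "__main__":
    best = evolve(error)
    print(best, error(best))
```

Here a lower error is fitter. A maximizing measure such as race distance would instead use `max` in `tournament` and in the final pick of `evolve`. The slides' run settings (250 individuals, 255 generations, depths 4 to 8) can be passed as arguments, at a much higher running time in pure Python.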
Result Comparison
• comparison to human-crafted drivers
  – on the training track
  – on ‘unseen’ tracks
• comparison to machine-crafted drivers

Single-car, single-lap race on sepang
#    Driver      Class   Lap Time (sec.)
1    Dodger13    1       146.3 ± 0.1
2    K1999       1       146.6 ± 0.1
3    K2001       1       147.1 ± 0.1
4    SmoothB4    1       148.3 ± 0.1
5    Bulle2      1       150.4 ± 0.1
6    Sparky5     1       150.4 ± 0.1
7    SmoothB3    1       153.3 ± 0.1
8    Felix16     1       153.6 ± 0.1
9    SmoothB2    1       156.5 ± 0.1
10   GPSingle1   2       160.0 ± 0.4
11   GPSingle2   2       160.9 ± 0.3
12   Vector      2       160.1 ± 0.1
13   WappuCar    2       161.7 ± 0.1
14   Apex8       2       162.5 ± 0.2
15   Djoefe      2       163.7 ± 0.1
16   Ali2        2       164.1 ± 0.1
17   Mafanja     2       164.4 ± 0.3
18   SBv1r4      2       165.7 ± 0.1
19   Burns       2       168.4 ± 5.7
20   Eagle       2       169.3 ± 0.6
21   Bulle       2       169.5 ± 0.2
22   Magic       2       174.0 ± 0.1
23   JR001       2       178.5 ± 0.1

Aug. 2004 season results (16 tracks)
#    Driver      1st  2nd  3rd  total
1    Vector      6    3    2    11
2    Eagle       3    2    1    6
3    GPSingle2   2    3    4    9
4    GPSingle1   2    2    2    6
5    SBv1r4      1    1    2    4
6    Bulle       1
7    Mafanja     2
8    Magic       2    2
9    WappuCar    1    2
10   Djoefe      2    2
11   Burns       1    1
12   Ali2
13   Apex8
14   JR001       1    1    2    4

Previous works results
Author                          Track     Reported Time (sec.)   GP-Single-1    GP-Single-2
Eleveld (GA)                    v01       37.8 ± 0.1             38.1 ± 1.7     34.9 ± 0.1
                                suzuka    149.7 ± 0.1            177.1 ± 5.2    167.5 ± 0.3
                                race7     85.7 ± 0.2             61.9 ± 0.6     63.3 ± 0.4
Ng et al. (ANN)                 v03       59.4                   55.3 ± 0.5     49.3 ± 0.1
                                oval      33.0                   31.0 ± 0.1     30.8 ± 0.1
                                complex   209.0                  196.2 ± 6.0    204.6 ± 1.3
Coulom (ANN)                    clkwis    38.0                   37.8 ± 0.1     36.4 ± 0.1
Cleland (Action Tables)         v01       37.4                   38.1 ± 1.7     34.9 ± 0.1
Stanley et al. (Evolving ANN)   clkwis    37.6 / 37.9            37.8 ± 0.1     36.4 ± 0.1

Conclusions
• GP-Drivers rank higher than any human-crafted driver in their class when racing on their training track
• GP-Drivers rank among the top human-crafted drivers in their class when racing on new, unseen tracks
• GP-Drivers perform better than any machine-crafted driver developed by past RARS researchers

Discussion

Performance Analysis
[Figure: GPSingle2 on sepang (159.9 sec)]
[Figure: Dodger13 on sepang (146.5 sec)]
[Figure: GPSingle2 on clkwis]

Genome Representation (revisited)
• of the terminals and functions listed in the setup, the best-of-run individual uses only a subset, “chosen” by evolution

Genetic Analysis: GP-Single-2, Steering
original expression:
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% 1.0 (% v a)) (neg a))) (- ((* n (neg n)) (neg a)) (neg a))) (- (% 1.0 (% v a)) (neg (% (% 1.0 (% v a)) (% v a)))))
after replacing (% 1.0 (% v a)) with (% a v):
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (- a (neg a))) (- (% a v) (neg a))) (- ((* n (neg n)) (neg a)) (neg a))) (- (% a v) (neg (% (% a v) (% v a)))))
after further algebraic simplification:
(% (% (% (% (ifg 0.70230484 a α (* n -0.9850136)) (+ a a)) (+ (% a v) a)) (- ((neg (* n n)) (neg a)) (neg a))) (- (% a v) (neg (* (% a v) (% a v)))))

Steering behavior depends on a, the distance to the upcoming curve. When the next turn is far away, the controller makes slight wheel-angle adjustments to keep the car from drifting off the track. When approaching a curve, however, it steers according to the relative curve angle, so steep curves result in extreme wheel-angle values.

Genetic Analysis: what’s a/v?
• a – distance to next obstacle
• v – current speed
a/v – time to crash!
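The a/v reading can be checked numerically with the sketch below. It assumes that % in the evolved trees is Koza-style protected division, which returns 1 when the denominator is zero. This is a common GP convention, but the slides do not spell it out. The names `pdiv`, `time_to_crash` and `evolved_term` are hypothetical. Under this assumption, the rewrite of (% 1.0 (% v a)) as (% a v) used in the steering analysis holds up to floating-point rounding in every case except a = 0 with v ≠ 0.

```python
# Assumption: "%" in the evolved expressions is Koza-style protected division,
# which returns 1 when the denominator is zero. The slides do not define it.
def pdiv(num: float, den: float) -> float:
    """Protected division: num / den, or 1.0 if den is zero."""
    return num / den if den != 0 else 1.0


def time_to_crash(a: float, v: float) -> float:
    """(% a v): distance to the next obstacle divided by current speed."""
    return pdiv(a, v)


def evolved_term(a: float, v: float) -> float:
    """(% 1.0 (% v a)) as it appears in the evolved steering tree."""
    return pdiv(1.0, pdiv(v, a))


# 100 distance units ahead at 50 units per second: 2 seconds to impact.
print(time_to_crash(100.0, 50.0), evolved_term(100.0, 50.0))  # 2.0 2.0
# At a == 0, protected division makes the two forms disagree.
print(time_to_crash(0.0, 50.0), evolved_term(0.0, 50.0))  # 0.0 1.0
```

A standing car (v = 0) also shows how protected division differs from physics. The true time to crash is unbounded, but both forms return 1.0.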
Genetic Analysis: GP-Single-2, Throttling
(ifpos (abs (% v a)) (- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a)))) (% (neg (- (- (* n (neg toright)) (neg a)) (neg a))) (- (% 1.0 (% v a)) (neg (% (* n (neg n)) (% v a))))))
Since (abs (% v a)) is never negative, the if-positive test succeeds whenever v/a is nonzero, so in practice the throttle is governed by the first branch:
(- (% 1.0 (% v a)) (neg (- (* n (* n -0.86818504)) (neg a))))
[Figure: GP-Single-2, Throttling]

Future Work
• apply GP to other RARS variants
  – multiple-car scenarios
  – long (endurance) races
• use GA to plan optimal paths
• migrate research to TORCS

Bibliography
• Russell, Stuart and Norvig, Peter. Artificial Intelligence: A Modern Approach. 2nd edition. Prentice Hall, 2003. ISBN 0-13-790395-2
• Darwin, Charles. On the Origin of Species: By Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life. London: John Murray, 1859. ISBN 0-486-45006-6
• Shichel, Yehonatan, Ziserman, Eran and Sipper, Moshe. GP-Robocode: Using Genetic Programming to Evolve Robocode Players. 8th European Conference on Genetic Programming. Springer, 2005. pp. 143-154
• Eleveld, Doug. [Online] http://rars.sourceforge.net/selection/douge1.txt
• Pajala, Jussi. [Online] http://rars.sourceforge.net/selection/jussi.html
• Wang, Zhijin. Car Simulation Using Reinforcement Learning. Computer Science Department, University of British Columbia. Vancouver, B.C., Canada, 2003
• Ng, Kim C, et al. MoNiF: a modular neuro-fuzzy controller for race car navigation. IEEE International Symposium on Computational Intelligence in Robotics and Automation. Monterey, CA, USA, 1997. pp. 74-79. ISBN 0-8186-8138-1
• Pyeatt, Larry D and Howe, Adele E. Learning to Race: Experiments with a Simulated Race Car. 11th International Florida Artificial Intelligence Research Society Conference. Sanibel Island, Florida, USA, 1998
• Coulom, Rémi. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD Thesis. Institut National Polytechnique de Grenoble, 2002
• Cleland, Ben. Reinforcement Learning for Racecar Control. M.Sc. Thesis. University of Waikato, 2006
• Stanley, Kenneth, et al. Neuroevolution of an automobile crash warning system. Genetic and Evolutionary Computation Conference, 2005. pp. 1977-1984. ISBN 1-59593-0108
• Sáez, Yago, et al. Driving Cars by Means of Genetic Algorithms. Parallel Problem Solving from Nature – PPSN X. Springer, 2008. pp. 1101-1110