Poznan University of Technology
Faculty of Computing Science and Management
Institute of Computing Science

Evolving Players for a Real-Time Strategy Game Using Gene Expression Programming

Paweł Lichocki

1 September 2008

Supervisor: Krzysztof Krawiec, Ph.D.

Abstract

This thesis focuses on the fields of real-time strategy games, evolutionary computation, distributed machine learning and multi-agent systems. In general, the problem is to automatically learn the best strategy for playing a real-time strategy game, more precisely a two-player combat of marines and tanks. The idea was inspired by the ORTS RTS Game AI Competition held annually at the University of Alberta. The given problem is very complex and multicriterial, so the final solutions presented here are the result of constant development and countless improvements. In this paper we try to underline the iterative nature of this process and propose a methodology that could be applied to other problems in the real-time games field. We show how to model the strategy as a multi-agent system and how to fine-tune the evolutionary process of searching for the best players. We also explore the subject of distributed learning, focusing on using a computation cluster for evaluating solutions. The methods of evaluation are also elaborated in the context of co-evolution: we compare two different methods that use competitive fitness, single elimination tournament and hall of fame. In order to encode structures as complex as game strategies, we developed an innovative approach to strongly typed gene expression programming. It uses a linear chromosome that, in a process of two-phase expression, is transformed into a valid strongly typed tree. Finally, we also present the issues of design and implementation of the framework used for performing the experiments. All the computations we conducted amount to more than two years of constant work of a single machine.
But in the end, regardless of this computational effort, not all of the evolutionary methods worked as they had been expected to. However, the data gathered allowed us to draw many constructive conclusions and to propose several different directions of future research.

Acknowledgments

I am very grateful for the advice of my supervisor, Krzysztof Krawiec, and his assistant, Wojciech Jaśkowski, from Poznan University of Technology. I wish to thank both of them for their feedback and guidelines. Wojciech Jaśkowski was an irreplaceable source of ideas and literature references. He kept me focused on my research, constantly encouraged me to do more work and, if needed, did not hesitate to criticise it constructively. I would also like to acknowledge Prof. Cristina Bazgan and Prof. Daniel Vanderpooten, who were supportive during my visit at Université Paris-Dauphine. While there, I was lucky enough to share my ideas with other French students. Their remarks allowed me to view my research from a different angle, for which I am very grateful to them.

Contents

1 Introduction  7
  1.1 Scope of research  7
  1.2 Motivation and benefits  7
  1.3 Goals  8
  1.4 Thesis organization  9
  1.5 Used concepts  9
2 The problem  11
  2.1 Methodology  11
  2.2 ORTS contest  12
  2.3 Rules of the game  13
    2.3.1 Type of game  13
    2.3.2 Simulation  13
    2.3.3 World representation  13
    2.3.4 Movement  13
    2.3.5 Fight  14
    2.3.6 Specific rules  14
    2.3.7 Available orders  15
  2.4 State of the art  15
    2.4.1 Evolutionary computation  15
    2.4.2 Multi-agent systems  16
    2.4.3 Machine learning in games  17
    2.4.4 Commercial RTS games  18
3 Modelling  20
  3.1 Objectives  20
  3.2 Initial player model  20
    3.2.1 Human-based player  20
    3.2.2 MAS-based player  21
  3.3 Minimizing AI task  22
    3.3.1 Domain knowledge and memory  22
    3.3.2 Focusing the AI task on units movement  22
  3.4 Multi-agent system  23
    3.4.1 Hybrid MAS  23
    3.4.2 Agent  24
    3.4.3 Vector representation  24
4 Learning  26
  4.1 Objectives  26
  4.2 Algorithm  26
    4.2.1 Search loop  26
    4.2.2 Selection and elitism  27
    4.2.3 Co-evolution  27
  4.3 Evaluation  28
    4.3.1 Single Elimination Tournament  28
    4.3.2 Hall of Fame  30
  4.4 Distributed learning  32
    4.4.1 Assumptions  32
    4.4.2 Distributed SET  32
    4.4.3 Distributed HoF  34
5 Encoding  35
  5.1 Objectives  35
  5.2 Strongly typed GEP  35
    5.2.1 Encoding simple structures  35
    5.2.2 Closure of the representation  36
    5.2.3 Expression by two-phase translation  36
  5.3 Genetic operators  39
    5.3.1 Recombination  39
    5.3.2 Transposition  41
    5.3.3 Inversion and mutation  42
    5.3.4 Dc and RNC operators  43
6 Representation  44
  6.1 Objectives  44
  6.2 Types  44
  6.3 Functions  45
  6.4 Domain knowledge  45
    6.4.1 Simple terminals  45
    6.4.2 Complex scalar terminals  46
    6.4.3 Complex vector terminals  48
    6.4.4 Complex boolean terminals  49
    6.4.5 Normalization  50
    6.4.6 RNC vectors and map mirroring  50
7 Implementation  51
  7.1 Objectives  51
  7.2 The framework  51
    7.2.1 Master-slave design  51
    7.2.2 Tools and libraries  53
  7.3 Maintaining experiments  53
    7.3.1 Monitoring  53
    7.3.2 Logging and analysis  54
8 Experiments and results  55
  8.1 Objectives  55
  8.2 Environment  56
  8.3 First experiment - The Reconnaissance  56
    8.3.1 Objectives and assumptions  56
    8.3.2 Results  57
    8.3.3 Conclusions  59
  8.4 Second experiment - The Skirmish  59
    8.4.1 Objectives and assumptions  59
    8.4.2 Results  60
    8.4.3 Conclusions  62
  8.5 Third experiment - The Final Battle  63
    8.5.1 Objectives and assumptions  63
    8.5.2 Results  63
    8.5.3 Conclusions  66
    8.5.4 ORTS contest  66
  8.6 Next steps  67
    8.6.1 Evolution dynamics  67
    8.6.2 Computational cost  67
    8.6.3 Redefinition of the problem  68
9 Summary  70
  9.1 Contribution  70
  9.2 Work ahead  71
A DVD content  72
B Acronyms  73
Bibliography  74

1 Introduction

The easiest way is always mined.
One of Murphy's Laws of Combat Operations.

1.1 Scope of research

This thesis focuses on the fields of real-time strategy (RTS) games, evolutionary computation (EC), distributed machine learning and multi-agent systems (MAS). In general, the problem is to automatically learn the best strategy to play an RTS game, more precisely a Real-Time Tactical (RTT) two-player combat of marines and tanks. The method used to achieve this is a novel approach of strongly typed gene expression programming (stGEP) combined with a MAS model. The given problem is very complex and multicriterial, so the final solutions presented here are the result of constant development and countless improvements. In this paper we try to emphasise the iterative nature of this process. Apart from showing that a machine is able to learn to play an intellectually demanding real-time game, this thesis proposes a methodology for modelling any RTT game and presents methods of distributed learning.
1.2 Motivation and benefits

Computer and video game sales in 2007 were worth 9.5 billion dollars in the United States alone, and the best-selling computer game genre was strategy, with 33.9% of all units sold [9]. The game industry is growing and consolidating. In November 2006 the European Games Developer Federation was officially formed by its founding members, representing over 500 leading game development studios from ten European countries [2]. Games are constantly being improved; they need to be more complex, faster and smarter, which raises a wide range of unsolved problems in nearly all fields of computer science, such as graphics, code optimization, algorithmics and artificial intelligence, which interests us the most. The volume of sales and the never-ending intellectual and engineering problems bring more and more attention to the game industry from the scientific community. It is worth mentioning that whereas the commercial games industry strives to increase the entertainment value of games, AI research tries to push the cognitive abilities of machines to new levels [11]. In the context of machine learning, RTS games are especially challenging. Their common elements are severe time constraints and a strong demand for real-time AI capable of making real-world decisions quickly and satisfactorily [12]. So far, most scientific and programming effort has been devoted to areas related to classic board games, such as chess, go or checkers. Thus, the main motivation, apart from aiding an industry worth billions of dollars, is to bring AI in games to another level, to explore new possibilities and to develop new methods. As research in this field has begun only recently, one might say that this paper tries to co-create and enhance its fundamentals.
Since computer games require methods for dealing with many different problems, the benefits of solving them are also diverse:

• Simulations are a critical aspect of military training, which has much in common with commercial computer games and can learn from their successful experience [30].
• Cooperative path-finding and efficient management of unit groups is helpful in logistics.
• Learning how to gather and distribute resources between industry, science and the military might aid micro- and macroeconomics.
• Last but not least, RTS games can be seen as a part of operations research, as they can be successfully transformed into many different combinatorial or scheduling problems.

These are just a few examples of the many applications which may profit from the progress of AI research in RTS games.

1.3 Goals

There is one main goal of this thesis: to automatically create the best strategy for a given RTS game. We assume that the measure of AI success may be intuitively represented by the formula A·I: the more artificial and intelligent the resulting strategy is, the better. However, keeping human intervention in the learning process to a minimum is not a goal in itself. Sometimes it is best to incorporate some human knowledge, according to the rule "gain much intelligence at the cost of little artificiality". This thesis has several additional objectives:

• Present strongly typed GEP, an innovative method of encoding complex structures as linear entities.
• Elaborate the methodology of modelling and learning RTS players.
• Analyze different methods of evaluating solutions in the context of distributed learning.
• Prove experimentally that the approach is able to produce good solutions.
• Test human-competitiveness by submitting the best solutions to the 2008 ORTS RTS Game AI Competition [1].
• Present the design of the software framework used in the experiments.
This includes the stGEP implementation, the way of conducting evolutionary experiments and the method of distributing the computations.

1.4 Thesis organization

This thesis is organized as follows:

• Chapter 1 presents the objectives and motivations of this work.
• Chapter 2 presents the problem (the rules of the game) and the methodology used to solve it.
• Chapter 3 shows in detail how to model an RTT player; we focus especially on using multi-agent systems.
• Chapter 4 addresses the topic of evolving the best strategies and fine-tuning the process of learning. The main issue brought up in this chapter is the evaluation of RTT players in a distributed environment.
• Chapter 5 gives details on a way to encode a complex structure in a linear chromosome. We also introduce the innovative approach of strongly typed gene expression programming.
• Chapter 6 describes the exact representations used to construct strategies in the experiments.
• Chapter 7 is about the implemented software and the tools used and developed within the project. We show the design of the master-slave scheme of the framework and give some insights on how the experiments were maintained.
• Chapter 8 shows the results of the experiments; the stress is put on the iterative process of introducing changes into our approach. For each experiment we present the initial objectives and assumptions, then the analysis of the results and finally the conclusions.
• Chapter 9 is a short summary of all the work done and contains general conclusions and ideas for the future.

1.5 Used concepts

The classification of computer games is still an open topic of dispute. Thus, it is necessary to clearly define some concepts.

• Real-Time - means not turn-based. In reality it is not possible to simulate a game with continuous time. Therefore, real time is approximated by using as many turns per second as necessary, so that the human player has the illusion of time continuity.
• Real-Time Strategy (RTS) - a strategic computer game whose gameplay mechanics involve combat, resource gathering, base building and technological development, as well as abstract unit control (giving orders as opposed to controlling units directly) [26].
• Real-Time Tactics (RTT) - a tactical computer game whose gameplay mechanics involve only combat and abstract unit control. Players are expected to complete their tasks (usually defeating the enemy side) using only the combat forces provided to them. We consider RTT to be a subset of RTS games.

The basic term strategy also has different meanings, depending on the context.

• Strategy - in game theory, a way to play the game: a detailed plan describing the player's behaviour in every possible situation (game state). The military point of view is different; let us quote Wikipedia [5]:

Military strategy is a collective name for planning warfare. The father of modern strategic study, Carl von Clausewitz, defined military strategy as "the employment of battles to gain the end of war." Liddell Hart's definition put less emphasis on battles, defining strategy as "the art of distributing and applying military means to fulfill the ends of policy". Hence, both gave preeminence to political aims over military goals, ensuring civilian control of the military. Military tactics (Greek: Taktikē, the art of organizing an army) are the techniques for using weapons or military units in combination for engaging and defeating an enemy in battle.

The border line between strategy and tactics is blurred and sometimes the categorization of a decision is almost a matter of personal opinion. Therefore, to stay consistent with the genre classification of computer games, game theory and the military concepts, the key terms are defined as follows:

• Real-Time Tactics - used to describe the game this thesis focuses on. From the military point of view it is a simulation of a small battle with marines and tanks.
• Strategy - understood as in game theory, that is, as an algorithm to play the game. In the context of RTT and the military it is closest to military tactics.

In this paper, the word "player" is frequently used as a synonym for strategy. Additionally, in the context of EC the strategy may be considered as the individual, and more precisely as its phenotype.

2 The problem

No battle plan ever survives contact with the enemy.
One of Murphy's Laws of Combat Operations.

2.1 Methodology

Machine learning of an RTT player is a very complex and difficult task. Therefore, it was broken down into several steps. This decomposition into a few independent subproblems simplifies and organizes the work, and made it possible to propose a whole methodology for AI-in-games learning.

1. Make assumptions
 (a) define the problem
 (b) set objectives and goals
 (c) review the literature for research on similar problems
2. Modelling
 (a) model a general player
 (b) break down the model and focus the AI effort on the most important task
 (c) formulate the AI as a procedure returning a well-defined result
 (d) specify the details of the model
3. Learning
 (a) choose the learning method
 (b) specify the method of evaluation of players
 (c) define the smallest indivisible computational task of the evaluation method
 (d) distribute the computation
4. Encoding
 (a) encode the player as an abstract entity
 (b) define the expression of the player from the abstract entity
 (c) define the operators for exploring the encoded solution space
5. Representation
 (a) specify all the details of encoding the player - define the solution space
 (b) put stress on designing the domain knowledge
6. Implementation
 (a) choose existing frameworks for simulations and learning or implement them yourself
 (b) implement all necessary additional software
 (c) if needed, design a method of distributing the computations, since game simulations may be very time consuming
7.
Experiments
 (a) conduct experiments
 (b) draw conclusions - if needed, go back to any point above and then test again
 (c) when satisfied, return the result

2.2 ORTS contest

ORTS stands for Open Real-Time Strategy. It is an open-source and hack-free programming environment for real-time strategy game simulation. Let us quote the project's site [3]:

ORTS is a programming environment for studying real-time AI problems such as pathfinding, dealing with imperfect information, scheduling, and planning in the domain of RTS games. These games are fast-paced and very popular. Furthermore, the current state of RTS game AI is bleak, which is mainly caused by the lack of planning and learning - areas in which humans are currently much better than machines. Therefore, RTS games make an ideal test-bed for real-time AI research. Unfortunately, commercial RTS games are closed software, which prevents researchers from connecting remote AI modules to them. Furthermore, commercial RTS games are based on peer-to-peer technology - which in a nutshell runs the entire simulation on all player machines and just hides part of the game state from the players. By tampering with the client software it is possible to reveal the entire game state and thereby gain an unfair advantage. We feel that this is unacceptable for playing games on the Internet. We therefore started the ORTS project to create a free software system that lets people and machines play fair RTS games. The communication protocol is public and all source code and artwork is freely available. Users can connect whatever client software they like. This is made possible by a server/client architecture in which only the currently visible parts of the game state are sent to the players.
This openness leads to new and interesting possibilities, ranging from on-line tournaments of autonomous AI players to gauge their playing strength, to hybrid systems in which human players use sophisticated GUIs which allow them to delegate tasks to AI helper modules of increasing performance.

The ORTS RTS Game AI Competitions have been organized annually since 2006 at the University of Alberta in Canada. This year's edition was held on 1-8 August 2008. It was decided to participate in the contest in order to compete with handmade AI and check the human-competitiveness of the proposed solutions; for details please see Subsection 8.5.4.

2.3 Rules of the game

2.3.1 Type of game

The game that this thesis focuses on is a two-player RTT. All rules, definitions and descriptions were taken from Game 4 ("Small-Scale Combat") of the 2008 ORTS RTS Game AI Competition [15, 13]. Since we use the Open Real-Time Strategy (ORTS) engine to simulate the game, it is necessary to explain in detail how this software works, because it effectively defines the rules of the game itself.

Two players start with 50 marines and 20 siege tanks each, randomly located within the left or right quarter of the map. Marines and tanks are located diagonally symmetrically to the opponent's. Neutral units (called "sheep") roam the map randomly. The objective of the game is to destroy as many opposing units as possible within 2400 simulation frames. The games at the contest were run at a speed of 8 simulation frames per second, which corresponds to 5 minutes. During our research we ran the simulations as fast as possible, reaching approx. 40 frames per second. This means that a single game lasts at most about one minute; this is the approximation that will be used in the time cost analysis of the learning process in Section 4.4.

2.3.2 Simulation

Time in the game is measured in discrete frames of equal duration. In the ORTS competition a pace of 8 frames per second was used.
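The relation between simulation pace and wall-clock game length quoted above (2400 frames at 8 fps equals 5 minutes; at roughly 40 fps, about one minute) can be checked with a short sketch (the function name is ours; the constants come straight from the rules):

```python
# Game length is fixed in simulation frames; wall-clock duration
# depends only on how fast frames are simulated.
GAME_FRAMES = 2400

def game_duration_seconds(frames_per_second: float) -> float:
    """Wall-clock length of one game at a given simulation pace."""
    return GAME_FRAMES / frames_per_second

print(game_duration_seconds(8))   # contest pace: 300 s = 5 minutes
print(game_duration_seconds(40))  # research pace: 60 s = 1 minute per game
```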
Once per frame the game server sends individual game views to all clients (the player software), which can then specify at most one action per game object under their control and send this vector of actions back to the server. All received actions are then randomly shuffled and executed on the server.

2.3.3 World representation

The world is represented by a rectangular array of 64x48 tiles. There are no terrain features (the terrain is unobstructed) and no fog of war (there is full visibility). Objects in the world have a shape and a position on the terrain. The position and size of objects are represented by integers using a scale of 16 points per tile ("tile points"). Units (marines and tanks) are small circles, specified by their center and radius.

2.3.4 Movement

All units have a maximum speed with which they can move. This means that within each simulation frame, the valid moves for an object are constrained to the integer coordinates that are within a distance less than or equal to the speed of the object. Movement targets are only constrained to be integers; any location on the game field is acceptable. Every move is assumed to go in a straight line from the object's current position to its destination. Objects move simultaneously and their current locations are rounded to tile points before being sent to the clients, which can lead to temporary, small object overlap on the client side. In case of collisions with other objects, the moving objects are stopped at the collision location and no damage is inflicted. If the target location of a move command cannot be reached in one simulation cycle, the object will continue to move in a straight line until a new move (or stop) command is sent, the object collides with another object, or scripted game mechanics cause its motion to change.
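The per-frame movement constraint can be sketched as follows (a hedged illustration, not ORTS source code; the function name and the use of plain Euclidean distance are our assumptions based on the rules above):

```python
import math

def valid_frame_moves(x: int, y: int, speed: int) -> list[tuple[int, int]]:
    """All integer coordinates reachable from (x, y) in one simulation
    frame, i.e. within Euclidean distance <= speed (the unit's max speed)."""
    moves = []
    for dx in range(-speed, speed + 1):
        for dy in range(-speed, speed + 1):
            if math.hypot(dx, dy) <= speed:
                moves.append((x + dx, y + dy))
    return moves

# A unit with speed 3 has a small disc of legal single-frame moves.
print(len(valid_frame_moves(0, 0, 3)))  # 29 integer points within radius 3
```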
It is not possible to make several moves in one tick in order to avoid obstacles, even if the total distance of the moves is less than the maximum speed of the unit.

2.3.5 Fight

Units can engage in combat with other units. Marines and tanks attack from a distance. It suffices for any part of the attacked object to be in range. Specifically, the weapon range is compared against the minimum physical distance between any parts of the objects. After attacking, the weapons are subject to a cooldown period before they can attack again. The cooldown time is specified in simulation frames. Objects have hit points (HPs) indicating how much damage they can take before being destroyed. Lost HPs cannot be regained. Units can also have armor, which decreases the damage dealt by a weapon by subtracting a constant from each attack. Damage values are uniformly distributed over certain intervals. Units only die after a simulation frame has been completed, when their HPs have dropped below 1. This ensures that the order of executing attack actions is irrelevant.

2.3.6 Specific rules

In the object value table below, the vision range unit is tiles, build times and cooldown periods are given in simulation frames, costs are in minerals, the speed unit is tile points per simulation frame, and object sizes are given in tile points. For detailed unit characteristics please refer to Table 2.1.

Object           HP   Speed  Size  Range   Armor  Damage  Cooldown  Switch
Tank             150  3      7     112     1      26-34   20        24
Tank (in siege)  150  0      7     32-160  1      50-60   50        24
Marine           80   3      4     64      0      5-7     8         -
Sheep            ∞    3      4     -       0      -       -         -

Tab. 2.1: Unit characteristics

Marines and unsieged tanks can move and attack simultaneously. However, let us emphasise that only one action per game object can be sent to the server in each simulation frame. Therefore, in a simulation frame in which a unit shoots, it is not possible to also order it to move (though it will continue moving if ordered to do so before).
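The fight rules above can be condensed into a short sketch (hedged: the function names and the exact random draw are illustrative assumptions; the damage ranges come from Table 2.1):

```python
import random

def attack(attacker_damage: tuple[int, int], target_armor: int,
           target_hp: int, rng: random.Random) -> int:
    """Resolve one shot: damage is uniform over [lo, hi], armor subtracts
    a constant from each attack, and lost HPs are never regained."""
    lo, hi = attacker_damage
    dealt = max(0, rng.randint(lo, hi) - target_armor)
    return target_hp - dealt

def is_dead_after_frame(hp: int) -> bool:
    """Units die only once the frame completes, when HP has dropped below 1,
    so the order of executing attack actions within a frame is irrelevant."""
    return hp < 1

# A marine (damage 5-7) shooting a tank (armor 1, 150 HP):
rng = random.Random(0)
hp = attack((5, 7), 1, 150, rng)
assert 143 <= hp <= 146  # 150 minus (5..7 minus 1 armor)
```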
Tanks can switch between sieged and unsieged modes, enabling weapon2 and weapon respectively. Neither weapon is available during this transition, nor can the tank move. In siege mode tanks attack ground locations, dealing splash damage with a radius of 15 tile points. This means that targets at the impact location receive the full damage, whereas objects at a distance receive linearly scaled-down damage, up to the impact-to-hull distance of 15 tile points.

2.3.7 Available orders

The full list of possible orders for each unit that can be sent to the server is as follows:

• Marine
 - move(x,y[,s]): start moving towards (x,y) with speed s (or max speed)
 - stop(): stop moving and attacking
 - weapon.attack(obj): attack an object
• Tank (weapon/weapon2 are enabled/disabled based on the current mode)
 - move(x,y[,s]): start moving towards (x,y) with speed s (or max speed)
 - stop(): stop moving and attacking
 - switch(): begin the transition between sieged/unsieged modes (no effect if already switching)
 - weapon.attack(obj): attack an object
 - weapon2.attack(x,y): create an explosion of radius 15 centered on (x,y), dealing splash damage

2.4 State of the art

2.4.1 Evolutionary computation

Evolutionary computation is a subfield of artificial intelligence and its techniques mostly involve metaheuristic optimization algorithms. Although there are many other biologically inspired approaches related to EC, its main part are evolutionary algorithms (EA). They use mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the cost function determines the environment within which the solutions "live". Evolution of the population then takes place after the repeated application of the above operators [4].
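The evolutionary loop just described can be sketched in a few lines (a generic illustration on a toy bit-string problem; none of the names or parameters below come from the thesis' actual setup):

```python
import random

rng = random.Random(42)

def fitness(ind):            # toy cost function: count of ones
    return sum(ind)

def mutate(ind, p=0.05):     # flip each bit with a small probability
    return [b ^ (rng.random() < p) for b in ind]

def crossover(a, b):         # one-point recombination
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def select(pop, k=3):        # tournament selection of size k
    return max(rng.sample(pop, k), key=fitness)

# Repeated application of selection, recombination and mutation
# drives the population towards fitter individuals.
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for generation in range(50):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in pop]

best = max(pop, key=fitness)
print(fitness(best))  # typically close to the optimum of 20
```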
We wish to show the history and progress of evolutionary algorithms; therefore, we distinguish four main milestones (we do not discuss one very popular subfield of EA, evolution strategies (ES), developed in the early '70s and mainly used for optimization in continuous spaces):

• genetic algorithms (GA) - the oldest of all EA methods, dating back to the mid '50s. The chromosome is a sequence of ones and zeros and the fitness function is defined over the genetic representation, measuring the quality of the represented solution. Therefore, it may be said that in GA the phenotype is the same as the genotype and the method finds fixed solutions.
• evolutionary programming (EP) - first used by Lawrence J. Fogel in the mid '60s [24]. The chromosome is an abstract entity, which needs to be expressed into the solution domain before being evaluated by the fitness function. It may be said that in EP the phenotype is not the same as the genotype and the method finds fixed solutions.
• genetic programming (GP) - a method for evolving computer programs. The first experiments with GP were reported by Smith [41] and Cramer [16]. However, it was John Koza who at the turn of the '80s and '90s propagated the idea, starting with [31], and popularized it in a series of his books. In GP a chromosome has a tree structure and does not need further expression. Therefore, it may be said that in GP the phenotype is the same as the genotype and the method finds algorithms (a way of finding solutions).
• gene expression programming (GEP) - the youngest of all EA methods, created and developed by Cândida Ferreira at the beginning of the 21st century in her first papers [18, 19]. GEP is formally a subset of genetic programming and is sometimes referred to as linear GP. However, there are substantial differences: in GEP the chromosome is a linear entity which is expressed into a tree.
Therefore it might be said that in GEP the phenotype is not the same as the genotype, and the method finds algorithms (a way of finding solutions). It might be argued which of the methods is the best - whether it is better to evolve solutions or entire algorithms. That obviously depends on the application and the problem. It is also impossible to say whether it is better to have different genotypes and phenotypes or not. A comparison between GP and GEP effectiveness is still a matter of open discussion, with some interesting claims by C. Ferreira that GEP may outperform GP by a factor of 100 to 60000 [23]. However, the progress from basic heuristics searching for sequences of 1s and 0s to methods finding algorithms encoded in an abstract form is unquestionable. For more information on evolutionary computation in general, please see the book by David E. Goldberg [27]. For the Polish reader we recommend the book by Jarosław Arabas [8].

2.4.2 Multi-agent systems

The terms in the field of multi-agent systems are not well defined. Generally speaking, an agent might be considered a computational mechanism that exhibits a high degree of autonomy, performing actions in its environment based on information (sensors, feedback) received from the environment [35]. A multi-agent system (MAS) means there is more than one agent in the environment. MAS can be used to solve problems which are difficult or impossible for an individual agent or a monolithic system to solve. Examples of problems which are appropriate to multi-agent systems research include online trading, disaster response, and modelling social structures [6]. Multi-agent learning (MAL) means applying methods of machine learning in order to develop a multi-agent system that will solve a given problem or behave in a desired way. The two most often used methods of MAL are:

• Reinforcement learning - useful in domains where reinforcement information is provided after a sequence of actions performed in the environment [35].
Then an agent is rewarded or punished. This cannot be applied to RTT player learning, since we do not know whether an agent's behaviour is desired until the game ends.

• Stochastic search - methods such as evolutionary computation, simulated annealing, stochastic hill-climbing, etc.

The simplest approach in MAL is so-called team learning. It uses a single process of learning to discover a behaviour for the entire team. Because there is only one learner, standard algorithms from single-agent machine learning may be used. Also, this method focuses on the performance of the entire team and not only on single individuals. All this makes team learning a perfect choice for finding good game strategies. There are three different types of team learning:

• Homogeneous learning - only a single behaviour algorithm is created and each agent acts according to it. This simplifies the process of learning, since all the agents have identical behaviour and the search space is drastically reduced. However, sometimes the problem requires agent specialization; in that case homogeneous learning is not advised.

• Heterogeneous learning - separate behaviour algorithms are created for each agent, thus this approach is ideal for problems requiring the emergence of specialists. However, the level of complexity is much higher than in homogeneous learning.

• Hybrid learning - separate behaviour algorithms are created for each type of agent. This method seems to combine the best aspects of the two previous ones, keeping the search space at a reasonable size.

2.4.3 Machine learning in games

Games have always been used as a sort of testbed for AI methods. Two-player board games like checkers or chess have been extensively researched, and it is believed that machine learning has matured enough to challenge much more difficult games, particularly RTT games. Generally speaking, the main idea is based on learning from simulation.
The method itself is not new: it was first proposed for classic turn-based games with imperfect information and/or random components [25], for example bridge or backgammon. In these cases the conventional search for the best move in a given game state uses a technique which evaluates positions by playing a multitude of games with this starting position against itself. However, in the case of RTT games, learning from simulation is used to discover entire strategies for playing the game, since the game's complexity makes evaluating single moves impossible. Let us present a quick overview of some works from the literature; we will focus mainly on using EC in learning to play a game:

• In [10] Azaria and Sipper apply genetic programming to the evolution of strategies for playing the game of backgammon. They explore two different learning setups: using a fixed external opponent as a teacher, and letting the individuals play against each other. The conclusion was that the second approach is better and leads to excellent results.

• In [28] Hauptman and Sipper successfully use genetic programming in the evolution of strategies for playing chess endgames, achieving competitiveness with human-based strategies. The work was continued in [29], where an entire chess player was evolved.

• In [40] Sharabi and Sipper took AI in games to another level by evolving a control system for real-world, sumo-fighting robots.

• In [17] Crawford-Marks tries to evolve team players for quidditch, a complex 3-dimensional game taken from the popular Harry Potter books [39]. However, he concludes that on the evolutionary front, the first attempt at evolving quidditch playing was not as successful as was hoped for.
Feature                   Commercial RTS games     ORTS
Cost                      ~US$ 55                  US$ 0
License                   closed software          free software (GPL)
Game specification        fixed                    user-definable
Network mode              peer-to-peer             server-client
Prone to hacks            yes                      no
Communication protocol    veiled                   open
Network data rate         medium                   low to medium
Unit control              high-level, sequential   low-level, parallel
Game interface            fixed GUI                user-definable

Tab. 2.2: How ORTS relates to commercial games

2.4.4 Commercial RTS games

A rich set of RTS games is available on the market nowadays. The author himself remembers spending many hours playing such games as Warcraft II™, Starcraft™, Age of Empires™ or Seven Kingdoms™. These games are fast-paced war simulations and have become very popular in the recent past, mainly thanks to the multiplayer option. Single-player campaigns may be very interesting, but after a relatively short time they stop being a real challenge, and if a human player finishes the game, it is mainly thanks to the narrative story embedded in the campaign. The way the AI is designed in commercial RTS games is known only to the developers. However, it is no secret that "the intelligence" is hand-designed. The AI programmers first learn to play the game in order to understand its rules and to design good strategies, which are then scripted and hardcoded into the game. The iterative process of tests and improvements results in a static and definitive computer player. Some developers do boast about innovative AI that learns even during play, but so far practice shows that it is no match for a human player. Commercial RTS games software is closed and not expandable. This prevents researchers and hobbyists from tailoring RTS games to their needs and from connecting remote AI modules in order to gauge their playing strength [14]. It effectively hinders real progress of AI in RTS. To counteract this, M. Buro and co-workers implemented the previously mentioned ORTS framework.
There are several advantages of using it; please refer to Table 2.2, taken from [14], for a comparison between commercial games and ORTS. Figures 2.1 and 2.2 show two exemplary screenshots downloaded from [3].

Fig. 2.1: ORTS screenshot; the framework may use 2D graphics
Fig. 2.2: ORTS screenshot; the framework also has high-quality 3D graphics

3 Modelling

Teamwork is essential; it gives the enemy other people to shoot at. One of Murphy’s Laws of Combat Operations.

3.1 Objectives

This chapter is about modelling a player in an RTT game. We try to show the top-down nature of the design process. In addition to what was said in Section 1.5, we precisely define the strategy as a simple, interpretable procedure for (successfully) playing a game or subgame [25]. The requirements for this procedure are:

• it must be easy to use in the playing application (a client for the ORTS server)
• it must be easy to represent as an abstract entity, in order to apply the methods of machine learning
• it must be possible to automatically and effectively learn it

3.2 Initial player model

3.2.1 Human-based player

The most intuitive model of a strategy in an RTT game is an all-knowing player which is the only, central decision maker. This approach is similar to human behaviour, which might be summarized in the following rule: as frequently as possible and/or in reaction to game events, use all information available to give orders to selected units under your command. One may say that such a player is a sort of "overmind" who knows as much as possible about the state of the game and directs all the units. Let us rephrase the rule in the language of pseudo-code and in the context of the client-server nature of the ORTS engine.
struct Decision
    Order o;
    Unit u;
end;

procedure player
begin
    repeat
        State state = get_game_state_from_server();
        Decision d = make_decision(state);
        send_order_to_server(d.unit, d.order);
    until game over
end;

Algorithm 3.1: Human-like RTT Player

There are two main issues regarding this approach:

• Total freedom in making decisions. The AI must be capable not only of giving the best orders to units, but also of choosing which unit to command and when! This greatly enlarges the solution space and makes it practically impossible to learn an AI that will play the game.

• Scalability. The decisions must be adequate both at the beginning of the game, when entire armies are still present, and at the end, when probably only a few units are left alive. Of course, there are countless transitional situations between those two. A human player easily overcomes these challenges; a machine does not.

3.2.2 MAS-based player

Designing a strategy that plays like a human, that constantly and actively chooses and makes decisions both in the context of units and orders, is a task worth a separate thesis and research itself. Thus, there is a great need for simplification, and the main idea is to let units decide for themselves. This leads to a transformation from a situation where there is only one decision maker to a multi-agent system model. From now on, each unit chooses the best action to take by itself (based on the game state).

procedure player
begin
    foreach simulation frame do
        State state = get_game_state_from_server();
        foreach friendly unit do
            Order o = unit.choose_order(state);
            if (NULL != o) then send_order_to_server(unit, o);
        done
    done
end

Algorithm 3.2: RTT Player based on a MAS model

The code 3.2 looks intuitive and simple. Let us underline that all AI is hidden in the function choose_order. There are, however, a few severe drawbacks left.

• The orders are made based on the full game state.
This means that all friendly and enemy units, their positions and current parameters are taken into account. But when considering one selected unit, a large part of this information is irrelevant, because orders have a local character. Knowing a priori which information to select could greatly aid the AI.

• On the other hand, the game state alone is not sufficient - there is a need for additional information derived from it. For example, the number of still-living enemy units is not given explicitly. Expecting the AI to learn, apart from playing, how and what knowledge to extract from a game state heavily enlarges the solution space.

3.3 Minimizing the AI task

3.3.1 Domain knowledge and memory

Simply put, it is too much to ask of a machine to simultaneously learn how to acquire information and how to use it to make wise decisions in a noisy environment. In the case of RTT, the game state can be considered raw data from which further information must be retrieved. In addition, when using MAS, the information must be gathered in the context of an agent (unit). Listing 3.3 shows the necessary changes to the algorithm. At this point we will not plunge into details of how exactly domain knowledge is extracted; this will be covered in Chapter 6. However, please notice that the algorithm has been enriched with the concept of memory. For example, it might be desirable to know whether a unit was shot in the last simulation frame or how the unit moved previously. For simplicity reasons, only the current and the last game state are remembered.
procedure player
begin
    State previous = NULL;
    State current = NULL;
    foreach simulation frame do
        previous = current;
        current = get_game_state_from_server();
        foreach friendly unit do
            Information info = gather_information(previous, current, unit);
            Order o = unit.choose_order(info);
            if (NULL != o) then send_order_to_server(unit, o);
        done
    done
end

Algorithm 3.3: RTT Player with domain knowledge extraction and memory

3.3.2 Focusing the AI task on unit movement

The still unresolved problem is that it must be feasible to learn the function choose_order by conducting evolutionary experiments. After several brain cycles we decided to break choose_order down into separate procedures, thus simplifying the task set before the AI. In this process, several facts about the given RTT game and the ORTS engine are helpful:

• Most of the time, units move. At best, marines can shoot every 8 simulation frames and tanks every 20. This means that units spend approx. 90% of their time moving.

• It might be assumed that units should shoot as frequently as they can.

• If a unit moves intelligently (and that is the goal), we can assume that it is sufficient to always shoot the closest enemy. This makes the process of learning easier and more stable - a unit must only learn how to move, being certain it will always shoot the nearest enemy.

• It is desirable to always move at maximum speed, because most often the goal is to reach the destination as soon as possible.
Thus, the entire unit behaviour can be brought down to Algorithm 3.4:

function unit.choose_order(info)
begin
    Order o = NULL;
    Target t = NULL;
    if (self.can_shoot) then
        t = closest_target_in_range(info, self);
    end
    if (NULL != t) then
        o = SHOOT(t);
    else
        Vector v = self.choose_move_vector(info);
        if (self.type == TANK and v == (0,0)) then
            o = SWITCH();
        else
            o = MOVE(v, self.max_speed);
    end
    return o;
end

Algorithm 3.4: Ordering a unit

All AI effort is now put into finding the best vector of movement (in the case of tanks, the no-movement vector (0,0) is treated as an order to switch into siege mode) in the choose_move_vector routine. This way the problem of learning a strategy was successfully simplified and narrowed down in such a manner that it seems possible to encode and learn a player. Algorithms 3.3 and 3.4 are the final ones.

3.4 Multi-agent system

3.4.1 Hybrid MAS

In Subsection 2.4.2 three types of multi-agent systems are distinguished: homogeneous, heterogeneous and hybrid, each of them having certain advantages and drawbacks. For RTT player modelling, the hybrid MAS will be used. The reasons for that are quite straightforward and actually relate to why no other type of MAS can be used.

• The tank and the marine are very different types of units and they cannot be modelled by the same agent. Therefore the homogeneous MAS - assuming the use of only one agent to model all the units in the environment - is unsuitable.

• The heterogeneous MAS assumes there is a different agent for every entity in the model, which means that for the RTT strategy we would have 70 independent agents (50 marines and 20 tanks). This is unacceptable for two reasons. Firstly, the computational effort of maintaining such large players, as well as the memory cost, is too high. Secondly, simultaneously learning proper behaviours for 70 agents unacceptably enlarges the search space.
The only reasonable choice is to use the hybrid MAS with two agents, the first representing the marine type and the second the tank.

3.4.2 Agent

In each simulation frame, the agent uses the information retrieved from the game state to determine a vector of where to move. There are many ways of doing so; for example, one can imagine applying a list of rules, using neural networks or other algebraic expressions. Keeping in mind that individuals in GP (and in GEP, where it is, however, also possible to represent other structures, for example neural networks [22]) are expressed as trees, it was decided that they are the most natural representation. Consistently with EC terminology, they will be called expression trees in this thesis. In fact, they are algebraic formulas that evaluate to a vector (a tuple of two real numbers).

Kind      Notation         Description
Function  V = ADD(V, V)    addition of two vectors
Function  V = MUL(S, V)    multiplication of a vector by a scalar
Function  V = NOR(V)       normalization of a vector
Terminal  EAST = (1,0)     -
Terminal  NORTH = (0,1)    -

Tab. 3.1: Exemplary set of operators and terminals

Expression                         Description
ADD(NORTH, EAST)                   describes a north-east vector of value (1,1)
MUL(-100, ADD(NORTH, EAST))        describes a south-west vector of value (-100,-100)
NOR(MUL(-100, ADD(NORTH, EAST)))   describes a south-west vector of value (-1,-1)

Tab. 3.2: Exemplary expressions

For example, assume having a simple representation as presented in Table 3.1. Based on this list it is possible to create many different expressions, see Table 3.2 for examples. The agent is exactly such an expression, only the actually used terminals and operators are different and more complex. It might be said that the expression tree implements the choose_move_vector routine in Algorithm 3.4.

3.4.3 Vector representation

The vectors are represented as an ordered pair of two real numbers (a point in two-dimensional space). It is important to emphasise that vectors are bound (not free, as the representation could suggest). They are bound to the currently operating agent (unit).
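The exemplary operators of Tables 3.1 and 3.2, and the binding of a move vector to a unit's position, can be sketched in a few lines. This is only an illustration with hypothetical names; in particular, we assume (as Table 3.2 suggests) that NOR scales a vector by its largest absolute component, which maps (-100,-100) to (-1,-1):

```python
# Sketch of the exemplary expression-tree operators (all names hypothetical).
NORTH, EAST = (0.0, 1.0), (1.0, 0.0)

def add(a, b):                      # V = ADD(V, V): vector addition
    return (a[0] + b[0], a[1] + b[1])

def mul(s, v):                      # V = MUL(S, V): scalar multiplication
    return (s * v[0], s * v[1])

def nor(v):                         # V = NOR(V): assumed largest-component scaling
    m = max(abs(v[0]), abs(v[1]))
    return v if m == 0 else (v[0] / m, v[1] / m)

def destination(unit_pos, move_vector):
    # Move vectors are bound: the destination depends on the unit's position.
    return (unit_pos[0] + move_vector[0], unit_pos[1] + move_vector[1])

v = nor(mul(-100, add(NORTH, EAST)))   # the third expression of Table 3.2
```

With this sketch, a unit at (1,2) given the move vector (2,1) ends up at (3,3), in line with the worked example that follows.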
Imagine having two units, the first at position (1,2) and the second at position (3,1), and let the move vector equal (2,1). In the first case, the move vector is bound to the position (1,2) and thus results in a move to the point (3,3). In the second case, the move vector is bound to the position (3,1) and thus results in the destination (5,2). Perhaps the above remark is intuitive and obvious. However, it is important to keep it in mind in the context of retrieving domain knowledge. In Chapter 6 we propose acquiring domain knowledge from a game state, for instance a vector to the geometric center of an enemy group. It is necessary to recalculate this vector for every unit, since from an agent's point of view all vectors are bound to its current position.

4 Learning

Field experience is something you don’t get until just after you need it. One of Murphy’s Laws of Combat Operations.

4.1 Objectives

Knowing what the machine is going to learn, it is necessary to define how. This chapter is about determining the details of the learning process. Gene expression programming was chosen, therefore a few necessary components of the learning process are:

• a definition of the genotype space
• a definition of the phenotype space
• a transformation from the genotype space into the phenotype space (in EC called expression)
• neighbourhood operators (in EC called genetic or breed operators) which allow searching through the genotype space (moving from one solution to another)
• evaluation and selection of the phenotypes, which guide the process of searching (in a way, they choose which solution to move to)
• a search loop which combines all of the above in order to find the optimum

In this chapter we focus on defining a general algorithm for the evolutionary learning process and on the methods of evaluation (also in the context of distributed computing).

4.2 Algorithm

4.2.1 Search loop

Listing 4.1 shows the basic scheme of the search loop.
The algorithm is similar to any other evolutionary technique. The details of the methods and approaches used in this thesis - such as GEP, MAS, co-evolution, etc. - are hidden inside the evaluate, select and breed procedures.

procedure learn_best_player()
begin
    Individual[] population;
    Individual best_ind;
    init( population );
    repeat
        evaluate( population );
        best_ind = get_best( population );
        Individual[] parents = select( population );
        population = breed( parents );
    until stop_conditions
    return best_ind;
end

Algorithm 4.1: Evolutionary search loop

• evaluate - consists of two steps. The first is the expression of abstract, linear genomes (genotypes) into RTT players (phenotypes), which is explained in Section 5.2 covering the encoding of strategies. The second step is finding the fitnesses of all individuals based on their expressed phenotypes. This is the main topic this section focuses on.

• select - this procedure entirely abstracts from the definition of the problem, since it works in the domain of fitnesses (real numbers); for details please see the next paragraph.

• breed - relates to the neighbourhood operators which allow searching the genotype space. This is covered in Section 5.2.

4.2.2 Selection and elitism

All the approaches, ideas and methods used in this thesis already create many degrees of freedom. It was not desirable to introduce further ones. And since selection works in the abstract domain of fitnesses, there was no need to tailor it to RTT player learning. We took advantage of this, and in all experiments we used the same selection method with the same parameters. There are several very popular ones, and we have chosen tournament selection with the size of a tournament set to 2. Additionally, we always use elitism with the size of the elite group set to 1.
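Tournament selection of size 2 combined with an elite of 1, as used in all our experiments, can be sketched as follows. This is only an illustrative sketch; the representation of individuals and all function names are our own assumptions:

```python
import random

def select_with_elitism(population, fitness, rng=random):
    """Tournament selection (tournament size 2) plus elitism of size 1.

    population -- list of (hashable) individuals,
    fitness    -- dict mapping individual -> fitness value.
    Returns a parent pool of the same size; the best individual is
    always carried over unchanged (the elite).
    """
    best = max(population, key=lambda ind: fitness[ind])
    parents = [best]                        # elitism: keep the single best
    while len(parents) < len(population):
        # draw two random competitors; the fitter one becomes a parent
        a, b = rng.choice(population), rng.choice(population)
        parents.append(a if fitness[a] >= fitness[b] else b)
    return parents
```

Because the procedure only compares fitness values, it is indeed independent of how RTT players are represented, which is exactly why it could stay fixed across all experiments.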
The reason for doing so is simple - we believe that manipulating those settings is of peripheral importance for improving the RTT strategy learning process. Furthermore, we wished to observe the influence of other factors, such as changing the evaluation method and the representation. Therefore the selection and elitism, once set, remained unchanged during all experiments.

4.2.3 Co-evolution

The first question that needs answering is how to evaluate a single strategy. After all, in a two-player RTT game it is always two strategies that play together, acting as reference points for each other. It is not possible to evaluate a single player in abstraction from its current opponent. Let us emphasise that we wish to evaluate players without measuring their absolute performance. It is irrelevant how the strategy played in different games; all that matters is the "better than" relation within the compared set of strategies. This leads to the intuitive conclusion that in order to evaluate the entire population, a sequence of games must be played, with each game increasing the fitness of the winner. In the context of evolutionary learning, two possible approaches can be distinguished. In the first one, the reference point(s) are defined manually. For RTT game learning, this means designing the strategies against which the automatically found ones will play and be evaluated. This is a typical case of evolution, because the fitness of an individual is independent of the rest of the population. However, this approach seems unsuitable, because the evolution would be strongly biased by the choice of those teacher strategies. And if one wishes to counteract this, it is necessary to design a huge number of diverse reference players. But - ironically - at this point the researcher already has an entire set of solutions, thus further RTT strategy learning becomes pointless.
In the second approach, instead of using external reference players, twosomes of strategies from the same generation are paired against each other. This is known as the co-evolutionary approach, because the population itself is used to find the so-called competitive fitnesses of the individuals it contains (from now on, when using the term fitness, we assume it to be competitive, i.e. subjective). The dynamics of co-evolutionary learning is still the subject of much research [32, 33]. There are several interesting methods proposed in the literature, and as the most promising we chose to test the single elimination tournament (SET) and various versions of the hall of fame (HoF). It must be underlined that co-evolutionary learning from simulation is difficult, since a given strategy may be evaluated as very good with reference to one opponent and as very poor with reference to another. Furthermore, this process is extremely noisy, and the winner of the same game may vary between different runs. The combination of these two phenomena results in instability of learning. Both in SET and HoF we try to neutralize this effect to some extent. However, it is beyond the scope of this thesis to analyze and measure the potential instability itself, as well as the methods used to prevent it.

4.3 Evaluation

4.3.1 Single Elimination Tournament

Basic SET

In [7] Angeline and Pollack introduce the idea of using tournament fitness to evaluate individuals. Let us cite the description of the Single Elimination Tournament (SET): Initially, the entire population is in the tournament. Two members are selected at random to compete against each other with only the winner of the competition progressing to the next level of the tournament. Once all the first level competitions are completed, the winners are randomly paired to determine the next level winners. The tournament continues until a single winner remains. The fitness of a member of the population is its height in the playoff tree; the player at the top is then the best player of the generation.
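The SET scheme just quoted can be sketched in a few lines. This is only an illustration under our own naming; play_game stands in for running one RTT game between two strategies and returning the winner:

```python
import random

def set_fitness(population, play_game, rng=random):
    """Single Elimination Tournament: fitness = height reached in the playoff tree.

    population -- list of individuals (its size must be a power of two),
    play_game  -- callable (a, b) -> winner of one game between a and b.
    Returns a dict individual -> fitness; first-round losers stay at 0.
    """
    fitness = {ind: 0 for ind in population}
    level = list(population)
    rng.shuffle(level)                       # random initial pairing
    while len(level) > 1:
        winners = []
        for a, b in zip(level[::2], level[1::2]):
            w = play_game(a, b)
            fitness[w] += 1                  # the winner climbs one level
            winners.append(w)
        level = winners                      # winners are re-paired next level
    return fitness
```

For n individuals the loop plays exactly n - 1 games in total, which agrees with the per-generation game count derived in the text.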
The method is very intuitive and easy to implement when the population size is a power of two. An example of pairing the individuals in SET is shown in Figure 4.1.

Fig. 4.1: SET pairing

Half of the population has the minimal fitness of value 0 and only one individual has the maximum fitness of value \lceil \log_2 n \rceil, where n is the size of the population. The total number of fights in a single generation is equal to:

\sum_{i=1}^{\lceil \log_2 n \rceil} \left\lceil \frac{n}{2^i} \right\rceil = n - 1

Dealing with noise - SET with game repeats

SET works perfectly if the competition is non-noisy and fulfils the strong transitivity assumption: that if player A beats player B, and player B beats player C, then player A must beat player C. Then the best player always wins the tournament. Without this assumption, the Single Elimination Tournament's real dynamics can be murky [34]. If the environment is suitably complex and an optimal strategy is not in the population, it is possible for even a poor strategy to win the tournament in a particular generation [7]. In order to minimize the influence of noise - to be certain that the truly better player is the winner - we repeat the same game k times (with different seeds for the pseudo-random number generator), where k is an odd number. The strategy that wins the majority of the repeated games is evaluated as being better than its opponent. This changes the number of played games to k(n − 1). It does not, however, change the fitness values assigned to the individuals. Still, the fitness ranges from 0 (for half of the population) up to \lceil \log_2 n \rceil (for the single all-winner).

Memory - cycling in SET

SET has no memory of the process of learning. This is a very undesirable feature in the context of strategy learning, because it might cause the cycling effect. It is usually seen in the co-evolution of two or more subpopulations, however it is also possible in the case of just one.
Imagine there exist three strategies A, B and C violating the strong transitivity assumption (B beats A, C beats B and A beats C), and that one of them is discovered at some moment of the evolution (three is the minimal length of a cycle, however it might be longer). If the learning process has no memory of what players were created in the past, it is probable that the evolution will start to cycle. For example, the once-discovered strategy A will evolve into the better-playing strategy B, which will then evolve into the better-playing strategy C, which in the end will evolve back into the better-playing strategy A. The evolution process cycles: A → B → C → A → ..., thus the learning of new players practically stops. This behaviour was actually observed in the first experiment; for details please see Section 8.3.

4.3.2 Hall of Fame

Basic HoF

The main idea of the "hall of fame" family of techniques is to incorporate memory into the process of learning. Individuals in the population are evaluated against the good individuals discovered so far in the evolutionary run [34]. The method is very simple to implement, see Listing 4.2.

procedure evaluate(Individual[] population, Individual[] HoF, int k)
begin
    foreach individual in population do begin
        individual.fitness = 0;
    end
    Individual[] teachers = choose_randomly_k_individuals(HoF, k);
    express_phenotypes(population);
    express_phenotypes(teachers);
    foreach individual in population do begin
        foreach teacher in teachers do begin
            Individual winner = play_game(individual.phenotype, teacher.phenotype);
            if (winner == individual) then individual.fitness += 1;
        end
    end
    Individual best = choose_best(population);
    HoF.add(best);
end

Algorithm 4.2: Evaluation in simple HoF

For clarity, it is necessary to highlight the difference between:

• HoF size - a parameter saying against how many strategies from the HoF each individual must fight (in a single generation).
In Listing 4.2 it is named k - to emphasize its function being similar to the number of game repetitions in SET (hence the same letter symbol).

• And HoF length - the length of the HoF array (how many strategies are in the HoF already).

The competitive fitness of the i-th individual equals:

f_i = \sum_{p=1}^{k} v_i(teacher_p)

where v_i(j) (v stands for victory) is a binary function that results in:

• 1 if the i-th individual from the population defeated the j-th individual from the hall of fame array,
• and 0 otherwise.

The variable teacher_p is the p-th random individual from the hall of fame array (altogether k random teachers are selected). Obviously, the number of games played at each generation is equal to kn.

Maintaining the hall

It is necessary to properly maintain the array of best strategies. There are two main issues regarding that:

• HoF initialization - when the learning begins, the array of best players found so far is obviously empty. The evaluation cannot take place unless there are some strategies in the HoF. This could be resolved by adding a few random individuals to the HoF array at the beginning. However, most of them would probably play very poorly and only introduce more noise into the evolution process. Thus, in our solution it was decided to introduce manual teacher strategies. This approach takes advantage of supervised learning and solves the problem of initializing the HoF array. The fitness function changes respectively:

f_i = w_{manual} \sum_{p=1}^{k_{manual}} v_i(teacher_p^{manual}) + w_{learned} \sum_{p=1}^{k_{learned}} v_i(teacher_p^{learned})

The weights w_{manual} and w_{learned} allow giving different importance to reference strategies depending on whether they were learned or manually designed. Therefore it is possible to find a sort of tradeoff between supervised and unsupervised learning.

• Unique HoF - as a result of using elitism, the same strategy could be evaluated as the best in more than one generation.
This results in the reoccurrence of the same player in the HoF array, which could lead to biasing the evolution (if a strategy is over-represented in the HoF, the evolution will focus on beating this frequent player). The solution is very simple - do not add an individual to the HoF array if there is already another one with the same expression tree. Thus the hall of fame actually acts as a set (in the strict mathematical meaning) of phenotypes.

Competitive fitness sharing

In the case of co-evolution, stagnation might be a great problem. The population is used to evaluate itself, so at some point the evolution may be "content" with the already found individuals, even if they are far from being good RTT players (from a human point of view). In a way, initializing the HoF with manual strategies tries to counteract this. But it is not enough. There is a need to enhance the selection pressure during the entire evolution and thus force the finding of new, original individuals that can successfully play the RTT game. To achieve that, we follow the idea of competitive fitness sharing [37, 38] (for clarity we show the competitive HoF without the manually designed set of teacher strategies):

f_i = \sum_{p=1}^{k} v_i(teacher_p) \cdot \frac{1}{\sum_{j=1}^{n} v_j(teacher_p) + 1}

The fitness used in the simple HoF was extended by a scaling factor. This promotes individuals that are one-of-a-kind and win against strategies rarely defeated by others. An analysis of two extreme cases will help to understand this property.

• Every strategy defeats the entire HoF:

f_{every} = \sum_{p=1}^{k} 1 \cdot \frac{1}{\sum_{j=1}^{n} 1 + 1} = \frac{k}{n+1}

• One strategy defeats the entire HoF, while the rest of the population loses all its games:

f_{one} = \sum_{p=1}^{k} 1 \cdot \frac{1}{\sum_{j=1}^{one-1} 0 + 1 + \sum_{j=one+1}^{n} 0 + 1} = \frac{k}{2}

f_{rest} = 0

The situation when the entire population wins against every player from the HoF is very unlikely, thus winning against unbeatable reference strategies is rewarded. We believe this feature makes the competitive HoF a good choice for RTT player learning.
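Competitive fitness sharing can be sketched directly from the formula. The sketch below is our own illustration; the outcome matrix v stands in for the results of the played games:

```python
def shared_fitness(v, i):
    """Competitive fitness sharing over a HoF sample of k teachers.

    v -- outcome matrix: v[j][p] == 1 if individual j beat teacher p, else 0.
    Returns the shared fitness of individual i: each victory over teacher p
    is scaled down by 1 / (number of individuals beating p + 1), so wins
    against rarely defeated teachers are worth more.
    """
    n = len(v)          # population size
    k = len(v[0])       # number of teachers drawn from the HoF
    return sum(v[i][p] / (sum(v[j][p] for j in range(n)) + 1)
               for p in range(k))
```

Plugging in the two extreme cases discussed in the text, a population that beats every teacher scores k/(n+1) each, while a lone all-winner scores k/2 and the rest 0.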
4.4 Distributed learning

4.4.1 Assumptions

In evolutionary learning it is the evaluation of individuals that takes most of the time. It might even be said that all of the other procedures (like selection and breeding) are insignificantly fast. Furthermore, evolutionary learning is an iterative process where the next generation is computed from the previous one. All this leads us to the master-slave scheme of computation, where one host is distinguished as the master and runs the search loop, dispatching the evaluation of individuals to many slaves (please see Chapter 7 for more details). The question arises: how to split the evaluation procedure into independent tasks? Luckily, the nature of RTT game learning helps us out and the answer is very simple - the smallest indivisible task is a single game between two strategies.

We assume a game lasts on average for t seconds and there are m machines available. The RTT game defined in Section 2.3 is said to last for at most 1 minute, and from the previous section we know there are about nk games to play at each generation. For example, if n = 1024 and k = 3, that gives 3072 minutes on a single machine, which is more than two days to evaluate just one generation! If one wishes, for example, to run over 150 generations, it is necessary to wait about a year for the results! This makes computational experiments in the field of RTT playing almost impossible to conduct. To deal with this problem two techniques are proposed:

• game speed-up, see Subsection 2.3.1,
• and distribution of the computation.

We assume there are no errors while evaluating players and all games end successfully. Obviously, in real life errors happen and the software must handle them properly, see Chapter 7. However, in the theoretical analysis those errors are negligible.
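The master side of this scheme can be sketched as follows. This is a simplified illustration using a local thread pool in place of the cluster; the function names and the `play_game` callback are assumptions, not the framework's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_generation(population, teachers, play_game, workers):
    """Split evaluation into independent tasks (one task = one game),
    dispatch them to the workers, then aggregate wins per individual.
    play_game((a, b)) is assumed to return 1 if a defeats b, else 0."""
    tasks = [(ind, t) for ind in population for t in teachers]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(play_game, tasks))
    k = len(teachers)  # each individual owns k consecutive results
    return [sum(results[i * k:(i + 1) * k]) for i in range(len(population))]
```

In the real system the pool would be a cluster of slave hosts, and error handling (Chapter 7) would wrap each task.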
4.4.2 Distributed SET

The evaluation of a generation in SET is not easy to distribute among many hosts, because it consists of successive steps (fights on consecutive levels of the tournament's tree) that can be performed only after the preceding ones have completed. Let us assume there are m machines performing the computation and each game lasts for approximately the same amount of time, t. Then the time of processing one generation is equal to:

\sum_{i=1}^{\log_2 n} t \cdot \left\lceil \frac{kn}{2^i m} \right\rceil = t \cdot \sum_{i=1}^{\log_2 n} \left\lceil \frac{kn}{2^i m} \right\rceil

For example, if n = 128, m = 64, k = 3 and t = 60 seconds, then the time of processing just one generation equals:

60 \cdot \sum_{i=1}^{7} \left\lceil \frac{3 \cdot 128}{2^i \cdot 64} \right\rceil = 60 \cdot (3 + 2 + 1 + 1 + 1 + 1 + 1) = 60 \cdot 10 = 600 [s]

And in the case of having just one host (m = 1):

60 \cdot \sum_{i=1}^{7} \left\lceil \frac{384}{2^i} \right\rceil = 60 \cdot (192 + 96 + 48 + 24 + 12 + 6 + 3) = 60 \cdot 381 = 22860 [s]

If the computation is distributed over 64 machines, the evaluation of one generation takes 10 minutes to complete. It is an enormous improvement in comparison to more than 6 hours on one host. However, it is still long, hence typically one would wish to use as many CPUs as possible. The bigger the problem is, the more CPUs we want - and the other way round: the more CPUs we have, the bigger the problem that might be solved. Let us see what happens in case the number of machines is proportional to the number of individuals in the generation - assume having m = n machines. There are k·(n−1) fights in each generation. In an optimistic (desired) scenario, when all the fights are independent from each other, the evaluation of one generation takes k steps. We already know this is not the case for SET. The question is how much CPU power is wasted. Assumptions:

1. m = n,
2. k < n,
3. n is a power of two,
4. k is a positive odd number.
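The per-generation time formula above is easy to evaluate programmatically; a small sketch with a hypothetical helper name:

```python
import math

def set_generation_time(n, k, m, t):
    """Wall-clock seconds to evaluate one SET generation on m hosts:
    level i of the tournament tree holds k*n/2**i games, and a level
    can start only after the previous one has finished."""
    levels = int(math.log2(n))
    return sum(t * math.ceil(k * n / (2 ** i * m))
               for i in range(1, levels + 1))
```

For example, `set_generation_time(128, 3, 1, 60)` gives 22860 s, matching the single-host example.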
Let us first analyze the optimistic scenario, where all fights are independent; the expected time of the computation is:

t \cdot \left\lceil \frac{k \cdot (n-1)}{m} \right\rceil \approx kt

In SET the evaluation of a generation lasts for:

t \cdot \sum_{i=1}^{\log_2 n} \left\lceil \frac{kn}{2^i m} \right\rceil = t \cdot \sum_{i=1}^{\log_2 n} \left\lceil \frac{k}{2^i} \right\rceil = t \cdot \left[ \sum_{i=1}^{\lceil \log_2 k \rceil} \left\lceil \frac{k}{2^i} \right\rceil + \sum_{i=\lceil \log_2 k \rceil + 1}^{\log_2 n} 1 \right] = t \cdot \left[ (k + \lceil \log_2 k \rceil - 2) + (\log_2 n - \lceil \log_2 k \rceil) \right] = t \cdot (k + \log_2 n - 2) = kt + t \cdot \log_2 \frac{n}{4}

(The identity \sum_{i=1}^{\lceil \log_2 k \rceil} \lceil k/2^i \rceil = k + \lceil \log_2 k \rceil - 2 holds for the values of k considered here, i.e. k of the form 2^j + 1.)

Finally, the utilization of computational power can be measured as the ratio of the two values above:

\frac{kt}{kt + t \cdot \log_2 \frac{n}{4}} = \frac{1}{1 + \frac{1}{k} \log_2 \frac{n}{4}} = \frac{k}{k + \log_2 \frac{n}{4}}

Table 4.1 shows the CPU utilization ratio for typical values of k and n (once again under the assumption that the number of machines is equal to the number of individuals). It clearly shows that much computational power is wasted. Therefore, when using SET it is recommended to use a population size at least a few times bigger than the number of machines available. However, with computational clusters constantly growing in size, the tendency towards multicore CPUs, and GPUs already having hundreds of cores, this guideline may become harder and harder to follow. We believe that evaluation methods which are easier to distribute uniformly will be favored in the near future - and SET is not one of them.

        k=1    k=2    k=3    k=5    k=9
n=32    0.17   0.38   0.50   0.58   0.64
n=64    0.14   0.33   0.45   0.54   0.60
n=128   0.13   0.30   0.42   0.50   0.56
n=256   0.11   0.27   0.38   0.47   0.53
n=512   0.10   0.25   0.36   0.44   0.50
n=1024  0.09   0.23   0.33   0.41   0.47

Tab. 4.1: CPU utilization ratio in SET in the case m = n

4.4.3 Distributed HoF

In HoF all fights are independent, hence this evaluation method is easily distributed among many hosts and there are no scalability issues as in SET. As said previously, there are kn fights in each generation, thus the time of evaluating a single one is equal to:

t \cdot \left\lceil \frac{k \cdot n}{m} \right\rceil

Having m = kn machines will distribute the computations ideally.
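The closed-form utilization ratio and the HoF evaluation time can be transcribed directly. A sketch with hypothetical helper names; note that Table 4.1 was computed from the exact ceiling-based schedule, so its entries may differ slightly from this approximation:

```python
import math

def set_utilization(n, k):
    """Approximate CPU utilization of SET with m = n machines,
    k / (k + log2(n/4)), as derived above."""
    return k / (k + math.log2(n / 4))

def hof_generation_time(n, k, m, t):
    """HoF evaluation: all k*n games are independent, so one generation
    takes t * ceil(k*n/m) seconds regardless of tournament structure."""
    return t * math.ceil(k * n / m)
```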
We know that a typical use case of SET (assuming n = 128, k = 3 and m = 64) leads to the evaluation of a single generation lasting 10 minutes. For HoF the k coefficient means the size of the reference strategies array and is therefore usually bigger. Let us assume k = 20 and the same number of machines, m = 64. The question arises: what is the size of the population for which the evaluation of a single generation would last a comparable time? The answer is the maximal whole number n fulfilling the relation:

t \cdot \left\lceil \frac{kn}{m} \right\rceil = time, \quad \lceil 0.3125 n \rceil = 10, \quad n = 32

It means that in the typical case HoF must use a population a few times smaller than SET in order to achieve a similar computational time. On the other hand, there are no scalability issues regarding HoF and it is safe to say that most often the CPU utilization level equals 1. (For example, if n = 32 and k = 20, more than 640 CPUs are required for the utilization level to drop below 100%.)

5 Encoding

The side with the simplest uniforms wins. One of the Murphy's Laws of Combat Operations.

5.1 Objectives

We wish to use gene expression programming, thus encoding the player involves dealing with two issues:
• the strategy must be represented as a linear genome,
• the genetic operators must be designed.

Luckily, the second part has already been dealt with in [21] and requires few changes. On the other hand, encoding strongly typed trees as a linear chromosome is quite a challenge. In this section we show an innovative method of encoding complex tree structures in a linear chromosome. Further on we assume the reader knows the basic concepts of gene expression programming. The list of all terminals and operators used to encode the player will be called the representation.

5.2 Strongly typed GEP

5.2.1 Encoding simple structures

From [20] it is known how to encode algebraic expressions in GEP. For example, let us assume the following:
• functions = {/,
+, *},
• terminals = {a, b, c},
• all terminals are of the same type; functions take two arguments and return one of the same type.

Fig. 5.1: Example of GEP chromosome and corresponding expression tree

The last assumption, about type concordance, is crucial, because it allows encoding any expression from this representation. An example is presented in Figure 5.1. The encoding is not that simple in the case of many different types of terminals. In the presented example, if the terminal 'c' were of a different type than required by the multiplication function, then the expressed tree would be invalid. A similar problem has already been encountered in Genetic Programming (see for example [36]) and in general there are two ways of handling this situation:
• create invalid trees and then repair them (for example, by pruning),
• customize the translation process so that only valid trees are created.

Choosing one of them is a matter of personal opinion and the problem's characteristics.

5.2.2 Closure of the representation

This thesis proposes an elegant solution which we called expression by two-phase translation. The objective is to create only valid trees without putting any severe restrictions on the representation. The only assumption is: in the representation there must exist functions for all types and all arities, as well as terminals for all types. This feature we call closure. A simple example will clarify this requirement - imagine having two types A and B and three functions already defined (if a function is said to be of some type, it means that it evaluates to, i.e. returns, that type):
• A = ADD(A, A), a two-arity function returning the A type,
• B = NOT(B), a one-arity function returning the B type,
• A = NOT(A), a one-arity function returning the A type.

This representation does not fulfill the requirement. If there is some two-arity function, there must exist two-arity functions for every type.
Obviously, the two-arity function of type B is missing; adding for example B = ADD(A, B) will solve the problem and close the representation. Please notice that it is irrelevant what types the arguments of the functions are; all that matters is the arity and the return type.

5.2.3 Expression by two-phase translation

The main idea is to separate the expression tree's structure from the actual functions and terminals used in its nodes. First the structure of an expression tree is created, then the still-empty nodes are filled with concrete functions and terminals. The translation process thus has two steps, hence the given name. A redefinition of GEP's head and tail is required.

The head First, let us assume the set Arities contains all arities (as numbers) of functions present in the representation. It also contains a special symbol T, standing for a terminal node. For example, if there exists (at least one) 1-, 2- and 4-arity function in the representation, then Arities = {T, 1, 2, 4}. Knowing that, the head is defined as a linear string of symbols taken from the set Arities. The successive letters of the head, beginning from the first, straightforwardly define the tree structure. The tree is created from the root in a BFS manner. It is worth noticing that when building the tree structure, the arity values act as functions and the T symbol is the terminal (and the only one). The redefined head can be considered to actually be a whole GEP gene. Formally, it would even be necessary to distinguish in the new head its own subhead and subtail. However, since there is only one terminal symbol, the subtail is redundant - it would be a string repeating the same T symbol many times. Therefore the subtail is useless and the T symbol is considered to be the default value when constructing the tree's structure. Please refer to Figure 5.2 for an example.
Notice that the last two nodes were set to Ts by default.

Fig. 5.2: Structure of the expression tree for the head "32T1T121T"

The tail Assume we have the Arities set defined as in the previous paragraph and a Types set consisting of all types from the given representation. The tail has n independent parts, where n = |Arities| · |Types|. Each part corresponds to a different tuple from Arities × Types. The symbols from the tail are used to fill in all nodes of the tree. A single node is set by a simple procedure:
1. Determine the return type of the current node.
2. Determine the arity of the current node.
3. Choose the next unused symbol from the appropriate part (corresponding to the type and arity).

(The T symbol may be considered as having 0-arity. As for the traversal, DFS would be equivalently good; however, it is BFS which is most often used in GP.)

The entire process begins in the root. Its type is given beforehand and depends on the problem (in our case it is a vector, as said in Chapter 3). Then, recursively, all descending nodes are filled in. The order of tree traversal is arbitrary; however, using different orders will result in different trees expressed from the same genome. For the sake of consistency with the typical approach, we advise using BFS.

Gene parts One could see the stGEP chromosome as composed of many separate genes. The first one describes the structure of the tree and - in classic GEP terminology - has only a head (the tail is useless since there is only one terminal symbol). All of the others are sequences of terminals used to fill in the empty nodes, so - in classic GEP terminology - they have only tails. The stGEP chromosome makes sense and can be expressed only if all of the parts listed above are present. To underline the integrity of the stGEP genome, and taking into account the "head" and "tail" nature of the different parts, we find it appropriate to call the first part the head and all others the tail. At the same time we use the term gene part.
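The two-phase translation described above can be sketched as follows. This is an illustrative reimplementation, not the thesis framework's code; tail parts are assumed to be pre-split into lists keyed by (type, arity), and each concrete symbol carries its argument types:

```python
from collections import deque

def express(head, tail_parts, root_type):
    """Two-phase expression of an stGEP genome into a typed tree.

    head:       string over the Arities set, e.g. "32T1T121T".
    tail_parts: dict (type, arity) -> list of (name, arg_types) symbols;
                arity 'T' parts hold terminals (arg_types = ()).
    root_type:  the return type required at the root.
    """
    # Phase 1: build the bare structure, BFS over the head; once the
    # head is exhausted, missing symbols default to the terminal 'T'.
    symbols = iter(head)
    root = {'arity': next(symbols), 'children': []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node['arity'] != 'T':
            for _ in range(int(node['arity'])):
                child = {'arity': next(symbols, 'T'), 'children': []}
                node['children'].append(child)
                queue.append(child)

    # Phase 2: fill the still-empty nodes, again BFS. Each node takes the
    # next unused symbol from the part matching its (type, arity); the
    # chosen function's signature dictates the children's types.
    cursors = {key: 0 for key in tail_parts}
    queue = deque([(root, root_type)])
    while queue:
        node, rtype = queue.popleft()
        key = (rtype, node['arity'])
        name, arg_types = tail_parts[key][cursors[key]]
        cursors[key] += 1
        node['symbol'] = name
        for child, arg_type in zip(node['children'], arg_types):
            queue.append((child, arg_type))
    return root
```

Because every (type, arity) part exists in a closed representation, phase 2 can never fail to find a matching symbol, which is exactly why the closure requirement of Subsection 5.2.2 is needed.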
The head and all parts of the tail are separate gene parts. This term is important because, for example, the transposition of symbol sequences in the genome can only be done within the borders of the same gene part.

Genome size analysis It is necessary to have enough symbols in each of the tail's parts in order to successfully perform the two-phase expression. The head's size is constant and chosen by the user. The sizes of the tail's parts differ depending on whether they consist of terminals or functions:
• size_i = (max(Arities) − 1) · h + 1 if the i-th part of the tail consists of terminals (regardless of type),
• size_i = h if the i-th part of the tail consists of functions (regardless of type),
where h is the head's size. The head size h is an important parameter because it is in fact an upper bound for the tree size (we consider the size of a tree to be the number of its nodes). Changing the value of h changes the maximum and expected size of expressions. This determines the phenotypes' complexity level and thus influences the learning process. Knowing that the terminal parts appear in the tail as many times as there are different types in the representation, the entire genome's size is:

size_{all} = h + |Types| \cdot ((max(Arities) − 1) \cdot h + 1) + |Types| \cdot (|Arities| − 1) \cdot h

Random values in stGEP If desired, random values of chosen (in particular one or all) types may be introduced. This involves introducing two additional symbol sequences into the genome (for each desired type):
• the Dc gene part: the sequence of indexes telling which random number to use,
• and the RNC gene part: the random numbers themselves.

The sizes of both the Dc and RNC strings are the same and equal to the size of the tail's part corresponding to terminals of the appropriate type, because only a 0-arity terminal might turn out to be an RNC, and in the worst case all terminals are RNCs.
Therefore, the size of the genome changes respectively to:

size_{all} = h + 3 \cdot |Types| \cdot ((max(Arities) − 1) \cdot h + 1) + |Types| \cdot (|Arities| − 1) \cdot h

Phenotype size analysis The maximum size of a tree is - as said previously - h. In this case, the expression process "uses" the entire head and h symbols from the tail, which gives 2h symbols used altogether. This allows us to estimate what fraction of a genome is actually expressed as the phenotype (at maximum):

size_{phenotype} = \frac{2h}{h + |Types| \cdot ((max(Arities) − 1) \cdot h + 1) + |Types| \cdot (|Arities| − 1) \cdot h} \approx \frac{2}{1 + |Types| \cdot (max(Arities) − 1) + |Types| \cdot (|Arities| − 1)} = \frac{2}{|Types| \cdot (max(Arities) + |Arities| − 2) + 1}

In the RTT game this thesis focuses on, we use 3 different types, functions of 1-, 2- and 3-arity, and terminals. We consider this to be a very typical case when using stGEP. It means that at maximum about 15% of the genome is actually expressed. However, the phenotype is expected to often be much smaller than in the upper-bound case. The exact numbers certainly vary and depend on many factors such as the representation itself, the learning process, etc. Nevertheless, it seems safe to assume that typically 5-10% of a genome is expressed into a strategy. We find this value to be very suitable: in every individual potential subsolutions might be hidden, to be freed or stored for later in the evolution process, while at the same time a significant part of the genome is actually subject to expression into the phenotype.

5.3 Genetic operators

Genetic operators are used to breed the population. C. Ferreira introduced in GEP a set of original genetic operators that take advantage of the linear nature of the chromosome. Few changes are required for stGEP; Table 5.1 shows the differences in comparison to GEP and summarizes the probabilities of using each operator during breeding.
5.3.1 Recombination

The strong typing has no influence on the recombination operators proposed by Ferreira. However, it does allow introducing one more recombination operator - gene-part recombination. (In the size analysis above we assumed, for ease of calculation, that no RNCs are in the representation. On the one hand they enlarge the chromosome, but on the other, using an RNC requires reading the indexes and random values, which means a higher percentage of the genome takes part in the expression process. It is safe to assume that both effects cancel each other out.)

Operator                  | Type          | In GEP      | Default probability
one-point recombination   | recombination | the same    | 0.3
two-point recombination   | recombination | the same    | 0.3
gene recombination        | recombination | the same    | 0.1
gene-part recombination   | recombination | not present | 0.1
stIS transposition        | transposition | different   | 0.1
stRIS transposition       | transposition | different   | 0.1
gene transposition        | transposition | the same    | 0.1
st inversion              | inversion     | different   | 0.1
mutation                  | mutation      | the same    | 0.044
RNC mutation              | mutation      | the same    | 0.01
Dc mutation               | mutation      | the same    | 0.044
Dc inversion              | inversion     | the same    | 0.1
Dc transposition          | transposition | the same    | 0.1

Tab. 5.1: Genetic operators in GEP and stGEP

One point recombination One-point recombination swaps a part of one chromosome with the corresponding part of another chromosome. There is one point of cut, chosen randomly. See Figure 5.3 for an example - two abstract genes, one consisting of digits, the other of letters; different colors show the swapped fragments. The probability of performing one-point recombination between two selected individuals is given by the parameter onePointRecombProbability. The default value - as suggested by Ferreira - equals 0.3.

Fig. 5.3: One point recombination

Two point recombination Two-point recombination swaps a part of one chromosome with the corresponding part of another chromosome. There are two points of cut, chosen randomly.
See Figure 5.4 for an example - two abstract genes, one consisting of digits, the other of letters; different colors show the swapped fragments. The probability of performing two-point recombination between two selected individuals is given by the parameter twoPointRecombProbability. The default value - as suggested by Ferreira - equals 0.3.

Fig. 5.4: Two point recombination

Gene recombination Gene recombination swaps one gene in a chromosome with the corresponding gene in another chromosome. The gene to swap is chosen randomly. See Figure 5.5 for an example - two abstract genes, one consisting of digits, the other of letters; different colors show the swapped genes. The probability of performing gene recombination between two selected individuals is given by the parameter geneRecombProbability. The default value - as suggested by Ferreira - equals 0.1.

Fig. 5.5: Gene recombination

Gene part recombination The stGEP introduces the idea of gene parts, thus it seems logical to introduce a new recombination operator that operates on them. Gene-part recombination swaps one gene part in a chromosome with the corresponding gene part in another chromosome. The gene part to swap is chosen randomly. See Figure 5.6 for an example - one abstract gene composed of two parts, one consisting of digits, the other of letters; different colors show the swapped gene parts. The probability of performing gene-part recombination between two selected individuals is given by the parameter genePartRecombProbability. We suggest a default value of 0.1.

Fig. 5.6: Gene-part recombination

5.3.2 Transposition

The strong typing has a big influence on the transposition operators.
stIS transposition Strongly typed Insertion Sequence transposition transposes (inserts) a small fragment of a gene part into the same gene part:
• if the chosen gene part is the head - after the root position,
• if the chosen gene part belongs to the tail - at an arbitrary position (also at the root).

See Figure 5.7 for an example - two abstract genes, one consisting of digits, the other of letters; a different color shows the transposed fragment. The probability of stIS transposition happening in a gene part is given by the parameter stISTranspositionProbability. The default value - as suggested by Ferreira - equals 0.1. However, please notice that originally in GEP the probability refers to an entire gene (not a gene part), therefore it might be desirable to lower the value of this parameter. The answer to the question "by how much" depends on the actual application of stGEP and the representation used.

Fig. 5.7: stIS transposition

stRIS transposition Strongly typed Root Insertion Sequence transposition transposes (inserts) a small fragment of a gene part into the same gene part at the root position. The transposed fragment must start with a function, thus this works only in the head. See Figure 5.8 for an example - two abstract genes, one consisting of digits, the other of letters; a different color shows the transposed part. The probability of stRIS transposition happening in a gene head is given by the parameter stRISTranspositionProbability. The default value - as suggested by Ferreira - equals 0.1.

Fig. 5.8: stRIS transposition

Gene transposition Gene transposition swaps the first gene with a randomly chosen other gene. This works in the case of having many compatible genes in the chromosome. See Figure 5.9 for an example - two abstract genes, one consisting of digits, the other of letters; a different color shows the transposed part. The probability of gene transposition happening in a chromosome is given by the parameter geneTranspositionProbability.
The default value - as suggested by Ferreira - equals 0.1.

Fig. 5.9: Gene transposition

In the RTT player there is one gene for the marine's behaviour and one for the tank's. Both of them are compatible (they use the same representation), so in our case stGene transposition always means swapping the first and the second (last) gene. Additionally, please notice that since gene parts are by definition incompatible, there is no stGene-part transposition, in contrast to the analogous situation with the recombination operators.

5.3.3 Inversion and mutation

The strong typing has an influence on inversion but not on mutation.

st inversion Strongly typed inversion randomly selects start and end positions within the same gene part and reverses the order of the sequence. The probability of st inversion happening in a gene part is given by the parameter stInversionProbability. The default value - as suggested by Ferreira - equals 0.1. However, please notice that originally in GEP the probability refers to an entire gene (not a gene part), therefore it might be desirable to lower the value of this parameter. The answer to the question "by how much" depends on the actual application of stGEP and the representation used.

mutation Mutation randomly changes the symbol (according to the proper gene part domain) at an arbitrary position in the gene. The probability of changing any position is given by the parameter mutationProbability. The default value - as suggested by Ferreira - equals 0.044.

5.3.4 Dc and RNC operators

RNC mutation Mutates constants from the RNC areas. The new value is randomly chosen from the appropriate domain. The probability of changing any position in the RNC areas is given by the parameter rncMutationProbability. The default value - as suggested by Ferreira - equals 0.01.

Dc mutation Mutates the values (indexes) in the Dc areas. The new value is a randomly selected integer from [0, size] (where size means the size of the appropriate Dc gene part).
The probability of changing any position in the Dc areas is given by the parameter dcMutationProbability. The default value - as suggested by Ferreira - equals 0.044.

Dc inversion Inverts the values (indexes) in the Dc areas. The limits of the inverted sequences are chosen randomly. The probability of inverting a fragment in any Dc area is given by the parameter dcInversionProbability. The default value - as suggested by Ferreira - equals 0.1.

Dc transposition Performs the IS transposition in the Dc areas (fragments may be inserted also at the root position). The probability of transposing a fragment in any Dc area is given by the parameter dcTranspositionProbability. The default value - as suggested by Ferreira - equals 0.1.

6 Representation

If you know exactly what is happening, then you are not in combat. One of the Murphy's Laws of Combat Operations.

6.1 Objectives

A detailed list of all terminals and operators must be designed, which in a way implements the routine gather_information from Algorithm 3.3. During the work on this thesis and while conducting the experiments, two representations were created. The analysis of the experiments presented in Chapter 8 led us to the conclusion that the initial one was insufficient for successful player learning. However, it was a good starting point for designing a second set of terminals. When designing the representation we wish to fulfill two requirements:
• The representation must be small. This limits the solutions' space and thus eases the learning.
• The representation must have sufficient expressive power, therefore the set must be big and rich enough.

Obviously, both requirements cannot be fully satisfied at the same time and a tradeoff between them must be found. In the context of RTT games and GEP, choosing proper terminals and operators for expressing units is more a matter of intuition than of strict analysis. It might be said that the way a designer understands the game lets him find a more adequate representation.
In this chapter we show two different representations - "simple" and "complex"; each of them consists of a list of terminals and a list of functions. (Only the terminals represent the domain knowledge; therefore the once-created set of operators was not changed.)

6.2 Types

There are three types in the representation, see Table 6.1.

Notation  | Meaning
S         | scalar (a real number)
V = S×S   | vector (a tuple of two real numbers)
B = {0,1} | boolean

Tab. 6.1: Types in the initial representation

6.3 Functions

There are operators of 1-, 2- and 3-arity for every type. Table 6.2 shows a detailed list of them all.

Notation  | Return type | No of args | Notes
IF(B,B,B) | B | 3 | if clause
ADD(B,B)  | B | 2 | logic OR
MUL(B,B)  | B | 2 | logic AND
LT(S,S)   | B | 2 | less than
IF(B,S,S) | S | 3 | if clause
ADD(S,S)  | S | 2 | addition
MUL(S,S)  | S | 2 | multiplication
SUB(S,S)  | S | 2 | subtraction
OPP(S)    | S | 1 | opposite value
ABS(S)    | S | 1 | absolute value
SIG(S)    | S | 1 | sigmoid function
ANG(V,V)  | S | 2 | angle between two vectors
LEN(V)    | S | 1 | length of a vector
IF(B,V,V) | V | 3 | if clause
ADD(V,V)  | V | 2 | addition
MUL(S,V)  | V | 2 | scalar multiplication
SUB(V,V)  | V | 2 | subtraction
OPP(V)    | V | 1 | opposite vector
NOR(V)    | V | 1 | vector normalization
ROT(S,V)  | V | 2 | vector rotation
RIG(V)    | V | 1 | perpendicular vector turned to the right
LEF(V)    | V | 1 | perpendicular vector turned to the left

Tab. 6.2: Functions

6.4 Domain knowledge

6.4.1 Simple terminals

Table 6.3 shows the terminal list as it was proposed at the very beginning. The initial, simple representation was used to conduct the first experiments. During the tests many drawbacks were discovered:
• The lack of memory in the RTT player algorithm - it was not yet implemented then.
No | Name               | Type | Description
1  | MY_RANGE           | S | shoot range of the current unit
2  | MY_SPEED           | S | maximum speed of the current unit
3  | MY_HP              | S | HP level of the current unit
4  | MATES_HP           | S | sum of HP levels of all friendly units
5  | NEAREST_MATE_HP    | S | HP level of the nearest friendly unit
6  | ENEMIES_HP         | S | sum of HP levels of all enemy units
7  | NEAREST_ENEMY_HP   | S | HP level of the nearest enemy unit
8  | N_MATES            | S | the number of all friendly units
9  | N_ENEMIES          | S | the number of all enemy units
10 | N_ENEMIES_IN_RANGE | S | the number of enemy units in shoot range
11 | NEAREST_MATE       | V | vector to the nearest mate
12 | CENTER_MATES       | V | the geometric center of all friendly units
13 | NEAREST_ENEMY      | V | vector to the nearest enemy
14 | CENTER_ENEMIES     | V | the geometric center of all enemy units
15 | HOME_VECTOR        | V | the initial geometric center of all friendly units
16 | IS_MOBILE          | B | true for a marine and for a tank not in siege mode

Tab. 6.3: Simple terminals

• The boolean terminals were necessary to close the representation (see Subsection 5.2.2). However, in the basic representation we introduced only one such symbol. The extremely short list of boolean terminals results in an imbalance in the representation.
• None of the terminals distinguishes tanks from marines. Intuitively, this prevents learning strategies adjusted to units of different types.
• It seemed that there were not enough terminals giving local information, and evolution was not able to build intelligent strategies on top of them.

Simply put, it seems that the simple representation does not have enough expressive power. Therefore, after several design cycles, a new one was created.

6.4.2 Complex scalar terminals

The scalar terminals from the complex representation provide information above all about:
• the hit points level of some distinguished units,
• and the number of certain units,
in the context of:
• unit alliance (friend or foe),
• unit type (marine or tank),
• and unit position (nearest, furthest, in range, on the entire map).
The full list of scalar terminals - from the "current agent" point of view - is the following:
1. TICK - which simulation frame of the game it is
2. MY_HP - my current HP level
3. NEAREST_MARINE_MATE_HP - HP level of the nearest friendly marine
4. NEAREST_TANK_MATE_HP - HP level of the nearest friendly tank
5. NEAREST_MARINE_ENEMY_HP - HP level of the nearest enemy marine
6. NEAREST_TANK_ENEMY_HP - HP level of the nearest enemy tank
7. WEAKEST_MYRANGE_MARINE_MATE_HP - HP level of the weakest friendly marine I can aid
8. WEAKEST_MYRANGE_TANK_MATE_HP - HP level of the weakest friendly tank I can aid
9. WEAKEST_MYRANGE_MARINE_ENEMY_HP - HP level of the weakest enemy marine I can shoot
10. WEAKEST_MYRANGE_TANK_ENEMY_HP - HP level of the weakest enemy tank I can shoot
11. RANGE_ANY_MATE_NUM - number of friendly units that can shoot at my position (an approximation of how many units may aid me)
12. RANGE_MARINE_MATE_NUM - number of friendly marines that can shoot an enemy near me
13. RANGE_TANK_MATE_NUM - number of friendly tanks that can shoot an enemy near me
14. ALL_MARINE_MATE_NUM - number of all friendly marines that are still alive
15. ALL_TANK_MATE_NUM - number of all friendly tanks that are still alive
16. MYRANGE_ANY_ENEMY_NUM - number of enemy units that I can shoot
17. MYRANGE_MARINE_ENEMY_NUM - number of enemy marines that I can shoot
18. MYRANGE_TANK_ENEMY_NUM - number of enemy tanks that I can shoot
19. RANGE_ANY_ENEMY_NUM - number of enemy units that can shoot me
20. RANGE_MARINE_ENEMY_NUM - number of enemy marines that can shoot me
21. RANGE_TANK_ENEMY_NUM - number of enemy tanks that can shoot me
22. ALL_MARINE_ENEMY_NUM - number of enemy marines that are still alive
23.
ALL_TANK_ENEMY_NUM - number of enemy tanks that are still alive

6.4.3 Complex vector terminals

Vector terminals should allow units to:
• synchronize their movement with the rest of the team,
• and wisely choose the path to approach the enemy (or to construct a solid defense).

Therefore, the vector terminals from the complex representation most often point to:
• some distinguished units,
• or the geometric centers of distinguished groups of units,
in the context of:
• unit alliance (friend or foe),
• unit type (marine or tank),
• unit characteristic (weakest),
• and unit position (nearest, furthest, in range, on the entire map).

The full list of complex vector terminals - from the "current agent" point of view - is the following:
1. ME - a (0,0) vector.
2. HOME_TEAM - vector to the starting geometric center of all friendly units.
3. HOME_MINE - vector to my starting position.
4. FURTHEST_RANGE_MARINE_MATE - vector to the furthest friendly marine that can shoot an enemy near me.
5. FURTHEST_RANGE_TANK_MATE - vector to the furthest friendly tank that can shoot an enemy near me.
6. FURTHEST_RANGE_MARINE_ENEMY - vector to the furthest enemy marine that can shoot me.
7. FURTHEST_RANGE_TANK_ENEMY - vector to the furthest enemy tank that can shoot me.
8. NEAREST_MARINE_MATE - vector to the nearest friendly marine.
9. NEAREST_TANK_MATE - vector to the nearest friendly tank.
10. NEAREST_MARINE_ENEMY - vector to the nearest enemy marine.
11. NEAREST_TANK_ENEMY - vector to the nearest enemy tank.
12. WEAKEST_MYRANGE_MARINE_MATE - vector to the weakest friendly marine that I can aid.
13. WEAKEST_MYRANGE_TANK_MATE - vector to the weakest friendly tank that I can aid.
14. WEAKEST_MYRANGE_MARINE_ENEMY - vector to the weakest enemy marine I can shoot.
15. WEAKEST_MYRANGE_TANK_ENEMY - vector to the weakest enemy tank I can shoot.
16.
CENTER_ALL_MATE - vector to the geometric center of all friendly units.
17. CENTER_MARINE_MATE - vector to the geometric center of all friendly marines.
18. CENTER_TANK_MATE - vector to the geometric center of all friendly tanks.
19. CENTER_ALL_ENEMY - vector to the geometric center of all enemy units.
20. CENTER_MARINE_ENEMY - vector to the geometric center of all enemy marines.
21. CENTER_TANK_ENEMY - vector to the geometric center of all enemy tanks.
22. PATH_MATE - vector parallel to the previous move vector of the nearest friendly unit.
23. ME_BACK - vector to my previous position.

6.4.4 Complex boolean terminals
The main idea behind the proposed boolean terminals was to provide some useful and nontrivial information to the agent. The full list - from the “current agent” point of view - is the following:
1. AM_ON_EDGE - true if the current unit is close to the edge of the map.
2. AM_ON_FIRE - true if my HP level has just dropped.
3. AM_IN_GROUP - true if there are friendly units in my range.
4. AM_WINNING - true if my side is winning.
5. AM_BULLY - true if there are more friendly units than enemy units in range.
6. AM_MOBILE - true if I can move (always true in case of a marine, and true for non-sieged tanks).
7. IS_BEGIN_TIME - true at the beginning of the game (up to the 480th simulation frame).
8. AM_HEALTHY - true if the current unit’s HP level is higher than 80% of its maximum.
9. AM_DYING - true if the current unit’s HP level is lower than 20% of its maximum.
10. JUST_SHOT - true if the current unit shot in the previous simulation frame.
11. AM_COLLIDING - true if the unit was blocked in the previous simulation frame (and thus did not move as ordered).

6.4.5 Normalization
All scalar terminals are normalized into the range [0,1]. For example, the maximum HP level of a marine is 80. Imagine that during a game a given marine was shot a few times and in a concrete game state has 56 HP.
In that situation the MY_HP scalar terminal would evaluate to 56/80 = 0.7, and this value would actually be used in the expression tree. As said in Subsection 3.4.3, all vectors are relative to the current unit’s position. Additionally, all of them are also normalized by the map dimensions (analogically to scalar terminals). For example, imagine the geometric center of all friendly units to be at point (200,300) and the current unit to be at position (100,100). First we make the vector relative to the current unit, thus CENTER_ALL_MATE equals (100,200). Then we normalize it by the map dimensions of 1024x768, getting (100/1024, 200/768) = (0.09765625, 0.26041667). This is the value actually used in the expression tree. After the move vector is calculated, it is used to determine a specific point on the map that the current unit will go to. For example, if a move vector equals (0.2,0.3) and the position of the current unit is still (100,100), the destination point is set to (110,115). It is important for the destination to be unreachable in one simulation frame, since we want units to always move at maximum speed. Anyhow, a new move vector is computed in each simulation frame, so the fact that the unit does not reach its previous destination does not influence its behaviour.

6.4.6 RNC vectors and map mirroring
In the representation we use scalar and vector RNCs. There is one issue regarding random constant vectors. For example, let us consider a simple expression tree that consists of one constant vector (0.5,0). In case a player has a starting position on the left side of the map, playing according to this expression tree would move units towards the enemy. However, in case a player starts on the right side of the map, the units would run away from the enemy. Therefore, depending on the starting position (which is random), the exact same strategy results in different unit behaviour.
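To make the arithmetic of Section 6.4.5 concrete, the normalization and destination rules can be sketched as follows (a minimal Python sketch, not the thesis's actual C++/Java implementation; the 1024x768 map size and the marine's 80 HP maximum come from the worked examples, while the destination scale of 50 is an assumption inferred from the (0.2, 0.3) -> (110, 115) example):

```python
# Sketch of the terminal normalization rules of Section 6.4.5.
MAP_W, MAP_H = 1024, 768      # map dimensions used in the worked example
MARINE_MAX_HP = 80            # maximum HP level of a marine

def normalize_hp(hp, max_hp=MARINE_MAX_HP):
    """Scalar terminals are normalized into the range [0, 1]."""
    return hp / max_hp

def relative_vector(target, me):
    """Vector terminals are made relative to the current unit's position,
    then normalized by the map dimensions."""
    return ((target[0] - me[0]) / MAP_W, (target[1] - me[1]) / MAP_H)

def destination(me, move_vec, scale=50):
    """Turns the computed move vector into a destination point; the scale
    of 50 is an assumption inferred from the worked example in the text."""
    return (me[0] + move_vec[0] * scale, me[1] + move_vec[1] * scale)

# Worked examples from the text:
print(normalize_hp(56))                         # 56/80 = 0.7
print(relative_vector((200, 300), (100, 100)))  # (0.09765625, 0.2604166...)
print(destination((100, 100), (0.2, 0.3)))      # (110.0, 115.0)
```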
To prevent this from happening, at the beginning of the game we check on which side a given player starts. If it is the left side, the constant vectors remain unchanged; if it is the right side, we multiply the x coordinate by -1. Let us emphasise that this applies only to RNC vectors, since all terminal vectors computed from the game state are always properly directed.

7 Implementation
Things that must work together, can’t be carried to the field that way. One of the Murphy’s Laws of Combat Operations.

7.1 Objectives
One of the main goals of this thesis was to conduct distributed evolutionary learning from simulations. From the beginning the technical complexity was known to be high, since the application must:
• perform artificial evolution of a complex species,
• perform RTT game simulations,
• distribute the computation (namely the evaluation of the individuals, which is the main CPU-demanding task),
• provide checkpointing,
• and handle errors.
In this chapter we show the entire framework design and how the computations were distributed. Additionally, we give a short description of how the experiments were maintained, the results were gathered and the analysis was performed.

7.2 The framework
7.2.1 Master-slave design
There are five entities in the framework design:
• Experiment - performs the main evolutionary loop.
• Evaluator - entry point for the evaluation. Creates computational Tasks and sends them to the Manager to be executed.
• Manager - maintains a pool of Tasks and a pool of Hosts. Each time there is a free host (not performing computations) and there is at least one game to simulate, it assigns the Task to the Host.
• Task - represents a game simulation. Consists of two individuals (that are supposed to fight against each other), a random seed, a status and a result. A task can have one of five statuses: NEW, SUBMITTED, RUN, FINISHED and DONE. The transitions between them are shown in Figure 7.2.
At the beginning, when a task is created by the Evaluator, it is given the status NEW. As soon as the task is inserted into the pool in the Manager, its status changes to SUBMITTED. While a task is being executed it has the status RUN. After completion, the task is always returned to the Manager. However, in case a task ends in failure (for example, the host went down) it is automatically given the status SUBMITTED again, and if the game simulation finished successfully it is given the status FINISHED. Finally, when tasks are retrieved from the pool by the Evaluator, they are given the status DONE.
• Host - an entity representing a machine performing computations. It is responsible for sending the command (over ssh) to run a game simulation on a remote host.
Fig. 7.1: Framework design
Summarizing, one distinguished host acts as master and processes the entire evolution loop, sending the evaluation tasks to slaves. This is done via the ssh command, therefore no daemon is required to be running on the slave hosts. This approach might be called a minimalistic master-slave scheme - minimalistic because there are no application slaves, just hosts with the proper software installed (in our case only the ORTS simulation framework is required). For the entire framework design please see Figure 7.1; notice that the Experiment component performs the entire search loop, thus it also handles selection and breeding (which is not shown in the design).
Fig. 7.2: Task status transitions

7.2.2 Tools and libraries
Keeping in mind all the objectives from the previous section, it is possible to break the framework down into several “challenges”. For each, a decision had to be made about what tools to use and what to implement ourselves.
• Evolutionary Computation System (ECJ ver. 18) - used to perform the entire experiment. This Java framework provides the implementation of the main search loop (see Algorithm 4.1) and the selection of the individuals, as well as checkpointing.
• Strongly typed GEP - based on the GEP plugin to ECJ written by Bob Orchard in Java. The standard plugin turned out not to be flexible enough to handle complex stGEP individuals, therefore it was rewritten and extended.
• Distributed evaluation of individuals:
– ORTS framework - used as the simulation engine for playing RTT games. This required implementing a dedicated ORTS client in C++; the game server remained unchanged.
– Bash scripts - used to execute the game simulations on many remote hosts (via the ssh command).
– Asynchronous task and host pools - implemented in Java as an extension of the ECJ evaluator; this allowed the experiment to perform distributed evaluation of the individuals, and provided error handling.
• Bash scripts - used for on-line monitoring of the experiment.
• Postgres database, bash scripts, gnuplot - used for the analysis of the results.

7.3 Maintaining experiments
7.3.1 Monitoring
A set of command line tools was developed in order to monitor the experiments; the most important are:
• ping: shows the status of each remote host performing computations.
• synchronize: synchronizes the entire cluster with a chosen host (it copies all necessary files from the chosen host into corresponding paths on the remote hosts).
• run: runs the distributed experiment; takes as an argument a parameter file, which lists all computational hosts and the experiment’s specification.
• rerun: restarts the distributed experiment from a chosen checkpoint file given as a parameter. Checkpoint files are automatically created by the ECJ framework every generation.
• kill: kills the experiment along with all hanging processes involving ORTS simulations.
• backup: a daemon script which copies the checkpoint files and logs from the master host to a network disk, in case of a master failure.

7.3.2 Logging and analysis
Each experiment generated many logs.
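The Task lifecycle of Section 7.2.1 can be sketched as a small state machine (a Python sketch; the thesis's framework is written in Java on top of ECJ, so the class and field names here are only illustrative):

```python
# Sketch of the Task status transitions of Figure 7.2. The pair
# (RUN, SUBMITTED) models a host failure: the task is re-queued so the
# Manager can assign it to another host.
from enum import Enum

class Status(Enum):
    NEW = "NEW"
    SUBMITTED = "SUBMITTED"
    RUN = "RUN"
    FINISHED = "FINISHED"
    DONE = "DONE"

TRANSITIONS = {
    (Status.NEW, Status.SUBMITTED),  # Evaluator inserts the task into the pool
    (Status.SUBMITTED, Status.RUN),  # Manager assigns the task to a free host
    (Status.RUN, Status.FINISHED),   # game simulation finished successfully
    (Status.RUN, Status.SUBMITTED),  # host went down, task is re-queued
    (Status.FINISHED, Status.DONE),  # Evaluator retrieves the result
}

class Task:
    def __init__(self, individuals, seed):
        self.individuals = individuals  # the two players fighting each other
        self.seed = seed                # random seed of the game simulation
        self.status = Status.NEW
        self.result = None

    def transition(self, new_status):
        if (self.status, new_status) not in TRANSITIONS:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status

task = Task(("player_a", "player_b"), seed=42)
task.transition(Status.SUBMITTED)
task.transition(Status.RUN)
task.transition(Status.SUBMITTED)  # simulated host failure: back in the pool
```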
Each task created by the evaluator was minutely observed; the logs contained exact information about all task transitions from one state to another, along with their timestamps. At each generation the entire population, both expressed and not (raw genotype), was written to a file. All host failures were also logged. By using simple bash command-line tools (such as grep, cut, wc, expr, etc.), the log files allowed for:
• On-line experiment progress monitoring (for example, the number of generations processed so far or the number of hosts down).
• Off-line analysis of the results (after completion of the entire experiment).
For automatic analysis of the results, such as making graphs of average fitness, the Postgres database and gnuplot were used. The log files generated by an experiment were parsed by bash scripts and the retrieved data was inserted into the database. Using the functionalities provided by the database (for example max, min, length), the proper data was collected and plotted into a graph. The database contained only one simple relation named Experiment, with the fields:
• generation INTEGER
• fitness FLOAT
• marine VARCHAR
• tank VARCHAR

8 Experiments and results
If enough data is collected, a board of inquiry can prove anything. One of the Murphy’s Laws of Combat Operations.

8.1 Objectives
In this chapter we describe the results of three chosen computational experiments that were performed to verify our approach. It is worth mentioning that altogether eight experiments were conducted, which amounts to roughly two years of computation on one 1.2GHz CPU. During the research we were constantly introducing implementation changes, testing different parameters, evaluators, etc. Very often a new, improved version of the software was ready before the previous experiments had finished. Furthermore, some experiments failed due to bugs in the implementation. Therefore, we gathered data from the three most representative experiments and summarized all the work and conclusions in this chapter.
The main idea of the experiments was to fine-tune the evolution process, focusing on the evaluation method and the complexity of the representation. Please see Table 8.1 for the most important improvements introduced in the successive three experiments.

Experiment  Evaluation  Representation
1           SET         simple
2           HoF         simple
3           HoF         complex

Tab. 8.1: Evaluation method and representation complexity in the three experiments

For each experiment we show the initial settings along with the reasons for using them, and present the results. We focused on:
• The length of the expression trees (meaning the total length of an expression string, which is directly proportional to the number of nodes in an expression tree). This shows the complexity level of the phenotypes, but does not reflect the actual complexity level of the playable strategies. It is possible to have an expression tree with many nodes which can in fact be reduced to a very simple form.
• The number of one-node expression trees. This reflects the diversity of the population.
• In the case of HoF - the size and length of the reference strategies set.
• The best players evolved. This is a qualitative (not quantitative) analysis of the learned strategies - we checked how the best players from different generations fought against each other. Therefore it was possible to describe strategies in terms of the units’ actual behaviour (an exact analysis of the expressed phenotypes, due to their complexity, was not possible).
• Other characteristics of the learning process, if observed.
Please notice that we do not analyze the individuals’ fitnesses - as is often done in research devoted to artificial evolution:
• because in SET the maximum, minimum and average fitness is constant,
• and due to the strongly competitive nature of the fitness in HoF, which makes it a non-objective measure.

8.2 Environment
The experiments took place at three laboratories of the Institute of Computing Science at Poznan University of Technology.
There were 45 machines altogether, but the specific cluster configuration varied from experiment to experiment. Table 8.2 shows a summary of the computational mini-cluster we created. Table 8.3 summarizes the availability of the computers, thus showing the actual computational power used.

Lab name  All hosts  Working hosts  CPU       RAM  OS
lab-43    15         13             2x2.2GHz  1GB  Linux
lab-44    15         14             2x3GHz    2GB  Linux
lab-45    15         15             2x2.2GHz  1GB  Linux

Tab. 8.2: Experimental cluster configuration

Lab name       Experiment 1  Experiment 2  Experiment 3
lab-43         13            13            13
lab-44         14            12            14
lab-45         15            13            0
total (hosts)  42            38            27
total (cores)  84            76            54

Tab. 8.3: Availability of hosts

8.3 First experiment - The Reconnaissance
8.3.1 Objectives and assumptions
The first experiment is called a “reconnaissance”, since its main goal is to check whether our approach combining stGEP, MAS and RTT players is promising. The reasons for using certain settings are:
• Evaluator - in the first experiment we used Single Elimination Tournament, which was already implemented in ECJ and required only a few changes in order to handle distributed computations.
• Number of game repetitions - this is specific to the SET evaluation method; each game was repeated 5 times. This was the maximal value that could be used without lengthening the experiment time too much. This way a single generation was evaluated in 20 minutes on average.
• Representation - the simple representation was used, which at this time was believed to have sufficient expressive power.
• Size of the population - we decided to use a relatively large (in comparison to later experiments) population of 128 individuals, to see if a broad search can overcome the expected instability of SET.
• Number of generations - the experiment went on for 140 generations.
• Genetic operator probabilities - we used all probabilities as suggested in Table 5.1. Those values try to establish a sort of equilibrium between mutation and recombination, not favoring either of them.
• Size of the genotype - the head was a sequence of 40 symbols. It means that an expression tree has at maximum 40+(3-1)*40+1=121 nodes.
• Probability of randomly drawing a function or terminal symbol - this factor says how probable it is that a function symbol is randomly set in the head during initialization or mutation. In the first experiment we wished to check how this probability influences the length of the expression, therefore we used a value of 0.75 for the marine gene and 0.5 for the tank gene.

8.3.2 Results
The dynamics of the evolution process is presented in Figures 8.1 and 8.2. At the beginning the average marine phenotype length is much larger than the tanks’, but with time the significant difference fades out. In the end, marine phenotypes tend to be up to twice as long as tank phenotypes. This confirms that 0.75 is a better value for the probability of randomly choosing a function symbol in a gene. However, this still does not prevent the population from being overcome by individuals with a very simple phenotype, since the average length of the phenotypes is surprisingly low. Even more, starting from the 50th generation, 60-80% of the individuals have both the marine and the tank expression tree of size 1. Evolution “chooses” simplicity and - from the given terminals and functions - is unable to construct more sophisticated strategies. The detailed analysis of the best strategies from each generation shows that three main types can be distinguished:
• full defense - all marines and all tanks gather and await the enemy.
• full attack - all marines and all tanks simply move towards the enemy.
• half attack / half defense - either the marines or the tanks simply move towards the enemy, while all the units of the other type gather and wait.
Fig.
8.1: Average phenotype length, using SET and simple representation
Fig. 8.2: Number of one-length phenotypes (marine and tank), using SET and simple representation

Marine         Tank           Notes
CENTER_MATES   NEAREST_MATE   full defense strategy
NEAREST_ENEMY  NEAREST_ENEMY  full attack strategy
HOME           CENTER_ENEMY   half attack / half defense strategy

Tab. 8.4: Examples of best players, using SET and simple representation

For examples please see Table 8.4. It must be underlined that in almost every generation the best player had just one node in the marine and tank expression trees. Thus, the behaviour of units was very simple: they just blindly moved towards the enemy, or gathered and waited for him. Furthermore, a cycling of strategies was discovered - “half attack / half defense” was at some point of the evolution beaten either by “full defense” or “full attack”, and those were at some point beaten again by a “half attack / half defense”.

8.3.3 Conclusions
From the analysis of the results the following conclusions were drawn:
• Evaluation with no memory and in a noisy environment causes intense population cycling.
• A high probability of randomly setting a function symbol in the head may - up to a certain extent - help in keeping the complexity level of the strategies high.
• The population convergence towards simple players is unquestionable. The learning process did not evolve any intelligence reaching further than the already designed terminal set. We suspect there might be different causes of that:
– using co-evolution with no external reference players,
– an evaluation method prone to noise,
– a not sufficiently expressive representation.
In the context of evolving an intelligent player the first experiment failed. However, we did learn much and know that many changes must be introduced and tested before the evolution will be able to find a good RTT strategy.
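The Single Elimination Tournament evaluation used in this experiment can be sketched as follows (a Python sketch; the actual evaluator is ECJ's Java implementation, `simulate` is a placeholder for an ORTS game, and scoring a player by the tournament round it survives to is our assumption about how SET assigns fitness):

```python
# Sketch of Single Elimination Tournament (SET) evaluation: players are
# paired off and each pairing is decided by the majority of 5 repeated
# game simulations, as in the first experiment.
import random

def simulate(player_a, player_b, seed):
    """Placeholder for one ORTS game simulation; returns the winner."""
    return random.Random(seed).choice([player_a, player_b])

def play_match(a, b, repetitions=5):
    """Each pairing is repeated 5 times to reduce noise."""
    wins_a = sum(1 for r in range(repetitions) if simulate(a, b, r) == a)
    return a if wins_a * 2 > repetitions else b

def single_elimination(players, repetitions=5):
    """Fitness = the tournament round a player survived to."""
    fitness, rnd, alive = {}, 0, list(players)
    while len(alive) > 1:
        winners = []
        for i in range(0, len(alive), 2):
            winner = play_match(alive[i], alive[i + 1], repetitions)
            loser = alive[i + 1] if winner == alive[i] else alive[i]
            fitness[loser] = rnd
            winners.append(winner)
        alive, rnd = winners, rnd + 1
    fitness[alive[0]] = rnd  # the champion survived every round
    return fitness

fitnesses = single_elimination([f"player_{i}" for i in range(8)])
```

Note that SET assumes a power-of-two bracket; with 8 players, four get fitness 0, two get 1, one gets 2 and the champion gets 3.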
8.4 Second experiment - The Skirmish
8.4.1 Objectives and assumptions
The second experiment is called a “skirmish”, since its main goal is to gather experience before the final attempt at finding the best RTT players. Many improvements are tested to see whether they have a positive influence on the learning process. The reasons for using certain settings are:
• Evaluator - the first experiment proved SET (having no memory) to be insufficient, thus we implemented Hall of Fame with all its extensions: uniqueness, manual teachers and competitive fitness sharing.
• Size of the reference array - this is specific to the HoF evaluation method; each individual is tested against 17 learned players and 3 manual teachers (20 altogether). This was the maximal value that could be used without lengthening the experiment time too much. This way a single generation was evaluated in 15 minutes on average.
• Representation - to objectively observe the influence of all introduced changes, we decided to once again use the simple representation.
• Size of the population - there are 32 individuals in the population, since a larger one makes the experiment too demanding in terms of computation time. One could increase the size of the population at the expense of using a smaller reference array. However, in that case the evaluation method would be as prone to noise as SET, and from the first experiment we know that this could lead to population cycling. Therefore the size of the HoF is favored over the size of the population.
• Number of generations - the experiment went on for 140 generations.
• Genetic operator probabilities - we used slightly different values than previously, giving more significance to mutation (probability changed from 0.044 to 0.1) and less to recombination (probability changed from 0.3 to 0.15).
• Size of the genotype - the head was a sequence of 40 symbols.
It means that an expression tree has at maximum 40+(3-1)*40+1=121 nodes.
• Probability of randomly drawing a function or terminal symbol - based on the experiences from the first experiment, this factor equals 0.75 for both the marine gene and the tank gene.

8.4.2 Results
In comparison to the first experiment, the phenotypes are (on average) longer and - as suspected - there are no significant differences between marines and tanks. Figures 8.4 and 8.3 shed more light on the evolution dynamics. At the beginning the number of individuals with one-node expression trees is relatively low. However, at the 40th generation the learning process suddenly starts to favor simplicity, as the phenotype length drops and more and more strategies consist of one-node trees. This phenomenon lasts up to the 80th generation, with a peak magnitude at the 60th. A closer analysis shows that the initial random individuals converged to one simple individual, and around the 60th generation almost the entire population had been “overrun” by it. All the individuals had identical phenotypes and therefore an identical strategy, shown in Table 8.5. Figures 8.4 and 8.3 suggest that in later generations more complex individuals are once again present. However, the actual playing strategy did not change! For example, assume having a strategy such as: IF(LT(0,1),NEAREST_ENEMY,some_large_subtree). The decision clause always evaluates to TRUE, therefore NEAREST_ENEMY is always returned, regardless of the other parts of the formula. Exactly this is happening in the second experiment. All of these more complex expression trees were in fact reducible to the simple one to which the population had converged previously. Therefore it seems that the genetic operators allowed the evolution to explore the solution space, but only within the local optimum, which the learning process was not able to abandon. The phenotypes were getting more complex, but the strategies represented by them were not.
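This reducibility can be made concrete with a small sketch (Python; tuples model expression trees, and only the IF/LT pattern from the example above is folded - a real simplifier would cover more cases):

```python
# Sketch of why phenotypes grew while the strategies did not: a decision
# node with a constant condition reduces the whole tree to a single branch.

def reduce_tree(node):
    """Fold away IF nodes whose LT condition compares two constants."""
    if not isinstance(node, tuple):
        return node  # terminal symbol
    op, *args = node
    args = [reduce_tree(a) for a in args]
    if op == "IF" and isinstance(args[0], tuple) and args[0][0] == "LT":
        _, left, right = args[0]
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            return args[1] if left < right else args[2]  # constant condition
    return (op, *args)

# The IF(LT(0,1), NEAREST_ENEMY, some_large_subtree) example from the text:
some_large_subtree = ("IF", ("LT", "MY_HP", "TICK"),
                      "CENTER_ALL_ENEMY", "HOME_MINE")
tree = ("IF", ("LT", 0, 1), "NEAREST_ENEMY", some_large_subtree)
print(reduce_tree(tree))  # NEAREST_ENEMY - the large subtree is dead code
```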
Figure 8.5 shows that the HoF’s reference array is steadily and quickly getting bigger. From the 40th up to the 80th generation a mild plateau can be observed, but this was expected, since the previous analysis showed that during that time the population was overcome by individuals having identical phenotypes.
Fig. 8.3: Average phenotype length, using HoF and simple representation
Fig. 8.4: Number of one-length phenotypes (marine and tank), using HoF and simple representation
Fig. 8.5: Size and length of the reference strategies array, using HoF and simple representation

Marine         Tank           Notes
NEAREST_ENEMY  NEAREST_ENEMY  full attack strategy

Tab. 8.5: Examples of best players, using HoF and simple representation

8.4.3 Conclusions
From the analysis of the results the following conclusions were drawn:
• HoF with fitness sharing seems to be less prone to noise than SET, and thus is a better evaluation method in the case of RTT player learning.
• Individuals with different phenotypes may play similar strategies. This holds back the learning process. It also causes the HoF’s reference array to contain mainly individuals having the same “playing style”, despite never adding two individuals with identical phenotypes.
• The population convergence towards simple players (even if “decorated” with lots of useless formulas) is unquestionable. The learning process did not evolve any intelligence reaching further than the already designed terminal set. However, the evolution process was able to explore the phenotype space, since the phenotypes were actually getting more complex and no cycling was observed.
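The Hall of Fame evaluation with competitive fitness sharing can be sketched as follows (a Python sketch assuming the usual competitive fitness sharing scheme, in which a win against reference player j is worth 1 divided by the number of players that beat j; `beats` stands in for a game simulation and all names are illustrative):

```python
# Sketch of HoF evaluation with competitive fitness sharing: every
# individual plays the whole reference array, and wins against
# rarely-beaten reference players are worth more.

def hof_shared_fitness(population, hall_of_fame, beats):
    wins = {p: [opp for opp in hall_of_fame if beats(p, opp)]
            for p in population}
    # how many population members beat each reference player
    beaten_by = {opp: sum(1 for p in population if opp in wins[p])
                 for opp in hall_of_fame}
    return {p: sum(1.0 / beaten_by[opp] for opp in wins[p])
            for p in population}

# Toy example: numbers stand in for strategies, the higher number wins.
population, hall_of_fame = [5, 3, 1], [2, 4]
fitness = hof_shared_fitness(population, hall_of_fame,
                             beats=lambda a, b: a > b)
print(fitness)  # {5: 1.5, 3: 0.5, 1: 0}
```

Beating the reference player 4, which only one population member defeats, is worth a full point, while beating the widely-defeated player 2 is worth only half a point - which is the intended pressure towards diverse playing styles.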
In the context of evolving an intelligent player the second experiment failed. However, we did test different approaches and settings and are ready for the final experiment.

8.5 Third experiment - The Final Battle
8.5.1 Objectives and assumptions
The last experiment is called a “final battle”, since its main goal is to automatically learn the best RTT player and submit it to the ORTS RTS 2008 contest. The main change (in comparison to the second experiment) is using the complex representation instead of the simple one. The reasons for using certain settings are:
• Evaluator - hall of fame is used (with all its extensions: uniqueness, manual teachers and competitive fitness sharing). It proved to be a better choice than SET.
• Size of the reference array - this is specific to the HoF evaluation method; each individual is tested against 16 learned players and 4 manual teachers. This was the maximal value that could be used without lengthening the experiment time too much. However, at the time we conducted the third experiment, fewer hosts were available (see Table 8.3), thus a single generation was evaluated in 25 minutes on average.
• Representation - in order to evolve an intelligent player we decided to design and use the complex representation.
• Size of the population - just like in the second experiment, we decided to use a population of 32 individuals.
• Number of generations - since it is the final experiment, it went on a little longer than the previous ones: there were 190 generations.
• Genetic operator probabilities - we used the same values as in the second experiment, favoring mutation over recombination.
• Size of the genotype - since the complex representation is significantly larger than the simple one, it was decided to use a larger head of 50 symbols. It means that an expression tree has at maximum 50+(3-1)*50+1=151 nodes.
• Probability of randomly drawing a function or terminal symbol - we used the same value as in the second experiment.

8.5.2 Results
The evolution dynamics seems similar to the second experiment: the population converges to a simple strategy and later on the phenotypes slowly get more complex. However, there are differences:
• A closer analysis of the individuals shows that the population converges to a few simple strategies, not only one. Most probably this happens because the complex representation is much larger than the simple one and similar players can be represented using different terminals.
• The convergence is faster, which might suggest that the complex representation (having more terminals) makes it easier for the evolution to learn simple strategies.
• Compared to the second experiment, the phenotype complexity level increases more slowly. The dynamics of the HoF array growth is also different: individuals are added to the reference set more rarely. It seems that the learning process is more stable; a once-found good solution is not simply altered by adding non-functional parts to the phenotype (as happened in the second experiment).
Fig. 8.6: Average phenotype length, using HoF and complex representation
Fig. 8.7: Number of one-length phenotypes (marine and tank), using HoF and complex representation
Fig. 8.8: Size and length of the reference strategies array, using HoF and complex representation
The question arises whether the evolution takes advantage of the more expressive representation and actually finds better strategies.
A close analysis of the best individuals from each generation shows that in most cases the strategies work by already known rules like “full attack” or “full defense” from experiments one and two. However, three unique strategies - not discovered in the previous experiments - were found:
• flank attack - in most offensive strategies the units move towards the geometric center of the enemy. This may cause units to block each other, so that not all units attack at once. A player who takes advantage of this was found by the evolution. In the “flank attack” units still move towards the enemy, but they tend to turn a little, making an attempt to surround the hostile units.
• guerrilla attack - if the hostile units are not in shooting range, the tanks move towards the enemy. But as soon as the enemy is close (in shooting range), the tanks retreat. At the same time the marines gather at the geometric center of the friendly forces and wait for the enemy to come. This behaviour is similar to guerrilla fighters: they approach the enemy, make a fast attack and then run away or try to set an ambush.
• siege defense - for the first time a strategy that orders tanks to switch into “siege mode” was evolved. The marines, on the other hand, gather at the geometric center of the friendly forces. This is a good defense strategy against the simple “full attack” which most other individuals played.
Due to the high complexity of the phenotypes of the above strategies, we present only the “siege defense”, which is actually very simple; see Table 8.6. It must be underlined that, regardless of using elitism, each of the above three strategies was evaluated as best in only one generation. This suggests that the HoF evaluation method is still too prone to noise.

MARINE=IF( JUST_SHOT, CENTER_MARINE_ENEMY, CENTER_TANK_MATE )
TANK=HOME_MINE

Tab.
8.6: Best defense strategy evolved

8.5.3 Conclusions
In the third experiment, the evolution process was for the first time able to discover counter-strategies to those learned earlier. However, the results are still unsatisfying:
• The strategies with complex phenotypes, such as “flank attack” and “guerrilla attack”, actually behave similarly to the simple one-node strategies from the previous experiments. The differences in unit behaviour are subtle, thus it might be said that evolution made only one small step forward.
• On the other hand, the effective “siege defense” strategy is actually very simple. Evolution did not create anything new that was not already present in the terminal set.
• The HoF evaluation method works better than SET, but it still does not prevent the evolution from forgetting good solutions it discovered (which should not happen when using elitism). This means that the result of a game between two strategies may vary between simulations. However, this should have been expected, since most strategies play similarly - in the case of even opponents, both of them win equally often. Therefore the true problem is the lack of diversity in the population.
Summarizing, the problem of learning an RTT player is definitely very difficult. The task set before the evolution was ambitious. It might be said that - in spite of the enormous effort put into developing our approach - only a small success was achieved. However, please notice that, due to the incredibly high CPU power demand, the evolution process ran for only 190 generations and used only 32 individuals. We believe this is not even close to “enough” for machine learning to succeed on such a complex problem.
Therefore, put in the proper context of the limited resources we had, this small success of finding slightly more intelligent strategies lets us hope that, with further improvements to our approach (see Section 8.6 for the next steps we propose) and with the use of much more computational power, the evolutionary learning of RTT players can produce effective and sophisticated strategies.

8.5.4 ORTS contest

Unfortunately, the ORTS RTS Game AI Competition 2008 was not as popular as in previous years. There was only one opponent in the category of tactical combat, against whom our evolved AI solution played 200 games. For the contest we submitted an algorithm merging three strategies found in the third experiment: full attack, flank attack and siege defense. For each game, the strategy to play was picked randomly at the beginning (with probability uniformly distributed among the three options). Our solution won 15% of the played games (see [1]). Keeping in mind that our evolved players were simple, we feared that they would be no match for a manually designed AI. In this context we consider winning 15% of the fights a success. Unfortunately, due to the low interest in the ORTS RTS Game AI Competition, it was impossible to test our solutions against a large and representative set of players. Thus, the results of the contest are more of a curiosity than a finding of significant meaning.

8.6 Next steps

8.6.1 Evolution dynamics

Let us summarize all experiments in the following conclusions:

• It was expected that individuals with different chromosomes might have identical phenotypes. But it was not foreseen that individuals with distinct expression trees would play the game in the same way. The cause of this is redundancy in the phenotype domain - many expression trees can be reduced to much simpler ones.
This means that breeding new individuals and searching through the phenotype space does not amount to searching the strategy domain, since many distinct individuals share the same playing style. More work on the representation design is required.

• The population has always converged to one or a few simple strategies. Sometimes counter-strategies were found, but once the evolution reached a local optimum it was not able to abandon it, due to the redundant nature of phenotypes described in the previous point.

• The evaluation of two players in a simulation is very noisy and the results vary between games (with different initial conditions). Therefore elitism - although it should - does not let good and innovative solutions survive in the population. Introducing a more informative game result should help. In place of the simple "win/lose" outcome, we propose using quantitative information, namely the difference between the players' global hit point levels.

• Using the hall of fame gives better results than the single elimination tournament. We believe that choosing an evaluation method with a memory of the learning process is a step in the right direction.

• The question of why the population converges remains unanswered. There might be several causes:

– The representation and the problem characteristics: perhaps a simple attacking strategy is just the easiest one for evolution to discover. In terms of the solution domain it is a local optimum that is impossible to avoid and, later, impossible to leave. Improving the representation and/or redefining the problem might reshape the solution space and therefore encourage the exploration of diverse strategies.

– Co-evolution and competitive fitness: it is believed that evaluating individuals using each other as reference points may lead to stagnation of the evolutionary process. A possible solution is to put more effort into designing a reference set of manual teachers.
– stGEP: perhaps the genetic operators amplify the convergence effect, favoring simple phenotypes. Research comparing stGEP to classic GP might tell us more.

8.6.2 Computational cost

The problem that effectively prevented machine learning from finding good RTT players is the enormous CPU power needed to conduct a large experiment. Even the relatively small experiments we performed took altogether approximately two weeks of constant computation on around 80 processors! It is unknown how the evolution behaves with larger populations or when allowed to run for hundreds or even thousands of generations. Appropriate research seems impossible, since one would need a huge computational cluster. But even with one, it is not a good idea to simply rely on the number of machines. A method that is effective only under the condition of using thousands of computers will not become popular and will be available only to a few. We propose to try new things:

• Simulation optimization - ORTS is a large and complex framework. The game simulation requires a server and clients that communicate over the network, which is very costly. A simple, low-level simulator dedicated to one chosen game would be many times faster. This may allow larger experiments to be conducted.

• Changing the metaheuristic search method - one source of the strength of EC lies in the fact that the evolution of a population is in effect a parallel search. But maintaining an entire set of individuals is very costly and - in the case of RTT player learning - perhaps too costly. A well-guided single-individual search may be more efficient than the "large population" approach.

8.6.3 Redefinition of the problem

Perhaps evolving players as proposed in this thesis is still too challenging for current machine learning methods. The task is to create the best strategy for a combat battle, which we reduced to learning how to maneuver the units.
This is a great simplification of the problem, but maybe one step further is needed. For example, imagine a real battlefield. When a human commander orders a group of marines to move somewhere, he assumes the soldiers know how to move. He is certain that they will not run into each other and block themselves. He knows they will use the shortest path and successfully avoid all obstacles. Assuming this, the commander can think abstractly and create a great strategy that will win him the battle. But is this the case for an artificial RTT player? The answer is no, because in our problem the agents had to learn much more by themselves. For example, when units were moving, they sometimes blocked each other. Using the human-commander analogy, it is as if we tried to create the best strategy for soldiers that do not know how to walk! This is obviously futile and makes finding good solutions much harder. We have presented in this paper how to break down, step by step, the task set ahead of the AI. We tend to think that this process should have been taken even further. We propose several improvements that are good starting points for continuing our research:

• Implement intelligent behaviour - for example, implement obstacle avoidance and shortest-path finding for the units. Also, the choice of whom to shoot could be made more wisely: taking into account all enemies in range and their current hit points can result in destroying them much faster than the method of "shooting the closest one".

• Improve the representation - in the complex representation we had already introduced the idea of context information. Most terminals had several versions - one for marines, one for tanks, one for units in range, etc. This could be taken even further - for example, directed terminals could be introduced. The number of units in range is important, but it would be far more useful if a unit could know how many enemy units lie in a certain direction.
Knowing there are 8 tanks around is less informative than knowing that 6 of them are on the left and only 2 on the right.

• Change the representation - perhaps the domain knowledge hidden in the terminals and functions is too low-level, and even all the improvements proposed above will not change much. On the one hand they will give the representation more expressive power, but on the other hand the solution space will become larger and more complex, cancelling out the positive effects. The solution to these problems is a representation defined on a higher level - using more abstract types, fewer terminals and more problem-dedicated functions. For example, instead of the two terminals NUMBER_OF_MARINES and NUMBER_OF_TANKS, the function NUMBER(unit) should be used. We believe that a more informative representation should result from parameterising and creating new functions, not from adding ever more complex terminals.

• Focus on communication between agents - we somewhat neglected this aspect. In the evolved strategies, units seem to have had trouble coordinating their behaviour; each of them chose where to move independently of the others. Perhaps it is a good idea to distinguish one or more units as so-called leaders. The behaviour of the ordinary units could then depend not only on their local situation but also on the behaviour of the leaders. This would introduce more cooperation between the units. On the other hand, the learning process could become more prone to noise.

9 Summary

No matter which way you have to march, it is always uphill. One of Murphy's Laws of Combat Operations.

9.1 Contribution

The benefits of this work are diverse. Firstly, the paper presented a generic methodology for approaching a very demanding problem such as machine learning of RTT players.
This involved presenting the following steps: defining a game and designing its model, choosing a learning method along with all the details of solution encoding and evaluation, and performing the actual experiments. Secondly, in the field of evolutionary computation, a novel approach of strongly typed gene expression programming was developed and described in detail. It is a very flexible method and can be applied to many different problems, for instance RTT player learning. We also studied different co-evolutionary evaluation methods and elaborated on fine-tuning the learning process. Finally, our work contributed to the field of distributed computation: we presented both a theoretical analysis and the real-life design of a master-slave evaluation framework, along with the details of its implementation.

The goal set at the beginning of this paper was to evolve a good strategy, perhaps even a human-competitive one. It may be said that we are midway to fulfilling this objective. We put enormous intellectual effort into designing our approach and performed computations that would take more than two years on a single machine. But still, the challenge of automatically creating an AI for tactical combat is extremely ambitious and hard. In spite of all the work done, the final results might seem a little disappointing, since the best evolved strategies are quite naive. However, they still managed to compete with manually designed algorithms at the ORTS RTS Game AI Competition 2008. They did not win, but they put up significant resistance, showing that our research is heading in the right direction. Let us not forget that research on AI in real-time games has only just begun. The literature on this topic is relatively limited, and most papers are just a few years old.
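To give a concrete flavour of the evolved solutions, the sketch below shows how a strongly typed expression tree such as the "siege defense" marine rule from Table 8.6 could be interpreted for a single agent on each simulation tick. The `GameState` fields and the dictionary-based terminal set are our illustrative assumptions, not the actual framework code.

```python
# Hypothetical interpreter for an evolved strongly typed expression tree,
# e.g. MARINE = IF( JUST_SHOOT, CENTER_MARINE_ENEMY, CENTER_TANK_MATE ).
# All names below are illustrative stand-ins for the thesis terminal set.

from dataclasses import dataclass

@dataclass
class GameState:
    enemy_in_range: bool          # can this unit shoot right now?
    enemy_marine_center: tuple    # geometric center of enemy marines
    friendly_tank_center: tuple   # geometric center of friendly tanks

# Terminals return values of specific types (a boolean condition or a
# target position) - this is what "strongly typed" enforces in the trees.
TERMINALS = {
    "JUST_SHOOT": lambda s: s.enemy_in_range,
    "CENTER_MARINE_ENEMY": lambda s: s.enemy_marine_center,
    "CENTER_TANK_MATE": lambda s: s.friendly_tank_center,
}

def evaluate(tree, state):
    """Recursively evaluate an expression tree against the current state."""
    if isinstance(tree, str):          # a terminal leaf
        return TERMINALS[tree](state)
    op, *args = tree
    if op == "IF":                     # IF(condition, then_branch, else_branch)
        cond, then_branch, else_branch = args
        branch = then_branch if evaluate(cond, state) else else_branch
        return evaluate(branch, state)
    raise ValueError(f"unknown function: {op}")

marine_tree = ("IF", "JUST_SHOOT", "CENTER_MARINE_ENEMY", "CENTER_TANK_MATE")

state = GameState(enemy_in_range=True,
                  enemy_marine_center=(10, 4),
                  friendly_tank_center=(2, 2))
target = evaluate(marine_tree, state)  # marine moves toward (10, 4)
```

With `enemy_in_range=False` the same tree sends the marine to the friendly tank center instead, which is exactly the gathering behaviour of the evolved "siege defense".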
We hope this thesis will contribute to the development of science, enriching such fields as distributed machine learning, AI in games and evolutionary computation.

9.2 Work ahead

There are countless possibilities for continuing our research. Although a lot has been achieved, much more is still ahead of us. Generally speaking, it is not known exactly how the different settings and choices we have made influence the learning process. Firstly, one could try to model the game with the help of neural networks, or maybe even rule-based methods, rather than a multi-agent system using expression trees. Making an objective comparison between GP and stGEP is another road to follow. There are also some aspects of artificial evolution we disregarded - for example the selection method, which was fixed at the beginning of our work and remained unchanged to the very end. There are so many ideas for improving our approach and making it more effective that we hardly know where to begin. For more detailed conclusions drawn from the experiments and more suggestions for the future, please see Section 8.6. It is there that we propose to continue research on evolution dynamics in the context of RTT games, to focus on optimization issues, and even to redefine the problem or use other machine learning methods. One thing is certain - teaching a machine to play an intellectually demanding, heavily time-constrained game will remain an open problem for a long time. Let us hope this great challenge of pushing AI to new levels will draw more and more attention from the scientific community. We believe that the game is worth the candle.

A DVD content

The DVD attached to this thesis contains:

• The software environment described in Chapter 7.

• Raw results of the experiments described in Chapter 8.
B Acronyms

AI Artificial Intelligence
RTS Real Time Strategy
RTT Real Time Combat
MAS Multi Agent System
SET Single Elimination Tournament
HoF Hall of Fame
EC Evolutionary Computing
GA Genetic Algorithms
EP Evolutionary Programming
ES Evolutionary Strategies
GEP Gene Expression Programming
GP Genetic Programming
st strongly typed
IS Insertion Sequence
RIS Root Insertion Sequence
ORTS Open Real Time Strategy
CPU Central Processing Unit
GPU Graphics Processing Unit

Bibliography

[1] 2008 ORTS RTS Game AI Competition, http://www.cs.ualberta.ca/~mburo/orts/aiide08/index.html, 2008.

[2] European Games Developer Federation, http://www.egdf.eu/index.html, 2008.

[3] ORTS - a free software RTS game engine, http://www.cs.ualberta.ca/~mburo/orts/, 2008.

[4] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/evolutionary_algorithm, 2008.

[5] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/military_strategy, 2008.

[6] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/multi-agent_system, 2008.

[7] Peter J. Angeline and Jordan B. Pollack. Competitive environments evolve better solutions for complex tasks. In Stephanie Forrest, editor, Proceedings of the 5th International Conference on Genetic Algorithms, ICGA-93, pages 264–270, University of Illinois at Urbana-Champaign, 17-21 July 1993. Morgan Kaufmann.

[8] Jarosław Arabas. Wykłady z algorytmów ewolucyjnych [Lectures on evolutionary algorithms], volume 303. Wydawnictwa Naukowo-Techniczne, Warszawa, 2001.

[9] E. S. Association. 2008 sales, demographics and usage data. Essential facts about the computer and video game industry, 2008.

[10] Yaniv Azaria and Moshe Sipper. GP-gammon: Genetically programming backgammon players. Genetic Programming and Evolvable Machines, 6(3):283–300, September 2005. Published online: 12 August 2005.

[11] M. Buro. Call for AI research in RTS games. AAAI-04 AI in Games Workshop, San Jose, 2004.

[12] M. Buro and T. Furtak. RTS games as test-bed for real-time research.
Invited Paper at the Workshop on Game AI, JCIS, 2003.

[13] Michael Buro. Game 4 description, http://www.cs.ualberta.ca/~mburo/orts/aiide08/game4, 2008.

[14] Michael Buro and Timothy Furtak. On the development of a free RTS game engine. In GameOn'NA Conference, Montreal, 2005.

[15] Michael Buro and Timothy Furtak. ORTS Competition: Getting Started, February 2008.

[16] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of an International Conference on Genetic Algorithms and the Applications, pages 183–187, 1985.

[17] Raphael Crawford-Marks. Virtual witches and warlocks: Computational evolution of teamwork and strategy in a dynamic, heterogeneous and noisy 3D environment. Division III (senior) thesis, School of Cognitive Science, Hampshire College, 18 May 2004.

[18] Candida Ferreira. Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems, 13:87, 2001.

[19] Candida Ferreira. Gene expression programming in problem solving. In Rajkumar Roy, Mario Köppen, Seppo Ovaska, Takeshi Furuhashi, and Frank Hoffmann, editors, Soft Computing and Industry: Recent Applications, pages 635–654. Springer-Verlag, 10–24 September 2001. Published 2002.

[20] Cândida Ferreira. Function finding and the creation of numerical constants in gene expression programming. In 7th Online World Conference on Soft Computing in Industrial Applications, September 23 - October 4 2002. Online.

[21] Candida Ferreira. Mutation, transposition, and recombination: An analysis of the evolutionary dynamics. In Manuel Grana Romay and Richard Duro, editors, 4th International Workshop on Frontiers in Evolutionary Algorithms, North Carolina, USA, 8-14 March 2002.

[22] Candida Ferreira. Designing neural networks using gene expression programming. In Ajith Abraham and Mario Köppen, editors, 9th Online World Conference on Soft Computing in Industrial Applications, Paper No. 14, On the World Wide Web, 20 September - 8 October 2004.

[23] Candida Ferreira. GEP home page, http://www.gene-expression-programming.com/, 2008.

[24] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. John Wiley, New York, USA, 1966.

[25] Johannes Fürnkranz. Machine learning in games: a survey. pages 11–59, 2001.

[26] Bruce Geryk. A history of real-time strategy games, http://www.gamespot.com/gamespot/features/all/real_time, 2008.

[27] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Professional, January 1989.

[28] Ami Hauptman and Moshe Sipper. GP-endchess: Using genetic programming to evolve chess endgame players. In EuroGP, pages 120–131, 2005.

[29] Ami Hauptman and Moshe Sipper. Evolution of an efficient search algorithm for the mate-in-N problem in chess. In Marc Ebner, Michael O'Neill, Anikó Ekárt, Leonardo Vanneschi, and Anna Isabel Esparcia-Alcázar, editors, Proceedings of the 10th European Conference on Genetic Programming, volume 4445 of Lecture Notes in Computer Science, pages 78–89, Valencia, Spain, 11-13 April 2007. Springer.

[30] J.C. Herz and Michael R. Macedonia. Computer games and the military: Two views. Defense Horizons, Center for Technology and National Security Policy, National Defense University, 11, April 2002.

[31] J. R. Koza. Hierarchical genetic algorithms operating on populations of computer programs. In N. S. Sridharan, editor, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence IJCAI-89, volume 1, pages 768–774, Detroit, MI, USA, 20-25 August 1989. Morgan Kaufmann.

[32] S. Luke and R.P. Wiegand. When coevolutionary algorithms exhibit evolutionary dynamics. pages 236–241, 2002.

[33] S. Luke and R.P. Wiegand. Guaranteeing coevolutionary objective measures. In Poli et al. [201], pages 237–251, 2003.

[34] Liviu Panait and Sean Luke. A comparison of two competitive fitness functions.
In GECCO '02: Proceedings of the Genetic and Evolutionary Computation Conference, pages 503–511, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.

[35] Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3):387–434, November 2005.

[36] Riccardo Poli, William B. Langdon, Nicholas F. McPhee, and John R. Koza. Genetic programming: an introductory tutorial and a survey of techniques and applications. Technical Report CES-475, Department of Computing and Electronic Systems, University of Essex, UK, October 2007.

[37] Christopher D. Rosin and Richard K. Belew. Methods for competitive co-evolution: Finding opponents worth beating. In Proceedings of the 6th International Conference on Genetic Algorithms, pages 373–381, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.

[38] Christopher D. Rosin and Richard K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.

[39] J. K. Rowling. Harry Potter and the Sorcerer's Stone. Scholastic, New York, 1999.

[40] S. Sharabi and M. Sipper. GP-sumo: Using genetic programming to evolve sumobots. Genetic Programming and Evolvable Machines, 7(3):211–230, 2006.

[41] S. F. Smith. A Learning System Based on Genetic Adaptive Algorithms. PhD thesis, University of Pittsburgh, 1980.