Poznan University of Technology
Faculty of Computing Science and Management
Institute of Computing Science
Evolving Players for a Real-Time Strategy
Game Using Gene Expression Programming
Paweł Lichocki
1 September 2008
Supervisor: Krzysztof Krawiec, Ph.D.
Abstract
This thesis focuses on the fields of real-time strategy games, evolutionary computation, distributed machine learning and multi-agent systems. In general, the problem is to automatically learn the best strategy to play a real-time strategy game, more precisely a two-player combat of marines and tanks. The idea was inspired by the ORTS RTS Game AI Competition held annually at the University of Alberta.
The given problem is very complex and multicriterial, thus the final solutions presented here are the result of constant development and countless improvements. In this paper we try to underline the iterative nature of this process and propose a methodology that could be used for different problems in the field of real-time games. We show how to model the strategy as a multi-agent system and how to fine-tune the evolutionary process of searching for the best players. We also explore the subject of distributed learning, focusing on using a computation cluster for evaluating solutions. The methods of evaluation are also elaborated in the context of co-evolution; we compare two different methods that use competitive fitness - single elimination tournament and hall of fame. In order to encode such complex structures as game strategies, we developed an innovative approach to strongly typed gene expression programming. It uses a linear chromosome that is transformed into a valid strongly typed tree in a process of two-phase expression.
Finally, we also present the issues of design and implementation of the framework used for performing the experiments. All the computations we conducted required more than two years of constant work of a single machine. But in the end - regardless of this computational effort - not all of the evolutionary methods worked as they had been expected to. However, the data gathered allowed us to draw many constructive conclusions and to propose several different directions for future research.
Acknowledgments
I am very grateful for the advice of my supervisor, Krzysztof Krawiec, and his assistant,
Wojciech Jaśkowski, from Poznan University of Technology. I wish to thank both of them
for their feedback and guidelines. Wojciech Jaśkowski was an irreplaceable source of ideas
and literature references. He kept me focused on my research, constantly encouraged me
to do more work and - if needed - did not hesitate to criticise it constructively.
I would also like to acknowledge Prof. Cristina Bazgan and Prof. Daniel Vanderpooten, who were supportive during my visit at Université Paris-Dauphine. Being
there I was lucky enough to be able to share my ideas with other French students. Their
remarks allowed me to view my research from a different angle, for which I am very
grateful to them.
Contents

1 Introduction
  1.1 Scope of research
  1.2 Motivation and benefits
  1.3 Goals
  1.4 Thesis organization
  1.5 Used concepts

2 The problem
  2.1 Methodology
  2.2 ORTS contest
  2.3 Rules of the game
    2.3.1 Type of game
    2.3.2 Simulation
    2.3.3 World representation
    2.3.4 Movement
    2.3.5 Fight
    2.3.6 Specific rules
    2.3.7 Available orders
  2.4 State of the art
    2.4.1 Evolutionary computation
    2.4.2 Multi-agent systems
    2.4.3 Machine learning in games
    2.4.4 Commercial RTS games

3 Modelling
  3.1 Objectives
  3.2 Initial player model
    3.2.1 Human-based player
    3.2.2 MAS-based player
  3.3 Minimizing AI task
    3.3.1 Domain knowledge and memory
    3.3.2 Focusing the AI task on units movement
  3.4 Multi-agent system
    3.4.1 Hybrid MAS
    3.4.2 Agent
    3.4.3 Vector representation

4 Learning
  4.1 Objectives
  4.2 Algorithm
    4.2.1 Search loop
    4.2.2 Selection and elitism
    4.2.3 Co-evolution
  4.3 Evaluation
    4.3.1 Single Elimination Tournament
    4.3.2 Hall of Fame
  4.4 Distributed learning
    4.4.1 Assumptions
    4.4.2 Distributed SET
    4.4.3 Distributed HoF

5 Encoding
  5.1 Objectives
  5.2 Strongly typed GEP
    5.2.1 Encoding simple structures
    5.2.2 Closure of the representation
    5.2.3 Expression by two-phase translation
  5.3 Genetic operators
    5.3.1 Recombination
    5.3.2 Transposition
    5.3.3 Inversion and mutation
    5.3.4 Dc and RNC operators

6 Representation
  6.1 Objectives
  6.2 Types
  6.3 Functions
  6.4 Domain knowledge
    6.4.1 Simple terminals
    6.4.2 Complex scalar terminals
    6.4.3 Complex vector terminals
    6.4.4 Complex boolean terminals
    6.4.5 Normalization
    6.4.6 RNC vectors and map mirroring

7 Implementation
  7.1 Objectives
  7.2 The framework
    7.2.1 Master-slave design
    7.2.2 Tools and libraries
  7.3 Maintaining experiments
    7.3.1 Monitoring
    7.3.2 Logging and analysis

8 Experiments and results
  8.1 Objectives
  8.2 Environment
  8.3 First experiment - The Reconnaissance
    8.3.1 Objectives and assumptions
    8.3.2 Results
    8.3.3 Conclusions
  8.4 Second experiment - The Skirmish
    8.4.1 Objectives and assumptions
    8.4.2 Results
    8.4.3 Conclusions
  8.5 Third experiment - The Final Battle
    8.5.1 Objectives and assumptions
    8.5.2 Results
    8.5.3 Conclusions
    8.5.4 ORTS contest
  8.6 Next steps
    8.6.1 Evolution dynamics
    8.6.2 Computational cost
    8.6.3 Redefinition of the problem

9 Summary
  9.1 Contribution
  9.2 Work ahead

A DVD content
B Acronyms
Bibliography
1 Introduction
The easiest way is always mined.
One of the Murphy’s Laws of Combat Operations.
1.1 Scope of research
This thesis focuses on the fields of real-time strategy (RTS) games, evolutionary computation (EC), distributed machine learning and multi-agent systems (MAS). In general, the problem is to automatically learn the best strategy to play an RTS game, more precisely a Real-Time Tactics (RTT) two-player combat of marines and tanks. The method used to achieve this is a novel approach of strongly typed gene expression programming (stGEP) combined with a MAS model. The given problem is very complex and multicriterial, thus the final solutions presented here are the result of constant development and countless improvements. In this paper we try to emphasise the iterative nature of this process. Apart from showing that a machine is able to learn to play an intellectually demanding real-time game, this thesis proposes a methodology for modelling any RTT game and shows methods of distributed learning.
1.2 Motivation and benefits
Computer and video game sales in 2007 were worth 9.5 billion dollars in the United States alone, and the best-selling computer game genre was strategy, accounting for 33.9% of all units sold [9]. The game industry is growing and consolidating. In November 2006 the European Games Developer Federation was officially formed by its founding members, representing over 500 leading game development studios from ten European countries [2]. Games are constantly being improved; they need to be more complex, faster and smarter, which raises a wide range of unsolved problems in nearly all fields of computer science, such as graphics, code optimization, algorithmics and artificial intelligence, the last of which interests us the most. The volume of sales and the never-ending intellectual and engineering problems bring more and more attention to the game industry from the scientific community. It is worth mentioning that whereas the commercial games industry pursues an increase in the entertainment value of games, AI research tries to push the cognitive abilities of machines to new levels [11].
In the context of machine learning, RTS games are especially challenging. Their common elements are severe time constraints and a strong demand for real-time AI which must be capable of making real-world decisions quickly and satisfactorily [12]. So far, most scientific and programming effort has been devoted to areas related to classic board games, such as chess, go or checkers. Thus, the main motivation, apart from aiding an industry worth billions of dollars, is to bring AI in games to another level, to explore new possibilities and develop new methods. As the research in this field has begun only recently, one might say that this paper tries to co-create and enhance its fundamentals.
Since computer games require methods for dealing with many different problems, the benefits of solving them are also diverse:
• Simulations are a critical aspect of military training, which has much in common with commercial computer games and can learn from their successful experience [30].
• Cooperative path-finding and efficient management of unit groups is helpful in logistics.
• Learning how to gather and distribute resources between industry, science and the military might aid micro- and macroeconomics.
• And last but not least, RTS games can be seen as a part of operational research, since they can be transformed into many different combinatorial or scheduling problems.
These are just a few examples of the many applications which may profit from the progress of AI research in RTS games.
1.3 Goals
There is one main goal of this thesis: to automatically create the best strategy for a given RTS game. We assume that the measure of AI success may be intuitively represented by the product A·I: the more artificial and intelligent the resulting strategy is, the better. However, keeping human intervention in the learning process to a minimum is not a goal in itself. Sometimes it is best to incorporate some human knowledge, according to the rule "gain much intelligence at the cost of little artificiality".
This thesis has several additional objectives:
• Present strongly typed GEP, which is an innovative method of encoding complex
structures as linear entities.
• Elaborate the methodology of modelling and learning RTS players.
• Analyze different methods of evaluating solutions in the context of distributed
learning.
• Prove experimentally that the approach is able to produce good solutions.
• Test human-competitiveness by submitting best solutions to the 2008 ORTS RTS
Game AI Competition [1].
• Present the design of the software framework used in experiments. This includes
the stGEP implementation, the way of conducting evolutionary experiments and
the method of distributing the computations.
1.4 Thesis organization
This thesis is organized as follows:
• Chapter 1 refers to objectives and motivations of this work.
• Chapter 2 presents the problem (rules of the game) and the methodology used to
solve it.
• Chapter 3 shows in detail how to model an RTT player; we focus especially on using multi-agent systems.
• Chapter 4 addresses the topic of evolving the best strategies and fine-tuning the process of learning. The main issue brought up in this chapter is the evaluation of RTT players in a distributed environment.
• Chapter 5 gives details on a way to encode a complex structure in a linear chromosome. We also introduce the innovative approach of strongly typed gene expression
programming.
• Chapter 6 describes exact representations used to construct strategies in the experiments.
• Chapter 7 is about the implemented software and the tools used and developed
within the project. We show the design of a master-slave scheme of the framework
and give some insights on how the experiments were maintained.
• Chapter 8 shows the results of the experiments; the stress is put on the iterative process of introducing changes into our approach. For each experiment we present the initial objectives and assumptions, then the analysis of the results and finally the conclusions.
• Chapter 9 is a short summary of all the work done and contains general conclusions and ideas for the future.
1.5 Used concepts
The classification of computer games is still an open topic of dispute. Thus, it is necessary
to make clear definitions of some concepts.
• Real-Time - means not turn-based. In reality it is not possible to simulate a game with continuous time. Therefore, real time is approximated by using as many turns per second as necessary, so that the human player has the illusion of time continuity.
• Real-Time Strategy (RTS) - a strategic computer game whose gameplay mechanics imply combat, resource gathering, base building and technological development, as well as abstract unit control (giving orders as opposed to controlling units directly) [26].
• Real-Time Tactics (RTT) - a tactical computer game whose gameplay mechanics imply only combat and abstract unit control. Players are expected to complete their tasks (usually defeating the enemy side) using only the combat forces provided to them. We consider RTT to be a subset of RTS games.
Also the basic term strategy has different meanings, depending on the context.
• Strategy - in game theory it is a way to play the game, a detailed plan describing
the player’s behaviour in every possible situation (game state).
The military point of view is different; let us quote Wikipedia [5]:
Military strategy is a collective name for planning a warfare. The father
of modern strategic study, Carl von Clausewitz, defined military strategy as
"the employment of battles to gain the end of war." Liddell Hart’s definition
put less emphasis on battles, defining strategy as "the art of distributing and
applying military means to fulfill the ends of policy" Hence, both gave the
preeminence to political aims over military goals, ensuring civilian control of
the military.
Military tactics (Greek: Taktikē, the art of organizing an army) are the
techniques for using weapons or military units in combination for engaging
and defeating an enemy in battle.
The borderline between strategy and tactics is blurred, and sometimes the categorization of a decision is almost a matter of personal opinion. Therefore, to be consistent with the genre classification of computer games, game theory and the military concepts, the key terms are defined as follows:
• Real-Time Tactics - used to describe the game this thesis focuses on. From a military point of view it is a simulation of a small battle with marines and tanks.
• Strategy - understood as in game theory, that is, as an algorithm to play the game. In the context of RTT and the military it is closest to military tactics. In this paper the word "player" is frequently used as a synonym for strategy. Additionally, in the context of EC the strategy may be considered as the individual, more precisely as its phenotype.
2 The problem
No battle plan ever survives contact with the enemy.
One of the Murphy’s Laws of Combat Operations.
2.1 Methodology
Machine learning of an RTT player is a very complex and difficult task. Therefore it was broken down into several steps. This decomposition into a few independent subproblems simplifies and organizes the work. It made it possible to propose a whole methodology for AI-in-games learning.
1. Make assumptions
(a) define the problem
(b) set objectives and goal
(c) review the literature for research on similar problems
2. Modelling
(a) model a general player
(b) break down the model and focus the AI effort on the most important task
(c) formulate the AI as a procedure returning a well-defined result
(d) specify the details of the model
3. Learning
(a) choose learning method
(b) specify the method of evaluation of players
(c) define the smallest indivisible computational task of the evaluation method
(d) distribute the computation
4. Encoding
(a) encode the player as an abstract entity
(b) define the expression of the player from the abstract entity
(c) define the operators for exploring encoded solutions space
5. Representation
(a) specify all the details of encoding the player - define solution space
(b) put stress on designing the domain knowledge
6. Implementation
(a) choose existing frameworks for simulations and learning or implement them
yourself
(b) implement all necessary additional software
(c) if needed design a method of distributing the computations, since game simulations may be very time consuming
7. Experiments
(a) conduct experiments
(b) draw conclusions - if needed go back to any point above, and then test again
(c) when satisfied, return the result
2.2 ORTS contest
ORTS stands for Open Real-Time Strategy. It is an open-source and hack-free programming environment for real-time strategy game simulation. Let us quote the project's site [3]:
ORTS is a programming environment for studying real-time AI problems such as pathfinding, dealing with imperfect information, scheduling,
and planning in the domain of RTS games. These games are fast-paced and
very popular. Furthermore, the current state of RTS game AI is bleak which
is mainly caused by the lack of planning and learning - areas in which humans are currently much better than machines. Therefore, RTS games make
an ideal test-bed for real-time AI research. Unfortunately, commercial RTS
games are closed software which prevents researchers from connecting remote
AI modules to them. Furthermore, commercial RTS games are based on
peer-to-peer technology - which in a nutshell runs the entire simulation on
all player machines and just hides part of the game state from the players.
By tampering with the client software it is possible to reveal the entire game
state and thereby gain an unfair advantage. We feel that this is unacceptable
for playing games on the Internet. We therefore started the ORTS project
to create a free software system that lets people and machines play fair RTS
games. The communication protocol is public and all source code and artwork is freely available. Users can connect whatever client software they like.
This is made possible by a server/client architecture in which only the currently visible parts of the game state are sent to the players. This openness
leads to new and interesting possibilities ranging from on-line tournaments
of autonomous AI players to gauge their playing strength to hybrid systems
in which human players use sophisticated GUIs which allow them to delegate
tasks to AI helper modules of increasing performance.
The ORTS RTS Game AI Competitions have been organized annually since 2006 at the University of Alberta in Canada. This year's edition was held on 1-8 August 2008. We decided to participate in the contest in order to compete with handmade AI and check the human-competitiveness of the proposed solutions; for details please see Subsection 8.5.4.
2.3 Rules of the game
2.3.1 Type of game
The game that this thesis focuses on is a two-player RTT. All rules, definitions and descriptions were taken from Game 4 ("Small-Scale Combat") of the 2008 ORTS RTS Game AI Competition [15, 13]. Since we use the Open Real-Time Strategy (ORTS) engine to simulate the game, it is necessary to explain in detail how this software works, because it effectively defines the rules of the game itself.
Two players start with 50 marines and 20 siege tanks each, randomly located within the left or right quarter of the map. Marines and tanks are located diagonally symmetric to the opponent's. Neutral units (called "sheep") roam the map randomly. The objective of the game is to destroy as many opposing units as possible in 2400 simulation frames. The games at the contest were run at a speed of 8 simulation frames per second, which corresponds to 5 minutes. During our research we ran the simulations as fast as possible, reaching approx. 40 frames per second. This means that a single game lasts at most about one minute; this is the approximation that will be used in the time cost analysis of the learning process in Section 4.4.
2.3.2 Simulation
Time in the games is measured in discrete frames of equal duration. In the ORTS
competition a pace of 8 frames per second was used. Once per frame the game server
sends individual game views to all clients (the player software) which then can specify
at most one action per game object under their control and send this vector of actions
back to the server. All received actions are then randomly shuffled and executed on the
server.
2.3.3 World representation
The world is represented by a rectangular array of 64x48 tiles. There are no terrain
features (the terrain is unobstructed) nor fog of war (there is full visibility). Objects in
the world have a shape and a position on the terrain. The position and size of objects are
represented by integers using a scale of 16 points per tile ("tile points"). Units (marines and tanks) are small circles, specified by their center and radius.
2.3.4 Movement
All units have a maximum speed with which they can move. This means that within each
simulation frame, the valid moves for an object are constrained to the integer coordinates
that are within a distance of less or equal to the speed of the object. Movement targets are
only constrained to be integers; any location on the game field is acceptable. Every move
is assumed to go in a straight line from the object’s current position to its destination.
Objects move simultaneously and their current locations are rounded to tile points before
being sent to the clients which can lead to temporary and small object overlap on the
client side. In case of collisions with other objects, the moving objects are stopped at the
collision location and no damage is inflicted. If the target location for a move command
cannot be reached in one simulation cycle the object will continue to move in a straight
line until: a new move (or stop command) is sent, the object collides with another object
or scripted game mechanics cause its motion to change. It is not possible to do several
moves in one tick in order to avoid obstacles, even if the total distance of the moves is
less than the maximum speed of the unit.
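To make the movement rule concrete, here is a minimal Python sketch of the per-frame position update implied by these rules: the unit advances in a straight line towards its target, covering at most its maximum speed per frame. The helper name and the rounding detail are illustrative assumptions, not part of the ORTS API.

import math

def step_towards(x, y, tx, ty, max_speed):
    """One simulation frame of straight-line movement towards (tx, ty),
    covering at most max_speed tile points; coordinates stay integers."""
    dx, dy = tx - x, ty - y
    dist = math.hypot(dx, dy)
    if dist <= max_speed:
        return tx, ty                      # target reachable within this frame
    scale = max_speed / dist
    # rounding keeps coordinates on the integer tile-point grid
    return x + round(dx * scale), y + round(dy * scale)

# example: a marine (speed 3) ordered from (0, 0) towards (10, 0)
print(step_towards(0, 0, 10, 0, 3))        # -> (3, 0)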
2.3.5 Fight
Units can engage in combat with other units. Marines and tanks attack from a distance.
It suffices for any part of the attacked object to be in range. Specifically, weapon range is
compared against the minimum physical distance between any part of the objects. After
attacking, the weapons are subjected to a cooldown period before they can attack again.
The cooldown time is specified in simulation frames. Objects have hit points (HPs)
indicating how much damage they can take before being destroyed. Lost HPs cannot be
regained. Units can also have armor which decreases the damage dealt by a weapon by
subtracting a constant from each attack. Damage values are uniformly distributed over
certain intervals. Units only die after a simulation frame has been completed when their
HPs have dropped below 1. This ensures that the order of executing attack actions is
irrelevant.
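A rough sketch of how these combat rules fit together (illustrative data structures, not the ORTS implementation): damage is drawn uniformly from the weapon's interval, reduced by the defender's armor, the weapon then enters its cooldown, and deaths are applied only after the whole frame has been processed.

import random

def resolve_attacks(attacks, frame_units):
    """attacks: list of (attacker, defender) pairs issued in one frame.
    Each unit is a dict with 'hp', 'armor', 'cooldown_left',
    'dmg_min', 'dmg_max' and 'cooldown'."""
    for attacker, defender in attacks:
        if attacker['cooldown_left'] > 0:
            continue                              # weapon still cooling down
        dmg = random.randint(attacker['dmg_min'], attacker['dmg_max'])
        defender['hp'] -= max(0, dmg - defender['armor'])
        attacker['cooldown_left'] = attacker['cooldown']
    # units die only after the whole frame is completed, so the order of
    # executing attack actions is irrelevant
    return [u for u in frame_units if u['hp'] >= 1]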
2.3.6 Specific rules
In the object value table below, the vision range unit is tiles, build times and cooldown periods are given in simulation frames, costs are in minerals, the speed unit is tile points per simulation frame, and object sizes are given in tile points. For detailed unit characteristics please refer to Table 2.1.
Object           HP    Speed  Size  Range   Armor  Damage  Cooldown  Switch
Tank             150   3      7     112     1      26-34   20        24
Tank (in siege)  150   0      7     32-160  1      50-60   50        24
Marine           80    3      4     64      0      5-7     8         -
Sheep            ∞     3      4     -       0      -       -         -

Tab. 2.1: Units characteristics
Marines and unsieged tanks can move and attack simultaneously. However, let us emphasise that only one action per game object can be sent to the server in each simulation frame. Therefore, in the simulation frame in which a unit shoots it is not possible to also order it to move (however, it will continue moving if ordered to do so before). Tanks can switch between sieged and unsieged modes, enabling weapons weapon2 and weapon respectively. Neither weapon is available during this transition, nor can the tank move. In siege mode tanks attack ground locations, dealing out splash damage with a radius of 15. This means that targets at the impact location receive the full damage, whereas objects at a distance receive linearly scaled-down damage up to the impact-to-hull distance of 15 tile points.
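A small sketch of the linear splash-damage falloff described above. The assumption that the damage scales down to zero exactly at 15 tile points is mine, inferred from the description.

def splash_damage(full_damage, impact_to_hull_dist, radius=15):
    """Siege-mode splash: full damage at the impact point, linearly scaled
    down with the impact-to-hull distance, nothing beyond `radius`."""
    if impact_to_hull_dist >= radius:
        return 0.0
    return full_damage * (1.0 - impact_to_hull_dist / radius)

# example: a sieged tank dealing 55 damage, target hull 5 tile points away
print(splash_damage(55, 5))                # -> 36.66...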
2.3.7 Available orders
The full list of possible orders for each unit that can be sent to the server is as follows:
• Marine
– move(x,y[,s]): start moving towards (x,y) with speed s (or max-speed)
– stop(): stops moving and attacking
– weapon.attack(obj): attack object
• Tank (weapon and weapon2 are enabled or disabled based on the current mode)
– move(x,y[,s]): start moving towards (x,y) with speed s (or max-speed)
– stop(): stops moving and attacking
– switch(): begin transition between sieged/unsieged modes (no effect if already
switching)
– weapon.attack(obj): attack object
– weapon2.attack(x,y): create explosion of radius 15 centered on (x,y), dealing
out splash-damage
2.4 State of the art
2.4.1 Evolutionary computation
Evolutionary computation is a subfield of artificial intelligence and its techniques mostly involve metaheuristic optimization algorithms. Although there are many other biologically inspired approaches related to EC, its main part is evolutionary algorithms (EA). They use mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the cost function determines the environment within which the solutions "live". Evolution of the population then takes place after the repeated application of the above operators [4].
We wish to show the history and progress of evolutionary algorithms; therefore we distinguish four main milestones (we do not mention one very popular subfield of EA, evolution strategies (ES), developed in the early '70s and used mainly for optimization in continuous spaces):
• genetic algorithms (GA) - the oldest of all EA methods, dating back to the mid '50s. The chromosome is a sequence of ones and zeros, and the fitness function is defined over the genetic representation and measures the quality of the represented solution. Therefore it may be said that in GA the phenotype is the same as the genotype and the method finds fixed solutions.
• evolutionary programming (EP) - first used by Lawrence J. Fogel in the mid '60s [24]. The chromosome is an abstract entity which needs to be expressed into the solution domain before being evaluated by the fitness function. It may be said that in EP the phenotype is not the same as the genotype and the method finds fixed solutions.
• genetic programming (GP) - the method for evolving computer programs. The first experiments with GP were reported by Smith [41] and Cramer [16]. However, it was John Koza who at the turn of the '80s and '90s propagated the idea, starting with [31], and popularized it in a series of his books. In GP a chromosome has a tree structure and does not need further expression. Therefore it may be said that in GP the phenotype is the same as the genotype and the method finds algorithms (a way of finding solutions).
• gene expression programming (GEP) - the youngest of all EA methods, created and developed by Cândida Ferreira at the beginning of the 21st century in her first papers [18, 19]. GEP is formally a subset of genetic programming and is sometimes referred to as linear GP. However, there are substantial differences: in GEP the chromosome is a linear entity which is expressed into a tree. Therefore it might be said that in GEP the phenotype is not the same as the genotype and the method finds algorithms (a way of finding solutions).
It might be argued which of the methods is the best - whether it is better to evolve solutions or entire algorithms. That obviously depends on the application and the problem. It is also impossible to say whether it is better to have different genotypes and phenotypes or not. A comparison of GP and GEP effectiveness is still a matter of open discussion, with some interesting claims by C. Ferreira that GEP may outperform GP by a factor of 100 to 60000 [23]. However, the progress from basic heuristics searching for sequences of 1s and 0s to methods finding algorithms encoded in an abstract form is unquestionable.
For more information on evolutionary computation in general, please see David E. Goldberg's book [27]. For the Polish reader we recommend the book by Jarosław Arabas [8].
2.4.2 Multi-agent systems
The terms in the field of multi-agent systems are not well defined. Generally speaking, an agent might be considered a computational mechanism that exhibits a high degree of autonomy, performing actions in its environment based on information (sensors, feedback) received from the environment [35]. Multi-agent means there is more than one agent in the environment. MAS can be used to solve problems which are difficult or impossible for an individual agent or a monolithic system to solve. Examples of problems appropriate for multi-agent systems research include online trading, disaster response, and modelling social structures [6].
Multi-agent learning (MAL) means applying methods of machine learning in order to develop a multi-agent system that will solve a given problem or behave in a desired way. The two most often used methods of MAL are:
• Reinforcement learning - useful in domains where reinforcement information is provided after a sequence of actions performed in the environment [35]. The agent is then rewarded or punished. This cannot be applied to RTT player learning, since we do not know whether the agent's behaviour is desired until the game ends.
• Stochastic search - methods such as evolutionary computation, simulated annealing,
stochastic hill-climbing, etc.
The simplest approach in MAL is so-called team learning. It uses a single learning process to discover a behaviour for the entire team. Because there is only one learner, standard algorithms from single-agent machine learning may be used. Moreover, this method focuses on the performance of the entire team and not only of single individuals. All this makes team learning a perfect choice for finding good game strategies. There are three different types of team learning:
• Homogeneous learning - only a single behaviour algorithm is created and each agent acts according to it. This simplifies the process of learning, since all the agents have identical behaviour and the search space is drastically reduced. However, sometimes the problem requires agent specialization; in that case homogeneous learning is not advised.
• Heterogeneous learning - separate behaviour algorithms are created for each agent, thus this approach is ideal for problems requiring the emergence of specialists. However, the level of complexity is much higher than in homogeneous learning.
• Hybrid learning - separate behaviour algorithms are created for each type of agent. This method combines the best aspects of the two previous ones, keeping the search space at a reasonable size.
2.4.3 Machine learning in games
Games have always been used as a sort of testbed for AI methods. Two-player board games like checkers or chess have been extensively researched, and it is believed that machine learning has matured enough to challenge much more difficult games, particularly RTT games. Generally speaking, the main idea is based on learning from simulation. The method itself is not new, since it was first proposed for classic turn-based games with imperfect information and/or random components [25], for example bridge or backgammon. In these cases the conventional search for the best move in a given game state uses a technique which evaluates positions by playing a multitude of games from this starting position against itself. However, in the case of RTT games, learning from simulation is used to discover entire strategies to play the game, since their complexity makes evaluating single moves impossible.
Let us present a quick description of some positions from the literature; we will focus mainly on using EC to learn to play a game:
• In [10] Azaria and Sipper apply genetic programming to the evolution of strategies
for playing the game of backgammon. They explore two different strategies of
learning: using a fixed external opponent as teacher, and letting the individuals
play against each other. The conclusions were that the second approach is better
and leads to excellent results.
• In [28] Hauptman and Sipper successfully use genetic programming in the evolution of strategies for playing chess endgames, achieving competitiveness with human-based strategies. The work was continued in [29], where an entire chess player was evolved.
• In [40] Sharabi and Sipper took the AI in games to another level by evolving a
control system for real-world, sumo-fighting robots.
• In [17] Crawford-Marks tries to evolve team players for quidditch, a complex 3-dimensional game taken from the popular Harry Potter books [39]. However, he concludes that on the evolutionary front the first attempt at evolving quidditch playing was not as successful as had been hoped for.
Feature                 Commercial RTS games    ORTS
Cost                    ~US$ 55                 US$ 0
License                 closed software         free software (GPL)
Game specification      fixed                   user-definable
Network mode            peer-to-peer            server-client
Prone to hacks          yes                     no
Communication protocol  veiled                  open
Network data rate       medium                  low to medium
Unit control            high-level, sequential  low-level, parallel
Game interface          fixed GUI               user-definable

Tab. 2.2: How ORTS relates to commercial games
2.4.4 Commercial RTS games
A rich set of RTS games is available on the market nowadays. The author himself remembers spending many hours playing games such as Warcraft II™, Starcraft™, Age of Empires™ or Seven Kingdoms™. These games are fast-paced war simulations and have become very popular in the recent past, mainly thanks to the multiplayer option. Single-player campaigns may be very interesting, but after a relatively short time they stop being a real challenge, and if a human player finishes the game it is mainly thanks to the narrative story embedded in the campaign. The way the AI is designed in commercial RTS games is known only to the developers. However, it is not a secret that "the intelligence" is hand-designed. The AI programmers first learn to play the game in order to understand its rules and to design good strategies, which are then scripted and hardcoded into the game. The iterative process of tests and improvements results in a static and definitive computer player. Some developers do boast about innovative AI that learns even during play, but so far practice shows that they are no match for a human player.
Commercial RTS game software is closed and not expandable. This prevents researchers and hobbyists from tailoring RTS games to their needs and from connecting remote AI modules in order to gauge their playing strength [14]. It effectively hinders real progress of AI in RTS. To counteract this, M. Buro and co-workers implemented the previously mentioned ORTS framework. There are several advantages of using it; please refer to Table 2.2, taken from [14], for a comparison between commercial games and ORTS. Figures 2.1 and 2.2 show two exemplary screenshots downloaded from [3].
Fig. 2.1: ORTS screenshot, the framework may use 2D graphics
Fig. 2.2: ORTS screenshot, the framework also has high-quality 3D graphics
3 Modelling
Teamwork is essential; it gives the enemy other people to shoot at.
One of the Murphy’s Laws of Combat Operations.
3.1 Objectives
This chapter is about modelling a player in an RTT game. We try to show the top-down nature of the design process. In addition to what was said in Section 1.5, we precisely define the strategy as a simple, interpretable procedure for (successfully) playing a game or subgame [25]. The requirements for this procedure are:
• it must be easy to use in the playing application (a client for the ORTS server)
• it must be easy to represent as an abstract entity to apply the methods of machine
learning
• it must be possible to automatically and effectively learn it
3.2 Initial player model
3.2.1 Human-based player
The most intuitive model of a strategy in an RTT game is an all-knowing player which is the single, central decision maker. This approach is similar to human behaviour, which might be summarized in the following rule.
As frequently as possible and/or in reaction to game events, use all information available to give orders to selected units under your command.
One may say that such a player is a sort of "overmind" who knows as much as possible about the state of the game and directs all the units. Let us rephrase the rule in the language of pseudo-code and in the context of the client-server nature of the ORTS engine (Algorithm 3.1).
struct Decision
  Order order;
  Unit unit;
end;

procedure player
begin
  repeat
    State state = get_game_state_from_server();
    Decision d = make_decision(state);
    send_order_to_server(d.unit, d.order);
  until game over
end;

Algorithm 3.1: Human-like RTT Player
There are two main issues regarding this approach:
• Total freedom in making decisions. The AI must be capable not only of giving the best orders to units, but also of choosing which unit to command and when! This greatly enlarges the solution space and makes it practically impossible to learn an AI that will play the game.
• Scalability. The decisions must be adequate both at the beginning of the game, when entire armies are still present, and at the end, when probably only a few units are left alive. Of course, there are countless transitional situations between those two. A human player easily overcomes these challenges; a machine does not.
3.2.2 MAS-based player
Designing a strategy that plays like a human, that constantly and actively chooses and makes decisions both in the context of units and orders, is a task worth a separate thesis and research in itself. Thus, there is a great need for simplification, and the main idea is to let the units decide for themselves. This leads to a transformation from a situation where there is only one decision maker to a multi-agent system model. From now on, each unit chooses the best action to take by itself (based on the game state).
procedure player
begin
  foreach simulation frame do
    State state = get_game_state_from_server();
    foreach friendly unit do
      Order o = unit.choose_order(state);
      if (NULL != o) then
        send_order_to_server(unit, o);
    done
  done
end

Algorithm 3.2: RTT player based on a MAS model
The code in Algorithm 3.2 looks intuitive and simple. Let us underline that all the AI is hidden in the function choose_order. There are, however, a few severe drawbacks left.
• The orders are made based on the game state, which means that all friendly and enemy units, their positions and current parameters are taken into account. But when considering one selected unit, a large part of this information is irrelevant, because orders have a local character. Knowing a priori which information to select could greatly aid the AI.
• On the other hand, the game state alone is not sufficient - there is a need for additional information derived from it. For example, the number of still-living enemy units is not given explicitly. Expecting the AI to learn, apart from playing, how and what knowledge to extract from a game state heavily enlarges the solution space.
3.3 Minimizing AI task
3.3.1 Domain knowledge and memory
Simply put, it is too much to ask of a machine to simultaneously learn how to acquire information and how to use it to make wise decisions in a noisy environment. In the case of RTT, the game state can be considered as raw data from which further information must be retrieved. In addition, when using MAS the information must be gathered in the context of an agent (unit). Listing 3.3 shows the necessary changes to the algorithm. At this point we will not plunge into the details of how exactly the domain knowledge is extracted; this will be covered in Chapter 6. However, please notice that the algorithm has been enriched with the concept of memory. For example, it might be desirable to know whether a unit was shot in the last simulation frame or how the unit moved previously. For simplicity, only the current and the last game state are remembered.
procedure player
begin
  State previous = NULL;
  State current = NULL;
  foreach simulation frame do
    previous = current;
    current = get_game_state_from_server();
    foreach friendly unit do
      Information info = gather_information(previous, current, unit);
      Order o = unit.choose_order(info);
      if (NULL != o) then
        send_order_to_server(unit, o);
    done
  done
end

Algorithm 3.3: RTT Player with domain knowledge extraction and memory
3.3.2 Focusing the AI task on units movement
The still unresolved problem is that it must be feasible to learn the function choose_order by conducting evolutionary experiments. After several brain cycles we decided to break down choose_order into separate procedures, thus simplifying the task set before the AI. In this process several facts about the given RTT game and the ORTS engine are helpful:
• Most of the time, units move. At best, marines can shoot every 8 simulation frames and tanks every 20. This means that units spend approx. 90% of their time moving.
• It might be assumed that units should shoot as frequently as they can.
• If a unit moves intelligently (and that is the goal), we can assume that it is sufficient
to always shoot the closest enemy. This makes the process of learning easier and
more stable - a unit must only learn how to move, being certain it will always shoot
the nearest enemy.
• It is desirable to always move at maximum speed, because most often the goal is to reach the destination as soon as possible.
Thus, the entire unit behaviour can be brought down to Algorithm 3.4:
function unit.choose_order(info)
begin
  Order o = NULL;
  Target t = NULL;
  if (self.can_shoot) then
    t = closest_target_in_range(info, self);
  end
  if (NULL != t) then
    o = SHOOT(t);
  else
    Vector v = self.choose_move_vector(info);
    if (self.type == TANK and v == (0,0)) then
      o = SWITCH();
    else
      o = MOVE(v, self.max_speed);
    end
  end
  return o;
end

Algorithm 3.4: Ordering a unit
All the AI effort is now put into finding the best movement vector (in the case of tanks the no-movement vector (0,0) is treated as an order to switch into siege mode) in the choose_move_vector routine. This way the problem of learning a strategy has been successfully simplified and narrowed down in such a manner that it seems possible to encode and learn a player. Algorithms 3.3 and 3.4 are the final ones.
3.4 Multi-agent system
3.4.1 Hybrid MAS
In Subsection 2.4.2 three types of multi-agent systems were distinguished: homogeneous, heterogeneous and hybrid, each of them having certain advantages and drawbacks. For RTT player modelling the hybrid MAS will be used. The reasons for that are quite straightforward and actually relate to why not to use any other type of MAS.
• The tank and the marine are very different types of units and they cannot be modelled by the same agent. Therefore a homogeneous MAS - assuming the use of only one agent to model all the units in the environment - is unsuitable.
• A heterogeneous MAS assumes there is a different agent for every entity in the model, which means that for the RTT strategy we would have 70 independent agents (50 marines and 20 tanks). This is unacceptable for two reasons. Firstly, the computational effort of maintaining such large players, as well as the memory cost, is too high. Secondly, simultaneously learning proper behaviours for 70 agents enlarges the search space unacceptably.
The only reasonable choice is to use a hybrid MAS with two agents, the first representing the marine type and the second the tank.
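A minimal sketch of what the hybrid player boils down to: two behaviours (agents), one shared by all marines and one shared by all tanks. Class and method names are illustrative only, not taken from the thesis framework.

class HybridPlayer:
    """Hybrid MAS player: one agent (behaviour) per unit type, shared by
    all units of that type."""

    def __init__(self, marine_agent, tank_agent):
        self.agents = {"MARINE": marine_agent, "TANK": tank_agent}

    def orders(self, units_with_info):
        """units_with_info: list of (unit, info) pairs for friendly units."""
        return [(unit, self.agents[unit.type].choose_order(info))
                for unit, info in units_with_info]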
3.4.2 Agent
In each simulation frame the agent uses the information retrieved from the game state to determine a vector indicating where to move. There are many ways of doing so; for example, one can imagine applying a list of rules, using neural networks or other algebraic expressions. Keeping in mind that individuals in GP (and in GEP, which can, however, also represent other structures, for example neural networks [22]) are expressed as trees, it was decided that trees are the most natural representation. To be consistent with EC terminology, they will be called expression trees in this thesis. In fact they are algebraic formulas that evaluate to a vector (a tuple of two real numbers).
Kind      Notation        Description
Function  V = ADD(V, V)   addition of two vectors
Function  V = MUL(S, V)   multiplication of a vector by a scalar
Function  V = NOR(V)      normalization of a vector
Terminal  EAST = (1,0)    -
Terminal  NORTH = (0,1)   -

Tab. 3.1: Exemplary set of operators and terminals
Expression                          Description
ADD(NORTH, EAST)                    describes a north-east vector of value (1,1)
MUL(-100, ADD(NORTH, EAST))         describes a south-west vector of value (-100, -100)
NOR(MUL(-100, ADD(NORTH, EAST)))    describes a south-west vector of value (-1, -1)

Tab. 3.2: Exemplary expressions
For example, assume a simple representation as presented in Table 3.1. Based on this list it is possible to create many different expressions; see Table 3.2 for examples. The agent is exactly such an expression, only the actually used terminals and operators are different and more complex. It might be said that the expression tree implements the choose_move_vector routine in Algorithm 3.4.
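To make the idea concrete, the sketch below evaluates the example expressions from Tables 3.1 and 3.2 in Python. Note that the normalization used here (dividing by the largest absolute component) is an assumption inferred from the example result (-1, -1) in Table 3.2; the thesis may define NOR differently.

# Terminals are 2D vectors, functions combine them into new vectors.
EAST = (1.0, 0.0)
NORTH = (0.0, 1.0)

def ADD(a, b):                 # addition of two vectors
    return (a[0] + b[0], a[1] + b[1])

def MUL(s, v):                 # multiplication of a vector by a scalar
    return (s * v[0], s * v[1])

def NOR(v):                    # normalization (assumed: by the largest component)
    m = max(abs(v[0]), abs(v[1]))
    return v if m == 0 else (v[0] / m, v[1] / m)

# The expressions from Table 3.2, written as nested calls:
print(ADD(NORTH, EAST))                       # (1.0, 1.0)
print(MUL(-100, ADD(NORTH, EAST)))            # (-100.0, -100.0)
print(NOR(MUL(-100, ADD(NORTH, EAST))))       # (-1.0, -1.0)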
3.4.3 Vector representation
Vectors are represented as an ordered pair of two real numbers (a point in two-dimensional space). It is important to emphasise that the vectors are bound (not free, as the representation could suggest): they are bound to the currently operating agent (unit). Imagine having two units, the first at position (1,2) and the second at position (3,1), and a move vector equal to (2,1). In the first case, the move vector is bound to the position (1,2) and thus results in a move to the point (3,3). In the second case the move vector is bound to the position (3,1) and thus results in the destination (5,2).
Perhaps the above remark is intuitive and obvious. However, in the context of retrieving domain knowledge it is important to remember it. In Chapter 6 we propose acquiring the domain knowledge from a game state, for instance a vector to the geometric center of an enemy group. It is necessary to recalculate this vector for every unit, since from an agent's point of view all vectors are bound to its current position.
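In code the distinction is simply that the evaluated move vector is added to the position of the unit that evaluated it, as in this tiny sketch reproducing the example above:

def destination(unit_position, move_vector):
    """A move vector is bound to the unit that evaluates it."""
    return (unit_position[0] + move_vector[0],
            unit_position[1] + move_vector[1])

print(destination((1, 2), (2, 1)))   # -> (3, 3)
print(destination((3, 1), (2, 1)))   # -> (5, 2)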
4 Learning
Field experience is something you don’t get until just after you need it.
One of the Murphy’s Laws of Combat Operations.
4.1 Objectives
Knowing what the machine is going to learn, it is necessary to define how. This section is about determining the details of the learning process. Gene expression programming was chosen, therefore the necessary components of the learning process are:
• a definition of the genotype space
• a definition of the phenotype space
• a transformation from the genotype space into the phenotype space (in EC called expression)
• neighbourhood operators (in EC called genetic or breed operators) which allow searching through the genotype space (moving from one solution to another)
• evaluation and selection of the phenotypes, which guide the process of searching (in a way, choose which solution to move to)
• a search loop which combines all of the above in order to find the optimum
In this Chapter we focus on defining a general algorithm for the evolutionary learning
process and on the methods of evaluation (also in the context of distributed computing).
4.2 Algorithm
4.2.1 Search loop
Listing 4.1 shows the basic scheme of the search loop. The algorithm is similar to any other evolutionary technique. The details of the methods and approaches used in this thesis - such as GEP, MAS, co-evolution, etc. - are hidden inside the evaluate, select and breed procedures.
procedure learn_best_player()
begin
  Individual[] population;
  Individual best_ind;
  init( population );
  repeat
    evaluate( population );
    best_ind = get_best( population );
    Individual[] parents = select( population );
    population = breed( parents );
  until stop_conditions
  return best_ind;
end

Algorithm 4.1: Evolutionary search loop
• evaluate - consists of two steps. The first is the expression of abstract, linear genomes (genotypes) into RTT players (phenotypes), which is explained in Section 5.2, covering the encoding of strategies. The second step is finding the fitnesses of all individuals based on their expressed phenotypes. This is the main topic this section focuses on.
• select - the procedure entirely abstracts from the definition of the problem, since it works in the domain of fitnesses (real numbers); for details please see the next paragraph.
• breed - relates to the neighbourhood operators which allow searching the genotype space. This is covered in Section 5.2.
4.2.2 Selection and elitism
All the approaches, ideas and methods used in this thesis already create many degrees of freedom. It was not desirable to introduce more of them. And since selection works in the abstract domain of fitnesses, there was no need to tailor it to RTT player learning. We took advantage of this, and in all experiments we used the same selection method with the same parameters. There are several very popular ones; we have chosen tournament selection with the tournament size set to 2. Additionally, we always use elitism with the size of the elite group set to 1. The reason for doing so is simple - we believe that manipulating those settings is of peripheral importance for improving the RTT strategy learning process. Furthermore, we wished to observe the influence of other factors, such as changing the evaluation method and the representation. Therefore the selection and elitism, once set, remained unchanged during all experiments.
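A minimal sketch of this selection scheme; the fitness function and the breed operator are placeholders for the evaluation and genetic operators described in this and the following chapter.

import random

def tournament_selection(population, fitness, size=2):
    """Pick one parent per slot: draw `size` individuals at random and keep
    the fitter one (tournament of size 2, as used in all experiments)."""
    return [max(random.sample(population, size), key=fitness)
            for _ in range(len(population))]

def next_generation(population, fitness, breed):
    """Elitism of size 1: the single best individual survives unchanged."""
    elite = max(population, key=fitness)
    offspring = breed(tournament_selection(population, fitness))
    return [elite] + offspring[:len(population) - 1]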
4.2.3 Co-evolution
The first question that needs answering is how to evaluate a single strategy. After all, in a two-player RTT game it is always two strategies that play together, acting as reference points for each other. It is not possible to evaluate a single player in abstraction from its current opponent. Let us emphasise that we wish to evaluate players without measuring their absolute performance. It is irrelevant how a strategy played in different games; all that matters is the "better than" relation within the compared set of strategies. This leads to the intuitive conclusion that in order to evaluate the entire population a sequence of games must be played, with each game increasing the fitness of the winner. In the context of evolutionary learning, two possible approaches can be distinguished.
In the first one, the reference point(s) are defined manually. For RTT game learning, this means designing the strategies against which the automatically found ones will play and be evaluated. This is a typical case of evolution, because the fitness of an individual is independent of the rest of the population. However, this approach seems unsuitable, because the evolution would be strongly biased by the choice of those teacher strategies. And if one wishes to counteract this, it is necessary to design a huge number of diverse reference players. But - ironically - at this point the researcher already has an entire set of solutions, thus further RTT strategy learning becomes pointless.
In the second approach, instead of using external reference players, pairs of strategies from the same generation are played against each other. This is known as the co-evolutionary approach, because the population itself is used to find the so-called competitive fitnesses (from now on, when using the term fitness, we assume it to be competitive, i.e. subjective) of the individuals it contains. The dynamics of co-evolutionary learning is still a subject of much research [32, 33]. There are several interesting methods proposed in the literature, and as the most promising we chose to test the single elimination tournament (SET) and various versions of the hall of fame (HoF). It must be underlined that co-evolutionary learning from simulation is difficult, since a given strategy may be evaluated as very good in reference to one opponent and as very poor in reference to another. Furthermore, this process is extremely noisy and the winner of the same game may vary in different runs. The combination of these two phenomena results in instability of learning. Both in SET and HoF we try to neutralize this effect to some extent. However, it is out of the scope of this thesis to analyze and measure the potential instability itself, as well as the methods used to prevent it.
4.3
Evaluation
4.3.1
Single Elimination Tournament
Basic SET
In [7] Angeline and Pollack introduce the idea of using tournament fitness to evaluate
individuals. Let us cite the description of Single Elimination Tournament (SET):
Initially, the entire population is in the tournament. Two members are
selected at random to compete against each other with only the winner of
the competition progressing to the next level of the tournament. Once all
the first level competitions are completed, the winners are randomly paired
to determine the next level winners. The tournament continues until a single
winner remains The fitness of a member of the population is its height in the
playoff tree, the player at the top is then the best player of the generation.
The method is very intuitive and easy to implement when population size is a power of
two. The example of paring the individuals in SET is shown of Figure 4.1.
Half of the populaiton has the minimal fitness of value 0 and only one individual has
the maximum fitness of value "log n#, where n is the size of the population. The total
number of fights in a single generation is equal to:
3 from
now when using the term fitness we assume it to be competitive (subjective)
29
4.3 Evaluation
Fig. 4.1: SET pairing
!log n" "
!
i=1
n#
=n−1
2i
Dealing with noise - SET with games repeats
The SET works perfectly if the competition is non-noisy and fulfills the strong transivity
assumption: that if player A beats player B, and player B beats player C, then player
A must beat player C. Then the best player always win the tournament. Without this
assumption, Single-Elimination Tournament’s real dynamics can be murky [34]. If the
environment is suitably complex and an optimal strategy is not in the population, it is
possible for even a poor strategy to win the tournament in a particular generation [7].
In order to minimize the noise influence - to be certain that truly a better player is
a winner - we repeat the same game k times4 , where k is an odd number. The strategy
that wins the majority of repeated games is evaluated as being better than the opponent.
This changes the number of played games to k(n − 1). It does not however change the
fitness values set to individuals. Still, the fitness ranges from 0 (for half of population)
up to "log n# (for the only one all-winner).
Memory - Cycling in SET
SET has no memory of the process of learning. This is a very undesired feature in the
context of strategy learning, because it might cause the cycling effect. It is usually seen
in co-evolution of two or more subpopulations, however is also possible in a case of just
one. Imagine there exist three5 strategies A, B and C violating the strong transivity
assumption (A wins with B, B wins with C and C wins with A) and that one of them
is discovered at some moment of evolution. If the learning process has no memory of
what players were created in the past it is probable the evolution will start to cycle. For
example, once discovered strategy A will evolve into a better playing strategy B, it will
then evolve into a better playing strategy C, which in the end will evolve into a better
playing strategy A. The evolution process cycles: A → B → C → A → ..., thus the
learning of new players is practically stopped. This behaviour was actually observed in
the first experiment, for details please see Section 8.3.
4 with
5 the
different seeds for the pseudo-random numbers generator
minimal length of the cycle is 3, however it might be longer
30
4.3 Evaluation
4.3.2 Hall of Fame
Basic HoF
The main idea in “hall of fame” family techniques is to incorporate memory into process
of learning. Individuals in the population are evaluated against the good individuals
discovered so far in the evolutionary run [34]. The method is very simple to implement,
see Listing 4.2.
procedure evaluate(Individual[] population, Individual[] HoF, int k)
begin
foreach individual in population do
begin
individual.fitness = 0;
end
Individual [] teachers = choose_randomly_k_individuals(HoF, k);
express_phenotypes(population);
express_phenotypes(teachers);
foreach individual in population do
begin
foreach teacher in teachers do
begin
Individual winner = play_game(individual.phenotype, teacher.phenotype);
if (winner == individual) then
individual.fitness += 1;
end
end
Individual best = choose_best(population);
HoF.add(best);
end
Algorithm 4.2: Evaluation in simple HoF
For clarity, it is necessary to highlight the difference between:
• HoF size - a parameter saying against how many strategies from HoF each individual must fight (in a single generation). In the Listing 4.2 it is named k - to
emphasize its similar function to number of game repetitions in SET (hence the
same letter symbol).
• And HoF length - a length of HoF array (how many strategies are in HoF already).
The competitive fitness of the i-th individual equals:
fi =
k
!
vi (teacherp )
p=1
Where the vi (j)6 is a binary function that results in:
• 1 if the i-th individual from the population defeated the j-th individual from the
hall of fame array,
• and 0 otherwise;
6v
stands for victory
31
4.3 Evaluation
The variable teacherp is the p-th random individual from hall of fame array (altogether k
random teachers are selected). Obviously the number of games played at each generation
is equal to kn.
Maintaining the hall
It is necessary to properly maintain the array of best strategies. There are two main
issues regarding that:
• HoF initialization - when the learning begins the array of best players found so far is
obviously empty. The evaluation cannot take place unless there are some strategies
in HoF. This could be resolved by adding to HoF array few random individuals
at the beginning. However, most of them would probably play very poorly and
only introduce more noise to the evolution process. Thus in our solution it was
decided to introduce manual teacher strategies. This approach takes advantage
over supervised learning and solves the problem of initialization the HoF array.
The fitness function changes respectively:
fi = wmanual
kmanual
!
vi (teacherpmanual ) + wlearned
p=1
klearned
!
vi (teacherplearned )
p=1
The weights wmanual and wlearned allow giving different importance to reference
strategies depending on if they were learned or manually set. Therefore it is possible
to find a sort of tradeoff between supervised and unsupervised learning.
• Unique HoF - as the result of using the elitism the same strategy could be evaluated
as the best in more than one generation. This results in reoccurance of the same
player in HoF array, which could lead to biasing the evolution7 . The solution is
very simple - do not add the individual into HoF array if there is already another
one with the same expression tree. Thus hall of fame acts actually as set (in strict
mathematical meaning) of phenotypes.
Competitive fitness sharing
In case of the co-evolution the stagnation might be a great problem. The populations
are used to evaluate themselves, so at some point the evolution may be “content” with
already found individuals, even if they are far from being good RTT players (from human
point of view). In a way initializing the HoF with manual strategies tries to counteract
this. But it is not enough. There is a need to enhence the selection pressure during entire
evolution and thus force the finding of new, original individuals that can successfully play
the RTT game. To achieve that, we follow the idea of the competitive fitness sharing8
[37, 38]:
fi =
k
!
p=1
1
v
(teacher
p) + 1
j=1 j
vi (teacherp ) · $n
The fitness used in a simple HoF was extended by a scaling factor. This promotes
individuals that are one-of-a-kind and win with strategies rarely defeated by others. The
analysis of two extreme cases will help to understand this property.
7 if
a strategy is over-represented in HoF the evolution will focus on beating this frequent player
clarity we show the competitive HoF without the manually designed set of teacher strategies
8 for
32
4.3 Evaluation
• Every strategy defeats entire HoF:
fevery =
k
!
p=1
k
1
=
n
+
1
1
+
1
j=1
1 · $n
• One strategy defeats entire HoF, while rest of the population loses all the games:
fone =
k
!
p=1
1 · $one−1
j=1
1
$n
0+1+
j=one+1 0 + 1
=
k
2
frest = 0
The situation when entire population wins with every player from HoF is very unlikely,
thus winning with unbeatable reference strategies is rewarded. We believe this feature
makes from competitive HoF a good choice for RTT player learning.
4.4 Distributed learning
4.4.1
Assumptions
In evolutionary learning it is the evaluation of individuals that takes most of the time.
It might be even said that all of the other procedures (like selection and breeding) are
insignificantly fast. Furthermore the evolutionary learning is an iterative process where
next generation is computed from the previous one. All this leads us to the masterslave scheme of computation where one host is distinguished as a master and runs the
search loop dispatching the evaluation of individuals to many slaves (please see Chapter
7 for more details). The question arises - how to split the evaluation procedure into
independent tasks? Luckily, the nature of the RTT game learning helps us out and
the answer is very simple - the smallest indivisible task is a single game between two
strategies.
We assume the game lasts on average for t seconds and there are m machines available.
The RTT game defined in Section 2.3 is said to last for maximum 1 minute and from
previous section we know there are at each generation about nk games to play. For
example, if n=1024 and k=5 that gives 3072 minutes, which is more than two days to
evaluate just one generation! If one wishes for example to run over 150 generations it is
necessary to wait for about a year for the results! This makes computational experiments
in the field of RTT playing almost impossible to be conducted. To deal with this problem
two techniques are proposed:
• game speed up, see Subsection 2.3.1
• and distribution of computation.
We assume there are no errors while evaluating players and all games end successfully.
Obviously in the real life errors happen and the software must handle them properly, see
Chapter 7. However, in the theoretical analysis those errors are neglectable.
4.4.2
Distributed SET
Evaluation of each generation in SET is not easy to be distributed among many hosts,
because it consists of successive steps9 that must be performed only after completing
9 fights
on consecutive levels of tournament’s tree
33
4.4 Distributed learning
the proceeding ones. Let assume there are m machines performing the computation and
each game last for approx. the same amount of time - t. Then the time of processing
one generation is equal to:
!log2 n"
!
i=1
t·
%
k 2ni
m
&
=t·
!log2 n" %
!
i=1
kn
2i m
&
For example, if n =128, m = 64, k = 5 and t=60 seconds then the time of processing
just one generation equals:
% 5·128 &
$7 ' (
$7
2i
= 60 · i=1 10
i=1 60 ·
64
2i =
60 · (5 + 3 + 2 + 1 + 1 + 1 + 1) =
60 · 14 = 840 [s]
And in case of having just one host (m = 1):
% 5·128 &
$7
$7 ' (
2i
= 60 · i=1 384
=
i=1 60 ·
1
2i
60 · (192 + 96 + 48 + 24 + 12 + 16 + 3) =
60 · 381 = 22860 [s]
If computation are distributed on 64 machines the evaluation of one generation takes
14 minutes to complete. It is an enormous improvement in comparison to more than
6 hours on one host. However it is still long, hence typically it would be wished to
use as much CPUs as possible. Bigger the problem is, more CPUs we want and other
way round - more CPUs we have, the bigger problem might be solved. Let us see what
happens in case the number of machines is proportional to the number of individuals
in the generation - assume having m=n machines. There are k · (n − 1) fights for each
generation. In an optimistic (desired) scenario, when all the fights are independent from
each other, the evaluation of one generation takes k steps. We already know it is not the
case of SET. The question is how much CPU power is wasted. Assumptions:
1. m=n
2. k < n
3. n is a power of two
4. k is a positive odd number.
First lets analyze the optimistic scenario, where all fights are independent, the expected
time of the computation is:
%
&
k · (n − 1)
t·
≈ kt
m
In SET the evaluation of a generation lasts for:
(
$log n '
$log n ' (
t · )i=12 2kn
= t · i=12 2ki *=
im
$$log2 k% ' k ( $log2 n
=t·
i=1
i=!log k" 1 =
2i +
2
= t · [(k + 'log2 k( − 1) + (log2 n − "log2 k#)] =
= t · [(k + log2 n − 1) + ('log2 k( − "log2 k#)] =
= t · (k + log2 n − 2) = kt + t · log2 n4
Finally, the utilization of computational power could be measured as the ratio of the
two above values:
34
4.4 Distributed learning
kt
kt + t · log2
n
4
=
1
1 + log2
1
k
n
4
=
1
+
1 + log2 k n4
Table 4.1 shows the CPU utilization ratio for typical values of k and n (once again under the assumptions the number of machines is equal to the number of individuals). It
clearly shows that much computational power is wasted. Therefore when using SET it is
recommended to use population size at least few times bigger than number of machines
available. However with constantly growing in size computational clusters, with new
tendency of multicore CPUs and already having hundreds of cores GPUs this guideline
may be harder and harder to follow. The authors believe that evaluation methods easier
to distribute uniformly will be in favor in nearby future. And SET is not one of them.
n
k
1
2
3
5
9
32
64
128
256
512
1024
0,17
0,38
0,50
0,58
0,64
0,14
0,33
0,45
0,54
0,6
0,13
0,30
0,42
0,50
0,56
0,11
0,27
0,38
0,47
0,53
0,10
0,25
0,36
0,44
0,50
0,09
0,23
0,33
0,41
0,47
Tab. 4.1: CPUs utilization ratio in SET in case m=n
4.4.3
Distributed HoF
In HoF all fights are independent, hence this evaluation method is easily distributed
among many hosts and there are no scalability issues as in SET. As said previously,
there are kn fights for each generation, thus the time of evaluating single one is equal to:
%
&
k·n
t·
m
Having m=kn machines will ideally distribute the computations. We know that a typical use-case of SET (assuming n=128, k=3 and m=64) leads to evaluation of a single
generation lasting 14 minutes. For HoF the k coefficient means the size of the reference
strategies array and therefore it is usually bigger. Let assume k=20 and the same number of machines m=64. The question arises - what is the size of the population, so the
evaluation of single generation would last for comparable time? The answer is a maximal
whole number n fulfilling the relation:
' (
t · kn
m = time
"0.3125n# = 10
n = 32
It means that in typical case HoF must use few times smaller population than SET in
order to achieve similar computational time. On the other hand there are no scalability
issues regarding HoF and it is safe to say that mostly often10 the CPU utilization level
equals 1.
10 For example, if n=32 and k=20 more than 640 CPUs are required for the utilization level to drop
below the 100%
5
Encoding
The side with the simplest uniforms wins.
One of the Murphy’s Laws of Combat Operations.
5.1
Objectives
We wish to use gene expression programming, thus encoding the player involves dealing
with two issues:
• the strategy must be represented as a linear genome,
• the genetic operators must be designed.
Luckily, the second part has already been dealt with in [21] and requires few changes. On
the other hand, encoding strongly typed trees as a linear chromosome is quite a challenge.
In this section we show an innovative method of encoding complex tree structures in a
linear chromosome. Further on we assume the reader knows the basic concepts of gene
expression programming. The list of all terminals and operators used to encode the
player will be called representation.
5.2 Strongly typed GEP
5.2.1
Encoding simple structures
From [20] it is known how to encode algebraic expressions in GEP. For example, let
assume the following:
• functions = {/. +, *}
• terminals = {a, b, c}
• all terminals are of the same type, functions take two arguments, and return one
of the same type.
36
5.2 Strongly typed GEP
Fig. 5.1: Example of GEP chromosome and corresponding expression tree
The crucial is the last assumption about the types concordance, because it allows to
encode any expression from this representation. An example is presented on Figure 5.1.
The encoding is not that simple in case of many different types of terminals. In
presented example, if the terminal ’c’ was of a different type that required by the multiplication function than the expressed tree would be invalid. The similar problem has
already been encountered in Genetic Programming (see for example [36]) and in general
there are two ways of handling this situation:
• create invalid trees and then repair them (for example. by pruning),
• custom the process of translation so only valid trees are created.
Choosing one of them is a matter of personal opinion and problem’s characteristics.
5.2.2
Closure of the representation
This thesis proposes an elegant solution which we called expression by two-phase translation. The objective is to create only valid trees without putting any severe restrictions
on the representation. The only assumption is: in the representation there must exist
functions for all types and for all arities, as well as terminals for all types. This feature
we called closure. A simple example will clarify this requirement - imagine having two
types A and B and three functions already defined1 :
• A = ADD(A, A) a two-arity function returning the A type.
• B = NOT(B) a one-arity function returning the B type.
• A = NOT(A) a one-arity function returning the A type.
This representation does not fulfill the requirement. If there is already some two-arity
function there must exist two-arity functions for every type. Obviously the two-arity
function of type B is missing, adding for example B=ADD(A,B) will solve the problem
and close the representation. Please notice that it is irrelevant what types the arguments
of the functions are, all that matters is the arity and the return type.
5.2.3
Expression by two-phase translation
The main idea is to separate the expression tree’s structure from actual functions and
terminals used in its nodes. First the structure of an expression tree is created, than thestill-empty nodes are filled with concrete functions and terminals, thus the translation
1 if
a function is called to be of some type it means that it evaluates to that type (returns it)
37
5.2 Strongly typed GEP
process has two steps, hence the given name. Redefinition of GEP’s head and tail is
required.
The head
First let assume the set Arities contains all arities (as numbers) of functions present in
the representation. It also contains a special symbol T 2 , standing for terminal node. For
example if there exists (at least one) 1-, 2- and 4-arity function in the representation then
Arities = {T, 1, 2, 4}. Knowing that, the head is defined as a linear string of symbols
taken from the set Arities. The successive letters of a head, beginning from the first,
straightforwardly define the tree structure. The tree is created from root in a BFS3
manner. It is worth noticing that when building the tree structure, arities values acts
as functions and the T symbol is the terminal (and the only one). The redefined head
can be considered to actually be a whole GEP gene. Formally, it is even necessary to
distinguish in the new head its own subhead and subtail. However since there is only one
terminal symbol, the subtail is redundant, because it would be a string containing many
times the same T symbol. Therefore the subtail is useless and T symbol is considered to
be a default value when constructing tree’s structure. Please refer to Figure 5.2 for an
example. Notice that the last two nodes were by default set to T s.
Fig. 5.2: Structure of the expression tree for the head “32T1T121T”
The tail
Assume we have the Arities set defined as in the previous paragraph and T ypes set
consisting of all types from the given representation. The tail has n independent parts,
where n = |Arities| · |T ypes|. Each part corresponds to a different tuple from Arities ×
T ypes. The symbols from the tail are used to fill in all nodes in the tree. A single node
is set by a simple procedure:
1. Determine the return type of the current node.
2. Determine the arity of the current node.
3. Choose next unused symbol from appropriate part (corresponding to type and
arity).
2 the
T may be considered as having 0-arity
good is the DFS, however it is BFS which is mostly often used in GP
3 equivalently
38
5.2 Strongly typed GEP
Entire process begins in the root. Its type is given beforehand and depends on the
problem (in our case it is a vector, as said in Chapter 3). Then recursively all descending
nodes are filled in. The order of tree traversing is arbitrary, however using different orders
will result in different trees expressed from the same genome. For sake of consistency
with the typical approach we advise using BFS.
Gene parts
One could see the stGEP chromosome as composing of many separate genes. The first
one describes the structure of a tree and - in classic GEP terminology - has only the head
(tail is useless since there is only one terminal symbol). All of the others are sequences
of terminals used to fill in the empty nodes, so - in classic GEP terminology - have only
tails. The stGEP chromosome has sense and can be expressed only if all of the parts
listed above are present. To underline the integrity of stGEP genome and taking into
account the “head” and “tail” nature of different parts we find it appropriate to call
the first part the head and all others the tail. On the same time we use the term gene
part. The head and all parts of the tail are separate gene parts. This term is important
because for example the transposition of symbol sequences in the genome can only be
done within the borders of the same gene part.
Genome size analysis
It is necessary to have enough symbols in each of the tail’s parts in order to successfully
perform the two phase expression. The head’s size is constant and depends on the user.
The tail’s parts sizes differ whether their consist of terminals or functions:
• sizei = (max(Arities) − 1) · h + 1 if the i-th part of the tail consists of terminals
(regardless of type).
• sizei = h if the i-th part of the tail consists of functions (regardless of type),
where h is the head’s size. The last one is an important parameter because it is in fact
upper bound for the tree size4 . Changing the value of h changes the maximum and
expected size of expressions. This determines the phenotypes’ complexity level, thus
influences the learning process.
Knowing that terminals’ parts appear in the tail as many times as many different
types there are in the representation, the entire genome’s size is:
sizeall = h + |T ypes| · ((max(Arities) − 1) · h + 1) + |T ypes| · (|Arities| − 1) · h
Random values in stGEP
If desired the random values of chosen (in particular one or all) types may be introduced.
This involves introducing two additional symbol sequences in the genome (for each type
desired):
• Dc gene part: the sequence of indexes telling which random number to use
• and RNC gene part: the random numbers itself.
4 we
consider a tree size to be the number of its nodes
39
5.2 Strongly typed GEP
The sizes of the both the Dc and RNC strings are the same and are equal to size of tail’s
part corresponding to terminals of appropriate type, because only a 0-arity terminal
might turn out to be an RNC and in the worse case all terminals are RNCs. Therefore,
the size of the genome changes respectively to:
sizeall = h + 3 · |T ypes| · ((max(Arities) − 1) · h + 1) + |T ypes| · (|Arities| − 1) · h
Phenotype size analysis
The maximum size of a tree is - as said previously - h. In this case, the expression
process “uses” the entire head and h symbols from the tail, which gives 2h5 symbols used
altogether. This allows to estimate what fraction of a genome is actually expressed as
the phenotype (at maximum):
sizephenotype =
2h
h + |T ypes| · ((max(Arities) − 1) · h + 1) + |T ypes| · (|Arities| − 1) · h
sizephenotype ≈
2
1 + |T ypes| · (max(Arities) − 1) + |T ypes| · (|Arities| − 1)
sizephenotype ≈
2
|T ypes| · (max(Arities) + |Arities| − 2) + 1
In the RTT game this thesis focuses on, we use 3 different types, functions of 1-, 2-,
3-arity and terminals. We consider this to be a very typical case when using stGEP.
It means that at maximum 15% of the genome is actually expressed. However, the
phenotype is expected to be often much smaller, than at the upperbound case. The
exact numbers certainly vary and depend on many factors such as representation itself,
learning process, etc. Nevertheless it seems to be safe to assume that typically a 5%10% of a genome is expressed into a strategy. We find this value to be very suitable,
because in every individual potential subsolutions might be hidden and they can be freed
or stored for later in the evolution process, while at the same time the significant part of
the genome is actually a subject to expression into phenotype.
5.3 Genetic operators
Genetic operators are used to breed the population. C. Ferreira introduced in GEP
a set of original genetic operators that take advantage over the linear nature of the
chromosome. Few changes are required for the stGEP, Table 5.1 shows the differences
in comparison to GEP and summarizes the probabilities of using each operator during
breeding.
5.3.1
Recombination
The strong typing has no influence on recombination operators proposed by Ferreira.
However, it does allow to introduce one more recombination operator - gene-part recom5 for ease of calculations we assume no RNCs are in the representation. On the one hand their enlarge
the chromosome, but on the other - using an RNC requires reading the indexes and randomized values,
which means the percent of genome taking part in the expression process is higher. It is safe to assume
that both this effects cancel each other out.
40
5.3 Genetic operators
Operator
Type
in GEP
Default probability
one point recombination
two point recombination
gene recombination
gene part recombination
stIS transposition
stRIS transposition
gene transposition
st inversion
mutation
RNC mutation
Dc mutation
Dc inversion
Dc transposition
recombination
recombination
recombination
recombination
transposition
transposition
transposition
inversion
mutation
mutation
mutation
inversion
transposition
the same
the same
the same
not present
different
different
the same
different
the same
the same
the same
the same
the same
0.3
0.3
0.1
0.1
0.1
0.1
0.1
0.1
0.044
0.01
0.044
0.1
0.1
Tab. 5.1: Genetic operator in GEP and stGEP
bination.
One point recombination
One-point recombination swaps a part of one chromosome with the corresponding part
of another chromosome. There is one point of cut chosen randomly. See Figure 5.3 for
an example - two abstract genes, one consisting of digits other of letters, different colors
show the swapped fragments. The probability of performing one-point recombination
between two selected individuals is given by parameter onePointRecombProbability. The
default value - as suggested by Ferreira - equals to 0.3.
Fig. 5.3: One point recombination
Two point recombination
Two-point recombination swaps a part of one chromosome with the corresponding part
of another chromosome. There are two points of cut chosen randomly. See Figure 5.4 for
an example - two abstract genes, one consisting of digits other of letters, different colors
show the swapped fragments. The probability of performing two-point recombination
between two selected individuals is given by parameter twoPointRecombProbability. The
default value - as suggested by Ferreira equals to 0.3.
Fig. 5.4: Two point recombination
41
5.3 Genetic operators
Gene recombination
Gene recombination swaps one gene in a chromosome with the corresponding gene in
another chromosome. The gene to swap is chosen randomly. See Figure 5.5 for an
example - two abstract genes, one consisting of digits other of letters, different colors
show the swapped genes. The probability of performing gene recombination between two
selected individuals is given by parameter geneRecombProbability. The default value - as
suggested by Ferreira - equals to 0.1.
Fig. 5.5: Gene recombination
Gene part recombination
The stGEP introduces the idea of gene parts, thus it seems logical to introduce new
recombination operator that will operate on them. Gene part recombination swaps one
gene-part in a chromosome with the corresponding gene part in another chromosome.
The gene to swap is chosen randomly. See Figure 5.6 for an example - one abstract gene
composing of two parts, one consisting of digits other of letters, different colors show the
swapped gene parts. The probability of performing gene part recombination between
two selected individuals is given by parameter genePartRecombProbability. We suggest
the default value of 0.1.
Fig. 5.6: Gene-part recombination
5.3.2
Transposition
The strong typing has big influence on transposition operators.
stIS transposition
Strongly typed Insertion Sequence transposition transposes (inserts) a small fragment of
a gene part into the same gene part:
• in case if a chosen gene part is head - after the root position
• in case if a chosen gene part belongs to a tail - at arbitrary position (also at root)
See Figure 5.7 for an example - two abstract genes, one consisting of digits other of letters,
different color shows the transposed fragment. The probability of stIS transposition to
happen in a gene part is given by parameter stISTranspositionProbability. The default
value - as suggested by Ferreira - equals to 0.1. However, please notice that originally in
GEP the probability refers to entire gene (not gene part), therefore it might be desired
to lower the value of this parameter. The answer to question “how much” depends on
real application of stGEP and the actual representation used.
42
5.3 Genetic operators
Fig. 5.7: stIS transposition
stRIS transposition
Strongly typed Root Insertion Sequence transposition transposes (inserts) a small fragment of a gene part into the same gene part at root position. The transposed fragment
must start with a function, thus this works only in the head. See Figure 5.8 for an example - two abstract genes, one consisting of digits other of letters, different color shows
the transposed part. The probability of stRIS transposition to happen in a gene head is
given by parameter stRISTranspositionProbability. The default value - as suggested by
Ferreira - equals to 0.1.
Fig. 5.8: stRIS transposition
Gene transposition
Gene transposition swaps the first gene with a randomly chosen other gene. This works in
case of having many compatible genes in the chromosome. See Figure 5.9 for an example
- two abstract genes, one consisting of digits other of letters, different color shows the
transposed part. The probability of gene transposition to happen in a chromosome is
given by parameter geneTranspositionProbability. The default value - as suggested by
Ferreira - equals to 0.1.
Fig. 5.9: Gene transposition
In the RTT player there is one gene for marine’s behaviour and one for tank’s. Both
of them are compatible (use the same representation), so in our case stGene transposition
means always swapping the first and the second (last) gene. Additionally, please notice
since gene parts are by definition incompatible, there is no stGene part transposition on the contrary to analogical situation in recombination operators.
5.3.3
Inversion and mutation
The strong typing has influence on inversion but not on mutation.
st inversion
Strongly typed inversion randomly selects start and end positions within the same gene
part and reverts the order of the sequence. The probability of st inversion to happen in a
gene part is given by parameter stInversionProbability. The default value - as suggested
by Ferreira - equals to 0.1. However, please notice that originally in GEP the probability
refers to entire gene (not gene part), therefore it might be desired to lower the value
of this parameter. The answer to question “how much” depends on real application of
stGEP and the actual representation used.
43
5.3 Genetic operators
mutation
Mutation randomly changes the symbol (accordingly to proper gene part domain) at
arbitrary position in the gene. The probability of changing any position is given by
parameter mutationProbability. The default value - as suggested by Ferreira - equals to
0.044.
5.3.4
Dc and RNC operators
RNC mutation
Mutates constants from the RNC areas. The value is randomly chosen from the appropriate domain. The probability of changing any position in RNC areas is given by
parameter rncMutationProbability. The default value - as suggested by Ferreira - equals
to 0.01.
Dc mutation
Mutates the values (indexes) in the Dc areas. The new value is a randomly selected
integer from [0, size] (where size means the size of the appropriate Dc gene part). The
probability of changing any position in Dc areas is given by parameter dcMutationProbability. The default value - as suggested by Ferreira - equals to 0.044.
Dc inversion
Inverts the values (indexes) in the Dc areas. The limits of inverted sequences are randomly chosen. The probability of inverting a fragment in any Dc area is given by parameter dcInversionProbability. The default value - as suggested by Ferreira - equals to
0.1.
Dc transposition
Does the IS transposition in the Dc areas (fragments may be inserted also at the root
position). The probability of transposing a fragment in any Dc area is given by parameter
dcTranspositionProbability. The default value - as suggested by Ferreira - equals to 0.1.
6
Representation
If you know exactly what is happening than you are not in combat.
One of the Murphy’s Laws of Combat Operations.
6.1 Objectives
The detailed list of all terminals and operators must be designed, which in a way will
implement a routine gather_information from algorithm 3.3. During the work on this
thesis and conducting the experiments two representations were created. The analysis
of the experiments presented in Chapter 8 brought us to conclusion that the initial one
was insufficient for successful player learning. However it was a good starting point for
designing a second set of terminals1 .
When designing the representation we wish to fulfill two requirements:
• The representation must be small. This limits the solutions’ space and thus eases
the learning.
• The representation must have sufficient expression power, therefore the set must
be big and rich enough.
Obviously both requirements cannot be fully satisfied at the same time and a tradeoff
between them must be found. In the context of the RTT games and GEP, choosing
proper terminals and operators for expressing units is more a case of an intuition than
strict analysis. It might be said that the way how a designer understands a game lets him
find more adequate representation. In this chapter we show two different representations
- “simple” and “complex”, each of them composes of a list of terminals and a list of
functions.
6.2 Types
There are three types in the representation, see Table 6.1.
1 only terminals represent the domain knowledge, therefore once created set of operators was not
changed
45
6.3 Functions
Notation
Meaning
S
V = S×S
B = {0, 1}
scalar (a real number)
vector (a tuple of two real numbers)
boolean
Tab. 6.1: Types in initial representation
6.3 Functions
There are operators of 1-, 2- and 3- arity of every type. Table 6.2 shows detailed list of
them all.
Notation
Return type
No of args
Notes
IF(B,B,B)
ADD(B,B)
MUL(B,B)
LT(S,S)
IF(B,S,S)
ADD(S,S)
MUL(S,S)
SUB(S,S)
OPP(S)
ABS(S)
SIG(S)
ANG(V,V)
LEN(V)
IF(B,V,V)
ADD(V,V)
MUL(S,V)
SUB(V,V)
OPP(V)
NOR(V)
ROT(S,V)
RIG(V)
LEF(V)
B
B
B
B
S
S
S
S
S
S
S
S
S
V
V
V
V
V
V
V
V
V
3
2
2
2
3
2
2
2
1
1
1
2
1
3
2
2
2
1
1
2
1
1
if clause
logic OR
logic AND
less than
if clause
addition
multiplication
subtraction
opposite value
absolute value
sigmoid function
angle between two vectors
length of a vector
if clause
addition
scalar multiplication
subtraction
opposite vector
vector normalization
vector rotation
perpendicular vector turned to the right
perpendicular vector turned to the left
Tab. 6.2: Functions
6.4
Domain knowledge
6.4.1
Simple terminals
Table 6.3 shows the terminal list as it was proposed at the very beginning.
The initial simple representation was used to conduct first experiments. During the
tests many drawbacks were discovered:
• The lack of memory in the RTT player algorithm - it was not yet implemented
than.
46
6.4 Domain knowledge
No
Name
Type
Description
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
MY_RANGE
MY_SPEED
MY_HP
MATES_HP
NEAREST_MATE_HP
ENEMIES_HP
NEAREST_ENEMY_HP
N_MATES
N_ENEMIES
N_ENEMIES_IN_RANGE
NEAREST_MATE
CENTER_MATES
NEAREST_ENEMY
CENTER_ENEMIES
HOME_VECTOR
IS_MOBILE
S
S
S
S
S
S
S
S
S
S
V
V
V
V
V
B
shoot range of the current unit
maximum speed of the current unit
HP level of the current unit
sum of HP levels of all friendly units
HP level of the nearest friendly unit
sum of HP levels of all enemy units
HP level of the nearest enemy unit
the number of all friendly units
the number of all enemy units
the number of enemies units in shoot range
vector to the nearest mate
the geometric center of all friendly units
vector to the nearest enemy
the geometric center of all enemy units
the initial geometric center of all friendly units
true for marine and tank not in siege mode
Tab. 6.3: Simple terminals
• The boolean terminals were necessary to close the representation (see Subsection
5.2.2). However in the basic representation we introduced only one such symbol.
The extremely short list of boolean terminals results in imbalance in the representation.
• None of the terminals distinguishes tanks from marines. Intuitively, this prevent
learning strategies adjusted to different type units.
• It seemed that there were not enough terminals giving local information and evolution were not able to build on top of them intelligent strategies.
Simply put, it seems that the simple representation does not have enough expression
power. Therefore, after several designing cycles a new one was§ created.
6.4.2
Complex scalar terminals
The scalar terminals from the complex representation provide information above all
about:
• hit points level of some distinguished units
• and the number of certain units
in the context of:
• units alliance (friend or foe),
• units type (marine or tank),
• and units position (nearest, furthest, in range, on entire map).
The full list of scalar terminals - from the “current agent” point of view - is the following:
1. TICK - which simulation frame of the game it is
47
6.4 Domain knowledge
2. MY_HP - my current HP level
3. NEAREST_MARINE_MATE_HP - HP level of the nearest friendly marine
4. NEAREST_TANK_MATE_HP - HP level of the nearest friendly tank
5. NEAREST_MARINE_ENEMY_HP - HP level of the nearest enemy marine
6. NEAREST_TANK_ENEMY_HP - HP level of the nearest enemy tank
7. WEAKEST_MYRANGE_MARINE_MATE_HP - HP level of the weakest friendly
marine I can aid
8. WEAKEST_MYRANGE_TANK_MATE_HP - HP level of the weakest friendly
tank I can aid
9. WEAKEST_MYRANGE_MARINE_ENEMY_HP - HP level of the weakest enemy marine I can shoot
10. WEAKEST_MYRANGE_TANK_ENEMY_HP - HP level of the weakest enemy
tank I can shoot
11. RANGE_ANY_MATE_NUM - number of friendly units that can shoot at my
position2
12. RANGE_MARINE_MATE_NUM -number of friendly marines that can shoot an
enemy near me
13. RANGE_TANK_MATE_NUM - number of friendly tanks that can shoot an enemy near me
14. ALL_MARINE_MATE_NUM - number of all friendly marines that are still alive
15. ALL_TANK_MATE_NUM - number of all friendly tanks that are still alive
16. MYRANGE_ANY_ENEMY_NUM - number of enemy units that I can shoot
17. MYRANGE_MARINE_ENEMY_NUM - number of enemy marines that I can
shoot
18. MYRANGE_TANK_ENEMY_NUM - number of enemy tanks that I can shoot
19. RANGE_ANY_ENEMY_NUM - number of enemy units that can shoot me
20. RANGE_MARINE_ENEMY_NUM - number of enemy marines that can shoot
me
21. RANGE_TANK_ENEMY_NUM - number of enemy tanks that can shoot me
22. ALL_MARINE_MATE_NUM - number of enemy marines that are still alive
23. ALL_TANK_MATE_NUM - number of enemy tanks that are still alive
2 an
approximation of how many units may aid me
6.4 Domain knowledge
48
6.4.3 Complex vector terminals
Vector terminals should allow units to:
• synchronize their movement with the rest of the team,
• and choose wisely the path to approach the enemy (or to construct solid defense).
Therefore, the vector terminals from the complex representation mostly often point to:
• some distinguised units,
• or geometric centers of distinguished groups of units
in the context of:
• units alliance (friend or foe),
• units type (marine or tank),
• units characteristic (weakest),
• and units position (nearest, furthest, in range, on entire map).
The full list of complex vector terminals- from the “current agent” point of view - is the
following:
1. ME - a (0,0) vector.
2. HOME_TEAM - vector to the starting geometric center of all friendly units.
3. HOME_MINE - vector to my starting position.
4. FURTHEST_RANGE_MARINE_MATE - vector to the furthest friendly marine
that can shoot an enemy near me.
5. FURTHEST_RANGE_TANK_MATE - vector to the furthest friendly tank that
can shoot an enemy near me.
6. FURTHEST_RANGE_MARINE_ENEMY - vector to the furthest enemy marine
that can shoot me.
7. FURTHEST_RANGE_TANK_ENEMY - vector to the furthest enemy tank that
can shoot me.
8. NEAREST_MARINE_MATE - vector to the nearest friendly marine.
9. NEAREST_TANK_MATE - vector to the nearest frienfly tank.
10. NEAREST_MARINE_ENEMY - vector to the nearest enemy marine.
11. NEAREST_TANK_ENEMY - vector to the nearest enemy tank.
12. WEAKEST_MYRANGE_MARINE_MATE - vector to the weakest friendly marine that I can aid.
13. WEAKEST_MYRANGE_TANK_MATE - vector to the weakest friendly tank
that I can aid.
14. WEAKEST_MYRANGE_MARINE_ENEMY - vector to the weakest enemy marine I can shoot.
49
6.4 Domain knowledge
15. WEAKEST_MYRANGE_TANK_ENEMY - vector to the weakest enemt tank I
can shoot.
16. CENTER_ALL_MATE - vector to the geometric center of all friendly units.
17. CENTER_MARINE_MATE - vector to the geometric center of all friendly marines.
18. CENTER_TANK_MATE - vector to the geometric center of all friendly tanks.
19. CENTER_ALL_ENEMY- vector to the geometric center of all enemy units.
20. CENTER_MARINE_ENEMY- vector to the geometric center of all enemy marines.
21. CENTER_TANK_ENEMY- vector to the geometric center of all enemy tanks.
22. PATH_MATE - vector parallel to the previous move vector of the nearest friendly
unit.
23. ME_BACK - vector to my previous position.
6.4.4
Complex boolean terminals
The main idea behind proposed boolean terminals was to provide some useful and nontrivial information to the agent. The full list - from the “current agent” point of view is the following:
1. AM_ON_EDGE - true if the current unit is close to the edge of the map.
2. AM_ON_FIRE - true if my HP level has just dropped down.
3. AM_IN_GROUP - true if there are friendly units in my range.
4. AM_WINNING - true if my side is winning.
5. AM_BULLY - true if in range there are more friendly units than the enemies’.
6. AM_MOBILE - true if I can move (always true in case of marine, and true for
non-sieged tanks).
7. IS_BEGIN_TIME - true at the beginning of the game (up to the 480th simulation
frame).
8. AM_HEALTHY - true if the current unit’s HP level is higher than 80% of its
maximum.
9. AM_DYING - true if the current unit’s HP level is lower than 20% of its maximum.
10. JUST_SHOT - true if the current unit has shot in previous simulation frame.
11. AM_COLLIDING - true if the unit has been blocked in the previous simulation
frame (and thus did not move as ordered).
6.4 Domain knowledge
50
6.4.5 Normalization
All scalar terminals are normalized to be in range of [0,1]. For example, the maximum
level of HP for the marine is 80. Imagine that during a game a given marine unit was
shot few times and in concrete game state it has 56 HP. In that situation the MY_HP
56
= 0.7. This would be the value actually used
scalar terminal would evaluate to value 80
in the expression tree.
As said in Subsection 3.4.3 all vectors are bounded to the current unit’s position.
Additionally all of them are also normalized by map dimension (analogically to scalar
terminals). For example, imagine a geometric center of all friendly units to be in point
(200,300) and the current unit to be at position (100,100). First we bound the vector to
the current unit, thus the CENTER_ALL_MATES equals (100,200). Than we normalize
, 100 200 , 768 = (0.09765625, 0.26041667). This would be
it by the map dimensions getting 1024
the value actually used in the expression tree.
After the move vector is calculated it is used to determine a specific point on the
map that the current unit will go to. For example if a move vector equals (0.2, 0.3) and
the position of the current unit is still (100,100) the destination point is set to be (110,
115). It is important for the destination to be unreachable in one simulation frame, since
we want units to move always with a maximum speed. Anyhow, a new move vector
is computed in each simulation frame, so the fact the unit does not reach it previous
destination does not influence its behaviour.
6.4.6 RNC vectors and map mirroring
In the representation we use scalar and vector RNCs. There is one issue regarding random
constant vectors. For example, let us consider a simple expression tree that consists of
one constant vector (0.5,0). In case a player has a starting position on the left side
of the map, playing accordingly to this expression tree would move units towards the
enemy. However, in case a player starts on the right side of the map the units would run
away from the enemy. Therefore, depending on the starting position (which is random)
the exactly same strategy results in a different units behaviour. To prevent this from
happening, at the beginning of the game we check on which side a given player starts.
If it is the left side the constant vectors remain unchanged and if it is the right side we
multiply the x coordinate by -1. Let us emphasise that this applies only to RNC vectors,
since all terminal vectors computed from the game state are always properly directed.
7
Implementation
Things that must work together, can’t be carried to the field that way.
One of the Murphy’s Laws of Combat Operations.
7.1
Objectives
One of the main goals of this thesis was to conduct distributed evolutionary learning from
simulations. From the beginning the technical complexity was known to be high, since
the application must:
• perform artificial evolution of a complex species,
• perform RTT game simulations,
• distribute the computation (namely evaluation of the individuals, which is the main
CPU power demanding task),
• provide checkpointing
• and handle errors.
In this chapter we show the entire framework design and how the computations were
distributed. Additionally, there is given a short description how the experiments were
maintained, the results were gathered and the analysis was performed.
7.2 The framework
7.2.1
Master-slave design
There are five entities in the framework design:
• Experiment - performs main evolutionary loop.
52
7.2 The framework
• Evaluator - entry point for the evaluation. Creates computational Tasks and sends
them to the Manager to be executed.
• Manager - maintains a pool of Tasks and a pool of Hosts. Each time there is a free
host (not performing computations) and there is at least one game to simulate it
assigns properly the Task to the Host.
• Task - represents a game simulation. Consists of two individuals (that are supposed
to fight against each other), a random seed, status and result. The task can have
assigned five statuses: NEW, SUBMITTED, RUN, FINISHED and DONE. The
transition between them is shown on Figure 7.2. At the beginning when task is
created by the Evaluator it is given a status NEW. As soon the tasks is inserted in
the pool in the Manager it changes the status to SUBMITTED. When task is being
executed it has status RUN. After completion, the task is always returned to the
Manager. However, in case a task ends in failure (for example the host went down)
it is automatically given status SUBMITTED and if the game simulation finished
successfully it is given status FINISHED. And finally, when tasks are retrieved from
the pool by the Evaluator they are given status DONE.
• Host - an entity representing a machine performing computations. It is responsible
for sending the command (over ssh) to run a game simulation on a remote host.
Fig. 7.1: Framework design
Summarizing, one distinguished host acts as master and processes entire evolution
loop, sending the evaluations’ tasks to slaves. This is done via ssh command, therefore
no daemon is required to be running on the slave hosts. This approach might be called a
minimalistic master-slave scheme, minimalistic - because there are no application slaves,
just hosts with proper software installed1 . For entire framework design please see Figure
1 in
our case only ORTS simulation framework is required
53
7.2 The framework
Fig. 7.2: Task status transitions
7.1, please notice that Experiment component performs entire search loop, thus it handles
also selection and breeding (what is not shown on the design).
7.2.2
Tools and libraries
Keeping in mind all objectives from previous point it is possible to break down the
framework into several “challenges”. For each must be made a decision what tools to use
and what should be implemented by ourselves.
• Evolutionary Computation System (ECJ ver. 18) - is used to perform entire experiment. This Java framework provides the implementation of the main search loop
(see Algorithm 4.1) and the selection of the individuals, as well as checkpointing.
• Strongly typed GEP - bases on the GEP plugin to ECJ written by Bob Orchard in
Java. The standard plugin turned out to be not flexible enough to handle stGEP
complex individuals, therefore it was rewritten and extended.
• Distributed evaluation of individuals:
– ORTS framework - used as simulation engine for RTT games playing. This
required to implement a dedicated ORTS client in C++, the game server
remained unchanged.
– Bash scripts - used to execute the game simulations on many remote hosts
(by ssh command).
– asynchronous tasks and hosts pool - this was implemented in Java as an extension of ECJ evaluator and allowed the experiment to perform distributed
evaluation of the individuals, as well as provided error-handling.
• Bash scripts - used for the experiment on-line monitoring.
• Postgres database, bash scripts, gnuplot - used for the analysis of the results.
7.3 Maintaining experiments
7.3.1
Monitoring
A set of command line tools were develop in order to monitor the experiments, the most
important are:
• ping: shows the status of each remote host performing computations.
54
7.3 Maintaining experiments
• synchronize: synchronizes entire cluster with a chosen host (it means copying all
necessary files from the chosen host into corresponding paths on remote hosts).
• run: runs the distributed experiment, takes as an argument a parameter file, which
lists all computational hosts and experiment’s specification.
• rerun: restarts the distributed experiment from chosen checkpoint file given as a
parameter. Checkpoint files are automatically created by ECJ framework every
generation.
• kill: kills the experiment along with all hanging processes involving ORTS simulations.
• backup: this is a daemon script which copies the checkpoint files and logs from
master host to network disk, in case of a master failure.
7.3.2
Logging and analysis
Each experiment generated many logs. Each task created by evaluator was minutely
observed, the logs contained exact information about all tasks transition from one state
to another, along with times of those happenings. At each generation entire population,
both expressed and not (raw genotype), was written to a file. Also all failures of hosts
were logged. By using simple bash command-line tools (such as grep, cut, wc, expr, etc.),
all log files allowed for:
• On-line experiment progress monitoring (for example the number of generations
processed so far or the number of hosts down).
• Off-line analysis of the results (after completion of entire experiment).
For automatic results analysis, such as for example making graphs of average fitness,
the postgres database and gnuplot were used. The log files generated by experiment
were parsed by bash scripts and the retrieved data was inserted into a database. Using
the functionalities provided by database (for example max, min, length) proper data was
collected and plotted into a graph. In database there was only one simple relation named
Experiment with fields:
• generation INTEGER
• fitness FLOAT
• marine VARCHAR
• tank VARCHAR
8
Experiments and results
If enough data is collected, a board of inquiry can prove anything.
One of the Murphy’s Laws of Combat Operations.
8.1 Objectives
In this chapter we describe the results of three chosen computational experiments that
were performed to verify our approach. It is worth mentioning that altogether eight experiments were conducted, which is roughly two years of one 1.2GHz CPU computation.
During the research we were constantly introducing implementation changes, testing different parameters, evaluators, etc. Very often a new, improved version of the software
were ready before finishing the previous experiments. Furthermore, some experiments
failed due to bugs in implementation. Therefore, we gathered data from three, most
representative experiments and summarized all the work and conclusions in this chapter.
The main idea of the experiments was to fine-tune the evolution process, focusing on
the evaluation method and the complexity of the representation. Please see Table 8.1 for
most important improvements introduced in successive three experiments.
Experiment
Evaluation
Representation
1
2
3
SET
HoF
HoF
simple
simple
complex
Tab. 8.1: Evaluation method and representation complexity in three experiments
For each experiment we show the initial settings along with reasons for using them
and present the results. We focused on:
• The length of the expression trees (meaning the total length of an expression string,
which is directly proportional to number of nodes in an expression tree). This shows
the complexity level of the phenotypes, but does not reflect the actual complexity
level of playable strategies. It is possible to have an expression tree with many
nodes, which in fact can be reduced to a very simple form.
56
8.1 Objectives
• Number of one-node expression trees. This reflects the diversity of the population.
• In case of HoF - size and length of the reference strategies set.
• Best players evolved. This is a qualitative (not quantitative) analysis of learned
strategies - we checked how best players from different generations fought against
each other. Therefore it was possible to describe strategies in the context of units
actual behaviour (the exact analysis of the expressed phenotypes, due to their
complexities, was not possible).
• If observed - other characteristics of the learning process.
Please notice that we do not analyze the individuals’ fitnesses - as it is often done in the
research devoted to artificial evolution:
• because in SET the maximum, minimum and average fitness is constant
• and due to strong competitive nature of the fitness in HoF, what makes it a nonobjective measure.
8.2 Environment
The experiments took place in three laboratories of the Institute of Computing Science at
Poznan University of Technology. There were 45 machines altogether, but the specific cluster
configuration varied from experiment to experiment. Table 8.2 summarizes the
computational mini-cluster we created. Table 8.3 summarizes the availability of the
computers, thus showing the actual computational power used.
Lab name   All hosts   Working hosts   CPU        RAM   OS
lab-43     15          13              2x2.2GHz   1GB   Linux
lab-44     15          14              2x3GHz     2GB   Linux
lab-45     15          15              2x2.2GHz   1GB   Linux

Tab. 8.2: Experimental cluster configuration
8.3 First experiment - The Reconnaissance
8.3.1 Objectives and assumptions
The first experiment is called a “reconnaissance”, since its main goal is to check whether our
approach combining stGEP, MAS and RTT players is promising.

Lab name        Experiment 1   Experiment 2   Experiment 3
lab-43          13             13             13
lab-44          14             12             14
lab-45          15             13             0
total (hosts)   42             38             27
total (cores)   84             76             54

Tab. 8.3: Availability of hosts

The reasons for using certain settings are:
• Evaluator - in the first experiment we used the Single Elimination Tournament, which
was already implemented in ECJ and required only a few changes in order to handle
distributed computations.
• Number of game repetitions - this is specific to the SET evaluation method; each
game was repeated 5 times. This was the maximal value that could be used without
lengthening the experiment time excessively. With this setting a single generation
was evaluated in 20 minutes on average.
• Representation - the simple representation was used, which at that time was believed
to have sufficient expressive power.
• Size of the population - we decided to use a relatively large (in comparison to later
experiments) population of 128 individuals, to see if a broad search can overcome the
expected instability of SET.
• Number of generations - the experiment went on for 140 generations.
• Genetic operator probabilities - we used the probabilities suggested in Table
5.1. Those values establish a rough equilibrium between mutation and
recombination, favoring neither of them.
• Size of the genotype - the head was a sequence of 40 symbols. It means that an
expression tree has at most 40+(3-1)*40+1=121 nodes (the derivation is given after this list).
• Probability of randomly drawing a function or terminal symbol - this factor determines how
probable it is that a function symbol is set in the head during initialization or
mutation. In the first experiment we wished to check how this probability influences
the length of the expression, therefore we used a value of 0.75 for the marine gene
and 0.5 for the tank gene.
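For reference, the bound on the tree size quoted above follows from the standard GEP relation between head and tail length (the derivation below uses our own notation):
\[
t = h\,(a_{\max} - 1) + 1, \qquad n_{\max} = h + t = h + h\,(a_{\max} - 1) + 1,
\]
where $h$ is the head length, $a_{\max}$ the maximal arity of a function in the function set, $t$ the resulting tail length and $n_{\max}$ the maximal number of nodes in the expressed tree. For $h = 40$ and $a_{\max} = 3$ this gives $n_{\max} = 40 + 80 + 1 = 121$ nodes; for the head of 50 symbols used later in the third experiment it gives $151$ nodes.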
8.3.2 Results
The dynamics of the evolution process is presented in Figures 8.1 and 8.2. At the
beginning the average marine phenotype length is much larger than the tanks', but with
time this significant difference fades out. In the end, marine phenotypes tend to be up
to twice as long as the tank phenotypes. This confirms that 0.75 is a better value for
the probability of randomly choosing a function symbol in a gene. However, this still
does not prevent the population from being overcome by individuals with a very simple
phenotype, since the average length of the phenotypes is surprisingly low. Moreover,
starting from the 50th generation, 60-80% of the individuals have both the marine
and the tank expression tree of size 1. Evolution “chooses” simplicity and - from the given
terminals and functions - is unable to construct more sophisticated strategies.
A detailed analysis of the best strategies from each generation shows that three
main types can be distinguished:
• full defense - all marines and all tanks gather and await the enemy.
• full attack - all marines and all tanks simply move towards the enemy.
• half attack / half defense - either the marines or the tanks simply move towards the enemy,
while all the units of the other type gather and wait.
Fig. 8.1: Average phenotype length, using SET and simple representation (x-axis: generation, 0-140; y-axis: average expression length; separate curves for marine and tank)
Fig. 8.2: Number of one-length phenotypes (marine and tank), using SET and simple representation (x-axis: generation, 0-140; y-axis: number of one-length individuals)
Marine          Tank            Notes
CENTER_MATES    NEAREST_MATE    full defense strategy
NEAREST_ENEMY   NEAREST_ENEMY   full attack strategy
HOME            CENTER_ENEMY    half attack / half defense strategy

Tab. 8.4: Examples of best players, using SET and simple representation
For examples please see Table 8.4. It must be underlined that in almost every generation the best player had just one node in both the marine and the tank expression tree. Thus,
the behaviour of units was very simple: they either blindly moved towards the enemy or
gathered and waited for it. Furthermore, a cycling of strategies was discovered - “half
attack / half defense” was at some point of the evolution beaten by either “full defense” or
“full attack”, and those were in turn beaten again by “half attack / half defense”.
8.3.3 Conclusions
From the analysis of the results the following conclusions were drawn:
• The evaluation with no memory and in a noisy environment causes intense population cycling.
• A high probability of randomly setting a function symbol in the head may - to a
certain extent - help in keeping the complexity level of the strategies high.
• The population convergence towards simple players is unquestionable. The learning process did not evolve any intelligence reaching further than the already designed
terminal set. We suspect there might be different causes of that:
– using co-evolution with no external reference players
– an evaluation method prone to noise
– an insufficiently expressive representation
In the context of evolving an intelligent player the first experiment failed. However, we
learned much and know that many changes must be introduced and tested before the
evolution is able to find a good RTT strategy.
8.4 Second experiment - The Skirmish
8.4.1 Objectives and assumptions
The second experiment is called a “skirmish”, since its main goal is to gather experience
before the final attempt at finding the best RTT players. Many improvements are tested to
check whether they have a positive influence on the learning process. The reasons for using
certain settings are:
• Evaluator - the first experiment proved SET (which has no memory) to be insufficient, thus we implemented the Hall of Fame with all its extensions: uniqueness,
manual teachers and competitive fitness sharing.
• Size of the reference array - this is specific to the HoF evaluation method; each individual
plays against 17 learned players and 3 manual teachers (20 games altogether). This was
the maximal value that could be used without lengthening the experiment time
excessively. With this setting a single generation was evaluated in 15 minutes on average.
• Representation - to objectively observe the influence of all introduced changes, we
decided to once again use the simple representation.
• Size of the population - there are 32 individuals in the population, since a larger
one would make the experiment too demanding in terms of computation time. One
could increase the size of the population at the expense of using a smaller reference
array. However, in that case the evaluation method would be as prone to noise as
SET, and from the first experiment we know that this could lead to population
cycling. Therefore the size of the HoF is favored over the size of the population.
• Number of generations - the experiment went on for 140 generations.
• Genetic operator probabilities - we used slightly different values than previously,
giving more weight to mutation (probability changed from 0.044
to 0.1) and less to recombination (probability changed from 0.3 to 0.15).
• Size of the genotype - the head was a sequence of 40 symbols. It means that an
expression tree has at most 40+(3-1)*40+1=121 nodes.
• Probability of randomly drawing a function or terminal symbol - based on the
experience from the first experiment, this factor equals 0.75 for both the marine
gene and the tank gene.
8.4.2 Results
In comparison to the first experiment, the phenotypes are (on average) longer and - as suspected - there are no significant differences between marines and tanks. Figures 8.3 and
8.4 shed more light on the evolution dynamics. At the beginning the number of individuals with one-node expression trees is relatively low. However, at the 40th generation the
learning process suddenly starts to favor simplicity, as the phenotype length drops
and more and more strategies consist of one-node trees. This phenomenon lasts up to
the 80th generation, with a peak magnitude at the 60th. A closer analysis shows that
the initial random individuals converged to one simple individual, and around the 60th
generation almost the entire population had been “overrun” by it. All the individuals
had identical phenotypes and therefore an identical strategy, shown in Table 8.5. Figures
8.3 and 8.4 suggest that in later generations more complex individuals are once again
present. However, the actual playing strategy did not change! For example, consider
a strategy such as IF(LT(0,1),NEAREST_ENEMY,some_large_subtree). The
decision clause always evaluates to TRUE, therefore NEAREST_ENEMY is always
returned, regardless of the other parts of the formula. This is exactly what happens in the
second experiment. All of those more complex expression trees were in fact reducible to
the simple one to which the population had previously converged. Therefore it seems that the
genetic operators allowed the evolution to explore the solution space, but only around a
local optimum, which the learning process was not able to abandon. The phenotypes
were getting more complex, but the strategy they represented was not.
Fig. 8.3: Average phenotype length, using HoF and simple representation (x-axis: generation, 0-140; y-axis: average expression length; separate curves for marine and tank)

Fig. 8.4: Number of one-length phenotypes (marine and tank), using HoF and simple representation (x-axis: generation, 0-140; y-axis: number of one-length individuals)

Fig. 8.5: Size and length of the reference strategies array, using HoF and simple representation (x-axis: generation, 0-140; y-axis: Hall of Fame size and length)

Figure 8.5 shows that the HoF's reference array grows steadily and quickly. From the
40th up to the 80th generation a mild plateau can be observed, but this was expected,
since the previous analysis showed that during that time the population was overcome
by individuals having identical phenotypes.
Marine          Tank            Notes
NEAREST_ENEMY   NEAREST_ENEMY   full attack strategy

Tab. 8.5: Examples of best players, using HoF and simple representation
8.4.3 Conclusions
From the analysis of the results the following conclusions were drawn:
• HoF with fitness sharing seems to be less prone to noise than SET and is thus a
better evaluation method in the case of RTT player learning.
• Individuals with different phenotypes may play a similar strategy. This holds back
the learning process. It also causes the HoF's reference array to contain mainly
individuals having the same “playing style”, even though individuals with identical
phenotypes are never added twice.
• The population convergence towards simple players (even if “decorated” with
lots of useless formulas) is unquestionable. The learning process did not evolve
any intelligence reaching further than the already designed terminal set. However, the
evolution process was able to explore the phenotype space, since the phenotypes
were actually getting more complex and no cycling was observed.
In the context of evolving an intelligent player the second experiment failed. However,
we did test different approaches and settings and are ready for the final experiment.
8.5 Third experiment - The Final Battle
8.5.1 Objectives and assumptions
The last experiment is called a “final battle”, since its main goal is to automatically
learn the best RTT player and submit it to the ORTS RTS Game AI Competition 2008. The main change (in
comparison to the second experiment) is using the complex representation instead of the
simple one. The reasons for using certain settings are:
• Evaluator - the Hall of Fame is used (with all its extensions: uniqueness, manual teachers
and competitive fitness sharing). It proved to be a better choice than SET.
• Size of the reference array - this is specific to the HoF evaluation method; each individual
plays against 16 learned players and 4 manual teachers. This was the maximal value
that could be used without lengthening the experiment time excessively.
However, at the time we conducted the third experiment fewer hosts were available (see
Table 8.3), thus a single generation was evaluated in 25 minutes on average.
• Representation - in order to evolve an intelligent player we decided to design and
use the complex representation.
• Size of the population - just like in the second experiment, we decided to use a
population of 32 individuals.
• Number of generations - since it is the final experiment, it went on for a little bit
longer than the previous ones. There were 190 generations.
• Genetic operator probabilities - we used the same values as in the second experiment, favoring mutation over recombination.
• Size of the genotype - since the complex representation is significantly larger than
the simple one, we decided to use a larger head of 50 symbols. It means that an
expression tree has at most 50+(3-1)*50+1=151 nodes.
• Probability of randomly drawing a function or terminal symbol - we used the same
value as in the second experiment.
8.5.2 Results
The evolution dynamics seems similar to that of the second experiment. The
population converges to a simple strategy and later on the phenotypes slowly get
more complex. However, there are differences:
• Closer analysis of the individuals shows that the population converges to a few simple
strategies, not to only one. Most probably this happens because the complex representation is much larger than the simple one and similar players can be represented
using different terminals.
• The convergence is faster, which might suggest that the complex representation
(having more terminals) makes it easier for the evolution to learn simple strategies.
• Compared to the second experiment, the phenotype complexity level increases
more slowly. Also the dynamics of the HoF array's growth is different: individuals
are added to the reference set more rarely. It seems that the learning process is more
stable; a good solution, once found, is not simply altered by adding non-functional
parts to the phenotype (as happened in the second experiment).
Fig. 8.6: Average phenotype length, using HoF and complex representation (x-axis: generation, 0-180; y-axis: average expression length; separate curves for marine and tank)
Fig. 8.7: Number of one-length phenotypes (marine and tank), using HoF and complex representation (x-axis: generation, 0-180; y-axis: number of one-length individuals)
Fig. 8.8: Size and length of the reference strategies array, using HoF and complex representation (x-axis: generation, 0-180; y-axis: Hall of Fame size and length)
The question arises whether the evolution takes advantage of the more expressive representation and actually finds better strategies. A close analysis of the best individuals from
each generation shows that in most cases the strategies rely on rules already known
from experiments one and two, such as “full attack” or “full defense”. However, three unique
strategies - not discovered in previous experiments - were found:
• flank attack - in most offensive strategies the units move towards the geometric center of the enemy. This may cause units to block each other and, as a result,
not all units attack at once. The evolution found a player that takes advantage of this.
In the “flank attack” units still move towards the enemy, but they tend to turn a little, attempting to surround the hostile
units.
• guerrilla attack - if the hostile units are not in shooting range, the tanks move
towards the enemy. But as soon as the enemy is close (in shooting range) the tanks
retreat. At the same time the marines gather at the geometric center of the friendly
forces and wait for the enemy to come. This behaviour is similar to that of guerrilla
fighters: they approach the enemy, make a fast attack and then run away or try to
set up an ambush.
• siege defense - for the first time a strategy that orders tanks to switch into “siege
mode” was evolved. The marines, on the other hand, gather at the geometric center of
the friendly forces. This is a good defense strategy against the simple “full attack”, which
most other individuals played.
Due to the high complexity of the phenotypes of the above strategies, we present only the “siege
defense”, which is actually very simple (see Table 8.6). It must be underlined that despite using elitism, each of the above three strategies was evaluated as best in only one
generation. This suggests that the HoF evaluation method is still not sufficiently resistant to
noise.
MARINE=IF( JUST_SHOOT, CENTER_MARINE_ENEMY, CENTER_TANK_MATE )
TANK=HOME_MINE
Tab. 8.6: Best defense strategy evolved
8.5.3 Conclusions
In the third experiment, the evolution process was for the first time able to discover
counter-strategies to those learned earlier. However, the results are still unsatisfying:
• The strategies with complex phenotypes, such as “flank attack” and “guerrilla attack”,
actually behave similarly to the simple one-node strategies from the previous experiments. The differences in the units' behaviour are subtle, thus it might be said that
evolution made only one small step forward.
• On the other hand, the effective “siege defense” strategy is actually very simple.
Evolution did not create anything new that was not already present in the
terminal set.
• The HoF evaluation method works better than SET, but it still does not prevent the
evolution from forgetting good solutions once discovered (which should not happen
when elitism is used). It means that the results of a game between two strategies may
vary across simulations. However, this should have been expected, since most
strategies play similarly - in the case of even opponents, both of them win equally often.
Therefore the true problem is the lack of diversity in the population.
Summarizing, the problem of learning the RTT player is definitely very difficult. The
task set ahead of the evolution was ambitious. It might be said that - in spite of the enormous
effort put into developing our approach - only a small success was achieved. However,
please note that due to the incredibly high CPU power demand, the evolution process ran
for only 190 generations and used only 32 individuals. We believe this is not even
close to “enough” for machine learning to succeed on such a complex problem.
Therefore, put in the proper context of the limited resources we had, this small success of finding
only slightly more intelligent strategies lets us hope that with further
improvements to our approach (see Section 8.6 for the next steps we propose) and with the
use of much more computational power, the evolutionary learning of RTT players can
produce effective and sophisticated strategies.
8.5.4 ORTS contest
Unfortunately the ORTS RTS Game AI Competition 2008 was not as popular as
in the previous years. There was only one opponent in the category of tactical combat,
with whom our evolved AI solution played 200 games. For the contest we submitted
an algorithm merging three strategies found in the third experiment: full attack, flank
attack and siege defense. For each game, the strategy to play was picked randomly at
the beginning (with a probability uniformly distributed among all three options).
Our solution won 15% of the played games (see [1]). Keeping in mind that our evolved players were simple, we thought that they would be no match for a manually designed AI.
In this context we consider winning 15% of the fights a success. However, due to the low
interest in the ORTS RTS Game AI Competition, it was unfortunately impossible to test
our solutions against a large and representative set of players. Thus, the results of the
contest are more of a curiosity than a significant result.
8.6 Next steps
8.6.1 Evolution dynamics
Let us summarize all experiments in the following conclusions:
• It was expected that individuals with different chromosomes might have identical
phenotypes. But it was not foreseen that individuals with distinct expression trees
would play the game in the same way. The cause of this is the redundancy in
the phenotype domain - many expression trees can be reduced to much simpler ones.
It means that breeding new individuals and searching through the phenotype space does
not amount to searching the strategy domain, since most distinct individuals
have the same playing style. More work on the representation design is required.
• The population has always converged to one or a few simple strategies. Sometimes
counter-strategies were found, but once the evolution reached a local optimum
it was not able to abandon it, due to the redundant nature of the phenotypes described in
the previous point.
• The evaluation of two players in a simulation is very noisy and the results vary
in different games (with different initial conditions). Therefore elitism - although it should - does not let good and innovative ideas survive in the
population. Introducing a more informative game result should help. In place of a
simple “win/lose” outcome, we propose using quantitative information, namely
the difference between the global hit point levels of the players (one possible
formulation is sketched after this list).
• Using the Hall of Fame gives better results than the Single Elimination Tournament. We
believe that choosing an evaluation method with a memory of the learning process
is a step in the right direction.
• The question of why the population converges remains unanswered. There might
be several causes:
– The representation and the problem characteristics: perhaps a simple attacking
strategy is just the easiest one for evolution to discover. In terms of
the solution domain it is a local optimum that is impossible to avoid and later
impossible to leave. Improving the representation and/or redefining
the problem might reshape the solution space and therefore enhance the
exploration of diverse strategies.
– Co-evolution and competitive fitness: it is believed that evaluating individuals using themselves as reference points may lead to stagnation of the
evolution process. A possible solution is to put more effort into the design of
the manual teachers reference set.
– stGEP: perhaps the genetic operators amplify the convergence effect
favoring simple phenotypes. Research comparing stGEP to classic GP
might tell more.
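To make the proposed quantitative game result more concrete, one possible formulation (our own notation, given only as an illustrative sketch and not the exact measure used in the experiments) is
\[
r(A,B) \;=\; \sum_{u \in U_A} hp(u) \;-\; \sum_{v \in U_B} hp(v),
\]
where $U_A$ and $U_B$ denote the sets of player A's and player B's units that survived the game and $hp(\cdot)$ is the number of remaining hit points. Such a result rewards a player not only for winning, but for winning convincingly, and it still carries information when both players survive until the time limit.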
8.6.2 Computational cost
The problem that effectively prevented the machine learning from finding good RTT
players is the enormous CPU power needed to conduct a large experiment. In the case
of the relatively small experiments we performed, it took altogether approximately two weeks of
constant computation on around 80 processors! It is unknown how the evolution
would work with larger populations or if it were allowed to go on for hundreds or even
thousands of generations. It seems impossible to do appropriate research, since
one would need a huge computational cluster. But even having one, it is not a good idea
to simply rely on the number of machines. A method that is effective only under the
condition of using thousands of computers will not become popular and will be available
only to a few. We propose trying new approaches:
• Simulation optimization - ORTS is a large and complex framework. The game
simulation requires a server and clients that communicate over the network, which is very
costly. A simple, low-level simulator dedicated to one chosen game would be
many times faster. This may allow larger experiments to be conducted.
• Changing the metaheuristic search method - one source of the strength of EC lies in
the fact that evolution of a population is indeed a parallel search. But maintaining
an entire set of individuals is very costly and - in the case of RTT player learning - perhaps too costly. A well-guided single-individual search may be more efficient than
the “large population” approach.
8.6.3 Redefinition of the problem
Perhaps evolving players as proposed in this thesis is still too challenging for the methods
of machine learning. The task is to create the best strategy for a combat battle, which was
reduced to learning how to maneuver the units. This is a great simplification of the
problem, but maybe one step further is needed. For example, imagine a real battlefield.
When a human commander orders a group of marines to move somewhere, he assumes
the soldiers know how to move. He is certain that they will not run into each other
and block themselves. He knows they will use the shortest path and successfully avoid
all obstacles. Assuming this, the commander can think abstractly and create a great
strategy that will win him the battle. But is this the case for an artificial RTT player? The
answer is no, because in our problem the agents had to learn much more by themselves.
For example, when units were moving, they sometimes blocked each other. Using the
human-commander analogy, it is as if we tried to create the best strategy for soldiers
that do not know how to walk! This is obviously futile, and finding good solutions is much
harder. We have presented in this paper how to break down, step by step, the task
set ahead of the AI. We tend to think that this process should have been taken even
further. We propose several improvements that are good starting points for continuing
our research:
• Implement intelligent behaviour - for example, implement obstacle avoidance and
shortest-path finding for units. Also, the choice of whom to shoot could be made more
wisely. For example, taking into account all enemies in range and their
current hit points can result in destroying them a lot faster than the
“shoot the closest one” method.
• Improve the representation - in the complex representation we had already introduced the idea of context information. Most terminals had several versions - one
for marines, one for tanks, one for units in range, etc. This could be taken even
further - for example, directed terminals could be introduced. The number of units
in range is important, but how much more useful it would be if a unit could know
how many enemy units are in a certain direction. Knowing there are 8 tanks around
is less informative than knowing that 6 of them are on the left and only 2
on the right.
• Change the representation - perhaps the domain knowledge hidden in the terminal
and function sets is too low-level and even all the improvements proposed above
will not change much. On the one hand they will give more expressive power to
the representation, but on the other hand the solution space will become larger and
more complex, thus cancelling the positive effects. The solution to these problems is a representation defined on a higher level - using more abstract types,
fewer terminals and more problem-dedicated functions. For example, instead of
the two terminals NUMBER_OF_MARINES and NUMBER_OF_TANKS, the function
NUMBER(unit) should be used. We believe that a more informative representation
should result from parameterising and creating new functions rather than from adding
more and more very complex terminals.
• Focus on communication between agents - this aspect was somewhat neglected
by us. It does seem that in the evolved strategies units had trouble coordinating
their behaviour. All of them chose where to move independently of the others.
Perhaps it is a good idea to distinguish one or more units as so-called leaders. Then
the behaviour of the ordinary units could depend not only on their local situation
but also on the behaviour of the leaders. This would introduce more cooperation
between units. On the other hand, the learning process could become more
prone to noise.
9 Summary
No matter which way you have to march, it is always uphill.
One of Murphy's Laws of Combat Operations.
9.1 Contribution
The benefits of this work are diverse. Firstly, the paper presented a generic methodology for approaching a very demanding problem such as machine learning of RTT players.
This involved presenting the following steps: defining a game and designing its model,
choosing a learning method along with all the details of solution encoding and evaluation, and performing the actual experiments. Secondly, in the evolutionary computation
field, a novel approach of strongly typed gene expression programming was developed
and its detailed description was presented. It is a very flexible method and can be
applied to many different problems, for instance RTT player learning. We also studied
different co-evolutionary evaluation methods and elaborated on fine-tuning the learning
process. Finally, our work contributed to the field of distributed computation, since we
presented both a theoretical analysis and the real-life design of a master-slave evaluation
framework, along with the details of its implementation.
The goal set at the beginning of this paper was to evolve a good strategy,
perhaps even a human-competitive one. It may be stated that we are midway towards
fulfilling this objective. We put enormous intellectual effort into designing our approach
and performed computations that would take more than 2 years on a single machine.
Still, the challenge of automatically creating AI for tactical combat is extremely
ambitious and hard. In spite of all the work done, the final results might seem a little
disappointing, since the best strategies evolved are quite naive. However, they still
managed to compete with manually designed algorithms at the ORTS RTS Game AI
Competition 2008. They did not win, but put up significant resistance, showing that
our research is heading in the right direction.
Let us not forget that research on AI in real-time games has only just begun.
The amount of literature on this topic is relatively limited and most
papers are just a few years old. We hope this thesis will contribute to the development of science, enriching such fields as distributed machine learning, AI in games and
evolutionary computation.
9.2 Work ahead
There are countless possibilities for continuing our research. Although a lot has been
achieved, much more is still ahead of us. Generally speaking, it is not known exactly how
the different settings and choices we have made influence the learning process. Firstly, one
could try to model the game with the help of neural networks or maybe even some rule-based
methods rather than a multi-agent system using expression trees. Making an objective
comparison between GP and stGEP is another road to follow. There are also some
aspects of artificial evolution we disregarded, for example the selection method,
which was set at the beginning of our work and remained unchanged to the very
end. In terms of improving our approach and making it more effective, there
are so many ideas that we do not know where to begin. For more detailed conclusions
drawn from the experiments and more suggestions for the future, please see Section 8.6.
It is in this section that we propose to continue research on evolution dynamics in the
context of RTT games, focus on optimization issues, and even try to redefine the problem
or use other machine learning methods.
One thing is certain - teaching a machine to play an intellectually demanding, heavily
time-constrained game will remain an open problem for a long time. Let us hope this
great challenge of pushing AI to new levels will draw more and more attention from the
scientific community. We believe that the game is worth the candle.
A DVD content
The DVD attached to this thesis contains:
• The software environment described in Chapter 7.
• Raw results of the experiments described in Chapter 8.
B Acronyms
AI    Artificial Intelligence
RTS   Real Time Strategy
RTT   Real Time Combat
MAS   Multi Agent System
SET   Single Elimination Tournament
HoF   Hall of Fame
EC    Evolutionary Computing
GA    Genetic Algorithms
EP    Evolutionary Programming
ES    Evolutionary Strategies
GEP   Gene Expression Programming
GP    Genetic Programming
st    strongly typed
IS    Insertion Sequence
RIS   Root Insertion Sequence
ORTS  Open Real Time Strategy
CPU   Central Processing Unit
GPU   Graphics Processing Unit
Bibliography
[1] 2008 ORTS RTS Game AI Competition, http://www.cs.ualberta.ca/~mburo/orts/aiide08/index.html, 2008.
[2] European games developer federation, http://www.egdf.eu/index.html, 2008.
[3] ORTS - a free software RTS game engine, http://www.cs.ualberta.ca/~mburo/orts/,
2008.
[4] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/evolutionary_algorithm,
2008.
[5] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/military_strategy,
2008.
[6] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/multi-agent_system, 2008.
[7] Peter J. Angeline and Jordan B. Pollack. Competitive environments evolve better
solutions for complex tasks. In Stephanie Forrest, editor, Proceedings of the 5th International Conference on Genetic Algorithms, ICGA-93, pages 264–270, University
of Illinois at Urbana-Champaign, 17-21 July 1993. Morgan Kaufmann.
[8] Jarosław Arabas. Wykłady z algorytmów ewolucyjnych, volume 303. Wydawnictwa
Naukowo-Techniczne, Warszawa, 2001.
[9] E. S. Association. 2008 sales, demographics and usage data. Essential facts about
the computer and video game industry, 2008.
[10] Yaniv Azaria and Moshe Sipper. GP-gammon: Genetically programming backgammon players. Genetic Programming and Evolvable Machines, 6(3):283–300, sep 2005.
Published online: 12 August 2005.
[11] M. Buro. Call for AI research in RTS games. AAAI-04 AI in Games Workshop, San
Jose, 2004.
[12] M. Buro and T. Furtak. RTS games as test-bed for real-time research. Invited Paper
at the Workshop on Game AI, JCIS, 2003.
[13] Michael Buro. Game 4 description, http://www.cs.ualberta.ca/~mburo/orts/aiide08/game4, 2008.
[14] Michael Buro and Timothy Furtak. On the development of a free rts game engine.
In GameOn’NA Conference Montreal, 2005.
[15] Michael Buro and Timothy Furtak. ORTS Competition: Getting Started, February
2008.
[16] Nichael Lynn Cramer. A representation for the adaptive generation of simple sequential programs. In Proceedings of an International Conference on Genetic Algorithms
and the Applications, pages 183–187, 1985.
[17] Raphael Crawford-Marks. Virtual witches and warlocks: Computational evolution
of teamwork and strategy in a dynamic, heterogeneous and noisy 3D environment.
Division iii (senior) thesis, School of Cognitive Science, Hampshire College, 18 May
2004.
[18] Candida Ferreira. Gene expression programming: a new adaptive algorithm for
solving problems. COMPLEX SYSTEMS, 13:87, 2001.
[19] Candida Ferreira. Gene expression programming in problem solving. In Rajkumar Roy, Mario Köppen, Seppo Ovaska, Takeshi Furuhashi, and Frank Hoffmann,
editors, Soft Computing and Industry Recent Applications, pages 635–654. Springer-Verlag, 10–24 September 2001. Published 2002.
[20] Cândida Ferreira. Function finding and the creation of numerical constants in gene
expression programming. In 7th Online World Conference on Soft Computing in
Industrial Applications, September 23 - October 4 2002. on line.
[21] Candida Ferreira. Mutation, transposition, and recombination: An analysis of the
evolutionary dynamics. In Manuel Grana Romay and Richard Duro, editors, 4th
International Workshop on Frontiers in Evolutionary Algorithms, North Carolina,
USA, 8-14 March 2002.
[22] Candida Ferreira. Designing neural networks using gene expression programming.
In Ajith Abraham and Mario Köppen, editors, 9th Online World Conference on Soft
Computing in Industrial Applications, page Paper No. 14, On the World Wide Web,
20 September - 8 October 2004.
[23] Candida Ferreira. GEP home page, http://www.gene-expression-programming.com/, 2008.
[24] L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated
Evolution. John Wiley, New York, USA, 1966.
[25] Johannes Fürnkranz. Machine learning in games: a survey. pages 11–59, 2001.
[26] Bruce Geryk. A history of real-time strategy games,
http://www.gamespot.com/gamespot/features/all/real_time, 2008.
[27] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine
Learning. Addison-Wesley Professional, January 1989.
[28] Ami Hauptman and Moshe Sipper. GP-endchess: Using genetic programming to
evolve chess endgame players. In EuroGP, pages 120–131, 2005.
[29] Ami Hauptman and Moshe Sipper. Evolution of an efficient search algorithm for
the mate-in-N problem in chess. In Marc Ebner, Michael O’Neill, Anikó Ekárt,
Leonardo Vanneschi, and Anna Isabel Esparcia-Alcázar, editors, Proceedings of the
10th European Conference on Genetic Programming, volume 4445 of Lecture Notes
in Computer Science, pages 78–89, Valencia, Spain, 11 - 13 April 2007. Springer.
[30] J.C. Herz and Michael R. Macedonia. Computer games and the military: Two views.
Defense Horizons, Center for Technology and National Security Policy, National
Defense University, 11, April 2002.
[31] J. R. Koza. Hierarchical genetic algorithms operating on populations of computer
programs. In N. S. Sridharan, editor, Proceedings of the Eleventh International Joint
Conference on Artificial Intelligence IJCAI-89, volume 1, pages 768–774, Detroit,
MI, USA, 20-25 August 1989. Morgan Kaufmann.
[32] S. Luke and R.P. Wiegand. When coevolutionary algorithms exhibit evolutionary
dynamics. pages 236–241, 2002.
[33] S. Luke and R.P. Wiegand. Guaranteeing coevolutionary objective measures. Poli
et al.[201], pages 237–251, 2003.
[34] Liviu Panait and Sean Luke. A comparison of two competitive fitness functions. In
GECCO ’02: Proceedings of the Genetic and Evolutionary Computation Conference,
pages 503–511, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
[35] Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the
art. Autonomous Agents and Multi-Agent Systems, 11(3):387–434, November 2005.
[36] Riccardo Poli, William B. Langdon, Nicholas F. McPhee, and John R. Koza. Genetic
programming an introductory tutorial and a survey of techniques and applications.
Technical Report CES-475, Department of Computing and Electronic Systems, University of Essex, UK, October 2007.
[37] Christopher D. Rosin and Richard K. Belew. Methods for competitive co-evolution:
Finding opponents worth beating. In Proceedings of the 6th International Conference
on Genetic Algorithms, pages 373–381, San Francisco, CA, USA, 1995. Morgan
Kaufmann Publishers Inc.
[38] Christopher D. Rosin and Richard K. Belew. New methods for competitive coevolution. Evol. Comput., 5(1):1–29, 1997.
[39] J. K. Rowling. Harry Potter and the sorcerer’s stone. Scholastic, New York, 1999.
[40] S. Sharabi and M. Sipper. GP-sumo: Using genetic programming to evolve
sumobots. Genetic Programming and Evolvable Machines, 7(3):211–230, 2006.
[41] S. F. Smith. A Learning System Based on Genetic Adaptive Algorithms. Phd thesis,
University of Pittsburgh, 1980.