Additional Coursework
Intelligent Robotics course
Boris Mocialov
Nigar Mehraliyeva
Alipasha Jamalli
Heriot-Watt University
Edinburgh, United Kingdom
[email protected]
Heriot-Watt University
Edinburgh, United Kingdom
[email protected]
Heriot-Watt University
Edinburgh, United Kingdom
[email protected]
Abstract—This paper describes implementation and
evaluation processes, and the results of the experiments carried out
to achieve line-following and obstacle-avoidance behavior for the e-puck robot.
Keywords—evolutionary robotics; behavior-based robotics;
controller; neural network; fitness function
This paper describes the implementation and evaluation of an
e-puck robot controller and the differences between the
behavior-based and evolutionary robotics approaches. The task
of the robot is to follow a line on the ground at all times and to
avoid obstacles while racing to finish the circuit as quickly as
possible. Evolutionary robotics techniques were applied to
evolve and evaluate the robot controller. The robot controllers
were developed using the Webots simulator.
A. Implementation
All simulations were run with the following set-up:
Rank selection
Mutate every gene with mutation probability and
mutation deviation
Two-point crossover
Elite part: 10%
Mutation probability: 10%
Mutation deviation: 20%
Population: 50
Generations: 1000
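The set-up above can be sketched as one generation step of a genetic algorithm. This is a minimal illustration, not the code used in the experiments: it assumes linear rank weights for rank selection, interprets "mutation deviation" as the standard deviation of Gaussian noise, and all function and constant names are hypothetical.

```python
import random

ELITE_FRACTION = 0.10      # "Elite part: 10%"
MUTATION_PROB = 0.10       # "Mutation probability: 10%"
MUTATION_DEVIATION = 0.20  # "Mutation deviation: 20%" (assumed Gaussian std. dev.)


def rank_select(ranked, rng):
    """Pick one genome from a best-first list; weight decreases linearly with rank."""
    n = len(ranked)
    weights = [n - r for r in range(n)]  # best genome gets weight n, worst gets 1
    return rng.choices(ranked, weights=weights, k=1)[0]


def two_point_crossover(a, b, rng):
    """Copy parent a and replace the segment between two cut points with parent b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genome, rng):
    """Perturb each gene independently with probability MUTATION_PROB."""
    return [
        g + rng.gauss(0.0, MUTATION_DEVIATION) if rng.random() < MUTATION_PROB else g
        for g in genome
    ]


def next_generation(population, fitnesses, rng):
    """Build the next population: elites are copied, the rest are bred."""
    order = sorted(range(len(population)), key=lambda k: fitnesses[k], reverse=True)
    ranked = [population[k] for k in order]
    n_elite = max(1, int(ELITE_FRACTION * len(ranked)))
    children = [list(g) for g in ranked[:n_elite]]  # elites survive unchanged
    while len(children) < len(population):
        child = two_point_crossover(
            rank_select(ranked, rng), rank_select(ranked, rng), rng
        )
        children.append(mutate(child, rng))
    return children
```

With a population of 50, the elite part corresponds to the five best genomes being copied into the next generation without crossover or mutation.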
A multilayer perceptron neural network is used to make
decisions in the robot controller. The input layer consists of eleven
neurons, which correspond to the sensors (eight proximity
sensors and three ground sensors), and the output layer consists of
two neurons, which correspond to the wheel speeds (right and left).
As there is no standard way to determine the appropriate
artificial neural network structure, different numbers of hidden
layers with different numbers of neurons per layer were
tried. Network structures without bias and with one, two, three,
and four hidden layers were tested, first with five neurons
in each hidden layer and then with the number of neurons
in the second hidden layer varied between two and ten.
The sigmoid function was selected as the activation function.
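A bias-free multilayer perceptron of this kind can be sketched as follows. The flat genome layout (weights stored layer by layer, neuron by neuron) and the function names are assumptions made for illustration; the layer sizes 11, 5, and 2 match one of the tested configurations.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def weight_count(sizes):
    """Number of weights in a fully connected, bias-free network."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def forward(inputs, genome, sizes):
    """Propagate sensor inputs through the network defined by a flat genome."""
    if len(inputs) != sizes[0] or len(genome) != weight_count(sizes):
        raise ValueError("input or genome length does not match layer sizes")
    activations = list(inputs)
    pos = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        layer = []
        for _ in range(n_out):
            weights = genome[pos:pos + n_in]
            pos += n_in
            layer.append(sigmoid(sum(w * a for w, a in zip(weights, activations))))
        activations = layer
    return activations  # two values in (0, 1), one per wheel
```

For sizes `[11, 5, 2]` the genome holds 11·5 + 5·2 = 65 weights; each additional hidden layer of five neurons adds 25 more.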
With the neural network architectures described above,
the robot tended to go around the field in every evaluation and
did not return to the line after avoiding the first obstacle on its
way. Changing the number of hidden layers and the
number of neurons in each layer did not affect performance.
Another approach was based on splitting the neural
network into two parts, where one section is responsible for
line following and the other for obstacle avoidance. Only one
of the two sub-networks is active at a time, depending on
whether the robot is currently following a line or avoiding an
obstacle. A sigmoid activation function was chosen to propagate
the signal that arrives at a node. To introduce more flexibility into the
network, the slope of the activation function was made
variable. The slope of every node was evolved using the
same genetic algorithm that was used to evolve the
weights.
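The split controller with an evolved slope per node can be illustrated with the sketch below. It is deliberately simplified: each sub-network is a single layer, the obstacle-avoidance part reads only the proximity sensors, the line-following part reads only the ground sensors, and the switch between them uses a hypothetical proximity threshold. None of these details is stated in the paper; they are assumptions for the example.

```python
import math

OBSTACLE_THRESHOLD = 0.5  # hypothetical normalized proximity level


def custom_sigmoid(x, slope):
    """Logistic function 1 / (1 + exp(-slope * x)) with a per-node slope."""
    z = slope * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs, weights, slope):
    """Weighted sum of the inputs passed through the sloped sigmoid."""
    return custom_sigmoid(sum(w * x for w, x in zip(weights, inputs)), slope)


def split_controller(proximity, ground, avoid_net, follow_net):
    """Return [left, right] outputs from whichever sub-network is active.

    Each net is a list of two (weights, slope) pairs, one per wheel.
    """
    if max(proximity) > OBSTACLE_THRESHOLD:
        inputs, net = proximity, avoid_net   # obstacle nearby: avoidance part
    else:
        inputs, net = ground, follow_net     # otherwise: line-following part
    return [neuron(inputs, weights, slope) for weights, slope in net]
```

In this layout the genome would contain both the weights and the slopes, so the same mutation and crossover operators act on both.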
B. Evolution
Different crossover operators, such as box crossover,
line crossover, and two-point crossover, were tested, each with
different probabilities. However, the resulting e-puck
behavior did not differ much between the crossover operators
or their probabilities.
Applying the mutation operator with different probabilities
also did not lead to better performance (the behavior of
the robot did not differ much).
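For real-valued genomes, line and box crossover are commonly defined as below. This sketch follows that common definition and is an assumption about the operators used here: line crossover places the child on the segment between the parents using one shared random coefficient, while box crossover draws an independent coefficient per gene, so the child lies in the axis-aligned box spanned by the parents.

```python
import random


def line_crossover(a, b, rng):
    """Child on the line segment between parents a and b (one shared coefficient)."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


def box_crossover(a, b, rng):
    """Child inside the box spanned by a and b (one coefficient per gene)."""
    return [x + rng.random() * (y - x) for x, y in zip(a, b)]
```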
Changing rank selection to tournament selection also did not
change the behavior of the e-puck robot much.
Different versions of the fitness function were applied to reward
task achievement. The applied fitness functions are:
1. The fitness function is the sum of two terms, as shown in
equation (1). The first term is the function used by Nolfi and
Stefano (2000) for a Khepera robot to avoid obstacles. The
second term was added by us to reward line following.
$$f = V(1 - \sqrt{\Delta v})(1 - i) + \mathit{on\_line}, \tag{1}$$
where $V$ is the normalized sum of the rotation speeds of
the two wheels, $\Delta v$ is the normalized absolute value
of the algebraic difference between the signed speed
values of the wheels (positive is one direction, negative
the other), and $i$ is the normalized activation value of the
infrared sensor with the highest activity. on_line equals
1 if all three ground sensors are on the line, 0.66 if two
ground sensors are on the line, 0.33 if only one ground
sensor is on the line, and zero if none of the ground
sensors is on the line. A higher value of the fitness
function corresponds to better performance of the robot.
The component $V$ encourages motion, $(1 - \sqrt{\Delta v})$
encourages straight displacement, $(1 - i)$ encourages
obstacle avoidance (but without saying in what direction
the robot should move), and on_line encourages
following the line.
2. $$f = (1 - \sqrt{\Delta v})(1 - i)\,\mathit{on\_line}, \tag{2}$$
where the variables are defined as in equation (1), except
that on_line is set to 0.0001 when it would equal 0.
3. $$f = V + (1 - \sqrt{\Delta v}) + (1 - i) + \mathit{on\_line}, \tag{3}$$
where the variables are defined as in equation (1).
4. $$f = \frac{V + (1 - \sqrt{\Delta v}) + (1 - i) + \mathit{on\_line}}{4}, \tag{4}$$
where the variables are defined as in equation (1).
5. The fitness function based on the formula used by
Nordin and Banzhaf (1997) for avoiding obstacles:
$$f = \alpha(w_1 + w_2 - |w_1 - w_2|) - \beta \sum_{i=0}^{8} s_i, \tag{5}$$
where $\sum_{i=0}^{8} s_i$ is the sum of all proximity sensor
values, $w_1$ is the left wheel speed, $w_2$ is the right wheel
speed, and $\alpha$ and $\beta$ are constants. The values
$\alpha$ = 16,1 and $\beta$ = 1 were used in our experiments.
6. An attempt was made to consider all behaviour
possibilities: punish hitting obstacles if the maximum
proximity sensor value is above the obstacle threshold,
punish oscillation if the absolute value of the algebraic
difference between the signed speed values of the
wheels is greater than 50, punish standing still if both
wheel speeds are equal to zero, reward fast speed if both
wheel speeds are greater than half of the maximum
wheel speed, and otherwise punish low speed.
7. In equation (1), a punishment of 2 marks was added to
penalise oscillatory movement. The fitness function
in this case is $f = f - 2$ when the absolute value of the
algebraic difference between the signed speed values
of the wheels is greater than 50.
8. $$f = \frac{V(1 - \sqrt{\Delta v})(1 - i) + \mathit{on\_line}}{z}, \tag{6}$$
where $z$ is the absolute value of the difference between
$V(1 - \sqrt{\Delta v})(1 - i)$ and on_line. The other variables
are defined as in equation (1). This function is used to give
equal importance to both tasks: following the line and
avoiding the obstacles.
9. The fitness function used by Nordin and Banzhaf
(1995) to achieve obstacle-avoidance behaviour for a
Khepera robot:
$$f = \sum p_i + |15 - w_1| + |15 - w_2| + |w_1 - w_2|, \tag{7}$$
where $\sum p_i$ is the sum of all proximity sensor values, $w_1$
is the left wheel speed, and $w_2$ is the right wheel speed.
10. $$f = \left(V(1 - \sqrt{\Delta v})(1 - i) + \mathit{on\_line}\right) - \left|V(1 - \sqrt{\Delta v})(1 - i) - \mathit{on\_line}\right|, \tag{8}$$
where $|\cdot|$ denotes the absolute value and the other
variables are defined as in equation (1).
Using the various neural network structures without
splitting into sections, together with the different
fitness functions listed above, did not affect the robot
performance significantly. The best performance was obtained
only by splitting the neural network into
two sections as discussed above, adding biases, and using the
fitness function shown in equation (8). The neural
network structure corresponding to the best result is presented
in Figure 1. The set-up for the best performance is as follows:
Not fully connected neural network
Rank selection
Mutate every gene with mutation probability and
mutation deviation
Two-point crossover
Elite part: 10%
Mutation probability: 10%
Mutation deviation: 20%
Fitness function:
$f = \left(V(1 - \sqrt{\Delta v})(1 - i) + \mathit{on\_line}\right) - \left|V(1 - \sqrt{\Delta v})(1 - i) - \mathit{on\_line}\right|$
(the definition of this function was given above)
Population: 50
Generations: 1000
Activation function: Custom sigmoid * MAX_SPEED1 || Sigmoid * MAX_SPEED2
Custom sigmoid = 1/(1 + exp(-slope * x))
slope: evolved
MAX_SPEED1 = 500
MAX_SPEED2 = 200
Evaluation time: 60 seconds
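The fitness function of equation (1) and the one listed in the set-up above can be sketched for a single time step as follows. The sketch assumes that V, Δv, and i are already normalized to the range 0 to 1, as the variable definitions state; the function names are hypothetical, and how per-step values are accumulated over the 60-second evaluation is not shown.

```python
import math

# on_line values from the text: 0, 1, 2, or 3 ground sensors on the line
ON_LINE_SCORES = {0: 0.0, 1: 0.33, 2: 0.66, 3: 1.0}


def on_line_score(ground_on_line):
    """Map three booleans (one per ground sensor) to the on_line value."""
    return ON_LINE_SCORES[sum(bool(g) for g in ground_on_line)]


def avoidance_term(V, dv, i):
    """V * (1 - sqrt(dv)) * (1 - i): fast, straight, away from obstacles."""
    return V * (1.0 - math.sqrt(dv)) * (1.0 - i)


def fitness_eq1(V, dv, i, on_line):
    """Sum of the obstacle-avoidance term and the line-following reward."""
    return avoidance_term(V, dv, i) + on_line


def fitness_best(V, dv, i, on_line):
    """(a + on_line) - |a - on_line|, with a the obstacle-avoidance term."""
    a = avoidance_term(V, dv, i)
    return (a + on_line) - abs(a - on_line)
```

Since (a + b) − |a − b| = 2·min(a, b), the second function is high only when both the avoidance term and on_line are high, whereas the plain sum in equation (1) can be raised by improving either term alone.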
Figure 1. The neural network structure
Figure 2. The fitness function
Figure 3. Trajectory of e-puck robot corresponding to the best
The fitness plot up to generation 250 for the best
performance of the robot is presented in Figure 2. The
trajectory corresponding to the best behavior of the e-puck
robot is shown in Figure 3. This best trajectory was achieved when the
fitness function equalled 0.312671, a value found in the
148th generation. After the 148th generation
the value of the fitness function increased, whereas
the behavior of the robot became less desirable. The weights
found for the best result: {1.002151 0.038582 -0.237155 0.069138 0.991504 0.633540 1.014423 1.520142 1.176184
0.255908 0.723387 0.472929 0.551734 -0.754705 1.906338
0.342988 0.765217 -0.218661 0.848782 0.882759 0.745948
0.504366 0.686922 0.140097 0.859544 -0.842507 0.538433
0.673799 0.687551 0.036975 0.836208 0.585567 0.423217
1.177922 0.204886 0.566049 0.789327 0.550478 1.505674
0.193723 0.288644 0.363805 -0.303193 0.623066 1.186009
0.128952 0.966394 -0.738753 0.630619 0.045746 -0.303255
1.122907 1.185418 0.922135 0.283408 0.374248 0.090419
0.743278 0.759668 0.837929 0.027833 0.410441 -0.004732
0.563925 0.156509 1.103527 0.132984 0.952096 1.106409
0.818443 0.921981}
As Nolfi and Stefano (2000) mention, in both the evolutionary and
behavior-based robotics approaches the environment plays a major
role in determining the basic behaviors. The behavior-based
robotics approach relies on combining basic behaviors.
Depending on the environment, the global behavior of the robot
arises from the interaction between the basic behaviors. A coordination
mechanism determines which behavior is stronger at a specific
time. Behaviors are gradually adjusted, and the resulting
behaviors are examined by the designer, until the desired robot
behavior is obtained. There are two types of coordination
mechanism: competitive and cooperative. In the competitive
method the output depends on only one behavior, while in the
cooperative method the output may depend on several
behaviors with different strengths. However, it is not clear how a
desired behavior should be decomposed, and it is very
difficult to perform such a decomposition by hand.
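The two coordination mechanisms can be illustrated with a short sketch. The representation of a behavior output as a (strength, (left, right)) pair and the function names are assumptions made for the example.

```python
def competitive(outputs):
    """Winner takes all: the command of the strongest behavior is used."""
    _, command = max(outputs, key=lambda o: o[0])
    return command


def cooperative(outputs):
    """Blend all commands, weighting each by its behavior's strength."""
    total = sum(strength for strength, _ in outputs)
    if total == 0:
        return (0.0, 0.0)  # no behavior is active
    left = sum(s * c[0] for s, c in outputs) / total
    right = sum(s * c[1] for s, c in outputs) / total
    return (left, right)
```

For example, with a line-following behavior of strength 0.75 commanding (1.0, 0.0) and an avoidance behavior of strength 0.25 commanding (0.0, 1.0), the competitive method outputs (1.0, 0.0), while the cooperative method outputs (0.75, 0.25).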
According to Illah and Nourbakhsh (2004), a behavior-based
system may have multiple active behaviors at any one time.
Even when individual behaviors are tuned to optimize
performance, this fusion and rapid switching between multiple
behaviors can negate that fine-tuning. The behavior-based
approach does not directly scale to other environments or to
larger environments.
Bräunl (2008) describes that in a behavior-based system a
certain number of behaviors run as parallel processes. While
each behavior can access all sensors, only one behavior can have
control over the robot’s actuators or driving mechanism.
Therefore, an overall controller is required to coordinate
behavior selection, behavior activation, or behavior output
merging at appropriate times to achieve the desired objective.
Early behavior-based systems such as Brooks (1986) used a
fixed priority ordering of behaviors. For example, the wall
avoidance behavior always has priority over the foraging
behavior. Such a rigid system is very restricted in its
capabilities and becomes difficult to manage with increasing
system complexity.
Unlike the behavior-based approach, evolutionary robotics
relies on an evaluation of the system as a whole. In this
case the designer is not required to decide how to split the
desired behavior into simple basic behaviors [1]. This allows
robotic systems to adapt to unpredictable or changing
environments without human intervention [3]. Evolutionary
methods give robotic controllers the advantages of
relatively fast adaptation and carefree operation. The main
aim of the evolutionary robotics approach is to design
robots or robot controllers autonomously. This means that the inner
workings of such robots are not explicitly described [2].
The difference between the behavior-based and evolutionary
approaches is shown in Figures 4 and 5. According to Nolfi, in
the behavior-based approach the desired behavior is divided
by the designer into a set of basic behaviors, which are
implemented in separate sub-sections of the robot's control
system (Figure 4). In evolutionary robotics the designer does not
need to decide how to divide the desired behavior into basic
behaviors (Figure 5). The way in which a desired behavior is
divided into modules is the result of a self-organization process.
Systems with self-organization capabilities can execute
tasks in unforeseen environments and adapt to dynamic
conditions [3].
Figure 4. Behavior-based approach [1]
Figure 5. Evolutionary approach [1]
