Additional Coursework
Intelligent Robotics course

Boris Mocialov, Heriot-Watt University, Edinburgh, United Kingdom, [email protected]
Nigar Mehraliyeva, Heriot-Watt University, Edinburgh, United Kingdom, [email protected]
Alipasha Jamalli, Heriot-Watt University, Edinburgh, United Kingdom, [email protected]

Abstract: This paper describes the implementation and evaluation of an e-puck robot controller, together with the results of the experiments carried out to achieve line-following and obstacle-avoidance behavior.

Keywords: evolutionary robotics; behavior-based robotics; controller; neural network; fitness function

I. INTRODUCTION

This paper describes the implementation and evaluation of an e-puck robot controller and the differences between the behavior-based and evolutionary robotics approaches. The task of the robot is to follow the line on the ground at all times and to avoid obstacles while racing to finish the circuit as quickly as possible. Evolutionary robotics techniques were applied to evaluate the robot controller, and the robot controllers were developed using the Webots simulator.

II. IMPLEMENTATION AND EVOLUTION

A. Implementation

All simulations were run with the following set-up:
• Rank selection
• Mutate every gene with mutation probability and mutation deviation
• Two-point crossover
• Elite part: 10%
• Mutation probability: 10%
• Mutation deviation: 20%
• Population: 50
• Generations: 1000

A multilayer perceptron neural network is used to make decisions for the robot controller. The input layer consists of eleven neurons, which correspond to the sensors (eight proximity sensors and three ground sensors), and the output layer consists of two neurons, which correspond to the right and left wheel speeds. As there is no standard way to determine the appropriate artificial neural network structure, different numbers of hidden layers with different numbers of neurons in each layer were tried.
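The genetic-algorithm set-up listed at the start of this subsection could be sketched roughly as follows. This is a minimal illustration rather than the code used in the experiments: the linear rank weights, the reading of "mutation deviation" as the standard deviation of Gaussian noise, and all function names are assumptions introduced here.

```python
import random

# Values taken from the set-up list; everything else is illustrative.
POPULATION = 50
ELITE_FRACTION = 0.10
MUTATION_PROB = 0.10
MUTATION_DEV = 0.20  # assumed to be the std. dev. of Gaussian noise


def rank_select(ranked, rng):
    """Pick one genome from a best-first list using linear rank weights."""
    n = len(ranked)
    weights = [n - r for r in range(n)]  # best gets n, worst gets 1
    return rng.choices(ranked, weights=weights, k=1)[0]


def two_point_crossover(a, b, rng):
    """Copy parent a, replacing the slice between two cut points with b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genome, rng):
    """Perturb each gene independently with probability MUTATION_PROB."""
    return [
        g + rng.gauss(0.0, MUTATION_DEV) if rng.random() < MUTATION_PROB else g
        for g in genome
    ]


def next_generation(population, fitnesses, rng):
    """Build the next population: elite copies first, then mutated offspring."""
    pairs = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
    ranked = [genome for _, genome in pairs]
    n_elite = max(1, int(ELITE_FRACTION * len(ranked)))
    children = [list(g) for g in ranked[:n_elite]]  # elite kept unchanged
    while len(children) < len(ranked):
        mother = rank_select(ranked, rng)
        father = rank_select(ranked, rng)
        children.append(mutate(two_point_crossover(mother, father, rng), rng))
    return children
```

Calling `next_generation` once per generation, with fitness values obtained from a simulated evaluation run, would reproduce the overall loop; the evaluation itself depends on the simulator and is not shown.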
A neural network structure without bias was tested with one, two, three, and four hidden layers, first with five neurons in each hidden layer and then with the number of neurons in the second hidden layer varied between two and ten. The sigmoid function was selected as the activation function. With these architectures the robot tended to go around the field at every evaluation, without coming back to the line after avoiding the first obstacle in its way. Changing the number of hidden layers and the number of neurons in each layer did not affect the performance.

Another approach was based on splitting the neural network into two parts, where one sub-network is responsible for line following and the other for obstacle avoidance. Only one of the two sub-networks is active at a time, depending on whether the robot is currently following the line or avoiding an obstacle. A sigmoid activation function was chosen to propagate the signal that arrives at a node. To introduce more flexibility into the network, a variable slope of the activation function was used. The slope of every node was evolved using the same genetic algorithm that was used for the evolution of the weights.

B. Evolution

Different crossover operators (box crossover, line crossover, and two-point crossover), each with different probabilities, were tested. The resulting e-puck behavior did not differ much between crossover operators or probabilities. Applying the mutation operator with different probabilities also did not lead to better performance. Likewise, the behavior did not differ much when rank selection was replaced by tournament selection.

Different versions of the fitness function were applied to reward task achievement. The applied fitness functions are the following:

1. The fitness function is the sum of two terms, as shown in equation (1). The first term is the function used by Nolfi and Floreano (2000) for a Khepera robot to avoid obstacles. The second term was added by us to reward line following.

$f = V (1 - \sqrt{\Delta v}) (1 - i) + \mathit{on\_line}$, (1)

where $V$ is the normalized sum of the rotation speeds of the two wheels, $\Delta v$ is the normalized absolute value of the algebraic difference between the signed speed values of the wheels (positive is one direction, negative the other), and $i$ is the normalized activation value of the infrared sensor with the highest activity. $\mathit{on\_line}$ equals 1 if all three ground sensors are on the line, 0.66 if two ground sensors are on the line, 0.33 if only one ground sensor is on the line, and 0 if none of the ground sensors is on the line. A higher value of the fitness function corresponds to better performance of the robot. The component $V$ encourages motion, $(1 - \sqrt{\Delta v})$ encourages straight displacement, $(1 - i)$ encourages obstacle avoidance (but without saying in what direction the robot should move), and $\mathit{on\_line}$ encourages following the line.

2. $f = (1 - \sqrt{\Delta v}) (1 - i) \, \mathit{on\_line}$, (2)

where the variables are defined as in equation (1), except that $\mathit{on\_line}$ is set to 0.0001 when it would otherwise equal 0.

3. $f = V + (1 - \sqrt{\Delta v}) + (1 - i) + \mathit{on\_line}$, (3)

where the variables are defined as in equation (1).

4. $f = (V + (1 - \sqrt{\Delta v}) + (1 - i) + \mathit{on\_line}) / 4$, (4)

where the variables are defined as in equation (1).

5. $f = (V (1 - \sqrt{\Delta v}) (1 - i) + \mathit{on\_line}) / z$, (5)

where $z$ is the absolute value of the difference between $V (1 - \sqrt{\Delta v}) (1 - i)$ and $\mathit{on\_line}$. The other variables are defined as in equation (1). This function is used to give equal importance to both tasks, following the line and avoiding the obstacles.

6. A punishment was added to equation (1) to penalise oscillatory movement by 2 marks. The fitness function in this case is $f \leftarrow f - 2$ when the absolute value of the algebraic difference between the signed speed values of the wheels is greater than 50.

7. The fitness function used by Nordin and Banzhaf (1995) to achieve obstacle-avoidance behaviour for a Khepera robot:

$f = \sum p_i + |15 - w_1| + |15 - w_2| + |w_1 - w_2|$, (6)

where $\sum p_i$ is the sum of all proximity sensor values, $w_1$ is the left wheel speed, and $w_2$ is the right wheel speed.

8. The fitness function based on the formula used by Nordin and Banzhaf (1997) for avoiding obstacles:

$f = \alpha (w_1 + w_2 - |w_1 - w_2|) - \beta \left( \sum_{i=0}^{8} s_i \right)$, (7)

where $\sum_{i=0}^{8} s_i$ is the sum of all proximity sensor values, $w_1$ is the left wheel speed, $w_2$ is the right wheel speed, and $\alpha$ and $\beta$ are constants. The values $\alpha$ = 16,1 and $\beta$ = 1 were used in our experiments.

9. An attempt was made to consider all behaviour possibilities: punish hitting obstacles if the maximum of the proximity sensor values is above the obstacle threshold; punish oscillation if the absolute value of the algebraic difference between the signed speed values of the wheels is greater than 50; punish standing still if both wheel speeds are equal to zero; reward fast speed if both wheel speeds are greater than half of the maximum wheel speed, and otherwise punish low speed.

10. $f = (V (1 - \sqrt{\Delta v}) (1 - i) + \mathit{on\_line}) - |V (1 - \sqrt{\Delta v}) (1 - i) - \mathit{on\_line}|$, (8)

where $|\cdot|$ denotes the absolute value and the other variables are defined as in equation (1).

III. RESULTS

Using various neural network structures without splitting them into sections, combined with the different fitness functions listed above, did not affect the robot performance significantly. The best performance was obtained only by splitting the neural network into two sections as discussed above, adding biases, and using the fitness function shown in equation (8). The neural network structure corresponding to the best result is presented in Figure 1.

Figure 1. The neural network structure

The set-up for the best performance is as follows:
• Not fully connected neural network
• Rank selection
• Mutate every gene with mutation probability and mutation deviation
• Two-point crossover
• Elite part: 10%
• Mutation probability: 10%
• Mutation deviation: 20%
• Fitness function: $f = (V (1 - \sqrt{\Delta v}) (1 - i) + \mathit{on\_line}) - |V (1 - \sqrt{\Delta v}) (1 - i) - \mathit{on\_line}|$, as defined in equation (8)
• Population: 50
• Generations: 1000
• Activation function: Custom sigmoid * MAX_SPEED1 || Sigmoid * MAX_SPEED2
• Custom sigmoid = 1/(1 + exp(-slope * x)), where slope is evolved
• MAX_SPEED1 = 500
• MAX_SPEED2 = 200
• Evaluation time: 60 seconds

The plot of the fitness function up to generation 250 for the best-performing run is presented in Figure 2. The trajectory corresponding to the best behavior of the e-puck robot is shown in Figure 3.

Figure 2. The fitness function

Figure 3. Trajectory of e-puck robot corresponding to the best performance

This best trajectory was achieved when the fitness function equaled 0.312671, a value found in the 148th generation. After the 148th generation the value of the fitness function started to increase but, on the contrary, the behavior of the robot became less desirable.
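As an illustration of equations (1) and (8) and of the custom sigmoid listed in the set-up, the following sketch computes the fitness for one set of normalized readings. How the value is aggregated over the 60-second evaluation is not stated in this paper, so only the per-sample computation is shown; the function and argument names are hypothetical.

```python
import math


def on_line_score(ground_on_line):
    """Map three ground-sensor flags to the on_line term of equation (1)."""
    count = sum(bool(s) for s in ground_on_line)
    return {0: 0.0, 1: 0.33, 2: 0.66, 3: 1.0}[count]


def obstacle_term(v, dv, i):
    """V (1 - sqrt(dv)) (1 - i); all inputs assumed normalized to [0, 1]."""
    return v * (1.0 - math.sqrt(dv)) * (1.0 - i)


def fitness_eq1(v, dv, i, on_line):
    """Equation (1): obstacle-avoidance term plus line-following reward."""
    return obstacle_term(v, dv, i) + on_line


def fitness_eq8(v, dv, i, on_line):
    """Equation (8): sum of the two terms minus their absolute difference."""
    a = obstacle_term(v, dv, i)
    return (a + on_line) - abs(a - on_line)


def custom_sigmoid(x, slope):
    """Sigmoid with a per-node slope, 1 / (1 + exp(-slope * x))."""
    return 1.0 / (1.0 + math.exp(-slope * x))
```

Since a + b - |a - b| = 2 min(a, b), equation (8) rewards a sample only as much as the weaker of its two terms allows, so the controller cannot score well by following the line while ignoring obstacles, or the reverse.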
The weights found for the best result: {1.002151 0.038582 -0.237155 0.069138 0.991504 0.633540 1.014423 1.520142 1.176184 0.255908 0.723387 0.472929 0.551734 -0.754705 1.906338 0.342988 0.765217 -0.218661 0.848782 0.882759 0.745948 0.504366 0.686922 0.140097 0.859544 -0.842507 0.538433 0.673799 0.687551 0.036975 0.836208 0.585567 0.423217 1.177922 0.204886 0.566049 0.789327 0.550478 1.505674 0.193723 0.288644 0.363805 -0.303193 0.623066 1.186009 0.128952 0.966394 -0.738753 0.630619 0.045746 -0.303255 1.122907 1.185418 0.922135 0.283408 0.374248 0.090419 0.743278 0.759668 0.837929 0.027833 0.410441 -0.004732 0.563925 0.156509 1.103527 0.132984 0.952096 1.106409 0.818443 0.921981}

IV. DISCUSSION

As Nolfi and Floreano (2000) mention, in both the evolutionary and the behavior-based robotics approaches the environment plays a major role in determining the basic behaviors. The behavior-based robotics approach relies on assembling a set of basic behaviors. Depending on the environment, the global behavior of the robot arises from the interaction between these basic behaviors. A coordination mechanism identifies which behavior is stronger at a specific time. The designer gradually adjusts the behaviors and examines the result until the desired robot behavior is obtained. There are two types of coordination mechanism: competitive and cooperative. In the competitive method the output depends on only one behavior, while in the cooperative method the output may depend on several behaviors with different strengths. However, it is not clear how a desired behavior should be decomposed, and it is very difficult to perform such a decomposition by hand.

According to Illah and Nourbakhsh (2004), a behavior-based system may have multiple active behaviors at any one time. Even when individual behaviors are tuned to optimize performance, this fusion and rapid switching between multiple behaviors can negate that fine-tuning.
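The two coordination mechanisms mentioned above can be contrasted with a small sketch. It assumes, purely for illustration, that each behavior reports a strength and a proposed (left, right) wheel command, and that cooperative coordination is a strength-weighted average; neither detail comes from the cited works.

```python
def competitive(behaviors):
    """Winner-take-all: return the command of the strongest behavior."""
    _, command = max(behaviors, key=lambda b: b[0])
    return command


def cooperative(behaviors):
    """Blend all commands, weighting each by its behavior's strength."""
    total = sum(strength for strength, _ in behaviors)
    if total == 0:
        return (0.0, 0.0)  # no behavior is active
    left = sum(s * cmd[0] for s, cmd in behaviors) / total
    right = sum(s * cmd[1] for s, cmd in behaviors) / total
    return (left, right)


# Each entry: (strength, (left_speed, right_speed)); values are made up.
example = [
    (0.75, (100.0, 100.0)),   # e.g. follow the line straight ahead
    (0.25, (-100.0, 100.0)),  # e.g. turn away from an obstacle
]
```

With the example values, `competitive(example)` returns the line-following command unchanged, while `cooperative(example)` returns a command that still moves forward but bends toward the avoidance turn.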
The behavior-based approach does not directly scale to other environments or to larger environments.

Bräunl (2008) describes that in a behavior-based system a certain number of behaviors run as parallel processes. While each behavior can access all sensors, only one behavior can have control over the robot's actuators or driving mechanism. Therefore, an overall controller is required to coordinate behavior selection, behavior activation, or behavior output merging at appropriate times to achieve the desired objective. Early behavior-based systems such as that of Brooks (1986) used a fixed priority ordering of behaviors. For example, the wall avoidance behavior always has priority over the foraging behavior. Such a rigid system is very restricted in its capabilities and becomes difficult to manage as system complexity increases.

Unlike the behavior-based approach, evolutionary robotics relies on an evaluation of the system as a whole. In this case the designer is not required to decide how to split the desired behavior into simple basic behaviors [1]. This allows robotic systems to adapt to unpredictable or changing environments without human influence [3]. Evolutionary methods give robotic controllers the advantages of relatively fast adaptation and carefree operation.

The main aim of the evolutionary robotics approach is to design robots or robot controllers autonomously. This means that the inner workings of such robots are not explicitly described by the designer [2]. The difference between the behavior-based and evolutionary approaches is shown in Figures 4 and 5. According to Nolfi and Floreano [1], in the behavior-based approach the desired behavior is divided by the designer into a set of basic behaviors, which are implemented in separate sub-sections of the robot's control system (Figure 4). In evolutionary robotics the designer does not need to decide how to divide the desired behavior into basic behaviors (Figure 5). The way in which a desired behavior is divided into modules is the result of a self-organization process. Systems with self-organization capabilities can execute tasks in unforeseen environments and adapt to dynamic conditions [3].

Figure 4. Behavior-based approach [1]

Figure 5. Evolutionary approach [1]

REFERENCES

[1] S. Nolfi and D. Floreano, Evolutionary Robotics, The MIT Press, Cambridge, London, England, March 2000.
[2] J.B. Mouret and S. Doncieux, "Evolutionary Computation", vol. 20, no. 1, Massachusetts Institute of Technology, 2012, pp. 91-133.
[3] I. Wang, K.C. Tan and C. M. Chew, Evolutionary Robotics: From Algorithms to Implementations, vol. 28, World Scientific Series in Robotics and Intelligent Systems, 2006.
[4] R. Illah and S. Nourbakhsh, Introduction to Autonomous Mobile Robots, The MIT Press, London, England, 2004.
[5] T. Bräunl, Embedded Robotics, Springer, Australia, 2008.
[6] R. A. Brooks, "Robust Layered Control System for a Mobile Robot", IEEE Journal of Robotics and Automation, vol. 2, no. 1, March 1986, pp. 14-23.
[7] P. Nordin and W. Banzhaf, "Genetic Programming Controlling a Miniature Robot", Working Notes of the AAAI-95 Fall Symposium Series, MIT, Cambridge, MA, 10-12 November 1995, pp. 61-67.
[8] P. Nordin and W. Banzhaf, "Control of Khepera robot using Genetic Algorithms", Control and Cybernetics, vol. 26, no. 3, 1997.