Engineering goalkeeper behavior using an emotion learning method

Stevo Bozinovski and Peter Schoell
German National Center for Information Technology (GMD), Sankt Augustin

Abstract. This paper reports an approach toward engineering an evolving robot behavior for the role of a goalkeeper. In the course of our study of the problem we also developed a hand-coded controller. The learning method we used is emotion-based self-reinforcement learning. The learning robot is trained by a non-didactic trainer who performs random shooting. We also present a typical learning curve resulting from a training experiment.

Introduction

The new AI challenge set up by the RoboCup initiators, to develop within the next 50 years a football team that will beat a human football team, has induced many research efforts to converge toward building robots capable of playing football (soccer). Various issues have emerged that will remain challenges in the coming years; they are covered in contemporary reports on the subject (e.g. Asada et al., 1999a). One of these issues is learning to play football. Close to it is what we call the issue of football training, which we state as follows: human football players are trained, not programmed, to play football. Is it possible to achieve this with robot players? Does it make sense to define a RoboCup league in which only trained robots play?

Following this idea, we tried to develop a training-based controller for a robot (a physical robot or a software agent). We considered the role of the goalkeeper, which looked the simplest among the roles in a football team. In what follows we briefly describe our development of a trained goalkeeper.

A hand-coded goalkeeper controller

It seems that the RoboCup effort is gradually turning toward a role-based team structure (Asada et al., 1999a). In such a structure the role of the goalkeeper (goalie, goaltender, goalman) has special importance. There are reports of building teams without implementing a goalkeeper (Tambe et al., 1999), but it seems, as development progresses, that the role of a goalkeeper should be considered. The goalkeeper role can be assigned to just one player, or possibly to each player, with the ability to take that role in a recognized situation. Stone and Veloso (1999) cover the issue of rigid and flexible roles within a RoboCup team.

To study the role of the goalkeeper, we first developed a hand-coded controller, using previous experience in building GMD robots (Bredenfeld et al., 1999). From that experience we can define two main problems in connection with the goalkeeper: the self-localization problem and the ball-blocking problem. We will not consider the self-localization problem in detail here; let us only mention that we have considered it for real robots and developed a solution (Schoell, 1999). Four behaviors from our Dual Dynamics approach (Jaeger and Christaller, 1998) were used for this purpose: ToggleBack, ToggleForward, CorrectHeading, and InitRobot. The robot determines its position with several infrared sensors, using the goal posts and the rear wall of the goal as fixed landmarks. The individual behaviors were specified with the Dual Dynamics Design Tool developed by A. Bredenfeld and tested on a software simulator. Here we concentrate on the problem of the goalkeeper against the ball (the ball-blocking problem). Previously we used a single simple behavior for blocking the ball.
In the course of developing an evolving, trained goalkeeper, we developed a simulation program with the following three behaviors:

Sentry. Sentry is the default behavior and is activated when the ball is not in sight. Since we assume that the robot must not go beyond the goal frame, it is also activated when the ball is to the left of the goal while the robot is already at the left edge of the goal (and analogously for the right edge). As part of this behavior, when the ball is not in sight, the robot positions itself in the middle of the goal frame.

Predict. This behavior is activated when the ball is visible and follows a trajectory that does not come close to the goal. The robot moves along with the ball at a speed equal to the ball's speed. With this behavior the robot expresses a prediction of where the ball will appear close to the goal in the near future.

Intercept. This behavior is activated when the ball is close to the goal. The robot moves toward the ball at near-maximum speed in order to intercept it. The robot always moves in one dimension, along the goal line, so the interception is done in that dimension. The optimization parameter is the distance between the ball and the robot along the direction of robot movement.

A simulation program, written as a Java animation applet, was developed to explore the performance of a goalkeeper designed in this way, and it showed successful results. A sketch of such a three-behavior controller is given below.
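To make the behavior arbitration concrete, the following minimal sketch shows one way the three behaviors could be selected and executed. It is an illustration only, not the code of our applet: the class and method names, the one-dimensional coordinate convention along the goal line, and the numeric thresholds are all assumptions.

    // Minimal sketch of the three-behavior goalkeeper controller.
    // All names and numeric thresholds are illustrative assumptions,
    // not the actual applet code described in the text.
    class GoalkeeperController {
        enum Behavior { SENTRY, PREDICT, INTERCEPT }

        static final double GOAL_LEFT  = -1.0; // left goal post, goal-line coordinate
        static final double GOAL_RIGHT =  1.0; // right goal post
        static final double NEAR_GOAL  =  2.0; // "ball close to the goal" threshold
        static final double MAX_SPEED  =  0.5; // robot speed limit along the goal line

        double robotX = 0.0; // robot position along the goal line

        /** Select the active behavior from the current ball state. */
        Behavior select(boolean ballVisible, double ballX, double ballDistance) {
            if (!ballVisible) return Behavior.SENTRY;
            // The robot must not leave the goal frame: at an edge, with the
            // ball beyond it, fall back to Sentry.
            if ((ballX < GOAL_LEFT && robotX <= GOAL_LEFT) ||
                (ballX > GOAL_RIGHT && robotX >= GOAL_RIGHT)) return Behavior.SENTRY;
            return ballDistance < NEAR_GOAL ? Behavior.INTERCEPT : Behavior.PREDICT;
        }

        /** One control step: motion is one-dimensional, along the goal line. */
        void step(Behavior b, boolean ballVisible, double ballX, double ballSpeedX) {
            switch (b) {
                case SENTRY:
                    if (!ballVisible) moveToward(0.0, MAX_SPEED); // middle of the goal frame
                    break;                                        // else hold position at the edge
                case PREDICT:
                    robotX += clamp(ballSpeedX, MAX_SPEED);       // move along with the ball
                    break;
                case INTERCEPT:
                    moveToward(ballX, MAX_SPEED);                 // near-maximum speed to the ball
                    break;
            }
            robotX = Math.max(GOAL_LEFT, Math.min(GOAL_RIGHT, robotX)); // stay in the frame
        }

        private void moveToward(double target, double speed) {
            double d = target - robotX;
            robotX += Math.signum(d) * Math.min(Math.abs(d), speed);
        }

        private static double clamp(double v, double limit) {
            return Math.max(-limit, Math.min(limit, v));
        }
    }

The selection logic mirrors the activation conditions above: Sentry when the ball is unseen or beyond an edge the robot already occupies, Intercept when the ball is close to the goal, and Predict otherwise.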
A trained goalkeeper

There have been several reports on training a football-playing robot. An interesting approach was reported by Tambe et al. (1999). They trained a robot for a shooting role, implementing the supervised pattern classification method: a human expert selected a training set of various possible shooting situations and assigned a desired shot for each situation. It was shown that successful training is achieved using this method. Another approach was reported for a passing-shooting pair from the Osaka team (Asada et al., 1999b), but with limited success; as Werger (1999) reports, the system was not used in real play. It was mentioned (Asada et al., 1999a) that the Osaka team also used reinforcement learning for training the goalkeeper, but we have not yet had the opportunity to study a description of that training.

We studied the goalkeeper problem from the viewpoint of implementing successful training. A candidate approach was supervised pattern classification training, as previously done by Tambe et al. (1999) for a shooter role. During the study, however, we recognized that the goalkeeper problem is similar to the pole-balancing problem (Figure 1). The pole-balancing problem is well known in reinforcement learning theory, and we already had some experience in solving it (Bozinovski and Anderson, 1983). So we decided to try that approach, using crossbar learning, as we did with the pole-balancing problem.

Figure 1. Viewing the goalkeeper problem as a pole-balancing problem

The Crossbar Learning approach

The crossbar learning method was implemented in the Crossbar Adaptive Array (CAA) architecture (Bozinovski, 1982), which we describe briefly here (Figure 2). The learning method it introduced is now recognized as a forerunner of the well-known Q-learning method (Barto, 1996), but the contribution of this architecture to machine learning is still under consideration (Bozinovski, 1999).

[Figure 2. Crossbar adaptive array architecture. Sensors s1, s2, ..., sM from the behavioral environment feed a situation recognizer producing situations x1, ..., xm; a crossbar learning memory with behavior selection, three state buffers, and emotion judgment drives behavior routines b1, ..., bn; a genome string links the agent to its genetic environment.]

As Figure 2 shows, the CAA architecture is assumed to exist in two environments: a behavioral environment in which it behaves, and a genetic environment with which it communicates its genetic information. It receives a set of signals from the behavioral environment and partitions the environment into several situations. A situation is a behavior-dependent concept: for a previously chosen set of behaviors, a situation is a region of the input variable space within which a behavior-based system triggers the assigned behavior. The situations are received by the crossbar learning memory of the system. This memory first computes the emotion of being in the encountered situation, and then computes the behavior (action) that should be triggered in that situation. Its learning rule is

w(i,j) = w(i,j) + emotion(k)

where i is the previously performed behavior, j is the situation in which behavior i was performed, k is the situation which is the consequence of i and j, w(i,j) is the element of the learning matrix, named the crossbar element, and emotion(k) is the computed emotion of being in k. In the CAA, emotion is understood as internal state evaluation, performed by the agent itself.

From the genetic environment a CAA agent receives a genome (or species) string, which carries information about some internal states of the agent in connection with the environment situations. It ensures that an agent will feel bad in a dangerous situation; an agent that felt good or neutral in a dangerous situation would not survive the environment, due to wrongly inherited genetic memory. In the case of a goalkeeper agent, we specified that the goalkeeper feels bad only in the situation of a received goal. The goalkeeper is thus genetically bound to learn to avoid bad situations, and will tend to defend the goal. A sketch of this learning scheme is given below.
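The following sketch shows how such a crossbar learner might look in code. Only the update rule w(i,j) = w(i,j) + emotion(k) is taken from the text; the greedy selection over the crossbar column and the numeric emotion values (-1 for a bad situation, 0 otherwise) are our assumptions, and the full CAA architecture (Bozinovski, 1982) is richer than this.

    // Sketch of a crossbar (CAA) learner for the goalkeeper. The greedy
    // behavior selection and the emotion values are assumptions; only the
    // update rule w(i,j) = w(i,j) + emotion(k) comes from the text.
    class CrossbarLearner {
        static final int SITUATIONS = 10; // ball-vector situations
        static final int BEHAVIORS  = 3;  // e.g. move left, hold, move right (assumed)

        final double[][] w = new double[BEHAVIORS][SITUATIONS]; // crossbar learning memory
        final boolean[] badSituation = new boolean[SITUATIONS]; // genome string

        CrossbarLearner(int receivedGoalSituation) {
            // Genetically inherited evaluation: only a received goal feels bad.
            badSituation[receivedGoalSituation] = true;
        }

        /** Emotion as internal state evaluation of being in situation k. */
        double emotion(int k) {
            return badSituation[k] ? -1.0 : 0.0;
        }

        /** Trigger the behavior with the largest crossbar value in situation j. */
        int selectBehavior(int j) {
            int best = 0;
            for (int i = 1; i < BEHAVIORS; i++)
                if (w[i][j] > w[best][j]) best = i;
            return best;
        }

        /** Learning rule: behavior i, performed in situation j, led to situation k. */
        void learn(int i, int j, int k) {
            w[i][j] += emotion(k);
        }
    }

With these values, the crossbar elements of behaviors that lead to a received goal decrease, so the greedy selection gradually avoids them; this is the sense in which the genome string binds the goalkeeper to defend the goal.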
Goalkeeper training using the crossbar learning method

In implementing crossbar learning for a goalkeeping robot, we divided the input space into 10 situations, depending on the ball vector in the field. For example, situation 1 is recognized when the ball is left of the goalkeeper and moving left; situation 5 is recognized when the ball is in front of the robot and moving straight toward the goal; and so on. The trainer is chosen to be non-didactic, in the sense that he does not introduce lessons or a didactic order between them. He simply shoots the ball randomly toward the goal side. The ball sometimes misses the goal (a missed shot), and sometimes bounces off the side walls. Some of the shots reach the goal, directly or after bouncing, and the goalkeeper is expected to learn to defend those shots.

We distinguish a training phase and an exploitation phase of the goalkeeper. In the training phase the goalkeeper is trained; after a certain level of performance is achieved, the goalkeeper is assumed ready to enter a real match, the exploitation phase. A performance measure is needed to decide whether a goalkeeper is well trained and can enter its exploitation phase. As the performance measure we chose the ratio of defended to received goals. We estimated that for a well-trained human goalkeeper this ratio must be at least 3, which means that 75% of all balls reaching the goal frame are defended. We actually trained our robot until that ratio was 4 or more, depending on the experiment. We assumed that the training should not last more than 500 shooting trials, including missed shots. With this restriction we obtained performance ratios between 4 and 10. Figure 3 shows the learning curve of one of our experiments.

[Figure 3. A learning curve for an evolving goalkeeper: performance (defended/received goals) plotted against the number of shots toward the goal, rising to a ratio of 7 after about 180 shots.]

As Figure 3 shows, at the beginning the robot may defend the goal by chance; after several steps the performance ratio drops, and then increases due to the learning process. In this experiment, after about 50 shots toward the goal, the robot had gained enough experience to start behaving as a goalkeeper, and its performance improved gradually. Although it was trained to recognize most of the situations after 50 trials, some situations were learned only after about 113 shots toward the goal. The training in this experiment was carried out until the performance coefficient reached the value 7. It is assumed that this goalkeeper is now well trained and can enter its exploitation phase. We carried out a number of experiments on one of our Java simulators, and all were successful, meaning that all achieved a performance ratio greater than 4 in fewer than 500 training trials, and all produced an evolving, well-trained goalkeeper. A sketch of this training protocol is given below.
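Putting the pieces together, the training protocol could be sketched as follows, reusing the CrossbarLearner above. The bound of 500 trials and the stopping ratio of 4 come from the text; the simulator calls (randomShot, reachesGoalFrame, recognizeSituation, simulate) are stubs standing in for our animation applet, and counting every non-goal shot at the frame as defended is a simplification.

    // Sketch of the non-didactic training protocol with the
    // defended/received performance ratio as stopping criterion.
    class GoalkeeperTraining {
        static final int RECEIVED_GOAL = 9;  // assumed index of the "received goal" situation
        static final int MAX_TRIALS = 500;   // training limited to 500 shooting trials

        int defended = 0, received = 0;

        void train(CrossbarLearner learner) {
            for (int trial = 0; trial < MAX_TRIALS; trial++) {
                double[] shot = randomShot();           // non-didactic trainer: random shot
                if (!reachesGoalFrame(shot)) continue;  // a missed shot still uses up a trial
                int j = recognizeSituation(shot);       // one of the 10 ball-vector situations
                int i = learner.selectBehavior(j);
                int k = simulate(shot, i);              // consequence situation
                learner.learn(i, j, k);
                if (k == RECEIVED_GOAL) received++; else defended++;
                // Stop once the ratio shows a well-trained goalkeeper.
                if (received > 0 && (double) defended / received >= 4.0) return;
            }
        }

        // Stubs standing in for the simulation applet.
        double[] randomShot()                   { return new double[] { 0.0, 0.0 }; }
        boolean  reachesGoalFrame(double[] s)   { return true; }
        int      recognizeSituation(double[] s) { return 0; }
        int      simulate(double[] s, int b)    { return 0; }
    }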
Conclusion

Robot training for football playing is one of the challenges that can be recognized in connection with the RoboCup initiative. Here we described our effort to build a goalkeeper robot using a training method, with an emotion-based learning architecture as the robot controller. Our simulation study demonstrated the feasibility of this method for use in real robots. Our future work is directed toward implementing this learning scheme in a Dual Dynamics based robot controller (Jaeger and Christaller, 1998; Bredenfeld et al., 1999).

References

Asada M., Kitano H., Noda I., Veloso M.: RoboCup: Today and tomorrow – what we have learned. Artificial Intelligence 110: 193-214, 1999a

Asada M., Uchibe E., Hosoda K.: Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development. Artificial Intelligence 110: 275-292, 1999b

Barto A.: Reinforcement learning. In Omidvar and Elliott (Eds.) Neural Networks for Control, Academic Press, 1996

Bredenfeld A., Christaller T., Goehring W., Guenter H., Jaeger H., Kobialka H.-U., Ploeger P.-G., Schoell P., Siegberg A., Verbeek C., Wilberg J.: Behavior engineering with "dual dynamics" models and design tools. IJCAI-99 RoboCup Workshop, Stockholm, 1999

Bozinovski S.: A self-learning system using secondary reinforcement. In R. Trappl (Ed.) Cybernetics and Systems, North Holland, 1982

Bozinovski S., Anderson C.: Associative memory as controller of an unstable system: Simulation of a learning control. Proc. IEEE MELECON, C5.11, Athens, 1983

Bozinovski S.: Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem. In A. Dobnikar, N. Steele, D. Pearson, R. Albrecht (Eds.) Artificial Neural Nets and Genetic Algorithms, Springer Verlag, 1999

Jaeger H., Christaller T.: Dual dynamics: Designing behavior systems for autonomous robots. Artificial Life and Robotics 2: 108-112, 1998

Schoell P.: The goalkeeper of the GMD RoboCup team. http://ais.gmd.de/BE/ddd

Stone P., Veloso M.: Task decomposition, dynamic role assignment, and low-bandwidth communication for real-time strategic teamwork. Artificial Intelligence 110: 241-273, 1999

Tambe M., Adibi J., Al-Onaizan Y., Erdem A., Kaminka G., Marsella S., Muslea I.: Building agent teams using an explicit teamwork model and learning. Artificial Intelligence 110: 215-239, 1999

Werger B.: Cooperation without deliberation: A minimal behavior-based approach to multi-robot teams. Artificial Intelligence 110: 293-320, 1999

Acknowledgment

The first author expresses his gratitude to Prof. Thomas Christaller and Dr. Herbert Jaeger for the invitation to work with GMD, as well as for the excellent conditions provided. Dr. Jaeger also made valuable contributions to this work. Dr. Bernd Mueller made valuable comments on this paper.