Sonderforschungsbereich 314
Künstliche Intelligenz - Wissensbasierte Systeme
KI-Labor am Lehrstuhl für Informatik IV
Leitung: Prof. Dr. W. Wahlster

VITRA

Universität des Saarlandes
FB 14 Informatik IV
Postfach 151150
D-66041 Saarbrücken
Fed. Rep. of Germany
Tel. 0681 / 302-2363

Bericht Nr. 104, Juni 1994
ISSN 0944-7814

KANTRA: Human-Machine Interaction for Intelligent Robots Using Natural Language

Thomas Laengle, Tim C. Lueth, Eva Stopp, Gerd Herzog, Ulrich Rembold

SFB 314 - Project VITRA, FB 14 Informatik, Prof. Dr. W. Wahlster, Universität des Saarlandes, D-66041 Saarbrücken, email: [email protected]

Institute for Real-Time Computer Systems and Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Universität Karlsruhe, D-76128 Karlsruhe, email: [email protected]

Abstract

In this paper, a new natural language interface is presented that can be applied to make the use of intelligent robots more flexible. This interface was developed for the autonomous mobile two-arm robot KAMRO, which uses several camera systems to generate an environment model and to perform assembly tasks. A fundamental requirement in human-machine interaction for intelligent robots is the ability to refer to objects in the robot's environment. Hence, the interface and the intelligent system need similar environment models, and it is necessary to provide current sensor information. Additional flexibility can be achieved by integrating the man-machine interface into the control architecture of the robot and giving it access to all internal information and to the models that the robot uses for autonomous behaviour.
In order to fully exploit the capabilities of natural language access, we favour a dialogue-based approach, i.e., for the interface KANTRA presented here, human-machine interaction is not restricted to unidirectional communication.

(In: Proceedings of the 3rd International Workshop on Robot and Human Communication)

1 Introduction

The future use of intelligent robots [Rembold et al. 93] in manufacturing or as service robots for different applications will lead to high demands on man-machine interaction and communication. The more autonomously robots work, the more man-machine and machine-machine cooperation [Fukuda et al. 93] will be necessary. Distributed multi-robot systems show how future intelligent manufacturing systems can operate independently [Asama 92; Tsukune et al. 93]. In such an environment, not all machines can be programmed or controlled in the same way. For this reason, natural language, as a communication medium for humans, is an efficient means of making a technical system more easily accessible to its users [Wahlster 89]. A practical advantage of natural language access is the possibility to convey information at varying degrees of condensation and to communicate on different levels of abstraction in an application-specific way.

In this article, we report on the joint efforts of the University of Karlsruhe and the University of the Saarland at providing natural language access to the autonomous mobile two-arm robot KAMRO, which is being developed at IPR. In the second section, the state of the art in natural language access to intelligent technical systems is presented. In section three, our mobile robot KAMRO is briefly described. In sections four and five, the main aspects of natural language access are summarized, and in section six, our conception of a dialogue-based natural language access system is presented. The article ends with an evaluation and our conclusions for future work.
2 State of the Art

Although natural language processing and robotics constitute two major areas of AI, they have been studied rather independently. Only a few works are concerned with natural language access for human-machine interaction and communication. Sondheimer [Sondheimer 76] focuses on the problem of spatial reference in natural language machine control. The well-known SHAKEY system [Nilsson 84], a mobile robot without manipulators, is able to understand simple commands given in natural language. The work described in [Sato & Hirai 87] concentrates on language-aided instruction for teleoperational control: specific words can be utilized to simplify the specification of teleoperational functions for the instruction of a remote robot system. Torrance [Torrance 94] presents a natural language interface for a navigating indoor office-based mobile robot. In addition to giving commands and asking questions about the robot's plans, the user can associate arbitrary names with specific locations in the environment. Some theoretical aspects of natural language communication with robot systems from the perspective of computational linguistics are discussed in [Lobin 92]. Other approaches have been concerned with natural language control of autonomous agents within simulated 2D or 3D environments [Badler et al. 91; Chapman 91; Vere & Bickmore 90].

One salient aspect of natural language access to robot systems is the relationship between sensory information and verbal descriptions. Such issues have already been investigated in the field of integrated natural language and vision processing [Bajcsy et al. 85; Herzog & Wazinski 94; Neumann 89; Wahlster et al. 83].

3 The Intelligent Mobile Robot KAMRO

Higher intelligence and greater autonomy of more advanced robot systems increase the requirements for the design of a flexible interface to control the system on different levels of abstraction.
In the KAMRO (Karlsruhe Autonomous Mobile Robot) project, for example, an autonomous mobile robot (Fig. 1) for assembly tasks is being developed which also has the capability of recovering from error situations [Lüth & Rembold 94]. The autonomous mobile robot KAMRO is a two-arm robot system that consists of a mobile platform with an omnidirectional drive system, two Puma 260 manipulators, and different sensors for navigation, docking, and manipulation.

Figure 1: The Mobile Robot KAMRO

KAMRO is capable of performing assembly tasks (Fig. 2) autonomously. The tasks or robot operations can be described on different levels: assembly precedence graphs, implicit elementary operations (pick, place), and explicit elementary operations (grasp, transfer, fine motion, join, exchange, etc.). A complex task is transformed by the control architecture (Fig. 3) from the assembly precedence graph level to the explicit elementary operation level. The generation of suitable sequences of elementary operations depends on the position and orientation of the assembly parts on the worktable, while execution is controlled by the real-time robot control system. Status and sensor data fed back to the planning system enable KAMRO to monitor the execution of the plan and correct it, if necessary.

4 Human-Robot Interaction

Tactile, acoustic, and vision sensors provide robots with perceptual capabilities, enabling them to explore and analyze their environment in order to behave more intelligently. The intelligent robot system KAMRO uses several camera systems to generate an environment model in order to perform complex assembly tasks.

Figure 2: The Cranfield Assembly Benchmark

To interact with this robot system, it would not be sufficient to use natural language merely as a command language for the operator.
The reason is that incomplete information must be completed by the human operator if the robot is not able to generate the missing information from its sensors or knowledge base. Four main situations of human-machine interaction can be distinguished in the context of natural language access to the KAMRO system:

Task specification: Operations and tasks which are to be performed by the robot can be specified on different levels. On a high level, the commands look like "assemble the Cranfield Benchmark". The lowest level consists of explicit elementary operations such as "fine motion" or "grasp"; this is useful if an operator wants to control the robot or its components directly, as in teleoperation. On a higher level, the operator can also specify a suitable assembly sequence in order to perform a certain task; several different descriptions can lead to an assembly precedence graph generated automatically from the sentences. The most abstract specification would just provide a qualitative description of the desired state, i.e., the final positions of the different assembly parts.

Execution monitoring: The capability of autonomous behaviour gives the robot system the ability to carry out an assembly mission in different orders. Therefore, it is important to inform the operator which part of the mission the robot is currently performing. Depending on the demands formulated by the user, it is possible to give the descriptions and explanations in more or less detail.

Explanation of error recovery: The capability of recovering from error situations leads to dynamic adjustment of assembly plans during execution. This feature makes it more difficult for the operator to predict and understand the behaviour of the robot. Natural language explanations and descriptions of why and how a certain plan was changed would increase the cooperativeness of the intelligent robot.
Similar descriptions are even more important in error situations which cannot be handled autonomously by the robot.

Figure 3: Simplified Control Structure of the KAMRO System (knowledge base, action plan, plan execution system FATE, real-time robot control RT-RCS)

Updating the environment representation: Because of the dynamic nature of the environment, geometric and visual information can never be complete, even for an autonomous robot equipped with several sensors. In a natural language dialogue, user and robot can aid each other by providing additional information about the environment and the world model.

To implement these interaction possibilities completely on a high level, it is necessary to take all four situations mentioned above into account.

5 Dialog-Oriented Use of Natural Language

As mentioned above, simple instructions in natural language syntax are not sufficient for interacting with an autonomous mobile robot. In any command-based approach the limitations of unidirectional communication will soon become obvious [Webber et al. 93]. Flexible human-machine interaction is only possible if intelligent robots are made more responsive. A robot needs the capability to report task-specific information, to explain its behaviour, and to provide information about the environment. We claim that a dialog-based approach has to be taken for all the communicative situations mentioned above. That way, further queries can always be utilized to resolve ambiguities or misunderstandings.

Most existing natural language interfaces have been developed in order to provide access to databases or expert systems (e.g. [Allgayer et al. 89; Carbonell et al. 85; Trost et al. 87]). In general, three main modules of a natural language dialog system can be distinguished:

1. Analysis component: A parser translates the natural language input from the user into a semantic representation encoded in a knowledge representation language.
These propositions form the basis for further processing.

2. Evaluation component: Utterances have to be interpreted with respect to the internal world model of the intelligent system. Further interaction between the natural language access system and the application system is required to perform the tasks requested by the user. Information is fed back to the dialog system if queries have to be answered or if the user has to be informed about interesting state changes.

3. Generation component: The information to be expressed to the user needs to be translated from its propositional form into natural language utterances. Depending on the situational context, either a response, a query, or a textual description is generated.

Various knowledge sources constitute the knowledge base of a natural language dialog system. Lexicon and grammar provide the morpho-syntactic knowledge for the analysis and generation of natural language utterances. Domain-specific conceptual knowledge serves as a link between the linguistic level and the internal representation on the conceptual level of the application system. A record of the dialog contributions of both user and system is stored in the dialog memory, which is necessary for the analysis and generation of pronouns, etc. In addition, a natural language interface maintains an explicit model of the user's goals, plans, and prior knowledge about the domain under consideration. Such a user model is a necessary prerequisite for a system to be capable of exhibiting cooperative dialog behaviour.

A peculiarity of a natural language interface to a robot, database, or expert system is the fact that the meaning of a natural language expression depends on the contents of the domain or world model of the application system. Thus, a referential semantics has to be defined, which relates objects on the linguistic level to entities in the knowledge base of the intelligent system.
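To make the idea of a referential semantics more concrete, the following toy sketch matches the semantic representation of a noun phrase against entities in a world model. All names, the attribute scheme, and the matching strategy are our own illustrative assumptions, not the actual knowledge representation used in KANTRA.

```python
# Illustrative sketch only: a toy referential semantics that maps the
# semantic representation of a noun phrase (e.g. "the red bolt") to
# entities in a world model. The entity scheme is hypothetical.

from dataclasses import dataclass

@dataclass
class Entity:
    ident: str
    type: str
    color: str

# A hypothetical snapshot of the robot's environment model.
WORLD_MODEL = [
    Entity("obj-1", "bolt", "red"),
    Entity("obj-2", "bolt", "blue"),
    Entity("obj-3", "spacer", "red"),
]

def resolve(parsed_np):
    """Return all world-model entities whose attributes match the
    semantic representation of a noun phrase, given here as a dict
    such as {"type": "bolt", "color": "red"}."""
    return [e for e in WORLD_MODEL
            if all(getattr(e, attr) == value
                   for attr, value in parsed_np.items())]

candidates = resolve({"type": "bolt", "color": "red"})
```

A unique match yields the referent; an empty or ambiguous result would, in a dialog-based interface, trigger a clarification query rather than a failure.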
The access to sensory data in an autonomous mobile robot allows the definition of a referential semantics that is perceptually anchored. Spatial reference, i.e., the way humans refer to the location, direction, orientation, and relation of objects and activities in space, is one of the most important phenomena here. The ability to refer to spatial aspects of the complex environment is essential for human-robot interaction in any of the four communicative situations mentioned in the previous chapter.

6 The Natural Language Interface KANTRA

The KAMRO system is an autonomous mobile two-arm robot situated in a complex dynamic environment. Flexible human-robot interaction is required in order to cope with diverse communicative situations like task specification, execution monitoring, explanation of error recovery, and updating the environment representation. Tasks can be specified on different levels of abstraction. Error recovery is explained and task execution is monitored by generating adequate natural language descriptions. Information about the environment and the world model can be exchanged in a more complex dialogue.

KAMRO's natural language interface KANTRA (KAmro Natural language TRAnslator) is founded on a dialog-based approach, as proposed in the previous chapter. The architecture of our integrated system is shown in Fig. 4. Natural language commands and queries from the user form the input for the natural language access system. The linguistic analysis translates the natural language expressions into propositions. The syntactic-semantic parser we use is a modified version of SB-PATR [Harbusch 86], which is based on a unification grammar with semantic information.

Figure 4: Structure of the KANTRA System (analysis, evaluation, and generation components with morpho-syntactic knowledge, conceptual knowledge, a user model, and a dialog memory, connected to the task, environment, and execution representations and the encapsulated knowledge of the autonomous mobile robot KAMRO)

The propositions are further interpreted in the evaluation component, which is also responsible for the referential semantic interpretation. For this task, the natural language system must have access to all processed information and the environment representation inside the intelligent agent. The evaluation component realizes the interface to the robot and the shared knowledge sources. The encapsulated knowledge of the robot comprises that part of its knowledge base which pertains to low-level control of the mobile platform, manipulators, and sensors. Depending on the robot's reactions, like confirmations, error notifications, sensor input, information about changes in the world model, etc., the evaluation component is also responsible for the appropriate reactions and responses of the access system.

The generation component translates selected propositions into natural language descriptions, explanations, and queries. An incremental generator, which is based on Tree Adjoining Grammars, generates the surface structures [Harbusch et al. 91].

KANTRA is an extension of the VITRA (VIsual TRAnslator) system, which allows for natural language access to visual data [Herzog & Wazinski 94]. A referential semantics has been defined which connects verbal descriptions to visual and geometric information. This approach provides powerful methods to treat the problem of spatial reference. In order to use and understand localization expressions, the interface has to take into account how the user perceives the assembly parts and the robot (cf. Fig. 5).

Figure 5: Different Localisation Expressions for Robot and Operator

A more detailed description of the utilization of spatial relations in KANTRA can be found in [Stopp et al. 94].
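The frame-dependence of localization expressions illustrated in Fig. 5 can be sketched geometrically: whether one part counts as "left of" another depends on the observer's viewing direction. The following minimal example is our own simplification, not the computational model of spatial relations actually used in KANTRA (for that, see [Stopp et al. 94]).

```python
# Minimal sketch: a frame-dependent "left of" relation. An observer's
# heading (radians, world frame) determines which side counts as left.
import math

def left_of(target, reference, heading):
    """True if `target` (x, y) lies to the left of `reference` (x, y)
    for an observer facing along `heading`."""
    dx = target[0] - reference[0]
    dy = target[1] - reference[1]
    # Project the reference-to-target vector onto the observer's
    # left-pointing axis (-sin(heading), cos(heading)).
    return -math.sin(heading) * dx + math.cos(heading) * dy > 0

# Robot and operator face each other across the worktable: the robot
# looks along +x (heading 0), the operator along -x (heading pi).
target, reference = (1.0, 1.0), (1.0, 0.0)
print(left_of(target, reference, 0.0))      # robot's view:    True
print(left_of(target, reference, math.pi))  # operator's view: False
```

The same pair of objects thus receives opposite descriptions from robot and operator, which is why the interface has to model the user's perspective when generating or interpreting localization expressions.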
7 Evaluation

In the context of natural language access to the autonomous mobile robot KAMRO, several important applications were presented in the first part of this article: the interface can be used for the four main situations of human-machine interaction, namely task specification, execution monitoring, explanation of error recovery, and updating the environment representation. Information about the environment and the world model can be exchanged in a more complex dialogue. Natural language utterances have to be interpreted with respect to the robot's current environment, i.e., a referential semantics is required which connects verbal descriptions to visual and geometric information.

Implementing interfaces for all four of these types of natural dialogue is a challenging task. The work described here is only a first step towards the first application, task specification. The KAMRO interface is based on the VITRA (VIsual TRAnslator) system, which allows natural language access to visual data. To use spatial relations and to understand localization expressions, the interface must consider how the operator perceives the assembly parts and the robot. If the natural language interface is supposed to have access to all processed information as well as the environment representation inside the intelligent system, it has to be integrated into the robot's control structure. This way, both systems have access to tasks, plans, and internal models. To implement the interaction possibilities on a high level, it is necessary to take all four situations mentioned above into account. The presented concept is one step towards future natural language interfaces for intelligent robots.

8 Conclusion

The aim of our joint efforts is the integration of the Karlsruhe autonomous mobile robot KAMRO and the natural language component VITRA developed in Saarbruecken.
In this contribution, we have focused on the problem of task specification or, more specifically, on spatial relations in task descriptions. The specific need for generating and understanding localization expressions was shown, and it was also described how such natural language utterances can be processed, taking into account the information provided by the vision sensors and the model of the environment. In this context, the presented concept is one step towards natural language access to intelligent robot systems.

9 Acknowledgement

This research work was performed at the Institute for Real-Time Computer Systems and Robotics (IPR), Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Faculty for Computer Science, University of Karlsruhe, and at the Department of Computer Science, University of the Saarland, Saarbruecken, Prof. Dr. W. Wahlster. The project is part of the national research project on artificial intelligence (SFB 314) funded by the German Research Foundation (DFG).

References

[Allgayer et al. 89] J. Allgayer, K. Harbusch, A. Kobsa, C. Reddig, N. Reithinger, and D. Schmauks. XTRA: A Natural-Language Access System to Expert Systems. Int. Journal of Man-Machine Studies, 31(2):161–195, 1989.

[Asama 92] H. Asama. Distributed Autonomous Robotic System Configurated with Multiple Agents and Its Cooperative Behaviour. Journal of Robotics and Mechatronics, 4(3):199–204, 1992.

[Badler et al. 91] N. I. Badler, B. A. Barsky, and D. Zeltzer (eds.). Making Them Move: Mechanics, Control, and Animation of Articulated Figures. San Mateo, CA: Kaufmann, 1991.

[Bajcsy et al. 85] R. Bajcsy, A. Joshi, E. Krotkov, and A. Zwarico. LandScan: A Natural Language and Computer Vision System for Analyzing Aerial Images. In: Proc. of the 9th IJCAI, pp. 919–921, Los Angeles, CA, 1985.

[Carbonell et al. 85] J. G. Carbonell, W. M. Boggs, M. L. Mauldin, and P. G. Anick. The XCALIBUR Project: A Natural Language Interface to Expert Systems and Data Bases. In: S. Andriole (ed.), Applications in Artificial Intelligence. Petrocelli, 1985.

[Chapman 91] D. Chapman. Vision, Instruction, and Action. Cambridge, MA: MIT Press, 1991.

[Fukuda et al. 93] T. Fukuda, K. Sekiyama, T. Ueyama, and F. Arai. Efficient Communication Method in the Cellular Robotic System. In: IROS IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 1091–1096, Yokohama, Japan, 1993.

[Harbusch et al. 91] K. Harbusch, W. Finkler, and A. Schauder. Incremental Syntax Generation with Tree Adjoining Grammars. In: W. Brauer and D. Hernandez (eds.), Verteilte Künstliche Intelligenz und kooperatives Arbeiten: 4. Int. GI-Kongreß Wissensbasierte Systeme, pp. 363–374. Berlin, Heidelberg: Springer, 1991.

[Harbusch 86] K. Harbusch. A First Snapshot of XTRAGRAM, A Unification Grammar for German Based on PATR. Memo 14, Universität des Saarlandes, SFB 314 (XTRA), 1986.

[Herzog & Wazinski 94] G. Herzog and P. Wazinski. VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions. Artificial Intelligence Review Journal, 8, 1994. Special Volume on the Integration of Natural Language and Vision Processing, edited by P. Mc Kevitt, to appear.

[Lobin 92] H. Lobin. Situierte Agenten als natürlichsprachliche Schnittstellen. Arbeitsberichte Computerlinguistik 3-92, Univ. Bielefeld, Germany, 1992.

[Lüth & Rembold 94] T. C. Lüth and U. Rembold. Extensive Manipulation Capabilities and Reliable Behaviour at Autonomous Robot Assembly. In: Proc. of IEEE Int. Conf. on Robotics and Automation, San Diego, CA, 1994.

[Neumann 89] B. Neumann. Natural Language Description of Time-Varying Scenes. In: D. L. Waltz (ed.), Semantic Structures, pp. 167–207. Hillsdale, NJ: Lawrence Erlbaum, 1989.

[Nilsson 84] N. J. Nilsson. Shakey the Robot. Technical Note 323, Artificial Intelligence Center, SRI International, Menlo Park, CA, 1984.

[Rembold et al. 93] U. Rembold, T. C. Lueth, and A. Hoermann. Advancement of Intelligent Machines. In: ICAM JSME Int. Conf. on Advanced Mechatronics, pp. 1–7, Tokyo, Japan, 1993.

[Sato & Hirai 87] T. Sato and S. Hirai. Language-Aided Robotic Teleoperation System (LARTS) for Advanced Teleoperation. IEEE Journal on Robotics and Automation (RA), 3(5):476–480, 1987.

[Sondheimer 76] N. K. Sondheimer. Spatial Reference and Natural Language Machine Control. Int. Journal of Man-Machine Studies, 8:329–336, 1976.

[Stopp et al. 94] E. Stopp, K.-P. Gapp, G. Herzog, Th. Laengle, and T. C. Lueth. Utilizing Spatial Relations for Natural Language Access to an Autonomous Mobile Robot. In: L. Dreschler-Fischer and B. Nebel (eds.), KI-94. Berlin, Heidelberg: Springer, 1994.

[Torrance 94] M. C. Torrance. Natural Communication with Robots. Master's thesis, MIT, Department of Electrical Engineering and Computer Science, Cambridge, MA, 1994.

[Trost et al. 87] H. Trost, E. Buchberger, W. Heinz, C. Hörtnagl, and J. Matiasek. Datenbank-Dialog: A German Language Interface for Relational Databases. Applied Artificial Intelligence, 1:181–201, 1987.

[Tsukune et al. 93] H. Tsukune, M. Tsukamoto, T. Matshisita, F. Tomita, K. Okada, T. Ogasawara, K. Takase, and T. Yuba. Modular Manufacturing. Journal of Intelligent Manufacturing, 4:163–181, 1993.

[Vere & Bickmore 90] S. Vere and T. Bickmore. A Basic Agent. Computational Intelligence, 6(1):41–60, 1990.

[Wahlster et al. 83] W. Wahlster, H. Marburger, A. Jameson, and S. Busemann. Over-answering Yes-No Questions: Extended Responses in a NL Interface to a Vision System. In: Proc. of the 8th IJCAI, pp. 643–646, Karlsruhe, FRG, 1983.

[Wahlster 89] W. Wahlster. Natural Language Systems: Some Research Trends. In: H. Schnell and N. O. Bernsen (eds.), Logic and Linguistics: Research Directions in Cognitive Science - European Perspectives, Vol. 2, pp. 171–183. Hillsdale, NJ: Lawrence Erlbaum, 1989.

[Webber et al. 93] B. Webber, B. Grosz, S. Hirai, T. Rist, and D. Scott. Instructions: Language and Behaviour (Panel). In: Proc. of the 13th IJCAI, pp. 1684–1689, Chambery, France, 1993.