Spoken dialog for e-learning supported by domain ontologies

Dario Bianchi, Monica Mordonini and Agostino Poggi
Dipartimento di Ingegneria dell'Informazione
Parco Area delle Scienze 181/A, 43100 Parma, Italy
{bianchi,mordonini,poggi}@ce.unipr.it

Ontologies
• Ontologies may be used in the context of e-learning to capture knowledge about different domains.
• An ontology can describe learning activities, such as enrolling in a course, taking an assessment test or joining a discussion group, as well as the actors involved in the learning process, such as students and tutors.
• A second use of an ontology is the description of the content of the learning material.

Learning objects and metadata
• Many e-learning systems use learning material in the standardized form of reusable learning objects. These objects can be annotated with metadata describing their content, the technology used and so on, and can be stored in a repository.
• The metadata used to describe learning objects are extensions of those used for digital libraries, such as the Dublin Core, and are standardized by IMS.
• To interact with a repository of learning material, the ontology should describe both the structure of the metadata and the structure of the domain that the learning objects deal with.

System Initiative Dialog
• Ontologies are also considered the basis of communication between agents. This approach proved successful with the advent of speech-act-based communication languages for agents [8,9].
• Communication acts are also used for agent-to-human dialog.
• The simplest model is that of a single-initiative (or system-initiative) dialog manager, in which all the initiative is taken by the agent.
• In this case the context of the dialog can be represented by a finite state automaton.

Mixed initiative dialogs
• Mixed-initiative dialogs arise in situations in which the human can also take control of the dialog. They are more difficult to manage because one has to deal with belief, desire and intention models.
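A system-initiative dialog context represented as a finite state automaton can be sketched as below. This is a minimal illustration, not the system's implementation; the state names and transitions are assumed for the example.

```java
// Hypothetical sketch: a system-initiative dialog context as a finite state
// automaton. All initiative is with the agent: each user utterance only
// advances the machine. State names and transitions are illustrative.
public class DialogFsm {
    public enum State { ASK_ACTION, ASK_SLOT, CONFIRM, DONE }

    private State state = State.ASK_ACTION;

    // Advance the automaton on a recognized user utterance.
    public State step(String utterance) {
        switch (state) {
            case ASK_ACTION: state = State.ASK_SLOT; break;   // action chosen
            case ASK_SLOT:   state = utterance.isEmpty()      // re-prompt on
                                 ? State.ASK_SLOT             // empty input
                                 : State.CONFIRM; break;
            case CONFIRM:    state = State.DONE; break;       // user confirmed
            default:         break;                           // DONE is final
        }
        return state;
    }
}
```

Because the agent always knows which question it asked last, the current state alone is enough context to interpret the answer.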
• In many practical situations, the single-initiative dialog manager is adequate. The agent can ask the human which action to perform, prompting for the required data and so on.

Automatic Generation of Spoken Dialog
• We present a system that supports the automatic generation of a spoken dialog between users and an agent-based application, starting from the ontologies used in the application domain.
• Our aim is to use the ontology that models agent-to-agent communication also to model the speech interaction between humans and agents.
• In this case, the ontology is the source for the automatic generation of two grammars: one used for speech recognition and the other for speech generation.
• The application should be multilingual.

System Description
• The system is based on the JADE agent development platform, the Java Speech API and the IBM Via Voice tool.
• JADE (Java Agent DEvelopment framework) is a software framework that aids the development of agent applications in compliance with the FIPA specifications for interoperable intelligent multi-agent systems.

The Java Speech API
• The Java Speech API incorporates speech technology into user interfaces for applets and applications based on Java technology.
• This API specifies a cross-platform interface to support command-and-control recognizers, dictation systems and speech synthesizers.
• We have used the IBM implementation of JSAPI, which provides an interface to the Via Voice tool. IBM Via Voice is a tool for the recognition and generation of speech, available in many languages.
• Nevertheless, the system does not depend on the specific JSAPI implementation.

Speech Manager
• The system is based on an agent, called SpeechManager, that mediates the interaction between a user and the agents of a JADE application.
• To manage the conversation, the SpeechManager has to maintain a context.
• To represent this context, the agent uses a finite state automaton (implemented through a JADE FSMBehaviour).
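The generation of a recognition grammar from the ontology can be illustrated with a small sketch. It builds one JSGF-style rule offering each Agent Action name as an alternative, each tagged with its identifier (the kind of tag a RecoGramAct would return). The method name and rule layout are assumptions for the example, not the system's actual generator.

```java
import java.util.List;

// Hypothetical sketch: deriving one JSGF-style recognition rule from the
// names of the Agent Actions in an ontology. Each alternative carries a tag
// in braces, so the recognizer output identifies the chosen action.
public class GrammarGenerator {
    public static String actionRule(List<String> actionNames) {
        StringBuilder sb = new StringBuilder("<action> =");
        for (int i = 0; i < actionNames.size(); i++) {
            if (i > 0) sb.append(" |");                 // separate alternatives
            String name = actionNames.get(i);
            sb.append(' ').append(name)
              .append(" {").append(name).append('}');   // rule tag
        }
        return sb.append(';').toString();
    }
}
```

A second pass of the same kind over the ontology's lexical entries would yield the phrases used by the synthesizer.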
• Each state maintains a "conversation act".

Conversation Acts
• RecoSpellAct: speech recognition in spelling mode.
• RecoDictAct: speech recognition in dictation mode.
• RecoGramAct: speech recognition on the basis of a predefined grammar. The output is the set of tags associated with the grammar rules activated during recognition.
• RecoAltsAct: speech recognition on the basis of a list of alternative strings.
• RecoIntAct: speech recognition of an integer on the basis of an integer grammar.
• RecoIntSpellAct: speech recognition of an integer on the basis of the integer grammar, spelling the digits.
• RecoFloatAct: speech recognition of a floating-point number on the basis of a float grammar.
• RecoDateAct: speech recognition of a date on the basis of a date grammar.

Specialized grammars
• Specialized grammars were developed to support the recognition of dates and numbers (integer and floating point, both in dictation and in spelling modes).
• These grammars need to be implemented for each supported language, but they are quite general and independent of the specific ontology.

SpeechManager
• The ontologies used by SpeechManager agents can be derived from the ones normally used by JADE agent systems in the different application domains.
• The ontology used by the SpeechManager consists of two kinds of elements:
– Concepts: the entities that an agent can handle (simple or aggregate objects).
– Agent actions: the actions the agents can perform on objects that are instances of the defined concepts.
• Both Concepts and Agent Actions have slots and filler value types.

Generating the Dialog
• At the beginning, the SpeechManager prompts the user to choose an Agent Action.
• Each attribute of the action is used to prompt for the values of the involved concepts.
• To obtain the value of a concept, the SpeechManager recursively asks the user for the values of its attributes.
• The recursion terminates when the requested value is a primitive type (string, number, date, etc.).
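The recursive prompting described above can be sketched as follows: aggregate Concepts are expanded attribute by attribute, and the recursion bottoms out at primitive slots. Class and attribute names here are illustrative assumptions, not the system's actual types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the recursive slot-filling step. A Concept is a
// nested structure of attributes; Primitive slots (string, number, date)
// terminate the recursion and trigger a prompt to the user.
public class SlotFiller {
    interface Slot {}
    static class Primitive implements Slot {             // base case: prompt
        final String prompt;
        Primitive(String prompt) { this.prompt = prompt; }
    }
    static class Concept implements Slot {               // aggregate object
        final Map<String, Slot> attributes = new LinkedHashMap<>();
    }

    // Ask `answer` for every primitive reachable from `slot`, depth first.
    public static Map<String, String> fill(String path, Slot slot,
                                           Function<String, String> answer) {
        Map<String, String> values = new LinkedHashMap<>();
        if (slot instanceof Primitive p) {
            values.put(path, answer.apply(p.prompt));    // recursion terminates
        } else {
            Concept c = (Concept) slot;
            for (Map.Entry<String, Slot> e : c.attributes.entrySet())
                values.putAll(fill(path + "." + e.getKey(),
                                   e.getValue(), answer));
        }
        return values;
    }
}
```

In the real system the `answer` function would be one of the conversation acts above, chosen according to the slot's input modality.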
• At the end of the recognition of an Agent Action request, the SpeechManager delegates the execution of the action to an agent of the system.
• This delegated agent is found by querying the system's Directory Facilitator for an agent able to perform the action.

Input modalities and lexical entries
• To fill an Agent Action and its attributes from speech input, it is necessary to associate with each element (Agent Actions, Concepts and terminal attributes) an input modality (grammar, alternatives or spelling) and, possibly, a specific grammar.
• It is also necessary to provide the lexical entries (strings of words for the synthesizer) for the different languages, corresponding to the names of the Agent Actions, Concepts and attributes involved in the interaction with the user, and to recognize the different word tokens in the grammars (e.g., the different strings in the alternatives).

Application to e-learning
• As part of an e-learning system, we have developed some applications that allow a student to enroll in a course, to take an assessment test or to find the learning material he needs.
• In these applications the ontologies must describe the activities and actors involved in e-learning, such as courses, assessment tests, students and tutors, and the actions to perform.
• The metadata and the domain ontology are used by the student to search the repository for learning objects. The student can "navigate" the repository on the basis of the concepts describing the subject of interest.

SpeechManager: which action would you like to execute? (enroll in course, assessment test, ...)
User: I want to enroll in a course.
SpeechManager: To complete the request to enroll in a course I need more information.
SpeechManager: What is your first name?
User: Mark...
...

Part of a possible dialog between a SpeechManager agent and a user. The system supports Italian and English.
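The per-language lexical entries can be pictured as a dictionary mapping ontology element names to the phrases spoken by the synthesizer. The entries below are made-up examples for illustration; only the two supported languages, Italian and English, come from the slides.

```java
import java.util.Map;

// Hypothetical sketch: a per-language lexicon for ontology elements. Keys
// are ontology names (Agent Actions, Concepts, attributes); values are the
// word strings handed to the synthesizer. Entries here are illustrative.
public class Lexicon {
    private static final Map<String, Map<String, String>> ENTRIES = Map.of(
        "en", Map.of("EnrollInCourse", "enroll in a course",
                     "firstName", "first name"),
        "it", Map.of("EnrollInCourse", "iscriversi a un corso",
                     "firstName", "nome"));

    // Look up the phrase for an ontology element, falling back to its name
    // when no entry exists for the requested language.
    public static String phrase(String lang, String element) {
        return ENTRIES.getOrDefault(lang, Map.of())
                      .getOrDefault(element, element);
    }
}
```

Supporting a new language then amounts to adding one such table, plus the language-specific number and date grammars mentioned in the conclusions.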
Conclusion I
• The method used to map the ontology to a dialog is very general and does not depend on the domain considered.
• To build a conversational agent for a new domain, it is only necessary to design the ontology and to write the dictionary.
• Support for new languages is also straightforward: one only has to extend the dictionary and provide specialized rules to deal with numbers and dates. At present the system supports Italian and English.

Conclusion II
• Another problem to address is that the generated dialog is not very flexible. To provide a more usable interface, it is necessary to deal with the synonyms and paraphrases that the user may utter.
• On the other hand, a general-purpose recognition grammar, even if useful for written-text processing, is not equally valuable for speech recognition.
• Our approach is to generate a tailored grammar for each application domain. In this way we obtain a limited number of grammar rules and lexical entries, improving the reliability of speech recognition and the robustness of the system.