CONFUCIUS: An Intelligent MultiMedia Storytelling Interpretation and Presentation System
Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Magee
Faculty Research Student Conference, Jordanstown, 15 Jan 2004

Outline
- Related research
- Overview of CONFUCIUS
- Automatic generation of 3D animation
- Semantic representation
- Natural language processing
- Current state of implementation
- Relation to other work
- Conclusion & future work

Related research
- 3D visualisation
  - Virtual humans & embodied agents: Jack, Improv, BEAT
  - Multimodal interactive storytelling: AesopWorld, KidsRoom, Larsen & Petersen's Interactive Storytelling, computer games
  - Automatic text-to-graphics systems: WordsEye, CD-based language animation
- Related research in NLP: lexical semantics
  - Levin's verb classes
  - Jackendoff's Lexical Conceptual Structure
  - Schank's scripts

Objectives of CONFUCIUS
- Pipeline: storywriter/playwright -> movie/drama script -> CONFUCIUS -> 3D animation -> user/story listener
- To interpret natural language sentences/stories and extract conceptual semantics from the natural language
- To generate 3D animation and virtual worlds automatically from natural language
- To integrate 3D animation with speech and non-speech audio, forming an intelligent multimedia storytelling system

Architecture of CONFUCIUS
- Language processing: natural language stories, or a script from the script writer, pass through the script parser and natural language processing modules, which draw on language knowledge (the LCS lexicon and grammar) to produce semantic representations
- Animation generation: maps the semantic representations onto visual knowledge, a 3D graphic library of prefabricated objects built with 3D authoring tools and existing 3D models & character models
- Synchronising & fusion: merges the generated animation with text-to-speech and sound effects into a 3D world with audio in VRML

Software & standards
- Java: parsing semantic representations, changing VRML code to add/modify animation, integrating modules
- Natural language processing tools
  - Connexor Machinese FDG parser (morphological and syntactic parsing)
  - WordNet (lexicon, semantic inference)
- 3D graphic modelling
  - Existing 3D models (virtual humans/objects) from the Internet
  - Authoring tools: Character Studio for humanoid characters, 3D Studio Max for props & stage, Microsoft Agent for the narrator
- Modelling languages & standards
  - VRML 97 for modelling the geometry of objects, props and the environment
  - H-Anim specification for humanoid modelling
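Since VRML 97 animation is driven by ROUTE statements that wire a TimeSensor through an interpolator into a target node, "changing VRML code to add/modify animation" can amount to splicing new ROUTEs into the scene file. The Java sketch below illustrates that idea only; the file and node names (scene.wrl, pushTimer, pushRot, hanim_r_shoulder) are illustrative assumptions, not CONFUCIUS' actual identifiers.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class RouteInjector {

        // Append ROUTE statements wiring a TimeSensor through an
        // OrientationInterpolator into an H-Anim joint's rotation field.
        // All node names are hypothetical placeholders.
        static void addJointRoutes(Path vrmlFile, String timer,
                                   String interp, String joint) throws IOException {
            String routes =
                "ROUTE " + timer + ".fraction_changed TO " + interp + ".set_fraction\n" +
                "ROUTE " + interp + ".value_changed TO " + joint + ".set_rotation\n";
            Files.write(vrmlFile, routes.getBytes(),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        public static void main(String[] args) throws IOException {
            // e.g. drive the right shoulder with a "push" keyframe animation
            addJointRoutes(Paths.get("scene.wrl"),
                           "pushTimer", "pushRot", "hanim_r_shoulder");
        }
    }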
Agents and avatars: how much autonomy?
- Autonomous agents have higher requirements for sensing, memory, reasoning, planning, behaviour control & emotion (a sense-emotion-control-action structure)
- "User-controlled" avatars require fewer autonomous actions, though basic naive physics such as collision detection and reaction is still required
- A virtual character in non-interactive storytelling falls between agents and avatars: its behaviours, emotions and responses to the changing environment are described in the story input
- Autonomy & intelligence of virtual humans, from low to high: avatars -> characters in non-interactive storytelling -> interface agents -> autonomous agents

Graphics library
- Objects/props: simple geometry files
- Characters: geometry & joint-hierarchy files (H-Anim), instantiated per story
- Motions: an animation library of keyframes

Levels of Articulation (LOA) of H-Anim
- CONFUCIUS adopts LOA1 in human animation; the animation engine adds ROUTEs dynamically, as in the sketch above, based on H-Anim's joints & animation keyframes
- CONFUCIUS' human animation can be adapted for other LOAs
- [Figures: joints and segments of LOA1; example Site nodes on the hands, used for pushing and holding objects]

Semantic representations
- Rule-based representations, e.g. First Order Predicate Calculus (FOPC): general knowledge representation & reasoning; expert systems
- Semantic networks: sentence representation; lexical semantics; expert systems
- Frame-based representations, e.g. Schank's scripts and Conceptual Dependency (CD): story understanding
- XML-based representations: multimodal semantics
- Decompositional representations
  - Event-logic truth conditions; x-schemas and f-structs: physical knowledge representation & reasoning (incl. spatial/temporal reasoning); dynamic vision (movement) recognition & generation
  - Lexical Conceptual Structure (LCS); Lexical Visual Semantic Representation (LVSR)

Lexical Visual Semantic Representation
- LVSR: a semantic representation between language syntax and 3D models
- Based on Jackendoff's LCS, adapted to the task of language visualisation (enhanced with Schank's scripts)
- Ontological categories: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY
  - OBJ: props/places (e.g. buildings)
  - HUMAN: human beings and other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined
  - EVENT: actions, movements and manners
  - STATE: static existence
  - PROPERTY: attributes of an OBJ/HUMAN
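To make the notation concrete, here is a plausible LVSR-style decomposition of "John put a cup of coffee on the table" (one of the implemented examples shown later), composed only from the categories above and the PATH/PLACE predicates on the next slide; the exact notation used inside CONFUCIUS may differ in detail:

    [EVENT cause([HUMAN john],
                 [EVENT go([OBJ cup_of_coffee],
                           [PATH to([PLACE on([OBJ table])])])])]

Reading: John causes the cup of coffee to move along a PATH that is directed and terminated (the predicate to) at a PLACE carrying the <+contact> feature (the predicate on), which is what licenses placing the cup's geometry against the table top.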
PATH & PLACE predicates
- PATH & PLACE predicates interpret the spatial movement of OBJs/HUMANs
- 62 common English prepositions reduce to 7 PATH predicates & 11 PLACE predicates

  PATH predicate   Direction feature   Termination feature
  to               1                   1
  from             0                   1
  toward           1                   0
  away_from        0                   0
  via              n/a                 0
  across           n/a                 n/a
  along            n/a                 n/a

  PLACE predicate   Contact/attach feature
  at                unmarked
  behind            <-contact>
  end_of            n/a
  in                unmarked
  in_front_of       <-contact>
  near              <-contact>
  on                <+contact>
  out               unmarked
  over              <-contact>
  top_of            n/a
  under             unmarked

NLP in CONFUCIUS
- Pre-processing: part-of-speech tagger, morphological parser
- Syntactic parsing: Connexor FDG parser
- Semantic inference: WordNet, the LCS database
- Disambiguation: FEATURES
- Coreference resolution
- Temporal reasoning: lexical and post-lexical temporal relations

Visual valency & verb ontology
2.2.1 Human action verbs
  2.2.1.1 One visual valency (the role is a human; (partial) movement)
    2.2.1.1.1 Biped kinematics: arm actions (wave, scratch), leg actions (walk, jump, kick), torso actions (bow), combined actions (climb)
    2.2.1.1.2 Facial expressions & lip movement, e.g. laugh, fear, say, sing, order
  2.2.1.2 Two visual valency (at least one role is a human)
    2.2.1.2.1 One human and one object (transitive verb, or intransitive verb + instrument), e.g. throw, push, kick, open, eat, drink, bake, trolley
    2.2.1.2.2 Two humans, e.g. fight, chase, guide
  2.2.1.3 Visual valency >= 3 (at least one role is a human)
    2.2.1.3.1 Two humans and one object (incl. ditransitive verbs), e.g. give, show
    2.2.1.3.2 One human and two or more objects (transitive verb + object + implicit instrument/goal/theme), e.g. cut, write, butter, pocket, dig, cook
  2.2.1.4 Verbs without a distinct visualisation out of context: verbs of trying, helping, letting, creating/destroying
  2.2.1.5 High-level behaviours (routine events) and political/social activities, e.g. interview, eat out (go to a restaurant), go shopping

Level of Detail (LOD): basic-level verbs & troponyms
- Event-level verbs (EVENT): go, cause, ...
- Manner-level verbs: walk, climb, run, jump
- Troponym-level verbs: limp, stride, trot, swagger, jog, romp, skip, bounce, hop

Current status of implementation
- Collision detection example (contact verbs: hit, collide, scratch, touch)
  - "The car collided with a wall.", using ParallelGraphics' VRML extension for object-to-object collision, with non-speech sound effects
- H-Anim examples
  - Three-visual-valency verbs: "John put a cup of coffee on the table." (H-Anim Site nodes; locative tags on objects, e.g. an on_table tag for the table object)
  - Two-visual-valency verbs: "John pushed the door." "John ate the bread." "Nancy sat on the chair."
  - One-visual-valency verbs: "The waiter came to me: 'Can I help you, Sir?'"
- Speech modality & lip synchronisation
- Camera direction (avatar's point of view)
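VRML 97's built-in Collision node only detects collisions between the viewer and scene geometry, which is why object-to-object tests rely on ParallelGraphics' extension. As a language-neutral illustration of the underlying check (not ParallelGraphics' API, which performs this inside the browser), here is a minimal axis-aligned bounding-box test in Java:

    // Minimal axis-aligned bounding-box (AABB) overlap test: an
    // illustration of an object-to-object collision check, not the
    // ParallelGraphics extension's actual mechanism.
    public class Aabb {
        final float[] min = new float[3];
        final float[] max = new float[3];

        Aabb(float[] lo, float[] hi) {
            System.arraycopy(lo, 0, min, 0, 3);
            System.arraycopy(hi, 0, max, 0, 3);
        }

        // True if the two boxes overlap on all three axes.
        static boolean collide(Aabb a, Aabb b) {
            for (int i = 0; i < 3; i++)
                if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
            return true;
        }

        public static void main(String[] args) {
            // "The car collided with a wall": fire the crash sound
            // effect on the first frame where the two boxes overlap.
            Aabb car  = new Aabb(new float[]{0f, 0f, 0f},    new float[]{2f, 1f, 4f});
            Aabb wall = new Aabb(new float[]{1.5f, 0f, -1f}, new float[]{1.7f, 3f, 10f});
            System.out.println(collide(car, wall)); // true
        }
    }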
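The visual valency of the main verb tells the animation generator how many roles (characters and props) it must instantiate before animating a sentence. A hypothetical lookup in that spirit, using verbs from the ontology and the examples above; the class and map are illustrative, not CONFUCIUS' actual code:

    import java.util.Map;

    // Hypothetical sketch: verb -> number of visual roles to instantiate,
    // following the visual-valency classes of the verb ontology above.
    public class VisualValency {
        static final Map<String, Integer> VALENCY = Map.of(
                "walk", 1, "jump", 1,            // one human, (partial) movement
                "push", 2, "eat", 2, "sit", 2,   // one human + one object
                "fight", 2,                      // two humans
                "give", 3, "show", 3,            // two humans + one object
                "cut", 3);                       // human + object + implicit instrument

        public static void main(String[] args) {
            System.out.println("push -> " + VALENCY.get("push")); // 2
        }
    }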
Relation to other work
- Domain-independent, general-purpose humanoid character animation: CONFUCIUS' character animation focuses on the language-to-humanoid-animation process rather than on human modelling & motion alone
- An implementable semantic representation: LVSR connects linguistic semantics to visual semantics and is suitable for action execution (animation)
- Categorisation and visualisation of eventive verbs based on visual valency
- A reusable common-sense knowledge base to elicit the implied actions, instruments, goals and themes underspecified in language input

Conclusion & future work
- Conclusions
  - Humanoid animation explores problems in language visualisation & automatic animation production
  - Formalises the meaning of action verbs and spatial prepositions
  - Maps language primitives to visual primitives
  - Offers a reusable common-sense knowledge base for other systems
- Further work
  - Discourse-level interpretation
  - Action composition for simultaneous activities
  - Verbs concerning multiple characters' synchronisation & coordination (e.g. introduce)
- Prospective applications
  - Children's education
  - Movie/drama production
  - Multimedia presentation
  - Computer games
  - Virtual reality