WELCOME

TAG Based Parsing for Machine Translation: English to Indian Languages

Dr. Hemant Darbari
Programme Co-ordinator
Applied Artificial Intelligence Group & ACTS (Advanced Computing Training School)
C-DAC, Pune

Outline
• MANTRA: Introduction
• Parsing Process in TAG: An Overview
• Workflow of TAG Parser
• Generation Process in MANTRA
• Generation Process in MANTRA for Multilingual Translation
• Sample Outputs of MANTRA
• Samples of Constructions Solved through TAG
• Issues Regarding Structural Differences and Translation Accuracy
• System Specifications
• MANTRA: Achievements

MANTRA: Introduction

MANTRA
• MANTRA is an acronym of MAchiNe assisted TRAnslation tool.
• It is a Tree Adjoining Grammar (TAG) based machine translation system of the Applied AI Group of C-DAC, Pune.
• MANTRA translates English documents into Hindi and other Indian languages, such as Oriya <O>, Tamil <T>, Urdu <U>, Marathi <M> and Bangla <B>.
• MANTRA covers the following domains: Administration, Finance, Agriculture, Small Scale Industries, Information Technology, Healthcare, Tourism, and the proceedings and documents of the Rajya Sabha.

Parsing Process in TAG: An Overview

Tree Adjoining Grammar (TAG)
• TAG stands for Tree Adjoining Grammar.
• The formalism is based on the research of Aravind Joshi (1987).
• Trees are the basic building blocks of the formalism.
• In other formalisms, dependencies are defined between the elements of a rule (nodes); in TAG, dependencies are defined between different trees.

Tree Adjoining Grammar (TAG)
A TAG is defined as a 5-tuple G = (N, T, S, I, A), where
• N is a finite set of non-terminal symbols,
• T is a finite set of terminal symbols,
• S is a distinguished non-terminal,
• I is a finite set of trees called initial trees, and
• A is a finite set of trees called auxiliary trees.

Tree Adjoining Grammar (TAG)
• The TAG parser is an LR parser.
• It combines top-down and bottom-up operations, which is why it is called a hybrid parser.
• It supports multiple parallel parses.

Tree Adjoining Grammar (TAG)
A state s is defined as a 10-tuple, s = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*], where:
• α: the current tree being parsed.
• dot: the current position of the dot in the tree α.
• side: the side of the symbol the dot is on, side ∈ {left, right}.
• pos: the position of the dot, pos ∈ {above, below}.
• l: the latest index in the input lexical array.
• f_l (foot_l): the index of the input lexical array found before the foot node.
• f_r (foot_r): the index of the input lexical array found after the foot node.
• star: the position of the most recently adjoined node.
• t_l*: the index of the input lexical array corresponding to the point of adjunction at star.
• b_l*: the index of the input lexical array found just before the foot node at star.

Tree Adjoining Grammar (TAG)
There are two fundamental kinds of tree in TAG:
• Initial tree
• Auxiliary tree
A sentence is represented by a derived tree, constructed from initial and auxiliary trees through adjunction and/or substitution operations.

Tree Adjoining Grammar (TAG): Initial tree
• Initial trees represent the basic syntactic relations in a sentence.
• Every interior node of an initial tree is labeled with a non-terminal symbol.
• Every frontier node is labeled either with a terminal symbol or with a non-terminal symbol marked for substitution (conventionally written ↓).
• A derivation starts with an initial tree, which is combined with other trees via substitution or adjunction.

Tree Adjoining Grammar (TAG): Initial trees
[Figure: initial trees α1, α2 and α3, with the non-terminal nodes, terminal nodes and frontier nodes called out. Node labels include NP_r, D_r, D, N, S_r, NP, VP and V; the words are "the", "boy" and "left".]

Tree Adjoining Grammar (TAG): Initial tree
Example of an initial tree: "Ram has arrived"
[Figure: trees for the example, with node labels S, NP, NP0, N, VP, V and VP* (NA) and the words "Ram", "has" and "arrived". Nodes carrying the substitution mark are substitution sites, which identify an initial tree.]

Tree Adjoining Grammar (TAG): Substitution
• Substitution is a simple attachment operation.
• Substitution replaces a frontier node with another tree whose top node has the same label.
• The result of a substitution is a derived tree.
• Only an initial tree or a derived tree can be substituted into another tree.

Tree Adjoining Grammar (TAG): Auxiliary trees
[Figure: auxiliary trees β1, β2 and β3. Node labels include VP, VP*, adv, adj, adj*, NP and NP*; the words are "today" and "pretty". Nodes marked with * are foot nodes.]

Tree Adjoining Grammar (TAG): Substitution operation
[Figure: substitution operation. The trees α and α1, with nodes NP_r, D_r, D and N and the words "the" and "boy", combine into the derived tree.]

Tree Adjoining Grammar (TAG): Adjunction
• Adjunction is an insertion operation.
• Adjunction inserts an auxiliary tree into another tree.
• The label of the foot node of the auxiliary tree must match the label of the node at which it adjoins.

Tree Adjoining Grammar (TAG): Adjunction operation between initial and auxiliary tree
[Figure: the tree α (NP_r over D "the" and N "boy") and the auxiliary tree β3 (N over adj "good" and foot node N*). Callouts: "β3 is inserted here, below this node"; "the sub tree is substituted here" (at the foot node).]
The derived tree from the substitution operation now becomes the initial tree for adjunction.

Tree Adjoining Grammar (TAG): Derived tree after adjunction
[Figure: derived tree NP_r with D "the", adj "good" and N "boy".]

WORK FLOW DIAGRAM OF TAG PARSER

Dot Traversal of TAG Parser
[Figure: dot traversal from start to end over a tree with nodes A, B, C, D, E, F, G, H and I.]
[Figure: predict operation, scanner operation, complete operation and derived tree, shown on trees with nodes S_r (NA), S (NA), S and S* (NA) and terminals a, b, c, d and e.]

The Earley-type recognizer for TAGs applies the following seven operations to each state s = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]:
1. Scanner
2. Move dot down
3. Move dot up
4. Left Predictor
5. Left Completor
6. Right Predictor
7. Right Completor

Generation Process in MANTRA
STEP 1: The TAG generator selects a sentence initial tree from the corresponding target language.
STEP 2: The TAG generator performs synthesis according to the target-language structure (sentence order).
STEP 3: The TAG generator performs operations such as substitution, adjunction, node anchoring and node embedding.
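The substitution and adjunction operations named in STEP 3 can be illustrated with a minimal Python sketch built around the "the good boy" example. This is not MANTRA's implementation: the Node class, the helper functions (find, substitute, adjoin, words) and the simplified labels NP, D and N are all assumptions made for illustration only.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not taken from MANTRA."""
    label: str                                   # non-terminal label or a word
    children: List["Node"] = field(default_factory=list)
    subst: bool = False                          # frontier node marked for substitution
    foot: bool = False                           # foot node of an auxiliary tree
    terminal: bool = False                       # True for lexical (word) leaves


def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in pre-order that satisfies pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree: Node, initial: Node) -> Node:
    """Attach `initial` at the first substitution site with the same label."""
    result = copy.deepcopy(tree)
    site = find(result, lambda n: n.subst and n.label == initial.label)
    if site is None:
        raise ValueError("no matching substitution site")
    site.children = copy.deepcopy(initial).children
    site.subst = False
    return result


def adjoin(tree: Node, aux: Node, label: str) -> Node:
    """Insert auxiliary tree `aux` at the first interior node labeled `label`."""
    result = copy.deepcopy(tree)
    new = copy.deepcopy(aux)
    foot = find(new, lambda n: n.foot)
    if foot is None or new.label != label or foot.label != label:
        raise ValueError("root and foot labels must match the adjunction site")
    site = find(result, lambda n: n.label == label
                and not n.terminal and not n.subst)
    if site is None:
        raise ValueError("no matching adjunction site")
    # The subtree below the site moves under the foot node,
    # and the auxiliary tree takes its place at the site.
    foot.children, foot.foot = site.children, False
    site.children = new.children
    return result


def words(node: Node) -> List[str]:
    """Read the words off the frontier, left to right."""
    if node.terminal:
        return [node.label]
    return [w for child in node.children for w in words(child)]


# Initial tree: NP over a substitution site D and N "boy".
alpha1 = Node("NP", [Node("D", subst=True),
                     Node("N", [Node("boy", terminal=True)])])
# Initial tree: D "the".
alpha2 = Node("D", [Node("the", terminal=True)])
# Auxiliary tree: N over adj "good" and the foot node N*.
beta = Node("N", [Node("adj", [Node("good", terminal=True)]),
                  Node("N", foot=True)])

derived = substitute(alpha1, alpha2)      # yields "the boy"
derived = adjoin(derived, beta, "N")      # yields "the good boy"
print(" ".join(words(derived)))
```

The sketch copies trees before modifying them, so the elementary trees can be reused in further derivations; a real generator would also have to choose among several candidate sites, which this example does not attempt.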
Generation Process in MANTRA for Multilingual Translation
Multilingual translation through TAG based parsing and generation in MANTRA
Generator input: Jaipur is the pink city of India
• Generator output: English - Hindi
• Generator output: English - Oriya
• Generator output: English - Urdu
• Generator output: English - Marathi
• Generator output: English - Tamil

Sample Outputs of MANTRA
• Sample outputs for English - Hindi
• Sample outputs for English - Marathi
• Sample outputs for English - Oriya
• Sample outputs for English - Urdu
• Sample outputs for English - Bangla
• Sample outputs for English - Tamil

Samples of Constructions Solved through TAG

Passive constructions:
The deputation of officers to the post will be governed by the OM referred to above.

Stative constructions:
The leave sanctioned to Shri Bhat stands cancelled.

Transposing and reframing of clause order and phrase order:
Officers possessing experience of the post are hereby promoted......
[Hindi text unreadable in source] (relative clause formation)
[Hindi text unreadable in source] (shifting of clause or phrase order)

Changing of verb class: transitive verbs to linking verbs
The post carries a special pay. (transitive)
[Hindi text unreadable in source] (linking verb)
He will be designated as Secretary (Finance).
[Hindi text unreadable in source] (linking verb)
[Hindi text unreadable in source] (transitive)

Handling frozen expressions:
Orders have been issued vide Office Memorandum No dol/08/1a to all the rajbhasha officials.
[Hindi text unreadable in source]

Issues Regarding Structural Differences & Translation Accuracy

English to Oriya
Plural adjectives require singular nouns: adjectives such as "all" and "both" take the singular noun form in the sentence rather than the plural.
Ex: Rajasthan State Transport Corporation (RSTC) has bus services to all the major destinations of north India.
Relative-pronoun sentences show syntactic variation in the output.
Ex: Bikaner is also one major hub for the tourists looking for an adventurous Camel ride, which gives an insight into the exquisite lifestyle of remote Rajasthan.

English to Oriya
Honorific problem: honorific marking cannot be supplied on the basis of context.
Ex: The majestic Ashoka pillar records visit of emperor Ashoka to Sarnath.
Accuracy in translation from English to Oriya is 50%.

English to Marathi
Postposition not joined to the root:
Ex: Jaipur, popularly-known-as the Pink-City, is the capital of Rajasthan-state, India
Position of clause:
Ex: Kaziranga National Park is best known for the one-horned Rhinoceros.
Accuracy in translation from English to Marathi is 30%.

English to Urdu
Urdu is an inflectional or isolating language, like Hindi. Variation in lexical choice is the major feature of Urdu.
Problems identified at the syntactic level:
• Arrangement of clauses
• Activization of passive sentences
Accuracy in translation from English to Urdu is 40%.

System Specification in MANTRA
Available platforms and technology:
• Web based solution (Internet): Java, EJB
• Enterprise solution (Intranet): VC++
• Desktop solutions (Standalone): VC++
Database versions:
• SQL versions (Normal, Encrypted)
• MySQL versions (Normal, Encrypted)
• Access version (Normal)
• SQL Express version (Normal)
• MSDE version (Normal)

MANTRA: Achievements
• MANTRA technology is a recipient of the Computerworld Smithsonian Award and is part of the "1999 Innovation Collection" in the National Museum of American History.
• Launched on 14th Sept 2007 by the Honorable Minister of Home Affairs, GOI.
• Papers to be Laid on the Table [PLOT], List Of Business [LOB] and Parliamentary Bulletin Part-I: launched on 29th August 2007 by the Honorable Vice-President of India.

Thank You!