WELCOME
TAG Based Parsing for Machine Translation: English to Indian Languages
Dr. Hemant Darbari
Programme Co-ordinator
Applied Artificial Intelligence Group, & ACTS
Advanced Computing Training School
C-DAC, Pune
[email protected]
Outline

MANTRA: Introduction

Parsing Process in TAG: An Overview

Workflow of TAG Parser

Generation Process in MANTRA

Generation Process in MANTRA for Multilingual Translation

Sample Outputs of MANTRA

Samples of Constructions Solved through TAG

Issues Regarding Structural Differences and Translation Accuracy

System specifications

MANTRA: Achievements
MANTRA: Introduction
MANTRA
MANTRA is an acronym of
MAchiNe assisted TRAnslation tool.
• A Tree Adjoining Grammar (TAG) based machine translation system of the
Applied AI Group of C-DAC, Pune
• MANTRA translates English documents into Hindi and other Indian
languages, such as Oriya <O>, Tamil <T>, Urdu <U>, Marathi <M> &
Bangla <B>
• MANTRA covers the following domains: Administration, Finance,
Agriculture, Small Scale Industries, Information Technology,
Healthcare, Tourism, and proceedings and documents of the Rajya Sabha
Parsing Process in TAG: An Overview
Tree Adjoining Grammar (TAG)
TAG stands for Tree Adjoining Grammar
• The formalism is based on the research of Aravind Joshi (1987)
• Trees are the basic building blocks of this formalism
• In contrast to other formalisms, where dependencies are
defined between the elements of a rule (nodes), in TAG
dependencies are defined between different trees.
Tree Adjoining Grammar (TAG)
A TAG is defined as a 5-tuple grammar
G = (N, T, S, I, A) where
• N is a finite set of non-terminal symbols
• T is a finite set of terminals
• S is a distinguished non-terminal
• I is a finite set of trees called initial trees
• A is a finite set of trees called auxiliary trees
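The 5-tuple can be written down directly as a data structure. The following is a minimal sketch, not MANTRA's implementation: the class names `Node` and `TAG`, the helper `is_well_formed`, and the toy grammar are all illustrative assumptions. The check that an auxiliary tree has exactly one foot node carrying the same label as its root is the standard TAG convention rather than something stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False    # True for the foot node of an auxiliary tree
    subst: bool = False   # True for a frontier node marked for substitution


@dataclass
class TAG:
    N: Set[str]    # non-terminal symbols
    T: Set[str]    # terminal symbols
    S: str         # distinguished non-terminal
    I: List[Node]  # initial trees
    A: List[Node]  # auxiliary trees


def foot_nodes(tree: Node) -> List[Node]:
    """Collect every node in the tree that is flagged as a foot node."""
    found = [tree] if tree.foot else []
    for child in tree.children:
        found.extend(foot_nodes(child))
    return found


def is_well_formed(g: TAG) -> bool:
    """Check the basic shape constraints on the two tree sets."""
    if g.S not in g.N:
        return False
    if any(foot_nodes(t) for t in g.I):        # initial trees have no foot node
        return False
    for t in g.A:                              # one foot node, labelled like the root
        feet = foot_nodes(t)
        if len(feet) != 1 or feet[0].label != t.label:
            return False
    return True


# Hypothetical toy grammar for the phrase "the good boy".
toy = TAG(
    N={"NP", "D", "N", "adj"},
    T={"the", "good", "boy"},
    S="NP",
    I=[Node("NP", [Node("D", subst=True), Node("N", [Node("boy")])]),
       Node("D", [Node("the")])],
    A=[Node("N", [Node("adj", [Node("good")]), Node("N", foot=True)])],
)
print(is_well_formed(toy))
```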
Tree Adjoining Grammar (TAG)
• This is an LR Parser
• It combines both top-down and bottom-up operations,
which is why it is called a hybrid parser
• It supports multiple parallel parses
Tree Adjoining Grammar (TAG)
A state S is defined as a 10-tuple,
S = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]
where:
• α: the current tree being parsed.
• dot: the current position of the dot in the tree α.
• side: the side of the symbol the dot is on;
side ∈ {left, right}.
• pos: the position of the dot;
pos ∈ {above, below}.
• l: the latest index in the input lexical array.
• f_l: the index of the input lexical array that is found before the
foot node.
• f_r: the index of the input lexical array that is found after the foot
node.
• star: the position of the most recently adjoined node.
• t_l*: the index of the input lexical array corresponding to the point of
adjunction at star.
• b_l*: the index of the input lexical array that is found just before
the foot node at star.
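As a rough illustration of the state described above, the ten components can be held in an immutable record so that states can be stored in a set (a chart). This is only a sketch under assumptions: the class name `State`, the field names, the use of `None` for components that are not yet bound, and the encoding of a node position as a tuple of child indices are all choices made here, not taken from the slides.

```python
from typing import NamedTuple, Optional, Tuple


class State(NamedTuple):
    tree: str                          # name of the tree being parsed
    dot: Tuple[int, ...]               # node address: path of child indices, () = root (assumed encoding)
    side: str                          # "left" or "right"
    pos: str                           # "above" or "below"
    l: int                             # latest index in the input lexical array
    f_l: Optional[int]                 # input index before the foot node
    f_r: Optional[int]                 # input index after the foot node
    star: Optional[Tuple[int, ...]]    # address of the most recently adjoined node
    t_l_star: Optional[int]            # input index at the point of adjunction at star
    b_l_star: Optional[int]            # input index just before the foot node at star


def make_state(tree, dot, side, pos, l, f_l=None, f_r=None,
               star=None, t_l_star=None, b_l_star=None) -> State:
    """Build a state, rejecting values outside the two small enumerations."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if pos not in ("above", "below"):
        raise ValueError("pos must be 'above' or 'below'")
    return State(tree, dot, side, pos, l, f_l, f_r, star, t_l_star, b_l_star)


# A state with the dot to the left of and above the root of a hypothetical
# tree "alpha", before any input has been read.
start = make_state("alpha", (), "left", "above", 0)
chart = {start, make_state("alpha", (), "left", "above", 0)}
print(len(start), len(chart))
```

Because a `NamedTuple` is hashable and compares by value, adding the same state twice leaves the chart unchanged, which is the behaviour a chart-based recognizer relies on.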
Tree Adjoining Grammar (TAG)
There are two fundamental kinds of trees in TAG
• Initial Tree
• Auxiliary Tree
Sentences can be represented using a derived tree,
constructed from initial and auxiliary trees through
adjunction and/or substitution operations
Tree Adjoining Grammar (TAG)
Initial tree
• Initial trees represent the basic syntactic relations in a sentence
• Every interior node of an initial tree is labeled with a non-terminal symbol
• Every frontier node is labeled either with a terminal symbol
or with a non-terminal symbol marked for substitution
(conventionally ↓)
• A derivation starts with an initial tree, which is combined with other
trees via substitution or adjunction
Tree Adjoining Grammar (TAG)
[Figure: INITIAL TREES α1, α2 and α3, drawn over the nodes NP_r, S_r, D_r, N, D, NP, VP, V and the words "the", "boy" and "left". Non-terminal nodes, frontier nodes and terminal nodes are pointed out.]
Tree Adjoining Grammar (TAG)
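Initial tree
Example for Initial Tree:
Ram has arrived
[Figure: elementary trees for "Ram has arrived": a tree rooted at S with the substitution node NP0 and a VP over V "arrived"; a tree NP over N "Ram"; and a tree VP over V "has" and the node VP* (NA).]
-> Nodes carrying the substitution mark are substitution nodes of an initial tree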
Tree Adjoining Grammar (TAG)
Substitution
• Substitution is a simple attachment operation
• Substitution replaces a frontier node with another tree
whose top node has the same label
• After substitution the result is a derived tree
• Only initial or derived trees can be substituted into another
tree
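The substitution operation described above can be sketched in a few lines. This is an illustrative toy, not the MANTRA code: the `Node` class, the functions `substitute` and `yield_of`, and the simplified node labels ("NP", "D", "N") are assumptions made for the example.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False   # True for a frontier node marked for substitution


def substitute(tree: Node, site_label: str, initial: Node) -> Node:
    """Return a copy of `tree` in which the first substitution node labelled
    `site_label` is replaced by a copy of `initial`."""
    if initial.label != site_label:
        raise ValueError("top node label must match the substitution node")
    result = copy.deepcopy(tree)

    def walk(node: Node) -> bool:
        for i, child in enumerate(node.children):
            if child.subst and child.label == site_label:
                node.children[i] = copy.deepcopy(initial)
                return True
            if walk(child):
                return True
        return False

    if not walk(result):
        raise ValueError("no substitution node with that label")
    return result


def yield_of(node: Node) -> List[str]:
    """Read the frontier of a tree from left to right."""
    if not node.children:
        return [node.label]
    words: List[str] = []
    for child in node.children:
        words.extend(yield_of(child))
    return words


# NP with an open determiner slot, and a determiner tree to fill it.
alpha = Node("NP", [Node("D", subst=True), Node("N", [Node("boy")])])
alpha1 = Node("D", [Node("the")])

derived = substitute(alpha, "D", alpha1)
print(" ".join(yield_of(derived)))   # prints "the boy"
```

The function copies both trees, so the elementary trees in the grammar stay unchanged and can be reused in later derivations.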
Tree Adjoining Grammar (TAG)
Auxiliary Trees
[Figure: AUXILIARY TREES β1, β2 and β3, drawn over the nodes VP, NP, adj and adv, with the foot nodes VP*, adj* and NP* and the words "pretty" and "today".]
-> Nodes marked with * are foot nodes.
Tree Adjoining Grammar (TAG)
Substitution operation
[Figure: the trees α and α1, drawn over the nodes NP_r, D_r, D and N and the words "the" and "boy", are combined by substitution into the derived tree.]
Tree Adjoining Grammar (TAG)
Adjunction
• Adjunction is an insertion operation.
• Adjunction inserts an auxiliary tree into another tree.
• The foot node label of the auxiliary tree must match the
label of the node at which it adjoins.
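Adjunction can be sketched in the same style as an operation on small tree objects. Again this is only an illustration under assumptions: `Node`, `adjoin`, `find_foot` and `yield_of` are names invented here, the labels are simplified, and the sketch adjoins at the first interior node with the requested label rather than at a node chosen by a parser.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False   # True for the foot node of an auxiliary tree


def find_foot(node: Node) -> Optional[Node]:
    if node.foot:
        return node
    for child in node.children:
        hit = find_foot(child)
        if hit is not None:
            return hit
    return None


def find_interior(node: Node, label: str) -> Optional[Node]:
    """First interior node (preorder) carrying the given label."""
    if node.label == label and node.children:
        return node
    for child in node.children:
        hit = find_interior(child, label)
        if hit is not None:
            return hit
    return None


def adjoin(tree: Node, label: str, aux: Node) -> Node:
    """Return a copy of `tree` with `aux` inserted at a node labelled `label`."""
    result, new_aux = copy.deepcopy(tree), copy.deepcopy(aux)
    foot = find_foot(new_aux)
    if foot is None or foot.label != label or new_aux.label != label:
        raise ValueError("root and foot of the auxiliary tree must match the node label")
    target = find_interior(result, label)
    if target is None:
        raise ValueError("no node with that label to adjoin at")
    # The subtree below the target moves under the foot node, and the
    # auxiliary tree takes the target's place (the labels are identical).
    foot.children, foot.foot = target.children, False
    target.children = new_aux.children
    return result


def yield_of(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in yield_of(child)]


derived = Node("NP", [Node("D", [Node("the")]), Node("N", [Node("boy")])])
beta3 = Node("N", [Node("adj", [Node("good")]), Node("N", foot=True)])

print(" ".join(yield_of(adjoin(derived, "N", beta3))))   # prints "the good boy"
```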
Tree Adjoining Grammar (TAG)
Adjunction operation between initial and auxiliary tree
[Figure: the tree α (NP_r over D "the" and N "boy") and the auxiliary tree β3 (N over adj "good" and the foot node N*). Annotations: "β3 is inserted here below this node", "Sub tree is substituted here", "Sub Tree", "Foot node".]
• The derived tree from the substitution operation
now becomes the initial tree for adjunction.
Tree Adjoining Grammar (TAG)
Derived tree after adjunction
[Figure: the derived tree rooted at NP_r, over the nodes D, N, adj and N and the words "the", "good" and "boy".]
WORKFLOW DIAGRAM OF TAG PARSER
Dot Traversal of TAG Parser
[Figure: the path of the dot around a tree with nodes A to I, from "start" to "end".]
Predict Operation
Scanner Operation
Complete Operation
[Figure: derived tree over the nodes S_r (NA), S (NA), S and S* (NA) and the terminals a, b, c, d and e.]
The Earley-type recognizer for TAGs applies the following seven operations to each state
s = [α, dot, side, pos, l, f_l, f_r, star, t_l*, b_l*]:
1. Scanner
2. Move dot down
3. Move dot up
4. Left Predictor
5. Left Completor
6. Right Predictor
7. Right Completor
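The "move dot down" and "move dot up" operations walk the dot around a tree, starting to the left of and above the root and ending to the right of and above it. The sketch below enumerates that path for a small tree. It is a simplification made for illustration: trees are plain `(label, children)` tuples, the function name `dot_traversal` and the example tree are hypothetical, and every node is given all four dot positions, whereas a recognizer may cross some leaves in a single step (for example when the Scanner reads a terminal).

```python
from typing import Iterator, List, Tuple

Tree = Tuple[str, List["Tree"]]
DotPosition = Tuple[str, str, str]   # (node label, side, pos)


def dot_traversal(tree: Tree) -> Iterator[DotPosition]:
    """Yield the dot positions in the order the dot visits them."""
    label, children = tree
    yield (label, "left", "above")
    yield (label, "left", "below")
    for child in children:            # move dot down, child by child
        yield from dot_traversal(child)
    yield (label, "right", "below")   # move dot up after the last child
    yield (label, "right", "above")


# Hypothetical tree: S over NP "Ram" and VP over V "arrived".
example: Tree = ("S", [("NP", [("Ram", [])]),
                       ("VP", [("V", [("arrived", [])])])])

path = list(dot_traversal(example))
for label, side, pos in path:
    print(f"{label:8s} {side:5s} {pos}")
```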
Generation Process in MANTRA
STEP 1:
The TAG generator selects a sentence initial tree from the corresponding target
language.
STEP 2:
The TAG generator performs the synthesis according to the target language structure
(sentence order).
STEP 3:
The TAG generator performs operations such as substitution,
adjunction, node anchoring, and node embedding.
Generation Process in MANTRA for Multilingual Translation
Multilingual translation through TAG-based parsing and
generation in MANTRA
Generator input: Jaipur is the pink city of India
Generator output: English - Hindi
Generator output: English - Oriya
Generator output: English - Urdu
Generator output: English - Marathi
Generator output: English - Tamil
Sample Outputs of MANTRA
Sample Outputs For English - Hindi
Sample Outputs For English - Marathi
Sample Outputs For English - Oriya
Sample Outputs For English - Urdu
Sample Outputs For English - Bangla
Sample Outputs For English - Tamil
Samples of Constructions Solved through TAG
Passive constructions:
The deputation of officers to the post will be governed by the OM referred to above.
Stative Constructions:
The leave sanctioned to Shri Bhat stands cancelled.
Transposing and reframing of clause order and
phrase order:
Officers possessing experience of the post are hereby promoted......
नकदी के अनुभव रखनेवाले अधिकारी ... (relative
clause formation)
अधिकारी जो नकदी के अनुभव रखते
हैं .. (shifting of clause or phrase order)
Changing of verb class: transitive verb to linking verbs
The post carries a special pay.
(transitive)
इस पद के साथ विशेष वेतन का प्रावधान है । (linking verb)
He will be designated as Secretary (Finance).
उनका पदनाम सचिव (वित्त) के रूप में होगा । (linking verb)
वे सचिव (वित्त) पदनामित होंगे । (transitive)
Handling frozen expressions
Orders have been issued vide Office Memorandum No dol/08/1a to all the rajbhasha
officials
सभी राजभाषा अधिकारियों को आदेश जारी कर दिए गये हैं देखिए
कार्यालय ज्ञापन संख्या रा. वि/08/क
Issues Regarding Structural Differences & Translation Accuracy
English to Oriya
Plural adjectives require singular nouns:
Adjectives such as all, both, etc. take the singular noun form in the sentence rather than the plural.
Ex: Rajasthan State Transport Corporation (RSTC) has bus services to all the major destinations of
north India.
Sentences with relative pronouns show syntactic variation in the output:
Ex: Bikaner is also one major hub for the tourists looking for an adventurous Camel
ride, which gives an insight into the exquisite lifestyle of remote Rajasthan.
Honorific problem:
It is not possible to supply the honorific marking from context.
Ex: The majestic Ashoka pillar records visit of emperor Ashoka to Sarnath.
Accuracy in translation from English to Oriya is 50%
English to Marathi
Postposition not joined to the root
Jaipur , popularly-known-as the Pink-City , is the capital of Rajasthan-state , India
Position of clause
Kaziranga National Park is best known for the one-horned Rhinoceros.
Accuracy in Translation from English to Marathi is 30%
English to Urdu
Urdu is an inflectional or isolating language like Hindi. Variation in
lexical choice is a major feature of Urdu.
Problems identified at the syntactic level:
Arrangement of clauses
Activization of passive sentences
Accuracy in translation from English to Urdu is 40%
System Specification in MANTRA
Available platforms and technology:
• Web Based Solution (Internet): Java, EJB; MySQL versions (Normal, Encrypted)
• Enterprise Solution (Intranet): VC++; Access version (Normal)
• Desktop solutions (Standalone): VC++; SQL Express version (Normal), MSDE version (Normal)
• Desktop solutions (Standalone): SQL versions (Normal, Encrypted)
MANTRA: Achievements
MANTRA Technology is a recipient
of the Computerworld Smithsonian
Award and is a part of the
“1999 Innovation Collection” in the
National Museum of American
History.
MANTRA: Achievements
Launched on 14th Sept 2007 by Honorable Minister of Home Affairs, GOI
MANTRA: Achievements
Papers to be Laid on the Table [PLOT]
List Of Business [LOB]
Parliamentary Bulletin Part-I
Launched on 29th August 2007 by Honorable Vice-President of India
Thank You!