Download Slides - web.iiit.ac.in

Constraint based Dependency Telugu Parser
Guided by: Dr. Rajeev Sangal, Dr. Dipti Misra, Samar Hussain
Team members: Phani Chaitanya, Ravi kiran
Overview
• Motivation
• A word about the language
• Overview of constraint based parser
• Analysis of special cases
– Genitives
– Copula
– “ani” construction
– Conjuncts
• Future work
Motivation
– We planned a question answering system in Telugu, mainly for the medical and tourism domains, which could help native Telugu speakers (as a preliminary diagnosis tool and a travel guide). We needed a parser to make this easier.
A word about the language
• Telugu is a South Asian language
• Features
– Morphologically rich
– Free word order
– Agglutinative
• Challenges
– No Treebank
– No parser
– No wordnet
Overview of constraint based parser
Telugu : rAmudu iMtiki vaccAka paMduni wiMtAdu
Gloss : Rama home after_coming apple eats
English : Ram eats an apple after coming home
Processing steps (in order):
• Raw sentence
• POS tagging and chunking
• Identify source and demand groups
• Load frames (demand and transformation)
• Identify source groups satisfying demands and draw arcs
• Apply the 3 constraints and form equations for each demand
• Integer programming module (solves the equations)
• Final parse
Overview of constraint based parser
After POS tagging and chunking:
1    ((        NP
1.1  rAmudu    NN    <af=rAma,n,,,,0,,adj_vAdu,>
     ))
2    ((        NP
2.1  iMtiki    NN    <af=illu,n,,s,,0,,ki,>
     ))
3    ((        VG
3.1  vaccAka   VRB   <af=vaccu,v,,,any,0,,ina_Aka,>
     ))
4    ((        NP
4.1  paMdu     NN    <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2  ni        PREP  <af=ni,n,,s,,0,,0,>
     ))
5    ((        VG
5.1  wiMtAdu   VFM   <af=winu,v,,,3_p,0,,wA,>
5.2  .         SYM
     ))
Overview of constraint based parser
Identifying source and demand groups:
1    ((        NP    [Source]
1.1  rAmudu    NN    <af=rAma,n,,,,0,,adj_vAdu,>
     ))
2    ((        NP    [Source]
2.1  iMtiki    NN    <af=illu,n,,s,,0,,ki,>
     ))
3    ((        VG    [Demand]
3.1  vaccAka   VRB   <af=vaccu,v,,,any,0,,ina_Aka,>
     ))
4    ((        NP    [Source]
4.1  paMdu     NN    <af=paMdu,n,,s,,0,,0,>
4.2  ni        PREP  <af=ni,n,,s,,0,,0,>
     ))
5    ((        VG    [Demand]
5.1  wiMtAdu   VFM   <af=winu,v,,,3_p,0,,wA,>
5.2  .         SYM
     ))
Overview of constraint based parser
Frame for winu (eat; in basic form, so no transformation is required)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
------------------------------------------------------------------
Frame for vaccu (come)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ki       | n       | l    | c
------------------------------------------------------------------
Dependency tree for the example sentence
winu[wA] (eat)
    k1   : rAmudu (Ram)
    k2   : paMdu (fruit)
    vmod : vaccu[ina_Aka] (after coming)
        k1 : rAmudu
        k2 : iMtiki (house)
Transformation chart [ina_Aka (after + -ing)]
---------------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln | op
---------------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c    | remove
vmod      | m         |          | v       | r    | p    | insert
---------------------------------------------------------------------------
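The way a transformation chart rewrites a base frame can be sketched in a few lines of Python. This is only an illustration built from the tables on this slide: the `FrameRow` class, the `apply_transformation` function and the rule that "remove" matches rows by arc label are assumptions, not the parser's actual implementation. The readings of the column codes in the comments are also assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRow:
    """One row of a demand frame (hypothetical representation)."""
    arc_label: str
    necessity: str  # "m" is assumed to mean mandatory
    vibhakti: str   # "" when the chart leaves the cell blank
    lextype: str    # "n" / "v", assumed to mean noun / verb
    posn: str       # "l" / "r", assumed to mean left / right
    reln: str       # "c" / "p", assumed to mean child / parent


def apply_transformation(frame, chart):
    """Apply (row, op) pairs from a transformation chart to a base frame."""
    result = list(frame)
    for row, op in chart:
        if op == "remove":
            # Assumption: a row is removed by matching its arc label only.
            result = [r for r in result if r.arc_label != row.arc_label]
        elif op == "insert":
            result.append(row)
        else:
            raise ValueError(f"unknown op: {op}")
    return result


# Base frame for vaccu (come), copied from the slide.
VACCU_FRAME = [
    FrameRow("k1", "m", "0", "n", "l", "c"),
    FrameRow("k2", "m", "ki", "n", "l", "c"),
]

# Transformation chart for the TAM ina_Aka, copied from the slide.
INA_AKA_CHART = [
    (FrameRow("k1", "m", "0", "n", "l", "c"), "remove"),
    (FrameRow("vmod", "m", "", "v", "r", "p"), "insert"),
]

vaccaka_frame = apply_transformation(VACCU_FRAME, INA_AKA_CHART)
for row in vaccaka_frame:
    print(row)
```

Under these assumptions the result keeps the k2 row and gains the vmod row, which is the shape of the transformed frame for vaccAka.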
Overview of constraint based parser
Frame for vaccAka (after transformation)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k2        | m         | ki       | n       | l    | c
vmod      | m         |          | v       | r    | p
------------------------------------------------------------------
Frame for winu
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
------------------------------------------------------------------
Candidate arcs drawn over the groups rAmudu, iMtiki, vaccAka, paMduni, wiMtAdu:
X1 : k1
X2 : k2
X3 : k2
X4 : vmod
Overview of constraint based parser
C1 : For each of the mandatory karakas in a karaka chart for each demand group, there should be exactly one outgoing edge labeled by the karaka from the demand group.
C2 : For each of the optional or desirable karakas in a karaka chart for each demand group, there should be at most one outgoing edge labeled by the karaka from the demand group.
C3 : There should be exactly one incoming arc into each source group.
Equations formed by applying the above constraints:
C1 :
X1 = 1
X2 = 1
X3 = 1
X4 = 1
C2 : No optional field found
C3 :
X1 = 1
X2 = 1
X3 = 1
X4 = 1
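The step from candidate arcs to equations, and from equations to a parse, can be illustrated with a small Python sketch. Several things here are assumptions made for illustration: the pairing of X2 and X3 with particular arcs (the slide only gives their labels), the tuple layout of `CANDIDATES`, and the function names. An exhaustive search over 0/1 assignments stands in for the integer programming module, which is workable only because the example has four variables. C2 is left out because every row in the example frames is mandatory.

```python
from collections import defaultdict
from itertools import product

# (variable, demand group, arc label, dependent group).
# The X2/X3 pairings are assumed; they follow the drel values of the final parse.
CANDIDATES = [
    ("X1", "wiMtAdu", "k1", "rAmudu"),
    ("X2", "vaccAka", "k2", "iMtiki"),
    ("X3", "wiMtAdu", "k2", "paMduni"),
    ("X4", "vaccAka", "vmod", "vaccAka"),
]


def build_equations(candidates):
    """Return lists of variables; each list must sum to exactly 1."""
    c1 = defaultdict(list)  # C1: one arc per (demand group, mandatory karaka)
    c3 = defaultdict(list)  # C3: one incoming arc per dependent group
    for var, demand, label, dependent in candidates:
        c1[(demand, label)].append(var)
        c3[dependent].append(var)
    return list(c1.values()) + list(c3.values())


def solve(candidates):
    """Enumerate every 0/1 assignment and keep those satisfying all equations."""
    names = [c[0] for c in candidates]
    equations = build_equations(candidates)
    solutions = []
    for bits in product((0, 1), repeat=len(names)):
        value = dict(zip(names, bits))
        if all(sum(value[v] for v in eq) == 1 for eq in equations):
            solutions.append(value)
    return solutions


solutions = solve(CANDIDATES)
print(solutions)
```

With one candidate per demand, each equation contains a single variable, so the only assignment that survives sets X1 to X4 to 1. When a sentence offers several candidates for the same demand, the same equations are what force a choice among them.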
Overview of constraint based parser
Final parse:
1    ((        NP    <af=rAma,n,,,,0,,adj_vAdu,/drel=k1:5/name=1>
1.1  rAmudu    NN    <af=rAma,n,,,,0,,adj_vAdu,>
     ))
2    ((        NP    <af=illu,n,,s,,0,,ki,/drel=k2:3/name=2>
2.1  iMtiki    NN    <af=illu,n,,s,,0,,ki,>
     ))
3    ((        VG    <af=vaccu,v,,,any,0,,ina_Aka,/drel=vmod:5/name=3>
3.1  vaccAka   VRB   <af=vaccu,v,,,any,0,,ina_Aka,>
     ))
4    ((        NP    <af=paMdu,n,,s,,0,,0,/drel=k2:5/name=4>
4.1  paMdu     NN    <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2  ni        PREP  <af=ni,n,,s,,0,,0,>
     ))
5    ((        VG    <af=winu,v,,,3_p,0,,wA,/name=5>
5.1  wiMtAdu   VFM   <af=winu,v,,,3_p,0,,wA,>
5.2  .         SYM
     ))
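The dependency arcs can be read back out of the chunk-level feature structures of the final parse. The Python sketch below does this for the five chunk annotations shown on the slide; the function names and the simple split on "/" and "=" are assumptions chosen for this example, not a full reader for the annotation format.

```python
# Chunk-level feature structures copied from the final parse on the slide.
FINAL_PARSE = [
    "<af=rAma,n,,,,0,,adj_vAdu,/drel=k1:5/name=1>",
    "<af=illu,n,,s,,0,,ki,/drel = k2:3/name=2>",
    "<af=vaccu,v,,,any,0,,ina_Aka,/drel = vmod:5/name=3>",
    "<af=paMdu,n,,s,,0,,0,/drel = k2:5/name=4>",
    "<af=winu,v,,,3_p,0,,wA,/name = 5>",
]


def parse_fs(fs):
    """Split '<key=value/key=value>' into a dict, ignoring spaces around '='."""
    body = fs.strip().lstrip("<").rstrip(">")
    attrs = {}
    for part in body.split("/"):
        key, _, value = part.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def dependency_arcs(chunks):
    """Return (dependent name, arc label, head name) for every chunk with a drel."""
    arcs = []
    for fs in chunks:
        attrs = parse_fs(fs)
        if "drel" in attrs:
            label, head = attrs["drel"].split(":")
            arcs.append((attrs["name"], label, head))
    return arcs


arcs = dependency_arcs(FINAL_PARSE)
for dependent, label, head in arcs:
    print(f"{dependent} --{label}--> {head}")
```

Chunk 5 (wiMtAdu) carries no drel attribute, so it yields no arc and acts as the root of the tree.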
Analysis of special cases
• Genitives
• Copula
• “ani” construction
• Conjuncts
Genitives
• The genitive is the case that marks a noun as the possessor of another noun (e.g. his, her, its).
• Cases
– Genitive marker exists
   Telugu : rAmudi yoVkka puswakaM
   Gloss : ram 's book
   When the marker is present, the analysis is straightforward: the noun preceding “yoVkka” holds an r6 relation with the noun following “yoVkka”.
– Genitive marker is dropped
   Telugu : rAmudi puswakaM
   Gloss : ram book
   Here the suffix “udi” in “rAmudi” signals the genitive.
Genitive contd..
• Exceptions in cases where the genitive marker can be dropped
   Telugu : raGu puswakaM rAmudiki icCAdu
   Gloss : Raghu book Ram gave
   English (sense 1): Raghu gave the book to Ram.
   English (sense 2): Raghu's book is given to Ram.
• So for nouns such as Raghu and Sita, which do not carry the “udi” suffix, Telugu has no marker for the genitive.
• So we output all possible parses for this case. The parses include:
Parse 1 (sense 1)
icCAdu
    k1 : raGu
    k2 : puswakam
    k4 : rAmudiki
Parse 2 (sense 2)
icCAdu
    k4 : rAmudiki
    k2 : puswakam
        r6 : raGu
Copula
• Examples: is, are, were, etc.
• The copula is generally dropped in Telugu. For example:
– Telugu : rAmudu maMci bAludu
– Gloss : Ram good boy
– English : Ram is a good boy.
• So we handle these cases by introducing a “NULL_VG”.
Frame for NULL_VG
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k1s       | m         | 0        | n       | l    | c
------------------------------------------------------------------
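A minimal Python sketch of the NULL_VG idea follows. It assumes that chunks are represented as (tag, words) pairs, that a sentence without any VG chunk is treated as a dropped copula, and that the empty verb group is placed at the end of the sentence. The function name `insert_null_vg`, the chunk representation and the placement are all hypothetical; the slide does not say where or how the parser inserts the node.

```python
def insert_null_vg(chunks):
    """Append a NULL_VG chunk when the sentence has no verb group.

    chunks is a list of (chunk_tag, words) pairs. The input list is not modified.
    """
    if any(tag == "VG" for tag, _ in chunks):
        return list(chunks)
    # Assumed placement: sentence-final, like the verb groups in the other examples.
    return list(chunks) + [("VG", ["NULL_VG"])]


# rAmudu maMci bAludu ("Ram is a good boy"), chunked by hand for this example.
copula_sentence = [("NP", ["rAmudu"]), ("NP", ["maMci", "bAludu"])]
with_null = insert_null_vg(copula_sentence)
print(with_null)
```

Once the NULL_VG chunk is present it can act as a demand group, and its frame asks for a k1 and a k1s among the noun groups to its left.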
‘ani’ construction
• ‘ani’ in Telugu is sometimes similar to “that” in English.
• “ani” is used in three different ways:
– Used as a complementizer:
   Telugu : rAmudu paMdu wiMtAdu ani mohan ceVppAdu.
   Gloss : Ram fruit will_eat that mohan said .
   English : Mohan said that Ram will eat a fruit.
– Used as a verb:
   Telugu : mohan rAmudu paMdu wiMtAdu ani vellipoyAdu .
   English : Mohan left saying Ram eats an apple.
– Used to state a reason:
   Telugu : mohan rAmudu paMdu winnAdani vellipoyAdu.
   Gloss : Mohan Ram fruit had_eaten went.
   English : Mohan went because Ram had eaten the fruit.
“ani” construction Contd …
So we created a demand frame for “ani”.
Frame for ani
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | v_fin   | l    | c
ccof      | m         |          | v_fin   | r    | p
------------------------------------------------------------------
Conjuncts
• In Telugu, conjuncts occur as suffixes (the TAM of the verb), as DheergAs, and as lexical items such as “inkA”, “anduke”, “mariyu”, “kAni”, “aiwe” and “anwe”.
– Suffixes:
   Here, applying the corresponding transformation chart of the verb is enough to handle the case.
   Telugu : nenu iMtiki velwe nixrapowAnu.
   Gloss : I home if_go will_sleep .
   English : I will sleep if I go home.
Contd …
• Lexical items:
   Here we have a frame for each lexical entry, which does the corresponding job.
   In the case of “mariyu”:
Frame 1 :
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | v       | l    | c
ccof      | m         |          | v       | r    | c
------------------------------------------------------------------
Frame 2 :
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | n       | l    | c
ccof      | m         |          | n       | r    | c
------------------------------------------------------------------
Contd …
• DheergAs:
– Often the conjunct information is implicit in the elongation of the vowel at the end of a lexical item, with no need for an explicit lexical entry such as “mariyu”.
   Telugu : rAmudU siwA iMtiki vellAru.
   Gloss : Ram (implicit conj) sita home went .
   English : Ram and Sita went home .
– In such cases a NULL_CCP is introduced, which serves as an explicit conjunct lexical entry, and the frames for NULL_CCP are similar to those on the previous slide.
Future work !!
• A thorough analysis of Relative clauses.
• Analysis and handling of NULL VERBS in case
of complex constructions.
• And their implementation.
• Verb and TAM Classification.
THANKS !!
Any Queries ??