Constraint based Dependency Telugu Parser

Guided by: Dr. Rajeev Sangal, Dr. Dipti Misra, Samar Hussain
Team members: Phani Chaitanya, Ravi Kiran

Overview
• Motivation
• A word about the language
• Overview of the constraint based parser
• Analysis of special cases
  – Genitives
  – Copula
  – “ani” construction
  – Conjuncts
• Future work

Motivation
– We had in mind a question answering system in Telugu, mainly for the medical and tourism domains, that could help native Telugu speakers (as a preliminary diagnosis tool and as a travel guide). Such a system needs a parser.

A word about the language
• Telugu is a South Asian language.
• Features
  – Morphologically rich
  – Free word order
  – Agglutinative
• Challenges
  – No treebank
  – No parser
  – No WordNet

Overview of constraint based parser
Pipeline:
Raw sentence → POS tagging and chunking → Identify source and demand groups → Load frames (demand and transformation) → Identify source groups satisfying demands and draw arcs → Apply the 3 constraints and form equations for each demand → Integer programming module (solves the equations) → Final parse

Raw sentence
Telugu : rAmudu iMtiki vaccAka paMduni wiMtAdu
Gloss : Rama home after_coming apple eats
English : Ram eats an apple after coming home.

POS tagging and chunking
1     ((        NP
1.1   rAmudu    NN    <af=rAma,n,,,,0,,adj_vAdu,>
      ))
2     ((        NP
2.1   iMtiki    NN    <af=illu,n,,s,,0,,ki,>
      ))
3     ((        VG
3.1   vaccAka   VRB   <af=vaccu,v,,,any,0,,ina_Aka,>
      ))
4     ((        NP
4.1   paMdu     NN    <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2   ni        PREP  <af=ni,n,,s,,0,,0,>
      ))
5     ((        VG
5.1   wiMtAdu   VFM   <af=winu,v,,,3_p,0,,wA,>
5.2   .         SYM
      ))

Identify source and demand groups
• Source groups: rAmudu, iMtiki, paMdu ni
• Demand groups: vaccAka, wiMtAdu

Load frames (demand and transformation)
Frame for winu (eat; basic form, so no transformation required)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
------------------------------------------------------------------

Frame for vaccu (come)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ki       | n       | l    | c
------------------------------------------------------------------

[Figure: winu[wA] (eat) with a k1 arc to rAmudu (Ram) and a k2 arc to paMdu (fruit)]
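The step "Identify source groups satisfying demands and draw arcs" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It is not the parser's actual implementation.

- The names `Group`, `Demand`, `FRAMES` and `candidate_arcs` are hypothetical.
- Each chunk is reduced to a word, a lextype and a vibhakti.
- posn "l"/"r" is assumed to mean that the source group lies to the left/right of the demand group.
- The reln column is ignored because every row used here is "c".
- Only the basic (untransformed) frames for winu and vaccu are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """A chunk of the tagged sentence, simplified (hypothetical fields)."""
    word: str
    lextype: str   # "n" = noun group, "v" = verb group
    vibhakti: str  # case suffix / postposition; "0" = none


@dataclass(frozen=True)
class Demand:
    """One row of a karaka frame (the reln column is omitted in this sketch)."""
    label: str      # karaka label, e.g. "k1"
    necessity: str  # "m" = mandatory (unused in this sketch)
    vibhakti: str
    lextype: str
    posn: str       # assumed: "l" = source left of demand group, "r" = right


# Basic (untransformed) frames for winu and vaccu, copied from the slide.
FRAMES = {
    "winu": [Demand("k1", "m", "0", "n", "l"), Demand("k2", "m", "ni", "n", "l")],
    "vaccu": [Demand("k1", "m", "0", "n", "l"), Demand("k2", "m", "ki", "n", "l")],
}


def candidate_arcs(groups, demand_roots):
    """Return (demand word, karaka label, source word) for every group whose
    position, lextype and vibhakti satisfy a row of the demand's frame."""
    arcs = []
    for d_idx, root in demand_roots.items():
        for row in FRAMES[root]:
            for s_idx, src in enumerate(groups):
                if s_idx == d_idx:
                    continue
                in_position = (s_idx < d_idx) if row.posn == "l" else (s_idx > d_idx)
                if in_position and src.lextype == row.lextype and src.vibhakti == row.vibhakti:
                    arcs.append((groups[d_idx].word, row.label, src.word))
    return arcs


# rAmudu iMtiki vaccAka paMduni wiMtAdu ("Ram eats an apple after coming home")
sentence = [
    Group("rAmudu", "n", "0"),
    Group("iMtiki", "n", "ki"),
    Group("vaccAka", "v", "ina_Aka"),
    Group("paMduni", "n", "ni"),
    Group("wiMtAdu", "v", "wA"),
]
demands = {2: "vaccu", 4: "winu"}  # index of each demand group -> verb root

for arc in candidate_arcs(sentence, demands):
    print(arc)
```

With the basic frames, rAmudu is a k1 candidate for both vaccAka and wiMtAdu. The ina_Aka transformation chart removes k1 from the frame of vaccu and inserts vmod. That change leaves exactly one candidate incoming arc for each source group.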
[Figure: vaccu[ina_Aka] (after coming) with arcs labeled k1 (rAmudu), k2 (iMtiki, house) and vmod]

Transformation chart [ina_Aka (after + ing)]
---------------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln | op
---------------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c    | remove
vmod      | m         |          | v       | r    | p    | insert
---------------------------------------------------------------------------

Frame for vaccAka (after transformation)
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k2        | m         | ki       | n       | l    | c
vmod      | m         |          | v       | r    | p
------------------------------------------------------------------

Frame for winu
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k2        | m         | ni       | n       | l    | c
------------------------------------------------------------------

Identify source groups satisfying demands and draw arcs
Each candidate arc gets a 0/1 variable:
X1 : k1   (wiMtAdu → rAmudu)
X2 : k2   (vaccAka → iMtiki)
X3 : k2   (wiMtAdu → paMduni)
X4 : vmod (wiMtAdu → vaccAka)

Apply the 3 constraints and form equations for each demand
C1 : For each mandatory karaka in the karaka chart of each demand group, there should be exactly one outgoing edge from the demand group labeled by that karaka.
C2 : For each optional or desirable karaka in the karaka chart of each demand group, there should be at most one outgoing edge from the demand group labeled by that karaka.
C3 : There should be exactly one incoming arc into each source group.

Equations formed by applying the above constraints:
C1 : X1 = 1, X2 = 1, X3 = 1, X4 = 1
C2 : No optional field found
C3 : X1 = 1, X2 = 1, X3 = 1, X4 = 1

Integer programming module: solving these equations gives X1 = X2 = X3 = X4 = 1, so all four candidate arcs are kept.

Final parse
1     ((        NP    <af=rAma,n,,,,0,,adj_vAdu,/drel=k1:5/name=1>
1.1   rAmudu    NN    <af=rAma,n,,,,0,,adj_vAdu,>
      ))
2     ((        NP    <af=illu,n,,s,,0,,ki,/drel=k2:3/name=2>
2.1   iMtiki    NN    <af=illu,n,,s,,0,,ki,>
      ))
3     ((        VG    <af=vaccu,v,,,any,0,,ina_Aka,/drel=vmod:5/name=3>
3.1   vaccAka   VRB   <af=vaccu,v,,,any,0,,ina_Aka,>
      ))
4     ((        NP    <af=paMdu,n,,s,,0,,0,/drel=k2:5/name=4>
4.1   paMdu     NN    <af=paMdu,n,,s,,0,,0,>|<af=paMdu,n,,s,,0,,obl,>
4.2   ni        PREP  <af=ni,n,,s,,0,,0,>
      ))
5     ((        VG    <af=winu,v,,,3_p,0,,wA,/name=5>
5.1   wiMtAdu   VFM   <af=winu,v,,,3_p,0,,wA,>
5.2   .         SYM
      ))

Analysis of special cases
• Genitives
• Copula
• “ani” construction
• Conjuncts

Genitives
• The genitive is the case that marks a noun as the possessor of another noun (e.g. his, her, its).
• Cases
  – Genitive marker present
    Telugu : rAmudi yoVkka puswakaM
    Gloss : Ram 's book
• When the marker is present, the analysis is straightforward: the noun preceding “yoVkka” holds an r6 relation with the noun following “yoVkka”.
  – Genitive marker dropped
    Telugu : rAmudi puswakaM
    Gloss : Ram book
• Here the suffix “udi” in “rAmudi” signals the genitive.

Genitives contd.
• Exceptions: cases where the genitive marker can be dropped
    Telugu : raGu puswakaM rAmudiki icCAdu
    Gloss : Raghu book to_Ram gave
    English (sense 1) : Raghu gave the book to Ram.
    English (sense 2) : Raghu's book was given to Ram.
• For nouns such as Raghu and Sita, which do not take the “udi” suffix, Telugu has no genitive marker.
• So we output all possible parses for this case:
    Parse 1 : icCAdu → k1 raGu, k2 puswakaM, k4 rAmudiki
    Parse 2 : icCAdu → k2 puswakaM, k4 rAmudiki; puswakaM → r6 raGu

Copula
• Examples: is, are, were, etc.
• The copula is generally dropped in Telugu. For example:
    Telugu : rAmudu maMci bAludu
    Gloss : Ram good boy
    English : Ram is a good boy.
• We handle these cases by introducing a “NULL_VG” (an empty verb group).

Frame for NULL_VG
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
k1        | m         | 0        | n       | l    | c
k1s       | m         | 0        | n       | l    | c
------------------------------------------------------------------

“ani” construction
• “ani” in Telugu is sometimes similar to “that” in English.
• “ani” is used in three different ways:
Used as a complementizer:
    Telugu : rAmudu paMdu wiMtAdu ani mohan ceVppAdu.
    Gloss : Ram fruit will_eat that Mohan said.
    English : Mohan said that Ram will eat a fruit.
Used as a verb:
    Telugu : mohan rAmudu paMdu wiMtAdu ani vellipoyAdu.
    English : Mohan left, saying that Ram eats an apple.
Used to state a reason:
    Telugu : mohan rAmudu paMdu winnAdani vellipoyAdu.
    Gloss : Mohan Ram fruit had_eaten went.
    English : Mohan went because Ram had eaten the fruit.
“ani” construction contd.
• So we created a demand frame for “ani”.

Frame for ani
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | v_fin   | l    | c
ccof      | m         |          | v_fin   | r    | p
------------------------------------------------------------------

Conjuncts
• In Telugu, conjuncts occur as suffixes (the TAM of the verb), as dheergAs, and as lexical items such as “inkA”, “anduke”, “mariyu”, “kAni”, “aiwe” and “anwe”.
• Suffixes: here, applying the corresponding transformation chart of the verb solves the case.
    Telugu : nenu iMtiki velwe nixrapowAnu.
    Gloss : I home if_go will_sleep.
    English : I will sleep if I go home.

Conjuncts contd.
• Lexical items: here we have a frame for each lexical entry, which does the corresponding job. For “mariyu”:

Frame 1
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | v       | l    | c
ccof      | m         |          | v       | r    | c
------------------------------------------------------------------

Frame 2
------------------------------------------------------------------
arc-label | necessity | vibhakti | lextype | posn | reln
------------------------------------------------------------------
ccof      | m         |          | n       | l    | c
ccof      | m         |          | n       | r    | c
------------------------------------------------------------------

Conjuncts contd.
• DheergAs: often the elongation of the final vowel of a lexical item carries the conjunct information implicitly, with no need for an explicit lexical entry such as “mariyu”.
    Telugu : rAmudU siwA iMtiki vellAru.
    Gloss : Ram (implicit conj) Sita home went.
    English : Ram and Sita went home.
In such cases a NULL_CCP is introduced. It serves the role of an explicit conjunct lexical entry, and its frames are similar to the “mariyu” frames on the previous slide.

Future work
• A thorough analysis of relative clauses.
• Analysis and handling of NULL verbs in complex constructions, and their implementation.
• Verb and TAM classification.

Thanks! Any queries?