* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download ppt
Survey
Document related concepts
Transcript
Markov Model Based Classification of Semantic Roles A Final Project in Probabilistic Methods in AI Course Submitted By: Shlomit Tshuva, Libi Mann and Noam Ben Haim The Problem Different parts in the sentence denote different semantic roles. The team cars and publicity vehicles will drive through the night Automatically identify the different roles Good for Automatic Translation, Question Answering and more Self_Mover Duration The Graphical Model Markov Chain: Headwords (verbs and nouns, excluding adjectives and determiners) as the Nodes. Local Potentials – Estimated from FrameNet data base, augmented with WordNet data, with nonzero probability for unseen data. Transition Tables – From statistics on consecutive Frame Elements. Results From 456 Sentences only 4 FE appeared more than 3 times (GOAL, PATH, SELF_MOVER and SOURCE). Boundaries were not taken into account when counting. Both Precision and Recall measures are ~67%. Major drawback is wrong boundaries for FEs, and tendency of names to be attributed to GOAL. Problems Sparse Data – Only 456 annotated sentences in the largest annotated Frame, and not statistically characteristic - usage based. Not enough lemmas in the database. Some Frame Elements (FE’s) appear only a few times (Path, Source, Time). Some words almost exclusively belong to a single FE. We tried to solve some of the lack of data w.r.t lemmas by using WordNet for words relationships – added some noise, but a good start. A lot of sentences have large unmarked sections, and when we have a word that appeared a lot in some FE, it has a big prior for that FE. Problems (Cont.) Using only local dependencies Hard to exit a FE – unless a significant headword appears The transition from FE to itself dominate the distribution Treating ALL proper names the same – whether they denote a Person (Usually SELF_MOVER) or a place (A GOAL, SOURCE or AREA) The information of the number of appearances of a frame element in a sentence is lost. Restricted usage of Syntax Related to the local dependencies problem But Syntax only is no good either (~69% with State of the art systems) Further Research Add syntax in all levels. Enhance data Use syntactic constituents to estimate constituent specific transition tables. Use syntactic constituents to determine FE boundaries. Larger windows. More representative Local Potentials Lemma specific transition tables More extensive usage of WordNet Differentiate between relations (we only used Hypernym relation) Wider search in the WordNet hierarchy (we only used siblings of second order)