Dependency Parsing

Some slides are based on: 1) a PPT presentation on dependency parsing by Prashanth Mannem; 2) Seven Lectures on Statistical Parsing by Christopher Manning.

Constituency parsing
• Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
• Describes phrase structure and clause structure (NP, PP, VP, etc.)
• Structures are often recursive
• (Example tree omitted: S → NP VP, with the VP "is an amazing show" containing a nested NP.)

Dependency parsing
• Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
• Interested in grammatical relations between individual words (governing & dependent words)
• Does not propose a recursive structure, but rather a network of relations
• These relations can also have labels

Dependency vs. Constituency
• Dependency structures explicitly represent
– head-dependent relations (directed arcs)
– functional categories (arc labels)
– possibly some structural categories (parts of speech)
• Constituency structures explicitly represent
– phrases (non-terminal nodes)
– structural categories (non-terminal labels)
– possibly some functional categories (grammatical functions)

Dependency vs. Constituency
• A dependency grammar has a notion of a head
• Officially, CFGs don't
• But modern linguistic theory and all modern statistical parsers (Charniak, Collins, …) do, via hand-written phrasal "head rules":
– the head of a Noun Phrase is a noun/number/…
– the head of a Verb Phrase is a verb/modal/…
(Based on a slide by Chris Manning)

Dependency vs. Constituency
• The head rules can be used to extract a dependency parse from a CFG parse (follow the heads)
• A phrase-structure tree can be obtained from a dependency tree, but the dependents are flat
(Based on a slide by Chris Manning)

Definition: dependency graph
• An input word sequence w1…wn
• A dependency graph G = (V, E), where
– V is the set of nodes, i.e., the word tokens in the input sequence
– E is the set of unlabeled tree edges (i, j), with i, j ∈ V
– (i, j) indicates an edge from i (parent, head, governor) to j (child, dependent)

Definition: dependency graph
• A dependency graph is well-formed iff the following hold (checked in the sketch below):
– Single head: each word has at most one head
– Acyclic: the graph is acyclic
– Connected: the graph is a single tree containing all the words in the sentence
– Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e., dominated by B)
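To make the four conditions concrete, here is a minimal sketch (my own illustration, not from the slides). It assumes a parse represented as a head array: words are numbered 1..n, 0 is an artificial root, and head[d] gives the head of word d (head[0] is a placeholder).

def dominates(head, a, b):
    # True iff a is an ancestor of b (assumes the head chain is acyclic).
    while b != 0:
        b = head[b]
        if b == a:
            return True
    return False

def is_well_formed(head):
    n = len(head) - 1
    # Single head: the array representation itself allows only one head per word.
    # Acyclic + connected: every word must reach the root 0 without revisiting a node.
    for d in range(1, n + 1):
        seen, h = {d}, head[d]
        while h != 0:
            if h in seen:
                return False          # cycle
            seen.add(h)
            h = head[h]
    # Projective: for every arc head[d] -> d, each word strictly between the
    # two endpoints must be dominated by head[d].
    for d in range(1, n + 1):
        lo, hi = sorted((head[d], d))
        if any(not dominates(head, head[d], k) for k in range(lo + 1, hi)):
            return False
    return True

The non-projective example on the next slide is exactly the kind of parse that fails the last check.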
Non-projective dependencies
Ram saw a dog yesterday which was a Yorkshire Terrier
("which was a Yorkshire Terrier" modifies "dog", but "yesterday" attaches to "saw", so the two arcs cross.)

Parsing algorithms
• Dependency parsers can be broadly categorized into
– grammar-driven approaches: parsing is done using grammars
– data-driven approaches: parsing by training on annotated/un-annotated data

Unlabeled graphs
• Dan Klein showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004)
• Therefore some parsers run in two steps: 1) bracketing; 2) labeling

Traditions
• Dynamic programming – e.g., Eisner (1996), McDonald (2006)
• Deterministic search – e.g., Covington (2001), Yamada and Matsumoto (2003), Nivre (2006)
• Constraint satisfaction – e.g., Maruyama, Foth et al.

Data-driven parsing
• Two main approaches
– global, exhaustive, graph-based parsing
– local, greedy, transition-based parsing

Graph-based parsing
• Assume there is a scoring function s : V × V × L → ℝ
• The score of a graph is s(G = (V, E)) = Σ_{(i,j) ∈ E} s(i, j)
• Parsing an input string x means finding G* = argmax_{G ∈ D(Gx)} Σ_{(i,j) ∈ E} s(i, j), where D(Gx) is the set of all dependency graphs for x

MST algorithm (McDonald, 2006)
• Scores are based on features, independent of the other dependencies: s(i, j) = w · f(i, j)
• Features can be
– head and dependent word and POS, separately
– head and dependent word and POS bigram features
– words between head and dependent
– length and direction of the dependency

MST algorithm (McDonald, 2006)
• Parsing can be formulated as a maximum spanning tree (MST) problem
• Uses the Chu-Liu-Edmonds (CLE) algorithm for the MST (runs in O(n²) and considers non-projective arcs)
• Uses online learning to determine the weight vector w in s(i, j) = w · f(i, j)

Transition-based parsing
• A transition system for dependency parsing defines:
– a set C of parser configurations, each of which defines a (partially built) dependency graph G
– a set T of transitions, each a function t : C → C
– for every sentence x = w0, w1, …, wn:
• a unique initial configuration cx
• a set Qx of terminal configurations

Transition sequence
• A transition sequence Cx,m = (cx, c1, …, cm) for a sentence x is a sequence of configurations such that cm ∈ Qx and, for every ci ≠ cx, there is a transition t ∈ T such that ci = t(ci−1)
• The graph defined by cm ∈ Qx is the dependency graph of x

Transition scoring function
• The score s(c, t) of a transition t in a configuration c represents the likelihood of taking transition t out of configuration c: s : C × T → ℝ
• Parsing is finding the optimal transition sequence, greedily taking t* = argmax_{t ∈ T} s(c, t) at each configuration

Yamada and Matsumoto (2003)
• A transition-based (shift-reduce) parser
• Considers two adjacent words
• Runs in iterations, continuing as long as new dependencies are created
• In every iteration, considers 3 different actions and chooses one using an SVM (or another discriminative learning technique)
• Time complexity: O(n²)
• Accuracy was shown to be close to that of state-of-the-art algorithms (e.g., Eisner's)

Y&M (2003) actions
• Shift
• Left
• Right

Y&M (2003) learning
• Features (lemma, POS tag) are collected from the context of the two words under consideration

Stack-based parsing
• Introduces a stack and a buffer
– the buffer is a queue of all input words (left to right)
– the stack begins empty; words are pushed onto the stack by the defined actions
• Reduces Y&M's complexity to linear time

2 stack-based parsers
• Nivre's (2003, 2006) arc-standard
(Transition definitions omitted: arcs are added between the top of the stack s and the front of the buffer b, with the precondition that the word acquiring a head does not have one already.)
• Nivre's (2003, 2006) arc-eager, with four transitions: Shift, Left-arc, Right-arc, and Reduce (transition definitions omitted; a sketch of the system follows, and a worked example after it)
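A minimal sketch of the arc-eager system (my own illustration, not the slides' code). choose_action is a hypothetical stand-in for the trained classifier a real parser such as MaltParser would use:

def arc_eager_parse(words, choose_action):
    # words: list of tokens; index 0 is the artificial _ROOT_.
    # choose_action(stack, buffer, heads) returns one of
    # 'shift', 'left-arc', 'right-arc', 'reduce'.
    stack = [0]                                  # _ROOT_ starts on the stack
    buffer = list(range(1, len(words)))          # queue of remaining words
    heads = {}                                   # dependent -> head

    while buffer:                                # terminal: empty buffer
        action = choose_action(stack, buffer, heads)
        if action == 'shift':                    # push front of buffer
            stack.append(buffer.pop(0))
        elif action == 'left-arc':               # front of buffer -> top of stack
            assert stack[-1] != 0 and stack[-1] not in heads
            heads[stack[-1]] = buffer[0]
            stack.pop()
        elif action == 'right-arc':              # top of stack -> front of buffer
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif action == 'reduce':                 # pop a word that has a head
            assert stack[-1] in heads
            stack.pop()
    return heads

The worked example below steps through these transitions on a sentence from the slides.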
Example (arc-eager)
Sentence: _ROOT_ Red figures on the screen indicated falling stocks
(Example borrowed from Dependency Parsing, P. Mannem.)
• Initial: S: [_ROOT_] Q: [Red figures on the screen indicated falling stocks]
• Shift → S: [_ROOT_ Red] Q: [figures on the screen indicated falling stocks]
• Left-arc (figures → Red) → S: [_ROOT_] Q: [figures on the screen indicated falling stocks]
• Shift → S: [_ROOT_ figures] Q: [on the screen indicated falling stocks]
• Right-arc (figures → on) → S: [_ROOT_ figures on] Q: [the screen indicated falling stocks]
• Shift → S: [_ROOT_ figures on the] Q: [screen indicated falling stocks]
• Left-arc (screen → the) → S: [_ROOT_ figures on] Q: [screen indicated falling stocks]
• Right-arc (on → screen) → S: [_ROOT_ figures on screen] Q: [indicated falling stocks]
• Reduce (pop screen) → S: [_ROOT_ figures on] Q: [indicated falling stocks]
• Reduce (pop on) → S: [_ROOT_ figures] Q: [indicated falling stocks]
• Left-arc (indicated → figures) → S: [_ROOT_] Q: [indicated falling stocks]
• Right-arc (_ROOT_ → indicated) → S: [_ROOT_ indicated] Q: [falling stocks]
• Shift → S: [_ROOT_ indicated falling] Q: [stocks]
• Left-arc (stocks → falling) → S: [_ROOT_ indicated] Q: [stocks]
• Right-arc (indicated → stocks) → S: [_ROOT_ indicated stocks] Q: []
• Reduce (pop stocks), Reduce (pop indicated) → done

Graph-based (MSTParser) vs. transition-based (MaltParser)
• Accuracy on different languages, sentence length vs. accuracy, and dependency length vs. precision (plots omitted; from Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre, 2007)
• McDonald and Nivre report that the greedy, locally trained MaltParser is more accurate on short sentences and short dependencies, while the globally trained MSTParser degrades less as sentences and dependencies get longer (see the accuracy sketch at the end of this section)

Known parsers
• Stanford (constituency + dependency)
• MaltParser (dependency)
• MSTParser (dependency)
• Hebrew – Yoav Goldberg's parser (http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)
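The MSTParser-vs-MaltParser comparisons above are in terms of attachment accuracy. A minimal sketch (my own, with hypothetical argument names) of the unlabeled attachment score (UAS) behind those curves, i.e. the fraction of words that received the correct head:

def uas(gold_heads, pred_heads):
    # Both arguments: dicts mapping word index -> head index.
    correct = sum(1 for d, h in gold_heads.items() if pred_heads.get(d) == h)
    return correct / len(gold_heads)

Bucketing this comparison by sentence length, or by the distance |head − dependent|, gives the kinds of curves studied by McDonald and Nivre (2007).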