Coarse-to-Fine Efficient Viterbi Parsing
Nathan Bodenstab
OGI RPE Presentation, May 8, 2006

Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results

What is Natural Language Parsing?
• Provides a sentence with syntactic information by hierarchically clustering and labeling its constituents.
• A constituent is a group of one or more words that function together as a unit.

Why Parse Sentences?
• Syntactic structure is useful in:
  – Speech Recognition
  – Machine Translation
  – Language Understanding
    • Word Sense Disambiguation (e.g., "bottle")
    • Question-Answering
    • Document Summarization

Data Driven Parsing
• Parsing = Grammar + Algorithm
• Probabilistic Context-Free Grammar:
  P(children = [Determiner, Adjective, Noun] | parent = NounPhrase)
• Find the maximum-likelihood parse tree from all grammatically valid candidates.
• The probability of a parse tree is the product of all its grammar rule (constituent) probabilities.
• The number of grammatically valid parse trees increases exponentially with the length of the sentence.

Hypergraphs
• A directed hypergraph can facilitate dynamic programming (Klein and Manning, 2001).
• A hyperedge connects a set of tail nodes to a set of head nodes.
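The tree scoring described in the Data Driven Parsing slides — the probability of a parse is the product of its rule probabilities — can be sketched with a toy grammar. All rules, probabilities, and the example tree below are invented for illustration; they are not from the presentation:

```python
# Toy PCFG: P(children | parent) for each rule; all values are invented.
pcfg = {
    ("S",   ("NP", "VP")): 0.9,
    ("NP",  ("Det", "N")): 0.5,
    ("VP",  ("V", "NP")):  0.4,
    ("Det", ("the",)):     0.6,
    ("N",   ("dog",)):     0.1,
    ("N",   ("cat",)):     0.1,
    ("V",   ("saw",)):     0.2,
}

def tree_prob(tree):
    """Probability of a parse tree = product of its rule probabilities."""
    parent, children = tree[0], tree[1:]
    # A child is either a subtree (tuple) or a word (string).
    child_labels = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = pcfg[(parent, child_labels)]
    for c in children:
        if isinstance(c, tuple):      # recurse into constituent subtrees
            p *= tree_prob(c)
    return p

# (S (NP (Det the) (N dog)) (VP (V saw) (NP (Det the) (N cat))))
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "saw"),
               ("NP", ("Det", "the"), ("N", "cat"))))
print(tree_prob(tree))
```

Because each rule probability is at most 1, deeper trees accumulate smaller scores, which is why the Viterbi parse is the one maximizing this product over all grammatically valid candidates.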
[Figure: a standard edge vs. a hyperedge; a parse chart drawn as a hypergraph]

The CYK Algorithm
• Separates the hypergraph into "levels."
• Exhaustively traverses every hyperedge, level by level.

The A* Algorithm
• Maintains a priority queue of traversable hyperedges.
• Traverses best-first until a complete parse tree is found.

High(er) Accuracy Parsing
• Modify the grammar to include more context.
• (Grand)Parent Annotation (Johnson, 1998):
  P(children = [Determiner, Adjective, Noun] | parent = NounPhrase, grandParent = Sentence)

Increased Search Space
[Figure: search space of the original grammar vs. the parent-annotated grammar]

Grammar Comparison
[Figure: accuracy of the WSJ, Parent, Head, K&M+Par, and Lexical grammars, ranging from roughly 65% to 90%]
• Exact inference with the CYK algorithm becomes intractable.
• Most algorithms using Lexical models resort to greedy search strategies.
• We want to efficiently find the globally optimal (Viterbi) parse tree for these high-accuracy models.

Coarse-to-Fine
• Efficiently find the optimal parse tree of a large, context-enriched model (Fine) by following hyperedges suggested by solutions of a simpler model (Coarse).
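The parent annotation that turns the coarse grammar into the context-enriched fine one can be sketched as a tree transform in the style of Johnson (1998): every nonterminal label is augmented with its parent's label before rule probabilities are estimated. The tree and the `^` label convention below are invented for illustration:

```python
# Parent-annotate a toy treebank tree: NP under S becomes NP^S, etc.
# The example tree and label format are invented for illustration.
def annotate(tree, parent="ROOT"):
    label, children = tree[0], tree[1:]
    if not isinstance(children[0], tuple):    # preterminal over a word
        return (f"{label}^{parent}",) + children
    return (f"{label}^{parent}",) + tuple(annotate(c, label) for c in children)

tree = ("S", ("NP", ("Det", "the"), ("N", "dog")),
             ("VP", ("V", "barks"),))
print(annotate(tree))
```

Estimating a PCFG from annotated trees yields the fine model's conditional probabilities, e.g. P(children | parent = NP, grandParent = S), at the cost of a much larger nonterminal set and search space.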
• To evaluate the feasibility of Coarse-to-Fine, we use:
  – Coarse = WSJ
  – Fine = Parent
[Figure: accuracy of the WSJ, Parent, Head, K&M+Par, and Lexical grammars]

Increased Search Space
[Figure: search space of the Coarse grammar vs. the Fine grammar]

Coarse-to-Fine
• Build the Coarse hypergraph.
• Choose a Coarse hyperedge.
• Replace the Coarse hyperedge with a Fine hyperedge (this modifies its probability).
• Propagate the probability difference.
• Repeat until the optimal parse tree has only Fine hyperedges.

Upper-Bound Grammar
• Replacing a Coarse hyperedge with a Fine hyperedge can increase or decrease its probability.
• Once we have found a parse tree with only Fine hyperedges, how can we be sure it is optimal?
• Modify the probability of Coarse grammar rules to be an upper bound on the probability of Fine grammar rules:

  P_Coarse(children | parent = A) = max over n in N of P_Fine(children | parent = A, grandParent = n)

  where N is the set of non-terminals and A → children is a grammar rule.

Results
[Figures: hyperedges traversed (search guidance) and computation time for CYK, A*, and CTF on sentences of length 5–25]

Summary & Future Research
• Coarse-to-Fine is a new exact inference algorithm that efficiently traverses a large hypergraph space by using the solutions of simpler models.
• Full probability propagation through the hypergraph hinders computational performance.
  – Full propagation is not necessary; lower bound of log2(n) operations.
• Over 95% reduction in search space compared to the baseline CYK algorithm.
  – Should prune even more space with higher-accuracy (Lexical) models.

Thanks

Choosing a Coarse Hyperedge: Top-Down vs. Bottom-Up
[Figures: computation time and hyperedges traversed for CTF Top-Down vs. CTF Bottom-Up on sentences of length 5–25]
• Top-Down
  – Traverses more hyperedges.
  – Hyperedges are closer to the root.
  – Requires less propagation (about 1/2).
• Bottom-Up
  – Traverses fewer hyperedges.
  – Hyperedges are near the leaves (words) and shared by many trees.
  – True probability of trees isn't known at the beginning of CTF.

Coarse-to-Fine Motivation
[Figure: the optimal Fine tree vs. the optimal Coarse tree]
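The upper-bound grammar from the Upper-Bound Grammar slide can be sketched as a one-pass maximization over the fine model: each coarse rule's probability is the maximum of its parent-annotated variants, so refining a hyperedge can only lower (never raise) a tree's score. The rule set and probabilities below are invented for illustration:

```python
# Fine (parent-annotated) rule probabilities:
# (parent, grandParent, children) -> P(children | parent, grandParent).
# All rules and values are invented for illustration.
fine = {
    ("NP", "S",  ("Det", "N")): 0.55,
    ("NP", "VP", ("Det", "N")): 0.45,
    ("VP", "S",  ("V", "NP")):  0.40,
}

def upper_bound_coarse(fine_rules):
    """P_Coarse(children | parent) = max over grandparents n of
    P_Fine(children | parent, grandParent = n)."""
    coarse = {}
    for (parent, _grandparent, children), p in fine_rules.items():
        key = (parent, children)
        coarse[key] = max(p, coarse.get(key, 0.0))
    return coarse

coarse = upper_bound_coarse(fine)
print(coarse[("NP", ("Det", "N"))])   # max(0.55, 0.45) -> 0.55
```

Because every coarse score upper-bounds the corresponding fine scores, a parse tree that remains best once all of its hyperedges have been replaced by fine ones is guaranteed to be the globally optimal (Viterbi) fine parse.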