Coarse-to-Fine Efficient Viterbi Parsing
Nathan Bodenstab
OGI RPE Presentation
May 8, 2006
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
What is Natural Language Parsing?
• Provides a sentence with syntactic information by hierarchically clustering and labeling its constituents.
• A constituent is a group of one or more words that function together as a unit.
Why Parse Sentences?
• Syntactic structure is useful in:
  – Speech Recognition
  – Machine Translation
  – Language Understanding
    • Word Sense Disambiguation (e.g. "bottle")
    • Question-Answering
    • Document Summarization
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
Data Driven Parsing
• Parsing = Grammar + Algorithm
• Probabilistic Context-Free Grammar (PCFG; estimation sketched below):
  P(children = [Determiner, Adjective, Noun] | parent = NounPhrase)
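Rule probabilities like the one above are typically estimated by relative frequency over a treebank: count each parent's expansions and normalize. Below is a minimal Python sketch of that estimate; the nested-tuple tree format (label, child, child, ...) and the function name are illustrative assumptions, not taken from the presentation.

```python
from collections import defaultdict

def estimate_pcfg(treebank):
    # Count how often each parent expands into each child sequence,
    # then normalize by the parent's total count: P(children | parent).
    rule_counts = defaultdict(int)
    parent_counts = defaultdict(int)

    def count(node):
        if isinstance(node, str):            # a word (leaf), not a constituent
            return
        parent, children = node[0], node[1:]
        child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(parent, child_labels)] += 1
        parent_counts[parent] += 1
        for child in children:
            count(child)

    for tree in treebank:
        count(tree)
    return {rule: n / parent_counts[rule[0]] for rule, n in rule_counts.items()}

# One-tree toy treebank, so every rule gets probability 1.0.
treebank = [("NounPhrase", ("Determiner", "the"), ("Adjective", "old"), ("Noun", "man"))]
print(estimate_pcfg(treebank))
```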
Data Driven Parsing
• Find the maximum-likelihood parse tree among all grammatically valid candidates.
• The probability of a parse tree is the product of all its grammar rule (constituent) probabilities (worked example below).
• The number of grammatically valid parse trees grows exponentially with the length of the sentence.
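As a concrete illustration of the product rule, here is a tiny worked example; the grammar, probabilities, and tree are hypothetical and chosen only to make the arithmetic visible.

```python
# Hypothetical rule probabilities, chosen only for illustration.
rule_prob = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("Det", "Noun")): 0.5,
    ("VP", ("Verb",)): 0.3,
    ("Det", ("the",)): 0.6,
    ("Noun", ("dog",)): 0.1,
    ("Verb", ("barks",)): 0.2,
}

def tree_prob(node):
    # Probability of a parse tree = product of its rule probabilities.
    if isinstance(node, str):                # a word contributes no rule
        return 1.0
    parent, children = node[0], node[1:]
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(parent, child_labels)]
    for child in children:
        p *= tree_prob(child)
    return p

tree = ("S", ("NP", ("Det", "the"), ("Noun", "dog")), ("VP", ("Verb", "barks")))
print(tree_prob(tree))  # 0.9 * 0.5 * 0.3 * 0.6 * 0.1 * 0.2 = 0.00162
```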
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
Hypergraphs
• A directed hypergraph can facilitate dynamic programming (Klein and Manning, 2001).
• A hyperedge connects a set of tail nodes to a set of head nodes (see the sketch below).
[Figure: a standard edge (single tail node) vs. a hyperedge (multiple tail nodes)]
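One common way to encode this for parsing: a node is a labeled span and a binary rule application is a hyperedge. The field names and span encoding below are illustrative assumptions, not the presentation's own data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperedge:
    # For parsing, a node is a labeled span (label, start, end); a binary
    # rule A -> B C over spans (i,k) and (k,j) becomes a hyperedge with
    # tails {(B,i,k), (C,k,j)} and head (A,i,j).
    head: tuple          # e.g. ("NP", 0, 2)
    tails: frozenset     # e.g. frozenset({("Det", 0, 1), ("Noun", 1, 2)})
    weight: float        # rule probability (or log-probability)

# A standard graph edge is just the special case with a single tail node.
edge = Hyperedge(head=("NP", 0, 2),
                 tails=frozenset({("Det", 0, 1), ("Noun", 1, 2)}),
                 weight=0.5)
```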
Hypergraphs
[Figure: an example parsing hypergraph]
The CYK Algorithm
• Separates the hypergraph into "levels" (spans of increasing width)
• Exhaustively traverses every hyperedge, level by level (see the sketch below)
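A minimal probabilistic CYK sketch in Python, assuming a toy grammar in Chomsky normal form; the dictionary-based grammar format is an assumption for illustration, and real parsers also store backpointers to recover the tree.

```python
from collections import defaultdict

def cyk(words, unary, binary):
    # unary:  {(A, word): prob}   lexical rules A -> word
    # binary: {(A, B, C): prob}   binary rules  A -> B C
    # Returns best[(A, i, j)] = max probability of an A spanning words[i:j].
    n = len(words)
    best = defaultdict(float)

    # Level 1: lexical hyperedges.
    for i, w in enumerate(words):
        for (A, word), p in unary.items():
            if word == w:
                best[(A, i, i + 1)] = max(best[(A, i, i + 1)], p)

    # Levels 2..n: every split point of every span, exhaustively.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    score = p * best[(B, i, k)] * best[(C, k, j)]
                    if score > best[(A, i, j)]:
                        best[(A, i, j)] = score
    return best

unary = {("Det", "the"): 1.0, ("Noun", "dog"): 1.0, ("Verb", "barks"): 1.0}
binary = {("S", "NP", "Verb"): 1.0, ("NP", "Det", "Noun"): 0.5}
chart = cyk("the dog barks".split(), unary, binary)
print(chart[("S", 0, 3)])  # 0.5
```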
The A* Algorithm
• Maintains a priority queue of traversable hyperedges
• Traverses best-first until a complete parse tree is found (see the sketch below)
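A sketch of the best-first idea, here with a zero heuristic (uniform-cost search): the agenda pops the highest-probability node first, and because probabilities never exceed 1, the first time the goal pops its score is optimal. A real A* parser adds an admissible outside-probability estimate to pop far fewer nodes. Same toy grammar format as the CYK sketch; this is illustrative, not the presentation's implementation.

```python
import heapq

def best_first_parse(words, unary, binary, goal):
    n = len(words)
    best = {}      # finished nodes with their optimal probabilities
    agenda = []    # max-heap via negated probabilities
    for i, w in enumerate(words):
        for (A, word), p in unary.items():
            if word == w:
                heapq.heappush(agenda, (-p, (A, i, i + 1)))

    while agenda:
        neg_p, node = heapq.heappop(agenda)
        if node in best:
            continue                      # already popped with a better score
        best[node] = -neg_p
        if node == goal:
            return best[node]
        A, i, j = node
        # Combine the finished node with finished neighbors on either side.
        for (X, B, C), p in binary.items():
            if B == A:                    # node is the left child
                for k in range(j + 1, n + 1):
                    if (C, j, k) in best:
                        heapq.heappush(agenda,
                            (-p * best[node] * best[(C, j, k)], (X, i, k)))
            if C == A:                    # node is the right child
                for k in range(0, i):
                    if (B, k, i) in best:
                        heapq.heappush(agenda,
                            (-p * best[(B, k, i)] * best[node], (X, k, j)))
    return None

unary = {("Det", "the"): 1.0, ("Noun", "dog"): 1.0, ("Verb", "barks"): 1.0}
binary = {("S", "NP", "Verb"): 1.0, ("NP", "Det", "Noun"): 0.5}
print(best_first_parse("the dog barks".split(), unary, binary, ("S", 0, 3)))  # 0.5
```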
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
High(er) Accuracy Parsing
• Modify the grammar to include more context
• (Grand)Parent Annotation (Johnson, 1998; transform sketched below):
  P(children = [Determiner, Adjective, Noun] | parent = NounPhrase, grandParent = Sentence)
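Parent annotation is usually implemented as a tree transform before grammar estimation: each non-terminal is split by its parent's label, so the grammar read off the transformed trees conditions on the parent as extra context. A minimal sketch, reusing the earlier tuple tree format; the "^" separator is a common convention, not necessarily the presentation's.

```python
def parent_annotate(node, parent_label="ROOT"):
    # "NP" under an "S" becomes "NP^S"; words are left unannotated.
    if isinstance(node, str):
        return node
    label, children = node[0], node[1:]
    new_label = f"{label}^{parent_label}"
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

tree = ("S", ("NP", ("Det", "the"), ("Noun", "dog")), ("VP", ("Verb", "barks")))
print(parent_annotate(tree))
# ('S^ROOT', ('NP^S', ('Det^NP', 'the'), ('Noun^NP', 'dog')),
#  ('VP^S', ('Verb^VP', 'barks')))
```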
Increased Search Space
[Figure sequence: the hypergraph of the original grammar vs. the much larger hypergraph of the parent-annotated grammar]
Grammar Comparison
[Chart: parsing accuracy (%) of the WSJ, Parent, Head+Parent, K&M, and Lexical grammars, on a scale of 65 to 90%]
• Exact inference with the CYK algorithm becomes intractable.
• Most algorithms using Lexical models resort to greedy search strategies.
• We want to find the globally optimal (Viterbi) parse tree for these high-accuracy models efficiently.
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
Coarse-to-Fine
• Efficiently find the optimal parse tree of a large, context-enriched model (Fine) by following hyperedges suggested by solutions of a simpler model (Coarse).
• To evaluate the feasibility of Coarse-to-Fine, we use:
  – Coarse = WSJ
  – Fine = Parent
[Chart: the accuracy comparison repeated, with WSJ and Parent as the chosen Coarse and Fine models]
Increased Search Space
[Figure: the Coarse grammar's hypergraph vs. the larger Fine grammar's hypergraph]
Coarse-to-Fine
[Figure sequence, one step per slide; a sketch of the loop follows the list]
1. Build the Coarse hypergraph
2. Choose a Coarse hyperedge
3. Replace the Coarse hyperedge with the Fine hyperedge (modifies its probability)
4. Propagate the probability difference
5. Repeat until the optimal parse tree has only Fine hyperedges
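A highly simplified sketch of that loop. The real algorithm keeps the full hypergraph and propagates probability differences upward after each replacement; this toy version simply re-solves the Viterbi tree each iteration, which is equivalent but slower. The hypergraph encoding and function names are assumptions for illustration. It relies on the coarse scores being upper bounds on the fine scores (next slide): once the best tree under the current scores is all-fine, every other tree's score is still an over-estimate, so no other tree can beat it.

```python
def coarse_to_fine(edges, coarse_p, fine_p, root):
    # edges:   {head: [tails_tuple, ...]}  a small, acyclic hypergraph
    # coarse_p, fine_p: {(head, tails): prob}, with coarse_p >= fine_p
    # for every hyperedge (the upper-bound property).
    score = dict(coarse_p)               # current score of every hyperedge
    refined = set()                      # hyperedges already replaced by fine

    def viterbi(node):
        # Best (prob, edges_used) derivation of node under current scores.
        if node not in edges:            # a leaf (word)
            return 1.0, []
        candidates = []
        for tails in edges[node]:
            p, used = score[(node, tails)], [(node, tails)]
            for t in tails:
                tp, tu = viterbi(t)
                p, used = p * tp, used + tu
            candidates.append((p, used))
        return max(candidates, key=lambda c: c[0])

    while True:
        prob, used = viterbi(root)
        coarse_left = [e for e in used if e not in refined]
        if not coarse_left:              # all-fine tree: provably optimal
            return prob, used
        edge = coarse_left[0]            # choose a Coarse hyperedge to refine
        score[edge] = fine_p[edge]       # replace Coarse score with Fine score
        refined.add(edge)

edges = {"S": [("NP", "VP")], "NP": [("the", "dog")], "VP": [("barks",)]}
coarse_p = {("S", ("NP", "VP")): 1.0, ("NP", ("the", "dog")): 0.6, ("VP", ("barks",)): 0.4}
fine_p   = {("S", ("NP", "VP")): 0.9, ("NP", ("the", "dog")): 0.5, ("VP", ("barks",)): 0.3}
print(coarse_to_fine(edges, coarse_p, fine_p, "S"))  # prob 0.9*0.5*0.3 = 0.135
```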
Upper-Bound Grammar
• Replacing a Coarse hyperedge with a Fine hyperedge can increase or decrease its probability.
• Once we have found a parse tree with only Fine hyperedges, how can we be sure it is optimal?
• Modify the probability of Coarse grammar rules to be an upper bound on the probability of Fine grammar rules (see the sketch after the formula):

  P_Coarse(A → β) = max_{n ∈ N} P_Fine(A^n → β) = max_{n ∈ N} P(β | A, Parent = n)

  where N is the set of non-terminals and A → β is a grammar rule.
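In code, the upper-bound grammar is just a max over parent annotations. A minimal sketch, assuming the parent-annotated rules are keyed by ((label, parent), children); the key layout is illustrative.

```python
from collections import defaultdict

def upper_bound_coarse(fine_rules):
    # The Coarse score of A -> children is the max over all parent
    # annotations n of P_Fine(A^n -> children), so every Coarse hyperedge
    # over-estimates (or matches) the Fine hyperedge that replaces it.
    coarse = defaultdict(float)
    for ((label, _parent), children), p in fine_rules.items():
        key = (label, children)
        coarse[key] = max(coarse[key], p)
    return dict(coarse)

fine = {(("NP", "S"),  ("Det", "Noun")): 0.5,
        (("NP", "VP"), ("Det", "Noun")): 0.7}
print(upper_bound_coarse(fine))  # {('NP', ('Det', 'Noun')): 0.7}
```

Note that the resulting Coarse "probabilities" no longer sum to one over each parent's expansions; they are search bounds, not a proper distribution.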
Outline
• What is Natural Language Parsing?
• Data Driven Parsing
• Hypergraphs and Parsing Algorithms
• High Accuracy Parsing
• Coarse-to-Fine
• Empirical Results
Results
[Two charts comparing CYK, A*, and CTF across sentence lengths 5–25:
"Search Guidance" plots hyperedges traversed on a log scale (100 to 10,000,000);
"Computational Time" plots parsing time in seconds on a log scale (0.001 to 100)]
Summary & Future Research
• Coarse-to-Fine is a new exact inference algorithm to
efficiently traverse a large hypergraph space by using
the solutions of simpler models.
• Full probability propagation through the hypergraph
hinders computational performance.
– Full propagation is not necessary; lower-bound of log2(n)
operations.
• Over 95% reduction in search space compared to
baseline CYK algorithm.
– Should prune even more space with higher-accuracy (Lexical)
models.
33
Thanks
Choosing a Coarse Hyperedge
Top-Down vs. Bottom-Up
Top-Down vs. Bottom-Up
[Two charts comparing CTF Top-Down and CTF Bottom-Up across sentence lengths 5–25:
"Computational Time Comparison" plots time in seconds (0 to 100);
"Search Guidance Comparison" plots hyperedges traversed (0 to 300,000)]
• Top-Down
  – Traverses more hyperedges
  – Hyperedges are closer to the root
  – Requires less propagation (1/2)
  – True probability of trees isn't known at the beginning of CTF
• Bottom-Up
  – Traverses fewer hyperedges
  – Hyperedges are near the leaves (words) and are shared by many trees
Coarse-to-Fine Motivation
[Figure: the optimal Coarse tree vs. the optimal Fine tree for the same sentence]