Dependency Parsing
Some slides are based on:
1) PPT presentation on dependency parsing by Prashanth Mannem
2) Seven Lectures on Statistical Parsing by Christopher Manning
Constituency parsing
• Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
• Describes phrase structure and clause structure (NP, PP, VP, etc.)
• Structures are often recursive
[Parse tree: S → NP VP for "mom is an amazing show", with "an amazing show" as an NP inside the VP]
Dependency parsing
• Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
• Interested in grammatical relations between individual words (governing & dependent words)
• Does not propose a recursive structure, but rather a network of relations
• These relations can also have labels
Dependency vs. Constituency
• Dependency structures explicitly represent
– Head-dependent relations (directed arcs)
– Functional categories (arc labels)
– Possibly some structural categories (parts of speech)
• Constituency structures explicitly represent
– Phrases (non-terminal nodes)
– Structural categories (non-terminal labels)
– Possibly some functional categories (grammatical functions)
Dependency vs. Constituency
• A dependency grammar has a notion of a head
• Officially, CFGs don't
• But modern linguistic theory and all modern statistical parsers (Charniak, Collins, …) do, via hand-written phrasal "head rules":
– The head of a Noun Phrase is a noun/number/…
– The head of a Verb Phrase is a verb/modal/…
Based on a slide by Chris Manning
Dependency vs. Constituency
• The head rules can be used to extract a dependency parse from a CFG parse (follow the heads), as sketched below
• A phrase-structure tree can be obtained from a dependency tree, but the dependents are flat
Based on a slide by Chris Manning
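To make "follow the heads" concrete, here is a minimal sketch. The HEAD_RULES table, the tree encoding, and the extract_deps helper are hypothetical illustrations; real parsers such as Collins's use much larger hand-written rule tables.

```python
# Hypothetical head rules: for each phrase label, the child categories
# that may serve as its head, in priority order.
HEAD_RULES = {
    'S':  ['VP'],
    'VP': ['VBZ', 'VBD', 'VB', 'VP'],
    'NP': ['NN', 'NNS', 'NNP', 'NP'],
}

def extract_deps(tree, deps):
    """tree is (label, children) for a phrase or (POS, word) for a leaf.
    Returns the lexical head of the tree and appends (head, dependent)
    pairs for all non-head children to deps."""
    label, children = tree
    if isinstance(children, str):          # leaf: (POS tag, word)
        return children
    head_child = children[-1]              # default: rightmost child
    for cat in HEAD_RULES.get(label, []):  # follow the head rule
        match = next((c for c in children if c[0] == cat), None)
        if match is not None:
            head_child = match
            break
    head = extract_deps(head_child, deps)
    for child in children:
        if child is not head_child:
            deps.append((head, extract_deps(child, deps)))
    return head

# The constituency tree from the earlier slide:
tree = ('S', [('NP', [('NN', 'mom')]),
              ('VP', [('VBZ', 'is'),
                      ('NP', [('DT', 'an'), ('JJ', 'amazing'),
                              ('NN', 'show')])])])
deps = []
extract_deps(tree, deps)
# deps: [('show', 'an'), ('show', 'amazing'), ('is', 'show'), ('is', 'mom')]
```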
Definition: dependency graph
• An input word sequence w₁…wₙ
• Dependency graph G = (V, E) where
– V is the set of nodes, i.e., the word tokens in the input sequence
– E is the set of unlabeled tree edges (i, j), i, j ∈ V
– (i, j) indicates an edge from i (parent, head, governor) to j (child, dependent)
Definition: dependency graph
• A dependency graph is well-formed iff
– Single head: each word has only one head
– Acyclic: the graph should be acyclic
– Connected: the graph should be a single tree with all the words in the sentence
– Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e., dominated by B)
Non-projective dependencies
[Figure: "Ram saw a dog yesterday which was a Yorkshire Terrier" — the arc from "dog" to the relative clause "which was a Yorkshire Terrier" crosses over "yesterday", which depends on "saw", so the tree is non-projective]
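To make the projectivity condition concrete, here is a minimal check. It assumes a well-formed (single-head, acyclic) analysis encoded as a heads array with node 0 as the artificial root; the function name and encoding are ours, not from the slides.

```python
def is_projective(heads):
    """heads[d] is the head of word d; index 0 is the artificial root.
    An arc (h, d) is projective iff every word strictly between h and d
    is dominated by h. Assumes heads encodes an acyclic single-head tree."""
    for d in range(1, len(heads)):
        h = heads[d]
        lo, hi = min(h, d), max(h, d)
        for w in range(lo + 1, hi):
            a = w
            while a != 0 and a != h:   # climb from w toward the root
                a = heads[a]
            if a != h:                 # never passed through h: crossing arc
                return False
    return True

# 0=_ROOT_ 1=Ram 2=saw 3=a 4=dog 5=yesterday 6=which 7=was 8=a 9=Yorkshire-Terrier
heads = [0, 2, 0, 4, 2, 2, 7, 4, 9, 7]
print(is_projective(heads))  # False: the arc dog -> was crosses over yesterday
```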
Parsing algorithms
• Dependency-based parsers can be broadly categorized into
– Grammar-driven approaches: parsing done using grammars
– Data-driven approaches: parsing by training on annotated/un-annotated data
Unlabeled graphs
• Dan Klein recently showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004)
• Therefore some parsers run in two steps: 1) bracketing; 2) labeling
Traditions
• Dynamic programming
– e.g., Eisner (1996), McDonald (2006)
• Deterministic search
– e.g., Covington (2001), Yamada and Matsumoto (2003), Nivre (2006)
• Constraint satisfaction
– e.g., Maruyama, Foth et al.
Data driven
• Two main approaches
– Global, exhaustive, graph-based parsing
– Local, greedy, transition-based parsing
Graph-based parsing
• Assume there is a scoring function s : V × V × L → ℝ
• The score of a graph is the sum of its arc scores:
s(G = (V, E)) = Σ_{(i,j) ∈ E} s(i, j)
• Parsing for input string x is
G* = argmax_{G ∈ D(Gₓ)} Σ_{(i,j) ∈ E} s(i, j)
where D(Gₓ) is the set of all dependency graphs for x
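The score formula transcribes directly into code (a sketch; graph_score and its arguments are our naming):

```python
def graph_score(edges, s):
    """Score of a dependency graph G = (V, E): the sum of its arc scores."""
    return sum(s(i, j) for (i, j) in edges)
```

The argmax over D(Gₓ) cannot be computed by enumeration, since the number of possible dependency graphs grows exponentially with sentence length; the MST formulation below makes it tractable.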
MST algorithm (McDonald, 2006)
• Scores are based on features, independent of other dependencies:
s(i, j) = w · f(i, j)
• Features can be
– Head and dependent word and POS, separately
– Head and dependent word and POS bigram features
– Words between head and dependent
– Length and direction of the dependency
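A sketch of what the feature vector f(i, j) might contain; the templates below are illustrative simplifications of the feature classes listed above, not McDonald's exact feature set.

```python
def arc_features(sent, h, d):
    """sent is a list of (word, POS) pairs, with sent[0] the root symbol.
    Returns the names of the binary features that fire for arc h -> d."""
    hw, hp = sent[h]
    dw, dp = sent[d]
    direction = 'R' if h < d else 'L'
    feats = [
        f'hw={hw}', f'hp={hp}',            # head word / POS separately
        f'dw={dw}', f'dp={dp}',            # dependent word / POS
        f'bigram={hw},{hp},{dw},{dp}',     # head-dependent bigram
        f'dist={abs(h - d)},{direction}',  # length and direction
    ]
    lo, hi = sorted((h, d))
    feats += [f'between={sent[w][1]}' for w in range(lo + 1, hi)]
    return feats

def arc_score(weights, sent, h, d):
    # s(i, j) = w . f(i, j), with f a sparse binary feature vector
    return sum(weights.get(f, 0.0) for f in arc_features(sent, h, d))
```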
MST algorithm (McDonald, 2006)
• Parsing can be formulated as a maximum spanning tree problem
• Use the Chu-Liu-Edmonds (CLE) algorithm for the MST (runs in O(n²), considers non-projective arcs)
• Uses online learning to determine the weight vector w in s(i, j) = w · f(i, j)
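For concreteness, a compact recursive Chu-Liu-Edmonds sketch over a dense, unlabeled score matrix (assumed precomputed via s(i, j) = w · f(i, j)). This naive version runs in O(n³); McDonald uses an efficient O(n²) variant.

```python
import math

def _find_cycle(head):
    """Find a cycle in the arcs d -> head[d] (ignoring the root 0), if any."""
    color = [0] * len(head)            # 0 = unseen, 1 = on current path, 2 = done
    for start in range(1, len(head)):
        v, path = start, []
        while v != 0 and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = head[v]
        if v != 0 and color[v] == 1:   # walked back into the current path
            cycle, u = [v], head[v]
            while u != v:
                cycle.append(u)
                u = head[u]
            return cycle
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(score):
    """Maximum spanning arborescence rooted at node 0.
    score[h][d] is the score of arc h -> d; returns head[d] for every node."""
    n = len(score)
    head = [0] * n
    for d in range(1, n):              # greedily take the best incoming arc
        head[d] = max((h for h in range(n) if h != d), key=lambda h: score[h][d])
    cycle = _find_cycle(head)
    if cycle is None:
        return head
    cyc = set(cycle)
    old = [v for v in range(n) if v not in cyc]   # root 0 keeps index 0
    idx = {v: i for i, v in enumerate(old)}
    c = len(old)                        # index of the contracted cycle node
    new = [[-math.inf] * (c + 1) for _ in range(c + 1)]
    enter = {}                          # per outside head h: best cycle entry point
    for h in old:
        enter[h] = max(cyc, key=lambda d: score[h][d] - score[head[d]][d])
        new[idx[h]][c] = score[h][enter[h]] - score[head[enter[h]]][enter[h]]
        for d in old:
            if h != d:
                new[idx[h]][idx[d]] = score[h][d]
    leave = {}                          # per outside dependent d: best cycle head
    for d in old:
        if d != 0:
            leave[d] = max(cyc, key=lambda h: score[h][d])
            new[c][idx[d]] = score[leave[d]][d]
    sub = chu_liu_edmonds(new)          # recurse on the contracted graph
    result = [0] * n
    for v in cycle:                     # keep cycle arcs; one is broken below
        result[v] = head[v]
    for i, v in enumerate(old):
        if v != 0:
            result[v] = leave[v] if sub[i] == c else old[sub[i]]
    h_in = old[sub[c]]                  # outside head chosen for the cycle node
    result[enter[h_in]] = h_in          # break the cycle at the entry point
    return result

NEG = -math.inf
scores = [[NEG, 5, 4],                  # greedy picks the 1<->2 cycle,
          [NEG, NEG, 10],               # which is contracted and resolved
          [NEG, 10, NEG]]
print(chu_liu_edmonds(scores))          # [0, 0, 1]: root -> 1 -> 2 (total 15)
```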
Transition-based parsing
• A transition system for dependency parsing defines:
– a set C of parser configurations, each of which defines a (partially built) dependency graph G
– a set T of transitions, each a function t : C → C
– for every sentence x = w₀, w₁, …, wₙ
• a unique initial configuration cₓ
• a set Qₓ of terminal configurations
Transition sequence
• A transition sequence Cₓ,ₘ = (cₓ, c₁, …, cₘ) for a sentence x is a sequence of configurations such that cₘ ∈ Qₓ and, for every cᵢ ≠ cₓ, there is a transition t ∈ T such that cᵢ = t(cᵢ₋₁)
• The graph defined by cₘ ∈ Qₓ is the dependency graph of x
Transition scoring function
• The score of a transition t in a configuration c, s(c, t), represents the likelihood of taking transition t out of configuration c
s : C × T → ℝ
• Parsing is finding the optimal transition sequence; greedy parsers take t* = argmax_{t ∈ T} s(c, t) at each configuration, as in the sketch below
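The greedy search loop, as a schematic sketch: the configuration and transition types and all four callbacks are placeholders for whatever a concrete parser defines, not a fixed API.

```python
from typing import Callable, Iterable, TypeVar

C = TypeVar('C')   # parser configuration
T = TypeVar('T')   # transition

def greedy_parse(c: C,
                 legal: Callable[[C], Iterable[T]],
                 apply_t: Callable[[C, T], C],
                 score: Callable[[C, T], float],
                 terminal: Callable[[C], bool]) -> C:
    """Repeatedly apply the locally best-scoring legal transition
    t* = argmax_t s(c, t) until a terminal configuration is reached."""
    while not terminal(c):
        t = max(legal(c), key=lambda t: score(c, t))
        c = apply_t(c, t)
    return c
```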
Yamada and Matsumoto (2003)
• A transition-based (shift-reduce) parser
• Considers two adjacent words
• Runs in iterations, continuing as long as new dependencies are created
• In every iteration, considers 3 different actions and chooses one using an SVM (or another discriminative learning technique)
• Time complexity O(n²)
• Accuracy was shown to be close to that of state-of-the-art algorithms (e.g., Eisner's)
Y&M (2003) Actions
• Shift
• Left
• Right
Y&M (2003) Learning
• Features (lemma, POS tag) are collected from the context
Stack-based parsing
• Introducing a stack and a buffer
– The buffer is a queue of all input words (left to right)
– The stack begins empty; words are pushed onto the stack by the defined actions
• Reduces Y&M's complexity to linear time
2 stack-based parsers
• Nivre's (2003, 2006) arc-standard
[Figure: transitions between the word i on top of the stack (S) and the word j at the front of the buffer (B), annotated with the preconditions "i doesn't have a head already" and "j doesn't have a head already"]
2 stack-based parsers
• Nivre’s (2003, 2006) arc-eager
Example (arc-eager)
• Sentence: _ROOT_ Red figures on the screen indicated falling stocks
• Transition sequence (the original slides show the stack S and the buffer Q after every step):
Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce
Borrowed from Dependency Parsing (P. Mannem)
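Putting the pieces together, a runnable sketch of the arc-eager transitions, driven by a gold-tree oracle in place of a learned scorer; the function names and the oracle are our illustration, not Nivre's implementation. On the sentence above it reproduces the transition sequence shown, minus the final Reduces that merely empty the stack.

```python
def arc_eager_parse(words, next_action):
    """Arc-eager parsing; node 0 is _ROOT_. `next_action(stack, buf, heads)`
    returns 'shift', 'left', 'right', or 'reduce'."""
    stack, buf, heads = [0], list(range(1, len(words))), {}
    while buf:
        action = next_action(stack, buf, heads)
        if action == 'shift':
            stack.append(buf.pop(0))
        elif action == 'left':           # stack top <- buffer front; pop it
            heads[stack.pop()] = buf[0]
        elif action == 'right':          # stack top -> buffer front; push it
            heads[buf[0]] = stack[-1]
            stack.append(buf.pop(0))
        elif action == 'reduce':         # pop a word that already has its head
            stack.pop()
    return heads

def gold_oracle(gold):
    """Static oracle: chooses the transitions that rebuild the gold tree."""
    def next_action(stack, buf, heads):
        s, b = stack[-1], buf[0]
        if gold.get(s) == b and s not in heads:
            return 'left'
        if gold.get(b) == s:
            return 'right'
        if s in heads and all(gold.get(w) != s for w in buf):
            return 'reduce'
        return 'shift'
    return next_action

words = ['_ROOT_', 'Red', 'figures', 'on', 'the', 'screen',
         'indicated', 'falling', 'stocks']
gold = {1: 2, 2: 6, 3: 2, 4: 5, 5: 3, 6: 0, 7: 8, 8: 6}
assert arc_eager_parse(words, gold_oracle(gold)) == gold
```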
Graph (MSTParser) vs. Transitions (MaltParser)
• Accuracy on different languages
Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Graph (MSTParser) vs. Transitions (MaltParser)
• Sentence length vs. accuracy
Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Graph (MSTParser) vs. Transitions (MaltParser)
• Dependency length vs. precision
Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre 2007
Known Parsers
• Stanford (constituency + dependency)
• MaltParser (dependency)
• MSTParser (dependency)
• Hebrew
– Yoav Goldberg's parser (http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)