The CKY algorithm part 1:
Recognition
Syntactic analysis (5LN455)
2014-11-16
Sofia Cassel
Department of Linguistics and Philology
Based on slides from Marco Kuhlmann / Sara Stymne
Recap: Parsing
Parsing
The automatic analysis of a sentence
with respect to its syntactic structure.
Phrase structure trees
root (top)
leaves (bottom)
Ambiguity
Parsing: do you remember?
• How is parsing = searching?
• What is bottom-up parsing?
• What is top-down parsing?
Parsing
Parsing as search:
search through all possible parse trees
for a given sentence
• bottom-up:
build parse trees starting at the leaves
• top-down:
build parse trees starting at the root node
Exercise: Grammar
What is CNF?
Write a small grammar in CNF.
(for example, for the sentence
“I booked a flight from LA.”)
Grammar rules
• preterminal rules
rewrite a part-of-speech tag to a token: C → wi.
Examples?
• inner rules
rewrite a syntactic category to other categories:
C → C1 C2, C → C1.
Examples?
Grammar rules
• preterminal rules:
rules that rewrite a part-of-speech tag
to a token, i.e. rules of the form C → wi.
Pro → I, Verb → booked, Noun → flight
• inner rules:
rules that rewrite a syntactic category to other
categories: C → C1 C2, C → C1.
S → NP VP, NP → Det Nom, NP → Pro
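The two rule types can be told apart mechanically. The following is a minimal sketch, assuming rules are stored as (left-hand side, right-hand side) pairs; the names NONTERMINALS, RULES and rule_kind are made up for this illustration and are not part of the course material.

```python
# Hypothetical encoding of the example rules from the slide.
# Every symbol that is a category (not a word token) is listed here.
NONTERMINALS = {"S", "NP", "VP", "Nom", "Det", "Pro", "Verb", "Noun"}

RULES = [
    ("Pro", ("I",)),
    ("Verb", ("booked",)),
    ("Noun", ("flight",)),
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Nom")),
    ("NP", ("Pro",)),
]

def rule_kind(lhs, rhs):
    """Classify a rule as 'preterminal', 'inner', or 'not allowed'."""
    # Preterminal rule: C -> w_i, a single word token on the right.
    if len(rhs) == 1 and rhs[0] not in NONTERMINALS:
        return "preterminal"
    # Inner rule: C -> C1 C2 or C -> C1, only categories on the right.
    if len(rhs) in (1, 2) and all(sym in NONTERMINALS for sym in rhs):
        return "inner"
    return "not allowed"

for lhs, rhs in RULES:
    print(lhs, "->", " ".join(rhs), ":", rule_kind(lhs, rhs))
```

A rule such as VP → Verb NP PP would be reported as "not allowed", since its right-hand side has more than two categories.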
Overview of the CKY algorithm
• efficient, bottom-up parsing algorithm for CFGs
• discovered at least three (!) times
and named after Cocke, Kasami, and Younger
• one of the most important and most used parsing
algorithms
Applications
Recognition:
Is there any parse tree at all?
Probabilistic parsing:
What is the most probable parse tree?
Restrictions
• can only handle rules that are at most binary:
C → wi, C → C1, C → C1 C2.
• requires preprocessing (binarization) and
postprocessing (debinarization).
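Binarization can be done in several ways. The sketch below shows one possible scheme (right-branching, with helper categories), assuming rules are (left-hand side, right-hand side) pairs. The function name binarize and the "|"-based naming of helper categories are assumptions made for this example, not the scheme prescribed by the course.

```python
def binarize(lhs, rhs):
    """Split a rule lhs -> rhs with more than two symbols on the
    right-hand side into a chain of binary rules."""
    rules = []
    rhs = list(rhs)
    while len(rhs) > 2:
        # Helper category that covers everything after the first symbol.
        helper = lhs + "|" + "_".join(rhs[1:])
        rules.append((lhs, (rhs[0], helper)))
        lhs, rhs = helper, rhs[1:]
    # The remaining rule is already at most binary.
    rules.append((lhs, tuple(rhs)))
    return rules

for rule in binarize("VP", ("Verb", "NP", "PP")):
    print(rule)
```

Debinarization would then remove the helper categories (here, anything containing "|") from the finished parse tree and attach their children to the parent node.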
Restrictions - details
• originally handles grammars in CNF (Chomsky
normal form):
C → wi, C → C1 C2, (S → ε)
• ε is normally not used in natural language grammars
• We also allow unit productions, C → C1;
this is called extended CNF
• Unit productions are easy to integrate
and make grammar conversion easier
Conventions
We have:
• a context-free grammar G
• a sequence of word tokens w = w1 … wn
We will:
• compute parse trees of w according to G
• write S for the start symbol of G
Fencepost positions
We view the sequence w as a fence with n holes,
one hole for each token wi,
and we number the fenceposts from 0 to n.
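Fencepost numbering lines up with Python list slicing, which makes it easy to try out. A minimal sketch, using the example sentence from the grammar exercise; the helper name words_between is made up for this illustration.

```python
tokens = "I booked a flight from LA".split()
n = len(tokens)

# Fenceposts are numbered 0..n; token w_i sits between posts i-1 and i.
fenceposts = list(range(n + 1))

def words_between(tokens, lo, hi):
    """Return the tokens between fenceposts lo and hi."""
    return tokens[lo:hi]

# Every pair (lo, hi) with lo < hi covers at least one token.
spans = [(lo, hi) for lo in range(n) for hi in range(lo + 1, n + 1)]

print(words_between(tokens, 0, 1))  # ['I']
print(words_between(tokens, 2, 4))  # ['a', 'flight']
print(len(spans))                   # 21
```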
Structure
• Is there any parse tree at all?
• What is the most probable parse tree?
Recognition
Recognizer
Is there any parse tree at all
for the sequence w according to the grammar G?
(In practice, we also want a concrete parse tree)
Recognition
Parse trees
Recognition
Recognizing small trees
C → wi
Recognition
Recognizing small trees
Example?
Recognition
Recognizing big trees
C → C1 C2
Recognition
Recognizing big trees
Example?
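Putting small and big trees together gives a recognizer. The following is a minimal sketch, assuming a grammar in strict CNF (no unary rules C → C1, so the pronoun "I" is tagged NP directly). The toy grammar and the names LEXICON, BINARY and recognize are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy grammar in strict CNF.
# Preterminal rules C -> w_i, indexed by the word.
LEXICON = {
    "I": {"NP"}, "LA": {"NP"},
    "booked": {"Verb"}, "a": {"Det"},
    "flight": {"Nom"}, "from": {"Prep"},
}
# Binary rules C -> C1 C2, indexed by (C1, C2).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Verb", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("Det", "Nom"): {"NP"},
    ("Nom", "PP"): {"Nom"},
    ("Prep", "NP"): {"PP"},
}

def recognize(tokens, start="S"):
    """Return True if some parse tree rooted in `start` covers all tokens."""
    n = len(tokens)
    # chart[lo, hi] = categories that can cover the words between lo and hi.
    chart = defaultdict(set)
    # Small trees: one word each, between fenceposts i and i+1.
    for i, word in enumerate(tokens):
        chart[i, i + 1] |= LEXICON.get(word, set())
    # Big trees: combine two adjacent smaller trees, shortest spans first.
    for width in range(2, n + 1):
        for lo in range(0, n - width + 1):
            hi = lo + width
            for mid in range(lo + 1, hi):
                for c1 in chart[lo, mid]:
                    for c2 in chart[mid, hi]:
                        chart[lo, hi] |= BINARY.get((c1, c2), set())
    return start in chart[0, n]

print(recognize("I booked a flight from LA".split()))  # True
print(recognize("booked I a flight".split()))          # False
```

Processing spans from short to long guarantees that both smaller trees are already in the chart when a bigger tree is built from them.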
Recognition
Discuss!
• How do we know that we have recognized the
right thing? (that the sentence is grammatical)
• How do we need to extend this reasoning
in the presence of unary rules: C → C1 ?
Recognition
Signatures
• Rules are independent of a parse tree’s inner
structure.
• The only thing that is important is
how the parse tree looks from the ‘outside’.
• We call this the signature of the parse tree.
• A parse tree with signature [min, max, C] is one
that covers all words between fenceposts min and max
and whose root node is labeled with C.
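Signatures can be written down as small records. The sketch below assumes binary rules indexed by their right-hand side; the names Signature, min_pos, max_pos and combine are made up for this illustration. It shows that combining two trees only needs their signatures, not their inner structure.

```python
from typing import NamedTuple

class Signature(NamedTuple):
    min_pos: int  # left fencepost
    max_pos: int  # right fencepost
    cat: str      # label of the root node

def combine(left, right, binary_rules):
    """Signatures of the bigger trees that can be built from two
    smaller trees, given rules C -> C1 C2 indexed by (C1, C2)."""
    # The two trees must be adjacent: left ends where right begins.
    if left.max_pos != right.min_pos:
        return set()
    parents = binary_rules.get((left.cat, right.cat), set())
    return {Signature(left.min_pos, right.max_pos, c) for c in parents}

rules = {("Det", "Nom"): {"NP"}}  # hypothetical one-rule grammar
det = Signature(2, 3, "Det")
nom = Signature(3, 4, "Nom")
print(combine(det, nom, rules))
```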
Recognition
Discuss!
• What is the signature of a parse tree
for the complete sentence?
• How many different signatures are there?
• Can you relate the runtime of the parsing
algorithm to the number of signatures?