* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Context Free Grammars 10/28/2003 Reading: Chap 9, Jurafsky
Old Irish grammar wikipedia , lookup
English clause syntax wikipedia , lookup
Serbo-Croatian grammar wikipedia , lookup
Kannada grammar wikipedia , lookup
Construction grammar wikipedia , lookup
Antisymmetry wikipedia , lookup
Modern Hebrew grammar wikipedia , lookup
Preposition and postposition wikipedia , lookup
Morphology (linguistics) wikipedia , lookup
Georgian grammar wikipedia , lookup
French grammar wikipedia , lookup
Controlled grammar wikipedia , lookup
Lexical semantics wikipedia , lookup
Compound (linguistics) wikipedia , lookup
Portuguese grammar wikipedia , lookup
Arabic grammar wikipedia , lookup
Spanish grammar wikipedia , lookup
Dependency grammar wikipedia , lookup
Chinese grammar wikipedia , lookup
Zulu grammar wikipedia , lookup
Ancient Greek grammar wikipedia , lookup
Malay grammar wikipedia , lookup
Latin syntax wikipedia , lookup
Turkish grammar wikipedia , lookup
Yiddish grammar wikipedia , lookup
Polish grammar wikipedia , lookup
Transformational grammar wikipedia , lookup
Scottish Gaelic grammar wikipedia , lookup
Junction Grammar wikipedia , lookup
Context-free grammar wikipedia , lookup
Esperanto grammar wikipedia , lookup
Probabilistic context-free grammar wikipedia , lookup
Pipil grammar wikipedia , lookup
English grammar wikipedia , lookup
Context Free Grammars Syntax Syntax = rules describing how words can connect to each other Goal of syntax is to model the knowledge people unconsciously have about the grammar of their native language Syntax Why should you care? Grammar checkers Question answering Information extraction Machine translation Constituency The basic idea here is that groups of words within utterances can be shown to act as single units. And in a given language, these units form coherent classes that can be shown to behave in similar ways Constituency How words group into units and how the various kinds of units behave Constituency Ordering: What are the rules that govern the ordering of words and bigger units in the language? Constituency E.g., Noun phrases (NPs) • • • • • • Three parties from Brooklyn A high-class SUV such as Mindy’s The Broadway coppers They Harry the Horse The reason he comes into the Hot Box How do we know these form a constituent? They can all appear before a verb: But individual words can’t always appear before verbs: Three parties from Brooklyn arrive… A high-class SUV such as Mindy’s attracts… The Broadway coppers love… They sit *from arrive… *as attracts… *the is *spot is… Must be able to state generalizations like: Noun phrases occur before verbs Preposing and postposing: On September 17th, I’d like to fly from Atlanta to Denver I’d like to fly on September 17th from Atlanta to Denver I’d like to fly from Atlanta to Denver on September 17th. But not: *On September, I’d like to fly 17th from Atlanta to Denver *On I’d like to fly September 17th from Atlanta to Denver Context-Free Grammars Context-free grammars (CFGs) Also known as Phrase structure grammars Backus-Naur form Consist of Rules Terminals Non-terminals Context-Free Grammars Terminals We’ll Non-Terminals The constituents in a language Like take these to be words noun phrase, verb phrase and sentence Rules Rules are equations that consist of a single non-terminal on the left and any number of terminals and nonterminals on the right. CFG Example S -> NP VP NP -> Det NOMINAL NOMINAL -> Noun VP -> Verb Det -> a Noun -> flight Verb -> left these rules are defined independent of the context where they might occur -> CFG CFGs S -> NP VP This says that there are units called S, NP, and VP in this language That an S consists of an NP followed immediately by a VP Generativity As • • • with FSAs we can view these rules as either analysis or synthesis machines Generate strings in the language Reject strings not in the language Impose structures (trees) on strings in the language Parsing Parsing is the process of taking a string and a grammar and returning a (many?) parse tree(s) for that string Context? The non-terminal on the left-hand side of a rule is out there all by itself A -> B C Means that I can rewrite an A as a B followed by a C regardless of the context in which A is found Key Constituents (English) • • • • Sentences Noun phrases Verb phrases Prepositional phrases Some NP Rules Here are some rules for the noun phrases Together, these describe two kinds of NPs. One that consists of a determiner followed by a nominal And another that says that proper names are NPs. The third rule illustrates two things An explicit disjunction Two kinds of nominals A recursive definition Same non-terminal on the right and left-side of the rule Grammar CFGs more formally A context-free grammar has 4 parameters (“is a 4-tuple”) 1) A set of non-terminal symbols (“variables”) N 2) A set of terminal symbols (disjoint from N) 3) A set of productions P, each of the form 4) A -> Where A is a non-terminal and is a string of symbols from the infinite set of strings ( N)* A designated start symbol S Derivations A derivation is a sequence of rules expansion Common to represent derivation by a parse tree Bracketed Notation [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight] ] ] ] ] Sentence Types Declaratives: A plane left. S NP VP Imperatives: Leave! S VP Yes-No Questions: Did the plane leave? S Aux NP VP WH Questions: When did the plane leave? S WH-NP Aux NP VP 5/23/2017 21 NPs NP -> Pronoun NP -> Proper-Noun I came, you saw it, they conquered Los Angeles is west of Texas John Hennessy is the president of Stanford NP -> Det Noun The president NP -> Nominal Nominal -> Noun Nominal l Noun A morning flight to Denver PPs PP -> Preposition NP From LA To the store On Tuesday morning With lunch Recursion We’ll have to deal with rules such as the following where the non-terminal on the left also appears somewhere on the right (directly) NP -> NP PP VP -> VP PP [[The flight] [to Boston]] [[departed Miami] [at noon]] Recursion Example flights from Denver Flights from Denver to Miami Flights from Denver to Miami in February Flights from Denver to Miami in February on a Friday Flights from Denver to Miami in February on a Friday under $300 Flights from Denver to Miami in February on a Friday under $300 with lunch Determiners Noun phrases can start with determiners... Determiners can be Simple A Or car simple possessives John’s Or car complex recursive versions of that John’s 5/23/2017 lexical items: the, this, a, an, etc. sister’s husband’s son’s car 26 Pre-determiners Word classes that appear in the NP before the determiner are called predeterminers Eg: all the flights Word classes that appear in the NP between the determiner and the head noun are called post determiners ( they include cardinal numbers, ordinal numbers and quantifiers, adjective phrase) Eg: one stop The first one The nonstop flight the least expensive fare Nominals Contains the head and any pre- and post- modifiers of the head. Pre Quantifiers, cardinals, ordinals... Three cars Adjectives large cars Ordering 5/23/2017 constraints Three large cars ?large three cars 28 NP->(det) (card) (ord) (quant) (AP) Nominal After the head noun there can be post modifiers Postmodifiers Three kinds Prepositional phrases Non-finite clauses Any flights arriving before noon Relative clauses All flights from Seattle a flight that serves breakfast Same general (recursive) rule to handle these Nominal Nominal PP Nominal Nominal GerundVP Nominal Nominal RelClause GerundVP-> GerundV NP GerundV PP GerundV (being, preferring, arriving…) GerundV NP PP 5/23/2017 RelClause -> (who, that) VP 30 Coordination Constructions Noun phrases and other units can be conjoined with conjunctions like and, or and but S -> S and S John went to NY and Mary followed him NP -> NP and NP ( the flights and the costs) VP -> VP and VP (leaving Denver and arriving in Sanfrancisco) … Practice: Another Recursion Example The following where the non-terminal on the left also appears somewhere on the right (directly). NP -> NP PP [[The flight] [to Boston]] VP -> VP PP [[departed Miami] [at noon]] Recursion Example flights from Denver Flights from Denver to Miami Flights from Denver to Miami in February Flights from Denver to Miami in February on a Friday Flights from Denver to Miami in February on a Friday under $300 Flights from Denver to Miami in February on a Friday under $300 with lunch Potential Problems in CFG Agreement Subcategorization Movement Sentence Types Declaratives: A plane left. S NP VP Imperatives: Leave! S VP Yes-No Questions: Did the plane leave? S Aux NP VP WH Questions: When did the plane leave? S WH-NP Aux NP VP Noun Phrases Let’s consider the following rule in more detail... NP Det Nominal Most of the complexity of English noun phrases is hidden in this rule. Consider the derivation for the following example All the morning flights from Denver to Tampa leaving before 10 Noun Phrases NP Structure Clearly this NP is really about flights. That’s the central criticial noun in this NP. It is the head. We can dissect this kind of NP into the stuff that can come before the head, and the stuff that can come after it. Determiners Noun phrases can start with determiners... Determiners can be Simple lexical items: the, this, a, an, etc. A car Or simple possessives John’s car Or complex recursive versions of that John’s sister’s husband’s son’s car Nominals Contains the head and any pre- and post- modifiers of the head. Pre- Quantifiers, cardinals, ordinals... Three cars Adjectives large cars Ordering constraints Three large cars ?large three cars Postmodifiers Three kinds Prepositional phrases From Seattle Non-finite clauses Arriving before noon Relative clauses That serve breakfast Same general (recursive) rule to handle these Nominal Nominal PP Nominal Nominal GerundVP Nominal Nominal RelClause Agreement This dog *This dogs Those dogs *Those dog This dog eats *This dog eat Those dogs eat *Those dogs eats Verb Phrases English VPs consist of a head verb along with 0 or more following constituents which we’ll call arguments. Subcategorization Sneeze: John sneezed Find: Please find [a flight to NY]NP Give: Give [me]NP[a cheaper fare]NP Help: Can you help [me]NP[with a flight]PP Prefer: I prefer [to leave earlier]TO-VP Told: I was told [United has a flight]S … *John sneezed the book *I prefer United has a flight *Give with a flight Subcat expresses the constraints that a predicate places on the number and type of the argument it wants to take Overgeneration The various rules for VPs overgenerate. They permit the presence of strings containing verbs and arguments that don’t go together For example VP -> V NP therefore Sneezed the book is a VP since “sneeze” is a verb and “the book” is a valid NP In lecture: go over the grammar for assignment 3 Possible CFG Solution Possible solution for agreement. SgS -> SgNP SgVP Can use the same trick for all the verb/VP classes. PlS -> PlNp PlVP SgNP -> SgDet SgNom PlNP -> PlDet PlNom PlVP -> PlV NP SgVP ->SgV Np … Movement Core example [[My travel agent]NP [booked [the flight]NP]VP]S I.e. “book” is a straightforward transitive verb. It expects a single NP arg within the VP as an argument, and a single NP arg as the subject. Movement What about? Which flight do you want me to have the travel agent book? The direct object argument to “book” isn’t appearing in the right place. It is in fact a long way from where its supposed to appear. And note that it’s separated from its verb by 2 other verbs. Formally… To put all previous discussions/examples in a formal definition for CFG: A context free grammar has four parameters: 1. 2. 3. 4. A set of non-terminal symbols N A set of terminal symbols T A set of production rules P, each of the form A a, where A is a non-terminal, and a is a string of symbols from the infinite set of strings (T N)* A designated start symbol S Grammar equivalence and normal form Strong equivalence: two grammars are strongly equivalent if: they generate/accept the same set of strings they assign the same phrase structure to each sentence two grammars are weakly equivalent if: they generate/accept the same set of strings they do not assign the same phrase structure to each sentence Normal form Restrict the form of productions Chomsky Normal Form (CNF) Right hand side of the productions has either one or two terminals or non-terminals e.g. A -> BC A -> a Any grammar can be translated into a weakly equivalent CNF A -> B C D <=> A-> B X X -> C D Treebanks Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one). These are generally created By first parsing the collection with an automatic parser And then having human annotators correct each parse as necessary. This generally requires detailed annotation guidelines that provide a POS tagset, a grammar and instructions for how to deal with particular grammatical constructions. Penn Treebank Penn TreeBank is a widely used treebank. Most well known is the Wall Street Journal section of the Penn TreeBank. 1 M words from the 1987-1989 Wall Street Journal. Treebank Grammars Treebanks implicitly define a grammar for the language covered in the treebank. Simply take the local rules that make up the sub-trees in all the trees in the collection and you have a grammar. Not complete, but if you have decent size corpus, you’ll have a grammar with decent coverage. Treebank Grammars Such grammars tend to be very flat due to the fact that they tend to avoid recursion. For example, the Penn Treebank has 4500 different rules for VPs. Among them... Heads in Trees Finding heads in treebank trees is a task that arises frequently in many applications. Particularly important in statistical parsing We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node. Lexically Decorated Tree Head Finding The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar. Noun Phrases Treebank Uses Treebanks (and headfinding) are particularly critical to the development of statistical parsers Chapter 14 Also valuable to Corpus Linguistics Investigating the empirical details of various constructions in a given language Dependency Grammars In CFG-style phrase-structure grammars the main focus is on constituents. But it turns out you can get a lot done with just binary relations among the words in an utterance. In a dependency grammar framework, a parse is a tree where the nodes stand for the words in an utterance The links between the words represent dependency relations between pairs of words. Relations may be typed (labeled), or not. Dependency Relations Dependency Parse They hid the letter on the shelf Dependency Parsing The dependency approach has a number of advantages over full phrase-structure parsing. Deals well with free word order languages where the constituent structure is quite fluid Parsing is much faster than CFG-bases parsers Dependency structure often captures the syntactic relations needed by later applications CFG-based approaches often extract this same information from trees anyway. Dependency Parsing There are two modern approaches to dependency parsing Optimization-based approaches that search a space of trees for the tree that best matches some criteria Shift-reduce approaches that greedily take actions based on the current word and state. Summary Context-free grammars can be used to model various facts about the syntax of a language. When paired with parsers, such grammars consititute a critical component in many applications. Constituency is a key phenomena easily captured with CFG rules. But agreement and subcategorization do pose significant problems Treebanks pair sentences in corpus with their corresponding trees.