* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download notes
Yiddish grammar wikipedia , lookup
Construction grammar wikipedia , lookup
Spanish grammar wikipedia , lookup
Ancient Greek grammar wikipedia , lookup
Lexical semantics wikipedia , lookup
Context-free grammar wikipedia , lookup
Antisymmetry wikipedia , lookup
Pipil grammar wikipedia , lookup
Lithuanian declension wikipedia , lookup
Dependency grammar wikipedia , lookup
Probabilistic context-free grammar wikipedia , lookup
Transformational grammar wikipedia , lookup
1 ICL/Context Free Grammars/2005-10-31 Contents 1 Outline 1 2 What is a Context Free Grammar? 2.1 Some Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 3 3 Example CFG for English 3.1 Constituency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 6 7 8 4 Challenges for CFGs 4.1 Agreement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Subcategorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Unbounded Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 9 10 11 5 Summary 11 2 What is a Context Free Grammar? Syntax • How words are combined to form phrases; and • how phrases are combined to form sentences. • New concept: Constituency • Groups of words may behave as a single unit or constituent, – – – – – They ate pizza at 8 pm. They ate pizza then. [substitution by pro-form] At 8 pm, they ate pizza. [preposing] When did they eat pizza? At 8 pm. [constituent answer] They ate pizza at 6 pm and at 8 pm. [coordinate conjunct] Syntax in CL Syntactic analysis used to varying degrees in applications such as: • Grammar Checkers • Spoken Language Understanding • Question Answering systems • Information Extraction • Automatic Text Generation • Machine Translation Typically, fine-grained syntactic analysis is a prerequisite for fine-grained semantic interpretation. Context Free Grammars (CFGs) • Capture constituency and ordering; • formalise descriptive linguistic work of the 1940s and ’50s; • are widely used in linguistics. • CFGs are somewhat biased towards languages like English which have relatively fixed word order. • Most modern linguistic theories of grammar incorporate some notions from context free grammar. 2.1 Some Definitions Context Free Grammars (CFGs) Formally, a CFG is a 4-tuple hN, Σ, P, Si, where • N is a set of non-terminal symbols (e.g., syntactic categories) • Σ a set of terminal symbols (e.g., words) • P a set of productions (rules) of the form A → α, where – A is a non-terminal, and – α is a string of symbols from the set (Σ ∪ N )? (i.e., both terminals and non-terminals) • a designated start symbol S Example CFG Let G = hN, Σ, P, Si, where • N = {S, NP, VP, Det, Nom, V, N} • Σ = {a, flight, left} • P = { S → NP VP, NP → Det Nom, Nom → N, VP → V, Det → a, N → flight, V → left } • S = S. NP = ‘noun phrase’, VP = ‘verb phrase’, Det = ‘determiner’, Nom = ‘Nominal’, N = ‘noun’, V = ‘verb’. Derivations • A derivation of a string from non-terminal A is the result of successively applying productions (from G) to A: NP Det Nom a Nom a N a flight by by by by NP→ Det Nom Det→ a Nom→ N N→ flight 2 • Can also write: NP⇒ Det Nom⇒ a Nom⇒ a N⇒ a lingflight, where ⇒ means “yields in one rule application”. • G generates a flight (as a string of category NP). Grammars and Languages • CFG is an abstract model for associating structures with strings; • not intended as model of how humans produce sentences. • Sentences that can be derived by a grammar G belong to the formal language defined by G, and are called Grammatical Sentences with respect to G. • Sentences that cannot be derived by G are Ungrammatical Sentences with respect to G.. • The language LG defined by grammar G is the set of strings composed of terminal symbols that are derivable from the start symbol: LG = {w|w ∈ Σ? and S derives w} 2.2 Trees Parse Trees • Derivations can also be visualized as parse trees (or constituent structure trees), e.g. NP Det Nom a N flight • Trees express: – hierarchical grouping into constituents – grammatical category of constituents – left-to-right order of constituents Parse Trees, cont. • Trees can also be written as labeled bracketings: [NP [Det a] [Nom [N flight]]] • Dominance: node x dominates node y if there’s a connected sequence of branches descending from x to y. E.g. – NP dominates non-terminals Det, Nom and N • Immediate Dominance: node x immediately dominates node y if x dominates y and there’s no distinct node between x and y. E.g. – NP immediately dominates Det and Nom. 3 Parse Trees, cont. NP Det Nom a N flight • A node is called the daughter of the node which immediately dominates it. • Distinct nodes immediately dominated by the same node are called sisters. • A node which is not dominated by any other node is called the root node. • Nodes which do not dominate any other nodes are called leaves. CFG: As opposed to what? • Regular Grammars: – All rules of the form A → xB or A → x. – Equivalent to Regular Expressions. – Regarded as too weak to capture lingistic generalizations. • Context Sensitive Grammars: – Allows rules of the form XAY → XαY ; i.e., the way in which A is expanded can depend on the context X Y . – Regarded as ’too strong’ — can describe languages that aren’t possible human languages. – Regular languages ⊂ Context Free languages ⊂ Context Sensitive languages 3 Example CFG for English Grammars and Constituency • A huge amount of skilled effort goes into the development of grammars for human languages — can only scratch the surface here. • There’s lot’s of research into English syntactic structure — but also lots of disagreement. • Various criteria for determining constituency: – substitution by pro-forms – preposing – constituent answers – coordination • Some clear-cut decisions, but quite a lot of unclear ones too. 4 A Tiny N V A Pro PropN Det P Conj Lexicon → flight | passenger | trip | morning | . . . → is | prefers | like | need | depend | fly → cheapest | non-stop | first | latest other | direct | . . . → me | I | you | it | . . . → Alaska | Baltimore | Los Angeles Chicago | United | American | . . . → the | a | an | this | these | that | . . . → from | to | on | near | . . . → and | or | but | . . . A Tiny Grammar S → NP VP NP → Pro | PropN | Det A Nom | Det Nom Nom → Nom PP | N Nom | N VP → VP PP | V NP | V NP PP | V PP PP → P NP I + want a morning flight I Los Angeles the + next + passenger a + flight flight + to Los Angeles morning + flight trip leave + in the morning want + a flight sell + a ticket + to me depend + on the weather from + Los Angeles Example Noun Phrase NP Det A the next Nom Nom PP Nom PP N from NY to LA flight Example Noun Phrase: Heads NP Det A the next Nom Nom PP Nom PP N from NY flight 5 to LA Example Verb Phrase VP PP VP V NP PP in the morning sell a ticket to me Arguments vs. Modifiers • Arguments: ’essential participants’ in an event • Modifiers: optional additional information about an event • As with other linguistic distinctions, some clear cases and some unclear ones. • We’ve chosen to reflect the distinction in the parse trees: – arguments are sisters of V (or N) – modifiers are sisters of VP (or Nom) Example Sentence S VP NP Det A Nom the other N NP V A Nom non-stop N prefers Det a passenger flight 3.1 Constituency Are VPs Constituents? S S VP NP Kim V NP ate pizza NP V NP Kim ate pizza • Kim ate pizza and Lee did too. • What did Kim do? Ate pizza. • Kim said she would eat pizza, and eat pizza she did. 6 Constituency in REs? • Regular Expression: (the|a)(other|non-stop)?(passenger|flight)prefers (the|a)(other|non-stop)?(passenger|flight) • No explicit representation of NP which can be ’re-used’ in different positions in a sentence. Constituency in Regular Grammars? Det the A N other passenger V prefers Det a A non-stop N flight 3.2 Recursion Recursive Structures • There is no upper bound on the length of a grammatical English sentence. – Therefore the set of English sentences is infinite. • A grammar is a finite statement about well-formedness. – To account for an infinite set, it has to allow iteration (e.g., X + ) or recursion. • Recursive rules: where the non-terminal on the left-hand side of the arrow in a rule also appears on the right-hand side of a rule. Recursive Structures, cont. Direct recursion: Nom → Nom PP flight to Boston VP → VP PP departed Miami at noon Indirect recursion: S → NP VP VP → V S said that the flight was late Recursion Example: Sentential Complements 7 S NP VP Pro V they said S NP VP Pro V he claimed S NP VP Pro V she lied Recursion Example: Possessives NP NPposs ’s NP ’s best friend NP NP dog Nom NPposs NPposs Nom Nom ’s sister John Coordination NP → NP and NP VP → VP and VP S → S and S • I need [[NP the times] and [NP the fares]]. • a flight [[VP departing at 9a.m.] and [VP returning at 5p.m.]] • [[S I depart on Wednesday] and [S I’ll return on Friday]]. Any phrasal constituent XP can be conjoined with a constituent of the same type —XP to form a new constituent of type XP. General schema: XP → XP and XP 3.3 Ambiguity Syntactic Ambiguity • Many kinds of syntactic (structural) ambiguity. • PP attachment has received much attention: 8 VP V NP saw Nom VP Nom PP the man with a telescope V NP PP saw Nom with a telescope the man PP Ambiguity • Different structures naturally correspond to different semantic interpretations (‘readings’) • Arises from independently motivated syntactic rules: VP → V . . . PP Nom → Nom PP • However, also strong, lexically influenced, preferences: – I bought [a book [on linguistics]] – I bought [a book ] [on sunday] 4 Challenges for CFGs Problem Areas for CFGs • Agreement • Subcategorization • ’Movement’ or unbounded dependencies 4.1 Agreement Number Agreement In English, some determiners agree in number with the head noun: • This dog • Those dogs • *Those dog • *This dogs And verbs agree in number with their subjects: • What flights leave in the morning? • *What flight leave in the morning? 9 Number Agreement, cont. Expand our grammar with multiple sets of rules? NPsg → Detsg Nsg NPpl → Detpl Npl Ssg → NPsg VPsg Spl → NPpl VPpl VPsg → Vsg (NP) (NP) (PP) VPpl → Vpl (NP) (NP) (PP) • worse when we add person and even worse in languages with richer agreement (e.g., three genders). • lose generalizations about nouns and verbs — can’t say property P is true of all words of category V. 4.2 Subcategorization Subcategorization Verbs have preferences for the kinds of constituents (cf. arguments) they co-occur with. • I found the cat. • *I disappeared the cat. • It depends [PP on the question]. • *It depends [PP {to/from/by} the question]. A traditional subcategorization of verbs: • transitive (takes a direct object NP) • intransitive In more recent approaches, there might be as many as a hundred subcategorizations of verb. Subcategorization, cont. More examples: • find is subcategorized for an NP (can take an NP complement) • want is subcategorized for an NP or an infinitival VP • bet is subcategorized for NP NP S A listing of the possible sequences of complements is called the subcategorization frame for the verb. As with agreement, the obvious CFG solution yields rule explosion: VP → Vintr VP → Vtr NP VP → Vditr NP NP Example Subcategorization Frames 10 Frame NP Verb eat, sleep prefer, find, leave, NP NP show, give NP PP VPinf help, load, prefer, want, need S mean 4.3 Example I want to eat Find [NP the flight from Pittsburgh to Boston] Show [NP me] [NP airlines with flights from Pittsburgh] Can you help [NP me] [PP with a flight] I would prefer [VPinf to go by United airlines] Does this mean [S AA has a hub in Boston]? Unbounded Dependencies Unbounded Dependency (or Movement) Constructions • *I gave to the driver. • I gave some money to the driver. • $5 [I gave to the driver ], (and $1 I gave to the porter ). • He asked how much [I gave to the driver ]. • I forgot about the money which [I gave • How much did you think [I gave to the driver ]. to the driver ]? • How much did you think he claimed [I gave to the driver ]? • How much did you think he claimed that I said [I gave to the driver ]? • ... 5 Summary Summary • CFGs capture hierarchical structure of constituents in natural language. • More powerful than REs, and can express recursive structure. • Hard to get a variety of linguistic generalizations in ’vanilla’ CFGs, though this can be mitigated with use of features (not covered here). • Building a CFG for a reasonably large set of English constructions is a lot of work! Reading • Jurafsky & Martin, Chapter 9 • Parsing tutorial in NLTK-Lite 11