COMP 791A: Statistical Language Processing
Linguistic Essentials (Chap. 3)

Levels of study of NLP
- Lexical: possible words in a given language
    rose vs. ?gellapou
- Phonetics & phonology: how words are related to sounds
    rose [roz]
- Parts-of-speech & Morphology: how words are constructed from basic meaning units (morphemes)
    friend + ly --> friendly      rose + ly ≠ rosely
    friend + s --> friends        woman + s ≠ womans
- Phrase Structure and Syntax: how words can be ordered to form correct sentences
    ?Red the is rose / adj det verb noun
    The rose is red / det noun verb adj

Levels of study of NLP (cont'd)
- Semantics
    What words mean (lexical semantics, word sense disambiguation)
      chair --> furniture / person
    How word meanings are combined into the meaning of sentences
      The chair is broken.
      The chair is sick.
- Pragmatics: how language conventions affect the literal meaning (interpretation)
    Do you have the time?
    Do you have the children?
- Discourse: how surrounding sentences affect interpretation
    The chair's leg is broken. He went skiing last week-end.
    The chair's leg is broken. Someone placed a 500kg package on it.
- World-Knowledge: how general knowledge about the world affects interpretation
    The prof sent the student to see the chair because he was fed up with his behavior.
    The prof sent the student to see the chair because he wanted to see him.
    The prof sent the student to see the chair because he was talking in class.
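The morphology examples above (friend + s --> friends, but woman + s ≠ womans; friend + ly --> friendly, but rose + ly ≠ rosely) can be sketched as a rule plus an exception list. This is only an illustration: the function names, the tiny word lists, and the irregular form "women" (not shown on the slide) are assumptions.

```python
# Minimal sketch: a default suffix rule plus lexical exceptions.
# The word lists are hypothetical and cover only the slide's examples.

IRREGULAR_PLURALS = {"woman": "women"}  # assumption: form not given on the slide
LY_STEMS = {"friend"}                   # stems assumed to accept the suffix -ly


def plural(noun):
    """Return the plural: irregular form if listed, otherwise noun + s."""
    return IRREGULAR_PLURALS.get(noun, noun + "s")


def add_ly(stem):
    """Return stem + ly if the stem licenses it, otherwise None."""
    return stem + "ly" if stem in LY_STEMS else None


if __name__ == "__main__":
    for noun in ("friend", "woman"):
        print(noun, "-->", plural(noun))
    for stem in ("friend", "rose"):
        print(stem, "+ ly -->", add_ly(stem))
```

The point of the sketch is that the regular rule alone over-generates, so a lexicon of exceptions has to sit in front of it.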
Levels of study of NLP
  Lexical; Phonetics & phonology; Parts-of-speech & Morphology; Phrase Structure and Syntax; Semantics; Pragmatics; Discourse; World-Knowledge

Parts of Speech and Morphology
- Parts of Speech (POS)
    also called word / lexical / syntactic / grammatical categories, tags or classes
    Ex: noun, verb, adjective, preposition, …
- Morphology
    study and description of word formation in a language
    modification of a root form (stem) by affixes
    affixes: prefixes, suffixes, infixes, circumfixes
    and exceptions…   thief --> thieves, chief --> chiefs
- Word categories are systematically related by morphological processes

Morphological processes
- Inflection
    to indicate case, gender, number, tense, person, mood, or voice
    does not significantly change the word's grammatical class or meaning
    car --> cars, talk --> talking
- Derivation
    creation of a new word
    may have a different meaning and/or grammatical class
    infect --> disinfect, grateful --> ungrateful
    wide (adjective) --> widely (adverb), teach (verb) --> teacher (noun)
- Compounding
    merging 2 or more words into a single one
    written as separate words but pronounced as a single word / denotes 1 single concept, so merits an entry in the lexicon
    tea kettle, disk drive, mad cow disease

Classes of POS
- Open (lexical) class
    things, actions, events, …   ex. cat, John, eat
    new words can be added easily
    nouns, verbs, adjectives, adverbs
    some languages do not have all these categories
- Closed (functional) class
    generally function/grammatical words   ex. the, in, and, for
    relatively fixed membership
    prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs

Main POS
- Open class
    Noun: refers to entities like people, places, things or ideas.
    Adjective: describes the properties of nouns or pronouns.
    Verb: describes actions, activities and states.
    Adverb: describes a verb, an adjective or another adverb.
- Closed class
    Pronoun: a word that takes the place of a noun or other expression.
    Determiner: describes the particular reference of a noun.
    Preposition: expresses spatial or time relationships.
    …

Nouns (open)
- Entities like people, places, things or ideas
    ex: dog, tree, Mary, idea
- Typical inflections:
    number (singular, plural)
    gender (masculine, feminine, neuter)
    case (nominative, genitive, accusative, dative)
- Sub-categories:
    proper nouns (John)
    adverbial nouns (today, home)

Verbs (open)
- Actions, activities, and states
    The men work in the field.
    The men are working in the field.
    The men are in the field.
- Typical inflections:
    tenses: present, past, future
    aspect: progressive, perfective
    voice: active, passive
    other inflections: number, person
- Sub-categories:
    auxiliaries (considered closed-class words)   ex: be, do, will
    modal verbs (considered closed-class words)   ex: can, should, could
    main verbs

Main verbs
- Transitive: requires a direct object (found with the questions what? or whom?)
    ?The child broke.
    The child broke a glass.
- Intransitive: does not require a direct object
    The train arrived.
- Some verbs can be both transitive and intransitive
    The ship sailed the seas. (transitive)
    The ship sails at noon. (intransitive)
    I met my friend at the airport. (transitive)
    The delegates met yesterday. (intransitive)

Adjectives (open)
- Properties and attributes
    long road, rainy day, attractive hat
- Typical inflections: number, gender, case
- Sub-categories:
    comparative (richer)
    superlative (richest)

Adverbs (open)
- Words added to a verb, adjective, adverb or other word to expand its meaning
    You must set up the copy now.
    Mary walks gracefully.
    Sometimes I take a walk in the woods.
    Jack usually leaves the house at seven.
    I have always admired her.
- Sub-categories:
    locative (here)
    degree (very)
    manner (slowly)
    temporal (late, yesterday (noun?))

Closed class categories
- Determiners: words that make specific the denotation of a noun phrase
    articles: the hat, a hat
    demonstrative: this hat, that hat
    possessive: John's hat, my hat, her book
    wh-determiner: which hat, whose hat
    quantifier: some hat, every hat
- Prepositions: words that show the relationship between certain words in a sentence
    by, to, at, …
    The accident occurred under the bridge.
- Conjunctions: words used to join other words or groups of words
    or, when, but, and, …
- Auxiliary & modal verbs: be, do, can, may, should, …

Closed class categories (cont'd)
- Particles: words that are added to main verbs to construct different verbs
    check + out = check out, make + up = make up
    Ex: She made up a story. / She made it up.
    particles vs. prepositions: she <ran up> a bill / she <ran> <up> a hill
- Numerals: one, third

Closed class categories (cont'd)
- Pronouns: a word that replaces a noun or even another sentence
    ex: she, ourselves, mine, that
- Sub-categories:
    Personal: You are very nice.
    Possessive: Mine is nicer.
    Interrogative: used to ask questions: who?, what?, which?
      Who is that girl?
    Demonstrative: points out definite persons, places or things: this, these, that
      This is my book.
      He said he was busy, but that was a lie.
    Relative: joins the clause it introduces to its own antecedent: who, which, that
      She is the girl who won the race.
    ...

Other parts of speech
- Interjections: Ouch!
- Negatives: no, not
- Politeness markers: Hello, bye
- Existential: There are 3 students sleeping.
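Because closed classes have a relatively fixed membership, a plain word list goes a long way for them, whereas open classes cannot be enumerated. The sketch below illustrates this with a lookup table built only from the example words on the slides above; the table, the category labels and the function name `word_class` are illustrative assumptions, not a real lexicon.

```python
# Minimal sketch: closed-class words can be listed, open-class words cannot.
# The sets contain only example words taken from the slides (hypothetical lexicon).

CLOSED_CLASS = {
    "determiner": {"the", "a", "this", "that", "my", "her", "which", "whose",
                   "some", "every"},
    "preposition": {"by", "to", "at", "under", "up"},
    "conjunction": {"or", "when", "but", "and"},
    "auxiliary": {"be", "do", "can", "may", "should"},
    "particle": {"out", "up"},
    "numeral": {"one", "third"},
    "pronoun": {"she", "ourselves", "mine", "you", "who", "what", "which",
                "this", "these", "that"},
}


def word_class(word):
    """Return the closed-class categories listing the word, or ["open"]."""
    w = word.lower()
    found = sorted(cat for cat, words in CLOSED_CLASS.items() if w in words)
    return found or ["open"]


if __name__ == "__main__":
    for w in ("The", "hat", "up", "that"):
        print(w, "-->", word_class(w))
```

Words such as "up" (particle or preposition) and "that" (determiner or pronoun) come back with two categories: a list lookup identifies the candidates, but choosing between them needs the sentence context.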
Summary
- Open class
    nouns: cat, spirit
    verbs: eat, cook
    adjectives: slow, large
    adverbs: slowly
- Closed class
    prepositions: on, under, at
    determiners: a, the, some
    pronouns: she, who, I, other
    conjunctions: and, but, or
    auxiliary verbs: can, may, should
    particles: up, on, off
    numerals: one, two, first

The substitution test
- Basic test to determine if 2 words belong to the same POS class
    The { sad | intelligent | green | fat | … } one is in the corner.

POS Tagging
- Automatically assign POS tags to words in a text.
    Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN
    The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN
    The/ARTICLE news/NOUN has/AUXILIARY been/MAIN VERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD

Why do POS Tagging?
- 1st step towards NLU
    easier than full NLU (results > 95% accuracy)
- Useful for:
    speech recognition / synthesis (better accuracy)
      how to recognize/pronounce a word: CONtent/noun vs. conTENT/adj
    stemming in IR
      which morphological affixes the word can take
      adverb - ly = noun (friendly - ly = friend)
    indexing in IR
      pick out nouns, which may be more important than other words in indexing documents

Tag Sets
- A tag indicates the various conventional parts of speech.
- Different tag sets have been used
    Ex. Brown Tag Set, Penn Treebank Tag Set
- Tag examples:
    NP    Proper noun
    NN    Singular noun
    AT    Article
    DET   Determiner
- More on this later

Penn Treebank Tag Set
  Tag: description
      examples
  CC: conjunction, coordinating
      and but either et for less minus neither nor or plus so therefore
  CD: numeral, cardinal
      mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one
  DT: determiner
      all an another any both del each either every half la many much
  IN: preposition or subordinating conjunction
      astride among upon whether out inside pro despite on by throughout
  JJ: adjective or numeral, ordinal
      third ill-mannered pre-war regrettable oiled calamitous first
  JJR: adjective, comparative
      bleaker braver breezier briefer brighter brisker broader bumper
  NN: noun, common, singular or mass
      common-carrier cabbage knuckle-duster Casino afghan shed
  NNP: noun, proper, singular
      Motown Venneboerger Czestochwa Ranzer Conchita Trumplane
  NNS: noun, common, plural
      undergraduates scotches bric-a-brac products bodyguards facets
  PRP: pronoun, personal
      hers herself him himself it itself me myself one oneself ours
  RB: adverb
      occasionally unabatingly maddeningly adventurously professedly
  RP: particle
      aboard about across along apart around aside at away back
  TO: "to" as preposition or infinitive marker
      to
  VB: verb, base form
      ask assemble assess assign assume atone attention avoid bake
  VBD: verb, past tense
      dipped pleaded swiped wore soaked tidied convened halted
  VBG: verb, present participle or gerund
      telegraphing stirring focusing angering judging stalling lactating
  VBN: verb, past participle
      imitated dilapidated aerosolized chaired languished panelized used
  VBP: verb, present tense, not 3rd p. singular
      predominate wrap resort sue twist spill cure lengthen brush
  VBZ: verb, present tense, 3rd p. singular
      bases reconstructs marks mixes displeases seals carps weaves
  …

Ambiguities in POS tagging
    Children eat sweet candy/noun.
    Too much boiling will candy/verb the molasses.
    Fruit flies/? like/? a banana.

Levels of study of NLP
  Lexical; Phonetics & phonology; Parts-of-speech & Morphology; Phrase Structure and Syntax; Semantics; Pragmatics; Discourse; World-Knowledge

Syntax or Phrase Structure
- Syntax: study of the regularities and constraints of word order and phrase structure
    the book is red   vs.   red book is the
- Grammar: expresses the relations among the constituents of a sentence

Constituents
- also called syntactic structures
- Main constituents:
    S: sentence                    The boy is happy.
    NP: noun phrase                the little boy / Sam Smith / I / three boys from Montreal
    VP: verb phrase                eat an apple / sing / leave Boston in the morning
    PP: prepositional phrase       in the morning / about my ticket
    AdjP: adjective phrase         really funny / rather clear / very large
    AdvP: adverb phrase            slowly / really slowly

Sentence Moods/Types
- Declarative        Mary eats.            S --> NP VP
- Imperative         Eat!                  S --> VP
- Yes-No Question    Did Mary eat?         S --> Aux NP VP
- Wh-Question        When did Mary eat?    S --> WH-pro Aux NP VP

Noun Phrases
  NP --> pre-modifiers head post-modifiers
- head: central noun in the NP
    the little boy, the boy from Montreal
- pre-modifiers:
    determiners, cardinal, ordinal, quantifier: the boy, two boys, first boy, several boys
    AdjP: funny boy, really funny boy
- post-modifiers:
    PP: flights from Montreal
    non-finite clause
      gerundive (-ing): flights arriving from Montreal
      -ed: dinner served on board, jewels stolen from the queen
      infinitive form: flight to arrive from Montreal
    relative clause: flight that arrives from Montreal, girl who won the race

Verb Phrases
  VP --> head-verb complements adjuncts
- Some VPs:
    Verb          eat.
    Verb NP       leave Montreal.
    Verb NP PP    leave Montreal in the morning.
    Verb PP       leave in the morning.
    Verb S        think I would like the fish.
    Verb VP       want to leave.
                  want to leave Montreal.
                  want to leave Montreal in the morning.
                  want to want to leave Montreal in the morning.

Subcategorisation frames
- Some verbs can take complements that others cannot
    I want to fly.
    * I find to fly.
- Verbs are subcategorized according to the complements they can take --> subcategorisation frames
    traditionally: transitive vs. intransitive
    nowadays: up to 100 subcategories / frames
- Frame          Verb            Example
    empty          eat, sleep      I eat.
    NP             prefer, find    I prefer apples.
    NP NP          show, give      Show me your hand.
    PPfrom PPto    fly, travel     I fly from Montreal to Toronto.
    VPto           prefer, want    I prefer to leave.
    S              mean            Does this mean you are going to leave me?
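The frame table above can be read as a small lexicon that a parser consults before attaching a complement to a verb. The following sketch encodes only the rows of that table; the dictionary layout and the function name `licenses` are assumptions made for illustration, and a real lexicon would list many more frames per verb.

```python
# Minimal sketch of a subcategorisation lexicon.
# Entries are copied from the slide's frame table; nothing else is assumed
# about these verbs, so the coverage is deliberately incomplete.

SUBCAT = {
    "eat": {"empty"},
    "sleep": {"empty"},
    "prefer": {"NP", "VPto"},
    "find": {"NP"},
    "show": {"NP NP"},
    "give": {"NP NP"},
    "fly": {"PPfrom PPto"},
    "travel": {"PPfrom PPto"},
    "want": {"VPto"},
    "mean": {"S"},
}


def licenses(verb, frame):
    """True if the lexicon lists this complement frame for the verb."""
    return frame in SUBCAT.get(verb, set())


if __name__ == "__main__":
    # "I want to fly." is fine, "* I find to fly." is not.
    print("want + VPto:", licenses("want", "VPto"))
    print("find + VPto:", licenses("find", "VPto"))
    print("find + NP:  ", licenses("find", "NP"))
```

Under these entries the check accepts "want" with a to-infinitive and rejects "find" with one, which matches the starred example on the slide.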
Prepositional phrases
  PP --> Preposition NP
    from Japan
    inside my blue bag

Adjective Phrases
  AdjP --> Adj Modifiers
    tall
    very tall
    taller than Mary

Adverb Phrases
  AdvP --> Adv Modifiers
    affirmatively
    very graciously
    rather secretively

Context Free Grammars
- a set of non-terminal symbols
    constituents & parts-of-speech: S, NP, VP, PP, Det, N, V, ...
- a set of terminal symbols
    lexicon of words & punctuation: cat, mouse, nurses, eat, ...
- a non-terminal designated as the starting symbol
    sentence S
- a set of re-write rules having a single non-terminal on the LHS and one or more terminals or non-terminals on the RHS
    S --> NP VP
    NP --> Pro | PN | Det Nominal

A simple context-free grammar
- The Grammar
    S --> NP VP
    NP --> AT NNS
    NP --> AT NN
    NP --> NP PP
    VP --> VP PP
    VP --> VBD
    VP --> VBD NP
    PP --> IN NP
- The Lexicon
    NNS --> children
    NNS --> students
    NNS --> mountains
    VBD --> slept
    VBD --> ate
    VBD --> saw
    AT --> the
    IN --> in
    IN --> of
    NN --> cake

A parse tree
- a tree representation of the application of the grammar to a specific sentence
    [S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]

Stochastic Grammars
- Grammars obtained by adding probabilities to "algebraic" (i.e., non-probabilistic) grammars.
    1.0   S --> NP VP
    0.4   NP --> AT NNS
    0.4   NP --> AT NN
    0.2   NP --> NP PP
    0.1   VP --> VP PP
    0.1   VP --> VBD
    0.8   VP --> VBD NP
    1.0   PP --> IN NP

Syntactic Dependencies
- Local dependency
    dependency between two words expressed within the same syntactic rule
    The 3/plural books/plural.
    n-gram models capture this very well.
- Non-local dependency
    two words can be syntactically dependent even though they occur far apart in a sentence
    Ex: subject-verb agreement
      The children who found a wallet on the street yesterday while walking their dog were given a reward.
    a challenge for statistical NLP approaches (ex. n-grams) that model only local dependencies

Difficulties in parsing
- Attachment ambiguity
    The children ate the cake with a spoon.
    The children ate (the cake with a spoon). ??
    The children (ate with a spoon). ??

Other difficulties
- NP bracketing
    plastic cat food can cover
    --> ? (plastic cat) (food can) cover
    --> ? plastic (cat food can) cover
    --> ? (plastic cat food) (can cover)
- Conjunctions and appositives
    Maddy, my dog, and Samy
    --> ? (Maddy, my dog), and (Samy)
    --> ? (Maddy), (my dog), and (Samy)

Another Ambiguity: Garden-Path Sentences
- a well-studied class of syntactic ambiguity
- the sentence is re-analysed when the last word is encountered
- humans have difficulty analysing such sentences
- Example:
    The horse raced past the barn fell.
    (the horse that was raced past the barn) fell.

Garden Path: Wrong Parse
    [S [NP The horse] [VP raced past the barn]] fell
  Legend: dt: determiner, n: noun, v: verb, p: preposition, S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase

Garden Path: Right Parse
    [S [NP The horse [PAP raced past the barn]] [VP fell]]
  Legend: dt: determiner, n: noun, v: verb, p: preposition, S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase, PAP: passive phrase

Levels of study of NLP
  Lexical; Phonetics & phonology; Parts-of-speech & Morphology; Phrase Structure and Syntax; Semantics; Pragmatics; Discourse; World-Knowledge

Semantics
- the study of the meaning of words, constructions, and utterances
- can be divided into two parts:
    lexical semantics: meaning of words
    compositional semantics: meaning of sentences and discourse
      the meaning of the whole often differs from the meaning of the parts

Lexical Semantics
- Meaning of individual words
    I went to the bank of Montreal and deposited 50$.
    I went to the bank of the river and dangled my feet.
- Word Sense Disambiguation
    determining which sense of a word is used in a specific sentence
- Semantic relations between words:
    hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony

Meaning of sentences
    The cat eats the mouse = The mouse is eaten by the cat.
- Goal:
    build a representation of the meaning of the sentence
    attach semantic roles to constituents
- Some characteristics of a sentence that influence semantic interpretation:
    Type: declarative, interrogative, imperative, exclamatory
    Polarity: positive, negative
    Tense: past, present, future
    Voice: active, passive
- Some semantic roles (different from syntactic roles):
    Agent: the doer of a volitional act
    Patient: the thing that is affected by an act
    Recipient: the receiver of an object
    Instrument: the instrument used to perform an act
    Time: the time the act is performed
    Location: the location of an act or object
    …

Semantic Roles
- Ex: John[AGENT] hit Peter[PATIENT] with a ball[INSTRUMENT].
- Ex:
    I ate spaghetti with meatballs[INGREDIENT_OF_SPAGHETTI]
    I ate spaghetti with salad[SIDE_DISH_OF_SPAGHETTI]
    I ate spaghetti with a fork[INSTRUMENT]
    I ate spaghetti with a friend[ACCOMPANIER_OF_EATING]
- Important for machine translation…
    I[AGENT: PERSON_LACKING_SOMEONE] miss you[PATIENT: PERSON_MISSED]
    ? Je[PATIENT: PERSON_MISSED] te[AGENT: PERSON_LACKING_SOMEONE] manque.
    Tu[PATIENT: PERSON_MISSED] me[AGENT: PERSON_LACKING_SOMEONE] manques.

Levels of study of NLP
  Lexical; Phonetics & phonology; Parts-of-speech & Morphology; Phrase Structure and Syntax; Semantics; Pragmatics; Discourse; World-Knowledge

Pragmatics
- goes beyond the study of the meaning of a sentence
- tries to explain what the speaker is really expressing
- understanding how people use language socially (ex. figures of speech, speech acts, discourse analysis, …)
    Ex: Could you spare some change?

Discourse Analysis
- In logic, order does not matter: A ∧ B ∧ C ≡ C ∧ B ∧ A
- Not in NL:
    John visited Paris. He bought Mary some expensive cologne. Then he flew home. He went to Kmart. He bought some underwear.
    John visited Paris. Then he flew home. He went to Kmart. He bought Mary some expensive cologne. He bought some underwear.
- NL text must be coherent
    ? Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.
Using world knowledge
- Using our general knowledge of the world to interpret a sentence/discourse
    Ex: A man was killed yesterday because a jealous husband returned home earlier than usual.
    Ex: Silence of the lambs…
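As a closing worked example, the simple context-free grammar and its stochastic version from the syntax slides can be run on a sentence to see attachment ambiguity appear. The sketch below is a small chart-style enumerator, not the course's parser, and it rests on several assumptions: the rule printed as "P --> IN NP" is read as "PP --> IN NP"; lexical rules get probability 1 because the slides give none; the test sentence uses "in the mountains" (words from the slide lexicon) rather than "with a spoon"; and the names `RULES`, `LEXICON` and `parse` are hypothetical.

```python
from functools import lru_cache

# Rule probabilities from the stochastic-grammar slide.
# Assumption: the slide's "P --> IN NP" is read as "PP --> IN NP".
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("AT", "NNS"), 0.4), (("AT", "NN"), 0.4), (("NP", "PP"), 0.2)],
    "VP": [(("VP", "PP"), 0.1), (("VBD",), 0.1), (("VBD", "NP"), 0.8)],
    "PP": [(("IN", "NP"), 1.0)],
}

# Lexicon from the simple-grammar slide.
LEXICON = {
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "AT": {"the"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}


def parse(sentence):
    """Return every (bracketing, probability) for the sentence, best first."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def spans(sym, i, j):
        """All analyses of words[i:j] as the symbol sym."""
        out = []
        # Lexical rule: a tag covers exactly one word (probability assumed 1).
        if j == i + 1 and words[i] in LEXICON.get(sym, ()):
            out.append((f"[{sym} {words[i]}]", 1.0))
        for rhs, p in RULES.get(sym, ()):
            if len(rhs) == 1:
                # Unary rule over the same span (here only VP --> VBD).
                out += [(f"[{sym} {t}]", p * q) for t, q in spans(rhs[0], i, j)]
            else:
                left, right = rhs
                # Binary rule: try every split point inside the span.
                for k in range(i + 1, j):
                    for lt, lq in spans(left, i, k):
                        for rt, rq in spans(right, k, j):
                            out.append((f"[{sym} {lt} {rt}]", p * lq * rq))
        return tuple(out)

    return sorted(spans("S", 0, len(words)), key=lambda tp: -tp[1])


if __name__ == "__main__":
    for s in ("The children ate the cake",
              "The children ate the cake in the mountains"):
        print(s)
        for tree, prob in parse(s):
            print(f"  {prob:.5f}  {tree}")
```

Under these assumptions the first sentence has a single parse, the one shown on the parse-tree slide, while the second has two: one attaching the PP to the noun phrase (via NP --> NP PP) and one attaching it to the verb phrase (via VP --> VP PP). Because NP --> NP PP carries probability 0.2 and VP --> VP PP only 0.1, this particular grammar prefers the noun attachment, which shows how rule probabilities give a parser a way to rank competing analyses.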