EE517 – Statistical Language Processing
Prof. Mari Ostendorf

Basic Linguistics (M & S Chpt 3)

Outline
• Part of speech and morphology (word structure)
• Syntax (phrase structure)
• Lexical semantics
• Discourse
Note: we’ll cover the first two linguistics topics in more detail because there’s more statistical modeling work in those areas.

Word Part of Speech

Part of speech (POS) = grammatical category (e.g. noun, verb, etc.). POS is usually listed with words in a dictionary.
POS tagging is an important tool in language processing. It is analogous to using Markov models (the simplest model beyond i.i.d.): it is the simplest thing that you can do beyond just using the words.

Major POS groups include:
• Nouns, pronouns (person, place, thing, animal, concept, etc.)
  – Nouns can be singular vs. plural (dog, dogs), proper noun (Joe), possessive (Joe’s)
  – Pronouns (stand-ins for nouns) can be: first, second or third person (I, you, he/she); nominative (he, she); accusative (me, him, her); possessive (my, mine); reflexive (herself)
• Determiners, adjectives (accompany nouns)
  – Determiners include: articles (a, the), demonstratives (this, that)
  – Adjectives describe properties of nouns: red, long, pretty, rich, richer, richest
• Verbs (describe actions, activities, states)
  – main verbs: He threw the stone. (action); I read. (activity); I have $50. (state)
  – verbs used with other verbs:
    ∗ auxiliary verbs: have, be
    ∗ modals: may, can, shall, will
  – verbs have many forms based on singular/plural, tense, infinitive, etc. (see text)
• Adverbs, prepositions, particles (accompany verbs and more)
  – adverbs modify verbs (often, quickly), and sometimes adjectives (very)
  – prepositions are small words that express relations, e.g. spatial (in, on, by, over, under), temporal (after, before), ...
  – particles are a subclass of prepositions that combine with the verb to form a single lexical item (though the two are not necessarily adjacent). Example:
      The plane took off. (particle)
      She took off her jacket. (particle)
      She took her jacket off. (particle)
      She took the book off the shelf. (preposition)
      (*) She took the book off.
• Conjunctions and complementizers (connect phrases)
  – Coordinating conjunctions connect words, phrases or sentences that are in some sense equals (and, but)
  – Complementizers (also called subordinating conjunctions) connect phrases where one is primary and the other secondary (that, because, if, although)
• Miscellaneous other categories:
  – Interjections: oh
  – Fillers: uh, um

You often can’t determine the category of a word in isolation: light the fire (V); turn on the light (N); light lunch (Adj)

ASIDE: The examples given here are very much English-centric. Other languages make different distinctions that are important.

Morphology

Morphology = the systematic relation between different forms of a “word” (or lemma).
Morphological analysis is an important tool for dealing with words you’ve never seen before, which is always a problem because in practice our dictionaries are finite. It is less critical for English, but is particularly useful for highly inflected languages (e.g. Finnish, Czech, Turkish).
Turkish example from Jurafsky & Martin:
    uygarlaştıramadıklarımızdanmışsınızcasına
    “(behaving) as if you are among those whom we could not cause to become civilized”

Typical computational problems that benefit from morphological analysis:
• Find the word root for information retrieval or to assess whether information content is new vs. old
• Pronunciation prediction (text-to-speech), spell checking
• Part-of-speech tagging and parsing
• Machine translation

Two classes of morphemes: stem (supplies the main meaning), affix (adds to the meaning)

Three types of morphological processes:
• inflection: does not change word class substantially but indicates minor grammatical distinctions, such as:
  – singular/plural: dog, dogs
  – tense: play, playing, played
  – gender: fils/fille (French), chica/chico (Spanish)
• derivation: usually results in a more radical change in syntactic category, sometimes changing meaning
  – wide ADJ → widely ADV
  – understand V → understandable ADJ
  – teach V → teacher N
  – compute V → computer N
• compounding: merging 2 words to get 1 lexical item
  – disk drive, mad cow disease
  – babysitter, suitcase, overtake

Phrase Structure

Syntax = study of regularities and constraints of word order and phrase structure.
There are multiple theories of syntax. We’ll take a limited view that gets across some key ideas and is useful for the computational models that we will look at.

A sequence of words is a constituent if you can replace it or move it around:
    I invited { my good friends | Janet | my neighbor and his charming daughter }

Main types of constituents:
• NP: noun phrase
• VP: verb phrase
• PP: prepositional phrase
• AP: adjective phrase
Examples:
    She is very sure of herself.
    The man caught the butterfly with a net.

Phrase structure can be represented with a tree, as in the figure, or with bracketing:
    [S [NP The man] [VP [VBD caught] [NP the butterfly] [PP [IN with] [NP a net]]]]
The leaf nodes of the tree are the words and are referred to as “terminal nodes” or “terminals”. The internal nodes are referred to as “nonterminals” and are associated with phrase labels.
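
As a rough illustration of how the bracketed form maps onto a tree, the following Python sketch reads a bracketing into nested (label, children) tuples and then lists the terminals and the word span under each phrase label. The function names (parse_brackets, terminals, phrases) are hypothetical helpers invented for this example, not part of any library, and the input string is the example sentence written with unspaced labels and balanced brackets. It assumes well-formed input.

```python
import re

def parse_brackets(text):
    """Parse a bracketed phrase-structure string into (label, children) tuples.

    Assumes well-formed input: every "[" is followed by a label and is closed.
    """
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        label = tokens[pos]          # nonterminal label, e.g. NP or VBD
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())   # nested phrase
            else:
                children.append(tokens[pos])    # a word (terminal)
                pos += 1
        pos += 1                     # consume the closing "]"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the closing bracket")
    return tree

def terminals(node):
    """Return the words (leaf nodes) under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in terminals(child)]

def phrases(node, label):
    """Return the word span of every internal node carrying the given label."""
    if isinstance(node, str):
        return []
    found = [" ".join(terminals(node))] if node[0] == label else []
    for child in node[1]:
        found.extend(phrases(child, label))
    return found

SENTENCE = ("[S [NP The man] [VP [VBD caught] [NP the butterfly] "
            "[PP [IN with] [NP a net]]]]")
tree = parse_brackets(SENTENCE)
print(terminals(tree))      # ['The', 'man', 'caught', 'the', 'butterfly', 'with', 'a', 'net']
print(phrases(tree, "NP"))  # ['The man', 'the butterfly', 'a net']
print(phrases(tree, "PP"))  # ['with a net']
```

Nested tuples are used here only to keep the sketch dependency-free; a toolkit tree class would serve the same purpose.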
Phrases illustrated in this example:
• NP: the man, the butterfly, a net
• VP: caught the butterfly with a net
• PP: with a net

Phrases typically have a “head”, which is usually the central constituent that determines the syntactic character of the phrase. For example, the head of a noun phrase is the main noun; the head of a verb phrase is the main verb.

You can describe either the bracket or the tree representation with rewrite rules:
    S → NP VP
    VP → VBD NP PP
    PP → IN NP
and of course more rules are needed to handle other types of sentences. Often the tree and the rules are converted to be binary, in which case you would need two VP rules: VP → VP PP; VP → VBD NP.

These rules are referred to as “context free” because they don’t use information about the specific words or the neighboring context. The options depend only on the particular nonterminal phrase type.
• Advantage of context-free grammars: unlike n-grams, they can capture non-local dependencies, e.g.
      The women who found the wallet were rewarded.
      (*) The women who found the wallet was rewarded.
• A disadvantage: lexical context often matters.
      “sew clothes” but not “sew wood blocks”
  A solution to this problem is to “lexicalize” the grammar, which is popular in current statistical parsers.

Aside: There are many other types of grammars used in statistical models, including dependency grammars and tree adjoining grammars, for example.

Arguments and adjuncts

Verbs have:
• arguments (centrally involved with the verb)
  – subject NP (almost always)
  – direct object NP (for transitive verbs, not intransitive)
  – indirect object NP (for ditransitive verbs)
  Example: She gave him a ball.
• adjuncts (optional, frequently used to tell time, place or manner)
  – He went yesterday.
  – He went to the store.
  – John gave her the book with a smile.

Empty Categories

Useful for dealing with “missing” arguments.
    [S″ [NP Which book] [S′ [MD should] [S [NP Peter] [VP [VB buy] [NP φ]]]]]

Semantics

Semantics: the meaning of words and combinations of words.
The focus here is on the meaning of words, or lexical semantics.
• Relations between words
  – more vs. less general: animal → { cat, (dog → husky), giraffe }
    animal is a hypernym of cat; cat is a hyponym of animal
  – part vs. whole: tree → { trunk, (branch → leaf), root }
    branch is a meronym of tree
  – antonyms (opposites): hot vs. cold
  – synonyms (similar): car vs. automobile
• Different meanings of the same orthographic word (word sense)
  – homonyms: (some closely related meanings, some not)
    ∗ bank: the financial institution vs. the edge of land next to a river vs. blood bank vs. airplane turn
    ∗ branch: of a tree vs. of an organization
  – other: (same spelling, different pronunciation)
    bass: the fish vs. the musical instrument vs. low pitched sound
• Arguments of verbs can have different semantic roles
  – agent: person or thing doing something
  – patient: person or thing having something done to it
  – instrument
  – goal
  Examples:
      John cut the grass with the lawnmower.
      The lawnmower cut the grass.

Discourse

Phenomena at the level of groups of phrases or sentences.

Elements of discourse (Grosz & Sidner):
• Segmentation: how phrases and sentences are grouped into topics and sub-topics
• Intention: purpose of an utterance or phrase, speech act (question, statement, greeting, backchannel (uh-huh))
• Attention: what’s in focus, global topic vs. local NP

Problems that have received attention in data-driven work:
• Topic segmentation or topic change detection, topic detection and tracking
• Text coherence, discourse planning
• Speech act labeling (e.g. does “ok” mean agree or question or moving on or . . . )
• Reference resolution and entity recognition
      Mary helped Peter get out of the cab. He thanked her.
      Mary helped the passenger out of the cab. The woman thanked her.
      Hurricane Hugo destroyed 20000 Florida homes. The disaster has been the most costly in . . .
