Natural Language Processing
Artificial Intelligence
CMSC 25000
February 28, 2002

Agenda
• Why NLP?
  – Goals & Applications
• Challenges: Knowledge & Ambiguity
  – Key types of knowledge
    • Morphology, Syntax, Semantics, Pragmatics, Discourse
  – Handling ambiguity
    • Syntactic ambiguity: Probabilistic parsing
    • Semantic ambiguity: Word sense disambiguation
• Conclusions

Why Language?
• Natural language in Artificial Intelligence
  – Language use as a distinctive feature of human intelligence
  – Infinite utterances:
    • Diverse languages with fundamental similarities
    • "Computational linguistics"
  – Communicative acts
    • Inform, request, ...

Why Language? Applications
• Machine translation
• Question answering
  – Database queries to web search
• Spoken language systems
• Intelligent tutoring

Knowledge of Language
• What does it mean to know a language?
  – Know the words (lexicon)
    • Pronunciation, formation, conjugation
  – Know how the words form sentences
    • Sentence structure, compositional meaning
  – Know how to interpret the sentence
    • Statement, question, ...
  – Know how to group sentences
    • Narrative coherence, dialogue

Word-level Knowledge
• Lexicon:
  – List of legal words in a language
  – Part of speech:
    • noun, verb, adjective, determiner
• Example:
  – Noun -> cat | dog | mouse | ball | rock
  – Verb -> chase | bite | fetch | bat
  – Adjective -> black | brown | furry | striped | heavy
  – Determiner -> the | that | a | an

Word-level Knowledge: Issues
• Issue 1: Lexicon size
  – Potentially HUGE!
  – Controlling factor: morphology
    • Store base forms (roots/stems)
    • Use morphological processes to generate/analyze
    • E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ...
• Issue 2: Lexical ambiguity
  – rock: N/V; dog: N/V
  – "Time flies like a banana"

Sentence-level Knowledge: Syntax
• Language models
  – More than just words: "banana a flies time like"
  – Formal vs natural: grammar defines the language
• Chomsky hierarchy:
  – Recursively enumerable: any rule
  – Context sensitive: AB -> BA; a^n b^n c^n
  – Context free: A -> aBc; a^n b^n
  – Regular expression: S -> aS; a*b*

Syntactic Analysis: Grammars
• Natural vs formal languages
  – Natural languages have degrees of acceptability
    • "It ain't hard"; "You gave what to whom?"
• Grammar combines words into phrases
  – S -> NP VP
  – NP -> {Det} {Adj} N
  – VP -> V | V NP | V NP PP

Syntactic Analysis: Parsing
• Recover phrase structure from a sentence, based on the grammar
  – E.g. "The black cat chased the furry mouse":
    S -> NP VP; NP -> Det Adj N (the black cat); VP -> V NP (chased); NP -> Det Adj N (the furry mouse)

Syntactic Analysis: Parsing
• Issue 1: Complexity
  – Solution 1: Chart parser (dynamic programming), O(Gn^2)
• Issue 2: Structural ambiguity
  – "I saw the man on the hill with the telescope"
    • Is the telescope on the hill?
  – Solution 2 (partial): Probabilistic parsing

Semantic Analysis
• Grammatical ≠ meaningful
  – "Colorless green ideas sleep furiously"
• Compositional semantics
  – Meaning of a sentence is built from the meanings of its subparts
  – Associate a semantic interpretation with each syntactic rule
  – E.g. for "The black cat chased the furry mouse":
    • Nouns are variables (themselves): cat, mouse
    • Adjectives are unary predicates: Black(cat), Furry(mouse)
    • Verbs are multi-place predicates: VP: λx. chased(x, Furry(mouse))
    • Sentence: (λx. chased(x, Furry(mouse)))(Black(cat))
      – chased(Black(cat), Furry(mouse))

Semantic Ambiguity
• Examples:
  – I went to the bank ...
    • of the river
    • to deposit some money
  – He banked ...
    • at First Union
    • the plane
• Interpretation depends on
  – Sentence (or larger) topic context
  – Syntactic structure

Pragmatics & Discourse
• Interpretation in context
  – Act accomplished by an utterance
    • "Do you have the time?", "Can you pass the salt?"
    • Requests with non-literal meaning
  – Also includes politeness, performatives, etc.
• Interpretation of multiple utterances
  – "The cat chased the mouse. It got away."
  – Resolve referring expressions

Natural Language Understanding
• Pipeline: Input → Tokenization/Morphology → Parsing → Semantic Analysis → Pragmatics/Discourse → Meaning
• Key issues:
  – Knowledge
    • How to acquire this knowledge of language?
      – Hand-coded? Automatically acquired?
  – Ambiguity
    • How to determine the appropriate interpretation?
      – Pervasive, preference-based

Handling Syntactic Ambiguity
• Natural language syntax
  – Varied, has DEGREES of acceptability
  – Ambiguous
• Probability: a framework for preferences
  – Augment the original context-free rules: PCFG
  – Add probabilities to the rules:
    0.20 NP -> N            0.45 VP -> V
    0.65 NP -> Det N        0.45 VP -> V NP
    0.10 NP -> Det Adj N    0.10 VP -> V NP PP
    0.05 NP -> NP PP        1.00 PP -> P NP
    0.85 S -> NP VP         0.15 S -> S conj S

PCFGs
• Learning probabilities
  – Strategy 1: Write a (manual) CFG,
    • use a treebank (collection of parse trees) to find the probabilities
  – Strategy 2: Use a larger treebank (+ linguistic constraints)
    • learn rules & probabilities (inside-outside algorithm)
• Parsing with PCFGs
  – Rank parse trees by probability
  – Provides graceful degradation
    • Can get some parse even for unusual constructions (low probability)

Parse Ambiguity
• Two parse trees for "I saw the man with the telescope":
  – T1: the PP attaches to the verb phrase (VP -> V NP PP)
  – T2: the PP attaches to the object noun phrase (NP -> NP PP)

Parse Probabilities
• P(T, S) = Π_{n ∈ T} p(r(n))
  – T(ree), S(entence), n(ode), r(ule)
  – T1 = 0.85 * 0.2 * 0.1 * 0.65 * 1 * 0.65 ≈ 0.007
  – T2 = 0.85 * 0.2 * 0.45 * 0.05 * 0.65 * 1 * 0.65 ≈ 0.0016
• Select T1
• Best systems achieve 92-93% accuracy
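To make the parse-probability calculation above concrete, here is a minimal Python sketch (not from the lecture) that scores the two parse trees under the toy PCFG. The bracketed-tuple tree encoding and the function name tree_prob are illustrative assumptions; lexical (part-of-speech) rules contribute probability 1, matching the products on the slide.

```python
# Sketch: scoring the two parses of "I saw the man with the telescope"
# under the toy PCFG above.  Tree encoding: (label, [children]) for
# internal nodes, (POS, word) for leaves.

RULE_PROB = {
    ("S",  ("NP", "VP")):        0.85,
    ("S",  ("S", "conj", "S")):  0.15,
    ("NP", ("N",)):              0.20,
    ("NP", ("Det", "N")):        0.65,
    ("NP", ("Det", "Adj", "N")): 0.10,
    ("NP", ("NP", "PP")):        0.05,
    ("VP", ("V",)):              0.45,
    ("VP", ("V", "NP")):         0.45,
    ("VP", ("V", "NP", "PP")):   0.10,
    ("PP", ("P", "NP")):         1.00,
}

def tree_prob(tree):
    """P(T, S) = product of p(r(n)) over the internal nodes n of T."""
    label, children = tree
    if not isinstance(children, list):          # leaf: a word, no rule applied
        return 1.0
    rhs = tuple(child[0] for child in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_prob(child)
    return prob

# T1: the PP attaches to the verb phrase (VP -> V NP PP)
t1 = ("S", [("NP", [("N", "I")]),
            ("VP", [("V", "saw"),
                    ("NP", [("Det", "the"), ("N", "man")]),
                    ("PP", [("P", "with"),
                            ("NP", [("Det", "the"), ("N", "telescope")])])])])

# T2: the PP attaches to the object noun phrase (NP -> NP PP)
t2 = ("S", [("NP", [("N", "I")]),
            ("VP", [("V", "saw"),
                    ("NP", [("NP", [("Det", "the"), ("N", "man")]),
                            ("PP", [("P", "with"),
                                    ("NP", [("Det", "the"), ("N", "telescope")])])])])])

print(tree_prob(t1))   # ~0.007  -> T1 is preferred
print(tree_prob(t2))   # ~0.0016
```

A full probabilistic parser would compute such scores for all candidate trees (e.g. with a probabilistic chart parser) and return the highest-scoring one.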
Semantic Ambiguity
• "Plant" ambiguity
  – Botanical vs manufacturing senses
• Two types of context
  – Local: 1-2 words away
  – Global: a window of several sentences
• Two observations (Yarowsky 1995)
  – One sense per collocation (local)
  – One sense per discourse (global)

Learn Disambiguators
• Initialize a small set of "seed" cases
• Collect local context information ("collocations")
  – E.g. within 2 words of "production", within 1 word of "seed"
• Contexts = rules
• Make a decision list = rules ranked by mutual information
• Iterate: label via the decision list, collect new contexts
  – Label all entries in a discourse with the majority sense
  – Repeat
  – (A simplified code sketch of this bootstrapping loop follows the concluding slide below.)

Disambiguate
• For each new unlabeled case,
  – use the decision list to label it
• > 95% accurate on a set of highly ambiguous words
• Also used for accent restoration in e-mail

Natural Language Processing
• Goals: Understand and imitate a distinctive human capacity
• Myriad applications: MT, Q&A, SLS
• Key issues:
  – Capturing knowledge of language
    • Automatic acquisition is the current focus: linguistics + ML
  – Resolving ambiguity, managing preferences
    • Apply (probabilistic) knowledge
    • Effective in constrained environments
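The bootstrapping loop from the "Learn Disambiguators" slide, sketched very roughly in Python. Everything here is an illustrative assumption rather than the lecture's exact algorithm: the tiny corpus, the seed collocations, the ±3-word context window, and the smoothed log-ratio scoring (used in place of the slide's mutual-information ranking); the "one sense per discourse" step is omitted.

```python
# Heavily simplified Yarowsky-style bootstrapping for the "plant" example.
import math
from collections import defaultdict

def features(tokens, i, window=3):
    """Collocational features: words within +/- `window` positions of the target."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {w for j, w in enumerate(tokens[lo:hi], lo) if j != i}

def build_decision_list(labeled):
    """Rank (feature -> sense) rules by a smoothed log ratio (stand-in for mutual info)."""
    counts = defaultdict(lambda: defaultdict(float))
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, by_sense in counts.items():
        bot = by_sense.get("botanical", 0.0) + 0.1   # add-0.1 smoothing
        fac = by_sense.get("factory", 0.0) + 0.1
        sense = "botanical" if bot > fac else "factory"
        rules.append((abs(math.log(bot / fac)), f, sense))
    return sorted(rules, reverse=True)               # strongest rule first

def classify(feats, decision_list):
    for score, f, sense in decision_list:            # first matching rule wins
        if f in feats:
            return sense
    return None

# Tiny toy corpus: each sentence contains one occurrence of "plant".
corpus = [
    "water the plant seed so leaves grow".split(),
    "the plant workers assemble cars on the line".split(),
    "green leaves covered the plant in spring".split(),
    "the plant will assemble cars next year".split(),
]
seeds = {"seed": "botanical", "workers": "factory"}  # seed collocations

labels = [None] * len(corpus)
for _ in range(3):                                   # a few bootstrapping passes
    labeled = []
    for i, tokens in enumerate(corpus):
        feats = features(tokens, tokens.index("plant"))
        hit = next((seeds[w] for w in feats if w in seeds), None)
        if hit:                                      # seed collocations stay fixed
            labels[i] = hit
        if labels[i]:
            labeled.append((feats, labels[i]))
    dl = build_decision_list(labeled)
    labels = [classify(features(t, t.index("plant")), dl) or labels[i]
              for i, t in enumerate(corpus)]

print(labels)   # expected: ['botanical', 'factory', 'botanical', 'factory']
```

Yarowsky's full procedure only adds high-confidence decisions to the labeled set on each pass and then applies discourse-level majority labeling; this sketch collapses those steps for brevity.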