Outline: Where have we been and where are we going?
• We're making consistent progress, or
  – Rising tide of data lifts all boats
  – No matter what happens, it's going to be great!
• We're running around in circles, or
  – Don't worry; be happy
• We're going off a cliff…

[Chart: % statistical papers at ACL meetings, 1985–2005; annotations from Fred Jelinek and Bob Moore]

Empiricism and rationalism, by decade:
• 1950s: Empiricism (Information Theory, Behaviorism)
• 1970s: Rationalism (AI, Cognitive Psychology)
• 1990s: Empiricism (Data Mining, Statistical NLP, Speech)
• 2010s: Rationalism (TBD)

Rising Tide of Data Lifts All Boats
• If you have a lot of data, then you don't need a lot of methodology
• 1985: "There is no data like more data"
  – Fighting words uttered by radical fringe elements (Mercer at Arden House)
• 1995: The Web changes everything
• All you need is data (magic sauce)
  – No linguistics
  – No artificial intelligence (representation)
  – No machine learning
  – No statistics
  – No error analysis
  – No data mining
  – No text mining
• "It never pays to think until you've run out of data" (Eric Brill)

Banko & Brill: Mitigating the Paucity-of-Data Problem (HLT 2001)
• More data is better data!
• No consistently best learner
• Moore's Law constant: data collection rates vs. improvement rates
• "Fire everybody and spend the money on data" (quoted out of context)
• The rising tide of data will lift all boats!

TREC Question Answering & Google
• Example query: What is the highest point on Earth?
• The rising tide of data will lift all boats!

Acquiring Lexical Resources from Data: Dictionaries, Ontologies, WordNets, Language Models, etc.
• http://labs1.google.com/sets (each list below is one set expanded from a few seed terms)
  – England, France, Germany, Italy, Ireland, Spain, Scotland, Belgium, Canada, Austria, Australia
  – Japan, China, India, Indonesia, Malaysia, Korea, Taiwan, Thailand, Singapore, Australia, Bangladesh
  – Cat, Dog, Horse, Fish, Bird, Rabbit, Cattle, Rat, Livestock, Mouse, Human
  – cat, more, ls, rm, mv, cd, cp, mkdir, man, tail, pwd

Applications
• What good is word sense disambiguation (WSD)?
  – Information Retrieval (IR)
    • Salton: tried hard to find ways to use NLP to help IR, but failed to find much (if anything)
    • Croft: WSD doesn't help because IR is already using those methods
    • Sanderson (next two slides)
  – Machine Translation (MT)
    • Original motivation for much of the work on WSD
    • But the IR arguments may apply just as well to MT
• What good is POS tagging? Parsing? NLP? Speech?

Commercial Applications of Natural Language Processing, CACM 1995
• $100M opportunity (worthy of government/industry's attention)
  1. Search (Lexis-Nexis)
  2. Word Processing (Microsoft)
• Don't worry; be happy
• Warning: premature commercialization is risky

Sanderson (SIGIR-94): Could WSD help IR? Not much.
http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf
• Answer: no
  – Introducing ambiguity by pseudo-words doesn't hurt (much)
• Short queries matter most, but are hardest for WSD
[Chart: F-measure vs. query length in words; legend: "5 Ian Andersons"]
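To make the pseudo-word methodology concrete, here is a toy sketch, not Sanderson's actual experimental setup: pairs of unrelated words are conflated into a single artificially ambiguous token, and retrieval is compared with and without the conflation. The documents, query, word pair, and the bag-of-words cosine scorer below are illustrative assumptions only.

# Toy illustration (not Sanderson's code) of pseudo-words: conflate unrelated
# word pairs so every occurrence of either word becomes an artificially
# ambiguous token, then compare retrieval scores with and without it.
from collections import Counter
import math

def conflate(tokens, pairs):
    """Replace each member of a pseudo-word pair with the conflated token."""
    mapping = {}
    for a, b in pairs:
        mapping[a] = mapping[b] = f"{a}/{b}"
    return [mapping.get(t, t) for t in tokens]

def cosine(q, d):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat".split(),
        "bank loans and interest rates".split()]
query = "cat on a mat".split()
pairs = [("cat", "bank")]  # deliberately unrelated pair -> artificial ambiguity

for label, transform in [("original", lambda x: x), ("pseudo-words", lambda x: conflate(x, pairs))]:
    q = Counter(transform(query))
    scores = [cosine(q, Counter(transform(d))) for d in docs]
    print(label, [round(s, 2) for s in scores])

With the pseudo-word in place, the second (unrelated) document also matches the query, which is exactly the kind of added ambiguity Sanderson measured.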
Sanderson (SIGIR-94): Soft WSD?
http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf
• Resolving ambiguity badly is worse than not resolving it at all
  – 75% accurate WSD degrades retrieval performance
  – 90% accurate WSD is the breakeven point
[Chart: F-measure vs. query length in words]

Some Promising Suggestions (generate lots of conference papers, but may not support the field)
• Two languages are better than one
  – For many classic hard NLP problems:
    • Word Sense Disambiguation (WSD)
    • PP-attachment
    • Conjunction
    • Predicate-argument relationships
    • Japanese and Chinese word breaking
  – Parallel corpora give plenty of annotated (labeled) testing and training data
  – Don't need unsupervised magic (data >> magic)
• Demonstrate that NLP is good for something
  – Statistical methods (IR & WSD) focus on bags of nouns, ignoring verbs, adjectives, predicates, intensifiers, etc.
  – Hypothesis: these are ignored because perceptrons can't model XOR
  – Task: classify "comments" into "good," "bad" and "neutral"
    • Lots of terms are associated with just one category
    • Some are associated with two, depending on the argument
      – Good & bad, but not neutral: Mickey Mouse, Rinky Dink
        • Bad: Mickey Mouse(us)
        • Good: Mickey Mouse(them)
  – Current IR/WSD methods don't capture predicate-argument relationships

Web Apps: Document Language Model ≠ Query Language Model
• Documents
  – Function words
  – Adjectives
  – Verbs
  – Predicates
• Queries
  – Typos
  – Brand names
  – Celebrities
  – Named entities
  – Slower vocabulary growth
• Technical opportunity: reduce IR to translation
• Promising apps: web spam, the frame problem

Speech Data Mining & Call Centers: An Intelligence Bonanza
• Some companies are collecting information with technology designed to monitor incoming calls for service quality.
• Last summer, Continental Airlines Inc. installed software from Witness Systems Inc. to monitor the 5,200 agents in its four reservation centers.
• But the Houston airline quickly realized that the system, which records customer phone calls and information on the responding agent's computer screen, also was an intelligence bonanza, says André Harris, reservations training and quality-assurance director.

Speech Data Mining
• Label calls as success or failure based on some subsequent outcome (sale / no sale)
• Extract features from speech
• Find patterns of features that can be used to predict outcomes
• Hypotheses:
  – Customer: "I'm not interested" → no sale
  – Agent: "I just want to tell you…" → no sale
• Inter-ocular effect (hits you between the eyes); you don't need a statistician to know which way the wind is blowing
(A small sketch of this label / extract / predict recipe appears a few slides below.)

Outline
• We're making consistent progress, or
• We're running around in circles, or
  – Don't worry; be happy
• We're going off a cliff…

According to unnamed sources: Speech Winter, Language Winter, Dot Boom, Dot Bust

Sample of 20 Survey Questions (strong emphasis on applications)
• When will each of the following be true?
  – More than 50% of new PCs have dictation on them, either at purchase or shortly after.
  – Most telephone Interactive Voice Response (IVR) systems accept speech input.
  – Automatic airline reservation by voice over the telephone is the norm.
  – TV closed-captioning (subtitling) is automatic and pervasive.
  – Telephones are answered by an intelligent answering machine that converses with the calling party to determine the nature and priority of the call.
  – Public proceedings (e.g., courts, public inquiries, parliament, etc.) are transcribed automatically.
• Two surveys of ASRU attendees: 1997 & 2003

[Chart: hockey-stick business case, projected $ vs. time t; 2003 = last year, 2004 = this year, 2005 = next year]
• 2003 responses ≈ 1997 responses + 6 years (six years of hard work, no progress)
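Returning to the speech data-mining recipe above (label calls by outcome, extract features, find predictive patterns), here is a minimal sketch under toy assumptions. The call transcripts, the outcome labels, and the bigram log-odds scoring are invented for illustration; a real system would start from ASR transcripts over many thousands of calls.

# Minimal sketch of the label / extract / predict recipe (toy data, simple
# association score): rank bigram features by how strongly they predict "no sale".
from collections import Counter
import math

# Hypothetical labeled calls: (transcript, outcome)
calls = [
    ("i'm not interested thank you", "no_sale"),
    ("i just want to tell you about our new plan", "no_sale"),
    ("yes please sign me up today", "sale"),
    ("that sounds good let's do it", "sale"),
    ("i'm not interested goodbye", "no_sale"),
    ("great please sign me up", "sale"),
]

def bigrams(text):
    toks = text.split()
    return [" ".join(toks[i:i + 2]) for i in range(len(toks) - 1)]

# Count how many calls with each outcome contain each bigram.
counts = {"sale": Counter(), "no_sale": Counter()}
for text, outcome in calls:
    counts[outcome].update(set(bigrams(text)))

def log_odds(feature):
    """Smoothed log-odds that a feature signals 'no sale'."""
    ns = counts["no_sale"][feature] + 0.5
    s = counts["sale"][feature] + 0.5
    return math.log(ns / s)

features = set(counts["sale"]) | set(counts["no_sale"])
for f in sorted(features, key=log_odds, reverse=True)[:5]:
    print(f"{log_odds(f):+.2f}  {f}")

On this toy data, phrases like "not interested" float to the top of the no-sale list, which is the inter-ocular effect the slide describes.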
Wrong Apps?
• New priorities
  – Increase demand for space >> data entry
• New killer apps
  – Search >> dictation
  – Speech Google!
  – Data mining
• Old priorities
  – The dictation app dates back to the days of dictation machines
  – Speech recognition has not displaced typing
    • Speech recognition has improved
    • But typing skills have improved even more: my son will learn typing in 1st grade
  – Secretaries rarely take dictation
  – Dictation machines are history
    • My son may never see one
    • Museums have slide rules and steam trains, but dictation machines?

Borrowed slide from Jelinek (LREC): great strategy → success

Great Challenge: Annotating Data
• Produce annotated data with minimal supervision
  – Self-organizing "magic" ≠ error analysis
• Active learning
  – Identify reliable labels
  – Identify the best candidates for annotation
• Co-training
• Bootstrap (project) resources from one application to another
(A small sketch of an active-learning loop appears at the end of this deck.)

Grand Challenges
ftp://ftp.cordis.lu/pub/ist/docs/istag040319-draftnotesofthemeeting.pdf

Roadmaps: Structure of a Strategy (not the union of what we are all doing)
• Goals
  – Example: replace the keyboard with a microphone
  – Exciting (memorable) sound bite
  – Broad grand challenge that we can work toward but never solve
• Metrics
  – Examples: WER (word error rate), time to perform a task
  – Easy to measure
• Milestones
  – Mostly for next year (Q1–Q4), plus some for years 2, 5, 10 & 20
  – There should be no question whether a milestone has been accomplished
  – Example: reduce WER on task x by y% by time t (a sketch of the WER computation appears at the end of this deck)
• Accomplishments vs. activities
  – Accomplishments are good; activity is not a substitute for accomplishments
  – Milestones look forward, whereas accomplishments look backward
  – Serendipity is good!
• Size of the container
  – Goals: 1–3; metrics: 3; milestones: a dozen; accomplishments: a dozen
  – Quantity is not a good thing; small is beautiful
  – Awareness: a 1-slide version; if successful, you get maybe 3 more slides
  – Broad applicability & illustrative
    • Don't cover everything
    • Highlight stuff that applies to multiple groups and is forward-looking / exciting

Goals:
1. The multilingual companion
2. Life log

Grand Challenges
• Goal: produce NLP apps that improve the way people communicate with one another
• Goal: reduce barriers to entry
[Diagram: €€€ flowing among Apps & Techniques, Resources, Evaluation]

Summary: What Worked and What Didn't? What's the right answer?
• Data
  – Stay on message: it's the data, stupid!
  – WVLC (Very Large Corpora) >> EMNLP (Empirical Methods)
  – If you have a lot of data, then you don't need a lot of methodology
  – Rising tide of data lifts all boats
• Methodology
  – Empiricism means different things to different people:
    1. Machine learning (self-organizing methods)
    2. Exploratory data analysis (EDA)
    3. Corpus-based lexicography
  – Lots of papers on 1
  – The EMNLP-2004 theme (error analysis) ≈ 2
  – Senseval grew out of 3
  – There'll be a quiz at the end of the decade…

Substance: Recommended if…
Magic: Recommended if…
Promise: Recommended if…
Short term ≠ Long term

Lonely
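As promised under "Great Challenge: Annotating Data", here is a minimal sketch of pool-based active learning with uncertainty sampling, one way to produce annotated data with less supervision. The one-dimensional data, the nearest-centroid scorer, the oracle function, and the three-query budget are illustrative assumptions, not anything from the deck.

# Toy active-learning loop: repeatedly ask a (simulated) annotator to label the
# pool item the current model is least certain about.
def centroid_score(labeled, x):
    """Signed distance to the midpoint of the two class centroids (1-D toy model)."""
    pos = [v for v, y in labeled if y == 1]
    neg = [v for v, y in labeled if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return x - mid  # near zero = uncertain, far from zero = confident

def oracle(x):
    """Stands in for a human annotator (hypothetical true boundary at 5)."""
    return 1 if x >= 5 else 0

pool = [0.5, 1.2, 2.8, 4.1, 4.9, 5.2, 6.3, 7.7, 9.0]
labeled = [(0.5, 0), (9.0, 1)]  # tiny seed set
pool = [x for x in pool if x not in dict(labeled)]

for _ in range(3):  # annotation budget of 3 queries
    x = min(pool, key=lambda v: abs(centroid_score(labeled, v)))  # most uncertain
    labeled.append((x, oracle(x)))
    pool.remove(x)
    print(f"queried x={x}, label={oracle(x)}")

pos = [v for v, y in labeled if y == 1]
neg = [v for v, y in labeled if y == 0]
print("decision boundary ≈", round((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2, 2))

The queries cluster near the decision boundary, which is the point of uncertainty sampling: spend scarce annotation effort where the model learns the most.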
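And for the milestone example "reduce WER on task x by y% by time t", a small sketch of the metric itself: word error rate computed by Levenshtein alignment of a reference and a hypothesis word sequence (errors divided by reference length). The sample sentences are made up.

# Word error rate via dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            dp[i][j] = min(sub,
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words ≈ 0.17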