Learning Text Mining
Walter Daelemans, [email protected], http://cnts.uia.ac.be
CNTS, University of Antwerp, Belgium / ILK, Tilburg University, Netherlands
Language Technology, BERGEN 2002

Outline
• Using a Language Bank for Text Mining
• Shallow Parsing for Text Mining
  – Example: Question Answering
• Memory-Based Learning
  – Memory-Based Shallow Parsing
    • Tagging
    • Chunking
    • Relation finding
  – Memory-Based Information Extraction

Text Mining
• Automatic extraction of reusable information (knowledge) from text, based on linguistic features of the text
• Goals:
  – Data mining (KDD) from unstructured and semi-structured data
  – (Corporate) Knowledge Management
  – "Intelligence"
• Examples:
  – Email routing and filtering
  – Finding protein interactions in biomedical text
  – Matching resumes and vacancies

(Diagram: Text Mining tasks map a Document or Set of Documents, plus existing data, to Structured Information. Tasks shown: Author Recognition, Document Dating, Language Identification, Text Categorization, Information Extraction, Summarization, Question Answering, Topic Detection and Tracking, Document Clustering, Terminology Extraction, Ontology Extraction, Knowledge Discovery)

LE Components
(Diagram: from Text to Meaning via Lexical / Morphological Analysis, Tagging, Chunking, Syntactic Analysis, Grammatical Relation Finding, Word Sense Disambiguation, Semantic Analysis, Reference Resolution, Discourse Analysis. Associated applications: OCR, Spelling Error Correction, Grammar Checking, Information Retrieval, Document Classification, Information Extraction, Named Entity Recognition, Summarization, Question Answering, Ontology Extraction and Refinement, Dialogue Systems, Machine Translation)

LE Components (2)
(Same diagram, with Tagging, Chunking, Syntactic Analysis and Grammatical Relation Finding grouped as Shallow Parsing)

Example: Shallow Parsing for Question Answering
• Give an answer to a question (document retrieval: find documents relevant to a query)
• Who invented the telephone?
  – Alexander Graham Bell
• When was the telephone invented?
  – 1876
(Buchholz & Daelemans, 2001)

QA System: Shapaqa
• Parse the question: When was the telephone invented?
  – Which slots are given?
    • Verb: invented
    • Object: telephone
  – Which slots are asked?
    • Temporal phrase linked to the verb
• Document retrieval on the internet with the given slot keywords
• Parsing of sentences containing all given slots
• Count the most frequent entry found in the asked slot (temporal phrase); see the sketch below
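The counting step of Shapaqa can be pictured as follows; a minimal sketch, assuming parsed sentences are available as slot-to-filler dictionaries (that representation and the function names are illustrative assumptions, not part of the original system):

```python
from collections import Counter

def answer(parsed_sentences, asked_slot="temporal"):
    """Pick the most frequent filler of the asked slot.

    parsed_sentences: list of dicts mapping slot names to fillers, e.g.
    {"verb": "invented", "object": "the telephone", "temporal": "1876"}.
    This mimics Shapaqa's 'count most frequent entry in the asked slot' step.
    """
    fillers = Counter(s[asked_slot] for s in parsed_sentences if asked_slot in s)
    return fillers.most_common(1)[0] if fillers else None

# Toy run on the three example sentences from the slides:
sentences = [
    {"verb": "invented", "object": "the telephone", "temporal": "1876"},
    {"verb": "invented", "object": "the telephone", "temporal": "1876"},
    {"verb": "invented", "object": "the telephone", "temporal": "1876"},
]
print(answer(sentences))  # ('1876', 3)
```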
Shapaqa: example
• When was the telephone invented?
• Google: invented AND "the telephone"
  – produces 835 pages
  – 53 parsed sentences with both slots and with a temporal phrase
• Matching sentences include:
  – … is through his interest in Deafness and fascination with acoustics that the telephone was invented in 1876, with the intent of helping Deaf and hard of hearing …
  – The telephone was invented by Alexander Graham Bell in 1876 …
  – When Alexander Graham Bell invented the telephone in 1876, he hoped that these same electrical signals could …

Shapaqa: example (2)
• So when was the phone invented?
• The internet answer is noisy, but robust:
  – 17: 1876
  – 3: 1874
  – 2: ago
  – 2: later
  – 1: Bell
  – …
• The system was developed quickly
• Precision 76% (Google 31%)
• International competition (TREC): MRR 0.45

Who shot Kennedy?
• 4 x OSWALD
  – www.anusha.com/jfk.htm: … situation in which Oswald shot Kennedy on November 22, 1963.
  – www.mcb.ucdavis.edu/people/hemang/spooky.html: Lee Harvey Oswald shot Kennedy from a warehouse and ran.
  – www.gallica.co.uk/monarch.htm: November 1963 U.S. President Kennedy was shot by Lee Harvey Oswald.
  – astrospeak.indiatimes.com/mystic_corner.htm: Lee Harvey Oswald shot Kennedy from a warehouse and fled.
• 2 x BISHOP
  – www.powells.com/biblio/0-200/000637901X.html: The day Kennedy was shot by Jim Bishop.
  – www.powells.com/biblio/49200-49400/0517431009.html: The day Kennedy was shot by by Jim Bishop.
• 1 x BULLET
  – www.lustlysex.com/index_m.htm: President John F. Kennedy was shot by a Republican bullet.
• 1 x MAN
  – www.ncas.org/condon/text/appndx-p.htm: KENNEDY ASSASSINATION … Kennedy was shot by a man who was not …

(Diagram: Documents → Text Analysis / Shallow Parsing → Analyzed Text (who, what, where, when, why, how, …) → Text Mining → Structured Data / Knowledge; a Language Bank provides the training material for the Classifiers that perform the analysis)

(Diagram: Information Sources and an Annotated corpus yield Examples; Machine Learning turns Examples into an Optimal Classifier; at run time, Input → Optimal Classifier → Postprocessing → Output)
• Machine Learning involves:
  – Feature selection and construction
  – Learning algorithm parameter optimization
  – Combination
  – Boosting
  – Cross-validation

(Diagram: learning methods plotted along two axes, Abstraction and Generalisation)
• + abstraction, + generalisation: Rule Induction, Connectionism, Inductive Logic Programming, Statistics, …
• + abstraction, - generalisation: Handcrafting (fill in your most hated linguist here)
• - abstraction, + generalisation: Memory-Based Learning
• - abstraction, - generalisation: Table Lookup

MBL: use memory traces of experiences as a basis for analogical reasoning, rather than using rules or other abstractions extracted from experience and replacing the experiences.

"This 'rule of nearest neighbor' has considerable elementary intuitive appeal and probably corresponds to practice in many situations. For example, it is possible that much medical diagnosis is influenced by the doctor's recollection of the subsequent history of an earlier patient whose symptoms resemble in some way those of the current patient." (Fix and Hodges, 1952, p. 43)

(Diagram: MBL example, choosing the Dutch diminutive suffix -etje vs. -kje from the coda and nucleus of the last syllable)

Memory-Based Learning
• Basis: the k nearest neighbor algorithm (a minimal sketch follows below):
  – store all examples in memory
  – to classify a new instance X, look up the k examples in memory with the smallest distance D(X,Y) to X
  – let each nearest neighbor vote with its class
  – classify instance X with the class that has the most votes in the nearest neighbor set
• Choices:
  – similarity metric
  – number of nearest neighbors (k)
  – voting weights
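For concreteness, here is a minimal sketch of the memory-based (k-NN) classifier just described, using a simple overlap distance over symbolic features and majority voting. Feature weighting and the other TiMBL design choices are left out, and all names and the toy data are illustrative, not taken from the original software:

```python
from collections import Counter

def overlap_distance(x, y):
    """Count mismatching symbolic feature values (simple overlap metric)."""
    return sum(1 for a, b in zip(x, y) if a != b)

def knn_classify(memory, x, k=1):
    """Classify instance x by majority vote over its k nearest neighbors.

    memory: list of (feature_tuple, class_label) pairs kept verbatim,
    i.e. no abstraction over the stored training examples.
    """
    neighbors = sorted(memory, key=lambda ex: overlap_distance(ex[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy example: tag disambiguation from a two-feature context window.
memory = [
    (("Det", "NN"), "NN"),
    (("Det", "Adj"), "Adj"),
    (("MD", "VB"), "VB"),
]
print(knn_classify(memory, ("Det", "NN"), k=1))  # NN
```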
The properties of NLP tasks …
• NLP tasks are mappings between linguistic representation levels that are
  – context-sensitive (but mostly local!)
  – complex (sub/ir/regularity), with pockets of exceptions
• Similar representations at one linguistic level correspond to similar representations at the other level
• Several information sources interact in (often) unpredictable ways at the same level
• Data is sparse

… fit the bias of MBL
• The mappings can be represented as (cascades of) classification tasks (disambiguation or segmentation)
• Locality is implemented through windowing over representations
• Inference is based on similarity-based / analogical reasoning
• Adaptive data fusion / relevance assignment is available through feature weighting
• It is a non-parametric approach
• Similarity-based smoothing is implicit
• Regularities and subregularities / exceptions can be modeled uniformly

Shallow Parsing Classifiers
• The woman will give Mary a book
• POS Tagging: The/Det woman/NN will/MD give/VB Mary/NNP a/Det book/NN
• Chunking: [The/Det woman/NN]NP [will/MD give/VB]VP [Mary/NNP]NP [a/Det book/NN]NP
• Relation Finding: [The woman]Subject [will give] [Mary]I-Object [a book]Object

Memory-Based POS Tagger
• Assigning morpho-syntactic categories (parts of speech) to words in context:
  – The/Det green/Adj|NN train/NNS|VBZ runs/NN|VB down/Prep|Adv|Adj that/SC|Pron track/NN|VB ./.
  – → The/Det green/Adj train/NNS runs/VB down/Prep that/Pron track/NN ./.
• Disambiguation: resolution of a combination of lexical and local contextual constraints
• Lexical representations: frequency-sensitive ambiguity class lexicon
• Convert sentences to MBL cases by 'windowing'; local constraints are modeled by features of neighboring words

Memory-Based POS Tagger (2)
• Case base for known words. Features: tag-2, tag-1, lex(focus), word(top100, focus), lex+1, lex+2 → POS tag
• Case base for unknown words. Features: tag-2, tag-1, pref, cap, hyp, num, suf1, suf2, suf3, lex+1, lex+2 → POS tag

Memory-Based POS Tagger (3)
• Experimental results (Zavrel & Daelemans, 1999):

  language       tagset size   train   test   accuracy
  English WSJ         44        2000    200     96.4
  English LOB        170         931    115     97.0
  Dutch               13         611    100     95.7
  Czech               42         495    100     93.6
  Spanish            484         711     89     97.8
  Swedish             23        1156     11     95.6

Memory-Based XP Chunker
• Assigning non-recursive phrase brackets (Base XPs) to phrases in context:
  – [NP The/Det/I-NP woman/NN/I-NP NP] [VP will/MD/I-VP give/VB/I-VP VP] [NP Mary/NNP/I-NP NP] [NP a/Det/B-NP book/NN/I-NP NP]
• Convert NP, VP, ADJP, ADVP, PrepP, and PP brackets to classification decisions (I/O/B tags) (Ramshaw & Marcus, 1995); a conversion sketch follows after the results below
• Features: POS-2, IOBtag-2, word-2, POS-1, IOBtag-1, word-1, POS(focus), word(focus), POS+1, word+1, POS+2, word+2 → IOB tag

Memory-Based XP Chunker (2)
• Results (WSJ corpus) (Buchholz, Veenstra & Daelemans, 1999):

  type       precision   recall    F1
  NP            92.5      92.2    92.3
  VP            91.9      91.7    91.8
  ADJP          68.4      65.0    66.7
  ADVP          78.0      77.9    77.9
  Prep          95.5      96.7    96.1
  PP            91.9      92.2    92.0
  ADVFunc       78.0      69.5    73.5

• Useful for: Information Retrieval, Information Extraction, Terminology Discovery, etc.
• See also the CoNLL-2000 Shared Task at: http://lcg-www.uia.ac.be/
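The bracket-to-I/O/B conversion mentioned above can be sketched as follows; a minimal illustration of the Ramshaw & Marcus (1995) IOB1 scheme used in the chunker example, where the (start, end, type) chunk representation and the function name are assumptions made for this sketch:

```python
def chunks_to_iob1(tokens, chunks):
    """Convert non-recursive chunk brackets to IOB1 tags (Ramshaw & Marcus, 1995).

    tokens: list of words; chunks: list of (start, end, type), end exclusive.
    In IOB1 a chunk-initial word is tagged B-<type> only when it directly
    follows a chunk of the same type; otherwise it is tagged I-<type>.
    """
    tags = ["O"] * len(tokens)
    prev_end, prev_type = None, None
    for start, end, ctype in sorted(chunks):
        first = "B-" + ctype if (prev_end == start and prev_type == ctype) else "I-" + ctype
        tags[start] = first
        for i in range(start + 1, end):
            tags[i] = "I-" + ctype
        prev_end, prev_type = end, ctype
    return tags

tokens = ["The", "woman", "will", "give", "Mary", "a", "book"]
chunks = [(0, 2, "NP"), (2, 4, "VP"), (4, 5, "NP"), (5, 7, "NP")]
print(chunks_to_iob1(tokens, chunks))
# ['I-NP', 'I-NP', 'I-VP', 'I-VP', 'I-NP', 'B-NP', 'I-NP']
```

The resulting tag sequence matches the chunked example sentence on the slides, with B-NP marking "a" because its NP directly follows another NP.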
Memory-Based GR labeling
• Assigning labeled Grammatical Relation links between words in a sentence:
  – The (Det, I-NP); woman (NN, I-NP, SUBJ-1); will (MD, I-VP, VP-1); give (VB, I-VP, VP-1); Mary (NNP, I-NP, OBJ-1); a (Det, B-NP); book (NN, I-NP); .
• GRs of the focus word with relation to verbs (subject, object, location, …, none)
• Features:
  – Focus: prep, adv-func, word+1, word0, word-1, word-2, POS+1, POS0, POS-1, POS-2, Chunk+1, Chunk0, Chunk-1, Chunk-2
  – Verb: POS, word
  – Distance: words, VPs, commas
  → GR type

Memory-Based GR labeling (2)
• Results (WSJ corpus) (Buchholz, Veenstra & Daelemans, 1999):

  features           precision   recall    F1
  words+POS only        60.7      41.3    49.1
  +NPs                  65.9      55.7    60.4
  +VPs                  72.1      62.9    67.2
  +ADJPs +ADVPs         72.1      63.0    67.3
  +Preps                72.5      64.3    68.2
  +PPs                  73.6      65.6    69.3
  +ADVFunc              74.8      67.9    71.2

• Completes the shallow parser. Useful for e.g. Question Answering, IE, etc.
• See the demo at: http://ilk.kub.nl/

From POS tagging to IE: a Classification-Based Approach
• POS tagging: The/Det woman/NN will/MD give/VB Mary/NNP a/Det book/NN
• NP chunking: The/I-NP woman/I-NP will/I-VP give/I-VP Mary/I-NP a/B-NP book/I-NP
• Relation Finding: [NP-SUBJ-1 the woman] [VP-1 will give] [NP-I-OBJ-1 Mary] [NP-OBJ-1 a book]
• Semantic Tagging = Information Extraction: [Giver the woman] [will give] [Givee Mary] [Given a book]
• Semantic Tagging = Question Answering: Who will give Mary a book? → [Giver ?] [will give] [Givee Mary] [Given a book]

(Diagram repeated: Documents → Text Analysis / Shallow Parsing → Analyzed Text (who, what, where, when, why, how, …) → Text Mining → Structured Data / Knowledge; the Classifiers are trained on a Language Bank)

IE: Experimental comparison with HMM
• Systems:
  – TK_SemTagger: Textkernel's Memory-Based shallow semantic tagger, using TiMBL v4.0
  – TnT (Brants, 2000): trigram HMM, smoothing by deleted interpolation; handles unknown words by successive abstraction over a suffix tree
• Classification-based Information Extraction (Zavrel & Daelemans, forthcoming)
• Data set: Seminar Announcements
• Score: F1 (= 2*p*r / (p+r)), exact match, all occurrences (a minimal scoring sketch appears after the last slide)

Data set
• Seminar Announcement data set (Freitag, 1998): 485 documents (1-243 for training, 243-283 for validation, 283-323 for testing)
• Extracted fields:
  – "speaker"
  – "location"
  – "start time"
  – "end time"

Experiments: TK_SemTagger
• Feature set (window of positions -4 … +4 around the focus word):

  position    features
  -4 … -1     tag, word{f>4}
   0          lextag, word{f>4}, suf1, suf2, pref1, cap, allcap, initial, hyp, num, at
  +1 … +4     lextag, word{f>4}

Experiments: comparison
• Comparison of the best TnT and TK_SemTagger models (per field):

  system              speaker   location   stime   etime
  TnT (best context)    0.66       0.70     0.85    0.90
  TK_SemTagger          0.71       0.87     0.95    0.96

Conclusions
• Text Mining tasks benefit from linguistic analysis (shallow parsing)
• Problems ranging from shallow parsing to application-oriented tasks (Information Extraction) can be formulated as classification-based learning tasks
• These classifiers can be trained on Language Banks
• Memory-Based Learning seems to have the right bias for this type of task (it can cope with rich feature sets and exceptions)

Text Mining at CNTS (http://cnts.uia.ac.be)
• Development of shallow parsers, lemmatizers, named-entity recognizers and other tools
  – ML basic research aspects, design, adaptation to specific TM applications
  – English, Dutch, …
• Information and Ontology Extraction
  – BIOMINT (?) (EU), ONTOBASIS (IWT)
• Summarization and automatic subtitling
  – MUSA (EU), ATRANOS (IWT)
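As a closing illustration of the scoring used in the IE comparison above, here is a minimal sketch of exact-match F1 over extracted field occurrences. The (field, value) pair representation and the toy values are assumptions for this sketch, not data from the original experiments:

```python
from collections import Counter

def exact_match_f1(predicted, gold):
    """Exact-match precision, recall and F1 over extracted field occurrences.

    predicted, gold: lists of (field, value) pairs, one per occurrence,
    e.g. ("speaker", "Dr. Smith"). Occurrences are matched one-to-one,
    and F1 = 2*p*r / (p+r) as on the slides.
    """
    pred, ref = Counter(predicted), Counter(gold)
    correct = sum((pred & ref).values())          # occurrences matched exactly
    p = correct / sum(pred.values()) if pred else 0.0
    r = correct / sum(ref.values()) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy example with hypothetical seminar-announcement fields:
pred = [("speaker", "Dr. Smith"), ("stime", "3:30 PM"), ("location", "WeH 4601")]
gold = [("speaker", "Dr. Smith"), ("stime", "3:30 PM"), ("etime", "5:00 PM")]
print(exact_match_f1(pred, gold))  # (0.666..., 0.666..., 0.666...)
```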