Advanced AI - Part II
Luc De Raedt
University of Freiburg
WS 2004/2005
Many slides taken from Helmut Schmid
Topic

Statistical Natural Language Processing applies Machine Learning / Statistics to Natural Language Processing.

• Learning: the ability to improve one's behaviour at a specific task over time; it involves the analysis of data (statistics).
Rationalism versus Empiricism

• Rationalist
  • Noam Chomsky: innate language structures
  • AI: hand-coding NLP
  • Dominant view 1960-1985
• Empiricist
  • The ability to learn is innate
  • AI: language is learned from corpora
  • Dominant 1920-1960, and becoming increasingly important
Rationalism versus Empiricism

• Noam Chomsky:
  "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
• Fred Jelinek (IBM, 1988):
  "Every time a linguist leaves the room the recognition rate goes up."
  (Alternative: "Every time I fire a linguist the recognizer improves.")
This course

• Empiricist approach
  • The focus will be on probabilistic models for learning natural language
  • No time to treat natural language in depth!
    (though this would be quite useful and interesting; it deserves a full course by itself)
Ambiguity
NLP and Statistics
Statistical Disambiguation
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative
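The three steps above can be sketched with a toy model. Everything below (the unigram model and all probability values) is invented for illustration; real systems estimate these probabilities from a corpus:

```python
# Statistical disambiguation sketch:
# 1. define a probability model, 2. score each alternative, 3. pick the best.
from math import prod

# Hypothetical unigram probabilities (would be estimated from a corpus).
unigram_p = {"time": 0.05, "flies": 0.01, "fruit": 0.008}

def sentence_prob(words):
    """Toy unigram model: sentence probability = product of word probabilities."""
    return prod(unigram_p.get(w, 1e-6) for w in words)

def disambiguate(alternatives):
    """Choose the most likely alternative (step 3)."""
    return max(alternatives, key=sentence_prob)
```

For example, `disambiguate([["time", "flies"], ["fruit", "flies"]])` prefers the hypothesis whose words are more probable under the model.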
NLP and Statistics

Statistical methods deal with uncertainty. They predict the future behaviour of a system based on the behaviour observed in the past.

⇒ Statistical methods require training data.

The data in Statistical NLP are the corpora.
Corpora

• Corpus: a text collection for linguistic purposes
• Tokens
  How many words are contained in Tom Sawyer? 71,370
• Types
  How many different words are contained in Tom Sawyer? 8,018
• Hapax legomena
  words appearing only once
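These three counts are straightforward to compute. A minimal sketch, using crude whitespace tokenization on a made-up example string:

```python
# Count tokens, types, and hapax legomena in a text.
from collections import Counter

def corpus_stats(text):
    tokens = text.split()          # crude whitespace tokenization
    counts = Counter(tokens)
    n_types = len(counts)          # number of distinct words
    hapax = [w for w, f in counts.items() if f == 1]  # words occurring once
    return len(tokens), n_types, hapax

n_tokens, n_types, hapax = corpus_stats("the cat saw the dog and the cat ran")
# 9 tokens, 6 types, 4 hapax legomena (saw, dog, and, ran)
```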
Word Counts

word   freq      word   freq
the    3332      in      906
and    2972      that    877
a      1775      he      877
to     1725      I       783
of     1440      his     772
was    1161      you     686
it     1027      Tom     679

• The most frequent words are function words.
Word Counts

How many words appear f times?

f        n_f
1        3993
2        1292
3         664
4         410
5         243
6         199
7         172
8         131
9          82
10         91
11-50     540
51-100     99
> 100     102
Zipf's Law

Zipf's Law: f ~ 1/r   (f * r = const)

word    f      r     f*r        word         f     r      f*r
the     3332   1     3332       turned       51    200    10200
and     2972   2     5944       you'll       30    300     9000
a       1775   3     5325       name         21    400     8400
he       877   10    8770       comes        16    500     8000
but      410   20    8200       group        13    600     7800
be       294   30    8820       lead         11    700     7700
there    222   40    8880       friends      10    800     8000
one      172   50    8600       begin         9    900     8100
about    158   60    9480       family        8    1000    8000
more     138   70    9660       brushed       4    2000    8000
never    124   80    9920       sins          2    3000    6000
Oh       116   90    10440      Could         2    4000    8000
two      104   100   10400      Applausive    1    8000    8000
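The rough constancy of f*r in the table can be checked directly. A sketch using a subset of the (frequency, rank) pairs above:

```python
# Check Zipf's law (f ~ 1/r, i.e. f*r roughly constant) on some of the
# (frequency, rank) pairs from the Tom Sawyer counts above.
pairs = [(3332, 1), (2972, 2), (1775, 3), (877, 10), (410, 20),
         (172, 50), (104, 100), (51, 200), (16, 500), (8, 1000), (1, 8000)]

products = [f * r for f, r in pairs]
# While r spans four orders of magnitude, f*r stays within roughly
# 3000-11000 -- the approximate constancy that Zipf's law predicts.
```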
Some probabilistic models

• N-grams
  • Predicting the next word
    • Artificial intelligence and machine ….
    • Statistical natural language ….
• Probabilistic grammars
  • Regular (Markov Models)
  • Context-free grammars
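Next-word prediction with n-grams can be sketched with a bigram model. The tiny training corpus below is invented for illustration; real models are trained on millions of words:

```python
# Bigram model: predict the next word from counts of adjacent word pairs.
from collections import Counter, defaultdict

corpus = ("statistical natural language processing "
          "statistical natural language models "
          "natural language processing").split()

# Count bigrams: how often does w2 follow w1?
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict_next(word):
    """Return the most frequent successor of `word` in the training data."""
    return bigrams[word].most_common(1)[0][0]
```

Here `predict_next("natural")` returns `"language"`, since "language" follows "natural" most often in the training data.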
Illustration

• Wall Street Journal Corpus
  • 3,000,000 words
  • The correct parse tree for each sentence is known
    • Constructed by hand
    • Can be used to derive stochastic context-free grammars (SCFGs)
  • SCFGs assign a probability to each parse tree
    • Compute the most probable parse tree
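The probability an SCFG assigns to a parse tree is the product of the probabilities of the rules used to build it. A minimal sketch with a hypothetical toy grammar (all rule probabilities invented; in the WSJ setting they would be estimated from the hand-built treebank):

```python
# SCFG sketch: tree probability = product of rule probabilities;
# disambiguation picks the most probable tree.
from math import prod

# Hypothetical rule probabilities (per left-hand side, they sum to 1).
rule_p = {
    "S -> NP VP": 1.0,
    "NP -> Det N": 0.6,
    "NP -> N": 0.4,
    "VP -> V NP": 0.7,
    "VP -> V": 0.3,
}

def tree_prob(rules):
    """Probability of a parse tree, given the list of rules it uses."""
    return prod(rule_p[r] for r in rules)

# Two candidate parses of the same sentence; pick the more probable one.
parse_a = ["S -> NP VP", "NP -> N", "VP -> V NP", "NP -> Det N"]
parse_b = ["S -> NP VP", "NP -> Det N", "VP -> V"]
best = max([parse_a, parse_b], key=tree_prob)
```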
Conclusions

• Overview of some probabilistic and machine learning methods for NLP
• Also very relevant to bioinformatics!
  • Analogy between parsing
    • a sentence
    • a biological string (DNA, protein, mRNA, …)