Advanced AI - Part II
Luc De Raedt
University of Freiburg
WS 2004/2005
Many slides taken from Helmut Schmid
Topic

Statistical Natural Language Processing applies Machine Learning / Statistics to Natural Language Processing.

• Learning: the ability to improve one's behaviour at a specific task over time; it involves the analysis of data (statistics).
Rationalism versus Empiricism

• Rationalist
  • Noam Chomsky: innate language structures
  • AI: hand-coding NLP
  • Dominant view 1960-1985
• Empiricist
  • The ability to learn is innate
  • AI: language is learned from corpora
  • Dominant 1920-1960, and becoming increasingly important
Rationalism versus Empiricism

• Noam Chomsky:
  "But it must be recognized that the notion of 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
• Fred Jelinek (IBM, 1988):
  "Every time a linguist leaves the room the recognition rate goes up."
  (Alternative: "Every time I fire a linguist the recognizer improves.")
This course

• Empiricist approach
  • The focus will be on probabilistic models for learning natural language
  • No time to treat natural language in depth!
    (though this would be quite useful and interesting; it deserves a full course by itself)
Ambiguity
NLP and Statistics
Statistical Disambiguation
• Define a probability model for the data
• Compute the probability of each alternative
• Choose the most likely alternative
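The three steps above can be sketched with a toy model. Everything below (the unigram model and all probability values) is invented for illustration; real systems estimate these probabilities from a corpus:

```python
# Statistical disambiguation sketch:
# 1. define a probability model, 2. score each alternative, 3. pick the best.
from math import prod

# Hypothetical unigram probabilities (would be estimated from a corpus).
unigram_p = {"time": 0.05, "flies": 0.01, "fruit": 0.008}

def sentence_prob(words):
    """Toy unigram model: sentence probability = product of word probabilities."""
    return prod(unigram_p.get(w, 1e-6) for w in words)

def disambiguate(alternatives):
    """Choose the most likely alternative (step 3)."""
    return max(alternatives, key=sentence_prob)
```

For example, `disambiguate([["time", "flies"], ["fruit", "flies"]])` prefers the hypothesis whose words are more probable under the model.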
NLP and Statistics

Statistical methods deal with uncertainty. They predict the future behaviour of a system based on the behaviour observed in the past.

⇒ Statistical methods require training data.

The data in Statistical NLP are the corpora.
Corpora

• Corpus: a text collection for linguistic purposes
• Tokens
  How many words are contained in Tom Sawyer? 71,370
• Types
  How many different words are contained in Tom Sawyer? 8,018
• Hapax legomena
  words appearing only once
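These three counts are straightforward to compute. A minimal sketch, using crude whitespace tokenization on a made-up example string:

```python
# Count tokens, types, and hapax legomena in a text.
from collections import Counter

def corpus_stats(text):
    tokens = text.split()          # crude whitespace tokenization
    counts = Counter(tokens)
    n_types = len(counts)          # number of distinct words
    hapax = [w for w, f in counts.items() if f == 1]  # words occurring once
    return len(tokens), n_types, hapax

n_tokens, n_types, hapax = corpus_stats("the cat saw the dog and the cat ran")
# 9 tokens, 6 types, 4 hapax legomena (saw, dog, and, ran)
```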
Word Counts

word   freq      word   freq
the    3332      in      906
and    2972      that    877
a      1775      he      877
to     1725      I       783
of     1440      his     772
was    1161      you     686
it     1027      Tom     679

• The most frequent words are function words.
Word Counts

How many words appear f times?

f        n_f
1        3993
2        1292
3         664
4         410
5         243
6         199
7         172
8         131
9          82
10         91
11-50     540
51-100     99
> 100     102
Zipf's Law

Zipf's Law: f ~ 1/r   (f * r = const)

word    f      r     f*r        word         f     r      f*r
the     3332   1     3332       turned       51    200    10200
and     2972   2     5944       you'll       30    300     9000
a       1775   3     5325       name         21    400     8400
he       877   10    8770       comes        16    500     8000
but      410   20    8200       group        13    600     7800
be       294   30    8820       lead         11    700     7700
there    222   40    8880       friends      10    800     8000
one      172   50    8600       begin         9    900     8100
about    158   60    9480       family        8    1000    8000
more     138   70    9660       brushed       4    2000    8000
never    124   80    9920       sins          2    3000    6000
Oh       116   90    10440      Could         2    4000    8000
two      104   100   10400      Applausive    1    8000    8000
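The rough constancy of f*r in the table can be checked directly. A sketch using a subset of the (frequency, rank) pairs above:

```python
# Check Zipf's law (f ~ 1/r, i.e. f*r roughly constant) on some of the
# (frequency, rank) pairs from the Tom Sawyer counts above.
pairs = [(3332, 1), (2972, 2), (1775, 3), (877, 10), (410, 20),
         (172, 50), (104, 100), (51, 200), (16, 500), (8, 1000), (1, 8000)]

products = [f * r for f, r in pairs]
# While r spans four orders of magnitude, f*r stays within roughly
# 3000-11000 -- the approximate constancy that Zipf's law predicts.
```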
Some probabilistic models

• N-grams
  • Predicting the next word
    • Artificial intelligence and machine ….
    • Statistical natural language ….
• Probabilistic grammars
  • Regular (Markov Models)
  • Context-free grammars
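Next-word prediction with n-grams can be sketched with a bigram model. The tiny training corpus below is invented for illustration; real models are trained on millions of words:

```python
# Bigram model: predict the next word from counts of adjacent word pairs.
from collections import Counter, defaultdict

corpus = ("statistical natural language processing "
          "statistical natural language models "
          "natural language processing").split()

# Count bigrams: how often does w2 follow w1?
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict_next(word):
    """Return the most frequent successor of `word` in the training data."""
    return bigrams[word].most_common(1)[0][0]
```

Here `predict_next("natural")` returns `"language"`, since "language" follows "natural" most often in the training data.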
Illustration

• Wall Street Journal Corpus
  • 3,000,000 words
  • The correct parse tree for each sentence is known
    • Constructed by hand
    • Can be used to derive stochastic context-free grammars (SCFGs)
  • SCFGs assign a probability to each parse tree
    • Compute the most probable parse tree
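The probability an SCFG assigns to a parse tree is the product of the probabilities of the rules used to build it. A minimal sketch with a hypothetical toy grammar (all rule probabilities invented; in the WSJ setting they would be estimated from the hand-built treebank):

```python
# SCFG sketch: tree probability = product of rule probabilities;
# disambiguation picks the most probable tree.
from math import prod

# Hypothetical rule probabilities (per left-hand side, they sum to 1).
rule_p = {
    "S -> NP VP": 1.0,
    "NP -> Det N": 0.6,
    "NP -> N": 0.4,
    "VP -> V NP": 0.7,
    "VP -> V": 0.3,
}

def tree_prob(rules):
    """Probability of a parse tree, given the list of rules it uses."""
    return prod(rule_p[r] for r in rules)

# Two candidate parses of the same sentence; pick the more probable one.
parse_a = ["S -> NP VP", "NP -> N", "VP -> V NP", "NP -> Det N"]
parse_b = ["S -> NP VP", "NP -> Det N", "VP -> V"]
best = max([parse_a, parse_b], key=tree_prob)
```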
Conclusions

• Overview of some probabilistic and machine learning methods for NLP
• Also very relevant to bioinformatics!
  • Analogy between parsing
    • a sentence
    • a biological string (DNA, protein, mRNA, …)