Statistical NLP
Course for Master in Computational Linguistics
2nd Year
2016-2017
Diana Trandabăț
Intro to probabilities
• Probability deals with prediction:
– Which word will follow in this ....?
– How can parses for a sentence be ordered?
– Which meaning is more likely?
– Which grammar is more linguistically plausible?
– Seeing the phrase “more lies ahead”, how likely is it that
“lies” is a noun?
– Seeing “Le chien est noir”, how likely is it that the correct
translation is “The dog is black”?
• Any rational decision can be described probabilistically.
Notations
• Experiment (or trial)
– repeatable process by which observations are made
– e.g. tossing 3 coins
• Observe a basic outcome from the sample space Ω (the set of all
possible basic outcomes)
• Examples of sample spaces:
• one coin toss, sample space Ω = { H, T };
• three coin tosses, Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH,
TTT}
• part-of-speech of a word, Ω = {N, V, Adj, etc…}
• next word in a Shakespeare play, |Ω| = size of the vocabulary
• number of words in your MSc thesis, Ω = {0, 1, 2, …}
Notation
• An event A is a set of basic outcomes, i.e., a subset of the
sample space Ω.
Example:
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
A = Ω is the certain event: P(Ω) = 1
A = ∅ is the impossible event: P(∅) = 0
For “not A” (the complement of A), we write Ā
Intro to probabilities
• “A coin is tossed 3 times. What is the likelihood of 2 heads?”
– Experiment: Toss a coin three times
– Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT,
TTH, TTT}
– Event: basic outcomes that have exactly 2 H’s
A = {THH, HTH, HHT}
– the likelihood of 2 heads is 3 out of 8 possible outcomes
P(A) = 3/8
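As a quick check of this result, here is a minimal Python sketch (not part of the original slides) that enumerates the sample space and counts the outcomes in A:

from itertools import product
from fractions import Fraction

# Sample space for three coin tosses: Ω = {H, T}³
omega = list(product("HT", repeat=3))

# Event A: basic outcomes with exactly two heads
A = [w for w in omega if w.count("H") == 2]

# Uniform distribution over Ω, so P(A) = |A| / |Ω|
print(Fraction(len(A), len(omega)))   # 3/8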
Probability distribution
• A probability distribution is an assignment of
probabilities to a set of outcomes.
– A uniform distribution assigns the same probability
to all outcomes (e.g. a fair coin).
– A Gaussian distribution assigns a bell-curve of
probabilities over outcomes.
– Many others exist.
– Uniform and Gaussian distributions are popular in SNLP.
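As a small illustration (a sketch of my own, not from the slides), drawing one sample from a uniform distribution over a finite set and one from a Gaussian:

import random

# Uniform distribution over two outcomes (a fair coin): each has probability 1/2
print(random.choice(["H", "T"]))

# Gaussian (normal) distribution over real values, mean 0 and standard deviation 1
print(random.gauss(0, 1))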
Joint probabilities
Probabilities as sets
[Venn diagram: events A and B shown as overlapping sets, with intersection A∩B]
P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(A|B) * P(B)
P(B|A) = P(B∩A) / P(A)
P(B∩A) = P(A∩B) = P(B|A) * P(A) = P(A|B) * P(B)
Multiplication rule
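A minimal sketch that checks the multiplication rule on the three-coin example; the concrete events A (exactly two heads) and B (first toss is heads) are my own choice for illustration:

from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))
A = {w for w in omega if w.count("H") == 2}   # exactly two heads
B = {w for w in omega if w[0] == "H"}         # first toss is heads

def P(event):
    # uniform distribution over the sample space
    return Fraction(len(event), len(omega))

P_A_given_B = P(A & B) / P(B)                 # P(A|B) = P(A∩B) / P(B)
P_B_given_A = P(A & B) / P(A)                 # P(B|A) = P(B∩A) / P(A)

# Multiplication rule: P(A∩B) = P(A|B) * P(B) = P(B|A) * P(A)
assert P(A & B) == P_A_given_B * P(B) == P_B_given_A * P(A)
print(P(A & B), P_A_given_B, P_B_given_A)     # 1/4 1/2 2/3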
Probabilities as sets
[Venn diagram: event A split into the part inside B (A∩B) and the part outside B (A∩B̄)]
P(A) = P(A∩B) + P(A∩B̄)
P(A) = P(A|B) * P(B) + P(A|B̄) * P(B̄)
Additivity rule
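And a quick check of the additivity rule on the same hypothetical events (again a sketch, not from the slides):

from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))
A = {w for w in omega if w.count("H") == 2}   # exactly two heads
B = {w for w in omega if w[0] == "H"}         # first toss is heads
not_B = set(omega) - B                        # complement of B

def P(event):
    return Fraction(len(event), len(omega))   # uniform over Ω

# Additivity rule: P(A) = P(A∩B) + P(A∩B̄) = P(A|B)*P(B) + P(A|B̄)*P(B̄)
assert P(A) == P(A & B) + P(A & not_B)
assert P(A) == (P(A & B) / P(B)) * P(B) + (P(A & not_B) / P(not_B)) * P(not_B)
print(P(A))                                   # 3/8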
Bayes’ Theorem
• Bayes’ Theorem lets us swap the order of
dependence between events
• We saw that P(A|B) = P(A,B) / P(B)
• Bayes’ Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
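A tiny sketch applying Bayes’ Theorem to the same hypothetical coin events, using the values of P(B|A), P(A) and P(B) computed above:

from fractions import Fraction

# Three-coin example: A = exactly two heads, B = first toss is heads
P_A = Fraction(3, 8)
P_B = Fraction(1, 2)
P_B_given_A = Fraction(2, 3)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B
print(P_A_given_B)   # 1/2, matching the direct computation of P(A∩B)/P(B)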
Independent events
• Two events are independent if:
P(A,B) = P(A) * P(B)
• Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6)
has probability 1/6.
• Consider the event X “the number on the die is divisible by 2”
and the event Y “the number is divisible by 3”.
• X = {2, 4, 6}, Y = {3, 6}
• p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
• p(Y) = p(3) + p(6) = 2/6 = 1/3
• p(X,Y) = p(6) = 1/6 = 1/2 * 1/3 = p(X) * p(Y)
• ==> X and Y are independent
Independent events
• Consider Z the event “the number on the die is divisible by 4”.
• Are X and Z independent?
p(Z) = p(4) = 1/6
p(X,Z) = p(4) = 1/6
p(X|Z) = p(X,Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> not independent
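Both independence claims can be verified by enumeration; a minimal sketch (not from the slides):

from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                    # fair die, uniform distribution
X = {w for w in omega if w % 2 == 0}          # divisible by 2
Y = {w for w in omega if w % 3 == 0}          # divisible by 3
Z = {w for w in omega if w % 4 == 0}          # divisible by 4

def P(event):
    return Fraction(len(event), len(omega))

# Independence test: P(A,B) = P(A) * P(B)
print(P(X & Y) == P(X) * P(Y))   # True  -> X and Y are independent
print(P(X & Z) == P(X) * P(Z))   # False -> X and Z are not independent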
Other useful relations:
p(x) = Σy∈Y p(x|y) * p(y)
or
p(x) = Σy∈Y p(x,y)
Chain rule:
p(x1,x2,…,xn) = p(x1) * p(x2|x1) * p(x3|x1,x2) * … * p(xn|x1,x2,…,xn-1)
The proof is straightforward, by successive reductions:
Let y stand for the joint occurrence of x1,x2,…,xn-1:
p(x1,x2,…,xn) = p(y,xn) = p(y) * p(xn|y) = p(x1,x2,…,xn-1) * p(xn|x1,x2,…,xn-1)
Similarly, let z stand for x1,x2,…,xn-2:
p(x1,x2,…,xn-1) = p(z,xn-1) = p(z) * p(xn-1|z) = p(x1,x2,…,xn-2) * p(xn-1|x1,x2,…,xn-2)
…
p(x1,x2,…,xn) = p(x1) * p(x2|x1) * p(x3|x1,x2) * … * p(xn|x1,x2,…,xn-1)
Here p(x1) is the prior, p(x2|x1) corresponds to a bigram, p(x3|x1,x2) to a trigram, and so on up to n-grams.
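To connect the chain rule to language modelling, here is a minimal sketch; the tiny corpus and the bigram (first-order Markov) truncation are my own assumptions, not part of the slides:

from collections import Counter

# A tiny hypothetical corpus
corpus = [["john", "went", "to", "the", "market"],
          ["john", "went", "to", "the", "park"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:]))
total = sum(unigrams.values())

def sentence_prob(sentence):
    # Chain rule truncated to bigram conditioning:
    # p(w1,...,wn) ≈ p(w1) * p(w2|w1) * ... * p(wn|wn-1)
    p = unigrams[sentence[0]] / total
    for prev, cur in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(sentence_prob(["john", "went", "to", "the", "market"]))   # 0.1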
Objections
• People don’t compute probabilities.
• Why would computers?
• Or do they?
• John went to …
– the market
– go
– red
– if
– number
Objections
• Statistics only counts words and co-occurrences.
• Two different concepts:
– statistical model and statistical method
• The first does not require the second.
• A person who uses intuition to reason is using a
statistical model without statistical methods.
• Objections refer mainly to the accuracy of statistical
models.
Reference
• Christopher D. Manning and Hinrich Schütze, Foundations of
Statistical Natural Language Processing, MIT Press, 1999.
Great!
P(See you next time) = …