Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Probability and Statistics
Review
Thursday Mar 12
‫מודל למידה‬
‫‪‬‬
‫‪‬‬
‫‪‬‬
‫מה המשתנים החשובים?‬
‫מה הטווח שלהם?‬
‫מהן הקומבינציות החשובות?‬
‫חזרה מהירה על הסתברות‬
‫•‬
‫•‬
‫•‬
‫•‬
‫•‬
‫•‬
‫מאורע ‪ ,‬אוסף מאורעות‬
‫משתנה מקרי‬
‫הסתברות מותנית ‪ ,‬חוק ההסתברות השלמה ‪,‬‬
‫חוק השרשרת ‪,‬חוק בייס‬
‫תלות ואי תלות‬
‫שונות ‪,‬תוחלת ושונות משותפת‬
‫‪Moments‬‬
Sample space and Events
• W : Sample Space, result of an experiment
• If you toss a coin twice W = {HH,HT,TH,TT}
• Event: a subset of W
• First toss is head = {HH,HT}
• S: event space, a set of events:
• Closed under finite union and complements
• Entails other binary operation: union, diff, etc.
• Contains the empty event and W
Probability Measure
• Defined over (W,S) s.t.
• P(a) >= 0 for all a in S
• P(W) = 1
• If a, b are disjoint, then
• P(a U b) = p(a) + p(b)
• We can deduce other axioms from the above ones
• Ex: P(a U b) for non-disjoint event
Visualization
• We can go on and define conditional
probability, using the above visualization
‫הסתברות מותנית‬
-P(F|H) = Fraction of worlds in which H is true that
also have F true
p( F | H ) =
p( F  H )
p( H )
Rule of total probability
B5
B2
B3
B4
A
B7
B6
B1
p A) =  PBi )P A | Bi )
Bayes Rule
• We know that P(smart) = .7
• If we also know that the students grade is
A+, then how this affects our belief about
his intelligence?
P( x) P( y | x)
P x | y ) =
P( y )
• Where this comes from?
‫דוגמא‬
‫במפעל פעולות שתי מכונות ‪ A‬ו‪ 10% . B-‬מתוצרת המפעל מיוצרת במכונה ‪ A‬ו‪-‬‬
‫‪ 90%‬במכונה ‪ 1% .B‬מהמוצרים המיוצרים במכונה ‪ A‬ו ‪ 5%‬מהמוצרים‬
‫המיוצרים במכונה ‪ B‬הם פגומים‪.‬‬
‫‪ ‬נבחר מוצר אקראי‪ ,‬מה ההסתברות שהוא פגום?‬
‫‪ ‬נמצא מוצר שהוא פגום‪ ,‬מה ההסתברות שיוצר במכונה ‪?A‬‬
‫‪ ‬אחרי ביקור של טכנאי שמטפל במכונה ‪ , B‬מוצאים ש ‪ 1.9%‬ממוצרי המפעל‬
‫הם פגומים‪ .‬מה עכשיו ההסתברות שמוצר המיוצר במכונה ‪ B‬יהיה פגום?‬
‫פתרון‪:‬‬
‫נגדיר את המאורעות הבאים‪-A :‬המוצר הנבחר יוצר במכונה ‪-B , A‬המוצר‬
‫הנבחר יוצר במכונה ‪-C ,B‬המוצר שנבחר פגום‪.‬‬
‫א‪ .‬עפ"י נוסחת ההסתברות השלמה מתקיים‪P(C|A)*P(A)+P(C|B)*P(B) :‬‬
‫‪0.9*0.05 +0.1*0.01=0.046‬‬
‫ב‪ .‬עפ"י נוסחת בייס‪>= P(A|C)=P(C|A)*P(A)/P(C) :‬‬
‫‪P(A|C)=0.01*0.1/0.046=0.021‬‬
‫משתנה מקרי בדיד‬
‫• מ"מ הוא פונקציה ממרחב המאורעות הכללי (העולם) למרחב‬
‫המאפיינים‬
‫• בעצם ניתן לייצג התפלגות חדשה על פי המשתנה המקרי‬
• Modeling students (Grade and Intelligence):
• W = all possible students
• What are events
• Grade_A = all students with grade A
• Grade_B = all students with grade A
• Intelligence_High = … with high intelligence
Random Variables
W
I:Intelligence
High
low
A
G:Grade
B
A+
Random Variables
W
I:Intelligence
High
low
A
G:Grade
B
P(I = high) = P( {all students whose intelligence is high})
A+
‫הסתברות משותפת‬
• Joint probability distributions quantify this
• P( X= x, Y= y) = P(x, y)
• How probable is it to observe these two attributes together?
• How can we manipulate Joint probability distributions?
.1,2,3,4 ‫מרכיבים באקראי מס' דו סיפרתי מהספרות‬:‫דוגמא‬
.‫ מופיעה‬1 ‫ מס' הפעמים שהספרה‬Y ‫ מס' הספרות השונות המופיעות במס' ו‬X ‫יהי‬
?(X,Y) ‫מהי ההתפלגות המשותפת של הזוג‬
1
X
Y
0
3/16
1
2
6/16
0
1/16
2
6/16
0
‫חוק השרשרת‬
‫‪• Always true‬‬
‫)‪• P(x,y,z) = p(x) p(y|x) p(z|x, y‬‬
‫)‪= p(z) p(y|z) p(x|y, z‬‬
‫…=‬
‫כדי לסבך קצת את העניינים נוסיף לשאלה מקודם‬
‫את הנתון הבא ‪ z‬יהיה מס הפעמים שמופיע ספרה גדולה ממש מ ‪2‬‬
‫)‪P(x=2,y=1,z=1)=P(x=2)*P(y=1|x=2)*P(z=1|y=1,x=2‬‬
‫])‪=0.75*[(6/16)/P(x=2)]*[(4/16)/(6/16‬‬
Conditional Probability
events
P X = x Y = y) =
But we will always write it this way:
p( x, y)
P x | y ) =
p( y )
P X = x  Y = y)
P Y = y)
‫הסתברות השולית‬
• We know P(X,Y), what is P(X=x)?
• We can use the low of total probability, why?
p  x ) =  P  x, y )
y
=  P y )Px | y )
y
B4
B5
B2
B3
A
B7
B6
B1
Marginalization Cont.
• Another example
p  x ) =  P  x, y , z )
y,z
=  P y, z )Px | y, z )
z,y
Bayes Rule cont.
• You can condition on more variables
P( x | z ) P( y | x, z )
P x | y , z ) =
P( y | z )
‫אי תלות‬
• X is independent of Y means that knowing Y
does not change our belief about X.
• P(X|Y=y) = P(X)
• P(X=x, Y=y) = P(X=x) P(Y=y)
• Why this is true?
• The above should hold for all x, y
• It is symmetric and written as X  Y
CI: Conditional Independence
• X  Y | Z if once Z is observed, knowing the
value of Y does not change our belief about X
• The following should hold for all x,y,z
• P(X=x | Z=z, Y=y) = P(X=x | Z=z)
• P(Y=y | Z=z, X=x) = P(Y=y | Z=z)
• P(X=x, Y=y | Z=z) = P(X=x| Z=z) P(Y=y| Z=z)
We call these factors : very useful concept !!
Properties of CI
• Symmetry:
– (X  Y | Z)  (Y  X | Z)
• Decomposition:
– (X  Y,W | Z)  (X  Y | Z)
• Weak union:
– (X  Y,W | Z)  (X  Y | Z,W)
• Contraction:
– (X  W | Y,Z) & (X  Y | Z)  (X  Y,W | Z)
• Intersection:
– (X  Y | W,Z) & (X  W | Y,Z)  (X  Y,W | Z)
– Only for positive distributions!
– P(a)>0, 8a, a;
Monty Hall Problem




You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
You pick a door, say No. 1
The host, who knows what's behind the doors,
opens another door, say No. 3, which has a goat.
Do you want to pick door No. 2 instead?
Host reveals
Goat A
or
Host reveals
Goat B
Host must
reveal Goat B
Host must
reveal Goat A
Monty Hall Problem: Bayes
Rule
Ci : the car is behind door i, i = 1, 2, 3
 P  Ci ) = 1 3
 Hij : the host opens door j after you pick door i



P H ij Ck
)
i= j
0
0
j=k

=
i=k
1 2
 1 i  k , j  k
Monty Hall Problem



WLOG, i=1, j=3
P  C1 H13 ) =
P  H13
P  H13 C1 ) P  C 1 )
P  H13 )
1 1 1
C1 ) P  C1 ) =  =
2 3 6
Monty Hall Problem: Bayes Rule cont.

P  H13 ) = P  H13 , C1 )  P  H13 , C2 )  P  H13 , C3 )
= P  H13 C1 ) P  C1 )  P  H13 C2 ) P  C2 )

1
1
=  1
6
3
1
=
2
16 1
P  C1 H13 ) =
=
12 3
Monty Hall Problem: Bayes Rule cont.



16 1
P  C1 H13 ) =
=
12 3
1 2
P C2 H13 = 1  =  P C1 H13
3 3

)
You should switch!

)
Moments

Mean (Expectation):  = E  X )



 v P X = v )
Continuous RVs:E  X ) =  xf  x ) dx
Discrete RVs: E  X ) =
vi i


Variance: V  X ) = E  X   )


i
Discrete RVs: V  X ) =

Continuous RVs: V  X ) =
2
 vi   ) P  X = vi )
2
vi


x  )


2
f  x )dx
Properties of Moments

Mean




E  X  Y) = E  X)  E  Y)
E  aX) = aE  X)
If X and Y are independent,E  XY) = E  X)  E  Y)
Variance


V  aX  b ) = a 2V  X )
If X and Y are independent, V X  Y = V (X)  V (Y)

)
The Big Picture
Probability
Model
Data
Estimation/learning
Statistical Inference

Given observations from a model

What (conditional) independence assumptions
hold?


Structure learning
If you know the family of the model (ex,
multinomial), What are the value of the
parameters: MLE, Bayesian estimation.

Parameter learning
MLE

Maximum Likelihood estimation

Example on board


Given N coin tosses, what is the coin bias (q )?
Sufficient Statistics: SS


Useful concept that we will make use later
In solving the above estimation problem, we only
cared about Nh, Nt , these are called the SS of
this model.


All coin tosses that have the same SS will result in the
same value of q
Why this is useful?
Statistical Inference

Given observation from a model

What (conditional) independence assumptions
holds?


Structure learning
If you know the family of the model (ex,
multinomial), What are the value of the
parameters: MLE, Bayesian estimation.

Parameter learning
We need some concepts from information theory
Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain that others
P(Y)
P(X)
X
Y
• How can we quantify this intuition?
• Entropy: average number of bits required to encode X

1 
1

)
H P  X ) = E log
=
P
x
log
 

)
p
x
P x )

 x
Information Theory cont.
• Entropy: average number of bits required to encode X

1 
1

)
H P  X ) = E log
=
P
x
log
 

)
p
x
P x )

 x
• We can define conditional entropy similarly

1 
H P  X | Y ) = E log
= H P  X ,Y )  H P Y )

p x | y )

• We can also define chain rule for entropies (not surprising)
H P  X , Y , Z ) = H P  X )  H P Y | X )  H P Z | X , Y )
Mutual Information: MI
• Remember independence?
• If XY then knowing Y won’t change our belief about X
• Mutual information can help quantify this! (not the only
way though)
• MI:
I P  X ;Y ) = H P X )  H P  X | Y )
• Symmetric
• I(X;Y) = 0 iff, X and Y are independent!
Continuous Random Variables



What if X is continuous?
Probability density function (pdf) instead of
probability mass function (pmf)
A pdf is any function f  x ) that describes the
probability density in terms of the input
variable x.
PDF

Properties of pdf




f  x )  0, x



f  x) = 1
f  x )  1 ???
Actual probability can be obtained by taking
the integral of pdf

E.g. the probability of X being between 0 and 1 is
P  0  X  1) =

1
0
f  x )dx
Cumulative Distribution
Function


FX  v ) = P  X  v )
Discrete RVs


FX  v ) =

vi
P  X = vi )
Continuous RVs



v
FX  v ) =
f  x ) dx

d
FX  x ) = f  x )
dx
Acknowledgment

Andrew Moore Tutorial: http://www.autonlab.org/tutorials/prob.html

Monty hall problem: http://en.wikipedia.org/wiki/Monty_Hall_problem

http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html