Introduction to Discrete Probability
CSC-2259 Discrete Structures
Konstantin Busch - LSU

Discrete Probability

Unbiased die

Sample Space: all possible outcomes
S = {1, 2, 3, 4, 5, 6}
Experiment: a procedure that yields outcomes (e.g. throw a die)

Event: any subset of the sample space
E1 = {3}    E2 = {2, 5}

Probability of event E:
p(E) = |E| / |S|
(size of the event set divided by the size of the sample space)

Note that 0 ≤ p(E) ≤ 1, since 0 ≤ |E| ≤ |S|.
What is the probability that a die brings 3?

Sample Space: S = {1, 2, 3, 4, 5, 6}
Event Space:  E = {3}
Probability:  p(E) = |E| / |S| = 1/6

What is the probability that a die brings 2 or 5?

Sample Space: S = {1, 2, 3, 4, 5, 6}
Event Space:  E = {2, 5}
Probability:  p(E) = |E| / |S| = 2/6 = 1/3
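The uniform formula p(E) = |E| / |S| can be checked by direct enumeration. A minimal sketch (names are illustrative, not from the slides):

```python
from fractions import Fraction

# Uniform sample space for one die; p(E) = |E| / |S|.
S = {1, 2, 3, 4, 5, 6}

def prob(E, S):
    # Intersect with S so stray outcomes outside the sample space are ignored.
    return Fraction(len(E & S), len(S))

print(prob({3}, S))      # -> 1/6  (die brings 3)
print(prob({2, 5}, S))   # -> 1/3  (die brings 2 or 5)
```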
Two unbiased dice

What is the probability that two dice bring (1,1)?

Sample Space: 36 possible outcomes, each an ordered pair (first die, second die)
S = {(1,1), (1,2), (1,3), ..., (6,6)}

Event Space: E = {(1,1)}

Probability: p(E) = |E| / |S| = 1/36
What is the probability that two dice bring the same numbers?

Sample Space: S = {(1,1), (1,2), (1,3), ..., (6,6)}
Event Space:  E = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
Probability:  p(E) = |E| / |S| = 6/36 = 1/6

Game with unordered numbers

Game authority selects a set of 6 winning numbers out of 40
(number choices: 1, 2, 3, ..., 40; e.g. winning numbers: 4, 7, 16, 25, 33, 39)

Player picks a set of 6 numbers, order is irrelevant
(e.g. player numbers: 8, 13, 16, 23, 33, 40)

What is the probability that a player wins?
Winning event: a single set with the 6 winning numbers
E = {{4, 7, 16, 25, 33, 39}}    |E| = 1

Sample space: S = {all subsets of 6 numbers out of 40}
  = {{1,2,3,4,5,6}, {1,2,3,4,5,7}, {1,2,3,4,5,8}, ...}

|S| = C(40, 6) = 3,838,380

Probability that the player wins:
p(E) = |E| / |S| = 1 / 3,838,380
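The count C(40, 6) can be verified with the standard library's binomial coefficient:

```python
from math import comb

# Number of 6-element subsets of {1, ..., 40}; the winning event is one such subset.
total = comb(40, 6)
print(total)            # -> 3838380

p_win = 1 / total       # probability that the player's set matches the winning set
```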
A card game

Deck has 52 cards: 13 kinds (2, 3, 4, 5, 6, 7, 8, 9, 10, a, k, q, j),
each kind in 4 suits (h, d, c, s)

Player is given a hand with 4 cards.

What is the probability that the cards of the player are all of the same kind?

Event: each set of 4 cards of the same kind
E = {{2h, 2d, 2c, 2s}, {3h, 3d, 3c, 3s}, ..., {jh, jd, jc, js}}
|E| = 13

Sample Space: S = {all possible sets of 4 cards out of 52}
  = {{2h, 2d, 2c, 2s}, {2h, 2d, 2c, 3h}, {2h, 2d, 2c, 3d}, ...}

|S| = C(52, 4) = 52! / (4! 48!) = (52 * 51 * 50 * 49) / (4 * 3 * 2) = 270,725
Probability that the hand has 4 cards of the same kind:
p(E) = |E| / |S| = 13 / 270,725

Game with ordered numbers

Game authority selects from a bin 5 balls, in some order,
labeled with numbers 1...50
(number choices: 1, 2, 3, ..., 50; e.g. winning numbers: 37, 4, 16, 33, 9)

Player picks a sequence of 5 numbers, order is important
(e.g. player numbers: 40, 16, 13, 25, 33)

What is the probability that a player wins?
Sampling without replacement: after a ball is selected,
it is not returned to the bin.

Sample space size: 5-permutations of 50 balls
|S| = P(50, 5) = 50! / (50 - 5)! = 50! / 45!
    = 50 * 49 * 48 * 47 * 46 = 254,251,200

Probability of success: p(E) = |E| / |S| = 1 / 254,251,200

Sampling with replacement: after a ball is selected,
it is returned to the bin.

Sample space size: 5-permutations of 50 balls with repetition
|S| = 50^5 = 312,500,000

Probability of success: p(E) = |E| / |S| = 1 / 312,500,000
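Both sample-space sizes follow from elementary counting; a quick check with the standard library (math.perm computes the falling product P(n, k)):

```python
from math import perm

# Sampling without replacement: ordered 5-permutations of 50 balls.
no_repl = perm(50, 5)        # 50 * 49 * 48 * 47 * 46
# Sampling with replacement: each of the 5 draws has 50 choices.
with_repl = 50 ** 5

print(no_repl)               # -> 254251200
print(with_repl)             # -> 312500000
```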
Probability of Complement:

p(Ē) = 1 - p(E)

Proof:
Ē = S - E
|S - E| = |S| - |E|

p(Ē) = |Ē| / |S| = (|S| - |E|) / |S| = 1 - |E| / |S| = 1 - p(E)
End of Proof

Example: What is the probability that a binary string of 8 bits
contains at least one 0?

E = {01111111, 10111111, ..., 00111111, ..., 00000000}
Ē = {11111111}

p(E) = 1 - p(Ē) = 1 - 1/2^8 = 255/256
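The complement trick avoids counting the large event directly; a brute-force check over all 256 strings confirms it (sketch, names illustrative):

```python
from itertools import product

# Enumerate all 8-bit strings and count those with at least one 0,
# then compare with the complement formula 1 - 1/2^8.
strings = [''.join(bits) for bits in product('01', repeat=8)]
at_least_one_zero = sum('0' in s for s in strings)

assert at_least_one_zero / len(strings) == 1 - 1 / 2**8   # 255/256
```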
Probability of Union:

For events E1, E2 ⊆ S:
p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)

Proof:
|E1 ∪ E2| = |E1| + |E2| - |E1 ∩ E2|

p(E1 ∪ E2) = |E1 ∪ E2| / |S|
           = (|E1| + |E2| - |E1 ∩ E2|) / |S|
           = |E1| / |S| + |E2| / |S| - |E1 ∩ E2| / |S|
           = p(E1) + p(E2) - p(E1 ∩ E2)
End of Proof

Example: What is the probability that a binary string of 8 bits
starts with 0 or ends with 11?

Strings that start with 0:
E1 = {00000000, 00000001, ..., 01111111}
|E1| = 2^7 (all binary strings 0xxxxxxx, with 7 free bits)

Strings that end with 11:
E2 = {00000011, 00000111, ..., 11111111}
|E2| = 2^6 (all binary strings xxxxxx11, with 6 free bits)

Strings that start with 0 and end with 11:
E1 ∩ E2 = {00000011, 00000111, ..., 01111111}
|E1 ∩ E2| = 2^5 (all binary strings 0xxxxx11, with 5 free bits)

Strings that start with 0 or end with 11:
p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
           = 2^7/2^8 + 2^6/2^8 - 2^5/2^8
           = 1/2 + 1/4 - 1/8 = 5/8

Probability Theory

Sample space: S = {x1, x2, ..., xn}

Probability distribution function p:
0 ≤ p(xi) ≤ 1
Σ_{i=1}^{n} p(xi) = 1
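Inclusion-exclusion for the 8-bit example can be verified by enumeration (sketch, names illustrative):

```python
from itertools import product

# 8-bit strings: E1 = starts with 0, E2 = ends with 11.
S = [''.join(b) for b in product('01', repeat=8)]
E1 = [s for s in S if s.startswith('0')]
E2 = [s for s in S if s.endswith('11')]
union = [s for s in S if s.startswith('0') or s.endswith('11')]
both = [s for s in E1 if s.endswith('11')]

lhs = len(union) / len(S)
rhs = len(E1) / len(S) + len(E2) / len(S) - len(both) / len(S)
assert lhs == rhs == 5 / 8   # inclusion-exclusion agrees with direct count
```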
Uniform probability distribution:
p(xi) = 1/n    (every outcome equally likely: p(xi) = p(xj))

Example: Unbiased Coin
S = {H, T}
Heads (H) or Tails (T) with probability 1/2:
p(H) = p(T) = 1/2

Notice that it can also be p(xi) ≠ p(xj):

Example: Biased Coin
S = {H, T}
Heads (H) with probability 2/3, Tails (T) with probability 1/3:
p(H) = 2/3    p(T) = 1/3    (2/3 + 1/3 = 1)

Probability of event E = {x1, x2, ..., xk} ⊆ S:
p(E) = Σ_{i=1}^{k} p(xi)

For a uniform probability distribution: p(E) = |E| / |S|

Example: Biased die
S = {1, 2, 3, 4, 5, 6}
p(1) = p(2) = p(3) = p(4) = p(5) = 1/7,  p(6) = 2/7

What is the probability that the die outcome is 2 or 6?
E = {2, 6}
p(E) = p(2) + p(6) = 1/7 + 2/7 = 3/7
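The non-uniform event formula p(E) = Σ p(xi) for the biased die can be sketched as follows (names illustrative):

```python
from fractions import Fraction

# Biased die: p(1) = ... = p(5) = 1/7, p(6) = 2/7.
p = {x: Fraction(1, 7) for x in range(1, 6)}
p[6] = Fraction(2, 7)
assert sum(p.values()) == 1          # valid probability distribution

def prob(E):
    # p(E) is the sum of the outcome probabilities, not |E|/|S|.
    return sum(p[x] for x in E)

print(prob({2, 6}))                  # -> 3/7
```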
Combinations of Events:

Complement:  p(Ē) = 1 - p(E)
Union:       p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
Union of disjoint events:  p(∪_i Ei) = Σ_i p(Ei)

Conditional Probability

Three tosses of an unbiased coin.

Condition: the first coin is Tails.

Question: What is the probability that there is an odd number of Tails,
given that the first coin is Tails?
Sample space: S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Event without condition (odd number of Tails):
E = {TTT, THH, HTH, HHT}

Restricted sample space given the condition (first coin is Tails):
F = {TTT, TTH, THT, THH}

Event with condition:
E_F = E ∩ F = {TTT, THH}

Notation of event with condition: E_F = E | F   (event E given F)

Given the condition, the sample space changes to F:

p(E_F) = |E ∩ F| / |F| = (|E ∩ F| / |S|) / (|F| / |S|) = p(E ∩ F) / p(F)

p(E_F) = p(E | F) = (2/8) / (4/8) = 0.5    (the coin is unbiased)

Conditional probability definition (for an arbitrary probability distribution):
Given sample space S with events E and F (where p(F) > 0),
the conditional probability of E given F is:

p(E | F) = p(E ∩ F) / p(F)

Example: What is the probability that a family of two children has two boys,
given that one child is a boy?

Assume equal probability to have a boy or a girl.

Sample space: S = {BB, BG, GB, GG}
Condition: one child is a boy
F = {BB, BG, GB}
Event: both children are boys
E = {BB}

Conditional probability of the event:
p(E | F) = p(E ∩ F) / p(F) = p({BB}) / p({BB, BG, GB}) = (1/4) / (3/4) = 1/3

Independent Events

Events E1 and E2 are independent iff:
p(E1 ∩ E2) = p(E1) p(E2)

Equivalent definition (if p(E2) ≠ 0):
p(E1 | E2) = p(E1)
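The two-children result is easy to confirm by enumerating the restricted sample space (sketch, names illustrative):

```python
from itertools import product

# Two children, each equally likely B or G; condition: at least one boy.
S = list(product('BG', repeat=2))
F = [s for s in S if 'B' in s]                 # one child is a boy
E_and_F = [s for s in F if s == ('B', 'B')]    # both boys, within the condition

p_cond = len(E_and_F) / len(F)                 # p(E | F) = |E ∩ F| / |F|
print(p_cond)
```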
Example: 4-bit uniformly random strings

E1: a string begins with 1
E1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}

E2: a string contains an even number of 1s
E2 = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}

|E1| = |E2| = 8
p(E1) = p(E2) = 8/16 = 1/2

E1 ∩ E2 = {1111, 1100, 1010, 1001}
|E1 ∩ E2| = 4

p(E1 ∩ E2) = 4/16 = 1/4 = (1/2)(1/2) = p(E1) p(E2)

Events E1 and E2 are independent.

Bernoulli trial: an experiment with two outcomes, success or failure.

Success probability: p
Failure probability: q = 1 - p

Example: Biased Coin
Success = Heads: p = p(H) = 2/3
Failure = Tails: q = p(T) = 1/3

Independent Bernoulli trials: the outcomes of successive Bernoulli trials
do not depend on each other.
(Example: successive coin tosses)

Throw the biased coin 5 times.
What is the probability to have 3 heads?

Heads probability: p = 2/3 (success)
Tails probability: q = 1/3 (failure)
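The independence of the two 4-bit-string events can be checked numerically (sketch, names illustrative):

```python
from itertools import product

# 4-bit strings: E1 = starts with 1, E2 = even number of 1s.
S = [''.join(b) for b in product('01', repeat=4)]

def p(ev):
    return len(ev) / len(S)

E1 = [s for s in S if s[0] == '1']
E2 = [s for s in S if s.count('1') % 2 == 0]
both = [s for s in E1 if s.count('1') % 2 == 0]

assert p(both) == p(E1) * p(E2) == 1 / 4   # independent events
```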
Probability that any particular sequence has 3 heads and 2 tails
in specified positions: p^3 q^2

For example:
HHHTT: p·p·p·q·q = p^3 q^2
HTHHT: p·q·p·p·q = p^3 q^2
HTHTH: p·q·p·q·p = p^3 q^2

Total number of ways to arrange in sequence 5 coins with 3 heads: C(5, 3)
Probability of having 3 heads:

C(5, 3) p^3 q^2 = p^3 q^2 + p^3 q^2 + ... + p^3 q^2
(one term p^3 q^2 for each successful sequence with 3 heads:
1st sequence, 2nd sequence, ..., over all C(5, 3) possible ways
to arrange in sequence 5 coins with 3 heads)

Probability to have exactly 3 heads:
C(5, 3) (2/3)^3 (1/3)^2 = 10 * (8/27) * (1/9) = 80/243 ≈ 0.33
Theorem: The probability to have k successes in n independent Bernoulli
trials is

b(k; n, p) = C(n, k) p^k q^(n-k)

Also known as the binomial probability distribution.

Proof:
p^k q^(n-k) is the probability that a sequence has k successes and
n - k failures in specified positions (example: SFSFFS...SSF).

C(n, k) is the total number of sequences with k successes and
n - k failures.

Therefore the probability of exactly k successes is C(n, k) p^k q^(n-k).
End of Proof

(In the coin example: C(5, 3) p^3 q^2 = (5! / (3! 2!)) (2/3)^3 (1/3)^2,
the probability to have 3 heads and 2 tails in any sequence positions.)
Example: Random binary strings where the probability of a 0 bit is 0.9
and the probability of a 1 bit is 0.1.

What is the probability of exactly 8 zero bits out of 10 bits?
(e.g. 0100001000)

p = 0.9,  q = 0.1,  k = 8,  n = 10

b(k; n, p) = C(n, k) p^k q^(n-k) = C(10, 8) (0.9)^8 (0.1)^2 = 0.1937102445

Birthday Problem

Birthday collision: two people have their birthday on the same day.

Problem: How many people should be in a room so that the probability
of a birthday collision is at least 1/2?

Assumption: equal probability to be born on any day.
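The binomial distribution of the theorem is one line of code; the sketch below (names illustrative) reproduces both worked examples:

```python
from math import comb

# Binomial probability b(k; n, p) = C(n, k) p^k q^(n-k), q = 1 - p.
def b(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 3 heads in 5 tosses of the biased coin (p = 2/3):
print(b(3, 5, 2 / 3))      # 80/243, about 0.329
# 8 zero bits out of 10 with p(0) = 0.9:
print(b(8, 10, 0.9))       # about 0.1937
```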
366 days in a year.

If the number of people is 367 or more, then a birthday collision is
guaranteed by the pigeonhole principle.

Assume that we have n ≤ 366 people.

We will compute
p_n: the probability that n people have all different birthdays.
It will help us get
1 - p_n: the probability that there is a birthday collision among n people.

Sample space: Cartesian product of the birthday choices of each person
(1st person's choices × 2nd person's choices × ... × nth person's choices):

S = {1, 2, ..., 366} × {1, 2, ..., 366} × ... × {1, 2, ..., 366}
  = {(1, 1, ..., 1), (2, 1, ..., 1), ..., (366, 366, ..., 366)}

Sample space size: |S| = 366 * 366 * ... * 366 = 366^n

Event set: each person's birthday is different
E = {(1, 2, ..., 366), (366, 1, ..., 365), ..., (366, 365, ..., 1)}
|E| = P(366, n)
Probability of no birthday collision:

p_n = |E| / |S| = P(366, n) / 366^n
    = 366 * 365 * 364 * ... * (366 - n + 1) / 366^n
    = (366! / (366 - n)!) / 366^n

Probability of a birthday collision: 1 - p_n

n = 22:  1 - p_n ≈ 0.475
n = 23:  1 - p_n ≈ 0.506

Therefore: n = 23 people have probability at least 1/2 of a birthday collision.
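The threshold n = 23 is easy to recompute directly from the product formula for p_n (sketch, names illustrative):

```python
# Probability that n people all have different birthdays, out of 366 days:
# p_n = (366/366) * (365/366) * ... * ((366 - n + 1)/366)
def p_all_different(n):
    p = 1.0
    for i in range(n):
        p *= (366 - i) / 366
    return p

print(round(1 - p_all_different(22), 3))   # -> 0.475
print(round(1 - p_all_different(23), 3))   # -> 0.506
```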
Randomized algorithms: algorithms with randomized choices
(example: quicksort)

Las Vegas algorithms: randomized algorithms whose output is always correct
(e.g. quicksort)

Monte Carlo algorithms: randomized algorithms whose output is correct with
some probability (they may produce wrong output)

The birthday problem analysis can be used to determine appropriate hash
table sizes that minimize collisions.

Hash function collision: h(k1) = h(k2)
A Monte Carlo algorithm for primality testing:

Primality_Test( n, k ) {
    for ( i = 1 to k ) {
        b = random_number(1, ..., n)
        if ( Miller_Test( n, b ) == failure )
            return(false)    // n is not prime
    }
    return(true)             // most likely n is prime
}

Miller_Test( n, b ) {
    // write n - 1 = 2^s * t,  where s, t ≥ 0,  s ≤ log n,  t is odd
    for ( j = 0 to s-1 ) {
        if ( b^t ≡ 1 (mod n)  or  b^(2^j * t) ≡ -1 (mod n) )
            return(success)
    }
    return(failure)
}
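A runnable sketch of the same Monte Carlo test, under the usual conventions (this is a standard Miller-Rabin implementation, not taken verbatim from the slides; bases are drawn from 2..n-2 and pow does the modular exponentiation):

```python
import random

def miller_test(n, b):
    # Write n - 1 = 2^s * t with t odd.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    x = pow(b, t, n)
    if x == 1 or x == n - 1:      # b^t = +-1 (mod n): n passes for base b
        return True
    for _ in range(s - 1):        # check b^(2^j * t) = -1 (mod n)
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False                  # n fails: certainly composite

def primality_test(n, k=20):
    if n < 4:
        return n in (2, 3)
    for _ in range(k):
        b = random.randrange(2, n - 1)
        if not miller_test(n, b):
            return False          # n is not prime, for sure
    return True                   # most likely n is prime
```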
If the primality test algorithm returns false,
then the number is not prime for sure.

A prime number n passes the Miller test for every b, 1 ≤ b < n.

A composite number n passes the Miller test, for b in the range 1 ≤ b < n,
for fewer than n/4 values of b.

If the algorithm returns true, then the answer is correct (the number
is prime) with high probability: each iteration gives a false positive
with probability less than 1/4, so k iterations err with probability
less than (1/4)^k ≤ 1/n for k = log_4 n,
i.e. the answer is correct with probability at least 1 - 1/n.
Bayes' Theorem

For events E, F with p(E) ≠ 0 and p(F) ≠ 0:

p(F | E) = p(E | F) p(F) / ( p(E | F) p(F) + p(E | F̄) p(F̄) )

Applications: machine learning, spam filters.

Proof:
p(F | E) = p(E ∩ F) / p(E)   so   p(E ∩ F) = p(F | E) p(E)
p(E | F) = p(E ∩ F) / p(F)   so   p(E ∩ F) = p(E | F) p(F)

Therefore p(F | E) p(E) = p(E | F) p(F), which gives:

p(F | E) = p(E | F) p(F) / p(E)

Since E = (E ∩ F) ∪ (E ∩ F̄) and (E ∩ F) ∩ (E ∩ F̄) = ∅:
p(E) = p(E ∩ F) + p(E ∩ F̄)

With p(E ∩ F) = p(E | F) p(F) and p(E ∩ F̄) = p(E | F̄) p(F̄):
p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)

Substituting into p(F | E) = p(E | F) p(F) / p(E):

p(F | E) = p(E | F) p(F) / ( p(E | F) p(F) + p(E | F̄) p(F̄) )
End of Proof
Example: Select a random box, then select a random ball in the box.

E: select a red ball      Ē: select a green ball
F: select box 1           F̄: select box 2

Question: If a red ball is selected, what is the probability
it was taken from box 1?  That is: p(F | E) = ?

Bayes' Theorem:
p(F | E) = p(E | F) p(F) / ( p(E | F) p(F) + p(E | F̄) p(F̄) )

We only need to compute p(F), p(F̄), p(E | F), p(E | F̄):

p(F)  = 1/2 = 0.5           probability to select box 1
p(F̄)  = 1/2 = 0.5           probability to select box 2
p(E | F)  = 7/9 ≈ 0.777     probability to select a red ball from box 1
p(E | F̄)  = 3/7 ≈ 0.428     probability to select a red ball from box 2

Final result:
p(F | E) = (0.777 * 0.5) / (0.777 * 0.5 + 0.428 * 0.5)
         = 0.777 / (0.777 + 0.428) ≈ 0.645
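A sketch of the posterior computation (the box contents are assumed from the slide's figure, consistent with p(E|F) = 7/9 and p(E|F̄) = 3/7; names illustrative):

```python
# Posterior probability that the red ball came from box 1 (Bayes' theorem).
# Assumed from the slide's figure: box 1 has 7 red of 9 balls,
# box 2 has 3 red of 7 balls; each box is chosen with probability 1/2.
pF, pF_bar = 0.5, 0.5
pE_given_F, pE_given_Fbar = 7 / 9, 3 / 7

posterior = (pE_given_F * pF) / (pE_given_F * pF + pE_given_Fbar * pF_bar)
print(round(posterior, 3))    # -> 0.645
```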
What if we had more boxes?

Generalized Bayes' Theorem (for a sample space S = F1 ∪ F2 ∪ ... ∪ Fn
partitioned into mutually exclusive events):

p(Fj | E) = p(E | Fj) p(Fj) / Σ_{i=1}^{n} p(E | Fi) p(Fi)

Spam Filters

Training set:
B: spam (bad) emails
G: good emails

A user classifies each email in the training set as good or bad.

Find the words that occur in B and G:
n_B(w): number of spam emails that contain word w
n_G(w): number of good emails that contain word w

p(w) = n_B(w) / |B|    probability that a spam email contains w
q(w) = n_G(w) / |G|    probability that a good email contains w

A new email X arrives.

S: event that X is spam
E: event that X contains word w

What is the probability that X is spam, given that it contains word w?
p(S | E) = ?

Reject the email if this probability is at least 0.9.
p(S | E) = p(E | S) p(S) / ( p(E | S) p(S) + p(E | S̄) p(S̄) )

Simplifying assumption: p(S) = p(S̄) = 0.5

Then we only need to compute, from the training set:
p(E | S)  = p(w) = n_B(w) / |B|
p(E | S̄)  = q(w) = n_G(w) / |G|

Example: Training set for the word "Rolex":
"Rolex" occurs in 250 of 2000 spam emails
"Rolex" occurs in 5 of 1000 good emails

p(Rolex) = n_B(Rolex) / |B| = 250 / 2000 = 0.125
q(Rolex) = n_G(Rolex) / |G| = 5 / 1000 = 0.005

If a new email X contains the word "Rolex",
what is the probability that it is spam?

S: event that X is spam
E: event that X contains the word "Rolex"
p(S | E) = ?
p(S | E) = p(E | S) p(S) / ( p(E | S) p(S) + p(E | S̄) p(S̄) )
         = (0.125 * 0.5) / (0.125 * 0.5 + 0.005 * 0.5)
         = 0.125 / 0.13 ≈ 0.9615

The new email is considered to be spam because:
p(S | E) ≈ 0.9615 ≥ 0.9 (the spam threshold)
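With equal priors p(S) = p(S̄) = 0.5, the prior terms cancel and the score reduces to p(w) / (p(w) + q(w)); a sketch (names illustrative):

```python
# Single-word Bayesian spam score with priors p(S) = p(not S) = 0.5,
# so the 0.5 factors cancel out of Bayes' theorem.
def spam_score(p_w, q_w):
    return p_w / (p_w + q_w)

p_rolex = 250 / 2000    # fraction of spam emails containing "Rolex"
q_rolex = 5 / 1000      # fraction of good emails containing "Rolex"

score = spam_score(p_rolex, q_rolex)
print(round(score, 3))  # -> 0.962, above the 0.9 spam threshold
```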
Better spam filters use two words:

p(S | E1 ∩ E2) = p(E1 | S) p(E2 | S) /
                 ( p(E1 | S) p(E2 | S) + p(E1 | S̄) p(E2 | S̄) )

Assumption: E1 and E2 are independent, i.e. the two words appear
independently of each other.