Introduction to Discrete Probability
CSC-2259 Discrete Structures
Konstantin Busch - LSU

Discrete Probability

Unbiased die
Sample space: all possible outcomes
S = {1, 2, 3, 4, 5, 6}
Probability of event

Experiment: procedure that yields outcomes
(example: throw a die)

Event: any subset of the sample space
E1 = {3}    E2 = {2, 5}

Probability of event E:
p(E) = |E| / |S|    (size of event set / size of sample space)

Note that 0 ≤ p(E) ≤ 1, since 0 ≤ |E| ≤ |S|.
What is the probability that a die brings 3?

Sample space: S = {1, 2, 3, 4, 5, 6}
Event: E = {3}
Probability: p(E) = |E| / |S| = 1/6

What is the probability that a die brings 2 or 5?

Sample space: S = {1, 2, 3, 4, 5, 6}
Event: E = {2, 5}
Probability: p(E) = |E| / |S| = 2/6 = 1/3
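These single-die calculations can be checked mechanically. A small illustrative sketch in Python (not part of the original slides), using the event sets from above:

```python
from fractions import Fraction

# Sample space: all outcomes of one unbiased die
S = {1, 2, 3, 4, 5, 6}

def prob(E, S):
    """p(E) = |E| / |S| for a uniform sample space."""
    return Fraction(len(E), len(S))

assert prob({3}, S) == Fraction(1, 6)       # die brings 3
assert prob({2, 5}, S) == Fraction(1, 3)    # die brings 2 or 5
```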
Two unbiased dice

What is the probability that two dice bring (1,1)?

Sample space: 36 possible outcomes, each an ordered pair (first die, second die)
S = {(1,1), (1,2), (1,3), ..., (6,6)}
Event: E = {(1,1)}
Probability: p(E) = |E| / |S| = 1/36
What is the probability that two dice bring the same numbers?

Sample space: S = {(1,1), (1,2), (1,3), ..., (6,6)}
Event: E = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
Probability: p(E) = |E| / |S| = 6/36 = 1/6

Game with unordered numbers

Game authority selects a set of 6 winning numbers out of 40
(e.g. winning numbers: 4, 7, 16, 25, 33, 39)
Player picks a set of 6 numbers, order is irrelevant
(e.g. player numbers: 8, 13, 16, 23, 33, 40)
Number choices: 1, 2, 3, ..., 40
What is the probability that a player wins?

Winning event: E = {{4, 7, 16, 25, 33, 39}},  |E| = 1
(a single set with the 6 winning numbers)
Sample space:
S = {all subsets with 6 numbers out of 40}
  = {{1,2,3,4,5,6}, {1,2,3,4,5,7}, {1,2,3,4,5,8}, ...}
|S| = C(40, 6) = 3,838,380

Probability that the player wins:
p(E) = |E| / |S| = 1 / 3,838,380
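The binomial coefficient above can be verified directly; a quick check in Python (illustrative, not from the slides):

```python
from math import comb
from fractions import Fraction

# |S| = C(40, 6): all 6-number subsets of {1, ..., 40}
size_S = comb(40, 6)
assert size_S == 3838380

# Only one winning set, so p(E) = 1 / C(40, 6)
p_win = Fraction(1, size_S)
assert p_win == Fraction(1, 3838380)
```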
A card game

Deck has 52 cards:
13 kinds of cards (2, 3, 4, 5, 6, 7, 8, 9, 10, a, k, q, j),
each kind has 4 suits (h, d, c, s)
Player is given a hand with 4 cards.
What is the probability that the cards of the player are all of the same kind?

Event: E = {{2h, 2d, 2c, 2s}, {3h, 3d, 3c, 3s}, ..., {jh, jd, jc, js}}
|E| = 13    (each set of 4 cards is of the same kind)

Sample space:
S = {all possible sets of 4 cards out of 52}
  = {{2h, 2d, 2c, 2s}, {2h, 2d, 2c, 3h}, {2h, 2d, 2c, 3d}, ...}
|S| = C(52, 4) = 52! / (4! 48!) = (52 · 51 · 50 · 49) / (4 · 3 · 2) = 270,725
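Both counts, |E| = 13 and |S| = 270,725, can be confirmed by brute-force enumeration of all 4-card hands; a sketch in Python (illustrative only):

```python
from itertools import combinations
from math import comb

# Deck as on the slide: 13 kinds x 4 suits
kinds = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'a', 'k', 'q', 'j']
suits = ['h', 'd', 'c', 's']
deck = [(k, s) for k in kinds for s in suits]

# Sample space: all 4-card hands
hands = list(combinations(deck, 4))
assert len(hands) == comb(52, 4) == 270725

# Event: all 4 cards of the same kind
same_kind = [h for h in hands if len({card[0] for card in h}) == 1]
assert len(same_kind) == 13
```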
Probability that the hand has 4 cards of the same kind:
p(E) = |E| / |S| = 13 / 270,725

Game with ordered numbers

Game authority selects from a bin 5 balls, in some order,
labeled with numbers 1...50
(e.g. winning numbers: 37, 4, 16, 33, 9)
Player picks 5 numbers, order is important
(e.g. player numbers: 40, 16, 13, 25, 33)
Number choices: 1, 2, 3, ..., 50
What is the probability that a player wins?
Sampling without replacement:
after a ball is selected, it is not returned to the bin.

Sample space size: 5-permutations of 50 balls
|S| = P(50, 5) = 50! / (50 - 5)! = 50 · 49 · 48 · 47 · 46 = 254,251,200
Probability of success: p(E) = |E| / |S| = 1 / 254,251,200

Sampling with replacement:
after a ball is selected, it is returned to the bin.

Sample space size: 5-permutations of 50 balls with repetition
|S| = 50^5 = 312,500,000
Probability of success: p(E) = |E| / |S| = 1 / 312,500,000
Probability of the complement:
p(Ē) = 1 - p(E)

Proof:
Ē = S - E
|S - E| = |S| - |E|
p(Ē) = |Ē| / |S| = (|S| - |E|) / |S| = 1 - |E| / |S| = 1 - p(E)
End of Proof

Example: What is the probability that a binary string of 8 bits contains at least one 0?
E = {01111111, 10111111, ..., 00111111, ..., 00000000}
Ē = {11111111}
p(E) = 1 - p(Ē) = 1 - |Ē| / |S| = 1 - 1/2^8
Probability of Union:
For events E1, E2 ⊆ S:
p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)

Proof:
|E1 ∪ E2| = |E1| + |E2| - |E1 ∩ E2|
p(E1 ∪ E2) = (|E1| + |E2| - |E1 ∩ E2|) / |S|
           = |E1|/|S| + |E2|/|S| - |E1 ∩ E2|/|S|
           = p(E1) + p(E2) - p(E1 ∩ E2)
End of Proof

Example: What is the probability that a binary string of 8 bits starts with 0 or ends with 11?

Strings that start with 0:
E1 = {00000000, 00000001, ..., 01111111}
|E1| = 2^7    (all binary strings 0xxxxxxx)

Strings that end with 11:
E2 = {00000011, 00000111, ..., 11111111}
|E2| = 2^6    (all binary strings xxxxxx11)

Strings that start with 0 and end with 11:
E1 ∩ E2 = {00000011, 00000111, ..., 01111111}
|E1 ∩ E2| = 2^5    (all binary strings 0xxxxx11)

Strings that start with 0 or end with 11:
p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
           = |E1|/|S| + |E2|/|S| - |E1 ∩ E2|/|S|
           = 2^7/2^8 + 2^6/2^8 - 2^5/2^8
           = 1/2 + 1/4 - 1/8 = 5/8

Probability Theory

Sample space: S = {x1, x2, ..., xn}
Probability distribution function p:
0 ≤ p(xi) ≤ 1
Σ_{i=1}^{n} p(xi) = 1
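Inclusion-exclusion for this example can also be verified by brute force over all 8-bit strings (an illustrative sketch, not from the slides):

```python
from itertools import product
from fractions import Fraction

S = [''.join(bits) for bits in product('01', repeat=8)]
E1 = {s for s in S if s.startswith('0')}   # starts with 0
E2 = {s for s in S if s.endswith('11')}    # ends with 11

p = lambda E: Fraction(len(E), len(S))
# Inclusion-exclusion matches direct counting of the union
assert p(E1 | E2) == p(E1) + p(E2) - p(E1 & E2) == Fraction(5, 8)
```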
Notice that it can be: p(xi) ≠ p(xj)

Example: Biased Coin
Heads (H) with probability 2/3, Tails (T) with probability 1/3
Sample space: S = {H, T}
p(H) = 2/3,  p(T) = 1/3,  and 2/3 + 1/3 = 1

Uniform probability distribution:
p(xi) = 1/n    for sample space S = {x1, x2, ..., xn}

Example: Unbiased Coin
Heads (H) or Tails (T) with probability 1/2
Sample space: S = {H, T}
p(H) = p(T) = 1/2

Probability of event E = {x1, x2, ..., xk} ⊆ S:
p(E) = Σ_{i=1}^{k} p(xi)
For uniform probability distribution: p(E) = |E| / |S|

Example: Biased die
p(1) = p(2) = p(3) = p(4) = p(5) = 1/7,  p(6) = 2/7
What is the probability that the die outcome is 2 or 6?
E = {2, 6}
p(E) = p(2) + p(6) = 1/7 + 2/7 = 3/7
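Summing outcome probabilities over the event works for any distribution, uniform or not; a small Python sketch using the biased die above (illustrative only):

```python
from fractions import Fraction

# Biased die from the slide: outcomes 1-5 have probability 1/7, outcome 6 has 2/7
p = {1: Fraction(1, 7), 2: Fraction(1, 7), 3: Fraction(1, 7),
     4: Fraction(1, 7), 5: Fraction(1, 7), 6: Fraction(2, 7)}
assert sum(p.values()) == 1          # valid distribution

# p(E) is the sum of the outcome probabilities in E
E = {2, 6}
assert sum(p[x] for x in E) == Fraction(3, 7)
```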
Combinations of Events:
Complement: p(Ē) = 1 - p(E)
Union: p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
Union of disjoint events: p(∪_i Ei) = Σ_i p(Ei)

Conditional Probability

Three tosses of an unbiased coin (Heads or Tails on each toss)
Sample space: S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}
Condition: first coin is Tails
Question: What is the probability that there is an odd number of Tails, given that the first coin is Tails?

Event without condition (odd number of Tails):
E = {TTT, THH, HTH, HHT}
Restricted sample space given the condition (first coin is Tails):
F = {TTT, TTH, THT, THH}
Event with condition:
E_F = E ∩ F = {TTT, THH}
Notation for the event with condition:
E_F = E | F    (event E given F)
Given the condition, the sample space changes to F:

p(E_F) = |E ∩ F| / |F|
       = (|E ∩ F| / |S|) / (|F| / |S|)
       = p(E ∩ F) / p(F)
       = (2/8) / (4/8) = 0.5
(the coin is unbiased)

Conditional probability definition:
Given a sample space S with events E and F (where p(F) > 0),
the conditional probability of E given F is:
p(E | F) = p(E ∩ F) / p(F)
(for an arbitrary probability distribution)

Example: What is the probability that a family of two children has two boys, given that one child is a boy?
Assume equal probability to have a boy or a girl.
Sample space: S = {BB, BG, GB, GG}
Condition (one child is a boy): F = {BB, BG, GB}
Event (both children are boys): E = {BB}

Conditional probability of the event:
p(E | F) = p(E ∩ F) / p(F) = p({BB}) / p({BB, BG, GB}) = (1/4) / (3/4) = 1/3

Independent Events

Events E1 and E2 are independent iff:
p(E1 ∩ E2) = p(E1) p(E2)
Equivalent definition (if p(E2) ≠ 0): p(E1 | E2) = p(E1)

Example: 4-bit uniformly random strings
E1: a string begins with 1
E2: a string contains an even number of 1s
E1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}
E2 = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}
E1 ∩ E2 = {1111, 1100, 1010, 1001}
|E1| = |E2| = 8,  |E1 ∩ E2| = 4
p(E1) = p(E2) = 8/16 = 1/2
p(E1 ∩ E2) = 4/16 = 1/4 = (1/2) · (1/2) = p(E1) p(E2)
Events E1 and E2 are independent.

Bernoulli trial: experiment with two outcomes, success or failure
Success probability: p
Failure probability: q = 1 - p

Example: Biased Coin
Success = Heads: p = p(H) = 2/3
Failure = Tails: q = p(T) = 1/3
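The independence of the two 4-bit-string events can be verified by enumeration (an illustrative Python sketch):

```python
from itertools import product
from fractions import Fraction

# 4-bit uniformly random strings
S = [''.join(b) for b in product('01', repeat=4)]
E1 = {s for s in S if s[0] == '1'}             # begins with 1
E2 = {s for s in S if s.count('1') % 2 == 0}   # even number of 1s

p = lambda E: Fraction(len(E), len(S))
# Independence: p(E1 ∩ E2) = p(E1) p(E2)
assert p(E1 & E2) == p(E1) * p(E2) == Fraction(1, 4)
```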
Independent Bernoulli trials:
the outcomes of successive Bernoulli trials do not depend on each other.
Example: successive coin tosses.

Throw the biased coin 5 times.
What is the probability to have 3 heads?
Heads probability: p = 2/3 (success)
Tails probability: q = 1/3 (failure)

Probability that any particular sequence has 3 heads and 2 tails in specified positions: p^3 q^2
For example:
HHHTT: p·p·p·q·q = p^3 q^2
HTHHT: p·q·p·p·q = p^3 q^2
HTHTH: p·q·p·q·p = p^3 q^2

Total number of ways to arrange in sequence 5 coins with 3 heads: C(5, 3)
Probability of having 3 heads:
p^3 q^2 + p^3 q^2 + ... + p^3 q^2 = C(5, 3) p^3 q^2
(one term p^3 q^2 for each sequence success with 3 heads: the 1st sequence, the 2nd sequence, ..., up to all C(5, 3) possible ways to arrange in sequence 5 coins with 3 heads)

Probability to have exactly 3 heads:
C(5, 3) p^3 q^2 = C(5, 3) (2/3)^3 (1/3)^2 = 80/243 ≈ 0.329
Theorem: Probability to have k successes in n independent Bernoulli trials:
b(k; n, p) = C(n, k) p^k q^(n-k)
Also known as the binomial probability distribution.

Proof:
The probability that a sequence has k successes and n-k failures in specified positions
(example: SFSFFS...SSF) is p^k q^(n-k).
The total number of sequences with k successes and n-k failures is C(n, k).
Therefore the probability of exactly k successes is C(n, k) p^k q^(n-k).
For the 5-toss example above: C(5, 3) p^3 q^2 = (5! / (3! 2!)) (2/3)^3 (1/3)^2.
End of Proof
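The binomial distribution is direct to implement; an illustrative Python sketch covering both the biased-coin example and the binary-string example that follows:

```python
from math import comb
from fractions import Fraction

def b(k, n, p):
    """Binomial distribution: probability of k successes in n Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# 3 heads in 5 tosses of the biased coin (p = 2/3)
assert b(3, 5, Fraction(2, 3)) == Fraction(80, 243)

# 8 zero bits out of 10 when p(0) = 0.9
assert abs(b(8, 10, 0.9) - 0.1937102445) < 1e-9
```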
Example: Random binary strings
The probability of a 0 bit is 0.9; the probability of a 1 bit is 0.1.
What is the probability of 8 bit 0s out of 10 bits? (e.g. 0100001000)
p = 0.9,  q = 0.1,  k = 8,  n = 10
b(k; n, p) = C(n, k) p^k q^(n-k) = C(10, 8) (0.9)^8 (0.1)^2 = 0.1937102445

Birthday Problem

Birthday collision: two people have their birthday on the same day.
Problem: How many people should be in a room so that the probability of a birthday collision is at least 1/2?
Assumption: equal probability to be born on any day.
366 days in a year.
If the number of people is 367 or more, then a birthday collision is guaranteed by the pigeonhole principle.
Assume that we have n ≤ 366 people.

We will compute
p_n: the probability that n people have all different birthdays.
It will help us to get
1 - p_n: the probability that there is a birthday collision among n people.

Sample space: Cartesian product of the birthday choices of the 1st, 2nd, ..., nth person
S = {1, 2, ..., 366} × {1, 2, ..., 366} × ... × {1, 2, ..., 366}
  = {(1,1,...,1), (2,1,...,1), ..., (366,366,...,366)}
Sample space size: |S| = 366 · 366 · ... · 366 = 366^n
Event set: each person's birthday is different
E = {(1,2,...,366), (366,1,...,365), ..., (366,365,...,1)}
|E| = P(366, n) = 366! / (366 - n)! = 366 · 365 · 364 ··· (366 - n + 1)
(366 choices for the 1st person's birthday, 365 for the 2nd, ..., 366 - n + 1 for the nth)

Probability of no birthday collision:
p_n = |E| / |S| = (366 · 365 · 364 ··· (366 - n + 1)) / 366^n

Probability of birthday collision: 1 - p_n
n = 22:  1 - p_n ≈ 0.475
n = 23:  1 - p_n ≈ 0.506
Therefore: n ≥ 23 people have probability at least 1/2 of a birthday collision.
Randomized algorithms: algorithms with randomized choices (example: quicksort)

Las Vegas algorithms: randomized algorithms whose output is always correct (e.g. quicksort)
Monte Carlo algorithms: randomized algorithms whose output is correct with some probability (may produce wrong output)

The birthday problem analysis can be used to determine appropriate hash table sizes that minimize collisions.
Hash function collision: h(k1) = h(k2)
A Monte Carlo algorithm: primality testing

Primality_Test(n, k) {
    for (i = 1 to k) {
        b = random_number(1,...,n)
        if (Miller_Test(n, b) == failure)
            return(false)   // n is not prime
    }
    return(true)   // most likely n is prime
}

Miller_Test(n, b) {
    // write n-1 = 2^s · t, where s, t > 0 (s ≤ log n) and t is odd
    for (j = 0 to s-1) {
        if (b^t ≡ 1 (mod n) or b^(2^j · t) ≡ -1 (mod n))
            return(success)
    }
    return(failure)
}
If the primality test algorithm returns false, then the number is not prime for sure.

A prime number n passes the Miller test for every 1 < b < n.
A composite number n passes the Miller test for fewer than n/4 numbers b in the range 1 < b < n.

If the algorithm returns true, then the answer is correct (the number is prime) with high probability:
the false positive probability is at most (1/4)^k, so the answer is correct with probability at least
1 - (1/4)^k ≥ 1 - 1/n    for k ≥ log_4 n
Bayes' Theorem

For events E and F with p(E) > 0 and p(F) > 0:

p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))

Applications: machine learning, spam filters

Proof:
p(F | E) = p(E ∩ F) / p(E)   =>   p(E ∩ F) = p(F | E) p(E)
p(E | F) = p(E ∩ F) / p(F)   =>   p(E ∩ F) = p(E | F) p(F)
Therefore p(F | E) p(E) = p(E | F) p(F), and so
p(F | E) = p(E | F) p(F) / p(E)

Since E = (E ∩ F) ∪ (E ∩ F̄) and (E ∩ F) ∩ (E ∩ F̄) = ∅:
p(E) = p(E ∩ F) + p(E ∩ F̄)
p(E ∩ F) = p(E | F) p(F)
p(E ∩ F̄) = p(E | F̄) p(F̄)
p(E) = p(E | F) p(F) + p(E | F̄) p(F̄)

Substituting p(E):
p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))
End of Proof
Example: Select a random box, then select a random ball in the box.

E: select red ball      Ē: select green ball
F: select box 1         F̄: select box 2

Question: If a red ball is selected, what is the probability it was taken from box 1?
Question probability: p(F | E) = ?

Bayes' Theorem:
p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))

We only need to compute: p(F), p(F̄), p(E | F), p(E | F̄)

p(F) = 1/2 = 0.5           (probability to select box 1)
p(F̄) = 1/2 = 0.5          (probability to select box 2)
p(E | F) = 7/9 ≈ 0.777     (probability to select a red ball from box 1)
p(E | F̄) = 3/7 ≈ 0.428    (probability to select a red ball from box 2)

Final result:
p(F | E) = (0.777 · 0.5) / (0.777 · 0.5 + 0.428 · 0.5)
         = 0.777 / (0.777 + 0.428)
         ≈ 0.644
What if we had more boxes?

Generalized Bayes' Theorem:
For a sample space S = F1 ∪ F2 ∪ ... ∪ Fn of mutually exclusive events:
p(Fj | E) = p(E | Fj) p(Fj) / Σ_{i=1}^{n} p(E | Fi) p(Fi)

Spam Filters

Training set:
B: spam (bad) emails
G: good emails
A user classifies each email in the training set as good or bad.

Find words that occur in B and G:
nB(w): number of spam emails that contain word w
nG(w): number of good emails that contain word w
p(w) = nB(w) / |B|    (probability that a spam email contains w)
q(w) = nG(w) / |G|    (probability that a good email contains w)

A new email X arrives.
S: event that X is spam
E: event that X contains word w
What is the probability that X is spam given that it contains word w?
p(S | E) = ?
Reject the email if this probability is at least 0.9.
p(S | E) = p(E | S) p(S) / (p(E | S) p(S) + p(E | S̄) p(S̄))

We only need to compute: p(S), p(S̄), p(E | S), p(E | S̄)
Simplified assumption: p(S) = p(S̄) = 0.5
p(E | S) = p(w) and p(E | S̄) = q(w) are computed from the training set.

Example:
Training set for the word "Rolex":

"Rolex" occurs in 250 of 2000 spam emails:
nB(Rolex) = 250
p(Rolex) = nB(Rolex) / |B| = 250/2000 = 0.125

"Rolex" occurs in 5 of 1000 good emails:
nG(Rolex) = 5
q(Rolex) = nG(Rolex) / |G| = 5/1000 = 0.005

If a new email X contains the word "Rolex", what is the probability that it is spam?
S: event that X is spam
E: event that X contains the word "Rolex"
p(S | E) = ?
With p(Rolex) = 0.125 and q(Rolex) = 0.005:
p(S | E) = p(E | S) p(S) / (p(E | S) p(S) + p(E | S̄) p(S̄))
         = (0.125 · 0.5) / (0.125 · 0.5 + 0.005 · 0.5)
         = 0.125 / 0.13 ≈ 0.961

The new email is considered to be spam because:
p(S | E) ≈ 0.961 ≥ 0.9 (the spam threshold)

Better spam filters use two words:
p(S | E1 ∩ E2) = p(E1 | S) p(E2 | S) / (p(E1 | S) p(E2 | S) + p(E1 | S̄) p(E2 | S̄))
Assumption: E1 and E2 are independent, i.e. the two words appear independently of each other.