Advanced Algorithms – COMS31900
2013/2014
Lecture 1: Mathematical preliminaries
Markus Jalsenius

Probability

Sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x), is a real number
between 0 and 1, such that ∑_{x∈S} Pr(x) = 1.

EXAMPLE
The outcome of a die roll: S = {1, 2, 3, 4, 5, 6}.
Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1/6.

EXAMPLE
Flip a coin: S = {H, T}.
Pr(H) = Pr(T) = 1/2.

EXAMPLE
Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10,000, £100,000}.
Pr(£0) = 0.9, Pr(£10) = 0.08, ..., Pr(£100,000) = 0.0001.
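
As a quick illustration (a minimal Python sketch of my own, not from the slides; the trial count is an arbitrary choice), the die probabilities can be estimated by simulation:

    import random
    from collections import Counter

    # Estimate Pr(x) for each face of a fair die by simulation.
    trials = 100_000
    counts = Counter(random.randint(1, 6) for _ in range(trials))
    for x in sorted(counts):
        print(x, counts[x] / trials)  # each frequency close to 1/6 ≈ 0.167

    # The estimated probabilities sum to 1, matching ∑_{x∈S} Pr(x) = 1.
    print(sum(counts.values()) / trials)
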
Probability

OBSERVE
The sample space is not necessarily finite.

EXAMPLE
Flip a coin until the first tail shows up:
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, ...}.
Pr(it takes n coin flips) = (1/2)^n, and ∑_{n=1}^∞ (1/2)^n = 1.

Event

An event is a subset V of the sample space S.
The probability of event V happening, denoted Pr(V), is
Pr(V) = ∑_{x∈V} Pr(x).

EXAMPLE
Flip a coin 3 times.
There are 8 outcomes, each with probability 1/8.
Event V: the first and last coin flips are the same.
Pr(V) = Pr(HHH) + Pr(HTH) + Pr(THT) + Pr(TTT) = 4 · (1/8) = 1/2.
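
Both examples are easy to check by simulation. A small sketch (mine, not from the slides; the coin is modelled as random.random() < 0.5):

    import random

    def flips_until_tail():
        # Flip until the first tail; return how many flips it took.
        n = 1
        while random.random() < 0.5:  # heads with probability 1/2
            n += 1
        return n

    trials = 100_000
    samples = [flips_until_tail() for _ in range(trials)]
    for n in range(1, 6):
        print(n, samples.count(n) / trials, 0.5 ** n)  # empirical vs (1/2)^n

    # Event V: the first and last of three flips are the same.
    same = 0
    for _ in range(trials):
        f = [random.random() < 0.5 for _ in range(3)]
        same += f[0] == f[2]
    print(same / trials)  # close to 1/2
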
Random variable

A random variable (r.v.) Y over sample space S is a function S → R
(a mapping from every outcome to the real numbers).
The probability of Y taking value y is
Pr(Y = y) = ∑_{x∈S : Y(x)=y} Pr(x).
The expected value (the mean) of a r.v. Y, denoted E(Y), is
E(Y) = ∑_{x∈S} Y(x) · Pr(x).

EXAMPLE
Two coin flips, with Y as in the table:

  S:  HH  HT  TH  TT
  Y:   2   1   5   2

Pr(Y = 2) = 1/2 and E(Y) = 2 · (1/2) + 1 · (1/4) + 5 · (1/4) = 5/2.

Linearity of expectation

THEOREM
Let Y1, Y2, ..., Yk be k random variables. Then
E(∑_{i=1}^k Yi) = ∑_{i=1}^k E(Yi).

OBSERVE
Linearity of expectation always holds, regardless of whether the random
variables Yi are dependent or not.

EXAMPLE
What is the expected sum of two dice rolls?
Sum over all possible outcomes:
E(sum) = 2 · (1/36) + 3 · (2/36) + 4 · (3/36) + ... + 12 · (1/36) = 7.
Couple the dice so that their outcomes are always the same.
Individually, they are still normal dice.
E(sum) = 2 · (1/6) + 4 · (1/6) + 6 · (1/6) + 8 · (1/6) + 10 · (1/6) + 12 · (1/6) = 7.
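
The coupled-dice example can be checked numerically. A minimal sketch (mine; coupling is modelled by having the second die copy the first):

    import random

    trials = 100_000

    # Independent dice.
    indep = sum(random.randint(1, 6) + random.randint(1, 6) for _ in range(trials))

    # Coupled dice: the second die always shows the same value as the first.
    coupled = sum(2 * random.randint(1, 6) for _ in range(trials))

    print(indep / trials)    # ≈ 7
    print(coupled / trials)  # ≈ 7, linearity of expectation is unaffected
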
Indicator random variables

An indicator random variable is a r.v. that can only take the values 0 and 1.
We usually use the letter I to emphasise that the r.v. is an indicator r.v.
Property: E(I) = 0 · Pr(I = 0) + 1 · Pr(I = 1) = Pr(I = 1).
Often an indicator r.v. I is associated with an event such that if the
event happens then I = 1, otherwise I = 0.
Indicator random variables and linearity of expectation work great
together!

EXAMPLE
Roll a die n times. How many die rolls do we expect to show a value
that is at least the value of the previous roll?
For j ∈ {2, ..., n}, indicator r.v. Ij = 1 if the value of the jth
roll is at least the value of the previous roll, otherwise Ij = 0.
Pr(Ij = 1) = 7/12 (verify this!). Use linearity of expectation:
E(∑_{j=2}^n Ij) = ∑_{j=2}^n E(Ij) = (n − 1) · (7/12).

Mean

EXAMPLE
Flip a coin n times.
Let r.v. T be the number of tails. E(T) = n/2.
How likely is T to be within 5% of n/2?

  n        Chance
  10       24%
  100      38%
  200      52%
  1000     89%
  100,000  (100 − 2.6 · 10^−54)%

OBSERVE
The sample average of independent, identically distributed random variables
converges in probability towards their mean (the weak law of large numbers).
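
A short sketch (mine; n = 20 and the trial count are arbitrary choices) that verifies Pr(Ij = 1) = 7/12 exactly and checks the expectation by simulation:

    import random

    # Exact check: among all 36 equally likely ordered pairs of rolls,
    # the second is at least the first in 21 cases, and 21/36 = 7/12.
    print(sum(b >= a for a in range(1, 7) for b in range(1, 7)) / 36)

    # Simulation of the expected count for n rolls.
    n, trials = 20, 10_000
    total = 0
    for _ in range(trials):
        rolls = [random.randint(1, 6) for _ in range(n)]
        total += sum(rolls[j] >= rolls[j - 1] for j in range(1, n))
    print(total / trials, (n - 1) * 7 / 12)  # both ≈ 11.08
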
Markov's inequality

EXAMPLE
Suppose the average (mean) speed on the motorway is 60 mph.
Then at most 1/2 of all cars run at 120 mph or more,
and at most 2/3 of all cars run at 90 mph or more,
otherwise the mean would have to be higher than 60 mph.

THEOREM
If X is a non-negative r.v., then for all a > 0,
Pr(X ≥ a) ≤ E(X)/a.

Go through the proof as an exercise. Indicator random variables and
linearity of expectation will come in handy again.

EXAMPLE
From the example above:
Pr(speed of a random car ≥ 120 mph) ≤ 60/120 = 1/2,
Pr(speed of a random car ≥ 90 mph) ≤ 60/90 = 2/3.

EXAMPLE
n people go to a party, leaving their hats at the door. Each person leaves
with a random hat. How many leave with their own hat?
For j ∈ {1, ..., n}, let indicator r.v. Ij = 1 if the jth person gets their
own hat, otherwise Ij = 0. By linearity of expectation,
E(∑_{j=1}^n Ij) = ∑_{j=1}^n E(Ij) = ∑_{j=1}^n Pr(Ij = 1) = n · (1/n) = 1.
By Markov's inequality (recall: Pr(X ≥ a) ≤ E(X)/a),
Pr(5 or more people leave with their own hats) ≤ 1/5,
Pr(at least 1 person leaves with their own hat) ≤ 1/1 = 1.
Sometimes Markov's inequality is not particularly informative.
(In fact, here it can be shown that as n → ∞, the probability that at least
one person leaves with their own hat tends to 1 − 1/e.)
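
A random hat assignment can be modelled as a uniformly random permutation. A sketch (my own reading of the example; n = 100 is arbitrary):

    import math
    import random

    n, trials = 100, 50_000
    own_total = 0
    at_least_one = 0
    for _ in range(trials):
        hats = list(range(n))
        random.shuffle(hats)  # a uniformly random assignment of hats
        own = sum(1 for j in range(n) if hats[j] == j)  # people with own hat
        own_total += own
        at_least_one += own >= 1

    print(own_total / trials)     # ≈ 1, matching E(∑ Ij) = 1
    print(at_least_one / trials)  # ≈ 0.632, far below Markov's bound of 1
    print(1 - 1 / math.e)         # 0.6321...
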
Markov's inequality

COROLLARY
If X is a non-negative r.v. that only takes integer values, then
Pr(X > 0) = Pr(X ≥ 1) ≤ E(X).

EXAMPLE
The probability of winning money on a lottery is at most the expected gain.

OBSERVE
For an indicator r.v. I, the bound is tight, as Pr(I > 0) = E(I).

Union bound

EXAMPLE
S = {1, ..., 6} is the set of outcomes of a die roll.
Three events:
V1 = {3, 4, 5} (between 3 and 5),
V2 = {1, 2, 3} (3 or less),
V3 = {2, 3, 5} (a prime).

[Venn diagram of V1, V2 and V3 within S.]

Pr(V1 or V2 or V3) = 5/6 ≤ Pr(V1) + Pr(V2) + Pr(V3) = 3/2,
double counting Pr(2) and Pr(5), and triple counting Pr(3).
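
Checking the example exactly (a small sketch, mine; the die outcomes are equally likely, so probabilities are set sizes over 6):

    # V1, V2, V3 as in the example above.
    V1, V2, V3 = {3, 4, 5}, {1, 2, 3}, {2, 3, 5}

    union = V1 | V2 | V3                      # {1, 2, 3, 4, 5}
    print(len(union) / 6)                     # Pr(V1 or V2 or V3) = 5/6
    print((len(V1) + len(V2) + len(V3)) / 6)  # union bound: 3/2
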
Union bound

THEOREM
Let V1, ..., Vk be k events. Then
Pr(⋃_{i=1}^k Vi) ≤ ∑_{i=1}^k Pr(Vi).

OBSERVE
The bound is tight when the events are all disjoint.

PROOF
Let indicator r.v. Ij = 1 if event Vj happens, otherwise Ij = 0.
Let X = ∑_{j=1}^k Ij be the number of events happening. Then
Pr(⋃_{j=1}^k Vj) = Pr(X > 0) ≤ E(X) = ∑_{j=1}^k E(Ij) = ∑_{j=1}^k Pr(Vj),
where the inequality uses the previous corollary and the second equality
uses linearity of expectation.

Binomial coefficients

The number of ways to choose k elements from a set of n elements is
(n choose k) = n! / (k! (n − k)!).
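
In Python the same quantity is available as math.comb; a one-off sanity check against the factorial formula (sketch, mine; n and k are arbitrary):

    import math

    n, k = 10, 4
    print(math.comb(n, k))  # 210
    print(math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))  # 210
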
Binomial coefficients

LEMMA
(n/k)^k ≤ (n choose k) ≤ (n · e/k)^k.

PROOF
The lower bound:
The number of ways of picking k balls from n balls with repetition
allowed is n^k. One can generate all the possible ways by first
deciding which k out of n balls to draw, and then drawing from the k
selected balls instead (this will indeed overcount the number of ways).
There are (n choose k) ways to choose the k balls and k^k ways to pick
from the selected k balls with repetition allowed.
This gives us
n^k ≤ (n choose k) · k^k ⟺ (n/k)^k ≤ (n choose k).

The upper bound:
(n choose k) ≤ n^k/k! = (n^k/k^k) · (k^k/k!) ≤ (n^k/k^k) · ∑_{i=0}^∞ k^i/i! = (n^k/k^k) · e^k = (n · e/k)^k,
using that k^k/k! is a single term of the series ∑_{i=0}^∞ k^i/i! = e^k.
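
A numerical spot-check of the lemma (sketch, mine; the (n, k) pairs are arbitrary):

    import math

    # Check (n/k)^k <= C(n, k) <= (n*e/k)^k for a few values of n and k.
    for n, k in [(10, 3), (50, 7), (100, 100)]:
        lower = (n / k) ** k
        exact = math.comb(n, k)
        upper = (n * math.e / k) ** k
        assert lower <= exact <= upper
        print(n, k, round(lower), exact, round(upper))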