Math 425: Introduction to Probability
Lecture 36

Kenneth Harris
[email protected]
Department of Mathematics, University of Michigan
April 18, 2009


Bounding Probabilities

Chebyshev's Inequality

A measure of the concentration of a random variable X near its mean µ is its
variance σ². Chebyshev's Inequality says that the probability that X lies
outside an arbitrary interval (µ − ε, µ + ε) is negligible, provided the ratio
σ/ε is sufficiently small.

Proposition (Chebyshev's Inequality)
If X is a random variable with finite mean µ and variance σ², then for any
value ε > 0,

    P{|X − µ| ≥ ε} ≤ σ²/ε².
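As a quick sanity check (not part of the original slides), the inequality can be
tested by simulation, assuming Python with NumPy is available. Here the
empirical tail probability of an exponential random variable is compared with
the bound σ²/ε²; the distribution and seed are arbitrary choices.

```python
# Empirical check of Chebyshev's inequality (illustrative; assumes NumPy).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1): mu = 1, sigma^2 = 1
mu, var = 1.0, 1.0

for eps in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(x - mu) >= eps)        # empirical P{|X - mu| >= eps}
    bound = var / eps**2                         # Chebyshev bound sigma^2 / eps^2
    print(f"eps={eps}: tail ~ {tail:.4f}, bound = {bound:.4f}")
```

In every case the empirical tail sits below the Chebyshev bound, usually by a
wide margin, which foreshadows the remark below that the bound is often crude.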
Proof of Chebyshev's Inequality

Fix ε > 0. For clarity we take X to be continuous with density f(x). The tail
probability is

    P{|X − µ| ≥ ε} = ∫_{−∞}^{µ−ε} f(x) dx + ∫_{µ+ε}^{∞} f(x) dx
                   = ∫_{|x−µ|≥ε} f(x) dx.

Compute the variance Var(X) = σ²:

    σ² = ∫_{−∞}^{∞} (x − µ)² · f(x) dx
       ≥ ∫_{|x−µ|≥ε} (x − µ)² · f(x) dx
       ≥ ε² ∫_{|x−µ|≥ε} f(x) dx          (since (x − µ)² ≥ ε² on this region)
       = ε² · P{|X − µ| ≥ ε}.

So indeed,

    P{|X − µ| ≥ ε} ≤ σ²/ε².

[Figure: the density f(x) with the two tail regions of {|X − µ| ≥ ε}, to the
left of µ − ε and to the right of µ + ε, shaded.]

Note. The proof for discrete X is exactly the same but for replacing "∫" with
"Σ" and density "f(x)" with mass "p(x)".
Example

Example. Chebyshev's inequality is the best possible inequality, in the sense
that there are random variables for which the inequality is in fact an
equality.

Fix ε > 0 and choose X with distribution

    pX(−ε) = 1/2        pX(ε) = 1/2.

So,

    µ = E[X] = 0        σ² = Var(X) = ε²,

and thus,

    1 = P{|X − µ| ≥ ε} ≤ σ²/ε² = 1.

Note: this distribution cuts out the center (µ − ε, µ + ε) and places the
entire mass of the tail of the distribution on ±σ.


Example

Chebyshev's inequality is quite crude, and we will see examples where it
provides a poor bound. However:

1. Chebyshev's inequality provides a bound for any distribution whatsoever, and
   can therefore be used even when no information about the distribution except
   its statistics (mean and variance) is known.
2. Chebyshev's inequality is the best we can do when only the statistics of the
   distribution are known (by the previous example).
3. Chebyshev's inequality is still useful enough to provide very general and
   powerful theorems, such as the Weak Law of Large Numbers. These results do
   not depend on finer determination of the probability of the "tail" of the
   distribution.
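The equality case in the two-point example above can also be checked directly
in a few lines of plain Python (not part of the original slides; the value of ε
is arbitrary).

```python
# The two-point distribution P{X = -eps} = P{X = eps} = 1/2 attains Chebyshev's bound.
eps = 0.5                                                       # any eps > 0 works
values, probs = (-eps, eps), (0.5, 0.5)

mu = sum(v * p for v, p in zip(values, probs))                  # mean = 0
var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))     # variance = eps^2
tail = sum(p for v, p in zip(values, probs) if abs(v - mu) >= eps)

print(tail, var / eps ** 2)      # both equal 1.0: the bound is attained
```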
Example

Example. Let X be a random variable with variance σ² = 0 and mean µ.
By Chebyshev's inequality, the probability that X lies outside the interval
(µ − ε, µ + ε) equals zero for any ε > 0:

    P{|X − µ| ≥ ε} ≤ σ²/ε² = 0.

So, the event {X = µ} has probability 1.

Let X be a random variable for which E[X²] = E[X]². Then

    Var(X) = E[X²] − E[X]² = 0.

Thus, P{X = E[X]} = 1.


Example

Example. Chebyshev's inequality provides a bound on the probability that a
distribution lies more than kσ (k standard deviations) from its mean.

Let X be any random variable with mean µ and variance σ². Use ε = kσ in
Chebyshev:

    P{|X − µ| ≥ kσ} ≤ σ²/(k²σ²) = 1/k².

Here is a comparison of Chebyshev's bound with the known probability for a
normally distributed random variable:

    k    Chebyshev (= 1/k²)    Normal
    1    1                     0.3173
    2    0.25                  0.0455
    3    0.1111                0.0027
    4    0.0625                0.0000633
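The table above can be reproduced with a short script (not part of the original
slides; it assumes SciPy is available for the standard normal tail).

```python
# Chebyshev's bound 1/k^2 versus the exact two-sided normal tail (assumes SciPy).
from scipy.stats import norm

for k in (1, 2, 3, 4):
    chebyshev = 1 / k ** 2          # P{|X - mu| >= k*sigma} <= 1/k^2 for any X
    exact = 2 * norm.sf(k)          # exact tail when X is normal
    print(f"k={k}: Chebyshev {chebyshev:.4f}, normal {exact:.7f}")
```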
Example

Example. Let X be the number of heads obtained in tossing a fair coin 100
times. Find a bound on P{41 ≤ X ≤ 59}.

The statistics for X are

    E[X] = 100 · (1/2) = 50        Var(X) = 100 · (1/2)² = 25.

Use Chebyshev's inequality to obtain the bound:

    P{41 ≤ X ≤ 59} = 1 − P{|X − 50| ≥ 10} ≥ 1 − 25/10² = 0.75.

Using the standard normal approximation to the binomial distribution:

    P{41 ≤ X ≤ 59} = P{40.5 ≤ X ≤ 59.5}
                   ≈ Φ((59.5 − 50)/5) − Φ((40.5 − 50)/5)
                   ≈ 0.9426.

The actual probability is about 0.9431.
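The three numbers above can be checked numerically (not part of the original
slides; assumes SciPy for the binomial and normal distributions).

```python
# Chebyshev bound vs. normal approximation vs. exact binomial (assumes SciPy).
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5            # 50 and 5

chebyshev = 1 - 25 / 10 ** 2                           # 1 - Var(X)/eps^2 with eps = 10
normal = norm.cdf((59.5 - mu) / sigma) - norm.cdf((40.5 - mu) / sigma)
exact = binom.cdf(59, n, p) - binom.cdf(40, n, p)      # P{41 <= X <= 59}

print(chebyshev, round(normal, 4), round(exact, 4))    # 0.75, ~0.9426, ~0.9431
```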
Sample Mean

Problem

Problem. We wish to determine the value of an unknown X by making a
measurement. However, we expect some variation in the measurement due to
measuring inaccuracies.

Here are some examples:

    X is the unknown bias of a coin.
    X is the angle of reflection in the collision of billiard balls.
    X is the average score on the final exam of a typical class in Probability
    theory.
Problem

Let X have mean µ and variance σ². Chebyshev's inequality puts a bound on the
accuracy ε of a single measurement of X:

    P{|X − µ| < ε} ≥ 1 − σ²/ε².

If the variation σ is very small compared to the desired accuracy ε, then this
probability will be very close to 1. That is, we can be "almost certain" that

    X − ε ≤ µ ≤ X + ε.

If σ is not small compared to ε, then a single measurement cannot provide the
desired accuracy with any certainty. We must either give up accuracy on a
single measurement . . .

. . . or else make a sufficiently large number of independent measurements.

A powerful consequence of Chebyshev's inequality is that if we average
sufficiently many independent measurements of X, then the sample mean, X̄, will
with high probability provide the desired accuracy in the measurement:

    X̄ − ε ≤ µ ≤ X̄ + ε.

The reason is that X̄ has the same mean µ, but smaller variance σ²/n. Sampling
reduces variation!

This result is the Weak Law of Large Numbers.
Definition: IID random variables

Definition
A finite (or infinite) sequence of random variables X1, X2, . . . is said to be
independent and identically distributed (IID) if they are mutually independent
and each has the same distribution function:

    P{Xi ≤ x} = F(x)        for all i and x.

Note that IID random variables have the same statistics: they have the same
mean µ and variance σ²:

    E[Xi] = µ        Var(Xi) = σ²        for all i.


Mean and variance of the Sample Mean

Proposition
Let X1, X2, . . . be IID random variables with finite common mean µ and
variance σ². Define the random variables sum Sn and average An for each n by

    Sn = X1 + X2 + . . . + Xn        An = Sn/n = (X1 + X2 + . . . + Xn)/n.

Then

    E[Sn] = n · µ        Var(Sn) = n · σ²
    E[An] = µ            Var(An) = σ²/n.

Note: An is called the sample mean of X1, . . . , Xn by statisticians and is
often denoted by X̄.
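A small simulation (not part of the original slides; assumes NumPy) illustrates
the proposition: averaging n IID measurements leaves the mean unchanged but
divides the variance by n. The choice of distribution, n, and seed is arbitrary.

```python
# Empirical mean and variance of the sample mean An (assumes NumPy).
import numpy as np

rng = np.random.default_rng(1)
n, reps = 25, 200_000
x = rng.exponential(scale=1.0, size=(reps, n))   # each Xi has mu = 1, sigma^2 = 1

a_n = x.mean(axis=1)                             # 200,000 realisations of An
print(a_n.mean(), a_n.var())                     # ~1.0 and ~1/25 = 0.04
```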
Proof

By the linearity property of expectation (no independence required):

    E[Sn] = E[X1] + . . . + E[Xn] = n · µ
    E[An] = E[Sn/n] = (1/n) E[Sn] = µ.

Since X1, . . . , Xn are independent, variance is additive:

    Var(Sn) = Var(X1) + . . . + Var(Xn) = n · σ²
    Var(An) = Var(Sn/n) = (1/n²) Var(Sn) = σ²/n.


Example

Example. Let X1, X2, . . . , Xn be the indicator variables for the individual
trials in a Bernoulli trials process with probability 0.3 of success. So,

    E[Xi] = 0.3        Var(Xi) = (0.3)(0.7) = 0.21.

The sample average of the Xi is

    An = (X1 + X2 + . . . + Xn)/n.

So,

    µ = E[An] = 0.3        σ² = Var(An) = 0.21/n.

Chebyshev's inequality for ε = 0.1 provides

    P{0.2 ≤ An ≤ 0.4} ≥ 1 − 0.21/(n(0.1)²) = (n − 21)/n.
Example – continued

    P{0.2 ≤ An ≤ 0.4} ≥ 1 − 21/n = (n − 21)/n.

Bounds for n = 50, n = 100 and n = 1000:

    P{0.2 ≤ A50 ≤ 0.4}   ≥ 0.58
    P{0.2 ≤ A100 ≤ 0.4}  ≥ 0.79
    P{0.2 ≤ A1000 ≤ 0.4} ≥ 0.979.

The actual values are

    P{0.2 ≤ A50 ≤ 0.4}   ≈ 0.836347
    P{0.2 ≤ A100 ≤ 0.4}  ≈ 0.962549
    P{0.2 ≤ A1000 ≤ 0.4} ≈ 1.


Example

Example. Suppose we choose at random n numbers in the interval [0, 1] with
uniform distribution. Let Xi be the ith choice. Then

    µ = E[Xi] = ∫₀¹ x dx = 1/2
    σ² = Var(Xi) = ∫₀¹ x² dx − µ² = 1/12.

Let An = (X1 + . . . + Xn)/n be the sample average. Then

    E[An] = 1/2        Var(An) = 1/(12n).

By Chebyshev's Inequality, for any ε > 0,

    P{|An − 1/2| < ε} ≥ 1 − 1/(12nε²).
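The bounds and the quoted "actual values" in the Bernoulli example above can be
reproduced as follows (not part of the original slides; assumes SciPy). Note
that An lies in [0.2, 0.4] exactly when the number of successes Sn lies in
[0.2n, 0.4n].

```python
# Chebyshev bound (n - 21)/n versus the exact binomial probability (assumes SciPy).
from scipy.stats import binom

for n in (50, 100, 1000):
    bound = (n - 21) / n                   # P{0.2 <= An <= 0.4} >= (n - 21)/n
    lo, hi = n // 5, (2 * n) // 5          # 0.2n and 0.4n (integers for these n)
    exact = binom.cdf(hi, n, 0.3) - binom.cdf(lo - 1, n, 0.3)
    print(f"n={n}: bound {bound:.3f}, exact {exact:.6f}")
```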
Example – continued

    P{|An − 1/2| < ε} ≥ 1 − 1/(12nε²).

This says that if we choose n numbers at random from [0, 1], then the chances
are better than 1 − 1/(12nε²) that the average of the chosen values differs
from 1/2 by less than ε.

Suppose ε = 0.1. Then

    P{|An − 1/2| < 0.1} ≥ 1 − 100/(12n).

For n = 100, the probability is about 0.92.
For n = 1000, the probability is about 0.99.
For n = 10,000, the probability is about 0.999.


Example

Example. Suppose we choose at random n numbers using the standard normal
distribution. Let Xi be the ith choice. Then

    µ = E[Xi] = 0        σ² = Var(Xi) = 1.

Let An = (X1 + . . . + Xn)/n be the sample average. Since each Xi is normal
with mean 0 and variance 1, An is itself normally distributed, and the
statistics for the average value are

    E[An] = 0        Var(An) = 1/n.

By Chebyshev's Inequality, for any ε > 0,

    P{|An − 0| ≥ ε} ≤ 1/(nε²).
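Since An is exactly normal here, its tail probability has a closed form,
P{|An| ≥ ε} = 2(1 − Φ(ε√n)), and can be compared with the Chebyshev estimate as
in the table on the next slide. A short script (not part of the original
slides; assumes SciPy):

```python
# Exact tail P{|An| >= 0.1} versus the Chebyshev bound 100/n (assumes SciPy).
from math import sqrt
from scipy.stats import norm

for n in range(100, 1001, 100):
    exact = 2 * norm.sf(0.1 * sqrt(n))     # An ~ N(0, 1/n)
    cheb = min(1.0, 100 / n)               # Chebyshev bound 1/(n * 0.1^2), capped at 1
    print(f"n={n}: exact {exact:.4f}, Chebyshev {cheb:.4f}")
```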
Example – continued

For ε = 0.1 Chebyshev gives:

    P{|An| ≥ 0.1} ≤ 100/n.

Here is a comparison of the actual probabilities (An is normally distributed
with mean 0 and variance n⁻¹) with the Chebyshev estimates:

    n       P{|An| ≥ 0.1}    Chebyshev
    100     0.3173           1.0000
    200     0.1573           0.5000
    300     0.0833           0.3333
    400     0.0455           0.2500
    500     0.0254           0.2000
    600     0.0143           0.1667
    700     0.0082           0.1429
    800     0.0047           0.1250
    900     0.0027           0.1111
    1000    0.0016           0.1000


Weak Law of Large Numbers

The Weak Law of Large Numbers

Theorem
Let X1, X2, . . . be a sequence of IID random variables with finite mean
E[Xi] = µ and finite variance Var(Xi) = σ². Let An = (X1 + . . . + Xn)/n be the
sample average. For any ε > 0,

    P{|An − µ| ≥ ε} → 0        as n → ∞.

Equivalently,

    P{|An − µ| < ε} → 1        as n → ∞.
Proof

Let X1, X2, . . . be IID. Then for the sample average An = (X1 + . . . + Xn)/n,

    E[An] = µ        Var(An) = σ²/n.

By Chebyshev's Inequality, for any fixed ε > 0,

    P{|An − µ| ≥ ε} ≤ σ²/(nε²).

Thus, for fixed ε,

    P{|An − µ| ≥ ε} → 0        as n → ∞.

Or equivalently,

    P{|An − µ| < ε} → 1        as n → ∞.


Measurements

We want to make a measurement X with expected value µ and with some accuracy ε,
where each measurement randomly varies with variance σ². We also want a high
degree of certainty p of accuracy:

    P{X̄ − ε ≤ µ ≤ X̄ + ε} ≥ p.

By Chebyshev's inequality, choose n sufficiently large so that

    1 − p ≥ σ²/(nε²)        equivalently        n ≥ σ²/((1 − p)ε²).

To increase the accuracy ε → ε/k, increase the trials by k²:

    k²n ≥ σ²/((1 − p)(ε/k)²).

To increase the accuracy by 1 decimal place, multiply trials by 100.
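The sample-size bound n ≥ σ²/((1 − p)ε²) is easy to wrap in a small helper
(not part of the original slides; plain Python). The function name and example
inputs are illustrative only.

```python
# Smallest n for which Chebyshev guarantees P{|An - mu| < eps} >= p.
from math import ceil

def trials_needed(sigma2, eps, p):
    return ceil(sigma2 / ((1 - p) * eps ** 2))

print(trials_needed(1.0, 0.10, 0.95))   # sigma^2 = 1, eps = 0.1, 95% certainty -> 2000
print(trials_needed(1.0, 0.01, 0.95))   # one more decimal place of accuracy -> 200000
```

The second call shows the rule stated above: asking for one more decimal place
of accuracy multiplies the required number of trials by 100.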
Example: Coin Tossing

Plot of the mass function of the random variable An (sample average) of a fair
coin tossed n times, for n = 10, 50, 100, 200. The mean of An is µ = 0.5. Note
that the larger n is, the greater the percentage of area contained in the
interval (0.45, 0.55), as predicted by the Law of Large Numbers.

[Figure: four panels, n = 10, 50, 100, 200.]


Example: Random Numbers

Plot of the density of the random variable An (sample average) of n reals
uniformly chosen in [0, 1], for n = 5, 10, 20, 30. The mean of An is µ = 0.5.
Note that the larger n is, the greater the percentage of area contained in the
interval (0.45, 0.55), as predicted by the Law of Large Numbers.

[Figure: four panels, n = 5, 10, 20, 30.]
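The coin-tossing panels can be regenerated approximately with a few lines of
plotting code (not part of the original slides; assumes NumPy, SciPy and
matplotlib; the figure size and bar width are arbitrary choices).

```python
# Mass function of An = (# heads)/n for a fair coin, for several n.
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, n in zip(axes.flat, (10, 50, 100, 200)):
    k = np.arange(n + 1)
    ax.bar(k / n, binom.pmf(k, n, 0.5), width=0.8 / n)   # P{An = k/n}
    ax.set_title(f"n = {n}")
    ax.set_xlim(0, 1)
plt.tight_layout()
plt.show()
```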
Example: Coin tossing

Example – continued

Consider a Bernoulli trials process with IID indicator variables X1, X2, . . .
denoting whether the trial was a success or failure. Suppose the probability of
success is p. So,

    E[Xi] = p        Var(Xi) = p(1 − p).

Let An = (X1 + . . . + Xn)/n be the sample average over n trials. So,

    E[An] = p        Var(An) = p(1 − p)/n.

By Chebyshev's inequality, for any ε > 0,

    P{|An − p| ≥ ε} ≤ p(1 − p)/(nε²).


Example – continued

The variance σ² = p(1 − p) has a maximum value of 1/4, achieved at p = 1/2.

[Figure: the parabola p(1 − p) on [0, 1], peaking at 1/4 when p = 1/2.]

Plugging back into Chebyshev gives us a bound on the deviation of the sample
average from the mean:

    P{|An − p| ≥ ε} ≤ p(1 − p)/(nε²) ≤ 1/(4nε²).
Example

We have the following bound on the deviation of the sample average An from the
mean p using Chebyshev's inequality:

    P{|An − p| ≥ ε} ≤ 1/(4nε²).

Example. We have two coins: one is fair and the other produces heads with
probability 3/4. One coin is picked at random. How many tosses suffice for us
to be 95 percent sure of which coin we had?

To make this problem more concrete: if the proportion of heads is less than
0.625, then we will guess the coin was fair; otherwise, if the proportion of
heads is greater than 0.625, we will guess the biased coin.

How many tosses will suffice for 95 percent certainty that the chosen coin will
not deviate by more than ε = 0.125 from its mean?


Example – continued

We want n large enough so that we have only 5% error:

    1/(4nε²) ≤ 0.05        equivalently        n ≥ 1/(4(0.05)ε²).

We now have a bound on the number of trials needed without needing to know the
mean or the variance.
Example – continued

For ε = 0.125 choose n so that

    n ≥ 1/(4(0.05)(0.125)²)        equivalently        n ≥ 320.

By tossing the coin n ≥ 320 times we can be 95% certain the sample average is
within 0.125 of the true bias p of the coin toward heads:

    P{|An − p| ≥ 0.125} ≤ 0.05.

Toss the coin 320 times and count heads.
If fewer than 200 heads appear, guess the fair coin.
If more than 200 heads appear, guess the biased coin.
If exactly 200 heads appear, then laugh at your (bad?) luck.

You can be 95 percent certain you chose the right coin.


Degree of Certainty vs. Number of Trials

To achieve certainty p that we are within ε = 0.125 of the mean requires n
trials, where

    n ≥ 1/(4(1 − p)(0.125)²).

    Degree of Certainty    Number of Trials
    50%                    32
    75%                    64
    90%                    160
    95%                    320
    99%                    1600
    99.9%                  16,000
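A simulation of the whole procedure (not part of the original slides; assumes
NumPy; the seed and the number of repetitions are arbitrary) confirms that 320
tosses with the 200-head cutoff identify the coin far more reliably than the
95% that Chebyshev guarantees, and the certainty table above can be regenerated
from the same formula.

```python
# Simulate the two-coin test: 320 tosses, guess "fair" if fewer than 200 heads (assumes NumPy).
import numpy as np

rng = np.random.default_rng(2)
n, cutoff, reps = 320, 200, 100_000

correct = 0
for _ in range(reps):
    fair = rng.random() < 0.5                  # pick one of the two coins at random
    heads = rng.binomial(n, 0.5 if fair else 0.75)
    guess_fair = heads < cutoff                # ties (exactly 200 heads) count as "biased" here
    correct += (guess_fair == fair)

print(correct / reps)                          # empirical success rate, well above 0.95

# Trials needed for each degree of certainty, n >= 1/(4(1 - p) * 0.125^2), rounded.
for p in (0.5, 0.75, 0.9, 0.95, 0.99, 0.999):
    print(p, round(1 / (4 * (1 - p) * 0.125 ** 2)))
```

The large gap between the simulated success rate and 95% is another reminder
that Chebyshev's bound is a worst-case guarantee, not an estimate of the actual
probability.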