Limit Theorems
Chia-Ping Chen
Professor
Department of Computer Science and Engineering
National Sun Yat-sen University
Probability
Introduction
bounds for probability
sequences of random variables
very large number of random variables
Prof. C. Chen
Limit Theorems
Probability Bounds
Markov Inequality
For any non-negative random variable X and a > 0
P(X ≥ a) ≤ E[X]/a
Proof
Define
Y(X) = 0 if X < a, and Y(X) = a if X ≥ a
⇒ Y(X) ≤ X
⇒ E[Y(X)] ≤ E[X]
⇒ a P(X ≥ a) ≤ E[X]
⇒ P(X ≥ a) ≤ E[X]/a
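As a quick numerical sanity check (an illustrative sketch, not part of the slides), the Markov bound can be compared with an estimated tail probability. The Exponential(1) distribution, sample size, and threshold a = 3 are arbitrary choices.

```python
import random

# Monte Carlo check of the Markov inequality P(X >= a) <= E[X]/a
# for a non-negative random variable; here X ~ Exponential(1).
random.seed(0)
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]  # E[X] = 1

a = 3.0
empirical = sum(x >= a for x in samples) / N   # estimates P(X >= a)
bound = (sum(samples) / N) / a                 # estimates E[X]/a

# The exact tail e^{-3} ~ 0.0498 sits well below the bound ~1/3,
# showing that Markov can be quite loose.
print(empirical, bound)
```

The gap between the two numbers illustrates the comparison made on the next slides: Markov uses only the mean, so its bound is often far from the true tail probability.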
Chebyshev Inequality
For any random variable X and c > 0
P(|X − E[X]| ≥ c) ≤ var(X)/c²
Proof
Define
Z(X) = 0 if |X − E[X]| < c, and Z(X) = c² if |X − E[X]| ≥ c
⇒ Z(X) ≤ (X − E[X])²
⇒ E[Z(X)] ≤ E[(X − E[X])²]
⇒ c² P(|X − E[X]| ≥ c) ≤ var(X)
⇒ P(|X − E[X]| ≥ c) ≤ var(X)/c²
Comparison
Markov inequality
holds for non-negative random variables only
bounds the probability of being at least a away from 0
the bound is inversely proportional to a
Chebyshev inequality
holds for any random variable
bounds the probability of being at least c away from the mean
the bound is inversely proportional to c²
Example 5.1 & 5.2
Let X be uniformly distributed in the interval [0, 4]. Then
E[X] = 2,  var(X) = 4/3
By the Markov inequality
P(X ≥ 2) ≤ E[X]/2 = 1
P(X ≥ 3) ≤ E[X]/3 = 2/3
By the Chebyshev inequality
P(|X − 2| ≥ 1) ≤ var(X)/1² = 4/3
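For the uniform distribution the exact probabilities are available in closed form, so the bounds above can be compared directly. A small sketch (not from the slides; the closed-form tails follow from P(X ≥ a) = (4 − a)/4 for X uniform on [0, 4]):

```python
# Exact probabilities for X uniform on [0, 4] versus the Markov and
# Chebyshev bounds of Examples 5.1 and 5.2.
mean, var = 2.0, 4.0 / 3.0

exact_ge2 = (4 - 2) / 4   # P(X >= 2) = 0.5,  Markov bound: mean/2 = 1
exact_ge3 = (4 - 3) / 4   # P(X >= 3) = 0.25, Markov bound: mean/3 = 2/3
exact_dev = 0.5           # P(|X - 2| >= 1) = P(X <= 1) + P(X >= 3) = 0.5,
                          # Chebyshev bound: var/1^2 = 4/3 (vacuous, > 1)

print(exact_ge2, exact_ge3, exact_dev)
```

Note that the Chebyshev bound 4/3 exceeds 1 here, so it carries no information for this particular c; the bounds are guarantees, not approximations.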
Example 5.3
When X takes values in [a, b], it can be shown that
var(X) ≤ (b − a)²/4
Using this result, the upper bound in the Chebyshev inequality can be replaced by a simpler bound
P(|X − E[X]| ≥ c) ≤ (b − a)²/(4c²)
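A numerical check of the variance bound (a sketch, not from the slides; the three distributions on [0, 1] are arbitrary choices, with the fair coin attaining the bound):

```python
import random
import statistics

# Check var(X) <= (b - a)^2 / 4 for X supported on [a, b] = [0, 1].
random.seed(1)
N = 50_000

uniform   = [random.random() for _ in range(N)]              # var = 1/12
bernoulli = [float(random.random() < 0.5) for _ in range(N)] # var = 1/4 (extreme case)
squared   = [random.random() ** 2 for _ in range(N)]         # var = 4/45

bound = (1 - 0) ** 2 / 4
variances = [statistics.pvariance(xs) for xs in (uniform, bernoulli, squared)]
print(variances, bound)
```

The bound is tight exactly when X puts probability 1/2 on each endpoint, which is why the Bernoulli(1/2) variance comes out at the bound itself.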
Sequence of Random Variables
Definition
A sequence of random variables is
X1 , X2 , . . .
An instantiation of a sequence of random variables is called a
sample sequence.
Examples
packet arrivals in a slot
inter-arrival times
daily currency exchange rates
feature sequence of acoustic signal
daily rainfalls at a particular location
sequence of DNA
words in a sentence
Independent and Identically Distributed Sequence
A sequence of random variables
X1 , X2 , . . .
is said to be iid (independent and identically distributed)
if the random variables are independent and have the same
probability distribution function.
The common mean and variance of the random variables in an
iid sequence will be denoted by
E[Xi] = µ,  var(Xi) = σ²
Sample Mean
The sample mean of a sequence
X1 , X2 , . . .
is defined by
Mn = (X1 + · · · + Xn)/n
For an iid sequence of random variables
E[Mn] = (E[X1] + · · · + E[Xn])/n = µ
var(Mn) = (var(X1) + · · · + var(Xn))/n² = σ²/n
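The identity var(Mn) = σ²/n can be checked by simulation (an illustrative sketch, not from the slides; uniform [0, 1] variables with σ² = 1/12 are an arbitrary choice):

```python
import random
import statistics

# Simulate many sample means of n iid uniform [0, 1] variables and
# compare their empirical variance with sigma^2 / n = (1/12) / n.
random.seed(2)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

n = 50
trials = 20_000
means = [sample_mean(n) for _ in range(trials)]

empirical_var = statistics.pvariance(means)  # treat the trials as a population
predicted_var = (1 / 12) / n
print(empirical_var, predicted_var)
```

The shrinking variance is exactly what drives the weak law of large numbers on the next slide: Chebyshev applied to Mn inherits the 1/n factor.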
Weak Law of Large Numbers
For an iid sequence of random variables
X1 , X2 , . . .
the sequence of sample means
M1 , M2 , . . .
converges to µ in the following sense: for any ε > 0
lim_{n→∞} P(|Mn − µ| ≥ ε) = 0
That is, the probability that Mn differs from µ by more than ε vanishes as n increases.
Proof
By the Chebyshev inequality, for any ε > 0, the probability that Mn differs from µ by more than ε is bounded by
P(|Mn − µ| ≥ ε) ≤ var(Mn)/ε²
Substituting var(Mn) = σ²/n, we have
P(|Mn − µ| ≥ ε) ≤ σ²/(nε²) −→ 0
Example 5.4 Relative Frequency
The relative frequency of occurrence of an event converges to
the probability of the event.
Loosely speaking
probability = relative frequency of occurrence
Explanation
Consider event A. Let
Xi = 1 if A occurs in the ith experiment, and Xi = 0 otherwise
Consider the sample mean of the iid sequence X1, X2, . . .
Mn = (X1 + · · · + Xn)/n
The right-hand side is the relative frequency of event A.
By the weak law, Mn converges to E[Xi], which is
E[Xi] = 1 · P(A) + 0 · (1 − P(A)) = P(A)
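The convergence of relative frequency to probability is easy to see numerically (a sketch, not from the slides; the die-roll event A = {roll shows 6} is an arbitrary illustrative choice):

```python
import random

# Estimate P(A) by the relative frequency of A over n repetitions,
# where A is the event that a fair die shows 6 (P(A) = 1/6).
random.seed(3)

n = 100_000
hits = sum(random.randint(1, 6) == 6 for _ in range(n))
relative_frequency = hits / n
print(relative_frequency)  # close to 1/6 ~ 0.1667
```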
Example 5.5 Poll
Let p be the fraction of voters who support Trump. We poll n
randomly selected voters and record Mn , the fraction of them
that support Trump. We view Mn as our estimate of p and
would like to investigate its properties.
Analysis of Poll Result
Let Xi indicate the support of the ith selected voter for Trump.
⇒ E[Xi] = p · 1 + (1 − p) · 0 = p
var(Xi) = E[Xi²] − (E[Xi])² = p(1 − p)
Mn = (X1 + · · · + Xn)/n
⇒ E[Mn] = p,  var(Mn) = p(1 − p)/n
By Chebyshev's inequality, for any ε > 0
P(|Mn − p| ≥ ε) ≤ p(1 − p)/(nε²) ≤ 1/(4nε²)
which can be made arbitrarily small by increasing n.
Application
In a poll, accuracy and confidence of the estimate can be
achieved simultaneously by using a large n.
To have the probability that Mn is within 0.01 of p be at least 0.95, i.e.
P(|Mn − p| ≥ 0.01) ≤ 0.05
we choose n such that
1/(4n(0.01)²) ≤ 0.05 ⇒ n ≥ 50000
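The sample-size computation above can be sketched as a one-line function (the function name `chebyshev_sample_size` is an assumption introduced here for illustration, not from the slides):

```python
import math

# Sample size from the Chebyshev bound 1 / (4 n eps^2) <= delta,
# i.e. n >= 1 / (4 eps^2 delta), as in Example 5.5.
def chebyshev_sample_size(eps, delta):
    n_min = 1 / (4 * eps ** 2 * delta)
    return math.ceil(round(n_min, 6))  # round guards against floating-point fuzz

n = chebyshev_sample_size(eps=0.01, delta=0.05)
print(n)  # 50000, matching the slide
```

The central limit theorem later in this chapter gives a much smaller sample size (9604) for the same accuracy and confidence, because Chebyshev's bound is distribution-free and therefore conservative.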
Convergence of Random Variables
Convergence of Real Numbers
A sequence of real numbers
r1 , r2 , . . .
is said to converge to a real number r if for any δ > 0
|rn − r| ≤ δ,
∀ n ≥ n0
for some n0 . This is denoted by
lim_{n→∞} rn = r
or
rn −→ r
As n increases, rn gets arbitrarily close to r and stays close.
From Real Numbers to Random Variables
Consider a sequence of random variables
Y1 , Y2 , . . .
What does it mean to say that the sequence converges to a
random variable?
convergence in probability
convergence with probability 1
Convergence in Probability
A sequence
Y1 , Y2 , . . .
is said to converge in probability to a random variable Y if
for any ε > 0
lim_{n→∞} P(|Yn − Y| ≥ ε) = 0
This is denoted by
Yn −P→ Y
As n increases, for each ε > 0 the probability that Yn differs from Y by more than ε becomes arbitrarily small and stays small.
A Closer Look
Convergence in probability can be understood in terms of convergence of real numbers. Specifically, for any ε > 0, the sequence of real numbers
pn(ε) = P(|Yn − Y| ≥ ε) −→ 0
On the other hand, sample sequences of Yn are not required to
converge!
Accuracy and Confidence
Convergence in probability,
Yn −P→ Y
also means that for any accuracy ε > 0 and confidence level δ > 0, there exists an n0 such that
P(|Yn − Y| ≥ ε) ≤ δ
for all n ≥ n0.
Test of Convergence in Probability
To decide whether
Yn −P→ Y
check whether
lim_{n→∞} P(|Yn − Y| ≥ ε) = 0
for every ε > 0.
Example 5.6 Convergence in Probability
Consider an iid sequence of random variables
X1 , X2 , . . .
in which Xi is uniformly distributed in the interval [0, 1], and let
Yn = min(X1 , . . . , Xn )
Is it true that
Yn −P→ 0 ?
Yes: for any ε > 0
P(|Yn − 0| ≥ ε) = P(X1 ≥ ε, . . . , Xn ≥ ε) = (1 − ε)^n
so
lim_{n→∞} P(|Yn − 0| ≥ ε) = lim_{n→∞} (1 − ε)^n = 0
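Example 5.6 can be checked by simulation (a sketch, not from the slides; the values of ε, n, and the number of runs are arbitrary choices):

```python
import random

# For Yn = min(X1, ..., Xn) with Xi uniform on [0, 1], estimate
# P(Yn >= eps) over many independent runs and compare with (1 - eps)^n.
random.seed(4)

eps = 0.1
n = 50
runs = 20_000
count = sum(min(random.random() for _ in range(n)) >= eps for _ in range(runs))

empirical = count / runs
exact = (1 - eps) ** n   # 0.9^50, already tiny
print(empirical, exact)
```

The agreement between the two numbers, together with (1 − ε)^n → 0, is the content of Yn −P→ 0.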
Example 5.7 Convergence in Probability
Let Y be an exponential random variable with parameter λ = 1.
For any positive integer n, let
Yn = Y/n,  n = 1, 2, . . .
Is it true that
Yn −P→ 0 ?
Yes: for any ε > 0
lim_{n→∞} P(|Yn − 0| ≥ ε) = lim_{n→∞} P(Y ≥ nε) = lim_{n→∞} e^(−nε) = 0
Example 5.8 Convergence in Probability
Consider a sequence of random variables
Y1 , Y2 , . . .
with
P(Yn = y) = 1 − 1/n if y = 1,  1/n if y = n²,  0 otherwise
Is it true that
Yn −P→ 1 ?
Yes: for any ε > 0, P(|Yn − 1| ≥ ε) ≤ P(Yn ≠ 1) = 1/n −→ 0.
How about
E[Yn] −→ 1 ?
No: E[Yn] = (1 − 1/n) · 1 + (1/n) · n² = n + 1 − 1/n −→ ∞, so convergence in probability does not imply convergence of expectations.
Re-Statement of the Weak Law of Large Numbers
For an iid sequence of random variables, the sample mean
converges to the common mean in probability
Mn −P→ µ
Normalized Sample Mean
For an iid sequence of random variables
X1 , X2 , . . .
the normalized sample mean is defined by
Zn = ((X1 + · · · + Xn) − nµ)/(σ√n)
which satisfies
E[Zn] = 0,  var(Zn) = 1
Central Limit Theorem
The normalized sample mean of an iid sequence of random
variables has the limiting distribution of a standard normal
random variable.
lim_{n→∞} P(Zn ≤ t) = P(Y ≤ t),  where Y is standard normal
= Φ(t)
= (1/√(2π)) ∫_{−∞}^{t} e^(−s²/2) ds
Approximation
The sum of iid random variables is approximately Gaussian
Sn = X1 + · · · + Xn = √n σ Zn + nµ ≈ N(nµ, nσ²)
⇒ P(Sn ≤ c) = P(X1 + · · · + Xn ≤ c)
= P((X1 + · · · + Xn − nµ)/(σ√n) ≤ (c − nµ)/(σ√n))
= P(Zn ≤ (c − nµ)/(σ√n))
≈ P(Y ≤ (c − nµ)/(σ√n))
= Φ((c − nµ)/(σ√n))
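The approximation P(Sn ≤ c) ≈ Φ((c − nµ)/(σ√n)) can be compared with a direct simulation (a sketch, not from the slides; n = 30 uniform [0, 1] summands and the threshold c = 16 are arbitrary choices):

```python
import math
import random

# CLT approximation versus simulation for Sn, a sum of n uniform [0, 1]
# variables with mu = 1/2 and sigma^2 = 1/12.
random.seed(5)

def phi(t):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

n, c = 30, 16.0
mu, sigma = 0.5, math.sqrt(1 / 12)

approx = phi((c - n * mu) / (sigma * math.sqrt(n)))

runs = 20_000
hits = sum(sum(random.random() for _ in range(n)) <= c for _ in range(runs))
simulated = hits / runs
print(approx, simulated)
```

Even at n = 30 the two numbers agree to about two decimal places; sums of uniforms converge to normality very quickly because the uniform distribution is symmetric.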
Example 5.9 Loading Packages
We load on a plane 100 packages whose weights are
independent random variables that are uniformly distributed
between 5 and 50 pounds. What is the probability that the
total weight will exceed 3000 pounds?
Solution
Let Xi be the weight of package i
E[Xi] = 27.5,  var(Xi) = (50 − 5)²/12 = 168.75
The total weight of n packages is
Sn = X1 + · · · + Xn
⇒ P(S100 > 3000) = P((S100 − 100 · 27.5)/√(100 · 168.75) > (3000 − 100 · 27.5)/√(100 · 168.75))
= P(Z100 > 1.92)
= 1 − P(Z100 ≤ 1.92)
≈ 1 − Φ(1.92)
= 0.0274
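The arithmetic of Example 5.9 can be reproduced directly (a sketch, not from the slides; `phi` is a helper defined here via the error function):

```python
import math

# Example 5.9: 100 iid uniform [5, 50] package weights;
# P(S100 > 3000) ~ 1 - Phi(1.92).
def phi(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

n = 100
mu = (5 + 50) / 2          # 27.5
var = (50 - 5) ** 2 / 12   # 168.75

z = (3000 - n * mu) / math.sqrt(n * var)
p_exceed = 1 - phi(z)
print(round(z, 2), p_exceed)  # z rounds to 1.92; probability near 0.027
```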
Example 5.10 Processing Parts
A machine processes parts, one part at a time. The processing
times of different parts are independent random variables,
uniformly distributed in [1, 5]. We wish to approximate the
probability that the number of parts processed within 320 time
units, denoted by N320 , is at least 100.
Solution
N320 ≥ 100 ⇔ S100 = T1 + · · · + T100 ≤ 320
Since
E[Ti] = 3,  var(Ti) = (5 − 1)²/12 = 4/3
⇒ P(S100 ≤ 320) = P((S100 − 100 · 3)/√(100 · (4/3)) ≤ (320 − 100 · 3)/√(100 · (4/3)))
= P(Z100 ≤ 1.73)
≈ P(Y ≤ 1.73)
= 0.9582
Example 5.11 Poll
We poll n voters and record the fraction Mn of those polled
who are in favor of a particular candidate. If p is the fraction of
the entire voter population that supports this candidate, then
Mn =
X1 + · · · + Xn
n
where Xi ’s are iid Bernoulli random variables with parameter p.
Given sample size n and accuracy ε, the central limit theorem can be used to bound
P(|Mn − p| > ε)
Given accuracy ε and confidence bound δ, a sample size n satisfying
P(|Mn − p| > ε) < δ
can be determined.
Bounding Probability
Mn − p = (X1 + · · · + Xn)/n − p = (X1 + · · · + Xn − np)/n
= ((X1 + · · · + Xn − np)/(√n σ)) · (σ/√n)
= Zn · σ/√n
⇒ P(|Mn − p| ≥ ε) ≈ 2P(Zn ≥ ε√n/σ)
≈ 2P(Y ≥ ε√n/σ)
= 2(1 − Φ(ε√n/σ))
≤ 2(1 − Φ(2ε√n))
where the last step uses σ = √(p(1 − p)) ≤ 1/2, so that ε√n/σ ≥ 2ε√n.
Sample Size
Consider the same accuracy and confidence set earlier
ε = 0.01,  δ = 0.05
We want to find sample size n such that
P(|Mn − p| ≥ 0.01) ≤ 0.05
Using the CLT-based bound
P(|Mn − p| ≥ ε) ≤ 2(1 − Φ(2ε√n)) ≤ δ
⇒ P(|Mn − p| ≥ 0.01) ≤ 2(1 − Φ(0.02√n)) ≤ 0.05
Using the standard normal table
Φ(0.02√n) ≥ 0.975 ⇒ 0.02√n ≥ 1.96 ⇒ n ≥ 9604
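The CLT-based sample size can be computed the same way as the Chebyshev one (a sketch, not from the slides; the function name `clt_sample_size` and the z argument are assumptions introduced for illustration):

```python
import math

# Sample size from the CLT bound 2(1 - Phi(2 eps sqrt(n))) <= delta:
# need 2 eps sqrt(n) >= z where Phi(z) = 1 - delta/2, i.e. n >= (z / (2 eps))^2.
def clt_sample_size(eps, delta, z):
    n_min = (z / (2 * eps)) ** 2
    return math.ceil(round(n_min, 6))  # round guards against floating-point fuzz

# For eps = 0.01 and delta = 0.05, the table gives z = 1.96.
n = clt_sample_size(eps=0.01, delta=0.05, z=1.96)
print(n)  # 9604, far smaller than the Chebyshev-based 50000
```

The gap between 9604 and 50000 quantifies how much sharper the normal approximation is than the distribution-free Chebyshev bound for the same accuracy and confidence.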
Convergence with Probability 1
A sequence of random variables is said to converge with
probability 1, or converge almost surely, to random
variable Y if
P(lim_{n→∞} Yn = Y) = 1
That is
P({ω : lim_{n→∞} Yn(ω) = Y(ω)}) = 1
This is denoted by
Yn −a.s.→ Y
Comparison with Convergence in Probability
Convergence with probability 1 implies convergence in
probability. The converse is not true.
Yn −a.s.→ Y ⇒ P(lim_{n→∞} Yn = Y) = 1
⇒ P(lim_{n→∞} Yn ≠ Y) = 0
⇒ P(lim_{n→∞} |Yn − Y| ≠ 0) = 0
⇒ lim_{n→∞} P(|Yn − Y| > ε) = 0 for every ε > 0
⇒ Yn −P→ Y
The converse is not true since convergence in probability does
not require sample sequences of Yn to converge, which is
required by convergence with probability 1.
Example 5.15 Convergence with Probability 1
Let
X1 , X2 , . . .
be a sequence of iid random variables uniformly distributed in
[0, 1], and
Yn = min(X1 , . . . , Xn ),
n = 1, 2, . . .
Show that Yn converges to 0 with probability 1.
Solution
Y1 ≥ Y2 ≥ · · · ≥ 0
Any sample sequence of Y1, Y2, . . . is non-increasing and bounded from below, so it converges. For any ε > 0
P(Yn > ε) = P(X1 > ε, . . . , Xn > ε) = (1 − ε)^n −→ 0
so the limit exceeds ε with probability 0 for every ε > 0. Hence
P(lim_{n→∞} Yn = 0) = 1
Example 5.16 Convergence
Consider a discrete-time arrival process. The time slots are
partitioned into consecutive intervals of the form
Ik = {2k , 2k + 1, . . . , 2k+1 − 1}. Note that the length of Ik is 2k ,
which increases with k. During each Ik , there is exactly one
arrival, and all times within an interval are equally likely. The
arrival times within different intervals are assumed independent.
Define Yn = 1 if there is an arrival at time n, and Yn = 0
otherwise. Show that Yn converges to 0 in probability, but Yn
does not converge to 0 with probability 1.
Comparison
Yn converges to 0 in probability, since for any 0 < ε < 1
P(|Yn − 0| > ε) = P(Yn = 1) = 1/2^⌊log₂ n⌋ −→ 0 as n → ∞
On the other hand, Yn does not converge to 0 with probability 1. In fact, no sample sequence converges at all: every sample sequence contains infinitely many 1s (one arrival in each interval Ik) as well as infinitely many 0s.
Strong Law of Large Numbers
For an iid sequence of random variables
X1 , X2 , . . .
the sequence of sample means
M1 , M2 , . . .
converges to µ with probability 1
Mn −a.s.→ µ
That is
P(lim_{n→∞} (X1 + · · · + Xn)/n = µ) = 1
Weak and Strong
Consider an iid sequence.
by the weak law, for any ε > 0, the probability that the sample mean differs from the common mean by more than ε converges to 0
by the strong law, the set of sample sequences whose sample means do not converge to the common mean has probability 0