Independent random variables
E6711: Lectures 3
Prof. Predrag Jelenković
1 Last two lectures
probability spaces
probability measure
random variables and stochastic processes
distribution functions
independence
conditional probability
memoryless property of geometric and exponential distributions
expectation
conditional expectation (double expectation)
mean-square estimation
Let $\{X_j;\ j \ge 1\}$ be a sequence of independent random variables and

$$N_n = \sum_{j=1}^{n} X_j$$

be a partial sum of the first $n$ of these r.v.s. In many applications, understanding the statistical behavior of these sums is very important. Thus, a big part of probability theory studies the characteristics of $N_n$.
In this lecture we review some of the well-known theorems of
probability theory:
Markov and Chebyshev’s inequalities
Laws of Large Numbers
Central Limit Theorem
2 Inequalities
Proposition 2.1 (Markov's inequality) If $X$ is a nonnegative random variable, then for any $a > 0$

$$P[X \ge a] \le \frac{E X}{a}.$$

Proof: For $a > 0$, let us define an indicator function

$$1[X \ge a] = \begin{cases} 1 & \text{if } X \ge a \\ 0 & \text{otherwise.} \end{cases}$$

Then,

$$1[X \ge a] \le \frac{X}{a};$$

thus, by taking the expected value on both sides in the preceding inequality we obtain

$$E\, 1[X \ge a] = P[X \ge a] \le \frac{E X}{a}. \qquad \Box$$
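As a quick numerical sanity check (not part of the original notes, and assuming NumPy is available), the following sketch compares the empirical tail $P[X \ge a]$ with the Markov bound $E X / a$ for an exponential random variable with mean 1; the distribution, thresholds, and sample size are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of Markov's inequality for X ~ Exponential(1), so E X = 1.
# Distribution, thresholds and sample size are arbitrary illustrative choices.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)

for a in [1.0, 2.0, 5.0]:
    tail = np.mean(X >= a)      # empirical P[X >= a]
    bound = np.mean(X) / a      # Markov bound E X / a
    print(f"a = {a}: P[X >= a] ~ {tail:.4f} <= E X / a ~ {bound:.4f}")
```

Replacing $X$ by $(X - \mu)^2$ and $a$ by $\epsilon^2$ in the same check gives the corresponding experiment for Chebyshev's inequality below.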
Corollary 2.1 (Chebyshev's inequality) If $X$ is a random variable with finite mean $\mu$ and variance $\sigma^2 = E(X - \mu)^2$, then for any $\epsilon > 0$

$$P[|X - \mu| \ge \epsilon] \le \frac{\sigma^2}{\epsilon^2}.$$

Proof: Let $Y = (X - \mu)^2$; then

$$P[|X - \mu| \ge \epsilon] = P[(X - \mu)^2 \ge \epsilon^2] = P[Y \ge \epsilon^2] \le \frac{E Y}{\epsilon^2} = \frac{\sigma^2}{\epsilon^2},$$

where the last inequality follows from Markov's inequality.
Corollary 2.2 (Chernoff's bound) Let $M(t) \stackrel{\mathrm{def}}{=} E\, e^{tX} < \infty$ for some $t > 0$; then

$$P[X \ge y] \le e^{-ty} M(t).$$

Proof: Let $Y = e^{tX}$ and $a = e^{ty}$; then

$$P[X \ge y] = P[tX \ge ty] = P[e^{tX} \ge e^{ty}] = P[Y \ge a] \le \frac{E Y}{a} = e^{-ty} M(t);$$

note that the last inequality follows from Markov's inequality. $\Box$
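As an illustration (again my addition, not part of the notes), for a standard normal $X$ we have $M(t) = e^{t^2/2}$, so optimizing the Chernoff bound over $t > 0$ gives $P[X \ge y] \le e^{-y^2/2}$ (minimum at $t = y$). The sketch below compares this bound with the exact tail; the use of scipy.stats for the exact tail is an assumption about the available libraries.

```python
import numpy as np
from scipy.stats import norm

# Chernoff bound for X ~ N(0, 1): M(t) = exp(t^2 / 2), so
# min_{t>0} exp(-t*y) * M(t) = exp(-y^2 / 2), attained at t = y.
for y in [1.0, 2.0, 3.0]:
    bound = np.exp(-y ** 2 / 2)   # optimized Chernoff bound
    exact = norm.sf(y)            # exact tail P[X >= y]
    print(f"y = {y}: exact tail {exact:.5f} <= Chernoff bound {bound:.5f}")
```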
3 Laws of Large Numbers: ergodic theorems
Ergodic theory studies the conditions under which the sample path average

$$Y \stackrel{\mathrm{def}}{=} \frac{X_1 + \cdots + X_n}{n} \qquad (3.1)$$

converges to the mean $\mu = E X_1$ as $n \to \infty$.
Theorem 3.1 (Weak Law of Large Numbers) Let $X_1, X_2, \ldots$, be a sequence of independent random variables with finite mean $\mu$ and variance $\sigma^2$. Then, for any $\epsilon > 0$

$$P\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right] \to 0 \quad \text{as } n \to \infty.$$
Proof: Recall the definition of $Y$ from equation (3.1), then

$$E Y = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \mu$$

and

$$\mathrm{Var}(Y) = \mathrm{Var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)}{n^2} = \frac{\sigma^2}{n}.$$

Thus, by Chebyshev's inequality

$$P[|Y - \mu| \ge \epsilon] \le \frac{\sigma^2}{n \epsilon^2} \to 0 \quad \text{as } n \to \infty,$$

and we conclude the proof of the theorem. $\Box$
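A minimal simulation sketch of the weak law (my addition; the Uniform(0,1) distribution, $\epsilon = 0.05$, and the number of repetitions are arbitrary assumptions): it estimates $P[|(X_1 + \cdots + X_n)/n - \mu| \ge \epsilon]$ for growing $n$ and compares it with the Chebyshev bound $\sigma^2/(n\epsilon^2)$.

```python
import numpy as np

# Weak LLN check for Uniform(0,1) r.v.s: mu = 0.5, sigma^2 = 1/12.
rng = np.random.default_rng(1)
mu, var, eps, reps = 0.5, 1 / 12, 0.05, 10_000

for n in [10, 100, 1000]:
    sample_means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    p_dev = np.mean(np.abs(sample_means - mu) >= eps)   # empirical deviation probability
    chebyshev = var / (n * eps ** 2)                     # Chebyshev bound
    print(f"n = {n}: P[|mean - mu| >= eps] ~ {p_dev:.4f}, Chebyshev bound {chebyshev:.3f}")
```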
Now we know that

$$P\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right]$$

converges to zero; however, it is not clear how fast. This problem is investigated by the theory of Large Deviations.

Recall $M(t) = E\, e^{tX_1}$ and define the rate function

$$l(a) \stackrel{\mathrm{def}}{=} -\log \inf_{t \ge 0} e^{-ta} M(t) = \sup_{t \ge 0} \left(ta - \log M(t)\right).$$
Then

Theorem 3.2 For every $a > E X_1$ and $n \ge 1$

$$P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] \le e^{-n l(a)}.$$

Proof: For any $t > 0$

$$P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] = P\left[e^{t(X_1 + X_2 + \cdots + X_n)} \ge e^{tan}\right]$$
$$\le \frac{E\, e^{t(X_1 + X_2 + \cdots + X_n)}}{e^{tan}} \qquad \text{(Chernoff's inequality)}$$
$$= \left(e^{-ta}\, E\, e^{tX_1}\right)^n.$$

Thus,

$$P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] \le \inf_{t \ge 0} \left(e^{-ta}\, E\, e^{tX_1}\right)^n = e^{-n l(a)}. \qquad \Box$$
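As a worked example (not carried out in the original notes), if the $X_j$ are standard normal, then $M(t) = e^{t^2/2}$ and $ta - \log M(t) = ta - t^2/2$, which is maximized over $t \ge 0$ at $t = a$ for $a > 0$. Hence

$$l(a) = \frac{a^2}{2} \qquad \text{and} \qquad P\left[\frac{X_1 + X_2 + \cdots + X_n}{n} \ge a\right] \le e^{-n a^2/2},$$

so the deviation probability decays exponentially fast in $n$.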
The preceding two theorems estimate the probabilities that a sample path mean is close to the (ensemble) mean. The following theorem goes one step further in showing that, for almost every fixed $\omega$, the sample path average converges to the mean (in the ordinary deterministic sense).
Theorem 3.3 (Strong Law of Large Numbers) Let $X_1, X_2, \ldots$, be a sequence of independent random variables with finite mean $\mu$ and $K \stackrel{\mathrm{def}}{=} E X_1^4 < \infty$. Then, for almost every $\omega$ (or with probability 1)

$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty.$$

Remark: For this theorem to hold it is enough to assume that the mean $\mu = E X_1$ exists (i.e., it could even be infinite). However, in order to present a simpler proof, we impose the stronger assumption $E X_1^4 < \infty$.
Proof: To begin, assume that $\mu = E X_j = 0$; then

$$E N_n^4 = E[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)].$$

Now, expanding the right-hand side of the equation above will result in terms of the form ($i \ne j \ne k \ne l$)

$$E X_i^4,$$
$$E[X_i^3 X_j] = E X_i^3\, E X_j = 0 \quad \text{by independence},$$
$$E X_i^2 X_j^2,$$
$$E[X_i^2 X_j X_k] = E X_i^2\, E X_j\, E X_k = 0 \quad \text{by independence},$$
$$E[X_i X_j X_k X_l] = E X_i\, E X_j\, E X_k\, E X_l = 0 \quad \text{by independence}.$$

Next, there are $n$ terms of the form $E X_i^4$, and for each pair $i \ne j$ there are $\binom{4}{2} = 6$ terms in the expansion that are equal to $E X_i^2 X_j^2$. Hence,

$$E N_n^4 = n\, E X_1^4 + 6 \binom{n}{2} (E X_1^2)^2 = nK + 3n(n-1)(E X_1^2)^2. \qquad (3.2)$$

Also, $K < \infty$ implies $E X_1^2 < \infty$, since

$$0 \le \mathrm{Var}(X_1^2) = E X_1^4 - (E X_1^2)^2 \;\Rightarrow\; (E X_1^2)^2 \le K. \qquad (3.3)$$
Now, by replacing (3.3) in (3.2), we obtain

$$E N_n^4 \le nK + 3n^2 K \le 4 n^2 K.$$
Thus,

$$E\left[\sum_{n=1}^{\infty} \frac{N_n^4}{n^4}\right] = \sum_{n=1}^{\infty} \frac{E N_n^4}{n^4} \le \sum_{n=1}^{\infty} \frac{4K}{n^2} < \infty.$$
Therefore, with probability 1,

$$\sum_{n=1}^{\infty} \frac{N_n^4}{n^4} < \infty,$$

which implies that, with probability 1,

$$\lim_{n \to \infty} \frac{N_n^4}{n^4} = 0,$$

or, equivalently,

$$P\left[\lim_{n \to \infty} \frac{N_n}{n} = 0\right] = 1.$$

This concludes the proof of the case $\mu = 0$. If $\mu \ne 0$, then define $X_j' = X_j - E X_j$ and use the same proof. $\Box$
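A single-sample-path illustration of the strong law (my own sketch; the Exponential(1) distribution and the path length are arbitrary assumptions): the running average $N_n/n$ along one fixed $\omega$ settles at the mean $\mu = 1$.

```python
import numpy as np

# Strong LLN along one sample path: running averages N_n / n of i.i.d.
# Exponential(1) r.v.s (mean 1), computed from a single realization.
rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=100_000)
running_avg = np.cumsum(X) / np.arange(1, X.size + 1)

for n in [100, 1_000, 10_000, 100_000]:
    print(f"n = {n}: N_n / n = {running_avg[n - 1]:.4f}")
```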
4 Central Limit Theorem
The Central Limit Theorem, similarly to the Large Deviation Theorem, measures the deviation of the sample mean from the expected value $\mu$.
Theorem 4.1 (Central Limit Theorem (CLT)) Let $X_j,\ j \ge 1$ be a sequence of i.i.d. r.v.s with mean $\mu$ and variance $\sigma^2 < \infty$. Then, the distribution of

$$Z_n \stackrel{\mathrm{def}}{=} \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

tends to the standard normal distribution as $n \to \infty$, i.e., for any real number $a$

$$P\left[\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right] \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx \quad \text{as } n \to \infty.$$
First, we state the following key lemma that will be used in the
proof of CLT.
Lemma 4.1 Let $Z_1, Z_2, \ldots$, be a sequence of r.v.s having distribution functions $F_{Z_n}$ and moment generating functions $M_{Z_n}(t) = E\, e^{tZ_n}$, $n \ge 1$, and let $Z$ be a random variable having distribution function $F_Z$ and moment generating function $M_Z(t)$. If $M_{Z_n}(t) \to M_Z(t)$ as $n \to \infty$ for all $t$, then

$$F_{Z_n}(x) \to F_Z(x) \quad \text{as } n \to \infty.$$

Proof: Omitted. $\Box$
Proof of CLT: Assume that $\mu = 0$ and $\sigma^2 = 1$. Then, the moment generating function (m.g.f.) of $X_j/\sqrt{n}$ is equal to

$$E\left[e^{tX_j/\sqrt{n}}\right] = M(t/\sqrt{n}), \quad \text{where } M(t) = E\, e^{tX_1}.$$

Thus, the m.g.f. of $\sum_{j=1}^{n} X_j/\sqrt{n}$ is equal to

$$\left[M(t/\sqrt{n})\right]^n.$$

Now, if $L(t) \stackrel{\mathrm{def}}{=} \log M(t)$, then

$$\log\left[M(t/\sqrt{n})\right]^n = n L(t/\sqrt{n}).$$

Thus,

$$\lim_{n\to\infty} n L(t/\sqrt{n}) = \lim_{n\to\infty} \frac{L(t/\sqrt{n})}{n^{-1}}$$
$$= \lim_{n\to\infty} \frac{L'(t/\sqrt{n})\, t\, n^{-3/2}/2}{n^{-2}} \qquad \text{(by L'Hospital's rule)}$$
$$= \lim_{n\to\infty} \frac{L'(t/\sqrt{n})\, t}{2 n^{-1/2}}$$
$$= \lim_{n\to\infty} \frac{L''(t/\sqrt{n})\, t^2\, n^{-3/2}/2}{n^{-3/2}} \qquad \text{(by L'Hospital's rule)}$$
$$= \lim_{n\to\infty} \frac{L''(t/\sqrt{n})\, t^2}{2}.$$
Next, note that

$$L(0) = 0,$$
$$L'(0) = \frac{M'(0)}{M(0)} = \mu = 0,$$
$$L''(0) = \frac{M(0)M''(0) - (M'(0))^2}{(M(0))^2} = E X^2 = 1.$$

Hence, for any finite $t$,

$$\lim_{n\to\infty} L''(t/\sqrt{n}) = 1$$

and, therefore,

$$\lim_{n\to\infty} n L(t/\sqrt{n}) = \frac{t^2}{2},$$

or, equivalently,

$$\lim_{n\to\infty} \left[M(t/\sqrt{n})\right]^n = e^{t^2/2}.$$

On the other hand, if $Z$ is a standard normal r.v., then

$$E\, e^{tZ} = e^{t^2/2},$$

which, by Lemma 4.1, concludes the proof of the theorem for $\mu = 0$ and $\sigma^2 = 1$.
For the general case $\mu \ne 0$ and $\sigma^2 \ne 1$, we can introduce new variables

$$X_j' \stackrel{\mathrm{def}}{=} \frac{X_j - \mu}{\sigma};$$

clearly

$$E X_j' = 0 \quad \text{and} \quad \mathrm{Var}(X_j') = 1,$$

and, therefore, we can use the already proved case. $\Box$
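Finally, a short simulation sketch of the CLT (my addition; the Uniform(0,1) distribution, $n = 50$, and the number of repetitions are arbitrary assumptions, and scipy.stats is assumed available for the normal c.d.f.): it compares the empirical distribution of $Z_n = (X_1 + \cdots + X_n - n\mu)/(\sigma\sqrt{n})$ with the standard normal distribution at a few points.

```python
import numpy as np
from scipy.stats import norm

# CLT check: Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n)) for
# Uniform(0,1) r.v.s, where mu = 0.5 and sigma^2 = 1/12.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.5, np.sqrt(1 / 12), 50, 100_000

sums = rng.uniform(0, 1, size=(reps, n)).sum(axis=1)
Z = (sums - n * mu) / (sigma * np.sqrt(n))

for a in [-1.0, 0.0, 1.0, 2.0]:
    print(f"P[Z_n <= {a}] ~ {np.mean(Z <= a):.4f}   Phi({a}) = {norm.cdf(a):.4f}")
```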