Stat 501 – Probability Theory I
Some Lecture Notes
Ryan Martin
Department of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
[email protected]
www.math.uic.edu/~rgmartin
October 20, 2014
Abstract
These notes are based on the two lectures [1] I gave in Stat 501, Probability Theory I, as a substitute for Professor Cheng Ouyang. There is a brief discussion of the so-called "basic grouping lemma" concerning independent σ-algebras, followed by a more detailed discussion of the Borel–Cantelli lemma, some applications, and some elaborations on independence.
Independence, and the basic grouping lemma
Let $\{\mathcal{B}_t : t \in T\}$ be a collection of independent σ-algebras, i.e., for any $k \geq 1$, any indices $t_1, \dots, t_k$, and any events $B_{t_1} \in \mathcal{B}_{t_1}, \dots, B_{t_k} \in \mathcal{B}_{t_k}$, the events $B_{t_1}, \dots, B_{t_k}$ are independent. It makes sense that disjoint sub-collections of the σ-algebras are also independent. Here is the formal result.
Lemma (Grouping Lemma). Let $\{\mathcal{B}_t : t \in T\}$ be an independent collection of σ-algebras. Let $S$ be an index set such that, for each $s \in S$, $T_s \subset T$, and the sets $\{T_s : s \in S\}$ are pairwise disjoint. Define
\[ \mathcal{B}_{T_s} = \text{the smallest } \sigma\text{-algebra containing all } \mathcal{B}_t,\ t \in T_s. \]
Then $\{\mathcal{B}_{T_s} : s \in S\}$ is an independent collection of σ-algebras.
Proof. Pretty easy, see pages 101–102 in Resnick.
Despite the complicated σ-algebra terminology, the Grouping Lemma is quite intuitive. For example, let $X_1, \dots, X_n$ be a collection of independent random variables. Then the Grouping Lemma says that
• $\sigma(\{X_1, \dots, X_k\})$ and $\sigma(\{X_{k+1}, \dots, X_n\})$ are independent σ-algebras;
• $\sum_{i=1}^k X_i$ and $\sum_{i=k+1}^n X_i$ are independent random variables;
• and, more generally, $f(X_1, \dots, X_k)$ and $g(X_{k+1}, \dots, X_n)$ are independent random variables for any suitable real-valued functions $f$ and $g$.
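As a quick empirical sanity check of the last point, the following simulation (an illustrative sketch; all names are made up) splits independent normal draws into two disjoint blocks and verifies that the block sums have sample correlation near zero. Zero correlation is only a necessary consequence of independence, not a proof of it, but it is easy to check.

```python
# Check that sums over disjoint blocks of independent variables look
# independent: their sample correlation should be close to 0.
import random

random.seed(0)
n, k, reps = 10, 4, 20000
s1, s2 = [], []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    s1.append(sum(x[:k]))   # plays the role of f(X_1, ..., X_k)
    s2.append(sum(x[k:]))   # plays the role of g(X_{k+1}, ..., X_n)

m1, m2 = sum(s1) / reps, sum(s2) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2)) / reps
var1 = sum((a - m1) ** 2 for a in s1) / reps
var2 = sum((b - m2) ** 2 for b in s2) / reps
corr = cov / (var1 * var2) ** 0.5
print(round(corr, 3))
```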
[1] These lectures are based, in part, on Sidney Resnick's A Probability Path.
Borel–Cantelli lemma, part I
For a given sequence of events $\{A_n : n \geq 1\}$ in a common probability space, recall the notion of the "limsup" of events:
\[ \limsup_n A_n = \bigcap_{N \geq 1} \bigcup_{n \geq N} A_n =: \{A_n \text{ i.o.}\}. \]
That is, $A_n$ occurs "infinitely often" in the sense that, for every $N$, there exists $n \geq N$ such that $A_n$ occurs. The first Borel–Cantelli lemma concerns the probability of this "infinitely often" event.
Theorem (Borel–Cantelli, part I). Let $\{A_n\}$ be events in the probability space $(\Omega, \mathcal{A}, P)$. If $\sum_n P(A_n) < \infty$, then $P(\limsup_n A_n) = P(A_n \text{ i.o.}) = 0$.
Proof. It is clear that $\limsup_n A_n \subseteq \bigcup_{n \geq N} A_n$ for any $N$. Furthermore, by monotonicity of $P$ and Boole's inequality,
\[ P(\limsup_n A_n) \leq P\Bigl(\bigcup_{n \geq N} A_n\Bigr) \leq \sum_{n \geq N} P(A_n). \]
Since the summation over all $n$ is finite, given any $\varepsilon > 0$, there exists $N = N_\varepsilon$ such that $\sum_{n \geq N_\varepsilon} P(A_n) < \varepsilon$. Since $\varepsilon$ is arbitrary, it follows that $P(\limsup_n A_n) = 0$.
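The conclusion of the lemma is easy to see in a simulation (illustrative only, not part of the proof). Take independent events $A_n$ with $P(A_n) = 1/n^2$, a summable series; in any simulated run, only finitely many of the $A_n$ should occur, consistent with $P(A_n \text{ i.o.}) = 0$.

```python
# Simulate independent events with P(A_n) = 1/n^2 (summable) and record
# which ones occur; only a handful should, all with small index n.
import random

random.seed(1)
N = 100000
occurred = [n for n in range(1, N + 1) if random.random() < 1.0 / n ** 2]
print(len(occurred), max(occurred))
```

Since $\sum_n 1/n^2 = \pi^2/6 \approx 1.64$, the expected number of events that occur is under 2, and the run above reflects that.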
Application: Strong law of large numbers
The Borel–Cantelli lemma above looks very simple, but has important consequences.
In fact, this result is commonly used to prove various almost sure convergence results.
Besides the application here, I am aware of many applications of the Borel–Cantelli lemma
in proofs of consistency for general Bayesian posterior distributions.
This section gives an example that is a bit ahead of the course schedule. However, since
the Borel–Cantelli lemma is so useful, it makes sense to give an interesting application to
highlight its importance. Consider a sequence of iid random variables X1 , X2 , . . . , with
common mean µ. A result that is justified, but not proved, in an introductory probability
course is the law of large numbers, i.e.,
\[ P(|\bar{X}_n - \mu| > \varepsilon) \to 0, \quad n \to \infty, \quad \forall\ \varepsilon > 0, \]
where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$ is the sample mean. That is, for large samples, the sample
mean X̄n will be close to the population mean µ with high probability. This is often
referred to as a weak law of large numbers. Here, using the Borel–Cantelli lemma, we will prove a stronger result, namely, a strong law of large numbers,[2] but only for the special case of standard normal random variables.
Let $X_1, X_2, \dots$ be a sequence of independent N(0, 1) random variables, i.e., their common density function is
\[ (2\pi)^{-1/2} e^{-x^2/2}, \quad x \in \mathbb{R}. \]
The crucial step to the application of the Borel–Cantelli lemma and the strong law of
large numbers is the following “concentration inequality” for normal random variables.
Lemma. Let $Z \sim \text{N}(0, 1)$. Then, for any $\varepsilon > 0$,
\[ P(|Z| > \varepsilon) \leq \varepsilon^{-1} e^{-\varepsilon^2/2}. \]
[2] The difference between the "weak" and "strong" law of large numbers is the mode of convergence: the former is "in probability," while the latter is "with probability 1." You will learn more about the difference between these two modes of convergence later in Stat 510.
Proof. For $Z \sim \text{N}(0, 1)$ and $\varepsilon > 0$, we get
\[ P(Z > \varepsilon) = \int_\varepsilon^\infty \frac{1}{(2\pi)^{1/2}} e^{-z^2/2}\, dz \leq \int_\varepsilon^\infty \frac{z}{\varepsilon} \frac{1}{(2\pi)^{1/2}} e^{-z^2/2}\, dz, \]
since $z/\varepsilon \geq 1$ over the range of integration. By substitution ($u = z^2/2$), we can simplify the upper bound to get
\[ P(Z > \varepsilon) \leq \varepsilon^{-1} (2\pi)^{-1/2} e^{-\varepsilon^2/2}. \]
By symmetry of the standard normal,
\[ P(|Z| > \varepsilon) = 2 P(Z > \varepsilon) \leq \frac{2}{(2\pi)^{1/2}} \frac{1}{\varepsilon} e^{-\varepsilon^2/2} \leq \varepsilon^{-1} e^{-\varepsilon^2/2}, \]
where the last inequality holds because $2/(2\pi)^{1/2} = (2/\pi)^{1/2} < 1$.
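The inequality in the Lemma can be checked numerically without any simulation, since the exact normal tail is $P(|Z| > \varepsilon) = \operatorname{erfc}(\varepsilon/\sqrt{2})$. The sketch below compares the exact tail with the bound $\varepsilon^{-1} e^{-\varepsilon^2/2}$ on a small grid of $\varepsilon$ values.

```python
# Compare the exact standard normal tail P(|Z| > eps) = erfc(eps / sqrt(2))
# with the bound eps^{-1} * exp(-eps^2 / 2) from the Lemma.
import math

for eps in [0.5, 1.0, 2.0, 3.0]:
    exact = math.erfc(eps / math.sqrt(2.0))        # P(|Z| > eps), exact
    bound = math.exp(-eps ** 2 / 2.0) / eps        # Lemma's upper bound
    assert exact <= bound
    print(f"eps={eps}: P(|Z|>eps)={exact:.5f} <= bound={bound:.5f}")
```

Note that for small $\varepsilon$ the bound exceeds 1 and so is trivially true; it only becomes informative for moderately large $\varepsilon$, which is exactly the regime needed below.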
This can be extended to the case of a sample mean of $n$ independent standard normals, due to the well-known fact that $\bar{X}_n \sim \text{N}(0, n^{-1})$ when $X_1, \dots, X_n \overset{\text{iid}}{\sim} \text{N}(0, 1)$.
Corollary. For all $n$ and all $\varepsilon > 0$, $P(|\bar{X}_n| > \varepsilon) \leq n^{-1/2} \varepsilon^{-1} e^{-n\varepsilon^2/2}$.
Proof. The distribution of $\bar{X}_n$ is the same as that of $n^{-1/2} Z$, for $Z \sim \text{N}(0, 1)$. Then
\[ P(|\bar{X}_n| > \varepsilon) = P(|Z| > n^{1/2} \varepsilon). \]
Write $\varepsilon' = n^{1/2} \varepsilon$ and apply the previous lemma.
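The Corollary, too, can be verified numerically (a sketch, again using the exact tail via erfc rather than simulation): since $\bar{X}_n \sim \text{N}(0, n^{-1})$, the exact tail is $P(|\bar{X}_n| > \varepsilon) = \operatorname{erfc}(\sqrt{n}\,\varepsilon/\sqrt{2})$, and it should sit below the bound $n^{-1/2} \varepsilon^{-1} e^{-n\varepsilon^2/2}$ for every $n$.

```python
# Check the Corollary's bound against the exact tail of X_bar_n ~ N(0, 1/n).
import math

eps = 0.5
for n in [1, 4, 16, 64]:
    exact = math.erfc(math.sqrt(n) * eps / math.sqrt(2.0))
    bound = math.exp(-n * eps ** 2 / 2.0) / (math.sqrt(n) * eps)
    assert exact <= bound
    print(f"n={n}: {exact:.6f} <= {bound:.6f}")
```

Notice how quickly both the tail and the bound decay in $n$; this geometric decay is what makes the series $\sum_n P(A_n)$ below summable.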
The main theorem of this section is a law of large numbers for $\bar{X}_n$; that is, we claim that the sample mean, $\bar{X}_n$, of standard normal random variables converges to the population mean, 0, with probability 1. A key step in proving this claim is representing the event of non-convergence as a limsup. Given $\varepsilon > 0$, let $A_n = \{|\bar{X}_n| > \varepsilon\}$. Then the event $\{\bar{X}_n \not\to 0\}$ is the same as $\limsup_n A_n$. To understand this, think about convergence using the calculus definition of limit:
A sequence of numbers $x_n$ converges to $x$ if, for any $\varepsilon > 0$, there exists $N = N(\varepsilon, x)$ such that $|x_n - x| < \varepsilon$ for all $n > N$.
In this sense, a sequence $x_n$ does not converge to $x$ if there exists $\varepsilon > 0$ such that $|x_n - x| > \varepsilon$ for infinitely many $n$. This shows that $\{\bar{X}_n \not\to 0\} = \limsup_n A_n$.
Theorem. For $X_1, X_2, \dots \overset{\text{iid}}{\sim} \text{N}(0, 1)$, $P(\bar{X}_n \not\to 0) = 0$.
Proof. For $A_n$ as defined above, the Corollary gives $P(A_n) \leq n^{-1/2} \varepsilon^{-1} e^{-n\varepsilon^2/2}$. Since $\sum_n P(A_n) < \infty$, it follows from the Borel–Cantelli lemma that $P(\limsup_n A_n) = 0$. From the above equivalence, we get $P(\bar{X}_n \not\to 0) = 0$.
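A simulation (illustrative only) shows the theorem in action: running sample means of independent N(0, 1) draws settle ever closer to the population mean 0 as $n$ grows.

```python
# Track the running sample mean of iid N(0, 1) draws at a few checkpoints;
# it should shrink toward 0, roughly at rate 1/sqrt(n).
import random

random.seed(2)
n, total = 100000, 0.0
means_at = {}
for i in range(1, n + 1):
    total += random.gauss(0.0, 1.0)
    if i in (100, 10000, 100000):
        means_at[i] = total / i
print({k: round(v, 4) for k, v in means_at.items()})
```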
The above theorem says that $\bar{X}_n$ converges to its mean, 0, with probability 1. This is a stronger result than the weak law of large numbers in Stat 401. The latter only requires a finite variance (which the normal clearly has), while the proof given here requires stronger control on the tails of the normal distribution. It turns out that the strong law of large numbers holds in broad generality, without needing the Borel–Cantelli lemma and/or concentration inequalities like the Lemma above; in fact, all that is needed is $E|X_1| < \infty$ in the iid case. You will see these details later in Stat 501/502.
Borel–Cantelli lemma, part II
Note that the first Borel–Cantelli lemma above does not have anything to do with independence. It turns out that there is a second version of the result which says that, for independent events $A_n$, $P(A_n \text{ i.o.})$ is either 0 or 1, depending on whether the series $\sum_n P(A_n)$ converges or diverges.
Theorem (Borel–Cantelli, part II). Let $\{A_n : n \geq 1\}$ be independent events. Then
\[ P(A_n \text{ i.o.}) = \begin{cases} 0 & \text{if } \sum_n P(A_n) < \infty, \\ 1 & \text{if } \sum_n P(A_n) = \infty. \end{cases} \]
Proof. The first part follows from the previous result. For the second part, we show that
\[ [\limsup_n A_n]^c = \liminf_n A_n^c = \bigcup_{N \geq 1} \bigcap_{n \geq N} A_n^c \]
has probability 0 if $\sum_n P(A_n) = \infty$. Of course,
\[ \liminf_n A_n^c = \bigcup_{N \geq 1} \bigcap_{n \geq N} A_n^c \subseteq \bigcap_{n \geq N} A_n^c, \quad \forall\ N \geq 1. \]
By independence, the set on the right-hand side above has probability
\[ P\Bigl(\bigcap_{n \geq N} A_n^c\Bigr) = \prod_{n \geq N} \{1 - P(A_n)\}. \]
The claim is that the right-hand side above equals zero. Pick a finite integer $M$. Since $1 - x \leq e^{-x}$, we get
\[ \prod_{n=N}^{N+M} \{1 - P(A_n)\} \leq \prod_{n=N}^{N+M} e^{-P(A_n)} = e^{-\sum_{n=N}^{N+M} P(A_n)}. \]
Since the series $\sum_n P(A_n)$ diverges, the exponent in the upper bound approaches $-\infty$ as $M \to \infty$, and it follows that $P(\limsup_n A_n) = 1 - P(\liminf_n A_n^c) = 1 - 0 = 1$.
In applications, at least in statistics, it is rare for the events $A_n$ of interest to be independent. So, as far as I know, the first Borel–Cantelli lemma is the more useful one in practice. However, the second Borel–Cantelli lemma provides a first peek at an interesting phenomenon that occurs more generally: for certain "limiting events," the probability can be either 0 or 1, nothing in between. You will see this later under the name Kolmogorov Zero-One Law, which implies, for example, that a law of large numbers convergence holds with either probability 1 or probability 0, no middle ground. An interesting and somewhat philosophical implication is that there is no uncertainty in the limiting case; this is, effectively, the fundamental theorem of statistics.[3]
Resnick considers an independent coin-flipping experiment, where the event of interest is $\limsup_n A_n = \{\text{coin lands heads infinitely often}\}$. If the coin is fixed, so that $P(A_n) \equiv p > 0$, then clearly the series diverges, so the coin lands heads infinitely often with probability 1, which is quite intuitive. On the other hand, in order for heads not to appear infinitely often, the probability $P(A_n)$ of heads on the $n$th flip must decay quite rapidly.
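Both regimes of the coin-flip example are easy to simulate (a sketch, with made-up parameters): a fixed coin with $p = 1/2$ keeps producing heads throughout the run, while a coin whose head probability decays like $1/n^2$ (a summable series) stops producing heads almost immediately.

```python
# Compare a fixed coin (divergent series, heads i.o.) with a coin whose
# head probability decays like 1/n^2 (summable series, finitely many heads).
import random

random.seed(3)
N = 50000
fair_heads = sum(1 for n in range(1, N + 1) if random.random() < 0.5)
decaying_heads = sum(1 for n in range(1, N + 1)
                     if random.random() < 1.0 / n ** 2)
print(fair_heads, decaying_heads)
```

The fair coin yields on the order of $N/2$ heads, while the decaying coin yields only a handful, all early in the sequence, exactly as the two halves of the second Borel–Cantelli lemma predict.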
[3] This is my terminology, not given in a book, etc., but I don't think that makes it wrong!