An Introduction to Laws of Large Numbers
John
CVGMI Group
Contents
1 Introduction
2 Introduction to Laws of Large Numbers
  Weak Law of Large Numbers
  Strong Law
  Strongest Law
3 Examples
  Information Theory
  Statistical Learning
4 Appendix
  Random Variables
  Working with R.V.’s
  Independence
  Limits of Random Variables
  Modes of Convergence
  Chebyshev
Intuition
We're working with random variables. What could we observe?
Random variables $\{X_n\}_{n=1}^{\infty}$... More specifically, a Bernoulli sequence, looking at the probability of the average of the sum of $N$ random events: $P(x_1 < \sum_{i=1}^{N} X_i < x_2)$.
Coin flipping, anybody?
As $N$ increases we see that the numbers of 0s and 1s observed are approximately equal with high probability.
This is intuitive: as the number of samples increases, the average observation should tend toward the theoretical mean.
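A quick simulation makes this concrete. This is a minimal sketch, not part of the original slides; the fair coin and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample average of N fair coin flips vs. the theoretical mean 0.5.
for N in [10, 100, 10_000, 1_000_000]:
    flips = rng.integers(0, 2, size=N)  # a Bernoulli(1/2) sequence
    print(N, flips.mean())              # drifts toward 0.5 as N grows
```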
Weak Law
Let's try to work out this most basic fundamental theorem of random variables arising from repeatedly observing a random event. How do we build such a theorem?
In a blackbox scenario we want to be able to use independence.
We want to be able to use variance and expectation: so let's make $X_n \in L^2$ for all $n$, and associate with each $X_n$ its mean, which we'll denote by $\mu_n$, and its variance, by $\sigma_n^2$.
We will create new random variables from the sequence and work with these; aim for results in terms of them.
This seems like a start; let's try to prove a theorem...
Weak Law
Theorem. Let $\{X_n\}$ be a sequence of independent $L^2$ random variables with means $\mu_n$ and variances $\sigma_n^2$. Then $\sum_{i=1}^{n} (X_i - \mu_i) \to 0$.
So a sequence of functions on $\Omega$, the sample space, is going towards a function $=$ zero...
Can we prove this?
We haven't used $(\sigma_n)$: we'll clearly need these, since otherwise the $X_i$ are wildly unpredictable.
IDEA: Constrain $\lim_{n \to \infty} \sum_{i=1}^{n} \sigma_i^2 = 0$.
Counterexample
[Slide content not recovered from the extraction.]
Weak Law: Pitfalls
We haven't specified a mode of convergence: this is a technical point, but if you recall uniform, pointwise, a.e., and normed convergence from analysis, the goal of the proof can change radically.
IDEA: We want our rule to hold with high probability; that is, it should hold for essentially all values of $\omega \in \Omega$. Convergence in the normed space is pretty ambitious at this point.
We haven't specified a rate of convergence.
The rate of convergence of the $\sigma_n$ should imply something about the rate of convergence of the $X_n - \mu_n$.
The $X_n - \mu_n$ should converge more slowly than the $\sigma_n$.
IDEA: Weaken the constraint to $\lim_{n \to \infty} n^{-2} \sum_{i=1}^{n} \sigma_i^2 = 0$.
The $X_n - \mu_n$ should converge on average.
Law of Large Numbers 2.0
Theorem. Let $\{X_n\}$ be a sequence of independent $L^2$ random variables with means $\mu_n$ and variances $\sigma_n^2$. Then $\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_i) \to 0$ in measure if $\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = 0$.
Proof. Let $Y_n(\omega) = \frac{1}{n} \sum_{i=1}^{n} (X_i(\omega) - \mu_i)$. Then $E(Y_n) = 0$ and $\sigma^2(Y_n) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2$ by independence. Now we can use the limit of $\sigma^2(Y_n)$ from the hypothesis, and need to prove
$$P(\{\omega : |Y_n(\omega)| > \epsilon\}) \le \frac{\sigma^2(Y_n)}{\epsilon^2} \to 0 \text{ as } n \to \infty.$$
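The theorem is easy to check numerically. A minimal sketch, not from the slides, assuming i.i.d. Uniform(0,1) summands ($\mu_i = 1/2$, $\sigma_i^2 = 1/12$) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirically compare P(|Y_n| > eps) with the Chebyshev bound
# sigma^2(Y_n)/eps^2 = (1/(12n))/eps^2 for Uniform(0,1) summands.
eps, trials = 0.05, 20_000
for n in [10, 100, 1000]:
    X = rng.random((trials, n))
    Yn = (X - 0.5).mean(axis=1)            # Y_n = (1/n) sum (X_i - mu_i)
    empirical = (np.abs(Yn) > eps).mean()  # estimated P(|Y_n| > eps)
    bound = (1 / (12 * n)) / eps**2        # sigma^2(Y_n)/eps^2
    print(n, empirical, bound)             # both shrink; bound dominates
```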
Aside: Lp bounds
What does
$$P(\{\omega : |Y_n(\omega)| > \epsilon\}) \le \frac{\sigma^2(Y_n)}{\epsilon^2} \to 0 \text{ as } n \to \infty$$
actually say?
The measure of the part of $\Omega$ where $|Y_n| > \epsilon$ is bounded by the integral of $Y_n^2$ divided by epsilon squared... Thinking about this makes it obvious.
Better bounds can be obtained: Chernoff bounds.
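For bounded summands the gap is dramatic. As a comparison of my own (not from the slides), Hoeffding's version of the Chernoff bound for the average of $n$ fair coin flips gives $P(|\bar{X} - 1/2| > \epsilon) \le 2e^{-2n\epsilon^2}$, versus Chebyshev's $\frac{1/4}{n\epsilon^2}$:

```python
import numpy as np

# Chebyshev vs. Chernoff (Hoeffding) tail bounds for the mean of n fair flips.
eps = 0.05
for n in [100, 1000, 10_000]:
    chebyshev = 0.25 / (n * eps**2)          # variance of a fair flip is 1/4
    chernoff = 2 * np.exp(-2 * n * eps**2)   # exponential decay in n
    print(n, chebyshev, chernoff)
```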
Law of Large Numbers 2.1
Theorem. Let $\{X_n\}$ be a sequence of independent $L^2$ random variables with means $\mu_n$ and variances $\sigma_n^2$. Then $\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_i) \to 0$ in measure if
$$\lim_{n \to \infty} n^{-2} \sum_{i=1}^{n} \sigma_i^2 = 0.$$
⇓ weakened; use the maximal inequality $P(\max_k |\sum_{i=1}^{k} (X_i - \mu_i)| \ge \epsilon) \le \epsilon^{-2} \sum_{i=1}^{n} \sigma_i^2$
Theorem. Let $\{X_n\}$ be a sequence of independent $L^2$ random variables with means $\mu_n$ and variances $\sigma_n^2$. Then $\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_i) \to 0$ a.e. if
$$\sum_{i=1}^{\infty} \frac{\sigma_i^2}{i^2} < \infty.$$
Khinchine’s Law
Theorem. Let $\{X_n\}$ be a sequence of independent, identically distributed $L^1$ random variables with mean $\mu$. Then $\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu) \to 0$ a.e. as $n \to \infty$.
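The $L^1$ hypothesis matters. As an illustration of my own (not from the slides), the standard Cauchy distribution has no mean, and its running averages never settle:

```python
import numpy as np

rng = np.random.default_rng(2)

# Running averages: Uniform(0,1) is L^1 so averages converge to 0.5;
# the standard Cauchy has no mean, so its averages keep wandering.
n = np.arange(1, 100_001)
unif_avg = np.cumsum(rng.random(100_000)) / n
cauchy_avg = np.cumsum(rng.standard_cauchy(100_000)) / n
for i in [99, 9_999, 99_999]:
    print(n[i], unif_avg[i], cauchy_avg[i])
```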
Large Sample Size in Coding
Now we need to use the laws. Start with an example in source coding.
$C : X(\Omega) \to \Sigma^*$ encodes events.
It is interesting to know how complex $C$ must be to encode $(Y_i)_{i=1}^{N}$.
Entropy: $H(X) = E[-\lg(p(X))]$; a representation of how uncertain a r.v. is.
Problem: $p(\cdot)$ is unknown to the encoder and the distribution needs to be learned. The WLLN can be used to answer: how uncertain is the distribution?
Asymptotic Equipartition Property
Theorem (Asymptotic Equipartition Property). If $X_1, \ldots, X_n$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n} \lg(p(X_1, \ldots, X_n)) \to H(X).$$
Proof. The $X_i$ are i.i.d. $\sim p(x)$, so the $\lg(p(X_i))$ are i.i.d. The weak law of large numbers says that
$$P\left(\left| -\frac{1}{n} \lg \prod_i p(X_i) - E[-\lg(p(X))] \right| > \epsilon\right) \to 0,$$
that is,
$$P\left(\left| -\frac{1}{n} \sum_i \lg(p(X_i)) - E[-\lg(p(X))] \right| > \epsilon\right) \to 0,$$
so the sample entropy approaches the true entropy in probability as the sample size increases.
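A quick check of the AEP (a sketch of mine, assuming a Bernoulli(0.3) source purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample entropy -(1/n) sum lg p(x_i) vs. true entropy H for Bernoulli(0.3).
p = 0.3
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
for n in [100, 10_000, 1_000_000]:
    x = rng.random(n) < p                    # Bernoulli(p) samples
    probs = np.where(x, p, 1 - p)            # p(x_i) for each observation
    sample_entropy = -np.log2(probs).mean()  # -(1/n) sum lg p(x_i)
    print(n, sample_entropy, H)              # sample entropy -> H
```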
Statistical Learning
We want to minimize $R(f) = E(l(x, y, f(x)))$, e.g. $l : (x, y, f) \mapsto \frac{1}{2}|f(x) - y|$.
We're stuck minimizing over all $f$, under a distribution we don't know... hopeless...
IDEA: Take the Law of Large Numbers and apply it to this framework, and hopefully $R_{emp}(f) = \frac{1}{m} \sum_i l(x_i, y_i, f(x_i)) \to R(f)$.
Use $L^p$ bounds to prove convergence results on testing error.
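For a fixed $f$, the empirical risk is just an average of i.i.d. losses, so the law of large numbers applies directly. A minimal sketch; the data model ($x \sim$ Uniform(0,1), $y = x +$ Gaussian noise) and the predictor $f(x) = 0.9x$ are my toy choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical risk R_emp(f) stabilizes toward R(f) as the sample grows.
f = lambda x: 0.9 * x

def emp_risk(m):
    x = rng.random(m)
    y = x + rng.normal(0, 0.1, size=m)
    return 0.5 * np.abs(f(x) - y).mean()   # loss l = (1/2)|f(x) - y|

for m in [10, 1000, 100_000]:
    print(m, emp_risk(m))                  # fluctuations shrink with m
```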
Mini Appendix: Measure Theory
I won't go into too much detail in this regard. If we've made it this far then great.
$(X, \mathcal{A})$ is a measurable space if $\mathcal{A} \subset 2^X$ is closed under countable unions and complements. These correspond to 'measurable events'.
$(X, \mathcal{A})$ with $\mu$ is a measure space if $\mu : \mathcal{A} \to [0, \infty)$ is countably additive over disjoint sets: $\mu(\cup_i A_i) = \sum_i \mu(A_i)$ if $A_i \cap A_j = \emptyset$ for $i \neq j$.
More great properties fall out of this quickly: the measure of the empty set is zero, measures are monotonic under the containment (partial) ordering, and even dominated and monotone convergence (of sets) come out of this.
Chebyshev's Inequality: Let $f \in L^p(\mathbb{R})$ and denote $E_\alpha = \{x : |f(x)| > \alpha\}$. Then $\|f\|_p^p \ge \int_{E_\alpha} |f|^p \ge \alpha^p \mu(E_\alpha)$ by monotonicity and positivity of measures.
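A numeric sanity check of the inequality (my example: $f(x) = x$ on $[0,1]$ with Lebesgue measure, $p = 2$, $\alpha = 1/2$):

```python
import numpy as np

# Check ||f||_p^p >= alpha^p * mu(E_alpha) for f(x) = x on [0,1].
x = np.linspace(0, 1, 1_000_001)
f = x
p, alpha = 2, 0.5
norm_p = (np.abs(f)**p).mean()      # ~ integral of |f|^p dx = 1/3
mu_E = (np.abs(f) > alpha).mean()   # ~ mu({|f| > alpha}) = 1/2
print(norm_p, alpha**p * mu_E)      # 0.333... >= 0.125
```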
L2 R.V.’s: Definition
We say that a function $f : X \to \mathbb{R}$ belongs to the function space $L^p(X)$ if $\|f\|_p = (\int |f(x)|^p \, dx)^{1/p} < \infty$, and say $f$ has finite $L^p$ norm.
Definition. The fundamental object being considered in statistics is the random variable. Given a measure space $(\Omega, \mathcal{B}, \mu)$, $X : \Omega \to \mathbb{R}$ is an $L^2$ random variable if
$\{X < r\} \in \mathcal{B}$ for all $r \in \mathbb{R}$,
$\mu(\Omega) = 1$, and
$\left(\int X(\omega)^2 \, d\mu(\omega)\right)^{1/2} < \infty$.
We may also say that a random variable with finite 2nd moment is an $L^2$ random variable.
L2 R.V.’s: Questions
Why $\mathcal{B}$?
Physical paradoxes, e.g. Banach-Tarski.
We want to talk about physical events ∼ measurable sets.
$\mathcal{B}$ is as descriptive as we'd like most of the time.
What does $\mu$ do here?
$\mu$ weights the events ∼ measurable sets.
It puts values to $\{\omega \in \Omega : X(\omega) < r\}$; $\Omega$ is the sample space.
$X$ maps random events (patterns occurring in data) to the reals, which have a good standard notion of measure. So $X$ induces the distribution $P(B) = \mu(X^{-1}(B))$, which gives
$$P(\{X(\omega) \in B\}) = \mu(X^{-1}(B))$$
as expected. This is usually written without the sample variable.
For more sophisticated questions/answers see MAA 6616 or STA 7466, but discussion is encouraged.
Statistics: Functions of Measured Events
Now we can start building.
Theorem. If $f : \mathbb{R} \to \mathbb{R}$ is measurable, then
$$\int f(x) \, dP = \int f(X(\omega)) \, d\mu,$$
i.e. the random variable pushes forward a measure onto $\mathbb{R}$, and the integrals of measurable functions of random variables are therefore ultimately determined by real integration.
This may be proved using Dirac measures and measure properties. Following these lines we may develop the basics of the theory of probability from measure theory. (OK, homework.)
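A small numerical illustration (my construction, not from the slides): take $\Omega = [0,1]$ with Lebesgue measure $\mu$ and $X(\omega) = -\ln(1 - \omega)$, so the pushforward $P$ is the Exponential(1) distribution; both sides of the theorem then agree for $f(x) = x^2$.

```python
import numpy as np

# Both integrals approximate E[f(X)] = 2 for X ~ Exp(1), f(x) = x^2.
f = lambda x: x**2

# Left side: integral of f(x) dP over [0, inf), with dP = e^{-x} dx.
x = np.linspace(0, 50, 2_000_001)
lhs = np.sum(f(x) * np.exp(-x)) * (x[1] - x[0])

# Right side: integral of f(X(omega)) d(mu) over Omega = [0,1],
# where X(omega) = -ln(1 - omega) pushes mu forward to Exp(1).
omega = np.linspace(0, 1, 2_000_001, endpoint=False)
rhs = np.mean(f(-np.log(1 - omega)))

print(lhs, rhs)  # both close to 2
```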
Statistics: Moments
Definition. Define the first moment as the linear functional
$$E(X) = \int t \, dP.$$
Then the $p$th central moment is a functional given by
$$m_p(X) = \int (t - E(X))^p \, dP.$$
Note that when $X \in L^2$ these are well defined by the theorem above. Pullbacks are omitted. Since we're talking about $L^2$, let's define the second central moment as $\sigma^2$ for convenience. If we need higher moments, sadly, mathematics says no.
Statistics: Independence
Now that we know what random variables are, let's try to define how a collection of them interact in the most basic way possible.
Definition. Let $X = \{X_\alpha\}$ be a collection of random variables. Then $X$ is independent if
$$P(\cap_\alpha X_\alpha^{-1}(B_\alpha)) = P((X_1, \ldots, X_n)^{-1}(B_1 \times \ldots \times B_n)) = (\mu(X_1^{-1}) \times \ldots \times \mu(X_n^{-1}))(B_1 \times \ldots \times B_n).$$
(Head explodes.) Really this just means that the $X_\alpha$ are independent if their joint distribution is given by the product measure over the induced measures.
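The product-measure condition is easy to see empirically. A minimal sketch, with the events $B_1 = [0, 0.3)$ and $B_2 = (0.6, 1]$ chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# For independent X1, X2 the joint measure of a rectangle factors:
# P(X1 in B1, X2 in B2) = P(X1 in B1) * P(X2 in B2).
n = 1_000_000
x1, x2 = rng.random(n), rng.random(n)
in_B1 = x1 < 0.3
in_B2 = x2 > 0.6
print((in_B1 & in_B2).mean())        # joint frequency: ~0.12
print(in_B1.mean() * in_B2.mean())   # product of marginals: ~0.12
```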
Limits: Repeating Independent Trials
Introduction
Introduction
To Laws of
Large
Numbers
Weak Law of
Large Numbers
Strong Law
Strongest Law
Examples
Information
Theory
Statistical
Learning
Appendix
Random
Variables
Working with
R.V.’s
Independence
Limits of
Random
Variables
Modes of
Convergence
Chebyshev
The significance of the belabored definition of independence is
that when a joint distribution, just a distribution over several
random variables, contains zero information about how the
variables are related.
If we have an infinite collection of random variables,
independent of each other - in spite of independence - we
would like to be able to infer information about the lower
order moments from the higher order moments and vice
versa.
If we have ideal conditions on the the random variables
then we should be able to deduce information about
moments from a very large sequence.
The type of convergence we get should be sensitive to the
hypotheses.
Modes of Convergence
We clearly need to specify how the means are converging, right? Why?
So how could $\sum_{i=1}^{N} (X_i - \mu_i)/N \to 0$?
In measure: $\lim_N P(\{|\sum_{i=1}^{N} (X_i - \mu_i)/N| \ge \epsilon\}) = 0$
In norm: $\lim_N \left( \int |\sum_{i=1}^{N} (X_i - \mu_i)/N|^p \, d\mu(\omega) \right)^{1/p} = 0$
A.E.: $P(\{\lim_N |\sum_{i=1}^{N} (X_i - \mu_i)/N| > 0\}) = 0$
[Diagram relating the modes of convergence for a sequence $f_n$: a.e. convergence, convergence in measure $\mu$, and $L^p$ convergence, linked through a.e.-convergent subsequences $f_{n_k}$.]
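The modes genuinely differ. A standard illustration (my sketch, not from the slides) is the sliding-bump "typewriter" sequence on $[0,1]$: it converges to 0 in measure and in every $L^p$, yet converges nowhere pointwise.

```python
import numpy as np

# The n-th function is the indicator of an interval of length 2^{-k}
# sweeping across [0,1], where n = 2^k + j with 0 <= j < 2^k.
def bump(n):
    k = int(np.floor(np.log2(n)))
    j = n - 2**k
    return j / 2**k, (j + 1) / 2**k

x = 0.37  # any fixed point of [0,1)
lengths, hits = [], 0
for n in range(1, 2**12):
    a, b = bump(n)
    lengths.append(b - a)   # mu({f_n = 1}) -> 0: convergence in measure
    hits += (a <= x < b)    # but f_n(x) = 1 once per dyadic level, forever

print(lengths[-1], hits)    # tiny measure, yet x was hit at all 12 levels
```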