Quick Review: More Theorems for Conditional Expectation

Definition
The conditional expectation of Z = h(X, Y) given X = x0 is defined as

E[Z | X = x0] = ∫ h(x0, y) f_{Y|X=x0}(y) dy   if continuous
             = Σ_y h(x0, y) p_{Y|X=x0}(y)     if discrete

Theorem (Law of Iterated Expectations)
E[h(X, Y)] = E[ E[h(X, Y) | X] ].
In particular, E[E[Y | X]] = E[Y].

Theorem
- E[a + bX + cY | X = x0] = a + b x0 + c E[Y | X = x0].
- If Y and X are independent, then E[h(Y) | X = x0] = E[h(Y)].

Analysis of Variance
V(Y) = V(E[Y | X]) + E[V(Y | X)].
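Both identities above are easy to check by simulation. The following Python sketch uses a hypothetical joint model, chosen only for illustration (X ∼ N(0, 1) and Y | X = x ∼ N(2x, 1), so E[Y | X] = 2X and V(Y | X) = 1), and compares E[Y] with E[E[Y | X]] and V(Y) with V(E[Y | X]) + E[V(Y | X)]:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical model for illustration: X ~ N(0, 1), Y | X = x ~ N(2x, 1).
x = rng.normal(0.0, 1.0, n)
y = rng.normal(2.0 * x, 1.0)

# Law of iterated expectations: E[Y] = E[ E[Y | X] ], with E[Y | X] = 2X here.
print(y.mean(), (2.0 * x).mean())          # both close to 0

# Analysis of variance: V(Y) = V(E[Y | X]) + E[V(Y | X)], with V(Y | X) = 1 here.
print(y.var(), (2.0 * x).var() + 1.0)      # both close to 5
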
Theorems for Multivariate Normal

Theorem
Let Y be a multivariate normal random variable, Nn(µY, ΣY), and consider a linear transformation of Y:
W = AY + b,
where A is an n × n matrix and b is a vector of length n. Then W is also multivariate normal, with mean
AµY + b
and covariance matrix
A ΣY Aᵀ.

- In fact, the theorem also holds if A is p × n and b is p × 1, with 1 ≤ p ≤ n. The result is well defined as long as A ΣY Aᵀ is non-singular.
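A quick numerical illustration of the theorem. The particular µY, ΣY, A, and b below are arbitrary choices (not from the slides); the sketch compares the sample mean and covariance of W = AY + b with AµY + b and A ΣY Aᵀ:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration (p = 2, n = 3).
mu_Y = np.array([1.0, -2.0, 0.5])
Sigma_Y = np.array([[2.0, 0.3, 0.0],
                    [0.3, 1.0, 0.4],
                    [0.0, 0.4, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([0.5, 1.0])

# Theoretical mean and covariance of W = AY + b.
mu_W = A @ mu_Y + b
Sigma_W = A @ Sigma_Y @ A.T

# Monte Carlo check.
Y = rng.multivariate_normal(mu_Y, Sigma_Y, size=200_000)
W = Y @ A.T + b
print(mu_W, W.mean(axis=0))              # should agree closely
print(Sigma_W)
print(np.cov(W, rowvar=False))           # should agree closely
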
Conditional Expectation for Bivariate Normal: 2nd Example

Example
Suppose {X, Y} is N2(µ, Σ), where
µ = (3, 1)ᵀ
and
Σ = [ 1    0.5
      0.5  2   ].
Find E[Y | X = x0].

1. Find A and b such that {X, Y}ᵀ = AZ + b. Choose A upper/lower triangular, depending on conditioning.
2. Using the transformation, find the expectation.
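The triangular-transformation route sketched in steps 1 and 2 leads to the standard bivariate-normal conditional mean, E[Y | X = x0] = µY + (ΣXY/ΣXX)(x0 − µX). The short Python sketch below evaluates that formula for the numbers above; it is an equivalent check, not the slide's step-by-step transformation:

import numpy as np

mu = np.array([3.0, 1.0])          # (mu_X, mu_Y)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])     # [[Var X, Cov], [Cov, Var Y]]

def cond_mean_Y_given_X(x0, mu, Sigma):
    """E[Y | X = x0] for a bivariate normal with parameters (mu, Sigma)."""
    return mu[1] + Sigma[0, 1] / Sigma[0, 0] * (x0 - mu[0])

print(cond_mean_Y_given_X(4.0, mu, Sigma))   # 1 + 0.5*(4 - 3) = 1.5
# In general: E[Y | X = x0] = 1 + 0.5*(x0 - 3) = 0.5*x0 - 0.5.
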
Estimation

According to the National Academy of Sciences 2005 publication, “Saving Women’s Lives: Strategies for Improving Breast Cancer Detection and Diagnosis”, the risk of a false positive result in a mammogram is about 1 in 10.
How did they come up with this number?

Estimation Con't.

In probability/statistics language we say that X1, ..., Xn are IID (independent and identically distributed) Bernoulli(p).
In general, if X1, ..., Xn are IID from some distribution with, say, pdf f, mean µ and variance σ², then
1. The joint density of X1, ..., Xn is Π_{i=1}^n f(xi).
2. The mean of X̄n is µ.
3. The variance of X̄n is σ²/n.
But what is the distribution of X̄n?
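Before turning to the distribution of X̄n, items 2 and 3 above can be verified by simulation. In the sketch below, p = 0.1 matches the 1-in-10 false-positive rate quoted above, while the sample size n and the number of replications are arbitrary illustration values:

import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.1, 100, 50_000

# reps independent samples of size n from Bernoulli(p); one X-bar per sample.
xbars = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

print(xbars.mean(), p)                    # mean of X-bar is mu = p
print(xbars.var(), p * (1 - p) / n)       # variance of X-bar is sigma^2 / n
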
For some special cases, we can write down the answer:
1. X ∼ Bernoulli(p), then nX̄n ∼ Binomial(n, p).
2. X ∼ N(µ, σ²), then X̄n ∼ N(µ, σ²/n).
3. X ∼ Exponential(λ), then 2nλX̄n ∼ Gamma(n, 1/2) (shape n, rate 1/2), i.e., a χ² distribution with 2n degrees of freedom.
But this isn't good enough! Therefore, we study the asymptotics of X̄n.

Convergence of Random Variables

1. Convergence with probability one.
2. Convergence in probability.
3. Convergence in Mean Square.
4. Convergence in Distribution.
Why so many? RVs are functions, random ones too.
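As a sanity check of special case 1 above, the count nX̄n should have exactly the Binomial(n, p) distribution, so its empirical frequencies should match the binomial pmf. The values of p, n, and the number of replications below are arbitrary illustration choices:

import numpy as np
from math import comb

rng = np.random.default_rng(0)
p, n, reps = 0.3, 10, 200_000

# n * X-bar is just the number of successes in each sample of size n.
counts = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

for k in range(n + 1):
    empirical = np.mean(counts == k)
    exact = comb(n, k) * p**k * (1 - p)**(n - k)   # Binomial(n, p) pmf
    print(k, round(empirical, 4), round(exact, 4))
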
Convergence with Probability One

This is the strongest type of convergence, and is usually the hardest to prove.

Definition
Xn converges to X with probability one if
P(Xn → X) = 1.

This is written in many ways:
- Xn → X wp1
- Xn →^wp1 X
- Xn → X a.s.

This is as close to "pointwise" convergence as random variables get, since we can always ignore what's happening on a set that has probability zero.

Convergence in Probability

This is a weaker type of convergence, and is often easier to prove.

Definition
Xn converges to X in probability if, for all ε > 0,
lim_{n→∞} P(|Xn − X| > ε) = 0.

This is usually written as Xn →^p X.
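To see the definition of convergence in probability in action, here is a small sketch. The construction is a hypothetical example (not from the slides): X ∼ N(0, 1) is fixed and Xn = X + Wn with Wn ∼ N(0, 1/n), so |Xn − X| = |Wn| and the estimated P(|Xn − X| > ε) should fall toward 0 as n grows:

import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.1, 100_000

# Hypothetical example: X ~ N(0, 1) and X_n = X + W_n with W_n ~ N(0, 1/n).
X = rng.normal(size=reps)
for n in (1, 10, 100, 1000, 10000):
    Xn = X + rng.normal(scale=1 / np.sqrt(n), size=reps)
    print(n, np.mean(np.abs(Xn - X) > eps))   # estimated P(|X_n - X| > eps)
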
Convergence in Probability to a Constant

Suppose that Xn →^p c. Then we can also check that
lim_{n→∞} P(Xn ≤ x) = 1 if x ≥ c, and 0 if x < c.

Convergence in MSE

Definition
Xn converges to X in Mean Square if
lim_{n→∞} E[(Xn − X)²] = 0.

This is usually written as Xn →^ms X.

Convergence in MS to a Constant
E[(Xn − c)²] = V(Xn) + (E[Xn] − c)².
Therefore, Xn →^ms c iff V(Xn) → 0 and E[Xn] → c.

Convergence in Distribution

This is the weakest type of convergence for RVs: it says something only about the behaviour of the limit (and nothing about the joint relationship of Xn and X).

Definition
Xn converges to X in distribution if
lim_{n→∞} P(Xn ≤ x) = P(X ≤ x)
for all x such that P(X ≤ x) is continuous at x.

This is usually written as Xn →^d X or Xn ⇒ X.
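A short check that ties these ideas together, using X̄n of Uniform(0, 1) draws as an arbitrary example: X̄n →^p 1/2, and accordingly the estimated P(X̄n ≤ x) moves toward 0 for x below 1/2 and toward 1 for x above 1/2, as in the constant-limit result above. The sample sizes and replication count are illustration values:

import numpy as np

rng = np.random.default_rng(0)
reps = 20_000

# X-bar_n of Uniform(0, 1) draws converges in probability to c = 1/2,
# so P(X-bar_n <= x) should approach 0 for x < 1/2 and 1 for x > 1/2.
for n in (5, 50, 500):
    xbar = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, np.mean(xbar <= 0.4), np.mean(xbar <= 0.6))
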
Relationships between the types of convergence

Xn → X wp1   ⇒   Xn →^p X   ⇐(*)   Xn →^ms X
                     ⇓
                  Xn ⇒ X

Proof of (*):
P(|Xn − X| > ε) ≤ E[(Xn − X)²] / ε²,
by Markov's inequality applied to (Xn − X)².

A Technical Note

- Let Xn = 1/2^n. Then Xn → 0 with probability one, but P(Xn ≤ 0) = 0 for every n, so lim P(Xn ≤ x) fails to match P(0 ≤ x) at x = 0. This is why convergence in distribution is only required at points where the limiting CDF is continuous.
- Let Z be a standard normal, and consider Xn = (−1)^(n+1) Z. Every Xn is standard normal, so Xn ⇒ Z, but Xn does not converge to Z in probability (for even n, |Xn − Z| = 2|Z|).
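The second example above can be checked numerically (the sample size and ε below are arbitrary): every Xn = (−1)^(n+1) Z has the same N(0, 1) distribution as Z, yet for even n the distance |Xn − Z| = 2|Z| does not shrink, so the estimated P(|Xn − Z| > ε) stays large:

import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=100_000)
eps = 0.1

for n in (1, 2, 3, 10, 11, 1000):
    Xn = (-1) ** (n + 1) * Z
    # Each X_n is N(0, 1): same distribution as Z (convergence in distribution),
    # but for even n, |X_n - Z| = 2|Z|, so P(|X_n - Z| > eps) stays near 1.
    print(n, round(Xn.mean(), 3), round(Xn.std(), 3),
          np.mean(np.abs(Xn - Z) > eps))
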
Law of Large Numbers (LLN)

Theorem (Strong Law of Large Numbers)
X1, ..., Xn are IID with mean µ < ∞ (E[|X|] < ∞). Then X̄n → µ with probability one.

Theorem (Weak Law of Large Numbers)
X1, ..., Xn are IID with mean µ < ∞ (E[|X|] < ∞) and V(X) = σ² < ∞. Then X̄n → µ in probability.

Proof (of WLLN):
E[(X̄n − µ)²] = σ²/n → 0, so X̄n →^ms µ, and convergence in mean square implies convergence in probability.

Why is this important?

Central Limit Theorem (CLT)

Theorem
X1, ..., Xn are IID with mean µ and V(X) = σ² < ∞. Then
√n (X̄n − µ) ⇒ Z,
where Z ∼ N(0, σ²).

There are many different ways of writing this.
Why is this important?
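As an illustration of the theorem before its proof, the sketch below simulates √n(X̄n − µ) for increasing n. The Exponential(1) population (so µ = 1 and σ² = 1), the sample sizes, and the replication count are arbitrary choices; the tail probability P(√n(X̄n − µ) ≤ 0) moves toward Φ(0) = 0.5 as the normal approximation takes hold:

import numpy as np

rng = np.random.default_rng(0)
mu, reps = 1.0, 50_000

# Arbitrary illustration: X_i ~ Exponential(1), so mu = 1 and sigma^2 = 1.
for n in (1, 5, 30, 200):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    T = np.sqrt(n) * (xbar - mu)              # approx N(0, sigma^2) for large n
    print(n, round(T.mean(), 3), round(T.var(), 3),
          round(np.mean(T <= 0), 3))          # P(T <= 0) -> 0.5
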
Proof of CLT