Quick Review: More Theorems for Conditional Expectation

Definition. The conditional expectation of $Z = h(X, Y)$ is defined as
\[
E[Z \mid X = x_0] =
\begin{cases}
\int h(x_0, y)\, f_{Y \mid X = x_0}(y)\, dy & \text{if continuous,} \\[2pt]
\sum_{y} h(x_0, y)\, p_{Y \mid X = x_0}(y) & \text{if discrete.}
\end{cases}
\]

Theorem (Law of Iterated Expectations). $E[h(X, Y)] = E\big[E[h(X, Y) \mid X]\big]$. In particular, $E\big[E[Y \mid X]\big] = E[Y]$.

- $E[a + bX + cY \mid X = x_0] = a + b x_0 + c\,E[Y \mid X = x_0]$.

Theorem. If $Y$ and $X$ are independent, then $E[h(Y) \mid X = x_0] = E[h(Y)]$.

Theorem (Analysis of Variance). $V(Y) = V(E[Y \mid X]) + E[V(Y \mid X)]$.

Theorems for Multivariate Normal

Theorem. Let $Y$ be a multivariate normal random vector, $N_n(\mu_Y, \Sigma_Y)$, and consider a linear transformation of $Y$: $W = AY + b$, where $A$ is an $n \times n$ matrix and $b$ is a vector of length $n$. Then $W$ is also multivariate normal, with mean $A\mu_Y + b$ and covariance matrix $A\Sigma_Y A^T$.

- In fact, the theorem also holds if $A$ is $p \times n$ and $b$ is $p \times 1$, with $1 \le p \le n$. The result is well defined as long as $A\Sigma_Y A^T$ is non-singular.

Conditional Expectation for Bivariate Normal: 2nd Example

Suppose $(X, Y)^T$ is $N_2(\mu, \Sigma)$, where
\[
\mu = \begin{pmatrix} 3 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}.
\]
Find $E[Y \mid X = x_0]$.

1. Find $A$ and $b$ such that $(X, Y)^T = AZ + b$, where $Z$ is a vector of independent standard normals. Choose $A$ upper or lower triangular, depending on which variable is conditioned on.
2. Using the transformation, find the expectation (a worked sketch is given below).

Estimation

According to the National Academy of Sciences 2005 publication, "Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis", the risk of a false positive result in a mammogram is about 1 in 10. How did they come up with this number?

Estimation Con't.

In probability/statistics language, we say that $X_1, \dots, X_n$ are IID (independent and identically distributed) Bernoulli$(p)$. In general, if $X_1, \dots, X_n$ are IID from some distribution with, say, pdf $f$, mean $\mu$ and variance $\sigma^2$, then:

1. The joint density of $X_1, \dots, X_n$ is $\prod_{i=1}^{n} f(x_i)$.
2. The mean of $\bar X_n$ is $\mu$.
3. The variance of $\bar X_n$ is $\sigma^2 / n$.

But what is the distribution of $\bar X_n$?
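A quick Monte Carlo check of facts 2 and 3 (a minimal sketch assuming NumPy; the choice of Bernoulli(0.1), echoing the 1-in-10 false positive rate, and the sample sizes are illustrative, not from the slides):

```python
import numpy as np

# Check E[Xbar_n] = mu and V(Xbar_n) = sigma^2 / n by simulation,
# for IID Bernoulli(p) observations.
rng = np.random.default_rng(0)
p = 0.1                       # Bernoulli(p), echoing the 1-in-10 rate
mu, sigma2 = p, p * (1 - p)   # mean and variance of one observation
reps = 100_000                # Monte Carlo replications per sample size

for n in [10, 100, 1000]:
    # The sum of n IID Bernoulli(p) is Binomial(n, p), so Xbar_n = Binomial/n.
    xbar = rng.binomial(n, p, size=reps) / n
    print(f"n={n:>5}  mean≈{xbar.mean():.4f} (mu={mu})"
          f"  var≈{xbar.var():.2e} (sigma^2/n={sigma2 / n:.2e})")
```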
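Worked sketch for the 2nd Example above (my reconstruction of where the two steps lead, taking $A$ to be the lower-triangular Cholesky factor of $\Sigma$; it is not the slides' own solution). With $Z = (Z_1, Z_2)^T$ independent standard normals and $b = \mu$,
\[
A = \begin{pmatrix} 1 & 0 \\ 0.5 & \sqrt{1.75} \end{pmatrix},
\qquad
AA^T = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix} = \Sigma,
\]
so $X = Z_1 + 3$ and $Y = 0.5\,Z_1 + \sqrt{1.75}\,Z_2 + 1$. Conditioning on $X = x_0$ fixes $Z_1 = x_0 - 3$ and leaves $Z_2$ (independent of $X$) untouched, so
\[
E[Y \mid X = x_0] = 1 + 0.5\,(x_0 - 3) + \sqrt{1.75}\;E[Z_2] = 1 + 0.5\,(x_0 - 3),
\]
which agrees with the general bivariate normal formula $E[Y \mid X = x_0] = \mu_Y + (\sigma_{XY}/\sigma_X^2)(x_0 - \mu_X)$.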
Convergence of Random Variables

For some special cases, we can write down the answer exactly:

1. If $X \sim$ Bernoulli$(p)$, then $n\bar X_n \sim$ Binomial$(n, p)$.
2. If $X \sim N(\mu, \sigma^2)$, then $\bar X_n \sim N(\mu, \sigma^2/n)$.
3. If $X \sim$ Exponential$(\lambda)$, then $2n\lambda\bar X_n \sim$ Gamma$(n, 1/2)$ (shape $n$, rate $1/2$), i.e. $\chi^2_{2n}$.

But this isn't good enough! Therefore, we study the asymptotics of $\bar X_n$, using four notions of convergence:

1. Convergence with probability one.
2. Convergence in probability.
3. Convergence in mean square.
4. Convergence in distribution.

Why so many? Because random variables are functions (random ones), and there are several inequivalent senses in which a sequence of functions can converge.

Convergence with Probability One

This is the strongest type of convergence, and is usually the hardest to prove.

Definition. $X_n$ converges to $X$ with probability one if
\[
P(X_n \to X) = 1.
\]

This is written in many ways:

- $X_n \to X$ wp1
- $X_n \xrightarrow{\text{wp1}} X$
- $X_n \to X$ a.s.

This is as close to "pointwise" convergence as random variables get, since we can always ignore what is happening on a set that has probability zero.

Convergence in Probability

This is a weaker type of convergence, and is often easier to prove.

Definition. $X_n$ converges to $X$ in probability if, for all $\varepsilon > 0$,
\[
\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0.
\]

This is usually written as $X_n \xrightarrow{p} X$.

Convergence in probability to a constant: suppose $X_n \xrightarrow{p} c$. Then we can also check that
\[
\lim_{n \to \infty} P(X_n \le x) =
\begin{cases}
1 & x > c, \\
0 & x < c,
\end{cases}
\]
which is the CDF of the constant $c$ at all of its continuity points.

Convergence in Mean Square

Definition. $X_n$ converges to $X$ in mean square if
\[
\lim_{n \to \infty} E[(X_n - X)^2] = 0.
\]

This is usually written as $X_n \xrightarrow{ms} X$.

Convergence in MS to a constant:
\[
E[(X_n - c)^2] = V(X_n) + (E[X_n] - c)^2.
\]
Therefore $X_n \xrightarrow{ms} c$ if and only if $V(X_n) \to 0$ and $E[X_n] \to c$.

Convergence in Distribution

This is the weakest type of convergence for RVs: it says something only about the behaviour of the limit (and nothing about the joint relationship of $X_n$ and $X$).

Definition. $X_n$ converges to $X$ in distribution if
\[
\lim_{n \to \infty} P(X_n \le x) = P(X \le x)
\]
for all $x$ at which the CDF $x \mapsto P(X \le x)$ is continuous.

This is usually written as $X_n \xrightarrow{d} X$ or $X_n \Rightarrow X$.

Relationships Between the Types of Convergence

\[
\begin{array}{ccc}
X_n \to X \ \text{wp1} & & \\
\Downarrow & & \\
X_n \xrightarrow{p} X & \overset{*}{\Longleftarrow} & X_n \xrightarrow{ms} X \\
\Downarrow & & \\
X_n \Rightarrow X & &
\end{array}
\]

Proof of $*$: by Markov's inequality applied to $(X_n - X)^2$,
\[
P(|X_n - X| > \varepsilon) \le \frac{E[(X_n - X)^2]}{\varepsilon^2}.
\]

A Technical Note

- Let $X_n = 1/2^n$. This deterministic sequence converges to $0$ in every sense above.
- Let $Z$ be a standard normal, and consider $X_n = (-1)^{n+1} Z$. Every $X_n$ is itself $N(0, 1)$, so $X_n \Rightarrow Z$ trivially; but $|X_n - Z| = 2|Z|$ for every even $n$, so $X_n$ does not converge to $Z$ in probability. Convergence in distribution really does say nothing about the joint behaviour of $X_n$ and $X$.

Law of Large Numbers (LLN)

Theorem (Strong Law of Large Numbers). $X_1, \dots, X_n$ are IID with finite mean $\mu$ ($E[|X|] < \infty$). Then $\bar X_n \to \mu$ with probability one.

Theorem (Weak Law of Large Numbers). $X_1, \dots, X_n$ are IID with finite mean $\mu$ ($E[|X|] < \infty$) and $V(X) = \sigma^2 < \infty$. Then $\bar X_n \to \mu$ in probability.

Proof (of WLLN): $E[(\bar X_n - \mu)^2] = \sigma^2/n \to 0$, so $\bar X_n \xrightarrow{ms} \mu$, and by $*$ this implies $\bar X_n \xrightarrow{p} \mu$.

- Why is this important?

Central Limit Theorem (CLT)

Theorem. $X_1, \dots, X_n$ are IID with mean $\mu$ and $V(X) = \sigma^2 < \infty$. Then
\[
\sqrt{n}\,(\bar X_n - \mu) \Rightarrow Z,
\]
where $Z \sim N(0, \sigma^2)$.

- Why is this important?
- There are many different ways of writing this.

Proof of CLT
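One standard route, sketched here under the extra assumption that the moment generating function of $X$ exists near zero (the characteristic-function version avoids this assumption): let $M(s) = E[e^{s(X - \mu)}]$. A Taylor expansion using $E[X - \mu] = 0$ and $E[(X - \mu)^2] = \sigma^2$ gives $M(s) = 1 + \sigma^2 s^2/2 + o(s^2)$ as $s \to 0$. By independence,
\[
M_{\sqrt{n}(\bar X_n - \mu)}(t)
= \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n
= \left[1 + \frac{\sigma^2 t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n
\longrightarrow e^{\sigma^2 t^2/2},
\]
which is the MGF of $N(0, \sigma^2)$; convergence of MGFs implies convergence in distribution.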
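A numerical illustration of the theorem (a minimal simulation sketch assuming NumPy; the Exponential(2) choice and the sample sizes are illustrative only):

```python
import numpy as np

# CLT check: sqrt(n) * (Xbar_n - mu) should be approximately N(0, sigma^2).
# For X ~ Exponential(lambda), the mean and standard deviation are both 1/lambda.
rng = np.random.default_rng(1)
lam = 2.0
mu = sigma = 1 / lam
reps = 50_000                 # Monte Carlo replications per sample size

for n in [5, 50, 500]:
    x = rng.exponential(scale=1 / lam, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma   # standardized sample means
    # For a standard normal, P(Z <= 1) is about 0.8413.
    print(f"n={n:>4}  P(Z_n <= 1)≈{np.mean(z <= 1.0):.4f}  (limit: 0.8413)")
```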
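Returning to the technical note above, a short check (again a sketch assuming NumPy) that $X_n = (-1)^{n+1} Z$ converges to $Z$ in distribution but not in probability:

```python
import numpy as np

# Every X_n = (-1)^(n+1) Z is N(0,1), so X_n => Z trivially;
# but for even n, |X_n - Z| = 2|Z|, which does not shrink as n grows.
rng = np.random.default_rng(2)
z = rng.standard_normal(200_000)
eps = 0.5

for n in [2, 100, 10_000]:                # even n, so X_n = -Z
    xn = (-1) ** (n + 1) * z
    gap = np.mean(np.abs(xn - z) > eps)   # estimate of P(|X_n - Z| > eps)
    print(f"n={n:>6}  P(|X_n - Z| > {eps}) ≈ {gap:.3f}")
# Stays near P(2|Z| > 0.5) = P(|Z| > 0.25) ≈ 0.803 for every even n.
```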