Lesson 6 Chapter 5: Convolutions and The Central Limit Theorem
Michael Akritas
Department of Statistics, The Pennsylvania State University

Outline
1 Law of Large Numbers
2 Convolutions
    Introduction
    Distribution of Sums of Normal Random Variables
3 The Central Limit Theorem
    The DeMoivre-Laplace Theorem

Law of Large Numbers

Theorem (The Law of Large Numbers)
Let X1, ..., Xn be independent and identically distributed, and let g be a function such that −∞ < E[g(X1)] < ∞. Then

    (1/n) Σ_{i=1}^{n} g(Xi)  converges in probability to  E[g(X1)].

The LLN guarantees the consistency of averages, i.e., that as the sample size increases, the average approximates the population mean more and more accurately.

Limitation: The LLN provides no guidance regarding the quality of the approximation.

The Assumption of Finite µ and σ²

The LLN requires that the population has a finite mean. Later we will also use the assumption of finite variance. Is it possible for the mean of a RV not to exist? Or for a RV to have infinite mean, or to have finite µ but infinite σ²?

The most famous abnormal RV is the Cauchy, whose pdf is

    f(x) = (1/π) · 1/(1 + x²),  −∞ < x < ∞.

Its median is zero, but its mean does not exist.

Example
Let X1 and X2 have pdfs f1(x) = x⁻², 1 ≤ x < ∞, and f2(x) = 2x⁻³, 1 ≤ x < ∞, respectively. Then E(X1) = ∞, E(X2) = 2, Var(X2) = ∞.

Solution:
    E(X1) = ∫₁^∞ x f1(x) dx = ∫₁^∞ x⁻¹ dx = ∞,    E(X1²) = ∫₁^∞ x² f1(x) dx = ∞,
    E(X2) = ∫₁^∞ x f2(x) dx = ∫₁^∞ 2x⁻² dx = 2,   E(X2²) = ∫₁^∞ x² f2(x) dx = ∫₁^∞ 2x⁻¹ dx = ∞,

so Var(X2) = E(X2²) − [E(X2)]² = ∞.

Distributions with infinite mean or infinite variance are described as heavy tailed. The LLN may not hold for heavy-tailed distributions: X̄ may fail to converge (as in the Cauchy case) or may diverge. (The CLT, which we will see shortly, also does not hold.) Samples coming from heavy-tailed distributions are much more likely to contain outliers. If large outliers exist in a data set, it might be a good idea to focus on estimating another quantity, such as the median, which is well defined also for heavy-tailed distributions.

Convolutions: Introduction

The distribution of the sum of two independent random variables is called the convolution of the two distributions. In some cases it is easy to find the convolution of two RVs:
1 If Xi ∼ Bin(ni, p), i = 1, ..., k, are independent, then X1 + ··· + Xk ∼ Bin(n1 + ··· + nk, p). (Example 5.3.2.)
2 If Xi ∼ Poisson(λi), i = 1, ..., k, are independent, then X1 + ··· + Xk ∼ Poisson(λ1 + ··· + λk). (Example 5.3.1.)
3 We will also see that if Xi ∼ N(µi, σi²), i = 1, ..., k, are independent, then X1 + ··· + Xk ∼ N(µ1 + ··· + µk, σ1² + ··· + σk²).
Such nice results are the exception.
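The Poisson closure stated above (independent Poissons sum to a Poisson) can be verified directly by convolving the two pmfs term by term. A short sketch, in Python rather than the R used later in these slides, purely as an illustration; λ1 = 2 and λ2 = 3 are arbitrary choices:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """Poisson(lam) pmf at k."""
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2 = 2.0, 3.0
N = 60  # truncation point; the tail mass beyond N is negligible here

p1 = [pois_pmf(k, lam1) for k in range(N)]
p2 = [pois_pmf(k, lam2) for k in range(N)]

# Discrete convolution: P(X1 + X2 = s) = sum_k P(X1 = k) P(X2 = s - k)
conv = [sum(p1[k] * p2[s - k] for k in range(s + 1)) for s in range(N)]

# Compare with the Poisson(lam1 + lam2) pmf
err = max(abs(conv[s] - pois_pmf(s, lam1 + lam2)) for s in range(N))
print(err)  # essentially zero (floating-point roundoff only)
```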
In general, the distribution of a sum of independent random variables is not easy to determine.

Example 5.3.3 derives the convolution of two independent U(0, 1) random variables; see
http://stat.psu.edu/~mga/401/fig/convol_2Unif.pdf
See also relation (5.3.3) for the general formula for the density of the convolution of two densities (engineering majors might recognize it!).

Instead of the mathematical derivation of the formula for the convolution of two independent U(0, 1) random variables, we will use R and random numbers to give a numerical "proof" of it. The numerical "proof" is based on the fact that a histogram serves as an estimator of the pdf. In fact, histograms are consistent estimators (i.e., as the sample size increases, the estimation error tends to zero).

Numerical verification of the result of Example 5.3.3 can be done with the R commands hist and density (smooth histogram). Try:

    m = matrix(runif(150000), nrow = 3)
    plot(density((m[1,] + m[2,])/2), ylim = c(0, 2))
    lines(density(m[1,]), col = "red")

How about the convolution of three independent uniforms?
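The same verification can also be done deterministically: relation (5.3.3) gives f_{X1+X2}(s) = ∫ f1(x) f2(s − x) dx, and for two U(0, 1) densities this integral works out to the triangular density (s for 0 ≤ s ≤ 1, and 2 − s for 1 ≤ s ≤ 2). A numerical sketch, in Python as an illustration, evaluating the integral by a midpoint Riemann sum:

```python
# Numerically evaluate f_{X1+X2}(s) = ∫₀¹ f2(s - x) dx for two U(0,1)
# densities (f1 = 1 on (0,1), so we integrate only over [0,1]),
# and compare with the triangular density: s on [0,1], 2 - s on [1,2].

def unif_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def conv_pdf(s, n=20000):
    # midpoint Riemann sum over the support of f1
    h = 1.0 / n
    return sum(unif_pdf(s - (k + 0.5) * h) for k in range(n)) * h

def triangle(s):
    return s if s <= 1.0 else 2.0 - s

for s in (0.25, 0.5, 1.0, 1.5, 1.75):
    # the two columns agree to within about 1/n
    print(s, conv_pdf(s), triangle(s))
```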
Distribution of Sums of Normal Random Variables

Proposition
Let X1, X2, ..., Xn be independent with Xi ∼ N(µi, σi²), and set Y = a1 X1 + ··· + an Xn. Then Y ∼ N(µY, σY²), where

    µY = a1 µ1 + ··· + an µn,  and  σY² = a1² σ1² + ··· + an² σn².

Corollary
Let X1, X2, ..., Xn be iid N(µ, σ²). Then X̄ ∼ N(µ, σ²/n).

Example (Controlling the estimation error via n)
We want to be 95% certain that X̄ does not differ from the population mean by more than 0.3 units. How large should the sample size n be? It is known that the population distribution is normal with σ² = 9.

Remark:
1 This example illustrates the use of a probabilistic result (the previous corollary) to answer a statistical question.
2 The example assumes normality with known variance. In real life these are typically unknown. More details on this will be given in Chapter 8.

Solution: Using the previous corollary we have

    P(|X̄ − µ| < 0.3) = P( |X̄ − µ|/(σ/√n) < 0.3/(σ/√n) ) = P( |Z| < 0.3/(σ/√n) ).

We want to find n so that P(|X̄ − µ| < 0.3) = 0.95.
Thus,

    0.3/(σ/√n) = z0.025,  or  n = (1.96 σ / 0.3)² = (1.96 × 3 / 0.3)² = 384.16.

Thus, using n = 385 will satisfy the precision objective.

The Central Limit Theorem

The complexity of finding the exact distribution of a sum (or average) of k (> 2) RVs increases rapidly with k. This is why the following is the most important theorem in probability and statistics.

Theorem (The Central Limit Theorem)
Let X1, ..., Xn be iid with mean µ and variance σ² < ∞. Then, for large enough n (n ≥ 30 for the purposes of this course), X̄ has approximately a normal distribution with mean µ and variance σ²/n, i.e.

    X̄ ∼̇ N(µ, σ²/n),

where ∼̇ denotes "approximately distributed as."

The Central Limit Theorem: Comments

The CLT does not require the Xi to be continuous RVs.
T = X1 + ··· + Xn has approximately a normal distribution with mean nµ and variance nσ², i.e.

    T = X1 + ··· + Xn ∼̇ N(nµ, nσ²).

The CLT can be stated for more general linear combinations of independent RVs, and also for certain dependent RVs. The present statement suffices for the range of applications of this book.
The CLT explains the central role of the normal distribution in probability and statistics.

Example
Two towers are constructed, each by stacking 30 segments of concrete. The height (in inches) of a randomly selected segment is uniformly distributed in (35.5, 36.5). A roadway can be laid across the two towers provided the heights of the two towers are within 4 inches. Find the probability that the roadway can be laid.
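As a side illustration of the CLT in this setup (a Python sketch, not part of the slides; the sample size 100,000 is an arbitrary choice): each segment height is U(35.5, 36.5), with mean 36 and variance 1/12, so the height of one 30-segment tower should be approximately N(30 × 36, 30/12) = N(1080, 2.5). A simulation agrees:

```python
import random

random.seed(1)

# One tower's height: the sum of 30 independent U(35.5, 36.5) segment heights.
# Each segment has mean 36 and variance 1/12, so by the CLT the tower height
# is approximately N(30*36, 30/12) = N(1080, 2.5).
def tower_height():
    return sum(random.uniform(35.5, 36.5) for _ in range(30))

heights = [tower_height() for _ in range(100_000)]
m = sum(heights) / len(heights)
v = sum((h - m) ** 2 for h in heights) / len(heights)
print(m, v)  # close to 1080 and 2.5
```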
Solution: The heights of the two towers are T1 = X1,1 + ··· + X1,30 and T2 = X2,1 + ··· + X2,30, where Xi,j is the height of the jth segment used in the ith tower. By the CLT,

    T1 ∼̇ N(1080, 2.5)  and  T2 ∼̇ N(1080, 2.5).

Example (Continued)
Since the two heights T1, T2 are independent RVs, their difference, D, is distributed as

    D = T1 − T2 ∼̇ N(0, 5).

Therefore, the probability that the roadway can be laid is

    P(−4 < D < 4) = P(−4/2.236 < Z < 4/2.236) = Φ(1.79) − Φ(−1.79) = .9633 − .0367 = .9266.

Example
The waiting time for a bus, in minutes, is uniformly distributed in [0, 10]. In five months a person catches the bus 120 times. Find the 95th percentile, t0.05, of the person's total waiting time T.

Solution: Write T = X1 + ··· + X120, where Xi = waiting time for catching the bus the ith time. Since n = 120 ≥ 30, and E(Xi) = 5, Var(Xi) = 10²/12 = 100/12, we have

    T ∼̇ N(120 × 5, 120 × 100/12) = N(600, 1000).

Therefore,

    t0.05 ≈ 600 + z0.05 √1000 = 600 + 1.645 √1000 = 652.02.

The DeMoivre-Laplace Theorem

The DeMoivre-Laplace theorem was proved first by DeMoivre in 1733 for p = 0.5 and extended to general p by Laplace in 1812.
Due to the prevalence of the Bernoulli distribution in the experimental sciences, it continues to be stated separately even though it is now recognized as a special case of the CLT.

Theorem (DeMoivre-Laplace)
If T ∼ Bin(n, p) then, for n satisfying np ≥ 5 and n(1 − p) ≥ 5,

    T ∼̇ N(np, np(1 − p)).

The continuity correction

For an integer-valued RV X, and Y the approximating normal RV, the principle of continuity correction specifies

    P(X ≤ k) ≈ P(Y ≤ k + 0.5).

The DeMoivre-Laplace theorem and continuity correction yield

    P(T ≤ k) ≈ Φ( (k + 0.5 − np) / √(np(1 − p)) ).

For individual probabilities use

    P(X = k) ≈ P(k − 0.5 < Y < k + 0.5).

[Figure: Bin(10, 0.5) PMF overlaid with the N(5, 2.5) PDF, illustrating the approximation of P(X ≤ 5) with and without continuity correction.]

Example
A college basketball team plays 30 regular season games, 16 of which are against class A teams and 14 against class B teams. The probability that the team will win a game is 0.4 if it plays against a class A team and 0.6 if it plays against a class B team. Assuming that the results of different games are independent, approximate the probability that
1 the team will win at least 18 games;
2 the number of wins against class B teams is smaller than that against class A teams.
Solution: If X1 and X2 denote the number of wins against class A teams and class B teams, respectively, then X1 ∼ Bin(16, 0.4) and X2 ∼ Bin(14, 0.6). Since 16 × 0.4 and 14 × 0.4 are both ≥ 5,

    X1 ∼̇ N(16 × 0.4, 16 × 0.4 × 0.6) = N(6.4, 3.84),
    X2 ∼̇ N(14 × 0.6, 14 × 0.6 × 0.4) = N(8.4, 3.36).

Consequently, since X1 and X2 are independent,

    X1 + X2 ∼̇ N(6.4 + 8.4, 3.84 + 3.36) = N(14.8, 7.20),
    X2 − X1 ∼̇ N(8.4 − 6.4, 3.84 + 3.36) = N(2, 7.20).

Solution continued: Hence, using also the continuity correction, the needed approximation for part 1 is

    P(X1 + X2 ≥ 18) = 1 − P(X1 + X2 ≤ 17) ≈ 1 − Φ( (17.5 − 14.8)/√7.2 ) = 1 − Φ(1.006) = 1 − 0.843 = 0.157,

and the needed approximation for part 2 is

    P(X2 − X1 < 0) ≈ Φ( (−0.5 − 2)/√7.2 ) = Φ(−0.932) = 0.176.
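As a check (not part of the slides), the exact binomial probabilities can be computed and compared with the normal approximations above. A Python sketch, evaluating Φ via the error function:

```python
from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    """Bin(n, p) pmf at k."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact P(X1 + X2 >= 18), with X1 ~ Bin(16, 0.4), X2 ~ Bin(14, 0.6) independent
p_win18 = sum(
    binom_pmf(a, 16, 0.4) * binom_pmf(b, 14, 0.6)
    for a in range(17) for b in range(15) if a + b >= 18
)

# Exact P(X2 - X1 < 0)
p_fewerB = sum(
    binom_pmf(a, 16, 0.4) * binom_pmf(b, 14, 0.6)
    for a in range(17) for b in range(15) if b < a
)

# Normal approximations from the solution, with continuity correction
approx18 = 1 - Phi((17.5 - 14.8) / sqrt(7.2))
approxB = Phi((-0.5 - 2) / sqrt(7.2))

# the exact and approximate values are close
print(round(p_win18, 3), round(approx18, 3))
print(round(p_fewerB, 3), round(approxB, 3))
```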