7 Sampling Distributions and the Central Limit Theorem

7.1 Introduction

Example 7.1. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from $f_Y(y)$. Each of the following is a statistic:
• $T(Y_1, \ldots, Y_n) = \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
• $T(Y_1, \ldots, Y_n) = \frac{1}{2} [Y_{(n/2)} + Y_{(n/2+1)}]$ if $n$ is even.
• $T(Y_1, \ldots, Y_n) = Y_{(1)}$
• $T(Y_1, \ldots, Y_n) = Y_{(n)}$
• $T(Y_1, \ldots, Y_n) = S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

7.2 Sampling distributions related to the normal distribution

Example 7.2. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from a $N(\mu, \sigma^2)$ distribution. What is the distribution of the sample mean?

Example 7.3. In the interest of pollution control, an experimenter records $Y$, the amount of bacteria per unit volume of water (measured in mg/cm$^3$). The population distribution for $Y$ is assumed to be normal with mean $\mu = 48$ and variance $\sigma^2 = 100$. That is, $Y \sim N(\mu, \sigma^2)$.
(a) What is the probability that a single water specimen's bacteria amount will exceed 50 mg/cm$^3$?
(b) Suppose that the experimenter takes a random sample of $n = 100$ water specimens, and denote the observations by $Y_1, \ldots, Y_{100}$. What is the probability that the sample mean $\bar{Y}$ will exceed 50 mg/cm$^3$?
(c) How large should the sample size $n$ be so that $P(\bar{Y} > 50) < 0.01$?

Now we prove that
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$

Example 7.4. In an ecological study examining the effects of Hurricane Katrina, researchers choose $n = 9$ plots and, for each plot, record $Y$, the amount of dead weight material (recorded in grams). Denote the nine dead weights by $Y_1, \ldots, Y_9$, where $Y_i$ represents the dead weight for plot $i$. The researchers model the data $Y_1, \ldots, Y_9$ as an iid $N(100, 32)$ sample. What is the probability that the sample variance $S^2$ of the nine dead weights is less than 20? That is, what is $P(S^2 < 20)$? Further, how large should the sample size $n$ be so that $P(S^2 < 20) < 0.01$?

7.3 The t distribution

Recall that if $Y_1, \ldots, Y_n$ is an iid $N(\mu, \sigma^2)$ sample, then the sample mean satisfies $\bar{Y} \sim N(\mu, \sigma^2/n)$; i.e.,
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Suppose we replace $\sigma$ by its estimator $S$. We now want to find the distribution of
$$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}}.$$

Figure: The $t(3)$ density function (dotted) and the standard normal density (solid).

7.4 The F distribution

1. If $W \sim F(\nu_1, \nu_2)$, then $1/W \sim F(\nu_2, \nu_1)$.
2. If $T \sim t(\nu)$, then $T^2 \sim F(1, \nu)$.
3. If $W \sim F(\nu_1, \nu_2)$, then $(\nu_1/\nu_2)W / [1 + (\nu_1/\nu_2)W] \sim \mathrm{Beta}(\nu_1/2, \nu_2/2)$.

Example 7.5. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from a $N(\mu, \sigma^2)$ distribution. Let $T = (\bar{Y} - \mu)/(S/\sqrt{n})$. What is the distribution of $T^2$?

Then we have
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1).$$

7.5 The Central Limit Theorem

Proof of the central limit theorem:

Example 7.6. A chemist is studying the degradation behavior of vitamin B6 in a multivitamin. The chemist selects a random sample of $n = 36$ multivitamin tablets and, for each tablet, counts the number of days until the B6 content falls below the FDA requirement. Let $Y_1, \ldots, Y_{36}$ denote the measurements for the 36 tablets, and assume that $Y_1, \ldots, Y_{36}$ is an iid sample from a Poisson distribution with mean 50. What is the approximate probability that the average number of days $\bar{Y}$ will exceed 52? How many tablets does the researcher need to observe so that $P(\bar{Y} < 49.5) \approx 0.01$?

7.6 The normal approximation to the binomial

Let
$$X = \sum_{i=1}^{n} Y_i,$$
the number of "successes." What is the distribution of $X$? Define the sample proportion $\hat{p}$ as
$$\hat{p} = \frac{X}{n} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}.$$

Example 7.7. Use Monte Carlo simulation to approximate the sampling distribution of $\hat{p}$ for the following cases:
Case 1: $n = 10$, $p = 0.1$
Case 2: $n = 40$, $p = 0.1$
Case 3: $n = 100$, $p = 0.1$
Case 4: $n = 10$, $p = 0.5$
Case 5: $n = 40$, $p = 0.5$
Case 6: $n = 100$, $p = 0.5$

The simulated distributions show that the normal approximation is poor when $p = 0.1$ unless $n$ is very large. When $p = 0.5$, by contrast, the normal approximation is already quite good at $n = 40$.
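
The simulation in Example 7.7 can be sketched as follows. This is a minimal illustration, not the notes' own code: it assumes the $Y_i$ are iid Bernoulli($p$), so that $X$ is binomial, and it measures the quality of the approximation by the largest gap between the simulated CDF of $\hat{p}$ and a continuity-corrected $N(p, p(1-p)/n)$ CDF. The function names, the seed, and the number of replications are all hypothetical choices made for this sketch.

```python
import random
from statistics import NormalDist, fmean


def simulate_p_hat(n, p, reps, rng):
    """Draw `reps` values of p_hat = X/n, where X counts the successes
    in n independent Bernoulli(p) trials (an assumption of this sketch)."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


def max_cdf_gap(p_hats, n, p):
    """Largest absolute gap between the empirical CDF of p_hat and the
    N(p, p(1-p)/n) CDF, compared at (k + 0.5)/n for k = 0, ..., n
    (a continuity correction, chosen here for illustration)."""
    normal = NormalDist(mu=p, sigma=(p * (1 - p) / n) ** 0.5)
    reps = len(p_hats)
    counts = [0] * (n + 1)
    for p_hat in p_hats:
        counts[round(p_hat * n)] += 1  # recover the success count k
    gap, cumulative = 0.0, 0
    for k in range(n + 1):
        cumulative += counts[k]
        gap = max(gap, abs(cumulative / reps - normal.cdf((k + 0.5) / n)))
    return gap


rng = random.Random(512)  # arbitrary seed, fixed only for reproducibility
cases = [(10, 0.1), (40, 0.1), (100, 0.1), (10, 0.5), (40, 0.5), (100, 0.5)]
results = {}
for n, p in cases:
    p_hats = simulate_p_hat(n, p, reps=10_000, rng=rng)
    results[(n, p)] = (fmean(p_hats), max_cdf_gap(p_hats, n, p))
    mean, gap = results[(n, p)]
    print(f"n={n:3d}, p={p}: mean of p_hat = {mean:.4f}, max CDF gap = {gap:.4f}")
```

A smaller gap indicates that the normal curve tracks the simulated distribution more closely. Plotting a histogram of each list of simulated $\hat{p}$ values against the normal density would give the visual comparison the example describes.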