Chapter 6 Sampling Distributions

6.1 Random Sampling

Def 1 A population consists of the totality of the observations with which we are concerned.

Remark
1. The size of a population may be finite or infinite. For example,
• Finite population: the blood types of the students in a school.
• Infinite population: the observations of atmospheric pressure every day from past to future.
2. The observations from a population have a common distribution. We will use $X_i$ to denote the value of the $i$th observation.

Def 2 A sample is a subset of a population.

Def 3 A random sample of size $n$ is a collection of $n$ independent random variables with a common distribution.

Remark The probability distribution of a random sample $X_1, X_2, \ldots, X_n$ of size $n$ is described by the joint pdf
\[ f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n). \]

6.2 Some Important Statistics

Def 4 Any function $T(X_1, X_2, \ldots, X_n)$ of the observations $X_1, X_2, \ldots, X_n$ is called a statistic.

Central Tendency in a Sample

Let $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$.
1. sample mean:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
2. sample median:
\[ \tilde{X} = \begin{cases} Y_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{Y_{n/2} + Y_{(n/2)+1}}{2} & \text{if } n \text{ is even} \end{cases} \]
where the $Y_i$'s denote the order statistics corresponding to $X_1, \ldots, X_n$.
3. mode: the value in the sample that occurs most often or with the greatest frequency. The mode may not exist, and when it does it is not necessarily unique.

Example 1 The lengths of time, in minutes, that 10 patients waited in a doctor's office before receiving treatment were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, 10. Find
1. the mean;
2. the median;
3. the mode.
sol)
1. $\bar{x} = \frac{1}{10}(5 + 11 + \cdots + 10) = 8.6$ minutes;
2. $\tilde{x} = 9.5$ minutes;
3. $Mo(X) = 5$ and $10$ minutes.

Facts (items 1 and 4 are empirical rules for unimodal, moderately skewed data)
1. $Mo(X) \le \tilde{X} \le \bar{X}$ or $\bar{X} \le \tilde{X} \le Mo(X)$.
2. $\sum_{i=1}^{n} |X_i - \tilde{X}| \le \sum_{i=1}^{n} |X_i - a|$ for any $a$.
3. $\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - a)^2$ for any $a$.
4. $\bar{X} - Mo(X) \approx 3(\bar{X} - \tilde{X})$.

Variability in the Sample

1. range: Let $X_{(n)} = \max(X_1, \ldots, X_n)$ and $X_{(1)} = \min(X_1, \ldots, X_n)$. The range of a sample is defined to be $X_{(n)} - X_{(1)}$.
2. sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right\} = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right\} \]
3. sample standard deviation: $S$.

Example 2 The IQs of a random sample of five members of a sorority are 108, 112, 127, 118, and 113. Find
1. the range;
2. the sample variance;
3. the sample standard deviation.
sol)
1. Range $= 127 - 108 = 19$.
2.
\[ \sum_{i=1}^{5} x_i^2 = 108^2 + 112^2 + 127^2 + 118^2 + 113^2 = 67{,}030, \qquad \sum_{i=1}^{5} x_i = 108 + 112 + 127 + 118 + 113 = 578 \]
\[ S^2 = \frac{1}{4} \left( 67030 - \frac{578^2}{5} \right) = 53.3 \]
3. $S = \sqrt{53.3} = 7.300$.

6.3 Sampling Distributions

Sampling Distributions of Means

1. $E[\bar{X}] = \mu$, $Var[\bar{X}] = \sigma^2/n$.
2. From the central limit theorem, as $n \to \infty$ the distribution of
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
converges to $N(0, 1)$.
3. Suppose there are two populations with means and variances $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, respectively, and that independent random samples of sizes $n_1$ and $n_2$ are drawn from the two populations. Then
\[ E[\bar{X}_1 \pm \bar{X}_2] = \mu_1 \pm \mu_2, \qquad Var[\bar{X}_1 \pm \bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]
As $n_1 \to \infty$ and $n_2 \to \infty$, the distribution of
\[ Z = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
converges to $N(0, 1)$.
4. The above results hold exactly, for any sample sizes, if both populations are normally distributed.

Example 3 If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and standard deviation equal to 5, what is the probability that a sample mean $\bar{X}$ will fall in the interval from $\mu_{\bar{X}} - 1.9\sigma_{\bar{X}}$ to $\mu_{\bar{X}} - 0.4\sigma_{\bar{X}}$?
sol)
• We want to find $P(\mu_{\bar{X}} - 1.9\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}})$.
• Clearly, $\bar{X} \sim N(50, 5^2/16) \equiv N(50, 1.5625)$.
•
\[ P(\mu_{\bar{X}} - 1.9\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}}) = P(\bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}}) - P(\bar{X} < \mu_{\bar{X}} - 1.9\sigma_{\bar{X}}) = \Phi(-0.4) - \Phi(-1.9) = 0.3446 - 0.0287 = 0.3159 \]

Example 4 Given the discrete uniform population
\[ f(x) = \begin{cases} \dfrac{1}{3} & x = 2, 4, 6 \\ 0 & \text{otherwise} \end{cases} \]
find the probability that a random sample of size 54, selected with replacement, will yield a sample mean greater than 4.1 but less than 4.4.
sol)
• We want to find $P(4.1 < \bar{X} < 4.4)$.
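• The population mean and variance are
\[ \mu_X = \frac{1}{3}(2 + 4 + 6) = 4, \qquad \sigma_X^2 = \frac{1}{3}(2^2 + 4^2 + 6^2) - 4^2 = \frac{8}{3}. \]
• Since $n = 54 > 30$ and $\sigma_X^2/n = (8/3)/54 = (2/9)^2$, approximately $\bar{X} \sim N(4, (2/9)^2)$. Hence,
\[ P(4.1 < \bar{X} < 4.4) = P(\bar{X} < 4.4) - P(\bar{X} \le 4.1) = \Phi\left( \frac{4.4 - 4}{2/9} \right) - \Phi\left( \frac{4.1 - 4}{2/9} \right) = \Phi(1.8) - \Phi(0.45) = 0.9641 - 0.6736 = 0.2905 \]

Example 5 A random sample of size 25 is taken from a normal population having a mean of 80 and a standard deviation of 5. A second random sample of size 36 is taken from a different normal population having a mean of 75 and a standard deviation of 3. Find the probability that the sample mean computed from the 25 measurements will exceed the sample mean computed from the 36 measurements by at least 3.4 but less than 5.9.
sol)
• $n_1 = 25$, $\mu_1 = 80$, $\sigma_1 = 5$.
• $n_2 = 36$, $\mu_2 = 75$, $\sigma_2 = 3$.
• We want to find $P(3.4 \le \bar{X}_1 - \bar{X}_2 < 5.9)$.
• Clearly, $\bar{X}_1 \sim N(80, 5^2/25)$ and $\bar{X}_2 \sim N(75, 3^2/36)$, so $(\bar{X}_1 - \bar{X}_2) \sim N(5, 5/4)$.
•
\[ P(3.4 \le \bar{X}_1 - \bar{X}_2 < 5.9) = P(\bar{X}_1 - \bar{X}_2 < 5.9) - P(\bar{X}_1 - \bar{X}_2 < 3.4) = \Phi\left( \frac{5.9 - 5}{\sqrt{5/4}} \right) - \Phi\left( \frac{3.4 - 5}{\sqrt{5/4}} \right) = \Phi(0.8050) - \Phi(-1.4311) = 0.7896 - 0.0762 = 0.7134 \]

Remark Let $Z \sim N(0, 1)$. The critical value $z_\alpha$ is defined by
\[ P(Z \ge z_\alpha) = \alpha. \]
Several often used values of $z_\alpha$ are
z_{0.005} = 2.58
z_{0.01} = 2.33
z_{0.025} = 1.96
z_{0.05} = 1.645
We can find $z_\alpha$ from the probability table.

Chi-square distribution

Def 5 A random variable $X$ is said to possess a chi-square distribution with $\nu$ degrees of freedom, denoted by $X \sim \chi^2_\nu$, if it has the pdf
\[ f(x) = \begin{cases} \dfrac{2^{-\nu/2}}{\Gamma(\nu/2)} \, x^{(\nu/2)-1} e^{-x/2} & x > 0 \\ 0 & x \le 0 \end{cases} \]

Remark
1. $\chi^2_\nu \equiv \Gamma(\nu/2, 1/2)$.
2. $X \sim N(0, 1) \Longrightarrow X^2 \sim \chi^2_1$.
3. Let $X_i \sim N(0, 1)$, $i = 1, \ldots, n$, be $n$ independent random variables. Then
\[ \sum_{i=1}^{n} X_i^2 \sim \chi^2_n. \]
4. Let $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$, be $n$ independent random variables. Then
\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n. \]
5. Let $\chi^2 \sim \chi^2_\nu$. The chi-square critical value $\chi^2_{\alpha,\nu}$ is defined such that
\[ P(\chi^2 \ge \chi^2_{\alpha,\nu}) = \alpha. \]
We can find the critical value from the table.

Example 6 Given a sample of size 16 from a normal population $N(50, 10^2)$,
1. find $b$ such that $P(\bar{X} - 50 \le b) = 0.05$;
2. find $c$ such that $P(|\bar{X} - 50| \ge c) = 0.05$;
3. find $d$ such that $P\left( \sum_{i=1}^{n} (X_i - 50)^2 \ge d \right) = 0.05$.
sol)
1.
\[ P(\bar{X} - 50 \le b) = 0.05 \Longrightarrow P\left( \frac{\bar{X} - 50}{10/\sqrt{16}} \le \frac{b}{10/\sqrt{16}} \right) = 0.05 \Longrightarrow P\left( Z \le \frac{b}{2.5} \right) = 0.05. \]
Since $P(Z \le -z_{0.05}) = 0.05$,
\[ \frac{b}{2.5} = -z_{0.05} \Longrightarrow b = -z_{0.05} \cdot 2.5 = -1.645 \cdot 2.5 = -4.1125. \]
2.
\[ P(|\bar{X} - 50| \ge c) = 0.05 \Longrightarrow P\left( \left| \frac{\bar{X} - 50}{10/\sqrt{16}} \right| \ge \frac{c}{10/\sqrt{16}} \right) = 0.05 \Longrightarrow P\left( |Z| \ge \frac{c}{2.5} \right) = 0.05 \Longrightarrow P\left( Z \ge \frac{c}{2.5} \right) = 0.025. \]
Since $P(Z \ge z_{0.025}) = 0.025$,
\[ \frac{c}{2.5} = z_{0.025} \Longrightarrow c = z_{0.025} \cdot 2.5 = 1.96 \cdot 2.5 = 4.9. \]
3.
\[ P\left( \sum_{i=1}^{16} (X_i - 50)^2 \ge d \right) = 0.05 \Longrightarrow P\left( \sum_{i=1}^{16} \left( \frac{X_i - 50}{10} \right)^2 \ge \frac{d}{100} \right) = 0.05 \Longrightarrow P\left( \chi^2 \ge \frac{d}{100} \right) = 0.05, \]
where $\chi^2 \sim \chi^2_{16}$. Since $P(\chi^2 \ge \chi^2_{0.05,16}) = 0.05$,
\[ \frac{d}{100} = \chi^2_{0.05,16} \Longrightarrow d = 100 \cdot \chi^2_{0.05,16} = 100 \cdot 26.296 = 2629.6. \]

Sampling Distributions of $S^2$

Let $X_1, \ldots, X_n$ be a random sample with $E[X_i] = \mu$ and $Var[X_i] = \sigma^2$. Then,
1.
\[ E[\bar{X}] = E\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu, \qquad Var[\bar{X}] = \frac{1}{n^2} Var\left[ \sum_{i=1}^{n} X_i \right] = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}. \]
2.
\[ \sum_{i=1}^{n} (X_i - a)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - a)^2 \]
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \]
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \]
3.
\[ E\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = E\left[ \sum_{i=1}^{n} (X_i - \mu)^2 \right] - nE\left[ (\bar{X} - \mu)^2 \right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \]
4.
\[ E[S^2] = E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \sigma^2 \]
5. If the population is normal, $\bar{X}$ and $S^2$ are independent.
6.
\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2 \]
7. Therefore, if $X_i \sim N(\mu, \sigma^2)$, then
• $\sum_{i=1}^{n} \left( \dfrac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n$;
• $\left( \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2 \sim \chi^2_1$;
• $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$;
• $E\left[ \dfrac{(n-1)S^2}{\sigma^2} \right] = n - 1$;
• $Var\left[ \dfrac{(n-1)S^2}{\sigma^2} \right] = 2(n-1)$.

Example 7 A manufacturer of car batteries guarantees that his batteries will last, on the average, $\mu = 3$ years with a standard deviation of $\sigma = 1$ year. Assume that the battery lifetime follows a normal distribution. A sample of size 5 is taken to estimate the mean lifetime and the variance.
1. Find $b$ such that $P(|\bar{X} - \mu| \ge b) = 0.05$.
2. Find $c$ such that $P(S^2 \ge c) = 0.05$.
3. Find $d$ such that $P\left( \frac{1}{5} \sum_{i=1}^{5} (X_i - \mu)^2 \ge d \right) = 0.05$.
sol)
1. $\bar{X} \sim N(3, 1/5)$.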
\[ P(|\bar{X} - 3| \ge b) = 0.05 \Longrightarrow P\left( \left| \frac{\bar{X} - 3}{\sqrt{1/5}} \right| \ge \frac{b}{\sqrt{1/5}} \right) = 0.05 \Longrightarrow P\left( |Z| \ge \frac{b}{\sqrt{1/5}} \right) = 0.05. \]
Since $P(|Z| \ge z_{0.025}) = 0.05$,
\[ \frac{b}{\sqrt{1/5}} = z_{0.025} \Longrightarrow b = z_{0.025} \cdot \sqrt{1/5} = 1.96(0.4472) = 0.8765. \]
2. $\dfrac{4S^2}{1} \sim \chi^2_4$.
\[ P(S^2 \ge c) = 0.05 \Longrightarrow P\left( \frac{4S^2}{1} \ge \frac{4c}{1} \right) = 0.05 \Longrightarrow P(\chi^2 \ge 4c) = 0.05. \]
Since $P(\chi^2 \ge \chi^2_{0.05,4}) = 0.05$,
\[ 4c = \chi^2_{0.05,4} \Longrightarrow c = \chi^2_{0.05,4}/4 = 9.488/4 = 2.372. \]
3. $\sum_{i=1}^{5} \left( \dfrac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_5$.
\[ P\left( \frac{1}{5} \sum_{i=1}^{5} (X_i - \mu)^2 \ge d \right) = 0.05 \Longrightarrow P\left( \sum_{i=1}^{5} \left( \frac{X_i - \mu}{\sigma} \right)^2 \ge \frac{5d}{\sigma^2} \right) = 0.05 \Longrightarrow P\left( \chi^2 \ge \frac{5d}{\sigma^2} \right) = 0.05. \]
Since $P(\chi^2 \ge \chi^2_{0.05,5}) = 0.05$,
\[ \frac{5d}{\sigma^2} = \chi^2_{0.05,5} \Longrightarrow d = \frac{\sigma^2 \chi^2_{0.05,5}}{5} = \frac{11.070}{5} = 2.214. \]

t-distribution

Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$ or, equivalently,
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]
In most cases, the value of $\sigma^2$ is not available. Thus, we will use $S^2$ to estimate $\sigma^2$. The t-distribution describes the distribution of the statistic $T$ defined by
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}}. \]

Def 6 Let $Z \sim N(0, 1)$ and $W \sim \chi^2_\nu$ be two independent random variables. The random variable
\[ T = \frac{Z}{\sqrt{W/\nu}} \]
is said to possess a t-distribution with $\nu$ degrees of freedom and is denoted by $T \sim t_\nu$.

Facts Let $T \sim t_\nu$. Then:
1.
\[ f_T(t) = \frac{\Gamma\left( \frac{\nu+1}{2} \right)}{\Gamma\left( \frac{\nu}{2} \right) \sqrt{\pi\nu}} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty \]
2. $E[T] = 0$ for $\nu > 1$, and $Var[T] = \dfrac{\nu}{\nu - 2}$ for $\nu > 2$.
3. As $\nu \to \infty$, $t_\nu$ converges to $N(0, 1)$.
4. Let $X_1, \ldots, X_n$ be a random sample from a normal population $N(\mu, \sigma^2)$. Then
• $\bar{X}$ and $S^2$ are independent.
• $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, $W = \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
• Hence,
\[ T = \frac{Z}{\sqrt{W/(n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}. \]
5. The value of $t_{\alpha,\nu}$, defined by $P(T \ge t_{\alpha,\nu}) = \alpha$, can be found from the probability table.
6. $t_{1-\alpha,\nu} = -t_{\alpha,\nu}$.

Example 8 The gas consumption (liters/hr) of automobiles manufactured by a company is normally distributed, with mean $\mu$ and variance $\sigma^2$ both unknown. A random sample of size 16 is taken to estimate $\mu$ by $\bar{X}$. Find $c$ in terms of the sample standard deviation $s$ such that $P(|\bar{X} - \mu| < c) = 0.95$.
sol)
\[ P(|\bar{X} - \mu| < c) = 0.95 \Longrightarrow 1 - 2P(\bar{X} - \mu \ge c) = 0.95 \Longrightarrow P(\bar{X} - \mu \ge c) = 0.025 \]
\[ \Longrightarrow P\left( \frac{\bar{X} - \mu}{S/\sqrt{16}} \ge \frac{c}{S/\sqrt{16}} \right) = 0.025 \Longrightarrow P\left( T \ge \frac{c}{S/\sqrt{16}} \right) = 0.025, \]
where $T \sim t_{15}$. Since $P(T \ge t_{0.025,15}) = 0.025$,
\[ \frac{c}{s/\sqrt{16}} = t_{0.025,15} \Longrightarrow c = t_{0.025,15} \cdot s/\sqrt{16} = (2.131/4)s = 0.5328s. \]

F-distribution

Def 7 Let $W_1 \sim \chi^2_{\nu_1}$ and $W_2 \sim \chi^2_{\nu_2}$ be two independent random variables. Then the random variable
\[ F = \frac{W_1/\nu_1}{W_2/\nu_2} \]
is said to possess an F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom and is denoted by $F \sim F_{\nu_1,\nu_2}$.

Facts
1.
\[ f_F(x) = \begin{cases} \dfrac{\nu_1^{\nu_1/2} \nu_2^{\nu_2/2} \, \Gamma\left( \frac{\nu_1+\nu_2}{2} \right)}{\Gamma\left( \frac{\nu_1}{2} \right) \Gamma\left( \frac{\nu_2}{2} \right)} \cdot \dfrac{x^{(\nu_1/2)-1}}{(\nu_2 + \nu_1 x)^{(\nu_1+\nu_2)/2}} & x > 0 \\ 0 & x \le 0 \end{cases} \]
2. Let $X_1, \ldots, X_m$ be a sample of size $m$ from a normal population $N(\mu_1, \sigma_1^2)$, and let $Y_1, \ldots, Y_n$ be a sample of size $n$ from a normal population $N(\mu_2, \sigma_2^2)$. Suppose that the two populations are independent. Then
\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,n-1}. \]
pf)
• Let
\[ W_1 = \frac{(m-1)S_1^2}{\sigma_1^2}, \qquad W_2 = \frac{(n-1)S_2^2}{\sigma_2^2}. \]
• Clearly, $W_1 \sim \chi^2_{m-1}$ and $W_2 \sim \chi^2_{n-1}$. Hence,
\[ \frac{W_1/(m-1)}{W_2/(n-1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,n-1}. \]
3. In the above, if $\sigma_1^2 = \sigma_2^2$, we have
\[ \frac{S_1^2}{S_2^2} \sim F_{m-1,n-1}. \]
4. The critical value $f_{\alpha,\nu_1,\nu_2}$ is defined such that $P(F \ge f_{\alpha,\nu_1,\nu_2}) = \alpha$.
5. $f_{1-\alpha,\nu_2,\nu_1} = \dfrac{1}{f_{\alpha,\nu_1,\nu_2}}$.
pf)
•
\[ 1 - \alpha = 1 - P(F \ge f_{\alpha,\nu_1,\nu_2}) = 1 - P\left( \frac{W_1/\nu_1}{W_2/\nu_2} \ge f_{\alpha,\nu_1,\nu_2} \right) = 1 - P\left( \frac{W_2/\nu_2}{W_1/\nu_1} \le \frac{1}{f_{\alpha,\nu_1,\nu_2}} \right) = P\left( \frac{W_2/\nu_2}{W_1/\nu_1} \ge \frac{1}{f_{\alpha,\nu_1,\nu_2}} \right) \]
• Since $\dfrac{W_2/\nu_2}{W_1/\nu_1} \sim F_{\nu_2,\nu_1}$, the definition of the F critical value gives
\[ f_{1-\alpha,\nu_2,\nu_1} = \frac{1}{f_{\alpha,\nu_1,\nu_2}}. \]
6. The F critical value can be found from the probability table.

Example 9 $f_{0.95,11,8} = $ ?
sol) From the probability table, $f_{0.05,8,11} = 2.95$. Hence,
\[ f_{0.95,11,8} = \frac{1}{f_{0.05,8,11}} = \frac{1}{2.95} = 0.34. \]

Example 10 Two samples of size 5 are taken from two independent normal populations. Assume that the two populations have the same variance.
1. Find $b$ such that $P(S_1^2/S_2^2 \ge b) = 0.05$.
2. Find $c$ such that $P(S_1^2/S_2^2 \le c) = 0.05$.
sol) Clearly, $S_1^2/S_2^2 \sim F_{4,4}$.
1.
\[ P(S_1^2/S_2^2 \ge b) = 0.05 \Longrightarrow P(F \ge b) = 0.05. \]
Since $P(F \ge f_{0.05,4,4}) = 0.05$, $b = f_{0.05,4,4} = 6.39$.
2.
\[ P(S_1^2/S_2^2 \le c) = 0.05 \Longrightarrow 1 - P(S_1^2/S_2^2 > c) = 0.05 \Longrightarrow P(F \ge c) = 0.95. \]
Since $P(F \ge f_{0.95,4,4}) = 0.95$,
\[ c = f_{0.95,4,4} = \frac{1}{f_{0.05,4,4}} = \frac{1}{6.39} = 0.1565. \]

6.4 Exercises

1. A finite population contains the six numbers 1, 2, 3, 4, 5, 6.
(a) Find the mean $\mu$ and variance $\sigma^2$ of the numbers in the population.
(b) A sample of size 2 is taken at random without replacement from the population. Find the distribution of the sample mean $\bar{X}$.
(c) Show that the sample mean in (b) satisfies $E[\bar{X}] = \mu$ and $Var[\bar{X}] = \dfrac{N - n}{N - 1} \cdot \dfrac{\sigma^2}{n}$, where $N$ is the population size and $n$ is the sample size.

2. Let $X_1, X_2, X_3, X_4$ be a sample from a normal population $N(0, 1)$. What value of $c$ makes the statistic
\[ T = \frac{c(X_1 + X_2)}{\sqrt{X_3^2 + X_4^2}} \]
follow a t-distribution? Determine the number of degrees of freedom of the t-distribution.

3. Let $Y_1, Y_2$ be a random sample from a normal population $N(0, 1)$, and let $X_1, X_2$ be a random sample from another independent normal population $N(1, 1)$. Find
(a) the distribution of $\bar{X} + \bar{Y}$;
(b) the distribution of $\dfrac{Y_1 + Y_2}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}$;
(c) the distribution of $\dfrac{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}{2}$;
(d) the distribution of $\dfrac{(X_2 + X_1 - 2)^2}{(X_2 - X_1)^2}$.

4. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent normal populations $N(\mu_1, 10)$ and $N(\mu_2, 15)$, respectively. Suppose $n_1 = n_2 = 10$.
(a) Find $b$ such that $P(S_1^2/S_2^2 \ge b) = 0.01$.
(b) Find $c$ such that $P(S_1^2/S_2^2 \le c) = 0.05$.

5. Find
(a) $\chi^2_{0.05,20}$
(b) $\chi^2_{0.95,15}$
(c) $\chi^2_{0.025,10}$
(d) $t_{0.025,10}$
(e) $t_{0.95,20}$
(f) $f_{0.05,10,11}$
(g) $f_{0.99,6,8}$.

6. Let $S_1^2$ and $S_2^2$ be the sample variances of random samples of sizes $n_1$ and $n_2$ from two independent normal populations with a common variance $\sigma^2$. Define $S_p^2$, called the pooled variance, by
\[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}. \]
Show that
(a) $E[S_p^2] = \sigma^2$;
(b) $\dfrac{(n_1 + n_2 - 2)S_p^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$.

7. Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli population with success probability $p$. Let
\[ Y = \sum_{i=1}^{n} X_i. \]
(a) Determine the distribution of $Y$.
(b) Suppose $n = 100$ and $p = 0.05$. Use two other distributions to approximate $P(Y = 3)$.
(c) Let $\hat{p} = \bar{X} = Y/n$.
For $n > 40$, find $c$ such that $P(|\hat{p} - 0.05| < c) = 0.95$.

8. The lifetime of a system is $Y = X_1 + X_2 + X_3 + X_4$, where $X_1, X_2, X_3, X_4$ are the lifetimes of its subsystems. Suppose that the subsystems are independent and that each lifetime is exponentially distributed with an MTBF (mean time between failures) of 3 hours. Find the probability that the system survives at least 18 hours.

9. Let $X_1, \ldots, X_{100}$ be a sample from a normal population $N(0, 1)$.
(a) Find the pdf of the sample variance
\[ S^2 = \frac{1}{99} \sum_{i=1}^{100} (X_i - \bar{X})^2. \]
(b) Find $E[S]$.
(c) Find $P(X_1 + X_2 - X_3 - X_4 \ge 2)$.

10. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size $n = 4$ from a continuous symmetric distribution with mean $\mu$ and variance $\sigma^2$. Find the probability that $Y_3 < \mu$.

11. Suppose $X_1, \ldots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ population, where $\mu$ and $\sigma^2$ are both unknown. Let
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2. \]
If $X_{n+1}$ is an additional observation, find the constant $k$ so that $k(\bar{X}_n - X_{n+1})/S_n$ has a t-distribution.

12. Let $X_i \sim N(0, 1)$, $i = 1, 2, \ldots, 6$, be mutually independent. Let
\[ Y = (2X_1 - \sqrt{2}X_2 + 2X_3)^2 + (\sqrt{3}X_4 - 2X_5 - \sqrt{3}X_6)^2. \]
Find the constant $k$ so that $kY$ has a chi-square distribution.
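
The arithmetic in Examples 1 to 5 can be checked numerically. The following is a minimal sketch that uses only the Python standard library, with `NormalDist().cdf` standing in for the table lookup of $\Phi$ and `inv_cdf` for the lookup of $z_\alpha$. All variable names (`waits`, `iqs`, `ex3`, and so on) are illustrative assumptions and are not part of the notes. The chi-square, t and F critical values used in the later examples are not available in the standard library, so they are not covered here.

```python
import math
from statistics import NormalDist, mean, median, multimode, stdev, variance

# Example 1: waiting times (minutes) of 10 patients.
waits = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]
ex1_mean = mean(waits)
ex1_median = median(waits)            # averages the two middle order statistics
ex1_modes = sorted(multimode(waits))  # every value tied for highest frequency

# Example 2: IQ scores of five sorority members.
iqs = [108, 112, 127, 118, 113]
ex2_range = max(iqs) - min(iqs)
ex2_var = variance(iqs)  # sample variance, divisor n - 1
ex2_sd = stdev(iqs)      # square root of the sample variance

# Standard normal distribution; Z.cdf plays the role of Phi.
Z = NormalDist()

# Example 3: P(mu - 1.9 sigma < Xbar < mu - 0.4 sigma) for a normal population.
ex3 = Z.cdf(-0.4) - Z.cdf(-1.9)

# Example 4: discrete uniform population on {2, 4, 6}, n = 54 (CLT approximation).
support = (2, 4, 6)
mu = sum(support) / 3
sigma2 = sum(x * x for x in support) / 3 - mu ** 2
se = math.sqrt(sigma2 / 54)  # standard deviation of the sample mean
ex4 = Z.cdf((4.4 - mu) / se) - Z.cdf((4.1 - mu) / se)

# Example 5: difference of two independent sample means.
se_diff = math.sqrt(5 ** 2 / 25 + 3 ** 2 / 36)
ex5 = Z.cdf((5.9 - 5) / se_diff) - Z.cdf((3.4 - 5) / se_diff)

# Critical values z_alpha, defined by P(Z >= z_alpha) = alpha.
z_crit = {alpha: Z.inv_cdf(1 - alpha) for alpha in (0.005, 0.01, 0.025, 0.05)}

print("Example 1:", ex1_mean, ex1_median, ex1_modes)
print("Example 2:", ex2_range, round(ex2_var, 4), round(ex2_sd, 4))
print("Examples 3-5:", round(ex3, 4), round(ex4, 4), round(ex5, 4))
print("z critical values:", {a: round(z, 3) for a, z in z_crit.items()})
```

Values computed this way use the exact arguments of $\Phi$, so they can differ in the last digits from values read off a printed table, where $z$ is rounded to two decimals.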