Dr. Neal, WKU
MATH 183
The Chi-Square Distributions

The chi-square distributions are used in statistics to analyze the standard deviation $\sigma$ of a normally distributed measurement and to test the goodness of fit of various population models on a set of data. A chi-square distribution is based on a parameter known as the degrees of freedom $n$, where $n$ is an integer greater than or equal to 1. Such a random variable is denoted by $X \sim \chi^2(n)$. The $\chi^2(n)$ distribution is defined to be the sum of the squares of $n$ independent standard normal random variables.

For example, suppose $X_1, \ldots, X_n$ are independent normally distributed measurements having mean $\mu_i$ and standard deviation $\sigma_i$ for $i = 1, \ldots, n$. These measurements could be the heights or IQ scores of various groups of people. By subtracting the mean and then dividing by the standard deviation, we convert each measurement into a standard normal distribution:

\[ Z_i = \frac{X_i - \mu_i}{\sigma_i} \sim N(0, 1), \quad \text{for } 1 \le i \le n. \]

So $Z_1 \sim N(0, 1)$ and its distribution graph is the common "bell-shaped curve", which is symmetric about the origin. Then $Z_1^2 \sim \chi^2(1)$. Its plot consists of positive values concentrated near the origin, and it has mean 1 and variance 2.

[Figure: graphs of the standard normal distribution and of the $\chi^2(1)$, $\chi^2(2)$, and $\chi^2(n)$ distributions.]

By standardizing, squaring, and summing random measurements from the respective normal populations, we obtain a chi-square distribution with $n$ degrees of freedom:

\[ \chi^2(n) = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{X_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{X_n - \mu_n}{\sigma_n}\right)^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2. \]

The distribution graphs for $n \ge 3$ are skewed bell-shaped curves, defined on $[0, \infty)$, whose maximum occurs at increasingly larger values of $x$ as $n$ grows. The mean is now $n$, the variance is $2n$, and the standard deviation is $\sqrt{2n}$. For $n \ge 3$, the maximum (mode) occurs when $x = n - 2$.

Summary for $X \sim \chi^2(n) = Z_1^2 + Z_2^2 + \cdots + Z_n^2$:
    Mean = $n$
    Variance = $2n$
    Standard Deviation = $\sqrt{2n}$
    Mode = $n - 2$ (for $n \ge 3$)

The theoretical distribution curve is given by

\[ f(x) = C_n \, x^{n/2 - 1} e^{-x/2}, \quad \text{for } x \ge 0, \]

where $C_n$ is a constant that depends on $n$ given by

\[ C_n = \begin{cases} \dfrac{1}{2^{n/2} \left(\frac{n}{2} - 1\right)!} & \text{for } n \text{ even}, \\[2ex] \dfrac{2^{(n-2)/2} \left(\frac{n-1}{2}\right)!}{\sqrt{\pi}\, (n-1)!} & \text{for } n \text{ odd}. \end{cases} \]

A chi-square curve can be plotted using the built-in χ²pdf( command from the DISTR menu. For example, to graph the $\chi^2(10)$ curve, enter χ²pdf(X,10) into the Y= screen. To compute $P(a \le X \le b)$ for $X \sim \chi^2(n)$, enter χ²cdf(a, b, n) or Shadeχ²(a, b, n).

Example 1. Let $X \sim \chi^2(10)$.
(a) Where does the maximum of the curve occur?
(b) Compute $P(6 \le X \le 10)$. Is there symmetry at the outer tails; i.e., does $P(0 \le X \le 6) = P(X \ge 10)$?
(c) Find the left and right bounds that contain 90% of the distribution.

Solution.
(a) For $X \sim \chi^2(10)$, the maximum (mode) occurs when $x = n - 2 = 8$.
(b) From the TI output, we see that $P(6 \le X \le 10) \approx 0.37477$. Also, the left tail is $P(0 \le X \le 6) \approx 0.1847$, and the right tail is $P(X \ge 10) \approx 0.4405$. So the two tails outside of the inner region $6 \le X \le 10$ are not symmetric.
(c) For there to be 90% in the middle of the distribution, we must have 5% at each tail. The values where these occur (chi-square scores) can be found with the table below, in the 0.05 L and 0.05 R columns of the row for 10 degrees of freedom. In this case, the values are about 3.940 and 18.31.

Left and Right Chi-Square Scores for 80%, 90%, 95%, and 98% intervals.
(L = Prob. of Left Tail, R = Prob. of Right Tail)

d.f.   0.01 L   0.025 L   0.05 L   0.10 L   0.10 R   0.05 R   0.025 R   0.01 R
  1    0.000    0.001     0.004    0.016    2.706    3.841    5.024     6.635
  2    0.020    0.051     0.103    0.211    4.605    5.991    7.378     9.210
  3    0.115    0.216     0.352    0.584    6.251    7.815    9.348     11.34
  4    0.297    0.484     0.711    1.064    7.779    9.488    11.14     13.28
  5    0.554    0.831     1.145    1.610    9.236    11.07    12.83     15.09
  6    0.872    1.237     1.635    2.204    10.64    12.59    14.45     16.81
  7    1.239    1.690     2.167    2.833    12.02    14.07    16.01     18.48
  8    1.646    2.180     2.733    3.490    13.36    15.51    17.54     20.09
  9    2.088    2.700     3.325    4.168    14.68    16.92    19.02     21.67
 10    2.558    3.247     3.940    4.865    15.99    18.31    20.48     23.21
 11    3.053    3.816     4.575    5.578    17.28    19.68    21.92     24.72
 12    3.571    4.404     5.226    6.304    18.55    21.03    23.34     26.22
 13    4.107    5.009     5.892    7.042    19.81    22.36    24.74     27.69
 14    4.660    5.629     6.571    7.790    21.06    23.68    26.12     29.14
 15    5.229    6.262     7.261    8.547    22.31    25.00    27.49     30.58
 16    5.812    6.908     7.962    9.312    23.54    26.30    28.84     32.00
 17    6.408    7.564     8.672    10.08    24.77    27.59    30.19     33.41
 18    7.015    8.231     9.390    10.86    25.99    28.87    31.53     34.80
 19    7.633    8.907     10.12    11.65    27.20    30.14    32.85     36.19
 20    8.260    9.591     10.85    12.44    28.41    31.41    34.17     37.57
 21    8.897    10.28     11.59    13.24    29.62    32.67    35.48     38.93
 22    9.542    10.98     12.34    14.04    30.81    33.92    36.78     40.29
 23    10.20    11.69     13.09    14.85    32.01    35.17    38.08     41.64
 24    10.86    12.40     13.85    15.66    33.20    36.42    39.36     42.98
 25    11.52    13.12     14.61    16.47    34.38    37.65    40.65     44.31
 26    12.20    13.84     15.38    17.29    35.56    38.88    41.92     45.64
 27    12.88    14.57     16.15    18.11    36.74    40.11    43.19     46.96
 28    13.56    15.31     16.93    18.94    37.92    41.34    44.46     48.28
 29    14.26    16.05     17.71    19.77    39.09    42.56    45.72     49.59
 30    14.95    16.79     18.49    20.60    40.26    43.77    46.98     50.89
 40    22.16    24.43     26.51    29.05    51.80    55.76    59.34     63.69
 50    29.71    32.36     34.76    37.69    63.17    67.50    71.42     76.15
 60    37.48    40.48     43.19    46.46    74.40    79.08    83.30     88.38
 70    45.44    48.76     51.74    55.33    85.53    90.53    95.02     100.4
 80    53.54    57.15     60.39    64.28    96.58    101.9    106.6     112.3

Theorems

I. Let $\{x_1, x_2, \ldots, x_n\}$ denote the collection of all random samples of size $n$ from normally distributed measurements having variance $\sigma^2$. Let

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

be the distribution of all possible sample variances. Then

\[ \frac{(n-1) S^2}{\sigma^2} \text{ is a } \chi^2(n-1) \text{ distribution.} \]

Thus with a normally distributed measurement, we can evaluate $P(a \le S \le b)$, for $0 \le a \le b$, by

\[ \begin{aligned} P(a \le S \le b) &= P(a^2 \le S^2 \le b^2) \\ &= P\left( \frac{(n-1)a^2}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)b^2}{\sigma^2} \right) \\ &= P\left( \frac{(n-1)a^2}{\sigma^2} \le \chi^2(n-1) \le \frac{(n-1)b^2}{\sigma^2} \right), \end{aligned} \]

provided $\sigma^2$ is known.

II. Let $S^2$ be the sample variance from a random sample of size $n$ of a normally distributed measurement having variance $\sigma^2$. A confidence interval for $\sigma^2$, with level of confidence $r = 1 - \alpha$, is given by

\[ \frac{(n-1) S^2}{R} \le \sigma^2 \le \frac{(n-1) S^2}{L}, \]

where $L$ and $R$ are the left and right bounds of the $\chi^2(n-1)$ distribution that give $r$ probability in the middle. A confidence interval for $\sigma$ is

\[ \sqrt{\frac{(n-1) S^2}{R}} \le \sigma \le \sqrt{\frac{(n-1) S^2}{L}}. \]

III. To test the null hypothesis $H_0 : \sigma = M$ for a normally distributed measurement, we obtain the sample deviation $S$ from a random sample of size $n$. The test statistic is then

\[ x = \frac{(n-1) S^2}{\sigma^2} = \frac{(n-1) S^2}{M^2}, \]

which is compared with the $\chi^2(n-1)$ distribution. Compute the (left-tail) P-value $P(\chi^2(n-1) \le x)$ for the alternative $H_a : \sigma < M$, and compute the (right-tail) P-value $P(\chi^2(n-1) \ge x)$ for the alternative $H_a : \sigma > M$.

Example 2. Random samples of size 46 are taken from a measurement that is $N(100, 15)$. What is $P(13 \le S \le 17)$?

Example 3. From a normally distributed measurement, a sample of size 20 yields $S = 3.96$. Find a 98% confidence interval for the true standard deviation $\sigma$.

Example 4. From a normally distributed measurement, a sample of size 25 yields a sample deviation of 13.96. Is there evidence to reject the hypothesis $H_0 : \sigma = 15$?

Solutions

Example 2: With $n = 46$ and $\sigma = 15$,

\[ \begin{aligned} P(13 \le S \le 17) &= P(13^2 \le S^2 \le 17^2) \\ &= P\left( \frac{(n-1) 13^2}{\sigma^2} \le \frac{(n-1) S^2}{\sigma^2} \le \frac{(n-1) 17^2}{\sigma^2} \right) \\ &= P\left( \frac{45 \times 169}{225} \le \chi^2(n-1) \le \frac{45 \times 289}{225} \right) \\ &= P\left( 33.8 \le \chi^2(45) \le 57.8 \right) \approx 0.794 \end{aligned} \]

(using χ²cdf(33.8, 57.8, 45)).

Example 3: For a 98% interval with $n - 1 = 19$ degrees of freedom, the table gives $L = 7.633$ and $R = 36.19$. Then

\[ \sqrt{\frac{(n-1) S^2}{R}} \le \sigma \le \sqrt{\frac{(n-1) S^2}{L}}; \quad \text{hence,} \quad \sqrt{\frac{19 \times 3.96^2}{36.19}} \le \sigma \le \sqrt{\frac{19 \times 3.96^2}{7.633}}, \]

or $2.8693 \le \sigma \le 6.24776$.

Example 4: For $S = 13.96$, which is below 15, we use the alternative $H_a : \sigma < 15$. The test statistic is

\[ x = \frac{(n-1) S^2}{\sigma^2} = \frac{24 \times 13.96^2}{15^2} = 20.78737 \sim \chi^2(n-1) = \chi^2(24), \]

and $P(\chi^2(24) \le 20.78737) \approx 0.348765$ (using χ²cdf(0, 20.78737, 24)). If $\sigma = 15$ were true, then there is still a 34.8765% chance of obtaining a sample deviation of 13.96 or lower with a sample of size 25. There is not enough evidence to reject $H_0$.

Exercises

1. Let $X \sim \chi^2(15)$. Find (a) $P(13 \le X \le 17)$, (b) $P(X < 13)$, and (c) $P(X > 17)$. Show a graph for each. (d) Find the bounds that contain 95% of the distribution.

2. Adult heights are found to be normally distributed with mean $\mu = 68$ inches and standard deviation $\sigma = 3.5$ inches. Suppose various random samples of size $n = 26$ are collected. Compute $P(2.8 \le S \le 4.2)$.

3. From a normally distributed measurement, a sample of size 25 yields a sample deviation of 14.85. Find a 95% confidence interval for the true standard deviation.

4. From a normally distributed measurement, a sample of size 16 yields $S = 4.26$. Is there evidence to reject the hypothesis $H_0 : \sigma = 3$?

Answers:

1. (a) 0.2834 (b) 0.3977 (c) 0.3189 (d) $L = 6.262$ and $R = 27.49$

2. $P\left( \dfrac{25 \times 2.8^2}{3.5^2} \le \chi^2(25) \le \dfrac{25 \times 4.2^2}{3.5^2} \right) = P\left( 16 \le \chi^2(25) \le 36 \right) \approx 0.8432$

3. Use $\sqrt{\dfrac{24 \times 14.85^2}{39.36}} \le \sigma \le \sqrt{\dfrac{24 \times 14.85^2}{12.40}}$ to obtain $11.6 \le \sigma \le 20.66$.

4. Test stat = 30.246, and $P(\chi^2(15) \ge 30.246) \approx 0.011$. If $\sigma = 3$ were true, then there is only a 1.1% chance of getting an $S$ of 4.26 or higher with a sample of size 16. We can reject $H_0$ in favor of $H_a : \sigma > 3$.
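
The chi-square probabilities and bounds used in Examples 1 to 4 can also be checked without a calculator. The following is a minimal sketch in Python using only the standard library. The function names (chi2_cdf, chi2_ppf, prob_s_between, sigma_confidence_interval, sigma_test) are hypothetical helpers introduced here for illustration, not part of the handout or of any library. The numerical method (a power series for the chi-square cumulative probability, plus bisection for the bounds) is one assumed approach and is not claimed to be what the TI commands do internally.

```python
import math


def chi2_cdf(x, n):
    """P(X <= x) for X ~ chi-square(n), from the power series of the
    regularized lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = n / 2.0, x / 2.0
    term = 1.0 / a              # k = 0 term of sum t^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, 10_000):
        term *= t / (a + k)
        total += term
        if term < 1e-17 * total:
            break
    return min(1.0, total * math.exp(a * math.log(t) - t - math.lgamma(a)))


def chi2_ppf(p, n):
    """Chi-square score with left-tail probability p (0 < p < 1), by bisection."""
    lo, hi = 0.0, max(1.0, float(n))
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains the score
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def prob_s_between(a, b, n, sigma):
    """Theorem I: P(a <= S <= b) for samples of size n with known sigma."""
    scale = (n - 1) / sigma ** 2
    return chi2_cdf(scale * b ** 2, n - 1) - chi2_cdf(scale * a ** 2, n - 1)


def sigma_confidence_interval(s, n, level):
    """Theorem II: confidence interval for sigma at the given level."""
    alpha = 1.0 - level
    left = chi2_ppf(alpha / 2.0, n - 1)         # bound L
    right = chi2_ppf(1.0 - alpha / 2.0, n - 1)  # bound R
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / right), math.sqrt(ss / left)


def sigma_test(s, n, m, tail):
    """Theorem III: test statistic and P-value for H0: sigma = m.
    tail is "left" for Ha: sigma < m and "right" for Ha: sigma > m."""
    x = (n - 1) * s ** 2 / m ** 2
    left = chi2_cdf(x, n - 1)
    return x, (left if tail == "left" else 1.0 - left)


# Example 1(b) and 1(c): X ~ chi-square(10)
print(chi2_cdf(10, 10) - chi2_cdf(6, 10))
print(chi2_ppf(0.05, 10), chi2_ppf(0.95, 10))
# Example 2: n = 46, sigma = 15
print(prob_s_between(13, 17, 46, 15))
# Example 3: n = 20, S = 3.96, 98% interval
print(sigma_confidence_interval(3.96, 20, 0.98))
# Example 4: n = 25, S = 13.96, H0: sigma = 15, left-tail alternative
print(sigma_test(13.96, 25, 15, "left"))
```

The printed values can be compared with the figures reported in the examples (about 0.37477, 3.940 and 18.31, 0.794, 2.8693 to 6.24776, and 0.348765). Small differences in the last digits are expected because the handout rounds the table bounds to four significant figures.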
