Elementary Statistics and Inference 22S:025 or 7P:025
Lecture 39

Chapter 28 – The Chi-Square Test (cont.)

A) Testing Independence

The chi-square test (χ²) can be used to test whether the responses to two questions are independent or related. For example: are responses to the survey statement “I think social security will be available to retirees in 2040” independent of the respondent's gender and age range?

Example: The HANES study (p. 58) took a probability sample of 2,237 Americans, age 25-34. A question was asked about “handedness”. The results are shown in Table 5 below:

Table 5 (counts as used in the calculations that follow)
                 Men     Women    Total
Right-handed     934     1,070    2,004
Left-handed      113        92      205
Ambidextrous      20         8       28
Total          1,067     1,170    2,237

The table of counts has 3 rows and 2 columns, and the χ²-statistic can be used to test:

H0: Gender is independent of handedness
H1: Gender is associated with handedness

The expected counts are computed on the assumption that gender is independent of handedness. For example:

$$P(\text{right-handed}) = \frac{1070 + 934}{2237} = \frac{2004}{2237}, \qquad \text{number of men} = 1067$$

$$E(\text{men who are right-handed}) = 1067 \times \frac{2004}{2237} \approx 955.9$$

In general, for an m × n table, let O_jk be the observed count in cell (j, k), R_j the total of row j, C_k the total of column k, and N the grand total. The expected count in cell (j, k), if the counts are distributed independently, is

$$E_{jk} = \frac{R_j \times C_k}{N}$$

For the example in Table 7:

$$E_{11} = \frac{(934 + 1070)(934 + 113 + 20)}{2237} = \frac{2004 \times 1067}{2237} = 955.9 \approx 956$$

$$E_{12} = \frac{(934 + 1070)(1070 + 92 + 8)}{2237} = \frac{2004 \times 1170}{2237} = 1048.1 \approx 1048$$

$$E_{21} = \frac{(934 + 113 + 20)(113 + 92)}{2237} = \frac{1067 \times 205}{2237} = 97.78 \approx 98$$

$$E_{31} = \frac{(934 + 113 + 20)(20 + 8)}{2237} = \frac{1067 \times 28}{2237} = 13.4 \approx 13$$
$$E_{22} = \frac{(113 + 92)(1070 + 92 + 8)}{2237} = \frac{205 \times 1170}{2237} = 107.2 \approx 107$$

$$E_{32} = \frac{(20 + 8)(1070 + 92 + 8)}{2237} = \frac{28 \times 1170}{2237} = 14.6 \approx 15$$

$$\chi^2 = \sum_{j,k} \frac{(O_{jk} - E_{jk})^2}{E_{jk}} = \frac{(934 - 956)^2}{956} + \frac{(1070 - 1048)^2}{1048} + \frac{(113 - 98)^2}{98} + \frac{(92 - 107)^2}{107} + \frac{(20 - 13)^2}{13} + \frac{(8 - 15)^2}{15} = 12.4 \approx 12$$

The degrees of freedom for an m × n table of counts is (the number of rows − 1) × (the number of columns − 1):

$$df = (m - 1)(n - 1)$$

In our example, df = (3 − 1)(2 − 1) = 2.

Then refer the computed χ² ≈ 12 to the appropriate row of the chi-square probability table at the 5% level. For df = 2, the critical value is $\chi^2_{5\%,2} = 5.99$.

The computed χ² value (about 12) exceeds 5.99. A value this large would occur less than 5% of the time if the null hypothesis were true, so the p-value is less than 5%. Therefore reject the null hypothesis: an association exists between gender and handedness.

The great majority of persons in HANES in the 25-34 range were “right-handed”. Further, “left-handed” people are more likely to be men.

Exercise Set C (pp. 539-540): #2, 3, 5, 7

#2. (Hypothetical.) In a certain town, there are about one million eligible voters. A simple random sample of size 10,000 was chosen, to study the relationship between sex and participation in the last election. The results:

                Men      Women
Voted           2,792    3,591
Didn’t Vote     1,486    2,131

Make a χ²-test of the null hypothesis that sex and voting are independent.
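The independence test worked through for the handedness data earlier in this section follows three steps: take row and column totals, form expected counts R_j × C_k / N, and sum (O − E)²/E over all cells. A minimal Python sketch of those steps is given here as a cross-check. It is not part of the original lecture, and the function names `expected_counts` and `chi_square_independence` are made up for illustration. Because the sketch keeps the expected counts unrounded, the statistic comes out near 11.8 rather than the 12.4 obtained by hand from rounded expected counts; the conclusion at the 5% level is the same.

```python
def expected_counts(observed):
    """Expected counts under independence: E_jk = R_j * C_k / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an m x n table."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Handedness counts: rows are right-handed, left-handed, third category;
# columns are men, women (taken from the calculations in the lecture).
handedness = [[934, 1070], [113, 92], [20, 8]]
stat, df = chi_square_independence(handedness)

# 5.99 is the 5% critical value for df = 2 quoted in the lecture.
print(f"chi-square = {stat:.2f}, df = {df}, reject H0 at 5%: {stat > 5.99}")
```

The same function can be applied to any table of counts given as a list of rows, for instance the 2 × 2 tables in the exercises.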
Expected Counts
                Men        Women      Total
Voted           2,730.6    3,652.4     6,383
Didn’t Vote     1,547.4    2,069.6     3,617
Total           4,278      5,722      10,000

df = (2 − 1)(2 − 1) = 1

$$E_{11} = \frac{6383 \times 4278}{10{,}000} = 2730.6, \qquad E_{12} = \frac{6383 \times 5722}{10{,}000} = 3652.4$$

$$E_{21} = \frac{3617 \times 4278}{10{,}000} = 1547.4, \qquad E_{22} = \frac{3617 \times 5722}{10{,}000} = 2069.6$$

$$\chi^2 = \frac{(2792 - 2730.6)^2}{2730.6} + \frac{(3591 - 3652.4)^2}{3652.4} + \frac{(1486 - 1547.4)^2}{1547.4} + \frac{(2131 - 2069.6)^2}{2069.6}$$

$$\chi^2 = 1.38 + 1.03 + 2.44 + 1.82 = 6.67, \qquad \chi^2_{5\%,1} = 3.84$$

Since 6.67 > 3.84, reject the hypothesis of independence. Men were more likely to have voted: about 65% of the men in the sample voted, compared with about 63% of the women.

#7. To test whether a die is fair, someone rolls it 600 times. On each roll, he just records whether the result was even or odd, and large (4, 5, 6) or small (1, 2, 3). The observed frequencies turn out as follows:

         Large    Small
Even     183      113
Odd       88      216

Question: Is the die fair? To answer this question, you use:
i) the one-sample z-test.
ii) the two-sample z-test.
iii) the χ²-test, with a null hypothesis that tells you the contents of the box (section 1).
iv) the χ²-test for independence (section 4).
Now answer the question.

Apply the goodness-of-fit test, option (iii), with n = 600:

Outcome                 Probability    Expected            Observed
Even, Large (4 or 6)    2/6            600 × 2/6 = 200     183
Even, Small (2)         1/6            600 × 1/6 = 100     113
Odd, Large (5)          1/6            600 × 1/6 = 100      88
Odd, Small (1 or 3)     2/6            600 × 2/6 = 200     216
Total                                  600                 600

Each term divides by the expected count:

$$\chi^2 = \frac{(183 - 200)^2}{200} + \frac{(113 - 100)^2}{100} + \frac{(88 - 100)^2}{100} + \frac{(216 - 200)^2}{200}$$

$$\chi^2 = 1.445 + 1.69 + 1.44 + 1.28 \approx 5.86$$

With df = 3, $\chi^2_{5\%,3} = 7.82$. Since 5.86 < 7.82, the p-value is greater than 5%: retain H0. The data are consistent with a fair die.

E. Review Exercises (pp. 541-543): #1, 2, 3, 6, 8, 10

#2.
As part of a study on the selection of grand juries in Alameda county, the educational level of grand jurors was compared with the county distribution:

Educational Level    County     Number of Jurors
Elementary            28.4%      1
Secondary             48.5%     10
Some College          11.9%     16
College Degree        11.2%     35
Total                100.0%     62

Could a simple random sample of 62 people from the county show a distribution of educational level so different from the county-wide one? Choose one option and explain.
i) This is absolutely impossible.
ii) This is possible, but fantastically unlikely.
iii) This is possible but unlikely: the chance is around 1% or so.
iv) This is quite possible: the chance is around 10% or so.
v) This is nearly certain.

Expected Counts (n = 62), df = 3

Educational Level    Expected Count          Observed    (O − E)²/E
Elementary           28.4% × 62 = 17.6        1           15.7
Secondary            48.5% × 62 = 30.1       10           13.4
Some College         11.9% × 62 = 7.4        16            9.9
College Degree       11.2% × 62 = 6.9        35          114.4
Total                62                      62          153

χ² ≈ 153, p-value < 1%. A value of 153 on 3 degrees of freedom is far beyond the 1% critical value (11.34), so option (ii) is the best choice.

#8. Two people are trying to decide whether a die is fair. They roll it 100 times, with the results shown below. One person wants to make a z-test, the other wants to make a χ²-test. Who is right? Explain briefly.

Outcome    Observed    Expected    (O − E)²/E
1          21          16.67       1.12
2          15          16.67       0.17
3          13          16.67       0.81
4          17          16.67       0.01
5          19          16.67       0.33
6          15          16.67       0.17
Total      100                     2.61

χ² = 2.61, df = 5, p-value > 5%: retain the null hypothesis that the die is fair. The χ²-test is the appropriate one here, since it compares all six faces at once.

#10. The U.S. has bilateral extradition treaties with many countries. (A person charged with a crime in his home country may escape to the U.S.; if he is captured in the U.S., authorities in his home country may request that he be “extradited,” that is, turned over to them for prosecution under their laws.)
The Senate attached a special rider to the treaty governing extradition to Northern Ireland: fugitives cannot be returned if they will be discriminated against on the basis of religion. In a leading case, the defense tried to establish discrimination in Northern Ireland’s criminal justice system.

One argument was based on 1991 acquittal rates for persons charged with terrorist offenses. These rates were significantly different for Protestants and Catholics: χ² ≈ 6.2 on 1 degree of freedom, P ≈ 1%. The data are shown below: 8 Protestants out of 15 were acquitted, compared to 27 Catholics out of 65.

a) Is the calculation of χ² correct? If not, can you guess what the mistake was? (That might be quite difficult.)
b) What box model did the defense have in mind? Comment briefly on the model.

             Protestant    Catholic
Acquitted    8             27
Convicted    7             38

Expected Outcomes
             Protestant    Catholic    Total
Acquitted    6.56          28.44       35
Convicted    8.44          36.56       45
Total        15            65          80

$$\chi^2 = \frac{(8 - 6.56)^2}{6.56} + \frac{(27 - 28.44)^2}{28.44} + \frac{(7 - 8.44)^2}{8.44} + \frac{(38 - 36.56)^2}{36.56}$$

$$\chi^2 = .32 + .07 + .25 + .06 = .70$$

df = 1, p-value > 5%: retain H0.

Note: When testing for independence in a 2 × 2 table of counts, a short-cut technique can be used.

Example counts:

A        B        A + B
C        D        C + D
A + C    B + D    N = A + B + C + D

$$\chi^2 = \frac{N (AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)}$$

Apply this to the data of exercise #10 in the review exercises:

8     27    35
7     38    45
15    65    80

$$\chi^2 = \frac{80 (8 \cdot 38 - 7 \cdot 27)^2}{(15)(65)(35)(45)} = \frac{(80)(115)(115)}{(15)(65)(35)(45)} = .69$$
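The agreement between the short-cut formula and the cell-by-cell calculation for exercise #10 can be checked numerically. The Python sketch below is an illustration only and is not part of the original lecture; the function names `chi_square_2x2_shortcut` and `chi_square_2x2_cells` are hypothetical. It keeps the expected counts unrounded, so both routes give about .69, while the hand calculation with rounded terms gives .70.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Short-cut formula for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def chi_square_2x2_cells(a, b, c, d):
    """Same statistic, summing (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for j in range(2):
        for k in range(2):
            expected = row_totals[j] * col_totals[k] / n
            stat += (observed[j][k] - expected) ** 2 / expected
    return stat


# Exercise #10: acquitted (8, 27) and convicted (7, 38),
# columns are Protestant and Catholic.
shortcut = chi_square_2x2_shortcut(8, 27, 7, 38)
cells = chi_square_2x2_cells(8, 27, 7, 38)
print(f"short-cut: {shortcut:.3f}, cell by cell: {cells:.3f}")
```

The short-cut needs only the four counts and their margins, which makes it convenient for hand calculation; the cell-by-cell version generalizes to larger tables.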