Lecture 12
Dan Piett
STAT 211-019
West Virginia University

Last Week
Hypothesis Tests on a difference in means
Hypothesis Tests on a difference in proportions
The 2-sided alternative

Overview
Chi-Squared Goodness of Fit Test
Chi-Squared Test of Independence

Section 12.1 Chi-Squared Goodness of Fit Test

Multinomial Data
Previously we have looked at data coming from a binomial distribution: 2 outcomes (Success, Failure).
Example: Flipping a coin (Heads, Tails)
Suppose we are interested in data with more than 2 outcomes.
Example: Rolling a die, 6 outcomes (1, 2, 3, 4, 5, 6)
We obtain multinomial data from a multinomial experiment.

Multinomial Experiments
Multinomial experiments have the following properties:
1. There is a fixed number of trials, n.
2. Each trial results in exactly one of K possible outcomes.
3. $p_i$ is the probability of getting outcome i on a single trial.
4. $p_1 + p_2 + p_3 + \dots + p_K = 1$
5. Trials are independent.

Finding Expected Frequencies
Recall that for the binomial distribution, Expected Value = n*p.
For our multinomial distribution we will have K expected counts, one for each outcome: $E_i = n p_i$.
Example: Rolling a fair 6-sided die 600 times ($p_i = 1/6$)
Outcome          1    2    3    4    5    6
Probability     1/6  1/6  1/6  1/6  1/6  1/6
Expected Count  100  100  100  100  100  100

Observed Frequencies
When we carry out our multinomial experiment, we will not always get exactly our expected counts.
Example: We expected 100 4's in our dice experiment. Suppose we get only 85.
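The expected-count rule $E_i = n p_i$ is easy to check by machine. Below is a minimal Python sketch; `expected_counts` is a hypothetical helper written for this illustration, not a library function. It reproduces the 600-roll table above and the raw gap between the 85 fours and the 100 we expected.

```python
from fractions import Fraction


def expected_counts(n, probs):
    """Return E_i = n * p_i for each of the K outcomes of a multinomial experiment."""
    # The K cell probabilities must cover every outcome: p_1 + ... + p_K = 1.
    if sum(probs) != 1:
        raise ValueError("cell probabilities must sum to 1")
    return [n * p for p in probs]


# Fair six-sided die rolled n = 600 times, so p_i = 1/6 for every face.
fair_die = [Fraction(1, 6)] * 6
expected = expected_counts(600, fair_die)
print([int(e) for e in expected])  # [100, 100, 100, 100, 100, 100]

# Face 4: we expected 100 but suppose we observe only 85.
observed_4 = 85
print(observed_4 - expected[3])  # -15
```

Using `Fraction` keeps 1/6 exact, so the check that the probabilities sum to 1 is not tripped up by floating-point rounding.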
85 is our Observed Frequency, $O_i$. Our observed frequencies (counts) are our actual data.
Suppose these are the observed counts from our 600 dice throws:
Outcome           1    2    3    4    5    6
Expected Counts 100  100  100  100  100  100
Observed Counts  97  113  102   85  109   94

Chi-Squared Goodness of Fit Test
The question to ask when looking at a table like this is: "Are our observed counts far enough from our expected counts to conclude that the expected counts are wrong?" This is what the Chi-Squared Goodness of Fit Test attempts to answer. Our test will follow the 7-step procedure:
1. H0: $p_1 = \#_1, p_2 = \#_2, \dots, p_K = \#_K$
2. HA: At least one $p_i \neq \#_i$ (there is only 1 alternative hypothesis)
3. Alpha is .05 if not specified.
4. Test Statistic: $\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$
5. P-value: $P(\chi^2_{K-1} > \text{Test Statistic})$, read from the Chi-Squared Table with df = K - 1.
6. Decision rule: reject H0 if p-value < alpha.
7. Conclusion: We have (do not have) enough evidence at the .05 level to conclude that at least one of our probabilities is incorrect.
We require that the expected count in each cell is at least 5 and that our sample is independent and random.

Example: For Fall 2013, 99 STAT 211 students were given a choice of 3 section times (A, B, C) in which to take the final exam. The data below show the number of students who selected each section. Do the data indicate that the students exhibit a preference, or that all sections are equally likely to be chosen? Use alpha = .05. (Hint: If all 3 are equally likely, every $p_i$ will be 1/3.)
Observed Counts: A: 40, B: 30, C: 29

Section 12.2 Chi-Squared Test for Independence

Association of Categorical Variables
Thus far, all of our confidence intervals and hypothesis tests have been done on numeric variables.
We will now shift our attention to categorical variables.
Ex: Eye Color, Class Rank
The question we wish to answer is, "Is there an association between two categorical variables?"
Ex: Is there an association between Eye Color and Hair Color?
We will use a Chi-Squared Test to answer this question, but first we need to discuss contingency tables.

Contingency Tables (Observed)
We can organize categorical data in a contingency table with r rows and c columns. This is known as an r x c ("r by c") contingency table. Note that the contingency table contains observed counts.
Example: Some possible values for Hair Color vs. Eye Color
Hair x Eye   Brown  Blue  Green
Black          110    20      8
Brown           65    22      9
Blonde          33    75     12

Contingency Tables (Expected)
Much like the goodness of fit test, we need to calculate our expected counts. The expected count for the cell in row i and column j is
$E_{ij} = \frac{(\text{Row } i \text{ Total}) \times (\text{Column } j \text{ Total})}{\text{Grand Total}}$
So for the previous example (expected counts in parentheses):
Hair x Eye   Brown        Blue         Green       Total
Black        110 (81.1)   20 (45.6)     8 (11.3)    138
Brown         65 (??)     22 (??)       9 (??)       96
Blonde        33 (??)     75 (??)      12 (??)      120
Total        208          117          29           354
For example, the Black/Brown cell has expected count 138 × 208 / 354 ≈ 81.1.
We now have observed and expected counts, so we can do a Chi-Squared Test for Independence.

Chi-Squared Test for Independence
1. H0: Variable 1 and Variable 2 are independent
2. HA: Variable 1 and Variable 2 are not independent (dependent) (there is only 1 alternative hypothesis)
3. Alpha is .05 if not specified.
4. Test Statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
5. P-value: $P(\chi^2_{(r-1)(c-1)} > \text{Test Statistic})$, read from the Chi-Squared Table with df = (r - 1)(c - 1).
6. Decision rule: reject H0 if p-value < alpha.
7. Conclusion: We have (do not have) enough evidence at the .05 level to conclude that the variables are dependent.
We require that the expected count in each cell is at least 5 and that our sample is independent and random.

Example
Does "test failure" reduce academic aspirations and thereby contribute to a decision to drop out of school? A random sample of 283 students is surveyed from schools with low graduation rates.
The contingency table below reports the responses to the question "Do tests required for graduation discourage students from staying in school?" Does there appear to be a relationship between the schools' location and the students' responses?
Response x School   Urban  Suburban  Rural
Yes                    57        27     47
No                     23        16     12
Unsure                 45        25     31
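Both tests in this lecture follow the same recipe: compute $\sum (O - E)^2 / E$ over the cells and compare it with a chi-squared distribution with the appropriate df. The minimal Python sketch below illustrates steps 4 to 6 of the 7-step procedure for the Section 12.1 exam-section example and the school-survey table above. The helper names `chi2_sf`, `goodness_of_fit`, and `independence_test` are hypothetical, chosen for this illustration. Instead of a table lookup, it assumes an integer df and computes the upper-tail p-value with the standard closed-form recursion for the chi-squared tail, built from Python's `math.erfc`, `math.exp`, and `math.lgamma`.

```python
import math


def chi2_sf(x, df):
    """Upper-tail p-value P(X > x) for X ~ chi-squared with positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed form for the smallest df of the same parity: df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def goodness_of_fit(observed, probs):
    """Chi-squared goodness of fit test; returns (statistic, df, p-value)."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    if min(expected) < 5:
        raise ValueError("every expected count should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = K - 1
    return stat, df, chi2_sf(stat, df)


def independence_test(table):
    """Chi-squared test for independence on an r x c table; returns (statistic, df, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E_ij = (row i total) * (column j total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected < 5:
                raise ValueError("every expected count should be at least 5")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # df = (r - 1)(c - 1)
    return stat, df, chi2_sf(stat, df)


# Section 12.1 example: sections A, B, C under H0: p_A = p_B = p_C = 1/3.
exam = goodness_of_fit([40, 30, 29], [1 / 3, 1 / 3, 1 / 3])

# Section 12.2 example: rows Yes/No/Unsure, columns Urban/Suburban/Rural.
school = independence_test([[57, 27, 47],
                            [23, 16, 12],
                            [45, 25, 31]])

for name, (stat, df, p) in [("exam sections", exam), ("school survey", school)]:
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: chi2 = {stat:.3f}, df = {df}, p-value = {p:.3f} -> {decision}")
```

For these data, both p-values come out above .05 (about 0.33 for the exam sections and about 0.46 for the school survey), so neither example rejects H0 at the .05 level. SciPy's `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` provide the same tests. The hand-rolled version here keeps each step of the procedure visible.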