Chi-Square for Contingency Tables

2 x 2 Case

A test for p1 = p2

We have learned a confidence interval for p1 - p2, the difference in the population proportions. We want a hypothesis testing procedure for this difference.

Definitions

A contingency table is a tabular arrangement of count data showing how the frequencies of the row factor relate to the column factor. We call a contingency table with r rows and c columns an r x c contingency table. Each category in a contingency table is called a cell.

Example

Consider a 2 x 2 contingency table with the row factor denoting success versus failure and the column factor denoting Group 1 or Group 2, where the samples for Group 1 and Group 2 are independent of each other. Then the contingency table looks like this:

                Group 1   Group 2
    Success     Y1        Y2
    Failure

Recall Example 10.37 regarding the effectiveness of Timolol on angina status. The contingency table would be as follows:

                       Timolol   Placebo
    Angina free        44        19
    Not angina free    116       128

We have already used these data to construct a 95% confidence interval for the difference in the proportion angina free under the Timolol versus the placebo conditions.

Let p1 denote the probability (or population proportion) of success for Group 1.
Let p2 denote the probability (or population proportion) of success for Group 2.

To test H0: p1 = p2, we'll introduce Pearson's χ2 (Chi-square) statistic.

Definition

Pearson's χ2 statistic is

$$X_s^2 = \sum \frac{(O - E)^2}{E}$$

where the sum is over all the cells in the table, O denotes the observed value in each cell, and E denotes the value we'd expect to see (if H0 were true).

Now, we have the observed values (the data we collected). What are the E's? Remember, we conduct hypothesis tests under the assumption that the null hypothesis is true. If the null hypothesis were true, then _____________. So the sample proportions from Group 1 and Group 2 would be estimating a common p (i.e., the probability of a success would be the same under Group 1 or Group 2 in our example). Then we could estimate this common p by using a weighted ("pooled") estimator, where n1 and n2 are the two sample sizes and Y1 and Y2 are the two success counts:

$$\hat{p}_{\text{pool}} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{n_1 \frac{Y_1}{n_1} + n_2 \frac{Y_2}{n_2}}{n_1 + n_2} = \frac{Y_1 + Y_2}{n_1 + n_2}$$

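As a quick numerical check of the pooled estimator, here is a minimal Python sketch that plugs in the Timolol counts from the table above. The variable names are illustrative choices, not notation from the handout.

```python
# Counts from the Timolol example in the text.
# Variable names (y1, n1, ...) are illustrative, not from the source.
y1, n1 = 44, 44 + 116   # Timolol: angina-free count, sample size (160)
y2, n2 = 19, 19 + 128   # Placebo: angina-free count, sample size (147)

# Sample proportions of success in each group.
p1_hat = y1 / n1
p2_hat = y2 / n2

# Pooled estimate written as a sample-size-weighted average...
p_pool_weighted = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# ...which simplifies to total successes over total observations.
p_pool_counts = (y1 + y2) / (n1 + n2)

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}")
print(f"pooled (weighted average) = {p_pool_weighted:.4f}")
print(f"pooled (total counts)     = {p_pool_counts:.4f}")
```

Both forms give the same number, which is the point of the algebra: the weights n1 and n2 cancel the denominators of the sample proportions.
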
Little Sidebar…

Suppose you are flipping an unfair coin, where the probability of heads is 0.3 and the probability of tails is 0.7. How many heads would you expect to see if you were to flip this unfair coin ten times?

Now, apply this thought process to get the expected successes for Group 1. And compute the expected successes for Group 2.

Fill out the "Expected Table" for the Group 1/Group 2 success/failure contingency table.

                Group 1   Group 2
    Success
    Failure

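One possible way to fill in an expected table by computer is sketched below for the Timolol data. The helper name `expected_counts` is hypothetical. It uses the shortcut (row total x column total) / grand total, which is algebraically the same as multiplying a group's sample size by the pooled success (or failure) proportion, as in the sidebar reasoning.

```python
def expected_counts(observed):
    """Expected cell counts under H0: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed Timolol table: rows = (angina free, not angina free),
# columns = (Timolol, Placebo).
observed = [[44, 19],
            [116, 128]]

expected = expected_counts(observed)

# Sidebar logic: expected successes = (group size) * (pooled success probability).
n1 = 44 + 116
p_pool = (44 + 19) / (44 + 116 + 19 + 128)
expected_success_group1 = n1 * p_pool

for row in expected:
    print([round(e, 2) for e in row])   # rounded for display only
```

The expected counts are kept unrounded in `expected`, and their row and column totals can be compared with the observed totals as a check.
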
Things to remember
• The E's (expected counts) need not be integers, and we do not round them
• The row and column totals are the same for the observed and expected tables (this is a good way to check your calculations!)
• For the Chi-square test (which we'll begin implementing in just a moment) to be valid, we need each E ≥ 1 and the average E ≥ 5

Calculating P-values under the χ2 distribution

The χ2 distribution is a right-skewed distribution. The values of a χ2 random variable are greater than or equal to 0. The χ2 distribution has degrees of freedom. The degrees of freedom for a χ2 test with a contingency table are

    df = (# of rows - 1)(# of columns - 1)

For a non-directional alternative,

$$P = P\{\chi^2_{df} \ge X_s^2\}$$

If df = 1, we have the option of testing a directional alternative. In this case,

$$P = \begin{cases} \frac{1}{2} P\{\chi^2_{df} \ge X_s^2\} & \text{if the data deviate in the direction specified by } H_A \\ > 0.5 & \text{otherwise} \end{cases}$$

TI-83/84

Matrix (2nd x-inverse) -> scroll over to EDIT -> ENTER -> Enter your matrix
STAT -> scroll over to TESTS -> scroll down to χ2-Test -> ENTER -> Make sure your observed values are in the matrix specified; the expected matrix will be calculated for you and stored in the matrix specified -> Calculate -> ENTER

Example

Using the table below, conduct a test of hypothesis at the α = 0.01 significance level to determine whether there is a significant difference in the probability of being angina free under Timolol or placebo.

                       Timolol   Placebo
    Angina free        44        19
    Not angina free    116       128

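One way to work through this example numerically, as an alternative to the calculator steps, is sketched below in Python. It is a sketch under stated assumptions rather than the handout's own solution: the upper-tail probability uses the fact that a χ2 variable with df = 1 is a squared standard normal (so the shortcut applies only to 2 x 2 tables), and the helper `directional_p_value` is a hypothetical name that simply encodes the halving rule stated earlier.

```python
import math

observed = [[44, 19],     # angina free:     Timolol, Placebo
            [116, 128]]   # not angina free: Timolol, Placebo
alpha = 0.01

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under H0: p1 = p2.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson's statistic: sum of (O - E)^2 / E over all cells.
x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2 - 1)(2 - 1) = 1

# Valid for df = 1 only: P{chi2_1 >= x} = P{|Z| >= sqrt(x)} = erfc(sqrt(x / 2)).
p_nondirectional = math.erfc(math.sqrt(x2 / 2))

def directional_p_value(p_nondir, deviates_as_specified):
    """Halve P if the data deviate in the direction of HA; otherwise the
    rule only says P > 0.5, which is signalled here by returning None."""
    return p_nondir / 2 if deviates_as_specified else None

p1_hat = observed[0][0] / col_totals[0]   # Timolol
p2_hat = observed[0][1] / col_totals[1]   # Placebo

p_greater = directional_p_value(p_nondirectional, p1_hat > p2_hat)  # HA: p1 > p2
p_less = directional_p_value(p_nondirectional, p1_hat < p2_hat)     # HA: p1 < p2

print(f"X2 = {x2:.3f}, df = {df}, P = {p_nondirectional:.4f}")
print("reject H0" if p_nondirectional < alpha else "fail to reject H0")
print("HA: p1 > p2 ->", p_greater)
print("HA: p1 < p2 ->", "P > 0.5" if p_less is None else p_less)
```

With these counts the statistic comes out near 9.98 and the non-directional P-value is roughly 0.0016, which is below α = 0.01. The last two lines show how the same statistic is reused when the alternative is directional.
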
What if the researchers wanted to know whether the probability of being angina free is greater under Timolol than under placebo?

What if the researchers wanted to detect whether the probability of being angina free under Timolol is less than under placebo?

A Test for Association

The work-up of all the previous examples assumed we had two independent samples and we were observing those two samples for the outcome of one variable. Many times, we are in the situation where we observe one sample for two explanatory factors.

                            Factor 2
                            Level 1   Level 2
    Factor 1   Level 1      Y1        Y2
               Level 2

In the case where we have one sample and we're observing it for two explanatory factors, we'll test the hypothesis of association. The test for H0: there is no association is numerically equivalent to that of H0: p1 = p2, but the hypotheses and interpretations are different.

Example 10.21

To study the association of hair color and eye color in a German population, an anthropologist observed a sample of 6,800 men.

                          Hair Color
                          Dark      Light
    Eye Color   Dark      726       131
                Light     3,129     2,814

Test at the α = 0.05 significance level whether hair color is associated with eye color in this population of German men.

General r x c Case

The ideas presented in the 2 x 2 cases can be easily extended to general r x c contingency tables.

For the case where we have c different samples (your columns), and we're checking each sample for different levels of the row factor, the hypothesis will change slightly. Here, we'll test whether the distributions are the same for each sample. (Think about it: if we have more than a success and a failure, then for each column we'll have P(level 1), P(level 2), …, P(level r). The null hypothesis would then be testing whether p11 = p12 = … = p1c and p21 = p22 = … = p2c, etc. This is called a compound hypothesis.)

For the case where we have one sample and we're checking that one sample for different levels of two different factors, we'll still be testing association.

Example 10.31

The following table shows the observed distribution of A, B, AB, and O blood types in three samples of African Americans living in different locations.

            I (Florida)   II (Iowa)   III (Missouri)
    A       122           1,781       353
    B       117           1,351       269
    AB      19            289         60
    O       244           3,301       713

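To see what the compound null hypothesis is comparing in this example, the following Python sketch estimates the blood type distribution within each location (each column of the table). It is only an illustration of the quantities involved and does not carry out the test. The list and variable names are illustrative.

```python
# Observed counts from Example 10.31 (rows: blood type; columns: location).
blood_types = ["A", "B", "AB", "O"]
locations = ["Florida", "Iowa", "Missouri"]
observed = [[122, 1781, 353],
            [117, 1351, 269],
            [19, 289, 60],
            [244, 3301, 713]]

# Sample size for each location (column totals).
col_totals = [sum(col) for col in zip(*observed)]

# Estimated distribution of blood type within each location:
# each cell divided by its column total, so every column sums to 1.
col_distributions = [[count / total for count, total in zip(row, col_totals)]
                     for row in observed]

print("type  " + "  ".join(f"{loc:>9}" for loc in locations))
for blood_type, props in zip(blood_types, col_distributions):
    print(f"{blood_type:<4}  " + "  ".join(f"{p:9.3f}" for p in props))
```

Under H0 the three columns of proportions estimate the same underlying distribution, so large differences between columns are what the χ2 statistic picks up.
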
Test at the α = 0.05 level of significance whether the distribution of blood type for African Americans is different across the three regions.

Example 10.33

To study the association of hair color and eye color in a German population, an anthropologist observed a sample of 6,800 men (this is the same study as that of Example 10.21).

                            Eye Color
                            Brown   Grey or Green   Blue
    Hair Color   Brown      438     1,387           807
                 Black      288     746             189
                 Fair       115     946             1,768
                 Red        16      53              47

Test at the α = 0.05 significance level whether hair color is associated with eye color in this population of German men.

Final Notes on Chi-Square for Contingency Tables

• Remember your calculator gives P-values for a non-directional alternative
• We can have a directional alternative when we're in the 2 x 2 table, and when HA is directional, one must check whether the data deviate in the direction specified by HA
  o If yes, cut the P-value in half
  o If no, P > 0.5 and fail to reject H0
• Degrees of freedom for an r x c table are (# rows - 1)(# columns - 1)
• Pearson's X2 statistic for contingency tables uses the approximation X2 ~ χ2df, so for the approximation to be valid, a standard rule of thumb is to require E ≥ 1 for each cell and the average E ≥ 5 (and observations independent of one another)
• If expected counts are small and the data form a 2 x 2 table, Fisher's exact test may be appropriate
• By contrast, Example 10.21 illustrates that X2s is very sensitive with large sample sizes
• For r x c tables, we have the following two hypotheses
  o c samples, and we're checking for r levels of a row factor: we're testing whether the distributions are the same (for the groups, i.e. your columns)
  o one sample, and we're checking for r levels of a row factor and c levels of a column factor: we're testing for an association of the row and column factors
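The general r x c procedure summarized in these notes can be sketched in Python as follows. This is a minimal illustration, not the handout's own method: the function names are hypothetical, and the upper-tail helper uses closed forms that exist only for df = 1 or even df, which happens to cover the 2 x 2 tables and the 4 x 3 tables (df = 6) in the examples. A statistics library would be the usual choice for arbitrary df.

```python
import math

def pearson_chi_square(observed):
    """Return (X2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected

def chi2_upper_tail(x, df):
    """P{chi2_df >= x}, using closed forms for df = 1 or even df only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        half = x / 2
        terms = (half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch handles only df = 1 or even df")

def rule_of_thumb_ok(expected):
    """Check the validity rule: every E >= 1 and the average E >= 5."""
    cells = [e for row in expected for e in row]
    return min(cells) >= 1 and sum(cells) / len(cells) >= 5

# Example 10.31: blood type (rows) by location (columns).
blood = [[122, 1781, 353],
         [117, 1351, 269],
         [19, 289, 60],
         [244, 3301, 713]]

# Example 10.21: eye color (rows) by hair color (columns), n = 6,800.
hair_eye_2x2 = [[726, 131],
                [3129, 2814]]

for name, table in [("blood type", blood), ("hair/eye 2 x 2", hair_eye_2x2)]:
    x2, df, expected = pearson_chi_square(table)
    p = chi2_upper_tail(x2, df)
    print(f"{name}: X2 = {x2:.2f}, df = {df}, P = {p:.4g}, "
          f"rule of thumb ok: {rule_of_thumb_ok(expected)}")
```

For the blood type table the statistic is about 5.65 on 6 degrees of freedom, with a P-value of roughly 0.46. The 2 x 2 hair and eye color table, with its 6,800 observations, gives a statistic in the hundreds, which is the large-sample sensitivity the notes mention.
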