Data Analysis and Statistical Methods
Statistics 651
http://www.stat.tamu.edu/~suhasini/teaching.html
Suhasini Subba Rao

Lecture 28 (MWF) The chi-squared test for independence

Example

Let us return to an example from Lecture 26. The FDA approved the drug Minoxidil as a remedy for male pattern baldness. A study was carried out and this is what was found:

                          Minoxidil   Placebo
Sample size                     310       302
% with new hair growth           32        20

• The way we approached the problem was to examine and compare the conditional probabilities. We let pM be the probability that a person has new hair growth given that they use Minoxidil, and pP be the probability that a person has new hair growth given that they do not use Minoxidil. These are unknown, so we estimate them from the data using p̂M = 99/310 = 0.32 and p̂P = 60/302 = 0.2.

• If there is no difference between Minoxidil and the placebo, pM and pP would be the same.

• Therefore we test H0 : pM − pP = 0 against the alternative HA : pM − pP ≠ 0 (in Lecture 26 we were actually testing H0 : pM − pP ≤ 0 against HA : pM − pP > 0; however, the following discussion only applies to the two-sided test).

• We return to the above problem, but look at it in terms of conditional probabilities and independence.

• Recalling the definitions in Lecture 5, two variables are independent (in this example, the type of hair treatment and whether hair growth is seen or not) if P(Hair growth | Minoxidil) = P(Hair growth); in other words, Minoxidil has no influence on hair growth.

• Again recalling the definitions from Lecture 5, pM and pP are conditional probabilities, that is, the probability of seeing an increase in hair when given Minoxidil (or the placebo).

• As there are only two groups, if pM = pP then the conditional probabilities are the same as the marginal probability.
• In other words, saying that pM = pP is the same as saying that the cream used and hair growth are statistically independent. Indeed, if pM = pP then both are equal to the marginal probability of hair growth.

• Now we look at a different method of approaching this problem, based on explicitly checking for statistical independence. The advantage of this approach is that it is not restricted to two-by-two tables. The disadvantage is that one-sided tests are not possible.

Does gender play a role in wearing a dress at the Oscars?

• H0 : Gender does not play a role in dress wearing. HA : Gender does play a role in dress wearing.

• If the null were true, we would expect the overall proportion who wore a dress to be the same as the proportion of females who wore a dress, and the same as the proportion of males who wore a dress (marginals the same as the conditionals).

• Data was collected.

Example 1: Oscar data

Observed counts:

           Dress   No Dress   Total
Male           0        200     200
Female       215          5     220
Total        215        205     420

Observed counts with row proportions:

           Dress                    No Dress               Total
Male       0   (0/200 = 0.00)       200 (200/200 = 1)        200
Female     215 (215/220 = 0.977)    5   (5/220 = 0.023)      220
Total      215                      205                      420

• Looking at the numbers, it appears that gender has an influence on dress. Is this statistically significant? What would the numbers look like if there was no association?
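The comparison of conditional and marginal proportions in the Oscar table can be sketched in a few lines of Python. This is only an illustration: the names `observed` and `dress_proportions` are invented here and are not part of the lecture.

```python
# Observed Oscar counts from the table above: (dress, no dress) for each gender.
observed = {"Male": (0, 200), "Female": (215, 5)}


def dress_proportions(table):
    """Return the conditional P(dress | gender) for each row and the marginal P(dress)."""
    conditional = {
        gender: dress / (dress + no_dress)
        for gender, (dress, no_dress) in table.items()
    }
    total_dress = sum(dress for dress, _ in table.values())
    total = sum(dress + no_dress for dress, no_dress in table.values())
    return conditional, total_dress / total


conditional, marginal = dress_proportions(observed)
for gender, p in conditional.items():
    print(f"P(dress | {gender}) = {p:.3f}")
print(f"P(dress) = {marginal:.3f}")
```

Under independence the two conditional proportions would both be close to the marginal proportion; here they are far apart.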
The test: What we expect under independence

Marginal totals and proportions:

           Dress                    No Dress                 Total
Male                                                         200 (200/420 = 0.476)
Female                                                       220 (220/420 = 0.524)
Total      215 (215/420 = 0.512)    205 (205/420 = 0.488)    420

Expected counts under independence:

           Dress                            No Dress                          Total
Male       51.2% of 200 males = 102.38      48.8% of 200 males = 97.62        200 (200/420 = 0.476)
Female     51.2% of 220 females = 112.6     48.8% of 220 females = 107.38     220 (220/420 = 0.524)
Total      215 (215/420 = 0.512)            205 (205/420 = 0.488)             420

Observed counts:

           Dress   No Dress   Total
Male           0        200     200
Female       215          5     220
Total        215        205     420

Measure the difference between what we expect and what we do observe.

There is such a clear mismatch that we really do believe there is a dependence. This means that we should be able to reject the null, and the ‘p-value’ in the test will be small. How do we actually measure this difference?

$$T = \frac{(102.38 - 0)^2}{102.38} + \frac{(112.6 - 215)^2}{112.6} + \frac{(97.62 - 200)^2}{97.62} + \frac{(107.38 - 5)^2}{107.38} \approx 400.$$

This should correspond to a tiny p-value (almost zero).

Does gender play a role in grades?

• H0 : Gender does not play a role in grades. HA : Gender does play a role in grades.

• If the null were true, we would expect the overall proportion who passed to be the same as the proportion of females who passed, and the same as the proportion of males who passed (marginals the same as the conditionals).

• Data was collected.

Example 2: Grades data

           Pass   Fail   Total
Male        108     12     120
Female      180     20     200
Total       288     32     320

           Male                    Female                  Total
Pass       108 (108/120 = 0.9)     180 (180/200 = 0.9)       288
Fail       12  (12/120 = 0.1)      20  (20/200 = 0.1)         32
Total      120                     200                       320

• Looking at the numbers, at least for this data set there does not appear to be an association.
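The expected counts and the statistic T for the Oscar and grades tables can be reproduced with a short Python sketch. The function names `expected_counts` and `chi_squared_statistic` are chosen here for illustration and do not come from the lecture.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


oscars = [[0, 200], [215, 5]]      # rows: male, female; columns: dress, no dress
grades = [[108, 12], [180, 20]]    # rows: male, female; columns: pass, fail

print([[round(e, 2) for e in row] for row in expected_counts(oscars)])
print(round(chi_squared_statistic(oscars), 1))   # roughly 400
print(chi_squared_statistic(grades))             # observed equals expected here
```

Working from the unrounded expected counts gives a value slightly above 400 for the Oscar data, in line with the hand calculation.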
The test: What we expect under independence

Marginal totals and proportions:

           Pass                   Fail                 Total
Male                                                   120 (120/320 = 0.375)
Female                                                 200 (200/320 = 0.625)
Total      288 (288/320 = 0.9)    32 (32/320 = 0.1)    320

Expected counts under independence:

           Pass                              Fail                             Total
Male       90% of the 120 males = 108        10% of the 120 males = 12        120 (120/320 = 0.375)
Female     90% of the 200 females = 180      10% of the 200 females = 20      200 (200/320 = 0.625)
Total      288 (288/320 = 0.9)               32 (32/320 = 0.1)                320

Observed counts:

           Pass   Fail   Total
Male        108     12     120
Female      180     20     200
Total       288     32     320

Measure the difference between what we expect and what we do observe. This time it is T = 0. The data is exactly as we would expect it to be under the null of no association. The p-value should be 100%.

Returning to the Minoxidil example

We look again at the data:

                    Minoxidil   Placebo   Pooled
New hair growth            99        60      159
No hair growth            211       242      453
Sample size               310       302      612

We want to test H0 : There is no association between hair growth and the hair cream used, against HA : There is an association between hair growth and the hair cream used. This is the same as H0 : pM − pP = 0 against HA : pM − pP ≠ 0.

Suppose the null is true, that is, there is no difference between Minoxidil and the placebo. In this case, the best estimate for the probability of seeing hair growth if any hair cream is used is the ‘pooled’ estimate, which is

$$p = \frac{99 + 60}{310 + 302} = \frac{159}{612} = 0.26$$

(see Lecture 26), and the best estimate for the probability of not seeing hair growth if cream is applied is

$$1 - p = \frac{453}{310 + 302} = \frac{453}{612} = 0.74.$$

We then look back at the data and calculate what we expect to see if there is no association between the two (the expected counts use the unrounded values of p and 1 − p):

                    Minoxidil               Placebo                 Total
New hair growth     0.26 × 310 = 80.54      0.26 × 302 = 78.46        159
No hair growth      0.74 × 310 = 229.5      0.74 × 302 = 223.5        453
Sample size         310                     302                       612

Just like in the goodness of fit test, we calculate the difference between what we expect and what we actually observe in the data:

$$T = \frac{(99 - 80.54)^2}{80.54} + \frac{(211 - 229.5)^2}{229.5} + \frac{(78.46 - 60)^2}{78.46} + \frac{(223.5 - 242)^2}{223.5} = 11.58.$$

Recall that if T is zero there is a perfect match between what we expect to see and what we observe, and the data is consistent with the null being true. On the other hand, if T is ‘large’ there is a substantial mismatch between what we expect and what we observe, and there is evidence against the null (the data is unlikely to be independent).

How do we determine whether T is large or not?

• For the Gender/Dress example, T = 400; we had to reject the null, as the p-value was very small.

• For the Gender/Grade example, T = 0; we could not reject the null, as the p-value is one.

• Under the null that there is no dependence, T follows a chi-squared distribution with 1 degree of freedom. The p-value is the area to the right of T. The 5% critical value is 3.841. Since T = 11.58 is larger than 3.841, the p-value is less than 5%. In fact T = 11.58 is larger than 10.83 (the 0.1% critical value), so the p-value is less than 0.1%. The output from StatCrunch is given below; compare what we have calculated to what you observe in the output.
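The Minoxidil calculation above can be checked with a minimal Python sketch using only the standard library. The variable names are illustrative, and the p-value line relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, which is an assumption added here rather than something stated on the slides.

```python
import math

# Observed counts from the Minoxidil table above.
growth = {"Minoxidil": 99, "Placebo": 60}
sample_size = {"Minoxidil": 310, "Placebo": 302}

# Pooled estimate of P(new hair growth) under the null of no association.
p_pooled = sum(growth.values()) / sum(sample_size.values())

t_stat = 0.0
for group, n in sample_size.items():
    # (observed, expected) pairs for "growth" and "no growth" in this group.
    cells = (
        (growth[group], p_pooled * n),
        (n - growth[group], (1 - p_pooled) * n),
    )
    for observed, expected in cells:
        t_stat += (observed - expected) ** 2 / expected

# For 1 degree of freedom, P(chi-squared > t) = erfc(sqrt(t / 2)).
p_value = math.erfc(math.sqrt(t_stat / 2))

print(f"pooled p = {p_pooled:.4f}")
print(f"T = {t_stat:.4f}, p-value = {p_value:.4f}")
```

The statistic and p-value printed by this sketch can be compared directly with the StatCrunch output.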
[Figure: chi-square plot]

The Minoxidil example (StatCrunch output)

Contingency table results:
Rows: Difference
Columns: None

Cell format: Count, (Row percent), (Column percent), (Total percent), Expected count

                       Minoxidil     Placebo       Total
Yes
  Count                99            60            159
  Row percent          (62.26%)      (37.74%)      (100.00%)
  Column percent       (31.94%)      (19.87%)      (25.98%)
  Total percent        (16.18%)      (9.804%)      (25.98%)
  Expected count       80.54         78.46
No
  Count                211           242           453
  Row percent          (46.58%)      (53.42%)      (100.00%)
  Column percent       (68.06%)      (80.13%)      (74.02%)
  Total percent        (34.48%)      (39.54%)      (74.02%)
  Expected count       229.5         223.5
Total
  Count                310           302           612
  Row percent          (50.65%)      (49.35%)      (100.00%)
  Column percent       (100.00%)     (100.00%)     (100.00%)
  Total percent        (50.65%)      (49.35%)      (100.00%)

Chi-Square test:

Statistic     DF   Value       P-value
Chi-square    1    11.584855   0.0007

We see that the p-value is 0.07%. Since 0.07% < 5%, there is evidence to suggest that the cream applied has an influence on hair growth. Make sure you understand what all the percentages in the output mean.

Comparing the results of the chi-squared test and proportions test

• In Lecture 26 we used the same data set to test two proportions, H0 : pM − pP ≤ 0 against HA : pM − pP > 0 (a one-sided test). The p-value for this test was 0.03%.

• The p-value for the chi-squared test is 0.07%.

• If the proportions test were a two-sided test, the p-value would be 0.06%.

• We see that the p-value for the two-sided two-sample proportion test is (almost) the same as that of the chi-squared test.

• This is because the chi-squared test for independence for 2-by-2 tables is the same as a two-sided test for two sample proportions. However, an advantage of the two-sample proportion test is that it can also be used to test other hypotheses, such as H0 : pM − pP ≤ 0.3 against HA : pM − pP > 0.3, and one-sided alternatives, which is not possible with the chi-squared test.
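The link between the two tests can be illustrated with a small Python sketch. It assumes the two-sample z statistic is built with the pooled standard error; Lecture 26 may have used a different standard error, which could account for the small gap between 0.06% and 0.07%. All variable names are illustrative.

```python
import math

x_m, n_m = 99, 310   # Minoxidil: new hair growth, sample size
x_p, n_p = 60, 302   # Placebo: new hair growth, sample size

p_m, p_p = x_m / n_m, x_p / n_p
p_pooled = (x_m + x_p) / (n_m + n_p)

# Two-sample z statistic with the pooled standard error (valid under H0: pM = pP).
std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_m + 1 / n_p))
z = (p_m - p_p) / std_err

# Standard normal tail area: P(Z > z) = erfc(z / sqrt(2)) / 2.
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
p_two_sided = 2 * p_one_sided

print(f"z = {z:.4f}, z^2 = {z * z:.4f}")
print(f"one-sided p = {p_one_sided:.5f}, two-sided p = {p_two_sided:.5f}")
```

With the pooled standard error, z squared coincides with the chi-squared statistic for the 2-by-2 table, so the two-sided p-value matches the chi-squared p-value.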
Testing for independence (or lack of association)

• Often in statistics we want to test whether there is an association between two events. For example, is there a link between signs of diabetes and whether someone is overweight, between smoking and cancer, etc.

• These types of question can be placed within a categorical data framework (see testing for independence lecture30.pdf for an alternative explanation).

Tests for independence: General setup

• We have m × k cells (the word ‘cell’ is usually used in categorical data when referring to one event). The observed table is:

             A1      A2      ...     Ak      Subtotals
B1           n11     n12     ...     n1k     n1·
B2           n21     n22     ...     n2k     n2·
...          ...     ...     ...     ...     ...
Bm           nm1     nm2     ...     nmk     nm·
Subtotals    n·1     n·2     ...     n·k     N

• The row totals are $n_{i\cdot} = \sum_{j=1}^{k} n_{ij}$ and the column totals are $n_{\cdot j} = \sum_{i=1}^{m} n_{ij}$.

• n1· + . . . + nm· = n·1 + . . . + n·k = N.

The general setup

Under the null hypothesis of independence between events, the expected table is

             A1           A2           ...     Ak           Subtotals
B1           n1· n·1/N    n1· n·2/N    ...     n1· n·k/N    n1·
B2           n2· n·1/N    n2· n·2/N    ...     n2· n·k/N    n2·
...          ...          ...          ...     ...          ...
Bm           nm· n·1/N    nm· n·2/N    ...     nm· n·k/N    nm·
Subtotals    n·1          n·2          ...     n·k          N

We need to compare each $n_{ij}$ with $n_{i\cdot} n_{\cdot j}/N$; if they are close, we do not have enough evidence to say that the events are dependent.

Testing for independence: the test statistic

• The test statistic is

$$T = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{\left(n_{ij} - \frac{n_{i\cdot} n_{\cdot j}}{N}\right)^2}{\frac{n_{i\cdot} n_{\cdot j}}{N}} \sim \chi^2_{(m-1)(k-1)}.$$

• As usual, we reject the null at the α level if T is larger than the critical value $\chi^2_{(m-1)(k-1)}(\alpha)$.
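The general statistic can be sketched as a short Python function for any m-by-k table. The function name `chi_squared_independence` is invented for this illustration. The usage example takes the height and bossiness counts that appear later in this lecture, and the closed-form tail area is used only because that table has exactly 2 degrees of freedom.

```python
import math


def chi_squared_independence(table):
    """Return (T, degrees of freedom) for an m-by-k table of observed counts."""
    row_totals = [sum(row) for row in table]          # n_i.
    col_totals = [sum(col) for col in zip(*table)]    # n_.j
    n = sum(row_totals)                               # N
    t_stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t_stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return t_stat, dof


# Height and bossiness counts: rows short, medium, large; columns bossy, not bossy.
bossiness = [[90, 110], [155, 145], [55, 145]]
t_stat, dof = chi_squared_independence(bossiness)

# With exactly 2 degrees of freedom the tail area has the closed form exp(-t / 2).
p_value = math.exp(-t_stat / 2) if dof == 2 else None

print(f"T = {t_stat:.2f}, df = {dof}")
print(f"exceeds the 5% critical value 5.991: {t_stat > 5.991}")
print(f"p-value = {p_value}")
```

For other degrees of freedom the p-value would need a chi-squared table or a statistics library; the sketch deliberately returns only T and the degrees of freedom in that case.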
Length of hair and feet size

Example

Test whether there is an association between length of hair and big feet using the data:

               Big feet   Not big feet   Subtotals
Long hair            20            150         170
Medium hair          80            170         250
Short hair          250             80         330
Subtotals           350            400   Total = 750

We see that the p-value is small (it is less than 0.01%), which suggests a dependence between feet size and hair length.

How do we check where this difference lies? Do tests on proportions between each pairing. Each is a two-sample test on proportions, which looks for a significant difference between the two groups in the pair:

• Long hair vs. medium hair: p-value < 0.01%

• Long hair vs. short hair: p-value < 0.01%

• Short hair vs. medium hair: p-value < 0.01%

It seems that hair length has an influence on feet size. The people with shorter hair tend to have bigger feet.

Example: Test for independence between height and bossiness

• Psychologists wanted to investigate whether there was dependence between height and how bossy someone was (that is, do short men have a Napoleon complex?).

• They gathered the following data.

            Bossy   Not bossy   Totals
Short          90         110      200
Medium        155         145      300
Large          55         145      200
Totals        300         400      700

• Test the hypothesis that there is no dependence between height and bossiness against the alternative that there is.

Solution

• Recall that independence means that if you randomly select someone, the probability they will be bossy is the same as if you were to restrict the population to tall people (or short people, or middle-sized people) and randomly select someone from this subpopulation. If this is the case, then size has no dependence on bossiness.
• In reality we cannot calculate these probabilities, because we do not observe the entire population of people, but we do have samples from the population. In this case we have a sample of 700 people.

• First look at the data. The proportion who are bossy differs across the groups: 45% of short men (90/200), 52% of medium men (155/300) and 28% of large men (55/200). So from looking at the data, there appears to be a dependence. But this difference could be due to random variation, so we want to test whether the difference is significant or not. Our objective is to test:
H0 : There is no dependence between height and bossiness.
HA : There is a dependence between height and bossiness.

• We first have to make a table of expected values under the null that there is no dependence between height and bossiness. Each expected value is (row total × column total)/700; for example, for short and bossy it is 200 × 300/700 = 85.71.

The test statistic is T = 29.

• Now because there are 3 × 2 cells (it is a 3 by 2 table), under the null T has a χ2 distribution with (3 − 1) × (2 − 1) = 2 degrees of freedom.

• Look up Table 7: $\chi^2_2(0.05) = 5.991$.

• The p-value is P(T > 29) < 0.0001.

• Since T = 29 > 5.99, there is enough evidence to reject the null. Equivalently, the p-value is very small. That is, based on the data there appears to be a dependence between size and bossiness.

• What happens when we do a pair-wise test?

Beware of interpretation: Simpson’s paradox

Fiona, Jane and Steph are being rated at a call center. Their ratings data is collected.

• We see that the rating of Fiona (85%) is greater than that of Jane (59%) and Steph (75%).

• In fact, if we compared Fiona with either Jane or Steph we would see that the difference is statistically significant, and there is evidence that Fiona gets better ratings (the difference is over and above random variation).
• To check the above, do a two-sample test for proportions, H0 : pF − pJ ≤ 0 vs. HA : pF − pJ > 0, etc.

• Does this mean Fiona is better at dealing with customers?

• We need to be careful when drawing causal conclusions.

A breakdown of the data

We now consider a breakdown of the data over two weeks: the first week and the second week. Comparing the scores, we see that Jane performs better in both weeks. What is happening?