Chapter 14: Chi-Square Tests

• Chi-Square Test for Independence
• Chi-Square Tests for Goodness-of-Fit

McGraw-Hill/Irwin. Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved.

Chi-Square Test for Independence

Chi-Square Test
• In a test of independence for an r × c contingency table, the hypotheses are
  H0: Variable A is independent of variable B
  H1: Variable A is not independent of variable B
• Use the chi-square test for independence to test these hypotheses.
• This non-parametric test is based on frequencies.
• The n data pairs are classified into c columns and r rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.

Chi-Square Distribution
• The critical value comes from the chi-square probability distribution with $\nu$ degrees of freedom:
  $\nu = (r - 1)(c - 1)$
  where r = number of rows in the table
        c = number of columns in the table
• Appendix E contains critical values for right-tail areas of the chi-square distribution.
• The mean of a chi-square distribution is $\nu$, with variance $2\nu$.

Expected Frequencies
• Assuming that H0 is true, the expected frequency of row j and column k is
  $e_{jk} = R_j C_k / n$
  where $R_j$ = total for row j (j = 1, 2, …, r)
        $C_k$ = total for column k (k = 1, 2, …, c)
        n = sample size

Steps in Testing the Hypotheses

Step 1: State the Hypotheses
  H0: Variable A is independent of variable B
  H1: Variable A is not independent of variable B

Step 2: Specify the Decision Rule
  Calculate $\nu = (r - 1)(c - 1)$.
  For a given $\alpha$, look up the right-tail critical value $\chi^2_R$ from Appendix E or by using Excel.
  Reject H0 if the test statistic exceeds $\chi^2_R$.
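As a minimal sketch of the expected-frequency formula and the decision rule above, the following Python computes $e_{jk} = R_j C_k / n$, $\nu = (r - 1)(c - 1)$, the usual Pearson statistic (the sum of $(f_{jk} - e_{jk})^2 / e_{jk}$ over all cells), and a right-tail p-value. The 2 × 3 table of counts is hypothetical, and `chi2_right_tail` and `independence_test` are illustrative helper names, not anything defined in the slides. The right-tail area uses a closed-form recurrence that holds for integer degrees of freedom, standing in for the Appendix E or Excel lookup.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        shape, q = 1.0, math.exp(-z)               # df = 2: area is exp(-x/2)
    else:
        shape, q = 0.5, math.erfc(math.sqrt(z))    # df = 1: area is erfc(sqrt(x/2))
    # Step the shape parameter up by 1 (df up by 2) until it reaches df / 2.
    while shape < df / 2.0:
        q += z ** shape * math.exp(-z) / math.gamma(shape + 1.0)
        shape += 1.0
    return q


def independence_test(observed):
    """Return (expected, chi2_calc, df, p_value) for an r x c table of counts."""
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]                  # R_j
    col_totals = [sum(row[k] for row in observed) for k in range(c)]  # C_k
    n = sum(row_totals)
    # Expected frequency under H0: e_jk = R_j * C_k / n
    expected = [[row_totals[j] * col_totals[k] / n for k in range(c)]
                for j in range(r)]
    chi2_calc = sum((observed[j][k] - expected[j][k]) ** 2 / expected[j][k]
                    for j in range(r) for k in range(c))
    df = (r - 1) * (c - 1)
    return expected, chi2_calc, df, chi2_right_tail(chi2_calc, df)


# Hypothetical 2 x 3 contingency table (made-up counts, for illustration only).
observed = [[20, 30, 50],
            [30, 30, 40]]
alpha = 0.05

expected, chi2_calc, df, p_value = independence_test(observed)
reject_h0 = p_value < alpha

print("expected frequencies:", expected)
print("chi-square statistic:", round(chi2_calc, 4), "with df =", df)
print("p-value:", round(p_value, 4), "reject H0:", reject_h0)
```

For this hypothetical table the statistic is about 3.11 with $\nu = 2$, which is below the $\alpha = .05$ critical value of 5.991, so H0 would not be rejected. Comparing the p-value with $\alpha$ gives the same decision as comparing the statistic with the critical value.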
Step 3: Calculate the Expected Frequencies
  $e_{jk} = R_j C_k / n$

Step 4: Calculate the Test Statistic
  The chi-square test statistic is
  $$\chi^2_{\text{calc}} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$$

Step 5: Make the Decision
  Reject H0 if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value < $\alpha$.

Small Expected Frequencies
• The chi-square test is unreliable if the expected frequencies are too small.
• Rules of thumb:
  • Cochran’s Rule requires that $e_{jk} > 5$ for all cells.
  • Up to 20% of the cells may have $e_{jk} < 5$.
• Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.
• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.

Chi-Square Test for Goodness-of-Fit

Why Do a Chi-Square Test on Numerical Data?
• The researcher may believe there’s a relationship between X and Y, but doesn’t want to use regression.
• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
• The researcher has numerical data for one variable but not the other.

Purpose of the Test
• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.
• The chi-square test will be used because it is versatile and easy to understand.
• Goodness-of-fit tests may lack power in small samples. As a guideline, a chi-square goodness-of-fit test should be avoided if n < 25.

Multinomial GOF Test
• A multinomial distribution is defined by any k probabilities $\pi_1, \pi_2, \ldots, \pi_k$ that sum to unity.
• For example, consider the following “official” proportions of M&M colors.
The hypotheses are
  H0: $\pi_1 = .13$, $\pi_2 = .13$, $\pi_3 = .24$, $\pi_4 = .20$, $\pi_5 = .16$, $\pi_6 = .14$
  H1: At least one of the $\pi_j$ differs from the hypothesized value

Hypotheses for GOF
• The hypotheses are:
  H0: The population follows a _____ distribution
  H1: The population does not follow a _____ distribution
• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Test Statistic and Degrees of Freedom for GOF
• Assuming n observations, the observations are grouped into c classes and then the chi-square test statistic is found using
  $$\chi^2_{\text{calc}} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$$
  where $f_j$ = the observed frequency of observations in class j
        $e_j$ = the expected frequency in class j if H0 were true

Uniform Goodness-of-Fit Test

Uniform Distribution
• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.
• The chi-square test for a uniform distribution compares all c groups simultaneously.
• The hypotheses are:
  H0: $\pi_1 = \pi_2 = \cdots = \pi_c = 1/c$
  H1: Not all $\pi_j$ are equal

Uniform GOF Test: Grouped Data
• The test can be performed on data that are already tabulated into groups.
• Calculate the expected frequency $e_j$ for each cell.
• The degrees of freedom are $\nu = c - 1$ since no parameters are estimated for the uniform distribution.
• Obtain the critical value $\chi^2_\alpha$ from Appendix E for the desired level of significance $\alpha$.
• The p-value can be obtained from Excel.
• Reject H0 if p-value < $\alpha$.

Uniform GOF Test: Raw Data
• First form c bins of equal width $(X_{\max} - X_{\min})/c$ and create a frequency distribution.
• Calculate the observed frequency $f_j$ for each bin.
• Define $e_j = n/c$ and perform the chi-square calculations.
• The degrees of freedom are $\nu = c - 1$ since no parameters are estimated for the uniform distribution.
• Obtain the critical value from Appendix E for a given significance level $\alpha$ and make the decision.
• Calculate the mean and standard deviation of the uniform distribution as:
  $\mu = (a + b)/2$
  $\sigma = \sqrt{[(b - a + 1)^2 - 1]/12}$
• If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed.
• So, test the hypothesized uniform mean using
  $$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
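The raw-data procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the textbook's worked example: the 40 integer observations on 1 to 10 are hypothetical, the choice of c = 5 bins is arbitrary, `chi2_right_tail` is an illustrative helper that replaces the Appendix E or Excel lookup for integer degrees of freedom, and the z statistic is assumed to take the standard form $(\bar{x} - \mu)/(\sigma/\sqrt{n})$.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        shape, q = 1.0, math.exp(-z)
    else:
        shape, q = 0.5, math.erfc(math.sqrt(z))
    while shape < df / 2.0:
        q += z ** shape * math.exp(-z) / math.gamma(shape + 1.0)
        shape += 1.0
    return q


# Hypothetical raw data: n = 40 integer observations on a = 1 .. b = 10.
data = ([1] * 5 + [2] * 3 + [3] * 4 + [4] * 4 + [5] * 6
        + [6] * 3 + [7] * 2 + [8] * 5 + [9] * 4 + [10] * 4)
n = len(data)
c = 5  # number of bins, an arbitrary choice for this sketch

# Form c bins of equal width (Xmax - Xmin) / c and tally observed frequencies.
x_min, x_max = min(data), max(data)
width = (x_max - x_min) / c
observed = [0] * c
for x in data:
    j = min(int((x - x_min) / width), c - 1)  # Xmax goes in the last bin
    observed[j] += 1

# Under H0 every bin has the same expected frequency e_j = n / c.
expected = n / c
chi2_calc = sum((f - expected) ** 2 / expected for f in observed)
df = c - 1
p_value = chi2_right_tail(chi2_calc, df)

# z test of the hypothesized uniform mean (assumed form of the statistic).
a, b = 1, 10
mu = (a + b) / 2
sigma = math.sqrt(((b - a + 1) ** 2 - 1) / 12)
x_bar = sum(data) / n
z = (x_bar - mu) / (sigma / math.sqrt(n))
p_two_tail = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value

print("observed:", observed, "expected per bin:", expected)
print("chi-square:", chi2_calc, "df:", df, "p-value:", round(p_value, 4))
print("z:", round(z, 4), "two-tailed p-value:", round(p_two_tail, 4))
```

With these made-up data the bin counts are 8, 8, 9, 7, 8 against $e_j = 8$, giving a statistic of 0.25 with $\nu = 4$, so neither the chi-square test nor the z test of the mean would reject uniformity at $\alpha = .05$.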