Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics

Purpose
◦ To analyze discontinuous (categorical or binned) data, in which a number of subjects fall into each category.
◦ We want to compare our observed data to what we expect to see. Are the differences due to chance? Due to association?
◦ When can we use the Chi-Square Test?
  ◦ Testing the outcome of Mendelian crosses
  ◦ Testing independence: is one factor associated with another?
  ◦ Testing a population for expected proportions

Assumptions
◦ 1 or more categories
◦ Independent observations
◦ A sample size of at least 10
◦ Random sampling
◦ All observations must be used
◦ For the test to be accurate, the expected frequency in each category should be at least 5

Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question.
2) Determine the expected frequencies.
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: (O − E)² / E
4) Find the degrees of freedom: (c-1)(r-1)
5) Find the critical chi-square value (the table value) in the Chi-Square Distribution table.
6) If the critical value > your calculated chi-square value, you do not reject your null hypothesis. If the calculated value is larger, you reject it.

Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one ant species than of the others.

                Leaf Cutter Ants   Carpenter Ants   Black Ants   Total
Observed               25                18             17         60
Expected               20                20             20         60
O − E                   5                −2             −3          0
(O − E)² / E          1.25              0.20           0.45      χ² = 1.90

χ² = sum of all (O − E)² / E = 1.90
Calculate degrees of freedom: with a single row of categories the (r-1) factor drops out, so df = c-1 = 3-1 = 2.
At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the critical chi-square value in a chi-square distribution table.
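The hand calculation for Example 1 can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the function name chi_square_gof is made up for illustration, it assumes equal expected proportions across categories (as in HO), and the critical value 5.991 is the table value the slides quote for α = 0.05 with 2 degrees of freedom.

```python
# Goodness-of-fit chi-square for the horned lizard example.
# Observed counts are taken from the slides; the helper name is hypothetical.
observed = {"Leaf Cutter Ants": 25, "Carpenter Ants": 18, "Black Ants": 17}


def chi_square_gof(observed_counts):
    """Return (per-category contributions, chi-square total, degrees of freedom).

    Assumes every category has the same expected count (total / number of categories).
    """
    total = sum(observed_counts.values())
    expected = total / len(observed_counts)
    contributions = {
        name: (count - expected) ** 2 / expected
        for name, count in observed_counts.items()
    }
    return contributions, sum(contributions.values()), len(observed_counts) - 1


contributions, statistic, dof = chi_square_gof(observed)

CRITICAL_005_DF2 = 5.991  # table value quoted in the slides (alpha = 0.05, 2 df)
reject_null = statistic > CRITICAL_005_DF2

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"chi-square = {statistic:.2f}, df = {dof}, reject HO: {reject_null}")
```

The decision rule in the last lines mirrors step 6: the null hypothesis is rejected only when the calculated value exceeds the table value.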
Example 1: Testing for Proportions
Critical value from the table (α = 0.05, 2 degrees of freedom): χ² = 5.991
Our calculated value: χ² = 1.90
◦ If the critical value > your calculated value, you do not reject your null hypothesis: the differences between observed and expected counts can be attributed to chance.
◦ If the calculated value > the critical value, there is a significant difference that is not due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.

SAS: Example 1
Annotations on the SAS program:
◦ Included to format the table
◦ Define your data
◦ Indicate what you want in your output

SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”
◦ High p-value: if the null hypothesis is true, there is a high probability of a test statistic at least as large as the observed one. Do not reject the null hypothesis.
◦ Low p-value: if the null hypothesis is true, there is a low probability of a test statistic at least as large as the observed one. Reject the null hypothesis.

SAS: Example 1
There is a high probability of a chi-square value greater than our calculated chi-square value. We do not reject our null hypothesis.

Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
SAS options used:
◦ cellchi2 = display how much each cell contributes to the overall chi-square value
◦ nocol = do not display column percentages
◦ norow = do not display row percentages
◦ chisq = display chi-square statistics

Example 2: More SAS Examples
Degrees of freedom: (2-1)(3-1) = 1*2 = 2
There is a high probability (78.25%) of a chi-square value greater than our calculated chi-square value. We do not reject our null hypothesis.
If there were an association, we could check which interactions describe it by looking at how much each cell contributes to the overall chi-square value.
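The p-value reading described above can be checked for Example 1 without a table. The sketch below relies on a general property of the chi-square distribution rather than anything in the slides: with exactly 2 degrees of freedom, the upper-tail probability has the closed form exp(−x/2). The function name chi2_sf_df2 is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 2 df.

    Valid only for 2 degrees of freedom, where the tail is exactly exp(-x / 2).
    """
    if x < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.exp(-x / 2.0)


ALPHA = 0.05

# Calculated value from Example 1 (horned lizards, 2 df).
p_example1 = chi2_sf_df2(1.90)
print(f"Example 1 p-value: {p_example1:.4f}")
print("Reject HO" if p_example1 < ALPHA else "Do not reject HO")

# The table value 5.991 is the point where the tail probability is about 0.05.
print(f"Tail probability at 5.991: {chi2_sf_df2(5.991):.4f}")
```

Comparing the p-value with α and comparing the calculated value with the table value are two views of the same decision, so both lead to not rejecting HO here.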
Limitations
◦ No expected category count should be less than 1
◦ No more than 1/5 of the expected category counts should be less than 5
  ◦ To correct for this, collect larger samples or combine the smaller expected categories until their combined value is 5 or more
Yates Correction
◦ When there is only 1 degree of freedom, the regular chi-square test should not be used
◦ Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O − E term, then continue as usual with the new corrected values

What do these mean?
◦ Likelihood Ratio Chi-Square
◦ Continuity-Adjusted Chi-Square Test
◦ Mantel-Haenszel Chi-Square Test
  ◦ QMH = (n − 1)r²
  ◦ r is the Pearson correlation coefficient between the row and column variables (which also measures the linear association between row and column)
  ◦ http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
  ◦ Tests the alternative hypothesis that there is a linear association between the row and column variable
  ◦ Follows a chi-square distribution with 1 degree of freedom
◦ Phi Coefficient
◦ Contingency Coefficient
◦ Cramer’s V

Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                      High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7            22
  Expected                 12.65               9.35          22
  Chi-Square                0.44               0.59         1.03
No Heart Disease             8                 10            18
  Expected                 10.35               7.65          18
  Chi-Square                0.53               0.72         1.25
TOTAL                       23                 17            40
Chi-Square Total                                            2.28

Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1
Because there is only 1 degree of freedom, we need to use the Yates correction.
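The expected counts and uncorrected cell contributions for the heart disease table can be reproduced from the row and column totals. The sketch below uses the usual rule for a test of association, expected = row total × column total / grand total, which matches the expected values shown in the slides. The function names are hypothetical and only the observed counts come from the slides.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Return (expected counts, per-cell contributions, chi-square total, df)."""
    expected = expected_counts(table)
    cells = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    statistic = sum(sum(row) for row in cells)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return expected, cells, statistic, dof


# Rows: heart disease, no heart disease. Columns: high cholesterol, low cholesterol.
heart_table = [[15, 7], [8, 10]]

expected, cells, statistic, dof = pearson_chi_square(heart_table)
for exp_row, cell_row in zip(expected, cells):
    print([round(e, 2) for e in exp_row], [round(c, 2) for c in cell_row])
print(f"uncorrected chi-square = {statistic:.2f}, df = {dof}")
```

The same function works for larger r x c tables, where the per-cell contributions play the role of the cellchi2 output.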
Applying the Yates correction, each cell's chi-square value becomes (|O − E| − 0.5)² / E. For the first cell: (|15 − 12.65| − 0.5)² / 12.65 = 0.27

                      High Cholesterol   Low Cholesterol   Total
Heart Disease               15                  7            22
  Expected                 12.65               9.35          22
  Chi-Square                0.27               0.37         0.64
No Heart Disease             8                 10            18
  Expected                 10.35               7.65          18
  Chi-Square                0.33               0.45         0.78
TOTAL                       23                 17            40
Chi-Square Total                                            1.42

Critical value from the table (α = 0.05, 1 degree of freedom): χ² = 3.841
3.841 > 1.42 ∴ We do not reject our null hypothesis.

Fisher’s Exact Test
◦ Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = likely negative association.
◦ Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = likely positive association.
◦ Two-Tail: Use this when there is no prior alternative.
Conclusion
◦ The Chi-square test is important in testing the association between variables and/or checking whether one’s expected proportions match the results of one’s experiment
◦ There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
◦ We can use SAS to conduct Chi-square tests on our data by using the command proc freq

References
◦ Chi-Square Test Descriptions:
  ◦ http://www.enviroliteracy.org/pdf/materials/1210.pdf
  ◦ http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
◦ Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
◦ SAS Support website: http://www.sas.com/index.html “FREQ procedure”
◦ YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k