Testing for Independence
QSCI 381 – Lecture 41 (Larson and Farber, Sect. 10.2)

Independence
Two variables are independent if the occurrence of one variable does not affect the probability of the other. We often wish to examine whether two variables are independent, for example age and having a "high" heavy metal concentration, or the concerns people rank as the most important factors influencing a fishery and their occupation.

Contingency Tables
An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.

Example-A-1
                          Age-class
High heavy metals?      1-10   11-20   21-30   31-40     41+
Yes                       12      16      22      21      16
No                       219     180     232     190      75

We wish to examine whether having a high concentration of heavy metals is independent of age.

Expected Frequencies
The expected frequency for a cell, $E_{r,c}$, in a contingency table is:

$$E_{r,c} = \frac{(\text{sum of row } r) \times (\text{sum of column } c)}{\text{sample size}}$$

For example, the expected frequency for "Yes" in age-class 1-10 is 87 × 231 / 983 ≈ 20.44.

                          Age-class
High heavy metals?      1-10   11-20   21-30   31-40     41+   Total
Yes                    20.44   17.35   22.48   18.67    8.05      87
No                    210.56  178.65  231.52  192.33   82.95     896
Total                    231     196     254     211      91     983

The Chi-square Test for Independence-I
A chi-square independence test is used to test the independence of two variables. The conditions for using this test are: the observed frequencies must be obtained from a random sample; and each expected frequency must be greater than or equal to 5 (in Example A the smallest expected frequency is 8.05, so this condition holds). The null hypothesis for the test is that the variables are independent; the alternative hypothesis is that they are dependent.

The Chi-square Test for Independence-II
The test compares the observed frequencies with the expected frequencies, which are calculated assuming that the two variables are independent. If the value of the test statistic is large, we reject the null hypothesis of independence.
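As a rough check of the expected-frequency formula, the following minimal Python sketch (standard library only) recomputes the expected table for Example A from the observed counts and checks the "every expected frequency ≥ 5" condition. The function name `expected_frequencies` and the list-of-rows layout are illustrative assumptions, not part of the course material.

```python
def expected_frequencies(observed):
    """Hypothetical helper: E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Example-A-1: rows = high heavy metals (Yes, No); columns = age classes 1-10 to 41+
observed_a = [
    [12, 16, 22, 21, 16],     # Yes
    [219, 180, 232, 190, 75],  # No
]
expected_a = expected_frequencies(observed_a)

for label, row in zip(["Yes", "No"], expected_a):
    print(label, [round(e, 2) for e in row])
# Yes [20.44, 17.35, 22.48, 18.67, 8.05]
# No [210.56, 178.65, 231.52, 192.33, 82.95]

# Condition for using the chi-square test: every expected frequency >= 5
print("all expected >= 5:", all(e >= 5 for row in expected_a for e in row))
```

Keeping the table as a list of rows means the same helper works for any r × c table, including the 2 × 2 table in Example B.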
The Chi-square Test for Independence-III
The test statistic for the chi-square independence test is:

$$\chi^2 = \sum_i \sum_j \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$

where $O_{i,j}$ represents the observed frequencies and $E_{i,j}$ represents the expected frequencies. The sampling distribution of the test statistic is approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

Example-A-2
Contributions $(O_{i,j} - E_{i,j})^2 / E_{i,j}$ from each cell:

                          Age-class
High heavy metals?      1-10   11-20   21-30   31-40     41+
Yes                    3.488   0.105   0.010   0.290   7.840
No                     0.339   0.010   0.001   0.028   0.761

$$\chi^2 = \sum_i \sum_j \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} = 12.871$$

With (2 − 1)(5 − 1) = 4 degrees of freedom, the critical values are 9.488 for α = 0.05 and 13.277 for α = 0.01, so the value of the test statistic is in the rejection region for α = 0.05 but not for α = 0.01.

Using EXCEL to conduct Chi-square Tests
EXCEL includes a function, CHITEST, which can be used to test for independence:
CHITEST(observed range, expected range)
CHITEST returns the probability associated with the test statistic, i.e. it returns CHIDIST(χ², (r − 1)(c − 1)). Applying CHITEST to the data for Example A gives 0.011922, i.e. a probability less than 0.05 and greater than 0.01.

Example-B-1
We sample 150 animals and count the number in each of four categories:
Mature female       30
Mature male         40
Immature female     32
Immature male       48
Test the null hypothesis that sex and maturity state are independent (α = 0.01).

Example-B-2
Observed frequencies, with expected frequencies in parentheses:

            Mature        Immature
Female      30 (28.93)    32 (33.07)
Male        40 (41.07)    48 (46.93)

χ² = 0.1257, with (2 − 1)(2 − 1) = 1 degree of freedom. This is far below the critical value of 6.635 for α = 0.01, so we cannot reject the null hypothesis of independence. (By contrast, we did reject the null hypothesis that these data are consistent with a "healthy" marine mammal population.)

Homogeneity of Proportions
The chi-square test can be used to test the null hypothesis that the proportions in various categories are equal among several populations. The alternative hypothesis for this test is that at least one proportion differs among populations.
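The chi-square test for independence described in this lecture (expected frequencies, the χ² statistic, (r − 1)(c − 1) degrees of freedom, and the upper-tail probability that CHITEST reports) can be sketched in Python as a cross-check on the hand and EXCEL calculations. This is a minimal, standard-library-only sketch under stated assumptions: the helper names `expected_frequencies`, `chi2_sf` and `chi_square_independence` are hypothetical, and `chi2_sf` is a stand-in for EXCEL's CHIDIST that uses the closed-form chi-square upper-tail probability, valid only for integer degrees of freedom (always the case for (r − 1)(c − 1)). It is not EXCEL's implementation.

```python
import math


def expected_frequencies(observed):
    """E[r][c] = (sum of row r) * (sum of column c) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1.

    Hypothetical stand-in for EXCEL's CHIDIST, using closed forms for integer df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (1*3*...*(2i-1))
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)  # next term: multiply by x / (2(i+1) - 1)
    return math.erfc(math.sqrt(half)) + total


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, p-value) for an r x c table."""
    expected = expected_frequencies(observed)
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Example A: rows = high heavy metals (Yes, No); columns = age classes 1-10 to 41+
observed_a = [[12, 16, 22, 21, 16], [219, 180, 232, 190, 75]]
# Example B: rows = female, male; columns = mature, immature
observed_b = [[30, 32], [40, 48]]

for name, obs in [("A", observed_a), ("B", observed_b)]:
    stat, df, p = chi_square_independence(obs)
    print(f"Example {name}: chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

For Example A this should give χ² ≈ 12.871 with 4 degrees of freedom and p ≈ 0.0119, in line with the CHITEST value; for Example B, χ² ≈ 0.126 with 1 degree of freedom and p ≈ 0.72, well above α = 0.01. Because the homogeneity-of-proportions test uses the same statistic, the same function can be applied with populations as rows and categories as columns.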