Yr       Yes          No           Total
Fr       15 (15%)     85           100
         E: 30        E: 70
         Chi: 7.5     Chi: 3.2
So       25 (25%)     75           100
         E:           E:
         Chi:         Chi:
Jr       30 (30%)     70           100
         E:           E:
         Chi:         Chi:
Se       50 (50%)     50           100
         E:           E:
         Chi:         Chi:
Total    120          280          400

Expected count = (row total × column total) / grand total
Chi-square contribution = (observed count − expected count)² / expected count

Get the chi-square statistic by summing the contributions from all the cells.

How many df for the chi-square statistic?

The chi-square statistic follows a chi-square distribution with a specific number of degrees of freedom (df).

Yr       Yes          No           Total
Fr       15 (15%)     85           100
So       25 (25%)     75           100
Jr       30 (30%)     70           100
Se       50 (50%)     50           100
Total    120          280          400

df = (r – 1)(c – 1)
r = # row variable categories
c = # column variable categories

For our example, df =
A. 1
B. 2
C. 3
D. 4

Example 2: Two Different Categorical Variables

PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                    12        60        29       19      120
Yes                  100       128        41       11      280
Total                112       188        70       30      400

Pearson chi-square = 40.392

Double all counts to get:

PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                    24       120        58       38      240
Yes                  200       256        82       22      560
Total                224       376       140       60      800

Pearson chi-square = 80.784, which is exactly doubled!

For fixed proportions, increasing the sample size strengthens the evidence against H0.

Hypotheses for ANOVA:

Remember: There are groups within the population, as defined by their values of the categorical variable.

H0: Population means are the same for each group.
Ha: Not all population group means are the same.

H0: µ1 = µ2 = … = µk
Ha: Not all µ’s are the same.

In our particular situation…
H0: Each class (F, So, J, Se) has the same mean study time.
Ha: The mean study times are not the same for each class.

ANOVA output is summarized in a single table:

Analysis of Variance
Source      DF     Adj SS     Adj MS    F-Value    P-Value
CollYear     3      186.7      62.23       0.70      0.552
Error      283    25087.6      88.65
Total      286    25274.2

The second row (Error) gives the “within” variation.

The F statistic is merely the ratio of the MS between to the MS within (here, 62.23 / 88.65 = 0.70). It is the test statistic we use for ANOVA!
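The arithmetic behind the ANOVA table can be checked by hand. The sketch below is only an illustration: it assumes each mean square is the Adj SS divided by its DF, and the variable and function names are made up for this example rather than taken from the software output.

```python
# Values copied from the ANOVA table (Adj SS and DF columns).
ss_between, df_between = 186.7, 3      # CollYear row ("between" variation)
ss_within, df_within = 25087.6, 283    # Error row ("within" variation)


def mean_square(ss, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return ss / df


ms_between = mean_square(ss_between, df_between)
ms_within = mean_square(ss_within, df_within)

# The F statistic is the ratio of MS between to MS within.
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.2f}")  # 62.23
print(f"MS within  = {ms_within:.2f}")   # 88.65
print(f"F          = {f_stat:.2f}")      # 0.70
```

An F value this close to 1 means the variation between class means is about what the variation within classes would produce by itself, which fits the large p-value in the table.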
One more ANOVA table fact: The MS error, or MS within, is also called the pooled sample variance. You can take its square root to get the pooled standard deviation (here, √88.65 ≈ 9.42). The p-value is based on the F statistic and the two DF values for between and within.
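Returning to the chi-square examples earlier in these notes, the expected counts, cell contributions, and df can be computed in a few lines. This is a minimal sketch using plain Python lists. The function name chi_square and the variable names are hypothetical, chosen for illustration, and the tables are typed in from the counts shown in the notes.

```python
def chi_square(table):
    """Return (expected counts, cell contributions, statistic, df)
    for a two-way table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    # Contribution = (observed - expected)^2 / expected
    contrib = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    stat = sum(sum(row) for row in contrib)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, contrib, stat, df


# Example 1: rows are Fr, So, Jr, Se; columns are Yes, No.
by_year = [[15, 85], [25, 75], [30, 70], [50, 50]]
expected, contrib, stat, df = chi_square(by_year)
print(expected[0])    # Fr row: [30.0, 70.0]
print(contrib[0][0])  # Fr / Yes cell: 7.5

# Example 2: rows are No, Yes; columns are the four score groups.
psu = [[12, 60, 29, 19], [100, 128, 41, 11]]
_, _, stat_psu, _ = chi_square(psu)

# Doubling every count keeps the proportions fixed but doubles the statistic.
doubled = [[2 * x for x in row] for row in psu]
_, _, stat_doubled, _ = chi_square(doubled)
print(round(stat_psu, 3))      # 40.392
print(round(stat_doubled, 3))  # 80.784
```

The remaining E: and Chi: blanks in the first table can be checked against the other rows of expected and contrib.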