Chi Square (χ²)

Parametric Statistics
• Everything we have done so far assumes that data are representative of a probability distribution (normal curve).
• We are making inferences about the parameters (statistics of the population) of the distribution.
• That is why this is called parametric statistics.

Non-Parametric Statistics
• If the data are not assumed to be part of a probability distribution, then there is no distribution about which to make inferences.
• The most frequent reason this happens is that the data are not interval.
• To make inferences, some characteristic of the data must approximate a probability distribution.
• These are called non-parametric statistics.

Variables
• Nominal – Named groups (mode)
• Ordinal – Ordered named groups (median)
• Interval (Ratio) – Continuous scales (mean)

A Problem
• With nominal or ordinal data we cannot compare group means.
  – An average category, or an average of low/medium/high, has no meaning.
• But there ought to be some way to know if the distributions of responses in nominal or ordinal categories could have occurred by chance.
• The most common way to do this is with a χ² (chi square).

[Figure: Reading Selection by Source. Chart of counts (0 to 50) for Female and Male across the categories None, Book, Pre Dist, Magazine, Post Dist, Online.]

Contingency Tables (Cross Tabs): Test of Independence

          Book   Magazine   Online
Women     24     36         42
Men       22     19         30

• Are these numbers independent? If they are not, then it means that one variable influences the other.
• If gender is not influenced by reading preference, then the gender proportions should be the same in each category of reading preference.
• If reading preference is not influenced by gender, then the reading preference proportions should be the same for each gender.
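The claim that independence implies matching proportions can be checked roughly by hand. The following is a minimal sketch in Python (an assumed choice; the slides themselves use Excel) that computes each gender's share of readers per source from the observed counts above. The names `observed` and `row_proportions` are hypothetical, introduced only for illustration.

```python
# Observed counts from the slides: rows are gender, columns are reading source.
observed = {
    "Women": {"Book": 24, "Magazine": 36, "Online": 42},
    "Men": {"Book": 22, "Magazine": 19, "Online": 30},
}


def row_proportions(table):
    """Return, for each row, the share of that row's total in each column."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: count / total for col, count in cells.items()}
    return result


proportions = row_proportions(observed)
for row, cells in proportions.items():
    # Print each gender's proportions rounded to three decimals.
    print(row, {col: round(p, 3) for col, p in cells.items()})
```

If the two variables were perfectly independent, the two printed rows would be identical. The chi square test asks whether the gap between them is larger than chance alone would plausibly produce.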
Contingency Tables (Cross Tabs): Test of Independence

          Book   Magazine   Online
Women     24     36         42
Men       22     19         30

• If neither is influenced by the other then the proportions should be the same throughout the model.
• Is the distribution of reading preference different by gender?
• Is the distribution of gender different by reading preference?
• Are these two variables related?
• Null: the two variables are not related; they are independent.

Chi Square Tests of Independence
• Given the Observed Frequencies, there ought to be some way to imagine what the most likely number would be in each cell if the variables were independent.
• This is a kind of super-averaging of the counts in related cells to predict what should be in each cell.

Contingency Tables: Test of Independence (observed count, expected count still unknown)

          Book     Magazine   Online
Women     24 (?)   36 (?)     42 (?)
Men       22 (?)   19 (?)     30 (?)

Chi Square Tests of Independence
• Given the Observed Frequencies, determine what would be in each cell if the variance in the rows and columns were accounted for, looking at row and column proportions simultaneously.
• This is done by: (Row total x Column total) / Sample total
• This new set of values is called the Expected Frequencies.

Contingency Tables: Test of Independence (observed count, expected frequency in parentheses)

          Book         Magazine     Online
Women     24 (27.12)   36 (32.43)   42 (42.45)
Men       22 (18.88)   19 (22.57)   30 (29.55)

Worked example for the Women/Book cell:
• Row Total (Women) = 102
• Column Total (Book) = 46
• Sample Total = 173
• (102 x 46) / 173 = 27.12

Now What? (Computing a Chi Square)
• The gathered data are the Observed Values.
• Expected Values: a computation of what should be in each cell based on the existing sample distribution.
• First the computer builds a model that represents the expected frequencies for each cell.
• Then the differences between the observed frequencies and the expected frequencies are computed: (O - E)

Now What? (Computing a Chi Square)
• Because (O - E) might be negative, each difference is squared.
• Since we want to know when the differences are comparatively big or small, each squared difference has to be turned into a ratio by dividing it by the expected frequency.
• Now it is time to add all of these up: the sum of these scaled squared differences is the chi square.
• Last, given the df, the computed chi square is compared to a distribution of possible chi square values.

Step by step: $(O - E)^2$, then $\frac{(O - E)^2}{E}$, then $\chi^2 = \sum \frac{(O - E)^2}{E}$

Chi Distribution
• This curve represents the probability of getting a given chi square (the sum of each of the squared differences divided by the expected frequency).
• There is one of these for every possible degrees of freedom: df = (number of rows - 1) x (number of columns - 1)

[Figure: chi square distribution curve with the upper 5% of the area marked.]

• Sometimes the difference between the actual values and the expected values is so small that it can be attributed to chance variation.
• If that is true we say that the variables are independent.
• Sometimes the difference between the actual values and the expected values is so large that it is worth talking about why those differences appeared. The difference is so large it is unlikely to have happened by chance (p < .05).

Contingency Tables: Test of Independence (observed count, expected frequency in parentheses)

          Book         Magazine     Online
Women     24 (27.12)   36 (32.43)   42 (42.45)
Men       22 (18.88)   19 (22.57)   30 (29.55)

chi square = 1.85

Chi Distribution
• This curve represents the probability of getting a given chi square (the sum of each of the squared differences divided by the expected frequency).
• There is one of these for every possible degrees of freedom: (rows - 1) x (columns - 1)

[Figure: chi square distribution curve for 2 df, with the computed value 1.85 and the upper 5% of the area marked.]

Contingency Tables: Test of Independence
• For the reading preference table above: chi square = 1.85, p = .40

χ² Cautions
• When expected values drop below 5, the estimator has too much influence on the statistic.
• In other words, do not do χ² with small samples.
• Avoid over-interpreting the results.

Caution: chi square only tells if the total difference was likely to occur by chance, not individual differences.
You can only say IF the variables are related, not how.

          Book         Magazine     Online
Women     24 (27.12)   36 (32.43)   42 (42.45)
Men       22 (18.88)   19 (22.57)   30 (29.55)

chi square = 1.85, p = .40

Using Excel to Compute χ²

The Chi Square Calculation
• You will never see the distribution, only the chi square calculation and the p value.
• The chi table example is on the webpage.

Table 1
Faculty and Student Self-Perception of Technology Competence by Gender

                   Reported Skill Level
                   Minimally Skilled   Moderately Skilled   Accomplished
Male Faculty       7                   56                   22
Female Faculty     48                  153                  79
Male Students      4                   15                   17
Female Students    2                   15                   4

χ²(6) = 13.22, p = .04

Reading the table: the cell entries are the actual counts, 6 is the degrees of freedom, 13.22 is the chi square value, and .04 is the p value.

Chi Square (Goodness of Fit)
• Special case when the expected frequencies are predetermined.
• Are there important differences between what we are seeing and some assumed norm (the predetermined values)?

χ² Goodness of Fit
• Counts in categories.
• Compares observed counts to norms.
• Tests to see if the differences between the two are so large they are unlikely to have occurred randomly.

[Figure: Expected versus Observed counts for each category.]

χ² Goodness of Fit
• Chi Square = 8.09, p = .018

In Excel

            Actual   Expected
Ham         16       23
Cheese      34       23
Caprese     19       23

Chi Square test result (the p value): 0.01753637
=CHITEST(actual_range,expected_range)

Examples of Goodness of Fit χ²
• Technically all χ² are goodness of fit.
• Uses might be:
  – When no difference is predicted in categories.
  – Comparison of two iterations of the same group to see if change has occurred.
  – Comparison to a known distribution (e.g., z-scores).

Chi Square
• Non-parametric comparisons are weak.
• They serve as a motivation for parametric analysis.
• Sometimes they are all that is possible.
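To make the arithmetic behind both tests concrete, here is a minimal sketch in Python (an assumed choice; the slides use Excel's CHITEST) that recomputes the two examples from the deck: the reading preference test of independence and the Ham/Cheese/Caprese goodness of fit. The function names are hypothetical. The p value helper relies on the fact that, for exactly 2 degrees of freedom, the upper tail area of the chi square distribution is exp(-χ²/2); both examples happen to have df = 2, and any other df would need a statistics library instead.

```python
import math


def expected_frequencies(observed):
    """Expected count per cell: row total x column total / sample total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(chi2):
    """Upper-tail probability, valid ONLY for 2 degrees of freedom."""
    return math.exp(-chi2 / 2)


# Test of independence: df = (2 - 1) x (3 - 1) = 2.
reading = [[24, 36, 42],   # Women: Book, Magazine, Online
           [22, 19, 30]]   # Men:   Book, Magazine, Online
expected = expected_frequencies(reading)
flat_obs = [o for row in reading for o in row]
flat_exp = [e for row in expected for e in row]
chi2_ind = chi_square(flat_obs, flat_exp)
p_ind = p_value_df2(chi2_ind)

# Goodness of fit against an equal split: df = 3 - 1 = 2.
actual = [16, 34, 19]      # Ham, Cheese, Caprese
norm = [23, 23, 23]        # predetermined expected counts
chi2_fit = chi_square(actual, norm)
p_fit = p_value_df2(chi2_fit)

print(f"independence:    chi square = {chi2_ind:.2f}, p = {p_ind:.2f}")
print(f"goodness of fit: chi square = {chi2_fit:.2f}, p = {p_fit:.3f}")
```

Under these assumptions the sketch should agree with the values on the slides (1.85 and .40 for independence, 8.09 and .018 for goodness of fit). The only difference between the two tests is where the expected counts come from: the row and column totals in one case, a predetermined norm in the other.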