Quantitative Methods
Partly based on materials by Sherry O’Sullivan
Part 3: Chi-Squared Statistic

Recap on T-Statistic
• It used the mean and standard error of a population sample
• The data is on an “interval” or scale
• Mean and standard error are the parameters
• This approach is known as parametric
• Another approach is non-parametric testing

Introduction to Chi-Squared
• It does not use the mean and standard error of a population sample
• Each respondent can only choose one category (unlike scale in t-Statistic)
• The expected frequency must be at least 5 in each category for the test to be valid
• If any of the categories has an expected frequency of less than 5, then you need to increase your sample size
  – Or merge categories

Example using Chi-Squared
• “Is there a preference amongst the UW student population for a particular web browser?” (Dr C Price’s Data)
  – They could only indicate one choice
  – These are the observed frequencies of responses from the sample
  – This is called a ‘contingency table’

                        Firefox  IExplorer  Safari  Chrome  Opera
  Observed frequencies  30       6          4       8       2

Was it just chance?
• How confident am I?
  – Was the sample representative of all UW students?
  – Was the variation in the measurements just chance?
• Chi-Squared test for significance
  – Several ways to use the test
  – Simplest is Null Hypothesis
• H0: The students show “no preference” for a particular browser

Chi-Squared: “Goodness of fit” (No preference)
• H0: The students show “no preference” for a particular browser
• This leads to a hypothetical or expected distribution of frequencies
  – We would expect an equal number of respondents per category
  – We had 50 respondents and 5 categories, so 50 / 5 = 10 per category

                        Firefox  IExplorer  Safari  Chrome  Opera
  Expected frequencies  10       10         10      10      10

  Expected frequency table

Stage 1: Formulation of Hypothesis
• H0: There is no preference in the underlying population for the factor suggested.
• H1: There is a preference in the underlying population for the factor suggested.
• The basis of the chi-squared test is to compare the observed frequencies against the expected frequencies

Stage 2: Expected Distribution
• As our null hypothesis is “no preference”, we need to work out the expected frequencies:
  – You would expect each category to have the same number of respondents
  – Show this in an “Expected frequency” table
  – Each expected frequency must be at least 5 for the test to be valid

                        Firefox  IExplorer  Safari  Chrome  Opera
  Expected frequencies  10       10         10      10      10

Stage 3a: Level of confidence
• Choose the level of confidence, expressed as a significance level (often 0.05; sometimes 0.01)
  – 0.05 means accepting a 5% risk of rejecting H0 when the differences are really just chance
  – Equivalently, we ask to be 95% confident in our conclusion

Stage 3b: Degrees of freedom
• We need to find the degrees of freedom
• This is calculated from the number of categories
  ◦ We had 5 categories, so df = 5 - 1 = 4

Stage 3c: Critical value of Chi-Squared
• In order to compare our calculated chi-squared value with the “critical value” in the chi-squared table we need:
  – Level of confidence (0.05)
  – Degrees of freedom (4)
• Our critical value from the table = 9.49

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

Stage 4: Calculate statistics
• We find the differences between the observed and the expected values for each category
• We square each difference, and divide the answer by its expected frequency
• We add all of them up

            Firefox  IExplorer  Safari  Chrome  Opera
  Observed  30       6          4       8       2
  Expected  10       10         10      10      10

$\chi^2 = \sum \frac{(F_O - F_E)^2}{F_E} = \frac{(30-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(4-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(2-10)^2}{10} = 52$

Stage 5: Decision
• Can we reject the H0 that students show no preference for a particular browser?
  – Our value of 52 is way beyond 9.49. We are (at least) 95% confident the value did not occur by chance
  – And probably much more confident than that
• So yes, we can safely reject the null hypothesis
• Which browser do they prefer?
  – Firefox, as it is way above the expected frequency of 10

Alternative Method
• Outline: Calculate chi-squared, and use the table to find the confidence
• In this case, calculated χ² = 52
• Go to the appropriate row of the table, and look across for the highest value that is LOWER than the measured value
• The top of that column gives our confidence that the effect is real

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm
• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result

Chi-Squared: “No Difference from a Comparison Population”
• RQ: Are drivers of high performance cars more likely to be involved in accidents?
  – Sample n = 50 and Market Research data of proportion of people driving these categories

                                    High Performance  Compact  Midsize  Full size
  FO = observed accident frequency  20                14       9        9
  Ownership (%)                     10%               20%      40%      30%

  Contingency Table

  – Null hypothesis H0: type of car has no effect on accident frequency
  – Once the expected frequencies (under the null hypothesis) have been calculated, the analysis is the same as the ‘no preference’ calculation

                                    High Performance  Compact  Midsize  Full size
  FO = observed accident frequency  20                14       9        9
  Ownership (%)                     10%               20%      40%      30%
  FE = expected accident frequency  5 (10% of 50)     10       20       15

Chi-Squared test for “Independence”
• What makes computer games fun?
• Review found the following
  – Factors (Mastery, Challenge and Fantasy)
  – Is there a different opinion depending on gender?
• Research sample of 50 males and 50 females

          Mastery  Challenge  Fantasy
  Male    10       32         8
  Female  24       8          18

  Observed frequency table

What is the research question?
1. A single sample with individuals measured on 2 variables
  – RQ: “Is there a relationship between fun factor and gender?”
  – H0: “There is no such relationship”
2. Two separate samples representing 2 populations (male and female)
  – RQ: “Do male and female players have different preferences for fun factors?”
  – H0: “Male and female players do not have different preferences”

Chi-Squared analysis for “Independence”
• Establish the null hypothesis (previous slide)
• Determine the critical value of chi-squared, which depends on the confidence limit (0.05) and the degrees of freedom
  – df = (Rows - 1) * (Columns - 1) = 1 * 2 = 2 (R = 2, C = 3)

          Mastery  Challenge  Fantasy
  Male    10       32         8
  Female  24       8          18

• Look up in chi-squared table
  – Critical chi-squared value = 5.99

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

Chi-Squared analysis for “Independence”
• Calculate the expected frequencies
  – Add up each column and divide the total by the number of groups (in this case 2)
  – Easier if you have an equal number for each gender (if not, come and see me)

                 Mastery  Challenge  Fantasy  Respondents
  Male (FObs)    10       32         8        50
  Female (FObs)  24       8          18       50
  Cat total      34       40         26
  Male (FExp)    17       20         13
  Female (FExp)  17       20         13

Chi-Squared analysis for “Independence”
• Calculate the statistics using the chi-squared formula
  – Ensure you include both male and female data

$\chi^2 = \frac{(10-17)^2}{17} + \frac{(32-20)^2}{20} + \frac{(24-17)^2}{17} + \frac{(8-20)^2}{20} + \dots = 24.01$

                 Mastery  Challenge  Fantasy
  Male (FObs)    10       32         8
  Female (FObs)  24       8          18
  Male (FExp)    17       20         13
  Female (FExp)  17       20         13

Stage 5: Decision
• Can we reject the null hypothesis?
  – Our value of 24.01 is way beyond 5.99. We are 95% confident the value did not occur by chance
• Conclusion: We are 95% confident that there is a relationship between gender and fun factor
• But what else can we get from this?
  – Significant fun factor for males = Challenge
  – Significant fun factors for females = Mastery and Fantasy

                 Mastery  Challenge  Fantasy
  Male (FObs)    10       32         8
  Female (FObs)  24       8          18
  Male (FExp)    17       20         13
  Female (FExp)  17       20         13

Alternative Method
• Outline: Calculate chi-squared, and use the table to find the confidence
• In this case, calculated χ² = 24.01
• Go to the appropriate row of the table, and look across for the highest value that is LOWER than the measured value
• The top of that column gives our confidence that the effect is real

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm
• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result

Computers
• A computer can be used to calculate the expected values
  – but you have to tell it how
  – Use formulae in Excel
• Then the computer will calculate the p value for you
  – p = probability of seeing a difference at least this large purely by chance (that is, if the null hypothesis were true)
  – There is a nice command in Excel that will do this
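
As a rough illustration of what the computer does in the worked examples, the Python sketch below recomputes the expected frequencies, the chi-squared statistic and a p value for the browser and computer-games data, plus the expected accident frequencies for the car example. It is not the Excel approach mentioned in these slides. The function names (chi_squared, expected_independence, p_value_even_df) are made up for this illustration, and the p value uses a closed-form shortcut that only holds when the degrees of freedom are even, which happens to be true for both examples (df = 4 and df = 2).

```python
import math


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_independence(table):
    """Expected cell = row total * column total / grand total.

    With equal group sizes this matches the slide rule
    'column total divided by the number of groups'.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def p_value_even_df(chi2, df):
    """Upper-tail chi-squared probability; shortcut valid for EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles even, positive df")
    half = chi2 / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Goodness of fit ("no preference"): browser data from the slides.
browsers = [30, 6, 4, 8, 2]
expected_browsers = [sum(browsers) / len(browsers)] * len(browsers)
chi2_browsers = chi_squared(browsers, expected_browsers)
df_browsers = len(browsers) - 1
p_browsers = p_value_even_df(chi2_browsers, df_browsers)

# Comparison population: expected accidents from ownership percentages, n = 50.
cars_expected = [50 * pct / 100 for pct in (10, 20, 40, 30)]

# Independence: rows are Male, Female; columns are Mastery, Challenge, Fantasy.
games = [[10, 32, 8], [24, 8, 18]]
expected_games = expected_independence(games)
chi2_games = sum(chi_squared(o, e) for o, e in zip(games, expected_games))
df_games = (len(games) - 1) * (len(games[0]) - 1)
p_games = p_value_even_df(chi2_games, df_games)

if __name__ == "__main__":
    print(f"Browsers: chi2 = {chi2_browsers:.2f}, df = {df_browsers}, p = {p_browsers:.3g}")
    print(f"Cars: expected frequencies = {cars_expected}")
    print(f"Games: chi2 = {chi2_games:.2f}, df = {df_games}, p = {p_games:.3g}")
```

Run as a script, this should report statistics of about 52 and 24.01, each with a p value well below 0.001, in line with the table lookups for the two examples.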