Important Terms

1. Qualitative Variable: Data that expresses observations which cannot be measured numerically. For example, colour, gender, preference for something. Also known as categorical variables.
2. Quantitative Variable: Data expressing a certain quantity, amount or range. For example, height (meters), speed (km/h).
3. Statistically Significant: A result is statistically significant when it is unlikely to be explained by random chance alone, so the relationship between two or more variables is likely caused by something other than chance.
4. Null Hypothesis: The hypothesis that there is no real difference or relationship between the variables being studied. Any observed difference is due to sampling or experimental error, and the variables are independent of each other.
5. P-Value: The P-Value, or calculated probability, is the probability of finding the observed result, or a more extreme one, when the null hypothesis is true.
6. Critical Region: The set of outcomes of a statistical test for which the null hypothesis is to be rejected.
7. Significance Level: The probability of rejecting the null hypothesis when it is true, usually set at 0.05 (5%). A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference, which is a low risk. Put another way, at a significance level of 0.05 and with the null hypothesis true, we would expect a sample result to fall in the critical region 5% of the time. If the P-Value is lower than the Significance Level, the relationship is significant. If the P-Value is greater than the Significance Level, the relationship is not significant.
8. Degrees of Freedom: The number of independent values that are free to vary in a calculation. For a table with R rows and C columns, DF = (R-1) x (C-1).
9. Chi-Squared Distribution: The distribution of a sum of the squares of K (any number of) independent standard normal random variables. Standard normal variables are distributed in the shape of a bell curve, with mean 0 and standard deviation 1. Squaring K of them and adding the squares gives a value that follows the chi-squared distribution with K degrees of freedom.

The Chi Square

The Chi Square Test is used to test relationships between categorical variables. It measures the divergence between observed data and expected data.

- Chi Squared Goodness-of-Fit Test: Tests how well a sample of categorical data fits a theoretical distribution.
- Chi Square Test of Association: Determines whether one variable is associated with a different variable. Eg: Whether the sale of different colours of cars depends on the city they are sold in.
- Chi Square Test of Independence: Determines whether the observed value of one variable depends on the observed value of a different variable. Eg: Whether the candidate a person votes for is related to their gender.

Use of the Chi Square

The test is used when you have two categorical variables (for example age group, gender, ethnicity). It is used to determine whether there is a significant relationship between the two variables. This method should be used when the sampling method was random, the variables are categorical (qualitative data) and the expected frequency is at least 5 in each cell of the table.

For example, in an election survey, voters may be classified by gender (male or female) and voting preference (democrat, republican or independent). We could use a chi squared test to determine whether there is a relationship between gender and voting preference.

How it is used

1. State the Hypothesis.
   H0: Variable A and Variable B are independent (not related).
   H1: Variable A and Variable B are dependent (related).
2. Analysis Plan. Use sample data to reject, or fail to reject, the null hypothesis (no relationship). One must choose a significance level (between 0 and 1); usually 0.01, 0.05 or 0.10 is used. This means a 1%, 5% or 10% chance of rejecting the null hypothesis when it is true.
3. Calculate Degrees of Freedom. For a goodness-of-fit test, this is the number of categories minus 1. For a table of two variables, DF = (r-1) x (c-1), where r is the number of rows and c is the number of columns. Or, easier way is the number of
4. Calculate Expected Frequencies. For every cell in the table, the expected frequency under the null hypothesis is E = (row total x column total) / grand total.
5. Calculate X². Add up (O - E)² / E over every cell in the table, where O is the observed frequency and E is the expected frequency.
6. Interpret Results. If the P-Value is less than the significance level, we reject the null hypothesis (H0) and conclude there is a relationship between the two variables. If the P-Value is greater than the significance level, we fail to reject the null hypothesis: the data give no evidence of a relationship.

For example, take the gender and voting preference survey with X² = 16.2. The table has 2 rows and 3 columns, so DF = (2-1) x (3-1) = 2. The P-Value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 16.2. If we use a chi-square distribution calculator, the P-Value for an X² value of 16.2 is 0.0003, or 0.03%. Since the P-Value of 0.0003 is lower than the significance level of 0.05, we reject the null hypothesis and conclude that there is a relationship between gender and voting preference. If there were really no relationship between the 2 variables, a result at least this extreme would turn up only about 0.03% of the time.
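The calculation steps above (expected frequencies, X², degrees of freedom, P-Value) can be sketched in Python. This is only an illustration under stated assumptions: the counts in `observed` are hypothetical, picked so that the statistic lands near the 16.2 quoted above, and are not data from any real survey. The function names `chi_square_test` and `chi2_upper_tail` are made up for this sketch, and the tail probability is computed with a hand-written series so that only the standard library is needed.

```python
import math


def chi2_upper_tail(x, dof):
    """P(X > x) for a chi-square variable with `dof` degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for the small tables sketched here.
    """
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(observed):
    """Return (statistic, dof, p_value) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof, chi2_upper_tail(statistic, dof)


# Hypothetical counts (assumption, for illustration only).
# Rows: male, female. Columns: democrat, republican, independent.
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

statistic, dof, p_value = chi_square_test(observed)
significance_level = 0.05

print(f"X2 = {statistic:.1f}, DF = {dof}, P-Value = {p_value:.4f}")
if p_value < significance_level:
    print("Reject H0: the variables appear to be related.")
else:
    print("Fail to reject H0: no evidence of a relationship.")
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same test and is the more usual choice in practice. The hand-rolled version is here only to make each step visible.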