Important Terms
1. Qualitative Variable: Data that expresses observations which cannot be measured
numerically. For example, colour, gender, preference for something. Also known as
categorical variables.
2. Quantitative Variable: Data expressing a certain quantity, amount or range. For
example, height (meters), speed (km/h).
3. Statistically Significant: A relationship between two or more variables is
statistically significant when it is unlikely to have arisen from random chance alone.
4. Null Hypothesis: The hypothesis that there is no significant difference between
samples of specific variables. Under it, any observed difference is due to sampling or
experimental error, and the variables are independent of each other (no relationship).
5. P-Value: The P-Value, or calculated probability, is the probability of finding
results at least as extreme as those observed, when the null hypothesis is true.
6. Critical Region: The set of outcomes of a statistical test for which the null hypothesis
is to be rejected.
7. Significance Level: The probability of rejecting the null hypothesis when it is true,
usually set at 0.05 (5%). A significance level of 0.05 indicates a 5% risk of concluding
a difference exists when there is no actual difference, which is a low risk. Put another
way, when the null hypothesis is true we'd expect the test statistic to fall in the
critical region 5% of the time. If the P-Value is lower than the Significance Level, the
relationship is significant. If the P-Value is greater than the Significance Level, the
relationship is not significant.
8. Degrees of Freedom: The number of independent values or quantities that are free to
vary in a statistical calculation. For a table with R rows and C columns,
DF = (R-1) x (C-1).
9. Chi-Squared Distribution: The distribution of a sum of the squares of K (any number
of) independent standard normal random variables. Standard normal variables are
distributed in the shape of a bell curve with mean 0 and standard deviation 1; the sum
of their squares follows a chi-squared distribution with K degrees of freedom.
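As an informal check of item 9, the sketch below simulates sums of K squared standard normal values using only the Python standard library. The names `simulate_chi_squared`, `n_draws` and `seed` are illustrative choices, not part of the original notes. A known property of the chi-squared distribution is that its mean equals K, so the simulated mean should land close to K.

```python
import random


def simulate_chi_squared(k, n_draws=20000, seed=0):
    """Draw n_draws values, each the sum of k squared standard normal variables."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]


samples = simulate_chi_squared(k=3)
sample_mean = sum(samples) / len(samples)
print(f"mean of simulated values: {sample_mean:.2f} (theory: 3)")
```

Changing `k` shifts the whole distribution: a larger K gives a larger mean and a wider spread.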
The Chi Square
The Chi Square Test is used to test relationships between categorical variables. It
measures the divergence between observed data and expected data.
- Chi Squared Goodness-of-Fit Test: Tests how well a sample of categorical data fits a
theoretical distribution.
- Chi Square Test of Association: Determines whether one variable is associated with a
different variable. E.g. whether the sale of different colours of cars depends on the
city in which they are sold.
- Chi Square Test of Independence: Determines whether the observed value of one
variable depends on the observed value of a different variable. E.g. whether the
candidate a person votes for is related to their gender.
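To make the goodness-of-fit idea concrete, here is a minimal sketch that compares observed category counts with the counts a theoretical distribution would predict. The die-roll counts are made up for illustration, and the function name `goodness_of_fit_statistic` is hypothetical. It uses the usual statistic, the sum of (observed - expected)^2 / expected, with degrees of freedom equal to the number of categories minus 1.

```python
def goodness_of_fit_statistic(observed, expected_proportions):
    """Return (chi-square statistic, degrees of freedom) for one categorical variable."""
    total = sum(observed)
    # Expected count per category under the theoretical distribution.
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


# Hypothetical data: 120 rolls of a die, tested against a fair die (1/6 per face).
die_rolls = [18, 22, 20, 25, 15, 20]
gof_statistic, gof_df = goodness_of_fit_statistic(die_rolls, [1 / 6] * 6)
print(f"statistic = {gof_statistic:.2f}, df = {gof_df}")
```

A small statistic means the observed counts sit close to the expected ones; a large statistic means they diverge.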
Use of the Chi Square
The test is used when you have two categorical variables (e.g. age group, gender,
ethnicity). It is used to determine whether there is a significant relationship between
the two variables. This method should be used when the sampling method is random, the
variables are categorical (qualitative data) and the expected frequency is at least 5 in
each cell of the table. For example, in an election survey, voters may be classified by
gender (male or female) and voting preference (democrat, republican or independent). We
could use a chi squared test to determine whether there is a relationship between gender
and voting preference.
How it is used
1. State the Hypothesis.
H0: Variable A and Variable B are independent (not related).
H1: Variable A and Variable B are dependent (related).
2. Analysis Plan.
Decide how to use sample data to reject or fail to reject the null hypothesis (no
relationship). One must choose a significance level (between 0 and 1); usually 0.01,
0.05 or 0.10 is used. This means a 1%, 5% or 10% chance of rejecting the null
hypothesis when it is true.
3. Calculate Degrees of Freedom
For each variable, take the number of categories minus 1, then multiply:
DF = (r-1) x (c-1), where r is the number of rows and c the number of columns.
Or, easier way is the number of
4. Calculate Expected Frequencies
For every cell in the table: E = (row total x column total) / grand total.
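5. Calculate the chi-square statistic: X2 = sum over all cells of (O - E)^2 / E,
where O is the observed frequency and E the expected frequency.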
6. Interpret Results. If the P-Value is less than the significance level, we reject the
null hypothesis (H0) and conclude there is a relationship between the two variables. If
the P-Value is greater than the significance level, we fail to reject the null
hypothesis: the data do not show evidence of a relationship.
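Steps 3 to 5 can be sketched in a few lines of plain Python. The survey counts below are an assumption: they are illustrative numbers chosen so that the statistic comes out near the 16.2 quoted in these notes, not data taken from the original survey. The function name `chi_square_independence` is likewise hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Step 3: DF = (r - 1) x (c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Step 4: expected count = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 5: sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, df, expected


# Assumed counts. Rows: male, female. Columns: republican, democrat, independent.
survey = [
    [200, 150, 50],
    [250, 300, 50],
]
statistic, df, expected = chi_square_independence(survey)
smallest_expected = min(min(row) for row in expected)
print(f"X2 = {statistic:.1f}, df = {df}, smallest expected count = {smallest_expected}")
```

The smallest expected count is printed because the test is only appropriate when every cell has an expected frequency of at least 5.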
If we use a chi-square distribution calculator, the P-Value for an X2 value of 16.2 with
2 degrees of freedom is 0.0003. This P-Value is the probability that a chi-square
statistic having 2 degrees of freedom is more extreme than 16.2. Since the P-Value,
0.0003, is lower than the significance level of 0.05, we reject the null hypothesis and
conclude that there is a relationship between gender and voting preference. If the two
variables were in fact independent, a result at least this extreme would occur only
about 0.03% of the time.
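The calculator step can be reproduced by hand when the degrees of freedom are even, because the upper-tail probability then has a closed form: exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to DF/2 - 1. The sketch below relies on that special case only; the function name `chi_square_p_value_even_df` is hypothetical, and for odd degrees of freedom a statistics library (for example SciPy's `chi2.sf`) would be the usual route.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability of a chi-square statistic, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive, even df")
    half = statistic / 2.0
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * partial_sum


p_value = chi_square_p_value_even_df(16.2, 2)
significance_level = 0.05
reject_null = p_value < significance_level
print(f"P-Value = {p_value:.4f}, reject H0: {reject_null}")
```

With 2 degrees of freedom the formula reduces to exp(-16.2 / 2), which rounds to the 0.0003 quoted above.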