Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics
Purpose
• To measure discontinuous categorical/binned data in which a number of subjects fall into categories
• We want to compare our observed data to what we expect to see. Is the difference due to chance? Due to association?
• When can we use the Chi-Square Test?
◦ Testing the outcome of Mendelian crosses
◦ Testing independence: is one factor associated with another?
◦ Testing a population for expected proportions
Assumptions:
• 2 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency in each category should be at least 5
Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: $\frac{(O - E)^2}{E}$
4) Find the degrees of freedom: $(c-1)(r-1)$ for a table with c columns and r rows, or (number of categories - 1) for a single row of categories
5) Find the critical chi-square value in the Chi-Square Distribution table
6) If the critical value > your calculated chi-square value, you do not reject your null hypothesis; if your calculated value is larger, you reject it.
Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards do not eat equal amounts of the three ant species.

             Leaf Cutter Ants   Carpenter Ants   Black Ants   Total
Observed     25                 18               17           60
Expected     20                 20               20           60
O-E          5                  -2               -3           0
(O-E)^2/E    1.25               0.2              0.45         $\chi^2 = 1.90$

$\chi^2 = \sum \frac{(O - E)^2}{E}$
Calculate degrees of freedom: number of categories - 1 = 3 - 1 = 2
At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the critical value on a Chi-square distribution table.
Critical value from the table (α = 0.05, df = 2): $\chi^2_{\alpha=0.05} = 5.991$
Critical value: $\chi^2_{\alpha=0.05} = 5.991$
Our calculated value: $\chi^2 = 1.90$
*If the critical value > your calculated value, then you do not reject your null hypothesis: the difference between observed and expected is small enough to be due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.
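The six steps, applied to the horned lizard counts, can be sketched in Python as follows. This is a minimal illustration rather than the SAS workflow used in the tutorial; the function name `goodness_of_fit` and the small lookup table `CRITICAL_0_05` are assumptions introduced here, with critical values copied from a standard chi-square table.

```python
# Goodness-of-fit sketch for Example 1 (horned lizard diet).
# Names below are illustrative, not part of the original tutorial.

# Critical values at alpha = 0.05 from a standard chi-square table.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def goodness_of_fit(observed_counts, expected_proportions):
    """Return (chi-square statistic, degrees of freedom)."""
    total = sum(observed_counts)
    # Step 2: expected frequency = total count * hypothesised proportion.
    expected = [total * p for p in expected_proportions]
    # Step 3: sum (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    # Step 4: one row of categories, so df = number of categories - 1.
    df = len(observed_counts) - 1
    return statistic, df


# Leaf cutter, carpenter and black ants, in that order.
counts = [25, 18, 17]
# H0: equal amounts of each species, so each expected proportion is 1/3.
statistic, df = goodness_of_fit(counts, [1 / 3] * 3)

# Steps 5 and 6: compare against the tabulated critical value.
critical = CRITICAL_0_05[df]
reject_null = statistic > critical

print(f"chi-square = {statistic:.2f}, df = {df}, critical value = {critical}")
print("Reject H0" if reject_null else "Do not reject H0")
```

Keeping the critical values in a small dictionary mirrors the table lookup on the slide; a statistics library could compute them for any degrees of freedom.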
SAS: Example 1
[Screenshot of SAS code with three annotations: one statement is included to format the table, one block defines your data, and one indicates what you want in your output]
SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of
probabilities for the table having a test statistic greater
than or equal to the value of the observed test statistic.”
High p-value: If the null hypothesis is true, there is a high probability of a test statistic ≥ the observed test statistic. Do not reject null hypothesis.
Low p-value: If the null hypothesis is true, there is a low probability of a test statistic ≥ the observed test statistic. Reject null hypothesis.
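To make the upper-tail idea concrete, here is a small Python sketch that turns a chi-square value into a p-value. It assumes the usual asymptotic chi-square distribution (not the exact test quoted above) and handles only 1 and 2 degrees of freedom, where the tail probability has a simple closed form. The function name `chi_square_p_value` is hypothetical.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(df).

    Only df = 1 and df = 2 are supported in this sketch, because those
    cases have simple closed forms.
    """
    if df == 1:
        # A chi-square variable with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Example 1: calculated chi-square of 1.90 with 2 degrees of freedom.
p_example_1 = chi_square_p_value(1.90, 2)
print(f"p = {p_example_1:.4f}")

# At the tabulated critical values the p-value is about 0.05.
p_at_critical_df2 = chi_square_p_value(5.991, 2)
p_at_critical_df1 = chi_square_p_value(3.841, 1)
```

For Example 1 this gives a p-value of roughly 0.39, well above 0.05, which agrees with the decision not to reject the null hypothesis.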
SAS: Example 1
High probability that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.
Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
cellchi2 = displays how much each cell contributes to the overall chi-squared value
nocol = do not display column percentages
norow = do not display row percentages
chisq = display chi square statistics
Example 2: More SAS Examples
Degrees of freedom: (2-1)(3-1) = 1*2 = 2
High probability that Chi-Square statistic > our calculated chi-square statistic (78.25%). We do not reject our null hypothesis.
If there was an association, we can check which interactions describe the association by looking at how much each cell contributes to the overall Chi-square value.
Limitations
• No expected frequency should be less than 1
• No more than 1/5 of the expected frequencies should be less than 5
◦ To correct for this, collect larger samples or combine your data for the smaller expected categories until their combined value is 5 or more
• Yates Correction*
◦ When there is only 1 degree of freedom, the regular chi-square test should not be used
◦ Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values
What do these mean?
• Likelihood Ratio Chi-Square
• Continuity-Adjusted Chi-Square Test
• Mantel-Haenszel Chi-Square Test
◦ $Q_{MH} = (n-1) r^2$
◦ $n$ is the total sample size and $r$ is the Pearson correlation coefficient between the row and column variables (which also measures their linear association); $r^2$ is its square
◦ http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
◦ Tests the alternative hypothesis that there is a linear association between the row and column variable
◦ Follows a Chi-square distribution with 1 degree of freedom
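As a rough illustration of the Mantel-Haenszel formula, the Python sketch below computes $(n-1) r^2$ from a table of counts. It assumes rows and columns are scored 0, 1, 2, ... in table order (the correlation does not change if the scores start at 1 instead), and it borrows the 2 x 2 heart disease counts used later in this tutorial. The function name `mantel_haenszel_chi_square` is hypothetical.

```python
def mantel_haenszel_chi_square(table):
    """Return (n - 1) * r^2, with rows and columns scored 0, 1, 2, ..."""
    n = sum(sum(row) for row in table)
    # Count-weighted means of the row scores and the column scores.
    mean_r = sum(i * c for i, row in enumerate(table) for c in row) / n
    mean_c = sum(j * c for row in table for j, c in enumerate(row)) / n
    cov = var_r = var_c = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            cov += count * (i - mean_r) * (j - mean_c)
            var_r += count * (i - mean_r) ** 2
            var_c += count * (j - mean_c) ** 2
    # Squared Pearson correlation between row score and column score.
    r_squared = cov ** 2 / (var_r * var_c)
    return (n - 1) * r_squared


# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
heart_table = [[15, 7], [8, 10]]
q_mh = mantel_haenszel_chi_square(heart_table)
print(f"Q_MH = {q_mh:.2f}")
```

For a 2 x 2 table the result is the ordinary (uncorrected) chi-square value multiplied by (n-1)/n, so the two statistics are close whenever n is not tiny.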
• Phi Coefficient
• Contingency Coefficient
• Cramer’s V
Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.

                    High Cholesterol   Low Cholesterol   Total
Heart Disease       15                 7                 22
  Expected          12.65              9.35              22
  Chi-Square        0.44               0.59              1.03
No Heart Disease    8                  10                18
  Expected          10.35              7.65              18
  Chi-Square        0.53               0.72              1.25
TOTAL               23                 17                40
Chi-Square Total                                         2.28

Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1
We need to use the YATES CORRECTION
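The expected values in this table come from the row and column totals: each expected frequency is (row total × column total) / grand total. The Python sketch below shows that calculation and the uncorrected chi-square value; the function names are illustrative and not taken from the tutorial.

```python
def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Return (uncorrected chi-square statistic, degrees of freedom)."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
heart_table = [[15, 7], [8, 10]]
expected = expected_frequencies(heart_table)
statistic, df = pearson_chi_square(heart_table)

print(expected)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

Because the table has only 1 degree of freedom, this uncorrected value is the starting point for the Yates correction rather than the final answer.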
With the Yates correction, the chi-square value of each cell is $\frac{(|O - E| - 0.5)^2}{E}$. For example, for heart disease with high cholesterol:
$\frac{(|15 - 12.65| - 0.5)^2}{12.65} = 0.27$
Critical value from the table (α = 0.05, df = 1): $\chi^2_{\alpha=0.05} = 3.841$
Yates-corrected table:

                    High Cholesterol   Low Cholesterol   Total
Heart Disease       15                 7                 22
  Expected          12.65              9.35              22
  Chi-Square        0.27               0.37              0.64
No Heart Disease    8                  10                18
  Expected          10.35              7.65              18
  Chi-Square        0.33               0.45              0.78
TOTAL               23                 17                40
Chi-Square Total                                         1.42

3.841 > 1.42 ∴ We do not reject our null hypothesis.
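The Yates-corrected total can be checked with a few lines of Python. This sketch takes the observed and expected counts straight from the table; the function name `yates_chi_square` is an assumption made for illustration. Summing the unrounded cell values gives about 1.41, while the 1.42 on the slide comes from adding the cell values after rounding each to two decimals. The conclusion is the same either way.

```python
def yates_chi_square(observed, expected):
    """Sum of (|O - E| - 0.5)^2 / E over all cells (Yates continuity correction)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Cells in reading order: heart disease (high, low), no heart disease (high, low).
observed = [15, 7, 8, 10]
expected = [12.65, 9.35, 10.35, 7.65]

corrected = yates_chi_square(observed, expected)
critical = 3.841  # alpha = 0.05, df = 1, from a standard chi-square table
reject_null = corrected > critical

print(f"Yates-corrected chi-square = {corrected:.2f}")
print("Reject H0" if reject_null else "Do not reject H0")
```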
Fisher’s Exact Test
• Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = Likely negative association.
• Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = Likely positive association.
• Two-Tail: Use this when there is no prior alternative.
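The three p-values can be sketched in Python for a 2 x 2 table. The sketch assumes the standard construction: with the row and column totals fixed, the upper-left count follows a hypergeometric distribution. The left and right p-values are its lower and upper tails, and the two-tailed p-value sums every table that is no more probable than the observed one. The function name `fisher_exact_2x2` is hypothetical, and the data are the heart disease counts from the Yates example.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Left, right and two-tailed p-values for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)   # smallest possible upper-left count
    highest = min(row1, col1)          # largest possible upper-left count
    denom = comb(n, col1)
    # Probability of each possible upper-left count x with the margins fixed.
    prob = {
        x: comb(row1, x) * comb(n - row1, col1 - x) / denom
        for x in range(lowest, highest + 1)
    }
    p_obs = prob[a]
    left = sum(p for x, p in prob.items() if x <= a)
    right = sum(p for x, p in prob.items() if x >= a)
    # Small relative tolerance so ties with p_obs survive rounding error.
    two_tail = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-9))
    return left, right, two_tail


# Heart disease: 15 high, 7 low cholesterol; no heart disease: 8 high, 10 low.
left, right, two_tail = fisher_exact_2x2(15, 7, 8, 10)
print(f"left = {left:.4f}, right = {right:.4f}, two-tail = {two_tail:.4f}")
```

Since the alternative hypothesis here is that heart disease is more likely with high cholesterol (a positive association in this layout), the right-sided p-value is the one that matches it.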
Conclusion
• The Chi-square test is important in testing the association between variables and/or checking if one’s expected proportions meet the reality of one’s experiment
• There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
• We can use SAS to conduct Chi-square tests on our data by using the proc freq procedure
References
Chi-Square Test Descriptions:
http://www.enviroliteracy.org/pdf/materials/1210.pdf
http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
SAS Support website, “FREQ procedure”: http://www.sas.com/index.html
YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k