Chi-Square
χ²
Review: the “null” hypothesis
• Inferential statistics are used to test hypotheses
• Whenever we use inferential statistics the “null hypothesis” applies
  – Null hypothesis: There is no relationship between variables. Any apparent effect was produced by chance
  – To reject the null, the test statistic (e.g., R², t, b, χ²) must be so large that a value at least that large would turn up less than five times in one hundred (p < .05) if the null were true
• How do we decide whether to reject the null?
  – Compare the test statistic to a table
  – “Probability” or p means the chance of getting a test statistic at least this large if the null hypothesis is true
  – In a study, look for asterisks in the statistic’s column. If there is no asterisk, the null for that relationship cannot be rejected.
  – Usually one asterisk (*) means p < .05: fewer than 5 chances in 100 that the result arose by chance alone. Two asterisks (**) is better (p < .01, fewer than one in 100). Three (***) is great (p < .001, fewer than one in 1,000).
[Figure: scale running from “Null hypothesis is true” to “Reject null hypothesis”]
Chi-Square (χ²)
Hypothesis: Gender → Court disposition
• A test statistic, used to test hypotheses
• Tests for relationship between two categorical variables (nominal or ordinal)
• Yields a coefficient that can be looked up in a table
  – The larger the coefficient, the less likely it is that the result arose by chance under the null hypothesis
• Evaluates difference between Observed and Expected cell frequencies:
  – “Observed” means the actual data
  – “Expected” means what we would expect if there was no relationship between the variables
  – If there is no difference between observed and expected frequencies, χ² is zero and there is no evidence against the null hypothesis
  – The greater the difference, the larger the value of χ², and the smaller the probability of seeing such a difference if the null hypothesis is true
• We will always place the values of the IV in rows, and of the DV in columns. It can be done the other way, and does not affect computing χ².
Court disposition (observed)
Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

Court disposition (expected)
Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150
Building the “expected” table
Hypothesis: Gender → Court disposition

“Observed” table - the actual data

Court disposition (observed)
Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

“Expected” table - “expected” frequencies if the null hypothesis of no relationship between variables is true. Create a new table from scratch:
1. Bring over the “marginals” - all the totals
2. Fill in each cell, one at a time: divide its row total by the grand total, then multiply by its column total

Male/Jail: 100/150 × 114 = 76
Male/Released: 100/150 × 36 = 24
Female/Jail: 50/150 × 114 = 38
Female/Released: 50/150 × 36 = 12

Court disposition (expected)
Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150
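The fill-in rule (row total divided by grand total, times column total) can be sketched in a few lines of Python. This is only an illustration: the function name `expected_table` and the list-of-rows layout are assumptions made for the example, not part of the original slides.

```python
def expected_table(observed):
    """Expected frequencies if the null hypothesis of no relationship is true.

    Each cell is (row total / grand total) * column total. The product is
    computed before the division, which is algebraically the same but
    avoids floating-point rounding noise.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Jail, Released.
observed = [[84, 16], [30, 20]]
expected = expected_table(observed)

for label, row in zip(["Male", "Female"], expected):
    print(label, row)
```

Only the marginals enter the calculation, which is why the expected table can be built as soon as the totals have been brought over.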
Demonstrating the meaning of “expected”
Court disposition (expected freqs.)
Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150

Court disposition (expected pcts.)
Gender    Jail   Released   Total
Male       76%        24%    100%
Female     76%        24%    100%
Checking the expected frequencies table by converting it into percentages
In an expected table, as the value of the independent variable changes,
the distribution across the dependent variable should remain the same
In this example, as we switch the value of independent variable gender,
the distribution across dependent variable court disposition doesn’t change
A properly done expected table will always show no relationship -- it’s the null hypothesis!
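The percentage check can be done mechanically. The sketch below (the helper name `row_percentages` is hypothetical) converts each row to percentages of its own total; in the expected table every row comes out the same, while in the observed table the rows differ.

```python
def row_percentages(table):
    """Convert each row of a frequency table into percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Rows: Male, Female. Columns: Jail, Released.
expected = [[76, 24], [38, 12]]
observed = [[84, 16], [30, 20]]

expected_pcts = row_percentages(expected)
observed_pcts = row_percentages(observed)

print("Expected:", expected_pcts)  # identical rows: no relationship
print("Observed:", observed_pcts)  # rows differ: the distribution shifts with gender
```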
Comparing the observed and expected tables:
the meaning of Chi-Square (χ²)
• The observed table is the data, as we find it
• The expected table is purposely built to demonstrate no relationship between variables. It “is” the null hypothesis.
• To determine whether the observed table demonstrates a relationship between variables, we compare its cell frequencies to those in the “expected” table
  – The less similar the tables, the stronger the evidence for the working hypothesis and against the null hypothesis
• χ² is a ratio that reflects the dissimilarity in cell frequencies. The more dissimilar, the larger the χ².
  O = observed (actual) frequency, E = expected frequency (if null hypothesis is true)
  $$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
• More formally, χ² is the ratio of systematic variation to chance variation. The larger the ratio, the more likely that we can reject the null hypothesis.
• Chi-square is not always a good measure because its accuracy is closely tied to sample size.
  – Over-estimate significance with large samples, under-estimate with small samples
  – Ideal sample size is around 150, with no cells less than 5
Computing χ²
Always pair up the corresponding cells and divide by the expected frequency

Observed frequencies: Court disposition
Gender    Jail   Released   Total
Male        84         16     100
Female      30         20      50
Total      114         36   n = 150

Expected frequencies: Court disposition
Gender    Jail   Released   Total
Male        76         24     100
Female      38         12      50
Total      114         36   n = 150

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(84-76)^2}{76} + \frac{(16-24)^2}{24} + \frac{(30-38)^2}{38} + \frac{(20-12)^2}{12} = 10.5 $$
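The cell-by-cell sum is easy to check in code. This is a minimal sketch; the function name `chi_square` is an assumption for the example, and the tables are entered as lists of rows.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every pair of corresponding cells."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Rows: Male, Female. Columns: Jail, Released.
observed = [[84, 16], [30, 20]]
expected = [[76, 24], [38, 12]]

chi2 = chi_square(observed, expected)
print(round(chi2, 1))
```

Every cell here differs from its expected value by 8, so each term is 64 divided by that cell's expected frequency. The cells with the smallest expected counts contribute the most.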
Assessing the significance of χ²
• To reject the null hypothesis a test statistic, such as χ², must be of sufficient magnitude. The larger the better!
• df = (rows minus 1) × (columns minus 1) = (r - 1) × (c - 1) = (2 – 1) × (2 – 1) = 1
• In social science research we reject the null hypothesis when there are fewer than five chances in 100 (p < .05) that a result this large would arise by chance. Our chi-square is larger than what we need: 10.5 exceeds 6.635, the value required at the .01 level, so there is less than one chance in a hundred (p < .01).
• Our observed data has proven so different from what would be expected if there was no relationship between variables that we can reject the null hypothesis of no relationship. We thus find support for the working hypothesis that gender affects disposition. There is less than one chance in a hundred that a difference this large is due to chance alone.
χ² = 10.5
[Figure: χ² = 10.5 placed on a scale running from “Null hypothesis is true” to “Reject null hypothesis”]
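The table lookup can be mirrored in a short script. The critical values below are the standard chi-square table entries for one degree of freedom. The helper names are hypothetical, and the closed-form p-value is a cross-check added for this example, not the slides' method. It is valid only when df = 1.

```python
import math

# Critical values of chi-square for df = 1, as listed in standard tables.
CRITICAL_DF1 = {0.05: 3.841, 0.01: 6.635, 0.001: 10.828}


def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 df (df = 1 only)."""
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = 10.5
df = degrees_of_freedom(2, 2)

# Probability levels whose critical value our chi-square reaches.
levels_passed = [level for level, crit in CRITICAL_DF1.items() if chi2 >= crit]

print("df =", df)
print("levels passed:", levels_passed)
print("p =", p_value_df1(chi2))
```

The statistic clears the .05 and .01 cutoffs but falls just short of the 10.828 needed at the .001 level.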
Class exercise
Hypothesis: More building alarms → Less crime
• Randomly sampled 120 businesses with alarms
  • 50 had crimes, 70 didn’t
• Randomly sampled 90 businesses without alarms
  • 50 had crimes, 40 didn’t
• Build the observed and expected tables
  – Remember, they’re tables, so place the values of the independent variable in rows
• Compute χ²
  $$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
• Use the table to decide whether the null hypothesis can be rejected
  df = (r - 1) × (c - 1)
• Convey your findings using simple words. What does the data show about building alarms and crime? How certain are you of your conclusions?
Observed (obtained) frequencies: Crime
Alarm      Y      N   Total
Y         50     70     120
N         50     40      90
Total    100    110     210

Expected (by chance) frequencies: Crime
Alarm      Y      N   Total
Y         57     63     120
N         43     47      90
Total    100    110     210

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = 3.82 $$

χ² = 3.82
df = (r - 1) × (c - 1) = (2 – 1) × (2 – 1) = 1
• To reject the null hypothesis at the .05 level we need a χ² of 3.841 or greater
• Our chi-square is smaller, so the probability of a result this large arising by chance exceeds the maximum of five in one hundred (it defaults to the next level, .10, or ten chances in one hundred)
• So we cannot reject the null hypothesis – there is NO significant relationship between crime and alarms
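The whole exercise can be reproduced end to end. The sketch below follows the slides' convention of rounding expected frequencies to whole numbers before computing χ². The helper names and the `round_cells` flag are assumptions made for this example. Because the result sits close to the 3.841 cutoff, the rounding matters: with unrounded expected frequencies the statistic comes out near 3.98.

```python
def expected_table(observed, round_cells=False):
    """Expected frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    table = [[r * c / n for c in col_totals] for r in row_totals]
    if round_cells:
        # Slides' convention: whole-number expected frequencies.
        table = [[round(cell) for cell in row] for row in table]
    return table


def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Rows: alarm Y, alarm N. Columns: crime Y, crime N.
observed = [[50, 70], [50, 40]]

rounded = expected_table(observed, round_cells=True)
chi2_rounded = chi_square(observed, rounded)
chi2_exact = chi_square(observed, expected_table(observed))

CRITICAL_05_DF1 = 3.841  # standard table value for df = 1 at the .05 level
print(rounded)
print(round(chi2_rounded, 2), chi2_rounded >= CRITICAL_05_DF1)
print(round(chi2_exact, 2))
```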
Parking lot exercise
1. Graph the distribution of car values for each parking lot
2. Fill in the frequency and percentage tables
3. Use the frequency (not percentage!) table to create a “frequencies expected” table (meaning, expected if the null hypothesis of no relationship is correct)
   Frequencies observed → Frequencies expected: (row marginal / total cases) × column marginal
   Example: (10 / 20) × 6 = 3
4. Compute χ²: Cell by corresponding cell, subtract EXPECTED from OBSERVED. Square each difference. Divide each result by EXPECTED. Then total them up.
5. Check the table. Begin with the largest probability level that allows you to reject the null hypothesis, .05. Is the Chi-square at least that large? If not, the null hypothesis cannot be rejected.
• The greatest risk we can take of wrongly rejecting the null hypothesis is five in one hundred (.05)
• Our Chi-square, 8.66, is greater than 7.815, the required minimum
• We can thus reject the NULL hypothesis and accept the WORKING hypothesis that higher income persons drive more expensive cars, with only five chances in 100 of being wrong.
• Larger Chi-squares could have reduced that risk to two in one hundred (.02), one in one hundred (.01), or even one in one thousand (.001)
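Step 5 amounts to walking down one row of the chi-square table. The required minimum of 7.815 is the .05 critical value for three degrees of freedom, so the sketch below assumes df = 3 (the slides do not state it). The remaining cutoffs are the standard table values for that row, and the function name `strictest_level` is hypothetical.

```python
# Critical values of chi-square for df = 3 (assumed), from standard tables.
CRITICAL_DF3 = [(0.05, 7.815), (0.02, 9.837), (0.01, 11.345), (0.001, 16.266)]


def strictest_level(chi2, critical_values):
    """Return the smallest probability level whose critical value chi2 reaches.

    Returns None when chi2 falls short of every cutoff, in which case the
    null hypothesis cannot be rejected.
    """
    reached = [level for level, crit in critical_values if chi2 >= crit]
    return min(reached) if reached else None


print(strictest_level(8.66, CRITICAL_DF3))
```

A chi-square of 8.66 reaches the .05 cutoff but not the 9.837 needed at .02, which matches the conclusion on the slide.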
Homework
Homework exercise
Hypothesis: Sergeants have more stress than patrol officers
Job Stress
Position on police force    Low   High   Total
Sergeant                     30     60      90
Patrol Officer               86     24     110
Total                       116     84     200
Source: Fitzgerald & Cox, Research Methods in Criminal Justice, p. 165
1. Calculate expected cell frequencies (null hypothesis of no relationship is true)
2. Compute Chi-square
3. Use table in Appendix E to determine your chi-square’s probability level
4. Can we reject the null hypothesis?
Homework answer
Observed
Job Stress
Position on police force    Low   High   Total
Sergeant                     30     60      90
Patrol Officer               86     24     110
Total                       116     84     200
Source: Fitzgerald & Cox, Research Methods in Criminal Justice, p. 165

Expected
Job Stress
Position on police force    Low   High   Total
Sergeant                     52     38      90
Patrol Officer               64     46     110
Total                       116     84     200

$$ \chi^2 = \frac{(30-52)^2}{52} + \frac{(60-38)^2}{38} + \frac{(86-64)^2}{64} + \frac{(24-46)^2}{46} = 40.1 $$

χ² = 40.1
df = (r - 1) × (c - 1) = (2 – 1) × (2 – 1) = 1
To reject at the .05 level we need χ² = 3.841 or greater
Reject null hypothesis – Less than 1 chance in 1,000 that relationship is due to chance
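For anyone checking the answer by hand, the intermediate arithmetic can be written out as follows. The subscript labels are shorthand introduced here, and the expected frequencies are rounded to whole numbers, as the answer's table does.

$$
\begin{aligned}
E_{\text{Sgt, Low}} &= \tfrac{90}{200} \times 116 = 52.2 \approx 52 &
E_{\text{Sgt, High}} &= \tfrac{90}{200} \times 84 = 37.8 \approx 38 \\
E_{\text{Patrol, Low}} &= \tfrac{110}{200} \times 116 = 63.8 \approx 64 &
E_{\text{Patrol, High}} &= \tfrac{110}{200} \times 84 = 46.2 \approx 46 \\
\chi^2 &= \tfrac{484}{52} + \tfrac{484}{38} + \tfrac{484}{64} + \tfrac{484}{46} \\
&\approx 9.31 + 12.74 + 7.56 + 10.52 \approx 40.1
\end{aligned}
$$

Every observed cell is 22 away from its expected value, so each numerator is 22² = 484. The total is far above 10.828, the table value for df = 1 at the .001 level.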
Practice for the final
• You will test a hypothesis using two categorical variables and determine whether the independent variable has a statistically significant effect.
• You will be asked to state the null hypothesis.
• You will use supplied data to create an Observed frequencies table. You will use it to create an Expected frequencies table. You will be given a formula but should know the procedure.
• You will compute the Chi-Square statistic and degrees of freedom. You will be given formulas but should know the procedures by heart.
• You will use the Chi-Square table to determine whether the results support the working hypothesis.
  – Print and bring to class: http://www.sagepub.com/fitzgerald/study/materials/appendices/app_e.pdf
• Sample question: Hypothesis is that alarm systems prevent burglary. Random sample of 120 businesses with an alarm system and 90 without. Fifty businesses of each kind were burglarized.
  – Null hypothesis: No significant difference in crime between businesses with and without alarms

Observed frequencies: Crime
Alarm      Y      N   Total
Y         50     70     120
N         50     40      90
Total    100    110     210

Expected frequencies: Crime
Alarm      Y      N   Total
Y         57     63     120
N         43     47      90
Total    100    110     210

$$ \chi^2 = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = .86 + .78 + 1.14 + 1.04 = 3.82 $$

  – Chi-Square = 3.82
  – df = (r - 1) × (c - 1) = 1
  – Check the table. Do the results support the working hypothesis? No - Chi-Square must be at least 3.84 to reject the null hypothesis of no relationship between alarm systems and crime at the .05 level (five chances in 100)