Elementary Statistics and
Inference
22S:025 or 7P:025
Lecture 39
Chapter 28 – The Chi-Square Test (cont.)
A) Testing Independence
The Chi-Square Test (χ²) can be used to test whether the
answers to a question are independent of, or related to,
other characteristics of the respondents. For example,
are responses to the survey statement “I think social
security will be available to retirees in 2040”
independent of gender and age range?
Example: The HANES (p. 58) study took a probability
sample of 2,237 Americans, age 25-34. A question was
asked about “handedness”. The results are shown in
Table 5 below:

                 Men    Women
Right-handed     934    1,070
Left-handed      113       92
Ambidextrous      20        8
The table of counts has 3 rows and 2 columns, and the
χ²-statistic can be used to test the hypotheses:
H0: Gender is independent of handedness
H1: Gender is associated with handedness
The expected counts are determined based on the
assumption that gender is independent of handedness.
For example:

$$P(\text{right-handed}) = \frac{1070 + 934}{2237} = \frac{2004}{2237}$$

number of men = 1067

$$E(\text{men who are right-handed}) = 1067 \times \frac{2004}{2237} \approx 955.9$$
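In general, for an m × n table, the expected counts are
as follows. Write $O_{jk}$ for the observed count in row j
(j = 1, ..., m) and column k (k = 1, ..., n), $R_j$ for the
total of row j, $C_k$ for the total of column k, and N for
the grand total.

$E_{jk}$ = expected count in cell j, k if counts are distributed independently

$$E_{jk} = \frac{R_j \times C_k}{N}$$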
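The rule "row total times column total, divided by the grand total" can be sketched in Python. This is only an illustration and not part of the lecture: the function name `expected_counts` and the list-of-rows layout are assumptions, and the counts are the handedness counts used in the worked example (one row per handedness category; columns are men, women).

```python
def expected_counts(observed):
    """Expected counts R_j * C_k / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts: one row per handedness category, columns = (men, women)
handedness = [[934, 1070], [113, 92], [20, 8]]

expected = expected_counts(handedness)
for row in expected:
    print([round(e, 1) for e in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick way to check the arithmetic.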
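For the example in Table 7:

$$E_{11} = \frac{(934 + 1070) \times (934 + 113 + 20)}{2237} = \frac{(2004) \times (1067)}{2237} = 955.9 \approx 956$$

$$E_{12} = \frac{(934 + 1070) \times (1070 + 92 + 8)}{2237} = \frac{(2004) \times (1170)}{2237} = 1048.1$$

$$E_{21} = \frac{(113 + 92) \times (934 + 113 + 20)}{2237} = \frac{(205) \times (1067)}{2237} = 97.78 \approx 98$$

$$E_{31} = \frac{(20 + 8) \times (934 + 113 + 20)}{2237} = \frac{(28) \times (1067)}{2237} = 13.4 \approx 13$$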
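$$E_{22} = \frac{(113 + 92) \times (1070 + 92 + 8)}{2237} = \frac{(205) \times (1170)}{2237} = 107.2 \approx 107$$

$$E_{32} = \frac{(20 + 8) \times (1070 + 92 + 8)}{2237} = \frac{(28) \times (1170)}{2237} = 14.6 \approx 15$$

$$\chi^2 = \sum_{j,k} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$

$$= \frac{(934 - 956)^2}{956} + \frac{(1070 - 1048)^2}{1048} + \frac{(113 - 98)^2}{98} + \frac{(92 - 107)^2}{107} + \frac{(20 - 13)^2}{13} + \frac{(8 - 15)^2}{15} = 12.4 \approx 12$$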
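As a cross-check, the statistic can be computed without rounding the expected counts first. The sketch below is illustrative only; the function name `chi_square_statistic` is an assumption. Each squared difference is divided by the expected count of its cell. With unrounded expected counts the result is about 11.8, slightly below the 12.4 obtained from expected counts rounded to whole numbers; both round to 12.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, o in enumerate(row):
            e = row_totals[j] * col_totals[k] / n  # expected count under independence
            stat += (o - e) ** 2 / e               # divide by the expected count
    return stat


# Handedness counts: one row per category, columns = (men, women)
stat = chi_square_statistic([[934, 1070], [113, 92], [20, 8]])
print(round(stat, 1))
```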
The degrees of freedom for an m × n table of counts is
(the number of rows in the table − 1) × (the number of
columns in the table − 1):

$$df = (m - 1)(n - 1)$$

In our example, df = (3 − 1)(2 − 1) = 2.

Then refer the computed χ² = 12 to the appropriate row of
the Chi-Square probability table for 5%.
For df = 2, $\chi^2_{5\%,2} = 5.99$.
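For 2 degrees of freedom specifically, the table entry can be checked by hand, because the upper-tail probability of the chi-square distribution then has the closed form exp(−x/2). The sketch below relies on that fact; it applies only to df = 2, and the function name `chi2_tail_df2` is an assumption made for illustration.

```python
import math


def chi2_tail_df2(x):
    """Upper-tail probability P(chi-square > x), valid for 2 degrees of freedom only."""
    return math.exp(-x / 2)


# Value that leaves 5% in the upper tail: solve exp(-x/2) = 0.05
critical_5pct = -2 * math.log(0.05)

# Tail probability for the statistic computed from the handedness table
p_value = chi2_tail_df2(12.4)

print(round(critical_5pct, 2), round(p_value, 4))
```

The critical value comes out at about 5.99, matching the table, and the tail probability for 12.4 is roughly 0.2%.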
The computed χ² value (i.e., 12) exceeds 5.99. A value
this large would occur less than 5% of the time if the
null hypothesis were true; that is, the p-value is less
than 5%. Therefore reject the null hypothesis: an
association exists between “gender and handedness”. The
great majority of persons in HANES in the 25-34 range
were “right-handed”. Further, “left-handed” people are
more likely to be men (113 men versus 92 women).
Exercise Set C (pp. 539-540) #2, 3, 5, 7
#2.
(Hypothetical.) In a certain town, there are about
one million eligible voters. A simple random sample
of size 10,000 was chosen, to study the relationship
between sex and participation in the last election.
The results:
               Men      Women
Voted          2,792    3,591
Didn’t Vote    1,486    2,131

Make a χ²-test of the null hypothesis that sex and
voting are independent.
Expected Counts

               Men        Women      Total
Voted          2,730.6    3,652.4     6,383
Didn’t Vote    1,547.4    2,069.6     3,617
Total          4,278      5,722      10,000

$$E_{11} = \frac{6383 \times 4278}{10{,}000} = 2730.6 \qquad E_{12} = \frac{6383 \times 5722}{10{,}000} = 3652.4$$

$$E_{21} = \frac{3617 \times 4278}{10{,}000} = 1547.4 \qquad E_{22} = \frac{3617 \times 5722}{10{,}000} = 2069.6$$

$$df = (2 - 1)(2 - 1) = 1$$

$$\chi^2 = \frac{(2792 - 2730.6)^2}{2730.6} + \frac{(3591 - 3652.4)^2}{3652.4} + \frac{(1486 - 1547.4)^2}{1547.4} + \frac{(2131 - 2069.6)^2}{2069.6}$$

$$\chi^2 = 1.38 + 1.03 + 2.44 + 1.82 = 6.67, \qquad \chi^2_{5\%,1} = 3.84$$

Based on the evidence, reject the hypothesis of
independence: men are more likely to have voted.
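The direction of the association and the size of the p-value can be illustrated with a short sketch. It is not part of the original solution. It uses the fact that, for 1 degree of freedom only, the upper-tail probability of the chi-square distribution equals erfc(√(x/2)); the function name `chi2_tail_df1` and the dictionary layout are assumptions made for illustration.

```python
import math


def chi2_tail_df1(x):
    """Upper-tail probability P(chi-square > x), valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2))


voted = {"men": 2792, "women": 3591}
not_voted = {"men": 1486, "women": 2131}

# Share of each sex in the sample that voted
rates = {sex: voted[sex] / (voted[sex] + not_voted[sex]) for sex in voted}

# Tail probability for the statistic from the worked solution
p_value = chi2_tail_df1(6.67)

print({sex: round(r, 3) for sex, r in rates.items()}, round(p_value, 4))
```

About 65% of the sampled men voted compared with about 63% of the women, and the tail probability is close to 1%, well under the 5% level.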
#7.
To test whether a die is fair, someone rolls it 600
times. On each roll, he just records whether the
result was even or odd, and large (4, 5, 6) or small
(1, 2, 3). The observed frequencies turn out as
follows:
         Large    Small
Even      183      113
Odd        88      216
Question: Is the die fair?
To answer this question, you use:
i)   the one-sample z-test.
ii)  the two-sample z-test.
iii) the χ²-test, with a null hypothesis that tells you
     the contents of the box (section 1).
iv)  the χ²-test for independence (section 4).
Now answer the question.
Apply Goodness-of-Fit Test (n = 600)

Outcomes               Probability    Expected Outcomes    Observed Outcomes
Even Large (4 or 6)    2/6            600 x 2/6 = 200      183
Even Small (2)         1/6            600 x 1/6 = 100      113
Odd Large (5)          1/6            600 x 1/6 = 100       88
Odd Small (1 or 3)     2/6            600 x 2/6 = 200      216
Total                                 600                  600
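Each squared difference is divided by the expected count:

$$\chi^2 = \frac{(183 - 200)^2}{200} + \frac{(113 - 100)^2}{100} + \frac{(88 - 100)^2}{100} + \frac{(216 - 200)^2}{200}$$

$$\chi^2 = 1.45 + 1.69 + 1.44 + 1.28 = 5.86$$

$$df = 3, \qquad \chi^2_{5\%,3} = 7.82$$

p-value > 5%: retain H0.
The data are consistent with a fair die.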
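The goodness-of-fit calculation can be sketched in Python as a check. This is an illustration only: the function name `goodness_of_fit` is an assumption, and the probabilities are those implied by a fair die for the four recorded outcomes. Each squared difference is divided by the expected count, which gives a statistic of about 5.86, below the 5% critical value of 7.82 for 3 degrees of freedom.

```python
def goodness_of_fit(observed, probabilities):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Order: even-large (4 or 6), even-small (2), odd-large (5), odd-small (1 or 3)
observed = [183, 113, 88, 216]
probabilities = [2 / 6, 1 / 6, 1 / 6, 2 / 6]

stat, df = goodness_of_fit(observed, probabilities)
print(round(stat, 2), df)
```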
E.
Review Exercises – (pp. 541-543) #1, 2, 3, 6, 8, 10
#2.
As part of a study on the selection of grand juries in
Alameda county, the educational level of
grand jurors was compared with the county
distribution:
Educational Level    County    Number of Jurors
Elementary           28.4%      1
Secondary            48.5%     10
Some College         11.9%     16
College Degree       11.2%     35
Total               100.0%     62
Could a simple random sample of 62 people from the
county show a distribution of educational level so
different from the county-wide one? Choose one
option and explain.
i)   This is absolutely impossible.
ii)  This is possible, but fantastically unlikely.
iii) This is possible but unlikely – the chance is
     around 1% or so.
iv)  This is quite possible – the chance is around
     10% or so.
v)   This is nearly certain.
Expected Counts (n = 62), df = 3

                  Expected Count          Observed    (O − E)²/E
Elementary        28.4% x 62 = 17.6           1          15.7
Secondary         48.5% x 62 = 30.1          10          13.4
Some College      11.9% x 62 =  7.4          16          10.0
College Degree    11.2% x 62 =  6.9          35         114.4
Total                          62            62         153.5

χ² ≈ 153.5, p-value < 1%
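To see how far below 1% the p-value is, the tail probability can be evaluated directly. The sketch below is illustrative and not part of the original solution. It uses the closed form of the chi-square upper tail for 3 degrees of freedom only, erfc(√(x/2)) + √(2x/π)·exp(−x/2); the function name `chi2_tail_df3` is an assumption. With unrounded expected counts the statistic is about 152.5, in line with the value of roughly 153 in the table.

```python
import math


def chi2_tail_df3(x):
    """Upper-tail probability P(chi-square > x), valid for 3 degrees of freedom only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Order: elementary, secondary, some college, college degree
county_share = [0.284, 0.485, 0.119, 0.112]
jurors = [1, 10, 16, 35]

n = sum(jurors)
expected = [n * p for p in county_share]
stat = sum((o - e) ** 2 / e for o, e in zip(jurors, expected))
p_value = chi2_tail_df3(stat)

print(round(stat, 1), p_value)
```

The resulting probability is many orders of magnitude smaller than 1%.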
#8.
Two people are trying to decide whether a die is fair.
They roll it 100 times, with the results shown at the
top of the next page. One person wants to make a
z-test, the other wants to make a χ²-test. Who is
right? Explain briefly.
Outcomes    Observed    Expected    (O − E)²/E
1              21         16.67        1.12
2              15         16.67        0.17
3              13         16.67        0.81
4              17         16.67        0.01
5              19         16.67        0.33
6              15         16.67        0.17
Total         100                      2.61

χ² = 2.61, df = 5,
p-value > 5%, retain H0
#10.
The U.S. has bilateral extradition treaties with many
countries. (A person charged with a crime in his
home country may escape to the U.S.; if he is
captured in the U.S., authorities in his home country
may request that he be “extradited,” that is, turned
over to them for prosecution under their laws.)
The Senate attached a special rider to the treaty
governing extradition to Northern Ireland:
fugitives cannot be returned if they will be
discriminated against on the basis of religion. In
a leading case, the defense tried to establish
discrimination in Northern Ireland’s criminal
justice system.
One argument was based on 1991 acquittal rates for
persons charged with terrorist offenses.
These rates were significantly different for
Protestants and Catholics: χ² ≈ 6.2 on 1 degree of
freedom, P ≈ 1%. The data are shown below: 8
Protestants out of 15 were acquitted, compared
to 27 Catholics out of 65.
a) Is the calculation of χ2 correct? If not, can you
guess what the mistake was? (That might be
quite difficult.)
b) What box model did the defense have in mind?
Comment briefly on the model.
             Protestant    Catholic
Acquitted         8            27
Convicted         7            38
Expected Outcomes

             Protestant    Catholic    Total
Acquitted       6.56         28.44       35
Convicted       8.44         36.56       45
Total          15            65          80

$$\chi^2 = \frac{(8 - 6.56)^2}{6.56} + \frac{(27 - 28.44)^2}{28.44} + \frac{(7 - 8.44)^2}{8.44} + \frac{(38 - 36.56)^2}{36.56}$$

$$\chi^2 = .32 + .07 + .25 + .06 = .70$$

df = 1, p-value > 5%, retain H0
Note: When testing for independence in a 2 x 2 table of
counts, a short-cut technique can be used.
Example:

Counts
   A        B        A + B
   C        D        C + D
   A + C    B + D    N = A + B + C + D

$$\chi^2 = \frac{N(AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)}$$
Apply this to data of exercise #10 in review exercises:
    8    27    35
    7    38    45
   15    65    80

$$\chi^2 = \frac{80(8 \cdot 38 - 7 \cdot 27)^2}{(15)(65)(35)(45)} = \frac{(80)(115)(115)}{(15)(65)(35)(45)} = .69$$
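The short-cut formula can be checked against the cell-by-cell calculation with a small sketch. The code is illustrative only; both function names are assumptions. For the data of exercise #10 the two routes agree at about 0.69, far from the value of 6.2 quoted in the exercise.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Short-cut chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def chi_square_2x2_long(a, b, c, d):
    """Same statistic computed cell by cell from the expected counts."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    total = 0.0
    for j in range(2):
        for k in range(2):
            e = rows[j] * cols[k] / n  # expected count under independence
            total += (observed[j][k] - e) ** 2 / e
    return total


# Exercise #10: rows = (acquitted, convicted), columns = (Protestant, Catholic)
shortcut = chi_square_2x2_shortcut(8, 27, 7, 38)
long_form = chi_square_2x2_long(8, 27, 7, 38)
print(round(shortcut, 2), round(long_form, 2))
```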