χ² Applications
(pronounced “kie-square”; text Ch. 15)
• Tests for Independence
– is characteristic R the same or different in
different populations?
• Tests for Equality of Population Proportions,
Three or More Populations
• Tests for “Goodness of Fit”
– did this sample come from a population with
____ distribution?
• Confidence Intervals and Hypothesis Tests
for Variances and Standard Deviations
χ² Contingency Tables (Tests for Independence)
Medication/Outcome   Recovered   Didn’t Recover   Total
Prozan               150         50
Drug X               200         100
Total
Research Question: Was the proportion who recovered on Prozan
the same as the proportion that recovered on Drug X?
Some difference in proportions is NOT sufficient evidence – the
difference could be due to sampling variability.
• So we ask: how probable is the observed difference if
there is no actual difference between these populations?
• That probability is governed by the χ² distribution
• Chi-Square Calculation:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
r is the number of rows; k is the number of columns. The expression
directs us to calculate the expected number for each row and column
intersection, square the difference between the observed and expected
numbers, and divide each square by the expected number. The expected number
is the proportion in the total sample with characteristic R multiplied by the
size of the subsample.
The chi-square statistic has (r − 1) × (k − 1) degrees of freedom
Medication/Outcome   Recovered   Didn’t Recover   Total
Prozan               150         50
Drug X               200         100
Total
How many would we expect to recover in the Prozan group?
Of the total sample of 500, 350 recovered so P(R) = 350/500 = 0.7.
There are 200 in the Prozan sample, so E11 = 0.7 × 200 = 140.
Formula: Eij = (Cj/n) × Ri, where Cj is the total in the j-th column, Ri is the
total in the i-th row, and n is the size of the total sample.
Medication/Outcome   Recovered             Didn’t Recover        Total
Prozan               O11: 150, E11: 140    O12: 50, E12: 60      200
Drug X               O21: 200, E21: 210    O22: 100, E22: 90     300
Total                350                   150                   500

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{10^2}{140} + \frac{10^2}{210} + \frac{10^2}{60} + \frac{10^2}{90} = 3.9683$$
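The Prozan and Drug X calculation can be reproduced in a few lines of Python. This is only a sketch: the variable names are illustrative rather than taken from the slides, and it uses plain Python instead of a statistics package.

```python
# Observed counts from the slide: rows are Prozan, Drug X;
# columns are Recovered, Didn't Recover.
observed = [[150, 50],
            [200, 100]]

row_totals = [sum(row) for row in observed]        # R_i
col_totals = [sum(col) for col in zip(*observed)]  # C_j
n = sum(row_totals)                                # total sample size

# E_ij = (C_j / n) * R_i for every cell of the table
expected = [[c / n * r for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over all cells
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected)
print(round(chi_square, 4), df)
```

The same code works for a table of any size, because the expected counts and the degrees of freedom are derived from the row and column totals.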
Major/Start Sal   <40,000   ≥ 40,000   Total
Business          120       280
Other             160       340
Total
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
The expected number of business grads with salary < 40,000 is
1) 120
2) 124.44
3) 400
4) none of the above
The squared value for “Other < 40,000” is
1) 160
2) 6.25
3) 0.1267
4) 155.56
The χ² value is
1) 12.35
2) 900
3) 0.4139
4) none of the above
Method/Level    Not Proficient   Proficient   Highly Proficient   Total
Sight           64               86           31
Phonics         51               78           45
Write-to-Read   33               99           51
Total
• Excel Procedures:
– Enter the data as a table in Excel
– Create a table in Excel showing the expected frequencies, as
below
– When cell B9 is copied to the range B9:D11, it will create the required
expected frequencies
– Use chitest(B2:D4,B9:D11); the result is the p-value of the test
      B                    C     D
9     =(B$5/$E$5)*$E2
10
11
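For readers working outside Excel, the spreadsheet steps for the reading-method table can be sketched in Python. The assumptions are mine, not the slides': the helper `upper_tail_even_df` is a hypothetical name, and it uses a closed form that is valid only for an even number of degrees of freedom, which happens to fit this 3 × 3 table (df = 4).

```python
import math

# Observed counts: rows are Sight, Phonics, Write-to-Read;
# columns are Not Proficient, Proficient, Highly Proficient.
observed = [[64, 86, 31],
            [51, 78, 45],
            [33, 99, 51]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Same rule as the spreadsheet formula: (column total / n) * row total.
expected = [[c / n * r for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (3 - 1) * (3 - 1) = 4

def upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; valid only when df is even."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

# Plays the role of the chitest result: the p-value of the test.
p_value = upper_tail_even_df(chi_square, df)
print(round(chi_square, 2), df, p_value)
```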
Comic Relief
• Achieving energy
independence!
• Your tax dollars at work!
• The photo on the right is of
the Great Howard’s Knob
Wind Turbine Generator,
a joint project of the
Department of Energy and
the National Aeronautics
and Space Administration.
It cost 30 million 1980
dollars – about
$75,000,000 in today’s
money.
During its lifetime this machine produced
A. zero useable watts of
power
B. a cult that worshiped the
machine and called
themselves “Whooshies”
C. significant interference
with TV signals
D. Several auto accidents
due to brake failure as
flatlanders descended
from the Knob
E. all of the above
Testing for equality of three or
more population proportions
• Purpose: to avoid a series of pair-wise hypothesis tests
• Very similar to tests for independence
• Example: the following table gives proportions of pet
ownership among samples in four towns: is the
proportion the same or different in the four locations?
City                 New York   Atlanta   Boone   Los Angeles
Percentage own pet   12         36        52      39
n =                  200        300       200     300
• To use χ² we must have numbers in each cell, so
begin by calculating the number in each
category:
City/Pet     New York          Atlanta   Boone   LA    Totals
Own          0.12 × 200 = 24   108       104     117   353
Don’t Own    176               192       96      183   647
Totals       200               300       200     300   1000
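The conversion from percentages to counts, and the test statistic that follows from it, can be sketched in Python as below. The variable names are illustrative, and the degrees of freedom assume the same (r − 1) × (k − 1) rule used for tests of independence.

```python
# Percentages and sample sizes from the slide.
cities = ["New York", "Atlanta", "Boone", "Los Angeles"]
percent_own = [12, 36, 52, 39]
sample_sizes = [200, 300, 200, 300]

# Convert percentages to counts; integer arithmetic avoids rounding noise.
own = [p * n // 100 for p, n in zip(percent_own, sample_sizes)]
dont_own = [n - o for n, o in zip(sample_sizes, own)]

total = sum(sample_sizes)
pooled_share = sum(own) / total  # overall proportion owning a pet

# Under H0 every city shares the pooled proportion.
chi_square = 0.0
for n, o, d in zip(sample_sizes, own, dont_own):
    e_own = pooled_share * n
    e_dont = (1 - pooled_share) * n
    chi_square += (o - e_own) ** 2 / e_own + (d - e_dont) ** 2 / e_dont

df = (2 - 1) * (len(cities) - 1)
print(own, dont_own, round(chi_square, 2), df)
```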
χ² Tests for Goodness of Fit:
• General Notion: We often wish to know whether a
particular distribution fits a general definition
• Example: To use t tests, we must suppose that the
population is normally distributed
• If a sample is drawn from, say, a normal distribution, the
sample values should reflect the population
distribution
• Allows us to state the number in the sample that should
be in a particular range
• Example: 68% of a normal distribution is within +/- 1
standard deviation of the mean. About 68% of the
values in a sample from a normal distribution should be
within +/- 1 standard deviation of the mean
• Comparison of actual and expected numbers is the
province of the χ² distribution
• Let Oj be the number observed in the sample in
range j
• Let Ej be the number that would be expected if
the population had a given distribution, such as
uniform, Poisson, normal, etc.
• Then
$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
• where k is the number of categories
• degrees of freedom = k – 1 – m where m is the
number of parameter estimates used in the
calculation
• Example: Are the answers to Dr.
Dinwiddie’s multiple-choice tests random?
If so, the answers should conform to a
uniform distribution and P(A) = P(B) =
P(C) = P(D) = ¼. (For the uniform
distribution P(E) = 1/n, where n is the
number of possible values.)
• On a recent exam there were sixty
questions with correct answers: A-20, B-5,
C-17, and D-18.
H0: the distribution of answers is uniform
H1: the distribution is not uniform
Correct Answer   Observed   Expected   Squared Difference
A                20         15
B                5          15
C                17         15
D                18         15

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
Then χ² = (25 + 100 + 4 + 9)/15 = 9.2, and no parameters were estimated, so
degrees of freedom = 4 – 1 = 3
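A short Python sketch can fill in the squared-difference column for the exam example and sum it. The dictionary layout and names are illustrative choices, not part of the slides.

```python
# Correct-answer counts from the exam in the example above.
observed = {"A": 20, "B": 5, "C": 17, "D": 18}

n_questions = sum(observed.values())     # 60 questions
expected = n_questions / len(observed)   # uniform: 60 / 4 = 15 per letter

# One (O - E)^2 / E term per answer letter
contributions = {letter: (count - expected) ** 2 / expected
                 for letter, count in observed.items()}
chi_square = sum(contributions.values())

m = 0                                    # no parameters were estimated
df = len(observed) - 1 - m

print(contributions)
print(chi_square, df)
```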
• Excel and the chi-square distribution
– CHIDIST(x value, df) returns the area in the
right-hand tail of the chi-square distribution
• goodness of fit tests are all upper one-tail tests, so
chidist gives the p-value of the test
– CHIINV(probability, df) gives the chi-square
value for the upper tail of the probability
entered
• use to find the critical value for a chi-square test
• For the Dinwiddie problem:
CHIDIST(9.2, 3) gives the p-value of the
test
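Outside Excel, the two functions can be approximated in plain Python. The sketch below is not Excel's implementation: `chidist` and `chiinv` are hypothetical helpers named after the spreadsheet functions, the tail area is built up from the closed forms for 1 and 2 degrees of freedom, and the inverse is found by simple bisection.

```python
import math

def chidist(x, df):
    """Upper-tail area of the chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Start from a case with a closed form, then step up one unit at a time.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # df = 1
    while a < df / 2:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

def chiinv(probability, df):
    """Chi-square value whose upper-tail area equals `probability` (bisection)."""
    lo, hi = 0.0, 1.0
    while chidist(hi, df) > probability:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chidist(mid, df) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Statistic recomputed from the exam counts (expected 15 per answer letter).
statistic = sum((o - 15) ** 2 / 15 for o in (20, 5, 17, 18))
p_value = chidist(statistic, 3)
critical = chiinv(0.05, 3)   # critical value for a 5% upper-tail test
print(p_value, critical)
```

Comparing `statistic` with `critical`, or `p_value` with the chosen significance level, leads to the same decision either way.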
• EXAMPLE: Hamish suspects that the
dice at Black Bart’s are not fair, so he
spirits one out of the casino one night.
After rolling the stolen die 120 times, he
has the following result:
No. of Dots    1    2    3    4    5    6
No. of Times   27   24   18   11   27   13
Calculate the appropriate statistic.
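One way to check a hand calculation for the die is the Python sketch below. It assumes the null hypothesis of a fair die, so each face is expected 120/6 = 20 times; the names are illustrative.

```python
# Counts for faces 1 through 6 from the 120 rolls in the example.
rolls = [27, 24, 18, 11, 27, 13]

total_rolls = sum(rolls)              # 120
expected = total_rolls / len(rolls)   # fair die: 20 per face

chi_square = sum((o - expected) ** 2 / expected for o in rolls)
df = len(rolls) - 1 - 0               # k - 1 - m, with no estimated parameters

print(chi_square, df)
```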
• Testing for normality
– suppose that nationally auto insurance has a
mean price of $700 with standard deviation
$135. We have a sample of 80 NC drivers,
and we’d like to know whether their insurance
bills are normally distributed with the national
parameters.
– how many would we expect in the range 700
to 835?
– HINT: how many standard deviations? What
proportion are within that range of standard
deviations?
• answer: on a normal distribution, about 0.34 of the values lie
between the mean and +1 st dev, so we’d
expect to find 0.34 × 80 = 27.2 in that
range
• Setting up a spreadsheet: use normsdist
• Continuing in that fashion, we’d have the
following
St Devs    Range     Prop.     Expected freq
< -2       < 430     0.02275   1.82
-2 to -1   430-565   0.1359    10.87
-1 to 0    565-700   0.3413    27.31
0 to 1     700-835   0.3413    27.31
1 to 2     835-970   0.1359    10.87
> 2        > 970     0.02275   1.82
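The expected-frequency table can be rebuilt in Python as a cross-check on the spreadsheet. In this sketch `normsdist` is a hypothetical helper named after the Excel function and computed from `math.erf`; the class boundaries at whole standard deviations follow the table.

```python
import math

def normsdist(z):
    """Standard normal cumulative probability, computed from math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, st_dev, sample_size = 700, 135, 80

# Class boundaries in standard deviations; the outer classes are open-ended.
z_edges = [-math.inf, -2, -1, 0, 1, 2, math.inf]

rows = []
for z_low, z_high in zip(z_edges, z_edges[1:]):
    proportion = normsdist(z_high) - normsdist(z_low)
    rows.append((mean + z_low * st_dev,     # lower dollar bound
                 mean + z_high * st_dev,    # upper dollar bound
                 proportion,
                 proportion * sample_size)) # expected frequency

for low, high, proportion, expected in rows:
    print(low, high, round(proportion, 5), round(expected, 2))
```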
• To find the observed values in the sample,
use the HISTOGRAM tool
• An elaborated solution appears under
“Study Aids” on my web site. Click on the
link to normaltest.xls
• Issue: how many degrees of freedom does
the χ² statistic have?
– df = k – 1 – m = 6 – 1 – 0 = 5
• Alternate technique: determine whether
the sample was drawn from a normal
population
• First, calculate sample mean and standard
deviation and use those numbers in the
calculation
• Issue: how many degrees of freedom does
the χ² statistic have?
– df = k – 1 – m = 6 – 1 – 2 = 3
• Testing for conformity to an observed
distribution:
– The national distribution of pets is as follows:
Number of Pets   Percentage of Households
0                55
1                25
2                10
3                5
4                3
5 or more        2
A marketing company wants to know whether Boone
conforms to the national pattern. In a sample of 300
Boone households, they found the following:
No. of Pets   No. of Households   Expected No.   Squares
0             128
1             75
2             50
3             20
4             18
5 or more     9

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
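The two empty columns can be filled in with a short Python sketch. It assumes the expected number in each category is the national percentage applied to the 300 sampled households; the names are illustrative.

```python
# National percentages and the Boone sample, keyed by number of pets.
national_percent = {"0": 55, "1": 25, "2": 10, "3": 5, "4": 3, "5 or more": 2}
boone_households = {"0": 128, "1": 75, "2": 50, "3": 20, "4": 18, "5 or more": 9}

sample_size = sum(boone_households.values())   # 300

# Expected count = national share of the category times the sample size
expected = {k: pct * sample_size / 100 for k, pct in national_percent.items()}
squares = {k: (boone_households[k] - expected[k]) ** 2 / expected[k]
           for k in expected}

chi_square = sum(squares.values())
df = len(expected) - 1 - 0   # k - 1 - m; nothing is estimated from the sample

for k in expected:
    print(k, boone_households[k], expected[k], round(squares[k], 4))
print(round(chi_square, 3), df)
```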