Chapter 10
Chi-Square Tests and the F-Distribution
§ 10.1
Goodness of Fit
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are more than two possible outcomes for each independent trial. (Unlike the binomial experiment, in which there are only two possible outcomes.)
Example:
A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below. Each outcome is classified into one of the categories, and the probability for each possible outcome is fixed.

Topping      Percentage of teenagers
Cheese       41%
Pepperoni    25%
Sausage      15%
Mushrooms    10%
Onions        9%
Chi-Square Goodness-of-Fit Test
A Chi-Square Goodness-of-Fit Test is used to test whether a frequency
distribution fits an expected distribution.
To calculate the test statistic for the chi-square goodness-of-fit test, the
observed frequencies and the expected frequencies are used.
The observed frequency O of a category is the frequency for the category
observed in the sample data.
The expected frequency E of a category is the calculated frequency for the
category. Expected frequencies are obtained assuming the specified (or
hypothesized) distribution. The expected frequency for the ith category is
Ei = npi
where n is the number of trials (the sample size) and pi is the assumed
probability of the ith category.
Observed and Expected Frequencies
Example:
200 teenagers are randomly selected and asked what their favorite
pizza topping is. The results are shown below.
Find the observed frequencies and the expected frequencies.
Topping      % of teenagers   Observed frequency (n = 200)   Expected frequency
Cheese            41%                     78                  200(0.41) = 82
Pepperoni         25%                     52                  200(0.25) = 50
Sausage           15%                     30                  200(0.15) = 30
Mushrooms         10%                     25                  200(0.10) = 20
Onions             9%                     15                  200(0.09) = 18
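As a quick illustration (not part of the original slides), the expected frequencies above can be reproduced in a few lines of Python; only the observed counts and claimed percentages from the example are used, and no external libraries are assumed.

# Expected frequencies E_i = n * p_i for the pizza-topping example (plain Python).
observed = {"Cheese": 78, "Pepperoni": 52, "Sausage": 30, "Mushrooms": 25, "Onions": 15}
claimed = {"Cheese": 0.41, "Pepperoni": 0.25, "Sausage": 0.15, "Mushrooms": 0.10, "Onions": 0.09}

n = sum(observed.values())  # sample size: 200 teenagers
expected = {topping: n * p for topping, p in claimed.items()}

for topping in observed:
    print(f"{topping:<10} O = {observed[topping]:>3}   E = {expected[topping]:.0f}")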
Chi-Square Goodness-of-Fit Test
For the chi-square goodness-of-fit test to be used, the following must be true.
1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.
The Chi-Square Goodness-of-Fit Test
If the conditions listed above are satisfied, then the sampling distribution for the
goodness-of-fit test is approximated by a chi-square distribution with k – 1
degrees of freedom, where k is the number of categories. The test statistic for the
chi-square goodness-of-fit test is
χ² = Σ (O − E)² / E

where O represents the observed frequency of each category and E represents the expected frequency of each category. The test is always a right-tailed test.
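The statistic itself is a one-line sum. Here is a minimal sketch (not from the slides) in plain Python; observed and expected are assumed to be lists of equal length.

def chi_square_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Pizza-topping example from the previous slides:
print(chi_square_statistic([78, 52, 30, 25, 15], [82, 50, 30, 20, 18]))  # about 2.025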
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words                                           In Symbols
1. Identify the claim. State the null and          State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.              Identify α.
3. Identify the degrees of freedom.                d.f. = k – 1
4. Determine the critical value.                   Use Table 6 in Appendix B.
5. Determine the rejection region.
Continued.
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words                                           In Symbols
6. Calculate the test statistic.                   χ² = Σ (O − E)²/E
7. Make a decision to reject or fail to            If χ² is in the rejection region,
   reject the null hypothesis.                     reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context
   of the original claim.
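The eight steps can be collected into a single routine. The sketch below is one possible arrangement (not taken from the text); it assumes the SciPy library is available for the chi-square critical value and test statistic.

import scipy.stats as st

def goodness_of_fit_test(observed, probabilities, alpha):
    """Run a chi-square goodness-of-fit test; return (statistic, critical value, reject H0?)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]               # E_i = n * p_i
    df = len(observed) - 1                                  # step 3: d.f. = k - 1
    critical = st.chi2.ppf(1 - alpha, df)                   # steps 4-5: right-tailed critical value
    statistic, _ = st.chisquare(observed, f_exp=expected)   # step 6: test statistic
    reject = statistic > critical                           # step 7: decision
    return statistic, critical, reject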
Chi-Square Goodness-of-Fit Test
Example:
A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below. 200 randomly selected teenagers are surveyed.

Topping      Percentage of teenagers
Cheese       41%
Pepperoni    25%
Sausage      15%
Mushrooms    10%
Onions        9%

Using α = 0.01 and the observed and expected values previously calculated, test the researcher's claim using a chi-square goodness-of-fit test.
Continued.
Chi-Square Goodness-of-Fit Test
Example continued:
H0: The distribution of pizza toppings is 41% cheese, 25% pepperoni, 15% sausage, 10% mushrooms, and 9% onions. (Claim)
Ha: The distribution of pizza toppings differs from the claimed or expected distribution.

Because there are 5 categories, the chi-square distribution has k – 1 = 5 – 1 = 4 degrees of freedom.

With d.f. = 4 and α = 0.01, the critical value is χ²₀ = 13.277.
Continued.
Chi-Square Goodness-of-Fit Test
Example continued:

Rejection region: χ² > χ²₀ = 13.277 (right-tailed, α = 0.01).

Topping      Observed frequency   Expected frequency
Cheese              78                   82
Pepperoni           52                   50
Sausage             30                   30
Mushrooms           25                   20
Onions              15                   18

χ² = Σ (O − E)²/E
   = (78 − 82)²/82 + (52 − 50)²/50 + (30 − 30)²/30 + (25 − 20)²/20 + (15 − 18)²/18
   ≈ 2.025

Because χ² ≈ 2.025 is not in the rejection region, fail to reject H0. There is not enough evidence at the 1% level to reject the researcher's claim.
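A short cross-check of these numbers (a sketch, assuming SciPy is installed):

from scipy.stats import chi2, chisquare

observed = [78, 52, 30, 25, 15]
expected = [82, 50, 30, 20, 18]

statistic, _ = chisquare(observed, f_exp=expected)
critical = chi2.ppf(1 - 0.01, df=4)

print(round(statistic, 3))   # about 2.025
print(round(critical, 3))    # about 13.277
print(statistic > critical)  # False: fail to reject H0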
§ 10.2
Independence
Contingency Tables
An r × c contingency table shows the observed frequencies for
two variables. The observed frequencies are arranged in r rows and
c columns. The intersection of a row and a column is called a cell.
The following contingency table shows a random sample of 321
fatally injured passenger vehicle drivers by age and gender.
(Adapted from Insurance Institute for Highway Safety)
              Age
Gender      16–20   21–30   31–40   41–50   51–60   61 and older
Male          32      51      52      43      28        10
Female        13      22      33      21      10         6
Expected Frequency
Assuming the two variables are independent, you can use the
contingency table to find the expected frequency for each cell.
Finding the Expected Frequency for Contingency Table Cells
The expected frequency for a cell Er,c in a contingency table is
Expected frequency Er,c = (Sum of row r) × (Sum of column c) / Sample size
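In code the rule is a single expression; a minimal sketch (not from the text) expresses it as a helper function.

def expected_frequency(row_total, column_total, sample_size):
    """Expected cell frequency under independence: (row sum * column sum) / sample size."""
    return row_total * column_total / sample_size

print(round(expected_frequency(216, 45, 321), 2))  # 30.28, the first Male cell in the example below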
Expected Frequency
Example:
Find the expected frequency for each “Male” cell in the contingency table
for the sample of 321 fatally injured drivers. Assume that the variables,
age and gender, are independent.
              Age
Gender      16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male          32      51      52      43      28        10          216
Female        13      22      33      21      10         6          105
Total         45      73      85      64      38        16          321
Continued.
Expected Frequency
Example continued:
              Age
Gender      16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male          32      51      52      43      28        10          216
Female        13      22      33      21      10         6          105
Total         45      73      85      64      38        16          321

Expected frequency Er,c = (Sum of row r) × (Sum of column c) / Sample size

E1,1 = (216 · 45)/321 ≈ 30.28      E1,2 = (216 · 73)/321 ≈ 49.12
E1,3 = (216 · 85)/321 ≈ 57.20      E1,4 = (216 · 64)/321 ≈ 43.07
E1,5 = (216 · 38)/321 ≈ 25.57      E1,6 = (216 · 16)/321 ≈ 10.77
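The same arithmetic for every cell, as a sketch in plain Python (the totals are those from the table above):

row_totals = {"Male": 216, "Female": 105}
column_totals = [45, 73, 85, 64, 38, 16]   # ages 16-20 through 61 and older
sample_size = 321

for gender, row_total in row_totals.items():
    expected = [round(row_total * c / sample_size, 2) for c in column_totals]
    print(gender, expected)
# Male   [30.28, 49.12, 57.2, 43.07, 25.57, 10.77]
# Female [14.72, 23.88, 27.8, 20.93, 12.43, 5.23]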
Chi-Square Independence Test
A chi-square independence test is used to test the independence of
two variables. Using a chi-square test, you can determine whether
the occurrence of one variable affects the probability of the
occurrence of the other variable.
For the chi-square independence test to be used, the following must
be true.
1. The observed frequencies must be obtained by using a random
sample.
2. Each expected frequency must be greater than or equal to 5.
Chi-Square Independence Test
The Chi-Square Independence Test
If the conditions listed are satisfied, then the sampling distribution for the chi-square independence test is approximated by a chi-square distribution with

(r – 1)(c – 1)

degrees of freedom, where r and c are the number of rows and columns, respectively, of a contingency table. The test statistic for the chi-square independence test is

χ² = Σ (O − E)²/E

where O represents the observed frequencies and E represents the expected frequencies. The test is always a right-tailed test.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words                                           In Symbols
1. Identify the claim. State the null and          State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.              Identify α.
3. Identify the degrees of freedom.                d.f. = (r – 1)(c – 1)
4. Determine the critical value.                   Use Table 6 in Appendix B.
5. Determine the rejection region.
Continued.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words                                           In Symbols
6. Calculate the test statistic.                   χ² = Σ (O − E)²/E
7. Make a decision to reject or fail to            If χ² is in the rejection region,
   reject the null hypothesis.                     reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context
   of the original claim.
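These steps map closely onto SciPy's chi2_contingency function; the sketch below (not from the text) assumes SciPy is available and takes the observed table as a list of rows.

import scipy.stats as st

def independence_test(table, alpha):
    """Chi-square independence test on an r x c table; return (statistic, critical value, reject H0?)."""
    statistic, p_value, df, expected = st.chi2_contingency(table, correction=False)
    critical = st.chi2.ppf(1 - alpha, df)   # right-tailed critical value with (r-1)(c-1) d.f.
    return statistic, critical, statistic > critical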
Chi-Square Independence Test
Example:
The following contingency table shows a random sample of 321
fatally injured passenger vehicle drivers by age and gender. The
expected frequencies are displayed in parentheses. At α = 0.05,
can you conclude that the drivers’ ages are related to gender in
such accidents?
              Age
Gender    16–20        21–30        31–40        41–50        51–60        61 and older   Total
Male      32 (30.28)   51 (49.12)   52 (57.20)   43 (43.07)   28 (25.57)   10 (10.77)     216
Female    13 (14.72)   22 (23.88)   33 (27.80)   21 (20.93)   10 (12.43)    6 (5.23)      105
Total     45           73           85           64           38           16             321
Chi-Square Independence Test
Example continued:
Because each expected frequency is at least 5 and the drivers were
randomly selected, the chi-square independence test can be used to
test whether the variables are independent.
H0: The drivers’ ages are independent of gender.
Ha: The drivers’ ages are dependent on gender. (Claim)
d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5
With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.
Continued.
Chi-Square Independence Test
Example continued:
Rejection region: χ² > χ²₀ = 11.071 (right-tailed, α = 0.05).

  O        E        O – E     (O – E)²    (O – E)²/E
 32      30.28       1.72      2.9584      0.0977
 51      49.12       1.88      3.5344      0.072
 52      57.20      −5.2      27.04        0.4727
 43      43.07      −0.07      0.0049      0.0001
 28      25.57       2.43      5.9049      0.2309
 10      10.77      −0.77      0.5929      0.0551
 13      14.72      −1.72      2.9584      0.201
 22      23.88      −1.88      3.5344      0.148
 33      27.80       5.2      27.04        0.9727
 21      20.93       0.07      0.0049      0.0002
 10      12.43      −2.43      5.9049      0.4751
  6       5.23       0.77      0.5929      0.1134

χ² = Σ (O − E)²/E ≈ 2.84

Because χ² ≈ 2.84 is not in the rejection region, fail to reject H0.
There is not enough evidence at the 5% level to conclude that age
is dependent on gender in such accidents.
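For reference, a sketch that cross-checks the computation above (SciPy assumed):

from scipy.stats import chi2, chi2_contingency

table = [[32, 51, 52, 43, 28, 10],   # Male, ages 16-20 through 61 and older
         [13, 22, 33, 21, 10, 6]]    # Female

statistic, p_value, df, expected = chi2_contingency(table, correction=False)
critical = chi2.ppf(1 - 0.05, df)

print(round(statistic, 2), df)   # about 2.84 with 5 degrees of freedom
print(round(critical, 3))        # about 11.071
print(statistic > critical)      # False: fail to reject H0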
§ 10.3
Comparing Two Variances
F-Distribution
Let s₁² and s₂² represent the sample variances of two different populations. If both populations are normal and the population variances σ₁² and σ₂² are equal, then the sampling distribution of

F = s₁² / s₂²

is called an F-distribution.
There are several properties of this distribution.
1. The F-distribution is a family of curves each of which is determined
by two types of degrees of freedom: the degrees of freedom
corresponding to the variance in the numerator, denoted d.f.N, and
the degrees of freedom corresponding to the variance in the
denominator, denoted d.f.D.
Continued.
F-Distribution
Properties of the F-distribution continued:
2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.
[Figure: F-distribution curves for (d.f.N, d.f.D) = (1, 8), (8, 26), (16, 7), and (3, 11), plotted over F.]
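These properties can be checked empirically. Below is a rough simulation sketch (not from the text, NumPy assumed): it draws repeated samples from two normal populations with equal variance and forms the ratio F = s₁²/s₂².

import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 9, 27                # sample sizes, so d.f.N = 8 and d.f.D = 26
ratios = []
for _ in range(10_000):
    s1_sq = np.var(rng.normal(0.0, 1.0, n1), ddof=1)   # sample variance from population 1
    s2_sq = np.var(rng.normal(0.0, 1.0, n2), ddof=1)   # sample variance from population 2
    ratios.append(s1_sq / s2_sq)                       # one F value

print(min(ratios) >= 0)   # F values are never negative
print(np.mean(ratios))    # close to d.f.D / (d.f.D - 2) = 26/24, i.e. roughly 1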
Critical Values for the F-Distribution
Finding Critical Values for the F-Distribution
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis test is
   a. one-tailed, use the α F-table.
   b. two-tailed, use the ½α F-table.
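Outside the printed table, the same critical values can be obtained from SciPy's F quantile function; a sketch (not from the text) that handles both cases in step 4:

from scipy.stats import f

def critical_f(alpha, dfn, dfd, two_tailed=False):
    """Right-tail F critical value; a two-tailed test puts alpha/2 in the right tail."""
    tail = alpha / 2 if two_tailed else alpha
    return f.ppf(1 - tail, dfn, dfd)

print(round(critical_f(0.05, 5, 28), 2))                  # 2.56, as in the first example below
print(round(critical_f(0.10, 4, 6, two_tailed=True), 2))  # 4.53, as in the second example below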
Critical Values for the F-Distribution
Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5, and d.f.D = 28.

Appendix B, Table 7: F-Distribution (α = 0.05)

d.f.D: Degrees of              d.f.N: Degrees of freedom, numerator
freedom, denominator        1        2        3        4        5        6
         1                161.4    199.5    215.7    224.6    230.2    234.0
         2                18.51    19.00    19.16    19.25    19.30    19.33
        ...
        27                 4.21     3.35     2.96     2.73     2.57     2.46
        28                 4.20     3.34     2.95     2.71     2.56     2.45
        29                 4.18     3.33     2.93     2.70     2.55     2.43

The critical value is F₀ = 2.56.
Critical Values for the F-Distribution
Example:
Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4, and d.f.D = 6. Because the test is two-tailed, use ½α = ½(0.10) = 0.05 to read the table.

Appendix B, Table 7: F-Distribution (α = 0.05)

d.f.D: Degrees of              d.f.N: Degrees of freedom, numerator
freedom, denominator        1        2        3        4        5        6
         1                161.4    199.5    215.7    224.6    230.2    234.0
         2                18.51    19.00    19.16    19.25    19.30    19.33
         3                10.13     9.55     9.28     9.12     9.01     8.94
         4                 7.71     6.94     6.59     6.39     6.26     6.16
         5                 6.61     5.79     5.41     5.19     5.05     4.95
         6                 5.99     5.14     4.76     4.53     4.39     4.28
         7                 5.59     4.74     4.35     4.12     3.97     3.87

The critical value is F₀ = 4.53.
Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances σ₁² and σ₂² when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is

F = s₁² / s₂²

where s₁² and s₂² represent the sample variances, with s₁² ≥ s₂². The degrees of freedom for the numerator is d.f.N = n₁ – 1 and the degrees of freedom for the denominator is d.f.D = n₂ – 1, where n₁ is the size of the sample having variance s₁² and n₂ is the size of the sample having variance s₂².
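A sketch of the two-sample F-test statistic in code (not from the text; SciPy is assumed for the critical value):

from statistics import variance
from scipy.stats import f

def two_sample_f_test(sample1, sample2, alpha, two_tailed=True):
    """F = s1^2 / s2^2 with the larger sample variance in the numerator."""
    if variance(sample1) < variance(sample2):
        sample1, sample2 = sample2, sample1            # keep s1^2 >= s2^2
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    dfn, dfd = len(sample1) - 1, len(sample2) - 1      # d.f.N = n1 - 1, d.f.D = n2 - 1
    statistic = s1_sq / s2_sq
    tail = alpha / 2 if two_tailed else alpha          # two-tailed tests use alpha/2 in the right tail
    critical = f.ppf(1 - tail, dfn, dfd)
    return statistic, critical, statistic > critical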
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ₁² and σ₂²

In Words                                           In Symbols
1. Identify the claim. State the null and          State H0 and Ha.
   alternative hypotheses.
2. Specify the level of significance.              Identify α.
3. Identify the degrees of freedom.                d.f.N = n₁ – 1, d.f.D = n₂ – 1
4. Determine the critical value.                   Use Table 7 in Appendix B.
Continued.