Dr. Neal, WKU
MATH 183
The Chi-Square Distributions
The chi-square distributions can be used in statistics to analyze the standard deviation σ
of a normally distributed measurement and to test the goodness of fit of various
population models to a set of data.
A chi-square distribution is based on a parameter known as the degrees of freedom n,
where n is an integer greater than or equal to 1. Such a random variable is denoted by
X ~ χ²(n). The χ²(n) distribution is defined to be the distribution of the sum of the squares
of n independent standard normal random variables.
For example, suppose X₁, ..., Xₙ are independent normally distributed
measurements having mean µᵢ and standard deviation σᵢ for i = 1, ..., n. These
measurements could be the heights or IQ scores of various groups of people. By
subtracting the mean and then dividing by the standard deviation, we convert each
measurement into a standard normal random variable:

$$Z_i = \frac{X_i - \mu_i}{\sigma_i} \sim N(0, 1), \quad \text{for } 1 \le i \le n.$$

So Z₁ ~ N(0, 1), and its distribution graph is the familiar bell-shaped curve,
symmetric about the origin. Then Z₁² ~ χ²(1). Its plot consists of positive
values concentrated near the origin, and it has mean 1 and variance 2.
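The mean 1 and variance 2 follow from two standard facts about Z ~ N(0, 1), namely E[Z²] = 1 and E[Z⁴] = 3. A short derivation, added here as a supporting step, also shows why a sum of n independent squares has mean n and variance 2n:

```latex
\begin{aligned}
E[Z_1^2] &= \operatorname{Var}(Z_1) + \big(E[Z_1]\big)^2 = 1 + 0 = 1, \\
\operatorname{Var}(Z_1^2) &= E[Z_1^4] - \big(E[Z_1^2]\big)^2 = 3 - 1 = 2, \\
E\Big[\textstyle\sum_{i=1}^{n} Z_i^2\Big] &= n \cdot 1 = n, \qquad
\operatorname{Var}\Big(\textstyle\sum_{i=1}^{n} Z_i^2\Big) = n \cdot 2 = 2n \quad \text{(by independence)}.
\end{aligned}
```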
[Figure: graphs of the standard normal distribution and of the χ²(1), χ²(2), and χ²(n) distributions.]
By standardizing, squaring, and summing random measurements from the
respective normal populations, we obtain a chi-square distribution with n degrees of
freedom:

$$\chi^2(n) = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{X_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{X_n - \mu_n}{\sigma_n}\right)^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2.$$
For n ≥ 3, the distribution graphs are right-skewed bell-shaped curves defined on [0, ∞),
and the location of the maximum moves farther to the right as n increases. The mean is
now n, the variance is 2n, and the standard deviation is √(2n); the maximum (mode)
occurs at x = n − 2.
X ~ χ²(n) = Z₁² + Z₂² + ... + Zₙ²
Mean = n
Variance = 2n
Standard Deviation = √(2n)
Mode = n − 2 (for n ≥ 3)
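The definition and the summary above can be checked by simulation. The following is a minimal Monte Carlo sketch in standard-library Python, not part of the original notes: the four populations (means and standard deviations loosely suggesting heights and IQ scores) are hypothetical choices, and `simulate_chi_square` is a hypothetical helper name. Each draw standardizes, squares, and sums one measurement from each population, so the draws should behave like χ²(4), with sample mean near n = 4 and sample variance near 2n = 8.

```python
import math
import random


def simulate_chi_square(mus, sigmas, trials, seed=0):
    """Return `trials` draws of sum_i ((X_i - mu_i) / sigma_i)^2 with X_i ~ N(mu_i, sigma_i).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    draws = []
    for _ in range(trials):
        total = 0.0
        for mu, sigma in zip(mus, sigmas):
            x = rng.gauss(mu, sigma)  # one normal measurement
            z = (x - mu) / sigma      # standardize: Z ~ N(0, 1)
            total += z * z            # square and accumulate
        draws.append(total)
    return draws


# Hypothetical populations (mean, standard deviation); n = 4 measurements per draw.
mus = [68.0, 64.0, 100.0, 100.0]
sigmas = [3.5, 2.8, 15.0, 15.0]
draws = simulate_chi_square(mus, sigmas, trials=100_000)

n = len(mus)
mean = math.fsum(draws) / len(draws)
var = math.fsum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"sample mean     {mean:.3f}  (theory n = {n})")
print(f"sample variance {var:.3f}  (theory 2n = {2 * n})")
```

Because the estimates are random, they will not match n and 2n exactly; raising `trials` tightens the agreement.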
The theoretical distribution curve is given by

$$f(x) = C_n\, x^{n/2 - 1} e^{-x/2}, \quad \text{for } x \ge 0,$$

where Cₙ is a constant that depends on n, given by

$$C_n = \begin{cases} \dfrac{1}{2^{n/2}\left(\frac{n}{2} - 1\right)!} & \text{for } n \text{ even}, \\[2ex] \dfrac{2^{(n-2)/2}\left(\frac{n-1}{2}\right)!}{\sqrt{\pi}\,(n-1)!} & \text{for } n \text{ odd}. \end{cases}$$
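Both cases of Cₙ are instances of the single formula Cₙ = 1/(2^(n/2) Γ(n/2)), because Γ(n/2) = (n/2 − 1)! for even n and Γ(n/2) = (n − 1)! √π / (2^(n−1) ((n − 1)/2)!) for odd n. The sketch below (standard-library Python; `chi2_constant` and `chi2_pdf` are hypothetical names) codes the two cases as written, checks them against `math.gamma` for n = 1, ..., 30, and scans the χ²(10) curve on a grid to confirm that its maximum sits at n − 2 = 8.

```python
import math

# Sketch: chi2_constant and chi2_pdf are hypothetical helper names, not a library API.


def chi2_constant(n: int) -> float:
    """C_n from the even/odd formulas in the text (n a positive integer)."""
    if n % 2 == 0:
        return 1.0 / (2 ** (n / 2) * math.factorial(n // 2 - 1))
    m = (n - 1) // 2
    return 2 ** ((n - 2) / 2) * math.factorial(m) / (math.sqrt(math.pi) * math.factorial(n - 1))


def chi2_pdf(x: float, n: int) -> float:
    """f(x) = C_n x^(n/2 - 1) e^(-x/2) for x >= 0, and 0 for x < 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # x^(n/2 - 1) is unbounded for n = 1, equals 1 for n = 2, and is 0 for n >= 3.
        return math.inf if n == 1 else (chi2_constant(2) if n == 2 else 0.0)
    return chi2_constant(n) * x ** (n / 2 - 1) * math.exp(-x / 2)


# Self-check: both cases agree with C_n = 1 / (2^(n/2) * Gamma(n/2)).
for n in range(1, 31):
    expected = 1.0 / (2 ** (n / 2) * math.gamma(n / 2))
    assert math.isclose(chi2_constant(n), expected, rel_tol=1e-12)

# Locate the maximum of the chi-square(10) curve on a grid of step 0.01 over (0, 40].
mode_on_grid = max(range(1, 4001), key=lambda k: chi2_pdf(k / 100, 10)) / 100
print(mode_on_grid)  # 8.0
```

The grid search is only a numerical check of the mode formula, not a proof of it.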
A chi-square curve can be plotted on the TI calculator using the built-in χ²pdf( command
from the DISTR menu. For example, to graph the χ²(10) curve, enter χ²pdf(X,10) into the
Y= screen.
To compute P(a ≤ X ≤ b) for X ~ χ²(n), enter χ²cdf(a, b, n) or Shadeχ²(a, b, n).
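Off the calculator, the same probabilities can be computed in a few lines. This is a minimal Python sketch under stated assumptions, not necessarily how the TI computes them: it assumes an integer number of degrees of freedom (as in the text) and uses the standard closed forms of the chi-square CDF, a finite sum for even n and the error function plus a finite sum for odd n. The names `chi2_cdf` and `chi2_interval_prob` are hypothetical. The three printed values are the probabilities computed in Example 1.

```python
import math

# Sketch: a stand-in for the TI command chi2cdf(a, b, n), assuming integer n >= 1.
# Helper names are hypothetical.


def chi2_cdf(x: float, n: int) -> float:
    """P(X <= x) for X ~ chi-square(n), via closed forms for integer n."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    y = x / 2.0
    if n % 2 == 0:
        # Even n: 1 - e^(-y) * sum_{j=0}^{n/2 - 1} y^j / j!
        term = total = 1.0
        for j in range(1, n // 2):
            term *= y / j
            total += term
        return 1.0 - math.exp(-y) * total
    # Odd n: erf(sqrt(y)) - e^(-y) * sum_{j=0}^{(n-3)/2} y^(j + 1/2) / Gamma(j + 3/2)
    term, total = math.sqrt(y) / math.gamma(1.5), 0.0
    for j in range((n - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return math.erf(math.sqrt(y)) - math.exp(-y) * total


def chi2_interval_prob(a: float, b: float, n: int) -> float:
    """P(a <= X <= b) for X ~ chi-square(n), like chi2cdf(a, b, n)."""
    return chi2_cdf(b, n) - chi2_cdf(a, n)


print(round(chi2_interval_prob(6, 10, 10), 5))         # middle region: 0.37477
print(round(chi2_interval_prob(0, 6, 10), 4))          # left tail:     0.1847
print(round(chi2_interval_prob(10, math.inf, 10), 4))  # right tail:    0.4405
```

The even-n branch is exact up to rounding; the odd-n branch relies on `math.erf`, so very large d.f. would be better served by a library routine.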
Example 1. Let X ~ χ²(10). (a) Where does the maximum of the curve occur? (b)
Compute P(6 ≤ X ≤ 10). Is there symmetry at the outer tails; i.e., does P(0 ≤ X ≤ 6) =
P(X ≥ 10)? (c) Find the left and right bounds that contain the middle 90% of the distribution.
Solution. (a) For X ~ χ²(10), the maximum (mode) occurs at x = n − 2 = 10 − 2 = 8. (b)
From the TI output (χ²cdf(6, 10, 10)), we see that P(6 ≤ X ≤ 10) ≈ 0.37477. Also, the left tail
is P(0 ≤ X ≤ 6) ≈ 0.1847, and the right tail is P(X ≥ 10) ≈ 0.4405. So the two tails outside
the inner region 6 ≤ X ≤ 10 are not symmetric.
(c) For 90% to lie in the middle of the distribution, we must have 5% in each tail. The
values where these occur (chi-square scores) can be found in the 10 d.f. row of the table
below. In this case, the values are about 3.940 and 18.31.
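Left and Right Chi-Square Scores for 80%, 90%, 95%, and 98% Intervals
(L = probability in the left tail, R = probability in the right tail. For an interval
containing the middle r of the distribution, use the L and R columns with tail probability
(1 − r)/2; for example, a 90% interval uses the 0.05 L and 0.05 R columns.)

| d.f. | 0.01 L | 0.025 L | 0.05 L | 0.10 L | 0.10 R | 0.05 R | 0.025 R | 0.01 R |
|------|--------|---------|--------|--------|--------|--------|---------|--------|
| 1    | 0.000  | 0.001   | 0.004  | 0.016  | 2.706  | 3.841  | 5.024   | 6.635  |
| 2    | 0.020  | 0.051   | 0.103  | 0.211  | 4.605  | 5.991  | 7.378   | 9.210  |
| 3    | 0.115  | 0.216   | 0.352  | 0.584  | 6.251  | 7.815  | 9.348   | 11.34  |
| 4    | 0.297  | 0.484   | 0.711  | 1.064  | 7.779  | 9.488  | 11.14   | 13.28  |
| 5    | 0.554  | 0.831   | 1.145  | 1.610  | 9.236  | 11.07  | 12.83   | 15.09  |
| 6    | 0.872  | 1.237   | 1.635  | 2.204  | 10.64  | 12.59  | 14.45   | 16.81  |
| 7    | 1.239  | 1.690   | 2.167  | 2.833  | 12.02  | 14.07  | 16.01   | 18.48  |
| 8    | 1.646  | 2.180   | 2.733  | 3.490  | 13.36  | 15.51  | 17.54   | 20.09  |
| 9    | 2.088  | 2.700   | 3.325  | 4.168  | 14.68  | 16.92  | 19.02   | 21.67  |
| 10   | 2.558  | 3.247   | 3.940  | 4.865  | 15.99  | 18.31  | 20.48   | 23.21  |
| 11   | 3.053  | 3.816   | 4.575  | 5.578  | 17.28  | 19.68  | 21.92   | 24.72  |
| 12   | 3.571  | 4.404   | 5.226  | 6.304  | 18.55  | 21.03  | 23.34   | 26.22  |
| 13   | 4.107  | 5.009   | 5.892  | 7.042  | 19.81  | 22.36  | 24.74   | 27.69  |
| 14   | 4.660  | 5.629   | 6.571  | 7.790  | 21.06  | 23.68  | 26.12   | 29.14  |
| 15   | 5.229  | 6.262   | 7.261  | 8.547  | 22.31  | 25.00  | 27.49   | 30.58  |
| 16   | 5.812  | 6.908   | 7.962  | 9.312  | 23.54  | 26.30  | 28.84   | 32.00  |
| 17   | 6.408  | 7.564   | 8.672  | 10.08  | 24.77  | 27.59  | 30.19   | 33.41  |
| 18   | 7.015  | 8.231   | 9.390  | 10.86  | 25.99  | 28.87  | 31.53   | 34.80  |
| 19   | 7.633  | 8.907   | 10.12  | 11.65  | 27.20  | 30.14  | 32.85   | 36.19  |
| 20   | 8.260  | 9.591   | 10.85  | 12.44  | 28.41  | 31.41  | 34.17   | 37.57  |
| 21   | 8.897  | 10.28   | 11.59  | 13.24  | 29.62  | 32.67  | 35.48   | 38.93  |
| 22   | 9.542  | 10.98   | 12.34  | 14.04  | 30.81  | 33.92  | 36.78   | 40.29  |
| 23   | 10.20  | 11.69   | 13.09  | 14.85  | 32.01  | 35.17  | 38.08   | 41.64  |
| 24   | 10.86  | 12.40   | 13.85  | 15.66  | 33.20  | 36.42  | 39.36   | 42.98  |
| 25   | 11.52  | 13.12   | 14.61  | 16.47  | 34.38  | 37.65  | 40.65   | 44.31  |
| 26   | 12.20  | 13.84   | 15.38  | 17.29  | 35.56  | 38.88  | 41.92   | 45.64  |
| 27   | 12.88  | 14.57   | 16.15  | 18.11  | 36.74  | 40.11  | 43.19   | 46.96  |
| 28   | 13.56  | 15.31   | 16.93  | 18.94  | 37.92  | 41.34  | 44.46   | 48.28  |
| 29   | 14.26  | 16.05   | 17.71  | 19.77  | 39.09  | 42.56  | 45.72   | 49.59  |
| 30   | 14.95  | 16.79   | 18.49  | 20.60  | 40.26  | 43.77  | 46.98   | 50.89  |
| 40   | 22.16  | 24.43   | 26.51  | 29.05  | 51.80  | 55.76  | 59.34   | 63.69  |
| 50   | 29.71  | 32.36   | 34.76  | 37.69  | 63.17  | 67.50  | 71.42   | 76.15  |
| 60   | 37.48  | 40.48   | 43.19  | 46.46  | 74.40  | 79.08  | 83.30   | 88.38  |
| 70   | 45.44  | 48.76   | 51.74  | 55.33  | 85.53  | 90.53  | 95.02   | 100.4  |
| 80   | 53.54  | 57.15   | 60.39  | 64.28  | 96.58  | 101.9  | 106.6   | 112.3  |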
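Each entry of the table is a score q whose tail area is the column heading, so it can be recovered by solving P(χ²(n) ≤ q) = p numerically. The sketch below (standard-library Python, integer d.f. assumed; `chi2_quantile` and `table_row` are hypothetical names) brackets q and then bisects on the closed-form CDF. It rebuilds the d.f. = 10 row, which contains the 90% bounds 3.940 and 18.31 used in Example 1; the printed values should agree with the table up to its rounding.

```python
import math

# Sketch: hypothetical helpers; integer degrees of freedom assumed throughout.


def chi2_cdf(x: float, n: int) -> float:
    """P(X <= x) for X ~ chi-square(n), via closed forms for integer n."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    y = x / 2.0
    if n % 2 == 0:
        term = total = 1.0  # even n: finite Poisson-type sum
        for j in range(1, n // 2):
            term *= y / j
            total += term
        return 1.0 - math.exp(-y) * total
    term, total = math.sqrt(y) / math.gamma(1.5), 0.0  # odd n: erf plus finite sum
    for j in range((n - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return math.erf(math.sqrt(y)) - math.exp(-y) * total


def chi2_quantile(p: float, n: int, tol: float = 1e-9) -> float:
    """Score q with P(X <= q) = p for X ~ chi-square(n), found by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * n)
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains q
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def table_row(n: int) -> list:
    """Left scores for tails 0.01, 0.025, 0.05, 0.10, then right scores for 0.10, 0.05, 0.025, 0.01."""
    left = [chi2_quantile(t, n) for t in (0.01, 0.025, 0.05, 0.10)]
    right = [chi2_quantile(1 - t, n) for t in (0.10, 0.05, 0.025, 0.01)]
    return left + right


print([round(q, 3) for q in table_row(10)])
```

Bisection is used here because the CDF is monotone and it needs no derivative; a production routine would more likely use a Newton-type refinement.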
Theorems
I. Let {x₁, x₂, ..., xₙ} denote the collection of all random samples of size n from
normally distributed measurements having variance σ². Let

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

be the distribution of all possible sample variances. Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$

Thus, with a normally distributed measurement, provided σ² is known, we can evaluate
P(a ≤ S ≤ b) for 0 ≤ a ≤ b by

$$\begin{aligned}
P(a \le S \le b) &= P(a^2 \le S^2 \le b^2) \\
&= P\left(\frac{(n-1)a^2}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)b^2}{\sigma^2}\right) \\
&= P\left(\frac{(n-1)a^2}{\sigma^2} \le \chi^2(n-1) \le \frac{(n-1)b^2}{\sigma^2}\right).
\end{aligned}$$
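Theorem I reduces a probability about S to a χ²(n − 1) probability. A minimal sketch, assuming σ is known and 0 ≤ a ≤ b; `prob_sample_sd_between` is a hypothetical name, and `chi2_cdf` is the integer-d.f. closed form used here as a stand-in for χ²cdf. The usage lines evaluate Example 2 below and Exercise 2.

```python
import math

# Sketch: hypothetical helpers; integer degrees of freedom assumed.


def chi2_cdf(x: float, n: int) -> float:
    """P(X <= x) for X ~ chi-square(n), via closed forms for integer n."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    y = x / 2.0
    if n % 2 == 0:
        term = total = 1.0
        for j in range(1, n // 2):
            term *= y / j
            total += term
        return 1.0 - math.exp(-y) * total
    term, total = math.sqrt(y) / math.gamma(1.5), 0.0
    for j in range((n - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return math.erf(math.sqrt(y)) - math.exp(-y) * total


def prob_sample_sd_between(a: float, b: float, n: int, sigma: float) -> float:
    """P(a <= S <= b) for samples of size n from a normal measurement with known sigma.

    Uses (n - 1) S^2 / sigma^2 ~ chi-square(n - 1); requires 0 <= a <= b.
    """
    df = n - 1
    lower = df * a**2 / sigma**2   # (n - 1) a^2 / sigma^2
    upper = df * b**2 / sigma**2   # (n - 1) b^2 / sigma^2
    return chi2_cdf(upper, df) - chi2_cdf(lower, df)


print(round(prob_sample_sd_between(13, 17, 46, 15), 3))    # Example 2 (about 0.794)
print(round(prob_sample_sd_between(2.8, 4.2, 26, 3.5), 4))  # Exercise 2 (compare with the answer key)
```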
II. Let S² be the sample variance from a random sample of size n of a normally
distributed measurement having variance σ². A confidence interval for σ², with level
of confidence r = 1 − α, is given by

$$\frac{(n-1)S^2}{R} \le \sigma^2 \le \frac{(n-1)S^2}{L},$$

where L and R are the left and right bounds of the χ²(n − 1) distribution that give
probability r in the middle. (By Theorem I, P(L ≤ (n − 1)S²/σ² ≤ R) = r, and solving these
inequalities for σ² reverses them.) A confidence interval for σ is

$$\sqrt{\frac{(n-1)S^2}{R}} \le \sigma \le \sqrt{\frac{(n-1)S^2}{L}}.$$
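Theorem II is plain arithmetic once L and R are known. In this sketch (standard-library Python; `sigma_confidence_interval` is a hypothetical name), L and R are passed in from the table, as the text does, rather than computed; the function forms the interval for σ² and takes square roots to get the interval for σ.

```python
import math


def sigma_confidence_interval(n: int, s: float, left: float, right: float) -> tuple:
    """Confidence interval for sigma from Theorem II (hypothetical helper).

    `left` and `right` are the chi-square(n - 1) scores L and R, e.g. read from
    the table, that leave probability r in the middle.
    """
    ss = (n - 1) * s**2                       # (n - 1) S^2
    var_low, var_high = ss / right, ss / left  # interval for sigma^2
    return math.sqrt(var_low), math.sqrt(var_high)


# Example 3 inputs: n = 20, S = 3.96, 98% confidence -> d.f. 19, L = 7.633, R = 36.19
low, high = sigma_confidence_interval(20, 3.96, 7.633, 36.19)
print(f"{low:.4f} <= sigma <= {high:.4f}")  # 2.8693 <= sigma <= 6.2478
```

Dividing by R gives the lower endpoint and dividing by L the upper one, which is the inequality reversal noted in the theorem.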
III. To test the null hypothesis H₀: σ = M for a normally distributed measurement, we
obtain the sample deviation S from a random sample of size n. The test statistic, computed
under H₀, is then

$$x = \frac{(n-1)S^2}{\sigma^2} = \frac{(n-1)S^2}{M^2},$$

which is compared with the χ²(n − 1) distribution. Compute the (left-tail) P-value
P(χ²(n − 1) ≤ x) for the alternative Hₐ: σ < M, and compute the (right-tail) P-value
P(χ²(n − 1) ≥ x) for the alternative Hₐ: σ > M.
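The test in Theorem III can be packaged as one function. In this sketch, `sd_test` is a hypothetical name, the alternative ("less" for Hₐ: σ < M, "greater" for Hₐ: σ > M) must be chosen before looking at the P-value, and `chi2_cdf` is again the integer-d.f. closed form standing in for χ²cdf. The usage lines run Example 4 below and Exercise 4.

```python
import math

# Sketch: hypothetical helpers; integer degrees of freedom assumed.


def chi2_cdf(x: float, n: int) -> float:
    """P(X <= x) for X ~ chi-square(n), via closed forms for integer n."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    y = x / 2.0
    if n % 2 == 0:
        term = total = 1.0
        for j in range(1, n // 2):
            term *= y / j
            total += term
        return 1.0 - math.exp(-y) * total
    term, total = math.sqrt(y) / math.gamma(1.5), 0.0
    for j in range((n - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return math.erf(math.sqrt(y)) - math.exp(-y) * total


def sd_test(n: int, s: float, m: float, alternative: str) -> tuple:
    """Chi-square test of H0: sigma = M; returns (test statistic x, P-value)."""
    df = n - 1
    x = df * s**2 / m**2                  # (n - 1) S^2 / M^2
    if alternative == "less":
        p = chi2_cdf(x, df)               # left tail  P(chi2(n - 1) <= x)
    elif alternative == "greater":
        p = 1.0 - chi2_cdf(x, df)         # right tail P(chi2(n - 1) >= x)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return x, p


x4, p4 = sd_test(25, 13.96, 15, "less")    # Example 4: about 20.78737 and 0.3488
print(round(x4, 5), round(p4, 4))
xe, pe = sd_test(16, 4.26, 3, "greater")   # Exercise 4
print(round(xe, 3), round(pe, 3))
```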
Example 2. Random samples of size 46 are taken from a measurement that is N(100, 15).
What is P(13 ≤ S ≤ 17)?
Example 3. From a normally distributed measurement, a sample of size 20 yields
S = 3.96. Find a 98% confidence interval for the true standard deviation σ.
Example 4. From a normally distributed measurement, a sample of size 25 yields a
sample deviation of 13.96. Is there evidence to reject the hypothesis H₀: σ = 15?
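Solutions
Example 2: Here n = 46 and σ = 15, so n − 1 = 45 and σ² = 225.

$$\begin{aligned}
P(13 \le S \le 17) &= P(13^2 \le S^2 \le 17^2) \\
&= P\left(\frac{(n-1)13^2}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)17^2}{\sigma^2}\right) \\
&= P\left(\frac{45 \times 169}{225} \le \chi^2(45) \le \frac{45 \times 289}{225}\right) \\
&= P(33.8 \le \chi^2(45) \le 57.8) \approx 0.794
\end{aligned}$$

(using χ²cdf(33.8, 57.8, 45)).

Example 3: For 98% confidence with n − 1 = 19 degrees of freedom, the table gives
L = 7.633 and R = 36.19 (1% in each tail). Then

$$\sqrt{\frac{(n-1)S^2}{R}} \le \sigma \le \sqrt{\frac{(n-1)S^2}{L}}; \quad \text{hence,} \quad \sqrt{\frac{19 \times 3.96^2}{36.19}} \le \sigma \le \sqrt{\frac{19 \times 3.96^2}{7.633}},$$

or 2.8693 ≤ σ ≤ 6.2478.

Example 4: Because S = 13.96 is less than 15, we use the alternative Hₐ: σ < 15. The test
statistic is

$$x = \frac{(n-1)S^2}{\sigma^2} = \frac{24 \times 13.96^2}{15^2} \approx 20.78737,$$

which is compared with χ²(n − 1) = χ²(24). The P-value is P(χ²(24) ≤ 20.78737) ≈ 0.348765
(χ²cdf(0, 20.78737, 24)). If σ = 15 were true, there would still be a 34.8765% chance of
obtaining a sample deviation of 13.96 or lower with a sample of size 25. There is not
enough evidence to reject H₀.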
Exercises
1. Let X ~ χ²(15). Find (a) P(13 ≤ X ≤ 17), (b) P(X < 13), and (c) P(X > 17). Show a
graph for each. (d) Find the bounds that contain the middle 95% of the distribution.
2. Adult heights are found to be normally distributed with mean µ = 68 inches and
standard deviation σ = 3.5 inches. Suppose various random samples of size n = 26 are
collected. Compute P(2.8 ≤ S ≤ 4.2).
3. From a normally distributed measurement, a sample of size 25 yields a sample
deviation of 14.85. Find a 95% confidence interval for the true standard deviation.
4. From a normally distributed measurement, a sample of size 16 yields S = 4.26. Is
there evidence to reject the hypothesis H₀: σ = 3?
Answers:
1. (a) 0.2834 (b) 0.3977 (c) 0.3189 (d) L = 6.262 and R = 27.49
2. P(25 × 2.8²/3.5² ≤ χ²(25) ≤ 25 × 4.2²/3.5²) = P(16 ≤ χ²(25) ≤ 36) ≈ 0.8432
3. Use √(24 × 14.85²/39.36) ≤ σ ≤ √(24 × 14.85²/12.40) to obtain 11.6 ≤ σ ≤ 20.66.
4. Test stat = 30.246, P(χ²(15) ≥ 30.246) ≈ 0.011. If σ = 3 were true, there would be only a
1.1% chance of getting an S of 4.26 or higher with a sample of size 16. We can reject H₀ in
favor of Hₐ: σ > 3.