Chapter 6
Sampling Distributions
6.1 Random Sampling
Def 1 A population consists of the totality of the observations with which we are concerned.
Remark
1. The size of a population may be finite or infinite. For example,
• Finite population: the blood types of the students in a school.
• Infinite population: the daily observations of atmospheric pressure, from the past into the future.
2. The observations from a population have a common distribution. We will use $X_i$ to denote the value of the ith observation.
Def 2 A sample is a subset of a population.
Def 3 A random sample of size n is a collection of n independent random variables with a common distribution.
Remark
The probability distribution of a random sample $X_1, X_2, \dots, X_n$ of size n is described by the joint pdf (jpdf)
\[ f(x_1, x_2, \dots, x_n) = f(x_1) f(x_2) \cdots f(x_n) \]
6.2 Some Important Statistics
Def 4 Any function T (X1 , X2 , . . . , Xn ) of the observations X1 , X2 , . . . , Xn , is called a statistic.
Central Tendency in a sample
Let X1 , X2 , . . . , Xn represent a random sample of size n.
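1. sample mean:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
2. sample median:
\[ \tilde{X} = \begin{cases} Y_{(n+1)/2} & \text{if } n \text{ is odd} \\[4pt] \dfrac{Y_{n/2} + Y_{(n/2)+1}}{2} & \text{if } n \text{ is even} \end{cases} \]
where the $Y_i$'s denote the order statistics corresponding to $X_1, \dots, X_n$.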
3. mode: the value in the sample that occurs most often, that is, with the greatest frequency. The mode may not exist, and when it does it is not necessarily unique.
Example 1 The lengths of time, in minutes, that 10 patients waited in a doctor’s office
before receiving treatment were recorded as follows:
5, 11, 9, 5, 10, 15, 6, 10, 5, 10.
Find
1. the mean;
2. the median;
3. the mode.
sol)
1. $\bar{x} = \frac{1}{10}(5 + 11 + \cdots + 10) = 8.6$ minutes;
2. $\tilde{x} = 9.5$ minutes;
3. $Mo(X) = 5$ and $10$ minutes.
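The three answers of Example 1 can be reproduced with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the notes.

```python
import statistics

# Waiting times (minutes) from Example 1.
waits = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]

mean = statistics.mean(waits)        # arithmetic mean of the sample
median = statistics.median(waits)    # n is even, so the two middle order statistics are averaged
modes = statistics.multimode(waits)  # every value tied for the highest frequency

print(mean, median, modes)
```

`multimode` is used instead of `mode` because this sample has two modes, which illustrates that the mode need not be unique.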
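Facts
1. For unimodal, moderately skewed samples one typically observes $Mo(X) \le \tilde{X} \le \bar{X}$ or $\bar{X} \le \tilde{X} \le Mo(X)$.
2. $\sum_{i=1}^{n} |X_i - \tilde{X}| \le \sum_{i=1}^{n} |X_i - a|$ for any $a$.
3. $\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - a)^2$ for any $a$.
4. $\bar{X} - Mo(X) \approx 3(\bar{X} - \tilde{X})$ (an empirical rule of thumb).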
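Facts 2 and 3 say that the median minimizes the sum of absolute deviations and the mean minimizes the sum of squared deviations. A minimal numerical illustration follows, assuming the data of Example 1 and an arbitrary grid of candidate centers $a$; the grid and the helper names are assumptions made for this sketch, and a grid check is not a proof.

```python
import statistics

data = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]  # data of Example 1

def abs_dev(a):
    """Sum of absolute deviations of the data from a."""
    return sum(abs(x - a) for x in data)

def sq_dev(a):
    """Sum of squared deviations of the data from a."""
    return sum((x - a) ** 2 for x in data)

med = statistics.median(data)
mean = statistics.mean(data)

# Candidate centers a = 0.0, 0.1, ..., 20.0 (an arbitrary illustrative grid).
grid = [k / 10 for k in range(201)]

# The small slack 1e-9 only guards against floating-point rounding.
fact2_holds = all(abs_dev(med) <= abs_dev(a) + 1e-9 for a in grid)
fact3_holds = all(sq_dev(mean) <= sq_dev(a) + 1e-9 for a in grid)
print(fact2_holds, fact3_holds)
```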
Variability in the Sample
1. range — Let X(n) = max(X1 , . . . , Xn ) and X(1) = min(X1 , . . . , Xn ). The range of a
sample is defined to be X(n) − X(1) .
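2. sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right] = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right\} \]
3. sample standard deviation: $S = \sqrt{S^2}$.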
Example 2 The IQ’s of a random sample of five members of a sorority are 108, 112, 127,
118, and 113. Find
1. the range;
2. the sample variance;
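3. the sample standard deviation.
sol)
1. Range = 127 − 108 = 19.
2. With
\[ \sum_{i=1}^{5} x_i^2 = 108^2 + 112^2 + 127^2 + 118^2 + 113^2 = 67{,}030, \qquad \sum_{i=1}^{5} x_i = 108 + 112 + 127 + 118 + 113 = 578, \]
the sample variance is
\[ S^2 = \frac{1}{4} \left( 67030 - \frac{578^2}{5} \right) = 53.3 \]
3. $S = \sqrt{53.3} \approx 7.30$.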
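The definition of $S^2$ and its computational shortcut can be compared on the data of Example 2. This is a sketch with illustrative variable names, using only the standard library.

```python
import math
import statistics

iq = [108, 112, 127, 118, 113]  # data of Example 2
n = len(iq)

sample_range = max(iq) - min(iq)

# Definition: S^2 = sum((x - mean)^2) / (n - 1)
var_def = statistics.variance(iq)

# Shortcut: S^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)
var_shortcut = (sum(x * x for x in iq) - sum(iq) ** 2 / n) / (n - 1)

std = math.sqrt(var_def)
print(sample_range, var_def, var_shortcut, round(std, 3))
```

Both formulas give the same variance up to floating-point rounding; the shortcut needs only the two running sums $\sum x_i$ and $\sum x_i^2$.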
6.3 Sampling Distributions
Sampling Distributions of Means
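1. $E[\bar{X}] = \mu$, $Var[\bar{X}] = \sigma^2/n$.
2. By the central limit theorem, as $n \to \infty$ the distribution of
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
converges to $N(0, 1)$.
3. Suppose there are two populations with means and variances $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, respectively, and that independent random samples of sizes $n_1$ and $n_2$ are drawn from them. Then
\[ E[\bar{X}_1 \pm \bar{X}_2] = \mu_1 \pm \mu_2, \qquad Var[\bar{X}_1 \pm \bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
As $n_1 \to \infty$ and $n_2 \to \infty$, the distribution of
\[ Z = \frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
converges to $N(0, 1)$.
4. If both populations are normally distributed, the normality in 2 and 3 holds exactly for every sample size, not only in the limit.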
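Example 3 If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and standard deviation equal to 5, what is the probability that a sample mean $\bar{X}$ will fall in the interval from $\mu_{\bar{X}} - 1.9\sigma_{\bar{X}}$ to $\mu_{\bar{X}} - 0.4\sigma_{\bar{X}}$?
sol)
• We want to find $P(\mu_{\bar{X}} - 1.9\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}})$.
• Clearly, $\bar{X} \sim N(50, 5^2/16) \equiv N(50, 1.5625)$.
• Hence
\[ \begin{aligned} P(\mu_{\bar{X}} - 1.9\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}}) &= P(\bar{X} < \mu_{\bar{X}} - 0.4\sigma_{\bar{X}}) - P(\bar{X} < \mu_{\bar{X}} - 1.9\sigma_{\bar{X}}) \\ &= \Phi(-0.4) - \Phi(-1.9) \\ &= 0.3446 - 0.0287 = 0.3159 \end{aligned} \]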
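The table lookup in Example 3 can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch; the names `xbar`, `lo` and `hi` are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of the mean: N(50, 5^2 / 16), i.e. standard error 5 / sqrt(16).
xbar = NormalDist(mu=50, sigma=5 / 16 ** 0.5)

lo = xbar.mean - 1.9 * xbar.stdev
hi = xbar.mean - 0.4 * xbar.stdev

prob = xbar.cdf(hi) - xbar.cdf(lo)
print(round(prob, 4))
```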
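Example 4 Given the discrete uniform population
\[ f(x) = \begin{cases} \dfrac{1}{3} & x = 2, 4, 6 \\ 0 & \text{otherwise} \end{cases} \]
find the probability that a random sample of size 54, selected with replacement, will yield a sample mean greater than 4.1 but less than 4.4.
sol)
• We want to find $P(4.1 < \bar{X} < 4.4)$.
• The population mean and variance are
\[ \mu_X = \frac{1}{3}(2 + 4 + 6) = 4, \qquad \sigma_X^2 = \frac{1}{3}(2^2 + 4^2 + 6^2) - 4^2 = \frac{8}{3} \]
• Since n = 54 > 30, the central limit theorem gives approximately $\bar{X} \sim N(4, (2/9)^2)$, because $\sigma_X^2/n = (8/3)/54 = (2/9)^2$. Hence,
\[ \begin{aligned} P(4.1 < \bar{X} < 4.4) &= P(\bar{X} < 4.4) - P(\bar{X} \le 4.1) \\ &= \Phi\left(\frac{4.4 - 4}{2/9}\right) - \Phi\left(\frac{4.1 - 4}{2/9}\right) \\ &= \Phi(1.8) - \Phi(0.45) \\ &= 0.9641 - 0.6736 = 0.2905 \end{aligned} \]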
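The steps of Example 4 (population moments, standard error, normal approximation) can be sketched as follows. The code only reproduces the central-limit approximation used in the solution, not the exact distribution of the sample mean; variable names are illustrative.

```python
from statistics import NormalDist

values = [2, 4, 6]   # support of the discrete uniform population
n = 54               # sample size

mu = sum(values) / len(values)                              # population mean
var = sum(v * v for v in values) / len(values) - mu ** 2    # population variance E[X^2] - mu^2
se = (var / n) ** 0.5                                       # standard error of the sample mean

approx = NormalDist(mu, se)   # CLT approximation to the distribution of the sample mean
prob = approx.cdf(4.4) - approx.cdf(4.1)
print(mu, round(var, 4), round(se, 4), round(prob, 4))
```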
Example 5 A random sample of size 25 is taken from a normal population having a mean
of 80 and a standard deviation of 5. A second random sample of size 36 is taken from a
different normal population having a mean of 75 and a standard deviation of 3. Find the
probability that the sample mean computed from the 25 measurements will exceed the sample
mean computed from the 36 measurements by at least 3.4 but less than 5.9.
• n1 = 25, µ1 = 80, σ1 = 5.
• n2 = 36, µ2 = 75, σ2 = 3.
• We want to find P (3.4 ≤ X 1 − X 2 < 5.9).
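• Clearly, $\bar{X}_1 \sim N(80, 5^2/25)$ and $\bar{X}_2 \sim N(75, 3^2/36)$ are independent, so $(\bar{X}_1 - \bar{X}_2) \sim N(5, 5/4)$.
• Hence
\[ \begin{aligned} P(3.4 \le \bar{X}_1 - \bar{X}_2 < 5.9) &= P(\bar{X}_1 - \bar{X}_2 < 5.9) - P(\bar{X}_1 - \bar{X}_2 < 3.4) \\ &= \Phi\left(\frac{5.9 - 5}{\sqrt{5/4}}\right) - \Phi\left(\frac{3.4 - 5}{\sqrt{5/4}}\right) \\ &= \Phi(0.8050) - \Phi(-1.4311) \\ &= 0.7896 - 0.0762 = 0.7134 \end{aligned} \]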
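The distribution of the difference of two independent sample means in Example 5 can be built directly, since `statistics.NormalDist` supports subtraction of independent normal variables. This is a sketch with illustrative names; it evaluates the normal CDF numerically rather than from a rounded table.

```python
from statistics import NormalDist

xbar1 = NormalDist(80, 5 / 25 ** 0.5)   # mean of the first sample, n1 = 25
xbar2 = NormalDist(75, 3 / 36 ** 0.5)   # mean of the second sample, n2 = 36

# Difference of independent normals: the means subtract, the variances add.
diff = xbar1 - xbar2

prob = diff.cdf(5.9) - diff.cdf(3.4)
print(diff.mean, round(diff.variance, 4), round(prob, 4))
```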
Remark
Let Z ∼ N (0, 1). The critical value zα is defined to be
P (Z ≥ zα ) = α
Several often used values of $z_\alpha$ are
$z_{0.005} = 2.58$
$z_{0.01} = 2.33$
$z_{0.025} = 1.96$
$z_{0.05} = 1.645$
We can find $z_\alpha$ from the probability table.
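Instead of a table, the critical value can be obtained from the inverse CDF, because $P(Z \ge z_\alpha) = \alpha$ is the same as $\Phi(z_\alpha) = 1 - \alpha$. A minimal sketch follows; the function name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_alpha, the point with upper-tail probability alpha under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.005, 0.01, 0.025, 0.05):
    print(alpha, round(z_critical(alpha), 3))
```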
Chi-square distribution
Def 5 A random variable X is said to possess a chi-square distribution with ν degrees of freedom, denoted by $X \sim \chi^2_\nu$, if it has the pdf
\[ f(x) = \begin{cases} \dfrac{2^{-\nu/2}}{\Gamma(\nu/2)} \, x^{(\nu/2)-1} e^{-x/2} & x > 0 \\ 0 & x \le 0 \end{cases} \]
Remark
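1. $\chi^2_\nu \equiv \Gamma(\nu/2, 1/2)$, that is, a gamma distribution with shape $\nu/2$ and rate $1/2$.
2. $X \sim N(0, 1) \implies X^2 \sim \chi^2_1$.
3. Let $X_i \sim N(0, 1)$, $i = 1, \dots, n$, be n independent random variables. Then,
\[ \sum_{i=1}^{n} X_i^2 \sim \chi^2_n. \]
4. Let $X_i \sim N(\mu, \sigma^2)$, $i = 1, \dots, n$, be n independent random variables. Then,
\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n. \]
5. Let $\chi^2 \sim \chi^2_\nu$. The chi-square critical value $\chi^2_{\alpha,\nu}$ is defined such that
\[ P(\chi^2 \ge \chi^2_{\alpha,\nu}) = \alpha \]
We can find the critical value from the table.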
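Example 6 Given a sample of size 16 from a normal population $N(50, 10^2)$,
1. find b such that $P(\bar{X} - 50 \le b) = 0.05$;
2. find c such that $P(|\bar{X} - 50| \ge c) = 0.05$;
3. find d such that $P\left( \sum_{i=1}^{n} (X_i - 50)^2 \ge d \right) = 0.05$.
sol)
1.
\[ \begin{aligned} & P(\bar{X} - 50 \le b) = 0.05 \\ \implies\ & P\left( \frac{\bar{X} - 50}{10/\sqrt{16}} \le \frac{b}{10/\sqrt{16}} \right) = 0.05 \\ \implies\ & P\left( Z \le \frac{b}{2.5} \right) = 0.05 \\ \implies\ & \frac{b}{2.5} = -z_{0.05} \quad \text{since } P(Z \le -z_{0.05}) = 0.05 \\ \implies\ & b = -z_{0.05} \cdot 2.5 = -1.645 \cdot 2.5 = -4.1125 \end{aligned} \]
2.
\[ \begin{aligned} & P(|\bar{X} - 50| \ge c) = 0.05 \\ \implies\ & P\left( \left| \frac{\bar{X} - 50}{10/\sqrt{16}} \right| \ge \frac{c}{10/\sqrt{16}} \right) = 0.05 \\ \implies\ & P\left( |Z| \ge \frac{c}{2.5} \right) = 0.05 \\ \implies\ & P\left( Z \ge \frac{c}{2.5} \right) = 0.025 \\ \implies\ & \frac{c}{2.5} = z_{0.025} \quad \text{since } P(Z \ge z_{0.025}) = 0.025 \\ \implies\ & c = z_{0.025} \cdot 2.5 = 1.96 \cdot 2.5 = 4.9 \end{aligned} \]
3. Here n = 16 and $\sum_{i=1}^{16} \left( \frac{X_i - 50}{10} \right)^2 \sim \chi^2_{16}$.
\[ \begin{aligned} & P\left( \sum_{i=1}^{16} (X_i - 50)^2 \ge d \right) = 0.05 \\ \implies\ & P\left( \sum_{i=1}^{16} \left( \frac{X_i - 50}{10} \right)^2 \ge \frac{d}{100} \right) = 0.05 \\ \implies\ & P\left( \chi^2 \ge \frac{d}{100} \right) = 0.05 \\ \implies\ & \frac{d}{100} = \chi^2_{0.05,16} \quad \text{since } P(\chi^2 \ge \chi^2_{0.05,16}) = 0.05 \\ \implies\ & d = 100 \cdot \chi^2_{0.05,16} = 100 \cdot 26.296 = 2629.6 \end{aligned} \]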
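The chi-square critical value in part 3 can be computed without a table. The sketch below is one possible stdlib-only approach, not a library routine: it evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function and then solves $P(\chi^2 \ge c) = \alpha$ by bisection. The function names `chi2_cdf` and `chi2_critical` are hypothetical.

```python
import math

def chi2_cdf(x, nu):
    """P(X <= x) for X ~ chi-square(nu), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:      # terms eventually shrink, so the loop terminates
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical(alpha, nu):
    """Solve P(X >= c) = alpha for c by bisection."""
    lo, hi = 0.0, float(nu)
    while 1.0 - chi2_cdf(hi, nu) > alpha:   # grow the bracket until the upper tail is below alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

crit = chi2_critical(0.05, 16)   # upper 5% point of chi-square with 16 degrees of freedom
d = 100 * crit
print(round(crit, 3), round(d, 1))
```

Note that the upper 5% point is needed here; the value 7.962 found in the same table row is the point with lower-tail probability 0.05, that is, $\chi^2_{0.95,16}$.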
Sampling Distributions of S 2
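Let $X_1, \dots, X_n$ be a random sample with $E[X_i] = \mu$ and $Var[X_i] = \sigma^2$. Then,
1.
\[ E[\bar{X}] = E\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu \]
\[ Var[\bar{X}] = \frac{1}{n^2} Var\left[ \sum_{i=1}^{n} X_i \right] = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n} \]
2. For any constant $a$,
\[ \sum_{i=1}^{n} (X_i - a)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - a)^2 \]
In particular,
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \]
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \]
3.
\[ E\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = E\left[ \sum_{i=1}^{n} (X_i - \mu)^2 \right] - nE\left[ (\bar{X} - \mu)^2 \right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\sigma^2 \]
4.
\[ E[S^2] = E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \sigma^2 \]
5. If the population is normal, $\bar{X}$ and $S^2$ are independent.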
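6.
\[ \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 = \frac{(n - 1)S^2}{\sigma^2} + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2 \]
7. Therefore, if $X_i \sim N(\mu, \sigma^2)$, then
• $\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n$;
• $\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2 \sim \chi^2_1$;
• $\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$;
• $E\left[ \frac{(n-1)S^2}{\sigma^2} \right] = n - 1$;
• $Var\left[ \frac{(n-1)S^2}{\sigma^2} \right] = 2(n - 1)$.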
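The last two bullets can be checked by simulation. The sketch below draws many normal samples, forms $W = (n-1)S^2/\sigma^2$ for each, and compares the empirical mean and variance of $W$ with $n - 1$ and $2(n - 1)$. The parameter values, seed and replication count are arbitrary assumptions chosen for illustration, and a simulation only supports the result within sampling error.

```python
import random

random.seed(0)               # fixed seed so the run is repeatable
mu, sigma, n = 3.0, 1.0, 5   # illustrative parameter choices (assumptions)
reps = 20000

def sample_variance(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

w_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    w_values.append((n - 1) * sample_variance(sample) / sigma ** 2)

w_mean = sum(w_values) / reps
w_var = sample_variance(w_values)

# Compare with the theoretical values n - 1 = 4 and 2(n - 1) = 8.
print(round(w_mean, 2), round(w_var, 2))
```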
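Example 7 A manufacturer of car batteries guarantees that his batteries will last, on the average, µ = 3 years with a standard deviation of σ = 1 year. Assume that the battery lifetime follows a normal distribution. A sample of size 5 is taken to estimate the mean lifetime and the variance.
1. Find b such that $P(|\bar{X} - \mu| \ge b) = 0.05$;
2. Find c such that $P(S^2 \ge c) = 0.05$;
3. Find d such that $P\left( \frac{1}{5} \sum_{i=1}^{5} (X_i - \mu)^2 \ge d \right) = 0.05$.
sol)
1. $\bar{X} \sim N(3, 1/5)$.
\[ \begin{aligned} & P(|\bar{X} - 3| \ge b) = 0.05 \\ \implies\ & P\left( \left| \frac{\bar{X} - 3}{\sqrt{1/5}} \right| \ge \frac{b}{\sqrt{1/5}} \right) = 0.05 \\ \implies\ & P\left( |Z| \ge \frac{b}{\sqrt{1/5}} \right) = 0.05 \\ \implies\ & \frac{b}{\sqrt{1/5}} = z_{0.025} \quad \text{since } P(|Z| \ge z_{0.025}) = 0.05 \\ \implies\ & b = z_{0.025} \cdot \sqrt{1/5} = 1.96(0.4472) = 0.8765 \end{aligned} \]
2. $\frac{4S^2}{1} \sim \chi^2_4$.
\[ \begin{aligned} & P(S^2 \ge c) = 0.05 \\ \implies\ & P\left( \frac{4S^2}{1} \ge \frac{4c}{1} \right) = 0.05 \\ \implies\ & P(\chi^2 \ge 4c) = 0.05 \\ \implies\ & 4c = \chi^2_{0.05,4} \quad \text{since } P(\chi^2 \ge \chi^2_{0.05,4}) = 0.05 \\ \implies\ & c = \chi^2_{0.05,4}/4 = 9.488/4 = 2.372 \end{aligned} \]
3. $\sum_{i=1}^{5} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_5$.
\[ \begin{aligned} & P\left( \frac{1}{5} \sum_{i=1}^{5} (X_i - \mu)^2 \ge d \right) = 0.05 \\ \implies\ & P\left( \frac{1}{5} \sum_{i=1}^{5} \left( \frac{X_i - \mu}{\sigma} \right)^2 \ge \frac{d}{\sigma^2} \right) = 0.05 \\ \implies\ & P\left( \chi^2 \ge \frac{5d}{\sigma^2} \right) = 0.05 \\ \implies\ & \frac{5d}{\sigma^2} = \chi^2_{0.05,5} \quad \text{since } P(\chi^2 \ge \chi^2_{0.05,5}) = 0.05 \\ \implies\ & d = \sigma^2 \chi^2_{0.05,5}/5 = 11.070/5 = 2.214 \end{aligned} \]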
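The answers to parts 1 and 2 of Example 7 can be sanity-checked by Monte Carlo: with b = 0.8765 and c = 2.372, each of the two events should occur in roughly 5% of simulated samples. This is a sketch with an arbitrary seed and replication count; agreement is only up to sampling error.

```python
import random

random.seed(1)                # fixed seed so the run is repeatable
mu, sigma, n = 3.0, 1.0, 5    # parameters stated in Example 7
b, c = 0.8765, 2.372          # answers to parts 1 and 2
reps = 40000

hits_mean = 0
hits_var = 0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    hits_mean += abs(xbar - mu) >= b   # event of part 1
    hits_var += s2 >= c                # event of part 2

p_mean = hits_mean / reps
p_var = hits_var / reps
print(round(p_mean, 3), round(p_var, 3))
```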
t-distribution
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then $\bar{X} \sim N(\mu, \sigma^2/n)$ or, equivalently,
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \]
In most cases, the value of $\sigma^2$ is not available, so we use $S^2$ to estimate $\sigma^2$. The t-distribution describes the distribution of the statistic T defined by
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
Def 6 Let $Z \sim N(0, 1)$ and $W \sim \chi^2_\nu$ be two independent random variables. The random variable
\[ T = \frac{Z}{\sqrt{W/\nu}} \]
is said to possess a t-distribution with ν degrees of freedom and is denoted by $T \sim t_\nu$.
Facts
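Let $T \sim t_\nu$. Then:
1.
\[ f_T(t) = \frac{\Gamma\left( \frac{\nu+1}{2} \right)}{\Gamma\left( \frac{\nu}{2} \right) \sqrt{\pi\nu}} \left( 1 + \frac{t^2}{\nu} \right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty \]
2.
\[ E[T] = 0 \ (\nu > 1), \qquad Var[T] = \frac{\nu}{\nu - 2}, \quad \nu > 2 \]
3. As $\nu \to \infty$, $t_\nu$ converges to $N(0, 1)$.
4. Let $X_1, \dots, X_n$ be a random sample from a normal population $N(\mu, \sigma^2)$. Then
• $\bar{X}$ and $S^2$ are independent.
• $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, $W = \dfrac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
• Hence,
\[ T = \frac{Z}{\sqrt{W/(n - 1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \]
5. The value of $t_{\alpha,\nu}$, defined by
\[ P(T \ge t_{\alpha,\nu}) = \alpha, \]
can be found from the probability table.
6. $t_{1-\alpha,\nu} = -t_{\alpha,\nu}$.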
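Example 8 The gas consumption (liters/hr) of automobiles manufactured by a company is normally distributed, but the mean µ and the variance $\sigma^2$ are unknown. A random sample of size 16 is taken to estimate µ by $\bar{X}$. Find c in terms of the sample standard deviation s such that $P(|\bar{X} - \mu| < c) = 0.95$.
sol)
\[ \begin{aligned} & P(|\bar{X} - \mu| < c) = 0.95 \\ \implies\ & 1 - 2P(\bar{X} - \mu \ge c) = 0.95 \\ \implies\ & P(\bar{X} - \mu \ge c) = 0.025 \\ \implies\ & P\left( \frac{\bar{X} - \mu}{S/\sqrt{16}} \ge \frac{c}{S/\sqrt{16}} \right) = 0.025 \\ \implies\ & P\left( T \ge \frac{c}{S/\sqrt{16}} \right) = 0.025, \quad T \sim t_{15} \\ \implies\ & \frac{c}{s/\sqrt{16}} = t_{0.025,15} \quad \text{since } P(T \ge t_{0.025,15}) = 0.025 \\ \implies\ & c = t_{0.025,15} \cdot s/\sqrt{16} = (2.131/4)s = 0.5328s \end{aligned} \]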
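The critical value $t_{0.025,15}$ can also be obtained numerically from the pdf in Fact 1. The sketch below is one possible stdlib-only approach, not a library routine: it integrates the t density with Simpson's rule and solves $P(T \ge c) = \alpha$ by bisection. The function names and the number of integration steps are assumptions made for illustration.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

def t_upper_tail(c, nu, steps=2000):
    """P(T >= c) for c >= 0: one half minus the Simpson-rule integral of the pdf over [0, c]."""
    h = c / steps
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(alpha, nu):
    """Solve P(T >= c) = alpha by bisection (for alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > alpha:   # grow the bracket until the tail is below alpha
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

t_crit = t_critical(0.025, 15)
coef = t_crit / math.sqrt(16)   # c = coef * s in Example 8
print(round(t_crit, 3), round(coef, 4))
```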
F -distribution
Def 7 Let $W_1 \sim \chi^2_{\nu_1}$ and $W_2 \sim \chi^2_{\nu_2}$ be two independent random variables. Then the random variable
\[ F = \frac{W_1/\nu_1}{W_2/\nu_2} \]
is said to possess an F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom and is denoted by $F \sim F_{\nu_1,\nu_2}$.
Facts
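1.
\[ f_F(x) = \begin{cases} \dfrac{\Gamma\left( \frac{\nu_1+\nu_2}{2} \right) \nu_1^{\nu_1/2} \nu_2^{\nu_2/2}}{\Gamma\left( \frac{\nu_1}{2} \right) \Gamma\left( \frac{\nu_2}{2} \right)} \cdot \dfrac{x^{(\nu_1/2)-1}}{(\nu_2 + \nu_1 x)^{(\nu_1+\nu_2)/2}} & x > 0 \\[6pt] 0 & x \le 0 \end{cases} \]
2. Let $X_1, \dots, X_m$ be a sample of size m arising from a normal population $N(\mu_1, \sigma_1^2)$, and let $Y_1, \dots, Y_n$ be a sample of size n arising from a normal population $N(\mu_2, \sigma_2^2)$. Suppose that the two populations are independent. Then
\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,n-1} \]
pf)
• Let
\[ W_1 = \frac{(m - 1)S_1^2}{\sigma_1^2}; \qquad W_2 = \frac{(n - 1)S_2^2}{\sigma_2^2}. \]
• Clearly, $W_1 \sim \chi^2_{m-1}$ and $W_2 \sim \chi^2_{n-1}$, and they are independent. Hence,
\[ \frac{W_1/(m - 1)}{W_2/(n - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{m-1,n-1} \]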
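3. In the above, if $\sigma_1^2 = \sigma_2^2$, we have
\[ \frac{S_1^2}{S_2^2} \sim F_{m-1,n-1} \]
4. The critical value $f_{\alpha,\nu_1,\nu_2}$ is defined such that
\[ P(F \ge f_{\alpha,\nu_1,\nu_2}) = \alpha. \]
5. $f_{1-\alpha,\nu_2,\nu_1} = \dfrac{1}{f_{\alpha,\nu_1,\nu_2}}$.
pf)
• Let $F = \dfrac{W_1/\nu_1}{W_2/\nu_2} \sim F_{\nu_1,\nu_2}$. Then
\[ \begin{aligned} 1 - \alpha &= 1 - P(F \ge f_{\alpha,\nu_1,\nu_2}) \\ &= 1 - P\left( \frac{W_1/\nu_1}{W_2/\nu_2} \ge f_{\alpha,\nu_1,\nu_2} \right) \\ &= 1 - P\left( \frac{W_2/\nu_2}{W_1/\nu_1} \le \frac{1}{f_{\alpha,\nu_1,\nu_2}} \right) \\ &= P\left( \frac{W_2/\nu_2}{W_1/\nu_1} \ge \frac{1}{f_{\alpha,\nu_1,\nu_2}} \right) \end{aligned} \]
• Since $\dfrac{W_2/\nu_2}{W_1/\nu_1} \sim F_{\nu_2,\nu_1}$, the definition of the F critical value gives
\[ f_{1-\alpha,\nu_2,\nu_1} = \frac{1}{f_{\alpha,\nu_1,\nu_2}} \]
6. The F critical value can be found from the probability table.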
Example 9 $f_{0.95,11,8} = ?$
sol) From the probability table, $f_{0.05,8,11} = 2.95$. Hence,
\[ f_{0.95,11,8} = \frac{1}{f_{0.05,8,11}} = \frac{1}{2.95} = 0.34 \]
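The reciprocal relation used in Example 9 can be verified numerically from the F pdf in Fact 1. The sketch below is one possible stdlib-only approach, not a library routine: it integrates the density with Simpson's rule and finds critical values by bisection. It assumes $\nu_1 > 2$, so that the density is zero at the origin; the function names and step count are illustrative assumptions.

```python
import math

def f_pdf(x, v1, v2):
    """Density of F(v1, v2) for x > 0, in the form given in Fact 1."""
    log_c = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1) + (v2 / 2) * math.log(v2))
    return math.exp(log_c + (v1 / 2 - 1) * math.log(x)
                    - (v1 + v2) / 2 * math.log(v2 + v1 * x))

def f_cdf(c, v1, v2, steps=4000):
    """P(F <= c) by Simpson's rule on [0, c]; assumes v1 > 2 so the density vanishes at 0."""
    h = c / steps
    total = f_pdf(c, v1, v2)   # the endpoint x = 0 contributes nothing when v1 > 2
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return total * h / 3

def f_critical(alpha, v1, v2):
    """Solve P(F >= c) = alpha for c by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, v1, v2) > alpha:   # grow the bracket until the tail is below alpha
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, v1, v2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

upper = f_critical(0.05, 8, 11)   # f_{0.05,8,11}
lower = f_critical(0.95, 11, 8)   # f_{0.95,11,8}, computed independently
print(round(upper, 2), round(lower, 3), round(1 / upper, 3))
```

Computing `lower` directly and comparing it with `1 / upper` checks Fact 5 rather than assuming it.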
Example 10 Two samples of size 5 are taken from two independent normal populations. Assume that the two populations have the same variance.
1. Find b such that $P(S_1^2/S_2^2 \ge b) = 0.05$.
2. Find c such that $P(S_1^2/S_2^2 \le c) = 0.05$.
sol) Clearly, $S_1^2/S_2^2 \sim F_{4,4}$.
1.
\[ \begin{aligned} & P(S_1^2/S_2^2 \ge b) = 0.05 \\ \implies\ & P(F \ge b) = 0.05 \\ \implies\ & b = f_{0.05,4,4} = 6.39 \quad \text{since } P(F \ge f_{0.05,4,4}) = 0.05 \end{aligned} \]
2.
\[ \begin{aligned} & P(S_1^2/S_2^2 \le c) = 0.05 \\ \implies\ & 1 - P(S_1^2/S_2^2 > c) = 0.05 \\ \implies\ & P(F \ge c) = 0.95 \\ \implies\ & c = f_{0.95,4,4} \quad \text{since } P(F \ge f_{0.95,4,4}) = 0.95 \\ \implies\ & c = \frac{1}{f_{0.05,4,4}} = \frac{1}{6.39} = 0.1565 \end{aligned} \]
6.4 Exercises
1. A finite population contains the six numbers 1, 2, 3, 4, 5, 6.
(a) Find the mean µ and variance $\sigma^2$ of the numbers in the population.
(b) A sample of size 2 is taken at random without replacement from the population. Find the distribution of the sample mean $\bar{X}$.
(c) Show that the sample mean in (b) satisfies $E[\bar{X}] = \mu$ and $Var[\bar{X}] = \dfrac{N - n}{N - 1} \cdot \dfrac{\sigma^2}{n}$.
2. Let $X_1, X_2, X_3, X_4$ be a sample from a normal population $N(0, 1)$. What value of c makes the statistic
\[ T = \frac{c(X_1 + X_2)}{\sqrt{X_3^2 + X_4^2}} \]
follow a t-distribution? Determine the number of degrees of freedom for the t-distribution.
3. Let $Y_1, Y_2$ be a random sample from a normal population $N(0, 1)$, and let $X_1, X_2$ be a random sample from another independent normal population $N(1, 1)$. Find
(a) the distribution of $\bar{X} + \bar{Y}$;
(b) the distribution of $\dfrac{Y_1 + Y_2}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}$;
(c) the distribution of $\dfrac{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}{2}$;
(d) the distribution of $\dfrac{(X_2 + X_1 - 2)^2}{(X_2 - X_1)^2}$.
4. Let $S_1^2$ and $S_2^2$ be the sample variances from two independent normal populations $N(\mu_1, 10)$ and $N(\mu_2, 15)$, respectively. Suppose $n_1 = n_2 = 10$.
(a) Find b such that $P(S_1^2/S_2^2 \ge b) = 0.01$.
(b) Find c such that $P(S_1^2/S_2^2 \le c) = 0.05$.
5. Find (a) $\chi^2_{0.05,20}$ (b) $\chi^2_{0.95,15}$ (c) $\chi^2_{0.025,10}$ (d) $t_{0.025,10}$ (e) $t_{0.95,20}$ (f) $f_{0.05,10,11}$ (g) $f_{0.99,6,8}$.
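6. Let $\bar{X}_1$, $\bar{X}_2$ and $S_1^2$, $S_2^2$ be the sample means and sample variances from two independent normal populations with a common variance $\sigma^2$. Define $S_p^2$, called the pooled variance, by
\[ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]
Show that
(a) $E[S_p^2] = \sigma^2$;
(b) $\dfrac{(n_1 + n_2 - 2)S_p^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$.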
7. Let $X_1, \dots, X_n$ be a sample from a Bernoulli population with success probability p. Let
\[ Y = \sum_{i=1}^{n} X_i \]
(a) Determine the distribution of Y.
(b) If n = 100 and p = 0.05, use the other two possible distributions to approximate $P(Y = 3)$.
(c) Let $\hat{p} = \bar{X} = Y/n$. For n > 40, find c such that $P(|\hat{p} - 0.05| < c) = 0.95$.
8. The lifetime of a system is $Y = X_1 + X_2 + X_3 + X_4$, where $X_1, X_2, X_3, X_4$ are the lifetimes of its subsystems. Suppose that the subsystems are independent and that each lifetime is exponentially distributed with an MTBF (mean time between failures) of 3 hours. Find the probability that the system survives at least 18 hours.
9. Let X1 , . . . , X100 be a sample from a normal population N (0, 1).
(a) Find the pdf of the sample variance
\[ S^2 = \frac{1}{99} \sum_{i=1}^{100} (X_i - \bar{X})^2. \]
(b) Find E[S].
(c) Find P (X1 + X2 − X3 − X4 ≥ 2).
10. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size n = 4 from a
continuous symmetric distribution with mean µ and variance σ 2 . Find the probability
of Y3 < µ.
11. Suppose $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ population, where µ and $\sigma^2$ are both unknown. Let
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2. \]
If $X_{n+1}$ is an additional observation, find the constant k so that $k(\bar{X}_n - X_{n+1})/S_n$ has a t-distribution.
12. Let $X_i \sim N(0, 1)$, $i = 1, 2, \dots, 6$, be mutually independent. Let
\[ Y = (2X_1 - \sqrt{2}X_2 + 2X_3)^2 + (\sqrt{3}X_4 - 2X_5 - \sqrt{3}X_6)^2. \]
Find the constant k so that kY has a chi-square distribution.