Chapter 7. Inference for the Mean of a Population
7.1. Inference for the Mean of a Population
Both confidence intervals and tests of significance for the mean $\mu$ of a normal
population are based on the sample mean $\bar{x}$, which estimates the unknown $\mu$. The
sampling distribution of the sample mean depends on $\sigma$. This fact causes no difficulty when
$\sigma$ is known. When $\sigma$ is unknown, however, we must estimate $\sigma$ even though we
are primarily interested in $\mu$. The sample standard deviation $s$ is used to estimate the
population standard deviation $\sigma$.


The t distributions
Suppose that a random sample of size $n$ is drawn from an $N(\mu, \sigma^2)$ population.
Then the one-sample t statistic
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
has the t distribution with $n-1$ degrees of freedom.
1. Generally, we use the t distribution instead of the standard normal distribution to calculate
a confidence interval and to carry out a test of hypothesis about the unknown mean $\mu$
when the standard deviation $\sigma$ (or variance $\sigma^2$) of the population is unknown.
2. In a one-sample problem (confidence interval or test of hypothesis), the degrees of
freedom are $n-1$.
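The one-sample t statistic and its degrees of freedom can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_t`, the null value `mu0`, and the data values are hypothetical and do not come from the chapter.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 from a list of observations."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical data, invented for illustration only.
data = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat, df = one_sample_t(data, mu0=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t would then be compared with a critical value from the t table with n-1 degrees of freedom.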
7.2 Comparing Two Means
 Notations :
Population 1 : mean $\mu_1$, variance $\sigma_1^2$
Population 2 : mean $\mu_2$, variance $\sigma_2^2$
Sample from population 1 : sample size $n_1$, mean $\bar{x}_1$, variance $s_1^2$
Sample from population 2 : sample size $n_2$, mean $\bar{x}_2$, variance $s_2^2$
 Properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$
1) $E(\bar{X}_1 - \bar{X}_2) \equiv \mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
2) $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) \equiv \sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
3) When $n_1$ and $n_2$ are large enough, the distribution of $\bar{X}_1 - \bar{X}_2$ is approximately
normal.
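Property 2 can be checked with a short derivation. It assumes that the two samples are independent random samples, an assumption the properties rely on but do not state explicitly.

```latex
\begin{aligned}
\mathrm{Var}(\bar{X}_1 - \bar{X}_2)
  &= \mathrm{Var}(\bar{X}_1) + (-1)^2\,\mathrm{Var}(\bar{X}_2)
     && \text{(independence of the two samples)} \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
     && \text{(since } \mathrm{Var}(\bar{X}_i) = \sigma_i^2 / n_i \text{)}
\end{aligned}
```

The variances add even though the means are subtracted, which is why a plus sign appears under every square root in the test statistics of this section.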
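Case 1 : Test and confidence interval for a difference between two
normal population means with known $\sigma_1$ and $\sigma_2$

Null hypothesis : $H_0 : \mu_1 - \mu_2 = \Delta_0$
Test statistic :
\[ z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]

Alternative Hypothesis                 Rejection region for level $\alpha$ test
$H_a : \mu_1 - \mu_2 > \Delta_0$       $z \ge z_{\alpha}$
$H_a : \mu_1 - \mu_2 < \Delta_0$       $z \le -z_{\alpha}$
$H_a : \mu_1 - \mu_2 \ne \Delta_0$     either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$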
 Example 1 :
Analysis of a random sample consisting of $n_1 = 20$ specimens of cold-rolled
steel to determine yield strengths resulted in a sample average strength of
$\bar{x} = 29.8$ ksi. A second random sample of $n_2 = 25$ two-sided galvanized steel
specimens gave a sample average strength of $\bar{y} = 34.7$ ksi. Assuming that the
two yield-strength distributions are normal with $\sigma_1 = 4.0$ and $\sigma_2 = 5.0$, do
the data indicate that the corresponding true average yield strengths $\mu_1$ and $\mu_2$
are different? Let's carry out a test at significance level $\alpha = 0.01$.
1. Parameter of interest : $\mu_1 - \mu_2$, the difference between the true average strengths
for the two types of steel.
2. Null hypothesis : $H_0 : \mu_1 - \mu_2 = 0$
3. Alternative hypothesis : $H_a : \mu_1 - \mu_2 \ne 0$
4. Test statistic : With $\Delta_0 = 0$,
\[ z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]
5. Rejection region : The form of $H_a$ implies use of a two-tailed test with rejection
region either $z \ge z_{0.005} = 2.58$ or $z \le -z_{0.005} = -2.58$.
6. The computed value is
\[ z = \frac{29.8 - 34.7}{\sqrt{16.0/20 + 25.0/25}} = \frac{-4.90}{1.34} = -3.66 \]
(about $-3.65$ if the denominator is not rounded). Since $-3.66 < -2.58$, the
computed z falls in the lower tail of the rejection region. $H_0$ is therefore
rejected at level 0.01 in favor of the conclusion that $\mu_1 \ne \mu_2$. The sample data
strongly suggest that the true average yield strength for cold-rolled steel differs
from that for galvanized steel.
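The six steps of Example 1 can be reproduced numerically. The sketch below uses only the Python standard library; the function name `two_sample_z` is a hypothetical helper, not something defined in the chapter. Without intermediate rounding the statistic comes out near -3.65 rather than -3.66, and the conclusion is unchanged.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when both sigmas are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (xbar1 - xbar2 - delta0) / se


alpha = 0.01
z = two_sample_z(29.8, 34.7, 4.0, 5.0, 20, 25)   # data of Example 1
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}, about 2.58
reject = abs(z) >= z_crit                        # two-tailed rejection rule
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

`NormalDist().inv_cdf` replaces the table lookup of $z_{0.005}$; any normal table gives the same critical value to two decimals.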
 Confidence Interval :
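When both population distributions are normal, standardizing $\bar{X}_1 - \bar{X}_2$ gives a random
variable Z with a standard normal distribution. Since the area under the z curve between
$-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1 - \alpha$, it follows that
\[ P\left( -z_{\alpha/2} < \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} < z_{\alpha/2} \right) = 1 - \alpha , \]
or, equivalently,
\[ P\left( \bar{X}_1 - \bar{X}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar{X}_1 - \bar{X}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right) = 1 - \alpha . \]
This implies that a $100(1-\alpha)\%$ CI for $\mu_1 - \mu_2$ is
\[ \left( \bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} ,\;\; \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right) . \]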
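As an illustration, the interval can be evaluated for the steel data of Example 1 at the 99% level. The chapter does not compute this interval itself; the helper name `z_interval` and the choice of confidence level are assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95):
    """CI for mu1 - mu2 when both population sigmas are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)          # z_{alpha/2}
    half_width = z * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Example 1 data: cold-rolled vs. galvanized steel, 99% confidence.
lower, upper = z_interval(29.8, 34.7, 4.0, 5.0, 20, 25, conf=0.99)
print(f"99% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Both endpoints are negative, so the interval excludes 0. That agrees with rejecting $H_0 : \mu_1 - \mu_2 = 0$ at level 0.01 in the two-tailed test.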
Case 2 : Two-Sample t test and Confidence Interval
1) Assume $\sigma_1 \ne \sigma_2$
The natural estimator of $\mu_1 - \mu_2$ is still $\bar{X}_1 - \bar{X}_2$, and the variance of this
estimator can be expressed as
\[ \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} . \]
$\sigma_1$ and $\sigma_2$, however, are unknown, so we need to estimate both of them. Thus
\[ \widehat{\mathrm{Var}}(\bar{X}_1 - \bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \equiv V_1 + V_2 . \]


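Null hypothesis : $H_0 : \mu_1 - \mu_2 = \Delta_0$
Test statistic :
\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]
with
\[ \text{d.f.} = \text{truncated } \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} , \qquad V_i = \frac{s_i^2}{n_i} . \]

Alternative Hypothesis                 Rejection region for level $\alpha$ test
$H_a : \mu_1 - \mu_2 > \Delta_0$       $t \ge t_{\alpha, df}$
$H_a : \mu_1 - \mu_2 < \Delta_0$       $t \le -t_{\alpha, df}$
$H_a : \mu_1 - \mu_2 \ne \Delta_0$     either $t \ge t_{\alpha/2, df}$ or $t \le -t_{\alpha/2, df}$

 Confidence Interval :
\[ \left( \bar{x}_1 - \bar{x}_2 - t_{\alpha/2, df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} ,\;\; \bar{x}_1 - \bar{x}_2 + t_{\alpha/2, df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right) \]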
Example 2 :
Suppose that $\mu_1$ and $\mu_2$ are true mean stopping distances (ft) at 50 mph for cars
of a certain type equipped with two different types of braking systems. Use the t test at
significance level 0.01 to test $H_0 : \mu_1 - \mu_2 = -10$ vs $H_a : \mu_1 - \mu_2 < -10$ for the
following data: $n_1 = 6$, $\bar{x}_1 = 115.7$, $s_1 = 5.03$, $n_2 = 6$, $\bar{x}_2 = 129.3$, $s_2 = 5.38$.
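Example 2 is stated without a worked solution. A possible computation is sketched below, reading the hypotheses as $H_0 : \mu_1 - \mu_2 = -10$ against $H_a : \mu_1 - \mu_2 < -10$. The helper name `welch_t` is hypothetical, and the critical value $t_{0.01, 9} = 2.821$ is taken from a standard t table rather than computed.

```python
import math


def welch_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Return (t, df) for the two-sample t test without pooling."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    t = (xbar1 - xbar2 - delta0) / math.sqrt(v1 + v2)
    df_exact = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, math.floor(df_exact)  # d.f. is truncated to an integer


t_stat, df = welch_t(115.7, 5.03, 6, 129.3, 5.38, 6, delta0=-10.0)

# Table value t_{0.01, 9}; valid only because df comes out as 9 here.
T_CRIT = 2.821
reject = t_stat <= -T_CRIT          # lower-tailed rejection region
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject}")
```

Under these assumptions t is about -1.20 with 9 degrees of freedom, which does not fall in the rejection region, so $H_0$ would not be rejected at level 0.01.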
2) Assume $\sigma_1 = \sigma_2$.
Since $\sigma^2$ is the variance of both the $X_1$ distribution and the $X_2$
distribution, the best estimator should depend on both the $X_{i1}$'s and $X_{j2}$'s.
Furthermore, more weight should be given to the sample corresponding to the
larger of the two sample sizes. Both $S_1^2$ and $S_2^2$, the two sample variances, are
estimators of $\sigma^2$; a better estimator than either one individually is the weighted
average of the two, called the pooled estimator of the common
variance $\sigma^2$, denoted by
\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} . \]


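Null hypothesis : $H_0 : \mu_1 - \mu_2 = \Delta_0$
Pooled test statistic :
\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
with d.f. $= n_1 + n_2 - 2$.

Alternative Hypothesis                 Rejection region for level $\alpha$ test
$H_a : \mu_1 - \mu_2 > \Delta_0$       $t \ge t_{\alpha, n_1 + n_2 - 2}$
$H_a : \mu_1 - \mu_2 < \Delta_0$       $t \le -t_{\alpha, n_1 + n_2 - 2}$
$H_a : \mu_1 - \mu_2 \ne \Delta_0$     either $t \ge t_{\alpha/2, n_1 + n_2 - 2}$ or $t \le -t_{\alpha/2, n_1 + n_2 - 2}$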
 Pooled Confidence Interval :
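\[ \left( \bar{x}_1 - \bar{x}_2 - t_{\alpha/2, n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} ,\;\; \bar{x}_1 - \bar{x}_2 + t_{\alpha/2, n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \]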
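The pooled procedure can be sketched as follows. Purely for illustration, the stopping-distance summary statistics from Example 2 are reused here under the additional assumption $\sigma_1 = \sigma_2$, with null value -10; the chapter does not work this case. The function names are hypothetical, and the critical value $t_{0.025, 10} = 2.228$ is a standard table value.

```python
import math


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Return (t, df, sp) for the pooled two-sample t test."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    sp = math.sqrt(sp2)                       # pooled standard deviation
    t = (xbar1 - xbar2 - delta0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp


def pooled_interval(xbar1, n1, xbar2, n2, sp, t_crit):
    """Pooled CI for mu1 - mu2; t_crit is t_{alpha/2, n1+n2-2} from a table."""
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


t_stat, df, sp = pooled_t(115.7, 5.03, 6, 129.3, 5.38, 6, delta0=-10.0)
# 95% interval, using the table value t_{0.025, 10} = 2.228.
lower, upper = pooled_interval(115.7, 6, 129.3, 6, sp, t_crit=2.228)
print(f"t = {t_stat:.2f}, df = {df}, sp = {sp:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the two sample sizes are equal, the pooled t statistic coincides with the unpooled one; only the degrees of freedom differ (10 instead of 9).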

Case 3 : Paired t Test
The data consist of n independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, with
$E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let $D_1 = X_1 - Y_1$, $D_2 = X_2 - Y_2$, ..., $D_n = X_n - Y_n$, so the $D_i$'s
are the differences within pairs. The $D_i$'s are assumed to be normally distributed
with mean $\mu_D$ and variance $\sigma_D^2$.
Null hypothesis : $H_0 : \mu_D \equiv E(X - Y) = \mu_1 - \mu_2 = \Delta_0$
Test statistic :
\[ t = \frac{\bar{d} - \Delta_0}{s_D/\sqrt{n}} , \]
where $\bar{d}$ and $s_D$ are the sample mean and
standard deviation of the $d_i$'s. d.f. $= n-1$.
Alternative Hypothesis        Rejection region for level $\alpha$ test
$H_a : \mu_D > \Delta_0$      $t \ge t_{\alpha, n-1}$
$H_a : \mu_D < \Delta_0$      $t \le -t_{\alpha, n-1}$
$H_a : \mu_D \ne \Delta_0$    either $t \ge t_{\alpha/2, n-1}$ or $t \le -t_{\alpha/2, n-1}$
 Confidence Interval :
\[ \left( \bar{d} - t_{\alpha/2, n-1}\, \frac{s_D}{\sqrt{n}} ,\;\; \bar{d} + t_{\alpha/2, n-1}\, \frac{s_D}{\sqrt{n}} \right) \]
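The paired procedure reduces to a one-sample t computation on the differences, as the following sketch shows. The data and the names `paired_t`, `before`, and `after` are hypothetical and chosen only to illustrate the arithmetic.

```python
import math
import statistics


def paired_t(x, y, delta0=0.0):
    """Return (t, df) for the paired t test of H0: mu_D = delta0."""
    d = [xi - yi for xi, yi in zip(x, y)]     # within-pair differences
    n = len(d)
    dbar = statistics.mean(d)
    s_d = statistics.stdev(d)                 # sample SD of the differences
    t = (dbar - delta0) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements, invented for illustration only.
before = [12, 15, 11, 14, 13]
after = [10, 14, 12, 11, 11]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Only the n differences enter the computation, so the degrees of freedom are n-1 rather than anything based on 2n observations.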

Case 4 : Population Proportion
Having presented methods for comparing the means of two different populations, we now
turn to the comparison of two population proportions. The notation for this problem is an
extension of the notation used in the corresponding one-population problem.
Let $X_1 \sim \mathrm{Bin}(n_1, \pi_1)$ and $X_2 \sim \mathrm{Bin}(n_2, \pi_2)$ with $X_1$
and $X_2$ independent variables, and let $P_i = X_i / n_i$ denote the sample proportions.
Then
\[ E(P_1 - P_2) = \pi_1 - \pi_2 \quad \text{and} \quad \mathrm{Var}(P_1 - P_2) = \frac{\pi_1 (1 - \pi_1)}{n_1} + \frac{\pi_2 (1 - \pi_2)}{n_2} . \]
$\pi_1$ and $\pi_2$, however, are unknown, so we need to estimate them. Assuming that
$\pi_1 = \pi_2 = \pi_c$, instead of separate samples of size $n_1$ and $n_2$ from two different
populations, we really have a single sample of size $n_1 + n_2$ from one population
with proportion $\pi_c$. Since the total number of individuals in this combined sample
having the characteristic of interest is $X_1 + X_2$, the estimator of $\pi_c$ is
\[ \hat{\pi}_c \equiv p_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1}{n_1 + n_2}\, p_1 + \frac{n_2}{n_1 + n_2}\, p_2 . \]

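Null hypothesis : $H_0 : \pi_1 - \pi_2 = 0$
Test statistic :
\[ z = \frac{p_1 - p_2}{\sqrt{p_c (1 - p_c) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]
when $n_1$ and $n_2$ are large enough.

Alternative Hypothesis            Rejection region for level $\alpha$ test
$H_a : \pi_1 - \pi_2 > 0$         $z \ge z_{\alpha}$
$H_a : \pi_1 - \pi_2 < 0$         $z \le -z_{\alpha}$
$H_a : \pi_1 - \pi_2 \ne 0$       either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$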
 Confidence Interval:
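\[ \left( p_1 - p_2 - z_{\alpha/2}\sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} ,\;\; p_1 - p_2 + z_{\alpha/2}\sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} \right) \]
This CI is valid provided that $n_i \pi_i \ge 5$ and $n_i (1 - \pi_i) \ge 5$, $i = 1, 2$.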
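The test and the interval for two proportions use different standard errors: the test pools the two samples under $H_0$, while the interval does not. The sketch below makes that difference explicit. The counts (60 of 100 and 45 of 100) and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 - pi2 = 0, using the pooled proportion p_c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)                # pooled estimate of pi_c
    se0 = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se0


def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for pi1 - pi2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half_width, p1 - p2 + half_width


# Hypothetical counts, invented for illustration only.
z = two_proportion_z(60, 100, 45, 100)
lower, upper = two_proportion_interval(60, 100, 45, 100, conf=0.95)
print(f"z = {z:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

In practice the validity condition is checked with the sample proportions in place of the unknown $\pi_i$, since the true values are not available.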