Chapter 9.1: z Tests and Confidence Intervals for a Difference Between Two
Population Means
Instructor: Dr. Arnab Maity
In Chapter 8, we learned about the following.
• Tests about a mean of a single population
– z test (when either σ is known OR sample size n is large)
– t test (when σ is unknown AND sample size n is small)
In this chapter, we will learn how to do inference about the difference between means of two
populations.
Setting: We have two populations with means µ1 and µ2 . We are interested in the difference
∆ = µ1 − µ2 .
• Let µ1 denote true average Rockwell hardness for heat-treated steel specimens and µ2
denote true average hardness for cold-rolled specimens. Then an investigator might
wish to use samples of hardness observations from each type of steel as a basis for
calculating an interval estimate of µ1 − µ2, the difference between the two true average
hardnesses.
• In the above example, another problem of interest is to test whether the Rockwell
hardness of the two types of steel is equal. In other words, one may want to test the
hypothesis H0 : µ1 − µ2 = 0.
Data and point estimators: We observe random samples from each of the two populations.
• X1, …, Xm is a random sample with mean µ1 and variance σ1².
• Y1, …, Yn is a random sample with mean µ2 and variance σ2².
• The X and Y samples are independent of each other.
• Point estimators:
µ̂1 = X̄ (the sample mean of the X sample)
µ̂2 = Ȳ (the sample mean of the Y sample)
∆̂ = X̄ − Ȳ
Result: Under the above assumptions, we have
• E(X̄ − Ȳ ) = µ1 − µ2 .
• Var(X̄ − Ȳ) = σ1²/m + σ2²/n.
These two results are valid without assumptions on the underlying distributions of the two
samples. Thus X̄ − Ȳ is an unbiased estimator of the mean difference ∆ = µ1 − µ2
without any distributional assumptions. But to construct a CI or to develop a test
on ∆, we need further assumptions unless the sample sizes are large.
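The two results follow from linearity of expectation and from the independence of the X and
Y samples. A short derivation, using only the assumptions listed above, is:

```latex
\begin{align*}
E(\bar{X} - \bar{Y}) &= E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2, \\
\mathrm{Var}(\bar{X} - \bar{Y})
  &= \mathrm{Var}(\bar{X}) + (-1)^2\,\mathrm{Var}(\bar{Y})
     && \text{(independent samples)} \\
  &= \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}.
\end{align*}
```

The variances add, even though the means are subtracted, because the coefficient −1 is squared.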
Case 1: Normal populations with known variances
Null hypothesis:
H0 : µ1 − µ2 = ∆0 .
Test statistic:
$$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}.$$
Null distribution: Under H0 , the test statistic has a standard normal [N(0,1)]
distribution.
Alternative Hypothesis      Rejection Region for Level α Test
Ha : µ1 − µ2 > ∆0           z ≥ zα (upper-tailed)
Ha : µ1 − µ2 < ∆0           z ≤ −zα (lower-tailed)
Ha : µ1 − µ2 ≠ ∆0           z ≤ −zα/2 OR z ≥ zα/2 (two-tailed)
The 100(1 − α)% Confidence Interval (CI) for µ1 − µ2
For normal distributions with known σ1 and σ2:
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$
p-value
• Upper-tailed test: 1 − Φ(z)
• Lower-tailed test: Φ(z)
• Two-tailed test: 2(1 − Φ(|z|))
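The Case 1 procedure can be sketched in Python. This is a minimal illustration, not part of
the original notes: the function names `two_sample_z_test` and `two_sample_z_interval`, their
parameter names, and the input numbers at the bottom are all hypothetical. The code assumes
normal populations with known variances, as stated above, and uses only `statistics.NormalDist`
from the standard library (Python 3.8+) for Φ and for the critical value zα/2.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def two_sample_z_test(xbar, ybar, var1, var2, m, n,
                      delta0=0.0, alternative="two-sided"):
    """Return (z, p_value) for H0: mu1 - mu2 = delta0 with known variances.

    `alternative` is a hypothetical argument: "greater", "less" or "two-sided".
    """
    se = sqrt(var1 / m + var2 / n)      # standard deviation of Xbar - Ybar
    z = (xbar - ybar - delta0) / se
    if alternative == "greater":        # Ha: mu1 - mu2 > delta0
        p = 1.0 - STD_NORMAL.cdf(z)
    elif alternative == "less":         # Ha: mu1 - mu2 < delta0
        p = STD_NORMAL.cdf(z)
    elif alternative == "two-sided":    # Ha: mu1 - mu2 != delta0
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


def two_sample_z_interval(xbar, ybar, var1, var2, m, n, conf=0.95):
    """Return the two-sided 100*conf% CI for mu1 - mu2 with known variances."""
    alpha = 1.0 - conf
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z_crit * sqrt(var1 / m + var2 / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Hypothetical inputs, chosen only to show the calls.
z, p = two_sample_z_test(10.0, 9.0, 4.0, 9.0, 16, 36)
low, high = two_sample_z_interval(10.0, 9.0, 4.0, 9.0, 16, 36, conf=0.95)
print(f"z = {z:.3f}, p-value = {p:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Rejecting H0 when the p-value is at most α gives the same decision as checking whether z
falls in the corresponding rejection region.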
1. Does car ownership affect academic achievement? A sample of m = 100 non-owners
of cars had a sample average GPA of 2.70, with a known population variance of 0.36.
A sample of n = 100 car owners had a sample average GPA of 2.54, with a known population
variance of 0.40. Do the data present sufficient evidence to indicate a difference in the mean
achievement between car owners and non-owners of cars? Use α = 0.05.
Solution:
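(1) Parameter of interest is µ1 − µ2, the difference in true mean GPA between non-owners
(population 1) and car owners (population 2).
(2) The null hypothesis is H0 : µ1 − µ2 = 0.
(3) The alternative hypothesis is Ha : µ1 − µ2 ≠ 0.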
(4) The test statistic value is
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} = \frac{2.70 - 2.54 - 0}{\sqrt{\dfrac{0.36}{100} + \dfrac{0.40}{100}}} = 1.84$$
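(5) The rejection region is z ≤ −z0.025 = −1.96 OR z ≥ z0.025 = 1.96.
(6) Conclusion: Since z = 1.84 does not fall in the rejection region, we fail to reject H0
at level α = 0.05. The data do not provide sufficient evidence of a difference in mean GPA
between car owners and non-owners.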
Another way to answer the same question is by calculating the p-value. The p-value
is: 2(1 − Φ(|z|)) = 2(1 − Φ(1.84)) ≈ 0.066. Since this exceeds α = 0.05, we again fail to
reject H0.
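The arithmetic in this example can be checked with a few lines of Python. This is only a
sketch: the variable names are illustrative, and the numbers are the ones given in the problem
statement. Φ and the critical value come from `statistics.NormalDist` in the standard library.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()          # standard normal N(0, 1)

# Values taken from the problem statement.
m, n = 100, 100                    # sample sizes (non-owners, owners)
xbar, ybar = 2.70, 2.54            # sample mean GPAs
var1, var2 = 0.36, 0.40            # known population variances
alpha = 0.05
delta0 = 0.0                       # H0: mu1 - mu2 = 0

se = sqrt(var1 / m + var2 / n)                   # sqrt(0.0076)
z = (xbar - ybar - delta0) / se                  # test statistic
z_crit = std_normal.inv_cdf(1 - alpha / 2)       # z_{alpha/2}
p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-tailed p-value
reject = abs(z) >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject}")
```

Without intermediate rounding, z is about 1.84 and the two-tailed p-value is roughly 0.066, so
H0 is not rejected at α = 0.05.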
Case 2: Large sample tests with m > 40 and n > 40
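Suppose that the populations may not be normal and the σ's are unknown. Then the
test statistic is
$$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}},$$
where s1² is the sample variance of the X sample, and s2² is the sample variance of the Y
sample. Under H0, Z has approximately a standard normal distribution when both m and n
are large.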
The rejection region and p-value can be calculated as in the earlier z-tests.
The 100(1 − α)% Confidence Interval (CI) for µ1 − µ2
For large sample sizes (when both m and n are large):
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$
The upper and lower confidence bounds can be obtained by replacing zα/2 with zα
and retaining the appropriate sign (+ or −).
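The large-sample interval and the one-sided bounds described above can be sketched as
follows. The function names `large_sample_interval` and `large_sample_bound`, the `kind`
argument, and the input numbers are hypothetical. The functions take sample standard
deviations (not variances) and assume both samples are large enough for the normal
approximation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def large_sample_interval(xbar, ybar, s1, s2, m, n, conf=0.95):
    """Two-sided 100*conf% CI for mu1 - mu2, using z_{alpha/2}."""
    se = sqrt(s1 ** 2 / m + s2 ** 2 / n)          # estimated standard error
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    diff = xbar - ybar
    return diff - z_crit * se, diff + z_crit * se


def large_sample_bound(xbar, ybar, s1, s2, m, n, conf=0.95, kind="lower"):
    """One-sided 100*conf% confidence bound for mu1 - mu2, using z_alpha."""
    se = sqrt(s1 ** 2 / m + s2 ** 2 / n)
    z_crit = STD_NORMAL.inv_cdf(conf)             # z_alpha with alpha = 1 - conf
    diff = xbar - ybar
    if kind == "lower":
        return diff - z_crit * se                 # keep the minus sign
    if kind == "upper":
        return diff + z_crit * se                 # keep the plus sign
    raise ValueError("kind must be 'lower' or 'upper'")


# Hypothetical summary statistics, chosen only to show the calls.
ci = large_sample_interval(5.0, 4.0, 2.0, 3.0, 64, 81, conf=0.95)
lower = large_sample_bound(5.0, 4.0, 2.0, 3.0, 64, 81, conf=0.95, kind="lower")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), 95% lower bound = {lower:.3f}")
```

Because zα is smaller than zα/2, a one-sided lower bound at a given confidence level sits
above the lower endpoint of the two-sided interval at the same level.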
2. A new method of storing snap beans is believed to retain more ascorbic acid than the
old method. In an experiment, snap beans were harvested under uniform conditions
and frozen in 250 equal-size packages. One hundred of those packages were randomly
selected and stored according to the new method, and the other 150 packages were
stored by the old method. Subsequently, ascorbic acid determinations (in mg/kg) were
made, and the following summary statistics were calculated.
                             New method   Old method
Sample size                  100          150
Sample mean ascorbic acid    450          410
Sample standard deviation    20           45
(a) Perform the appropriate test at a 1% level of significance.
(b) Construct a 95% confidence interval for the difference in the mean amount of
ascorbic acid retained.
(c) Does the confidence interval suggest that the new method retains more ascorbic
acid than the old method?
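One way to check the arithmetic for this problem is sketched below. Treating the new method as
population 1 and using the upper-tailed alternative Ha : µ1 − µ2 > 0 are assumptions, based on
the claim that the new method "retains more" ascorbic acid. The variable names are illustrative
and the summary statistics are the ones in the table.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()              # standard normal N(0, 1)

# Summary statistics from the table (population 1 = new method, an assumption).
m, xbar, s1 = 100, 450.0, 20.0         # new method
n, ybar, s2 = 150, 410.0, 45.0         # old method

se = sqrt(s1 ** 2 / m + s2 ** 2 / n)   # sqrt(4 + 13.5)
diff = xbar - ybar

# (a) Upper-tailed large-sample z test at the 1% level.
alpha = 0.01
z = (diff - 0.0) / se
z_crit = std_normal.inv_cdf(1 - alpha)          # z_alpha for a one-sided test
reject = z >= z_crit

# (b) Two-sided 95% confidence interval for mu1 - mu2.
z_half = std_normal.inv_cdf(0.975)              # z_{0.025}
ci = (diff - z_half * se, diff + z_half * se)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

For part (c), the question is whether every value in the interval is positive, which would be
consistent with the new method retaining more ascorbic acid on average.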
3. Big Foods Grocery has two grocery stores located in Raleigh. One store is located on
First Street and the other on Main Street and each is run by a different manager. Each
manager claims that her store’s layout maximizes the amounts customers will purchase
on impulse. Both managers surveyed a sample of their customers and asked them how
much more they spent than they had planned to; in other words, how much they
spent on impulse.
For manager 1, 45 subjects were sampled and the sample mean was $3.14 with a
sample standard deviation of $1.99.
For manager 2, 51 subjects were sampled and the sample mean was $2.64 with a
sample standard deviation of $1.29.
• Does either manager have a claim to being different? Conduct a hypothesis test
using α = 0.1.
• Find a 90% Confidence Interval for the mean difference. Are your conclusions the
same as above? Why?
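The link between the two parts of this problem can be illustrated numerically: a two-tailed
level α test of H0 : µ1 − µ2 = 0 rejects exactly when the 100(1 − α)% confidence interval
excludes 0. The sketch below assumes a two-sided alternative and takes manager 1's store as
population 1. The variable names are illustrative and the numbers come from the problem
statement.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()              # standard normal N(0, 1)

# Summary statistics from the problem (population 1 = manager 1, an assumption).
m, xbar, s1 = 45, 3.14, 1.99
n, ybar, s2 = 51, 2.64, 1.29
alpha = 0.10

se = sqrt(s1 ** 2 / m + s2 ** 2 / n)   # estimated standard error
diff = xbar - ybar
z_half = std_normal.inv_cdf(1 - alpha / 2)      # z_{alpha/2} = z_{0.05}

# Two-tailed large-sample z test of H0: mu1 - mu2 = 0.
z = diff / se
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject = abs(z) >= z_half

# 90% confidence interval built from the same standard error and z_{alpha/2}.
ci = (diff - z_half * se, diff + z_half * se)
zero_inside = ci[0] <= 0.0 <= ci[1]

print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f}), contains 0: {zero_inside}")
```

The test and the interval use the same standard error and the same critical value zα/2, which
is why `reject` and `zero_inside` always give opposite answers.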