Download Statistics Notes 10 - Home Page of Vance A. Hughey

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Lecture Notes for
Principles of Statistics II (Economics 262)
Western Nevada College
Copyright © 2008
By Vance A. Hughey
Carson City, Nevada
The following material is copyrighted. The text of this publication, or any part thereof, may not
be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopying, storage in an information retrieval system, or otherwise, without the prior written
permission of Vance A. Hughey.
Statistical Inference About Means and Proportions with Two Populations
Page 1
10.
Statistical Inference About Means and Proportions with Two Populations—Expands
the discussion of interval estimation and hypothesis testing to cases where two
populations are involved. Specifically, this chapter discusses the selection of random
samples and performs statistical analyses that will enable the drawing of conclusions
about the difference between means and/or proportions of two populations.
Estimation of the Difference Between the Means of Two Populations—Independent
Samples
Sometimes we are faced with situations where we have two separate populations where
the difference between the means of the two populations is important. Examples of the
kinds of questions that can be answered using this procedure include:
•
Is the average return on investment in common stocks different than the average
return on municipal bonds?
•
Are women more likely to drink white wine than men?
•
Is one hospital more likely to perform cesarean sections than another hospital?
•
Is it reasonable to expect do-it-yourselfers to be younger than non-do-it-yourselfers?
This procedure is also useful in conducting efficacy studies. These are studies to
determine the capacity of a product or procedure to produce a desired effect.
To begin with, we will take two random samples, one each from different populations.
From this sample we will be able to compute the sample mean for each sample:
x1 = sample mean for the sample taken from population 1
x 2 = sample mean for the sample taken from population 2
We can then use the sample means as point estimates of the population means:
µ1 = mean of population 1
µ2 = mean of population 2
And since the difference between the mean of population 1 and the mean of population 2
is µ1 − µ 2 , we can say that x1 − x 2 is a point estimate of the difference between the
means of two populations.
This is the point estimate of interest in answering the kinds of questions I mentioned a
few minutes ago. And this point estimate, like other point estimators, has a sampling
distribution of its own with a mean = µ1 − µ2 and a standard deviation,
Statistical Inference About Means and Proportions with Two Populations
Page 2
σ12 σ 22
+
. If we use a large sample, n ≥ 30 , then the sampling distribution can
n1 n2
be approximated by a normal probability distribution.
σ x1 −x 2 =
The sampling distribution of x1 − x2 is
! x1 " x2
! 12 ! 22
=
+
n1 n2
x1 " x2
µ1 " µ2
The interval estimate of the difference between the means of two populations follows the
same procedure that we have used for interval estimation in Chapter 8:
x1 − x2 ± zα / 2σ x1 −x 2
Since s may not be known, we can again substitute s for σ giving us:
sx1 −x 2 =
s12 s22
+
n1 n2
Hypothesis Tests About the Difference Between the Means of Two Populations—
σ 1 and σ 2 Known
Now let’s conduct a hypothesis test. As usual, we have to consider two different cases.
The first is the case where both σ 1 and σ 2 are assumed to be known. The second is the
case where both σ 1 and σ 2 are assumed to be unknown.
σ 1 and σ 2 Known—Let’s use the Greystone Department Store example beginning on
page 395 to illustrate how to conduct this hypothesis test:
Statistical Inference About Means and Proportions with Two Populations
Page 3
The Greystone Department Store study provided the following data on the ages of
customers from independent random samples taken at two store locations:
Inner-City Store
Suburban Store
n1 = 36
n 2 = 49
x1 = 40 years
x 2 = 35 years
σ 1 = 9 years
σ 2 = 10 years
The questions we need to answer are:
a. State the hypotheses that could be used to detect a difference between the
population mean ages at the two stores.
b. What is the value of the test statistic?
c. What is the p-value?
d. At α = .05 , what is your conclusion?
For α = .05 , test the hypothesis H 0 : µ1 − µ2 = 0 against the alternative hypothesis
H a : µ1 − µ2 ≠ 0 . Since we are using α = .05 , and since we are going to conduct a twotailed test, we need to find zα / 2 . Since 1− α = .95 and .95 ÷ 2 = 0.4750 , zα / 2 = 1.96 . If we
calculate a test statistic z-value less than –1.96 or greater than 1.96, we will reject the null
hypothesis and conclude that the two means are not equal.
2
"x "! x–!1 #xx2 =
1
2
!/2 = .025
!/2 = .025
-1.96
0
z
1.96
reject H0
The test statistic is z =
2
"
"
" 1" 22 2
+ !+!
n1 n1 n2 n2
2
1
reject H0
(x1 − x2 ) − D0
σ 12 σ 22
+
n1 n2
, where D0 = µ1 − µ2 .
Statistical Inference About Means and Proportions with Two Populations
Page 4
Using the sample standard deviation in place of the population standard deviation, we get
z = 2.41. Since 2.41 > 1.96, we will conclude that we must reject the null hypothesis;
customers at the two stores differ in terms of their ages.
σ 1 and σ 2 Unknown—Now we can see how hypothesis testing differs when σ 1 and σ 2 are
not known. When we compare two means, the population standard deviations are rarely
known. This creates technical problems that require modification of the methods we just
employed. In particular, we will employ the t-distribution much as we did in the case of
inferences about a single population mean when σ was not known. In computing the
degrees of freedom, we use the formula:
df = n1 + n2 − 2 .
We could use the formula on page 403, but it is rather complicated and doesn’t yield a
value that is significantly difference than the one using the simpler formula.
df =
⎛ s12 s22 ⎞
⎜⎝ n + n ⎟⎠
1
2
2
2
1 ⎛ s12 ⎞
1 ⎛ s22 ⎞
+
n1 − 1 ⎜⎝ n1 ⎟⎠
n2 − 1 ⎜⎝ n2 ⎟⎠
2
Let’s do an exercise to solve a problem where σ 1 and σ 2 are unknown.
Salary surveys of marketing and management majors show the following starting annual
salary data:
Marketing Majors
Management Majors
n1 = 14
n 2 = 16
x1 = $19, 800
x 2 = $19, 300
s1 = $1, 000
s 2 = $1, 400
The null hypothesis is that the mean annual salaries are the same for both majors.
H 0 : µ1 − µ2 = 0
H a : µ1 − µ2 ≠ 0
Degrees of freedom = n1 + n2 − 2 = 28 .
Reject H0 : if t < −2.048 or t > 2.048 (we find this value in the t-distribution table where
df = 28 and area in upper tail is .025).
Statistical Inference About Means and Proportions with Two Populations
Page 5
t=
t=
( x1 − x2 ) − D0
s12 s22
+
n1 n2
(19, 800 − 19, 300 ) − ( 0 ) = 1.14
1, 000 2 1, 400 2
+
14
16
Do not reject H 0 ; there is no significant difference between the mean annual salaries of
the two majors.
Inferences About the Difference Between the Means of Two Populations—Matched
Samples
Sometimes it is either necessary or desirable to eliminate “noise” in the sampling process
to assure a better test. “Noise reduction” in experimental design involves choosing
samples in order to reduce the inherent variation and thereby sharpen the ability of a test
to detect differences between populations. Using the method of matched samples,
researchers are able to block out variations in sample data that are attributable to factors
other than the process being examined.
Matched sample design typically involves wanting to test two different processes or
procedures. In this case, we could have two different groups of workers perform their
tasks using one or the other of these processes or procedures. But, that would introduce
the possibility of sampling error even if there were no difference in the procedures. To
limit this noise, we can design a test where one group is used to test both procedures.
Then the difference between the results for each element of the group will be averaged.
If there were absolutely no difference between the two procedures, we would expect the
average difference to be zero.
Since it is unlikely for the sample data to result in a difference of zero, we need to be able
to assess the probability that a sample could result in a particular non-zero value even
though the population value was, in fact, zero.
To use this matched-sample methodology, we still need to establish a null hypothesis and
alternative hypothesis. But now we will let µ d = the mean of the difference values for
the population and establish the hypotheses as follows:
H0 : µd = 0
H a : µd ≠ 0
Statistical Inference About Means and Proportions with Two Populations
Page 6
n
Now instead of using x we will use d =
n
the difference data is sd =
∑ (d
i=1
they are x and s , respectively.
i
−d)
n −1
∑d
i=1
n
i
. And the sample standard deviation for
2
. Note: Calculators will treat d and sd as if
We will then apply the standard techniques of statistical inference.
The example on page 410 of the text uses a small sample with n = 6 , which dictates the
use of the t-distribution:
t=
d − µd
sd / n
We also assume the population of differences has a normal population.
Let’s once again refer to an exercise in the text to see if we can solve a matched-sample
problem—question 21 on page 414.
21.
A market research firm used a sample of individuals to rate the potential to
purchase for a particular product before and after the individuals saw a new
television commercial about the product. The potential to purchase ratings were
based on a 0 to 10 scale, with higher values indicating hire potential to purchase.
The null hypothesis stated that the mean rating “after” would be less than or equal
to the mean rating “before.” Rejection of this hypothesis would provide the
conclusion that the commercial improved the mean potential to purchase rating.
Use α =.05 and the following data to test the hypothesis and comment on the value
of the commercial:
Individual
1
2
3
4
5
6
7
8
Purchase Rating
After Seeing
Commercial
6
6
7
4
3
9
7
6
Purchase Rating
Before Seeing
Commercial
5
4
7
3
5
8
5
6
Statistical Inference About Means and Proportions with Two Populations
Difference
1
2
0
1
–2
1
2
0
Page 7
Difference = rating after – rating before.
H0 : µd ≤ 0
H a : µd > 0
With n − 1 = 7 degrees of freedom, reject H 0 if t > 1.895
d = 0.63
sd = 1.3025
t=
d − µd
sd / n
=
0.63 − 0
1.3025/ 8
= 1.37
Do not reject H 0 ; we cannot conclude that seeing the commercial improves the potential
to purchase.
Inferences About the Difference Between the Proportions of Two Populations
When considering the difference between two population proportions, we will use the
following properties:
Mean : E( p1 − p2 ) = p1 − p2
Standard Deviation: σ p1 − p2 =
p1 (1− p1 )
n1
+
p2 (1− p2 )
n2
The sampling distribution of p1 − p2 can be approximated by a normal probability
distribution if the sample sizes are large. We have large samples if:
n1 p1 ;
n1 (1 − p1 ) ;
n 2 p2 ; and
n 2 (1 − p2 )
are all greater than or equal to 5.
To compute an interval estimate of the difference between the proportions of two
populations (assuming a large sample case) use the following formula:
p1 − p2 ± zα 2σ p1 − p2
Statistical Inference About Means and Proportions with Two Populations
Page 8
where 1− α is the confidence coefficient.
If σ p1 − p2 is unavailable, use the estimate of σ p1 − p2 :
p1 (1− p1 )
s p1 − p2 =
n1
+
p2 (1− p2 )
n2
When conducting hypothesis tests using proportions, the test statistic is written as:
z=
( p1 − p2 ) − ( p1 − p2 )
⎛1 1⎞
p (1 − p ) ⎜ + ⎟
⎝ n1 n2 ⎠
When the null hypothesis to be tested is H 0 : p1 − p2 = 0 , we are assuming that p1 = p2 ,
so we do not need to use the individual values for p1 − p2 in the formula for s p1 − p2 .
Instead, we only need one value of p since it can be used as an estimate for both p1 and
p2 . Again, we will need to pool the two sample results to get one estimate of the
population proportion:
p=
n1 p1 + n2 p2
n1 + n2
The formula for the standard deviation of the sampling distribution then becomes:
s p1 − p2 =
⎛1 1⎞
p (1 − p ) ⎜ + ⎟
⎝ n1 n2 ⎠
Statistical Inference About Means and Proportions with Two Populations
Page 9