Transcript
```
Chapter 13
Two Populations
13.1 Introduction
• A variety of techniques are presented whose objective is to compare two populations.
• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.
Inference about the Difference between Two Means: Independent Samples
• Two random samples are drawn from the two populations of interest.
• Because we compare two population means, we use the statistic $\bar{x}_1 - \bar{x}_2$.
The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
2. $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal but the sample sizes are sufficiently large (greater than 30).
3. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
4. The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
Making an inference about $\mu_1 - \mu_2$
• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.
Making an inference about $\mu_1 - \mu_2$
• In practice, the Z statistic is rarely used, because the population variances are not known.
• Instead, we construct a t statistic using the sample variances ($S_1^2$ and $S_2^2$):
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
Making an inference about $\mu_1 - \mu_2$
• Two cases are considered when producing the t-statistic:
– The two unknown population variances are equal.
– The two unknown population variances are not equal.
Inference about $\mu_1 - \mu_2$: Equal variances
• Calculate the pooled variance estimator by:
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,
$$S_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} \approx 28.0435$$
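Inference about $\mu_1 - \mu_2$: Equal variances
• Construct the t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
• Perform a hypothesis test:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$
• Or build a confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $1 - \alpha$ is the confidence level.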
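Inference about $\mu_1 - \mu_2$: Unequal variances
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$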
Inference about $\mu_1 - \mu_2$: Unequal variances
Conduct a hypothesis test as needed, or build a confidence interval.
Confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $1 - \alpha$ is the confidence level.
Which case to use: equal variances or unequal variances?
• Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal variances t-test.
• This is because, for any two given samples, the number of degrees of freedom for the equal variances case is greater than or equal to the number of degrees of freedom for the unequal variances case, and more degrees of freedom give smaller critical values.
Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person, the number of calories consumed at lunch was recorded.
Example: Making an inference about $\mu_1 - \mu_2$
Calories consumed at lunch
Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...
Solution:
• The data are interval.
• The parameter to be tested is the difference between two means.
• The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).
Example: Making an inference about $\mu_1 - \mu_2$
• The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) < 0$
– To check whether the population variances are equal, we use the computer output (Xm13-01) to find the sample variances. We have $s_1^2 = 4103$ and $s_2^2 = 10{,}670$.
– It appears that the variances are unequal.
Example: Making an inference about $\mu_1 - \mu_2$
• Compute: Manually
– From the data we have:
$\bar{x}_1 = 604.02$, $\bar{x}_2 = 633.23$, $s_1^2 = 4{,}103$, $s_2^2 = 10{,}670$
$$\text{d.f.} = \frac{\left(4103/43 + 10670/107\right)^2}{\frac{\left(4103/43\right)^2}{43 - 1} + \frac{\left(10670/107\right)^2}{107 - 1}} = 122.6 \approx 123$$
Example: Making an inference about $\mu_1 - \mu_2$
• Compute: Manually
– The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,123} \approx -1.658$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(604.02 - 633.23) - 0}{\sqrt{\frac{4103}{43} + \frac{10670}{107}}} = -2.09$$
Example: Making an inference about $\mu_1 - \mu_2$
Xm13-01
t-Test: Two-Sample Assuming Unequal Variances
                              Consumers   Nonconsumers
Mean                          604.02      633.23
Variance                      4102.98     10669.77
Observations                  43          107
Hypothesized Mean Difference  0
df                            123
t Stat                        -2.09
P(T<=t) one-tail              0.0193
t Critical one-tail           1.6573
P(T<=t) two-tail              0.0386
t Critical two-tail           1.9794
At the 5% significance level there is sufficient evidence to reject the null hypothesis (.0193 < .05 and -2.09 < -1.6573).
Example: Making an inference about $\mu_1 - \mu_2$
• Compute: Manually
The confidence interval estimator for the difference between two means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65 = (-56.86,\ -1.56)$$
Example: Making an inference about $\mu_1 - \mu_2$
• Example 13.2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.
– Two samples are randomly and independently selected:
• A sample of 25 workers assembled the chair using method A.
• A sample of 25 workers assembled the chair using method B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?
Example: Making an inference about $\mu_1 - \mu_2$
Assembly times in minutes
Method A   Method B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
...        ...
Solution
• The data are interval.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two methods exists.
Example: Making an inference about $\mu_1 - \mu_2$
• Compute: Manually
– The hypotheses are:
H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) \neq 0$
– To check whether the two unknown population variances are equal, we calculate $S_1^2$ and $S_2^2$ (Xm13-02).
– We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.
– The two population variances appear to be equal.
Example: Making an inference about $\mu_1 - \mu_2$
• Compute: Manually
– To calculate the t-statistic we have:
$\bar{x}_1 = 6.288$, $\bar{x}_2 = 6.016$, $s_1^2 = 0.8478$, $s_2^2 = 1.3031$
$$S_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$
$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\frac{1}{25} + \frac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$
Example: Making an inference about $\mu_1 - \mu_2$
• The rejection region for $\alpha = 0.05$ is
$t < -t_{\alpha/2,\nu} = -t_{.025,48} \approx -2.009$ or $t > t_{\alpha/2,\nu} = t_{.025,48} \approx 2.009$
• The test: Since $-2.009 < 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.
[Figure: t distribution with rejection regions below -2.009 and above 2.009; t = 0.93 lies between them]
Example: Making an inference about $\mu_1 - \mu_2$
Xm13-02
t-Test: Two-Sample Assuming Equal Variances
                              Method A   Method B
Mean                          6.29       6.02
Variance                      0.8478     1.3031
Observations                  25         25
Pooled Variance               1.08
Hypothesized Mean Difference  0
df                            48
t Stat                        0.93
P(T<=t) one-tail              0.1792
t Critical one-tail           1.6772
P(T<=t) two-tail              0.3584
t Critical two-tail           2.0106
-2.0106 < .93 < +2.0106
.3584 > .05
Example: Making an inference about $\mu_1 - \mu_2$
• Conclusion: There is insufficient evidence to infer at the 5% significance level that the two assembly methods differ in terms of assembly time.
Example: Making an inference about $\mu_1 - \mu_2$
A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$
Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.
Notice: zero is included in the confidence interval.
Checking the required conditions for the equal variances case (Example 13.2)
[Figure: histograms of the assembly times for Design A and Design B]
The data appear to be approximately normal.
13.4 Matched Pairs Experiment
• What is a matched pairs experiment?
• Why are matched pairs experiments needed?
• How do we deal with data produced in this way?
The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.
13.4 Matched Pairs Experiment
Example 13.3
– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
– Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one. The data are stored in file Xm13-03.
– Can we infer that finance majors obtain higher salary offers than marketing majors among MBAs?
13.4 Matched Pairs Experiment
• Solution
– Compare two populations of interval data.
– The parameter tested is $\mu_1 - \mu_2$, where
$\mu_1$ = the mean of the highest salary offered to Finance MBAs
$\mu_2$ = the mean of the highest salary offered to Marketing MBAs
– H0: $(\mu_1 - \mu_2) = 0$
H1: $(\mu_1 - \mu_2) > 0$
Finance   Marketing
61,228    73,361
51,836    36,956
20,620    63,627
73,356    71,069
84,186    40,203
...       ...
13.4 Matched Pairs Experiment
• Solution (continued)
From the data we have:
$\bar{x}_1 = 65{,}624$, $\bar{x}_2 = 60{,}423$, $s_1^2 = 360{,}433{,}294$, $s_2^2 = 262{,}228{,}559$
• Let us assume equal variances.
t-Test: Two-Sample Assuming Equal Variances
                              Finance     Marketing
Mean                          65624       60423
Variance                      360433294   262228559
Observations                  25          25
Pooled Variance               311330926
Hypothesized Mean Difference  0
df                            48
t Stat                        1.04
P(T<=t) one-tail              0.1513
t Critical one-tail           1.6772
P(T<=t) two-tail              0.3026
t Critical two-tail           2.0106
There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than marketing MBAs.
The effect of a large sample variability
• Question
– The difference between the sample means is 65,624 – 60,423 = 5,201.
– So why could we not reject H0 in favor of H1 ($\mu_1 - \mu_2 > 0$)?
The effect of a large sample variability
– $S_p^2$ is large because the sample variances are large: $S_p^2 = 311{,}330{,}926$.
– A large variance reduces the value of the t statistic, and it becomes more difficult to reject H0.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Reducing the variability
The values each sample consists of might vary markedly...
[Figure: the range of observations in sample A and the range of observations in sample B]
Reducing the variability
...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.
[Figure: the range of the differences, plotted on an axis through 0]
The matched pairs experiment
• Since the difference of the means is equal to the mean of the differences, we can rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.
• This formulation has the benefit of a smaller variability when the paired observations are positively correlated.
Group 1   Group 2   Difference
10        12        -2
15        11        +4
Mean1 = 12.5, Mean2 = 11.5
Mean1 – Mean2 = 1
Mean of the differences = 1
The matched pairs experiment
• Example 13.4
– It was suspected that salary offers were affected by students' GPA (which caused $S_1^2$ and $S_2^2$ to increase).
– To reduce this variability, the following procedure was used:
• 25 ranges of GPAs were predetermined.
• Students from each major were randomly selected, one from each GPA range.
• The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?
The matched pairs hypothesis test
• Solution (by hand)
– The parameter tested is $\mu_D$ $(= \mu_1 - \mu_2)$
– The hypotheses:
H0: $\mu_D = 0$
H1: $\mu_D > 0$
– The t statistic:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}, \qquad \text{degrees of freedom} = n_D - 1$$
– The rejection region is $t > t_{.05,25-1} = 1.711$.
The matched pairs hypothesis test
• Solution
– From the data (Xm13-04) calculate:
GPA Group   Finance   Marketing   Difference
1           95171     89329       5842
2           88009     92705       -4696
3           98089     99205       -1116
4           106322    99003       7319
5           74566     74825       -259
6           87089     77038       10051
7           88664     78272       10392
8           71200     59462       11738
9           69367     51555       17812
10          82618     81591       1027
...         ...       ...         ...
Difference
Mean                 5065
Standard Error       1329
Median               3285
Mode                 #N/A
Standard Deviation   6647
Sample Variance      44181217
Kurtosis             -0.6594
Skewness             0.3597
Range                23533
Minimum              -5721
Maximum              17812
Sum                  126613
Count                25
The matched pairs hypothesis test
• Solution
$\bar{x}_D = 5{,}065$, $s_D = 6{,}647$
– Calculate t:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} = \frac{5065 - 0}{6647/\sqrt{25}} = 3.81$$
The matched pairs hypothesis test
Xm13-04
t-Test: Paired Two Sample for Means
                              Finance     Marketing
Mean                          65438       60374
Variance                      444981810   469441785
Observations                  25          25
Pearson Correlation           0.9520
Hypothesized Mean Difference  0
df                            24
t Stat                        3.81
P(T<=t) one-tail              0.0004
t Critical one-tail           1.7109
P(T<=t) two-tail              0.0009
t Critical two-tail           2.0639
3.81 > 1.7109
.0004 < .05
The matched pairs hypothesis test
Conclusion:
There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.
The matched pairs mean difference estimation
Confidence interval estimator of $\mu_D$:
$$\bar{x}_D \pm t_{\alpha/2,n-1}\frac{s_D}{\sqrt{n}}$$
Example 13.5
The 95% confidence interval of the mean difference in Example 13.4 is
$$5065 \pm 2.064\frac{6647}{\sqrt{25}} = 5{,}065 \pm 2{,}744$$
The matched pairs mean difference estimation
Using Data Analysis Plus (Xm13-04)
First calculate the differences (the Difference column of the GPA group table), then run the confidence interval procedure in Data Analysis Plus.
t-Estimate: Mean
                     Difference
Mean                 5065
Standard Deviation   6647
LCL                  2321
UCL                  7808
Checking the required conditions for the paired observations case
• The validity of the results depends on the normality of the differences.
[Figure: histogram of the differences (frequency vs. difference, 0 to 20,000)]
Inference about the Ratio of Two Variances
• In this section we draw inferences about the ratio of two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of processes.
– The relationship between population variances determines which of the equal-variances or unequal-variances t-test and estimator of the difference between means should be applied.
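Parameter and Statistic
• The parameter to be tested is $\sigma_1^2/\sigma_2^2$.
• The statistic used is
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
• Sampling distribution of $s_1^2/s_2^2$
– The statistic $[s_1^2/\sigma_1^2]/[s_2^2/\sigma_2^2]$ follows the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom, provided the two populations are normal and the samples are independent.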
Parameter and Statistic
– Our null hypothesis is always
H0: $\sigma_1^2/\sigma_2^2 = 1$
– Under this null hypothesis, the F statistic $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ becomes
$$F = \frac{s_1^2}{s_2^2}$$
Testing the ratio of two population variances
Example 13.6 (revisiting Example 13.1)
(see Xm13-01: calories consumed at lunch by consumers and non-consumers of high-fiber cereal)
To test the average calorie consumption at lunch in relation to the inclusion of high-fiber cereal in people's breakfast, the ratio of the two population variances has to be tested first.
The hypotheses are:
H0: $\sigma_1^2/\sigma_2^2 = 1$
H1: $\sigma_1^2/\sigma_2^2 \neq 1$
Testing the ratio of two population variances
• Solving by hand
– The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$:
$$F > F_{\alpha/2,\nu_1,\nu_2} = F_{.025,42,106} \approx F_{.025,40,120} = 1.61$$
$$F < \frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{F_{.025,106,42}} \approx \frac{1}{F_{.025,120,40}} = \frac{1}{1.72} = .58$$
– The F statistic value is $F = S_1^2/S_2^2 = .3845$.
– Conclusion: Because .3845 < .58, we reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 5% significance level that the population variances differ.
Xm13-01
F-Test Two-Sample for Variances
                      Consumers   Nonconsumers
Mean                  604         633
Variance              4103        10670
Observations          43          107
df                    42          106
F                     0.3845
P(F<=f) one-tail      0.0004
F Critical one-tail   0.6371
Estimating the Ratio of Two Population Variances
• From the statistic $F = [s_1^2/\sigma_1^2]/[s_2^2/\sigma_2^2]$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following confidence interval:
$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
Estimating the Ratio of Two Population Variances
• Example 13.7
– Determine the 95% confidence interval estimate of the ratio of the two population variances in Example 13.1.
– Solution
• We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately).
• LCL $= (s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}] = (4102.98/10{,}669.77)[1/1.61] = .2388$
• UCL $= (s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}] = (4102.98/10{,}669.77)[1.72] = .6614$
Inference about the Difference between Two Population Proportions
• In this section we deal with two populations whose data are nominal.
• For nominal data, we compare the population proportions of the occurrence of a certain event.
• Examples
– Comparing the effectiveness of a new drug versus an older one
– Comparing market share before and after an advertising campaign
– Comparing defective rates between two machines
Parameter and Statistic
• Parameter
– When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions.
– The parameter is therefore $p_1 - p_2$.
• Statistic
– An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).
Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.
Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$
Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$
Sampling distribution of $\hat{p}_1 - \hat{p}_2$
• The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, and $n_2(1 - p_2)$ are all greater than or equal to 5.
• The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
• The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.
The z-statistic
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
Because $p_1$ and $p_2$ are unknown, the standard error must be estimated using the sample proportions. The method depends on the null hypothesis.
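Testing $p_1 - p_2$
• There are two cases to consider:
Case 1: H0: $p_1 - p_2 = 0$
Calculate the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Case 2: H0: $p_1 - p_2 = D$ ($D$ is not equal to 0)
Do not pool the data:
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$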
Testing $p_1 - p_2$ (Case 1)
• Example 13.8
– The marketing manager needs to decide which of two new packaging designs to adopt to help improve sales of his company's soap.
– A study is performed in two supermarkets:
• Brightly colored packaging is distributed in supermarket 1.
• Simple packaging is distributed in supermarket 2.
– The first design is more expensive; therefore, to be financially viable, it has to outsell the second design.
Testing $p_1 - p_2$ (Case 1)
• Summary of the experiment results
– Supermarket 1: 180 purchasers of Johnson Brothers soap out of a total of 904
– Supermarket 2: 155 purchasers of Johnson Brothers soap out of a total of 1,038
– Use a 5% significance level and perform a test to find which type of packaging to use.
Testing $p_1 - p_2$ (Case 1)
• Solution
– The problem objective is to compare the populations of sales of the two packaging designs.
– The data are nominal (Johnson Brothers or other soap).
Population 1: purchases at supermarket 1
Population 2: purchases at supermarket 2
– The hypotheses are
H0: $p_1 - p_2 = 0$
H1: $p_1 - p_2 > 0$
– We identify this application as case 1.
Testing $p_1 - p_2$ (Case 1)
• Compute: Manually
– For a 5% significance level, the rejection region is $z > z_\alpha = z_{.05} = 1.645$
The sample proportions are $\hat{p}_1 = 180/904 = .1991$ and $\hat{p}_2 = 155/1{,}038 = .1493$
The pooled proportion is $\hat{p} = (x_1 + x_2)/(n_1 + n_2) = (180 + 155)/(904 + 1{,}038) = .1725$
The z statistic becomes
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.1991 - .1493}{\sqrt{.1725(1 - .1725)\left(\frac{1}{904} + \frac{1}{1{,}038}\right)}} = 2.90$$
Testing $p_1 - p_2$ (Case 1)
• Excel (Data Analysis Plus)
Xm13-08
z-Test: Two Proportions
                            Supermarket 1   Supermarket 2
Sample Proportions          0.1991          0.1493
Observations                904             1038
Hypothesized Difference     0
z Stat                      2.90
P(Z<=z) one tail            0.0019
z Critical one-tail         1.6449
P(Z<=z) two-tail            0.0038
z Critical two-tail         1.96
Conclusion: There is sufficient evidence to conclude, at the 5% significance level, that the brightly colored design will outsell the simple design.
Testing $p_1 - p_2$ (Case 2)
• Example 13.9 (revisiting Example 13.8)
– Management needs to decide which of two new packaging designs to adopt to help improve sales of a certain soap.
– A study is performed in two supermarkets.
– For the brightly colored design to be financially viable, it has to outsell the simple design by at least 3%.
Testing $p_1 - p_2$ (Case 2)
• Summary of the experiment results
– Supermarket 1: 180 purchasers of Johnson Brothers soap out of a total of 904
– Supermarket 2: 155 purchasers of Johnson Brothers soap out of a total of 1,038
– Use a 5% significance level and perform a test to find which type of packaging to use.
Testing $p_1 - p_2$ (Case 2)
• Solution
– The hypotheses to test are
H0: $p_1 - p_2 = .03$
H1: $p_1 - p_2 > .03$
– We identify this application as case 2 (the hypothesized difference is not equal to zero).
Testing $p_1 - p_2$ (Case 2)
• Compute: Manually
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}} = \frac{\left(\frac{180}{904} - \frac{155}{1{,}038}\right) - .03}{\sqrt{\frac{.1991(1 - .1991)}{904} + \frac{.1493(1 - .1493)}{1{,}038}}} = 1.15$$
The rejection region is $z > z_\alpha = z_{.05} = 1.645$.
Conclusion: Since 1.15 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that the brightly colored design will outsell the simple design by 3% or more.
Testing $p_1 - p_2$ (Case 2)
• Using Excel (Data Analysis Plus)
Xm13-08
z-Test: Two Proportions
                            Supermarket 1   Supermarket 2
Sample Proportions          0.1991          0.1493
Observations                904             1038
Hypothesized Difference     0.03
z Stat                      1.14
P(Z<=z) one tail            0.1261
z Critical one-tail         1.6449
P(Z<=z) two-tail            0.2522
z Critical two-tail         1.96
Estimating $p_1 - p_2$
• Estimating the cost of a life saved
– Two drugs are used to treat heart attack victims:
• Streptokinase (available since 1959, costs $460)
• t-PA (genetically engineered, costs $2900)
– The maker of t-PA claims that its drug outperforms Streptokinase.
– An experiment was conducted in 15 countries.
• 20,500 patients were given t-PA.
• 20,500 patients were given Streptokinase.
• The number of deaths from heart attacks was recorded.
Estimating $p_1 - p_2$
• Experiment results
– A total of 1497 patients treated with Streptokinase died.
– A total of 1292 patients treated with t-PA died.
• Estimate the cost per life saved by using t-PA.
Estimating $p_1 - p_2$
• Solution
– The problem objective: compare the outcomes of two treatments.
– The data are nominal (a patient lived or died).
– The parameter to be estimated is $p_1 - p_2$.
• $p_1$ = death rate with Streptokinase
• $p_2$ = death rate with t-PA
Estimating $p_1 - p_2$
• Compute: Manually
– Sample proportions: $\hat{p}_1 = 1497/20500 = .0730$, $\hat{p}_2 = 1292/20500 = .0630$
– The 95% confidence interval estimate is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (.0730 - .0630) \pm 1.96\sqrt{\frac{.0730(1 - .0730)}{20500} + \frac{.0630(1 - .0630)}{20500}} = .0100 \pm .0049$$
LCL = .0051, UCL = .0149
Estimating $p_1 - p_2$
• Interpretation
– We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
– The extra cost per patient of using t-PA instead of Streptokinase is 2900 - 460 = $2440.
– The cost per life saved by switching to t-PA is therefore estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
```
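The cost-per-life arithmetic of the last slide, as a short sketch. Treating $1/(p_1 - p_2)$ patients saves one life on average. The cost per life saved is therefore the extra cost per patient divided by $p_1 - p_2$, and the upper bound of the interval gives the lower cost. The bounds are the rounded values from the slides.

```python
extra_cost_per_patient = 2900 - 460  # t-PA price minus Streptokinase price ($)
lcl, ucl = 0.0051, 0.0149            # 95% CI for p1 - p2, as rounded on the slides

# Cost per life saved = extra cost per patient / lives saved per patient (p1 - p2).
lowest_cost = extra_cost_per_patient / ucl   # largest survival gain, lowest cost
highest_cost = extra_cost_per_patient / lcl  # smallest survival gain, highest cost
print(f"${lowest_cost:,.0f} to ${highest_cost:,.0f}")  # $163,758 to $478,431
```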