Economics 173
Business Statistics
Lectures 5 & 6
Summer, 2001
Professor J. Petry
Chapter 12
Inference about the
Comparison of
Two Populations
12.1 Introduction
• A variety of techniques are presented whose
objective is to compare two populations.
• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.
12.2 Inference about the Difference between
Two Means: Independent Samples
• Two random samples are drawn from the two
populations of interest.
• Because we are interested in the difference
between the two means, we shall build the
statistic $\bar{x}$ for each sample (and support the
analysis by the statistic $s^2$ as well).
The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original)
population distributions are normal.
• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the
(original) populations are not normal, but the sample
sizes are large.
• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or
approximately normal we can write:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• Z can be used to build a test statistic or a
confidence interval for $\mu_1 - \mu_2$.
• Practically, the “Z” statistic is hardly used,
because the population variances are not known.
• Instead, we construct a “t” statistic using the
sample variances ($s_1^2$ and $s_2^2$) in place of $\sigma_1^2$ and $\sigma_2^2$:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
• Two cases are considered when producing the
t-statistic.
– The two unknown population variances are equal.
– The two unknown population variances are not equal.
Case I: The two variances are equal
• Calculate the pooled variance estimate by:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Example: $s_1^2 = 25$; $s_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,
$$s_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$
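• Construct the t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
• Perform a hypothesis test
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$; or $< 0$; or $\neq 0$
• Build an interval estimate
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $1 - \alpha$ is the confidence level.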
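The pooled-variance calculation for the equal-variances case can be sketched in a few lines of Python. This is only an illustration: the function names `pooled_variance` and `pooled_t_statistic` are made up for this sketch and are not part of the lecture, and the inputs are the summary statistics from the small example (variances 25 and 30, sample sizes 10 and 15).

```python
import math


def pooled_variance(var1, n1, var2, n2):
    """Weighted average of two sample variances (equal-variances case)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_statistic(mean1, mean2, var1, n1, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, degrees of freedom) for the equal-variances test."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    standard_error = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2 - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2


# Summary statistics from the pooled-variance example.
sp2 = pooled_variance(25, 10, 30, 15)
print(f"pooled variance: {sp2:.4f}")
```

The pooled estimate always falls between the two sample variances and is pulled toward the variance of the larger sample.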
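Case II: The two variances are unequal
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$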
Run a hypothesis test as needed, or build an interval estimate.
Estimator:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $1 - \alpha$ is the confidence level.
• Example 12.1
– Do people who eat high-fiber cereal for
breakfast consume, on average, fewer
calories for lunch than people who do not eat
high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn.
Each person was identified as a consumer or
a non-consumer of high-fiber cereal.
– For each person the number of calories
consumed at lunch was recorded.
Calories consumed at lunch (excerpt; the full sample has 150 observations)
Consumers: 568, 498, 589, 681, 540, 646, 636, 739, 539, 596, 607, 529, 637, 617, 633, 555, ...
Non-consumers: 705, 819, 706, 509, 613, 582, 601, 608, 787, 573, 428, 754, 741, 628, 537, 748, ...
Solution:
• The data are quantitative.
• The parameter to be tested is
the difference between two means.
• The claim to be tested is that
mean caloric intake of consumers ($\mu_1$)
is less than that of non-consumers ($\mu_2$).
• Identifying the technique
– The hypotheses are:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 < 0$ (that is, $\mu_1 < \mu_2$)
– To check the relationship between the variances, we use
computer output to find the samples’ standard deviations.
We have $s_1 = 64.05$ and $s_2 = 103.29$. It appears that the
variances are unequal.
– We run the t-test for unequal variances.
t-Test: Two-Sample Assuming Unequal Variances
                               Consumers   Nonconsumers
Mean                           604.023     633.234
Variance                       4102.98     10669.8
Observations                   43          107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09107
P(T<=t) one-tail               0.01929
t Critical one-tail            1.65734
P(T<=t) two-tail               0.03858
t Critical two-tail            1.97944
• At the 5% significance level there is
sufficient evidence to reject the null
hypothesis.
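The t statistic and degrees of freedom in the printout can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming only Python's standard library; the helper name `welch_t` is chosen for this illustration, and the critical value is copied from the printout rather than computed.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, approximate d.f.) for the unequal-variances test."""
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2 - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Summary statistics for consumers (sample 1) and non-consumers (sample 2).
t_stat, df = welch_t(604.023, 4102.98, 43, 633.234, 10669.8, 107)

t_critical = 1.65734  # one-tail critical value at 5%, taken from the printout
reject_h0 = t_stat < -t_critical  # left-tail test: H1 is mu1 - mu2 < 0

print(f"t = {t_stat:.4f}, d.f. = {df:.1f}, reject H0: {reject_h0}")
```

The degrees-of-freedom formula rarely gives a whole number; rounding it to the nearest integer gives the value shown in the printout.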
• Solving by hand
– The interval estimator for the difference between two
means is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
= (604.02 - 633.23) \pm 1.9796\sqrt{\frac{64.05^2}{43} + \frac{103.29^2}{107}}
= -29.21 \pm 27.65$$
• Example 12.2
– Does job design (referring to worker movements) affect
workers’ productivity?
– Two job designs are being considered for the
production of a new computer desk.
– Two samples are randomly and independently selected
• A sample of 25 workers assembled a desk using design A.
• A sample of 25 workers assembled the desk using design B.
• The assembly times were recorded
– Do the assembly times of the two designs differ?
Assembly times in minutes (excerpt; each sample has 25 observations)
Design-A: 6.8, 5.0, 7.9, 5.2, 7.6, 5.0, 5.9, 5.2, 6.5, ...
Design-B: 5.2, 6.7, 5.7, 6.6, 8.5, 6.5, 5.9, 6.7, 6.6, ...
Solution
• The data are quantitative.
• The parameter of interest is the difference
between two population means.
• The claim to be tested is whether a difference
between the two designs exists.
– The hypotheses are:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \neq 0$
– To check the relationship between the two variances, calculate
the values of $s_1$ and $s_2$. We have $s_1 = 0.92$ and $s_2 = 1.14$.
These are close enough that we treat the two variances as equal.
– To calculate the t-statistic we have:
$\bar{x}_1 = 6.288$, $\bar{x}_2 = 6.016$, $s_1^2 = 0.8478$, $s_2^2 = 1.3031$
$$s_p^2 = \frac{(25 - 1)(0.8478) + (25 - 1)(1.3031)}{25 + 25 - 2} = 1.075$$
• Solving by hand
$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.075\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$
Let us determine the rejection region.
• The rejection region is
$$|t| > t_{\alpha/2,\,\text{d.f.}} = t_{0.025,48} \approx 2.009 \quad \text{for } \alpha = 0.05$$
Notice the absolute value: this is a two-tail test.
• The test: Since $t = 0.93 < 2.009$, there is
insufficient evidence to reject the null hypothesis.
[Figure: t distribution with a rejection region of area .025 beyond 2.009; the statistic 0.93 lies outside it]
• Conclusion: From this experiment, it is unclear at the
5% significance level if the two job designs are
different in terms of workers’ productivity.
The Excel printout
t-Test: Two-Sample Assuming Equal Variances
                               Design-A      Design-B
Mean                           6.288         6.016
Variance                       0.847766667   1.3030667     ($s_1^2$, $s_2^2$)
Observations                   25            25
Pooled Variance                1.075416667                 ($s_p^2$)
Hypothesized Mean Difference   0                           ($\mu_1 - \mu_2$)
df                             48                          (degrees of freedom)
t Stat                         0.927332603                 (t-statistic)
P(T<=t) one-tail               0.179196744                 (p-value of the one-tail test)
t Critical one-tail            1.677224191
P(T<=t) two-tail               0.358393488                 (p-value of the two-tail test)
t Critical two-tail            2.01063358
A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
= (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)}
= 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$
Thus, at the 95% confidence level
$-0.3176 < \mu_1 - \mu_2 < 0.8616$
Notice: “Zero” is included in the interval
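As a cross-check on the arithmetic for Example 12.2, the t statistic and the 95% interval can be recomputed from the printed means and variances. This sketch uses only the standard library; the variable names are chosen for illustration, and the critical value 2.0106 is taken from the Excel printout rather than computed.

```python
import math

# Summary statistics for Example 12.2 (from the Excel printout).
mean_a, var_a, n_a = 6.288, 0.847766667, 25
mean_b, var_b, n_b = 6.016, 1.3030667, 25
t_critical = 2.0106  # two-tail critical value at 5% with 48 d.f., from the printout

# Pooled variance and the standard error of the difference in means.
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
standard_error = math.sqrt(sp2 * (1 / n_a + 1 / n_b))

difference = mean_a - mean_b
t_stat = difference / standard_error
half_width = t_critical * standard_error
lower, upper = difference - half_width, difference + half_width

print(f"t = {t_stat:.4f}")
print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

Because the interval contains zero, it leads to the same conclusion as the two-tail test: the data do not show a difference between the designs.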
Checking the required conditions for the
equal variances case (Example 12.2)
[Figure: histograms of the assembly times for Design A and Design B]
The histograms are not perfectly bell shaped, but they
appear approximately normal. Since the technique
is robust to moderate departures from normality, we can
be confident about the results.
12.4 Matched Pairs Experiment
• What is a matched pair experiment?
• Why are matched pairs experiments needed?
• How do we deal with data produced in this way?
The following example demonstrates a situation
where a matched pair experiment is the correct
approach to testing the difference between two
population means.
Example 12.3
• To determine whether a new steel-belted radial tire lasts
longer than a current model, the manufacturer designs
the following experiment.
– A pair of newly designed tires are installed on the rear wheels
of 20 randomly selected cars.
– A pair of currently used tires are installed on the rear wheels
of another 20 cars.
– Drivers drive in their usual way until the tires wear out.
– The number of miles driven by each driver was recorded.
See data next.
Solution
• Compare two populations of quantitative data.
• The parameter is $\mu_1 - \mu_2$, where
$\mu_1$ = mean distance driven before wear-out occurs for the new design tires
$\mu_2$ = mean distance driven before wear-out occurs for the existing design tires
• The hypotheses are:
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 > 0$
Distance driven before wear-out (20 cars in each sample):
New-Design: 70, 83, 78, 46, 74, 56, 74, 52, 99, 57, 77, 84, 72, 98, 81, 63, 88, 69, 54, 97
Existing-Design: 47, 65, 59, 61, 75, 65, 73, 85, 97, 84, 72, 39, 72, 91, 64, 63, 79, 74, 76, 43
• The test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
We run the t test, and obtain the following Excel results.
t-Test: Two-Sample Assuming Equal Variances
                               New Dsgn      Exstng dsgn
Mean                           73.6          69.2
Variance                       243.4105263   226.8
Observations                   20            20
Pooled Variance                235.1052632
Hypothesized Mean Difference   0
df                             38
t Stat                         0.907447484
P(T<=t) one-tail               0.184944575
t Critical one-tail            1.685953066
P(T<=t) two-tail               0.36988915
t Critical two-tail            2.024394234
We conclude that there is insufficient
evidence to reject H0 in favor of H1.
[Figure: histograms of the New design and Existing design samples, with bins at 45, 60, 75, 90, 105, More]
While the sample mean of the new design is larger than the sample mean
of the existing design, the variability within each sample is large enough
for the sample distributions to overlap and cover about the same range.
It is therefore difficult to argue that one expected value is different from
the other.
• Example 12.4
– To eliminate variability among observations within
each sample the experiment was redone.
– One tire of each type was installed on the rear wheel of 20
randomly selected cars (each car was sampled twice, thus
creating a pair of observations).
– The number of miles until wear-out was recorded.
Car   New-Dsn   Exst-Dsn
1     57        48
2     64        50
3     102       89
4     62        56
5     81        78
6     87        75
7     61        50
8     62        49
9     74        70
10    62        66
11    100       98
12    90        86
13    83        78
14    84        90
15    86        98
16    62        58
17    67        58
18    40        41
19    71        61
20    77        82
t-Test: Paired Two Sample for Means
                               New-Dsn    Exst-Dsn
Mean                           73.6       69.05
Variance                       242.779    316.366
Observations                   20         20
Pearson Correlation            0.91468
Hypothesized Mean Difference   0
df                             19
t Stat                         2.81759
P(T<=t) one-tail               0.0055
t Critical one-tail            1.72913
P(T<=t) two-tail               0.01099
t Critical two-tail            2.09302
So what really happened here?
The values each sample consists of might markedly vary...
[Figure: the range of observations in sample A and in sample B]
...but the differences between pairs of observations
might be quite close to one another, resulting in a small
variability of the differences.
[Figure: the range of the differences, clustered near 0]
Observe the statistic t shown below
and notice how a small variability of
the differences (small sD) helps in
rejecting the null hypothesis.
• Solving by hand
– Calculate the difference for each pair of observations.
– Calculate the average of the differences and the standard
deviation of the differences.
– Build the statistic as follows:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$$
– Run the hypothesis test using the t distribution with $n_D - 1$
degrees of freedom.
– The hypotheses for this problem are
H0: $\mu_D = 0$
H1: $\mu_D > 0$
– The differences (New-Dsn minus Exst-Dsn) for the 20 cars are:
9, 14, 13, 6, 3, 12, 11, 13, 4, -4, 2, 4, 5, -6, -12, 4, 9, -1, 10, -5
Average = 4.55
Standard Deviation = 7.2218636
– The statistic is
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}} = \frac{4.55 - 0}{7.22186/\sqrt{20}} = 2.817$$
– The rejection region is $t > t_{\alpha,\,\text{d.f.}}$ with d.f. = 20 - 1 = 19.
If $\alpha = .05$, $t_{.05,19} = 1.729$.
– Since 2.817 > 1.729, there
is sufficient evidence in the data
to reject the null hypothesis in
favor of the alternative hypothesis.
– Conclusion: At the 5% significance
level the new type tires last longer
than the current type.
Estimating the mean difference
Interval estimator of $\mu_D$:
$$\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\frac{s_D}{\sqrt{n_D}}$$
The 95% confidence interval of the mean difference
in Example 12.4 is
$$4.55 \pm 2.093\frac{7.22}{\sqrt{20}} = 4.55 \pm 3.38$$
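The matched pairs calculation for Example 12.4 can be retraced from the raw data. The sketch below is an illustration only, using Python's standard `statistics` module; the variable names are made up for this example, and the critical values 1.729 and 2.093 are copied from the slides rather than computed.

```python
import math
import statistics

# Miles until wear-out for the 20 cars in Example 12.4 (same car order in both lists).
new_design = [57, 64, 102, 62, 81, 87, 61, 62, 74, 62,
              100, 90, 83, 84, 86, 62, 67, 40, 71, 77]
existing_design = [48, 50, 89, 56, 78, 75, 50, 49, 70, 66,
                   98, 86, 78, 90, 98, 58, 58, 41, 61, 82]

# Work with the paired differences instead of the two samples separately.
differences = [n - e for n, e in zip(new_design, existing_design)]
n_d = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)  # sample standard deviation (n - 1 divisor)
standard_error = sd_d / math.sqrt(n_d)

t_stat = (mean_d - 0) / standard_error  # H0: mu_D = 0
t_one_tail = 1.729  # t_{.05,19}, from the slides
t_two_tail = 2.093  # t_{.025,19}, from the slides

reject_h0 = t_stat > t_one_tail
interval = (mean_d - t_two_tail * standard_error,
            mean_d + t_two_tail * standard_error)

print(f"mean difference = {mean_d:.2f}, s_D = {sd_d:.4f}, t = {t_stat:.3f}")
print(f"reject H0: {reject_h0}, 95% interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```

Pairing removes the car-to-car variability, which is why the standard deviation of the differences is much smaller than the standard deviation of either sample.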
Checking the required conditions
for the paired observations case
• The validity of the results depends on the
normality of the differences.
[Figure: histogram of the differences, with bins at -12, -6, 0, 6, 12, More]
12.5 Inferences about the ratio
of two variances
• In this section we discuss how to compare the
variability of two populations.
• In particular, we draw inference about the ratio of
two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of
processes.
– The relationships between variances determine the technique
used to test relationships between mean values
• Point estimator of $\sigma_1^2/\sigma_2^2$
– Recall that $s^2$ is an unbiased estimator of $\sigma^2$.
– Therefore, it is not surprising that we estimate $\sigma_1^2/\sigma_2^2$
by $s_1^2/s_2^2$.
• Sampling distribution of $s_1^2/s_2^2$
– The statistic $[s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ follows the F distribution
with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
– The test statistic for $\sigma_1^2/\sigma_2^2$ is derived from this
statistic.
• Testing $\sigma_1^2/\sigma_2^2$
– Our null hypothesis is always
H0: $\sigma_1^2/\sigma_2^2 = 1$
– Under this null hypothesis the F statistic
$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$
becomes
$$F = \frac{s_1^2}{s_2^2}$$
Example 12.5
(see Example 12.1)
In order to perform a test regarding average
consumption of calories at people’s lunch in relation to the
inclusion of high-fiber cereal in their breakfast, the variance
ratio of two samples has to be tested first.
The hypotheses are:
H0: $\sigma_1^2/\sigma_2^2 = 1$
H1: $\sigma_1^2/\sigma_2^2 \neq 1$
F-Test Two-Sample for Variances
                      Consumers     Nonconsumers
Mean                  604.0232558   633.2336449
Variance              4102.975637   10669.76565
Observations          43            107
df                    42            106
F                     0.384542245
P(F<=f) one-tail      0.000368433
F Critical one-tail   0.637072617
• Solving by hand
– The rejection region is
$F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$
which becomes (for $\alpha = 0.05$)
$F > F_{.025,42,106} \approx F_{.025,40,120} = 1.61$
$F < 1/F_{.025,106,42} \approx 1/F_{.025,120,40} = 1/1.72 = .58$
– The F statistic value is $F = s_1^2/s_2^2 = .3845$
– Conclusion: Because .3845 < .58 we can reject the null
hypothesis in favor of the alternative hypothesis.
– There is sufficient evidence in the data to argue at the 5%
significance level that the variances of the two groups differ.
Estimating the Ratio of Two Population
Variances
• From the statistic $F = [s_1^2/\sigma_1^2] / [s_2^2/\sigma_2^2]$ we can
isolate $\sigma_1^2/\sigma_2^2$ and build the following interval
estimator:
$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
• Example 12.6
– Determine the 95% confidence interval estimate of
the ratio of the two population variances in Example
12.1
– Solution
• We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately)
and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately)
• LCL = $(s_1^2/s_2^2)[1/F_{\alpha/2,\nu_1,\nu_2}]$
= (4102.98/10,669.770)[1/1.61] = .2388
• UCL = $(s_1^2/s_2^2)[F_{\alpha/2,\nu_2,\nu_1}]$
= (4102.98/10,669.770)[1.72] = .6614
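The F statistic and the interval limits for the variance ratio are simple enough to verify in a few lines. This is a sketch under stated assumptions: the two critical values are the approximate table values quoted above (1.61 and 1.72), not values computed from the F distribution, and the variable names are illustrative.

```python
# Sample variances from Example 12.1 (consumers, non-consumers).
var_consumers = 4102.98
var_nonconsumers = 10669.77

# Approximate table values quoted in the slides (assumed, not computed here).
f_upper_v1_v2 = 1.61  # F_{.025,40,120}
f_upper_v2_v1 = 1.72  # F_{.025,120,40}

f_stat = var_consumers / var_nonconsumers

# Two-tail test of H0: sigma1^2 / sigma2^2 = 1 at the 5% level.
reject_h0 = f_stat > f_upper_v1_v2 or f_stat < 1 / f_upper_v2_v1

# 95% confidence limits for sigma1^2 / sigma2^2.
lcl = f_stat / f_upper_v1_v2
ucl = f_stat * f_upper_v2_v1

print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
print(f"LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

The interval lies entirely below 1, which agrees with the test result that the two variances differ.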
12.6 Inference about the difference
between two population proportions
• In this section we deal with two populations
whose data are qualitative.
• When data are qualitative we can (only) ask
questions regarding the proportions of
occurrence of certain outcomes.
• Thus, we hypothesize on the difference p1-p2,
and draw an inference from the hypothesis test.
• Sampling Distribution of the Difference
Between Two Sample Proportions, $\hat{p}_1 - \hat{p}_2$
– Two random samples are drawn from two populations.
– The number of successes in each sample is recorded.
– The sample proportions are computed.
Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$
Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$
– The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally
distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all
equal to or greater than 5.
Because $p_1$ and $p_2$ are unknown, we use their estimates instead.
Thus, $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, $n_2\hat{q}_2$
should all be equal to or greater than 5.
– The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
– The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.
The statistic
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
is approximately normally distributed.
• Testing the Difference between Two
Population Proportions, $p_1 - p_2$
– We hypothesize on the difference between the two
proportions, $p_1 - p_2$.
– There are two cases to consider:
Case 1: H0: $p_1 - p_2 = 0$
Calculate the pooled proportion
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Case 2: H0: $p_1 - p_2 = D$ (D is not equal to 0)
Do not pool the data
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
Then
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$
• Example 12.7
– A research project employing 22,000 American
physicians was conducted to discover whether aspirin
can prevent heart attacks.
– Half of the participants in the research took aspirin,
and half took a placebo.
– Over a three-year period, 104 of those who took aspirin
and 189 of those who took the placebo had
heart attacks.
– Is aspirin effective in preventing heart attacks?
• Solution
– Identifying the technique
• The problem objective is to compare the population of
those who take aspirin with those who do not.
• The data are qualitative (take/do not take aspirin).
• The hypotheses are
H0: $p_1 - p_2 = 0$
H1: $p_1 - p_2 < 0$
where population 1 is the aspirin takers and population 2 is the placebo takers.
• We identify here case 1, so
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
– Solving by hand
• For a 5% significance level the rejection region is
$z < -z_{\alpha} = -z_{.05} = -1.645$
• The sample proportions are
$\hat{p}_1 = 104/11{,}000 = .009455$ and $\hat{p}_2 = 189/11{,}000 = .01718$
• The pooled proportion is
$\hat{p} = (x_1 + x_2)/(n_1 + n_2) = (104 + 189)/(11{,}000 + 11{,}000) = .01332$
• The z statistic becomes
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
= \frac{.009455 - .01718}{\sqrt{.01332(.98668)\left(\dfrac{1}{11{,}000} + \dfrac{1}{11{,}000}\right)}} = -5.00$$
• Since $-5.00 < -1.645$, reject the null hypothesis.
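The case 1 (pooled) z statistic for the aspirin example can be recomputed directly from the counts. This is a minimal sketch with an illustrative helper name, `pooled_z`; the critical value -1.645 is the one quoted in the slides. Working from the unrounded proportions, the statistic comes out very close to -5, far inside the rejection region.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Heart attacks among aspirin takers (sample 1) and placebo takers (sample 2).
z = pooled_z(104, 11_000, 189, 11_000)
reject_h0 = z < -1.645  # left-tail test at the 5% level

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```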
• Example 12.8 (Marketing application)
– Management needs to decide which of two new
packaging designs to adopt, to help improve sales of a
soap.
– A study is performed in two communities:
• Design A is distributed in Community 1.
• Design B is distributed in Community 2.
• The old design package is still offered in both communities.
– For design A to be financially viable it has to outsell
design B by at least 3%.
– Summary of the experiment results
• Community 1 - 580 packages with new design A sold
324 packages with old design sold
• Community 2 - 604 packages with new design B sold
442 packages with old design sold
– Use 1% significance level and perform a test to find
which type of packaging to use.
• Solution
– Identifying the technique
• The problem objective is to compare two populations,
consisting of the values “purchase of the new design”,
and “purchase of the old design”.
• Data are qualitative. We need to test $p_1 - p_2$.
• The hypotheses to test are
H0: p1 - p2 = .03
H1: p1 - p2 > .03
• We have to perform case 2 of the test for difference in
proportions (the difference is not equal to zero).
• Solving by hand
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}
= \frac{\left(\dfrac{580}{580 + 324} - \dfrac{604}{604 + 442}\right) - .03}{\sqrt{\dfrac{.642(1 - .642)}{904} + \dfrac{.577(1 - .577)}{1046}}} = 1.58$$
The rejection region is $z > z_{\alpha} = z_{.01} = 2.33$.
Conclusion: Do not reject the null hypothesis.
There is insufficient evidence to infer that
packaging with design A will outsell design B
by 3% or more.
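The case 2 (unpooled) statistic for the packaging example can be checked the same way. The sketch below uses an illustrative helper name, `unpooled_z`, and the critical value 2.33 from the slides. With unrounded sample proportions the statistic comes out slightly below the 1.58 obtained from proportions rounded to three decimals; the conclusion is the same either way.

```python
import math


def unpooled_z(x1, n1, x2, n2, hypothesized_diff):
    """z statistic for H0: p1 - p2 = D with D != 0 (the data are not pooled)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1
                               + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - hypothesized_diff) / standard_error


# Community 1: 580 new-design packages out of 580 + 324 sold.
# Community 2: 604 new-design packages out of 604 + 442 sold.
z = unpooled_z(580, 580 + 324, 604, 604 + 442, hypothesized_diff=0.03)
reject_h0 = z > 2.33  # right-tail test at the 1% level

print(f"z = {z:.3f}, reject H0: {reject_h0}")
```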
• Estimating the Difference Between Two
Population Proportions
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
• Example 12.9
Estimate with 95% confidence the proportion of men who would avoid a heart
attack if they take aspirin regularly.
$$(.009455 - .01718) \pm 1.96\sqrt{\frac{.009455(.990545)}{11{,}000} + \frac{.01718(.98282)}{11{,}000}}
= [-.010753,\ -.004697]$$
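The interval estimate for the aspirin data can be recomputed from the counts. This is a small sketch with an illustrative function name, `diff_proportion_interval`; the value 1.96 is the usual two-tail normal critical value for 95% confidence. Results can differ from the hand calculation in the last digit because the proportions are not rounded here.

```python
import math


def diff_proportion_interval(x1, n1, x2, n2, z_critical=1.96):
    """Confidence interval for p1 - p2 based on the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1
                               + p2_hat * (1 - p2_hat) / n2)
    difference = p1_hat - p2_hat
    return (difference - z_critical * standard_error,
            difference + z_critical * standard_error)


lower, upper = diff_proportion_interval(104, 11_000, 189, 11_000)
print(f"95% interval for p1 - p2: [{lower:.6f}, {upper:.6f}]")
```

Both limits are negative, so the interval points the same way as the test: the heart attack proportion is lower among aspirin takers.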
12.7 Market Segmentation
(Optional)
• Market segmentation is a statistical analysis
aimed at determining the differences that exist
between buyers and non-buyers of a company’s
product.
• Statistics plays a major role in market segmentation.
– Surveys are used to gather the relevant data.
– Statistical tests are used to differentiate among segments.
– Sales and profit estimates are derived.
• Example 12.10
– A new company in the market offers no-wait services
for car oil and filter change.
– The company wants to make decisions about where to
advertise, and the nature of the advertisement.
– A sample of 1000 car owners was selected. The
drivers were asked to report whether or not they used
a no-wait station, as well as several characteristics of
their lives (including age).
– The research should reveal whether differences in age
exist between customers of no-wait service and
customers of other types of facilities (see file XM12-10)
• Solution
– Identifying the technique
• The problem objective is to compare the population of ages
of no-wait customers, to the population of ages of other
facility users.
• Data are quantitative.
• Samples are independent.
• The parameter to be tested is $\mu_1 - \mu_2$ ($\mu$ represents mean
age).
– The hypotheses are
H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \neq 0$
– When testing for the relationship between the two
variances we get the following results
F-Test Two-Sample for Variances
                      No-Wait    Other
Mean                  47.78331   44.03448
Variance              77.17323   60.09721
Observations          623        377
df                    622        376
F                     1.28414
P(F<=f) one-tail      0.003822
F Critical one-tail   1.166224
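Because F = 1.28414 exceeds the critical value 1.166224 (p-value 0.003822),
the variances appear to differ, so we run the test for $\mu_1 - \mu_2$
with unequal variances.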
Chapter 13
Statistical Inferences:
A Review of
Chapters 11 through 12
13.1 Introduction
In this chapter we try to build a framework that helps
decide which technique (or techniques) should be
used in solving a problem.
Flow chart of techniques for Chapters 11 and 12
Problem objective?
• Describing a single population. Data type?
– Quantitative. Type of descriptive measurement?
  » Central location: t-test & estimator of $\mu$
  » Variability: $\chi^2$-test & estimator of $\sigma^2$
– Qualitative: Z test & estimator of $p$
• Compare two populations. Data type?
– Quantitative. Type of descriptive measurement?
  » Central location. Experimental design?
    Independent samples. Population variances?
      Equal: t-test & estimator of $\mu_1 - \mu_2$ (equal variances)
      Unequal: t-test & estimator of $\mu_1 - \mu_2$ (unequal variances)
    Matched pairs: t-test & estimator of $\mu_D$
  » Variability: F-test & estimator of $\sigma_1^2/\sigma_2^2$
– Qualitative: Z test & estimator of $p_1 - p_2$
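The decision path in the flow chart can also be written as a small lookup function. This is only a sketch of the same logic: the function name `choose_technique`, its argument names, and the string labels are invented for this illustration and do not come from the course materials.

```python
def choose_technique(objective, data_type, measurement=None,
                     design=None, equal_variances=None):
    """Walk the flow chart and return the matching test and estimator."""
    if objective == "single population":
        if data_type == "qualitative":
            return "Z test & estimator of p"
        if measurement == "central location":
            return "t-test & estimator of mu"
        if measurement == "variability":
            return "chi-squared test & estimator of sigma^2"
    elif objective == "two populations":
        if data_type == "qualitative":
            return "Z test & estimator of p1 - p2"
        if measurement == "variability":
            return "F-test & estimator of sigma1^2 / sigma2^2"
        if measurement == "central location":
            if design == "matched pairs":
                return "t-test & estimator of mu_D"
            if design == "independent samples" and equal_variances is not None:
                case = "equal" if equal_variances else "unequal"
                return f"t-test & estimator of mu1 - mu2 ({case} variances)"
    # Any combination not on the chart is reported rather than guessed.
    raise ValueError("combination not covered by the flow chart")


# Example 12.4: two populations, quantitative data, central location, matched pairs.
print(choose_technique("two populations", "quantitative",
                       measurement="central location", design="matched pairs"))
```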
Summary of statistical inferences:
Chapters 11 and 12
• Problem objective: Describe a single population.
– Data type: Quantitative
• Descriptive measurement: Central location
– Parameter: $\mu$
– Test statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
– Interval estimator: $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$
– Required condition: Normal population
Summary - continued
• Descriptive measurement: Variability.
– Parameter: $\sigma^2$
– Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$
– Interval estimator: $\text{LCL} = \dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}}$, $\text{UCL} = \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
– Required condition: normal population.
Summary - continued
– Data type: Qualitative
– Parameter: $p$
– Test statistic: $z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$
– Interval estimator: $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
– Required condition:
$np \ge 5$ and $n(1 - p) \ge 5$ (for test)
$n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ (for estimate)
Summary - continued
• Problem objective: Compare two populations.
– Data type: Quantitative.
• Descriptive measurement: Central location
– Experimental design: Independent samples
» Population variances: $\sigma_1^2 = \sigma_2^2$
» Parameter: $\mu_1 - \mu_2$
» Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$
» Interval estimator:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
» Required condition: Normal populations
Summary - continued
• Problem objective: Compare two populations.
– Data type: Quantitative.
• Descriptive measurement: Central location
– Experimental design: Independent samples
» Population variances: $\sigma_1^2 \neq \sigma_2^2$
» Parameter: $\mu_1 - \mu_2$
» Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad
\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
» Interval estimator:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
» Required condition: Normal populations
Summary - continued
• Problem objective: Compare two populations.
– Data type: Quantitative.
• Descriptive measurement: Central location
– Experimental design: Matched pairs
» Parameter: $\mu_D$
» Test statistic:
$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}, \qquad \text{d.f.} = n_D - 1$$
» Interval estimator:
$$\bar{x}_D \pm t_{\alpha/2,\,n_D - 1}\frac{s_D}{\sqrt{n_D}}$$
» Required condition: Normal differences
Summary - continued
• Problem objective: Compare two populations.
– Data type: Quantitative
• Descriptive measurement: Variability
– Parameter: $\sigma_1^2/\sigma_2^2$
– Test statistic: $F = s_1^2/s_2^2$
– Interval estimator:
$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}, \qquad \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
– Required condition: Normal populations
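Summary - continued
• Problem objective: Compare two populations.
– Data type: Qualitative
– Parameter: $p_1 - p_2$
– Test statistic:
Case 1: H0: $p_1 - p_2 = 0$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Case 2: H0: $p_1 - p_2 = D$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$
– Interval estimator:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
– Required condition:
$n_1\hat{p}_1,\ n_1(1 - \hat{p}_1),\ n_2\hat{p}_2,\ n_2(1 - \hat{p}_2) \ge 5$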