Statistical Analysis Notes-Section6-2

ARG/PDW: MCEN4027F00
VII: 1
MULTIPLE COMPARISONS
When the computed value of the F-statistic in a single-factor
ANOVA is not significant, the analysis ends there, because no
difference between any two of the population means has been
detected.
However, when H0 is rejected, the investigator usually wishes to
know which of the population means differ from each other.
Such an analysis can be conducted using a post-treatment
methodology termed a multiple comparisons procedure.
Procedure for Making Multiple Comparisons
Many practical experiments are conducted to determine the largest
(or the smallest) mean in a set.
For example, suppose that a chemist has developed five
chemical solutions for removing a corrosive substance from a
metal fitting.
The chemist would then want to determine the solution that
will remove the greatest amount of the corrosive substance
from the fitting in a single application.
Similarly, a production engineer might want to determine which
among six machines or which among three technicians achieves
the highest productivity per hour.
A mechanical engineer might want to choose one engine, from
among five, that is most efficient, and so on.
Choosing the treatment with the largest mean from among five
treatments might appear to be a simple matter.
We could make, for example, n1 = n2 = ... = n5 = 10
observations on each treatment, obtain the sample means, and
compare them using Student’s t-tests to determine whether
differences exist among the pairs of means.
However, there is a problem associated with this procedure: a
Student’s t-test, with its corresponding value of α, is valid only
when the two treatments to be compared are selected prior to
experimentation.
A Student’s t-test cannot be used after the fact to compare
the treatments with the largest and smallest sample means
because they will always be farther apart, on the average,
than any pair of treatments selected at random.
Furthermore, if you conduct a series of t-tests, each with a
chance α of indicating a difference between a pair of means
when in fact no difference exists, then the risk of making at
least one Type I error in the series will be larger
than the value of α specified for a single t-test.
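The inflation of the Type I error risk can be illustrated with a rough calculation. The sketch below assumes the individual tests are independent, which is not strictly true for pairwise t-tests that share samples, so the number should be read as an approximation only. The function name `familywise_error_rate` is introduced here for illustration.

```python
from math import comb


def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one Type I error in n_comparisons tests.

    Assumes the tests are independent (an approximation for pairwise t-tests).
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


p = 5                 # number of treatments, as in the text
n_pairs = comb(p, 2)  # number of distinct pairs of treatment means
fwer = familywise_error_rate(0.05, n_pairs)

print(f"{n_pairs} pairwise t-tests at alpha = 0.05")
print(f"approximate risk of at least one Type I error: {fwer:.3f}")
```

With five treatments there are ten pairs, and under this approximation the overall risk is roughly 0.40 rather than 0.05.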
There are a number of procedures for comparing and ranking a group
of treatment means. The one that we will discuss is known as Tukey’s
method for multiple comparisons, which utilizes the studentized range

\[ q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{s / \sqrt{n}} \]

(where ȳmax and ȳmin are the largest and smallest sample means,
respectively) to determine whether the difference in any pair of
sample means implies a difference in the corresponding treatment
means.
The logic behind this multiple comparisons procedure is that if
we determine a critical value for the difference between the
largest and smallest sample means, one that implies a difference
in their respective treatment means, then any other pair of
sample means that differ by as much as or more than this critical
value would also imply a difference in the corresponding
treatment means.
Tukey’s procedure selects this critical distance, ω, so that the
probability of making one or more Type I errors (concluding
that a difference exists between a pair of treatment means if, in
fact, they are identical) is α.
Therefore, the risk of making a Type I error applies to the
whole procedure, i.e., to the comparisons of all pairs of
means, rather than to a single comparison.
Consequently, the value of α selected is called an
experimentwise error rate (in contrast to a comparisonwise
error rate).
Tukey’s procedure is based on the assumption that the p sample
means are based on independent random samples, each containing an
equal number n_t of observations.
If s = √MSE is the computed standard deviation for the
analysis, then the distance ω is

\[ \omega = q_{\alpha}(p, \nu) \, \frac{s}{\sqrt{n_t}} \]

The tabulated statistic q_α(p, ν) is the critical value of the
studentized range, the value that locates α in the upper tail of the
q distribution.
This critical value depends on α, on p, the number of treatment means
involved in the comparison, and on ν, the number of degrees of
freedom associated with MSE.
Values of q_α(p, ν) are usually given in standard tables of
statistics texts for α = 0.05 and α = 0.01.
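The critical distance is a one-line calculation once the tabulated studentized range value is known. The following is a minimal sketch, not part of the original notes: the function name `tukey_omega` and the numeric inputs are hypothetical, and the critical value is passed in as read from a table rather than computed.

```python
from math import sqrt


def tukey_omega(q_crit: float, mse: float, n_t: int) -> float:
    """Critical distance q_crit * s / sqrt(n_t), with s = sqrt(MSE).

    q_crit is the tabulated studentized range value for the chosen
    alpha, number of treatment means p, and error degrees of freedom nu.
    """
    if mse < 0 or n_t <= 0:
        raise ValueError("mse must be non-negative and n_t positive")
    s = sqrt(mse)
    return q_crit * s / sqrt(n_t)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
omega = tukey_omega(q_crit=4.0, mse=16.0, n_t=4)
print(f"omega = {omega:.2f}")
```

Any pair of sample means that differs by more than the returned distance would be declared different at the experimentwise level used to look up `q_crit`.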
Multiple Comparisons Example 1
Automobile hydrocarbon and CO emissions are controlled by
elaborate emission-control systems.
Such systems are affected by a number of factors including
operating conditions, wear on the system, and system tuning.
Suppose five systems are to be compared and sample selection is
randomized relative to the three conditions just cited.
Emission values in hydrocarbon parts per million (ppm) for each
system are replicated using a sample size of four.
Do the system emission characteristics differ?
The experiment involves a single factor, system type, which is at five
levels.
We utilize a single-factor ANOVA with p = 5 treatments.
Let μi represent the mean emission value for system i, i = 1, ..., 5.
H0: μ1 = μ2 = μ3 = μ4 = μ5
H1: At least two of the five means differ.
The null hypothesis is tested using α = 0.05.
A statistical software package can be used to perform the
ANOVA calculations.
System   Emissions (ppm)          Mean
1        102    92   100    90    96.0
2         92    88    96    82    89.5
3         83    80    85    90    84.5
4         72    70    66    72    70.0
5         86    88    90    84    87.0
Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
Row 1    4       384   96        34.67
Row 2    4       358   89.5      35.67
Row 3    4       338   84.5      17.67
Row 4    4       280   70        8
Row 5    4       348   87        6.67

ANOVA
Source of Variation   SS     df   MS      F    P-value     F crit
Between Groups        1479   4    369.7   18   0.0000135   3.06
Within Groups         308    15   20.5
Total                 1787   19
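The sums of squares in this output can be checked directly from the emission data. The sketch below uses only the Python standard library; the function name `one_way_anova` and the dictionary layout are choices made here for illustration, and no P-value is computed because that would require an F-distribution routine.

```python
# Emission data (ppm) from the table above: system -> four replicates.
emissions = {
    1: [102, 92, 100, 90],
    2: [92, 88, 96, 82],
    3: [83, 80, 85, 90],
    4: [72, 70, 66, 72],
    5: [86, 88, 90, 84],
}


def one_way_anova(groups):
    """Return the single-factor ANOVA quantities for a list of samples."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups: spread of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # this is MSE
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "mse": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(list(emissions.values()))
for name, value in result.items():
    print(f"{name:>10}: {value:.4g}")
```

The computed F of about 18 agrees with the spreadsheet output, and the `mse` entry is the value of MSE needed for Tukey's critical distance.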
Since the sample value of the F test statistic is 18 and F0.05,4,15 = 3.06,
mean equality is strongly rejected at the 0.05 level.
In fact, F0.01,4,15 = 4.89, and the null hypothesis is also rejected at
the 0.01 level.
To obtain a finer analysis of the data, we employ Tukey’s multiple
comparison test.
From Table 12, the critical value of the studentized range
distribution, qα,m,ν, is:
q0.05,5,15 = 4.37
Therefore,

\[ \omega_{0.05} = 4.37 \sqrt{\frac{20.5}{4}} = 9.89 \]
We now look for differences between sample means that are greater than
the critical value, 9.89:
x̄1 - x̄4 = 96.0 - 70.0 = 26.0
x̄1 - x̄3 = 96.0 - 84.5 = 11.5
x̄2 - x̄4 = 89.5 - 70.0 = 19.5
x̄5 - x̄4 = 87.0 - 70.0 = 17.0
x̄3 - x̄4 = 84.5 - 70.0 = 14.5
From this information we can draw the following conclusions:
- There is evidence to indicate that system 4 is different from
the other four systems, i.e., system 4 provides the lowest
emission levels.
- There is evidence to indicate that system 3 is different from
system 1.
- There is no evidence to indicate that systems 3, 5 and 2 are
different or that systems 1, 2 and 5 are different.
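The full set of ten pairwise comparisons for the emission systems can be enumerated mechanically. This is a sketch under the values quoted in the example (sample means, MSE = 20.5, four observations per system, and the tabulated q0.05,5,15 = 4.37); the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Sample means (ppm) for the five systems, from the data table.
means = {1: 96.0, 2: 89.5, 3: 84.5, 4: 70.0, 5: 87.0}

q_crit = 4.37  # tabulated studentized range value, alpha = 0.05, p = 5, nu = 15
mse = 20.5     # within-groups mean square from the ANOVA table
n_t = 4        # observations per system

omega = q_crit * sqrt(mse / n_t)  # Tukey's critical distance

# Compare every pair, listing systems from largest to smallest mean.
ranked = sorted(means, key=means.get, reverse=True)
significant = []
for a, b in combinations(ranked, 2):
    diff = means[a] - means[b]
    flag = "different" if diff > omega else "not distinguishable"
    print(f"system {a} vs system {b}: diff = {diff:5.1f} -> {flag}")
    if diff > omega:
        significant.append((a, b))

print(f"omega = {omega:.2f}")
print("significant pairs:", significant)
```

Every pair that involves system 4 exceeds the critical distance, along with the pair of systems 1 and 3; the remaining five pairs do not.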
Multiple Comparisons Example 2
To simultaneously test the mileage effects of three fuel mixtures and
two carburetors, an investigator decides to perform a two-way
ANOVA with three replications.
Based upon this design, 18 cars are randomly selected for the six
treatment combinations, and the resulting miles-per-gallon data
are given below.
What can be concluded?
We utilize a two-factor ANOVA and use a statistical software
package to perform the calculations.
                      Fuel
Carburetor     1       2       3       Mean
1              18.4    18.7    20.6
               19.0    17.9    20.0    19.10
               18.6    18.0    20.7
2              20.1    21.0    22.4
               21.2    20.8    22.4    21.16
               20.2    20.5    21.8
Mean           19.58   19.48   21.32
Anova: Two-Factor With Replication

SUMMARY          Fuel A    Fuel B    Fuel C    Total
Carburetor I
  Count          3         3         3         9
  Sum            56        54.6      61.3      171.9
  Average        18.67     18.2      20.43     19.1
  Variance       0.09333   0.19      0.1433    1.148
Carburetor II
  Count          3         3         3         9
  Sum            61.5      62.3      66.6      190.4
  Average        20.5      20.77     22.2      21.16
  Variance       0.37      0.06333   0.12      0.7653
Total
  Count          6         6         6
  Sum            117.5     116.9     127.9
  Average        19.58     19.48     21.32
  Variance       1.194     2.078     1.042

ANOVA
Source of Variation   SS       df   MS       F       P-value    F crit
Carburetors           19.01    1    19.01    116.4   1.57E-07   4.747
Fuel                  12.75    2    6.376    39.03   5.59E-06   3.885
Interaction           0.5911   2    0.2956   1.810   0.2057     3.885
Error                 1.96     12   0.1633
Total                 34.32    17

From the ANOVA, the interaction null hypothesis is not rejected
(F = 1.810 < 3.885), i.e., there is no evidence of an interaction
effect. The null hypotheses regarding the two factors, fuels and
carburetors, are both rejected, i.e., both effects are statistically
significant.
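The two-factor sums of squares can be reproduced from the raw mileage data. The sketch below assumes a balanced design (the same number of replicates in every cell), as in this example; the function name `two_way_anova` and the nested-dictionary layout are choices made here, and P-values are omitted because the standard library has no F-distribution routine.

```python
# mpg[carburetor][fuel] -> three replicate observations
mpg = {
    "I":  {"A": [18.4, 19.0, 18.6], "B": [18.7, 17.9, 18.0], "C": [20.6, 20.0, 20.7]},
    "II": {"A": [20.1, 21.2, 20.2], "B": [21.0, 20.8, 20.5], "C": [22.4, 22.4, 21.8]},
}


def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(data):
    """Two-factor ANOVA with replication for a balanced design."""
    rows = list(data)
    cols = list(data[rows[0]])
    a, b = len(rows), len(cols)
    r = len(data[rows[0]][cols[0]])  # replicates per cell (assumed equal)

    grand = mean([x for i in rows for j in cols for x in data[i][j]])
    row_mean = {i: mean([x for j in cols for x in data[i][j]]) for i in rows}
    col_mean = {j: mean([x for i in rows for x in data[i][j]]) for j in cols}
    cell_mean = {(i, j): mean(data[i][j]) for i in rows for j in cols}

    ss_rows = b * r * sum((row_mean[i] - grand) ** 2 for i in rows)
    ss_cols = a * r * sum((col_mean[j] - grand) ** 2 for j in cols)
    ss_inter = r * sum(
        (cell_mean[i, j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in rows for j in cols
    )
    ss_error = sum(
        (x - cell_mean[i, j]) ** 2 for i in rows for j in cols for x in data[i][j]
    )

    mse = ss_error / (a * b * (r - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_inter": ss_inter, "ss_error": ss_error, "mse": mse,
        "F_rows": (ss_rows / (a - 1)) / mse,
        "F_cols": (ss_cols / (b - 1)) / mse,
        "F_inter": (ss_inter / ((a - 1) * (b - 1))) / mse,
    }


table = two_way_anova(mpg)
for name, value in table.items():
    print(f"{name:>9}: {value:.4g}")
```

Here the rows are the carburetors and the columns are the fuels, so `F_rows`, `F_cols`, and `F_inter` correspond to the three F values in the ANOVA table.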
To obtain a finer analysis of the data, we employ Tukey’s multiple
comparison test.
Since the carburetor null hypothesis was rejected, and since
there are only two carburetor levels, it can be concluded that
μ1 ≠ μ2 for the carburetors.
We then test the fuel means:
From Table 12, the critical value of the studentized range
distribution, qα,m,ν, is:
q0.05,3,12 = 3.77
Therefore,

\[ \omega_{0.05} = 3.77 \sqrt{\frac{0.1633}{6}} = 0.622 \]
We now look for differences that are greater than the critical
value for fuels:
x̄1 - x̄2 = 19.58 - 19.48 = 0.10
x̄3 - x̄1 = 21.32 - 19.58 = 1.74
x̄3 - x̄2 = 21.32 - 19.48 = 1.84
We can draw the following conclusions:
There is evidence to indicate that fuel 3 is different from
fuels 1 and 2, i.e., fuel 3 provides the highest mpg.
There is no evidence to indicate that fuels 1 and 2 are
different.
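The fuel comparison pools each fuel's observations over both carburetors, so each fuel mean rests on six observations. The sketch below recomputes the fuel means from the raw data and applies the critical distance; it takes MSE = 0.1633 and q0.05,3,12 = 3.77 from the example, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Observations per fuel, pooled over both carburetors (3 replicates each).
fuel_obs = {
    1: [18.4, 19.0, 18.6, 20.1, 21.2, 20.2],
    2: [18.7, 17.9, 18.0, 21.0, 20.8, 20.5],
    3: [20.6, 20.0, 20.7, 22.4, 22.4, 21.8],
}

q_crit = 3.77  # tabulated studentized range value, alpha = 0.05, p = 3, nu = 12
mse = 0.1633   # error mean square from the two-factor ANOVA
n_t = 6        # observations behind each fuel mean

omega = q_crit * sqrt(mse / n_t)
fuel_mean = {f: sum(obs) / len(obs) for f, obs in fuel_obs.items()}

# True where the two fuel means differ by more than the critical distance.
differs = {
    (a, b): abs(fuel_mean[a] - fuel_mean[b]) > omega
    for a, b in combinations(fuel_mean, 2)
}
best = max(fuel_mean, key=fuel_mean.get)

print(f"omega = {omega:.3f}")
for pair, flag in differs.items():
    print(f"fuels {pair}: {'different' if flag else 'not distinguishable'}")
print(f"fuel with the highest mean mpg: {best}")
```

Using the unrounded means changes the differences slightly (about 1.73 and 1.83 instead of 1.74 and 1.84), which does not affect which pairs exceed the critical distance.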