ARG/PDW: MCEN4027F00 VII: 1

MULTIPLE COMPARISONS

When the computed value of the F-statistic in a single-factor ANOVA is not significant, the analysis is terminated because no difference between any two of the population means has been identified. However, when $H_0$ is rejected, the investigator usually wishes to know which of the population means differ from each other. Such an analysis can be conducted using a post-treatment methodology termed a multiple comparisons procedure.

Procedure for Making Multiple Comparisons

Many practical experiments are conducted to determine the largest (or the smallest) mean in a set. For example, suppose that a chemist has developed five chemical solutions for removing a corrosive substance from a metal fitting. The chemist would then want to determine the solution that will remove the greatest amount of the corrosive substance from the fitting in a single application. Similarly, a production engineer might want to determine which among six machines or which among three technicians achieves the highest productivity per hour. A mechanical engineer might want to choose the one engine, from among five, that is most efficient, and so on.

Choosing the treatment with the largest mean from among five treatments might appear to be a simple matter. We could make, for example, $n_1 = n_2 = \cdots = n_5 = 10$ observations on each treatment, obtain the sample means, and compare them using Student's t-tests to determine whether differences exist among the pairs of means. However, there is a problem associated with this procedure: a Student's t-test, with its corresponding value of $\alpha$, is valid only when the two treatments to be compared are selected prior to experimentation.
A Student's t-test cannot be used after the fact to compare the treatments with the largest and smallest sample means, because these will always be farther apart, on average, than any pair of treatments selected at random. Furthermore, if you conduct a series of t-tests, each with a chance $\alpha$ of indicating a difference between a pair of means when in fact no difference exists, then the risk of making at least one Type I error in the series will be larger than the value of $\alpha$ specified for a single t-test.

There are a number of procedures for comparing and ranking a group of treatment means. The one that we will discuss is known as Tukey's method for multiple comparisons. It utilizes the studentized range

$$q = \frac{\bar{y}_{\max} - \bar{y}_{\min}}{s/\sqrt{n}}$$

(where $\bar{y}_{\max}$ and $\bar{y}_{\min}$ are the largest and smallest sample means, respectively) to determine whether the difference in any pair of sample means implies a difference in the corresponding treatment means.

The logic behind this multiple comparisons procedure is that if we determine a critical value for the difference between the largest and smallest sample means, one that implies a difference in their respective treatment means, then any other pair of sample means that differ by as much as or more than this critical value would also imply a difference in the corresponding treatment means. Tukey's procedure selects this critical distance, $\omega$, so that the probability of making one or more Type I errors (concluding that a difference exists between a pair of treatment means when, in fact, they are identical) is $\alpha$. Therefore, the risk of making a Type I error applies to the whole procedure, i.e., to the comparisons of all pairs of means, rather than to a single comparison. Consequently, the value of $\alpha$ selected is called an experimentwise error rate (in contrast to a comparisonwise error rate).
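To get a feel for how quickly the experimentwise risk grows when every pair is tested at a comparisonwise $\alpha$, a rough calculation can help. The sketch below assumes, purely for illustration, that the pairwise tests are independent. They are not exactly independent, because different pairs share sample means, so the figures are indicative only. The function name `experimentwise_error_rate` is hypothetical.

```python
from math import comb


def experimentwise_error_rate(alpha: float, n_treatments: int) -> float:
    """Approximate P(at least one Type I error) over all pairwise tests.

    ASSUMPTION (illustration only): the pairwise tests are treated as
    independent, which overstates or understates the true rate somewhat
    because pairs of treatments share sample means.
    """
    n_pairs = comb(n_treatments, 2)          # number of distinct pairs of means
    return 1.0 - (1.0 - alpha) ** n_pairs    # complement of "no false alarm at all"


# Comparisonwise alpha = 0.05 for several numbers of treatments
for p in (2, 3, 5, 6):
    rate = experimentwise_error_rate(0.05, p)
    print(f"p = {p}: {comb(p, 2):2d} pairs, experimentwise rate ~ {rate:.3f}")
```

Under this independence assumption, five treatments give 10 pairs and a rate of roughly 0.40, far above the nominal 0.05. This is the inflation that Tukey's procedure is designed to control.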
Tukey's procedure is based on the assumption that the p sample means are based on independent random samples, each containing an equal number $n_t$ of observations. Then, if $s = \sqrt{\mathrm{MSE}}$ is the computed standard deviation for the analysis, the critical distance $\omega$ is

$$\omega = q_\alpha(p, \nu)\,\frac{s}{\sqrt{n_t}}$$

The tabulated statistic $q_\alpha(p, \nu)$ is the critical value of the studentized range, the value that locates $\alpha$ in the upper tail of the q distribution. This critical value depends on $\alpha$, on $p$, the number of treatment means involved in the comparison, and on $\nu$, the number of degrees of freedom associated with MSE. Values of $q_\alpha(p, \nu)$ are usually given in standard tables of statistics texts for $\alpha = 0.05$ and $\alpha = 0.01$.

Multiple Comparisons Example 1

Automobile hydrocarbon and CO emissions are controlled by elaborate emission-control systems. Such systems are affected by a number of factors, including operating conditions, wear on the system, and system tuning. Suppose five systems are to be compared and sample selection is randomized relative to the three factors just cited. Emission values in hydrocarbon parts per million (ppm) are replicated for each system using a sample size of four. Do the system emission characteristics differ?

The experiment involves a single factor, system type, which is at five levels. We utilize a single-factor ANOVA with p = 5 treatments. Let $\mu_i$ represent the mean emission value for system i.

$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_1$: At least two of the five means differ.

The null hypothesis is tested using $\alpha = 0.05$. A statistical software package can be used to perform the ANOVA calculations.
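As one possible illustration of what such a package computes, the sketch below works out the single-factor ANOVA sums of squares and the F statistic in plain Python. The function name `one_way_anova` and the variable names are assumptions introduced here. The readings are the four replicates per system tabulated for this example, and the critical value 3.06 is copied from the F table rather than computed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Between-groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: spread of each observation around its own group mean
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Hydrocarbon emissions (ppm), four replicates for each of the five systems
emissions = [
    [102, 92, 100, 90],
    [92, 88, 96, 82],
    [83, 80, 85, 90],
    [72, 70, 66, 72],
    [86, 88, 90, 84],
]

ssb, ssw, dfb, dfw, f_stat = one_way_anova(emissions)
F_CRIT = 3.06  # F(0.05; 4, 15), taken from a table, not computed here
print(f"SS between = {ssb:.1f} (df {dfb}), SS within = {ssw:.1f} (df {dfw})")
print(f"MSE = {ssw / dfw:.2f}, F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```

The within-groups mean square printed here is the MSE that later enters Tukey's critical distance.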
System   Emissions (ppm)            Mean
1        102    92    100    90     96.0
2         92    88     96    82     89.5
3         83    80     85    90     84.5
4         72    70     66    72     70.0
5         86    88     90    84     87.0

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
Row 1    4       384   96        34.67
Row 2    4       358   89.5      35.67
Row 3    4       338   84.5      17.67
Row 4    4       280   70         8
Row 5    4       348   87         6.67

ANOVA
Source of Variation   SS     df   MS      F    P-value     F crit
Between Groups        1479    4   369.7   18   0.0000135   3.06
Within Groups          308   15    20.5
Total                 1787   19

Since the sample value of the F test statistic is 18 and $F_{0.05,4,15} = 3.06$, mean equality is strongly rejected at the 0.05 level. In fact, $F_{0.01,4,15} = 4.89$, and the null hypothesis is also rejected at the 0.01 level.

To obtain a finer analysis of the data, we employ Tukey's multiple comparison test. From Table 12, the critical value of the studentized range distribution, $q_{\alpha,m,\nu}$ (with m = 5 means and $\nu$ = 15 error degrees of freedom), is:

$$q_{0.05,5,15} = 4.37$$

Therefore,

$$\omega_{0.05} = 4.37\sqrt{\frac{20.5}{4}} = 9.89$$

We now look for differences that are greater than the critical value:

$\bar{x}_1 - \bar{x}_4 = 96.0 - 70.0 = 26.0$
$\bar{x}_1 - \bar{x}_3 = 96.0 - 84.5 = 11.5$
$\bar{x}_2 - \bar{x}_4 = 89.5 - 70.0 = 19.5$
$\bar{x}_5 - \bar{x}_4 = 87.0 - 70.0 = 17.0$
$\bar{x}_3 - \bar{x}_4 = 84.5 - 70.0 = 14.5$

The remaining five pairwise differences, the largest being $\bar{x}_1 - \bar{x}_5 = 9.0$, fall below the critical value.

From this information we can draw the following conclusions:

There is evidence to indicate that system 4 is different from the other four systems, i.e., system 4 provides the lowest emission levels.
There is evidence to indicate that system 3 is different from system 1.
There is no evidence to indicate that systems 3, 5 and 2 are different or that systems 1, 2 and 5 are different.

Multiple Comparisons Example 2

To simultaneously test the mileage effects of three fuel mixtures and two carburetors, an investigator decides to perform a two-way ANOVA with three replications. Based upon this design, 18 cars are randomly selected for the six treatment combinations, and the resulting miles-per-gallon data are given below. What can be concluded?
We utilize a two-factor ANOVA and use a statistical software package to perform the calculations.

                    Fuel
Carburetor     1        2        3        Mean
1              18.4     18.7     20.6     19.10
               19.0     17.9     20.0
               18.6     18.0     20.7
2              20.1     21.0     22.4     21.16
               21.2     20.8     22.4
               20.2     20.5     21.8
Mean           19.58    19.48    21.32

Anova: Two-Factor With Replication

SUMMARY (Fuels)   A         B            C         Total
Carburetor I
  Count           3         3            3         9
  Sum             56        54.6         61.3      171.9
  Average         18.67     18.2         20.43     19.1
  Variance        0.09333   0.19         0.1433    1.148
Carburetor II
  Count           3         3            3         9
  Sum             61.5      62.3         66.6      190.4
  Average         20.5      20.77        22.2      21.16
  Variance        0.37      0.06333333   0.12      0.7653
Total
  Count           6         6            6
  Sum             117.5     116.9        127.9
  Average         19.58     19.48        21.32
  Variance        1.194     2.078        1.042

ANOVA
Source of Variation   SS       df   MS       F       P-value    F crit
Carburetors           19.01     1   19.01    116.4   1.57E-07   4.747
Fuel                  12.75     2   6.376    39.03   5.59E-06   3.885
Interaction           0.5911    2   0.2956   1.810   0.2057     3.885
Error                 1.96     12   0.1633
Total                 34.32    17

From the ANOVA, the interaction null hypothesis is not rejected, i.e., there is no evidence of an interaction effect. The null hypotheses regarding the two treatments, fuels and carburetors, are rejected, i.e., both of these effects are statistically significant.

To obtain a finer analysis of the data, we employ Tukey's multiple comparison test. Since the carburetor null hypothesis was rejected, and since there are only two carburetor levels, it can be concluded that $\mu_1 \neq \mu_2$ for the carburetors. We then test the means from treatment 1, fuels. From Table 12, the critical value of the studentized range distribution, $q_{\alpha,m,\nu}$ (with m = 3 means and $\nu$ = 12 error degrees of freedom), is:

$$q_{0.05,3,12} = 3.77$$

Therefore, with six observations behind each fuel mean,

$$\omega_{0.05} = 3.77\sqrt{\frac{0.1633}{6}} = 0.622$$

We now look for differences that are greater than the critical value for fuels:

$\bar{x}_1 - \bar{x}_2 = 19.58 - 19.48 = 0.10$
$\bar{x}_3 - \bar{x}_1 = 21.32 - 19.58 = 1.74$
$\bar{x}_3 - \bar{x}_2 = 21.32 - 19.48 = 1.84$

We can draw the following conclusions:

There is evidence to indicate that fuel 3 is different from fuels 1 and 2, i.e., fuel 3 provides the highest mpg.
There is no evidence to indicate that fuels 1 and 2 are different.
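The comparison step used in both examples can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation of Tukey's method: the function names `tukey_omega` and `tukey_pairs` are hypothetical, and the studentized-range critical values (3.77 and 4.37) are copied from the table rather than computed.

```python
from itertools import combinations
from math import sqrt


def tukey_omega(q_crit, mse, n_per_mean):
    """Critical distance omega = q * sqrt(MSE / n_t), for equal sample sizes."""
    return q_crit * sqrt(mse / n_per_mean)


def tukey_pairs(means, omega):
    """Return (label_a, label_b, |difference|, exceeds_omega) for every pair."""
    results = []
    for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
        diff = abs(mean_a - mean_b)
        results.append((a, b, diff, diff > omega))
    return results


# Fuel means from the two-factor example; each mean rests on 6 observations.
# q = 3.77 is the tabulated value for alpha = 0.05, 3 means, 12 error df.
fuel_means = {"fuel 1": 19.58, "fuel 2": 19.48, "fuel 3": 21.32}
omega = tukey_omega(q_crit=3.77, mse=0.1633, n_per_mean=6)

print(f"omega = {omega:.3f}")
for a, b, diff, significant in tukey_pairs(fuel_means, omega):
    verdict = "different" if significant else "no evidence of a difference"
    print(f"{a} vs {b}: |diff| = {diff:.2f} -> {verdict}")
```

The same two functions apply to the emissions example by passing the five system means with `tukey_omega(4.37, 20.5, 4)`, which gives a critical distance of about 9.89 ppm.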