ANALYSIS OF VARIANCE
2.08.14

One-way ANOVA
   The concept of an ANOVA
   The one-way ANOVA model
   Assumptions
   The F test
   CI estimate of the ith mean
   CI estimate of the difference of 2 means
   Bonferroni and Tukey estimates
Two-way ANOVA
   The two-way ANOVA model
   Assumptions
   The F test
   CI estimates

ANALYSIS OF VARIANCE: ONE-WAY

Purpose
Compares multiple means: tests for differences in the means of more than 2 groups.

Examples of Applications
- Compare the mean strength of concrete developed using 6 different forms.
- Determine the variety of beans producing the greatest mean yield.

How about multiple t-tests?
Ease of interpretation: The number of t-tests increases rapidly as the number of groups to be compared increases, and the analysis becomes difficult.
Error reduction: In completing many analyses, the probability of committing at least one type I error somewhere in the process increases with the number of tests that are completed. The probability of committing at least one type I error in an analysis is called the experiment-wise error rate.

Initial Investigation - Descriptive Comparisons
Look at the means of each process or group (treatment); compare the variances (are they approximately the same?). Examine side-by-side box plots by treatment. Consider the spread and the center points.

Model
Suppose r samples of size n1, n2, ..., nr are independent samples from normal distributions with a common variance σ².

Notation
yij is the jth observation in the ith sample; j is the observation number and i is the sample number.

   Sample 1:  y11, y12, ..., y1n1
   Sample 2:  y21, y22, ..., y2n2
   ...
   Sample i:  yi1, yi2, ..., yini

The one-way model is

   yij = μi + εij

where μi is the mean of group i. [An observation = population mean + random error]

Residuals: The error terms are independent random variables with mean 0 and variance σ². They should look like a random sample from a normal distribution; to check, use a normal plot of the residuals.

   eij = yij - ŷij = yij - ȳi

eij is the residual corresponding to yij; yij is the jth observation from the ith sample (observation number j, sample number i); ŷij is the fitted value corresponding to yij, which here is the sample mean of group i.

Assumptions
1. The r samples of size n1, n2, ..., nr are independently and randomly selected.
2. Normality: the values in each group are normally distributed. (The test is robust relative to this assumption, i.e. modest departures will not adversely affect the results.)
3. Homogeneity of variance: the variances within each population are equal. (If the sample sizes are equal, modest departures are not serious.) Look at side-by-side box plots and the values of si.
4. Independence of error: the residuals (e: IN(0, σ²)) are independent for each value. [The residual for one observation is not related to the residual for another.] Construct a normal plot of the residuals or plot the residuals vs. the fits.

Example for Discussion
Consider the following times to complete a task using 3 different machines; compare the time to complete the task on the three machines.

    I    II   III
   25    23    21
   26    22    23
   24    24    20
   24    24    21
   26    22    20

The Hypothesis
Ho: μ1 = μ2 = ... = μr (all group means are equal)
Ha: not all means are equal (at least one group mean is not equal to another)
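As a quick illustration (not part of the original handout, which uses Minitab), the F test developed in the next section can be run on the machine data above with a few lines of Python. This is a minimal sketch assuming SciPy is installed; the variable names are illustrative only.

    # Minimal sketch: one-way ANOVA F test for the machine-times example.
    from scipy import stats

    machine_1 = [25, 26, 24, 24, 26]
    machine_2 = [23, 22, 24, 24, 22]
    machine_3 = [21, 23, 20, 21, 20]

    # f_oneway returns the F statistic and its p-value for Ho: all means equal.
    f_stat, p_value = stats.f_oneway(machine_1, machine_2, machine_3)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

A small p-value would lead us to reject Ho, exactly as in the hand development that follows.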
Source of Variation (why do means vary?)

Between-group variation (treatment): The values may vary because the treatments are different. The bigger the effect of the treatment, the more variation in group means we will find.

Within-group variation (error): People and specimens are different; they differ whether we treat them the same or not. There are differences in sample means because there is variability within samples.

NOTE: Even if Ho is true (i.e. all group means are equal), there will be differences in the sample means; variability within the samples will create the differences. If Ho is true, the between-group variation will estimate the variability about as well as the within-group variation does. If Ho is false, the between-group variation will be larger than the within-group variation.

The F-Test
The variation between and within groups is the basis for the F test. Consider the following sums of squares.

Total variation includes both between-group and within-group variation (variation without regard to treatment).

SST = total sum of squares (based on the differences between the observations and the grand mean)
SSB = SSTr = sum of squares between groups (variation from treatment)
SSW = SSE = sum of squares within groups (random variation, random error)

SST = SSB + SSW = SSTr + SSE
MSTr = SSTr/(r - 1)
MSE = SSE/(n - r)
F = MSTr/MSE,  df(numerator) = r - 1,  df(denominator) = n - r

If Ho is false, the between-group variation will be larger than the within-group variation. If Ho is true, the treatment variation and the random variation are about the same.

ANOVA Table

Source               SS                        df      MS                   F
Treatment (Between)  SSTr = Σ ni(Ȳi - Ȳ)²      r - 1   MSTr = SSTr/(r - 1)  F = MSTr/MSE
Error (Within)       SSE = Σ Σ (Yij - Ȳi)²     n - r   MSE = SSE/(n - r)
Total                SST                       n - 1

Pooled Sample Variance
sp² estimates the baseline variation present in a response variable; it may be thought of as an "average" of the group variances, i.e. the weighted average of the sample variances. (Note that the ANOVA assumes equal variances, and equal variances are required to find the pooled sd.)

   sp² = MSE = [(n1 - 1)s1² + (n2 - 1)s2² + ... + (nr - 1)sr²] / [(n1 - 1) + (n2 - 1) + ... + (nr - 1)]

where r = number of samples, ni = sample sizes, si² = sample variances.

R²: the fraction of the raw variability in y accounted for by fitting the model to the data.

   R² = SSTr/SST

Standardized Residuals - see page 458

Confidence Interval Estimate for the ith Population Mean (estimates a cell mean)

   Ȳi ± t · sp/√ni,  with df = n - r (total sample size - number of groups)

Two-Sided Confidence Interval Estimate of the Difference in Population Means

   (Ȳi - Ȳi′) ± t · sp · √(1/ni + 1/ni′),  with df = n - r

where n is the total sample size and r is the number of groups or cells.

Prediction Interval for a Single Observation
This interval predicts or locates a single additional observation from a particular one of the r distributions.

Tolerance Interval
The tolerance interval locates most of a large number of values within one of the r distributions.

Simultaneous Confidence Levels
Bonferroni Inequality
Tukey's Comparison

EXAMPLE OF ONE-WAY ANOVA (Completely Randomized Design)

An experiment was conducted to compare the wearing qualities of three types of paint when subjected to the abrasive action of a slowly rotating cloth-surfaced wheel. Ten paint specimens were tested for each paint type, and the number of hours until visible abrasion was apparent was recorded for each specimen.

WEAR DATA FOR 3 TYPES OF PAINT (in hours)

Type   Hours of Wear                                      Totals
1      148   76  393  520  236  134   55  166  415  153     2296
2      513  264  433   94  535  327  214  165  280  304     3129
3      335  643  216  536  128  723  258  380  594  465     4278
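Before turning to the Minitab printout, the sum-of-squares formulas above can be applied to the wear data directly. The following is a hedged sketch assuming NumPy is available (NumPy is not part of the original Minitab analysis), with illustrative variable names.

    # Sketch: one-way sums of squares for the paint-wear data.
    import numpy as np

    groups = [
        np.array([148, 76, 393, 520, 236, 134, 55, 166, 415, 153]),    # Paint 1
        np.array([513, 264, 433, 94, 535, 327, 214, 165, 280, 304]),   # Paint 2
        np.array([335, 643, 216, 536, 128, 723, 258, 380, 594, 465]),  # Paint 3
    ]

    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    n, r = len(all_obs), len(groups)

    # SSTr: between-group (treatment) variation; SSE: within-group (error) variation.
    sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

    mstr = sstr / (r - 1)   # treatment mean square, df = r - 1
    mse = sse / (n - r)     # error mean square, df = n - r
    print(f"SSTr = {sstr:.0f}, SSE = {sse:.0f}, F = {mstr / mse:.2f}")

These quantities should agree, up to rounding, with the ANOVA summary table in the printout discussed below.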
Note that the numbers of the following items are used on the attached Minitab printout. Items in italics are my notes regarding the interpretation of the data. For XL, see: http://www.uh.edu/~tech132/6360ho5_7phs.xls.xlsx

Preliminaries

1. Run the descriptive statistics on the data.

2. Explore the test assumptions.

Three (3) samples of size 10 are independently and randomly selected. This assumption is satisfied with a proper experimental design.

Normality: the values in each group are normally distributed. (The model is robust relative to this assumption, i.e. modest departures will not adversely affect the results.)

Homogeneity of variance: the variance within each group should be equal for all populations (σ1² = σ2² = σ3²). Look at the variances found in calculating the descriptive statistics, and do a dot plot or side-by-side box plots. My print-out does not provide one of these plots, but you should run one to check the spread of the values. There is a test called Hartley's test for homogeneity of population variances that can be run to check this assumption; it is not routinely run because it is extremely sensitive to departures from normality of the populations. A second test is Bartlett's test, which is also sensitive to departures from normality.

If the population variances are extremely different, it may be desirable to transform the data, for example by taking the square root (often useful for count or percentage data) or the logarithm of each value of x. The type of transformation depends upon the type of data. Once the data are transformed, perform the ANOVA on the transformed data. Test for differences in the transformed treatment means and apply the results to the original treatment means. It is often difficult to assign a practical interpretation to the transformed variable and the various treatment means, and some applied statisticians do not use transformations unless there is evidence to suggest sizable differences among the treatment population variances (Mendenhall and Beavers).

Independence of error: residuals are independent for each value. Check by doing a normal probability plot of the residuals or by plotting the residuals vs. fits. If the residuals are normally distributed, the resulting plot will be linear. There is a simple correlation test that can be used to test the hypothesis that the values are correlated; you need a table that I will distribute in class. The normal probability plot is linear and indicates that this assumption is satisfied.

Note: The ANOVA has been shown to be 'robust' to violations of these assumptions.

3. Test the hypotheses:

Ho: μ1 = μ2 = μ3 (population means are equal)
Ha: At least two of the population means are not the same.

The test is an F-test. It looks at the ratio of the mean square between samples (MSB) to the mean square error, the random error within samples (MSE). If the null hypothesis is true, we would expect these values to be approximately the same and the ratio to be near one; if Ho is not true, then MSB should be larger than MSE. In this problem the F ratio is 3.51. With 2 and 27 degrees of freedom, the F-table can be used to find a critical value of 3.35 when α is 0.05. (Recall, α is the probability of rejecting a true Ho.) Since the calculated F exceeds the table value of F, reject Ho and conclude that at least two of the population means are not equal. This conclusion is also verified by p = 0.044.
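For readers without an F-table, the critical value and p-value quoted above can be obtained from the F distribution directly. This is a small sketch assuming SciPy is available (SciPy is not used in the original handout).

    # Sketch: critical value and p-value for F with 2 and 27 degrees of freedom.
    from scipy import stats

    f_observed = 3.51
    df_num, df_den = 2, 27

    critical_value = stats.f.ppf(0.95, df_num, df_den)  # upper 5% point of F(2, 27)
    p_value = stats.f.sf(f_observed, df_num, df_den)    # P(F > 3.51) under Ho
    print(f"critical value = {critical_value:.2f}, p = {p_value:.3f}")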
4. If the null hypothesis is rejected, then we need to determine which means are different. This is typically done by calculating confidence intervals for the differences in population means (μ1 - μ2, μ2 - μ3, μ1 - μ3).

We have discussed two procedures for determining simultaneous confidence intervals for multiple comparisons (there are others).

Bonferroni Inequality

   (ȳi - ȳi′) ± t · sp · √(1/ni + 1/ni′)

This process involves calculating a t-interval for the difference in means, where the value of t is adjusted to allow for an acceptable family confidence level. When many intervals are constructed, this procedure can be inefficient because the confidence level for each individual interval would have to be extremely large.

Tukey's Method involves the use of the Studentized range distribution. (See Tables D-10-A and D-10-B.)

   (ȳi - ȳi′) ± (q*/√2) · sp · √(1/ni + 1/ni′)

These intervals can be most easily computed using computer packages such as Minitab.

If there is only one difference of interest, a simple t confidence interval could be calculated for the difference between (usually the largest and smallest) means. It is sometimes desirable to calculate a confidence interval for one of the means; this CI is calculated using the t-interval, and the Bonferroni Inequality can be applied in determining t.

   ȳi ± t · sp/√ni

On the print-out, the graph shown following the ANOVA summary table indicates that there is a difference between Paint 1 and Paint 3. This conclusion is verified by the Tukey pairwise comparisons. (Tukey's method has a 95% family confidence rate with individual comparisons at about 98%.) With 95% confidence, it is asserted that the wear time for Paint 3 is from 12 to 385 hours longer than for Paint 1; thus, Paint 3 appears to possess somewhat greater wearing quality than Paint 1. There is no statistically significant difference in wear time between Paints 1 and 2 or between Paints 2 and 3.

5. sp² is needed for the calculations described in item 4. sp is the pooled standard deviation, and sp² may be thought of as an "average" of the group variances, i.e. the weighted average of the sample variances; it estimates the baseline variation that is present in the response variable (hours until apparent abrasion). (Note that the ANOVA assumes equal variances, and equal variances are required to find the pooled sd.) From the ANOVA summary table, find the MSE and take the square root to get sp = 167.88. This value is the pooled standard deviation, which can be thought of as the average (or base) variation of the hours of wear for any of the three treatments, i.e. paint types.
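The Tukey comparisons referred to above come from the attached Minitab printout. As a hedged cross-check, assuming pandas and statsmodels are available (neither is part of the original analysis, and the column names below are illustrative), the same family of intervals can be produced as follows.

    # Sketch: Tukey pairwise comparisons for the paint-wear data.
    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    wear = pd.DataFrame({
        "hours": [148, 76, 393, 520, 236, 134, 55, 166, 415, 153,
                  513, 264, 433, 94, 535, 327, 214, 165, 280, 304,
                  335, 643, 216, 536, 128, 723, 258, 380, 594, 465],
        "paint": [1] * 10 + [2] * 10 + [3] * 10,
    })

    # Family-wise 95% confidence intervals for every pairwise difference in means.
    tukey = pairwise_tukeyhsd(endog=wear["hours"], groups=wear["paint"], alpha=0.05)
    print(tukey)

Only the Paint 1 vs. Paint 3 interval should exclude zero, matching the conclusion above.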
Minitab printout for the one-way example:

Worksheet (100000 cells): the wear data are entered both unstacked (C1 'T1', C2 'T2', C3 'T3', one column of hours per paint type) and stacked (C6 'Hrs' with the corresponding paint type, 1-3, in C7 'Paint Type').

MTB > describe c1-c3

Descriptive Statistics

Variable    N    Mean   Median   TrMean   StDev   SE Mean
T1         10   229.6    159.5    215.1   158.2      50.0
T2         10   312.9    292.0    312.5   144.2      45.6
T3         10   427.8    422.5    428.4   196.8      62.2

Variable   Minimum   Maximum      Q1      Q3
T1            55.0     520.0   119.5   398.5
T2            94.0     535.0   201.7   453.0
T3           128.0     723.0   247.5   606.2

MTB > Boxplot c1 c2 c3;
SUBC>   Box;
SUBC>   Symbol;
SUBC>   Outlier;
SUBC>   Overlay;
SUBC>   ScFrame;
SUBC>   ScAnnotation.

MTB > Note: The following is obtained from the menu using Stat/ANOVA/One-Way (Unstacked), with responses in separate columns C1 C2 C3.
MTB > AOVOneway c1 c2 c3.

One-way Analysis of Variance

Analysis of Variance
Source     DF        SS       MS      F      P
Factor      2    198080    99040   3.51  0.044
Error      27    760987    28185
Total      29    959067

Level     N    Mean   StDev
T1       10   229.6   158.2
T2       10   312.9   144.2
T3       10   427.8   196.8

Pooled StDev = 167.9
[Character plot: individual 95% CIs for each mean, based on the pooled StDev, on a scale from about 240 to 480 hours.]

MTB > Note: The following is obtained from the menu using Stat/ANOVA/One-Way. Responses are stacked in C6; the factors are in C7. I used the following from this dialog box.
Comparisons: Tukey's, Family Error Rate
Graphs: Dotplot, Boxplot, Normal Plot of Residuals, Residuals vs Fits

MTB > Name c11 = 'RESI1' c12 = 'FITS1'
MTB > Oneway c6 c7 'RESI1' 'FITS1';
SUBC>   Tukey 5;
SUBC>   GDotplot;
SUBC>   GBoxplot;
SUBC>   GNormalplot;
SUBC>   GFits.

One-way Analysis of Variance

Analysis of Variance for Hrs
Source       DF        SS       MS      F      P
Paint Ty      2    198080    99040   3.51  0.044
Error        27    760987    28185
Total        29    959067

Level     N    Mean   StDev
1        10   229.6   158.2
2        10   312.9   144.2
3        10   427.8   196.8

Pooled StDev = 167.9
[Character plot: individual 95% CIs for each mean, based on the pooled StDev, on a scale from about 240 to 480 hours.]

Tukey's pairwise comparisons

Family error rate = 0.0500
Individual error rate = 0.0196
Critical value = 3.51

Intervals for (column level mean) - (row level mean)

               1             2
   2    (-270, 103)
   3    (-385, -12)   (-301, 71)
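The same ANOVA table can be produced outside Minitab. The following is a hedged sketch assuming pandas and statsmodels are installed (with illustrative column names); it is not a transcript of the handout's analysis.

    # Sketch: reproducing the one-way ANOVA table for the paint-wear data.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    wear = pd.DataFrame({
        "hours": [148, 76, 393, 520, 236, 134, 55, 166, 415, 153,
                  513, 264, 433, 94, 535, 327, 214, 165, 280, 304,
                  335, 643, 216, 536, 128, 723, 258, 380, 594, 465],
        "paint": ["1"] * 10 + ["2"] * 10 + ["3"] * 10,   # paint type treated as a factor
    })

    model = smf.ols("hours ~ C(paint)", data=wear).fit()
    print(anova_lm(model))            # df, SS, MS, F, and p for paint type and error
    print(model.mse_resid ** 0.5)     # pooled standard deviation (square root of MSE)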
EXAMPLE OF TWO-WAY ANOVA (a Factorial Experiment)

An experiment was conducted to determine the effect of sintering time (two levels) on the compressive strength of two different metals. Five test specimens were sintered for each metal at each of the two sintering times. The data (in thousands of pounds per square inch) are shown in the following table.

                   Metal 1                          Metal 2
100 minutes   17.1  15.2  16.5  16.7  14.9     12.3  13.8  10.8  11.6  12.1
200 minutes   19.4  17.2  18.9  20.7  20.1     15.6  17.2  16.7  16.1  18.3

This experiment represents a factorial experiment; every level of every factor is paired with every level of every other factor. To collect these data, we could process a fixed quantity of metal 1 for 100 minutes and the same quantity of that metal for 200 minutes, and then measure the compressive strength of the processed samples. This procedure is duplicated for metal 2.

Note that this problem involves two factors: metal type and sintering time. The different settings for a given factor are called levels. In this problem there are two levels of each factor, i.e. two types of metal and two different times. By completing the ANOVA, we can determine 1) if there is a difference in strength based on metal type, 2) if there is a difference in strength based on sintering time, or 3) if there is a difference in strength based on a particular combination of metal type and sintering time (an interaction).

The main effect of an independent variable is the effect of that variable averaging over all levels of the other variables in the experiment. For example, the main effect of metal type involves a comparison of the means for metal 1 and metal 2. If the effect of metal type depends on sintering time, then there is an interaction effect.

Note that the numbers of the following items are used on the attached Minitab printout. Items in italics are my notes regarding the interpretation of the data.

Preliminaries

1. Run the descriptive statistics on the data.

2. Plot the data.

In this example, strength is plotted as the dependent variable on the vertical axis; time is plotted on the horizontal axis. There is a separate set of points for each metal type, producing two 'lines'. It can be noted that as the sintering time increases from 100 to 200 minutes, the compressive strength increases for both metal 1 and metal 2. Since these lines are approximately 'parallel', we can speculate that there is no significant interaction, and that if there are statistically significant differences in compressive strength, they will be differences resulting from metal type or sintering time. (Note that it would enhance the graph to add error bars that reflect CI estimates of each of the sample means; this step is not done automatically by the software.) There is an interaction when the magnitude of an effect is greater at one level of a variable than at another level of that variable.

3. Explore the test assumptions.

Independence of error: residuals are independent for each value. Check by doing a normal probability plot of the residuals and by plotting the residuals vs. fits. The normal plot is approximately linear and thus indicates no major problems with the normality assumption. Similarly, the plot of the residuals vs. the fits (sample means) does not indicate a trend as a function of the mean response.

Common variance: the variances are the same for each population group. Check by doing a dot plot of the residuals vs. the sample means and examining the spread (variability).

4. Test the hypotheses. There are three tests of significance.

Test for main effects, metal type
Ho: μ1 = μ2, equivalently αi = 0 for all i (there are no differences in population means for the metal types)
Ha: At least two of the population means are not the same.
In this problem the F ratio is 41.15. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table p < 0.001, and thus we can reject Ho and conclude that the population compressive strengths for metal 1 and metal 2 differ; the compressive strength for metal 1 appears to be greater than the strength for metal 2.

Test for main effects, sintering time
Ho: μ100 = μ200, equivalently βj = 0 for all j (there are no differences in population means for the two sintering times)
Ha: At least two of the population means are not the same.
In this problem the F ratio is 60.99. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table p < 0.001, and thus we can reject Ho and conclude that the population compressive strengths for sintering for 100 and 200 minutes differ; sintering for 200 minutes appears to result in a greater compressive strength.

Test for interaction effects
Ho: (αβ)ij = 0 for all i, j (metal type and sintering time do not interact to produce different compressive strengths)
Ha: Metal type and sintering time interact.
In this problem the F ratio is 2.17. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table p = 0.16, and thus we cannot reject Ho; we conclude that there is no significant interaction between metal type and sintering time, i.e. the effect of sintering time on compressive strength is about the same for both metals.
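All three tests above can also be run with a short script. This is a hedged sketch assuming pandas and statsmodels are available (the column names are illustrative); it is not the Minitab command sequence used in the attached printout.

    # Sketch: two-way ANOVA with interaction for the sintering experiment.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    data = pd.DataFrame({
        "strength": [17.1, 15.2, 16.5, 16.7, 14.9, 12.3, 13.8, 10.8, 11.6, 12.1,
                     19.4, 17.2, 18.9, 20.7, 20.1, 15.6, 17.2, 16.7, 16.1, 18.3],
        "metal": [1] * 5 + [2] * 5 + [1] * 5 + [2] * 5,
        "time":  [100] * 10 + [200] * 10,
    })

    # C() treats metal and time as factors; '*' includes both main effects
    # and the metal-by-time interaction term.
    model = smf.ols("strength ~ C(metal) * C(time)", data=data).fit()
    print(anova_lm(model, typ=2))   # SS, df, F, and p for each of the three tests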
5. If a null hypothesis is rejected, then we need to determine which means are different. This is typically done by calculating confidence intervals for the differences in population means (here, the difference in metal-type means and the difference in sintering-time means).

We have discussed two procedures for determining simultaneous confidence intervals for multiple comparisons (there are others).

Bonferroni Inequality

   (ȳi - ȳi′) ± t · sp · √(1/ni + 1/ni′)

Tukey's Method involves the use of the Studentized range distribution. See equation 8-17 on page 456; use table D-10.

Difference in strength based on metal type (rows):

   (ȳi - ȳi′) ± q* · sp/√(JM)

df = n - IJ = 16; I = 2, J = 2, M = common cell size = 5; sp = 1.25

   (17.65 - 14.45) ± 1.6  ->  [1.6, 4.8] thousand pounds per square inch

With 99% confidence, it is asserted that the compressive strength of metal 1 is from 1.6 to 4.8 thousand pounds per square inch greater than that of metal 2.

Difference in strength based on sintering time (columns):

   (ȳj - ȳj′) ± q* · sp/√(IM)

   (18.02 - 14.10) ± 1.6  ->  [2.3, 5.5] thousand pounds per square inch

With 99% confidence, it is asserted that the compressive strength for a 200-minute sintering time is from 2.3 to 5.5 thousand pounds per square inch greater than for the 100-minute sintering time.

If there is only one difference of interest, a simple t confidence interval could be calculated for the difference between (usually the largest and smallest) means. It is sometimes desirable to calculate a confidence interval for one of the cell means; this CI is calculated using the t-interval, and the Bonferroni Inequality can be applied in determining t.

   ȳij ± t · sp/√nij

Example of Two-Way ANOVA
Effect of sintering time on the compressive strength of two different metals

1. Run the descriptive statistics on the data.

MTB > print c1-c3

Data Display: C1 'Strength' holds the 20 strength values; C2 'Metal' codes the metal type (1 or 2); C3 'Time' codes the sintering time (1 = 100 minutes, 2 = 200 minutes). The full worksheet, including residuals and fits, is listed after the ANOVA output below.

MTB > table c2 c3;
SUBC>   data c1;
SUBC>   means c1;
SUBC>   standard deviation c1.

Tabulated Statistics

Rows: Metal   Columns: Time

                   1          2        All
1    Data     17.100     19.400
              15.200     17.200
              16.500     18.900
              16.700     20.700
              14.900     20.100
     Mean     16.080     19.260     17.670
     StDev     0.971      1.339      2.006

2    Data     12.300     15.600
              13.800     17.200
              10.800     16.700
              11.600     16.100
              12.100     18.300
     Mean     12.120     16.780     14.450
     StDev     1.103      1.043      2.656

All  Mean     14.100     18.020     16.060
     StDev     2.306      1.729      2.824

Cell Contents -- Strength: Data, Mean, StDev

2. Plot the data and test the hypotheses.

MTB > %Main c2 c3;
SUBC>   Response c1.

Minitab plot [use: Stat/ANOVA/Twoway/Interaction Plot]
[Interaction Plot - Data Means for Strength: mean strength (roughly 12 to 18) vs. time (1 = 100 minutes, 2 = 200 minutes), with one line per metal type.]
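An equivalent interaction plot can be drawn outside Minitab. The sketch below assumes pandas and matplotlib are available (neither appears in the original handout) and uses illustrative column names.

    # Sketch: interaction plot of mean strength vs. sintering time, one line per metal.
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.DataFrame({
        "strength": [17.1, 15.2, 16.5, 16.7, 14.9, 12.3, 13.8, 10.8, 11.6, 12.1,
                     19.4, 17.2, 18.9, 20.7, 20.1, 15.6, 17.2, 16.7, 16.1, 18.3],
        "metal": [1] * 5 + [2] * 5 + [1] * 5 + [2] * 5,
        "time":  [100] * 10 + [200] * 10,
    })

    for metal, grp in data.groupby("metal"):
        cell_means = grp.groupby("time")["strength"].mean()
        plt.plot(cell_means.index, cell_means.values, marker="o", label=f"Metal {metal}")

    plt.xlabel("Sintering time (minutes)")
    plt.ylabel("Mean compressive strength")
    plt.legend()
    plt.show()   # roughly parallel lines suggest little interaction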
MTB > Name c4 = 'RESI1' c5 = 'FITS1'
MTB > Twoway c1 c3 c2 'RESI1' 'FITS1';
SUBC>   Means c3 c2;
SUBC>   GNormalplot;
SUBC>   GFits.

Two-way Analysis of Variance

Analysis of Variance for Strength
Source          DF        SS        MS       F       P
Time             1     76.83     76.83   60.99   0.000
Metal            1     51.84     51.84   41.15   0.000
Interaction      1      2.74      2.74    2.17   0.160
Error           16     20.16      1.26
Total           19    151.57

Time    Mean
1      14.10
2      18.02
[Character plot: individual 95% CIs for the two time means, on a scale from 13.50 to 18.00.]

Metal   Mean
1      17.67
2      14.45
[Character plot: individual 95% CIs for the two metal means, on a scale from 14.40 to 18.00.]

Worksheet listing with residuals and fitted values:

C1          C2      C3     C4      C5
Strength    Metal   Time   RESI1   FITS1
17.1        1       1       1.02   16.08
15.2        1       1      -0.88   16.08
16.5        1       1       0.42   16.08
16.7        1       1       0.62   16.08
14.9        1       1      -1.18   16.08
12.3        2       1       0.18   12.12
13.8        2       1       1.68   12.12
10.8        2       1      -1.32   12.12
11.6        2       1      -0.52   12.12
12.1        2       1      -0.02   12.12
19.4        1       2       0.14   19.26
17.2        1       2      -2.06   19.26
18.9        1       2      -0.36   19.26
20.7        1       2       1.44   19.26
20.1        1       2       0.84   19.26
15.6        2       2      -1.18   16.78
17.2        2       2       0.42   16.78
16.7        2       2      -0.08   16.78
16.1        2       2      -0.68   16.78
18.3        2       2       1.52   16.78

[Normal Probability Plot of the Residuals (response is Strength): normal score vs. residual.]
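The residual checks shown on the printout (the normal probability plot and the residuals-vs-fits plot) can be reproduced from the RESI1 and FITS1 columns above. This is a hedged sketch assuming SciPy and matplotlib are installed, neither of which is part of the original Minitab session.

    # Sketch: residual diagnostics for the two-way model, using the values listed above.
    import matplotlib.pyplot as plt
    from scipy import stats

    residuals = [1.02, -0.88, 0.42, 0.62, -1.18, 0.18, 1.68, -1.32, -0.52, -0.02,
                 0.14, -2.06, -0.36, 1.44, 0.84, -1.18, 0.42, -0.08, -0.68, 1.52]
    fits = [16.08] * 5 + [12.12] * 5 + [19.26] * 5 + [16.78] * 5

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    stats.probplot(residuals, dist="norm", plot=ax1)   # roughly linear -> normality is plausible
    ax2.scatter(fits, residuals)                       # no trend -> common variance is plausible
    ax2.set_xlabel("Fitted value (cell mean)")
    ax2.set_ylabel("Residual")
    plt.show()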