Statistics MINITAB - Lab 14

ANOVA - TEST TO COMPARE P TREATMENT MEANS

1. Completely Randomised Design

If we are analysing data with more than two groups, we need to test for the equality of all group means simultaneously. Multiple pairwise comparisons using t-tests are not appropriate, as the experimentwise error rate* would exceed the specified level. Instead, a single omnibus test called Analysis of Variance (ANOVA) is available.

* See part 3 for the definition of experimentwise error rate.

Summary from Lecture Notes

A completely randomised design is a design for which independent random samples of experimental units are selected for each treatment.

Given p treatments and n experimental units randomly assigned to each treatment,

$H_0: \mu_1 = \mu_2 = \dots = \mu_p$ (i.e. all treatment means are the same)
$H_a: \mu_j \neq \mu_k$ for some j, k (i.e. at least two treatment means differ)

The test statistic compares the differences between the treatment means with the amount of sampling variability, using values called Sums of Squares (SS). In a completely randomised design we need to calculate two SS values: SST, the sum of squares for treatments, and SSE, the sum of squares for error.

$$SST = \sum_{i=1}^{p} n_i (\bar{x}_i - \bar{x})^2$$

$$SSE = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

where $n_i$ is the number of experimental units in treatment i, $x_{ij}$ is the j-th response in treatment i, $\bar{x}_i$ is the mean for treatment i, and $\bar{x}$ is the mean for all responses.

An ANOVA table is constructed using the following model.

ANOVA TABLE

SOURCE        DF      SS                    MS               F
TREATMENTS    p-1     SST                   SST / (p-1)      MST / MSE
ERROR         N-p     SSE                   SSE / (N-p)
TOTAL         N-1     SSTotal = SST + SSE

Where N is the total number of experimental units, p is the number of treatments, MS is Mean Square, MST is Mean Square for Treatment (i.e. SST / (p-1)) and MSE is Mean Square for Error (i.e. SSE / (N-p)).

The test statistic is F = MST / MSE, which is compared to the F distribution with (p-1) numerator and (N-p) denominator degrees of freedom.

Assumptions:
1. Samples are selected randomly and independently from the respective populations.
2. All p population probability distributions are normal.
3. The p population variances are equal.

Rejection Region: $F > F_\alpha$, where $F_\alpha$ is the $(1-\alpha)$ quantile of the F distribution with (p-1) numerator and (N-p) denominator degrees of freedom.

Download and open the dataset called FERTILISER.MTW from Onlineclasses. This dataset contains 3 variables as follows:

Fertiliser Type: this variable is coded 1, 2 or 3 to represent 3 different fertilisers.
Block: a blocking variable - see part two of this sheet for further details.
Yield: the crop yield from plots measured on some scale.

An experimenter conducted an experiment to ascertain the effect of the different types of fertiliser on crop yield. She divided a field into 12 separate plots and assigned the plots to a fertiliser treatment at random. A crop was grown in each plot and the yield recorded. You have been asked to analyse these data to ascertain whether the fertiliser type had any effect on yield.

Conduct an ANOVA assuming a completely randomised design (i.e. ignore the block variable for the time being) with $\alpha = .01$, as follows:

First get a feel for what is going on in the data. Get the mean yield for each fertiliser type using descriptive statistics.

Treat 1 :______________
Treat 2 :______________
Treat 3 :______________

What are the experimental units here? _____________________________________________

How to conduct a completely randomised ANOVA with MINITAB

Go to Stat > ANOVA > General Linear Model...
1. Select the response variable.
2. Select the treatment variable.

NB: This ANOVA facility in MINITAB has two SS columns in the ANOVA table - Seq SS and Adj SS. In the examples you will be looking at, these will be the same, so both columns contain the correct SS as defined in the summary box above.

Report your analysis here, including $H_0$, $H_a$, $\alpha$, test statistic, p-value and conclusions. Do you think you should conduct multiple comparisons on the basis of the results of this analysis? Why?
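As a cross-check on the MINITAB output, the sums of squares defined in the summary box for the completely randomised design can also be computed by hand. The sketch below is only an illustration: the yields are made up (they are not the values in FERTILISER.MTW), and the function name `crd_anova` is an assumption introduced here.

```python
# Sketch: one-way (completely randomised) ANOVA from the summary-box formulas.
# The yields below are hypothetical, NOT the values in FERTILISER.MTW.

def crd_anova(groups):
    """Return SST, SSE, MST, MSE, F and df for a list of treatment samples."""
    p = len(groups)                            # number of treatments
    N = sum(len(g) for g in groups)            # total experimental units
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]  # one mean per treatment

    # SST: variation between treatment means, weighted by group size n_i
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: variation of each response around its own treatment mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    mst = sst / (p - 1)
    mse = sse / (N - p)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (p - 1, N - p)}

hypothetical_yields = [
    [4, 5, 6, 5],    # treatment 1 (4 plots)
    [6, 7, 8, 7],    # treatment 2
    [8, 9, 10, 9],   # treatment 3
]
result = crd_anova(hypothetical_yields)
print(result)
```

The F value returned is then compared with the F distribution on the returned degrees of freedom, either through the p-value that MINITAB reports or through a tabulated value of $F_\alpha$.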
______________________________________________________________________________
______________________________________________________________________________

2. Randomised Block Design

In randomised block designs we try to reduce sampling variability by matching experimental units that are very similar at the start of the experiment. A group of experimental units that forms a matched set is called a block. The theory behind the randomised block design is that the sampling variability of the experimental units within a block will be reduced, in turn reducing the measure of error, MSE.

Summary from Lecture Notes

The information given in the first summary box applies, plus the following.

Matched sets of experimental units (called blocks) are formed, each block consisting of p experimental units (where p is the number of treatments). Each block should consist of experimental units that are as similar as possible. The number of blocks is designated b.

One experimental unit from each block is randomly assigned to each treatment, resulting in N = b*p responses.

In a randomised block design we need to calculate three SS values: SST, the sum of squares for treatments, SSB, the sum of squares for blocks, and SSE, the sum of squares for error. It is easier in these cases to calculate SSE as SSTotal (total sum of squares) - SST - SSB.

$$SST = \sum_{i=1}^{p} b (\bar{x}_{T_i} - \bar{x})^2$$ where $\bar{x}_{T_i}$ is the mean for treatment i

$$SSB = \sum_{i=1}^{b} p (\bar{x}_{b_i} - \bar{x})^2$$ where $\bar{x}_{b_i}$ is the mean for block i

$$SSTotal = \sum_{i=1}^{N} (x_i - \bar{x})^2$$ where $\bar{x}$ is the mean for all responses

SSE = SSTotal - SST - SSB

An ANOVA table is constructed using the following model.

ANOVA TABLE

SOURCE        DF          SS                          MS                   F
TREATMENTS    p-1         SST                         SST / (p-1)          MST / MSE
BLOCKS        b-1         SSB                         SSB / (b-1)          MSB / MSE
ERROR         N-p-b+1     SSE                         SSE / (N-p-b+1)
TOTAL         N-1         SSTotal = SST + SSB + SSE

Assumptions:
1. All probability distributions of observations corresponding to all block-treatment combinations are normal.
2. The variances of all probability distributions are equal.
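The three sums of squares above can be sketched in a few lines of Python. This is an illustration only: the table of yields is hypothetical (rows are treatments, columns are blocks; it is not the FERTILISER.MTW data), and the function name `rbd_anova` is an assumption introduced here.

```python
# Sketch: randomised block ANOVA using SSE = SSTotal - SST - SSB.
# Rows = treatments, columns = blocks. Hypothetical yields, NOT FERTILISER.MTW.

def rbd_anova(table):
    """Return the sums of squares and both F ratios for a p x b table."""
    p = len(table)              # number of treatments
    b = len(table[0])           # number of blocks
    N = p * b                   # one response per block-treatment combination
    grand_mean = sum(sum(row) for row in table) / N
    treat_means = [sum(row) / b for row in table]
    block_means = [sum(row[j] for row in table) / p for j in range(b)]

    sst = b * sum((m - grand_mean) ** 2 for m in treat_means)
    ssb = p * sum((m - grand_mean) ** 2 for m in block_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    sse = ss_total - sst - ssb  # error is what treatments and blocks leave over

    mse = sse / (N - p - b + 1)
    return {
        "SST": sst, "SSB": ssb, "SSE": sse, "SSTotal": ss_total,
        "F_treatments": (sst / (p - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }

hypothetical_yields = [
    [4, 3, 6, 7],      # treatment 1 in blocks 1..4
    [4, 7, 8, 9],      # treatment 2
    [7, 8, 10, 11],    # treatment 3
]
rbd = rbd_anova(hypothetical_yields)
print(rbd)
```

Both F ratios share the same denominator, MSE, which has N-p-b+1 degrees of freedom.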
Rejection Region for treatments: $F > F_\alpha$, where $F_\alpha$ is the $(1-\alpha)$ quantile of the F distribution with (p-1) numerator and (N-p-b+1) denominator degrees of freedom.

Use the same data and repeat your analysis. This time, however, you are given the additional information that the plots in the field were matched into blocks of equal underlying fertility (this is often the case in large fields - some parts are naturally more fertile than others). So this time include the variable block. The variable block is coded from 1 to 4, where block 1 was the area of lowest natural fertility and block 4 the area of highest natural fertility.

First get a feel for what is going on in the data. Get the mean yield in each block using descriptive statistics. What is the general trend?

_____________________________________________________________________________

How to conduct a randomised block ANOVA with MINITAB:

Go to Stat > ANOVA > General Linear Model...
1. Select the response variable.
2. Select the treatment and block variables.
3. Click OK.

Report your analysis here, including $H_0$, $H_a$, $\alpha$, test statistic, p-value and conclusions.

By how much was the estimate of SSE reduced from the first analysis? __________________

What is the relationship between SSB and the reduction of SSE? __________________

3. Multiple Comparisons

Once the null hypothesis is rejected in an ANOVA, the next task is to locate the treatments that differ from each other. There are many methods of multiple comparisons; among the most widely used are Tukey, Bonferroni and Scheffé. These 3 methods are designed to keep the experimentwise error rate at or below $\alpha$. The experimentwise error rate is the probability of making at least one Type I error over all the multiple comparisons.

You should have rejected the null hypothesis in the second analysis above, so now we need to see which treatments are significantly different from each other.
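To see why uncorrected pairwise tests push the experimentwise error rate above $\alpha$, consider c comparisons each run at level $\alpha$. The sketch below makes the simplifying assumption that the comparisons are independent (in practice they share data, so this is only an approximation), in which case the chance of at least one Type I error is $1 - (1 - \alpha)^c$. The Bonferroni method instead tests each comparison at level $\alpha / c$. The function names are illustrative, not part of MINITAB.

```python
# Sketch: experimentwise error rate for c pairwise comparisons.
# Assumes the comparisons are independent, which is a simplification.
from math import comb

def experimentwise_rate(alpha, c):
    """P(at least one Type I error) over c independent tests at level alpha."""
    return 1 - (1 - alpha) ** c

def bonferroni_level(alpha, c):
    """Per-comparison level used by the Bonferroni method."""
    return alpha / c

p = 3                  # treatments, as in the fertiliser experiment
c = comb(p, 2)         # number of pairwise comparisons among p treatments
alpha = 0.01

uncorrected = experimentwise_rate(alpha, c)   # every test run at alpha
per_test = bonferroni_level(alpha, c)         # Bonferroni per-test level
corrected = experimentwise_rate(per_test, c)  # rate after the correction

print("comparisons:", c)
print("uncorrected experimentwise rate:", round(uncorrected, 4))
print("Bonferroni per-test level:", round(per_test, 5))
print("corrected rate at or below alpha:", corrected <= alpha)
```

With only three treatments the inflation is modest, but the number of comparisons, and with it the uncorrected rate, grows quickly as p increases.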
Run the ANOVA again, but this time click on Comparisons and select Bonferroni pairwise comparisons.

Go to Stat > ANOVA > General Linear Model... > Comparisons...
1. Select the treatment variable.
2. Select the multiple comparison method.
3. Click OK.

You will be presented with confidence intervals for the difference between the treatment means and with the result of a hypothesis test of no difference between means. Fill in the following table.

Means       Difference    CI for difference    P value of test    Significantly different - Y/N
T1 v T2     __________    _________________    _______________    _____
T1 v T3     __________    _________________    _______________    _____
T2 v T3     __________    _________________    _______________    _____

Assignment:

Open the file sticking_times.mtw, which can be found on Onlineclasses. An experiment was conducted on three different types of glue to determine if there was any difference in the length of time they lasted. Eight broken items were assigned at random to each of the glues a, b and c to be fixed. The number of days each of the twenty-four items lasted was recorded. (Support any answers with appropriate p-values.)

Is there any evidence of a difference in the mean lasting time of the glues a, b and c?

Which glues differ significantly?

REVISION SUMMARY

After this lab you should be able to:
- perform an ANOVA test
- understand the hypotheses in the ANOVA table
- perform multiple treatment comparisons
- understand the reason for blocking

END