Statistics
MINITAB - Lab 14
ANOVA - TEST TO COMPARE P TREATMENT MEANS
1. Completely Randomised Design
If we are analysing data that has more than two groups, we need to test for the equality of all
group means simultaneously. Multiple pairwise comparisons using t-tests are not appropriate, as the
experimentwise error rate* would exceed the specified α level. Instead we use a single omnibus
test called Analysis of Variance (ANOVA).
* See Part 3 for the definition of the experimentwise error rate.
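A quick calculation shows why the experimentwise error rate grows with the number of t-tests. The minimal Python sketch below assumes, as a simplification, that the k pairwise tests are independent (in reality they are not, because they share treatment means); under that assumption the probability of at least one Type I error is 1 - (1 - α)^k. With p treatments there are k = p(p-1)/2 pairs. The function names are illustrative only.

```python
from math import comb


def n_pairwise(p: int) -> int:
    """Number of distinct pairs among p treatment means: p(p-1)/2."""
    return comb(p, 2)


def experimentwise_rate(alpha: float, k: int) -> float:
    """P(at least one Type I error) over k tests, each at level alpha.

    Assumes (as a simplification) that the k tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for p in (3, 5, 10):
        k = n_pairwise(p)
        rate = experimentwise_rate(0.05, k)
        print(f"p = {p:2d} treatments, {k:2d} pairs, experimentwise rate = {rate:.3f}")
```

For p = 3 treatments at α = 0.05 this gives 1 - 0.95^3 ≈ 0.14, almost three times the nominal level, and the rate keeps climbing as p increases.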
Summary from Lecture Notes
A completely randomised design is a design for which independent random samples of experimental units are
selected for each treatment.
Given p treatments and n experimental units randomly assigned to each treatment,
Ho: 1 = 2 = ..... = p
Ha: j k for some j, k
(i.e. all treatment means are the same)
(at least two treatment means differ)
The test statistic compares the variability between the treatment means with the sampling
variability within treatments, using quantities called sums of squares (SS).
In a completely randomised design we need to calculate two SS values: SST - the sum of squares for treatments,
and SSE - the sum of squares for error.
 ni xi  x 
2
SSE =
ni
i 1 j 1
i 1
where
 x
p
p
SST =
 xi 
2
ij
xi is the mean for treatment i, and x is the mean for all responses.
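The two sums of squares can be computed directly from their definitions. This is a minimal Python sketch, not a description of how MINITAB computes them internally; the function name and the three groups of numbers are made up for illustration and are not the FERTILISER.MTW data.

```python
from __future__ import annotations

from statistics import fmean


def crd_sums_of_squares(groups: list[list[float]]) -> tuple[float, float]:
    """Return (SST, SSE) for a completely randomised design.

    groups[i] holds the responses x_ij for treatment i; group sizes n_i may differ.
    """
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # SST: squared distance of each treatment mean from the grand mean, weighted by n_i
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: squared distance of each response from its own treatment mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, sse


if __name__ == "__main__":
    # Hypothetical toy data: three treatments, three responses each
    groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]
    sst, sse = crd_sums_of_squares(groups)
    all_x = [x for g in groups for x in g]
    ss_total = sum((x - fmean(all_x)) ** 2 for x in all_x)
    print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SST + SSE = {sst + sse:.2f}, SSTotal = {ss_total:.2f}")
    # prints: SST = 14.00, SSE = 6.00, SST + SSE = 20.00, SSTotal = 20.00
```

For this toy data the treatment means are 5, 8 and 6 and the grand mean is 57/9, so SST = 14 and SSE = 6, which add up to the total sum of squares of 20.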
An ANOVA table is constructed as follows.
ANOVA TABLE
SOURCE        DF     SS                 MS                  F
Treatments    p-1    SST                MST = SST/(p-1)     MST/MSE
Error         N-p    SSE                MSE = SSE/(N-p)
Total         N-1    SST + SSE
Where N is the total number of experimental units, p is the number of treatments, MS is Mean Square, MST is the
Mean Square for Treatments (i.e. SST / (p-1)) and MSE is the Mean Square for Error (i.e. SSE / (N-p)).
The test statistic is F = MST / MSE, which is compared with the F distribution with (p-1) numerator and (N-p)
denominator degrees of freedom.
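Once SST and SSE are known, the remaining entries in the table are arithmetic. A minimal Python sketch; the class and function names are hypothetical, and the SS values in the example are invented rather than taken from the lab data. With the fertiliser layout used later in this lab (N = 12 plots, p = 3 fertilisers) the F statistic is referred to an F distribution with 2 and 9 degrees of freedom.

```python
from dataclasses import dataclass


@dataclass
class OneWayAnova:
    df_treatments: int
    df_error: int
    mst: float
    mse: float
    f: float


def one_way_table(sst: float, sse: float, n_total: int, p: int) -> OneWayAnova:
    """Fill in the completely randomised ANOVA table from SST and SSE."""
    df_treatments = p - 1        # numerator degrees of freedom
    df_error = n_total - p       # denominator degrees of freedom
    mst = sst / df_treatments    # MST = SST / (p - 1)
    mse = sse / df_error         # MSE = SSE / (N - p)
    return OneWayAnova(df_treatments, df_error, mst, mse, mst / mse)


if __name__ == "__main__":
    # Illustrative SS values only; N = 12 and p = 3 match the fertiliser layout
    print(one_way_table(sst=24.0, sse=18.0, n_total=12, p=3))
    # prints: OneWayAnova(df_treatments=2, df_error=9, mst=12.0, mse=2.0, f=6.0)
```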
Assumptions:
1. Samples are selected randomly and independently from the respective populations
2. All p population probability distributions are normal
3. The p population variances are equal.
Rejection Region: if F > F, where F is based on quantile of the F distribution with (p-1) numerator and (N-p)
denominator degrees of freedom
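MINITAB prints the p-value for you, but the decision rule can also be reproduced by hand. The pure-Python sketch below computes the upper-tail area P(F > f) of the F distribution through the regularized incomplete beta function, using the continued-fraction evaluation popularised by Numerical Recipes, and finds the critical value F_α by bisection. The function names are hypothetical; in practice a library routine such as scipy.stats.f (its sf and ppf methods) would normally be used instead.

```python
from math import exp, lgamma, log


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-15
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last (odd-step) factor is ~1: converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """P(F > f) for F with df1 numerator and df2 denominator degrees of freedom."""
    if f <= 0.0:
        return 1.0
    # P(F > f) = I_z(df2/2, df1/2) with z = df2 / (df2 + df1 * f)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha: float, df1: int, df2: int) -> float:
    """F_alpha: the value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:  # widen until F_alpha is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Degrees of freedom as in a completely randomised analysis with p = 3, N = 12
    print(f"F_0.01(2, 9) = {f_critical(0.01, 2, 9):.2f}")
    print(f"p-value for an illustrative F = 6.0: {f_upper_tail(6.0, 2, 9):.4f}")
```

The two checks agree: H0 is rejected at level α exactly when f_upper_tail(F, p - 1, N - p) < α, which is the same as F > f_critical(α, p - 1, N - p). For F(2, 9) at α = 0.01 the critical value should come out at about 8.02, matching printed F tables.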
Download and open the dataset called FERTILISER.MTW from Onlineclasses. This dataset
contains 3 variables as follows:
Fertiliser Type: this variable is coded 1, 2 or 3 to represent 3 different fertilisers
Block: A blocking variable - see part two of this sheet for further details.
Yield: The crop yield from plots measured on some scale.
An experimenter conducted an experiment to ascertain the effect of the different types of fertiliser
on crop yield. She divided a field into 12 separate plots and assigned the plots to a fertiliser
treatment at random. A crop was grown in each plot and the yield recorded. You have been
asked to analyse these data to ascertain if the fertiliser type had any effect on yield. Conduct an
ANOVA assuming a completely randomised design (i.e. ignore the block variable for the time
being) with α = 0.01, as follows:
First get a feel for what is going on in the data: use descriptive statistics to get the mean yield
for each fertiliser type.
Treat 1 :______________
Treat 2 :______________
Treat 3 :______________
What are the experimental units here? _____________________________________________
How to conduct a completely randomised ANOVA with MINITAB:
Go to Stat > ANOVA > General Linear Model...
1. Select the response variable (Yield).
2. Select the treatment variable (Fertiliser Type).
NB: This ANOVA facility in MINITAB has two SS columns in the ANOVA table: Seq SS and
Adj SS. In the examples you will be looking at, these will be the same (the designs are balanced), so
either column contains the correct SS as defined in the summary box above.
Report your analysis here, including H0, HA, , test statistic, p-value and conclusions.
Do you think you should conduct multiple comparisons on the basis of you results of this analysis,
why?
______________________________________________________________________________
______________________________________________________________________________
2. Randomised Block Design
In randomised block designs we try to reduce sampling variability by grouping experimental units
that are very similar at the start of the experiment. A group of experimental units that forms such a
matched set is called a block. The idea behind the randomised block design is that experimental
units within a block vary less than units in general, so separating out the block-to-block variation
reduces the measure of error, MSE.
Summary from Lecture Notes
Everything in the first summary box applies, plus the following.
Matched sets of experimental units (called blocks) are formed, each block consisting of p experimental units
(where p is the number of treatments). Each block should consist of experimental units that are as similar as
possible. The number of blocks is designated b.
One experimental unit from each block is randomly assigned to each treatment, resulting in N = b*p responses.
In a randomised block design we need to calculate three SS values: SST - the sum of squares for treatments, SSB - the sum of squares for blocks, and SSE - the sum of squares for error. In this case it is easier to calculate SSE as
SSTotal (the total sum of squares) - SST - SSB.
 bx
p
SST =
i 1
 px

 x where xTi  the mean for treatment i
2
Ti

b
SSB =
i 1
 x where xbi  the mean for block i
2
bi
N
SSTotal =
 x  x  where
2
i
x  the mean for all responses
i 1
SSE = SSTotal - SST - SSB
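The block formulas can be sketched in Python in the same way. This sketch assumes the data are laid out as a b-by-p grid (one row per block, one column per treatment), which is a choice made here rather than how the MINITAB worksheet stores them; the function name and the numbers are invented for illustration.

```python
from __future__ import annotations

from statistics import fmean


def rbd_sums_of_squares(y: list[list[float]]) -> dict[str, float]:
    """Sums of squares for a randomised block design.

    y[k][i] is the response in block k under treatment i: b rows (blocks) by
    p columns (treatments), one unit per block-treatment combination, N = b * p.
    """
    b, p = len(y), len(y[0])
    all_x = [x for row in y for x in row]
    grand = fmean(all_x)
    treatment_means = [fmean([row[i] for row in y]) for i in range(p)]
    block_means = [fmean(row) for row in y]
    sst = sum(b * (m - grand) ** 2 for m in treatment_means)  # SST
    ssb = sum(p * (m - grand) ** 2 for m in block_means)      # SSB
    ss_total = sum((x - grand) ** 2 for x in all_x)           # SSTotal
    return {"SST": sst, "SSB": ssb, "SSE": ss_total - sst - ssb, "SSTotal": ss_total}


if __name__ == "__main__":
    # Hypothetical responses for b = 4 blocks (rows) and p = 3 treatments (columns)
    y = [
        [11.0, 11.0, 14.0],
        [11.0, 15.0, 16.0],
        [14.0, 16.0, 18.0],
        [16.0, 18.0, 20.0],
    ]
    for name, value in rbd_sums_of_squares(y).items():
        print(f"{name} = {value:.1f}")
```

For the invented grid above the treatment means are 13, 15 and 17, the block means are 12, 14, 16 and 18, and the sketch gives SST = 32, SSB = 60, SSE = 4 and SSTotal = 96.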
The ANOVA table is constructed as follows.
ANOVA TABLE
SOURCE        DF         SS                     MS                     F
Treatments    p-1        SST                    MST = SST/(p-1)        MST/MSE
Blocks        b-1        SSB                    MSB = SSB/(b-1)        MSB/MSE
Error         N-p-b+1    SSE                    MSE = SSE/(N-p-b+1)
Total         N-1        SST + SSB + SSE
Assumptions:
1. All probability distributions of observations corresponding to all block-treatment combinations are normal.
2. The variances of all these probability distributions are equal.
Rejection Region for treatments: if F > F, where F is based on quantile of the F distribution with (p-1) numerator
and (N-p-b+1) denominator degrees of freedom
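Blocking is not free: the b - 1 degrees of freedom given to blocks come out of the error term. The sketch below does the bookkeeping for the fertiliser experiment described in this lab (N = 12 plots, p = 3 fertilisers, b = 4 blocks); the function names are hypothetical.

```python
from __future__ import annotations


def crd_df(n_total: int, p: int) -> dict[str, int]:
    """Degrees of freedom for a completely randomised design."""
    return {"treatments": p - 1, "error": n_total - p, "total": n_total - 1}


def rbd_df(p: int, b: int) -> dict[str, int]:
    """Degrees of freedom for a randomised block design with N = b * p units."""
    n_total = b * p
    return {
        "treatments": p - 1,
        "blocks": b - 1,
        "error": n_total - p - b + 1,
        "total": n_total - 1,
    }


if __name__ == "__main__":
    # Fertiliser experiment: 12 plots, 3 fertilisers, 4 blocks
    print(crd_df(12, 3))  # {'treatments': 2, 'error': 9, 'total': 11}
    print(rbd_df(3, 4))   # {'treatments': 2, 'blocks': 3, 'error': 6, 'total': 11}
```

The error degrees of freedom fall from 9 to 6, which makes the critical value F_α larger, so blocking helps only when it reduces MSE by enough to outweigh that loss.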
Use the same data and repeat your analysis. This time, however, you are given the additional
information that the plots in the field were matched into blocks of equal underlying fertility (this is
often the case in large fields: some parts are more naturally fertile than others), so this time
include the variable Block. Block is coded from 1 to 4, where block 1 was the area of
lowest natural fertility and block 4 the area of highest natural fertility.
First get a feel for what is going on in the data: use descriptive statistics to get the mean yield
in each block. What is the general trend?
_____________________________________________________________________________
How to conduct a randomised block ANOVA with MINITAB:
Go to Stat > ANOVA > General Linear Model...
1. Select the response variable (Yield).
2. Select the treatment and block variables (Fertiliser Type and Block).
3. Click OK.
Report your analysis here, including Ho, Ha, , test statistic, p and conclusions.
By how much was the estimate of SSE reduced from the first analysis ? __________________
What is the relationship between SSB and the reduction of SSE ?
__________________
3. Multiple Comparisons
Once the null hypothesis is rejected in an ANOVA, the next task is to locate the treatments that
differ from each other. There are many multiple comparison methods; among the most widely used
are Tukey, Bonferroni and Scheffé. These three methods are designed to keep the experimentwise
error rate at or below α. The experimentwise error rate is the probability of making at least one
Type I error across all the comparisons. You should have rejected the null hypothesis in the second
analysis above, so now we need to see which treatments are significantly different from each other.
Run the ANOVA again, but this time click Comparisons and select Bonferroni pairwise comparisons.
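The Bonferroni idea is simple enough to sketch: with k comparisons, test each one at level α/k, or equivalently multiply each raw p-value by k (capping at 1) and compare the result with α. By the Bonferroni inequality this keeps the experimentwise error rate at or below α. In the minimal Python sketch below the raw p-values are invented placeholders, the function names are hypothetical, and nothing here describes how MINITAB lays out its output.

```python
from __future__ import annotations

from itertools import combinations


def bonferroni_alpha(alpha: float, p: int) -> float:
    """Per-comparison level when all p(p-1)/2 pairs of p means are compared."""
    k = p * (p - 1) // 2
    return alpha / k


def bonferroni_adjust(raw: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Adjusted p-values: each raw p-value times the number of comparisons, capped at 1."""
    k = len(raw)
    return {pair: min(1.0, k * pv) for pair, pv in raw.items()}


if __name__ == "__main__":
    alpha = 0.05
    pairs = list(combinations(["T1", "T2", "T3"], 2))  # (T1,T2), (T1,T3), (T2,T3)
    raw = dict(zip(pairs, [0.004, 0.030, 0.400]))        # invented raw p-values
    print(f"per-comparison level: {bonferroni_alpha(alpha, 3):.4f}")
    for pair, adj in bonferroni_adjust(raw).items():
        verdict = "significant" if adj < alpha else "not significant"
        print(pair, f"adjusted p = {adj:.3f}", verdict)
```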
Go to Stat > ANOVA > General Linear Model... > Comparisons...
1. Select the treatment variable (Fertiliser Type).
2. Select the multiple comparison method (Bonferroni).
3. Click OK.
You will be presented with both confidence intervals for the differences between the treatment
means and the results of hypothesis tests of no difference between each pair of means. Fill in the
following table.
Comparison   Difference of means   CI for difference   P-value of test   Significantly different (Y/N)
T1 V T2
T1 V T3
T2 V T3
Assignment:
Open the file sticking_times.mtw, which can be found on Onlineclasses.
An experiment was conducted on three different types of glue to determine if there was any
difference in the length of time they lasted. Eight broken items were assigned at random to each
of the glues a, b and c to be fixed. The number of days each of the twenty-four items lasted was
recorded.
(Support any answers with appropriate p-values.)
Is there any evidence of a difference in the mean lasting time of the glues a, b and c?
Which glues significantly differ?
REVISION SUMMARY
After this lab you should be able to:
- perform an ANOVA test
- understand the hypotheses in the ANOVA table
- perform multiple treatment comparisons
- understand the reason for blocking
END