Chapter 3: Single Factor Experiments with No
Restrictions on Randomization
A single factor experiment is an experiment in which
only one factor is varied. These are the simplest of all
possible experiments, but many things that we could
wish to study fall into this class.
Examples:
A purchaser for a major university is concerned about
repair costs for copy machines. She purchases five machines from each of three manufacturers and determines
the cost of repairs during the first 1 million copies made.
A fishery scientist is interested in studying the effects of
fertilizer runoff on the spawning of bullhead. To understand the process better, he will contaminate the water
in 12 tanks with one of four levels of fertilizer runoff
and determine the weight of eggs laid in each tank.
A professor in an optics lab wishes to determine if there
is a difference in the carelessness of his laboratory workers. He counts the number of microchips out of 1000
broken by each of his five employees during the first
semester.
Identify the single factor in each of the above situations.
If we believe that the experimental units are approximately homogeneous, and there are no restrictions on
the order of experimentation, then we have a completely randomized design. We will assume that the
number of observations for each level of the factor will
be determined based upon the cost of the experiment
and the power of our test.
We will write the model for this experiment as
Yij = µ + τj + εij.
Here, Yij is the ith observation on the jth treatment; µ is
the common effect for the experiment; τj is the effect of
the jth treatment; and εij is the random error present
in the ith observation on the jth treatment. We will
usually assume that the εij are normally and independently distributed (NID) with mean zero and the same
variance for all treatments. We will write this as εij are
NID(0, σε²), where σε² is the common variance within all
treatments.
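As a quick illustration, here is a minimal simulation sketch of this model. This is not part of the course materials; µ, the τj values, σε, and the group size below are made-up numbers.

```python
# Simulate Y_ij = mu + tau_j + eps_ij with eps_ij ~ NID(0, sigma_eps^2).
import numpy as np

rng = np.random.default_rng(0)
mu = 10.0                          # common effect (illustrative)
tau = np.array([-1.0, 0.0, 1.0])   # treatment effects; they sum to zero
sigma_eps = 2.0                    # common within-treatment std. deviation
n = 5                              # observations per treatment

# Rows index i (observations), columns index j (treatments).
Y = mu + tau + rng.normal(0.0, sigma_eps, size=(n, tau.size))
print(Y.round(2))
```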
Notice that µ is always a fixed parameter, and τj = µj − µ,
where µj is just the true mean for the jth population.
We will discuss different types of inference that can
be made, depending upon which assumptions we make
about the treatment effects.
The Fixed Effects Model
Recall that we defined fixed effects in chapter 1. When
we are only interested in the treatments under consideration, it is appropriate to use a fixed effects model. For
the fixed effects model, we make the following additional
assumptions.
• τ1, τ2, ..., τk are considered to be fixed parameters.
• Σ_{j=1}^{k} τj = 0, which implies that µ = (1/k) Σ_{j=1}^{k} µj.
To see the second bullet, sum τj = µj − µ over j: 0 = Σ_{j=1}^{k} τj = Σ_{j=1}^{k} µj − kµ, so µ = (1/k) Σ_{j=1}^{k} µj.
The analysis that is traditionally performed here involves
the hypothesis test H0: τ1 = τ2 = ... = τk = 0 vs the alternative H1: at least one τj is not equal to 0. This alternative is equivalent to the statement that
the effects of the treatments are not all the same.
We will discuss some tests to determine if we should
reject the null hypothesis, and some additional investigations that we can undertake.
The Random Effects Model
Recall that if we look only at a random subset of the
treatments of interest, we have a random effects model.
In this case, the scientist is usually interested in determining what proportion of the observed variability is due
to the differences between levels of the treatment. For
the random model, we make the following assumptions.
• τ1, τ2, ..., τk are considered to be random variables.
• The τj’s are NID(0, στ²).
For this type of model, we would test H0: στ² = 0 vs
H1: στ² > 0. Notice that this is equivalent to a test
that the population means are equal. This is because
a variable which has mean 0 and variance 0 must just
take on the value 0.
We can perform a test of this hypothesis in the same
manner as we would for a fixed effects model. That
happens to be true in this case, but fixed and random
effects models do not always share the same test.
We will next turn our attention to the analysis of variance, which you have hopefully seen before.
Analysis of Variance Rationale
Suppose that we have k populations, each one corresponding to a level of our factor of interest. The observations that we see for that level can then be thought
of as a sample from the appropriate population. Thus,
we would have that E(Yi1) = µ1, the population mean
for population 1, E(Yi2) = µ2, and in general E(Yij) = µj.
Now, notice that Yij − µ = Yij − µ + µj − µj = (µj − µ) +
(Yij − µj), where τj = µj − µ and Yij − µj = εij.
Next, consider the dot notation, which uses a subscript
· to indicate that we have summed over that subscript.
For example, let nj be the number of observations from
population j. Then T·j = Σ_{i=1}^{nj} Yij. Also, let Ȳ·j denote
the average of the samples from population j, Ȳ·j =
Σ_{i=1}^{nj} Yij / nj. Now, let N = Σ_{j=1}^{k} nj, and then
T·· = Σ_{j=1}^{k} T·j = Σ_{j=1}^{k} Σ_{i=1}^{nj} Yij.
The mean of all of the observations for all k populations
is
Ȳ·· = T·· / N = Σ_{j=1}^{k} nj Ȳ·j / N.
Using this notation, we can create an identity for the
samples that is similar to that for the entire population.
Notice
Yij − Ȳ·· = (Ȳ·j − Ȳ··) + (Yij − Ȳ·j).
Just for the fun of it, let’s square both sides of this
equation and sum them over all observations in all populations:
Σ_{j=1}^{k} Σ_{i=1}^{nj} (Yij − Ȳ··)² = Σ_{j=1}^{k} Σ_{i=1}^{nj} (Ȳ·j − Ȳ··)² + Σ_{j=1}^{k} Σ_{i=1}^{nj} (Yij − Ȳ·j)²
+ 2 Σ_{j=1}^{k} Σ_{i=1}^{nj} (Ȳ·j − Ȳ··)(Yij − Ȳ·j).
I claim that the last term is equal to zero. Let’s show
this: for a fixed j, (Ȳ·j − Ȳ··) does not depend on i, and
Σ_{i=1}^{nj} (Yij − Ȳ·j) = 0, so each inner sum vanishes.
Your text refers to the resulting equation as the fundamental equation of analysis of variance. It is
Σ_{j=1}^{k} Σ_{i=1}^{nj} (Yij − Ȳ··)² = Σ_{j=1}^{k} Σ_{i=1}^{nj} (Ȳ·j − Ȳ··)² + Σ_{j=1}^{k} Σ_{i=1}^{nj} (Yij − Ȳ·j)².
We will call these three terms SStotal for the first term,
SSbetween or SStreatment for the second term, and SSwithin
or SSerror for the third term.
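A short numerical check of this identity may help. The following sketch (with made-up samples, and unequal group sizes on purpose) verifies that SStotal = SSbetween + SSwithin:

```python
# Verify SS_total = SS_between + SS_within on arbitrary data.
import numpy as np

groups = [np.array([3.1, 4.0, 5.2, 4.4]),       # hypothetical samples,
          np.array([6.0, 5.5, 7.1]),            # one array per treatment
          np.array([4.8, 5.0, 6.2, 5.9, 5.1])]

y = np.concatenate(groups)
grand_mean = y.mean()                                        # Ybar..
ss_total = ((y - grand_mean) ** 2).sum()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
print(ss_total, ss_between + ss_within)   # the two totals agree
```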
Error Mean Square
We saw in Chapter 2 that for a particular population,
Σ_{i=1}^{nj} (Yij − Ȳ·j)² / (nj − 1) would provide an unbiased estimate of the variance. If the variances in all populations are the same, we can get a better estimate of
the variance by pooling the estimates for the individual
populations:
MSerror = SSerror / (N − k) = Σ_{j=1}^{k} Σ_{i=1}^{nj} (Yij − Ȳ·j)² / Σ_{j=1}^{k} (nj − 1),
since Σ_{j=1}^{k} (nj − 1) = N − k.
Note that before we sample, MSerror is an unbiased estimate of the common population variance, whether or
not the population means are different.
Think about how you would show this. If you can’t
figure it out, come ask me.
Treatment Mean Squares
If the k populations all have the same mean, we can form
another unbiased estimate of the common population
variance. We can divide SStreatment by its degrees of
freedom to obtain the treatment mean squares,
MStreatment = SStreatment / (k − 1) = Σ_{j=1}^{k} nj (Ȳ·j − Ȳ··)² / (k − 1).
When the means of the k populations are equal, MStreatment
is an unbiased estimate of σε². Please see your text for
an explanation. Notice that if the treatment means are
not all equal, the average value of MStreatment is greater
than σε².
The F Ratio
When the model assumptions hold and the hypothesis
of no treatment effect is true, it can be shown that the
treatment and error mean squares
• are unbiased estimators of σε²;
• when multiplied by their degrees of freedom and divided by σε², are independent chi-square random variables with k − 1
and N − k degrees of freedom, respectively.
Thus, their ratio (F = MStreatment / MSerror) follows an
F distribution with ν1 = k − 1 and ν2 = N − k when
the model assumptions are satisfied and the population
means are equal.
The One-Way ANOVA
We can test the null hypothesis, H0 : τ1 = τ2 = ... =
τk = 0 using the F statistic proposed above. Values of
F which are much greater than one suggest that the
null hypothesis should be rejected; this is because when the
null hypothesis is false, the average value of MStreatment
is larger than σε², while the average value of MSerror
is always equal to σε².
We are only interested in rejecting the null hypothesis
in the event that the value of F is too large - thus we
will use the upper tail of the distribution only for making
comparisons.
We can construct an ANOVA table to summarize these
results. A generic table for a one-way ANOVA would
have the following form:
Source      df     SS        MS    F            p value
Treatment   k − 1  SST       MST   f = MST/MSE  pr(F(ν1, ν2) ≥ f)
Error       N − k  SSE       MSE
Totals      N − 1  SStotal
Notice that I have used the abbreviations SST, SSE,
MST, and MSE for the appropriate sums of squares and
mean squares. This is for compactness of the table,
only.
Now, let us try to fill in an example table for the following scenario.
Cara is a nutrition graduate student interested in studying the formation of colon cancer. She wishes to compare two different diets (low fat vs high fat). She feeds
lab rats one of the two diets for three weeks and then
collects fecal samples for analysis. Suppose that she
feeds 12 rats the low fat diet and she gives another 10
rats the high fat diet. If she has SStreatment = 60 and
SStotal = 82, fill in the following ANOVA table.
Source      df    SS    MS    F    p value
Treatment
Error
Totals
David is a veterinarian who is hoping to learn more
about the different medications available to treat feline diabetes. He has four different types of medication
available to him, and he decides to prescribe each medication to 8 cats. He then measures the circulating blood
sugar levels 30 minutes after treatment. Suppose that
his summary information suggests that MStreatment = 12
and SStotal = 136. Please fill in this ANOVA table.
Source      df    SS    MS    F    p value
Treatment
Error
Totals
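One way to work through both exercises is sketched below. The scenario numbers come from the notes above; the arithmetic just follows the generic table, with SciPy used for the p values.

```python
# Fill in a one-way ANOVA table from k, N, SS_treatment, and SS_total.
from scipy.stats import f

def anova_table(k, N, ss_treat, ss_total):
    df_t, df_e = k - 1, N - k
    ss_err = ss_total - ss_treat
    ms_t, ms_e = ss_treat / df_t, ss_err / df_e
    F = ms_t / ms_e
    return df_t, df_e, ss_err, ms_t, ms_e, F, f.sf(F, df_t, df_e)

# Cara: k = 2 diets, N = 12 + 10 = 22 rats, SS_treatment = 60, SS_total = 82.
print(anova_table(2, 22, 60.0, 82.0))   # F = 60/1.1, about 54.5 on (1, 20) df

# David: k = 4 medications, N = 4 * 8 = 32 cats; MS_treatment = 12 on
# k - 1 = 3 df gives SS_treatment = 36, and SS_total = 136.
print(anova_table(4, 32, 36.0, 136.0))  # F about 3.36 on (3, 28) df
```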
Tests on Means
Your book points out that after we determine that there
are differences between at least some of the treatment
means, we still have some questions to answer.
For the fixed effects model, we will wish to answer questions like
• Which treatment is best (worst)?
• Does treatment A differ from treatment C?
• Is the mean of treatment B different from the
mean of A and C together?
First, let us consider qualitative treatments. In this case,
the type of procedure that we will use depends upon
whether or not the treatment comparisons of interest
were selected before the experiment was run. If the
comparisons are decided upon before the experiment, orthogonal contrasts may be used.
The method of orthogonal contrasts allows us to
make at most as many comparisons as there are treatment degrees of freedom. We may do this, if the comparisons are chosen
before the experiment, without concern about the overall level α of the test. This is not true if we look at the
data first, and then decide what to compare. Why?
There is more than one way we might wish
to define a contrast. We will assume that the number of samples is equal for the different treatments.
We might be interested in the difference between two
treatments (or equivalently T·1 − T·2), or in the difference between the mean of three treatments and a fourth
treatment (T·1 + T·2 + T·3 − 3T·4). We will generalize this
by defining a contrast, Cm, in the treatment totals
as follows:
Cm = c1m T·1 + c2m T·2 + ... + ckm T·k,
where
c1m + c2m + ... + ckm = 0.
We will say that contrasts are orthogonal when they
are independent of one another. Your book contains a
derivation of the following formula; we will take it on
faith. Two contrasts Cm and Cq are orthogonal if and only if
c1m c1q + c2m c2q + ... + ckm ckq = 0.
Are the following contrasts orthogonal? (1, −1, 0) and (1, 1, −2).
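The check is just a dot product, as in this quick sketch:

```python
# Equal sample sizes: contrasts are orthogonal iff the coefficient
# vectors have dot product zero (each must also sum to zero).
import numpy as np

c1 = np.array([1, -1, 0])
c2 = np.array([1, 1, -2])
print(c1.sum(), c2.sum())   # 0 and 0: both are valid contrasts
print(np.dot(c1, c2))       # 1*1 + (-1)*1 + 0*(-2) = 0: orthogonal
```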
Sum of Squares for Contrasts
Let Cm be a contrast in the treatment totals, and let
each sample consist of n observations. Then, the sum
of squares for Cm is
SSCm = Cm² / (n Σ_{j=1}^{k} cjm²).
It can be shown that the mean and variance of a contrast
Cm are
E(Cm) = n(c1m µ1 + c2m µ2 + ... + ckm µk),
Var(Cm) = n σε² (c1m² + c2m² + ... + ckm²).
However, σε² is unknown, and we would have to estimate
it with MSerror. Thus, we could form a t statistic as
t = (Cm − E(Cm)) / √Var(Cm),
with MSerror substituted for σε² in Var(Cm).
Suppose that we wish to perform the usual test that
the means in the contrast are not different. In this case,
E(Cm) is equal to zero. We can use the t statistic above
to perform this test, or alternatively, we can square the
numerator and denominator of that statistic. The result
is
F = t² = Cm² / (n Σ_{j=1}^{k} cjm² MSerror) = SSCm / MSerror ~ F(1, N − k)
when the null hypothesis is true. This is the form usually
used by software, and can be compared to the values in
Table D.
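A sketch of this F test with equal sample sizes follows; the treatment totals and MSerror below are illustrative assumptions, not data from the course.

```python
# Contrast F test for equal sample sizes, using the formulas above.
import numpy as np
from scipy.stats import f

T = np.array([52.0, 61.0, 75.0])   # hypothetical treatment totals T.j
c = np.array([1.0, 1.0, -2.0])     # contrast coefficients (sum to zero)
n, k = 5, 3                        # observations per treatment, treatments
N = n * k
ms_error = 4.0                     # assumed error mean square

C_m = np.dot(c, T)                        # contrast in the totals
ss_cm = C_m**2 / (n * np.sum(c**2))       # sum of squares for the contrast
F = ss_cm / ms_error                      # ~ F(1, N - k) under H0
print(C_m, ss_cm, F, f.sf(F, 1, N - k))
```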
We could also express the contrasts in terms of the
treatment means. Not surprisingly, when the number
of samples in each treatment group is the same, the
contrast in the means, Cq, is just equal to Cm/n. This
change by a factor of n will be observed in the contrast
value, the mean, and the standard deviation. However,
the sum of squares for the contrast remains unchanged,
and thus the F statistic remains valid.
Unequal Sample Sizes
Suppose that we have totals T·1, T·2, ..., T·k based upon
samples of size n1, n2, ..., nk. Then, we can define:
• Cm = c1m T·1 + c2m T·2 + ... + ckm T·k is a contrast in
the treatment totals, provided that n1 c1m + n2 c2m +
... + nk ckm = 0.
• The contrasts Cm and Cq are orthogonal if and only
if n1 c1m c1q + n2 c2m c2q + ... + nk ckm ckq = 0.
• The sum of squares for the contrast has the form
SSCm = Cm² / Σ_{j=1}^{k} nj cjm².
Suppose that we have 4 samples from the first treatment and 6 samples from the second treatment. If the
treatment totals are 14.2 and 17.6, find an orthogonal contrast, and then find the sum of squares for that
contrast.
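One possible answer, as a sketch; c = (3, −2) is just one valid choice of coefficients.

```python
# Unequal sample sizes: coefficients must satisfy n1*c1 + n2*c2 = 0.
import numpy as np

n = np.array([4, 6])           # sample sizes from the exercise
T = np.array([14.2, 17.6])     # treatment totals from the exercise
c = np.array([3.0, -2.0])      # one valid choice: 4*3 + 6*(-2) = 0

C_m = np.dot(c, T)                      # 3(14.2) - 2(17.6) = 7.4
ss_cm = C_m**2 / np.sum(n * c**2)       # 7.4^2 / (4*9 + 6*4), about 0.913
print(C_m, ss_cm)
```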
Multiple Comparison Procedures
If we look at the means of the treatments before we decide what comparisons to make, we can have problems
with the α level of any tests that we use. There are
some multiple comparison procedures that we can use
to deal with the problem.
The first procedure that we will consider is called the
Student-Newman-Keuls Range test. This procedure aims to answer the question of which treatment means are different. To do this by hand, we would
complete the following steps:
• Arrange the k means from smallest to largest.
• Find MSerror and its associated degrees of freedom.
• Find the standard error of the mean for each treatment,
sȲ·j = √(MSerror / nj),
where the error mean square is the one used as the
denominator in the F test on the population means.
• Obtain the appropriate k − 1 tabled ranges from
Table E.1 or E.2, using ν2 = N − k as the error degrees of
freedom.
• Multiply the ranges by sȲ·j to obtain the k − 1 least
significant ranges.
• Beginning with the largest vs smallest, test all possible pairs of means to determine if their difference is larger
than the corresponding least significant range. Declare as different all pairs of means whose difference exceeds the
least significant range, unless the two means
are contained in another non-significant interval.
The results of this test may be visualized by drawing a
graph with the treatment means indicated. A line passes
under all means which cannot be declared significantly
different. Suppose that we conclude based upon our
procedure that the treatments from smallest to largest
are B, C, A, D. Now, suppose that we conclude that µD >
µB , µD > µC , µD = µA , µA = µB , µA = µC , and µC = µB .
Draw the picture.
SAS does this by giving all treatments which are not
significantly different the same letter designation in its
output.
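The least-significant-range step can also be sketched in code using the studentized range distribution (this needs SciPy 1.7 or newer); the means, MSerror, sample size, and degrees of freedom below are made-up values, not course data.

```python
# Least significant ranges for the SNK test.
import numpy as np
from scipy.stats import studentized_range

means = np.sort([9.1, 10.4, 12.0, 12.8])   # k = 4 treatment means, ordered
n, df_error, ms_error, alpha = 6, 20, 2.5, 0.05
se = np.sqrt(ms_error / n)                 # standard error of a mean

# One least significant range for each span p = 2, ..., k of ordered means.
for p in range(2, len(means) + 1):
    q = studentized_range.ppf(1 - alpha, p, df_error)
    print(p, round(q * se, 3))
```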
The SNK procedure only allows us to compare pairs
of means; it does not allow us to answer questions
about the mean across two treatments. These sorts of
questions can be answered by using Scheffé’s Test.
This procedure allows us to test as many contrasts as
we desire and to decide upon these contrasts after seeing
the results of the experiment. What is the drawback of
this procedure?
Note that to perform these comparisons, we need to
first determine the overall α level for our test and have
the error mean square available. Then,
• Set up all contrasts of interest and find their numerical values.
• Determine the number f such that pr(F(k−1, N−k) ≥ f) = α.
• Calculate A = √((k − 1) f).
• Compute the standard error for each contrast to be
tested. This is given by
sCm = √(MSerror (n1 c1m² + n2 c2m² + ... + nk ckm²)).
• Let cm be the observed value of a contrast. Reject
the hypothesis that the contrast is equal to zero
when |cm| > A sCm.
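These steps translate directly into code. In the sketch below, every numerical input (k, N, MSerror, the contrast, and its observed value) is an illustrative assumption.

```python
# Scheffe's criterion for testing a contrast chosen after the experiment.
import numpy as np
from scipy.stats import f

k, N, alpha = 4, 32, 0.05
ms_error = 3.5
n = np.array([8, 8, 8, 8])                # sample sizes
c = np.array([1.0, -1.0, 0.0, 0.0])       # a contrast in the totals
c_m_obs = 21.0                            # assumed observed contrast value

f_crit = f.ppf(1 - alpha, k - 1, N - k)   # pr(F(k-1, N-k) >= f_crit) = alpha
A = np.sqrt((k - 1) * f_crit)
s_cm = np.sqrt(ms_error * np.sum(n * c**2))
print(abs(c_m_obs) > A * s_cm)            # True means reject "contrast = 0"
```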
Note that most software packages, including SAS, will
perform these (and other) tests and procedures for you.
We will not spend much time on calculating them by
hand, but ensure that you know where they come from,
and which to use in a particular situation.
Confidence Limits for Means
We can form a 100(1 − α)% confidence interval for a
mean µj by
ȳ·j ± t1−α/2 √(MSerror / nj),
where t1−α/2 has the error degrees of freedom, and the
mean square error in question is that from the
denominator of the F test for the treatment means.
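For instance, here is a sketch with made-up values (ȳ·j, MSerror, and the design sizes are all assumptions):

```python
# 100(1 - alpha)% confidence interval for a treatment mean.
import numpy as np
from scipy.stats import t

ybar_j, ms_error = 11.2, 2.5       # assumed treatment mean and MS_error
n_j, N, k, alpha = 6, 24, 4, 0.05  # assumed design sizes

half_width = t.ppf(1 - alpha / 2, N - k) * np.sqrt(ms_error / n_j)
print(ybar_j - half_width, ybar_j + half_width)
```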
This is a summary of the procedures that we can use
to further investigate the differences between means.
Using them is only logical in the case that we can reject the
global hypothesis that all of the treatment means are
equal, and only when we are in a fixed effects setting.
Components of Variance
In the previous slides, we considered what further steps
could be taken in the case that we had a fixed effects
model. However, in the case that we are interested in
a random effects experiment, there are different questions that we can seek to answer when the global null
hypothesis is rejected.
In the random effects model, we are usually interested in
answering the question "what fraction of the total variability can be attributed to the differences in treatment
means?" We claimed in an earlier section that
E(MSerror) = σε²
but
E(MStreatment) = σε² + n στ².
Suppose that we perform an experiment and find that
MSerror = 112 and MStreatment = 48 with three replicates
in each of four treatments. What would be a naive way
to estimate σε² and στ²?
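The naive (method of moments) answer follows from the expected mean squares above; here is a sketch using the exercise's numbers:

```python
# Naive variance-component estimates from the expected mean squares.
ms_error, ms_treatment, n = 112.0, 48.0, 3   # numbers from the exercise

sigma2_eps_hat = ms_error                        # E(MS_error) = sigma_eps^2
sigma2_tau_hat = (ms_treatment - ms_error) / n   # from E(MS_treatment)
print(sigma2_eps_hat, sigma2_tau_hat)
# Note the estimate of sigma_tau^2 is negative here; in practice a
# negative variance-component estimate is usually reported as zero.
```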
Now, notice that
E(MStotal) = (n(k − 1) / (N − 1)) στ² + σε².
Read the derivation of these expectations in the textbook. If you don’t understand them, please let me know.
Checking the Model
Recall that we assume that our samples are independent, random, normal samples from populations with
equal variances. Often this may not be the case. Although it is not essential that all of the model assumptions be satisfied, we do need the assumptions to be
reasonable.
The first way that we will assess the validity of the assumptions is using plots of the data and the residuals.
Define the residual as the difference between the observed value and the predicted value based upon the
model. Denote the predicted value of Yij by Ŷij . In the
one factor case, the predicted value is just the mean for
the jth treatment, Ŷij = Ȳ·j. Then, the residual of Yij is
Eij = Yij − Ȳ·j.
We would then do a normal quantile plot and a Shapiro-Wilk test on the residuals. If the assumptions are satisfied, the residuals are normally distributed, so if these
tests indicate a lack of normality, we would believe that
the assumptions are not satisfied.
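In code, this check might look like the following sketch (the samples are illustrative):

```python
# Residuals E_ij = Y_ij - Ybar.j, then a Shapiro-Wilk normality test.
import numpy as np
from scipy import stats

groups = [np.array([3.1, 4.0, 5.2, 4.4]),
          np.array([6.0, 5.5, 7.1]),
          np.array([4.8, 5.0, 6.2, 5.9, 5.1])]

residuals = np.concatenate([g - g.mean() for g in groups])
W, p_value = stats.shapiro(residuals)
print(W, p_value)   # a small p value suggests non-normal residuals
# stats.probplot(residuals) gives the points for a normal quantile plot.
```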
Assessing the equality of variances across treatments
can be initially done visually. If we plot the values of
the residuals for each treatment separately and compare
them by eye, we may sometimes see that the variances
do not appear to be equal.
Your text also contains a more quantitative assessment,
based upon the quantity D4 . First, we would determine
the average range of the data within the treatments.
Then, multiply this by the D4 value from the table. If
all of the ranges are less than this value, then the assumption of homogeneous variance is reasonable.
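As a sketch, the check might look like the following; the within-treatment ranges are made-up, and the D4 value used is the standard control-chart constant for subgroups of size 5, not a number from the text.

```python
# Range-based check of equal variances using the tabled D4 constant.
import numpy as np

ranges = np.array([4.2, 3.8, 5.1, 4.5])   # within-treatment ranges, n_j = 5
d4 = 2.114                                # tabled D4 for subgroup size 5
limit = d4 * ranges.mean()
print(limit, bool(np.all(ranges < limit)))  # True supports equal variances
```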
We need to ensure that the samples are independent, as
violating this assumption leads to severe problems with
ANOVA methods. If we have time series data, we can
perform a test to assess if there is a violation of the
independence assumption. We can calculate the sample
correlation coefficient, r1, as
r1 = Σ_{i=1}^{n−1} ei ei+1 / Σ_{i=1}^{n} ei².
The sampling distribution of this statistic is approximately normal with mean 0 and variance 1/n. Thus,
we should consider that independence is suspect if the
absolute value of the statistic is greater than 1.96/√n.
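A sketch of this calculation follows; the residual series below, kept in time order, is made-up.

```python
# Lag-1 correlation check for independence of time-ordered residuals.
import numpy as np

e = np.array([0.8, 1.1, 0.9, -0.2, -0.7, -1.0, -0.4, 0.3, 0.6, -0.1])
n = e.size
r1 = np.sum(e[:-1] * e[1:]) / np.sum(e**2)
print(r1, bool(abs(r1) > 1.96 / np.sqrt(n)))  # True flags suspect independence
```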
In addition to the time series test, if it is applicable, we
should consider plotting the residuals vs other variables
in the model. One important plot to consider is the
plot of the residuals vs the predicted values. If we see
patterns in the plots, this is an indication that the data
are not independent.
Further, if we plot the observations vs the order in which
the measurements were collected, we should again not
see a pattern. If we do, we would again conclude that
the data do not appear to be independent.
What to do if the model is inadequate?
One obvious thing to attempt if the model is inadequate
is to try a different model. Sometimes this is not possible. If it is not possible to randomize completely, the
restriction on randomization should be included in the
model.
If the data appear not to be normal, or if the variances
are not homogeneous, we can try to do a transformation
of the data to correct for this problem. After a transformation, the data should again be checked before the
analysis is accepted.
Your book contains suggestions for different types of
transformations that should be considered for different
types of data structures.
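For concreteness, here is a sketch of two transformations that are commonly suggested in such situations (the specific pairings below are standard advice, not taken from the text):

```python
# Two common variance-stabilizing transformations.
import numpy as np

y = np.array([1.0, 4.0, 9.0, 25.0, 100.0])   # illustrative skewed responses

y_sqrt = np.sqrt(y)   # often tried for count-like data
y_log = np.log(y)     # often tried when the spread grows with the mean
print(y_sqrt.round(2), y_log.round(2))
# After transforming, redo the residual checks before accepting the analysis.
```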