Chapter 3: Single Factor Experiments with No Restrictions on Randomization

A single factor experiment is an experiment in which only one factor is varied. These are the simplest of all possible experiments, but many things that we could wish to study fall into this class.

Examples:

• A purchaser for a major university is concerned about repair costs for copy machines. She purchases five machines from each of three manufacturers and determines the cost of repairs during the first 1 million copies made.

• A fishery scientist is interested in studying the effects of fertilizer runoff on the spawning of bullhead. To understand the process better, he will contaminate the water in 12 tanks with one of four levels of fertilizer runoff and determine the weight of eggs laid in each tank.

• A professor in an optics lab wishes to determine if there is a difference in the carelessness of his laboratory workers. He counts the number of microchips out of 1000 broken by each of his five employees during the first semester.

Identify the single factor in each of the above situations.

If we believe that the experimental units are approximately homogeneous, and there are no restrictions on the order of experimentation, then we have a completely randomized design. We will assume that the number of observations for each level of the factor is determined based upon the cost of the experiment and the power of our test.

We will write the model for this experiment as

$$Y_{ij} = \mu + \tau_j + \epsilon_{ij}.$$

Here, $Y_{ij}$ is the $i$th observation on the $j$th treatment; $\mu$ is the common effect for the experiment; $\tau_j$ is the effect of the $j$th treatment; and $\epsilon_{ij}$ is the random error present in the $i$th observation on the $j$th treatment. We will usually assume that the $\epsilon_{ij}$s are normally and independently distributed (NID) with mean zero and the same variance for all treatments. We will write this as $\epsilon_{ij}$ NID$(0, \sigma_\epsilon^2)$, where $\sigma_\epsilon^2$ is the common variance within all treatments. Notice that $\mu$ is always a fixed parameter, and $\tau_j = \mu_j - \mu$, where $\mu_j$ is just the true mean for the $j$th population. We will discuss different types of inference that can be made, depending upon which assumptions we make about the treatment effects.
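To make the model concrete, here is a minimal simulation sketch in Python (not from the text; the values chosen for $\mu$, the $\tau_j$, and $\sigma_\epsilon$ are hypothetical) that generates data from a completely randomized design:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: k = 3 treatments with n = 5 observations each.
mu = 50.0                          # common effect
tau = np.array([-2.0, 0.5, 1.5])   # treatment effects (illustrative values)
n, sigma_eps = 5, 3.0

# Y_ij = mu + tau_j + eps_ij, with eps_ij ~ NID(0, sigma_eps^2)
Y = mu + tau[None, :] + rng.normal(0.0, sigma_eps, size=(n, len(tau)))
print(Y.mean(axis=0))              # column means estimate mu + tau_j
```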
The Fixed Effects Model

Recall that we defined fixed effects in Chapter 1. When we are only interested in the treatments under consideration, it is appropriate to use a fixed effects model. For the fixed effects model, we make the following additional assumptions.

• $\tau_1, \tau_2, \ldots, \tau_k$ are considered to be fixed parameters.

• $\sum_{j=1}^k \tau_j = 0$, which implies that $\mu = (1/k) \sum_{j=1}^k \mu_j$.

Show the second bullet here....

The analysis that is traditionally performed here involves the hypothesis test $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$ vs the alternative $H_1$: at least one of the $\tau_j$ is not equal to 0. This alternative is equivalent to the statement that the effects of the treatments are not all the same. We will discuss some tests to determine if we should reject the null hypothesis, and some additional investigations that we can undertake.

The Random Effects Model

Recall that if we look only at a random subset of the treatments of interest, we have a random effects model. In this case, the scientist is usually interested in determining what proportion of the observed variability is due to the differences between levels of the treatment. For the random model, we make the following assumptions.

• $\tau_1, \tau_2, \ldots, \tau_k$ are considered to be random variables.

• The $\tau_j$'s are NID$(0, \sigma_\tau^2)$.

For this type of model, we would test $H_0: \sigma_\tau^2 = 0$ vs $H_1: \sigma_\tau^2 > 0$. Notice that this is equivalent to a test that the population means are equal. This is because a variable which has mean 0 and variance 0 must just take on the value 0. We can perform a test of this hypothesis in the same manner which we would use for a fixed effects model. This is true in this case, but is not always true of fixed and random effects models. We will next turn our attention to the analysis of variance, which you have hopefully seen before.

Analysis of Variance Rationale

Suppose that we have k populations, each one corresponding to a level of our factor of interest. The observations that we see for that level can then be thought of as a sample from the appropriate population. Thus, we would have that $E(Y_{i1}) = \mu_1$, the population mean for population 1, $E(Y_{i2}) = \mu_2$, and in general $E(Y_{ij}) = \mu_j$. Now, notice that

$$Y_{ij} - \mu = (\mu_j - \mu) + (Y_{ij} - \mu_j),$$

where $\tau_j = \mu_j - \mu$ and $Y_{ij} - \mu_j = \epsilon_{ij}$.

Next, consider the dot notation, which uses a subscript $\cdot$ to indicate that we have summed over that subscript. For example, let $n_j$ be the number of observations from population j. Then, $T_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}$. Also, let $\bar{Y}_{\cdot j}$ denote the average of the sample from population j, $\bar{Y}_{\cdot j} = \sum_{i=1}^{n_j} Y_{ij}/n_j$. Now, let $N = \sum_{j=1}^k n_j$, and then

$$T_{\cdot\cdot} = \sum_{j=1}^k T_{\cdot j} = \sum_{j=1}^k \sum_{i=1}^{n_j} Y_{ij}.$$

The mean of all of the observations for all k populations is

$$\bar{Y}_{\cdot\cdot} = \frac{T_{\cdot\cdot}}{N} = \sum_{j=1}^k n_j \bar{Y}_{\cdot j}/N.$$

Using this notation, we can create an identity for the samples that is similar to that for the entire population. Notice

$$Y_{ij} - \bar{Y}_{\cdot\cdot} = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j}).$$

Just for the fun of it, let's square both sides of this equation and sum them over all observations in all populations....

$$\sum_{j=1}^k \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^k \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^k \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2 + 2 \sum_{j=1}^k \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{\cdot j})$$

I claim that the last term is equal to zero. Let's show this.

Your text refers to the resulting equation as the fundamental equation of analysis of variance. It is

$$\sum_{j=1}^k \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^k \sum_{i=1}^{n_j} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^k \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2.$$

We will call these three terms $SS_{total}$ for the first term, $SS_{between}$ or $SS_{treatment}$ for the second term, and $SS_{within}$ or $SS_{error}$ for the third term.

Error Mean Square

We saw in Chapter 2 that for a particular population, $\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2/(n_j - 1)$ would provide an unbiased estimate of the variance. If the variances in all populations are the same, we can get a better estimate of the variance by pooling the estimates for the individual populations:

$$MS_{error} = \frac{SS_{error}}{N - k} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2}{\sum_{j=1}^k (n_j - 1)}.$$

Note that before we sample, $MS_{error}$ is an unbiased estimate of the common population variance, whether or not the population means are different. Think about how you would show this. If you can't figure it out, come ask me.

Treatment Mean Squares

If the k populations all have the same mean, we can form another unbiased estimate of the common population variance. We can divide $SS_{treatment}$ by its degrees of freedom to obtain the treatment mean squares,

$$MS_{treatment} = \frac{SS_{treatment}}{k - 1} = \frac{\sum_{j=1}^k n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2}{k - 1}.$$

When the means of the k populations are equal, $MS_{treatment}$ is an unbiased estimate of $\sigma_\epsilon^2$. Please see your text for an explanation. Notice that if the treatment means are not all equal, the average value of $MS_{treatment}$ is greater than $\sigma_\epsilon^2$.
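As a quick numerical check of the fundamental equation and the two mean squares, here is a short Python sketch (the data values are made up for illustration, not from the text):

```python
import numpy as np

# Hypothetical data: k = 3 treatments with unequal sample sizes n_j.
groups = [np.array([4.1, 5.0, 6.2, 5.5]),
          np.array([7.3, 6.8, 7.9]),
          np.array([5.9, 6.1, 6.6, 7.0, 6.4])]

N = sum(len(g) for g in groups)
k = len(groups)
grand_mean = np.concatenate(groups).mean()          # Y-bar..

ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)
ss_trt   = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_err   = sum(((g - g.mean()) ** 2).sum() for g in groups)

assert np.isclose(ss_total, ss_trt + ss_err)        # fundamental equation
ms_trt, ms_err = ss_trt / (k - 1), ss_err / (N - k)
print(ms_trt, ms_err, ms_trt / ms_err)              # last value is the F ratio
```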
The F Ratio

In general, the treatment and error sums of squares are not independent. However, when the hypothesis of no treatment effect is true, it can be shown that the treatment and error mean squares

• Are unbiased estimators of $\sigma_\epsilon^2$.

• Have independent chi-square distributions with $k - 1$ and $N - k$ degrees of freedom, respectively.

Thus, their ratio, $F = MS_{treatment}/MS_{error}$, follows an F distribution with $\nu_1 = k - 1$ and $\nu_2 = N - k$ degrees of freedom when the model assumptions are satisfied and the population means are equal.

The One-Way ANOVA

We can test the null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$ using the F statistic proposed above. Values of F which are much greater than one suggest that the null hypothesis be rejected; this is because when the null hypothesis is false, the average value of $MS_{treatment}$ is larger than $\sigma_\epsilon^2$, although the average value of $MS_{error}$ is always equal to $\sigma_\epsilon^2$. We are only interested in rejecting the null hypothesis in the event that the value of F is too large; thus we will use the upper tail of the distribution only for making comparisons.

We can construct an ANOVA table to summarize these results. A generic table for a one-way ANOVA would have the following form:

Source      df      SS        MS    F             p value
Treatment   k − 1   SST       MST   f = MST/MSE   Pr(F(ν1, ν2) ≥ f)
Error       N − k   SSE       MSE
Totals      N − 1   SStotal

Notice that I have used the abbreviations SST, SSE, MST, and MSE for the appropriate sums of squares and mean squares. This is for compactness of the table only. Now, let us try to fill in an example table for the following scenario.

Cara is a nutrition graduate student interested in studying the formation of colon cancer. She wishes to compare two different diets (low fat vs high fat). She feeds lab rats one of the two diets for three weeks and then collects fecal samples for analysis. Suppose that she feeds 12 rats the low fat diet and she gives another 10 rats the high fat diet. If she has $SS_{treatment} = 60$ and $SS_{total} = 82$, fill in the following ANOVA table.

Source      df      SS        MS    F             p value
Treatment
Error
Totals

David is a veterinarian who is hoping to learn more about the different medications available to treat feline diabetes. He has four different types of medication available to him, and he decides to prescribe each medication to 8 cats. He then measures the circulating blood sugar levels 30 minutes after treatment. Suppose that his summary information suggests that $MS_{treatment} = 12$ and $SS_{total} = 136$. Please fill in this ANOVA table.

Source      df      SS        MS    F             p value
Treatment
Error
Totals
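To check your answers, here is one way to complete a one-way ANOVA table from summary information in Python; the function name anova_table is my own, and scipy.stats.f.sf supplies the upper-tail p value:

```python
from scipy.stats import f

def anova_table(k, N, ss_trt, ss_total):
    """Complete a one-way ANOVA table from k, N, SS_treatment, SS_total."""
    df_trt, df_err = k - 1, N - k
    ss_err = ss_total - ss_trt
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    F = ms_trt / ms_err
    p = f.sf(F, df_trt, df_err)          # Pr(F(df_trt, df_err) >= F)
    return df_trt, df_err, ss_err, ms_trt, ms_err, F, p

# Cara's scenario: k = 2 diets, N = 12 + 10 = 22, SS_trt = 60, SS_total = 82.
print(anova_table(2, 22, 60.0, 82.0))
# David's scenario: k = 4, N = 32, SS_trt = MS_trt * df_trt = 12 * 3 = 36.
print(anova_table(4, 32, 36.0, 136.0))
```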
Tests on Means

Your book points out that after we determine that there are differences between at least some of the treatment means, we still have some questions to answer. For the fixed effects model, we will wish to answer questions like

• Which treatment is best (worst)?

• Does treatment A differ from treatment C?

• Is the mean of treatment B different from the mean of A and C together?

First, let us consider qualitative treatments. In this case, the type of procedure that we will use depends upon whether or not the treatment comparisons of interest were selected before the experiment was run. If the comparisons are decided upon before the experiment, orthogonal contrasts may be used. The method of orthogonal contrasts allows us to make at most $k - 1$ comparisons, the number of treatment degrees of freedom. We may do this if the comparisons are chosen before the experiment without concern about the overall level $\alpha$ of the test. This is not true if we look at the data first, and then decide what to compare. Why?

There is more than one way we might wish to define a contrast. We will assume for now that the sample sizes for the different treatments are equal. We might be interested in the difference between two treatments (or equivalently $T_{\cdot 1} - T_{\cdot 2}$), or in the difference between the mean of three treatments and a fourth treatment ($T_{\cdot 1} + T_{\cdot 2} + T_{\cdot 3} - 3T_{\cdot 4}$). We will generalize this by defining a contrast, $C_m$, in the treatment totals as follows:

$$C_m = c_{1m} T_{\cdot 1} + c_{2m} T_{\cdot 2} + \cdots + c_{km} T_{\cdot k}, \quad \text{where } c_{1m} + c_{2m} + \cdots + c_{km} = 0.$$

We will say that contrasts are orthogonal when they are independent of one another. Your book contains a derivation of the following formula; we will take it on faith. Two contrasts $C_m$ and $C_q$ are orthogonal if and only if

$$c_{1m} c_{1q} + c_{2m} c_{2q} + \cdots + c_{km} c_{kq} = 0.$$

Are the following contrasts orthogonal? $(1, -1, 0)$ and $(1, 1, -2)$.

Sum of Squares for Contrasts

Let $C_m$ be a contrast in the treatment totals, and let each sample consist of n observations. Then, the sum of squares for $C_m$ is

$$SS_{C_m} = \frac{C_m^2}{n \sum_{j=1}^k c_{jm}^2}.$$

It can be shown that the mean and variance of a contrast $C_m$ are

$$E(C_m) = n(c_{1m}\mu_1 + c_{2m}\mu_2 + \cdots + c_{km}\mu_k),$$
$$\mathrm{Var}(C_m) = n\sigma_\epsilon^2(c_{1m}^2 + c_{2m}^2 + \cdots + c_{km}^2).$$

However, $\sigma_\epsilon^2$ is unknown, and we would have to estimate it with $MS_{error}$. Thus, we could form a t statistic as

$$t = \frac{C_m - E(C_m)}{\sqrt{\widehat{\mathrm{Var}}(C_m)}}.$$

Suppose that we wish to perform the usual test that the means in the contrast are not different. In this case, $E(C_m)$ is equal to zero. We can use the t statistic above to perform this test, or alternatively, we can square the numerator and denominator of that statistic. The result is

$$F = t^2 = \frac{C_m^2}{n \sum_{j=1}^k c_{jm}^2 \, MS_{error}} = \frac{SS_{C_m}}{MS_{error}} \sim F(1, N - k)$$

when the null hypothesis is true. This is the form usually used by software, and can be compared to the values in Table D.

We could also express the contrasts in terms of the treatment means. Not surprisingly, when the number of samples in each treatment group is the same, the contrast in the means, $C_q$, is just equal to $C_m/n$. This change by a factor of n will be observed in the contrast value, the mean, and the standard deviation. However, the sum of squares for the contrast remains unchanged, and thus the F statistic remains valid.

Unequal Sample Sizes

Suppose that we have totals $T_{\cdot 1}, T_{\cdot 2}, \ldots, T_{\cdot k}$ based upon samples of size $n_1, n_2, \ldots, n_k$. Then, we can define:

• $C_m = c_{1m} T_{\cdot 1} + c_{2m} T_{\cdot 2} + \cdots + c_{km} T_{\cdot k}$ is a contrast in the treatment totals, provided that $n_1 c_{1m} + n_2 c_{2m} + \cdots + n_k c_{km} = 0$.

• The contrasts $C_m$ and $C_q$ are orthogonal if and only if $n_1 c_{1m} c_{1q} + n_2 c_{2m} c_{2q} + \cdots + n_k c_{km} c_{kq} = 0$.

• The sum of squares for the contrast has the form
$$SS_{C_m} = \frac{C_m^2}{\sum_{j=1}^k n_j c_{jm}^2}.$$

Suppose that we have 4 samples from the first treatment and 6 samples from the second treatment. If the treatment totals are 14.2 and 17.6, find a contrast in the treatment totals, and then find the sum of squares for that contrast.
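A small Python sketch of these contrast formulas (the helper names are mine; the final lines work the 4-and-6-sample exercise with one valid choice of coefficients, c = (3, −2), since 4·3 + 6·(−2) = 0):

```python
import numpy as np

def is_valid_contrast(c, n):
    # Unequal-n condition: sum_j n_j * c_jm = 0
    return np.isclose(np.dot(n, c), 0.0)

def contrasts_orthogonal(c1, c2, n):
    # Orthogonality: sum_j n_j * c_jm * c_jq = 0
    return np.isclose(np.sum(n * np.asarray(c1) * np.asarray(c2)), 0.0)

def ss_contrast(c, totals, n):
    # SS_Cm = C_m^2 / sum_j n_j * c_jm^2
    C = np.dot(c, totals)
    return C ** 2 / np.sum(n * np.asarray(c) ** 2)

# Equal-n question from the notes: are (1, -1, 0) and (1, 1, -2) orthogonal?
n_eq = np.array([4, 4, 4])
print(contrasts_orthogonal([1, -1, 0], [1, 1, -2], n_eq))

# Unequal-n exercise: n = (4, 6), treatment totals (14.2, 17.6).
n = np.array([4, 6])
totals = np.array([14.2, 17.6])
c = np.array([3.0, -2.0])
print(is_valid_contrast(c, n), ss_contrast(c, totals, n))
```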
Multiple Comparison Procedures

If we look at the means of the treatments before we decide what comparisons to make, we can have problems with the $\alpha$ level of any tests that we use. There are some multiple comparison procedures that we can use to deal with this problem. The first procedure that we will consider is called the Student-Newman-Keuls range test. This procedure aims to answer the question of which treatment means are different. To do this by hand, we would complete the following steps:

• Arrange the k means from smallest to largest.

• Find $MS_{error}$ and its associated degrees of freedom.

• Find the standard error of the mean for each treatment,
$$s_{\bar{Y}_{\cdot j}} = \sqrt{\frac{MS_{error}}{n_j}},$$
where the error mean square is the one used as the denominator in the F test on the population means.

• Obtain the appropriate $k - 1$ tabled ranges from Table E.1 or E.2, using $\nu_2$ as the error degrees of freedom.

• Multiply the ranges by $s_{\bar{Y}_{\cdot j}}$ to obtain the $k - 1$ least significant ranges.

• Beginning with the largest vs smallest, test all possible pairs of means to determine if their differences are larger than the appropriate least significant range. Declare as different all pairs of means whose difference is larger than the least significant range, unless the two means are contained in another non-significant interval.

The results of this test may be visualized by drawing a graph with the treatment means indicated. A line passes under all means which cannot be declared significantly different. Suppose that we conclude based upon our procedure that the treatments from smallest to largest are B, C, A, D. Now, suppose that we conclude that $\mu_D > \mu_B$, $\mu_D > \mu_C$, $\mu_D = \mu_A$, $\mu_A = \mu_B$, $\mu_A = \mu_C$, and $\mu_C = \mu_B$. Draw the picture. SAS does this by giving all treatments which are not significantly different the same letter designation.

The SNK procedure only allows us to compare pairs of means, but does not allow us to answer questions about the mean across two treatments. These sorts of questions can be answered by using Scheffé's test. This procedure allows us to test as many contrasts as we desire and to decide upon these contrasts after seeing the results of the experiment. What is the drawback of this procedure?

Note that to perform these comparisons, we need to first determine the overall $\alpha$ level for our test and have the error mean square available. Then:

• Set up all contrasts of interest and find their numerical values.

• Determine the number f such that $\Pr(F(k-1, N-k) \geq f) = \alpha$.

• Calculate $A = \sqrt{(k-1)f}$.

• Compute the standard error for each contrast to be tested. This is given by
$$s_{C_m} = \sqrt{MS_{error}(n_1 c_{1m}^2 + n_2 c_{2m}^2 + \cdots + n_k c_{km}^2)}.$$

• Let $c_m$ be the observed value of a contrast. Reject the hypothesis that the contrast is equal to zero when $|c_m| > A s_{C_m}$.
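Here is a sketch of the Scheffé steps in Python (the treatment totals and $MS_{error}$ in the example call are hypothetical; scipy.stats.f.isf supplies the critical value f):

```python
import numpy as np
from scipy.stats import f

def scheffe_test(c, totals, n, ms_error, alpha=0.05):
    """Scheffe test of H0: contrast = 0, for a contrast in treatment totals."""
    k, N = len(n), int(np.sum(n))
    C = float(np.dot(c, totals))                 # observed contrast value
    f_crit = f.isf(alpha, k - 1, N - k)          # Pr(F(k-1, N-k) >= f_crit) = alpha
    A = np.sqrt((k - 1) * f_crit)
    s_C = np.sqrt(ms_error * np.sum(n * np.asarray(c) ** 2))
    return abs(C) > A * s_C                      # True means reject H0

# Hypothetical numbers: 4 treatments, 8 units each, MS_error assumed to be 5.0.
n = np.array([8, 8, 8, 8])
totals = np.array([96.0, 104.0, 88.0, 120.0])
print(scheffe_test([1, -1, 0, 0], totals, n, ms_error=5.0))
```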
Note that most software packages, including SAS, will perform these (and other) tests and procedures for you. We will not spend much time on calculating them by hand, but ensure that you know where they come from, and which to use in a particular situation.

Confidence Limits for Means

We can form a $100(1 - \alpha)\%$ confidence interval for a mean $\mu_j$ by

$$\bar{y}_{\cdot j} \pm t_{1-\alpha/2} \sqrt{MS_{error}/n_j},$$

where the mean square error in question is that from the denominator of the F test for the treatment mean, and the t quantile uses the corresponding error degrees of freedom.

This is a summary of the procedures that we can use to further investigate the differences between means. This is only logical in the case that we can reject the global hypothesis that all of the treatment means are equal, and only in the fixed effects case.

Components of Variance

In the previous slides, we considered what further steps could be taken in the case that we had a fixed effects model. However, in the case that we are interested in a random effects experiment, there are different questions that we can seek to answer when the global null hypothesis is rejected. In the random effects model, we are usually interested in answering the question "what fraction of the total variability can be attributed to the differences in treatment means?" We claimed in an earlier section that $E(MS_{error}) = \sigma_\epsilon^2$ but $E(MS_{treatment}) = \sigma_\epsilon^2 + n\sigma_\tau^2$. Suppose that we perform an experiment and find that $MS_{error} = 112$ and $MS_{treatment} = 48$ with three replicates in each of four treatments. What would be a naive way to estimate $\sigma_\epsilon^2$ and $\sigma_\tau^2$?

Now, notice that

$$E(MS_{total}) = \frac{n(k-1)}{N-1}\sigma_\tau^2 + \sigma_\epsilon^2.$$

Read the derivation of these expectations in the textbook. If you don't understand them, please let me know.

Checking the Model

Recall that we assume that our samples are independent, random, normal samples from populations with equal variances. Often this may not be the case. Although it is not essential that all of the model assumptions be satisfied, we do need the assumptions to be reasonable. The first way that we will assess the validity of the assumptions is using plots of the data and the residuals. Define the residual as the difference between the observed value and the predicted value based upon the model. Denote the predicted value of $Y_{ij}$ by $\hat{Y}_{ij}$. In the one factor case, the predicted value is just the mean for the jth treatment, $\hat{Y}_{ij} = \bar{Y}_{\cdot j}$. Then, the residual of $Y_{ij}$ is

$$E_{ij} = Y_{ij} - \bar{Y}_{\cdot j}.$$

We would then do a normal quantile plot and a Shapiro-Wilk test on the residuals. If the assumptions are satisfied, the residuals are normally distributed, so if these tests indicate a lack of normality, we would believe that the assumptions are not satisfied.

Assessing the equality of variances across treatments can be initially done visually. If we plot the values of the residuals for each treatment separately and compare them by eye, we may sometimes see that the variances do not appear to be equal. Your text also contains a more quantitative assessment, based upon the quantity $D_4$. First, we would determine the average range of the data within the treatments. Then, multiply this by the $D_4$ value from the table. If all of the ranges are less than this value, then the assumption of homogeneous variance is reasonable.

We need to ensure that the samples are independent, as violating this assumption leads to severe problems with ANOVA methods. If we have time series data, we can perform a test to assess if there is a violation of the independence assumption. We can calculate the sample correlation coefficient, $r_1$, as

$$r_1 = \frac{\sum_{i=1}^{n-1} e_i e_{i+1}}{\sum_{i=1}^{n} e_i^2}.$$

The sampling distribution of this statistic is approximately normal with mean 0 and variance 1/n. Thus, we should consider that independence is suspect if the value of the statistic exceeds $1.96/\sqrt{n}$ in absolute value.

In addition to the time series test, if it is applicable, we should consider plotting the residuals vs other variables in the model. One important plot to consider is the plot of the residuals vs the predicted values. If we see patterns in the plots, this is an indication that the data are not independent. Further, if we plot the observations vs the order in which the measurements were collected, we should again not see a pattern. If we do, we would again conclude that the data do not appear to be independent.

What to do if the model is inadequate?

One obvious thing to attempt if the model is inadequate is to try a different model. Sometimes this is not possible. If it is not possible to randomize completely, the restriction on randomization should be included in the model. If the data appear not to be normal, or if the variances are not homogeneous, we can try a transformation of the data to correct for this problem. After a transformation, the data should again be checked before the analysis is accepted.
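Two of the checks described above, the Shapiro-Wilk test on the residuals and the serial correlation statistic $r_1$, can be sketched in Python as follows (the residuals here are simulated for illustration; scipy provides the Shapiro-Wilk test):

```python
import numpy as np
from scipy.stats import shapiro

# Hypothetical residuals, listed in the order the data were collected.
rng = np.random.default_rng(7)
e = rng.normal(0.0, 2.0, size=30)

# Shapiro-Wilk test of normality on the residuals.
stat, p = shapiro(e)
print(f"Shapiro-Wilk p = {p:.3f}")     # a small p suggests non-normality

# Serial correlation r1 = sum(e_i * e_{i+1}) / sum(e_i^2);
# independence is suspect when |r1| > 1.96 / sqrt(n).
n = len(e)
r1 = np.sum(e[:-1] * e[1:]) / np.sum(e ** 2)
print(r1, abs(r1) > 1.96 / np.sqrt(n))
```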
Your book contains suggestions for different types of transformations that should be considered for different types of data structures.
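The book's specific recommendations are not reproduced here, but as a rough illustration, a few transformations commonly suggested for these situations can be computed as follows (the offsets 0.5 and 1.0 are conventional choices for handling zeros, not values from the text):

```python
import numpy as np

y = np.array([3.0, 7.0, 12.0, 0.0, 25.0])   # made-up responses

sqrt_y = np.sqrt(y + 0.5)            # often suggested for counts (Poisson-like)
log_y  = np.log(y + 1.0)             # often suggested when SD grows with the mean

p = np.array([0.10, 0.35, 0.80])     # made-up binomial proportions
arcsin_p = np.arcsin(np.sqrt(p))     # often suggested for proportions
```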