Lecture 11. Experimental design. Blocking
Jesper Rydén, Matematiska institutionen, Uppsala universitet
Regression and Analysis of Variance • fall 2014

Blocking: An important technique in design of experiments
An earlier example of blocking: the paired t-test.
Examples of blocks: people/operators, batches, time (days, measurement occasions).
Figure: Randomized block design with four treatments A, B, C and D, and three blocks. Randomization within blocks.

The model for a randomized block design
The response $y$ for a randomized block design is a function of two qualitative variables: blocks and treatments.
Example of a model with four treatments A, B, C and D and three blocks:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon \]

where
$x_1 = 1$ if the measurement is made in Block 2, $x_1 = 0$ if not;
$x_2 = 1$ if the measurement is made in Block 3, $x_2 = 0$ if not;
$x_3 = 1$ if treatment B is applied, $x_3 = 0$ if not;
$x_4 = 1$ if treatment C is applied, $x_4 = 0$ if not;
$x_5 = 1$ if treatment D is applied, $x_5 = 0$ if not.

How the randomized block design reduces noise
Interpretations of the $\beta$ parameters?
Suppose we are interested in predicting the average response for Treatment A in Block 1. For such an observation $x_1 = x_2 = x_3 = x_4 = x_5 = 0$, and thus $y = \beta_0 + \varepsilon$.
Conclusion: $\beta_0$ is the average response for Treatment A in Block 1.
(Blackboard)

ANOVA table: Randomized Complete Block Design

Block design ANOVA table: have we met it before?
The two-factor factorial model with one observation per cell looks exactly like the randomized complete block model. The experimental situations that lead to the two models are very different:
Factorial model: all $ab$ runs are made in random order.
Randomized block model: randomization occurs only within each block.

Remarks
Testing for block effect: be careful when testing for blocks using $MS_{\text{Blocks}}/MS_E$; this ratio should only be used informally. See the comments by Montgomery, referring to Box and Hunter.

When is blocking necessary?
Suppose an experiment is conducted as a randomized block design, and blocking was not really necessary. Then:
Run as a randomized block design: $ab$ observations and $(a - 1)(b - 1)$ degrees of freedom for error.
Run as a completely randomized single-factor design with $b$ replicates: $ab$ observations and $a(b - 1)$ degrees of freedom for error.
Unnecessary blocking therefore costs $a(b - 1) - (a - 1)(b - 1) = b - 1$ error degrees of freedom.

Montgomery and Runger (2011): As a general rule, when in doubt as to the importance of block effects, the experimenter should block and gamble that the block effect does exist. If the experimenter is wrong, the slight loss in the degrees of freedom for error will have a negligible effect, unless the number of degrees of freedom is very small.

Example. Comparison of fertilizers
The effect of three fertilizers, A, B and C, is investigated. The yield was measured on 12 test squares: 4 blocks with 3 test squares in each.
Incomplete ANOVA table:

Source        D.f.   S. Sq.   MS
Fertilizer           312
Block                132
Residual              24
Total                468

Write down a model and complete the ANOVA table. Test the hypotheses of interest. Fertilizer A is a standard one, while B and C are new options. Are there differences?
(Blackboard)

Blocking in two directions: Latin Square
Experiment: the effect of five different chemicals on a material manufactured in a continuous process (paper, textiles). The response of interest is the strength of the material, collected on a roll.
Blocking along the roll (variability over time).
Blocking across the roll.
Latin squares: Fisher, Sudoku, ...

Latin square
Figure: a 5 × 5 Latin square design.

Latin square: regression model
Consider a 3 × 3 Latin square.

\[ y = \beta_0 + \underbrace{\beta_1 x_1 + \beta_2 x_2}_{\text{Row differences}} + \underbrace{\beta_3 x_3 + \beta_4 x_4}_{\text{Column differences}} + \underbrace{\beta_5 x_5 + \beta_6 x_6}_{\text{Treatment differences}} + \varepsilon \]

Latin square, cont.
Let $\beta_0$ be the average response for treatment A in row 1, column 1.
Further:
$\beta_1$ = difference between rows 2 and 1,
$\beta_2$ = difference between rows 3 and 1,
$\beta_3$ = difference between columns 2 and 1,
$\beta_4$ = difference between columns 3 and 1,
$\beta_5$ = difference between treatments B and A,
$\beta_6$ = difference between treatments C and A.
For example, the model for the observation in row 2, column 3 implies $x_1 = 1$, $x_2 = 0$, $x_3 = 0$, $x_4 = 1$, $x_5 = 0$, $x_6 = 0$.

Latin Square: ANOVA table

Example. Latin Square
Gasoline consumption for four different gasolines A, B, C and D. Four cars, four drivers.
D 19.5 B 18.0 A 18.1 C 20.1 B 16.8 A 17.9 D 21.0 D 19.2 A C B B 19.8 17.9 17.5 17.0 C 19.2 D 17.7 B 17.2 A 18.5
Is there a difference between treatments? Between cars, between drivers?

Example. Yield of grass
The yield from two types of grass is investigated. The yield could depend on the number of harvests per year: two, three or four. Moreover, the experiment is conducted in four blocks.

         Elephant grass       Guatemala grass
Block     2     3     4        2     3     4
1        109   222   187      277   246   252
2         97   125   163      293   263   181
3        133   134   143      260   194   224
4        113   173   179      325   190   248

Example, cont.
ANOVA table (incomplete):

Source               D.f.   S. Sq.      M. Sq.
G: Grass type               57526.04
H: Harvests/year              225
G*H (interaction)           18176.33
Block                        4478.46
Error                       12734.79
Total                       93140.63

Write down the model, complete the ANOVA table, test the relevant hypotheses and interpret the interaction. Which combination of grass type and harvests/year is most beneficial for the yield?
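The arithmetic asked for in the grass-yield example can be sketched as follows. This is only an illustration under stated assumptions, not the lecture's own solution: it assumes a randomized complete block design with a 2 × 3 factorial treatment structure (grass type × harvests/year) in 4 blocks, and all variable names are made up for the example. The sums of squares are copied from the incomplete ANOVA table.

```python
# Sums of squares copied from the incomplete ANOVA table of the grass example.
ss = {
    "G": 57526.04,      # grass type
    "H": 225.0,         # harvests/year
    "G*H": 18176.33,    # interaction
    "Block": 4478.46,
    "Error": 12734.79,
}

# Assumed design: 2 grass types x 3 harvest levels, replicated in 4 blocks.
n_grass, n_harvest, n_block = 2, 3, 4
n_obs = n_grass * n_harvest * n_block

df = {
    "G": n_grass - 1,
    "H": n_harvest - 1,
    "G*H": (n_grass - 1) * (n_harvest - 1),
    "Block": n_block - 1,
}
# Error d.f. is what remains of the total d.f. (n_obs - 1).
df["Error"] = (n_obs - 1) - sum(df.values())

# Mean squares, and F ratios of the treatment terms against the error mean square.
ms = {term: ss[term] / df[term] for term in ss}
f_ratio = {term: ms[term] / ms["Error"] for term in ("G", "H", "G*H")}

for term in ss:
    f_text = f"{f_ratio[term]:8.2f}" if term in f_ratio else ""
    print(f"{term:6s} {df[term]:3d} {ss[term]:10.2f} {ms[term]:10.2f} {f_text}")
```

Each F ratio would then be compared with an F quantile whose numerator degrees of freedom belong to the term tested and whose denominator degrees of freedom are those of the error. No F ratio is formed for blocks, in line with the remark that the block test should only be used informally.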