Islamic University, Gaza - Palestine

Chapter 3 Experiments with a Single Factor: The Analysis of Variance

3.1 An Example
• Chapter 2: a single-factor experiment with two levels of the factor.
• Now consider single-factor experiments with a levels of the factor, $a \ge 2$.
• Example:
  – Response: the tensile strength of a new synthetic fiber.
  – Factor: the weight percent of cotton.
  – Five levels: 15%, 20%, 25%, 30%, 35%.
  – a = 5 and n = 5.
• Does changing the cotton weight percent change the mean tensile strength?
• Is there an optimum level for cotton content?

3.2 The Analysis of Variance
• a levels (treatments) of a factor and n replicates for each level.
• $y_{ij}$: the jth observation taken under factor level or treatment i.

Models for the Data
• Means model: $y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$
  – $y_{ij}$ is the ijth observation,
  – $\mu_i$ is the mean of the ith factor level,
  – $\varepsilon_{ij}$ is a random error with mean zero.
• Effects model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$, where $\mu_i = \mu + \tau_i$.
• Linear statistical model.
• One-way or single-factor analysis of variance model.
• Completely randomized design: the experiments are performed in random order so that the environment in which the treatments are applied is as uniform as possible.
• For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance $\sigma^2$, i.e. $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$.
• Fixed effects model: the a levels have been specifically chosen by the experimenter.

3.3 Analysis of the Fixed Effects Model
• Interested in testing the equality of the a treatment means, where $E(y_{ij}) = \mu + \tau_i = \mu_i$, $i = 1, 2, \ldots, a$:
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
  $H_1: \mu_i \neq \mu_j$ for at least one pair $(i, j)$
• Constraint (restraint): the treatment effects are defined as deviations from the overall mean, so they sum to zero.
• $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ vs.
$H_1: \tau_i \neq 0$ for at least one $i$, where $\mu = \frac{1}{a}\sum_{i=1}^{a}\mu_i$, so that $\sum_{i=1}^{a}\tau_i = 0$.
• Notation:
  $y_{i.} = \sum_{j=1}^{n} y_{ij}$, $\bar{y}_{i.} = y_{i.}/n$, $i = 1, 2, \ldots, a$
  $y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}$, $\bar{y}_{..} = y_{..}/N$
  $N = an$: the total number of observations.

3.3.1 Decomposition of the Total Sum of Squares
• Partition the total variability into its component parts.
• The total sum of squares (a measure of overall variability in the data):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$
• Degrees of freedom: an – 1 = N – 1.
• Decomposition (the cross-product term sums to zero):
  $\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2 = n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$
  $SS_T = SS_{Treatments} + SS_E$
• $SS_{Treatments}$: sum of squares of the differences between the treatment averages and the grand average (sum of squares due to treatments), with a – 1 degrees of freedom.
• $SS_E$: sum of squares of the differences of observations within treatments from the treatment average (sum of squares due to error), with N – a degrees of freedom.
• A large value of $SS_{Treatments}$ reflects large differences in treatment means.
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means.
• $df_{Total} = df_{Treatments} + df_{Error}$
• $\frac{SS_E}{N-a} = \frac{(n-1)S_1^2 + \cdots + (n-1)S_a^2}{(n-1) + \cdots + (n-1)}$ is a pooled estimate of $\sigma^2$.
• If there are no differences between the a treatment means, $\frac{SS_{Treatments}}{a-1} = \frac{n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2}{a-1}$ is also an estimate of $\sigma^2$.
• Mean squares:
  $MS_{Treatments} = \frac{SS_{Treatments}}{a-1}$, $MS_E = \frac{SS_E}{N-a}$
  $E(MS_E) = \frac{1}{N-a} E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right] = \sigma^2$
  $E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$

3.3.2 Statistical Analysis
• Assumption: the $\varepsilon_{ij}$ are normally and independently distributed with mean zero and variance $\sigma^2$.
• $SS_E/\sigma^2 \sim \chi^2(N-a)$. If $H_0$ is true, $SS_T/\sigma^2 \sim \chi^2(N-1)$ and $SS_{Treatments}/\sigma^2 \sim \chi^2(a-1)$, and $SS_E/\sigma^2$ and $SS_{Treatments}/\sigma^2$ are independent (Theorem 3.1).
• $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ vs.
$H_1: \tau_i \neq 0$ for at least one $i$
• Test statistic: $F_0 = MS_{Treatments}/MS_E$, which follows $F_{a-1, N-a}$ under $H_0$.
• Reject $H_0$ if $F_0 > F_{\alpha, a-1, N-a}$.
• Rewrite the sums of squares as computing formulas (see page 71):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$
  $SS_{Treatments} = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$
  $SS_E = SS_T - SS_{Treatments}$

Response: Strength
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

  Source       Sum of Squares   DF   Mean Square   F Value   Prob > F
  Model        475.76            4   118.94        14.76     < 0.0001
  A            475.76            4   118.94        14.76     < 0.0001
  Pure Error   161.20           20     8.06
  Cor Total    636.96           24

  Std. Dev.   2.84      R-Squared        0.7469
  Mean       15.04      Adj R-Squared    0.6963
  C.V.       18.88      Pred R-Squared   0.6046
  PRESS     251.88      Adeq Precision   9.294

3.3.3 Estimation of the Model Parameters
• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
• Estimators:
  $\hat{\mu} = \bar{y}_{..}$, $\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$, $\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i.}$
• Confidence intervals, using $\bar{y}_{i.} \sim N(\mu_i, \sigma^2/n)$:
  $\bar{y}_{i.} - t_{\alpha/2, N-a}\sqrt{MS_E/n} \le \mu_i \le \bar{y}_{i.} + t_{\alpha/2, N-a}\sqrt{MS_E/n}$
  $\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2, N-a}\sqrt{2MS_E/n} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2, N-a}\sqrt{2MS_E/n}$
• Example 3.3 (page 75)
• Simultaneous confidence intervals (Bonferroni method): to construct a set of r simultaneous confidence intervals on treatment means with overall confidence level at least $100(1-\alpha)\%$, use $100(1-\alpha/r)\%$ confidence intervals for each.

3.3.4 Unbalanced Data
• Let $n_i$ observations be taken under treatment i, $i = 1, 2, \ldots, a$, with $N = \sum_{i=1}^{a} n_i$ (for example, when some of the measured data are missing):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N}$
  $SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N}$
• Advantages of a balanced design:
  1. The test statistic is relatively insensitive to small departures from the assumption of equal variance for the a treatments if the sample sizes are equal.
  2. The power of the test is maximized if the samples are of equal size.

3.4 Model Adequacy Checking
• Assumptions: $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$
• The assumptions are checked by examining the residuals.
• Definition of residual: the difference between an observation and its fitted value.
• The residuals should be structureless.
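The computing formulas of Section 3.3.2 can be checked numerically. The sketch below is only an illustration: the 25 tensile-strength observations are an assumption (the slides show only the resulting ANOVA table, and the values here are taken to be the textbook's data for this example), and `one_way_anova` is a hypothetical helper name. If the assumed data are right, the output should agree with the ANOVA table above.

```python
# Assumed data: tensile strength at each cotton weight percent (n = 5 replicates).
# These observations are NOT shown in the slides; they are assumed for illustration.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def one_way_anova(groups):
    """Balanced one-way ANOVA using the shortcut (computing) formulas."""
    a = len(groups)            # number of treatments
    n = len(groups[0])         # replicates per treatment (balanced case)
    N = a * n                  # total number of observations
    grand_total = sum(sum(g) for g in groups)          # y..
    correction = grand_total ** 2 / N                  # y..^2 / N
    ss_total = sum(y * y for g in groups for y in g) - correction
    ss_treat = sum(sum(g) ** 2 for g in groups) / n - correction
    ss_error = ss_total - ss_treat
    ms_treat = ss_treat / (a - 1)
    ms_error = ss_error / (N - a)
    return {
        "ss_treat": ss_treat, "df_treat": a - 1, "ms_treat": ms_treat,
        "ss_error": ss_error, "df_error": N - a, "ms_error": ms_error,
        "ss_total": ss_total, "df_total": N - 1,
        "f0": ms_treat / ms_error,
    }


result = one_way_anova(list(data.values()))
for name, value in result.items():
    print(f"{name:>9}: {value:.2f}")
```

The statistic `f0` is then compared with the tabulated $F_{\alpha, a-1, N-a}$; the critical value is deliberately left to a table or a statistics library rather than computed here.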
• Residual for observation j in treatment i:
  $e_{ij} = y_{ij} - \hat{y}_{ij}$, where $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..}) = \bar{y}_{i.}$

3.4.1 The Normality Assumption
• Plot a histogram of the residuals.
• Plot a normal probability plot of the residuals.
• See Table 3-6.
• The residual distribution may be:
  – Slightly skewed (the right tail is longer than the left tail).
  – Light-tailed (the left tail of the error distribution is thinner than the tail of the standard normal).
• Outliers
• Possible causes of outliers: mistakes in calculations, data coding, or copying.
• Sometimes outliers are more informative than the rest of the data.
• Detecting outliers: examine the standardized residuals,
  $d_{ij} = \frac{e_{ij}}{\sqrt{MS_E}}$

3.4.2 Plot of Residuals in Time Sequence
• Plotting the residuals in time order of data collection is helpful in detecting correlation between the residuals.
• This checks the independence assumption.
[Figure: Residuals vs. Run; y-axis: Residuals, x-axis: Run Number]

3.4.3 Plot of Residuals Versus Fitted Values
• Plot the residuals versus the fitted values.
• The plot should be structureless.
[Figure: Residuals vs. Predicted; y-axis: Residuals, x-axis: Predicted]
• Nonconstant variance: the variance of the observations increases as the magnitude of the observation increases, i.e. $\sigma^2$ grows with $y_{ij}$.
• If the factor levels having the larger variance also have the smaller sample sizes, the actual type I error rate is larger than anticipated.
• Variance-stabilizing transformations:

  Distribution   Transformation                   Transformed response
  Poisson        Square root transformation       $y_{ij}^* = \sqrt{y_{ij}}$
  Lognormal      Logarithmic transformation       $y_{ij}^* = \log y_{ij}$
  Binomial       Arcsin transformation            $y_{ij}^* = \arcsin\sqrt{y_{ij}}$

• Statistical tests for equality of variance: $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2$ vs.
$H_1$: the above is not true for at least one $\sigma_i^2$
  – Bartlett’s test:
    $\chi_0^2 = 2.3026\,\frac{q}{c}$
    $q = (N-a)\log_{10} S_p^2 - \sum_{i=1}^{a}(n_i - 1)\log_{10} S_i^2$
    $c = 1 + \frac{1}{3(a-1)}\left(\sum_{i=1}^{a}(n_i - 1)^{-1} - (N-a)^{-1}\right)$
    $S_p^2 = \frac{\sum_{i=1}^{a}(n_i - 1) S_i^2}{N-a}$
  – Reject the null hypothesis if $\chi_0^2 > \chi^2_{\alpha, a-1}$.
• Example 3.4: the test statistic is $\chi_0^2 = 0.93$ and $\chi^2_{0.05, 4} = 9.49$, so the hypothesis of equal variances is not rejected.
• Bartlett’s test is sensitive to the normality assumption.
• The modified Levene test:
  – Use the absolute deviation of each observation from its treatment median:
    $d_{ij} = |y_{ij} - \tilde{y}_i|$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n_i$
  – If the mean deviations are equal, the variances of the observations in all treatments will be the same.
  – The test statistic for Levene’s test is the usual ANOVA F statistic for testing equality of means, applied to the $d_{ij}$.
• Example 3.5:
  – Four methods of estimating flood flow frequency (see Table 3.7)
  – ANOVA table (Table 3.8)
  – The plot of residuals vs. fitted values (Figure 3.7)
  – Modified Levene’s test: $F_0 = 4.55$ with P-value = 0.0137. Reject the null hypothesis of equal variances.

Variance-Stabilizing Transformations
• Let $E(y) = \mu$ and $\sigma_y \propto \mu^{\alpha}$.
• Find a transformation $y^* = y^{\lambda}$ that yields a constant variance.
• Then $\sigma_{y^*} \propto \mu^{\lambda + \alpha - 1}$, so choose $\lambda = 1 - \alpha$.

  Relationship between $\sigma_y$ and $\mu$   $\alpha$   $\lambda = 1 - \alpha$   Transformation
  $\sigma_y \propto$ constant                 0          1                        No transformation
  $\sigma_y \propto \mu^{1/2}$                1/2        1/2                      Square root
  $\sigma_y \propto \mu$                      1          0                        Log
  $\sigma_y \propto \mu^{3/2}$                3/2        -1/2                     Reciprocal square root
  $\sigma_y \propto \mu^{2}$                  2          -1                       Reciprocal

• How to find $\alpha$:
  – Since $\sigma_{y_i} \propto \mu_i^{\alpha}$, write $\sigma_{y_i} = \theta\mu_i^{\alpha}$, so $\log\sigma_{y_i} = \log\theta + \alpha\log\mu_i$.
  – Use $S_i$ to estimate $\sigma_{y_i}$ and $\bar{y}_{i.}$ to estimate $\mu_i$; the slope of $\log S_i$ plotted against $\log\bar{y}_{i.}$ estimates $\alpha$.
• See Figure 3.8, Table 3.10 and Figure 3.9.

3.5 Practical Interpretation of Results
• Conduct the experiment => perform the statistical analysis => investigate the underlying assumptions => draw practical conclusions.

3.5.1 A Regression Model
• Qualitative factor: compare the differences between the levels of the factor.
• Quantitative factor: develop an interpolation equation for the response variable.
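Returning to Bartlett’s test from Section 3.4.3, the statistic can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the tensile-strength observations are assumed (they are not listed in the slides), `bartlett_statistic` is a hypothetical helper name, and the critical value 9.49 is the one quoted in Example 3.4.

```python
import math

# Assumed tensile-strength observations for the five cotton percentages
# (not listed in the slides; used here for illustration only).
groups = [
    [7, 7, 15, 11, 9],
    [12, 17, 12, 18, 18],
    [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23],
    [7, 10, 11, 15, 11],
]


def bartlett_statistic(groups):
    """Bartlett's chi-square statistic: chi0^2 = 2.3026 * q / c."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    dfs = [len(g) - 1 for g in groups]          # n_i - 1
    variances = []
    for g in groups:
        mean = sum(g) / len(g)
        variances.append(sum((y - mean) ** 2 for y in g) / (len(g) - 1))
    # Pooled variance S_p^2
    pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / (N - a)
    q = (N - a) * math.log10(pooled) - sum(
        df * math.log10(s2) for df, s2 in zip(dfs, variances)
    )
    c = 1 + (sum(1 / df for df in dfs) - 1 / (N - a)) / (3 * (a - 1))
    return 2.3026 * q / c


chi2_0 = bartlett_statistic(groups)
CHI2_CRIT = 9.49  # chi-square(0.05, 4), as quoted in Example 3.4
print(f"chi0^2 = {chi2_0:.2f}, reject H0: {chi2_0 > CHI2_CRIT}")
```

With the assumed data the statistic comes out close to the 0.93 reported in Example 3.4, well below the critical value.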
Islamic University, Gaza - Palestine Regression analysis : See Figure 3.1 25 X = A: Cotton Weight % 20.5 Final Equation in Terms of Actual Factors: This is an empirical model of the experimental results 2 Strength Strength = +62.61143 -9.01143* Cotton Weight % +0.48143 * Cotton Weight %^2 -7.60000E-003 * Cotton Weight %^3 2 2 2 16 2 11.5 7 2 2 15.00 20.00 25.00 30.00 A: Cotton 31 Weight % 35.00 Islamic University, Gaza - Palestine 3.5.2 Comparisons Among Treatment Means • If that hypothesis is rejected, we don’t know which specific means are different • Determining which specific means differ following an ANOVA is called the multiple comparisons problem 3.5.3 Graphical Comparisons of Means Islamic University, Gaza - Palestine 3.5.4 Contrast • A contrast: a linear combination of the parameters of the form a a i 1 i 1 ci i , ci 0 • H0: = 0 v.s. H1: 0 • Two methods for this testing. 33 Islamic University, Gaza - Palestine The first method: a a i 1 i 1 Let C ci y i Then Var (C ) n 2 ci2 a Under H 0 , c y i 1 i i a ~ N (0,1) n 2 ci2 i 1 a Hence the statistic, t 0 c y i 1 i i a nMS E ci2 i 1 ~ t N a Islamic University, Gaza - Palestine • The second method: a F0 t 02 ( ci y i ) 2 i 1 a nMS E ci2 ~F1,N a i 1 a ci y i MS C SS C / 1 , SS C i 1 a F0 MS E MS E n ci2 i 1 35 Islamic University, Gaza - Palestine The C.I. for a contrast, a ci i i 1 σ2 Let C ci y i . Then Var(C) n i 1 a MS E n a Hence C.I. ci y i t / 2, N a i 1 a 2 c i i 1 a 2 c i i 1 • Unequal Sample Size ci y i ci y i i 1 3. SSC i a1 a 2 2 n c MS E ni ci ii a a a 1. ni ci 0 2. t 0 i 1 i 1 i 1 2 Islamic University, Gaza - Palestine 3.5.5 Orthogonal Contrast • Two contrasts with coefficients, {ci} and {di}, are orthogonal if ci di = 0 • For a treatments, the set of a – 1 orthogonal contrasts partition the sum of squares due to treatments into a – 1 independent single-degree-of-freedom components. Thus, tests performed on orthogonal contrasts are independent. 
• See Example 3.6 (page 94).

3.5.6 Scheffé’s Method for Comparing All Contrasts
• Scheffé (1953) proposed a method for comparing any and all possible contrasts between treatment means.
• Suppose there are m contrasts of interest:
  $\Gamma_u = c_{1u}\mu_1 + \cdots + c_{au}\mu_a$, $u = 1, 2, \ldots, m$
  $C_u = \sum_{i=1}^{a} c_{iu}\bar{y}_{i.}$ and $S_{C_u} = \sqrt{MS_E\sum_{i=1}^{a} (c_{iu}^2/n_i)}$
• The critical value: $S_{\alpha, u} = S_{C_u}\sqrt{(a-1)F_{\alpha, a-1, N-a}}$
• If $|C_u| > S_{\alpha, u}$, then reject $H_0: \Gamma_u = 0$.
• See pages 95 and 96.

3.5.7 Comparing Pairs of Treatment Means
• Compare all pairs of the a treatment means.
• Tukey’s test:
  – The studentized range statistic:
    $q = \frac{\bar{y}_{max} - \bar{y}_{min}}{\sqrt{MS_E/n}}$,
    where $\bar{y}_{max}$ and $\bar{y}_{min}$ are the largest and smallest sample means out of a group of p sample means.
  – The critical point is $T_{\alpha} = q_{\alpha}(a, f)\sqrt{\frac{MS_E}{n}}$ for equal sample sizes, or $T_{\alpha} = \frac{q_{\alpha}(a, f)}{\sqrt{2}}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$ for unequal sample sizes, where f is the number of degrees of freedom associated with $MS_E$.
  – Two means are declared significantly different if $|\bar{y}_{i.} - \bar{y}_{j.}| > T_{\alpha}$.
  – See Example 3.7.
• Sometimes the overall F test from the ANOVA is significant, but the pairwise comparison of means fails to reveal any significant differences.
• The F test simultaneously considers all possible contrasts involving the treatment means, not just pairwise comparisons.
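Tukey’s procedure for equal sample sizes can be sketched as follows. Everything numeric here is an assumption for illustration: the treatment means are derived from assumed textbook data, $MS_E = 8.06$ with $f = 20$ comes from the ANOVA table, and $q_{0.05}(5, 20) = 4.23$ is a value read from a studentized range table rather than computed.

```python
import math
from itertools import combinations

# Assumed treatment means for each cotton weight percent (n = 5 each).
means = {15: 9.8, 20: 15.4, 25: 17.6, 30: 21.6, 35: 10.8}
MS_E = 8.06    # error mean square from the ANOVA table (f = 20 df)
n = 5
Q_CRIT = 4.23  # assumed table value of q_0.05(a = 5, f = 20)

# Critical point T_alpha = q_alpha(a, f) * sqrt(MS_E / n)
T = Q_CRIT * math.sqrt(MS_E / n)

# A pair differs significantly when |ybar_i - ybar_j| exceeds T.
significant = [
    (i, j)
    for i, j in combinations(means, 2)
    if abs(means[i] - means[j]) > T
]

print(f"T_0.05 = {T:.2f}")
for i, j in significant:
    print(f"{i}% vs {j}%: |difference| = {abs(means[i] - means[j]):.1f}")
```

Every pairwise difference is compared with the same critical point $T_{\alpha}$, which is what distinguishes this procedure from range tests that use a different critical value for each number of steps between means.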
The Fisher Least Significant Difference (LSD) Method
• For $H_0: \mu_i = \mu_j$, the test statistic is
  $t_0 = \frac{\bar{y}_{i.} - \bar{y}_{j.}}{\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$
• The least significant difference (LSD):
  $LSD = t_{\alpha/2, N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
• See Example 3.8.

Duncan’s Multiple Range Test
• The a treatment averages are arranged in ascending order, and the standard error of each average is determined as
  $S_{\bar{y}_{i.}} = \sqrt{\frac{MS_E}{n}}$;
  for unequal sample sizes, replace n by the harmonic mean $n_h = \frac{a}{\sum_{i=1}^{a}(1/n_i)}$.
• Assuming equal sample sizes, the significant ranges are
  $R_p = r_{\alpha}(p, f)\,S_{\bar{y}_{i.}}$, $p = 2, 3, \ldots, a$
• In total there are a(a-1)/2 pairs.
• Example 3.9

The Newman-Keuls Test
• Similar to Duncan’s multiple range test.
• The critical values: $K_p = q_{\alpha}(p, f)\,S_{\bar{y}_{i.}}$

3.5.8 Comparing Treatment Means with a Control
• Assume one of the treatments is a control, and the analyst is interested in comparing each of the other a – 1 treatment means with the control.
• Test $H_0: \mu_i = \mu_a$ vs. $H_1: \mu_i \neq \mu_a$, $i = 1, 2, \ldots, a-1$
• Dunnett (1964)
• Compute $|\bar{y}_{i.} - \bar{y}_{a.}|$, $i = 1, 2, \ldots, a-1$
• Reject $H_0$ if
  $|\bar{y}_{i.} - \bar{y}_{a.}| > d_{\alpha}(a-1, f)\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_a}\right)}$
• Example 3.10

3.7 Determining Sample Size
• Determine the number of replicates to run.

3.7.1 Operating Characteristic Curves (OC Curves)
• OC curve: a plot of the type II error probability of a statistical test,
  $\beta = 1 - P\{\text{Reject } H_0 \mid H_0 \text{ is false}\} = 1 - P(F_0 > F_{\alpha, a-1, N-a} \mid H_0 \text{ is false})$
• If $H_0$ is false, then $F_0 = MS_{Treatments}/MS_E$ follows a noncentral F distribution with a – 1 and N – a degrees of freedom and noncentrality parameter $\delta$.
• Chart V of the Appendix
• Determine
  $\Phi^2 = \frac{n\sum_{i=1}^{a}\tau_i^2}{a\sigma^2}$
• Let $\mu_i$ be the specified treatment means. Then the $\tau_i$ are obtained as
  $\tau_i = \mu_i - \bar{\mu}$, where $\bar{\mu} = \frac{1}{a}\sum_{i=1}^{a}\mu_i$
• For $\sigma^2$: use prior experience, a previous experiment, a preliminary test, or a judgment estimate.
• Example 3.11
• Difficulty: how to select a set of treatment means on which the sample size decision should be based.
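The OC-curve parameter $\Phi^2$ defined in Section 3.7.1 is simple arithmetic once a set of treatment means and a value of $\sigma$ are specified. The sketch below uses hypothetical inputs (means of 11, 12, 15, 18 and 19 and $\sigma = 3$ are assumptions chosen for illustration, not values from the slides), and `phi_squared` is a hypothetical helper name. Reading $\beta$ from the OC chart is not attempted here.

```python
import math


def phi_squared(treatment_means, sigma, n):
    """Phi^2 = n * sum(tau_i^2) / (a * sigma^2), with tau_i = mu_i - mean(mu)."""
    a = len(treatment_means)
    mu_bar = sum(treatment_means) / a
    taus = [mu - mu_bar for mu in treatment_means]
    return n * sum(t * t for t in taus) / (a * sigma ** 2)


# Hypothetical planning values (assumptions, not taken from the slides).
specified_means = [11, 12, 15, 18, 19]
sigma = 3.0

for n in (4, 5, 6):
    phi2 = phi_squared(specified_means, sigma, n)
    # Error degrees of freedom are a(n - 1) = N - a.
    df_error = len(specified_means) * (n - 1)
    print(f"n = {n}: Phi^2 = {phi2:.2f}, Phi = {math.sqrt(phi2):.2f}, "
          f"error df = {df_error}")
```

Each candidate n gives a value of $\Phi$ and a number of error degrees of freedom, which are then looked up on the OC chart to find $\beta$; n is increased until the desired power is reached.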
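• Another approach: select a sample size such that if the difference between any two treatment means exceeds a specified value D, the null hypothesis should be rejected. The minimum value of $\Phi^2$ is then
  $\Phi^2 = \frac{nD^2}{2a\sigma^2}$

3.7.2 Specifying a Standard Deviation Increase
• Let P be the percentage increase in the standard deviation of an observation. Then
  $\frac{\sqrt{\sum_{i=1}^{a}\tau_i^2/a}}{\sigma} = \sqrt{(1 + 0.01P)^2 - 1}$
  $\Phi = \frac{\sqrt{\sum_{i=1}^{a}\tau_i^2/a}}{\sigma/\sqrt{n}} = \sqrt{(1 + 0.01P)^2 - 1}\,\sqrt{n}$
• For example (page 110): if P = 20, then
  $\Phi = \sqrt{1.2^2 - 1}\,\sqrt{n} = 0.66\sqrt{n}$

3.7.3 Confidence Interval Estimation Method
• Use a confidence interval on the difference in two treatment means:
  $\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}}$
• For example: we want the 95% C.I. on the difference in mean tensile strength for any two cotton weight percentages to be ±5 psi, with $\sigma = 3$. See page 110.

3.9 The Regression Approach to the Analysis of Variance
• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
• Least squares function:
  $L = \sum_{i=1}^{a}\sum_{j=1}^{n}\varepsilon_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \mu - \tau_i)^2$
• Setting $\frac{\partial L}{\partial\mu} = 0$ and $\frac{\partial L}{\partial\tau_i} = 0$, $i = 1, 2, \ldots, a$, gives
  $-2\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0$ and $-2\sum_{j=1}^{n}(y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0$, $i = 1, 2, \ldots, a$
• The normal equations:
  $N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_a = y_{..}$
  $n\hat{\mu} + n\hat{\tau}_1 = y_{1.}$
  $n\hat{\mu} + n\hat{\tau}_2 = y_{2.}$
  ...
  $n\hat{\mu} + n\hat{\tau}_a = y_{a.}$
• Apply the constraint $\sum_{i=1}^{a}\hat{\tau}_i = 0$. Then the estimators are
  $\hat{\mu} = \bar{y}_{..}$, $\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$
• Regression sum of squares (the reduction due to fitting the full model):
  $R(\mu, \tau) = \hat{\mu}y_{..} + \sum_{i=1}^{a}\hat{\tau}_i y_{i.} = \sum_{i=1}^{a}\frac{y_{i.}^2}{n}$
• The error sum of squares:
  $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)$
• The sum of squares resulting from the treatment effects:
  $R(\tau \mid \mu) = R(\mu, \tau) - R(\mu) = R(\text{Full Model}) - R(\text{Reduced Model}) = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$
• The test statistic for $H_0: \tau_1 = \cdots = \tau_a = 0$:
  $F_0 = \frac{R(\tau \mid \mu)/(a-1)}{\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)\right]/(N-a)} \sim F_{a-1, N-a}$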
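The regression approach can be traced step by step in code. The sketch below is illustrative only: the tensile-strength observations are assumed (the slides do not list them), and the variable names are hypothetical. It solves the normal equations under the constraint $\sum\hat{\tau}_i = 0$, checks that the solution satisfies them, and forms $R(\mu, \tau)$, $R(\mu)$ and $R(\tau \mid \mu)$.

```python
# Assumed tensile-strength observations (a = 5 treatments, n = 5 replicates).
groups = [
    [7, 7, 15, 11, 9],
    [12, 17, 12, 18, 18],
    [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23],
    [7, 10, 11, 15, 11],
]
a = len(groups)
n = len(groups[0])
N = a * n

totals = [sum(g) for g in groups]    # y_i.
grand_total = sum(totals)            # y..

# Solution of the normal equations under the constraint sum(tau_hat) = 0.
mu_hat = grand_total / N
tau_hat = [t / n - mu_hat for t in totals]

# Left-hand sides of the normal equations, for checking the solution.
lhs_mu = N * mu_hat + n * sum(tau_hat)          # should equal y..
lhs_tau = [n * mu_hat + n * t for t in tau_hat]  # should equal each y_i.

# Reductions in sum of squares.
r_full = mu_hat * grand_total + sum(t * y for t, y in zip(tau_hat, totals))
r_reduced = grand_total ** 2 / N                 # R(mu)
r_tau_given_mu = r_full - r_reduced              # R(tau | mu)

ss_error = sum(y * y for g in groups for y in g) - r_full
f0 = (r_tau_given_mu / (a - 1)) / (ss_error / (N - a))

print(f"mu_hat = {mu_hat:.2f}")
print("tau_hat =", [round(t, 2) for t in tau_hat])
print(f"R(mu, tau) = {r_full:.2f}, R(mu) = {r_reduced:.2f}")
print(f"R(tau | mu) = {r_tau_given_mu:.2f}, SS_E = {ss_error:.2f}, F0 = {f0:.2f}")
```

Under the assumed data, $R(\tau \mid \mu)$ and $F_0$ agree with the treatment sum of squares and F value in the ANOVA table of Section 3.3.2, which shows that the regression approach and the sum-of-squares decomposition are the same analysis.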