What If There Are More Than Two Factor Levels?
• The t-test does not directly apply.
• Many practical situations involve either more than two levels of interest or several factors of simultaneous interest.
• The analysis of variance (ANOVA) is the appropriate analysis "engine" for these types of experiments (Chapter 3, textbook).
• The ANOVA was developed by Fisher in the early 1920s and initially applied to agricultural experiments.
• It is used extensively today for industrial experiments.

An Example (See pg. 61)
• Consider an investigation into the relationship between the RF (radio-frequency) power setting and the etch rate for a wafer etching tool.
• The response variable is etch rate.
• The experimenter wants to determine the power setting that will give a desired target etch rate.
• Other variables are held fixed (gas, gap, etc.).
• RF power levels: 160, 180, 200, and 220 W.
• The experiment is replicated 5 times, with the runs made in random order.

An Example (cont'd)
The run order should be randomized. First, assign a number to each experimental run, e.g., as follows:

RF Power (W)   Experimental Run Number
160             1   2   3   4   5
180             6   7   8   9  10
200            11  12  13  14  15
220            16  17  18  19  20

An Example (cont'd)
Choose a random number between 1 and 20 and make that run the first test, then choose the next, continuing until all 20 runs have been assigned a test number. E.g.:

Test Sequence   Run Number   Power Level (W)
1               14           200
2               17           220
3               19           220
4               1            160
5               4            160
…               …            …
20              15           200

What if the tests were run in the original, nonrandomized order?

The Analysis of Variance (Sec. 3-2, pg. 63)
• In general, there will be $a$ levels of the factor, or $a$ treatments, and $n$ replicates of the experiment, run in random order: a completely randomized design (CRD).
• $N = an$ total runs.
• We consider the fixed effects case; the random effects case will be discussed later.
• The objective is to test hypotheses about the equality of the $a$ treatment means.

The Analysis of Variance
• The name "analysis of variance" stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment.
• The basic single-factor ANOVA model is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$

where $\mu$ is an overall mean, $\tau_i$ is the $i$th treatment effect, and $\varepsilon_{ij}$ is the experimental error, assumed $\text{NID}(0, \sigma^2)$.

Models for the Data
There are several ways to write a model for the data:
• $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model.
• Let $\mu_i = \mu + \tau_i$; then $y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model.
• Regression models can also be employed.

The Analysis of Variance
• Total variability is measured by the total sum of squares:

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$

• The basic ANOVA partitioning is:

$$\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n} \left[ (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}) \right]^2 = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

$$SS_T = SS_{Treatments} + SS_E$$

The Analysis of Variance

$$SS_T = SS_{Treatments} + SS_E$$

• A large value of $SS_{Treatments}$ reflects large differences in treatment means.
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means.
• Formal statistical hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad H_1: \text{at least one mean is different}$$

The Analysis of Variance
• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.
• A mean square is a sum of squares divided by its degrees of freedom:

$$df_{Total} = df_{Treatments} + df_{Error}: \qquad an - 1 = (a - 1) + a(n - 1)$$

$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \qquad MS_E = \frac{SS_E}{a(n - 1)}$$

• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square will be larger than the error mean square.

The Analysis of Variance is Summarized in a Table
• Computing: see text, pp. 70-74.
• The test statistic is $F_0 = MS_{Treatments} / MS_E$.
• The reference distribution for $F_0$ is the $F_{a-1,\, a(n-1)}$ distribution.
• Reject the null hypothesis (equal treatment means) if $F_0 > F_{\alpha,\, a-1,\, a(n-1)}$, which is an upper-tail, one-tailed critical region.

Calculation of Sum of Squares

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$$

$$SS_{Treatments} = \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$$

The error sum of squares is $SS_E = SS_T - SS_{Treatments}$. Usually these calculations are done using a computer program.

Confidence Intervals in ANOVA
• Can be established based on the t-distribution.
• A $100(1 - \alpha)\%$ confidence interval on the $i$th treatment mean $\mu_i$ is:

$$\bar{y}_{i.} - t_{\alpha/2,\, N-a} \sqrt{\frac{MS_E}{n}} \le \mu_i \le \bar{y}_{i.} + t_{\alpha/2,\, N-a} \sqrt{\frac{MS_E}{n}}$$

• A $100(1 - \alpha)\%$ confidence interval on the difference in any two treatment means is:

$$\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2,\, N-a} \sqrt{\frac{2 MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2,\, N-a} \sqrt{\frac{2 MS_E}{n}}$$

Simultaneous Confidence Intervals
• Can be established based on one-at-a-time confidence intervals.
• If there are $r$ $100(1 - \alpha)\%$ confidence intervals of interest, the probability that the $r$ intervals will simultaneously be correct is at least $1 - r\alpha$. E.g.:
  r = 5, α = 0.05: 1 − rα = 0.75
  r = 10, α = 0.05: 1 − rα = 0.50
• Bonferroni method: replace $\alpha$ in the equations with $\alpha / r$; the simultaneous confidence intervals on treatment means or differences then have a confidence level of at least $100(1 - \alpha)\%$.

ANOVA Computer Output (Design-Expert)
Response: etch rate
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

Source       Sum of Squares   DF   Mean Square   F Value   Prob > F
Model        66870.55          3   22290.18      66.80     < 0.0001
A            66870.55          3   22290.18      66.80     < 0.0001
Pure Error    5339.20         16     333.70
Cor Total    72209.75         19

Std. Dev.   18.27      R-Squared        0.9261
Mean        617.75     Adj R-Squared    0.9122
C.V.        2.96       Pred R-Squared   0.8845
PRESS       8342.50    Adeq Precision   19.071
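The arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the Design-Expert procedure. The function names (`sums_of_squares`, `anova_summary`, `bonferroni_bound`) and the toy data set are hypothetical, and a balanced design (the same n at every level) is assumed. The etch-rate sums of squares are copied from the output above, with a = 4 power levels and n = 5 replicates.

```python
import math


def sums_of_squares(groups):
    """Return (SS_T, SS_Treatments, SS_E) for a balanced single-factor design.

    `groups` holds one list of n observations per treatment level.
    Uses the shortcut formulas based on totals and the correction y..^2 / N.
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("expected the same number of replicates per level")
    N = a * n
    correction = sum(sum(g) for g in groups) ** 2 / N  # y..^2 / N
    ss_total = sum(y * y for g in groups for y in g) - correction
    ss_treat = sum(sum(g) ** 2 for g in groups) / n - correction
    return ss_total, ss_treat, ss_total - ss_treat


def anova_summary(ss_treat, ss_error, a, n):
    """Mean squares, F0, and fit statistics from the two sums of squares."""
    df_treat, df_error = a - 1, a * (n - 1)
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    ss_total = ss_treat + ss_error
    return {
        "ms_treat": ms_treat,
        "ms_error": ms_error,
        "f0": ms_treat / ms_error,
        "std_dev": math.sqrt(ms_error),
        "r2": ss_treat / ss_total,
        "adj_r2": 1 - ms_error / (ss_total / (df_treat + df_error)),
    }


def bonferroni_bound(alpha, r):
    """Lower bound 1 - r*alpha on the joint coverage of r intervals."""
    return 1 - r * alpha


# Hypothetical toy data (3 levels, 3 replicates), not taken from the slides.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(sums_of_squares(toy))  # (48.0, 42.0, 6.0)

# Sums of squares copied from the Design-Expert output (a = 4, n = 5).
etch = anova_summary(66870.55, 5339.20, a=4, n=5)
for name, value in etch.items():
    print(f"{name}: {value:.4f}")

# Joint coverage bounds for r = 5 and r = 10 intervals at alpha = 0.05.
print(bonferroni_bound(0.05, 5), bonferroni_bound(0.05, 10))
```

Rounded to the precision of the printout, the etch-rate summary should agree with the table (F value 66.80, Std. Dev. 18.27, R-Squared 0.9261, Adj R-Squared 0.9122). The p-value and the t quantiles needed for the confidence intervals are left out because they require a statistics library.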