Lecture 3

What If There Are More Than Two Factor Levels?
• The t-test does not directly apply
• There are lots of practical situations where there are either
more than two levels of interest, or there are several factors
of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate
analysis “engine” for these types of experiments – Chapter
3, textbook
• The ANOVA was developed by Fisher in the early 1920s,
and initially applied to agricultural experiments
• Used extensively today for industrial experiments
An Example (See pg. 61)
• Consider an investigation into the relationship
between the RF (radio-frequency) power setting
and the etch rate for a wafer etching tool.
• The response variable is etch rate.
• The experimenter wants to determine the power
setting that will give a desired target etch rate.
• Other variables are fixed (gas, gap, etc.).
• RF power levels: 160, 180, 200, and 220 W.
• The experiment is replicated 5 times – runs made
in random order
An Example (cont'd)
The run order should be randomized. First, assign a number to each
experimental run, e.g., as follows:
RF Power (W)    Experimental Run Number
160              1    2    3    4    5
180              6    7    8    9   10
200             11   12   13   14   15
220             16   17   18   19   20
An Example (cont'd)
Choose a random number between 1 and 20 and make that run the
first test. Repeat with the remaining run numbers until all 20 runs
have been assigned a test sequence number. E.g.:
Test Sequence   Run Number   Power Level (W)
1               14           200
2               17           220
3               19           220
4                1           160
5                4           160
…               …            …
20              15           200
What if the test was run in the original nonrandomized order?
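The randomization procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `randomized_schedule` and the optional `seed` argument are assumptions introduced here, and the run numbering (1-5 for 160 W, 6-10 for 180 W, and so on) follows the run-number assignment shown earlier.

```python
import random

POWER_LEVELS = [160, 180, 200, 220]  # RF power settings in W, from the example
REPLICATES = 5                       # runs per power level


def randomized_schedule(seed=None):
    """Return (test_sequence, run_number, power_level) tuples in random order."""
    rng = random.Random(seed)
    # Runs 1-5 -> 160 W, 6-10 -> 180 W, 11-15 -> 200 W, 16-20 -> 220 W.
    run_to_power = {
        level_index * REPLICATES + rep + 1: power
        for level_index, power in enumerate(POWER_LEVELS)
        for rep in range(REPLICATES)
    }
    run_numbers = list(run_to_power)
    # A random permutation is equivalent to drawing run numbers one at a
    # time without replacement until all 20 have been assigned.
    rng.shuffle(run_numbers)
    return [
        (seq, run, run_to_power[run])
        for seq, run in enumerate(run_numbers, start=1)
    ]


for seq, run, power in randomized_schedule(seed=1):
    print(f"test {seq:2d}: run {run:2d} at {power} W")
```

Fixing the seed only makes the schedule reproducible for record keeping; a different seed (or none) gives a different, equally valid run order.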
The Analysis of Variance (Sec. 3-2, pg. 63)
• In general, there will be $a$ levels of the factor, or $a$ treatments, and $n$
replicates of the experiment, run in random order…a completely
randomized design (CRD)
• $N = an$ total runs
• We consider the fixed effects case…the random effects case will be
discussed later
• Objective is to test hypotheses about the equality of the a treatment
means
The Analysis of Variance
• The name “analysis of variance” stems from a
partitioning of the total variability in the response
variable into components that are consistent with a
model for the experiment
• The basic single-factor ANOVA model is
\[
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n
\]
where $\mu$ = an overall mean, $\tau_i$ = $i$th treatment effect,
$\varepsilon_{ij}$ = experimental error, NID$(0, \sigma^2)$
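To make the roles of the terms in the single-factor model concrete, the sketch below simulates data from it. The function name `simulate_crd` and every numeric value (overall mean, treatment effects, error standard deviation) are hypothetical and chosen only for illustration. The treatment effects are picked to sum to zero, the usual constraint in the fixed effects case.

```python
import random


def simulate_crd(mu, tau, n, sigma, seed=None):
    """Simulate y[i][j] = mu + tau[i] + eps_ij with eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [
        [mu + tau_i + rng.gauss(0.0, sigma) for _ in range(n)]
        for tau_i in tau
    ]


# Hypothetical parameters: 4 treatments, 5 replicates each.
y = simulate_crd(mu=600.0, tau=[-60.0, -30.0, 10.0, 80.0], n=5, sigma=18.0, seed=0)

# Each treatment average estimates mu + tau_i.
treatment_means = [sum(row) / len(row) for row in y]
print(treatment_means)
```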
Models for the Data
There are several ways to write a model for
the data:
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model
Let $\mu_i = \mu + \tau_i$, then
$y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model
Regression models can also be employed
The Analysis of Variance
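• Total variability is measured by the total sum of
squares:
\[
SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
\]
• The basic ANOVA partitioning is:
\[
\begin{aligned}
\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
  &= \sum_{i=1}^{a} \sum_{j=1}^{n} \left[ (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}) \right]^2 \\
  &= n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2
\end{aligned}
\]
\[
SS_T = SS_{Treatments} + SS_E
\]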
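The cross-product term drops out of the partition because the deviations $y_{ij} - \bar{y}_{i.}$ sum to zero within each treatment. A small numerical check of the identity is sketched below. The helper name `ss_partition` and the toy data are made up for illustration and are not from the etch-rate experiment.

```python
def ss_partition(data):
    """Return (SS_T, SS_Treatments, SS_E) for a balanced one-way layout.

    `data` is a list of a treatments, each a list of n observations.
    """
    a = len(data)
    n = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (a * n)
    treatment_means = [sum(row) / n for row in data]

    ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
    ss_treatments = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_error = sum(
        (y - m) ** 2 for row, m in zip(data, treatment_means) for y in row
    )
    return ss_total, ss_treatments, ss_error


# Toy data: 3 treatments, 3 replicates (hypothetical values).
toy = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
ss_t, ss_tr, ss_e = ss_partition(toy)
print(ss_t, ss_tr, ss_e)            # approximately 128, 122, 6
print(abs(ss_t - (ss_tr + ss_e)))   # approximately 0
```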
The Analysis of Variance
\[
SS_T = SS_{Treatments} + SS_E
\]
• A large value of $SS_{Treatments}$ reflects large differences in
treatment means
• A small value of $SS_{Treatments}$ likely indicates no differences in
treatment means
• Formal statistical hypotheses are:
\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_a
\]
$H_1$: At least one mean is different
The Analysis of Variance
• While sums of squares cannot be directly compared to test
the hypothesis of equal means, mean squares can be
compared.
• A mean square is a sum of squares divided by its degrees
of freedom:
\[
df_{Total} = df_{Treatments} + df_{Error}, \qquad an - 1 = (a - 1) + a(n - 1)
\]
\[
MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \qquad MS_E = \frac{SS_E}{a(n - 1)}
\]
• If the treatment means are equal, the treatment and error
mean squares both estimate $\sigma^2$ and will be (theoretically) equal.
• If treatment means differ, the treatment mean square will
tend to be larger than the error mean square.
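As a quick illustration of these definitions, the sketch below divides each sum of squares by its degrees of freedom and forms the ratio of the two mean squares. The function name `mean_squares` is an assumption. The input values are the sums of squares reported in the Design-Expert output for the etch-rate example, with $a = 4$ power levels and $n = 5$ replicates.

```python
def mean_squares(ss_treatments, ss_error, a, n):
    """Return (MS_Treatments, MS_E, ratio) for a balanced one-way ANOVA."""
    df_treatments = a - 1        # a - 1
    df_error = a * (n - 1)       # a(n - 1)
    ms_treatments = ss_treatments / df_treatments
    ms_error = ss_error / df_error
    return ms_treatments, ms_error, ms_treatments / ms_error


ms_tr, ms_e, ratio = mean_squares(66870.55, 5339.20, a=4, n=5)
print(round(ms_tr, 2), round(ms_e, 2), round(ratio, 2))
```

A ratio far above 1, as here, is what one expects when the treatment means differ.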
The Analysis of Variance is Summarized in a Table
• Computing…see text, pp 70 – 74
• The test statistic is $F_0 = MS_{Treatments} / MS_E$
• The reference distribution for $F_0$ is the $F_{a-1,\,a(n-1)}$ distribution
• Reject the null hypothesis (equal treatment means) if
\[
F_0 > F_{\alpha,\,a-1,\,a(n-1)}
\]
=> an upper-tail, one-tailed critical region.
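As a worked instance of this decision rule, take the etch-rate example with $a = 4$ and $n = 5$, so the reference distribution has 3 and 16 degrees of freedom. The mean squares and F value are those reported in the Design-Expert output for this experiment. The critical value 3.24 for $\alpha = 0.05$ is read from a standard F table and is not stated in the slides.

```latex
\[
F_0 = \frac{MS_{Treatments}}{MS_E} = \frac{22290.18}{333.70} = 66.80 > F_{0.05,\,3,\,16} = 3.24
\]
```

Since $F_0$ falls well inside the upper-tail critical region, the hypothesis of equal treatment means is rejected at the 5% level.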
Calculation of Sum of Squares
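\[
SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}
\]
\[
SS_{Treatments} = \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}
\]
Here $y_{i.}$ is the total for treatment $i$ and $y_{..}$ is the grand total.
The error sum of squares is
\[
SS_E = SS_T - SS_{Treatments}
\]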
Usually these are done using a computer program.
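A minimal sketch of such a program, using the computing formulas above, is given below. The 20 etch-rate observations are an assumption: they do not appear in these slides, and are used here because they reproduce the sums of squares and the mean in the Design-Expert output for this example. The function name `anova_sums_of_squares` is likewise made up for illustration.

```python
# ASSUMED data (not in the slides): etch rate by RF power (W), 5 replicates each.
etch_rate = {
    160: [575, 542, 530, 539, 570],
    180: [565, 593, 590, 579, 610],
    200: [600, 651, 610, 637, 629],
    220: [725, 700, 715, 685, 710],
}


def anova_sums_of_squares(groups):
    """Return (SS_T, SS_Treatments, SS_E) via the totals-based formulas."""
    n = len(groups[0])                        # replicates per treatment
    N = n * len(groups)                       # total number of runs
    grand_total = sum(sum(g) for g in groups)         # y..
    correction = grand_total ** 2 / N                 # y..^2 / N
    ss_total = sum(y * y for g in groups for y in g) - correction
    ss_treatments = sum(sum(g) ** 2 for g in groups) / n - correction
    return ss_total, ss_treatments, ss_total - ss_treatments


ss_t, ss_tr, ss_e = anova_sums_of_squares(list(etch_rate.values()))
print(round(ss_t, 2), round(ss_tr, 2), round(ss_e, 2))
```

With the assumed data, the three values agree with the Cor Total, Model, and Pure Error sums of squares in the computer output (72209.75, 66870.55, and 5339.20).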
Confidence Intervals in ANOVA
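• Can be established based on the t-distribution
• A $100(1-\alpha)\%$ confidence interval on the $i$th
treatment mean $\mu_i$ is:
\[
\bar{y}_{i.} - t_{\alpha/2,\,N-a} \sqrt{\frac{MS_E}{n}} \le \mu_i \le \bar{y}_{i.} + t_{\alpha/2,\,N-a} \sqrt{\frac{MS_E}{n}}
\]
• A $100(1-\alpha)\%$ confidence interval on the
difference in any two treatment means is:
\[
\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2,\,N-a} \sqrt{\frac{2MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2,\,N-a} \sqrt{\frac{2MS_E}{n}}
\]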
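The two interval formulas can be evaluated as in the sketch below. The function names are assumptions, and the critical value is passed in rather than computed: 2.120 is the tabulated $t_{0.025,16}$ for $N - a = 16$ degrees of freedom. $MS_E = 333.70$ and $n = 5$ come from the etch-rate output, while the treatment averages 707.0 and 551.2 are assumed values used only for illustration.

```python
import math


def ci_treatment_mean(ybar_i, ms_error, n, t_crit):
    """Confidence interval on one treatment mean."""
    half_width = t_crit * math.sqrt(ms_error / n)
    return ybar_i - half_width, ybar_i + half_width


def ci_difference(ybar_i, ybar_j, ms_error, n, t_crit):
    """Confidence interval on the difference of two treatment means."""
    half_width = t_crit * math.sqrt(2 * ms_error / n)
    diff = ybar_i - ybar_j
    return diff - half_width, diff + half_width


T_CRIT = 2.120  # t_{0.025, 16} from a t table (95% confidence)

low, high = ci_treatment_mean(707.0, 333.70, 5, T_CRIT)
print(round(low, 2), round(high, 2))      # 689.68 724.32

d_low, d_high = ci_difference(707.0, 551.2, 333.70, 5, T_CRIT)
print(round(d_low, 2), round(d_high, 2))
```

The interval on a difference is wider than the interval on a single mean by a factor of $\sqrt{2}$, because it carries the error of two treatment averages.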
Simultaneous Confidence Intervals
• Can be established based on one-at-a-time
confidence intervals.
• If there are $r$ $100(1-\alpha)\%$ confidence intervals of
interest, the probability that the $r$ intervals will
simultaneously be correct is at least $1 - r\alpha$.
E.g. $r = 5$, $\alpha = 0.05$: $1 - r\alpha = 0.75$
     $r = 10$, $\alpha = 0.05$: $1 - r\alpha = 0.50$
• Bonferroni method:
Replacing $\alpha$ in the equations with $\alpha/r$, then the
simultaneous confidence intervals on treatment
means/differences have a confidence level of at
least $100(1-\alpha)\%$.
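The arithmetic behind the bound and the Bonferroni adjustment is sketched below. Both function names are assumptions introduced for illustration.

```python
def simultaneous_lower_bound(r, alpha):
    """Lower bound on the probability that r intervals are all correct."""
    return 1 - r * alpha


def bonferroni_alpha(r, alpha):
    """Per-interval alpha that keeps the joint level at least 1 - alpha."""
    return alpha / r


for r in (5, 10):
    per_interval = bonferroni_alpha(r, 0.05)
    print(
        r,
        simultaneous_lower_bound(r, 0.05),          # unadjusted bound
        per_interval,                               # adjusted alpha
        simultaneous_lower_bound(r, per_interval),  # bound after adjustment
    )
```

In practice the adjustment means using $t_{\alpha/(2r),\,N-a}$ in place of $t_{\alpha/2,\,N-a}$ in each interval, so every interval becomes wider.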
ANOVA Computer Output
(Design-Expert)
Response: etch rate
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

Source        Sum of Squares   DF   Mean Square   F Value   Prob > F
Model               66870.55    3      22290.18     66.80   < 0.0001
A                   66870.55    3      22290.18     66.80   < 0.0001
Pure Error           5339.20   16        333.70
Cor Total           72209.75   19

Std. Dev.     18.27     R-Squared         0.9261
Mean         617.75     Adj R-Squared     0.9122
C.V.           2.96     Pred R-Squared    0.8845
PRESS       8342.50     Adeq Precision    19.071
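Most of the summary statistics in this output can be recomputed from the ANOVA table itself. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the slides) to be the ones Design-Expert applies. Adeq Precision is left out because it needs the individual predicted values, which the output does not show.

```python
import math

# Values copied from the ANOVA table in the output.
ss_model, ss_error, ss_total = 66870.55, 5339.20, 72209.75
df_error, df_total = 16, 19
press, grand_mean = 8342.50, 617.75

ms_error = ss_error / df_error
std_dev = math.sqrt(ms_error)                      # Std. Dev. = sqrt(MS_E)
cv = 100 * std_dev / grand_mean                    # C.V. in percent
r_squared = ss_model / ss_total                    # R-Squared
adj_r_squared = 1 - ms_error / (ss_total / df_total)   # Adj R-Squared
pred_r_squared = 1 - press / ss_total              # Pred R-Squared

print(round(std_dev, 2), round(cv, 2))
print(round(r_squared, 4), round(adj_r_squared, 4), round(pred_r_squared, 4))
```

Rounded to the precision of the output, these reproduce the printed Std. Dev., C.V., and the three R-squared values.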