Islamic University, Gaza - Palestine
Chapter 3 Experiments with a Single Factor: The
Analysis of Variance
3.1 An Example
• Chapter 2: a single-factor experiment with two levels of the factor
• Now consider single-factor experiments with a levels of the factor, a ≥ 2
• Example:
– Response: the tensile strength of a new synthetic fiber
– Factor: the weight percent of cotton
– Five levels: 15%, 20%, 25%, 30%, 35%
– a = 5 levels and n = 5 replicates per level
• Does changing the cotton
weight percent change the
mean tensile strength?
• Is there an optimum level for
cotton content?
3.2 The Analysis of Variance
• a levels (treatments) of a factor and n replicates for each
level.
• yij: the jth observation taken under factor level or treatment
i.
Models for the Data
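• Means model:
$$y_{ij} = \mu_i + \xi_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
– $y_{ij}$ is the ij-th observation,
– $\mu_i$ is the mean of the ith factor level,
– $\xi_{ij}$ is a random error with mean zero.
• Effects model (with $\mu_i = \mu + \tau_i$, where μ is the overall mean and $\tau_i$ is the ith treatment effect):
$$y_{ij} = \mu + \tau_i + \xi_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$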
• Linear statistical model
• One-way or single-factor analysis of variance model
• Completely randomized design: the experiments are performed in random order so that the environment in which the treatments are applied is as uniform as possible.
• For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance σ², i.e. yij ~ N(μ+τi, σ²)
• Fixed effect model: a levels have been specifically chosen by
the experimenter.
3.3 Analysis of the Fixed Effects Model
• Interested in testing the equality of the a treatment means, and
E(yij) = μ + τi = μi , i = 1,2, …, a
H0: μ1 = μ2 = …… = μa
H1: μi ≠ μj, for at least one pair (i, j)
• Constraint (restraint): $\sum_{i=1}^{a} \tau_i = 0$
• H0: τ1 = τ2 = … = τa = 0 vs. H1: τi ≠ 0, for at least one i
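• Notation:
$$y_{i.} = \sum_{j=1}^{n} y_{ij}, \qquad y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}$$
$$\bar{y}_{i.} = y_{i.}/n, \qquad \bar{y}_{..} = y_{..}/N, \qquad N = an$$
N = an: the total number of observations.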
3.3.1 Decomposition of the Total Sum of Squares
• Partition the total variability into its component parts.
• The total sum of squares (a measure of overall variability in the data):
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$$
• Degrees of freedom: an – 1 = N – 1
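$$\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} \left[(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})\right]^2 = n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$
$$SS_T = SS_{Treatments} + SS_E$$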
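The decomposition relies on the cross-product term vanishing when the square is expanded. That step is not written out on the slide; a short sketch of it, using only the notation defined above, is:

```latex
\begin{aligned}
2\sum_{i=1}^{a}\sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})(y_{ij} - \bar{y}_{i.})
  &= 2\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..}) \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.}) \\
  &= 2\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..}) \,(y_{i.} - n\bar{y}_{i.}) = 0,
\end{aligned}
\qquad \text{since } \bar{y}_{i.} = y_{i.}/n .
```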
• SSTreatments: sum of squares of the differences between the treatment averages and the grand average (sum of squares due to treatments), with a – 1 degrees of freedom.
• SSE: sum of squares of the differences of the observations within treatments from the treatment average (sum of squares due to error), with N – a degrees of freedom.
SST = SSTreatments + SSE
• A large value of SSTreatments reflects large differences in treatment means
• A small value of SSTreatments likely indicates no differences in treatment
means
• dfTotal = dfTreatment + dfError
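• $SS_E = (n-1)S_1^2 + \cdots + (n-1)S_a^2$, so
$$\frac{SS_E}{N-a} = \frac{(n-1)S_1^2 + \cdots + (n-1)S_a^2}{(n-1) + \cdots + (n-1)}$$
is a pooled estimate of σ².
• If there are no differences between the a treatment means,
$$\frac{SS_{Treatments}}{a-1} = \frac{n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}{a-1}$$
also estimates σ².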
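• Mean squares:
$$MS_{Treatments} = \frac{SS_{Treatments}}{a-1}, \qquad MS_E = \frac{SS_E}{N-a}$$
$$E(MS_E) = \frac{1}{N-a}\, E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right] = \sigma^2$$
$$E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a} \tau_i^2}{a-1}$$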
3.3.2 Statistical Analysis
• Assumption: ξij are normally and independently distributed
with mean zero and variance σ2
• SSE/σ² ~ Chi-square (N – a). If H0 is true, SST/σ² ~ Chi-square (N – 1) and SSTreatments/σ² ~ Chi-square (a – 1), and SSE/σ² and SSTreatments/σ² are independent (Theorem 3.1)
• H0: τ1 = τ2 = … = τa = 0 vs. H1: τi ≠ 0, for at least one i
• Test statistic: F0 = MSTreatments / MSE, which follows the F distribution with a – 1 and N – a degrees of freedom if H0 is true
• Reject H0 if F0 > Fα, a-1, N-a
• Rewrite the sums of squares as computing formulas (see page 71):
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$$
$$SS_{Treatments} = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$$
$$SS_E = SS_T - SS_{Treatments}$$
Response: Strength
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

Source        Sum of Squares    DF    Mean Square    F Value    Prob > F
Model             475.76         4      118.94        14.76     < 0.0001
A                 475.76         4      118.94        14.76     < 0.0001
Pure Error        161.20        20        8.06
Cor Total         636.96        24

Std. Dev.     2.84      R-Squared         0.7469
Mean         15.04      Adj R-Squared     0.6963
C.V.         18.88      Pred R-Squared    0.6046
PRESS       251.88      Adeq Precision    9.294
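The sums of squares and F value in the ANOVA table can be reproduced from the computing formulas. The sketch below is only illustrative: the 25 observations are not printed on these slides, so the values in `COTTON` are an assumption (the tensile-strength data as recalled from the textbook). They do reproduce the table's totals, which is some reassurance. The function name `one_way_anova` is made up for this example.

```python
# Cotton tensile-strength data: ASSUMED (recalled from the textbook,
# not printed in these slides). Keys are cotton weight percent.
COTTON = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def one_way_anova(groups):
    """Sums of squares and F0 from the computing formulas.

    Dividing each squared treatment total by its own group size means the
    same code also covers unequal sample sizes.
    """
    a = len(groups)
    N = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / N          # y..^2 / N
    ss_total = sum(y * y for g in groups for y in g) - correction
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treat
    f0 = (ss_treat / (a - 1)) / (ss_error / (N - a))
    return {"SS_T": ss_total, "SS_Treatments": ss_treat,
            "SS_E": ss_error, "F0": f0}


result = one_way_anova(list(COTTON.values()))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Comparing F0 with a tabulated critical value F(α, a – 1, N – a) is left to a table or a statistics package; the standard library has no F distribution.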
3.3.3 Estimation of the Model Parameters
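• Model: yij = µ + τi + ξij
• Estimators:
$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}, \qquad \hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i.}$$
• Confidence intervals: since $\bar{y}_{i.} \sim N(\mu_i, \sigma^2/n)$,
$$\bar{y}_{i.} - t_{\alpha/2,\,N-a}\sqrt{\frac{MS_E}{n}} \le \mu_i \le \bar{y}_{i.} + t_{\alpha/2,\,N-a}\sqrt{\frac{MS_E}{n}}$$
$$\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}}$$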
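As a rough illustration of the two interval formulas, the sketch below computes a 95% interval for the mean at the 30% cotton level and for the difference between the 30% and 35% levels. The data values are an assumption (recalled from the textbook, not shown on these slides), and the critical value 2.086 is the tabulated t(0.025, 20), hard-coded because the standard library has no t distribution. Function names are hypothetical.

```python
import math

# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}
T_CRIT = 2.086  # t_{0.025, 20} read from a t table (N - a = 20)


def mean(xs):
    return sum(xs) / len(xs)


def mean_square_error(groups):
    """MS_E = SS_E / (N - a), with SS_E the within-treatment sum of squares."""
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_e / (sum(len(g) for g in groups) - len(groups))


def ci_treatment_mean(group, mse, t_crit):
    """Interval for mu_i: ybar_i. +/- t * sqrt(MS_E / n)."""
    half = t_crit * math.sqrt(mse / len(group))
    return mean(group) - half, mean(group) + half


def ci_mean_difference(g1, g2, mse, t_crit):
    """Interval for mu_i - mu_j; reduces to sqrt(2 MS_E / n) when n_i = n_j = n."""
    half = t_crit * math.sqrt(mse * (1 / len(g1) + 1 / len(g2)))
    diff = mean(g1) - mean(g2)
    return diff - half, diff + half


MSE = mean_square_error(list(COTTON.values()))
ci_30 = ci_treatment_mean(COTTON[30], MSE, T_CRIT)
ci_30_vs_35 = ci_mean_difference(COTTON[30], COTTON[35], MSE, T_CRIT)
print("mu_30:", ci_30)
print("mu_30 - mu_35:", ci_30_vs_35)
```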
• Example 3.3 (page 75)
• Simultaneous Confidence Intervals (Bonferroni method): to construct a set of r simultaneous confidence intervals on treatment means with an overall confidence level of at least 100(1 – α)%, build each individual interval at the 100(1 – α/r)% level.
3.3.4 Unbalanced Data
• Let ni observations be taken under treatment i, i = 1, 2, …, a, with N = Σi ni (for example, because some of the measured data are missing)
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N}$$
$$SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N}$$
• Two advantages of a balanced design (equal sample sizes):
1. The test statistic is relatively insensitive to small departures from the assumption of equal variances for the a treatments if the sample sizes are equal.
2. The power of the test is maximized if the samples are of
equal size.
3.4 Model Adequacy Checking
• Assumptions: yij ~ N(µ+τi, σ²)
• The examination of residuals
• Definition of residual:
$$e_{ij} = y_{ij} - \hat{y}_{ij}, \qquad \hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..}) = \bar{y}_{i.}$$
• The residuals should be structureless.
3.4.1 The Normality Assumption
• Plot a histogram of the residuals
• Plot a normal probability plot of the residuals
• See Table 3-6
• The error distribution may be
– Slightly skewed (right tail is longer than left tail)
– Light-tailed (the left tail of the error distribution is thinner than the corresponding tail of the standard normal)
• Outliers
• Possible causes of outliers: mistakes in calculations, data coding, or copying, …
• Sometimes outliers are more informative than the rest of
the data.
• Detect outliers: examine the standardized residuals,
$$d_{ij} = \frac{e_{ij}}{\sqrt{MS_E}}$$
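A small sketch of the residual calculation may help here. It computes e_ij = y_ij − ȳ_i. and the standardized residuals d_ij, then lists any observation with |d_ij| above 3. The threshold of 3 is a common rule of thumb assumed for this example, and the data are again an assumption (recalled from the textbook rather than taken from these slides).

```python
import math

# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def mean(xs):
    return sum(xs) / len(xs)


# Residual: observation minus its treatment average (the fitted value).
residuals = {level: [y - mean(ys) for y in ys] for level, ys in COTTON.items()}

# MS_E = SS_E / (N - a), where SS_E is the sum of squared residuals.
N = sum(len(ys) for ys in COTTON.values())
a = len(COTTON)
ms_e = sum(e * e for es in residuals.values() for e in es) / (N - a)

# Standardized residual: d_ij = e_ij / sqrt(MS_E).
standardized = {level: [e / math.sqrt(ms_e) for e in es]
                for level, es in residuals.items()}

# Flag potential outliers with an assumed rule-of-thumb cutoff of 3.
flagged = [(level, y, round(d, 2))
           for level, ys in COTTON.items()
           for y, d in zip(ys, standardized[level])
           if abs(d) > 3]
print("largest residual:", max(e for es in residuals.values() for e in es))
print("potential outliers:", flagged)
```

With these values the residuals range from about −3.8 to 5.2, which agrees with the axis range of the residual plots in this chapter.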
3.4.2 Plot of Residuals in Time Sequence
• Plotting the residuals in time order of data collection is
helpful in detecting correlation between the residuals.
• Independence assumption
[Figure: Residuals vs. Run. Residuals (vertical axis, about –3.8 to 5.2) plotted against run number (horizontal axis, 1 to 25).]
3.4.3 Plot of Residuals Versus Fitted Values
• Plot the residuals versus the fitted values
• The plot should be structureless
[Figure: Residuals vs. Predicted. Residuals (vertical axis, about –3.8 to 5.2) plotted against predicted values (horizontal axis, 9.80 to 21.60).]
• Nonconstant variance: the variance of the observations increases as the magnitude of the observations increases
• If the factor levels having the larger variance also have small sample sizes, the actual type I error rate is larger than anticipated.
• Variance-stabilizing transformations:
– Poisson data: square root transformation, $y^*_{ij} = \sqrt{y_{ij}}$
– Lognormal data: logarithmic transformation, $y^*_{ij} = \log y_{ij}$
– Binomial data: arcsine transformation, $y^*_{ij} = \arcsin\sqrt{y_{ij}}$
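• Statistical tests for equality of variance:
$$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_a^2 \quad \text{vs.} \quad H_1: \text{above not true for at least one } \sigma_i^2$$
– Bartlett's test:
$$\chi_0^2 = 2.3026\,\frac{q}{c}$$
$$q = (N-a)\log_{10} S_p^2 - \sum_{i=1}^{a} (n_i - 1)\log_{10} S_i^2$$
$$c = 1 + \frac{1}{3(a-1)}\left(\sum_{i=1}^{a} (n_i - 1)^{-1} - (N-a)^{-1}\right)$$
$$S_p^2 = \sum_{i=1}^{a} (n_i - 1) S_i^2 / (N-a)$$
– Reject the null hypothesis if $\chi_0^2 > \chi^2_{\alpha,\,a-1}$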
• Example 3.4: the test statistic is $\chi_0^2 = 0.93$ and $\chi^2_{0.05,\,4} = 9.49$, so the hypothesis of equal variances cannot be rejected
• Bartlett's test is sensitive to the normality assumption
• The modified Levene test:
– Use the absolute deviation of each observation from its treatment median:
$$d_{ij} = |y_{ij} - \tilde{y}_i|, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n_i$$
where $\tilde{y}_i$ is the median of the observations in treatment i
– If the mean deviations are equal, the variances of the observations in all treatments are the same.
– The test statistic for Levene's test is the usual ANOVA F statistic for testing equality of means, applied to the $d_{ij}$.
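Both variance tests are easy to sketch with the standard library. The example below is illustrative only: the data are an assumption (recalled from the textbook, not shown on these slides), and the function names are made up. On these values Bartlett's statistic comes out near 0.93, the figure quoted for Example 3.4, and the modified Levene F statistic is about 0.32, well below the tabulated F(0.05, 4, 20) = 2.87.

```python
import math
from statistics import mean, median, variance

# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = [
    [7, 7, 15, 11, 9],      # 15% cotton
    [12, 17, 12, 18, 18],   # 20%
    [14, 18, 18, 19, 19],   # 25%
    [19, 25, 22, 19, 23],   # 30%
    [7, 10, 11, 15, 11],    # 35%
]


def bartlett_chi2(groups):
    """Bartlett's statistic chi0^2 = 2.3026 * q / c (base-10 logs)."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    s2 = [variance(g) for g in groups]                      # sample variances S_i^2
    sp2 = sum((len(g) - 1) * v for g, v in zip(groups, s2)) / (N - a)
    q = (N - a) * math.log10(sp2) - sum(
        (len(g) - 1) * math.log10(v) for g, v in zip(groups, s2))
    c = 1 + (sum(1 / (len(g) - 1) for g in groups) - 1 / (N - a)) / (3 * (a - 1))
    return 2.3026 * q / c


def anova_f(groups):
    """One-way ANOVA F statistic, MS_Treatments / MS_E."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_treat / (a - 1)) / (ss_error / (N - a))


def modified_levene_f(groups):
    """ANOVA F statistic applied to d_ij = |y_ij - median of treatment i|."""
    deviations = [[abs(y - median(g)) for y in g] for g in groups]
    return anova_f(deviations)


chi2_0 = bartlett_chi2(COTTON)
levene_f0 = modified_levene_f(COTTON)
print(f"Bartlett chi0^2 = {chi2_0:.2f}, modified Levene F0 = {levene_f0:.2f}")
```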
• Example 3.5:
– Four methods of estimating flood flow frequency procedure (see
Table 3.7)
– ANOVA table (Table 3.8)
– The plot of residuals v.s. fitted values (Figure 3.7)
– Modified Levene’s test: F0 = 4.55 with P-value = 0.0137. Reject the
null hypothesis of equal variances.
Variance-Stabilizing Transformations
• Let E(y) = μ and $\sigma_y \propto \mu^{\alpha}$
• Find $y^* = y^{\lambda}$ that yields a constant variance.
• $\sigma_{y^*} \propto \mu^{\lambda + \alpha - 1}$, so choose λ = 1 – α

Relationship between σy and μ    α      λ = 1 – α    Transformation
σy ∝ constant                    0      1            No transformation
σy ∝ μ^(1/2)                     1/2    1/2          Square root
σy ∝ μ                           1      0            Log
σy ∝ μ^(3/2)                     3/2    –1/2         Reciprocal square root
σy ∝ μ^2                         2      –1           Reciprocal
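• How to find α:
• Since $\sigma_{y_i} = \theta\mu_i^{\alpha}$ for some constant θ, taking logs gives $\log\sigma_{y_i} = \log\theta + \alpha\log\mu_i$
• Use $S_i$ to estimate $\sigma_{y_i}$ and $\bar{y}_{i.}$ to estimate $\mu_i$; the slope of $\log S_i$ plotted against $\log\bar{y}_{i.}$ estimates α
• See Figure 3.8, Table 3.10 and Figure 3.9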
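The slope estimate of α can be sketched as a simple least-squares fit of log S_i on log ȳ_i. The treatment means and standard deviations below are hypothetical numbers constructed so that σ is exactly proportional to μ^(1/2). They are not the values from Table 3.10, and the function names are made up for this example.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


def estimate_alpha(treatment_means, treatment_sds):
    """Estimate alpha as the slope of log S_i against log ybar_i."""
    log_means = [math.log(m) for m in treatment_means]
    log_sds = [math.log(s) for s in treatment_sds]
    return least_squares_slope(log_means, log_sds)


# HYPOTHETICAL summary statistics with sigma = 2 * mu^0.5 built in.
means = [4.0, 9.0, 16.0, 25.0]
sds = [2 * math.sqrt(m) for m in means]

alpha_hat = estimate_alpha(means, sds)
lambda_hat = 1 - alpha_hat   # power of the transformation y* = y^lambda
print(f"alpha = {alpha_hat:.2f}, lambda = {lambda_hat:.2f}")
```

Here the fit recovers α = 0.5, so λ = 0.5 and the table above points to the square root transformation. With real data the slope is only an estimate, so it is usually rounded to the nearest row of the table.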
3.5 Practical Interpretation of Results
• Conduct the experiment => perform the statistical analysis =>
investigate the underlying assumptions => draw practical
conclusion
3.5.1 A Regression Model
• Qualitative factor: compare the difference between the levels
of the factors.
• Quantitative factor: develop an interpolation equation for the
response variable.
Regression analysis: See Figure 3.1
Final Equation in Terms of Actual Factors:
Strength = +62.61143 – 9.01143 * Cotton Weight % + 0.48143 * (Cotton Weight %)^2 – 7.60000E-003 * (Cotton Weight %)^3
This is an empirical model of the experimental results.
[Figure: Strength (vertical axis, 7 to 25) versus A: Cotton Weight % (horizontal axis, 15.00 to 35.00).]
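The fitted cubic can be used as an interpolation equation inside the tested range. As a sketch, the function below evaluates it with the coefficients quoted above and scans a 0.1% grid between 15% and 35% for the level with the highest predicted strength. The grid search and the names `predicted_strength` and `best_level` are choices made for this example, and the equation should not be trusted outside the range of the experiment.

```python
def predicted_strength(cotton_pct):
    """Empirical cubic model from the slide (cotton weight percent as input)."""
    x = cotton_pct
    return 62.61143 - 9.01143 * x + 0.48143 * x ** 2 - 7.6e-3 * x ** 3


# Predicted strength at the five tested levels.
for level in (15, 20, 25, 30, 35):
    print(level, round(predicted_strength(level), 2))

# Coarse grid search (step 0.1) for the best level within the tested range.
grid = [15 + 0.1 * k for k in range(201)]
best_level = max(grid, key=predicted_strength)
print("highest predicted strength near", round(best_level, 1), "% cotton")
```

The maximum lands near 28% cotton, which is one way to answer the earlier question about an optimum cotton content, subject to how well the cubic fits.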
3.5.2 Comparisons Among Treatment Means
• If the null hypothesis of equal treatment means is rejected, we still don't know which specific means are different
• Determining which specific means differ following an
ANOVA is called the multiple comparisons problem
3.5.3 Graphical Comparisons of Means
3.5.4 Contrast
• A contrast: a linear combination of the parameters of the form
$$\Gamma = \sum_{i=1}^{a} c_i\mu_i, \qquad \sum_{i=1}^{a} c_i = 0$$
• H0: Γ = 0 vs. H1: Γ ≠ 0
• Two methods for this test.
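The first method (here $y_{i.}$ is the ith treatment total):
$$\text{Let } C = \sum_{i=1}^{a} c_i y_{i.}. \quad \text{Then } Var(C) = n\sigma^2\sum_{i=1}^{a} c_i^2$$
$$\text{Under } H_0, \quad \frac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{n\sigma^2\sum_{i=1}^{a} c_i^2}} \sim N(0, 1)$$
$$\text{Hence the statistic } t_0 = \frac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{n\,MS_E\sum_{i=1}^{a} c_i^2}} \sim t_{N-a}$$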
• The second method:
$$F_0 = t_0^2 = \frac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{n\,MS_E\sum_{i=1}^{a} c_i^2} \sim F_{1,\,N-a}$$
$$F_0 = \frac{MS_C}{MS_E} = \frac{SS_C/1}{MS_E}, \qquad SS_C = \frac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{n\sum_{i=1}^{a} c_i^2}$$
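The C.I. for a contrast $\sum_{i=1}^{a} c_i\mu_i$:
$$\text{Let } C = \sum_{i=1}^{a} c_i\bar{y}_{i.}. \quad \text{Then } Var(C) = \frac{\sigma^2}{n}\sum_{i=1}^{a} c_i^2$$
$$\text{Hence the C.I. is } \sum_{i=1}^{a} c_i\bar{y}_{i.} \pm t_{\alpha/2,\,N-a}\sqrt{\frac{MS_E}{n}\sum_{i=1}^{a} c_i^2}$$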
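• Unequal sample sizes:
1. The contrast coefficients must satisfy $\sum_{i=1}^{a} n_i c_i = 0$
2. $t_0 = \dfrac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{MS_E\sum_{i=1}^{a} n_i c_i^2}}$
3. $SS_C = \dfrac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{\sum_{i=1}^{a} n_i c_i^2}$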
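To make the two test methods concrete, the sketch below tests a single contrast with equal sample sizes: the 30% cotton level against the 35% level, with coefficients (0, 0, 0, 1, –1). The contrast was chosen for illustration and is not taken from the slides. The data are an assumption (recalled from the textbook), MS_E = 8.06 is read from the ANOVA table, and `contrast_test` is a hypothetical helper.

```python
import math

# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = [
    [7, 7, 15, 11, 9],      # 15% cotton
    [12, 17, 12, 18, 18],   # 20%
    [14, 18, 18, 19, 19],   # 25%
    [19, 25, 22, 19, 23],   # 30%
    [7, 10, 11, 15, 11],    # 35%
]
MS_E = 8.06   # pure-error mean square from the ANOVA table
N_REP = 5     # replicates per treatment


def contrast_test(totals, coeffs, n, mse):
    """Return (SS_C, t0, F0) for a contrast in the treatment totals y_i."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    c_value = sum(c * t for c, t in zip(coeffs, totals))
    sum_c2 = sum(c * c for c in coeffs)
    ss_c = c_value ** 2 / (n * sum_c2)            # single degree of freedom
    t0 = c_value / math.sqrt(n * mse * sum_c2)    # first method
    f0 = ss_c / mse                               # second method, F0 = t0^2
    return ss_c, t0, f0


totals = [sum(g) for g in COTTON]
ss_c, t0, f0 = contrast_test(totals, [0, 0, 0, 1, -1], N_REP, MS_E)
print(f"SS_C = {ss_c:.2f}, t0 = {t0:.2f}, F0 = {f0:.2f}")
```

The two methods agree by construction, since F0 = t0²; the statistic is then compared with t(α/2, N – a) or F(α, 1, N – a) from tables.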
3.5.5 Orthogonal Contrast
• Two contrasts with coefficients {ci} and {di} are orthogonal if $\sum_{i=1}^{a} c_i d_i = 0$
• For a treatments, the set of a – 1 orthogonal contrasts partition
the sum of squares due to treatments into a – 1 independent
single-degree-of-freedom components. Thus, tests performed
on orthogonal contrasts are independent.
• See Example 3.6 (Page 94)
3.5.6 Scheffe’s Method for Comparing All Contrasts
• Scheffe (1953) proposed a method for comparing any and all
possible contrasts between treatment means.
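$$\text{Suppose } \Gamma_u = c_{1u}\mu_1 + \cdots + c_{au}\mu_a, \quad u = 1, 2, \ldots, m$$
$$C_u = \sum_{i=1}^{a} c_{iu}\bar{y}_{i.} \quad \text{and} \quad S_{C_u} = \sqrt{MS_E\sum_{i=1}^{a} (c_{iu}^2 / n_i)}$$
$$\text{The critical value: } S_{\alpha,u} = S_{C_u}\sqrt{(a-1)F_{\alpha,\,a-1,\,N-a}}$$
$$\text{If } |C_u| > S_{\alpha,u}, \text{ then reject } H_0: \Gamma_u = 0$$
• See Page 95 and 96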
3.5.7 Comparing Pairs of Treatment Means
• Compare all pairs of a treatment means
• Tukey’s Test:
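– The studentized range statistic:
$$q = \frac{\bar{y}_{max} - \bar{y}_{min}}{\sqrt{MS_E/n}}$$
where $\bar{y}_{max}$ and $\bar{y}_{min}$ are the largest and smallest sample means out of a group of p sample means
– Two means differ significantly if $|\bar{y}_{i.} - \bar{y}_{j.}|$ exceeds the critical point
$$T_{\alpha} = q_{\alpha}(a, f)\sqrt{\frac{MS_E}{n}}$$
where f is the number of degrees of freedom of $MS_E$
– For unequal sample sizes:
$$T_{\alpha} = \frac{q_{\alpha}(a, f)}{\sqrt{2}}\sqrt{MS_E\,(1/n_i + 1/n_j)}$$
– See Example 3.7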
• Sometimes the overall F test from the ANOVA is significant, but the pairwise comparisons of means fail to reveal any significant differences.
• The F test is simultaneously considering all possible contrasts
involving the treatment means, not just pairwise comparisons.
The Fisher Least Significant Difference (LSD) Method
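• For H0: μi = μj, the test statistic is
$$t_0 = \frac{\bar{y}_{i.} - \bar{y}_{j.}}{\sqrt{MS_E\,(1/n_i + 1/n_j)}}$$
• The least significant difference (LSD):
$$LSD = t_{\alpha/2,\,N-a}\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
• Declare μi and μj different if $|\bar{y}_{i.} - \bar{y}_{j.}| > LSD$
• See Example 3.8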
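Tukey's test and the LSD method both reduce to comparing every |ȳ_i. − ȳ_j.| with one threshold when the sample sizes are equal. The sketch below is illustrative only: the data are an assumption (recalled from the textbook, not shown on these slides), MS_E = 8.06 comes from the ANOVA table, and the two critical values q(0.05; 5, 20) = 4.23 and t(0.025, 20) = 2.086 are table look-ups hard-coded because the standard library does not provide these distributions.

```python
import math
from itertools import combinations

# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}
MS_E = 8.06      # pure-error mean square from the ANOVA table (20 df)
N_REP = 5        # replicates per treatment
Q_CRIT = 4.23    # q_{0.05}(5, 20), studentized range table value
T_CRIT = 2.086   # t_{0.025, 20}, t table value

means = {level: sum(ys) / len(ys) for level, ys in COTTON.items()}

tukey_T = Q_CRIT * math.sqrt(MS_E / N_REP)     # Tukey critical point
lsd = T_CRIT * math.sqrt(2 * MS_E / N_REP)     # Fisher LSD for equal n


def significant_pairs(treatment_means, threshold):
    """Pairs of levels whose sample means differ by more than threshold."""
    return [(i, j)
            for (i, mi), (j, mj) in combinations(treatment_means.items(), 2)
            if abs(mi - mj) > threshold]


tukey_pairs = significant_pairs(means, tukey_T)
lsd_pairs = significant_pairs(means, lsd)
print(f"Tukey T = {tukey_T:.2f}: {tukey_pairs}")
print(f"LSD     = {lsd:.2f}: {lsd_pairs}")
```

With these numbers the LSD threshold is smaller than Tukey's, so LSD flags more pairs. That is expected: LSD controls the error rate of each comparison, while Tukey's procedure controls it across all pairs.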
Duncan’s Multiple Range Test
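• The a treatment averages are arranged in ascending order, and the standard error of each average is determined as
$$S_{\bar{y}_{i.}} = \sqrt{\frac{MS_E}{n_h}}, \qquad n_h = \frac{a}{\sum_{i=1}^{a} (1/n_i)}$$
where $n_h$ is the harmonic mean of the sample sizes (equal to n when all sample sizes are equal)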
• Assuming equal sample sizes, the significant ranges are
$$R_p = r_{\alpha}(p, f)\,S_{\bar{y}_{i.}}, \quad p = 2, 3, \ldots, a$$
• Total a(a-1)/2 pairs
• Example 3.9
The Newman-Keuls Test
• Similar to Duncan's multiple range test
• The critical values:
$$K_p = q_{\alpha}(p, f)\,S_{\bar{y}_{i.}}$$
3.5.8 Comparing Treatment Means with a Control
• Assume one of the treatments is a control, and the analyst is
interested in comparing each of the other a – 1 treatment
means with the control.
• Test H0: μi = μa vs. H1: μi ≠ μa, i = 1, 2, …, a – 1 (treatment a is the control)
• Dunnett (1964)
• Compute $|\bar{y}_{i.} - \bar{y}_{a.}|$, i = 1, 2, …, a – 1
• Reject H0 if
$$|\bar{y}_{i.} - \bar{y}_{a.}| > d_{\alpha}(a-1, f)\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_a}\right)}$$
• Example 3.10
3.7 Determining Sample Size
• Determine the number of replicates to run
3.7.1 Operating Characteristic Curves (OC Curves)
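• OC curves: a plot of the type II error probability β of a statistical test, for a given sample size, against a parameter that reflects how far H0 is from being true:
$$\beta = 1 - P\{\text{Reject } H_0 \mid H_0 \text{ is false}\} = 1 - P(F_0 > F_{\alpha,\,a-1,\,N-a} \mid H_0 \text{ is false})$$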
• If H0 is false, then F0 = MSTreatments / MSE ~ noncentral F with a – 1 and N – a degrees of freedom and noncentrality parameter δ
• Chart V of the Appendix
• Determine
$$\Phi^2 = \frac{n\sum_{i=1}^{a} \tau_i^2}{a\sigma^2}$$
• Let μi be the specified treatment means. Then the estimates of τi are
$$\tau_i = \mu_i - \bar{\mu}, \qquad \bar{\mu} = \sum_{i=1}^{a} \mu_i / a$$
• For σ², use prior experience, a previous experiment, a preliminary test, or a judgment estimate.
• Example 3.11
• Difficulty: How to select a set of treatment means on which the
sample size decision should be based.
• Another approach: select a sample size such that if the difference between any two treatment means exceeds a specified value D, the null hypothesis should be rejected. The minimum value of Φ² is then
$$\Phi^2 = \frac{nD^2}{2a\sigma^2}$$
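Both versions of Φ² are simple to compute before turning to the OC curves. The sketch below uses made-up planning values (five treatment means, σ = 3, D = 10) purely for illustration; they are not taken from these slides. The code only produces Φ for each candidate n; the type II error probability still has to be read from Chart V.

```python
import math


def phi_squared(n, treatment_means, sigma):
    """Phi^2 = n * sum(tau_i^2) / (a * sigma^2), with tau_i = mu_i - mu_bar."""
    a = len(treatment_means)
    mu_bar = sum(treatment_means) / a
    taus = [m - mu_bar for m in treatment_means]
    return n * sum(t * t for t in taus) / (a * sigma ** 2)


def phi_squared_min(n, max_difference, a, sigma):
    """Minimum Phi^2 = n * D^2 / (2 * a * sigma^2) for a largest difference D."""
    return n * max_difference ** 2 / (2 * a * sigma ** 2)


# HYPOTHETICAL planning values, chosen only to illustrate the formulas.
planned_means = [11, 12, 15, 18, 19]
sigma = 3.0

for n in (4, 5, 6):
    phi = math.sqrt(phi_squared(n, planned_means, sigma))
    phi_min = math.sqrt(phi_squared_min(n, 10, len(planned_means), sigma))
    print(f"n = {n}: Phi = {phi:.2f}, Phi (D = 10) = {phi_min:.2f}")
```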
3.7.2 Specifying a Standard Deviation Increase
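• Let P be the percentage increase in the standard deviation of an observation that we want to detect. If the treatment means differ, the standard deviation of a randomly chosen observation is $\sqrt{\sigma^2 + \sum_{i=1}^{a}\tau_i^2/a}$ rather than σ, so
$$\frac{\sqrt{\sigma^2 + \sum_{i=1}^{a}\tau_i^2/a}}{\sigma} = 1 + 0.01P$$
and therefore
$$\Phi = \frac{\sqrt{\sum_{i=1}^{a}\tau_i^2/a}}{\sigma/\sqrt{n}} = \sqrt{(1 + 0.01P)^2 - 1}\,\sqrt{n}$$
• For example (Page 110): If P = 20, then
$$\Phi = \sqrt{(1.2)^2 - 1}\,\sqrt{n} = 0.66\sqrt{n}$$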
3.7.3 Confidence Interval Estimation Method
• Use the confidence interval on the difference between two treatment means:
$$\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}} \le \mu_i - \mu_j \le \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2,\,N-a}\sqrt{\frac{2MS_E}{n}}$$
• For example: we want the 95% C.I. on the difference in mean tensile strength for any two cotton weight percentages to be ±5 psi, with σ = 3. See Page 110.
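The ±5 psi example can be worked by trial: for each candidate n, compute the half-width t(α/2, N – a)·sqrt(2σ²/n), using σ² = 9 as the planning value for MS_E. A rough sketch follows; the t values are table look-ups for a = 5 treatments, hard-coded because the standard library has no t distribution, and the function name is made up.

```python
import math

A = 5          # number of treatments (cotton levels)
SIGMA = 3.0    # planning value for the standard deviation
TARGET = 5.0   # desired half-width in psi

# t_{0.025, df} from a t table, for df = a(n - 1) with n = 3, 4, 5.
T_TABLE = {10: 2.228, 15: 2.131, 20: 2.086}


def half_width(n):
    """Half-width of the 95% C.I. on mu_i - mu_j with n replicates."""
    df = A * (n - 1)
    return T_TABLE[df] * math.sqrt(2 * SIGMA ** 2 / n)


widths = {n: half_width(n) for n in (3, 4, 5)}
smallest_n = min(n for n, w in widths.items() if w <= TARGET)

for n, w in widths.items():
    print(f"n = {n}: half-width = {w:.2f}")
print("smallest n meeting the target:", smallest_n)
```

Under these assumptions n = 3 gives a half-width above 5, and n = 4 is the smallest sample size that meets the target.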
3.9 The Regression Approach to the Analysis of Variance
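Model: yij = μ + τi + ξij
Least squares function:
$$L = \sum_{i=1}^{a}\sum_{j=1}^{n} \xi_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$$
Setting the partial derivatives to zero,
$$\frac{\partial L}{\partial \mu} = 0, \qquad \frac{\partial L}{\partial \tau_i} = 0, \quad i = 1, 2, \ldots, a$$
gives
$$-2\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0 \quad \text{and} \quad -2\sum_{j=1}^{n} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0, \quad i = 1, 2, \ldots, a$$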
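• The normal equations:
$$N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_a = y_{..}$$
$$n\hat{\mu} + n\hat{\tau}_1 = y_{1.}$$
$$n\hat{\mu} + n\hat{\tau}_2 = y_{2.}$$
$$\vdots$$
$$n\hat{\mu} + n\hat{\tau}_a = y_{a.}$$
• Apply the constraint $\sum_{i=1}^{a} \hat{\tau}_i = 0$. Then the estimators are
$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$$
• Regression sum of squares (the reduction due to fitting the full model):
$$R(\mu, \tau) = \hat{\mu}\,y_{..} + \sum_{i=1}^{a} \hat{\tau}_i y_{i.} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n}$$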
The error sum of squares:
$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)$$
Find the sum of squares resulting from the treatment effects:
$$R(\tau \mid \mu) = R(\mu, \tau) - R(\mu) = R(\text{Full Model}) - R(\text{Reduced Model}) = \sum_{i=1}^{a} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$$
• The test statistic for H0: τ1 = … = τa is
$$F_0 = \frac{R(\tau \mid \mu)/(a-1)}{\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)\right]/(N-a)} \sim F_{a-1,\,N-a}$$
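The regression approach gives the same numbers as the usual ANOVA, and a short sketch can check this. The data are again an assumption (recalled from the textbook rather than printed on these slides), and the variable names are made up. The code solves the normal equations under the constraint Σ τ̂_i = 0, forms R(μ, τ) in both of its equivalent forms, and then computes R(τ | μ) and F0.

```python
# ASSUMED data (recalled from the textbook; not printed in these slides).
COTTON = [
    [7, 7, 15, 11, 9],      # 15% cotton
    [12, 17, 12, 18, 18],   # 20%
    [14, 18, 18, 19, 19],   # 25%
    [19, 25, 22, 19, 23],   # 30%
    [7, 10, 11, 15, 11],    # 35%
]

a = len(COTTON)
n = len(COTTON[0])                       # balanced design: n replicates each
N = a * n
totals = [sum(g) for g in COTTON]        # y_i.
grand_total = sum(totals)                # y..

# Solution of the normal equations under the constraint sum(tau_hat) = 0.
mu_hat = grand_total / N
tau_hat = [t / n - mu_hat for t in totals]

# R(mu, tau): reduction in sum of squares from fitting the full model.
r_full = mu_hat * grand_total + sum(th * t for th, t in zip(tau_hat, totals))
r_full_direct = sum(t * t / n for t in totals)

# R(mu): reduction from fitting the reduced model y_ij = mu + error.
r_reduced = grand_total ** 2 / N

r_tau_given_mu = r_full - r_reduced                       # treatment SS
ss_error = sum(y * y for g in COTTON for y in g) - r_full
f0 = (r_tau_given_mu / (a - 1)) / (ss_error / (N - a))

print(f"R(mu,tau) = {r_full:.2f}, R(tau|mu) = {r_tau_given_mu:.2f}, "
      f"SS_E = {ss_error:.2f}, F0 = {f0:.2f}")
```

The values of R(τ | μ), SS_E and F0 match the treatment sum of squares, error sum of squares and F value in the ANOVA table.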