USING SAS SOFTWARE IN THE DESIGN AND ANALYSIS OF
TWO-LEVEL FRACTIONAL FACTORIAL EXPERIMENTS
Joanne R. Wendelberger, General Motors Research Laboratories
current, and time. In the analysis phase of the experiment, regression
analysis will be used to estimate the effects of varying each variable
from its low to its high level. The -'s and +'s will become -1's and
+1's for the experimental variable settings, recorded in columns, and
these columns will then be used as independent variables for regression
modeling of the dependent variable yield.
A full factorial design for this experiment requires 2^4 or 16 runs
in order to obtain every possible combination. The resulting design
matrix is given in Table II. Note that the pattern of the plus and
minus signs can be easily generated in an orderly manner. The first
variable column has alternating plus and minus signs, the second
variable column has alternating pairs of plus and minus signs, and
so on.
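The orderly sign pattern can also be sketched outside SAS. The following is a minimal Python illustration, not the paper's code: the function name full_factorial is hypothetical, and -1/+1 stand in for the minus and plus signs.

```python
from itertools import product

def full_factorial(k):
    """Return the 2^k two-level design in standard order as rows of -1/+1."""
    # itertools.product varies the LAST position fastest, so each tuple is
    # reversed to make the FIRST column alternate fastest, as in the text.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

design = full_factorial(4)
for row in design:
    print(" ".join("+" if x > 0 else "-" for x in row))
```

The first column alternates singly, the second in pairs, the third in fours, and the fourth in eights.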
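Table II
DESIGN MATRIX AND EXPERIMENTAL SETTINGS
FOR A 2^4 FACTORIAL EXPERIMENT
A  B  C  D    TEMPERATURE  PRESSURE  CURRENT  TIME
-  -  -  -        100         20       on      30
+  -  -  -        120         20       on      30
-  +  -  -        100         30       on      30
+  +  -  -        120         30       on      30
-  -  +  -        100         20       off     30
+  -  +  -        120         20       off     30
-  +  +  -        100         30       off     30
+  +  +  -        120         30       off     30
-  -  -  +        100         20       on      35
+  -  -  +        120         20       on      35
-  +  -  +        100         30       on      35
+  +  -  +        120         30       on      35
-  -  +  +        100         20       off     35
+  -  +  +        120         20       off     35
-  +  +  +        100         30       off     35
+  +  +  +        120         30       off     35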
OVERVIEW
The design and analysis of fractional factorial experiments will
be described. Use of SAS software for the design and analysis of
fractional factorial experiments will be discussed and illustrated with
examples from the author's experience.
INTRODUCTION
Scientific studies often investigate the effect of changing the
settings of one or more variables on some measured quantity of
interest. Experimental design deals with the selection of settings
of the input variables at which the output or response variable is
measured. A class of designs called the 2^(k-p) fractional factorials is
very cost effective, especially in the initial variable screening process.
These designs have been discussed in several statistical publications
including Box and Hunter (1961), Box, Hunter, and Hunter (1978),
Daniel (1976), and Davies (1978). Although these designs are not
new, they are greatly underutilized.
Use of these fractional factorial designs allows efficient use
of resources for obtaining information. Instead of requiring
measurements on every possible combination of experimental settings,
the fractional factorial allows the experimenter to conduct a reduced set
of experiments, while still retaining the ability to obtain information
about the effects of changing settings of the experimental variables.
The structure of these designs results in uncorrelated estimates
of effects, a desirable situation which is hard to attain when
measurements are collected without controlling input variables.
In addition to main effects, interaction effects of combinations of
variables may be of interest. An interaction occurs when the effect of a
variable depends on levels of one or more other variables. Interaction
effects of two or more variables can be estimated using products of
the columns associated with the experimental variables. For example,
the interaction AB between the variables A and B is associated with
the column obtained by multiplying the A and B columns together.
The interaction columns will be used in the regression analysis as
additional independent variables for modeling the dependent variable
yield.
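As an illustration of how an interaction column is formed, the sketch below (Python, not the paper's SAS; the helper name dot is hypothetical) multiplies the A and B columns of the 2^4 design elementwise and checks that the resulting columns are mutually orthogonal, which is what makes the effect estimates uncorrelated.

```python
from itertools import product

# 2^4 design in standard order; columns A, B, C, D coded -1/+1.
runs = [tuple(reversed(r)) for r in product((-1, 1), repeat=4)]
A = [r[0] for r in runs]
B = [r[1] for r in runs]

# The AB interaction column is the elementwise product of the A and B columns.
AB = [a * b for a, b in zip(A, B)]

def dot(u, v):
    """Inner product of two columns; zero means the columns are orthogonal."""
    return sum(x * y for x, y in zip(u, v))

print(AB[:4])
print(dot(A, B), dot(A, AB), dot(B, AB))
```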
With this design, it is possible to obtain uncorrelated estimates
of the main effects A, B, C, D, the two-factor interactions AB, AC,
AD, BC, BD, CD, the three-factor interactions ABC, ABD, ACD,
BCD, and the four-factor interaction ABCD, plus the mean response.
Note that the number of parameters which can be estimated is limited
by the total number of runs, in this case 16. The complete design
matrix containing all the main effects and interactions is given in
Table III.
In this example, k = 4, so the matrix has k = 4 columns to
represent the 4 variables and 2^k = 2^4 = 16 rows which contain the
16 possible combinations of the low and high settings for the four
experimental variables.
Once the data collection phase of the experiment is completed,
the column for a given variable, replaced with plus and minus 1's, can
be used to obtain an estimate of that variable's main effect. The main
effect of an input variable measures the difference between the average
values of the response variable when measured at the input variable's high
and low settings.
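A main effect computed by simple averaging can be sketched as follows. This is a hedged Python illustration rather than the paper's SAS; the function name main_effect and the four response values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def main_effect(column, response):
    """Average response at the high (+1) level minus average at the low (-1) level."""
    high = [y for x, y in zip(column, response) if x == 1]
    low = [y for x, y in zip(column, response) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

# A small 2^2 design with made-up responses (not the chemical yield data).
A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
y = [60.0, 72.0, 54.0, 68.0]

print(main_effect(A, y))  # (72 + 68)/2 - (60 + 54)/2
print(main_effect(B, y))  # (54 + 68)/2 - (60 + 72)/2
```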
Table I
EXPERIMENTAL SETTINGS FOR A 4 VARIABLE EXPERIMENT
VARIABLE               LOW    HIGH
Temperature (°C)       100    120
Pressure (lbs/in^2)     20     30
Current                 On    Off
Time (seconds)          30     35
In a two-level factorial design each variable has 2 settings, a low
value and a high value. A full two-level factorial with k variables
requires 2^k runs. To illustrate, consider the example of a 2^4 factorial
experiment: suppose an experimenter wants to study the effect of
the variables temperature, pressure, current and time on the yield, in
grams, of a chemical process. Each of the four input variables has a
high and a low level as shown in Table I.
In a factorial design experiment, factors or variables are selected
which are thought to have some effect on a measured quantity of
interest known as the response or dependent variable. Specific fixed
levels are chosen for each of the variables, and the response is
measured for different combinations of the variable settings. In a full
factorial, the response is measured at every possible combination of
the variable settings. Often, the full factorial requires a large number
of experiments or runs. For example, if X has 2 levels, Y has 3 levels,
and Z has 5 levels, the full factorial would require 2 · 3 · 5 = 30 runs.
When there are many variables, or if variables have many levels, the
number of runs required for a full factorial may be too large to be
practical.
FACTORIAL DESIGNS
In this paper, the design and analysis of 2^(k-p) fractional factorial
experiments will be reviewed. To illustrate the structure of fractional
factorial designs and the subsequent analysis of data obtained from
such designs, an example will be discussed in detail, including use of
SAS. Additional applications from the author's experience will also
be provided.
FRACTIONAL FACTORIAL DESIGNS
A class of two-level factorial designs which allows reduction of the
number of runs while still allowing estimation of important effects is
the family of two-level fractional factorial designs. The construction
of two-level fractional factorial designs is based on the structure of
two-level full factorial designs.
Letting the minus sign (-) represent the low level and the plus sign
(+) the high level for each variable, the design may be represented
by a matrix of - 's and + 's indicating the levels of the four input
variables for each experimental run. The letters A, B, C, and D, will
be used to represent the experimental variables temperature, pressure,
In practice, interactions involving three or more factors are often
negligible. We can take advantage of the fact that some of the effects
estimable in the full factorial are negligible to reduce the number
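Table III
COMPLETE DESIGN MATRIX FOR A 2^4 FACTORIAL EXPERIMENT
A    B    C    D    AB   AC   AD   BC   BD   CD   ABC  ABD  ACD  BCD  ABCD
-    -    -    -    +    +    +    +    +    +    -    -    -    -    +
+    -    -    -    -    -    -    +    +    +    +    +    +    -    -
-    +    -    -    -    +    +    -    -    +    +    +    -    +    -
+    +    -    -    +    -    -    -    -    +    -    -    +    +    +
-    -    +    -    +    -    +    -    +    -    +    -    +    +    -
+    -    +    -    -    +    -    -    +    -    -    +    -    +    +
-    +    +    -    -    -    +    +    -    -    -    +    +    -    +
+    +    +    -    +    +    -    +    -    -    +    -    -    -    -
-    -    -    +    +    +    -    +    -    -    -    +    +    +    -
+    -    -    +    -    -    +    +    -    -    +    -    -    +    +
-    +    -    +    -    +    -    -    +    -    +    -    +    -    +
+    +    -    +    +    -    +    -    +    -    -    +    -    -    -
-    -    +    +    +    -    -    -    -    +    +    +    -    -    +
+    -    +    +    -    +    +    -    -    +    -    -    +    -    -
-    +    +    +    -    -    -    +    +    +    -    -    -    +    -
+    +    +    +    +    +    +    +    +    +    +    +    +    +    +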
Table V
ALIASING PATTERN FOR A 2^(5-1) FRACTIONAL FACTORIAL DESIGN
I  = ABCDE
A  = BCDE
B  = ACDE
C  = ABDE
D  = ABCE
E  = ABCD
AB = CDE
AC = BDE
AD = BCE
AE = BCD
BC = ADE
BD = ACE
BE = ACD
CD = ABE
CE = ABD
DE = ABC
of runs required for the experiment. In the example given above,
assume that the 3-way and 4-way interactions are negligible. This
information can then be used either to study additional variables
without increasing the number of runs or to reduce the number of
runs for the present study.
Suppose a fifth variable E is added to the 4 variable study
described above. For example, suppose two different catalysts are
being studied. Catalyst 1 can be designated as the low level and
Catalyst 2 as the high level. A full factorial with 5 variables requires
2^5 = 32 runs. Instead of adding a fifth column, which would require
twice as many runs, the new fifth variable is assigned to the column
for the ABCD interaction to obtain its settings. Interaction effects
involving the new variable E are obtained as before by multiplying
columns of the desired effects. Instead of representing just one effect,
each column now may represent multiple effects. These effects are said
to be aliased or confounded. Table IV gives the new design matrix,
experimental settings, and sample fictitious data for the response
variable yield.
relations or generators. A table in Box, Hunter, and Hunter (1978)
gives generators for several 2^(k-p) designs. A U. S. Department of
Commerce Publication (1957) gives defining relationships for several
designs, from which generators can be obtained.
Aliasing patterns are also obtained from the generators. The
algebra used to multiply words together is as follows. If the total
number of times a letter appears in the words is odd, then it appears
in the resulting product. If the total is even, the letter does not appear
in the product. For example, if ABCD and CDE are multiplied using
this rule, the result is ABE since A, B, and E each appear once, and
C and D each appear twice. Any letter times itself is equal to the
identity I. For example AB · AB = A^2 B^2 = I · I = I. This algebra
can be used to generate the entire aliasing pattern and is described
in detail in Box, Hunter, and Hunter (1978).
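The multiplication rule for words amounts to keeping the letters that occur an odd number of times, which is a set symmetric difference. A minimal Python sketch follows (not from the paper; the function names letters and multiply are hypothetical, and each word is assumed to contain no repeated letters).

```python
def letters(word):
    """Set of factor letters in a word; the identity I contributes none."""
    return set() if word == "I" else set(word)

def multiply(word1, word2):
    """Multiply two effect words: letters appearing an odd number of times survive."""
    product = letters(word1) ^ letters(word2)  # symmetric difference
    return "".join(sorted(product)) or "I"

print(multiply("ABCD", "CDE"))  # C and D cancel
print(multiply("AB", "AB"))     # everything cancels, leaving the identity
```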
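Table IV
DESIGN MATRIX AND EXPERIMENTAL SETTINGS
FOR A 2^(5-1) FRACTIONAL FACTORIAL EXPERIMENT
A  B  C  D  E    TEMP  PRESS  CUR  TIME  CAT   YLD
-  -  -  -  +     100     20  on     30    2  52.4
+  -  -  -  -     120     20  on     30    1  75.4
-  +  -  -  -     100     30  on     30    1  68.9
+  +  -  -  +     120     30  on     30    2  67.7
-  -  +  -  -     100     20  off    30    1  58.8
+  -  +  -  +     120     20  off    30    2  69.3
-  +  +  -  +     100     30  off    30    2  62.9
+  +  +  -  -     120     30  off    30    1  71.6
-  -  -  +  -     100     20  on     35    1  59.6
+  -  -  +  +     120     20  on     35    2  69.1
-  +  -  +  +     100     30  on     35    2  63.0
+  +  -  +  -     120     30  on     35    1  72.5
-  -  +  +  +     100     20  off    35    2  53.1
+  -  +  +  -     120     20  off    35    1  75.8
-  +  +  +  -     100     30  off    35    1  69.9
+  +  +  +  +     120     30  off    35    2  66.9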
Given the generators of a specific design, the design matrix can
be easily generated with SAS software using DATA steps. In the SAS
program steps +1's and -1's are used instead of plus and minus signs.
First, the design matrix of a full factorial is created for the first k - p
variables either by direct input, or by programming statements as
shown in the following DATA step for the 2^(5-1) example defined by
I = ABCDE. Note the pattern forming the argument for the INT
function which holds for any number of variables.
DATA FULLFACT;
* ONE ROW PER RUN OF THE 2**4 FULL FACTORIAL, IN STANDARD ORDER;
DO I=1 TO 16;
A=(-1)**I;
B=(-1)**(INT((I+1)/2));
C=(-1)**(INT((3+I)/4));
D=(-1)**(INT((7+I)/8));
OUTPUT;
END;
The generator relationships are then used to obtain one or more
columns for additional variables by multiplying together columns of
the full factorial in assignment statements.
This design is called a half-fraction of the 32 run full factorial
design. Since it has 5 variables, with one of them defined in terms
of the other 4, it is a 2^(5-1) fractional factorial design. Note that the
number of runs is equal to 2^(5-1) = 2^4 = 16, half the number required
for a full 5-variable factorial.
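The construction of the half-fraction can be sketched in a few lines of Python (an illustration only, not the paper's SAS): the fifth column is the product of the first four, and coding the high level of E as Catalyst 2 gives the catalyst assignment for each of the 16 runs. The variable names are hypothetical.

```python
from itertools import product

# 2^4 full factorial in standard order for A, B, C, D.
base = [tuple(reversed(r)) for r in product((-1, 1), repeat=4)]

# Generator E = ABCD: the fifth column is the product of the first four.
half_fraction = [(a, b, c, d, a * b * c * d) for a, b, c, d in base]

# Low level of E is Catalyst 1, high level is Catalyst 2.
catalyst = [2 if run[4] == 1 else 1 for run in half_fraction]

print(len(half_fraction))
print(catalyst)
```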
DATA FRACTION;
SET FULLFACT;
E=A*B*C*D;
Table V shows the complete aliasing pattern for the 2^(5-1) design
resulting from setting E = ABCD. I is used to represent the identity
element and consists of a column of 16 plus signs. The I column will
be used to estimate the response average in the regression analysis of
the data.
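The aliasing pattern for this design can be reproduced mechanically by multiplying each effect by the defining word ABCDE. The Python sketch below is illustrative only; the names letters, multiply, and aliases are hypothetical, and the word algebra is implemented as a set symmetric difference.

```python
from itertools import combinations

def letters(word):
    """Set of factor letters in a word; the identity I contributes none."""
    return set() if word == "I" else set(word)

def multiply(word1, word2):
    """Letters appearing an odd number of times survive the product."""
    return "".join(sorted(letters(word1) ^ letters(word2))) or "I"

DEFINING_WORD = "ABCDE"  # from the defining relation I = ABCDE

# Identity, main effects, and two-factor interactions of the five variables.
effects = ["I"] + ["".join(c) for n in (1, 2) for c in combinations("ABCDE", n)]
aliases = {effect: multiply(effect, DEFINING_WORD) for effect in effects}

for effect, alias in aliases.items():
    print(f"{effect:>2} = {alias}")
```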
Columns corresponding to the interactions are also generated by
multiplication in assignment statements. Note that only one column
is generated for each pair of aliased effects. For example, since
AC = BDE, only the AC column is generated.
Note that each main effect and 2-factor interaction is aliased
with a 3 or 4-way interaction which was assumed to be negligible.
Thus, ignoring effects expected to be unimportant, we can still obtain
estimates of all main effects and 2-way interactions.
DATA DESIGN;
SET FRACTION;
AB=A*B;
AC=A*C;
AD=A*D;
AE=A*E;
BC=B*C;
BD=B*D;
BE=B*E;
CD=C*D;
CE=C*E;
DE=D*E;
GENERATING DESIGN MATRICES
The portion of the aliasing pattern involving I, the effect
associated with the mean, is called the defining relation. Depending
on the degree of fractionation, the defining relation contains one or
more effects or words. The words in the defining relation, or some
subset of these words, can be used to generate the entire aliasing
pattern. The words in such a set are known as generators.
Publications listing experimental designs may include defining
response which are collected from repeated runs of the experiment at
the same experimental settings. If replicate measurements are not
available and some alias sets can be assumed negligible, then the
residual mean square error of the regression model can be used to
compute an alternate error estimate. Further details on obtaining
error estimates may be found in Box, Hunter, and Hunter (1978).
The total number of variables for the complete design should be at
most 2^(k-p) - 1.
RANDOMIZATION
The order in which experimental runs are done should be random
to distribute the effects of unknown, hidden variables. Randomization
can be done in SAS using the PLAN procedure, or for more complex
situations using DATA steps, generating random numbers, and then
using RANK and SORT procedures. For the 2^(5-1) chemical yield
experiment example, a completely randomized run order can be
obtained using PROC PLAN. Output from the procedure consists
of a list of the numbers from 1 to 16 in random order.
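For readers without SAS at hand, a completely randomized run order can be sketched with Python's standard library. This is not PROC PLAN; the function name random_run_order and the seed value are hypothetical, and the seed is fixed only so that the order can be reproduced.

```python
import random

def random_run_order(n_runs, seed=None):
    """Return the run numbers 1..n_runs in a random order."""
    rng = random.Random(seed)          # private generator, reproducible if seeded
    order = list(range(1, n_runs + 1))
    rng.shuffle(order)
    return order

order = random_run_order(16, seed=1987)
print(order)
```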
Once an estimate of the standard error for a coefficient has been
obtained, one can assess whether individual coefficients are important.
To determine the significance of a particular coefficient, the ratio
of the estimated coefficient to its standard error is compared to
the percentage points of Student's t-distribution. Traditionally, this
computation has been done using tables. With SAS software, the
probability value for a given value of t and a specified number of
degrees of freedom may be obtained with the PROBT function. Use
of the PROBT function for the two-sided t-test appropriate for testing
whether an effect is significantly different from zero is documented in
SAS User's Guide: Basics under SAS functions.
PROC PLAN;
TITLE 'COMPLETELY RANDOMIZED RUN ORDER';
FACTORS RUN=16;
Suppose that for the chemical yield data, each data value was
actually the average of two measurements, Y1 and Y2, obtained in
separate experimental runs and that the resulting standard error of
each individual coefficient has been determined to be .91. Then to
determine the significance of the A effect, consider the test statistic
Sample Run Order: 13,4,9,3,14,7,15,5,16,6,12,8,11,2,1,10
ESTIMATION OF EFFECTS
Estimation of the effects of variables and their interactions may
be done by simple averaging techniques, the Yates algorithm, or
regression. Averaging methods for small studies using the simple
geometrical structure of factorials shed light on the interpretation of
what is meant by a main effect or an interaction. The Yates algorithm
provides a shortcut useful for hand calculations. However, for routine
calculations the regression technique is most convenient, particularly
for large studies. The response variable is modeled as a linear function
of the variables and their interactions. The regression coefficients,
which can be obtained from the SAS regression procedure PROC
REG, will correspond to the effects on the response of varying the
settings of input variables from low to high values. However, with
the -1 and +1 scaling, regression estimates will be equal to one half
times the value of the effect of changing from the low setting to the
high setting. Thus, to obtain values which can be interpreted directly
as effects, the coefficients of the input variables must be multiplied
by two. The intercept term corresponds to the average response and
does not require doubling.
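The factor of two can be checked on a small case. In the Python sketch below (illustrative only, with hypothetical names and made-up responses), the columns are orthogonal -1/+1 columns and every column is in the model, so each least-squares coefficient reduces to sum(x*y)/n; doubling it gives the effect, while the intercept is simply the average response.

```python
def coefficient(column, response):
    """Least-squares coefficient for one column of an orthogonal -1/+1 design."""
    # With orthogonal columns, X'X = n*I, so each coefficient is sum(x*y)/n.
    return sum(x * y for x, y in zip(column, response)) / len(response)

A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
AB = [a * b for a, b in zip(A, B)]
y = [60.0, 72.0, 54.0, 68.0]  # hypothetical responses, not the paper's data

intercept = coefficient([1] * len(y), y)        # average response, not doubled
effects = {name: 2 * coefficient(col, y)        # doubled coefficient = effect
           for name, col in (("A", A), ("B", B), ("AB", AB))}

print(intercept, effects)
```

The doubled coefficient for A equals the difference between the average response at the high and low levels of A.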
t = A / .91 = 9.96 / .91 = 10.9.
The value t = 10.9 is compared to a t-distribution with 16 degrees
of freedom. Using the SAS PROBT function, the probability of
obtaining a value as large in magnitude as 10.9, if A has no effect,
is found to be less than .001. Since a value which occurs less than
one time in a thousand seems highly unlikely, one concludes that the
variable A does have a significant effect. Similarly, the effects of B,
AB, and E are found to be significant. The remaining effects had
t-ratios which were not large enough to be considered unlikely and
hence were not considered significant.
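The two-sided probability can be approximated without tables. The Python sketch below is an assumption-laden stand-in for the PROBT computation, not the SAS function itself: it integrates the Student's t density numerically with Simpson's rule, and the function names t_pdf and two_sided_p are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

t = 9.96 / 0.91                        # ratio of the A estimate to its standard error
p = two_sided_p(t, 16)
print(round(t, 1), p < 0.001)
```

As a sanity check, a t value near 2.12 with 16 degrees of freedom should give a two-sided probability of about .05.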
Normal probability plots may be useful in displaying the
importance of different effects. In a normal probability plot, data
values are plotted against their normal scores, a transformed version
of the data. If the data behaves like a random sample from a normal
distribution, the points should lie along a reference line which is
generated using the mean and standard deviation of the data. If
a set of effects is plotted, and some of the effects are important, then
the data will not follow the reference line. Instead, the points for
the nonsignificant effects usually lie along some different line, and the
points of the significant effects will be offset. Removal of significant
effects and replotting of the remaining effects should result in the
expected linear pattern.
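The normal scores and reference line can be sketched as follows. This Python illustration assumes Blom's approximation, score = inverse normal CDF of (rank - 3/8)/(n + 1/4), ignores ties, and uses a hypothetical list of effect values rather than the paper's estimates; the function name blom_scores is also hypothetical.

```python
from statistics import NormalDist, mean, stdev

def blom_scores(values):
    """Blom normal scores, returned in the same order as the input values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by increasing value
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf((rank - 0.375) / (n + 0.25))
    return scores

effects = [9.9, 3.5, -0.1, 0.4, -6.0, -6.5, 0.2, -0.3]  # hypothetical values
scores = blom_scores(effects)

# Reference line: mean + normal score * standard deviation of the plotted values.
m, s = mean(effects), stdev(effects)
for e, z in sorted(zip(effects, scores)):
    print(f"{e:6.2f} {z:6.2f} {m + z * s:6.2f}")
```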
For the above example, regression analysis can be used to obtain
estimates of effects, regression diagnostics, residuals and predicted
values for further analysis. Given a data set containing the complete
design matrix for the above example and the measured values for
the response variable, the estimation of regression coefficients can
be accomplished with the statements below.
Estimates of the
average and variable effects obtained from the intercept and doubled
regression coefficients are given in Table VI.
Normal probability plots may be generated using the RANK,
MEANS and GPLOT (or PLOT) procedures. In a SAS Institute
Technical Report (1983), Chilko discusses probability plotting with
SAS software. Half-normal probability plots, discussed by Daniel
(1959), may also be useful.
PROC REG;
MODEL Y=A B C D E AB AC AD AE BC BD BE CD CE DE;
Figure 1 contains a normal probability plot of all of the effects
of the model estimated above for the chemical yield study. Most of
the effects lie along a line different from the reference line. The A,
B, AB, and E effects are offset from this line indicating that they
are significant. Figure 2 contains a second plot of the effects after the
significant effects are removed. The points lie along the reference
line as expected. SAS program statements to generate a normal
probability plot of the effects and a reference line from a dataset
EFFECTS containing the effect values in the variable EFFECT are
as follows.
Table VI
ESTIMATES OF EFFECTS FOR 2^(5-1) FRACTIONAL FACTORIAL
EFFECT    VALUE
Average   66.06
A          9.96
B          3.86
C          -.04
D           .48
E         -6.02
AB        -6.60
AC         -.24
AD         -.16
AE          .44
BC         -.04
BD         -.06
BE          .26
CD          .24
CE          .06
DE         -.26
PROC RANK DATA=EFFECTS NORMAL=BLOM
OUT=RANKS;
VAR EFFECT;
RANKS EFFRANK;
PROC MEANS;
VAR EFFECT;
OUTPUT OUT=SUMMARY MEAN=MEAN STD=STD;
DATA LINE;
SET SUMMARY;
LOOP: SET RANKS;
LINE=MEAN+EFFRANK*STD;
OUTPUT;
GOTO LOOP;
PROC GPLOT;
EXAMINATION OF EFFECTS
Individual coefficients for each of the effects must be examined
relative to the natural random variation inherent in the data. An
error estimate may be obtained from replicate measurements of the
Figure 1. NORMAL PROBABILITY PLOT OF EFFECTS, ALL EFFECTS INCLUDED (axis: NORMAL SCORES)
Figure 2. NORMAL PROBABILITY PLOT OF EFFECTS (axis: NORMAL SCORES)
Figure 3. RESIDUALS VS PREDICTED YIELD VALUES
Figure 4. NORMAL PROBABILITY PLOT OF RESIDUALS
of residuals vs. predicted values is displayed in Figure 3. No obvious
inadequacies are evident. The following statements can be used to plot
residuals against predicted values and against each of the experimental
variables.
PROC GPLOT;
PLOT RESIDUAL*(PREDICT A B C D E);
The normal probability plots discussed earlier for analysis of
effects may also be used as graphical tools for studying residuals. A
normal probability plot of the residuals provides a check of whether
the residuals are normally distributed, as required for some statistical
tests. Points which deviate strongly from the line of a normal
probability plot may indicate lack of fit in the model or presence of
outliers. Figure 4 shows a normal plot of the residuals of the chemical
yield data which does not suggest any inadequacies.
PLOT EFFECT*EFFRANK LINE*EFFRANK /OVERLAY;
SYMBOL1 V=STAR I=NONE C=BLACK;
SYMBOL2 V=NONE I=JOIN C=BLACK;
Based on the results of statistical tests and graphical displays
of the effects, effects which are significant are selected. These
effects may then be used to predict the response value within the
range of experimental conditions studied. One method of obtaining
predicted values is to evaluate the response from a regression model
containing only the significant effects. Residuals, which are obtained
by subtracting the predicted response values from the actual measured
response values may also be obtained.
The SAS REG procedure is used again with the chemical data but
this time an OUTPUT statement is included to save the predicted
values and residuals in an output dataset called REGOUT.
INTERPRETATION OF EFFECTS
PROC REG;
MODEL Y =A B AB E;
OUTPUT OUT=REGOUT P=PREDICT R=RESIDUAL;
Interpretation of main effects depends on the presence or absence
of significant interaction effects. When a variable is not involved
in any significant interactions, its effect may be interpreted as the
difference in the response which occurs when the input is varied
from its low value to its high value. However, if the variable does
significantly interact with one or more other variables, the value of
the main effect by itself is not meaningful. Rather, the main effect
takes on a different value at each level of the other variables with
which it interacts.
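To make this concrete, the effect of A at a given level of B can be written as the A effect plus the coded level of B times the AB effect. The Python sketch below is illustrative only: the function name is hypothetical, and it uses the A and AB values as printed in Table VI.

```python
def effect_of_a_at_b(effect_a, effect_ab, b_level):
    """Change in response when A goes from low to high, with B held at b_level (-1 or +1)."""
    # From y = b0 + (eA/2)A + (eB/2)B + (eAB/2)AB, moving A from -1 to +1
    # changes y by eA + b_level * eAB.
    return effect_a + b_level * effect_ab

A_EFFECT = 9.96    # Table VI
AB_EFFECT = -6.60  # Table VI

at_low_b = effect_of_a_at_b(A_EFFECT, AB_EFFECT, -1)
at_high_b = effect_of_a_at_b(A_EFFECT, AB_EFFECT, +1)
print(round(at_low_b, 2), round(at_high_b, 2))
```

With these values the temperature effect is much larger at the low pressure setting than at the high one, which is why the main effect of A alone is not meaningful.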
Returning to the chemical yield experiment, the significant effects
can be examined in light of the original experimental variables.
The E effect corresponds to the Catalyst variable. Since the value
of the E effect is -6.02, using Catalyst 2 instead of Catalyst 1
results in a drop in yield of 6 grams. A and B correspond to the
variables Temperature and Pressure. The interaction term AB was
also significant. Therefore the main effects of A and B are not
meaningful by themselves. Instead, we say that the effect of changing
the temperature from 100° to 120° is different at the two different
RESIDUAL ANALYSIS
An important part of the modeling process is the model checking
phase. Careful examination of the residuals or deviations of the
model from the data can reveal flaws or inconsistencies. Plots of
residuals versus predicted response values and of residuals versus the
experimental values provide useful diagnostic tools. For a model to be
considered adequate, these plots should appear to be random, without
evidence of obvious patterns such as trends or increasing spread in the
values. Draper and Smith (1981) discuss the use of residual analysis
for model checking.
Plots of residuals versus predicted values and experimental
settings may be obtained using the output from PROC REG if
residuals and predicted values are saved in an output dataset with
the OUTPUT statement. For the chemical yield example, a plot
behavior, and results were difficult to interpret. Investigation finally
led the experimenter back to the original recording sheets from
the experiment, and a transcription error was discovered. When
the incorrect value was corrected the probability plot of residuals
appeared reasonably normal, and effects of variables made sense. The
graphical tool used in the analysis helped uncover an outlier which
could have led to mistaken conclusions.
pressure levels used in the experiment. Since the variables C and
D do not appear in any of the significant effects, we may conclude
that the effects of current and time on yield are negligible within the
experimental ranges studied.
ADDITIONAL EXAMPLES
Example 1: Screening Design for Industrial Process Experiment
CONCLUSION
The 2^(k-p) fractional factorial designs are useful for studying the
effects of several variables simultaneously. SAS software provides
several useful tools for designing and analyzing data from fractional
factorials. The use of 2^(k-p) fractional factorials and other statistical
experimental designs should be encouraged. The programs described
here could be modified to accommodate more general problems,
rather than just specific cases. Use of other SAS facilities, such
as the macro language and SAS/AF™, which are documented by
SAS Institute (1985), might be useful in implementing a system for
assisting experimenters with the design and analysis of experiments.
A set of process variables was identified as potentially important
factors in determining the quality of the finished product. A screening
design was desired to try to select a subset of important variables for
further study. Generators for a 2^(13-8) design were determined based on
the aliasing pattern of a design in a Department of Commerce (1957)
publication, "Fractional Factorial Experiment Designs for Factors at
Two Levels." Use of blocking variables allowed the runs to be grouped
into 4 blocks, corresponding to the 4 days of the experiment. Blocks
are used to try to account for variation between the different days.
Blocking variables are assigned to interaction columns in the same way
as additional experimental variables. Within each block, run order
was randomized. Appendix A gives the program used to prepare the
randomized design. The different stages of the design are produced
as output from the program.
™ SAS and SAS/GRAPH are registered trademarks of SAS
Institute Inc., Cary, NC, USA. SAS/AF is a trademark of SAS
Institute Inc.
Example 2: Central Composite Design for Optimizing
Equipment Settings
REFERENCES
1. Box, G. E. P., and Hunter, J. S. (1961), "The 2^(k-p) Fractional
Factorial Designs, Part I," Technometrics, 3, 311-351.
2. Box, G. E. P., and Hunter, J. S. (1961), "The 2^(k-p) Fractional
Factorial Designs, Part II," Technometrics, 3, 449-458.
In some situations, an experimenter may need to estimate
quadratic effects in addition to the main effects and two-factor
interactions which can be estimated with a fractional factorial design.
An extension of factorial and fractional factorial designs which allows
estimation of quadratic effects is the central composite design. The
central composite design is a useful tool for exploring surfaces which
arise when a response variable is modeled as a quadratic function of
experimental variables. In a composite design, a factorial or fractional
factorial design is augmented by two additional types of design points.
Center points occur at the center of the range of settings for each
variable. Star points fix one variable at a non-central value and all
the other variables at the center. The factorial points, star points,
and center points are combined to form a composite design.
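The three kinds of points can be assembled as in the Python sketch below. It is an illustration under stated assumptions, not the Appendix B program: the function name central_composite, the star distance alpha = 2, and the number of center points are hypothetical choices, and a full factorial is used as the core where a fractional factorial could be substituted.

```python
from itertools import product

def central_composite(k, alpha=2.0, n_center=1):
    """Factorial points, then star points at +/-alpha on each axis, then center points."""
    factorial = [tuple(reversed(p)) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for j in range(k):
        for sign in (1.0, -1.0):
            point = [0.0] * k          # all other variables at the center
            point[j] = sign * alpha    # one variable at a non-central value
            star.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + star + center

design = central_composite(3, alpha=2.0, n_center=2)
print(len(design))  # 8 factorial + 6 star + 2 center points
for point in design[8:]:
    print(point)
```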
3. Box, Hunter, and Hunter (1978), Statistics for Experimenters,
New York: John Wiley.
4. Chilko, Daniel (1983), Probability Plots, SAS Technical Report
A-106, SAS Institute Inc., Cary, NC.
5. Daniel, C. (1959), "Use of Half-Normal Plots in Interpreting
Factorial Two-Level Experiments," Technometrics, 1, 149.
6. Daniel, C. (1976), Applications of Statistics to Industrial
Experimentation, New York: John Wiley.
7. Davies, O. L. (1978), The Design and Analysis of Industrial
Experiments, London: Longman Group.
8. Draper, N. R. (1985), "Small Composite Designs,"
Technometrics, 27, 173-180.
An experimenter was interested in studying a mechanical system
to determine optimum operating conditions. Eight settings could be
varied on the equipment. A 2^(8-3) fractional factorial design would
provide information on main effects and some two-factor interactions.
Adding star points and center points to the fractional factorial
produced a central composite design which provided additional
information to estimate quadratic effects. The program given in
Appendix B illustrates generation of the design matrix and run
randomization.
9. Draper, N. R. and Smith, H. (1981), Applied Regression Analysis,
New York: John Wiley.
10. Myers, R. H. (1976), Response Surface Methodology, Blacksburg,
Virginia: Virginia Polytechnic Institute and State University.
11. SAS Institute Inc. (1985), SAS User's Guide: Basics, Version 5
Edition, Cary, NC.
12. SAS Institute Inc. (1985), SAS User's Guide: Statistics, Version
5 Edition, Cary, NC.
13. SAS Institute Inc. (1985), SAS/GRAPH User's Guide, Version 5
Edition, Cary, NC.
14. SAS Institute Inc. (1985), SAS/AF User's Guide, Version 5
Edition, Cary, NC.
15. U. S. Department of Commerce (1957), Fractional Factorial
Experiment Designs for Factors at Two Levels, National Technical
Information Service.
Further information on composite designs and other response
surface techniques is available in Myers (1976). Recent work by
Draper (1985) improves on traditional central composite designs by
further reducing the number of required runs in certain cases.
Example 3: Outlier Detection in a Biomechanics Study
A 2^(5-2) fractional factorial design was run to study the effects
of 5 variables on a biomechanical system. At the end of the 8
runs, a concern that some other variables had not been accounted for led to
augmentation by another set of 8 runs, resulting in a 2^(7-3) design.
The combined data was analyzed as described in the chemical yield
example. A probability plot of the residuals indicated non-normal
APPENDIX A - SCREENING DESIGN

* PROGRAM TO GENERATE EXPERIMENTAL DESIGN FOR 2**(13-8) FRACTIONAL
  FACTORIAL IN 4 BLOCKS OF 10 UNITS EACH. RUNS ARE RANDOMIZED
  WITHIN EACH BLOCK. EACH BLOCK INCLUDES 2 REPLICATE POINTS;

* GENERATE 2**5 FACTORIAL DESIGN;
data full;
do i=1 to 32;
a=(-1)**i;
b=(-1)**(int((1+i)/2));
c=(-1)**(int((3+i)/4));
e=(-1)**(int((7+i)/8));
f=(-1)**(int((15+i)/16));
output;
end;
proc print;
title '2**5 FACTORIAL DESIGN';

* GENERATE 2**(13-8) FRACTIONAL FACTORIAL IN 4 BLOCKS OF 8;
data fraction;
set full;
d=a*b*c;
g=b*e*f;
h=b*c*e;
j=a*b*c*e*f;
k=c*e*f;
l=b*c*f;
m=a*b*e;
n=a*b*f;
blk1=f*g;
blk2=g*h;
exp=_n_;
title '2**(13-8) FRACTIONAL FACTORIAL DESIGN IN 4 BLOCKS OF 8';
proc print;
var a b c d e f g h j k l m n blk1 blk2;

* GENERATE REPLICATE POINTS;
data reps;
input exp a b c d e f g h j k l m n blk1 blk2;
cards;
33 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 -1
34 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 -1
35 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 1
36 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 1
37 0 0 0 0 0 0 0 0 0 0 0 0 0 1 -1
38 0 0 0 0 0 0 0 0 0 0 0 0 0 1 -1
39 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
40 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
;

* ADD REPLICATE POINTS TO BLOCKED FRACTIONAL FACTORIAL;
data blocks;
set fraction reps;
proc sort;
by blk1 blk2;
title '2**(13-8) FRACTIONAL FACTORIAL DESIGN';
title2 'IN 4 BLOCKS OF 10';
title3 'INCLUDING REPLICATE POINTS';
proc print;
var exp a b c d e f g h j k l m n blk1 blk2;

* RANDOMIZE THE ORDER OF BLOCKS AND RUNS WITHIN BLOCKS;
proc sort data=blocks;
by blk1 blk2;
data randblk;
do x1=1 to 4;
block=ranuni(0);
output;
end;
proc rank out=ranks;
var block;
data ranks;set ranks;
do x2=1 to 10;
run=ranuni(0);
output;
end;
data design;
merge blocks ranks;
proc sort;by block;
proc rank out=final;
var run;
by block;
proc sort;by block run;
title2 'RANDOMIZED WITHIN 4 BLOCKS OF 10';
proc print;
var exp block run a b c d e f g h j k l m n;
run;

APPENDIX B - CENTRAL COMPOSITE DESIGN

* PROGRAM WHICH GENERATES A 2**(8-3) FRACTIONAL FACTORIAL DESIGN,
  ADDS STAR POINTS AND CENTER POINTS TO FORM A CENTRAL COMPOSITE
  DESIGN, AND RANDOMIZES RUN ORDER;

* GENERATE A FULL 2**5 FACTORIAL;
data full;
do i=1 to 32;
a=(-1)**i;
b=(-1)**(int((1+i)/2));
c=(-1)**(int((3+i)/4));
d=(-1)**(int((7+i)/8));
e=(-1)**(int((15+i)/16));
output;
end;
title1 '2**5 FACTORIAL DESIGN';
proc print;

* ADD VARIABLES TO OBTAIN FRACTIONAL FACTORIAL;
data fraction;
set full;
f=a*b*c;
g=a*b*d;
h=b*c*d*e;
title1 '2**(8-3) FRACTIONAL FACTORIAL DESIGN';
proc print;

* GENERATE STAR POINTS AND CENTER POINTS;
data star;
input a b c d e f g h;
cards;
 0  0  0  0  0  0  0  0
 2  0  0  0  0  0  0  0
-2  0  0  0  0  0  0  0
 0  2  0  0  0  0  0  0
 0 -2  0  0  0  0  0  0
 0  0  2  0  0  0  0  0
 0  0 -2  0  0  0  0  0
 0  0  0  2  0  0  0  0
 0  0  0 -2  0  0  0  0
 0  0  0  0  2  0  0  0
 0  0  0  0 -2  0  0  0
 0  0  0  0  0  2  0  0
 0  0  0  0  0 -2  0  0
 0  0  0  0  0  0  2  0
 0  0  0  0  0  0 -2  0
 0  0  0  0  0  0  0  2
 0  0  0  0  0  0  0 -2
 0  0  0  0  0  0  0  0
;

* FORM COMPOSITE DESIGN;
data compos;
set fraction star;
title1 '8 VARIABLE COMPOSITE DESIGN';
proc print;

* GENERATE A RANDOM ORDER FOR THE 50 DESIGN POINTS;
data rand;
do exp=1 to 50;
x=ranuni(0);
run=x;
output;
end;
run;
proc rank out=random;
var run;

* ASSIGN RANDOM ORDER TO POINTS IN COMPOSITE DESIGN;
data design;
merge compos random;
proc sort;
by run;
title1 'CENTRAL COMPOSITE DESIGN IN RANDOM ORDER';
proc print;
var run exp a b c d e f g h;
run;