Alternate Dispersion Measures in Replicated Factorial Experiments
Neal A. Mackertich
The Raytheon Company, Sudbury MA 02421
James C. Benneyan
Northeastern University, Boston MA 02115
Peter D. Kraus
The Raytheon Company, Tewksbury MA 01876
Abstract
Any of several statistics traditionally are used to detect dispersion effects in the analysis
of replicated factorial experiments, including the within-run standard deviation s, the
natural logarithm ln(s+1), various signal-to-noise ratios, and others. This study examines
the relative performance of each approach using recent experimental designs, with
ln(s+1) typically producing the best results. An alternate approach, based on the absolute deviations from the within-run means, also is shown to increase detection but at the
expense of uncontrolled false alarms.
Introduction
While designed experiments historically have focused primarily on optimizing the mean of
one or more responses, they increasingly are being used to identify factors that affect response dispersion. Examples include increased interest in variance reduction as a primary
objective, increased emphasis on process and product robustness, and work on designs for
estimating variance functions (e.g., Box (1988), Box and Jones (1992), Byrne and Taguchi
(1986), Davidian and Carroll (1987), Phadke (1989), and Vining and Schaub (1996)). In order to detect differences in response variance due to factor effects, any of several statistics
traditionally are used in the analysis of replicated factorial experiments. These include the
within-run standard deviation s, the natural logarithm ln(s+1), the within-run variance s2,
various signal-to-noise ratios, and others.
This heightened emphasis on response variance increases the importance of understanding
the relative performance of alternate possible measures for detecting true variance effects in
typical experimental scenarios. The current study also is motivated by the suggestion that the
best measure for detecting differences in some value of interest might not necessarily be one
of the most familiar statistics for estimating that parameter. We examined the relative performance of conventional and alternate types of dispersion measures in recent experimental
scenarios, as well as another type of dispersion measure motivated during the course of this
study by a desire to increase power by preserving associated degrees of freedom.
Results summarized below indicate that ln(s+1) tends to be the best traditional measure for
detecting variance effects, while larger-the-better and smaller-the-better signal-to-noise ratios
tend to be quite poor. Two alternate statistics suggested during the course of this study, the
absolute deviation from the within-run mean, |yi,j - ȳi|, and an approximately normalized
transform of this statistic, |yi,j - ȳi|^.42, increase power in several examples over s, ln(s+1), and
popular signal-to-noise ratios, although at the expense of uncontrolled type I error rates.
Traditional Dispersion Measures in DOE
The most common measures used in replicated designs to identify factors and interactions
affecting response dispersion are well-known in the literature (for example, see Box (1988),
Box, Hunter, and Hunter (1978), Montgomery (1984), Myers and Montgomery (1995),
Schmidt and Launsby (1995), Taguchi (1986)) and include:
•
the within-run sample standard deviation, s, of all response replicates at each set of
experimental conditions,
•
the within-run sample variance, s2, of all response replicates at each set of experimental conditions,
•
the logarithm or natural logarithm of s or (s+1), and
•
Taguchi's three most common "signal-to-noise" ratios (i.e., "nominal-the-best",
"smaller-the-better", and "larger-the better").
The within-run standard deviation s and variance s2 are direct estimates of their theoretical
counterparts σ and σ2 and need no further motivation. The usual rationale for taking logarithms of the sample standard deviation is to approximately normalize s in order to obtain
more accurate results via standard statistical tests for significance, with some small constant
(by convention 1) often first added to avoid the potential problem of taking a logarithm of
zero (such as due to rounding in data collection).
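As a minimal numerical sketch of these within-run statistics (the replicate values below are invented for illustration):

```python
import numpy as np

# Hypothetical replicate results for a single experimental run
run_replicates = np.array([9.8, 10.4, 10.1, 9.7])

s = run_replicates.std(ddof=1)   # within-run sample standard deviation
s2 = s ** 2                      # within-run sample variance
log_s1 = np.log(s + 1.0)         # ln(s + 1); the +1 guards against ln(0)

print(s, s2, log_s1)
```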
The three most standard signal-to-noise ratios typically are referred to as "nominal-the-best",
"smaller-the-better", and "larger-the-better" and have been proposed for the three common
situations for which the experimental objectives either are to:
•
achieve response values as close as possible to a desired target value (and with as
minimum variability about that target as possible), such as for a manufacturing dimension or output voltage,
•
minimize all response values as much as possible (and with as minimum variability as
possible), such as for percent shrinkage or deviation from round, or
•
maximize all response values as much as possible (and with as minimum variability as
possible), such as for bond strength or time until failure.
2
Alternate Dispersion Measures in DOE
These three signal-to-noise ratios usually are estimated, respectively, by:

• "Nominal-the-best": S/NN = 10 • log( ȳ2 / s2 ),

• "Smaller-the-better": S/NS = -10 • log( (1/n) Σi yi2 ), and

• "Larger-the-better": S/NL = -10 • log( (1/n) Σi 1/yi2 ).
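Under the usual decibel convention these ratios use base-10 logarithms; the sketch below computes all three for a hypothetical set of within-run replicates:

```python
import numpy as np

def sn_nominal(y):
    """Nominal-the-best: 10 * log10(ybar^2 / s^2)."""
    y = np.asarray(y, dtype=float)
    return 10.0 * np.log10(y.mean() ** 2 / y.var(ddof=1))

def sn_smaller(y):
    """Smaller-the-better: -10 * log10(mean of y_i^2)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10((y ** 2).mean())

def sn_larger(y):
    """Larger-the-better: -10 * log10(mean of 1 / y_i^2)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10((1.0 / y ** 2).mean())

y = [9.8, 10.4, 10.1, 9.7]  # hypothetical within-run replicates
print(sn_nominal(y), sn_smaller(y), sn_larger(y))
```

Note how the nominal-the-best ratio grows as the within-run variance shrinks relative to the mean, combining location and dispersion in a single number.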
As discussed by others, note that an important distinction of signal-to-noise ratios is that
rather than decoupling the mean and variance into separate analyses, they attempt to form a
single combined metric of both central tendency and variability (e.g., Gunter (1988) and
Hunter (1987)). Also note that although Taguchi (1986) proposed many other signal-to-noise
ratios (for example, a less common nominal-the-best signal-to-noise ratio has the form S/NN2
= -10 • log s2), the above three almost exclusively tend to be used in practice. For each ratio
and their associated loss functions, the general rationale is to penalize in some way for deviations from the desired value in terms of both location and dispersion in a single measure. For
further discussion, see Box (1988), Hunter (1985, 1987), Phadke (1989), Pignatiello and
Ramberg (1991), Taguchi (1986), and others.
Regardless of which measure is used, significant factors or interactions typically are identified via the analysis of variance, F tests, t tests, probability plots, dot plots, and the like (see
Box, Hunter, and Hunter (1978), Daniel (1976), and Montgomery (1984)). Unlike the case
of testing for mean effects, however, these tests now are conducted on the 2^k within-run summary statistics (in the case of two-level factorial designs), rather than the n individual outcomes within each of the 2^k runs, resulting in the loss of 2^k(n - 1) degrees of freedom and the
need for either at least one empty column or variance pooling to identify dispersion effects.
In addition to the above traditional measures, also of interest here is whether some alternative
could result in better variance effect detection. As suggested earlier, the best test criterion for
detecting differences in a process parameter (in this case response variance σ2) may not necessarily be based on the most familiar direct estimate of that parameter (e.g., the sample variance s2). This notion is similar to a statement by Box (1988) that "the two desiderata - the
best choice of performance measure and the best way to employ the data to estimate it - are
distinct and frequently attainable, and they ought not be confused."
Following this reasoning, one general type of alternate measure might be based on replacing
each of the n observed responses within a given run i with some function of that value in
such a manner that each new value now individually provides some type of measure of dispersion. The primary motivation for this approach is that each transformed value now itself
becomes a type of individual dispersion measure, rather than all within-run i replicates being
aggregated into a single collective dispersion measure (such as si, ln(si+1), S/NN,i, et cetera).
Statistical analysis then might be conducted on these individual values, rather than on the
traditional summary measures, much as one would conduct analysis for mean effects using
3
Alternate Dispersion Measures in DOE
each of the n within-run observations as individual point estimates of the run response
mean µi. Such an approach would thereby increase the number of associated degrees of freedom, which in turn may result in stronger detection power, such as due to a tighter null reference distribution (although see the caution below).
The general structure of a 2^2 or L4 replicated design is shown in Figure 1a, with all of the n
responses within each run i replaced in Figure 1b by the corresponding absolute deviation of
each observation from the within-run mean, |yi,j - ȳi|, where the subscript i denotes a given
set of experimental conditions (i.e., a run) and the subscript j denotes a given replication of
the experiment under these conditions. The absolute deviation values then are analyzed as if
they were individual responses using traditional analysis of variance or other methods, analogous to the approach employed in the study of central tendency, with the total original number of degrees of freedom preserved. This may be especially advantageous for highly
Traditional Measures

Run (i)   A    B    C (AB)   Replicate j Result, yi,j      Mean   Dispersion Measures
1         +1   +1   +1       y1,1   y1,2   ...   y1,n      ȳ1     s1   ln(s1+1)   S/NN,1
2         +1   -1   -1       y2,1   y2,2   ...   y2,n      ȳ2     s2   ln(s2+1)   S/NN,2
3         -1   +1   -1       y3,1   y3,2   ...   y3,n      ȳ3     s3   ln(s3+1)   S/NN,3
4         -1   -1   +1       y4,1   y4,2   ...   y4,n      ȳ4     s4   ln(s4+1)   S/NN,4

Figure 1a: Traditional Structure of Replicated 2^2 Design
Using Either s, ln(s+1), or S/NN as Dispersion Measure

Alternate Measures

Run (i)   A    B    C (AB)   |yi,j - ȳi| for Replicates j = 1, 2, ..., n        Mean
1         +1   +1   +1       |y1,1-ȳ1|   |y1,2-ȳ1|   ...   |y1,n-ȳ1|           ȳ1
2         +1   -1   -1       |y2,1-ȳ2|   |y2,2-ȳ2|   ...   |y2,n-ȳ2|           ȳ2
3         -1   +1   -1       |y3,1-ȳ3|   |y3,2-ȳ3|   ...   |y3,n-ȳ3|           ȳ3
4         -1   -1   +1       |y4,1-ȳ4|   |y4,2-ȳ4|   ...   |y4,n-ȳ4|           ȳ4

Figure 1b: Alternate Structure of Replicated 2^2 Design Using |yi,j - ȳi| as Dispersion Measure
replicated experiments, fairly saturated designs, or small designs, but also could produce unpredictable type I errors. (Taking absolute values prevents the n within-run (yi,j - ȳi) deviation terms from mathematically canceling each other out to sum to 0.)
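The transformation of Figure 1b can be sketched as follows, for a hypothetical replicated 2^2 design (the response values are invented for illustration):

```python
import numpy as np

# Hypothetical replicated 2^2 design: 4 runs (rows) x n = 4 replicates
y = np.array([[ 9.8, 10.4, 10.1,  9.7],
              [12.0, 11.5, 12.3, 12.2],
              [ 8.9,  9.4,  9.1,  8.6],
              [10.9, 11.2, 10.6, 11.3]])

run_means = y.mean(axis=1, keepdims=True)
abs_dev = np.abs(y - run_means)   # |y_ij - ybar_i|, one value per replicate

# Every replicate still contributes an individual "response," so the
# within-run degrees of freedom are preserved for the subsequent analysis
print(abs_dev.shape)
```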
Comparison of Measures
Approach
In order to compare the relative performance of each measure, several factors and interactions were included in standard 2^(k-p) factorial experimental designs, and the ability to detect
different magnitude variance effects of a factor was examined. Response values for each run
were generated from normal distributions with parameters specified by standard first-order
additive models
µ = β0 + Σr βr Xr + Σr Σm≠r βr,m Xr Xm

and

σ(γs) = γ0 + Σr≠s γr Xr + γs Xs + Σr Σm≠r γr,m Xr Xm ,

where Xr = Coded setting of factor r,
      βr = Mean coefficient (half-effect) of factor r,
      βr,m = Mean coefficient (half-effect) of interaction XrXm, r ≠ m,
      γr = Standard deviation coefficient (half-effect) of factor r, r ≠ s,
      γr,m = Standard deviation coefficient (half-effect) of interaction XrXm, r ≠ m, and
      γs = Standard deviation coefficient (half-effect) of factor "S" (varied in analysis).
Each of the factor settings is determined from the experimental layout and coded as +1 or -1
in the usual manner, with the half-effect size γs of factor S varied as described below. Note
that γs represents the degree to which factor S affects the response standard deviation, with γs
= 0 indicating that no true effect exists and larger values representing larger effects.
By iterating across a range of values for γs, the operating characteristics for each dispersion
measure were estimated by filling the experimental array with simulated response values using the µ and σ model equations and coded factor settings, calculating each dispersion measure, and conducting analyses of variance on these results in the usual manner. This process
was repeated 100,000 times at each increment of γs to achieve reasonably accurate results,
with the probability of each measure signaling a variance effect for each value of γs estimated
as

Pr(Signal | γs) = (Number of Times the F Test Was Significant) / (Total Number of Simulation Replications).
A 2^(5-1)_V Example
As an illustration, in the following analysis four replicates were generated for each of the 16
experimental runs of a 2^(5-1) resolution V factorial experiment, with six hypothesized factors and interactions assigned to columns, using mean and standard deviation model equations
µ = 100 + 10X1 - 5X2 + 7X3 - 4X2X3 + 5X1X4
σ = 10 + X1 + 1.5X2 - X3 + γ4X4 + 0.75X5 - 0.75X2X3 + 0.5X1X4,
such that the overall response mean and variance with all factors set at their midpoints are µY
= 100 and σY = 10, respectively (i.e., when all coded terms are zero such that no half-effects
are added or subtracted). The half-effect γ4 for factor 4 then was incremented iteratively from
γ4 = 0 to γ4 = 4 by 0.05 (corresponding to half-effects ranging from 0% to 40% of total response standard deviation at center points), at each increment of γ4 repeating the simulation
and subsequent analysis of variance (at an α = .05 significance level) 100,000 times.
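The simulation procedure can be sketched in miniature. The code below is not the authors' program: it uses a hypothetical full 2^3 design, a flat mean model, reduced simulation counts, and a two-sample t test on the per-run ln(s+1) values (equivalent to the one-factor F test here) in place of the full analysis of variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Coded settings for a hypothetical full 2^3 design (8 runs)
X = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])

def detection_prob(gamma1, n_rep=5, n_sims=2000, alpha=0.05):
    """Estimate Pr(signal | gamma1) for the ln(s+1) dispersion measure."""
    mu = 100.0                        # flat mean model, for simplicity
    sigma = 10.0 + gamma1 * X[:, 0]   # factor 1 shifts the response std dev
    hits = 0
    for _ in range(n_sims):
        y = rng.normal(mu, sigma[:, None], size=(8, n_rep))
        measure = np.log(y.std(axis=1, ddof=1) + 1.0)
        # Significance of factor 1: compare runs at +1 vs runs at -1
        _, p = stats.ttest_ind(measure[X[:, 0] == 1], measure[X[:, 0] == -1])
        hits += p < alpha
    return hits / n_sims

p0 = detection_prob(0.0)   # no true dispersion effect: roughly alpha
p4 = detection_prob(4.0)   # strong effect (40% of sigma at center points)
print(p0, p4)
```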
These results are shown in Figure 2, with values along the abscissa representing the size of
the half-effect of Factor 4 relative to the response standard deviation if all Xi's are set equal
to zero, γ4/γ0. This scaling can be thought of as half the relative reduction in σ possible by
changing between X4 = -1 and X4 = +1 (assuming no active interactions). Also included in
this analysis is the absolute deviation term raised to the 0.42 power, |yi,j - ȳi|^.42, as an attempt
to normalize the absolute deviation terms before forming the test statistic, as explained below
in the Discussion section. The ordering from top to bottom of the legend in Figure 2 reflects
the relative ordering of each measure's power from best to worst. False alarm probabilities
for each measure and their power to detect several effect sizes also are tabulated in Table 1
for further comparison.

Figure 2: Relative Performance of Alternate vs Traditional Measures, 2^(5-1)_V Example (4 replicates)

Probability of Detection, by Value of γ4 and γ4/γ0 (Relative Contribution of Factor 4 to Variability)

Dispersion        γ4 = 0        γ4 = 1        γ4 = 2        γ4 = 3        γ4 = 4
Measure           (γ4/γ0 = 0)   (γ4/γ0 = .1)  (γ4/γ0 = .2)  (γ4/γ0 = .3)  (γ4/γ0 = .4)
|yj - ȳ|          0.0804        0.2115        0.5294        0.8365        0.9738
|yj - ȳ|^.42      0.0812        0.1822        0.4738        0.7892        0.9610
ln(s+1)           0.0486        0.1241        0.3603        0.6615        0.8820
S/NN1             0.0436        0.1216        0.3487        0.6433        0.8626
S/NN2             0.0471        0.1205        0.3499        0.6460        0.8659
s                 0.0472        0.1245        0.3577        0.6533        0.8748
s2                0.0383        0.0959        0.2720        0.5081        0.7161
S/NS              0.0036        0.0032        0.0035        0.0034        0.0038
S/NL              0.0037        0.0053        0.0076        0.0121        0.0188

Table 1: Comparison of Dispersion Measure Performance, 2^(5-1)_V Example
As these results illustrate, the choice of dispersion measure can make a significant difference
in both the probability of detecting true dispersion effects (i.e., power) and the probability of
erroneously signaling when no effect truly exists (i.e., significance). In this example, s,
ln(s+1), S/NN, and S/NN2 have comparable operating characteristics and false alarm probabilities close to the intended α = 0.05 (α̂ ≈ 0.0486 for ln(s+1), 0.0472 for s, 0.0436 for S/NN,
and 0.0471 for S/NN2). Note that ln(s+1) and s exhibit slightly higher power for all effect sizes,
which agrees with studies reported elsewhere (Schmidt and Launsby (1995), Box (1988),
Gunter (1988)).
The within-run variance, s2, has significantly lower power, with S/NS and S/NL both essentially being useless (α̂ ≈ 0.0036 and α̂ ≈ 0.0037, respectively) and with negligible detection
power across all effect sizes. As pointed out by Hunter (1987) and Montgomery (1996), this
is not surprising given that these measures inseparably confound location and dispersion effects. Awareness of these tradeoffs can be important information to practitioners in selecting
a response measure and interpreting analysis results. Interestingly, the absolute deviation
measures exhibit consistently higher power than the others, although at the expense of higher
false alarm probabilities (α̂ = 0.0804 and α̂ = 0.0812, respectively), possibly due to the effect
of taking absolute values on terms slightly less than the average.
7
Alternate Dispersion Measures in DOE
Adjustment for Equal False Alarm Probabilities
Because the differences in false alarm rates noted above make direct comparison difficult, as
an exercise the (estimated) significance levels for each measure were adjusted empirically to
all equal α̂ = 0.05. With γ4 = 0, all 100,000 F test values for each measure were sorted to
locate the empirical 5th percentile, which then was used as an empirical critical value for that
measure, denoted herein as F′. These empirical F′ (α̂ = .05) values then were used in place of the
usual critical value as the half-effect γ4 was increased iteratively as previously.
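This empirical adjustment can be sketched as follows: simulate the null case, collect the F statistics, and take their 95th percentile as the critical value F′ yielding α̂ = .05. The one-factor analogue below is a simplified stand-in for the authors' procedure, with arbitrary run counts and seed:

```python
import numpy as np

rng = np.random.default_rng(7)

def null_f_stats(measure_fn, n_runs=8, n_rep=5, n_sims=10000):
    """Simulate F statistics for a dispersion measure under no true effect."""
    half = n_runs // 2
    out = np.empty(n_sims)
    for k in range(n_sims):
        y = rng.normal(100.0, 10.0, size=(n_runs, n_rep))
        m = measure_fn(y)
        g1, g2 = m[:half], m[half:]          # runs at X = +1 vs X = -1
        ssb = half * ((g1.mean() - m.mean()) ** 2 + (g2.mean() - m.mean()) ** 2)
        msw = (((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()) / (n_runs - 2)
        out[k] = ssb / msw                   # F on (1, n_runs - 2) df
    return out

ln_s1 = lambda y: np.log(y.std(axis=1, ddof=1) + 1.0)
f_null = null_f_stats(ln_s1)
f_crit = np.percentile(f_null, 95)  # empirical critical value F' giving alpha-hat = .05
print(f_crit)
```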
The resultant α-adjusted operating characteristics for the same 2^(5-1)_V example as above are
compared in Figure 3 and Table 2. As shown, the same performance comparisons and orderings as previously still hold after adjusting all measures for equal specificity, but with smaller
differences in relative power. The conventional measures ln(s+1), s, S/NN, and S/NN2 again
are grouped together, with the sample variance s2 exhibiting significantly less power, and
S/NS and S/NL again being essentially useless for detecting dispersion effects. Similar to
previously, both absolute deviation statistics have greater power than traditional measures.
Figure 3: Relative Performance of Alternate vs Traditional Measures, 2^(5-1)_V Example (4 replicates)
(After empirical adjustment for equal false alarm probabilities)
Of course, determining the 5th empirical percentile of the corresponding F′ values or otherwise adjusting for a desired α probability will not be possible in practice, but the above example illustrates that relative improvements are possible even with false alarm probabilities
somehow set equal. As illustrated by the following examples, similar benefits also are
obtained under other experimental conditions (design size, number of replicates, degree of
saturation).

Probability of Detection, by Value of γ4 and γ4/γ0 (Relative Contribution of Factor 4 to Variability)

Dispersion        γ4 = 0        γ4 = 1        γ4 = 2        γ4 = 3        γ4 = 4
Measure           (γ4/γ0 = 0)   (γ4/γ0 = .1)  (γ4/γ0 = .2)  (γ4/γ0 = .3)  (γ4/γ0 = .4)
|yj - ȳ|          0.0500        0.1385        0.4172        0.7538        0.4498
|yj - ȳ|^.42      0.0500        0.1286        0.3846        0.7176        0.9368
ln(s+1)           0.0500        0.1270        0.3658        0.6673        0.8854
S/NN1             0.0500        0.1353        0.3754        0.6725        0.8807
S/NN2             0.0500        0.1266        0.3615        0.6586        0.8737
s                 0.0500        0.1303        0.3688        0.6649        0.8821
s2                0.0500        0.1189        0.3230        0.5741        0.7772
S/NS              0.0500        0.0476        0.0467        0.0480        0.0504
S/NL              0.0500        0.0651        0.0864        0.1168        0.1544

Table 2: Comparison of Dispersion Measure Performance, 2^(5-1)_V Example
(After empirical adjustment for equal false alarm probabilities)
A Second Example (2^3)
A fairly sparse design was used in the above example, resulting in a good number of degrees
of freedom associated with the SSE term for testing between-treatment variance (8 for traditional measures versus 56 for the absolute deviation measure). Alternatively, if a more saturated 2^3 design with 5 replicates were used to test which of five factors or interactions appear
significant, the error term would have only 2 degrees of freedom using traditional measures
versus 34 for the absolute deviation measure. Use of smaller or more saturated designs therefore may impact traditional measures more adversely than the absolute deviation measures.
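The degrees-of-freedom bookkeeping behind these comparisons is simple arithmetic, sketched below; note that the seven-column count for the first example is inferred so that the arithmetic reproduces the 8-versus-56 figures quoted in the text:

```python
def error_df(n_runs, n_rep, n_effects):
    """Error df when testing n_effects columns: one summary value per run
    (traditional measures) vs one value per replicate (absolute deviations)."""
    traditional = (n_runs - 1) - n_effects
    abs_dev = (n_runs * n_rep - 1) - n_effects
    return traditional, abs_dev

# 2^(5-1) example: 16 runs, 4 replicates; 7 estimated columns assumed
print(error_df(16, 4, 7))   # -> (8, 56)

# 2^3 example: 8 runs, 5 replicates, 5 effects tested
print(error_df(8, 5, 5))    # -> (2, 34)
```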
In order to examine the relative performance of each measure in such cases, 5 replicates for
each run of a full factorial 2^3 experimental design were generated using the mean and standard deviation model equations
µ = 5071 + 17.25X1 + 17.25X2 - 1.5X3 - 3.5X1X3 - 4.75X2X3
σ = 7.5 + γ1X1 - 0.73X2 + 0.96X3 - 1.27X1X3 - 0.95X2X3
with the standard deviation half-effect of factor 1, γ1, incremented as above and with each
analysis of variance again using α = 0.05 significance levels. These results are shown in Figure 4a, with Figure 4b again based on empirically adjusted α̂ = 0.05 significance levels. To
facilitate further comparison, Tables 3a and 3b also summarize these results.
Figure 4a: Relative Performance of Alternate vs. Traditional Dispersion Measures, 2^3 Example (5 replicates)

Figure 4b: Relative Performance of Alternate vs. Traditional Dispersion Measures, 2^3 Example (5 replicates)
(After empirical adjustment for equal false alarm probabilities)
Probability of Detection, by Value of γ1 and γ1/γ0 (Relative Contribution of Factor 1 to Variability)

Dispersion        γ1 = 0        γ1 = .75      γ1 = 1.5      γ1 = 2.25     γ1 = 3
Measure           (γ1/γ0 = 0)   (γ1/γ0 = .1)  (γ1/γ0 = .2)  (γ1/γ0 = .3)  (γ1/γ0 = .4)
|yj - ȳ|          0.0807        0.1562        0.3647        0.6423        0.8610
|yj - ȳ|^.42      0.0746        0.1526        0.3592        0.6411        0.8714
ln(s+1)           0.0466        0.0831        0.1660        0.2761        0.3700
S/NN1             0.0453        0.0802        0.1610        0.2683        0.3585
S/NN2             0.0455        0.0818        0.1641        0.2719        0.3619
s                 0.0462        0.0690        0.1304        0.2105        0.2725
s2                0.0393        0.0495        0.0833        0.1256        0.1558
S/NS              0.0567        0.0569        0.0575        0.0575        0.0571
S/NL              0.0579        0.0582        0.0578        0.0569        0.0568

Table 3a: Comparison of Dispersion Measure Performance, 2^3 Example

Probability of Detection, by Value of γ1 and γ1/γ0 (Relative Contribution of Factor 1 to Variability)

Dispersion        γ1 = 0        γ1 = .75      γ1 = 1.5      γ1 = 2.25     γ1 = 3
Measure           (γ1/γ0 = 0)   (γ1/γ0 = .1)  (γ1/γ0 = .2)  (γ1/γ0 = .3)  (γ1/γ0 = .4)
|yj - ȳ|          0.0500        0.1051        0.2786        0.5431        0.7967
|yj - ȳ|^.42      0.0500        0.1097        0.2875        0.5619        0.8226
ln(s+1)           0.0500        0.0886        0.1767        0.2919        0.3881
S/NN1             0.0500        0.0877        0.1756        0.2907        0.3842
S/NN2             0.0500        0.0896        0.1785        0.2933        0.3868
s                 0.0500        0.0752        0.1407        0.2256        0.2913
s2                0.0500        0.0632        0.1057        0.1577        0.1941
S/NS              0.0500        0.0504        0.0509        0.0577        0.0504
S/NL              0.0500        0.0498        0.0495        0.0487        0.0484

Table 3b: Comparison of Dispersion Measure Performance, 2^3 Example
(After empirical adjustment for equal false alarm probabilities)
As previously, note that ln(s+1), S/NN, and S/NN2 exhibit higher power and roughly the same
false alarm rates as s; S/NL and S/NS again are fairly useless; and |yi,j - ȳi| and |yi,j - ȳi|^.42 exhibit higher power across all magnitudes of variance effects. In contrast with the first example, note that each measure exhibits lower power as γ1 increases than previously, which may
be attributable to the differences in design saturation and degrees of freedom associated with
the error terms. Use of the absolute deviation also may be appealing for another reason,
namely that if the above design had been fully saturated, standard ANOVA or other analysis
of these terms offers an alternative to the problematic practice of pooling up or down sums-of-squares that are estimated to have negligible effects. As discussed by Montgomery
(1997), "the pooling of mean squares (variances) is a procedure that has long been known to
produce considerable bias in test results."
Number of Replicates
In order to examine the effect of the number of replications per run on the relative performance of each measure, Figures 5 and 6 illustrate the effect of using only n = 2 replicates per
run and increasing to 8 replicates per run, respectively, for the same 23 example as above
(where previously n = 5 replicates). Figures 5a and 6a are for the unadjusted case and Figures 5b and 6b are for the case with all false alarm probabilities empirically adjusted to αˆ =
0.05 in the manner described above. Table 4 also compares the performance of ln(s+1), |yi,j y i|, and |yi,j - y i|.42 across a range of other number of replicates.
As these results illustrate, the difference in performance can vary significantly for
different numbers of replicates, with the ln(s+1) term consistently exhibiting among the best
detection power of the traditional measures across all effect and replicate sizes. The absolute deviation terms exhibit better power but again at the expense of uncontrolled type I
error rates. The |yi,j - ȳi| false detection rate for the n = 2 replicate case is inflated far above
the desired α = 0.05, well beyond any useful point, perhaps due to the lack of central limit
effects, whereas it appears to decrease to 0 as n increases. As a general rule, therefore, use of
this measure for a small number of replications should be avoided; most cases using four or
more replications examined to date resulted in false detection rates of roughly 0.10 or lower
(with α = .05). Also note that none of the measures are effective in this example for n = 2.
Table 4 summarizes the benefits of larger numbers of replicates for ln(s+1) on both power and
convergence to the desired α.
Discussion
In all examined cases, ln(s+1) and both nominal signal-to-noise ratios consistently produced
equal or better power than other traditional measures, followed by s and s2, while the larger-the-better and smaller-the-better ratios were relatively useless. The increased power of |yi,j - ȳi| and |yi,j - ȳi|^.42 to detect true dispersion effects is interesting, although at the expense of
higher false alarm rates. In one 2^(5-1)_V example, power to detect dispersion half-effects of
roughly 25% of the average total response variance (when all other factors are at their center
points) was increased to approximately 0.64 from approximately 0.51 for the conventional s,
ln(s+1), S/NN1, and S/NN2 measures. Other examples suggest similar benefits for different
experiment sizes, saturation levels, and numbers of replicates.
Figure 5a: 2 Replicates per Run, 2^3 Example

Figure 5b: 2 Replicates per Run, 2^3 Example
(After empirical adjustment for equal false alarm probabilities)
Figure 6a: 8 Replicates per Run, 2^3 Example

Figure 6b: 8 Replicates per Run, 2^3 Example
(After empirical adjustment for equal false alarm probabilities)
Probability of Detection, by Value of γ1 and γ1/γ0 (Relative Contribution of Factor 1 to σ)

Number of   Dispersion      γ1 = 0        γ1 = .75      γ1 = 1.5      γ1 = 2.25     γ1 = 3
Replicates  Measure         (γ1/γ0 = 0)   (γ1/γ0 = .1)  (γ1/γ0 = .2)  (γ1/γ0 = .3)  (γ1/γ0 = .4)
2           ln(s+1)         0.0308        0.0334        0.0434        0.0599        0.0771
            |yj - ȳ|        0.6553        0.6764        0.7248        0.7954        0.8623
            |yj - ȳ|^.42    0.6395        0.6620        0.7106        0.7818        0.8555
4           ln(s+1)         0.0452        0.0691        0.1290        0.2053        0.2747
            |yj - ȳ|        0.1402        0.2141        0.4051        0.6463        0.8435
            |yj - ȳ|^.42    0.1283        0.2051        0.3934        0.6402        0.8493
6           ln(s+1)         0.0481        0.0942        0.2023        0.3423        0.4540
            |yj - ȳ|        0.0468        0.1193        0.3376        0.6413        0.8790
            |yj - ȳ|^.42    0.0440        0.1199        0.3355        0.6434        0.8909
8           ln(s+1)         0.0488        0.1152        0.2595        0.4297        0.5288
            |yj - ȳ|        0.0171        0.0734        0.2947        0.6481        0.9058
            |yj - ȳ|^.42    0.0168        0.0767        0.2960        0.6553        0.9199
10          ln(s+1)         0.0488        0.1348        0.3138        0.5108        0.6057
            |yj - ȳ|        0.0067        0.0470        0.2593        0.6546        0.8281
            |yj - ȳ|^.42    0.0069        0.0513        0.2654        0.6667        0.4408
12          ln(s+1)         0.0490        0.1530        0.3654        0.5815        0.6709
            |yj - ȳ|        0.0027        0.0292        0.2331        0.6630        0.4437
            |yj - ȳ|^.42    0.0027        0.0325        0.2410        0.6755        0.9562
14          ln(s+1)         0.0495        0.1707        0.4062        0.6187        0.6736
            |yj - ȳ|        0.0009        0.0205        0.2096        0.6714        0.9574
            |yj - ȳ|^.42    0.0011        0.0233        0.2188        0.6875        0.9671
16          ln(s+1)         0.0496        0.1882        0.4420        0.6459        0.6763
            |yj - ȳ|        0.0004        0.0140        0.1901        0.6765        0.9658
            |yj - ȳ|^.42    0.0004        0.0164        0.2009        0.6959        0.9754

Table 4: Effect of Number of Replicates (unadjusted 2^3 case, desired α = .05)
Although the absolute deviation measures are relatively easy to implement by hand or in a
spreadsheet and present little additional work for the analyst, their false alarm rates can vary
dramatically when using traditional analysis of variance methods. Ideally a more exact
mathematical test based on the underlying random variable or reference distribution therefore
could be developed. For example, assuming Y ~ normal, the absolute deviation from the
mean has a folded or “half normal” distribution related to a central chi density with one degree of freedom (Johnson, Kotz, and Balakrishnan (1994)), suggesting some type of test of
the upper chi tail. Alternatively, the absolute deviation can be transformed to approximate
normality by raising it to a power somewhere between 0.35 and 0.55, depending upon the
specific criterion, with |Y - Ȳ|^.4168 being based on the Kullback-Leibler information.
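This folded-normal behavior and the effect of the power transform can be checked numerically; the sketch below compares the skewness of the absolute deviations of simulated standard normal data before and after raising them to the 0.4168 power (sample size and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Absolute deviations of normal data follow a folded ("half") normal law;
# a power near 0.42 is reported to bring them close to normality
y = rng.normal(0.0, 1.0, size=100_000)
abs_dev = np.abs(y - y.mean())

skew_raw = stats.skew(abs_dev)            # half-normal: skewness near 1
skew_pow = stats.skew(abs_dev ** 0.4168)  # Kullback-Leibler-based exponent

print(skew_raw, skew_pow)
```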
References
Box, G. E. P. (1988), “Signal-to-Noise Ratios, Performance Criteria, and Transformations” (with
discussions), Technometrics, 30, 1-40.
Box, G. E. P. , Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John
Wiley & Sons.
Byrne, D. M., and Taguchi, S. (1987), "The Taguchi Approach to Parameter Design", Quality Progress, 20(12), Dec 1987, pp. 19-26.
Daniel, C. (1976), Applications of Statistics to Industrial Experimentation, New York: John Wiley &
Sons.
Davidian, M., and Carroll, R. J. (1987), "Variance Function Estimation", Journal of the American Statistical Association, 82, 1079-1091.
Gunter, B. (1988), “Discussion: Signal-to-Noise Ratios, Performance Criteria, and Transformations”,
Technometrics, 30, 32-35.
Hunter, J. S. (1985), “Statistical Design Applied to Product Design”, Journal of Quality Technology,
17(4), 210-221.
Hunter, J. S. (1987), “Signal to Noise Ratio Debated”, Quality Progress 20(5), May 1987, pp. 7-9.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994), Continuous Univariate Distributions, New
York: John Wiley & Sons.
Montgomery, D. C. (1997), Design and Analysis of Experiments, 4th ed., New York: John Wiley &
Sons.
Montgomery, D. C. (1996), Introduction to Statistical Quality Control, 3rd ed., New York: John
Wiley & Sons.
Myers, R. H., and Montgomery, D. C. (1995), Response Surface Methodology, New York: John
Wiley & Sons.
Phadke, M. S. (1989), Quality Engineering Using Robust Design, Princeton NJ: Prentice-Hall.
Pignatiello, J. J. and Ramberg, J. S. (1991), "The Ten Triumphs and Tragedies of Genichi Taguchi”,
Quality Engineering 4(2), 211-225.
Schmidt, S. R., and Launsby, R.G. (1995), Understanding Industrial Designed Experiments, 4th ed.,
Colorado Springs, CO: Air Academy Press.
Taguchi, G. (1986). Introduction to Quality Engineering, Dearborn, MI: American Supplier Institute.
Vining, G. G. and Schaub, D. (1996), “Experimental Designs for Estimating Both Mean and Variance Functions”, Journal of Quality Technology, 28(2), 135-147.