Alternate Dispersion Measures in Replicated Factorial Experiments

Neal A. Mackertich, The Raytheon Company, Sudbury MA 02421
James C. Benneyan, Northeastern University, Boston MA 02115
Peter D. Kraus, The Raytheon Company, Tewksbury MA 01876

Abstract

Any of several statistics traditionally are used to detect dispersion effects in the analysis of replicated factorial experiments, including the within-run standard deviation s, the natural logarithm ln(s+1), various signal-to-noise ratios, and others. This study examines the relative performance of each approach using recent experimental designs, with ln(s+1) typically producing the best results. An alternate approach, based on the absolute deviations from the within-run means, also is shown to increase detection but at the expense of uncontrolled false alarms.

Introduction

While designed experiments historically have focused primarily on optimizing the mean of one or more responses, they increasingly are being used to identify factors that affect response dispersion. Examples include increased interest in variance reduction as a primary objective, increased emphasis on process and product robustness, and work on designs for estimating variance functions (e.g., Box (1988), Box and Jones (1992), Byrne and Taguchi (1987), Davidian and Carroll (1987), Phadke (1989), and Vining and Schaub (1996)). In order to detect differences in response variance due to factor effects, any of several statistics traditionally are used in the analysis of replicated factorial experiments. These include the within-run standard deviation s, the natural logarithm ln(s+1), the within-run variance s², various signal-to-noise ratios, and others. This heightened emphasis on response variance increases the importance of understanding the relative performance of alternate possible measures for detecting true variance effects in typical experimental scenarios.
The current study also is motivated by the suggestion that the best measure for detecting differences in some value of interest might not necessarily be one of the most familiar statistics for estimating that parameter. We examined the relative performance of conventional and alternate types of dispersion measures in recent experimental scenarios, as well as another type of dispersion measure motivated during the course of this study by a desire to increase power by preserving associated degrees of freedom. Results summarized below indicate that ln(s+1) tends to be the best traditional measure for detecting variance effects, while larger-the-better and smaller-the-better signal-to-noise ratios tend to be quite poor. Two alternate statistics suggested during the course of this study, the absolute deviation from the within-run mean, |yi,j − ȳi|, and an approximately normalized transform of this statistic, |yi,j − ȳi|^.42, increase power in several examples over s, ln(s+1), and popular signal-to-noise ratios, although at the expense of uncontrolled type I error rates.

Traditional Dispersion Measures in DOE

The most common measures used in replicated designs to identify factors and interactions affecting response dispersion are well known in the literature (for example, see Box (1988), Box, Hunter, and Hunter (1978), Montgomery (1984), Myers and Montgomery (1995), Schmidt and Launsby (1995), Taguchi (1986)) and include:

• the within-run sample standard deviation, s, of all response replicates at each set of experimental conditions,
• the within-run sample variance, s², of all response replicates at each set of experimental conditions,
• the logarithm or natural logarithm of s or (s+1), and
• Taguchi's three most common "signal-to-noise" ratios (i.e., "nominal-the-best", "smaller-the-better", and "larger-the-better").
The within-run standard deviation s and variance s² are direct estimates of their theoretical counterparts σ and σ² and need no further motivation. The usual rationale for taking logarithms of the sample standard deviation is to approximately normalize s in order to obtain more accurate results via standard statistical tests for significance, with some small constant (by convention 1) often first added to avoid the potential problem of taking a logarithm of zero (such as due to rounding in data collection). The three most standard signal-to-noise ratios typically are referred to as "nominal-the-best", "smaller-the-better", and "larger-the-better" and have been proposed for the three common situations for which the experimental objectives are to:

• achieve response values as close as possible to a desired target value (and with as little variability about that target as possible), such as for a manufacturing dimension or output voltage,
• minimize all response values as much as possible (and with as little variability as possible), such as for percent shrinkage or deviation from round, or
• maximize all response values as much as possible (and with as little variability as possible), such as for bond strength or time until failure.

These three signal-to-noise ratios usually are estimated, respectively, by:

• "Nominal-the-best":   S/N_N = 10 · log( ȳ² / s² ),
• "Smaller-the-better": S/N_S = −10 · log( (1/n) Σ(i=1..n) yi² ), and
• "Larger-the-better":  S/N_L = −10 · log( (1/n) Σ(i=1..n) 1/yi² ).

As discussed by others, note that an important distinction of signal-to-noise ratios is that rather than decoupling the mean and variance into separate analyses, they attempt to form a single combined metric of both central tendency and variability (e.g., Gunter (1988) and Hunter (1987)).
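To make the definitions concrete, the three ratios above can be computed directly from the n replicates of a single run. A minimal sketch in Python (the function names are ours, not from the original analysis):

```python
import numpy as np

def sn_nominal(y):
    """Nominal-the-best: S/N_N = 10*log10(ybar^2 / s^2)."""
    y = np.asarray(y, dtype=float)
    return 10 * np.log10(y.mean() ** 2 / y.var(ddof=1))

def sn_smaller(y):
    """Smaller-the-better: S/N_S = -10*log10(mean(y^2))."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(y ** 2))

def sn_larger(y):
    """Larger-the-better: S/N_L = -10*log10(mean(1/y^2))."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(1.0 / y ** 2))

# For replicates (9, 10, 11): ybar = 10, s^2 = 1, so S/N_N = 10*log10(100) = 20.
print(sn_nominal([9, 10, 11]))   # -> 20.0
```

Note that sn_nominal uses the sample variance (ddof=1), matching the within-run s² above.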
Also note that although Taguchi (1986) proposed many other signal-to-noise ratios (for example, a less common nominal-the-best signal-to-noise ratio has the form S/N_N2 = −10 · log s²), the above three almost exclusively tend to be used in practice. For each ratio and their associated loss functions, the general rationale is to penalize in some way for deviations from the desired value in terms of both location and dispersion in a single measure. For further discussion, see Box (1988), Hunter (1985, 1987), Phadke (1989), Pignatiello and Ramberg (1991), Taguchi (1986), and others. Regardless of which measure is used, significant factors or interactions typically are identified via the analysis of variance, F tests, t tests, probability plots, dot plots, and the like (see Box, Hunter, and Hunter (1978), Daniel (1976), and Montgomery (1984)). Unlike the case of testing for mean effects, however, these tests now are conducted on the 2^k within-run summary statistics (in the case of two-level factorial designs), rather than the n individual outcomes within each of the 2^k runs, resulting in the loss of 2^k(n − 1) degrees of freedom and the need for either at least one empty column or variance pooling to identify dispersion effects. In addition to the above traditional measures, also of interest here is whether some alternative could result in better variance effect detection. As suggested earlier, the best test criterion for detecting differences in a process parameter (in this case response variance σ²) may not necessarily be based on the most familiar direct estimate of that parameter (e.g., the sample variance s²). This notion is similar to a statement by Box (1988) that "the two desiderata - the best choice of performance measure and the best way to employ the data to estimate it - are distinct and frequently attainable, and they ought not be confused."
Following this reasoning, one general type of alternate measure might be based on replacing each of the n observed responses within a given run i with some function of that value in such a manner that each new value now individually provides some type of measure of dispersion. The primary motivation for this approach is that each transformed value now itself becomes a type of individual dispersion measure, rather than all within-run i replicates being aggregated into a single collective dispersion measure (such as si, ln(si+1), S/N_N,i, et cetera). Statistical analysis then might be conducted on these individual values, rather than on the traditional summary measures, much as one would conduct analysis for mean effects using each of the n within-run observations as individual point estimates of the run response mean µi. Such an approach would thereby increase the number of associated degrees of freedom, which in turn may result in stronger detection power, such as due to a tighter null reference distribution (although see the caution below). The general structure of a 2² or L4 replicated design is shown in Figure 1a, with all of the n responses within each run i replaced in Figure 1b by the corresponding absolute deviation of each observation from the within-run mean, |yi,j − ȳi|, where the subscript i denotes a given set of experimental conditions (i.e., a run) and the subscript j denotes a given replication of the experiment under these conditions. The absolute deviation values then are analyzed as if they were individual responses using traditional analysis of variance or other methods, analogous to the approach employed in the study of central tendency, with the total original number of degrees of freedom preserved. This may be especially advantageous for highly replicated experiments, fairly saturated designs, or small designs, but also could produce unpredictable type I errors. (Taking absolute values prevents the n within-run (yi,j − ȳi) deviation terms from mathematically canceling each other out to sum to 0.)

Traditional Measures
                               Replicate j Result, yi,j
Run (i)   A    B   C (AB)    1      2     ...   n       ȳi    si    ln(si+1)   S/N_N,i
1        +1   +1   +1       y1,1   y1,2   ...   y1,n    ȳ1    s1    ln(s1+1)   S/N_N,1
2        +1   −1   −1       y2,1   y2,2   ...   y2,n    ȳ2    s2    ln(s2+1)   S/N_N,2
3        −1   +1   −1       y3,1   y3,2   ...   y3,n    ȳ3    s3    ln(s3+1)   S/N_N,3
4        −1   −1   +1       y4,1   y4,2   ...   y4,n    ȳ4    s4    ln(s4+1)   S/N_N,4

Figure 1a: Traditional Structure of Replicated 2² Design Using Either s, ln(s+1), or S/N_N as Dispersion Measure

Alternate Measures
                               Absolute Deviation of Each Replicate Result
Run (i)   A    B   C (AB)    1           2           ...   n            ȳi    avg |yi,• − ȳi|
1        +1   +1   +1       |y1,1−ȳ1|   |y1,2−ȳ1|   ...   |y1,n−ȳ1|    ȳ1    |y1,• − ȳ1|
2        +1   −1   −1       |y2,1−ȳ2|   |y2,2−ȳ2|   ...   |y2,n−ȳ2|    ȳ2    |y2,• − ȳ2|
3        −1   +1   −1       |y3,1−ȳ3|   |y3,2−ȳ3|   ...   |y3,n−ȳ3|    ȳ3    |y3,• − ȳ3|
4        −1   −1   +1       |y4,1−ȳ4|   |y4,2−ȳ4|   ...   |y4,n−ȳ4|    ȳ4    |y4,• − ȳ4|

Figure 1b: Alternate Structure of Replicated 2² Design Using |yi,j − ȳi| as Dispersion Measure

Comparison of Measures

Approach

In order to compare the relative performance of each measure, several factors and interactions were included in standard 2^(k−p) factorial experimental designs, and the ability to detect different magnitude variance effects of a factor was examined.
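The traditional within-run summaries and the alternate per-replicate measure described above can be computed side by side from a (runs × replicates) response array. A minimal sketch in Python (names are ours):

```python
import numpy as np

def within_run_measures(y):
    """y: (runs, replicates) array of responses.
    Returns the traditional per-run summaries s_i and ln(s_i + 1),
    plus the alternate per-replicate measures |y_ij - ybar_i|
    and |y_ij - ybar_i|**0.42, which keep one value per replicate."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean(axis=1, keepdims=True)       # within-run means ybar_i
    s = y.std(axis=1, ddof=1)                  # within-run standard deviations s_i
    abs_dev = np.abs(y - ybar)                 # |y_ij - ybar_i|, same shape as y
    return {"s": s, "ln_s1": np.log(s + 1),
            "abs_dev": abs_dev, "abs_dev_042": abs_dev ** 0.42}

m = within_run_measures([[1, 2, 3], [4, 4, 4]])
print(m["s"])        # -> [1. 0.]
print(m["abs_dev"])  # -> [[1. 0. 1.], [0. 0. 0.]]
```

The abs_dev array retains the original (runs × replicates) shape, which is what preserves the degrees of freedom in the subsequent analysis.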
Response values for each run were generated from normal distributions with parameters specified by standard first-order additive models

µ = β0 + Σr βr Xr + Σr Σm≠r βr,m Xr Xm

σ(γs) = γ0 + Σr≠s γr Xr + γs Xs + Σr Σm≠r γr,m Xr Xm

where
Xr = coded setting of factor r,
βr = mean coefficient (half-effect) of factor r,
βr,m = mean coefficient (half-effect) of interaction XrXm, r ≠ m,
γr = standard deviation coefficient (half-effect) of factor r, r ≠ s,
γr,m = standard deviation coefficient (half-effect) of interaction XrXm, r ≠ m, and
γs = standard deviation coefficient (half-effect) of factor "S" (varied in analysis).

Each of the factor settings is determined from the experimental layout and coded as +1 or −1 in the usual manner, with the half-effect size γs of factor S varied as described below. Note that γs represents the degree to which factor S affects the response standard deviation, with γs = 0 indicating that no true effect exists and larger values representing larger effects. By iterating across a range of values for γs, the operating characteristics for each dispersion measure were estimated by filling the experimental array with simulated response values using the µ and σ model equations and coded factor settings, calculating each dispersion measure, and conducting analyses of variance on these results in the usual manner. This process was repeated 100,000 times at each increment of γs to achieve reasonably accurate results, with the probability of each measure signaling a variance effect for each value of γs estimated as

Pr(Signal | γs) = (number of times the F test was significant) / (total number of simulation replications).
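The response-generation step above can be sketched for a small hypothetical 2² layout; this is our illustrative reconstruction under the stated µ and σ models, not the authors' code, and the coefficient values are placeholders:

```python
import numpy as np

# Coded settings for a hypothetical 2^2 design: columns X1, X2.
X = np.array([[+1, +1],
              [+1, -1],
              [-1, +1],
              [-1, -1]], dtype=float)

def simulate_runs(X, beta0, beta, gamma0, gamma, n_rep, rng):
    """Draw n_rep normal responses per run, with run-specific mean and
    standard deviation from the first-order models
    mu = beta0 + sum(beta_r * X_r) and sigma = gamma0 + sum(gamma_r * X_r)."""
    mu = beta0 + X @ beta        # per-run means mu_i
    sigma = gamma0 + X @ gamma   # per-run standard deviations sigma_i
    return rng.normal(mu[:, None], sigma[:, None], size=(len(X), n_rep))

rng = np.random.default_rng(0)
y = simulate_runs(X, beta0=100.0, beta=np.array([10.0, -5.0]),
                  gamma0=10.0, gamma=np.array([2.0, 0.0]), n_rep=4, rng=rng)
print(y.shape)  # -> (4, 4): one row of replicates per run
```

Each Monte Carlo replication would refill y this way, compute the chosen dispersion measure per run, and apply the ANOVA F test; repeating this 100,000 times per γ value yields the estimated Pr(Signal | γ).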
A 2^(5−1)_V Example

As an illustration, in the following analysis four replicates were generated for each of the 16 experimental runs of a 2^(5−1)_V factorial experiment, with six hypothesized factors and interactions assigned to columns, using mean and standard deviation model equations

µ = 100 + 10X1 − 5X2 + 7X3 − 4X2X3 + 5X1X4

σ = 10 + X1 + 1.5X2 − X3 + γ4X4 + 0.75X5 − 0.75X2X3 + 0.5X1X4,

such that the overall response mean and standard deviation with all factors set at their midpoints are µY = 100 and σY = 10, respectively (i.e., when all coded terms are zero such that no half-effects are added or subtracted). The half-effect γ4 for factor 4 then was incremented iteratively from γ4 = 0 to γ4 = 4 by 0.05 (corresponding to half-effects ranging from 0% to 40% of total response standard deviation at center points), at each increment of γ4 repeating the simulation and subsequent analysis of variance (at an α = .05 significance level) 100,000 times. These results are shown in Figure 2, with values along the abscissa representing the size of the half-effect of Factor 4 relative to the response standard deviation if all Xi's are set equal to zero, γ4/γ0. This scaling can be thought of as half the relative reduction in σ possible by changing between X4 = −1 and X4 = +1 (assuming no active interactions).
Also included in this analysis is the absolute deviation term raised to the 0.42 power, |yi,j − ȳi|^.42, as an attempt to normalize the absolute deviation terms before forming the test statistic, as explained below in the Discussion section. The ordering from top to bottom of the legend in Figure 2 reflects the relative ordering of each measure's power from best to worst. False alarm probabilities for each measure and their power to detect several effect sizes also are tabulated in Table 1 for further comparison.

[Figure: estimated probability of detecting effect versus relative contribution to standard deviation at center points; legend from best to worst: AbsDev, AbsDev^.42, s, ln(s+1), −10log(s²), S/N(n), var, S/N(l), S/N(s)]

Figure 2: Relative Performance of Alternate vs Traditional Measures, 2^(5−1)_V Example (4 replicates)

Probability of detection at each value of γ4 (and γ4/γ0, the relative contribution of Factor 4 to variability):

Dispersion      γ4 = 0        γ4 = 1        γ4 = 2        γ4 = 3        γ4 = 4
Measure         (γ4/γ0 = 0)   (γ4/γ0 = .1)  (γ4/γ0 = .2)  (γ4/γ0 = .3)  (γ4/γ0 = .4)
|yj − ȳ|        0.0804        0.2115        0.5294        0.8365        0.9738
|yj − ȳ|^.42    0.0812        0.1822        0.4738        0.7892        0.9610
ln(s+1)         0.0486        0.1241        0.3603        0.6615        0.8820
S/N_N1          0.0436        0.1216        0.3487        0.6433        0.8626
S/N_N2          0.0471        0.1205        0.3499        0.6460        0.8659
s               0.0472        0.1245        0.3577        0.6533        0.8748
s²              0.0383        0.0959        0.2720        0.5081        0.7161
S/N_S           0.0036        0.0032        0.0035        0.0034        0.0038
S/N_L           0.0037        0.0053        0.0076        0.0121        0.0188

Table 1: Comparison of Dispersion Measure Performance, 2^(5−1)_V Example

As these results illustrate, the choice of dispersion measure can make a significant difference in both the probability of detecting true dispersion effects (i.e., power) and the probability of erroneously signaling when no effect truly exists (i.e., significance).
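The normalizing role of the 0.42 power can be checked numerically: for a normal response, |Y − Ȳ| is strongly right-skewed, while |Y − Ȳ|^0.42 is nearly symmetric. An illustrative sketch (our check, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200_000)
abs_dev = np.abs(x)              # folded ("half") normal deviations
transformed = abs_dev ** 0.42    # approximate normalizing power transform

def skewness(v):
    """Sample skewness: third central moment over cubed standard deviation."""
    v = v - v.mean()
    return np.mean(v ** 3) / np.mean(v ** 2) ** 1.5

print(skewness(abs_dev))      # strongly right-skewed (near 1 for half-normal)
print(skewness(transformed))  # close to zero after the transform
```

The half-normal distribution has theoretical skewness of about 0.995, which the transform reduces to nearly zero.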
In this example, s, ln(s+1), S/N_N, and S/N_N2 have comparable operating characteristics and false alarm probabilities close to the intended α = 0.05 (α̂_ln(s+1) ≈ 0.0486, α̂_s ≈ 0.0472, α̂_S/N(N) ≈ 0.0436, and α̂_S/N(N2) ≈ 0.0471). Note that ln(s+1) and s exhibit slightly higher power for all effect sizes, which agrees with studies reported elsewhere (Schmidt and Launsby (1995), Box (1988), Gunter (1988)). The within-run variance, s², has significantly lower power, with S/N_S and S/N_L both essentially being useless, with α̂_S/N(S) ≈ 0.0036 and α̂_S/N(L) ≈ 0.0037 and with negligible detection power across all effect sizes. As pointed out by Hunter (1987) and Montgomery (1996), this is not surprising given that these measures inseparably confound location and dispersion effects. Awareness of these tradeoffs can be important information to practitioners in selecting a response measure and interpreting analysis results. Interestingly, the absolute deviation measures exhibit consistently higher power than the others, although at the expense of higher false alarm probabilities (α̂ = 0.0804 and α̂ = 0.0812, respectively), possibly due to the effect of taking absolute values on terms slightly less than the average.

Adjustment for Equal False Alarm Probabilities

Because the differences in false alarm rates noted above make direct comparison difficult, as an exercise the (estimated) significance levels for each measure were adjusted empirically to all equal α̂ = 0.05. With γ4 = 0, all 100,000 F test values for each measure were sorted to locate the empirical upper 5th percentile, which then was used as an empirical critical value for that measure, denoted herein as F'. These empirical F'(α̂ = .05) values then were used in place of the usual critical value as the half-effect γ4 was increased iteratively as previously. The resultant α-adjusted operating characteristics for the same 2^(5−1)_V example as above are compared in Figure 3 and Table 2.
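The empirical adjustment can be sketched as follows; the function names are ours, and the null F statistics are assumed to have been simulated already (γ4 = 0):

```python
import numpy as np

def empirical_critical_value(f_null, alpha=0.05):
    """Upper-tail empirical critical value F': the value exceeded by a
    fraction alpha of the F statistics simulated under no true effect."""
    return np.quantile(np.asarray(f_null, dtype=float), 1.0 - alpha)

def adjusted_power(f_alt, f_crit):
    """Fraction of F statistics simulated under an effect that exceed F'."""
    return np.mean(np.asarray(f_alt, dtype=float) > f_crit)

# Toy illustration with 100 null draws in place of the study's 100,000:
f_crit = empirical_critical_value(range(1, 101), alpha=0.05)
print(adjusted_power([96.0, 97.0, 10.0], f_crit))
```

Using the same empirical F' for every measure equalizes the estimated false alarm probabilities, so the remaining differences in detection rates reflect power alone.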
As shown, the same performance comparisons and orderings as previously still hold after adjusting all measures for equal specificity, but with smaller differences in relative power. The conventional measures ln(s+1), s, S/N_N, and S/N_N2 again are grouped together, with the sample variance s² exhibiting significantly less power, and S/N_S and S/N_L again being essentially useless for detecting dispersion effects. Similar to previously, both absolute deviation statistics have greater power than traditional measures.

[Figure: estimated probability of detecting effect versus relative contribution to standard deviation at center points; measures as in Figure 2]

Figure 3: Relative Performance of Alternate vs Traditional Measures, 2^(5−1)_V Example (4 replicates) (After empirical adjustment for equal false alarm probabilities)

Of course, determining the empirical upper 5th percentile of the corresponding F' values or otherwise adjusting for a desired α probability will not be possible in practice, but the above example illustrates that relative improvements are possible even with false alarm probabilities somehow set equal. As illustrated by the following examples, similar benefits also are obtained under other experimental conditions (design size, number of replicates, degree of saturation).

Probability of detection at each value of γ4 (and γ4/γ0, the relative contribution of Factor 4 to variability):

Dispersion      γ4 = 0        γ4 = 1        γ4 = 2        γ4 = 3        γ4 = 4
Measure         (γ4/γ0 = 0)   (γ4/γ0 = .1)  (γ4/γ0 = .2)  (γ4/γ0 = .3)  (γ4/γ0 = .4)
|yj − ȳ|        0.0500        0.1385        0.4172        0.7538        0.4498
|yj − ȳ|^.42    0.0500        0.1286        0.3846        0.7176        0.9368
ln(s+1)         0.0500        0.1270        0.3658        0.6673        0.8854
S/N_N1          0.0500        0.1353        0.3754        0.6725        0.8807
S/N_N2          0.0500        0.1266        0.3615        0.6586        0.8737
s               0.0500        0.1303        0.3688        0.6649        0.8821
s²              0.0500        0.1189        0.3230        0.5741        0.7772
S/N_S           0.0500        0.0476        0.0467        0.0480        0.0504
S/N_L           0.0500        0.0651        0.0864        0.1168        0.1544

Table 2: Comparison of Dispersion Measure Performance, 2^(5−1)_V Example (After empirical adjustment for equal false alarm probabilities)

A Second Example (2³)

A fairly sparse design was used in the above example, resulting in a good number of degrees of freedom associated with the SSE term to estimate between-treatment variance (8 for traditional measures versus 56 for the absolute deviation measure). Alternatively, if a more saturated 2³ design with 5 replicates were used to test which of five factors or interactions appear significant, the error term would have only 2 degrees of freedom using traditional measures versus 34 for the absolute deviation measure. Use of smaller or more saturated designs therefore may impact traditional measures more adversely than the absolute deviation measures.
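The degrees-of-freedom accounting behind these counts can be sketched as simple arithmetic; the effect counts (7 assigned columns for the 2^(5−1) example, 5 for the 2³ example) are our reading of the numbers quoted above:

```python
def error_df(runs, effects, replicates):
    """Error degrees of freedom for dispersion analysis.
    Traditional measures analyze one summary statistic per run;
    the absolute-deviation measure analyzes every replicate."""
    traditional = runs - 1 - effects
    abs_deviation = runs * replicates - 1 - effects
    return traditional, abs_deviation

# 2^(5-1) example: 16 runs, 7 estimated effects, 4 replicates
print(error_df(16, 7, 4))   # -> (8, 56)
# More saturated 2^3 example: 8 runs, 5 effects, 5 replicates
print(error_df(8, 5, 5))    # -> (2, 34)
```

The gap between the two df counts is exactly the runs × (replicates − 1) degrees of freedom that the summary statistics discard.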
In order to examine the relative performance of each measure in such cases, 5 replicates for each run of a full factorial 2³ experimental design were generated using the mean and standard deviation model equations

µ = 5071 + 17.25X1 + 17.25X2 − 1.5X3 − 3.5X1X3 − 4.75X2X3

σ = 7.5 + γ1X1 − 0.73X2 + 0.96X3 − 1.27X1X3 − 0.95X2X3

with the standard deviation half-effect of factor 1, γ1, incremented as above and with each analysis of variance again using α = 0.05 significance levels. These results are shown in Figure 4a, with Figure 4b again based on empirically adjusted α̂ = 0.05 significance levels. To facilitate further comparison, Tables 3a and 3b also summarize these results.

[Figure: estimated probability of detecting effect versus relative contribution to standard deviation at center points]

Figure 4a: Relative Performance of Alternate vs. Traditional Dispersion Measures, 2³ Example (5 replicates)

[Figure: same comparison after empirical adjustment for equal false alarm probabilities]

Figure 4b: Relative Performance of Alternate vs. Traditional Dispersion Measures, 2³ Example (5 replicates) (After empirical adjustment for equal false alarm probabilities)

Probability of detection at each value of γ1 (and γ1/γ0, the relative contribution of Factor 1 to variability):

Dispersion      γ1 = 0        γ1 = .75      γ1 = 1.5      γ1 = 2.25     γ1 = 3
Measure         (γ1/γ0 = 0)   (γ1/γ0 = .1)  (γ1/γ0 = .2)  (γ1/γ0 = .3)  (γ1/γ0 = .4)
|yj − ȳ|        0.0807        0.1562        0.3647        0.6423        0.8610
|yj − ȳ|^.42    0.0746        0.1526        0.3592        0.6411        0.8714
ln(s+1)         0.0466        0.0831        0.1660        0.2761        0.3700
S/N_N1          0.0453        0.0802        0.1610        0.2683        0.3585
S/N_N2          0.0455        0.0818        0.1641        0.2719        0.3619
s               0.0462        0.0690        0.1304        0.2105        0.2725
s²              0.0393        0.0495        0.0833        0.1256        0.1558
S/N_S           0.0567        0.0569        0.0575        0.0575        0.0571
S/N_L           0.0579        0.0582        0.0578        0.0569        0.0568

Table 3a: Comparison of Dispersion Measure Performance, 2³ Example

Dispersion      γ1 = 0        γ1 = .75      γ1 = 1.5      γ1 = 2.25     γ1 = 3
Measure         (γ1/γ0 = 0)   (γ1/γ0 = .1)  (γ1/γ0 = .2)  (γ1/γ0 = .3)  (γ1/γ0 = .4)
|yj − ȳ|        0.0500        0.1051        0.2786        0.5431        0.7967
|yj − ȳ|^.42    0.0500        0.1097        0.2875        0.5619        0.8226
ln(s+1)         0.0500        0.0886        0.1767        0.2919        0.3881
S/N_N1          0.0500        0.0877        0.1756        0.2907        0.3842
S/N_N2          0.0500        0.0896        0.1785        0.2933        0.3868
s               0.0500        0.0752        0.1407        0.2256        0.2913
s²              0.0500        0.0632        0.1057        0.1577        0.1941
S/N_S           0.0500        0.0504        0.0509        0.0577        0.0504
S/N_L           0.0500        0.0498        0.0495        0.0487        0.0484

Table 3b: Comparison of Dispersion Measure Performance, 2³ Example (After empirical adjustment for equal false alarm probabilities)

As previously, note that ln(s+1), S/N_N, and S/N_N2 exhibit higher power and roughly the same false alarm rates as s; S/N_L and S/N_S again are fairly useless; and |yi,j − ȳi| and |yi,j − ȳi|^.42 exhibit higher power across all magnitudes of variance effects.
In contrast with the first example, note that each measure exhibits lower power as γ1 increases than previously, which may be attributable to the differences in design saturation and degrees of freedom associated with the error terms. Use of the absolute deviation also may be appealing for another reason, namely that if the above design had been fully saturated, standard ANOVA or other analysis of these terms offers an alternative to the problematic practice of pooling up or down sums of squares that are estimated to have negligible effects. As discussed by Montgomery (1997), "the pooling of mean squares (variances) is a procedure that has long been known to produce considerable bias in test results."

Number of Replicates

In order to examine the effect of the number of replications per run on the relative performance of each measure, Figures 5 and 6 illustrate the effect of using only n = 2 replicates per run and increasing to 8 replicates per run, respectively, for the same 2³ example as above (where previously n = 5 replicates). Figures 5a and 6a are for the unadjusted case, and Figures 5b and 6b are for the case with all false alarm probabilities empirically adjusted to α̂ = 0.05 in the manner described above. Table 4 also compares the performance of ln(s+1), |yi,j − ȳi|, and |yi,j − ȳi|^.42 across a range of other numbers of replicates. As these results illustrate, the difference in performance can vary significantly for different numbers of replicates, with the ln(s+1) term consistently exhibiting among the best detection power of the traditional measures across all effect and replicate sizes. The absolute deviation terms exhibit better power but again at the expense of uncontrolled type I error rates.
The |yi,j − ȳi| false detection rate for the n = 2 replicate case is inflated above the desired α = 0.05 well beyond any useful point, perhaps due to the lack of central limit effects, whereas it appears to decrease toward 0 as n increases. As a general rule, therefore, use of this measure for a small number of replications should be avoided; most cases using four or more replications examined to date resulted in false detection rates of roughly 0.10 or lower (with α = .05). Also note that none of the measures are effective in this example for n = 2. Table 4 summarizes the benefits of larger numbers of replicates for ln(s+1) on both power and convergence to the desired α.

Discussion

In all examined cases, ln(s+1) and both nominal signal-to-noise ratios consistently produced equal or better power than other traditional measures, followed by s and s², while the larger-the-better and smaller-the-better ratios were relatively useless. The increased power of |yi,j − ȳi| and |yi,j − ȳi|^.42 to detect true dispersion effects is interesting, although at the expense of higher false alarm rates. In one 2^(5−1)_V example, power to detect dispersion half-effects of roughly 25% of the average total response variance (when all other factors are at their center points) was increased to approximately 0.64 from approximately 0.51 for the conventional s, ln(s+1), S/N_N1, and S/N_N2 measures. Other examples suggest similar benefits for different experiment sizes, saturation levels, and numbers of replicates.
[Figure: estimated probability of detecting effect versus relative contribution to standard deviation at center points]

Figure 5a: 2 Replicates per Run, 2³ Example

[Figure: same comparison after empirical adjustment for equal false alarm probabilities]

Figure 5b: 2 Replicates per Run, 2³ Example (After empirical adjustment for equal false alarm probabilities)

[Figure: estimated probability of detecting effect versus relative contribution to standard deviation at center points]

Figure 6a: 8 Replicates per Run, 2³ Example

[Figure: same comparison after empirical adjustment for equal false alarm probabilities]

Figure 6b: 8 Replicates per Run, 2³ Example (After empirical adjustment for equal false alarm probabilities)

Probability of detection at each value of γ1 (and γ1/γ0, the relative contribution of Factor 1 to σ; γ1/γ0 = 0, .1, .2, .3, .4):

Number of   Dispersion      γ1 = 0   γ1 = .75  γ1 = 1.5  γ1 = 2.25  γ1 = 3
Replicates  Measure
2           ln(s+1)         0.0308   0.0334    0.0434    0.0599     0.0771
2           |yj − ȳ|        0.6553   0.6764    0.7248    0.7954     0.8623
2           |yj − ȳ|^.42    0.6395   0.6620    0.7106    0.7818     0.8555
4           ln(s+1)         0.0452   0.0691    0.1290    0.2053     0.2747
4           |yj − ȳ|        0.1402   0.2141    0.4051    0.6463     0.8435
4           |yj − ȳ|^.42    0.1283   0.2051    0.3934    0.6402     0.8493
6           ln(s+1)         0.0481   0.0942    0.2023    0.3423     0.4540
6           |yj − ȳ|        0.0468   0.1193    0.3376    0.6413     0.8790
6           |yj − ȳ|^.42    0.0440   0.1199    0.3355    0.6434     0.8909
8           ln(s+1)         0.0488   0.1152    0.2595    0.4297     0.5288
8           |yj − ȳ|        0.0171   0.0734    0.2947    0.6481     0.9058
8           |yj − ȳ|^.42    0.0168   0.0767    0.2960    0.6553     0.9199
10          ln(s+1)         0.0488   0.1348    0.3138    0.5108     0.6057
10          |yj − ȳ|        0.0067   0.0470    0.2593    0.6546     0.8281
10          |yj − ȳ|^.42    0.0069   0.0513    0.2654    0.6667     0.4408
12          ln(s+1)         0.0490   0.1530    0.3654    0.5815     0.6709
12          |yj − ȳ|        0.0027   0.0292    0.2331    0.6630     0.4437
12          |yj − ȳ|^.42    0.0027   0.0325    0.2410    0.6755     0.9562
14          ln(s+1)         0.0495   0.1707    0.4062    0.6187     0.6736
14          |yj − ȳ|        0.0009   0.0205    0.2096    0.6714     0.9574
14          |yj − ȳ|^.42    0.0011   0.0233    0.2188    0.6875     0.9671
16          ln(s+1)         0.0496   0.1882    0.4420    0.6459     0.6763
16          |yj − ȳ|        0.0004   0.0140    0.1901    0.6765     0.9658
16          |yj − ȳ|^.42    0.0004   0.0164    0.2009    0.6959     0.9754

Table 4: Effect of Number of Replicates (unadjusted 2³ case, desired α = .05)

Although the absolute deviation measures are relatively easy to implement by hand or in a spreadsheet and present little additional work for the analyst, their false alarm rates can vary dramatically when using traditional analysis of variance methods. Ideally, a more exact mathematical test based on the underlying random variable or reference distribution therefore could be developed.
For example, assuming Y ~ normal, the absolute deviation from the mean has a folded or "half normal" distribution related to a central chi density with one degree of freedom (Johnson, Kotz, and Balakrishnan (1994)), suggesting some type of test of the upper chi tail. Alternatively, the absolute deviation can be transformed to approximate normality by raising it to a power somewhere between 0.35 and 0.55 depending upon the specific criteria, with |Y − Ȳ|^.4168 being based on the Kullback-Leibler information.

References

Box, G. E. P. (1988), "Signal-to-Noise Ratios, Performance Criteria, and Transformations" (with discussions), Technometrics, 30, 1-40.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John Wiley & Sons.
Byrne, D. M., and Taguchi, S. (1987), "The Taguchi Approach to Parameter Design", Quality Progress, 20(12), 19-26.
Daniel, C. (1976), Applications of Statistics to Industrial Experimentation, New York: John Wiley & Sons.
Davidian, M., and Carroll, R. J. (1987), "Variance Function Estimation", Journal of the American Statistical Association, 82, 1079-1091.
Gunter, B. (1988), "Discussion: Signal-to-Noise Ratios, Performance Criteria, and Transformations", Technometrics, 30, 32-35.
Hunter, J. S. (1985), "Statistical Design Applied to Product Design", Journal of Quality Technology, 17(4), 210-221.
Hunter, J. S. (1987), "Signal to Noise Ratio Debated", Quality Progress, 20(5), 7-9.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994), Continuous Univariate Distributions, New York: John Wiley & Sons.
Montgomery, D. C. (1996), Introduction to Statistical Quality Control, 3rd ed., New York: John Wiley & Sons.
Montgomery, D. C. (1997), Design and Analysis of Experiments, 4th ed., New York: John Wiley & Sons.
Myers, R. H., and Montgomery, D. C. (1995), Response Surface Methodology, New York: John Wiley & Sons.
Phadke, M. S. (1989), Quality Engineering Using Robust Design, Princeton NJ: Prentice-Hall.
Pignatiello, J. J., and Ramberg, J. S. (1991), "The Ten Triumphs and Tragedies of Genichi Taguchi", Quality Engineering, 4(2), 211-225.
Schmidt, S. R., and Launsby, R. G. (1995), Understanding Industrial Designed Experiments, 4th ed., Colorado Springs, CO: Air Academy Press.
Taguchi, G. (1986), Introduction to Quality Engineering, Dearborn, MI: American Supplier Institute.
Vining, G. G., and Schaub, D. (1996), "Experimental Designs for Estimating Both Mean and Variance Functions", Journal of Quality Technology, 28(2), 135-147.