The Calculation of Average Run Length for a Nonparametric CUSUM Procedure

Dennis W. King, STATKING Consulting Inc.
Michael T. Longnecker, Texas A&M University

ABSTRACT

A method of calculating the probability mass function of the Wilcoxon signed rank statistic is discussed. The algorithm presented, written using PROC IML, is based on the work of Milton (1970). A brief summary of a newly developed error bound for this algorithm is given. For process control schemes based on the Wilcoxon signed rank statistic, calculating the Average Run Length (ARL) of the scheme requires the probability mass function of the Wilcoxon statistic under a shift in the location parameter. The application of the algorithm's output to the calculation of ARL is shown in detail.

1. INTRODUCTION

The use of statistical procedures to monitor repetitive manufacturing processes has become quite widespread. A characteristic of the process is observed over time. The observed random variables may be measurements of the weight of rolled coils of processed steel, or the measured dimension of a die that was cut to a specified length. The main thrust is to track whether the manufactured product is near a management-specified goal value, say $\theta_0$. In statistical terms this means monitoring the mean of the process random variables. Many authors have discussed sequential statistical tests that indicate when a process goes "out of control", i.e., when the process mean drifts away from $\theta_0$. Some of the authors who have addressed this topic are Shewhart (1931), Page (1954) and Lucas (1976). All of these articles center on the calculation of

$$S_n = \sum_{i=1}^{n} w_{n-i}\,\big(v(\mathbf{X}_i) - k\big) \tag{1.1}$$

at each time point $n$ in the process. The components of this sum are:

* $w_{n-i}$, the weight placed on the observation at each time point;
* $v(\mathbf{X}_i)$, some condensation of the data vector $\mathbf{X}_i = (X_{1i}, \ldots, X_{gi})$ observed at time point $i$;
* $k$, a reference value.

For a CUSUM procedure, we set all $w_i = 1$. The choice of $v(\mathbf{X}_i)$ is arbitrary, but a common choice is $\bar{X} - \theta_0$. We test $H_0: \theta = \theta_0$ and reject $H_0$ in favor of $H_a: \theta > \theta_0$ when $S_n \ge h$, where

$$S_n = \max\big(0,\; S_{n-1} + v(\mathbf{X}_n) - k\big), \qquad S_0 = 0. \tag{1.2}$$

The reference value $k$ is chosen so that the decision interval $h$ does not depend on the time point, i.e., the rejection region is constant. Note that we are making an unknown number of statistical tests, since the procedure continues indefinitely until we reject $H_0$. The random variable $N$ denotes the number of tests performed before the process is signaled out of control. We call $E(N)$, the expected value of $N$, the average run length (ARL) of the procedure. The value $E(N)$ is the common "yardstick" for comparing one type of charting procedure with another.

When the condensation statistic $v(\mathbf{X}_i)$ is a continuous random variable, such as $\bar{X} - \theta_0$, the ARL is given by an integral equation (see Lucas (1976)) that must be solved by numerical integration. When $v(\mathbf{X}_i)$ is a discrete random variable, Brook and Evans (1972) have shown that the ARL is given by

$$E(N) = (\mathbf{I} - \mathbf{P})^{-1}\mathbf{1}, \tag{1.3}$$

where $\mathbf{P}$ is the probability transition matrix, restricted to the transient (non-signaling) states, of a Markov chain whose states are the values of the discrete CUSUM; $\mathbf{I}$ is the identity matrix and $\mathbf{1}$ is a column vector of ones. The entry of $E(N)$ that corresponds to the starting state $S_0 = 0$ is the ARL.

2. THE DISTRIBUTION OF THE WILCOXON STATISTIC

When $g > 1$, we may use the following in place of $\bar{X} - \theta_0$ in (1.1):

$$v(\mathbf{X}_i) = SR_i = \sum_{j=1}^{g} \psi_j\, a(R_j^+), \tag{2.1}$$

where $R_j^+$ is the rank of $|X_j - \theta_0|$ among $|X_1 - \theta_0|, \ldots, |X_g - \theta_0|$, $\psi_j = \psi(X_j - \theta_0)$, and $\psi(t) = -1$ if $t < 0$ and $\psi(t) = 1$ if $t \ge 0$. For the set of scores $a(j) = j$, $j = 1, \ldots, g$, the signed linear rank statistic in (2.1) becomes the Wilcoxon signed rank statistic, denoted henceforth as $W$. Because this condensation statistic is discrete, the ARL for this type of CUSUM is given exactly by (1.3). The elements of the probability transition matrix in (1.3) are formed from the probability distribution of $W$. The states of the Markov chain at any particular step are the values $0, 1, \ldots, h$, where state $h$ collects all values $S_n \ge h$. The transition probabilities are then

$$\begin{aligned}
P_{i0} &= P(S_n = 0 \mid S_{n-1} = i) = P(W \le k - i),\\
P_{ij} &= P(S_n = j \mid S_{n-1} = i) = P(W = k + j - i), \qquad 0 < j < h,\\
P_{ih} &= P(S_n \ge h \mid S_{n-1} = i) = P(W \ge k + h - i).
\end{aligned} \tag{2.2}$$

To obtain the distribution of $W$, we first find the distribution of $W^+ = \sum_{j=1}^{g} \Psi_j\, a(R_j^+)$, where $a(j)$ are as in (2.1), $\Psi_j = \Psi(X_j - \theta_0)$, and $\Psi(t) = 0$ if $t < 0$ and $\Psi(t) = 1$ if $t \ge 0$. We then apply the transformation

$$W = 2\big(W^+ - E(W^+)\big) = 2W^+ - \frac{g(g+1)}{2}, \qquad P(W = w) = P\Big(W^+ = \frac{w}{2} + E(W^+)\Big), \tag{2.3}$$

with $E(W^+) = g(g+1)/4$, its in-control mean.

When the process is in control, i.e., $\theta = \theta_0$, all orderings of the ranks are equally likely, and it can be shown (see Randles and Wolfe (1979), p. 52) that

$$P_{\theta_0}(W^+ = c) = \frac{d_g(c)}{2^g}, \tag{2.4}$$

where $d_g(c)$ is the number of subsets of the integers $\{1, \ldots, g\}$ whose elements sum to $c$, and $c = 0, 1, \ldots, g(g+1)/2$.

When the process is out of control ($\theta > \theta_0$), the orderings of the ranks are no longer equally likely, and additional notation is necessary. Without loss of generality, take $\theta_0 = 0$. Let $|X|_{(1)} \le |X|_{(2)} \le \cdots \le |X|_{(g)}$ be the ordered values of $|X_1|, \ldots, |X_g|$. Denote $Z_{gi} = 1$ if the observation with the $i$th smallest absolute value, $|X|_{(i)}$, is nonnegative and $Z_{gi} = 0$ if it is negative. The rank configurations are then $\mathbf{Z}_g = (Z_{g1}, Z_{g2}, \ldots, Z_{gg})$, so that $W^+ = \sum_i i\,Z_{gi}$. We obtain $P_\theta(W^+ = c)$ by summing $P_\theta(\mathbf{Z}_g = \mathbf{z}_g)$ over all rank configurations $\mathbf{z}_g$ for which $\sum_i i\,z_{gi} = c$. For example, with $g = 5$, $P_\theta(W^+ = 12) = P_\theta(\mathbf{Z}_5 = 00111) + P_\theta(\mathbf{Z}_5 = 11011)$. Klotz (1963) has shown that

$$P_\theta(\mathbf{Z}_g = \mathbf{z}_g) = g! \int_0^\infty \int_0^{t_g} \cdots \int_0^{t_2} \prod_{j=1}^{g} f_0(t_j - \delta_j \theta)\, dt_1 \cdots dt_g, \tag{2.5}$$

where $f_0$ is the pdf of $F_0$ and $\delta_j = 2z_j - 1$. Denote this integral as $I_g$.

Milton (1970) recognized that the region of integration for this particular problem, depicted in Figure 1 in the appendix, allows a convenient approximation formula. Figure 1 shows that if we use a simple midpoint formula for numerical integration in each dimension, the integral over the region we need is

$$I_2 \approx m^2\Big(\frac{f_{11}f_{21} + f_{12}f_{22} + f_{13}f_{23}}{2} + f_{11}f_{22} + f_{11}f_{23} + f_{12}f_{23}\Big), \tag{2.6}$$

where $m = (b-a)/3$ for $M = 3$ subintervals, and $f_{ji}$ is the function $f_0(t_j - \delta_j\theta)$ of (2.5) evaluated at the midpoint of the $i$th subinterval. This generalizes to $M$ subintervals in 2 dimensions as

$$I_2 \approx m^2\Big(\frac{1}{2}\sum_{i=1}^{M} f_{1i}f_{2i} + \sum_{1 \le i < j \le M} f_{1i}f_{2j}\Big). \tag{2.7}$$

We can generalize further to $g$ dimensions and state the above in matrix terms, which allows us to program the algorithm using PROC IML. This method is much easier to program than quadrature methods and requires only $M \cdot g$ storage locations.

A brief description of the error bounding for this algorithm follows; for the details, see King and Longnecker (1990). The numerical integration method used here is subject to two sources of error:

* the error introduced by truncating the infinite region of integration, denoted $\epsilon_{trunc}$;
* the error introduced by using a Newton-Cotes formula to approximate the integral over the finite region, denoted $\epsilon_{calc}$.

The total error is then $\epsilon_{trunc} + \epsilon_{calc}$. When using the algorithm, we first specify a truncation error, say $\epsilon_0$. King and Longnecker (1990) have shown that if the truncation boundary $a$, shown in Figure 1, is chosen to satisfy condition (2.8), then $\epsilon_{trunc} \le \epsilon_0$. Once $a$ has been determined, we select $m$, the width of the interval in the midpoint integration procedure, by iterating on $m$ until

$$\Big[\big(f_{\max}(m) + D(m)\big)^g - \big(f_{\max}(m)\big)^g\Big] \cdot \mathrm{NADDS} < \epsilon_{calc}, \tag{2.10}$$

where NADDS, the total number of additions performed, is approximately $\binom{M+g-1}{g}$ with $M = a/m$. The value $f_{\max}(m)$ is the height of the highest rectangle in the integration region, and $D(m)$ is the maximum value of the integration error over the region. King and Longnecker (1990) give these values for specific symmetric distributions. If we follow this two-step procedure, the total error is at most $\epsilon$. This error bounding procedure is implemented in the SAS macros described below.
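The midpoint idea behind (2.5)-(2.7) can be sketched in Python as a nested product recursion. This is a minimal illustration and rests on several assumptions. First, $F_0$ is taken as the standard Normal. Second, it uses a fixed truncation boundary `a` and cell count `M` rather than the error-driven choice of $a$ and $m$ described above. Third, a cell shared by two consecutive coordinates gets half weight, since only half of it lies in the region $t_{j-1} < t_j$. For $g = 2$ this gives the half-weighted diagonal plus the full off-diagonal sum; for $g > 2$ the half weighting is a choice made in this sketch. Only one length-$M$ vector is kept per level. The names `config_prob` and `shifted_pmf_wplus` are hypothetical, and this is not the authors' PROC IML code.

```python
import math
from itertools import product

import numpy as np


def normal_pdf(t):
    """Standard Normal density, used here as an assumed choice of f_0."""
    return np.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def config_prob(z, theta, a=6.0, M=600, pdf=normal_pdf):
    """Approximate P_theta(Z_g = z) from (2.5) on the truncated range [0, a].

    z[0] is the sign indicator for the smallest |X|, z[-1] for the largest.
    The range is split into M cells of width m = a / M and each integral is
    replaced by a midpoint sum; a cell shared by t_{j-1} and t_j gets half
    weight because only half of it lies in the region t_{j-1} < t_j.
    """
    g = len(z)
    m = a / M
    mid = (np.arange(M) + 0.5) * m   # cell midpoints
    inner = np.ones(M)               # inner integral over t_1..t_{j-1} at each midpoint
    for zj in z:                     # integrate t_1 first, then t_2, ..., t_g
        delta = 2 * zj - 1           # delta_j = 2 z_j - 1
        w = m * pdf(mid - delta * theta) * inner   # one midpoint term per cell
        inner = np.cumsum(w) - 0.5 * w             # cells below in full, shared cell halved
    return math.factorial(g) * w.sum()


def shifted_pmf_wplus(g, theta, **kwargs):
    """P_theta(W+ = c) for c = 0, ..., g(g+1)/2, summing configurations with sum_i i*z_i = c."""
    pmf = [0.0] * (g * (g + 1) // 2 + 1)
    for z in product((0, 1), repeat=g):
        c = sum(i * zi for i, zi in enumerate(z, start=1))
        pmf[c] += config_prob(z, theta, **kwargs)
    return pmf
```

For example, `shifted_pmf_wplus(5, 1.0)` approximates the distribution of $W^+$ for $g = 5$ under a one-standard-deviation shift. Its last entry should be close to $\Phi(1)^5 \approx 0.4216$, since $W^+ = 15$ requires all five observations to be positive. Entries can be mapped to $W$ through $w = 2c - g(g+1)/2$ from (2.3).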
3. SAS MACROS FOR WILCOXON CUSUM

The programming needed to implement the procedures of Section 2 has three parts:

(a) calculate the $a$ and $m$ needed to achieve the specified error bound, $\epsilon$, for each probability in the probability mass function of $W$, for the given underlying distribution and group size;
(b) compute the probability mass function of $W$ under the specified shift according to the Milton algorithm;
(c) calculate the ARL of the Wilcoxon CUSUM using the probability mass function of $W$.

This is done with three SAS macros, whose call statements are of the form

%BOUND(P=,SHIFT=,DIST=,DF=);
%WALTDIST(P=,SHIFT=,DIST=,DF=);
%ARLWILC(P=,SHIFT=,DIST=,DF=,H=,K=);

where

DIST = name of the underlying distribution of the data (NORM, LOGS, DEXP, T)
P = sample size at each time point
H = decision interval
K = reference value
DF = degrees of freedom for the T distribution
SHIFT = the shift from the target mean in standard deviation units

You may run only the error bounding macro, the error bounding and Wilcoxon distribution macros together, or all three macros. The outputs of these macros are shown in the appendix.

4. ARL COMPARISONS

The ARL under shifts of various sizes $\theta$ can then be calculated. Note that the underlying distribution must be specified when computing the ARL of the nonparametric statistic $v(\mathbf{X}_i)$ under a shift $\theta$. These ARLs can then be compared with those of a comparable CUSUM procedure (in terms of in-control ARL) with $v(\mathbf{X}_i) = \bar{X} - \theta_0$. Table 1 in the appendix shows the ARL of a particular Wilcoxon CUSUM for various underlying distributions. The errors from using the estimated probabilities described above are less than 1%. Table 1 shows that when the data come from very heavy-tailed distributions, the ARL is shorter for the Wilcoxon CUSUM than for the parametric CUSUM. For the other distributions the loss in ARL is small, so the nonparametric procedure can be considered a global procedure that protects against non-Normal data when samples are small.

5. A DATA EXAMPLE

As an example, consider data generated from the soft drink industry. At the end of each hour of production, five bottles of soft drink are sampled and their fill levels are recorded. The data values in Table 2 are the deviations of the fill levels from the target value of 8 ounces. The data are assumed to be sampled from a Normal universe with $\sigma = 1$ ounce. An upward shift of .25 ounces in the process mean has occurred. The nonparametric CUSUM detects this shift at time point 5, and the parametric procedure signals slightly sooner, at time point 4. This is consistent with the ARL results in Table 1. The calculations needed to implement the nonparametric scheme are not difficult and could be carried out by technicians or line personnel.

6. DISCUSSION

The procedure discussed above can calculate the values of the probability mass function of the Wilcoxon signed rank statistic to 4 decimal places for most symmetric densities. It can also calculate the ARL of the Wilcoxon CUSUM procedure described above. However, the code has some limitations. The procedure is inadequate for the scaled T distribution with $\le 1$ degree of freedom. This is because the step size, $m$, cannot be varied across the region of integration. As a result, the storage and CPU requirements in this situation are infeasible, at least on most microcomputers. For other symmetric distributions, the time needed to calculate the ARL of the Wilcoxon CUSUM still prohibits generating large tables of ARLs. The procedure is therefore more of a research tool, one that allows the ARLs of this procedure to be compared with the ARLs of parametric approaches.

REFERENCES

Brook, D. and Evans, D.A. (1972). "An Approach to the Probability Distribution of CUSUM Run Length", Biometrika, 59, 539-549.
King, D.W. and Longnecker, M.T. (1990). "Computing the Distribution of the Wilcoxon Statistic with Applications to Process Control", to appear in Communications in Statistics - Simulation and Computation.
Klotz, J.H. (1963). "Small Sample Power and Efficiency for the One Sample Wilcoxon and Normal Scores Tests", Annals of Mathematical Statistics, 34, 624-632.
Lucas, J.M. (1976). "The Design and Use of V-Mask Control Schemes", Journal of Quality Technology, 8, 1-12.
Milton, R.C. (1970). Rank Order Probabilities: Two-Sample Normal Shift Alternatives, New York: John Wiley & Sons.
Page, E.S. (1954). "Continuous Inspection Schemes", Biometrika, 41, 100-114.
Randles, R.H. and Wolfe, D.A. (1979). Introduction to the Theory of Nonparametric Statistics, New York: John Wiley & Sons.
Shewhart, W.A. (1931). Economic Control of Quality of Manufactured Product, New York: Van Nostrand.
van Dobben de Bruyn, D.S. (1968). Cumulative Sum Tests: Theory and Practice, London: Griffin.

Table 1. Comparison of Parametric and Nonparametric Exact ARLs for Various Underlying Distributions, g = 5

                                      Shift
Procedure   Distribution    0       .25     .5      1.0     2.0     3.0
CUSUM       Normal          100.9   30.6    15.8    7.9     5.1     5.0
Wilcoxon    Normal          100.9   36.1    17.9    8.8     5.5     5.0
CUSUM       Double Exp.     100.5   31.0    15.5    8.5     5.0     5.0
Wilcoxon    Double Exp.     100.9   27.3    14.1    8.0     5.7     5.2
CUSUM       Logistic        100.5   30.0    16.0    8.5     5.0     5.0
Wilcoxon    Logistic        100.9   33.2    16.4    8.4     5.6     5.1
CUSUM       T* 2 df         99.5    42.0    19.0    8.5     5.0     5.0
Wilcoxon    T* 2 df         100.9   26.0    13.3    7.8     5.8     5.4
FIG. 1 Region of Integration for the Milton Algorithm (axes x1 and x2; truncation region bounded at c; subintervals of width m, with midpoint values f11, f12, f13 along x1 and f23 along x2)

TABLE 2
Sample Data

                                                                      Parametric CUSUM   Wilcoxon CUSUM
Time   Sample Data                 Signed Ranks       Wilcoxon Stat.  H=.89, K=.129      H=11, K=5
1      -2, -.5, 1.1, 0.8, 0.4      -5, -2, 4, 3, 1    1               0.00000            0
2      0.2, 0.1, 1.4, 0.0, -1      3, 2, 5, 1, -4     7               0.00000            2
3      1.7, 2.1, -1, 1.1, 0.2      4, 5, -3, 2, 1     9               0.66293            6
4      -1, 0.2, 0.2, 1.9, 1.2      -4, 2, 1, 5, 3     7               0.95244 *          8
5      1.7, -.0, 0.5, -.3, 1.1     5, -1, 3, -2, 4    9               1.43964 *          12 *

* out-of-control signal

Output from BOUND Macro

g   SHIFT   DIST   DF   C       m       EPSCALC   EPSTRUNC
5   1       NORM   4    5.265   0.011   .00001    .00001

Output from WALTDIST Macro

OBS   c    Wilcoxon pdf
1     0    0.00010
2     1    0.00013
3     2    0.00018
4     3    0.00054
5     4    0.00089
6     5    0.00287
7     6    0.00419
8     7    0.00671
9     8    0.01319
10    9    0.02571
11    10   0.04143
12    11   0.05509
13    12   0.11568
14    13   0.10470
15    14   0.20697
16    15   0.42155

Output from ARLWILC Macro

OBS   ARL
1     8.81426
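The Wilcoxon CUSUM side of the paper can be sketched in Python under a few stated assumptions. `wilcoxon_cusum` runs recursion (1.2) with $v(\mathbf{X}_n) = W_n$ on the signed ranks printed in Table 2 ($H = 11$, $K = 5$). `signed_ranks` computes (2.1) from raw data, assuming no ties among the $|x_j - \theta_0|$ and using $\psi(0) = +1$. `wilcoxon_null_pmf` gives the in-control pmf of $W$ via (2.3)-(2.4). `cusum_arl` builds the transition matrix (2.2) for integer-valued $W$, $h$ and $k$ and solves (1.3), which is the kind of computation the ARLWILC step performs. All four function names are hypothetical; this is not the SAS macro code.

```python
import numpy as np


def signed_ranks(x, theta0=0.0):
    """Signed ranks psi_j * R_j^+ from (2.1) with scores a(j) = j.

    psi(t) = -1 for t < 0 and +1 for t >= 0; assumes no ties among |x_j - theta0|.
    """
    dev = [xi - theta0 for xi in x]
    order = sorted(range(len(dev)), key=lambda j: abs(dev[j]))
    ranks = [0] * len(dev)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return [r if d >= 0 else -r for r, d in zip(ranks, dev)]


def wilcoxon_cusum(signed_rank_rows, h, k):
    """Run (1.2) with v(X_n) = W_n, the sum of the signed ranks at time n."""
    s, path = 0, []
    for row in signed_rank_rows:
        w = sum(row)
        s = max(0, s + w - k)
        path.append((w, s, s >= h))   # (W_n, S_n, signal?)
    return path


def wilcoxon_null_pmf(g):
    """In-control pmf of W = 2W+ - g(g+1)/2 via (2.3)-(2.4): P(W+ = c) = d_g(c) / 2**g."""
    top = g * (g + 1) // 2
    d = [0] * (top + 1)               # d[c] = number of subsets of {1, ..., g} summing to c
    d[0] = 1                          # the empty subset
    for j in range(1, g + 1):
        for c in range(top, j - 1, -1):   # descending, so each integer j is used at most once
            d[c] += d[c - j]
    return {2 * c - top: d[c] / 2 ** g for c in range(top + 1)}


def cusum_arl(pmf, h, k):
    """ARL from S_0 = 0 when the CUSUM signals at S_n >= h (integer-valued W, integer h and k).

    Transient states are S = 0, ..., h-1 with transition probabilities as in (2.2);
    the ARL is the first entry of (I - P)^{-1} 1 from (1.3).
    """
    P = np.zeros((h, h))
    for i in range(h):
        for w, p in pmf.items():
            s = max(0, i + w - k)
            if s < h:                 # s >= h is the absorbing (signal) state
                P[i, s] += p
    return np.linalg.solve(np.eye(h) - P, np.ones(h))[0]


# Signed ranks for the five time points, as printed in Table 2.
TABLE2_SIGNED_RANKS = [
    [-5, -2, 4, 3, 1],
    [3, 2, 5, 1, -4],
    [4, 5, -3, 2, 1],
    [-4, 2, 1, 5, 3],
    [5, -1, 3, -2, 4],
]

for n, (w, s, signal) in enumerate(wilcoxon_cusum(TABLE2_SIGNED_RANKS, h=11, k=5), start=1):
    print(n, w, s, "*" if signal else "")
print("in-control ARL (samples):", cusum_arl(wilcoxon_null_pmf(5), h=11, k=5))
```

The loop gives $W = 1, 7, 9, 7, 9$ and $S_n = 0, 2, 6, 8, 12$, with the signal at time point 5. `signed_ranks([-2, -.5, 1.1, 0.8, 0.4])` returns the first row of signed ranks. The printed in-control ARL depends on this sketch's run-length conventions (signal at $S_n \ge h$, start at $S_0 = 0$, $N$ counted in samples). Those may differ from the conventions behind Table 1, so treat it as illustrative. A shifted pmf of $W$ can be passed to `cusum_arl` in place of the null pmf.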