* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Sample Size versus Detection Probabilities of Off-Aim Drifts in Support or Quality Improvement: A SAS/GRAPH Application
Survey
Document related concepts
Transcript
SAMPLE SIZE VERSUS DETECTION PROBABILITIES OF OFF-AIM DRIFTS IN SUPPORT OF QUALITY IMPROVEMENT: A SAS/GRAPH APPLICATION Mark Carpenter and Cecil Hallum Division of Mathematical and Information Sciences Sam Houston State University Huntsville, Texas 77341-2206 1 INTRODUCTION Monitoring processes or services for the purpose of quality improvement. is a critical goal in most product and services oriented environments t.oday. In the work place it is customary for the customer to demand insight into the quality of the products or services they purchase. To provide this insight, the manufacturer or service provider must oftentimes exhibit quautitative evidence of the key chara.cteristics·of the processes or products. A key question upfront is that of the monitoring goals and the corresponding sampling frequency dictated by such goals. This quantitative evidence may be as fundamental as t.hat reflected in a quality control chart on the meant ra.nge, variance, or some other statistical entity all the way lip to that. of using multivariat.e statistics to study joint relationships of multiple variables in a proc~..s. To perform these tasks oftentimes dictates a need to collect and maintain considerable data loads. As an example, if one is tracking the vist.osity of a chemica! product, a priori decisions must be made in regard to such things as: (a) What will be the criterion to identify an inadequate product (e.g., a suspect it6111 mig-ht be one that fans outside a 3 standard deviation envelope - ' as reflected in a quality COIltrol chart)? (b) What will be a sufficient sampling frequeucy (i.e., monitoring rate) to gain adequate insight into the process behavior"? (c) Monitoring a process usually requires use of various statistical estimators of ullderlying pro- ct".$S parameters, consequently, how does one ensure that the estimators rlleet project goals (e.g., in control charting, one should insist on some understanding of the accuracy of the sample mean and varianc_e as estimators of the true process mean and variance)? <eI) Since numerous products and services are Ulan~ ufactured or administered in a temporal manner, when should a critical monitoring para.l1cter estimate be updated (e.g., iu control charting, when should one update the variance estimate of a process to ensllre a I'nore accurate control chart from its usage)? Continuing with the example of mOllitoring viscosity, numerous critical decisions are required including: I) a choice for what. variance value to lise (e.g., if it is estimat.ed from sampled data, how mlich data should be utilized and when should it be updated to best reHect current acceptahle viscosity characteristics?), 2) a choice of a decision logic that permits adequate warning with a minimum of false alarms, 3) identificatioll of decisioll points in lillie to consider updatillg estilllat.es of process characteristics, alld 4) monit.oring frequencies (i.e., is it slIffici(mt to monit.or viscosity hourly, daily, weekly, etc:?) to meet preselected accuracy goals. This paper addresses these issues Ilsing slati.sliciJ hypothesis testillg rnethods. Specifically, llsing t.he cOllcept. of the-"power" of a test. and the actual -decisioll logic. in testing hypot.heses concerning the meall of a process, relationships between appropriate salllpl<.~ sizes, dett~c tion of significallt. orr-aim drifts, and associated cOIlJidence statements about all conclusions are attainable. 889 Unfortunately, these statistical concepts are not always straightforward for the layman (i.e., nonstatistician) and any insight that supports clarity is highly desirable (and critical) to the correct usage and interpretation of snch tools. This paper provides a SAS/GRA I'll program that can be utilized to present these concepts compactly aud with considerable clarity for the layman. 2 ApPROACH Sample Size versus DeteCtion Probability (one mean) Emphasis is on monitoring quality via the means and variances. The differing cases discussed differ only in the number of processes (i.e., populations) being monitored. 2.1 various curves corresponding to various choices of the detection probahilities (i.e., values of .1- #). one attaillS illsight into these interrelationships hy investigating t.he graph in Figure 1. This graph was generated by a few lines ofSAS/GRAPIl (see the program listing in Sectioll 3) and can easily be altered to get variom; other related plots (e.g., one may wish to select differing values of COIIfidence I - ( t and/or differing probahilities of detectioll, I - (3). a=.05 50 DETECTING OFF-AIM DRIFTS IN THE PROCESS MEAN J1. Consider monitoring a particular product characteristic with a desirable preselected !(oll-ailnl> (mean;-I') value of 1'0. Since variation is an inherent part of the typical manufacturing process, the product manufacturer must necessarily monitor (Le" sample) the process periodically to maintain adequate insight into product characteristics. It is especially desirable- to be able to answer the following question: What sample size, n, should be utilized to detect an off-aim drift (i.e., away from "0) toward a value I" such that the probability is I - (3 of detecting when the difference 6 (1'0 - ",)/ s exceeds a preselected value (e.g., \.5) with a confidence probability of 1 -" that the collect.ive conclusions hold t.rue. Notice that the difference 6 is the magnitude of oft-aim drift ill terms of standard deviation units $ (i.e., s is the sample standard deviation) and, hence, is a unitless quantity. Consequently, since 5 is unitless, the results are applicable to numerous monitoring tasks. Obviously, it seems reasollable to expect n, ~<;, 6, (1', and /3 to be functionally related in some manner. From the use of statistical hypothesis testing logic, this connection is. easily seen to be given by the following implicit fuuctional relatiollship: \ 10 \\ s m 30 ~ I \~\ 10 \~ l = [F- 1(1- (3) ,r + r l ( l - a/2)]/Jil = 6 (J) \\ 10 \ ~ ~ ~ ::s;::: 1-,_.'lI!> -......:: ::--:.------ t-- I-- o 0.15 0.50 0.75 .00 1.15 1.50 1.75 1.00 6 Figure 1: Relationships between n, 6. I-Ct. I-p 2.2 DETECTING CHANGES IN THE PROCESS VARIANCE (72 Since numerous products and services are manufactured or ~dministered ill a temporallllanner, when should the variance of the critical monitorillg parameter estilllat(,~ be up·dated (e.g.,-in the simple task of control chartiug, when should one updat.e t.he variance estimator to cllsure a more accurate control chart froUl its usage)"! Thus, a critical question regarding a process variance is: what sample size, n, should be utilized to detect a shift iu the variance q'!. (i.e" away from q5 toward a value q~) such that the probability is I - {3 of detectillg whe" the ratiu u'5/iTr exceeds a prcsclecte<1 value (e.g., 3) with a confi· dence probability of I - cr that the colll~cl.ivc cOllclusions hold true. The functional relationship betwccn II, (\', Ii, arid the ratio I{ = (J'~/ cif is givcu implicitly as follows: Arriving at, this relationship amouut.s t.o that of finding the power, 1- p, in testing the statistic.at hypothesis If 0 : I' = Ito versus the alternative /11 : I' = 1'1 wit.h an overall confidence coefficient of I-cr. The notation F-l in the above is the inverse of the cumulative distributioll function oCthe t-distribution with n-I degrees of freedom (see any introductory. statistics text, e.g;, [2]). To the layman, sllch an expression lacks clarity; however, use of SAS/GRAPH provides the needed insight.. Specifically, by fixing 0' = .05 (a frequent choice which guaraIlLC<.~s a confidence coefficient of .95), letting the vert.ical axis represent choices of the sample size n; the horizontal axis represent candidate values of It, aut! by oV(~rlayillg 890 J( =1'-,1 :( .. _1 (I - (./2)/F-:,t ({3) Sample Size versus Power (Two Variances) (2) .\:"_1 a=.05 As in t.he ca.'ie for I' (discussed above), arriving at this relationship consists of finding the power, 1 - p, in testing lIo : (7'2 = (T~ versus Ifl : (12 = C"? with an overall confidence coefficient of 1 - 0'. Figure 2 provides COIIsiderable clarity regarding these interrelationships (the generation ofthis plot using SAS/GRAPII is discussed in Section 3 as well). 50 , 40 s • m 30 Sample Size versus Power (Variance) 0'=.05 V e 50 ~ z 20 40 10 s m 30 f S i 20 z 10 '\ \\\ .\ ~,\~ ~\I\ "- - .n I--- t--= .- - o .6 ." I~l\ r::; r--... '\ ~ "- ~ ." I-~~ ~ t::::hr------ 1\ '"f:f -- :::- -I'---- ~ 10 <5 r---.. ~ figure 3: Relationships between n, 6, l-cx, 1-/1 . r- I- /-... function F-l in equation (3) is the inverse of the CUUIUlative F-distributioll with 11- 1 degrees of freedom for, both, the numerator and denominator (see [2] for a discussion of the F-distribution). As before, this functional relationship results from the logistics that are the basis for testing Ho: "'U".~ 1 versus H, : dUd~ = k (where k -I 1). Similarly, if d[ and d~ are acceptably the same, the following relationship establishes the implicit relationship between Q; {3, ", and 6 = (1'1 ~ 1'2)/5. Figure 4, provides the clarification needed to see these relat.ionships; again, the SAS/GRAPII program that generated this plot is discussed in Section 3. t- o 5 8 = <5 rigure 2: Relationships between n, 6.1-a, l-p 2.3 \ DETECTING DIFFERENCES BETWEEN TWO PROCESS VARIANCES AND MEANS At times, t.wo geographically remote plants are tasked with m<lllufacLuring the same product for shipment to customers. A key issue is whether or not the two plants produce t.he same product (Le., what. are the conditions that leat! to declaring a difference in the variances 'and means). First, for variances, the implicit relationship of jIlt.crest is In this expression, F,-l is the inverse of t.he cumulat.ive 2 .. _2 t-distribution with 2" - 2 degrees of freedom [2]. 2.4 DETECTING DIFFERENCES BETWEEN SEVERAL PROCESS MEANS Generalizing from monitorillg two proc.ess means to that of several means results in yet other relationships (see reference {ID, Again, these int.errelationships are not which is plotted in Figllre:j IIsing the SAS/GRAPII program discussed in Section :L It should be noted the thc 891 easily understood without resorting to graphical depictions like those given in the previous section.' In particular, a graph can easily be generated for the situation whereby there are k means and the plot depicts the minimal sample size required to detect a maximum difference .6. between a pair of the k means with a detection probability 1 - ;3 and with overall confidence I-(\' (to be discussed at conference time). Sample Size versus DetectIon Probability (two means) a=.05 50 I \ 40 \\\ \ s m 30 \I~<1\ ! ~ I \ 10 . ~ 1',,;,~ ~ '\ 1 5 "'-. ~ "'- ~ 10 t:- ~ t---- t--- -- o .5 1 .0 1. 5 1.0 1 .5 6 figure 4: Relationships between n, 6.1-a,l-/J 2.5 DETECTING CHANGES IN LINEAR FUNCTIONS OF THE MEAN A general framework consists-of considering the general linear statistical model Y = X/3 + l where },-, is the vector of data, X is the design matrix, P is the vector of means and ( is the error vector (see [I]). Of particular interest are tests of the form No : ),T;3 = 1'0 versus HI : ),T;3 = 1'1 where), is a chosen vedor that gives the desirable test concerning a particular linear combination of means. In particular, if f3 = U!J, Il2, "', ILk]T then choosing ), = [I, -1,0, ... , OJT yields the same results as discussed above between It I and It2' lIowever, ot.her candidate choices for ..\ result.s in- !Clore f1exihilit.y and leads to the relationship F.-.!.(I -;3) + F;~.(I which is easily implerncllted ill SAS/CRAPII (to be discussed at conference Lime). It should be noted that in equ'ation (5), F'~~k is the iuverse of the 'climulative distribut.ion function of the t-distributioll with 1t-k degr<..~s of freedom (lJ . 3 DISCUSSION OF SASjGRAPH PROGRAM The program listing given below is the specific· program used to generate the plot in Figure 1. The generation of the other plot.s given ill Figures 2 til rough 5 result from minor modifications of this listing; t.hese modifications are detailed in the discllssion following the listing.' It should be noted that t.his particula{ program generates 4 overlayed plots in each case considered in this paper; the user may wish to only plot one or possibly cl~ange the'values of cr and/or P, or alter the ou~put in some other way. As can be seen, the program is small and modifications to generate differing outputs than given herein should be straightforward. I. /*COPTIONS DEV=VGA; 1* 2. LEGENDI LABEI,=NON8 VALUI;;=(I'OWI'=GREEK 'l-b=.95' '1-0=.85' 'l":b=.75' 'l-b=.G5') ACROSS=I FRAME POSI1'ION=(INSID8 MIDDLE) MODE=PROTEClr OFFSE1'=(5,0); 3. SYMBOLI C=I3LUE I=JOIN V=NONE; 4. SYMBOL2 C=GREEN I=JOIN V=NONE; 5. SYM130L3 C=YELLOW I=JOIN V=NONE; G. SYM130L4 C=RED I=JOIN V=NONE; 7. AXISI LAI3EL=(F=GREEK 'd') ORDER=(.25 TO 2 BY .25); 8. COPTIONS VSIZE=4.5 IN HSIZE=:l.5 IN; 9. DATA MADE; 10. ALPIIA=.05; II. D8TAI=.05; 12. BE1'A2=.15; 13. 138TA3=.2.5; Ij .. BETA4=.~!i; 1.1. PI = 1- B 8TA 1;1'2= I-B ETA2;P3= 1-· UETA:l;P4= I-BETA4; IG. ARRAY D(4);ARRAY 1'(4) 1'1·1'4; 17. DON=5T051; 18. DO 1=1 TO 4; - ,,/2) - I = 6/J),7ViTX)-I), (5) 892 19. D(I)=(TINV( I'(I),N-I )+TIN V( I-A LPIIA/2,N, 1))/N**.5); 4 20. END; DISCUSSION jCONCLUSIONS The availability of SAS/GRAPII with options like the 21. OUTPUT; overlay capability, as well a.~ its mallY other flexillilities, can greatly aid the I"Ylllan (i.e., the non"tatistician). This paper demonstrates how invaluable SAS/GRAPII can be in establishing easily reada!>le graphics generated 22. END; 23. LABEL N='Sample Size'; 24. PROC GpLOT DATA=MADE; from somewhat complicated statistical, functional rela~ tionships. Seve'ral key relationships are included that are of value in gaining insight to support critical decisions related to sample_ sizes, probabilities of detecting pre~ selected ofT~aim process drifts, magnitudes of the drifL to be sensed, along with confidence coefficients for joint results. 25. PLOT N*DI=I N*D2=2 N*D3=3 N*D4=4/0VERLAY IIAXIS=IIAXISI VAXIS=O TO 50 BY 10 VREF=IO 20 30 40 50 IIREF=.5 .75 I 1.25 1.5 1.75 LEGEND=LEGENDI; 26. TITLEI 'Sample Size versus Delection Probability (one mean)'; 27. TITLE2 F=GREEK 'a=.05'; 5 REFERENCES 1. Neter, John, ,",,'assermall, William, and Kutner, Michael II " Al'l,lied Linear SltdisliclLl Alodds. Third To obtain the plot in Figure 2, one needs to change Edition, Irwin Pub!., 1990. program line 19 to: D(I)=(CINV( I-ALPIIA/2,N-I) )/(CI NV( 1-1'(1), N-I»; and the OVERLAY portion of program line 25 to: OVERLAY HAXIS=AXIS3 VAXIS=O TO 50 BY 10 VREF=lO 2030 4050 IIREF=1.52 2.5 3 3.5 4 4.5 5 5.5 6 6.5 77:5 8 LEGEND=LEGENDI; 2. Triola. Mario- F., Elemenlary Statistics. Fifth Edition, Addison-Wesley Pub!. Co., 1992. For the plot in Figure 3, line 19 becomes: D(I)=(FINV( I-ALP II A/2),N-I,N-I )/FINV( I-P(I),NI,N-I)); and the OVERLAY portion of line 25 is changed to: OVERLAY HAXIS=AXIS4 VAXIS=O TO 50 BY 10 VREF=IO 20 30 40 50 HREF=3 4 56 78 LEGEND=LEGENDI; Finally, for the plot in Fignre 4, line 19 becomes: D(I)=(TINV( 1'( 1),2*N-2)+TI NV( I-A Lp II A/2,2* N2»/«N/2)**.5); and the OVERLAY portion of line 25 is changed to: OVERLAY I1AXIS=AXIS2 VAXIS=O TO 50 BY 10 VREF=IO 20 30 40 50 IIREF=.75 I 1.25 1.5 1.7522.25 LEGEND=LEGEND1. Note that the use of the IlAXIS statement wa.'.; necessary to get the plot to take all a certain desirable look. In particular, this part of the plot is determined by the range of values the user wishes to sec for fJ (in the case of tests on means) or k (in the C"lSe of test.s 011 the variances). Further <iisC'.ussioll regarding t.he flexibility and options in the above SAS/G itA I'll program will be taken up at conferenc.e time. 893