Download Sample Size versus Detection Probabilities of Off-Aim Drifts in Support or Quality Improvement: A SAS/GRAPH Application

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
SAMPLE SIZE
VERSUS
DETECTION PROBABILITIES OF OFF-AIM DRIFTS
IN SUPPORT OF
QUALITY IMPROVEMENT: A SAS/GRAPH
APPLICATION
Mark Carpenter and Cecil Hallum
Division of Mathematical and Information Sciences
Sam Houston State University
Huntsville, Texas 77341-2206
1
INTRODUCTION
Monitoring processes or services for the purpose of quality improvement. is a critical goal in most product and
services oriented environments t.oday. In the work place
it is customary for the customer to demand insight into
the quality of the products or services they purchase. To
provide this insight, the manufacturer or service provider
must oftentimes exhibit quautitative evidence of the key
chara.cteristics·of the processes or products. A key question upfront is that of the monitoring goals and the corresponding sampling frequency dictated by such goals.
This quantitative evidence may be as fundamental as
t.hat reflected in a quality control chart on the meant
ra.nge, variance, or some other statistical entity all the
way lip to that. of using multivariat.e statistics to study
joint relationships of multiple variables in a proc~..s. To
perform these tasks oftentimes dictates a need to collect
and maintain considerable data loads. As an example,
if one is tracking the vist.osity of a chemica! product, a
priori decisions must be made in regard to such things
as:
(a) What will be the criterion to identify an inadequate product (e.g., a suspect it6111 mig-ht
be one that fans outside a 3 standard deviation envelope - ' as reflected in a quality COIltrol chart)?
(b) What will be a sufficient sampling frequeucy
(i.e., monitoring rate) to gain adequate insight
into the process behavior"?
(c) Monitoring a process usually requires use of
various statistical estimators of ullderlying pro-
ct".$S parameters, consequently, how does one
ensure that the estimators rlleet project goals
(e.g., in control charting, one should insist
on some understanding of the accuracy of the
sample mean and varianc_e as estimators of the
true process mean and variance)?
<eI) Since numerous products and services are Ulan~
ufactured or administered in a temporal manner, when should a critical monitoring para.l1cter estimate be updated (e.g., iu control charting, when should one update the variance estimate of a process to ensllre a I'nore accurate
control chart from its usage)?
Continuing with the example of mOllitoring viscosity,
numerous critical decisions are required including: I)
a choice for what. variance value to lise (e.g., if it is
estimat.ed from sampled data, how mlich data should
be utilized and when should it be updated to best reHect current acceptahle viscosity characteristics?), 2) a
choice of a decision logic that permits adequate warning with a minimum of false alarms, 3) identificatioll of
decisioll points in lillie to consider updatillg estilllat.es
of process characteristics, alld 4) monit.oring frequencies
(i.e., is it slIffici(mt to monit.or viscosity hourly, daily,
weekly, etc:?) to meet preselected accuracy goals. This
paper addresses these issues Ilsing slati.sliciJ hypothesis testillg rnethods. Specifically, llsing t.he cOllcept. of
the-"power" of a test. and the actual -decisioll logic. in
testing hypot.heses concerning the meall of a process,
relationships between appropriate salllpl<.~ sizes, dett~c­
tion of significallt. orr-aim drifts, and associated cOIlJidence statements about all conclusions are attainable.
889
Unfortunately, these statistical concepts are not always
straightforward for the layman (i.e., nonstatistician) and
any insight that supports clarity is highly desirable (and
critical) to the correct usage and interpretation of snch
tools. This paper provides a SAS/GRA I'll program that
can be utilized to present these concepts compactly aud
with considerable clarity for the layman.
2
ApPROACH
Sample Size versus DeteCtion Probability (one mean)
Emphasis is on monitoring quality via the means and
variances. The differing cases discussed differ only in the
number of processes (i.e., populations) being monitored.
2.1
various curves corresponding to various choices of the
detection probahilities (i.e., values of .1- #). one attaillS
illsight into these interrelationships hy investigating t.he
graph in Figure 1. This graph was generated by a few
lines ofSAS/GRAPIl (see the program listing in Sectioll
3) and can easily be altered to get variom; other related
plots (e.g., one may wish to select differing values of COIIfidence I - ( t and/or differing probahilities of detectioll,
I - (3).
a=.05
50
DETECTING OFF-AIM DRIFTS IN
THE PROCESS MEAN J1.
Consider monitoring a particular product characteristic
with a desirable preselected !(oll-ailnl> (mean;-I') value
of 1'0. Since variation is an inherent part of the typical
manufacturing process, the product manufacturer must
necessarily monitor (Le" sample) the process periodically to maintain adequate insight into product characteristics. It is especially desirable- to be able to answer
the following question: What sample size, n, should be
utilized to detect an off-aim drift (i.e., away from "0)
toward a value I" such that the probability is I - (3 of
detecting when the difference 6 (1'0 - ",)/ s exceeds a
preselected value (e.g., \.5) with a confidence probability
of 1 -" that the collect.ive conclusions hold t.rue. Notice
that the difference 6 is the magnitude of oft-aim drift ill
terms of standard deviation units $ (i.e., s is the sample
standard deviation) and, hence, is a unitless quantity.
Consequently, since 5 is unitless, the results are applicable to numerous monitoring tasks. Obviously, it seems
reasollable to expect n, ~<;, 6, (1', and /3 to be functionally
related in some manner. From the use of statistical hypothesis testing logic, this connection is. easily seen to be
given by the following implicit fuuctional relatiollship:
\
10
\\
s
m 30
~
I
\~\
10
\~
l
=
[F- 1(1- (3)
,r
+ r l ( l - a/2)]/Jil = 6
(J)
\\
10
\
~
~ ~
::s;:::
1-,_.'lI!>
-......::
::--:.------ t-- I--
o
0.15
0.50
0.75
.00
1.15
1.50
1.75
1.00
6
Figure 1: Relationships between n, 6. I-Ct. I-p
2.2
DETECTING CHANGES IN THE
PROCESS VARIANCE (72
Since numerous products and services are manufactured
or ~dministered ill a temporallllanner, when should the
variance of the critical monitorillg parameter estilllat(,~
be up·dated (e.g.,-in the simple task of control chartiug,
when should one updat.e t.he variance estimator to cllsure
a more accurate control chart froUl its usage)"! Thus, a
critical question regarding a process variance is: what
sample size, n, should be utilized to detect a shift iu the
variance q'!. (i.e" away from q5 toward a value q~) such
that the probability is I - {3 of detectillg whe" the ratiu
u'5/iTr exceeds a prcsclecte<1 value (e.g., 3) with a confi·
dence probability of I - cr that the colll~cl.ivc cOllclusions
hold true. The functional relationship betwccn II, (\', Ii,
arid the ratio I{ = (J'~/ cif is givcu implicitly as follows:
Arriving at, this relationship amouut.s t.o that of finding the power, 1- p, in testing the statistic.at hypothesis
If 0 : I' = Ito versus the alternative /11 : I' = 1'1 wit.h an
overall confidence coefficient of I-cr. The notation F-l
in the above is the inverse of the cumulative distributioll
function oCthe t-distribution with n-I degrees of freedom
(see any introductory. statistics text, e.g;, [2]). To the
layman, sllch an expression lacks clarity; however, use of
SAS/GRAPH provides the needed insight.. Specifically,
by fixing 0' = .05 (a frequent choice which guaraIlLC<.~s
a confidence coefficient of .95), letting the vert.ical axis
represent choices of the sample size n; the horizontal
axis represent candidate values of It, aut! by oV(~rlayillg
890
J(
=1'-,1
:( .. _1
(I - (./2)/F-:,t ({3)
Sample Size versus Power (Two Variances)
(2)
.\:"_1
a=.05
As in t.he ca.'ie for I' (discussed above), arriving at this
relationship consists of finding the power, 1 - p, in testing lIo : (7'2 = (T~ versus Ifl : (12 = C"? with an overall
confidence coefficient of 1 - 0'. Figure 2 provides COIIsiderable clarity regarding these interrelationships (the
generation ofthis plot using SAS/GRAPII is discussed
in Section 3 as well).
50
,
40
s
•
m 30
Sample Size versus Power (Variance)
0'=.05
V
e
50
~
z
20
40
10
s
m 30
f
S
i 20
z
10
'\
\\\ .\
~,\~
~\I\
"-
-
.n
I---
t--= .-
-
o
.6
."
I~l\ r::; r--...
'\
~
"-
~
." I-~~
~ t::::hr------
1\
'"f:f -- :::- -I'----
~
10
<5
r---..
~
figure 3: Relationships between n, 6, l-cx, 1-/1
.
r- I-
/-...
function F-l in equation (3) is the inverse of the CUUIUlative F-distributioll with 11- 1 degrees of freedom for,
both, the numerator and denominator (see [2] for a discussion of the F-distribution). As before, this functional
relationship results from the logistics that are the basis
for testing Ho: "'U".~
1 versus H, : dUd~ = k (where
k -I 1). Similarly, if d[ and d~ are acceptably the same,
the following relationship establishes the implicit relationship between Q; {3, ", and 6 = (1'1 ~ 1'2)/5. Figure
4, provides the clarification needed to see these relat.ionships; again, the SAS/GRAPII program that generated
this plot is discussed in Section 3.
t-
o
5
8
=
<5
rigure 2: Relationships between n, 6.1-a, l-p
2.3
\
DETECTING DIFFERENCES BETWEEN TWO PROCESS VARIANCES AND MEANS
At times, t.wo geographically remote plants are tasked
with m<lllufacLuring the same product for shipment to
customers. A key issue is whether or not the two plants
produce t.he same product (Le., what. are the conditions
that leat! to declaring a difference in the variances 'and
means). First, for variances, the implicit relationship of
jIlt.crest is
In this expression, F,-l
is the inverse of t.he cumulat.ive
2 .. _2
t-distribution with 2" - 2 degrees of freedom [2].
2.4
DETECTING
DIFFERENCES BETWEEN SEVERAL
PROCESS MEANS
Generalizing from monitorillg two proc.ess means to that
of several means results in yet other relationships (see
reference {ID, Again, these int.errelationships are not
which is plotted in Figllre:j IIsing the SAS/GRAPII program discussed in Section :L It should be noted the thc
891
easily understood without resorting to graphical depictions like those given in the previous section.' In particular, a graph can easily be generated for the situation
whereby there are k means and the plot depicts the minimal sample size required to detect a maximum difference
.6. between a pair of the k means with a detection probability 1 - ;3 and with overall confidence I-(\' (to be
discussed at conference time).
Sample Size versus DetectIon Probability (two means)
a=.05
50
I
\
40
\\\ \
s
m 30
\I~<1\
!
~
I
\
10
.
~ 1',,;,~ ~
'\
1
5
"'-. ~
"'- ~
10
t:- ~ t---- t---
--
o
.5
1 .0
1. 5
1.0
1 .5
6
figure 4: Relationships between n, 6.1-a,l-/J
2.5
DETECTING CHANGES IN LINEAR FUNCTIONS OF THE MEAN
A general framework consists-of considering the general
linear statistical model Y = X/3 + l where },-, is the vector of data, X is the design matrix, P is the vector of
means and ( is the error vector (see [I]). Of particular interest are tests of the form No : ),T;3 = 1'0 versus
HI : ),T;3 = 1'1 where), is a chosen vedor that gives
the desirable test concerning a particular linear combination of means. In particular, if f3 = U!J, Il2, "', ILk]T
then choosing ), = [I, -1,0, ... , OJT yields the same results as discussed above between It I and It2' lIowever,
ot.her candidate choices for ..\ result.s in- !Clore f1exihilit.y
and leads to the relationship
F.-.!.(I -;3)
+ F;~.(I
which is easily implerncllted ill SAS/CRAPII (to be discussed at conference Lime). It should be noted that in
equ'ation (5), F'~~k is the iuverse of the 'climulative distribut.ion function of the t-distributioll with 1t-k degr<..~s
of freedom (lJ .
3
DISCUSSION
OF
SASjGRAPH
PROGRAM
The program listing given below is the specific· program
used to generate the plot in Figure 1. The generation of
the other plot.s given ill Figures 2 til rough 5 result from
minor modifications of this listing; t.hese modifications
are detailed in the discllssion following the listing.' It
should be noted that t.his particula{ program generates
4 overlayed plots in each case considered in this paper;
the user may wish to only plot one or possibly cl~ange
the'values of cr and/or P, or alter the ou~put in some
other way. As can be seen, the program is small and
modifications to generate differing outputs than given
herein should be straightforward.
I. /*COPTIONS DEV=VGA; 1*
2.
LEGENDI LABEI,=NON8 VALUI;;=(I'OWI'=GREEK
'l-b=.95' '1-0=.85' 'l":b=.75'
'l-b=.G5')
ACROSS=I
FRAME
POSI1'ION=(INSID8 MIDDLE) MODE=PROTEClr
OFFSE1'=(5,0);
3. SYMBOLI C=I3LUE I=JOIN V=NONE;
4. SYMBOL2 C=GREEN I=JOIN V=NONE;
5. SYM130L3 C=YELLOW I=JOIN V=NONE;
G. SYM130L4 C=RED I=JOIN V=NONE;
7. AXISI LAI3EL=(F=GREEK 'd') ORDER=(.25
TO 2 BY .25);
8. COPTIONS VSIZE=4.5 IN HSIZE=:l.5 IN;
9. DATA MADE;
10. ALPIIA=.05;
II. D8TAI=.05;
12. BE1'A2=.15;
13. 138TA3=.2.5;
Ij ..
BETA4=.~!i;
1.1.
PI = 1- B 8TA 1;1'2= I-B ETA2;P3= 1-·
UETA:l;P4= I-BETA4;
IG. ARRAY D(4);ARRAY 1'(4) 1'1·1'4;
17. DON=5T051;
18. DO 1=1 TO 4;
- ,,/2) - I = 6/J),7ViTX)-I),
(5)
892
19. D(I)=(TINV( I'(I),N-I )+TIN V( I-A LPIIA/2,N,
1))/N**.5);
4
20. END;
DISCUSSION jCONCLUSIONS
The availability of SAS/GRAPII with options like the
21. OUTPUT;
overlay capability, as well
a.~
its mallY other flexillilities,
can greatly aid the I"Ylllan (i.e., the non"tatistician).
This paper demonstrates how invaluable SAS/GRAPII
can be in establishing easily reada!>le graphics generated
22. END;
23. LABEL N='Sample Size';
24. PROC GpLOT DATA=MADE;
from somewhat complicated statistical, functional rela~
tionships. Seve'ral key relationships are included that are
of value in gaining insight to support critical decisions
related to sample_ sizes, probabilities of detecting pre~
selected ofT~aim process drifts, magnitudes of the drifL
to be sensed, along with confidence coefficients for joint
results.
25.
PLOT N*DI=I N*D2=2 N*D3=3
N*D4=4/0VERLAY IIAXIS=IIAXISI VAXIS=O
TO 50 BY 10 VREF=IO 20 30 40 50 IIREF=.5 .75
I 1.25 1.5 1.75 LEGEND=LEGENDI;
26. TITLEI 'Sample Size versus Delection Probability (one mean)';
27. TITLE2 F=GREEK 'a=.05';
5
REFERENCES
1. Neter, John, ,",,'assermall, William, and Kutner,
Michael II " Al'l,lied Linear SltdisliclLl Alodds. Third
To obtain the plot in Figure 2, one needs to change
Edition, Irwin Pub!., 1990.
program line 19 to:
D(I)=(CINV( I-ALPIIA/2,N-I) )/(CI NV( 1-1'(1),
N-I»;
and the OVERLAY portion of program line 25 to:
OVERLAY HAXIS=AXIS3 VAXIS=O TO 50 BY 10
VREF=lO 2030 4050
IIREF=1.52 2.5 3 3.5 4 4.5 5 5.5 6 6.5 77:5 8 LEGEND=LEGENDI;
2. Triola. Mario- F., Elemenlary Statistics. Fifth Edition, Addison-Wesley Pub!. Co., 1992.
For the plot in Figure 3, line 19 becomes:
D(I)=(FINV( I-ALP II A/2),N-I,N-I )/FINV( I-P(I),NI,N-I));
and the OVERLAY portion of line 25 is changed to:
OVERLAY HAXIS=AXIS4 VAXIS=O TO 50 BY 10
VREF=IO 20 30 40 50 HREF=3 4 56
78 LEGEND=LEGENDI;
Finally, for the plot in Fignre 4, line 19 becomes:
D(I)=(TINV( 1'( 1),2*N-2)+TI NV( I-A Lp II A/2,2* N2»/«N/2)**.5);
and the OVERLAY portion of line 25 is changed to:
OVERLAY I1AXIS=AXIS2 VAXIS=O TO 50 BY 10
VREF=IO 20 30 40 50 IIREF=.75 I
1.25 1.5 1.7522.25 LEGEND=LEGEND1.
Note that the use of the IlAXIS statement wa.'.; necessary to get the plot to take all a certain desirable look.
In particular, this part of the plot is determined by the
range of values the user wishes to sec for fJ (in the case
of tests on means) or k (in the C"lSe of test.s 011 the variances). Further <iisC'.ussioll regarding t.he flexibility and
options in the above SAS/G itA I'll program will be taken
up at conferenc.e time.
893