Statistics
GRAPHICAL TECHNIQUES FOR EVALUATING "NONNORMAL" DATA
Melissa A. Durfee
Worcester Polytechnic Institute, Worcester, MA
Abstract
Using simple graphical techniques such as the box
and whisker plot, probability-probability plot, and
probability plot, data that cannot be described by
a normal distribution may be analyzed and modeled
by determining the most appropriate distribution.
In addition, initial constraints imposed on the
data may be checked for appropriateness. The
validity of the modeled distribution may be confirmed
through a statistical test such as the
chi-square goodness-of-fit test. Therefore, even when
the data are not normally distributed, the resulting
statistical distribution may be utilized to monitor
the quality of a production process.
Introduction
Heavy Liquid Separation (HLS) separates ceramic
inclusions from loose nickel-based superalloy
powder. The technique involves mixing one-half
pound of alloy powder with the heavy liquid thallium
malonate-formate (TMF). A series of centrifuge
operations is performed. Due to density differences between the liquid, alloy powder, and
inclusions, separation is achieved. Inclusions of
interest may be decanted and isolated on a filter
paper. Subsequent analysis of these inclusions in
the scanning electron microscope (SEM) provides
size and chemistry data for evaluating the
cleanliness of the powder prior to extrusion.
Therefore, HLS may be utilized as a powder quality
control tool.
Size data from the SEM are examined to determine
the appropriate statistical distribution to utilize
for modeling and process monitoring. In addition, the
initial assumption on the oxide percentage sum is
statistically analyzed to determine its appropriateness.
Initial Assumptions
In analyzing the data, the following criteria, per
customer requirements, were utilized:
1) O + Na + Mg + Al + Si + Zr + Ca >= 85
(oxide sum).
2) total counts >= 1000.
The UNIVARIATE procedure statistics, histogram,
normal probability plot, and box and whisker
plot for the oxide sum are depicted in Figures 1
and 2. The Kolmogorov D statistic in Figure 1
indicates that the distribution is not normal since
the significance is less than 0.01. Since the oxide
sum is truncated at its minimum value, this result is expected.
Box and Whisker Plot
When the minimum oxide sum is established as 85,
both the median and mean equal 91, and the 25th
percentile and 75th percentile are equal to 88 and 94,
respectively. However, the minimum (85) and
maximum (99) are not symmetrical with respect to
the median (inner horizontal line on the box and
whisker plot) and the 25th and 75th percentiles
(edges of the box). When examining the box and
whisker plot, this result is evidenced by the
asymmetrical length of the whiskers which indicate
the minimum and maximum.
Establishing the minimum oxide sum at 81
(Figures 3 and 4) improves the appearance of the
box and whisker plot. The minimum and maximum
are now symmetrical with respect to the median (90)
and the 25th and 75th percentiles, which equal 87 and
93, respectively. Therefore, lowering the initial
restriction on the oxide sum to 81 should be
considered.
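The whisker comparison above can be checked with simple arithmetic. The sketch below assumes the whiskers extend to the minimum and maximum, as the text describes, and uses only the quantiles quoted for the two cutoffs; the function name is illustrative.

```python
def whisker_lengths(minimum, q1, q3, maximum):
    """Return (lower, upper) whisker lengths when whiskers run to the extremes."""
    return q1 - minimum, maximum - q3


# Quantiles quoted in the text for the two oxide-sum cutoffs.
cutoff_85 = whisker_lengths(minimum=85, q1=88, q3=94, maximum=99)
cutoff_81 = whisker_lengths(minimum=81, q1=87, q3=93, maximum=99)

print(cutoff_85)  # (3, 5): unequal whiskers
print(cutoff_81)  # (6, 6): equal whiskers
```

With the cutoff at 85 the lower whisker is shorter than the upper one, while the cutoff at 81 gives whiskers of equal length.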
Probability-Probability Plots
Probability-probability (P-P) plots, also referred to
as percent plots, are used to compare an empirical
cumulative distribution function with a specific
theoretical distribution function. If the two
distributions match, the points on the P-P plot form
a linear pattern that passes through the origin and has a
slope equal to one [SAS/QC Software:
Reference, p. 58].
The empirical distribution function is defined as:
$$F_N(x) = \text{proportion of nonmissing values} \le x = \frac{\text{number of values} \le x}{N} \qquad [1]$$
where N is the number of nonmissing observations.
A P-P plot is constructed by sorting the n
nonmissing values: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. The ith
sorted value $x_{(i)}$ is represented by a point on the
plot whose y-coordinate is $i/n$ and whose
x-coordinate is $F(x_{(i)})$. Both axes on the P-P
plot range from 0 to 1.
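As a minimal sketch of the construction just described, the following Python computes the P-P plot coordinates for a sample and a candidate distribution function. The function names, the exponential CDF, its parameters, and the sample values are illustrative assumptions, not the paper's SAS code or data.

```python
import math


def pp_points(values, cdf):
    """Return P-P plot points (F(x_(i)), i/n) for the nonmissing values."""
    xs = sorted(v for v in values if v is not None)  # drop missing, then sort
    n = len(xs)
    return [(cdf(x), i / n) for i, x in enumerate(xs, start=1)]


def exponential_cdf(x, theta=0.0, scale=1.0):
    """Exponential CDF with threshold theta (hypothetical parameter values)."""
    return 1.0 - math.exp(-(x - theta) / scale) if x > theta else 0.0


sample = [0.5, None, 2.0, 1.0, 0.1]  # illustrative values only
points = pp_points(sample, exponential_cdf)
for x_coord, y_coord in points:
    print(round(x_coord, 3), y_coord)
```

If the candidate distribution matches the data, the printed pairs lie close to the line y = x.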
An advantage of P-P plots is that they are
discriminating in regions of high probability
density since, in these regions, the empirical and
theoretical cumulative distributions change more rapidly than
in regions of low probability density. Since the
SEM size data, converted to mils², exhibit
high density in the range 0.962081 to 3.5, this
technique is well-suited for determining the
modeling distribution.
With a threshold (minimum value) equal to
0.962086, P-P plots on area (mils²) are
constructed by the CAPABILITY procedure and
PPPLOT statement of SAS/QC. The following distributions
are fit: exponential (Figure 5), gamma
(Figure 6), lognormal (Figure 7), and Weibull
(Figure 8). A plot for the beta distribution could
not be constructed on this data. When comparing
the plots with the reference line (slope = 1), the
Weibull distribution provides the best fit, followed
by gamma, exponential, and lognormal.
Goodness-of-Fit Test
To confirm that the Weibull distribution best
models the size data, a goodness-of-fit test, based
on the chi-square distribution, may be constructed.
To run the test in SAS, the area
must be divided by 100 since the range of data must
be between 0 and 1. Also, since the maximum cell
frequency is limited to 30, the maximum area is
restricted to 9.9952 mils². Otherwise, the
cell width would be too large to accurately group
the data. The results are summarized as follows:
Distribution    Chi-Square    Significance
Exponential          164.0          0.0001
Gamma                 49.8          0.0105
Lognormal          1,188.8          0.0001
Weibull               39.5          0.0569
At α = 0.05, the Weibull distribution cannot be
rejected.
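For illustration, the Pearson chi-square statistic behind such a test can be sketched as follows. The cell counts and probabilities in the usage lines are made up; only the significance values are taken from the summary above, and the decision rule simply compares each one with α = 0.05. The cell definitions and degrees of freedom used in the SAS run are not given in the text, so they are not reproduced here.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square: sum over cells of (O - E)^2 / E, with E = N * p."""
    total = sum(observed)
    return sum(
        (o - total * p) ** 2 / (total * p)
        for o, p in zip(observed, expected_probs)
    )


# Hypothetical three-cell example (not the paper's data).
example = chi_square_statistic([10, 20, 30], [0.2, 0.3, 0.5])

# Significance levels reported in the summary table.
reported_significance = {
    "Exponential": 0.0001,
    "Gamma": 0.0105,
    "Lognormal": 0.0001,
    "Weibull": 0.0569,
}
alpha = 0.05
not_rejected = [name for name, p in reported_significance.items() if p > alpha]
print(not_rejected)  # ['Weibull']
```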
Weibull Distribution
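The Weibull distribution is defined as:
$$f(x) = \begin{cases} \dfrac{c}{a}\left(\dfrac{x-\theta}{a}\right)^{c-1}\exp\left[-\left(\dfrac{x-\theta}{a}\right)^{c}\right] & \text{if } x > \theta \\ 0 & \text{otherwise} \end{cases} \qquad [2]$$
where
$\theta$ = threshold (or location) parameter
$a$ = scale parameter ($a > 0$)
$c$ = shape parameter ($c > 0$).
The minimum data value must be greater than the
threshold parameter, $\theta$. The mean ($\mu$) and variance
($s^2$) of the Weibull distribution are:
$$\mu = \theta + a\,\Gamma\left(1 + \frac{1}{c}\right)$$
$$s^2 = a^2\left[\Gamma\left(1 + \frac{2}{c}\right) - \Gamma\left(1 + \frac{1}{c}\right)^2\right] \qquad [3]$$
where $\Gamma(n) = (n-1)!$ for positive integer $n$.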
The Weibull distribution is used extensively in
reliability engineering as a model of time to failure
in electrical and mechanical components and
systems. Examples where Weibull has been used
include electrical devices such as memory elements,
mechanical components such as bearings, and
structural elements in aircraft and automobiles.
Probability Plots
Probability plots facilitate the comparison of a data
distribution with a specified theoretical distribution.
A probability plot is constructed by sorting the n
nonmissing values: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. The ith
sorted value $x_{(i)}$ is represented by a point on the
plot whose y-coordinate is $x_{(i)}$ and whose
x-coordinate is $F^{-1}\left((i - 3/8)/(n + 1/4)\right)$, where $F(\cdot)$ indicates
the distribution function within the specified family.
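A short sketch of this construction for the Weibull family follows. The inverse distribution function used here, $\theta + a(-\ln(1-p))^{1/c}$, is obtained by inverting the three-parameter Weibull CDF. The function names, sample values, and parameter values in the usage lines are illustrative assumptions.

```python
import math


def weibull_quantile(p, theta, scale, shape):
    """Inverse Weibull CDF: theta + scale * (-ln(1 - p)) ** (1 / shape)."""
    return theta + scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def probability_plot_points(values, quantile):
    """Return (F^-1((i - 3/8) / (n + 1/4)), x_(i)) for the sorted values."""
    xs = sorted(values)
    n = len(xs)
    return [
        (quantile((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(xs, start=1)
    ]


# Illustrative sample and parameters (theta=0, scale=1, shape=1).
plot_points = probability_plot_points(
    [3.0, 1.0, 2.0],
    lambda p: weibull_quantile(p, theta=0.0, scale=1.0, shape=1.0),
)
for theoretical, observed in plot_points:
    print(round(theoretical, 4), observed)
```

When the data follow the specified distribution, the points fall near a straight line.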
The Weibull probability plot on area is indicated in
Figure 9. Up to the 99th percentile, the distribution
line fits the plotted points closely. Thereafter,
excursions indicate that outliers may be present.
Based on maximum likelihood estimates calculated
using SAS, the fitted Weibull distribution
parameters are as follows:
$\theta$ = 0.962086
$a$ = 1.6540641
$c$ = 0.74964824
and
$\mu$ = 2.93224158
$s$ = 2.66672661
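As a consistency check, the reported mean and standard deviation can be recomputed from the fitted threshold, scale, and shape using the Weibull mean and variance formulas, $\mu = \theta + a\,\Gamma(1 + 1/c)$ and $s^2 = a^2[\Gamma(1 + 2/c) - \Gamma(1 + 1/c)^2]$. This is a sketch in Python rather than the SAS computation used in the paper; the function name is illustrative.

```python
import math


def weibull_mean_sd(theta, scale, shape):
    """Mean and standard deviation of a three-parameter Weibull distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    mean = theta + scale * g1
    sd = scale * math.sqrt(g2 - g1 ** 2)
    return mean, sd


# Fitted maximum likelihood estimates quoted in the text.
mean, sd = weibull_mean_sd(theta=0.962086, scale=1.6540641, shape=0.74964824)
print(round(mean, 3), round(sd, 3))  # approximately 2.932 and 2.667
```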
Since the distribution has been accurately modeled,
process monitoring may be established. For example, a
limit may be imposed at a calculated
Weibull percentile. If a production lot exceeds this
limit, the lot would be considered significantly
different since it is not representative of the "known"
population. Although the modeling distribution
differs, this approach parallels standard quality
control monitors based on the normal distribution.
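One way such a monitor might be sketched is to evaluate the fitted Weibull distribution function at a new measurement and flag it when it falls above a chosen percentile. The 99th percentile limit, the function names, and the test values below are assumptions for illustration; the paper does not state which percentile or lot statistic would be used.

```python
import math

# Fitted parameters quoted in the text.
THETA, SCALE, SHAPE = 0.962086, 1.6540641, 0.74964824


def weibull_cdf(x, theta=THETA, scale=SCALE, shape=SHAPE):
    """Three-parameter Weibull CDF; zero at or below the threshold."""
    if x <= theta:
        return 0.0
    return 1.0 - math.exp(-(((x - theta) / scale) ** shape))


def exceeds_limit(area, percentile=0.99):
    """Flag an area (mils**2) above the assumed Weibull percentile limit."""
    return weibull_cdf(area) > percentile


print(exceeds_limit(2.0))   # False: well inside the fitted distribution
print(exceeds_limit(50.0))  # True: far in the upper tail
```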
Conclusion
Using simple graphical techniques such as the box
and whisker plot, P-P plot, and probability plot,
nonnormal data may be analyzed and fit to the most
appropriate distribution. Results may be statistically
confirmed through the chi-square goodness-of-fit
test. For the analyzed data, the asymmetry
of the box and whisker plot for oxide sum suggests
that the minimum value should be lowered to 81
from 85. For the ceramic inclusion area in
mils², the appropriate modeling distribution is
Weibull, which is confirmed by a statistical test.
References
Hogg, Robert V. and Ledolter, Johannes.
Engineering Statistics. New York: Macmillan
Publishing Company, 1987.
Montgomery, Douglas C. Statistical Quality
Control. New York: John Wiley & Sons, 1985.
SAS/QC Software: Reference, Version 6,
First Edition. Cary, NC: SAS Institute, Inc.,
1989.
The Author
Melissa A. Durfee
P.O. Box 168
Grafton, MA 01519
(508) 839-4689
SAS and SAS/QC are registered trademarks of SAS
Institute Inc., Cary, NC USA.
FIGURE 1
UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS
OXIDE SUM >= 85
TOTAL COUNTS >= 1000
Univariate Procedure
Variable=SUM
Moments
N          3437        Sum Wgts   3437
Mean       91.02706    Sum        312860
Std Dev    3.456889    Variance   11.95008
Skewness   -0.00165    Kurtosis   -0.95163
USS        28519786    CSS        41060.48
CV         3.79765     Std Mean   0.058965
T:Mean=0   1543.742    Pr>|T|     0.0001
Num ^= 0   3437        Num > 0    3437
M(Sign)    1718.5      Pr>=|M|    0.0001
Sgn Rank   2954102     Pr>=|S|    0.0001
D:Normal   0.082014    Pr>D       <.01
Quantiles(Def=5)
100% Max   99          99%        98
75% Q3     94          95%        97
50% Med    91          90%        96
25% Q1     88          10%        86
0% Min     85          5%         85
                       1%         85
Range      14
Q3-Q1      6
Mode       92
Extremes
Lowest     Obs         Highest    Obs
85(        3432)       99(        2473)
85(        3418)       99(        2543)
85(        3405)       99(        3027)
85(        3402)       99(        3043)
85(        3376)       99(        3136)
FIGURE 2
UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS
OXIDE SUM >= 85 TOTAL COUNTS >= 1000
Univariate Procedure
Variable=SUM
Midpoint    #  Histogram
  99.5      9  **
  98.5     44  ******
  97.5    123  ****************
  96.5    206  **************************
  95.5    240  ******************************
  94.5    318  ****************************************
  93.5    316  ****************************************
  92.5    343  *******************************************
  91.5    339  *******************************************
  90.5    275  ***********************************
  89.5    287  ************************************
  88.5    260  *********************************
  87.5    263  *********************************
  86.5    221  ****************************
  85.5    193  *************************
* may represent up to 8 counts
[Boxplot and Normal Probability Plot panels (vertical axis 85.5 to 99.5, normal quantile axis -2 to +2) are not recoverable from the extracted text.]
FIGURE 3
UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS
OXIDE SUM >= 81
TOTAL COUNTS >= 1000
Univariate Procedure
Variable=SUM
Moments
N          3900        Sum Wgts   3900
Mean       90.03795    Sum        351148
Std Dev    4.23521     Variance   17.93701
Skewness   -0.23865    Kurtosis   -0.7846
USS        31686582    CSS        69936.38
CV         4.703806    Std Mean   0.067818
T:Mean=0   1327.648    Pr>|T|     0.0001
Num ^= 0   3900        Num > 0    3900
M(Sign)    1950        Pr>=|M|    0.0001
Sgn Rank   3803475     Pr>=|S|    0.0001
D:Normal   0.088415    Pr>D       <.01
Quantiles(Def=5)
100% Max   99          99%        98
75% Q3     93          95%        96
50% Med    90          90%        95
25% Q1     87          10%        84
0% Min     81          5%         83
                       1%         81
Range      18
Q3-Q1      6
Mode       92
Extremes
Lowest     Obs         Highest    Obs
81(        3865)       99(        2812)
81(        3659)       99(        2885)
81(        3585)       99(        3462)
81(        3573)       99(        3478)
81(        3394)       99(        3581)
FIGURE 4
UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS
OXIDE SUM >= 81 TOTAL COUNTS >= 1000
Univariate Procedure
Variable=SUM
Midpoint    #  Histogram
  99.5      9  **
  98.5     44  ******
  97.5    123  ****************
  96.5    206  **************************
  95.5    240  ******************************
  94.5    318  ****************************************
  93.5    316  ****************************************
  92.5    343  *******************************************
  91.5    339  *******************************************
  90.5    275  ***********************************
  89.5    287  ************************************
  88.5    260  *********************************
  87.5    263  *********************************
  86.5    221  ****************************
  85.5    193  *************************
  84.5    140  ******************
  83.5    129  *****************
  82.5    107  **************
  81.5     87  ***********
* may represent up to 8 counts
[Boxplot and Normal Probability Plot panels (vertical axis 81.5 to 99.5, normal quantile axis -2 to +2) are not recoverable from the extracted text.]
FIGURE 5
P-P PLOT: EXPONENTIAL DISTRIBUTION FIT ON AREA (mils**2)
[Plot: vertical axis AREA, 0 to 1.0; horizontal axis Exponential(Theta=0.96 Scale=1.9?), 0 to 1. Observations: + (2184 Hidden)]
FIGURE 6
P-P PLOT: GAMMA DISTRIBUTION FIT ON AREA (mils**2)
[Plot: vertical axis AREA, 0 to 1.0; horizontal axis Gamma(Theta=0.96 Shape=0.64 Scale=3.09), 0 to 1. Observations: + (2118 Hidden)]
FIGURE 7
P-P PLOT: LOGNORMAL DISTRIBUTION FIT ON AREA (mils**2)
[Plot: vertical axis AREA, 0 to 1.0; horizontal axis Lognormal(Theta=1 Shape=2.1 Scale=-.3), 0 to 1. Observations: + (2209 Hidden)]
FIGURE 8
P-P PLOT: WEIBULL DISTRIBUTION FIT ON AREA (mils**2)
[Plot: vertical axis AREA, 0 to 1.0; horizontal axis Weibull(Theta=.96 Shape=.75 Scale=1.7), 0 to 1. Observations: + (2114 Hidden)]
FIGURE 9
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS
OXIDE SUM >= 85 AND TOTAL COUNTS >= 1000
PROBABILITY PLOT: WEIBULL DISTRIBUTION FIT ON AREA (mils**2)
[Plot: vertical axis area, 0 to 80; horizontal axis WEIBULL PERCENTILES, with recovered tick labels .01, 75, 90, 95, 99, 99.9, 99.99. Weibull Line: Threshold=0.9621, Scale=1.6541]