Statistics

GRAPHICAL TECHNIQUES FOR EVALUATING "NONNORMAL" DATA

Melissa A. Durfee
Worcester Polytechnic Institute, Worcester, MA

Abstract

Using simple graphical techniques such as the box and whisker plot, probability-probability plot, and probability plot, data that cannot be described by a normal distribution may be analyzed and modeled by determining the most appropriate distribution. In addition, initial constraints imposed on the data may be checked for appropriateness. The validity of the modeled distribution may be confirmed through a statistical test such as the chi-square goodness-of-fit test. Therefore, even when the data are not normally distributed, the resulting statistical distribution may be used to monitor the quality of a production process.

Introduction

Heavy Liquid Separation (HLS) separates ceramic inclusions from loose nickel-based superalloy powder. The technique involves mixing one-half pound of alloy powder with the heavy liquid thallium malonate-formate (TMF). A series of centrifuge operations is then performed, and separation is achieved because of the density differences among the liquid, the alloy powder, and the inclusions. Inclusions of interest may be decanted and isolated on filter paper. Subsequent analysis of these inclusions in the scanning electron microscope (SEM) provides size and chemistry data for evaluating the cleanliness of the powder prior to extrusion. Therefore, HLS may be used as a powder quality control tool.

Size data from the SEM are examined to determine the appropriate statistical distribution for modeling and process monitoring. In addition, the initial assumption on the oxide percentage sum is statistically analyzed to determine its appropriateness.

Initial Assumptions

In analyzing the data, the following criteria (per customer requirements) were used:
1) O + Na + Mg + Al + Si + Zr + Ca ≥ 85 (oxide sum).
2) Total counts ≥ 1000.
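The two screening criteria amount to a simple record filter. The following is a minimal Python sketch under assumed conditions: each inclusion is a dictionary with a hypothetical `composition` map of element percentages and a hypothetical `total_counts` field. The paper's actual SEM export format and SAS processing are not shown, and the function names are illustrative only.

```python
from typing import Dict, Iterable, List

# Elements whose percentages make up the "oxide sum" criterion.
OXIDE_ELEMENTS = ("O", "Na", "Mg", "Al", "Si", "Zr", "Ca")


def oxide_sum(composition: Dict[str, float]) -> float:
    """Sum of the listed element percentages; absent elements count as 0."""
    return sum(composition.get(el, 0.0) for el in OXIDE_ELEMENTS)


def select_inclusions(records: Iterable[dict], min_oxide_sum: float = 85.0,
                      min_counts: int = 1000) -> List[dict]:
    """Keep inclusions whose oxide sum and total counts meet both minimums.

    Each record is assumed (hypothetically) to look like
    {"composition": {"O": 41.0, "Al": 38.5, ...}, "total_counts": 1250}.
    """
    return [r for r in records
            if oxide_sum(r["composition"]) >= min_oxide_sum
            and r["total_counts"] >= min_counts]
```

Keeping the oxide-sum cutoff as a parameter makes it easy to compare the customer's limit of 85 with a lower value such as 81.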
From the UNIVARIATE procedure, the statistics, histogram, normal probability plot, and box and whisker plot for the oxide sum are shown in Figures 1 and 2. The Kolmogorov D statistic in Figure 1 indicates that the distribution is not normal, since the significance level is less than 0.01. Because the oxide sum is truncated at 85 by the selection criterion, this result is expected.

Box and Whisker Plot

When the minimum oxide sum is set at 85, the median equals 91 and the mean is approximately 91 (91.03), and the 25th and 75th percentiles equal 88 and 94, respectively. However, the minimum (85) and maximum (99) are not symmetrical with respect to the median (the inner horizontal line on the box and whisker plot) and the 25th and 75th percentiles (the edges of the box): the minimum lies 6 below the median, whereas the maximum lies 8 above it. On the box and whisker plot, this asymmetry appears as whiskers of unequal length, since the whiskers extend to the minimum and maximum.

Setting the minimum oxide sum to 81 (Figures 3 and 4) improves the appearance of the box and whisker plot. The minimum and maximum are now symmetrical with respect to the median (90), lying 9 below and 9 above it, and with respect to the 25th and 75th percentiles, which equal 87 and 93, respectively. Therefore, lowering the initial restriction on the oxide sum to 81 should be considered.

Probability-Probability Plots

Probability-probability (P-P) plots, also referred to as percent plots, are used to compare an empirical cumulative distribution function with a specified theoretical distribution function. If the two distributions match, the points on the P-P plot form a linear pattern that passes through the origin and has a slope equal to one [SAS/QC Software: Reference, p. 58]. The empirical distribution function is defined as:

$$F_N(x) = \text{proportion of nonmissing values} \le x = \frac{\text{number of values} \le x}{N} \qquad [1]$$

where $N$ is the number of nonmissing observations. A P-P plot is constructed by sorting the $n$ nonmissing values: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The $i$th sorted value $x_{(i)}$ is represented by a point on the plot whose y-coordinate is $i/n$ and whose x-coordinate is $F(x_{(i)})$, where $F$ is the theoretical distribution function. Both axes on the P-P plot range from 0 to 1.

An advantage of P-P plots is that they are discriminating in regions of high probability density, since in these regions the empirical and theoretical cumulative distributions change more rapidly than in regions of low probability density. Since the SEM size data, converted to mils², exhibit high density in the range 0.962081 to 3.5, this technique is well suited for determining the modeling distribution.

With a threshold (minimum value) equal to 0.962086, P-P plots of area (mils²) are constructed with the CAPABILITY procedure and the PPPLOT option of SAS/QC. The following distributions are fit: exponential (Figure 5), gamma (Figure 6), lognormal (Figure 7), and Weibull (Figure 8). A plot for the beta distribution could not be constructed for these data. Comparing the plots with the reference line (slope = 1), the Weibull distribution provides the best fit, followed by the gamma, exponential, and lognormal distributions.

Goodness-of-Fit Test

To confirm that the Weibull distribution best models the size data, a goodness-of-fit test based on the chi-square distribution may be constructed. To run the test in SAS, the area must be divided by 100, since the data must range between 0 and 1. Also, since the maximum cell frequency is limited to 30, the maximum area is restricted to 9.9952 mils²; otherwise, the cell width would be too large to group the data accurately. The results are summarized as follows:

Distribution   Chi-Square   Significance
Exponential         164.0         0.0001
Gamma                49.8         0.0105
Lognormal          1188.8         0.0001
Weibull              39.5         0.0569

At the 0.05 significance level, the Weibull distribution cannot be rejected (0.0569 > 0.05), whereas the exponential, gamma, and lognormal distributions are rejected.

Weibull Distribution

The Weibull distribution is defined as:

$$f(x) = \begin{cases} \dfrac{c}{\alpha}\left(\dfrac{x-\theta}{\alpha}\right)^{c-1} \exp\left[-\left(\dfrac{x-\theta}{\alpha}\right)^{c}\right] & \text{if } x > \theta \\[1ex] 0 & \text{otherwise} \end{cases} \qquad [2]$$

where $\theta$ is the threshold (or location) parameter, $\alpha$ is the scale parameter ($\alpha > 0$), and $c$ is the shape parameter ($c > 0$). The mean ($\mu$) and variance ($s^2$) of the Weibull distribution are:

$$\mu = \theta + \alpha\,\Gamma\!\left(1 + \frac{1}{c}\right), \qquad s^2 = \alpha^2\left[\Gamma\!\left(1 + \frac{2}{c}\right) - \Gamma^2\!\left(1 + \frac{1}{c}\right)\right] \qquad [3]$$

where $\Gamma$ is the gamma function; for integer $n$, $\Gamma(n) = (n-1)!$.
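The Weibull quantities used in this paper can be checked numerically. The following is a minimal Python sketch (standard library only), assuming the three-parameter form in equation [2] and the maximum likelihood estimates θ = 0.962086, α = 1.6540641, c = 0.74964824 that the paper reports for the area data. All function names are hypothetical. The sketch integrates [2] into the closed-form CDF $F(x) = 1 - \exp[-((x-\theta)/\alpha)^c]$ for $x > \theta$, inverts it to obtain percentiles, evaluates equation [3], builds P-P plot coordinates $(F(x_{(i)}), i/n)$ as described above, and computes a Pearson chi-square statistic over caller-supplied cells. It does not reproduce the SAS/QC cell-grouping rules, and it returns only the statistic; a significance level would additionally require the chi-square distribution with the appropriate degrees of freedom (for example via `scipy.stats.chi2.sf`).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Maximum likelihood estimates reported for inclusion area (mils**2).
THETA = 0.962086   # threshold (location) parameter
ALPHA = 1.6540641  # scale parameter, alpha > 0
C = 0.74964824     # shape parameter, c > 0


def weibull_cdf(x: float, theta: float = THETA, alpha: float = ALPHA,
                c: float = C) -> float:
    """F(x) = 1 - exp(-((x - theta) / alpha) ** c) for x > theta, else 0."""
    if x <= theta:
        return 0.0
    return 1.0 - math.exp(-(((x - theta) / alpha) ** c))


def weibull_quantile(p: float, theta: float = THETA, alpha: float = ALPHA,
                     c: float = C) -> float:
    """Inverse of weibull_cdf: the 100*p-th percentile, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return theta + alpha * (-math.log(1.0 - p)) ** (1.0 / c)


def weibull_mean_sd(theta: float = THETA, alpha: float = ALPHA,
                    c: float = C) -> Tuple[float, float]:
    """Mean and standard deviation from equation [3]."""
    g1 = math.gamma(1.0 + 1.0 / c)
    g2 = math.gamma(1.0 + 2.0 / c)
    mean = theta + alpha * g1
    variance = alpha ** 2 * (g2 - g1 ** 2)
    return mean, math.sqrt(variance)


def pp_points(data: Sequence[float],
              cdf: Callable[[float], float]) -> List[Tuple[float, float]]:
    """P-P plot points (F(x_(i)), i/n) for the sorted nonmissing values."""
    xs = sorted(v for v in data if v is not None and not math.isnan(v))
    n = len(xs)
    return [(cdf(x), (i + 1) / n) for i, x in enumerate(xs)]


def pearson_chi_square(data: Sequence[float], edges: Sequence[float],
                       cdf: Callable[[float], float]) -> float:
    """Sum of (observed - expected)**2 / expected over cells (lo, hi].

    Expected counts use n * (F(hi) - F(lo)); cells with zero expected
    count are skipped. Use math.inf as the last edge to cover the tail.
    """
    n = len(data)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for v in data if lo < v <= hi)
        expected = n * (cdf(hi) - cdf(lo))
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat
```

With the fitted values, `weibull_mean_sd()` returns approximately μ = 2.932 and s = 2.667, in agreement with the values the paper reports from SAS, and `weibull_quantile(0.99)` gives the kind of percentile limit that could serve as a lot-monitoring threshold.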
The Weibull distribution is used extensively in reliability engineering as a model of time to failure in electrical and mechanical components and systems. Examples include electrical devices such as memory elements, mechanical components such as bearings, and structural elements in aircraft and automobiles.

Probability Plots

Probability plots facilitate the comparison of a data distribution with a specified theoretical distribution. A probability plot is constructed by sorting the $n$ nonmissing values: $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The $i$th sorted value $x_{(i)}$ is represented by a point on the plot whose y-coordinate is $x_{(i)}$ and whose x-coordinate is $F^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$, where $F(\cdot)$ is the distribution function of the specified family.

The Weibull probability plot of area is shown in Figure 9. Up to the 99th percentile, the distribution line fits the plotted points closely; beyond it, excursions indicate that outliers may be present. Based on maximum likelihood estimates calculated with SAS, the fitted Weibull distribution parameters, with the corresponding mean and standard deviation, are as follows:

$\theta$ = 0.962086 (threshold, or location, parameter)
$\alpha$ = 1.6540641 (scale parameter)
$c$ = 0.74964824 (shape parameter)
$\mu$ = 2.93224158 (mean)
$s$ = 2.66672661 (standard deviation)

The minimum data value must be greater than the threshold parameter $\theta$.

Since the distribution has been accurately modeled, process monitoring may be established. For example, a limit may be imposed on a calculated Weibull percentile. If a production lot exceeds this limit, the lot would be considered significantly different, since it is not representative of the "known" population. Although the modeling distribution differs, this approach parallels standard quality control monitors based on the normal distribution.

Conclusion

Using simple graphical techniques such as the box and whisker plot, P-P plot, and probability plot, nonnormal data may be analyzed and fit to the most appropriate distribution. Results may be statistically confirmed through the chi-square goodness-of-fit test. For the analyzed data, the asymmetry of the box and whisker plot for the oxide sum suggests that the minimum value should be lowered from 85 to 81. For the ceramic inclusion area in mils², the appropriate modeling distribution is the Weibull distribution, which is confirmed by a statistical test.

References

Hogg, Robert V. and Ledolter, Johannes. Engineering Statistics. New York: Macmillan Publishing Company, 1987.
Montgomery, Douglas C. Statistical Quality Control. New York: John Wiley & Sons, 1985.
SAS/QC Software: Reference, Version 6, First Edition. Cary, NC: SAS Institute Inc., 1989.

The Author

Melissa A. Durfee
P.O. Box 168
Grafton, MA 01519
(508) 839-4689

SAS and SAS/QC are registered trademarks of SAS Institute Inc., Cary, NC USA.
FIGURE 1. UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS, OXIDE SUM >= 85, TOTAL COUNTS >= 1000
Univariate Procedure, Variable=SUM

Moments
N            3437        Sum Wgts     3437
Mean         91.02706    Sum          312860
Std Dev      3.456889    Variance     11.95008
Skewness     -0.00165    Kurtosis     -0.95163
USS          28519786    CSS          41060.48
CV           3.79765     Std Mean     0.058965
T:Mean=0     1543.742    Pr>|T|       0.0001
Num ^= 0     3437        Num > 0      3437
M(Sign)      1718.5      Pr>=|M|      0.0001
Sgn Rank     2954102     Pr>=|S|      0.0001
D:Normal     0.082014    Pr>D         <.01

Quantiles (Def=5)
100% Max     99          99%          98
75% Q3       94          95%          97
50% Med      91          90%          96
25% Q1       88          10%          86
0% Min       85          5%           85
                         1%           85
Range        14
Q3-Q1        6
Mode         92

Extremes
Lowest (Obs)     Highest (Obs)
85 (3432)        99 (2473)
85 (3418)        99 (2543)
85 (3405)        99 (3027)
85 (3402)        99 (3043)
85 (3376)        99 (3136)

FIGURE 2. UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS, OXIDE SUM >= 85, TOTAL COUNTS >= 1000
Univariate Procedure, Variable=SUM: histogram, box plot, and normal probability plot (plots not reproduced).
Histogram bar counts, top (99.5) to bottom (85.5): 9, 44, 123, 206, 240, 318, 316, 343, 339, 275, 287, 260, 263, 221, 193 (each * may represent up to 8 counts).
Normal probability plot: SUM (85.5 to 99.5) against normal quantiles (-2 to +2).

FIGURE 3. UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS, OXIDE SUM >= 81, TOTAL COUNTS >= 1000
Univariate Procedure, Variable=SUM

Moments
N            3900        Sum Wgts     3900
Mean         90.03795    Sum          351148
Std Dev      4.23521     Variance     17.93701
Skewness     -0.23865    Kurtosis     -0.7846
USS          31686582    CSS          69936.38
CV           4.703806    Std Mean     0.067818
T:Mean=0     1327.648    Pr>|T|       0.0001
Num ^= 0     3900        Num > 0      3900
M(Sign)      1950        Pr>=|M|      0.0001
Sgn Rank     3803475     Pr>=|S|      0.0001
D:Normal     0.088415    Pr>D         <.01

Quantiles (Def=5)
100% Max     99          99%          98
75% Q3       93          95%          96
50% Med      90          90%          95
25% Q1       87          10%          84
0% Min       81          5%           83
                         1%           81
Range        18
Q3-Q1        6
Mode         92

Extremes
Lowest (Obs)     Highest (Obs)
81 (3865)        99 (2812)
81 (3659)        99 (2885)
81 (3585)        99 (3462)
81 (3573)        99 (3478)
81 (3394)        99 (3581)

FIGURE 4. UNIVARIATE STATISTICS, NORMAL AND BOX & WHISKER PLOTS
HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS, OXIDE SUM >= 81, TOTAL COUNTS >= 1000
Univariate Procedure, Variable=SUM: histogram, box plot, and normal probability plot (plots not reproduced).
Histogram bar counts, top (99.5) to bottom (81.5): 9, 44, 123, 206, 240, 318, 316, 343, 339, 275, 287, 260, 263, 221, 193, 140, 129, 107, 87 (each * may represent up to 8 counts).
Normal probability plot: SUM (81.5 to 99.5) against normal quantiles (-2 to +2).

FIGURE 5. P-P PLOT: EXPONENTIAL DISTRIBUTION FIT ON AREA (mils**2)
Exponential(Theta=0.96 Scale=1.9?); 2184 observations hidden. Plot not reproduced; both axes range from 0 to 1, vertical axis AREA.

FIGURE 6. P-P PLOT: GAMMA DISTRIBUTION FIT ON AREA (mils**2)
Gamma(Theta=0.96 Shape=0.64 Scale=3.09); 2118 observations hidden. Plot not reproduced; both axes range from 0 to 1, vertical axis AREA.

FIGURE 7. P-P PLOT: LOGNORMAL DISTRIBUTION FIT ON AREA (mils**2)
Lognormal(Theta=1 Shape=2.1 Scale=-.3); 2209 observations hidden. Plot not reproduced; both axes range from 0 to 1, vertical axis AREA.

FIGURE 8. P-P PLOT: WEIBULL DISTRIBUTION FIT ON AREA (mils**2)
Weibull(Theta=.96 Shape=.75 Scale=1.7); 2114 observations hidden. Plot not reproduced; both axes range from 0 to 1, vertical axis AREA.

FIGURE 9. HEAVY LIQUID SEPARATION OF CERAMIC INCLUSIONS, OXIDE SUM >= 85 AND TOTAL COUNTS >= 1000
PROBABILITY PLOT: WEIBULL DISTRIBUTION FIT ON AREA (mils**2)
Plot not reproduced; horizontal axis WEIBULL PERCENTILES (.01 to 99.99). Weibull line: Threshold=0.9621, Scale=1.6541.