PROBABILITY PLOTTING IN SAS

Daniel M. Chilko, West Virginia University
Gerry Hobbs, West Virginia University
E. James Harner, West Virginia University

Introduction

Bar charts or histograms are the simplest and most frequently used graphical representation of data. They reveal many properties of the data: the range of data values, the number of modes, whether the distribution is symmetric or skewed, and the existence of outliers. Although bar charts reveal the general shape of the distribution, it is sometimes difficult to determine whether or not the data can be viewed as a sample from some hypothesized distribution. Probability plots are a graphical representation of data that focuses on the distributional aspects of the data.

Bar charts are easy to produce in SAS using PROC CHART and PROC GCHART. Probability plots are also easy to produce using SAS.

Probability plots

A random variable, x, is characterized by its distribution function. For example, the graph of F(x) for a random variable with a standard normal distribution is shown in Figure 1. This graph can be turned into a straight line by transforming the x axis to F(x) (both axes would be probabilities) or by transforming the F(x) axis to x (both axes would be quantities).

A sample distribution function can be produced in SAS using PROC RANK with the PERCENT option and PROC PLOT. See Figure 2. For data from symmetric distributions, this function is characteristically S-shaped and its point of inflection makes it difficult to work with.

[Figure 1. Normal Distribution Function]
[Figure 2. Sample Cumulative Distribution Function]

If the probability axis is transformed, the plot is now a probability plot. That is, a probability plot scales the probability axis of a sample distribution function according to some probability distribution such that, if we chose the correct distribution, the resulting plot is more or less a straight line.
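For readers working outside SAS, the sample distribution function mentioned above can be sketched in a few lines of Python. This is only an illustration of the empirical distribution function (the share of observations less than or equal to each value, expressed as a percent); it is not a description of how PROC RANK computes its PERCENT values or handles ties. The function name and the data are hypothetical.

```python
def sample_distribution_percent(values):
    """Return (value, cumulative percent) pairs for a sample.

    The cumulative percent at a value is the share of observations
    less than or equal to it, so tied values share one entry.
    """
    ordered = sorted(values)
    n = len(ordered)
    pairs = []
    for i, x in enumerate(ordered, start=1):
        # Skip this position if the next observation is tied with it;
        # the last member of a tie group carries the cumulative count.
        if i < n and ordered[i] == x:
            continue
        pairs.append((x, 100.0 * i / n))
    return pairs


# Hypothetical measurements, made up for illustration only.
sample = [4.1, 3.7, 5.0, 4.1, 4.6]
for value, percent in sample_distribution_percent(sample):
    print(f"{value:5.1f}  {percent:6.1f}")
```

Plotting percent against value gives the S-shaped step function described in the text. Note that the largest observation always sits at 100 percent, which is one reason the plotting positions used for probability plots stop short of 1.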
If $x_1, x_2, \ldots, x_n$ are the ordered observations from a sample of size n, the scaling of the axis is achieved by finding a set of values $y_1, y_2, \ldots, y_n$ (say) such that

$F(y_i) = p_i$

where the $p_i$ denote appropriately chosen fractions of the distribution corresponding to the $x_i$. Plotting the pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ should result in a straight line if the $x_i$ are a sample from a distribution having distribution function F(x). Since it seems reasonable to consider the data as dependent upon the distribution function, probability plots are usually constructed with the variable of interest as the vertical axis and the probability distribution scaled values as the horizontal axis.

Kimball (1960) investigated the question of how to choose the values of $p_i$, for a given sample size n, for use in probability plots. Some commonly used values are

$p_i = i/(n+1)$
$p_i = (i - .375)/(n + .25)$
$p_i = (i - .5)/n$

For illustration, we chose $p_i = i/(n+1)$.

The problem of scaling the probability axis now becomes the problem of finding the inverse of the distribution function. That is,

$y_i = F^{-1}(i/(n+1))$

Furthermore, the scaling is independent of any scale and location parameters, so that the scaling reduces to finding the inverse of the distribution function of a "standardized" random variable. For example, if x has a probability distribution with location parameter A and scale parameter B, a plot of the points $(x_1, z_1), (x_2, z_2), \ldots, (x_n, z_n)$, where $z_i = (y_i - A)/B$, would still result in a straight line.

Since construction of probability plots is independent of the estimation of scale and location parameters, one potential use is the estimation of these parameters from the plot itself (Ferrell, 1958). Interest may instead focus on the estimation of a particular percentile of the distribution. Estimation of parameters from a probability plot requires fitting a straight line to the plot. Harner et al. (1981) described the estimation of the 99th percentile of a distribution using various regression techniques to fit the straight line using SAS.

Normal probability plots

Many statistical procedures assume the data to have a normal distribution. It is true that random variables having normal or near normal distributions occur quite often in nature, perhaps because the normal distribution is the limiting distribution of a random variable which represents the sum of a series of independent and identically distributed random variables.

Normal probability plots provide an informal test of the hypothesis that a sample comes from a normal distribution and can be produced in SAS by using PROC RANK with the NORMAL option and PROC PLOT. Figure 3 is a normal probability plot for the percentage of copper in 12 samples from the Liberty Bell.

[Figure 3. Normal Probability Plot: percentage of copper in 12 samples from the Liberty Bell]

Interpretation of non-linear plots

Quite often when data are plotted on a particular probability plot, the plot does not appear very straight. Abbot (1960) and King (1965) have investigated non-linear plots. Their studies show that many such plots have a simple and straightforward explanation.

[Figure 4. Normal Probability Plot]

Figure 4 shows a good fit in the middle of the plot, but the plot tends to flatten out at each end. A scarcity of values at the high end usually indicates an inspection and selection process that removes unacceptable values.
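The scaling step described earlier, $y_i = F^{-1}(p_i)$ with F taken as the standard normal distribution function, can be sketched outside SAS as follows. This is a minimal Python sketch using only the standard library, not a reproduction of what PROC RANK does internally. The function names and rule labels are hypothetical, and the data are made up; only the three plotting-position formulas come from the paper.

```python
from statistics import NormalDist


def plotting_positions(n, rule="i/(n+1)"):
    """Return p_1..p_n for one of the three rules quoted in the paper."""
    if rule == "i/(n+1)":
        return [i / (n + 1) for i in range(1, n + 1)]
    if rule == "(i-.375)/(n+.25)":
        return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    if rule == "(i-.5)/n":
        return [(i - 0.5) / n for i in range(1, n + 1)]
    raise ValueError(f"unknown rule: {rule}")


def normal_plot_points(values, rule="i/(n+1)"):
    """Pair each ordered observation x_i with its score y_i = F^{-1}(p_i)."""
    ordered = sorted(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    positions = plotting_positions(len(ordered), rule)
    scores = [standard_normal.inv_cdf(p) for p in positions]
    # Horizontal coordinate: normal score; vertical coordinate: observation.
    return list(zip(scores, ordered))


# Hypothetical measurements, made up for illustration only.
sample = [65.9, 65.2, 66.1, 65.5, 65.7]
for score, value in normal_plot_points(sample):
    print(f"{score:8.4f}  {value:6.2f}")
```

Because the scores depend only on n and the chosen rule, the same scores serve any location and scale; a sample from a normal distribution should then fall close to a straight line whose intercept and slope reflect the location and scale.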
A scarcity of values at the low end may indicate selection to a minimum specification or measuring equipment which may not have resolution below some particular value.

A plot characterized by two fairly straight portions connected by an S-shaped connection indicates a bimodal distribution. See Figure 5. The detection of two sources for the data when only one is expected can be an important benefit.

A convex plot usually indicates a left-skewed distribution. A concave plot indicates a right-skewed distribution. See Figure 6. A log-normal probability plot is a good next step for this pattern. See Figure 7.

[Figure 5. Normal Probability Plot]
[Figure 6. Normal Probability Plot]
[Figure 7. Log-normal Probability Plot]

Reference lines

An additional aid to the interpretation of a normal probability plot is a reference line which corresponds to a normal distribution with a specified mean and variance. PROC MEANS can be used to produce a data set containing the usual moment estimates. A short DATA step that processes this data set makes it easy to add reference lines to probability plots in SAS. See Figure 8.

[Figure 8. Normal Probability Plot: percentage of copper in 12 samples from the Liberty Bell]

The sample mean and variance are not robust, and estimates of scale and location parameters based on order statistics are more useful when outliers are present in the data. Hillyer (1978) investigated the use of moment and quantile estimators in a context similar to probability plotting. The present authors duplicated his results using SAS. PROC SORT, for example, produces order statistics.

Half-normal probability plots

When a random variable has a normal distribution with mean zero, the absolute value of this random variable is said to have a half-normal distribution. In the linear regression framework

$Y = XB + E$

E is usually assumed to be a vector of independently and identically distributed random variables, each normally distributed with mean zero and constant variance.

In a regression analysis, the differences between observed and predicted values are called residuals. That is,

$e = Y - \hat{Y}$

where $\hat{Y} = Xb$ and b are the least squares estimates. If the underlying model assumptions are true, then the e's have normal distributions, each with mean zero. They do not, in general, have the same variance, nor are they independently distributed.

Half-normal probability plots provide an informal test of the normality of the residuals in a regression analysis. Half-normal plots show more sensitivity to kurtosis at the expense of not revealing skewness. A detailed discussion of producing half-normal probability plots in SAS was given by Sall (1978). A useful exposition on the interpretation of half-normal plots was given by Daniel and Wood (1971).

Gamma probability plots

While the normal distribution is of importance in statistics, the gamma distribution is also encountered frequently. The general gamma distribution depends on a location, scale, and shape parameter.
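The same recipe extends beyond the normal case whenever the inverse of the standardized distribution function is available. The Python sketch below covers two cases with simple inverses: the half-normal scores used for absolute residuals, and the exponential distribution, which is the gamma distribution with shape parameter 1. It assumes the plotting positions $p_i = i/(n+1)$; the function names are hypothetical and the data are made up. A general gamma plot would need a numerical inverse such as the one GAMINV provides in SAS, which is not attempted here.

```python
import math
from statistics import NormalDist


def half_normal_inverse(p):
    """Inverse distribution function of |Z| for a standard normal Z.

    P(|Z| <= y) = 2*Phi(y) - 1, so y = Phi^{-1}((1 + p) / 2).
    """
    return NormalDist().inv_cdf((1.0 + p) / 2.0)


def exponential_inverse(p):
    """Inverse of F(y) = 1 - exp(-y), the standardized exponential."""
    return -math.log(1.0 - p)


def probability_plot_points(values, inverse_cdf):
    """Pair ordered observations with scores F^{-1}(i/(n+1))."""
    ordered = sorted(values)
    n = len(ordered)
    scores = [inverse_cdf(i / (n + 1)) for i in range(1, n + 1)]
    # Horizontal coordinate: score; vertical coordinate: observation.
    return list(zip(scores, ordered))


# Hypothetical regression residuals; the half-normal plot uses |residual|.
residuals = [-1.2, 0.4, 0.1, -0.3, 2.0]
half_normal_points = probability_plot_points(
    [abs(r) for r in residuals], half_normal_inverse
)

# Hypothetical waiting times between events.
waiting_times = [12.0, 3.0, 40.0, 7.0]
exponential_points = probability_plot_points(waiting_times, exponential_inverse)
```

Passing the inverse distribution function as an argument keeps the ordering and plotting-position logic in one place, so switching the hypothesized distribution changes only the scores on the horizontal axis.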
The general gamma distribution can be transformed to a standardized distribution with only a shape parameter. The chi-square and exponential distributions are special cases of the gamma distribution. A chi-square random variable with d degrees of freedom is a gamma random variable with shape parameter equal to d/2; an exponential random variable is a gamma random variable with shape parameter equal to 1. Wilk et al. (1962) described the construction and interpretation of gamma probability plots. The SAS function GAMINV can be used to produce gamma probability plots.

Exponential probability plot

The exponential distribution is often used to characterize failure or waiting time distributions. Figure 9 is an exponential probability plot produced by SAS for the waiting times between major train wrecks in the U.S. during the period from 1900 to 1960.

[Figure 9. Exponential Probability Plot]

Chi-square probability plot

Sample variances or mean squares from a normal population have a chi-square distribution. A chi-square probability plot can be used to provide an informal test of the homogeneity of sample variances. Figure 10 is a chi-square probability plot for the sample variances of the amount of nitrogen in 5 red clover plants inoculated with 6 different bacteria strains.

[Figure 10. Chi-square Probability Plot: nitrogen content of inoculated red clover plants]

References

Abbot, W.H. (1960), Probability Charts, Private publication, St. Petersburg, FL.

Daniel, C. and F. Wood (1971), Fitting Equations to Data, John Wiley and Sons, New York, NY.

Ferrell, E.B. (1958), Plotting Experimental Data on Normal or Log-normal Probability Paper, Industrial Quality Control, 15, pp. 12-15.

Hanson, V.F., J.H. Carlson, K.M. Papauchado, and N.A. Nielson (1976), The Liberty Bell: Composition of the Famous Failure, American Scientist, 64, pp. 614-619.

Harner, E.J., G.H. Hobbs, E.C. Keller Jr., A.G. Everett, and D.M. Chilko (1981), Assessing Estimates of the 99th Percentile of a Distribution, Environmetrics Proceedings, (to appear).

Hillyer, M.J. (1978), Evaluation of the Effect of Distributional Assumptions on Statistical Forms of the Photochemical Oxidant Standard, Systems Applications, Inc., San Rafael, CA.

Kimball, B.F. (1960), On the Choice of Plotting Positions on Probability Paper, Journal of the American Statistical Association, 55, pp. 546-560.

King, J.R. (1965), Graphical Data Analysis with Probability Papers, Technical and Engineering Aids for Management, Lowell, MA.

Sall, J.P. (1978), SAS Regression Applications, SAS Technical Report A-102, SAS Institute, Inc., Cary, NC.

Wilk, M.B., R. Gnanadesikan, and M.J. Huyett (1962), Probability Plotting for the Gamma Distribution, Technometrics, 4, pp. 1-20.