PROBABILITY PLOTTING IN SAS
Daniel M. Chilko, West Virginia University
Gerry Hobbs, West Virginia University
E. James Harner, West Virginia University
Introduction
Bar charts or histograms are the simplest and most frequently used graphical representations of data. They reveal many properties of the data: the range of data values, the number of modes, whether the distribution is symmetric or skewed, and the existence of outliers. Although bar charts reveal the general shape of the distribution, it is sometimes difficult to determine whether or not the data can be viewed as a sample from some hypothesized distribution. Probability plots are a graphical representation of data that focus on the distributional aspects of the data.
Bar charts are easy to produce in SAS using PROC CHART and PROC GCHART. Probability plots are also easy to produce using SAS.

Probability plots

A random variable, $x$, is characterized by its distribution function $F(x)$. For example, the graph of $F(x)$ for a random variable with a standard normal distribution is shown in Figure 1. This graph can be turned into a straight line by transforming the $x$ axis to $F(x)$ (both axes would then be probabilities) or by transforming the $F(x)$ axis to $x$ (both axes would then be quantities).

A sample distribution function can be produced in SAS using PROC RANK with the PERCENT option and PROC PLOT. See Figure 2. For data from symmetric distributions, this function is characteristically S-shaped, and its point of inflection makes it difficult to work with.

If the probability axis is transformed, the plot becomes a probability plot. That is, a probability plot scales the probability axis of a sample distribution function according to some probability distribution such that, if we choose the correct distribution, the resulting plot is more or less a straight line.
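For readers who want to reproduce the sample distribution function outside SAS, here is a minimal Python sketch of what the PROC RANK step computes. It assumes the PERCENT option's definition (each rank divided by the number of nonmissing values, times 100) and gives tied values their mean rank, which is the PROC RANK default. The function name `percent_ranks` is hypothetical; plotting its output against the data values traces the sample distribution function.

```python
from typing import List


def percent_ranks(values: List[float]) -> List[float]:
    """Return 100 * rank / n for each value, giving tied values their mean rank."""
    n = len(values)
    # indices of the values, in ascending order of value
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # find the run order[i..j] of tied values
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


print(percent_ranks([3.2, 1.5, 2.8, 1.5]))  # [100.0, 37.5, 75.0, 37.5]
```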
[Figure 1: Normal Distribution Function]
[Figure 2: Sample Cumulative Distribution Function; axes PERCENT and Percentage Copper]
If $x_1, x_2, \ldots, x_n$ are the ordered observations from a sample of size $n$, the scaling of the axis is achieved by finding a set of values $y_1, y_2, \ldots, y_n$ (say) such that

$$F(y_i) = p_i, \qquad i = 1, \ldots, n,$$

where the $p_i$'s denote appropriately chosen fractions of the distribution corresponding to the $x_i$'s. Plotting the pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ should result in a straight line if the $x_i$'s are a sample from a distribution having distribution function $F(x)$.
Since it seems reasonable to consider the data as dependent upon the distribution function, probability plots are usually constructed with the variable of interest on the vertical axis and the probability-distribution scaled values on the horizontal axis.
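The construction above is easy to sketch in Python. The sketch assumes the plotting positions $p_i = i/(n+1)$ (one common choice) and takes the inverse distribution function $F^{-1}$ as a parameter; `probability_plot_points` is a hypothetical name, and `statistics.NormalDist().inv_cdf` from the Python standard library supplies the standard normal case. Each returned pair is (horizontal, vertical) $= (y_i, x_i)$, matching the usual orientation with the data on the vertical axis.

```python
from statistics import NormalDist
from typing import Callable, List, Tuple


def probability_plot_points(
    sample: List[float], inv_cdf: Callable[[float], float]
) -> List[Tuple[float, float]]:
    """Return (y_i, x_i) pairs: scaled value on the horizontal axis, data on the vertical."""
    xs = sorted(sample)  # x_1 <= x_2 <= ... <= x_n
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1)  # plotting position p_i
        points.append((inv_cdf(p), x))  # y_i = F^{-1}(p_i)
    return points


for y, x in probability_plot_points([2.0, 0.5, 1.0], NormalDist().inv_cdf):
    print(f"{y:8.4f} {x:5.2f}")
```

Passing a different inverse distribution function changes the hypothesized distribution without changing the rest of the construction.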
Kimball (1960) investigated the question of how to choose the values of $p_i$ for a given sample size $n$, for use in probability plots.
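To see how much the choice matters, the sketch below evaluates three plotting-position formulas in common use: $i/(n+1)$, $(i-0.375)/(n+0.25)$ (often associated with Blom) and $(i-0.5)/n$. The function name and rule labels are hypothetical. All three rules give $p_i = 0.5$ at the middle of an odd-sized sample and diverge toward the extremes, which is where they most affect the ends of a probability plot.

```python
from typing import Callable, Dict, List

# Three plotting-position rules p_i for i = 1..n (labels are illustrative only).
RULES: Dict[str, Callable[[int, int], float]] = {
    "i/(n+1)": lambda i, n: i / (n + 1),
    "(i-0.375)/(n+0.25)": lambda i, n: (i - 0.375) / (n + 0.25),
    "(i-0.5)/n": lambda i, n: (i - 0.5) / n,
}


def plotting_positions(n: int, rule: str = "i/(n+1)") -> List[float]:
    """Return p_1, ..., p_n for a sample of size n under the named rule."""
    f = RULES[rule]
    return [f(i, n) for i in range(1, n + 1)]


for name in RULES:
    print(name, [round(p, 3) for p in plotting_positions(5, name)])
```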
[Figure 3: Normal Probability Plot]
Some commonly used values are

$$p_i = i/(n+1),$$
$$p_i = (i - 0.375)/(n + 0.25),$$
$$p_i = (i - 0.5)/n.$$

For illustration, we chose $p_i = i/(n+1)$. The problem of scaling the probability axis now becomes the problem of finding the inverse of the distribution function. That is,

$$y_i = F^{-1}\left(i/(n+1)\right).$$

Furthermore, this scaling is independent of any scale and location parameters, so that the scaling reduces to finding the inverse of the distribution function of a "standardized" random variable. For example, if $x$ has a probability distribution with location parameter $A$ and scale parameter $B$, a plot of the points $(x_1, z_1), (x_2, z_2), \ldots, (x_n, z_n)$, where $z_i = (y_i - A)/B$, would still result in a straight line.

Since construction of probability plots is independent of the estimation of scale and location parameters, one potential use is the estimation of these parameters from the plot itself (Ferrell, 1958). Interest may instead focus on the estimation of a particular percentile of the distribution. Estimation of parameters from a probability plot requires fitting a straight line to the plot: because $y_i = A + B z_i$ and the ordered observations $x_i$ should track the $y_i$, a line fitted to the points with the $x_i$ on the vertical axis and the $z_i$ on the horizontal axis has an intercept that estimates $A$ and a slope that estimates $B$. Harner et al. (1981) described the estimation of the 99th percentile of a distribution using various regression techniques in SAS to fit the straight line.

Normal probability plots

Normal probability plots provide an informal test of the hypothesis that a sample comes from a normal distribution and can be produced in SAS by using PROC RANK with the NORMAL option and PROC PLOT. Figure 3 is a normal probability plot for the percentage of copper in 12 samples from the Liberty Bell.
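As a rough illustration of the parameter and percentile estimation described above, the following Python sketch computes normal probability plot coordinates with $p_i = i/(n+1)$ and fits the line $x = A + Bz$ by ordinary least squares. Ordinary least squares is only one of several possible fitting methods (it is not necessarily any of the techniques used by Harner et al.), and the name `normal_plot_fit` is hypothetical. The intercept estimates the location, the slope estimates the scale, and $A + B\,\Phi^{-1}(0.99)$ estimates the 99th percentile.

```python
from statistics import NormalDist, mean
from typing import List, Tuple


def normal_plot_fit(sample: List[float]) -> Tuple[float, float, float]:
    """Least squares line x_(i) = A + B * z_i through a normal probability plot.

    z_i = Phi^{-1}(i/(n+1)). Returns (A, B, A + B * Phi^{-1}(0.99)).
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    std_normal = NormalDist()
    xs = sorted(sample)
    n = len(xs)
    zs = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    z_bar, x_bar = mean(zs), mean(xs)
    sxz = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    szz = sum((z - z_bar) ** 2 for z in zs)
    b = sxz / szz  # slope: estimate of the scale parameter B
    a = x_bar - b * z_bar  # intercept: estimate of the location parameter A
    return a, b, a + b * std_normal.inv_cdf(0.99)


# Synthetic check: points lying exactly on x = 10 + 2z give back A = 10 and B = 2.
z_grid = [NormalDist().inv_cdf(i / 10) for i in range(1, 10)]
print(normal_plot_fit([10 + 2 * z for z in z_grid]))
```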
Many statistical procedures assume the data to have a normal distribution. It is true that random variables having normal or near-normal distributions occur quite often in nature, perhaps because the normal distribution is the limiting distribution of a (suitably standardized) random variable that represents the sum of a series of independent and identically distributed random variables.

Interpretation of non-linear plots
Quite often, when data are plotted on a particular probability plot, the plot does not appear very straight. Abbot (1960) and King (1965) have investigated non-linear plots. Their studies show that many such plots have a simple and straightforward explanation.
Figure 4 shows a good fit in the middle of the plot, but the plot tends to flatten out at each end. A scarcity of values at the high end usually indicates an inspection and selection process that removes unacceptable values. A scarcity of values at the low end may indicate selection to a minimum specification, or measuring equipment that does not have resolution below some particular value.
[Figure 4: Normal Probability Plot]
[Figure 5: Normal Probability Plot]
[Figure 6: Normal Probability Plot]
A convex plot (bulging upward, like a dome) usually indicates a left-skewed distribution. A concave plot (sagging, like a bowl) indicates a right-skewed distribution. See Figure 6. A log-normal probability plot is a good next step for this pattern. See Figure 7.
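A crude numerical stand-in for judging the bend by eye is to compare the slope of the upper half of a normal probability plot with the slope of the lower half. The sketch below does this (the function `bend` is a hypothetical helper, not a standard statistic) and shows, on synthetic data built from normal scores, that right-skewed log-normal values give a positive bend that disappears after taking logarithms, the numerical counterpart of moving to a log-normal probability plot.

```python
import math
from statistics import NormalDist
from typing import List


def bend(sample: List[float]) -> float:
    """Upper-half slope minus lower-half slope of a normal probability plot.

    Positive: the plot sags like a bowl (right skew, data on the vertical axis).
    Negative: the plot bulges like a dome (left skew).
    """
    xs = sorted(sample)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    zs = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mid = n // 2
    lower = (xs[mid] - xs[0]) / (zs[mid] - zs[0])
    upper = (xs[-1] - xs[mid]) / (zs[-1] - zs[mid])
    return upper - lower


z = [NormalDist().inv_cdf(i / 10) for i in range(1, 10)]
right_skewed = [math.exp(v) for v in z]  # idealized log-normal sample
print(bend(right_skewed))  # positive: bowl-shaped normal plot
print(bend([math.log(v) for v in right_skewed]))  # close to zero after logs
```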
A plot characterized by two fairly straight portions joined by an S-shaped connection indicates a bimodal distribution. See Figure 5. The detection of two sources for the data when only one is expected can be an important benefit.
[Figure 7: Log-normal Probability Plot]
[Figure 8: Normal Probability Plot, Percentage Copper, 12 Samples from the Liberty Bell]
Reference lines

An additional aid to the interpretation of a normal probability plot is a reference line that corresponds to a normal distribution with a specified mean and variance. PROC MEANS can be used to produce a data set containing the usual moment estimates. A short DATA step that processes this data set makes it easy to add reference lines to probability plots in SAS. See Figure 8.

The sample mean and variance are not robust, and estimates of scale and location parameters based on order statistics are more useful when outliers are present in the data. Hillyer (1978) investigated the use of moment and quantile estimators in a context similar to probability plotting. The present authors duplicated his results using SAS. PROC SORT, for example, can be used to obtain order statistics.

Half-normal probability plots

If a random variable has a normal distribution with mean zero, the absolute value of this random variable is said to have a half-normal distribution.

In the linear regression framework

$$Y = XB + E,$$

$E$ is usually assumed to be a vector of independently and identically distributed random variables, each normally distributed with mean zero and constant variance. In a regression analysis, the differences between observed and predicted values are called residuals. That is,

$$\hat{e} = Y - \hat{Y},$$

where $\hat{Y} = Xb$ and $b$ is the vector of least squares estimates. If the underlying model assumptions are true, then the $\hat{e}$'s have normal distributions, each with mean zero. They do not, in general, have the same variance, nor are they independently distributed.

Half-normal probability plots provide an informal test of the normality of the residuals in a regression analysis. Half-normal plots show more sensitivity to kurtosis at the expense of not revealing skewness. A detailed discussion of producing half-normal probability plots in SAS was given by Sall (1978). A useful exposition on the interpretation of half-normal plots was given by Daniel and Wood (1971).
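Half-normal plot coordinates follow from the normal ones: if $P(|Z| \le q) = p$, then $2\Phi(q) - 1 = p$, so $q = \Phi^{-1}\left((1+p)/2\right)$. The Python sketch below pairs these quantiles with the ordered absolute residuals, assuming $p_i = i/(n+1)$; the function name is hypothetical. Because raw residuals do not share a common variance, one might prefer to pass in standardized residuals, a step this sketch leaves to the caller.

```python
from statistics import NormalDist
from typing import List, Tuple


def half_normal_points(residuals: List[float]) -> List[Tuple[float, float]]:
    """Return (q_i, |e|_(i)) pairs for a half-normal probability plot."""
    abs_sorted = sorted(abs(e) for e in residuals)
    n = len(abs_sorted)
    std_normal = NormalDist()
    points = []
    for i, a in enumerate(abs_sorted, start=1):
        p = i / (n + 1)  # plotting position
        q = std_normal.inv_cdf((1 + p) / 2)  # half-normal quantile
        points.append((q, a))
    return points


for q, a in half_normal_points([-1.0, 0.5, 2.0, -0.2]):
    print(f"{q:6.3f} {a:4.1f}")
```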
Gamma probability plots
While the normal distribution is important in statistics, the gamma distribution is also encountered frequently. The general gamma distribution depends on a location, a scale, and a shape parameter, and it can be transformed to a standardized distribution with only a shape parameter. The chi-square and exponential distributions are special cases of the gamma distribution. A chi-square random variable with $d$ degrees of freedom is a gamma random variable with shape parameter equal to $d/2$ (and scale parameter 2); an exponential random variable is a gamma random variable with shape parameter equal to 1. Wilk et al. (1962) described the construction and interpretation of gamma probability plots. The SAS function GAMINV can be used to produce gamma probability plots.
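Outside SAS, the role of GAMINV can be played by any routine that inverts the gamma distribution function. The self-contained Python sketch below assumes unit scale, evaluates the regularized lower incomplete gamma function by its power series, and inverts it by bisection. The names `gamma_cdf` and `gamma_inv` are hypothetical, and the series approach is meant for the moderate quantiles typical of plotting positions, not as a production-grade implementation. For a chi-square variable with $d$ degrees of freedom, `2 * gamma_inv(p, d / 2)` gives the quantile, and `gamma_inv(p, 1)` gives the unit exponential quantile.

```python
import math


def gamma_cdf(x: float, shape: float) -> float:
    """Regularized lower incomplete gamma P(shape, x) for unit scale, via its power series."""
    if x <= 0.0:
        return 0.0
    # P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{k>=0} x^k / ((a + 1)(a + 2)...(a + k))
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= x / (shape + k)
        total += term
    log_prefactor = shape * math.log(x) - x - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def gamma_inv(p: float, shape: float, tol: float = 1e-12) -> float:
    """Quantile of the unit-scale gamma distribution, found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    lo, hi = 0.0, max(1.0, shape)
    while gamma_cdf(hi, shape) < p:  # expand the bracket until it contains the quantile
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Gamma probability plot scale for n = 5 and shape 2, with p_i = i/(n+1).
n, shape = 5, 2.0
print([round(gamma_inv(i / (n + 1), shape), 4) for i in range(1, n + 1)])
```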
Exponential probability plot

The exponential distribution is often used to characterize failure or waiting time distributions. Figure 9 is an exponential probability plot produced by SAS for the waiting times between major train wrecks in the U.S. during the period from 1900 to 1960.
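For the exponential distribution the inverse has a closed form, $F^{-1}(p) = -\ln(1-p)$ for unit scale, so an exponential probability plot needs no special function. In the Python sketch below (hypothetical function names, plotting positions $p_i = i/(n+1)$ assumed), the ordered waiting times are paired with these quantiles. If the exponential model fits, the points lie near a line through the origin whose slope estimates the mean waiting time; fitting that line by least squares through the origin is one simple choice among several.

```python
import math
from typing import List, Tuple


def exponential_plot_points(waits: List[float]) -> List[Tuple[float, float]]:
    """Return (-ln(1 - p_i), x_(i)) pairs with p_i = i/(n+1)."""
    xs = sorted(waits)
    n = len(xs)
    return [(-math.log1p(-i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


def slope_through_origin(points: List[Tuple[float, float]]) -> float:
    """Least squares slope of x on q for a line constrained through the origin."""
    return sum(q * x for q, x in points) / sum(q * q for q, _ in points)


pts = exponential_plot_points([12.0, 3.5, 7.0, 1.0, 20.0])  # made-up waiting times
print(pts)
print(slope_through_origin(pts))
```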
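Chi-square probability plot

Sample variances or mean squares from a normal population have, after scaling, a chi-square distribution: if $s^2$ is computed from $m$ observations from a normal population with variance $\sigma^2$, then $(m-1)s^2/\sigma^2$ has a chi-square distribution with $m-1$ degrees of freedom. A chi-square probability plot can be used to provide an informal test of the homogeneity of sample variances. Figure 10 is a chi-square probability plot for the sample variances, one for each of 6 different bacterial strains, of the amount of nitrogen in 5 red clover plants inoculated with that strain.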
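Under the hypothesis of a common variance $\sigma^2$, each of $k$ sample variances computed from $m$ observations satisfies $(m-1)s^2/\sigma^2 \sim \chi^2_{m-1}$, so the ordered $s^2$ values plotted against chi-square quantiles should fall near a straight line through the origin. The Python sketch below assumes equal group sizes and, to stay self-contained, an even number of degrees of freedom, for which the chi-square distribution function has a closed form; with 5 plants per strain, as in the red clover example, there are 4 degrees of freedom. All function names are hypothetical, and plotting positions $p_i = i/(k+1)$ are assumed.

```python
import math
from statistics import variance
from typing import List, Tuple


def chi2_cdf_even(x: float, df: int) -> float:
    """Chi-square distribution function for a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    # F(x) = 1 - exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_inv_even(p: float, df: int, tol: float = 1e-12) -> float:
    """Chi-square quantile for even df, found by bisection on chi2_cdf_even."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, float(df)
    while chi2_cdf_even(hi, df) < p:  # expand the bracket
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi_square_plot_points(groups: List[List[float]]) -> List[Tuple[float, float]]:
    """Pair the ordered sample variances with chi-square quantiles at p_i = i/(k+1)."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    df = sizes.pop() - 1
    s2 = sorted(variance(g) for g in groups)
    k = len(s2)
    return [(chi2_inv_even(i / (k + 1), df), v) for i, v in enumerate(s2, start=1)]


# Made-up groups of 5 observations each (df = 4), for illustration only.
print(chi_square_plot_points([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]]))
```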
[Figure 9: Exponential Probability Plot]
[Figure 10: Chi-square Probability Plot, Nitrogen Content of Red Clover Plants Inoculated with Combination Cultures]
References
Abbot, W.H. (1960), Probability Charts, private publication, St. Petersburg, FL.

Daniel, C. and F. Wood (1971), Fitting Equations to Data, John Wiley and Sons, New York, NY.

Ferrell, E.B. (1958), Plotting Experimental Data on Normal or Log-normal Probability Paper, Industrial Quality Control, 15, pp. 12-15.

Hanson, V.F., J.H. Carlson, K.M. Papauchado, and N.A. Nielson (1976), The Liberty Bell: Composition of the Famous Failure, American Scientist, 64, pp. 614-619.

Harner, E.J., G.H. Hobbs, E.C. Keller Jr., A.G. Everett, and D.M. Chilko (1981), Assessing Estimates of the 99th Percentile of a Distribution, Environmetrics Proceedings (to appear).

Hillyer, M.J. (1978), Evaluation of the Effect of Distributional Assumptions on Statistical Forms of the Photochemical Oxidant Standard, Systems Applications, Inc., San Rafael, CA.

Kimball, B.F. (1960), On the Choice of Plotting Positions on Probability Paper, Journal of the American Statistical Association, 55, pp. 546-560.

King, J.R. (1965), Graphical Data Analysis with Probability Papers, Technical and Engineering Aids for Management, Lowell, MA.

Sall, J.P. (1978), SAS Regression Applications, SAS Technical Report A-102, SAS Institute Inc., Cary, NC.

Wilk, M.B., R. Gnanadesikan, and M.J. Huyett (1962), Probability Plotting for the Gamma Distribution, Technometrics, 4, pp. 1-20.