Crude Test for Normality:
Normal Probability Plots
Be gracious with me on this one. I haven’t yet had a chance to proofread and wordsmith it.
Recall that the Central Limit Theorem tells us that the sample mean x̄ for samples of size n is
approximately normal when n is sufficiently large and that the distribution of x̄ becomes more closely
normal as the sample size increases. A related fact is that if the original distribution of
x is normal, that of x̄ is exactly normal for any n. If n is small and x is not normal, the distribution
of x̄ may be far from normal, thereby technically invalidating the use of the z- and t-distributions
for constructing confidence intervals and performing tests of hypothesis.
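The effect of sample size can be illustrated with a small simulation. The following is a minimal sketch, not part of the original handout: it assumes an exponential population (chosen only because it is strongly skewed), and the function names, the seed, and the number of repetitions are all hypothetical choices. It measures the skewness of the simulated sample means, which should be near zero for a normal distribution.

```python
import random


def skewness(values):
    """Moment-based skewness; about 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_means(n, reps=5000, seed=1):
    """Simulate `reps` sample means, each from n exponential draws.

    The exponential population is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]


# Compare a very small sample size with n = 30.
for n in (2, 30):
    print(f"n = {n:2d}: skewness of sample means = "
          f"{skewness(sample_means(n)):.2f}")
```

The skewness for n = 30 should come out much closer to zero than for n = 2, which is the sense in which the distribution of x̄ becomes "more closely normal" as n grows.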
Although there is no universal agreement regarding what constitutes a sufficiently large sample
size, most texts suggest that for n greater than 30 or so, the distribution of x̄ usually can be safely
regarded as being “normal enough.” We’ve adopted the point of view that in the absence of any
other information, we can use the z distribution if σx is known and we can use the t distribution if
σx is not known and must be approximated using the sample standard deviation S. However, we
sometimes have to take numerical results with a grain of salt when n is small unless we know
that x itself is close to being normally distributed.
Sometimes we can use a stem and leaf diagram or a relative frequency histogram to determine
whether x is approximately normal. However, if we are dealing with a small set of data, either of
these two displays can be awkward to construct and either can give misleading results. There are
several quite sophisticated tests for normality available but they are a bit beyond us in this course.
An alternative means of crudely assessing normality for small data sets is provided by normal
probability plots. These plots are constructed so that they are roughly linear if x is approximately
normal.
Here’s the basic idea. Recall that if x is normal with mean µx and standard deviation σx, the
corresponding standardized normal variable is defined by
$$z = \frac{x - \mu_x}{\sigma_x}.$$
Written another way,
$$z = mx + b$$
where the slope m and the intercept b are given by
$$m = \frac{1}{\sigma_x} \quad \text{and} \quad b = -\frac{\mu_x}{\sigma_x}.$$
If z is plotted versus x, we thus obtain a straight line. If x is approximately normal, we expect to
obtain a graph that is approximately linear.
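As a quick numerical check of the straight-line form, here is a small sketch. The values of the mean and standard deviation are hypothetical, chosen only for illustration. It confirms that standardizing x directly gives the same z as the line with slope 1/σx and intercept −µx/σx.

```python
# Hypothetical population parameters, for illustration only.
mu_x = 100.0
sigma_x = 15.0


def standardize(x):
    """z-score computed directly from the definition."""
    return (x - mu_x) / sigma_x


# The same relationship written as a line z = m*x + b.
slope = 1.0 / sigma_x
intercept = -mu_x / sigma_x

for x in (70.0, 100.0, 130.0):
    z_direct = standardize(x)
    z_line = slope * x + intercept
    print(f"x = {x:5.1f}  z (definition) = {z_direct:6.3f}  "
          f"z (line) = {z_line:6.3f}")
```

Note that the intercept is negative whenever the mean is positive, since the line must pass through z = 0 at x = µx.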
On the TI-83, we can construct a normal probability plot in the following manner.
• Enter the x data in a list, say, L1.
• Although it is not absolutely necessary, it’s a good idea to sort the data in ascending order.
Do this using STAT/EDIT/SortA(L1).
• In 2nd STAT PLOT set up Plot 1 as follows.
• Select the lower right plot picture for the plot Type.
• Choose L1 for the Data List.
• For the Data Axis choose X if you want a plot of z vs x and choose Y if you want a plot
of x vs z.
• Choose any of the three available Mark symbols; the xz or zx data will be plotted using
the symbol you choose.
• Use ZOOM 9 to generate the desired normal probability plot.
• Inspect the plot to see if it is roughly linear indicating that x is roughly normal. We note that
the plot is also useful for detecting outliers. Generally, it’s a good idea to delete such outliers
from L1 and repeat the process using the reduced list of data.
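For readers without a TI-83, the steps above can be sketched in Python. This is only an approximation of the procedure under stated assumptions: the data values are made up, the function names are hypothetical, and the rule used to place the expected z values, (i − 0.375)/(n + 0.25), is one common choice that may not match what a particular calculator does. In place of judging linearity by eye, the sketch reports the correlation between the sorted data and the expected z values; a value near 1 suggests a roughly linear plot.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Return (x, z) pairs for a normal probability plot of `data`."""
    xs = sorted(data)  # same role as SortA(L1)
    n = len(xs)
    std_normal = NormalDist(0, 1)
    # Assumed plotting positions; other tools may use a different rule.
    zs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
          for i in range(1, n + 1)]
    return list(zip(xs, zs))


def linear_correlation(points):
    """Pearson correlation between the x and z coordinates."""
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    x_bar, z_bar = mean(xs), mean(zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


# Hypothetical small data set, for illustration only.
scores = [62, 71, 68, 75, 80, 66, 73, 70, 77, 69]
points = normal_plot_points(scores)
for x, z in points:
    print(f"{x:6.1f} {z:7.3f}")
print(f"correlation = {linear_correlation(points):.3f}")
```

An outlier typically shows up as a point far off the line formed by the rest, and it also pulls the correlation down.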
Here is a brief explanation of what the TI does to generate the plot. It first estimates the
cumulative frequencies for each of the ordered values xi . Commonly used approximations for these
frequencies are
$$f_i = \frac{i}{n + 1}, \quad i = 1, \ldots, n$$
and
$$f_i = \frac{i - 0.375}{n + 0.25}, \quad i = 1, \ldots, n.$$
The expected values of z are next calculated using invNorm(fi, 0, 1). These expected z values are
used in the normal plot. It should be noted that different statistics packages estimate the fi in
various ways, but all are based on the idea that z and x should be roughly linearly related if x is
roughly normal.
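To see how much the choice of approximation matters, the two formulas for fi can be compared side by side. The sketch below is an illustration, not a description of any particular calculator or package: the function names and the rule labels "n_plus_1" and "adjusted" are hypothetical, and `NormalDist().inv_cdf` from Python's standard library plays the role of invNorm(fi, 0, 1).

```python
from statistics import NormalDist


def plotting_positions(n, rule="adjusted"):
    """Estimated cumulative frequencies f_i for i = 1, ..., n."""
    if rule == "n_plus_1":
        return [i / (n + 1) for i in range(1, n + 1)]
    if rule == "adjusted":
        return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    raise ValueError(f"unknown rule: {rule}")


def expected_z(n, rule="adjusted"):
    """Expected z values, the analogue of invNorm(f_i, 0, 1)."""
    std_normal = NormalDist(0, 1)
    return [std_normal.inv_cdf(f) for f in plotting_positions(n, rule)]


# Compare the two rules for a sample of size 5.
for rule in ("n_plus_1", "adjusted"):
    z_values = ", ".join(f"{z:6.3f}" for z in expected_z(5, rule))
    print(f"{rule:>9}: {z_values}")
```

Both rules give z values that are symmetric about zero; the second pushes the extreme values slightly farther out, so the two plots differ mainly at the ends.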