Download How do I Test my Data for Normality? - Integral

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

History of statistics wikipedia , lookup

Time series wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
How do I Test my Data for Normality?
Steven Wachs
Principal Statistician
Integral Concepts, Inc.
Copyright © 2009 Integral Concepts, Inc.
Many statistical tests and
procedures assume that data
follows a normal (bell-shaped)
distribution.
For example, all of the following
statistical tests, statistics, or
methods assume that data is
normally distributed:
•
•
•
•
•
Normal Distribution (Mean = 5, Std. Dev = 0.1)
4
3
2
Hypothesis tests such
1
as t tests, Chi-Square
tests, F tests
0
Analysis of Variance
4.7
4.8
4.9
5.0
5.1
(ANOVA)
x
Least Squares
Regression
Control Charts of Individuals with 3-sigma limits
Common formulas for process capability indices such as Cp and Cpk
5.2
5.3
Before applying statistical methods that assume normality, it is necessary to perform a normality
test on the data (with some of the above methods we check residuals for normality). We
hypothesize that our data follows a normal distribution, and only reject this hypothesis if we
have strong evidence to the contrary.
While it may be tempting to judge the normality of the data by simply creating a histogram of the
data, this is not an objective method to test for normality – especially with sample sizes that are
not very large. With small sample sizes, discerning the shape of the histogram is difficult.
Furthermore, the shape of the histogram can change significantly by simply changing the
interval width of the histogram bars.
Normal probability plotting may be used to objectively assess whether data comes from a
normal distribution, even with small sample sizes. On a normal probability plot, data that follows
a normal distribution will appear linear (a straight line). For example, a random sample of 30
data points from a normal distribution results in the first normal probability plot (below left).
Here, the data points fall close to the straight line. The second normal probability plot (below
right) illustrates data that does not come from a normal distribution.
1
PO Box 251652 • West Bloomfield, MI 48325 • Tel. 248-421-7590 • Fax. 248-539-3858 • www.integral-concepts.com
Probability Plot of Weibull Data
Probability Plot of Normal Data
Normal
Normal
99
99
Mean
StDev
N
AD
P-Value
95
Mean
StDev
N
AD
P-Value
95
90
80
80
70
70
Percent
Percent
90
5.002
0.09254
30
0.372
0.399
60
50
40
30
60
50
40
30
20
20
10
10
5
5
1
1
4.8
4.9
5.0
Normal Data
5.1
-2000
5.2
-1000
0
1000
2000
weibull
3000
4000
5000
The “P-Value” on the plot legend is based on the computed Anderson-Darling statistic and
provides an objective assessment of the hypothesis that the data is normally distributed (comes
from a normal distribution). How the p-value is formally defined is an advanced subject and is
not discussed in this article. Here, we focus on the interpretation.
If the p-value is “small” (usually less than 0.05), then we have strong evidence that the data is
not normal (does not come from a normal distribution). If the p-value is “large” (usually more
than 0.10), then we assume that the data is normal (comes from a normal distribution). Pvalues larger than 0.10 tell us that there is not sufficient evidence to reject the normality
assumption (hypothesis). In this case, we are justified in using statistical methods that assume
normality.
Many methods are available to handle non-normal data and these should be utilized when
necessary. Applying methods which assume the normal distribution when this assumption is
not valid often results in incorrect conclusions.
2
PO Box 251652 • West Bloomfield, MI 48325 • Tel. 248-421-7590 • Fax. 248-539-3858 • www.integral-concepts.com
256.8
803.7
30
7.382
<0.005