Running Head: Normality and Outliers in ANOVA and MANOVA
Checking for Normality and Outliers in ANOVA and MANOVA
Lynne Cox
University of Calgary
Checking for non-normality and outliers in ANOVA and MANOVA
A parametric test is a statistical procedure that takes a sample statistic and applies those
results to make inferences regarding the general population. To ensure the components of the
test are compatible with each other, there are assumptions that must be met within each
multivariate analysis (Stevens, 2009). Once you have collected your data and before moving
forward with statistical analysis, the next step is to look at the quality of the data and take some
necessary precautions. Data screening involves checking that the data have been entered correctly,
checking for missing values and outliers, and checking for normality (Hindes, 2012).
This paper covers two of these screening tasks: checking for non-normality and checking for
outliers in the Analysis of Variance (ANOVA) and the Multivariate Analysis of Variance
(MANOVA).

ANOVA is a statistical technique used to test whether the means of three or more groups
differ.

MANOVA is an extension of the ANOVA that tests the difference between two or more
groups on vectors of means, which allows two or more dependent variables to be
examined at the same time.
Retrieved from http://www.creative-wisdom.com/pub/parametric_WUSS2002.pdf
One of the assumptions of parametric tests is that the variables are normally distributed.
The Central Limit Theorem supports this assumption in large samples: it states that with a
large sample size the mean and the sum of the sample will tend to follow a normal distribution,
whose shape is commonly referred to as the bell curve (Stevens, 2009).
If a dataset follows a normal distribution, then about 68% of the observations will fall
within one standard deviation of the mean, about 95% within two standard deviations and about
99.7% within three standard deviations of the mean. Although no method gives a definitive
conclusion, two ways to evaluate normality are graphical representation and statistical methods.
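The 68%, 95% and 99.7% figures can be checked directly from the normal distribution. The following is a minimal sketch using only the Python standard library; the function name prob_within is a hypothetical helper chosen for illustration.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```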
Read more at: Central Limit Theorem: A Simple Explanation of the CLT | Suite101.com
http://suite101.com/article/central-limit-theorem-a98157#ixzz1yOeaq8k5
In this example the lighter area shows about 68% of
the observations falling within one standard
deviation of the mean (-1, 1). About 95% of the observations
fall within 2 standard deviations of the mean (-2, 2) and
about 99.7% of observations fall within 3 standard
deviations of the mean (-3, 3), resulting in a bell curve.
Retrieved from http://www.stat.yale.edu/Courses/1997-98/101/normal.htm
Outliers
Outliers are data points that are extreme, atypical and infrequent. The values are far from
the mean and fall outside the distribution pattern. Outliers are not always random or by chance
and need to be given special notice, as a single outlier can have an excessive influence on the
strength and direction of the linear relationship between two variables
(Sattler, 2008, p. 99). In large data samples, you can expect to find a small number of outliers.
A data sample will always have a sample minimum and a sample maximum, but these are not
necessarily outliers, as the sample minimum and sample maximum may not be unusually far
from the other data points.
Outliers can be caused by a data recording or entry error, instrument error, or by subjects
being simply different from the rest of the sample (Sattler, 2008). Since outliers might cause
your data to be non-normal, it is important to identify the cause of each outlier and then decide
what to do about it (Stevens, 2009, p. 11).
Examples of outliers

Outliers are easy to spot when there are only two variables.
Examples with a small data set and a larger data set:
Case Number    x1     x2
1              111    68
2              92     46
3              90     50
4              107    59
5              98     50
6              150    66
7              118    54
8              110    51
9              117    59
10             94     97
In this data set, it is easy to spot that case number 6 in x1 has an unexpectedly high
value compared with the others, as does case number 10 in x2. Both of these can be
identified as outliers just by observing the data set (Stevens, 2009, p. 11).
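The visual impression can be backed up with standardized scores. The sketch below is an illustration rather than a procedure taken from the sources: it converts each score in the ten-case example to a z-score and flags cases beyond a cutoff. The cutoff of 2.0 is an assumption made for this small sample, because with only ten cases a z-score based on the sample standard deviation cannot exceed about 2.85, so a stricter cutoff such as 3 would never flag anything.

```python
from statistics import mean, stdev

# Scores from the ten-case example (cases 1 to 10).
x1 = [111, 92, 90, 107, 98, 150, 118, 110, 117, 94]
x2 = [68, 46, 50, 59, 50, 66, 54, 51, 59, 97]


def flag_by_z(values, cutoff=2.0):
    """Return the 1-based case numbers whose absolute z-score exceeds the cutoff."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [case for case, v in enumerate(values, start=1) if abs(v - m) / s > cutoff]


print(flag_by_z(x1))  # [6]
print(flag_by_z(x2))  # [10]
```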
In the example below, it is harder to identify the outliers when using a larger data set with four
variables:
Case Number    x1     x2     x3     x4
1              111    68     17     81
2              92     46     28     67
3              90     50     19     83
4              107    59     25     71
5              98     50     13     92
6              150    66     20     90
7              118    54     11     101
In this example, case 13 does not split off dramatically from the other
subject scores and does not stand out at first glance. Yet a closer look at
case 13 shows that its scores on x2, x3, and x4 are low while its score on x1
is high compared to the rest of the data set
(Stevens, 2009, p. 12).
Case Number    x1     x2     x3     x4
8              110    51     26     82
9              117    59     18     87
10             94     97     12     69
11             130    57     16     97
12             118    51     19     78
13             155    40     9      58
14             118    61     20     103
15             109    66     13     88
The boxplot is a graphical display in which the
black line marks the median, the shaded box covers the
middle 50% of the scores, and the whiskers above and below the box
each cover about 25% of the scores. The ends of the whiskers mark the
smallest and largest non-outlier scores.
An open dot identifies a mild outlier, a score more
than 1.5 IQR (interquartile range) from the edge of the box, and a star
indicates an extreme outlier, a score
more than 3 IQR from the edge of the box.
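The 1.5 IQR and 3 IQR rules can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of SPSS output: the quartiles come from the Python standard library's "inclusive" interpolation method, whereas statistical packages may use a different quartile convention, so borderline cases can be classified differently. The function name classify_outliers is hypothetical.

```python
from statistics import quantiles

x1 = [111, 92, 90, 107, 98, 150, 118, 110, 117, 94]
x2 = [68, 46, 50, 59, 50, 66, 54, 51, 59, 97]


def classify_outliers(values):
    """Map 1-based case numbers to 'mild' or 'extreme' using the boxplot rules."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = {}
    for case, v in enumerate(values, start=1):
        # Distance beyond the nearer edge of the box (0 if the score is inside it).
        beyond = max(q1 - v, v - q3, 0)
        if beyond > 3 * iqr:
            labels[case] = "extreme"
        elif beyond > 1.5 * iqr:
            labels[case] = "mild"
    return labels


print(classify_outliers(x1))  # {6: 'mild'}
print(classify_outliers(x2))  # {10: 'mild'}
```

Under this quartile convention, case 6 on x1 and case 10 on x2 are the only scores beyond the 1.5 IQR fences, and neither is beyond the 3 IQR fences.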
The normal Q-Q plot is another
graphical way to look at the level
of normality and identify outliers.
You can see that all but one of the dots
fall on or very close to
the straight reference line.
Retrieved from
http://www.psychwiki.com/wiki/How_do_I_determine_whether_my_data_are_normal%3F
When there are extreme values in a data set, it is better to use the median as a measure of
central tendency, as the median is largely unaffected by outliers and is a robust measure of central
tendency (Meyers, 2006). If an outlier is discovered, it is important to identify the cause before
deciding on further analysis. If the data were entered incorrectly, they can be re-entered; if the
outlier was due to an instrumentation error, it could be dropped. The analysis can
also be run twice, once with the outlier and once without it.
Checking for non-normality and outliers in an ANOVA
As previously mentioned, the two main methods of assessing normality are:

Graphically, using a visual inspection

Numerically, relying on a statistical test
Beginning researchers are advised to carry out both methods rather than relying on
just one. SPSS (Statistical Package for the Social Sciences) allows
you to test for both normality and outliers. Using the Explore command in SPSS, we are able to
first look for any outliers and then test for normality.
Using SPSS to look for univariate (ANOVA) outliers
1. Choose Analyze, Descriptive Statistics and then Explore
2. Select variable
3. Click on Statistics and check off Outliers
4. Click on Plots and unclick Stem and Leaf
5. Click OK to produce output
Retrieved from https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
Descriptives Table

The mean and trimmed mean will help identify outliers. In this case the mean is 1.77
while the 5% trimmed mean is 1.74, only slightly lower. The trimmed mean is calculated
after the highest 5% and the lowest 5% of the scores have been removed. By comparing the
two values you can identify whether any extreme scores are having an influence on the variable.
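The comparison between the mean and a trimmed mean can be sketched as follows. The data behind the 1.77 and 1.74 values are not given in the paper, so this illustration reuses the x1 scores from the ten-case example and trims 10% from each tail (one case per tail) instead of 5%. The helper trimmed_mean is hypothetical and drops whole cases only, which is a simplification; a statistical package may handle fractional cases differently.

```python
from statistics import mean

x1 = [111, 92, 90, 107, 98, 150, 118, 110, 117, 94]


def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of cases from each tail (whole cases only)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of cases removed from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


print(mean(x1))                # 108.7
print(trimmed_mean(x1, 0.10))  # 105.875
```

The trimmed mean is noticeably lower than the mean here because the trimming removes the high score of 150, which suggests that this score is pulling the mean upward.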

Extreme Values and the Boxplot

The Boxplot and the Extreme Values table show the mild and extreme outliers.
Referring to the Extreme Values table, you can identify the case number of each outlier. This
information will help guide the decision on what is to be done with the outliers. You may
choose to re-enter the data, remove the outlier, or run two analyses: one with the outlier and
one without.

Using SPSS to check for non-normality in an ANOVA
1. Choose Analyze, Descriptive Statistics and then Frequencies
2. Select variables
3. Click on Charts, Histogram, with normal curve
4. Click OK to produce output

Most tests rely on the assumption of normality. Referring to the descriptive table, we are
able to begin by looking at the measures of skewness and kurtosis. Skewness measures
the symmetry of a distribution while kurtosis measures the general peakedness of a
distribution. A normally distributed variable (showing mesokurtosis) will show values of
skewness and kurtosis around zero (Meyers, 2006).
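Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based definitions, with kurtosis reported as excess kurtosis so that a normal distribution scores zero. This is an assumption made for illustration: statistical packages often apply small-sample corrections, so their values will differ slightly from these. The function name skewness_kurtosis is hypothetical.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) based on the central moments of the data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution gives 0
    return skew, excess_kurt


# x2 from the ten-case example: the high score of 97 produces positive skewness.
x2 = [68, 46, 50, 59, 50, 66, 54, 51, 59, 97]
skew, kurt = skewness_kurtosis(x2)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```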

Although the Histogram is another approach to be included in looking for non-normality
in univariate analysis, it does not provide a definitive indication of violation of normality.
The histogram should be used with the normal probability plot. This plot sets the ranked data
against the values expected under a normal distribution, and when the data fall directly on the
straight diagonal line, normality is assumed.

The data does fall off the line, and further analysis is needed.
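The points of a normal probability plot can be computed by hand to see what the plot is comparing. In this sketch each ranked score is paired with the standard normal quantile expected at its rank. The plotting position (i - 0.5) / n is an assumption made for illustration; software packages may use a slightly different formula. The function name qq_points is hypothetical.

```python
from statistics import NormalDist, mean, stdev

x1 = [111, 92, 90, 107, 98, 150, 118, 110, 117, 94]


def qq_points(values):
    """Return (expected normal quantile, observed z-score) pairs for the ranked data."""
    ordered = sorted(values)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, v in enumerate(ordered, start=1):
        expected = std_normal.inv_cdf((i - 0.5) / n)
        observed = (v - m) / s
        points.append((expected, observed))
    return points


for expected, observed in qq_points(x1):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

If the data were normal, the two columns would be close to each other in every row. A score whose observed value is far from its expected value, such as the largest score in x1, falls off the diagonal line in the plot.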
Tests of Normality
1. Choose Analyze, Descriptive Statistics, Explore
2. Select variables
3. Click Plots, unclick Stem and Leaf, and click Normality plots with tests
4. Click OK to produce output

When looking at the Tests of Normality, you want the tests to come out not
significant (p > .05), which would indicate that the data do not depart from normality.
Here both tests are significant at p < .001, so both show non-normality,
which is what the other approaches have also indicated.
Checking for Non-Normality and Outliers in a MANOVA
As MANOVA tests are sensitive to outliers, data should be screened and run through
normality tests and plots to see that the assumptions are met. Mahalanobis distances
help identify outliers in a MANOVA. If the Mahalanobis distance for a case exceeds
the critical value found in the table, the case is considered an outlier. “The Mahalanobis distance
statistic D² measures the multivariate “distance” between each case and the group multivariate
mean (known as a centroid)” (Meyers, 2006, p. 67). The critical value tables are located in the
back of most textbooks.
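The Mahalanobis distance can be computed directly from the definition, as the squared distance of each case from the centroid, scaled by the covariance matrix. The sketch below applies it to the fifteen-case, four-variable example given earlier, treating all cases as one group. It uses only the Python standard library, so the small linear-equation solver and the function names solve and mahalanobis_d2 are hypothetical helpers written for this illustration. The resulting values would then be compared against the tabled critical value.

```python
# The fifteen-case example: (x1, x2, x3, x4) for cases 1 to 15.
rows = [
    (111, 68, 17, 81), (92, 46, 28, 67), (90, 50, 19, 83), (107, 59, 25, 71),
    (98, 50, 13, 92), (150, 66, 20, 90), (118, 54, 11, 101), (110, 51, 26, 82),
    (117, 59, 18, 87), (94, 97, 12, 69), (130, 57, 16, 97), (118, 51, 19, 78),
    (155, 40, 9, 58), (118, 61, 20, 103), (109, 66, 13, 88),
]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_d2(data):
    """Return the squared Mahalanobis distance of each case from the centroid."""
    n, p = len(data), len(data[0])
    centroid = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - centroid[j] for j in range(p)] for row in data]
    # Sample covariance matrix (n - 1 in the denominator).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # D^2 = d' S^-1 d, computed by solving S y = d rather than inverting S.
    return [sum(d * y for d, y in zip(c, solve(cov, c))) for c in centered]


d2 = mahalanobis_d2(rows)
for case, value in sorted(enumerate(d2, start=1), key=lambda t: -t[1])[:3]:
    print(f"case {case}: D2 = {value:.2f}")
```

Solving the system for each case avoids forming the inverse covariance matrix explicitly, which keeps the code short for a handful of variables.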
Because more variables must be normally distributed to satisfy the
assumption of normality in a MANOVA, checking for non-normality is a more rigorous task
than it is in an ANOVA. Two additional properties to check for
the normality assumption are “(a) any linear combinations of the variables are normally
distributed, and (b) all subsets of the set of variables have multivariate normal distributions”
(Stevens, 2009, p. 222). The second property implies that the scatterplots for each pair of
variables will be elliptical (Stevens, 2009).


Example of a scatterplot showing an elliptical correlation between the variables.
The higher the correlation, the thinner the ellipse (Sattler, 2007).
Scatterplot.gif

The scatterplot also shows outliers as the data points that fall outside the oval
(elliptical) shape.
Retrieved from: http://www.psychwiki.com/wiki/Analyzing_Data

As in the ANOVA, the distribution of each variable in a MANOVA should
follow the bell-shaped curve. Variable 2 shows positive skewness, while variable 1
appears normal, as it follows the bell shape.

When a variable has violated the assumption of normality, a data transformation can be
used to modify the variable (Hindes, 2012). The square root transformation, the
logarithmic transformation and the inverse transformation are three of the more common
transformations used. Including a variable that is not normal can reduce the power of the
test.
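The effect of the three transformations can be seen by comparing skewness before and after each one. The scores below are hypothetical values invented for illustration, and the skewness function uses the simple moment-based definition without small-sample corrections. All scores must be positive for the logarithmic and inverse transformations to be defined.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed scores (all greater than zero).
scores = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20]

transforms = {
    "raw": lambda v: v,
    "square root": math.sqrt,
    "logarithm": math.log,
    "inverse": lambda v: 1 / v,  # note: reverses the order of the scores
}

for name, fn in transforms.items():
    transformed = [fn(v) for v in scores]
    print(f"{name:12s} skewness = {skewness(transformed):.2f}")
```

For positively skewed data such as these, the square root gives a mild correction and the logarithm a stronger one. Because the inverse transformation reverses the order of the scores, the transformed variable has to be interpreted in the opposite direction.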
Conclusion
Research and design uses a systematic approach to collecting and analyzing data to help
explain or predict a certain occurrence or trend. Using univariate and multivariate data analysis,
we are able to obtain a more detailed description of the relationship of the variables being
studied. Stronger results are reached if the data is screened and the assumptions of the test have
not been violated. When using the SPSS software program and following a very systematic
process, checking for non-normality and outliers in an ANOVA and a MANOVA analysis is
straightforward and thorough, with data being analyzed through both graphical and statistical
methods.
References
Gay, L. R., Mills, G. E., & Airasian, P. (2012). Educational research: Competencies for
analysis and applications (10th ed.). NJ: Pearson Education, Inc.
Hindes, Y. (2012). EDPS 607 L20 Multivariate Design and Analysis, Spring 2012 [PowerPoint].
Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and
interpretation. Thousand Oaks, California: Sage Publication.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). La Mesa,
California: Jerome M. Sattler, Publisher, Inc.
Todorov, V., & Filzmoser, P. (2010). Robust statistic for the one-way MANOVA. Computational
Statistics and Data Analysis, 54, 37-48. doi:10.1016/j.csda.2009.08.015
Retrieved from: http://www.creative-wisdom.com/pub/parametric_WUSS2002.pdf
Retrieved from: http://www.scribd.com/doc/49320849/115/Assumption-testing
Retrieved from: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
Retrieved from: https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
Retrieved from: http://pathwayscourses.samhsa.gov/eval201/eval201_supps_pg16.htm
Scatterplot.gif retrieved from:
https://www.google.ca/search?hl=en&q=scatterplot+elliptical&aq=f&aqi=g-lK1gbsK1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=1202&bih=591&wrapid=tlif134043255544610&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&tab=wi&ei=zGDlT5-sNMTg2gX5ofDZCQ