Producing Error Bars of Appropriate Size in SPSS
The Problem
Version 13 of SPSS allows the user to include error bars in standard line and bar
graphs (I'm not referring to the unsatisfactory 'interactive' graphs). This facility is
very useful, but it quite often happens that the standard errors or confidence intervals
which SPSS calculates from the data are inappropriate. Some examples are:
1. The sample is clustered, so the SEs that SPSS calculates, which are based on the
assumption of independence, are too small.
2. A pooled standard error, rather than a separate error term for each condition, is
needed.
3. SPSS calculates errors appropriate for between-subjects comparisons, when the
focus in the graph is on within-subjects comparisons.
Some graphics programs, such as Deltagraph, allow the user to enter the appropriate
SEs or 95% CI limits directly, and don't insist on calculating these quantities from the
data. Given that SPSS does insist on doing its own calculations, how can we get it to
produce graphs showing the appropriate quantities?
A Solution
The method described here uses dummy cases (two to specify each mean and
associated error bar). The labour-saving convenience is the fact that, with two values,
a given standard deviation, SD, can be obtained by setting one of the values equal to
(the mean – SD) and the other equal to (the mean + SD). For example, if the required
mean for a given point in the graph is 100 and the required SD is 15, the values used
would be 85 and 115. The mean of these values is 100, and the SD is 15, provided
that the SD is calculated with a divisor of n rather than n - 1. For the purposes of
graphs (and for most other purposes), however, SPSS calculates the sample standard
deviation, which uses a divisor of n - 1. With only two values, the SD calculated this
way is √2 times larger, so to obtain an SD of 15 in the above example, the values
entered into SPSS should be 100 - 15/√2 and 100 + 15/√2, namely 89.3934 and 110.6066.
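The effect of the divisor can be checked outside SPSS. The following is a minimal Python sketch, not part of the original method: the function name dummy_pair is made up for illustration, and the standard library's statistics.stdev (divisor n - 1) and statistics.pstdev (divisor n) stand in for the two ways of calculating the SD.

```python
import math
import statistics

def dummy_pair(mean, sd):
    """Return two values whose mean is `mean` and whose n - 1 SD is `sd`.

    Hypothetical helper for illustration; not an SPSS function.
    """
    offset = sd / math.sqrt(2)
    return mean - offset, mean + offset

# The naive pair reproduces SD = 15 only with a divisor of n
naive_population_sd = statistics.pstdev([85, 115])
naive_sample_sd = statistics.stdev([85, 115])

# The corrected pair reproduces SD = 15 with a divisor of n - 1
low, high = dummy_pair(100, 15)
corrected_sample_sd = statistics.stdev([low, high])

print(round(low, 4), round(high, 4))
print(naive_population_sd, round(naive_sample_sd, 4), round(corrected_sample_sd, 4))
```

The naive pair 85 and 115 gives 15 with a divisor of n but about 21.2 with a divisor of n - 1, which is why the division by √2 is needed.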
Notice that, in order to get the values we want, we will ask SPSS to produce error bars
showing the SD, but that it's over to us as to what they actually represent in the graph.
For example, say that we had a sample of 50 cases, with a mean of 100 and an SD of
15 on a variable y. The standard error of the mean would be 15/√50 = 2.1213. In
order to have SPSS show the appropriate SE in the graph, we would enter the values
100 - 2.1213/√2 = 98.5 and 100 + 2.1213/√2 = 101.5, and tell SPSS that we wanted
error bars showing the SD associated with each point. Then in the graph labels, we
would say (correctly) that the error bars represent the standard error of the mean.
In this case, because there's no 'funny business' such as clustering, SPSS would have
calculated the correct standard errors for itself from the original data, so we wouldn't
have had to use our stratagem. Note, however, that had we asked SPSS to
produce standard error bars (as opposed to asking for SD error bars) based on our
carefully-calculated values, it would have produced the wrong answers.
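The arithmetic for the sample of 50 cases, and the reason for requesting SD bars rather than SE bars, can be sketched in Python. This is only an illustration under stated assumptions: se_pair is a made-up helper name, and statistics.stdev stands in for the n - 1 calculation that SPSS performs on the two dummy values.

```python
import math
import statistics

def se_pair(mean, sd, n):
    """Return the SE of the mean and the two dummy values that encode it.

    Hypothetical helper for illustration; not part of SPSS.
    """
    se = sd / math.sqrt(n)
    offset = se / math.sqrt(2)
    return se, (mean - offset, mean + offset)

se, (value1, value2) = se_pair(100, 15, 50)
print(round(se, 4), round(value1, 4), round(value2, 4))

# SD bars on the two dummy values: the n - 1 SD equals the SE we wanted
sd_bar = statistics.stdev([value1, value2])

# SE bars on the same two values: SD / sqrt(2), which is too small
se_bar = sd_bar / math.sqrt(2)
print(round(sd_bar, 4), round(se_bar, 4))
```

With these figures the SD bars extend 2.1213 either side of the mean, as required, whereas SE bars calculated from the two dummy values would extend only 1.5.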
Example
In this hypothetical example, there are four groups of subjects with means and SDs on
variable y as shown in the table below. The standard errors of the means are shown in
the fifth column of the table. (For this example, the SEs are those which would be
obtained by the conventional calculation SEmean = SD/√n so, again, this is for
purposes of demonstration only and, in real life, we wouldn't have to resort to our
stratagem, because SPSS would produce the correct results from the original data.
However, note that our method would be useful with a 'straight' sample if we didn't
have the original data, just the information given in the table.)
The method described above was used to calculate the values which have to be
entered into SPSS to obtain a graph with the appropriate error bars for the standard
errors. These values are shown in columns 6 and 7. For example, value 1 for Group 1
is equal to 5 - .2652/√2.
Group   Mean of Y   SD     n    Required         Value 1    Value 2
                                Standard Error   for SPSS   for SPSS
1       5           .75    8    .2652            4.8125     5.1875
2       6           .5     5    .2236            5.8419     6.1581
3       7           1      10   .3162            6.7764     7.2236
4       8           1.5    15   .3873            7.7261     8.2739
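The entries in columns 5 to 7 can be recomputed from the means, SDs and ns. The following Python sketch is offered as a check on the arithmetic only; the variable names are illustrative, and rounding to four decimal places is assumed to match the table.

```python
import math

# (group, mean of y, SD, n) taken from the first four columns of the table
groups = [
    (1, 5, 0.75, 8),
    (2, 6, 0.5, 5),
    (3, 7, 1.0, 10),
    (4, 8, 1.5, 15),
]

rows = []
for group, mean, sd, n in groups:
    se = sd / math.sqrt(n)        # conventional SE of the mean
    offset = se / math.sqrt(2)    # allows for the n - 1 divisor with two values
    rows.append((group,
                 round(se, 4),
                 round(mean - offset, 4),
                 round(mean + offset, 4)))

for row in rows:
    print(row)
```

Each printed row gives the group, the required standard error, and the two values to be entered into SPSS for that group.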
The Data Window showing the entered data is as follows:
To obtain the graph (a bar graph in this example), click on Graphs → Bar → Simple
→ Define, then select y as the Variable for Other statistic, and group as the Category
Axis. Click on the Options button, check Display error bars and select Standard
deviation with a Multiplier of 1. Click on Continue, then OK, to produce the graph:
[Figure: bar graph of Mean y (y-axis, 0 to 10) against group (x-axis, 1 to 4). Error bars: +/- 1 SEM]
The graph has been edited to include a finer scale on the y-axis than that originally
used by SPSS, and to change the Error bars annotation under the x-axis from +/- 1 SD
to +/- 1 SEM. Also, for the purposes of demonstration, a reference line has been added
at the value 8.3873, which is the mean for Group 4, plus 1 SE; as can be seen, the line
coincides with the upper bar for the group, as would be expected.
Confidence Intervals
The method described can also be used to produce graphs with confidence intervals:
multiply the SE by (say) 1.96, divide the result by √2, then obtain the two figures
to be entered into SPSS by subtracting this quantity from the mean and adding it to
the mean, respectively.
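As a rough illustration of this calculation, the Python sketch below applies it to Group 4 of the example (mean 8, SE .3873). The function name interval_pair and its multiplier parameter are assumptions made for the illustration, and statistics.stdev stands in for the n - 1 SD that SPSS would display.

```python
import math
import statistics

def interval_pair(mean, se, multiplier=1.96):
    """Return the two dummy values whose n - 1 SD equals multiplier * se.

    Hypothetical helper for illustration; not part of SPSS.
    """
    half_width = multiplier * se
    offset = half_width / math.sqrt(2)
    return mean - offset, mean + offset

value1, value2 = interval_pair(8, 0.3873)
displayed_half_width = statistics.stdev([value1, value2])

print(round(value1, 4), round(value2, 4))
print(round(displayed_half_width, 4))
```

The SD of the two values equals 1.96 times the SE, so SD error bars with a multiplier of 1 would show the 95% limits.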
While error bars showing the SEs and CIs give some indication of the variability of
the data, they aren't directly useful for comparing the means of groups. Goldstein and
Healy (1995) have shown that, given that the SEs are similar across groups,
multiplying the SE by 1.4 rather than 1.96 gives error bars which, if they do not
overlap for two groups, suggest that there is a significant difference (at p < .05)
between the groups. The graph below shows error bars calculated with 1.4 rather than
1.96.
Inspection of the overlap suggests that the means for groups 1 and 2 are different from
those for the other groups, but that the means for groups 3 and 4 are not different from
each other. Bear in mind that this method doesn't take account of the number of
comparisons, and is based on the large-sample statistic z, rather than t. These
limitations notwithstanding, the bars shown in the above graph seem more
immediately useful than the conventional SE bars and 95% CI bars.
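The limits of the 1.4 × SE bars can also be worked out numerically rather than read off the graph. The Python sketch below does this for the four groups of the example; it is a check on the arithmetic only, the variable names are illustrative, and the SEs are assumed to be the conventional SD/√n values from the table.

```python
import math

# (group, mean, SD, n) from the table in the Example section
groups = [(1, 5, 0.75, 8), (2, 6, 0.5, 5), (3, 7, 1.0, 10), (4, 8, 1.5, 15)]
MULTIPLIER = 1.4  # the value quoted in the text from Goldstein and Healy

limits = []
for group, mean, sd, n in groups:
    half_width = MULTIPLIER * sd / math.sqrt(n)
    limits.append((group, mean - half_width, mean + half_width))

for group, lower, upper in limits:
    print(group, round(lower, 4), round(upper, 4))

# Gap between the top of one bar and the bottom of the next
# (a negative gap would mean that the two bars overlap)
gaps = [limits[i + 1][1] - limits[i][2] for i in range(len(limits) - 1)]
print([round(gap, 4) for gap in gaps])
```

For these data the gaps between adjacent bars come out at roughly 0.32, 0.24 and 0.015, so the bars for groups 3 and 4 all but touch at the scale of the graph. Note that these are the ends of the bars as displayed; the values entered into SPSS would still need the half-width divided by √2.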
[Figure: bar graph of Mean +/- SE*1.4 (y-axis, 0 to 10) against group (x-axis, 1 to 4)]
Summary – producing a graph with appropriate error bars
1. Decide whether the error bars which SPSS will produce if you leave it to its own
devices are appropriate for your data. If they are not, you may need to use the
strategy described above, and summarised below. The standard errors or other
quantities you want to include in your graph may be obtained from the output of SPSS
or another package, or be the result of your own calculations.
2. Calculate two values for each mean which you want to display in the graph. These
will be equal to the mean minus, and the mean plus, the quantity you want to display
after that quantity has been divided by √2. For example, if the mean is 10, and the
S.E. is 3, the values needed to display 1 S.E. bars will be (10 – 3/√2) and
(10 + 3/√2). If 95% CIs are to be displayed for these data, the values will be
(10 – 3*1.96/√2) and (10 + 3*1.96/√2).
3. Enter the values into SPSS in a single column. In another column (variable) enter
the codes for the groups or conditions. Each pair of calculated values will have the
same group or condition code.
4. Ask SPSS to produce a line or bar graph, with error bars (click Options to request
error bars). Make sure that you tell SPSS that the error bars represent standard
deviations, with a multiplier of one.
5. Click on Continue then OK, then stand back.
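Steps 2 and 3 can be sketched in Python as follows. This is an illustration rather than a required part of the procedure: the function name long_format_rows, the group codes 1 and 2, and the choice of comma-separated text as a way of getting the values into SPSS are all assumptions. The two points use the figures from step 2 (mean 10 with 1 S.E. bars, and mean 10 with 95% CI bars).

```python
import csv
import io
import math

def long_format_rows(points):
    """Turn (code, mean, half_width) triples into (code, y) rows, two per point.

    Hypothetical helper for illustration; half_width is whatever the error
    bar should show (an SE, 1.96 * SE, and so on).
    """
    rows = []
    for code, mean, half_width in points:
        offset = half_width / math.sqrt(2)
        rows.append((code, round(mean - offset, 4)))
        rows.append((code, round(mean + offset, 4)))
    return rows

# Illustrative group codes: 1 = 1 S.E. bars, 2 = 95% CI bars
points = [(1, 10, 3), (2, 10, 3 * 1.96)]
rows = long_format_rows(points)

# One column of values (y) and one column of codes (group), as in step 3
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["group", "y"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each pair of rows shares a group code, which is the layout described in step 3; the graph itself is still requested in SPSS with SD error bars and a multiplier of one.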
Alan Taylor, Department of Psychology, 18th November 2005
Reference
Goldstein, H. and Healy, M. (1995). The graphical presentation of a collection of
means. Journal of the Royal Statistical Society, Series A, 158, 175–177.