Producing Error Bars of Appropriate Size in SPSS

The Problem

Version 13 of SPSS allows the user to include error bars in standard line and bar graphs (I'm not referring to the unsatisfactory 'interactive' graphs). This facility is very useful, but it quite often happens that the standard errors or confidence intervals which SPSS calculates from the data are inappropriate. Some examples are

1. The sample is clustered, so the SEs that SPSS calculates, which are based on the assumption of independence, are too small
2. A pooled standard error, rather than a separate error term for each condition, is needed
3. SPSS calculates errors appropriate for between-subjects comparisons, when the focus in the graph is on within-subjects comparisons.

Some graphics programs, such as Deltagraph, allow the user to enter the appropriate SEs or 95% CI limits directly, and don't insist on calculating these quantities from the data. Given that SPSS does insist on doing its own calculations, how can we get it to produce graphs showing the appropriate quantities?

A Solution

The method described here uses dummy cases (two to specify each mean and associated error bar). The labour-saving convenience is the fact that, with two values, a given standard deviation, SD, can be obtained by setting one of the values equal to (the mean - SD) and the other equal to (the mean + SD). For example, if the required mean for a given point in the graph is 100 and the required SD is 15, the values used would be 85 and 115. The mean of these values is 100, and the SD is 15.

This assumes that SD is calculated with n rather than n - 1. For the purposes of graphs (and for most other purposes) SPSS actually calculates the unbiased estimate of the population standard deviation, so that it uses n - 1. This means that to obtain an SD of 15 in the above example, the values entered into SPSS should be 100 - 15/√2 and 100 + 15/√2, namely 89.3934 and 110.6066.
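The reason for the √2 is that, for two values placed at the mean - d and the mean + d, the sum of squared deviations is 2d², and dividing by n - 1 = 1 leaves a variance of 2d², so the SD is d√2. Setting d = SD/√2 therefore gives the SD we want. The following is a minimal Python sketch of that check, not anything SPSS itself runs; the function name dummy_pair is purely illustrative.

```python
import math
import statistics


def dummy_pair(mean, spread):
    """Return two values whose mean is `mean` and whose n - 1 SD is `spread`.

    Illustrative helper only; the name is not part of SPSS.
    """
    offset = spread / math.sqrt(2)
    return mean - offset, mean + offset


low, high = dummy_pair(100, 15)
print(round(low, 4), round(high, 4))      # 89.3934 110.6066

# statistics.stdev divides by n - 1, as SPSS does for its SD error bars,
# so this should recover the required SD of 15 (up to rounding error).
print(statistics.stdev([low, high]))

# statistics.pstdev divides by n, which is the case where the simpler
# values 85 and 115 would give an SD of 15.
print(statistics.pstdev([85, 115]))
```

The same helper works for any quantity that is to be shown as the half-length of the bar, not just an SD.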
Notice that, in order to get the values we want, we will ask SPSS to produce error bars showing the SD, but that it's over to us as to what they actually represent in the graph. For example, say that we had a sample of 50 cases, with a mean of 100 and an SD of 15 on a variable y. The standard error of the mean would be 15/√50 = 2.1213. In order to have SPSS show the appropriate SE in the graph, we would enter the values 100 - 2.1213/√2 = 98.5 and 100 + 2.1213/√2 = 101.5, and tell SPSS that we wanted error bars showing the SD associated with each point. Then in the graph labels, we would say (correctly) that the error bars represent the standard error of the mean.

In this case, because there's no 'funny business' such as clustering, SPSS would have calculated the correct standard errors for itself from the original data, so we wouldn't have had to use our stratagem. Note, however, that had we asked SPSS to produce standard error bars (as opposed to asking for SD error bars) based on our carefully-calculated values, it would have produced the wrong answers, because it would have divided the SD of each pair of dummy cases by the square root of the number of cases (√2) a second time.

Example

In this hypothetical example, there are four groups of subjects with means and SDs on variable y as shown in the table below. The standard errors of the means are shown in the fifth column of the table. (For this example, the SEs are those which would be obtained by the conventional calculation SEmean = SD/√n so, again, this is for purposes of demonstration only and, in real life, we wouldn't have to resort to our stratagem, because SPSS would produce the correct results from the original data. However, note that our method would be useful with a 'straight' sample if we didn't have the original data, just the information given in the table.)

The method described above was used to calculate the values which have to be entered into SPSS to obtain a graph with the appropriate error bars for the standard errors. These values are shown in columns 6 and 7.
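As a rough check on the arithmetic, the calculation behind columns 5 to 7 might be sketched in Python as follows. The means, SDs and ns are the ones given for the four groups in the example; the function and variable names are illustrative assumptions and have nothing to do with SPSS itself.

```python
import math

# (group, mean of y, SD, n) for the four groups in the example
groups = [(1, 5, 0.75, 8), (2, 6, 0.5, 5), (3, 7, 1.0, 10), (4, 8, 1.5, 15)]


def table_row(mean, sd, n):
    """Return (SE, value 1, value 2) for one group.

    SE is the conventional SD / sqrt(n); the two values are the dummy
    cases whose n - 1 SD equals that SE.
    """
    se = sd / math.sqrt(n)
    offset = se / math.sqrt(2)
    return se, mean - offset, mean + offset


for group, mean, sd, n in groups:
    se, value1, value2 = table_row(mean, sd, n)
    print(f"{group}  {se:.4f}  {value1:.4f}  {value2:.4f}")
```

If the required SEs come from somewhere else (a clustered analysis or a pooled error term, say), only the line that computes se would change; the division by √2 stays the same.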
For example, value 1 for Group 1 is equal to 5 - .2652/√2.

Group   Mean of Y   SD     n    Required Standard Error   Value 1 for SPSS   Value 2 for SPSS
1       5           .75    8    .2652                     4.8125             5.1875
2       6           .5     5    .2236                     5.8419             6.1581
3       7           1      10   .3162                     6.7764             7.2236
4       8           1.5    15   .3873                     7.7261             8.2739

The Data Window showing the entered data is as follows:

group   y
1       4.8125
1       5.1875
2       5.8419
2       6.1581
3       6.7764
3       7.2236
4       7.7261
4       8.2739

To obtain the graph (a bar graph in this example), click on Graphs → Bar → Simple → Define, then select y as the Variable for Other statistic, and group as the Category Axis. Click on the Options button, check Display error bars and select Standard deviation with a Multiplier of 1. Click on Continue, then OK, to produce the graph:

[Figure: bar graph of Mean y (vertical axis, 0 to 10) by group (1 to 4). Error bars: +/- 1 SEM]

The graph has been edited to include a finer scale on the y-axis than that originally used by SPSS, and to change the Error bars annotation under the x-axis from +/- 1 SD to +/- 1 SEM. Also, for the purposes of demonstration, a reference line has been added at the value 8.3873, which is the mean for Group 4, plus 1 SE; as can be seen, the line coincides with the upper bar for the group, as would be expected.

Confidence Intervals

The method described can also be used to produce graphs with confidence intervals by multiplying the SE by (say) 1.96, dividing the result by √2, then producing the two figures to be entered into SPSS by subtracting the result from the mean and adding the result to the mean, respectively.

While error bars showing the SEs and CIs give some indication of the variability of the data, they aren't directly useful for comparing the means of groups. Goldstein and Healy (1995) have shown that, given that the SEs are similar across groups, multiplying the SE by 1.4 rather than 1.96 gives error bars which, if they do not overlap for two groups, suggest that there is a significant difference (at p < .05) between the groups. The graph below shows error bars calculated with 1.4 rather than 1.96.
Inspection of the overlap suggests that the means for groups 1 and 2 are different from those for the other groups, but that the means for groups 3 and 4 are not clearly different from each other: their bars all but touch (the upper limit for group 3 is about 7.44 and the lower limit for group 4 is about 7.46). Bear in mind that this method doesn't take account of the number of comparisons, and is based on the large-sample statistic z, rather than t. These limitations notwithstanding, the bars shown in this graph seem more immediately useful than the conventional SE bars and 95% CI bars.

[Figure: bar graph of Mean +/- SE*1.4 (vertical axis, 0 to 10) by group (1 to 4)]

Summary: producing a graph with appropriate error bars

1. Decide whether the error bars which SPSS will produce if you leave it to its own devices are appropriate for your data. If they are not, you may need to use the strategy described above, and summarised below. The standard errors or other quantities you want to include in your graph may be obtained from the output of SPSS or another package, or be the result of your own calculations.
2. Calculate two values for each mean which you want to display in the graph. These will be equal to the mean minus, and the mean plus, the quantity you want to display divided by √2. For example, if the mean is 10, and the S.E. is 3, the values needed to display 1 S.E. bars will be (10 - 3/√2) and (10 + 3/√2). If 95% CIs are to be displayed for these data, the values will be (10 - 3*1.96/√2) and (10 + 3*1.96/√2).
3. Enter the values into SPSS in a single column. In another column (variable) enter the codes for the groups or conditions. Each pair of calculated values will have the same group or condition code.
4. Ask SPSS to produce a line or bar graph, with error bars (click Options to request error bars). Make sure that you tell SPSS that the error bars represent standard deviations, with a multiplier of one.
5. Click on Continue then OK, then stand back.

Alan Taylor, Department of Psychology, 18th November 2005

Reference

H. Goldstein and M. Healy (1995). The graphical presentation of a collection of means. J. Royal Stat. Soc. A. 158, 175–177
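
Steps 2 and 3 of the summary can be sketched in Python as follows. This is only one possible way of preparing the numbers before typing or pasting them into SPSS; the function name spss_rows, the dictionary layout and the tab-separated output are assumptions made for the illustration. The multiplier argument stands for 1 (SE bars), 1.96 (95% CI bars) or 1.4 (the Goldstein and Healy bars).

```python
import math


def spss_rows(estimates, multiplier=1.0):
    """Build (group, y) rows, two dummy cases per group.

    `estimates` maps a group code to (mean, standard error). The bar
    half-length is multiplier * SE, and it is divided by sqrt(2) so that
    SPSS's n - 1 SD of the two dummy cases reproduces it.
    """
    rows = []
    for group, (mean, se) in estimates.items():
        offset = multiplier * se / math.sqrt(2)
        rows.append((group, mean - offset))
        rows.append((group, mean + offset))
    return rows


# The single-mean illustration from step 2: mean 10, S.E. 3
estimates = {1: (10, 3)}

for label, multiplier in [("1 SE", 1.0), ("95% CI", 1.96)]:
    print(label)
    for group, y in spss_rows(estimates, multiplier):
        print(f"{group}\t{y:.4f}")
```

For the 1 S.E. bars this gives 7.8787 and 12.1213, and for the 95% CI bars 5.8422 and 14.1578. Whatever multiplier is used here, the multiplier requested in the SPSS Options dialog stays at one, because the scaling is already built into the dummy values.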