SPSS FOR WINDOWS
VERSION 11.0
BASIC GUIDE II
For further information contact:
Julie Morris or Bernice Dillon
Department of Medical Statistics
Tel: x 5815, 5800
Email: [email protected]
[email protected]
Qualitative Variables: Two independent groups
In this example we want to examine the differences between two groups (disease: 0=no, 1=yes) for a binary variable (smoker: 0=no, 1=yes).
We have already shown in Basic Guide I that we can describe categorical variables using percentages with the Crosstabs command (Analyse>Descriptive statistics>Crosstabs).
SMOKER * DISEASE Crosstabulation

                                      DISEASE
                                  .00        1.00       Total
SMOKER  .00    Count                4           2           6
               % within SMOKER    66.7%       33.3%      100.0%
               % within DISEASE   80.0%       28.6%       50.0%
        1.00   Count                1           5           6
               % within SMOKER    16.7%       83.3%      100.0%
               % within DISEASE   20.0%       71.4%       50.0%
Total          Count                5           7          12
               % within SMOKER    41.7%       58.3%      100.0%
               % within DISEASE  100.0%      100.0%      100.0%
So, in this example, 33% of Non-Smokers (smoker=0) have the disease whereas 83% of Smokers (smoker=1) have the disease.
To test for differences between groups, we create this table and run one of the following tests:
 If the table is 2x2 and the numbers are large enough in each cell, we use Chi-Square with Yates' Continuity Correction.
 If the table is greater than 2x2 and the numbers are large enough in each cell, we use 'Pearson's Chi-Square'.
 If the numbers are not large enough, we use 'Fisher's Exact Test'.
Look at the percentage of cells with Expected Count less than 5, displayed below the table produced by SPSS. If this is greater than or equal to 20%, then the numbers are not large enough and we should use Fisher's Exact Test (and always use the two-sided test).
To obtain a p-value using CROSSTABS
 Click on Statistics. Tick the Chi-Square box and then click on Continue.
 Click OK
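The same analysis can also be run from a syntax window. A minimal sketch, assuming the smoker and disease variable names used above:

CROSSTABS
  /TABLES=smoker BY disease
  /STATISTICS=CHISQ
  /CELLS=COUNT ROW COLUMN.

The CHISQ keyword produces the Chi-Square Tests table shown below, including Fisher's Exact Test for a 2x2 table.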
In this example, we get the following table
Chi-Square Tests

                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)    (2-sided)    (1-sided)
Pearson Chi-Square             3.086b    1      .079
Continuity Correctiona         1.371     1      .242
Likelihood Ratio               3.256     1      .071
Fisher's Exact Test                                           .242         .121
Linear-by-Linear Association   2.829     1      .093
N of Valid Cases                  12

a. Computed only for a 2x2 table
b. 4 cells (100.0%) have expected count less than 5. The minimum expected count is 2.50.
The footnote beneath the table (note b) is one of the deciding factors for which test to use. It tells us that 100% of our cells have an Expected Count less than 5, so we don't have enough subjects in our sample to use a chi-square test. Therefore we shall use Fisher's Exact Test. Our p-value is two-sided, so in this case p=0.24. We have no evidence that smokers are more likely to develop the disease (although this has a lot to do with the sample size and power of the study).
Qualitative variables: More than two groups
When we look at categorical variables by group, we are again forming a table of
counts and can analyse it using chi-square.
We use the same commands as we did for the two-group example. In this case, we have three samples of workers, each from a different workplace, and we are measuring the prevalence of a history of cough.
WORK * COUGH Crosstabulation

                                          COUGH
                                      Yes        No       Total
WORK   Workplace 1   Count             16        19          35
                     % within WORK   45.7%     54.3%      100.0%
       Workplace 2   Count             31        21          52
                     % within WORK   59.6%     40.4%      100.0%
       Workplace 3   Count             10         7          17
                     % within WORK   58.8%     41.2%      100.0%
Total                Count             57        47         104
                     % within WORK   54.8%     45.2%      100.0%
Chi-Square Tests

                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             1.764a    2           .414
Likelihood Ratio               1.762     2           .414
Linear-by-Linear Association   1.222     1           .269
N of Valid Cases                 104

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.68.
We observe that 46% of Workplace 1 have a history of cough, compared with 60% of Workplace 2 and 59% of Workplace 3. We can also see that 0% of the expected counts are less than 5. Therefore, we would use the Pearson's Chi-Square p-value, which is p=0.41. There is no evidence of any difference in proportions between the three groups.
Chi-Square tests using summary data
Sometimes we just want to use SPSS to carry out a chi-square test on the
crosstabulation data without inputting all the raw data.
Inputting crosstabulation data directly into SPSS
To do this, input 3 columns of data. The first column contains the codes of your first variable (e.g. if the variable has just two categories then the codes would be 1 and 2). The second column contains the codes of your second variable (e.g. if the variable has three categories then the codes would be 1, 2 and 3). The length of the columns (i.e. the number of rows in the datasheet) corresponds to the number of cells in your crosstabulation, i.e. the number of different combinations of categories for your two variables (e.g. in the above example there are 6 cells, 6 = 2 x 3). The values in these first two columns cover the different combinations. The third column contains the frequencies, i.e. the number of subjects corresponding to each combination of variables.
For example, for the workplace vs. cough data above, the spreadsheet will look like this:
cough   workplace   freq
  1         1         16
  2         1         19
  1         2         31
  2         2         21
  1         3         10
  2         3          7
The freq column is now used to weight the cases.
 Click on the Data menu
 Click on Weight cases
 Tick the 'Weight cases by' box
 Highlight the freq variable and click on the arrow to place it in the Frequency Variable box
 Click on OK
Then use the Crosstabs command as before on the cough and workplace variables.
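Equivalently, the weighting and the test can be done in syntax; a sketch assuming the three columns are named cough, workplace and freq as above:

* Weight each row by its cell frequency.
WEIGHT BY freq.
CROSSTABS
  /TABLES=cough BY workplace
  /STATISTICS=CHISQ
  /CELLS=COUNT COLUMN.
* Turn the weighting off again when finished.
WEIGHT OFF.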
Qualitative variables: Two matched groups
In this scenario we would not use a chi-square test because our data are no longer independent. To analyse whether a proportion changes at two points in time, or whether two pairwise matched groups differ, we would use McNemar's Test.
In this example, we wish to see if the rate of cough has changed from one year to the year after. Since we are using the same sample at year 1 and year 2, this calls for McNemar's Test. (If there were two different groups at year 1 and year 2, then this would be a chi-square test.)
There are two ways of carrying out McNemar's Test:
1. McNemar's Test is under the Crosstabs command that we have been using for the Chi-Square commands.
 Enter one variable in the Row box and one variable in the Column box
 Click on Statistics and put a tick in the McNemar box. Click on Continue.
 Now click on Cells and tick the Total Percentages box
 Click on Continue
 Click on OK
In this example, we get the following table
YEAR1 * YEAR2 Crosstabulation

                               YEAR2
                           Yes        No       Total
YEAR1   Yes   Count         32        25          57
              % of Total  30.8%     24.0%       54.8%
        No    Count         25        22          47
              % of Total  24.0%     21.2%       45.2%
Total         Count         57        47         104
              % of Total  54.8%     45.2%      100.0%
Chi-Square Tests

                    Value   Exact Sig. (2-sided)
McNemar Test                       1.000a
N of Valid Cases      104

a. Binomial distribution used.
The percentage of subjects coughing at year 1 is the total percentage for that row: in this example, 54.8%. In fact, the percentage coughing at year 2 is also 54.8%. It is no surprise that the p-value for this is 1.000 (Exact Sig. (2-sided)). There is no evidence that the prevalence rate of cough is changing between year 1 and year 2.
2. Alternatively, McNemar's Test is under the 'Non-parametric tests – 2 related samples' option.
   Test type = McNemar
   Test pair = 2 matched variables
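Either route can be written as syntax; a minimal sketch, assuming the variables are named year1 and year2 as in the example above:

* Method 1: via Crosstabs.
CROSSTABS
  /TABLES=year1 BY year2
  /STATISTICS=MCNEMAR
  /CELLS=COUNT TOTAL.
* Method 2: via the non-parametric tests option.
NPAR TESTS
  /MCNEMAR=year1 WITH year2 (PAIRED).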
Quantitative data - Two independent groups
We describe continuous variables with a mean or a median, to give a table such as:
Secondary Prevention of Coronary Heart Disease

                          Mean (SD)
                          Respondents    Non-Respondents
                          (n=1343)       (n=578)
Age (years)               66.2 (8.2)     66.6 (8.7)
Time since MI (mths)*     10 (6,35)      15 (8,47)
Cholesterol (mmol/l)      6.5 (1.2)      6.6 (1.2)
[*Median (Range)]
The next question is: "Is one mean (or median) 'significantly' greater than the other?" Or, in other words, "Is the difference we observe big enough to suggest it hasn't happened by chance?" To answer this we need a confidence interval or a p-value as our tool of evidence. The method we use depends on whether the data are normal (when we are allowed to use means) or not. We have the following choices:
 If the data are NORMALLY distributed, we can get a confidence interval for the difference in means and perform an 'Independent samples t-test' to get our p-value. We can achieve ALL of this using the COMPARE MEANS option in SPSS.
 If the data are LOG NORMALLY distributed, we can get a confidence interval for the ratio of means and perform an 'Independent samples t-test' to get our p-value. We can achieve ALL of this using the COMPARE MEANS option in SPSS.
 If the data are NEITHER NORMALLY NOR LOG NORMALLY distributed, we can get a confidence interval for the median difference and perform a 'Mann-Whitney U test' to get our p-value. We can achieve SOME of this using the NON-PARAMETRIC TESTS option in SPSS. SPSS will not give you a confidence interval for the median difference.
Testing for Normality or Log Normality
The tests for normality were given in SPSS Basic Notes 1 (and Basic Statistics 1). It is important to note that when we do group comparisons, the data must be normally distributed within each group. The easiest way to check this is to split the file by the group variable before doing your test of normality.
Log normal data have a positive skew. They are very common in biochemistry, where there are more people with a normal, low value than people with a high value (which often indicates an illness). To test for log normality, we simply do a normality test on the variable after it has been log transformed. An example is given in these notes.
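A syntax sketch for checking normality within each group is given below; the names group and outcome are placeholders for your own grouping and outcome variables:

* Produce the normality output separately for each group.
SORT CASES BY group.
SPLIT FILE LAYERED BY group.
* EXAMINE with NPPLOT gives normal plots and normality tests.
EXAMINE VARIABLES=outcome
  /PLOT NPPLOT.
* Remember to switch the split off afterwards.
SPLIT FILE OFF.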
Independent Samples t-test
Appropriate for comparing the means between two groups of different subjects, who have not been matched in the sampling. (If the sample design includes matching then we have special methods for analysing; the Independent Samples t-test is not appropriate because, through the sample design, you have forced the groups to be more alike than they would have been if they had been randomly sampled.)
The Compare Means option allows you to test for the difference in means for a
continuous variable between two or more groups.
1) If the data are normally distributed
To open the t-test box: Go to the Analyse menu, select “Compare Means” and then
select “Independent samples T test”.
 Highlight the continuous variable to be tested and click on the relevant arrow to place it in the 'Test Variable' box
 Highlight the variable that indicates the groups and click on the relevant arrow to place it in the 'Grouping Variable' box
 Now click the 'Define Groups' box and type in the values of the grouping variable. So for instance if you wanted to compare men and women, and in your data set men were coded as 1 and women coded as 2, then you would enter 1 and 2 into this box. Although you could argue that SPSS should be able to work this out, what this box does allow you to do is to select two subgroups of a variable. So, for instance, if you had a variable coded 1=Yes, 2=No, 8=Don't Know, then you could compare just the yes and the no group by entering 1 and 2 into the box.
 Now press 'Continue' then 'OK'
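The equivalent syntax is short. A sketch, assuming for illustration a grouping variable sex coded 1 and 2 and an outcome variable height (these names are not fixed by the example below; use your own):

T-TEST GROUPS=sex(1 2)
  /VARIABLES=height.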
An example of the output you should get is as follows. It tests for the difference in average heights between men and women, using independent samples.
                       Levene's Test            t-test for Equality of Means
                       F       Sig.      t        df      sig     Mean Diff   s.error Diff    Lower     Upper
Equal variances        0.005   0.942   11.019     112    0.000      15.52        1.408       12.728    18.310
Not equal variances                    11.053    62.74   0.000      15.52        1.404       12.713    18.325
There are two rows to this table. An assumption of the t-test is that our two groups (in this example, men and women) have the same variance. If they don't, it is often a sign of a violation of normality, but if we are sure that our data are normal yet have different variances then we use the information given on the bottom row of the table; more commonly, we use the top row. In order to test for equal variances between the two groups, we use the Levene Test of Equal Variances. The p-value is in the second column. If this p-value is below 0.05, we have evidence of a difference in variances and we must use the bottom row for our p-value and confidence interval.
In our case we have observed that men are on average 15.52cm taller than women, with a confidence interval of (12.73cm, 18.31cm). The p-value is given as 0.000, which we should report as p<0.001. (In most circumstances, a p-value can never equal zero. This is a consequence of using a sample to describe an unknown population: there is always a small probability that the difference you have observed has happened by chance. Therefore, for small p-values that are reported in SPSS as 0.000, we would use p<0.001 in a paper; it is a small probability, but it still exists.) Note also that the confidence interval does not contain zero, therefore we are 95% confident that men are taller than women.
2) If the data are log-normally distributed
We can also do an independent samples t-test, but this time we have to transform the data first. To do this we use the Natural Log function (Ln) in Compute. We do not use the Log base 10 function. In order to do this:
 Click on the Transform menu
 Click on Compute
 Enter a new variable name in the Target Variable box
 Form the expression "ln(Variable Name)" in the Numeric Expression box
 Click on OK
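As syntax this is a single COMPUTE command; a sketch using the hdl and loghdl names from the example below:

* Natural log transform; LN, not LG10.
COMPUTE loghdl = LN(hdl).
EXECUTE.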
In our example the variable that we wish to transform is called hdl and the new variable we have created is called loghdl. (In statistics, 'log' is synonymous with natural logs, as we hardly ever use log base 10; it is therefore not unusual for transformed variables to be called 'Log<variable>' when 'Ln<variable>' may seem more appropriate.) Loghdl will now be the transformed version of hdl, and it is this that we run the t-test on.
So to run the t-test, we repeat what we did in the example with the normally distributed variable:
 Highlight the continuous variable to be tested and click on the relevant arrow to place it in the 'Test Variable' box. This time the variable we put in is LOGHDL
 Highlight the variable that indicates the groups and click on the relevant arrow to place it in the 'Grouping Variable' box
 Now click the 'Define Groups' box and type in the values of the grouping variable.
You'll get a similar output. In our example, which compares HDL levels between two groups, the first group being the treatment arm of a trial and the second the placebo arm, we get:
                       Levene's Test            t-test for Equality of Means
                       F       Sig.      t        df      sig     Mean Diff   s.error Diff    Lower       Upper
Equal variances        0.441   0.511   -0.120      39    0.905     -0.0097       0.08118     -0.17407    0.15462
Not equal variances                    -0.120    36.92   0.905     -0.0097       0.08118     -0.17422    0.15478
Again we can assume equal variances (p=0.51). There is no evidence of any difference in levels of HDL between the two groups (p=0.91). We observe a mean difference of -0.0097, but this is between the log transformed outcomes. If we exponentiate the mean difference in the table, we get 0.99. This is the ratio of the means of the treatment group and the placebo group, i.e. the average HDL reading of the treatment group is 0.99 times the average of the placebo group. If we exponentiate the limits of the confidence interval, we get a confidence interval for this ratio, which is (0.84, 1.16); that is, the average HDL in the treatment group is somewhere between 0.84 times and 1.16 times the average HDL in the placebo group.
In order to exponentiate, we have to use either a calculator or a spreadsheet, using the EXP function.
Mann-Whitney U test
If our data are not normally distributed then using a t-test is prone to increasing our
chances of a Type I Error, i.e. giving us a significant p-value when there is no
difference in the population. In order to combat this, we use a non-parametric test.
The equivalent of the t-test is the Mann-Whitney U test. We may use medians to
describe non-normal data, but the Mann-Whitney U is a rank test, so does not directly
compare two medians. It compares two distributions.
The menu for running a Mann-Whitney U test is under the Analyse menu: click on "Nonparametric Tests", then click on "2 Independent Samples".
 Highlight the variable to be tested and click on the relevant arrow to place it in the 'Test Variable' box
 Highlight the variable defining the groups and click on the relevant arrow to place it in the 'Grouping Variable' box
 Click on the 'Define Groups' box and type in the values of the grouping variable corresponding to the groups being compared. This is identical to the t-test example.
 Click on Continue and then OK
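Or, in syntax, a minimal sketch for the weight-by-sex example below (assuming sex is coded 1 and 2):

NPAR TESTS
  /M-W=weight BY sex(1 2).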
In this example, comparing weight between men and women, we get the following
output:
Ranks

         SEX       N    Mean Rank   Sum of Ranks
WEIGHT   Male     80      72.15        5772.00
         Female   34      23.03         783.00
         Total   114
Test Statisticsa

                          WEIGHT
Mann-Whitney U           188.000
Wilcoxon W               783.000
Z                         -7.264
Asymp. Sig. (2-tailed)      .000

a. Grouping Variable: SEX
We can see from the p-value (reported in SPSS as 0.000, but which we should report as p<0.001) that the weight of males is significantly different from that of females. There are, however, NO descriptive statistics with this output, as the mean rank and the sum of ranks that have been reported are completely meaningless. To get the medians, by group, use the 'Explore' option. The confidence interval for the median difference is not available in SPSS: either seek alternative software, calculate it by hand or ask a statistician.
One reference for the formula for the median difference is: Campbell, M.J. and Gardner, M.J. (1989) Calculating confidence intervals for some non-parametric analyses. In Statistics with Confidence (eds M.J. Gardner and D.G. Altman), London: British Medical Journal, 71-79.
It is not a regularly seen statistic in papers, so it may not be necessary in order to get your paper published; descriptives by group could be sufficient.
Quantitative variables – matched data
When we analyse two matched continuous variables within a single group, there are two possible scenarios:
a) We have measured the same variable twice, maybe at two different time points, and we are interested in whether this measure has changed in the interval.
b) We have two separate groups, but they have been pair-wise matched in some way.
As with most analyses of continuous variables, the actual method of analysis depends on the assumption of normality. Let us consider our approach:
 If normal, we can get a confidence interval for the average decline over time and a p-value from a 'paired-sample t-test'. This is ALL available in the COMPARE MEANS option.
 If not normal, we use the confidence interval for the median decline and a p-value from a 'Wilcoxon Test'. SOME of this is available in the NON-PARAMETRIC TESTS option.
In this case the normality we are interested in is the normality of the paired differences. In order to test for normality in our paired differences, we need to calculate the difference between the two variables and test that for normality. Data that are normally distributed at both time points often have normally distributed differences. Also, some data that are non-normal at both time points could have normal paired differences. For example, hdl may be log-normal at time 1 and log-normal at time 2, but the reductions in hdl could be normal. There is no hard and fast rule for the distribution of the paired differences; it will have to be checked. However, normally distributed paired differences are very common.
Paired t-test
This test is appropriate when we have a repeated measure on a single group of
subjects (or our groups have been matched in the design) and the paired difference
between the paired variables is normally distributed.
 Select Compare Means
 Select paired-sample t-test
 Highlight the two variables to be tested and click on the relevant arrow to place them in the 'Paired Variables' box (e.g. in this case we choose pre and post – you have to select them both before transferring them across)
 Click on OK
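The equivalent syntax, assuming the pre and post variable names used above, is sketched here:

T-TEST PAIRS=pre WITH post (PAIRED).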
Here is an example of some output. It demonstrates the difference in calories taken by a sample of women, pre and post menstruation:
                       Paired Differences
          Mean       SD      s.error    Lower     Upper       t      df     sig
Pair 1   1283.33   398.64    132.88    976.91   1589.75     9.658     8    0.000
Therefore, we have observed that dietary intake on pre-menstrual days is 1283 calories higher than dietary intake on post-menstrual days, with a confidence interval of (977 calories, 1590 calories). This is highly significant (p<0.001).
Wilcoxon signed rank test
This test is used for comparing repeated measures on the same subjects (or measures
between two groups of subjects who have been matched in some way) where the
differences are not normally distributed. As an example, we will use the menstruation
data again, but this time, we assume that the paired differences are not normally
distributed.
 Select Non-parametric tests
 Select 2 related samples
 Select 'Wilcoxon' for we have 'continuous' data
 Highlight the two variables to be tested and click on the relevant arrow to place them in the 'Test Pair List' box
 Click on OK
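As syntax, a minimal sketch assuming the two paired variables are again named pre and post:

NPAR TESTS
  /WILCOXON=pre WITH post (PAIRED).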
We get the following output:

Ranks

                                                     N    Mean Rank   Sum of Ranks
Dietary intake post menstrual    Negative Ranks     9a      5.00         45.00
- Dietary intake pre-menstrual   Positive Ranks     0b       .00           .00
                                 Ties               0c
                                 Total               9

a. Dietary intake post menstrual < Dietary intake pre-menstrual
b. Dietary intake post menstrual > Dietary intake pre-menstrual
c. Dietary intake pre-menstrual = Dietary intake post menstrual
The mean ranks are not appropriate summary statistics. Therefore, the figure we are most interested in is the p-value in the Test Statistics table below (Asymp. Sig. (2-tailed)). We can conclude there is a significant difference in calorie intake between the premenstrual and postmenstrual periods (with a higher intake on premenstrual days, as the mean negative rank exceeds the mean positive rank). This time, to get the median difference, calculate the difference variable first and then Explore this new variable. To get the confidence interval for it, seek alternative help.
Test Statisticsb

                         Dietary intake post menstrual
                         - Dietary intake pre-menstrual
Z                                  -2.666a
Asymp. Sig. (2-tailed)               .008

a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
Quantitative variables – Correlations
Correlation allows us to assess the strength of the relationship between two continuous variables.
 If normal data, we use a 'Pearson's correlation coefficient'
 If non-normal data, we use a 'Spearman's correlation coefficient'
Some people would say you need BOTH variables to be normal to do a Pearson's. This is true if you want the confidence interval. Without the confidence interval, you only need one variable to be normal, but as we are advocating confidence intervals in this course, only use Pearson's if BOTH variables are normally distributed – otherwise use Spearman's. An alternative correlation coefficient for non-normal data is Kendall's Tau-B, but we will use Spearman's.
The Correlate menu is under Analyse:
 Click on 'Bivariate'
 Highlight the relevant variables and click on the arrow to place them in the 'Variables' box
 Select either Pearson or Spearman.
   - We use Pearson when we have normally distributed data and believe we have a linear relationship.
   - We use Spearman's when we have non-normally distributed data and/or a non-linear relationship is to be assessed.
   (In this case, as we believe height and weight are normal, we shall click Pearson)
 Click on OK
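Both coefficients can also be requested in syntax; a sketch assuming the height and weight variables used below:

* Pearson's correlation (both variables normally distributed).
CORRELATIONS
  /VARIABLES=height weight
  /PRINT=TWOTAIL.
* Spearman's correlation (non-normal data).
NONPAR CORR
  /VARIABLES=height weight
  /PRINT=SPEARMAN TWOTAIL.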
SPSS gives the following output, which shows that Height and Weight are highly correlated (correlation coefficient=0.747). SPSS does not give a confidence interval for the correlation coefficient.
Correlations

                                 HEIGHT    WEIGHT
HEIGHT   Pearson Correlation      1.000     .747**
         Sig. (2-tailed)            .       .000
         N                         114       114
WEIGHT   Pearson Correlation      .747**    1.000
         Sig. (2-tailed)          .000        .
         N                         114       114

**. Correlation is significant at the 0.01 level (2-tailed).
Quantitative variables: More than two groups
The p-value for a comparative test between more than two groups tests the hypothesis that all the groups, on average, are the same. Consider this example. In this dataset, three groups, defined by age, are given a task to reduce their stress levels. The stress is measured on a scale of 0 to 100, where 0="No stress" and 100="Total stress", and is measured twice, once before the task and then afterwards. It is theorised that age will be related to the efficacy of the task. Therefore we are comparing three means: the mean decline in stress of those aged 15-25, the mean decline for those aged 26-45 and the mean decline for those aged 46-65.
The way we answer the question "Are the average stress declines the same?" depends on whether our continuous variable is normally distributed or not. Here, we need to examine normality within each of our groups. (We also assume equal variances in each group.)
 If it is normal, then we can get a p-value from a 'One Way Analysis of Variance'. This is available in SPSS under the COMPARE MEANS option.
 If it is log normal, then we can get a p-value from a 'One Way Analysis of Variance' after we have transformed our data. This is available in SPSS under the COMPARE MEANS option. [No example given]
 If it is not normal, then we can get a p-value from a 'Kruskal-Wallis Analysis of Variance'. This is available under the NON-PARAMETRIC TESTS option.
One Way ANOVA
To run a one-way ANOVA, go to the Analyse menu, select 'Compare Means', then select 'One-Way ANOVA'.
 Highlight the variables to be tested and click on the arrow to add them to the 'Dependent List'. Add the grouping variable, age, into the 'Factor' box.
 Click on Options and click Descriptives
In this example, we get the following
Descriptives

                                                         95% Confidence Interval for Mean
              N     Mean   Std. Deviation   Std. Error    Lower Bound    Upper Bound    Minimum   Maximum
DECLINE
  1 15-25     20    14.30      10.579          2.365          9.35           19.25          -3        33
  2 26-45     20     9.45       8.924          1.995          5.27           13.63         -17        21
  3 46-65     20    14.40      12.475          2.789          8.56           20.24          -1        43
  Total       60    12.72      10.827          1.398          9.92           15.51         -17        43
STRPRE
  1 15-25     20    52.80      11.218          2.509         47.55           58.05          35        72
  2 26-45     20    33.40      15.010          3.356         26.38           40.42          11        61
  3 46-65     20    35.60      11.749          2.627         30.10           41.10          19        63
  Total       60    40.60      15.298          1.975         36.65           44.55          11        72
ANOVA

                            Sum of Squares   df   Mean Square      F      Sig.
DECLINE   Between Groups         320.233      2      160.117     1.384    .259
          Within Groups         6595.950     57      115.718
          Total                 6916.183     59
STRPRE    Between Groups        4513.600      2     2256.800    13.840    .000
          Within Groups         9294.800     57      163.067
          Total                13808.400     59
The p-value for the mean decline is 0.26, whereas for the levels of stress before the task (variable name: strpre) it is <0.001. We can conclude that there is no evidence of any difference in the average decline of stress after the task, but there is strong evidence that the levels of stress before the task were different. We can see from the descriptives that the youngest age group were more stressed, with a mean score of 52.80 (95% C.I. = (47.6, 58.1)), compared with mean scores of 33.4 for the 26-45 age group and 35.6 for the 46-65 age group. The middle age group showed the smallest decline, on average 9.5 (95% C.I. = (5.3, 13.6)), but this was not 'significantly' smaller.
Multiple Comparison Tests
Click on Post Hoc and put a tick in the box corresponding to one of the multiple comparison tests (e.g. Scheffe, Duncan, Tukey). Click on Continue.
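The whole analysis, including the descriptives and an optional Tukey post hoc test, can be run as syntax; a sketch assuming the decline, strpre and age variables used above:

ONEWAY decline strpre BY age
  /STATISTICS=DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).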
Kruskal-Wallis
For example purposes, we are using the same stress data set, but in this case we are assuming it is not normally distributed. If this were the case, then by using a One-way Analysis of Variance we would have a large chance of invoking a Type I error. Therefore, we choose the non-parametric version, the Kruskal-Wallis One Way Analysis of Variance.
 Enter the variables that you want to test into the Test Variable List. Enter the grouping variable, age, into the Grouping Variable box.
 Click on Define Range. Enter the range of categories that you want to test; in this case we want to test from 1 (15-25) up to 3 (46-65), i.e. we are choosing a subset of the possible age groups.
 Click on Continue
 Click on OK
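As syntax, a minimal sketch assuming the same variables and the group range 1 to 3:

NPAR TESTS
  /K-W=decline strpre BY age(1 3).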
We get the following output:
Test Statisticsa,b

               DECLINE   STRPRE
Chi-Square      1.712    19.268
df                  2         2
Asymp. Sig.      .425      .000

a. Kruskal Wallis Test
b. Grouping Variable: AGE
I have missed out the mean ranks from the output as, again, they are meaningless – if you wish to explore further where any differences lie, use the Explore option. We see there is little evidence of a difference between groups in the levels of stress decline (p=0.43), but there is strong evidence of a difference in levels of stress before the task (p<0.001).
APPENDIX
Entering Data for Paired and Unpaired Tests
The golden rule for data entry is ‘ONE ROW PER SUBJECT’. No more and no less.
Therefore, when we are comparing independent groups, we would set up a data sheet with one column per variable: a group variable, work, which tells us which workplace the subject is in, and an outcome variable, cough, which tells us whether the subject has a history of cough.
It is not appropriate in this case to create three variables, one for each workplace. We would then have 3 subjects per row.
When we have paired data, year 1 is in one column and year 2 is in another column. We do not have a year variable, because that would mean that we would have two rows for one subject.
The consequence of all this is that the menus for the paired analyses are set up differently from the menus for the unpaired analyses.
However, both menus are set up to accommodate:
ONE ROW PER SUBJECT and ONE SUBJECT PER ROW