Analysing continuous data
Parametric versus Non-parametric methods
Scott Harris
October 2009
Learning outcomes
By the end of this session you should be able to choose between,
perform (using SPSS) and interpret the results from each of the
following pairs (parametric / non-parametric equivalent):
– One sample t test / Sign test or Wilcoxon signed
ranks test.
– Independent Samples t test / Mann-Whitney U test.
– Paired samples t test / Wilcoxon signed ranks test.
Contents
• Introduction
– Refresher - types of data.
– Data requirements.
– The difference between parametric and non-parametric methods.
– The example dataset: CISR data.
• One sample versus a fixed value (Parametric and
non-parametric equivalent: P/NP)
– Test information.
– ‘How to’ in SPSS.
Contents
• Comparison of two Paired samples (P/NP)
– Test information.
– ‘How to’ in SPSS.
• Comparison of two Independent groups (P/NP)
– Test information.
– ‘How to’ in SPSS.
Refresher: Types of data
• Quantitative – a measured quantity.
– Continuous – Measurements from a continuous
scale: Height, weight, age.
– Discrete – Count data: Children in a family,
number of days in hospital.
• Qualitative – Assessing a quality.
– Ordinal – An order to the data: Likert scale (much
worse, worse, the same, better, much better), age
group (18-25, 26-30…).
– Categorical / Nominal – Simple categories: Blood
group (O, A, B, AB). A special case is binary data
(two levels): Status (Alive, dead), Infection (yes,
no).
Data requirements
• The Statistical tests that will be covered in this
session compare a sample with a continuous outcome
against either:
– a published or hypothesised value,
– a repeated sample from the same individual or
– a sample from another group.
• A different type of test is used in each of the
situations above.
Test requirements: t test
• A continuous outcome variable.
• Approximately ‘Normally’ distributed.
Skewed distributions
[Histograms illustrating a positively skewed and a negatively skewed distribution.]
Skewed distributions…
With an obviously skewed distribution such as either
of those on the previous slide you have two options
open to you:
• Make a transformation of the data (such as taking the
log) to try to remove the skew and make the data
more normal.
• Make use of the equivalent non-parametric test.
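As an illustration of the first option (this sketch is not from the original slides), a transformed variable can be created with a COMPUTE command; the choice of variable and the +1 offset (added because CISR scores can be zero, and the log of zero is undefined) are assumptions for this example:
* Illustrative log transformation of the baseline CISR score .
COMPUTE LOGB0 = LN(B0SCORE + 1) .
EXECUTE .
The histogram of the transformed variable can then be checked for normality in the same way as shown for the raw scores later in this session.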
Parametric and non-parametric methods
Parametric methods are based around the assumption of
normality as they make use of two parameters to explain the
underlying distribution.
These parameters are:
– The mean (explaining the average result) and
– The standard deviation (explaining the variability in the
data).
Non-parametric methods are, as their name suggests, methods
that are not based around the use of any parameters to
summarise any underlying distribution. It is for this reason that
these non-parametric tests are sometimes referred to as
‘distribution free’ tests. These tests can be used for ordinal
variables, even with only a few levels.
Example dataset: Information
CISR (Clinical Interview Schedule: Revised) data:
– Measure of depression – the higher the score the
worse the depression.
– A CISR value of 12 or greater is used to indicate a
clinical case of depression.
– 3 groups of patients (each receiving a different
form of treatment: GP, CMHN and CMHN problem
solving).
– Data collected at two time points (baseline and
then a follow-up visit 6 months later).
Example CISR dataset: Raw data
Example CISR dataset: Labelled data
Refresher: Hypothesis testing
• First you have to set up a null (H0) and alternative (H1)
hypothesis. You then calculate the specific test value
for the sample. This is then compared to a critical cut-off value, which can come from published Statistical
tables or can be left to the computer.
• If the calculated value exceeds the cut-off then the
null hypothesis is rejected and the alternative
hypothesis is accepted (accept H1). If the calculated
value is smaller than the cut-off then there is
insufficient evidence against the null hypothesis (do
not reject H0).
Comparing one sample against
a specific hypothesised value
One sample t test or
Sign test / Wilcoxon signed ranks test
Normally
distributed data
One sample t test
One sample t test: hypotheses
$H_0: \bar{X} = \mu_0$
$H_1: \bar{X} \neq \mu_0$
H0 is the default hypothesis (null). We are testing
whether we have enough evidence to reject this and
instead accept the alternative hypothesis (H1).
Our null hypothesis is that the sample mean is equal
to a certain value. The alternative is that it isn’t.
Theory: One sample t test
The following equation is used to calculate the value of t:
x μ 0
t
s n
Where:
x
= Sample mean
μ 0 = Hypothesised mean (or test mean)
S = Sample standard deviation
n = Size of sample
This is distributed with (n-1) degrees of freedom.
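As a worked check, using the figures that appear in the SPSS output later in this section (sample mean 27.3119, standard deviation 10.75286, n = 109, test value 12):
$t = \frac{27.3119 - 12}{10.75286 / \sqrt{109}} = \frac{15.31}{1.03} \approx 14.87$
with 108 degrees of freedom, which matches the t of 14.867 reported by SPSS.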
Theory: One sample t test…
• The t value is then compared against the appropriate critical value from Statistical tables.
• There is significant evidence against H0 if the absolute value of t is greater than the two-sided 5% significance (0.05) value, for the appropriate d.f.
Checking the distribution of B0SCORE
Graphs Legacy Dialogs  Histogram…
* Checking the distribution of B0SCORE .
GRAPH
/HISTOGRAM(NORMAL)=B0SCORE
/TITLE= 'Histogram of baseline CISR
score'.
Info: Histograms in SPSS
1) From the menus select ‘Graphs’ → ‘Legacy Dialogs’ → ‘Histogram…’.
2) Put the variable that you want to draw the histogram for into the ‘Variable:’ box.
3) Tick the option to ‘Display normal curve’.
4) Click the ‘Titles’ button to enter any titles and then click the ‘Continue’ button.
5) If you want separate histograms for each level of another categorical variable then you can either:
   • Add the categorical variable into the ‘Panel by’ → ‘Rows’ box, or
   • Make use of the ‘Split file…’ command (see the syntax sketch below) and draw the histogram as normal. This will produce a separate full size plot for each level of the categorical variable.
6) Finally click ‘OK’ to produce the histogram(s) or ‘Paste’ to add the syntax for this into your syntax file.
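A minimal sketch of the ‘Split file…’ route mentioned in step 5, written in the same style as the other syntax in this session (the CISR variables TMTGR and B0SCORE are used purely as an illustration; substitute your own):
* Split the file by treatment group, draw the histograms, then remove the split .
SORT CASES BY TMTGR .
SPLIT FILE LAYERED BY TMTGR .
GRAPH
/HISTOGRAM(NORMAL)=B0SCORE
/TITLE= 'Histogram of baseline CISR score, by group'.
SPLIT FILE OFF .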
The distribution of baseline CISR
[Histogram of the baseline CISR score with the normal curve overlaid: does the distribution look approximately normal?]
SPSS – One sample t test
Analyze → Compare Means → One-Sample T Test…
* One sample t test vs. a value of 12 .
T-TEST
/TESTVAL = 12
/MISSING = ANALYSIS
/VARIABLES = B0SCORE
/CRITERIA = CI(.95) .
Info: One sample t test in SPSS
1) From the menus select ‘Analyze’ → ‘Compare Means’ → ‘One-Sample T Test…’.
2) Put the variable that you want to test into the ‘Test
Variable(s):’ box.
3) Put the value that you want to test against into the
‘Test Value:’ box.
4) Finally click ‘OK’ to produce the one sample t test or
‘Paste’ to add the syntax for this into your syntax
file.
SPSS – One sample t test: Output
Observed summary statistics:
One-Sample Statistics
  B0SCORE: N = 109, Mean = 27.3119, Std. Deviation = 10.75286, Std. Error Mean = 1.02994

One-Sample Test (Test Value = 12)
  B0SCORE: t = 14.867, df = 108, Sig. (2-tailed) = .000, Mean Difference = 15.31193,
  95% Confidence Interval of the Difference (the interval for the true mean difference): 13.2704 to 17.3534
2 sided p value with an alternative hypothesis of non-equality.
Highly significant (P<0.001) hence significant evidence against
the mean being 12.
Practical Questions
Analysing Continuous Data
Questions 1 and 2
Practical Questions
From the course webpage download the file HbA1c.sav by
clicking the right mouse button on the file name and selecting
Save Target As.
The dataset is pre-labelled and contains data on Blood sugar
reduction for 245 patients divided into 3 groups.
1) Produce Histograms for the reduction in blood sugar
(HBA1CRED), both combined across the three treatment groups
and split separately by treatment group. Do you think that the
data follow a normal distribution?
Practical Questions
2) Assuming that the outcome variable is normally
distributed conduct a suitable statistical test to
compare the starting HbA1c level (HBA1C_1) against
a value of 7 (the level below which good control of
glucose levels is accepted). What are the key
statistics that should be reported and what are your
conclusions from this test?
Practical Solutions
1)
To produce the overall histogram you can use the options exactly as
given. This results in the following syntax:
* Producing the overall histogram .
GRAPH
/HISTOGRAM(NORMAL)=HBA1CRED
/TITLE= 'Overall Histogram of blood sugar reduction'.
To split the graphs by treatment group, the easiest way is to add the
group variable to the Panel By: → Rows box. This results in the
following syntax:
* Producing the histograms split by group .
GRAPH
/HISTOGRAM(NORMAL)=HBA1CRED
/PANEL ROWVAR=GROUP ROWOP=CROSS
/TITLE= 'Histograms of blood sugar reduction, by group'.
Practical Solutions
1)
The histograms do not look highly skewed (although the individual
group histograms show some skew) and the combined histogram
actually appears to follow a normal distribution very well.
There may be an outlier in the Active A group.
Practical Solutions
2)
Along with the observed mean difference, its confidence interval and
the p value should be reported.
One-Sample Statistics
  HB1AC_1: N = 245, Mean = 7.1337, Std. Deviation = 1.51228, Std. Error Mean = .09662

One-Sample Test (Test Value = 7)
  HB1AC_1: t = 1.384, df = 244, Sig. (2-tailed) = .168, Mean Difference = .13373,
  95% Confidence Interval of the Difference: -.0566 to .3240
The mean difference between the starting HbA1c level and 7 is 0.13
(95% CI: -0.06, 0.32). The starting HbA1c level is not statistically
significantly different from 7 (p=0.168).
(NOTE: The CI also includes 0)
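A sketch of the syntax that would produce this output, following the one sample t test pattern shown earlier in the session (the variable name is as given in the question for the practical dataset):
* One sample t test of starting HbA1c vs. a value of 7 .
T-TEST
/TESTVAL = 7
/MISSING = ANALYSIS
/VARIABLES = HBA1C_1
/CRITERIA = CI(.95) .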
Non-normally
distributed data
Sign test / Wilcoxon signed ranks test
Theory: Sign Test
• The simplest non-parametric test.
• For each subject, subtract the value you are testing
against from each of the observed values, writing
down the sign of the difference. (That is, write “-” if
the difference is negative and “+” if it is positive.)
• If the sample is the same as the test value then we should
have equal numbers of “+” and “-”. If we get more of one than
the other then we start to build evidence against the sample
being the same as the test value.
Sign test: Example
ID    M6Score (1)    Test value (2)    Difference (1-2)    Sign
1     13             12                1                   +
3     6              12                -6                  -
4     26             12                14                  +
5     7              12                -5                  -
6     9              12                -3                  -
7     0              12                -12                 -
9     5              12                -7                  -
10    11             12                -1                  -
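As an illustration of how these signs become a p value (this note is not from the original slides): under H0 each non-zero difference is equally likely to be “+” or “-”, so the number of “+” signs follows a Binomial(n, ½) distribution. For the 8 differences above, with 2 positives, the exact two-sided p value would be
$p = 2 \times P(X \le 2) = 2 \times \frac{\binom{8}{0} + \binom{8}{1} + \binom{8}{2}}{2^8} = 2 \times \frac{37}{256} \approx 0.29$
so this small extract of the data on its own gives no real evidence against the test value of 12.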
Theory: Wilcoxon signed rank test (WSRT)
The Sign test ignores all information about magnitude of
difference. WSRT looks at the sign of the difference and also the
magnitude.
1. Calculate the difference between the observed value and the test
value.
2. Rank the differences in order from smallest to largest, ignoring
the sign of the values.
3. Assign rank values to the numbers. 1 for the smallest all the way
up to n for the largest.
4. Add up the ranks for the positive differences and then for the
negative differences.
5. If the sample is the same as the test value then we should have
equal sums of ranks for the positive and negative differences. If
one is higher than the other then we start to build evidence
against the sample being the same as the test value.
Wilcoxon signed rank test: Example

Step 1 – the differences (1-2) and their signs, from the Sign test example:
  1 (+), -6 (-), 14 (+), -5 (-), -3 (-), -12 (-), -7 (-), -1 (-)

Steps 2 & 3 – order the differences by size, ignoring sign, and assign ranks (when there are ties you apply the average rank to all tied scores):
  Difference:  1    1    3    5    6    7    12   14
  Sign:        +    -    -    -    -    -    -    +
  Rank:        1.5  1.5  3    4    5    6    7    8

Steps 4 & 5 – sum the ranks for the positive and for the negative differences (for this section of data):
  Positive: 1.5 + 8 = 9.5
  Negative: 1.5 + 3 + 4 + 5 + 6 + 7 = 26.5
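The Z statistic and ‘Asymp. Sig.’ that SPSS reports for this test come from a large-sample normal approximation. Ignoring the small correction for ties that SPSS applies, a common form is
$z = \frac{W - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$
where W is the sum of the positive (or of the negative) ranks and n is the number of non-zero differences. For the CISR output later in this section (W = 3465, n = 106) this gives z ≈ 1.99, in line with the -1.986 that SPSS reports (the sign depends only on which rank sum is used).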
Checking the distribution of M6SCORE
* Checking the distribution of M6SCORE .
GRAPH
/HISTOGRAM(NORMAL)=M6SCORE
/TITLE= 'Histogram of 6 month CISR
scores'.
Graphs → Legacy Dialogs → Histogram…
The distribution of 6 month CISR
SPSS – Setting up the test value
Transform → Compute Variable…
* Setting up the test value .
COMPUTE TESTVALUE = 12 .
EXECUTE .
Info: Creating new variables in SPSS
1) From the menus select ‘Transform’ → ‘Compute Variable…’.
2) Enter the name of the new variable that you want to create into the ‘Target Variable:’ box.
3) Enter the formula for the new variable into the ‘Numeric Expression’ box.
   ● In this case we want to create a variable that just contains a constant value, so we just enter that value into the ‘Numeric Expression’ box.
4) Finally click ‘OK’ to produce the new variable or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – Sign & Wilcoxon signed ranks tests
Analyze → Nonparametric Tests → 2 Related Samples…
SPSS – Sign & Wilcoxon signed ranks tests
Sign Test
Wilcoxon signed
ranks Test
* One sample sign test vs. a value of 12 .
NPAR TEST
/SIGN= M6SCORE WITH TESTVALUE (PAIRED)
/STATISTICS DESCRIPTIVES QUARTILES
/MISSING ANALYSIS.
* One sample Wilcoxon signed ranks test vs. a value of 12 .
NPAR TEST
/WILCOXON=M6SCORE WITH TESTVALUE (PAIRED)
/STATISTICS DESCRIPTIVES QUARTILES
/MISSING ANALYSIS.
Info: Sign & Wilcoxon signed ranks tests in SPSS
1) From the menus select ‘Analyze’ → ‘Nonparametric Tests’ → ‘2 Related Samples…’.
2) Click the two variables that you want to test from the list on the left. After you click the first variable hold down the Ctrl key to select the 2nd.
   ● In this case we select the variable that we want to compare, as well as the newly created constant variable.
3) Click the arrow button to move this pair of variables into the ‘Test Pairs:’ box.
4) Ensure that the test(s) that you want to conduct are ticked in the ‘Test Type’ box.
5) Click the ‘Options’ button and then select ‘Descriptive’ and ‘Quartiles’ from the ‘Statistics’ box.
6) Finally click ‘OK’ to produce the selected test(s) or ‘Paste’ to
add the syntax for this into your syntax file.
SPSS – Sign test: Output
Frequencies (TESTVALUE - M6SCORE)
  Negative Differences (TESTVALUE < M6SCORE): 36
  Positive Differences (TESTVALUE > M6SCORE): 70
  Ties (TESTVALUE = M6SCORE): 3
  Total: 109
Observed numbers of positive, negative and tied values. You would expect half positive and half negative if there was no difference.

Test Statistics (Sign Test, TESTVALUE - M6SCORE)
  Z = -3.205, Asymp. Sig. (2-tailed) = .001
2 sided p value with an alternative hypothesis of non-equality. Highly significant (P=0.001) hence strong evidence against the sample median being 12.
SPSS – Wilcoxon signed ranks test: Output
Ranks (TESTVALUE - M6SCORE)
  Negative Ranks (TESTVALUE < M6SCORE): N = 36, Mean Rank = 61.28, Sum of Ranks = 2206.00
  Positive Ranks (TESTVALUE > M6SCORE): N = 70, Mean Rank = 49.50, Sum of Ranks = 3465.00
  Ties (TESTVALUE = M6SCORE): N = 3
  Total: N = 109
Observed sums of ranks (and mean rank) for the positive and negative values.

Test Statistics (Wilcoxon Signed Ranks Test, TESTVALUE - M6SCORE; based on negative ranks)
  Z = -1.986, Asymp. Sig. (2-tailed) = .047
2 sided p value with an alternative hypothesis of non-equality. Just significant (P=0.047) hence significant evidence against the sample median being 12.
Practical Questions
Analysing Continuous Data
Question 3
Practical Questions
3) Assuming that the outcome variable is NOT
normally distributed conduct a suitable statistical
test to compare the starting HbA1c level (HBA1C_1)
against a value of 7. What are the key statistics that
should be reported and what are your conclusions
from this test?
Practical Solutions
3)
For the non-parametric test there is only a p value to report from the
test (although the median could be reported from elsewhere and a CI
could be calculated using CIA, the Confidence Interval Analysis software).
Ranks (TESTVAL - HB1AC_1)
  Negative Ranks (TESTVAL < HB1AC_1): N = 135, Mean Rank = 121.01, Sum of Ranks = 16337.00
  Positive Ranks (TESTVAL > HB1AC_1): N = 110, Mean Rank = 125.44, Sum of Ranks = 13798.00
  Ties (TESTVAL = HB1AC_1): N = 0
  Total: N = 245

Test Statistics (Wilcoxon Signed Ranks Test, TESTVAL - HB1AC_1; based on positive ranks)
  Z = -1.143, Asymp. Sig. (2-tailed) = .253
The starting HbA1c level is not statistically significantly different from
7 (p=0.253).
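A sketch of the syntax for this test, following the pattern shown earlier in the session (TESTVAL is the constant variable created to compare against; the HbA1c variable name is as given in the question):
* Setting up the test value .
COMPUTE TESTVAL = 7 .
EXECUTE .
* One sample Wilcoxon signed ranks test vs. a value of 7 .
NPAR TEST
/WILCOXON=HBA1C_1 WITH TESTVAL (PAIRED)
/STATISTICS DESCRIPTIVES QUARTILES
/MISSING ANALYSIS.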
Comparing two sets of
paired observations
Paired samples t test or
Wilcoxon signed ranks test
Normally
distributed data
Paired samples t test
Paired samples t test
Used when testing one set of values against another
set of values, when either the two sets of values come
from the same subjects or when they come from two
groups who are NOT independent of each other.
Paired samples t test: Hypotheses
$H_0: \bar{X}_A = \bar{X}_B$
$H_1: \bar{X}_A \neq \bar{X}_B$
The null hypothesis (H0) is that the measures
are the same. The alternative hypothesis (H1)
is that they are not.
Theory: Paired samples t test
Notice how the formula is very similar to that for the one sample
test:
D μ 0
t
sD n
Where:
D = Mean of the differences
μ 0 = Hypothesised mean (or test mean)
S D = Standard deviation of the difference
n
= Number of paired samples
This is distributed with (n-1) degrees of freedom.
Theory: Paired samples t test…
D μ 0
t
sD n
vs.
x μ
t
s n
If you calculate the difference between each set of
paired observations and then run a one sample t test
on this new ‘change’ variable (against a mean value
of 0, i.e. no difference), you will get the same result as
from a paired samples t test on the original
paired variables.
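As a worked check, using the figures from the paired output later in this section (mean difference 15.77064, standard deviation of the differences 12.24490, n = 109 pairs):
$t = \frac{15.77064 - 0}{12.24490 / \sqrt{109}} = \frac{15.77}{1.17} \approx 13.45$
with 108 degrees of freedom, which matches the t of 13.446 reported by SPSS.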
The distribution of the difference in CISR
* Calculating the difference .
COMPUTE DIFF = B0SCORE - M6SCORE.
EXECUTE .
Info: Creating new variables in SPSS
1) From the menus select ‘Transform’ → ‘Compute Variable…’.
2) Enter the name of the new variable that you want to create into
the ‘Target Variable:’ box.
3) Enter the formula for the new variable into the ‘Numeric
Expression’ box.
   ● In this case we just want to create the difference between the two variables, so we enter one of the variables (usually the larger of the two) minus the other variable into the ‘Numeric Expression’ box. This was ‘B0SCORE - M6SCORE’ in this case.
4) Finally click ‘OK’ to produce the new variable or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – Paired samples t test
* Paired sample t test .
T-TEST
PAIRS = B0SCORE WITH M6SCORE
(PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.
Analyze → Compare Means → Paired-Samples T Test…
(A button in the dialog allows you to change the order of the pair.)
Info: Paired samples t test in SPSS
1) From the menus select ‘Analyze’ → ‘Compare Means’ → ‘Paired-Samples T Test…’.
2) Click the two variables that you want to test from the list on the left. After you click the first variable hold down the Ctrl key to select the 2nd.
3) Click the arrow button to move this pair of variables into the ‘Paired Variables’ box.
4) Finally click ‘OK’ to produce the paired samples t
test or ‘Paste’ to add the syntax for this into your
syntax file.
SPSS – Paired samples t test: Output
Paired Samples Statistics (summary statistics)
  Pair 1  B0SCORE: Mean = 27.3119, N = 109, Std. Deviation = 10.75286, Std. Error Mean = 1.02994
          M6SCORE: Mean = 11.5413, N = 109, Std. Deviation = 11.12099, Std. Error Mean = 1.06520

Paired Samples Test (Pair 1: B0SCORE - M6SCORE; note the ordering of the difference)
  Paired Differences: Mean = 15.77064, Std. Deviation = 12.24490, Std. Error Mean = 1.17285,
  95% Confidence Interval of the Difference (the interval for the true mean difference): 13.44585 to 18.09543
  t = 13.446, df = 108, Sig. (2-tailed) = .000
2 sided p value with an alternative hypothesis of non-equality of variables.
Highly significant (P<0.001) hence significant evidence against baseline
and 6 month CISR scores being equal.
Non-normally
distributed data
Wilcoxon signed ranks test
SPSS – Wilcoxon signed ranks test
Analyze → Nonparametric Tests → 2 Related Samples…
* Wilcoxon signed ranks test .
NPAR TEST
/WILCOXON=B0SCORE WITH M6SCORE (PAIRED)
/STATISTICS DESCRIPTIVES QUARTILES
/MISSING ANALYSIS.
Info: Wilcoxon signed ranks tests in SPSS
1) From the menus select ‘Analyze’ → ‘Nonparametric Tests’ → ‘2 Related Samples…’.
2) Click the two variables that you want to test from the list on the left. After you click the first variable hold down the Ctrl key to select the 2nd.
3) Click the arrow button to move this pair of variables into the ‘Test Pairs:’ box.
4) Click the ‘Options’ button and then select ‘Descriptive’ and
‘Quartiles’ from the ‘Statistics’ box.
5) Finally click ‘OK’ to produce the Wilcoxon signed ranks test or
‘Paste’ to add the syntax for this into your syntax file.
SPSS – Wilcoxon signed ranks test: Output
Ranks (M6SCORE - B0SCORE)
  Negative Ranks (M6SCORE < B0SCORE): N = 101, Mean Rank = 55.45, Sum of Ranks = 5600.00
  Positive Ranks (M6SCORE > B0SCORE): N = 6, Mean Rank = 29.67, Sum of Ranks = 178.00
  Ties (M6SCORE = B0SCORE): N = 2
  Total: N = 109
Observed sums of ranks (and mean rank) for the positive and negative values.

Test Statistics (Wilcoxon Signed Ranks Test, M6SCORE - B0SCORE; based on positive ranks)
  Z = -8.427, Asymp. Sig. (2-tailed) = .000
2 sided p value with an alternative
hypothesis of non-equality. Highly
significant (P<0.001) hence
significant evidence against
baseline and 6 month CISR scores
being equal.
Practical Questions
Analysing Continuous Data
Questions 4 and 5
Practical Questions
4) Assuming that the outcome variable is normally distributed:
   i. Conduct a suitable statistical test to compare the reduction in HbA1c level (HBA1C_1 & HBA1C_2) across the entire cohort. What are the key statistics that should be reported and what are your conclusions from this test?
   ii. Conduct a suitable statistical test to look directly at the reduction in HbA1c level (HBA1CRED). Does this test agree with the one from part i?
5) Assuming that the outcome variable is NOT normally distributed:
   i. Conduct a suitable statistical test to compare the reduction in HbA1c level (HBA1C_1 & HBA1C_2) across the entire cohort. What are the key statistics that should be reported and what are your conclusions from this test?
   ii. Conduct a suitable statistical test to look directly at the reduction in HbA1c level (HBA1CRED). Does this test agree with the one from part i?
Practical Solutions
4)
(i) Along with the observed mean HbA1c difference, its confidence
interval and the p value should be reported.
Paired Samples Statistics
  Pair 1  HB1AC_1: Mean = 7.1337, N = 245, Std. Deviation = 1.51228, Std. Error Mean = .09662
          HB1AC_2: Mean = 6.0806, N = 245, Std. Deviation = 1.69735, Std. Error Mean = .10844

Paired Samples Test (Pair 1: HB1AC_1 - HB1AC_2)
  Paired Differences: Mean = 1.05314, Std. Deviation = .91526, Std. Error Mean = .05847,
  95% Confidence Interval of the Difference: .93796 to 1.16832
  t = 18.010, df = 244, Sig. (2-tailed) = .000
The mean difference in HbA1c levels is 1.05 (95% CI: 0.94, 1.17). There
is a highly significant decrease in the levels of HbA1c between the two
readings (p<0.001).
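A sketch of the syntax for this comparison, following the paired samples t test pattern shown earlier in the session (variable names as given in the question):
* Paired sample t test: starting vs. finishing HbA1c .
T-TEST
PAIRS = HBA1C_1 WITH HBA1C_2 (PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.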
Practical Solutions
4)
(ii) Along with the observed mean HbA1c difference, its confidence
interval and the p value should be reported.
One-Sample Statistics
  Blood sugar reduction: N = 245, Mean = 1.0531, Std. Deviation = .91526, Std. Error Mean = .05847

One-Sample Test (Test Value = 0)
  Blood sugar reduction: t = 18.010, df = 244, Sig. (2-tailed) = .000, Mean Difference = 1.05314,
  95% Confidence Interval of the Difference: .9380 to 1.1683
The mean reduction in HbA1c levels is 1.05 (95% CI: 0.94, 1.17). There
is a highly significant decrease in the levels of HbA1c between the two
readings (p<0.001).
(This is identical to the result from part i)
Practical Solutions
5) (i) For the non-parametric test there is only a p value to report
from the test (although the medians could be reported from
elsewhere and a CI for the difference could be calculated from
CIA).
Ranks (HB1AC_2 - HB1AC_1)
  Negative Ranks (HB1AC_2 < HB1AC_1): N = 218, Mean Rank = 131.26, Sum of Ranks = 28615.00
  Positive Ranks (HB1AC_2 > HB1AC_1): N = 27, Mean Rank = 56.30, Sum of Ranks = 1520.00
  Ties (HB1AC_2 = HB1AC_1): N = 0
  Total: N = 245

Test Statistics (Wilcoxon Signed Ranks Test, HB1AC_2 - HB1AC_1; based on positive ranks)
  Z = -12.200, Asymp. Sig. (2-tailed) = .000
There is a highly significant difference in the levels of HbA1c
between the two readings (p<0.001), with the initial values
being significantly higher.
Practical Solutions
5)
(ii) For the non-parametric test there is only a p value to report from
the test (although the medians could be reported from elsewhere and a
CI for the difference could be calculated from CIA).
Ranks (NULL - Blood sugar reduction)
  Negative Ranks (NULL < Blood sugar reduction): N = 218, Mean Rank = 131.26, Sum of Ranks = 28615.00
  Positive Ranks (NULL > Blood sugar reduction): N = 27, Mean Rank = 56.30, Sum of Ranks = 1520.00
  Ties (NULL = Blood sugar reduction): N = 0
  Total: N = 245

Test Statistics (Wilcoxon Signed Ranks Test, NULL - Blood sugar reduction; based on positive ranks)
  Z = -12.200, Asymp. Sig. (2-tailed) = .000
There is a highly significant difference in the levels of HbA1c between
the two readings (p<0.001), with the initial values significantly higher.
(This is identical to the result from part i)
Comparing
two independent groups
Independent samples t test or
Mann Whitney U test
Normally
distributed data
Independent samples t test
Independent samples t test
Used when testing one set of values from one group
against another set of values from a separate group,
when the two groups are independent of each other.
Independent samples t test: Hypotheses
$H_0: \bar{X}_1 = \bar{X}_2$
$H_1: \bar{X}_1 \neq \bar{X}_2$
The null hypothesis (H0) is that the groups are
the same. The alternative hypothesis (H1) is
that they are not.
Theory: 2 Variances? Pooling the variance
If the two sample variances are similar (rule of thumb
– largest standard deviation not double the other)
then the two estimates can be pooled to give a
common variance (S2):
$S_1^2 \approx S_2^2 \rightarrow S^2$
Levene's test can be used to formally test whether
they are significantly different.
Theory: Estimating a common variance
The simplest pooled estimate is:
$s^2 = \frac{s_1^2 + s_2^2}{2}$
Where:
$s_1^2, s_2^2$ = Sample variance of group 1 and group 2 respectively.
This assigns equal weight to both sample estimates
regardless of sample size. When we have two
samples of differing sizes we need to use another
method.
Theory: Estimating common variance…
If the sample sizes are not equal then it can be shown
that the best pooled estimator of the common
variance is:
$s^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$
Where:
$s_1^2, s_2^2$ = Sample variance of group 1 and group 2 respectively.
$n_1, n_2$ = Size of sample in group 1 and group 2 respectively.
This formula gives more importance to the estimate
that is calculated from the larger sample size.
Theory: Independent samples t test
The following equation is used to calculate the value of t:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Where:
$\bar{x}_1, \bar{x}_2$ = Sample mean of group 1 and group 2 respectively
$\mu_0$ = Null hypothesis value (usually 0)
$s$ = Pooled estimate of the standard deviation
$n_1, n_2$ = Size of sample in group 1 and group 2 respectively
This is distributed with $n_1 + n_2 - 2$ degrees of freedom.
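As a worked check, using the baseline CISR figures from the output later in this section (CMHN group: mean 28.88, SD 10.27, n = 40; GP group: mean 23.86, SD 11.01, n = 28):
$s^2 = \frac{39 \times 10.27^2 + 27 \times 11.01^2}{40 + 28 - 2} \approx 112.0$, so $s \approx 10.6$
$t = \frac{28.88 - 23.86}{10.6\sqrt{\frac{1}{40} + \frac{1}{28}}} \approx \frac{5.02}{2.61} \approx 1.92$
with 66 degrees of freedom, which matches the equal-variances t of 1.925 reported by SPSS.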
SPSS – Independent samples t test
* Independent groups t test .
T-TEST
GROUPS = TMTGR(2 1)
/MISSING = ANALYSIS
/VARIABLES = B0SCORE
/CRITERIA = CI(.95) .
Analyze → Compare Means → Independent-Samples T Test…
Info: Independent samples t test in SPSS
1) From the menus select ‘Analyze’ → ‘Compare Means’ → ‘Independent-Samples T Test…’.
2) Put the variable that you want to test into the ‘Test Variable(s):’
box.
3) Put the categorical variable, that indicates which group the
values come from, into the ‘Grouping Variable:’ box.
4) Click the ‘Define Groups…’ button and then enter the numeric
codes of the 2 groups that you want to compare. Click
‘Continue’.
5) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – Independent samples t test: Output
Group Statistics (group summary statistics)
  B0SCORE  TMTGR = CMHN: N = 40, Mean = 28.8750, Std. Deviation = 10.27116, Std. Error Mean = 1.62401
           TMTGR = GP:   N = 28, Mean = 23.8571, Std. Deviation = 11.01418, Std. Error Mean = 2.08148
Here the standard deviations are similar, hence we can assume equal variances.

Independent Samples Test (B0SCORE)
  Levene's Test for Equality of Variances: F = .052, Sig. = .820
  t-test for Equality of Means (the Mean Difference is the observed difference in means; the 95% CI is for the true mean difference):
    Equal variances assumed:     t = 1.925, df = 66,     Sig. (2-tailed) = .059, Mean Difference = 5.01786, Std. Error Difference = 2.60729, 95% CI of the Difference: -.18777 to 10.22349
    Equal variances not assumed: t = 1.901, df = 55.611, Sig. (2-tailed) = .063, Mean Difference = 5.01786, Std. Error Difference = 2.64008, 95% CI of the Difference: -.27167 to 10.30738
2 sided p value with an alternative hypothesis of non-equality of
groups. Not quite significant at the 5% level (P=0.059) hence no
significant evidence that the groups are different.
Non-normally
distributed data
Mann Whitney U test
Mann Whitney U test: Two-sample test
• Also known as Wilcoxon rank sum test (we will avoid
this name to avoid confusion).
• Similar to the Sign test and Wilcoxon Signed ranks
test in that it compares the ranks.
• This test compares the 2 groups rather than the
positives against the negatives.
• To account for the fact that there may be different
numbers of observations in each group the mean rank
is compared rather than the raw sum of the ranks.
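For reference (this note is not from the original slides), the U statistic that SPSS reports can be obtained from the rank sum of one group, and for larger samples the p value again comes from a normal approximation (ignoring the correction for ties):
$U = R_1 - \frac{n_1(n_1 + 1)}{2}$, $\quad z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$
where $R_1$ is the sum of the ranks in group 1 and $n_1$, $n_2$ are the two group sizes. In the practical solutions below, for example, U = 6522 - (83 × 84)/2 = 3036, as shown in the SPSS output.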
SPSS – Mann-Whitney U test
* Mann-Whitney U test .
NPAR TESTS
/M-W= M6SCORE BY TMTGR(1 2)
/MISSING ANALYSIS.
Analyze → Nonparametric Tests → 2 Independent Samples…
Info: Mann-Whitney U test in SPSS
1) From the menus select ‘Analyze’ → ‘Nonparametric Tests’ → ‘2 Independent Samples…’.
2) Put the variable that you want to test into the ‘Test Variable
List:’ box.
3) Put the categorical variable, that indicates which group the
values come from, into the ‘Grouping Variable:’ box.
4) Click the ‘Define Groups…’ button and then enter the numeric
codes of the 2 groups that you want to compare. Click
‘Continue’.
5) Ensure that the ‘Mann-Whitney U’ option is ticked in the ‘Test
Type’ box.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
SPSS – Mann-Whitney U test: Output
[Output tables not reproduced: the Ranks table shows the observed rank sums and mean ranks for each group, and the Test Statistics table gives the 2 sided p value with an alternative hypothesis of non-equality of groups. Significant (P=0.010) hence significant evidence of a difference between the groups.]
Practical Questions
Analysing Continuous Data
Questions 6 and 7
Practical Questions
6) Assuming that the outcome variable is normally distributed:
   i. Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between groups 1 and 2. What are the key statistics that should be reported and what are your conclusions from this test?
   ii. Repeat the analysis from part i, but this time comparing groups 1 and 3. What are your conclusions from this test?
7) Assuming that the outcome variable is NOT normally distributed:
   i. Conduct a suitable statistical test to compare the finishing HbA1c level (HBA1C_2) between groups 1 and 2. What are the key statistics that should be reported and what are your conclusions from this test?
   ii. Repeat the analysis from part i, but this time comparing groups 1 and 3. What are your conclusions from this test?
Practical Solutions
6)
(i) Along with the observed group means for the final HbA1c, the mean
difference, its confidence interval and the p value should be reported.
Group Statistics (HB1AC_2)
  Treatment group = Active B: N = 80, Mean = 6.0132, Std. Deviation = 1.60600, Std. Error Mean = .17956
  Treatment group = Active A: N = 83, Mean = 5.7208, Std. Deviation = 1.79766, Std. Error Mean = .19732

Independent Samples Test (HB1AC_2)
  Levene's Test for Equality of Variances: F = .230, Sig. = .632
  t-test for Equality of Means:
    Equal variances assumed:     t = 1.094, df = 161,     Sig. (2-tailed) = .276, Mean Difference = .29239, Std. Error Difference = .26734, 95% CI of the Difference: -.23555 to .82034
    Equal variances not assumed: t = 1.096, df = 160.089, Sig. (2-tailed) = .275, Mean Difference = .29239, Std. Error Difference = .26679, 95% CI of the Difference: -.23448 to .81927
There is a mean difference of 0.29 (95% CI: -0.24, 0.82) between Active A and B,
with Active B having the larger final HbA1c. This small difference however is not
statistically significant (p=0.276).
(The SDs are similar → use the equal variances version.)
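A sketch of the syntax for this comparison, following the independent samples t test pattern shown earlier in the session (GROUP is assumed to be the grouping variable in the practical dataset, with codes 1 and 2 referring to the groups named in the question):
* Independent groups t test: finishing HbA1c, groups 1 and 2 .
T-TEST
GROUPS = GROUP(1 2)
/MISSING = ANALYSIS
/VARIABLES = HBA1C_2
/CRITERIA = CI(.95) .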
Practical Solutions
6) Shown below is one way of tabulating these results:

Table 1: Comparison between Active treatments

Outcome                     Active A        Active B        Active B – Active A:
                                                            Mean difference (95% CI)    P value
Blood sugar reduction
  mean (SD)                 5.72 (1.80)     6.01 (1.61)     0.29 (-0.24, 0.82)          0.276
  min to max                1.36 to 10.36   2.31 to 9.88
  n                         83              80
Practical Solutions
6)
(ii) Along with the observed group means for the final HbA1c, the mean
difference, its confidence interval and the p value should be reported.
Group Statistics (HB1AC_2)
  Treatment group = Placebo:  N = 82, Mean = 6.5105, Std. Deviation = 1.60229, Std. Error Mean = .17694
  Treatment group = Active A: N = 83, Mean = 5.7208, Std. Deviation = 1.79766, Std. Error Mean = .19732

Independent Samples Test (HB1AC_2)
  Levene's Test for Equality of Variances: F = .511, Sig. = .476
  t-test for Equality of Means:
    Equal variances assumed:     t = 2.977, df = 163,     Sig. (2-tailed) = .003, Mean Difference = .78968, Std. Error Difference = .26522, 95% CI of the Difference: .26596 to 1.31339
    Equal variances not assumed: t = 2.980, df = 161.308, Sig. (2-tailed) = .003, Mean Difference = .78968, Std. Error Difference = .26504, 95% CI of the Difference: .26629 to 1.31306
There is a mean difference of 0.79 (95% CI: 0.27, 1.31) between Active A and
Placebo, with Placebo having the larger final HbA1c. This difference is highly
statistically significant (p=0.003).
(The SDs are similar → use the equal variances version.)
Practical Solutions
7)
(i) For the non-parametric test, again, there is only a p value to report
from the test (although the group medians could be reported from
elsewhere and a CI for the difference could be calculated from CIA).
Ranks (HB1AC_2)
  Treatment group = Active A: N = 83, Mean Rank = 78.58, Sum of Ranks = 6522.00
  Treatment group = Active B: N = 80, Mean Rank = 85.55, Sum of Ranks = 6844.00
  Total: N = 163

Test Statistics (Grouping Variable: Treatment group)
  Mann-Whitney U = 3036.000, Wilcoxon W = 6522.000, Z = -.943, Asymp. Sig. (2-tailed) = .346
The final HbA1c level for the Active A and Active B groups does not
show a statistically significant difference (p=0.346), although the active
B group had slightly higher values.
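A sketch of the syntax for this comparison, following the Mann-Whitney pattern shown earlier in the session (again assuming GROUP is the grouping variable, with codes 1 and 2 for the two groups in the question):
* Mann-Whitney U test: finishing HbA1c, groups 1 and 2 .
NPAR TESTS
/M-W= HBA1C_2 BY GROUP(1 2)
/MISSING ANALYSIS.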
Practical Solutions
7)
(ii) For the non-parametric test, again, there is only a p value to report
from the test (although the group medians could be reported from
elsewhere and a CI for the difference could be calculated from CIA).
Ranks (HB1AC_2)
  Treatment group = Active A: N = 83, Mean Rank = 72.41, Sum of Ranks = 6010.00
  Treatment group = Placebo: N = 82, Mean Rank = 93.72, Sum of Ranks = 7685.00
  Total: N = 165

Test Statistics (Grouping Variable: Treatment group)
  Mann-Whitney U = 2524.000, Wilcoxon W = 6010.000, Z = -2.865, Asymp. Sig. (2-tailed) = .004
The final HbA1c level for the Active A and Placebo groups shows a
highly statistically significant difference (p=0.004), with the Placebo
group having significantly higher values.
Summary
You should now be able to choose between, perform (using SPSS)
and interpret the results from:
– Comparing a sample against a pre-specified value:
• Parametric: One sample t test
• Non-parametric: Sign test or Wilcoxon signed ranks test.
– Comparing two paired sets of results:
• Parametric: Paired samples t test
• Non-parametric: Wilcoxon signed ranks test.
– Comparing two independent groups:
• Parametric: Independent Samples t test
• Non-parametric: Mann-Whitney U test.
References
Parametric
• Practical Statistics for Medical Research, D Altman: Chapter 9.
• Medical Statistics, B Kirkwood, J Sterne: Chapters 7 & 9.
• An Introduction to Medical Statistics, M Bland: Chapter 10.
• Statistics for the Terrified: Testing for differences between groups.
Non-parametric
• Practical Statistics for Medical Research, D Altman: Chapter 9.
• Medical Statistics, B Kirkwood, J Sterne: Chapter 30.
• An Introduction to Medical Statistics, M Bland: Chapter 12.
• Statistics for the Terrified: Testing for differences between groups.
• Practical Nonparametric Statistics (3rd ed.), W J Conover.