Download WED_UnivStats - KEATS

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Psychometrics wikipedia , lookup

Analysis of variance wikipedia , lookup

Omnibus test wikipedia , lookup

Misuse of statistics wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Introduction to
Univariate Statistics
MSc in Pharmaceutical Medicine
Statistics & Data Management Module
Irene Rebollo-Mesa
Senior Lecturer in Trials
King’s Clinical Trials Unit, Biostatistics
Institute of Psychiatry, King’s College London
[email protected]
Outline
1 Comparing 2 Independent Samples
2 Comparing differences in a Paired Sample
3 Compare Several Independent Samples
4 Conclusions
2
1 Comparing 2
Independent Samples
3
Univariate Statistical Methods:
Comparing 2 Independent Samples
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE TWO INDEPENDENT SAMPLES
Compare Two Means
Continuous, Normal dist.
Independent Samples T-test
Compare Two Proportions
Categorical, Binary, all >5
Chi-squared test
Compare Two Proportions
Categorical, Binary, some <5
Fisher’s exact test
Compare Distributions
Ordinal
Wilcoxon 2-sample signed rank
test (Mann Whitney U test)
Compare Time to an Event Time to event
in Two groups
Logrank test
4
Independent Samples t-test
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE TWO INDEPENDENT SAMPLES
Compare Two Means
Continuous, Normal dist.
Independent Samples T-test
Compare Two Proportions
Categorical, Binary, all >5
Chi-squared test
Compare Two Proportions
Categorical, Binary, some <5
Fisher’s exact test
Compare Distributions
Ordinal
Wilcoxon 2-sample signed rank
test (Mann Whitney U test)
Compare Time to an Event Time to event
in Two groups
Logrank test
5
Independent Samples t-test
The Two-sample t-test is based on the difference in sample means divided by
the standard error (s.e.) of the difference in sample means:
Difference in Sample Means
t=
s.e.(Difference in Sample Means)
oUnder the Null Hypothesis (H0) we expect to see no difference in
means.
oThe larger the sample difference in means (either positive or
negative) the more our data appear to depart from the H0.
oThe standard error (=standard deviation / n) of the sample
mean tells us how far from zero we might expect the sample mean
to be under the N
6
Independent Samples t-test: Example
Cocco G, Pandolfi S, Rousson V:
Sufficient Weight Reduction Decreases Cardiovascular Complications in
Diabetic Patients with the Metabolic Syndrome. A Randomized Study of
Orlistat as an Adjunct to Lifestyle Changes (Diet and Exercise)
Heart Drug 2005;5:68-74
90 patients with metabolic syndrome and diabetes
Aged: >35 years; BMI: 31-40; LVEF: 42-50%
Treatment: Daily exercise, diet and either Orlistat or Placebo
Primary Outcome: Weight loss after diet of 6 months (kgs)
Orlistat
Placebo
0.8
3.0
7.4
7.9
8.6
3.1
8.6
10.8
6.0
3.6
6.0
6.1
7.9
6.0
3.0
8.7
6.2
3.2
4.2
7.6
16.0
3.1
6.9
5.6
8.3
7.7
4.4
4.6
7.3
11.6
13.7
7.3
7.8
6.7
-3.9
1.8
3.3
2.3
1.1
5.0
-0.8
7.0
4.3
-2.8
-3.4
-0.9
1.5
3.4
5.6
5.2
6.4
2.9
5.6
3.7
1.7
2.2
3.8
5.5
0.7
4.6
1.4
2.0
-0.2
4.9
1.9
5.7
2.0
4.5
3.4
4.5
3.8
2.9
2.5
3.3
1.5
1.5
5.9
4.9
4.5
-2.1
-0.5
2.2
1.6
-0.6
0.0
-1.5
6.3
1.7
-6.2
-1.9
7
Independent Samples t-test:
Example Histograms
8
Independent Samples t-test:
Example Boxplots
9
Independent Samples t-test:
Example Rationale
• Patients can be expected to respond differently
• Orlistat does not reduce weight in every patient
• Some patients who receive Placebo lose weight.
• The sample means for this experiment are :
Orlistat: 5.41 kgs Placebo: 2.48 kgs
NULL Hypothesis
• “In the population of patients with metabolic syndrome and
diabetes aged more than 35 the difference in mean weight
changes at 8 months is zero kgs”
• Patients can be expected to respond differently
• Does the difference in sample means , 2.93 kgs, provide
evidence that the null hypothesis is not true?
10
Independent Samples t-test:
Example Calculation
The Two-sample t-test is based on the difference in
sample means divided by the standard error (s.e.) of
the difference in sample means:
Difference in Sample Means
t=
s.e.(Difference in Sample Means)
Weight Change Example:
Difference in Sample Means: 5.41 -2.48= 2.93 kgs
s.e (Difference in Sample Means) : 0.72 kgs
t = 2.93/0.72 = 4.09 : p-value=0.000096
11
Independent Samples t-test:
Assumptions
• Each sample is representative of the population
• There are no difference between the samples other
than the treatments
• Measurements in each sample are independent of each
other and of the measurements in the other sample
• Measurements are Normally distributed in the
population
• The variances in each sample are the same
• If the assumptions of the test are not met, the p-value
may be misleading
12
Binary Outcome Data: Chi-squared test
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE TWO INDEPENDENT SAMPLES
Compare Two Means
Continuous, Normal dist.
Independent Samples T-test
Compare Two
Proportions
Categorical, Binary, all >5
Chi-squared test
Compare Two Proportions
Categorical, Binary, some <5
Fisher’s exact test
Compare Distributions
Ordinal
Wilcoxon 2-sample signed rank
test (Mann Whitney U test)
Compare Time to an Event Time to event
in Two groups
Logrank test
13
Binary Outcome Data: Chi-squared test
•H0 :Graft outcome does not differ between donor types.
•Overall kidney failure rate = 178/819 =0.217 (21.7%)
•If there were no difference between donor types we would expect the
same rates in each group
•In other words amongst ,e.g. the 541 Deceased HB inoculated we would
EXPECT 541*0.217=117 with kidney failure
•Similarly amongst the 175 living related we would EXPECT 175*0.217=38
with kidney failure
14
Binary Outcome Data: Chi-squared test
•As a measure of how “far” the expected (E) outcomes are from the
observed (O) outcomes we calculate
(O - E)2
å
•In this example :
E
( 401 - 423) 2 (140 -117) 2 (151 -137) 2 (24 - 38) 2
423
+
117
+
137
38
... = 21.298
•And this is referred to the chi-squared distribution (3 df [ncol-1*nrow1] ) giving a p-value < .001.
15
Binary Outcome Data:
Interpreting Results
•Expected frequencies, proportions and standardized
residuals.
16
Binary Outcome Data: Size of Effect
•Relative Risk between 2 groups: living vs. deceased.
Most appropriate for prospective-cohort studies
with complete follow up. H0: RR=1.
RR=2.21 -> Kidney failure is 2.21 times higher in
deceased donors than in living donors.
n11
Rr =
n 21
n1+
n 2+
p1
=
p2
•Odds Ratio between 2 groups: living vs. deceased.
n11
Most appropriate for retrospective or case-control
studies. Better mathematical properties, can be
adjusted using log.reg. It approximates RR for small
prev. H0: OR=1.
OR=2.63 -> Among deceased donors the odds of
kidney failure is 2.63 times higher than in living
donors
Or =
n21
p1
(1- p1 )
n12
=
n22
p2
(1- p2 )
17
Binary Outcome Data: Fisher’s Exact Test
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE TWO INDEPENDENT SAMPLES
Compare Two Means
Continuous, Normal dist.
Independent Samples T-test
Compare Two Proportions
Categorical, Binary, all >5
Chi-squared test
Compare Two
Proportions
Categorical, Binary, some <5
Fisher’s exact test
Compare Distributions
Ordinal
Wilcoxon 2-sample signed rank
test (Mann Whitney U test)
Compare Time to an Event Time to event
in Two groups
Logrank test
18
Binary Outcome Data: Fisher’s Exact Test
•Two Frequencies <5  chi-squared invalid.
•Fisher’s exact only for 2x2 tables. Based on the probs. Associated with all
possible tables based on marginal totals. (no simple formula…)
19
Binary Outcome Data:
Non-Parametric Wilcoxon signed rank test
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE TWO INDEPENDENT SAMPLES
Compare Two Means
Continuous, Normal dist.
Independent Samples T-test
Compare Two Proportions
Categorical, Binary, all >5
Chi-squared test
Compare Two Proportions
Categorical, Binary, some <5
Fisher’s exact test
Compare Distributions
Ordinal
Wilcoxon 2-sample signed rank
test (Mann Whitney U test)
Compare Time to an Event Time to event
in Two groups
Logrank test
20
Binary Outcome Data:
Non-Parametric Wilcoxon signed rank test
• If the Normal assumptions are not met, you can try nonparametric statistics, which do not assume a particular
family of distribution for the data.
• Wilcoxon signed rank test (Mann-Whitney U test):
It’s the non-parametric 2-sample T test .
NHThere is no tendency for members of one
population to exceed members of the other.
21
Non-Parametric Wilcoxon signed rank test
1. Rank the data ignoring groups
2. Add the ranks in each group separately to give T1 and T2
3. The test statistic is the smallest of the Ts
 If the NH were true, we would expect the two rank
sums to be about the same
The smaller the T, the lower the probability of the data
arising by chance.
The T is compared with the expected smallest T given
the sample sizes of each group  if smaller p<0.05
22
Weight Loss
-6.2
-3.9
-3.4
-2.8
-2.1
-1.9
-1.5
-0.9
-0.8
-0.6
-0.5
-0.2
0
0.7
0.8
1.1
1.4
1.5
1.5
1.5
1.6
1.7
1.7
1.8
1.9
…
Group
Placebo
Orlistrat
Orlistrat
Orlistrat
Placebo
Placebo
Placebo
Placebo
Orlistrat
Placebo
Placebo
Placebo
Placebo
Placebo
Orlistrat
Orlistrat
Placebo
Placebo
Placebo
Placebo
Placebo
Placebo
Placebo
Orlistrat
Placebo
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
19
19
19
21
23
23
24
25
Wilcoxon signed rank test:
Orlistat Example
N Placebo = 45; N Orlistat = 45
W=1,548, p = 0.0000428
23
Non-Parametric Tests: Be careful?
• Non-parametric tests are less powerful
• You can not get Confidence Intervals
• They are useless with very small samples: It is
not true that you should always use nonparametrics with small samples!!!
• They are very useful with ordinal data because
they are based on ranks.
24
2 Comparing Differences
in a Paired Sample
25
Univariate Statistical Methods:
Differences in a Paired Sample
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE DIFFERENCES IN A PAIRED SAMPLE
Test Mean Difference
Continuous, Normal dist. for
differences
Paired (matched) Samples T-test
Compare Two paired
Proportions
Categorical, Binary
McNemar’s test
Distribution of Differences Ordinal, symmetrical
distributions
Wilcoxon matched paired test
26
Univariate Statistical Methods:
Differences in a Paired Sample
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE DIFFERENCES IN A PAIRED SAMPLE
Test Mean Difference
Continuous, Normal dist. for Paired (matched) Samples Tdifferences
test
Compare Two paired
Proportions
Categorical, Binary
Distribution of Differences Ordinal, symmetrical
distributions
McNemar’s test
Wilcoxon matched paired test
27
Paired samples T-test
paired-samples t-test uses the following test statistic:
t = sample mean (difference) / s.e. of sample mean
(difference)
o Under the Null Hypothesis (H0) we expect to see a mean
difference = 0.
o The larger the sample mean difference (either positive or
negative) the more our data appear to depart from the H0.
o The standard error (=standard deviation / n) of the sample
mean tells us how far from zero we might expect the sample
mean to be under the N
28
Paired samples T-test: Example
Effect of an Method of Inhalation on Urinary Albuterol
Excretion
Hindle et al (Chest, 1995)
• 9 Subjects
• Each inhaled 4x100 mg of
Albuterol Using a MeteredDose Inhaler (MDI) and a Dry
Powder Inhaler (DISK)
• Outcome: 30-min Postinhalation Urinary Albuterol
Excretion (% Inhaled Dose)
• Crossover trial
• Within Patient Variation
Subject Period 1 Period 2
1
MDI
DISK
2
DISK
MDI
3
DISK
MDI
4
MDI
DISK
5
DISK
MDI
6
MDI
DISK
7
MDI
DISK
8
DISK
MDI
9
DISK
MDI
Paired samples T-test: Example:
Difference in Urinary Excretion
Differen
ce
-0.15
2
0.26
0.80
-0.54
3
1.18
0.92
0.26
4
1.32
3.45
-2.13
5
0.37
3.85
-3.48
6
2.18
4.96
-2.78
7
2.62
2.11
0.51
8
0.85
1.97
-1.12
9
1.27
2.47
-1.20
Mean
1.19
2.38
-1.18
5
0.85
4
0.70
3
1
Urinary Excretion by Inhalation Method
2
DISK
1
MDI
Urinary Excretion
Volunte
er
MDI
DISK
Inhalation method
30
Paired samples T-test: Example
o Outcome V.: Urinary Excretion-->Continuous
o Independent V.: 1 factor with two levels, within sbjs: MDI vs.
DISK
paired-samples t-test uses the following test statistic:
t = sample mean (difference) / s.e. of sample mean
(difference)
o Under the Null Hypothesis (H0) we expect to see a mean
difference in urinary excretion close to zero in the sample.
o The larger the sample mean difference (either positive or
negative) the more our data appear to depart from the H0.
o The standard error (=standard deviation / n) of the sample
mean tells us how far from zero we might expect the sample
31
mean to be under the N
Paired samples T-test: Example
Null Hypothesis:
“In the population of volunteers studied urinary excretion of
Albuterol is the same whether delivered with MDI or DISK”
Does our sample mean of –1.18 provide any evidence that this
is not true?
Sample mean = -1.181;
S.E. of sample mean = 0.459
Statistic:
t = -1.181/0.459 = -2.574
Using a computer, or published tables, we obtain
p=0.0329 (two-sided)
i.e. if the H0 were true there would only be a 3.3 % chance that we
would see such a large, or larger, mean change in our sample.
32
What Have We Learned?
• Every subject can be expected to respond
differently.
• DISK does not increase the urinary excretion of
every patient.
• The mean difference in urinary excretion was 1.18 mmol/litre (significantly less on MDI)
• We can not be certain how new volunteers will
respond to the same treatment.
33
Assumptions of paired-samples t-test
• The sample is representative of the
population.
• Measurements in the sample are all
independent of one another.
• Measurements are Normally
distributed in the population.
• If the assumptions of a test are not
met the p-value may not be correct.
34
Univariate Statistical Methods:
Compare Two Paired Proportions
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE DIFFERENCES IN A PAIRED SAMPLE
Test Mean Difference
Continuous, Normal dist. for
differences
Paired (matched) Samples T-test
Compare Two paired
Proportions
Categorical, Binary
McNemar’s test
Distribution of Differences Ordinal, symmetrical
distributions
Wilcoxon matched paired test
35
Binary outcome, Paired samples:
McNemar’s Test
Bronchodialator treatment and deaths from asthma: case-control
study. BMJ 2005
Each patient who
died was matched to
a surviving patient,
and risk factors
looked at  use of β2
Antagonist.
=
•H0 :The prevalence of use of β2 Antagonist is the same among
patients who died and patients who survived
• McNemar’s is based on discordant pairs. Assuming there is no
association there should be as many of each (yes/no) & (no/yes) 36
Binary outcome, Paired samples:
McNemar’s Test
=
Expected Frequency of
Discordant pairs
= (69+45)/2 = 57
S
(O - E )
E
discordant
cells
( 45 - 57)
57
2
+
2
=
( 69 - 57)
57
2
= 5.05
37
Univariate Statistical Methods:
Test Distribution of Differences in Paired Samples
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE DIFFERENCES IN A PAIRED SAMPLE
Test Mean Difference
Continuous, Normal dist. for
differences
Paired (matched) Samples T-test
Compare Two paired
Proportions
Categorical, Binary
McNemar’s test
Distribution of
Differences
Ordinal, symmetrical
distributions
Wilcoxon matched paired test
38
Ordinal Outcome Data:
Wilcoxon Matched Pairs Test
Placebo
Active
0.68
0.61
0.96
1.00
0.85
0.72
0.93
0.81
0.35
0.26
0.77
0.7
0.74
0.61
0.98
1.00
1.00
0.98
0.28
0.16
0.92
0.79
1.00
0.98
0.93
1.00
1.00
0.96
0.45
0.49
0.82
0.74
0.36
0.26
1.00
0.97
0.49
0.43
0.74
0.66
0.68
0.61
39
Sample Mean
Difference
Sign
A-P
Rank
-0.07
0.04
-0.13
-0.12
-0.09
-0.07
-0.13
0.02
-0.02
-0.12
-0.13
-0.02
0.07
-0.04
0.04
-0.08
-0.1
-0.03
-0.06
-0.08
-0.07
-0.056
+
+
+
+
-
10.5
6
20
17.5
15
10.5
20
2
2
17.5
20
2
10.5
6
6
13.5
16
4
8
13.5
10.5
CROSS-OVER TRIAL:
•Measure of Treatment Effect on
proportion of days with headache
•H0=The distribution of differences
is symmetrical about zero  test
based on probability ditribution of
ranks of the differences
•If the null hypothesis is true a plus
is as likely as a minus
Probability(A>P)=
Probability(P>A)=0.5
Ordinal Outcome Data:
Wilcoxon Matched Pairs Test
-
We rank the differences ignoring the
sign
-
We sum the ranks of the positive
and negative differences separately
(excluding 0): T+ & T If the NH were true, we would
expect the two rank sums to be
about the same
The test statistic, T, is the lesser
of the sums. T+=206.5, T-=24.5.
The smaller the T, the lower the
probability of the data arising by
chance.
40
3 Compare Several
Independent Samples
41
Univariate Statistical Methods:
Compare Several Independent Samples
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE SEVERAL INDEPENDENT SAMPLES
Compare Several means
Continuous, Normal dist.,
Same Variance
Compare Time to an Event Survival
in Several Groups
One-way Analysis of Variance
(ANOVA)
Logrank Test
42
Univariate Statistical Methods:
Compare Several Means
Design or Aim of Study
Type of Outcome
Data/Assumptions
Statistical Method
COMPARE SEVERAL INDEPENDENT SAMPLES
Compare Several means
Continuous, Normal dist.,
Same Variance
Compare Time to an Event Survival
in Several Groups
One-way Analysis of Variance
(ANOVA)
Logrank Test
43
Compare Several Means: ANOVA
• An extension of t-test to compare 3 or more independent
groups.
• H0: The samples for each group com from populations with
same mean values.
• It provides one p value comparing all groups. Only if that is
significant further contrasts are justified.
• ANOVA is based on partitioning variability :
 Between Group Variance: Variability (differences)
between the groups
 Residual Variance: Remaining variability due to within
group differences.
44
ANOVA: The F ratio
Statistic: The F
ratio of the two
variances  if the
groups are truly
different
Between-Group
variability should
be greater than
the residual
(Within-Group).
“Visual ANOVA" from the Wolfram Demonstrations Project
http://demonstrations.wolfram.com/VisualANOVA/
45
ANOVA: The F ratio
Hi BW
Low Within
Not all groups
differ!
46
ANOVA: The F ratio
Hi BW
Hi Within
47
ANOVA: Example
Use of RAB 753 in psoriasis – a multi centre Phase 2 clinical trial
Protocol: A double blind randomised
placebo controlled parallel group study of 2
concentrations of topical RSB 753 (.5 and
1%) in patients with moderate to severe
psoriasis.
Primary Outcome Measure: PASI score (4
weeks score on table).
Treatment Groups: Placebo, .5% and 1%.
N=24 per group
Placebo
4.2
4
4.3
4.5
4.2
4.8
4.8
4
4.8
4.6
4.7
3.8
4.7
4.6
4.6
4.3
4.6
4
4.3
4.7
4.8
4.9
4
4
0.50%
4
3.9
3.9
3.3
3.9
3.5
4
4.4
3.6
4.4
4
3.7
3.5
4.7
4
3.5
4
3.6
4.4
4
4
3.5
3.6
3
1%
3.2
3.2
3.6
4.1
3.8
3.1
2.9
4.8
4.9
3.4
3.5
2.8
2.7
4.8
3.2
3.2
3.7
4.5
3
2.8
4.8
3
2.8
4.8
48
ANOVA: Example
49
ANOVA: Example
Group Means are
different overall.
5 Conclusions
• The Design and Type of Data determine the proper
statistical test to be used.
• Statistical analysis must be guided by hypothesis.
It’s not a fishing expedition Beware of Type I
error.
• Specify clearly for your statistician:
1. Design
2. Hypothesis
3. Variables: type and scale of measurement
51