PhD course in Basic Biostatistics – Day 2
Erik Parner, Department of Biostatistics, Aarhus University©
Exercise 1.2+1.4 (Triglyceride)
Logarithms and exponentials
Two independent samples from normal distributions
The model, check of the model, estimation
Comparing the two means
Approximate confidence interval and test
Exact confidence interval and test using the t-distribution
Comparing two populations using a non-parametric test
The Wilcoxon-Mann-Whitney test
Type 1 and type 2 errors
Statistical power
Simple sample size calculations
Basic Biostatistics - Day 2
Overview
Data to analyse   Type of analysis                Unpaired/Paired   Type              Day
Continuous        One sample mean                 Irrelevant        Parametric        Day 1
                                                                    Nonparametric     Day 3
                  Two sample mean                 Non-paired        Parametric        Day 2
                                                                    Nonparametric     Day 2
                                                  Paired            Parametric        Day 3
                                                                    Nonparametric     Day 3
                  Regression                      Non-paired        Parametric        Day 5
                  Several means                   Non-paired        Parametric        Day 6
                                                                    Nonparametric     Day 6
Binary            One sample mean                 Irrelevant        Parametric        Day 4
                  Two sample mean                 Non-paired        Parametric        Day 4
                                                  Paired            Parametric        Day 4
                  Regression                      Non-paired        Parametric        Day 7
Time to event     One sample: Cumulative risk     Irrelevant        Nonparametric     Day 8
                  Regression: Rate/hazard ratio   Non-paired        Semi-parametric   Day 8
Exercise 1.2+1.4 (Triglyceride)
Assuming that the triglyceride measurements follow a normal
distribution gave invalid results: e.g. the prediction interval (PI)
did not have 2.5% of the observations below the lower limit and
2.5% above the upper limit.
The triglyceride measurements may, however, be analyzed using a normal
model on the log-transformed data.
We then need to transform the results back to the
original scale to obtain useful results on the triglyceride
measurements.
The method presented on the next overheads relies on
the fact that percentiles are preserved when the data are
transformed by an increasing function such as log or exp.
Exercise 1.2+1.4 (Triglyceride)
[Figure, panel 1 (axis: ln trigly): 95% PI (-1.54;-0.01) and mean -0.77 (95% CI -0.81;-0.74).
Panel 2, after applying exp (axis: trigly): 95% PI (0.21;0.99) and median 0.46 (95% CI 0.44;0.48).]
Logarithmic and exponential transformations
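Medians and percentiles are preserved when the data are
transformed by an increasing function such as log or exp:
$\exp(X) \le \exp(A) \iff X \le A \iff \log(X) \le \log(A)$ (the last for $X, A > 0$)
[Figure: a distribution and its transformation by exp (and back by log); a point with 50% of the distribution to its right, and a point with 16% to its right, keep these fractions on both scales.]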
Logarithmic and exponential transformations
The basic properties of the logarithms and exponentials
that we will use throughout the course
(log turns products into sums; exp turns sums into products):
$\log(a \cdot b) = \log(a) + \log(b)$
$\log(a / b) = \log(a) - \log(b)$
$\exp(a + b) = \exp(a) \cdot \exp(b)$
$\exp(a - b) = \exp(a) / \exp(b)$
$\log(a^b) = b \cdot \log(a)$
$\exp(a \cdot b) = \exp(a)^b = \exp(b)^a$
Logarithms and the normal distribution
Assume Y is the measurement and that log(Y) = X follows a
normal distribution with mean = median = μ and standard
deviation = σ. Then Y = exp(X) has:
$\mathrm{median}(Y) = \exp(\mu)$
$\mathrm{mean}(Y) = \exp(\mu + 0.5\,\sigma^2)$
$\mathrm{sd}(Y) = \mathrm{mean}(Y) \cdot \sqrt{\exp(\sigma^2) - 1}$
$\mathrm{cv}(Y) = \frac{\mathrm{sd}(Y)}{\mathrm{mean}(Y)} = \sqrt{\exp(\sigma^2) - 1}$
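The formulas above can be checked numerically. Below is a minimal Python sketch (Python is our choice here; the course itself uses Stata). The function name `lognormal_summary` and the input values μ = 0, σ = 0.5 are hypothetical, for illustration only:

```python
import math


def lognormal_summary(mu: float, sigma: float) -> dict:
    """Median, mean, sd and cv of Y = exp(X) when X ~ N(mu, sigma^2)."""
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    cv = math.sqrt(math.exp(sigma ** 2) - 1.0)  # cv does not depend on mu
    sd = mean * cv
    return {"median": median, "mean": mean, "sd": sd, "cv": cv}


# Illustrative (made-up) log-scale parameters: mu = 0, sigma = 0.5
print(lognormal_summary(0.0, 0.5))
```

For any σ > 0 the mean of Y exceeds its median, reflecting the right skew of Y.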
Logarithm and the normal distribution
If X has a normal distribution with mean = median = μ and
standard deviation = σ, then
• a valid 95% CI for μ will transform into
  a valid 95% CI for the median of Y = exp(X)
• a valid 95% PI for X will transform into
  a valid 95% PI for Y = exp(X)
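Back-transforming an interval amounts to applying exp to each endpoint. Here is a small Python sketch that uses the log-scale values read off the triglyceride figure; the helper name `back_transform` is hypothetical:

```python
import math


def back_transform(interval):
    """Apply exp to both endpoints of an interval computed on the log scale."""
    lower, upper = interval
    return (math.exp(lower), math.exp(upper))


# Log-scale values from the triglyceride figure
mean_log = -0.77               # estimated mean of ln(triglyceride)
ci_mean_log = (-0.81, -0.74)   # 95% CI for the mean of ln(triglyceride)
pi_log = (-1.54, -0.01)        # 95% PI for ln(triglyceride)

ci_median = back_transform(ci_mean_log)  # 95% CI for the median triglyceride
pi = back_transform(pi_log)              # 95% PI for triglyceride
print(f"median {math.exp(mean_log):.2f} ({ci_median[0]:.2f};{ci_median[1]:.2f})")
print(f"PI ({pi[0]:.2f};{pi[1]:.2f})")
```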
Body temperature versus gender
Scientific question: Do the two genders have different normal
body temperatures?
Design: 130 participants were randomly sampled, 65 males and
65 females
Data: Measured temperature, gender
Summary of the data (the units are degrees Celsius):
------------------------------------------------------
  Gender | N(tempC)  mean(tempC)  sd(tempC)  med(tempC)
---------+--------------------------------------------
    Male |       65     36.72615   .3882158        36.7
  Female |       65     36.88923   .4127359        36.9
------------------------------------------------------
Body temperature: Plotting the data
[Figure 2.1: body temperature, Temperature (C) from 35.5 to 38, plotted by gender (Male, Female), two panels.]
The data look “fine”; perhaps a few outliers among the females?
Body temperature: Checking the normality in each group
[Figure 2.2: for each gender (Male, Female), a density plot of the temperatures (y-axis: Density) and a normal Q-Q plot (temperature against Inverse Normal); graphs by Gender.]
Normality looks ok!
Body temperature: The model
A statistical model:
Two independent samples from normal distributions, i.e.
• the two samples are independent
and
• each is assumed to be a random sample from a normal
distribution:
1. The observations are independent (knowing one
observation will not alter the distribution of the
others).
2. The observations come from the same distribution, e.g.
they all have the same mean and variance.
3. This distribution is a normal distribution with unknown
mean, μi, and standard deviation, σi: N(μi, σi²).
Body temperature: Checking the assumptions
The first two – think about how data was collected!
1. Independence between groups: the two groups contain information on
different individuals.
Independence within groups: the data are from different
individuals, so the assumption is probably ok.
2. In each group: the observations come from the same
distribution. Here we can only speculate.
Does body temperature depend on known factors
of interest, for example heart rate, time of day, etc.?
Body temperature: The estimates
The estimates are found as on day 1:
$\hat\mu_M = 36.73\ (36.63;36.82), \quad \hat\sigma_M = 0.388, \quad \mathrm{sem}(\hat\mu_M) = 0.048$
$\hat\mu_F = 36.89\ (36.79;36.99), \quad \hat\sigma_F = 0.413, \quad \mathrm{sem}(\hat\mu_F) = 0.051$
Observe that the width of the prediction interval is
approximately
2 × 1.96 × 0.4 °C ≈ 1.6 °C,
so there is a large variation in body temperature between
individuals within each of the two groups.
We see that the average body temperature is higher among
women.
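The standard errors and the rough width of the PI above follow from sem = sd/√n and 2 × 1.96 × sd. Here is a quick Python check using the unrounded values from the summary table (illustrative only; the variable names are our own):

```python
import math

# Summary statistics from the body temperature table (degrees Celsius)
n = 65
sd_male, sd_female = 0.3882158, 0.4127359

# Standard error of a mean: sd / sqrt(n)
sem_male = sd_male / math.sqrt(n)
sem_female = sd_female / math.sqrt(n)

# Approximate width of a 95% prediction interval: 2 * 1.96 * sd (sd about 0.4)
pi_width = 2 * 1.96 * 0.4

print(f"sem(male) = {sem_male:.3f}, sem(female) = {sem_female:.3f}")
print(f"approx. PI width = {pi_width:.2f} C")
```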
Body temperature: Estimating the difference
Remember that the focus is on the difference between the two
groups, i.e. we are interested in
$\delta = \mu_F - \mu_M$,
the unknown difference in mean body temperature.
This is of course estimated by
$\hat\delta = \hat\mu_F - \hat\mu_M = 36.89 - 36.73 = 0.16$
What about the precision of this estimate?
What is the standard error of a difference?
The standard error of a difference
If we have two independent estimates and, as here,
calculate their difference, then the standard error of
the difference is given by
$\mathrm{se}(\hat\delta) = \mathrm{se}(\hat\mu_F - \hat\mu_M) = \sqrt{\mathrm{se}(\hat\mu_F)^2 + \mathrm{se}(\hat\mu_M)^2}$
We note that the standard error of a difference between
two independent estimates is larger than both of the
two standard errors.
In the body temperature data we get:
$\mathrm{se}(\hat\delta) = \sqrt{0.048^2 + 0.051^2} = 0.070$
and an approximate 95% CI:
$\hat\delta \pm 1.96 \cdot \mathrm{se}(\hat\delta) = 0.163 \pm 1.96 \cdot 0.070 = (0.025;0.301)$
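Here is the same arithmetic in a short Python sketch. It uses the unrounded means and standard errors from the Stata output, and the variable names are our own:

```python
import math

mean_male, mean_female = 36.72615, 36.88923
sem_male, sem_female = 0.0481522, 0.0511936

diff = mean_female - mean_male                    # estimated difference
se_diff = math.sqrt(sem_female**2 + sem_male**2)  # se of a difference of independent estimates
lower, upper = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"diff = {diff:.3f}, se = {se_diff:.3f}, approx. 95% CI ({lower:.3f};{upper:.3f})")
```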
Testing no difference in means
$\hat\delta = 0.163\ (0.025;0.301), \quad \mathrm{se}(\hat\delta) = 0.070$
Here we are especially interested in the hypothesis that
body temperature is the same for the two genders:
Hypothesis: $\delta = \delta_0 = 0$
We can make an approximate test similar to day 1:
$z_{obs} = \frac{\hat\delta - \delta_0}{\mathrm{se}(\hat\delta)} = \frac{\hat\delta - 0}{\mathrm{se}(\hat\delta)} = \frac{0.163 - 0}{0.070} = 2.32$
and find the p-value as
$2 \cdot \Pr(\text{standard normal} \ge |z_{obs}|)$
We get p = 2.03%.
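The approximate z-test can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch that assumes the unrounded estimate and standard error from the Stata output:

```python
from statistics import NormalDist

diff, se_diff = 0.1630766, 0.070281  # unrounded estimate and se (Stata output)
delta0 = 0.0                         # hypothesised difference

z_obs = (diff - delta0) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(f"z = {z_obs:.2f}, p = {100 * p_value:.2f}%")
```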
Exact inference for two independent normal samples
Just like in the one sample setting, it is possible to make
exact inference – based on the t-distribution.
And again, these are easily computed by a computer.
Remember the model: two independent samples from
normal distributions with means and standard deviations
$(\mu_M, \sigma_M)$ and $(\mu_F, \sigma_F)$.
Note that both the means and the standard deviations might
be different in the two populations.
If one wants to make exact inference, then one has to
make the additional assumption:
4. The standard deviations are the same: $\sigma_M = \sigma_F$
Exact inference for two independent normal samples
Testing the hypothesis : sM  sF
This is done by considering the ratio between the two
estimated standard deviations:
Fobs
 Largest observed standard deviation 


Smallest
observed
standard
deviation


2
A large value of this F-ratio is critical for the hypothesis
The p-value = the probability of observing a F-ratio at least
as large as we have observed - given the hypothesis is true!
The p-value is here found by using an F-distribution with
(nlargest-1) and (nsmallest-1) degrees of freedom:

p  value  2  Pr F  nlargest  1; nsmallest  1  Fobs
Exact inference for two independent normal samples
Testing the hypothesis: $\sigma_M = \sigma_F$
Here we have:
$n_F = 65, \quad \hat\sigma_F = 0.413$
$n_M = 65, \quad \hat\sigma_M = 0.388$
so
$F_{obs} = \left(\frac{0.413}{0.388}\right)^2 = 1.063^2 = 1.13$ (computed from the unrounded standard deviations)
The observed variance (sd²) is 13% higher among women.
But could this be explained by sampling variation?
What is the p-value?
To find the p-value we consult an F-distribution with
64 = (65 − 1) and 64 = (65 − 1) degrees of freedom.
We get p-value = 63%.
The difference in the observed standard deviations can be
explained by sampling variation.
We accept that $\sigma_M = \sigma_F$! The fourth assumption is ok!
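Python's standard library has no F-distribution, so this sketch computes the p-value through the regularized incomplete beta function, using the identity Pr(F(d1, d2) ≥ f) = I_{d2/(d2 + d1·f)}(d2/2, d1/2). The routine `betainc` is our own implementation of a textbook continued-fraction algorithm (Lentz's method), not a library call. In practice Stata's `sdtest` does this for you:

```python
import math


def betainc(a, b, x, max_iter=500, eps=1e-14):
    """Regularized incomplete beta function I_x(a, b), via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # The continued fraction converges fast only below this point; use symmetry.
        return 1.0 - betainc(b, a, 1.0 - x, max_iter, eps)
    tiny = 1e-300
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction (Lentz's method)
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return math.exp(log_front) * h / a


def f_upper_tail(f, df1, df2):
    """Pr(F(df1, df2) >= f)."""
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Unrounded sds from the Stata output; the larger sd goes in the numerator
sd_female, sd_male = 0.4127359, 0.3882158
n_female, n_male = 65, 65

f_obs = (sd_female / sd_male) ** 2
p_value = min(1.0, 2 * f_upper_tail(f_obs, n_female - 1, n_male - 1))
print(f"F = {f_obs:.2f}, p = {100 * p_value:.0f}%")
```

With equal degrees of freedom, as here, Pr(F ≥ 1.13) equals Stata's Pr(F < 0.8847), because 0.8847 is the same ratio taken the other way round.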
Exact inference for two independent normal samples
We now have a common standard deviation:
$\sigma = \sigma_F = \sigma_M$
This is estimated as a “weighted” average
(this is not found in the Stata output):
$\hat\sigma = \sqrt{\frac{\hat\sigma_F^2 (n_F - 1) + \hat\sigma_M^2 (n_M - 1)}{(n_F - 1) + (n_M - 1)}} = \sqrt{\frac{0.413^2 (65 - 1) + 0.388^2 (65 - 1)}{(65 - 1) + (65 - 1)}} = 0.401$
Based on this we can calculate a revised/updated standard
error of the difference:
$\mathrm{se}(\hat\delta) = \hat\sigma \sqrt{\frac{1}{n_F} + \frac{1}{n_M}} = 0.401 \cdot \sqrt{\frac{1}{65} + \frac{1}{65}} = 0.070$
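Here is the pooled estimate in a few lines of Python. This is an illustrative sketch that uses the unrounded sds from the Stata output:

```python
import math

n_f, n_m = 65, 65
sd_f, sd_m = 0.4127359, 0.3882158

# Pooled variance: sd^2 weighted by the degrees of freedom in each group
pooled_var = (sd_f**2 * (n_f - 1) + sd_m**2 * (n_m - 1)) / ((n_f - 1) + (n_m - 1))
pooled_sd = math.sqrt(pooled_var)
se_diff = pooled_sd * math.sqrt(1 / n_f + 1 / n_m)
print(f"pooled sd = {pooled_sd:.3f}, se(diff) = {se_diff:.3f}")
```

With equal group sizes the pooled and the unpooled standard errors coincide exactly. This is why both approaches give 0.070 here.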
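Exact inference for two independent normal samples
$\hat\delta = 0.163, \quad \mathrm{se}(\hat\delta) = 0.070$
Exact confidence intervals and p-values are found by using
a t-distribution with $n_M + n_F - 2 = 65 + 65 - 2 = 128$ d.f.
$\hat\delta \pm t_{0.975} \cdot \mathrm{se}(\hat\delta) = 0.163 \pm 1.98 \cdot 0.070 = (0.024;0.302)$
(here $t_{0.975} = 1.98$ is the 97.5% percentile of the t-distribution with 128 d.f.)
And the exact test:
H: $\delta = 0$
$t_{obs} = \frac{\hat\delta - 0}{\mathrm{se}(\hat\delta)} = \frac{0.163}{0.070} = 2.32$
and find the p-value as
$2 \cdot \Pr(\text{t-distribution} \ge |t_{obs}|)$
We get p = 2.2% (from Stata; with 128 d.f. a table of the standard
normal distribution gives almost the same value).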
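Here is a sketch of the exact calculation in Python. The standard library has no t-distribution, so the sketch reuses the same hand-written incomplete beta routine (our own implementation, not a library API) with Pr(|T| ≥ |t|) = I_{df/(df + t²)}(df/2, 1/2). It then finds t_{0.975} by bisection:

```python
import math


def betainc(a, b, x, max_iter=500, eps=1e-14):
    """Regularized incomplete beta function I_x(a, b), via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x, max_iter, eps)  # symmetry relation
    tiny = 1e-300
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return math.exp(log_front) * h / a


def t_two_sided_p(t, df):
    """Pr(|T| >= |t|) for T following a t-distribution with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))


def t_quantile(prob, df):
    """Quantile of the t-distribution for 0.5 <= prob < 1, by bisection on the cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - t_two_sided_p(mid, df) / 2.0 < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


diff, se_diff = 0.1630766, 0.070281  # unrounded values from the Stata output
df = 65 + 65 - 2

t975 = t_quantile(0.975, df)
lower, upper = diff - t975 * se_diff, diff + t975 * se_diff
t_obs = diff / se_diff
p_value = t_two_sided_p(t_obs, df)
print(f"t_0.975({df}) = {t975:.2f}, 95% CI ({lower:.3f};{upper:.3f})")
print(f"t = {t_obs:.2f}, p = {100 * p_value:.1f}%")
```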
Stata: two-sample normal analysis
The F-test and t-test are easily done in Stata (more details
can be found in the file day2.do).
. cd "D:\Teaching\BasalBiostat\Lectures\Day2"
D:\Teaching\BasalBiostat\Lectures\Day2
. use normtemp.dta, clear
. * Checking the normality.
. qnorm tempC if sex==1, title("Male") name(plot2, replace)
. qnorm tempC if sex==2, title("Female") name(plot3, replace)
. graph combine plot2 plot3, name(plotright, replace) col(1)
. sdtest tempC, by(sex)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |      65    36.72615    .0481522    .3882158    36.62996    36.82235
  Female |      65    36.88923    .0511936    .4127359    36.78696     36.9915
---------+--------------------------------------------------------------------
combined |     130    36.80769    .0357326    .4074148    36.73699    36.87839
------------------------------------------------------------------------------
    ratio = sd(Male) / sd(Female)                                 f =   0.8847
Ho: ratio = 1                                    degrees of freedom =   64, 64

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.3128         2*Pr(F < f) = 0.6256           Pr(F > f) = 0.6872
. ttest tempC, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    Male |      65    36.72615    .0481522    .3882158    36.62996    36.82235
  Female |      65    36.88923    .0511936    .4127359    36.78696     36.9915
---------+--------------------------------------------------------------------
combined |     130    36.80769    .0357326    .4074148    36.73699    36.87839
---------+--------------------------------------------------------------------
    diff |           -.1630766     .070281               -.3021396   -.0240136
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -2.3204
Ho: diff = 0                                     degrees of freedom =      128

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0110         Pr(|T| > |t|) = 0.0219          Pr(T > t) = 0.9890
Exact inference for two independent normal samples
What if you reject the hypothesis of the same sd in the
two groups?
1. This indicates that the variation in the two groups differs!
Think about why!
2. Often it is because the assumption of
normality is not satisfied. Maybe you would do better by
performing the statistical analysis on another scale, e.g. log.
3. If you still want to compare the means on the original
scale, you can make approximate inference based on the
t-distribution (e.g. ttest tempC, by(sex) unequal).
4. If you only want to test the hypothesis that the two
distributions are located in the same place, then you can use
the non-parametric Wilcoxon-Mann-Whitney test (see
later).
Body temperature example - formulations
Methods:
Data were analyzed as two independent samples from normal
distributions, using Student's t-test. The assumption of
normality was checked by Q-Q plots. Estimates are given with
95% confidence intervals.
Results:
The mean body temperature was 36.9 (36.8;37.0) °C among
women compared to 36.7 (36.6;36.8) °C among men. The mean
was 0.16 (0.02;0.30) °C higher among females, and this difference was
statistically significant (p = 2.2%).
Conclusion:
Based on this study we conclude that women have a small, but
statistically significantly higher mean body temperature than
men.
Example 7.2 Birth weight and heavy smoking
Scientific question: Do the smoking habits of the mother
influence the birth weight of the child?
Design and data (observational): The birth weights (kg) of
children born to 14 heavy smokers and 15 non-smokers were
recorded.
Summary of the data (the units are kg):
------------------------------------------------------------------------
   Group |  Obs    Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------
Non-smok |   15   3.627      .0925       .3584       3.428      3.825
Heavy sm |   14   3.174      .1238       .4631       2.907      3.442
------------------------------------------------------------------------
Already here we observe that the average birth weight is
smallest among heavy smokers: difference = 452 g.
Example 7.2 Birth weight and heavy smoking
Plot the data!!!!!!
[Figure: birth weight (kg, 2.5 to 4.5) by smoking habits (Non-smoker, Heavy smoker), two panels.]
Example 7.2 Birth weight and heavy smoking
[Figure: for non-smokers and heavy smokers, a density plot of birth weight and a normal Q-Q plot (birth weight against Inverse Normal); graphs by Smoking habits.]
Independence, same distribution and normality seem ok.
Example 7.2 Birth weight and heavy smoking
Exact inference
Compare the standard deviations (using the computer):
$F_{obs} = \left(\frac{0.4631}{0.3584}\right)^2 = 1.67, \quad p = 35\%$ from F(13,14)
We accept that the two standard deviations are identical,
and again, by computer, we get:
Difference in mean birth weight: 0.452 (0.138;0.767) kg
Hypothesis: no difference in mean birth weight, p = 0.06%
Conclusion of the test:
If there were no difference between the two groups, then it
would be almost impossible to observe a difference as large
as the one we have seen; hence we reject the hypothesis!
The birth weight example - formulations
Methods - like the body temperature example:
Data ……intervals.
Results:
The mean birth weight was 3.627 (3.428;3.825) kg among non-smokers
compared to 3.174 (2.907;3.442) kg among heavy
smokers. The difference, 452 (138;767) g, was statistically
significant (p = 0.06%).
Conclusion:
Children born to heavy smokers have a birth weight that is
statistically significantly lower than that of children born
to non-smokers. The study has only limited information on
the precise size of the association.
Furthermore, we have not studied the implications of the
difference in birth weight, or whether the difference could
be explained by other factors, like eating habits……
Non-Parametric test: Wilcoxon-Mann-Whitney test
Until now we have only made statistical inference based on a
parametric model.
E.g. we have focused on estimating the difference between
two groups and supplying the estimate with a confidence
interval.
We have also performed a statistical test of no difference
based on the estimate and the standard error – a parametric
test.
There are other types of tests, non-parametric tests,
that are not based on a parametric model.
These tests are also based on models, but they are not
parametric models.
We will here look at the Wilcoxon-Mann-Whitney test,
which is the non-parametric analogue of the two-sample t-test.
Non-Parametric test: Wilcoxon-Mann-Whitney test
The key feature of the non-parametric tests considered here is that they are
based on the ranks of the data and not the actual values.
      Heavy smokers              Non-smokers
  Birth weight   Rank       Birth weight   Rank
     2.340          1          2.710          3
     2.380          2          3.310         10
     2.740          4          3.360         11
     2.860          5          3.410         12
     2.900          6          3.510         14
     3.180          7          3.540         16
     3.230          8          3.600         17.5
     3.270          9          3.610         19
     3.420         13          3.700         23
     3.530         15          3.730         24
     3.600         17.5        3.830         25
     3.650         20.5        3.890         26
     3.650         20.5        3.990         27
     3.690         22          4.080         28
                               4.130         29
The smallest observation gets rank 1. Tied observations share the
average of their ranks: the two values 3.600 are numbers 17 and 18
in the ordering, so both get rank 17.5.
Non-Parametric test: Wilcoxon-Mann-Whitney test
We can now add up the ranks in one of the groups, here the heavy
smokers:
Heavy smokers' observed rank sum = 150.5
Hypothesis: The birth weights among heavy smokers and
non-smokers have the same distribution.
Assuming the hypothesis is true, one can calculate the
expected rank sum among the heavy smokers and the standard
error of the observed rank sum, and then calculate a test
statistic:
$z_{obs} = \frac{\text{observed rank sum} - \text{expected rank sum}}{\mathrm{se}(\text{observed rank sum})} = \frac{150.5 - 210}{22.91} = -2.597$
The p-value is found as
$2 \cdot \Pr(\text{standard normal} \ge |z_{obs}|)$
P-value = 0.9%
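The expected rank sum is n1(N + 1)/2. The variance is n1·n2·(N + 1)/12, reduced by the usual tie adjustment n1·n2·Σ(t³ − t)/(12·N·(N − 1)), where t is the size of each group of tied values. All three can be reproduced from the data in the table. Here is a Python sketch; the helper `midranks` is hypothetical, written for this illustration:

```python
import math
from collections import Counter
from statistics import NormalDist

# Birth weights (kg) from the rank table
heavy = [2.340, 2.380, 2.740, 2.860, 2.900, 3.180, 3.230,
         3.270, 3.420, 3.530, 3.600, 3.650, 3.650, 3.690]
non = [2.710, 3.310, 3.360, 3.410, 3.510, 3.540, 3.600, 3.610,
       3.700, 3.730, 3.830, 3.890, 3.990, 4.080, 4.130]


def midranks(values):
    """Map each distinct value to its rank; tied values share the average rank."""
    ordered = sorted(values)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1          # first (1-based) position
        count = ordered.count(value)
        ranks[value] = first + (count - 1) / 2.0  # average of the tied positions
    return ranks


combined = heavy + non
ranks = midranks(combined)
n1, n2 = len(heavy), len(non)
N = n1 + n2

observed = sum(ranks[v] for v in heavy)
expected = n1 * (N + 1) / 2
tie_term = sum(t ** 3 - t for t in Counter(combined).values())
variance = n1 * n2 * (N + 1) / 12 - n1 * n2 * tie_term / (12 * N * (N - 1))
z_obs = (observed - expected) / math.sqrt(variance)
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
print(f"rank sum = {observed}, expected = {expected}, z = {z_obs:.3f}, p = {p_value:.4f}")
```

The sign of z depends only on which group is summed. Stata reports +2.597 because its z refers to the first group listed (the non-smokers).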
Non-Parametric test: Wilcoxon-Mann-Whitney test
We saw that the rank sum among heavy smokers was smaller
than expected if there were no true difference between the
two groups.
So small that we would observe such a discrepancy in only about one out
of 100 studies like this (p-value = 0.9%).
We reject the hypothesis!
Conclusion
Children born to heavy smokers have a statistically
significantly lower birth weight than children born to non-smokers.
Remember that this depends on the sample size, the design, the
statistical analysis...
Non-Parametric test: Wilcoxon-Mann-Whitney test
Some comments:
• There are two assumptions behind the test:
  1. Independence between and within the groups.
  2. Within each group: the observations come from the
     same distribution, e.g. they all have the same mean
     and variance.
• The test is designed to detect a shift in location in the
two populations and not, for example, a difference in the
variation in the two populations.
• You will only get a p-value; the possible difference in
location is not quantified by an estimate with a
confidence interval.
• As a test it is just as valid as the t-test!
Stata: Wilcoxon-Mann-Whitney test
. use bwsmoking.dta,clear
(Birth weight (kg) of 29 babies born to 14 heavy smokers and 15 non-smokers)

. ranksum bw, by(group)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

       group |      obs    rank sum    expected
-------------+---------------------------------
  Non-smoker |       15       284.5         225
Heavy smoker |       14       150.5         210
-------------+---------------------------------
    combined |       29         435         435

unadjusted variance      525.00
adjustment for ties       -0.26
                      ---------
adjusted variance        524.74

Ho: bw(group==Non-smoker) = bw(group==Heavy smoker)
             z =   2.597
    Prob > |z| =   0.0094
Type 1 and type 2 errors
Here we return to the simple interpretation of a
statistical test:
We test a hypothesis: $\delta = \delta_0$
We will make a
Type 1 error if we reject the hypothesis when it is true.
Type 2 error if we accept the hypothesis when it is false.
If we use a specific significance level, α (typically 5%), then
we know:
$\Pr(\text{reject } \delta = \delta_0 \text{ given it is true}) = \Pr(\text{reject } \delta = \delta_0 \text{ given } \delta = \delta_0) = \alpha$
The risk of a Type 1 error = α
Type 1 and type 2 errors
What about the risk of a Type 2 error?
$\beta = \Pr(\text{accept } \delta = \delta_0 \text{ given it is not true}) = \Pr(\text{accept } \delta = \delta_0 \text{ given } \delta \ne \delta_0) = ?$
This will depend on several things:
1. The statistical model and test we will be using.
2. The true value of δ.
3. The precision of the estimate:
what are the sample size and standard deviation?
That is, the risk of a Type 2 error, β, is not constant.
Often we consider the statistical power:
$\Pr(\text{reject } \delta = \delta_0 \text{ given } \delta \ne \delta_0) = 1 - \beta$
Statistical power – planning a study
- testing for no difference
Suppose we are planning a new study of fish oil and its
possible effect on diastolic blood pressure (DBP).
Assume we want to make a randomized trial with two groups
of equal size and we will test the hypothesis of no difference.
We believe that the true difference between the groups in DBP
is 5 mmHg.
Furthermore, we believe that the standard deviation of the
increase in DBP is 9 mmHg.
We plan to include 40 women in each group and analyze the data using
a t-test.
What is the chance that this study will lead to a statistically
significant difference between the two groups, given that the true
difference is 5 mmHg?
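Here is a rough cross-check of the power. It uses the normal approximation instead of the t-distribution, so it should come out slightly above the 69% in the plot. The function `approx_power` is our own illustration, not a Stata command:

```python
import math
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided test of no difference, equal group sizes."""
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # se of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # 1.96 for alpha = 5%
    # Probability that |estimate / se| exceeds z_crit when the true difference is delta
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)


print(f"n = 40: power = {approx_power(5, 9, 40):.0%}")
print(f"n = 69: power = {approx_power(5, 9, 69):.0%}")
```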
Statistical power, when the true difference is 5 and
sd = 7, 8, 9 or 10, and we test the hypothesis of no difference.
[Figure: “True difference = 5 - Test for no difference”; power (%, 0 to 100) against observations in each group (20 to 100), one curve for each of sd = 10, 9, 8 and 7; marked: n = 40, power = 69%.]
Statistical power – planning a study
We plan to include 40 women in each group and analyze the data using
a t-test; the true difference is 5 mmHg and sd = 9 mmHg:
Power = 69%
That is, there is only a 69% chance that such a study will lead
to a statistically significant result, given that the assumptions are
true.
How many women should we include in each group if we want to
have a power of 90%?
Based on the plot we see that approximately 69 women in
each group will lead to a power of 90%.
Statistical power, when the true difference is 5 and
sd = 7, 8, 9 or 10, and we test the hypothesis of no difference.
[Figure: “True difference = 5 - Test for no difference”; power (%, 0 to 100) against observations in each group (20 to 100), one curve for each of sd = 10, 9, 8 and 7; marked: power = 90%, n = 69.]
The power increases with the expected difference
between the groups and decreases with the variation
(standard deviation) within the groups.
[Figure: “True difference = 10 - Test for no difference”; power (%, 0 to 100) against observations in each group (20 to 100), one curve for each of sd = 10, 9, 8 and 7.]
Power: two unpaired normal samples
In general we have five quantities in play:
δ = μ1 − μ2   The true difference between groups
σ             The standard deviation within each group
α             The significance level (typically 5%)
β             The risk of a type 2 error = 1 − the power
n             The sample size in each group
If we know four of these, then we can determine the last.
Typically, we know the first four and want to know the
sample size,
or we know δ, σ, α and n and then we want to know the
power.
Stata: power for two unpaired normal samples
Power calculations are done using the sampsi command:

. sampsi 0 5, sd1(9) sd2(9) alpha(0.05) power(0.90)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9000
            m1 =        0
            m2 =        5
           sd1 =        9
           sd2 =        9
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =       69
            n2 =       69

* In Stata 13
* power twomeans 0 5, sd(9) alpha(0.05) power(0.90)
By hand: power for two unpaired normal samples
If the sample size is not too small, then it can be found by
hand using the formula:
$n = 2 \cdot \left(\frac{\sigma}{\delta}\right)^2 \cdot f(\alpha, \beta)$
Risk of type 2 error, β:   50%   20%   10%    5%
Statistical power:         50%   80%   90%   95%
f(5%, β):                  3.8   7.8   10.5  13.0
If we assume α = 5%, δ = 5, σ = 9 and β = 10%,
then $n = 2 \cdot \left(\frac{9}{5}\right)^2 \cdot f(5\%, 10\%) = 2 \cdot 1.8^2 \cdot 10.5 \approx 68$
Comments on sample size calculations
• Most often done by computer (in Stata: sampsi).
• There are many different formulas; see Kirkwood & Sterne,
Table 35.1. We will only look at a few in this course.
• It is in general more relevant to test that the difference is
larger than a specified value:
a so-called superiority or non-inferiority study.
• Or to plan the study so that it is expected to yield a
confidence interval with a certain width.
• You need to specify the (expected) true difference, and you must have an
idea of the variation within the groups. The latter you might
find based on hospital records or in the literature.
• Sample size calculations after the study has been carried out
(post hoc) are nonsense!!
The confidence interval will show how much information you
have in the study.