Principles of sample size calculation
Jonathan Cook
(with thanks to Doug Altman)
Centre for Statistics in Medicine,
NDORMS, University of Oxford
EQUATOR – OUCAGS training course
24 October 2015
Outline
• Principles of study design
• Principles of study sample size calculation
• How to determine the sample size
• How to calculate in practice
• Summary
Study design – general principles
• Play of chance could mislead
  – The more subtle a question, the more precise we need to be in the evaluation (i.e. more information, that is, more data)
• We need to be clear about our question
  – What exactly are we interested in?
  – How precisely do we want to know it?
• Study (including sample size) should be fit for purpose
  – Relevant
  – Sufficient for the intended analysis
Study size – how big?
• Fundamental aspect of study design
  – How many participants are needed?
• Ethically and scientifically important
  – Legitimate experimentation
  – Add to knowledge
• Impact upon study conduct (e.g. 100 versus 2000)
  – Management of project
  – Timeframe
  – Cost
Principles of sample size calculation
• Aim
  – We wish to compare the outcome between the treatments and determine if there is a difference between them
• Typical approach for an RCT sample size calculation
  – Choose the key (primary) outcome and base the calculation on it
  – Get a large enough sample size to have reassurance that we will be able to detect a meaningful difference in the primary outcome
• Main alternative approach
  – Seek to estimate a quantity with a given precision
• Same principles apply to all types of study
  – What we are looking for may well differ
Reaching the wrong conclusion (1)
What can go wrong:
• May conclude that there is a difference in outcome between active and control groups, when in fact no such difference exists
• Technically called a Type I error
  – More usefully called a false-positive result
• Probability of making such an error is designated α, commonly known as the significance level
• Risk of a false-positive conclusion (Type I error) does not decrease as the sample size increases
Reaching the wrong conclusion (2)
• May conclude that there is no evidence of a difference in outcomes between active and control groups, when in fact there is such a difference
• Technically called a Type II error
  – More usefully called a false-negative result
• Probability of making such an error is often designated β (1 − β is commonly known as the statistical power)
• Risk of missing an important difference (Type II error) decreases as the sample size increases
Type I and Type II errors

                                 There really is a difference                   There really is no difference
Statistically significant        OK                                             Type I error (false positive)
Statistically non-significant    Type II error (false negative / [1 − power])   OK
How is sample size determined?
• Sample size calculation sets the recruitment target
  – Usually a formula is available
  – Note: analysable data, not participants per se
• Required size is dependent upon:
  – Trial design (e.g. cluster trial)
  – Statistical analysis (e.g. t-test)
  – Statistical parameters (e.g. sig. level and power)
  – Difference we desire to detect (i.e. δ)
• Some inputs have conventions, some don't
  – Educated guess sometimes needed
Typically, what do we need for a standard RCT calculation?

Binary outcome:
1. Anticipated control and intervention group rates (implies % target difference)
2. Significance level (α)
3. Power (1 − β)

Continuous outcome – either 1, 2, 4 & 5, or 3-5:
1. Anticipated mean in each group (or more simply the target mean difference)
2. Anticipated standard deviation
3. Mean diff/SD (often called "effect size")
4. Significance level (α)
5. Power (1 − β)

More complicated study designs/statistical analyses require more inputs and may be framed differently (a minimal worked sketch of the continuous case follows below).
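To show how the continuous-outcome inputs listed above feed into a calculation, here is a minimal sketch using the standard normal-approximation formula, n per group = 2*((z_(1-alpha/2) + z_(1-beta)) / (mean difference / SD))^2. The function name and the use of scipy are illustrative assumptions, not part of the slides; exact (e.g. t-test-based) calculations give slightly larger numbers.

```python
# Minimal illustrative sketch (not from the slides): sample size per group for
# comparing two means with a two-sided test, via the normal approximation.
import math
from scipy.stats import norm

def n_per_group_continuous(target_diff, sd, alpha=0.05, power=0.80):
    """2 * ((z_{1-alpha/2} + z_{1-beta}) / (target_diff / sd))**2, rounded up."""
    effect_size = target_diff / sd        # input 3: mean difference / SD
    z_alpha = norm.ppf(1 - alpha / 2)     # input 4: two-sided significance level
    z_beta = norm.ppf(power)              # input 5: power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# e.g. an effect size of 0.5 (target difference 6, SD 12), alpha 5%, power 80%
print(n_per_group_continuous(6, 12))  # about 63 per group with this approximation
```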
Choice of Type I & II errors (α and β)
• Varying α and power (1 − β) often produces greatly different sample sizes
  – For example, for a difference of 80% vs 70% in cure rate post treatment, α = 5% and power = 80% requires 294 per group
  – How many does the following need (see the sketch below)?
    – α = 5% and power = 90%?
    – α = 1% and power = 90%?
• Many clinical trials (and other studies) are far too small!
  – Why?
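The 294-per-group figure quoted above is consistent with one common normal-approximation formula for comparing two proportions (pooled variance under the null hypothesis). The sketch below is illustrative only – the function name and the use of scipy are assumptions – and also runs the two variations posed as questions; other formulae (e.g. with a continuity correction) give somewhat different numbers.

```python
# Minimal illustrative sketch (not from the slides): sample size per group for
# comparing two proportions, using a pooled-variance normal approximation.
import math
from scipy.stats import norm

def n_per_group_binary(p1, p2, alpha=0.05, power=0.80):
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    top = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(top / (p1 - p2) ** 2)

# Cure rates of 80% vs 70%, for the three (alpha, power) settings on the slide
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.90)]:
    print(alpha, power, n_per_group_binary(0.80, 0.70, alpha, power))
# With this particular formula: roughly 294, 392 and 556 per group respectively.
```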
Some ways to increase power
• Increase sample size
  – Extend recruitment period
  – Relax inclusion criteria (can work against you)
  – Make the trial multi-centre, or add further centres
• Increase event rate / reduce variation
  – Selectively enrol "high-risk" patients
  – Use a combined endpoint / precise estimate
  – Do not exclude those at most risk of an event (e.g. oldest patients)
Example – FILMS trial
"… to detect a 6-point ETDRS score difference (an effect size of 0.5) using a t-test at a 5% level of significance and 80% power, it was estimated that 64 participants would be necessary in each group. This calculation was based on data from published studies.[14,15]"
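As a cross-check of the quoted calculation, sample size software reproduces the figure; the snippet below uses statsmodels' t-test power routine purely as an illustration (any equivalent package would do).

```python
# Illustrative cross-check of the quoted FILMS calculation:
# effect size 0.5, two-sided 5% significance level, 80% power, t-test.
import math
from statsmodels.stats.power import TTestIndPower

n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                alternative='two-sided')
print(math.ceil(n))  # about 64 per group, matching the reported figure
```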
Target difference
• How do we determine the difference we wish to detect?
  – A variety of formal and informal approaches is available
  – They can be judgement based, data driven, or a combination
• Most seek to identify a target difference which is viewed as important
  – e.g. minimum clinically important difference (MCID)
    • Hard to pin down!!!
• For a continuous outcome, Cohen's guidance (small, medium and large) is often resorted to
Example text – FILMS expanded
FILMS trial: The primary outcome is ETDRS distance visual acuity. A target difference of a mean difference of 5 letters with a common standard deviation (SD) of 12 was assumed. Five letters is equivalent to one line on a visual acuity chart and is viewed as an important difference by patients and clinicians. The SD value was based upon two previous studies – one RCT and one observational comparative study. This target difference is equivalent to a standardised effect size of 0.42. Setting the statistical significance to the 2-sided 5% level and seeking 90% power, 123 participants per group are required; 246 in total.
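A similar illustrative check can be run for the expanded description above (mean difference 5 letters, SD 12, two-sided 5% level, 90% power). Again the use of statsmodels is an assumption, and the exact answer depends on the formula and rounding convention used, so small differences from the published 123 per group are to be expected.

```python
# Illustrative reproduction of the expanded FILMS calculation
# (mean difference 5, SD 12, two-sided 5% significance level, 90% power).
from statsmodels.stats.power import TTestIndPower

effect_size = 5 / 12   # standardised effect size, approximately 0.42
n = TTestIndPower().solve_power(effect_size=effect_size, alpha=0.05, power=0.90,
                                alternative='two-sided')
print(n)  # approximately 122; the trial reports 123 per group (246 in total)
```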
How to calculate sample size
• Using a formula
  – Lots out there, sometimes only subtly different
• Nomogram
• Software (recommended approach)
  – Formula
  – Simulation (e.g. if no formula is readily available)
  – Online/apps
Sample size nomogram
Example: power 0.8, significance level 0.1, effect size 0.5
– Crude nomogram reading: 120
– Proper calculation: 128
http://homepage.stat.uiowa.edu/~rlenth/Power/
GOOD FOR ROUGH ESTIMATES BUT DANGEROUS –
GET SOME ADVICE FROM SOMEONE EXPERIENCED
(USUALLY NEED TO CONSULT A STATISTICIAN)!!!!
Summary
• The sample size is important
  – It affects many things
  – It needs to fit the aim, design and the analysis
• The sample size process is complex
  – Not a one-hit wonder
  – Easy to go wrong