Principles of sample size calculation
Jonathan Cook (with thanks to Doug Altman)
Centre for Statistics in Medicine, NDORMS, University of Oxford
EQUATOR – OUCAGS training course, 24 October 2015

Outline
– Principles of study design
– Principles of study sample size calculation
– How to determine the sample size
– How to calculate in practice
– Summary

Study design – general principles
The play of chance could mislead, so we need to be clear about our question:
– What exactly are we interested in?
– How precisely do we want to know it?
The more subtle the question, the more precise the evaluation needs to be (i.e. more information, which means more data).
The study (including its sample size) should be fit for purpose:
– Relevant
– Sufficient for the intended analysis

Study size – how big?
A fundamental aspect of study design: how many participants are needed?
Ethically and scientifically important:
– Legitimate experimentation
– Adds to knowledge
Impacts study conduct (e.g. 100 versus 2000 participants):
– Management of the project
– Timeframe
– Cost

Principles of sample size calculation
Aim: we wish to compare the outcome between the treatments and determine whether there is a difference between them.
Typical approach for an RCT sample size calculation:
– Choose the key (primary) outcome and base the calculation on it
– Recruit a large enough sample to have reassurance that we will be able to detect a meaningful difference in the primary outcome
Main alternative approach:
– Seek to estimate a quantity with a given precision
The same principles apply to all types of study, though what we are looking for may well differ.

Reaching the wrong conclusion (1)
What can go wrong: we may conclude that there is a difference in outcome between the active and control groups when in fact no such difference exists.
Technically this is called a Type I error – more usefully, a false-positive result.
The probability of making such an error is designated α, commonly known as the significance level.
The risk of a false-positive conclusion (Type I error) does not decrease
as the sample size increases.

Reaching the wrong conclusion (2)
We may conclude that there is no evidence of a difference in outcomes between the active and control groups when in fact such a difference exists.
Technically this is called a Type II error – more usefully, a false-negative result.
The probability of making such an error is often designated β (1 − β is commonly known as the statistical power).
The risk of missing an important difference (Type II error) decreases as the sample size increases.

Type I and Type II errors

                                There really is a difference              There really is no difference
Statistically significant       OK                                        Type I error (false positive, α)
Statistically non-significant   Type II error (false negative, β [1 − power])   OK

How is sample size determined?
The sample size calculation sets the recruitment target:
– Usually a formula is available
– Note: it is analysable data that count, not participants per se
The required size depends upon:
– Trial design (e.g. cluster trial)
– Statistical analysis (e.g. t-test)
– Statistical parameters (e.g. significance level and power)
– The difference we desire to detect (i.e. δ)
Some inputs have conventions, some don't – an educated guess is sometimes needed.

What do we typically need for a standard RCT calculation?
Binary outcome:
1. Anticipated control and intervention group rates (implies the % target difference)
2. Significance level (α)
3. Power (1 − β)
Continuous outcome – either 1, 2, 4 & 5 (or 3–5):
1. Anticipated mean in each group (or, more simply, the target mean difference)
2. Anticipated standard deviation
3. Mean difference/SD (often called the "effect size")
4. Significance level (α)
5. Power (1 − β)
More complicated study designs/statistical analyses require more inputs and may be framed differently.

Choice of Type I & II errors (α and β)
Varying α and power (1 − β) often produces greatly different sample sizes.
For example, for a difference of 80% vs 70% in cure rate post treatment, α = 5% and power = 80% requires 294 per group.
How many does the following need?
– α = 5% and power = 90%?
– α = 1% and power = 90%?
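The 80% vs 70% example above can be checked with a short sketch. This uses a common normal-approximation formula for comparing two proportions, written in Python with only the standard library; it is an illustration, not the exact method behind the slides' figure. Software implementations differ slightly (pooled vs unpooled variance, continuity correction), which is why it lands near, but not exactly on, the 294 per group quoted.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_binary(p1: float, p2: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for comparing two proportions.

    Normal-approximation formula with unpooled variance:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    Variants (pooled variance, continuity correction) give slightly
    different answers, so treat this as a rough check.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    numerator = (z(1 - alpha / 2) + z(power)) ** 2 * variance
    return ceil(numerator / (p1 - p2) ** 2)

# Cure rates of 80% vs 70%, alpha = 5%, power = 80%:
print(n_per_group_binary(0.80, 0.70))  # 291 per group (the slides quote 294,
                                       # from a slightly different variant)
# The two follow-up questions from the slide:
print(n_per_group_binary(0.80, 0.70, power=0.90))              # alpha = 5%, power = 90%
print(n_per_group_binary(0.80, 0.70, alpha=0.01, power=0.90))  # alpha = 1%, power = 90%
```

Note how moving from 80% to 90% power, and then tightening α from 5% to 1%, each inflates the required size substantially; this is the point of the slide's exercise.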
Many clinical trials (and other studies) are far too small! Why?

Some ways to increase power
Increase the sample size:
– Extend the recruitment period
– Relax the inclusion criteria (this can work against you)
– Make the trial multi-centre, or add further centres
Increase the event rate/reduce variation:
– Selectively enrol "high-risk" patients
– Use a combined endpoint/a more precise estimate
– Do not exclude those at most risk of an event (e.g. the oldest patients)

Example – FILMS trial
"….. to detect a 6-point ETDRS score difference (an effect size of 0.5) using a t-test at a 5% level of significance and 80% power, it was estimated that 64 participants would be necessary in each group. This calculation was based on data from published studies.14,15"

Target difference
How do we determine the difference we wish to detect?
– A variety of formal and informal approaches is available
– They can be judgement based, data driven, or a combination
Most seek to identify a target difference which is viewed as important – e.g. the minimum clinically important difference (MCID), which is hard to pin down!
For a continuous outcome, Cohen's guidance (small, medium and large effects) is often resorted to.

Example text – FILMS expanded
FILMS trial: The primary outcome is ETDRS distance visual acuity. A target difference of a mean difference of 5 letters with a common standard deviation (SD) of 12 was assumed. Five letters is equivalent to one line on a visual acuity chart and is viewed as an important difference by patients and clinicians. The SD value was based upon two previous studies – one RCT and one observational comparative study. This target difference is equivalent to a standardised effect size of 0.42.
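The FILMS calculation quoted above (effect size 0.5, 5% significance, 80% power, 64 per group) can be approximated with the standard two-group formula in standardised effect-size form. This sketch uses the normal approximation, so it gives 63 per group rather than the published 64; the small difference is consistent with the trial's use of a t-test, which typically adds about one participant per group at this size.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_continuous(effect_size: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-group comparison of means.

    Normal-approximation formula in standardised effect-size form:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2
    where effect_size = target mean difference / SD.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)

# FILMS: 6-point ETDRS difference, effect size 0.5, alpha = 5%, power = 80%
print(n_per_group_continuous(0.5))  # 63; the published 64 reflects the
                                    # t-test adjustment
```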
Setting the statistical significance to the two-sided 5% level and seeking 90% power, 123 participants per group are required; 246 in total.

How to calculate sample size
– Using a formula: lots are out there, sometimes only subtly different
– Using a nomogram
– Using software (the recommended approach): formula-based, simulation (e.g. if no formula is readily available), or online/apps

Sample size nomogram
Example: power 0.8, significance level 0.1, effect size 0.5
– Crude nomogram: 120
– Proper calculation: 128
http://homepage.stat.uiowa.edu/~rlenth/Power/
Nomograms are good for rough estimates but dangerous otherwise – get advice from someone experienced (you usually need to consult a statistician)!

Summary
The sample size is important:
– It affects many things
– It needs to fit the aim, design and the analysis
The sample size process is complex:
– Not a one-hit wonder
– Easy to go wrong
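As a closing worked check, the FILMS expanded calculation (5-letter difference, SD 12, two-sided 5%, 90% power) can be reproduced with the same kind of normal-approximation formula. This sketch yields 122 per group rather than the quoted 123; as before, the gap is consistent with the planned analysis using the t-distribution rather than the normal.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta: float, sd: float,
                alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group n for detecting mean difference delta with common SD sd:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / delta)^2
    (normal approximation; t-based software gives slightly larger n)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2)

# FILMS expanded: 5-letter difference, SD 12 (effect size 5/12 = 0.42),
# two-sided 5% significance, 90% power
n = n_per_group(5, 12)
print(n, 2 * n)  # 122 per group with the normal approximation; the quoted
                 # 123 per group (246 total) comes from the t-distribution
```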