Study design and simple statistics
17th Feb 2005
Kath Bennett

Overview
• Overview of research methods, study design.
• Some common statistical definitions.

Research
• Basic research: lab, biochemical, genetic
• Epidemiology: distribution & determinants of disease in a population
• Clinical: deals with patients with a particular disease

Research
• Clear aims and objectives from start – hypothesis
• Design study to be able to address the objectives set out
• Collect complete and accurate data
• Enter and analyse data
• Interpret the data in light of available evidence
• Publish

Types of Clinical Research
• Quantitative
• Qualitative

Types of clinical studies: Quantitative
• Observational (epidemiological)
  – Cohort
  – Case-Control
  – Cross-Sectional
  – Case Reports
• Experimental (interventional): “Clinical trials”
  – Randomised controlled trial
  – Open studies
  – Pilot study
  – Large simplified trial

Observational versus Experimental Research
Observational research is seen as complementary to experimental research:
• Interventions producing a large impact can be shown using observational studies.
• Infrequent adverse events require large numbers, which is impractical in RCTs.
• Longer term than RCTs.
• Clinical uncertainty, providing evidence for RCTs.
• Impractical or unethical to do an RCT.

Comparison of random and non-random studies
HRT and coronary heart disease. Evidence from observational studies and recently published RCT (Lancet 2002)

                         Relative risk
Observational studies    0.5-0.75
RCT                      1.29

Quantitative Methods: Advantages
• ‘Objective’ assessment
• Can sample large numbers (cost!)
• Can assess prevalence
• Repeatable results (consistency)

Quantitative Methods: Disadvantages
• Way in which questions are generated
  – Researcher decides limits and imposes structure
  – Little opportunity to detect “unexpected” new outcomes
• Sources of bias
  – Lack of explanatory power
  – Limited ability to describe context

Types of clinical studies: Qualitative
• Focus group discussions
• In-depth interviewing
• Observation
• Documentary

Primary versus Secondary Research
• Primary (original research focused on patients or populations)
  – Clinical trials
  – Surveys
  – Cohort studies
• Secondary (reanalysis of previously gathered data)
  – Systematic reviews
  – Meta-analyses
  – Economic analyses

Clinical trials
• Importance for ventures into clinical research
Principles required
• Appropriate design
• Randomisation
• Blinding
• Study power or sample size

Randomised Controlled Trial (RCT)
QUESTION: Treatment (efficacy, safety comparison etc.)
PREFERRED DESIGN: R.C.T. (randomised controlled trial)

Clinical trial design
• Parallel group trials
  – RANDOMISED: patients randomly allocated to either one treatment or another
  – NON-RANDOMISED: patients not randomly allocated to treatment
• Factorial design
  – Patients may receive none, one or more than one of several interventions.
• Cross-over trials
  – Patients receive one treatment followed by another. Comparisons are within-subject, so there is less variability and results are more precise: fewer patients are required, but the trial takes longer.

Randomised parallel group design
• Participants satisfying entry criteria
• Randomly allocated to receive A or B
• Participants followed up in exactly the same way
Example: Digoxin vs Placebo – DIG study

Factorial design
• Participants satisfying entry criteria
• Participants randomly allocated to one of four groups (2x2 factorial design)
Example: Heart Protection Study.
[Figure legend: vitamins, placebo, simvastatin]

MRC/BHF Heart Protection Study
2x2 factorial treatment comparisons. Randomised to either:
• Simvastatin (40 mg daily) vs placebo tablets
• Vitamins (600 mg E, 250 mg C & 20 mg beta-carotene) vs placebo capsules
Planned mean duration: at least 5 years

Two-period, two-treatment cross-over trial
• Participants satisfying entry criteria, sometimes followed by a run-in period
• Randomised to A followed by B, or vice versa
• Usually a ‘washout’ in between
Example: Aspergesic (A) vs ibuprofen (B) in rheumatoid arthritis.

RELIABILITY
• CHANCE EFFECTS: random error
• SYSTEMATIC BIASES: systematic error

To obtain evidence as reliable as possible
• Minimise chance effects (random error) by
  – Increasing the number of patients studied (do large trials and reviews of trials)
• Minimise systematic biases (systematic error) by
  – Using an appropriate method of allocation (randomisation)
  – Ensuring investigator and/or subject unaware of treatment allocation (blinding)
  – Basing the analyses on the allocated treatment (intention-to-treat)
  – Including all relevant evidence (systematic review of similar trials)

Randomisation
• Clinical trials, and any other studies, need to avoid bias
  – By doctor, e.g. treatment preferences
  – By individual patient
  – By choice of design
• Randomisation avoids bias by removing choice of treatment by doctor or patient
• Randomisation is not always possible for practical or ethical reasons, leading to a controlled clinical trial (treated group compared directly with non-treated group)

Blinding
• Avoidance of bias in subjective assessment, e.g.
pain, frequency of side effects, achieved through blinding
• Double blind (masked) trials – when both patients & investigators are not aware of which treatment group has been assigned
• Single blind (masked) trials – when only the study participant is not aware of the treatment group assigned to them
• ‘Placebo’ is also useful in avoiding bias

Intention to treat (ITT)
• Intention of randomisation is to establish similar groups of patients in each arm
• Problems arise when non-adherence may be related to outcome or prognosis, leading to biased representation
• ITT analyses all patients according to randomised treatment, irrespective of protocol violations etc.
• However, it does not solve all problems

Number of patients required – sample size
• Requirement for well-designed studies
• Most journals now require sample size calculations
• Reassurance that money is well spent – likelihood that study will give unequivocal results
• Requirement for regulatory authorities, e.g. FDA
• Low sample size can be a reason for not recognising that one treatment is superior
• Unethical to perform a study if numbers are too small to detect a useful difference

What is “power” of a study?
• “the ability to detect a true difference of clinical importance” (Doug Altman)
• “the confidence with which the investigator can claim that a specified treatment benefit has not been overlooked” (Sheila Gore)

Estimating sample size and power
• Identify a single major outcome measure – primary endpoint
  – Survival, response rate, quality of life
• Specify size of difference required to detect
  – Improvement in response from 20% to 30%
• ‘We want to be reasonably certain of detecting such a difference if it really exists’
  – ‘detecting a difference’ refers to P<0.05
  – ‘reasonably certain’ refers to having a chance of at least 80% of obtaining such a P value

Methods to calculate sample size
• Equations
  – Mathematical equations available for computing sample size given the effect size, the significance level $\alpha$ and the power $(1-\beta)$
• Tables
  – Based on equations above
• Nomogram
  – Summarises figures in a graph, easy to use
• Computer packages

Example
• Objective: to compare effect of drug A vs drug B using blood pressure as outcome measure
• Design: RCT – half to drug A, half to drug B
• Require 80% power, and significance level set at 5%
• Expected mean difference between the two groups = 6
• Pooled standard deviation SD = 10
• Effect size (standardised difference) = difference in means / SD = 6/10 = 0.6
• From tables, n = 45 per group

Common statistical definitions

Classification of data
• Different types of data
  – Nominal / categorical - used in classification (e.g. blood groups, female / male)
  – Ordinal - ordered categorical data (e.g. non-smoker, <10 per day, 10-20 per day, >20 per day)
  – Interval / continuous data (e.g. age, birthweight, plasma K levels)

Graphical presentations

BAR CHARTS
• Bar charts are used to show (graphically) frequency distributions for categorical data.
• The height of each ‘bar’ in the bar chart is proportional to the number of observations or frequency of the observations in each category.
BAR CHART
[Figure: bar chart of blood groups; x-axis: blood group (A, AB, B, O); y-axis: number of patients]

Histograms
• Similar to bar charts but for continuous (interval) data
• The width of the bars varies only with varying intervals of data.
• Boundaries of histogram ‘bars’ are taken as half way between the upper limit of the lower group and the lower limit of the upper group.

[Figure: histogram of pre-operative haemoglobin rates; x-axis: pre-operative % haemoglobin; y-axis: frequency (number of patients); Mean = 61.3, Std. Dev = 14.40, N = 45]

The Normal distribution
• An important distribution in statistics
• Used for continuous data
• Bell-shaped curve
• Symmetric about the mean (or median)
[Figure: Normal curve; 95% of the area lies between -1.96 and 1.96, with 2.5% in each tail]

Measures of location
• Gives an idea of the ‘average’ value on a particular scale
• Common measures are:
  – Mean - sum of observations / number of observations
  – Median - middle value of the sample when arranged in order
  – Mode - most common value (used when only a few different values)

Variation
• Humans differ in response to exposure to adverse effects
• Humans differ in response to treatment
• Humans differ in disease symptoms
• Diagnosis and treatment is often probabilistically based

Measures of variation
• Gives an idea of the spread or variability of the data
• Common measures are:
  – Range
  – Quartiles - the ‘inter-quartile range’ is the difference between the 25th and 75th centiles
  – Sample variance - $s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$

Measures of dispersion (contd.)
  – Standard deviation ($s$) - the square root of the variance
  – Standard error - if repeated samples were taken, the standard deviation of the means from each sample
• $\mathrm{SE(Mean)} = s/\sqrt{n}$

Confidence intervals
• Over-emphasis on hypothesis testing and p-values.
• The size and range of the difference between two groups is more informative than whether it is statistically significant or not.
• Confidence intervals, if appropriate to the type of study, should be used for major findings in both main text and abstract.

Confidence intervals
• If a CI is constructed, the significance of a hypothesis test can be inferred from it.
• For example, a 95% CI for the difference of two means that contains 0 implies that the difference between the means is non-significant at the 5% level.

[Figure: histograms of systolic blood pressure in 100 diabetic men (mean 146.4 mm Hg) and 100 non-diabetic men (mean 140.4 mm Hg). Difference between sample means = 6 mm Hg.]

Systolic blood pressure in 100 men with diabetes and 100 men without
• Difference of 6.0 mm Hg found between mean systolic blood pressures, standard error 2.5 mm Hg.
• 95% confidence interval for population difference is from 1.1 to 10.9 mm Hg.
• This means we can be 95% confident that the indicated range includes the ‘true’ population difference in mean blood pressure.

What affects the width of a CI?
• The sample size, by a factor of $\sqrt{n}$. Smaller sample size leads to lower precision.
• Variability of data - the less variable the data, the more precise the estimate.
• Degree of confidence. 95% is most commonly used. If greater or less confidence is required, the CI widens or narrows respectively.

P-values and CIs
• One can infer from CIs whether there is a statistically significant difference, but not vice versa.
• Example: difference in BP between diabetics and non-diabetics found to be 6 mm Hg. 95% confidence interval for population difference is from 1.1 to 10.9 mm Hg.
• The interval does not contain ‘0’, so we can infer that there is a statistically significant difference between the groups. In fact, the p-value from an independent t-test was p=0.02.
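
The blood pressure example can be checked with a few lines of Python. This is only a sketch: it assumes a large-sample Normal approximation (multiplier 1.96) rather than the independent t-test quoted on the slide, and the function names `mean_difference_ci` and `two_sided_p` are illustrative, not taken from the lecture.

```python
from statistics import NormalDist


def mean_difference_ci(diff, se, confidence=0.95):
    """Confidence interval for a difference in means (Normal approximation)."""
    # For 95% confidence the multiplier is the 97.5th centile, about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def two_sided_p(diff, se):
    """Two-sided p-value for 'no difference' (Normal approximation)."""
    z = abs(diff / se)
    return 2 * (1 - NormalDist().cdf(z))


# Values from the slide: difference 6.0 mm Hg, standard error 2.5 mm Hg.
lower, upper = mean_difference_ci(6.0, 2.5)
p = two_sided_p(6.0, 2.5)

print(f"95% CI: {lower:.1f} to {upper:.1f} mm Hg")
print(f"two-sided p (Normal approximation): {p:.3f}")
```

The interval reproduces the 1.1 to 10.9 mm Hg quoted above, and the approximate p-value rounds to the 0.02 reported from the t-test. Because the interval excludes 0, the two results agree on significance at the 5% level.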
Probability
• Probability and statistical tests
  – Statistical tests are used to assess the weight of evidence and to estimate the probability that the data arose by chance
  – Presented as a ‘p value’, usually p<0.05, i.e. the observed difference would be expected to have arisen by chance less than 5% of the time, or p<0.001, less than 0.1% of the time
  – 5% or 1% is known as the significance level of the test, or alpha ($\alpha$)

Effect on significance
• ‘Non-significance’
  – Indicates insufficient weight of evidence
  – Does not mean ‘no clinically important difference between groups’
  – If power of test is low (i.e. sample size too small), all one can conclude is that the question of difference between groups is unresolved
• Confidence intervals show, more informatively, the impact of sample size upon precision of a difference

Reporting p-values

P value          Wording                  Summary
>0.05            Not significant          ns
0.01 to 0.05     Significant              *
0.001 to 0.01    Very significant         **
<0.001           Extremely significant    ***

Report the actual p-value

Measuring effectiveness

Risk
PROPORTION
A ratio where the numerator (top) is part of the denominator (bottom).
RISK
Number of subjects in a group who have an event divided by total number of subjects in the group. It is the probability (proportion) of having an event in that group (P). It is called incidence when expressed per unit time.
RELATIVE RISK (RR)
Ratio of risk in exposed group to risk in not exposed group (P1/P2)

Example

Type of vaccine      I      II (Control)
Got influenza        43     52
Avoided influenza    237    198
Total                280    250

Risk of disease in Vaccine Group I = 43/280 = 0.154
Risk of disease in Vaccine Group II = 52/250 = 0.208
Relative Risk (Risk Ratio) = 0.154/0.208 = 0.74

Odds
ODDS
Probability of developing disease divided by probability of not developing disease: P/(1-P).
Often expressed as number of times something expected not to happen: number of times something expected to happen.
ODDS RATIO (OR)
Ratio of odds for exposed group divided by odds for not exposed group.
$\mathrm{OR} = \frac{P_1/(1-P_1)}{P_2/(1-P_2)}$
Odds ratios are often interpreted as relative risks, especially when events are rare, and emerge naturally in some types of studies (case-control studies).

Example
Odds of disease in Vaccine Group I = 0.154/(1-0.154) = 0.182
Odds of disease in Vaccine Group II = 0.208/(1-0.208) = 0.263
Odds ratio of getting disease in Group I relative to Group II = 0.182/0.263 = 0.69 (close to relative risk of 0.74)

Absolute risk reduction
Absolute risk reduction (ARR)
Risk in control group minus risk in treated group: ARR = P2 - P1
Number needed to treat (NNT) = 1/ARR
This is the number you would need to treat under each of two treatments to get one extra person cured under the new treatment.

Example
Absolute risk reduction for vaccine I = 0.208 - 0.154 = 0.054
NNT = 1/0.054 = 18.5
Thus, on average, one would have to give vaccine I to 19 patients to expect one extra patient to be protected from influenza compared with vaccine II.

Summary
• Have clear objectives and aims for the study
• Choose the study design that best addresses these aims
• Use randomisation, blinding etc. where appropriate
• Make sure sufficient numbers of individuals are studied to be able to answer the question reliably.

Useful statistical references
• Bland M. An Introduction to Medical Statistics.
• Campbell MJ and Machin D. Medical Statistics: a commonsense approach. Wiley, 1993.
• Altman DG. Practical statistics for medical research. London: Chapman & Hall, 1991.
• Moore DS and McCabe GP. Introduction to the practice of statistics. 3rd Edition. New York: WH Freeman and Company, 1999.
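
As a worked check of the vaccine example from the 'Measuring effectiveness' slides, the sketch below recomputes the risk, relative risk, odds ratio, absolute risk reduction and number needed to treat from the raw counts. The function name `risk_measures` and the dictionary keys are illustrative assumptions, not part of the lecture. The slides round intermediate values, so the unrounded NNT here is slightly below the 18.5 quoted, but it still rounds up to 19 patients.

```python
import math


def risk_measures(events_treated, n_treated, events_control, n_control):
    """Effect measures for a 2x2 table (treated/exposed vs control group)."""
    p1 = events_treated / n_treated   # risk in treated (exposed) group
    p2 = events_control / n_control   # risk in control group
    arr = p2 - p1                     # absolute risk reduction
    return {
        "risk_treated": p1,
        "risk_control": p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),
        "arr": arr,
        "nnt": 1 / arr,               # number needed to treat
    }


# Counts from the slide: vaccine I 43/280 got influenza, vaccine II 52/250.
result = risk_measures(43, 280, 52, 250)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
print("patients to treat (rounded up):", math.ceil(result["nnt"]))
```

Working from counts rather than rounded risks avoids carrying rounding error into the NNT, which is sensitive to small changes in the ARR.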