Statistics for GP and the AKT
Sept ‘11

Aims
• Be able to understand statistical terminology, interpret stats in papers and explain them to patients.
• Pass the AKT

Why should you care?
• 10% of questions
• Much less than 10% of the work
• Easy marks

Plan – don’t despair!
• Representing data:
  – Parametric v non-parametric data
  – Normal distribution and standard deviation
  – Types of data
  – Mean, median, mode
  – Prevalence and incidence
• Types of research:
  – Types of studies
  – Grades of evidence
  – Types of bias
  – Tests of statistical significance
• Significance of results:
  – P value
  – Confidence intervals
  – Type 1 and type 2 error
• Magnitude of results:
  – NNT, NNH
  – Absolute risk reduction, relative risk reduction
  – Hazard ratio
  – Odds ratio
• Clinical tests:
  – Sensitivity, specificity
  – Positive predictive value, negative predictive value
  – Likelihood ratios for positive and negative test
• Pretty pictures:
  – Forest plot
  – Funnel plot
  – Kaplan-Meier survival curve

The Normal Distribution
• Frequency on the y axis and a continuous variable on the x axis
• Symmetrical: just as many have more than average as less than average
• Generally true for medical tests and measurements

Standard deviation
• A measure of spread

SD and the normal distribution
• 68.3% of data within 1 SD
• 95.4% of data within 2 SD
• 99.7% of data within 3 SD
• 95% of data within 1.96 SD

Defining ‘normal’
• Can be used to define normal for medical tests e.g. Na
• But by definition, with a reference range of mean ± 1.96 SD, 5% of ‘normal’ people fall outside it: 2.5% will be ‘too high’ and 2.5% ‘too low’.
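The coverage figures for the normal distribution can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `share_within` is an illustrative assumption, not part of the original slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


# Coverage for the SD multiples quoted on the slide.
for k in (1, 2, 3, 1.96):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")

# Share of values lying above mean + 1.96 SD (the lower tail is the same size,
# because the distribution is symmetrical).
upper_tail = 1 - STANDARD_NORMAL.cdf(1.96)
print(f"above +1.96 SD: {upper_tail:.1%}")
```

Because the curve is symmetrical, whatever falls outside a range of mean ± k SD is split equally between the two tails.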
Normality
Positive and negative skew

Parametric and non-parametric
• If it’s normally distributed, it’s parametric
• If it’s skewed, it’s non-parametric

Mean, median and mode
• Use mean for parametric data
• Use median for non-parametric data
• In a normal distribution: mean = median = mode
• For a negatively skewed distribution: mean < median < mode
• For a positively skewed distribution: mean > median > mode
• Remember alphabetical order: < for negative, > for positive

What sort of distribution is this?
Which is a normal distribution?

Types of data
• Continuous – can take any value e.g. height
• Discrete – can only take integers e.g. number of asthma attacks
• Nominal – into categories in no particular order e.g. colour of smarties
• Ordinal – into categories with an inherent rank e.g. Bristol stool chart

Prevalence and incidence
• Prevalence – proportion of people that have a disease at a given time
• Incidence – number of new cases per population per time
• Prevalence = incidence × length of disease

Types of research
• RCT
• Cohort
• Case-control
• Cross-sectional

Group work
• Definition
• Strengths
• Weaknesses
• Example where it would be the most appropriate study to use

RCT
• Interventional study
• Used to compare treatment(s) with a control group.
• Control group have placebo or current best treatment.
• Best evidence but…
• Expensive and ethical problems
• Two types
  – Group comparative
  – Cross-over

Cohort
• Longitudinal/follow-up studies.
• Usually prospective
[Diagram: population → selection → exposed / not exposed → time → disease / well in each group]
• Assessed using relative risk

Case control
• Usually retrospective
• Reverse cohort study
[Diagram: population → selection → disease / well → looking back in time → exposed / not exposed in each group]
• Assessed using odds ratio

Cross-sectional
• Prevalence study
• Evaluate a defined population at a specific time.
• Used to assess disease status and compare populations

Levels of Evidence
• Ia – Meta-analysis of RCTs
• Ib – RCT(s)
• IIa – well designed non-randomised trial(s)
• IIb – well designed experimental trial(s)
• III – case, correlation and comparative
• IV – panel of experts

Grades of Evidence
• A – levels Ia and Ib
• B – levels IIa, IIb and III
• C – level IV

Bias
• Confounding
• Observer
• Publication
• Sampling
• Selection

CARD SORT
For bonus points, spot the odd one out!

Bias
• Confounding
  – Exposed and non-exposed groups differ with respect to characteristics independent of the risk factor.
• Observer
  – The patient/clinician knows which treatment is being received.
  – Outcome measure has a subjective element.
• Publication
  – Positive (statistically significant) results are more likely to be published
  – Negative results are less likely to be published
• Sampling
  – Non-random selection from target population.
• Selection
  – Intervention allocation to the next person is known before recruitment.
Avoiding Bias
• Confounding – Study design
• Observer – Blinding
• Publication – Journals accept more outcomes with non-significant results
• Sampling – Compare groups statistically
• Selection – Randomisation

Chance…

Types of significance tests: Qualitative
• Single sample (my sample vs manufacturer’s claim) – Binomial test
• >1 independent sample (drug A vs drug B)
  – Small sample – Fisher exact test
  – Larger sample – Chi-squared
• Dependent sample – Percentage agreement (+/- Kappa statistic)

Types of significance tests: Quantitative – Parametric
• Single sample – Student one-sample t-test
• Two independent samples – Student independent samples t-test
• Two dependent samples – Student dependent samples t-test
• >2 independent samples – One-way ANOVA
• >2 dependent samples – Repeated-measures ANOVA
• Correlation – Pearson correlation coefficient

Types of significance tests: Quantitative – Non-parametric
• Single sample – Kolmogorov-Smirnov test
• Two independent samples – Mann-Whitney
• Two dependent samples – Wilcoxon matched-pairs signed-rank test
• >2 independent samples – Kruskal-Wallis test
• >2 dependent samples – Friedman test
• Correlation – Spearman

Types of significance tests: summary table

Samples       Qualitative                     Quantitative parametric    Quantitative non-parametric
1             Binomial                        Student                    Kolmogorov-Smirnov
2             Ind: Fisher’s / Chi-squared*    Student                    Ind: Mann-Whitney
              Dep: % agreement                                           Dep: Wilcoxon
>2            Ind: Fisher’s / Chi-squared*    Ind: one-way ANOVA         Ind: Kruskal-Wallis
              Dep: % agreement                Dep: ANOVA                 Dep: Friedman
Correlation   -                               Pearson                    Spearman

Ind = independent samples, Dep = dependent samples
*Chi-squared – can be used to compare quantitative data if you look at proportions/percentages

P value
“The p value is equal to the probability of achieving a result at least as extreme as the experimental outcome by chance”
• Usually significance level is 0.05 i.e.
if there were no real difference, a result at least this extreme would occur by chance less than 5% of the time

Hypothesis
• Null hypothesis – states that there is no difference between the 2 treatments

Errors
• Type I error:
  – False positive
  – The null hypothesis is rejected when it is true
  – Probability is equal to the significance level set (e.g. 0.05)
  – Depends on significance level set, not on sample size
  – Risk increased if multiple end points
• Type II error:
  – False negative
  – The null hypothesis is accepted when it is false i.e. fail to find a statistically significant difference that really exists
  – More likely if small sample size

Error
Sample populations

Confidence intervals
• 95% confidence interval means you are 95% sure that the result for the true population lies within this range
• The bigger the sample, i.e. the more representative of the true population, the smaller the confidence interval.

Confidence intervals (the maths)
• For 95% confidence interval: Mean ± 1.96 × SEM
• Standard error of the mean (SEM) = SD / √n i.e. standard deviation divided by the square root of the sample size
• As the sample size increases, SEM decreases.

Confidence intervals
• We measure the concentration span of a sample of 36 VTS trainees. The mean concentration span is 2.4 seconds and the standard deviation is 1.2 seconds.
• What is the approximate 95% confidence interval?
  1. 1.2 – 3.6 seconds
  2. Too short to measure and getting shorter
  3. 2.2 – 2.6 seconds
  4. 2.3 – 2.5 seconds
  5. 2.0 – 2.8 seconds
  6. I don’t care

Confidence intervals and trials
• If the confidence interval of a difference doesn’t include 0, then the result is statistically significant.
  – After 30 minutes of stats, the mean reduction in attention span was 2.3 minutes (0.8 – 3.8).
• If the confidence interval of a relative risk doesn’t include 1, then the result is statistically significant.
  – Relative risk of death after learning about stats was 0.7 (0.3 – 1.1)

Magnitude of results
– NNT, NNH
– Absolute risk reduction, relative risk reduction
– Hazard ratio
– Odds ratio

Relative risk
• How many times more likely if…?
           Disease   Total   Event rate
Exposed    A         B       EER = A/B
Control    C         D       CER = C/D

• EER = Exposed (or experimental) event rate
• CER = Control event rate
• RR = EER / CER

Relative risk reduction (or increase)
• RRR (RRI) = (EER − CER) / CER
• RRR = relative risk reduction, RRI = relative risk increase
• EER = exposed event rate
• CER = control event rate
Watch your R’s!

Hazard
• Hazard ratio (HR) – estimate of RR over time
  – Death rate in A / death rate in B (2 = twice as many, 0.5 = half as many)
  – Note: hazard ratio does not reflect median survival time; it is the relative probability of dying

Number needed to treat (NNT)
Number needed to harm (NNH)
• How many patients need to be treated to...
• Absolute risk reduction (ARR) = EER − CER
• NNT = 1/ARR = 1/(EER − CER)

Scenario
• Claire Stewart thought women with no hair were more likely to pass CSA because having hair would distract trainees by getting in their eyes.
• She tested this by randomising her female trainees.

                   Pass CSA   Fail CSA
Control group      15         15
Shaved trainees    20         5

• What is the relative risk of passing?
• What is the RRR/RRI?
• What is the NNT?

Odds ratio
• Used in case control studies

           RF    No RF   Odds
Case       A     B       A/B
Control    C     D       C/D

• Odds ratio: case odds / control odds
• It doesn’t need the total.

How good is a test at predicting disease?
• If the test is negative, how sure can you be that you don’t have the disease?
• If the test is positive, how sure can you be that you do have the disease?

Tests
Learn this!

Sensitivity and specificity
• Sensitivity – proportion of people that have the disease that test positive
• Specificity – proportion of people that don’t have the disease that test negative

Predictive values
• Positive predictive value – proportion of positive tests that actually represent disease
• Negative predictive value – proportion of people with a negative test who don’t have the disease
Learn this!
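The relative risk, RRR/RRI and NNT formulas from the "Magnitude of results" slides can be sketched in a few lines of Python, here applied to the counts in the CSA scenario table. The function name `effect_measures` and the dictionary keys are illustrative assumptions, and rounding the NNT up to the next whole patient is a common convention assumed here rather than something the slides state.

```python
import math
from fractions import Fraction
from typing import Optional


def effect_measures(exposed_events: int, exposed_total: int,
                    control_events: int, control_total: int) -> dict:
    """Effect measures for a two-group study, using exact fractions."""
    eer = Fraction(exposed_events, exposed_total)   # exposed event rate
    cer = Fraction(control_events, control_total)   # control event rate
    difference = eer - cer                          # absolute difference in rates

    # NNT = 1 / absolute difference, rounded up (assumed convention).
    # Undefined when the two event rates are identical.
    nnt: Optional[int] = None
    if difference != 0:
        nnt = math.ceil(1 / abs(difference))

    return {
        "eer": eer,
        "cer": cer,
        "relative_risk": eer / cer,
        "relative_change": difference / cer,  # RRR if negative, RRI if positive
        "absolute_difference": difference,
        "nnt": nnt,
    }


# CSA scenario: 20 of 25 shaved trainees passed, 15 of 30 controls passed.
csa = effect_measures(exposed_events=20, exposed_total=25,
                      control_events=15, control_total=30)
for name, value in csa.items():
    print(f"{name}: {float(value):.2f}")
```

Exact fractions are used instead of floats so that rounding the NNT up is not thrown off by floating-point error when the absolute difference is a value such as 0.1.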
Likelihood ratios
• Do not depend on the prevalence of disease (unlike predictive values), so are more useful
• Likelihood ratio for a positive test = sensitivity / (1 – specificity)
• Likelihood ratio for a negative test = (1 – sensitivity) / specificity
• A likelihood ratio of greater than 1 indicates the test result is associated with the disease.
• A likelihood ratio less than 1 indicates that the result is associated with absence of the disease.
• A likelihood ratio close to 1 means the test is not very useful

An example…
• In a VTS group of 110 people, 30 people have the dreaded lurgy. A test is developed for this. Of the 30 people with the dreaded lurgy, 18 have a positive test. 16 of the others also have a positive test.
• What is the likelihood ratio for a positive test?

Pretty pictures
– Forest plot
– Funnel plot
– Kaplan-Meier survival curve

Forest plots aka Blobbograms
• Used in meta-analysis
• Graphical representation of results of different RCTs
[Figure labels: studies; odds ratio of each study with its confidence interval; size of box = study size; summary measure with its odds ratio and confidence interval; OR (CI)]

Funnel plot
• Used in meta-analysis
• Demonstrates the presence/absence of publication bias
• Y axis – measure of precision
• X axis – treatment effect
• Each point is an individual study
• Increased precision of study = reduced variance
• Asymmetrical funnel = publication bias (missing data/studies)

Kaplan-Meier Survival Curve
• What % of people are still alive

Scenario
• We’ve driven Sarah Egan to insanity by not doing enough learning logs.
• She’s gone on a rampage with a gun because basically life will be better without any of us around (nothing to do with pregnancy hormones…obviously)
• Draw the Kaplan-Meier survival curve for MK GP trainees
[Axes: number of trainees (y) against time in units (x)]

Any questions?
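As a worked follow-up to the dreaded lurgy example, the test measures defined in the clinical tests slides can be computed from the four cells of a 2×2 table. This is a minimal sketch: the function name `diagnostic_metrics` and its parameter names are illustrative assumptions, and the counts are derived from the example (30 with disease, 18 of them test positive; 80 without disease, 16 of them test positive).

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Test performance from a 2x2 table.

    tp: disease, test positive     fn: disease, test negative
    fp: no disease, test positive  tn: no disease, test negative
    """
    sensitivity = tp / (tp + fn)   # diseased people who test positive
    specificity = tn / (tn + fp)   # healthy people who test negative
    ppv = tp / (tp + fp)           # positive tests that represent disease
    npv = tn / (tn + fn)           # negative tests that represent no disease

    # Likelihood ratios; infinite LR+ if the test has no false positives.
    lr_positive = (sensitivity / (1 - specificity)
                   if specificity < 1 else float("inf"))
    lr_negative = ((1 - sensitivity) / specificity
                   if specificity > 0 else float("inf"))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Dreaded lurgy example: 110 people, 30 with the disease.
lurgy = diagnostic_metrics(tp=18, fn=30 - 18, fp=16, tn=80 - 16)
for name, value in lurgy.items():
    print(f"{name}: {value:.2f}")
```

Rerunning the function with the same sensitivity and specificity but a different number of healthy people changes the predictive values while leaving the likelihood ratios the same.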