Primer on Statistics for Interventional Cardiologists
Giuseppe Sangiorgi, MD
Pierfrancesco Agostoni, MD
Giuseppe Biondi-Zoccai, MD

What you will learn
• Introduction
• Basics
• Descriptive statistics
• Probability distributions
• Inferential statistics
• Finding differences in mean between two groups
• Finding differences in mean between more than 2 groups
• Linear regression and correlation for bivariate analysis
• Analysis of categorical data (contingency tables)
• Analysis of time-to-event data (survival analysis)
• Advanced statistics at a glance
• Conclusions and take home messages

What you will learn
• Inferential statistics:
  – pivotal concepts
  – point estimation and confidence intervals
  – hypothesis testing:
    • rationale and significance
    • type I and type II error
    • p values and confidence intervals
    • multiple testing issues
    • one-tailed and two-tailed
    • power and sample size computation

Methods of inquiry
Statistical inquiry may be:
• Descriptive (to summarize or describe an observation), or
• Inferential (to use the observations to make estimates or predictions)

Population and sample: at the heart of descriptive and inferential statistics
Again, statistical inquiry may be:
• Descriptive (to describe a sample/population), or
• Inferential (to measure the likelihood that estimates generated from the sample may truly represent the underlying population)

Accuracy and precision
[Figure: measurements scattered around a true value, with their spread marked]
• Accuracy measures the distance from the true value
• Precision measures the spread in the measurements

Accuracy and precision: test
Accuracy and precision: example (Schultz et al, Am Heart J 2004)
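The distinction between accuracy and precision can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the true value, the size of the systematic and random errors, and the helper names `accuracy_and_precision` and `simulate` are all hypothetical and do not come from the slides or from the cited example.

```python
import random
import statistics


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known true value.

    bias   = mean of the measurements minus the true value (accuracy)
    spread = sample standard deviation of the measurements (precision)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


def simulate(true_value, systematic_error, random_error_sd, n, seed):
    """Draw n measurements: a fixed offset plus Gaussian noise (hypothetical device)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [true_value + systematic_error + rng.gauss(0.0, random_error_sd)
            for _ in range(n)]


TRUE_VALUE = 3.0  # assumed true value, eg a vessel diameter in mm

# Precise but inaccurate: small random error, large systematic error
precise_but_biased = simulate(TRUE_VALUE, 0.5, 0.05, n=1000, seed=1)
# Accurate but imprecise: no systematic error, large random error
accurate_but_imprecise = simulate(TRUE_VALUE, 0.0, 0.5, n=1000, seed=2)

for label, data in [("precise but biased", precise_but_biased),
                    ("accurate but imprecise", accurate_but_imprecise)]:
    bias, spread = accuracy_and_precision(data, TRUE_VALUE)
    print(f"{label}: bias = {bias:+.3f}, SD = {spread:.3f}")
```

In this sketch the first series has a bias close to the assumed offset and a small SD, while the second has a bias close to zero and a large SD. Note that the bias can only be computed because the true value is known here, which is rarely the case with real measurements.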
Accuracy and precision
Thus:
• Precision expresses the extent of RANDOM ERROR
• Accuracy expresses the extent of SYSTEMATIC ERROR (ie bias)

Bias
Bias is a systematic DEVIATION from the TRUTH
Thus:
• in itself it can never be recognized
• there is a need for one external gold standard, one or more reference standards, and/or permanent surveillance

An incomplete list of bias
· Selection bias
· Information bias
· Confounders
· Observation bias
· Investigator’s bias (enthusiasm bias)
· Patient’s background bias
· Distribution of pathological changes bias
· Selection bias
· Small sample size bias
· Reporting bias
· Referral bias
· Variation bias
· Recall bias
· Statistical bias
· Selection bias
· Confounding
· Intervention bias
· Measurement or information
· Interpretation bias
· Publication bias
· Subject selection/sampling bias

Simplest classification:
1. Selection bias
2. Information bias
Sackett, J Chronic Dis 1979

Validity
• Internal validity entails both PRECISION and ACCURACY (ie does a study provide a truthful answer to the research question?)
• External validity expresses the extent to which the results can be applied to other contexts and settings.
It corresponds to the distinction between SAMPLE and POPULATION.

Validity: examples
– Meredith, EuroIntervention 2005: 100 patients, lesions ≤15 mm
– Fajadet, Circulation 2006: 1197 patients, lesions 15-27 mm
– Rothwell, Lancet 2005

[Figure: An easy comparison. Frequency distributions of late loss, Cypher™ vs Bx Velocity™]
[Figure: A tough comparison. Frequency distributions of late loss, Cypher™ vs Cypher Select™]

Point estimation & confidence intervals
• Using summary statistics (mean and standard deviation for normal variables, or proportion for categorical variables) and factoring in sample size, we can build confidence intervals or test the hypothesis that we are sampling from a given population
• This can be done with a powerful tool, which weighs our dispersion measures by the sample size: the standard error

Measures of dispersion are just descriptive
• Range
  – top to bottom
  – not very useful
• Interquartile range
  – used with median
  – ¼ way to ¾ way
• Standard deviation (SD)
  – used with mean
  – very useful
[Figure: plot annotated with 99% confidence interval (CI), 75% CI and SD]

From standard deviation…
Standard deviation (SD):
  $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N-1}}$
– approximates population σ as N increases
Advantages:
– with the mean, enables a powerful synthesis:
  mean ± 1 SD: 68% of data
  mean ± 2 SD: 95% of data (exactly 1.96 SD)
  mean ± 3 SD: 99% of data (exactly 2.58 SD)
Disadvantages:
– is based on normal assumptions
[Figures: normal curves with 68% of data within mean ± 1 SD, 95% within mean ± 2 SD, and 99% within mean ± 3 SD]

…to confidence intervals
Standard error (SE or SEM) can be used to test a hypothesis or create a confidence interval (CI) around a mean for a continuous variable (eg lesion length):
  $SE = \frac{SD}{\sqrt{n}}$
  95% CI = mean ± 2 SE
95% means that, in the long run, a proportion of 0.95 (almost 1!) of such intervals would include the true population value

What about proportions?
• We can easily build the standard error of a proportion, according to the following formula:
  $SE = \sqrt{\frac{P(1-P)}{n}}$
  where the variance is P(1-P) and n is the sample size

Point estimation & confidence intervals
• We can then create a simple test to check whether the summary estimate we have found is compatible, allowing for random variation, with the corresponding reference population mean
• The z test (when the population SD is known) and the t test (when the population SD is only estimated) are used for this, and both can be viewed as a signal to noise ratio

Signal to noise ratio
  $\text{Signal to noise ratio} = \frac{\text{Signal}}{\text{Noise}}$

Z test
  $z = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Absolute difference in summary estimates}}{\text{Standard error}}$
Each z score corresponds to a distinct tail probability of the Gaussian curve (eg 1.96 corresponds to a 0.025 one-tailed probability or 0.050 two-tailed probability)

t test
  $t = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Absolute difference in summary estimates}}{\text{Standard error}}$
Each t score corresponds to a distinct tail probability of the t distribution (eg, with many degrees of freedom, 1.96 corresponds to a 0.025 one-tailed probability or 0.050 two-tailed probability)
• The t test differs from the z test in that the variance is not known and is only estimated from the sample
• However, given the central limit theorem, when n>30 (ie with >29 degrees of freedom) the t distribution approximately corresponds to the normal distribution, thus we can use the z test and z score instead

[Figure: An easy comparison. Frequency distributions of late loss, Stent A vs Stent B]
[Figure: A tough comparison. Frequency distributions of late loss, Stent C vs Stent D]

Any comparison can be viewed as…
A fight between:
• a null hypothesis (H0), stating that there is no big difference (ie beyond random variation) between two or more populations of interest (from which we are sampling), and
• an alternative hypothesis (H1), which implies that there is a non-random difference between two or more populations of interest
Any statistical test tries to tell us whether H0 is false (thus implying that H1 may be true)

Why falsify H0 instead of proving that H1 is true?
You can never prove that something is correct in science, you can only disprove something, ie show it is wrong
Thus, only falsifiable hypotheses are scientific

Sampling distribution of a difference
We may create a sampling distribution of a difference (A-B) for any comparison of interest, eg late loss, peak CK-MB or survival rates
[Figure: sampling distribution of A-B on an axis running from big difference through 0 (no difference, ie null hypothesis or H0) to big difference, with the true difference distribution and the difference in our study marked]

Potential difference distributions
[Figure: potential difference distributions on the same axis, from big difference through 0 (no difference, ie null hypothesis or H0) to big difference]

What you will learn
• Inferential statistics:
  – pivotal concepts
  – point estimation and confidence intervals
  – hypothesis testing:
    • rationale and significance
    • type I and type II error
    • p values and confidence intervals
    • multiple testing issues
    • one-tailed and two-tailed
    • power and sample size computation

Potential pitfalls (alpha error)
[Figure: sampling distribution on an axis from big difference through 0 (no difference) to big difference, with a gray zone]
Gray zone where we may inappropriately reject a true null hypothesis (H0), ie providing a false positive result

Potential pitfalls (beta error)
Another gray zone where we may fail to reject a false null hypothesis (H0), ie providing a false negative result

True positive test
Sampling here means correctly rejecting a false null hypothesis (H0), ie providing a true positive result

True negative test
Sampling here means correctly retaining a true null hypothesis (H0), ie providing a true negative result

Statistical or clinical significance
• Clinical and statistical significance are two highly different concepts
• A clinically significant difference, if proved true, would be considered clinically relevant and thus worthwhile (pending costs and tolerability)
• A statistically significant difference is a probability concept, and should be viewed in light of the distance from the null hypothesis and the chosen significance threshold

Alpha and type I error
Whenever I perform a test, there is thus a risk of a FALSE POSITIVE result, ie REJECTING A TRUE null hypothesis
This error is called type I, and its probability is measured as alpha, the threshold against which the p value is judged
The lower the p value, the lower the risk of falling into a type I error (ie the HIGHER the SPECIFICITY of the test)
Type I error is like a MIRAGE, because I see something that does NOT exist

Beta and type II error
Whenever I perform a test, there is also a risk of a FALSE NEGATIVE result, ie NOT REJECTING A FALSE null hypothesis
This error is called type II, and its probability is measured as beta
The complement of beta (1-beta) is called power
The lower the beta, the lower the risk of missing a true difference (ie the HIGHER the SENSITIVITY of the test)
Type II error is like being BLIND, because I do NOT see something that exists

Non-invasive diagnosis of CAD

                 Stress testing
                 Abnormal          Normal
  CAD   Yes      True positive     False negative
        No       False positive    True negative

Summary of errors

                     Experimental study
                     H0 accepted       H0 rejected
  Truth  H0 true     (correct)         Type I error
         H0 false    Type II error     (correct)

Type I error: examples
– Pitt et al, Lancet 1997
– Pitt et al, Lancet 2000

Type II error: examples
– Burzotta, J Am Coll Cardiol
– De Luca, Eur Heart J 2008

Another example of beta error? Kandzari et al, JACC 2006
Another example of beta error?
The PROSPECT Trial
  Inclusion criteria: consecutive patients with PCI of up to 4 lesions
  Comparison: Endeavor vs Cypher
  Sample size: 8800
  Primary end-point: stent thrombosis at 3-year follow-up and MACE
Melikian et al, Heart 2007

Shapes of distribution & analytical errors
[Figure: frequency plotted against value]
Another potential cause of analytic errors
[Figure: frequency plotted against value]

95% confidence intervals
The RANGE of values where we would have CONFIDENCE that the population value lies in 95 cases, if we were to perform 100 studies
[Figure: summary point estimate with its 95% confidence interval and the wider 99% confidence interval]

P values & confidence intervals
P values and confidence intervals are strictly connected
Any hypothesis test providing a significant result at the 0.05 level (eg p=0.045) corresponds to a 95% confidence interval for the population average difference that excludes zero (ie the null hypothesis)
Thus this statistical analysis reports an odds ratio of 0.111, with 95% confidence intervals of 0.016 to 0.778, and a concordantly significant p value of 0.027
[Figure: confidence intervals plotted against H0, contrasting important and trivial differences, and significant (p<0.05) and non significant (p>0.05) differences]

Multiple testing
• What happens when you perform the same hypothesis test several times? …
• Because every additional test adds a further chance of a type I error, the answer is restricting the analyses to prespecified and biologically plausible sub-analyses, and using suitable corrections:
  – Bonferroni
  – Dunn
  – Tukey
  – Keuls
  – interaction tests

Multiple testing & subgroups
ENDEAVOR IV: 24-month TLR rates

  Subgroup                      Risk ratio   Endeavor     Taxus        P value*
  Diabetes                      1.29         8.7% (20)    6.7% (15)    0.956
  Non-diabetes                  1.27         4.7% (24)    3.7% (19)
  RVD ≤2.5 mm                   0.98         6.4% (16)    6.5% (17)    0.187
  RVD >2.5 <3.0 mm              1.21         5.9% (17)    4.8% (14)
  RVD ≥3.0 mm                   3.45         5.5% (11)    1.6% (3)
  Lesion length ≤10 mm          1.07         5.2% (12)    4.9% (11)    0.412
  Lesion length >10 <20 mm      1.38         6.2% (26)    4.5% (18)
  Lesion length ≥20 mm          1.21         5.6% (5)     4.6% (5)
  Single stent                  1.61         5.7% (39)    3.5% (23)    0.324
  Multiple stents               0.84         9.1% (4)     10.8% (8)

*interaction p values calculated using logistic regression
[Figure: forest plot of risk ratios [95% CI] on a logarithmic axis from 0.1 (favors Endeavor) through 1 to 10 (favors Taxus)]

One- or two-tailed tests
[Figure: normal curve with mean ± 1.96 SD marked and <2.5% of the distribution in each tail]
When can we use a one-tailed test?
When you assume that the difference is only in one direction: ALMOST NEVER for superiority or equivalence comparisons
When should we use a two-tailed test?
When you cannot assume the direction of the difference: ALMOST ALWAYS, except for non-inferiority comparisons

Sample size calculation
To compute the sample size for a study we thus need:
1. Preferred alpha value
2. Preferred beta value
3. Control event rate or average value (with measure of dispersion if applicable)
4. Expected relative reduction in experimental group
Sample size example: Svilaas et al, NEJM 2008
Another sample size example: Fajadet, Circulation 2006

Power and sample size
Whenever designing a study or analyzing a dataset, it is important to estimate the sample size or the power of the comparison
• SAMPLE SIZE: setting a specific alpha and a specific beta, you calculate the necessary sample size given the average inter-group difference and its variation
• POWER: given a specific sample size and alpha, in light of the calculated average inter-group difference and its variation, you obtain an estimate of the power (ie 1-beta)

Power analysis
To compute the power of a study we thus need:
1. Preferred or actual alpha value
2. Control event rate or average value (with measure of dispersion if applicable)
3. Expected or actual relative reduction in experimental group
4. Expected or actual sample size
Biondi-Zoccai et al, Ital Heart J 2003

Thank you for your attention
For any correspondence: [email protected]
For further slides on these topics feel free to visit the metcardio.org website: http://www.metcardio.org/slides.html
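
Worked example: sample size and power
The four inputs listed under "Sample size calculation" and "Power analysis" can be turned into numbers with a short script. What follows is a minimal sketch rather than the method used in any of the cited trials. It assumes a binary end-point, two groups of equal size, a two-tailed test, and the plain normal approximation with unpooled variances and no continuity correction. The function names and the example values (alpha 0.05, beta 0.20, control event rate 10%, relative reduction 50%) are hypothetical. It needs Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, SD 1


def _event_rates(control_rate, relative_reduction):
    """Return (control rate, experimental rate) implied by a relative reduction."""
    if not 0 < control_rate < 1 or not 0 < relative_reduction < 1:
        raise ValueError("rate and relative reduction must lie strictly between 0 and 1")
    return control_rate, control_rate * (1 - relative_reduction)


def sample_size_per_group(alpha, beta, control_rate, relative_reduction):
    """Patients needed in each group (normal approximation, two-tailed alpha)."""
    p1, p2 = _event_rates(control_rate, relative_reduction)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value, alpha split over two tails
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)        # quantile matching the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)  # variance of a proportion is P*(1-P)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


def power(alpha, control_rate, relative_reduction, n_per_group):
    """Approximate power (1-beta) for a given number of patients per group."""
    p1, p2 = _event_rates(control_rate, relative_reduction)
    standard_error = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Signal to noise ratio minus the critical value; the opposite tail is ignored
    return _STD_NORMAL.cdf(abs(p1 - p2) / standard_error - z_alpha)


# Hypothetical design: alpha 0.05, beta 0.20, 10% control event rate, 50% relative reduction
n = sample_size_per_group(alpha=0.05, beta=0.20, control_rate=0.10, relative_reduction=0.50)
print(f"patients per group: {n}")
print(f"power at that size: {power(0.05, 0.10, 0.50, n):.3f}")
```

The two functions mirror the two slides: the first fixes alpha and beta and solves for the sample size, while the second fixes alpha and the sample size and returns the power. Dedicated software may give slightly different figures, because pooled variances, continuity corrections or exact methods are often used instead of this approximation.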