Medical statistics for cardiovascular disease – Part 1
Giuseppe Biondi-Zoccai, MD – Sapienza University of Rome, Latina, Italy
[email protected] [email protected]

Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods

Why do you need to know statistics? CLINICIAN ↔ RESEARCHER: a collection of methods.

The EBM 3-step approach
How an article should be appraised, in 3 steps:
Step 1 – Are the results of the study (internally) valid?
Step 2 – What are the results?
Step 3 – How can I apply these results to patient care?
Guyatt and Rennie, Users’ guide to the medical literature, 2002

The Cochrane Collaboration Risk of Bias Tool – http://www.cochrane.org

The ultimate goal of any clinical or scientific observation is the appraisal of causality.

Bradford Hill causality criteria
• Strength:* precisely defined (p<0.05, weaker criterion) and with a strong relative risk (≤0.83 or ≥1.20) in the absence of multiplicity issues (stronger criterion)
• Consistency:* results in favor of the association must be confirmed in other studies
• Temporality: exposure must precede the event in a realistic fashion
• Coherence: the hypothetical cause-effect relationship does not conflict with other biologic or natural-history findings
*statistics is important here
Mente et al, Arch Intern Med 2009

Bradford Hill causality criteria
• Biologic gradient:* exposure dose and risk of disease are positively (or negatively) associated on a continuum
• Experimental: experimental evidence from laboratory studies (weaker criterion) or randomized clinical trials (stronger criterion)
• Specificity: exposure is associated with a single disease (does not apply to multifactorial conditions)
• Plausibility: the hypothetical cause-effect relationship makes sense from a biologic or clinical perspective (weaker criterion)
• Analogy: the hypothetical cause-effect relationship is based on analogic reasoning (weaker criterion)
*statistics is important here
Mente et al, Arch Intern Med 2009

Randomization
• Randomization is the technique that defines experimental studies in humans (but not only in them), and it enables the correct application of statistical hypothesis tests in a frequentist framework (according to Ronald Fisher’s theory)
• Randomization means assigning a patient (or a study unit) at random to one of the treatments
• Over large numbers, randomization minimizes the risk of imbalances in patient or procedural features, but this does not hold true for small samples or for a large set of features

Any clinical or scientific comparison can be viewed as a battle between an underlying null hypothesis (H0), stating that there is no meaningful difference or association (beyond random variability) between 2 or more populations of interest (from which we are sampling), and an alternative hypothesis (H1), which implies that there is a non-random difference between such populations. Any statistical test tries to convince us that H0 is false (thus implying the working truthfulness of H1).

Falsifiability
• Falsifiability or refutability of a statement, hypothesis, or theory is the inherent possibility to prove it false.
• A statement is called falsifiable if it is possible to conceive an observation or an argument which proves the statement in question to be false.
• In this sense, falsify is synonymous with nullify, meaning not "to commit fraud" but "to show to be false".

Statistical or clinical significance?
• Statistical and clinical significance are 2 very different concepts.
• A clinically significant difference, if demonstrated beyond the play of chance, is clinically relevant and thus merits subsequent action (unless cost or tolerability issues outweigh it).
• A statistically significant difference is a probabilistic concept and should be viewed in light of the distance from the null hypothesis and the chosen significance threshold.
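To make the statistical-vs-clinical distinction concrete, here is a minimal sketch (hypothetical numbers, pure Python) using the normal-approximation interval mean ± 1.96·SE with SE = SD/√n, which appears later in these slides rounded as "95% CI = mean ± 2 SE": with n = 100 per measurement a 1 mmHg mean blood-pressure difference is not statistically significant, while with n = 100,000 the same clinically trivial difference becomes statistically significant.

```python
import math

def ci_95(estimate, sd, n):
    """95% CI via the normal approximation: estimate ± 1.96*SE,
    with SE = SD / sqrt(n) (the slides round 1.96 to 2)."""
    se = sd / math.sqrt(n)
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical numbers: a 1 mmHg mean BP difference, SD 15 mmHg
lo_small, hi_small = ci_95(1.0, 15.0, 100)      # CI spans 0: not significant
lo_large, hi_large = ci_95(1.0, 15.0, 100_000)  # CI excludes 0: significant, yet trivial
```

The point is that significance is driven by sample size as much as by effect size, so a p-value alone never establishes clinical relevance.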
Descriptive statistics: 100, 100 → AVERAGE.
Inferential statistics: if I become a scaffolder, how likely am I to eat well every day? P values and confidence intervals.

Samples and populations
• This is a sample… and this is its universal population.
• This is another sample… and this might be its universal population. But what if THIS is its universal population?
• Any inference thus depends on our confidence in its likelihood.

Alpha and type I error
Whenever I perform a test, there is a risk of a FALSE POSITIVE result, ie REJECTING A TRUE null hypothesis. This error is called type I, is measured as alpha, and its unit is the p value. The lower the p value, the lower the risk of falling into a type I error (ie the HIGHER the SPECIFICITY of the test). A type I error is like a MIRAGE: I see something that does NOT exist.

Beta and type II error
Whenever I perform a test, there is also a risk of a FALSE NEGATIVE result, ie NOT REJECTING A FALSE null hypothesis. This error is called type II, is measured as beta, and its unit is a probability. The complement of beta is called power. The lower the beta, the lower the risk of missing a true difference (ie the HIGHER the SENSITIVITY of the test). A type II error is like being BLIND: I do NOT see something that exists.

Accuracy and precision
• Accuracy measures the distance from the true value; it expresses the extent of SYSTEMATIC ERROR (ie bias).
• Precision measures the spread in the measurements; it expresses the extent of RANDOM ERROR.

Validity
Internal validity entails both PRECISION and ACCURACY (ie does a study provide a truthful answer to the research question?)
External validity expresses the extent to which the results can be applied to other contexts and settings.
(It corresponds to the distinction between SAMPLE and POPULATION.)

Intention-to-treat analysis
• Intention-to-treat (ITT) analysis is an analysis based on the initial treatment intent, irrespective of the treatment eventually administered.
• ITT analysis is intended to avoid various types of bias that can arise in intervention research, especially procedural, compliance and survivor bias.
• However, ITT dilutes the power to achieve statistically and clinically significant differences, especially as drop-in and drop-out rates rise.

Per-protocol analysis
• In contrast to ITT analysis, per-protocol (PP) analysis includes only those patients who complete the entire clinical trial or other particular procedure(s), or who have complete data.
• In PP analysis each patient is categorized according to the actual treatment received, and not according to the originally intended treatment assignment.
• PP analysis is largely prone to bias, and is useful almost only in equivalence or non-inferiority studies.
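As a sketch of how such 2×2 comparisons can be tested, here is a pure-Python one-sided Fisher exact test (using the factorial/hypergeometric formula shown later in these slides), applied to the ITT table of the worked example that follows (5/50 deaths in group A vs 0/50 in group B). Note that this illustrative one-sided exact p (≈0.028) differs slightly from the p=0.021 quoted on the slide, which presumably comes from a different or two-sided test.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p for the 2x2 table [[a, b], [c, d]],
    rows = groups, first column = events: probability, with all margins
    fixed, of observing `a` events or more in the first group.  Each
    single-table probability is the hypergeometric term, algebraically
    equal to s1!*s2!*r1!*r2! / (N!*a!*b!*c!*d!)."""
    r1, r2 = a + b, c + d          # row totals (group sizes)
    n1 = a + c                     # column total (total events)
    N = r1 + r2
    prob = lambda x: comb(r1, x) * comb(r2, n1 - x) / comb(N, n1)
    return sum(prob(x) for x in range(a, min(r1, n1) + 1))

# ITT view of the worked example: 5/50 deaths (group A) vs 0/50 (group B)
p_itt = fisher_exact_one_sided(5, 45, 0, 50)   # ~0.028
```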
ITT vs PP
100 pts enrolled, randomized:
• 50 pts to group A (more toxic): 45 pts actually treated with A, 5 shifted to B because of poor global health (all 5 died)
• 50 pts to group B (conventional Rx, less toxic): 50 patients treated with B (none died)
Results by analysis type:
• ITT: 10% mortality in group A vs 0% in group B, p=0.021 in favor of B
• PP: 0% (0/45) mortality in group A vs 9.1% (5/55) in group B, p=0.038 in favor of A

Mean (arithmetic)
mean = Σx / N
Characteristics:
– summarises information well
– discards a lot of information (dispersion??)
Assumptions:
– data are not skewed: skew distorts the mean, and outliers make the mean very different
– measured on a measurement scale: you cannot find the mean of a categorical measure (an ‘average’ stent diameter may be meaningless)

Median
What is it?
– The one in the middle: place values in order; the median is central
Definition:
– Equally distant from all other values
Used for:
– Ordinal data
– Skewed data / outliers

Standard deviation
Standard deviation (SD) approximates the population σ as N increases:
SD = √( Σ(x − x̄)² / (N − 1) )
Advantages:
– with the mean it enables a powerful synthesis:
mean ± 1 SD ≈ 68% of data
mean ± 2 SD (1.96) ≈ 95% of data
mean ± 3 SD (2.86) ≈ 99% of data
Disadvantages:
– it is based on normality assumptions

Interquartile range
25th to 75th percentile, or 1st to 3rd quartile.
Example (lesion-length data below): 1st quartile = 16.5, 3rd quartile = 23.5, interquartile range = 23.5 − 16.5 = 7.0.

Variable type: continuous
Patient ID → lesion length: 11→14, 6→15, 7→16, 3→17, 1→18, 8→18, 10→19, 9→21, 12→22, 5→23, 2→24, 4→25, 13→27

Coefficient of variation
CV = (standard deviation / mean) × 100
The coefficient of variation (CV) is an index of relative variability. CV is dimensionless and enables you to compare the data dispersion of variables with different units of measurement.

Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods

Point estimation & confidence intervals
• Using summary statistics (mean and standard deviation for normal variables, or proportion for a categorical variable) and factoring in sample size, we can build confidence intervals or test hypotheses about whether we are sampling from a given population
• This is done by creating a powerful tool which weighs our dispersion measures by the sample size: the standard error

First you need the SE
• We can easily build the standard error of a proportion, according to the following formula:
SE = √( P × (1 − P) / n )
where variance = P × (1 − P) and n is the sample size

Point estimation & confidence intervals
• We can then create a simple test to check whether the summary estimate we have found is compatible, allowing for random variation, with the corresponding reference population mean
• The Z test (when the population SD is known) and the t test (when
the population SD is only estimated) are thus used, and both can be viewed as a signal-to-noise ratio.

Signal-to-noise ratio
Signal-to-noise ratio = signal / noise

From the Z test…
Z score = absolute difference in summary estimates / standard error
Results of the z score correspond to a distinct tail probability of the Gaussian curve (eg 1.96 corresponds to a 0.025 one-tailed probability or 0.050 two-tailed probability).

…to confidence intervals
The standard error (SE or SEM) can be used to test a hypothesis or to create a confidence interval (CI) around a mean for a continuous variable (eg a mortality rate):
SE = SD / √n
95% CI = mean ± 2 SE
95% means that, if we repeat the study 20 times, 19 times out of 20 the interval will include the true population average.

Ps and confidence intervals
P values and confidence intervals are strictly connected. Any hypothesis test providing a significant result (eg p=0.045) means that we can be confident at 95.5% that the population average difference lies far from zero (ie from the null hypothesis).
(Figure: CIs distinguishing a trivial from an important difference; a significant difference (p<0.05) has a CI excluding H0, a non-significant difference (p>0.05) has a CI crossing H0.)

Power and sample size
Whenever designing a study or analyzing a dataset, it is important to estimate the sample size or the power of the comparison.
• SAMPLE SIZE: setting a specific alpha and a specific beta, you calculate the necessary sample size given the average inter-group difference and its variation.
• POWER: given a specific sample size and alpha, in light of the calculated average inter-group difference and its variation, you obtain an estimate of the power (ie 1 − beta).

Hierarchy of analysis
• A statistical analysis can be:
– Univariate (e.g. when describing a mean or standard deviation)
– Bivariate (e.g. when comparing age in men and women)
– Multivariable (e.g. when appraising how age and gender impact the risk of death)
– Multivariate (e.g.
when appraising how age and gender simultaneously impact the risk of death and hospital costs)

Types of variables
• PAIRED OR REPEATED MEASURES: eg blood pressure measured twice in the same patients at different times
• UNPAIRED OR INDEPENDENT MEASURES: eg blood pressure measured in several different groups of patients only once
• CATEGORY: nominal; ordinal (ordered categories, ranks)
• QUANTITY: discrete (counting); continuous (measuring)

Statistical tests
Are data categorical or continuous?
• Categorical data: compare proportions in groups.
• Continuous data: compare means or medians in groups. How many groups?
– Two groups: normal data, use the t test; non-normal data, use the Mann-Whitney U test
– More than two groups: normal data, use ANOVA; non-normal data, use Kruskal-Wallis

Student t test
• It is used to test the null hypothesis that the means of two normally distributed populations are equal
• Given two data sets (each with its mean, SD and number of data points) the t test determines whether the means are distinct, provided that the underlying distributions can be assumed to be normal
• The Student t test should be used if the (unknown) variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch’s t test

Mann-Whitney rank sum U test
Example (late loss by stent type, A = Cypher vs B = Taxus; SPSS output): N = 267 vs 295 (562 total); mean ranks 266.65 vs 294.94; Mann-Whitney U = 35416.5; Wilcoxon W = 71194.5; Z = −2.063; asymptotic 2-tailed p = 0.032.

Paired Student t test
55.1% (7.4) vs 48.7% (8.3) – only 11 patients!!!
EF at baseline and FU in patients treated with BMC for MI: significant increase in EF by paired t test, p=0.005. MAGIC, Lancet 2004

Wilcoxon signed rank test
Example (mld post vs mld at follow-up, N = 562; SPSS output): medians 2.75 vs 2.40 (25th–75th percentiles 2.44–3.10 vs 1.87–2.84); 407 negative ranks (mld fu < mld post), 153 positive ranks, 2 ties; Z = −13.764 (based on positive ranks); asymptotic 2-tailed p < 0.001.

1-way ANOVA
• As with the t test, ANOVA is appropriate when the data are continuous, when the groups are assumed to have similar variances, and when the data are normally distributed
• ANOVA is based upon a comparison of the variance attributable to the independent variable (variability between groups or conditions) relative to the variance within groups resulting from random chance. In fact, the formula involves dividing the between-group variance estimate by the within-group variance estimate

Post-hoc test
Example (blood pressure post 2 months; SPSS output): placebo N=5, mean 90.00 (SD 2.65); drug A N=4, mean 79.00 (SD 4.08); drug B N=4, mean 86.00 (SD 1.63). Bonferroni multiple comparisons: placebo vs A, mean difference 11.00, p=0.001; placebo vs B, 4.00, p=0.208; A vs B, −7.00, p=0.021 (the mean difference is significant at the .05 level).

Kruskal-Wallis test
Example (blood pressure post 1 month, grouping variable: drug; SPSS output): mean ranks placebo 10.50, A 3.13, B 6.50; chi-square = 8.339, df = 2, p = 0.015. Post-hoc analysis with Mann-Whitney U and Bonferroni correction.

Compare continuous variables – three (or more) paired groups
Again ask yourself: parametric or not?
• If parametric: ANOVA for repeated measures (in SPSS, in the General Linear Model)
• If non-parametric: Friedman test

Friedman test
Example (blood pressure pre, post 1 month and post 2 months, analysed separately by drug group; SPSS output): placebo, N=5, chi-square = 0.111, df = 2, p = 0.946; drug A, N=4, chi-square = 8.000, df = 2, p = 0.018; drug B, N=4, chi-square = 8.000, df = 2, p = 0.018.

2-way ANOVA
A mixed-design ANOVA is used to test for differences between independent groups whilst subjecting participants to repeated measures. In a mixed-design ANOVA model, one factor is a between-subjects variable (drug) and the other is a within-subjects variable (BP). CAMELOT, JAMA 2004

Binomial test
Is the percentage of diabetics in this sample comparable with the known chronic AF population? We assume the population rate is 15%. Example (SPSS output): observed 5/13 diabetics (proportion 0.38) vs test proportion 0.15; exact 1-tailed p = 0.034.

Compare discrete variables
The second basis is the "observed"–"expected" relation.

Compare event rates
• Absolute Risk (AR): 7.9% (47/592) and 15.1% (89/591)
• Absolute Risk Reduction (ARR): 7.9% (47/592) − 15.1% (89/591) = −7.2%
• Relative Risk (RR): 7.9% (47/592) / 15.1% (89/591) = 0.52 (given an equivalence value of 1)
• Relative Risk Reduction (RRR): 1 − 0.52 = 0.48, or 48%
• Odds Ratio (OR): 8.6% (47/545) / 17.7% (89/502) = 0.49 (given an equivalence value of 1)

Post-hoc groups
"The chi-square test was used to determine differences between groups with respect to the primary and secondary end points. Odds ratios and their 95 percent confidence intervals were calculated. Comparisons of patient characteristics and survival outcomes were tested with the chi-square test, the chi-square test for trend, Fisher's exact test, or Student's t-test, as appropriate." This is a sub-group! Bonferroni! The significance threshold should be divided by the number of tests performed… or the computed p-value multiplied by the number of tests… P=0.12 and not P=0.04!!
Wenzel et al, NEJM 2004

Fisher exact test
For a 2×2 table with cells a, b, c, d, row totals r1, r2, column totals s1, s2 and grand total N:
P = (s1! × s2! × r1! × r2!) / (N! × a! × b! × c! × d!)

McNemar test
• The McNemar test is a hypothesis test to compare categorical variables in related samples.
• For instance, how can I appraise the statistical significance of a change in symptom status (asymptomatic vs symptomatic) in the same patients over time?
• The McNemar test exploits the discordant pairs to generate a p value.
Example 1: baseline symptomatic → follow-up symptomatic 15, asymptomatic 3; baseline asymptomatic → follow-up symptomatic 5, asymptomatic 17: p=0.72 at McNemar test.
Example 2: baseline symptomatic → follow-up symptomatic 15, asymptomatic 0; baseline asymptomatic → follow-up symptomatic 8, asymptomatic 17: p=0.013 at McNemar test.

Take home messages
• Biostatistics is best seen as a set of different tools and methods which are used according to the problem at hand.
• Nobody is experienced in statistics at the beginning; only by facing everyday real-world problems can you familiarize yourself with different techniques and approaches.
• In general terms, it is also crucial to remember that the easiest and simplest way to solve a statistical problem, if appropriate, is also the best and the one recommended by reviewers.

Many thanks for your attention!
For any query: [email protected] [email protected]
For these and similar slides: http://www.metcardio.org/slides.html

Medical statistics for cardiovascular disease – Part 2
Giuseppe Biondi-Zoccai, MD – Sapienza University of Rome, Latina, Italy
[email protected] [email protected]

Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods

Linear regression
Which of these different possible lines that I can graphically trace and compute is the best regression line?
(Scatter plot: time to restenosis, days, against lesion length.) It can be intuitively understood that it is the line that minimizes the differences between observed values (yi) and estimated values (yi’).

Correlation (K. Pearson)
• The square root of the coefficient of determination (R²) is the correlation coefficient (R) and shows the degree of linear association between 2 continuous variables, but disregards causation.
• It assumes values between −1.0 (negative association), 0 (no association), and +1.0 (positive association).
• It can be summarized as a point summary estimate, with a specific standard error, 95% confidence interval, and p value.

Dangers of not plotting data
4 sets of data: all with the same R=0.81!* (*at linear regression analysis)
What about non-linear associations? Each number corresponds to the correlation coefficient for linear association (R)!!!

Pearson vs Spearman (C. Spearman)
• Whenever the independent and dependent variables can be assumed to belong to normal distributions, the Pearson linear correlation method can be used, maximizing statistical power and yield.
• Whenever the data are sparse, rare, and/or not belonging to normal distributions, the nonparametric Spearman correlation method should be used, which yields the rank correlation coefficient (rho), but not its R².

Bland-Altman plot
The difference of A − B in each case is plotted against the mean of measurements A and B in each case.

Regression to the mean: don’t bet on past rookies of the year!
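The Pearson-vs-Spearman distinction above can be illustrated with a short pure-Python sketch: Spearman's rho is simply the Pearson correlation computed on ranks, so a monotone but non-linear relation gets rho = 1 while Pearson's R stays below 1.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson linear correlation coefficient R."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def ranks(v):
    """Ranks (1-based); tied values share the average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Monotone but non-linear relation (y = x**2)
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
```

Here `spearman_rho(x, y)` is exactly 1.0 while `pearson_r(x, y)` is below 1, which is the slide's point: R measures linear association only.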
Ecological fallacy

Logistic regression
• We model ln[p/(1−p)] instead of just p, and the linear model is written:
ln[p/(1−p)] = ln(p) − ln(1−p) = β0 + β1·X
• Logistic regression is based on the logit, which transforms a dichotomous dependent variable into a continuous one

Generalized Linear Models
• All generalized linear models have three components:
– Random component: identifies the response variable and assumes a probability distribution for it
– Systematic component: specifies the explanatory variables used as predictors in the model (linear predictor)
– Link: describes the functional relationship between the systematic component and the expected value (mean) of the random component
• The GLM relates a function of that mean to the explanatory variables through a prediction equation having linear form.
• The model formula states that: g(µ) = α + β1x1 + … + βkxk

Generalized Linear Models
• Through differing link functions, GLM corresponds to other well-known models:
– Normal distribution: identity link
– Exponential and Gamma distributions: inverse link
– Inverse Gaussian distribution: inverse squared link
– Poisson distribution: log link
– Binomial distribution: logit link

Survival analysis
• Patients experiencing one or more events are called responders
• Patients who, at the end of the observational period or before such time, get out of the study without having experienced any event, are called censored
(Figure: subjects A–F followed over 2–12 time units; A and F experience events; B, C, D and E are censored — lost, withdrawn, or event-free at study end.)

Product-limit (Kaplan-Meier) analysis
Kaplan-Meier curves and SE. Serruys et al, NEJM 2010

Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods

Multivariable statistical methods
The goal is to explain the variation in the dependent variable by other variables simultaneously.
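The product-limit (Kaplan-Meier) idea mentioned above can be sketched in a few lines of pure Python (an illustration with invented data, not the cited trial's analysis): at each event time the survival estimate is multiplied by (at-risk − events)/at-risk, and censored observations simply leave the risk set without stepping the curve down.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.
    events[i] is True if subject i had the event at times[i],
    False if the subject was censored then (lost, withdrawn, study end).
    Returns the step curve as a list of (event_time, S(t)) pairs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [e for tt, e in data[i:] if tt == t]
        deaths = sum(at_t)
        if deaths:
            s *= (n_at_risk - deaths) / n_at_risk   # product-limit step
            curve.append((t, s))
        n_at_risk -= len(at_t)   # events and censorings leave the risk set
        i += len(at_t)
    return curve

# Invented mini-cohort: events at t = 1, 2, 4; censoring at t = 3 and t = 5
curve = kaplan_meier([1, 2, 3, 4, 5], [True, True, False, True, False])
```

With this toy data the estimate steps to 0.8, then 0.6, then 0.3: notice how the censoring at t = 3 shrinks the denominator of the next step without itself lowering the curve.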
Independent and dependent variables
• Independent variables (predictors, regressors, explanatory or prognostic factors, manipulated variables): have an effect on…
• Dependent variable (response): is influenced by…

Bivariate statistical methods: one dependent variable ~ one independent variable
• Qualitative D.V. ~ qualitative I.V.: chi²
• Qualitative D.V. ~ quantitative I.V.: logistic regression
• Quantitative D.V. ~ qualitative I.V.: ANOVA (1 I.V.)
• Quantitative D.V. ~ quantitative I.V.: simple regression

Multivariable statistical methods: one dependent variable ~ several independent variables (the same grid of methods applies).

Multivariable analysis
• The methods mentioned have specific application domains depending on the nature of the variables involved in the analysis.
• But conceptually, and in terms of calculation, there are many similarities between these techniques.
• Each of the multivariable methods evaluates the effect of an independent variable on the dependent variable, controlling for the effect of the other independent variables.
• Methods such as multiple regression, multi-factor ANOVA and analysis of covariance make the same assumptions about the distribution of the dependent variable.
• We will learn more about the concepts of multivariable analysis by reviewing the simple linear regression model.

Multiple linear regression
• Simple linear regression is a statistical model to predict the value of one continuous variable Y (dependent, response) from another continuous variable X (independent, predictor, covariate, prognostic factor).
• Multiple linear regression is a natural extension of the simple linear regression model:
– We use it to investigate the effect on the response variable of several predictor variables, simultaneously
– It is a hypothetical model of the relationship between several independent variables and a response variable.
• Let’s start by reviewing the concepts of the simple linear regression model.
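The simple least-squares line described earlier (the line minimizing the squared differences between observed y_i and fitted y_i') has a closed form, sketched here in pure Python with toy data:

```python
from statistics import mean

def ols_simple(x, y):
    """Closed-form least-squares fit y ~ b0 + b1*x, ie the line that
    minimizes the sum of squared residuals (y_i - y_i')**2:
    b1 = sum((x - mx)(y - my)) / sum((x - mx)**2), b0 = my - b1*mx."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

# Toy data lying exactly on y = 1 + 2x
b0, b1 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
```

Multiple linear regression generalizes this to several predictors at once, which is where matrix methods (or a statistical package) take over.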
Multiple regression models
• Model terms may be divided into the following categories:
– Constant term
– Linear terms / main effects (e.g. X1)
– Interaction terms (e.g. X1·X2)
– Quadratic terms (e.g. X1²)
– Cubic terms (e.g. X1³)
• Models are usually described by the highest term present:
– Linear models have only linear terms
– Interaction models have linear and interaction terms
– Quadratic models have linear, quadratic and first-order interaction terms
– Cubic models have terms up to third order

The model-building process
Source: Applied Linear Statistical Models, Neter, Kutner, Nachtsheim, Wasserman

AIC and BIC
AIC (Akaike Information Criterion) and BIC (Schwarz Information Criterion) are two popular model selection methods. They not only reward goodness of fit, but also include a penalty that is an increasing function of the number of estimated parameters. This penalty discourages overfitting. The preferred model is the one with the lowest value for AIC or for BIC. These criteria attempt to find the model that best explains the data with a minimum of free parameters. The AIC penalizes free parameters less strongly than the Schwarz criterion does.
AIC = 2k + n·ln(SSError / n)
BIC = n·ln(SSError / n) + k·ln(n)

Two-factor ANOVA – introduction
• A method for simultaneously analyzing two factors affecting a response:
– Group effect: treatment group or dose level
– Blocking factor whose variation can be separated from the error variation to give more precise group comparisons: study center, gender, disease severity, diagnostic group, …
• One of the most common ANOVA methods used in clinical trial analysis.
• Similar assumptions as for single-factor ANOVA.
• Non-parametric alternative: Friedman test

Two-factor ANOVA – the model
X_ijk = µ + α_i + β_j + (αβ)_ij + ε_ijk
where X_ijk is the response score of subject k in column i and row j; µ is the overall mean; α_i is the effect of the treatment factor (a levels, or i columns); β_j is the effect of the blocking factor (b levels, or j rows); (αβ)_ij is the interaction effect; and ε_ijk is the error, ie the effect of unmeasured variables.

Analysis of covariance (ANCOVA)
• Method for comparing response means among two or more groups adjusted for a quantitative concomitant variable, or "covariate", thought to influence the response.
• The response variable is explained by independent quantitative variable(s) and qualitative variable(s).
• Combination of ANOVA and regression.
• Increases the precision of comparison of the group means by decreasing the error variance.
• Widely used in clinical trials.

Analysis of covariance – the model
• The covariance model for a single factor with fixed levels adds another term to the ANOVA model, reflecting the relationship between the response variable and the concomitant variable:
Y_ij = µ + τ_i + γ(X_ij − X̄) + ε_ij
• The concomitant variable is centered around the mean so that the constant µ represents the overall mean in the model.

Repeated measures – basic concepts
• ‘Repeated measures’ are measurements taken from the same subject (patient) at repeated time intervals.
• Many clinical studies require:
– multiple visits during the trial
– response measurements made at each visit
• A repeated-measures study may involve several treatments or only a single treatment.
• ‘Repeated measures’ are used to characterize a response profile over time.
• Main research question: is the mean response profile for one treatment group the same as for another treatment group or a placebo group?
• Comparison of response profiles can be tested with a single F-test.

Repeated measures – comparing profiles
Source: Common Statistical Methods for Clinical Research, 1997, Glenn A. Walker

Repeated measures ANOVA – random effects, mixed model
Example (response: miles; JMP output, 24 observations, R² = 0.75): tests with respect to random effects: subject[species] F = 2.89, p = 0.0588; season F = 10.64, p = 0.0005; species F = 11.89, p = 0.0261. What are your conclusions about the between-subjects species effect and the within-subjects season effect?

Repeated measures ANOVA – correlated measurements, multivariate model
Multivariate F-tests (all between): Wilks’ lambda 0.2517, Pillai’s trace 0.7483, Hotelling-Lawley 2.9733, Roy’s max root 2.9733 — all with exact F = 11.89, df 1 and 4, p = 0.0261.

Logistic regression
Sangiorgi et al, AHJ 2008

Multiple regression – SPSS variable selection methods
• Enter. A procedure for variable selection in which all variables in a block are entered in a single step.
• Forward selection (likelihood ratio). Stepwise selection method with entry testing based on the significance of the score statistic, and removal testing based on the probability of a likelihood-ratio statistic based on the maximum partial likelihood estimates.
• Backward elimination (likelihood ratio). Backward stepwise selection.
Removal testing is based on the probability of the likelihood-ratio statistic based on the maximum partial likelihood estimates.

Cox PH analysis
• Problem:
– can't use ordinary linear regression, because how do we account for the censored data?
– can't use logistic regression without ignoring the time component
• With a continuous outcome variable we use linear regression.
• With a dichotomous (binary) outcome variable we use logistic regression.
• Where the time to an event is the outcome of interest, Cox regression is the most popular regression technique.

Cox PH analysis
[Kaplan-Meier plot: MACE-free survival (0.0–1.0) versus time (0–400)]

Variables in the Equation:
           B      SE     Wald     df   Sig.   Exp(B)   95.0% CI for Exp(B)
Diabetes   .710   .204   12.066   1    .001   2.034    1.363–3.036

Cosgrave et al, AJC 2005

Harrell C index

Learning milestones
• Key concepts
• Bivariate analysis
• Complex bivariate analysis
• Multivariable analysis
• Specific advanced methods

Question: when there are many confounding covariates to adjust for:
– Matching based on many covariates is not practical.
– Stratification is difficult: as the number of covariates increases, the number of strata grows exponentially (1 covariate: 2 strata; 5 covariates: 32, i.e. 2^5, strata).
– Regression adjustment may not be possible; potential problem: over-fitting.

Propensity score
• Replace the collection of confounding covariates (age, gender, ejection fraction, risk factors, lesion characteristics, …) with one scalar function of these covariates: 1 composite covariate, the propensity score – a balancing score.

Comparability
[Plot: estimated propensity score (0.0–1.0) by group, Ctl vs Trt – no comparison possible when the distributions do not overlap]

Compare treatments with propensity score
• Three common methods of using the propensity score to adjust results:
– Matching
– Stratification
– Regression adjustment

Goal of a clinical trial is appraisal of…
• Superiority: difference in biologic effect or clinical effect
• Equivalence: lack of meaningful/clinically relevant difference in biologic effect or clinical effect
• Non-inferiority: lack of meaningful/clinically relevant increase in adverse clinical events

Superiority RCT
• Possibly the greatest medical invention ever
• Randomization of an adequate number of subjects ensures prognostically similar groups at study beginning
• If thorough blinding is enforced, groups maintain a similar prognosis even later on (except for the effect of the experiment)
• Sloppiness/cross-over makes arms more similar -> traditional treatment is not discarded
• Per-protocol analysis almost always misleading

Equivalence/non-inferiority RCT
• Completely different paradigm
• Goal is to conclude the new treatment is not "meaningfully worse" than the comparator
• Requires a subjective margin
• Sloppiness/cross-over makes arms more similar -> traditional treatment is more likely to be discarded
• Per-protocol analysis possibly useful to analyze safety, but the bulk of the analysis is still based on the intention-to-treat principle

Superiority, equivalence or non-inferiority?
Vassiliades et al, JACC 2005

Possible outcomes in a non-inferiority trial (observed difference & 95% CI):
A – Superior
B – Non-inferior
C – Non-inferior
D – Tricky (& rare)
E – Inconclusive
F – Inconclusive
G – Inferior, but…
H – Inferior
[Plot: treatment difference on the horizontal axis, from 0 to the margin (Delta); new treatment better to the left, new treatment worse to the right]

Typical non-inferiority design
Hiro et al, JACC 2009

Cumulative meta-analysis
Antman et al, JAMA 1992

Meta-analysis of intervention studies
De Luca et al, EHJ 2009

Funnel plot
Review: Late percutaneous coronary intervention for infarct-related artery occlusion
Comparison: 01 Late percutaneous coronary intervention vs best medical therapy for infarct-related artery occlusion
Outcome: 01 Death
[Funnel plot: OR (fixed), 0.1–10, on the horizontal axis; SE(log OR), 0.0–1.6, on the vertical axis]

Indirect and network meta-analyses
• Indirect
• Direct plus indirect (i.e. network)
Jansen et al, ISPOR 2008

Resampling
• Resampling refers to the use of the observed data, or of a data-generating mechanism (such as a die or a computer-based simulation), to produce new hypothetical samples, the results of which can then be analyzed.
• The term computer-intensive methods is also frequently used to refer to techniques such as these.

Bootstrap
• The bootstrap is a modern, computer-intensive, general-purpose approach to statistical inference, falling within a broader class of resampling methods.
• Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution.
• One standard choice for an approximating distribution is the empirical distribution of the observed data.

Jackknife
• Jackknifing is a resampling method based on the creation of several subsamples by excluding a single case at a time.
• Thus, there are only N jackknife samples for any given original sample with N cases.
• After the systematic recomputation of the statistic of choice is completed, a point estimate and an estimate of the variance of the statistic can be calculated.

The Bayes theorem
The main feature of Bayesian statistics is that it takes into account prior knowledge of the hypothesis.

Bayes theorem

P(H | D) = P(D | H) × P(H) / P(D)

where:
• P(H | D) = posterior (conditional) probability of hypothesis H given data D
• P(D | H) = likelihood of the hypothesis (conditional probability of the data given H)
• P(H) = prior (marginal) probability of the hypothesis
• P(D) = probability of the data (marginal probability; the normalizing constant)

Thus it relates the conditional and marginal probabilities of two random events, and it is often used to compute posterior probabilities given observations.

Frequentists vs Bayesians
"Classical" statistical inference vs Bayesian inference

Before the next module, a question for you: who is a Bayesian?
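Before moving on, the bootstrap and jackknife described above can be sketched in a few lines of plain Python. The 10-value sample is invented; the statistic of interest here is the mean, for which the jackknife standard error coincides exactly with the classical s/√n.

```python
import random
import statistics

# Toy sample (hypothetical data, e.g. a biomarker measured in 10 patients)
sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.7, 4.4, 6.3, 5.0]
n = len(sample)

# Bootstrap: resample WITH replacement many times and look at the
# spread of the recomputed statistic (here, the mean).
random.seed(1)
boot_means = []
for _ in range(2000):
    resample = [random.choice(sample) for _ in range(n)]
    boot_means.append(statistics.mean(resample))
boot_se = statistics.stdev(boot_means)

# Jackknife: leave one case out at a time -> exactly n subsamples.
jack_means = [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)]
jack_bar = statistics.mean(jack_means)
jack_var = (n - 1) / n * sum((m - jack_bar) ** 2 for m in jack_means)
jack_se = jack_var ** 0.5

# For the mean, the jackknife SE equals the classical s/sqrt(n),
# and the bootstrap SE should come out close to it.
classical_se = statistics.stdev(sample) / n ** 0.5
print(boot_se, jack_se, classical_se)
```

The same two loops work unchanged for statistics with no convenient standard-error formula (a median, a C index, …), which is exactly where resampling earns its keep.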
A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

JMP Statistical Discovery Software
• JMP is a software package first developed by John Sall, co-founder of SAS, to perform simple and complex statistical analyses.
• It dynamically links statistics with graphics to interactively explore, understand, and visualize data. This allows you to click on any point in a graph and see the corresponding data point highlighted in the data table and in other graphs.
• JMP provides a comprehensive set of statistical tools as well as design of experiments and statistical quality control in a single package.
• JMP allows for custom programming and script development via JSL, originally known as "John's Scripting Language".
• An add-on, JMP Genomics, comes with over 100 analytic procedures to facilitate the treatment of data involving genetics, microarrays or proteomics.
• Pros: very intuitive, lean package for design and analysis in research.
• Cons: less complete and less flexible than the complete SAS system.
• Price: €€€€.

R
• R is a programming language and software environment for statistical computing and graphics, and it is an implementation of the S programming language with lexical scoping semantics.
• R is widely used for statistical software development and data analysis. Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface, though several graphical user interfaces are available.
• Pros: flexibility and programming capabilities (eg for bootstrap), sophisticated graphical capabilities.
• Cons: complex and user-unfriendly interface.
• Price: free.

S and S-Plus
• S-PLUS is a commercial package sold by TIBCO Software Inc., with a focus on exploratory data analysis, graphics and statistical modeling.
• It is an implementation of the S programming language. It features object-oriented programming capabilities and advanced analytical algorithms (eg for robust regression, repeated measurements, …).
• Pros: flexibility and programming capabilities (eg for bootstrap), user-friendly graphical user interface.
• Cons: complex matrix programming environment.
• Price: €€€€-€€.

SAS
• SAS (originally Statistical Analysis System, 1968) is an integrated suite of platform-independent software modules provided by SAS Institute (1976, Jim Goodnight and Co).
• The functionality of the system is very complete and built around four major tasks: data access, data management, data analysis and data presentation.
• Applications of the SAS system include: statistical analysis, data mining, forecasting; report writing and graphics; operations research and quality improvement; applications development; data warehousing (extract, transform, load).
• Pros: very complete tool for data analysis, flexibility and programming capabilities (eg for Bayesian, bootstrap, conditional, or meta-analyses), large volumes of data.
• Cons: complex programming environment, labyrinth of modules and interfaces, very expensive.
• Price: €€€€-€€€€.

Statistica
• STATISTICA is a powerful statistics and analytics software package developed by StatSoft, Inc.
• Provides a wide selection of data analysis, data management, data mining, and data visualization procedures. Features of the software include basic and multivariate statistical analysis, quality control modules and a collection of data mining techniques.
• Pros: extensive range of methods, user-friendly graphical interface; has been called "the king of graphics".
• Cons: limited flexibility and programming capabilities, labyrinth.
• Price: €€€€.

SPSS
• SPSS (originally, Statistical Package for the Social Sciences) is a computer program used for statistical analysis, released in its first version in 1968 and now distributed by IBM.
• SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others.
• Pros: extensive range of tests and procedures, user-friendly graphical interface.
• Cons: limited flexibility and programming capabilities.
• Price: €€€€.

Stata
• Stata (name formed by blending "statistics" and "data") is a general-purpose statistical software package created in 1985 by StataCorp.
• Stata's full range of capabilities includes: data management, statistical analysis, graphics generation, simulations, and custom programming. Most meta-analysis tools were first developed for Stata, and thus this package offers one of the most extensive libraries of statistical tools for systematic reviewers.
• Pros: flexibility and programming capabilities (eg for bootstrap, or meta-analyses), sophisticated graphical capabilities.
• Cons: relatively complex interface.
• Price: €€-€€€.

WinBUGS and OpenBUGS
• WinBUGS (Windows-based Bayesian inference Using Gibbs Sampling) is a statistical software package for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods, developed by the MRC Biostatistics Unit at the University of Cambridge, UK. It is based on the BUGS (Bayesian inference Using Gibbs Sampling) project, started in 1989.
• OpenBUGS is the open-source variant of WinBUGS.
• Pros: flexibility and programming capabilities.
• Cons: complex interface.
• Price: free.

Take home messages
• Advanced statistical methods are best seen as a set of modular tools which can be applied and tailored to the specific task of interest.
• The concept of the generalized linear model highlights how most statistical methods can be considered part of a broader family of methods, depending on the specific framework or link function.

Many thanks for your attention!
For any query: [email protected] / [email protected]
For these slides and similar slides: http://www.metcardio.org/slides.html
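As a closing aside, the MCMC machinery that WinBUGS/OpenBUGS automate can be sketched in plain Python. This is a random-walk Metropolis sampler (a simpler cousin of the Gibbs sampler BUGS uses) for the posterior of a proportion, with a flat Beta(1,1) prior and invented data: 7 events in 10 patients.

```python
import math
import random

# Toy data (hypothetical): 7 events in 10 patients; flat Beta(1,1) prior on p.
successes, trials = 7, 10

def log_posterior(p):
    # log of p^s * (1-p)^(n-s), up to a constant (the flat prior drops out)
    if not 0.0 < p < 1.0:
        return float("-inf")
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

random.seed(42)
p_current = 0.5
samples = []
for _ in range(20000):
    # Random-walk proposal around the current value
    p_proposal = p_current + random.gauss(0.0, 0.1)
    # Metropolis acceptance rule: accept with probability min(1, ratio)
    log_ratio = log_posterior(p_proposal) - log_posterior(p_current)
    if random.random() < math.exp(min(0.0, log_ratio)):
        p_current = p_proposal
    samples.append(p_current)

# Discard burn-in, then summarize the posterior
kept = samples[5000:]
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))  # analytic posterior mean is 8/12 ≈ 0.667
```

For this conjugate toy problem the posterior is known exactly, so the chain is only a check on the method; BUGS earns its keep on hierarchical models where no closed form exists.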