Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Overview of Hypothesis Testing Laura Lee Johnson, Ph.D. Statistician Chapter 21 (parts of 18 and 24), 3rd Edition Objectives • Formulate questions using P-values Confidence intervals Type I and Type II errors • Identity a few commonly used statistical tests for comparing two groups • Define testable hypotheses to address questions of interest • Interpret hypothesis tests and confidence intervals Read Chapter 21 • Several important elements of hypothesis testing will not be covered in this lecture Covered in the textbook chapter Covered in a video lecture with handouts found at https://ippcr.nihtraining.com/lecture_detail .php?lecture_id=216&year=2013 Read Journals • Nature Methods: Points of Significance (typically by Krzywinski and Altman) • BMJ: Endgames (several are design or statistical in nature; look in particular for the ones by Philip Sedgwick) Use Hypothesis Testing • Analyze the data to find results Programs and formulas not presented here in detail • Software makes it seem like anyone can analyze the data, but be careful Randomized Trial to Determine Effects of a Low Glycemic Index Diet in Pregnancy • Population: at risk women/fetuses Women in second pregnancy Previously delivered infant weighing greater than 4000g • Outcomes of interest Birth weight of infants (mean) Primary outcome Incidence of infant macrosomia (yes/no) Gestational weight gain Maternal glucose intolerance BMJ 2012;345:e5605 Jennifer Walsh et al Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial BMJ 2012;345:e7951 (Philip Sedgwick discussion) Statistical Inference • Inferences about a population are made on the basis of results obtained from a sample drawn from that population • Want to talk about the larger population from which the subjects are drawn, not the particular subjects! Use Hypothesis Testing • Designing a study • Reviewing the design of other studies Grant or application review (e.g. NIH study section, IRB) • Interpreting study results • Interpreting others’ study results Reviewing a manuscript or journal Interpreting the news Outline Estimation and Hypotheses • How to Test Hypotheses • Confidence Intervals • Regression • Error • Diagnostic Testing • Questions & Appendix Analysis Follows Design Questions → Hypotheses → Experimental Design → Samples → Data → Analyses →Conclusions What Do We Test • Effect or Difference we are interested in Difference in Means or Proportions Odds Ratio (OR) Relative Risk (RR) Correlation Coefficient • Clinically important difference Smallest difference considered biologically or clinically relevant • Medicine: usually 2 group comparison of population means Estimation: From the Sample • Point estimation Mean Median Change in mean/median • Interval estimation Variation (e.g. range, σ2, σ, σ/√n) 95% Confidence interval (CI) Pictures, Not Numbers • • • • Scatter plots (can be useful) Histograms (can be useful) Box plots (can be useful) Bar plots PLEASE DO NOT USE! Use a table instead • Not Estimation See the data and check assumptions Z or Standard Normal Distribution 2 of the Continuous Distributions • Normal/Gaussian distribution: N( μ,σ2) μ = mean,σ2 = variance Z or standard normal = N(0,1) • t distribution: t = degrees of freedom (df) Usually a function of sample size Mean = X (sample mean) Variance = s2 (sample variance) Binary Distribution • Two options Yes/no • Binomial distribution: B (n, p) Sample size = n Proportion ‘yes’ = p Mean = np Variance = np(1-p) • Can do exact or use Normal Many More Distributions • Not going to cover Poisson Log normal Gamma Beta Weibull Many more Hypothesis Testing • Null hypothesis (H0) • Alternative hypothesis (H1 or Ha) Superiority Studies • For example Average systolic blood pressure (SBP) on Drug A is different than average SBP on Drug B Alternative Hypothesis H1 Ha HA • • • • There is an effect What you want to prove If equivalence trial, special way to do this Contradicts the null Null Hypothesis • Null of that? Usually that there is no effect Difference in the Means = 0 Correlation Coefficient = 0 [do not use] Odds Ratio (OR) = 1 Relative Risk (RR) = 1 • Sometimes compare to a fixed value so Null Mean systolic blood pressure = 120 mmHg • If an equivalence trial, look at NEJM paper or other specific resources Example Hypotheses • H0: μ1 = μ2 Mean birth weight (g) of babies born to mothers in the low glycemic diet group = Mean birth weight (g) of babies born to mothers in the non-diet (control) group • HA: μ1 ≠ μ2 Two-sided test • HA: μ1 < μ2 One-sided test 1 vs. 2 Sided Tests • Two-sided test No a priori reason 1 group should have stronger effect Difference in any direction is notable Used for most tests • One-sided test Specific interest in only one direction Not scientifically relevant/interesting if reverse situation true Use a 2-Sided Test • Almost always • If you use a one-sided test Explain yourself Penalize yourself on the alpha 0.05 2-sided test becomes a 0.025 1sided test Never “Accept” Anything • Reject the null hypothesis • Fail to reject the null hypothesis • Failing to reject the null hypothesis does NOT mean the null (H0) is true • Failing to reject the null means Not enough evidence in your sample to reject the null hypothesis In one sample saw what you saw Deviation from the null might be too small to detect reliably with the study’s sample size Outline Estimation and Hypotheses How to Test Hypotheses • Confidence Intervals • Regression • Error • Diagnostic Testing • Questions & Appendix Experiment Develop hypotheses Collect sample/Conduct experiment • Calculate test statistic • Compare test statistic with what is expected when H0 is true Information at Hand • 1 or 2 sample test? • Outcome variable Binary, Categorical, Ordered, Continuous, Survival • Population • Numbers (e.g. mean, standard deviation) Example: Hypertension/Cholesterol • 20-74 years old men • Mean cholesterol hypertensive men • Mean cholesterol in male general (normotensive) population • In the 20-74 year old male normotensive population the mean serum cholesterol is 211 mg/ml with a standard deviation of 46 mg/ml One Sample: Cholesterol Sample Data • Have data on 25 hypertensive men • Mean serum cholesterol level is 220mg/ml Point estimate of the mean X • Sample standard deviation: s = 38.6 mg/ml Point estimate of the variance = s2 Compare Sample to Population • Is 25 enough? Sample Size and Power lecture • What difference in cholesterol is clinically or biologically meaningful? • Have an available sample and want to know if hypertensives are different than general population Cholesterol Hypotheses • H0: μ1 = μ2 • H0: μ = 211 mg/ml μ = POPULATION mean serum cholesterol for male hypertensives Mean cholesterol for hypertensive men = mean for general male population • HA: μ1 ≠ μ2 • HA: μ ≠ 211 mg/ml Cholesterol Sample Data • Population information (general) μ = 211 mg/ml σ = 46 mg/ml (σ2 = 2116) • Sample information (hypertensives) X = 220 mg/ml s = 38.6 mg/ml (s2 = 1489.96) N = 25 Experiment Develop hypotheses Collect sample/Conduct experiment Calculate test statistic • Compare test statistic with what is expected when H0 is true Test Statistic • Basic test statistic for a mean test statistic = point estimate of - target value of point estimate of • σ = standard deviation (sometimes use σ/√n) • For 2-sided test: Reject H0 when the test statistic is in the upper or lower 100*α/2% of the reference distribution • What is α? Vocabulary • Types of errors Type I (α) (false positives) Type II (β) (false negatives) • Related words Significance Level: α level Power: 1- β Source of Image: Effect Size FAQs by Paul Ellis http://tinyurl.com/pn2dt68 Lydia Flynn has a nice video also explaining type I and II errors at https://www.youtube.com/watch?v=Dsa9ly4OSBk How to test? Z test or Critical Value [Chapter 21] N(0,1) distribution and alpha t test or Critical Value [Chapter 21] t distribution and alpha P-value • Confidence interval P-value • Smallest α the observed sample would reject H0 • Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sample • MUST be based on a model Normal, t, binomial, etc. Cholesterol Example • • • • • P-value for two sided test X = 220 mg/ml, σ = 46 mg/ml n = 25 H0: μ = 211 mg/ml HA: μ ≠ 211 mg/ml 2* P[ X 220] 0.33 Determining Statistical Significance: P-Value Method • Compute the exact p-value (0.33) • Compare to the predetermined α-level (0.05) • If p-value < predetermined α-level Reject H0 Results are statistically significant • If p-value > predetermined α-level Do not reject H0 Results are not statistically significant Unknown Truth and the Data Truth H0 Correct HA Correct Data Decide H0 1- α β “fail to reject True Negative False Negative H0” α 1- β Decide HA “reject H0” False Positive True Positive α = significance level 1- β = power P-value Interpretation Reminders • Measure of the strength of evidence in the data that the null is not true • A random variable whose value lies between 0 and 1 • NOT the probability that the null hypothesis is true. Other Methods to Get p-Values • Permutation test • Bootstrap Type I Error • α = P( reject H0 | H0 true) • Probability reject the null hypothesis given the null is true • False positive • Probability reject that hypertensives’ µ=211mg/ml when in truth the mean cholesterol for hypertensives is 211 Type II Error (or, 1- Power) • β = P( do not reject H0 | H1 true ) • False Negative • Probability we NOT reject that male hypertensives’ cholesterol is that of the general population when in truth the mean cholesterol for hypertensives is different than the general male population Power • Power = 1-β = P( reject H0 | H1 true ) • Everyone wants high power, and therefore low Type II error Is α or β more important ? • Depends on the question • Most will say protect against Type I error Multiple comparisons • Need to think about individual and population health implications and costs Multiple Hypothesis Testing • Adjust critical level of significance • Try to reduce probability of a type I error • Desire overall type I error rate no greater than alpha across all hypothesis tests (not only for each individual test) Usually done across the primary outcomes • Assumption of independent outcomes Rarely the case Err toward non-significance Inflate type II error rate • Which inflation is more damaging? How to test? Z test or Critical Value N(0,1) distribution and alpha t test or Critical Value t distribution and alpha P-value • Confidence interval Outline Estimation and Hypotheses How to Test Hypotheses Confidence Intervals • Regression • Error • Diagnostic Testing • Questions & Appendix Hypothesis Testing and Confidence Intervals • Hypothesis testing focuses on where the sample mean is located • Confidence intervals focus on plausible values for the population mean • In general, the best way to estimate a confidence interval is to bootstrap (details: see a statistician) CI Interpretation • Cannot determine if a particular interval does/does not contain true mean • Can say in the long run Take many samples Same sample size From the same population 95% of similarly constructed confidence intervals will contain true mean • Think about meta analyses Interpret a 95% Confidence Interval (CI) for the population mean, μ • “If we were to find many such intervals, each from a different random sample but in exactly the same fashion, then, in the long run, about 95% of our intervals would include the population mean, μ, and 5% would not.” Do NOT interpret a 95% CI… • “There is a 95% probability that the true mean lies between the two confidence values we obtained from a particular sample” • “We can say that we are 95% confident that the true mean does lie between these two values.” • Also NOTE: Overlapping CIs do NOT imply non-significance 14 Week Change Outcome Fat-free body mass (kg) BMI Corn-Soy Fortified Blend Spread Mean (95% CI) Mean (95% CI) 2.9 2.2 (2.50, 3.30) (1.82, 2.58) 2.2 1.7 (1.96, 2.44) (1.49, 1.91) BMJ 2009; 338:b1867 Ndekha MJ et al Supplementary feeding with either ready-to-use fortified spread or cornsoy blend in wasted adults starting antiretroviral therapy in Malawi Which of the following statements, if any, are true? A. The difference in BMI between treatment groups was significant at the 5% level because the 95% confidence intervals for the two groups did not overlap B. The difference in fat-free body mass between treatment groups was not significant at the 5% level because the 95% confidence intervals for the two groups overlapped BMJ 2014; 349:g5196 18 August 2014 Philp Sedgwick 14 Week Change: part A Outcome Fat-free body mass (kg) BMI Corn-Soy Fortified Blend Spread Mean (95% CI) Mean (95% CI) 2.9 2.2 (2.50, 3.30) (1.82, 2.58) 2.2 1.7 (1.96, 2.44) (1.49, 1.91) BMJ 2009; 338:b1867 Ndekha MJ et al Supplementary feeding with either ready-to-use fortified spread or cornsoy blend in wasted adults starting antiretroviral therapy in Malawi TRUE! • Can say statistically significant difference at 5% level Because the 95% confidence intervals for the two groups did not overlap Non-overlapping confidence intervals Which of the following statements, if any, are true? A. The difference in BMI between treatment groups was significant at the 5% level because the 95% confidence intervals for the two groups did not overlap B. The difference in fat-free body mass between treatment groups was not significant at the 5% level because the 95% confidence intervals for the two groups overlapped BMJ 2014; 349:g5196 18 August 2014 Philp Sedgwick 14 Week Change: part B Outcome Fat-free body mass (kg) BMI Corn-Soy Fortified Blend Spread Mean (95% CI) Mean (95% CI) 2.9 2.2 (2.50, 3.30) (1.82, 2.58) 2.2 1.7 (1.96, 2.44) (1.49, 1.91) BMJ 2009; 338:b1867 Ndekha MJ et al Supplementary feeding with either ready-to-use fortified spread or cornsoy blend in wasted adults starting antiretroviral therapy in Malawi FALSE! • The 95% confidence intervals for each group’s mean overlap? Difference in fat-free body mass between treatment groups Might be statistically significantly different Might not be We do not know OK – We know Because We Can Calculate the 95% Confidence Interval of the Mean Difference • Mean difference = 0.7 kg • Using data from the paper • 95% CI = (0.2 kg, 1.2 kg) 95% CI around the mean difference does NOT include 0 The difference in fat-free body mass between treatment groups was statistically significant at the 5% level Interpreting a Single CI for a Single Study • Provide reasonable ranges, based on the sample data, that should be considered when powering future studies • Use for comparisons and looking at direction of study results • Look at the CI for the correct value Hypothesis about group differences Do not separately evaluate CI for each group • Do not force writers to articulate a CI in words General Formula (1-α)% CI for μ Z1 / 2 Z1 / 2 X , X n n • Construct an interval around the point estimate • Look to see if the population/null mean is inside But I Have All Zeros! Calculate 95% upper bound • Known # of trials without an event (2.11 van Belle 2002, Louis 1981) • Given no observed events in n trials, 95% upper bound on rate of occurrence is 3 / (n + 1) No fatal outcomes in 20 operations 95% upper bound on rate of occurrence = 3 / (20 + 1) = 0.143, so the rate of occurrence of fatalities could be as high as 14.3% Take Home: Hypothesis Testing • Many ways to test Rejection interval [book] Z test, t test, or Critical Value [book] P-value Confidence interval • For this, all ways for same (regression) method will agree If not: math wrong, rounding errors, assumption violation? • Make sure interpret correctly Take Home Hypothesis Testing • How to turn questions into hypotheses • Failing to reject the null hypothesis DOES NOT mean that the null is true • Every test has assumptions A statistician can check all the assumptions If the data does not meet the assumptions there are non-parametric versions of tests (see text) Non-parametric tests also have assumptions Take Home: CI • Meaning/interpretation of the CI • Give us some idea of the size of the difference between two groups and the direction of the difference • In practice use Bootstrap Take Home: Vocabulary • • • • • Null Hypothesis: H0 Alternative Hypothesis: H1 or Ha or HA Significance Level: α level Statistically Significant P-value, Confidence Interval Outline Estimation and Hypotheses How to Test Hypotheses Confidence Intervals Regression • Error • Diagnostic Testing • Questions & Appendix Regression • Continuous outcome Linear • Binary outcome Logistic • Many other types Warning! • New meaning for the letter β on the next several slides How a Statistician Sees a Research Study • Model: y = β0 + β1x1 + β2x2 + ε • Y is called outcome, endpoint, but not CAUSAL • Coefficient = Coef = β ≈ Association • Variable (in model) = Covariate = x Treatment Age Gender • Everything impacts the statistical analysis Linear Regression • Model for simple linear regression Yi = β0 + β1x1i + εi β0 = intercept β1 = slope • Hypothesis testing H0: β1 = 0 vs. HA: β1 ≠ 0 In Order of Importance 1. 2. 3. Independence: Observations are independent Equal variance Normality: Normally distributed error terms with constant variance (for ANOVA and linear regression) More Than One Covariate • Yi = β0 + β1x1i + β2x2i + β3x3i + εi • Systolic Blood Pressure = β0 + β1 Drug + β2 Male + β3 Age • β1 Association between Drug and SBP Average difference in SBP between the Drug and Control groups, given sex and age Testing? • Each cefficient β has a p-value associated with it • Each model will have an F-test • Other methods to determine fit Residuals • See a statistician and/or take a biostatistics class. Or 3. Repeated Measures (3 or more time points) • Do NOT use repeated measures AN(C)OVA Assumptions quite stringent • Talk to a statistician Mixed model Generalized estimating equations Other An Aside: Correlation • Range: -1 to 1 • Test is correlation is ≠ 0 • With N=1000, easy to have highly significant (p<0.001) correlation = 0.05 Statistically significant that is No where CLOSE to meaningfully different from 0 • Partial Correlation Coefficient Do Not Use Correlation. Use Regression • Some fields: Correlation still popular Partial regression coefficients • High correlation is > 0.8 (in absolute value). Maybe 0.7 • Never believe a p-value from a correlation test • Regression coefficients are more meaningful Outline Estimation and Hypotheses How to Test Hypotheses Confidence Intervals Regression Error • Diagnostic Testing • Questions & Appendix Omics, Imaging, Testing Millions of Things at Once • False negative (Type II error) Miss what could be important Are these samples going to be looked at again? • False positive (Type I error) Waste resources following dead ends What do you need to think about? • Where is the evidence? Will this be the last study or will there be other studies, regardless of your findings? • Implications on science, public health, health care, policy If false negative result If false positive result • These answers will help guide you as to what amount of error you are willing to tolerate in your trial design Outline Estimation and Hypotheses How to Test Hypotheses Confidence Intervals Regression Error Diagnostic Testing • Questions & Appendix Little Diagnostic Testing Lingo • False Positive/False Negative (α, β) • Positive Predictive Value (PPV) Probability diseased given POSITIVE test result • Negative Predictive Value (NPV) Probability NOT diseased given NEGATIVE test result • Predictive values depend on disease prevalence Sensitivity, Specificity • Sensitivity: how good is a test at correctly IDing people who have disease Can be 100% if you say everyone is ill (all have positive result) Useless test with bad Specificity • Specificity: how good is the test at correctly IDing people who are well Example: Western vs. ELISA • • • • 1 million people ELISA Sensitivity = 99.9% ELISA Specificity = 99.9% 1% prevalence of infection 10,000 positive by Western (gold standard) 9990 true positives (TP) by ELISA 10 false negatives (FN) by ELISA Medical University South Carolina HIV test example Similar example http://www.cdc.gov/hiv/testing/lab/clia/rtcounseling.html CDC HIV Counseling with Rapid Tests information 1% Prevalence • 990,000 not infected 989,010 True Negatives (TN) 990 False Positives (FP) • Without confirmatory test Tell 990 or ~0.1% of the population they are infected when in reality they are not PPV = 91%, NPV = 99.999% 1% Prevalence • 10980 total test positive by ELISA 9990 true positive 990 false positive • 9990/10980 = probability diseased GIVEN positive by ELISA = PPV = 0.91 = 91% • 989,020 total test negatives by ELISA 989,010 true negatives 10 false negatives • 989010/989020 = NPV = 99.999% 0.1% Prevalence • 1,000 infected – ELISA picks up 999 1 FN • 999,000 not infected 989,001 True Negatives (TN) 999 False Positives (FP) • Positive predictive value = 50% • Negative predictive value = 99.999% Prevalence Matters (Population You Sample to Estimate Prevalence, too) • Numbers look “good” with high prevalence Testing at STD clinic in high risk populations • Low prevalence means even very high sensitivity and specificity will result in middling PPV • Calculate PPV and NPV for 0.01% prevalence found in United States female blood donors 10% Prevalence • 99% PPV • 99.99% NPV Prevalence Matters • PPV and NPV tend to come from good cohort data • Can estimate PPV/NPV from case control studies but the formulas are hard and you need to be REALLY sure about the prevalence Triple sure High OR Does Not a Good Test Make • Diagnostic tests need separation • ROC curves Not logistic regression with high OR • Strong association between 2 variables does not mean good prediction of separation Analysis Follows Design Questions → Hypotheses → Experimental Design → Samples → Data → Analyses →Conclusions Outline Estimation and Hypotheses How to Test Hypotheses Confidence Intervals Regression Error Diagnostic Testing Questions Example Questions Appendix Questions? Randomized Trial to Determine Effects of a Low Glycemic Index Diet in Pregnancy • Population: at risk women/fetuses Women in second pregnancy Previously delivered infant weighing greater than 4000g • Outcomes of interest Birth weight of infants (mean) Primary outcome Incidence of infant macrosomia (yes/no) Gestational weight gain Maternal glucose intolerance BMJ 2012;345:e5605 Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial Statistical Testing Information • Independent samples t test of primary outcome of birth weight • Two-tailed hypothesis testing • Critical level of significance (allowable type I error) 0.05 (5%) • Superiority trial BMJ 2012;345:e5605 Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial Interventions and Proportion Birth Weight > 4000g • Interventions Low glycemic diet from early pregnancy 189 of 372 infants (51%) No dietary intervention 199 of 387 infants (51%) • P = 0.88 BMJ 2012;345:e5605 Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial Low Glycemic Control Group (n=387) Diet (n=372*) Mean (95% CI) Mean (95% CI) Birth weight (g) 4034 4006 (3982, 4086) (3956, 4056) Gestational 12.2 13.7 weight gain (kg) (11.8, 12.6) (13.2, 14.2) Outcome *If interested, read the article and later discussion pieces for more on the study. Good lessons are discussed. Not a perfect study. Which of the following statements, if any, are true? A. The difference in birth weight between treatment groups was not significant at the 5% level because the 95% confidence intervals for the two groups overlapped B. The difference in gestational weight gain between treatment groups was significant at the 5% level because the 95% confidence intervals for the two groups did not overlap BMJ 2014; 349:g5196 18 August 2014 Philp Sedgwick Which of the following statements, if any, are true? A. The difference in birth weight between treatment groups was not significant at the 5% level because the 95% confidence intervals for the two groups overlapped B. The difference in gestational weight gain between treatment groups was significant at the 5% level because the 95% confidence intervals for the two groups did not overlap BMJ 2014; 349:g5196 18 August 2014 Philp Sedgwick Low Glycemic Control Group (n=387) Diet (n=372) Mean (95% CI) Mean (95% CI) Birth weight (g) 4034 4006 (3982, 4086) (3956, 4056) Gestational 12.2 13.7 weight gain (kg) (11.8, 12.6) (13.2, 14.2) Outcome Interventions and Birth Weight Results • Interventions Low glycemic diet from early pregnancy Mean birth weight 4034g Standard deviation (SD) 510 No dietary intervention Mean birth weight 4006g, SD = 497 • Mean difference in birth weight = 28.6g • 95% CI = (-45.6 to 102.8) • P = 0.449 • “absence of evidence is not evidence of absence” BMJ 2012;345:e5605 Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial Which of the following statements, if any, are true? A. The difference in birth weight between treatment groups was not significant at the 5% level because the 95% confidence intervals for the two groups overlapped B. The difference in gestational weight gain between treatment groups was significant at the 5% level because the 95% confidence intervals for the two groups did not overlap BMJ 2014; 349:g5196 18 August 2014 Philp Sedgwick Low Glycemic Control Group (n=387) Diet (n=372) Mean (95% CI) Mean (95% CI) Birth weight (g) 4034 4006 (3982, 4086) (3956, 4056) Gestational 12.2 13.7 weight gain (kg) (11.8, 12.6) (13.2, 14.2) Outcome Interventions and Gestational Weight Gain Results • Interventions Low glycemic diet from early pregnancy Mean = 12.2 kg (4.4) No dietary intervention Mean = 13.7 kg (4.9) • Mean difference = -1.3 kg • 95% CI = (-2.4, -0.2) • P = 0.017 BMJ 2012;345:e5605 Low glycaemic index diet in pregnancy to prevent macrosomia (ROLO study): randomised control trial Low Glycemic Control Group (n=387) Diet (n=372) Mean (95% CI) Mean (95% CI) Birth weight (g) 4034 4006 (3982, 4086) (3956, 4056) Gestational 12.2 13.7 weight gain (kg) (11.8, 12.6) (13.2, 14.2) Outcome Which statement best describes the information provided by a 95% confidence interval (CI) for mean gestational weight gain? A. 95% of sample participants in the diet group achieved a weight gain between 11.8 and 12.6 kg B. 95% of the population would achieve a weight gain between 11.8 kg and 12.6 kg if they received the low glycemic diet C. There is a probability of 0.95 that the population mean gestational weight gain would be between 11.8 and 12.6 kg D. There is a probability of 0.95 that the sample mean weight gain for the diet group was between 11.8 and 12.6 kg BMJ 2012; 344:e3147 9 May 2012 Philp Sedgwick Want to continue with these? • Yes? Raise your card Which of the following statements, if any, are true? A. The P value provides a direct statement about the size of the difference between groups in the mean gestational weight gain B. The P value provides a direct statement about the directions of the difference between groups in the mean gestational weight gain C. The P value provides a dichotomous test of significance of the statistical hypothesis D. The 95% confidence interval provides a dichotomous test of significance of the statistical hypothesis BMJ 2013; 346:f3212 17 May 2013 Philp Sedgwick Which of the following statements, if any, are true? A. The alternative hypothesis states that, in the population sampled, treatment with the low glycemic diet is inferior or superior to placebo with regard to the secondary (gestational weight gain) endpoint B. It can be inferred that the null hypothesis was not true BMJ 2014; 348:g3557 30 May 2014 Philp Sedgwick Which of the following statements, if any, are true? A. Statistical hypothesis testing based on a critical level of significance is a dichotomous test B. The P value provides a direct statement about the direction of a difference between treatment groups in mean birth weights C. The P value is the probability that the alternative hypothesis was true BMJ 2014; 349:g4550 11 July 2014 Philp Sedgwick Which of the following statements, if any, are true? A. In the population, it can be inferred that no difference exists between low glycemic diet and the control treatment in mean birth weight B. The lack of significance between treatment groups in birth weight could have been the result of a type II error C. The alternative hypothesis of the statistical hypothesis test for gestational weight gain is true BMJ 2014; 349:g4751 1 August 2014 Philp Sedgwick Which of the following statements, if any, are true? A. A type I error would have occurred if no difference between treatment groups in mean gestational weight gain had existed in the population B. The type I error rate for each statistical hypothesis test was 5% C. The type I error rate for the multiple statistical hypothesis tests was 5% BMJ 2014; 349:g5310 29 August 2014 Philp Sedgwick Additional Materials Favorite Useful Reading BMJ Endgames: Statistical Question • Many by Philip Sedgwick • Randomised controlled trials: inferring significance of treatment effects based on confidence intervals BMJ 2014;349:g5196 (18 August 2014) http://dx.doi.org/10.1136/bmj.g5196 • Pitfalls of statistical hypothesis testing: multiple testing BMJ 2014;349:g5310 (29 August 2014) http://dx.doi.org/10.1136/bmj.g5310 Articles • Comparisons against baseline within randomised groups are often used and can be highly misleading J Martin Bland and Douglas G Altman http://www.trialsjournal.com/content/12/1/2 64 Good Studies and Reporting • A call for transparent reporting to optimize the predictive value of preclinical research by Landis et al http://www.ncbi.nlm.nih.gov/pmc/articles/PMC35 11845/ NIH’s NINDS has more about this on their website • NIH Proposed Principles and Guidelines for Reporting Preclinical Research http://www.nih.gov/about/reporting-preclinicalresearch.htm Articles on Reproducible Research, Best Practices • Rigor or Mortis: Best Practices for Preclinical Research in Neuroscience by Steward and Balice-Gordon http://dx.doi.org/10.1016/j.neuron.2014.10.042 • Preclinical research: Make mouse studies work by Perrin http://www.nature.com/news/preclinicalresearch-make-mouse-studies-work-1.14913 • Policy: NIH to balance sex in cell and animal studies (Nature Comment by Clayton and Collins) http://www.nature.com/news/policy-nih-tobalance-sex-in-cell-and-animal-studies-1.15195 More Resources via NIH • NIH NIMH Enhancing the Reliability of NIMHSupported Research through Rigorous Study Design and Reporting http://www.nimh.nih.gov/researchpriorities/policies/enhancing-the-reliabilityof-nimh-supported-research-throughrigorous-study-design-and-reporting.shtml • NIMH director’s blog on p-hacking (by Insel) http://www.nimh.nih.gov/about/director/2014/ p-hacking.shtml More Articles • False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant by Simmons, Nelson, and Simonsohn http://www.ncbi.nlm.nih.gov/pubmed/22006061 • Common misconceptions about data analysis and statistics by Motulsky http://www.ncbi.nlm.nih.gov/pubmed/25204545 Resources: General Books • Hulley et al (2001) Designing Clinical Research, 2nd ed. LWW • Rosenthal (2006) Struck by Lightning: The curious world of probabilities • Bland (2000) An Introduction to Medical Statistics, 3rd. ed. Oxford University Press • Armitage, Berry and Matthews (2002) Statistical Methods in Medical Research, 4th ed. Blackwell, Oxford • Altman (1991) Practical Statistics for Medical Research. Chapman and Hall Books • Statistical Rules of Thumb by Gerald van Belle (vanbelle.org for updates & monthly rule) • Hosmer and Lemeshow books • Epidemiology by Leon Gordis More Books • Statistical Reasoning in Medicine: The Intuitive P-Value Primer by Lemuel Moye • Designing Clinical Research: An Epidemiologic Approach, edited by Stephen Hulley • Critical Appraisal of Epidemiological Studies and Clinical Trials by Mark Elwood • Sequential trials: Whitehead, J. (1997) The Design and Analysis of Sequential Clinical Trials, revised 2nd. ed. Wiley • Equivalence trials: Pocock SJ. (1983) Clinical Trials: A Practical Approach. Wiley And More Books • Data Monitoring Committees in Clinical Trials: A Practical Perspective by Ellenberg, Fleming, DeMets. • Fundamentals of Clinical Trials by Friedman, Furberg, DeMets • The Statistical Evaluation of Medical Tests for Classification and Prediction by Margaret Sullivan Pepe Normal/Large Sample Data? Yes Inference on means? No Yes Independent? No Yes Variance known? Yes Z test Inference on variance? Yes Paired t F test for variances No Variances equal? Yes T test w/ pooled variance No T test w/ unequal variance Normal/Large Sample Data? No Yes Binomial? Independent? No No Nonparametric test Yes McNemar’s test Expected ≥5 Yes 2 sample Z test for proportions or contingency table No Fisher’s Exact test