RESEARCH PROJECT PLANNING: SAMPLE SIZE AND STATISTICAL POWER; STATISTICS OVERVIEW

Catherine R. Messina PhD
Research Associate Professor
Department of Family, Population & Preventive Medicine
September 28, 2016

CONSIDER ...
* Dr. X compares a new method for treating diaper rash to the usual care for this condition.
* Dr. X randomly assigns 5 infants to the new method group and 5 infants to the usual care group (10 infants total).
* Study findings favor the new method as more effective than usual care. The p value for this comparison is 0.08.
* How do you interpret this situation?

CONSIDER ...
When comparing groups and the value for p is > 0.05:
* There may truly BE NO effect, or
* There may truly BE an effect in the population, but your statistical test, which is based on your sample, suggests no significant effect.

STATISTICAL POWER
* The ability to detect a significant difference of a specific magnitude (i.e., effect size) between groups, if it actually exists
* Minimum acceptable statistical power for a proposed study is set at 80%: that is, we want at least an 80% chance that a difference that really exists will show up as a statistically significant finding (we tolerate at most a 20% chance of missing it)
* What influences statistical power?

STATISTICAL POWER
* Statistical power is directly related to sample size, effect size, and alpha
  * Power increases as effect size increases, for a given sample size
  * Power increases as sample size increases, for a given effect size
  * Power increases as alpha increases (alpha is typically set at 0.05)
* Power is inversely related to variability
  * Power decreases as variability increases
* What is alpha? The threshold at which statistical significance is reached: that is, the risk of concluding that there is a difference when one does not exist cannot exceed 5%.

[Figure: Power vs. effect size when sample size is fixed. Y-axis: power (0 to 1, reference line at 0.80); x-axis: effect size]
[Figure: Power vs. sample size when effect size is fixed. Y-axis: power (0 to 1, reference line at 0.80); x-axis: sample size]
[Figure: Power vs. alpha. Y-axis: power (0 to 1, reference line at 0.80); x-axis: alpha (0 to 1, 0.05 marked)]
[Figure: Power vs. variability when effect size is fixed. Y-axis: power (0 to 1, reference line at 0.80); x-axis: variability]

ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
Critical aspect of research planning
* The size of the sample can influence your ability to detect meaningful differences between study groups
* An underpowered study makes it hard to detect real differences
* It is not always a good idea to base your sample size on the prior literature
  * Many published studies actually have very low statistical power
  * If the power of a published study was 50%, then it had only a 50% probability of finding an effect if it really existed
  * If you use the same sample size, then you may only have a 50% chance of replicating that effect

ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
Critical aspect of research planning (cont'd)
* An underpowered study also makes it hard to interpret differences that appear to be real: the lower the power of a study, the lower the probability that an observed effect that reaches statistical significance (e.g., p < 0.05) actually reflects a true effect. (Ioannidis, JP (2005); Ioannidis, JP, Tarone, R, McLaughlin, JK (2011))

[Figure annotation: Stop collecting data here?]
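The relationships described above (power rising with effect size, sample size, and alpha) can be sketched numerically. The following is a minimal illustration rather than the method used in the talk: it assumes a two-sided comparison of two group means with equal group sizes, a standardized effect size (mean difference divided by the standard deviation), and a normal approximation that is somewhat optimistic for very small samples. The function names `approx_power` and `n_per_group_for` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    effect_size is standardized (mean difference / SD). This uses a normal
    approximation, so it overstates power slightly when n is very small.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for(effect_size, power=0.80, alpha=0.05):
    """Approximate sample size per group needed to reach the target power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Assumed standardized effect of 0.5; not a value taken from the slides.
    for n in (5, 20, 63, 100):
        print(f"n per group = {n:3d}  power = {approx_power(0.5, n):.2f}")
    print("n per group for 80% power:", n_per_group_for(0.5))
```

Under these assumptions, a study with 5 participants per group, as in the diaper rash example, has power far below 80% for a standardized effect of 0.5, which is one way to read a p value of 0.08 from such a small sample.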
ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
Critical aspect of research planning (cont'd)
* Not only tells you how many participants you need; it also tells you how many you don't need
  * Saves resources
  * Ethical considerations
  * An over-powered study can increase the risk of detecting meaningless differences

ESTIMATING PARAMETERS NEEDED FOR POWER CALCULATIONS
Sample size calculations require:
* an estimate of power (not less than 80%)
* an alpha level (typically 0.05)
* an estimate of effect size (this should be the smallest difference that is clinically significant)
* an estimate of population variability (e.g., standard deviations); note that as sample size increases, the variability of the sample estimates (sampling error) decreases
  * Estimate from pilot data
  * Estimate from prior studies using the same outcome measure; if there is more than one study, you can use the average SD
* Estimates should be close to true values but don't need to be perfect
* Qualified guesstimate:
  * Preliminary results / pilot data
  * Other studies / published literature
  * Smallest clinically relevant effect

ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
Critical aspect of research planning
* The size of the sample can influence the representativeness of the study sample
* "Representativeness": how well does the sample represent the population?
* NOTE: having enough people in your sample does not necessarily guarantee representativeness if sample selection was biased in some way

ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
Provided sample selection is unbiased:
* In general, the larger the sample, the greater the likelihood that study findings will accurately reflect the population, because larger samples have lower sampling error
* Sampling error = differences between the sample and the population that are due solely to the particular sample that happens to have been selected
* As sample size increases, sampling error decreases.
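The shrinking of sampling error with sample size can be illustrated with the standard error of a sample mean, SD / sqrt(n). This is a sketch under stated assumptions: the population SD is treated as known, a normal approximation is used for the confidence interval, and the function names `standard_error` and `ci_half_width` as well as the SD of 10 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of a sample mean for a given SD and sample size."""
    return sd / sqrt(n)


def ci_half_width(sd, n, confidence=0.95):
    """Half-width (margin of error) of a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * standard_error(sd, n)


if __name__ == "__main__":
    # Assumed SD of 10 units; quadrupling n halves the standard error.
    for n in (25, 100, 400):
        print(f"n = {n:3d}  SE = {standard_error(10, n):.2f}  "
              f"95% margin = {ci_half_width(10, n):.2f}")
```

Because the standard error falls with the square root of n, halving the margin of error takes roughly four times as many participants.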
ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS
The size of a representative sample is based on the level of precision and confidence you want for your estimates
* E.g., a 95% confidence level (alpha = 0.05) with high precision (narrow confidence interval) requires a greater sample size than a 95% confidence level with low precision (wide confidence interval)
* E.g., a 95% confidence level (alpha = 0.05) with high precision (narrow confidence interval) requires a greater sample size than a 90% confidence level (alpha = 0.10) with the same high precision (narrow confidence interval)

ESTIMATING THE SAMPLE SIZE
Considerations!
* Sample size too small
  * May not yield precise, reliable findings
  * Clinically significant findings may be missed
* Sample size too big
  * Clinically insignificant findings may emerge as statistically significant due solely to the sample size
  * Waste of resources
  * Unethical
* The sample size you need vs. what is at hand (or your timeline): avoid spending time / resources on a project that may yield very little
* Planning ahead for subgroup analyses

STATISTICAL ANALYSIS PLAN INFORMED BY YOUR RESEARCH QUESTION!!!
* The research question is a general statement of purpose that identifies the focus of the study
  * Are you describing a set of characteristics?
  * Are you evaluating the degree of correlation between 2 measures?
  * Are you comparing measure(s) between 2 or more groups?
* Goes hand in hand with operational (i.e., measurable) definitions of variables of interest and choice of study measures
  * E.g., if you plan to compare means or compare proportions, you need to obtain appropriate data
  * E.g., a cross-sectional study or a repeated measures design: each requires a different statistical approach

STATISTICS: MAJOR TYPES
Descriptive vs. inferential statistics
* Descriptive statistics
  * Describe or summarize data and describe patterns of variability
  * Provide an overview of the attributes of a data set
  * Include:
    * summary statistics (e.g., group size, proportions, ratios, rates)
    * measures of central tendency (e.g., mean, mode, median)
    * measures of dispersion (e.g., range, variance, standard deviation)

QUESTIONS ANSWERED BY DESCRIPTIVE STATISTICS
* What is the mean age of children in the study sample?
* What is the age distribution of children who were vaccinated for flu in the past 5 years?
* What percentage of children were screened for second-hand smoke exposure?

STATISTICS: MAJOR TYPES
* Inferential statistics
  * A set of procedures for generalizing (or inferring) to a population of individuals based on information obtained from a limited number of individuals drawn from that population (i.e., the sample)
  * Provide a measure of how well your data support your hypothesis

APPLYING INFERENTIAL STATISTICS
* Select a test of significance (the method of inference used to support or reject claims based on sample data; think of this as your statistical test of choice)
* Decide whether the significance test will be one-tailed or two-tailed
* Select alpha, the maximum acceptable probability of concluding that an effect exists in the population when the sample result is actually due to chance (usually set at 0.05)
* Compute the test of significance (the actual p value)

RELATEDNESS VS. DIFFERENCES
Does your research question focus on associations (or relationships) among measures, or does it focus on differences between measures or groups?

DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
* Are procedure time and patient age correlated?
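A question such as whether procedure time and patient age are correlated can be answered with a Pearson correlation coefficient. The sketch below computes it from first principles; the function name `pearson_r` and the age and procedure-time values are hypothetical, invented only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: patient age (years) and procedure time (minutes).
    ages = [34, 45, 52, 61, 68, 73]
    minutes = [22, 25, 24, 30, 33, 35]
    print(f"Pearson r = {pearson_r(ages, minutes):.2f}")
```

The coefficient describes only the strength and direction of a linear association in the sample; it says nothing about cause and effect.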
DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
* Determines whether and to what degree a relationship exists between variables
* Quantifies the direction of the relationship (positive or negative)
* Quantifies the strength of the relationship, expressed as a coefficient which ranges from -1 to +1
  * +1 or -1 = perfect correlation; 0 = no correlation
* Pearson correlation coefficient (Pearson r): a measure of correlation used for interval scale data; assumes that the relationship between variables is linear
* Spearman rho: a measure of correlation used for ordinal data
* CORRELATION DOES NOT IMPLY CAUSE / EFFECT
* Does not imply agreement; other measures such as kappa are better for that purpose

DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Simple and multiple linear regression
* Linear regression estimates or predicts values of a dependent variable for any value of one or more independent variables
* Dependent variable (DV; outcome) is continuous
  * Does patient age predict procedure time?
  * Does procedure time increase at a constant rate for each additional year of patient age?
* Used for interval scale data
* Assumes that the relationship between DV and IV is linear (i.e., if the means of the dependent and independent variables were plotted against each other, they would fall on a straight line)
* Cannot imply causation
* "Simple" (univariate) linear regression model: only one independent variable (IV) as predictor of the DV
* Multiple (multivariable) linear regression model: more than one IV as predictors of the DV

TYPES OF REGRESSION MODELS
Logistic regression
* Dependent variable (outcome) is categorical, usually dichotomous (e.g., above median vs. below median)
* Provides odds ratios (also 95% confidence intervals and p values)
* What is the probability that procedure time will be above the median (rather than below) when patients are older compared to younger?
* Example: OR = 1.5 means older patients have 1.5 times the odds of a procedure time above the median compared with younger patients
* Simple or multiple logistic regression modeling
* Does not assume a linear relationship between DV and IV

DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Correlation: nominal data
* Chi-square test of independence
* Categorical data arranged in 2 x 2, 2 x 3, etc. contingency tables
* Data in the cells are frequency counts
* Example: is patient gender associated with flu vaccination?

            Flu vaccine NO    Flu vaccine YES
  Female    10 (29%)          25 (71%)
  Male      18 (53%)          16 (47%)

  χ² = 4.25, p = 0.04

COMPARISONS BETWEEN GROUPS
T-tests: compare means for 2 groups
* Appropriate for interval data
* Independent samples t-test
  * To compare 2 groups that are mutually exclusive
  * E.g., do mean values for HbA1c differ between intervention and control groups?
* Paired samples t-test
  * To compare pretest vs. posttest (repeated) measures for the same individuals
  * E.g., do mean HbA1c values at baseline differ from those post intervention?

COMPARISONS BETWEEN GROUPS
Analysis of variance (ANOVA): compare means for more than 2 groups
* Appropriate for interval data
* Avoids the need to compute multiple t-tests to compare groups
* Do mean values for HbA1c vary significantly by age group (children < 6 years, children 6-12 years, and children > 12 years)?
* Evaluates all of the mean differences in a single hypothesis test using a single alpha level
* This means that you may conclude that a difference exists, but the test will not tell you where that difference is

PLANNED VS. POST HOC COMPARISONS
* Planned comparisons
  * Multiple comparisons of means that are decided upon before, NOT AFTER, the study is conducted and are hypothesis driven
  * E.g., we expect that mean values for HbA1c will vary significantly by age group and that children < 6 years will have significantly lower values than children > 12 years but not children 6-12 years.
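Returning to the chi-square test of independence: the statistic reported for the flu vaccine table in this section can be reproduced from the four cell counts. This is a minimal sketch assuming a 2 x 2 table and no continuity correction; the function name `chi_square_2x2` is hypothetical, and the p value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2 x 2 table of counts.

    No continuity correction is applied. The p value is the upper tail of
    the chi-square distribution with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: Female, Male. Columns: flu vaccine NO, flu vaccine YES.
    chi2, p = chi_square_2x2([[10, 25], [18, 16]])
    print(f"chi-square = {chi2:.2f}, p = {p:.2f}")
```

For tables larger than 2 x 2 the degrees of freedom change, so the p value calculation shown here would no longer apply.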
* Post hoc comparisons
  * Multiple comparisons of means decided while the study is conducted or during the analysis stage; not driven by the original hypothesis
  * Can lead to spurious findings
  * Only conducted if the ANOVA test indicates a significant difference
  * P value corrected for multiple comparisons, e.g., Bonferroni adjustment

PARAMETRIC VS. NON-PARAMETRIC TESTS
* Parametric tests: t-test, ANOVA
  * Data measured on an interval scale
  * Underlying assumptions about the shape of the distribution of population data (normal), selection of participants (independent), etc.
* Non-parametric tests: chi-square tests; Mann-Whitney U (two independent samples); Wilcoxon test (paired samples); Kruskal-Wallis (3 or more samples)
  * Data can be nominal or ordinal (chi-square)
  * Also used for interval data if parametric assumptions are violated or the nature of the distribution of population data is not known (these tests are distribution-free)

MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS
* Bivariate methods examine the effect of one variable at a time on an outcome
* Cannot control for potential effects of other measures which may be associated with an outcome

[Diagram labels: Age, Health insurance, Your intervention, Gender; outcome: CRC screening with colonoscopy]

MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS
* Multivariable methods examine the simultaneous effect of multiple variables on an outcome variable
* Statistically control for / adjust for effects of other measures which may be associated with an outcome
* Independent contribution of your intervention on the outcome, controlling for gender, age, and health insurance

[Diagram labels: Gender, Your intervention, Health insurance, Age; outcome: CRC screening with colonoscopy]

"SOMEWHERE, SOMETHING INCREDIBLE IS WAITING TO BE KNOWN"
CARL SAGAN PHD (AMERICAN ASTRONOMER, WRITER AND SCIENTIST, 1934-1996)

CONTACT INFORMATION
Catherine R. Messina PhD
Department of Family, Population & Preventive Medicine
HSC-L3, Rm 086
4-8266
[email protected]