RESEARCH PROJECT PLANNING
SAMPLE SIZE AND STATISTICAL POWER;
STATISTICS OVERVIEW
Catherine R. Messina PhD
Research Associate Professor
Department of Family, Population &
Preventive Medicine
September 28, 2016
CONSIDER ……..
Dr. X compares a new method for treating diaper rash to the
usual care for this condition . Dr. X randomly assigns 5
infants to the new method group and 5 infants to the usual
care group (10 infants total).
Study findings favor the new method as more effective
than usual care. The p value for this comparison
is 0.08.
How do you interpret this situation?
CONSIDER ……..
When comparing groups and the value for p is > 0.05
……..
There may truly BE NO effect …..
There may truly BE an effect in the population but your
statistical test which is based on your sample, suggests
no significant effect …
?????
STATISTICAL POWER

The ability to detect a significant difference of a specific
magnitude (i.e. effect size) between groups, if it actually exists


Minimum acceptable statistical power for a proposed study is conventionally set at
80% - that is, we require at least an 80% chance that a
difference that really exists will show up as a statistically significant
finding
What influences statistical power ………
STATISTICAL POWER

Statistical power is directly related to sample size,
effect size, and alpha ………
Power increases as effect size increases, for a given sample size
 Power increases as sample size increases, for a given effect size
 Power increases as alpha increases (alpha is typically set at 0.05)


Power is inversely related to variability


Power decreases as variability increases
What is alpha? Threshold at which statistical significance is
reached – that is, the risk of concluding that there is a
difference when one does not exist cannot exceed 5%.
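The relationships listed above can be checked numerically. The following is a minimal sketch, assuming a two-sided comparison of two equal-sized groups and a normal approximation; the function name and the example numbers are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_groups(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (illustrative)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic: standardized difference scaled by group size
    shift = (effect / sd) * sqrt(n_per_group / 2)
    # Probability that the statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical baseline: difference of 5 units, SD of 10, 64 per group
base = power_two_groups(effect=5, sd=10, n_per_group=64)
bigger_effect = power_two_groups(effect=8, sd=10, n_per_group=64)
bigger_sample = power_two_groups(effect=5, sd=10, n_per_group=128)
bigger_alpha = power_two_groups(effect=5, sd=10, n_per_group=64, alpha=0.10)
more_variability = power_two_groups(effect=5, sd=20, n_per_group=64)

print(round(base, 2), round(bigger_effect, 2), round(bigger_sample, 2),
      round(bigger_alpha, 2), round(more_variability, 2))
```

Under these assumptions the baseline power is close to 0.80; raising the effect size, the sample size, or alpha raises it, and doubling the SD lowers it.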
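POWER VS. EFFECT SIZE WHEN SAMPLE SIZE IS FIXED
[Figure: power (y-axis, 0 to 1, marked at .80) plotted against effect size (x-axis)]
POWER VS. SAMPLE SIZE WHEN EFFECT SIZE IS FIXED
[Figure: power (y-axis, 0 to 1, marked at .80) plotted against sample size (x-axis)]
POWER VS. ALPHA
[Figure: power (y-axis, 0 to 1, marked at .80) plotted against alpha (x-axis, 0 to 1, marked at 0.05)]
POWER VS. VARIABILITY WHEN EFFECT SIZE IS FIXED
[Figure: power (y-axis, 0 to 1, marked at .80) plotted against variability (x-axis)]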
ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER
Critical aspect of research planning
 The size of the sample can influence your ability to detect
meaningful differences between study groups



Underpowered study makes it hard to detect real differences
Not always a good idea to base your sample size on the prior literature

Many published studies actually have very low statistical power
 If power of a published study is 50%, then they had only a 50%
probability of finding an effect if it really existed
 If you use the same sample size, then you may only have a 50%
chance of replicating that effect
ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER

Critical aspect of research planning – cont’d
 Underpowered study makes it hard to interpret
differences that appear to be real: the lower the power of
a study, the lower the probability that an observed effect
that reaches statistical significance (e.g., p < 0.05)
actually reflects a true effect. (Ioannidis, JP (2005); Ioannidis, JP, Tarone,
R, McLaughlin, JK (2011))
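The link between power and the credibility of a significant finding can be illustrated with the positive predictive value (PPV) formula from Ioannidis (2005), ignoring bias. This is a sketch: the pre-study odds value used below (1 true effect for every 4 null effects tested) is a made-up assumption, not a figure from the slides.

```python
def positive_predictive_value(power, alpha, prior_odds):
    """Probability that a statistically significant finding is a true effect.

    prior_odds is the assumed ratio of true effects to null effects among
    the hypotheses being tested (hypothetical input).
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Same alpha and prior odds, different power
ppv_high_power = positive_predictive_value(power=0.80, alpha=0.05, prior_odds=0.25)
ppv_low_power = positive_predictive_value(power=0.20, alpha=0.05, prior_odds=0.25)
print(round(ppv_high_power, 2), round(ppv_low_power, 2))
```

With these assumed inputs, dropping power from 80% to 20% lowers the chance that a significant result reflects a true effect from about 0.8 to about 0.5.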
Stop collecting data here?
ESTIMATING THE SAMPLE SIZE: STATISTICAL POWER

Critical aspect of research planning – cont’d
 Not only tells you how many participants you need – also tells
you how many you don’t need
Saves resources
 Ethical considerations

 Over-powered study can increase risk for detecting
meaningless differences
ESTIMATING PARAMETERS NEEDED FOR
POWER CALCULATIONS

Sample size calculations require:
desired power (not less than 80%),
 alpha level (typically 0.05)
 estimate of effect size (this should be the smallest difference that is
clinically significant)
 estimate of population variability (e.g., standard deviations) – as
sample size increases, the variability of a sample estimate (its standard error) decreases

Estimate from pilot data
 Estimate from prior studies using the same outcome measure – if more than
one study, you can use the average SD

Should be close to true values but don’t need to be perfect
 Qualified guesstimate: preliminary results / pilot data
 Other studies / published literature
 Smallest clinically relevant effect
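Once the four inputs above are chosen, a per-group sample size can be approximated. The sketch below assumes a two-sided comparison of two group means with equal group sizes and uses the standard normal-approximation formula n = 2 * ((z_alpha + z_power) * SD / difference)^2. The function name and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, sd, power=0.80, alpha=0.05):
    """Approximate participants needed per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical planning values: smallest clinically relevant difference = 5,
# SD estimated from pilot data = 10
n_main = n_per_group(effect=5, sd=10)
n_smaller_effect = n_per_group(effect=2.5, sd=10)
print(n_main, n_smaller_effect)
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the choice of the smallest clinically relevant effect matters so much. Dedicated power software typically uses the t distribution and gives slightly larger numbers.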
ESTIMATING THE SAMPLE SIZE: REPRESENTATIVENESS

Critical aspect of research planning

The size of the sample can influence the representativeness
of the study sample


“Representativeness” – how well does the sample
represent the population
NOTE: having enough people in your sample does not
necessarily guarantee representativeness if sample
selection was biased in some way
ESTIMATING THE SAMPLE SIZE:
REPRESENTATIVENESS
Provided sample selection is unbiased:

In general, the larger the sample, the greater the likelihood
that study findings will accurately reflect the population
because larger samples have lower sampling error
Sampling error = differences between the sample and the
population that are due solely to the particular sample that
happens to have been selected
 As sample size increases, sampling error decreases.
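For a sample mean, sampling error is commonly summarized by the standard error, SD / sqrt(n). The short sketch below uses a hypothetical SD of 10 to show how it shrinks as the sample grows.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a sample mean."""
    return sd / sqrt(n)


# Hypothetical SD of 10; quadrupling n halves the standard error
errors = {n: standard_error(sd=10, n=n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(n, se)
```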

ESTIMATING THE SAMPLE SIZE:
REPRESENTATIVENESS

Size of representative sample based on level of precision
and confidence regarding your estimates

E.g., 95% confidence level (alpha = 0.05) and high precision
(narrow confidence interval) requires greater sample size than
95% confidence level and low precision (wide confidence interval)

E.g., 95% confidence level (alpha = 0.05) and high precision
(narrow confidence interval) requires greater sample size than
90% confidence level (alpha = 0.10) and high precision (narrow
confidence interval)
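The two comparisons above can be sketched for the simple case of estimating a mean, using n = (z * SD / margin of error)^2. The SD and margins below are hypothetical, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_for_margin(sd, margin, confidence=0.95):
    """Sample size so the confidence interval half-width is about `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)


n_95_narrow = n_for_margin(sd=10, margin=1, confidence=0.95)
n_95_wide = n_for_margin(sd=10, margin=2, confidence=0.95)
n_90_narrow = n_for_margin(sd=10, margin=1, confidence=0.90)
print(n_95_narrow, n_95_wide, n_90_narrow)
```

With these inputs the 95% / narrow-interval case needs the largest sample, consistent with both examples.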
ESTIMATING THE SAMPLE SIZE
Considerations!
 Sample size too small
May not yield precise, reliable findings
 Clinically significant findings may be missed


Sample size too big
Clinically insignificant findings may emerge as statistically
significant due solely to the sample size
 Waste of resources
 Unethical

The sample size you need vs. what is at hand (or your
timeline) – avoid spending time / resources on project that
may yield very little
 Planning ahead for subgroup analyses

STATISTICAL ANALYSIS PLAN
INFORMED BY YOUR RESEARCH QUESTION!!!
 Research question is a general statement of purpose that identifies the focus of the study

Are you describing a set of characteristics?
 Are you evaluating degree of correlation between 2 measures?
 Are you comparing measure(s) between 2 or more groups?


Goes hand in hand with operational (i.e., measurable)
definitions of variables of interest and choice of study
measures
E.g., if you plan to compare means or compare proportions – you need to
obtain appropriate data
 E.g., cross sectional study or repeated measures design – each
requires a different statistical approach

STATISTICS – MAJOR TYPES
Descriptive vs. inferential statistics
 Descriptive statistics –

Describe or summarize data and describe patterns
of variability
 Provide an overview of the attributes of a data set
 Include:

summary statistics (e.g., group size, proportions,
ratios, rates)
 measures of central tendency (e.g. mean, mode,
median)
 measures of dispersion (e.g., range, variance,
standard deviation)
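These descriptive measures are all available in Python's standard `statistics` module. The ages below are made-up values used only to illustrate the calls.

```python
import statistics

ages = [4, 6, 6, 7, 9, 10, 12]  # hypothetical ages of children in a sample

summary = {
    "n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "range": max(ages) - min(ages),
    "variance": statistics.variance(ages),  # sample variance (n - 1)
    "sd": statistics.stdev(ages),           # sample standard deviation
}
for name, value in summary.items():
    print(name, round(value, 2))
```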

QUESTIONS ANSWERED BY DESCRIPTIVE STATISTICS



What is the mean age of children in the study sample?
What is the age distribution of children who were
vaccinated for flu in the past 5 years?
What percentage of children were screened for
secondhand smoke exposure?
STATISTICS – MAJOR TYPES
 Inferential statistics –
A set of procedures for generalizing (or inferring)
to a population of individuals based on
information obtained from a limited number of
individuals drawn from that population (i.e., the
sample)
 Provide a measure of how well your data
support your hypothesis

APPLYING INFERENTIAL STATISTICS




Select test of significance (method of inference
used to support or reject claims based on
sample data – think of this as your statistical
test of choice)
Decide whether significance test will be one-tailed or two-tailed
Select alpha, the maximum acceptable probability of concluding that
an effect exists in the population when it is actually
due to chance (usually set at α = 0.05)
Compute test of significance (the actual p value)
RELATEDNESS VS. DIFFERENCES
Does your research question focus
on associations (or relationships)
among measures
or
does it focus on differences between
measures or groups???
DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
Are procedure time and patient age
correlated?

DESCRIBE RELATIONSHIPS BETWEEN VARIABLES
Correlation
 Determines whether and to what degree a relationship exists
between variables
Quantifies the direction of the relationship (direct or inverse)
 Quantifies the strength of relationship expressed as a coefficient
which ranges from –1 to +1
 +1 or –1 = perfect correlation; 0 = no correlation
 Pearson correlation coefficient (Pearson r) – a measure of
correlation used for interval scale data and assumes that the
relationship between variables is linear
 Spearman Rho – a measure of correlation used for ordinal data
 CORRELATION DOES NOT IMPLY CAUSE / EFFECT
 Does not imply agreement – other measures such as Kappa are
better
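Both coefficients can be computed by hand. The sketch below implements Pearson r directly and obtains Spearman rho as the Pearson correlation of the ranks. The patient ages and procedure times are hypothetical, and the helper names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


age = [25, 34, 47, 52, 61, 70]       # hypothetical patient ages (years)
minutes = [18, 20, 26, 25, 31, 45]   # hypothetical procedure times

r = pearson_r(age, minutes)
rho = spearman_rho(age, minutes)
print(round(r, 2), round(rho, 2))
```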

DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Simple and multiple linear regression
Linear regression estimates or predicts values of a dependent
variable for any value of one or more independent variables
 Dependent variable (DV; outcome) is continuous
 Does patient age predict procedure time? Does procedure time
increase at a constant rate for each additional year of patient age?
 Used for interval scale data
 Assumes that the relationship between DV and IV is linear (i.e., if
means of dependent and independent variables plotted against each
other – would fall on straight line).
 Cannot imply causation

“Simple” (univariate) linear regression model – only one
independent variable (IV) as predictor of DV
 Multiple (multivariable) linear regression model – more than one IV as
predictors of DV
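A simple linear regression can be fit with ordinary least squares in a few lines. The sketch below uses hypothetical ages and procedure times; the slope is the estimated change in procedure time per additional year of age.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


age = [30, 40, 50, 60, 70]       # hypothetical patient ages (IV)
minutes = [20, 24, 25, 30, 31]   # hypothetical procedure times (DV)

slope, intercept = fit_line(age, minutes)
predicted_at_55 = intercept + slope * 55
print(round(slope, 2), round(intercept, 2), round(predicted_at_55, 1))
```

For these made-up data the fitted slope is 0.28 minutes per year of age. A statistical package would also report standard errors, confidence intervals, and p values for the coefficients.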

TYPES OF REGRESSION MODELS

Logistic Regression





Dependent variable (outcome) is categorical / usually
dichotomous (e.g., above median vs. below median)
Provides odds ratios (also 95% confidence intervals and p
values)
What is the probability that procedure time will be above the
median (rather than below) when patients are older compared
to younger?
OR = 1.5: older patients have 1.5 times the odds of
procedure times above the median compared with younger patients
Simple or multiple logistic regression modeling
Does not assume a linear relationship between DV and IV
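For a single dichotomous predictor, the odds ratio from a logistic regression equals the odds ratio computed directly from the 2 x 2 table. The sketch below takes that shortcut rather than fitting a model, and adds a 95% confidence interval on the log-odds scale. The counts are hypothetical, chosen so that OR = 1.5.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio(a, b, c, d, confidence=0.95):
    """Odds ratio and confidence interval from a 2 x 2 table.

    a, b = group 1 with / without the outcome
    c, d = group 2 with / without the outcome
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: older patients 60 above / 40 below the median,
# younger patients 50 above / 50 below the median
or_estimate, ci_lower, ci_upper = odds_ratio(60, 40, 50, 50)
print(round(or_estimate, 2), round(ci_lower, 2), round(ci_upper, 2))
```

In this made-up example the interval includes 1, so the odds ratio of 1.5 would not be statistically significant at alpha = 0.05.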
DESCRIBING RELATIONSHIPS BETWEEN VARIABLES
Correlation
Nominal data
 Chi square test of independence
Categorical data arranged in 2 x 2, 2 x 3, etc. contingency tables
 Data in the cells are frequency counts
Example: is patient gender associated with flu vaccine use?

          Flu vaccine NO    Flu vaccine YES
Female    10 (29%)          25 (71%)
Male      18 (53%)          16 (47%)

χ² = 4.25, p = 0.04
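The reported statistic can be reproduced from the four cell counts in the table (Female 10 / 25, Male 18 / 16). The sketch below computes the uncorrected Pearson chi-square for a 2 x 2 table. The p value uses the fact that, with 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2)). The function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # valid for 1 degree of freedom only
    return chi2, p_value


# Rows: Female, Male; columns: flu vaccine NO, flu vaccine YES
chi2, p_value = chi_square_2x2([[10, 25], [18, 16]])
print(round(chi2, 2), round(p_value, 2))
```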
COMPARISONS BETWEEN GROUPS

T-tests – compare means for 2 groups
Appropriate for interval data
 Independent samples t-test

To compare 2 groups that are mutually exclusive
 E.g., Do mean values for HbA1C differ between intervention vs.
control groups?


Paired samples t-test

To compare pretest vs. posttest (repeated) measures for the same
individual

E.g., Do mean HbA1c values at baseline differ from those post
intervention?
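The two t statistics can be sketched with the standard library. The HbA1c values below are hypothetical, and the independent-samples version assumes equal variances (pooled, Student's t). Converting t to a p value requires the t distribution, which a statistics package provides; that step is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def independent_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for repeated measures."""
    diffs = [a - b for a, b in zip(after, before)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical HbA1c values (%) for two mutually exclusive groups
control = [8.1, 7.9, 8.4, 7.6, 8.0]
intervention = [7.6, 7.7, 7.9, 7.4, 7.5]
t_ind, df_ind = independent_t(control, intervention)

# Hypothetical baseline and post-intervention values for the same 5 people
baseline = [8.1, 7.9, 8.4, 7.6, 8.0]
post = [7.6, 7.7, 7.9, 7.4, 7.5]
t_pair, df_pair = paired_t(baseline, post)

print(round(t_ind, 2), df_ind, round(t_pair, 2), df_pair)
```

The same numbers give a larger absolute t when treated as paired, because pairing removes between-person variability from the comparison.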
COMPARISONS BETWEEN GROUPS

Analysis of Variance ANOVA – compare means for more
than 2 groups
Appropriate for interval data
 Avoids the need to compute multiple t-tests to compare groups
 Do mean values for HbA1C vary significantly by age group
(children < 6 yrs, children 6 -12 years, and children > 12
years)?
 Evaluates all of the mean differences in a single hypothesis
test using a single alpha level
 This means that you may conclude that a difference exists
but will not tell you where that difference is ……..
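A one-way ANOVA compares variability between group means with variability within groups. The sketch below computes the F statistic for three hypothetical age groups. The values are made up, and the p value lookup from the F distribution is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic with between- and within-group degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical HbA1c values (%) by age group
under_6 = [7.0, 7.2, 7.4]
six_to_12 = [7.6, 7.8, 8.0]
over_12 = [8.2, 8.4, 8.6]

f_stat, df_b, df_w = one_way_anova_f([under_6, six_to_12, over_12])
print(round(f_stat, 1), df_b, df_w)
```

A large F indicates that at least one group mean differs; it does not say which one.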

PLANNED VS. POST HOC COMPARISONS

Planned comparisons
Multiple comparisons of means that are decided upon before, NOT AFTER, the study is conducted and are hypothesis driven
 E.g., We expect that mean values for HbA1C will vary
significantly by age group and that children < 6 yrs will have
significantly lower values than children > 12 years but not
children 6 -12 years.


Post hoc comparisons
Multiple comparisons of means decided while study is conducted
/ during analysis stage – not driven by original hypothesis
 Can lead to spurious findings
 Only conducted if ANOVA test indicates significant difference
 P-value corrected for multiple comparisons, e.g., Bonferroni
adjustment
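The Bonferroni adjustment multiplies each p value by the number of comparisons (equivalently, divides alpha by that number). A minimal sketch with three hypothetical post hoc p values:

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p value, significant?) for each comparison."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical unadjusted p values from three pairwise comparisons
results = bonferroni([0.010, 0.020, 0.040])
for p_adj, significant in results:
    print(round(p_adj, 3), significant)
```

All three comparisons are below 0.05 before adjustment; only the first remains significant afterwards.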

PARAMETRIC VS. NON-PARAMETRIC TESTS

Parametric tests: t-test, ANOVA
Data measured in interval scale
 Underlying assumptions about the shape of the distribution of
population data (normal), selection of participants
(independent), etc.


Non-parametric tests: chi square tests; Mann-Whitney U
(two independent samples), Wilcoxon test (paired
samples); Kruskal Wallis (3 or more samples)
Data can be nominal or ordinal (chi-square)
 Interval data if parametric assumptions violated (these tests make fewer
distributional assumptions) or nature of distribution of population data not known

MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS

Bivariate methods examine the effect of one variable at a
time, on an outcome

Cannot control for potential effects of other measures which
may be associated with an outcome
[Diagram: age, gender, health insurance, and your intervention, each linked to the outcome (CRC screening with colonoscopy)]
MULTIVARIABLE VS. BIVARIATE STATISTICAL METHODS

Multivariable methods examine the simultaneous effect of
multiple variables on an outcome variable

Statistically controls for / adjusts for effects of other measures
which may be associated with an outcome
[Diagram: independent contribution of your intervention on outcome (CRC screening with colonoscopy), controlling for gender, age, and health insurance]
“SOMEWHERE, SOMETHING INCREDIBLE IS
WAITING TO BE KNOWN”
CARL SAGAN PHD
(AMERICAN ASTRONOMER, WRITER AND SCIENTIST,
1934-1996)
CONTACT INFORMATION





Catherine R. Messina PhD
Department of Family, Population & Preventive Medicine
HSC-L3, Rm 086
4-8266
[email protected]