Sample Size Consideration in
Clinical Research
John Kwagyan, PhD
Howard University College of Medicine
GHUCCTS
What Is Statistics?
The science of collecting, organizing,
analyzing, and interpreting data to assist in
making effective decisions.
• Summarization of large quantities of data
(Descriptive/Summary Statistics)
• Making decisions about a population from a sample
(Inferential Statistics)
Types of Statistics
• Descriptive/Summary Statistics
Methods for organizing, summarizing, and
presenting data in an informative way.
• Inferential Statistics
Methods for estimating and testing population
parameters based on sample information.
Population
Well defined
Large
Unique Characteristics
-prevalence of a disease
-variability of a measure
-Response rate of therapy
-etc
We are interested in estimating the
population characteristics!!!
POPULATION
SAMPLE
sample data
We make inference about population characteristics
based on sample data
Population Parameters
• Mean cholesterol level of obese individuals
• Prevalence of hypertension in Blacks
• Incidence of lung cancer among smokers
• Risk of liver disease (hepatitis) associated with drinking
• Mortality rate of heart attack among men
• Variability of heart rate in PTSD
CENTRAL IDEA:
Estimate and Test for differences in parameters
Case Example
• Suppose that we plan to conduct a study comparing a
treatment with a control.
• The response variable is systolic blood pressure (SBP),
measured using a standard sphygmomanometer.
• The treatment is supposed to reduce blood pressure
• We set up a one-sided test
H0 : μT = μC versus H1 : μT <μC
where μT = mean SBP for the treatment group and μC = mean SBP for the control group.
• The parameter Δ = μT −μC is the effect being tested
Case Example
• Suppose the goals of the study specify that we want to
be able to detect a situation where the treatment mean is
15 mmHg lower than the control group.
• The required effect size is Δ= −15.
• We specify that such an effect be detected with 80%
power (1-β= .80) when the significance level α = .05.
• Past experience with similar studies, using similar
sphygmomanometers and similar subjects, suggests that
the data will be approximately normally distributed with
a standard deviation of SD = 20 mmHg.
• We plan to use a two-sample pooled t test with equal
numbers n of subjects in each group.
Case Example
• Now we have all of the specifications needed for
determining sample size using the power
approach, and their values may be entered in
suitable formulas, charts, or power-analysis
software.
• We find that a sample size of n = 23 per group is
needed to achieve the stated goals.
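As a rough check on the n = 23 figure, here is a minimal sketch using the normal approximation to the one-sided two-sample t test. The small-sample correction term (z²/4, usually attributed to Guenther) is an assumption of this sketch, not something stated in the slides, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_one_sided(delta, sd, alpha=0.05, power=0.80, t_correction=True):
    """Approximate per-group n for a one-sided two-sample t test, equal groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = z.inv_cdf(power)        # quantile matching the desired power
    d = abs(delta) / sd              # standardized effect size, here 15/20
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    if t_correction:
        n += z_alpha ** 2 / 4        # assumed correction for using t instead of z
    return math.ceil(n)

print(n_per_group_one_sided(-15, 20, t_correction=False))  # plain normal approximation
print(n_per_group_one_sided(-15, 20))                      # with the correction
```

Under these assumptions the plain normal approximation gives 22 per group and the corrected value gives 23. Power-analysis software based on the noncentral t distribution should be preferred for a real protocol.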
Basic Parameters and Concepts
• Study (Research) Hypotheses
• Type I Error Rate, α (Significance level)
• P-value
• Type II Error Rate, β
• Power, 1−β
• Effect Size, Δ
~ size of clinically meaningful change.
HYPOTHESIS,
HYPOTHESIS TESTING
Hypothesis
• HYPOTHESIS: a statement about a
population characteristic/parameter
• HYPOTHESIS: a prediction/idea about
what the examination of appropriate
data will show about a characteristic
Hypothesis
• Null (Test) Hypothesis, H0
~ Hypothesis to be questioned (disproved).
~ Hypothesis of no real (true) difference
• Alternative (Research) Hypothesis, HA
~ Hypothesis investigator wishes to establish.
~ Hypothesis of a real (true) difference
Example
• Research Hypothesis: Combination therapy is
effective?? in the treatment of hypertension.
• Effective
~ considerable reduction in BP (1)
~ controls BP increases (2)
• Parameter
~ Mean percent reduction in BP (1)
~ Proportion controlled (2)
• Test Hypothesis: The combination therapy is
not effective.
Goal
• Goal is to TEST the Null Hypothesis and
decide whether to REJECT IT in favor of
the Alternative, or FAIL TO REJECT it.
Test of Hypothesis
One-Tailed Tests
• A test is one-tailed when the research
hypothesis, HA , specifies a direction:
HA: The incidence of lung cancer among
smokers is higher than nonsmokers
Two-Tailed Tests
• A test is two-tailed when no direction is
specified in the research hypothesis HA.
HA: The stress level in DC is different from NY.
Test & Decision
Test H0 : no difference in effectiveness
Possible Outcomes
Null Hypothesis could be true (i.e., no difference)
Null Hypothesis could be false (i.e., difference)
Decision Making
Investigator rejects the null hypothesis
Investigator fails to reject the null hypothesis
Test & Decision
Test H0: therapy is not effective

                  H0 True (not effective)    H0 False (effective)
Decision
  Reject H0       Type I Error               No Error
  Accept H0       No Error                   Type II Error
Drug Trial
H0: “Miracle” drug is not effective

                  H0 True (not effective)    H0 False (effective)
Decision
  Accept H0       No Error                   Type II Error
  Reject H0       Type I Error               No Error

TI: Deny a patient a “known therapy” in favor of
an ineffective “miracle drug”
TII: Deny a patient a better drug in favor of a less
effective “known therapy”
Test & Decision
Test H0

                  H0 True                    H0 False
Decision
  Accept H0       No Error                   Type II Error (β)
  Reject H0       Type I Error (α)           No Error

α = P(Type I Error)
β = P(Type II Error)
Is this Familiar !!!!!
• All tests were performed two-sided at the
5% level of significance.
• Significance was defined as a value of p <
0.05.
• A value of p < 0.05 was considered
statistically significant.
• ALL YOU ARE DOING IS CONTROLLING
THE TYPE I ERROR RATE
Definitions
α = P{Type I Error}
= P{rejecting H0 | H0 is true}
= P{rejecting the truth}
α ~ is called the Type I Error Rate
α ~ is called the Significance Level
Definitions
β = P{Type II Error}
= P{fail to reject H0 | H0 is false}
= P{accepting a fallacy}
β ~ is called the Type II Error Rate
1−β ~ is called the Power of the study
Definitions
β = P{fail to reject H0 | H0 is false}
1−β = P{reject H0 | H0 is false}
= P{accept HA | HA is true}
1−β ~ is called the Power of the study
Power ~ quantifies the ability of the study
to detect a difference, if any
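The two error rates can be made concrete by simulation. The following is a minimal sketch, assuming a two-sided z test with known SD, normally distributed SBP around a hypothetical mean of 120 mmHg, and 30 subjects per group. All of these numbers and function names are illustrative assumptions, not values from the slides.

```python
import random
from statistics import NormalDist, mean

def reject_h0(x, y, sd, alpha=0.05):
    """Two-sided z test for a difference in means with a known common SD."""
    se = sd * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(true_diff, n=30, sd=20.0, sims=4000, seed=1):
    """Fraction of simulated trials in which H0 is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        x = [rng.gauss(120 + true_diff, sd) for _ in range(n)]  # treatment group
        y = [rng.gauss(120, sd) for _ in range(n)]              # control group
        hits += reject_h0(x, y, sd)
    return hits / sims

alpha_hat = rejection_rate(true_diff=0.0)    # H0 true: estimates the Type I error rate
power_hat = rejection_rate(true_diff=-15.0)  # H0 false: estimates the power, 1 - beta
print(alpha_hat, power_hat)
```

When H0 is true the rejection rate should sit near the chosen α; when the true difference is 15 mmHg it estimates the power for this design.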
Definitions: P-value
~ probability of observing data at least as extreme as
ours (i.e., a difference at least as large as the one
observed) when the null hypothesis is true.
~ probability of a result this extreme having arisen by
chance alone when the null hypothesis is true.
Definitions: P-value
~ the smaller the p-value,
the weaker the support for the null hypothesis
~ the smaller the p-value,
the stronger the evidence for the alternative hypothesis
How do we evaluate this probability?
By calculating a test statistic
Test Statistic
-a value which we can compare with a known
distribution of what we expect when the null
hypothesis is true
Most test statistics have the form:
• $\text{Test Statistic} = \dfrac{\text{observed value} - \text{expected value}}{\text{standard error of observed value}}$
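The general form of a test statistic, and the p-value it leads to, can be sketched as follows. This assumes the statistic is compared with a standard normal distribution (a z test); the numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_statistic(observed, expected, standard_error):
    """(observed value - expected value) / standard error of observed value."""
    return (observed - expected) / standard_error

def p_value(z, two_sided=True):
    """Tail probability under a standard normal reference distribution."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail if two_sided else upper_tail

# Hypothetical numbers: observed mean difference of -12 mmHg with SE 5 mmHg,
# tested against an expected difference of 0 under H0.
z = z_statistic(observed=-12.0, expected=0.0, standard_error=5.0)
print(round(z, 2), round(p_value(z), 4))
```

With these inputs the statistic is -2.4 and the two-sided p-value is about 0.016, which would be called significant at the 5% level.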
Common Test Statistics
• T-test
• F-test
• Chi-square (χ2) test
How do you choose the appropriate statistic???
Statistical Significance
• Accepted values in clinical research
p ≤ 0.05 ~ significant
p ≤ 0.01 ~ highly significant
In Genetic (Linkage) Analysis:
• Lod Score = 3.0 ~ significant
• Lod Score = 3.0 ~ α = 0.0001
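The link between a Lod score of 3.0 and a significance level near 0.0001 can be checked with a short sketch. It assumes the usual large-sample result that 2·ln(10)·Lod behaves like a one-sided chi-square statistic with 1 degree of freedom; that assumption and the function name are mine, not the slides'.

```python
import math
from statistics import NormalDist

def lod_to_p(lod):
    """Approximate pointwise p-value for a Lod score (one-sided, 1 df)."""
    chi2 = 2 * math.log(10) * lod   # likelihood-ratio statistic
    # One-sided chi-square(1) tail equals the upper normal tail at sqrt(chi2).
    return 1 - NormalDist().cdf(math.sqrt(chi2))

print(lod_to_p(3.0))
```

The result is close to 0.0001, far stricter than the 0.05 threshold common in clinical research.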
SAMPLE SIZE
CONSIDERATION
Population And Sample
Target Population
→ Define Eligibility Criteria (ineligible subjects are excluded)
→ Study Population
→ Study Sample
Eligibility Criteria!!!!
~ consist of
inclusion criteria
exclusion criteria
• Inclusion criteria are used to outline the intended
study population
• Exclusion criteria are used to fine-tune the intended
population by removing expected sources of
variation
Eligibility Criteria!!!!
• Inclusion Criteria
Female
Age ≥ 21 years
BMI ≥ 25 kg/m²
• Exclusion Criteria
Male
Age < 21 years
BMI < 25 kg/m²
REDUNDANT!!!!
Eligibility Criteria!!!!
• Inclusion Criteria
i. Female
ii. Age ≥ 21 yrs
iii. BMI ≥ 25 kg/m²
• Exclusion Criteria (redundant, merely the opposite of the inclusion criteria)
i. Male
ii. Age < 21
iii. BMI < 25
• Exclusion Criteria
i. Pregnant or breast feeding
ii. History of …….
iii. Any other condition in the opinion of the
investigator(s) that would make the subject unsuitable for
the study
Why Sample Size?
• Required by clinical research protocols,
funding agencies, etc., and in
many grant applications
• Budgetary Constraints
• Provide Statistical Justification
• Inference (decision) is based on it
How Much Data Do I Need?
• How big a difference are you trying to detect?
Effect Size
- Absolute difference ~ say, a 5 mmHg drop in BP
- Relative difference ~ say, a 5% drop in BP
• How much variation is there in the outcome?
• How certain do you want to be that you will detect
the difference of interest ?
Eliciting effect size
• How big a difference would be of clinical
importance for you?
Some responses I get:
• Huh??
• What do you mean?
• What do you recommend?
• Any difference at all would be important
Finding the right variance
• Based on experience
Range of values
Stories behind extreme values
Sources of variations
• Use of historical data
• Conduct a pilot study.
What If You Have an Imposed Sample Size?
• Sometimes, a proposal comes with an imposed
sample size.
• Sample size is but one of several quality
characteristics of a study
• If n is held fixed, we simply need to focus on
other characteristics, such as effect size.
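When n is fixed, one way to focus on effect size is to ask what difference the study could detect. A minimal sketch, assuming a two-sided two-sample comparison of means under the normal approximation, with a hypothetical function name and an illustrative SD of 20 mmHg:

```python
from statistics import NormalDist

def detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest true difference detectable with the given power at fixed n."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * (2 / n_per_group) ** 0.5

for n in (10, 20, 40):
    print(n, round(detectable_difference(n, sd=20), 1))
```

With SD = 20 mmHg, 20 subjects per group can detect a difference of roughly 17.7 mmHg with 80% power. Whether a difference that large is clinically plausible is then a scientific question, not a statistical one.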
Determination of Sample Size
Depends on:
1. Outcome measure (Data Endpoint)
2. Study Design
Types of Data Endpoints
• Continuous Data
- BP, BMI, TC, LDL, Blood Sugar
• Categorical Data
- Hypertension, Obese, Dyslipidemia, Diabetes
• Count Data
- Number of risk factors: 0, 1, 2, 3
• Survival (Time-to-Event) Data
- time-to-cardiac event, time-to-death
Putting All Together
(Power Analysis)
1−β = P{accept HA | HA is true}
= Func(α, σ²(n), δ)
where
1−β ~ Power
α ~ Certainty
σ² ~ Variability
δ ~ Effect Size
n ~ Sample size
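One concrete form of this function, as a sketch only: the power of a two-sided two-sample comparison of means under the normal approximation. The slides do not give a specific formula here, so the choice of test and the function name are assumptions.

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power as a function of alpha, variability, n and effect size."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5     # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = abs(delta) / se                # true difference in standard-error units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (16, 36, 64):
    print(n, round(power_two_sample(delta=5, sd=10, n_per_group=n), 3))
```

Holding α, the SD and the effect size fixed, power rises with n. For a 5-unit difference with SD 10, about 64 per group gives roughly 80% power.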
Crude SS Estimate for Means
2-Sample Test for Means (2-sided)
$n = \dfrac{16\,s^2}{\Delta^2}$ per group, with α = 0.05, β = 0.2 (Power = 80%)

sd     Δ      n
10     5      64
10     10     16
15     5      144
15     10     36
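The crude rule n = 16 s²/Δ² is easy to compute directly. The sketch below also shows where the 16 comes from under the normal approximation: 2(z₀.₉₇₅ + z₀.₈₀)² is about 15.7, rounded up. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def crude_n(sd, delta):
    """Per-group n from the rule of thumb n = 16 * sd^2 / delta^2."""
    return math.ceil(16 * sd ** 2 / delta ** 2)

z = NormalDist()
# Two-sided alpha = 0.05 and power = 0.80 give a multiplier of about 15.7.
multiplier = 2 * (z.inv_cdf(0.975) + z.inv_cdf(0.80)) ** 2
print(round(multiplier, 1))

for sd, delta in [(10, 5), (10, 10), (15, 5), (15, 10)]:
    print(sd, delta, crude_n(sd, delta))
```

For sd = 10 and Δ = 5 the rule gives 16 × 100 / 25 = 64 per group; halving Δ quadruples the required n.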
Sample Size
• A larger sample size is needed to detect a
smaller meaningful difference.
• A larger sample size is needed when there is
much variability in the population
• A larger sample size is required to increase
the power of a study.
Other Approaches
There are several approaches to sample size.
• One can specify the desired width of a confidence
interval and determine the sample size that achieves that
goal.
• A Bayesian approach can be used where we optimize
some utility function, perhaps one that involves both
precision of estimation and cost.
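The confidence-interval approach can be sketched for the simplest case: estimating a single mean with a known SD, where the goal is a chosen half-width (margin of error). The one-sample setting, the normal approximation, and the function name are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def n_for_ci_half_width(sd, half_width, confidence=0.95):
    """n so that a confidence interval for one mean has the desired half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / half_width) ** 2)

# Hypothetical goal: estimate mean SBP to within +/- 5 mmHg when SD = 20 mmHg
print(n_for_ci_half_width(sd=20, half_width=5))
```

Under these assumptions 62 subjects are needed. No power or effect size enters the calculation, only the desired precision.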
Avoid “canned” effect sizes.
- The T-shirt effect sizes
• This is an elaborate way to arrive at the same sample size
that has been used in past social science studies of large,
medium, and small size.
• The method uses a standardized effect size as the goal.
• Think about it: for a "medium" effect size, you'll choose the
same n regardless of the accuracy or reliability of your
instrument, or the narrowness or diversity of your
population.
• Important considerations are being ignored
here. "Medium" is definitely not the message!
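The point about "medium" effect sizes can be seen numerically. In this sketch (normal approximation, two-sided test, hypothetical function name) a standardized effect of d = 0.5 always yields the same n, whatever the SD of the instrument, so the raw difference actually being targeted changes silently.

```python
import math
from statistics import NormalDist

def n_from_standardized_d(d, alpha=0.05, power=0.80):
    """Per-group n when the goal is stated as a standardized effect size d."""
    z = NormalDist()
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2)

n_medium = n_from_standardized_d(0.5)   # "medium" effect size
for sd in (5, 10, 20):
    # Same n every time, but the implied raw difference scales with the SD.
    print(f"SD = {sd} mmHg: n = {n_medium} per group, "
          f"implied raw difference = {0.5 * sd} mmHg")
```

A precise instrument (SD = 5) and a noisy one (SD = 20) both get n = 63 per group, yet the study is then powered for a 2.5 mmHg difference in one case and 10 mmHg in the other.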
Cohen Effect Sizes????
What are small, medium, or large effect sizes for:
• Odds Ratio
• Hazard Ratios
• Repeated Measures ANOVAs
• Regression Models
• Multivariate Models
• Sensitivity Analysis
• Adaptive Designs
Post Hoc Power Analyses
• In contrast to a priori power analyses, post hoc power
analyses often make sense after a study has already been
conducted.
Take Away Points
• Use power prospectively for planning future
studies.
• Put science before statistics. The appropriate
inputs to power/sample-size calculations should be
based on careful considerations of the underlying
scientific (not statistical!!) goals of the study.
• T-shirt Effect Sizes- If at all possible avoid using
“canned” effect sizes
References
1. Lenth, R. V. (2001), "Some Practical Guidelines for
Effective Sample Size Determination," The American
Statistician, 55, 187-193.
2. Hoenig, John M. and Heisey, Dennis M. (2001), "The
Abuse of Power: The Pervasive Fallacy of Power
Calculations for Data Analysis," The American Statistician,
55, 19-24.