Study design and simple
statistics
17th Feb 2005
Kath Bennett
Overview
• Overview of research methods, study
design.
• Some common statistical definitions.
Research
• Basic research: lab, biochemical, genetic
• Epidemiology: distribution & determinants of disease in a population
• Clinical: deals with patients with a particular disease
Research
• Clear aims and objectives from start
– hypothesis
• Design study to be able to address the
objectives set out
• Collect complete and accurate data
• Enter and analyse data
• Interpret the data in light of available
evidence
• Publish
Types of Clinical Research
Quantitative
Qualitative
Types of clinical studies
Quantitative
• Observational (epidemiological)
– Cohort
– Case-control
– Cross-sectional
– Case reports
• Experimental (interventional), i.e. “clinical trials”
– Randomised controlled trial
– Open studies
– Pilot study
– Large simplified trial
Observational versus
Experimental Research
• Observational research seen as complementary
to experimental:
• Intervention producing large impact, can
be shown using observational studies
• Infrequent adverse events require large
numbers, which is impractical in RCTs.
• Follow-up is longer term than in RCTs.
• Can establish clinical uncertainty, providing
the evidence needed to justify RCTs.
• Impractical or unethical to do an RCT.
Comparison of random and
non-random studies

HRT and coronary heart disease. Evidence
from observational studies and recently
published RCT (Lancet 2002)
Study type              Relative risk
Observational studies   0.5-0.75
RCT                     1.29
Quantitative Methods
Advantages
• ‘Objective’ assessment
• Can sample large numbers (cost!)
• Can assess prevalence
• Repeatable results (consistency)
Quantitative Methods
Disadvantages
• Way in which questions are generated
– Researcher decides limits and imposes
structure
– Little opportunity to detect “unexpected” new
outcomes
• Sources of bias
– lack of explanatory power
– limited ability to describe context
Types of clinical studies
Qualitative
• Focus group discussions
• In-depth interviewing
• Observation
• Documentary
Primary versus Secondary
Research
• Primary (original research focused on patients or populations)
– Clinical trials
– Surveys
– Cohort studies
• Secondary (reanalysis of previously gathered data)
– Systematic reviews
– Meta-analyses
– Economic analyses
Clinical trials
• Importance for ventures into clinical
research
Principles required
• Appropriate design
• Randomisation
• Blinding
• Study power or sample size
Randomised Controlled Trial RCT
• Question: treatment (efficacy, safety comparison etc.)
• Preferred design: RCT (randomised controlled trial)
Clinical trial design
• Parallel group trials
– Randomised: patients randomly allocated to
either one treatment or another
– Non-randomised: patients not randomly
allocated to treatment.
• Factorial design
– Patients may receive none, one or more than one of
several interventions.
• Cross-over trials
– Patients receive one treatment followed by another.
Takes longer, but comparisons are within-subject, so there is
less variability and results are more precise
(fewer patients required).
Randomised parallel group
design
• Participants satisfying entry criteria
• Randomly allocated to receive A or B
• Participants in both groups followed up exactly the same way
Example: Digoxin vs Placebo – DIG study
Factorial design
• Participants satisfying entry criteria
• Participants randomly allocated to one of four groups (2x2 factorial design)
Example: Heart Protection Study (simvastatin or placebo, and vitamins or placebo).
MRC/BHF Heart Protection
Study
2x2 Factorial treatment comparisons
Randomised to either:
• Simvastatin (40 mg daily) vs placebo tablets
• Vitamins (600 mg E, 250 mg C & 20 mg beta-carotene) vs placebo capsules
Planned mean duration: At least 5 years
Two-period, two-treatment
cross-over trial
• Participants satisfying entry criteria, sometimes followed by a run-in period
• Randomised to A followed by B, or vice versa
• Usually a ‘washout’ in between
Example: Aspergesic (A) vs ibuprofen (B) in rheumatoid
arthritis.
RELIABILITY
• Chance effects: random error
• Systematic biases: systematic error
To obtain evidence as reliable
as possible
• Minimise chance effects (random error) by
– Increasing the number of patients studied (do
large trials and reviews of trials)
• Minimise systematic biases (systematic error)
by
– Using an appropriate method of allocation
(randomisation)
– Ensuring investigator and/or subject unaware of
treatment allocation (blinding)
– Basing the analyses on the allocated treatment
(intention-to-treat)
– Including all relevant evidence (systematic review
of similar trials)
Randomisation
• Clinical trials, like any study, need to avoid
bias
– By the doctor, e.g. treatment preferences
– By the individual patient
– By choice of design
• Randomisation avoids bias by removing
choice of treatment by doctor or patient
• Randomisation is not always possible for
practical or ethical reasons, leading to a
controlled clinical trial (treated group
compared directly with non-treated group)
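The slides do not say how the random allocation is generated. As a minimal sketch, assuming permuted-block randomisation (one common approach, chosen here only for illustration), the hypothetical helper `block_randomise` below assigns patients to arm A or B so that neither doctor nor patient chooses the treatment and the arms stay balanced:

```python
import random


def block_randomise(n_patients, block_size=4, seed=2005):
    """Allocate patients to arm 'A' or 'B' in balanced blocks.

    Hypothetical helper for illustration; block_size and seed are
    assumptions, not values taken from the slides.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    allocations = []
    while len(allocations) < n_patients:
        # Each block contains equal numbers of A and B in random order.
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]


allocation_list = block_randomise(20)
print(allocation_list)
print("A:", allocation_list.count("A"), "B:", allocation_list.count("B"))
```

In practice the list would be prepared in advance and concealed from the people recruiting patients.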
Blinding
• Blinding avoids bias in subjective assessments,
e.g. pain or frequency of side effects
• Double blind (masked) trials
– when both patients & investigators are not
aware of which treatment group has been
assigned
• Single blind (masked) trials
– when only the study participant is not aware of
the treatment group assigned to them
• ‘Placebo’ is also useful in avoiding bias
Intention to treat (ITT)
• Intention of randomisation is to establish
similar groups of patients in each arm
• Problems arise when non-adherence
may be related to outcome or prognosis,
leading to biased representation
• ITT analyses all patients according to
randomised treatment irrespective of
protocol violations etc.
• However, it does not solve all problems
Number of patients required –
sample size
• Requirement for well-designed studies
• Most journals now require sample size
calculations
• Reassurance money well spent – likelihood
study will give unequivocal results
• Requirement of regulatory authorities, e.g. FDA
• Low sample size can be a reason for not
recognising that one treatment is superior
• Unethical to perform a study if numbers too
small to detect a useful difference
What is “power” of a study?
• “the ability to detect a true difference of
clinical importance” Doug Altman
• “the confidence with which the
investigator can claim that a specified
treatment benefit has not been
overlooked” Sheila Gore
Estimating sample size and
power
• Identify a single major outcome measure –
primary endpoint
– Survival, response rate, quality of life
• Specify size of difference required to detect
– Improvement in response from 20% to 30%
• ‘We want to be reasonably certain of detecting
such a difference if it really exists’
– ‘detecting a difference’ refers to P<0.05
– ‘reasonably certain’ refers to having a chance of at
least 80% of obtaining such a P value
Methods to calculate sample
size
• Equations
– Mathematical equations available for computing
sample size given the effect size, significance level α and power (1 - β)
• Tables
– Based on equations above
• Nomogram
– Summarises figures in a graph, easy to use
• Computer packages
Example
• Objective: to compare effect of drug A vs drug B
using blood pressure as outcome measure
• Design: RCT – half to drug A, half to drug B
• Require 80% power, and significance level set at
5%
• Expected mean difference between the two
groups= 6
• Pooled standard deviation SD=10
• Standardised difference (effect size) = difference in means/SD
= 6/10 = 0.6
• From tables n=45 per group
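The example can be roughly checked with the usual normal-approximation formula for comparing two means, n per group = 2(z for α/2 + z for power)² / (effect size)². This is a sketch under that assumption; the function name `n_per_group` is hypothetical. The approximation gives about 44 per group, slightly below the 45 read from tables, which may include a small-sample correction.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided comparison of two means."""
    z = NormalDist()  # standard Normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for a 5% significance level
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


raw_n = n_per_group(6 / 10)  # mean difference 6, pooled SD 10
print(round(raw_n, 1), "->", ceil(raw_n), "per group (normal approximation)")
```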
Common statistical definitions
Classification of data
• Different types of data
– Nominal / categorical - used in
classification (e.g. blood group, sex)
– Ordinal - ordered categorical data (e.g.
non-smoker, <10 per day, 10-20 per day, >20 per day)
– Interval / continuous data (e.g. age,
birthweight, plasma K levels)
Graphical presentations
BAR CHARTS
• Bar charts are used to show
(graphically) frequency distributions for
categorical data.
• The height of each ‘bar’ in the bar chart
is proportional to the number of
observations or frequency of the
observations in each category.
BAR CHART
[Figure: bar chart of blood groups. x-axis: blood group (A, AB, B, O); y-axis: number of patients.]
Histograms
• Similar to bar charts but for continuous
(interval) data
• The width of the bars varies only when the
data intervals vary.
• Boundaries of histogram ‘bars’ are taken as
half way between the upper limit of the
lower group and the lower limit of the upper
group.
[Figure: histogram of pre-operative haemoglobin. x-axis: pre-operative % haemoglobin; y-axis: frequency (number of patients). Mean = 61.3, Std. Dev = 14.40, N = 45.]
The Normal distribution
• An important distribution in statistics
– used for continuous data
– bell-shaped curve
– symmetric about the mean (or median)
[Figure: standard Normal curve. The central 95% of the area lies between -1.96 and 1.96, with 2.5% in each tail.]
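The 95% and 1.96 figures for the standard Normal distribution can be checked with Python's standard library. This is only a small illustrative check; the variable names are arbitrary.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -1.96 and 1.96.
central_area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Value that leaves 2.5% in the upper tail.
upper_cut = std_normal.inv_cdf(0.975)

print(round(central_area, 3), round(upper_cut, 2))
```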
Measures of location
• Gives an idea of the ‘average’ value on a
particular scale
Common measures are:
– Mean - sum of observations / number of
observations
– Median - middle value of the sample when
arranged in order
– Mode - most common value (used when
only a few different values)
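As a small illustration of the three measures, the sketch below uses a made-up sample of seven values (not data from the slides) and Python's `statistics` module:

```python
from statistics import mean, median, mode

# Invented illustrative values, e.g. % haemoglobin in seven patients.
sample = [52, 58, 61, 61, 64, 70, 89]

sample_mean = mean(sample)      # sum of observations / number of observations
sample_median = median(sample)  # middle value when arranged in order
sample_mode = mode(sample)      # most common value

print(sample_mean, sample_median, sample_mode)
```

The single large value (89) pulls the mean above the median, which is why the median is often preferred for skewed data.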
Variation
• Humans differ in response to exposure
to adverse effects
• Humans differ in response to treatment
• Humans differ in disease symptoms
• Diagnosis and treatment is often
probabilistically based
Measures of variation
• Gives an idea of the spread or variability of the
data
• Common measures are:
– Range
– Quartiles - The ‘inter-quartile range’ is the
difference between the 25th and 75th
centiles
– Sample variance: $\sigma^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$
Measures of dispersion
(contd.)
• The standard deviation ($\sigma$) is the square
root of the variance.
– Standard error (if repeated samples were
taken, the standard deviation of means
from each sample)
• SE(Mean) = $\sigma / \sqrt{n}$
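A minimal sketch of these calculations on a made-up sample (the values are invented for illustration). It computes the sample variance with the n - 1 divisor both by hand and with the `statistics` module, then the standard deviation and the standard error of the mean:

```python
from math import sqrt
from statistics import stdev, variance

values = [52, 58, 61, 61, 64, 70, 89]  # invented illustrative data
n = len(values)
x_bar = sum(values) / n

# Sample variance by hand: squared deviations from the mean, divided by n - 1.
manual_var = sum((x - x_bar) ** 2 for x in values) / (n - 1)

sample_var = variance(values)  # library version, also divides by n - 1
sd = stdev(values)             # square root of the sample variance
se_mean = sd / sqrt(n)         # standard error of the mean

print(manual_var, sample_var, round(sd, 2), round(se_mean, 2))
```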
Confidence intervals
• Over emphasis on hypothesis testing and
p-values.
• The size and range of the difference
between two groups is more informative
than whether it is statistically significant or
not.
• Confidence intervals, if appropriate to the
type of study, should be used for major
findings in both main text and abstract.
Confidence intervals
• If a CI is constructed, the significance of a
hypothesis test can be inferred from it.
• For example, a 95% CI for the difference of
two means that contains 0 implies that the
difference between the means is non-significant at the 5% level
Systolic blood pressure in 100
diabetic and 100 non-diabetic men
[Figure: two histograms of systolic blood pressure (mm Hg), one for diabetics and one for non-diabetics; y-axis: frequency. Sample means of 146.4 and 140.4 mm Hg are marked.]
Difference between sample means = 6 mm Hg.
Systolic blood pressure in 100
men with diabetes and 100 men
without
• Difference of 6.0mm Hg found between mean
systolic blood pressures, standard error 2.5mm
Hg.
• 95% confidence interval for population
difference is from 1.1 to 10.9 mm Hg.
• This means that intervals constructed in this way
would include the ‘true’ population difference in
mean blood pressure in 95% of repeated studies.
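The interval quoted above can be reproduced from the difference and its standard error, assuming the usual large-sample form: estimate ± 1.96 × standard error. A minimal sketch:

```python
from statistics import NormalDist

difference = 6.0      # mm Hg, difference between sample means
standard_error = 2.5  # mm Hg, standard error of the difference

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
lower = difference - z * standard_error
upper = difference + z * standard_error

print(round(lower, 1), "to", round(upper, 1), "mm Hg")
```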
What affects the width of a
CI?
• The sample size, by a factor of $\sqrt{n}$. Smaller
sample size leads to lower precision.
• Variability of data - less variable the data,
more precise the estimate.
• Degree of confidence. 95% most commonly
used. If greater or less confidence is required,
the CIs widen or narrow respectively.
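To make the effect of sample size and confidence level concrete, the sketch below computes the half-width of a normal-approximation CI for a single mean. The function name `ci_half_width` and the SD of 10 are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation CI for a single mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


assumed_sd = 10  # illustrative value, not from the slides
for n in (25, 100, 400):
    print("n =", n, "half-width =", round(ci_half_width(assumed_sd, n), 2))
print("99% CI, n = 100:", round(ci_half_width(assumed_sd, 100, 0.99), 2))
```

Quadrupling the sample size halves the width, and asking for 99% rather than 95% confidence widens the interval.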
P-values and CIs
• One can infer from CIs whether there is a statistically
significant difference, but not vice versa.
• Example, difference in BP between diabetics and
non-diabetics found to be 6mm Hg. 95%
confidence interval for population difference is from
1.1 to 10.9 mm Hg.
• The interval does not contain ‘0’ so we can infer
that there is a statistically significant difference
between the groups. In fact, the p-value from an
independent t-test was p=0.02.
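The reported p-value can be approximated from the same two numbers. This sketch uses a Normal (z) approximation rather than the independent t-test used in the slides; with 100 men per group the two give very similar answers.

```python
from statistics import NormalDist

difference = 6.0      # mm Hg
standard_error = 2.5  # mm Hg

z_statistic = difference / standard_error
# Two-sided p-value: probability of a result at least this extreme
# in either direction if there were no true difference.
p_value = 2 * (1 - NormalDist().cdf(abs(z_statistic)))

print(round(z_statistic, 2), round(p_value, 2))
```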
Probability
• Probability and statistical tests
– Statistical tests are used to assess the weight
of evidence and to estimate probability that
data arose from chance
– Presented as ‘p value’, usually p<0.05, i.e. a
difference at least as large as that observed would
be expected to arise by chance less than 5% of the
time if there were no true difference, or p<0.001,
less than 0.1% of the time
– 5% or 1% is known as the significance level of
the test or alpha (α)
Effect on significance
• ‘Non-significance’
– Indicates insufficient weight of evidence
– Does not mean ‘no clinically important difference
between groups’
– If power of test is low (i.e. sample size too small), all
one can conclude is that the question of difference
between groups is unresolved
• Confidence intervals show, more informatively,
the impact of sample size upon precision of a
difference
Reporting p-values
P value          Wording                  Summary
>0.05            Not significant          ns
0.01 to 0.05     Significant              *
0.001 to 0.01    Very significant         **
<0.001           Extremely significant    ***
Report the actual p-value
Measuring effectiveness
Risk
PROPORTION
A ratio where the numerator (top) is part of the denominator
(bottom).
RISK
Number of subjects in a group who have an event divided by total
number of subjects in the group. It is the probability
(proportion) of having an event in that group (P). It is called
incidence when expressed per unit time.
RELATIVE RISK (RR)
Ratio of risk in exposed group to risk in not exposed group (P1/P2)
Example
Type of vaccine   Got influenza   Avoided influenza   Total
I                 43              237                 280
II (Control)      52              198                 250
Risk of disease in Vaccine Group I = 43/280=0.154
Risk of disease in Vaccine Group II=52/250=0.208
Relative Risk (Risk Ratio) =0.154/0.208 =0.74
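The risk and relative risk calculation can be reproduced directly from the counts in the vaccine example. A minimal sketch; the variable names are arbitrary:

```python
# Counts from the vaccine example.
cases_vaccine_1, total_vaccine_1 = 43, 280
cases_vaccine_2, total_vaccine_2 = 52, 250  # control

risk_1 = cases_vaccine_1 / total_vaccine_1  # risk in Vaccine Group I
risk_2 = cases_vaccine_2 / total_vaccine_2  # risk in Vaccine Group II
relative_risk = risk_1 / risk_2

print(round(risk_1, 3), round(risk_2, 3), round(relative_risk, 2))
```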
Odds
ODDS
Probability of developing disease divided by probability of not developing
disease. P/ (1-P)
Often expressed as number of times something expected not to happen:
number of times something expected to happen.
ODDS RATIO (OR)
Ratio of odds for exposed group divided by odds for not exposed group.
{P1/(1-P1)}/{P2/(1-P2)}
Odds ratios are often interpreted as relative risks, which is a good
approximation when events are rare, and they emerge naturally in some
types of studies (case-control studies)
Example
Odds of disease in Vaccine Group I = 0.154/(1-0.154)=0.182
Odds of disease in Vaccine Group II= 0.208/(1-0.208)=0.263
Odds ratio of getting disease in Group I relative to Group
II=0.182/0.263=0.69 (close to relative risk of 0.74)
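The same counts give the odds and the odds ratio. This sketch works from unrounded risks, so the odds for Group I come out at about 0.181 rather than the 0.182 obtained from the rounded risk of 0.154; the odds ratio still rounds to 0.69. The helper name `odds` is arbitrary.

```python
def odds(p):
    """Convert a probability into odds: P / (1 - P)."""
    return p / (1 - p)


risk_1 = 43 / 280  # Vaccine Group I
risk_2 = 52 / 250  # Vaccine Group II (control)

odds_1 = odds(risk_1)
odds_2 = odds(risk_2)
odds_ratio = odds_1 / odds_2

print(round(odds_1, 3), round(odds_2, 3), round(odds_ratio, 2))
```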
Absolute risk reduction
Absolute risk reduction (ARR)
Risk in control group minus risk in treated group
ARR = p2 - p1 (p1 = risk in treated group, p2 = risk in control group)
Number needed to treat (NNT) = 1/ARR
This is the number you would need to treat under
each of two treatments to get one extra person
cured under the new treatment
Example
Absolute risk reduction for vaccine I=
0.208 - 0.154=0.054
NNT=1/0.054=18.5
Thus on average one would have to give vaccine I to
19 patients to expect one extra patient to be
protected from influenza compared with vaccine
II.
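A minimal sketch of the ARR and NNT arithmetic for the vaccine example, taking the ARR as control risk minus treated risk as in the worked figures. The NNT is rounded up because patients come in whole numbers.

```python
from math import ceil

risk_treated = 43 / 280  # Vaccine Group I
risk_control = 52 / 250  # Vaccine Group II (control)

absolute_risk_reduction = risk_control - risk_treated
number_needed_to_treat = 1 / absolute_risk_reduction

print(round(absolute_risk_reduction, 3), ceil(number_needed_to_treat))
```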
Summary
• Have clear objectives and aims to study
• Choose the study design that best
addresses these aims
• Use randomisation, blinding etc. where
appropriate
• Make sure sufficient numbers of
individuals studied to be able to reliably
answer the question.
Useful statistical references
• M Bland. An Introduction to Medical Statistics.
• Campbell MJ and Machin D (1993) Medical
Statistics: a commonsense approach. Wiley
• DG Altman. Practical statistics for medical
research. London: Chapman & Hall, 1991.
• DS Moore and GP McCabe. Introduction to
the practice of statistics. WH Freeman and
Company, New York, 3rd Edition. 1999.