Statistics, Trial Design, Epidemiological Techniques, Performance Status and Co-morbidity
I Dukic, B Zelhof
North Western Urology Teaching, 8th July 2014
Overview
• Basic statistics and epidemiological statistics
• Measurement of performance status and co-morbidity
• Trial design
• Vivas
1. What is evidence-based medicine? What are the problems with randomised studies?
2. How would you set up a phase III trial?
3. What do you understand by statistical significance and confidence intervals?
Basic Statistics
• 2x2 table: sensitivity, specificity
• Prevalence/incidence
• ROC
• Hypothesis testing
• P value
• Confidence interval
• RRR, ARR, CER, EER, NNT
Data types
• Qualitative: categorical measurement, i.e. not numerical
▪ Nominal: unordered categories, e.g. yes/no
▪ Ordinal: ranked categories, e.g. most useful to least useful
• Quantitative: numerical measurement
▪ Interval: equal intervals but no true zero, e.g. temperature in °C
▪ Ratio: equal intervals with a true zero, e.g. height
Parametric vs Non-Parametric
• Non-parametric tests are generally less powerful, so a parametric test should be used where possible, provided its assumptions are fulfilled
• The basic distinction between parametric and non-parametric tests is:
1. If the measurement scale is nominal or ordinal (qualitative): non-parametric tests
2. If the measurement scale is interval or ratio (quantitative): parametric tests
3. Other considerations include whether the data are normally distributed (an assumption of parametric tests) and the relationship between the groups or variables being tested
Statistics
• Average of data points: mean, median, mode, weighted mean
• Measure of spread: centile, standard deviation, range
• How sure of the answer: p value, power calculation, confidence interval (CI), etc.
Skew
• Mean: the average value, calculated by adding all the observations and dividing by the number of observations (parametric)
• Mode: the most common (frequent) value (non-parametric)
• Median: the middle value of the ordered list, often used when data are skewed (non-parametric)
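A short Python sketch of the three averages on a hypothetical right-skewed sample (the numbers are invented for illustration, e.g. length of hospital stay in days). The long right tail pulls the mean well above the median, which is why the median is preferred for skewed data.

```python
import statistics

# Hypothetical right-skewed sample (invented for illustration)
stay_days = [1, 2, 2, 3, 4, 10, 30]

mean = statistics.mean(stay_days)      # pulled upwards by the long right tail
median = statistics.median(stay_days)  # middle value of the ordered list
mode = statistics.mode(stay_days)      # most frequent value

print(f"mean={mean:.1f}, median={median}, mode={mode}")
```

For a symmetrical (e.g. normal) distribution the three values coincide; the further they drift apart, the more skewed the data.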
2x2 tables
                 Patient HAS the disease        Patient does NOT have the disease
Positive test    Correct: True Positive (TP)    Wrong: False Positive (FP)
Negative test    Wrong: False Negative (FN)     Correct: True Negative (TN)
Sensitivity
Proportion of patients with the disease who are correctly identified by a positive test:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity
Proportion of patients without the disease who are correctly identified by a negative test:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Positive Predictive Value (PPV)
Proportion of patients with a positive test who have the disease:
$$PPV = \frac{TP}{TP + FP}$$
Negative Predictive Value (NPV)
Proportion of patients with a negative test who do not have the disease:
$$NPV = \frac{TN}{TN + FN}$$
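The four formulas can be sketched in Python from a single 2x2 table. The counts below are hypothetical (1,000 patients, 100 of whom have the disease) and the class name `TwoByTwo` is an illustrative choice, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a diagnostic 2x2 table (test result vs true disease status)."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts: 1,000 patients, 100 with the disease
table = TwoByTwo(tp=90, fp=40, fn=10, tn=860)
print(f"Sensitivity {table.sensitivity():.3f}")  # 90/100
print(f"Specificity {table.specificity():.3f}")  # 860/900
print(f"PPV {table.ppv():.3f}")                  # 90/130
print(f"NPV {table.npv():.3f}")                  # 860/870
```

Although sensitivity and specificity are both about 0.9 here, the PPV is only about 0.69 because just 10% of the patients have the disease: predictive values depend on prevalence, whereas sensitivity and specificity do not.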
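Relevance of Sensitivity + Specificity
• A highly specific test is unlikely to give a false positive, so a positive result can be regarded as a true positive: SpPIn (Specific test, Positive result rules In)
• A highly sensitive test rarely misses a condition, so a negative result is reassuring: SnNOut (Sensitive test, Negative result rules Out)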
Type I + Type II Error
• Type I
▪ Rejecting the null hypothesis when it is in fact true is called a Type I error
▪ False positive (its probability is α, the significance level)
• Type II
▪ Not rejecting the null hypothesis when the alternative hypothesis is in fact true is called a Type II error
▪ False negative (its probability is β; the power of a study is 1 − β)
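Likelihood Ratio
• The probability that a specified test result would be expected in a patient with the condition of interest, compared with the probability of the same result in a patient without the condition. For a positive test result:
$$LR^{+} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$$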
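A minimal Python sketch of how a likelihood ratio is used. The negative likelihood ratio, (1 − sensitivity) / specificity, and the probability-to-odds conversion are standard additions that are not on the slide; the test characteristics (sensitivity and specificity 90%, pre-test probability 10%) are hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # for a positive result
    lr_neg = (1 - sensitivity) / specificity   # for a negative result
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert probability to odds, multiply by the LR, then convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical test: sensitivity 90%, specificity 90%, pre-test probability 10%
lr_pos, lr_neg = likelihood_ratios(0.9, 0.9)
post_pos = post_test_probability(0.1, lr_pos)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after a positive result: {post_pos:.2f}")
```

Because LRs multiply odds rather than probabilities, the same test raises a 10% pre-test probability to about 50% after a positive result.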
Receiver Operating Characteristics (ROC)
• Graph of the pairs of true positive rates (sensitivity) and false positive rates (100% − specificity)
• Assesses whether a test is useful
• Can compare two different tests
• Helps select the optimal cut-off value for a test
ROC
• It shows the trade-off between sensitivity and specificity: as the cut-off is changed, any increase in sensitivity is accompanied by a decrease in specificity
• The more closely the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
• The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test
• The area under the curve (AUC) is a measure of test accuracy: 0.5 is no better than chance, 1.0 is a perfect test
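The ROC construction can be sketched in Python: sweep the cut-off from the strictest to the most lenient value, record (1 − specificity, sensitivity) at each step, and take the area under the resulting curve with the trapezoidal rule. The scores below are hypothetical, and treating a result at or above the cut-off as positive is an assumption of this sketch.

```python
def roc_points(diseased: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, one per cut-off.

    A result >= cut-off is called positive; cut-offs run from the highest
    observed score (strict) down to the lowest (lenient).
    """
    points = [(0.0, 0.0)]  # cut-off above every score: nobody is positive
    for cut_off in sorted(set(diseased + healthy), reverse=True):
        tpr = sum(s >= cut_off for s in diseased) / len(diseased)  # sensitivity
        fpr = sum(s >= cut_off for s in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical marker levels in patients with and without the disease
diseased = [0.9, 0.8, 0.7, 0.55]
healthy = [0.6, 0.4, 0.3, 0.2]

points = roc_points(diseased, healthy)
for fpr, tpr in points:
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
auc = auc_trapezoid(points)
print(f"AUC = {auc:.4f}")
```

Each lower cut-off can only raise sensitivity and lower specificity, which is the trade-off described above; the AUC equals the probability that a randomly chosen diseased patient scores higher than a randomly chosen healthy one.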
Importance of cut-off value on test performance
ROC curves for optimal PSA range in patients aged 50-80 (panels: 50-60, 60-70 and 70-80 years)
El-Gallery et al, Urology, 46:2000. 1995
Incidence + Prevalence
• Incidence: the proportion of new cases of a disease in the population at risk during a specified time interval. The disorder, the population and the time period should be defined, and incidence is usually reported as a rate
• Prevalence: the proportion of people in a population who have a disease at a point in time (point prevalence) or over some period of time (period prevalence)
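The distinction between the two measures can be shown with a small Python calculation on hypothetical numbers. As a simplifying assumption, incidence here is the proportion of the at-risk population newly diagnosed in one year (cumulative incidence); a person-time rate would be more precise when people enter or leave follow-up.

```python
# Hypothetical town of 10,000 people followed for one year (invented numbers)
population = 10_000
existing_cases_at_start = 200  # already have the disease at the start of the year
new_cases_during_year = 50     # first diagnoses during the year

# Point prevalence: existing cases / whole population at that moment
point_prevalence = existing_cases_at_start / population

# Incidence: new cases / population AT RISK (existing cases cannot become new cases)
population_at_risk = population - existing_cases_at_start
incidence_per_1000_per_year = 1000 * new_cases_during_year / population_at_risk

print(f"Point prevalence: {point_prevalence:.1%}")
print(f"Incidence: {incidence_per_1000_per_year:.1f} per 1,000 per year")
```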
Hypothesis (significance) Testing
• Null hypothesis (H0): the exposure/intervention being studied is not associated with the outcome of interest, e.g. the difference in means = 0
• Alternative hypothesis (H1): holds if the null hypothesis is not true
• Two-tailed test: allows for a difference in either direction, e.g. smoking rates differ between men and women (men > women or women > men)
• One-tailed test: the direction of effect is specified in H1, e.g. the new drug cannot make things worse
P value
• Viva question 3
• Hypothesis testing produces a p-value
• The p-value is the probability of obtaining results at least as extreme as ours if the null hypothesis is true (i.e. by chance alone). It is not the probability that the null hypothesis is true
• Allows assessment of whether findings differ significantly from a reference value
• P < 0.05: conventionally regarded as statistically significant; the smaller the p-value, the greater the evidence against the null hypothesis
• P > 0.05: we do not reject the null hypothesis; this does not mean the null hypothesis is true
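A minimal Python sketch turning a test statistic into a p-value, assuming the statistic z follows a standard normal distribution under H0 (as in a z-test or a large-sample test). The z values are hypothetical. It also shows that the same result gives half the p-value with a one-tailed test.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value, P(|Z| >= |z|) under H0, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_sided_p_from_z(z: float) -> float:
    """One-sided p-value, P(Z >= z), when H1 specifies an increase."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (1.0, 1.96, 3.0):
    print(f"z = {z}: two-sided p = {two_sided_p_from_z(z):.4f}, "
          f"one-sided p = {one_sided_p_from_z(z):.4f}")
```

z = 1.96 is the familiar threshold that gives a two-sided p-value of about 0.05.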
Confidence Interval
• Viva Teaching: question 3
• The range of plausible values for the “true” effect
• 95% confidence is generally used
• It can be used to make a decision without providing an exact p-value
• If the null value (e.g. 0 for a difference, 1 for a ratio) lies outside the 95% CI, reject H0: p < 0.05, although the exact value is not given
What determines the width of the confidence interval?
1. Sample size: a larger sample size gives more precise results with a narrower CI
2. Variability of the characteristic being studied: the less variable it is (between subjects, within subjects, measurement error, etc.), the more precise the estimate
3. The degree of confidence required (95%, 90%, 65%): the more confidence required, the wider the interval
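The effect of sample size and confidence level on CI width can be sketched in Python with the simple normal-approximation (Wald) interval for a proportion, p ± z·√(p(1 − p)/n). The Wald interval is chosen here only for brevity (it is unreliable for small samples or proportions near 0 or 1), and the 30% observed proportion is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def width(ci: tuple[float, float]) -> float:
    return ci[1] - ci[0]


# Same hypothetical observed proportion (30%) at different n and confidence levels
ci_100 = proportion_ci(30, 100)
ci_400 = proportion_ci(120, 400)
ci_100_99 = proportion_ci(30, 100, confidence=0.99)

print(f"n=100, 95% CI: {ci_100[0]:.3f} to {ci_100[1]:.3f}")
print(f"n=400, 95% CI: {ci_400[0]:.3f} to {ci_400[1]:.3f}")
print(f"n=100, 99% CI: {ci_100_99[0]:.3f} to {ci_100_99[1]:.3f}")
```

Because the width scales with 1/√n, quadrupling the sample size halves the interval, while asking for 99% rather than 95% confidence widens it.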
Relative Risk / Risk Ratio
• Risk of an event (or of developing a disease) relative to exposure. It is a ratio of the probability of the event occurring in the exposed group versus the non-exposed group

               Outcome   No outcome
Exposed        A         B
Not exposed    C         D

$$RR = \frac{A/(A+B)}{C/(C+D)}$$
Risk ratio: risk in exposed / risk in unexposed
Relative Risk 2
• Experimental Event Rate:
$$EER = \frac{A}{A+B}$$
• Control Event Rate:
$$CER = \frac{C}{C+D}$$
• Relative Risk:
$$RR = \frac{EER}{CER}$$
Relative Risk 3
• Suited to clinical trials
• An RR of 1 means there is no difference in risk between the two groups
• An RR of < 1 means the event is less likely to occur in the experimental group than in the control group
• An RR of > 1 means the event is more likely to occur in the experimental group than in the control group
Relative Risk Reduction
Absolute risk reduction:
$$ARR = CER - EER$$
Relative risk reduction:
$$RRR = \frac{\text{Risk difference}}{\text{Baseline risk}} = \frac{CER - EER}{CER}$$
Worked Example: Relative Risk
• ‘PROSCAR more than halves the risk of developing acute urinary retention and the need for surgery’
• PLESS study

               Retention   No retention   Total
Finasteride    42 (A)      1471 (B)       1513
Placebo        99 (C)      1404 (D)       1503

$$EER = \frac{A}{A+B} = \frac{42}{1513} = 2.8\%$$
$$CER = \frac{C}{C+D} = \frac{99}{1503} = 6.6\%$$
$$RR = \frac{EER}{CER} = \frac{2.8}{6.6} = 0.42$$
Worked Example RRR
(PLESS: EER = 2.8% with finasteride, CER = 6.6% with placebo)
$$RRR = \frac{\text{Risk difference}}{\text{Baseline risk}} = \frac{6.6 - 2.8}{6.6} = 58\%$$
Worked Example ARR
(PLESS: CER = 6.6% with placebo, EER = 2.8% with finasteride)
• Absolute Risk Reduction = Control event rate − Experimental event rate
$$ARR = CER - EER = 6.6\% - 2.8\% = 3.8\%$$
Worked Example NNT
(PLESS: ARR = 3.8%)
$$NNT = \frac{1}{ARR} = \frac{1}{0.038} \approx 26$$
Statistics in PLESS study
• ‘PROSCAR more than halves the risk of developing acute urinary retention and the need for surgery’
• Retention in control = 6.6%
• Retention in treatment = 2.8%
• Relative risk reduction = 58%
• Absolute risk reduction = 3.8%
• NNT = 26
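The whole PLESS calculation can be reproduced in a few lines of Python from the acute urinary retention counts in the worked example (finasteride 42/1513, placebo 99/1503). The `TrialTable` class is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TrialTable:
    """Event counts for a two-arm trial (treated vs control)."""
    treated_events: int
    treated_total: int
    control_events: int
    control_total: int

    def eer(self) -> float:  # experimental event rate
        return self.treated_events / self.treated_total

    def cer(self) -> float:  # control event rate
        return self.control_events / self.control_total

    def rr(self) -> float:   # relative risk = EER / CER
        return self.eer() / self.cer()

    def arr(self) -> float:  # absolute risk reduction = CER - EER
        return self.cer() - self.eer()

    def rrr(self) -> float:  # relative risk reduction = ARR / baseline risk
        return self.arr() / self.cer()

    def nnt(self) -> float:  # number needed to treat = 1 / ARR
        return 1 / self.arr()


pless = TrialTable(treated_events=42, treated_total=1513,
                   control_events=99, control_total=1503)

print(f"EER {pless.eer():.1%}, CER {pless.cer():.1%}")
print(f"RR {pless.rr():.2f}, RRR {pless.rrr():.0%}, ARR {pless.arr():.1%}")
print(f"NNT {pless.nnt():.1f}")
```

Working from the unrounded rates gives 1/ARR of about 26.2. Many evidence-based medicine guides round an NNT up to the next whole patient, which would give 27 here.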
Combination therapy: NNT
                                MTOPS   CombAT
Prevent clinical progression    37      18
Prevent AUR                     147     22
Prevent surgery                 52      18

McConnell, J.D. et al., 2003. The Long-Term Effect of Doxazosin, Finasteride, and Combination Therapy on the Clinical Progression of Benign Prostatic Hyperplasia. New England Journal of Medicine, 349(25), pp.2387–2398.
Roehrborn, C.G. et al., 2010. The Effects of Combination Therapy with Dutasteride and Tamsulosin on Clinical Outcomes in Men with Symptomatic Benign Prostatic Hyperplasia: 4-Year Results from the CombAT Study. European Urology, 57(1), pp.123–131.
Why is NNT important?
• It takes into account the underlying frequency of the outcome (which RRR does not)
• The ideal NNT is 1, where everyone improves with treatment and no-one improves with control
• The higher the NNT, the less effective the treatment
• NNTs are only one element of decision making and need to be integrated with:
▪ patients' underlying risk
▪ patient preferences
▪ caregiver experience and judgment
▪ local constraints and conditions
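Why the NNT reflects baseline risk while the RRR does not can be sketched in Python. Since RRR = ARR / baseline risk, the ARR equals baseline risk × RRR; the 50% RRR and the two baseline risks below are hypothetical.

```python
def nnt_from_rrr(baseline_risk: float, rrr: float) -> float:
    """NNT = 1 / ARR, where ARR = baseline risk x RRR."""
    arr = baseline_risk * rrr
    return 1 / arr


# Hypothetical treatment with the same 50% relative risk reduction in two populations
for baseline_risk in (0.20, 0.02):
    print(f"Baseline risk {baseline_risk:.0%}: NNT = {nnt_from_rrr(baseline_risk, 0.5):.0f}")
```

The same 50% RRR means treating 10 high-risk patients, but 100 low-risk patients, to prevent one event.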
Common NNTs in Urology
• Appropriate antibiotic in UTI: 1
• Treatment for stone passage in ureteric colic: 4
• Intraurethral alprostadil for ED: 2.3
• Compression stockings for post-op DVT: 9
• Aspirin/streptokinase after MI: 20
• Finasteride to prevent retention: 26
• Aspirin after MI: 40
Odds Ratio
• A way of comparing whether the probability of an event is the same for two groups
• OR = 1: equally likely in both groups
• OR > 1: more likely in the first group
• OR < 1: less likely in the first group

               Outcome   No outcome
Exposed        A         B
Not exposed    C         D

$$OR = \frac{A/B}{C/D} = \frac{AD}{BC}$$
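A short Python sketch comparing the odds ratio with the relative risk for the same 2x2 table, using the PLESS counts (exposed = finasteride, outcome = acute urinary retention: A = 42, B = 1471, C = 99, D = 1404).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (A/B) / (C/D) = AD / BC."""
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [A/(A+B)] / [C/(C+D)], for comparison."""
    return (a / (a + b)) / (c / (c + d))


a, b, c, d = 42, 1471, 99, 1404
or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)
print(f"OR = {or_value:.2f}, RR = {rr_value:.2f}")
```

Because retention is fairly uncommon in both arms, the OR (about 0.40) is close to the RR (about 0.42); the two diverge as the event becomes more common.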
Summary of effect measures
• Absolute Risk Reduction (ARR): absolute change in risk, i.e. risk of event in control group − risk of event in treated group. No effect: ARR = 0%. Total success: ARR = initial risk
• Relative Risk Reduction (RRR): proportion of risk removed by treatment, i.e. ARR divided by initial risk in control group. No effect: RRR = 0%. Total success: RRR = 100%
• Relative Risk (RR): risk of event in treated group divided by risk of event in control group. No effect: RR = 1 (100%). Total success: RR = 0
• Odds Ratio (OR): odds of event in treated group divided by odds of event in control group. No effect: OR = 1. Total success: OR = 0
• Number Needed to Treat (NNT): number of patients needing treatment to prevent one event, i.e. reciprocal of ARR. No effect: NNT = ∞. Total success: NNT = 1/initial risk
Other Random Statistical Stuff
• The population should be clearly defined
• Sample: every individual in the population from which it is drawn must have an equal chance of being included
• Chi-square test (also chi-squared test or χ2 test): any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true, or in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough
• Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW) or Wilcoxon rank-sum test): a non-parametric statistical hypothesis test for assessing whether two independent samples of observations tend to have equally large values
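Both tests can be sketched in plain Python. The chi-square function is Pearson's test on a 2x2 table without continuity correction, applied here to the PLESS retention counts; with 1 degree of freedom the p-value follows from χ² = z². The Mann–Whitney function computes only the U statistic (pairs in which the first sample wins, ties counting 0.5) on hypothetical scores; a p-value would need exact tables or a normal approximation.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = z**2, so p = P(|Z| >= sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for sample x: pairs where x beats y, ties counted as 0.5."""
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
               for xi in x for yi in y)


# PLESS retention counts (finasteride row first, placebo row second)
chi2, p = chi_square_2x2(42, 1471, 99, 1404)
print(f"chi-square = {chi2:.1f}, p = {p:.1e}")

# Hypothetical symptom scores in two independent groups
group_1 = [12, 15, 18, 20]
group_2 = [8, 10, 14, 16]
print(f"U = {mann_whitney_u(group_1, group_2)}")
```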
Levels of evidence
1a: evidence obtained from meta-analysis of randomized studies
1b: evidence obtained from at least one randomized trial
2a: evidence obtained from a well-designed controlled study without randomization
2b: evidence obtained from at least one other type of well-designed cohort or case-control study
3: evidence obtained from well-designed non-experimental studies such as comparative studies, correlation studies and case reports
4: evidence obtained from expert opinion
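Controlled trials
• Randomized controlled trial (RCT): a type of scientific experiment in which participants are randomly allocated to one of the interventions being compared (e.g. new treatment or control). It is the gold standard for a clinical trial. RCTs are often used to test the efficacy of various types of intervention within a patient population
• Double blinded: neither the participants nor the investigators know which intervention each participant receives
• Single blinded: only one party (usually the participant) does not know which intervention is received
• Non-randomized trial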
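Random allocation can be sketched in Python. Permuted-block randomisation is used here as one common method (an assumption; the slides do not specify a scheme): each block holds equal numbers of arm A and arm B in random order, so the arms stay balanced throughout recruitment. The function name and the seed are illustrative.

```python
import random
from typing import Optional


def block_randomise(n_participants: int, block_size: int = 4,
                    seed: Optional[int] = None) -> list[str]:
    """Permuted-block randomisation to two arms, 'A' and 'B'."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)  # seeded generator so the list is reproducible
    allocation: list[str] = []
    while len(allocation) < n_participants:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)     # random order within each block
        allocation.extend(block)
    return allocation[:n_participants]


allocation = block_randomise(20, block_size=4, seed=2014)
print(" ".join(allocation))
print(f"A: {allocation.count('A')}, B: {allocation.count('B')}")
```

In a real trial the sequence would be generated and concealed centrally so that recruiting clinicians cannot predict the next allocation; small fixed block sizes make prediction easier, which is why varying block sizes are often used.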
Observational studies
• Cohort study (longitudinal study)
▪ One way of getting around the problem of the small proportion of people with the disease of interest is the cohort study
▪ A group of people (i.e. the cohort) is followed over time to observe whether they develop the disease
▪ Generally concerned with the aetiology of disease rather than treatment
• Case-control study
▪ Another solution to the problem of the small number of people with the disease of interest
▪ Patients with a particular condition (cases) are matched with controls who do not have the condition
▪ Generally concerned with the aetiology of disease rather than treatment
• Cross-sectional study
▪ Cross-sectional studies involve data collected at a defined time. They are often used to assess the prevalence of acute or chronic conditions, or to answer questions about the causes of disease or the results of medical intervention. They may also be described as censuses
Phases of studies
• Trials consist of 4 phases. If a drug passes phases 1-3, it is usually approved by regulatory bodies
• Phase 1: screening for safety. The experimental drug or treatment is given to a small group of subjects (20-80) for the first time to evaluate its safety, determine a safe dosage range and identify side effects
• Phase 2: establishing the testing protocol. The experimental treatment is given to a larger group (100-300) to see whether it is effective and to further evaluate its safety
• Phase 3: final testing. The treatment is given to a large group of subjects (1000-3000) to confirm its effectiveness, monitor side effects, compare it with commonly used treatments and collect information that will allow it to be used safely
• Phase 4: post-approval (post-marketing) studies, including the treatment's risks, benefits and optimal use
Parametric statistical tests
• Compare the difference between normally distributed data sets
• Analysis of variance (ANOVA): used to compare the means of two or more samples to see whether they come from the same population, testing the null hypothesis
• t-test: compares the means of two samples
• χ2 (chi-squared): a measure of the difference between actual and expected frequencies (usually based on a null hypothesis). As it is applied to frequency (categorical) data, it is usually classed as non-parametric. Alternatives include Fisher's exact test (small numbers) and the Mantel-Haenszel test for comparing multiple two-way tables
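The t-test can be sketched in Python as Student's two-sample t statistic with a pooled variance (assuming equal variances in the two groups). The measurements are hypothetical. The standard library has no t distribution, so the sketch stops at t and its degrees of freedom; a statistics package (e.g. `scipy.stats.ttest_ind`) would supply the p-value.

```python
import math
import statistics


def pooled_t_statistic(x: list[float], y: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic assuming equal variances.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2


# Hypothetical measurements in two independent groups
control = [5.0, 6.0, 7.0]
treated = [8.0, 9.0, 10.0]
t, df = pooled_t_statistic(control, treated)
print(f"t = {t:.2f} on {df} degrees of freedom")
```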
Non-parametric tests
• Used when data are not normally distributed; examples include:
▪ Mann-Whitney U
▪ Wilcoxon signed-rank test
▪ Kruskal-Wallis
▪ Friedman
Resources
• Medical stats made easy (2003)
• Notes on statistics for medical students
• Statistics at Square One – BMJ publishing (free online)
• How to read a paper
• How to read a paper: Statistics for the non-statistician
• Clinicians guide to statistics for medical practice and research
Measurement of Performance Status and Comorbidity
K Moore
Presented by I Dukic, 2014
Performance Status
• Definition
▪ Scales and criteria used by doctors and researchers to assess how a patient's disease is progressing, assess how the disease affects the daily living abilities of the patient, and determine appropriate treatment and prognosis
Assessment of performance status
• Various scoring systems
▪ Zubrod / WHO / ECOG
▪ Karnofsky
▪ Lansky (children)
Eastern Cooperative Oncology Group
• ECOG was established in 1955 as one of the first cooperative groups launched to perform multi-center cancer clinical trials. ECOG has evolved from a five-member consortium of institutions on the East Coast to one of the largest clinical cancer research organizations in the U.S.
ECOG PERFORMANCE STATUS
0 - Fully active, able to carry on all pre-disease performance without restriction
1 - Restricted in physically strenuous activity but ambulatory and able to carry out work of a light or sedentary nature, e.g. light house work, office work
2 - Ambulatory and capable of all self-care but unable to carry out any work activities. Up and about more than 50% of waking hours
3 - Capable of only limited self-care, confined to bed or chair more than 50% of waking hours
4 - Completely disabled. Cannot carry on any self-care. Totally confined to bed or chair
5 - Dead
Oken et al. Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am J Clin Oncol 1982; 5: 649-655.
WHO PERFORMANCE STATUS
0 - you are fully active and more or less as you were before your
illness
1 - you cannot carry out heavy physical work, but can do anything
else
2 - you are up and about more than half the day; you can look
after yourself, but are not well enough to work
3 - you are in bed or sitting in a chair for more than half the day;
you need some help in looking after yourself
4 - you are in bed or a chair all the time and need a lot of looking
after
KARNOFSKY PERFORMANCE STATUS
100 – you don’t have any evidence of disease and feel well
90 – you only have minor signs or symptoms but are able to carry on as normal
80 – you have some signs or symptoms and it takes a bit of effort to carry on as normal
70 – you are able to care for yourself but unable to carry on with normal activities/active work
60 – you need help from time to time but can mostly care for yourself
50 – you need quite a lot of help to care for yourself
40 – you always need help to care for yourself
30 – you are disabled and may need to stay in hospital
20 – you are sick, in hospital and need a lot of treatment
10 – you are very sick and unlikely to recover
Comparing ECOG with KARNOFSKY
ECOG   Performance                                                          Karnofsky
0      Fully active                                                         90-100
1      Not able to do strenuous work but otherwise OK                       70-80
2      Capable of self-care only. Up and about > 50% of waking hours        50-60
3      Limited self-care only. In bed or chair > 50% of waking hours        30-40
4      Completely disabled. No self-care possible. Confined to bed/chair    10-20
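The banding in the comparison table can be written as a small lookup in Python. `karnofsky_to_ecog` is a hypothetical helper for illustration; the two scales were developed separately, so the conversion is approximate and should not replace assessing the patient on the scale a protocol requires.

```python
def karnofsky_to_ecog(kps: int) -> int:
    """Approximate ECOG grade for a Karnofsky score (10-100 in steps of 10).

    Banding follows the comparison table: 90-100 -> 0, 70-80 -> 1,
    50-60 -> 2, 30-40 -> 3, 10-20 -> 4.
    """
    if kps not in range(10, 101, 10):
        raise ValueError("Karnofsky score must be 10, 20, ..., 100")
    # (lowest Karnofsky score in the band, ECOG grade), checked from best to worst
    bands = [(90, 0), (70, 1), (50, 2), (30, 3), (10, 4)]
    for lower_bound, ecog in bands:
        if kps >= lower_bound:
            return ecog
    raise AssertionError("unreachable: every valid score is >= 10")


for kps in (100, 80, 60, 40, 20):
    print(f"Karnofsky {kps} -> ECOG {karnofsky_to_ecog(kps)}")
```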
Comorbidity
• Either the presence of one or more diseases in addition to a primary disease, or the effect of such additional disorders or diseases
• Many tests attempt to standardise the weight or value of comorbid conditions
• They attempt to consolidate each individual comorbid condition into a single predictive variable for mortality or other outcomes
• Researchers have validated such tests because of their predictive value, but no single test is yet recognised as a standard
Comorbidity Index
• 13 different methods identified and critically reviewed: 1 disease count, 12 indexes
• Charlson Index: most extensively studied
• Cumulative Illness Rating Scale (CIRS): addresses body systems without specific diagnoses
• Index of Coexisting Disease (ICED): two-dimensional, measures disease severity and disability
• Kaplan: for use in diabetes
• Insufficient data on the others
De Groot V et al. How to measure comorbidity: a critical review of available methods. J Clin Epid 2003; 56: 221-229.
Charlson Comorbidity Index
• The Charlson index is the most extensively studied and seems to be the method of choice in urology
• 22 diseases in the index, selected and weighted on the basis of the strength of their association with mortality
• Validated as a predictor of short-term and long-term mortality
• Studied in prostate, renal and bladder cancer
• Charlson score divided into 3 levels:
▪ Low (0)
▪ Medium (1-2)
▪ High (3 or more)
• Can look at:
▪ Survival for the same level of comorbidity over time (CaP)
▪ Survival for different levels of comorbidity (RCC)
• Each 1-point increase in CCI score is associated with a 2.3-fold increase in the relative risk of death at 12 months (data from 1987)
Summary diagram (DOCTOR): tumour factors; patient factors (performance status, comorbidity index, life tables); treatment; patient preference; observation