Download This Item - Southern Health NHS Foundation Trust

A guided tour of
research study design and statistics
Mustafa Soomro
Consultant psychiatrist
St James Hospital, Portsmouth
1
Definition of variable
A variable is a ‘thing’ that we measure
and whose value can vary.
2
Types of variables
in a study design
• Independent variable
– Can be manipulated in experimental design
– Causal variable (if confounders controlled)
– Predictor
– Antecedent
• Dependent variable
– Cannot be manipulated in experiments (dependent
on the value of the independent variable)
– Effect variable (if confounders controlled)
– Predicted
– Subsequent
3
Measurements used on variables (data)
Quantitative measures
• Discrete measures (equal interval integer measures): are
integers with equal intervals between successive integers; e.g.
number of days in hospital, number of children
• Continuous measures: the interval between any two values can be
divided indefinitely; these may have a true zero (e.g. temperature in
Kelvin, weight or height) or may not have a true zero (e.g.
temperature in °C or °F)
• Ratio variables: continuous variables with true zero; or
discrete (interval) variables with true zero (this property allows
ratio or coefficient of variation between measures to be
calculated)
4
Measurements (data)
Qualitative measures
• Ordinal measures – Ordered (or ranked) with
several orders with no equal distance between
the successive orders e.g. Likert Scale, disease severity
mild moderate and severe
• Nominal or categorical
Categories with no order and no equal intervals
and are mutually exclusive. Two categories
(dichotomous [male, female] or binary [yes, no]) or more
(polytomous), e.g. multiple political parties in the UK
5
Properties of various measures
What calculations and methods would apply | Nominal | Ordinal | Non-ratio discrete / continuous | Ratio
Frequency distribution | Yes | Yes | Yes | Yes
Median and percentiles | No | Yes | Yes | Yes
Add or subtract | No | No | Yes | Yes
Mean, standard deviation, standard error of the mean | No | No | Yes | Yes
Ratio, or coefficient of variation | No | No | No | Yes
Whether parametric [PM] or non-parametric [NPM] methods | NPM | NPM | PM | PM
6
Frequency distribution of data
• Continuous and discrete
– Histogram
• Normal,
• Right or positive skewed
• Left or negative skewed
• Other distributions
• Categorical and ordinal
– Bar chart
7
Normal distribution
• Normal distribution showing 1, 2 and 3 SDs
• Within 1 SD of the mean: 68% (on one side 34%)
• Within 2 SD: 95% (on one side 48%)
• Within 3 SD: 99.7% (on one side 49.9%)
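The percentages on this slide can be checked against the standard normal distribution. The following is a minimal sketch using only Python's standard library; the function name `coverage_within` is made up for illustration and is not part of the original slides.

```python
from statistics import NormalDist


def coverage_within(k_sd: float) -> float:
    """Proportion of a normal distribution lying within k_sd SDs of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


for k in (1, 2, 3):
    total = coverage_within(k)
    # The distribution is symmetric, so half of the total lies on each side.
    print(f"{k} SD: {total:.1%} in total, {total / 2:.1%} on one side")
```

The slide figures are these values rounded to whole percentages; exactly 95% corresponds to 1.96 SD rather than 2 SD.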
8
Descriptive statistics –
measures of
central tendency and spread
Central tendency | Spread
Mean | Variance, SD, SE of mean and CI
Median | Range and IQR
Mode |
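As an illustration of the measures listed on this slide, here is a small sketch using Python's standard `statistics` module. The data are hypothetical, and the 95% CI uses the normal multiplier 1.96 as a simplifying assumption (a t multiplier would be more appropriate for a sample this small).

```python
import math
import statistics

# Hypothetical sample: length of stay in days for seven patients.
days = [2, 4, 4, 5, 7, 9, 11]

# Central tendency
mean = statistics.mean(days)
median = statistics.median(days)
mode = statistics.mode(days)

# Spread
variance = statistics.variance(days)   # sample variance (n - 1 denominator)
sd = statistics.stdev(days)            # sample standard deviation
se_mean = sd / math.sqrt(len(days))    # standard error of the mean

# Approximate 95% CI for the mean (normal multiplier, see assumption above).
ci_low, ci_high = mean - 1.96 * se_mean, mean + 1.96 * se_mean

value_range = max(days) - min(days)
q1, _, q3 = statistics.quantiles(days, n=4)  # quartile cut points
iqr = q3 - q1

print(mean, median, mode, variance, round(sd, 2), round(se_mean, 2))
print(round(ci_low, 2), round(ci_high, 2), value_range, iqr)
```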
9
Inferential statistics
• Using statistical tests (parametric and non-parametric) on a sample to test a hypothesis
• Then drawing conclusions about the
population from the sample
• Two types of hypotheses:
– Null (there is no difference between the groups)
– Alternative (there is a difference)
10
Inferential statistics –
error in hypothesis testing
• Type one error (rejecting a true null hypothesis)
– Probability of this (called alpha) is conventionally set at
0.05 or less
• Type two error (failing to reject a false null hypothesis)
– Probability of this is called beta; 1-beta is called the
power of the study, and is often set at 0.8
11
Inferential statistics
Statistical tests for hypothesis testing give
• P-value
– Probability of finding the observed difference, or a greater one,
by chance alone if the null hypothesis is true; conventionally
significant when the P value is equal to or less than 0.05
– Interpretation of 0.05 is that the probability of finding the
difference or a greater difference by chance is 1 in 20 or
less
• Confidence Interval (CI)
– Range that would contain the true difference in about 95 of 100
repetitions of the study (95% CI)
12
Inferential statistics
• In one tailed test:
– Null hypothesis A=B
– Alternative hypothesis is chosen as one of these
two: A>B or B>A
• In two tailed test:
– Null hypothesis A=B
– Alternative hypothesis covers both directions: A>B or
B>A (i.e. A≠B)
13
Goal | Continuous (normal) data | Ordinal or continuous (non-normal) data | Categorical (binomial) data
Describe one group | Mean, SD | Median, interquartile range | Proportion
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square test)
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict value from another variable | Simple regression | Nonparametric regression | Simple logistic regression
Predict value from several other variables | Multiple regression | | Multiple logistic regression
14
Goal | Survival Time Data
Describe one group | Kaplan Meier survival curve
Compare one group to a hypothetical value |
Compare two unpaired groups | Log-rank test or Mantel-Haenszel
Compare two paired groups | Conditional proportional hazards regression
Compare three or more unmatched groups | Cox proportional hazard regression
Compare three or more matched groups | Conditional proportional hazards regression
Quantify association between two variables |
Predict value from another variable | Cox proportional hazard regression
Predict value from several other variables | Cox proportional hazard regression
15
Reliability of a test
Reliability is reproducibility …..
(test constructed using standardised criteria
will improve reliability)
16
Reliability of a test – types of reliability
• Internal consistency reliability
– Cronbach’s alpha – based on the average of item-item correlations
– Split half reliability (not needed if Cronbach’s calculated)
– Item total correlation
• Test retest reliability
• Inter-rater reliability
– Percentage agreement (affected by chance agreement,
therefore should not be used)
– Kappa (for categorical data)
– ICC (for continuous data)
17
Cohen’s Kappa
Inter-rater agreement for categorical measures
K = (observed agreement – expected agreement) / (1 – expected agreement)
The K value can be interpreted as follows (Altman, 1991):
Value of K | Strength of agreement
< 0.20 | Poor
0.21 - 0.40 | Fair
0.41 - 0.60 | Moderate
0.61 - 0.80 | Good
0.81 - 1.00 | Very good
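The kappa formula can be worked through on a small agreement table. The sketch below assumes a square table with rater 1 in the rows and rater 2 in the columns; the counts and the function name `cohens_kappa` are hypothetical and chosen only to illustrate the arithmetic.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for a square table: rows = rater 1, columns = rater 2."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the raters' marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 50 patients as "case" / "not a case".
ratings = [[20, 5],
           [10, 15]]
kappa = cohens_kappa(ratings)
print(round(kappa, 2))
```

Here the raters agree on 35 of 50 patients (0.70), chance agreement is 0.50, and kappa is about 0.40, which sits at the top of the "Fair" band even though raw percentage agreement looks high.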
18
ICC
• Measure of agreement for continuous data
which takes into account absolute
differences in ratings between the raters
19
Validity of a test
Validity is
authenticity or
truthfulness or
accuracy
20
Validity of a test
• Face validity
• Construct validity: relates to consistency of
features of a test
– Descriptive validity
– Content validity
• Divergent / Discriminant validity: investigating
correlation with a test that measures different constructs
• Convergent validity: investigating correlation with a
test that measures the same constructs
• Criterion validity
– Concurrent validity: denotes confirmation by
other means, e.g. gold standard test
– Predictive validity [utility]: relates to prediction
of the course of the condition by the test
21
Study design validity and reliability
• Internal and external validity of study
– Internal validity refers to how much it is free
from bias
– External validity refers to how much it is
applicable to the population of interest
• Reliability of study – i.e. precision of its
results (narrowness of CI)
22
Study design types
• Experimental
– Randomised
• Individual unit randomised
• Cross over trials
• Cluster randomised
– Quasi-randomised
– Quasi-experimental
• Observational
– Case control (retrospective)
– Cohort (prospective and retrospective)
– Cross sectional surveys
– Longitudinal surveys (prospective panel studies)
23
Errors in studies
Non-systematic error – random error
• Due to small sample size (remedy: use
large enough samples)
• Due to less reliable measures (remedy:
reliable measures)
24
Errors in studies
Systematic error - bias
• Confounding bias:
– Selection bias [e.g. in selecting cases or controls in case
control studies] (remedy: use random selection or well
defined selection criteria)
– Allocation bias [e.g. in RCTs] (remedy: use random and
concealed allocation)
• Information bias (remedy: use blinding and
reliable and valid measures)
• Performance bias (remedy: use blinding)
• Attrition bias (remedy: use intention-to-treat
analysis and complete follow-up)
25
Understanding magnitude of effect –
basic concepts
Continuous data
Mean: arithmetic average
Categorical data
Risk or absolute risk:
Probability of an event (ratio of events to total of
events and non-events)
10 depressed patients receive an antidepressant (AD);
6 respond and 4 do not respond; response rate (risk of
response or absolute response):
6/(6+4) = 6/10 (60% or 0.6)
Odds:
Ratio of events to non-events
6/4 = 1.5
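The difference between risk and odds in the example above can be sketched in a few lines. The function names `risk` and `odds` are illustrative only; exact fractions are used so the arithmetic matches the slide without rounding error.

```python
from fractions import Fraction


def risk(events: int, non_events: int) -> Fraction:
    """Events divided by everyone (events plus non-events)."""
    return Fraction(events, events + non_events)


def odds(events: int, non_events: int) -> Fraction:
    """Events divided by non-events."""
    return Fraction(events, non_events)


responders, non_responders = 6, 4
print(float(risk(responders, non_responders)))  # 0.6
print(float(odds(responders, non_responders)))  # 1.5

# The two are linked: odds = risk / (1 - risk).
r = risk(responders, non_responders)
assert r / (1 - r) == odds(responders, non_responders)
```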
26
Magnitude of effect in RCTs,
continuous data
Mean difference (MD)
= Mean change in control group – mean change
in experimental group
Standardised mean difference (SMD)
(i.e. effect size)
= Mean difference / pooled SD
MD and SMD should be reported with confidence
intervals (CI)
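A small worked sketch of MD and SMD follows. The change scores are hypothetical, and the pooled SD formula (each group's variance weighted by its degrees of freedom) is one common choice assumed here; the slides do not specify which pooling method is intended.

```python
import math
import statistics


def pooled_sd(group_a: list[float], group_b: list[float]) -> float:
    """Pooled SD, weighting each group's variance by its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))


# Hypothetical change-from-baseline scores.
control_change = [2, 4, 6, 8, 10]
experimental_change = [6, 8, 10, 12, 14]

# Sign convention as on the slide: control minus experimental.
md = statistics.mean(control_change) - statistics.mean(experimental_change)
smd = md / pooled_sd(control_change, experimental_change)
print(md, round(smd, 2))
```

With these numbers the groups differ by 4 points on the raw scale, which is about 1.26 pooled standard deviations.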
27
Magnitude of effect in RCTs,
continuous data
SMD of 0.2
means that the mean change from baseline in
one group differs by 0.2 standard deviations from
that in the other group
SMD of 1
means that the mean change from baseline in
one group differs by 1 standard deviation from
that in the other group
28
Normal distribution
• Normal distribution showing 1, 2 and 3 SDs
• Within 1 SD of the mean: 68% (on one side 34%)
• Within 2 SD: 95% (on one side 48%)
• Within 3 SD: 99.7% (on one side 49.9%)
29
Magnitude of effect in RCTs, Continuous data
Effect size: standardised mean difference (SMD) | % of control group who would be below the average person (mean difference from baseline) in experimental group
0.0 | 50%
0.2 | 58%
0.5 | 69%
0.8 | 79%
1.0 | 84%
2.0 | 98%
3.0 | 99.9%
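The percentages in this table follow from the normal curve: they are the cumulative probability at SMD standard deviations above the mean. A minimal sketch, assuming both groups are normally distributed with equal SDs (the function name is hypothetical):

```python
from statistics import NormalDist


def percent_of_controls_below(smd: float) -> float:
    """Percentage of the control group below the experimental-group mean,
    assuming normal distributions with equal SDs in both groups."""
    return 100 * NormalDist().cdf(smd)


for smd in (0.0, 0.2, 0.5, 0.8, 1.0, 2.0, 3.0):
    print(f"SMD {smd:.1f}: {percent_of_controls_below(smd):.1f}%")
```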
30
Magnitude of effect in RCTs, categorical data
Outcome | AD | Placebo | Total
Not depressed | 40 | 20 | 60
Depressed | 10 | 30 | 40
Total | 50 | 50 | 100
Risk (absolute risk) of depression in control (control event
rate [CER]) =
30/50 = .6
Risk (absolute risk) of depression in experimental group
(experimental event rate [EER]) =
10/50 = .2
ARR (absolute risk reduction) = CER – EER = .6 – .2 = .4
NNT (number needed to treat) = 1/ARR = 1/.4 = 2.5 (3 after
rounding up)
Interpretation: on average one needs to treat 3 patients with AD
to get one extra patient better than the response rate with placebo.
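The ARR and NNT arithmetic above can be reproduced as follows. This is only a sketch of the calculation on the slide's counts; exact fractions are used so that 1/ARR comes out as exactly 2.5 before rounding up.

```python
import math
from fractions import Fraction

# Counts from the slide: depressed patients out of 50 in each arm.
cer = Fraction(30, 50)   # control event rate (placebo)
eer = Fraction(10, 50)   # experimental event rate (AD)

arr = cer - eer              # absolute risk reduction
nnt = math.ceil(1 / arr)     # round up to a whole patient

print(float(arr), float(1 / arr), nnt)
```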
31
Magnitude of effect in RCTs, categorical data
Outcome | AD | Placebo | Total
No sedation | 20 | 40 | 60
Sedation | 30 | 10 | 40
Total | 50 | 50 | 100
ARI (absolute risk increase) = EER – CER
= 30/50 – 10/50 = 0.4
NNH (number needed to harm) = 1/ARI; for sedation this would be
2.5 (2 after rounding down, because with harm we need to
err on the side of caution)
Interpretation: on average one needs to treat 2 patients with
AD to have one extra patient experience sedation compared to the
sedation rate with placebo
NNT and NNH should be reported with CIs
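One way to attach a confidence interval to the ARI and NNH is sketched below. It assumes the simple Wald (normal approximation) interval for a difference in proportions; the slides do not say which method is intended, and the function name `risk_difference_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_1, n_1, events_2, n_2, level=0.95):
    """Risk difference (group 1 minus group 2) with a Wald confidence interval."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Sedation: 30/50 with AD versus 10/50 with placebo.
ari, ari_low, ari_high = risk_difference_ci(30, 50, 10, 50)

# Inverting the ARI limits gives the NNH limits (in reverse order).
nnh, nnh_low, nnh_high = 1 / ari, 1 / ari_high, 1 / ari_low
print(round(ari, 2), round(ari_low, 2), round(ari_high, 2))
print(round(nnh, 1), round(nnh_low, 1), round(nnh_high, 1))
```

Inverting the limits only makes sense when the interval for the ARI does not include zero, as is the case here.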
32
Magnitude of effect in RCTs, categorical data
Outcome | AD | Placebo | Total
Not depressed | 40 | 20 | 60
Depressed | 10 | 30 | 40
Total | 50 | 50 | 100
RR (relative risk or risk ratio):
EER/CER = .2 / .6 = .33 (RR of depression with AD)
Interpretation: risk of depression with AD is 33%
of the risk with placebo
RRR (relative risk reduction): (CER – EER) / CER = (.6 – .2) / .6 = .67
Odds and OR (odds ratio)
EEO (experimental events odds) = 10/40 = .25
CEO (control events odds) = 30/20 = 1.5
OR = EEO / CEO = .25 / 1.5 = .17
Interpretation: odds of depression with AD are 0.17 times
the odds with placebo
RR and OR should be reported with CI
33
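The RR, RRR and OR for the depression example can be recomputed from the four cell counts. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Depressed / not depressed counts from the slide's table.
ad_depressed, ad_not_depressed = 10, 40
placebo_depressed, placebo_not_depressed = 30, 20

eer = Fraction(ad_depressed, ad_depressed + ad_not_depressed)
cer = Fraction(placebo_depressed, placebo_depressed + placebo_not_depressed)

rr = eer / cer               # relative risk
rrr = (cer - eer) / cer      # relative risk reduction, equal to 1 - RR

eeo = Fraction(ad_depressed, ad_not_depressed)            # experimental odds
ceo = Fraction(placebo_depressed, placebo_not_depressed)  # control odds
odds_ratio = eeo / ceo

print(round(float(rr), 2), round(float(rrr), 2), round(float(odds_ratio), 2))
```

Note that the OR (about 0.17) is further from 1 than the RR (about 0.33); the two diverge when the event is common, as it is here.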
Magnitude of accuracy in diagnostic studies
Two by two table of gold standard test results and
comparison test results
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b
Test negative | c (false negative) 5 | d (true negative) 60 | c+d
Total | a+c 30 | b+d 70 | a+b+c+d
34
Overall test accuracy
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b
Test negative | c (false negative) 5 | d (true negative) 60 | c+d
Total | a+c 30 | b+d 70 | a+b+c+d
• Diagnostic odds ratio = (a × d) / (b × c) = (25 × 60) / (10 × 5) = 30
Interpretation: ratio of the odds of a positive test in the diseased
to the odds of a positive test in the non-diseased
• Overall test accuracy = (TP + TN) / (TP + TN + FN + FP) (i.e. whole sample)
= (a + d) / (a + b + c + d) = 85/100 = .85
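Both summary measures can be checked directly from the four cells of the table. A minimal sketch using the slide's counts:

```python
# Cell counts from the two by two table on this slide.
a, b, c, d = 25, 10, 5, 60   # TP, FP, FN, TN

diagnostic_odds_ratio = (a * d) / (b * c)
overall_accuracy = (a + d) / (a + b + c + d)

print(diagnostic_odds_ratio, overall_accuracy)
```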
35
Estimates of diagnostic test accuracy
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b
Test negative | c (false negative) 5 | d (true negative) 60 | c+d
Total | a+c 30 | b+d 70 | a+b+c+d
Proportion with test positive in diseased
• Sensitivity = a/(a + c) = 25/30 = .83
Proportion with test negative in non-diseased
• Specificity = d/(b + d) = 60/70 = .86
36
Estimates of diagnostic test accuracy
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b
Test negative | c (false negative) 5 | d (true negative) 60 | c+d
Total | a+c 30 | b+d 70 | a+b+c+d
Likelihood Ratio (LR) for positive and negative test
• Ratio of likelihood of test positive in diseased vs non-diseased
LR+ = sens/(1 – spec) = .83/.14 = 6
• Ratio of likelihood of test negative in diseased vs non-diseased
LR– = (1 – sens)/spec = .17/.86 = .2
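The likelihood ratios can be derived from the same four cells by way of sensitivity and specificity. The function name `likelihood_ratios` in this sketch is hypothetical.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (LR+, LR-) from the four cells of a two by two table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


lr_pos, lr_neg = likelihood_ratios(tp=25, fp=10, fn=5, tn=60)
print(round(lr_pos, 2), round(lr_neg, 2))
```

Without intermediate rounding the values are about 5.83 and 0.19, which the slide rounds to 6 and .2.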
37
Use of LR
• LR combines sensitivity and specificity.
• It measures the power of a test to change the pre-test into the
post-test probability of a disease being present.
LR for test positive | LR for test negative | Magnitude of change from pre-test to post-test probability
LR more than 10 | LR less than 0.1 | Large change
LR 5 to 10 | LR 0.1 to 0.2 | Moderate change
LR 2 to 5 | LR 0.2 to 0.5 | Small change
LR less than 2 | LR more than 0.5 | Tiny change
LR of 1.0 | LR of 1.0 | No change
38
Estimates of diagnostic test accuracy
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b 35
Test negative | c (false negative) 5 | d (true negative) 60 | c+d 65
Total | a+c 30 | b+d 70 | a+b+c+d
• Positive predictive value (PPV) = a/(a + b) = 25/35 = .71
• Negative predictive value (NPV) = d/(c + d) = 60/65 = .92
• Prevalence (pre-test probability) = (a + c)/(a + b + c + d)
= 30/100 = .3
39
Estimates of diagnostic test accuracy
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b
Test negative | c (false negative) 5 | d (true negative) 60 | c+d
Total | a+c 30 | b+d 70 | a+b+c+d
• Pre-test odds of disease = prevalence/(1 – prevalence) =
.3/.7 = .43
• Post-test odds = pre-test odds × likelihood ratio =
.43 × 6 = 2.6
• Post-test probability = post-test odds/(post-test odds + 1) =
2.6/3.6 = .72 [compare this to the .3 pre-test probability, i.e. a positive test
improves chance of diagnosis]
40
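The three steps on this slide (probability to odds, multiply by the LR, odds back to probability) can be wrapped in one small function. The function name is hypothetical, and the LR– of 0.19 in the last line is the unrounded value computed from the table rather than a figure quoted on the slide.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)


# Prevalence .3 and the rounded LR+ of 6 used on the slide.
print(round(post_test_probability(0.3, 6), 2))

# With the unrounded LR+ = (25/30) / (10/70) the result equals the PPV, 25/35.
exact_lr_positive = (25 / 30) / (10 / 70)
print(round(post_test_probability(0.3, exact_lr_positive), 3))

# A negative result (LR- about 0.19) lowers the probability instead.
print(round(post_test_probability(0.3, 0.19), 3))
```

This is the same calculation that the nomogram on the next slide performs graphically.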
Nomogram for converting pre-test probability to post-test probability using LR
41
Receiver Operator Characteristic (ROC) Curve
Plot the true positive rate (sensitivity) on the vertical axis against
the false positive rate (1 – specificity) on the horizontal axis for
different cut-offs, e.g. of a depression scale.
This helps in deciding which point on the curve would
give the most acceptable sensitivity and specificity.
There is a trade-off between sensitivity and specificity.
Area under the curve (AUC) is a measure of overall test
accuracy (as the curve nears the diagonal its accuracy
reduces, and as it moves towards the upper left corner its
accuracy increases)
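To make the construction concrete, the sketch below builds ROC points for every cut-off of a hypothetical scale and estimates the AUC with the trapezoidal rule. The scores, the "score at or above cut-off counts as positive" convention, and the function names are all assumptions made for illustration.

```python
def roc_points(diseased: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) at each cut-off, from strict to lenient."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # a cut-off above every score calls nobody positive
    for cutoff in cutoffs:
        tpr = sum(score >= cutoff for score in diseased) / len(diseased)
        fpr = sum(score >= cutoff for score in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under consecutive ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical depression-scale scores.
diseased_scores = [12, 15, 18, 20]
healthy_scores = [5, 8, 10, 14]

points = roc_points(diseased_scores, healthy_scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

An AUC of 0.5 corresponds to the diagonal (no better than chance) and 1.0 to a curve passing through the upper left corner.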
42
Magnitude of effect
Cohort and Case-control
Cohort: OR, RR
Case control: OR
(All these should be reported with CIs)
43
Bradford Hill’s criteria of causality
for observational studies:
1. Temporal association
2. Dose response association
3. Specificity: (this may not be always true
when multiple causation or multiple
outcomes are involved)
4. Consistency
5. Plausible biological associations
6. High strength of association
7. Absence of reverse causality
44
End
• Thank you
• Questions
45