Making comparisons

Q 1 Correct answer B




Categorical (aka qualitative) scales have discrete mutually exclusive
categories, eg hair colour or diagnosis.
Ordinal scales have discrete mutually exclusive categories that can be
logically ordered, eg high/low, or none, few, some, many, all, or the
Beaufort scale.
Interval scales have discrete or continuous values such that equal intervals on
the scale are arithmetically additive, eg most psychiatric rating scales.
Ratio scales have continuous values such that equal intervals are arithmetically
additive and in addition a true zero, which means that quantities are
multiplicatively related such that you can meaningfully say that one score is
twice as high as another, eg length or weight.
(A discrete variable has to take a whole number; a continuous variable can take any
number or fraction of a number.)
This is one of those questions that appear in exams purely because they are easy to
ask about, not because the topic has any real consequence.
The basic thing to remember is that the scale represents what you are trying to
measure but the statistics analyse the scale measurements, not the thing itself.
It is possible to tie yourself in knots about whether or not commonly used psychiatric
rating scales such as DES, MADRS, HAMD etc are ordinal, interval or even ratio, as
they are all composed of ordinal subscales that are summed to a scale that looks
interval but with a zero like a ratio scale. However in practical terms, although it
sounds plain wrong to say that an individual with a score of 10 is 2 more depressed
than an individual with a score of 8 or twice as depressed as an individual with a score
of 5, the statistics appropriate to interval scales (ie parametric tests) work for such
scales and as such for practical purposes they are deemed to be interval scales.
It used to be that all MCQ books had a question where you had to say if scales were
self or observer rated. Because of the internet and a cavalier attitude to copyright
many sites invite you to assess your so far undiagnosed manic depression, OCD, sex
addiction etc by rating yourself on scales that are meant to be observer rated.
Anyway, there’s a reasonable list at the following link
http://www.cnsforum.com/clinicalresources/ratingscales/ratingpsychiatry/
Q 2 Correct answer D
Stem and leaf plots are obvious once you have seen one. They’re good because they
allow you to get a sense of the overall distribution without being overwhelmed by a
list of values, or misled by bald summary statistics. The first time most people will
have seen one is the exam so you are now at a distinct advantage.
The range is the highest and lowest value or the arithmetic difference between them.
The mode is just the commonest value.
Once you know that there are 25 values the median is the 13th.
The middle 50% of the values are subtended by the interquartile range (IQR). The
IQR can either be reported as the values for the 1st and 3rd quartiles (23.5 and 59) or as
in this case, the numerical difference between them ie 35.5.
The lower quartile is the value that divides the group of values between the lowest
and the median into two. As with many issues in statistics, the issue of how you work
out IQR is riven by titanic conflicts, worked out in the letters pages of number
crunchers’ jazz mags like ‘Statisticians Only’, ‘STATO!’ and ‘Throbbing Numbers’ or
at conferences in technical colleges that pretend to university status in places like
Paisley. No-one is going to ask you to work out (as opposed to interpret) an IQR.
Remember folks there are 3 (not 4) quartiles, 4 (not 5) quintiles, and 99 (not 100)
centiles
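The disagreement over how to work out quartiles is easy to see for yourself. The following is a minimal sketch using made-up values (not the data from the question) and the two conventions offered by Python's standard library; other packages use other conventions again.

```python
from statistics import median, quantiles

# Hypothetical values for illustration only; not the data from the question
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(n=4) returns the 3 cut points (quartiles) that split the data into 4 groups
exclusive = quantiles(values, n=4, method="exclusive")
inclusive = quantiles(values, n=4, method="inclusive")

iqr_exclusive = exclusive[2] - exclusive[0]
iqr_inclusive = inclusive[2] - inclusive[0]

print("median:", median(values))
print("exclusive quartiles:", exclusive, "IQR:", iqr_exclusive)
print("inclusive quartiles:", inclusive, "IQR:", iqr_inclusive)
```

Both conventions agree on the median (the 2nd quartile) but can disagree on the 1st and 3rd quartiles, and therefore on the IQR.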
Smart Gavins and Sassy Janes will have realised that for a unimodal right skewed
distribution the mean > median > mode, that if the mode and median are correct the
figure of 28 is impossible and that they can get on with the next question instead of
calculating the mean or trying to work out what a bloody IQR is as Gormless Gavins
or Crazy Janes might be tempted to.
Q 3 Correct answer C
Box and whisker plots are again a summary way of tabulating data without losing too
much of the quality of the distribution
The ‘box’ is the interquartile range and the line down the middle of the box is the
median so that each side of the box contains 25% of the values
The original convention (described by Tukey) was that the lines extend to the highest
or lowest values within a limit of 1.5 times the interquartile range on either side of the
box. Values outside this are represented by an asterisk. However all sorts of other
conventions such as the outer lines representing the 10th to the 90th or 1st to 99th
centiles are used. It’ll say on the diagram.
The median does not have a confidence interval. Medians, by definition, are used when
the underlying distribution is not known (‘distribution free’) and the sort of
assumptions that allow you to relate the sample mean to the population mean do not
apply.
Q 4 Correct answer A
(15/25)/(25/65) = 1.56
Risk (p) can take any value between 0 and 1.
Relative risk (RR) can take any value between 0 and ∞.
Risk and relative risk do not have to refer to a negative outcome. Although you can
use the term relative benefit, I think it’s more important to be clear that you’re using a
relative as opposed to an absolute measure and to specifically name the two quantities
you’re relating. However if you want to get lost in the terminology it’s all in the
attached glossary.
Doing simple sums live in the exam is very difficult if you’re already stressed out so
it’s worth practising by juggling the comparisons. If you’re in a study group get
someone to set a contingency table for you. For a 2x2 table there are 8 possible
combinations; the other 7 are given below.
RR of having pseudoseizures in non AED vs AED: (10/25)/(40/65) = 0.65
RR of having pseudoseizures in AED vs non AED: (40/65)/(10/25) = 1.54
RR of pseudoseizure remission in AED vs non AED: (25/65)/(15/25) = 0.64
RR of receiving AED in remitted vs nonremitted: (25/40)/(40/50) = 0.78
RR of no longer receiving AED in remitted vs nonremitted: (15/40)/(10/50) = 1.875
RR of receiving AED in nonremitted vs remitted: (40/50)/(25/40) = 1.28
RR of no longer receiving AED in nonremitted vs remitted: (10/50)/(15/40) = 0.53
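If you want to check your juggling, the sums can be sketched in a few lines. This assumes the 2x2 table implied by the fractions above (non AED: 15 remitted, 10 not; AED: 25 remitted, 40 not); the function names are just for illustration.

```python
# 2x2 table implied by the fractions above: (remitted, still having pseudoseizures)
table = {
    "non AED": (15, 10),
    "AED": (25, 40),
}

def risk(events, non_events):
    """Risk = events / everyone in that group."""
    return events / (events + non_events)

def relative_risk(group_a, group_b):
    """RR of remission in group_a vs group_b."""
    return risk(*table[group_a]) / risk(*table[group_b])

rr_remission = relative_risk("non AED", "AED")
print(f"RR of remission, non AED vs AED: {rr_remission:.2f}")
print(f"RR of remission, AED vs non AED: {relative_risk('AED', 'non AED'):.2f}")
```

Swapping the order of the groups gives the reciprocal, which is why it pays to name both quantities you are relating.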
Q 5 Correct answer C
(15/10)/(25/40) = 2.4
Odds = p/(1 – p) ie the probability of something happening expressed as a ratio to the
probability that it won’t
Odds ratio (OR) is the ratio of two odds
OR approximates to RR if the outcome is rare. If the probability of one or both of the
outcomes is > 10% the approximation of OR to RR breaks down. For example
P1      P2      RR (= P1/P2)    Odds1     Odds2     OR (= Odds1/Odds2)
0.02    0.01    2               0.0204    0.0101    2.02
0.20    0.10    2               0.25      0.1111    2.25
0.80    0.40    2               4         0.6667    6
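The way the OR drifts away from the RR as the outcome becomes commoner can be reproduced directly from the definition of odds. A minimal sketch, using the three pairs of probabilities from the example:

```python
def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def rr_and_or(p1, p2):
    """Return (relative risk, odds ratio) for two probabilities."""
    return p1 / p2, odds(p1) / odds(p2)

for p1, p2 in [(0.02, 0.01), (0.20, 0.10), (0.80, 0.40)]:
    rr, odds_ratio = rr_and_or(p1, p2)
    print(f"P1={p1:.2f} P2={p2:.2f} RR={rr:.2f} OR={odds_ratio:.2f}")
```

The RR is 2 in every row, but the OR only stays close to 2 while both probabilities are small.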
There are only two circumstances where OR must be used: case control studies
(where it is impossible to use RR) and logistic regression (where if you used RR you
could end up deriving probabilities >1). However, if you see OR
quoted in an RCT it may be that it is being used inappropriately to enhance the
apparent effect size.
‘Down with odds ratios!’
Sackett DL, Deeks JJ, Altman DG Evidence Based Medicine 1996 Sept-Oct;1:164.
As in the previous question, it is a good idea to juggle the requested OR. However as
illustrated below, although ORs may be a poor approximation to the more intuitively
understandable RR, it’s much harder to make mistakes with the sums because
whatever way you do it there are only two answers.
OR of having pseudoseizures in non AED vs AED: (10/15)/(40/25) = 0.42
OR of having pseudoseizures in AED vs non AED: (40/25)/(10/15) = 2.4
OR of pseudoseizure remission in AED vs non AED: (25/40)/(15/10) = 0.42
OR of receiving AED in remitted vs nonremitted: (25/15)/(40/10) = 0.42
OR of no longer receiving AED in remitted vs nonremitted: (15/25)/(10/40) = 2.4
OR of receiving AED in nonremitted vs remitted: (40/10)/(25/15) = 2.4
OR of no longer receiving AED in nonremitted vs remitted: (10/40)/(15/25) = 0.42
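The reason there are only two answers is that every version of the OR reduces to the cross product ad/bc or its reciprocal. A short sketch, assuming the cell counts implied by the fractions above (the labels are mine):

```python
a, b = 15, 10   # non AED: remitted, still having pseudoseizures
c, d = 25, 40   # AED: remitted, still having pseudoseizures

odds_ratios = {
    "remission, non AED vs AED": (a / b) / (c / d),
    "remission, AED vs non AED": (c / d) / (a / b),
    "pseudoseizures, non AED vs AED": (b / a) / (d / c),
    "pseudoseizures, AED vs non AED": (d / c) / (b / a),
    "receiving AED, remitted vs nonremitted": (c / a) / (d / b),
    "receiving AED, nonremitted vs remitted": (d / b) / (c / a),
    "no longer on AED, remitted vs nonremitted": (a / c) / (b / d),
    "no longer on AED, nonremitted vs remitted": (b / d) / (a / c),
}

for label, value in odds_ratios.items():
    print(f"{label}: {value:.2f}")

# Every version reduces to the cross product ad/bc or its reciprocal
distinct = sorted({round(v, 2) for v in odds_ratios.values()})
print(distinct)
```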
Q 6 Correct answer E
The number needed to treat is the number of patients that would have to receive the
intervention in order for one patient to improve who would not have done so
otherwise
NNT = 1/ARR
ARR = (15/25) – (25/65) = (0.6 – 0.385) = 0.215
1/ARR = 4.65, rounded up to 5
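The same sum as a small sketch, using the figures above. The function name is illustrative and it assumes the ARR is positive (ie the treated group does better).

```python
import math

def number_needed_to_treat(events_treated, n_treated, events_control, n_control):
    """Return (ARR, NNT), where NNT = 1/ARR rounded up to a whole patient.

    Assumes the ARR is positive, ie the treated group does better.
    """
    arr = events_treated / n_treated - events_control / n_control
    return arr, math.ceil(1 / arr)

arr, nnt = number_needed_to_treat(15, 25, 25, 65)
print(f"ARR = {arr:.3f}, NNT = {nnt}")
```

NNT is always rounded up because you cannot treat a fraction of a patient.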
Q 7 Correct answer E
The confidence interval for a statistically non significant ARR includes zero, therefore
the CI for a nonsignificant NNT includes the reciprocal of zero which is ∞.
‘The number needed to treat: problems describing non-significant results’
Vivek Muthu; Evid Based Mental Health 2003 6: 72 doi: 10.1136/ebmh.6.3.72
Confidence intervals for the number needed to treat;
Douglas G Altman; BMJ 1998;317:1309–12
NNT can only take values ≥1. Even if treatment is perfect and all untreated patients
have a poor outcome the ARR is 1 minus zero and the NNT 1/1 = 1. In fact, a
‘perfect’ treatment could still have a relatively ‘high’ NNT if the condition has a high
rate of spontaneous remission or placebo response.
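To see how a confidence interval for the NNT ends up including ∞, here is a rough sketch. It assumes the remission figures used in the NNT calculation (15/25 vs 25/65) and a simple Wald interval for the ARR; the Wald method is my choice for illustration, and the papers cited above discuss better ones.

```python
import math

def arr_confidence_interval(events_1, n_1, events_2, n_2, z=1.96):
    """Approximate 95% CI for the absolute risk reduction (simple Wald method)."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    arr = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    return arr - z * se, arr, arr + z * se

lower, arr, upper = arr_confidence_interval(15, 25, 25, 65)
print(f"ARR = {arr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")

if lower < 0 < upper:
    # The ARR interval crosses zero, so the NNT interval passes through infinity:
    # from benefit (1/upper) out to infinity and back to harm (1/|lower|)
    print(f"NNT(benefit) {1 / upper:.1f} to infinity to NNT(harm) {1 / -lower:.1f}")
else:
    # Only meaningful when both limits are positive
    print(f"NNT {1 / upper:.1f} to {1 / lower:.1f}")
```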
Q 8 Correct answer D
Any apparent positive result could be a false positive as a result of chance, bias,
confounding or reverse causality.
Chance. A type 1 error, ie a false positive due to sampling variation alone. The p value
is judged against α, the accepted false positive rate. The probability that a result
could have occurred by chance is assessed by the appropriate statistical procedure but
the effect size itself, as opposed to its statistical significance, is not the result of a
statistical test.
Bias. Systematic measurement or response differences between groups unrelated to
the exposure producing an apparent difference. In this case there could have been a
systematic tendency to self or observer rate pseudoseizures as panic attacks if subjects
are no longer on AEDs (ie unblinding).
Confounding. The association of one or other experimental group with a factor that
is in turn associated with the outcome in such a way as to produce a spurious
relationship between intervention and outcome. In this case the result could be
confounded by willingness to take advice such that those patients willing to stop
AEDs are also those that will accept the diagnosis of pseudoseizures and stop having
them. In RCTs confounding that influences the outcome usually results from faulty
allocation concealment/randomisation.
Reverse causality. The outcome causes the exposure. Maybe no referring
professional took the advice to withdraw medication but good prognosis cases then
remitted despite this and stopped medication on their own.
Q 9 Correct answer B
The p value is the probability that a result at least as extreme as the one observed
could have occurred by chance alone, ie if there were really no effect.
There is a jargon:
p < 0.01 is highly statistically significant
p < 0.05 is statistically significant
p > 0.05 is statistically insignificant
However p between 0.05 and 0.10 although ‘insignificant’ is said by convention to
show a trend towards significance (all must have prizes)
Clinical significance or insignificance is determined with reference to the effect size
and critical appraisal of internal and external validity; it has very little to do with the p
value.
Q 10 Correct answer C
Chi squared (+/- Yates correction), Fisher’s exact test and McNemar’s test (for paired
or matched data) are the statistical tests of choice to analyse contingency tables ie
dichotomous or categorical outcomes. Fisher’s exact test is generally the most valid
option because Chi squared breaks down both in the particular circumstances of
low expected values and the general circumstances of small sample sizes (<60
subjects) or disparate group sizes, and Fisher’s is now very easy to do because of computers.
The t test compares two groups on a continuous normally distributed measure, Mann
Whitney compares 2 groups on a continuous non normally distributed measure and
ANOVA, among other things, compares 2 or more groups on continuous normally
distributed measures.
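For the curious, the Chi squared sum for a 2x2 table is short enough to sketch by hand. This assumes the remission table used in the earlier questions (non AED 15 remitted and 10 not, AED 25 remitted and 40 not) and applies no Yates correction; a stats package would normally do this for you.

```python
import math

def chi_squared_2x2(table):
    """Uncorrected chi squared test for a 2x2 table of counts.

    Returns (statistic, two tailed p value, expected counts).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [[r * col / n for col in cols] for r in rows]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom the p value follows from the normal distribution
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected

# Rows: non AED, AED. Columns: remitted, still having pseudoseizures.
statistic, p_value, expected = chi_squared_2x2([[15, 10], [25, 40]])
print(f"chi squared = {statistic:.2f}, p = {p_value:.4f}")
```

The expected counts are returned as well because they are what the rules of thumb about low expected values refer to.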
Q 11 Correct answer E
In fact the one tailed result is 0.0328 (ie half the p value for two tailed).
One tailed testing is unidirectional, ie for two outcomes A and B, where A turns out to
be better than B, the one tailed p value only considers the probability of outcomes
A=B or A>B. The probability for A<B is discounted. In this case using a one tailed
test discounts your friend’s hypothesis that withdrawal of AED could result in more
pseudoseizures.
The maths of one vs two tailed is a bit obscure but the rule of thumb is be suspicious
of any paper that quotes a one tailed value. It is almost never appropriate outside the
circumstances of case control studies and logistic regression.
‘One and Two Sided Tests of Significance’ Altman DG, Bland M; BMJ 1994; 309; 248
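The halving can be seen in a rough sketch. It assumes the remission figures from the earlier questions (15/25 vs 25/65) and a simple z test for two proportions, which is my choice for illustration and not necessarily the test behind the quoted figure.

```python
import math
from statistics import NormalDist

def two_proportion_z(events_1, n_1, events_2, n_2):
    """z statistic for the difference between two proportions (pooled variance)."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p1 - p2) / se

z = two_proportion_z(15, 25, 25, 65)
one_tailed = 1 - NormalDist().cdf(z)               # only counts A > B
two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))    # counts A > B and A < B
print(f"z = {z:.2f}, one tailed p = {one_tailed:.4f}, two tailed p = {two_tailed:.4f}")
```

The one tailed value falls under 0.05 only because half of the possible outcomes have been ruled out in advance.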
Q 12 Correct answer D
The chi square statistic becomes approximate and potentially inaccurate with low
sample sizes, marked disparity in group sizes or low expected cell values. Pre
computer, Fisher’s exact test was very laborious to do and was only used if you had to; the rule of
thumb being if >20% of expected cell values were less than 5 or any expected value
was less than one. Observed values of zero are not a problem. The test is very laborious by hand for
2x2 tables and even more so for bigger tables but computers have sorted this out.
The other much simpler way of correcting for the low sample size/low expected
values was Yates continuity correction but because of computers the more robust
Fisher’s exact test is preferred (although most statistical packages still give both).
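What the computer is actually doing for Fisher's exact test can be sketched as follows. This is a bare bones version for 2x2 tables only, using one common definition of the two sided p value (add up every possible table no more likely than the one observed); the example table is the remission table assumed from the earlier questions.

```python
from math import comb

def fisher_exact_2x2(table):
    """Fisher's exact test for a 2x2 table; returns (one sided p, two sided p).

    The one sided p is the probability of the observed top-left count or higher,
    given the row and column totals.
    """
    (a, b), (c, d) = table
    row_1, row_2, col_1 = a + b, c + d, a + c
    total = comb(row_1 + row_2, col_1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row_1, x) * comb(row_2, col_1 - x) / total

    support = range(max(0, col_1 - row_2), min(row_1, col_1) + 1)
    observed = prob(a)
    one_sided = sum(prob(x) for x in support if x >= a)
    # Two sided: add up every table that is no more likely than the observed one
    two_sided = sum(prob(x) for x in support if prob(x) <= observed * (1 + 1e-9))
    return one_sided, min(1.0, two_sided)

one_sided, two_sided = fisher_exact_2x2([[15, 10], [25, 40]])
print(f"Fisher's exact: one sided p = {one_sided:.4f}, two sided p = {two_sided:.4f}")
```

Every possible table with the same margins has to be enumerated, which is why it was so laborious before computers.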
Q 13 Correct answer B
Parametric tests assume that the data follow an underlying distribution (almost always
the normal distribution) and non-parametric tests do not. Parametric tests are more
powerful in that if the same data set meets those assumptions and is analysed with both
a parametric and a non parametric test, the parametric test generally has a higher
chance of yielding a positive result. Hence, although continuous outcomes are very
difficult to translate into the terms of an individual patient, continuous variables are
used for the power calculations because they are far more liable to yield a positive result.
Q 14 Correct answer A
Hooray, it’s time for that table
Data                              2 independent groups                       2 matched or paired groups  > 2 groups
Categorical                       χ2, Fisher’s exact test, Yates correction  McNemar’s test              χ2
Continuous, normally distributed  t test                                     Paired t test               ANOVA
Continuous, distribution free     Mann-Whitney U                             Wilcoxon                    Kruskal Wallis ANOVA
The t test tests for the difference in means between two samples. With group sizes
below 30 the normal distribution tends to underestimate the sd and therefore the
confidence interval. The t distribution corrects for this, hence the fact that it is also
known as the small sample t test.
The assumptions for the t test are that the groups are of roughly equal size and
variance and are normally distributed. t tests are astonishingly robust to violations of
the assumptions but in practice statistical packages will tell you if the assumptions are
violated and how much difference this makes to the result.
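The t statistic itself is just the difference in means divided by its standard error. A minimal sketch for two independent groups with pooled variance, using made-up scores; turning t into a p value needs the t distribution, which a stats package will supply.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent groups,
    assuming roughly equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

# Hypothetical scores for illustration only
t, df = pooled_t_statistic([5, 7, 9], [1, 3, 5])
print(f"t = {t:.2f} on {df} degrees of freedom")
```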
Q 15 Correct answer E
Hopefully it should be obvious that the figures for ‘years on AED’ are highly skewed.
If they were genuinely normally distributed a large proportion of subjects would have
been on AED for a negative number of years. Skewed figures are quoted with
reference to mean and sd all the time, probably because the authors cut and paste them
from stats packages without thinking about it too much.
‘Detecting skewness from summary information’
Bland M, Altman DG; BMJ 1996; 313; 1200
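The trick of spotting skewness from a mean and sd can be sketched like this. The summary figures are invented for illustration (they are not the values from the question); the point is that a quantity that cannot be negative should not have a sizeable chunk of its supposed normal distribution below zero.

```python
from statistics import NormalDist

def share_below_zero(mean, sd):
    """Proportion of a normal distribution with this mean and sd that lies below zero."""
    return NormalDist(mean, sd).cdf(0)

# Hypothetical summary figures for 'years on AED'; not the values from the question
mean_years, sd_years = 5.0, 6.0
print(f"{share_below_zero(mean_years, sd_years):.0%} of subjects would have negative years on AED")
```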
Although it sounds a bit like cheating, transforming data to allow the use of more
powerful statistics is perfectly permissible and helpful. The commonest
transformation is the log transformation, and the distribution here is a classic log
normal, ie would be expected to become normal on log transformation, but there are
loads of other potential ways of transforming data as illustrated in the reference
‘Transforming data’
Bland M, Altman DG; BMJ 1996; 312; 770
The interpretation of log transformed data, in particular with regard to means and
confidence intervals, is a bit tricky and probably far beyond what you need to know
but I’ve included the reference below for completists
‘Transformations, means and confidence intervals’
Bland M, Altman DG; BMJ 1996; 312; 1079
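One piece of the interpretation is easy to show: the mean of the logs, converted back, is the geometric mean and not the ordinary mean. A small sketch with made-up right skewed values:

```python
import math
from statistics import mean, geometric_mean

# Hypothetical right skewed values for illustration only
values = [1, 2, 4, 8, 16]

logged = [math.log(v) for v in values]
back_transformed = math.exp(mean(logged))  # mean of the logs, converted back

print(f"arithmetic mean: {mean(values):.1f}")
print(f"back transformed mean of logs: {back_transformed:.1f}")
print(f"geometric mean: {geometric_mean(values):.1f}")
```

The logged values are evenly spaced, which is the sort of symmetry the transformation is meant to buy you.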
Q 16 Correct answer D
It should be obvious that this data does not follow any sort of regular distribution. In
particular the data for the immediate withdrawal group must be bimodal. No
transformation is going to pull this into shape and you have to use distribution free
statistics, the appropriate statistic here being Mann Whitney U
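Mann Whitney U only cares about which values are bigger than which, not by how much, which is why the shape of the distribution does not matter. A minimal sketch of the U statistic with invented counts (a bimodal immediate withdrawal group and a made-up comparison group); the p value lookup is left to a stats package.

```python
def mann_whitney_u(group_a, group_b):
    """U statistics for two groups, counting a tie as half a win."""
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical seizure counts for illustration only
immediate = [0, 0, 1, 12, 14, 15]
delayed = [3, 4, 5, 6, 7, 8]
u_immediate, u_delayed = mann_whitney_u(immediate, delayed)
print(f"U = {min(u_immediate, u_delayed)}")
```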
Q 17 Correct answer E
Sometimes randomisation doesn’t work out. This may or may not show up as a
statistically significant difference between groups on one or other baseline
measurement. The point here is that t tests are relatively powerful and have picked up
a difference in ages between groups. However it is very unlikely that such a small
difference in age, or a factor associated with a small difference in age, would
confound the results to any great extent.
On the other hand the difference in the proportion of subjects with a history of sexual
abuse is pretty large, whether or not it reaches significance on the
relatively low powered Chi square/Fisher’s test that they presumably used. As such
the statistical significance of the difference is unlikely to really inform the clinical
significance of the effect on the result.
However, you cannot just abandon a study because the randomisation hasn’t gone
perfectly; if you measure enough baseline factors it is inevitable that some won’t be
distributed equally. Excluding patients with a history of CSA would mean being
unable to examine a patient group of paramount interest
An interaction is, broadly speaking, a subtype of confounder, ie a factor that when
associated with the treatment alters the outcome. For example Ritalin might work for
boys but not girls with ADHD; we would say that there is an interaction with gender
Q 18 Correct answer E
Randomisation, restriction, and matching are ways of dealing with confounding prior
to analysis.
Confounders can be dealt with in the analysis by regression or stratification.
Stratification is simpler; all you do is divide the sample into groups with and without
the confounder and conduct separate analyses. If there is confounding the two
estimates will be different. In this case it might be that the intervention works for non
sexually abused but not sexually abused subjects. This may in turn just be a proxy of
severity; maybe sexually abused subjects need an enhanced package of psychological
treatment.
More complicated and less intuitively appealing but certainly more powerful are
regression techniques that construct an equation in the form
y = a1x1 + a2x2 + … + anxn + b
Where x, in this case, would be the predictor variables and a the weights given to
them in determining y, the outcome. If we set the outcome (dependent variable) as
pseudoseizures (yes/no) and the independent variables as withdrawal of AED (x1) and
hx of sexual abuse (x2) the computer models the observed outcomes in different
combinations to assign the relative importance. In this case, because the outcome is
dichotomous, you use logistic regression and the outcome is given as an odds ratio, for
once appropriately. If it turned out that there was an effect for the intervention,
manifest as a reduction in pseudoseizures but that the effect was less in subjects with
a history of sexual abuse the output of the test might be summarised as;
‘The OR of persisting pseudoseizures following AED withdrawal was 0.7. Correcting
for baseline differences in proportion of subjects with a history of sexual abuse the
effect was reduced but still showed a trend towards significance. Subgroup analysis
of the sexually abused and non sexually abused subjects separately yielded differing
effect sizes (0.9 and 0.6 respectively) but neither reached statistical significance,
probably because of the reduced sample sizes.’
However in the ANCOVA on the endpoint mean pseudoseizure frequency reduction
(log transformed) there was a significant result in favour of AED withdrawal
independent of covarying for PTSD 17 score (possibly a rather more valid marker of
the role of previous trauma than a mere hx of CSA).
ANCOVA (Analysis of Covariance) looks at the effect of continuous covariates on
continuous outcomes. Essentially it’s exactly the same as multiple regression.
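For the logistic regression described above, it may help to see why the output comes as an odds ratio and why the predicted probabilities can never exceed 1. A sketch with made-up coefficients (a real analysis would estimate them from the data); a1 is chosen so that exp(a1) is roughly the OR of 0.7 quoted in the example summary.

```python
import math

def predicted_probability(withdrawn, abused, b=0.4, a1=-0.36, a2=0.8):
    """Logistic model: log odds = a1*x1 + a2*x2 + b.

    The coefficients are made up for illustration; a real analysis would
    estimate them from the data.
    """
    log_odds = a1 * withdrawn + a2 * abused + b
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    return p / (1 - p)

p_withdrawn = predicted_probability(withdrawn=1, abused=0)
p_continued = predicted_probability(withdrawn=0, abused=0)
odds_ratio = odds(p_withdrawn) / odds(p_continued)
print(f"OR for AED withdrawal = {odds_ratio:.2f} (exp(a1) = {math.exp(-0.36):.2f})")
```

Whatever values the predictors take, the logistic function squeezes the result into the range 0 to 1, and the exponential of each weight is the odds ratio for that predictor.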
Q 19 Correct answer B
Unfortunately the Poisson distribution did come up one year and it threw everyone.
On the other hand if you’ve heard of it at all you were at a massive advantage.
The Poisson distribution, or the law of rare events, or the law of small numbers,
describes the frequency of independent events occurring in uniform frames of area or
time. It is defined by the fact that its mean and variance are equal, a fact whose
refulgent beauty has e’er set a throb in my chest without my ever knowing what it
meant. Poisson distribution, retain thy mystery, I like thee well. For the purposes of
the exam you just need to know the circumstances of its use and that rates are
compared using a Poisson regression.
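If you would like to dispel a little of the mystery, the equality of mean and variance can be checked numerically. A minimal sketch with an arbitrary, made-up rate of 3 events per frame:

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when the expected number is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

rate = 3.0  # hypothetical expected number of events per frame
counts = range(0, 60)  # far enough into the tail for the sums to converge
mean = sum(k * poisson_pmf(k, rate) for k in counts)
variance = sum((k - mean) ** 2 * poisson_pmf(k, rate) for k in counts)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```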
The binomial distribution is used to work out the confidence interval of a proportion,
eg probability of throwing 6 sixes on 36 throws
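The dice example can be worked through directly from the binomial formula. A short sketch, assuming a fair die (the function name is illustrative):

```python
from math import comb

def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` in `trials` independent attempts."""
    return comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)

p_six_sixes = binomial_pmf(6, 36, 1 / 6)
print(f"P(exactly 6 sixes in 36 throws) = {p_six_sixes:.3f}")
```

Six sixes is the single most likely result, yet it still happens in fewer than one in five sets of 36 throws.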
The F distribution is used for ANOVA
Q 20 Correct answer B
Poisson events need to be independent. Incident cases of Huntington’s depend on a
case of Huntington’s already being there.
In terms of the other data you would be very interested if they did deviate from a
Poisson distribution in time or place as this would make some sort of intervening
factor very likely.