Q 1 Correct answer B

Categorical (aka qualitative) scales have discrete, mutually exclusive categories, eg hair colour or diagnosis.
Ordinal scales have discrete, mutually exclusive categories that can be logically ordered, eg high/low, or none, few, some, many, all, or the Beaufort scale.
Interval scales have discrete or continuous values such that equal intervals on the scale are arithmetically additive, eg most psychiatric rating scales.
Ratio scales have continuous values such that equal intervals are arithmetically additive and, in addition, a true zero, which means that quantities are multiplicatively related and you can meaningfully say that one score is twice as high as another, eg length or weight.
(A discrete variable has to take a whole number; a continuous variable can take any number or fraction of a number.)

This is one of those questions that appear in exams purely because they are easy to ask about, not because the topic has any real consequence. The basic thing to remember is that the scale represents what you are trying to measure, but the statistics analyse the scale measurements, not the thing itself.

It is possible to tie yourself in knots about whether commonly used psychiatric rating scales such as DES, MADRS, HAMD etc are ordinal, interval or even ratio, as they are all composed of ordinal subscales that are summed to a scale that looks interval but has a zero like a ratio scale. It sounds plain wrong to say that an individual with a score of 10 is 2 more depressed than an individual with a score of 8, or twice as depressed as an individual with a score of 5. In practical terms, however, the statistics appropriate to interval scales (ie parametric tests) work for such scales, and so for practical purposes they are deemed to be interval scales.

It used to be that all MCQ books had a question where you had to say if scales were self or observer rated.
Because of the internet and a cavalier attitude to copyright, many sites invite you to assess your so far undiagnosed manic depression, OCD, sex addiction etc by rating yourself on scales that are meant to be observer rated. Anyway, there's a reasonable list at the following link: http://www.cnsforum.com/clinicalresources/ratingscales/ratingpsychiatry/

Q 2 Correct answer D

Stem and leaf plots are obvious once you have seen one. They're good because they allow you to get a sense of the overall distribution without being overwhelmed by a list of values or misled by bald summary statistics. The first time most people will have seen one is the exam, so you are now at a distinct advantage.

The range is the highest and lowest value, or the arithmetic difference between them.
The mode is just the commonest value.
Once you know that there are 25 values, the median is the 13th.
The middle 50% of the values are subtended by the interquartile range (IQR). The IQR can either be reported as the values for the 1st and 3rd quartiles (23.5 and 59) or, as in this case, the numerical difference between them, ie 35.5. The lower quartile is the value that divides the group of values between the lowest and the median into two.

As with many issues in statistics, the issue of how you work out IQR is groined by titanic conflicts, worked out in the letters page of number cruncher's jazz mags like ‘Statisticians Only’, ‘STATO!’ and ‘Throbbing Numbers’, or at conferences in technical colleges that pretend to university status in places like Paisley. No-one is going to ask you to work out (as opposed to interpret) an IQR.
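For anyone who wants to see the arithmetic anyway, here is a minimal Python sketch of the summary statistics described above. The 25 values are made up for illustration (they are not the exam data), and the quartile rule used is the simple one given in the text: the lower quartile is the median of the values below the median, and likewise for the upper quartile. Other conventions exist and give slightly different answers.

```python
from statistics import median, multimode

# Hypothetical data: 25 made-up values, NOT the figures from the exam question.
values = sorted([12, 15, 18, 19, 21, 22, 24, 27, 28, 28, 28, 30, 31,
                 33, 36, 40, 44, 49, 55, 58, 60, 66, 71, 80, 95])

n = len(values)                      # 25 values (this sketch assumes an odd count)
mid = n // 2                         # index 12, ie the 13th value
med = median(values)                 # the 13th of 25 sorted values
lower_q = median(values[:mid])       # median of the 12 values below the median
upper_q = median(values[mid + 1:])   # median of the 12 values above the median
iqr = upper_q - lower_q              # IQR reported as a single difference
value_range = values[-1] - values[0] # range reported as a single difference
modes = multimode(values)            # commonest value(s)

print("range:", values[0], "to", values[-1], "ie", value_range)
print("mode:", modes)
print("median:", med)
print("quartiles:", lower_q, "and", upper_q, "IQR:", iqr)
```

With 25 values there are 12 on each side of the median, so each quartile is the average of the 6th and 7th values in its half.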
Remember folks, there are 3 (not 4) quartiles, 4 (not 5) quintiles, and 99 (not 100) centiles.

Smart Gavins and Sassy Janes will have realised that for a unimodal right skewed distribution the mean > median > mode, that if the mode and median are correct the figure of 28 is impossible, and that they can get on with the next question instead of calculating the mean or trying to work out what a bloody IQR is, as Gormless Gavins or Crazy Janes might be tempted to.

Q 3 Correct answer C

Box and whisker plots are again a summary way of presenting data without losing too much of the quality of the distribution.

The ‘box’ is the interquartile range and the line down the middle of the box is the median, so that each side of the box contains 25% of the values.

The original convention (described by Tukey) was that the lines extend to the highest or lowest values within a limit of 1.5 times the interquartile range on either side of the box. Values outside this are represented by an asterisk. However, all sorts of other conventions are used, such as the outer lines representing the 10th to the 90th or 1st to 99th centiles. It'll say on the diagram.

The median does not have a confidence interval. Medians, by definition, are used when the underlying distribution is not known (‘distribution free’) and the sort of assumptions that allow you to relate the sample mean to the population mean do not apply.

Q 4 Correct answer A

(15/25)/(25/65) = 1.56

Risk (p) can take any value between 0 and 1.
Relative risk (RR) can take any value between 0 and ∞.

Risk and relative risk do not have to refer to a negative outcome. You can use the term relative benefit, but I think it's more important to be clear that you're using a relative as opposed to an absolute measure and to specifically name the two quantities you're relating. However, if you want to get lost in the terminology, it's all in the attached glossary.
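The sum for Q 4 can be sketched in a few lines of Python. The function names are made up for illustration, and the group labels are an assumption read off the comparisons in this answer key (15 of 25 subjects remitted after AED withdrawal, 25 of 65 remitted on continued AED).

```python
def risk(events, total):
    """Risk: the proportion of a group with the outcome (between 0 and 1)."""
    return events / total


def relative_risk(events_a, total_a, events_b, total_b):
    """Relative risk: risk in group A divided by risk in group B."""
    return risk(events_a, total_a) / risk(events_b, total_b)


# Assumed cell counts: remission in the withdrawn (non AED) group vs the AED group.
risk_withdrawn = risk(15, 25)        # 0.6
risk_continued = risk(25, 65)        # about 0.385
rr_remission = relative_risk(15, 25, 25, 65)

print(f"risk (non AED) = {risk_withdrawn:.3f}")
print(f"risk (AED)     = {risk_continued:.3f}")
print(f"RR             = {rr_remission:.2f}")
```

Naming both groups in the function call makes it harder to divide the wrong way round, which is the usual mistake under exam pressure.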
Doing simple sums live in the exam is very difficult if you're already stressed out, so it's worth practising by juggling the comparisons. If you're in a study group, get someone to set a contingency table for you. For a 2x2 table there are 8 possible combinations; the other 7 are given below.

Comparison | Calculation | RR
RR of having pseudoseizures in non AED vs AED | (10/25)/(40/65) | 0.65
RR of having pseudoseizures in AED vs non AED | (40/65)/(10/25) | 1.54
RR of pseudoseizure remission in AED vs non AED | (25/65)/(15/25) | 0.64
RR of receiving AED in remitted vs nonremitted | (25/40)/(40/50) | 0.78
RR of no longer receiving AED in remitted vs nonremitted | (15/40)/(10/50) | 1.875
RR of receiving AED in nonremitted vs remitted | (40/50)/(25/40) | 1.28
RR of no longer receiving AED in nonremitted vs remitted | (10/50)/(15/40) | 0.53

Q 5 Correct answer C

(15/10)/(25/40) = 2.4

Odds = p/(1 – p), ie the probability of something happening expressed as a ratio to the probability that it won't.
Odds ratio (OR) is the ratio of two odds.

OR approximates to RR if the outcome is rare. If the probability of one or both of the outcomes is > 10% the approximation of OR to RR breaks down. For example:

P1 | 0.02 | 0.20 | 0.80
P2 | 0.01 | 0.10 | 0.40
RR | 2 | 2 | 2
Odds1 | 0.0204 | 0.25 | 4
Odds2 | 0.0101 | 0.1111 | 0.6667
OR | 2.02 | 2.25 | 6

There are only two circumstances where OR must be used: case control studies (where it is impossible to use RR) and logistic regression (where, if you used RR, you would end up with the possibility of deriving probabilities > 1). However, if you see OR quoted in an RCT it may be that it is being used inappropriately to enhance the apparent effect size.

‘Down with odds ratios!’ Sackett DL, Deeks JJ, Altman DG; Evidence Based Medicine 1996 Sept-Oct;1:164.

As in the previous question, it is a good idea to juggle the requested OR. However, as illustrated below, although ORs may be a poor approximation to the more intuitively understandable RR, it's much harder to make mistakes with the sums because whichever way you do it there are only two answers.
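The same two points (OR drifts away from RR as the outcome gets commoner, and a 2x2 table only ever yields an OR or its reciprocal) can be checked with a short Python sketch. The function names are made up for illustration, and the cell counts are the ones assumed from the calculations in this answer key.

```python
def odds(p):
    """Odds: probability of the event divided by probability of no event."""
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds ratio: the ratio of two odds."""
    return odds(p1) / odds(p2)


# RR is 2 in every pair, but the OR only stays close to 2 when the outcome is rare.
pairs = [(0.02, 0.01), (0.20, 0.10), (0.80, 0.40)]
rr_values = [p1 / p2 for p1, p2 in pairs]
or_values = [odds_ratio(p1, p2) for p1, p2 in pairs]
for (p1, p2), rr, or_ in zip(pairs, rr_values, or_values):
    print(f"P1={p1:.2f} P2={p2:.2f} RR={rr:.2f} OR={or_:.2f}")

# Assumed 2x2 cells: remitted / not remitted in non AED (15, 10) and AED (25, 40).
a, b, c, d = 15, 10, 25, 40
or_remission = (a / b) / (c / d)      # odds of remission, non AED vs AED
or_flipped = (c / d) / (a / b)        # the same comparison the other way round
print(f"OR = {or_remission:.2f}, flipped = {or_flipped:.2f}")
```

Swapping the rows or the columns of the table only ever turns one of these two numbers into the other, which is why the juggled ORs all come out as one of two values.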
Comparison | Calculation | OR
OR of having pseudoseizures in non AED vs AED | (10/15)/(40/25) | 0.42
OR of having pseudoseizures in AED vs non AED | (40/25)/(10/15) | 2.4
OR of pseudoseizure remission in AED vs non AED | (25/40)/(15/10) | 0.42
OR of receiving AED in remitted vs nonremitted | (25/15)/(40/10) | 0.42
OR of no longer receiving AED in remitted vs nonremitted | (15/25)/(10/40) | 2.4
OR of receiving AED in nonremitted vs remitted | (40/10)/(25/15) | 2.4
OR of no longer receiving AED in nonremitted vs remitted | (10/40)/(15/25) | 0.42

Q 6 Correct answer E

The number needed to treat is the number of patients that would have to receive the intervention in order for one patient to improve who would not have done so otherwise.

NNT = 1/ARR
ARR = (15/25) – (25/65) = (0.6 – 0.385) = 0.215
1/ARR = 4.65, rounded up to 5

Q 7 Correct answer E

The confidence interval for a statistically non significant ARR includes zero, therefore the CI for a non significant NNT includes the reciprocal of zero, which is ∞.

‘The number needed to treat: problems describing non-significant results’ Vivek Muthu; Evid Based Mental Health 2003 6: 72 doi: 10.1136/ebmh.6.3.72
‘Confidence intervals for the number needed to treat’ Douglas G Altman; BMJ 1998;317:1309–12

NNT can only take values ≥ 1. Even if treatment is perfect and all untreated patients have a poor outcome, the ARR is 1 minus zero and the NNT is 1/1 = 1. In fact, a ‘perfect’ treatment could still have a relatively ‘high’ NNT if the condition has a high rate of spontaneous remission or placebo response.

Q 8 Correct answer D

Chance. Any apparent positive result could be a type 1 error (false positive) as a result of chance, ie the p value (aka the false positive rate or α). The probability that a result could have occurred by chance is assessed by the appropriate statistical procedure, but the effect size itself, as opposed to its statistical significance, is not the result of a statistical test.
Bias.
Systematic measurement or response differences between groups, unrelated to the exposure, producing an apparent difference. In this case there could have been a systematic tendency to self or observer rate pseudoseizures as panic attacks if subjects are no longer on AEDs (ie unblinding).
Confounding. The association of one or other experimental group with a factor that is in turn associated with the outcome in such a way as to produce a spurious relationship between intervention and outcome. In this case the result could be confounded by willingness to take advice, such that those patients willing to stop AEDs are also those that will accept the diagnosis of pseudoseizures and stop having them. In RCTs, confounding that influences the outcome usually results from faulty allocation concealment/randomisation.
Reverse causality. The outcome causes the exposure. Maybe no referring professional took the advice to withdraw medication, but good prognosis cases then remitted despite this and stopped medication on their own.

Q 9 Correct answer B

The p value is the probability that a result at least this extreme could have occurred by chance, ie if there were really no difference.

There is a jargon:
p < 0.01 is highly statistically significant
p < 0.05 is statistically significant
p > 0.05 is statistically insignificant
However, p between 0.05 and 0.10, although ‘insignificant’, is said by convention to show a trend towards significance (all must have prizes).

Clinical significance or insignificance is determined with reference to the effect size and critical appraisal of internal and external validity; it has very little to do with the p value.

Q 10 Correct answer C

Chi squared (+/- Yates correction), Fisher's exact test and McNemar's test (for paired or matched data) are the statistical tests of choice to analyse contingency tables, ie dichotomous or categorical outcomes.
Fisher's exact test is generally the most valid option because chi squared breaks down both in the particular circumstances of low expected values and the general circumstances of small sample sizes (<60 subjects) or disparate group sizes. Fisher's exact test is also now very easy to do because of computers.

The t test compares two groups on a continuous normally distributed measure, Mann Whitney compares 2 groups on a continuous non normally distributed measure, and ANOVA, among other things, compares 2 or more groups on continuous normally distributed measures.

Q 11 Correct answer E

In fact the one tailed result is 0.0328 (ie half the p value for two tailed).

One tailed testing is unidirectional, ie for two outcomes A and B, where A turns out to be better than B, the one tailed p value only considers the probability of outcomes A=B or A>B. The probability for A<B is discounted. In this case, using a one tailed test discounts your friend's hypothesis that withdrawal of AED could result in more pseudoseizures.

The maths of one vs two tailed is a bit obscure, but the rule of thumb is to be suspicious of any paper that quotes a one tailed value. It is almost never appropriate outside the circumstances of case control studies and logistic regression.

‘One and Two Sided Tests of Significance’ Altman DG, Bland M; BMJ 1994; 309; 248

Q 12 Correct answer D

The chi square statistic becomes approximate and potentially inaccurate with low sample sizes, marked disparity in group sizes or low expected cell values. Pre computer, Fisher's exact test was very laborious to do and was only used if you had to, the rule of thumb being if >20% of expected cell values were less than 5 or any expected value was less than one. Observed values of zero are not a problem. It is very laborious for 2x2 tables and even more so for bigger tables, but computers have sorted this out.
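To make Q 11 and Q 12 concrete, here is a minimal Python sketch using only the standard library. It rests on two assumptions that this answer key does not state outright: that the table is the 2x2 one implied by the sums in Q 4 to Q 6 (15 of 25 remitted after withdrawal, 25 of 65 remitted on continued AED), and that the p value in Q 11 came from an uncorrected chi squared test. The function name is made up for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi squared test for the 2x2 table [[a, b], [c, d]].

    Returns the statistic, the two tailed p value (1 degree of freedom)
    and the table of expected cell values.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = [[a, b], [c, d]]
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # With 1 degree of freedom, chi2 is a squared standard normal deviate,
    # so the two tailed p value is erfc(sqrt(chi2 / 2)).
    p_two_tailed = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_two_tailed, expected


# Assumed cells: remitted / not remitted for non AED (15, 10) and AED (25, 40).
chi2, p_two, expected = chi_square_2x2(15, 10, 25, 40)
p_one = p_two / 2                     # one tailed is half the two tailed value

# Rule of thumb from the text: worry if >20% of expected values are < 5
# or any expected value is < 1.
cells = [e for row in expected for e in row]
low_expected = (sum(e < 5 for e in cells) / len(cells) > 0.20
                or any(e < 1 for e in cells))

print(f"chi squared = {chi2:.2f}")
print(f"two tailed p = {p_two:.4f}, one tailed p = {p_one:.4f}")
print("expected values:", [round(e, 2) for e in cells])
print("low expected values:", low_expected)
```

Under those assumptions the one tailed value comes out at about 0.033 and the two tailed value at about 0.066, so only the one tailed version slips under 0.05.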
The other, much simpler, way of correcting for the low sample size/low expected values was Yates continuity correction, but because of computers the more robust Fisher's exact test is preferred (although most statistical packages still give both).

Q 13 Correct answer B

Parametric tests assume that the data follow an underlying distribution (almost always the normal distribution) and non-parametric tests do not. Parametric tests are more powerful in that if an exactly similar data set is analysed with a parametric and a non parametric test, the parametric always has a higher chance of yielding a positive result.

Hence, although continuous outcomes are very difficult to translate into the terms of an individual patient, continuous variables are used for the power calculations because they are far more liable to yield a positive result.

Q 14 Correct answer A

Hooray, it's time for that table.

Data | 2 independent groups | 2 matched or paired groups | > 2 groups
Categorical | χ2, Fisher's exact test, Yates correction | McNemar's test | χ2
Continuous, normally distributed | t test | Paired t test | ANOVA
Continuous, distribution free | Mann-Whitney U | Wilcoxon | Kruskal Wallis ANOVA

The t test tests for the difference in means between two samples. With group sizes below 30 the normal distribution tends to underestimate the sd and therefore the confidence interval. The t distribution corrects for this, hence the fact that it is also known as the small sample t test.

The assumptions for the t test are that the groups are of roughly equal size and variance and are normally distributed. t tests are astonishingly robust to violations of the assumptions, but in practice statistical packages will tell you if the assumptions are violated and how much difference this makes to the result.

Q 15 Correct answer E

Hopefully it should be obvious that the figures for ‘years on AED’ are highly skewed.
If they were genuinely normally distributed, a large proportion of subjects would have been on AED for a negative number of years. Skewed figures are quoted with reference to mean and sd all the time, probably because the authors cut and paste them from stats packages without thinking about it too much.

‘Detecting skewness from summary information’ Bland M, Altman DG; BMJ 1996; 313; 1200

Although it sounds a bit like cheating, transforming data to allow the use of more powerful statistics is perfectly permissible and helpful. The commonest transformation is the log transformation, and the distribution here is a classic log normal, ie it would be expected to become normal on log transformation, but there are loads of other potential ways of transforming data, as illustrated in the reference.

‘Transforming data’ Bland M, Altman DG; BMJ 1996; 312; 770

The interpretation of log transformed data, in particular with regard to means and confidence intervals, is a bit tricky and probably far beyond what you need to know, but I've included the reference below for completists.

‘Transformations, means and confidence intervals’ Bland M, Altman DG; BMJ 1996; 312; 1079

Q 16 Correct answer D

It should be obvious that this data does not follow any sort of regular distribution. In particular the data for the immediate withdrawal group must be bimodal. No transformation is going to pull this into shape and you have to use distribution free statistics, the appropriate statistic here being Mann Whitney U.

Q 17 Correct answer E

Sometimes randomisation doesn't work out. This may or may not show up as a statistically significant difference between groups on one or other baseline measurement. The point here is that t tests are relatively powerful and have picked up a difference in ages between groups.
However, it is very unlikely that such a small difference in age, or a factor associated with a small difference in age, would confound the results to any great extent.

On the other hand, the proportion of subjects with a history of sexual abuse is pretty different between groups, whether or not it reaches significance on the relatively low powered Chi square/Fisher's test that they presumably used. As such, the statistical significance of the difference is unlikely to really inform the clinical significance of the effect on the result.

However, you cannot just abandon a study because the randomisation hasn't gone perfectly; if you measure enough baseline factors it is inevitable that some won't be distributed equally.

Excluding patients with a history of CSA would mean being unable to examine a patient group of paramount interest.

An interaction is, broadly speaking, a subtype of confounder, ie a factor that when associated with the treatment alters the outcome. For example, Ritalin might work for boys but not girls with ADHD; we would say that there is an interaction with gender.

Q 18 Correct answer E

Randomisation, restriction, and matching are ways of dealing with confounding prior to analysis. Confounders can be dealt with in the analysis by regression or stratification.

Stratification is simpler; all you do is divide the sample into groups with and without the confounder and conduct separate analyses. If there is confounding the two estimates will be different. In this case it might be that the intervention works for non sexually abused but not sexually abused subjects.
This may in turn just be a proxy of severity; maybe sexually abused subjects need an enhanced package of psychological treatment.

More complicated and less intuitively appealing, but certainly more powerful, are regression techniques that construct an equation in the form

y = a_1 x_1 + a_2 x_2 + … + a_n x_n + b

where the x, in this case, would be the predictor variables and the a the weights given to them in determining y, the outcome. If we set the outcome (dependent variable) as pseudoseizures (yes/no) and the independent variables as withdrawal of AED (x_1) and hx of sexual abuse (x_2), the computer models the observed outcomes in different combinations to assign the relative importance. In this case, because the outcome is dichotomous, you use logistic regression and the outcome is given as an odds, for once appropriately.

If it turned out that there was an effect for the intervention, manifest as a reduction in pseudoseizures, but that the effect was less in subjects with a history of sexual abuse, the output of the test might be summarised as:

‘The OR of persisting pseudoseizures following AED withdrawal was 0.7. Correcting for baseline differences in proportion of subjects with a history of sexual abuse, the effect was reduced but still showed a trend towards significance. Subgroup analysis of the sexually abused and non sexually abused subjects separately yielded differing effect sizes (0.9 and 0.6 respectively) but neither reached statistical significance, probably because of the reduced sample sizes.’

However, in the ANCOVA on the endpoint mean pseudoseizure frequency reduction (log transformed) there was a significant result in favour of AED withdrawal, independent of covarying for PTSD 17 score (possibly a rather more valid marker of the role of previous trauma than a mere hx of CSA).

ANCOVA (Analysis of Covariance) looks at the effect on continuous outcomes by continuous covariates.
Essentially it's exactly the same as multiple regression.

Q 19 Correct answer B

Unfortunately the Poisson distribution did come up one year and it threw everyone. On the other hand, if you'd heard of it at all you were at a massive advantage.

The Poisson distribution, or the law of rare events, or the law of small numbers, describes the frequency of independent events occurring in uniform frames of area or time. It is defined by the fact that its mean and variance are equal, a fact whose refulgent beauty has e'er set a throb in my chest without my ever knowing what it meant. Poisson distribution, retain thy mystery, I like thee well.

For the purposes of the exam you just need to know the circumstances of its use and that rates are compared using a Poisson regression.

The binomial distribution is used to work out the confidence interval of a proportion, eg the probability of throwing 6 sixes on 36 throws.

The F distribution is used for ANOVA.

Q 20 Correct answer B

Poisson events need to be independent. Incident cases of Huntington's depend on a case of Huntington's already being there. In terms of the other data, you would be very interested if they did deviate from a Poisson distribution in time or place, as this would make some sort of intervening factor very likely.
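For anyone who would rather the mystery were dispelled, here is a minimal Python sketch of the two distributions mentioned in Q 19. The rate of 3 events per frame is a made up figure for illustration and the function names are not from any particular package. It checks numerically that the Poisson mean and variance both equal the rate, and works out the binomial probability of exactly 6 sixes in 36 throws of a fair die.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k independent events in a frame with mean `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


rate = 3.0                            # hypothetical: 3 events per frame on average
ks = range(60)                        # the tail beyond 60 is negligible for rate 3
total = sum(poisson_pmf(k, rate) for k in ks)
mean = sum(k * poisson_pmf(k, rate) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, rate) for k in ks)

six_sixes = binomial_pmf(6, 36, 1 / 6)  # exactly 6 sixes in 36 throws

print(f"Poisson: total={total:.6f} mean={mean:.6f} variance={variance:.6f}")
print(f"Binomial: P(6 sixes in 36 throws) = {six_sixes:.3f}")
```

The mean and variance both come out at the rate you put in, which is all that ‘mean equals variance’ amounts to.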