Basic Statistics II – Examples

1. It was postulated that the increasing rate of hip fractures among elderly women may be related to women now having longer femoral necks than in previous generations. Fifty-two X-ray pictures of hips of elderly women taken in the 1950s were compared to 52 similar pictures taken in the 1990s. All the X-rays had been taken for routine diagnostic purposes with the same equipment in the same rheumatology unit. The following table appeared:

Mean (sd) dimensions of proximal femur in 52 elderly white women in New Zealand in 1950s compared with those of 52 in 1990s

Dimension (mm)            1950s          1990s          P value
Length of hip axis        124.0 (8.6)    130.5 (8.6)    0.0002
Length of femoral neck    79.4 (7.6)     84.9 (6.3)     0.0001
Width of femoral neck     38.1 (4.1)     38.6 (3.6)     0.49

a) What method could be used to calculate P in this study? What conditions, if any, do the data have to fulfill for the method to be valid? Are they likely to be fulfilled?
b) From these P values, can we conclude that the length of femoral neck in elderly women has increased over time? Can we conclude that the width of femoral neck in elderly women has not increased over time?

2. A group of researchers observed that baby boys had higher pain scores following vaccination than baby girls. They hypothesized that this might be partly related to previous experience with acute pain such as circumcision. They therefore sought to investigate post hoc whether male neonatal circumcision is associated with a greater pain response to routine vaccination at 4 or 6 months. They scored pain response after vaccination with Haemophilus influenzae type B (HIB) (n=18). An independent observer rated the pain responses for each baby on a scale from 0 (no pain) to 10 (worst pain). Of the 5 boys not circumcised, 1 had a pain score of 4, 3 a pain score of 6, and 1 a pain score of 8. Of the 13 boys who were circumcised, 2 had a pain score of 6, 7 a pain score of 8, and 4 a pain score of 9.
After HIB, pain score was significantly higher in boys who had previously been circumcised (Mann-Whitney U test, P = 0.01).

a) What feature of the data makes the authors use the Mann-Whitney U test here? Is it necessary here?
b) What are the disadvantages of this method compared to methods based on the Normal distribution?

3. The table below is taken from a study investigating the cause of diarrhoea in patients with gastroenteritis and shows the relationship between foreign travel and a positive result for the organism Providencia alcalifaciens:

                    Recent travel abroad?
P. alcalifaciens    Yes     No      Total
Positive            25      5       30
Negative            229     368     597
Total               254     373     627

a) What kind of study is this?
b) What is meant by “χ2 = 23.98, P < 0.001”?
c) Explain briefly how the chi-squared test works.
d) What conditions do the data have to meet for the test to be valid?
e) What conclusions can be drawn from these data?
f) What other information would be useful in deciding whether P. alcalifaciens was a likely cause of gastroenteritis in travellers?

4. In a double-blind comparison of two ointments, containing calcipotriol or betamethasone, for the treatment of psoriasis, 345 subjects were given one ointment on the left side of the body and the other on the right, the side being chosen at random. The severity of the condition was assessed by self report as “cleared”, “pronounced improvement”, “slight improvement”, “no change” or “worse”, and by visual assessment as a score by the investigator. The score was significantly lower (P < 0.001) for the calcipotriol side than the betamethasone side. Patients were more likely to report “cleared” or “pronounced improvement” on the calcipotriol side.

a) Suggest 2 methods that could be used to carry out the significance test on the scores. What assumptions would be required for each? Which method would you choose?
b) The self-assessment was recoded into two categories, “cleared or pronounced improvement” or “little or no improvement”.
What method could be used to carry out the significance test for the recoded data?
c) What can we conclude?

5. Spontaneous pneumothorax is the presence of air or gas in the pleural space without known cause. Two treatments, simple aspiration and intercostal tube drainage, were compared in a randomised controlled trial. Each treatment involves inserting a tube to remove excess air: in simple aspiration the tube is removed after air has been removed, in intercostal drainage it is left in place for a while. Of the 35 patients allocated to simple aspiration, 7 were not aspirated successfully and intercostal drainage was used. Some of the patients' pre-treatment data are shown in the table:

                                Mean (sd)
                                Aspiration            Intercostal drain     P value
Age (years)                     34.6 (15.0) (n=35)    34.6 (13.1) (n=38)    0.8
Sex (M/F)                       28/8                  29/9                  0.92
Height (m)                      1.79 (0.11) (n=32)    1.76 (0.09) (n=31)    0.81
Weight (kg)                     64.8 (9.2) (n=31)     63.6 (11.0) (n=37)    0.65
Smoking history (pack years)    8.09 (9.32) (n=34)    7.94 (9.16) (n=38)    0.95

a) What shape do you think the distribution of smoking history has?
b) What statistical method could be used for each of the tests given?

The following table shows the nature of the pneumothorax, before treatment:

Size of pneumothorax    Aspiration    Intercostal drain
Small rim               3             1
Partial collapse        16            12
Complete collapse       10            18

P value = 0.054

c) What statistical method could be used to find the P value here?
d) What null hypotheses are being tested in this and the preceding table? Are these useful things to test?

Basic Statistics II – Answers

1. a) In this study we have means from two independent samples. Hence, we can use the two-sample t-test to test the null hypothesis that the means are the same in the populations from which the samples were drawn. This method assumes that the data are from Normal distributions with the same variance. These variables are measurements of skeletal size, which often follow a Normal distribution.
The standard deviations are small in comparison with the means, providing no evidence of skewness.

b) There is good evidence that the length of the femoral neck is different in the populations that these samples represent because the difference was statistically significant. To conclude that there is a change over time we must assume that the samples are truly representative of elderly white women in these two decades. There is no evidence that the width of the femoral neck is different in the populations that these samples represent because the difference was not significant. However, this does not mean that it has not changed. The sample may be too small to detect a change that has occurred. A confidence interval would be more informative. The difference is 0.5 mm, with 95% confidence interval -1.0 to +2.0 mm. The clinical implication is that elderly women in the 1990s had longer, but not thicker, femoral necks than similar women in the 1950s. Longer bones may be more likely to break, and so this may explain any increase in fractured neck of femur.

2. a) The Mann-Whitney U-test compares the distribution of pain in 2 independent groups, here baby boys who have and who have not been circumcised. The test is based on the ranks of the data. The data here are discrete, rather than continuous, varying between 0 and 10. Thus the data cannot follow a Normal distribution, and a two-sample t-test would therefore only be approximate.

b) The main disadvantage of the Mann-Whitney U-test is that it is a significance test only: it gives a P value but does not give a simple estimate and confidence interval (CI) for the difference between the groups. Methods based on the Normal distribution, such as the t-test, would give an estimate of the difference and a CI in addition to the P value. The simple t-test is fairly robust to grouping and so might have been used here, giving a mean difference of 2.0 with 95% CI 0.8 to 3.2.

3. a) This is a cross-sectional study.
There is a single group of patients and both assessments, Providencia and travel history, were made at the same time.

b) This is the result of the chi-squared test of the null hypothesis that there is no association between P. alcalifaciens and foreign travel. The value 23.98 is the test statistic, which will follow a chi-squared distribution with 1 df if the null hypothesis is true. P < 0.001 tells us that the probability of these data or more extreme data occurring if the null hypothesis were true is smaller than 0.001, and so we have good evidence that the null hypothesis is not true and conclude that an association exists.

c) The chi-squared test works by comparing the observed frequencies with those expected if the null hypothesis were true. If the null hypothesis were true then the observed and expected frequencies would be similar. Hence, a large value of the test statistic indicates good evidence against the null hypothesis. The actual value is looked up in a table of the chi-squared distribution to give the probability of a value greater than or equal to that observed.

d) The chi-squared test is a large sample test and the usual rule is that the large sample approximation holds if all expected frequencies are greater than 5 for a 2 x 2 table. Although one observed frequency is 5, no expected value will be as small. This is because if the null hypothesis were true then the overall probability of being positive for P. alcalifaciens would be 30/627 = 0.048 (25 + 5 = 30 positives in all) and this proportion would apply both to those who have and to those who have not travelled abroad. Thus, the expected numbers positive for P. alcalifaciens would be 254 x 30/627 = 12.2 for those who travelled abroad and 373 x 30/627 = 17.8 among those who have not travelled abroad. The other expected values can be calculated in a similar way but will be large because the expected values must add to the marginal totals for each row and column.
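
The arithmetic in answers 3 c) and d) can be checked with a short calculation. The following is a minimal Python sketch, not part of the original exercise: the function names are illustrative, the row total for positives is taken as the sum of its cells (25 + 5 = 30), and the P value uses the fact that a chi-squared variable with 1 df is the square of a standard Normal variable.

```python
import math

def expected_counts(observed):
    """Expected counts under no association: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: P. alcalifaciens positive, negative. Columns: travel abroad yes, no.
observed = [[25, 5], [229, 368]]

expected = expected_counts(observed)
chi2 = chi_squared(observed)

# Upper tail probability for chi-squared with 1 df (valid for 1 df only).
p_value = math.erfc(math.sqrt(chi2 / 2))

for row in expected:
    print([round(e, 1) for e in row])
print("chi-squared =", round(chi2, 2), " P < 0.001:", p_value < 0.001)
```

Every expected count is well above 5, and the statistic agrees with the quoted value of 23.98.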
e) The study shows that there is a statistically significant association between travelling abroad and being positive for P. alcalifaciens among people with gastroenteritis. We cannot conclude from this that P. alcalifaciens was the cause of the gastroenteritis. We can only conclude that an association between Providencia and foreign travel exists.

f) We need a control group. We could look at the number of positive screens for P. alcalifaciens among subjects without diarrhoea, cross-classified according to whether or not they had recently travelled abroad. This would tell us if the observed association between travel and P. alcalifaciens was a general one or one specific to those with diarrhoea.

4. a) The data are paired. The paired t-test could be used if the differences come from a Normal distribution. The Wilcoxon paired test could also be used but this would be less powerful than the t-test if the t-test assumptions were satisfied.

b) The data are paired and the variable is dichotomous. We could use McNemar’s test to compare the proportions who had improved for the two treatments.

c) The response was better using calcipotriol ointment than using betamethasone, though we are not told by how much. As this was a randomized double-blind trial, the evidence suggests that the ointment itself was the cause.

5. a) The distribution must be skew to the right. This is obvious because the standard deviation is greater than the mean: pack years cannot be negative, yet one standard deviation below the mean would already be below zero.

b) We have two small independent samples. For the measurements we could consider the two-sample t-test, possibly with a transformation, or the Mann-Whitney U-test. For each of them the standard deviations are similar in the two groups. For age, height and weight, the t-test appears appropriate. Smoking is highly skew and is likely to have a number of zeros for non-smokers, hence the Mann-Whitney U-test is probably preferred. Sex could be compared by the chi-squared test.

c) This is a 3 x 2 contingency table.
The total for the first row is 4, so both expected frequencies will be considerably less than 5. Hence, the chi-squared test would not be valid. We could combine the rows ‘small rim’ and ‘partial collapse’ to give a 2 x 2 table. But does this make clinical sense?

d) The null hypothesis in each case is that there is no difference in the mean or proportion of this variable between aspiration and intercostal drainage patients in the population from which they come. As they are randomized, the two groups come from the same population and these null hypotheses are true by definition. The tests are therefore superfluous.
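
As a check on answer 5 c), the following minimal Python sketch computes the expected frequencies for the 3 x 2 table of pneumothorax size by treatment, and again after combining ‘small rim’ with ‘partial collapse’. It is an illustration only: the function and variable names are assumptions, and whether the combined categories make clinical sense remains a matter of judgement.

```python
def expected_counts(observed):
    """Expected counts under no association: row total x column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: small rim, partial collapse, complete collapse.
# Columns: aspiration, intercostal drain.
size_3x2 = [[3, 1], [16, 12], [10, 18]]
expected_3x2 = expected_counts(size_3x2)

# Cells (row, column) whose expected frequency falls below 5.
too_small = [(i, j)
             for i, row in enumerate(expected_3x2)
             for j, e in enumerate(row) if e < 5]

# Combine 'small rim' and 'partial collapse' into one row.
size_2x2 = [[3 + 16, 1 + 12], [10, 18]]
expected_2x2 = expected_counts(size_2x2)

print([round(e, 2) for e in expected_3x2[0]])    # first row of the 3 x 2 table
print("cells with expected < 5:", too_small)
print([[round(e, 2) for e in row] for row in expected_2x2])
```

Both expected frequencies in the ‘small rim’ row are about 2, while every expected frequency in the combined 2 x 2 table is above 13.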