Basic Statistics II – Examples
1. It was postulated that the increasing rate of hip fractures among elderly women may
be related to women now having longer femoral necks than in previous generations.
Fifty-two X-ray pictures of hips of elderly women taken in the 1950s were compared
to 52 similar pictures taken in the 1990s. All the X-rays had been taken for routine
diagnostic purposes with the same equipment in the same rheumatology unit. The
following table appeared:
Mean (sd) dimensions of proximal femur in 52 elderly white women in New Zealand
in 1950s compared with those of 52 in 1990s

Dimension (mm)            1950s          1990s          P value
Length of hip axis        124.0 (8.6)    130.5 (8.6)    0.0002
Length of femoral neck    79.4 (7.6)     84.9 (6.3)     0.0001
Width of femoral neck     38.1 (4.1)     38.6 (3.6)     0.49
a) What method could be used to calculate P in this study? What conditions, if any, do
the data have to fulfill for the method to be valid? Are they likely to be fulfilled?
b) From these P values, can we conclude that the length of femoral neck in elderly
women has increased over time? Can we conclude that the width of femoral neck in
elderly women has not increased over time?
2. A group of researchers observed that baby boys had higher pain scores following
vaccination than baby girls. They hypothesized that this might be partly related to
previous experience with acute pain such as circumcision. They therefore sought to
investigate post hoc whether male neonatal circumcision is associated with a greater
pain response to routine vaccination at 4 or 6 months. They scored pain response after
vaccination with Haemophilus influenzae type B (HIB) (n=18). An independent
observer rated the pain responses for each baby on a scale from 0 (no pain) to 10
(worst pain).
Of the 5 boys not circumcised, 1 had a pain score of 4, 3 a pain score of 6, and 1 a
pain score of 8. Of the 13 boys who were circumcised, 2 had a pain score of 6, 7 a
pain score of 8, and 4 a pain score of 9.
After HIB, pain score was significantly higher in boys who had previously been
circumcised (Mann-Whitney U test, P = 0.01).
a) What feature of the data makes the authors use the Mann-Whitney U test here? Is
it necessary here?
b) What are the disadvantages of this method compared to methods based on the Normal
distribution?
3. The table below is taken from a study investigating the cause of diarrhoea in patients
with gastroenteritis and shows the relationship between foreign travel and a positive
result for the organism Providencia alcalifaciens:
                      Recent travel abroad?
P. alcalifaciens      Yes      No       Total
Positive              25       5        28
Negative              229      368      597
Total                 254      373      627

a) What kind of study is this?
b) What is meant by “χ² = 23.98, P < 0.001”?
c) Explain briefly how the chi-squared test works.
d) What conditions do the data have to meet for the test to be valid?
e) What conclusions can be drawn from these data?
f) What other information would be useful in deciding whether P. alcalifaciens was a
likely cause of gastroenteritis in travellers?
4. In a double-blind comparison of two ointments, containing calcipotriol or
betamethasone, for the treatment of psoriasis, 345 subjects were given one ointment
on the left side of the body and the other on the right, the side being chosen at
random. The severity of the condition was assessed by self report as “cleared”,
“pronounced improvement”, “slight improvement”, “no change” or “worse”, and by
visual assessment as a score by the investigator.
The score was significantly lower (P < 0.001) for the calcipotriol side than the
betamethasone side. Patients were more likely to report “cleared” or “pronounced
improvement” on the calcipotriol side.
a) Suggest 2 methods that could be used to carry out the significance test on the scores.
What assumptions would be required for each? Which method would you choose?
b) The self-assessment was recoded into two categories, “cleared or pronounced
improvement” or “little or no improvement”. What method could be used to carry out
the significance test for the recoded data?
c) What can we conclude?
5. Spontaneous pneumothorax is the presence of air or gas in the pleural space without
known cause. Two treatments, simple aspiration and intercostal tube drainage, were
compared in a randomised controlled trial. Each treatment involves inserting a tube to
remove excess air: in simple aspiration the tube is removed after air has been
removed, in intercostal drainage it is left in place for a while. Of the 35 patients
allocated to simple aspiration, 7 were not aspirated successfully and intercostal
drainage was used. Some of the patients' pre-treatment data are shown in the table:
                                Mean (sd)
                                Aspiration            Intercostal drain     p-value
Age (years)                     34.6 (15.0) (n=35)    34.6 (13.1) (n=38)    0.8
Sex (M/F)                       28/8                  29/9                  0.92
Height (m)                      1.79 (0.11) (n=32)    1.76 (0.09) (n=31)    0.81
Weight (kg)                     64.8 (9.2) (n=31)     63.6 (11.0) (n=37)    0.65
Smoking history (pack years)    8.09 (9.32) (n=34)    7.94 (9.16) (n=38)    0.95
a) What shape do you think the distribution of smoking history has?
b) What statistical method could be used for each of the tests given?
The following table shows the nature of the pneumothorax, before treatment:
Size of pneumothorax    Aspiration    Intercostal drain    P-value
Small rim               3             1                    0.054
Partial collapse        16            12
Complete collapse       10            18
c) What statistical method could be used to find the P value here?
d) What null hypotheses are being tested in this and the preceding table?
Are these useful things to test?
Basic Statistics II – Answers
1.
a) In this study we have means from two independent samples. Hence, we can use the
two-sample t-test to test the null hypothesis that the means are the same in the
populations from which the samples were drawn. This method assumes that the
data are from Normal distributions with the same variance. These variables are
measurements of skeletal size, which often follow a Normal distribution. The
standard deviations are small in comparison with the means, providing no
evidence of skewness, and are similar in the two decades, which is consistent with
equal variances.
b) There is good evidence that the length of the femoral neck is different in the
populations that these samples represent because the difference was statistically
significant. To conclude that there is a change over time we must assume that the
samples are truly representative of elderly white women in these two decades.
There is no evidence that the width of the femoral neck is different in the
populations that these samples represent because the difference was not
significant. However, this does not mean that it has not changed. The sample may
be too small to detect a change that has occurred. A confidence interval would be
more informative. The difference is 0.5 mm, with 95% confidence interval -1.0 to +2.0 mm. The
clinical implication is that elderly women in the 1990s had longer, but not thicker,
femoral necks than similar women in the 1950s. Longer bones may be more likely
to break, and so this may explain any increase in fractured neck of femur.
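
As a rough check on answer 1, the t statistics and confidence intervals can be recomputed from the published means and standard deviations alone. The sketch below is a minimal illustration, not the authors' calculation. The function name `two_sample_summary` is made up for this example, it assumes a pooled (equal-variance) standard error, and the critical value 1.98 is an approximate 97.5% point of the t distribution with 102 degrees of freedom.

```python
import math


def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Pooled two-sample t statistic and CI for mean2 - mean1 from summary data.

    t_crit is the two-sided critical value supplied by the caller
    (an assumption of this sketch; it is not computed here).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean2 - mean1
    return diff, diff / se, (diff - t_crit * se, diff + t_crit * se)


# Mean and sd from the table in question 1: 1950s first, 1990s second, n = 52 each.
dimensions = {
    "Length of hip axis": (124.0, 8.6, 130.5, 8.6),
    "Length of femoral neck": (79.4, 7.6, 84.9, 6.3),
    "Width of femoral neck": (38.1, 4.1, 38.6, 3.6),
}

T_CRIT_102_DF = 1.98  # approximate critical value for 102 df (assumption)

results = {}
for name, (m1, s1, m2, s2) in dimensions.items():
    results[name] = two_sample_summary(m1, s1, 52, m2, s2, 52, T_CRIT_102_DF)
    diff, t, (low, high) = results[name]
    print(f"{name}: difference {diff:.1f} mm, t = {t:.2f}, "
          f"95% CI {low:.1f} to {high:.1f} mm")
```

With these inputs the width of the femoral neck gives a difference of 0.5 mm with an interval of roughly -1.0 to 2.0 mm, in line with the figures quoted in b).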
2.
a) The Mann-Whitney U-test compares the distribution of pain in 2 independent
groups, here baby boys who have and who have not been circumcised. The test is
based on the ranks of the data. The data here are discrete, rather than continuous,
varying between 0 and 10. Thus the data cannot follow a Normal distribution, and
a two-sample t-test would therefore only be approximate.
b) The main disadvantage of the Mann-Whitney U-test is that it is a significance test
and gives a p-value but does not give a simple estimate and CI for the difference
between the groups. Methods based on the Normal distribution such as the t-test
would give an estimate of the difference and CI in addition to the P value. The
two-sample t-test is fairly robust to grouping and so might have been used here,
giving a mean difference of 2.0 and a 95% CI of 0.8 to 3.2.
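
The figures in answer 2 can be reproduced from the individual scores. The sketch below is illustrative only. It takes the circumcised group to be the 13 boys whose scores are listed (2 + 7 + 4), computes the Mann-Whitney U statistic by direct pair counting with a tie-corrected Normal approximation (the original authors may well have used an exact method), and uses 2.12 as the 97.5% point of t with 16 degrees of freedom. The function names are made up for this example.

```python
import math
from collections import Counter
from statistics import mean, variance

# Pain scores as listed in question 2.
uncircumcised = [4] + [6] * 3 + [8]
circumcised = [6] * 2 + [8] * 7 + [9] * 4


def mann_whitney_u(x, y):
    """U for sample x: number of (x, y) pairs with x > y, ties counted as half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def normal_approx_p(u, x, y):
    """Two-sided P value from the Normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ties = sum(t**3 - t for t in Counter(x + y).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = abs(u - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(z / math.sqrt(2))


u = mann_whitney_u(circumcised, uncircumcised)
p = normal_approx_p(u, circumcised, uncircumcised)

# Pooled two-sample t interval for the difference in mean pain score.
n1, n2 = len(circumcised), len(uncircumcised)
diff = mean(circumcised) - mean(uncircumcised)
pooled_var = ((n1 - 1) * variance(circumcised)
              + (n2 - 1) * variance(uncircumcised)) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
T_CRIT_16_DF = 2.12  # 97.5% point of t with 16 df (assumption of this sketch)
ci = (diff - T_CRIT_16_DF * se, diff + T_CRIT_16_DF * se)

print(f"U = {u}, approximate P = {p:.3f}")
print(f"mean difference = {diff:.1f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The approximate P value comes out close to the reported 0.01. The t-based interval is close to the 0.8 to 3.2 quoted in b), with small differences due to rounding.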
3.
a) This is a cross-sectional study. There is a single group of patients and both
assessments, Providencia and travel history, were made at the same time.
b) This is the result of the chi-square test that tests the null hypothesis that there is no
association between P.alcalifaciens and foreign travel. The value 23.98 is the test
statistic which will follow a chi-squared distribution with 1 df if the null
hypothesis is true. P<0.001 tells us that the probability of these data or more
extreme data occurring if the null hypothesis were true is smaller than 0.001 and
so we have good evidence that the null hypothesis is not true and conclude that an
association exists.
c) The chi-squared test works by comparing the observed frequencies with those
expected if the null hypothesis were true. If the null hypothesis were true then the
observed and expected frequencies would be similar. Hence, a large value of the
test statistic indicates good evidence against the null hypothesis. The actual value
is looked up in a table of the chi-squared distribution to give the exact probability
of a value greater than or equal to that observed.
d) The chi-square test is a large sample test and the usual rule is that a large sample
approximation holds if all expected frequencies are greater than 5 for a 2 x 2
table. Although one observed frequency is 5, no expected values will be as small.
This is because if the null hypothesis were true then the overall probability of
being positive for P.alcalifaciens would be 28/627 = 0.04 and this proportion
would apply to those who have and those who have not travelled abroad. Thus, the
expected numbers positive for P.alcalifaciens would be 254 x 28 / 627 = 11.3 for
those who travelled abroad and 373 x 28/627 = 16.7 among those who have not
travelled abroad. The other expected values can be calculated in a similar way but
will be large because the expected values must add to the marginal totals for each
row and column.
e) The study shows that there is a statistically significant association between
travelling abroad and being positive for P.alcalifaciens among people with
gastroenteritis. We cannot conclude from this that P.alcalifaciens was the cause
of the gastroenteritis. We can only conclude that an association between
Providencia and foreign travel exists.
f) We need a control group. We could look at the number of positive screens for
P.alcalifaciens among subjects without diarrhoea cross-classified according to
whether or not they had recently travelled abroad. This would tell us if the
observed association between travel and P.alcalifaciens was a general one or one
specific to those with diarrhoea.
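
The calculation described in c) and d) can be sketched in a few lines. This is a minimal illustration with a made-up function name, working from the four cell counts as printed in the table (25, 5, 229, 368). Those cells give 30 positives in total rather than the 28 shown in the table margin, so the expected counts come out slightly higher than those quoted in d), while the statistic agrees with the reported χ² of 23.98.

```python
import math

# Cell counts as printed in question 3.
# Rows: P. alcalifaciens positive, negative. Columns: travel abroad yes, no.
observed = [[25, 5], [229, 368]]


def chi_squared_2x2(table):
    """Expected counts, chi-squared statistic and P value for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under the null hypothesis: row total x column total / N.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 df, the upper tail of chi-squared equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value


expected, statistic, p_value = chi_squared_2x2(observed)
smallest_expected = min(min(row) for row in expected)

print(f"chi-squared = {statistic:.2f}, P = {p_value:.1e}")
print(f"smallest expected count = {smallest_expected:.1f}")
```

All four expected counts exceed 5, which is the validity condition discussed in d).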
4.
a) The data are paired. The paired t-test could be used if the differences come from a
Normal distribution. The Wilcoxon paired test could also be used but this would
be less powerful than the t test if the t-test assumptions were satisfied.
b) The data are paired and the variable is dichotomous. We could use McNemar’s
test to compare proportions who had improved for the two treatments.
c) The response was better using calcipotriol ointment than using betamethasone,
though we are not told by how much. As this was a randomized double-blind trial,
the evidence suggests that the ointment itself was the cause.
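
McNemar's test in b) depends only on the discordant pairs, that is, patients who improved on one side but not on the other. The sketch below shows an exact (binomial) version under that assumption. The function name is made up for this example, and the counts in the usage line are hypothetical because the question does not report the paired table.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant-pair counts.

    b: pairs improved on treatment A only; c: pairs improved on treatment B only.
    Under the null hypothesis each discordant pair is equally likely to fall
    either way, so min(b, c) is compared with a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only (not from the trial).
p_example = mcnemar_exact(b=40, c=15)
print(f"exact McNemar P = {p_example:.4f}")
```

Concordant pairs (improved on both sides, or on neither) do not enter the calculation, which is why only two counts are needed.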
5.
a) The distribution must be skew to the right. Pack years cannot be negative, yet
the standard deviation is greater than the mean, so a symmetrical distribution
would place many values below zero.
b) We have two small independent samples. For the measurements we could
consider the two-sample t-test, possibly with a transformation, or the
Mann-Whitney U-test. For each of them the standard deviations are similar in the
two groups. For age, height and weight, the t-test appears appropriate.
Smoking is highly skew and is likely to have a number of zeros for
non-smokers, hence the Mann-Whitney U-test is probably preferred. Sex could be
compared by the chi-square test.
c) This is a 3 x 2 contingency table. The total for the first row is 4, so both
expected frequencies will be considerably less than 5. Hence, the chi-square
test would not be valid. We could combine rows, ‘small rim’ and ‘partial
collapse’ to give a 2 x 2 table. But does this make clinical sense?
d) The null hypothesis in each case is that there is no difference in the mean or
proportion of this variable between aspiration and intercostal drain patients in the
population from which they come. As they are randomized, the two groups
come from the same population and these null hypotheses are true by
definition. The tests are therefore superfluous.
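
The point made in c) about small expected frequencies can be checked directly from the counts in the pneumothorax table. This is a minimal sketch with a made-up function name. It computes expected counts as row total x column total / N for the 3 x 2 table, and again after combining 'small rim' with 'partial collapse'.

```python
# Rows: small rim, partial collapse, complete collapse.
# Columns: aspiration, intercostal drain.
observed = [[3, 1], [16, 12], [10, 18]]


def expected_counts(table):
    """Expected frequencies under no association: row total x column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Combine the first two rows to give a 2 x 2 table, as suggested in c).
combined = [[a + b for a, b in zip(observed[0], observed[1])], observed[2]]

expected_3x2 = expected_counts(observed)
expected_2x2 = expected_counts(combined)

print("3 x 2 expected:", [[round(e, 2) for e in row] for row in expected_3x2])
print("2 x 2 expected:", [[round(e, 2) for e in row] for row in expected_2x2])
```

Both expected counts in the 'small rim' row are about 2, while every expected count in the combined table is well above 5. Whether the combined categories are clinically meaningful remains a separate question.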