Suggested solutions for exam in HMM4101, 2005
Part A
1. Validity with respect to measurement procedures can be defined as “the degree to
which the researcher has measured what he has set out to measure”. The reliability of
a research instrument indicates “the extent that repeated measurements made by it
under constant conditions will give the same result”. A research instrument can be
reliable, but at the same time not valid.
2. In a cross-sectional study design, a sample is selected from the population you want
to study, and a single investigation is done of that sample. Although questions can be
asked about both the present and past situation, such a study is not optimal to
investigate changes over time.
A longitudinal study design investigates the same population repeatedly over time,
using the same or a different sample each time. The same information is collected
every time, so that changes in the population can be detected.
A blind study is an experimental study where different groups of patients are given
different treatments, and where it is unknown to the patients which treatment (or
placebo) they receive. The purpose is to prevent the patient's knowledge of his or
her treatment from affecting the outcome. A double-blind study is similar to a blind
study, but it is additionally required that the physician administering the treatment
does not know which treatment each patient receives. Again, the purpose is to rule out
the possibility that the physician's knowledge somehow influences the patients or the
outcome measurements.
A panel study is a longitudinal and prospective study where the same respondents are
contacted repeatedly, in order to study changes over time.
3. In stratified random sampling, the population is divided into different strata, and the
sample consists of a predefined number of elements from each stratum, while the
sample within each stratum is selected randomly. An important purpose is to ensure
that a sufficient number of elements (or persons) is sampled from each stratum to
make the intended data analysis possible (or efficient).
One example could be the following: One would like to investigate how satisfied
people are with their primary health care, and how this depends on the degree of their
health problems. If a random sample of persons is selected, one would be likely to
include too few persons with severe health problems. Stratifying with respect to
degree of health problems could solve this.
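A minimal sketch of such a stratified sampling scheme in Python; the DataFrame population and the column health_stratum are hypothetical names used only for illustration:

```python
import pandas as pd

def stratified_sample(population: pd.DataFrame, stratum_col: str,
                      n_per_stratum: int, seed: int = 0) -> pd.DataFrame:
    """Draw a simple random sample of a fixed size from every stratum."""
    return population.groupby(stratum_col).sample(n=n_per_stratum, random_state=seed)

# Hypothetical usage: sample 50 persons from each level of health problems,
# so that even a small "severe" stratum is sufficiently represented.
# sample = stratified_sample(population, "health_stratum", n_per_stratum=50)
```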
4. As the normal distribution is symmetric, its median is the same as its mean, i.e., 3.
The cumulative distribution at a value x is the probability of being below x for a
random variable with the given distribution. Thus it is a function of x increasing from
0 to 1 as x increases. We know about the normal distribution that approximately 95%
of its probability is contained between its expectation minus two standard deviations
and its expectation plus two standard deviations. In our case, this means that 95% of
our distribution's probability lies between $3 - 2 \cdot 1 = 1$ and $3 + 2 \cdot 1 = 5$. Dividing the
remaining probability in two, we get that 2.5% of the distribution's probability lies
above 5, and 2.5% of its probability lies below 1. Thus, approximately 97.5% of its
probability lies below 5, and the value of the cumulative distribution function at 5 is
approximately 0.975.
The probability of being below the median is always 50% for any distribution. As our
distribution has median 3, the value of the cumulative distribution at 3 is 50%, or 0.5.
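These values can be checked numerically; a small sketch using scipy's normal distribution with mean 3 and standard deviation 1, as given in the question:

```python
from scipy.stats import norm

mu, sigma = 3, 1                         # distribution given in the question
print(norm.median(loc=mu, scale=sigma))  # 3.0 -- the median equals the mean
print(norm.cdf(3, loc=mu, scale=sigma))  # 0.5 -- half the probability lies below the median
print(norm.cdf(5, loc=mu, scale=sigma))  # ~0.977 (the two-standard-deviation rule gives ~0.975)
```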
5. These concepts are used in hypothesis testing. A type I error is a rejection of a true
null hypothesis, while a type II error is an acceptance of a false null hypothesis.
6. An indicator variable, or a “dummy variable”, is a variable that can take only two
values, in practice often 0 and 1.
An example of use of an indicator variable in a multiple regression analysis was the
model we used to investigate the influence of smoking, along with other variables
such as the mother's weight and age, on birth weight. Smoking was then an indicator
variable, taking on the values 0 and 1.
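A minimal sketch of a regression with an indicator variable, using invented numbers (the variable names mirror the birth-weight example, but the data are not from the course):

```python
import numpy as np

# Invented illustrative data: birth weight (g), mother's weight (kg),
# mother's age (years), and smoking as a 0/1 indicator ("dummy") variable.
birth_weight  = np.array([3200., 3400., 2900., 3100., 3600., 2800.])
mother_weight = np.array([60., 70., 55., 65., 75., 58.])
mother_age    = np.array([25., 30., 22., 28., 33., 24.])
smoking       = np.array([1., 0., 1., 0., 0., 1.])

# Design matrix with an intercept column; the coefficient on smoking is the
# estimated difference in birth weight between smokers and non-smokers,
# holding the other variables fixed.
X = np.column_stack([np.ones_like(smoking), mother_weight, mother_age, smoking])
coef, *_ = np.linalg.lstsq(X, birth_weight, rcond=None)
print(dict(zip(["intercept", "mother_weight", "mother_age", "smoking"], coef)))
```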
7. Analysis of variance (ANOVA) is a method for analyzing how the variation in a
dependent variable depends on the variation in one or more discrete explanatory
variables. The purpose of the method is to subdivide the sum of squares
corresponding to the variance of the dependent variable into different sums of squares
corresponding to different explanatory variables, a subdivision which can then be
used to test whether the apparent influences of the explanatory variables on the
dependent variable are statistically significant.
ANOVA can be performed by computing different sums of squares, degrees of
freedom, and various quotients, and listing these in an ANOVA table. For example,
in a one-way ANOVA analysis, the total sum of squares SST, the within-groups sum
of squares SSW, and the between-groups sum of squares SSG are computed, with
SST=SSW+SSG, and the sizes of SSW and SSG are compared in order to test
whether there is a statistically significant difference in the mean of the dependent
variable between the different groups.
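As a sketch, the one-way ANOVA decomposition can be computed directly from raw group data; the three small arrays below are invented, and scipy's built-in test is used as a cross-check:

```python
import numpy as np
from scipy.stats import f, f_oneway

# Invented example data for three groups.
groups = [np.array([5.1, 4.8, 5.5, 5.0]),
          np.array([6.2, 5.9, 6.4, 6.1]),
          np.array([5.6, 5.4, 5.9, 5.7])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

SSG = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
SSW = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
SST = ((all_values - grand_mean) ** 2).sum()                      # total; SST = SSG + SSW

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
F_ratio = (SSG / df_between) / (SSW / df_within)
p_value = f.sf(F_ratio, df_between, df_within)

print(F_ratio, p_value)
print(f_oneway(*groups))   # should agree with the manual computation
```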
8. Assume you have measured two variables, x and y, for n objects, so that you have the
values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. To compute the Spearman rank correlation, all x
values are replaced by their rank in the list of ordered x values, and similarly for the y
values. Then the correlation is computed using the same standard formulas as for
Pearson correlation, i.e., computing the covariance and dividing by the product of the
standard deviations. In practice this means that whereas ordinary correlation can be
strongly influenced by outliers and special features of the marginal distributions,
Spearman rank correlation is less influenced.
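A small sketch contrasting Pearson and Spearman correlation on invented data containing a single outlier:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented data: a roughly increasing relationship plus one extreme outlier in y.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2, 3, 5, 4, 6, 7, 8, 9, 10, 100], dtype=float)

print(pearsonr(x, y))    # strongly pulled by the outlier
print(spearmanr(x, y))   # computed on ranks, so the outlier has little extra influence
```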
9. A normal probability plot is used to visualize the extent to which a set of values seem
to be a sample from a normal distribution. If n values are given, then these are
ordered and plotted against the corresponding n-quantiles of the normal distribution.
The closer the resulting points are to lying on a straight line, the closer the values are
to coming from a normal distribution.
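A normal probability plot can be drawn, for example, with scipy and matplotlib; the sample below is invented:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(0)
sample = rng.normal(loc=3, scale=1, size=50)   # invented sample of 50 values

# Ordered sample values are plotted against normal quantiles; points close to the
# fitted straight line suggest the data are compatible with a normal distribution.
probplot(sample, dist="norm", plot=plt)
plt.show()
```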
Part B
1. Comparing quality assessments for machines analyzing blood samples:
a. The plot is called a boxplot. For each type of machine, the plot shows the
following: A box, starting at the first quartile and going to the third quartile of
the data, so that it covers 50% of the data. The line in the middle of the box
indicates the median of the data. The lines on each side of the box stretch
down to the lowest observed value and up to the highest observed value.
In this case, the plot indicates that values are generally higher for machine Y
than for machine X, and generally higher for machine Z than for machine Y.
For example, we can see that about 75% of the values for machine Z were
above 60, while about 75% of the values for machine X were below 60.
Although the boxplots are not perfectly symmetrical around the median, the
asymmetry is not large enough to conclude that the values cannot be from
normal distributions.
b. We first compute the variances from the given information, by computing
$s_x^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x}(x_i - \bar{x})^2 = 702.1/18 \approx 39.006$, and similarly
$s_y^2 = 645.1/18 \approx 35.839$ and $s_z^2 = 479.3/18 \approx 26.628$. Clearly, machine X has
the largest and machine Z has the smallest variance, so these are the machines
we will compare. Our null hypothesis H0 is that there is no difference in the
population variances for these machines, while the alternative hypothesis H1 is
that there is a difference. Assuming normal distributions for the values, we
can compare the variances using an F test (see page 352 of Newbold):
$F = \frac{s_x^2}{s_z^2} = \frac{702.1/18}{479.3/18} \approx 1.46$. This number should be compared with a number
from an $F_{18,18}$ distribution. From the tables of Newbold, we find that
$F_{20,18,0.05} = 2.19$ and $F_{15,18,0.05} = 2.27$. As 1.46 is smaller than both these
numbers, we get that we cannot reject the null hypothesis that the variances
for the quality results for the two machines are equal, even at the 10%
significance level.
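A sketch of the same F test done numerically, using only the summary numbers quoted above (reporting a two-sided p-value by doubling the upper tail is an assumption about the convention used):

```python
from scipy.stats import f

n_x = n_z = 19
s2_x = 702.1 / 18     # largest sample variance (machine X)
s2_z = 479.3 / 18     # smallest sample variance (machine Z)

F = s2_x / s2_z                          # ~1.46
p_one_sided = f.sf(F, n_x - 1, n_z - 1)  # upper tail of the F(18, 18) distribution
print(F, p_one_sided, 2 * p_one_sided)
```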
c. Christoffer would here use an ANOVA test, as he is assuming that the data in
each group is normally distributed, and as he can now assume that the
population variances are the same for all the machines. The null hypothesis
would be that the three sets of data come from the same distribution, while the
alternative hypothesis would be that they come from different distributions,
with different means. First, note that the overall mean for all the machines is
$m = (\bar{x} + \bar{y} + \bar{z})/3 = (56.8 + 60.4 + 62.9)/3 \approx 60.033$. We then need to compute
SSG, the sum of squares between groups, and SSW, the sum of squares within
groups. We get
$SSG = 19(\bar{x} - m)^2 + 19(\bar{y} - m)^2 + 19(\bar{z} - m)^2 = 19(56.8 - 60.033)^2 + 19(60.4 - 60.033)^2 + 19(62.9 - 60.033)^2 \approx 357.33$.
We also get that $SSW = 702.1 + 645.1 + 479.3 = 1826.5$.
We set up an ANOVA table:

Source of variation    Sum of squares    Degrees of freedom    Mean squares    F ratio
Between groups         357.33            2                     178.665         5.28
Within groups          1826.5            54                    33.824
Total                  2183.83           56

The computed value 5.28 should be compared to an F distribution with 2 and
54 degrees of freedom. We get from the table in Newbold that $F_{2,40,0.01} = 5.18$
and $F_{2,60,0.01} = 4.98$, so we reject the null hypothesis at the 1%
significance level.
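The F ratio and an exact p-value can also be reproduced from these summary numbers alone; a minimal sketch:

```python
from scipy.stats import f

SSG, SSW = 357.33, 1826.5
df_between, df_within = 2, 54

F_ratio = (SSG / df_between) / (SSW / df_within)   # ~5.28
p_value = f.sf(F_ratio, df_between, df_within)     # exact p-value instead of table bounds
print(F_ratio, p_value)
```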
d. We can use the Kruskal-Wallis test: The null hypothesis and the alternative
hypothesis would be as above. We have the information that R1=393,
R2=565, and R3=695. This gives the statistic
$W = \frac{12}{n(n+1)}\sum_{i=1}^{3}\frac{R_i^2}{n_i} - 3(n+1) = \frac{12}{57 \cdot 58}\left(\frac{393^2}{19} + \frac{565^2}{19} + \frac{695^2}{19}\right) - 3 \cdot 58 \approx 8.768$,
which should be compared to a chi-squared distribution with 2 degrees of
freedom. We have that $\chi^2_{2,0.025} = 7.38$, so that we can reject the null hypothesis
at the 2.5% significance level.
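A sketch of the same computation from the given rank sums, with the p-value taken from the chi-squared distribution:

```python
from scipy.stats import chi2

rank_sums = [393, 565, 695]
group_sizes = [19, 19, 19]
n = sum(group_sizes)   # 57

W = 12 / (n * (n + 1)) * sum(R**2 / m for R, m in zip(rank_sums, group_sizes)) - 3 * (n + 1)
p_value = chi2.sf(W, df=len(rank_sums) - 1)   # upper tail of chi-squared with 2 df
print(W, p_value)                             # W ~ 8.77, p < 0.025
```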
e. The advantage with Kari’s setup would be that the same samples would be
used for all three machine types. Some of the difference between the results of
each quality test could be due to some samples being “more difficult” to
analyze than others. In Kari’s setup, this difference would not influence the
difference between the results for the different brands. The data could be
analyzed using two-way ANOVA.
f. Based on the analysis so far, we will assume that the quality values for
machines X and Y are normally distributed and have the same population
variance. This population variance can be estimated as
$s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2} = \frac{702.1 + 645.1}{36} \approx 37.422$. A confidence interval is
then given as
$\bar{y} - \bar{x} \pm t_{n_X + n_Y - 2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_X} + \frac{s_p^2}{n_Y}} = 60.4 - 56.8 \pm t_{36,\,0.025}\sqrt{37.422\left(\frac{1}{19} + \frac{1}{19}\right)} = 3.6 \pm 2.042 \cdot 1.9847$,
which gives the interval [-0.45, 7.65].
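A sketch of the same confidence interval computed from the summary statistics; here the exact t quantile for 36 degrees of freedom is used, which gives a slightly narrower interval than the table value 2.042:

```python
import numpy as np
from scipy.stats import t

mean_x, mean_y = 56.8, 60.4
ss_x, ss_y = 702.1, 645.1              # sums of squared deviations about the means
n_x = n_y = 19

s2_pooled = (ss_x + ss_y) / (n_x + n_y - 2)       # ~37.42
se = np.sqrt(s2_pooled * (1 / n_x + 1 / n_y))     # ~1.985
t_crit = t.ppf(0.975, df=n_x + n_y - 2)           # ~2.028
diff = mean_y - mean_x
print(diff - t_crit * se, diff + t_crit * se)     # roughly (-0.43, 7.63)
```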
2. Job-related stress among nurses:
a. In the third output, critical is the dependent variable and stress and patients
are the independent variables, and this is not a useful setup when we want to
investigate the causes of stress. The other regressions are however useful (and
the scatterplot will be indirectly useful, as we will see below).
b. We see from the “Coefficients” table that the estimated increase in the nurse’s
stress level is 1.005 per extra patient. The p-value given is 0.000, so the
estimate is clearly significant (i.e., the hypothesis that the estimated parameter
is zero can be clearly rejected). The correlation between patients and stress is
0.824.
c. The stress level of nurses is estimated to increase by 1.162 for each extra
patient per nurse. The stress level is estimated to increase by 0.094 for each
extra critical case. None of these estimates are significant.
d. The main reason for the high p-values in c is collinearity between patients and
critical. This collinearity can be seen in the scatterplot provided. The
scatterplot together with the two simple regressions where stress is the
dependent variable tell us that the stress level is high when patients is high
and critical is low, whereas the stress level is low when patients is low and
critical is high. However, it is then difficult to determine from the numbers
which of the independent variables lead to the increase in stress. This is
reflected in the high p-values.
e. The value 1.701 in the column named “t” is compared with a Student’s t
distribution with 10 degrees of freedom. The number 10 is obtained as the
number of observations (13) minus the number of independent variables (2)
minus 1. Specifically, the p-value is the number $\alpha$ such that $t_{10,\,\alpha/2} = 1.701$. The
table available in Newbold provides the numbers $t_{10,\,0.1} = 1.372$ and
$t_{10,\,0.05} = 1.812$. Thus we see that the p-value must be between $2 \cdot 0.05 = 0.1$
and $2 \cdot 0.1 = 0.2$.
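The exact two-sided p-value implied by t = 1.701 with 10 degrees of freedom can be checked numerically; a small sketch:

```python
from scipy.stats import t

t_value, df = 1.701, 10
p_two_sided = 2 * t.sf(t_value, df)   # two-sided p-value
print(p_two_sided)                    # ~0.12, i.e. between 0.1 and 0.2
```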
f. The difference between the two analyses is that one includes the variable
patients, and the other does not. It is the inclusion of this variable that changes
the results for the critical variable. Specifically, when both variables are
included, the analysis indicates that both increase stress, which seems
reasonable. However, when patients is excluded, the dominant effect is that a
higher critical value is correlated to a lower patient value, which leads to a
lower stress value.
g. Ingrid commits the fallacy of confusing correlation with causality. In our data,
there is a positive correlation between experiencing many critical situations
and low stress levels. However, this is because experiencing many critical
situations is correlated with having few patients, and this is again correlated
with low stress levels. In terms of causes and effects, it seems much more
reasonable that having many patients causes stress than that experiencing few
critical situations causes stress.
h. Our data illustrate the point that other variables than critical that might
influence stress should either be held constant over the different observations,
or they should be included in the analysis. If they are included, one should
make sure to avoid collinearity. As it is difficult to avoid collinearity between
patients and critical (more critical situations usually go together with fewer
patients per nurse), it might be best to restrict the study to nurses who all
work at places with (roughly) the same number of patients per nurse.
Also, the sample size should be increased from 13.
i. Data in such contingency tables can generally be analyzed with chi-square
tests, which would test whether there is an association between high stress and
many patients. However, here the accuracy of the results would not be good,
as there are so few observations, and a Fisher exact test could be better. But if
it was performed, expected values would be compared to observed values in
each of the cells of the table, and the resulting statistic compared with a
chi-square distribution. The null hypothesis would be that the probability of high
stress is the same for nurses with many patients as for those with few. A better
analysis could be performed by using a t-test to compare the actual stress
values for the nurses in the many-patients group with the values for the nurses in
the other group.
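A sketch of both analyses on a hypothetical 2x2 table; the counts are invented (they only add up to the 13 nurses), and with such small expected counts the exact test is the more trustworthy of the two:

```python
from scipy.stats import chi2_contingency, fisher_exact

# Invented 2x2 table: rows = many/few patients, columns = high/low stress.
table = [[5, 2],
         [1, 5]]

chi2_stat, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(expected)           # expected counts under the null hypothesis of no association
print(p_chi2, p_fisher)   # with such small counts, the exact test is preferable
```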