Exam in HMM4101

Institute of Health Management and Health Economics
Faculty of Medicine
University of Oslo
English
Written exam Monday 11 December 09.00-13.00
HMM4101 - Research Methods and Statistics
Results will be available three weeks after the deadline, see the list on the
board outside the Institute of Health Management and Health Economics,
Forskningsveien 3A. The results will also be posted on Studentweb.
The receiving day of the results is the day the results are posted on the board
outside the Institute. Appeals must be submitted within three weeks of this
date.
The written exam consists of 5 pages including this page.
Put all your answers on the exam paper provided, not on this question sheet.
Open book exam: Notes and textbook(s) are allowed, as are a calculator and a dictionary.
Remember to write down your candidate number so that you have it when the
results are made available.
Maximum number of points on this exam is 60. The number of points for each question is
given in parentheses below. Textbook, handouts, written notes and a calculator are
allowed. All exercises are based on hypothetical data. This exam is slightly shorter than
the trial exam! Suggested solutions will be posted on the website after the exam.
Exercise 1. A small community hospital wants you to do a patient satisfaction survey.
Patient satisfaction is measured on a continuous scale from 0 (not satisfied) to 100 (very
satisfied). Apart from the overall level of satisfaction, they are interested in seeing if there
are any gender differences, age differences and differences between the three departments
at the hospital. Gender is coded as 0 for males and 1 for females. Age is coded in years,
and the departments are originally coded as Department1=0, Department2=1 and
Department3=2.
a)(6p) You decide to collect the information by mailing a questionnaire to patients. In
general, what does it mean that a questionnaire is reliable and valid? Do you see any
problems with mailing a questionnaire to patients?
60 patients respond to your questionnaire. You start by looking at the gender differences.
You want to do a standard, two-sample t-test to see if the satisfaction scores are different
for males and females. You have 33 males, with a mean score of 75.75, and 27 females,
with a mean score of 76.77. The pooled variance $s_p^2$ is 148.67.
b)(6p) What’s the null hypothesis and the alternative hypothesis in this case? Calculate
the test-statistic. What is the conclusion of the test? (You will not find the exact number
of degrees of freedom in the table in Newbold, just use the approximate number)
c)(4p) Calculate the 95% confidence interval for the difference in satisfaction scores.
How can you see from this interval whether you should reject or accept the null
hypothesis in b)?
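The arithmetic behind b) and c) can be sketched as follows. This is a minimal illustration using only the summary figures stated above; the critical value 2.000 is an assumption (the two-sided 5% value read from a t-table at 60 degrees of freedom, as an approximation to the exact 58), and the variable names are chosen for this sketch only.

```python
import math

# Summary statistics given in the exercise text.
n_m, mean_m = 33, 75.75   # males
n_f, mean_f = 27, 76.77   # females
sp2 = 148.67              # pooled variance

# Assumption: the t-table is read at 60 degrees of freedom instead of the
# exact n_m + n_f - 2 = 58, giving a two-sided 5% critical value of 2.000.
t_crit = 2.000

diff = mean_f - mean_m                        # difference in sample means
se = math.sqrt(sp2 * (1 / n_m + 1 / n_f))     # standard error of the difference
t_stat = diff / se                            # pooled two-sample t statistic
ci = (diff - t_crit * se, diff + t_crit * se) # 95% confidence interval

print(f"t = {t_stat:.3f} on {n_m + n_f - 2} degrees of freedom")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Whether the interval contains 0 corresponds to whether the two-sided test at the 5% level rejects the null hypothesis of no difference.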
Below is some simple descriptive output from SPSS regarding the satisfaction scores for
males and females:
Descriptive Statistics(a)
                      N     Minimum   Maximum   Mean     Std. Deviation
Score                 33    56        100       75,75    14,095
Valid N (listwise)    33
a Gender = Males

Descriptive Statistics(a)
                      N     Minimum   Maximum   Mean     Std. Deviation
Score                 27    55        91        76,77    9,335
Valid N (listwise)    27
a Gender = Females
d)(4p) What are the assumptions of the two-sample t-test in b)? Do you see any problems
with the analysis in b), just based on the SPSS-output above?
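One way to look at the SPSS output with d) in mind is to compare the two sample variances and to check that they reproduce the pooled variance used in b). The sketch below does only that arithmetic; it is not a formal test of equal variances, and the variable names are illustrative.

```python
# Standard deviations and group sizes from the SPSS output above.
n_m, sd_m = 33, 14.095   # males
n_f, sd_f = 27, 9.335    # females

# Ratio of the larger sample variance to the smaller one.
var_ratio = (sd_m / sd_f) ** 2

# Pooled variance: weighted average of the two sample variances.
pooled = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)

print(f"variance ratio (males / females) = {var_ratio:.2f}")
print(f"pooled variance = {pooled:.2f}")
```

The pooled value agrees with the 148.67 stated in the exercise, while the ratio shows how far apart the two group variances are.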
Next, you want to do a regression analysis on satisfaction scores vs gender, age and
department. Below is the output from the univariate analysis of department.
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       ,690(a)   ,476       ,458                8,909
a Predictors: (Constant), Department2, Department3
b Dependent Variable: Score

ANOVA(b)
Model            Sum of Squares   Df   Mean Square   F        Sig.
1   Regression   4114,478         2    2057,239      25,920   ,000(a)
    Residual     4524,090         57   79,370
    Total        8638,568         59
a Predictors: (Constant), Department2, Department3
b Dependent Variable: Score

Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t        Sig.
1   (Constant)      81,459     2,161                                          37,699   ,000
    Department2     ,289       2,801              ,012                        ,103     ,918
    Department3     -17,897    3,013              -,684                       -5,940   ,000
a Dependent Variable: Score
e)(2p) The Sig-number in the ANOVA table above is the p-value of a test. A test of what,
exactly?
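As a reminder of how the entries of the ANOVA table relate to each other, the following sketch recomputes the F statistic and R Square from the sums of squares and degrees of freedom reported above. Only the table values are used; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_reg, df_reg = 4114.478, 2
ss_res, df_res = 4524.090, 57

ms_reg = ss_reg / df_reg            # mean square, regression
ms_res = ss_res / df_res            # mean square, residual
f_stat = ms_reg / ms_res            # F statistic on (2, 57) degrees of freedom
r_square = ss_reg / (ss_reg + ss_res)

print(f"F = {f_stat:.3f}, R Square = {r_square:.3f}")
```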
f)(5p) The Sig-values in the Coefficients table above are the p-values of three other tests.
Explain the meaning of these tests, and how many degrees of freedom they have. What
are the conclusions of the tests regarding the department effect?
g)(5p) What’s the predicted satisfaction score of a patient from Department 1? What’s the
predicted difference in satisfaction scores between Department 2 and 3? Would you keep
the department variable as is, or would you do something else?
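With dummy coding, a predicted score is the constant plus the coefficient of the patient's department, with Department 1 as the reference category (this reading follows from the predictors listed under the tables). A minimal sketch of the arithmetic in g), with illustrative names:

```python
# Coefficients (B) from the univariate analysis of department.
constant = 81.459
b_dept2 = 0.289
b_dept3 = -17.897

# Department 1 is the reference category, so both dummies are 0 there.
pred_dept1 = constant
pred_dept2 = constant + b_dept2
pred_dept3 = constant + b_dept3

# The constant cancels in the difference between Department 2 and 3.
diff_2_vs_3 = pred_dept2 - pred_dept3

print(f"Department 1: {pred_dept1:.3f}")
print(f"Department 2: {pred_dept2:.3f}")
print(f"Department 3: {pred_dept3:.3f}")
print(f"Department 2 minus Department 3: {diff_2_vs_3:.3f}")
```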
Below is the output from the univariate analysis of age.
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t        Sig.
1   (Constant)      85,981     4,417                                          19,467   ,000
    Age             -,188      ,080               -,295                       -2,353   ,022
a Dependent Variable: Score
h)(4p) Calculate the 95% confidence interval for the regression coefficient (B) for age
(Again, you have to use approximate degrees of freedom). Generally, what information
does a 95% confidence interval for a regression coefficient give you (or, what’s the
meaning of a 95% confidence interval)?
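The interval in h) has the form B plus or minus a critical value times the standard error. The sketch below assumes a critical value of 2.000 (the two-sided 5% value at 60 degrees of freedom, standing in for the exact 58); that value and the variable names are assumptions of this illustration.

```python
# Coefficient and standard error for age from the univariate analysis.
b_age = -0.188
se_age = 0.080

# Assumption: t-table read at 60 degrees of freedom (exact value would be 58).
t_crit = 2.000

ci_age = (b_age - t_crit * se_age, b_age + t_crit * se_age)

print(f"95% CI for B(age): ({ci_age[0]:.3f}, {ci_age[1]:.3f})")
```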
Gender turns out to be non-significant in all of the regression analyses. Below is the
output from the multivariate model with both age and department.
Coefficients(a)
                    Unstandardized Coefficients   Standardized Coefficients
Model               B          Std. Error         Beta                        t        Sig.
1   (Constant)      76,120     4,075                                          18,679   ,000
    Department2     ,575       2,774              ,024                        ,207     ,836
    Department3     -20,566    3,446              -,785                       -5,968   ,000
    Age             ,116       ,075               ,182                        1,538    ,130
a Dependent Variable: Score
i)(6p) Is age a confounder of department, or vice versa? By combining information from
the different tables, explain exactly why age can be significant in the univariate analysis,
but not significant after adjusting for department. (General talk about age and department
being correlated and loose theories not related to the information given in the exercise
will not give any points)
The hospital is of course interested in knowing why Department3 scores lower on patient
satisfaction. After further research, you find out that they use an anesthetic
(smertestillende) drug that does not seem to work very well.
You want to see how two other anesthetics compare to the one used at the hospital. You
set up a randomized, clinical trial, where 90 patients are divided into three independent
groups of 30, and an anesthetic is assigned to each group. The effect of the anesthetic is
measured on a continuous scale from 0mm to 100mm, where 100mm means a lot of pain
(this is called a VAS-scale, where pain is measured in mm=millimeters, and it does
actually exist!). You want to keep the continuous coding of VAS in the analysis, but the
administration feels that a coding in mm is unnecessarily detailed, and wants to divide it
into four categories (e.g. 0mm-25mm=I feel great!, 26mm-50mm=light pain, 51mm-75mm=heavier pain, 76mm-100mm=I have a lot of pain, is this anesthetic placebo?).
j)(6p) If you use VAS as a continuous variable, which possible tests could you use to find
differences between the anesthetics? If you use VAS as a categorical variable, which test
could you use? Discuss the advantages/disadvantages of the two coding options,
considering the assumptions and power of the different tests.
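To make the categorical coding concrete, the four categories suggested by the administration can be written as a small function. This is only a sketch: it assumes VAS is recorded in whole millimeters, and the function name and shortened labels are hypothetical.

```python
def vas_category(vas_mm: int) -> str:
    """Map a VAS score in whole millimeters (0-100) to one of four categories."""
    if not 0 <= vas_mm <= 100:
        raise ValueError("VAS score must be between 0 and 100 mm")
    if vas_mm <= 25:
        return "I feel great!"
    if vas_mm <= 50:
        return "light pain"
    if vas_mm <= 75:
        return "heavier pain"
    return "a lot of pain"


# Scores that differ by 24 mm can end up in the same category,
# while scores that differ by 1 mm can end up in different ones.
for score in (26, 50, 51):
    print(score, "->", vas_category(score))
```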
Exercise 2. You are working at a pharmaceutical company, and want to find out if a new
medicine is better than the competitor’s medicine. You have two groups with 30
randomly sampled patients in each group. In the group using Medicine A (the new
medicine), 70% are cured. In the group using Medicine B (the competitor’s medicine),
50% are cured.
a)(4p) Formulate a null hypothesis and an alternative hypothesis, and do a test on whether
the new medicine is better than the competing medicine. Conclusion?
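One common choice for a) is a one-sided z-test for the difference between two proportions, with the proportions pooled under the null hypothesis. The sketch below assumes that approach (other tests are possible) and uses only the figures given above; the variable names are illustrative.

```python
import math

# Group sizes and observed cure proportions from the exercise text.
n_a, p_a = 30, 0.70   # Medicine A (new)
n_b, p_b = 30, 0.50   # Medicine B (competitor)

# Pooled proportion under the null hypothesis of equal cure rates.
p_pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

# One-sided p-value: upper tail of the standard normal distribution.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

The z statistic can be compared with the one-sided 5% critical value of 1.645 from the standard normal table.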
b)(4p) What probability distribution can the number of people getting cured in each
group be assumed to follow? Explain why. What fundamental statistical theorem is
applied in order to get the test statistic you use in a)?
You also want to check for serious side effects (bivirkninger) of the new medicine. Let’s
assume that there is a probability of 1/5000 for a serious side effect.
c)(4p) If the drug is used by 2000 patients, what is the probability that none of them are
affected by the serious side effect? What assumptions do you make in this calculation?
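The calculation in c) can be sketched as follows, assuming the 2000 patients are affected independently of each other and with the same probability (these are the assumptions the question asks about, stated here so the arithmetic is well defined). A Poisson approximation is shown next to the binomial result for comparison.

```python
import math

n_patients = 2000
p_side_effect = 1 / 5000

# Binomial: probability that all 2000 independent patients are unaffected.
p_none_binomial = (1 - p_side_effect) ** n_patients

# Poisson approximation with expected count n * p.
expected_cases = n_patients * p_side_effect
p_none_poisson = math.exp(-expected_cases)

print(f"binomial: {p_none_binomial:.4f}")
print(f"Poisson approximation: {p_none_poisson:.4f}")
```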