Advanced Biostatistical Methods
Prof. DM Steinberg
Homework Number 1
1. Zivin and Waud (Stroke, 1992) describe an experiment to study the drug MK-801. The drug was designed to limit the negative impact of stroke. The
experimenters caused ischemic events in experimental animals and then
assessed the animals for neurological damage. The "dose" in this experiment
is the length (in minutes) of the ischemic event. Most of the animals were in a
control group that did not receive MK-801. If the drug is effective, the
experimental group that received MK-801 should have lower probability of
neurological damage.
The following table gives the results of the experiment.
Duration of        Control Group         MK-801 Group
Ischemia (min)   No Damage  Damage    No Damage  Damage
20                   3         0          1         0
25                   8         5          1         0
30                   7         8          1         0
35                   4         8          2         0
40                   0         3          1         0
45                   0         1          1         1
50                   0         1          2         0
55                   0         0          0         1
60                   0         4          0         1
1.1 Analyze each group separately by logistic regression. For each group, state a
95% confidence interval for the slope, estimate the LD50 (no need for a confidence
interval) and estimate the probability of neurological damage if the ischemia lasts 40
minutes (again no CI).
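One way to approach 1.1 without a statistics package is sketched below. It is a minimal sketch, not the prescribed method for the course: it fits logit(p) = b0 + b1 * duration to grouped counts by Newton-Raphson, forms a Wald 95% interval for the slope from the inverse information matrix, and takes LD50 = -b0/b1. All function and variable names are illustrative, and the counts assume that each row of the table gives (no damage, damage) for one duration.

```python
import math

def logistic(eta):
    """Logistic function written with tanh so that it cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * eta))

def loglik(b0, b1, x, y, n):
    """Binomial log-likelihood (up to a constant) for logit(p) = b0 + b1*x."""
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        eta = b0 + b1 * xi
        total += yi * eta - ni * (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def score_and_info(b0, b1, x, y, n):
    """Score vector and the three distinct entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = logistic(b0 + b1 * xi)
        w = ni * p * (1.0 - p)
        g0 += yi - ni * p
        g1 += (yi - ni * p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_dose_response(x, y, n, max_iter=100, tol=1e-10):
    """Newton-Raphson with step halving; y = damaged animals, n = animals tested."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_info(b0, b1, x, y, n)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step = inverse information * score
        d1 = (h00 * g1 - h01 * g0) / det
        step, current = 1.0, loglik(b0, b1, x, y, n)
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1, x, y, n) < current - 1e-12:
            step /= 2.0                    # back off if the likelihood would decrease
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    _, (h00, h01, h11) = score_and_info(b0, b1, x, y, n)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # slope entry of inverse information
    return {"b0": b0, "b1": b1,
            "ci_b1": (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1),
            "ld50": -b0 / b1}

durations = [20, 25, 30, 35, 40, 45, 50, 55, 60]
control_no_damage = [3, 8, 7, 4, 0, 0, 0, 0, 0]
control_damage = [0, 5, 8, 8, 3, 1, 1, 0, 4]
mk801_no_damage = [1, 1, 1, 2, 1, 1, 2, 0, 0]
mk801_damage = [0, 0, 0, 0, 0, 1, 0, 1, 1]

control_n = [a + b for a, b in zip(control_no_damage, control_damage)]
mk801_n = [a + b for a, b in zip(mk801_no_damage, mk801_damage)]

control_fit = fit_dose_response(durations, control_damage, control_n)
mk801_fit = fit_dose_response(durations, mk801_damage, mk801_n)

for name, fit in (("Control", control_fit), ("MK-801", mk801_fit)):
    p40 = logistic(fit["b0"] + fit["b1"] * 40)
    print(name, "slope:", fit["b1"], "95% CI:", fit["ci_b1"],
          "LD50:", fit["ld50"], "P(damage | 40 min):", p40)
```

The MK-801 group has only 12 animals, so the Wald interval for its slope should be read with caution; a profile-likelihood interval from standard software may differ noticeably.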
1.2 Analyze the data from both groups together. Is there convincing evidence that
MK-801 reduces the risk of neurological damage from ischemia? Fit appropriate
logistic regression models and summarize the results, focusing on the above question.
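For 1.2, one natural comparison is between a model with duration only and a model with duration plus a treatment indicator. The sketch below is one possible approach under stated assumptions, not the required analysis: it uses a small hand-written Newton-Raphson fitter (the names `solve`, `fit` and `rows` are illustrative), a common slope for both groups, and a likelihood ratio test on one degree of freedom. The counts again assume that each table row gives (no damage, damage).

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def loglik(beta, rows):
    """Binomial log-likelihood up to a constant; rows are (covariates, damaged, total)."""
    total = 0.0
    for xs, y, n in rows:
        eta = sum(b * x for b, x in zip(beta, xs))
        total += y * eta - n * (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def fit(rows, max_iter=100, tol=1e-10):
    """Newton-Raphson with step halving for a grouped-data logistic regression."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xs, y, n in rows:
            eta = sum(b * x for b, x in zip(beta, xs))
            p = 0.5 * (1.0 + math.tanh(0.5 * eta))  # logistic function, overflow-safe
            for i in range(k):
                score[i] += (y - n * p) * xs[i]
                for j in range(k):
                    info[i][j] += n * p * (1.0 - p) * xs[i] * xs[j]
        delta = solve(info, score)
        step, current = 1.0, loglik(beta, rows)
        while step > 1e-8 and loglik([b + step * d for b, d in zip(beta, delta)], rows) < current - 1e-12:
            step /= 2.0
        beta = [b + step * d for b, d in zip(beta, delta)]
        if max(abs(step * d) for d in delta) < tol:
            break
    return beta, loglik(beta, rows)

durations = [20, 25, 30, 35, 40, 45, 50, 55, 60]
control = [(3, 0), (8, 5), (7, 8), (4, 8), (0, 3), (0, 1), (0, 1), (0, 0), (0, 4)]
mk801 = [(1, 0), (1, 0), (1, 0), (2, 0), (1, 0), (1, 1), (2, 0), (0, 1), (0, 1)]

# Covariates: intercept, duration, treatment indicator (1 = MK-801).
full_rows, null_rows = [], []
for treated, group in ((0, control), (1, mk801)):
    for dur, (no_damage, damage) in zip(durations, group):
        full_rows.append(([1.0, dur, treated], damage, no_damage + damage))
        null_rows.append(([1.0, dur], damage, no_damage + damage))

beta_full, ll_full = fit(full_rows)
beta_null, ll_null = fit(null_rows)

lr_stat = max(2.0 * (ll_full - ll_null), 0.0)
p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # upper tail of chi-square with 1 df

print("treatment log odds ratio:", beta_full[2])
print("treatment odds ratio:", math.exp(beta_full[2]))
print("LR statistic:", lr_stat, "p-value:", p_value)
```

A negative treatment coefficient corresponds to lower odds of damage at a given duration. Whether a common slope is adequate could be checked in the same way by adding a duration-by-treatment column and comparing log-likelihoods.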
2. The relationship between hypertension and coronary artery disease (CAD) was
examined. Subjects with, and without, CAD were tested for hypertension. The
subjects were also classified by age into two groups, 35-49 and 65 +. The data for the
two age groups were as follows:
For subjects aged 35-49:
                      CAD
Hypertension     Yes      No    Total
Yes              552     212      764
No               941     495     1436
Total           1493     707     2200
For subjects aged 65 +:
                      CAD
Hypertension     Yes      No    Total
Yes             1102      87     1189
No              1018     106     1124
Total           2120     193     2313
2.1 In both age groups, compute the odds ratio for suffering from CAD, for
individuals with hypertension as opposed to individuals without hypertension.
Compute a 95% confidence interval for each group-specific odds ratio.
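The calculation in 2.1 can be sketched as follows. This is a minimal illustration that assumes the usual large-sample (Woolf) interval, exp(log OR ± 1.96 * SE) with SE = sqrt(1/a + 1/b + 1/c + 1/d); the function name `odds_ratio_ci` is illustrative, and other interval methods (for example exact ones) would give slightly different limits.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: hypertensive with CAD      b: non-hypertensive with CAD
    c: hypertensive without CAD   d: non-hypertensive without CAD
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

young = odds_ratio_ci(552, 941, 212, 495)   # ages 35-49
old = odds_ratio_ci(1102, 1018, 87, 106)    # ages 65 +

for label, (or_, lo, hi) in (("35-49", young), ("65 +", old)):
    print(f"{label}: OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```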
2.2 Analyze the data by logistic regression. Fit appropriate models. (Note: as in
class, you can often fit these different models as separate "blocks" in the analysis.)
For each model that you fit, state first why that model is of interest (i.e. what
substantive question you can answer by fitting the model), then present the results
and finally summarize the important points in the output in a few words. For each
model, compute a 95% confidence interval for the odds ratio relating CAD to
hypertension.
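One of the models in 2.2 can be checked by hand. In the saturated model, logit P(CAD) = b0 + b1*hypertension + b2*older + b3*hypertension*older, the fitted values reproduce the observed tables, so b1 is the log odds ratio in the 35-49 group and b3 is the difference between the two group-specific log odds ratios. The sketch below computes b3 and its Wald test directly from the counts; the variable names are illustrative, and the models without interaction still need an iterative fit.

```python
import math

# (hypertensive with CAD, non-hypertensive with CAD,
#  hypertensive without CAD, non-hypertensive without CAD)
tables = {"35-49": (552, 941, 212, 495), "65 +": (1102, 1018, 87, 106)}

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its large-sample variance for one 2x2 table."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

log_or_young, var_young = log_odds_ratio(*tables["35-49"])
log_or_old, var_old = log_odds_ratio(*tables["65 +"])

# Interaction coefficient of the saturated model and its Wald test.
interaction = log_or_old - log_or_young
se_interaction = math.sqrt(var_young + var_old)
z = interaction / se_interaction
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print("interaction (difference in log OR):", interaction)
print("standard error:", se_interaction)
print("z:", z, "p-value:", p_value)
```

A small |z| here would suggest that a single odds ratio, adjusted for age group, is a reasonable summary of the association between hypertension and CAD.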
3. A study in a mixed rural and urban district near Newcastle, UK,
examined the relationship between smoking and 20-year survival. The
original survey, conducted in 1972-1974, was a random sample of 1/6
of the residents. Twenty years later a follow-up survey was conducted
to examine survival. The data here are a subset from the survey and
include all women who were classified as current smokers or as never
having smoked at the time of the original survey. Subjects are
classified on the basis of smoking status and whether or not they were
still alive at the time of the follow-up survey. In addition, the data are
broken down by the age of the subject at the time of the original
survey. The data are listed below.
Age           Non-smokers         Smokers
Group        Alive    Dead     Alive    Dead
18 - 24        61       1        53       2
25 - 34       152       5       121       3
35 - 44       114       7        95      14
45 - 54        66      12       103      27
55 - 64        81      40        64      51
65 - 74        28     101         7      29
75 +            0      64         0      13
3.1 Analyze the data by logistic regression, treating age as a categorical variable.
Summarize the relationship between smoking and 20-year survival.
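Before fitting the model in 3.1, it can help to compare the crude odds ratio for death (smokers vs. non-smokers, ignoring age) with the age-specific odds ratios, since the logistic model with categorical age is estimating an age-adjusted version of the latter. The sketch below is only an exploratory check with illustrative names, and it assumes that each table row gives (alive, dead) for one age group. Odds ratios with a zero cell are reported as undefined rather than patched with a continuity correction.

```python
age_groups = ["18-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]

# (alive, dead) at the 20-year follow-up, by age group at the original survey
nonsmokers = [(61, 1), (152, 5), (114, 7), (66, 12), (81, 40), (28, 101), (0, 64)]
smokers = [(53, 2), (121, 3), (95, 14), (103, 27), (64, 51), (7, 29), (0, 13)]

def odds_ratio_death(smoker, nonsmoker):
    """Odds ratio of death, smokers vs. non-smokers; None if any cell is zero."""
    s_alive, s_dead = smoker
    n_alive, n_dead = nonsmoker
    if 0 in (s_alive, s_dead, n_alive, n_dead):
        return None
    return (s_dead * n_alive) / (s_alive * n_dead)

def column_totals(group):
    """Total (alive, dead) over all age groups."""
    return sum(a for a, _ in group), sum(d for _, d in group)

crude = odds_ratio_death(column_totals(smokers), column_totals(nonsmokers))
by_age = {g: odds_ratio_death(s, n)
          for g, s, n in zip(age_groups, smokers, nonsmokers)}

print("crude OR of death:", crude)
for g in age_groups:
    print(g, "OR of death:", by_age[g])
```

In the oldest group nobody survived, so that stratum carries no information about smoking, and the coefficient for that age category has no finite maximum likelihood estimate; software may report a very large estimate or a convergence warning for it.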
3.2 Analyze the data by logistic regression, treating age as a numerical variable,
with continuous dependence of the log odds for death on initial age.
Summarize the relationship between smoking and 20-year survival. Use this
model to estimate the probability of death, within 20 years, as a function of
age, for both smokers and non-smokers; include a graph showing the
estimated probability of death vs. age for each group.