PubH 6414 Fall 2011 Homework 4 (20 points)
We encourage you to work together in computing and discussing the problems.
However, each student is expected to independently write up the submitted
assignment using her or his own computing and giving explanations in her or his
own words. Identical or nearly identical homework submissions will not receive
credit.



Turn in this completed Word document in class by the homework due date.
You may use R Commander to do the calculations needed for each question. Paste in ONLY the
parts of the output needed to answer the question. (You may use another statistical software
package to do the calculations, if you prefer, but the instructor and TAs cannot provide
assistance with other packages.)
Data needed for this homework assignment are available at:
http://www.biostat.umn.edu/~susant/FALL11PH6414HMK.html
Problem 1: Probability Concepts. (2 points)
Part A. A researcher is studying the possibility that there may be an association between disease D and
exposure E. The data she collects will be analyzed using a table like the one below. Note that n is the
total number of people in the study and that n = a + b + c + d.
For each event described below, give the probability type. Choices: joint, conditional, or marginal. In
addition, calculate the probability of that event occurring for a randomly chosen study participant,
using the letters in the table above.
A1. Probability that someone was exposed, Pr(E).
A2. Probability that someone was exposed and has the disease, Pr(D and E).
A3. Probability that someone who had been exposed has the disease, Pr(D | E).
A4. Probability that someone has the disease, Pr(D).
Part B. A researcher is developing a diagnostic test T for disease D. Match the definitions below to the
correct terms.
Choices: Incidence; Prevalence; Sensitivity; Specificity; Positive Predictive Value; Negative Predictive
Value.
B1. Probability that the test is positive given that the person has the disease, Pr(T+ | D+).
B2. Probability that the person has the disease given that the test is positive, Pr(D+ | T+).
B3. Probability that a person has the disease, Pr(D+).
B4. Probability that the person is healthy given that the test is negative, Pr(D- | T-).
B5. Probability that the test is negative given that the person is healthy, Pr(T- | D-).
Problem 2: Venn Diagram (3 points)
Data from Table 4-1 on the 141 patients in the recent epidemic time period for the development of
serogroup B meningococcal disease are shown below.
Table 4-1 Recent Epidemic
Site of infection           Count   Proportion
Sepsis Only                 40      0.28
Meningitis Only             39      0.28
Both Sepsis & Meningitis    34      0.24
Unknown (Neither)           28      0.20
Total                       141     1.00
Define Events:
A: Sepsis
B: Meningitis
For each scenario below, use probability notation to define it (e.g., P(Meningitis Only) = P(B) - P(A and B))
and determine its probability.
a. Sepsis as a site of infection
b. Meningitis as a site of infection
c. Sepsis as the only site of infection
d. Both sepsis and meningitis
e. A known infection site
f. Sepsis or meningitis or both as an infection site
g. An unknown infection site
List at least one combination of the events above (events a through g) which is a complementary set of
events. (There may be more than one set of complementary events.) Show that your set of events is
truly complementary using probability notation (e.g., P(A) + P(A^c) = 1).
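The identity quoted above, P(Meningitis Only) = P(B) - P(A and B), and the complement rule can be checked numerically against the counts in Table 4-1. The following is a minimal Python sketch, not part of the original assignment; the variable names are illustrative, and the same arithmetic can be done in R Commander.

```python
from fractions import Fraction

# Counts from Table 4-1 (recent epidemic period).
counts = {
    "sepsis_only": 40,
    "meningitis_only": 39,
    "both": 34,
    "unknown": 28,
}
n = sum(counts.values())  # total number of patients

# Event B (meningitis) includes "meningitis only" and "both".
p_b = Fraction(counts["meningitis_only"] + counts["both"], n)
p_a_and_b = Fraction(counts["both"], n)

# P(Meningitis Only) = P(B) - P(A and B)
p_meningitis_only = p_b - p_a_and_b

# Complement of B: every patient without meningitis (sepsis only or unknown).
p_not_b = Fraction(counts["sepsis_only"] + counts["unknown"], n)

print("P(Meningitis Only) =", round(float(p_meningitis_only), 2))
print("P(B) + P(B^c) == 1:", p_b + p_not_b == 1)
```

Exact fractions are used so that the complement check is not affected by the rounding in the Proportion column.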
Problem 3: Education and Mammography. (3 points)
Data from a CDC website indicate that approximately 70% of women age 40 or older in the US had a
mammogram within the past 2 years (2003 data). The table below provides the probability
distributions of mammograms (yes / no) by education level for women over 40.
Education level   < HS    HS grad   > HS    Total
Mammogram         0.14    0.36      0.20    0.70
No Mammogram      0.11    0.13      0.06    0.30
Total             0.25    0.49      0.26    1.0
Use the data in the table above to calculate the stated probabilities. Please show your work.
A. The conditional probability of having a mammogram for a woman with less than a high-school
education, P(mammogram | <HS):
B. The conditional probability of having a mammogram for a woman who is a high-school graduate,
P(mammogram | HS grad):
C. The conditional probability of having a mammogram for a woman who has continued her education
beyond high school, P(mammogram | >HS):
D. The joint probability of having less than a high-school education and having a mammogram,
P(<HS, mammogram):
E. The product of the two marginal probabilities of having less than a high-school education and of
having a mammogram, P(<HS) * P(mammogram):
F. Are education level and mammogram screening (yes / no) independent events? Justify your answer
using the probabilities you calculated and the probabilities in the table.
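As a rough illustration of the mechanics, a conditional probability is a joint probability divided by the matching marginal, and independence requires each joint probability to equal the product of its marginals. The Python sketch below applies both rules to the table in this problem. It is not part of the original assignment; the function names and the rounding tolerance `tol` are assumptions chosen for illustration.

```python
import math

# Joint probabilities P(education level, mammogram status) from the table.
joint = {
    "<HS":     {"mammogram": 0.14, "no_mammogram": 0.11},
    "HS grad": {"mammogram": 0.36, "no_mammogram": 0.13},
    ">HS":     {"mammogram": 0.20, "no_mammogram": 0.06},
}

def marginal_education(level):
    """P(education level): sum across mammogram status."""
    return sum(joint[level].values())

def marginal_status(status):
    """P(mammogram status): sum across education levels."""
    return sum(row[status] for row in joint.values())

def conditional(status, level):
    """P(status | level) = P(level, status) / P(level)."""
    return joint[level][status] / marginal_education(level)

def looks_independent(level, status, tol=0.005):
    """Compare the joint probability with the product of the marginals.

    tol is a hypothetical allowance for the two-decimal rounding in the table.
    """
    product = marginal_education(level) * marginal_status(status)
    return math.isclose(joint[level][status], product, abs_tol=tol)

for level in joint:
    print(level, "P(mammogram | level) =", round(conditional("mammogram", level), 2))
```

If the events were independent, every conditional probability printed by the loop would match the overall P(mammogram) up to rounding.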
Problem 4: Gender and Education. (4 points)
Data on gender and education are shown in the table below.
          Bachelor's   Master's   Professional   Doctorate   TOTAL
Female    784          276        39             20          1119
Male      559          197        44             25          825
TOTAL     1343         473        83             45          1944
Use the data in the table to answer the following questions. Please show your work.
a. What is the probability that a randomly chosen graduate is a man?
b. What kind of probability (conditional, marginal, joint) is this?
c. What is the probability that a randomly chosen graduate earned a doctorate?
d. What kind of probability (conditional, marginal, joint) is this?
e. What is the probability that a randomly chosen person who earned a doctorate is male?
f. What kind of probability (conditional, marginal, joint) is this?
g. What is the probability that a randomly chosen male graduate earned a doctorate?
h. What kind of probability (conditional, marginal, joint) is this?
i. What is the probability that a randomly chosen graduate is a man who earned a doctorate?
j. What kind of probability (conditional, marginal, joint) is this?
k. Are being male and earning a doctorate independent events within this sample? Why or why not?
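The three probability types asked about in this problem differ only in which count goes in the numerator and which total goes in the denominator. The Python sketch below makes that explicit for a table of counts. It is not part of the original assignment, and the helper names are hypothetical.

```python
from fractions import Fraction

degrees = ["Bachelor's", "Master's", "Professional", "Doctorate"]
table = {
    "Female": [784, 276, 39, 20],
    "Male":   [559, 197, 44, 25],
}
n = sum(sum(row) for row in table.values())  # grand total of graduates

def marginal_gender(gender):
    """Row total divided by the grand total."""
    return Fraction(sum(table[gender]), n)

def marginal_degree(degree):
    """Column total divided by the grand total."""
    j = degrees.index(degree)
    return Fraction(sum(row[j] for row in table.values()), n)

def joint(gender, degree):
    """Single cell divided by the grand total."""
    return Fraction(table[gender][degrees.index(degree)], n)

def gender_given_degree(gender, degree):
    """Single cell divided by its column total."""
    return joint(gender, degree) / marginal_degree(degree)

def degree_given_gender(degree, gender):
    """Single cell divided by its row total."""
    return joint(gender, degree) / marginal_gender(gender)

def is_independent(gender, degree):
    """Exact product-rule check within the sample."""
    return joint(gender, degree) == marginal_gender(gender) * marginal_degree(degree)

print("P(Male) =", marginal_gender("Male"))
print("P(Male | Doctorate) =", gender_given_degree("Male", "Doctorate"))
```

Note that P(Male | Doctorate) and P(Doctorate | Male) share a numerator but use different denominators, which is why the two questions have different answers.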
Problem 5: Causes of Death. (1 point)
Data taken from the US National Center for Health Statistics, via Aschengrau and Seage, p. 109ff,
were used to create this table. The top three causes of death are given for each age group. Dots indicate
missing data.
Note: The numbers in the table are conditional probabilities. The conditional probabilities for all
causes for a given age group add up to 1.0 (or 100%), as shown in the last row of the table. For
example, for deaths occurring amongst people ages 15 – 24, the probability that the cause of death was
homicide is 0.19.
Cause of Death             Age 15 - 24   Age 25 - 44   Age 45 - 64   Age > 65   All Ages
Heart Disease              .             0.12          0.27          0.35       0.314
Cancer                     .             0.16          0.35          0.22       0.233
Cerebrovascular Disease    .             .             .             0.08       0.069
Unintentional Injury       0.42          0.20          0.05          .          0.041
Suicide                    0.13          .             .             .          0.013
Homicide                   0.19          .             .             .          .
TOTAL                      1.00          1.00          1.00          1.00       1.00
Which one choice below gives the correct interpretation of the light blue cell in the table (Cancer,
Age 25 - 44: 0.16)? (Please highlight your choice.)
a. 16% of people age 25 - 44 die of cancer.
b. 16% of people age 25 - 44 are diagnosed with cancer.
c. 16% of people age 25 - 44 who are diagnosed with cancer die from it.
d. 16% of deaths among people age 25 - 44 are caused by cancer.
e. 16% of cancer deaths occur in young people age 25 - 44.
f. 16% of cancer diagnoses occur in young people age 25 - 44.
g. 16% of all deaths are due to cancer in young people age 25 - 44.
Problem 6: ESR Screening Test. (4 points)
Data from a screening test for spinal malignancy are shown in the table below. The test measures
erythrocyte sedimentation rate (ESR). If ESR ≥ 20 mm/h, the test is positive. If ESR < 20 mm/h the
test is negative.
                 Spinal Malignancy
                 Yes     No      Total
Test Positive    156     264
Test Negative    44      536
Total            200     800
Use the data in the table to calculate the following. Please show your work.
a. The sensitivity of the screening test:
b. The specificity of the screening test:
c. The prevalence of spinal malignancy in this study:
d. The false positive rate of the screening test:
e. The false negative rate of the screening test:
f. The negative predictive value (NPV) of the screening test, using the direct calculation method:
g. The positive predictive value (PPV) of the screening test, using the direct calculation method:
h. The PPV of the screening test, using the Bayes’ formula method:
Suppose that you had a patient in your office who had just received a positive test result from this
screening test. How would you explain to her/him what the test results mean?
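The direct and Bayes' formula calculations of PPV should agree, since both describe Pr(D+ | T+). The Python sketch below is one way to check this with the counts from the ESR table. It is not part of the original assignment; the variable names are illustrative, and the row total for positive tests is derived by adding the two cells in that row.

```python
# Cell counts from the ESR screening table.
tp, fp = 156, 264   # test positive: with malignancy, without malignancy
fn, tn = 44, 536    # test negative: with malignancy, without malignancy

diseased = tp + fn          # column total, malignancy = Yes
healthy = fp + tn           # column total, malignancy = No
n = diseased + healthy      # everyone in the study

sensitivity = tp / diseased     # Pr(T+ | D+)
specificity = tn / healthy      # Pr(T- | D-)
prevalence = diseased / n       # Pr(D+)

# Direct method: true positives among all positive tests.
ppv_direct = tp / (tp + fp)

# Bayes' formula:
# Pr(D+ | T+) = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
ppv_bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print("PPV (direct):", round(ppv_direct, 4))
print("PPV (Bayes): ", round(ppv_bayes, 4))
```

The two methods agree here because the prevalence used in Bayes' formula is taken from the same study table; with a different prevalence, only the Bayes calculation would apply.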
Problem 7: Depression Screening Test (3 points)
Data from a study on depression in stroke patients were reported in a paper in the online British
Medical Journal (Watkins et al., BMJ 2001; 323: 1159).
Respondents were considered to have tested positive if they responded yes to the question “Do you
often feel sad or depressed?” Respondents with clinical depression (D+) were defined by a more
complex measure, the Montgomery-Åsberg Depression Rating Scale. The study results are shown in the
table below.
                      Clinically        Not clinically
                      depressed (D+)    depressed (D-)    Total
T+ (responded yes)    37                8                 45
T- (responded no)     6                 28                34
Total                 A                 B
a. What are the values for A and B in the table above? (numerical answers)
b. What is the probability of having answered yes, given that a subject had been diagnosed with
clinical depression?
c. What is this value called?
d. What is the prevalence of clinical depression in this study of stroke patients?
e. If the prevalence of clinical depression in the community tested were much lower than it is in
this group of stroke patients, which of the following values would change and in which
direction? (Highlight the correct answer(s).)
1. NPV would increase
2. PPV would decrease
3. Sensitivity would increase
4. Specificity would decrease
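To explore how the predictive values respond to prevalence, Bayes' formula can be evaluated at several prevalence values while sensitivity and specificity are held fixed. The Python sketch below does this. It is not part of the original assignment, and the sensitivity, specificity, and prevalence values in it are hypothetical numbers chosen for illustration, not values taken from the study.

```python
def ppv(sens, spec, prev):
    """Pr(D+ | T+) from Bayes' formula."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Pr(D- | T-) from Bayes' formula."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Hypothetical test characteristics (assumed for illustration only).
SENS, SPEC = 0.85, 0.80

# Sensitivity and specificity stay fixed; only prevalence varies.
for prev in (0.50, 0.20, 0.05):
    print(
        f"prevalence={prev:.2f}  "
        f"PPV={ppv(SENS, SPEC, prev):.3f}  "
        f"NPV={npv(SENS, SPEC, prev):.3f}"
    )
```

Sensitivity and specificity enter the functions as inputs rather than outputs, reflecting that they are properties of the test conditional on disease status.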