Sample Exam - Cedar Crest College

No Name on This Page!!!
Biostatistics
Examination #2
100 pts. Maximum of 75 minutes.
Please write your name and the
Honor Code statement on the back
of the last page.
#1 (25 pts.)
Africa is a hotbed of diseases. AIDS is currently sweeping across the continent at a very
alarming rate. Polio could have been stopped, but rumors about the intent of Americans
allowed the poliovirus to jump from village to village; the rumors were that the polio
vaccines contained either birth control substances or the AIDS virus (HIV). Malaria has
always been a problem and continues to kill millions of children and adults each year.
Suppose you decided to take a year off from college to work with a health organization in
Africa. Your mission was to help this organization get a handle on the extent to which
different diseases are present within different countries on the African continent. Your
research design was as follows: From each of the chosen countries [Gabon [888], Kenya
[463], Uganda [975], Zimbabwe [431] and Botswana [779]], 2000 randomly
selected individuals were tested for three diseases [AIDS, polio, malaria]; the number of
individuals who tested positive for one or more diseases in each country is noted above
just after the name of the country. Your data on the number of individuals who tested
positive for each of the diseases within each of the countries is as follows; please note
that the letter is the first letter of the country and the diseases are arranged as above
[AIDS, polio, malaria]. [G: 236, 83, 569] [K: 321, 84, 58] [U: 527, 105, 343] [Z: 301,
65, 65] [B: 432, 15, 332]
Set up a contingency table with names of diseases in the left column (10 pts).
            Gabon   Kenya   Uganda   Zimbabwe   Botswana   total
AIDS          236     321      527        301        432    1817
polio          83      84      105         65         15     352
malaria       569      58      343         65        332    1367
total         888     463      975        431        779    3536
For all of the following questions, assume that your sample accurately reflects the
frequencies of specific diseases within the respective countries. Also, be sure to show
your work (calculations).
Of the diseased individuals within Zimbabwe, what is the probability (in decimal form; 3
decimal places) that a person has AIDS? (5 pts.)
301/431 = 0.698
___________
What percentage of Kenyans are afflicted with either AIDS, polio or malaria? (5 pts.)
463/2000 = 0.232 = 23.2% ___________
What percentage of the diseased individuals in your study have malaria? (5 pts.)
1367/3536 = 0.387 = 38.7% ___________
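The three answers above can be checked with a short script. This is only a sketch in Python: the dictionary layout and variable names are illustrative assumptions, while the counts and the sample size of 2000 per country come from the problem statement.

```python
# Positive tests per country, ordered as (AIDS, polio, malaria).
# Layout and names are illustrative; the numbers are from the problem.
counts = {
    "Gabon": (236, 83, 569),
    "Kenya": (321, 84, 58),
    "Uganda": (527, 105, 343),
    "Zimbabwe": (301, 65, 65),
    "Botswana": (432, 15, 332),
}
SAMPLED_PER_COUNTRY = 2000

# Column totals (per country), row totals (per disease), and grand total.
country_totals = {country: sum(row) for country, row in counts.items()}
disease_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(disease_totals)

# P(AIDS | diseased, Zimbabwe)
p_aids_zimbabwe = counts["Zimbabwe"][0] / country_totals["Zimbabwe"]
# Percentage of sampled Kenyans with any of the three diseases
pct_kenya_diseased = 100 * country_totals["Kenya"] / SAMPLED_PER_COUNTRY
# Percentage of all diseased individuals who have malaria
pct_malaria = 100 * disease_totals[2] / grand_total

print(f"P(AIDS | diseased, Zimbabwe) = {p_aids_zimbabwe:.4f}")
print(f"Kenyans diseased = {pct_kenya_diseased:.2f}%")
print(f"Diseased with malaria = {pct_malaria:.2f}%")
```

Each denominator is chosen by the phrase that defines the population in the question: diseased Zimbabweans (431), all sampled Kenyans (2000), and all diseased individuals in the study (3536).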
#2 (25 pts.)
Just last year your aunt left you 100 km² of land in central Idaho near Sun Valley. She
also left you $3,000,000. Given your stress levels due to exams and courses in general,
you decide to bag college and move out to the frontier to enjoy life. During one of your
casual hikes up to Galena Summit, which overlooks the headwaters of the Salmon River
and which provides a glorious view of the Sawtooth Mountains, you begin to notice that
when you see a Mountain Bluebird, you tend not to see another in the same area. This
suggested to you that maybe this species of bird is territorial, meaning that each bird
secures a certain area and will defend this area to its death. Immediately you are
reminded of the Poisson distribution from your biostatistics class back east. What
follows? You hire 10 undergraduates during the summer to watch for bluebirds within
your 100 km² of land. The land is broken up into one hundred 1 km² quadrats of land.
The goal of the project is to determine whether Mountain Bluebirds are territorial. After
a long and enjoyable summer, your summer interns deliver the following data set to you
on the porch of your mountain chalet. [Of course you reciprocate by providing a
wonderful dinner for the dedicated young scientists.]
A total of 300 Mountain Bluebirds were tagged and studied during the summer.
# Bluebirds   Observed #    Expected      Expected #    Obs. – Exp.
in Quadrat    of Quadrats   Probability   of Quadrats
0             4             0.0498         4.98         -0.98
1             15            0.1494        14.94          0.06
2             23            0.2240        22.40          0.60
3             24            0.2240        22.40          1.60
4             17            0.1680        16.80          0.20
5             10            0.1008        10.08         -0.08
6             5             0.0504         5.04         -0.04
7             2             0.0216         2.16         -0.16
Total:        100           0.9880        98.80
Complete the table above. (10 pts.)
What is the average number of Mountain Bluebirds per km² on your newly acquired
property (show your work and units)? (5 pts.)
300 bluebirds / 100 km² = 3 bluebirds per km²
_______________
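With a mean of 3 bluebirds per quadrat, the expected columns of the table follow from the Poisson formula P(k) = e^(-m) m^k / k!. The sketch below, in Python, recomputes them; the function and variable names are illustrative assumptions, not part of the exam.

```python
import math

N_QUADRATS = 100
MEAN_PER_QUADRAT = 300 / N_QUADRATS  # 3 bluebirds per 1 km^2 quadrat
observed = [4, 15, 23, 24, 17, 10, 5, 2]  # quadrats holding 0..7 bluebirds


def poisson_pmf(k, mean):
    """Probability of exactly k events when the mean count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


probs = [poisson_pmf(k, MEAN_PER_QUADRAT) for k in range(len(observed))]
expected = [N_QUADRATS * p for p in probs]
diffs = [o - e for o, e in zip(observed, expected)]

print("k  obs  prob    exp    obs-exp")
for k, (o, p, e, d) in enumerate(zip(observed, probs, expected, diffs)):
    print(f"{k}  {o:3d}  {p:.4f}  {e:5.2f}  {d:+.2f}")
print(f"sum of probabilities for k = 0..7: {sum(probs):.4f}")
```

The probabilities for 0 through 7 birds do not sum to 1 because the Poisson distribution also assigns a small probability to 8 or more birds per quadrat.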
Did any of your interns see 10 Mountain Bluebirds within a single quadrat? (5 pts.)
No!
__________
Given that the goal of the project was to determine whether Mountain Bluebirds are
territorial, how did you explain the results of the study to your summer interns? Be sure
to include in your detailed explanation why you selected the Poisson distribution and why
you came to your final conclusion regarding territoriality. (5 pts.) (extra space next page)
The Poisson distribution can be used to test for independence. In this situation you want
to know whether the location of one bluebird has an effect on where another one may be.
If there is no effect, then the data should conform to a Poisson distribution. If
territoriality is in force, then there should be a significant difference between the
observed and expected frequencies, i.e., your data set does not conform to a Poisson
distribution. Although a Chi-squared test would be used to test this, the observed values
are so close to the expected values that it is safe to conclude that there is no difference.
This means that the Mountain Bluebirds are not territorial.
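The Chi-squared comparison mentioned above can be sketched as follows. This is a rough illustration in Python, not the exam's required method: it uses the observed and expected quadrat counts as tabulated, does not pool the classes with small expected counts, and assumes 6 degrees of freedom (8 classes, minus 1, minus 1 for the mean estimated from the data). The critical value 12.592 is the standard tabulated value for alpha = 0.05 and 6 degrees of freedom.

```python
observed = [4, 15, 23, 24, 17, 10, 5, 2]
expected = [4.98, 14.94, 22.40, 22.40, 16.80, 10.08, 5.04, 2.16]

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: df = classes - 1 - 1, since the mean was estimated from the data
degrees_of_freedom = len(observed) - 2
CRITICAL_VALUE_05 = 12.592  # tabulated value for alpha = 0.05, df = 6

print(f"chi-squared = {chi_squared:.3f} with {degrees_of_freedom} df")
if chi_squared < CRITICAL_VALUE_05:
    print("No significant departure from the Poisson expectation")
else:
    print("Significant departure from the Poisson expectation")
```

A statistic far below the critical value is consistent with the conclusion that the observed counts do not differ significantly from the Poisson expectation.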
#3 (25 pts.)
In many professions, time and money are in short supply. It’s no different for people
who choose to investigate crimes and work with corpses. Decisions need to be made
regarding whether or not a protocol is effective in finding substances or agents of disease
on the bodies of crime victims. Our current concern is whether or not a crime scene
investigator has chosen an effective test for C. tetani.
The victim above appears to have succumbed to the tetanus toxin. The lead
investigator seems to think that her primary suspect rigged up an umbrella that can shoot
Clostridium tetani, the bacterium that produces the tetanus toxin, into an unsuspecting
victim. Because of the expense of the testing kits, the investigator decided to have the lab
sample only eight locations on the body, i.e., only eight of the thirteen regions denoted by
the blue spots on the above corpse. Each sample test indicates whether C. tetani is
present or absent. The investigator happens to know a somewhat reclusive lab-rat (a.k.a.,
Dr. Bi Nominski) who has data indicating that the success rate of finding C. tetani on the
human body is 0.7. Our first mission is to estimate the probabilities associated with
finding C. tetani a specific number of times out of a total of eight samples.
Complete the table below. (10 pts.)
Successes   Probability
0           .0001
1           .0012
2           .0100
3           .0467
4           .1361
5           .2541
6           .2965
7           .1977
8           .0576
What is the percent probability that exactly two samples out of the eight yield C. tetani?
(5 pts.)
1.00%
____________
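The table values come from the binomial formula P(k) = C(n, k) p^k (1 - p)^(n - k) with n = 8 samples and p = 0.7. A minimal sketch in Python that regenerates them is shown here; the names are illustrative assumptions.

```python
from math import comb

N_SAMPLES = 8
P_SUCCESS = 0.7  # success rate of finding C. tetani, from the problem


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


table = {k: binomial_pmf(k, N_SAMPLES, P_SUCCESS) for k in range(N_SAMPLES + 1)}

for k, prob in table.items():
    print(f"{k}  {prob:.4f}")
```

Multiplying the entry for two successes by 100 gives the percent probability asked for in the question above.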
What is the percent probability that at least five of the samples yield C. tetani? (show
your calculations) (5 pts.)
0.2541+0.2965+0.1977+0.0576=0.8059 (80.59%) _____________
Suppose that national forensic guidelines indicate that in order for a protocol/test to be
effective there must be a 95% chance that at least 50% of the samples yield positive
results, i.e., indicate the presence of C. tetani. Would the national forensic agency
consider the test used on the above corpse effective? Explain in detail your reasoning. (5
pts.)
We have the probabilities in the chart above. Fifty percent of eight is four. Therefore the
probability of finding C. tetani in at least four of the eight samples equals 0.9420
[0.1361+0.2541+0.2965+0.1977+0.0576=0.9420 (94.20%)]. Although the probability
is close to 95%, it does not conform to the guidelines of the forensic agency, and
therefore would not be considered an effective protocol. All in all, the results of the
investigation would not hold up in court.
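Both "at least" questions are sums over the upper tail of the binomial distribution. The sketch below, in Python, computes them from unrounded probabilities and compares the second against the 95% guideline; the helper name and constants are illustrative assumptions.

```python
from math import comb

N_SAMPLES = 8
P_SUCCESS = 0.7
REQUIRED_PROBABILITY = 0.95  # guideline threshold from the problem


def prob_at_least(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_at_least_5 = prob_at_least(5, N_SAMPLES, P_SUCCESS)
# "At least 50% of the samples" means at least 4 of 8
p_at_least_4 = prob_at_least(N_SAMPLES // 2, N_SAMPLES, P_SUCCESS)
meets_guideline = p_at_least_4 >= REQUIRED_PROBABILITY

print(f"P(at least 5) = {p_at_least_5:.4f}")
print(f"P(at least 4) = {p_at_least_4:.4f}")
print(f"Meets the 95% guideline: {meets_guideline}")
```

Summing unrounded terms avoids accumulating the rounding error of the four-decimal table entries, although here the result agrees with the hand calculation to four decimal places.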
#4 (25 pts.)
So many things conform to a normal distribution. Take your pick: tree height, wing
length, rodent numbers, egg masses, shoe sizes, and turtle speeds. It is because of this
pattern in nature that mathematicians and statisticians proceeded to create numerous
statistical tests that are based on a normal distribution. These tests are considered
parametric tests.
Draw a normal curve and label the axes, mean, median and mode. (5 pts.)
______________________________________________________
Draw a standard normal curve (mean = 0, standard deviation = 1). Label the axes. (5
pts.)
______________________________________________________
Draw a standard normal curve (mean = 0, standard deviation = 1). Fill/shadow in the
tails of the distribution that extend equally beyond 95% of the area of the normal curve.
Be sure to label (with a number) the points at which the tails begin on the x-axis. Also,
identify the respective areas (in %) defined by the three different regions on the curve (5
pts.)
______________________________________________________
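The x-axis points at which the two tails begin can be found from the inverse of the standard normal cumulative distribution. The following is a small sketch using Python's standard library; splitting the remaining 5% equally between the two tails is the assumption stated in the question.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

CENTRAL_AREA = 0.95
tail_area = (1 - CENTRAL_AREA) / 2  # area in each tail

# Point below which 97.5% of the area lies; the lower cut is its mirror image
upper_cut = standard_normal.inv_cdf(1 - tail_area)
lower_cut = -upper_cut

print(f"tails begin at z = {lower_cut:.2f} and z = {upper_cut:.2f}")
print(f"areas: {100 * tail_area:.1f}% | {100 * CENTRAL_AREA:.1f}% | {100 * tail_area:.1f}%")
```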
Suppose you are interested in the average number of base pairs within human genes. In
fact you are so interested in this subject that you proceed to create a frequency
distribution of gene size. Lo and behold, you find that the distribution conforms
perfectly to a normal distribution. You also find that the mean size is 14,000 base pairs
and that the standard deviation is equal to 2000 base pairs.
a. What percent of the ~32,000 genes have a size less than or equal to 17,000
base pairs? (show your work) (5 pts.)
93.32%
______________
The first step is to calculate a z-score, i.e., the number of standard deviations
17,000 is from the mean. If the mean is 14,000 then 17,000 is 3000 bp away.
The z-score is calculated by dividing this difference (3000) by the standard
deviation, which is equal to 2000. The z-score therefore equals 1.50.
Going to the normal probability table in the text (or using another source),
we find that a z-score of 1.50 translates into a
probability of 0.9332. This means that 93.32% of the area under the curve
exists to the left of 17,000 (= mean + 1.50 standard deviations). In other
words, 93.32% of the 32,000 genes have sizes less than or equal to 17,000 bp.
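The table lookup in part a. can be reproduced numerically. This sketch in Python computes the z-score and evaluates the standard normal cumulative distribution through the error function; the function and constant names are illustrative assumptions.

```python
import math

MEAN_BP = 14_000
SD_BP = 2_000


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z_score = (17_000 - MEAN_BP) / SD_BP
fraction_at_or_below = standard_normal_cdf(z_score)

print(f"z = {z_score:.2f}")
print(f"percent of genes <= 17,000 bp: {100 * fraction_at_or_below:.2f}%")
```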
b. What percent of the ~32,000 genes have a size less than 11,000 base pairs?
(show your work) (5 pts.)
6.68%
______________
It turns out that 11,000 is 1.50 standard deviations to the left of the mean.
Using the information in part a. we know that if 93.32% of the genes are to
the left of 17,000, then the remainder (6.68%) have a size greater than 17,000
bp. Given the symmetrical nature of a normal curve, the area less than
11,000 (1.50 s.d. below the mean) is equal to the area greater than 17,000
(1.50 s.d. above the mean). Consequently, 6.68% of the genes have a size less
than 11,000 bp.
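The symmetry argument in part b. can be checked directly: the area below 11,000 bp should equal the area above 17,000 bp. This is a minimal sketch using Python's standard library, with illustrative variable names.

```python
from statistics import NormalDist

gene_sizes = NormalDist(mu=14_000, sigma=2_000)

fraction_below_11k = gene_sizes.cdf(11_000)      # 1.50 s.d. below the mean
fraction_above_17k = 1 - gene_sizes.cdf(17_000)  # 1.50 s.d. above the mean

print(f"percent of genes < 11,000 bp: {100 * fraction_below_11k:.2f}%")
print(f"percent of genes > 17,000 bp: {100 * fraction_above_17k:.2f}%")
```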