No Name on This Page!!!

Biostatistics Examination #2
100 pts. Maximum of 75 minutes.

Please write your name and the Honor Code statement on the back of the last page.

#1 (25 pts.) Africa is a hotbed of diseases. AIDS is currently sweeping across the continent at a very alarming rate. Polio could have been stopped, but rumors about the intent of Americans allowed the poliovirus to jump from village to village; the rumors were that the polio vaccines contained either birth control substances or the AIDS virus (HIV). Malaria has always been a problem and continues to kill millions of children and adults each year.

Suppose you decided to take a year off from college to work with a health organization in Africa. Your mission was to help this organization get a handle on the extent to which different diseases are present within different countries on the African continent. Your research design was as follows: in each of the chosen countries [Gabon [888], Kenya [463], Uganda [975], Zimbabwe [431] and Botswana [779]], 2000 randomly selected individuals were each tested for three diseases [AIDS, polio, malaria]; the number of individuals who tested positive for one or more diseases in each country is noted above just after the name of the country.

Your data on the number of individuals who tested positive for each of the diseases within each of the countries are as follows; please note that the letter is the first letter of the country and the diseases are arranged as above [AIDS, polio, malaria].

[G: 236, 83, 569] [K: 321, 84, 58] [U: 527, 105, 343] [Z: 301, 65, 65] [B: 432, 15, 332]

Set up a contingency table with names of diseases in the left column (10 pts).

            AIDS    polio   malaria   total
Gabon        236       83       569     888
Kenya        321       84        58     463
Uganda       527      105       343     975
Zimbabwe     301       65        65     431
Botswana     432       15       332     779
total       1817      352      1367    3536

For all of the following questions, assume that your sample accurately reflects the frequencies of specific diseases within the respective countries.
Also, be sure to show your work (calculations).

Of the diseased individuals within Zimbabwe, what is the probability (in decimal form; 3 decimal places) that a person has AIDS? (5 pts.)
301/431 = 0.698

What percentage of Kenyans are afflicted with AIDS, polio or malaria? (5 pts.)
463/2000 = 0.232 = 23.2%

What percentage of the diseased individuals in your study have malaria? (5 pts.)
1367/3536 = 0.387 = 38.7%

#2 (25 pts.) Just last year your aunt left you 100 km² of land in central Idaho near Sun Valley. She also left you $3,000,000. Given your stress levels due to exams and courses in general, you decide to bag college and move out to the frontier to enjoy life. During one of your casual hikes up to Galena Summit, which overlooks the headwaters of the Salmon River and which provides a glorious view of the Sawtooth Mountains, you begin to notice that when you see a Mountain Bluebird, you tend not to see another in the same area. This suggests to you that maybe this species of bird is territorial, meaning that each bird secures a certain area and will defend this area to its death. Immediately you are reminded of the Poisson distribution from your biostatistics class back east. What follows?

You hire 10 undergraduates during the summer to watch for bluebirds within your 100 km² of land. The land is broken up into one hundred 1 km² quadrats. The goal of the project is to determine whether Mountain Bluebirds are territorial. After a long and enjoyable summer, your summer interns deliver the following data set to you on the porch of your mountain chalet. [Of course you reciprocate by providing a wonderful dinner for the dedicated young scientists.] A total of 300 Mountain Bluebirds were tagged and studied during the summer.

# Bluebirds in Quadrat        0       1       2       3       4       5       6       7   Total
Observed # of Quadrats        4      15      23      24      17      10       5       2     100
Expected Probability     0.0498  0.1494  0.2240  0.2240  0.1680  0.1008  0.0504  0.0216  0.9880
Expected # of Quadrats     4.98   14.94   22.40   22.40   16.80   10.08    5.04    2.16   98.80
Obs. – Exp.               -0.98    0.06    0.60    1.60    0.20   -0.08   -0.04   -0.16

Complete the table above. (10 pts.)

What is the average number of Mountain Bluebirds per km² on your newly acquired property (show your work and units)? (5 pts.)
300 bluebirds / 100 km² = 3 bluebirds per km²

Did any of your interns see 10 Mountain Bluebirds within a single quadrat? (5 pts.)
No!

Given that the goal of the project was to determine whether Mountain Bluebirds are territorial, how did you explain the results of the study to your summer interns? Be sure to include in your detailed explanation why you selected the Poisson distribution and why you came to your final conclusion regarding territoriality. (5 pts.)

The Poisson distribution can be used to test for independence. In this situation you want to know whether the location of one bluebird has an effect on where another one may be. If there is no effect, then the data should conform to a Poisson distribution. If territoriality is in force, then there should be a significant difference between the observed and expected frequencies, i.e., your data set does not conform to a Poisson distribution. Although a Chi-squared test would be used to test this formally, the observed values are so close to the expected values that it is safe to conclude that there is no difference. This means that the Mountain Bluebirds are not territorial.

#3 (25 pts.) In many professions, time and money are in short supply. It’s no different for people who choose to investigate crimes and work with corpses. Decisions need to be made regarding whether or not a protocol is effective in finding substances or agents of disease on the bodies of crime victims.
Our current concern is whether or not a crime scene investigator has chosen an effective test for C. tetani. The victim above appears to have succumbed to the tetanus toxin. The lead investigator seems to think that her primary suspect rigged up an umbrella that can shoot Clostridium tetani, the bacterium that produces the tetanus toxin, into an unsuspecting victim. Because of the expense of the testing kits, the investigator decided to have the lab sample only eight locations on the body, i.e., only eight of the thirteen regions denoted by the blue spots on the above corpse. Each sample test indicates whether C. tetani is present or absent. The investigator happens to know a somewhat reclusive lab-rat (a.k.a. Dr. Bi Nominski) who has data indicating that the success rate of finding C. tetani on the human body is 0.7.

Our first mission is to estimate the probabilities associated with finding C. tetani a specific number of times out of a total of eight samples. Complete the table below. (10 pts.)

Successes   Probability
0           .0001
1           .0012
2           .0100
3           .0467
4           .1361
5           .2541
6           .2965
7           .1977
8           .0576

What is the percent probability that exactly two samples out of the eight yield C. tetani? (5 pts.)
1.00%

What is the percent probability that at least five of the samples yield C. tetani? (show your calculations) (5 pts.)
0.2541 + 0.2965 + 0.1977 + 0.0576 = 0.8059 (80.59%)

Suppose that national forensic guidelines indicate that in order for a protocol/test to be effective there must be a 95% chance that at least 50% of the samples yield positive results, i.e., indicate the presence of C. tetani. Would the national forensic agency consider the test used on the above corpse effective? Explain in detail your reasoning. (5 pts.)

We have the probabilities in the chart above. Fifty percent of eight is four. Therefore the probability of finding C. tetani in at least four of the eight samples equals 0.9420 [0.1361 + 0.2541 + 0.2965 + 0.1977 + 0.0576 = 0.9420 (94.20%)].
Although the probability is close to 95%, it does not meet the guidelines of the forensic agency, and therefore the test would not be considered an effective protocol. All in all, the results of the investigation would not hold up in court.

#4 (25 pts.) So many things conform to a normal distribution. Take your pick: tree height, wing length, rodent numbers, egg masses, shoe sizes, and turtle speeds. It is because of this pattern in nature that mathematicians and statisticians proceeded to create numerous statistical tests that are based on a normal distribution. These tests are considered parametric tests.

Draw a normal curve and label the axes, mean, median and mode. (5 pts.)
______________________________________________________

Draw a standard normal curve (mean = 0, standard deviation = 1). Label the axes. (5 pts.)
______________________________________________________

Draw a standard normal curve (mean = 0, standard deviation = 1). Fill/shadow in the tails of the distribution that extend equally beyond 95% of the area of the normal curve. Be sure to label (with a number) the points at which the tails begin on the x-axis. Also, identify the respective areas (in %) defined by the three different regions on the curve. (5 pts.)
______________________________________________________

Suppose you are interested in the average number of base pairs within human genes. In fact you are so interested in this subject that you proceed to create a frequency distribution of gene size. Lo and behold, you find that the distribution conforms perfectly to a normal distribution. You also find that the mean size is 14,000 base pairs and that the standard deviation is equal to 2000 base pairs.

a. What percent of the ~32,000 genes have a size less than or equal to 17,000 base pairs? (show your work) (5 pts.)
93.32%

The first step is to calculate a z-score, i.e., the number of standard deviations 17,000 is from the mean.
If the mean is 14,000 then 17,000 is 3000 bp away. The z-score is calculated by dividing this difference (3000) by the standard deviation, which is equal to 2000. The z-score therefore equals 1.50. Going to the normal probability table in the text (or using another source), we find that a z-score of 1.50 translates into a probability of 0.9332. This means that 93.32% of the area under the curve lies to the left of 17,000 (= mean + 1.50 standard deviations). In other words, 93.32% of the 32,000 genes have sizes less than or equal to 17,000 bp.

b. What percent of the ~32,000 genes have a size less than 11,000 base pairs? (show your work) (5 pts.)
6.68%

It turns out that 11,000 is 1.50 standard deviations to the left of the mean. Using the information in part a. we know that if 93.32% of the genes are to the left of 17,000, then the remainder (6.68%) have a size greater than 17,000 bp. Given the symmetrical nature of a normal curve, the area less than 11,000 (1.50 s.d. below the mean) is equal to the area greater than 17,000 (1.50 s.d. above the mean). Consequently, 6.68% of the genes have a size less than 11,000 bp.
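
The table lookups in parts a. and b. can be cross-checked numerically. The following is a minimal sketch, not part of the original exam: it assumes the stated mean (14,000 bp) and standard deviation (2000 bp), and it computes the normal cumulative probability from the error function in Python's standard library instead of a printed table. The names `normal_cdf`, `MEAN_BP` and `SD_BP` are illustrative choices.

```python
import math


def normal_cdf(x, mean, sd):
    """Return P(X <= x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd  # z-score: distance from the mean in standard deviations
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values taken from the question; the variable names are assumptions.
MEAN_BP = 14_000
SD_BP = 2_000

p_le_17000 = normal_cdf(17_000, MEAN_BP, SD_BP)  # z = +1.50
p_lt_11000 = normal_cdf(11_000, MEAN_BP, SD_BP)  # z = -1.50

print(f"P(size <= 17,000 bp) = {p_le_17000:.4f}")  # 0.9332
print(f"P(size <  11,000 bp) = {p_lt_11000:.4f}")  # 0.0668
```

The two probabilities sum to 1, which is the symmetry argument used in part b.: the area below 11,000 bp equals the area above 17,000 bp.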