Name:
Matriculation No.:

NATIONAL UNIVERSITY OF SINGAPORE
FACULTY OF SCIENCE
MID TERM QUIZ I FOR THE DEGREE OF BACHELOR OF SCIENCE
(Semester II 2012-2013)
ST1232 Probability and Statistics
5th March
Time allowed: 1 hour 30 minutes

INSTRUCTIONS TO CANDIDATES
(1) This examination paper contains twenty (20) questions and comprises ten (10) printed pages, including the cover page, plus 6 blank pages for working.
(2) Answer ALL questions. The total number of marks for this exam is sixty (60).
(3) Each correct answer is awarded 3 marks, and one mark is deducted for each incorrect answer. No marks are awarded or deducted for any question where option (e) is selected as the answer.
(4) This is a CLOSED BOOK examination.
(5) Candidates may use calculators and may bring one handwritten A4-size help sheet.
(6) Please enter the answer to each question in the table below.

Question | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Answer   |   |   |   |   |   |   |   |   |   |

Question | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Answer   |    |    |    |    |    |    |    |    |    |

Scores: _______________________________

1. Which of the following are examples of variables measured on an ordinal scale?
i. A numerical scale for quantifying pain that takes integer values from 1 to 10
ii. The age of children in a myopia study of students aged 6 to 12 years
iii. Serum cholesterol level in normal subjects
iv. The star classification of hotels
v. The number of cigarettes smoked in a day
a. All except (iii)
b. (i) and (iv) only
c. (ii) and (iv) only
d. (i), (ii) and (iv) only
e. I don’t know

2. According to the 2010 report from the Ministry of Health, the prevalence of HIV infection in Singapore is 117 per million in the population. Of the HIV carriers, 51.7% were infected through heterosexual transmission, while 37.0% were infected through homosexual transmission. The remainder were infected through other routes, including bisexual transmission, intravenous drug use and blood transfusion. Based on these figures:
a. HIV infection and homosexuality are positively correlated.
b. HIV infection and homosexuality are negatively correlated.
c. HIV infection and homosexuality are independent.
d. There is insufficient information to determine the relationship between HIV infection and homosexuality.
e. I don’t know

3. Given that the prevalence of HIV in Singapore was 117 per million in the population as of 2010, Dr Alex Cook of the Department of Statistics & Applied Probability decided to conduct a survey by randomly sampling 10,000 people in Singapore. What is the chance that his sample of 10,000 will include more than two HIV carriers?
a. There isn’t sufficient information provided to calculate this
b. 0.114
c. 0.031
d. 0.969
e. I don’t know

4. Because of certain changes in her lifestyle, a woman now believes that she is 90% likely to be pregnant. She buys two pregnancy test kits: the first, made by Lab-X, has a sensitivity of 98% and a specificity of 96%; the second, made by Lab-Y, has a sensitivity of 94% and a specificity of 99%. Assume that the results from the two test kits are independent, and that test 1 (from Lab-X) shows a positive result while test 2 (from Lab-Y) shows a negative result. Find the probability that the woman is not pregnant.
a. 0.402
b. 0.070
c. 0.054
d. 0.338
e. I don’t know

Questions 5–8. For the following four questions, please refer to the figure below, which shows the treatment dosage in milligrams per day for subjects assigned to two different treatment regimes.

[Figure not reproduced.]

5. Based on the figure above, which of the following statements is true?
a. There is insufficient information to deduce the relationship between the mean dosage and the median dosage for treatment 1.
b. The median dosage for treatment 1 is higher than the mean dosage for treatment 1.
c. The mean dosage for treatment 1 is higher than the median dosage for treatment 1.
d. There is no difference between the mean and median dosages for treatment 1.
e. I don’t know

6. Based on the figure above, which of the following statements is true?
a. There is graphical evidence to suggest there is no difference in the treatment dosage for treatment 1 and treatment 2.
b. There is no graphical evidence of a difference in the treatment dosage for treatment 1 and treatment 2.
c. There is insufficient evidence to conclude on the presence of any difference in the treatment dosage for treatment 1 and treatment 2.
d. There is graphical evidence to suggest that treatment 2 requires a higher dosage than treatment 1.
e. I don’t know

7. Based on the figure above, which of the following statements are clearly true?
i) The range of the dosage for treatment 2 is greater than the range of the dosage for treatment 1.
ii) The dispersion of the dosage for treatment 1 is the same as the dispersion of the dosage for treatment 2.
iii) Because of the shorter upper whisker for treatment 1, there are more outliers in the dosage for treatment 1 than in the dosage for treatment 2.
iv) The inter-quartile ranges of the dosage for treatment 1 and treatment 2 are the same, since the lower whiskers for both treatments are identical.
a. (iii) and (iv) only
b. (i) and (iii) only
c. (ii) and (iii) only
d. (i) only
e. I don’t know

8. The mean and median dosages for treatment 1 were calculated as 1.97 mg/day and 1.37 mg/day respectively. The standard deviation of the treatment 1 dosages was 1.87 mg/day, and the first and third quartiles were 0.63 mg/day and 2.73 mg/day respectively. An appropriate summary statement for the dosage of treatment 1 would be:
a. The appropriate location parameter yields an estimate of 1.97 mg/day, with a corresponding metric of dispersion estimated at 1.87 mg/day.
b. The appropriate location parameter yields an estimate of 1.37 mg/day, with a corresponding metric of dispersion estimated at 1.87 mg/day.
c. The appropriate location parameter yields an estimate of 1.37 mg/day, with a corresponding metric of dispersion estimated at 2.10 mg/day.
d. The appropriate location parameter yields an estimate of 1.97 mg/day, with a corresponding metric of dispersion estimated at 2.10 mg/day.
e. I don’t know

Questions 9–11. The Rhesus blood group can be assumed to be determined genetically by a biallelic marker with alleles D and d, say. The allele D is dominant: those with genotype DD or Dd are Rhesus positive, while those with genotype dd are Rhesus negative. It is also known that 16% of the Irish population is Rhesus negative.

9. An Irish man who is known to be Rhesus positive marries a Rhesus negative girl, and they have a child who is Rhesus negative. Assuming that the population is in Hardy-Weinberg equilibrium, what is the probability that the father is heterozygous (i.e. carries the Dd genotype)?
a. 1
b. 0.48
c. 0.5
d. 0
e. I don’t know

10. Another Irish man who is known to be Rhesus positive marries a girl chosen at random from the population. Assuming that the population is in Hardy-Weinberg equilibrium, what is the chance that their son will be Rhesus positive?
a. 0.840
b. 0.886
c. 0.771
d. 0.691
e. I don’t know

11. A third Irish man who is known to be Rhesus positive marries a Rhesus positive girl who is known to be a carrier of the Rhesus negative allele (i.e. she is a heterozygote). They have a daughter who is Rhesus positive. Assuming that the population is in Hardy-Weinberg equilibrium, what is the chance that the father is also a carrier of the Rhesus negative allele (i.e. the father is also a heterozygote)?
a. 1
b. 0.48
c. 0.5
d. 0
e. I don’t know

Questions 12–14. A standard test for diabetes is based on glucose levels in the blood after fasting for a prescribed period. For healthy persons, the mean fasting glucose level is 5.31 mmol/L with a standard deviation of 0.58 mmol/L. For untreated diabetes, the mean is 11.74 mmol/L and the standard deviation is 3.50 mmol/L. In both groups, the levels appear to be approximately normally distributed. A simple diagnostic test uses the fasting glucose level to assign diabetes status: if the level is greater than 6.5 mmol/L, the subject is classified as diabetic; if the level is less than 6.5 mmol/L, the subject is classified as free from diabetes.

12. Using this diagnostic kit, what is the chance that a randomly chosen non-diabetic subject will be classified as diabetic?
a. 0.067
b. 0.020
c. 0.980
d. 0.933
e. I don’t know

13. What is the chance that two such diagnostic kits will both be correct when used to diagnose a diabetic subject and another, unrelated, non-diabetic subject?
a. 0.960
b. 0.019
c. 0.870
d. 0.914
e. I don’t know

14. A student from the Department of Biological Sciences decided to calibrate the diagnostic kit to achieve a sensitivity of 98% and a specificity of 99%. This design is subsequently used to test 10 subjects, of whom three are truly diabetic and the remaining seven are truly non-diabetic. What is the probability that there is exactly one misdiagnosis?
a. 0.060
b. 0.092
c. 0.116
d. 0.146
e. I don’t know

15. The weight of male students follows a normal distribution with a mean of 69 kg and a variance of 25 kg², while the weight of female students follows a normal distribution with a mean of 53 kg and a variance of 30 kg². What is the probability that two randomly chosen female students together weigh more than twice the weight of a randomly chosen male student?
a. 0.001
b. 0.006
c. 0.994
d. 0.999
e. I don’t know

16. Which of the following is a preferred form of summary for ordinal data?
a. Histogram
b. Scatterplot
c. Tabular display of counts
d. Boxplot
e. I don’t know

17. The coefficient of variation (CV) is defined as the standard deviation divided by the arithmetic mean, and is often used as a metric for comparing the variability between two groups. Based on this definition, the CV is useful when:
i) The two groups contain different sample sizes
ii) Different units of measurement were used in the two groups
iii) The measurements were taken at different times
iv) There is considerable skewness in the data from both groups
a. All of the above
b. (ii) only
c. (ii) and (iv) only
d. None of the above
e. I don’t know

18. Which statement best summarizes the data in the following figure?

[Figure not reproduced.]

a. A linear relationship exists between the outcome measurement and the treatment dosage that is independent of gender.
b. A linear relationship exists between the outcome measurement and the treatment dosage, after adjusting for the effects of gender.
c. A linear relationship exists between the outcome measurement and the treatment dosage, after assuming an interaction between gender and treatment dosage on the outcome measurement.
d. There is no linear relationship between the outcome measurement and the treatment dosage.
e. I don’t know

19. You are recruited as an expert consultant to participate in a review of the problem gambling situation in Singaland, and you review the statistics at the island's only casino, the Marina Bay Sentosa casino. You observe that people who visit the casino can be divided into three categories: observers who do not gamble (10% of visitors), recreational gamblers (60% of visitors) and problem gamblers (the remaining 30%). The Ministry of Gambling has decided that the issue of problem gambling needs to be addressed if more than 1% of the population falls into the category of problem gambling. Based on these statistics, your recommendation to the Ministry of Gambling will be:
a. The issue of problem gambling needs to be addressed, since more than 1% of the population falls into the category of problem gambling.
b. The issue of problem gambling does not need to be addressed, since less than 1% of the population falls into the category of problem gambling.
c. Given that 70% of the people who visited the casino do not fall into the category of problem gambling, the issue of problem gambling does not need to be addressed.
d. There is insufficient information here to decide.
e. I don’t know

20. The most common form of colour-blindness (dichromatism) is a sex-linked hereditary condition caused by a defect on the X chromosome, and is a recessive disorder in females. The frequency of the defective allele is about 7% in the population. Assuming Hardy-Weinberg equilibrium, and that 48% of the population are males and 52% are females, find the percentage of colour-blind people in the population.
a. 0.49%
b. 3.6%
c. 7.0%
d. None of the above
e. I don’t know

CUMULATIVE STANDARD NORMAL TABLE
[Table not reproduced.]
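The arithmetic behind the computational questions (3, 4, 10, 11, 12 to 15 and 20) can be checked with a short Python sketch. It assumes the models stated in the questions: an exact binomial count for Question 3, test results that are independent given the woman's true status for Question 4, Hardy-Weinberg equilibrium for the Rhesus and colour-blindness questions, and independent normal weights for Question 15. All function names are hypothetical, chosen only for this illustration. Normal probabilities come from Python's `statistics.NormalDist` rather than the printed table, so the last digit may differ slightly from table-based working.

```python
from math import comb, sqrt
from statistics import NormalDist


def q3_more_than_two(p=117e-6, n=10_000):
    """P(X > 2) for X ~ Binomial(n, p): HIV carriers in a random sample."""
    return 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))


def q4_not_pregnant(prior=0.9, sens_x=0.98, spec_x=0.96, sens_y=0.94, spec_y=0.99):
    """Bayes' theorem given test X positive and test Y negative."""
    like_preg = sens_x * (1 - sens_y)  # X true positive, Y false negative
    like_not = (1 - spec_x) * spec_y   # X false positive, Y true negative
    num = (1 - prior) * like_not
    return num / (num + prior * like_preg)


def rhesus(q2=0.16):
    """Return (freq of d, P(DD | Rh+), P(Dd | Rh+)) under Hardy-Weinberg."""
    q = sqrt(q2)
    p = 1 - q
    pos = p * p + 2 * p * q
    return q, p * p / pos, 2 * p * q / pos


def q10_child_positive():
    q, _, het = rhesus()
    # The child is dd only if the father is Dd and passes d (prob. 1/2)
    # and the randomly chosen mother passes d (prob. q under Hardy-Weinberg).
    return 1 - (het * 0.5) * q


def q11_father_carrier():
    _, hom, het = rhesus()
    # Mother is Dd: P(child Rh+) is 1 if the father is DD, 3/4 if he is Dd.
    return het * 0.75 / (het * 0.75 + hom * 1.0)


HEALTHY = NormalDist(5.31, 0.58)    # fasting glucose, mmol/L
DIABETIC = NormalDist(11.74, 3.50)
CUTOFF = 6.5


def q12_false_positive():
    return 1 - HEALTHY.cdf(CUTOFF)


def q13_both_correct():
    sensitivity = 1 - DIABETIC.cdf(CUTOFF)
    specificity = HEALTHY.cdf(CUTOFF)
    return sensitivity * specificity


def q14_one_misdiagnosis(sens=0.98, spec=0.99, n_diab=3, n_healthy=7):
    # Exactly one diabetic missed and everyone else correct, or
    # exactly one healthy subject flagged and everyone else correct.
    miss_diab = n_diab * (1 - sens) * sens ** (n_diab - 1) * spec**n_healthy
    miss_healthy = n_healthy * (1 - spec) * spec ** (n_healthy - 1) * sens**n_diab
    return miss_diab + miss_healthy


def q15_two_females_heavier():
    # D = F1 + F2 - 2M is normal with mean 53 + 53 - 2*69
    # and variance 30 + 30 + 4*25 (variance of 2M is 4 * 25).
    d = NormalDist(53 + 53 - 2 * 69, sqrt(30 + 30 + 4 * 25))
    return 1 - d.cdf(0)


def q20_colour_blind(q=0.07, males=0.48, females=0.52):
    # Males (one X) are affected with prob. q; females need two copies: q**2.
    return males * q + females * q**2


if __name__ == "__main__":
    checks = {
        "Q3": q3_more_than_two,
        "Q4": q4_not_pregnant,
        "Q10": q10_child_positive,
        "Q11": q11_father_carrier,
        "Q12": q12_false_positive,
        "Q13": q13_both_correct,
        "Q14": q14_one_misdiagnosis,
        "Q15": q15_two_females_heavier,
        "Q20": q20_colour_blind,
    }
    for label, fn in checks.items():
        print(f"{label}: {fn():.4f}")
```

Each function returns a probability that can be matched against the options of its question. Question 9 needs no computation: a Rhesus negative (dd) child of a dd mother must have received d from the father, so the Rhesus positive father must be Dd.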