Download Mid-term 2013 - Department of Statistics and Applied Probability

Name:
Matriculation No.:
NATIONAL UNIVERSITY OF SINGAPORE
FACULTY OF SCIENCE
MID TERM QUIZ I FOR THE DEGREE OF BACHELOR OF SCIENCE
(Semester II 2012-2013)
ST1232 Probability and Statistics
5th March – Time allowed: 1 hour 30 minutes
INSTRUCTIONS TO CANDIDATES
(1) This examination paper contains twenty (20) questions and comprises ten (10)
printed pages including the cover page, and 6 blank pages for workings.
(2) Answer ALL questions. The total number of marks for this exam is sixty (60).
(3) Each correct answer will be awarded 3 marks, whereas one mark will be deducted
for each incorrect answer. No marks will be awarded or deducted if the option (e)
is selected as the answer for any of the questions.
(4) This is a CLOSED BOOK examination.
(5) Candidates may use calculators and may bring one handwritten A4-size help
sheet.
(6) Please enter the answer to each question in the table below.
Question   1    2    3    4    5    6    7    8    9    10
Answer     ___  ___  ___  ___  ___  ___  ___  ___  ___  ___
Question   11   12   13   14   15   16   17   18   19   20
Answer     ___  ___  ___  ___  ___  ___  ___  ___  ___  ___
Scores: _______________________________
1. Which of the following are examples of variables measured on an ordinal scale?
i. A numerical scale for quantifying pain that takes integer values from 1 to 10
ii. The age of children in a myopia study conducted for students aged 6 to 12
years
iii. Serum cholesterol level in normal subjects
iv. The star classification of hotels
v. The number of cigarettes smoked in a day
a. All except (iii)
b. (i) and (iv) only
c. (ii) and (iv) only
d. (i), (ii) and (iv) only
e. I don’t know
2. According to the 2010 report from the Ministry of Health, the prevalence of HIV
infection in Singapore is 117 per million in the population. Of the HIV carriers,
51.7% of them were infected through heterosexual sexual transmission, while 37.0%
were infected through homosexual transmission. The remainder were infected
through other routes, including bisexuality, intravenous drug use and blood
transfusion. Based on these figures:
a. HIV infection and homosexuality are positively correlated
b. HIV infection and homosexuality are negatively correlated
c. HIV infection and homosexuality are independent
d. There is insufficient information to conclude the relationship between HIV
infection and homosexuality
e. I don’t know
3. Given that the prevalence of HIV in Singapore is 117 per million in the population as
of 2010, Dr Alex Cook of the Department of Statistics & Applied Probability decided
to conduct a survey by randomly sampling 10,000 people in Singapore. What is the
chance that he will actually include more than two people in his sample of 10,000
who are HIV carriers?
a. There isn’t sufficient information provided to calculate this
b. 0.114
c. 0.031
d. 0.969
e. I don’t know
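One way to set up the calculation in Question 3 is sketched below. It assumes that each of the 10,000 sampled people is independently a carrier with probability 117 per million, so the number of carriers in the sample is binomial. The helper name `binomial_pmf` is illustrative and not part of the paper.

```python
from math import comb

prevalence = 117 / 1_000_000   # HIV prevalence quoted in the question
sample_size = 10_000           # number of people sampled


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k carriers among n independent draws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# P(X > 2) = 1 - P(X = 0) - P(X = 1) - P(X = 2)
p_at_most_two = sum(binomial_pmf(k, sample_size, prevalence) for k in range(3))
p_more_than_two = 1 - p_at_most_two

print(round(p_more_than_two, 3))
```

Because the sample is large and the prevalence is small, a Poisson approximation with mean 10,000 x 117/1,000,000 = 1.17 gives nearly the same value.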
4. Due to certain changes in her lifestyle, a woman now believes that she is 90% likely
to be pregnant. She buys two pregnancy test kits, the first is made by Lab-X and has a
sensitivity of 98% and a specificity of 96%, the second is made by Lab-Y and has a
sensitivity of 94% and a specificity of 99%. Assume that the results from both test
kits are independent, and that test 1 (from Lab-X) shows a positive result while test 2
(from Lab-Y) shows a negative result. Find the probability that the woman is not
pregnant.
a. 0.402
b. 0.070
c. 0.054
d. 0.338
e. I don’t know
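The Bayes' theorem calculation behind Question 4 can be sketched as follows. It relies on the independence of the two kits stated in the question, so the likelihood of the pair of results is the product of the individual likelihoods. The variable names are illustrative only.

```python
prior_pregnant = 0.90

# Lab-X kit (test 1), which came back positive
sens_x, spec_x = 0.98, 0.96
# Lab-Y kit (test 2), which came back negative
sens_y, spec_y = 0.94, 0.99

# Likelihood of (test 1 positive, test 2 negative) under each state
like_pregnant = sens_x * (1 - sens_y)          # true positive, then false negative
like_not_pregnant = (1 - spec_x) * spec_y      # false positive, then true negative

# Bayes' theorem: posterior probability of not being pregnant
numerator = (1 - prior_pregnant) * like_not_pregnant
denominator = numerator + prior_pregnant * like_pregnant
posterior_not_pregnant = numerator / denominator

print(round(posterior_not_pregnant, 3))
```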
Questions 5 – 8. For the following four questions, please refer to the figure below, which
shows the treatment dosage in milligrams per day for subjects assigned to two different
treatment regimes.
5. Based on the figure above, the following statement is true:
a. There is insufficient information to deduce the relationship between the
mean dosage and the median dosage for treatment 1.
b. The median dosage for treatment 1 is higher than the mean dosage for
treatment 1.
c. The mean dosage for treatment 1 is higher than the median dosage for
treatment 1.
d. There is no difference between the mean and median dosages for
treatment 1.
e. I don’t know
6. Based on the figure above, the following statement is true:
a. There is graphical evidence to suggest there is no difference in the
treatment dosage for treatment 1 and treatment 2.
b. There is no graphical evidence of a difference in the treatment dosage for
treatment 1 and treatment 2.
c. There is insufficient evidence to conclude on the presence of any
difference in the treatment dosage for treatment 1 and treatment 2.
d. There is graphical evidence to suggest that the treatment 2 requires a
higher dosage than treatment 1.
e. I don’t know
7. Based on the figure above, the following statements are clearly true:
i) The range of the dosage for treatment 2 is greater than the range of the
dosage for treatment 1.
ii) The dispersion of the dosage for treatment 1 is the same as the dispersion
of the dosage for treatment 2.
iii) Because of the shorter upper whisker for treatment 1, there are more
outliers in the dosage for treatment 1 than the dosage for treatment 2.
iv) The inter-quartile ranges of the dosage for treatment 1 and treatment 2 are
the same, since the lower whiskers for both treatments are identical.
a. (iii) and (iv) only
b. (i) and (iii) only
c. (ii) and (iii) only
d. (i) only
e. I don’t know
8. It was calculated that the mean and median dosages for treatment 1 were 1.97 mg/day
and 1.37 mg/day. The standard deviation for the treatment 1 dosages was 1.87
mg/day, with first and third quartiles given by 0.63 mg/day and 2.73 mg/day
respectively. An appropriate summary statement for the dosage of treatment 1 will be:
a. The appropriate location parameter yields an estimate of 1.97 mg/day,
with a corresponding metric of dispersion estimated at 1.87 mg/day.
b. The appropriate location parameter yields an estimate of 1.37 mg/day,
with a corresponding metric of dispersion estimated at 1.87 mg/day.
c. The appropriate location parameter yields an estimate of 1.37 mg/day,
with a corresponding metric of dispersion estimated at 2.10 mg/day.
d. The appropriate location parameter yields an estimate of 1.97 mg/day,
with a corresponding metric of dispersion estimated at 2.10 mg/day.
e. I don’t know
Questions 9 – 11.
The Rhesus blood group can be assumed to be determined genetically by a biallelic
marker with alleles, say, D and d. The allele D is dominant: those with genotypes DD
and Dd are known as Rhesus positive, while those with the genotype dd are known
as Rhesus negative. It is also known that 16% of the Irish are of the Rhesus negative
blood group.
9. An Irish man who is known to be Rhesus positive marries a Rhesus negative girl and
has a child who is Rhesus negative. Assuming that the population is in Hardy-Weinberg
equilibrium, what is the probability that the father is heterozygous (i.e. carrying the
Dd genotype)?
a. 1
b. 0.48
c. 0.5
d. 0
e. I don’t know
10. Another Irish man who is known to be Rhesus positive marries a girl at random from
the population. Assuming that the population is in Hardy-Weinberg equilibrium, what is
the chance that their son will be Rhesus positive?
a. 0.840
b. 0.886
c. 0.771
d. 0.691
e. I don’t know
11. A third Irish man who is known to be Rhesus positive marries a Rhesus positive girl
who is known to be a carrier of the Rhesus negative allele (thus she is a heterozygote).
They have a daughter who is Rhesus positive. Assuming that the population is in
Hardy-Weinberg equilibrium, what is the chance that the father is also a carrier for
the Rhesus negative allele (i.e. the father is also a heterozygote)?
a. 1
b. 0.48
c. 0.5
d. 0
e. I don’t know
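The Hardy-Weinberg reasoning used in Questions 9 to 11 can be sketched as below. The sketch assumes random mating, that each parent passes on one of their two alleles with probability 1/2, and that the child's sex is irrelevant to the Rhesus genotype. All variable names are illustrative.

```python
from math import sqrt

rh_negative_freq = 0.16        # P(dd) among the Irish, from the question
q = sqrt(rh_negative_freq)     # frequency of allele d under Hardy-Weinberg
p = 1 - q                      # frequency of allele D

# Genotype frequencies under Hardy-Weinberg equilibrium
freq_DD, freq_Dd, freq_dd = p**2, 2 * p * q, q**2

# Genotype of a man known only to be Rhesus positive
p_Dd_given_pos = freq_Dd / (freq_DD + freq_Dd)
p_DD_given_pos = 1 - p_Dd_given_pos

# Question 10: the child is dd only if both parents pass on allele d
p_father_passes_d = p_Dd_given_pos * 0.5
p_mother_passes_d = q          # a woman chosen at random from the population
p_child_positive = 1 - p_father_passes_d * p_mother_passes_d

# Question 11: the mother is Dd and the child is Rhesus positive
like_if_father_Dd = 0.75       # Dd x Dd gives a Rhesus positive child 3 times in 4
like_if_father_DD = 1.0        # DD x Dd always gives a Rhesus positive child
p_father_Dd_q11 = (p_Dd_given_pos * like_if_father_Dd) / (
    p_Dd_given_pos * like_if_father_Dd + p_DD_given_pos * like_if_father_DD
)

print(round(p_child_positive, 3), round(p_father_Dd_q11, 3))
```

In Question 9 no calculation is needed under these assumptions: a Rhesus negative child must inherit allele d from each parent, so a Rhesus positive father of such a child has to carry d.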
Questions 12 – 14.
A standard test for diabetes is based on glucose levels in the blood after fasting for a
prescribed period. For healthy persons, the mean fasting glucose level is found to be 5.31
mmol/L with a standard deviation of 0.58 mmol/L. For untreated diabetics, the mean is
11.74 mmol/L and the standard deviation is 3.50 mmol/L. In both groups, the levels appear to be
approximately normally distributed. A simple diagnostic test uses the fasting glucose
level to assign diabetes status. If the level is greater than 6.5 mmol/L, the subject is said to be
diabetic. If the level is less than 6.5 mmol/L, the subject is said to be free from diabetes.
12. Using this diagnostic kit, what is the chance that a randomly chosen non-diabetic
subject will be classified as a diabetic?
a. 0.067
b. 0.020
c. 0.980
d. 0.933
e. I don’t know
13. What is the chance that two such diagnostic kits will both be correct when used to
diagnose a diabetic subject and another unrelated non-diabetic subject?
a. 0.960
b. 0.019
c. 0.870
d. 0.914
e. I don’t know
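The normal-distribution calculations for Questions 12 and 13 can be sketched with Python's standard library instead of the printed table. The sketch assumes the two subjects in Question 13 are independent, and it uses `statistics.NormalDist` (Python 3.8 or later). Variable names are illustrative.

```python
from statistics import NormalDist

healthy = NormalDist(mu=5.31, sigma=0.58)     # fasting glucose, non-diabetic
diabetic = NormalDist(mu=11.74, sigma=3.50)   # fasting glucose, untreated diabetic
cutoff = 6.5                                  # classified diabetic above this level

# Question 12: a non-diabetic subject is wrongly classified as diabetic
p_false_positive = 1 - healthy.cdf(cutoff)

# Question 13: both classifications are correct
p_true_negative = healthy.cdf(cutoff)         # non-diabetic falls below the cutoff
p_true_positive = 1 - diabetic.cdf(cutoff)    # diabetic falls above the cutoff
p_both_correct = p_true_positive * p_true_negative

print(round(p_false_positive, 3), round(p_both_correct, 3))
```

Values read from a printed table with z rounded to two decimal places may differ from these in the third decimal.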
14. A student from the Department of Biological Sciences decided to calibrate the
diagnostic kit to achieve a sensitivity of 98% and a specificity of 99%. Such a design
is subsequently used to test 10 subjects, of which three truly are diabetics and the
remaining seven are truly non-diabetic. What is the probability that there is exactly one
misdiagnosis?
a. 0.060
b. 0.092
c. 0.116
d. 0.146
e. I don’t know
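A sketch of the counting argument for Question 14 follows. It assumes the ten test results are independent, and it splits "one misdiagnosis" into two disjoint cases: exactly one diabetic is missed with no false positives, or no diabetic is missed with exactly one false positive. The helper `binomial_pmf` is illustrative.

```python
from math import comb

sensitivity, specificity = 0.98, 0.99
n_diabetic, n_healthy = 3, 7


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k errors among n independent tests."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_miss = 1 - sensitivity          # false negative rate per diabetic subject
p_false_alarm = 1 - specificity   # false positive rate per non-diabetic subject

# Case 1: one false negative, no false positives
case_one_miss = binomial_pmf(1, n_diabetic, p_miss) * binomial_pmf(0, n_healthy, p_false_alarm)
# Case 2: no false negatives, one false positive
case_one_alarm = binomial_pmf(0, n_diabetic, p_miss) * binomial_pmf(1, n_healthy, p_false_alarm)

p_exactly_one_error = case_one_miss + case_one_alarm
print(round(p_exactly_one_error, 3))
```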
15. The weight of male students follows a normal distribution with a mean of 69 kg and a
variance of 25 kg², while the weight of female students follows a normal distribution
with a mean of 53 kg and a variance of 30 kg². What is the probability that the combined weight of two
randomly chosen female students will exceed twice the weight of a randomly
chosen male student?
a. 0.001
b. 0.006
c. 0.994
d. 0.999
e. I don’t know
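Question 15 can be approached through a linear combination of independent normal variables. The sketch below assumes the question refers to the combined weight of the two female students, F1 + F2, compared with 2M for one male student, and that the three weights are independent. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

male_mean, male_var = 69, 25       # kg and kg^2
female_mean, female_var = 53, 30   # kg and kg^2

# D = F1 + F2 - 2M is normal when F1, F2 and M are independent normals
diff_mean = 2 * female_mean - 2 * male_mean   # E[D]
diff_var = 2 * female_var + 4 * male_var      # Var(D): the factor 2 on M is squared

# P(F1 + F2 > 2M) = P(D > 0)
p_females_heavier = 1 - NormalDist(diff_mean, sqrt(diff_var)).cdf(0)

print(diff_mean, diff_var, round(p_females_heavier, 3))
```

Note that the variance of 2M is 4 x 25 rather than 2 x 25, because the same student's weight is doubled rather than two separate students being added.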
16. Which of the following is a preferred form of summary for ordinal data?
a. Histogram
b. Scatterplot
c. Tabular display of counts
d. Boxplot
e. I don’t know
17. The coefficient of variation (CV) is defined as the standard deviation divided by the
arithmetic mean and is often used as a metric for comparing the variability between
two groups. Based on the definition of the CV, the CV will be useful when:
i) The two groups contain different sample sizes
ii) Different units of measurements were used in the two groups
iii) The measurements were taken at different times
iv) There is considerable skewness in the data from both groups
a. All of the above
b. (ii) only
c. (ii) and (iv) only
d. None of the above
e. I don’t know
18. For the following figure, which statement best summarizes the data:
a. A linear relationship exists between the outcome measurement and the
treatment dosage that is independent of the gender.
b. A linear relationship exists between the outcome measurement and the
treatment dosage, after adjusting for the effects of gender.
c. A linear relationship exists between the outcome measurement and the
treatment dosage, after assuming an interaction between gender and
treatment dosage on the outcome measurement.
d. There is no linear relationship between the outcome measurement and the
treatment dosage.
e. I don’t know
19. You are recruited as an expert consultant to participate in a review of the problem
gambling situation in Singaland, and you review the statistics at the only casino on
the island, the Marina Bay Sentosa casino. You observe that people
who visit the casino can be divided into three categories: observers who do not
gamble (who account for 10% of the people who visit the casino); recreational
gamblers (who account for 60% of the people who visit the casino); and
problem gamblers (who account for the remaining 30%). The Ministry of
Gambling has decided that the issue of problem gambling needs to be addressed if more
than 1% of the population falls into the category of problem gambling. Based on
these statistics, your recommendation for the Ministry of Gambling will be:
a. The issue of problem gambling needs to be addressed, since more
than 1% of the population falls into the category of problem
gambling.
b. The issue of problem gambling does not need to be addressed, since
less than 1% of the population falls into the category of problem
gambling.
c. Given that 70% of the people who visited the casino do not
fall into the category of problem gambling, the issue of problem gambling
does not need to be addressed.
d. There is insufficient information here to decide.
e. I don’t know
20. The most common form of colour-blindness (dichromatism) is a sex-linked hereditary
condition caused by a defect on the X chromosome, and is a recessive disorder in
females. The frequency of the defective allele is about 7% in the population. Assuming
Hardy-Weinberg equilibrium and that 48% of the population are males and 52% are
females, find the percentage of colour-blind people in the population.
a. 0.49%
b. 3.6%
c. 7.0%
d. None of the above
e. I don’t know
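The sex-linked calculation in Question 20 can be sketched as follows. It assumes that males carry one X chromosome, so a single defective allele is enough to be colour-blind, while females need two defective copies because the condition is recessive in females. Variable names are illustrative.

```python
defective_allele_freq = 0.07            # frequency of the defective X-linked allele
male_share, female_share = 0.48, 0.52   # population split given in the question

# Males have a single X chromosome: affected with probability q
p_male_affected = defective_allele_freq
# Females need two defective alleles under Hardy-Weinberg: q squared
p_female_affected = defective_allele_freq**2

# Law of total probability over sex
p_colour_blind = male_share * p_male_affected + female_share * p_female_affected

print(f"{100 * p_colour_blind:.2f}%")
```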
CUMULATIVE STANDARD NORMAL TABLE
- BLANK PAGE FOR WORKING -