AMS 5 – Statistics (Quiz 1)
10 August 2006
Due at the beginning of class on Tuesday 15 August 2006
The Instructor
Dimitris Fouskakis
Student Name_______________________________________________
1. The conventional method of measuring the refractive status of an eye involves
three quantities: (1) sphere power, (2) cylinder power, and (3) axis.
Optometric researchers studied the variation in these three measures of
refraction (Optometry and Vision Science, June 1995). Twenty-five successive
refractive measurements were obtained on the eyes of over 100 university
students. The cylinder power measurements for the left eye of one particular
student (ID #11) are listed in the table:
-0.08  -0.16  -0.11  -0.20  -0.07
-0.09  -0.06  -0.08  -0.04  -0.06
-0.12  -0.16  -0.06  -1.07  -0.07
-0.12  -0.17  -0.15  -0.10  -0.09
-0.17  -0.17  -0.09  -0.16  -0.21
a) Calculate the mean and the median. Which one is a more appropriate
numerical measure of center for the above data? (7 pts)
b) Do you find any extreme values in the data set? (2 pts)
c) If yes, delete it and calculate again the mean and the median. Which one is
a more appropriate numerical measure of center for the new data? (7 pts)
Answer: a) The median is the more appropriate numerical measure of center for
the above data, because the mean is pulled toward the extreme value. After
sorting the data we select the middle number (the 13th ordered observation),
which is –0.11. The mean is –0.15. b) Yes, the value –1.07 is an extreme one.
c) Without the outlier both the mean and the median are appropriate. The new
mean is about –0.12 (–0.116 before rounding) and the new median is –0.105
(much closer now).
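The figures in the answer to question 1 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the quiz.

```python
from statistics import mean, median

# Cylinder power measurements for student ID #11 (table in question 1).
cylinder = [
    -0.08, -0.16, -0.11, -0.20, -0.07,
    -0.09, -0.06, -0.08, -0.04, -0.06,
    -0.12, -0.16, -0.06, -1.07, -0.07,
    -0.12, -0.17, -0.15, -0.10, -0.09,
    -0.17, -0.17, -0.09, -0.16, -0.21,
]

# Center of the full data set.
full_mean = mean(cylinder)
full_median = median(cylinder)

# Drop the single extreme value (-1.07) and recompute.
trimmed = [x for x in cylinder if x != -1.07]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(f"all 25 values: mean={full_mean:.3f}, median={full_median:.3f}")
print(f"without -1.07: mean={trimmed_mean:.3f}, median={trimmed_median:.3f}")
```

Dropping the one extreme value moves the mean noticeably while the median barely changes, which is the point of parts (a) and (c).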
2. In a study to determine the effect of cigarette smoking on phenacetin
metabolism, phenacetin is administered to 12 smokers and 14 nonsmokers.
After two hours, plasma phenacetin level (µg/ml) is measured in all study
participants. Phenacetin levels, smoking status and age for each participant
were recorded.
Table 1: Grouped frequencies and relative frequencies for plasma phenacetin levels.

Plasma phenacetin     Non-smokers                 Smokers
(µg/ml)               Frequency   Percent (%)     Frequency   Percent (%)
0.005 – 0.505             1            7              5           42
0.505 – 1.005             1            7              2           17
1.005 – 1.505             2           14              2           17
1.505 – 2.005             3           22              1            8
2.005 – 2.505             3           22              0            0
2.505 – 3.005             2           14              1            8
3.005 – 3.505             1            7              0            0
3.505 – 4.005             1            7              1            8
Total                    14          100             12          100
a) Summary statistics for phenacetin levels (µg/ml) are presented in Table 2. Do
these summary statistics suggest anything about the shape of the distribution
for each group? (3 pts)
Table 2: Summary statistics for the phenacetin levels according to study group

Measure               Non-Smoking   Smoking
Mean                      2.05        0.93
Median                    2.03        0.57
Standard Deviation        0.90        1.20
b) Draw the frequency and density histograms for each group. (8 pts)
c) Is it a good idea to plot the relative frequencies (percentages) in this example?
(3 pts)
d) For comparing the two groups is it a better idea to use the frequency or the
relative frequency histograms and why? (3 pts)
e) What do the statistical numerical and graphical summaries suggest about the
qualitative effect of smoking on phenacetin concentration? (3 pts)
f) Suppose, it is known that phenacetin metabolism increases with age. Summary
statistics for age according to study group are given in the table below. How
would this factor affect the conclusions made in previous question? (3 pts)
Table 3: Summary statistics for age (in years) according to study group

Measure               Non-Smoking   Smoking
Mean                      47.2        35.4
Median                    46.5        30.0
Standard Deviation        15.1        14.6
Answers: a) The mean and the median of the levels taken from the non-smoking
participants are nearly equal, which suggests that the density distribution for this
group will be symmetric (or nearly symmetric). On the other hand, the mean of
the phenacetin levels taken from the smoking participants is larger than the
median, which suggests that the density distribution for this group will be skewed
with a tail to the right.
b) [Figures: frequency histogram and density histogram of plasma phenacetin for
non-smokers, and frequency histogram and density histogram of plasma phenacetin
for smokers.]
c) Yes, the bin lengths are the same. d) The relative frequency ones, because they
are not affected by the difference in the two sample sizes. e) All statistical
summaries suggest that cigarette smokers have lower plasma phenacetin
concentrations than non-smokers. f) If phenacetin metabolism increases with age,
then plasma phenacetin levels would be expected to be higher in older individuals
than in younger ones. If average age of smokers is compared with average age of
non-smokers (Table 3), it appears that, in general, non-smokers are older than
smokers. Thus, if age affects phenacetin metabolism, it may be that age - not
smoking - accounts for the lower levels of phenacetin among smokers in this
study. In other words, age is a confounding factor since it is related to both
smoking status and plasma phenacetin levels.
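The percentages in Table 1 follow from dividing each class frequency by its group total. The sketch below recomputes them from the frequencies in Table 1; the function name `relative_frequencies` is a hypothetical helper introduced only for illustration.

```python
# Frequencies from Table 1, one entry per 0.5-wide class from 0.005 to 4.005.
non_smokers = [1, 1, 2, 3, 3, 2, 1, 1]  # n = 14
smokers = [5, 2, 2, 1, 0, 1, 0, 1]      # n = 12


def relative_frequencies(counts):
    """Return each count as a percentage of the group total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


for label, counts in [("non-smokers", non_smokers), ("smokers", smokers)]:
    percents = relative_frequencies(counts)
    print(label, [round(p, 1) for p in percents])
```

Because the two groups have different sizes (14 and 12), the percentages are the quantities that can be compared directly between groups. The printed values may differ by a point from Table 1 in places (for example 3/14 is about 21.4%), presumably because the table rounds so that each column totals 100.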
3. The following data are the failure times in hours of 14 transmissions from
caterpillar tractors belonging to a particular American company:
2603 2320 2161 3286 3168 2376 3330 4159 2537 3814 2157 218 3168 3266
a) Calculate the median of the above data. (2 pts)
b) Without calculating the mean, say whether it is greater, approximately
equal, or smaller than the median. (3 pts)
c) Do you think that the standard deviation provides a good measure of
variability for the above data? If not, can you propose an alternative
measure? Explain briefly. (5 pts)
Answers: a) After sorting the data we select the middle number (the average
between the 7th and 8th ordered observations), which is 2885.5. b) Because of the
extreme observation with the value of 218 the mean will be lower than the
median. c) Again because of the outlier the standard deviation is not a good
measure of variability in this example. An alternative would be the interquartile
range, which is the difference between the 75th and 25th percentiles.
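The answers to question 3 can be illustrated numerically. This is a sketch using Python's `statistics` module. Quartile conventions differ between textbooks and software, so the interquartile range printed here depends on the default method of `statistics.quantiles`, which is an assumption and not necessarily the convention used in the course.

```python
from statistics import mean, median, quantiles

# Failure times in hours of the 14 transmissions (question 3).
failure_hours = [2603, 2320, 2161, 3286, 3168, 2376, 3330,
                 4159, 2537, 3814, 2157, 218, 3168, 3266]

med = median(failure_hours)  # average of the 7th and 8th ordered values
avg = mean(failure_hours)    # pulled down by the 218-hour failure

# Quartiles with the module's default ("exclusive") method: an assumption.
q1, _, q3 = quantiles(failure_hours, n=4)
iqr = q3 - q1

print(f"median={med}, mean={avg:.1f}, IQR={iqr:.1f}")
```

The mean comes out below the median, as part (b) predicts, and the interquartile range ignores the single very short failure time.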
4. Among applicants to one law school in 1923, the average LSAT score was
about 169, the SD was about 9 and the lowest score was 160. Did the LSAT
scores follow the normal curve? (3 pts)
Answer: No. For example the normal curve says that about 16% of the scores
should be more than 1 SD below the average, and none are.
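The 16% figure can be reproduced from the normal curve with the stated average and SD. A minimal sketch, assuming Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal curve with the stated average (169) and SD (9) of the LSAT scores.
lsat = NormalDist(mu=169, sigma=9)

# Share of scores the normal curve would place below 160,
# i.e. more than 1 SD below the average.
share_below_lowest = lsat.cdf(160)

print(f"{share_below_lowest:.1%}")
```

Since 160 was the lowest score, no applicant actually fell in this region, which is why the data cannot follow the normal curve.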
5. The scores on a Math test of 20 second year students were as follows:
39 61 47 58 65 70 75 66 60 54 55 47 68 82 66 62 54 58 50 42
a) Draw a histogram. Would you say that the histogram follows the
normal curve? (2 pts)
b) Calculate the mean and the standard deviation. (4 pts)
c) Using the normal curve estimate the percentage of students who scored
(i) below 50; (ii) between 52 and 58 and (iii) above 60. (9 pts)
d) One student’s score was 0.7 SDs below average. About what
percentage of the students had higher scores than he did? (3 pts)
e) Estimate the 60th percentile of the Math test. (2 pts)
f) Find the percentile of the student with score 42. (2 pts)
Answers: a) [Figure: density histogram of the scores; horizontal axis "scores"
from 40 to 80, vertical axis "Density".]
The histogram follows the normal curve.
b) The mean is 58.95 and the SD = 10.63. c) (i) 0.20; (ii) 0.21; (iii) 0.46. d) 0.76.
e) The score is 0.25 SDs above average, which is 0.25 × 10.63 ≈ 2.66 points above
average. That’s 58.95 + 2.66 = 61.61. f) This student was 1.59 SDs below average,
at about the 5th percentile.
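The normal-curve estimates in question 5 can be checked with a short script. This is a sketch, assuming the SD is the population SD (dividing by n), which is the convention that reproduces 10.63; all variable names are illustrative.

```python
from statistics import NormalDist, mean, pstdev

# Math test scores of the 20 students (question 5).
scores = [39, 61, 47, 58, 65, 70, 75, 66, 60, 54,
          55, 47, 68, 82, 66, 62, 54, 58, 50, 42]

avg = mean(scores)
sd = pstdev(scores)            # population SD (divides by n)
curve = NormalDist(avg, sd)    # normal curve with the same average and SD

below_50 = curve.cdf(50)                       # (c)(i)
between_52_58 = curve.cdf(58) - curve.cdf(52)  # (c)(ii)
above_60 = 1 - curve.cdf(60)                   # (c)(iii)
above_student = 1 - NormalDist().cdf(-0.7)     # (d) 0.7 SDs below average
pct_60 = curve.inv_cdf(0.60)                   # (e) 60th percentile
pct_of_42 = curve.cdf(42)                      # (f) share scoring below 42

print(f"mean={avg:.2f}, SD={sd:.2f}")
print(f"(c) {below_50:.2f}, {between_52_58:.2f}, {above_60:.2f}")
print(f"(d) {above_student:.2f}  (e) {pct_60:.2f}  (f) {pct_of_42:.3f}")
```

Values computed this way can differ slightly from those read off a printed normal table, because the table requires rounding the standard units first.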
6. According to a study done at Kaiser Permanente in Walnut Creek, California,
users of oral contraceptives have a higher rate of cervical cancer than non-users,
even after adjusting for age, education and marital status. Investigators
concluded that the pill causes cervical cancer.
a) Is this a controlled experiment or an observational study? (3 pts)
b) Why did the investigators adjust for age, education and marital status?
(3 pts)
c) Women using the pill were likely to differ from non-users on another
factor which affects the risk of cervical cancer. What factor is that?
(3 pts)
d) Were the conclusions of the study justified by the data? Answer yes or
no, and explain briefly. (2 pts)
Answers: a) This is an observational study. b) Rates of cervical cancer go up with
age; women of different marital status have different patterns of sexual activity,
and are therefore exposed to different kinds of risk; similarly for education. In other
words, these are potential confounders. c) Pill users are more active sexually than
non-users, and have more partners. That seems to be what makes the rate of
cervical cancer higher among pill users. d) No, see c).
7. A die is rolled 10 times. Find the chance of
a) getting 10 sixes. (3 pts)
b) not getting 10 sixes. (3 pts)
c) all the rolls showing 5 spots or less. (3 pts)
Answers: a) (1/6)^10 = 1/60,466,176. b) 1 − (1/6)^10 = 1 − 1/60,466,176 ≈ 1. c)
(5/6)^10 ≈ 0.16 = 16%.
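The three chances in question 7 follow from multiplying the per-roll chance over 10 independent rolls. A minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

rolls = 10

# (a) every roll is a six: chance 1/6 on each of the 10 independent rolls.
p_all_sixes = Fraction(1, 6) ** rolls

# (b) the opposite event: not getting 10 sixes.
p_not_all_sixes = 1 - p_all_sixes

# (c) every roll shows 5 spots or less: chance 5/6 on each roll.
p_all_five_or_less = Fraction(5, 6) ** rolls

print(p_all_sixes)
print(float(p_not_all_sixes))
print(round(float(p_all_five_or_less), 2))
```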
8. A die is weighted so that, when the die is thrown, a “6” is three times more
likely to occur than each of the other numbers.
a) Find the probability of a “6” and of each of the other numbers
occurring. (3 pts)
b) Find the probability of obtaining on a single throw an even number.
(3 pts)
Answers: a) If we denote with X = P(getting an ace) = P(getting a deuce) =...=
P(getting a five), then P(getting a 6) = 3 × X. Since the probabilities of all possible
outcomes should add up to 1 we have 5X + 3X = 1 ⇒ X = 1/8. Therefore,
P(getting an ace) = P(getting a deuce) =...= P(getting a five) = 1/8 and P(getting a
6) = 3/8.
b) P(getting an even number)=P(getting 2 or 4 or 6) and since these events are
mutually exclusive using the addition rule for probabilities we have
P(getting an even number) = 1/8 + 1/8 + 3/8 = 5/8 = 62.5%.
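The weighted-die calculation can be written out with exact fractions. This is a sketch of the same reasoning: each face gets a weight, and the weights are divided by their total so that the probabilities add up to 1. The names `weights` and `prob` are illustrative.

```python
from fractions import Fraction

# Faces 1-5 each count once; face 6 counts three times (question 8).
weights = {face: 1 for face in range(1, 6)}
weights[6] = 3
total = sum(weights.values())  # 5 + 3 = 8

# Divide each weight by the total so the probabilities add up to 1.
prob = {face: Fraction(w, total) for face, w in weights.items()}

# The faces 2, 4 and 6 are mutually exclusive, so their probabilities add.
p_even = sum(prob[face] for face in (2, 4, 6))

print(prob[1], prob[6], p_even)
```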