Name________________________________
Stat 1
November 16, 2007
Exam #2 – SOLUTIONS (Nov. 19)
You may use a calculator. You may not use other references or consult with
anyone except the instructor. You may answer on the test paper itself or use
your own paper. The exam ends at 11:20am by the front wall clock.
Explain your answers! Sometimes an explanation is necessary for credit on a
question; more often, it is necessary for partial credit.
This test contains 10 numbered problems on pages 2-7.
      n    sqrt(n)   1/sqrt(n)
     16      4.00      0.25
     32      5.66      0.18
     48      6.93      0.14
     64      8.00      0.13
     80      8.94      0.11
    100     10.00      0.10
    150     12.25      0.08
    200     14.14      0.07
    300     17.32      0.06
    400     20.00      0.05
    500     22.36      0.04
    600     24.49      0.04
    700     26.46      0.04
    800     28.28      0.04
    900     30.00      0.033
   1000     31.62      0.032
   1400     37.42      0.027
   2000     44.72      0.022
   3000     54.77      0.018
   5000     70.71      0.014
  10000    100.00      0.010
It’s always fun to show Pascal’s triangle.
row \ column   0    1    2    3    4    5    6
     0         1
     1         1    1
     2         1    2    1
     3         1    3    3    1
     4         1    4    6    4    1
     5         1    5   10   10    5    1
     6         1    6   15   20   15    6    1
   z     Percentile
 -2.8     0.0026
 -2.7     0.0035
 -2.6     0.0047
 -2.5     0.0062
 -2.4     0.0082
 -2.3     0.0107
 -2.2     0.0139
 -2.1     0.0179
 -2.0     0.0228
 -1.9     0.0287
 -1.8     0.0359
 -1.7     0.0446
 -1.6     0.0548
 -1.5     0.0668
 -1.4     0.0808
 -1.3     0.0968
 -1.2     0.1151
 -1.1     0.1357
 -1.0     0.1587
 -0.9     0.1841
 -0.8     0.2119
 -0.7     0.2420
 -0.6     0.2743
 -0.5     0.3085
 -0.4     0.3446
 -0.3     0.3821
 -0.2     0.4207
 -0.1     0.4602
  0.0     0.5000
  0.1     0.5398
  0.2     0.5793
  0.3     0.6179
  0.4     0.6554
  0.5     0.6915
  0.6     0.7257
  0.7     0.7580
  0.8     0.7881
  0.9     0.8159
  1.0     0.8413
  1.1     0.8643
  1.2     0.8849
  1.3     0.9032
  1.4     0.9192
  1.5     0.9332
  1.6     0.9452
  1.7     0.9554
  1.8     0.9641
  1.9     0.9713
  2.0     0.9772
  2.1     0.9821
  2.2     0.9861
  2.3     0.9893
  2.4     0.9918
  2.5     0.9938
  2.6     0.9953
  2.7     0.9965
  2.8     0.9974
1. Here is a data table from a (very small) survey of rural children.
interviewee   Height (inches)   Number of siblings   Distance to school (miles)
Annie               66                  0                      3.3
Bob                 65                  4                      0.7
Connie              55                  1                      1.5
Donald              58                  2                      4.5
Elsie               59                  2                      4.4
Frank               59                  0                      3.0
Gigi                62                  2                      1.0
Harold              56                  0                     82.0
The mean of the “daily commuting distance” variable is about 12.5, and the standard
deviation is about 28. Some people would consider these statistics to be poor
choices for describing the distribution.
a. Why would they think that?
Outliers have too much influence; values given do not describe bulk of values
b. Suggest alternative statistics:
Try median and IQR, or five-number summary, or compute mean and SD without
the outlier (but in that case, mention the outlier)
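As a quick check on the numbers quoted in this problem, here is a minimal Python sketch that recomputes the summary statistics from the distance column of the table. It assumes the quoted standard deviation is the sample (n - 1) version, since that is the one that comes out near 28; the variable names and the cutoff of 80 miles for "the outlier" are illustrative choices, not part of the exam.

```python
import statistics

# Distance to school (miles), copied from the data table in problem 1.
distances = [3.3, 0.7, 1.5, 4.5, 4.4, 3.0, 1.0, 82.0]

mean_all = statistics.mean(distances)      # pulled upward by Harold's 82.0
sd_all = statistics.stdev(distances)       # sample SD (divides by n - 1)
median_all = statistics.median(distances)  # resistant to the outlier

# Same statistics with the outlier set aside (it should be reported separately).
# The cutoff of 80 is an illustrative assumption.
trimmed = [d for d in distances if d < 80]
mean_trimmed = statistics.mean(trimmed)
sd_trimmed = statistics.stdev(trimmed)

print(f"all values:      mean={mean_all:.2f}  SD={sd_all:.1f}  median={median_all:.2f}")
print(f"without outlier: mean={mean_trimmed:.2f}  SD={sd_trimmed:.2f}")
```

The median sits among the seven ordinary values, while the mean and SD are dominated by the single 82-mile entry.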
2. The weights of melons shipped from a certain farm are normally distributed with these
statistics:
mean μ = 60 ounces
standard deviation σ = 5 ounces
What fraction of the weights are below 62 ounces? (Note table on front page.)
The standard score for a weight of 62 ounces is (62 – 60) / 5 = 0.40.
According to the table, 0.6554 of all normally-distributed values fall below
a standard score of 0.40. So, the answer is 65.54%.
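The table lookup can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

mean_oz, sd_oz = 60, 5              # from the problem statement
z = (62 - mean_oz) / sd_oz          # standard score of a 62-ounce melon
fraction_below = NormalDist(mean_oz, sd_oz).cdf(62)

print(f"z = {z:.2f}, fraction below 62 oz = {fraction_below:.4f}")
```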
3. A certain data table includes 100 values for a variable called “score.” Here is a histogram
for this variable.
(Note: Most of the values are not whole numbers, so there aren’t many values at the
boundaries of these intervals.)
The mean of this variable is 85.0. Choose one:
the median is less than 85.0
the median is more than 85.0
the median is equal to 85.0 (within rounding)
can’t tell from the information given
First: There are 100 observations, so the median should be between the 50th and
51st observations, counting from the top. From the histogram it appears
that there are 40 observations in the top two boxes, and 20 in the “85-90”
box, so the median should be somewhere in the middle of the 85-90 box.
(Might it be exactly 85, or within rounding? Unlikely, but possible, from the
information given.)
or, Second: There is a substantial skew to the left (that is, large left tail). These
extreme values will have more of an effect on the mean than the median,
pulling the mean to the left of the median. So if the mean is 85, the median
must be more than that.
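The counting argument in the first explanation can be sketched in a few lines of Python. Only the counts stated in the solution are used (40 observations in the top two boxes combined, 20 in the 85-90 box); the labels and variable names are illustrative.

```python
# Counts read off the histogram in the solution, from the highest bin down.
# The top two boxes are treated as one group because only their total is given.
counts_from_top = [("top two boxes", 40), ("85-90", 20)]
n = 100

# With 100 observations the median lies between the 50th and 51st values,
# so we look for the first bin whose running total reaches 51.
cumulative = 0
median_bin = None
for label, count in counts_from_top:
    cumulative += count
    if cumulative >= n // 2 + 1:
        median_bin = label
        break

print(median_bin)
```

Observations 41 through 60 (counting from the top) fall in the 85-90 box, so both the 50th and the 51st are there.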
4. In roulette, if you bet on “BLACK”, the probability of winning on each bet is 18/38 (or 9/19,
if you like). Each bet is independent.
a. If you bet twice, what is the probability that you win both bets?
9/19 times 9/19 equals 81/361.
b. If you bet twice, what is the probability that you will lose at least once?
(answer to b) = 1 – (answer to a) = 1 – 81/361 = 280 / 361.
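Both parts can be verified with exact fractions. This is a minimal sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

p_win = Fraction(18, 38)               # one bet on BLACK (reduces to 9/19)
p_win_both = p_win * p_win             # independent bets, so probabilities multiply
p_lose_at_least_once = 1 - p_win_both  # complement of "win both"

print(p_win_both, p_lose_at_least_once)
```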
5. Here is a probability model for your profit after two bets on “1 to 12”:
value of profit (dollars)    +4     +1     –2
probability                 0.10   0.43   0.47
What is the expected value of the profit?
(+4) times (0.10) = 0.40
(+1) times (0.43) = 0.43
(–2) times (0.47) = –0.94
total
= –0.11
(This is consistent with what we learned about roulette --- you lose an average of
5+ cents for each dollar you bet. Bet twice, lose an average of 10+ cents,
which rounds up to 11 cents.)
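The expected value is a probability-weighted sum, which the following sketch computes directly from the probability model given in the problem. The dictionary layout and names are illustrative.

```python
# Probability model from problem 5: profit in dollars -> probability.
model = {4: 0.10, 1: 0.43, -2: 0.47}

# Sanity check: the probabilities should total 1 (up to floating-point error).
total_probability = sum(model.values())

expected_profit = sum(value * prob for value, prob in model.items())
print(f"expected profit = {expected_profit:.2f} dollars")
```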
6. Here are two sequences of wins and losses with exactly 3 wins and 3 losses:
WWWLLL LWLWWL
In all, how many sequences of wins and losses are there with exactly 3 wins and 3 losses?
There are 20 of them. You can list them systematically, or look at row 6, column
3 of Pascal’s triangle, which is 20.
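Both approaches mentioned in the answer (the Pascal's triangle entry and the systematic listing) can be sketched in Python. `math.comb` gives the triangle entry, and `itertools.combinations` picks which of the six positions hold the wins; the names are illustrative.

```python
from itertools import combinations
from math import comb

n_bets, n_wins = 6, 3
count = comb(n_bets, n_wins)   # row 6, column 3 of Pascal's triangle

# Systematic listing: choose which positions hold the wins, fill the rest with losses.
sequences = [
    "".join("W" if i in wins else "L" for i in range(n_bets))
    for wins in combinations(range(n_bets), n_wins)
]

print(count, len(sequences))
print(sequences)
```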
7. Exactly 50% of Swarthmore students expect to leave campus by December 18.
But the 70 students in STAT 11 didn’t know that. So each of them did a poll of
Swarthmore students, each using sample size 100, to see what fraction would leave by
December 18. That meant that among them, they had 70 different estimates of the
fraction. Their estimates ranged from 34% to 64%.
a. What is the “margin of error” of a poll of size 100?
1/sqrt(100) = 1/10 = 0.10 = ten percentage points
(That’s why Gallup doesn’t stop with 100 interviews!)
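The rule of thumb used here, 1/sqrt(n), is easy to tabulate for other sample sizes. A minimal sketch follows; the function name `margin_of_error` is an illustrative choice.

```python
from math import sqrt

def margin_of_error(n):
    """Rule-of-thumb margin of error used on this exam: 1 / sqrt(n)."""
    return 1 / sqrt(n)

for n in (100, 400, 1000, 10000):
    print(f"n = {n:>5}: margin of error = {margin_of_error(n):.3f}")
```

Halving the margin of error takes four times as many interviews, which is the point of the remark about Gallup.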
b. (Roughly) what is the average of their estimates?
The average of the poll results would be exactly 50% in the long run.
For 70 of them the average would be close to 50%, although you couldn’t
predict the average exactly.
c. If the STAT 11 students drew a histogram of their estimates, what would be its shape?
Normal.
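Parts b and c can be explored with a small simulation. The sketch below assumes each respondent independently answers "yes" with probability 0.50 (reasonable when the sample is small relative to the population); the seed, bin width, and variable names are arbitrary illustrative choices. For reference, the theoretical SD of one poll's estimate is sqrt(0.5 × 0.5 / 100) = 0.05, so the 1/sqrt(n) margin of error corresponds to two SDs.

```python
import random
from statistics import mean, stdev

rng = random.Random(2007)   # fixed, arbitrary seed so the run is repeatable
true_fraction = 0.50        # given in the problem
n_students, sample_size = 70, 100

# Each student polls 100 people; each respondent says "yes" with probability 0.50.
estimates = [
    sum(rng.random() < true_fraction for _ in range(sample_size)) / sample_size
    for _ in range(n_students)
]

print(f"average of estimates = {mean(estimates):.3f}")
print(f"SD of estimates      = {stdev(estimates):.3f}")
print(f"range                = {min(estimates):.2f} to {max(estimates):.2f}")

# Crude text histogram (4-point bins) to eyeball the bell shape.
for lo in range(30, 70, 4):
    in_bin = sum(lo <= round(e * 100) < lo + 4 for e in estimates)
    print(f"{lo:2d}-{lo + 3:2d}% {'*' * in_bin}")
```

With only 70 polls the histogram is lumpy, but it should pile up near 50% and thin out toward the tails.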
8. Here are some scatterplots. Estimate the correlation coefficient ( r ) for each plot.
[Four scatterplots not reproduced; the answer for each plot follows.]
r = -1 (below -.9 is ok)
r = 0 (-.30 to +.30 is ok)
r = 0.80 (0.50 to 0.90 ok)
r = 0 (by symmetry, since there is no linear association)
9. Consider this scatterplot:
a. Draw a reasonable guess for the regression line. (Think about the slope. What are
you trying to accomplish with the line?)
b. Do you like this regression model? Why or why not?
No, it’s awful. The data are in two groups; any reasonable explanation or
prediction method would take this into account. The negative linear
relationship is entirely between groups, and misses the positive association
within each group.
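The effect described in this answer can be illustrated with a toy data set. The numbers below are hypothetical, made up for illustration, and are NOT the points from the exam's scatterplot; `pearson_r` is a small helper written from the definition of the correlation coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (not the exam's plot): two clusters, each trending upward.
group_a = ([1, 2, 3, 4], [10, 11, 12, 13])    # low x, high y
group_b = ([11, 12, 13, 14], [1, 2, 3, 4])    # high x, low y

r_a = pearson_r(*group_a)
r_b = pearson_r(*group_b)
r_all = pearson_r(group_a[0] + group_b[0], group_a[1] + group_b[1])

print(f"within A: {r_a:+.2f}, within B: {r_b:+.2f}, pooled: {r_all:+.2f}")
```

The pooled correlation is strongly negative even though each group on its own rises, so a single line fitted to all the points describes neither group.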
10. Here are some data. Well, actually I spilled a drink on the data, but you can still see the
summary statistics.
What is the regression line? Draw it and write the equation. Show scales on the axes.
             mean   std. dev.
CARS (x)       6        2
BOATS (y)      4        3
correlation coefficient: r = + 0.50
[Graph not reproduced; horizontal axis: cars, vertical axis: boats.]
The line has to go through the “point of means,” which is x = 6, y = 4.
Its slope is r = +0.50 when measured in standardized units --- that is, “y
standard-deviations per x standard deviation.” In natural units, the slope is
r times ( y std dev / x std dev )
= 0.50 times 3/2
= 0.75.
So the line must have equation
y = 0.75 x plus something
and since this is true when x = 6, y = 4, the “something” must be -0.50. So
the complete equation is
y = 0.75 x – 0.50.
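The same calculation, written as a short Python sketch that works from the summary statistics alone. The variable names and the `predict` helper are illustrative.

```python
mean_x, sd_x = 6, 2     # CARS (x)
mean_y, sd_y = 4, 3     # BOATS (y)
r = 0.50

slope = r * sd_y / sd_x                 # 0.50 times 3/2
intercept = mean_y - slope * mean_x     # forces the line through the point of means

def predict(x):
    """Predicted number of boats for a given number of cars."""
    return slope * x + intercept

print(f"slope = {slope}, intercept = {intercept}")
print(f"prediction at x = 6: {predict(6)}")
```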
(end of exam)