Stat 11
February 5, 2008
Homework #2 - SOLUTIONS
This homework is due at the start of class on the date due. You may work in groups, consult with others, or use any
references or tools that seem useful, but you must write up your own solutions.
1. Getting at the truth from a set of measurements. Consider the data for exercise 1.34
(page 37).
a. What is your best estimate of the earth’s density based on these measurements?
(You should make a dot plot or histogram, but you don’t have to turn it in.
Then you should pick your favorite statistic for this purpose --- mean, median,
some kind of trimmed mean, RMS or geometric mean, whatever --- or
make up your own approach.)
The median (5.46) and the mean (about 5.45) are both good choices.
b. What statistic did you use for part a?
So would be the midmean or about any other measure of location.
c. In class we considered 12 reported measurements of the boiling
point of seawater:
218.5, 214.0, 216.5, 218.0, 219.0, 0.0, 221.0, 220.0, 104.0, 218.5, 215.5, 216.0
(The sum of these 12 numbers is 2281.0.)
Based on these numbers, what is your best estimate of the
true boiling point?
My favorite method is to delete the outliers (0.0 and 104.0) --- in
part because we can see that they arose from something
other than ordinary measurement errors --- and then to take
the mean of what’s left. That gives 217.7. Any value above
about 215 is sensible.
d. Did you use the same method for parts a and c? If not, why not?
There’s no absolute rule for how to choose a central value for a
variable; it depends on the shape of the distribution, how the values
were obtained, and why you need a central value.
Here, there’s really no difference in method. We would have excluded
outliers in part a, too, if there had been any.
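The arithmetic in part c can be checked with a short Python sketch. The readings are the 12 values given in the problem; the cutoff of 200 used to flag the outliers is an assumption made for illustration, not a rule from the course.

```python
from statistics import mean, median

# The 12 reported boiling-point measurements from part c.
readings = [218.5, 214.0, 216.5, 218.0, 219.0, 0.0,
            221.0, 220.0, 104.0, 218.5, 215.5, 216.0]

# Assumed cutoff for illustration: anything below 200 is treated as an outlier.
CUTOFF = 200.0
kept = [x for x in readings if x >= CUTOFF]

total = sum(readings)          # sum of all 12 readings
raw_mean = mean(readings)      # dragged down by the two outliers
clean_mean = mean(kept)        # mean after deleting 0.0 and 104.0
raw_median = median(readings)  # median of all 12 readings, nothing deleted

print(total, round(raw_mean, 1), round(clean_mean, 1), raw_median)
```

The median of all 12 readings comes out at 217.25, close to the outlier-deleted mean of 217.7, which illustrates why a resistant statistic is also a reasonable choice here.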
2. Combining medians of groups. In class I remarked that you can’t usually determine the
median of a group from the medians of two subgroups. That’s true. But I also said that
it’s possible for the median score of the men in the class to be 50, and the median score of
the women in the class to be 50, but the median of the whole class to be something else.
Was that right? Do ONE of these:
a. Give an example of actual scores for a group of men and women for which
--- the median of the men is 50
--- the median of the women is 50
--- the median of the entire group is different from 50
OR
b. Explain why the median of the entire group would necessarily be 50.
For problem 2: The median of the combined group would necessarily be 50.
If half the women are below 50 and half the men are below 50, then
half of everybody must be below 50. Similarly above 50. (Does it
matter whether there are an odd or even number in each group? If
you analyze the cases carefully, you get the same result in each case.)
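For anyone who would rather test the claim than analyze the cases, here is a brute-force sketch. It assumes the usual convention that the median of an even-sized group is the average of the two middle values. The search space (scores that are multiples of 10, groups of 1 to 4 people) is an arbitrary choice for illustration, so a clean result is supporting evidence, not a proof.

```python
from itertools import combinations_with_replacement
from statistics import median

# Assumed search space for illustration: scores are multiples of 10
# from 0 to 100, and each group has 1 to 4 people.
SCORES = range(0, 101, 10)
MAX_SIZE = 4

# Every group (as a sorted tuple of scores) whose median is exactly 50.
groups = [g
          for size in range(1, MAX_SIZE + 1)
          for g in combinations_with_replacement(SCORES, size)
          if median(g) == 50]

# Pair every such "men" group with every such "women" group and
# collect any pair whose combined median is not 50.
counterexamples = [(m, w) for m in groups for w in groups
                   if median(m + w) != 50]

print(len(groups), "groups checked;", len(counterexamples), "counterexamples")
```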
3. For one measurement the scores of the men were 70, 50, 50, 90 and the scores of the
women were 30, 90, 80, 40, 60, 80, 60, 40.
a. What was the midmean of the men’s scores?
60 (average of the middle two values, 50 and 70)
b. What was the midmean of the women’s scores?
60 (average of the middle 4 values: 40, 60, 60, 80)
c. What was the midmean of all the scores combined?
61 2/3 (ave. of the middle 6 values: 50, 50, 60, 60, 70, 80)
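The three midmeans above can be reproduced with a small sketch. The function name `midmean` and the restriction to group sizes divisible by 4 are choices made for this example; other sizes need a convention for fractional observations that is not covered here.

```python
def midmean(values):
    """Mean of the middle half of the data (a 25% trimmed mean).

    Assumes len(values) is a multiple of 4, as in problem 3.
    """
    n = len(values)
    if n % 4 != 0:
        raise ValueError("this sketch only handles sizes divisible by 4")
    k = n // 4                        # observations dropped from each end
    middle = sorted(values)[k:n - k]  # the middle half
    return sum(middle) / len(middle)

men = [70, 50, 50, 90]
women = [30, 90, 80, 40, 60, 80, 60, 40]

print(midmean(men), midmean(women), midmean(men + women))
```

Note that the combined midmean (61 2/3) is not the average of the two group midmeans, even though both groups have midmean 60.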
4. Construct a 5-number summary and a Box plot for the combined scores in problem 3.
Five numbers: min, Q1, med, Q3, max = 30, ?, 60, 80, 90.
You could get away with anything from 40 to 50 for Q1, since if you count
off 3 observations from the bottom you end between these values.
The text’s method gives 45.
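A minimal sketch of the five-number summary for the combined scores follows. It assumes that the text's method takes each quartile as the median of the lower or upper half of the sorted data (leaving the middle observation out of both halves when the count is odd); that assumption reproduces the value 45 quoted above.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles are the medians of the lower and upper halves of the
    sorted data. This is assumed to be the text's method.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # observations below the median's position
    upper = data[n - half:]    # observations above the median's position
    return (data[0], median(lower), median(data), median(upper), data[-1])

combined = [70, 50, 50, 90, 30, 90, 80, 40, 60, 80, 60, 40]
print(five_number_summary(combined))
```

With this convention the lower quartile is 45 and the upper quartile is 80.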
5. What is the 40-th percentile for the combined scores in problem 3 ?
50
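One way to get this answer is the nearest-rank rule, sketched below: take the smallest score that has at least 40% of the data at or below it. The rule and the function name are assumptions for illustration; percentile conventions differ, and an interpolating rule can give a different value.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest data value with at least p percent of the data at or below it."""
    data = sorted(values)
    rank = math.ceil(p * len(data) / 100)  # 1-based position in sorted order
    return data[max(rank, 1) - 1]

combined = [70, 50, 50, 90, 30, 90, 80, 40, 60, 80, 60, 40]

# 40% of 12 observations is 4.8, so the rule picks the 5th smallest score.
print(percentile_nearest_rank(combined, 40))
```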
Normal distributions
6. Scores on a typical IQ test have mean 100, standard deviation 16. Assume that they are
normally distributed.
a. What fraction of the scores are between 84 and 116 ?
That’s one standard deviation above and below the mean, so the
answer is 68 %.
b. An article in Parade Magazine reported that Sharon Stone has an IQ of 160.
About what fraction of people taking this test would score at or above 160?
160 is 3.75 standard deviations above the mean
[ (160-100)/16 = 3.75 ] . But my table only goes up to 3.00, and the
book’s Table A only goes to 3.49, so you need to (a) guess or (b) use
Excel or a calculator to find Φ(3.75) = 0.999912. That’s the fraction
of scores below 160, so the answer to the question is 0.000088, or
about 88 per million.
c. Ginger’s score was at the 80-th percentile. What was her score ?
In the table, find the value z such that Φ(z) = 0.80. It’s 0.84
(closest approximation in either table), so the 80th percentile is 0.84
standard deviations above the mean. In this case, this means a score
of 100 + (0.84 times 16) = about 113.
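If no table reaches far enough, all three parts of problem 6 can be checked in Python. This sketch uses `NormalDist` from the standard `statistics` module (Python 3.8 or later); the variable names are made up for the example.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 16, assumed normal.
iq = NormalDist(mu=100, sigma=16)

# Part a: fraction of scores between 84 and 116 (within one sd of the mean).
within_one_sd = iq.cdf(116) - iq.cdf(84)

# Part b: z-score of 160 and the fraction of scores at or above it.
z_160 = (160 - 100) / 16
at_or_above_160 = 1 - iq.cdf(160)

# Part c: the score at the 80th percentile.
score_80th = iq.inv_cdf(0.80)

print(round(within_one_sd, 4), z_160, at_or_above_160, round(score_80th, 1))
```

The exact 80th percentile is a little above 113.4, slightly more than the table-based figure because the table rounds z to 0.84.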
You should be able to answer the next two problems from pictures and pure thought, without
calculation. Of course, you can calculate if you like.
7. Assume that X is normally distributed with mean 0.0 and standard deviation 5.0,
and Y is normally distributed with mean 0.0 and standard deviation 10.0. Which
variable has a larger fraction of its values above +1 ?
Y has a larger fraction above +1.
For X, we’re asking what
fraction is above 0.2 sd’s
above the mean. For Y, we’re asking what fraction is above 0.1 sd’s
above the mean. It’s easier to be 0.1 sd’s above than 0.2 sd’s above, so
the second answer must be larger.
8. Assume that X is normally distributed with mean 5.0 and standard deviation 10.0,
and Y is normally distributed with mean 10.0 and standard deviation 5.0. Which
variable has a larger fraction of its values above +20 ?
To be above +20, a Y value would have to be two standard deviations above
the mean. But an X value would only have to be 1 1/2 standard
deviations above the mean, so X has a larger fraction above +20.
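For those who would like to calculate, the comparisons in problems 7 and 8 can be confirmed numerically. The helper `fraction_above` is a hypothetical name introduced for this sketch; it relies on `NormalDist` from the standard `statistics` module.

```python
from statistics import NormalDist

def fraction_above(mean, sd, cutoff):
    """Fraction of a normal(mean, sd) variable that lies above cutoff."""
    return 1 - NormalDist(mean, sd).cdf(cutoff)

# Problem 7: both means are 0.0; the cutoff is +1.
p7_x = fraction_above(0.0, 5.0, 1.0)    # 0.2 sd above the mean
p7_y = fraction_above(0.0, 10.0, 1.0)   # 0.1 sd above the mean

# Problem 8: the cutoff is +20.
p8_x = fraction_above(5.0, 10.0, 20.0)  # 1.5 sd above the mean
p8_y = fraction_above(10.0, 5.0, 20.0)  # 2.0 sd above the mean

print(round(p7_x, 4), round(p7_y, 4))
print(round(p8_x, 4), round(p8_y, 4))
```

In both problems the variable whose cutoff sits fewer standard deviations above its mean has the larger fraction above the cutoff.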
Associations and Correlations
9. Draw what you think a scatterplot would look like for each of these three pairs of variables.
Label your axes.
a. Apples: weight in grams, weight in ounces.
The dots are exactly on a line.
b. College freshmen: reported shoe size, grade point
average. (Is shoe size bimodal? Does that show in the scatterplot?)
There might be one symmetrical blob or two side-by-side blobs, depending on how much women’s shoe sizes
overlap men’s.
c. Gasoline: days since your last fill-up, gallons remaining in your tank.
Negative association, maybe not very
linear, with some randomness added.
10. Can you reconstruct the distributions of both variables from a scatterplot?
[Scatterplot: BOATS (vertical axis, about 6 to 7) versus CARS (horizontal axis, about 2.75 to 4.25)]
a. In this scatterplot, what are the minimum and maximum values for the
CARS variable?
Minimum --- about 3.05
Maximum --- about 4.00
b. Can you reconstruct the entire 5-number summary for the BOATS variable?
(That is --- min, Q1, median, Q3, max.)
(all values approx.)
Min = 6.05
Q1 = 6.25
Median = 6.52
Q3 = 6.6
Max = 6.81
(Note on the plot: Q1 is here because 5 of 20 BOAT values are below this line.)
(end)