HOMEWORK 4
Due: Feb.5
1. Suppose the scores on an achievement test follow an approximately symmetric, mound-shaped distribution with mean 500, min = 350, and max = 650. Which of the following is the
most likely value of the standard deviation?
10
50
100
150
Justify your answer. A sketch may help.
By the standard deviation rule, nearly all values in a mound-shaped distribution fall within about three sd’s of the
mean, so the min and max should lie about three sd’s from the mean on each side.
Since (650 – 500) ÷ 3 = 50 and (500 – 350) ÷ 3 = 50, the most likely value for the sd is 50.
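The arithmetic above can be checked with a short Python sketch. The function name `estimate_sd_from_range` and the variable names are illustrative only, and the three-sd rule of thumb is assumed to apply because the distribution is described as mound-shaped.

```python
def estimate_sd_from_range(mean, lowest, highest, k=3):
    """Rough sd estimate, assuming min and max sit about k sd's from the mean."""
    below = (mean - lowest) / k   # sd implied by the distance down to the min
    above = (highest - mean) / k  # sd implied by the distance up to the max
    return (below + above) / 2

# The four answer choices from the problem.
candidates = [10, 50, 100, 150]

estimate = estimate_sd_from_range(500, 350, 650)
# Pick the answer choice closest to the rough estimate.
best = min(candidates, key=lambda c: abs(c - estimate))

print(estimate, best)  # 50.0 50
```

This is only a rule-of-thumb estimate: it narrows down the answer choices but does not compute the actual standard deviation, which would require the individual scores.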
2. What if the diameters of a sample of new tires coming off one production line turned out to
have a standard deviation of 0? Would the manufacturer be happy or unhappy, assuming the
average diameter was correct? Explain.
Very happy. The only way that the standard deviation could be 0 is if all of the tires had exactly
the same diameter, which is the consistency of product that a manufacturer would hope for.
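A small Python sketch illustrates the point. The diameters below are made-up values chosen for illustration, not data from the problem.

```python
import statistics

# Hypothetical diameters (in inches): every tire is exactly the same.
identical = [25.0] * 8
# Hypothetical diameters with the same average but some variation.
varied = [24.8, 25.2, 25.0, 24.9, 25.1, 25.0, 25.3, 24.7]

sd_identical = statistics.stdev(identical)
sd_varied = statistics.stdev(varied)

print(sd_identical)   # 0.0
print(sd_varied > 0)  # True
```

Every deviation from the mean is 0 in the first list, so the sd is 0. Any variation at all, as in the second list, makes the sd positive.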
3. For each of the following cases, indicate which would give a better summary of the data: the
five-number summary (min, Q1, median, Q3, max) or the mean and standard deviation?
Explain your choice.
a. Verbal SAT scores for 205 students entering a local college in the fall of 2002:
Since the distribution is mound-shaped and roughly symmetric, the mean and standard deviation
would do a good job of summarizing the distribution. We would not need the five-number summary
for this data set.
b. Recent sales prices of homes in a local neighborhood (in thousands of dollars):
[Dot plot of home sale prices, in thousands of dollars; horizontal axis marked from 200 to 2400 in steps of 200]
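Here the distribution is strongly skewed, so the five-number summary would be better: the median and
quartiles are not pulled toward the long tail the way the mean and standard deviation are.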
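To see how the two summaries behave on skewed data, here is a minimal Python sketch. The prices are hypothetical, since the values in the original dot plot cannot be recovered. The helper `five_number_summary` is an illustrative name, and the "inclusive" quartile method is one of several conventions, so a textbook or calculator may give slightly different quartiles.

```python
import statistics

# Hypothetical sale prices in thousands of dollars (not the homework data):
# most homes are moderately priced, and a few are very expensive.
prices = [210, 240, 260, 280, 300, 320, 350, 400, 450, 1200, 2400]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the inclusive quartile method."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

summary = five_number_summary(prices)
mean = statistics.mean(prices)
sd = statistics.stdev(prices)

print("five-number summary:", summary)
print("mean:", round(mean, 1), "sd:", round(sd, 1))
```

With these made-up prices, the two expensive homes pull the mean well above the median, while the median and quartiles still describe where most of the prices lie.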
5. Extra Credit: I had four sets of data. I have the variables, and I made a histogram and a
boxplot for each, and found the summary statistics for each. Somehow they got mixed up.
(Match each variable to the correct histogram, boxplot, and summary statistics).
Variables: A. Age at death of a sample of people
B. Heights of a class of college students
C. Number of medals won by medal-winning countries in the 2008 Olympics
D. Random numbers between 0 and 9 generated by a computer
Histograms: [four panels labeled a, b, c, d]
Boxplots: [four panels labeled i, ii, iii, iv]
Summary statistics (each row goes with one set of data):
        Mean     Median   Standard deviation   IQR
SS1     72.82    75       15.51                20
SS2     6.07     3        8.91                 4
SS3     4.1      4        2.808                4
SS4     67.8     68       4.22                 6.5
First match the variables with the histograms.
A. Age at death of a sample of people
Most people die at an older age, around 70 or 80; few die at a younger age, and very few die at a
very young age. Thus, we can expect a left-skewed distribution, and that would be histogram c.
B. Heights of a class of college students
The distribution of heights of college students is usually fairly mound-shaped and symmetric.
Most students are close to average height, a few are somewhat taller or shorter, and very few are
very tall or very short. Histogram b shows a roughly symmetric, mound-shaped distribution.
C. Number of medals won by medal-winning countries in the 2008 Olympics
Most of the medal-winning countries got one or two medals, and fewer countries got four or more.
Very few countries got many medals. Thus, we can expect a right-skewed distribution, and that
would be histogram d.
D. Random numbers between 0 and 9 generated by a computer
Since each number between 0 and 9 has the same chance of coming up, we can expect a fairly
uniform distribution. That would be histogram a.
Now match the histograms to the boxplots:
Boxplot i definitely goes with histogram d, showing that the distribution is highly skewed to the right.
Boxplot iii shows that the distribution is skewed to the left, with one outlier, and that matches up with
histogram c.
Boxplots ii and iv are a bit trickier since they are very similar. But the distribution of the heights (histogram
b) shows a little skewness to the left, just like boxplot ii. Boxplot iv shows a symmetric distribution,
maybe a tiny bit skewed to the right, and that matches up with histogram a.
Now let’s match the summary statistics with the variables:
SS3 shows that the mean and the median are almost the same, meaning that the distribution is
almost symmetric. Same for SS4.
We had two fairly symmetric distributions: the random numbers and the heights. Now it doesn’t
make sense to say that the mean height of college students is 4.1, so it must be 67.8. Thus, SS4 goes
with the heights, and SS3 goes with the random numbers.
We have two more variables left, age at death and medals won, and two more summaries, SS1
and SS2. Again, it doesn’t make sense to say that the mean age at death was 6.07, so SS2 must
belong to the variable “medals won”, and SS1, with a mean of 72.82, must belong to the variable
“age at death” (as this mean makes sense for age at death).
Also, you can compare the mean and the median again. For SS1, median > mean, which usually
means that the distribution is left-skewed, and that’s “age at death”.
For SS2, mean > median, which usually means that the distribution is right-skewed, so it must belong to
the variable “medals won”.
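The mean-versus-median comparison can be written as a small Python sketch using the four summaries from the problem. The function name `skew_hint` and the cutoff `tolerance=0.1` (a gap of less than one tenth of an sd counts as "almost the same") are assumptions made for illustration, not part of the homework. The comparison is only a rule of thumb for the direction of skew.

```python
# (mean, median, standard deviation) copied from the problem.
summaries = {
    "SS1": (72.82, 75, 15.51),
    "SS2": (6.07, 3, 8.91),
    "SS3": (4.1, 4, 2.808),
    "SS4": (67.8, 68, 4.22),
}

def skew_hint(mean, median, sd, tolerance=0.1):
    """Guess the direction of skew from the gap between mean and median.

    The gap is measured in sd units; the tolerance is an arbitrary cutoff.
    """
    gap = (mean - median) / sd
    if abs(gap) < tolerance:
        return "roughly symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"

hints = {name: skew_hint(*stats) for name, stats in summaries.items()}
for name, hint in hints.items():
    print(name, hint)
# SS1 left-skewed
# SS2 right-skewed
# SS3 roughly symmetric
# SS4 roughly symmetric
```

These hints agree with the reasoning above: SS1 fits the left-skewed "age at death", SS2 fits the right-skewed "medals won", and SS3 and SS4 fit the two roughly symmetric variables.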
So, here’s the solution:
A/c/iii/SS1
B/b/ii/SS4
C/d/i/SS2
D/a/iv/SS3