Homework 8 – Solutions
Chapter 5D
Review Questions.
6. What is an exponential scale? When is an exponential scale useful?
An exponential scale is one in which each unit corresponds to a power of 10. Such
scales are useful for displaying data that vary over a huge range of values.
14. Education and Earnings. Examine Figure 5.15, which shows the unemployment rate
and the median weekly earnings for eight different levels of education.
a. Briefly describe how earnings vary with educational attainment.
Earnings increase with level of education.
b. Briefly describe how unemployment varies with educational attainment.
The unemployment rate decreases with level of education.
c. What is the percentage increase in weekly earnings when a professional degree is
compared to a bachelor’s degree?
($1522 − $978)/$978 ≈ 56%
d. How much more likely is a high school dropout to be unemployed than a worker
with a bachelor’s degree?
A high school dropout is 9.0/2.8 ≈ 3.2 times as likely to be unemployed as
someone with a bachelor’s degree.
e. On average, people spend about 45 years in the work force before retiring. Based
on the data in Figure 5.15, how much more would the average college graduate
(bachelor’s degree) earn during these 45 years than the average high school graduate?
In 45 years, the average college graduate would earn 52 × 45 × ($978 − $591) =
$905,580 more than the average high school graduate.
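The arithmetic in parts c through e can be checked with a short script. This is only a sketch: the variable names are illustrative, and the figures are the ones quoted in the solutions above as read from Figure 5.15.

```python
# Values quoted in the solutions above (read from Figure 5.15).
professional_weekly = 1522     # median weekly earnings, professional degree ($)
bachelors_weekly = 978         # median weekly earnings, bachelor's degree ($)
high_school_weekly = 591       # median weekly earnings, high school graduate ($)
dropout_unemployment = 9.0     # unemployment rate, high school dropout (%)
bachelors_unemployment = 2.8   # unemployment rate, bachelor's degree (%)

# Part c: percentage increase relative to the bachelor's figure.
pct_increase = (professional_weekly - bachelors_weekly) / bachelors_weekly * 100

# Part d: ratio of the two unemployment rates.
unemployment_ratio = dropout_unemployment / bachelors_unemployment

# Part e: 52 weeks per year over a 45-year career.
career_difference = 52 * 45 * (bachelors_weekly - high_school_weekly)

print(f"{pct_increase:.0f}%")           # 56%
print(f"{unemployment_ratio:.1f}")      # 3.2
print(f"${career_difference:,}")        # $905,580
```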
18. Federal Spending. Figure 5.31 shows the major spending categories of the federal
budget over the last 50 years. (Payments to individuals includes Social Security and
Medicare; net interest represents interest payments on the national debt; all other
represents non-defense discretionary spending.)
Interpret the stack plot and discuss some of the trends it reveals.
a. Find the percentage of the budget that went to net interest in 1990, 1995, and 2005.
About 15% of the budget went to net interest in 1990 and 1995; it dropped to
about 8% in 2005.
b. Find the percentage of the budget that went to defense in 1960, 1980, and 2005.
In 1960, about 52% of the budget went to defense; it dropped to about 23% in
1980, and to about 20% in 2005.
c. Find the percentage of the budget that went to payments to individuals in 1980,
2000, and 2005.
Payments to individuals was about 47% of the budget in 1980, 57% in 2000, and
about 54% in 2005.
27. Comparing Earnings. Figure 5.40 compares the average weekly earnings of men and
women. Identify any misleading aspects of the display. Draw the display in a fairer
way.
The graph does not start at zero, making it appear
that women earn about a quarter of what men earn.
A fairer graph would start from zero.
[Redrawn bar graph: average weekly earnings for men and women, with the vertical axis running from $0 to $900 in $100 steps.]
28. Braking Distances. Figure 5.41 shows the braking distance for three different cars.
Discuss the ways in which it might be deceptive. How much greater is the braking
distance of a Lincoln than the braking distance of a Lexus? Draw the display in a
fairer way.
As presented in Figure 5.41, the Lincoln superficially appears to have a braking
distance about twice that of the Lexus because the x-scale starts at 170 feet. In fact,
the difference is only about (207 − 187) = 20 feet, or (207 − 187)/187 ≈ 10.7%. A
fairer way of representing the data would start the graph at zero:
[Redrawn horizontal bar graph: braking distance (feet) for the Lincoln, Saab, and Lexus, with the axis running from 0 to 200.]
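The effect of the truncated axis can be put into numbers. The sketch below uses the 207-foot and 187-foot distances and the 170-foot axis start from the solution; the variable names are illustrative.

```python
# Distances and axis start as quoted in the solution to exercise 28.
lincoln_ft = 207
lexus_ft = 187
axis_start_ft = 170

# Actual difference, in feet and as a percentage of the Lexus distance.
difference_ft = lincoln_ft - lexus_ft
percent_greater = difference_ft / lexus_ft * 100

# Ratio of the drawn bar lengths when the axis starts at 170 feet.
apparent_ratio = (lincoln_ft - axis_start_ft) / (lexus_ft - axis_start_ft)

print(difference_ft)                # 20
print(round(percent_greater, 1))    # 10.7
print(round(apparent_ratio, 1))     # 2.2
```

The drawn Lincoln bar is roughly 2.2 times as long as the Lexus bar even though the real distance is only about 10.7% greater, which is why the display looks like a factor of two.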
29. Cell Phone Users. The following table shows the number of cell phone subscribers
in the United States for selected years between 1990 and 2007. Display the data
using both an ordinary vertical scale and an exponential vertical scale. (Hint: For the
exponential scale, use tick marks at 1 million, 10 million, and 100 million.) Which
graph is more useful? Why?
Year    Subscribers (millions)
1990      5.3
1995     33.8
1997     55.3
1998     69.2
1999     86.0
2000    109.5
2001    128.3
2002    140.8
2003    158.7
2007    255.0
[Two graphs of subscribers (millions) versus year, 1990 to 2005: one with an ordinary vertical scale (ticks at 50, 100, 150, 200, 250) and one with an exponential vertical scale (ticks at 1, 10, 100).]
Either scale has its uses. The ordinary scale shows steadily increasing growth, while the
exponential scale illustrates that this growth is somewhat less than exponential.
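One way to see what the exponential scale is doing is to compute the base-10 logarithm of each value, which is the height at which the point is drawn, along with the average yearly growth factor between table entries. This is a sketch using only the table data; the variable names are illustrative.

```python
import math

# Data copied from the table in exercise 29.
years = [1990, 1995, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2007]
subscribers_millions = [5.3, 33.8, 55.3, 69.2, 86.0, 109.5, 128.3, 140.8, 158.7, 255.0]

# Height of each point on an exponential (log base 10) axis:
# 1 million -> 0, 10 million -> 1, 100 million -> 2.
log_heights = [math.log10(s) for s in subscribers_millions]

# Average yearly growth factor over each interval between table entries.
growth_factors = []
for i in range(1, len(years)):
    span = years[i] - years[i - 1]
    ratio = subscribers_millions[i] / subscribers_millions[i - 1]
    growth_factors.append(ratio ** (1 / span))

for year, height in zip(years, log_heights):
    print(year, round(height, 2))
print([round(g, 2) for g in growth_factors])
```

Purely exponential growth would give a constant yearly growth factor and a straight line on the exponential scale. Here the factor falls from about 1.45 per year in the early 1990s to about 1.13 per year by 2003 to 2007, so the curve flattens.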
Chapter 6A
Does it make sense? Decide whether each of the following statements makes sense (or is
clearly true) or does not make sense (or is clearly false). Explain your reasoning.
11. The distribution of grades was left-skewed, but the mean, median, and mode were all
the same.
Does not make sense. If the mean, median, and mode are the same, the distribution
should be symmetric.
Mean, Median, and Mode. Compute the mean, median, and mode of the following data
sets.
14. Body temperatures (in degrees Fahrenheit) of randomly selected normal and healthy
adults:
98.6 98.6 98.0 98.0 99.0
98.4 98.4 98.4 98.4 98.6
The mean, median, and mode are 98.44, 98.4, and 98.4, respectively.
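These three values can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the list name is illustrative.

```python
from statistics import mean, median, mode

# Body temperatures from exercise 14, in degrees Fahrenheit.
temperatures = [98.6, 98.6, 98.0, 98.0, 99.0,
                98.4, 98.4, 98.4, 98.4, 98.6]

temp_mean = round(mean(temperatures), 2)   # rounded to absorb floating-point noise
temp_median = median(temperatures)         # average of the two middle values
temp_mode = mode(temperatures)             # most frequent value

print(temp_mean, temp_median, temp_mode)
```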
20. Margin of Victory. The data set below gives the margin of victory in the NFL Super Bowl games for 2002-2009.
3 12 11 3 3 27 3 27
a. Find the mean and median margin of victory.
The mean is 11.125 points and the median is 7 points.
b. Identify the outliers in the set. If you eliminate the outliers on the high side, what
are the new mean and median?
The outliers are the two 27-point margins. After eliminating them, the mean is
35/6 ≈ 5.83 points and the median is 3 points.
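The effect of the high-side outliers can be checked directly. The sketch below assumes the outliers are the two 27-point margins, which is the choice that reproduces the stated values of 5.83 and 3; the names are illustrative.

```python
from statistics import mean, median

# Margins of victory from exercise 20.
margins = [3, 12, 11, 3, 3, 27, 3, 27]

full_mean = mean(margins)
full_median = median(margins)

# Assumption: the high-side outliers are the two 27-point games.
trimmed = [m for m in margins if m < 27]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(full_mean, full_median)                  # 11.125 7.0
print(round(trimmed_mean, 2), trimmed_median)  # 5.83 3.0
```

Removing two values out of eight cuts the mean almost in half, while the median moves from 7 to 3. The mean's sensitivity to outliers is the reason the median is often preferred for skewed data.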
Approximate Average. State, with an explanation, whether the mean, median, or mode
gives the best description of the following averages.
23. The average number of times that people change jobs during their careers.
The distribution is probably right-skewed by a few people who change jobs frequently,
so the median will give a better description.
Describing Distributions. Consider the following distributions.
a. How many peaks would you expect from the distribution? Explain.
b. Would you expect the distribution to be symmetric, left-skewed, or right-skewed?
Explain.
c. Would you expect the variation of the distribution to be small, moderate, or large?
Explain.
27. The exam scores for 100 students when 40 students got an F, 25 students got a D, and
20 students got a C.
a. There would be one peak on the far left (F’s).
b. The distribution would be right-skewed because the scores trail off to the right.
c. The variation is large.
32. The weights of cars at a dealership at which about half of the inventory consists of
compact cars and half of the inventory consists of sport utility vehicles.
a. There would likely be two peaks, one for compact cars, and one for sport utility
vehicles.
b. The distribution would be symmetric.
c. The variation would be moderate because, although the difference in weight between
compact cars and sport utility vehicles is large, the differences among compact cars
or among sport utility vehicles tend to be small.
Smooth Distributions. Through each histogram, draw a smooth curve that captures its
important features. Then classify the distribution according to its number of peaks,
symmetry or skewness, and variation.
35. Times between 300 eruptions of Old Faithful geyser in Yellowstone National Park,
shown in Figure 6.6.
[Histogram: Times Between Eruptions of Old Faithful. Horizontal axis: time (minutes), 50 to 110. Vertical axis: frequency, 0 to 60.]
The distribution has two peaks (i.e., it is bimodal), with no symmetry and large
variation.
36. Time until failure for a sample of 108 computer chips that failed, shown in Figure 6.7.
The distribution has one peak and is
right-skewed. Although most of the data
is clustered around its peak, the distribution has considerable spread, so it has
moderate variation.
[Histogram: Failure Time of Computer Chips. Horizontal axis: time (months), 2 to 12. Vertical axis: frequency, 0 to 50.]
39. Family Income. Suppose you study family income in a random sample of 300 families.
You find that the mean family income is $55,000; the median is $45,000; and the highest
and lowest incomes are $250,000 and $2400, respectively.
a. Draw a rough sketch of the income distribution, with clearly labeled axes. Describe
the distribution as symmetric, left-skewed, or right-skewed.
Sketches will vary, but, because the mean is larger than the median and because
there are large outliers, the distribution is likely right-skewed with a single peak
at the mode.
b. How many families in the sample earned less than $45,000? Explain how you know.
About 150 families (50%) earned less than $45,000 because that value is the
median income.
c. Based on the given information, can you determine how many families earned more
than $55,000? Why or why not?
Other than to say that it is fewer than half, we don’t have enough information to
determine how many families earned more than $55,000.
Chapter 6B
Does it make sense? Decide whether each of the following statements makes sense (or is
clearly true) or does not make sense (or is clearly false). Explain your reasoning.
9. For the 30 students who took the test, the high score was 80, the median was 74, and
the low score was 40.
Makes sense. Supposing half the students scored 74 or better, this is entirely possible.
12. The mean gas mileage of the compact cars we tested was 34 miles per gallon, with a
standard deviation of 5 gallons.
Does not make sense. The standard deviation and mean should have the same units.
Comparing Variations. Consider the following data sets.
a. Find the mean, median, and range for each of the two data sets.
b. Give the five-number summary and draw a boxplot for each of the two data sets.
c. Find the standard deviation for each of the two data sets.
d. Apply the range rule of thumb to estimate the standard deviation of each of the
two data sets. How well does the rule work in each case? Briefly discuss why it
does or does not work well.
e. Based on all your results, compare and discuss the two data sets in terms of center
and variation.
15. The table below gives the cost of living index for six East Coast cities and six West
Coast cities (using the ACCRA index, where 100 represents the average cost of living
for all participating cities with a population over 1.5 million).
East Coast Cities            West Coast Cities
Atlanta           98.2       Los Angeles      155.8
Baltimore        108.7       Portland         113.2
Boston           135.4       San Diego        144.8
Miami            111.5       San Francisco    182.4
New York City    216.0       San Jose         156.0
Washington, DC   140.0       Seattle          122.7
a. For the East Coast the mean, median, and range are 134.97, 123.45, and 117.8,
respectively; and, for the West Coast they are 145.82, 150.3, and 69.2, respectively.
b. For the East Coast the five-number summary is (98.2, 108.7, 123.45, 140.0, 216.0),
while for the West Coast, it is (113.2, 122.7, 150.3, 156.0, 182.4). The boxplots
are then:
[Boxplots for the East Coast and West Coast on a common axis running from 80 to 220.]
c. The standard deviation is 42.86 for the East Coast, and 25.06 for the West Coast.
d. For the East Coast, the approximate standard deviation is 117.8/4 = 29.45, which is
a far cry from the true value of 42.86, largely because of New York. On the West
Coast, the approximate value is 69.2/4 = 17.3, which is also fairly inaccurate.
e. The cost of living is lower on average, though more varied, on the East Coast.
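The values in parts a through d can be reproduced with the standard library. This is a sketch under two assumptions that match the quoted results: quartiles are taken as the medians of the lower and upper halves of the sorted data, and the standard deviation is the sample version (dividing by n − 1). The helper `five_number_summary` is illustrative, not a library function.

```python
from statistics import mean, median, stdev

east = [98.2, 108.7, 135.4, 111.5, 216.0, 140.0]
west = [155.8, 113.2, 144.8, 182.4, 156.0, 122.7]

def five_number_summary(values):
    """Return (low, lower quartile, median, upper quartile, high).

    Quartiles are the medians of the lower and upper halves; for an
    odd number of values the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])

def range_rule_estimate(values):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (max(values) - min(values)) / 4

for name, data in (("East", east), ("West", west)):
    summary = tuple(round(x, 2) for x in five_number_summary(data))
    print(name,
          round(mean(data), 2),                 # mean
          summary,                              # five-number summary
          round(stdev(data), 2),                # sample standard deviation
          round(range_rule_estimate(data), 2))  # range rule estimate
```

For both coasts the range rule underestimates the computed standard deviation, by about 13 points in the East and about 8 in the West.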
Understanding Variation. The following exercises give four data sets consisting of seven
numbers.
a. Make a histogram for each set.
b. Give the five-number summary and draw a boxplot for each set.
c. Compute the standard deviation for each set.
d. Based on your results, briefly explain how the standard deviation provides a useful
single-number summary of the variation in these data sets.
20. The following sets of numbers all have a mean of 6:
{6,6,6,6,6,6,6},{5,5,6,6,6,7,7},
{5,5,5,6,7,7,7},{3,3,3,6,9,9,9}
a. [Histograms of the four data sets.]
b. The five number summaries for each of the sets are (in order): (6,6,6,6,6);
(5,5,6,7,7); (5,5,6,7,7); (3,3,6,9,9). The boxplots are:
[Boxplots for Sets 1 through 4 on a common axis running from 0 to 8.]
c. The standard deviations for the sets are (in order): 0.000, 0.816, 1.000, and 3.000.
d. Looking at part c, we can see that the variation increases with each successive set.
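The standard deviations in part c can be verified as follows. The quoted values match the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes; the population version, `statistics.pstdev`, would give smaller numbers. The list name is illustrative.

```python
from statistics import mean, stdev

# The four data sets from exercise 20.
data_sets = [
    [6, 6, 6, 6, 6, 6, 6],
    [5, 5, 6, 6, 6, 7, 7],
    [5, 5, 5, 6, 7, 7, 7],
    [3, 3, 3, 6, 9, 9, 9],
]

means = [mean(s) for s in data_sets]
# Sample standard deviation of each set, rounded to three places.
deviations = [round(stdev(s), 3) for s in data_sets]

print(means)        # [6, 6, 6, 6]
print(deviations)   # [0.0, 0.816, 1.0, 3.0]
```

All four sets share the same mean, so the standard deviation alone captures how far the values spread out from 6.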
23. Portfolio Standard Deviation. The book Investments by Zvi Bodie, Alex Kane,
and Alan Marcus claims that the returns for investment portfolios with a single stock
have a standard deviation of 0.55, while the returns for portfolios with 32 stocks have
a standard deviation of 0.325. Explain how the standard deviation measures the risk
in these two types of portfolios.
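A lower standard deviation means that returns vary less around their expected value,
which suggests more certainty in the expected return and a lower risk. The 32-stock
portfolios (standard deviation 0.325) are therefore less risky than the single-stock
portfolios (standard deviation 0.55).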