Download Problem Set Section 2.1 Frequency Distributions Multiple Births: The

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Regression toward the mean wikipedia , lookup

Time series wikipedia , lookup

Transcript
Problem Set Section 2.1
Frequency Distributions
1. Multiple Births: The following data represent the number of live multiple births (three
or more babies) in 2002 for women 15 to 44 years old.
Age
15 - 19
20 - 24
25 - 29
30 - 34
35 - 39
40 - 44
Multiple Births
93
(f)
511
1,628
2,832
1,843
377
a. Identify the class width
b. Construct a table to include the frequency distribution, class midpoints, relative
frequency distribution, cumulative frequency distribution, and cumulative
relative frequency distribution.
2. Serum HDL: A doctor randomly selects 40 of his 20 to 29 year old patient and obtains
the following data regarding their serum HDL cholesterol:
70
56
48
48
53
52
66
48
36
49
28
35
58
62
45
60
38
73
45
51
56
51
46
39
56
32
44
60
51
44
63
50
46
69
53
70
33
54
55
52
With the first class having a lower class limit of 20 and a class width of 10, construct a
frequency distribution and relative frequency distribution. Would using a class width of
5 instead of 10 be a better summary of the data? Why?
1
3. Head Circumferences: Refer to the HEADS Data Set. Construct a frequency distribution
for the head circumferences of baby boys and construct a separate frequency
distribution for the head circumferences of baby girls. In both cases, use the classes of
34.0-35.9, 36.0-37.9, and so on. Compare the results and determine whether there
appears to be a significant difference between the two genders.
2
Problem Set Section 2.2
Displaying Data
1. Interpreting a frequency histogram: The following histogram represents the ages of
students in chemistry classes for a particular college.
a. What is the class width?
b. What percentage of the 78 students were younger than 30 years?
c. What is the approximate value of the center?
d. Describe the shape of the distribution as bell shaped or skewed.
25
20
15
10
5
0
0
20
40
60
2. Interpreting Pie Chart:
a. What is the approximate percentage of people with Group A blood?
b. Assuming that the pie chart is based on a sample of 500 people, approximately
how many of those 500 people have Group A blood?
Group O
Group AB
Group A
Group B
3
3. Multiple Births: Use the frequency distribution from #1 in Problem Set 2.1.
a. Construct a histogram corresponding to the given distribution.
b. Add a frequency polygon to the histogram.
c. Construct a cumulative frequency histogram.
d. Add a cumulative frequency polygon (Ogive) to the cumulative frequency
histogram.
4. Government Income: In a recent year the federal government’s income was $1,782.3
billion. The various sources of income are broken down in the following table.
Sources
Individual income taxes
Corporate income taxes
Social insurance taxes
Excise, estate and gift
taxes, customer, and
misc.
a.
b.
c.
d.
Amount $B
793.7
131.8
713
143.8
Construct a Pareto chart that corresponds to the given data.
Construct a relative frequency distribution.
What percentage of income came from corporate income taxes?
Use the results from part b to construct a pie chart that corresponds to the given
data.
5. Per capita GNP: Refer to the Per Capita GNP ($US) for Western Africa.
4
a. List the original data represented by the given stem-and-leaf plot.
b. Construct the dot plot for the data represented by the stem-and-leaf plot.
Hint: use 100, 200, ….., etc.
c. Interpret the data by analyzing the distribution.
6. Rates of Return: Refer to the three-year rates of return for mutual funds.
5.37
4.31
4.13
8.58
3.06 14.48
12.5
8.33
2.34
0.97
8.33
8.89
0.05 13.88
3.71 10.07
2.27 11.91 11.69 12.06
5.99
10.1
6.07
9.88
9.84
7.9
8.21
6.5
4.93
7.75
9.11
6.11
6.83 10.94
5.99
9.38
6.38 10.34
2.86
6.68
a. Construct the stem-and-leaf plot for the data. Hint: First round the data to
the nearest tenth. Then use the tenths place as the leaves.
b. Interpret your results by examining the distribution of data.
7. Oscars: Below are the ages of actors and actresses at the time they won Oscars.
a. Construct a back-to-back stem-and-leaf plot for the given data.
b. Use the results from part (a), compare the two different sets of data, and
interpret any differences in distribution.
Actors:
32 37 36 32 51 53 33 61 35 45 55 39 46
76 37 42 40 32 60 38 56 48 48 40 43 40
62 43 42 44 41 56 39 46 31 47 45 60 36
Actresses:
50 44 35 80 26 28 41 21 61 38 49 33 74
30 33 41 31 35 41 42 37 26 34 34 35 26
61 60 34 24 30 37 31 27 39 34 26 25 33
5
Problem Set Section 2.3
Measures of the Center
Exercises 1- 5, find the a) mean, b) median, c) mode, and d) midrange.
1. Volume of ACME stock group: The volume of a stock is the number of shares traded on
a given day. The following data represent the volume of stock traded for a random
sample of 25 trading days in 2007. The data are in millions.
3.78
6.06
5.32
3.04
10.32
8.74
5.75
3.25
5.64
3.38
4.35
5.34
3.78
5.00
7.25
5.02
6.92
7.57
7.16
6.52
8.40
6.23
6.07
4.88
4.43
2. Drunk Driving: The blood alcohol concentrations of a sample of drivers involved in fatal
crashes and then convicted with jail sentences are given below (based on data from the
U.S. Department of Justice). Given that current state laws prohibit driving with levels
above 0.08 or 0.10, does it appear that these levels are significantly above the maximum
that is allowed?
0.27
0.12
0.17
0.16
0.17
0.21
0.16
0.17
0.13
0.18
0.24
0.29
0.24
0.14
0.16
3. Customer Waiting Times: Waiting times (in minutes) of customers at the Green Valley
Bank and the Bank of Big Spenders. Does there appear to be a difference between the
two data sets that is not apparent from a comparison of the measures of center. If so,
what is it?
GVB
Spenders
6.5
4.2
6.6
5.4
6.7
5.8
6.8
6.2
7.1
6.7
7.3
7.7
7.4
7.7
7.7
8.5
7.7
9.3
7.7
10.0
4. Regular Pepsi / Diet Pepsi: Weights (pounds) of samples of the contents in cans of
regular Pepsi and Diet Pepsi. Compare the two sets of data. Can you explain the
difference?
Regular
Diet
0.8192
0.7773
0.8150
0.7758
0.8163
0.7896
0.8211
0.7868
0.8181
0.7844
0.8247
0.7861
5. Head Circumferences: Refer to the raw data (not the frequency distribution) from
Section 2.1 problem #3.
a. Compare the two sets of data. Does there appear to be a difference between
the two genders?
6
b. Compare the mean and median. Do you think the distribution is approximately
symmetric or is it skewed? Based on your answer, which measure better
describes the center of the distribution? Why?
6. Multiple Births: Refer to the data from problem #1 in section 2.1. Calculate the mean.
Set up a table. Use a formula. Show set up.
7. Weights of 6th graders: The following data represent the weights of 6th graders at
Johnson Middle School. Calculate the mean using a calculator. Copy the table and add
a column for the midpoint.
Weights
58.3 - 61.3
61.3 - 64.3
64.3 - 67.3
67.3 - 70.3
70.3 - 73.3
73.3 - 76.3
76.3 - 79.3
79.3 - 82.3
82.3 - 85.3
85.3 - 88.3
Frequency
4
8
12
13
21
15
12
9
4
2
7
Problem Set Section 2.4
Measures of Dispersion
Exercises 1- 5, find the a) range, b) variance, and c) standard deviation.
1. Volume of ACME stock group: Refer to the data used in problem #1 in section 2.3.
Use the mean found in section 2.3 and the standard deviation to find the boundaries
for days when the volume was unusual. Are there any days when the volume of
stock being traded was unusual?
2. Drunk Driving: Refer to the data used in problem #2 in section 2.3. When the state
wages a campaign to “reduce drunk driving”, is the campaign intended to lower the
standard deviation?
3. Customer Waiting Times: Refer to the data in problem #3 in section 2.3. Compare
the two sets of data. Does the standard deviation validate your conclusion for this
problem in section 2.3? Explain.
4. A Fish Story: Fred and Ted went on a 10-day fishing trip. The number of trout
caught and released by the two boys each day was as follows.
Fred
Ted
9
15
24
2
8
3
9
18
5
20
8
1
9
17
10
2
8
19
10
3
Compare the range and standard deviation. Discuss limitations of the range as a
measure of dispersion.
5. Head Circumferences: Refer to the data in problem #3 in section 2.1. Does there
appear to be a difference between the two genders?
6. Which investment is better? You have $5000 to invest. You decide to invest the
money in the stock market and have narrowed you investment options down to two
mutual funds. The following data represent the historical quarterly rates of return
on each mutual fund for the past 20 quarters (5 years). Which has the best return?
Which has the most risk (volatility)? Which would you invest in and why?
Fund A
Fund B
1.3
5.2
7.3
6.4
-0.3
4.8
8.6
1.9
0.6
2.4
3.4
-0.5
6.8
3.0
3.8
-2.3
5.0
1.8
-1.3
3.1
-5.4 6.7 11.9
3.5 10.5 2.9
-6.7 1.4 8.9
-4.7 -1.1 3.4
4.3 4.3
3.8 5.9
0.3 -2.4
7.7 12.9
7. Multiple Births: Refer to problem #1 in section 2.1. Find the standard deviation of
the summarized data.
8. Weights of 6th graders: Refer to problem #7 in section 2.3. Find the standard
deviation of the summarized data.
8
9. Drunk Driving: Use the mean from problem #2 in section 2.3 and problem #2 in this
section to calculate the boundaries for unusual blood alcohol levels. Based on the
data would a blood alcohol level of 0.24 be unusual?
10. Empirical rule: The weights, in grams, of a pair of kidneys in adult males between
the ages of 40 and 50 has a bell-shaped distribution with a mean of 325 grams and a
standard deviation of 30 grams. Justify your response.
a. About 95% of kidneys will be between what weights?
b. What percentage of kidneys weights between 235 and 415 grams?
c. What percentage of kidneys weights more than 355 grams or less than
295 grams.
11. Chebyshev’s Inequality: In December 2004, the average price of regular unleaded
gasoline per gallon excluding taxes in the United States was $1.37 with a standard
deviation of $0.05. Using Chebyshev’s Inequality what can you conclude about the
percentage of gasoline stations with a price of a gallon of gasoline between
a. $1.52 and $1.22? Justify.
b. $1.47 and $1.27? Justify.
9
Problem Set Section 2.5
Measures of Position
1. Z - Scores. According to the American Freshman the number of hours per week that
college freshman spend studying has a mean of 7.06 hours with a standard deviation
of 2.32 hours. Suppose Sally Simplestudent spends 2 hours per week studying.
a. What is the difference between the time Sally spends studying and
the mean?
b. How many standard deviations is that?
c. Convert Sally’s time to a z score.
d. If we consider an “unusual” amount of time studying to be that which
converts to z scores greater than 2 or less than -2, is Sally’s time spent
studying unusual?
2. Z Scores. Graduating Math and Science Center students have a mean ACT score of
29. Calculate the z-score for their mean relative to the national mean of 21.0 and
standard deviation of 4.7. Is the ACT score of 29 unusual?
3. Z Scores. Graduating Math and Science Center students have a mean SAT score of
1320. Calculate the z-score for their mean relative to the national mean of 1016 and
standard deviation of 157. Is the SAT score of 1320 unusual?
4. Find value. A national achievement test is administered annually to 3rd graders. The
test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20,
what was her score on the test?
5. Find value. Heights of women have a mean of 63.6 inches with a standard deviation
of 2.5 inches. Julia Roberts has a height that converts to a z score of 2.16. How tall
is Julia?
6. Volume of ACME stock group: Refer to the data and results from problem #1 in
section 2.3 and section 2.4.
a. Find the z-score for the day when 10.32 million shares were traded.
Was this level unusual?
b. Find the z-score for the day when 6.07 million shares were traded.
Was this level unusual?
7. SAT Scores: The follow are results from 5 students who took the SAT test.
960 1280 1020 960 1150
a. Find the mean and standard deviation.
b. Find the z score for each student.
c. Find the mean and standard deviation of the z scores.
d. In general, what can you conclude about the mean and standard
deviation of a set of z scores?
10
8. Compare Z – Scores. The average 20 to 29-year-old man is 69.6 inches tall with a
standard deviation of 2.7 inches, while the average 20 to 29-year-old woman is 64.1
inches tall, with a standard deviation of 2.6 inches. Show z-score calculation in each
case.
a. Who is relatively taller, a 75-inch man or a 70-inch woman?
b. Who is relatively taller, a 68-inch man or a 62-inch woman?
9. Compare Z – Scores. Three students take equivalent statistics tests. Which has the
highest relative score? Show z-score calculation in each case.
a. A score of 144 on a test with a mean of 128 and a standard deviation
of 34.
b. A score of 90 on a test with a mean of 86 and a standard deviation of
18.
c. A score of 18 on a test with a mean of 15 and a standard deviation of
5.
In Exercises 10 – 13, use the eruption duration for Old Faithful (OLDFAITHFUL) listed in
Datasets. Find the percentile corresponding to the given eruption duration. Show
setup.
10. 134
11. 240
12. 276
13. 120
In Exercises 14 – 21, use the eruption duration for Old Faithful (OLDFAITHFUL) listed in
Datasets. Find the value corresponding to the given eruption duration. Show setup.
14. P60
15. P18
16. Q1
17. Q3
18. P85
19. P35
20. P57
21. P93
11
22. Basketball Scores. Suppose the following table represents the outcomes of a
college basketball team (Iowa) in one particular season. Use points for.
School
Home or
Away
Points
for
Points
Against
NORTHWESTERN
H
72
55
PENN STATE
A
69
57
PURDUE
A
59
56
WISCONSIN
H
78
53
OHIO STATE
H
76
62
MICHIGAN
A
71
79
MINNESOTA
A
51
66
ILLINOIS
H
82
65
INDIANA
H
75
67
ILLINOIS
A
51
66
MICHIGAN STATE
A
67
69
MINNESOTA
H
66
68
MICHIGAN
H
80
75
OHIO STATE
A
69
56
WISCONSIN
A
48
49
PURDUE
H
84
62
PENN STATE
H
81
55
NORTHWESTERN
A
75
59
a. Find the percentile for the score of 67.
b. Find Q1 - the 25th percentile (or first quartile).
c. Find the 50th percentile of this data.
12
Problem Set Section 2.6
Bivariate Data Analysis
For each set of bivariate data given below
a)
b)
c)
d)
e)
Plot a scatter diagram (you may use excel)
Find the correlation coefficient (r)
Find the equation of the line of best fit
Plot the line on the scatter diagram (you may use excel)
Find the predicted value if asked
1.
x
1
2
2
5
6
y
2
5
4
15
15
2. Blood Pressure Measurements: Fourteen different second-year medical students took
blood pressure measurements of the same patient and the results are listed below.
Systolic
(x)
138 130 135 140 120 125 120 130 130 144 143 140 130 150
Diastolic
(y)
82
91 100 100 80
90
80
80
80
98 105 85
70 100
Find the best predicted diastolic blood pressure for a person with a systolic reading of 122.
3. Smoking and Nicotine: When nicotine is absorbed by the body, cotinine is produced. A
measurement of cotinine in the body is therefore a good indicator of how much a person
smokes. Listed below are the reported numbers of cigarettes smoked per day and the
measured amounts of cotinine.
Cigarettes (x) 60
Cotinine (y)
179
10
4
15
10
1
20
8
7
10
10
20
283
75
174
209
10
350
2
43
25 408 344
Find the best predicted level of cotinine for a person who smokes 40 cigarettes per day.
Based on the correlation coefficient, is the number of cigarettes smoked useful in
predicting the amount of cotinine in the body?
13
Selected Answers:
Section 2.1
Section 2.2
Section 2.3
1a) mean = 5.768
1b) median =5.64
1c) mode = 3.78
1d) midrange = 6.68
2a) mean = 0.0187
2b) median =
2c) mode = 0.17
2d) midrange =
3a) see slides
4) Regular Pepsi: mean = 0.81907; median = 0.81865; mode = none; midrange = 0.81985
Diet Pepsi: mean = 0.78333; median = 0.78525; mode = none; midrange = 0.78270
6) mean = 72.25
Section 2.4
2a) range = 0.17; σ = .051195; σ2= .00262
11b) at least 75%
7) 5.2
9) no
10a) 265 & 385
Section 2.5
1a) 5.06
1b) 2.18 or -2.18
1c) -2.18
1d) yes
2) 1.7, no
4) 118
6a) 2.50 yes
6b) 0.165 no
7a) 1074, 138.9
7b)-0.82, 1.48, -0.39, -0.82, 0.55 7d) µ=0; σ=1
8a) 70” woman
8b) 68” man
9a) 0.47
9b) 0.22
9c) .6
10) P20=134
12) P98=276
14) P60=241
16) Q1=203
18) P85=270
20) P57=241
22a) P28
22b)Q1=66
22c)P50=71
Section 2.6
1b) 0.985
1c) y=2.862x-.957
2b) 0.658
2c) y=0.769x-14.380
2e) 79.438
3b) 0.262
3c) y=2.479x-139.013
3e) no predictions should be made based off of this data. Correlation coefficient is too low.
14