Describing Distributions Problems Part2 Answer Key
1. Determine if any of the data sets contain outliers.
Cars99 (five-number summaries)
              CityMPG  HwyMPG  FuelCapacity  Acc030  Acc060  QtrMile
S1 = min      17       23      10.3          2.4     5.6     14.1
S2 = Q1       19       26      14.5          3.3     8.8     16.8
S3 = median   20.5     28.5    16.2          3.5     9.5     17.4
S4 = Q3       23       31      18.45         3.9     10.9    18.2
S5 = max      30       38      23.7          4.5     12.5    19.1
CityMPG
IQR = 23 – 19 = 4
1.5(4) = 6
[19 – 6, 23 + 6] = [13, 29]
Since 30 is larger than 29 there is at least one outlier on the upper end
HwyMPG
IQR = 31 – 26 = 5
1.5(5) = 7.5
[26 – 7.5, 31 + 7.5] = [18.5, 38.5]
Since all the numbers are within this interval there are no outliers
FuelCapacity
IQR = 18.45 – 14.5 = 3.95
1.5(3.95) = 5.925
[14.5 – 5.925, 18.45 + 5.925] = [8.575, 24.375]
Since all the numbers are within this interval there are no outliers
Acc030
IQR = 3.9 – 3.3 = 0.6
1.5(0.6) = 0.9
[3.3 – 0.9, 3.9 + 0.9] = [2.4, 4.8]
Since all the numbers are within this interval there are no outliers
Acc060
IQR = 10.9 – 8.8 = 2.1
1.5(2.1) = 3.15
[8.8 - 3.15, 10.9 + 3.15] = [5.65, 14.05]
Since 5.6 is smaller than 5.65 there is at least one outlier on the lower end
QtrMile
IQR = 18.2 – 16.8 = 1.4
1.5(1.4) = 2.1
[16.8 – 2.1, 18.2 + 2.1] = [14.7, 20.3]
Since 14.1 is smaller than 14.7 there is at least one outlier on the lower end
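The 1.5 × IQR check applied to each variable above can be sketched in Python. This is a minimal illustration rather than part of the original answer key: the names `iqr_fences` and `outlier_sides` and the dictionary layout are assumptions made for the example, and the numbers are the Cars99 five-number summaries from the table. `Decimal` is used so that boundary cases such as Acc030 (minimum exactly on the lower fence) are not disturbed by floating-point rounding.

```python
from decimal import Decimal

# Five-number summaries (min, Q1, median, Q3, max) copied from the Cars99 table.
CARS99 = {
    "CityMPG": ("17", "19", "20.5", "23", "30"),
    "HwyMPG": ("23", "26", "28.5", "31", "38"),
    "FuelCapacity": ("10.3", "14.5", "16.2", "18.45", "23.7"),
    "Acc030": ("2.4", "3.3", "3.5", "3.9", "4.5"),
    "Acc060": ("5.6", "8.8", "9.5", "10.9", "12.5"),
    "QtrMile": ("14.1", "16.8", "17.4", "18.2", "19.1"),
}


def iqr_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    step = Decimal("1.5") * (q3 - q1)
    return q1 - step, q3 + step


def outlier_sides(summary):
    """Report which ends of a five-number summary fall outside the fences."""
    minimum, q1, _median, q3, maximum = (Decimal(v) for v in summary)
    lower, upper = iqr_fences(q1, q3)
    sides = []
    if minimum < lower:
        sides.append("low")
    if maximum > upper:
        sides.append("high")
    return sides


for name, summary in CARS99.items():
    print(name, outlier_sides(summary) or "no outliers")
```

A five-number summary only shows whether the minimum or maximum is an outlier, which is why the answers say "at least one outlier" rather than giving a count.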
2. Effect of changing values on Center and Spread
a. Create a box plot and stem-and-leaf plot for Bonds Homeruns.
b. Compute summary statistics for Bonds Homeruns.
c. Multiply each original value by 3 and re-compute. What changed?
d. Add 2 to each original value and re-compute. What changed?
e. Change Bonds' highest number of homeruns to 100 and re-compute. What
changed?
f. What conclusion can be made from your results in parts (c) – (e)?
Bonds_Home_Runs
5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34, 34, 37, 37, 40, 42, 45, 45, 46, 46, 49, 73
[Stem-and-leaf plot and box plot of Bonds_Home_Runs (Collection 2); box plot axis from 10 to 80]
When you add the same number to, or subtract it from, every value of
a data set, the measures of center (mean and median) change by the
amount added or subtracted. However, the measures of spread (Range,
IQR, and Standard Deviation) are unaffected.
When you multiply or divide every value of a data set by the same
number, both the measures of center and the measures of spread are
multiplied or divided by that number.
When you adjust high or low extreme values, keeping
them as high or low extreme values, the measures that
remain the same (Q1, Median, Q3, and IQR) are called
Resistant Statistical Measures, and the measures that are
altered (Mean, Standard Deviation, and Range) are
called Non-Resistant Statistical Measures.
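These three conclusions can be checked numerically on the Bonds data. The sketch below is an illustration, not the worksheet's own method: the helper name `summarize` is made up for the example, and it assumes the "exclusive" quartile convention of Python's `statistics.quantiles`, which may differ slightly from the software used for the original key.

```python
import statistics

BONDS = [5, 16, 19, 24, 25, 25, 26, 28, 33, 33, 34,
         34, 37, 37, 40, 42, 45, 45, 46, 46, 49, 73]


def summarize(values):
    """Return measures of center and spread for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


original = summarize(BONDS)
tripled = summarize([3 * x for x in BONDS])    # part (c)
shifted = summarize([x + 2 for x in BONDS])    # part (d)
# Part (e): replace the single largest value (73) with 100.
extreme = summarize(sorted(BONDS)[:-1] + [100])

for label, stats in [("original", original), ("times 3", tripled),
                     ("plus 2", shifted), ("max -> 100", extreme)]:
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

Comparing the printed rows shows the pattern described above: the shifted data changes only the mean and median, the tripled data scales every measure, and the changed maximum leaves the median and IQR alone while moving the mean, standard deviation, and range.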
3. Comparing Highway MPG for Car Types
a. Which car type tends to get the best fuel efficiency? Second Best? Worst?
Small is Best, Family is Second, and Luxury is Worst
b. Which car type has the most variability? Least variability?
Sports is most variable and Upscale is least variable
c. Compare the family and large vehicle types.
Family and large are both symmetric data sets. Family cars have a
smaller spread with an IQR of 2 compared to Large cars with an IQR
of 4. The center for Family cars is bigger at 29 mpg compared to
Large cars at 27 mpg.
I would conclude that Family cars are more fuel-efficient than Large
cars. This is because Family cars have a larger center and a tighter
spread making the points focus around the center. Also more than
75% of Family cars are above 27mpg compared to only 50% of Large.
d. Compare the large and upscale vehicle types.
Large cars are symmetric and Upscale cars are skewed left. The
medians for each are about the same at 27 mpg. The spread for Large
cars is bigger, with an IQR of 4, than the spread for
Upscale cars, with an IQR of 2. Large cars are more fuel efficient
because 25% of Large cars get above 29 mpg and none of the Upscale
cars get above 29 mpg.
[Boxplot of hwy mpg vs type: hwy mpg on the vertical axis (25 to 40); types are family, large, luxury, small, sports, and upscale]
4. The other day I reached into my penny jar and pulled out 1000 pennies. I then
recorded the age of each penny. A graph of my findings is shown below.
Describe the distribution seen below. Also state the relationship between the mean
and median.
[Histogram (1000Pennies): Age on the horizontal axis, 0 to 70; vertical axis from 0 to 140]
The graph is skewed to the right, the
median age is around 12.5 years old,
and the ages of the pennies have a
range of about 60 years. There is a
cluster around 2.5 years old and
there appears to be an outlier at 60
years old. Since the graph is skewed
right the mean is greater than the
median.
5. Below is a graph of the number of losses for National League teams during the
1999 season. Describe the distribution seen below. Also state the relationship
between the mean and median.
[Box plot (NLPayroll): Losses, axis from 60 to 100]
The graph is skewed to the left, the
median number of losses is about 85
games, and the numbers of losses
have a range of 40 games with an
IQR of 30 games. There is a cluster
centered on 87. There are no
apparent outliers in the data set.
Since the graph is skewed left the
mean is less than the median.
6. The graph below shows Skull measurements of 150 male Egyptian skulls from 5
different time periods. Describe the distribution seen below. Also state the
relationship between the mean and median.
[Dot plot (Collection 1): BL, axis from 80 to 120]
The graph is symmetric, the median
skull measurement is 95, and the
measurements have a range of 35.
There is a cluster around 95 and
there appear to be two outliers, one
at 81 and one at 115. Since the graph
is symmetric the mean and median
are about the same.
7. The five number summary of a data set is (17, 27, 35, 49, 90)
a) Are there any outliers in this data set? Explain. (Show all work)
IQR = 49 – 27 = 22
1.5(IQR) = 1.5(22) = 33
[27 – 33, 49 + 33] = [-6, 82]
Outliers exist outside [-6, 82]. Since the maximum, 90, lies outside
this interval, there is at least one outlier.
b) Is the mean less than, equal to, or greater than the median? Why?
Max – Median = 55
Median – Min = 18
Since the right side is more stretched out the graph is skewed
right, therefore the mean > median
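The comparison of the two tails used in part (b) can be written as a small helper. This is only a sketch of the rule of thumb applied in this answer key; the function name `skew_hint` is made up for the example, and a five-number summary does not determine the mean, so the result is an expectation rather than a guarantee.

```python
def skew_hint(minimum, q1, median, q3, maximum):
    """Guess the skew direction by comparing the distances from the median
    to the minimum and to the maximum (q1 and q3 are accepted only so the
    whole five-number summary can be passed in)."""
    left_tail = median - minimum
    right_tail = maximum - median
    if right_tail > left_tail:
        return "skewed right, so mean > median is expected"
    if left_tail > right_tail:
        return "skewed left, so mean < median is expected"
    return "roughly symmetric, so mean and median should be close"


# Five-number summary from this problem: (17, 27, 35, 49, 90)
print(skew_hint(17, 27, 35, 49, 90))
```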
c) If the maximum is changed from 90 to 60, what will happen to the following
statistics?
(i) Mean (Decrease)
(ii) Median (Stay the Same)
(iii) Standard Deviation (Decrease)
(iv) Interquartile range (Stay the Same)
d) If 7 is added to every value in the data set, what will happen to the
following statistics?
(i) Mean (Increase by 7)
(ii) Median (Increase by 7)
(iii) Standard Deviation (Stay the Same)
(iv) Interquartile range (Stay the Same)
e) If every value in the data set is multiplied by 4, what will happen to the
following statistics?
(i) Mean (Multiplied by 4)
(ii) Median (Multiplied by 4)
(iii) Standard Deviation (Multiplied by 4)
(iv) Interquartile range (Multiplied by 4)
8. The statistics below summarize the money earned, in millions of dollars, during the
first and second weekends for movies that opened in 1999.
                 Weekend 1  Weekend 2
Minimum          1          2
Lower Quartile   23         17
Median           34         23
Upper Quartile   43         26
Maximum          66         51
a. Construct parallel boxplots using the given data. Show your plots on the grid
below.
[Parallel box plots: Weekend 1 (1, 23, 34, 43, 66) and Weekend 2 (2, 17, 23, 26, 51); axis from 0 to 70 in steps of 5]
b. Compare the distribution of the two data sets (Use IQR for Spread).
The graph for Weekend 1 is symmetric and the graph for Weekend 2 is
skewed right. The median for Weekend 2 is 23 million, which is
smaller than the median for Weekend 1 at 34 million. The spread for
Weekend 2 is smaller, with an IQR of 9 million, than the spread for
Weekend 1, with an IQR of 20 million.
c. Based on the boxplots, what conclusion can you make? Explain.
Since Weekend 2 has a smaller center and a smaller spread, making
the values focus around the center, I would conclude that movies
make less money during their second weekend of release than during
their first weekend of release.
Since the upper 75% of the movies during Weekend 1 make more
than or equal to the upper 50% of movies during Weekend 2, I would
conclude that movies make more money during Weekend 1.
11. Two identical footballs, one air-filled and one helium-filled, were used outdoors
on a windless day at The Ohio State University's athletic complex. Each football
was kicked 39 times and the two footballs were alternated with each kick. The
experimenter recorded the distance traveled by each ball.
Air
15, 16, 18, 19, 20, 20, 22, 22, 22, 23, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26,
27, 27, 27, 28, 28, 28, 28, 28, 28, 29, 29, 29, 31, 31, 31, 32, 33, 34, 35
Helium
11, 12, 14, 14, 16, 22, 22, 23, 23, 24, 25, 25, 25, 26, 26, 26, 26, 26, 27, 28,
28, 28, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 32, 32, 33, 34, 35, 39
a. Find the five number summary for each and determine if either data set
contains outliers.
          Air  Helium
Min       15   11
Q1        23   24
Median    26   28
Q3        29   30
Max       35   39
Air
IQR = 29 – 23 = 6
1.5(6) = 9
[23 – 9, 29 + 9] = [14, 38]
There are no outliers
Helium
IQR = 30 – 24 = 6
1.5(6) = 9
[24 – 9, 30 + 9] = [15, 39]
11, 12, and 14 are outliers
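The five-number summaries and outlier checks above can be reproduced from the raw distances. The following is a hedged sketch, not the method used to produce the key: the function names are made up for the example, and it assumes the "exclusive" quartile method of `statistics.quantiles`, which for these 39-value data sets lands on the same quartiles as the key.

```python
import statistics

AIR = [15, 16, 18, 19, 20, 20, 22, 22, 22, 23, 24, 24, 25, 25, 25, 25, 25,
       26, 26, 26, 27, 27, 27, 28, 28, 28, 28, 28, 28, 29, 29, 29, 31, 31,
       31, 32, 33, 34, 35]
HELIUM = [11, 12, 14, 14, 16, 22, 22, 23, 23, 24, 25, 25, 25, 26, 26, 26,
          26, 26, 27, 28, 28, 28, 29, 29, 29, 29, 29, 30, 30, 30, 30, 31,
          31, 32, 32, 33, 34, 35, 39]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)


def outliers(values):
    """Return every value lying outside the 1.5 x IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - step or v > q3 + step]


for name, data in [("Air", AIR), ("Helium", HELIUM)]:
    print(name, five_number_summary(data), outliers(data))
```

Note that the Helium maximum of 39 sits exactly on the upper fence, so it is not flagged, and the value 14 appears twice among the low outliers.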
b. Create side-by-side box plots and back-to-back stem-and-leaf plots for
each. For the boxplots properly indicate outliers.
[Side-by-side box plots (Collection 1): Air and Helium, axis from 0 to 40]
                 Air |   | Helium
                     | 1 | 1244
                9865 | 1 | 6
            44322200 | 2 | 22334
99988888877766655555 | 2 | 55566666788899999
              432111 | 3 | 0000112234
                   5 | 3 | 59
14. The five-number summary for the weights (in pounds) of fish caught in a bass
tournament is:
2.3 2.8 3.0 3.3 4.5
a. Would you expect the mean weight of all fish caught to be higher or
lower than the median? Explain.
Median – Min = 3 – 2.3 = 0.7
Max – Median = 4.5 – 3 = 1.5
Since the right side is more stretched out, the graph would be
skewed right, making the mean greater than the median.
b. You caught 3 bass weighing 2.3 pounds, 3.9 pounds, and 4.2 pounds.
Were any of your fish outliers? Explain.
IQR = 3.3 – 2.8 = 0.5
1.5(0.5) = 0.75
[2.8 – 0.75, 3.3 + 0.75] = [2.05, 4.05]
Since 4.2 is larger than 4.05, the 4.2-pound fish would be an outlier.
c. If the maximum weight was recorded incorrectly and it was actually
5.4, instead of 4.5, what statistical measures would change and which ones
would stay the same?
Mean, Standard Deviation, and Range would increase.
Median, Q1, Q3, and IQR would stay the same.
d. It was discovered that the scale used to measure the weights was
overweighing by 1 pound. How would this affect the measures
of center and how would it affect the measures of spread?
Every recorded weight is 1 pound too high, so the recorded measures
of center, mean and median, are 1 pound higher than the true values.
The measures of spread, IQR and Standard Deviation, stay
the same.
e. If the data was converted into kilograms, how would this affect the
measures of center and how would this affect the measures of spread?
1 pound = 0.454 kilograms. Therefore the measures of both center and
spread would be multiplied by 0.454.
15. Here are the weekly payrolls for two imaginary restaurants, Mooseburgers and
McTofu.
a. Create parallel boxplots. Label your graph clearly.
[Parallel box plots (Collection 1): weekly payrolls for Mooseburgers and McTofu, axis from 100 to 350]
b. Write a few sentences comparing the distributions.
McTofu is skewed right, whereas Mooseburgers is
symmetric. McTofu has a median of 120, which is smaller than
Mooseburgers’ center of 134. Both McTofu and
Mooseburgers have an IQR of about 20. McTofu has an outlier at 360
but Mooseburgers does not contain any outliers. The box plots do
not indicate any clusters.
c. Which restaurant pays the higher average salary?
McTofu has a higher average because of the extreme outlier pulling
the mean up
d. Why is the mean salary misleading?
Since skewness and outliers can alter the mean, it is not always a
good value to use when describing center.
e. At which restaurant would you rather work? Give a sound statistical
justification for your decision.
I would choose to work at Mooseburgers because 75% of the workers
make more than $120 compared to only 50% of McTofu’s workers. I
have a better chance of making more than $120 at Mooseburgers
than at McTofu’s