Stats Ch 4 Mrs Warners Homework

4.1 The mean is x̄ = $2118.71 and the median (the middle value) is $1688. The mean is larger than the
median because the data set is positively skewed. There are two very high values that bring up the
mean. The median is a better representation of a typical value for this data set.
4.6.
a. The data is represented in a dotplot below. Since the distribution is not symmetrical, the median
would be the preferred measure of center.
b. The ordered data (n = 25) are:
0 0 0 0 0 0 0 59 71 83 106 130 142 142 165 177 189 189 189 201 212 224 236 236 306
Trimming three values from each end (three 0s from the low end; 236, 236 and 306 from the high end)
gives a trimming percentage of (3/25) × 100 = 12%.
The mean of the remaining 19 values is 119.947.
c. You would need to trim 7 values from each side: (7/25) × 100 = 28%. Removing all of the zeros would
give an unrepresentative measure of center.
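The trimmed mean in 4.6 b can be checked with a short sketch. The helper name `trimmed_mean` is made up for this illustration; it simply drops the k smallest and k largest values before averaging.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values (illustrative helper)."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# The 25 ordered values from 4.6 b.
data = [0] * 7 + [59, 71, 83, 106, 130, 142, 142, 165, 177, 189, 189, 189,
                  201, 212, 224, 236, 236, 306]

k = 3
trim_pct = 100 * k / len(data)       # trimming percentage per side
result = round(trimmed_mean(data, k), 3)
print(trim_pct, result)              # 12.0 119.947
```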
4.10
a. Since the mean is greater than the median in both cases, both distributions are positively skewed,
meaning that there are a few very large values that bring the mean up. The difference between the
median and the mean is greater for Bypass surgery, which indicates that the positive skew is larger for
that surgery. It seems that there are a few patients who wait a long time for the surgery.
b. The median wait time is the 50% completion date. That will always be less than the 90% completion
date.
4.12 It depends on what the word "average" means. If "average" is median, then this statement is not
possible. But if "average" is mean, then it is possible. We know that wage distributions are positively
skewed, with a few very large values, so the mean is higher than the median. So it makes sense that
more than 50% of people will earn less than the mean.
4.14 a. p̂ = 7/10 = .7
b. x̄ = 7/10 = .7. The sample proportion of successes is equal to the mean.
c. p̂ = .8 = x/25, so x = 20, and you need 13 more successes.
4.15 The median can be calculated by putting the values in order:
170 290 350 480 570 790 860 920 1000+ 1000+
The median is the average of the middle two values: x̃ = (570 + 790)/2 = 680 hours.
We cannot calculate a mean because two values are recorded only as 1000+, but we could calculate a
trimmed mean by trimming two values from each end. The 20% trimmed mean is 661.667 hours.
4.16 19 × 70 = 1330 points earned so far.
(1330 + x)/2000 = .71, so x = 90.
(1330 + x)/2000 = .72, so x = 110. Not possible.
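The two equations in 4.16 have the same shape, so the arithmetic can be sketched once. The function name `points_needed` is hypothetical; the numbers 1330 and 2000 are the ones used above.

```python
def points_needed(earned, total, target_percent):
    """Solve (earned + x) / total = target_percent / 100 for x."""
    return target_percent * total / 100 - earned

earned = 19 * 70                          # 1330 points so far
need_71 = points_needed(earned, 2000, 71)
need_72 = points_needed(earned, 2000, 72)
print(need_71, need_72)                   # 90.0 110.0
```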
4.22
Ordered data: 55 60 60 110 110 115 130 140 155 180 195 195
Lower Quartile: (60 + 110)/2 = 85
Median: (115 + 130)/2 = 122.5
Upper Quartile: (155 + 180)/2 = 167.5
Interquartile Range: 167.5 - 85 = 82.5 mg/cup
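A small sketch of the quartile calculation in 4.22. It assumes the median-of-halves convention (each quartile is the median of the lower or upper half of the ordered data, with the middle value left out when n is odd); other software may use a different quartile rule and give slightly different numbers. The function name `quartiles` is made up for this example.

```python
from statistics import median

def quartiles(values):
    """Lower quartile, median, upper quartile using median-of-halves (assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(ordered), median(upper_half)

data = [55, 60, 60, 110, 110, 115, 130, 140, 155, 180, 195, 195]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)   # 85.0 122.5 167.5 82.5
```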
4.24
1, 2, 3, 4, 5: x̄ = 3, s = 1.58
1, 3, 3, 3, 5: x̄ = 3, s = 1.41
1, 2, 3, 4, 5: x̄ = 3, s = 1.58
6, 7, 8, 9, 10: x̄ = 8, s = 1.58
4.27 a.
x̄ = (141 + 142 + 178 + 72 + 219 + 138 + 171 + 134 + 210 + 70)/10 = 147.5
Variance: s² = [(141 - x̄)² + (142 - x̄)² + (178 - x̄)² + (72 - x̄)² + (219 - x̄)² + (138 - x̄)²
+ (171 - x̄)² + (134 - x̄)² + (210 - x̄)² + (70 - x̄)²]/9 = 2505.83
s = √2505.83 = 50.058
b. Smaller. The data for Memorial Day is much more consistent.
c. Using my calculator to find the standard deviations, I find:
Holiday        Consistent Day?   Standard Deviation
New Years      No                50.058
Memorial Day   Yes               18.224
July 4th       No                47.139
Labor Day      Yes               17.725
Thanksgiving   Yes               15.312
Christmas      No                52.370
So we do see that holidays that fall on consistent days tend to have more consistent data, since their
standard deviations are smaller.
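The hand calculation in 4.27 a can be reproduced with a few lines. This is only a sketch of the sample variance formula (dividing by n - 1); the variable names are chosen for this example.

```python
from math import sqrt

values = [141, 142, 178, 72, 219, 138, 171, 134, 210, 70]

n = len(values)
mean = sum(values) / n
# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
std_dev = sqrt(variance)

print(mean, round(variance, 2), round(std_dev, 3))   # 147.5 2505.83 50.058
```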
4.28 a.
The lower quartile must be less than the median, so the lower quartile must be less than 14.
b. The upper quartile (the 75th percentile) must be greater than the median and less than the 90th
percentile, so the upper quartile is between 13 and 42.
c. The people who wait the most as defined here are at the 95th percentile, so that value must be
greater than the 90th percentile, and must be greater than 42.
4.29 a. There were more houses sold in Los Osos than in Morro Bay, so an average of the two areas
would have to account for that fact.
b. The range of values in Paso Robles is much greater than the range of values in Grover
Beach, so Paso Robles is likely to have a higher standard deviation.
c. Since Paso Robles has a highest value that is double that of Grover Beach, it is likely that
the median value is higher in Paso Robles.
4.30 Using my calculator, the standard deviation of the cases listed is 606.894 and the mean is 747.370.
So the mean plus two standard deviations is 747.370 + 2(606.894) = 1961.158 thousands, or $1,961,158.
4.31 a. Using my calculator, I find:
           Mean    Standard Deviation   Coefficient of Variation
Sample 1   7.81    0.398                (.398/7.81) × 100 = 5.096
Sample 2   49.68   1.739                (1.739/49.68) × 100 = 3.500
Sample 1 is measured in ounces, and the values are smaller compared to Sample 2 which is measured in
pounds. It makes sense that the coefficient of variation for Sample 1 would be larger since adding just
a little more or less makes a bigger difference in a small container.
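The coefficient of variation is the standard deviation expressed as a percentage of the mean. A quick sketch using the means and standard deviations reported in 4.31 (the function name is made up for this example):

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv1 = coefficient_of_variation(7.81, 0.398)    # Sample 1
cv2 = coefficient_of_variation(49.68, 1.739)   # Sample 2
print(round(cv1, 3), round(cv2, 3))
```

Because the result is unit-free, the two samples can be compared even though one is in ounces and the other in pounds.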
4.32 a.
Since the mean is greater than the median, the distribution is positively skewed.
b.
c. There is an outlier at the upper end. An extreme outlier at the upper end would need to be more than
three interquartile ranges above the upper quartile. The IQR is 31 - 7 = 24, so the cutoff for an extreme
outlier is 31 + 3(24) = 103. The largest value given is 205, which is beyond 103, so it is an extreme outlier.
4.33 a. Median: (57.3 + 58.7)/2 = 58
Lower Quartile: 53.5
Upper Quartile: 64.4
b. The IQR is 64.4 - 53.5 = 10.9. So an outlier would be below 53.5 - 1.5(10.9) = 37.15. Both Alaska
and Wyoming are outliers.
c. The median percent of population of a state that was born there and still lives there is 58%. There are
two outliers at the lower end. If those are ignored, the distribution is roughly symmetrical.
4.36
a.
Fiber: 7 7 7 7 7 8 8 8 8 8 10 10 10 12 12 12 13 14
Median: 8
Lower Quartile: 7
Upper Quartile: 12
IQR: 12 - 7 = 5
b.
Sugar: 0 0 5 6 6 9 9 10 10 10 10 11 11 13 14 17 18 19
Median: 10
Lower Quartile: 6
Upper Quartile: 13
IQR: 13 - 6 = 7
c. An outlier would be below 6 - 1.5(7) = -4.5 or above 13 + 1.5(7) = 23.5.
No, there are no outliers in the sugar values.
d. The fiber data includes 5 values of 7, which is more than 25% of the data. And so the lowest value is
the same as the lower quartile.
e. The median sugar content in the cereal is 10, while the median fiber content is 8. In general, there is
more sugar in the cereals than fiber. There is more variability in the sugar content distribution, with a
range of 19 and IQR of 7. The fiber distribution is less spread out, with a range of 7 and IQR of 5. The
distribution of the sugar content is roughly symmetrical, while the fiber content distribution is
positively skewed.
4.37 a. Since there seem to be outliers at either end, the IQR would be more useful as a measure of
variability.
b.
Lower quartile: (84 + 79)/2 = 81.5
Upper quartile: 94
IQR: 94 - 81.5 = 12.5
81.5 - 3(12.5) = 44
81.5 - 1.5(12.5) = 62.75
94 + 1.5(12.5) = 112.75
94 + 3(12.5) = 131.5
The data point for farmer is less than 44, so it is an extreme outlier. The data point for student is
above 131.5, so it is an extreme outlier. There are no mild (non-extreme) outliers.
c.
d. It is reasonable to offer discounts to professions that would qualify as low outliers, which
means just farmers would get the discount. It is also reasonable to offer discounts to those in the lower
quartile, which would be the last 10 professions on the list.
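The four cutoffs in 4.37 b follow the usual 1.5 × IQR (mild) and 3 × IQR (extreme) rule. A sketch of that arithmetic, using the quartiles found above; the function name and dictionary keys are made up for this illustration.

```python
def outlier_fences(q1, q3):
    """Cutoffs for mild (1.5 * IQR) and extreme (3 * IQR) outliers."""
    iqr = q3 - q1
    return {
        "extreme_low": q1 - 3 * iqr,
        "mild_low": q1 - 1.5 * iqr,
        "mild_high": q3 + 1.5 * iqr,
        "extreme_high": q3 + 3 * iqr,
    }

fences = outlier_fences(81.5, 94)
print(fences)
```

A value below `extreme_low` or above `extreme_high` is an extreme outlier; a value between a mild and an extreme cutoff is a mild outlier.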
4.38
a.
One std dev above mean: 40min
One std dev below mean: 30 min
Two std dev above mean: 45 min
Two std dev below mean: 25 min
b. At least 100(1 - 1/2²) = 75%
c. These are three std dev away, so at least 100(1 - 1/3²) = 88.9%. So at most 11.1% are outside the
given range.
d. 95% are within two std deviations. Outside of three std deviations is .3%. Less than 20 minutes is half
that, so .15%
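Parts b and c of 4.38 use Chebyshev's Rule, which can be sketched as below. The mean of 35 minutes and standard deviation of 5 minutes are inferred from part a (40 and 30 are one standard deviation above and below the mean); the function name is made up for this example.

```python
def chebyshev_min_percent(k):
    """Chebyshev's Rule: at least this percent of values lie within k std devs (k > 1)."""
    return 100 * (1 - 1 / k ** 2)

mean, std_dev = 35, 5   # inferred from part a

for k in (2, 3):
    low, high = mean - k * std_dev, mean + k * std_dev
    print(k, low, high, round(chebyshev_min_percent(k), 1))
```

Chebyshev's Rule gives a guaranteed minimum for any distribution, while the percentages in part d come from the Empirical Rule, which applies only when the distribution is roughly normal.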
4.39
a. 27 and 57 mph are +/- 1 standard deviation from the mean. So 68% of the speeds would be between
those values.
b. (100 - 68)/2 = 16%
4.41
a. The values given are two std deviations above and below the mean. So using Chebyshev's Rule at
least 75% of the observations must lie between those values.
b. Within three standard deviations: between 2.9 and 70.94
c. 24.76-2(1720)= a negative number. So the distribution cannot be normal.
4.43 First test: z = (625 - 475)/100 = 1.5. Second test: z = (45 - 30)/8 = 1.875. Since the z-score is
higher for the second test, the student scored better relative to others on the second test.
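The comparison in 4.43 comes down to standardizing each score. A minimal sketch with the numbers used above (the function name is chosen for this example):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_first = z_score(625, 475, 100)
z_second = z_score(45, 30, 8)
print(z_first, z_second)   # 1.5 1.875
```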
4.48
a. A z score of 2.2 means you were 2.2 std deviations above the mean, and you performed better
than 95% of the class.
b. A z-score of .4 means you were .4 std dev above the mean, and you performed better than
average.
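c. A z-score of 1.8 means you were 1.8 std dev above the mean and, if the scores are roughly normal,
you're around the 96th percentile.
d. A z-score of 1 means you were 1 std dev above the mean and, if the scores are roughly normal, you
performed better than about 84% of the students (68% are within one std dev, leaving 16% above).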
e. A z-score of 0 means you were right at average.