Normal Distributions

Name:
Homework Chapter 5
1) Use the graph below.
a) Why is the total area under this curve equal to 1?
The region under the curve is a rectangle, so A = LW = 1(1) = 1.
b) What percent of the observations lie above 0.8?
1 − 0.8 = 0.2; A = LW = 1(0.2) = 0.2, so 20%.
c) What percent of the observations lie between 0.25 and 0.75?
0.75 − 0.25 = 0.5; A = LW = 1(0.5) = 0.5, so 50%.
d) What is the mean of this distribution?
μ = 0.5, the balance point of the symmetric rectangle.
2) Use the graph below.
a) Verify that the graph is a valid density curve.
The curve lies entirely on or above the horizontal axis, and the total area (a rectangle plus a triangle) is 1(0.8) + ½(0.4)(1) = 0.8 + 0.2 = 1.
b) The median of this density curve is a point between X = 0.2 and X = 0.4. Explain why.
The area for 0 ≤ x ≤ 0.2 is 0.35 and the area for 0.4 ≤ x ≤ 0.8 is 0.4. So the area to the left of 0.2 is 0.35 (less than 0.5), while the area to the left of 0.4 is 1 − 0.4 = 0.6 (more than 0.5). The cumulative area reaches 0.5 somewhere between these two points, so the median lies between 0.2 and 0.4.
c) Find the proportion of observations within each interval.
i. 0.6 < x < 0.8: 0.8 − 0.6 = 0.2, so A = 1(0.2) = 0.2
ii. 0 < x < 0.4: by the complement, the area for 0.4 ≤ x ≤ 0.8 is 0.4, so A = 1 − 0.4 = 0.6
iii. 0 < x < 0.2: trapezoid, A = ½(0.2)(2 + 1.5) = 0.35
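The areas and the median in Problem 2 can be checked with a short Python sketch. The graph itself is not reproduced in this transcript, so the code assumes the shape implied by the worked answers: height 2 at x = 0, falling linearly to height 1 at x = 0.4, then constant at 1 up to x = 0.8. The function names `cdf` and `median` are illustrative.

```python
def cdf(x: float) -> float:
    """Area under the assumed Problem 2 density from 0 up to x."""
    if x <= 0:
        return 0.0
    if x <= 0.4:
        # Integral of the sloped segment f(t) = 2 - 2.5t from 0 to x
        return 2 * x - 1.25 * x * x
    if x <= 0.8:
        # Area 0.6 accumulated up to x = 0.4, then a rectangle of height 1
        return 0.6 + (x - 0.4)
    return 1.0


def median(tol: float = 1e-12) -> float:
    """Bisect for the point where the cumulative area reaches 0.5."""
    lo, hi = 0.0, 0.8
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("total area:", cdf(0.8))
print("0.6 < x < 0.8:", cdf(0.8) - cdf(0.6))
print("0 < x < 0.4:", cdf(0.4))
print("0 < x < 0.2:", cdf(0.2))
print("median:", round(median(), 4))
```

Under this assumed shape the bisection lands at about 0.31, inside the interval the worked answer to part (b) identifies.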
3) The sketch below contains three normal curves; think of them as approximating the distribution
of exam scores for three different classes. One (call it A) has a mean of 70 and a standard
deviation of 5; another (call it B) has a mean of 70 and a standard deviation of 10; the third
(call it C) has a mean of 50 and a standard deviation of 10. Identify which is which by labeling
each curve with its appropriate letter.
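C is the curve centered at 50. A and B are both centered at 70; A (σ = 5) is the taller, narrower curve and B (σ = 10) is the shorter, wider one.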
4) For each of the following normal curves, identify (as accurately as you can from the graph) the
mean μ and standard deviation σ of the distribution.
Curve 1: μ = 50, σ = 5
Curve 2: μ = 1100, σ = 300
Curve 3: μ = −10, σ = 40
Curve 4: μ = 225, σ = 75
5) Suppose the average height of women collegiate volleyball players is 5’9”, with a standard
deviation of 2.1”. Assume that heights among these players follow a mound-shaped distribution.
a) Draw the curve and label it.
62.7 64.8 66.9 69 71.1 73.2 75.3
b) According to the empirical rule, about 95% of women collegiate volleyball players have
heights between what two values?
64.8 to 73.2 inches (μ ± 2σ)
c) What does the empirical rule say about the proportion of players who are between 62.7
inches and 75.3 inches? 99.7% of players are between 62.7 and 75.3 inches tall
d) Reasoning from the empirical rule, what is the tallest we would expect a woman collegiate
volleyball player to be?
About 75.3 inches (μ + 3σ); only about 0.15% of players would be taller.
e) About what percent of women collegiate volleyball players are taller than 71.1 inches?
16%
f) About what percent of women collegiate volleyball players are shorter than 64.8 inches?
2.5%
g) Find the percentiles for the following heights:
64.8 in: 2.5th percentile
69 in: 50th percentile
73.2 in: 97.5th percentile
75.3 in: 99.85th percentile
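The empirical-rule answers in Problem 5 can be compared with the exact normal model using Python's standard-library `statistics.NormalDist`. This is a sketch: it treats heights as exactly normal with μ = 69 in and σ = 2.1 in, which is stronger than the problem's "mound-shaped" assumption.

```python
from statistics import NormalDist

# Heights in inches: 5'9" = 69 in, standard deviation 2.1 in
heights = NormalDist(mu=69, sigma=2.1)

# Axis marks at mu - 3 sigma ... mu + 3 sigma for the sketch in part (a)
marks = [round(heights.mean + k * heights.stdev, 1) for k in range(-3, 4)]

# Exact normal-model percentiles, to set beside the empirical-rule answers in part (g)
exact_percentiles = {h: 100 * heights.cdf(h) for h in (64.8, 69, 73.2, 75.3)}

print(marks)
for h, pct in exact_percentiles.items():
    print(f"{h} in: {pct:.2f}th percentile")
```

The exact model puts 64.8 in at about the 2.28th percentile and 75.3 in at about the 99.87th, close to the rounded 2.5 and 99.85 that the empirical rule gives.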
6) Given an approximately normal distribution, N(175, 37).
a. Draw a normal curve and label 1, 2, and 3 standard deviations on both sides of the mean.
64 101 138 175 212 249 286
b. What percent of values are within the interval (138, 212)?
68%
c. What percent of values are within the interval (101, 249)?
95%
d. What percent of values are within the interval (64, 286)?
99.7%
e. What percent of values are outside the interval (138, 212)?
32%
f. What percent of values are outside the interval (64, 286)?
0.3%
7) The incubation time for Rhode Island Red chicks is normally distributed with a mean 21 days and
standard deviation approximately 1 day.
a. Draw a normal curve and label 1, 2, and 3 standard deviations on both sides of the mean.
18 19 20 21 22 23 24
If 1000 eggs are being incubated, how many chicks do we expect will hatch in each of the following
situations?
b. In 19 to 23 days
95% of 1000 = 950
c. In 18 days or more
99.85% of 1000 = 998.5, or about 999
d. In 22 days or less
84% of 1000 = 840
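The expected counts in Problem 7 are 1000 times a normal probability. The sketch below computes them from the exact normal model instead of the rounded empirical-rule percentages; it assumes incubation time is continuous, so boundary days such as "19 to 23" are read as the interval from 19 to 23.

```python
from statistics import NormalDist

# Incubation time in days, treated as a continuous variable
incubation = NormalDist(mu=21, sigma=1)
eggs = 1000

expected_chicks = {
    "19 to 23 days": eggs * (incubation.cdf(23) - incubation.cdf(19)),
    "18 days or more": eggs * (1 - incubation.cdf(18)),
    "22 days or less": eggs * incubation.cdf(22),
}

for window, count in expected_chicks.items():
    print(f"{window}: about {count:.1f} chicks")
```

The exact model gives roughly 954.5, 998.7 and 841.3 chicks, which the empirical rule rounds to 950, 999 and 840.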
8) Use the table of standard normal probabilities to determine the proportion of the normal curve
that falls within:
a. one standard deviation of its mean (in other words, between z-scores of −1 and 1).
.8413 − .1587 = .6826
b. two standard deviations of the mean.
.9772 − .0228 = .9544
c. three standard deviations of the mean.
.9987 − .0013 = .9974
d. Compare these values to the values obtained from the empirical rule.
They are all very close to the 68 – 95 – 99.7 proportions
described by the Empirical Rule.
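The table lookups in Problem 8 amount to computing Φ(k) − Φ(−k), where Φ is the standard normal cumulative distribution function. A minimal Python sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion within k standard deviations of the mean: Phi(k) - Phi(-k)
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The unrounded values (about .6827, .9545 and .9973) differ from the table answers only in the fourth decimal place, because each table entry is itself rounded to four places.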
9) Use the table of standard normal probabilities to determine the proportion of observations
from a standard normal distribution that satisfies each of the following statements. For each,
sketch a standard normal curve and shade the area under the curve that is the answer.
a. z < 2.85
P(z < 2.85) = .9978
b. z > 2.85
P(z > 2.85) = 1 − .9978 = .0022
c. z > −1.66
P(z > −1.66) = 1 − .0485 = .9515
d. −1.66 < z < 2.85
P(−1.66 < z < 2.85) = .9978 − .0485 = .9493
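Each answer in Problem 9 is one or two cumulative areas combined by subtraction. The same pattern in Python, using the standard library's `statistics.NormalDist` in place of the printed table:

```python
from statistics import NormalDist

Z = NormalDist()

answers = {
    "z < 2.85": Z.cdf(2.85),                        # area to the left
    "z > 2.85": 1 - Z.cdf(2.85),                    # complement of the left area
    "z > -1.66": 1 - Z.cdf(-1.66),
    "-1.66 < z < 2.85": Z.cdf(2.85) - Z.cdf(-1.66),  # difference of two left areas
}

for event, p in answers.items():
    print(f"P({event}) = {p:.4f}")
```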
10) Use the table of standard normal probabilities to find the value z of a standard normal variable
that satisfies each of the following statements. For each sketch a standard normal curve and
mark your value of z on the axis.
a. 25% of observations fall below it
z = −.67
b. 40% of observations fall above it (so 60% fall below it)
z = .25
c. proportion less than it is 0.8
z = .84
d. 90% of observations are greater than it (so 10% are less than it)
z = −1.28
11) What are the quartiles (all three) of a standard normal distribution?
Q1: z = −.67
Q2: z = 0
Q3: z = .67
12) The first and ninth deciles of any distribution are the points that mark off the lowest 10% and the highest
10%. What are these two deciles of the standard normal distribution?
z = −1.28 and z = 1.28
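Problems 10 through 12 run the table backwards: given an area, find z. In Python the standard library's `NormalDist.inv_cdf` does this lookup, and `NormalDist.quantiles(n)` returns the n − 1 cut points that split the distribution into n equal-probability parts. A sketch:

```python
from statistics import NormalDist

Z = NormalDist()

# inv_cdf(p) returns the z with area p to its LEFT, so "above" statements
# are converted to "below" first (e.g. 40% above = 60% below).
problem_10 = {
    "a. 25% below": Z.inv_cdf(0.25),
    "b. 40% above": Z.inv_cdf(1 - 0.40),
    "c. 80% below": Z.inv_cdf(0.80),
    "d. 90% above": Z.inv_cdf(1 - 0.90),
}

quartiles = Z.quantiles(4)   # 3 cut points: Q1, Q2, Q3
deciles = Z.quantiles(10)    # 9 cut points; Problem 12 asks for the first and last

for label, z in problem_10.items():
    print(f"{label}: z = {z:.2f}")
print("quartiles:", [round(q, 2) for q in quartiles])
print("first and ninth deciles:", round(deciles[0], 2), round(deciles[-1], 2))
```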
13) Which of the following are true statements?
I. The area under a normal curve is always equal to 1, no matter what the mean and
standard deviation are.
II. The smaller the standard deviation of a normal curve, the higher and narrower
the graph.
III. Normal curves with different means are centered around different numbers.
i. I and II
ii. I and III
iii. II and III
iv. I, II, and III
v. None of the above gives the complete set of true responses.
14) Which of the following are true statements?
I. The area under the standard normal curve between 0 and 2 is twice the area between 0
and 1.
II. The area under the standard normal curve between 0 and 2 is half the area between −2
and 2.
III. For the standard normal curve, the interquartile range is approximately 3.
i. I and II
ii. I and III
iii. II only
iv. I, II and III
v. None of the above gives the complete set of true responses.
15) Populations P1 and P2 are normally distributed and have identical means. However, the standard
deviation of P1 is twice the standard deviation of P2. What can be said about the percentage
of observations falling within two standard deviations of the mean for each population?
i. The percentage for P1 is twice the percentage for P2.
ii. The percentage for P1 is greater than, but not twice as great as, the percentage for
P2.
iii. The percentage for P2 is twice the percentage for P1.
iv. The percentage for P2 is greater than, but not twice as great as, the percentage for
P1.
v. The percentages are identical.
16) Suppose that a college admissions office needs to compare scores of students who take the
SAT with those who take the ACT. Suppose that among the college’s applicants who take the
SAT, scores have a mean of 896 and a standard deviation of 174. Further suppose that among
the college’s applicants who take the ACT, scores have a mean of 20.6 and a standard deviation
of 5.2.
a) If applicant Bobby scored 1080 on the SAT, how many points above the SAT mean did he
score?
1080 − 896 = 184 points above the SAT mean.
b) If applicant Kathy scored 28 on the ACT, how many points above the ACT mean did she
score?
28 − 20.6 = 7.4 points above the ACT mean.
c) Is it sensible to conclude that since your answer to (a) is greater than your answer to (b),
Bobby outperformed Kathy on the admissions test? Explain.
No. They are different tests with different scoring scales, so raw point differences cannot be
compared directly.
d) Determine how many standard deviations above the mean Bobby scored by dividing your
answer to (a) by the standard deviation of the SAT scores. Do the same for Kathy’s score.
Bobby: z = 184/174 ≈ 1.06
Kathy: z = 7.4/5.2 ≈ 1.42
e) Which applicant has the higher z-score for his or her admissions test score?
Kathy had the higher z-score, 1.42 > 1.06
f) Explain in your own words which applicant performed better on his or her admissions test.
Kathy did better. Since she is more standard deviations above the mean she
would be in a higher percentile than Bobby.
g) Calculate the z-score for applicant Peter, who scored 740 on the SAT, and for applicant
Kelly, who scored 19 on the ACT.
Peter: z = (740 − 896)/174 ≈ −.90
Kelly: z = (19 − 20.6)/5.2 ≈ −.31
h) Which of Peter and Kelly has the higher z-score?
Kelly has the higher z-score: −.31 > −.90.
i) Under what conditions does a z-score turn out to be negative?
A z-score will be negative when a value is below the mean.
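The z-score comparison in Problem 16 can be sketched in Python. The `z_score` helper is a hypothetical name for the formula z = (x − mean)/sd used above. The problem does not say that SAT or ACT scores are normal; the percentile line assumes they are, only to illustrate the reasoning in part (f).

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


SAT = (896, 174)   # (mean, standard deviation) among the college's SAT applicants
ACT = (20.6, 5.2)  # (mean, standard deviation) among the college's ACT applicants

z = {
    "Bobby (SAT 1080)": z_score(1080, *SAT),
    "Kathy (ACT 28)": z_score(28, *ACT),
    "Peter (SAT 740)": z_score(740, *SAT),
    "Kelly (ACT 19)": z_score(19, *ACT),
}

# Assumption: scores roughly normal, so Phi(z) approximates each percentile
percentile = {name: 100 * NormalDist().cdf(value) for name, value in z.items()}

for name in z:
    print(f"{name}: z = {z[name]:.2f}, percentile ≈ {percentile[name]:.0f}")
```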
17) Data from the National Vital Statistics Report reveal that the distribution of the duration of
human pregnancies (i.e., the number of days between conception and birth) is approximately
normal with mean μ = 270 and standard deviation σ = 15. Use this normal model to determine
the probability that a given pregnancy comes to term in:
a. less than 244 days (which is about 8 months).
.0418
b. more than 275 days (which is about 9 months).
.3707
c. over 300 days.
.0228
d. between 260 and 280 days.
.4972
e. Data from the National Vital Statistics Report reveal that of 3,880,894 births in
the US in 1997, the number of pregnancies that resulted in a preterm delivery,
defined as 36 or fewer weeks since conception, was 436,600. Compare this to the
prediction that would be obtained from the model.
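36 weeks = 252 days, so z = (252 − 270)/15 = −1.2 and P(z < −1.2) = .1151. The model predicts .1151(3,880,894) ≈ 446,691 preterm deliveries vs. 436,600 observed. The prediction is only about 2.3% too high, so the model fits well.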
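A Python sketch of Problem 17 using the standard library's `statistics.NormalDist`. Because it does not round z to two decimals, some results differ from the table-based answers in the third decimal place (for example, about .3694 rather than .3707 for part b).

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=270, sigma=15)  # duration in days

p_less_244 = pregnancy.cdf(244)
p_more_275 = 1 - pregnancy.cdf(275)
p_over_300 = 1 - pregnancy.cdf(300)
p_260_to_280 = pregnancy.cdf(280) - pregnancy.cdf(260)

# Part (e): preterm means 36 or fewer weeks, i.e. at most 36 * 7 = 252 days
births_1997 = 3_880_894
predicted_preterm = births_1997 * pregnancy.cdf(36 * 7)

print(round(p_less_244, 4), round(p_more_275, 4), round(p_over_300, 4), round(p_260_to_280, 4))
print(f"predicted preterm: {predicted_preterm:,.0f} vs. 436,600 observed")
```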
18) Suppose that the IQ scores of students at a certain college follow a normal distribution with
mean 115 and standard deviation 12.
a. Use the normal model to determine the proportion of students with an IQ score below
100.
.1056
b. Find the proportion of these undergraduates having IQs greater than 130.
.1056
c. Find the proportion of these undergraduates having IQs between 110 and 130.
.5559
d. With his IQ of 75, what would the percentile of Forrest Gump’s IQ be?
About the 0.04th percentile (z = (75 − 115)/12 ≈ −3.33)
e. Determine how high one’s IQ must be in order to be in the top 1% of all IQs at this
college.
142.92
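Problem 18 mixes forward questions (area for a given IQ) with one inverse question (IQ for a given area). A sketch with the standard library's `statistics.NormalDist`, where `inv_cdf(0.99)` returns the score with 99% of the distribution below it:

```python
from statistics import NormalDist

iq = NormalDist(mu=115, sigma=12)

below_100 = iq.cdf(100)
above_130 = 1 - iq.cdf(130)
between_110_and_130 = iq.cdf(130) - iq.cdf(110)
gump_percentile = 100 * iq.cdf(75)        # IQ of 75
top_1_percent_cutoff = iq.inv_cdf(0.99)   # IQ with 99% of students below it

print(round(below_100, 4), round(above_130, 4), round(between_110_and_130, 4))
print(f"IQ 75 percentile: {gump_percentile:.3f}")
print(f"top 1% cutoff: {top_1_percent_cutoff:.2f}")
```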
19)
Suppose that Professors Wells and Zeddes have final exam scores that are approximately
normally distributed with mean 75. The standard deviation of Wells’ scores is 10, and that
of Zeddes’ scores is 5.
a. With which professor is a score of 90 more impressive? Support your answer with
appropriate probability calculations and with a sketch.
Zeddes. For Zeddes, z = (90 − 75)/5 = 3, so only about 0.13% of scores are higher; for Wells, z = (90 − 75)/10 = 1.5, so about 6.68% of scores are higher.
b. With which professor is a score of 60 more discouraging? Again support your answer
with appropriate probability calculations and with a sketch.
Zeddes. For Zeddes, z = (60 − 75)/5 = −3, so only about 0.13% of scores are lower; for Wells, z = (60 − 75)/10 = −1.5, so about 6.68% of scores are lower.
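The "probability calculations" that Problem 19 asks for can be sketched in Python: for each professor, compute the z-score and the tail area beyond the score. A smaller tail area means the score is rarer in that class.

```python
from statistics import NormalDist

professors = {
    "Wells": NormalDist(mu=75, sigma=10),
    "Zeddes": NormalDist(mu=75, sigma=5),
}

summary = {}
for name, scores in professors.items():
    summary[name] = {
        "z(90)": (90 - scores.mean) / scores.stdev,
        "P(score > 90)": 1 - scores.cdf(90),   # smaller = a 90 is more impressive
        "z(60)": (60 - scores.mean) / scores.stdev,
        "P(score < 60)": scores.cdf(60),       # smaller = a 60 is more discouraging
    }

for name, stats in summary.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```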
20)
Suppose that the wrapper of a certain candy bar lists its weight as 2.13 ounces. Naturally,
the weights of individual bars vary somewhat. Suppose that the weights of these candy
bars vary according to a normal distribution with mean μ = 2.2 ounces and standard deviation
σ = 0.04 ounces.
a. What proportion of candy bars weigh less than the advertised weight?
.0401
b. What proportion of candy bars weigh more than 2.25 ounces?
.1056
c. What proportion of candy bars weigh between 2.2 and 2.3 ounces?
.4938
d. If the manufacturer wants to adjust the production process so that only 1 candy bar
in 1000 weighs less than the advertised weight, what should the mean of the actual
weights be (assuming that the standard deviation of the weights remains 0.04
ounces)?
πœ‡ = 2.2532
21)
A trucking firm determines that its fleet of trucks averages a mean of 12.4 miles per gallon
with a standard deviation of 1.2 miles per gallon on cross-country hauls. What is the
probability that one of the trucks averages fewer than 10 miles per gallon?
With a z-score of -2, the probability that the trucks will average fewer than 10 mpg is 0.0228.
22)
A factory dumps an average of 2.43 tons of pollutants into a river every week. If the
standard deviation is 0.88 tons, what is the probability that in a week more than 3 tons are
dumped?
With a z-score of 0.65, the probability that in a week more than 3 tons of pollutants are
dumped is 0.2578.
23)
An electronic product takes an average of 3.4 hours to move through an assembly line. If
the standard deviation is 0.5 hour, what is the probability that an item will take between 3
and 4 hours?
With z-scores of -0.8 and 1.2, the probability that an item will take between 3 and 4 hours to
move through an assembly line is 0.6730.
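Problems 21 through 23 are all forward calculations: standardize, then read a left area, a right area, or a difference of two areas. A combined Python sketch:

```python
from statistics import NormalDist

mpg = NormalDist(mu=12.4, sigma=1.2)           # Problem 21, miles per gallon
pollutants = NormalDist(mu=2.43, sigma=0.88)   # Problem 22, tons per week
assembly = NormalDist(mu=3.4, sigma=0.5)       # Problem 23, hours

p21 = mpg.cdf(10)                        # fewer than 10 mpg (left area)
p22 = 1 - pollutants.cdf(3)              # more than 3 tons (right area)
p23 = assembly.cdf(4) - assembly.cdf(3)  # between 3 and 4 hours (difference)

print(round(p21, 4), round(p22, 4), round(p23, 4))
```

For Problem 22 the unrounded z is about 0.648 rather than 0.65, so the exact answer is about 0.259 instead of the table's 0.2578.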
24)
The mean score on a college placement exam is 500 with a standard deviation of 100.
Ninety-five percent of the test takers score above what?
95% of the test takers score above 335.5.
25)
The average noise level in a restaurant is 30 decibels with a standard deviation of 4
decibels. Ninety-nine percent of the time it is below what value?
99% of the time the noise level in the restaurant is below 39.32 decibels.
26)
The mean income per household in a certain state is $9500 with a standard deviation of
$1750. The middle 95% of incomes are between what two values?
With z = ±1.96, the middle 95% of household incomes lie between $9500 − 1.96($1750) ≈ $6,070 and $9500 + 1.96($1750) ≈ $12,930.
27)
Jay Olshansky from the University of Chicago was quoted in Chance News as arguing that
for the average life expectancy to reach 100, 18% of people would have to live to 120.
What standard deviation is he assuming for this statement to make sense (assuming life
expectancies are normally distributed)?
Jay Olshansky is assuming a standard deviation of approximately 21.7.
28)
Cucumbers grown on a certain farm have weights with a standard deviation of 2 ounces.
What is the mean weight if 85% of the cucumbers weigh less than 16 ounces?
The mean weight is 13.92 ounces if 85% of cucumbers weigh less than 16 ounces.
29)
A coffee machine can be adjusted to deliver any fixed number of ounces of coffee. If the
machine has a standard deviation in delivery equal to 0.4 ounce, what should be the mean
setting so that an 8-ounce cup will overflow only 0.5% of the time?
For an 8 oz cup to overflow 0.5% of the time, the mean setting should be 6.97 oz.
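Problems 24 through 29 all rest on the relation x = μ + zσ, where z comes from an area. Problems 24 to 26 solve it for x; Problems 27 to 29 solve it for σ or μ. A Python sketch; the `value_at` helper is a hypothetical name, and z is taken unrounded from `inv_cdf`, so the results can differ slightly from table-based answers (for example, about 21.85 rather than 21.7 in Problem 27).

```python
from statistics import NormalDist

Z = NormalDist()


def value_at(p_below: float, mean: float, sd: float) -> float:
    """Value with proportion p_below of the distribution beneath it: x = mu + z*sigma."""
    return mean + Z.inv_cdf(p_below) * sd


p24 = value_at(0.05, 500, 100)      # 95% of test takers score above it
p25 = value_at(0.99, 30, 4)         # 99% of the time the noise level is below it
p26 = (value_at(0.025, 9500, 1750), value_at(0.975, 9500, 1750))  # middle 95%

# Problems 27-29 solve x = mu + z*sigma for the unknown parameter instead
p27_sigma = (120 - 100) / Z.inv_cdf(1 - 0.18)   # 18% live past 120 when the mean is 100
p28_mean = 16 - Z.inv_cdf(0.85) * 2             # 85% of cucumbers weigh under 16 oz
p29_mean = 8 - Z.inv_cdf(1 - 0.005) * 0.4       # more than 8 oz only 0.5% of the time

print(round(p24, 1), round(p25, 2), [round(v) for v in p26])
print(round(p27_sigma, 2), round(p28_mean, 2), round(p29_mean, 2))
```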
30)
If 75% of all families spend more than $75 weekly for food, while 15% spend more than
$150, what is the mean weekly expenditure and what is the standard deviation?
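Since 75% spend more than $75, $75 sits at z ≈ −0.6745; since 15% spend more than $150, $150 sits at z ≈ 1.0364. Solving 75 = μ − 0.6745σ and 150 = μ + 1.0364σ gives a standard deviation of σ ≈ $43.84 and a mean weekly food expenditure of μ ≈ $104.57.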
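Problem 30 gives two percentile conditions, which produce two linear equations in μ and σ. A Python sketch that finds both z-values with the standard library's `NormalDist.inv_cdf` and solves the pair:

```python
from statistics import NormalDist

Z = NormalDist()

# 75% spend MORE than $75, so $75 has 25% of families below it.
z_75 = Z.inv_cdf(0.25)
# 15% spend MORE than $150, so $150 has 85% of families below it.
z_150 = Z.inv_cdf(0.85)

# Two equations: 75 = mu + z_75 * sigma and 150 = mu + z_150 * sigma.
sigma = (150 - 75) / (z_150 - z_75)
mu = 75 - z_75 * sigma

print(f"mean = ${mu:.2f}, standard deviation = ${sigma:.2f}")
```

Plugging the results back in is a useful check: under N(μ, σ) the share of families spending more than $150 should come out at 0.15.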
31)
Three landmarks of baseball achievement are Ty Cobb’s batting average of .420 in 1911,
Ted Williams’s .406 in 1941, and George Brett’s .390 in 1980. These batting averages
cannot be compared directly because the distribution of major league batting averages has
changed over the decades. The distributions are quite symmetric and reasonably normal.
While the mean batting average has been held roughly constant by rule changes and the
balance between hitting and pitching, the standard deviation has dropped over time. Here
are the facts:
Decade   Mean   Std. Dev.
1910s    .266   .0371
1940s    .267   .0326
1970s    .261   .0317
Comparatively, who ranked highest amongst his peers? Justify your answer.
Ted Williams’s batting average of .406 in 1941 ranks highest among his peers. Williams was
4.26 standard deviations above the mean, compared with 4.15 for Ty Cobb and 4.07 for
George Brett.
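The comparison in Problem 31 standardizes each average against its own decade. A Python sketch; as in the worked answer, it assumes Brett's 1980 season is compared with the 1970s row, the latest decade in the table.

```python
# Each entry: (batting average, decade mean, decade standard deviation).
# Assumption: Brett's 1980 season uses the 1970s row of the table.
seasons = {
    "Ty Cobb (1911)": (0.420, 0.266, 0.0371),
    "Ted Williams (1941)": (0.406, 0.267, 0.0326),
    "George Brett (1980)": (0.390, 0.261, 0.0317),
}

z = {name: (avg - mean) / sd for name, (avg, mean, sd) in seasons.items()}
best = max(z, key=z.get)

for name, value in sorted(z.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f} SD above the decade mean")
print("highest relative to peers:", best)
```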
32)
The following graph and data output are from the 108 years of rainfall in Austin. Let the
calculated mean and standard deviation from the data represent µ and Οƒ for the
calculations.
[Figure: histogram of annual rainfall in Austin; summary output: mean Rainfall = 33.6518 in, stdDev Rainfall = 9.67722 in]
a) What proportion of years had rainfall between 30 and 40 inches?
Since the z-score for 30 inches is −0.38 and the z-score for 40 inches is 0.66, the approximate
proportion of years with rainfall between 30 and 40 inches is 0.3934.
b) Comment on the accuracy of your calculations.
The calculations should be fairly accurate. The histogram of the distribution of rainfall for
Austin appears to be approximately Normal with no obvious outliers. Therefore, it is
appropriate to use standard Normal probabilities.
33)
A person with too much time on his hands collected 1000 pennies that came into his
possession in 1999 and calculated the age (as of 1999) of each. The distribution has mean
12.264 years and standard deviation 9.613 years. Knowing these summary statistics but
without seeing the distribution, can you comment on whether the normal distribution is
likely to provide a reasonable model for these penny ages? Explain.
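It is probably not normally distributed. Ages cannot be negative, yet 12.264 − 3(9.613) ≈ −16.6, well below 0 (a new penny);
a normal model would even place about 10% of pennies below age 0 (z = −12.264/9.613 ≈ −1.28). The distribution is most likely
skewed to the right, since newer pennies are more common in circulation.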
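The argument in Problem 33 can be made quantitative: fit a normal model to the reported mean and standard deviation and ask how much of it falls at impossible negative ages. A Python sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Normal model built from the reported summary statistics (ages in years as of 1999)
ages = NormalDist(mu=12.264, sigma=9.613)

three_sd_below = ages.mean - 3 * ages.stdev   # far below age 0
share_negative = ages.cdf(0)                  # model's share of impossible ages

print(f"mean - 3 SD = {three_sd_below:.3f} years")
print(f"model's share of pennies with negative age: {share_negative:.3f}")
```

A model that assigns roughly one penny in ten a negative age cannot describe the real distribution well.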
34)
Use the following data set.
154, 109, 137, 115, 103, 126, 126, 137, 152, 165, 140, 165, 154, 129, 178, 200, 200, 148
Draw a stemplot, boxplot and normal probability plot to assess normality.
The data appear to be fairly normally distributed.
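A text stemplot and a numeric summary of the normal probability plot can be produced in Python for Problem 34. Both helpers are hypothetical names. The plot summary is the correlation between the sorted data and standard normal scores at (i − 0.5)/n plotting positions, which is one common choice; calculators may use slightly different positions. A correlation close to 1 means the probability plot is close to a straight line.

```python
from statistics import NormalDist, mean

data = [154, 109, 137, 115, 103, 126, 126, 137, 152, 165,
        140, 165, 154, 129, 178, 200, 200, 148]


def stemplot(values):
    """Stem-and-leaf plot: tens (and above) as stems, units digit as leaves."""
    values = sorted(values)
    rows = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = "".join(str(v % 10) for v in values if v // 10 == stem)
        rows.append(f"{stem:>3} | {leaves}")
    return "\n".join(rows)


def probability_plot_correlation(values):
    """Correlation between sorted data and standard normal scores at (i - 0.5)/n."""
    xs = sorted(values)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5


print(stemplot(data))
print(f"probability-plot correlation: {probability_plot_correlation(data):.3f}")
```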
35)
Use the following data set:
32, 31, 29, 30, 20, 24, 10, 24, 30, 31, 33, 30, 22, 15, 25, 32, 32, 23, 20, 23
Is this data set best modeled by using a normal distribution? Create and draw a histogram,
boxplot and normal probability plot to decide. Explain your results.
The histogram and boxplot both show that the data are skewed to the left. The normal probability
plot has too much curvature for the data to be considered normal.
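Two quick numeric checks support the left-skew conclusion in Problem 35: the mean falls below the median, and the moment coefficient of skewness (average cubed deviation divided by the variance to the 3/2 power) is negative. A Python sketch; these summaries complement the graphs rather than replace them.

```python
from statistics import mean, median

data = [32, 31, 29, 30, 20, 24, 10, 24, 30, 31,
        33, 30, 22, 15, 25, 32, 32, 23, 20, 23]

m = mean(data)
n = len(data)
# Moment coefficient of skewness: average cubed deviation over (variance)^(3/2)
m2 = sum((x - m) ** 2 for x in data) / n
m3 = sum((x - m) ** 3 for x in data) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {m}, median = {median(data)}")   # mean below median hints at left skew
print(f"skewness = {skewness:.2f}")             # a negative value also indicates left skew
```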