Introduction to the Practice of Statistics
Fifth Edition
Moore, McCabe
Section 5.2 Homework Answers
5.29 An automatic grinding machine in an auto parts plant prepares axles with a target diameter µ =
40.125 mm. The machine has some variability, so the standard deviation of the diameters is σ =
0.002 mm. A sample of 4 axles is inspected each hour for process control purposes, and records are kept
of the sample mean diameter. If the process mean is exactly equal to the target value, what will be the
mean and standard deviation of the numbers recorded?
\[ \mu_{\bar{x}} = 40.125 \text{ mm} \]
\[ \sigma_{\bar{x}} = \frac{0.002}{\sqrt{4}} = 0.001 \text{ mm} \]
5.31 Averages are less variable than individual observations. Suppose that the axle diameters in Exercise
5.29 vary according to a normal distribution. In that case, the mean x̄ of an SRS of axles also has a
normal distribution.
a) Make a sketch of the normal curve for a single axle. Add the normal curve for the mean of an SRS of 4
axles on the same sketch.
b) What is the probability that the diameter of a single randomly chosen axle differs from the target value
by 0.004 mm or more?
Let the random variable X measure the diameter of an axle. Since 0.004 mm is two standard deviations
(2 × 0.002 mm), the 68-95-99.7 rule gives
P(X < 40.121 or X > 40.129) ≈ 5%.
c) What is the probability that the mean diameter of an SRS of 4 axles differs from the target value by
0.004 mm or more?
\[ P(\bar{x} < 40.121 \text{ or } \bar{x} > 40.129) = 2P\left(Z < \frac{40.121 - 40.125}{0.002/\sqrt{4}}\right) = 2P(Z < -4) \approx 0 \]
More precisely, 2P(Z < −4) ≈ 0.000063.
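As a quick numerical check of 5.31(b) and (c), here is a sketch in Python that uses only the standard `math` module. The helper names `normal_cdf` and `prob_outside` are our own, and Φ is computed from `math.erfc`. It suggests that the 5% from the 68-95-99.7 rule for a single axle is really about 0.0455, while the mean of 4 axles gives about 0.000063.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), expressed with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def prob_outside(target: float, tol: float, sigma: float, n: int = 1) -> float:
    """P(|x̄ - target| >= tol) when x̄ ~ N(target, sigma / sqrt(n))."""
    sd = sigma / math.sqrt(n)
    return 2 * normal_cdf(-tol / sd)

single = prob_outside(target=40.125, tol=0.004, sigma=0.002)          # z = -2
mean_of_4 = prob_outside(target=40.125, tol=0.004, sigma=0.002, n=4)  # z = -4
print(f"one axle:        {single:.4f}")
print(f"mean of 4 axles: {mean_of_4:.6f}")
```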
5.36 North Carolina State University posts the grade distributions for its courses online. You can find that the
distribution of grades in Statistics 101 in the fall 2003 semester was:
Grade   X   P(x)
A       4   0.21
B       3   0.43
C       2   0.30
D       1   0.05
F       0   0.01
a) Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the grade of a randomly chosen
Statistics 101 student. Use the definitions of the mean (page 292) and standard deviation (page 300) for discrete
random variables to find the mean µ and the standard deviation σ of grades in this course.
µ = 0.21(4) + 0.43(3) + 0.30(2) + 0.05(1) + 0.01(0) = 2.78
\[ \sigma = \sqrt{0.21(4 - 2.78)^2 + 0.43(3 - 2.78)^2 + 0.30(2 - 2.78)^2 + 0.05(1 - 2.78)^2 + 0.01(0 - 2.78)^2} = \sqrt{0.7516} \approx 0.8669 \]
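The definitions used in part (a) translate directly into a few lines of code. This is a minimal Python sketch. The function name `discrete_mean_sd` is hypothetical, and the check that the probabilities sum to 1 is our own addition.

```python
import math

def discrete_mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable from its probability table."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return mu, math.sqrt(var)

grades = [4, 3, 2, 1, 0]                 # A, B, C, D, F
probs = [0.21, 0.43, 0.30, 0.05, 0.01]
mu, sigma = discrete_mean_sd(grades, probs)
print(f"mu = {mu:.2f}, sigma = {sigma:.4f}")
```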
b) Statistics 101 is a large course. We can take the grades of an SRS of 50 students to be independent of
each other. If x̄ is the average of these 50 grades, what are the mean and standard deviation of x̄?
\[ \mu_{\bar{x}} = 2.78, \qquad \sigma_{\bar{x}} = \frac{0.8669}{\sqrt{50}} \approx 0.1226 \]
c) What is the probability P(X ≥ 3) that a randomly chosen Statistics 101 student gets a B or better? What
is the approximate probability P(x̄ ≥ 3) that the grade point average for 50 randomly chosen Statistics 101
students is B or better?
P(X ≥ 3) = 0.43 + 0.21 = 0.64
\[ P(\bar{x} \ge 3) \approx P\left(Z > \frac{3 - 2.78}{0.8669/\sqrt{50}}\right) \approx P(Z > 1.794) \approx 0.0364 \]
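Parts (b) and (c) can be checked the same way. This sketch assumes, as the solution does, that the CLT normal approximation is adequate for n = 50. `normal_sf` is our own helper for P(Z > z). Because z is not rounded here, the last digit may differ slightly from the hand calculation.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 2.78, 0.8669, 50
sd_mean = sigma / math.sqrt(n)          # standard deviation of x̄
z = (3 - mu) / sd_mean
print(f"sd of x-bar = {sd_mean:.4f}, z = {z:.3f}, P(x-bar >= 3) ≈ {normal_sf(z):.4f}")
```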
5.37 Sheila's doctor is concerned that she may suffer from gestational diabetes (high blood glucose levels
during pregnancy). There is variation both in the actual glucose level and in the blood test that measures
the level. A patient is classified as having gestational diabetes if the glucose level is above 140 milligrams
per deciliter (mg/dl) one hour after a sugary drink is ingested. Sheila's measured glucose level one hour
after ingesting the sugary drink varies according to the normal distribution with µ = 125 mg/dl and
σ = 10 mg/dl.
(a) If a single glucose measurement is made, what is the probability that Sheila is diagnosed as having
gestational diabetes?
The key to any of these problems is to be aware of the assumptions concerning your situation.
Fact 1: We are interested in glucose levels.
Fact 2: Sheila's glucose levels vary, and the distribution is normal with µ = 125 mg/dl
and σ = 10 mg/dl.
Fact 3: A person with a glucose level above 140 mg/dl is classified as having gestational diabetes.
\[ P(X > 140) = P\left(Z > \frac{140 - 125}{10}\right) = P(Z > 1.5) = 0.0668 \]
What the result suggests is that if we take one reading at random, there is a 6.68% chance of getting a
glucose reading higher than 140 mg/dl.
I imagine that what the doctor is really interested in is not a single measurement to determine whether Sheila
should be classified with gestational diabetes, but rather Sheila's average glucose level. A single
measurement can be used to estimate µ, but we also understand that one measurement alone carries too
much variability.
(b) If measurements are made instead on 4 separate days and the mean result is compared with the
criterion 140 mg/dl, what is the probability that Sheila is diagnosed as having gestational diabetes?
This question is about the probability for the average of four measurements. The key word here is
average: we are concerned with the probability of obtaining a particular average, so we need the
distribution of averages (the sampling distribution of x̄).
Fact: Since the original distribution from which we sampled is normal, the sampling distribution of the
mean of four independent measurements is exactly normal as well. Know this fact well.
\[ P(\bar{x} > 140) = P\left(Z > \frac{140 - 125}{10/\sqrt{4}}\right) = P(Z > 3.00) = 0.0013 \]
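To see how averaging over several days changes the diagnosis probability, here is a short sketch. It assumes the daily measurements are independent draws from N(125, 10), which is what makes x̄ exactly normal with standard deviation 10/√n. `prob_diagnosed` is a hypothetical helper name.

```python
import math

MU, SIGMA, CUTOFF = 125, 10, 140   # mg/dl, from the exercise

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_diagnosed(n_days: int) -> float:
    """P(mean of n_days measurements > CUTOFF); exact because each measurement is normal."""
    return normal_sf((CUTOFF - MU) / (SIGMA / math.sqrt(n_days)))

for n in (1, 4):
    print(f"{n} measurement(s): P(diagnosed) = {prob_diagnosed(n):.4f}")
```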
5.38 A $1 bet in a state lottery's Pick 3 game pays $500 if the three-digit number you choose exactly
matches the winning number, which is drawn at random. Here is the distribution of the payoff X:
Payoff   Prob
$0       0.999
$500     0.001
Each day's drawing is independent of other drawings.
(a) What are the mean and standard deviation of the random variable X?
µ = $0(0.999) + $500(0.001) = $0.50
\[ \sigma = \sqrt{0.999(0 - 0.5)^2 + 0.001(500 - 0.5)^2} = \sqrt{249.75} \approx \$15.80 \]
(b) Joe buys a Pick 3 ticket every day. What does the law of large numbers say about the average payoff
Joe receives from his bets?
As Joe buys more and more tickets, his average payoff per $1 bet will get closer and closer to µ = $0.50.
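The law of large numbers can be watched in action with a small simulation. This sketch uses Python's `random` module with a fixed seed, so the particular averages it prints are pseudo-random and will change with the seed. `running_average_payoff` is our own illustrative helper. As the number of bets grows, the printed averages should settle near µ = $0.50.

```python
import random

def running_average_payoff(n_bets: int, seed: int = 0) -> float:
    """Average payoff over n_bets independent Pick 3 tickets: $500 with probability 0.001, else $0."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_bets) if rng.random() < 0.001)
    return 500 * wins / n_bets

for n in (365, 36_500, 3_650_000):
    print(f"{n:>9,} bets: average payoff ${running_average_payoff(n):.3f}")
```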
(c) What does the Central Limit Theorem say about the distribution of Joe's average payoff after 365 bets
in a year?
The distribution of the average of the 365 payoffs, while discrete, will have approximately the shape of a
normal distribution, with mean $0.50 and standard deviation 15.80/√365 ≈ $0.83.
(d) Joe comes out ahead for the year if his average payoff is greater than $1 (the amount spent each day on
a ticket). What is the probability that Joe ends the year ahead?
\[ P(\bar{x} \ge \$1) \approx P\left(Z > \frac{1 - 0.5}{15.80/\sqrt{365}}\right) = P(Z > 0.60) = 0.2742 \]
There is a 27.42% chance that at the end of the year Joe's average winnings exceed $1.
Now let's get back to reality. I assigned this problem so you can practice this concept with a discrete table.
But the original population suffers from a severe case of right skewness. Thus, the distribution of the
average of 365 payoffs will NOT be close to normal. We would need a sample size of roughly 10,000 to get a
graph that has that normal shape.
Below is what the distribution of the average of 365 payoffs looks like for the first 17 of the 366 possible
averages, along with the table of values. The most likely outcome is that Joe's average over the entire year
is $0: P(x̄ = $0) = 0.6941. At the other extreme, Joe averages $500, which means he wins $500 on every
pick. P(x̄ = $500) = 0.001^365 = 10^(−1095), a number so small that standard floating-point arithmetic
rounds it to zero. Winning the Powerball jackpot is far more likely.
[Figure: Average Yearly Winnings; x-axis: Possible Averages of 365, y-axis: Probability]

x̄ ($)    P(x̄)
0.00     0.69407
1.37     0.25359
2.74     0.04620
4.11     0.00560
5.48     0.00051
6.85     0.00004
8.22     0.00000
9.59     0.00000
10.96    0.00000
12.33    0.00000
13.70    0.00000
15.07    0.00000
16.44    0.00000
17.81    0.00000
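So the exact answer to the book's question is P(x̄ ≥ $1) = 1 − P(x̄ = $0) = 1 − 0.6941 = 0.3059, because a
single win already gives Joe an average of 500/365 ≈ $1.37. This is reasonably close to our approximate
value of 0.2742.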
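The exact binomial calculation behind the table, and its comparison with the normal approximation, can be sketched as follows. It assumes each day's drawing is an independent trial with win probability 0.001, so the number of wins in 365 days is Binomial(365, 0.001). With the unrounded z ≈ 0.604, the normal approximation comes out near 0.273 rather than the 0.2742 obtained with z rounded to 0.60.

```python
import math

P_WIN, PAYOFF, N = 0.001, 500, 365

def prob_wins(k: int, n: int = N, p: float = P_WIN) -> float:
    """Binomial probability of exactly k winning tickets in n independent daily draws."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact distribution of the yearly average payoff: x̄ = PAYOFF * k / N with probability prob_wins(k).
for k in range(6):
    print(f"x-bar = {PAYOFF * k / N:6.2f}   P = {prob_wins(k):.5f}")

exact_ahead = 1 - prob_wins(0)   # a single win already gives x̄ = 500/365 ≈ 1.37 > 1

# Normal approximation from the CLT, with z left unrounded.
mu = P_WIN * PAYOFF
sigma = math.sqrt(P_WIN * (PAYOFF - mu) ** 2 + (1 - P_WIN) * (0 - mu) ** 2)
z = (1 - mu) / (sigma / math.sqrt(N))
normal_approx = 0.5 * math.erfc(z / math.sqrt(2))
print(f"exact P(x-bar >= 1) = {exact_ahead:.4f}, normal approximation = {normal_approx:.4f}")
```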
The distribution of the average of 8000 payoffs is shown below.
[Figure: Distribution of averages of 8000, first 23 values; x-axis: Possible averages of 8000, y-axis: Probability]
5.40 The number of flaws per square yard in a type of carpet material varies with mean 1.6 flaws per square
yard and standard deviation 1.2 flaws per square yard. This population distribution cannot be normal,
because a count takes only whole-number values (i.e., this is a discrete population). An inspector studies
200 square yards of the material and records the number of flaws found in each square yard. Use the
central limit theorem to find the approximate probability that the mean number of flaws exceeds 2 per
square yard.
\[ P(\bar{x} > 2) \approx P\left(Z > \frac{2 - 1.6}{1.2/\sqrt{200}}\right) = P(Z > 4.71) \approx 0.0000012 \]
The result indicates that a sample of 200 square yards will average more than 2 flaws per square yard only
about 12 times out of 10,000,000 samples, or a little more than once in 1,000,000.
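The same CLT calculation, as a sketch. `clt_upper_tail` is a hypothetical helper, and the result is only as good as the normal approximation to the sampling distribution of x̄.

```python
import math

def clt_upper_tail(threshold: float, mu: float, sigma: float, n: int) -> tuple[float, float]:
    """CLT approximation: return (z, P(x̄ > threshold)) for the mean of n observations."""
    z = (threshold - mu) / (sigma / math.sqrt(n))
    return z, 0.5 * math.erfc(z / math.sqrt(2))

z, p = clt_upper_tail(threshold=2, mu=1.6, sigma=1.2, n=200)
print(f"z = {z:.2f}, P(x-bar > 2) ≈ {p:.7f}")
```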
In this situation we have a population that is not normally distributed. Regardless of the distribution type,
we can still calculate the mean and standard deviation, but the 68-95-99.7 rule no longer applies.
We are told that µ = 1.6 flaws per sq yd and σ = 1.2 flaws per sq yd. Again, you can see that this is not
normally distributed: the smallest possible measurement is 0 flaws per sq yd, yet going two standard
deviations to the left, 1.6 − 2(1.2) = −0.8, gives negative flaws per sq yd, which is nonsense. Also, normal
distributions are continuous while this distribution is discrete: the only possible outcomes are whole
numbers, 0 flaws/sq yd, 1 flaw/sq yd, 2 flaws/sq yd, and so on. We cannot have 1.75 flaws/sq yd in one
measurement; only when we average several values can we get fractional flaws/sq yd.
The situation is that we will look at 200 sq yds of material divided into 1 yd by 1 yd blocks. We record the
flaws in each square yard and then average all 200 numbers. What is the probability that this average is
greater than 2 flaws/sq yd?
Key words: central limit theorem. The theorem is mentioned to remind you that you will be dealing with the
sampling distribution of x̄, not the population itself. Also, according to the theorem, the sampling
distribution is only approximately normal, so our calculated probability is also an approximation.
[Figure: Simulated distribution based on given information]
You can see from the histogram and the normal quantile plot that the central limit theorem holds up here:
the simulated sampling distribution is very, very close to a normal distribution. The nearly straight line in
the normal quantile plot indicates that it is extremely close to normal, so much so that we can rely on the
calculations we are about to make as very good approximations.
I want to calculate the probability that my average of 200 numbers exceeds 2 flaws per sq yd. I can see
from my histogram that this is very unlikely, since in my 750 simulations it did not occur even once.
I used the simulated distribution above to sample from.
[Figure: Normal quantile plot for the sampling distribution simulation; 750 sample means averaging 200 values at a time; x-axis: Expected z-score, y-axis: Averages of 200 values]
I conducted a simulation in which I sampled 200 square yards of material and recorded the flaws in each of
the 200 pieces measuring 1 yd by 1 yd. I then averaged the 200 values I recorded to get one value of the
sample mean, x̄. I repeated this procedure 765 more times to obtain the sampling distribution of the
mean for the 765 values. By doing this many times, I hope that the distribution I got by experiment is close
to the theoretical distribution, or at least that I get a glimpse of what the theoretical distribution probably
looks like.
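The simulation described above can be reproduced along these lines. The solution does not say which discrete population was used, so this sketch assumes a hypothetical one: a Binomial(16, 0.1) flaw count. It was chosen only because its mean is 16 × 0.1 = 1.6 and its standard deviation is √(16 × 0.1 × 0.9) = 1.2, matching the problem. The sketch draws 750 samples of 200 square yards each with `random.choices`, and the exact numbers depend on the seed.

```python
import math
import random
import statistics

# Hypothetical population (an assumption, not given in the exercise): a Binomial(16, 0.1)
# flaw count per square yard, which has mean 1.6 and standard deviation 1.2.
N_TRIALS, P_FLAW = 16, 0.1
VALUES = list(range(N_TRIALS + 1))
WEIGHTS = [math.comb(N_TRIALS, k) * P_FLAW**k * (1 - P_FLAW) ** (N_TRIALS - k) for k in VALUES]

def simulate_sample_means(reps: int = 750, n: int = 200, seed: int = 1) -> list[float]:
    """Return `reps` sample means, each the average flaw count over `n` square yards."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(VALUES, weights=WEIGHTS, k=n)) for _ in range(reps)]

means = simulate_sample_means()
print(f"mean of the sample means: {statistics.fmean(means):.3f} (population mean 1.6)")
print(f"sd of the sample means:   {statistics.stdev(means):.4f} (theory {1.2 / math.sqrt(200):.4f})")
print(f"sample means above 2:     {sum(m > 2 for m in means)} of {len(means)}")
```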
5.41 In response to the increasing weight of airline passengers, the Federal Aviation Administration in
2003 told airlines to assume that passengers average 190 pounds in the summer, including clothing and
carry-on baggage. But passengers vary: the FAA gave a mean but not a standard deviation. A reasonable
standard deviation is 35 pounds. Weights are not normally distributed, especially when the population
includes both men and women, but they are not very non-normal. A commuter plane carries 19
passengers. What is the approximate probability that the total weight of the passengers exceeds 4000
pounds? (Hint: To apply the central limit theorem, restate the problem in terms of the mean weight.)
How do we go about organizing this information? Keep in mind what we are doing: we are sampling from
the U.S. population and measuring the weight of a person plus clothing and carry-on baggage. So our
random variable X is this combined weight. What is the distribution like? The problem says it is "not
normally distributed," yet the next statement suggests that we are not far off: "but they are not very
non-normal." So I will assume it is slightly right-skewed, since some people carry heavy carry-on items,
which skews the weights to the right.
The plane carries 19 people, and since this problem involves the maximum weight of an airplane, I will
calculate for the worst-case scenario, a full plane. The total should not exceed 4000 lbs, which means that,
for the 19 passengers, the average weight cannot exceed 4000/19 ≈ 210.52 lbs.
Gather an SRS of size n = 19 and calculate x̄.
The central limit theorem suggests that the sampling distribution of the mean will be close to a normal
distribution for a sample of 19. So now I will proceed with the final calculations using procedures that
apply to a normal distribution.
\[ \text{cutoff for } \bar{x} = \frac{4000 \text{ lbs}}{19} \approx 210.52 \text{ lbs} \]
\[ \sigma_{\bar{x}} = \frac{35}{\sqrt{19}} \approx 8.03 \text{ lbs} \]
\[ P(\bar{x} > 210.52) = P\left(Z > \frac{210.52 - 190}{35/\sqrt{19}}\right) = P(Z > 2.56) = 0.0052 \]
The result suggests that, in the long run, a full passenger plane will exceed its maximum weight on about 5
out of every 1000 flights.
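The hint's restatement can be checked in a short sketch: "total weight above 4000 lbs" is the same event as "x̄ above 4000/19". The sketch keeps full precision for 4000/19 ≈ 210.526 instead of 210.52. It also computes z directly for the total weight, using mean 19 × 190 and standard deviation 35√19, which assumes independent passengers. The two z-scores agree.

```python
import math

# Values from the exercise (sigma = 35 lbs is the "reasonable" value it suggests).
MU, SIGMA, N, LIMIT = 190, 35, 19, 4000

mean_cutoff = LIMIT / N                      # total > 4000  <=>  x̄ > 4000/19
sd_mean = SIGMA / math.sqrt(N)               # standard deviation of x̄
z_mean = (mean_cutoff - MU) / sd_mean

# Same event on the scale of the total: mean N*MU, sd SIGMA*sqrt(N) (independent passengers assumed).
z_total = (LIMIT - N * MU) / (SIGMA * math.sqrt(N))

p = 0.5 * math.erfc(z_mean / math.sqrt(2))  # P(Z > z)
print(f"cutoff = {mean_cutoff:.2f} lbs, sd = {sd_mean:.2f} lbs, z = {z_mean:.2f}, P ≈ {p:.4f}")
```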
5.43 The distribution of annual returns on common stocks is roughly symmetric, but extreme
observations are more frequent than in a normal distribution. Because the distribution is not strongly nonnormal, the mean return over even a moderate number of years is close to normal. Annual real returns on
the Standard & Poor's 500-Stock Index over the period 1871 to 2004 have varied with mean 9.2% and
standard deviation 20.6%.
Andrew plans to retire in 45 years and is considering investing in stocks. What is the probability (assuming that the past pattern of variation continues) that the mean annual return on common stocks over the
next 45 years will exceed 15%? What is the probability that the mean return will be less than 5%?
First, let us figure out what we are measuring. How is our sample space defined? The answer lies in the
last paragraph: we are measuring stock returns as a percentage of the investment. Thus our measurements
will be percentages (or decimals), positive for gains and negative for losses.
We are given information about the distribution of stock returns on a yearly basis: µ = 9.2% and
σ = 20.6%, and the distribution is symmetric but not normal. All this means is that the distribution is
bell-shaped like a normal distribution, but extreme observations occur more often than they would in a
normal distribution.
Gather an SRS of size n = 45 and calculate x̄.
This Andrew fellow is going to retire in 45 years, but the distribution we have is on a yearly basis. The
question is about the average return over 45 years, which means sampling 45 times from the given
distribution. The central limit theorem says that the sampling distribution of the mean will be very close to
a normal distribution, so I will proceed using calculations that apply to a normal distribution.
\[ P(\bar{x} > 15\%) = ? \]
\[ \sigma_{\bar{x}} = \frac{20.6\%}{\sqrt{45}} \approx 3.07\% \]
\[ P(\bar{x} > 15\%) = P\left(Z > \frac{15\% - 9.2\%}{20.6\%/\sqrt{45}}\right) = P(Z > 1.89) = 0.0294 \]
The second question is handled the same way.
What is the probability that the mean return will be less than 5%?
\[ P(\bar{x} < 5\%) = P\left(Z < \frac{5\% - 9.2\%}{20.6\%/\sqrt{45}}\right) = P(Z < -1.37) = 0.0853 \]
Both answers suggest that Andrew's mean annual return will most likely be fairly close to the long-run mean
(the expected value) of 9.2%: the probability that it falls between 5% and 15% is about
1 − 0.0294 − 0.0853 = 0.8853.
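Both probabilities, and the chance of landing between them, can be computed together in a sketch. Here z is left unrounded, so the results may differ from the table-based answers in the fourth decimal place. `normal_cdf` is our own helper built on `math.erfc`, and the 45 annual returns are assumed to be independent.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

MU, SIGMA, YEARS = 9.2, 20.6, 45        # annual real return in percent, from the exercise
sd_mean = SIGMA / math.sqrt(YEARS)      # standard deviation of the 45-year mean return

p_above_15 = 1 - normal_cdf((15 - MU) / sd_mean)
p_below_5 = normal_cdf((5 - MU) / sd_mean)
print(f"sd of x-bar = {sd_mean:.2f}%")
print(f"P(x-bar > 15%) ≈ {p_above_15:.4f}, P(x-bar < 5%) ≈ {p_below_5:.4f}")
print(f"P(5% <= x-bar <= 15%) ≈ {1 - p_above_15 - p_below_5:.4f}")
```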