Math 243
FINAL REVIEW WITH ANSWERS
(© 03/15/08 - Sharon Coates)
Concepts to review before answering the questions:
• A population consists of the entire group of people or objects of interest to an investigator, while a sample refers to the part of the population that the investigator actually studies.
• In certain contexts, a population can also refer to a process (such as flipping a coin or manufacturing a candy bar) that in principle can be repeated indefinitely. With this interpretation of population, a sample is a specific collection of process outcomes; e.g., the number of heads in n tosses or the weights of n candy bars.
• A parameter is a numerical characteristic of a population, while a statistic is a numerical characteristic of a sample. Examples:

                                  (population) parameter    (sample) statistic
  mean (average)                  μ                         x̄
  standard deviation              σ                         s
  proportion (of “successes”)     p                         p̂

• A statistic from a random sample or a randomized experiment is a random variable. The probability distribution of the statistic is its sampling distribution.
• The binomial distributions are an important class of discrete probability distributions. The key skill in using binomial distributions is recognizing the situations to which they do and don’t apply.
• In the binomial setting, the count X has a binomial distribution. The count takes whole numbers between 0 and n. If both np and n(1 – p) are at least 10, probabilities about the count X can be approximated by the appropriate normal distribution, N(np, √(np(1 – p))).
• The proportion p̂ = X/n is always a number between 0 and 1. The proportion p̂ does not have a binomial distribution, but we can do probability calculations about p̂ by restating them in terms of the count X and using binomial methods. The sampling distribution of p̂ is approximately normal if np and n(1 – p) are both at least 10.
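The rule of thumb above can be written as a small check. This is only a Python sketch: the function name `normal_approx_ok` is hypothetical, and the threshold of 10 is the one this review uses, exposed as a parameter. The (n, p) pairs come from Questions 2, 3, and 5 below.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Rule of thumb used in this review: np and n(1 - p) both at least 10."""
    return n * p >= threshold and n * (1 - p) >= threshold


# (n, p) pairs taken from Questions 2, 3 and 5 of this review.
for n, p in [(20, 0.8), (135, 0.90), (500, 0.48)]:
    print(f"n={n}, p={p}: np={n * p:.1f}, n(1-p)={n * (1 - p):.1f}, "
          f"normal approximation OK? {normal_approx_ok(n, p)}")
```

For Question 2 the check fails (n(1 – p) = 4), which is why that answer uses the binomial distribution directly.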
Question 1. Consider a game in which a red die and a blue die (assumed fair dice) are rolled. Let
the random variable X represent the value showing on the up face of the red die. Since X is a
discrete random variable, its probability distribution is tabulated:
X       1     2     3     4     5     6
P(X)   1/6   1/6   1/6   1/6   1/6   1/6
Let the random variable Y represent the value showing on the up face of the blue die. In parts a
through f below, give values.
a. The mean of the probability distribution of X is ____________.
b. The standard deviation of the probability distribution of X is _________.
c. (Shortened version of question) The mean of Y is _____________.
d. The standard deviation of Y is ______________.
Suppose that you are offered a choice of the following two games:
Game 1: costs $7 to play. You win the sum of the values showing on the up faces of the red and blue dice.
Game 2: doesn’t cost anything to play initially, but you win 4 times the difference of the up faces;
i.e., 4(X – Y). If 4(X – Y) is negative, you must pay that amount. If 4(X – Y) is positive, you
receive that amount. Net amount = total winnings – cost to play.
e. For Game 1, the net amount won in a game is the random variable X + Y – 7.
(1 pt) The mean of (the probability distribution of) X + Y – 7 is $_____.
(1 pt) The standard deviation of X + Y – 7 is $ _______________.
f. For Game 2, the net amount won in a game is 4(X – Y) or 4X – 4Y.
(1 pt) The mean of (the probability distribution of) 4(X – Y) is $ _______.
(1 pt) The standard deviation of 4(X – Y) is $_____________.
g. (2 pts) Based on your responses to parts e and f above, if you had to play and you like excitement, which game should you choose and why?
Question 2. “Girls Get Higher Grades” was a headline from USA Today, Aug 12, 1998. Eighty percent of the girls in grade 12 surveyed said it was important to them to do their best in all classes, whereas only 65% of the boys in grade 12 surveyed responded this way. Suppose you take an SRS of 20 female students and another sample of 20 male students in your region of the country.
a. To find the probability that more than half of the girls in the sample want to do their best in
every class, organize your response:
In a complete English sentence, define the random variable in this context.
What is an appropriate symbol for the random variable? _________
Complete the mathematical sentence to find the probability by replacing “event” with an
inequality involving the random variable, connecting (with an equals sign) the appropriate
distribution function, and rounding the resulting probability to the nearest ten-thousandth:
P(event) =
b. What conditions would have to be satisfied in order to use your approach?
Question 3. Airlines routinely overbook popular flights because they know that not all ticket holders show up. If more passengers show up than there are seats available, the airline offers each passenger who cannot be seated $100 and a seat on the next flight. A particular 120-seat commuter flight has a 10% no-show rate.
a. Suppose the airline sells 130 tickets for the flight.
If a ticketed passenger is a “trial,” how many trials are there? ________
A ticketed passenger either shows up or doesn’t show up for the flight. Assume that
there are no groups traveling together. What is the probability that an individual ticketed
passenger will show up for the flight? _______
On average, how many ticketed passengers will show up for a 120-seat commuter
flight? ________
When does the airline have to make a payout on a 120-seat commuter flight?
Let the random variable Y represent the amount of the payout if the airline sells 130
tickets for the flight. In the space provided, create the probability distribution of the
discrete random variable Y where the values of Y are in the first row of the table and the
corresponding probabilities P(Y) are in the second row. A table is appropriate because Y
is discrete. Make the table horizontal to save space.
How much money does the airline “expect” to have to pay out per flight? Hint: the
expected value of the payout is the mean of the probability distribution of payout
amounts based on the sale of 130 tickets.
b. To calculate how many tickets the airline should sell per flight if they want the chance of having to pay out any $100 compensation to be about 0.05, organize your response:
Let the random variable X count the number of ticketed passengers (out of the n tickets sold) who will show up for the 120-seat commuter flight. The range of values of X is 0 to n. In terms of X, the airline has to make a payout on a 120-seat commuter flight when _______________.
If the airline sells n = 135 tickets, will the probability that the airline will have to make a payout be only about 0.05? To find out, complete the mathematical sentence by replacing “event” with an inequality involving the random variable, connecting the appropriate distribution function (with an equals sign), and rounding the resulting probability to the nearest ten-thousandth:
P(event) =
What conditions would have to be satisfied in order to use the distribution function that
you selected?
You want to find out what n should be so that the chance the airline will have to make a payout is only about 5%. Use trial and error to obtain n, using your result above to give you some direction. Then complete the sentence below by replacing “event” with an inequality that involves the random variable. After the equals sign, give the appropriate distribution function and check that the result confirms that the probability of a payout is about 5%:
The airline can sell ______ tickets for the 120-seat commuter flight
because P(event) =
Question 4. The process for manufacturing a ball bearing results in weights that have an
approximately normal distribution with mean 0.15 g and standard deviation 0.003 g. Define the
random variable in each case. Start the mathematical sentence P(event) where the event involves
the random variable. Include the appropriate calculator distribution function in the sentence.
a. If you select 1 ball bearing at random, what is the probability that it weighs less than 0.148 g?
b. If you select 4 ball bearings at random, what is the probability that their average (mean) weight is less than 0.148 g?
c. If you select 10 ball bearings at random, what is the probability that their average (mean) weight is less than 0.148 g?
Question 5. Suppose that a particular candidate for public office is in fact favored by 48% of all registered voters in the district. A polling organization will take a random sample of 500 voters and will use the sample proportion p̂ to estimate the true proportion p. What is the approximate probability that p̂ will be greater than 0.5, causing the polling organization to incorrectly predict the result of the upcoming election?
Question 6. Two business associates have opened a wine bottling factory. The machine that fills the bottles is fairly precise. The distribution of the number of ounces of wine in the bottles is normal. Although σ, the true population standard deviation, is not usually known, suppose for illustrative purposes that σ = 0.06 oz. The mean μ is supposed to be 16 oz, but the machine slips away from that occasionally and has to be readjusted. The associates take a random sample of 10 bottles from today’s production run and weigh the wine in each. The weights (in oz) are:
15.91   16.08   16.08   15.94   15.94   15.96   16.03   15.82   16.02   15.96
Should the machine be adjusted? Use a 95% confidence interval to estimate μ in order to support your advice.
Question 7. Heights of women in the U.S. are normally distributed and have a standard deviation of about 2 inches. Suppose you want to estimate the mean height μ of all the women in your city from a random sample of size n at the 90% confidence level. What minimum sample size would you need in order to estimate the mean height μ to within:
a. two inches?
b. one-half inch?
Answers:
Question 1.
a. The mean of the probability distribution of X is 3.5.
b. The standard deviation of the probability distribution of X is 1.70783.
c. (Shortened version of question) The mean of Y is 3.5.
d. The standard deviation of Y is 1.70783.
e. The mean of (the probability distribution of) X + Y – 7 is: $0.00.
Fast way: \( \mu_{X+Y-7} = \mu_X + \mu_Y - 7 = 3.5 + 3.5 - 7 = 0 \)
The standard deviation of X + Y – 7 is approximately $2.415.
Fast way: \( \sigma_{X+Y-7} = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{1.70783^2 + 1.70783^2} \approx 2.4152 \)
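The “fast way” is used to calculate the standard deviation of X + Y – 7 because X and Y are independent random variables: the outcome of one die does not influence the outcome of the other die, so their variances add. Subtracting the constant 7 shifts the mean but does not change the standard deviation.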
Slow way: Give the probability distribution of X + Y.
X + Y       2     3     4     5     6     7     8     9     10    11    12
P(X + Y)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Then give the probability distribution of the net winnings X + Y – 7 (in dollars):
X + Y – 7       -$5   -$4   -$3   -$2   -$1   $0    $1    $2    $3    $4    $5
P(X + Y – 7)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Enter the first-row values of the distribution of net winnings in List 1 (L1) and the second-row values in List 2 (L2). Calculate 1-Var Stats on L1, L2 (or on List 1 with Freq: List 2, depending on your TI model). The mean x̄ and the standard deviation σx reported by the calculator will match the results above. (Because the lists describe a probability distribution, the symbol x̄ is replaced with the appropriate symbol μ.)
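As a cross-check on the calculator, here is a Python sketch that computes the same mean and standard deviation exactly. Rather than typing the table into lists, it enumerates all 36 equally likely (red, blue) outcomes, which assumes fair, independent dice as the question states.

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, blue) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

# Net winnings for Game 1: X + Y - 7.
net = [x + y - 7 for x, y in outcomes]

mean = sum(prob * w for w in net)               # exact mean, as a Fraction
var = sum(prob * (w - mean) ** 2 for w in net)  # exact variance of the distribution
sd = math.sqrt(var)

# "Fast way": variances of independent X and Y add; the constant -7
# shifts the mean but leaves the spread unchanged.
sd_die = math.sqrt(Fraction(35, 12))            # SD of a single fair die
fast_sd = math.sqrt(sd_die**2 + sd_die**2)

print(mean, round(sd, 4), round(fast_sd, 4))
```

Using exact fractions for the probabilities avoids rounding 1/36 before the sums are taken.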
f. The mean of (the probability distribution of) 4(X – Y) is $0.00.
Fast way: \( \mu_{4X-4Y} = \mu_{4X} - \mu_{4Y} = 4\mu_X - 4\mu_Y = 4(3.5) - 4(3.5) = 0 \)
The standard deviation of 4(X – Y) is approximately $9.661.
Fast way: \( \sigma_{4X-4Y} = \sqrt{4^2\sigma_X^2 + 4^2\sigma_Y^2} = \sqrt{4^2(1.70783^2) + 4^2(1.70783^2)} \approx 9.66095 \)
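The “fast way” is used to calculate the standard deviation of 4(X – Y) because X and Y are independent random variables: the outcome of one die does not influence the outcome of the other die. Note that the variances add even though Y is subtracted, and multiplying by 4 multiplies each variance by 4² = 16.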
Slow way: Give the probability distribution of X – Y.
X – Y      -5    -4    -3    -2    -1     0     1     2     3     4     5
P(X – Y)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Then give the probability distribution of the net winnings 4(X – Y) (in dollars):
4(X – Y)      -$20  -$16  -$12  -$8   -$4   $0    $4    $8    $12   $16   $20
P(4X – 4Y)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Then find μ and σ with your calculator.
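The same enumeration works for Game 2. This Python sketch again assumes fair, independent dice and computes the population standard deviation (σ, not the sample s), matching what the calculator’s σx reports for a probability distribution.

```python
import math
from itertools import product

# Net winnings for Game 2 over all 36 equally likely (red, blue) outcomes.
net = [4 * (x - y) for x, y in product(range(1, 7), repeat=2)]

# Each outcome has probability 1/36, so plain averages give μ and σ.
mean = sum(net) / len(net)
sd = math.sqrt(sum((w - mean) ** 2 for w in net) / len(net))

# "Fast way": Var(4X - 4Y) = 16 Var(X) + 16 Var(Y) for independent X and Y.
var_die = 35 / 12                     # variance of a single fair die
fast_sd = math.sqrt(16 * var_die + 16 * var_die)

print(mean, round(sd, 3), round(fast_sd, 3))
```

The printed standard deviation is four times the Game 1 value, since 4(X – Y) and X + Y – 7 have spreads in the ratio 4 : 1.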
g. A gambler who likes excitement should choose Game 2. Both games’ net winnings average out to $0 in the long run, but the standard deviation of the net winnings in Game 2 (about $9.66) is four times that in Game 1 (about $2.42). This greater variability means the gambler could win a lot of money, but could also lose a lot. Game 2 carries greater risk than Game 1, and that is the exciting part for a gambler.
Question 2.
a. The random variable is the count, in the sample of 20 grade-12 girls, of those who would say they want to do their best in every class.
Note: The conditions for approximating the distribution of the sample proportion p̂ of girls who want to do their best by a normal distribution are not met, because n(1 – p) is only 4, not at least 10. So use the count X, not the proportion p̂, as the random variable in this scenario.
The random variable could be symbolized by X (the count of those with the trait of interest in the sample). “More than half” of 20 girls means X ≥ 11.
TI-89/Voyage 200:
The math sentence is: P(X ≥ 11) = binomialcdf(20, 0.8, 11, 20) ≈ 0.9974.
TI-83 family/TI-86:
The math sentence is: P(X ≥ 11) = 1 – binomialcdf(20, 0.8, 10) ≈ 0.9974.
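For readers without a TI calculator, a minimal Python sketch of the same binomial calculation follows. The helper names `binom_pmf` and `binom_cdf` are hypothetical; `binom_cdf(k, n, p)` returns P(X ≤ k), the quantity the one-bound `binomialcdf(20, 0.8, 10)` sentence uses.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), summed term by term from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# More than half of 20 girls means X >= 11, with n = 20 and p = 0.8.
p_more_than_half = 1 - binom_cdf(10, 20, 0.8)
print(round(p_more_than_half, 4))
```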
b. The binomial setting is satisfied:
• There is a fixed number of trials (girls surveyed): n = 20.
• Each trial results in one of two outcomes: a success (the girl says that she wants to do her best in all her classes) or a failure (the girl doesn’t say that she wants to do her best in all her classes).
• The sample must be collected so that no girl in the sample can influence another girl in the sample, so that the trials are independent (or approximately so, when the population size is more than 20 times n).
• The probability of a success is constant from trial to trial: p = 0.8.
• The count X has a binomial distribution with n = 20 and p = 0.8.
• The shape of this binomial distribution of successes is not roughly normal, because the sample size n fails the size requirement: the distribution of the count X is approximated by a normal curve only if both np and n(1 – p) are at least 10. Here np = 20(0.8) = 16, but n(1 – p) = 20(0.2) = 4, so the distribution function normalcdf( ) should not be used to estimate the probability.
Question 3.
a. n = 130 trials; p = 1 – 0.10 = 0.90. On average, np = 130(0.90) = 117 ticketed passengers will show up for the flight.
The airline makes a payout when the number of ticketed passengers who show up is at least 121.
The distribution of the amount of payout Y is:
Payout Y    $0     $100   $200   $300   $400   $500   $600   $700     $800      $900      $1000
P(Y)       .8479  .064   .0425  .0249  .0126  .0055  .0019  .00055   .000117   .000016   .000001
If X represents the number of ticketed passengers out of 130 who will show up for the 120-seat commuter flight, then X and Y are related by Y = $100(X – 120) when X ≥ 121, and Y = $0 otherwise:
TI-89: P(Y = 0) = P(X ≤ 120) = binomialcdf(130, 0.9, 0, 120) ≈ 0.8479
TI-83/TI-86: P(Y = 0) = P(X ≤ 120) = binomialcdf(130, 0.9, 120) ≈ 0.8479
All TI calculators use binomialpdf( ) to complete the table:
P(Y = 100) = P(X = 121) = binomialpdf(130, 0.9, 121) ≈ 0.064
through
P(Y = 1000) = P(X = 130) = binomialpdf(130, 0.9, 130) ≈ 0.000001.
Enter the values of Y into List 1 and the corresponding probabilities in List 2, then calculate 1-Var Stats on List 1, List 2: the airline expects to pay out about $31.79 per flight, on average.
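The payout table and the 1-Var Stats mean can be reproduced with a short Python sketch. It assumes, as the question does, that each ticket holder shows up independently with probability 0.90 and that the airline pays $100 for each passenger beyond the 120th; the constant names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


TICKETS, SEATS, SHOW_P, COMPENSATION = 130, 120, 0.90, 100

# Y = $0 whenever X <= 120; otherwise Y = $100 per passenger over 120.
dist = {0: sum(binom_pmf(k, TICKETS, SHOW_P) for k in range(SEATS + 1))}
for k in range(SEATS + 1, TICKETS + 1):
    dist[COMPENSATION * (k - SEATS)] = binom_pmf(k, TICKETS, SHOW_P)

# Expected payout = mean of the distribution of Y.
expected_payout = sum(y * py for y, py in dist.items())

for y, py in dist.items():
    print(f"${y}: {py:.6f}")
print(f"Expected payout per flight: ${expected_payout:.2f}")
```

Because the probabilities are not rounded before the sum is taken, the last cent may differ slightly from the calculator result.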
b. The airline has to make a payout on a 120-seat flight when X ≥ 121 (that is, when X > 120).
If n = 135 tickets are sold, the probability that the airline will have to make a payout is more than 62%, an unwise policy:
TI-89/TI-92 Plus/Voyage 200:
P(X ≥ 121) = binomialcdf(135, 0.90, 121, 135) ≈ 0.6262
TI-83 family/TI-86:
P(X ≥ 121) = 1 – binomialcdf(135, 0.90, 120) ≈ 0.6262
The Binomial Setting is satisfied:
i. Fixed number of trials (ticketed passengers): n = 135.
ii. Each trial results in one of two outcomes: a success (a ticketed passenger shows up) or
a failure (a ticketed passenger does not show up).
iii. The trials are independent if there are no family groups; e.g., no couples traveling
together.
iv. The probability of a success (a ticketed passenger shows up for the flight) remains
constant from trial to trial where p = 0.90.
v. The random variable X that counts the number of ticketed passengers that show up for
the flight has a binomial distribution with n = 135 and p = 0.90.
vi. The normal approximation to the binomial distribution can probably be used, because np = 135(0.90) = 121.5 and n(1 – p) = 135(0.10) = 13.5 are both at least 10. But here the normal approximation underestimates the probability:
\( P(X \ge 121) \approx \text{normalcdf}\left(121, \infty, 121.5, \sqrt{135(0.90)(0.10)}\right) \approx 0.5570 \)
The airline can sell 128 tickets for the 120-seat commuter flight and the chance they will
have to make any payout is about 5% because:
TI-89/TI-92 Plus/Voyage 200:
P(X ≥ 121) = binomialcdf(128, 0.90, 121, 128) ≈ 0.05.
TI-83 family/TI-86:
P(X ≥ 121) = 1 – binomialcdf(128, 0.90, 120) ≈ 0.05.
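The trial-and-error search can be automated. The Python sketch below tabulates P(X ≥ 121) for several values of n and picks the n whose probability is closest to 0.05; the search range 121 to 135 and the “closest to 5%” criterion are assumptions chosen to match the wording “about 5%”, and `payout_prob` is a hypothetical helper name.

```python
from math import comb


def payout_prob(n: int, seats: int = 120, p: float = 0.90) -> float:
    """P(X >= seats + 1): chance that more ticket holders show up than there are seats."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(seats + 1, n + 1))


# Tabulate the payout probability for each candidate number of tickets sold.
candidates = range(121, 136)
probs = {n: payout_prob(n) for n in candidates}

# Pick the n whose payout probability is closest to the 5% target.
best_n = min(candidates, key=lambda n: abs(probs[n] - 0.05))

for n in candidates:
    print(n, round(probs[n], 4))
print("closest to 5%:", best_n)
```

If the policy were instead “at most 5%”, one would take the largest n whose probability does not exceed 0.05; the printed table supports either reading.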
Question 4.
a. The random variable X is the weight of an individual ball bearing. X is N(0.15, 0.003).
\( z = \dfrac{0.148 - 0.15}{0.003} \approx -0.667 \)
P(Z < –0.667) = normalcdf(–∞, –0.667, 0, 1) ≈ 0.2524
Faster way. You can find the probability without standardizing first:
P(X < 0.148) = normalcdf(–∞, 0.148, 0.15, 0.003) ≈ 0.2525
Interpretation. If you select an individual ball bearing at random, the chance it will weigh
less than 0.148 g is about 25%.
b. The random variable x̄ is the mean weight of a sample of n = 4 ball bearings. The statistic x̄ is N(0.15, 0.003/√4).
\( z = \dfrac{0.148 - 0.15}{0.003/\sqrt{4}} \approx -1.333 \)
P(Z < –1.333) = normalcdf(–∞, –1.333, 0, 1) ≈ 0.0913
Faster way. You can find the probability without standardizing first:
P(x̄ < 0.148) = normalcdf(–∞, 0.148, 0.15, 0.003/√4) ≈ 0.0912
Interpretation. If you average the weights of 4 ball bearings, the chance that the average will be less than 0.148 g is about 9%.
c. The random variable x̄ is the mean weight of a sample of 10 ball bearings. The random variable x̄ is N(0.15, 0.003/√10).
Fast way.
P(x̄ < 0.148) = normalcdf(–∞, 0.148, 0.15, 0.003/√10) ≈ 0.0175
Interpretation. There is less than a 2% chance that the weight averaged over 10 ball bearings will be less than 0.148 g if the true mean weight is 0.15 g. Sample averages are more stable (tend to be closer to the population average) than individual measurements, because the standard deviation of x̄, σ/√n, shrinks as n grows. The larger the sample size n, the more stable the sample mean: the Law of Large Numbers states that as the number of observations increases, the sample mean x̄ gets closer and closer to μ.
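Parts a through c differ only in the standard deviation of the quantity being averaged, σ/√n. This Python sketch computes all three probabilities with a normal cdf built from `math.erf` (a stand-in for the calculator’s normalcdf), using the question’s μ = 0.15 g and σ = 0.003 g.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(W <= x) for W ~ N(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


MU, SIGMA, CUTOFF = 0.15, 0.003, 0.148

# x̄ for a sample of n bearings is N(μ, σ/√n); n = 1 is a single bearing.
results = {n: normal_cdf(CUTOFF, MU, SIGMA / sqrt(n)) for n in (1, 4, 10)}

for n, prob in results.items():
    print(f"n = {n:2d}: SD of mean = {SIGMA / sqrt(n):.5f} g, P(mean < 0.148) = {prob:.4f}")
```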
Question 5. The value of the parameter is p = 0.48. Since np = 500(0.48) = 240 > 10 and n(1 – p) = 500(0.52) = 260 > 10, the distribution of the random variable, the statistic p̂, is approximately \( N\left(0.48, \sqrt{\tfrac{0.48(0.52)}{500}}\right) \).
\( P(\hat{p} \ge 0.5) = \text{normalcdf}\left(0.5, \infty, 0.48, \sqrt{\tfrac{0.48(0.52)}{500}}\right) \approx 0.1854 \)
Interpretation. If in fact 48% of all registered voters in the district favor the candidate, there is about a 19% chance that a random sample of 500 will incorrectly predict the result of the upcoming election by concluding that more than 50% of all registered voters favor the candidate.
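The same normal-approximation calculation can be sketched in Python; as before, the normal cdf is built from `math.erf` rather than a calculator function, and the standard deviation of p̂ is √(p(1 – p)/n).

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(W <= x) for W ~ N(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


p, n = 0.48, 500
sd_phat = sqrt(p * (1 - p) / n)             # standard deviation of p̂
prob_wrong = 1 - normal_cdf(0.5, p, sd_phat)  # P(p̂ >= 0.5)

print(round(sd_phat, 5), round(prob_wrong, 4))
```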
Question 6.
The mean of the sample of size n = 10 is x̄ = 15.974 oz. A 95% confidence interval to estimate the true mean weight of the wine is:
\( \bar{x} \pm z^{*}\sigma_{\bar{x}} \Rightarrow 15.974 \pm 1.96\left(\dfrac{0.06}{\sqrt{10}}\right) \Rightarrow (15.94\ \text{oz},\ 16.01\ \text{oz}) \)
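The arithmetic behind the interval can be sketched in Python. This assumes, as the question does, that σ = 0.06 oz is known, so the z* = 1.96 interval applies rather than a t interval.

```python
from math import sqrt
from statistics import mean

weights = [15.91, 16.08, 16.08, 15.94, 15.94, 15.96, 16.03, 15.82, 16.02, 15.96]
sigma, z_star = 0.06, 1.96   # known σ (given) and z* for 95% confidence

x_bar = mean(weights)
margin = z_star * sigma / sqrt(len(weights))   # z* σ / √n
low, high = x_bar - margin, x_bar + margin

print(f"x̄ = {x_bar:.3f} oz, 95% CI = ({low:.2f}, {high:.2f}) oz")
print("16 oz inside interval:", low <= 16 <= high)
```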
Interpretation. With 95% confidence, the true mean weight μ of the wine from today’s production run lies between 15.94 oz and 16.01 oz. Since 16 oz falls within the interval, the machine doesn’t need adjusting yet. Does the true mean weight of the wine μ actually lie between these endpoints? We have no way of knowing. Our confidence is in the method we used, not in any one result. If we took 100 random samples and constructed a 95% confidence interval from each sample’s mean x̄, then we could expect about 95 of the intervals to capture the true mean and about 5 not to, but we would not know specifically which ones did and which ones did not.
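The “confidence in the method” idea can be illustrated by simulation. This Python sketch is an illustration only: it assumes the true mean is exactly 16 oz with σ = 0.06 oz (the question’s illustrative values), draws many samples of 10 bottles, and counts how often the z interval captures μ. The seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

random.seed(243)                 # fixed seed so the run is reproducible
MU, SIGMA, N, Z_STAR = 16.0, 0.06, 10, 1.96
REPS = 10_000

covered = 0
for _ in range(REPS):
    # One simulated sample of 10 bottles from N(16, 0.06).
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    margin = Z_STAR * SIGMA / sqrt(N)
    if x_bar - margin <= MU <= x_bar + margin:
        covered += 1

coverage = covered / REPS
print(f"{covered} of {REPS} intervals captured μ ({coverage:.1%})")
```

The printed fraction should land near 95%, while any single interval either contains 16 oz or does not.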
Question 7.
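a. To estimate the true mean height μ of the women in your city to within 2 inches with 90% confidence, the smallest random sample that will accomplish this goal is:
\( n = \left(\dfrac{z^{*}\sigma}{\text{desired margin of error}}\right)^2 = \left(\dfrac{1.645(2)}{2}\right)^2 \approx 2.71 \), which rounds up to n = 3 women. (Always round up: a smaller n would give a margin of error larger than desired.)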
b. To estimate the true mean height μ of the women in your city to within one-half inch with 90% confidence, the smallest random sample that will accomplish this goal is:
\( n = \left(\dfrac{z^{*}\sigma}{\text{desired margin of error}}\right)^2 = \left(\dfrac{1.645(2)}{0.5}\right)^2 \approx 43.30 \), which rounds up to n = 44 women.
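The sample-size formula, including the round-up step, fits in a few lines of Python. This is a sketch; `min_sample_size` is a hypothetical helper name, and z* = 1.645 for 90% confidence and σ = 2 inches are taken from the question.

```python
from math import ceil


def min_sample_size(z_star: float, sigma: float, margin: float) -> int:
    """Smallest n with z* σ / √n <= margin: compute (z* σ / m)^2, then round up."""
    return ceil((z_star * sigma / margin) ** 2)


# Margins of error from parts a and b (inches).
for m in (2, 0.5):
    print(f"margin {m} in: n = {min_sample_size(1.645, 2, m)}")
```

Using `ceil` rather than ordinary rounding encodes the rule that n must always be rounded up.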