AP Stats General Random Variable Distribution Review
Name_________________________ Per______
1. Spell Checking software catches “non-word” errors which result in a string of letters that is not a word, such as “het”
instead of “the”. When students are asked to write a 250 word essay without spell checking, the number X of
nonword errors has the following distribution:
Value of X:   0     1     2     3     4
P(X):         0.1   0.2   0.3   0.3   0.1
a) Write the event “at least one nonword error” in terms of X. What is the probability of this event?
b) Describe the event 𝑋 ≤ 2 in words. What is its probability? What is the probability that 𝑋 < 2?
c) Calculate the mean of the random variable X and interpret this result in context.
2. Faked numbers in tax returns, invoices or expense account claims often display patterns that aren’t present in
legitimate records. Some patterns, like too many round numbers, are obvious and easily avoided by a clever crook.
Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model
known as Benford’s law. Call the first digit of a randomly chosen record X for short. Benford’s law gives this
probability model of X (note that the first digit can’t be 0):
Value of X:   1      2      3      4      5      6      7      8      9
P(X):         0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
a) Show that this is a legitimate probability distribution.
b) Make a histogram of this probability distribution and describe what you see.
c) Describe the event 𝑋 ≥ 6 in words. What is 𝑃(𝑋 ≥ 6)?
d) Express the event “first digit is at most 5” in terms of X. What is the probability of this event?
e) What is the expected value of the first digit?
f) What is the standard deviation of the first digit?
3. A not-so-clever employee decided to fake his monthly expense report. He believed that the first digits of his
expense amounts should be equally likely to be any of the numbers from 1 to 9.
a) Create a probability distribution table for this event.
b) Make a histogram of this probability distribution and describe what you see.
c) Explain why the mean is 5.
d) Using the expected value from problem 2, explain how this information could be used to detect a fake expense
report.
e) Using the “not-so-clever” approach, what is P(X > 6) ? How could this information be used to detect a fake
expense report?
f) What is the standard deviation of this distribution? Would comparing standard deviations be a good way of
detecting a fake? Explain.
4. A study of 12,000 able-bodied male students at the University of Illinois found that their times for the mile run
were approximately normal with a mean of 7.11 minutes and a standard deviation of 0.74 minutes. If a student
is chosen at random from this group, find the probability that his time is less than 6 minutes. Use all proper
notation with conclusion in context.
5. Many chess advocates believe that chess play develops general intelligence, analytical skill and the ability to
concentrate. According to such beliefs, improved reading skills should result from efforts to play chess. A study
was conducted. All subjects in the study participated in a comprehensive chess program and their reading
performance was measured before and after the program. The graphs and numerical summaries below provide
the information on the subjects’ pretest scores, posttest scores and the difference (post – pre) between these
two scores.
a) Did the students have higher reading scores after participating in the chess program? Give appropriate
statistical evidence to support your answer.
b) If the study found a statistically significant improvement in the reading scores, could you conclude that
playing chess causes an increase in reading skills? Justify your answer.
c) What is the equation of the linear regression model relating posttest and pretest scores? Define any
variables used.
d) Discuss what r² and the residual plot tell you about this linear regression model.
6. Rotter Partners is planning a major investment. The amount of profit X (in millions of dollars) is uncertain, but
an estimate gives the following probability distribution:
Profit:        1     1.5   2     4     10
Probability:   0.1   0.2   0.4   0.2   0.1
Based on this estimate, what are the mean and standard deviation of the profit? Rotter Partners owes its lender a
fee of $200,000 plus 10% of the profits X. So the firm actually retains Y = 0.9X – 0.2 from the investment. Find
the mean and standard deviation of Y. Show your work.
7. A company’s single-serving cereal boxes advertise 9.63 ounces of cereal. In fact, the amount of cereal X in a
randomly selected box follows a Normal distribution with a mean of 9.70 ounces and a standard deviation of
0.03 ounces.
a. Let Y = the excess amount of cereal beyond what’s advertised in a randomly selected box, measured in
grams (1 ounce = 28.35 grams). Find the mean and standard deviation of Y.
b. Find the probability of getting at least 3 grams more cereal than advertised. Show your work.
8. The design of a toaster calls for a 100-ohm resistor and a 250-ohm resistor connected in series so that their
resistances add. The resistance of the 100-ohm resistor is normally distributed with a mean of 100 ohms and a
standard deviation of 2.5 ohms, while the resistance of the 250-ohm resistor is normally distributed with a
mean of 250 ohms and a standard deviation of 2.8 ohms.
a. Describe the distribution of the total resistance.
b. What is the probability that the total resistance lies between 345 and 355 ohms? Show your work.
9. The amount a life insurance company earns on a 5-year term life policy is labeled X. Calculations reveal that
µX = $303.35 and σX = $9707.57. The risk of insuring one person’s life is reduced if more persons are insured.
a. Suppose that two 21-year-old males are insured, and their ages at death are independent. If X1 and X2
are the insurer’s income from the two insurance policies, the insurer’s average income W can be
expressed as:
W = (X1 + X2)/2 = 0.5X1 + 0.5X2
Find the mean and standard deviation of W.
b. If four men are insured and the amount of income earned on each policy is independent, find the mean
and standard deviation of V, the average income of the four policies. Show your work. Why is the risk
of insurance reduced with more people being insured?
10. The Transportation Security Administration (TSA) is responsible for airport safety. On some flights, TSA officers
randomly select passengers for an extra security check before boarding. One such flight had 76 passengers – 12
in first class and 64 in coach. Some passengers were annoyed that the 7 passengers chosen were all from coach.
What is the probability that all passengers chosen for the random check were from coach?
11. In baseball, a 0.300 hitter gets a hit in 30% of times at bat. When a baseball player hits 0.300, fans tend to be
impressed. Typical major leaguers bat about 500 times a season and hit about 0.260. A hitter’s successive tries
seem to be independent. Could a typical major leaguer hit 0.300 just by chance? Compute an appropriate
probability to support your answer.
12. Ed and Adelaide attend the same high school, but are in different math classes. The time “E” that it takes Ed to
do his math homework follows a normal distribution with a mean of 25 minutes and a standard deviation of 5
minutes. Adelaide’s math homework time “A” follows a normal distribution with a mean of 50 minutes and a
standard deviation of 10 minutes.
a. Describe the distribution of the difference in the amount of time each student spent on their
assignments. (D = A – E)
b. Find the probability that Ed spent longer on his assignment than Adelaide did on hers. Show your work.
13. Determine whether each of the following is geometric, binomial or neither:
a. The number of 6s I get if I roll a die 10 times
b. The number of times I have to roll a die to get two 6s
c. The number of cards I deal from a deck of 52 cards until I get a heart
d. The number of digits I read in a randomly selected row of the random number table until I find a 7
e. The number of 7s in a row of 40 random digits
14. The weight of tomatoes chosen at random from a bin at the farmer’s market is a random variable with a mean
of 10 ounces and a standard deviation of 1 ounce. Suppose we pick four tomatoes at random from the bin.
a. Find the mean of their total weight “T”.
b. Find the standard deviation of the total weight of the four tomatoes.
15. According to the Census Bureau, 13% of American adults are Hispanic. An opinion poll plans to contact a SRS of
1200 adults.
a. What is the mean number of Hispanics in such a sample? What is the standard deviation?
b. Should we be suspicious if the sample selected for the poll contains 15% Hispanic people? Compute an
appropriate probability to support your answer.
Answers:
1 a) The event {𝑋 ≥ 1} or {𝑋 > 0}. P = 0.9
b) No more than two nonword errors. 𝑃(𝑋 ≤ 2) = 0.6 𝑃(𝑋 < 2) = 0.3
c) mean is 2.1. On average, undergraduates make 2.1 nonword errors per 250-word essay.
2 a) All probabilities are between 0 and 1 and they add up to 1
b) This is a right skewed distribution with a mode of 1.
c) The first digit in a randomly chosen record is a 6 or higher. p = 0.222
d) The event {𝑋 ≤ 5}. p = 0.778
e) µX = 3.441
f) σ = 2.4618
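The checks in problem 2 can be reproduced with a few lines of Python. This is only a sketch: the dictionary name `benford` and the helper `mean_and_sd` are illustrative names, not part of the worksheet, and the probabilities are copied from the table in problem 2.

```python
from math import sqrt

# Benford's law probabilities for first digits 1..9, copied from problem 2.
benford = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}

def mean_and_sd(dist):
    """Return (mean, standard deviation) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

total = sum(benford.values())            # part a): should be 1 for a legitimate distribution
mu, sigma = mean_and_sd(benford)         # parts e) and f)
p_at_least_6 = sum(p for x, p in benford.items() if x >= 6)   # part c)
p_at_most_5 = sum(p for x, p in benford.items() if x <= 5)    # part d)

print(round(total, 3), round(mu, 3), round(sigma, 4),
      round(p_at_least_6, 3), round(p_at_most_5, 3))
```

The same helper works for the uniform model in problem 3 if each digit is given probability 1/9.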
3 a)
Value of X:   1    2    3    4    5    6    7    8    9
P(X):         1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9
(Each probability is 1/9 ≈ 0.111, so the nine probabilities add up to 1.)
b) The distribution would have a uniform, symmetric shape.
c) Since the distribution is uniform and symmetric, the mean would be exactly in the middle which is 5
d) To detect a fake, compute the sample mean of the first digits and see if it is near 3.441 or near 5.
e) p = 0.333. Using Benford’s law, the same probability is 0.155. When looking at the suspect report, find the percent
of figures that start with a digit higher than 6. If that percent is closer to 33% than to 15%, it is probably fake.
f) σ = 2.58. No, this would not be the best way to find a fake because the standard deviations are not too different
from each other.
4) X is N(7.11, 0.74); find P(X < 6). z = (6 − 7.11)/0.74 = −1.50
P(X < 6) = P(Z < −1.50) = 0.0668. There is a 6.7% chance that the student will run
the mile faster than 6 minutes.
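As a cross-check on the table lookup in problem 4, the standard library's `statistics.NormalDist` can evaluate the same probability. The variable names here are illustrative; the mean and standard deviation come from the problem statement.

```python
from statistics import NormalDist

# Mile-run times: approximately Normal with mean 7.11 min and SD 0.74 min (problem 4).
mile = NormalDist(mu=7.11, sigma=0.74)

z = (6 - mile.mean) / mile.stdev   # standardized value for a 6-minute mile
p_under_6 = mile.cdf(6)            # P(X < 6), computed directly
p_from_z = NormalDist().cdf(z)     # same probability via the standard Normal

print(round(z, 2), round(p_under_6, 4), round(p_from_z, 4))
```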
5) a) Yes, students did have higher scores in general after participating in the chess program. The mean difference was
5.38 and the median was 3. This means that at least half of the students (though less than three quarters of them since
Q1 was negative) improved their reading scores.
b) No, we cannot conclude that chess causes an increase in reading scores. We did not have a control group that did
not participate in the chess program so we do not have a comparison group. It may be that children naturally improve
their reading scores over that time period.
c) Predicted posttest = 17.897 + 0.78301(pretest)
d) The residual plot does not show a pattern and the scatter plot shows a positive linear correlation. r calculates to
0.746 which is a moderate, positive linear association. Therefore, the LSRL is an appropriate model.
6) µX = $3 million and σX = $2.52 million. µY = $2.5 million and σY = $2.268 million
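The values in problem 6 follow from the rules for a linear transformation Y = a + bX: the mean becomes a + bµX and the standard deviation becomes |b|σX. A minimal sketch, with illustrative variable names and the probabilities taken from the problem's table:

```python
from math import sqrt

# Profit distribution in millions of dollars (problem 6).
profit = {1: 0.1, 1.5: 0.2, 2: 0.4, 4: 0.2, 10: 0.1}

mu_x = sum(x * p for x, p in profit.items())
sd_x = sqrt(sum((x - mu_x) ** 2 * p for x, p in profit.items()))

# Retained amount Y = 0.9X - 0.2 (the lender takes $0.2 million plus 10% of X).
b, a = 0.9, -0.2
mu_y = a + b * mu_x      # the mean shifts and scales
sd_y = abs(b) * sd_x     # the SD only scales; the constant a has no effect

# Cross-check: transform every outcome and recompute the mean directly.
y_dist = {a + b * x: p for x, p in profit.items()}
mu_y_direct = sum(y * p for y, p in y_dist.items())

print(round(mu_x, 3), round(sd_x, 3), round(mu_y, 3), round(sd_y, 3))
```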
7 a) µY = 1.985 grams, σY = 0.8505 grams
b) P(Y ≥ 3) = P(Z ≥ 1.19) = 0.1170
8 a) T = R1 + R2 is Normal with a mean of 350 ohms and a standard deviation of √(2.5² + 2.8²) = 3.754 ohms.
b) P(345 < resistance < 355) = P(−1.33 < Z < 1.33) = 0.8164
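Problem 8 can be checked in Python, assuming the two resistances are independent (the problem implies this but does not state it). Adding two `NormalDist` objects gives the distribution of the sum of independent Normal variables, so the standard deviation is computed as √(2.5² + 2.8²). Variable names are illustrative.

```python
from statistics import NormalDist

# Resistor distributions from problem 8; independence is an assumption.
r1 = NormalDist(mu=100, sigma=2.5)
r2 = NormalDist(mu=250, sigma=2.8)

# For independent Normals, means add and variances add.
total = r1 + r2

# P(345 < T < 355), computed without rounding z to two decimals.
p_between = total.cdf(355) - total.cdf(345)

print(total.mean, round(total.stdev, 4), round(p_between, 4))
```

Because z is not rounded here, the probability can differ from the table-based answer in the third decimal place.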
9 a) µW = $303.35 and σW = $6,864.29
b) V = (X1 + X2 + X3 + X4)/4
µV = $303.35 and σV = $4,853.79. The variation is smaller by a factor of 1/√2 compared with two policies.
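The pattern in problem 9 is that the average of n independent policies keeps the same mean while its standard deviation shrinks to σX/√n. A small sketch, where the function name `average_income_stats` is illustrative and the inputs come from the problem statement:

```python
from math import sqrt

MU_X = 303.35      # mean income per policy, in dollars (problem 9)
SIGMA_X = 9707.57  # SD of income per policy, in dollars

def average_income_stats(n):
    """Mean and SD of the average income over n independent policies."""
    return MU_X, SIGMA_X / sqrt(n)

mu_w, sd_w = average_income_stats(2)   # part a): two policies
mu_v, sd_v = average_income_stats(4)   # part b): four policies

print(round(sd_w, 2), round(sd_v, 2), round(sd_v / sd_w, 4))
```

The ratio of the two standard deviations is 1/√2, which is the reduction in risk when going from two policies to four.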
10) Binomial: X = number of people chosen from coach out of 7, n = 7, p = 64/76 = 0.8421. P(X = 7) = C(7, 7)(0.8421)^7(1 − 0.8421)^0 = 0.30
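The binomial calculation for problem 10 can be reproduced as below. For comparison, the sketch also computes the probability under a different modeling assumption, namely that the 7 passengers are drawn without replacement from the 76 on board. That second model is an assumption added here for illustration and is not the approach used in the answer key.

```python
from math import comb

coach, first_class, chosen = 64, 12, 7
total = coach + first_class          # 76 passengers
p_coach = coach / total              # about 0.8421

# Binomial model used in the answer key: 7 independent picks, each from coach with p = 64/76.
p_binomial = comb(chosen, chosen) * p_coach ** chosen * (1 - p_coach) ** 0

# Alternative (assumed) model: 7 distinct passengers chosen without replacement.
p_without_replacement = comb(coach, chosen) / comb(total, chosen)

print(round(p_binomial, 4), round(p_without_replacement, 4))
```

Both models give a probability of roughly 0.3, so all seven selections coming from coach is not unusual under either one.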
11) This can be done as a binomial or a Normal approximation because np = 130 and n(1 − p) = 370. Using the Normal
approximation… Let X be the number of hits out of 500 times at bat. P(X ≥ 150) = 0.0207. We would expect only 2% of
typical baseball players to hit 0.300 so it is probably not by chance that a player hits 0.300.
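Problem 11 says the probability can be found either with the binomial distribution or with a Normal approximation. The sketch below computes both, using no continuity correction in the Normal version so that it matches the calculation described above. The helper name `binom_tail` is illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.26       # at-bats per season and typical batting average (problem 11)
k = 150                # hits needed to bat 0.300

def binom_tail(n, p, k):
    """Exact binomial P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_exact = binom_tail(n, p, k)

# Normal approximation with mean np and SD sqrt(np(1 - p)), no continuity correction.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
p_normal = 1 - approx.cdf(k)

print(round(p_exact, 4), round(p_normal, 4))
```

The two results differ slightly because the Normal curve is a continuous approximation to a discrete count, but both are small.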
12 a) The difference distribution will be a normal distribution with a mean of 25 minutes and a standard deviation of
11.18 minutes.
b) P(D < 0) = P(Z < (0 − 25)/11.18) = P(Z < −2.24) = 0.0125
13 a) binomial b) neither because you’d have to do two geometrics and then multiply the two probabilities together c)
geometric d) geometric e) binomial
14 a) 10 x 4 = 40 ounces
b) σ = √(1² + 1² + 1² + 1²) = 2 ounces
15 a) µ = 156 σ = 11.6499
b) If there were 15% then there would be 1200(0.15) = 180 Hispanics in the sample. The problem does not state if the
distribution is binomial or Normal. If you assume binomial P(X ≥ 180) = 0.0235. If you assume normal, P(X ≥ 179.5) =
0.0218. In either case, the probabilities are low so we should be suspicious of the opinion poll.