Statistics Review: The Empirical Rule, the Z-table, and confidence intervals
Name:___________________________
Recall that there are many types of distributions, some of which happen to be symmetric and bell-shaped. These
distributions are what we call approximately normal, and they arise often in real world situations. If data is
approximately normal, then you can apply the empirical rule:
Approximately 68% of the data falls within one standard deviation
of the mean, 95% falls within two standard deviations, and 99.7%
falls within three standard deviations.
In addition, 50% of the data falls below the mean and 50% falls
above it.
[Figure: normal curve; horizontal axis shows standard deviations away from the mean]
Using symmetry, probabilities can be calculated for many parts of
the normal distribution.
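The percentages in the empirical rule can be checked numerically. The following is a minimal sketch in Python, assuming a perfectly normal distribution; the function name fraction_within is made up for this illustration and is not part of the worksheet.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

Because the curve is symmetric, half of each fraction lies above the mean and half below, which is the idea used when splitting the curve into parts.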
Practice:
1. Using the normal distribution above, what percent of the data should fall between 1 and 2 standard deviations above
the mean?
2. Using symmetry and the normal distribution above, what percentage of the data should fall 2 standard deviations
below the mean or less?
3. You purchased 10 baskets of strawberries at the local farmer's market and counted the number of strawberries in
each basket. Use your calculator to find the mean count of strawberries in a basket and the standard deviation
between baskets.
a) Mean (known as 𝑥̅ ) =
b) Standard deviation (known as s) =
[Figure: plot of the number of strawberries in each basket; horizontal axis runs from 15 to 25]
c) Calculate the percentage of baskets that fall within one
standard deviation of the mean.
d) Do you think this data is approximately normally distributed? Why or why not?
4. Andy Lee is the punter for the San Francisco 49ers. He had a stellar 2011 season with an average punt length of 50.9
yards with a standard deviation of 3.5 yards. His punt distance follows a normal distribution. Label the following normal
distribution. Identify the range of punt distances that covers 68% of the distances and shade the graph accordingly.
5. The mean birth weight for American infants who are not born prematurely is 7.65 lbs., with a standard deviation of
1.12 lbs. Sketch a normal curve that shows the mean and three standard deviations above and below the mean.
a. What percent of babies weigh between 6.53 and 8.77 lbs?
b. What percent of babies weigh more than 9.89 lbs?
c. What percent of babies weigh less than 6.53 lbs?
d. Find the range of birth weights that includes 99.7% of all
infants.
6. Examine the table to the right, which gives information about the
heights of young Americans aged 18 to 24. Each distribution is approximately
normal. Label the following two normal distributions appropriately, one for
men and the other for women.
a. Kim is 1 standard deviation below the mean. How tall is she?
b. Mark is 1.5 standard deviations below the mean. How tall is
he?
c. Chris is 70 inches tall, while Amy is 68 inches tall. For each of them, calculate the number of standard deviations above the
average he or she is. Then, determine which one is relatively taller for his or her gender.
Using a standardized value often makes comparisons across distributions much easier, as was done above in
problem 6c. A standardized value, known as a Z-score, is calculated in the following way, where 𝑥 is the data value,
𝑥̅ is the mean, and s is the standard deviation:
$$ Z = \frac{x - \bar{x}}{s} $$
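As a rough illustration of the Z-score calculation, here is a short Python sketch. The function name z_score and the test-score numbers are hypothetical and are not taken from any problem on this worksheet.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a test score of 85 when the mean is 70 and the
# standard deviation is 10.
print(z_score(85, 70, 10))  # 1.5

# A value below the mean gives a negative Z-score.
print(z_score(60, 70, 10))  # -1.0
```

A positive result means the value is above the mean, and a negative result means it is below. This is what allows values from different distributions to be compared on the same scale.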
7. Andy Lee is the punter for the San Francisco 49ers. He had a stellar 2011 season with an average punt length of 50.9
yards with a standard deviation of 3.5 yards. His punt distance follows a normal distribution. In the very last game of the postseason, Andy Lee made his last punt for 39 yards. What is the Z-score for this punt?
$$ Z = \frac{x - \bar{x}}{s} $$
8. Z-scores allow users to be more precise with estimating probabilities associated with the normal distribution by using
a Z-table. The partial Z-table below indicates the percentage of the population below a certain z-score.
a) The length of a phone conversation, measured in minutes, follows a normal
distribution with a mean of 7 minutes and a standard deviation of 2.2
minutes. What is the approximate probability that the phone conversation
lasts less than 9.4 minutes?
b) What is the approximate probability that a phone conversation could last more than
13 minutes?
c) What is the approximate probability of a phone conversation lasting between 9.4
and 11 minutes?
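The Z-table lookups used in problem 8 can also be approximated in code. The sketch below is one possible approach, assuming an exactly normal distribution. The function names and the example numbers (mean 100, standard deviation 15) are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Fraction of a normal population below the given z-score (what a Z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_below(x, mean, sd):
    return normal_cdf((x - mean) / sd)

def prob_above(x, mean, sd):
    # "More than" is the complement of "less than".
    return 1 - prob_below(x, mean, sd)

def prob_between(a, b, mean, sd):
    # Subtract the area below a from the area below b.
    return prob_below(b, mean, sd) - prob_below(a, mean, sd)

# Hypothetical distribution: mean 100, standard deviation 15.
print(f"{prob_below(115, 100, 15):.4f}")  # 0.8413
print(f"{prob_above(115, 100, 15):.4f}")
print(f"{prob_between(100, 130, 100, 15):.4f}")
```

The three helpers mirror the three kinds of questions in problem 8: less than a value, more than a value, and between two values.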
Another important use of normal distributions and z-scores arises often in studies. When a researcher gathers a simple
random sample from a population, she or he may wish to generalize the sample’s mean to the entire population’s mean.
To use the sample’s mean (called the statistic) to estimate the population’s true mean (called the parameter) we use
statistical tools called confidence intervals. A confidence interval takes the sample’s statistic and puts a margin of error
around it to account for sampling variation. Adding and subtracting the margin of error creates an interval which very
likely contains the true population parameter. By making the margin of error smaller or larger (and by extension making
the confidence interval smaller or larger) we can be less or more confident that the true population
parameter is in our interval.
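The idea of adding and subtracting a margin of error can be sketched in a few lines of Python. The function name confidence_interval and the numbers used (a sample mean of 50 with a margin of error of 4) are hypothetical and not taken from the problems that follow.

```python
def confidence_interval(statistic, margin_of_error):
    """Return (lower, upper) bounds: the statistic minus and plus the margin of error."""
    return (statistic - margin_of_error, statistic + margin_of_error)

# Hypothetical sample mean of 50 with a margin of error of 4.
low, high = confidence_interval(50.0, 4.0)
print(low, high)  # 46.0 54.0
```

A larger margin of error gives a wider interval, and a smaller margin of error gives a narrower one.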
9. Suppose a sample’s mean is 64 and at 95% confidence its margin of error is 12. What is the 95% confidence interval?
10. Suppose the same sample has a 99.7% confidence margin of error at 20. What is the 99.7% confidence interval?
Why did increasing our confidence level make the interval bigger?
In addition to confidence levels, there is another factor that affects the margin of error: the sample’s size. Naturally, if
the sample gets larger and larger it should better approximate the population’s distribution. Therefore, margin of error
is calculated in the following way, depending on whether you are estimating a mean or a proportion:
Mean:
$$ M = Z \times \frac{s}{\sqrt{N}} $$
Proportion:
$$ M = Z \times \sqrt{\frac{p \times (1 - p)}{N}} $$
Where 𝐙 is the z-score associated with the confidence level, 𝒔 is the sample’s standard deviation, 𝑵 is the sample size, and 𝒑 is
the sample’s proportion.
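The two margin of error formulas can be sketched in Python as follows. The function names and the example inputs are hypothetical. The example assumes Z = 2 for 95% confidence, following the empirical rule; many Z-tables give the more precise value 1.96.

```python
import math

def margin_of_error_mean(z, s, n):
    """Margin of error for a mean: Z times s divided by the square root of N."""
    return z * s / math.sqrt(n)

def margin_of_error_proportion(z, p, n):
    """Margin of error for a proportion: Z times the square root of p(1 - p)/N."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical inputs: Z = 2 (assumed for 95% confidence), sample size 400.
print(f"{margin_of_error_mean(2, 10, 400):.3f}")         # 1.000
print(f"{margin_of_error_proportion(2, 0.5, 400):.3f}")  # 0.050

# Quadrupling the sample size cuts the margin of error in half.
print(f"{margin_of_error_mean(2, 10, 1600):.3f}")        # 0.500
```

The last line shows why sample size matters: N sits under a square root, so the sample must be four times as large to halve the margin of error.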
11. 100 male weightlifters from a popular gym were tested on the
bench press. They averaged 325 pounds for one repetition with a
standard deviation of 52 pounds. The gym has many more male
customers and wants to estimate the average for all males that go
to the gym.
a) What is the margin of error for this mean assuming the gym
wanted to have a 95% confidence level?
b) What is the confidence interval for estimating the average for all males in the gym? Why might this be important to
the gym owner?
12. A local candy store has found that kids prefer certain colors of candy regardless of their taste. For kids ages five to
eight, the data to the right was collected.
a) What is the estimated population proportion of the most preferred candy color from the sample?
b) Construct a 90% confidence interval for the proportion of the most preferred candy color. Why might this be
important to the candy store?
13. The ticket sales of the current top six movies, in figures of millions, are shown to the right. Assuming a 95%
confidence level, what is the margin of error and the confidence interval for the mean of ticket sales? (Note N is total
ticket sales)
14. A company specializing in building robots that clean your house has found that the average amount of time kids are
forced (yes, forced) to spend cleaning their houses is about 2 hours per week. If their sample size was 1000 randomly
chosen kids and the standard deviation was 0.3 hours, what is the margin of error for a confidence interval of 95%?
15. Of all the new 3D television users, 72% have reported not having any side effects such as vertigo, headaches, or
undue eyestrain as compared to watching regular television. If this sample questionnaire had 2500 respondents, what is
the margin of error (assuming a confidence level of 95%)? Round to the nearest hundredth. (Hint: this is a
proportion)
16. A total of 50 boats participate in a race. You decide to figure out what the average finishing time is, but everything
happens so fast, and you can only get a sample of five boats' times written down. The average finishing times (in
seconds) are given to the right. What is the estimated population average based on your
sample?
17. You've recently been volunteering your time helping the elderly play bingo at the local retirement home. (Strangely,
you've noticed your cheeks are bruised, but that's probably due to all of the seniors pinching them.) You've randomly
surveyed some of the bingo players to find that they play bingo a certain number of hours per week. What is the
estimated population mean based on your sample?
18. Which of the following is a correct representation of a proportion?
(A) The average number of hours that 18 to 25 year old males spend playing video games
(B) A study that found its poll had a margin of error of 5%
(C) A research firm that found that 85% of all employees spend an hour eating lunch
(D) A study that claims that people who stare at the computer screen for too long have an increased chance of
developing eye fatigue.
19. Which of the following is most like a proportion of a sample?
(A) 95 out of 100 students prefer Coke to Pepsi
(B) 95% of women surveyed prefer pasta to ravioli
(C) 95 in 100 dogs will develop heartworm without proper treatment
(D) All of the above
20. Which of the following best represents a sample mean?
(A) The average length of a bridge in America is 500 feet long
(B) It's found that on average, 53% of voters in Alabama are Republican
(C) TV shows average about 22 minutes for every 30 minutes of broadcasting
(D) All of the above