Statistics Review: The Empirical Rule, the Z-table, and Confidence Intervals

Name: ___________________________

Recall that there are many types of distributions, some of which happen to be symmetric and bell-shaped. These distributions are what we call approximately normal, and they arise often in real-world situations. If data is approximately normal, then you can apply the empirical rule: approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations. In addition, 50% of the data falls below the mean and 50% falls above it.

[Figure: normal distribution; horizontal axis: standard deviations away from the mean]

Using symmetry, probabilities can be calculated for many parts of the normal distribution.

Practice:

1. Using the normal distribution above, what percent of the data should fall between 1 and 2 standard deviations above the mean?

2. Using symmetry and the normal distribution above, what percentage of the data should fall 2 standard deviations below the mean or less?

3. You purchased 10 baskets of strawberries at the local farmer's market and counted the number of strawberries in each basket. Use your calculator to find the mean count of strawberries in a basket and the standard deviation between baskets.
[Figure: plot of the number of strawberries per basket; horizontal axis from 15 to 25]
a) Mean (known as 𝑥̅) =
b) Standard deviation (known as s) =
c) Calculate the percentage of baskets that fall within one standard deviation of the mean.
d) Do you think this data is approximately normally distributed? Why or why not?

4. Andy Lee is the punter for the San Francisco 49ers. He had a stellar 2011 season with an average punt length of 50.9 yards and a standard deviation of 3.5 yards. His punt distance follows a normal distribution. Label the following normal distribution. Identify the range of punt distances that covers 68% of the distances and shade the graph accordingly.

5.
The mean birth weight for American infants who are not born prematurely is 7.65 lbs., with a standard deviation of 1.12 lbs. Sketch a normal curve that shows the mean and three standard deviations above and below the mean.
a. What percent of babies weigh between 6.53 and 8.77 lbs?
b. What percent of babies weigh more than 9.89 lbs?
c. What percent of babies weigh less than 6.53 lbs?
d. Find the range of birth weights that includes 99.7% of all infants.

6. Examine the table to the right, which gives information about the heights of young Americans aged 18 to 24. Each distribution is approximately normal. Label the following two normal distributions appropriately, one for men and the other for women.
a. Kim is 1 standard deviation below the mean. How tall is she?
b. Mark is 1.5 standard deviations below the mean. How tall is he?
c. Chris is 70 inches tall, while Amy is 68 inches tall. For each of them, calculate the number of standard deviations above the average he or she is. Then determine which one is relatively taller for his or her gender.

Using a standardized value often makes comparisons across distributions much easier, as was done above in problem 6c. A standardized value, known as a Z-score, is calculated in the following way, where x is the data value, 𝑥̅ = mean, and s = standard deviation:

Z = (x − 𝑥̅) / s

7. Andy Lee is the punter for the San Francisco 49ers. He had a stellar 2011 season with an average punt length of 50.9 yards and a standard deviation of 3.5 yards. His punt distance follows a normal distribution. In the very last game of the postseason, Andy Lee made his last punt for 39 yards. What is the Z-score for this punt?

Z = (x − 𝑥̅) / s

8. Z-scores allow users to be more precise when estimating probabilities associated with the normal distribution by using a Z-table. The partial Z-table below indicates the percentage of the population below a certain Z-score.
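The entries of a Z-table can also be computed directly. The following is a minimal Python sketch, assuming the standard normal cumulative distribution written with the error function; the function names `z_score` and `percent_below` and the input numbers are illustrative and are not taken from any problem on this worksheet.

```python
import math

def z_score(x, mean, sd):
    """Standardize x: the number of standard deviations it lies from the mean."""
    return (x - mean) / sd

def percent_below(z):
    """Percentage of a normal population below a given Z-score."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical values chosen only for illustration.
z = z_score(130, mean=100, sd=15)
print(z)                           # 2.0
print(round(percent_below(z), 2))  # 97.72

# Share of the data within one standard deviation of the mean;
# this agrees with the empirical rule's "approximately 68%".
print(round(percent_below(1) - percent_below(-1), 1))  # 68.3
```

A probability "above" a Z-score is 100 minus the value below it, and a probability "between" two Z-scores is the difference of their two values.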
a) The length of a phone conversation, measured in minutes, follows a normal distribution with a mean of 7 minutes and a standard deviation of 2.2 minutes. What is the approximate probability that a phone conversation lasts less than 9.4 minutes?
b) What is the approximate probability that a phone conversation lasts more than 13 minutes?
c) What is the approximate probability that a phone conversation lasts between 9.4 and 11 minutes?

Another important use of normal distributions and Z-scores arises often in studies. When a researcher gathers a simple random sample from a population, she or he may wish to generalize the sample's mean to the entire population's mean. To use the sample's mean (called the statistic) to estimate the population's true mean (called the parameter), we use statistical tools called confidence intervals. A confidence interval takes the sample's statistic and puts a margin of error around it to account for sampling variation. Adding and subtracting the margin of error creates an interval that very likely contains the true population parameter. By making the margin of error smaller or larger (and, by extension, making the confidence interval narrower or wider), we can be more or less confident that the true population parameter is in our interval.

9. Suppose a sample's mean is 64 and at 95% confidence its margin of error is 12. What is the 95% confidence interval?

10. Suppose the same sample has a margin of error of 20 at 99.7% confidence. What is the 99.7% confidence interval? Why did increasing our confidence level make the interval bigger?

In addition to the confidence level, there is another factor that affects the margin of error: the sample's size. Naturally, as the sample gets larger and larger, it should better approximate the population's distribution.
Therefore, the margin of error is calculated in the following way, depending on whether you are estimating a mean or a proportion:

Mean: M = Z × s / √N
Proportion: M = Z × √(p × (1 − p) / N)

where Z is the Z-score associated with the confidence level, s is the sample's standard deviation, N is the sample size, and p is the sample's proportion.

11. 100 male weightlifters from a popular gym were tested on the bench press. They averaged 325 pounds for one repetition with a standard deviation of 52 pounds. The gym has many more male customers and wants to estimate the average for all males who go to the gym.
a) What is the margin of error for this mean, assuming the gym wants a 95% confidence level?
b) What is the confidence interval for estimating the average for all males in the gym? Why might this be important to the gym owner?

12. A local candy store has found that kids prefer certain colors of candy regardless of their taste. For kids ages five to eight, the data to the right was collected.
a) What is the estimated population proportion of the most preferred candy color from the sample?
b) Construct a 90% confidence interval for the proportion of the most preferred candy color. Why might this be important to the candy store?

13. The ticket sales of the current top six movies, in millions, are shown to the right. Assuming a 95% confidence level, what is the margin of error and the confidence interval for the mean of ticket sales? (Note: N is total ticket sales.)

14. A company specializing in building robots that clean your house has found that the average amount of time kids are forced (yes, forced) to spend cleaning their houses is about 2 hours per week. If their sample size was 1000 randomly chosen kids and the standard deviation was 0.3 hours, what is the margin of error for a confidence interval of 95%?

15. Of all the new 3D television users, 72% have reported not having any side effects such as vertigo, headaches, or undue eyestrain as compared to watching regular television.
If this sample questionnaire had 2500 respondents, what is the margin of error (assuming a confidence level of 95%)? Round to the nearest hundredth. (Hint: this is a proportion.)

16. A total of 50 boats participate in a race. You decide to figure out what the average finishing time is, but everything happens so fast that you can only get a sample of five boats' times written down. The finishing times (in seconds) are given to the right. What is the estimated population average based on your sample?

17. You've recently been volunteering your time helping the elderly play bingo at the local retirement home. (Strangely, you've noticed your cheeks are bruised, but that's probably due to all of the seniors pinching them.) You've randomly surveyed some of the bingo players to find out how many hours per week they play bingo. What is the estimated population mean based on your sample?

18. Which of the following is a correct representation of a proportion?
(A) The average number of hours that 18- to 25-year-old males spend playing video games
(B) A study that found its poll had a margin of error of 5%
(C) A research firm that found that 85% of all employees spend an hour eating lunch
(D) A study that claims that people who stare at the computer screen for too long have an increased chance of developing eye fatigue

19. Which of the following is most like a proportion of a sample?
(A) 95 out of 100 students prefer Coke to Pepsi
(B) 95% of women surveyed prefer pasta to ravioli
(C) 95 in 100 dogs will develop heartworm without proper treatment
(D) All of the above

20. Which of the following best represents a sample mean?
(A) The average length of a bridge in America is 500 feet
(B) It's found that, on average, 53% of voters in Alabama are Republican
(C) TV shows average about 22 minutes for every 30 minutes of broadcasting
(D) All of the above
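
The two margin-of-error formulas from this review can be written as a short Python sketch. The function names and all input numbers below are illustrative assumptions and are not taken from any problem here. Z = 2 is the empirical-rule value for 95% confidence; a Z-table gives the more precise 1.96.

```python
import math

def margin_of_error_mean(z, s, n):
    """M = Z * s / sqrt(N), used when estimating a mean."""
    return z * s / math.sqrt(n)

def margin_of_error_proportion(z, p, n):
    """M = Z * sqrt(p * (1 - p) / N), used when estimating a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(estimate, margin):
    """Subtract and add the margin of error around the sample's statistic."""
    return (estimate - margin, estimate + margin)

Z_95 = 2  # assumption: empirical-rule value for 95% confidence

# Hypothetical sample: mean 50, standard deviation 10, 400 observations.
m = margin_of_error_mean(Z_95, s=10, n=400)
print(m)                           # 1.0
print(confidence_interval(50, m))  # (49.0, 51.0)

# Hypothetical sample: proportion 0.5 among 400 respondents.
pm = margin_of_error_proportion(Z_95, p=0.5, n=400)
print(round(pm, 4))                # 0.05
```

Because N sits under the square root, quadrupling the sample size only halves the margin of error.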