Honors Math 3 Unit 3: Inferential Statistics
Name_________________________

Unit 3 Review

Your Unit 3 test will take place on Wednesday, 1/11, during class. This is a list of the important topics we have covered in this unit:

1. Statistics for a data set / Measures of spread
   - Find the mean (equally likely outcomes or weighted), variance (mean squared deviation), and standard deviation
   - Prove and use this alternate formula for variance: $\sigma^2 = \overline{x^2} - (\overline{x})^2$ (the mean of the squares minus the square of the mean)
   - Prove and use the idea that means and variances are additive (variances add when the quantities are independent)
   - Prove the effects on the mean and variance when a constant is added to or multiplied by each value in a data set
2. Repeated experiments
   - Given the statistics (mean, variance, and/or standard deviation) for a single experiment, find the statistics for the sum of the results when the experiment is repeated n times
   - For repeated Bernoulli trials, find the exact probability of k successes (using the formula involving a combination number)
   - For repeated Bernoulli trials, find the mean, variance, and standard deviation
3. Probability distributions
   - Make probability histograms
   - For normal distributions, apply the 68/95/99.7% rule and use the normalcdf function to approximate probabilities
   - Apply the Central Limit Theorem to identify normal distributions for answering probability questions (CLT for sums vs. CLT for averages)
   - Compute z-scores to compare values from multiple data sets
4. Sample surveys
   - Identify appropriate methods for random sampling
   - Given a sample proportion, use confidence intervals to estimate which values are plausible for the population parameter (assume 95% confidence if not specified)
   - Find margins of error (expressed as a percent)
   - Understand that correlation does not imply causation
   - Understand the differences between sample surveys, observational studies, and experiments, and the biases associated with each
5. Assessing the effectiveness of a treatment (experiments)
   - Identify appropriate randomization methods for selecting the treatment and control groups
   - Recognize outcomes that show evidence that a treatment is effective (using a 5% cutoff)

1. The following table shows the distribution of days in the hospital after birth for 57 new mothers.

   Days in hospital      1    2    3    4    5
   Number of mothers    14   34    5    3    1

   a. Calculate the mean, variance, and standard deviation for the above data.
   b. Does this data have a normal distribution? Tell why or why not.

2. A manufacturer produces a large number of toasters. From past experience, the manufacturer knows that approximately 2% are defective. In a quality control procedure, we randomly select 20 toasters for testing.
   a. Determine the probability that exactly one of the toasters is defective.
   b. Find the exact probability that at most two of the toasters are defective. Include enough detail that it is clear how you arrived at your answer.
   c. Find the mean and standard deviation for the random variable X in the toaster problem. Make sure to define the random variable X.

3. A student scores 60 on a math test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she do better compared to the rest of the class?

4. The average amount of meat a person consumes per year is 218.4 pounds. Assume that the standard deviation is 25 pounds and the distribution is approximately normal.
   a. Find the probability that a person selected at random consumes less than 224 pounds per year.
   b. If a sample of 40 individuals is selected, find the probability that the average of the sample will be less than 224 pounds per year.

5. In this problem you'll need to combine two ideas from the chapter: the statistics of repeated Bernoulli trials and calculating the standard deviation as a proportion or percentage of the number of trials.
   Suppose that a Bernoulli trial with probability p is repeated n times.
   a. In terms of p and n, calculate the standard deviation as a proportion of the number of trials. Hint: calculate $\sigma/n$, then simplify.
   b. In part a you found the standard deviation as a proportion of the number of trials. What is the limit of this result as n gets larger and larger?
   c. Apply the result of part a to 1,000,000 flips of a 60%/40% unfair coin.

6. The travel times for the MBTA 76 bus from Hanscom to Alewife have a normal distribution with a mean of 39 minutes and a standard deviation of 3.5 minutes. Answer the following questions, using appropriate calculator functions where needed.
   a. Suppose that the MBTA wishes to publish a range of possible travel times for this route such that 99.7% of the trips will fall in the range. Find this range.
   b. What percent of this bus route's trips have a travel time between 35 and 45 minutes?
   c. For each of the following travel times, calculate the z-score, then state whether the travel time would be fairly typical, highly unusual, or in-between: 40 minutes, 30 minutes, 35 minutes, 45 minutes, 60 minutes.
   d. Tony, Sasha, and Derman separately traveled this bus route and calculated their z-scores: z = 0.3 for Tony, z = −1.5 for Sasha, and z = 2.1 for Derman. Find their travel times.
   e. Using an appropriate calculator function, approximate the probability that the travel time on this bus route, rounded to the nearest minute, will be 38 minutes.
   f. An MBTA executive is investigating a complaint that trips on this bus sometimes take more than 45 minutes. What percent of the trips take more than 45 minutes?
   g. Suppose that the MBTA wishes to revise its range of possible travel times for this route such that 97% of the trips will fall in the range. Find this range.

7. The average time it takes a group of adults to complete a certain achievement test is 46.2 minutes. The standard deviation is 8 minutes. Assume the variable is normally distributed.
   a. Find the probability that, if 50 randomly selected adults take the test, the average time it takes the group to complete the test will be less than 43 minutes.
   b. Find the probability that, if 50 randomly selected adults take the test, the total time it takes the group to complete the test will be more than 2440 minutes.

8. Some of the desks at a university are designed for left-handed students. The university's Left-Handed Student Association is investigating whether there are enough left-handed desks on campus. The group checks 300 desks randomly chosen from the 10,000 desks at the university and finds that 45 of them are left-handed desks. Based on this research, the group estimates that 15% of the university's desks are left-handed. What is the percent margin of error (using a 95% confidence level) for this estimate?

9. A company offers a course for students to help them prepare for the state standardized test. The company claims the course will improve students' ability to pass the standardized test. A researcher would like to test this claim. Forty students volunteer to participate in the study and are divided equally into two groups: the Treatment Group and the Control Group.
   a. Describe an effective and valid way to randomize the participants into the Treatment Group and Control Group.
   b. Describe what should occur if a participant is in the Treatment Group or Control Group.
   c. The following table shows the experiment's results. Do you believe the claim that the course improved students' ability to pass the test? Show calculations to support your answer.

                 Treatment Group   Control Group   Total
      Passed           14               11           25
      Failed            6                9           15
      Total            20               20           40

Answers

1. a. mean = 2, variance = $\frac{40}{57} \approx 0.702$, standard deviation = $\sqrt{40/57} \approx 0.838$
   b. Not normally distributed: the data are heavily skewed to the right and are not symmetric about the mean.

2. a. $\binom{20}{1}(0.02)^1(0.98)^{19} \approx 0.272$
   b. P(at most 2 defective) = P(exactly 0 defective) + P(exactly 1 defective) + P(exactly 2 defective) = $\binom{20}{0}(0.02)^0(0.98)^{20} + \binom{20}{1}(0.02)^1(0.98)^{19} + \binom{20}{2}(0.02)^2(0.98)^{18} \approx 0.993$
   c. X is the number of defective toasters out of the 20 tested. Mean of X: $20(0.02) = 0.4$; standard deviation of X: $\sqrt{20(0.02)(0.98)} \approx 0.626$

3. Find z-scores for each test score. Math: $\frac{60-54}{3} = 2$; History: $\frac{80-75}{2} = 2.5$. This means that in math her score is 2 standard deviations above the mean, while in history her score is 2.5 standard deviations above the mean. Therefore, she did better compared to the rest of the class in history.

4. a. normalcdf(0, 224, 218.4, 25) ≈ 0.589
   b. normalcdf(0, 224, 218.4, 25/√(40)) ≈ 0.922

5. a. $\frac{\sigma}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
   b. $\sqrt{p(1-p)/n}$ gets smaller as n gets larger. The limit is 0.
   c. $\sqrt{\frac{0.6(0.4)}{1000000}} \approx 4.899 \times 10^{-4}$, or 0.0004899, or 0.04899%

6. a. 28.5 to 49.5 minutes
   b. normalcdf(35, 45, 39, 3.5) ≈ 0.830, so 83%.
   c. 0.286 (typical), −2.571 (unusual), −1.143 (typical), 1.714 (in-between), 6 (very unusual).
   d. 40.05, 33.75, and 46.35 minutes.
   e. normalcdf(37.5, 38.5, 39, 3.5) ≈ 0.109
   f. You need to pick some large number for the upper end of the interval. You'll get roughly the same answer regardless of the choice: normalcdf(45, 1000, 39, 3.5) ≈ 0.0432 = 4.32%.
   g. Need normalcdf(39 − x, 39 + x, 39, 3.5) = 0.97, which gives x ≈ 7.6. Answer: 31.4 to 46.6 minutes.

7. a. normalcdf(0, 43, 46.2, 8/√(50)) ≈ 0.00234
   b. normalcdf(2440, 1000000, 46.2 × 50, 8 × √(50)) ≈ 0.0108

8. $\sigma = \sqrt{300(0.15)(0.85)} \approx 6.185$ and $2\sigma \approx 12.37$, which is $12.37/300 \approx 4.12\%$ of the 300 desks checked. The margin of error is about ±4.12%.

9. a. Sample answer: put all 40 names in a hat, draw 20 names and assign them to the Treatment Group, and assign the rest to the Control Group.
   b. The Treatment Group should attend the course and the Control Group should not.
   c. There is not enough evidence to say the course improved the students' ability to pass the test (based on a 5% cutoff): normalcdf(14, 20, 11, √(20(.55)(.45))) ≈ 0.089, which is greater than 0.05.
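Answer 1 in the answer key above can be checked with a short Python sketch. The helper `weighted_stats` is an illustrative name of our own, not part of any library. It uses exact fractions, so the definition of variance (mean squared deviation) and the alternate formula $\overline{x^2} - (\overline{x})^2$ can be compared exactly. The last lines illustrate what happens when a constant is added to, or multiplied by, each value.

```python
from fractions import Fraction
import math

def weighted_stats(values, counts):
    """Mean, variance (mean squared deviation), and standard deviation of a frequency table."""
    n = sum(counts)
    mean = Fraction(sum(v * c for v, c in zip(values, counts)), n)
    # Definition: average of the squared deviations from the mean
    var_def = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / n
    # Alternate formula: mean of the squares minus the square of the mean
    mean_of_squares = Fraction(sum(c * v * v for v, c in zip(values, counts)), n)
    var_alt = mean_of_squares - mean ** 2
    assert var_def == var_alt  # exact arithmetic, so the two formulas agree exactly
    return mean, var_def, math.sqrt(var_def)

# Problem 1: days in the hospital and number of mothers
days = [1, 2, 3, 4, 5]
mothers = [14, 34, 5, 3, 1]
mean, var, sd = weighted_stats(days, mothers)
print(mean, var, round(sd, 3))  # 2 40/57 0.838

# Adding 1 day to every value shifts the mean by 1 but leaves the variance unchanged;
# converting to hours (times 24) multiplies the mean by 24 and the variance by 24**2
shifted = weighted_stats([d + 1 for d in days], mothers)
scaled = weighted_stats([24 * d for d in days], mothers)
```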
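The binomial calculations in answer 2 fit in a few lines of Python. This minimal sketch assumes Python 3.8 or later for `math.comb`. `binom_pmf` is an illustrative helper name; it evaluates the combination-number formula $\binom{n}{k}p^k(1-p)^{n-k}$ used in the answer key.

```python
import math

def binom_pmf(k, n, p):
    """Exact probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Problem 2: 20 toasters, each defective with probability 0.02
n, p = 20, 0.02
p_exactly_one = binom_pmf(1, n, p)
p_at_most_two = sum(binom_pmf(k, n, p) for k in range(3))  # k = 0, 1, 2

# X = number of defective toasters out of 20: mean np, standard deviation sqrt(np(1 - p))
mean_x = n * p
sd_x = math.sqrt(n * p * (1 - p))

print(round(p_exactly_one, 3), round(p_at_most_two, 3), round(mean_x, 3), round(sd_x, 3))
# 0.272 0.993 0.4 0.626
```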
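The z-score answers (3, 6c, and 6d) all come from one formula and its inverse. The Python sketch below uses two hypothetical helpers, `z_score` and `value_from_z`. It reports the z-scores only, because the answer key does not state numeric cutoffs for its "typical" and "unusual" labels.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def value_from_z(z, mean, sd):
    """Invert the z-score: the value that is z standard deviations from the mean."""
    return mean + z * sd

# Problem 3: compare scores from two different tests
math_z = z_score(60, 54, 3)      # 2.0
history_z = z_score(80, 75, 2)   # 2.5
better = "history" if history_z > math_z else "math"

# Problem 6c: bus travel times (mean 39, sd 3.5)
times = [40, 30, 35, 45, 60]
zs = [round(z_score(t, 39, 3.5), 3) for t in times]
print(zs)  # [0.286, -2.571, -1.143, 1.714, 6.0]

# Problem 6d: recover travel times from each rider's z-score
riders = {"Tony": 0.3, "Sasha": -1.5, "Derman": 2.1}
travel = {name: round(value_from_z(z, 39, 3.5), 2) for name, z in riders.items()}
```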
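The calculator call normalcdf(lower, upper, μ, σ) used in answers 4 and 7 can be imitated in Python with `statistics.NormalDist`, which requires Python 3.8 or later. In this sketch, `normalcdf` is our own helper name that mirrors the calculator's argument order; it is not a built-in Python function. The sketch also shows the two forms of the CLT. For averages the standard deviation is σ/√n. For sums the mean is nμ and the standard deviation is σ√n.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mu, sigma):
    """Area under the normal curve N(mu, sigma) between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Problem 4a: one randomly selected person
p4a = normalcdf(0, 224, 218.4, 25)
# Problem 4b: CLT for averages, standard deviation of the sample mean is sigma / sqrt(n)
p4b = normalcdf(0, 224, 218.4, 25 / math.sqrt(40))
# Problem 7a: CLT for averages with n = 50
p7a = normalcdf(0, 43, 46.2, 8 / math.sqrt(50))
# Problem 7b: CLT for sums, mean n * mu and standard deviation sigma * sqrt(n)
p7b = normalcdf(2440, 1_000_000, 46.2 * 50, 8 * math.sqrt(50))

print(round(p4a, 3), round(p4b, 3), round(p7a, 5), round(p7b, 4))
# 0.589 0.922 0.00234 0.0108
```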
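Problem 6 can be worked the same way. This sketch assumes Python 3.8 or later for `statistics.NormalDist`. Its `inv_cdf` method does the job of an inverse-normal calculator function for part g. The helper `area_between` is an illustrative name. For part f, the sketch takes the exact upper tail, 1 − cdf(45), rather than choosing a large upper bound. This gives the same value the answer key's approach approximates.

```python
from statistics import NormalDist

bus = NormalDist(mu=39, sigma=3.5)  # Problem 6: travel times in minutes

def area_between(lower, upper, dist):
    """Probability that a value from dist falls between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)

# 6a: by the 68/95/99.7 rule, 99.7% of trips lie within 3 standard deviations
range_997 = (bus.mean - 3 * bus.stdev, bus.mean + 3 * bus.stdev)

# 6b, 6e, 6f: areas under the curve
p_35_to_45 = area_between(35, 45, bus)
p_rounds_to_38 = area_between(37.5, 38.5, bus)  # times that round to 38 lie in [37.5, 38.5]
p_over_45 = 1 - bus.cdf(45)                      # exact upper tail

# 6g: a central 97% leaves 1.5% in each tail, so use the 98.5th percentile
x = bus.inv_cdf(0.985) - bus.mean
range_97 = (bus.mean - x, bus.mean + x)
print(tuple(round(t, 1) for t in range_97))  # (31.4, 46.6)
```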
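Answers 5 and 8 both express a standard deviation as a proportion of the number of trials. Here is a minimal Python sketch with illustrative helper names. Like the answer key, it assumes the 95% margin of error is 2 standard deviations, rather than the more precise 1.96.

```python
import math

def sd_as_proportion(n, p):
    """Standard deviation of the number of successes divided by n, i.e. sqrt(p(1-p)/n)."""
    return math.sqrt(n * p * (1 - p)) / n

# Problem 5c: one million flips of a 60%/40% coin
sd_coin = sd_as_proportion(1_000_000, 0.6)   # about 4.899e-4, i.e. 0.04899%

# Problem 5b: the result shrinks toward 0 as n grows
shrinking = [sd_as_proportion(n, 0.6) for n in (100, 10_000, 1_000_000)]

def margin_of_error(n, p_hat):
    """Two standard deviations of the count, as a proportion of the sample size."""
    # Assumption: the 2-sigma rule stands in for 95% confidence, as in the answer key
    return 2 * math.sqrt(n * p_hat * (1 - p_hat)) / n

# Problem 8: 45 of 300 desks are left-handed, so p_hat = 0.15
moe = margin_of_error(300, 45 / 300)   # about 0.0412, i.e. 4.12%
```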
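Problem 9 can be simulated as well. This sketch assumes Python 3.8 or later. The hypothetical helper `assign_groups` imitates drawing names from a hat. The helper `chance_by_luck` reproduces the answer key's normal approximation: if the course had no effect, each treated student would pass at the control rate of 0.55, so the sketch asks how likely 14 or more passes out of 20 would be. The student names are placeholders.

```python
import math
import random
from statistics import NormalDist

def assign_groups(names, seed=None):
    """Names in a hat: shuffle, first half to Treatment, the rest to Control."""
    rng = random.Random(seed)
    hat = list(names)
    rng.shuffle(hat)
    half = len(hat) // 2
    return hat[:half], hat[half:]

def chance_by_luck(treated_passes, n, control_rate):
    """Normal approximation of P(at least treated_passes of n pass) if the course has no effect.

    Each treated student is assumed to pass at the control group's rate.
    """
    sigma = math.sqrt(n * control_rate * (1 - control_rate))
    no_effect = NormalDist(n * control_rate, sigma)
    return no_effect.cdf(n) - no_effect.cdf(treated_passes)

students = [f"Student {i}" for i in range(1, 41)]  # hypothetical placeholder names
treatment, control = assign_groups(students, seed=1)

# Problem 9c: 14 of 20 treated students passed; control pass rate is 11/20 = 0.55
p_value = chance_by_luck(14, 20, 11 / 20)
convincing = p_value < 0.05  # 5% cutoff
print(round(p_value, 3), convincing)  # 0.089 False
```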