Welcome to MM207! Unit 9 Seminar

End of term deadlines
• Final Project due Tuesday, by 11:59 pm ET
• Unit 10 contains several discussion questions and an internet resource. While these are not graded, you should complete them.

Discussion Question #1
List one statistical concept that you will use in your profession. Why?

Discussion Question #2
Which specific statistical concepts are still unclear? What do you need to make them clearer?

Discussion Question #3
List a specific statistical concept that you would feel comfortable explaining to another person. Why do you feel you have mastered this concept?

Example: Identifying Sampling Techniques
You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used.
1. You divide the student population with respect to majors and randomly select and question some students in each major.
Solution: Stratified sampling (the students are divided into strata (majors) and a sample is selected from each major)
2. You assign each student a number and generate random numbers. You then question each student whose number is randomly selected.
Solution: Simple random sample (each sample of the same size has an equal chance of being selected and each student has an equal chance of being selected)
Larson/Farber 4th ed.

Example: Comparing z-Scores from Different Data Sets
In 2007, Forest Whitaker won the Best Actor Oscar at age 45 for his role in the movie The Last King of Scotland. Helen Mirren won the Best Actress Oscar at age 61 for her role in The Queen. The mean age of all best actor winners is 43.7, with a standard deviation of 8.8. The mean age of all best actress winners is 36, with a standard deviation of 11.5. Find the z-score that corresponds to the age for each actor or actress. Then compare your results.
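Before turning to the solution, the computation this example asks for can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative choices, not part of the slides; the ages, means, and standard deviations are the ones given in the example statement. The "more than two standard deviations" cutoff is the rule of thumb used in this seminar.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# Values taken from the example statement.
whitaker_z = z_score(45, 43.7, 8.8)   # Best Actor winner, age 45
mirren_z = z_score(61, 36, 11.5)      # Best Actress winner, age 61

for name, z in [("Forest Whitaker", whitaker_z), ("Helen Mirren", mirren_z)]:
    # Rule of thumb: values more than 2 standard deviations
    # from the mean are considered unusual.
    label = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: z = {z:.2f} ({label})")
```

Because each z-score is measured in units of its own group's standard deviation, the two values can be compared directly even though the two age distributions differ.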
Solution: Comparing z-Scores from Different Data Sets
• Forest Whitaker: $z = \frac{x - \mu}{\sigma} = \frac{45 - 43.7}{8.8} \approx 0.15$ (0.15 standard deviations above the mean)
• Helen Mirren: $z = \frac{x - \mu}{\sigma} = \frac{61 - 36}{11.5} \approx 2.17$ (2.17 standard deviations above the mean)

The z-score corresponding to the age of Helen Mirren (z = 2.17) is more than two standard deviations from the mean, so it is considered unusual. Compared to other Best Actress winners, she is relatively older, whereas the age of Forest Whitaker (z = 0.15) is only slightly higher than the average age of other Best Actor winners.

Distinguishable Permutations
• The number of distinguishable permutations of n objects, where $n_1$ are of one type, $n_2$ are of another type, and so on, is
$$\frac{n!}{n_1! \, n_2! \, n_3! \cdots n_k!}, \quad \text{where } n_1 + n_2 + n_3 + \cdots + n_k = n$$

Example: Distinguishable Permutations
A building contractor is planning to develop a subdivision that consists of 6 one-story houses, 4 two-story houses, and 2 split-level houses. In how many distinguishable ways can the houses be arranged?
Solution:
• There are 12 houses in the subdivision
• $n = 12$, $n_1 = 6$, $n_2 = 4$, $n_3 = 2$
$$\frac{12!}{6! \, 4! \, 2!} = 13{,}860 \text{ distinguishable ways}$$

Example: Finding Probabilities
You have 11 letters consisting of one M, four Is, four Ss, and two Ps. If the letters are randomly arranged in order, what is the probability that the arrangement spells the word Mississippi?

Solution: Finding Probabilities
• There is only one favorable outcome
• There are 11 letters with 1, 4, 4, and 2 like letters, so the number of distinguishable permutations of the given letters is
$$\frac{11!}{1! \, 4! \, 4! \, 2!} = 34{,}650$$
$$P(\text{Mississippi}) = \frac{1}{34{,}650} \approx 0.000029$$

Example: Graphing a Binomial Distribution
Fifty-nine percent of households in the U.S. subscribe to cable TV. You randomly select six households and ask each if they subscribe to cable TV.
Construct a probability distribution for the random variable x. Then graph the distribution. (Source: Kagan Research, LLC)
Solution:
• n = 6, p = 0.59, q = 0.41
• Find the probability for each value of x

Solution: Graphing a Binomial Distribution
x      0      1      2      3      4      5      6
P(x)   0.005  0.041  0.148  0.283  0.306  0.176  0.042

[Histogram: Subscribing to Cable TV. Horizontal axis: Households (0 to 6); vertical axis: Probability (0 to 0.35)]

Mean, Variance, and Standard Deviation
• Mean: $\mu = np$
• Variance: $\sigma^2 = npq$
• Standard Deviation: $\sigma = \sqrt{npq}$

Example: Finding the Mean, Variance, and Standard Deviation
In Pittsburgh, Pennsylvania, about 56% of the days in a year are cloudy. Find the mean, variance, and standard deviation for the number of cloudy days during the month of June. Interpret the results and determine any unusual values. (Source: National Climatic Data Center)
Solution: n = 30, p = 0.56, q = 0.44
Mean: $\mu = np = 30 \cdot 0.56 = 16.8$
Variance: $\sigma^2 = npq = 30 \cdot 0.56 \cdot 0.44 \approx 7.4$
Standard Deviation: $\sigma = \sqrt{npq} = \sqrt{30 \cdot 0.56 \cdot 0.44} \approx 2.7$

Solution: Finding the Mean, Variance, and Standard Deviation
$\mu = 16.8$, $\sigma^2 \approx 7.4$, $\sigma \approx 2.7$
• On average, there are 16.8 cloudy days during the month of June.
• The standard deviation is about 2.7 days.
• Values that are more than two standard deviations from the mean are considered unusual.
  16.8 - 2(2.7) = 11.4: a June with 11 cloudy days would be unusual.
  16.8 + 2(2.7) = 22.2: a June with 23 cloudy days would also be unusual.

Sample Size
• Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate p is
$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2$$
• This formula assumes you have an estimate for $\hat{p}$ and $\hat{q}$.
• If not, use $\hat{p} = 0.5$ and $\hat{q} = 0.5$.

Example: Sample Size
You are running a political campaign and wish to estimate, with 95% confidence, the proportion of registered voters who will vote for your candidate.
Your estimate must be accurate within 3% of the true population proportion. Find the minimum sample size needed if
1. no preliminary estimate is available.
Solution: Because you do not have a preliminary estimate for $\hat{p}$, use $\hat{p} = 0.5$ and $\hat{q} = 0.5$.

Solution: Sample Size
• c = 0.95, $z_c = 1.96$, E = 0.03
$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 \approx 1067.11$$
Round up to the nearest whole number. With no preliminary estimate, the minimum sample size should be at least 1068 voters.

z-Test for a Population Proportion
• A statistical test for a population proportion.
• Can be used when a binomial distribution is given such that np ≥ 5 and nq ≥ 5.
• The test statistic is the sample proportion $\hat{p}$.
• The standardized test statistic is z:
$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

Using a z-Test for a Proportion p
Verify that np ≥ 5 and nq ≥ 5.
1. State the claim mathematically and verbally. Identify the null and alternative hypotheses. In symbols: state H0 and Ha.
2. Specify the level of significance. In symbols: identify α.
3. Sketch the sampling distribution.
4. Determine any critical value(s). Use Table 5 in Appendix B.
5. Determine any rejection region(s).
6. Find the standardized test statistic. In symbols: $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$
7. Make a decision to reject or fail to reject the null hypothesis. If z is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.

Example: Hypothesis Test for Proportions
Zogby International claims that 45% of people in the United States support making cigarettes illegal within the next 5 to 10 years. You decide to test this claim and ask a random sample of 200 people in the United States whether they support making cigarettes illegal within the next 5 to 10 years. Of the 200 people, 49% support this law.
At α = 0.05, is there enough evidence to reject the claim?
Solution:
• Verify that np ≥ 5 and nq ≥ 5: np = 200(0.45) = 90 and nq = 200(0.55) = 110

Solution: Hypothesis Test for Proportions
• H0: p = 0.45
• Ha: p ≠ 0.45
• α = 0.05
• Rejection region: z < -1.96 or z > 1.96 (area 0.025 in each tail)
• Test statistic:
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.49 - 0.45}{\sqrt{(0.45)(0.55)/200}} \approx 1.14$$
• Decision: Fail to reject H0, because z ≈ 1.14 is not in the rejection region
At the 5% level of significance, there is not enough evidence to reject the claim that 45% of people in the U.S. support making cigarettes illegal within the next 5 to 10 years.

Example: Using Technology to Find a Regression Equation
Use a technology tool to find the equation of the regression line for the Old Faithful data.

Duration, x   Time, y     Duration, x   Time, y
1.8           56          3.78          79
1.82          58          3.83          85
1.9           62          3.88          80
1.93          56          4.1           89
1.98          57          4.27          90
2.05          57          4.3           89
2.13          60          4.43          89
2.3           57          4.47          86
2.37          61          4.53          89
2.82          73          4.55          86
3.13          76          4.6           92
3.27          77          4.63          91
3.65          77

Solution: Using Technology to Find a Regression Equation
[Figure: technology output for the Old Faithful data. Plot axes: x from 1 to 5, y from 50 to 100]

Example: Predicting y-Values Using Regression Equations
The regression equation for the advertising expenses (in thousands of dollars) and company sales (in thousands of dollars) data is ŷ = 50.729x + 104.061. Use this equation to predict the expected company sales for the following advertising expenses. (Recall from section 9.1 that x and y have a significant linear correlation.)
1. 1.5 thousand dollars
2. 1.8 thousand dollars
3. 2.5 thousand dollars

Solution: Predicting y-Values Using Regression Equations
ŷ = 50.729x + 104.061
1. 1.5 thousand dollars
ŷ = 50.729(1.5) + 104.061 ≈ 180.155
When the advertising expenses are $1500, the company sales are about $180,155.
2. 1.8 thousand dollars
ŷ = 50.729(1.8) + 104.061 ≈ 195.373
When the advertising expenses are $1800, the company sales are about $195,373.
3. 2.5 thousand dollars
ŷ = 50.729(2.5) + 104.061 ≈ 230.884
When the advertising expenses are $2500, the company sales are about $230,884.

Prediction values are meaningful only for x-values in (or close to) the range of the data. The x-values in the original data set range from 1.4 to 2.6. So, it would not be appropriate to use the regression line to predict company sales for advertising expenditures such as 0.5 ($500) or 5.0 ($5000).
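The predictions and the range caveat above can be sketched in Python as follows. The slope, intercept, and data range (1.4 to 2.6) come from the example; the function name `predict_sales` and the choice to raise an error outside the range are assumptions made for illustration. The strict cutoff is a simplification, since the example allows x-values "in (or close to)" the range.

```python
SLOPE = 50.729             # from the regression equation in the example
INTERCEPT = 104.061
X_MIN, X_MAX = 1.4, 2.6    # range of x-values in the original data set

def predict_sales(ad_expense):
    """Predict company sales (thousands of dollars) from advertising
    expenses (thousands of dollars), refusing to extrapolate."""
    if not X_MIN <= ad_expense <= X_MAX:
        # Hypothetical policy: treat anything outside the data range as invalid.
        raise ValueError(
            f"x = {ad_expense} is outside the data range [{X_MIN}, {X_MAX}]"
        )
    return SLOPE * ad_expense + INTERCEPT

for x in (1.5, 1.8, 2.5):
    print(f"x = {x}: predicted sales about {predict_sales(x):.2f} thousand dollars")

# Expenses such as 0.5 or 5.0 are rejected rather than extrapolated.
for x in (0.5, 5.0):
    try:
        predict_sales(x)
    except ValueError as err:
        print(f"Skipped: {err}")
```

Raising an error is only one way to handle out-of-range inputs; returning the prediction with a warning would be a reasonable alternative when values slightly outside the range are acceptable.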