MATH 138 Statistics
Classroom Handouts
Unit 3 and Final Review

Name: _________________________ Course Number: ________________ Instructor: _____________________

Rev. 11/2015

MATH-138: Objectives
Intro Stats, by Richard D. De Veaux, Paul F. Velleman, and David E. Bock

Unit III
The student will be able to:

Chapter 15: Sampling Distribution Models
1. State and apply the conditions and uses of the Central Limit Theorem.
2. Determine the mean and standard deviation (standard error) for a sampling distribution of proportions or means.
3. Apply the sampling distribution of a proportion or a mean to application problems.

Chapter 16: Confidence Intervals for Proportions
4. Determine and interpret the margin of error in the context of the problem.
5. Construct a confidence interval for a proportion and interpret in the context of the problem, checking the necessary assumptions.
6. Determine the sample size necessary to produce a specific margin of error.

Chapter 17: Testing Hypotheses about Proportions
7. Perform a one-proportion z-test, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.

Chapter 18: Inferences about Means
8. Perform a t-test for the population mean, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
9. Compute and interpret in context a t-based confidence interval for the population mean, checking the necessary assumptions.

Chapter 19: More about Tests and Intervals
10. Interpret the meaning of the P-value in the context of the problem.
11. Determine when a result is statistically significant.
12. Explain Type I and Type II errors in the context of the problem.

Chapter 20: Comparing Groups
13. Find and interpret a confidence interval for the difference between two proportions, checking the necessary assumptions.
14. Perform a two-proportion z-test, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
15. Compute and interpret in context a t-based confidence interval for the difference between two population means, checking the necessary assumptions.
16. Perform and interpret a two-sample t-test for two population means, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.

Chapter 21: Paired Samples & Blocks
17. Find and interpret in context a paired confidence interval, checking the necessary assumptions.
18. Perform a paired t-test to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.

Chapter 22: Comparing Counts
19. Perform chi-square tests for goodness-of-fit, homogeneity, and independence, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
Unit 3: Technology Instructions
Technology Instructions for Unit 3

Chapter 15 – Sampling Distribution Models

Define the Sampling Distribution of Proportions:

Sampling Distribution of $\hat{p}$: $N\left(p, \sqrt{\frac{pq}{n}}\right)$

Conditions to check:

– We use the normal model to answer questions about how likely it is for a sample proportion to be in a certain range of values
– Note: To use the sampling distribution of proportions to determine how unusual sample statistics are, we need to know the true population proportion p

1) Assume that 30% of the students at a University wear contacts and we randomly pick 100 students to see what percentage of them wear contacts.
a) What is the sampling distribution?
b) What is the probability that more than one third of them wear contacts?

2) Based on past experience, a bank believes that 7% of the people who receive loans will not make payments on time. The bank has recently approved 200 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?
b) What assumptions underlie your model? Are the conditions met?
c) What is the probability that over 10% of these clients will not make timely payments?

Define the Sampling Distribution of Means:

Sampling Distribution of $\bar{y}$: $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Conditions to check:

– We use the normal model to answer questions about how likely it is for a sample mean ($\bar{y}$) to be in a certain range of values
– Note: To use the sampling distribution of means to determine how unusual sample statistics are, we need to know the true population mean $\mu$ and the true population standard deviation $\sigma$

3) Ithaca, New York gets an average of 35.4” of rain each year with a standard deviation of 4.2”. Assume the Normal model applies to their yearly rainfall.
a) During what percentage of years does Ithaca get more than 40” of rainfall?
b) Describe the amount of rainfall that separates the driest 20% of all years.
c) You plan to stay in Ithaca for 4 years. Describe the sampling distribution of the means for the average rainfall over a 4-year period.
d) What is the probability that those four years average less than 30” of rain?

4) A college’s data on its incoming freshmen show that their high school GPAs are normally distributed with a mean of 3.4 and a standard deviation of 0.35. The students are randomly assigned to freshman writing seminars in groups of 25.
a) Describe the distribution of the students’ high school GPAs.
b) Describe the sampling distribution model that applies to the average GPAs for students in a writing seminar.
c) Find the probability a student has a GPA greater than 3.5.
d) Find the probability that one of the groups has an average GPA greater than 3.5.

Extra Practice/Homework

5) A national study found that 44% of college students engage in binge drinking (5 drinks at a sitting for men, 4 for women). A professor surveyed a random sample of 244 students at his college. What is the probability that fewer than 96 of them admitted to binge drinking in the past week?

6) Suppose the weights of college textbooks are normally distributed with a population mean of 4 pounds and a population standard deviation of 1.2 pounds. Suppose we randomly sample and weigh 16 textbooks. Answer the following questions:
a) What is the probability that the sample average of the 16 textbooks is less than 3.5 pounds? (use 2 decimal places in your answer)
b) What is the probability that a given (i.e. one) textbook weighs more than 5.5 pounds? (use 2 decimal places in your answer)
c) Imagine the distribution of all possible sample averages of textbook weights (with n=16). What is the weight that separates the lightest 25% of sample averages from the heaviest 75% of sample averages?
(round your answer to the nearest tenth of a pound)

Chapter 16 – Confidence Intervals for Proportions

1) Your local newspaper polls a random sample of 330 voters, finding 144 who say they will vote “yes” on the upcoming school budget. What is the true support for the school budget?
a) Try this by hand:
One-proportion z-interval: $\hat{p} \pm z^* \, SE(\hat{p})$ where $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Conditions to check:
b) What is the margin of error?
c) Now try using the TI: STAT -> TESTS -> 1-PropZInt

2) A random sample of 416 teenagers showed that 76.4% of them had experimented with alcohol.
a) What is the 95% confidence interval for the proportion of the population?
b) Interpret this confidence interval

3) Direct mail advertisers send solicitations to thousands of potential customers in the hope that some will buy the company’s product. The response rate usually is quite low. Suppose a company wants to test the response to a new flyer and sends it to 1000 people randomly selected from their mailing list of over 200,000 people. They get 123 orders from the recipients.
a) Create a 90% confidence interval for the percentage of people the company contacts who may buy something.
b) Explain what the interval means.
c) The company must decide whether to now do a mass mailing. The mailing won’t be cost effective unless it produces at least a 5% return. What does your confidence interval suggest?

4) An experiment finds that 27% of 53 subjects report improvement after using a new medication. Create a 95% confidence interval for the actual cure rate.
a) Check the conditions:
b) Find the 95% confidence interval:
c) Why is our interval so wide?
d) Make it narrower – 90% confidence.
e) What are the advantages and disadvantages?
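As a rough cross-check on the by-hand work in problem 1 of this chapter (144 “yes” votes out of 330), the one-proportion z-interval formula can be sketched in Python. This is only an illustration, not part of the TI workflow the handout uses: the function name `one_prop_z_interval` is made up here, and the 95% confidence level is an assumption, since problem 1 does not state a level.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(x, n, conf=0.95):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * q_hat / n).

    Hypothetical helper for illustration; returns
    (p_hat, margin_of_error, lower, upper).
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat / n)                       # standard error of p_hat
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    me = z_star * se                                   # margin of error
    return p_hat, me, p_hat - me, p_hat + me


# Problem 1 data: 144 "yes" out of 330 voters; the 95% level is an assumption.
p_hat, me, lo, hi = one_prop_z_interval(144, 330, conf=0.95)
print(f"p-hat = {p_hat:.4f}, ME = {me:.4f}, interval = ({lo:.4f}, {hi:.4f})")
```

The conditions (random sample, at least 10 successes and 10 failures, sample under 10% of the population) still have to be checked by hand; the code does not verify them.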
Sample Size Computations

– To find the sample size needed to obtain a certain level of confidence within a set margin of error, we can use the following formula (derived from our Margin of Error formula):

$n = \frac{(z^*)^2 \, \hat{p}\hat{q}}{ME^2}$

o $z^*$ is the z-critical value needed for the confidence level
o $\hat{p}$ and $\hat{q}$ are our estimates from a previous study; use 0.5 for both if no estimates are given

5) Recall that an experiment finds that 27% of 53 subjects report improvement after using a new medication. What sample size would we need in a follow up study if we want a margin of error ± 5% with 98% confidence?
Hints:
– Solve for z* of the 98% confidence interval using invNorm(.01). This returns the negative critical value; use its absolute value.
– Use our previous estimates of $\hat{p}$ and $\hat{q}$ and solve for n.
– Hint: always round up because we need to ensure a large enough sample.

6) In preparing a report on the economy, we need to estimate the percentage of businesses that plan to hire additional employees in the next 60 days.
a) How many randomly selected employees must we contact in order to create an estimate in which we are 98% confident with a margin of error of 5%?
b) Suppose we want to reduce the margin of error to 3%. What sample size will suffice?
c) Why might it not be worth the effort to try to get an interval with a margin of error of only 1%?

Extra Practice / Homework

7) A survey of 95 families showed that 40 of the families owned at least one TV.
a) Find the 95% confidence interval on the proportion of families who own at least one TV.
b) Find the Margin of Error associated with this confidence interval.
c) Interpret this interval in the context of the problem. Carefully explain what 95% confidence means in this context.
d) Suppose you reduce your level of confidence to 90%. Will this create a wider or narrower interval? Does this mean that precision will increase or decrease?

8) A survey of shoppers is planned to see what percentage use credit cards. Prior surveys suggest 53% of shoppers use credit cards.
How many randomly selected shoppers must we survey in order to estimate the proportion of shoppers who use credit cards to within 3% with 95% confidence?

Chapter 17 – Hypotheses Testing about Proportions

1) Previously the cracking rate of iron casts was 20%. After a new engineering process the cracking rate of 400 casts fell to 17%. Is this due to the new engineering or just random chance?
a) What is the null hypothesis (no change)?
b) What is the alternate hypothesis (this is what we want to show)?
c) Compute a one-proportion z-test by hand:
$z = \frac{\hat{p} - p_0}{SD(\hat{p})}$ where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$
Conditions to check:
d) Sketch the z distribution
e) What is the probability that this data would be seen by chance?
f) How convincing is the data?
g) Now try this test using the TI: STAT -> TESTS -> 1-PropZTest

2) Hypothesis Writing Practice -- Write the null and the alternative hypotheses for the following:
a) In the 1950’s only about 40% of high school graduates went on to college. Has the percentage changed?
b) 20% of the cars of a certain model have needed costly transmission work after being driven between 50,000 and 100,000 miles. The manufacturer hopes that the redesign of the transmission has solved this problem.
c) We field test a new flavor of soft drink, planning to market it only if we are sure that at least 60% of the people like the flavor.
d) The drug Lipitor is meant to lower cholesterol. Is there evidence to support the claim that over 1.9% of the users experience flu like symptoms as a side effect?
e) According to the US department of Health, 16.3% of Americans did not have health insurance coverage in 1998. A politician claims that this percentage has decreased since 1998.
f) During the past forty years, the monthly rate of return for a particular item has been 4.2 percent. A store analyst claims that it is different.

3) It is believed that the percent of convicted felons who have a history of juvenile delinquency is 70%.
Is there evidence to support the claim that the actual percentage is more than 70% if, out of 200 convicted felons, we find that 154 have a history of juvenile delinquency? Alpha = .05
a) Write the null and alternative hypotheses.
b) Check the assumptions
c) Draw the curve.
d) Find the critical value and the p value.
e) Should you reject or fail to reject the null hypothesis? Why?
f) State your conclusion

4) In the 1980’s it was generally believed that autism affected about 6% of the nation’s children. Some people believe that the increase in the number of chemicals in the environment has led to an increase in the incidence of autism. A recent study examined 384 children and found that 46 of them showed signs of some form of autism. Is there strong evidence that the level of autism has increased? (Use an alpha of 0.05)
a) Write the null and alternative hypotheses.
b) Check the assumptions
c) Draw the curve.
d) Find the critical value and the p value.
e) Should you reject or fail to reject the null hypothesis? Why?
f) State your conclusion

Extra Practice / Homework

5) Past studies have indicated that 85.6% of all enrolled college students in the U.S. are undergraduates. A random sample of 500 enrolled college students in a particular state revealed that 420 of the students were undergraduates. Is there enough evidence to conclude that the percentage of undergraduates from this particular state differs from the national percentage?
a) What type of test would be appropriate for this situation? (Name the test and indicate whether it should be a one-tailed or two-tailed test.)
b) State the null and the alternate hypotheses.
c) Perform the appropriate test and report the P-value
d) Use your P-value to make a conclusion about the problem. Are you accepting or rejecting the null hypothesis?
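The by-hand one-proportion z-test from this chapter can also be sketched in Python as a way to check calculator output. The example below uses the data from problem 3 (154 of 200 felons, null proportion 0.70, upper-tailed alternative). The function name `one_prop_z_test` and its `tail` argument are assumptions made for this sketch, not anything defined in the handout or on the TI.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, tail="greater"):
    """One-proportion z-test: z = (p_hat - p0) / sqrt(p0 * q0 / n).

    Hypothetical helper for illustration; returns (z, p_value).
    tail is "greater", "less", or "two-sided", matching the alternative.
    """
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)   # SD of p_hat under the null hypothesis
    z = (p_hat - p0) / sd
    std_normal = NormalDist()      # standard normal model N(0, 1)
    if tail == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "less":
        p_value = std_normal.cdf(z)
    elif tail == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Problem 3 data: H0: p = 0.70, HA: p > 0.70, with 154 of 200 in the sample.
z, p_value = one_prop_z_test(154, 200, 0.70, tail="greater")
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```

Comparing the printed P-value with alpha = .05 gives the reject or fail-to-reject decision; writing the hypotheses, checking conditions, and stating the conclusion in context remain part of the exercise.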
Chapter 18 – Inferences about Means

Confidence Intervals for the Mean

1) Recently, realtors randomly sampled 36 bids from potential buyers to estimate the average loss in home values. The sample showed an average loss of $9560 with a standard deviation of $1500.
Conditions to check:
a) Check the assumptions and conditions.
b) Find a 95% confidence interval for the mean loss in value per home by hand using the formula:
One-sample t-interval: $\bar{y} \pm t^*_{n-1} \, SE(\bar{y})$ where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$
Hint: you will need to look up t* using DISTR -> invT or the StatCrunch t distribution calculator
c) Also try this using your TI: STAT -> TESTS -> TInterval
d) Interpret the interval.

2) Hoping to lure more shoppers downtown, a city builds a new public parking garage. The city plans to pay for the structure through parking fees. During a two month period (44 week days) daily fees collected averaged $126 with a standard deviation of $15. If a consultant predicted that the average daily income would be $130, would he be correct at the 90% confidence level?

3) In 1998, the Nabisco Company announced a “1000 Chips Challenge” claiming that every 18 ounce bag of Chips Ahoy contained at least 1000 chocolate chips. Below are the counts of chips in selected bags.
1219 1214 1087 1200 1419 1121 1325 1345 1244 1258 1356 1132 1191 1270 1295 1135
What does this evidence say about Nabisco’s claim?

4) The survival times in weeks are given for 20 male rats which were exposed to a high level of radiation.
152 152 115 109 137 88 94 77 160 165 20 128 123 136 101 62 152 83 69 125
Determine the 95% confidence interval on the mean survival time for rats.

Hypothesis test for the mean - T Test:

5) Judy is an ad designer who designs the newspaper ads for the Giant grocery store. Electronic counters at the entrance total the number of people entering the store. Before Judy was hired, the mean number of people entering every day was 3018.
Since she has started working at the Giant the management thinks that this average has increased. A random sample of 42 business days gave an average of 3333 people entering the store daily with a standard deviation of 287. Does this indicate that the average number of people entering the store every day has increased? Use an alpha of 0.01.
a) Write the hypotheses:
b) Try this test by hand:
$t = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$ where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$
Conditions to check:
c) Sketch the t distribution
d) Find the p-value: DISTR -> tcdf(lower, upper, df) where df = n-1
e) Try again using the TI: STAT -> TESTS -> T-Test
– Inpt: choose data or stats according to what is available (usually you use stats)
– μ0: stands for the hypothesized mean, which is in your hypothesis. In this case we are testing against the previous 3018.
– x̄: the sample mean of 3333
– Sx: the standard deviation of 287
– n: the number in your sample, 42 days
– μ: choose the inequality used in the alternative hypothesis. In this case we are looking for an increase, so choose >
– Calculate
f) Write a conclusion

6) Hoping to lure more shoppers downtown, a city builds a new public parking garage. The city plans to pay for the structure through parking fees. During a two month period (44 week days) daily fees collected averaged $126 with a standard deviation of $15. If a consultant claimed that the average daily income would be $130, should we reject her claim using alpha=0.10 (perform a 2-sided test)? How is this similar or different to using the confidence interval?

7) When consumers apply for credit, their credit is rated using FICO scores. A random sample of credit ratings is obtained, and the FICO scores are summarized with these statistics: n=18, ybar=660.3, s=95.9. Use an alpha of 0.05 and do a 2-sided hypothesis test to test the claim that the mean credit score (of the general population) is equal to 700 (Triola 2008).
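For readers without a TI at hand, the one-sample t-test from summary statistics can be sketched in plain Python. The example uses the numbers from problem 7 (n=18, ybar=660.3, s=95.9, hypothesized mean 700, two-sided). Everything named here (`t_pdf`, `t_cdf`, `one_sample_t_test`) is a hypothetical helper written for this sketch. The standard library has no t distribution, so the P-value is approximated by integrating the t density with Simpson's rule; that is a stand-in for the calculator's tcdf, not how the TI computes it.

```python
from math import sqrt, pi, exp, log, lgamma


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t): Simpson's rule on [0, |t|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t_test(ybar, s, n, mu0, tail="two-sided"):
    """t = (ybar - mu0) / (s / sqrt(n)) with df = n - 1; returns (t, df, p_value)."""
    se = s / sqrt(n)               # standard error of the sample mean
    t = (ybar - mu0) / se
    df = n - 1
    if tail == "greater":
        p_value = 1 - t_cdf(t, df)
    elif tail == "less":
        p_value = t_cdf(t, df)
    elif tail == "two-sided":
        p_value = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("tail must be 'greater', 'less', or 'two-sided'")
    return t, df, p_value


# Problem 7 data: H0: mu = 700, HA: mu != 700.
t_stat, df, p_value = one_sample_t_test(660.3, 95.9, 18, 700, tail="two-sided")
print(f"t = {t_stat:.3f}, df = {df}, P-value = {p_value:.4f}")
```

The same function covers problem 5 by passing the Giant store statistics with `tail="greater"`. The Nearly Normal condition and the randomization condition are not checked by the code.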
8) Different cereals are randomly selected, and the sugar content is obtained for each cereal, with the results given below for Cheerios, Harmony, Smart Start, Cocoa Puffs, Lucky Charms, Corn Flakes, Fruit Loops, Wheaties, Cap’n Crunch, Frosted Flakes, Apple Jacks, Bran Flakes, Special K, Rice Krispies, Corn Pops, and Trix. Use an alpha of 0.05 to test the claim of a cereal lobbyist that the mean sugar content of all cereals is LESS than 0.3 g (Triola 2008).
0.03 0.24 0.30 0.47 0.43 0.07 0.47 0.13 0.44 0.39 0.48 0.17 0.13 0.09 0.45 0.43

Extra Practice / Homework

9) The State Superintendent of Schools consistently states that the average salary of all substitute teachers in every school district is at least $60 per day. The president of the teachers' union believes that the average salary is less than $60 per day. So, she sampled eight school districts to help test her claim. The daily salaries she found for the districts were
$60 $56 $60 $55 $70 $55 $60 $55
a) What statistical test should be performed? Should this test be one sided or two?
b) Which condition has to be satisfied for the test that we did not have to check for the test on proportions?
c) Write the two hypotheses for the problem.
d) Make a sketch of the distribution indicating the test statistic and p.
e) Compute the appropriate statistical test and report the test statistic and associated P-value.
f) In the context of the problem, write a conclusion to the study and substantiate your conclusion using statistical arguments. Use an alpha level of 0.05.
g) Find a 95% confidence interval for the true daily salary based on the data.

Chapter 19 – More about Tests and Intervals

Statistical Significance
– What is a p-value?
– What is the difference between practical significance and statistical significance?

1) Are they speeding?
– College Terrace speed limit: 25 mph
– Even after traffic-calming measures, a resident complains that cars still speed.
– 250 randomly selected cars were clocked with mean speed 25.55 mph, s = 3.618. Is the mean speed of all cars greater than 25 mph?
a) Write the hypotheses:
b) Check conditions
c) Perform hypothesis test
d) Are the results statistically significant?
e) Find the 95% confidence interval
f) Are the results practically significant?

Types of Errors
– What is a Type I error?
– What is a Type II error?

2) Have harsher penalties and ad campaigns increased seatbelt use among drivers and passengers? Observations of commuter traffic failed to find evidence of a significant change compared with three years ago.
a) Explain what a P-value of 0.17 means in this context.
b) What are the hypotheses?
c) In this context, what are the Type I and Type II errors?

3) Production managers on an assembly line must monitor to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the production of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired.
a) What are the hypotheses?
b) In this context, what is a Type I error? A Type II error?
c) What are the consequences of each type of error?

4) A company is willing to renew its advertising contract with a local radio station only if the station can prove that more than 20% of the residents of the city have heard the ad and recognize the company’s product. The radio station conducts a random phone survey of 600 people and finds 133 people that recognize the product. In this context, what are the Type I and Type II errors?

5) A statistics professor has observed that about 13% of the students who initially enroll in his Introductory Stats class withdraw before the end of the semester. A salesman suggests that he try a statistics software package that gets students more involved with computers, predicting it will cut the dropout rate.
The software is expensive so the salesman offers to let the professor use it for a semester to see if the dropout rate goes down significantly.
a) Is this a one-tailed or two-tailed test?
b) Write the null and alternative hypotheses
c) In this context, what would happen if the professor makes a Type I error?
d) In this context, what would happen if the professor makes a Type II error?
e) Initially 203 students signed up for the course. They used the software suggested by the salesman and only 11 dropped out of the course.
f) Should the professor spend the money for this software? Support your recommendation with an appropriate test.
g) Explain what your P-value means

Extra Practice / Homework

6) New road signs are made with the intention of improving visibility to drivers. Highway safety engineers set up a test course that included both the old and the new signs. Volunteers drove the test course and rated the old and new signs in terms of visibility.
a) Write the null and alternative hypotheses "in words" using "improved visibility" and "not improved visibility".
b) Using the hypotheses, write what would constitute a TYPE I error in the context of the problem.
c) What would be the consequence of the TYPE I error?
d) What would be a TYPE II error in the context of the problem?
e) What would be the consequence of the TYPE II error?

Chapter 20 – Comparing Groups

Comparing Two Proportions

1) The Sleep in America Poll found that 205 of 293, or 70%, of Gen-Y use the Internet before sleep. 235 of 469, or 50%, of Gen-X use the Internet before sleep (De Veaux et al., 2014).
Two-proportion z-interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE(\hat{p}_1 - \hat{p}_2)$ where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
a) Use the two-proportion z-interval to build a confidence interval for the true difference in the proportions of Gen-Y’ers and Gen-X’ers who use the internet before sleep
b) Now also try using your TI: STAT -> TESTS -> 2-PropZInt
Conditions to check:
c) Interpret this confidence interval in the context of the problem:

2) (Using TI) 84.9% of 12,460 males and 88.1% of 12,678 females indicated that they had high school diplomas. Find the 95% confidence interval for the difference in graduation rates between males and females.
– STAT -> TESTS -> 2-PropZInt
o x1: number of males with a diploma; .849 times 12,460 or approximately 10,579
o n1: 12460
o x2: number of females with a diploma; .881 times 12,678 or approximately 11,170
o n2: 12678
o C-level: .95
– You should get a confidence interval from -4.1% to -2.4%. The percentages are negative because we let group 1 represent the males and their percentage of graduates was lower than the female percentage.
– When we interpret this confidence interval we should discuss the positive differences:
o We are 95% confident that the true difference in the proportion of males and females who graduate high school is between 2.4% and 4.1%, with females having the higher graduation rate.

Hypothesis Testing for the Difference in Proportions

In the prior example we are 95% confident that the difference in the proportions of males and females that graduate is between 2.4 and 4.1 percentage points.
– Is it likely that the proportions of males and females that graduate are actually the same?
– What would have to be true of the difference of the proportions for this to be the case?
– What might our 95% confidence interval look like if the true proportions are really the same?

It turns out we have an easier way of testing whether or not two sample proportions are really from the same population (i.e.
the same)

3) Try a hypothesis test for example 2): 84.9% of 12,460 males and 88.1% of 12,678 females indicated that they had high school diplomas
a) Write your hypotheses:
b) STAT -> TESTS -> 2-PropZTest
– You are testing to see if there is a difference, so choose p1 NOT equal to p2.
c) What conclusion can you make?

4) A survey of 430 randomly selected adults found that 21% of the 222 men and 18% of the 208 women had purchased books online. Is there evidence that men are more likely to make online purchases of books?

5) Would being part of a support group that meets regularly help people who are wearing the nicotine patch actually quit smoking? A county health department tries an experiment using several hundred volunteers who are planning to use the patch. The subjects were randomly divided into two groups.
– People in Group 1 were given the patch and attended a weekly discussion meeting with counselors and others trying to quit. People in Group 2 also used the patch but did not participate in the counseling groups.
– After six months 46 of the 143 smokers in Group 1 and 30 of the 151 smokers in Group 2 had successfully stopped smoking.
Do these results suggest that such support groups could be an effective way to help people stop smoking?

6) 120 subjects, 60 men and 60 women, were given the “water-level task” to test their perceptual and conceptual skills. Subjects were shown a drawing of a glass tilted at a 45 degree angle. They were then asked to draw a line representing the surface of the water. The researchers recorded whether or not the line drawn was within 5 degrees of the true line. These are the results of people who were within the 5 degrees. Are the proportions the same?

          x    n
women    24   60
men      39   60

Comparing Two Means

7) In the morning class, the mean on the first exam was 78 with a standard deviation of 2.3 for the 57 students in the class. In the afternoon class, the mean was 81 with a standard deviation of 5.7 for the 37 students in the class.
Determine the 90% confidence interval for the difference in the means of the two groups.
STAT -> TESTS -> 2-SampTInt
– For Input choose Stats unless you have placed the data in L1 and L2.
– x̄1 is 78 and Sx1 is 2.3 with n1 as 57
– x̄2 is 81 and Sx2 is 5.7 with n2 as 37
– C-level is .9
– Never pool
– Calculate
Conditions to check:
Report and interpret the confidence interval

8) The data below show the sugar content (as a percentage of weight) of several national brands of children’s and adults’ cereals (De Veaux et al., 2014). Create and interpret a 95% confidence interval for the difference in mean sugar content. Be sure to check the necessary assumptions and conditions.
Children’s: 40.3 55 45.7 43.3 50.3 45.9 53.5 43 44.2 44 47.4 44 33.6 55.1 48.8 50.4 37.8 60.3 46.6
Adults’: 20 30.2 2.2 7.5 4.4 22.2 16.6 14.5 21.4 3.3 6.6 7.8 10.6 16.2 14.5 4.1 15.8 4.1 2.4 3.5 8.5 10 1 4.4 1.3 8.1 4.7 18.4
a) Check the assumptions and conditions
b) Report the confidence interval
c) Interpret the confidence interval
d) Does this interval suggest a true difference in mean sugar content?

2-Sample t-Test
Use the example above to conduct a 2-Sample t-Test for the difference between two means.
e) Hypotheses:
f) Run the test using STAT -> TESTS -> 2-SampTTest ***never pool
g) Results:

9) It has been suggested that college students learn more and obtain higher grades in small classes (40 or less) when compared to large classes (150 or more). To test this claim, a university assigned a professor to teach a small and a large class of the same course. At the end of the course, the classes were given the same exam. The following are the final grade results. Test the claim at an alpha of 0.05.
               Sample size   Sample mean   Standard deviation
Small class         35           74.2              14
Large class        170           71.7              13

Hypotheses: H0: μsmall = μlarge    HA: μsmall > μlarge
Test: STAT -> TESTS -> 2-SampTTest
– Select STATS, not data
– Fill in the appropriate statistics
– For μ1: choose >
– Don't pool
– Calculate
a) Sketch the t distribution, indicate your test statistic, and shade the area corresponding with your p-value.
b) What conclusion should be made?

10) A study was conducted to assess the effects that occur when children are exposed to cocaine before birth. 190 children born to cocaine users had a mean score of 7.3 (with a standard deviation of 3.0) on a certain aptitude test. 186 children not exposed to cocaine had a mean score of 8.2 with a standard deviation of 3.0. Use an alpha of 0.05 to test the claim that cocaine use is harmful to children’s aptitude (Triola 2008).

11) Researchers investigated how the size of a bowl affects how much ice cream people tend to scoop when serving themselves. At an “ice cream social”, people were randomly given either a 17 oz or a 34 oz bowl and were invited to scoop as much ice cream as they liked. Did the bowl size change the selected portion size? (De Veaux et al., 2009)
Small bowl: n: 26, y(bar): 5.07oz, s: 1.84oz
Large bowl: n: 22, y(bar): 6.58oz, s: 2.91oz
Test an appropriate hypothesis and state your conclusions (for assumptions and conditions that you cannot test, assume they are sufficiently satisfied to proceed)

Extra Practice / Homework

12) A study was conducted to determine if the percent of women who receive financial aid is different from the percentage of men who receive financial aid. A random sample of students revealed the results shown in the table. Using α=0.01, is there significant evidence to reject the null hypothesis?

                        women   men
sample size              250    300
number receiving aid     200    225

a) Find the 90% confidence interval for the difference in financial aid rates between men and women and interpret this interval in the context of the problem.
b) Write the two hypotheses for the problem.
c) Compute the appropriate statistical hypothesis test and report the test statistic and P-value.
d) Write the conclusion for the problem regarding the null hypothesis using statistical language.

13) Given below are samples of enrollment from medical schools that specialize in research and from those that specialize in primary care. You need to decide if there is a difference in the average enrollment for the two specializations.
Research: 474 577 605 663 783 467 670 414 813 443 565 696 692 694 277 419 884
Primary Care: 783 605 427 728 546 474 371 107 442 587 293 277 662 555 527 320
a) Find the 95% confidence interval for the difference between the means and interpret the interval in the context of the problem.
b) Write the two hypotheses for this problem.
c) Find the test-statistic (i.e. t) and p-value. Sketch the distribution of test-statistics and indicate t and the shaded region corresponding to the p-value.
d) In the context of the problem, make a conclusion and substantiate with statistical arguments.

Chapter 21 – Paired Samples and Blocks

1) Ages -- Are husbands older than wives? This data set contains the ages of the husband and the wife for 20 couples. We want to know what the difference is between the ages of husbands and wives. Since this data is paired (and clearly not independent) we should not use the 2-Sample T Interval or 2-Sample T Test. Instead consider the pairwise differences. Using our calculator, enter the Age of Husband in L1 and the Age of Wife in L2. Move your cursor over L3 and type L1 - L2, then ENTER. L3 will fill with the pairwise differences. We will treat this as our data set.
a) Create a 90% confidence interval for the true mean of the pairwise differences and interpret it in the context of the problem.

Age of Husband: 29 38 47 23 45 48 35 34 56 52 44 30 26 65 25 37 72 27 50 36
Age of Wife: 30 33 45 22 46 33 35 37 57 47 36 31 29 62 25 38 70 28 45 34
Pairwise Differences:

b) Conduct a T-Test using α = 0.05. Our hypotheses now need to be in terms of the mean difference μd:
H0: μd = 0
HA: μd > 0 (we are trying to see if husbands are older than wives)
Conditions to check:
c) What conclusion is appropriate?
d) Considering both a) and b), are our results statistically significant? Practically significant?
e) What type of error might we have made?

Worked Example:
2) A psychologist who is interested in testing the relationship between stress and short-term memory administered a test to 10 subjects prior to their exposure to a stressful situation and then retested them after the stressful situation. From the following data, can we conclude that the stressful situation decreases one’s performance on a test that measures short-term memory? Use an alpha of 0.05.

Subject:      1   2   3   4   5   6   7   8   9   10
Pre stress:   13  15  9   13  15  17  13  16  11  13
Post stress:  10  14  7   15  11  14  13  14  9   14

– This is a matched sample problem: the same group of subjects is measured under two different conditions, so the two samples are dependent. We are interested in the mean difference. We want to look at everyone's difference in memory before the stress and after the stress, and then at the average of all of those differences. Since the question is whether stress decreases short-term memory, we need to discover if pre-stress memory > post-stress memory. So the difference we will take is pre - post.
Hypotheses:
H0: μd = 0
HA: μd > 0 (i.e. pre-stress > post-stress)
– In L1 put the pre stress measures and in L2 put the post stress measures. Go to the very top of L3 (onto the "L3" heading itself) and type 2nd L1 - 2nd L2, then ENTER. You should see data in L3, which represents each subject’s difference.
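The list arithmetic in this worked example (L3 = L1 - L2) can also be sketched outside the calculator. The Python snippet below is a minimal illustration, not part of the original handout: it uses only the standard library and the pre/post stress scores from the table, and its variable names are hypothetical. It builds each subject's difference and then the one-sample t statistic on those differences, t = (mean difference) / (s_d / √n).

```python
import math
import statistics

# Pre- and post-stress memory scores from the worked example (subjects 1-10).
pre = [13, 15, 9, 13, 15, 17, 13, 16, 11, 13]
post = [10, 14, 7, 15, 11, 14, 13, 14, 9, 14]

# Pairwise differences (pre - post), the same list the TI stores in L3.
diffs = [a - b for a, b in zip(pre, post)]

n = len(diffs)
mean_diff = statistics.mean(diffs)   # sample mean of the differences
sd_diff = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)
std_error = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / std_error       # t statistic for H0: mean difference = 0
df = n - 1                           # degrees of freedom

print(diffs)
print(round(t_stat, 3), df)
```

The snippet stops at the t statistic and its degrees of freedom; the p-value still has to come from a t table or from the calculator's T-Test.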
Now do a T-Test on L3 using Data: Stat -> Tests -> T-Test (same as we did for single samples).
μ0: 0
List: L3
Freq: 1
μ: choose > μ0
Calculate

Results: You find a test statistic of t = 2.333 and a p-value of about 0.02. Since the p-value is less than α = 0.05, the null hypothesis should be rejected. Therefore, the evidence suggests that stressful situations can cause a decrease in short-term memory.

3) To test the effectiveness of a drug to relieve asthma, a group of subjects was randomly given a drug and a placebo on two different occasions. After 1 hour an asthmatic relief index was obtained for each subject, with these results (low numbers are good!). Is the drug more effective?

Subject   Drug   Placebo
1         28     32
2         31     33
3         17     19
4         22     26
5         12     17
6         32     30
7         24     26
8         18     19
9         25     25

4) A composition teacher wishes to see whether a new grammar program will reduce the number of grammatical errors her students make when writing a two-page essay. The data are shown below. Can it be concluded that the number of errors has been reduced using this new program?

Student         1    2    3    4    5    6
Errors before   12   9    0    5    4    3
Errors after    9    6    1    3    2    3

5) Use the following data (representing hospital admissions from motor vehicle crashes) and an alpha of 0.05 to test the claim that Friday the 13ths are unlucky (Triola 2008):

Friday the 6th (immediately preceding the 13th):   9    6    11   11   3    5
Friday the 13th:                                   13   12   14   10   4    12

Extra Practice / Homework

6) Eight students in a statistics class were asked to report the number of hours they slept on weeknights and on weekends. The table shows the results.

Student #         1    2     3      4    5    6    7    8
weeknight hours   8    5.5   7.5    8    7    6    6    8
weekend hours     4    7     10.5   12   11   9    6    9

a) How can you tell that this is a problem involving dependent samples?
b) Create a 90% confidence interval for the mean difference and interpret the interval in the context of the problem.
c) Write the two hypotheses for the problem.
d) Perform a statistical test on the difference in the number of hours slept on weeknights vs.
weekend nights and report your test statistic and p-value. Sketch the distribution of test statistics and indicate your test statistic as well as the shaded region corresponding to p.
e) Make a conclusion based on your test.

Unit 3 Chapter 22 – Comparing Counts

χ² Goodness of Fit Test
Hypotheses:
– H0: The distribution of counts occurs in a manner consistent with our model.
  o p1 = p2 = ... = pn, or p1 = #, p2 = #, ..., pn = #
– HA: The distribution of counts occurs in a manner which is inconsistent with our model.
  o HA: not all pi's are as stated
  o Note: the distribution of counts can vary from our model in many different ways!

1) Is there a birth-month advantage to becoming a major league baseball player? To answer this question we need to see how the distribution of birth month for major league baseball players compares to the overall distribution of birth month for the national population (that is our model).
a) We start by finding the expected counts (assuming the null hypothesis is true, how many players in our sample would we expect to be born each month?). Each expected count is n*p: (the total number of players)*(the national birth % from our model).
b) Make a table that shows the Observed and Expected Counts for each month.
c) We compute the χ² test statistic as follows:
$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$
The χ² statistic follows the χ² distribution with n – 1 degrees of freedom (n = # of categories).
Conditions to check:
d) To find the p-value you can use a χ² distribution calculator such as χ²cdf on your TI.

2) A die is filled with a lead weight and then rolled 200 times with the following results:
1: 27   2: 31   3: 42   4: 40   5: 28   6: 32
Use an alpha of 0.05 to test the claim that the outcomes are not equally likely (Triola 2008).

3) You are planning to open an old-time soda fountain and your partner claims that the public will not prefer any flavor over another. The flavors you serve are cherry, strawberry, orange, lime and grape.
After several customers, you stop and take a look at how sales are going. The following numbers of people ordered each flavor: Cherry 35, Strawberry 32, Orange 29, Lime 26 and Grape 25. Test to see if there was a preference.

χ² Test of Homogeneity

4) Are students’ post-graduation plans the same at different colleges within the same university? (De Veaux et al. 2014)
Hypotheses:
– H0: The distribution of ________ is the same for __________
  o e.g. Students’ post-graduation activities are distributed in the same way for all four colleges.
– HA: The distribution of ________ is not the same for __________
  o e.g. Students’ plans do not have the same distribution.

$\text{Expected count} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$, and the distribution has (r – 1)(c – 1) degrees of freedom.

Using the TI:
– Enter the contingency table into the TI using the Matrix function.
– Matrx…Edit…enter to get into the matrix [A].
  o A matrix is described by its rows first and then its columns. Enter these numbers accordingly, e.g. MATRIX[A] 3x4.
  o Next fill in the matrix as it appears. Remember to hit enter after each data entry.
  o Then 2nd Quit.
– Stat -> Tests -> C: χ²-Test.
  o We placed the observed data in [A] and the TI will place the expected values in [B] or anywhere you stipulate.
  o Then ask the TI to calculate.

5) The following chart shows the post-high school choices for three graduating classes. Determine if the post-high school choices have changed over the three decades.

        College   Employment   Military   Travel   Total
1980    320       98           18         17       453
1990    245       24           19         2        290
2000    288       17           5          5        315
Total   853       139          42         24       1058

Although example 5 was worded as a test of homogeneity, we could have written it as a test of independence by asking if post-high school choices were independent of decade of graduation. *The test mechanics are the same!
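The expected-count rule and the χ² statistic used for both the homogeneity and independence tests can be sketched in a few lines of Python. This is an illustrative sketch rather than the handout's method (the handout uses the TI's χ²-Test); the function name `chi_square_table` and the small 2x2 table of counts are hypothetical, made up only to show the arithmetic.

```python
def chi_square_table(observed):
    """Return (expected counts, chi-square statistic, degrees of freedom)
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for each cell = (row total)(column total) / (grand total).
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Chi-square statistic = sum over all cells of (Obs - Exp)^2 / Exp.
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # Degrees of freedom = (r - 1)(c - 1).
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Hypothetical 2x2 table of counts, for illustration only.
table = [[10, 20], [30, 40]]
expected, chi2, df = chi_square_table(table)
print(expected, round(chi2, 4), df)
```

The p-value is then the area to the right of the statistic under the χ² distribution with `df` degrees of freedom, which the handout obtains from the calculator.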
χ² Test of Independence
Hypotheses:
H0: _______ and _______ are independent
HA: _______ and _______ are not independent
$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$ has (r – 1)(c – 1) degrees of freedom.

6) Recall this contingency table showing who survived the sinking of the Titanic based on whether they were crew members or passengers booked in first-, second- or third-class staterooms. Determine if surviving was independent of status.

        Crew   First   Second   Third   Total
Alive   212    202     118      178     710
Dead    673    123     167      528     1491
Total   885    325     285      706     2201

7) A sociologist wishes to see whether the number of years of college a person has completed is related to his or her place of residence. A sample of 84 is selected and classified as shown.

Location   No College   B.S. Degree   Master’s Degree   PhD
Urban      15           12            16                5
Suburban   8            15            11                9
Rural      6            7             12                7

At α = 0.05, can the sociologist conclude that the years of college education are independent of the residence location?

8) Use the following data to do a test of independence to see if left-handedness is independent of gender (use alpha = 0.05):

         Left-Handed   Right-Handed
Male     17            83
Female   16            184

Extra Practice / Homework

9) The makers of M&Ms claim that the colors of M&Ms are distributed as follows: 20% yellow, 20% red, 10% orange, 10% blue, 10% green and the rest are brown. You buy a bag and get 29 yellow, 23 red, 12 orange, 14 blue, 8 green, and 20 brown.
a) If the M&Ms are packaged in the stated proportions, how many of each color should you expect to get? Create a table of observed and expected counts.
b) To see if your bag was unusual, should you test goodness-of-fit, homogeneity, or independence?
c) State the hypotheses.
d) How many degrees of freedom are there?
e) Find the chi-square statistic and the p-value.
f) Draw a sketch of this distribution and label your chi-square statistic and p-value.
g) State a conclusion.

10) The table below shows the rank attained by male and female officers in the New York City Police Department (NYPD).
Do these data indicate that men and women are equitably represented at all levels of the department? (De Veaux et al., 2009)

Rank           Male     Female
Officer        21,900   4,281
Detective      4,058    806
Sergeant       3,898    415
Lieutenant     1,333    89
Captain        359      12
Higher Ranks   218      10

a) State what type of test is appropriate and give your hypotheses.
b) Find the chi-square statistic, the p-value, and report the degrees of freedom.
c) State and justify your conclusions. Are gender and rank independent or is there an association?

Confidence Interval & Hypothesis Testing Calculator Functions

Procedure | Calculator Function | Hypotheses
1-Proportion Confidence Interval | A: 1-PropZInt |
1-Proportion Hypothesis Test | 5: 1-PropZTest | H0: p = ___ (% as a decimal); HA: p ≠, <, > ___
1-Mean Confidence Interval | 8: TInterval |
1-Mean Hypothesis Test | 2: TTest | H0: µ = ___; HA: µ ≠, <, > ___
2-Proportion Confidence Interval | B: 2-PropZInt |
2-Proportion Hypothesis Test | 6: 2-PropZTest | H0: p1 = p2 (be sure to indicate which group is group 1); HA: p1 ≠, <, > p2
2-Mean Confidence Interval (Independent Samples) | 0: 2-SampTInt |
2-Mean Hypothesis Test (Independent Samples) | 4: 2-SampTTest | H0: µ1 = µ2 (be sure to indicate which group is group 1); HA: µ1 ≠, <, > µ2
2-Mean Confidence Interval (Paired/Dependent Samples) | 8: TInterval (on Pairwise Δs) |
2-Mean Hypothesis Test (Paired/Dependent Samples) | 2: TTest (on Pairwise Δs) | H0: µd = 0 (note: µd = µ1 - µ2); HA: µd ≠, <, > 0
χ2 Goodness-of-Fit Test | D: X2GOF-Test (new OS only) | H0: The distribution of ______ occurs in a manner consistent with the reported percentages; HA: The distribution of ______ occurs in a manner inconsistent with the reported percentages
χ2 Test of Homogeneity | C: X2-Test | H0: _____ are distributed in the same way for all ____; HA: _____ are not distributed in the same way for all ____
χ2 Test of Independence | C: X2-Test | H0: The variables _____ and _____ are independent; HA: The variables _____ and _____ are not independent

1: One Proportion (or mean) Confidence
Interval: We are ___% confident that the true population proportion (or mean) of _________________ is between _____ and ______.
2: Two Proportion (or mean) Confidence Interval: We are ___% confident that the true difference in population proportions (or means) of ____________ and _____________ is between _____ and _____, with ________________ having a larger proportion (or mean) (if applicable).
3: Hypothesis Test (p-value < α): Since my p-value of ____ is less than alpha, I reject the null hypothesis. The statistical evidence seems to indicate that _______________ is not equal to _______________ (or is greater than ___________ or less than __________).
4: Hypothesis Test (p-value ≥ α): Since my p-value of ____ is greater than alpha, I fail to reject the null hypothesis. There is not enough statistical evidence to conclude that _______________ is not equal to _______________ (or is greater than ___________ or less than __________).

Sampling Distributions
We know μ and σ, or p (population parameters). This is unusual! Usually we only know about our sample. We want to know the probability of seeing our sample data, x̄ or p̂.
Proportions: use $N\left(p, \sqrt{\frac{pq}{n}}\right)$
Means: use $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Solve using the appropriate Normal curve.
[Diagram: population (μ, p) -> sample (x̄, p̂)]

Hypothesis Testing/Confidence Intervals
Here we know about our sample (e.g. x̄ or p̂) and want to make inferences about the population (μ, p).
[Diagram: sample (x̄, p̂) -> population (μ, p)]

Is the data proportions?
– One group: 1 Prop Z Test, 1 Prop Z Interval
– Two groups: 2 Prop Z Test, 2 Prop Z Interval
Is the data quantitative/means?
– One group: T Test, T Interval
– Independent Samples: 2 Sample T Test, 2 Sample T Interval
– Paired Samples: T Test on Pairwise Differences, T Interval on Pairwise Differences
Is the data Counts of Categorical Variables?
– One variable and a hypothesized model: Goodness of Fit
– Otherwise: Test of Homogeneity ** and Test of Independence **
** These are the same test, but the hypotheses are written differently – pick the one to use based on the question being asked.
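The two sampling-distribution models in the summary above, N(p, √(pq/n)) for a sample proportion and N(μ, σ/√n) for a sample mean, can be sketched in Python as follows. This is only an illustrative sketch: the function names and the numbers plugged in are hypothetical, and it assumes the conditions for using a Normal model have already been checked.

```python
import math
from statistics import NormalDist


def proportion_model(p, n):
    """Normal model for a sample proportion: N(p, sqrt(p*q/n))."""
    q = 1 - p
    return NormalDist(mu=p, sigma=math.sqrt(p * q / n))


def mean_model(mu, sigma, n):
    """Normal model for a sample mean: N(mu, sigma/sqrt(n))."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))


# Hypothetical inputs, chosen only to show the arithmetic.
phat_model = proportion_model(p=0.5, n=100)      # standard deviation 0.05
prob_phat = 1 - phat_model.cdf(0.6)              # P(p-hat > 0.6)

xbar_model = mean_model(mu=50, sigma=10, n=25)   # standard deviation 2
prob_xbar = xbar_model.cdf(46)                   # P(x-bar < 46)

print(round(prob_phat, 4), round(prob_xbar, 4))
```

Both hypothetical cutoffs sit two standard deviations from the center of their model, so the two tail probabilities agree; on the TI the same areas would come from normalcdf.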
Final Review

Unit 3 Review
Instructions: These review questions are in no particular order. Show your work, including calculations or calculator inputs. For each question requiring a hypothesis test, be sure to write your hypotheses using correct notation.

1) A company that produces snack foods uses a machine to distribute 454 g bags of pretzels. A simple random sample of 25 bags has the following weights:
465 456 438 454 447 449 442 449 446 447 468 433 454 463 450 446 447 456 452 444 447 456 456 435 450
a) Do the data provide sufficient evidence to conclude that the packaging machine is working properly?
b) What assumptions and conditions did you need to check for your test?

2) The US Substance Abuse and Mental Health Services Administration conducts surveys on drug use by type and age group. According to the National Household Survey on Drug Abuse, 13.5% of 18-25 year olds were current users of marijuana or hashish in 2000. A recent poll of 1283 randomly selected 18-25 year olds revealed that 205 currently use marijuana or hashish. At the 10% significance level (alpha of 0.10), do the data provide sufficient evidence to conclude that the percentage of 18-25 year olds who currently use marijuana or hashish has changed from the 2000 percentage?

3) The Acme company has developed a new battery. The engineer in charge claims that the new battery will operate longer than the old battery. To test the claim, the company selects a simple random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for an average of 190 minutes with a standard deviation of 10 minutes. The new batteries run for an average of 200 minutes with a standard deviation of 40 minutes. Test the engineer’s claim that the new batteries run longer than the old. Use a 0.05 level of significance.

4) Democrats and Republicans were surveyed for their opinions on gun control, and the results are shown in the table below.
         Democrat   Republican   Total
favor    440        320          760
oppose   400        480          880
unsure   120        100          220
total    960        900          1860

a) Based on this sample, does a person’s opinion on gun control depend on party affiliation, at the 0.05 level of significance?
b) What assumptions and conditions did you need to check for your test?

5) Suppose the Acme Drug Company develops a new drug designed to prevent colds. The company states that the drug is equally effective for men and women. To test this claim, they chose a random sample of 100 women and 200 men from a population of 100,000 volunteers and gave them the new drug. At the end of the study, 38% of the women caught a cold and 51% of the men caught a cold.
a) Based on the findings, can we reject the company’s claim that the drug is equally effective for men and women? Use a 0.05 level of significance.
b) What assumptions and conditions did you need to check for your test?

6) A poll found that 38% of a random sample of 1012 American adults said that they believe in ghosts.
a) Find the margin of error for this poll if we want 90% confidence in our estimate of the proportion of American adults who believe in ghosts.
b) Explain what a “90% confidence interval” means and find the interval.
c) What sample size would be needed to reduce our margin of error to within 1%, while maintaining 90% confidence?

7) Within a school district, students were randomly assigned to one of two math teachers, Mrs. Smith and Mrs. Jones. After the assignment, Mrs. Smith had 30 students and Mrs. Jones had 25 students. At the end of the year, each class took the same standardized test. Mrs. Smith’s students had an average test score of 78, with a standard deviation of 10; and Mrs. Jones’ students had an average test score of 85 with a standard deviation of 15.
a) Test the hypothesis that Mrs. Smith and Mrs. Jones are not equally effective teachers. Use a 0.05 level of significance. Assume that student performance is approximately normal.
b) Sketch the distribution and shade the area corresponding to the p-value. Label the test statistic.
c) Find the 95% confidence interval of the difference in average student exam scores between the two teachers.
d) Is this confidence interval consistent with your decision to reject or fail to reject the null hypothesis in part a? Why or why not?

8) The Centers for Disease Control states that in 2013, 17.8% of American adults were smokers. What is the probability that in a random group of 100 college students sharing a dorm, more than 20% of them are smokers?

9) You think that people’s ice cream favorites are reflected by the following model: 25% vanilla, 25% chocolate, 20% strawberry, 15% butter pecan, 8% rocky road, 7% other or no preference. You survey 1000 people and find the following preferences: 220 vanilla, 255 chocolate, 190 strawberry, 170 butter pecan, 95 rocky road, 70 other/no preference. Using alpha = 0.05, was your idea right or wrong?

10) To see if running raises HDL (“good”) cholesterol levels, five female volunteers had their HDL level measured before they started running and after each had run regularly an average of 4 miles daily for 6 months. See if you can support the claim that the average person’s HDL cholesterol level would be raised after all that running.

Volunteer         1    2    3    4    5
Before running:   30   34   36   34   40
After running:    35   39   42   33   48

11) For several years, evidence has been mounting that folic acid reduces major birth defects. Results were published in the New England Journal of Medicine. For the study, doctors enrolled women prior to conception and divided them randomly into two groups. One group, consisting of 2701 women, took daily multivitamins containing 0.8 mg of folic acid; the other group, consisting of 2052 women, received only trace elements. Major birth defects occurred in 35 cases when the women took folic acid and in 47 cases when the women did not.
a) At the 1% (0.01) significance level, do the data provide sufficient evidence to conclude that women who take folic acid are at lesser risk of having children with major birth defects?
b) Using your hypotheses above, describe a Type I error in the context of the problem. What are the consequences of this type of error?
c) Using your hypotheses above, describe a Type II error in the context of the problem. What are the consequences of this type of error?
d) Which type of error do you think would have worse consequences?
e) If there is a difference in birth defect rate, find the 98% confidence interval for the difference in proportion of birth defects.

12) A college’s data about the incoming freshmen indicates that the mean of their high school GPAs is 3.4 with a standard deviation of 0.35. The distribution is normal. The students are randomly assigned to freshmen writing seminars in groups of 25.
a) Find the probability a student has a GPA greater than 3.5.
b) Find the probability that one of the groups has an average GPA greater than 3.5.

13) A researcher wanted to see whether the English like soccer more than the Scots. She asked eight English and eight Scots to rate their liking for soccer on a numerical scale of 1 (hate) to 10 (love), and recorded these responses:
English: Scots: 6.4 5.1 5.9 4.0 2.9 7.2 8.2 6.9 7.0 4.4 7.1 1.3 5.5 2.2 9.3 7.7
From the above data, can the researcher conclude that the English have a stronger liking for soccer than the Scots? Use alpha = 0.05.

14) Soccer again! A researcher wanted to see whether the English and Scots are equally likely to be soccer fans. She polled random samples with the question: “Do you watch football at least once a week?” (In the UK they call soccer football.) She got these results:

          Sample size   Number of “yes”
English   150           104
Scots     200           160

a) At the 0.05 significance level, are the English and Scots equally likely to be soccer fans?
15) This year’s water pollution readings at State Park Beach seem to be lower than last year’s. A sample of 10 readings was randomly selected from this year’s daily readings:
3.5 3.9 2.8 3.1 3.1 3.4 3.2 2.5 3.5 3.1
a) Does this sample provide sufficient evidence (at the 0.05 level) to conclude that the mean of this year’s pollution readings is significantly lower than last year’s mean of 3.8?
b) Interpret the meaning of your p-value in the context of this problem.

16) 425 school children were surveyed about what they want to be when they grow up, out of a choice of five professions. The results were: Teacher 80, Doctor 105, Lawyer 70, Police Officer 70, Firefighter 100. Obviously these particular children preferred some occupations over others. You want to test whether their preferences reflect a real difference in the population.
a) Indicate the test you should do and write your hypotheses.
b) What counts would you expect for each profession if the null hypothesis were true? You can show this in a table.
c) Run the test and report your test statistic and p-value.
d) Sketch the distribution and shade the area corresponding to the p-value. Label the test statistic.
e) State your conclusion using appropriate notation and vocabulary.

Math 138 Final Exam Review
Although there are many problems on this review, it does not fully cover all the material in MATH 138. For additional review problems, review your homework problems, projects, exams, and other classroom handouts.

Unit 1
1) Classify the variables as categorical or quantitative:
a) name
b) employee ID number
c) annual income

2) A survey of patients at a hospital classified the patients by gender and blood type, as seen in the table.

Blood type   Male   Female
A            105    93
B            98     84
O            160    145
AB           15     18

a) What percentage of the patients with type-B blood are male?
b) What percentage of the female patients have type-O blood?
c) What percentage of the patients are male and have type-A blood?
d) What percentage of the patients are female or have type-O blood?
e) Give the conditional distribution of blood type for the males.
f) Give the conditional distribution of blood type for the females.
g) Are blood type and gender independent?

3) The numbers of days off that 30 police detectives took in a given year are provided below. Create a histogram of the data. (You can do this by hand or using technology as long as the bins are reasonable and you can reproduce an accurate sketch.)
10 5 5 1 1 4 3 0 1 5 9 7 4 11 7 7 1 11 0 5 1 6 7 5 6 10 6 1 1 0

4) Describe what these boxplots tell you about the relationship between fuel efficiency and the number of cylinders an engine has.

5) The stem-and-leaf diagram shows the ages of males playing basketball at a public gym over the course of a day. Describe the shape, center, spread, and unusual features of the distribution.
4 8 9 4 0 1 2 3 3 6 6 8 8 9 3 0 0 0 1 4 4 2 6 7 9 9 9 2 1 5 5 5 5 6 6 6 6 6 6 7 7 7 1 2 3 4 4 4 4 0 0

6) The volumes of soda in quart soda bottles can be described by a Normal model with a mean of 32.3 oz and a standard deviation of 1.2 oz.
a) What percentage of bottles can we expect to have a volume less than 32 oz?
b) 5% of bottles have a volume smaller than what amount?

7) The ages of the 21 members of a track and field team are listed below.
15 24 28 18 24 28 18 25 30 19 25 32 22 26 33 23 26 40 24 27 42
a) Report the 5-Number Summary for the data.
b) Create a boxplot for the data. Use fences to identify potential outliers, if there are any.
c) Find the mean and the standard deviation for the data.
d) What is the z-score for the age of the team member who is 40 years old?
e) Interpret the meaning of this z-score.

8) The distances traveled to work, in miles, by the employees at a large company are normally distributed with a mean of 35 miles and a standard deviation of 10 miles.
a) What is the z-score of an employee who travels 40 miles?
b) What percent of employees travel more than 45 miles?
c) Find the 90th percentile of the travel distances.
d) In what interval do the middle 70% of travel distances fall?

9) Match the given correlation coefficients with the scatter plots.

10) The attendance at Camden Yards during the 12 years from 2000 to 2011 is given as follows: (Source: http://www.ballparksofbaseball.com/attendance.htm)

Year   Attendance
2000   3,296,031
2001   3,094,841
2002   2,682,439
2003   2,454,523
2004   2,744,013
2005   2,624,804
2006   2,153,150
2007   2,164,822
2008   1,950,075
2009   1,907,163
2010   1,733,018
2011   1,755,461

a) Find the equation of the regression line.
b) Interpret the slope in context.
c) Find and interpret the residual for 2008.
d) Overall, is this a good linear fit? Explain.
e) Is it feasible to use this model to predict attendance in 2014? Why or why not?
f) What is the predicted attendance for 2014? (The actual attendance was 2,102,240.)

11) The March 2000 Consumer Reports compared various brands of supermarket enchiladas in cost and sodium content. Use the scatterplot and regression analysis to answer the questions.

[Fitted Line Plot: Sodium content (mg) vs. Cost (per serving). Sodium content (mg) = 2185 - 607.0 Cost (per serving); S = 250.702, R-Sq = 77.3%, R-Sq(adj) = 74.0%]

a) Use the scatterplot above to describe the relationship between Cost and Sodium Content.
b) What is the correlation coefficient for the relationship between cost and sodium content?
c) How much sodium would you expect if the cost is $2.90?

Unit 2
12) A real estate company kept a database on the apartments in a certain city. The percentages of various types of apartments are listed below.

Number of bedrooms   Percent
0 (Studio)           15.9
1                    25.5
2                    45.8
3                    10.1

a) Are the events represented in this table disjoint?
b) What is the probability that a randomly selected apartment in this city is a 1-bedroom or 2-bedroom apartment?
c) What is the expected value for the number of bedrooms that an apartment will have?

13) Explain the difference between an experiment and an observational study.

14) Of the coffee makers sold in an appliance store, 6.0% have either a faulty switch or a defective cord, 2.0% have a faulty switch, and 0.8% have both defects. What percent of the coffee makers will have a defective cord?

15) Suppose a computer chip manufacturer rejects 15% of the chips produced because they fail presale testing.
a) What is the probability that the first chip fails and the second chip passes presale testing?
b) If you test 4 chips, what is the probability that not all of the chips fail?

16) Suppose that on any given day, there is a 65% chance of it being sunny, a 10% chance of rain, and a 25% chance of being cloudy.
a) Clearly explain how you would use the random numbers from 1 to 100 to conduct a simulation to model this situation.
b) If you were asked to find the probability that it will be sunny at least two days in a week, clearly state what the response variable would be.
c) For each trial below, fill in the resulting outcome.

Trial #   Random Numbers          Number of Sunny Days   Response Variable
1         65 91 5 72 41 69 48
2         90 82 93 74 63 67 71
3         34 35 56 83 42 4 95
4         91 42 18 37 81 85 67
5         80 57 63 38 76 35 12
6         63 84 65 36 82 89 68

d) Using your simulation, estimate the probability that it will be sunny at least two days in a week.
e) Using your simulation, how many sunny days would you estimate per week?

17) Assume that 11% of people are left-handed. If we select 10 people at random, find the probability that
a) Exactly 3 are left-handed.
b) At most 2 are left-handed.
c) At least 2 are left-handed.
d) Between 1 and 4 people are left-handed (inclusive).
e) Explain how you know that the Binomial model applies to this situation.
f) What are the mean and the standard deviation of the number of left-handed people in a group of 10?
18) Suppose that in a given suburb 60% of the houses have garages, 40% have decks, and 30% of the houses have both.
a) What is the probability that a randomly selected house has a garage or a deck?
b) What is the probability that a randomly selected house will have neither a garage nor a deck?
c) What is the probability that a randomly selected house will have a deck but not a garage?
d) What is the probability that a randomly selected house will have a deck if it is known that the house has a garage?
e) Are the events of a house having a garage and a house having a deck independent? Justify your answer with a probability test.

19) In a certain college, 33% of the physics majors belong to ethnic minorities. If 10 students are selected at random from the physics majors, what is the probability that no more than 6 belong to an ethnic minority?

20) Suppose you buy 1 ticket for $1 out of a lottery of 100 tickets where the prize for the one winning ticket is $50. What is your expected value?

21) A tax auditor has a pile of 191 tax returns, of which he would like to select 17 for a special audit. Describe a method for selecting the sample which involves
a) systematic sampling
b) stratified sampling
c) simple random sampling

22) At a college there are 120 freshmen, 90 sophomores, 110 juniors, and 80 seniors. A school administrator selects a random sample of 12 of the freshmen, 9 of the sophomores, 11 of the juniors and 8 of the seniors. She then interviews all the students selected. Identify the type of sampling used in this example.

23) A car insurance company is interested in the association between age and the frequency of car accidents. They obtained the following sample data.

Number of accidents   Age Group
in past 3 years       Under 25   25-45   Over 45   total
0                     74         89      82        245
1                     18         8       12        38
More than 1           8          3       6         17
total                 100        100     100       300

a) What is the probability that a randomly selected participant was under age 25?
b) What is the probability that a randomly selected participant was under age 25 or in more than 1 car accident? c) What is the probability that a randomly selected participant was over age 45 and in 0 car accidents? d) What is the probability that a randomly selected participant was over age 45 if it is known that they were in 0 car accidents? e) Do the events of being over 45 and having had 0 accidents in the past 3 years appear to be independent? Justify your answer using probabilities. 24) Suppose that in a given court system 20% of defendants are truly innocent and 80% of defendants are truly guilty. Defendants are given the choice to plead innocent or plead guilty. Suppose that defendants who are innocent plead innocent 99% of the time and that defendants who are guilty plead innocent 70% of the time. a) Find the probability that a defendant is guilty and pleads guilty b) Find the probability that a defendant is innocent and pleads innocent c) Find the probability that a defendant pleads innocent d) Find the probability that a defendant who pleads innocent is actually innocent Final Review Unit 3 25) Assume that 25% of students at a university wear contact lenses. We randomly select 200 students. a) What is the mean and standard deviation of the proportion of students in this group who may wear contact lenses? b) What is the probability that we observe a sample proportion (𝑝̂ ) of 30% or more students who wear contact lenses? 26) The number of hours per week that high school seniors spend on computers is normally distributed, with a mean of 4 hours and a standard deviation of 2 hours. 60 students are chosen at random. Let y be the mean number of hours spent on the computer for this group. Find the probability that y is between 4.2 and 4.4 hours. 27) A researcher wishes to estimate the proportion of fish in a certain lake that is inedible due to pollution of the lake. 
How large a sample should be tested in order to be 99% confident that the true proportion of inedible fish is estimated to within 6%?

28) A mayoral election race is tightly contested. In a random sample of 2200 likely voters, 1144 said that they were planning to vote for the current mayor. Based on a 95% confidence interval, would you claim that the mayor will win a majority of the votes? Explain.

29) 7 of 8,500 people vaccinated against a certain disease later developed the disease. 18 of 10,000 people vaccinated with a placebo later developed the disease. Test the claim that the vaccine is effective in lowering the incidence of the disease. Use a significance level of 0.02.

30) Suppose the proportion of sophomores at a particular college who purchased used textbooks in the past year is p_s and the proportion of freshmen at the college who purchased used textbooks in the past year is p_f. A study found that a 95% confidence interval for p_s - p_f is (0.235, 0.427). Does this interval suggest that sophomores are more likely than freshmen to buy used textbooks? Explain.

31) A skeptical paranormal researcher claims that the proportion of Americans who have seen a UFO, p, is less than 4%. He surveys 500 randomly selected Americans and finds that 11 of them claim to have seen a UFO.
a) What type of test would be appropriate?
b) Write the hypotheses.
c) Check the assumptions and conditions.
d) Conduct the hypothesis test. Report the test statistic and P-value, sketch the curve, and make an appropriate conclusion.
e) What type of error might have occurred?

32) A police officer pulls over an individual who was driving recklessly. The police officer is trying to determine if the individual should be arrested for driving under the influence of alcohol. Given the null and alternative hypotheses below, identify the type of error:
H0: The individual is not driving under the influence of alcohol.
Ha: The individual is driving under the influence of alcohol.
a) The police officer determines that the individual should be arrested and is driving under the influence of alcohol when the individual is not driving under the influence of alcohol.
b) The police officer determines that the individual should not be arrested and is not driving under the influence of alcohol when the individual is under the influence of alcohol.

33) In the past, the mean running time for a certain type of flashlight battery has been 8.5 hours. The manufacturer has introduced a change in the production method and wants to perform a hypothesis test to determine whether the mean running time has increased as a result. He samples 30 newly produced batteries and finds a sample mean of 9.2 hours with a standard deviation of 0.6 hours.
a) What type of test would be appropriate?
b) Write the hypotheses.
c) Conduct the test (report the test statistic and P-value, sketch the curve, and write an appropriate conclusion). You can assume that all assumptions and conditions have been met.

34) Using the data below and a 0.05 significance level, test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively.

Response     A    B    C    D    E
Frequency   12   15   16   18   19

35) A coach uses a new technique to train gymnasts. 7 gymnasts were randomly selected and their competition scores were recorded before and after the training. The results are shown below.

Subject    A    B    C    D    E    F    G
Before    9.4  9.5  9.6  9.6  9.4  9.6  9.6
After     9.5  9.7  9.6  9.5  9.5  9.9  9.4

Do the data suggest that the training technique is effective in raising the gymnasts' scores? Perform a hypothesis test at the 5% significance level.

36) A laboratory tested twelve chicken eggs and found that the mean amount of cholesterol was 240 milligrams with s = 19.8 milligrams.
a) Construct a 95% confidence interval for the true mean cholesterol content of all such eggs.
b) Interpret this confidence interval.
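
A hand computation for Question 36 a) can be checked with a short Python sketch like the one below. It assumes the usual conditions for a t-interval hold for the twelve eggs, and it takes the critical value t* = 2.201 (11 degrees of freedom, 95% confidence) from a standard t-table rather than computing it. The function name `t_interval` is illustrative and is not part of the handout.

```python
import math

def t_interval(mean, s, n, t_star):
    """Return (lower, upper) for mean +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin

# Sample values from Question 36.
# t_star is a t-table value for df = n - 1 = 11 (an assumed input, not computed here).
lower, upper = t_interval(mean=240, s=19.8, n=12, t_star=2.201)
print(f"95% CI: ({lower:.1f}, {upper:.1f}) milligrams")
```

The interpretation asked for in part b) still has to be written in context; the code only produces the endpoints.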

37) Suppose you have obtained a confidence interval for μ, but wish to obtain a greater degree of precision. Which of the following would result in a narrower confidence interval?
a) Increasing the sample size while keeping the confidence level fixed
b) Decreasing the sample size while keeping the confidence level fixed
c) Increasing the confidence level while keeping the sample size fixed
d) Decreasing the confidence level while keeping the sample size fixed

38) A car insurance company performed a study to determine whether an association exists between age and the frequency of car accidents. They obtained the following sample data. Perform a test to see if there is an association between age and frequency of car accidents. Use α = 0.05.

Number of accidents             Age Group
in past 3 years        Under 25    25-45    Over 45    Total
0                         74         89        82       245
1                         18          8        12        38
More than 1                8          3         6        17
Total                    100        100       100       300
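
The arithmetic behind a chi-square test such as the one in Question 38 can be sketched in plain Python as follows. This is a minimal illustration, not the handout's prescribed method: the function names are hypothetical, and the P-value helper uses a closed form that is valid only for 4 degrees of freedom, which is what a 3-by-3 table gives.

```python
import math

# Observed counts from the Question 38 table.
# Rows: 0, 1, more than 1 accidents; columns: under 25, 25-45, over 45.
observed = [
    [74, 89, 82],
    [18, 8, 12],
    [8, 3, 6],
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def upper_tail_df4(stat):
    """Upper-tail area of the chi-square distribution with exactly 4 df."""
    return math.exp(-stat / 2) * (1 + stat / 2)

stat, df = chi_square_independence(observed)
assert df == 4  # the closed form above does not apply to other df
p_value = upper_tail_df4(stat)
print(f"chi-square = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
```

Comparing the printed P-value with the 0.05 significance level gives the decision; checking the expected-count condition and stating the conclusion in context remain part of the written answer.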