Probability and Statistics Teacher’s Edition - Assessment

CK-12 Foundation
March 15, 2010

CK-12 Foundation is a non-profit organization with a mission to reduce the cost of textbook materials for the K-12 market both in the U.S. and worldwide. Using an open-content, web-based collaborative model termed the “FlexBook,” CK-12 intends to pioneer the generation and distribution of high-quality educational content that will serve as core text and provide an adaptive environment for learning.

Copyright ©2009 CK-12 Foundation

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

www.ck12.org

Contents

1 Probability and Statistics TE - Assessment
1.1 An Introduction to Analyzing Statistical Data
1.2 Visualization of Data
1.3 Introduction to Probability
1.4 Discrete Probability Distributions
1.5 Normal Distribution
1.6 Planning and Conducting an Experiment or Study
1.7 Sampling Distributions and Estimations
1.8 Hypothesis Testing
1.9 Regressions and Correlation Quizzes
1.10 Chi-Square
1.11 Analysis of Variance and the F-Distribution

Chapter 1
Probability and Statistics TE - Assessment

Proposed Pacing Guide for AP Probability and Statistics

Understanding and Describing Data: Chapters 1 and 2 - 2-3 weeks
Scatterplots, correlation, association, linear regression: Chapter 9 - 4 weeks
Probability: Chapters 3, 4, 5 - 4 weeks
Gathering Data and Experimental Design: Chapter 6 - 2-3 weeks
Sampling Distributions: Chapter 7 - 5 weeks
Hypothesis Testing & Chi-Square: Chapters 8, 10 - 4 weeks

All of the topics above are included in the AP syllabus and appear on the AP Exam. ANOVA and Nonparametrics are not in the AP syllabus, so I would recommend that these topics be studied after the exam in the spring.

ANOVA: Chapter 11 - 2 weeks
NonParametrics: Chapter 12 - 2 weeks

1.1 An Introduction to Analyzing Statistical Data

Definition of Statistical Terminology

Quiz 1

For each description of data, identify the variables, classify each variable as categorical or quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.

1. Researchers caught and measured 27 lions, recording their weight, neck size, length and sex.
2. A restaurant posts, for each of the sandwiches it sells, the type of meat, the number of calories and the serving size in ounces.

Complete the following:

3. In statistics, the total group being studied is called ________
4. A small, representative subset of the population is called a ________

Quiz 2

For each description of data, identify the variables, classify each variable as categorical or quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.

1. A large hospital in New York keeps data on new babies that are born. They record the mother’s age, the number of weeks the pregnancy lasted, and the birth weight and gender of the baby.
2. A survey of cars in a large parking lot recorded the make of each car, the country of origin, the type of vehicle and the age of the car.
Complete the following:

3. A small, representative subset of the population is called a ________________
4. A value of a population variable is called a _________________

Quiz 3

For each description of data, identify the variables, classify each variable as categorical or quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.

1. A telephone poll of voters recorded each voter’s region of the country, age and party affiliation.
2. A survey of students at a large university, prompted by concern over environmental issues, recorded how far each student lived from campus, the mode of transportation used to get to campus (car, bus, bike, walk, etc.), whether or not the student owned a car, and the year in school (freshman, sophomore, etc.).

Complete the following:

3. An estimate of a parameter from a sample is called a _________________
4. Whenever a sample is used instead of the entire population, results are merely estimates and have some chance of being incorrect. This is called _______________

An Overview of Data

Quiz 1

1. Arrange the four levels of measurement in order from highest to lowest.
2. To a physicist, the colors red, orange, yellow, green, blue and violet correspond to specific wavelengths of light and thus are an example of which level of measurement?

Indicate whether the following describes an experiment or an observational study:

3. Researchers are investigating the effect of two drugs designed to help people quit smoking. They found that 40 people out of 100 who decided to use drug A at the beginning of 2001 were no longer smoking at the end of 2001. Only 18 people out of 125 who chose to use drug B at the beginning of 2001 had quit smoking by the end of 2001.
4. True or False: An observational study is a way to establish a cause-and-effect relationship.

Quiz 2

1. Arrange the four levels of measurement in order from lowest to highest.
2. To an electronics student familiar with color-coded resistors, colors are in an ascending order and thus represent at least what level of measurement?
3. Indicate whether the following describes an experiment or an observational study: A company would like to know the baking time and oven temperature that will produce the best bread. They consider four oven temperatures (300, 325, 350 and 375 degrees) and three baking times (40, 50 and 60 minutes). Three breads are cooked at each time/temperature combination.
4. True or False: In an experiment the researcher observes subjects in the real world without manipulating them.

Quiz 3

1. Describe the difference between ratio and interval levels of measurement.
2. To a 3-year-old child, black, brown, red, orange and yellow are just names of colors and thus represent what level of measurement?
3. Indicate whether the following describes an experiment or an observational study: It is believed that students who study a musical instrument have higher GPAs than students who do not. Of the music students, 18% had all A’s, compared with only 7% among the students who did not study a musical instrument.
4. True or False: Cause-and-effect relationships can be established through an experiment.

Measures of Center

Quiz 1

1. The annual salaries of ten office workers are $23,000, $38,000, $46,000, $23,000, $24,000, $23,000, $23,000, $38,000, $23,000 and $32,000.
a. Find the mean, median and modal salaries.
b. Explain why the mode is an unsatisfactory measure of the middle in this case.
2. Find x if 5, 9, 11, 12, 13, 14, 17 and x have a mean of 12.
3. How many data points must be removed from each end of a sample of 400 values in order to calculate a 10% trimmed mean?
4. Sarah took 8 tests. Her scores for seven of these were 29, 36, 32, 38, 35, 34 and 39 (each out of 40). What was her score on the eighth test if her average for all eight tests was 35?
5. Create a data set that fits this description: The median age of Sarah and her six siblings is 14. The range of their ages is 12 years and the mode is 10.

Quiz 2

1. The following raw data is the daily rainfall (to the nearest millimeter) for a month in the desert.
3, 1, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 7, 1, 1, 0, 3, 8, 0, 0, 0, 42, 21, 3, 0, 3, 1, 0, 0
a. Find the mean, median and mode for the data.
b. Give a reason why the mode is not the most suitable measure of center for this data.
2. Find a, given that 3, 0, a, a, 6, a, 4, a and 3 have a mean of 4.
3. How many data points must be removed from each end of a sample of 425 values in order to calculate a 15% trimmed mean?
4. The mean of 10 scores is 11.6. What is the sum of the scores?
5. Create a data set that fits the description: George took six math tests during the current marking period. His mean mark is 83 and his median mark is 85.

Quiz 3

1. How many data points must be removed from each end of a sample of 80 values in order to calculate a 10% trimmed mean?
2. Bill drove an average of 262 miles each day for a period of 12 days. How many miles did he drive in total?
3. The selling prices of the last 10 houses sold in a certain district were as follows (all in dollars):
146,400  127,600  211,000  192,500  256,400  132,400  148,000  129,500  131,400  162,500
a. Calculate the mean and median selling prices of these houses.
b. Which measure would you use if you were
i. A real estate agent wanting to sell your house?
ii. Looking to buy a house in the district?
4. A basketball team scored 43, 55, 41 and 37 goals in their first four matches. What score will the team need to shoot in the next match so that they maintain the same mean score?
5. Create a data set that fits the description: Lara took a survey of the number of coins eight students had in their pockets. The minimum was 7, the mode was 11, the median was 10 and the range was 9.

Measures of Spread

Quiz 1

1. Find the median, upper and lower quartiles, the range and the interquartile range for the set of data:
2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9
2. Find the mean and standard deviation of the following distribution:

Score   Frequency
0       1
1       0
2       1
3       1
4       2
5       6
6       5
7       3
8       1

3. Use your calculator to find the mean and standard deviation of the following: 23, 24, 25, 26, 27, 28, 29, 30

Quiz 2

1. Find the median, upper and lower quartiles, the range and the interquartile range for the set of data:
10, 12, 15, 12, 24, 28, 19, 18, 18, 15, 16, 20, 21, 17, 18, 16, 22, 14
2. Find the mean and standard deviation of the following distribution:

Score   Frequency
11      2
12      1
13      4
14      5
15      6
16      4
17      2
18      1

3. Use your calculator to find the mean and standard deviation of the following: 7, 19, 5, 14, 13, 18, 21, 14, 11, 13, 15, 8

Quiz 3

1. Find the median, upper and lower quartiles, the range and the interquartile range for the set of data:
21.8, 22.4, 23.5, 23.5, 24.6, 24.9, 25, 25.3, 26.1, 26.4, 29.5
2. The number of toothpicks in 48 boxes was counted and the results tabulated:

Number of toothpicks   Frequency
33                     1
35                     5
36                     7
37                     13
38                     12
39                     8
40                     2

Find the mean and standard deviation of the distribution.
3. Use your calculator to find the mean and standard deviation of the following: −3, −2, −1, 0, 1, 2, 3

Test 1

1. A distribution of 6 scores has a median of 21. If the highest score increases 3 points, the median will be:
a. 21
b. 21.5
c. 24
d. Cannot be determined with information given
e. None of the above
2. If you are told that a data set has a mean of 25 and a variance of 0, you can conclude that:
a. There is only one observation in the data set
b. There are no observations in the data set
c. All of the observations in the data set are 25
d. Someone has made a mistake
e. None of the above
3. Of the following measures: mean, median, IQR and standard deviation, which are resistant?
a. Mean and median
b. Median and IQR
c. Mean and standard deviation
d. Median and standard deviation
e. None of the above
4. The quantity Σ(x_i − x̄) is not used as a measure of variation because
a. It is always equal to zero
b. It is always a negative value
c. It is too difficult to work with
d. It is always a positive value
e. None of the above
5. The variance of the following sample of five numbers: 1, 2, 3, 4, 5 is:
a. 2
b. 9
c. 10
d. 13.3
e. 55
6. If you add 5 to each value in a data set, then the standard deviation will
a. Decrease by 5
b. Increase by 5
c. Stay the same
d. Depend on the values of the data in the data set
e. None of the above
7. Last year a small accounting firm paid each of its five clerks $25,000, two junior accountants $60,000 each, and the firm’s owner $255,000.
a. What is the mean salary paid at this firm?
b. How many of the employees earn less than the mean?
c. What is the median salary?
8. The National Institute of Health is studying the birth weight of babies born to mothers who smoke cigarettes. A sample of the weights of 14 babies is selected and the weights are listed below (in pounds).
6.1  5.9  5.8  7.2  6.3  6.2  6.1  5.1  6.0  6.5  6.7  5.3  6.5  5.9
a. Calculate the mean and median.
b. If the largest value were changed to 15.2, how would this impact the median?
c. If the largest value were changed to 15.2, how would this impact the mean?
9. Define the following:
a. Trimmed mean
b. Midrange
10. The following table presents the years of service of eight college professors:

Professor   Yrs
Baric       31
Baxter      15
Hastings    7
Prevost     3
Reed        1
Rossman     6
Stodghill   28
Tesman      6

a. Use your calculator to compute the mean and median of these years of service.
b. Suppose Baxter’s years of service had been mistakenly recorded as 155. Use your calculator to recompute the mean and the median. Did they change? Explain.
c. Suppose Baric’s years of service had been mistakenly recorded as 3. Use your calculator to recompute the mean and the median.
Did they change? Explain.
11. Construct a data set with ten hypothetical exam scores so that the mean does not equal the median and none of the scores are between the mean and the median.
12. Find the range, sample standard deviation, and sample variance for the following data set:
a. Inauguration ages of U.S. presidents: 57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54.
b. Sort the presidential inauguration age data on your calculator. Count how many data elements are within one standard deviation of the mean (i.e. between 54.8 − 6.2 = 48.6 and 54.8 + 6.2 = 61.0). Convert this to a percentage.
c. Sort the presidential inauguration age data on your calculator. Count how many data elements are within two standard deviations of the mean. Convert this to a percentage.
13. For each of the following, determine which level of measurement is most appropriate.
a. Daily high and low temperatures at the Niles airport for 2004.
b. Time (in days) for a sunspot to be visible from the earth.

Test 2

1. Which of the following variables is a continuous variable?
a. The lifetime of a 9 volt battery
b. The number of cracked eggs in a carton of 12 eggs
c. The number of people in a family
d. The brand of laundry detergent used
2. If you add 7 to each value in a data set, then the mean will
a. Decrease by 7
b. Increase by 7
c. Stay the same
d. Depend on the values of the data in the data set
e. None of the above
3. Which of the following is a discrete variable?
a. The lifetime of a 9 volt battery
b. The height of all students in the high school
c. The number of cracked eggs in a carton
d. The brand of laundry detergent used
e. The types of music sold in a music store
4. The mean of the sample of five numbers: −1, −2, −3, −4, −5 is:
a. 0
b. −3
c. −1
d. −6
e. 3
5. A distribution of 7 scores has a median of 23.
If the lowest score decreases 4 points, the median will be
a. 23
b. 23.5
c. 22.7
d. Cannot be determined by the information given
e. None of the above
6. A police officer gave 20 speeding tickets last week on a stretch of road having a 60 miles per hour speed limit. The speeds recorded for each of the tickets are given below:
72  68  79  79  67  81  76  71  82  80
80  73  78  78  75  70  79  70  69  74
a. Is this categorical or quantitative data?
b. What is the range?
c. What is the IQR?
7. Define the following:
a. Categorical variable
b. Interquartile range
8. Explain the difference between an experiment and an observational study.
9. Construct a data set with ten hypothetical exam scores so that 90% of the scores are greater than the mean. Assume the exam scores are integers between 0 and 100.
10. For each of the following, determine which level of measurement is most appropriate.
a. Colors of Skittles® brand candies.
b. Final course grades of A, B, C, D, and F.
11. Last year a small private school paid each of its five interns $25,000, two lead teachers $60,000 each, and the school’s principal $255,000.
a. What is the mean salary paid at this school?
b. How many of the employees earn less than the mean?
c. What is the median salary?

1.2 Visualization of Data

Histograms and Frequency Distributions

Quiz 1

1. Given below is a cumulative frequency distribution table showing the marks secured by 50 students of a class.

Table 1.1:
Marks            below 20   below 40   below 60   below 80   below 100
No. of students  17         20         29         37         50

a. Form a frequency table from the above data.
b. The frequency for the fourth class interval is ________
c. The class interval with the highest frequency is ________
2. The data set below is the test scores (out of 100) for a math test for 50 students.
a. Construct a frequency table for this data using class intervals 0 − 9
b. What percentage of the students scored 80 or more for the test?
c. Draw the ogive for this data.
56  29  78  67  68  69  80  89  92  71
58  66  56  88  81  70  73  63  74  38
67  64  62  55  56  75  90  92  47  44
59  64  89  62  51  87  89  76  59  88
72  80  95  68  80  64  53  43  61  39

Quiz 2

The following data is the number of points scored by the eight winning teams in the first five rounds of the 2001 AFL season.

94   196  154  131  129  134  152  140
124  162  103  139  82   170  110  111
116  160  104  110  98   106  187  149
165  88   118  123  137  128  113  130
145  139  125  154  126  141  122  106

1. Construct a frequency table for this data using class intervals 80 − 89, 90 − 99, 100 − 109, . . . , 180 − 189
2. Construct a cumulative frequency table for this data.
3. What percentage of matches had winning scores of 99 points or less?
4. Draw an ogive for this data.
5. Describe the distribution of the data.

Quiz 3

The number of matches in a box is stated as 50 but the actual number of matches has been found to vary. The number of matches in a box has been counted for a sample of 60 boxes. Here is the data:

51  50  50  51  52  49  50  48  51  47
50  52  48  50  49  51  50  50  52  52
51  50  50  52  50  53  48  50  51  50
50  49  48  51  49  52  50  49  50  50
52  50  51  49  52  52  50  49  50  49
51  50  50  51  50  53  48  49  49  50

1. Construct a frequency table for this data.
2. Display the data using a histogram.
3. Describe the distribution of this data.
4. What percentage of the boxes contains exactly 50 matches?
5. Construct a cumulative frequency table for this data.
6. Draw an ogive for this data.

Common Graphs and Data Plots

Quiz 1

1. For the following data set, construct a dot plot and comment on the distribution.
4  2  5  6  7  4  5  3  5  4  7  6  3  5  8  6  5
2. The data supplied below is the diameter (in cm) of a number of bacteria colonies as measured by a microbiologist 12 hours after seeding.
0.4  2.1  3.4  3.9  4.7  3.7  0.8  3.6  4.1  4.9
2.5  3.1  1.7  3.6  2.8  3.7  2.8  3.2  3.3  1.5
2.6  4.0  1.3  3.5  0.9  1.5  4.2  3.5  2.1
a. Produce a stemplot for this data.
b. Comment on the skewness of the data.
3. The stem and leaf plot represents how much Jacob spent this month:
Money Spent
0|1 7 7 1|1 2 4 2|1 1 6 3|9 4|1 2 7 5|6 6 8 8 66778 9
List out how much Jacob spent.
4. Fast food is often considered unhealthy because much fast food is high in fat and sodium. Are fat and sodium related? Following are the fat and sodium content of several brands of burgers. Create a scatterplot and describe the direction and the strength of the relationship.

Fat (g)   Sodium (mg)
19        920
31        1500
34        1310
35        860
39        1180
39        940
43        1260

Quiz 2

1. Make a stem plot of the money that Fiona spent this month:
Fiona: $71, $57, $68, $57, $83, $88, $64, $75, $66, $74, $81
2. The following sets of test scores were compared using a back-to-back stem plot:
Ian Dan
9 8 6 4 |4| 6 7 8 9 8 7 5 2 2 |5| 2 3 7 8 9 5 4 3 |6| 0 0 1 7 8 6 2 2 0 |7| 6
List the scores that Dan and Ian had.
3. Draw a dot plot of the sodium data and comment on the distribution.

Table 1.2:
Cereal                 Sodium (mg)   Sugar (g)
Frosted Mini Wheats    0             7
Raisin Bran            210           12
All Bran               260           5
Apple Jacks            125           14
Capt. Crunch           220           12
Cheerios               290           1
Cinnamon Toast         210           13
Crackling Oat Bran     140           10
Crispix                220           3
Frosted Flakes         200           11
Fruit Loops            125           13
Grape Nuts             170           3
Honey Nut Cheerios     250           10
Life                   150           6
Oatmeal Raisin Crisp   170           10
Sugar Smacks           70            15
Special K              230           3
Wheaties               200           3
Corn Flakes            290           2
Honeycomb              180           11

4. Is there a relationship between fat in burgers and calories? Draw a scatterplot of the following data and describe the direction and strength of the relationship.

Fat (g)   Calories
19        410
31        580
34        590
35        570
39        640
39        680
43        660

Quiz 3

1. Draw a dot plot of the sugar data and comment on the distribution.

Table 1.3:
Cereal                 Sodium (mg)   Sugar (g)
Frosted Mini Wheats    0             7
Raisin Bran            210           12
All Bran               260           5
Apple Jacks            125           14
Capt. Crunch           220           12
Cheerios               290           1
Cinnamon Toast         210           13
Crackling Oat Bran     140           10
Crispix                220           3
Frosted Flakes         200           11
Fruit Loops            125           13
Grape Nuts             170           3
Honey Nut Cheerios     250           10
Life                   150           6
Oatmeal Raisin Crisp   170           10
Sugar Smacks           70            15
Special K              230           3
Wheaties               200           3
Corn Flakes            290           2
Honeycomb              180           11

2. Go Shop Already is a babysitting service in a mall. For a month, they kept track of the number of babies they were left with. Make a dot plot of the data:
31, 24, 29, 24, 16, 14, 25, 17, 30, 18, 19, 26, 18, 23, 26, 17, 16, 24, 30, 27, 19, 29, 22, 20, 32, 30, 20, 21, 29, 23
3. At Ritzy Stuff store, management kept track of sales during a recent workday. They made a dot plot of the data. List out the sales for that workday:
4. Use the following data to examine the relationship between sodium and calories in hamburgers. Draw a scatterplot and describe the strength and direction of the association.

Calories   Sodium (mg)
410        920
580        1500
590        1310
570        860
640        1180
680        940
660        1260

Box and Whisker Plots

Quiz 1

1. Find the five-number summary for the data set: {37, 44, 5, 8, 20, 11, 14}
2. Circle the points that represent the five-number summary values in the dot plots below:
3. Create a data set with the five-number summary 6, 10, 12, 15, 20 so that the set contains 11 values.
4. The table shows the number of bachelor’s degrees earned in various fields at a private university for 1994.

Degree Field                 1994
Architecture                 78
Biological Sciences          172
Business and management      422
Computer science             205
Cultural studies             46
Education                    261
Engineering                  370
English literature           143
Law                          29
Mathematics                  65
Philosophy                   52
Physical sciences            110
Visual and performing arts   141

a. Give the five-number summary and the mean for the data.
b. Create a box plot for the data set.

Quiz 2

1. Find the five-number summary for the data set: {10, 1, 3, 4, 30, 4, 20, 22, 10, 25, 30}
2. Circle the points that represent the five-number summary values in the dot plot below.
3. Create a data set with the five-number summary 6, 10, 12, 15, 20 that contains 12 values.
4. Here are summary statistics for Verbal SAT scores for a high school graduating class.

         Male   Female
N        80     82
Mean     590    602
Median   600    625
SD       97.2   102.0
Min      310    360
Max      800    770
Q1       515    530
Q3       650    680

a. Create parallel boxplots comparing the scores of males and females from the information given.
b. Discuss the shape, center and spread of the scores.

Quiz 3

1. Find the five-number summary for the data set: {25, 27, 33, 14, 31, 16, 22, 24, 43, 25, 37, 39, 42}
2. Create a data set with the five-number summary 6, 10, 12, 15, 25 that contains 11 values.
3. The table shows the number of bachelor’s degrees earned in various fields at a private university for 1985.

Degree Field                 1985
Architecture                 76
Biological Sciences          158
Business and management      410
Computer science             132
Cultural studies             25
Education                    247
Engineering                  351
English literature           129
Law                          18
Mathematics                  62
Philosophy                   43
Physical sciences            107
Visual and performing arts   154

a. Give the five-number summary and the mean for the data.
b. Create a box plot for the data set.

Test 1

1. In a study of hatchling resting metabolism, three species were studied. These are labeled A, B, and C in the pie chart below. In total, 36 hatchlings were studied. Based on the pie chart, approximately how many of the hatchlings were Species C?
a. 8
b. 12
c. 20
d. 24
e. 30
2. In the following frequency table, what proportion of values are less than 60?

Table 1.4:
Class Interval   Frequency
15 − < 30        15
30 − < 45        14
45 − < 60        16
60 − < 75        12
75 − < 90        18
TOTAL            75

a. 0.187
b. 0.200
c. 0.213
d. 0.240
e. 0.600
3. The back-to-back stem-and-leaf plot below gives the percentage of students who dropped out of school at each of the 49 high schools in a large city school district.
School Year 1989 − 1990 | School Year 1992 − 1993
0 4 566677788899 00001111222224444 555666677778 2 13 9999887 0 4444433222211110 1 9997766665 1 4222100 88876 2 2 766 3 3 5 4 0112
stem = tens, leaf = ones

Which of the following statements is NOT justified by these data?
a. The drop-out rate decreased in each of the 49 high schools between the 1989-1990 and 1992-1993 school years.
b. For the school years shown, most students in the 49 schools did not drop out of high school.
c. In general, drop-out rates decreased between the 1989-1990 and 1992-1993 school years.
d. The median drop-out rate of the 49 high schools decreased between the 1989-1990 and 1992-1993 school years.
e. The spread between the schools with the lowest drop-out rates and those with the highest drop-out rates did not change much between the 1989-1990 and 1992-1993 school years.
4. What is the median of the data for the School Year 1989-1990?
a. 15
b. 16
c. 19
d. 20
e. Cannot be determined
5. The frequency table below shows the heights (in inches) of 130 members of a choir.

Height   Count
60       2
61       6
62       9
63       7
64       5
65       20
66       18
67       7
68       12
69       5
70       11
71       8
72       9
73       4
74       2
75       4
76       1

a. Find the five-number summary for these data.
b. Display the data with a boxplot.
c. Find the mean and standard deviation.
d. Display these data with a histogram.
e. Write a few sentences describing the distribution of heights.
6. A police officer gave 20 speeding tickets last week on a stretch of road having a 60 mile per hour speed limit. The speeds recorded for each of the tickets are given below:
72  68  79  79  67  81  76  71  82  80
80  73  78  78  75  70  79  70  69  74
a. Construct a dotplot of the data.
b. Construct a stem-and-leaf display of the data.
c. Find the five-number summary and create a box plot.
d. What is the range?
e. What is the IQR?
f. Are there any outliers in the data set?
g. Cannot be determined with the full data set.
7. I have a data set consisting of 33 whole number observations.
Its five-number summary is (16, 20, 22, 30, 46).
a. What is the range of the data?
b. Identify the five numbers in the five-number summary.
c. How many observations are strictly less than 22?
d. How many observations are strictly less than 20?
e. What is the interquartile range?
f. Construct a box plot.
g. Test for outliers. Are there any outliers?
8. Draw a scatterplot of the following data and describe the strength and direction of the association between x and y.

X    Y
6    5
10   3
14   7
19   8
21   12

Test 2

1. Since Hill Valley High School eliminated the use of bells between classes, teachers have noticed that students seem to be arriving to class a few minutes late. One teacher decided to collect data to determine whether the students’ and teachers’ watches were displaying the correct time. At exactly 12 noon, the teacher asked 9 randomly selected students and 9 randomly selected teachers to record the times on their watches to the nearest half minute. The data is recorded in the table below, with minutes after noon recorded as positive values and minutes before noon shown as negative values.

Students   Teachers
−4.5       −2.0
−3.0       −1.5
−0.5       −1.5
0          −1.0
0          −1.0
0.5        −0.5
0.5        0
1.5        0
5.0        0.5

a. Construct parallel boxplots using these data.
b. Based on the boxplots in part a, how do the groups compare? Discuss shape, center and spread.
2. A data set has the following five-number summary:

Min   7
Q1    10
Med   12
Q3    17
Max   26

Is 26 an outlier?
a. Yes, because it is the highest value in the set.
b. No, because it is the maximum.
c. Yes, because it is 1.5 IQR above the median.
d. No, because it is not 1.5 IQR above Q3.
3. Here is the number of home runs Babe Ruth hit in each of his 15 years with the New York Yankees:
54  59  35  41  46  25  47  60  54  46  49  46  41  34  22
Roger Maris, who broke Ruth’s single-year record, had these home run totals in his 10 years in the American League:
14  28  16  39  61  33  23  26  8  13
a. Compute the five-number summary of each player.
b. Make side-by-side box plots of the home run distributions. What does your comparison show about Ruth and Maris as home run hitters?
4. A garage wants to understand how long customers have to wait to have their car serviced. Below is the data they collected.

Table 1.5:
Service time (minutes)   Frequency   Cumulative Frequency
< 30                     3           3
> 29 but < 45            45          48
> 44 but < 60            54          102
> 59 but < 75            36          138
> 74 but < 90            21          159
> or = 90                6           165
Total Frequencies = 165

a. Draw a cumulative frequency curve (ogive).
b. Answer the following multiple choice questions:
  a. The median service time was:
    i. 40 minutes
    ii. About half an hour
    iii. More than an hour
  b. Service time was under one hour
    i. 137 times
    ii. 102 times
    iii. Not very many
  c. The number of occasions when service time was less than 30 minutes
    i. 3
    ii. 40
    iii. 55
  d. The number of times service time was over three-quarters of an hour
    i. 27 times
    ii. 110 times
    iii. 90 times
5. The rapid growth of internet publishing is seen in the number of electronic academic journals made available in the 1990s.

Table 1.6:
Year   Number of Journals
1991   27
1992   36
1993   45
1994   181
1995   306
1996   1093
1997   2459

Make a scatterplot and describe the strength and direction of the association.
6. The dotplot below shows the number of televisions owned by each family on a city block. Which of the following statements is true?
A. The distribution is right-skewed with no outliers.
B. The distribution is right-skewed with one outlier.
C. The distribution is left-skewed with no outliers.
D. The distribution is left-skewed with one outlier.
E. The distribution is symmetric.

1.3 Introduction to Probability

Events, Sample Spaces and Probability

Quiz 1

1. You are going to roll a die three times and note how many odd numbers you get. What is the sample space?
2. Make a Venn diagram for the following:
31% of my students got an A on the exam
29% of my students studied for the exam
40% of my students bought me flowers
15% of my students studied for the exam, bought me flowers, and got an A
2% of my students got an A but did not study or buy me flowers
28% of my students bought me flowers and got an A
18% of my students studied and bought me flowers
3. A marble is randomly selected from a box containing 5 green, 3 red and 7 blue marbles. Determine the probability that the marble is:
a. Red
b. Green
c. Neither green nor blue

Quiz 2

1. A marble is randomly selected from a box containing 5 green, 3 red and 7 blue marbles. Determine the probability that the marble is:
a. Blue
b. Not red
c. Green or red
2. In a class of 30 students, 19 study physics, 17 study chemistry and 15 study both of these subjects. Display the information on a Venn diagram and determine the probability that a randomly selected class member studies:
a. Both subjects
b. Physics but not chemistry
c. Chemistry if it is known that the student studies physics
3. A coin is tossed and a square spinner, labeled A, B, C, D, is twirled. Determine the probability of obtaining:
a. A head and a consonant
b. A tail and C
c. A tail or a vowel

Quiz 3

1. A dart board has 36 sectors, labeled 1 to 36. Determine the probability that a dart thrown at the board hits:
a. A multiple of 4
b. 9
c. A number greater than 20
2. In a class of 30 students, 19 study physics, 17 study chemistry and 15 study both of these subjects. Display the information on a Venn diagram and determine the probability that a randomly selected class member studies:
a. Neither subject
b. At least one of the subjects
c. Exactly one of the subjects
3. A coin is tossed and a square spinner, labeled A, B, C, D, is twirled. Determine the probability of obtaining:
a. A tail and a vowel
b. A vowel and B

Compound Events

Quiz 1

1. A fair coin is tossed four times.
Two events are defined as follows:
   A: {at least one tail is observed}
   B: {the number of tails observed is even}
   a. List the outcomes of A
   b. List the outcomes of B
   c. Find P(A^C), P(B), P(A^C ∪ B)
2. The probability of a person traveling to Canada is .18, to Mexico is .09 and to both countries is .04.
   a. Draw a Venn diagram to illustrate this situation
   b. What is the probability that a person chosen at random has
      i. Traveled to Canada but not Mexico
      ii. Traveled to either Canada or Mexico
      iii. Not traveled to either country

Quiz 2
1. A fair coin is tossed four times. Two events are defined as follows:
   A: {at least one tail is observed}
   B: {the number of tails observed is even}
   a. List the outcomes of A
   b. List the outcomes of B
   c. Find P(A^C), P(B), P(A^C ∪ B)
2. Data from a large company reveal that 72% of the workers are married, that 44% are college graduates and that half of the college graduates are married.
   a. Draw a Venn diagram to illustrate this situation
   b. Find the probability that a randomly chosen worker
      i. Is neither married nor a college graduate
      ii. Is married but not a college graduate
      iii. Is married or a college graduate

Quiz 3
1. A check on dorm rooms on a large college campus revealed that 38% had refrigerators, 52% had TVs and 21% had both a TV and a refrigerator.
   a. Draw a Venn diagram of this situation.
   b. Find the probability that a randomly selected dorm room has
      i. A TV but no refrigerator
      ii. A TV or a refrigerator but not both
      iii. Neither a TV nor a refrigerator
2. Given two simple events A and B, suppose the following is true:
   P(A) = .78
   P(B) = .36
   P(A ∩ B) = .22
   Find
   a. P(A ∩ B^C)
   b. P(B ∩ A^C)
   c. P(A^C ∩ B^C)

Conditional Probability

Quiz 1
1. In a class of 25 students, 14 like pizza and 16 like coffee. One student likes neither and 6 students like both. One student is selected from the class. What is the probability that the student
   a. Likes pizza
   b.
Likes pizza given that he/she likes coffee?
2. Given
   P(R) = 26/52
   P(H) = 13/52
   P(H ∩ R) = 13/52
   Find
   a. P(H|R)
   b. P(R|H)
3. In a group of 50 students, 40 study math, 32 study physics and each student studies at least one of these subjects.
   a. Use a Venn diagram to find how many students study both subjects.
   b. If a student from this group is randomly selected, find the probability that he/she studies physics given that he/she studies mathematics.

Quiz 2
1. A box of chocolates contains 6 with hard centers (H) and 12 with soft centers (S). Find
   a. P(H)
   b. P(S)
   c. P(H ∩ S)
   d. P(H ∪ S)
2. In a class of 40, 34 like bananas, 22 like pineapples and 2 dislike both fruits. If a student is randomly selected, find the probability that the student:
   a. Likes both fruits
   b. Likes at least one fruit
   c. Likes bananas given that he/she likes pineapples
   d. Dislikes pineapples given that he/she likes bananas
3. 400 families are surveyed. It is found that 90% had a TV set and 60% had a computer. Every family had at least one of these items. If one of these families is randomly selected, find the probability it has a TV set given that it has a computer.

Quiz 3
1. If
   P(A) = 2/5
   P(B) = 1/3
   P(A ∪ B) = 1/2
   Find
   a. P(A ∩ B)
   b. P(B|A)
   c. P(A|B)
2. A class has 25 students. 13 play tennis, 14 play volleyball and 1 plays neither of these two sports. A student is randomly selected from the class. Determine the probability that the student:
   a. Plays both tennis and volleyball
   b. Plays at least one of these two sports
   c. Plays volleyball given that he/she does not play tennis
3. The probability that a boy eats his lunch is .5 and the probability that his sister eats her lunch is .6. The probability that the girl eats her lunch given that the boy eats his lunch is .9. Determine the probability that:
   a. Both eat their lunch
   b. The boy eats his lunch given that the girl eats hers

Additive and Multiplicative Rules

Quiz 1
1.
A box of chocolates contains 6 with hard centers (H) and 12 with soft centers (S). Are the events H and S mutually exclusive?
2. Given the following information:
   P(A) = .78
   P(B) = .3
   P(A ∩ B) = .22
   a. Are the events A and B mutually exclusive? Explain.
   b. Are the events A and B independent? Explain.

Quiz 2
1. A university requires its biology majors to take a course called Bioresearch. The prerequisite for this course is either a statistics course or a computer course. By the time they are juniors, 52% of the biology majors have taken statistics, 23% have had a computer course, and 7% have done both.
   a. Are taking the two courses, statistics and computers, mutually exclusive? Explain.
   b. Are taking these two courses independent? Explain.
2. P(A) = 1/2, P(B) = 1/3, P(A ∪ B) = p. Find p if:
   a. A and B are mutually exclusive
   b. A and B are independent

Quiz 3
1. Fifty-six percent of American workers have a retirement plan, 68% have health insurance and 49% have both benefits.
   a. Are having health insurance and a retirement plan independent events? Explain.
   b. Are having these two benefits mutually exclusive? Explain.
2. If P(X) = .5 and P(Y) = .7 and X and Y are independent, determine the probability of the occurrence of:
   a. Both X and Y
   b. X or Y
   c. X given that Y occurs

Basic Counting Rules

Quiz 1
1. If the NCAA has applications from 7 universities for hosting its tennis championships in 2003 and 2004, in how many ways may they select the hosts for these championships
   a. If they are not both to be held at the same university?
   b. If they may both be held at the same university?
2. A multiple-choice test consists of 15 questions, each permitting a choice of 4 alternatives. In how many ways may a student fill in the answers if he/she answers each question?
3. David owns 4 pairs of pants, 8 shirts and 2 sweaters. In how many ways may he choose 2 of the pairs of pants, 3 of the shirts and 1 of the sweaters to pack for a trip?
4.
An art collector, who owns 12 original paintings, is preparing a will. In how many ways may the collector leave these paintings to four heirs?
5. Bin A contains 3 red and 2 white tickets. Bin B contains 4 red and 1 white ticket. A die has 4 faces marked A and two faces marked B. The die is rolled and used to select bin A or B. A ticket is then selected from the bin. Use a tree diagram to show all the possible outcomes (choices of tickets).

Quiz 2
1. The probability that Ann’s mother takes her shopping is 2/5. When Ann goes shopping with her mother she gets an ice cream 75% of the time. When Ann does not go shopping with her mother she gets an ice cream 25% of the time. Draw a tree diagram to illustrate the possible outcomes.
2. There are five finalists in a contest. In how many ways may the judges choose a winner and a first runner-up?
3. In a primary election, there are four candidates for mayor, five candidates for treasurer and three candidates for secretary. In how many ways may voters mark their ballots if they vote in all three of the races?
4. a. How many permutations are there of the letters in the word great?
   b. How many permutations are there of the letters in the word greet?
5. In how many ways may one A, three B’s, two C’s and one F be distributed among seven students in a statistics class?

Quiz 3
1. Suppose a true-false test has 20 questions. In how many ways may a student mark the test, if each question is answered?
2. A football team plays 12 games during the season. In how many ways can it end the season with 6 wins, 5 losses and 1 tie?
3. Urn A contains 2 red and 3 blue marbles, and urn B contains 4 red and 1 blue marble. Peter tosses a coin and if the coin comes up heads he chooses a marble from urn A. Draw a tree diagram to represent all the possible outcomes.
4. How many distinct permutations are there of the word “statistics”?
5. A bank has a pool of 8 tellers and 8 customer service representatives.
How many ways can the manager select 4 tellers and 2 service reps to work on a given day?

Test 1
1. You are going to roll a die three times and note how many odd numbers you get. What is the sample space?
2. Give an example of two sets A and B which are mutually exclusive.
3. Here is the probability associated with winning certain prizes in a raffle:
   Car           .03
   Boat          .07
   TV            .12
   Can Opener    .33
   a. What is the probability of winning nothing?
   b. What is the probability of winning the car or the TV?
   c. What is the probability of winning the boat and the can opener?
   d. What is the probability of not winning the car or the can opener?
4. You play two games against the same opponent. The probability you win the first game is .4. If you win the first game, the probability you also win the second is .2. If you lose the first game, the probability that you win the second is .3.
   a. Are the games independent? Explain your answer.
   b. What is the probability that you lose both games?
   c. What is the probability that you win both games?
5. Events A and B are defined by the given Venn diagram. A and B are
   a. Independent and disjoint
   b. Dependent and disjoint
   c. Independent and not disjoint
   d. Dependent and not disjoint
   e. Cannot be determined
6. If performances on AP Statistics tests are independent and the probability of passing an AP Statistics test is .2, then the probability of passing three AP Statistics tests is
   a. .6
   b. .2
   c. .04
   d. .008
   e. 0
7. 45% of a high school student body is male. 80% of the females love math, while only 60% of the males love math. What percentage of the student body love math?
   a. 70%
   b. 50%
   c. 71%
   d. 60%
   e. 100%
8. If 3 coins are tossed, what is the number of equally likely outcomes?
   a. 3
   b. 4
   c. 6
   d. 8
   e. 9
9. If P(X) = .23 and P(X ∩ Y) = .12 and P(X ∪ Y) = .34, then P(Y′) =
   a. .23
   b. .52
   c. .11
   d. .77
   e. .48
10.
How many 5-character code words are possible if the first two characters are letters and the last three characters are numbers? (No character may be repeated.)
   a. 468000
   b. 82
   c. 676000
   d. 78

Introduction to Probability Test 2
1. In a recent survey of 100 10-year-olds, the following information was obtained:
   53 liked McDonalds
   24 liked Burger King
   42 liked Wendy’s
   4 liked only Burger King
   12 liked both McDonalds and Burger King
   23 liked McDonalds and Wendy’s
   6 liked all three
   a. Draw a Venn diagram illustrating this information.
   b. How many 10-year-olds don’t like any of these three?
   c. What percentage of these 10-year-olds like Burger King and Wendy’s?
   d. What is the probability that a 10-year-old likes Burger King, given that he likes Wendy’s?
2. If P(A) = .2 and P(B) = .3, find P(A ∪ B) if:
   a. A and B are independent
   b. A and B are mutually exclusive
3. A survey of families revealed that 8% of all families eat turkey at holiday meals, 44% eat ham, and 16% have both turkey and ham.
   a. What is the probability that a family selected at random had neither turkey nor ham at their holiday meal?
   b. What is the probability that a family selected at random had only ham without having turkey at their holiday meal?
   c. What is the probability that a randomly selected family had ham at their holiday meal, given that they had turkey?
   d. Are having turkey and having ham mutually exclusive events? Explain.
4. Draw a tree diagram to answer the following question: A college was making plans for staffing and used the following: of the students, 22.5% were seniors, 25% were juniors, 25% were sophomores and the rest were freshmen. Also, 40% of the seniors major in the area of humanities, as did 39% of the juniors, 40% of the sophomores and 36% of the freshmen. What is the probability that a randomly selected humanities major is a junior?
5. Assume that 75% of the AP Stat students studied for this test.
If 40% of those who study get an A, but only 10% of those who don’t study get an A, what is the probability that someone who gets an A actually studied for the test?
6. Insurance company records indicate that 12% of all teenage drivers have been ticketed for speeding and 9% for going through a red light. If 4% have been ticketed for both, what is the probability that a teenage driver has been issued a ticket for speeding but not for running a red light?
   a. 3%
   b. 8%
   c. 12%
   d. 13%
   e. 17%
7. Which two events are most likely to be independent?
   a. Being a senior; going to homeroom
   b. Registering to vote; being left-handed
   c. Having a car accident; having a junior license
   d. Doing statistics homework; getting an A on the test
   e. Having 3 inches of snow in the morning; being on time for school
8. How many different three-member teams can be formed from six students?
   a. 20
   b. 120
   c. 216
   d. 720
9. How many different 6-letter arrangements can be formed using the letters in the word ABSENT, if each letter is used only once?
   a. 6
   b. 36
   c. 46656
   d. 720
10. How many elements are in the sample space of rolling one die?
   a. 6
   b. 12
   c. 24
   d. 36
11. A movie theater sells 3 sizes of popcorn (small, medium, and large) with 3 choices of toppings (no butter, butter, extra butter). How many possible ways can a bag of popcorn be purchased?
   a. 1
   b. 3
   c. 9
   d. 27

1.4 Discrete Probability Distributions

Probability Distribution for a Discrete Random Variable

Quiz 1
1. Classify the following random variables as continuous or discrete:
   a. The quantity of fat in a lamb chop
   b. The mark out of 50 for a geography test
   c. The weight of a seventeen-year-old student
2. To measure the rainfall over a 24-hour period, the height of water collected in a rain gauge (up to 200 mm) is used. Identify the random variable being considered, give the possible values for the random variable and indicate whether the variable is continuous or discrete.
3.
A magazine store recorded the number of magazines purchased by its customers in one day. 23% purchased one magazine, 38% purchased two, 21% purchased three, 13% purchased four and 5% purchased five.
   a. What is the random variable?
   b. What are the possible values of the random variable?
   c. Make a random variable probability table.
   d. Graph the probability distribution.

Quiz 2
1. Classify the following random variables as continuous or discrete.
   a. The volume of water in a cup of coffee
   b. The number of trout in a lake
   c. The number of hairs on a cat
2. To investigate the stopping distance for a tire with a new tread pattern, a braking experiment is carried out. Identify the random variable being considered, give the possible values for the random variable and indicate whether the variable is continuous or discrete.
3. Following is a probability distribution table:
   X    P(x)
   0    a
   1    .3333
   2    .1088
   3    .0084
   4    .0007
   5    .0000
   a. What is the value of a?
   b. What is the value of P(2)?
   c. Graph the probability distribution.

Quiz 3
1. Classify the following random variables as continuous or discrete.
   a. The length of hairs on a horse
   b. The height of a skyscraper
2. To check the reliability of a new type of light switch, switches are repeatedly turned off and on until they fail. Identify the random variable being considered, give the possible values for the random variable and indicate whether the variable is continuous or discrete.
3. Given the following probability distribution:
   X    P(x)
   0    0.07
   1    0.14
   2    K
   3    0.46
   4    0.08
   5    0.02
   a. Find K
   b. Find
      i. P(x ≥ 2)
      ii. P(1 < x ≤ 3)
   c. Graph the probability distribution.

Mean and Standard Deviation of Discrete Random Variables

Quiz 1
1. Consider the following probability distribution:
   X    P(x)
   0    0.00
   1    0.23
   2    0.38
   3    0.21
   4    0.13
   5    0.05
   a. Find the mean of the distribution.
   b. Find the variance.
   c.
Find the standard deviation.
2. Find the expected value of the following probability distribution:
   X    P(X = x)
   2    0.3
   4    0.4
   6    0.2
   8    0.1

Quiz 2
1. The probability model below describes the number of repair calls that an appliance repair shop may receive during an hour.
   Repair Calls    Probability
   0               0.1
   1               0.3
   2               0.4
   3               0.2
   a. How many calls should the shop expect per hour?
   b. What is the standard deviation?
2. Following is a discrete probability distribution:
   X     P(X)
   0     0.54
   1     0.26
   2     0.15
   3     0.03
   4     0.01
   5     0.01
   >5    0.00
   a. Find the mean of the distribution
   b. Find the variance
   c. Find the standard deviation

Quiz 3
1. A random variable X has the following probability distribution:
   X    P(X)
   1    0.1
   2    0.2
   3    k
   4    0.2
   5    0.1
   a. Find k
   b. Find the mean of the distribution
   c. Find the variance of the distribution
   d. Find the standard deviation of the distribution
2. Given the following probability distribution
   X        P(x)
   0        0.9675
   8000     0.03
   20000    0.0025
   Find the expected value of X.

Binomial Distribution

Quiz 1
1. Suppose x is a binomial random variable with n = 5, p = 0.25. Calculate p(x) for the values: x = 0, 1, 2, 3, 4, 5. Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 4 and p = 0.5.
   a. Display p(x) in tabular form.
   b. Compute the mean and the variance of x.
3. In a test for ESP, a subject is told that cards the experimenter can see but he cannot contain a star, a circle, a wave or a square. As the experimenter looks at each of the 20 cards in turn, the subject names the shape on the card. A subject who is just guessing has probability .25 of guessing correctly on each card.
   a. The count of correct guesses in 20 cards has a binomial distribution. What are n and p?
   b. What is the mean number of correct guesses?
   c. What is the probability of exactly 5 correct guesses?

Quiz 2
1. Suppose x is a binomial random variable with n = 3, p = 0.2. Calculate p(x) for the values: x = 0, 1, 2, 3.
Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 5 and p = 0.4.
   a. Display p(x) in tabular form.
   b. Compute the mean and the variance of x.
3. A federal report finds that lie detector tests given to truthful persons have probability .2 of suggesting that the person is deceptive. A company asks 12 job applicants about thefts from previous employers, using a lie detector to assess their truthfulness. Suppose that all 12 answer truthfully.
   i. What is the probability that the lie detector says all 12 are truthful?
   ii. What is the probability that the lie detector says at least one is deceptive?
   iii. What is the mean number among 12 truthful persons who will be classified as deceptive?

Quiz 3
1. Suppose x is a binomial random variable with n = 7, p = 0.2. Calculate p(x) for the values: x = 0, 1, 2, 3, 4, 5, 6, 7. Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 7 and p = 0.5.
   a. Display p(x) in tabular form.
   b. Compute the mean and the variance of x.
3. A test for the presence of antibodies to the AIDS virus in blood has probability 0.99 of detecting the antibodies when they are present. Suppose that during a year 20 units of blood with AIDS antibodies pass through a blood bank.
   a. Take X to be the number of these 20 units that the test detects. What is the distribution of X?
   b. What is the probability that the test detects all 20 contaminated units?
   c. What is the probability that at least one unit is not detected?
   d. What is the mean number of units among the 20 that will be detected?

Geometric Distribution

Quiz 1
1. A basketball player has made 75% of his foul shots during the season. Assuming the shots are independent, find the probability that in tonight’s game he
   a. Misses for the first time on his fifth attempt.
   b. Makes his first basket on his fourth shot.
   c. Makes his first basket on one of his first 3 shots.
2.
Suppose the average number of lions seen on a 1-day safari is 5.
   a. What is the probability that tourists will see exactly four lions on the next 1-day safari?
   b. What is the probability that tourists will see exactly one lion on the next 1-day safari?
3. Of pre-med students in a private university, on average only 36% of students enrolled in a given section of organic chemistry will pass. What is the probability that Sarah will have to take the class three times in order to pass?

Quiz 2
1. A tool hire shop has six lawn mowers which it hires out on a daily basis. The number of lawn mowers requested per day follows a Poisson probability distribution with mean 4.5. Find the probability that:
   i. Exactly three lawn mowers are hired out on any one day.
2. Bob is a high school basketball player. He is a 70% free throw shooter. That means his probability of making a free throw is 0.70. What is the probability that Bob makes his first free throw on his fifth shot?
3. Over the course of a season, a basketball player is a 26% free throw shooter. In practice, her coach tells her to take 50 free throws. What is the probability that she makes her first basket on the 4th shot?

Quiz 3
1. A statistics professor finds that when she schedules an office hour for student help, an average of two students arrive. Find the probability that in a randomly selected office hour, the number of student arrivals is five.
2. In a deck of 52 cards there are 12 face cards.
   a. What is the probability of drawing a face card from the deck?
   b. If you draw cards with replacement (that is, you replace the card before you choose another card), what is the probability that the first face card you draw is the tenth card?
3. The mean number of wiring faults in a new house is 8. What is the probability of buying a new house with exactly 1 wiring fault?

Test 1
1.
In a population of students, the number of calculators owned is a random variable x with P(x = 0) = .2, P(x = 1) = .6 and P(x = 2) = .2. The mean of this probability distribution is
   a. 0
   b. 2
   c. 1
   d. .5
2. Refer to the previous problem. The variance of this probability distribution is
   a. 1
   b. .63
   c. .5
   d. .4
   e. The answer cannot be computed from the information given.
3. A psychologist studied the number of puzzles subjects were able to solve in a five-minute period while listening to soothing music. Let x be the number of puzzles completed successfully by a subject. x had the following distribution:
   X    P(x)
   1    .2
   2    .4
   3    .3
   4    .1
   What is the probability that a randomly chosen subject completes at least 3 puzzles in the five-minute period while listening to soothing music?
   a. .3
   b. .4
   c. .6
   d. .9
   e. The answer cannot be computed from the information given.
4. Using the data in problem 3, P(X < 3) =
   a. .3
   b. .4
   c. .6
   d. .9
5. Which of the following is not a property of a binomial experiment?
   a. It consists of a fixed number of trials n.
   b. Outcomes of different trials are independent.
   c. Each trial can result in one of several different outcomes.
   d. X = the number of successes observed when the experiment is performed.
6. The probability that 0, 1, 2, 3, or 4 people will seek treatment for the flu during any given hour at an emergency room is shown in the distribution
   X    P(X)
   0    .12
   1    .25
   2    .32
   3    .24
   4    .06
   a. What does the random variable count or measure?
   b. What is the mean of X?
   c. What are the variance and standard deviation of X?
7. There is a probability of 0.08 that a vaccine will cause a certain side effect. Suppose that a number of patients are inoculated with the vaccine. We are interested in the number of patients vaccinated until the first side effect is observed.
   a. Define the random variable of interest, X =
   b. Find the probability that exactly 5 patients must be vaccinated in order to observe the first side effect.
   c.
Construct a probability distribution table for X (up through X = 5).

The National Association of Retailers reports that 65% of all purchases are now made by credit card; on a typical day a retailer makes 20 sales.
8. Explain why the sales can be considered as Bernoulli trials.
9. What is the probability that the fifth customer is the first one who uses a credit card?
10. Let X = number of customers who use a credit card on a typical day. What is the probability model for X? Give the mean and standard deviation.
11. What is the probability that on a typical day at least half of the customers use a credit card?

Test 2
1. Which of the following is not a property of a geometric experiment?
   a. It consists of a fixed number of trials n.
   b. Outcomes of different trials are independent.
   c. Each trial can result in one of two possible outcomes.
   d. The probability of success is the same for all trials.
2. If x is a binomial random variable with n = 10 and p = .25 then
   a. σX = 1.875
   b. σX = √2.5
   c. σX = √1.875
   d. σX² = √1.875
3. A friend of yours plans to toss a fair coin 150 times. You watch the first 30 tosses, noticing that she got only 11 heads. Then you get bored and leave. If the coin is fair, how many heads do you expect her to have when she has finished the 150 tosses?
   a. 80
   b. 75
   c. 92
   d. 100
   e. 96
4. Which of these has a geometric model?
   a. The number of black cards in a 10-card hand
   b. The colors of the cars in a parking lot
   c. The number of hits a baseball player gets in 6 times at bat
   d. The number of cards drawn from a deck until we find all four aces
   e. The number of people we survey until we find someone who owns an iPod
5. Which of these has a binomial model?
   a. The number of black cards in a 10-card hand
   b. The colors of the cars in a parking lot
   c. The number of hits a baseball player gets in 6 times at bat
   d. The number of cards drawn from a deck until we find all four aces
   e.
The number of people we survey until we find someone who owns an iPod
6. Coke is running a sales promotion in which 13% of all bottles have a “FREE” logo under the cap. What is the probability that you find three free ones in a 6-pack? (.0289)
   a. 1%
   b. 3%
   c. 13%
   d. 12%
   e. 23%

The owner of a store is trying to decide whether to discontinue selling tabloid newspapers. He suspects that only 5% of the customers buy a tabloid. He decides that for one day he’ll keep track of the number of customers and whether or not they buy a tabloid.
7. Assuming the owner is correct in thinking that 5% of the customers purchase tabloids, how many customers should he expect before someone buys a tabloid?
8. What is the probability that he does not sell a tabloid until the 5th customer?
9. What is the probability that exactly 3 of the first 15 customers buy tabloids?
10. What is the probability that at least 5 of his first 40 customers buy tabloids?
11. He had 300 customers that day. Assuming this day was typical for his store, what would be the mean and standard deviation of the number of customers who buy tabloids each day?

1.5 Normal Distribution

Standard Normal Probability Distribution

Quiz 1
1. When a specific vegetable is grown in a certain manner without fertilizer, the weight of the vegetable produced is normally distributed with a mean of 40 grams and a standard deviation of 10 grams. Determine the proportion of the vegetable grown
   a. With weights less than 50 grams
   b. With weights greater than or equal to 60 grams
2. A clock manufacturer investigated the accuracy of its clock after a year of continuous use. He found that the mean error was 0 minutes with a standard deviation of 2 minutes. If a buyer purchases 600 of these clocks, find the expected number that will be on time or up to 4 minutes fast after a year of continuous use.
3. IQ tests are standardized to a normal model with a mean of 100 and a standard deviation of 16. Draw the model for these IQ scores.
Clearly label it, showing what the empirical rule predicts about the scores.
4. The average reading speed of students completing a speed-reading course is 450 words per minute. If the standard deviation is 70 words per minute, find the z score associated with a reading speed of 420 words per minute.
5. Given the following set of data, find the mean, standard deviation and z score of each speed and create a normal probability plot from your results. Based on your plot, comment on the normality of the distribution.
   Speed: 31 29 31 34 27 34 37 28 29 30 26 29 24 38 34 31 36 29 31 34 34 32 36

Quiz 2
1. When a certain vegetable is grown without fertilizer, the vegetables produced have weights that are normally distributed with a mean of 140 grams and a standard deviation of 40 grams. Determine the proportion of the vegetable grown
   a. With weights less than 60 grams.
   b. With weights between 20 and 60 grams.
2. A clock manufacturer investigated the accuracy of its clock after a year of continuous use. He found that the mean error was 0 minutes with a standard deviation of 2 minutes. If a buyer purchases 600 of these clocks, find the expected number that will be on time or up to 6 minutes slow after a year of continuous use.
3. IQ tests are standardized to a normal model with a mean of 100 and a standard deviation of 16. Draw the model for these IQ scores. In what interval would you expect the central 68% of the IQ scores to be found?
4. The average reading speed of students completing a speed-reading course is 450 words per minute. If the standard deviation is 70 words per minute, find the z score associated with a reading speed of 475 words per minute.
5. IQ scores for a random sample of people are shown below.
   72 79 87 91 99 101 103 106 111 113 116 126
   Find the mean, standard deviation and z score for each and create a normal probability plot. Based on your plot, comment on the normality of this data.

Quiz 3
1.
The height of male students is normally distributed with a mean of 170 cm and a standard deviation of 8 cm. Find the percentage of male students whose height is between 162 cm and 170 cm.
2. A clock manufacturer investigated the accuracy of its clock after a year of continuous use. He found that the mean error was 0 minutes with a standard deviation of 2 minutes. If a buyer purchases 600 of these clocks, find the expected number that will be between 4 minutes slow and 6 minutes fast after a year of continuous use.
3. Automobiles that have been recently tested predicted a mean of 24.8 mpg and a standard deviation of 6.2 mpg for highway driving. Assume a Normal model can be applied. Draw the model for auto fuel economy. Clearly label it, showing what the empirical rule predicts about miles per gallon.
4. The average reading speed of students completing a speed-reading course is 450 words per minute. If the standard deviation is 70 words per minute, find the z score associated with a reading speed of 320 words per minute.
5. Given the following data: 22 17 18 29 22 23 24 23 17 21. Find the mean, standard deviation and z score for the data and create a normal probability plot. Based on your plot, comment on the normality of the data.

The Density Curve of the Normal Distribution

Quiz 1
1. Given that a random variable X is normally distributed with a mean 70 and standard deviation 4, find P(x ≥ 74) by first converting to the standard variable z and then using the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with mean 64 cm and standard deviation 4 cm. Find the percentage of 18-year-old females whose arm lengths are between 59 cm and 74 cm.
3. Use the table to verify the empirical rule for P(−2 ≤ z ≤ 2).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal distribution with a mean of 41 cm and a variance of 11 square cm.
Find the proportion of fish measuring between 40 cm and 50 cm.

Quiz 2
1. Given that a random variable X is normally distributed with a mean 70 and standard deviation 4, find P(x ≤ 68) by first converting to the standard variable z and then using the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with mean 64 cm and standard deviation 4 cm. Find the percentage of 18-year-old females whose arm lengths are greater than 61 cm.
3. Use the table to verify the empirical rule for P(−3 ≤ z ≤ 3).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal distribution with a mean of 41 cm and a variance of 11 square cm. If a fish is randomly selected, find the probability that it is at least 50 cm.

Quiz 3
1. Given that a random variable X is normally distributed with a mean 70 and standard deviation 4, find P(60.6 ≤ x ≤ 68.4) by first converting to the standard variable z and then using the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with mean 64 cm and standard deviation 4 cm. Find the probability that a randomly chosen 18-year-old female has an arm length in the range 55 cm to 67 cm.
3. Use the table to find P(−1.64 ≤ z ≤ 1.64).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal distribution with a mean of 41 cm and a variance of 11 square cm. How many fish from a sample of 200 would you expect to measure at least 45 cm?

Applications of the Normal Distribution

Quiz 1
1. Circular tokens are used to operate a washing machine. The diameters of the tokens are known to be normally distributed. Only tokens with diameters between 1.93 and 2.05 cm will operate the machine.
   a. Find the mean and standard deviation of the distribution given that 2% of the tokens are too small and 3% are too large.
2.
From the results of a statistics test, the average score was 46 with a standard deviation of 25. The teacher decided to give an A to the top 7% of the students in the class. Assuming that the scores were normally distributed, find the lowest score that a student must obtain in order to achieve an A.
3. Assume a normal distribution. Find the missing parameter: µ = 20, 45% above 30, σ = ?

Quiz 2
1. Assume a normal distribution. Find the missing parameter: µ = 0.64, 12% above 0.70, σ = ?
2. a. The average weight of eggs produced by young hens is 50.9 grams, and only 28% of the eggs exceed 54 grams. Assuming the normal distribution is appropriate, what would the standard deviation of the egg weights be? (5.3 grams)
b. When the hens have reached the age of 1 year, the eggs they produce average 67.1 grams, and 98% of them are above 54 grams. Again assuming a normal distribution, what is the standard deviation of these egg weights? (6.4 grams)
c. Are egg sizes more consistent for the younger hens or the older ones? Explain. (younger, since the sd is smaller)
3. A tire manufacturer believes that the tread life of its snow tires can be described by a normal distribution with a mean of 32,000 miles and a standard deviation of 2500 miles. He wants to offer a refund to any customer whose tires fail to last a certain number of miles. He is willing to give a refund to no more than 1 of every 25 customers. For what mileage can he guarantee these tires to last?

Quiz 3
1. Assuming a normal distribution, find the missing parameter: σ = 5, 80% below 100, µ = ? (95.79)
2. While only 5% of babies have learned to walk by the age of 10 months, 75% are walking by the age of 13 months. If the age at which babies develop the ability to walk can be described by a normal model, find the mean and standard deviation of the model. (mean = 12.1, sd = 1.3)
3. A department store sells furniture that is advertised with the claim that it “takes less than an hour to assemble”.
However, through surveys the store has learned that only 25% of its customers succeeded in building the furniture in under an hour; 5% said it took them over 2 hours. The store assumes that consumer assembly time follows a normal distribution.
a. Find the mean and standard deviation of this model. (mean = 1.29 hours, sd = .43 hours)
b. The store wants to change its advertising claim. What assembly time should the store quote in order that 60% of the customers succeed in finishing the assembly by then? (1.4 hours)

Test 1
1. The weights of cockroaches living in a typical college dormitory are approximately normally distributed with a mean of 80 grams and a standard deviation of 4 grams. The percentage of cockroaches weighing between 77 grams and 83 grams is about:
a. 99.7%
b. 95%
c. 68%
d. 55%
e. 34%
2. Scores on the ACT are normally distributed with a mean of 18 and a standard deviation of 6. The interquartile range of the scores is approximately:
a. 8.1
b. 12
c. 6
d. 10.3
e. 7
3. The test grades at a large school have an approximately normal distribution with a mean of 50. What is the standard deviation of the data so that 80% of the students are within 12 points (above or below) of the mean?
a. 5.875
b. 9.375
c. 10.375
d. 14.5
e. cannot be determined from the given information
4. The average cost per ounce for glass cleaner is 7.7 cents with a standard deviation of 2.5 cents. What is the z-score of Windex with a cost of 10.1 cents per ounce?
a. .96
b. 1.31
c. 1.94
d. 2.25
e. 3.00
5. Jay Olshansky from the University of Chicago was quoted in Chance News as arguing that for the average life expectancy to reach 100, 18% of the people would have to live to age 120. What standard deviation is he assuming for this statement to make sense?
a. 21.7
b. 24.4
c. 25.2
d. 35.0
e. 111.1
6. The best male long jumpers for State College since 1973 have averaged a jump of 263.0 inches with a standard deviation of 14.0 inches.
The best female long jumpers have averaged 201.2 inches with a standard deviation of 7.7 inches. This year Joey jumped 275 inches and his sister, Carla, jumped 207 inches. Both are State College students. Assume that the lengths of jumps for both males and females are approximately normal. Within their groups, which athlete had the more impressive performance?
7. The length of pregnancies from conception to natural birth among a certain female population is a normally distributed random variable with a mean of 270 days and a standard deviation of 10 days.
a. What is the percent of pregnancies that last more than 300 days?
b. How short must a pregnancy be in order to fall in the shortest 10% of all pregnancies?
8. For a normally distributed population, fill in the following blanks:
a. ____% of the population observations lie within 1.96 standard deviations on either side of the mean.
b. ____% of the population observations lie within 1.64 standard deviations on either side of the mean.
9. Find the proportion of observations from a standard normal distribution that satisfies each of these statements. In both cases, sketch a standard normal curve and shade the area under the curve that answers the question.
a. Z > −1.68
b. −0.84 < Z < 1.26

Test 2
1. The Empirical Rule can be used when assessing a distribution if
a. The distribution is approximately normal
b. The distribution is skewed
c. The distribution is heavily tailed
d. The standard deviation is close to the interquartile range
e. The mean is equal to 0 and the standard deviation is equal to 1
2. Which of the following are true statements?
I. The area under the normal curve is always equal to 1.
II. The smaller the standard deviation of a normal curve, the higher and narrower the graph.
III. Normal curves with different means are centered around different numbers.
a. I and II
b. I and III
c. II and III
d. All of the above
e. None of the above
3.
The heights of adult women are approximately normally distributed about a mean of 65 inches with a standard deviation of 2 inches. If Rachel is at the 99th percentile in height for all adult women, then her height, in inches, is closest to
a. 60
b. 62
c. 68
d. 70
e. 74
4. Joan’s doctor told her that the standardized score for her systolic blood pressure, as compared to the blood pressure of other women her age, is 1.50. Which of the following is the best interpretation of her standardized score?
a. Joan’s systolic blood pressure is 150.
b. Joan’s systolic blood pressure is 1.5 standard deviations above the average systolic blood pressure of women her age.
c. Joan’s systolic blood pressure is 1.5 above the average systolic blood pressure of women her age.
d. Joan’s systolic blood pressure is 1.5 times the average systolic blood pressure of women her age.
e. Only 1.5% of women Joan’s age have a higher systolic blood pressure than she does.
5. Suppose the test scores of 600 students are normally distributed with a mean of 76 and a standard deviation of 8. The number of students scoring between 72 and 80 is
a. 272
b. 164
c. 230
d. 136
e. 328
6. Which of the following is NOT CORRECT about a standard normal distribution?
a. P(0 < Z < 1.50) = .4332
b. P(Z < −1.0) = .1587
c. P(Z > 2.0) = .0228
d. P(Z < 1.5) = .9332
e. P(Z < −2.5) = .4938
7. At a college the scores on the chemistry final exam are approximately normally distributed, with a mean of 75 and a standard deviation of 12. The scores on the calculus final are also approximately normally distributed, with a mean of 80 and a standard deviation of 8. A student scored 81 on the chemistry final and 84 on the calculus final. Relative to the other students in each class, in which subject did this student do better?
8. Men’s shirt sizes are determined by their neck sizes. Suppose that men’s neck sizes are approximately normally distributed with a mean of 15.7 inches and a standard deviation of 0.7 inch.
A retailer sells men’s shirts in sizes S, M, L, and XL, where the shirt sizes are defined as follows:
Table 1.7:
Shirt Size    Neck Size (inches)
S             14 ≤ neck size < 15
M             15 ≤ neck size < 16
L             16 ≤ neck size < 17
XL            17 ≤ neck size < 18
a. Because the retailer only stocks the sizes listed above, what proportion of customers will find that the retailer does not carry any shirts in their sizes? Show your work.
b. Using a sketch of a normal curve, illustrate the percentage of men whose shirt size is M. Calculate this percentage.
9. If Z is standard normally distributed, find k using technology if P(z ≤ k) = 0.878.
10. Find the mean and the standard deviation of a normally distributed random variable if P(x ≥ 50) = 0.2 and P(x ≤ 20) = 0.3.

1.6 Planning and Conducting an Experiment or Study

Surveys and Sampling

Quiz 1
1. A magazine mailed a questionnaire to the human resource directors of all of the Fortune 500 companies, and received responses from 28% of them. Those responding reported that they did not find that such surveys intruded significantly on their workday. Identify:
a. The population of interest
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. The following question was part of a survey taken by the PTA (parent/teacher association) in an effort to obtain parents’ opinions. Do you think the response to the following question might be biased? If yes, propose a question with more neutral wording that might better assess parental response.
Should elementary school-age children have to pass high-stakes tests in order to remain with their classmates?
(Answers will vary – wording bias)
3. What type of sampling is evident in the following: You want to determine the reading level of a book. You choose a chapter of the book at random, then a page from that chapter, and you find 560 words on the page. You take your sample by choosing every 28th word on the page.
4.
Define simple random sample and give an example.

Quiz 2
1. A consumers union asked all subscribers whether they had used alternative medical treatments and, if so, whether they had benefited from them. For almost all of the treatments, approximately 24% of those responding reported cures or substantial improvement in their condition. Identify:
a. The population of interest
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. The following question was part of a survey taken by the PTA (parent/teacher association) in an effort to obtain parents’ opinions. Do you think the response to the following question might be biased? If yes, propose a question with more neutral wording that might better assess parental response.
Should schools and students be held accountable for meeting yearly learning goals by testing students before they advance to the next grade?
(Answers will vary – wording bias)
3. What type of sampling is evident in the following: You want to survey how students feel about funding for the basketball team at a large university. The campus is 65% men and 35% women. You select 65 men at random and then 35 women at random.
4. Define cluster sampling and give an example.

Quiz 3
1. Researchers waited outside a bar they had randomly selected from a list of all bars. They stopped every 10th person who came out of the bar and asked whether he or she thought drinking and driving was a serious problem. Identify:
a. The population of interest
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. Examine each of the following questions for possible bias. If you think the question is biased, indicate how and propose a better question.
a. Should companies that pollute the environment be compelled to pay the costs of cleanup?
b.
Given that 18-year-olds are old enough to vote and to serve in the military, is it fair to set the drinking age at 21?
3. What type of sampling is evident in the following: You want to assess the reading level of a book. You pick one page at random and use every word on that page.
4. Define stratified sampling and give an example.

Experimental Design

Quiz 1
1. Over a 6-month period, among 25 people with a mental disorder, patients who were given a high dose of omega-3 fats from fish oil improved more than those given a placebo.
a. Is this an observational study or an experiment?
b. If it is an experiment, identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Dentists in a dental clinic were trying to determine whether the number of new cavities differs for people who eat an apple each day and for people who eat less than one apple per week. Two groups of clinic patients would be studied. One group would consist of 50 patients who report that they eat an apple each day; the other group would consist of 50 patients who report that they eat less than one apple per week. Dentists would examine the patients and their records to determine the number of new cavities the patients had over the preceding year and compare the two groups.
a. Explain why this is an observational study.
b. What is the confounding variable?
3. A medical researcher is interested in testing a new medicine for migraine headaches. She decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches at a rate of one per week or more. Although age and gender are not of primary interest in the trials, the researcher is concerned that these factors may impact the effectiveness of the drug. Describe how she should set up her experiment for the 100 subjects if she wishes to control for gender.

Quiz 2
1.
There is some concern that if women who are taking estrogen after menopause also drink alcohol, their estrogen levels will rise too high. Twenty-four volunteers, 12 who were receiving supplemental estrogen and 12 who were not, were randomly divided into two groups. One group drank an alcoholic beverage and the other drank a nonalcoholic beverage. An hour later everyone’s estrogen level was checked. Only those on supplemental estrogen who drank alcohol showed a marked increase.
a. Is this an experiment or an observational study?
b. If it is an experiment, identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Researchers wanted to compare the effects of a new drug to the effects of an existing drug for reducing cholesterol levels in patients. They designed a completely randomized experiment using volunteers with a history of high cholesterol levels. An improved design would incorporate blocking. Name two different variables that one might use for the block design.
3. A medical researcher is interested in testing a new medicine for migraine headaches. She decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches at a rate of one per week or more. Although age and gender are not of primary interest in the trials, the researcher is concerned that these factors may impact the effectiveness of the drug. Describe how she should set up her experiment for the 100 subjects if she wishes to control for age. She decides on age categories of young (21 to 35), middle (36 to 55) and elderly (over 55).

Quiz 3
1. Some gardeners prefer to use non-chemical methods to control insects in their gardens. Researchers have designed two kinds of traps, and want to know which design will be more effective. They randomly choose 10 locations in a large garden and place one of each kind of trap at each location.
After a week they count the number of insects in each trap.
a. Is this an experiment or an observational study?
b. If it is an experiment, identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Researchers wanted to compare the weight gain for salmon raised on an old type of food with their weight gain using a new type of food. The fish were randomly assigned to tanks, but the tanks were located in areas where room temperature varied greatly.
a. What is the response variable?
b. What is the explanatory variable?
c. How many treatments are there?
d. What is the blocking variable?
3. A medical researcher is interested in testing a new medicine for migraine headaches. She decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches at a rate of one per week or more. Although age and gender are not of primary interest in the trials, the researcher is concerned that these factors may impact the effectiveness of the drug. Describe how she should set up her experiment for the 100 subjects if she wishes to control for both age and gender.

Test 1
1. The student government at a high school wants to conduct a survey of student opinion. It wants to begin with a random sample of 60 students. Which of the following survey methods will produce a stratified random sample?
a. Survey the first 60 students to arrive at school in the morning.
b. Survey every 10th student entering the school library until 60 are surveyed.
c. Number all students on the official school roster and then use random numbers to choose 15 freshmen, 15 sophomores, 15 juniors and 15 seniors.
d. Number the cafeteria seats, and use a table of random numbers to choose seats and interview the students until 60 have been interviewed.
e. Number the students in the official school roster, and then use random numbers to choose 60 students from the roster for the survey.
2.
Which of the following can be used to show a cause-and-effect relationship between two variables?
a. A census
b. A controlled experiment
c. An observational study
d. A sample survey
e. A cross-sectional survey
3. To check the effects of cold temperatures on the elasticity of two brands of rubber bands, one box of Brand A and one box of Brand B rubber bands are tested. Ten rubber bands from the Brand A box are placed in a freezer for two hours, and ten bands from the Brand B box are kept at room temperature. The amount of stretch before breaking is measured on each rubber band, and the mean for the cold bands is compared to the mean for the others. Is this good experimental design?
a. No, because means are not proper statistics for comparison.
b. No, because more than two brands should be used.
c. No, because more temperatures should be used.
d. No, because temperature is confounded with brand.
e. Yes
4. The Physicians’ Health Study, a large medical experiment involving 22,000 male physicians, attempted to determine whether aspirin could help prevent heart attacks. In this study, one group of 11,000 physicians took an aspirin every other day, while a control group took a placebo. After several years, it was determined that the physicians in the group that took the aspirin had significantly fewer heart attacks than the physicians in the control group. Which of the following explains why it would NOT be appropriate to say that everyone should take an aspirin every other day?
I. The study included only physicians, and different results may occur in individuals in other occupations.
II. The study included only males, and there may be different results for females.
III. Although taking aspirin may be helpful in preventing heart attacks, it may be harmful to some other aspects of health.
a. I only
b. II only
c. III only
d. II and III only
e. I, II and III
5. Which of the following is NOT a source of bias in survey design?
a. Undercoverage
b. Non-response
c.
Wording of questions
d. Voluntary response
e. All are sources of bias
6. Suppose you wish to compare the AP Statistics exam results for the male and female students taking AP Statistics at your school. Which is the most appropriate technique for gathering the needed data?
a. Census
b. Sample survey
c. Experiment
d. Observational study
e. None of these is appropriate
7. In addition to control by comparing several treatments, the TWO other basic principles which distinguish experiments from observational studies include:
I. randomization, i.e., assigning researchers by chance
II. randomization, i.e., assigning subjects by chance
III. replication, i.e., doing a study more than once
IV. replication, i.e., doing a study with many subjects
a. I and III
b. I and IV
c. II and III
d. II and IV
e. IV and V
8. Which of the following statements are true?
I. Random sampling is a good way to reduce response bias.
II. To guard against bias from undercoverage, use a convenience sample.
III. Increasing the sample size tends to reduce survey bias.
IV. To guard against nonresponse bias, use a mail-in survey.
A. I only
B. II only
C. III only
D. IV only
E. None of the above.
9. A food company assesses the nutritional quality of a new “instant breakfast” product by feeding it to newly weaned male white rats. The response variable is a rat’s weight gain over a 28-day period. A control group of rats eats a standard diet but otherwise receives exactly the same treatment as the experimental group.
a. How many factors does this experiment have?
b. How many levels for each factor?
c. The experimenters had 30 rats for this experiment. How should they set up the experiment?
10. Many utility companies have introduced programs to encourage energy conservation among their customers. An electric company considers placing electronic indicators in a household to show what the cost would be if the electricity use at that moment continued for a month. Will indicators reduce electricity use?
Would a cheaper method work almost as well? The company decides to design an experiment. One cheaper method is to give customers a chart and information about monitoring their electricity use. The experiment compares these two approaches (indicator, chart) and a control. The control receives information about energy conservation but no help in monitoring electricity use. The response variable is total electricity used in a year. The company finds 60 single-family residences in the same city that are willing to participate. What will the design look like?

Test 2
1. A marketing company offers to pay $35 to the first 200 persons who respond to its advertisement and complete a questionnaire regarding displays of its client’s product. The situation is an example of which of the following?
a. Simple random sample
b. Convenience sample
c. Voluntary response sample
d. Multistage cluster sample
e. None of the above
2. A simple random sample was selected of large urban school districts throughout New England. The selected districts were identified as target districts. Within each district, a simple random sample of its high schools was chosen and the principals of those high schools were interviewed. Which of the following statements regarding this design is NOT true?
a. This is an example of a multistage cluster sample.
b. Results from the interviews cannot be used to infer responses of the population of interest.
c. The population of interest is the set of all high school principals from large urban school districts in New England.
d. Not every subset of principals has the same chance of selection.
e. All of these statements are true.
3. Which of the following is an example of a census?
a. Every fifth person leaving a supermarket is asked to name his or her favorite brand of peanut butter.
b. Each employee in a corporation fills out a questionnaire for a management survey.
c.
All the students who are at a school on a particular day rate the food in the cafeteria.
d. A telephone political poll selects ten names from every page of a city directory.
e. All the commuters who are dissatisfied with the service of their commuter train company are asked to write a letter of complaint.
4. Subjects are randomly assigned to watch either a horror movie or a comedy, and the amount of popcorn they eat during the movie is measured. For this experiment, the type of movie is
a. An independent variable
b. A dependent variable
c. A continuous variable
d. A constant
5. Use the following excerpt from a random digit table to answer this question:
21052 65031 45074 92846 67815 78231 01548 20235 56410 82713
If data are labeled 1: Chevy; 2: Plymouth; 3: Lincoln; 4: Volkswagen; 5: Porsche; 6: Ford; and single-digit random digit selection begins at the left side of the row, which cars would be included in a simple random sample of three cars?
a. Plymouth, Lincoln, Chevy
b. Plymouth, Chevy, Porsche
c. Plymouth, Ford, Porsche
d. Lincoln, Plymouth, Porsche
e. None of the above
6. A nutritionist wants to study the effect of storage time (6, 12, 18 months) on the amount of vitamin C present in freeze-dried fruit when stored for these lengths of time. Vitamin C is measured in milligrams per 100 milligrams of fruit. Six fruit packs were randomly assigned to each of the three storage times. The treatment, experimental unit and response are, respectively,
a. A specific storage time, amount of vitamin C, a fruit pack
b. A fruit pack, amount of vitamin C, a specific storage time
c. Random assignment, a fruit pack, amount of vitamin C
d. A specific storage time, a fruit pack, amount of vitamin C
e. A specific storage time, the nutritionist, amount of vitamin C
7.
Match the words in the first column with their correct definitions (second column).
Hypothesis              any factor in an experiment that changes
Constant                any factor that is not allowed to change
Control                 a statement of a possible relationship between the independent and dependent variables
Independent variable    used to reduce the effects of chance errors
Dependent variable      the factor in an experiment that is changed on purpose
Variable                the factor in an experiment that responds to the purposely changed factor
8. Researchers are interested in the effects of repeated exposure to an advertising message. All subjects view a 40-minute television program that includes ads for a camera. Some subjects saw a 30-second commercial, some a 90-second commercial. The same commercial was repeated 1, 3, or 5 times during the program. After viewing, the subjects answered questions about their recall of the ad, their attitude toward the camera and their intention to purchase it. There were 36 subjects, 24 women and 12 men.
a. What are the treatments?
b. What are the response variables?
c. Why would a block design be desired?
d. Outline the design of this experiment.
9. Some schools teach reading using phonics and others using whole language (word recognition). Suppose a school district wants to know which method works better. Suggest a design for an appropriate experiment.

1.7 Sampling Distributions and Estimations

Z Score and Central Limit Theorem

Quiz 1
1. A sample is chosen randomly from a population that can be described by a normal model.
a. What is the sampling distribution model for the sample mean? Describe shape, center and spread.
b. If we choose a larger sample, what is the effect on the sampling distribution model?
2. Assume that the duration of human pregnancies can be described by a Normal model with mean 266 days and standard deviation 16 days.
a. What percentage of pregnancies should last between 260 days and 278 days?
b.
Suppose a doctor is currently providing prenatal care to 70 pregnant women. He is interested in the mean length of their pregnancies. What is the distribution of the mean length of pregnancy?
c. What is the probability that the mean duration of these patients’ pregnancies will be less than 260 days?
3. State the Central Limit Theorem.

Quiz 2
1. There is a city in upstate New York that gets an average of 35.4 inches of rain each year with a standard deviation of 4.2 inches. Assume the Normal model applies.
a. During what percentage of years does this city get more than 45 inches of rain?
b. Less than how much rain falls in the driest 25% of all years?
2. Using the same information as in problem 1, let ȳ represent the mean amount of rain for eight years.
a. Describe the sampling distribution model of this sample mean.
b. What is the probability that those 8 years average less than 32 inches of rain?
3. State the conditions that the Central Limit Theorem requires. (random sampling, independent values)

Quiz 3
1. Carbon monoxide emissions for a certain kind of car vary with mean 2.7 gm/mi and standard deviation 0.6 gm/mi. A company has 90 of these cars in its fleet.
a. What percentage of cars would have an emission of carbon monoxide between 2.5 gm/mi and 3.0 gm/mi?
2. Using the same information as in problem 1, let ȳ represent the mean carbon monoxide level for the company’s fleet.
a. What is the approximate model for the distribution of ȳ?
b. Estimate the probability that ȳ is between 3.0 and 3.1 gm/mi.
c. There is only a 6% chance that the fleet’s mean carbon monoxide level is greater than what level?
3. True or False: The Central Limit Theorem only applies to samples that are drawn from a population that is normally distributed.

Binomial Distribution and Binomial Experiments

Quiz 1
1. In a test for ESP, a subject is told that cards which the experimenter can see, but the subject cannot, contain a star, a circle, a wave, or a square.
As the experimenter looks at each of the 20 cards in turn, the subject names the shape on the card. A subject who is just guessing has probability .25 of guessing correctly on each card.
a. The count of correct guesses in 20 cards has a binomial distribution. What are n and p?
b. What is the mean number of correct guesses?
c. What is the probability of exactly 5 correct guesses?
2. An engineer chooses an SRS of 10 switches from a shipment of 10,000 switches. Suppose that (unknown to the engineer) 10% of the switches in the shipment are bad. The engineer counts the number X of bad switches in the sample. Is this a binomial setting? Explain.
3. Suppose the probability that an environmental engineer successfully lands a consulting job is .30 on each job bid on. Assume that the consulting jobs bid on are independent, and let x be the number of jobs landed in 5 jobs bid.
a. What is the probability of landing exactly 3 consulting jobs?
b. Find the probability of landing at most 2 jobs.
c. Find the probability of landing at least 4 jobs.
d. Determine the mean and standard deviation of x.

Quiz 2
1. A federal report finds that lie detector tests given to truthful persons have probability .2 of suggesting that the person is deceptive.
a. A company asks 12 job applicants about thefts from previous employers, using a lie detector to assess their truthfulness. Suppose that all 12 answer truthfully. What is the probability that the lie detector says all 12 are truthful? What is the probability that the lie detector says at least one is deceptive?
b. What is the mean number among 12 truthful persons who will be classified as deceptive? What is the standard deviation of this number?
2. You observe the sex of the next 20 children born at a local hospital; X is the number of girls among them. Is this a binomial situation?
3. Blood type is inherited.
If both parents carry genes for the O and A blood types, each child has probability .25 of getting two O genes and so of having blood type O. Different children inherit independently of each other. The number of O blood types among 5 children of these parents is the count X of successes in 5 independent observations with probability .25 of a success on each observation. So X has the binomial distribution with n = 5 and p = .25. What is the probability that at least two children will be born with blood type O?

Quiz 3
1. A test for the presence of antibodies to the AIDS virus in blood has probability 0.99 of detecting the antibodies when they are present. Suppose that during a year 20 units of blood with AIDS antibodies pass through a blood bank.
a. Take X to be the number of these 20 units that the test detects. What is the distribution of X?
b. What is the probability that the test detects all 20 contaminated units? What is the probability that at least one unit is not detected?
c. What is the mean number of units among the 20 that will be detected? What is the standard deviation of the number detected?
2. A couple decides to continue to have children until their first girl is born; X is the total number of children the couple has. Is this a binomial situation?
3. Bolts produced by a machine vary in quality. The probability that a given bolt is defective is 0.03. A random sample of 35 bolts is taken from the week’s production. If X denotes the number of defectives in the sample, find the mean and standard deviation of X.

Confidence Intervals

Quiz 1
1. The average composite ACT score for students who took the test in 2003 was 21.4. Assume that the standard deviation is 1.05.
a. In a random sample of 36 students who took the exam, what is the probability that the average composite ACT score is 22 or more?
b. Find a 90% confidence interval for µ, based on the sample information given above.
2.
A survey designed to obtain information on the proportion of registered voters who are in favor of a constitutional amendment requiring a balanced budget results in a sample size of n = 400. Of the 400 voters sampled, 272 are in favor of a constitutional amendment requiring a balanced budget.
a. Give a point estimate of the population proportion in favor of a balanced budget amendment.
b. Determine the estimated standard deviation of your point estimate.
c. Calculate a 99% confidence interval for the population proportion in favor of the amendment.
d. How large would n have to be in order to estimate the population proportion to within .03 with 95% confidence?

Quiz 2
1. Let x denote the amount of money spent by a tourist visiting the Grand Canyon. Historical information reveals that x is normally distributed with a mean of $250 and a standard deviation of $60.
a. For a sample of n = 16, determine the mean and the standard deviation of the sampling distribution of the sample mean.
b. For a sample of n = 36, determine the approximate probability that X̄ is greater than $280.
c. For a sample of n = 36, determine the approximate probability that the total amount of money spent by the 36 tourists is greater than $9,000.
2. A survey of 40,000 American households in 1987 found that 30.5% of those in the sample had a pet cat.
a. Use this sample information to form a 99% confidence interval to estimate the true proportion of all American households that owned a cat in 1987.
b. Write a sentence interpreting the meaning of this interval.

Quiz 3
1. In a survey of American households, 75.1% of the households claimed to have made a financial contribution to charity in the past year.
a. If the survey had involved 1000 households, what would a 95% confidence interval be?
b. Interpret, in words, the meaning of this confidence interval.
c.
Describe how increasing the number of households involved in the survey would change the 95% confidence interval.
2. The manager of an electronics department at a large department store is interested in knowing the mean size of TV screens (µ) that a customer purchases. Based upon industry standards it is believed that the standard deviation is 4 inches.
a. If a sample of n = 36 yields a sample average TV screen size of 21.2 inches, calculate a 90% confidence interval for µ.
b. Determine how large a sample is needed in order to estimate µ to within 1 inch with 95% confidence.

Sums and Differences of Independent Random Variables
Quiz 1
1. Consider the following two experiments: the first has outcome X taking on the values 0, 11, and 2, with equal probabilities; the second results in an (independent) outcome Y taking on the value 3 with probability 1/4 and 4 with probability 3/4.
a. Find the distribution of Y + X.
b. Find the mean and variance of Y + X.
2. Suppose X and Y are independent random variables. The variance of X is equal to 16, and the variance of Y is equal to 9. Let Z = X − Y.
a. Find the standard deviation of Z.
3. Given independent random variables with means and standard deviations as shown, find the mean and standard deviation of each of these variables:
Table 1.8:
     Mean   SD
X    80     12
Y    12     3
a. 2Y + 20
b. .25X + Y
c. X − 5Y

Quiz 2
1. Consider the following two experiments: the first has outcome X taking on the values 0, 11, and 2, with equal probabilities; the second results in an (independent) outcome Y taking on the value 3 with probability 1/4 and 4 with probability 3/4.
a. Find the distribution of Y − X.
b. Find the mean and variance of Y − X.
2. You roll a die. If it comes up a 6 you win $100. If not, you get to roll again. If you get a 6 the second time you win $50. If not, you lose.
a. Create a probability model for the amount you win at this game.
b. Find the expected amount you'll win.
c.
How much would you be willing to pay to play this game?
3. Given independent random variables with means and standard deviations as shown, find the mean and standard deviation of each of these variables:
Table 1.9:
     Mean   SD
X    120    12
Y    300    16
a. .8Y
b. 2X − 100
c. 3X − Y

Quiz 3
1. A couple plans to have children until they get a girl, but they agree that they will not have more than three children even if all are boys. Assume boys and girls are equally likely.
a. Create a probability model for the number of children they will have.
b. Find the expected number of children.
c. Find the expected number of boys they will have.
2. Given independent random variables with means and standard deviations as shown, find the mean and standard deviation of each of these variables:
Table 1.10:
     Mean   SD
X    80     12
Y    12     3
a. X − 20
b. .5Y
c. X + Y
3. A grocery supplier believes that in a dozen eggs, the mean number of broken eggs is 0.7 with a standard deviation of .4 eggs. You buy 4 dozen eggs without checking them.
a. How many broken eggs do you expect to get?
b. What is the standard deviation?
c. What assumptions do you have to make about the eggs in order to answer the questions?

Student's T
Quiz 1
1. Find the critical value of t for a 90% confidence interval with 18 degrees of freedom.
2. Given the following sample data about automobile speeds in a residential area, find the 90% confidence interval for the true mean speed of the vehicles. Assume that the data satisfies the necessary conditions so that it can be approximated by a t-distribution.
Speed: 31 29 31 34 27 34 37 28 29 30 26 29 24 38 34 31 36 29 31 34 34 32 36
3. True or False: In order to use the t statistic, our sample must come from a normal population.

Quiz 2
1. Find the critical value of t for a 95% confidence interval with 15 degrees of freedom.
2.
Students weighed six bags of chips and recorded the following weights (in grams): 29.2, 28.5, 27.7, 27.9, 28.1, 28.5. The company claims bags of their chips weigh 28.3 grams.
a. Find the mean and standard deviation of the observed values.
b. Create a 95% confidence interval for the mean weight of such bags of chips.
c. Explain in context what your interval means.
3. True or False: A t statistic is used when the sample is taken from a normal distribution and the standard deviation is known.

Quiz 3
1. Find the critical value of t for a 99% confidence interval with 18 degrees of freedom.
2. A consumer group tested 14 brands of vanilla ice cream and found the following numbers of calories per serving:
160 200 220 230 120 180 140 130 170 190 80 120 100 170
a. Create a 95% confidence interval for the average calorie content of vanilla ice cream.
b. Explain what your interval means. (Sample answer: based on the sample, we are 95% confident that the average calorie content of vanilla ice cream lies within the interval.)
3. True or False: To use the t statistic, the sample should come from an underlying population which is normal, the standard deviation of the population is unknown, and the sample size is small.

Test 1
1. The bound on the error of estimation associated with a 95% confidence interval for µ is computed, resulting in the interval (112.4, 121.6). Then
a. 95% of the time, µ falls within the interval (112.4, 121.6)
b. There is a 95% chance that µ falls within the interval (112.4, 121.6)
c. 95% of all the possible values of µ fall within the interval (112.4, 121.6)
d. 95% of all the possible samples produce intervals that do capture µ
2. If σ = 10, then the sample size required to estimate a population mean µ to within .5 with 95% confidence is
a. 40
b. 119
c. 1257
d. 1537
3. Which of the following is not a property of the t distribution?
a. The t curve is centered at 0 and is bell shaped.
b. The t curve is more spread out than a z curve.
c.
The t curve tends to spread out as the degrees of freedom increases.
d. The formula for a t variable is t = (x̄ − µ)/(s/√n)
4. Which of the following statements concerning the Central Limit Theorem is true?
a. The Central Limit Theorem predicts that the distribution of x̄ follows a normal distribution for every sample size n.
b. The Central Limit Theorem predicts that the distribution of p follows a normal distribution for every sample size n.
c. The Central Limit Theorem predicts that the distribution of p will be reasonably close to a normal distribution when n ≥ 30.
d. The Central Limit Theorem predicts that the distribution of x̄ will be reasonably close to a normal distribution when n ≥ 30.
5. The t distribution that you use to find your critical values closely resembles the normal distribution when:
a. The sample mean is large
b. The sample variance is large
c. The sample size is large
d. The population standard deviation is large
6. A pharmacist finds that 55% (π = .55) of all customers prefer name brand prescription drugs to generic prescription drugs. For a sample of n = 25 customers, let p be the sample proportion of customers who prefer name brand prescription drugs.
a. Determine the mean and the standard deviation of the sampling distribution of p.
b. Is it reasonable to assume that the sampling distribution of p is approximately normally distributed?
c. What is the approximate probability that the sample proportion p is between .50 and .70?
7. The gas mileage for a certain model of car is known to have a standard deviation of 5 miles/gallon. A simple random sample of 64 cars of this model is chosen and found to have a mean gas mileage of 27.5 miles/gallon. Construct a 95% confidence interval for the mean gas mileage for this car model. Interpret the interval in words.
8. Using the following data and your calculator, construct a 90% confidence interval for the mean of the population from which the data was taken.
Assume the underlying population is normally distributed.
29 34 34 28 30 29 38 31 29 34 32
9. Bolts are packed in boxes of 20. The probability of a bolt being defective is 0.1. What is the probability of a box containing 2 defective bolts? (.2852)
10. Let x1 and x2 be random variables with means and standard deviations given below:
Table 1.11:
Random variable   Mean   Standard deviation
x1                12     2
x2                14     15
a. Determine µ_(x1 + x2)
b. Determine σ_(x1 + x2)

Test 2
1. The bound on the error of estimation associated with a 95% confidence interval for µ is
a. 1.96 σ
b. 1.96 σ/√n
c. 1.96 σ/n
d. 1.96
2. Which of the following does not influence the width of a large sample confidence interval for µ?
a. x̄
b. The standard deviation of the population
c. The confidence level
d. The sample size
3. The Central Limit Theorem predicts that
a. The sampling distribution of x̄ will be approximately normal for reasonably large samples.
b. The sampling distribution of µx will be approximately normal for reasonably large samples.
c. The mean of the sampling distribution of x̄ will tend to be close to µ for reasonably large samples.
d. The mean of the sampling distribution of µx will tend to be close to µ for reasonably large samples.
4. A t interval is used in place of the z interval when which of the following must be estimated?
a. The sample mean
b. The population mean
c. The sample standard deviation
d. The population standard deviation
5. The critical t values that are used to find a confidence interval for the population mean will get larger if
a. The sample size becomes smaller
b. The level of confidence is made smaller
c. The standard deviation becomes smaller
d. The population mean becomes larger
6. A random sample is selected from a population that has a proportion of successes π = .7
a. Determine the mean and the standard deviation of the sample proportion for a sample size of n = 9.
b.
For a sample of n = 25, determine the approximate probability that the sample proportion will be within .25 of π.
7. A simple random sample of 75 female adults living in a particular city was taken to study the amount of time they spent per week doing rigorous exercise. It indicated a mean of 73 minutes with a standard deviation of 21 minutes. Find the 99.5% confidence interval of the mean for all females in this city. Interpret this interval in words.
8. Using the following data and your calculator, construct a 90% confidence interval for the mean of the population from which the data was taken. Assume the underlying population is normally distributed.
31 27 37 29 26 24 34 36 31 34 36 21
9. Batteries are packaged in boxes of 10. The probability of a battery being faulty is 2%. What is the probability of a box containing 2 faulty batteries? (0.0153)
10. Let x1 and x2 be random variables with means and standard deviations given below:
Table 1.12:
Random variable   Mean   Standard deviation
x1                12     2
x2                15     1
a. Determine µ_(x1 + x2)
b. Determine σ_(x1 + x2)

1.8 Hypothesis Testing
Hypothesis testing and the P value
Quiz 1
1. Bars of Choco are claimed by the manufacturer to have a mean mass of 102.5 grams. A test is carried out to see whether the mean mass of Choco bars is less than 102.5 grams. State the null and alternative hypotheses.
2. In a criminal trial in the United States the jury is always told that the defendant is "innocent until proven guilty".
a. What must a member of the jury assume about the defendant at the beginning of the trial? H0: ______ (HINT: one word)
b. It is the prosecuting attorney's job to present evidence to the jury. If there is enough evidence ("beyond a reasonable doubt"), then the jury will convict the defendant of the crime. If the defendant is convicted, the jury is rejecting the null hypothesis. Ha: ______
c. When the jury convicts someone of a crime, their verdict is GUILTY. Is this "Reject H0" OR "Fail to Reject H0"?
d.
If the jury fails to convict someone of a crime, their verdict is NOT GUILTY. Is this "Reject H0" OR "Fail to Reject H0"?
e. Sometimes the jury makes a correct decision and sometimes the jury makes a mistake.
   a. Write a sentence describing a Type I error in the U.S. criminal justice system.
   b. Write a sentence describing a Type II error in the U.S. criminal justice system.
3. Using a z distribution, find the critical value for a two-tailed test at α = .02.
4. In a one-sided hypothesis test at α = .02, if the p value is .001, what is the decision you will make regarding the null hypothesis?
5. A clean air standard requires that vehicle exhaust emissions not exceed specified limits for various pollutants. Many states require that cars be tested annually to ensure that they meet the standard. State regulators are checking up on repair shops to see if they are certifying cars that do not meet the standard.
a. In this problem, what is meant by the power of the test the regulators are conducting? (Probability of detecting that the shop is not meeting standards when it is not)
b. Will the power be greater if they test 20 or 40 cars? (40 cars)

Quiz 2
1. The mean factory assembly time for a particular electronic component is 84 seconds. It is required to test whether the introduction of a new procedure results in a different assembly time. State the null and alternative hypotheses.
2. Medical tests have been developed to detect many serious diseases (such as cancer and HIV). A medical test is designed to give correct results as often as possible, that is, to minimize the occurrence of "false positives" and "false negatives". A doctor starts by assuming that a patient is healthy (no disease), then looks for evidence to contradict that assumption. If the patient has a negative test result, the doctor continues to assume that the patient is healthy. If the patient has a positive test result, the doctor concludes that the patient has a disease.
a. State H0 and Ha.
b.
When will the doctor reject H0?
c. When will the doctor fail to reject H0?
d. What kind of an error is a "false positive"? EXPLAIN.
e. What kind of an error is a "false negative"? EXPLAIN.
f. What are the consequences of a false positive? Of a false negative?
3. Using a z distribution, find the critical value for a two-tailed test at α = .03; for a one-tailed test, Ha: µ ≥ d for some constant d.
4. In a two-sided hypothesis test at α = .02, if the p value is .03, what is the decision you will make regarding the null hypothesis?
5. A company is sued for job discrimination because only 17% of the newly hired candidates were minorities when 30% of all who applied were minorities. Is this strong evidence that the company's hiring practices are discriminatory?
a. In this problem, describe what is meant by the power of the test.
b. If the hypothesis is tested at the 5% level of significance instead of 1%, how will the power be affected?

Quiz 3
1. In a report it was stated that the average age of all hospital patients was 53 years. A newspaper believes that this figure is an underestimate. State the null and alternative hypotheses.
2. For the following conjecture: "The teacher will never check homework today."
a. Describe in words the null and alternative hypotheses.
b. Describe the type I and type II errors for this conjecture.
c. Describe the ramifications of making these errors in the context of the problem.
3. Using a z distribution, find the critical value for a two-tailed test at α = .05.
4. In a one-sided hypothesis test at α = .05, if the p value is .039, what is the decision you will make regarding the null hypothesis?
5. A professor notes that for the past several years about 11% of the students who initially enroll in her class withdraw before the end of the semester. She is offered some software to buy which, the salesperson claims, will help make the class more interesting.
She can use the software for a semester, at no cost, to see if the dropout rate goes down significantly. She only has to pay for the software if she chooses to continue to use it beyond the semester.
a. Is this a one- or two-tailed test?
b. Explain what will happen if the professor makes a type II error.
c. What is meant by the power of this test?

Testing a Mean Hypothesis
Quiz 1
1. Two pharmacists are concerned about their supply of an antibiotic. Mean release potency for unaffected antibiotic pills is 910 with a standard deviation of 6.8. The pharmacists test 10 lots of antibiotic and get the following potency data:
900 901 910 877 913 909 908 905 916 918
The pharmacists are concerned that the antibiotic's potency is less than the standard release potency of 910 and are going to run a significance test.
a. What are the null and alternate hypotheses?
b. Use a .01 level of significance and a z test to test the hypotheses.
c. Calculate your p value.
d. What is your conclusion?
e. Describe a type I error in this context. What is the probability of making a type I error?
f. Describe power in this situation.
2. Data for male mean earnings indicates that this figure is $24,000. What can you say about the validity of this figure if a simple random sample of 200 families showed an average earnings level of $23,500, with a standard deviation of $4,000? Use a .05 level of significance. Would your conclusions be any different at a .01 level of significance? (To calculate the standard error:
3. In an advertisement, a pizza shop claims that its mean delivery time is less than 30 minutes. A random selection of 36 delivery times yields a sample mean of 28.5 minutes and a standard deviation of 3.5 minutes. Does this provide sufficient evidence to support the claim at a significance level of α = .01?

Quiz 2
1. Let µ denote the mean cholesterol of heart attack patients under the age of 50.
The American Medical Association (AMA) has claimed that cholesterol levels of 240 and higher dramatically increase the risk of heart attacks. A random sample of the cholesterol levels of 15 heart attack patients age 50 and under yields x̄ = 247 and s = 17.3. For testing H0: µ = 240 versus Ha: µ > 240
a. Calculate the value of the test statistic for testing the null hypothesis.
b. Assuming that cholesterol levels are normally distributed, determine as closely as possible the p-value associated with the value of the test statistic you found in part a).
c. Using a significance level of .05, does the sample data support the hypothesis that heart attack patients under the age of 50 have mean cholesterol level greater than 240? Explain.
2. The University of Higher Learning maintains that the average annual income of a graduate one year after graduation is 47,550. You suspect this is actually lower. You take an SRS of size 30 of individuals who graduated from the University of Higher Learning one year ago, and find the average annual income to be 38,790. Suppose that the standard deviation of annual income of all graduates from the University of Higher Learning one year after graduation is 12,500.
a. Calculate the z-test statistic.
b. Perform a z-test of significance and give the p-value.
c. What is your conclusion?
3. A franchise reports an average of 150 sales per day. You suspect that this average is inaccurate, so you randomly select 43 days and determine the number of sales for each day; the mean turns out to be 143 with a standard deviation of 15 sales. At α = .05, is there enough evidence to support your claim? At the level α = .01?

Quiz 3
1. The department of natural resources reports that a fish is unsafe to eat if the polychlorinated biphenyl (PCB) concentration exceeds 5 parts per billion (ppb). A sample of 10 fish taken from a local lake results in the data listed below.
2.9 4.7 7.6 6.9 4.8 4.9 5.2 3.7 5.1 3.8
a.
Calculate point estimates of µ and σ.
b. Will you use a t or z test statistic? Explain.
c. For testing H0: µ = 5 versus Ha: µ > 5, calculate the value of the test statistic.
d. Assuming that the levels of PCBs are normally distributed, determine the p-value as closely as possible.
e. Does the data suggest that the fish from this lake should not be eaten? Explain.
2. You want to estimate the average height of an enchanted ogre. No one knows the standard deviation of heights of enchanted ogres. You take a random sample of 80 enchanted ogres and find that the average height is 7.8 feet, with a standard deviation of 1.2 feet.
a. Ted the Wizard says the mean height of enchanted ogres is 8.5 feet. Your sample leads you to believe that the mean height is actually lower. Perform a test of significance and state your conclusion.
b. No one knows if the distribution of heights of enchanted ogres is normal. Why does this not matter in this case?
3. Suppose you work for a Drug Corporation and are testing their expensive new drug, which has been approved. The pill form of the drug is to be manufactured at 325 mg. Conduct a test at the α = .01 significance level, for the purpose of either recommending that the manufacturing plant adjust their manufacturing process or not. Assume that you have obtained 185 pills from the manufacturing plant, for which the mean concentration is 327.55 mg and s = 9.2 mg.

Testing a Proportion Hypothesis
Quiz 1
1. A lawsuit against a chemical company alleges that neighbors of a chemical plant have higher than normal cancer rates. The group filing the lawsuit randomly sampled 71 people who lived within 10 miles of the plant, and they found that 18 of those people had the cancer. By comparison, a random sample of 500 people in the general population found 74 cases of the same cancer. Is the cancer rate in the sample significantly higher than the rate in the general population?
2.
In July 2002 the American Journal of Clinical Nutrition reported that 42% of 1546 randomly selected African American women studied had vitamin D deficiency. The data came from a national nutrition study conducted by the Centers for Disease Control in Atlanta.
a. Find the value of σ_p̂ and sketch the sampling distribution.
b. Create a 95% confidence interval.
3. Cancer Rates. A lawsuit against a chemical company alleges that neighbors of a chemical plant have higher than normal cancer rates. The group filing the lawsuit randomly sampled 71 people who lived within 10 miles of the plant, and they found that 18 of those people had the cancer. It is known that the cancer rate in the general population is .15. Is the cancer rate in the sample significantly higher than the rate in the general population? (You may assume that the events and the two samples are independent.)
a. Write appropriate hypotheses.
b. Test the hypothesis at the 1% significance level (α = .01).
c. Explain your conclusion in the context of the problem.
d. Describe two ways to increase the power of this test.

Quiz 2
1. A parcel delivery service claims that at least 80% of their parcels are delivered within 48 hours of posting. A check on 200 randomly selected parcels found that 152 were delivered within 48 hours of posting. Test the delivery service's claim at the 5% significance level.
2. Vitamin D. In July 2002 the American Journal of Clinical Nutrition reported that 42% of 1546 randomly selected African American women studied had vitamin D deficiency. The data came from a national nutrition study conducted by the Centers for Disease Control in Atlanta. (You may assume independence.)
b. Find the value of σ_p̂.
c. Create a 95% confidence interval.
d. Interpret the meaning of this interval.
3. Explain the difference between a p value and p̂.

Quiz 3
1.
The dropout rate of students enrolled at a certain university is reported to be 13.2%. The Dean of Students suspects that the dropout rate for science students is greater than 13.2%, and she examines the records of a random sample of 95 of these students. The number of dropouts was found to be 20. Test the Dean's suspicion at the 5% significance level.
2. A random sample of 500 males was selected from a town in France and 53 men were found to be color-blind. Find a 95% confidence interval for the proportion of color-blind males in France.
3. Free Throws. During her first basketball season, a player made 49 out of 55 free throws attempted.
a. Find a 95% confidence interval for her percentage of free throws.
b. Based on your interval, would you say she is a better free throw shooter than her teammates, whose percentage is .75?

Testing a Hypothesis for dependent and independent samples
Quiz 1
1. An animal researcher carried out an experiment to see if the type of food fed to a rat would affect the rat's running time through a maze. Four rats were fed bran and 8 rats were fed the regular food. The running time is in seconds. Here is the data:
Table 1.13:
Bran   Regular Food
32     118
38     78
27     82
45     91
       67
       41
       97
       42
a. Does this scenario involve dependent or independent samples? Explain.
b. What would the hypotheses be for this scenario?
c. Compute the pooled estimate for population variance.
d. Calculate the estimated standard error for this scenario.
e. What is the test statistic, and at an alpha level of .05 what conclusions would you make about the null hypothesis?
2. A researcher was concerned about the effects of anxiety on test scores and investigated the effectiveness of relaxation training. The subjects were given a test before and after the training and a measure of their anxiety was taken after each test. Following is the data:
Table 1.14:
Before   After
84       72
76       70
104      90
103      94
91       93
90       90
a.
What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario. (Level 2)
c. Compute the standard error of the difference for these samples. (Level 2)
d. What is the test statistic, and at an alpha level of .05 what conclusions would you make about the null hypothesis?

Quiz 2
1. You want to know if attending summer school helps students' grades to improve. Six students repeat a course they did poorly in during the school year. Assume that these six students are representative of all students who might attend summer school. Do these results provide evidence that the summer school program is worthwhile?
June   August
54     50
49     65
68     74
66     64
62     68
62     72
a. What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario.
c. Compute the standard error of the difference for these samples.
d. What is the test statistic, and at an alpha level of .05 what conclusions would you make about the null hypothesis?
2. Can boys do more push-ups than girls? To answer this question, students at a high school, as part of a physical fitness test, were asked to do as many push-ups as they could. Assume that students at the high school were assigned to gym classes randomly. Here is the data:
Boys   Girls
11     24
34     7
17     14
27     16
31     2
17     15
25     19
32     25
28     10
23     27
25     31
16     8
a. Does this scenario involve dependent or independent samples? Explain.
b. What would the hypotheses be for this scenario?
c. Compute the pooled estimate for population variance.
d. Calculate the estimated standard error for this scenario.
e. What is the test statistic, and at an alpha level of .05 what conclusions would you make about the null hypothesis?

Quiz 3
1. You want to know whether people are likely to offer a different amount for a used bicycle when buying from a friend than when buying from a stranger.
Following is the data collected:
Table 1.15:
Buying from a friend   Buying from a stranger
275                    260
300                    250
260                    175
300                    130
255                    200
275                    225
290                    240
300
a. State the null and alternative hypotheses.
b. Perform a two-sample t test.
c. Calculate the p value.
d. State your decision.
2. Do you get better gas mileage when you use premium gas rather than regular gas? To study this question, 10 cars from a company fleet were used. Each car was filled with regular or premium gas, depending on the toss of a coin, and the mileage for that tank was recorded. Then the mileage was recorded for the same cars for a tankful of the other kind of gas. Here are the results:
Car #   Regular   Premium
1       16        19
2       20        22
3       21        24
4       22        24
5       23        25
6       22        25
7       27        26
8       25        26
9       27        28
10      28        32
a. What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario.
c. Compute the standard error of the difference for these samples.
d. What is the test statistic, and at an alpha level of .05 what conclusions would you make about the null hypothesis?

Test 1
1. Samples of hamburger were selected from two different outlets of a large supermarket to measure the percentage of fat present in the meat, with the following summary data.
Table 1.16:
           N    Mean (percent)   Std. dev. (percent)
Outlet 1   5    10.3             1.6
Outlet 2   10   10.7             2.3
It is reasonable to believe that both outlets have the same variability. Hence, the pooled standard deviation is:
a. 1.95
b. 2.08
c. 4.38
d. 2.09
e. 2.11
(Solution: e)
2. The degrees of freedom of the pooled estimate in the previous question is:
a. 15
b. 13
c. 7.5
d. 5
e. 10
(b)
3. In a test of H0: µ = 100 against HA: µ ≠ 100, a sample of size 10 produces a sample mean of 103 and a p-value of 0.08. Thus, at the 0.05 level of significance:
a. there is sufficient evidence to conclude that µ ≠ 100.
b. there is sufficient evidence to conclude that µ = 100.
c.
there is insufficient evidence to conclude that µ = 100.
d. there is insufficient evidence to conclude that µ ≠ 100.
e. there is sufficient evidence to conclude that µ = 103.
Solution: d - you always try to collect evidence against the null
4. In a test of H0: µ = 100 against HA: µ ≠ 100, a sample of size 80 produces Z = 0.8 for the value of the test statistic. The p-value of the test is thus equal to:
a. 0.20
b. 0.40
c. 0.29
d. 0.42
e. 0.21
Solution: d
5. In order to study the amounts owed to the city, a city clerk takes a random sample of 16 files from a cabinet containing a large number of delinquent accounts and finds the average amount X owed to the city to be $230 with a sample standard deviation of $36. It has been claimed that the true mean amount owed on accounts of this type is greater than $250.
a. What are the null and alternate hypotheses?
b. At the .05 level of significance, compute the test statistic and p value.
c. State your decision.
6. A physician wants to compare the blood pressures of six patients before and after treatment with a drug. The blood pressures are as follows:
Table 1.17:
Patient   Before Drug   After Drug
1         168           171
2         171           170
3         182           180
4         167           173
5         174           178
6         170           172
The physician wants to test if there is a significant change in blood pressure before and after taking the drug at the 0.05 level of significance.
a. State the null and alternate hypotheses.
b. Find the value of the test statistic.
c. What is your decision about the drug?
7. A paired difference experiment is conducted to compare the starting salaries of male and female college graduates who find jobs. Pairs are formed by choosing a male and a female with the same major and similar grade-point averages.
Suppose a random sample of 5 pairs is taken, and the starting salaries (in thousands) are as follows:

Pair   Male   Female
1      25.9   24.9
2      20.0   18.5
3      28.7   27.7
4      13.5   13.0
5      18.8   17.8

Test whether the mean starting salary for males is less than that of females at the .05 level of significance.

Test 2

1. Suppose that the variable you have measured in a sample of subjects does not have a normal distribution in the population. Which of the following is recommended?
a. Convert all the measurements to z-scores.
b. Eliminate as many measurements as necessary until your sample distribution looks like the normal distribution.
c. Use a fairly large sample size (at least 30 or 40).
d. Choose another variable to measure; only a normally distributed variable will give you valid results.
(Solution: c)

2. The main advantage of a one-tailed test, compared to a two-tailed test, is that:
a. Only half the calculation is required
b. Only half of the calculated t value is required
c. There is only half the risk of a type I error
d. A smaller critical value must be exceeded
(Solution: d)

3. As the calculated z-score for a one-sample test gets larger:
a. P gets larger
b. P gets smaller
c. P remains the same but alpha gets larger
d. P remains the same but alpha gets smaller
(Solution: b)

4. In a two-tailed large-sample z test with calculated z = 1.68, the p-value is
a. 0.0930
b. 0.0465
c. 0.9170
d. 0.9535

5. Which of the calculated values of a test statistic would have the smallest p-value?
a. Z = 3.05
b. T = 3.05 with 10 degrees of freedom
c. T = 3.05 with 15 degrees of freedom
d. T = 3.05 with 30 degrees of freedom

6. β is the
a. Significance level
b. P-value
c. Probability of making a type II error
d. Probability of making a type I error

7. Let µ denote the mean cholesterol of heart attack patients under the age of 50.
The American Medical Association has claimed that cholesterol levels of 240 and higher dramatically increase the risk of heart attacks. A random sample of cholesterol levels of 15 heart attack patients age 50 and under yields x̄ = 247 and s = 17.3. For testing H0: µ = 240 versus Ha: µ > 240:
a. Calculate the value of the test statistic for testing the null hypothesis.
b. Assuming that cholesterol levels are normally distributed, determine as closely as possible the p-value associated with the value of the test statistic you found in part a.
c. Using a significance level of .05, does the sample data support the hypothesis that heart attack patients under the age of 50 have a mean cholesterol level greater than 240?

8. It is claimed that college women tend to have higher GPAs than do college men. A random sample of 13 men and 19 women in a college class reported their grade point averages. Here are the summary statistics for that data:

Table 1.18:
         ȳ       s
Men      2.898   0.583
Women    3.330   0.395

a. Calculate the pooled sample standard deviation.
b. Does this sample support the claim? Test an appropriate hypothesis and state your conclusion.

9. A manufacturer wished to compare the wearing qualities of two different types of automobile tires, A and B. To make the comparison, a tire of type A and one of type B were randomly assigned and mounted on the rear wheels of each of five automobiles. The automobiles were then operated for a specified number of miles and the amount of wear was recorded for each tire. These measurements appear below:

Table 1.19:
Automobile   Tire A   Tire B
1            10.6     10.2
2            9.8      9.4
3            12.3     11.8
4            9.7      9.1
5            8.8      8.3

Test the null hypothesis that there is no difference in the average amount of wear for the two types of tires.

1.9 Regressions and Correlation Quizzes

Scatterplots and Linear Correlation

Quiz 1

A teacher believes that the number of hours a student spends studying can, to some degree, predict the student’s score on a quiz.
Consider the following data:

Table 1.20:
Study Hours   Quiz Score
7             5
6             10
11            6
5             4
15            9
11            10
12            8
11            9
7             7

1. Compute the Pearson correlation coefficient between the study hours and quiz score.
2. Find r squared.
3. Interpret both r and r squared in words.
4. Draw a scatterplot of this data and describe the direction and strength of the relationship.

Quiz 2

The following data describes the percentage of rotten apples in a case of fruit based on the number of days of transport to the store.

Days   Percent Rotten
1      5
2      7
3      8
4      12
5      16
6      21

1. Compute the Pearson correlation coefficient between the days of transport and the percent rotten.
2. Find r squared.
3. Interpret both r and r squared in words.
4. Draw a scatterplot of this data and describe the direction and strength of the relationship.

Quiz 3

Consider the following data set:

X    Y
7    18
6    46
11   8
10   25
9    25
18   7

1. Compute the Pearson correlation coefficient between X and Y.
2. Find r squared.
3. Interpret both r and r squared in words.
4. Draw a scatterplot of this data and describe the direction and strength of the relationship.

Least Squares Regression

Quiz 1

Following is data on the amount of money (in dollars) a customer spent on a product and the customer satisfaction (on a scale of 1 to 10) with that product.

Table 1.21:
Dollars   Satisfaction
11        6
18        8
17        10
15        4
9         9
5         6
12        3
19        5
22        2
25        10

1. Plot this data on a scatterplot (X axis: dollars spent, Y axis: customer satisfaction).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. What is the predicted satisfaction of a customer who spends $16?
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary? Explain your answer.
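Teachers who want to check answers to questions 3, 5 and 6 of the quiz above with software may find the following sketch useful. It is not part of the original assessment. It assumes that the flattened Table 1.21 lists the ten dollar amounts first and the ten satisfaction scores second, and the function name least_squares is introduced here only for illustration.

```python
# Data from Table 1.21, ASSUMING the flattened table lists the ten
# dollar amounts first and the ten satisfaction scores second.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares(dollars, satisfaction)

# Question 5: predicted satisfaction for a customer who spends $16.
predicted_at_16 = intercept + slope * 16

# Question 6: residual = observed value minus fitted value.
residuals = [yi - (intercept + slope * xi)
             for xi, yi in zip(dollars, satisfaction)]

print(f"satisfaction = {intercept:.3f} + {slope:.4f} * dollars")
print(f"predicted satisfaction at $16: {predicted_at_16:.2f}")
for xi, ei in zip(dollars, residuals):
    print(f"dollars = {xi:2d}, residual = {ei:+.2f}")
```

The same function can be reused for Quiz 2 and Quiz 3 of this section by replacing the two lists, for example with x = years since 1980 and y = percent below the poverty level.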
Quiz 2

The table below shows the percent of persons below the poverty level in the selected years. Let x = 0 correspond to 1980.

Year   %
1980   13
1985   14
1990   13.5
1991   14.2
1992   14.8
1993   15.1
1994   14.5
1995   13.8

1. Plot this data on a scatterplot (X axis: year, Y axis: percent).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. Estimate the percent of people below the poverty level in 1989.
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary? Explain your answer.

Quiz 3

The following table shows the number of deaths per 100,000 people from heart disease.

Year   Deaths
1950   510
1960   521
1970   496
1980   436
1990   368
1996   358

1. Plot this data on a scatterplot (X axis: year, Y axis: number of deaths).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. Estimate the number of deaths due to heart disease in 1974.
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary? Explain your answer.

Inferences about Regression

Quiz 1

In examining the relationship between a child’s height (in cm) and his or her age (in months), the following summary statistics were output from a computer program:

n = 12

Table 1.22:
Parameter   Estimate   Standard Error
Age         .6350      .0214
Constant    64.9283    .5084

1. What is the predictor variable?
2. Do you think the two variables are correlated? Explain.
3. What would be the regression equation for predicting height from age?
4. Use the regression equation to predict the height of a child who is 18 months old.
5. Test the null hypothesis that the regression coefficient for this scenario is zero.
a. Develop the null and alternate hypotheses.
b. Set the critical values at the .01 level of significance.
c. Compute the test statistic.
d. Make a decision regarding the null hypothesis.
6. Develop a 95% confidence interval for β.

Quiz 2

Given the following summary statistics:

n = 56
x̄ = 39.0
ȳ = 26.5
sx = 5.4
sy = 13.4
r = −.848

1. Find the estimate of the regression coefficient.
2. Find the estimate of the constant.
3. What would the regression equation be for predicting Y from X?

Quiz 3

When inflation is high, lenders require higher interest rates to make up for the loss of purchasing power of their money while it is loaned out. The data we have is the return on one-year Treasury bills and the rate of inflation as measured by the change in the government’s Consumer Price Index in the same year. The data covers 51 years, from 1950 to 2000. Following is output from a computer program which analyzed the data.

Linear Fit
T-bill = 2.6662262 + 0.6269356 Inflation

Summary of Fit
RSquare                      0.448878
RSquare Adj                  0.437631
Root Mean Square Error       2.18016
Mean of Response             5.198431
Observations (or Sum Wgts)   51

Table 1.23: Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob > |t|
Intercept   2.6662262   0.503848    5.29      < .0001
Inflation   0.6269356   0.099239    6.32      < .0001

1. What is the correlation between inflation rates and T-bill returns?
2. What is the slope b1 of the fitted line and its standard error? (See output.)
3. Calculate the t-statistic for testing the hypothesis that there is no straight-line relationship between inflation rate and T-bill return against the alternative that the return on T-bills increases as the rate of inflation increases.
a. State the hypotheses.
b. Calculate t and report its degrees of freedom.
4. Find the regression equation.
5. Find a 90% confidence interval for the slope of the regression line.
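The arithmetic behind questions 1, 3 and 5 of the T-bill quiz above can be sketched as follows. This is an illustrative check, not part of the original assessment. The numbers are copied from the computer output, while the critical value 1.677 for 49 degrees of freedom is an assumption taken from a standard t table (a table that only lists 40 or 50 degrees of freedom gives a slightly different value).

```python
import math

# Values copied from the computer output in the quiz.
r_square = 0.448878
slope = 0.6269356
se_slope = 0.099239
n = 51

# Question 1: the correlation is the square root of R-squared,
# carrying the sign of the slope.
r = math.copysign(math.sqrt(r_square), slope)

# Question 3: t = estimate / standard error, with n - 2 degrees of freedom.
t_stat = slope / se_slope
df = n - 2

# Question 5: 90% confidence interval for the slope.
# ASSUMPTION: upper 5% point of t with 49 df, read from a t table.
t_crit = 1.677
margin = t_crit * se_slope
ci = (slope - margin, slope + margin)

print(f"r = {r:.3f}")
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"90% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The computed t agrees with the t Ratio reported for Inflation in Table 1.23, which is a useful sanity check that the right row of the output was used.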
Multiple Regression

Quiz 1

The experiment took place in February of 1986 at a student dormitory. Sixteen students volunteered to be the subjects in the experiment. Each student blew into a breathalyzer to indicate that his or her initial BAC was zero. The number (between 1 and 9) of 12 ounce beers to be drunk was assigned to each of the subjects by drawing tickets from a bowl. Thirty minutes after consuming their final beer, students had their BAC measured by a police officer of the OSU police department. The officer also administered a road sobriety test before and after the alcohol consumption. This involved performing four simple tasks, graded on a scale of 1 to 10 (ten being a perfect rating), demonstrating coordination: balancing on one foot, touching the tip of one’s nose with a forefinger, placing one’s head back with one’s eyes closed, and walking heel to toe. The police officer was not aware of how much alcohol each subject had consumed.
- taken from the Electronic Encyclopedia for Statistical Examples and Exercises entitled ‘BAC.’

The Variables:
ID = identification number
Gender = indicated by female or male
Weight = weight of each subject in pounds
Beers = number of 12 ounce beers consumed
BAC = blood alcohol content
1st-Sobriety = combined score on the four road sobriety tests before alcohol consumption
2nd-Sobriety = combined score on the four road sobriety tests after alcohol consumption

Following are the summary statistics:

Dependent Variable: BAC
Independent Variable(s): Beers, Weight, Gender = male

Table 1.24: Parameter estimates
Variable            Estimate        Std. Err.      Tstat        P-value
Intercept           0.03870783      0.010972462    3.5277252    0.0042
Beers               0.019895956     0.0013093256   15.195577    < 0.0001
Weight_OSU          −3.444049E−4    6.842001E−5    −5.0336866   0.0003
Gender_OSU = male   −0.0032403069   0.0062860446   −0.5154763   0.6156

Table 1.25: Analysis of variance table for multiple regression model
Source   DF   SS             MS              F-stat     P-value
Model    3    0.027846638    0.009282213     80.81081   < 0.0001
Error    12   0.0013783621   1.14863506E−4
Total    15   0.029225

Root MSE: 0.010717439
R-squared: 0.9528

1. How many predictor variables are there in this scenario?
2. What does the regression coefficient for beers tell us?
3. What is the regression model for this analysis?
4. What is R-squared and what does it indicate?
5. Which of the predictor variables are statistically significant? Explain.

Quiz 2

Researchers are interested in predicting the carbon monoxide (CO) output from cigarettes by using the amount of nicotine and the amount of tar. They take an SRS of 29 brands of cigarettes and measure these quantities. Computer output for the analysis is given below:

Regression Analysis: CO versus NICOTINE, TAR

The regression equation is
CO = 2.47 − 6.43 NICOTINE + 1.32 TAR

Table 1.26:
Predictor   Coef     SE Coef   T       P
Constant    2.4711   0.9766    2.53    0.018
NICOTINE    −6.426   3.416     −1.88   0.071
TAR         1.3184   0.2312    5.70    0.000

S = 1.559   R-Sq = 88.7%   R-Sq(adj) = 87.8%

Analysis of Variance

Table 1.27:
Source           DF   SS       MS       F        P
Regression       2    495.60   247.80   101.90   0.000
Residual Error   26   63.23    2.43
Total            28   558.83

1. What is the equation of the least squares regression line?
2. What is the predicted value of CO content when the level of nicotine in a cigarette is 1 mg and the level of tar is 10 mg?
3. Find the coefficient of determination, R², and interpret it in terms of the problem.

How much alcohol can one consume before one’s Blood Alcohol Content (BAC) is above the legal limit?
An undergraduate statistics project was conducted at The Ohio State University in Columbus, Ohio that explored the relationship between BAC and other factors such as amount of alcohol consumed, gender, weight, and age.

The Study: The experiment took place in February of 1986 at a student dormitory. Sixteen students volunteered to be the subjects in the experiment. Each student blew into a breathalyzer to indicate that his or her initial BAC was zero. The number (between 1 and 9) of 12 ounce beers to be drunk was assigned to each of the subjects by drawing tickets from a bowl. Thirty minutes after consuming their final beer, students had their BAC measured by a police officer of the OSU police department. The officer also administered a road sobriety test before and after the alcohol consumption. This involved performing four simple tasks, graded on a scale of 1 to 10 (ten being a perfect rating), demonstrating coordination: balancing on one foot, touching the tip of one’s nose with a forefinger, placing one’s head back with one’s eyes closed, and walking heel to toe. The police officer was not aware of how much alcohol each subject had consumed.
- taken from the Electronic Encyclopedia for Statistical Examples and Exercises entitled ‘BAC.’

Simple linear regression results:
Dependent Variable: BAC
Independent Variable: Beers
BAC = −0.012700604 + 0.017963761 Beers
Sample size: 16
R (correlation coefficient) = 0.8943
R-sq = 0.79984075
Estimate of error standard deviation: 0.020440951

Table 1.28: Parameter estimates
Parameter   Estimate       Std. Err.      DF   T-Stat       P-Value
Intercept   −0.012700604   0.0126375025   14   −1.0049932   0.332
Slope       0.017963761    0.0024017035   14   7.479592     < 0.0001

Table 1.29: Analysis of variance table for regression model
Source   DF   SS            MS            F-stat      P-value
Model    1    0.023375345   0.023375345   55.944298   < 0.0001
Error    14   0.005849655   4.178325E−4
Total    15   0.029225

4. Interpret the intercept in the context of the problem.
5. Interpret the slope in the context of the problem.
6. What is the regression equation?
7. Predict the BAC of a person who has consumed 15 beers.

Quiz 3

When the Dow Jones stock index first reached 10,000, the New York Times reported the dates on which the Dow first crossed each of the “thousand” marks, starting with reaching 1000 in 1972. A regression of the Dow prices on year looks (in part) like this:

Dependent variable is: Dow
R-squared = 65.8%

Variable   Coefficient
Constant   −603335.00
Year       305.47

1. What is the correlation between the Dow index and the year?
2. Write the regression equation.
3. Explain in this context what the equation says.

A car dealer, specializing in Corvette sports cars, enlarged his facilities and offered a number of models for sale. His sales list includes data on age (in years), mileage (in thousand miles), and selling price (in thousand dollars) of cars. Use the explanatory variables, age and mileage, to predict the selling price of a car.

Regression Analysis

The regression equation is
Price = 34.3 − 1.14 Age − 0.201 Mileage

Table 1.30:
Predictor   Coef       StdErr    T       P
Constant    34.315     1.950     17.59   0.000
Age         −1.1400    0.2943    −3.87   0.002
Mileage     −0.20065   0.06027   −3.33   0.006

R-Sq = 89.2%   S = 3.259   R-Sq(adj) = 87.4%

Analysis of Variance

Table 1.31:
Source           DF   SS        MS       F       P
Regression       2    1048.36   524.18   49.35   0.000
Residual Error   12   127.45    10.62
Total            14   1175.82

4. Write the least-squares fitted regression line.
5. If x1 is held constant, what is the change in ŷ when x2 is increased by 1 unit? This is considered the “slope” for Mileage (x2). Interpret.
6. Use the least-squares fitted regression line to predict the selling price of a car that is 10 years old with a mileage of 55,000 miles. Assume the values given for age and mileage are within the range of data used to calculate the least-squares fitted regression line.

Test 1

1. Foresters use regression to predict the volume of timber in a tree using easily measured quantities. Let y be the volume of timber measured in cubic feet and x be the diameter in feet (measured at 3 feet above ground). The fitted line is y = −30 + 60x. The predicted volume for a tree with a diameter of 18 inches is
a. 1050 cubic feet
b. 600 cubic feet
c. 105 cubic feet
d. 90 cubic feet
e. 60 cubic feet

2. Given the least squares regression line: (cost of a Monopoly property) = 67.3 + 6.78 (spaces from GO). Determine the residual for Reading Railroad, which costs $200 and is 5 spaces from GO.
a. −98.8
b. −9.88
c. 98.8
d. −1418.3
e. A residual has no meaning since one of the variables is categorical.

3. With regard to regression, which of the following statements about outliers are true?
I. Outliers have large residuals.
II. A point may not be an outlier even though its x-value is an outlier in the x-variable and its y-value is an outlier in the y-variable.
III. Removal of an outlier sharply affects the regression line.
a. I and II
b. I and III
c. II and III
d. I, II, and III
e. None of the above gives the complete set of true responses.

4. Consider the three points (2, 11), (3, 17), and (4, 29). Given any straight line, we can calculate the sum of the squares of the three vertical distances from these points to the line. What is the smallest possible value this sum can be?
a. 6
b. 9
c. 29
d. 57
e. Cannot be determined

5. A bivariate data set has a value of r-squared = .81. Which is an appropriate conclusion?
a. r = 0.9
b. 81% of the data is usable.
c. There is an 81% chance that the regression line will fit the data.
d. 81% of the variation between the variables is accounted for by the model.
e. None of these is appropriate.

6. Consider the following data set:

Table 1.32:
School   Football Players’ SAT   All Students’ SAT
1        872                     1140
2        741                     1007
3        826                     1190
4        788                     998
5        838                     1050
6        1034                    1250
7        820                     986
8        897                     1083
9        881                     1009
10       825                     1090

a. Which school’s scores would be influential if you were to make a scatterplot?
b. Using football players’ SAT as your explanatory variable, find the least squares regression line for this data.
c. If a school’s players have an average SAT score of 814, what score would you predict for the entire student body?
d. Find the residual for school number 8.

7. Consider the following computer output of a linear regression analysis. The two variables are Percent 2002 (independent variable) and Total SAT 2002 (dependent variable).

Summary of Fit
RSquare                  .770503
RSquare Adj              .76582
Root Mean Square Error   32.11888
Observations             51

Table 1.33: Analysis of Variance
Source     DF   Sum of Squares
Model      1    169713.02
Error      49   50549.49
C. total   50   220262.51

Table 1.34: Parameter Estimates
Term           Estimate    Std Error
Intercept      1145.1989   7.57007
Percent 2002   2.046837    0.159583

What is the equation for the least squares regression line?

8. Data was collected on two variables X and Y and a least squares regression line was fitted to the data. The estimated equation is Y = −2.29 + 1.70X. What is the residual for the point (5, 6)?

Test 2

1. If the coefficient of determination is calculated as 0.81, then the correlation coefficient is:
a. 0.81
b. 0.9
c. −0.9
d. 0.405
e. Cannot be determined

2. A regression analysis of company profits and the amount of money the company spent on advertising found r-squared to be .72. Which of these is true?
I. This model can correctly predict the profit for 72% of the companies.
II. On average, 72% of a company’s profit results from advertising.
III. On average, companies spend about 72% of their profit on advertising.
a. I only
b. II only
c. III only
d. I and III
e. None are correct

3. Medical records indicate that people with more education tend to live longer; the correlation is .48.
The slope of the linear model that predicts lifespan from years of education suggests that, on average, people tend to live .8 extra years for each additional year of education they have. The slope of the line that would predict years of education from lifespan is
a. .288
b. .384
c. .8
d. 1.25
e. 1.67

4. A regression analysis examines the relationship between the number of years of formal education a person has and their annual income. According to this model, about how much more money do people who finish a 4-year college program earn each year, on average, than those with only a 2-year degree? The dependent variable is Income. R-squared is 25.8%. The coefficient of the constant is 3984.45. The coefficient of education is 2668.45.
a. $2006
b. $2710
c. $5337
d. $7968
e. $9321

5. The correlation between a family’s weekly income and the amount they spend on restaurant meals is found to be r = .30. Which must be true?
I. Families tend to spend about 30% of their income in restaurants.
II. In general, the higher the income, the more the family spends in restaurants.
III. The line of best fit passes through 30% of the data points (income, restaurant$).
a. I only
b. II only
c. III only
d. II and III only
e. I, II and III

6. Which of the following statements about the correlation coefficient are true?
I. The correlation coefficient and the slope of the regression line may have opposite signs.
II. A correlation of 1 indicates a perfect cause-and-effect relationship between the variables.
III. Correlations of +.87 and −.87 indicate the same degree of clustering around the regression line.
a. I only
b. II only
c. III only
d. I and II
e. I, II and III

7. Given a set of ordered pairs (x, y) with sx = 2.5, sy = 1.9, r = .63, what is the slope of the regression line of y on x?
a. 1.9
b. 2.63
c. 0.65
d. 1.32
e. 0.48

8. Lydia and Bob were searching the Internet to find information on air travel in the United States.
They found data on the number of commercial aircraft flying in the United States during the years 1990-1998. The dates were recorded as years since 1990. Thus, the year 1990 was recorded as year 0. They fit a least squares regression line to the data. The graph of the residuals and part of the computer output for their regression are given below.

y = 2939.93 + 233.517x
r = 0.88

a. Is a line an appropriate model to use for these data? What information tells you this?
b. What is the value of the slope of the least squares regression line? Interpret the slope in the context of this situation.
c. What is the value of the intercept of the least squares regression line? Interpret the intercept in the context of this situation.
d. What is the predicted number of commercial aircraft flying in 1992?
e. What was the actual number of commercial aircraft flying in 1992?

9. Following are the distances (in miles) and cheapest airline fares (in dollars) to certain destinations for passengers flying out of Baltimore, Maryland as of January 8, 1995.

Table 1.35:
Destination   Distance   Airfare
Atlanta       576        178
Boston        370        138
Chicago       612        94
Dallas        1216       278
Detroit       409        158
Denver        1502       258
Miami         946        198
New Orleans   998        188
New York      189        98
Orlando       787        179
Pittsburgh    210        138
St. Louis     737        98

a. Write the equation of the least squares line for predicting airfare from distance.
b. What airfare does the least squares line predict for a destination which is 300 miles away?
c. What airfare does the least squares line predict for a destination which is 1500 miles away?
d. Use the equation of the regression line to predict the airfare to a destination 900 miles away.
e. What airfare would the regression line predict for a flight to San Francisco, which is 2842 miles from Baltimore? Would you take this prediction as seriously as the one for 900 miles? Explain.

1.10 Chi-Square

The Goodness of Fit Test

Quiz 1

1. Complete the following sentence: The chi-square goodness of fit test is used to __________.

2. Following is information on the ethnicity distribution of holders of the highest academic degree for the year 1981.

Table 1.36:
Race/Ethnicity                   Percent
White, non-Hispanic              78.9
Black, non-Hispanic              3.9
Hispanic                         1.4
Asian or Pacific Islander        2.7
American Indian/Alaskan Native   .4
Non-resident alien               12.8

A random sample of 300 doctoral degree recipients in 1994 showed the following frequency distribution:

Table 1.37:
Race/Ethnicity                   Observed
White, non-Hispanic              189
Black, non-Hispanic              10
Hispanic                         6
Asian or Pacific Islander        14
American Indian/Alaskan Native   1
Non-resident alien               80

a. If the distribution from 1981 is accurate, how many recipients, out of the 300, would you expect to see of each ethnicity in 1994?
b. Perform a goodness of fit test to determine if the distribution in 1994 is significantly different from the distribution in 1981.
i. State the null and alternate hypotheses.
ii. State the number of degrees of freedom for this test.
iii. Use a chi-square table to determine the critical value at the .05 level of significance.
iv. Calculate the test statistic.
v. State your decision and your conclusion.

Quiz 2

1. The chi-square test of independence is used to __________.

2. Following is information that has been gathered about the number of births per zodiac sign in a given period of time.

Table 1.38:
Sign          Births
Aries         23
Taurus        20
Gemini        18
Cancer        23
Leo           20
Virgo         19
Libra         18
Scorpio       21
Sagittarius   19
Capricorn     22
Aquarius      24
Pisces        29

Use the chi-square goodness of fit test to determine if births are uniformly distributed over the zodiac signs.
a. State the null and alternate hypotheses.
b. How many degrees of freedom does this test have?
c. Use a chi-square table to determine the critical value at the .01 level of significance.
d. Calculate the test statistic.
e. State your decision and your conclusion.

Quiz 3

1. The chi-square test of homogeneity is used to __________.

2. The partners in a law firm brought in the following numbers of new clients during the past year:

Partner   Number of new clients
Jones     35
Smith     42
Brown     22
Allen     41
Cross     30

Is there sufficient evidence at the .10 level of significance that partners do not bring in equal numbers of new clients?
a. State the null and alternate hypotheses.
b. How many degrees of freedom does this test have?
c. Use a chi-square table to determine the critical value at the .10 level of significance.
d. Calculate the test statistic.
e. State your decision and your conclusion.

Test of Independence

Quiz 1

A company held a blood pressure screening for its employees. The results are summarized in the following table. The information is categorized by age group and blood pressure level.

Table 1.39:
         Under 30   30-49   Over 50
Low      27         37      31
Normal   48         91      93
High     23         51      73

1. What proportion of employees under 30 has high blood pressure?
2. What proportion of people with high blood pressure are over 50?
3. Does there appear to be an association between age and high blood pressure among these employees?
i. State the null and alternate hypotheses.
ii. How many degrees of freedom are in this chi-square test?
iii. Calculate the chi-square statistic.
iv. Determine, using the table, the critical value for this chi-square at the .05 level of significance.
v. State your decision and your conclusion.

Quiz 2

The following table shows the political affiliation of American voters and their positions on the death penalty. This data is hypothetical.

Table 1.40:
                            Republican   Democrat   Other
Support the death penalty   .26          .12        .24
Oppose the death penalty    .04          .24        .10

1. What is the probability that a randomly chosen voter supports the death penalty?
2. What is the probability that a randomly chosen voter is not a Republican?
3. What is the probability that someone who favors the death penalty is a Democrat?
4. What is the probability that a Republican supports the death penalty?
5. What is the probability that a voter chosen at random is a Democrat and opposes the death penalty?
6. What is the probability that a voter chosen at random is either a Democrat OR opposes the death penalty?
7. Do party affiliation and opinion about the death penalty seem to be independent?
a. State the null and alternative hypotheses.
b. How many degrees of freedom does your test have?
c. Using technology, calculate the chi-square statistic.
d. What is the p-value associated with your statistic?
e. Using the .01 level of significance, state your decision and your conclusion.

Quiz 3

Some researchers were interested in a possible relationship between heart disease and baldness, so they asked a sample of 663 male heart patients to classify their degree of baldness. They also asked a control group of 772 males to do the same baldness assessment. The following table has the results:

Table 1.41:
                None   Little   Some   Much   Extreme
Heart Disease   251    165      195    50     2
Control         331    221      185    34     1

1. What proportion of these men identified themselves as having little or no baldness?
2. Of those who had heart disease, what proportion claimed to have some, much or extreme baldness?
3. Of those who declared themselves as having little or no baldness, what proportion was in the control group?
4. Determine whether a relationship seems to exist between heart disease and baldness.
a. State the null and alternative hypotheses.
b. How many degrees of freedom does your test have?
c. Using technology, calculate the chi-square statistic.
d. What is the p-value associated with your statistic?
e. Using the .05 level of significance, state your decision and your conclusion.

Testing One Variance

Quiz 1

Suppose a sample of 30 observations is drawn from a population with σ₀² = 4.55. The sample variance is s² = 6.7.
Test the hypothesis that the sample comes from a population with a variance greater than 4.55.

1. State the null and alternate hypotheses.
2. How many degrees of freedom are there?
3. Compute the chi-square statistic.
4. What is the p-value for your test?
5. What is your decision and your conclusion?
6. Construct a 95% confidence interval for the population variance.
7. Complete the following: In testing for a single variance using the chi-square statistic there are three pieces of information needed: the sample standard deviation, the number of data pieces in your sample (n) and __________.

Quiz 2

Math instructors are often interested in how the exam scores of their students vary. The variance is important to them. Suppose a math instructor believes that the standard deviation for his final exam is 7 points but a student disagrees. The student claims that the standard deviation is more than 7 points. The student wants to conduct a hypothesis test. The student takes a random sample of 15 tests and finds the sample standard deviation to be 6.5 points.

1. State the null and alternate hypotheses for this test.
2. How many degrees of freedom are there for this test?
3. Compute the chi-square statistic.
4. What is your p-value?
5. What is your decision and your conclusion?
6. Construct a 90% confidence interval for the population variance.
7. Complete the following: In testing for a single variance using the chi-square statistic there are three pieces of information needed: the hypothesized population variance, the number of data pieces in your sample (n) and __________.

Quiz 3

1. Complete the following: In testing for a single variance using the chi-square statistic there are three pieces of information needed: the sample standard deviation, the hypothesized population standard deviation and __________.

A post office finds that the standard deviation for waiting times on a Monday afternoon is 6.8 minutes.
The post office experiments with a single main waiting line and finds that, for a random sample of 25 customers, the waiting times have a standard deviation of 7.1 minutes.
2. State the null and alternate hypotheses for this test.
3. How many degrees of freedom are there for this test?
4. Compute the chi-square statistic.
5. What is your p-value?
6. What is your decision and your conclusion?
7. Construct a 90% confidence interval for the population variance.

Test 1
In the paper “Color Association of Male and Female Fourth-Grade School Children” (J. Psych., 1988, 39=83-8), children were asked to indicate what emotion they associated with the color red. The response and the sex of the child are in the table below.

Table 1.42:
         Females   Males
Anger    27        34
Happy    19        12
Love     39        38
Pain     17        28

1. Under an appropriate null hypothesis (that there is no association between sex and emotion felt when seeing the color red), the expected frequency for the cell corresponding to Anger and Male is
a. 15.9
b. 55.7
c. 30.4
d. 31.9
e. 29.1
2. The null hypothesis will be rejected at the .05 level of significance if the test statistic exceeds
a. 3.84
b. 5.99
c. 7.81
d. 9.49
e. 14.07
3. The approximate p-value is
a. Between .100 and .900
b. Between .050 and .100
c. Between .025 and .050
d. Between .010 and .025
e. Between .005 and .010
4. Which of the following is not correct?
a. The children were cross-classified by sex and emotion associated with red. Each child was counted in one and only one cell.
b. The null hypothesis is that the type of emotion associated with red is independent of the sex of the child.
c. The null hypothesis is that the proportion of emotions associated with red is the same for both sexes.
d. All expected cell counts should be greater than 5 in order that the distribution of the test statistic is an approximate chi-square distribution.
e. If we reject the null hypothesis, then we have proven that the two sexes associate red with emotions in different ways.
5. A Type I error would be committed if:
a. We conclude that the sex of the child and the emotion associated with red are independent when in fact they are not independent.
b. We conclude that the sex of the child and the emotion associated with red are not independent when in fact they are not independent.
c. We conclude that the proportion of emotions associated with red differs between males and females when in fact they are the same.
d. We conclude that the proportion of emotions associated with red is the same for male and female when in fact they are the same.
e. We fail to find any association between the color red and emotions for either sex.
6. The test statistic and approximate p-value are:
a. 4.661, .1983
b. 4.661, .3966
c. 4.629, .2011
d. 4.629, .4022
e. 4.629, .1006
7. Each person in a random sample of males and females was asked to state his/her sex and preferred color. The resulting frequencies are shown below:

Table 1.43:
         Male   Female
Red      3      17
Blue     11     11
Green    6      2

Which of the following is false?
a. 55% of males prefer the color blue.
b. Of those who prefer the color green, 75% are males.
c. 44% of people surveyed prefer the color blue.
d. A higher percentage of males preferred the color blue than females.
e. 15% of people are males who prefer the color red.
8. A rescue service wishes to study the behavior of lost hikers. Two hundred hikers selected at random from those applying for hiking permits are asked whether they would head uphill, downhill, or remain in the same place if they became lost while hiking. Each hiker in the sample was also classified according to whether he or she was an experienced or novice hiker.
The resulting data are summarized below:

Table 1.44:
Novice Experienced
Uphill Downhill Remain in same place
20 10 50 30 50 40

Do these data provide convincing evidence of an association between the level of hiking expertise and the direction the hiker would head if lost? Give appropriate statistical evidence to support your conclusion.

Test 2
1. It is generally agreed that the use of the chi-square distribution is appropriate when the
a. Sample size is at least 30
b. Sample size is large enough so that all of the observed cell counts are at least 5
c. Sample size is large enough so that all of the expected cell counts are at least 5
d. Sample size is large enough so that at least one of the expected cell counts is at least 5
e. Sample size is large enough so that the average of the expected cell counts is at least 5
2. In a chi-square test of the null hypothesis that is based on a sample of n = 100 observations classified according to 10 class intervals, the test statistic has
a. 99 degrees of freedom
b. 97 degrees of freedom
c. 9 degrees of freedom
d. 7 degrees of freedom
e. The number of degrees of freedom cannot be determined without the data
3. Which of the following statements are true?
I. The chi-square inference procedures deal with categorical variables.
II. The chi-square distribution is symmetric.
III. A chi-square test of independence on a 2 × 2 table produces the same result as a two-tailed difference of proportions test.
a. I only
b. I and II only
c. I and III only
d. I, II and III
e. None of the above
4. A random sample of 100 faculty members of a university are asked to respond to two questions:
Question 1: Are you happy with your financial situation?
Question 2: Do you approve of the government’s economic policies?
The responses are in the following table:

                    Question 2
                    Yes    No
Question 1   Yes    22     12
             No     48     18

To test the hypothesis that the response to Question 1 is independent of the response to Question 2 at the 5% level of significance, the expected frequency for the cell (yes, yes) and the critical value of the associated test statistic are:
a. 23.8 and 1.96 respectively
b. 10.2 and 3.84 respectively
c. 23.8 and 3.84 respectively
d. 23.8 and 7.81 respectively
e. 10.2 and 7.81 respectively
5. A survey was conducted to investigate whether alcohol consumption and smoking are related. The following information was gathered for 600 people:

Table 1.45:
Drinker Non-drinker
Smoker Non-smoker
193 89 165 153

Which of the following statements is true?
a. The appropriate alternative hypothesis is: Smoking and alcohol consumption are independent.
b. The appropriate null hypothesis is: Smoking and alcohol consumption are not independent.
c. The calculated value of the test statistic is 3.84.
d. The calculated value of the test statistic is 7.86.
e. At the .01 level we conclude that smoking and alcohol consumption are related.
6. A surprising study of 1437 male hospital admissions reported in the New York Times (February 24, 1993, page C12) found that, of 665 patients admitted with heart attacks, 214 had baldness, while of the remaining 772 non-heart-related admissions, 175 had baldness. Is this evidence sufficient at the 5% significance level to say that there is a relationship between heart attacks and baldness? Give appropriate statistical evidence to support your conclusion.

The following grades were earned by students in three teachers’ classes.

Table 1.46:
Mrs. C Mr. M Ms. L
A B C
12 6 15 24 12 6 12 18 3

7. Determine if these teachers, as a group, meet the established standard of 30% A’s, 40% B’s, and 30% C’s.
8. Is there evidence that the grading patterns are associated with the teacher who awards the grades?
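
For checking answers to the independence questions above with technology, the following is a minimal sketch of the chi-square computation in plain Python. The function name `chi_square_independence` is a hypothetical helper introduced only for this illustration, and the example counts are the Question 1 by Question 2 responses from Test 2, question 4. It returns the statistic, the degrees of freedom, and the expected counts; the p-value still has to be looked up in a chi-square table or obtained from a calculator.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: (row total * column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: Question 1 (Yes, No); columns: Question 2 (Yes, No).
survey = [[22, 12], [48, 18]]
statistic, df, expected = chi_square_independence(survey)
print(f"chi-square = {statistic:.3f}, df = {df}")
print(f"expected count for (Yes, Yes) = {expected[0][0]:.1f}")
```

The same function accepts larger tables, such as the four-by-two emotion table in Test 1, as long as every expected count is large enough for the chi-square approximation.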
1.11 Analysis of Variance and the F-Distribution

F Distribution and Testing Two Variances
Quiz 1
1. True or False: The F distribution is symmetrical.
The variability in the amount of impurities present in a batch of chemicals used for a particular process depends on the length of time that the process is in operation. Suppose a sample of 25 is drawn from the normal process, which is to be compared to a sample of a new process.

Table 1.47:
            n     s^2
Sample 1    25    1.04
Sample 2    25    .51

2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.

Quiz 2
1. True or False: The F distribution ranges across all real numbers.
A manufacturer wishes to determine whether there is less variability in the silver plating done by company 1 than in that done by company 2. Independent random samples yield the following results.

Table 1.48:
            n     s^2
Sample 1    12    0.035
Sample 2    12    0.062

2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.

Quiz 3
1. Complete the following: The F distribution is a family of distributions based on ______.
A math test is given in two classrooms. The principal of the school wanted to know if the two classroom variances were different.

Table 1.49:
            n     s^2
Sample 1    21    16.8
Sample 2    16    42.6

2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.

One Way ANOVA
Quiz 1
Three different machines were being considered for purchase by a manufacturer. Initially five of each machine were borrowed, and each was randomly assigned to one of 15 technicians, all equal in skill.
Each machine was put through a series of tasks and rated. The higher the score on the test, the better the performance of the machine. The data are:

Table 1.50:
Machine 1   Machine 2   Machine 3
24.5        28.4        26.1
23.5        34.2        28.3
26.4        29.5        24.3
27.1        32.2        26.2
29.9        30.1        27.8

1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.

Table 1.51:
                                                     Machine 1   Machine 2   Machine 3   Totals
Number ($n_k$)                                       ____        5           ____        =
Total ($T_k$)                                        ____        154.4       ____        =
Mean ($\bar{X}$)                                     ____        30.88       ____        =
Sum of Squared Obs. ($\sum_{i=1}^{n} X_{ik}^2$)      ____        ____        ____        =
Sum of Obs. Squared/Number of Obs. ($T_k^2/n_k$)     ____        ____        ____        =

3. What is the mean squares between groups ($MS_B$) value?
4. What is the mean squares within groups ($MS_W$) value?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to set a critical value.
7. What decision would you make regarding the null hypothesis? Why?

Quiz 2
A sociology professor was interested in studying the question of whether the presence of others influences helping behavior when there is a person in some kind of distress. Data were kept on the number of seconds it took for a subject to respond to the person in distress. The subject was in a room with other people. Following are the data:

# people present   Seconds to respond
0                  25   30   20   32
2                  30   33   29   40   36
4                  32   39   35   41   44

1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.

Table 1.52:
                                                     0       2       4       Totals
Number ($n_k$)                                       ____    5       ____    =
Total ($T_k$)                                        ____    168     ____    =
Mean ($\bar{X}$)                                     ____    33.6    ____    =
Sum of Squared Obs. ($\sum_{i=1}^{n} X_{ik}^2$)      ____    ____    ____    =
Sum of Obs. Squared/Number of Obs. ($T_k^2/n_k$)     ____    ____    ____    =

3. What is the mean squares between groups ($MS_B$) value?
4. What is the mean squares within groups ($MS_W$) value?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to set a critical value.
7. What decision would you make regarding the null hypothesis? Why?
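
The worksheet rows in the one-way ANOVA quizzes above (number, total, sum of squared observations, and squared total over number) combine into $MS_B$, $MS_W$, and the F ratio. The following is a minimal sketch of that bookkeeping in plain Python, using the response times from Quiz 2 as the example. The function name `one_way_anova` and the dictionary keys are hypothetical names chosen for this illustration; the critical value still comes from an F table.

```python
def one_way_anova(groups):
    """Return the worksheet quantities and F ratio for a list of samples."""
    n = [len(g) for g in groups]                          # n_k for each group
    totals = [sum(g) for g in groups]                     # T_k for each group
    sum_sq_obs = sum(x * x for g in groups for x in g)    # sum of X_ik^2 over all groups
    sum_tk2_over_nk = sum(t * t / k for t, k in zip(totals, n))  # sum of T_k^2 / n_k

    grand_n = sum(n)
    grand_total = sum(totals)

    # Between-groups and within-groups sums of squares.
    ss_between = sum_tk2_over_nk - grand_total ** 2 / grand_n
    ss_within = sum_sq_obs - sum_tk2_over_nk

    df_between = len(groups) - 1
    df_within = grand_n - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "n": n,
        "totals": totals,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Seconds to respond with 0, 2, and 4 other people present (Quiz 2).
response_times = [
    [25, 30, 20, 32],
    [30, 33, 29, 40, 36],
    [32, 39, 35, 41, 44],
]
result = one_way_anova(response_times)
print(result["totals"], result["df"])
print(f"MSB = {result['ms_between']:.2f}, MSW = {result['ms_within']:.2f}, "
      f"F = {result['f_ratio']:.2f}")
```

The groups do not need equal sizes, which is why the Quiz 2 data (four, five, and five observations) work without adjustment.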

Quiz 3
The data below come from a study by Hogg and Ledolter (Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.) of bacteria counts in shipments of milk. The columns represent different shipments. The rows are bacteria counts from cartons of milk chosen randomly from each shipment. Do some shipments have higher counts than others?

Shipment 1   Shipment 2   Shipment 3   Shipment 4   Shipment 5
24           14           11           7            19
15           7            9            7            24
21           12           7            4            19
27           17           13           7            15
33           14           12           12           10
23           16           18           18           20

1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.

Table 1.53:
                                                     1       2       3       4       5       Totals
Number ($n_k$)                                       ____    6       ____    ____    ____    =
Total ($T_k$)                                        ____    80      ____    ____    ____    =
Mean ($\bar{X}$)                                     ____    13.3    ____    ____    ____    =
Sum of Squared Obs. ($\sum_{i=1}^{n} X_{ik}^2$)      ____    ____    ____    ____    ____    =
Sum of Obs. Squared/Number of Obs. ($T_k^2/n_k$)     ____    ____    ____    ____    ____    =

3. What is the mean squares between groups ($MS_B$) value?
4. What is the mean squares within groups ($MS_W$) value?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to set a critical value.
7. What decision would you make regarding the null hypothesis? Why?

Two Way ANOVA Test and Experimental Design
Quiz 1
A research study was conducted to examine the impact of eating a high protein breakfast on adolescents’ performance during a physical education physical fitness test. Half of the subjects received a high protein breakfast and half were given a low protein breakfast. All of the adolescents, both male and female, were given a fitness test, with high scores representing better performance. Test scores are recorded below.

Table 1.54:
Group      High Protein      Low Protein
Males      10, 7, 9, 6, 8    5, 4, 7, 4, 5
Females    5, 4, 6, 3, 2     3, 4, 5, 1, 2

1. Complete the following ANOVA table.

Table 1.55:
Source                    SS     df     MS      F
Protein Level             20     1      ____    ____
Gender                    45     1      ____    ____
Protein Level x Gender    5      1      ____    ____
Within                    36     16     ____
Total                     ____   ____

2. State the three hypotheses associated with this two way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses? Why or why not?

Quiz 2
Researchers have sought to examine the effect of various types of music on agitation levels in patients who are in the early and middle stages of Alzheimer’s disease. Patients were selected to participate in the study based on their stage of Alzheimer’s disease. Three forms of music were tested: easy listening, Mozart, and piano interludes. While listening to music, agitation levels were recorded for the patients, with a high score indicating a higher level of agitation. Scores are recorded below.

Table 1.56:
Group                       Piano Interlude       Mozart               Easy Listening
Early Stage Alzheimer’s     21, 24, 22, 18, 20    9, 12, 10, 5, 9      29, 26, 30, 24, 26
Middle Stage Alzheimer’s    22, 20, 25, 18, 20    14, 18, 11, 9, 13    15, 18, 20, 13, 19

1. Complete the following ANOVA table.

Table 1.57:
Source                   SS     df     MS      F
Type of Music            740    2      ____    ____
Degree of Alzheimer’s    30     1      ____    ____
Music x Alzheimer’s      260    2      ____    ____
Within                   178    24     ____

2. State the three hypotheses associated with this two way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses? Why or why not?
5. Interpret your answer.

Quiz 3
A study examining differences in life satisfaction between young adult, middle adult, and older adult men and women was conducted. Each individual who participated in the study completed a life satisfaction questionnaire. A high score on the test indicates a higher level of life satisfaction. Test scores are recorded below.

Table 1.58:
Group     Young Adult      Middle Adult      Older Adult
Male      4, 2, 3, 4, 2    7, 5, 7, 5, 6     10, 7, 9, 8, 11
Female    7, 4, 3, 6, 5    8, 10, 7, 7, 8    10, 9, 12, 11, 13

1. Complete the following ANOVA table.

Table 1.59:
Source          SS     df     MS      F
Age             180    2      ____    ____
Gender          30     1      ____    ____
Age x Gender    0      2      ____    ____
Within          44     24     ____

2. State the three hypotheses associated with this two way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses?
Why or why not?
5. Interpret your answers.

Test 1
1. True or False? If False, correct it. In a one-way classification ANOVA, when the null hypothesis is false, the probability of obtaining an F-ratio exceeding that reported in the F table at the .05 level of significance is greater than .05.
2. In a study, subjects are randomly assigned to one of three groups: control, experimental A, or experimental B. After treatment, achievement test scores for the three groups are compared. The appropriate statistical test for this comparison is:
a. the correlation coefficient
b. chi square
c. the t-test
d. the analysis of variance
3.
Table 1.60:
Source: Between, Within, Total
SS: 30.5, 165.0
df: 4, 99

What decision would be made regarding $H_0$: population means are equal?
a. Reject $H_0$ at the .05 level
b. Fail to reject $H_0$ at the .01 level
c. Insufficient information is given to answer
4. Nine children were randomly split into three groups of three each. In a spelling unit, individuals in one group were criticized each time they misspelled a word. The individuals in another group were praised each time they correctly spelled a word, while the individuals in the third group were neither praised nor criticized. At the end of the unit, each child was given ten words to spell, with the following results (number correct is given for each child):

Table 1.61:
Praised   Neutral   Criticized
8         3         9
9         2         10
7         5         7

MEANS FOR RESPONSE
TREATMENT     MEAN (WORDS CORRECT)
CRITICIZED    8.66667
PRAISED       8
NEUTRAL       3.33333

ANALYSIS OF VARIANCE:

Table 1.62:
SOURCE OF VARIATION     DF    SS         MEAN SQUARE
RESPONSE                2     50.6667    25.33333
EXPERIMENTAL ERROR      6     11.3333    1.88889
TOTAL                   8     62.0000

PROBABILITY LEVEL FOR COMPARING MEANS = 0.05
Is there any evidence for significant differences among the methods?
5. Following is computer output for an ANOVA.

Table 1.63:
Source       SS        df    MS
Treatment    2356.5    3     785.5
Error        2014.9    20    100.74
Total        4371.4    23

What is the rejection region at the .05 level of significance for the above ANOVA?
a. 2.78
b. 2.87
c. 3.03
d. 3.10
e. 3.49
6. A one-way ANOVA was conducted on a dataset with five levels. Sample sizes for each level were 5, 6, 5, 5, and 4. The correct degrees of freedom for the ANOVA are:
a. DFG = 5, DFE = 20, DFT = 25
b. DFG = 5, DFE = 19, DFT = 24
c. DFG = 4, DFE = 21, DFT = 25
d. DFG = 4, DFE = 20, DFT = 24
7. A client tells you that he wants to conduct a one-way ANOVA for four means. Based only on this information, i.e., four means only, can you conduct a one-way ANOVA? Explain.
8. For a two-way ANOVA with four levels of Factor A, five levels of Factor B, and 100 total observations, which of the following is the appropriate combination of degrees of freedom?
a. DFA = 4, DFB = 5, DFAB = 20, DFE = 71, DFT = 100
b. DFA = 3, DFB = 4, DFAB = 7, DFE = 86, DFT = 100
c. DFA = 3, DFB = 4, DFAB = 12, DFE = 80, DFT = 99
d. DFA = 3, DFB = 4, DFAB = 7, DFE = 90, DFT = 99

A research study was conducted to examine the clinical efficacy of a new antidepressant. Depressed patients were randomly assigned to one of three groups: a placebo group, a group that received a low dose of the drug, and a group that received a moderate dose of the drug. After four weeks of treatment, the patients completed the Beck Depression Inventory. The higher the score, the more depressed the patient. The data are presented below. Compute the appropriate test.

Table 1.64:
Placebo Low Dose Moderate Dose
38 47 39 25 42 22 19 8 23 31 14 26 11 18 5

9. What is your computed answer?
10. What would be the null hypothesis in this study?
11. What would be the alternate hypothesis?
12. What probability level did you choose and why?
13. What were your degrees of freedom?
14. Is there a significant difference between the three testing conditions?
15. Interpret your answer.
16. If you have made an error, would it be a Type I or a Type II error? Explain your answer.

Test 2
1. Suppose the critical region for a certain test of hypothesis is of the form F > 9.48773 and the computed value of F from the data is 86 (F refers to an F statistic.) Then:
a. $H_0$ should be rejected.
b. $H_a$ is two-tailed.
c. The significance level is given by the area to the right of .48773 under the appropriate F distribution.
d. None of these.
2. True or False? If False, correct it. In ANOVA, if we wish to investigate the difference among five means, it is good statistical procedure to perform a t-test on each pair of means.
3. Samples of size 11 are taken from each of 5 populations. Complete the following analysis of variance table:

Table 1.65:
Source            S.S.    d.f.    M.S.    F
Between means     1000    a       c       e
Within samples    5000    b       d
Total             6000

a. a = 4, b = 44, c = 250, d = 113.6, e = 2.2
b. a = 4, b = 44, c = 250, d = 113.6, e = 0.2
c. a = 5, b = 55, c = 200, d = 90.9, e = 0.2
d. a = 5, b = 50, c = 200, d = 100, e = 2.0
e. a = 4, b = 50, c = 250, d = 100, e = 2.5
4. Following is computer output for an ANOVA.

Table 1.66:
Source       SS        df    MS
Treatment    2356.5    3     785.5
Error        2014.9    20    100.74
Total        4371.4    23

What is the F-value for the above ANOVA?
a. 1.17
b. 23.39
c. 0.13
d. 6.67
e. 7.80
5. A one-way ANOVA was conducted on a dataset with four levels. Sample sizes for each level were 5, 6, 5, and 4. The F statistic has which of the following distributions when $H_0$ is true?
a. T(4)
b. T(5, 20)
c. F(3, 16)
d. F(4, 18)
e. F(4, 15)
6. A client tells you that her dog ate her ANOVA table. She says that she only has SSG = 456.3 and MSE = 35.3. She remembers that the total sample size is 30 and the number of levels is 4 in her analysis. What is the F-statistic in her ANOVA?
a. 12.926
b. 0.0774
c. 7.5
d. 4.309
e. Not enough information to answer this question.
7. For a two-way ANOVA with four levels of Factor A, six levels of Factor B, and 121 total observations, which of the following is the appropriate combination of degrees of freedom?
a. DFA = 3, DFB = 7, DFAB = 21, DFE = 90, DFT = 121
b. DFA = 4, DFB = 6, DFAB = 10, DFE = 101, DFT = 121
c. DFA = 3, DFB = 5, DFAB = 15, DFE = 97, DFT = 120
d. DFA = 3, DFB = 5, DFAB = 14, DFE = 98, DFT = 120

A researcher is concerned about the level of knowledge possessed by university students regarding United States history. Students completed a high school senior level standardized U.S. history exam. The major of each student was also recorded. Data in terms of percent correct are recorded below for 32 students. Compute the appropriate test for the data provided below.

Table 1.67:
Education Business/Management Behavioral/Social Science Fine Arts
62 81 75 58 67 48 26 36 72 49 63 68 39 79 40 15 80 57 87 64 28 29 62 45 42 52 31 80 22 71 68 76

8. What is your computed answer?
9. What would be the null hypothesis in this study?
10. What would be the alternate hypothesis?
11. What probability level did you choose and why?
12. What were your degrees of freedom?
13. Is there a significant difference between the four testing conditions?
14. Interpret your answer.
15. If you have made an error, would it be a Type I or a Type II error? Explain your answer.
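
The degrees-of-freedom questions for two-way ANOVA in the tests above follow one pattern, which can be sketched as below. The function name `two_way_anova_df` is a hypothetical helper for this illustration, and the sketch assumes a full factorial design with an interaction term and at least one observation in every cell.

```python
def two_way_anova_df(levels_a, levels_b, total_observations):
    """Return (DFA, DFB, DFAB, DFE, DFT) for a two-way ANOVA with interaction."""
    df_a = levels_a - 1                                   # Factor A main effect
    df_b = levels_b - 1                                   # Factor B main effect
    df_ab = df_a * df_b                                   # A x B interaction
    df_error = total_observations - levels_a * levels_b   # within cells
    df_total = total_observations - 1
    return df_a, df_b, df_ab, df_error, df_total


# Four levels of Factor A, five levels of Factor B, 100 observations.
print(two_way_anova_df(4, 5, 100))
# Two protein levels, two genders, 20 adolescents (Table 1.55).
print(two_way_anova_df(2, 2, 20))
```

A quick consistency check is that the first four values always add up to the fifth.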