PART I
GETTING TO KNOW YOUR BOOK: A SCAVENGER HUNT AND MORE!

You will have an easier time using Mind On Statistics in your statistics course if you first familiarize yourself with the features of the book and companion website. This part of the activities manual has 46 questions that will help you get to know your book. You will have to open the book and do some searching to answer these questions. In essence, the questions in this part send you on a scavenger hunt. The questions are divided into the following categories:

A. Getting to Know the Visual Features of the Book
B. A Few General Questions to Test Your Observational Skills
C. Getting to Know about the Thought Questions
D. Getting to Know about the Exercises
E. Getting to Know about the Companion Website
F. Getting to Know about the Datasets accompanying the Book
G. Getting to Know about the Technology Manuals
H. Getting to Know about the Supplemental Topics
I. Getting to Know about the Skillbuilder Applets on the Companion Website

You can choose to explore all of these activities as you begin working with the book, or explore each one as it is introduced in your course. Whichever you choose to do, have fun!

A. Getting to Know the Visual Features of the Book

Words alone, with nothing to break them up except chapter headings, might be fine for a novel, but would make for confusing reading in a textbook! Like most textbooks, Mind On Statistics has various visual features designed to enhance the words and formulas on the pages. Questions 1 to 10 list ten organizational features of the book. In Chapter 1 and/or Chapter 2 of the book, locate two instances of each feature. Give two page numbers where you found the feature, and describe how the feature is presented visually (color and style). As an example, a Thought Question can be found on page 8 and page 17 of the book. Thought Questions are introduced in purple print and presented in a box with a blue border.

1. Case Study
Found on page ____ and page ____.
Visual presentation:

2. Definition
Found on page ____ and page ____.
Visual presentation:

3. Numbered Section Heading
Found on page ____ and page ____.
Visual presentation:

4. Unnumbered (Sub)Section Heading
Found on page ____ and page ____.
Visual presentation:

5. Key Terms
Found on page ____ and page ____.
Visual presentation:

6. “In Summary” Display
Found on page ____ and page ____.
Visual presentation:

7. Numbered Example
Found on page ____ and page ____.
Visual presentation:

8. Software/computing Tip (Minitab, SPSS, Excel, TI 83/84 Tips)
Found on page ____ and page ____.
Visual presentation:

9. Formula Box
Found on page ____ and page ____.
Visual presentation:

10. Technical Note (provide one example only)
Found on page ____.
Visual presentation:

B. A Few General Questions to Test Your Observational Skills…

11. What is the connection between the photos on the opening pages of the chapters and the material in the chapters?

12. How are the titles of Chapter 2 and Chapter 17 related? What do you think those titles mean?

13. Some examples and exercises in the book use data from Penn State University and the University of California at Irvine. Why do you think data from those two schools are used so often?

C. Getting to Know about the Thought Questions

“Thought Questions” are scattered throughout the book. Several also are repeated as activities in Part 2 of this manual. Your instructor may use them in a formal way. If not, we encourage you to still read them and try to answer them on your own.

14. What is the premise behind the “Thought Questions” in the book? (Hint: See page xv of the Preface.)

15. Give an example of a Thought Question from Chapter 3. What is the page number where you found it?

16. Where can you find a hint for the answer to a Thought Question?

D. Getting to Know about the Exercises

Exercises are located at the end of every chapter. Except for Chapters 1 and 17, exercises specific to each section of the chapter are given.
The following questions will help you identify other useful features of the exercises.

17. What is visually different about the pages in the book that contain exercises, other than the words on the page?

18. What does it mean when the number of an exercise is in bold?

19. Notice that for Exercise 4.41 (page 141), the exercise number 4.41 and the letters for parts b and c are bold, but the letter for part a is not bold. What does that mean?

20. Some exercises in each section are designed to help you learn about the more basic concepts and methods. How are the exercises that cover these “basic skills” indicated in the text?

21. How many Skillbuilder exercises are there for Section 7.5? Is the number of Skillbuilder exercises the same for every section?

22. Notice that near the bottom of page 282, in the left margin, there is a note that says “8.4 Exercises are on pages 309-310.” What does that margin note mean?

23. On what page do the exercises for Section 9.3 begin? Give two ways you could have discovered that page number.

24. Give one place where you can find answers for selected exercises, and one place where you can find fully worked solutions. (Hint: The introduction to the Exercises for every chapter provides this information.)

25. Locate the answer to Exercise 10.1c. On what page is the answer located, and what is the answer?

26. All chapters except Chapter 1 have a set of exercises that may apply to material from any one or more of the sections in the chapter. What is the heading for those exercises? On what page do these exercises start in Chapter 11?

27. What does it mean when an exercise is marked with a blue laptop icon? Use Exercise 2.67 on page 61 as an example, and explain why the icon is appropriate in that case.

28. How do “Dataset Exercises” differ from exercises marked with an orange diamond? (Hint: See page 67.)

E. Getting to Know about the Companion Website

The companion website for this book (http://www.cengage.com/statistics/utts5e) contains a wealth of resources. Experience has taught us that some students never discover the resources on the companion site! The following activities will get you acquainted with what’s on it. To access the student resources, go to the webpage listed above. On the right side of the page, you will see the student resources for the book.

29. Turn to Example 4.2 on page 116 of your book. How can you tell there is additional information available for this example on the companion website?

30. What additional information is available on the companion website for Example 4.2 on page 116?

31. Continuing the exploration of Example 4.2 from the previous question, find the “Original Source” on the companion website. Write the title of this original source.

32. Continuing the exploration of Example 4.2 from the previous questions, what is the relationship between Example 4.2 and the “Original Source” for it?

33. List the resources available on the companion website.

F. Getting to Know about the Datasets accompanying the Book

The companion website includes numerous “datasets.” (The general definition of a “dataset” is on page 16 of the book.)

34. The datasets are available in eight different formats. List what they are, and identify which one or more will be most useful to you.

35. Locate the dataset called MusicCDs on the companion website and open the dataset in a format that is familiar to you. What is the relationship between Example 2.9 on page 31 in the book and the dataset MusicCDs?

36. Continue to work with the MusicCDs dataset from the previous question. Notice that there are two columns, labeled "CDs" and "Sex." The first row has "220" and "Female." Explain what information this tells you about the first person in this dataset.

37. Now open the dataset deprived. Find the last student in the dataset.
How many hours of sleep did this student claim to get per night, and did the student say he or she feels sleep-deprived?
Hours of sleep: ______   Deprived? _______

38. Turn to Exercise 3.3 on page 100 and Exercise 3.100 on page 111.
a. What is the name of the dataset used for these two exercises?
b. Do you have to access the dataset to do Exercise 3.3? What about Exercise 3.100? Other than reading the exercise, how did you know whether the dataset was required in each case?

G. Getting to Know about the Technology Manuals

The companion website contains six Technology Manuals. Each one explains how to solve statistics problems using a specific technology. For example, there is a manual explaining how to solve statistics problems using Excel.

39. Aside from Excel, what four other technologies are covered by the manuals?

40. Open the manual that covers the technology that you will be using in your class. What is the relationship between the organization of the technology manual and the organization of Mind On Statistics?

H. Getting to Know about the Supplemental Topics

On the companion website, there is a section called “Supplemental Topics.” Access the site and answer these questions.

41. How many Supplemental Topic chapters are there?

42. What is the title of Supplemental Topic 5?

43. Are the Supplemental Topics listed in the Table of Contents in the book?

44. An example of an entry in the Index of Mind On Statistics is “American Statistical Association, S5-2.” Suppose that you want to read page S5-2. Where would you find it?

I. Getting to Know about the Skillbuilder Applets on the Companion Website

There are six applets on the companion website that allow you to investigate concepts interactively. You won’t be ready to learn the statistical content related to them until you cover the relevant material in the book, but the following exploration will acquaint you with how they work.

45. Turn to the Table of Contents on page vii in the book.
Notice that below Section 2.7 it says “Applets for Further Exploration.” Where in the book can you find information about other Applets for Further Exploration?

46. Access the applets on the companion website, and then click on the link for the “Random Sampling in Action” applet, the applet that is described in Chapter 5, pages 178-179 in the book.
a. When you first open the applet, it should look like one of the figures in Section 5.7 of the book. What is the number of the figure that shows what it looks like, and what page is it on?
b. Press the button labeled Sample. Describe what happens.
c. There are exercises in the book to accompany this applet. On what page do these exercises start, and what are the exercise numbers?

PART II
ACTIVITIES BY CHAPTER

CHAPTER 1 ACTIVITIES

Activity 1.1
To begin, review Case Study 1.1 on page 2 in the book.
a. The paragraph prior to Moral of the Story contains the sentence, “In fact, three-fourths of men have driven 95 miles per hour or more, but only one-fourth of the women have done so.” On the basis of the five number summaries, explain how we know that this is the case. (Hint: The lower quartile is a value such that about 1/4 of the data values are less than or equal to it. The upper quartile is such that about 3/4 of the data values are less than or equal to it.)
b. About what fraction of the females reported a “fastest ever driven speed” between 80 and 95 mph? Briefly explain how you determined this.
c. The last few pieces of data for males and females, respectively, are as follows:
Males: 112, 120, 110, 115, 125, 55, 90
Females: 95, 110, 80, 95, 90, 80, 90
Using Figure 1.1 on page 2 as a guide, draw a dotplot that compares these seven males and seven females.
d. Refer to the data values given for the seven males in part c. Show that the median for those seven values is 112.

Activity 1.2
To begin, review Case Study 1.3 on pages 3-4 in the book.
a. What was the population of interest for the survey described in Case Study 1.3?
b. What sample was used to represent the population described in part a?
c. What was the value of the margin of error for this survey? Write a sentence that interprets the meaning of this value.
d. Suppose that a different survey asking the same question(s) collected data from 1000 randomly selected teens who have dated. What will be the margin of error for this survey?
e. Refer back to part d. Suppose that in the survey of 1000 teens who have dated, the percentage who had dated somebody of another race or ethnic group was 53%. In that case, give an interval that is 95% certain to include the true percentage of U.S. teens who have dated somebody of another race or ethnic group.

Activity 1.3
To begin, review Case Study 1.5 and Case Study 1.6 on pages 5-6 in the book.
a. Explain how Case Study 1.5 is an example of an observational study whereas Case Study 1.6 is an example of a randomized experiment.
b. For Case Study 1.5, explain why smoking and alcohol use could be confounding variables in this study.
c. Explain why it might not be possible to conclude that attending religious services regularly may cause lower blood pressure on the basis of the studies described in Case Study 1.5.
d. Explain why it is possible, on the basis of the experiment described in Case Study 1.6, to make a cause-and-effect conclusion that aspirin use might reduce heart attack rates.

Activity 1.4
Read the original journal article for Case Study 1.5 about religious activities and blood pressure. It can be found on the textbook companion website.
a. Briefly describe how the sample was selected for the study described in the journal article.
b. An important concept in statistics is that a sample should be representative of the larger population for which conclusions are made. Discuss what larger population you think is represented by the sample used in this study. That is, to whom do the results of this study apply?
c. Summarize some of the limitations of the study that the authors of the article discuss.

CHAPTER 2 ACTIVITIES

NOTE TO INSTRUCTORS: Three group/team projects for Chapter 2 suitable for in-class work are in the Course Support (Class Projects) section of the companion website.

Activity 2.1
This is Thought Question 2.1 on page 17 in the book. There were almost 200 students who answered the survey questions given on page 15. Formulate four interesting questions about the students that you would like to answer using the data from these students. What kind of summary information would help you answer your questions?
a. Question 1 about the dataset:
   Desired Summary Information:
b. Question 2 about the dataset:
   Desired Summary Information:
c. Question 3 about the dataset:
   Desired Summary Information:
d. Question 4 about the dataset:
   Desired Summary Information:

Activity 2.2
Use the material in Section 2.1 for guidance on this activity.
a. Explain the distinction between the terms sample data and population data.
b. In what circumstance would the data described in Section 2.1 represent sample data? In what circumstance would it represent population data?

Activity 2.3
This is Thought Question 2.2 on page 18 in the book. Review the data collected in the statistics class, listed on pages 15-16 in Section 2.1. Identify a type (categorical, quantitative, ordinal) for each variable. The only one that is ambiguous is question 5 of the survey. That question asks for a numerical response, but as we will see later in this chapter, it is more interesting to summarize the responses as if they are categorical.

Variable                   Variable Type
1. Sex                     ________________
2. Hours of sleep          ________________
3. Choice of S or Q        ________________
4. Height                  ________________
5. Number picked           ________________
6. Fastest ever driven     ________________
7. Right handspan          ________________
8. Left handspan           ________________

Activity 2.4
This activity is about Example 2.1 on page 21.
a. What percentage of the sample said that they wear a seatbelt when driving either “Always” or “Most times?” Show how you calculated the answer.
b. What percentage of the females said that they wear a seatbelt when driving either “Always” or “Most times?” Show how you calculated the answer.
c. What percentage of the males does not always wear a seatbelt when driving? Show how you calculated this value.
d. Draw a bar graph of the information given in Table 2.1 on page 21 in the book. Use Figure 2.3 on page 23 for guidance. (Note: It’s easy to draw this “by hand,” but if you want to use software to draw the graph you’ll find the raw data in the YouthRisk03 dataset on the companion website.)
e. Explain whether you think the U.S. Centers for Disease Control would consider the data used for Example 2.1 to be population data or sample data.
f. In Example 2.1, which variable is the explanatory variable and which variable is the response variable?

Activity 2.5
This is a modified version of Thought Question 2.4 on page 23 in the book.
a. Redo the bar graph in Figure 2.4 on page 23 using counts on the vertical axis instead of percentages. The necessary data are given in Table 2.3 on page 21.
b. Is the comparison of frequency of myopia across the categories of lighting as easy to make using the bar graph with counts as it is with percentages? Generalize your conclusion to provide guidance about what should be done in similar situations.
c. What do you learn from the bar graph of counts that you do not learn from the bar graph of percentages?

Activity 2.6
To begin, review Example 2.10 on page 38. Now, suppose that a different set of seven quiz scores is 83, 76, 98, 90, 55, 85, 87.
a. Find the value of the median for this new set of scores. Show any details of your work.
b. Find the value of the mean for this new set of scores.
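
For readers who want to check hand calculations like those in Activity 2.6 with software, the following is a minimal Python sketch of the two steps (sort and take the middle value for the median; sum and divide by the count for the mean). The scores below are hypothetical values chosen only for illustration; they are not the quiz scores from the activity or from Example 2.10, and the variable names are assumptions of this sketch.

```python
from statistics import mean, median

# Hypothetical quiz scores used only for illustration (not the activity's data).
scores = [70, 92, 85, 60, 88]

# Median by hand: sort the values, then take the middle one.
ordered = sorted(scores)
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    # Odd number of values: the median is the single middle value.
    manual_median = ordered[mid]
else:
    # Even number of values: the median is the average of the two middle values.
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2

# Mean by hand: the sum of the values divided by how many there are.
manual_mean = sum(scores) / n

# The standard library gives the same results.
print("median:", manual_median, median(scores))
print("mean:", manual_mean, mean(scores))
```

Replacing the hypothetical list with the scores from the activity lets you compare the software result with your own work.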

Activity 2.7
One basic idea in Example 2.12 on pages 38-39 is that for almost all things we measure, there is a range of values that would be considered normal, but only one single number that is the average. In everyday language, the words “normal” and “average” are often confused. For instance, when weather reporters talk about the normal rainfall for a given time of year, what they really mean is the average rainfall. In the other direction, we may talk about whether someone gets “average grades.” We don’t mean that they are exactly at the average; instead we mean that they are in a range that includes the majority of people.
a. Briefly explain how Example 2.12 (p. 38-39) is an example of how the words “normal” and “average” may be confused.
b. Explain what is wrong with the following quote. A friend says: “I tend to have a lower body temperature than normal – it’s closer to 98 degrees than 98.6 degrees.”
c. Explain what is wrong with the following quote. A teacher says: “I’ve taught this course many times, and as a class, your results on the midterm were about average for students in this course.”

Activity 2.8
Review Example 2.16 on page 45. Suppose that in a different year from the one in the example, the weights of the members of the crew team are 192.4, 180.6, 203.6, 215.8, 175.0, 183.2, 199.4, 187.2, 111.6.
a. For this new set of weights, find the values of the median, the lower quartile and the upper quartile.
b. For this new data, what is the value of the interquartile range?
c. Fill in the two blanks in the following sentence with numerical values. For this new data, any value smaller than _______ or larger than _______ would be marked as an outlier.
d. Draw a boxplot of the set of weights given in this activity. Use Figure 2.14 on page 34 for guidance.

Activity 2.9
Review Example 2.18 on pages 48-49 in the book. Suppose that a different set of pulse rates is 62, 66, 72, 77, 83.
Follow the steps in Example 2.18 to calculate the value of the standard deviation for this new set of pulse rates.
Step 1: Calculate x̄, the sample mean.
Steps 2 and 3: Complete this table:

Data Value    Difference between value and mean    Squared difference between value and mean

Step 4: Determine the value of the variance.
Step 5: Take the square root of the Step 4 answer in order to find the standard deviation.

Activity 2.10
Suppose that car and truck speeds at a particular location have approximately a bell-shaped distribution with mean = 65 mph and standard deviation = 5 mph. For parts a-c, use the Empirical Rule to fill in the blanks in each part:
a. About 68% of cars and trucks travel between _______ and _______ at this location.
b. About 95% of cars and trucks travel between _______ and ________ at this location.
c. About 99.7% of cars and trucks travel between _______ and ________ at this location.
d. Illustrate the distribution of vehicle speeds at this location by drawing a picture similar to Figure 2.23 on page 50.
e. Calculate a z-score for a vehicle speed of 72 mph at this location (where mean = 65 mph and standard deviation = 5 mph).
f. Fill in the blanks in the following sentence: A vehicle speed of 72 mph is _______ standard deviations _______________ the mean speed at this location. (Hint: See the last sentence before the definition box of The Empirical Rule on page 51.)

Activity 2.11
This activity uses the Empirical applet described in Applets for Further Exploration (pages 52-53). To start, read Section 2.8 and also start the applet on your computer.
a. For each of the eight variables, characterize the shape of the distribution as approximately bell-shaped, somewhat skewed, or extremely skewed. Also note whether there are major outliers, minor outliers or no outliers. Look carefully to make sure you notice any outliers that are at the extremes of the histograms.

                      Shape?             Outliers?
Sleep                 _______________    _________
TV Hours              _______________    _________
Dad’s Height          _______________    _________
Exercise              _______________    _________
Ideal Height          _______________    _________
Alcohol               _______________    _________
Handspan (females)    _______________    _________
Handspan (males)      _______________    _________

b. In general, if there are major outliers in a data set:
(i) Will they cause the standard deviation to be larger or smaller than it would be for the data without the outlier(s)? Explain.
(ii) Will the widths of the intervals used in the Empirical Rule (Mean ± one s.d., etc.) be bigger or smaller than they would be without the outlier(s)? (This answer follows directly from your answer in part (i).)
(iii) Using your answer in part (ii), do you think the percent of the data covered by the interval Mean ± one s.d. will be higher or lower when outliers are present than it would be if outliers were not present? Explain.
c. Fill in the values found using the applet for the percent of the data in each of the following intervals.

                      Mean ± one s.d.    Mean ± two s.d.
Sleep                 ____________       ____________
TV Hours              ____________       ____________
Dad’s Height          ____________       ____________
Exercise              ____________       ____________
Ideal Height          ____________       ____________
Alcohol               ____________       ____________
Handspan (females)    ____________       ____________
Handspan (males)      ____________       ____________

d. Study the results from parts a and c. What can you conclude about how well the Empirical Rule works for these data sets? In what situation (shape, outlier status) does it work best? In what situation (shape, outlier status) does it work least well?
e. For which of the two intervals, Mean ± one s.d. or Mean ± two s.d., did the Empirical Rule work less well when outliers and a skewed shape were present?

CHAPTER 3 ACTIVITIES

NOTE TO INSTRUCTORS: Three group/team projects for Chapter 3 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 3.1
This is a modified version of Thought Question 3.2 on page 74 in the book.
a. The following scatterplot shows adult daughters’ heights versus mothers’ heights, in inches, as reported by 132 females in a statistics class. You would now like to predict how tall your infant niece will be when she grows up. Explain how you would use this scatterplot to help you make the prediction.
b. Suppose that your niece’s mother (your aunt) is 62 inches tall. About how tall do you predict your infant niece will be when she grows up? Explain how you determined this prediction.
c. What other variables, aside from her mother’s height, might be useful for improving your prediction of your niece’s height? How could you use these variables in conjunction with the mother’s height to make your prediction?

Activity 3.2
To begin, review the In Summary box that begins at the bottom of page 77 in the book.
a. What does the slope of a regression line estimate?
b. What does the intercept of a regression line estimate?
c. On page 76 in the book, the equation Handspan = −3 + 0.35 Height is given. (See Example 3.5 for further discussion.) What is the value of the slope of this regression line? Write a sentence that interprets this value in the context of the height and handspan variables.
d. What is the intercept of the equation given in part c and on page 76? Explain why this intercept does not provide useful information about height and handspan. (Hint: Think about the definition you wrote in part b, and consider that the x-variable is height.)
e. Review Example 3.7 on page 78 in the book. What is the value of the slope for the regression line in that example? Write a sentence that interprets this value in the context of the sign-reading distance and age variables.
f. Continue with Example 3.7 on page 78. What is the estimated average sign-reading distance for drivers who are 22 years old?

Activity 3.3
This is Thought Question 3.5 on page 94 in the book.
a. Sketch a scatterplot with an outlier that would inflate the correlation between the two variables.
b. Sketch a scatterplot with an outlier that would deflate the correlation between the two variables.

Activity 3.4
This activity uses the Correlation applet described in Applets for Further Exploration (pages 97-99 in the book). Start by reading Section 3.6 and also start the applet on your computer. Use the first scatter plot frame in the applet to do the following.
a. Create a set of points that has a correlation value within 0.05 of the target correlation of 0.5, using at least 15 points. Do this without including any extreme outliers. Draw a sketch of the plot that you created.
b. Create a set of points such that the correlation is about 0.8 for the first 14 points. Then add a single outlier that lowers the correlation to about 0.5. You may have to play with adding and deleting points until you figure out how to do this. Draw a sketch of the plot that you created.
c. Create a set of points such that the correlation is close to 0 for the first 14 points. Then, add a single outlier that increases the correlation to about 0.5. You may have to play with adding and deleting points until you figure out how to do this. Characterize the type of outlier that made this happen. Was it in line with the other points?
d. Using the results from parts b and c, characterize the effect different types of outliers have on the value of the correlation. Explain what type of outlier inflates correlation and what type of outlier deflates correlation.

Activity 3.5
This activity uses the Correlation applet described in Section 3.6 (pages 96-98 in the book). Start by reading Section 3.6 and also start the applet on your computer. Use the second scatter plot frame in the applet to do the following.
a. Create a set of points that has a correlation value within 0.05 of the target correlation of −0.8, as instructed, using at least 15 points. Do this without including any extreme outliers.
Draw a sketch of the plot that you created.
b. Create a set of points by putting a tight cluster of points in the upper left corner for the first 14 points, such that the correlation is no more than about 0.2 in absolute value (i.e. it’s between −0.2 and +0.2). Then add a single outlier such that the correlation increases to be within 0.05 of the target of −0.8, just by adding the outlier. Draw a sketch of the plot that you created.
c. Create a scatter plot illustrating two groups where the correlation is positive within each group, but the correlation for the combined groups is within 0.05 of the target correlation of −0.8. Draw a sketch of the plot that you created.
d. On the basis of the results of parts b and c, describe two ways to create a strong negative correlation that would be misleading, if the interpretation of the strong negative value is that as one variable increases steadily the other decreases steadily.

Activity 3.6
This activity uses the Correlation applet described in Section 3.6 (pages 96-98 in the book). Start by reading Section 3.6 and also start the applet on your computer. Using the third scatter plot frame in the applet, play with various ways in which you can place points to get within 0.05 of the target correlation of 0. Describe some different ways to do this.

Activity 3.7
This is a dataset activity. Use the Temperature dataset. The data are latitude and temperature data for 20 U.S. cities. Latitude is the geographic latitude of the city and JanTemp is the mean January temperature.
a. Make a scatterplot showing the connection between JanTemp (y-variable) and latitude (x-variable). On the basis of this plot, answer these questions:
(i) Does it look like a straight line is a suitable description of the data, or do the data look to be curved?
(ii) Is the correlation between the two variables positive or is it negative?
b. Use statistical software to determine the equation of the regression line.
Write the equation.
c. What is the value of the slope of the regression line found in part b? Write a sentence that interprets this slope.
d. Imagine a city not in the data set is at latitude = 40. What is the predicted January temperature for this city? Show work.
e. Refer to the previous part. Suppose the imaginary city at latitude = 40 has a mean January temperature = 36. For this city, what is the value of the residual (prediction error)? Show work.
f. Determine the correlation between JanTemp and latitude. Give the numerical value of the correlation. Then, briefly discuss why this value indicates that there is a strong negative association between JanTemp and latitude.

Activity 3.8
On the companion website, refer to the original Journal Article for Chapter 11 – Example 11.12: “Development and initial validation of the Hangover Symptoms Scale: Prevalence and correlates of hangover symptoms in college students.” On page 1447 it says: “The HSS [Hangover Symptoms Scale] was significantly positively associated with the … typical quantity of alcohol consumed when drinking (r = 0.40).”
a. What two variables were measured for each person to provide this result? Which of these two variables is the response variable and which is the explanatory variable in this situation?
b. Explain what is meant by r = 0.40.
c. Look back at Case Study 1.6 on pages 5-6 in the book. On the basis of the definition of “statistically significant” given on page 6 in the book, explain what you think it means to say that the hangover symptoms variable was “significantly positively associated” with the typical quantity of alcohol consumed variable.

Activity 3.9
For parts a-c, refer to the original Journal Articles on the companion website. In each case, discuss which of the four interpretations of an observed association given in Section 3.5 (pages 94-95) in the book might apply.
a. The journal article for Chapter 2 – Example 2.2: “Myopia and Ambient Lighting at Night.”
b. The journal article for Chapter 10 – Case Study 10.2: “A Controlled Trial of Sustained-Release Bupropion, a Nicotine Patch, or Both for Smoking Cessation.”
c. The journal article for Chapter 17 – Exercises 17.17-17.34 Study #2: “Effects of Walking on Mortality among Nonsmoking Retired Men.”

CHAPTER 4 ACTIVITIES

NOTE TO INSTRUCTORS: Three group/team projects for Chapter 4 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 4.1
This is Thought Question 4.1 on page 113 in the book.
a. Hair color and eye color are related characteristics. What exactly does it mean to say that these two variables are related?
b. Suppose that you know the hair colors and the eye colors of 200 individuals. How would you assess whether the two variables are related for those individuals?

Activity 4.2
Read Example 4.2 on page 116 of the text. Use the information given in that example to answer the parts of this activity.
a. Refer to Table 4.3 on page 116. Among the 114 couples who are separated, what percentage contains no smokers? Among the couples who are separated, what percentage contains only one person who smoked? Among the couples who are separated, what percentage contains two people who smoked?
b. Among the 1384 couples who are not separated, what percentage contains no smokers? Among the couples who are not separated, what percentage contains only one person who smoked? Among the couples who are not separated, what percentage contains two people who smoked?
c. Explain why the answers to parts a and b of this activity suggest that there may be a relationship between smoking habits and the likelihood of marital separation. (Hint: See the In Summary box on page 117 in the book for an explanation of what constitutes a relationship between two categorical variables.)
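
The conditional percentages asked for in parts a and b of Activity 4.2 are row percentages: each count is divided by the total for its own group, not by the overall total. As a rough illustration, here is a minimal Python sketch of that calculation. The counts, group labels, and the function name `row_percentages` are hypothetical and were made up for this sketch; they are not the values in Table 4.3.

```python
# Hypothetical counts (NOT the values in Table 4.3). Each outer key is a group
# being compared; each inner key is a category of the other variable.
counts = {
    "Group A": {"Neither": 10, "One": 15, "Both": 25},
    "Group B": {"Neither": 120, "One": 50, "Both": 30},
}


def row_percentages(table):
    """Convert each group's counts to percentages of that group's own total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {
            category: round(100 * count / total, 1)
            for category, count in row.items()
        }
    return result


percents = row_percentages(counts)
for group, row in percents.items():
    print(group, row)
```

If the percentages in a given category differ noticeably between the groups, that suggests a relationship between the two categorical variables, which is the comparison part c asks you to explain.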

Activity 4.3
The following table gives data from the first 24 observations in the YouthRisk03 dataset, which contains data from 12th grade students in a 2003 survey of U.S. high school students. The survey was conducted by the U.S. Centers for Disease Control. The variables here are student gender and the action the student reports having taken about their weight in the past 3 months (four categories = stay the same, try to gain, try to lose, nothing).

Gender    Weight Action
Male      Stay same
Male      Gain
Female    Lose
Female    Gain
Male      Nothing
Female    Lose
Female    Nothing
Female    Lose
Female    Lose
Male      Lose
Male      Gain
Male      Lose
Male      Gain
Male      Gain
Female    Lose
Female    Lose
Male      Gain
Female    Lose
Female    Gain
Male      Nothing
Female    Lose
Female    Lose
Female    Gain
Female    Stay same

a. Create a two-way table of counts summarizing the information given in the data table above.
b. Using the counts determined in part a, calculate conditional percentages appropriate for comparing the weight action responses of males and females. That is, calculate the percentage in each weight action category for males and also (separately) for females.
c. Now use the complete YouthRisk03 dataset that’s on the companion website. Using a computer and statistical software, repeat parts a and b for the whole dataset.
d. Write a few sentences that summarize the results of part c.

Activity 4.4
In Example 4.2 (p. 116) the likelihood of marital separation was associated with smoking habits. Were the data collected in an observational study or in an experiment? Do you think that cigarette smoking may cause an increase in the marital separation rate? Do you think getting separated may cause a person to start smoking? How would you explain the association between smoking habits and the likelihood of marital separation?

Activity 4.5
Review Example 4.2 on page 116 and refer to the counts given in Table 4.3 on page 116 in the book.
a.
Calculate the risk of being a smoker for those who are separated.
b. Calculate the risk of being a smoker for those who are not separated.
c. Calculate the relative risk of being a smoker for those who are separated compared to those who are not separated. Write a sentence that explains this relative risk in a way that the general public would understand.
d. What is the percent increase in the risk of being a smoker for those who are separated compared to those who are not separated?

Activity 4.6
In the news (or on the Internet), find an article in which a relative risk is described.
a. Give the details of where you found the article, and briefly summarize what the article is about. Be sure to give the main result about the relative risk being described in the article you found.
b. On page 121 in the book, review the three questions that should be considered when you encounter statistics about risk. Discuss whether the article you found adequately addresses the issues in these questions. For instance, are the actual risks given?

Activity 4.7
In the news (or on the Internet), find an example of an observational study in which the relationship between two categorical variables is described.
a. Give the details of where you found the article, and briefly summarize what the article is about. Be sure to give the main result.
b. Does the article discuss whether any “third variables” were taken into account by the researchers when examining the relationship under study? If so, what “third variables” were considered? If not, what “third variables” do you think should be considered in the situation described in the article?

Activity 4.8
The parts of this activity are the Thought Questions in Section 4.2 of the book.
a. This is Thought Question 4.2 on page 118 in the book.
Based on the study described on pages 113-114 in Section 4.1, the relative risk of developing any myopia later in childhood is 5.5 for babies sleeping in full light compared with babies sleeping in darkness. Restate this information in a sentence that the public would understand.
b. This is Thought Question 4.3 on page 120 in the book. Suppose that a newspaper article claims that drinking coffee doubles your risk of developing a certain disease. Assume that the statistic was based on legitimate, well-conducted research. What additional information would you want about the risk before deciding whether or not to quit drinking coffee?
c. This is Thought Question 4.4 on page 123 in the book. If you were a frequent beer drinker and were worried about getting colon cancer, would it be more informative to you to know the risk of colon cancer for frequent beer drinkers or the relative risk of colon cancer for frequent beer drinkers compared to nondrinkers? Which of these statistics would likely be of more interest to the media? Explain your responses.

Activity 4.9
This activity is about Example 4.11 on page 124 in the book.
a. In Example 4.11, what is the response variable? What is the explanatory variable? What is the confounding factor?
b. Explain how Example 4.11 is an illustration of Simpson’s Paradox.

Activity 4.10
These questions are about Case Study 4.2 on page 133 in the book.
a. Write null and alternative hypotheses about the two variables in Case Study 4.2.
b. In Case Study 4.2, what is the population of interest? What is the sample?
c. Explain which one of the following three statements is the most correct way to state a conclusion about Case Study 4.2. (Hint: See the discussion at the bottom of page 132.)
1. There is no relationship between gender and driving after drinking for young drivers in Oklahoma.
2. The sample evidence is not strong enough to say that there is a relationship between gender and driving after drinking for young drivers in Oklahoma.
3.
There is a relationship between gender and driving after drinking for young drivers in Oklahoma.

Activity 4.11
On the companion website, refer to the original Journal Article for Chapter 11 – Example 11.12: “Development and initial validation of the Hangover Symptoms Scale: Prevalence and Correlates of Hangover Symptoms in College Students.”
a. Near the bottom of page 1445 of the article, the result of a chi-square test is given. Write the null and alternative hypotheses for this test.
b. What was the result of this chi-square test? Write a conclusion in the context of this study.

CHAPTER 5 ACTIVITIES
NOTE TO INSTRUCTORS: Three group/team projects for Chapter 5 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 5.1
Suppose that n = 300 students in statistics classes at a large university are asked “How important is religion in your own life (very, fairly, not very)?” The researcher would like to use the results to make generalizations about all persons at least 18 years old in the United States.
a. For the researcher’s desired use of the data, describe the population of interest and the observed sample.
b. Do you think the sample described in part a should be used to generalize about the population described in part a? Explain why or why not.
c. Suppose that the variable studied was handedness (right-handed or left-handed). For that variable, could we use the sample of Stat 200 students to generalize about the larger population? Explain.

Activity 5.2
For each part, locate the original Journal Article on the companion website. In each case, describe how the sample was selected for the study described in the article and discuss what larger population, if any, is represented by the sample for the response variable(s) in the study.
a.
The journal article for Chapter 11 – Example 11.12: “Development and initial validation of the Hangover Symptoms Scale: Prevalence and correlates of hangover symptoms in college students.”
b. The journal article for Chapter 13 – Exercise 13.39: “A Prospective Study of Holiday Weight Gain.”
c. The journal article for Chapter 3 – Example 3.3: “Some Exploratory Findings on the Development of Musical Tastes.”

Activity 5.3
For guidance, review the definitions of types of bias on page 150 in the book. Suppose that a national survey is done to study the extent of alcohol use by 12th grade students in the United States. In the survey, n = 3000 students are asked various questions about their alcohol use (or non-use).
a. In the context of such a survey about alcohol use by 12th graders, explain the difference between nonresponse bias and response bias.
b. In this situation, give an example of a way to collect data that may cause selection bias to affect the results.

Activity 5.4
This is Thought Question 5.2 on page 154 in the book. Suppose that a survey of 400 students at your school is conducted to assess student opinion about a new academic honesty policy.
a. Based on Table 5.1 (p. 153), about what will be the margin of error for the poll?
b. How many students attend your school? Given this figure, do you think the values in Table 5.1 should be used to estimate the margin of error for a survey of students at your school? Explain.

Activity 5.5
To begin, review Example 5.4 on page 152 in the book.
a. For the study described in Example 5.4, what is the population and what is the sample?
b. What is the value of the margin of error for the survey described in Example 5.4? Write a sentence that interprets this margin of error. (Hint: See the definition on page 151.)
c. Using information given in Example 5.4, calculate approximate 95% confidence intervals that estimate the proportion and percentage of all adult Americans who would say whether they would travel in outer space.
d. If the sample size for this survey had been 500 rather than 1,019, what would have been the value of the margin of error for the survey?

Activity 5.6
In Example 5.6 on page 156 in the book, the ID numbers for Sample 2 were determined using a table of random digits. Verify that the ID numbers given for Sample 2 are correct by showing the details of selecting numbers from a table of random digits and converting them to ID numbers.

Activity 5.7
Suppose that you will use data collected from a sample of eight students in your statistics class to estimate the average amount of time that all students in the class spend studying statistics each week. Your teacher will allow you to collect the data during a class meeting. [Note: We’re assuming that there will be more than eight students in class that day!]
a. Describe how you would pick students for the sample using a simple random sampling procedure.
b. Describe how you would pick students for the sample using a stratified random sampling procedure.
c. Describe how you would pick students for the sample using a cluster sampling procedure.

Activity 5.8
This is Thought Question 5.3 on page 162 in the book. In Section 5.1, the Fundamental Rule for Using Data for Inference stated that “available data can be used to make inferences about a much larger group if the data can be considered to be representative with regard to the question(s) of interest.” Read the description of how the ABC News poll in Example 5.7 (page 161) was conducted. Do you think the results of the poll can be extended to a larger group than the 779 people in the sample? If so, to what group can the results be extended and why?

Activity 5.9
To begin, review the discussion of the term sampling frame on page 162 in the book. Then, consider the following situation. Suppose that a Gallup Poll is done using random digit dialing to reach individuals in households with land-line telephones.
The purpose of this particular poll is to estimate the proportion of U.S. adults who favor stronger gun control laws.
a. Describe the distinction between the sampling frame and the population in this situation.
b. Explain whether you think the difference between the sampling frame and the population would (or would not) lead to selection bias in this situation.
c. Suppose that n = 1000 adults are surveyed and 63% of the sample favors stronger gun control laws. Calculate an approximate 95% confidence interval to estimate the proportion of all adults in the United States in favor of stronger gun control laws. Use either Example 5.3 or Example 5.4 on pages 152-153 for guidance.
Margin of error =
Approximate 95% confidence interval =

Activity 5.10
This is Thought Question 5.4 on page 164 in the book. Suppose you want to know how students at your school feel about the computer services that are offered. You are able to obtain the list of e-mail addresses for all students who are taking statistics classes, so you send a survey to a simple random sample of 100 of those students and 65 respond. Using the difficulties discussed so far in this section (pages 162-164), explain to whom you could extend the results of your survey and why.

Activity 5.11
Review pages 166-174 in Section 5.6 of the book. Then, find an example of a survey on the web or in print media that demonstrates one of the possible sources of bias listed on pages 166-167.
a. Briefly describe where you found this survey and what the survey was about.
b. Explain which source of bias is demonstrated in the survey.
c. Discuss how the survey should have been changed in order to eliminate the source of bias that you listed in part b.

Activity 5.12
This activity uses the Sampling applet described in Section 5.7 (pages 174-175 in the book). To begin, review Section 5.7 and start the applet on your computer.
a.
Take 20 samples, and then click on “Show Results.” You should see a popup window with 20 lines, where each line gives the mean height and percent female for one sample. Write down the sample mean heights for the 20 different samples, and then draw a dotplot of these 20 sample means. (Examples of dotplots are Figure 1.1 on page 2 and Figure 2.9 on page 30.)
b. Describe the characteristics of the 20 sample mean heights found in part a. For instance, what were the lowest and highest values of mean height? How do the mean heights from the samples compare to the mean height for the population of all 100 individuals, which is 68.0 inches? Based on a single sample of 10 individuals, are you likely to get a good estimate of the mean height for the population?
c. Does the mean height for the sample appear to be related to the percent of females in the sample? Would you expect them to be related? Explain.
d. Read the explanation of how to take a systematic sample, on pages 159-160 of the book. How would you take a systematic sample of five individuals from this population of 100 stick figures in the applet display?
e. Take a systematic sample of five individuals (by hand). Explain what you did in enough detail so that someone else could find your sample.
f. In using a sample to estimate the mean population height and percent female, would results from a systematic sample be biased? Explain.

CHAPTER 6 ACTIVITIES
NOTE TO INSTRUCTORS: Four group/team projects for Chapter 6 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 6.1
a. This is Thought Question 6.1 on page 190 in the book. For many randomized experiments, researchers recruit volunteers who agree to accept whichever treatment is randomly assigned to them. Why do you think this strategy cannot always be used, thus requiring observational studies to be used instead?
b.
This is Thought Question 6.3 on page 194 in the book. For most randomized experiments, such as medical studies comparing a new treatment with a placebo, it is unrealistic to recruit a simple random sample of people to participate. Why is this the case? What can be done instead to make sure the Fundamental Rule for Using Data for Inference (p. 148 in Section 5.1) is not violated?

Activity 6.2
Explain in your own words what a confounding variable is. Give an example where an apparent causal relationship between the explanatory and response variable is probably influenced by a confounding variable. Don’t use an example given in Chapter 6.

Activity 6.3
This is Thought Question 6.2 on page 192 in the book. Choose a possible confounding variable for the situation in Example 6.1 (pp. 191-192), other than the ones mentioned in the example, and explain how it meets the two conditions necessary to qualify as a confounding variable (defined on p. 191).

Activity 6.4
Find an example in the news (or on the Internet) of an observational study for which the news story or headline is attributing a cause and effect relationship. Cite the source for your example and explain what relationship was found. Discuss possible confounding variables for the study, or other explanations that may account for the observed relationship. In general, do you think the differences or changes in the explanatory variable were responsible for a difference in the outcome variable?

Activity 6.5
Find an example of a successful randomized experiment in the news (or on the Internet) that you think may apply to you, now or when you are older. Cite the source for your example and explain what relationship was found. Do you think that if you changed your behavior based on the explanatory variable (diet, taking aspirin, meditating, etc.) you would experience a change in the outcome variable as a result?
In general, do you think the differences or changes in the explanatory variable were responsible for a difference or change in the outcome variable?

Activity 6.6
This is Thought Question 6.4 on page 200 in the book. Students are sometimes confused by the reasons for blocking and for randomization. One method is used to control known sources of variability among the experimental units, and the other is used to control unknown sources of variability. Explain which is which, and provide examples illustrating these ideas.

Activity 6.7
As the basis for this activity, use the original Journal Articles for chapters other than Chapter 6 on the companion website.
a. Among the original Journal Articles (but not for Chapter 6), identify a study based on a randomized experiment. Briefly summarize the purpose of the study and the principal result(s).
b. For the study you identified in part a of this activity, explain what the researchers did to make the study a randomized experiment.
c. Among the original Journal Articles (but not for Chapter 6), identify a study that is an observational study. Briefly summarize the purpose of the study and the principal result(s).
d. In the observational study that you identified in part c, what confounding variables did the researchers take into account?
e. Explain whether the observational study that you identified in part c was a retrospective study or a prospective study.

Activity 6.8
Two restaurant servers, one male and one female, participate in a study done to examine the effect of drawing a happy face on the customer’s bill. For some customers, each server drew a happy face on the bill. For other customers, no drawing or message was put on the bill. It was randomly determined whether a happy face would be drawn or not. The researchers wanted to see if drawing the happy face increased tip percent.
a. Is this an experiment or an observational study? Explain.
b.
The purpose of most statistical studies is to use the observed sample data to generalize to a larger group. What do you think are the weaknesses of using this study to generalize to all restaurant servers?
c. In this study, what is the response variable and what are the explanatory variables?
d. If you were a restaurant server, would you be more interested in your mean tip or your median tip? Explain.

Activity 6.9
Design an experiment to test something of interest to you. Identify the response and explanatory variables for this experiment. Outline how treatments or conditions will be assigned. Discuss any other steps you might take to safeguard against the difficulties discussed in Section 6.4.

Activity 6.10
Design an observational study to examine something of interest to you. Summarize the purpose for this study, and identify the response and explanatory variables. Discuss how you would collect the data. What are some possible confounding variables in your study? How might you account or control for these confounding variables when examining (or collecting) the data?

Activity 6.11
Write brief explanations or definitions for each of the following terms. Suggestion: Use the Key Terms list on page 210 in the book to locate these terms in Chapter 6.
a. Randomized experiment
b. Observational study
c. Confounding variable
d. Randomization
e. Matched-pairs design
f. Block design
g. Rule for Concluding Cause and Effect

CHAPTER 7 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 7 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 7.1
a. This is Thought Question 7.1 on page 223 in the book. Review Case Study 7.1 (p. 222) and the list of five random circumstances on page 220. Using your understanding of probability and random events, assign probabilities to the two possible outcomes for Random Circumstance 3 on page 220.
b. This is Thought Question 7.2 on page 223 in the book.
Review Case Study 7.1 (p. 222) and the list of five random circumstances on page 220. At the beginning of Alicia’s day, the outcomes of the five random circumstances listed were uncertain to her. Which of them were uncertain because the outcome was not?

Activity 7.2
a. Give an example of a personal probability for an event of interest to you. Assign a probability to the event and describe how you decided on this probability.
b. Give an example of an event for which the relative frequency interpretation of probability can be used. (Don’t use any examples from Section 7.2 in the book.) Give the probability for this event and interpret the probability in terms of relative frequency.

Activity 7.3
This is Thought Question 7.3 on page 228 in the book. You are about to enroll in a course for which you know that 20% of the students will receive a grade of A. Do you think that the probability that you will receive an A in the class is .20? Do you think the probability that a randomly selected student in the class will receive an A is .20? Explain the difference in these two probabilities, using the distinction between relative frequency probability and personal probability in your explanation.

Activity 7.4
Flip a coin 100 times. After each 10 flips (that is, after 10 flips, 20 flips, 30 flips, etc.), stop and compute the proportion of heads using all flips up to that point. Plot the proportion of heads versus the number of flips. Discuss how the plot relates to the relative frequency interpretation of probability.

Flips    Heads    Proportion = Heads/Flips
10
20
30
40
50
60
70
80
90
100

Activity 7.5
Review Example 7.7 on page 229 in the book, and use that example for guidance. For this activity, use the same sample space of 1000 three-digit lottery numbers that is used in Example 7.7.
a. Let event C = the first digit drawn is a 7. What is the value of P(C)?
b. Let event D = the three-digit number is an even number (ends in 0, 2, 4, 6 or 8). What is the value of P(D)?
c.
Let event E = the sum of the three digits drawn is 2. What is the value of P(E)? (Hint: What are the three-digit numbers for which the three digits sum to the value 2?)

Activity 7.6
This is Thought Question 7.4 on page 232 in the book. Review Case Study 7.1 (p. 220). Remember that there were 50 students in Alicia’s statistics class and that student names were not put back in the bag after being selected.
a. Consider the events A = Alicia is selected to answer question 1 and B = Alicia is selected to answer question 2. Describe each of the following four conditional probabilities in words and also determine a value for each: P(B|A), P(B^C|A), P(B|A^C), P(B^C|A^C), where the superscript C denotes the complement of an event.
b. Now, based on your answers (to part a), can you formulate a rule about the value of P(B|A) + P(B^C|A)?

Activity 7.7
The In Summary box on page 232 and other material in Section 7.3 can be used for guidance in this activity. Pick some random situation (for instance, drawing cards from a 52-card deck or tossing dice). Using that random situation, give examples for each of the following:
a. Two events that are complements of each other.
b. Two mutually exclusive events.
c. Two independent events.
d. Two dependent events.

Activity 7.8
Read Example 7.15 on pages 234-235 in the book; use the information in that example for this activity.
a. Give the probability of each of the following events.
C = roommate doesn’t like to party    P(C) = _______
D = roommate doesn’t snore    P(D) = ________
C and D = doesn’t like to party and doesn’t snore    P(C and D) = ______
b. For events C and D defined in part a, determine the value of P(C or D). Write a sentence that describes or interprets this probability.
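
Part b of Activity 7.8 can be approached with the general addition rule, P(C or D) = P(C) + P(D) − P(C and D). The sketch below only illustrates that rule. The three probabilities are made up and are not the values from Example 7.15, and the function name prob_or is hypothetical. Fractions are used so that the arithmetic is exact.

```python
from fractions import Fraction


def prob_or(p_c, p_d, p_c_and_d):
    """General addition rule: P(C or D) = P(C) + P(D) - P(C and D)."""
    return p_c + p_d - p_c_and_d


# Hypothetical probabilities for illustration only (not from Example 7.15).
p_c = Fraction(3, 5)        # P(C)
p_d = Fraction(7, 10)       # P(D)
p_c_and_d = Fraction(1, 2)  # P(C and D)

p_c_or_d = prob_or(p_c, p_d, p_c_and_d)
print(p_c_or_d)         # 4/5
print(float(p_c_or_d))  # 0.8
```

Subtracting P(C and D) is needed because outcomes in which both events occur are counted once in P(C) and again in P(D).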

Activity 7.9
On the companion website, refer to the original Journal Article for Chapter 7 – Examples 7.12, 7.18, 7.24, and 7.28: “A Comparison of Gambling by Minnesota Public School Students in 1992, 1995, and 1998.” The data are from essentially the whole populations of 9th and 12th graders in Minnesota during the years considered in the study.
a. Use tables 4 and 5 on pages 283 and 284 of the article to determine values of the following conditional probabilities for 12th grade students in 1998:
P(12th grade student is weekly gambler | student is boy) = _________
P(12th grade student is weekly gambler | student is girl) = _____________
Note: Example 7.12 on page 230 is about 9th grade students.
b. Review Example 7.16 on page 233 in the book. Assume that the 12th grade population is 50.9% girls and 49.1% boys, as was assumed for 9th grade students in Example 7.16. Calculate the probability that a randomly selected 12th grade student in Minnesota is a female who is a weekly gambler.
c. Review Example 7.24 on pages 241-242. For 12th grade students, create a “hypothetical hundred thousand” table like the one in Example 7.25 for 9th grade students.

         Weekly Gambler    Not Weekly Gambler    Total
Boy
Girl
Total                                            100,000

d. Use the table created in part c to determine, for 12th grade students, the probability that a weekly gambler is a boy. That is, determine P(boy | weekly gambler).

Activity 7.10
This is Thought Question 7.5 on page 242. Continuing the DNA example, Example 7.25 (p. 242), verify that the conditional probability P(DNA match | innocent person) = 5/5,999,999 ≈ .00000083. Then provide an explanation that would be understood by a jury for the distinction between the two statements:
The probability that a person who has a DNA match is innocent is 5/6.
The probability that a person who is innocent has a DNA match is .00000083.

Activity 7.11
This is Thought Question 7.6 on page 245. Explain why the tree diagram in Figure 7.1 (p.
241) displayed disease status first and test results second rather than the other way around.

Activity 7.12
To begin, review Example 7.25 on page 243 in the book.
a. Using the results given in Table 7.4, estimate the probability that prize 1 is not in any of the six cereal boxes.
b. Assuming that the six boxes are independent, what is the theoretical probability that prize 1 is in none of the six boxes? (Hints: See Rule 3 on page 233 for independent events and remember that the probability is ¾ that prize 1 will not be in any specific box.)
c. If you have access to the necessary statistical software, repeat the simulation done in Example 7.29 on page 245. (You’ll get different results!) Use your simulation to estimate the probability that all four prizes will be in six boxes of cereal.

Activity 7.13
This is Thought Question 7.8 on page 252. If you wanted to pretend to be psychic, you could do a “cold reading” on someone you do not know. Suppose you are doing this for a 25-year-old woman. You make statements such as the following:
I see that you are thinking of two men, one with dark hair and the other one with slightly lighter hair or complexion. Do you know who I mean?
I see a friend who is important to you but who has disappointed you recently.
I see that there is some distance between you and your mother that bothers you.
Using the material in this section (Section 7.7), explain why this would often work to convince people that you are psychic.

Activity 7.14
Find a story of a startling coincidence in the news or on the Internet. (One way to do this is to type “amazing coincidence” into a search engine. You will get plenty of material.) Evaluate roughly how likely you think it is that the specific sequence of events would happen to the specific person to whom they happened. Then, evaluate roughly how likely something similar to that sequence of events would be to happen to the specific person.
Finally, evaluate how likely you think something similar to that sequence of events would be to happen to someone, somewhere, someday.

Activity 7.15
Ask several (around 10) people to each write down what they think would be a typical sequence of 20 coin flips. Have them write H for heads and T for tails. As an example, they might write H,H,T,H,T,T,T,T,H,T,T,H,H,T,H,T,H,H,H,T. After a person has written the sequence, count and record the length of the longest streak of consecutive “flips” of the same type. For the example just given, this value will be four, which is the number of consecutive T’s beginning at the 5th “flip” of the sequence. Then, flip a coin 20 times and record the length of the longest streak of consecutive flips of the same type (longest number of consecutive heads or consecutive tails). Repeat this several times (around 10). Compare the lengths of the longest streaks in the imagined “typical” sequences to the lengths of the longest streaks in actual coin flips. Summarize the results and discuss whether the results may provide evidence that people’s imagined flips are affected by the gambler’s fallacy.

Activity 7.16
To begin, read Case Study 7.2 on pages 252-253 in the book.
a. In the second column of Case Study 7.2, the probabilities used for events B, C and D are 108/119, 96/118, and 84/117, respectively. Explain why these are the right probabilities.
b. Now suppose that a playlist of songs on an iPod or an MP3 player has a total of 60 songs, with 10 songs from each of six albums. The music player can randomly shuffle the order of the songs. What is the probability that at least two of the first three songs after the random shuffle are from the same album? As in Case Study 7.2, first find the probability that the first three songs are from different albums and then subtract that value from 1.
c. Suppose that you have access to statistical software that can randomly order a list of words or numbers.
For the situation in part b (10 songs from each of 6 albums), describe how you could use this software to do a simulation for the purpose of estimating the probability that at least two of the first three songs are from the same album.
d. If you have access to the necessary statistical software, do a simulation in order to estimate the answer to part b. Base your answer on at least 25 different random orderings. Summarize your results and give the estimated probability.
e. For the situation described in part b, what is the probability that in the first six songs of a random shuffle there is one song from each of the six albums? Either calculate the value theoretically or do a simulation. (Or, do both things.)
f. Create your own probability question about an outcome of a random shuffle of an iPod (or other player’s) playlist. Use simulation to estimate the probability of the outcome of interest for your question. As an example, for the situation described in Case Study 7.2, you might examine the probability that the first five randomly ordered songs are from only two albums.

CHAPTER 8 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 8 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website. Project 8.2 on the companion website is a more sophisticated version of Activity 8.13 on page 79 in this manual.

Activity 8.1
This is Thought Question 8.1 on page 266 in the book. If you know that the number of possible values a random variable can have is finite, do you know whether the random variable is discrete or continuous? Answer the same question for a random variable that can have an infinite number of possible values.

Activity 8.2
Review Example 8.6 on page 267. Then, modify the example so that it is about families with two children rather than three children.
a. List the sample space of possible sequences (arrangements) of the sexes of the two children.
b.
Define the random variable X = number of girls among the two children. Assign the appropriate value of X to each simple event in the sample space.
c. Give a probability for each possible value of X.
Probability of 0 girls P(X = 0) =
Probability of 1 girl P(X = 1) =
Probability of 2 girls P(X = 2) =

Activity 8.3
Review the subsection on page 269 about a cumulative probability distribution function.
a. What is a cumulative probability? In words, give an example of a cumulative probability for a variable not considered on page 269.
b. In Example 8.8 (p. 269), it was found that P(X ≤ 1) = 4/8. In the context of the example, write a sentence that describes the event for which this is the probability.
c. For Example 8.8 (p. 269), explain why P(X ≤ 2) was calculated in the way shown in the example.

Activity 8.4
This is Thought Question 8.2 on page 270 in the book. In Example 8.9 (pp. 269-270), we added the probabilities of X = 1 and X = 2 because they were mutually exclusive events. For any discrete random variable X, is it always true that X = k and X = m are mutually exclusive events, where k and m represent two values that X can have? Explain.

Activity 8.5
In a statistics class at a large university, students were asked to rate how much they liked various kinds of music on a scale of 1 (don’t like at all) to 6 (like very much). Following is a probability distribution for the female students’ ratings of Top 40 music. Notice that the probability is not given for a rating of 4.

X = Rating    Probability
1             .04
2             .05
3             .09
4             ?
5             .32
6             .26

a. Is the random variable X = music rating a discrete variable or a continuous variable? Explain.
b. What is the value of the probability (not given) for X = 4, the probability that the rating equals 4? Hint: What is the total of the probabilities given for all values of X that are not equal to 4? By the “laws” of probability, what is the total probability for all possible outcomes?
c.
Determine P(X ≤ 3), the probability that a Top 40 music rating given by a randomly selected female is 3 or less. To do this, add the probabilities for X = 1, 2 and 3.
d. For each Top 40 rating value, determine the cumulative probability. For guidance, see Example 8.8 on page 269.

X = Rating                1      2      3      4      5      6
Cumulative Probability    ____   ____   ____   ____   ____   1

e. Write a sentence that explains the meaning of the cumulative probability P(X ≤ 4) in the context of this example.
f. For this example, show that P(X = 5) + P(X = 6) = 1 – P(X ≤ 4).

Activity 8.6
This is Thought Question 8.3 on p. 271 in the book. Refer to the probability distribution for the sum of two dice, shown at the top of page 268 and in Figure 8.2 on page 270.
a. What is the value of P(X = 4)? What does this probability measure?
b. Explain what is measured by the value of 1 – P(X = 4).
c. This is an added part to Thought Question 8.3. What is the value of 1 – P(X = 4)?

Activity 8.7
Review Example 8.11 on pages 271-272 in the book. Then, modify the example so that the probability distribution for the amount a player gains on a single play is as follows:

X = amount gained    Probability
$3                   .3
−$2                  .7

a. Calculate the expected value for this probability distribution.
b. Write a sentence that interprets the expected value in this situation. For guidance see the sentence after the calculation in Example 8.11 on pages 271-272.

Activity 8.8
This is Thought Question 8.4 on page 272 in the book. Suppose that the probability of winning in a gambling game is .001, and when a player wins, his or her net gain is $999. When a player loses, the net amount lost is $1 (the cost to play). Is this game fair? Why or why not? How would you define a fair game? Does the number of times the game is played affect your view of whether the game is fair? Explain.

Activity 8.9
Find two lottery or casino games that have fixed payoffs and for which the probabilities of each payoff are available.
Some lottery tickets list them on the back of the ticket or on the lottery’s Web site. Some books about gambling give the payoffs and probabilities for various casino games.
a. Compute the expected value for each game. Discuss what each expected value means.
b. Using both the expected values and the list of payoffs and probabilities, explain which game you would rather play and why.

Activity 8.10
The television game show Deal or No Deal, on NBC in the U.S., requires simple probability assessments on the part of the contestant. At the game’s beginning, the player chooses one of 26 numbered briefcases, each concealing a different money amount, with the 26 amounts ranging from $0.01 to $1,000,000. Then, in each round of the game, the amounts concealed in some of the other briefcases are revealed, after which the “bank” offers the player an amount of money (“deal”) to stop playing. If the player says “no deal,” more briefcases are opened and the player is offered a different “deal.” The game ends when the player accepts a deal or when he/she has rejected all deals and accepts the amount concealed in the originally picked briefcase.
The Deals: In rounds late in the game, deals offered by the bank tend to be in the range of 75% to 95% of the average amount in the unopened briefcases at that point. Early in the game, the deals are much lower, in part to encourage a player to keep playing.
a. Suppose that there are four unopened briefcases remaining, including the player’s original pick, and the remaining amounts are known to be $10, $50, $10,000 and $400,000. The bank offer (deal) is $70,640. If the player says “no deal,” one more briefcase will be opened and a new deal will be offered. Discuss whether the player should accept the deal or say “no deal.” (Suggestion: Consider the potential deals and their probabilities if another case is opened.)
b.
Suppose that a different player has three unopened briefcases remaining, and the remaining amounts are known to be $0.01, $5, and $500,000. The bank offer (deal) is $152,503. If the player says “no deal,” one more briefcase will be opened and a new deal will be offered. Discuss whether the player should accept the deal or say “no deal.”
c. The 26 money amounts randomly distributed to the briefcases in the game are: $0.01; $1; $5; $10; $25; $50; $75; $100; $200; $300; $400; $500; $750; $1,000; $5,000; $10,000; $25,000; $50,000; $75,000; $100,000; $200,000; $300,000; $400,000; $500,000; $750,000; $1,000,000. For players who decide in advance that they will reject all deals in favor of the amount in the first case that they picked, what is the expected value (average value) of the game? For a player who decides in advance to reject all deals and take the amount in the first case picked, what is the probability that the amount he or she wins is less than the expected value of the game?
d. (This part might best be done by student groups.) On the Internet, you will probably be able to find a simulation of the game. Using an Internet simulation of the Deal or No Deal game and/or other means, such as statistical software that can do random selection and perhaps theoretical considerations, propose and investigate a possible strategy for doing well (winning money!) in the game. As an example, what are your chances of doing well if you reject all deals until four unopened briefcases remain? Summarize your findings and how you arrived at these findings.

Activity 8.11
a. This is Thought Question 8.5 on page 277 in the book. The word binomial is from the Latin bi = “two,” and nomen = “name.” Explain why the word binomial is appropriate for a binomial random variable.
b. This is Thought Question 8.6 on page 280 in the book. In Example 8.17 (pp. 278-279), we determined the probability that you could guess your way to a passing score on a quiz with 15 true-false questions.
If you did guess at each of 15 true-false questions, what is the expected value of X = number of correct answers? Is the expected value a possible score on the quiz? What exactly does the expected value tell us?

Activity 8.12
Review Case Study 8.1 on pages 280-281 in the book.
a. Explain why a single participant’s twenty trials are a binomial experiment with n = 20 and p = .5, if it is assumed that he or she cannot really detect the difference between samples A and B so is just randomly guessing on every trial.
b. Briefly summarize how the researchers used the binomial distribution to define the standard that they used for a “significant” flavor detection performance.

Activity 8.13
Work with a partner. You’ll assume the role of the “experimenter.” To do this activity, you’ll need four cards of the same rank from a 52-card deck. For example, you might use the four Kings (King of hearts, King of diamonds, King of clubs, King of spades). Randomly mix the cards and then select one in a way that your partner can’t see what you picked. To guard against your own selection bias, don’t look at the cards while making your selection. Have your partner guess the suit of the card you picked. Repeat this 10 times, and keep track of right and wrong guesses using the following table.

Trial              1    2    3    4    5    6    7    8    9    10
Right or Wrong?

a. What was the number of correct guesses made by your partner?
Number of correct guesses by partner = _____
b. Assuming a person randomly guesses each time, explain why this is a binomial experiment and give the values of n and p for the experiment.
c. Use statistical software (or Excel) or an appropriate TI calculator to determine the following probabilities for somebody randomly guessing on all tries. Tip: See the software tips on pages 297 and 298 in the book.
Probability of 0 correct guesses in 10 tries: P(X = 0) =
Probability of 2 or fewer correct guesses in 10 tries: P(X ≤ 2) =
Probability of 6 or fewer correct guesses in 10 tries: P(X ≤ 6) =
d.
Determine the probability that somebody who is randomly guessing could make more correct guesses than your partner did. (Hint: The first step is to find the cumulative probability for the number of correct guesses by your partner.)

Activity 8.14
a. This is a shortened version of Thought Question 8.7 on page 283. The total area under the probability density function over the entire range of values the random variable X can possibly have is the same for all continuous random variables. What is that total area? What probability does it represent?
b. This is Thought Question 8.8 on page 284 in the book. Which of the following measurements do you think are likely to have a normal distribution: heights of college men, incomes of 40-year-old women, pulse rates of college athletes? Explain your reasoning for each variable. For those variables that are likely to be normally distributed, give approximate values for the mean and standard deviation.
c. Give an example of a continuous random variable that you think does not have a normal distribution and sketch what you think is its density curve. Don’t use any examples given in Chapter 8.

Activity 8.15
a. Follow the steps in Example 8.24 (p. 289) to determine the probability that the height of a randomly selected college woman is less than 67 inches. Use the same population mean and standard deviation used in the book example. Draw a clearly labeled sketch that illustrates the answer. Use Figure 8.16 (p. 285) for guidance.

Activity 8.16
Suppose that the heights of college-age men have a normal distribution with mean μ = 71 inches and standard deviation σ = 2.7 inches.
a. Using Example 8.26 (page 291) for guidance, find the 75th percentile of heights for college-age men.
b. At the bottom of the table inside the back cover of the book, the information is given that for z = 4.26, the cumulative probability is .99999. Expressed as a percentile, z = 4.26 is the 99.999th percentile of a standard normal curve.
Find the height that is the 99.999th percentile of heights for college-age men.

Activity 8.17
Write definitions or short explanations for each of the following key terms in section 8.6:
a. Normal random variable:
b. Normal curve:
c. Standardized score:
d. Standard normal random variable:

CHAPTER 9 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 9 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 9.1
For guidance, use pages 313-315 in the book.
a. Briefly explain the purpose for creating a confidence interval. Give an example of a situation in which a confidence interval would be useful.
b. Briefly explain the purpose for conducting a hypothesis test. Give an example of a situation in which it would be useful to conduct a hypothesis test.
c. Define, or explain, the term statistical inference.
d. In each situation, explain whether the summary described is a population parameter or a sample statistic.
(i) Mean GPA of a random sample of 144 students at your school.
(ii) The proportion of all U.S. residents who are under 20 years old.
(iii) The Gallup Poll surveys 1,000 randomly selected American adults and finds that 48% approve of the president’s job performance.

Activity 9.2 (for Section 9.2 in the book)
a. Read Example 9.1 on page 316 in the book. What does the parameter d represent in this example? Why are we not able to know the value of this parameter? What is the value of the sample statistic that estimates the parameter d in this example?
b. Section 9.2 describes five different population parameters (“The Big Five Parameters”) that will be covered in Chapters 9-13. List the five population parameters. Describe each parameter in words and write the symbol used for each parameter. Give an example for each parameter; give different examples from those given in Section 9.2.
Activity 9.3
Read the original Journal Article on the companion website for Chapter 13 – Exercise 13.39: “A Prospective Study of Holiday Weight Gain.”
a. Identify a population parameter of interest in this study. Be sure to describe both the population of interest and the summary characteristic of interest.
b. From the journal article, what is the value of the sample estimate of the population parameter that you identified in part a? What was the sample size for the dataset that the researchers used to determine this sample estimate?
c. Explain why it would have been useful for the researchers to have used a confidence interval to estimate the population parameter that you described in part a. (Confidence intervals are discussed on pages 313-315 in the book.)

Activity 9.4
This is Thought Question 9.1 on page 326 in the book. For Example 9.4 (p. 326), into what range of possible values should the sample proportion fall 95% of the time, according to the Empirical Rule? Suppose that the polling organization used a sample of only 600 voters instead. Would the range of possible sample proportions be wider, narrower, or the same as it was for a sample size of 2400? Explain your answer, and explain why it makes intuitive sense.

Activity 9.5
To begin, review Example 9.4 on page 326 in the book.
a. Suppose that the sample size for Example 9.4 was n = 1000 voters, rather than 2400 as in the book example. With n = 1000, what will be the values of the mean and standard deviation of the sampling distribution of the sample proportions?
b. Still referring to Example 9.4 on page 326: if the sample size is n = 1000 (rather than 2400), what will be the interval that covers about 99.7% (nearly all) of possible sample proportions in favor of Candidate X?

Activity 9.6
To begin, review Example 9.5 on pages 327-328 in the book.
a. In the study described in Example 9.5, suppose that there had been 45 participants, so that the combined number of trials is n = 45 × 20 = 900.
Determine the mean and standard deviation of the sampling distribution of possible proportions of correct guesses.
b. For a study with 45 participants and n = 900 trials, draw a sketch similar to Figure 9.4 (p. 327) that shows the approximate sampling distribution of possible sample proportions, assuming that all participants guess every time.

Activity 9.7
The purpose of this activity is to examine characteristics of sample means for different samples from the same population (Section 9.6 in the book). Use the Student0405 dataset on the companion website. The data are from student surveys in statistics classes at a large university in the years 2004-2005. The variable StudyHrs gives responses to, “How many hours do you typically study per week?”
a. Using statistical software, draw a histogram of the StudyHrs variable. Is the shape bell-shaped, skewed to the right or skewed to the left? About what is the most common response for weekly hours of study?
b. Determine the mean and standard deviation of the StudyHrs variable.
Mean study hours = _________
Standard Deviation of study hours data = __________
c. Now, treat the dataset as if it were population data, so think of the values found in part b as population parameters. We will take many different random samples of n = 36 individuals from the population of responses for the StudyHrs variable. What are the mean and standard deviation of the distribution of possible values of the sample mean, for samples of n = 36, in this situation?
Mean of possible sample means = _______
Standard deviation of possible sample means = σ/√n = ____________
d. Assuming that the sampling distribution for a sample mean (p. 333) holds, fill in the blanks in this statement: For about 68% of all random samples of n = 36 students, the sample mean hours of study will be between ______ and _______.
e. Use statistical software to select a random sample of n = 36 values from the StudyHrs column in the dataset.
(Minitab users: Calc>Random Data>Sample From Columns.) Then, use the software to determine the sample mean for the sample you selected.
Sample Mean for selected sample = _________
f. Now, repeat the process for the previous part nineteen additional times. Each time, get a new random sample of n = 36 values from the StudyHrs column and find the mean for the sample. List all 20 sample means that you’ve generated (the mean from part e and the nineteen new means from this part).
g. Of your 20 sample means listed in part f, how many were within the interval that you computed in part d? Explain whether this is about what would be expected or not.
h. What fraction of your 20 sample means were within ±1 hour of the population mean (from part b)? Suppose that we were to take 20 different random samples of n = 100. Do you think that the fraction of sample means within ±1 hour would be more, less, or the same as it was for your sample of n = 36? Explain.
i. Suppose that the sample means generated by all students in your class are combined and a histogram of these sample means is drawn. Approximately what would you expect the shape of this histogram to be? Explain.
j. Refer back to the list of means that you wrote for part f of this activity. On the basis of that list, which one of the following statements could be a “moral of the story” for this activity?
The value of a sample statistic varies from sample to sample.
The value of a population parameter varies from sample to sample.

Activity 9.8
To begin, review Example 9.8 on page 333 in the book.
a. Suppose that the sample size in Example 9.8 is changed to n = 64. What will be the values of the mean and standard deviation of the sampling distribution of potential sample means?
b. Explain what is being shown in Figure 9.7 on page 334.

Activity 9.9
a. This is Thought Question 9.2 on page 333 in the book.
Construct an example of interest to you personally for which the Rule for Sample Means applies and for which a study could be done to estimate a population mean.
b. This is Thought Question 9.3 on page 335 in the book. From the weight-loss example discussed in Example 9.7 (p. 331) and also on pages 334-335, we learned that increasing the sample size fourfold would about halve the range of possible sample means. Would the range of individual weight losses in the sample be likely to increase, decrease, or remain about the same if the sample size were increased fourfold? Explain.

Activity 9.10
For this activity use the SampleMeans applet described in Section 9.11 (pages 349-351) in the book. The applet is on the companion website. Additional activities for this applet are on page 366 in the book.
a. Use the default sample size of n = 25 to generate 500 different samples. What is the approximate shape of the histogram of sample means? (This will be the histogram in red, the bottom histogram in the display.) Is this the shape that would be predicted by the normal curve approximation rule for sample means?
b. In this situation, what does the normal curve approximation rule predict that the mean and standard deviation of possible sample means will be, for samples of size n = 25? Does it look like the mean of the histogram is about what it should be? (Note that for the population of individual measurements, µ = 8 and σ = 5.)
c. According to the Empirical Rule, what interval of values will contain about 99.7% of the possible values of sample means for samples of size n = 25? Does the histogram of sample means appear to span about this range?
d. For one simple random sample of n = 25 individuals, how likely is it that the mean weight loss would be 4 pounds or less? Use the histogram of sample means to evaluate this.
e. For one simple random sample of n = 25 individuals, explain whether it is likely that the mean weight loss could be 9 pounds or more.
Use the histogram of sample means to evaluate this.

Activity 9.11
For this activity, use the TVMeans applet. This applet is essentially the same as the SampleMeans applet described in Section 9.11 (pages 349-351) in the book, but the population is responses given by college students to a question asking how many hours they watch television in a typical week. The TVMeans applet is on the companion website.
a. What is the shape of the histogram of the population of individual measurements (the top histogram)?
b. Generate 2000 samples with n = 4 observations. (Use 4 batches of 500 samples without clearing.) Observe the bottom histogram, which gives the histogram of the means for the 2000 samples. Is the histogram approximately bell-shaped? If not, is the shape what you would expect? Explain.
c. Repeat part b using random samples of n = 16.
d. Repeat part b using random samples of n = 25.
e. Repeat part b using random samples of n = 36.
f. Repeat part b using random samples of n = 49.
g. Based on the results of this activity, about how large should the sample size n be for the Normal Curve Approximation Rule for Sample Means to work in this situation?

Activity 9.12
a. This is Thought Question 9.4 on page 346 in the book. Verify that if the raw data for each individual in a sample is 1 when the individual has a certain trait and 0 otherwise, then the sample mean is equivalent to the sample proportion with the trait. You can do this by using a formula, explaining it in words, or constructing a numerical example.
b. This is Thought Question 9.5 on page 347 in the book. The Central Limit Theorem does not specify what is meant by “a sufficiently large sample.” What factor(s) about the population of values do you think determine how large is large enough for the approximate normal shape to hold? Consider the California Decco example. Do you think n = 30 would be large enough for the distribution of possible values for the average loss to be approximately normal?
Why or why not? Now consider the handspan measurements of females. Do you think n = 30 would be large enough for the approximate normal shape to hold? What is different about these two examples?

Activity 9.13
This is Thought Question 9.6 on page 348 in the book. Example 9.15 (p. 347) described a sample statistic, H = highest number drawn, for the Cash 5 lottery game. Give another example of a sample statistic for the Cash 5 game, and describe what you think the shape of its sampling distribution would be.

Activity 9.14
On the companion website, refer to Chapter 9 of the Technology Manual for the statistical software that you use in your class. Read over the description of how to carry out a simulation for Example 9.4 (page 326) in Section 9.4 of the book.
a. Change the sample size to 1000 voters per sample and carry out the simulation with (at least) 400 repeated samples. Create a histogram of the sample proportions. What is the shape of the histogram? Is it about what you would expect? Explain whether the range of sample proportions is about what you would expect.
b. Change the sample size to 500 voters per sample and carry out the simulation with (at least) 400 repeated samples. Create a histogram of the sample proportions. What is the shape of the histogram? Is it about what you would expect? Explain whether the range of sample proportions is about what you would expect.
c. Compare the range of simulated sample proportions for samples of n = 2400 (see Example 9.4), n = 1000 (part a), and n = 600 (part b). What is indicated about the benefits of increasing the sample size of a survey? (Remember that the “true” proportion is p = .4 in this situation.)

CHAPTER 10 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 10 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 10.1
This is Thought Question 10.1 on page 372 in the book. Each day, Maria gets dozens of e-mail messages.
She keeps track of what proportion of the messages is spam or other junk, and what proportion is interesting. Suppose she got 50 messages yesterday, and 20 of them were interesting.
a. If the collection of messages on a single day is considered to be a sample of all e-mail messages she ever receives, explain the meaning of each of the following definitions in the context of this example, and give numerical values where possible: unit, population, sample, sample size, population parameter, sample statistic (or sample estimate).
Unit =
Population =
Sample =
Sample size =
Population parameter =
Sample statistic =
b. Discuss whether you think the Fundamental Rule for Using Data for Inference (p. 372) would allow Maria to draw conclusions about the population proportion based on the sample proportion.

Activity 10.2
a. Describe the basic purpose of a confidence interval.
b. Give an example, not given in Chapter 10, of a situation in which a confidence interval could be used to estimate the unknown value of a population proportion. Indicate the population and population proportion of interest in your example.

Activity 10.3
Review Example 10.2 on page 375 in the book, and use that example for guidance in this activity. Suppose that the sample result for a different poll on the same issue is that 33% of a randomly selected sample of 635 American adults said they are allergic to something.
a. For this new sample, what is the value of p̂, the sample statistic?
b. For this new sample, calculate the value of the standard error of p̂.
c. Using this new sample, calculate the 95% confidence interval that estimates p, the proportion of all American adults who have an allergy.
d. Write a sentence that interprets the confidence interval calculated in the previous part.

Activity 10.4
This is Thought Question 10.2 on page 374 in the book. Explain in your own words what it means to say that we have 95% confidence in the interval estimate.
Then give an example of something you do in your life that illustrates the same concept: You follow the same procedure each time, and it either works (most of the time) or does not work to produce the desired result. What confidence level would you assign to the procedure in your example; that is, what percentage of the time do you think it produces your desired result?

Activity 10.5
To begin, review Example 10.3 on page 378 in the book.
a. Using the information given in Example 10.3, calculate a 95% confidence interval that estimates p = proportion of all Americans who think there is intelligent life on other planets.
b. Write a sentence that interprets the confidence interval found in the previous part of this activity.
c. In Example 10.3 of the book, what is the population parameter of interest and what is the value of the sample statistic that estimates this parameter?

Activity 10.6
Suppose that a Gallup Poll is done using random digit dialing to reach individuals in households with land-line telephones. The purpose is to estimate the proportion of U.S. adults who favor stronger gun control laws. In the survey, n = 900 individuals are sampled. In this sample the number of individuals who favor stronger gun control is 567.
a. What is the population of interest in this situation?
b. In words, describe the population parameter of interest. What mathematical symbol is used to represent this parameter?
c. Describe the sample in this situation.
d. What is the value of the sample statistic in this situation? What mathematical symbol is used to represent this statistic?
e. The formula for the standard error of a sample proportion is √[p̂(1 − p̂)/n]. Calculate the standard error of the sample proportion in this situation.
f. Use the general format Sample Statistic ± 2 × Standard Error to calculate an approximate 95% confidence interval that estimates the parameter of interest in this situation, and write a sentence that interprets the confidence interval.
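
The standard error formula and the Sample Statistic ± 2 × Standard Error format used in Activity 10.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name approx_ci_95 and the counts used (120 out of 400) are hypothetical and are not taken from the book or from the activity, so the activity itself is still left for you to work out.

```python
import math

def approx_ci_95(successes, n):
    """Approximate 95% confidence interval for a population proportion.

    Uses the general format: sample statistic +/- 2 x standard error.
    The function name and arguments are hypothetical, for illustration only.
    """
    p_hat = successes / n                     # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    margin = 2 * se                           # approximate 95% margin of error
    return p_hat, se, p_hat - margin, p_hat + margin

# Hypothetical data: 120 of 400 sampled individuals have the trait.
p_hat, se, lower, upper = approx_ci_95(120, 400)
print(f"p-hat = {p_hat:.3f}, standard error = {se:.4f}")
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

With these made-up counts the interval runs from roughly .254 to .346. The multiplier 2 follows the general format stated in the activity; statistical software typically uses 1.96 instead, so its intervals will be slightly narrower.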
Activity 10.7
This is a modified version of Thought Question 10.3 on page 380 in the book. Suppose the legislature in a particular state wanted to know what proportion of students graduating from the state university last year were permanent residents of the state. The university had information for all students showing that 3900 of the 5000 graduates were state residents.
a. In this situation, what is the population? What is the sample?
b. Is a confidence interval appropriate for this situation? If so, compute the appropriate interval. If not, explain why not.

Activity 10.8
To begin, read Example 10.10 on page 388 in the book.
a. In words, describe the population parameter that is estimated in Example 10.10. What mathematical symbol(s) is used for this parameter?
b. What is the value of the sample statistic that estimates the parameter of interest in this example? What mathematical symbol(s) is used for this statistic?
c. What is the 95% confidence interval that estimates the parameter of interest in this situation? Write a sentence that interprets this confidence interval.
d. Explain why the 95% confidence interval in this situation makes it reasonable to conclude that 12th grade female drivers are more likely than 12th grade male drivers to always wear seatbelts when driving.

Activity 10.9
This is Thought Question 10.6 on page 389 in the book. An environmental group is suing a manufacturer because chemicals dumped into a nearby river may be harming fish. A sample of fish from upstream (no chemicals) is compared with a sample from downstream (chemicals), and a 95% confidence interval for the difference in proportions of healthy fish is .01 to .11 (with a higher proportion of healthy fish upstream). First, interpret this interval. The statistician for the manufacturer produces a 99% confidence interval ranging from –0.01 to +0.13.
He tells the judge that because the interval includes 0, and because it has higher accompanying confidence than the other interval, we can’t conclude that there is a problem. Comment.

Activity 10.10
This is Thought Question 10.7 on page 392 in the book. A randomly selected sample of 400 students is surveyed about whether additional coed dorms should be created at their school. Of those surveyed, 57% say that there should be more coed dorms. The 95% margin of error for the survey is 5%.
a. Compute a 95% confidence interval for the population percentage in favor of more coed dorms.
b. On the basis of this confidence interval, can we conclude that more than 50% of all students favor more coed dorms? Explain. Can we reject the possibility that the population proportion is .60?

Activity 10.11
This is a dataset activity. Use the GSS-02 dataset, which gives data from the 2002 General Social Survey. It’s a survey of randomly selected U.S. adults. A description of the dataset is on the companion website. You’ll have to consult that description to learn what variables are in this dataset.
a. Describe a population proportion that could be estimated using the GSS-02 data. (Don’t estimate the proportion who are of a particular sex; make it more interesting than that!) Use the data to calculate a 95% confidence interval that estimates this proportion. Write a sentence that interprets the interval.
b. Analyze the difference between males and females with regard to the population proportion that you considered in part a. What are the values of the sample proportions for males and females? Calculate a 95% confidence interval for the difference in two proportions. Interpret the result.
c. Write a short paragraph summarizing the results of parts a and b.
d. Now use the GSS-93 dataset, which gives data from the 1993 General Social Survey for most of the same variables in the GSS-02 dataset.
Examine the same variable and corresponding proportion that you considered in part a of this activity. Compare the results for the two different years (1993 and 2002).

Activity 10.12 Use the methods discussed in this chapter to estimate the proportion of all cars in your area that are red. Stand near a busy street and count cars as they pass by. Count 100 cars and keep track of how many are red.
a. Using your data, compute a 95% confidence interval for the proportion of cars in your area that are red.
b. On the basis of how you collected the data, describe any possible biases that are likely to influence your results.

Alternative suggestion #1 for Activity 10.12: Rather than keeping track of red cars, keep track of how many drivers out of 100 drivers are talking on a cell phone while driving. Then carry out parts a and b of the activity.

Alternative suggestion #2 for Activity 10.12: Observe 100 pedestrians on your school campus. Keep track of how many are talking on a cell phone. Create a confidence interval for the proportion of pedestrians in your area who are talking on a cell phone at any given time. Discuss any possible biases in your sampling method. For more fun yet, observe 100 male pedestrians and 100 female pedestrians. Calculate 95% confidence intervals that estimate the proportions of male and female pedestrians talking on a cell phone and the difference in the proportions of male and female pedestrians talking on cell phones. For this variation of the activity, what do you think is the population represented by your sample?

CHAPTER 11 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 11 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 11.1 Each part describes a scenario in which a sample will be used to estimate a population value.
In each scenario, (i) describe the parameter of interest (in words) and give notation for that parameter, and (ii) give the value of the sample estimate of the parameter and give notation for the estimate. See pages 405-407 for guidance.

a. We ask “What’s the average amount that students at our school sleep per night?” In a survey, a random sample of students at a school is asked how much they typically sleep each night. A summary of the observed data is:

Variable    N     Mean     SE Mean   StDev
HrsSleep    994   6.7943   0.0384    1.2116

Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):

b. We ask “How much difference is there in the mean GPAs of fraternity members and men who aren’t in fraternities at our school?” This was done in the same survey described in part a. Here’s the relevant observed data:

                    N     Mean   SE Mean   StDev
Not in fraternity   341   3.18   0.0245    0.4518
In a fraternity     87    3.11   0.0526    0.4910

Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):

c. An anthropologist asks, “For adult women, how much difference is there, on average, between the lengths of the forearm and the foot?” He measures the forearm and foot lengths of a sample of 467 women. Here’s a summary of the sample data.

             N     Mean    StDev    SE Mean
Arm          467   24.99   2.6115   0.1208
Foot         467   24.18   2.1905   0.1014
Difference   467   0.81    2.6592   0.1231

Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):

Activity 11.2 For each of the three situations described on pages 405-407 in the book, give one example that is different from the ones given on pages 405-407. For each of your examples, give an example research question and a description of the parameter of interest.

Activity 11.3 This is Thought Question 11.2 on page 411 in the book.
Notice that all of the standard error formulas in this section (Section 11.1) have the sample size(s) in the denominator. This tells us that if the sample size is increased, the standard error will decrease (assuming that the sample statistics remain about the same). Refer to the rough definition of standard error on page 409, and explain why this relationship between sample size and standard error makes sense, based on that definition.

Activity 11.4 To begin, review Example 11.5 on pages 414-415. Now, suppose that we have a sample of n = 12 forearm lengths for college women. For this sample, the mean is $\bar{x} = 23.1$ cm and the standard deviation is s = 1.28 cm.
a. Calculate a 95% confidence interval that estimates the population mean forearm length for women. Use the top portion of page 415 in the book for guidance.
b. Write a sentence that interprets the confidence interval calculated in part a. For guidance, see the last sentence of Example 11.5.
c. What is the value of the standard error of the mean for the sample in this activity? In the context of this activity, explain what this standard error measures. (See page 409 in the book for a helpful definition.)

Activity 11.5 Give an example of a situation in which we would estimate the difference in two population means based on independent samples and an example of a situation in which we would estimate a mean difference based on paired data. Give examples different from any given in Section 11.1 in the book. See pages 407-409 for a discussion of paired data and independent samples.

Activity 11.6 This is Thought Question 11.3 on page 420 in the book.
a. What population do you think is represented by the sample of 175 students in Example 11.8 (p. 420)? Do you think the Fundamental Rule for Using Data for Inference (reviewed on p. 372 of Chapter 10) holds in this case?
b. Does the confidence interval in Example 11.8 tell us that 95% of the students watch television between 1.842 and 2.338 hours per day?
If not, what exactly does the interval tell us?

Activity 11.7 This is Thought Question 11.4 on page 423 in the book. For a fixed sample, explain why it is logical that a 95% confidence interval covers a wider range of values than a 90% confidence interval. Explain this in terms of our confidence that the procedure works in any given case.

Activity 11.8 Six individuals print letters of the alphabet, in alphabetical order, as fast as they can for fifteen seconds using their dominant hand. They then repeat the task with their nondominant hand. The numbers of letters printed for the six individuals are as follows:

Individual   Dominant   Nondominant
1            25         13
2            39         16
3            34         13
4            27         10
5            30         17
6            43         19

a. This is paired data! Compute the difference between numbers of letters printed using the two hands for each individual. List the sample of differences.
b. Using either statistical software, a calculator, or “by hand” work, calculate the sample mean and the sample standard deviation of the sample of six differences.
Mean difference =
Standard deviation of differences =
c. Calculate a 95% confidence interval that estimates the mean difference in letters printed using the two hands. Use either statistical software, a calculator, or “by hand” work to do the calculations.
d. Write a sentence that interprets the 95% confidence interval for the mean difference.
e. Consider the format Sample statistic ± (Multiplier × Standard error). For the confidence interval found in part c (for the mean difference), give values for each element of this formula.
Sample statistic =
Multiplier =
Standard error =

Activity 11.9 This is Thought Question 11.5 on page 430 in the book. In Section 11.3, we learned how to find a confidence interval for the mean of paired differences, which we used in Example 11.9 (p. 422) to estimate the mean difference in weekly computer and TV hours for a population of liberal arts students.
a.
Explain why it would not have been appropriate to use the methods in Section 11.4 for two independent samples, to estimate that mean difference, even though in either case the sample estimate is the difference in sample means, 5.36 hours.
b. If the methods in this section had been erroneously used by treating computer usage and TV viewing hours as independent samples, do you think the standard error of the sample estimate would have been larger or smaller than it was in Example 11.9? Explain your answer using common sense, not formulas. Think about how much natural variability there would be in the data for two independent samples, compared with measuring both sets of hours on the same individuals.

Activity 11.10 This is Thought Question 11.6 on page 435 in the book. Part of the quote in Case Study 11.1 (p. 435) said, “For any vertex baldness (i.e., mild, moderate, and severe combined), the age-adjusted RR was 1.4 (95% CI, 1.2 to 1.9).” Explain what is wrong with the following interpretation of this result, and write a correct interpretation:
Incorrect Interpretation: There is a .95 probability that the age-adjusted relative risk of a heart attack (for men with any vertex baldness compared to men without any) is between 1.2 and 1.9.

Activity 11.11 For this activity, use the ConfidenceLevel skillbuilder applet described in Section 11.6 (pages 436-438) of the book. The applet is on the companion website.
a. Generate one sample with the Confidence level set at 68%. Now move the slider to increase the confidence level. What happens to the center and the width of the interval? Explain why this happens.
b. Generate one sample at a time with the Confidence level set at 68% until you get an interval that does not cover the true mean of 170 (the red line). Now move the slider to increase the confidence level. (The applet uses the same sample to create confidence intervals with the new confidence levels.) Does the interval ever cover the true mean of 170?
Explain what has happened.
c. If different random samples of the same size are taken from a population and a 95% confidence interval estimate of a population mean is created each time, which of the following change and which stay the same?
   The endpoints of the interval
   The true value of the population mean
   The sample mean
d. Use the Reset button and then set the confidence level to 90%. Click animate!. Stop the process after about 100 intervals have been generated. What percentage of the intervals included the population mean value (170)? Is this percentage about what you would expect?
e. Based on what you have learned from these activities and your reading of Chapters 10 and 11:
   (i) Explain in your own words what the confidence level for a confidence interval is.
   (ii) If 100 researchers each use data from a random sample to construct a 90% confidence interval, will exactly ninety intervals cover the true population parameter value and ten intervals not cover the true population value? Explain.
   (iii) For any given sample and confidence interval, will the researcher know whether it has covered the truth?

Activity 11.12 This is a dataset activity. Use the Student0405 dataset, which gives data from a survey in statistics classes at a large university. The variable StudyHrs gives the self-reported number of hours that a student studies per week.
a. Analyze the StudyHrs variable, assuming that the sample is representative of all students at the university. In particular, create and interpret appropriate graph(s) of the data, calculate useful descriptive statistics, and calculate and interpret a 95% confidence interval that estimates the mean weekly study hours for all students at the university.
b. Compare weekly study hours for females and males.
Your analysis should include a graphical comparison of males and females, appropriate descriptive statistics, and a 95% confidence interval that estimates the difference between mean weekly study hours for females and males at the school. Interpret the 95% confidence interval to make a conclusion about the difference between the means of the populations of males and females.

Activity 11.13 This is a dataset activity. Use the pulsemarch dataset, which gives pulse rates before and after marching in place for one minute, for 40 college students. The sex of each student is also in the dataset.
a. Analyze the Before variable, which is the resting pulse rate before marching in place. In particular, create and interpret appropriate graph(s) of the data, calculate useful descriptive statistics, and calculate and interpret a 95% confidence interval that estimates the mean pulse rate for the population represented by this sample.
b. Analyze the difference between the pulse rates after and before marching. The variables are After and Before. In particular, create and interpret appropriate graph(s) of the differences between the two pulse rates, calculate useful descriptive statistics, and calculate and interpret a 95% confidence interval that estimates the mean difference in pulse rate caused by marching for the population represented by this sample.

Activity 11.14 This is a dataset activity. Use the UCDavis2 dataset, which gives data collected in a survey of a statistics class. Among other things, students reported their estimates of their parents’ heights. The variables are dadheight and momheight. Treating the parents’ heights as paired data, analyze the difference between the heights of the two parents.
In particular, create and interpret appropriate graph(s) of the differences between the parents’ heights, calculate useful descriptive statistics, and calculate and interpret a 95% confidence interval that estimates the mean difference in the heights of students’ parents, for the population of students represented by this sample. Outlier alert: Be on the lookout for outliers. You might find some unusual data here. If you omit any data, describe what you did and why.

Activity 11.15 Use a dataset of your choosing from the companion website. Use the data to create and interpret a 95% confidence interval that estimates the difference between two population means based on two independent samples. Describe the population parameter of interest and the sample used for your analysis. In addition to reporting the confidence interval, create appropriate graphs of the data and report useful descriptive statistics for the comparison.

Activity 11.16 Collect data on a quantitative variable of interest to you. Collect at least 30 observations. Using the data, compute a 95% confidence interval for the mean of the population from which you sampled your observations. Explain how you collected your sample and discuss whether you think there may be any biases in the results due to your sampling method. Interpret the 95% confidence interval to make a conclusion about the mean of the population.

CHAPTER 12 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 12 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 12.1 Review page 454 in the book. Then, explain the difference between a one-sided or one-tailed hypothesis test and a two-sided or two-tailed hypothesis test. Give an example of each.

Activity 12.2 This is Thought Question 12.1 on page 455 in the book. Confidence intervals and hypothesis testing are the two major categories of statistical inference.
On the basis of this information, do you think null and alternative hypotheses are generally statements about populations, samples, or both? Explain.

Activity 12.3 Think of a problem of interest to you which could involve testing hypotheses. Describe the problem and state the null and alternative hypotheses of interest.

Activity 12.4 This is a modified and expanded version of Thought Question 12.2 on page 455 in the book.
a. For Example 12.4 (p. 455), explain what the null hypothesis and the alternative hypothesis would be.
b. For Example 12.4 (p. 455), write a sentence that describes the probability question on which the hypothesis testing was based. The sentence should be of the form: “If [fill in null hypothesis] is true, then the likelihood that [fill in event] would have happened is [fill in likelihood].”

Activity 12.5 This is Thought Question 12.3 on page 458 in the book. Suppose that an ESP test is conducted by having someone guess whether each of n coin flips will result in heads or tails. The null hypothesis is that p = .5, and the alternative is that p > .5, where p = probability of guessing correctly. Suppose that one participant guesses n = 10 times and gets 6 right, while another guesses n = 100 times and gets 60 right. In each case, the percent correct was 60%. Do you think the p-value would be lower in one case than in the other, or would it be the same in both cases? Explain.

Activity 12.6 Think of a decision between two possible actions or statements that is of interest to you. Think of a situation where it’s not absolutely certain what the right decision might be (for instance, whether or not to accept a request to go out on a date). Label one of the actions or statements as the “null hypothesis” and one as the “alternative hypothesis.” Now, discuss what the type 1 and type 2 errors are in your situation and how the potential consequences of each type might affect the decision making process.
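
After reasoning through Activity 12.5, you may want to check your answer numerically. The sketch below is not from the book. It assumes an exact binomial calculation of the one-sided p-value, P(X ≥ k) when the null hypothesis p = .5 is true, rather than the z-test used elsewhere in Chapter 12 (the z-test is a normal approximation to this kind of calculation). The function name one_sided_p_value is made up for illustration.

```python
from math import comb


def one_sided_p_value(n, k, p0=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p0): the chance of k or more
    correct guesses in n tries if the null hypothesis p = p0 is true."""
    return sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(k, n + 1))


# The two participants in Activity 12.5: both guessed 60% correctly.
small = one_sided_p_value(10, 6)    # 6 right out of 10
large = one_sided_p_value(100, 60)  # 60 right out of 100

print(f"n = 10:  p-value = {small:.4f}")
print(f"n = 100: p-value = {large:.4f}")
```

Run it only after writing your own explanation, then compare the two printed p-values with what you predicted.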
Activity 12.7 Use Example 12.12 on pages 467-468 for guidance, but note that the null value is different for this activity than in Example 12.12. In a marketing survey for an automobile manufacturer, 90 randomly selected adults are asked which car color they would choose, if a particular model was available in both blue and red body colors.
a. Let p = population proportion that would choose “blue.” The manufacturer wants to learn if a majority of buyers would pick blue. Keeping in mind that a majority is p > 0.5, write null and alternative hypotheses about p in this situation.
H0:
Ha:
b. In the survey, 53 of the 90 respondents said “blue.” What is the value of $\hat{p}$ = sample proportion that picked blue?
c. “By hand,” calculate the value of the test statistic
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
d. On the basis of the p-value, decide between the null and alternative hypotheses. Then, write a general conclusion about whether a majority of adults would prefer blue as their car color for this model.
e. Draw a sketch that illustrates the relationship between the p-value in this situation and the value of the test statistic computed in part c. Use Table 12.1 on page 465 and Figure 12.2 on page 467 for guidance.

Activity 12.8 (For Section 12.3): This is a dataset activity in which you’ll compare two population proportions. Use the GSS-02 dataset, which contains data from the 2002 General Social Survey, a survey of randomly selected adults in the U.S. We’ll compare the proportions of males and females who are opposed to capital punishment. The relevant variables in the dataset are sex and cappun.
a. Define the population parameter for examining the difference in the proportions of males and females opposed to capital punishment. Describe it in words and give the appropriate notation.
b. A researcher thinks that females may be more likely to be opposed to capital punishment than males are. Write appropriate null and alternative hypotheses for this researcher.
Write the hypotheses in words and using appropriate statistical notation.
c. Use statistical software to test the hypotheses written in part b. What is the p-value for the test? On the basis of the p-value, is the result statistically significant at the .05 level of significance? What conclusion can be made about this situation? Write your conclusion in the context of this situation.
d. What proportion of females in the sample is opposed to capital punishment? What proportion of males in the sample is opposed to capital punishment? What is the difference between the sample proportions opposed to capital punishment for females versus males?

Activity 12.9 Construct a situation for which you can test null and alternative hypotheses about a population proportion. Describe the population parameter of interest, the method you would use to collect data, and what your null and alternative hypotheses are. If feasible (and at the discretion of your instructor), collect data for this situation, use the data to test your hypotheses, and report your result.

Activity 12.10 Construct a situation for which you can test null and alternative hypotheses about the difference in two population proportions. Describe the parameter of interest, the method you would use to collect data, and what your null and alternative hypotheses are. If feasible (and at the discretion of your instructor), collect data for this situation, use the data to test your hypotheses, and report your result.

Activity 12.11 This is Thought Question 12.4 on page 474 in the book. Here are two questions about p-values and one-sided versus two-sided tests:
1. Under what conditions would the p-value for a one-sided z-test be greater than .5?
2. When the data are consistent with the direction of the alternative hypothesis for a one-sided test, the p-value for the corresponding two-sided test is double what it would be for the one-sided test.
Use this information to explain why it would be cheating to look at the data before deciding whether to do a one- or two-sided test.

CHAPTER 13 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 13 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 13.1 (For Section 13.2) In a class survey, students in a statistics class were asked to report their heights (in inches). The data from the n = 87 males in the class were used to test whether the mean height for men is 70 inches (as is often reported by the media) or is greater than 70 inches. Computer output for the test follows. (Data source: pennstate1 dataset for the book.)

Test of mu = 70 vs mu > 70
Variable   N    Mean      StDev    SE Mean   T      P
Height     87   71.5632   2.7042   0.2899    5.39   0.000

a. Write the null and alternative hypotheses using proper statistical notation.
Null:
Alternative:
b. Read the output to find the values of the t-statistic and the p-value. Then, state a conclusion about the hypotheses and about the “real world” situation. Be careful about what population might be represented by this sample and any possible biases due to how the data were collected.
t =
p-value =
Conclusion:
c. Draw a sketch that shows the connection between the value of the t-statistic and the p-value in this situation. Use the table and figure(s) at the top of page 503 in the book for guidance.

Activity 13.2 This is Thought Question 13.2 on page 510 in the book. Suppose that in Example 13.2 (pp. 508-509), the purpose of the study was to determine whether pilots should be allowed to consume alcohol the evening prior to their flights and the alcohol consumption occurred 12 hours before the measurement of time of useful performance. Refer to the discussion of type 1 and type 2 errors (p. 461 in Chapter 12). Explain what the consequences of each type of error would be in this example. Which would be more serious?
Given the data and results of the study, which type of error could have been made?

Activity 13.3 To begin, review Example 13.4 on page 512 in the book.
a. Explain why a one-sided alternative hypothesis was used for this example, rather than a two-sided alternative hypothesis.
b. What was done in this example to verify the necessary data conditions for doing a two-sample t-test? (You might look back at Example 11.11 on pages 427-428 in the book.)

Activity 13.4 In each part, (1) identify whether the comparison is based on two independent samples or paired data AND (2) write null and alternative hypotheses using proper statistical notation. Use the notation $\mu_1 - \mu_2$ for the difference in population means when the data are from independent samples and the notation $\mu_d$ for the mean of a paired difference.
a. Mean scores on a memory test are compared for women aged 40 to 49 years old versus women aged 60 to 69 years old. We wish to determine if the mean is higher for 40 to 49 year olds.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …
b. Fifty students have their blood pressures measured before and after an exam. We wish to know if there is an increase, on average.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …
c. A class survey is used to compare the mean GPAs of male and female students. We wish to know if there is a difference.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …

Activity 13.5 (For Section 13.3, pages 507-512 in the book): Sixty-three college men report their actual weights and also their desired weights in a survey. A paired t-test is used to test whether the mean difference is 0 or not. Thus the alternative hypothesis was two-sided. Computer output for the test is as follows. (Data source: idealwtmen dataset for the book.)
Paired T for actual - ideal
             N    Mean      StDev      SE Mean
actual       63   176.095   26.202     3.301
ideal        63   173.619   21.151     2.665
Difference   63   2.47619   13.76983   1.73484
95% CI for mean difference: (-0.99170, 5.94408)
T-Test of mean difference = 0 (vs not = 0): T-Value = 1.43 P-Value = 0.159

a. Explain why it was appropriate to use a paired t-test rather than a two-sample t-test for independent samples.
b. Write the null and alternative hypotheses using proper statistical notation. (Use Example 13.2 on pages 508-510 for guidance.)
Null:
Alternative:
c. Read the output to find the values of the t-statistic and the p-value. Then, state a conclusion about the hypotheses and about the “real world” situation.
t =
p-value =
Conclusion:
d. Draw a sketch that shows the connection between the value of the t-statistic and the p-value in this situation. Use Figure 13.5 on page 510 in the book for guidance.

Activity 13.6 Use Example 13.5 on pages 515-516 in the book as guidance for this activity. The output below shows results for a 2-sample t-test for comparing the mean hours of studying per week for students who say they prefer to sit in the front of classrooms and students who prefer to sit in the back (data came from a statistics class). The alternative hypothesis was a two-sided (not equal) hypothesis.

        N    Mean   StDev   SE Mean
Front   99   16.4   10.85   1.09
Back    94   10.9   8.41    0.87
T-Test of difference = 0 (vs not =): T-Value = 4.01 P-Value = 0.000 DF = 183

a. Write the null and alternative hypotheses being tested using appropriate statistical notation. Explain what the parameters in your hypotheses represent. (See Step 1 on page 515 in the book.)
b. Explain why the necessary conditions for a two-sample t-test are met in this situation. (See Step 2 on page 516.)
c. Read the output to find the values of the t-statistic and the p-value. Then, state a conclusion about the hypotheses and the “real world” situation.
t =
p-value =
Conclusion:
d.
Draw a clearly labeled sketch that shows how the p-value is related to the value of the t-statistic in this situation. (See Figure 13.6 on page 512 for guidance.)
e. The formula for the t-statistic is
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Give values for each of the elements in the formula. (Use the output given for this activity.)
$\bar{x}_1$ =
$\bar{x}_2$ =
$s_1$ =
$s_2$ =
$n_1$ =
$n_2$ =

Activity 13.7 This is Thought Question 13.3 on page 520 in the book. The paired t-test introduced in Section 13.3 and the two-sample t-test introduced in Section 13.4 are both used to compare two sets of measurements. The null hypothesis in both cases is usually that the mean population difference is 0. Explain the difference in the situations for which they are used. Suppose researchers wanted to know if college students spend more time watching TV or exercising. Explain how they could collect data appropriate for a paired t-test and how they could collect data appropriate for a two-sample t-test.

Activity 13.8 This is a dataset activity for Section 13.4, Lesson 1. You’ll use an unpooled two-sample t-test. Use the GSS-02 dataset, which contains data from the 2002 General Social Survey, a survey of randomly selected adults in the U.S. We’ll consider the difference between the mean hours of self-reported television watching per day for females versus males. The relevant variables in the dataset are sex and tvhours.
a. Define the population parameter of interest in this situation. Describe it in words and give the appropriate notation.
b. Write appropriate null and alternative hypotheses for this situation. Explain why you chose the alternative hypothesis that you did.
c. Use statistical software to test the hypotheses written in part b. What is the p-value for the test? ________ Is the result statistically significant at the .05 level of significance? ____________ What conclusion can be made about this situation? Write your conclusion in the context of this situation.
d.
What is the mean television watching time for females in the sample? What is the mean television watching time for males in the sample? What is the difference between the sample mean television watching times per day for females versus males?
e. Write a short summary of this activity that could be understood by somebody with minimal training in statistics.

Activity 13.9 (For Section 13.6): In each part of this activity, a research question is briefly described. For each research question:
1. Specify the population parameter(s) of interest. Give the symbol and describe it in words.
2. Explain whether the primary method for analyzing the data should be a confidence interval or a hypothesis test. If a hypothesis test should be done, write the null hypothesis.

The following table gives the notation for “big five” parameters that we covered in Chapters 9-13.

Value of interest                                        Population parameter
1. One proportion                                        $p$
2. One mean                                              $\mu$
3. Difference in two proportions (independent groups)    $p_1 - p_2$
4. Difference in two means (independent groups)          $\mu_1 - \mu_2$
5. Mean difference for paired data                       $\mu_d$

a. Research question: What proportion of adults in the United States is in favor of the death penalty for persons convicted of murder?
b. Research question: Is the mean systolic blood pressure of women who use oral contraceptives greater than the mean systolic blood pressure of women who do not use oral contraceptives?
c. Research question: Is mean normal human body temperature less than 98.6?
d. Research question: On average, how much difference is there between the adult heights of a father and his son?
e. Research question: Is the proportion of men who experience sleep apnea (irregular breathing during sleep) higher than the proportion of women who do?
f. Research question: What is the mean weight of six year old children?
g. Two diets for weight-loss are compared. Sixty participants are randomly divided into two groups and each group uses a different diet.
Research question: Is there a difference between the mean weight-losses for the two programs?
h. Research question: How much difference is there in the proportions of patients successfully treated for two different treatments of a medical condition?
i. In a survey done by a car manufacturer, people are asked which color they would pick for a new car if the car was available in silver, blue, and green. Research question: Will more than 1/3 of all people pick silver?

Activity 13.10 Read Section 13.8 (pages 530-532) about evaluating significance in research reports.
a. Is the problem discussed in item 6 on page 532 equivalent to saying that one in 20 statistically significant results are erroneous, and are just due to chance? Explain.
b. Would at least 20 hypotheses have to be tested in a study in order for the issue raised in item 6 on page 532 to be a problem? Explain.
c. Find an example of a study in the news or on the Internet in which it is clear that multiple hypotheses were tested. Comment on whether the news article mentioned that as a problem. Discuss the extent to which you think this issue affected the conclusions made in the news story.
d. To begin, see items 2 and 5 on page 532 in the book. Find an example of a study in the news or on the Internet in which a result is described as being “significant.” Discuss whether you think the word “significant” is used in the everyday sense or in the statistical sense only. Discuss whether information is given in the article about the magnitude of the “significant” difference or relationship.
e. See item 3 on page 532 in the book. Find an example of a study in the news or on the Internet for which you suspect that the small sample size issue described in item 3 may be a problem.

Activity 13.11 This is Thought Question 13.4 on page 522 in the book. Refer to Example 13.9 (p. 521).
A 95% confidence interval for the difference in proportions who would get ear infections with placebo compared to with Xylitol was 0.02 to 0.226. On the basis of this information, specify a one-sided 97.5% confidence interval and explain how you would use it to test $H_0: p_1 - p_2 \le 0$ versus $H_a: p_1 - p_2 > 0$ with $\alpha = .025$.

CHAPTER 14 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 14 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 14.1 This is Thought Question 14.1 on page 556 in the book. Regression equations can be used to predict the value of a response variable for an individual. What is the connection between the accuracy of predictions based on a particular regression line and the value of the standard deviation from the line? If you were deciding between two different regression models for predicting the same response variable, how would your decision be affected by the relative values of the standard deviations for the two models?

Activity 14.2 This is Thought Question 14.2 on page 557 in the book. Look at the formula for SSE, and explain in words under what condition SSE = 0. Now explain what happens to $r^2$ when SSE = 0, and explain whether that makes sense according to the definition of $r^2$ as “proportion of variation in y explained by x.”

Activity 14.3 This is Thought Question 14.3 on page 560 in the book. In previous chapters, we learned that a confidence interval can be used to determine whether a hypothesized value for a parameter can be rejected. How would you use a confidence interval for the population slope to determine whether there is a statistically significant relationship between x and y? For example, why is the interval that we just computed for the sign-reading example (Example 14.7) evidence that sign-reading distance and age are related?

Activity 14.4 Read pages 560-561 in the book.
Summarize the connection between testing whether a population correlation value is 0 or not and testing whether the slope of a population regression line is 0 or not.

Activity 14.5 Use the GSS-02 dataset, which gives data from the 2002 General Social Survey. Analyze the relationship between the variables emailtime = hours spent per week using email and age = respondent’s age. Draw a scatterplot, determine the correlation value, determine the linear regression equation for the sample using emailtime as the y-variable, and assess the statistical significance of the observed relationship. What does this activity indicate about the effect of sample size on the significance of an observed relationship? Explain. (See page 561 in the book for guidance.)

Activity 14.6 This is Thought Question 14.5 on page 566 in the book. Draw a picture similar to the one in Figure 14.3 (p. 553), illustrating the regression line and the normal curves for the y values at several values of x. Use it to illustrate the difference between a prediction interval for y and a confidence interval for the mean of the y’s at a specific value of x.

Activity 14.7 This is Thought Question 14.6 on page 567 in the book. A residual is the difference between an observed value of y and the predicted value of y for that observation. Based on the size of a residual for an observation, how would you decide whether an observation was an outlier? Is it enough to know the value of the residual, or do you need to know other information to make this judgment? How could you apply the methods for detecting outliers described in Chapter 2?

CHAPTER 15 ACTIVITIES

INSTRUCTORS: Two group/team projects for Chapter 15 suitable for in-class work are given in the Course Support (Class Projects) section of the companion website.

Activity 15.1 Find a survey question that has been asked at two different time periods or by two different sources.
For instance, many polling organizations ask opinions about certain issues on an annual or other regular basis. [Suggestion: You might be able to use the GSS-93 and GSS-02 datasets on the companion website for the book.]

a. Create a contingency table where “time period” is one categorical variable and “response to poll question” is the other.

b. State the null and alternative hypotheses for comparing responses across the two time periods.

c. Carry out a chi-square test to see if opinions have changed over the two time periods. Write a brief summary of your findings. Be clear about the conclusion and to what population the conclusion applies. Also in your summary, indicate where you located the data and when the survey questions were asked.

Activity 15.2 Carefully collect data cross-classified by two categorical variables for which you are interested in determining whether there is a relationship. Collect the data yourself (or with a group). Be sure to get a large enough sample so that there are at least five in each cell.

a. Create a contingency table for the data and calculate appropriate descriptive percentages for looking at the possible relationship.

b. Use a chi-square test to determine whether there is a statistically significant relationship in the observed data.

c. Discuss the role of sample size in making the determination in part b.

d. Write a summary of your findings.

Activity 15.3 This is Thought Question 15.1 on page 590 in the book. Consider Example 15.2 (p. 585) about gender and the question about with whom it’s easiest to make friends.

a. What are the degrees of freedom for the chi-square statistic for these data?

b. To be statistically significant at the .05 level, how large would the calculated chi-square have to be?

Activity 15.4 Refer to Example 15.10 on pages 594-595 in the book.

a. Verify that the value of the chi-square statistic for the data in Table 15.5 on page 594 is (about) 13.66.
Either use statistical software or “by hand” calculations.

b. What are the degrees of freedom for the chi-square test done in Example 15.10?

c. What is the “real world” conclusion in Example 15.10?

Activity 15.5 Read the original Journal Article on the companion website for Chapter 11 – Example 11.12: “Development and Initial Validation of the Hangover Symptoms Scale: Prevalence and Correlates of Hangover Symptoms in College Students.” The authors give the results of some chi-square tests on pages 1445-1446 of the article. For any two of those chi-square tests, describe the variables involved, the null and alternative hypotheses, and the conclusion about the variables.

Activity 15.6 This is Thought Question 15.2 on page 594 in the book. Suppose that you read that men are more likely to be left-handed than women are. To investigate this claim, you survey your class and find that 11 of 84 men and 7 of 78 women are left-handed. Should you compare the men and women using a z-test or a chi-square test? Or does it matter?

Activity 15.7 This is Thought Question 15.4 on page 601 in the book. Remember that the “degrees of freedom” for the chi-square test for a two-way table represents the largest number of cells for which you were “free” to find expected counts. The remaining expected counts were determined because the row and column totals had to be the same as they were for the observed counts. Explain how the same principle applies in specifying the degrees of freedom for a chi-square goodness-of-fit test, which are k – 1 when there are k categories.

CHAPTER 16 ACTIVITIES

INSTRUCTORS: One group/team project for Chapter 16 suitable for in-class work is given in the Course Support (Class Projects) section of the companion website.

Activity 16.1 This is Thought Question 16.1 on page 621 in the book. To what populations do the conclusions of Example 16.1 on pages 618-619 apply? Do you think it matters that the data were collected at a single university?
Does it matter that the surveys were done only in statistics classes?

Activity 16.2 This is a dataset activity. Use the GSS-02 dataset, which gives data from the 2002 General Social Survey, a nationwide sample of randomly selected adults. The variable degree gives the highest educational degree achieved (five categories) and the variable emailtime gives self-reported hours per week spent doing email. Use one-way analysis of variance to examine differences in mean weekly email time for the five groups. Write a summary of your findings. Include appropriate graphical analysis, descriptive statistics, the findings of the analysis of variance, and a discussion of the ways in which the educational degree groups differ.

Activity 16.3 Read the original Journal Article for Chapter 6 – Case Study 6.2: “The Effects of Different Resistance Training Protocols on Muscular Strength and Endurance Development in Children.” This study was discussed in Case Study 6.2 on page 195 in the book; the purpose was to investigate the effects of different weight lifting programs for children.

a. How many children participated? What were the three different “treatment” groups in the study? How were children assigned to the groups?

b. Examine Table 1 on the second page (page is numbered “2 of 7” at the lower left). This table presents a comparison of the characteristics of the children in the three treatment groups at the beginning of the study. ANOVA results are presented for three different variables. For each variable, describe the null hypothesis in words and give the conclusion. (Note: weight = child’s body weight). Given the nature of this study, is there any important way that the three groups differed at the beginning of the study? Explain.

c. Table 3 in the article (on the page numbered “4 of 7” at the lower left) gives results for muscle endurance at the end of the study.
The response is the number of times a child can now lift (or press) the maximum weight that they could lift (or press) one time at the beginning of the study. There are two variables – one for a chest press task and one for a leg extension task. ANOVA results are not directly given in Table 3, but ANOVAs were carried out by the authors. What would be the null hypotheses for these ANOVAs? On the basis of statements written about Table 3 and the values in Table 3, what do you think were the results of these ANOVAs?

d. Continue to examine the information in Table 3 of the article. Explain which of the necessary conditions for using the F-statistic to compare means is violated by the observed data. What condition are we not able to evaluate from information given in the article? (See page 620 in the textbook for a summary of the necessary conditions.)

Activity 16.4 This is Thought Question 16.2 on page 630 in the book. In Example 16.9 (p. 629), each 95% confidence interval had the same width. Why did this happen? When would the 95% confidence intervals have different widths?

Activity 16.5 This is Thought Question 16.4 on page 637 in the book. In Example 16.14 (p. 636), there was only one server of each sex. What problem does this cause in the interpretation and generalization of the results? How would you have designed the experiment to better examine the interaction between sex of the server and drawing a happy face (or not)?

Activity 16.6 (For Section 16.4): In a statistics class, students were randomly assigned either to run in place or not. After the running (or not), all students took their pulse rates. Mean pulse rates for combinations of gender and whether the student ran or not are given in the following table. (Data Source: The Pulse dataset bundled with the Minitab computer program, Minitab, Inc.)

Sex | Did not run | Ran
Female | 74.8 (n = 24) | 112.8 (n = 11)
Male | 70.6 (n = 33) | 83.2 (n = 24)

a.
For each sex separately, determine the difference between mean pulse rates for those who ran in place versus those who did not.
Difference for females = _________ Difference for males = _________

b. Explain why the results for part a give evidence of an interaction between the gender and running in place (or not) variables.

c. Graph the means using the same format used for Figure 16.13 (p. 636) and Figure 16.14 (p. 637) in the book. Put mean pulse rate on the vertical axis and use the horizontal axis to indicate whether students either did not or did run. Then, show two lines on the graph – one connecting the two means for females and one connecting the two means for males. Briefly discuss what the graph shows about how gender, running (or not), and the combination of the two variables affect pulse rate.

Activity 16.7 Design an experiment or survey for which the response variable is a quantitative variable of interest to you and the purpose is to compare (at least) three different groups or treatment conditions. Collect the data and analyze the results. Write a summary in which you discuss the purpose of your study, your data collection method, and your data analysis. Present a conclusion and indicate what population was represented by your sample.
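
For readers who want to check their arithmetic in Activity 16.6, the comparison in parts a and b can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the book or its technology manuals: the dictionary layout and the names `means`, `running_effect`, and `interaction_contrast` are assumptions introduced here, and the cell means are copied from the table in Activity 16.6. It computes the within-sex differences and then the difference between those differences, which is one informal way to describe an interaction; it does not carry out a formal two-way ANOVA test.

```python
# Cell means copied from the Activity 16.6 table (mean pulse rate).
# The dictionary layout and all names below are illustrative assumptions.
means = {
    ("Female", "Did not run"): 74.8,
    ("Female", "Ran"): 112.8,
    ("Male", "Did not run"): 70.6,
    ("Male", "Ran"): 83.2,
}


def running_effect(sex):
    """Mean pulse for those who ran minus mean pulse for those who did not."""
    return means[(sex, "Ran")] - means[(sex, "Did not run")]


# Within-sex differences (part a), rounded to one decimal like the table.
effects = {sex: round(running_effect(sex), 1) for sex in ("Female", "Male")}

# Difference between the two differences: a value far from 0 suggests the
# effect of running is not the same for both sexes (an informal look at
# interaction, not a significance test).
interaction_contrast = round(effects["Female"] - effects["Male"], 1)

print(effects)
print(interaction_contrast)
```

If the two within-sex differences were about equal, the lines in the graph for part c would be roughly parallel; a large contrast corresponds to clearly non-parallel lines.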