PART I
GETTING TO KNOW YOUR BOOK:
A SCAVENGER HUNT AND MORE!
You will have an easier time using Mind On Statistics in your statistics course if you first
familiarize yourself with the features of the book and companion website. This part of the
activities manual has 46 questions that will help you get to know your book. You will have
to open the book and do some searching to answer these questions. In essence, the
questions in this part send you on a scavenger hunt.
The questions are divided into the following categories:
A. Getting to Know the Visual Features of the Book
B. A Few General Questions to Test Your Observational Skills
C. Getting to Know about the Thought Questions
D. Getting to Know about the Exercises
E. Getting to Know about the Companion Website
F. Getting to Know about the Datasets accompanying the Book
G. Getting to Know about the Technology Manuals
H. Getting to Know about the Supplemental Topics
I. Getting to Know about the Skillbuilder Applets on the Companion Website
You can choose to explore all of these activities as you begin working with the book, or
explore each one as it is introduced in your course. Whichever you choose to do, have fun!
A. Getting to Know the Visual Features of the Book
Words alone, with nothing to break them up except chapter headings, might be fine for a
novel, but would make for confusing reading in a textbook! Like most textbooks, Mind On
Statistics has various visual features designed to enhance the words and formulas on the
pages.
Questions 1 to 10 list ten organizational features of the book. In Chapter 1 and/or Chapter 2
of the book, locate two instances of each feature. Give two page numbers where you found
the feature, and describe how the feature is presented visually (color and style).
As an example, a Thought Question can be found on page 8 and page 17 of the book.
Thought Questions are introduced in purple print and presented in a box with a blue
border.
1. Case Study
Found on page ____ and page ____.
Visual presentation:
2. Definition
Found on page ____ and page ____.
Visual presentation:
3. Numbered Section Heading
Found on page ____ and page ____.
Visual presentation:
4. Unnumbered (Sub)Section Heading
Found on page ____ and page ____.
Visual presentation:
5. Key Terms
Found on page ____ and page ____.
Visual presentation:
6. “In Summary” Display
Found on page ____ and page ____.
Visual presentation:
7. Numbered Example
Found on page ____ and page ____.
Visual presentation:
8. Software/computing Tip (Minitab, SPSS, Excel, TI 83/84 Tips)
Found on page ____ and page ____.
Visual presentation:
9. Formula Box
Found on page ____ and page ____.
Visual presentation:
10. Technical Note (provide one example only)
Found on page ____ .
Visual presentation:
B. A Few General Questions to Test Your Observational Skills…
11. What is the connection between the photos on the opening pages of the chapters and
the material in the chapters?
12. How are the titles of Chapter 2 and Chapter 17 related? What do you think those titles
mean?
13. Some examples and exercises in the book use data from Penn State University and the
University of California at Irvine. Why do you think data from those two schools are
used so often?
C. Getting to Know about the Thought Questions
“Thought Questions” are scattered throughout the book. Several are also repeated as
activities in Part II of this manual. Your instructor may use them in a formal way. If not, we
encourage you to still read them and try to answer them on your own.
14. What is the premise behind the “Thought Questions” in the book? (Hint: See page xv
of the Preface.)
15. Give an example of a Thought Question from Chapter 3. What is the page number
where you found it?
16. Where can you find a hint for the answer to a Thought Question?
D. Getting to Know about the Exercises
Exercises are located at the end of every chapter. Except for Chapters 1 and 17, exercises
specific to each section of the chapter are given. The following questions will help you
identify other useful features of the exercises.
17. What is visually different about the pages in the book that contain exercises, other than
the words on the page?
18. What does it mean when the number of an exercise is in bold?
19. Notice that for Exercise 4.41 (page 141), the exercise number 4.41 and the letter for
parts b and c are bold, but the letter for part a is not bold. What does that mean?
20. Some exercises in each section are designed to help you learn about the more basic
concepts and methods. How are the exercises that cover these “basic skills” indicated
in the text?
21. How many Skillbuilder exercises are there for Section 7.5? Is the number of
Skillbuilder exercises the same for every section?
22. Notice that near the bottom of page 282, in the left margin, there is a note that says
“8.4 Exercises are on pages 309-310.” What does that margin note mean?
23. On what page do the exercises for Section 9.3 begin? Give two ways you could have
discovered that page number.
24. Give one place where you can find answers for selected exercises, and one place where
you can find fully worked solutions. (Hint: The introduction to the Exercises for every
chapter provides this information.)
25. Locate the answer to Exercise 10.1c. On what page is the answer located, and what is
the answer?
26. All chapters except Chapter 1 have a set of exercises that may apply to material from
any one or more of the sections in the chapter. What is the heading for those exercises?
On what page do these exercises start in Chapter 11?
27. What does it mean when an exercise is marked with an orange diamond? Use Exercise
2.67 on page 61 as an example, and explain why the orange diamond is appropriate in
that case.
28. How do “Dataset Exercises” differ from exercises marked with an orange diamond?
(Hint: See page 67.)
E. Getting to Know about the Companion Website
The companion website for this book (http://www.cengage.com/statistics/utts5e) contains a
wealth of resources. Experience has taught us that some students never discover the
resources on the companion site! The following activities will get you acquainted with
what’s on it.
To access the student resources, go to the webpage listed above. On the right side of the
page, you will see the student resources for the book.
29. Turn to Example 4.2 on page 116 of your book. How can you tell there is additional
information available for this example on the companion website?
30. What additional information is available on the companion website for Example 4.2 on
page 116?
31. Continuing the exploration of Example 4.2 from the previous question, find the
“Original Source” on the companion website. Write the title of this original source.
32. Continuing the exploration of Example 4.2 from the previous questions, what is the
relationship between Example 4.2 and the “Original Source” for it?
33. List the resources available on the companion website.
F. Getting to Know about the Datasets accompanying the Book
The companion website includes numerous “datasets.” (The general definition of a
“dataset” is on page 16 of the book.)
34. The datasets are available in eight different formats. List what they are, and identify
which one or more will be most useful to you.
35. Locate the dataset called MusicCDs on the companion website and open the dataset in
a format that is familiar to you. What is the relationship between Example 2.9 on page
31 in the book and the dataset MusicCDs?
36. Continue to work with the MusicCDs dataset from the previous question. Notice that
there are two columns, labeled "CDs" and "Sex." The first row has "220" and
"Female." Explain what information this tells you about the first person in this dataset.
37. Now open the dataset deprived. Find the last student in the dataset. How many hours
of sleep did this student claim to get per night, and did the student say he or she feels
sleep-deprived?
Hours of sleep:______
Deprived? _______
38. Turn to Exercise 3.3 on page 100 and Exercise 3.100 on page 111.
a. What is the name of the dataset used for these two exercises?
b. Do you have to access the dataset to do Exercise 3.3? What about to do Exercise
3.100? Other than reading the exercise, how did you know whether the dataset was
required in each case?
G. Getting to Know about the Technology Manuals
The companion website contains six Technology Manuals. Each one explains how to solve
statistics problems using a specific technology. For example, there is a manual explaining
how to solve statistics problems using Excel.
39. Aside from Excel, what other four technologies are covered by the manuals?
40. Open the manual that covers the technology that you will be using in your class. What
is the relationship between the organization of the technology manual and the
organization of Mind On Statistics?
H. Getting to Know about the Supplemental Topics
On the companion website, there is a section called “Supplemental Topics.” Access the site
and answer these questions.
41. How many Supplemental Topic chapters are there?
42. What is the title of Supplemental Topic 5?
43. Are the Supplemental Topics listed in the Table of Contents in the book?
44. An example of an entry in the Index of Mind On Statistics is “American Statistical
Association, S5-2.” Suppose that you want to read page S5-2. Where would you find
it?
I. Getting to Know about the Skillbuilder Applets on the Companion Website
There are six applets on the companion website that allow you to investigate concepts
interactively. You won’t be ready to learn the statistical content related to them until you
cover the relevant material in the book, but the following exploration will acquaint you
with how they work.
45. Turn to the Table of Contents on page vii in the book. Notice that below Section 2.7 it
says “Applets for Further Exploration.” Where in the book can you find
information about other Applets for further exploration?
46. Access the applets on the companion website, and then click on the link for the “Random
Sampling in Action” applet, the applet that is described in Chapter 5, pages 178-179 in
the book.
a. When you first open the applet, it should look like one of the figures in Section 5.7
of the book. What is the number of the figure that shows what it looks like, and
what page is it on?
b. Press the button labeled Sample. Describe what happens.
c. There are exercises in the book to accompany this applet. On what page do these
exercises start, and what are the exercise numbers?
PART II
ACTIVITIES BY CHAPTER
CHAPTER 1 ACTIVITIES
Activity 1.1 To begin, review Case Study 1.1 on page 2 in the book.
a. The paragraph before “Moral of the Story” contains the statement, “In fact, three-fourths
of men have driven 95 miles per hour or more, but only one-fourth of the women have done
so.” On the basis of the five-number summaries, explain how we know that this is the case.
(Hint: The lower quartile is a value such that about 1/4 of the data values are less than or
equal to it. The upper quartile is a value such that about 3/4 of the data values are less than
or equal to it.)
b. About what fraction of the females reported a “fastest ever driven speed” between 80
and 95 mph? Briefly explain how you determined this.
c. The last few pieces of data for males and females, respectively, are as follows:
Males: 112, 120, 110, 115, 125, 55, 90
Females: 95, 110, 80, 95, 90, 80, 90
Using Figure 1.1 on page 2 as a guide, draw a dotplot that compares these seven males and
seven females.
d. Refer to the data values given for the seven males in part c. Show that the median for
those seven values is 112.
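To check part d, remember that the median of an odd number of values is the middle value once the data are sorted. Here is a minimal Python sketch you can use to check your work. The helper name `median_of` is our own and is not from the book.

```python
def median_of(values):
    """Return the median: the middle sorted value, or the average of the
    two middle values when there is an even number of values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The seven male "fastest ever driven" speeds from part c
males = [112, 120, 110, 115, 125, 55, 90]
print(sorted(males))     # [55, 90, 110, 112, 115, 120, 125]
print(median_of(males))  # 112
```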
Activity 1.2 To begin, review Case Study 1.3 on pages 3-4 in the book.
a. What was the population of interest for the survey described in Case Study 1.3?
b. What sample was used to represent the population described in part a?
c. What was the value of the margin of error for this survey? Write a sentence that
interprets the meaning of this value.
d. Suppose that a different survey asking the same question(s) collected data from 1000
randomly selected teens who have dated. What will be the margin of error for this survey?
e. Refer back to part d. Suppose that in the survey of 1000 teens who have dated, the
percentage who had dated somebody of another race or ethnic group was 53%. In that case,
give an interval that is 95% certain to include the true percentage of U.S. teens who have
dated somebody of another race or ethnic group.
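Here is a minimal Python sketch for parts d and e. It assumes the conservative approximation margin of error ≈ 1/√n for a survey of n randomly selected people, so check that this matches the rule your edition gives in Chapter 1. The sketch handles percentages as proportions, and the function name is our own.

```python
import math

def conservative_margin_of_error(n):
    """Assumption: conservative 95% margin of error for a proportion, about 1/sqrt(n)."""
    return 1 / math.sqrt(n)

n = 1000        # sample size from part d
p_hat = 0.53    # 53% from part e, written as a proportion
moe = conservative_margin_of_error(n)
low, high = p_hat - moe, p_hat + moe

print(f"margin of error = {moe:.3f}")        # margin of error = 0.032
print(f"interval: {low:.3f} to {high:.3f}")  # interval: 0.498 to 0.562
```

To convert the interval back to percentages, multiply both endpoints by 100.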
Activity 1.3 To begin, review Case Study 1.5 and Case Study 1.6 on pages 5-6 in the book.
a. Explain how Case Study 1.5 is an example of an observational study whereas Case Study
1.6 is an example of a randomized experiment.
b. For Case Study 1.5, explain why smoking and alcohol use could be confounding
variables in this study.
c. Explain why it might not be possible to conclude that attending religious services
regularly may cause lower blood pressure on the basis of the studies described in Case
Study 1.5.
d. Explain why it is possible, on the basis of the experiment described in Case Study 1.6, to
make a cause-and-effect conclusion that aspirin use might reduce heart attack rates.
Activity 1.4 Read the original journal article for Case Study 1.5 about religious activities
and blood pressure. It can be found on the textbook companion website.
a. Briefly describe how the sample was selected for the study described in the journal
article.
b. An important concept in statistics is that a sample should be representative of the larger
population for which conclusions are made. Discuss what larger population you think is
represented by the sample used in this study. That is, to whom do the results of this study
apply?
c. Summarize some of the limitations of the study that the authors of the article discuss.
CHAPTER 2 ACTIVITIES
NOTE TO INSTRUCTORS: Three group/team projects for Chapter 2 suitable for in-class work are in the Course Support (Class Projects) section of the companion website.
Activity 2.1 This is Thought Question 2.1 on page 17 in the book.
There were almost 200 students who answered the survey questions given on page 15.
Formulate four interesting questions about the students that you would like to answer using
the data from these students. What kind of summary information would help you answer
your questions?
a. Question 1 about the dataset:
Desired Summary Information:
b. Question 2 about the dataset:
Desired Summary Information:
c. Question 3 about the dataset:
Desired Summary Information:
d. Question 4 about the dataset:
Desired Summary Information:
Activity 2.2 Use the material in Section 2.1 for guidance on this activity.
a. Explain the distinction between the terms sample data and population data.
b. In what circumstance would the data described in Section 2.1 represent sample data? In
what circumstance would it represent population data?
Activity 2.3 This is Thought Question 2.2 on page 18 in the book. Review the data
collected in the statistics class, listed on pages 15-16 in Section 2.1. Identify a type
(categorical, quantitative, ordinal) for each variable. The only one that is ambiguous is
question 5 of the survey. That question asks for a numerical response, but as we will see
later in this chapter, it is more interesting to summarize the responses as if they are
categorical.
Variable | Variable Type
1. Sex | ________________
2. Hours of sleep | ________________
3. Choice of S or Q | ________________
4. Height | ________________
5. Number picked | ________________
6. Fastest ever driven | ________________
7. Right handspan | ________________
8. Left handspan | ________________
Activity 2.4 This activity is about Example 2.1 on page 21.
a. What percentage of the sample said that they wear a seatbelt when driving either
“Always” or “Most times”? Show how you calculated the answer.
b. What percentage of the females said that they wear a seatbelt when driving either
“Always” or “Most times”? Show how you calculated the answer.
c. What percentage of the males do not always wear a seatbelt when driving? Show how
you calculated this value.
d. Draw a bar graph of the information given in Table 2.1 on page 21 in the book. Use
Figure 2.3 on page 23 for guidance. (Note: It’s easy to draw this “by hand,” but if you want
to use software to draw the graph, you’ll find the raw data in the YouthRisk03 dataset
on the companion website.)
e. Explain whether you think the U.S. Centers for Disease Control would consider the data
used for Example 2.1 to be population data or sample data.
f. In Example 2.1, which variable is the explanatory variable and which variable is the
response variable?
Activity 2.5 This is a modified version of Thought Question 2.4 on page 23 in the book.
a. Redo the bar graph in Figure 2.4 on page 23 using counts on the vertical axis instead of
percentages. The necessary data are given in Table 2.3 on page 21.
b. Is the comparison of frequency of myopia across the categories of lighting as easy to
make using the bar graph with counts as it is with percentages? Generalize your conclusion
to provide guidance about what should be done in similar situations.
c. What do you learn from the bar graph of counts that you do not learn from the bar graph
of percentages?
Activity 2.6 To begin, review Example 2.10 on page 38. Now, suppose that a different set
of seven quiz scores is 83, 76, 98, 90, 55, 85, 87.
a. Find the value of the median for this new set of scores. Show any details of your work.
b. Find the value of the mean for this new set of scores.
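The list contains seven scores, so the median is the fourth value in sorted order. Here is a short Python check of parts a and b that uses the standard-library `statistics` module for the median and computes the mean directly as the sum divided by the count.

```python
import statistics

scores = [83, 76, 98, 90, 55, 85, 87]
print(len(scores))                # 7
print(sorted(scores))             # [55, 76, 83, 85, 87, 90, 98]
print(statistics.median(scores))  # 85
print(sum(scores) / len(scores))  # 82.0  (574 / 7)
```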
Activity 2.7 One basic idea in Example 2.12 on pages 38-39 is that for almost all things we
measure, there is a range of values that would be considered normal, but only one single
number that is the average. In everyday language, the words “normal” and “average” are
often confused. For instance, when weather reporters talk about the normal rainfall for a
given time of year, what they really mean is the average rainfall. In the other direction,
we may talk about whether someone gets “average grades.” We don’t mean that they are
exactly at the average; instead we mean that they are in a range that includes the majority
of people.
a. Briefly explain how Example 2.12 (p. 38-39) is an example of how the words “normal”
and “average” may be confused.
b. Explain what is wrong with the following quote. A friend says: “I tend to have a lower
body temperature than normal – it’s closer to 98 degrees than 98.6 degrees.”
c. Explain what is wrong with the following quote. A teacher says: “I’ve taught this course
many times, and as a class, your results on the midterm were about average for students in
this course.”
Activity 2.8 Review Example 2.16 on page 45. Suppose that in a different year from the
one in the example, the weights of the members of the crew team are 192.4, 180.6, 203.6,
215.8, 175.0, 183.2, 199.4, 187.2, 111.6.
a. For this new set of weights, find the values of the median, the lower quartile and the
upper quartile.
b. For this new data, what is the value of the interquartile range?
c. Fill in the two blanks in the following sentence with numerical values.
For this new data, any value smaller than _______ or larger than _______ would be
marked as an outlier.
d. Draw a boxplot of the set of weights given in this activity. Use Figure 2.14 on page 34
for guidance.
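Here is a minimal Python sketch for parts a to c. It takes each quartile as the median of the values below (or above) the overall median, leaving the median itself out when n is odd. It also uses the usual 1.5 × IQR outlier rule. Both are assumptions here. Statistical software packages use several different quartile definitions, so their quartiles can differ slightly from these. The helper names are our own.

```python
def median_of(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3); each quartile is the median of one half of the
    sorted data, excluding the overall median when n is odd (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median_of(lower_half), median_of(ordered), median_of(upper_half)

weights = [192.4, 180.6, 203.6, 215.8, 175.0, 183.2, 199.4, 187.2, 111.6]
q1, med, q3 = quartiles(weights)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this are marked as outliers
high_fence = q3 + 1.5 * iqr   # values above this are marked as outliers

print(f"Q1={q1:.2f} median={med:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
# Q1=177.80 median=187.20 Q3=201.50 IQR=23.70
print(f"fences: {low_fence:.2f} and {high_fence:.2f}")  # fences: 142.25 and 237.05
print([w for w in weights if w < low_fence or w > high_fence])  # [111.6]
```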
Activity 2.9 Review Example 2.18 on pages 48-49 in the book. Suppose that a different set
of pulse rates is 62, 66, 72, 77, 83. Follow the steps in Example 2.18 to calculate the value
of the standard deviation for this new set of pulse rates.
Step 1: Calculate x̄, the sample mean.
Steps 2 and 3: Complete this table:
Data Value | Difference between value and mean | Squared difference between value and mean
Step 4: Determine the value of the variance.
Step 5: Take the square root of the Step 4 answer in order to find the standard deviation.
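Each of the five steps translates directly into a line or two of code. Here is a minimal Python sketch of our own (not from the book) that follows the same steps for the new pulse rates. It divides the sum of squared differences by n − 1, as the sample variance does.

```python
import math

pulses = [62, 66, 72, 77, 83]
n = len(pulses)

# Step 1: the sample mean
mean = sum(pulses) / n

# Steps 2 and 3: difference between each value and the mean, then its square
deviations = [x - mean for x in pulses]
squared = [d ** 2 for d in deviations]
for x, d, sq in zip(pulses, deviations, squared):
    print(f"{x:>4} {d:>7.1f} {sq:>8.1f}")

# Step 4: sample variance = (sum of squared differences) / (n - 1)
variance = sum(squared) / (n - 1)

# Step 5: standard deviation = square root of the variance
sd = math.sqrt(variance)
print(f"mean={mean}, variance={variance}, s={sd:.2f}")  # mean=72.0, variance=70.5, s=8.40
```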
Activity 2.10 Suppose that car and truck speeds at a particular location have approximately
a bell-shaped distribution with mean = 65 mph and standard deviation = 5 mph.
For parts a-c, use the empirical rule to fill in the blanks in each part:
a. About 68% of cars and trucks travel between _______ and _______ at this location.
b. About 95% of cars and trucks travel between _______ and ________ at this location.
c. About 99.7% of cars and trucks travel between _______ and ________ at this location.
d. Illustrate the distribution of vehicle speeds at this location by drawing a picture similar
to Figure 2.23 on page 50.
e. Calculate a z-score for a vehicle speed of 72 mph at this location (where mean = 65 mph
and standard deviation = 5 mph).
f. Fill in the blanks in the following sentence: A vehicle speed of 72 mph is _______
standard deviations _______________ the mean speed at this location. (Hint: See the last
sentence before the definition box of The Empirical Rule on page 51).
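Here is a minimal Python sketch for parts a to c and part e. The Empirical Rule intervals are the mean ± 1, 2, and 3 standard deviations, and a z-score is (value − mean) / standard deviation. The function names are hypothetical.

```python
def empirical_intervals(mean, sd):
    """Return the mean ± 1, 2, and 3 standard deviation intervals of the Empirical Rule."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

def z_score(value, mean, sd):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

mean_speed, sd_speed = 65, 5
for k, (low, high) in empirical_intervals(mean_speed, sd_speed).items():
    print(f"mean ± {k} s.d.: {low} to {high} mph")
# mean ± 1 s.d.: 60 to 70 mph
# mean ± 2 s.d.: 55 to 75 mph
# mean ± 3 s.d.: 50 to 80 mph

print(z_score(72, mean_speed, sd_speed))  # 1.4
```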
Activity 2.11 This activity uses the Empirical applet described in Applets for Further
Exploration (pages 52-53). To start, read Section 2.8 and also start the applet on your
computer.
a. For each of the eight variables, characterize the shape of the distribution as
approximately bell-shaped, somewhat skewed, or extremely skewed. Also note whether
there are major outliers, minor outliers or no outliers. Look carefully to make sure you
notice any outliers that are at the extremes of the histograms.
Variable | Shape? | Outliers?
Sleep | _______________ | _________
TV Hours | _______________ | _________
Dad’s Height | _______________ | _________
Exercise | _______________ | _________
Ideal Height | _______________ | _________
Alcohol | _______________ | _________
Handspan (females) | _______________ | _________
Handspan (males) | _______________ | _________
b. In general, if there are major outliers in a data set:
(i) Will they cause the standard deviation to be larger or smaller than it would be for
the data without the outlier(s)? Explain.
(ii) Will the widths of the intervals used in the Empirical Rule (Mean ± one s.d.,
etc.) be bigger or smaller than they would be without the outlier(s)? (This answer
follows directly from your answer to (i).)
(iii) Using your answer to (ii), do you think the percent of the data covered by the
interval Mean ± one s.d. will be higher or lower when outliers are present than it
would be if outliers were not present? Explain.
c. Using the applet, fill in the percent of the data that falls in each of the following
intervals.
Variable | Mean ± one s.d. | Mean ± two s.d.
Sleep | ____________ | ____________
TV Hours | ____________ | ____________
Dad’s Height | ____________ | ____________
Exercise | ____________ | ____________
Ideal Height | ____________ | ____________
Alcohol | ____________ | ____________
Handspan (females) | ____________ | ____________
Handspan (males) | ____________ | ____________
d. Study the results from parts a and c.
What can you conclude about how well the Empirical Rule works for these data sets? In
what situation (shape, outlier status) does it work best? In what situation (shape, outlier
status) does it work least well?
e. For which of the two intervals, Mean ± one s.d. or Mean ± two s.d., did the Empirical
Rule work less well when outliers and a skewed shape were present?
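To see the idea in part b on a small example, the Python sketch below reuses the crew-team weights from Activity 2.8 rather than the applet's datasets. It compares the data with and without the low value 111.6, and reports the sample standard deviation and the share of values within mean ± one s.d. The helper `share_within` is our own name. With only nine values, this is an illustration, not evidence about the applet data.

```python
import statistics

def share_within(values, k):
    """Fraction of the values inside mean ± k sample standard deviations."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(m - k * s <= v <= m + k * s for v in values) / len(values)

weights = [192.4, 180.6, 203.6, 215.8, 175.0, 183.2, 199.4, 187.2, 111.6]
without_outlier = [w for w in weights if w != 111.6]

for label, data in (("with 111.6", weights), ("without 111.6", without_outlier)):
    print(f"{label}: s = {statistics.stdev(data):.1f}, "
          f"within 1 s.d. = {share_within(data, 1):.0%}")
# with 111.6: s = 29.7, within 1 s.d. = 78%
# without 111.6: s = 13.5, within 1 s.d. = 75%
```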
CHAPTER 3 ACTIVITIES
NOTE TO INSTRUCTORS: Three group/team projects for Chapter 3 suitable for in-class work are given in the Course Support (Class Projects) section of the companion
website.
Activity 3.1 This is a modified version of Thought Question 3.2 on page 74 in the book.
a. The following scatterplot shows adult daughters’ heights versus mothers’ heights, in
inches, as reported by 132 females in a statistics class. You would now like to predict how
tall your infant niece will be when she grows up. Explain how you would use this
scatterplot to help you make the prediction.
b. Suppose that your niece’s mother (your aunt) is 62 inches tall. About how tall do you
predict your infant niece will be when she grows up? Explain how you determined this
prediction.
c. What other variables, aside from her mother’s height, might be useful for improving
your prediction of your niece’s height? How could you use these variables in conjunction
with the mother’s height to make your prediction?
Activity 3.2 To begin, review the In Summary box that begins at the bottom of page 77 in
the book.
a. What does the slope of a regression line estimate?
b. What does the intercept of a regression line estimate?
c. On page 76 in the book, the equation Handspan = −3 + 0.35 Height is given. (See
Example 3.5 for further discussion.) What is the value of the slope of this regression line?
Write a sentence that interprets this value in the context of the height and handspan
variables.
d. What is the intercept of the equation given in part c and on page 76? Explain why this
intercept does not provide useful information about height and handspan. (Hint: Think
about the definition you wrote in part b, and consider that the x-variable is height.)
e. Review Example 3.7 on page 78 in the book. What is the value of the slope for the
regression line in that example? Write a sentence that interprets this value in the context of
the sign-reading distance and age variables.
f. Continue with Example 3.7 on page 78. What is the estimated average sign-reading
distance for drivers who are 22 years old?
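Here is a short Python sketch of the page 76 equation for parts c and d. The two heights, 69 and 70, are arbitrary values chosen for illustration. The difference between their predictions equals the slope. Plugging in height = 0 returns the intercept, which is a prediction for a height far outside the data.

```python
def predicted_handspan(height):
    """Regression equation from page 76 of the book: Handspan = -3 + 0.35 Height."""
    return -3 + 0.35 * height

# Slope: change in predicted handspan for a one-unit increase in height
print(round(predicted_handspan(70) - predicted_handspan(69), 2))  # 0.35

# Intercept: the prediction at height = 0, which no real person has
print(predicted_handspan(0))  # -3.0
```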
Activity 3.3 This is Thought Question 3.5 on page 94 in the book.
a. Sketch a scatterplot with an outlier that would inflate the correlation between the two
variables.
b. Sketch a scatterplot with an outlier that would deflate the correlation between the two
variables.
Activity 3.4 This activity uses the Correlation applet described in Applets for Further
Exploration (pages 97-99 in the book). Start by reading Section 3.6 and also start the applet
on your computer. Use the first scatter plot frame in the applet to do the following.
a. Create a set of points that has a correlation value within 0.05 of the target correlation of
0.5, using at least 15 points. Do this without including any extreme outliers. Draw a sketch
of the plot that you created.
b. Create a set of points such that the correlation is about 0.8 for the first 14 points. Then
add a single outlier that lowers the correlation to about 0.5. You may have to play with
adding and deleting points until you figure out how to do this. Draw a sketch of the plot
that you created.
c. Create a set of points such that the correlation is close to 0 for the first 14 points. Then,
add a single outlier that increases the correlation to about 0.5. You may have to play with
adding and deleting points until you figure out how to do this. Characterize the type of
outlier that made this happen. Was it in line with the other points?
d. Using the results from parts b and c, characterize the effect different types of outliers
have on the value of the correlation. Explain what type of outlier inflates correlation and
what type of outlier deflates correlation.
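To experiment with parts b to d outside the applet, the Python sketch below computes the Pearson correlation directly. The points are hypothetical and chosen so the arithmetic is easy to check. The first set is four points with correlation 0. The second adds one far-away point in line with y = x, which inflates r. The third is points on a perfect line plus one point far off the line, which deflates r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points: the four corners of a square show no linear trend.
xs, ys = [1, 1, -1, -1], [1, -1, 1, -1]
r_base = pearson_r(xs, ys)

# One far-away point in line with y = x inflates the correlation.
r_inflated = pearson_r(xs + [10], ys + [10])

# Points on a perfect line (r = 1) plus one point far off the line deflate it.
line_x, line_y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
r_deflated = pearson_r(line_x + [3], line_y + [20])

print(r_base)                # 0.0
print(round(r_inflated, 3))  # 0.952
print(round(r_deflated, 2))  # 0.2
```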
Activity 3.5 This activity uses the Correlation applet described in Section 3.6 (pages 96-98 in the book). Start by reading Section 3.6 and also start the applet on your computer.
Use the second scatter plot frame in the applet to do the following.
a. Create a set of points that has a correlation value within 0.05 of the target correlation of
−0.8, as instructed, using at least 15 points. Do this without including any extreme outliers.
Draw a sketch of the plot that you created.
b. Create a set of points by putting a tight cluster of points in the upper left corner for the
first 14 points, such that the correlation is no more than about 0.2 in absolute value (i.e. it’s
between −0.2 and +0.2). Then add a single outlier such that the correlation increases to be
within 0.05 of the target of −0.8, just by adding the outlier. Draw a sketch of the plot that
you created.
c. Create a scatter plot illustrating two groups where the correlation is positive within each
group, but the correlation for the combined groups is within 0.05 of the target correlation
of −0.8. Draw a sketch of the plot that you created.
d. On the basis of the results of parts b and c, describe two ways to create a strong negative
correlation that would be misleading, if the interpretation of the strong negative value is
that as one variable increases steadily the other decreases steadily.
Activity 3.6 This activity uses the Correlation applet described in Section 3.6 (pages 96-98 in the book). Start by reading Section 3.6 and also start the applet on your computer.
Using the third scatter plot frame in the applet, play with various ways in which you can
place points to get within 0.05 of the target correlation of 0. Describe some different ways
to do this.
Activity 3.7 This is a dataset activity. Use the Temperature dataset. The data are latitude
and temperature data for 20 U.S. cities. Latitude is the geographic latitude of the city and
JanTemp is the mean January temperature.
a. Make a scatterplot showing the connection between JanTemp (y-variable) and latitude
(x-variable). On the basis of this plot, answer these questions:
(i) Does it look like a straight line is a suitable description of the data, or do the data look
to be curved?
(ii) Is the correlation between the two variables positive or is it negative?
b. Use statistical software to determine the equation of the regression line. Write the
equation.
c. What is the value of the slope of the regression line found in part b? Write a sentence
that interprets this slope.
d. Imagine a city not in the data set is at latitude = 40. What is the predicted January
temperature for this city? Show work.
e. Refer to the previous part. Suppose the imaginary city at latitude = 40 has a mean
January temperature = 36. For this city, what is the value of the residual (prediction error)?
Show work.
f. Determine the correlation between JanTemp and latitude. Give the numerical value of
the correlation. Then, briefly discuss why this value indicates that there is a strong negative
association between JanTemp and latitude.
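Statistical software does the fitting in part b for you, but the least-squares formulas are short. Here is a minimal Python sketch, shown on hypothetical toy data that lie exactly on y = 1 + 2x rather than on the Temperature dataset. To use it for this activity, you would pass the Latitude values as `xs` and the JanTemp values as `ys`. You would then use 40 for the prediction in part d and 36 as the observed value in part e. The function names are our own.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the line passes through (mean x, mean y)
    return b0, b1

def residual(observed, predicted):
    """Residual (prediction error) = observed value - predicted value."""
    return observed - predicted

# Hypothetical toy data lying exactly on y = 1 + 2x (NOT the Temperature dataset)
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)                    # 1.0 2.0
pred = b0 + b1 * 4               # predicted y at x = 4
print(pred, residual(10, pred))  # 9.0 1.0
```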
Activity 3.8 On the companion website, refer to the original Journal Article for Chapter
11—Example 11.12: “Development and initial validation of the Hangover Symptoms
Scale: Prevalence and correlates of hangover symptoms in college students.” On page 1447
it says: “The HSS [Hangover Symptoms Scale] was significantly positively associated with
the … typical quantity of alcohol consumed when drinking (r = 0.40).”
a. What two variables were measured for each person to provide this result? Which of these
two variables is the response variable and which is the explanatory variable in this
situation?
b. Explain what is meant by r = 0.40.
c. Look back at Case Study 1.6 on pages 5-6 in the book. On the basis of the definition of
“statistically significant” given on page 6 in the book, explain what you think it means to
say that the hangover symptoms variable was “significantly positively associated” with the
typical quantity of alcohol consumed variable.
Activity 3.9 For parts a-c, refer to the original Journal Articles on the companion website.
In each case, discuss which of the four interpretations of an observed association given in
Section 3.5 (pages 94-95) in the book might apply.
a. The journal article for Chapter 2 – Example 2.2: “Myopia and Ambient Lighting at
Night.”
b. The journal article for Chapter 10 – Case Study 10.2: “A Controlled Trial of Sustained-Release Bupropion, a Nicotine Patch, or Both for Smoking Cessation.”
c. The journal article for Chapter 17 – Exercises 17.17-17.34 Study #2: “Effects of
Walking on Mortality among Nonsmoking Retired Men.”
CHAPTER 4 ACTIVITIES
NOTE TO INSTRUCTORS: Three group/team projects for Chapter 4 suitable for in-class work are given in the Course Support (Class Projects) section of the companion
website.
Activity 4.1 This is Thought Question 4.1 on page 113 in the book.
a. Hair color and eye color are related characteristics. What exactly does it mean to say that
these two variables are related?
b. Suppose that you know the hair colors and the eye colors of 200 individuals. How would
you assess whether the two variables are related for those individuals?
Activity 4.2 Read Example 4.2 on page 116 of the text. Use the information given in that
example to answer the parts of this activity.
a. Refer to Table 4.3 on page 116. Among the 114 couples who are separated, what
percentage contains no smokers? Among the couples who are separated, what percentage
contains only one person who smoked? Among the couples who are separated, what
percentage contains two people who smoked?
b. Among the 1384 couples who are not separated, what percentage contains no smokers?
Among the couples who are not separated, what percentage contains only one person who
smoked? Among the couples who are not separated, what percentage contains two people
who smoked?
Activity 4.2 Continued
c. Explain why the answers to parts a and b of this activity suggest that there may be a
relationship between smoking habits and the likelihood of marital separation. (Hint: See
the In Summary box on page 117 in the book for an explanation of what constitutes a
relationship between two categorical variables.)
Activity 4.3 The following table gives data from the first 24 observations in the
YouthRisk03 dataset, which contains data from 12th grade students in a 2003 survey of
U.S. high school students. The survey was conducted by the U.S. Centers for Disease
Control. The variables here are student gender and the action the student says they have
taken about their weight in the past 3 months (four categories: stay the same, try to
gain, try to lose, nothing).
Gender    Weight Action
Male      Stay same
Male      Gain
Female    Lose
Female    Gain
Male      Nothing
Female    Lose
Female    Nothing
Female    Lose
Female    Lose
Male      Lose
Male      Gain
Male      Lose
Male      Gain
Male      Gain
Female    Lose
Female    Lose
Male      Gain
Female    Lose
Female    Gain
Male      Nothing
Female    Lose
Female    Lose
Female    Gain
Female    Stay same
a. Create a two-way table of counts summarizing the information given in the data table
above.
Activity 4.3 continued
b. Using the counts determined in part a, calculate conditional percentages appropriate for
comparing the weight action responses of males and females. That is, calculate the
percentage in each weight action category for males and also (separately) for females.
c. Now use the complete YouthRisk03 dataset that’s on the companion website. Using a
computer and statistical software, repeat parts a and b for the whole dataset.
d. Write a few sentences that summarize the results of part c.
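As a check on parts a and b, a short Python sketch can tally the 24 observations in the data table. It assumes the two listed columns pair up row by row (first gender with first weight action, and so on). Reading the full YouthRisk03 file into the same list of pairs is one possible way to approach part c; the variable and function names are illustrative only.

```python
from collections import Counter

# The 24 (gender, weight action) observations from the data table, in order.
data = [
    ("Male", "Stay same"), ("Male", "Gain"), ("Female", "Lose"), ("Female", "Gain"),
    ("Male", "Nothing"), ("Female", "Lose"), ("Female", "Nothing"), ("Female", "Lose"),
    ("Female", "Lose"), ("Male", "Lose"), ("Male", "Gain"), ("Male", "Lose"),
    ("Male", "Gain"), ("Male", "Gain"), ("Female", "Lose"), ("Female", "Lose"),
    ("Male", "Gain"), ("Female", "Lose"), ("Female", "Gain"), ("Male", "Nothing"),
    ("Female", "Lose"), ("Female", "Lose"), ("Female", "Gain"), ("Female", "Stay same"),
]

GENDERS = ["Male", "Female"]
ACTIONS = ["Stay same", "Gain", "Lose", "Nothing"]

counts = Counter(data)                    # two-way table: (gender, action) -> count
row_totals = Counter(g for g, _ in data)  # number of males and of females


def conditional_percentages(gender):
    """Percentage in each weight-action category within one gender (row percents)."""
    n = row_totals[gender]
    return {a: 100 * counts[(gender, a)] / n for a in ACTIONS}


for g in GENDERS:
    cells = "  ".join(f"{a}: {counts[(g, a)]}" for a in ACTIONS)
    print(f"{g} (n = {row_totals[g]}): {cells}")
for g in GENDERS:
    pcts = "  ".join(f"{a}: {p:.1f}%" for a, p in conditional_percentages(g).items())
    print(f"{g}: {pcts}")
```

Computing percentages within each gender, rather than within each weight-action category, is what makes the males and females directly comparable.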
Activity 4.4
In Example 4.2 (p. 116) the likelihood of marital separation was associated with smoking
habits. Were the data collected in an observational study or in an experiment? Do you think
that cigarette smoking may cause an increase in the marital separation rate? Do you think
getting separated may cause a person to start smoking? How would you explain the
association between smoking habits and the likelihood of marital separation?
Activity 4.5 Review Example 4.2 on page 116 and refer to the counts given in Table 4.3 on
page 116 in the book.
a. Calculate the risk of being a smoker for those who are separated.
b. Calculate the risk of being a smoker for those who are not separated.
c. Calculate the relative risk of being a smoker for those who are separated compared to
those who are not separated. Write a sentence that explains this relative risk in a way that
the general public would understand.
d. What is the percent increase in the risk of being a smoker for those who are separated
compared to those who are not separated?
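The arithmetic in parts a through d can be sketched in Python. Risk is a count divided by a group total, the relative risk divides one risk by the other, and the percent increase is (relative risk − 1) × 100%. The Table 4.3 counts are not reproduced here, so the call at the end uses made-up counts purely to show the mechanics.

```python
def risk(cases: int, total: int) -> float:
    """Risk = number in the group with the characteristic / group size."""
    return cases / total


def relative_risk(cases1: int, total1: int, cases2: int, total2: int) -> float:
    """Risk in group 1 divided by risk in group 2 (the baseline group)."""
    return risk(cases1, total1) / risk(cases2, total2)


def percent_increase(rr: float) -> float:
    """Percent increase in risk implied by a relative risk: (RR - 1) * 100."""
    return (rr - 1) * 100


# Hypothetical counts for illustration only (NOT the Table 4.3 values).
rr = relative_risk(30, 100, 20, 100)
print(f"relative risk = {rr:.2f}, percent increase = {percent_increase(rr):.0f}%")
```

For part c, group 1 would be the separated couples and group 2 (the baseline) the couples who are not separated.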
Activity 4.6 In the news (or on the Internet), find an article in which a relative risk is
described.
a. Give the details of where you found the article, and briefly summarize what the article is
about. Be sure to give the main result about the relative risk being described in the article
you found.
b. On page 121 in the book, review the three questions that should be considered when you
encounter statistics about risk. Discuss whether the article you found adequately addresses
the issues in these questions. For instance, are the actual risks given?
Activity 4.7 In the news (or on the Internet), find an example of an observational study in
which the relationship between two categorical variables is described.
a. Give the details of where you found the article, and briefly summarize what the article is
about. Be sure to give the main result.
b. Does the article discuss whether any “third variables” were taken into account by the
researchers when examining the relationship under study? If so, what “third variables”
were considered? If not, what “third variables” do you think should be considered in the
situation described in the article?
Activity 4.8 The parts of this activity are the Thought Questions in Section 4.2 of the book.
a. This is Thought Question 4.2 on page 118 in the book.
Based on the study described on pages 113-114 in Section 4.1, the relative risk of
developing any myopia later in childhood is 5.5 for babies sleeping in full light compared
with babies sleeping in darkness. Restate this information in a sentence that the public
would understand.
b. This is Thought Question 4.3 on page 120 in the book.
Suppose that a newspaper article claims that drinking coffee doubles your risk of
developing a certain disease. Assume that the statistic was based on legitimate, well-conducted research. What additional information would you want about the risk before
deciding whether or not to quit drinking coffee?
c. This is Thought Question 4.4 on page 123 in the book.
If you were a frequent beer drinker and were worried about getting colon cancer, would it
be more informative to you to know the risk of colon cancer for frequent beer drinkers or
the relative risk of colon cancer for frequent beer drinkers compared to nondrinkers?
Which of these statistics would likely be of more interest to the media? Explain your
responses.
Activity 4.9 This activity is about Example 4.11 on page 124 in the book.
a. In Example 4.11, what is the response variable? What is the explanatory variable? What
is the confounding factor?
b. Explain how Example 4.11 is an illustration of Simpson’s Paradox.
Activity 4.10 These questions are about Case Study 4.2 on page 133 in the book.
a. Write null and alternative hypotheses about the two variables in Case Study 4.2.
b. In Case Study 4.2, what is the population of interest? What is the sample?
c. Explain which one of the following three statements is the most correct way to state a
conclusion about Case Study 4.2. (Hint: See the discussion at the bottom of page 132.)
1. There is no relationship between gender and driving after drinking for young
drivers in Oklahoma.
2. The sample evidence is not strong enough to say that there is a relationship
between gender and driving after drinking for young drivers in Oklahoma.
3. There is a relationship between gender and driving after drinking for young drivers
in Oklahoma.
Activity 4.11 On the companion website, refer to the original Journal Article for Chapter
11 -- Example 11.12: “Development and initial validation of the Hangover Symptoms
Scale: Prevalence and Correlates of Hangover Symptoms in College Students.”
a. Near the bottom of page 1445 of the article, the result of a chi-square test is given. Write
the null and alternative hypotheses for this test.
b. What was the result of this chi-square test? Write a conclusion in the context of this
study.
CHAPTER 5 ACTIVITIES
NOTE TO INSTRUCTORS: Three group/team projects for Chapter 5 suitable for in-class work are given in the Course Support (Class Projects) section of the companion
website.
Activity 5.1 Suppose that n = 300 students in statistics classes at a large university are
asked “How important is religion in your own life (very, fairly, not very)?” The researcher
would like to use the results to make generalizations about all persons at least 18 years old
in the United States.
a. For the researcher’s desired use of the data, describe the population of interest and the
observed sample.
b. Do you think the observed sample described in part a should be used to generalize about the
population described in part a? Explain why or why not.
c. Suppose that the variable studied was handedness (right-handed or left-handed). For that
variable, could we use this sample of statistics students to generalize about the larger
population? Explain.
Activity 5.2 For each part, locate the original Journal Article on the companion website. In
each case, describe how the sample was selected for the study described in the article and
discuss what larger population, if any, is represented by the sample for the response
variable(s) in the study.
a. The journal article for Chapter 11—Example 11.12: “Development and initial validation
of the Hangover Symptoms Scale: Prevalence and correlates of hangover symptoms in
college students.”
b. The journal article for Chapter 13 – Exercise 13.39: “A Prospective Study of Holiday
Weight Gain.”
c. The journal article for Chapter 3 – Example 3.3: “Some Exploratory Findings on the
Development of Musical Tastes.”
Activity 5.3 For guidance, review the definitions of types of bias on page 150 in the book.
Suppose that a national survey is done to study the extent of alcohol use by 12th grade
students in the United States. In the survey, n = 3000 students are asked various questions
about their alcohol use (or non-use).
a. In the context of such a survey about alcohol use by 12th graders, explain the difference
between nonresponse bias and response bias.
b. In this situation, give an example of a way to collect data that may cause selection bias
to affect the results.
Activity 5.4 This is Thought Question 5.2 on page 154 in the book.
Suppose that a survey of 400 students at your school is conducted to assess student opinion
about a new academic honesty policy.
a. Based on Table 5.1 (p. 153), about what will be the margin of error for the poll?
b. How many students attend your school? Given this figure, do you think the values in
Table 5.1 should be used to estimate the margin of error for a survey of students at your
school? Explain.
Activity 5.5 To begin, review Example 5.4 on page 152 in the book.
a. For the study described in Example 5.4, what is the population and what is the sample?
b. What is the value of the margin of error for the survey described in Example 5.4? Write
a sentence that interprets this margin of error. (Hint: See the definition on page 151.)
c. Using information given in Example 5.4, calculate approximate 95% confidence
intervals that estimate the proportion and percentage of all adult Americans who would say
whether they would travel in outer space.
d. If the sample size for this survey had been 500 rather than 1,019, what would have been
the value of the margin of error for the survey?
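A quick way to check parts b and d is sketched below in Python, assuming the conservative approximation margin of error ≈ 1/√n for a sample proportion (check that this matches the definition on page 151 before relying on it). The function name is illustrative.

```python
import math


def conservative_margin_of_error(n: int) -> float:
    """Conservative margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


for n in (1019, 500):
    moe = conservative_margin_of_error(n)
    print(f"n = {n:>5}: margin of error ≈ {moe:.3f} ({100 * moe:.1f} percentage points)")
```

Because the margin shrinks with the square root of n, roughly halving the sample size increases the margin of error by a factor of about 1.4, not 2.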
Activity 5.6 In Example 5.6 on page 156 in the book, the ID numbers for Sample 2 were
determined using a table of random digits. Verify that the ID numbers given for Sample 2
are correct by showing the details of selecting numbers from a table of random digits and
converting them to ID numbers.
Activity 5.7 Suppose that you will use data collected from a sample of eight students in
your statistics class to estimate the average amount of time that all students in the class
spend studying statistics each week. Your teacher will allow you to collect the data during
a class meeting. [Note: We’re assuming that there will be more than eight students in class
that day!]
a. Describe how you would pick students for the sample using a simple random sampling
procedure.
b. Describe how you would pick students for the sample using a stratified random sampling
procedure.
c. Describe how you would pick students for the sample using a cluster sampling
procedure.
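The three procedures in parts a to c can be contrasted with a small Python sketch. Everything about the class here is hypothetical: a roster of 40 students seated in 10 rows of 4, with "front" (rows 1 to 5) and "back" (rows 6 to 10) as strata and whole rows as clusters. Each procedure yields eight students.

```python
import random

# Hypothetical roster: 40 students seated in 10 rows of 4 (replace with your class).
roster = [{"id": i, "row": (i - 1) // 4 + 1} for i in range(1, 41)]


def simple_random_sample(units, k, rng):
    """SRS: every possible group of k students is equally likely to be chosen."""
    return rng.sample(units, k)


def stratified_sample(units, stratum_of, k_per_stratum, rng):
    """Stratified: take a separate simple random sample inside each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Cluster: randomly pick whole clusters (rows) and keep everyone in them."""
    clusters = sorted({cluster_of(u) for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if cluster_of(u) in chosen]


rng = random.Random(1)  # fixed seed so the example is reproducible
srs = simple_random_sample(roster, 8, rng)
strat = stratified_sample(roster, lambda u: "front" if u["row"] <= 5 else "back", 4, rng)
clus = cluster_sample(roster, lambda u: u["row"], 2, rng)
for label, s in [("SRS", srs), ("Stratified", strat), ("Cluster", clus)]:
    print(label, sorted(u["id"] for u in s))
```

In practice you could do the same thing with names drawn from a hat; the code only makes the difference in what is randomized (individuals, individuals within groups, or whole groups) explicit.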
Activity 5.8 This is Thought Question 5.3 on page 162 in the book.
In Section 5.1, the Fundamental Rule for Using Data for Inference stated that “available
data can be used to make inferences about a much larger group if the data can be
considered to be representative with regard to the question(s) of interest.” Read the
description of how the ABC News poll in Example 5.7 (page 161) was conducted. Do you
think the results of the poll can be extended to a larger group than the 779 people in the
sample? If so, to what group can the results be extended and why?
Activity 5.9 To begin, review the discussion of the term sampling frame on page 162 in
the book. Then, consider the following situation. Suppose that a Gallup Poll is done using
random digit dialing to reach individuals in households with land-line telephones. The
purpose of this particular poll is to estimate the proportion of U.S. adults who favor
stronger gun control laws.
a. Describe the distinction between the sampling frame and the population in this situation.
b. Explain whether you think the difference between the sampling frame and the
population would (or would not) lead to selection bias in this situation.
c. Suppose that n = 1000 adults are surveyed and 63% of the sample favors stronger gun
control laws. Calculate an approximate 95% confidence interval to estimate the proportion
of all adults in the United States in favor of stronger gun control laws. Use Example 5.3 or
5.4 on pages 152-153 for guidance.
Margin of error =
Approximate 95% confidence interval =
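Part c can be checked with the Python sketch below, assuming the approximate 95% interval has the form sample proportion ± 1/√n (the conservative margin of error; confirm this against Examples 5.3 and 5.4). The function name is illustrative.

```python
import math


def approx_95_ci(p_hat: float, n: int):
    """Approximate 95% CI for a population proportion: p_hat +/- 1/sqrt(n)."""
    moe = 1 / math.sqrt(n)  # conservative margin of error
    return moe, (p_hat - moe, p_hat + moe)


moe, (low, high) = approx_95_ci(0.63, 1000)
print(f"Margin of error ≈ {moe:.3f}")
print(f"Approximate 95% CI ≈ {low:.3f} to {high:.3f} ({100 * low:.1f}% to {100 * high:.1f}%)")
```

The interval describes adults reachable through the sampling frame; whether it applies to all U.S. adults depends on your answer to part b.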
Activity 5.10 This is Thought Question 5.4 on page 164 in the book.
Suppose you want to know how students at your school feel about the computer services
that are offered. You are able to obtain the list of e-mail addresses for all students who are
taking statistics classes, so you send a survey to a simple random sample of 100 of those
students and 65 respond. Using the difficulties discussed so far in this section (pages 162-164), explain to whom you could extend the results of your survey and why.
Activity 5.11 Review pages 166-174 in Section 5.6 of the book. Then, find an example of
a survey on the web or in print media that demonstrates one of the possible sources of bias
listed on pages 166-167.
a. Briefly describe where you found this survey and what the survey was about.
b. Explain which source of bias is demonstrated in the survey.
c. Discuss how the survey should have been changed in order to eliminate the source of
bias that you listed in part b.
Activity 5.12 This activity uses the Sampling applet described in Section 5.7 (pages 174-175 in the book). To begin, review Section 5.7 and start the applet on your computer.
a. Take 20 samples, and then click on “Show Results.” You should see a popup window
with 20 lines, where each line gives the mean height and percent female for one sample.
Write down the sample mean heights for the 20 different samples, and then draw a dotplot
of these 20 sample means. (Examples of dotplots are Figure 1.1 on page 2 and Figure 2.9
on page 30.)
b. Describe the characteristics of the 20 sample mean heights
found in part a. For instance, what were the lowest and highest values of mean height?
How do the mean heights from the samples compare to the mean height for the population
of all 100 individuals, which is 68.0 inches? Based on a single sample of 10 individuals,
are you likely to get a good estimate of the mean height for the population?
c. Does the mean height for the sample appear to be related to the percent of females in the
sample? Would you expect them to be related? Explain.
d. Read the explanation of how to take a systematic sample on pages 159-160 of the book.
How would you take a systematic sample of five individuals from this population of 100
stick figures in the applet display?
e. Take a systematic sample of five individuals (by hand). Explain what you did in enough
detail so that someone else could find your sample.
f. In using a sample to estimate the mean population height and percent female, would
results from a systematic sample be biased? Explain.
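One common way to take a systematic sample, sketched in Python, is to pick a random start within the first interval and then take every k-th unit. This assumes the 100 stick figures are numbered 1 to 100 in display order (a hypothetical numbering); compare it with the procedure described on pages 159-160.

```python
import random


def systematic_sample(population_size: int, sample_size: int, rng: random.Random):
    """Random start in the first interval, then every k-th unit.

    Units are numbered 1..population_size; assumes sample_size divides
    population_size evenly.
    """
    k = population_size // sample_size  # sampling interval (20 for 100 and 5)
    start = rng.randint(1, k)           # random starting unit, 1..k inclusive
    return list(range(start, population_size + 1, k))


print(systematic_sample(100, 5, random.Random()))
```

Only the starting point is random; every later unit is fixed by it, which matters for part f if the figures are arranged in a pattern that repeats every 20 positions.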
CHAPTER 6 ACTIVITIES
NOTE TO INSTRUCTORS: Four group/team projects for Chapter 6 suitable for in-class
work are given in the Course Support (Class Projects) section of the companion website.
Activity 6.1
a. This is Thought Question 6.1 on page 190 in the book.
For many randomized experiments, researchers recruit volunteers who agree to accept
whichever treatment is randomly assigned to them. Why do you think this strategy cannot
always be used, thus requiring observational studies to be used instead?
b. This is Thought Question 6.3 on page 194 in the book.
For most randomized experiments, such as medical studies comparing a new treatment
with a placebo, it is unrealistic to recruit a simple random sample of people to participate.
Why is this the case? What can be done instead to make sure the Fundamental Rule for Using
Data for Inference (p. 148 in Section 5.1) is not violated?
Activity 6.2 Explain in your own words what a confounding variable is. Give an example
where an apparent causal relationship between the explanatory and response variable is
probably influenced by a confounding variable. Don’t use an example given in Chapter 6.
Activity 6.3 This is Thought Question 6.2 on page 192 in the book.
Choose a possible confounding variable for the situation in Example 6.1 (p. 191-192),
other than the ones mentioned in the example, and explain how it meets the two conditions
necessary to qualify as a confounding variable (defined on p. 191).
Activity 6.4 Find an example in the news (or on the Internet) of an observational study for
which the news story or headline is attributing a cause and effect relationship. Cite the
source for your example and explain what relationship was found. Discuss possible
confounding variables for the study, or other explanations that may account for the
observed relationship. In general, do you think the differences or changes in the
explanatory variable were responsible for a difference in the outcome variable?
Activity 6.5 Find an example of a successful randomized experiment in the news (or on the
Internet) that you think may apply to you, now or when you are older. Cite the source for
your example and explain what relationship was found. Do you think that if you changed
your behavior based on the explanatory variable (diet, taking aspirin, meditating, etc.), you
would experience a change in the outcome variable as a result? In general, do you think the
differences or changes in the explanatory variable were responsible for a difference or
change in the outcome variable?
Activity 6.6 This is Thought Question 6.4 on page 200 in the book.
Students are sometimes confused by the reasons for blocking and for randomization. One
method is used to control known sources of variability among the experimental units, and
the other is used to control unknown sources of variability. Explain which is which, and
provide examples illustrating these ideas.
Activity 6.7 As the basis for this activity, use the original Journal Articles for chapters
other than Chapter 6 on the companion website.
a. Among the original Journal Articles (but not for Chapter 6), identify a study based on a
randomized experiment. Briefly summarize the purpose of the study and the principal
result(s).
b. For the study you identified in part (a) of this activity, explain what the researchers did
to make the study a randomized experiment.
c. Among the original Journal Articles (but not for Chapter 6), identify an observational
study. Briefly summarize the purpose of the study and the principal result(s).
d. In the observational study that you identified in part c, what confounding variables did
the researchers take into account?
e. Explain whether the observational study that you identified in part c was a retrospective
study or a prospective study.
Activity 6.8 Two restaurant servers, one male and one female, participate in a study done
to examine the effect of drawing a happy face on the customer’s bill. For some customers,
each server drew a happy face on the bill. For other customers, no drawing or message was
put on the bill. It was randomly determined whether a happy face would be drawn or
not. The researchers wanted to see if drawing the happy face increased tip percent.
a. Is this an experiment or an observational study? Explain.
b. The purpose of most statistical studies is to use the observed sample data to generalize to
a larger group. What do you think are the weaknesses of using this study to generalize to
all restaurant servers?
c. In this study, what is the response variable and what are the explanatory variables?
d. If you were a restaurant server, would you be more interested in your mean tip or your
median tip? Explain.
Activity 6.9 Design an experiment to test something of interest to you. Identify the
response and explanatory variables for this experiment. Outline how treatments or
conditions will be assigned. Discuss any other steps you might take to safeguard against
the difficulties discussed in Section 6.4.
Activity 6.10 Design an observational study to examine something of interest to you.
Summarize the purpose for this study, and identify the response and explanatory variables.
Discuss how you would collect the data. What are some possible confounding variables in
your study? How might you account or control for these confounding variables when
examining (or collecting) the data?
Activity 6.11 Write brief explanations or definitions for each of the following terms.
Suggestion: Use the Key Terms list on page 210 in the book to locate these terms in
Chapter 6.
a. Randomized experiment
b. Observational study
c. Confounding variable
d. Randomization
e. Matched-pairs design
f. Block design
g. Rule for Concluding Cause and Effect
CHAPTER 7 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 7 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 7.1
a. This is Thought Question 7.1 on page 223 in the book.
Review Case Study 7.1 (p. 222) and the list of five random circumstances on page
220. Using your understanding of probability and random events, assign probabilities to the
two possible outcomes for Random Circumstance 3 on page 220.
b. This is Thought Question 7.2 on page 223 in the book.
Review Case Study 7.1 (p. 222) and the list of five random circumstances on page 220. At
the beginning of Alicia’s day, the outcomes of the five random circumstances listed were
uncertain to her. Which of them were uncertain because the outcome was not?
Activity 7.2
a. Give an example of a personal probability for an event of interest to you. Assign a
probability to the event and describe how you decided on this probability.
b. Give an example of an event for which the relative frequency interpretation of
probability can be used. (Don’t use any examples from Section 7.2 in the book.) Give the
probability for this event and interpret the probability in terms of relative frequency.
Activity 7.3 This is Thought Question 7.3 on page 228 in the book.
You are about to enroll in a course for which you know that 20% of the students will
receive a grade of A. Do you think that the probability that you will receive an A in the
class is .20? Do you think the probability that a randomly selected student in the class will
receive an A is .20? Explain the difference in these two probabilities, using the distinction
between relative frequency probability and personal probability in your explanation.
Activity 7.4 Flip a coin 100 times. After each 10 flips (that is, after 10 flips, 20 flips, 30
flips, etc.), stop and compute the proportion of heads using all flips up to that point. Plot
the proportion of heads versus the number of flips. Discuss how the plot relates to the
relative frequency interpretation of probability.
Flips    Heads    Proportion = Heads/Flips
10
20
30
40
50
60
70
80
90
100
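If you want to compare your hand-flipped table with a simulated one, the Python sketch below fills in the same three columns. It assumes a fair coin and is a stand-in for, not a replacement of, actually flipping; the function name is illustrative.

```python
import random


def running_proportions(n_flips: int = 100, step: int = 10, rng=None):
    """Simulate fair-coin flips; every `step` flips record (flips, heads, heads/flips)."""
    rng = rng or random.Random()
    heads = 0
    rows = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # treat values below 0.5 as heads
            heads += 1
        if i % step == 0:
            rows.append((i, heads, heads / i))
    return rows


for flips, heads, proportion in running_proportions():
    print(f"{flips:>5} {heads:>5} {proportion:6.2f}")
```

Running it with a much larger `n_flips` shows the proportion settling near 0.5, which is the long-run behavior the relative frequency interpretation describes.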
Activity 7.5 Review Example 7.7 on page 229 in the book, and use that example for
guidance. For this activity, use the same sample space of 1000 three-digit lottery numbers
that is used in Example 7.7.
a. Let event C = the first digit drawn is a 7. What is the value of P(C)?
b. Let event D = the three-digit number is an even number (ends in 0, 2, 4, 6 or 8). What is
the value of P(D)?
c. Let event E = the sum of the three digits drawn is 2. What is the value of P(E)? (Hint:
What are the three-digit numbers for which the three digits sum to the value 2? )
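Because the 1000 outcomes are equally likely, each probability is a count of outcomes divided by 1000. The Python sketch below does that counting, assuming the sample space is the numbers 000 through 999 (leading zeros allowed), which is what 1000 three-digit lottery numbers implies.

```python
from fractions import Fraction

# Sample space: the 1000 equally likely three-digit numbers 000 through 999.
outcomes = [f"{n:03d}" for n in range(1000)]


def prob(event) -> Fraction:
    """Probability = (number of outcomes in the event) / (total number of outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_C = prob(lambda o: o[0] == "7")                  # first digit is 7
p_D = prob(lambda o: int(o[2]) % 2 == 0)           # last digit is 0, 2, 4, 6 or 8
p_E = prob(lambda o: sum(int(d) for d in o) == 2)  # the three digits sum to 2
print(p_C, p_D, p_E)
```

Listing the outcomes in event E by hand and comparing the count with the program is a good check on the hint in part c.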
Activity 7.6 This is Thought Question 7.4 on page 232 in the book.
Review Case Study 7.1 (p. 220). Remember that there were 50 students in Alicia’s
statistics class and that student names were not put back in the bag after being selected.
a. Consider the events A = Alicia is selected to answer question 1 and B = Alicia is selected
to answer question 2. Describe each of the following four conditional probabilities in
words and also determine a value for each:
$P(B \mid A)$, $P(B^C \mid A)$, $P(B \mid A^C)$, $P(B^C \mid A^C)$.
b. Now, based on your answers to part a, can you formulate a rule about the value of
$P(B \mid A) + P(B^C \mid A)$?
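The four conditional probabilities can be checked by brute-force counting. The Python sketch below lists every equally likely ordered selection of two students without replacement and applies the definition P(event | given) = count(both) / count(given). Labeling Alicia as student 0 is an arbitrary, hypothetical choice.

```python
from fractions import Fraction
from itertools import permutations

N_STUDENTS = 50
ALICIA = 0  # hypothetical label: Alicia is student number 0

# All equally likely ordered (question 1, question 2) selections, drawn without
# replacement, so the same student cannot be picked twice.
PAIRS = list(permutations(range(N_STUDENTS), 2))


def cond_prob(event, given) -> Fraction:
    """P(event | given) = #(pairs in both) / #(pairs in given)."""
    given_pairs = [p for p in PAIRS if given(p)]
    return Fraction(sum(1 for p in given_pairs if event(p)), len(given_pairs))


def A(pair):
    """Alicia is selected to answer question 1."""
    return pair[0] == ALICIA


def B(pair):
    """Alicia is selected to answer question 2."""
    return pair[1] == ALICIA


def complement(event):
    return lambda pair: not event(pair)


for label, value in [
    ("P(B|A)", cond_prob(B, A)),
    ("P(B^C|A)", cond_prob(complement(B), A)),
    ("P(B|A^C)", cond_prob(B, complement(A))),
    ("P(B^C|A^C)", cond_prob(complement(B), complement(A))),
]:
    print(f"{label} = {value}")
```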
Activity 7.7 The In Summary box on page 232 and other material in Section 7.3 can be
used for guidance in this activity. Pick some random situation (for instance, drawing cards
from a 52-card deck or tossing dice). Using that random situation, give examples for each
of the following:
a. Two events that are complements of each other.
b. Two mutually exclusive events.
c. Two independent events.
d. Two dependent events.
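For the card-deck version of this activity, each definition can be checked by counting cards. The Python sketch below assumes one card is drawn at random from a standard 52-card deck; the particular events (red, heart, spade, king) are illustrative choices, not ones the book prescribes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards


def prob(event) -> Fraction:
    """P(event) for one card drawn at random from the deck."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_spade(card):
    return card[1] == "spades"


def is_king(card):
    return card[0] == "K"


def both(e1, e2):
    return lambda card: e1(card) and e2(card)


def complement(e):
    return lambda card: not e(card)


print("Complements:", prob(is_red) + prob(complement(is_red)) == 1)
print("Mutually exclusive:", prob(both(is_heart, is_spade)) == 0)
print("Independent:", prob(both(is_heart, is_king)) == prob(is_heart) * prob(is_king))
print("Dependent:", prob(both(is_heart, is_red)) != prob(is_heart) * prob(is_red))
```

Note that heart and spade are mutually exclusive but not independent: knowing one happened tells you the other did not.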
Activity 7.8 Read Example 7.15 on pages 234-235 in the book; use the information in that
example for this activity.
a. Give the probability of each of the following events.
C = roommate doesn’t like to party
P(C) = _______
D = roommate doesn’t snore
P(D) = ________
C and D = doesn’t like to party and doesn’t snore
P(C and D) = ______
b. For events C and D defined in part a, determine the value of P(C or D). Write a
sentence that describes or interprets this probability.
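Part b uses the general addition rule, P(C or D) = P(C) + P(D) − P(C and D). A minimal Python sketch follows; the values in the example call are hypothetical placeholders, so substitute the three probabilities from part a.

```python
from fractions import Fraction


def prob_or(p_c, p_d, p_c_and_d):
    """General addition rule: P(C or D) = P(C) + P(D) - P(C and D).

    P(C and D) is subtracted because outcomes in both C and D would
    otherwise be counted twice.
    """
    if p_c_and_d > min(p_c, p_d):
        raise ValueError("P(C and D) cannot exceed P(C) or P(D)")
    return p_c + p_d - p_c_and_d


# Hypothetical values for illustration only (not the Example 7.15 numbers).
example = prob_or(Fraction(1, 2), Fraction(2, 5), Fraction(1, 5))
print(example)
```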
Activity 7.9 On the companion website, refer to the original Journal Article for Chapter
7—Examples 7.12, 18, 24, and 28: “A Comparison of Gambling by Minnesota Public
School Students in 1992, 1995, and 1998.” The data are from essentially the whole
populations of 9th and 12th graders in Minnesota during the years considered in the study.
a. Use tables 4 and 5 on pages 283 and 284 of the article to determine values of the
following conditional probabilities for 12th grade students in 1998:
P(12th grade student is weekly gambler | student is boy) = _________
P(12th grade student is weekly gambler | student is girl) = _____________
Note: Example 7.12 on page 230 is about 9th grade students.
b. Review Example 7.16 on page 233 in the book. Assume that the 12th grade population is
50.9% girls and 49.1% boys, as was assumed for 9th grade students in Example 7.16.
Calculate the probability that a randomly selected 12th grade student in Minnesota is a
female who is a weekly gambler.
c. Review Example 7.24 on pages 241-242. For 12th grade students, create a “hypothetical
hundred thousand” table like the one in Example 7.24 for 9th grade students.
          Weekly Gambler    Not Weekly Gambler    Total
Boy
Girl
Total                                             100,000
d. Use the table created in part c to determine for 12th grade students, the probability that a
weekly gambler is a boy. That is, determine P(boy | weekly gambler).
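Parts b through d can be organized with the Python sketch below. It takes P(girl) = 0.509 from part b and the two conditional probabilities you found in part a as inputs; those article values are not reproduced here, and the function names are hypothetical. Each table cell is 100,000 × P(gender) × P(weekly status | gender).

```python
def hundred_thousand_table(p_girl, p_weekly_if_boy, p_weekly_if_girl, total=100_000):
    """Expected counts: total * P(gender) * P(weekly status | gender)."""
    boys = total * (1 - p_girl)
    girls = total * p_girl
    return {
        ("Boy", "Weekly"): boys * p_weekly_if_boy,
        ("Boy", "Not weekly"): boys * (1 - p_weekly_if_boy),
        ("Girl", "Weekly"): girls * p_weekly_if_girl,
        ("Girl", "Not weekly"): girls * (1 - p_weekly_if_girl),
    }


def p_girl_and_weekly(p_girl, p_weekly_if_girl):
    """Multiplication rule (part b): P(girl and weekly) = P(girl) * P(weekly | girl)."""
    return p_girl * p_weekly_if_girl


def p_boy_given_weekly(table):
    """Part d: P(boy | weekly gambler) = boy weekly gamblers / all weekly gamblers."""
    weekly_total = table[("Boy", "Weekly")] + table[("Girl", "Weekly")]
    return table[("Boy", "Weekly")] / weekly_total
```

A call such as `hundred_thousand_table(0.509, p_boy, p_girl)`, with `p_boy` and `p_girl` set to your part a answers, fills the four interior cells of the table; the row and column totals follow by addition.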
Activity 7.10 This is Thought Question 7.5 on page 242.
Continuing the DNA example, Example 7.25 (p. 242), verify that the conditional
probability $P(\text{DNA match} \mid \text{innocent person}) = \frac{5}{5{,}999{,}999} \approx .00000083$.
Then provide an explanation that would be understood by a jury for the distinction between
the two statements:
The probability that a person who has a DNA match is innocent is 5/6.
The probability that a person who is innocent has a DNA match is .00000083.
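The two statements condition on different groups, which a short Python calculation makes concrete. The counts are inferred from the statements themselves: 5,999,999 innocent people, 5 of whom match, plus one guilty person who matches, for 6 matches in all. Treat that breakdown as an assumption to check against Example 7.25.

```python
from fractions import Fraction

# Counts inferred from the two statements (assumption: exactly one guilty person,
# who matches, so there are 6 matching people in total).
innocent = 5_999_999
innocent_matches = 5
guilty_matches = 1

# Condition on innocent people: how many of them match?
p_match_given_innocent = Fraction(innocent_matches, innocent)
# Condition on people who match: how many of them are innocent?
p_innocent_given_match = Fraction(innocent_matches, innocent_matches + guilty_matches)

print(f"P(match | innocent) = {float(p_match_given_innocent):.8f}")
print(f"P(innocent | match) = {p_innocent_given_match}")
```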
Activity 7.11 This is Thought Question 7.6 on page 245.
Explain why the tree diagram in Figure 7.1 (p. 241) displayed disease status first and test
results second rather than the other way around.
Activity 7.12 To begin, review Example 7.25 on page 243 in the book.
a. Using the results given in Table 7.4, estimate the probability that prize 1 is not in any of
the six cereal boxes.
b. Assuming that the six boxes are independent, what is the theoretical probability that
prize 1 is in none of the six boxes? (Hints: See Rule 3 on page 233 for independent events
and remember that the probability is ¾ that prize 1 will not be in any specific box.)
c. If you have access to the necessary statistical software, repeat the simulation done in
Example 7.29 on page 245. (You’ll get different results!) Use your simulation to estimate
the probability that all four prizes will be in six boxes of cereal.
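If your software is Python, parts b and c might be sketched as follows. The model is an assumption consistent with the ¾ hint: each box holds one of four prizes, each equally likely, independently of the other boxes. The function name and the 100,000 repetitions are illustrative choices.

```python
import random

N_PRIZES = 4
N_BOXES = 6

# Part b: each box misses prize 1 with probability 3/4, independently.
p_no_prize1 = (3 / 4) ** N_BOXES


def simulate_all_prizes(reps: int = 100_000, rng=None) -> float:
    """Estimate P(all four prizes appear in six boxes); one random prize per box."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(reps):
        prizes = {rng.randrange(N_PRIZES) for _ in range(N_BOXES)}
        if len(prizes) == N_PRIZES:
            hits += 1
    return hits / reps


print(f"P(prize 1 in none of 6 boxes) = {p_no_prize1:.4f}")
print(f"Simulated P(all four prizes in 6 boxes) ≈ {simulate_all_prizes():.3f}")
```

For comparison, counting the arrangements under the same model gives an exact value of 1560/4096 ≈ 0.381, so a simulation of this size should land close to that.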
Activity 7.13 This is Thought Question 7.8 on page 252.
If you wanted to pretend to be psychic, you could do a “cold reading” on someone you do
not know. Suppose you are doing this for a 25-year-old woman. You make statements such
as the following:
I see that you are thinking of two men, one with dark hair and the other one with
slightly lighter hair or complexion. Do you know who I mean?
I see a friend who is important to you but who has disappointed you recently.
I see that there is some distance between you and your mother that bothers you.
Using the material in this section (Section 7.7), explain why this would often work to
convince people that you are psychic.
Activity 7.14 Find a story of a startling coincidence in the news or on the Internet. (One
way to do this is to type “amazing coincidence” into a search engine. You will get plenty
of material.) Evaluate roughly how likely you think it is that the specific sequence of
events would happen to the specific person to whom they happened. Then, evaluate
roughly how likely something similar to that sequence of events would be to happen to the
specific person. Finally, evaluate how likely you think something similar to that sequence
of events would be to happen to someone, somewhere, someday.
Activity 7.15
• Ask several (around 10) people to each write down what they think would be a typical
sequence of 20 coin flips. Have them write H for heads and T for tails. As an example,
they might write H,H,T,H,T,T,T,T,H,T,T,H,H,T,H,T,H,H,H,T.
• After a person has written the sequence, count and record the length of the longest
streak of consecutive “flips” of the same type. For the example just given, this value
will be four, which is the number of consecutive T’s beginning at the 5th “flip” of the
sequence.
• Then, flip a coin 20 times and record the length of the longest streak of consecutive
flips of the same type (longest number of consecutive heads or consecutive tails).
Repeat this several times (around 10).
• Compare the lengths of the longest streaks in the imagined “typical” sequences to the
lengths of the longest streaks in actual coin flips. Summarize the results and discuss
whether the results may provide evidence that people’s imagined flips are affected by
the gambler’s fallacy.
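Counting the longest streak is easy to get wrong by eye, so a small Python helper may be useful for checking both the imagined and the real sequences. The simulation function assumes a fair coin and can supplement, not replace, the real flips; both function names are illustrative.

```python
import random
from itertools import groupby


def longest_streak(seq: str) -> int:
    """Length of the longest run of identical consecutive symbols ('HHTTT' -> 3)."""
    return max(len(list(run)) for _, run in groupby(seq)) if seq else 0


def simulate_longest_streaks(n_sequences: int = 10, n_flips: int = 20, rng=None):
    """Longest streak in each of several simulated sequences of fair-coin flips."""
    rng = rng or random.Random()
    return [
        longest_streak("".join(rng.choice("HT") for _ in range(n_flips)))
        for _ in range(n_sequences)
    ]


example = "HHTHTTTTHTTHHTHTHHHT"  # the sample sequence from the activity
print(longest_streak(example))    # 4: the TTTT starting at flip 5
print(simulate_longest_streaks())
```

Real sequences of 20 flips often contain longer streaks than people expect, which is the pattern this activity is designed to reveal.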
Activity 7.16 To begin, read Case Study 7.2 on pages 252-253 in the book.
a. In the second column of Case Study 7.2, the probabilities used for events B, C and D are
108/119, 96/118, and 84/117, respectively. Explain why these are the right probabilities.
b. Now suppose that a playlist of songs on an iPod or an MP3 player has a total of 60 songs,
with 10 songs from each of six albums. The music player can randomly shuffle the order of
the songs. What is the probability that at least two of the first three songs after the random
shuffle are from the same album? As in Case Study 7.2, first find the probability that the
first three songs are from different albums and then subtract that value from 1.
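The complement approach in part b can be checked exactly with Python's `fractions` module. The sketch below generalizes the multiplication rule used in Case Study 7.2: if the first i songs came from i different albums, the next song avoids those albums with probability (albums not yet used × songs per album) / (songs left). The function name is hypothetical, and the shuffle is assumed to make every ordering equally likely.

```python
from fractions import Fraction


def prob_all_different(songs_per_album, n_albums, k):
    """Probability that the first k songs of a random shuffle come from k different albums."""
    total = songs_per_album * n_albums
    prob = Fraction(1)
    for i in range(k):
        # i songs already played, all from different albums
        unused_album_songs = (n_albums - i) * songs_per_album
        remaining_songs = total - i
        prob *= Fraction(unused_album_songs, remaining_songs)
    return prob


p_diff = prob_all_different(10, 6, 3)
print(p_diff, float(p_diff))          # first three songs from different albums
print(1 - p_diff, float(1 - p_diff))  # at least two of the first three from the same album
```

Assuming the 120-song, 10-album setup implied by the fractions 108/119, 96/118, and 84/117, `prob_all_different(12, 10, 4)` reproduces their product, and `prob_all_different(10, 6, 6)` gives the theoretical value asked for in part e.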
c. Suppose that you have access to statistical software that can randomly order a list of
words or numbers. For the situation in part b (10 songs from each of 6 albums), describe
how you could use this software to do a simulation for the purpose of estimating the
probability that at least two of the first three songs are from the same album.
d. If you have access to the necessary statistical software, do a simulation in order to
estimate the answer to part b. Base your answer on at least 25 different random orderings.
Summarize your results and give the estimated probability.
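One way to carry out the simulation in parts c and d, sketched in Python rather than in the statistical software your class uses: label each song only by its album, shuffle the list, and check whether the first three labels repeat. The function names and seeds are hypothetical choices for illustration.

```python
import random


def first_k_from_different_albums(shuffled, k):
    """True if the first k songs in the shuffled playlist all come from different albums."""
    return len(set(shuffled[:k])) == k


def estimate_prob_repeat(n_albums=6, songs_per_album=10, k=3, n_shuffles=25, seed=None):
    """Estimate P(at least two of the first k songs share an album) by repeated shuffling."""
    rng = random.Random(seed)
    # Represent each song only by its album label: 10 copies each of albums 0..5.
    playlist = [album for album in range(n_albums) for _ in range(songs_per_album)]
    repeats = 0
    for _ in range(n_shuffles):
        rng.shuffle(playlist)  # one random ordering of the whole playlist
        if not first_k_from_different_albums(playlist, k):
            repeats += 1
    return repeats / n_shuffles


print(estimate_prob_repeat(seed=2))                     # 25 orderings, as in part d
print(estimate_prob_repeat(n_shuffles=100_000, seed=2))  # many orderings, for comparison
```

With only 25 orderings the estimate can easily be off by 0.1 or more; the larger run shows how the estimate settles down. Setting `k=6` and counting orderings where the first six are all different addresses part e by simulation.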
e. For the situation described in part b, what is the probability that in the first six songs of a
random shuffle there is one song from each of the six albums? Either calculate the value
theoretically or do a simulation. (Or, do both things.)
f. Create your own probability question about an outcome of a random shuffle of an iPod
(or other player’s) playlist. Use simulation to estimate the probability of the outcome of
interest for your question. As an example, for the situation described in Case Study 7.2,
you might examine the probability that the first five randomly ordered songs are from only two
albums.
CHAPTER 8 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 8 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Project 8.2 on the companion website is a more sophisticated version of Activity 8.13 on
page 79 in this manual.
Activity 8.1 This is Thought Question 8.1 on page 266 in the book.
If you know that the number of possible values a random variable can have is finite, do you
know whether the random variable is discrete or continuous? Answer the same question for
a random variable that can have an infinite number of possible values.
Activity 8.2 Review Example 8.6 on page 267. Then, modify the example so that it is
about families with two children rather than three children.
a. List the sample space of possible sequences (arrangements) of the sexes of the two
children.
b. Define the random variable X = number of girls among the two children. Assign the
appropriate value of X to each simple event in the sample space.
c. Give a probability for each possible value of X.
Probability of 0 girls: P(X = 0) =
Probability of 1 girl: P(X = 1) =
Probability of 2 girls: P(X = 2) =
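The listing-and-counting approach of this activity can be sketched in Python, assuming (as is usual for this kind of example) that each child is independently equally likely to be a boy or a girl, so every sequence in the sample space is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: each child is a boy (B) or girl (G) with equal probability, independently.
sample_space = ["".join(seq) for seq in product("BG", repeat=2)]
print(sample_space)  # ['BB', 'BG', 'GB', 'GG']

# Assign X = number of girls to each simple event.
x_values = {event: event.count("G") for event in sample_space}
counts = Counter(x_values.values())

# Each simple event has probability 1/4, so P(X = x) = (number of events with that x) / 4.
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```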
Activity 8.3 Review the subsection on page 269 about a cumulative probability
distribution function.
a. What is a cumulative probability? In words, give an example of a cumulative probability
for a variable not considered on page 269.
b. In Example 8.8 (p. 269), it was found that P(X ≤ 1) = 4/8. In the context of the example,
write a sentence that describes the event for which this is the probability.
c. For Example 8.8 (p. 269), explain why P(X ≤ 2) was calculated in the way shown in the
example.
Activity 8.4 This is Thought Question 8.2 on page 270 in the book.
In Example 8.9 (pp. 269-270), we added the probabilities of X = 1 and X = 2 because they
were mutually exclusive events. For any discrete random variable X, is it always true that X
= k and X = m are mutually exclusive events, where k and m represent two values that X
can have? Explain.
Activity 8.5 In a statistics class at a large university, students were asked to rate how much
they liked various kinds of music on a scale of 1 (don’t like at all) to 6 (like very much).
Following is a probability distribution for the female students’ ratings of Top 40 music.
Notice that the probability is not given for a rating of 4.
X = Rating     1     2     3     4     5     6
Probability   .04   .05   .09    ?    .32   .26
a. Is the random variable X = music rating a discrete variable or a continuous variable?
Explain.
b. What is the value of the probability (not given) for X = 4, the probability that the rating
equals 4? Hint: What is the total of the probabilities given for all values of X that are not
equal to 4? By the “laws” of probability, what is the total probability for all possible
outcomes?
c. Determine P(X ≤ 3), the probability that a Top 40 music rating given by a randomly
selected female is 3 or less. To do this, add the probabilities for X = 1, 2 and 3.
d. For each Top 40 rating value, determine the cumulative probability. For guidance, see
Example 8.8 on page 269.
X = Rating                  1      2      3      4      5      6
Cumulative Probability    ____   ____   ____   ____   ____    1
e. Write a sentence that explains the meaning of the cumulative probability P(X ≤ 4) in the
context of this example.
f. For this example, show that P(X = 5) + P(X = 6) = 1 – P(X ≤ 4).
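Parts b through f can be checked with a few lines of Python. This sketch uses exact fractions so that sums such as .04 + .05 + .09 come out exactly rather than with floating-point rounding; the variable names are illustrative only.

```python
from fractions import Fraction

# Probabilities from the table; the value for rating 4 is unknown.
given = {1: Fraction("0.04"), 2: Fraction("0.05"), 3: Fraction("0.09"),
         5: Fraction("0.32"), 6: Fraction("0.26")}

# Part b: the probabilities of all possible outcomes must total 1.
p4 = 1 - sum(given.values())
dist = dict(sorted({**given, 4: p4}.items()))

# Part d: a cumulative probability P(X <= x) is a running total.
cumulative = {}
running = Fraction(0)
for x, p in dist.items():
    running += p
    cumulative[x] = running

print("P(X = 4) =", float(p4))
for x, c in cumulative.items():
    print(f"P(X <= {x}) = {float(c):.2f}")

# Part f: P(X = 5) + P(X = 6) equals 1 - P(X <= 4)
print(dist[5] + dist[6] == 1 - cumulative[4])  # True
```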
Activity 8.6 This is Thought Question 8.3 on p. 271 in the book. Refer to the probability
distribution for the sum of two dice, shown at the top of page 268 and in Figure 8.2 on
page 270.
a. What is the value of P(X = 4)? What does this probability measure?
b. Explain what is measured by the value of 1 – P(X = 4).
c. This is an added part to Thought Question 8.3. What is the value of 1 – P(X = 4)?
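A quick way to check parts a and c is to enumerate all outcomes for two dice. This sketch assumes two fair dice, so each of the 36 (first die, second die) pairs is equally likely.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p_sum_4 = Fraction(sum(1 for a, b in outcomes if a + b == 4), len(outcomes))

print("P(X = 4) =", p_sum_4)          # 1/12 (3 of the 36 outcomes)
print("1 - P(X = 4) =", 1 - p_sum_4)  # 11/12
```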
Activity 8.7 Review Example 8.11 on pages 271-272 in the book. Then, modify the
example so that the probability distribution for the amount a player gains on a single play
is as follows:
X = amount gained    $3    −$2
Probability          .3     .7
a. Calculate the expected value for this probability distribution.
b. Write a sentence that interprets the expected value in this situation. For guidance, see the
sentence after the calculation in Example 8.11 on pages 271-272.
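The expected value in part a is the sum of each value times its probability. A minimal Python sketch (the function name is hypothetical) that also checks that the probabilities total 1:

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of value * probability over a discrete probability distribution."""
    if sum(p for _, p in distribution) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution)


# Modified game: gain $3 with probability .3, lose $2 with probability .7
game = [(3, Fraction(3, 10)), (-2, Fraction(7, 10))]
print(float(expected_value(game)))  # -0.5
```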
Activity 8.8 This is Thought Question 8.4 on page 272 in the book. Suppose that the
probability of winning in a gambling game is .001, and when a player wins, his or her net
gain is $999. When a player loses, the net amount lost is $1 (the cost to play). Is this game
fair? Why or why not? How would you define a fair game? Does the number of times the
game is played affect your view of whether the game is fair? Explain.
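The sketch below computes the exact expected net gain for this game and then simulates the average gain per play for different numbers of plays, which may help with the last question. The simulation (a random number below .001 counts as a win) is an illustrative assumption, not the book's method.

```python
import random
from fractions import Fraction

# Exact expected net gain per play: win $999 with probability .001, lose $1 otherwise.
p_win = Fraction(1, 1000)
ev = 999 * p_win + (-1) * (1 - p_win)
print("Expected net gain per play:", ev)  # 0


def average_gain(n_plays, rng):
    """Average net gain per play over n_plays simulated plays."""
    total = sum(999 if rng.random() < 0.001 else -1 for _ in range(n_plays))
    return total / n_plays


rng = random.Random(4)
for n in (10, 1_000, 100_000):
    print(n, "plays: average gain per play =", round(average_gain(n, rng), 3))
```

Over a few plays the average is usually −1 (no wins) and occasionally far above 0; only over many plays does it settle near the expected value.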
Activity 8.9 Find two lottery or casino games that have fixed payoffs and for which the
probabilities of each payoff are available. Some lotteries list these on the back of the
ticket or on their Web sites. Some books about gambling give the payoffs and
probabilities for various casino games.
a. Compute the expected value for each game. Discuss what each expected value means.
b. Using both the expected values and the list of payoffs and probabilities, explain which
game you would rather play and why.
Activity 8.10 The television game show Deal or No Deal, on NBC in the U.S., requires
simple probability assessments on the part of the contestant. At the game’s beginning, the
player chooses one of 26 numbered briefcases, each concealing a different money amount,
with the 26 amounts ranging from $0.01 to $1,000,000. Then, in each round of the game,
the amounts concealed in some of the other briefcases are revealed, after which the “bank”
offers the player an amount of money (“deal”) to stop playing. If the player says “no deal,”
more briefcases are opened and the player is offered a different “deal.” The game ends
when the player accepts a deal or when he/she has rejected all deals and accepts the amount
concealed in the originally picked briefcase.
The Deals: In rounds late in the game, deals offered by the bank tend to be in the range of
75% to 95% of the average amount in the unopened briefcases at that point. Early in the
game, the deals are much lower, in part to encourage a player to keep playing.
a. Suppose that there are four unopened briefcases remaining, including the player’s
original pick, and the remaining amounts are known to be $10, $50, $10,000 and $400,000.
The bank offer (deal) is $70,640. If the player says “no deal,” one more briefcase will be
opened and a new deal will be offered. Discuss whether the player should accept the deal
or say “no deal.” (Suggestion: Consider the potential deals and their probabilities if another
case is opened.)
b. Suppose that a different player has three unopened briefcases remaining, and the
remaining amounts are known to be $0.01, $5, and $500,000. The bank offer (deal) is
$152,503. If the player says “no deal,” one more briefcase will be opened and a new deal
will be offered. Discuss whether the player should accept the deal or say “no deal.”
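Following the suggestion in part a, the sketch below lists, for each amount that might be revealed next, the average of the briefcases that would remain and a rough range for the next offer. It assumes each remaining amount is equally likely to be in the next case opened and uses the "75% to 95% of the average" description above as a crude model of the bank; both are simplifications, and the function names are hypothetical.

```python
def average(amounts):
    """Mean of the amounts still in play."""
    return sum(amounts) / len(amounts)


def next_round_averages(remaining):
    """Pair each amount that might be revealed next with the average of what would remain."""
    pairs = []
    for i, opened in enumerate(remaining):
        left = remaining[:i] + remaining[i + 1:]
        pairs.append((opened, average(left)))
    return pairs


def report(remaining, offer, low=0.75, high=0.95):
    avg = average(remaining)
    print(f"Current average ${avg:,.2f}; the offer is {offer / avg:.0%} of it")
    for opened, new_avg in next_round_averages(remaining):
        print(f"  if ${opened:,} is revealed: average ${new_avg:,.2f}, "
              f"likely deal ${low * new_avg:,.0f} to ${high * new_avg:,.0f}")


report([10, 50, 10_000, 400_000], offer=70_640)  # part a
report([0.01, 5, 500_000], offer=152_503)        # part b
```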
c. The 26 money amounts randomly distributed to the briefcases in the game are:
$0.01; $1; $5; $10; $25; $50; $75; $100; $200; $300; $400; $500; $750;
$1,000; $5,000; $10,000; $25,000; $50,000; $75,000; $100,000; $200,000;
$300,000; $400,000; $500,000; $750,000; $1,000,000.
For players who decide in advance that they will reject all deals in favor of the
amount in the first case that they picked, what is the expected value (average
value) of the game?
For a player who decides in advance to reject all deals and take the amount in the
first case picked, what is the probability that the amount he or she wins is less than
the expected value of the game?
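Both questions in part c can be checked with the sketch below. It assumes, as the question does, that a player who rejects every deal wins whatever is in the first case picked, and that each of the 26 amounts is equally likely to be in that case. The $0.01 amount is stored as an exact fraction.

```python
from fractions import Fraction

# The 26 amounts listed in part c, in dollars.
amounts = [Fraction(1, 100), 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
           1_000, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000, 200_000,
           300_000, 400_000, 500_000, 750_000, 1_000_000]

# Each amount has probability 1/26, so the expected value is the plain average.
expected = sum(amounts) / len(amounts)
below = sum(1 for a in amounts if a < expected)

print(f"Expected value: ${float(expected):,.2f}")
print(f"P(win less than the expected value) = {below}/{len(amounts)} "
      f"= {below / len(amounts):.3f}")
```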
d. (This part might best be done by student groups.) You can probably find a simulation of the
game on the Internet. Using an Internet simulation of the Deal or No Deal game and/or other
means, such as statistical software that can do random selection and perhaps theoretical
considerations, propose and investigate a possible strategy for doing well (winning money!)
in the game. As an example, what are your chances of doing well if you reject all deals until
four unopened briefcases remain? Summarize your findings and how you arrived at them.
Activity 8.11
a. This is Thought Question 8.5 on page 277 in the book.
The word binomial is from the Latin bi = “two,” and nomen = “name.” Explain why the
word binomial is appropriate for a binomial random variable.
b. This is Thought Question 8.6 on page 280 in the book.
In Example 8.17 (pp. 278-279), we determined the probability that you could guess your
way to a passing score on a quiz with 15 true-false questions. If you did guess at each of 15
true-false questions, what is the expected value of X = number of correct answers? Is the
expected value a possible score on the quiz? What exactly does the expected value tell us?
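For a binomial random variable, the expected value can be found either by summing k × P(X = k) over all k or with the shortcut n × p. The sketch below computes both for 15 true-false questions, assuming each guess is correct with probability 1/2.

```python
from fractions import Fraction
from math import comb

n, p = 15, Fraction(1, 2)  # 15 true-false questions, guessing on each one

# Binomial probabilities P(X = k) = C(n, k) p^k (1 - p)^(n - k)
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

expected = sum(k * prob for k, prob in pmf.items())
print(expected, "=", n * p)  # both are 15/2, i.e. 7.5 correct answers
```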
Activity 8.12 Review Case Study 8.1 on pages 280-281 in the book.
a. Explain why a single participant’s twenty trials are a binomial experiment with n = 20
and p = .5, if it is assumed that he or she cannot really detect the difference between
samples A and B and so is just randomly guessing on every trial.
b. Briefly summarize how the researchers used the binomial distribution to define the
standard that they used for a “significant” flavor detection performance.
Activity 8.13 Work with a partner. You’ll assume the role of the “experimenter.” To do
this activity, you’ll need four cards of the same rank from a 52-card deck. For example,
you might use the four Kings (King of hearts, King of diamonds, King of clubs, King of
spades). Randomly mix the cards and then select one in a way that your partner can’t see
what you picked. To guard against your own selection bias, don’t look at the cards while
making your selection. Have your partner guess the suit of the card you picked. Repeat this
10 times, and keep track of right and wrong guesses using the following table.
Trial             1   2   3   4   5   6   7   8   9   10
Right or Wrong?
a. What was the number of correct guesses made by your partner?
Number of correct guesses by partner = _____
b. Assuming a person randomly guesses each time, explain why this is a binomial
experiment and give the values of n and p for the experiment.
c. Use statistical software (or Excel) or an appropriate TI calculator to determine the
following probabilities for somebody randomly guessing on all tries. Tip: See the software
tips on pages 297 and 298 in the book.
Probability of 0 correct guesses in 10 tries: P(X = 0) =
Probability of 2 or fewer correct guesses in 10 tries: P(X ≤ 2) =
Probability of 6 or fewer correct guesses in 10 tries: P(X ≤ 6) =
d. Determine the probability that somebody who is randomly guessing could make more
correct guesses than your partner did. (Hint: The first step is to find the cumulative
probability for the number of correct guesses by your partner.)
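If Python is available instead of the software described in the book's tips, parts c and d can be computed as below. With four suits, a random guess is right with probability 1/4. The value `partner_correct = 4` is a hypothetical placeholder; replace it with your partner's actual count.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def binom_cdf(k, n, p):
    """Cumulative probability P(X <= k)."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))


n, p = 10, 0.25  # 10 guesses, each with a 1-in-4 chance of naming the right suit
print(f"P(X = 0)  = {binom_pmf(0, n, p):.4f}")
print(f"P(X <= 2) = {binom_cdf(2, n, p):.4f}")
print(f"P(X <= 6) = {binom_cdf(6, n, p):.4f}")

partner_correct = 4  # HYPOTHETICAL: replace with your partner's number of correct guesses
# Part d: more correct guesses than your partner = 1 - P(X <= partner's count)
print(f"P(X > {partner_correct}) = {1 - binom_cdf(partner_correct, n, p):.4f}")
```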
Activity 8.14
a. This is a shortened version of Thought Question 8.7 on page 283.
The total area under the probability density function over the entire range of values the
random variable X can possibly have is the same for all continuous random variables.
What is that total area? What probability does it represent?
b. This is Thought Question 8.8 on page 284 in the book.
Which of the following measurements do you think are likely to have a normal
distribution: heights of college men, incomes of 40-year-old women, pulse rates of college
athletes? Explain your reasoning for each variable. For those variables that are likely to be
normally distributed, give approximate values for the mean and standard deviation.
c. Give an example of a continuous random variable that you think does not have a normal
distribution and sketch what you think is its density curve. Don’t use any examples given
in Chapter 8.
Activity 8.15
a. Follow the steps in Example 8.24 (p. 289) to determine the probability that the height of
a randomly selected college woman is less than 67 inches. Use the same population mean
and standard deviation used in the book example. Draw a clearly labeled sketch that
illustrates the answer. Use Figure 8.16 (p. 285) for guidance.
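The standardize-then-look-up steps can be mirrored in Python with `statistics.NormalDist`. The mean and standard deviation below (65 and 2.7 inches) are HYPOTHETICAL placeholders, not values taken from Example 8.24; substitute the values used in the book.

```python
from statistics import NormalDist


def prob_less_than(x, mu, sigma):
    """Return the standardized score z and P(X < x) for a normal random variable."""
    z = (x - mu) / sigma
    return z, NormalDist().cdf(z)  # cumulative probability under the standard normal curve


# HYPOTHETICAL placeholders: replace with the mean and standard deviation in Example 8.24.
mu_women, sigma_women = 65.0, 2.7
z, prob = prob_less_than(67, mu_women, sigma_women)
print(f"z = {z:.2f}, P(height < 67) = {prob:.4f}")
```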
Activity 8.16 Suppose that the heights of college-age men have a normal distribution with
mean μ = 71 inches and standard deviation σ = 2.7 inches.
a. Using Example 8.26 (page 291) for guidance, find the 75th percentile of heights for
college-age men.
b. At the bottom of the table inside the back cover of the book, the information is given
that for z = 4.26, the cumulative probability is .99999. Expressed as a percentile, z = 4.26 is
the 99.999th percentile of a standard normal curve. Find the height that is the 99.999th
percentile of heights for college-age men.
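Both percentiles follow from height = μ + z × σ. In the sketch below, `NormalDist.inv_cdf` finds z for part a; for part b it uses the table value z = 4.26 given above (a more precise software value is about 4.265, which changes the answer only slightly).

```python
from statistics import NormalDist

heights = NormalDist(mu=71, sigma=2.7)  # college-age men, from the activity

# Part a: the 75th percentile is the height with cumulative probability .75.
print(f"75th percentile: {heights.inv_cdf(0.75):.2f} inches")

# Part b: with the table value z = 4.26, height = mu + z * sigma.
print(f"99.999th percentile: {71 + 4.26 * 2.7:.2f} inches")
```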
Activity 8.17 Write definitions or short explanations for each of the following key terms in
Section 8.6:
a. Normal random variable:
b. Normal curve:
c. Standardized score:
d. Standard normal random variable:
CHAPTER 9 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 9 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 9.1 For guidance, use pages 313-315 in the book.
a. Briefly explain the purpose for creating a confidence interval. Give an example of a
situation in which a confidence interval would be useful.
b. Briefly explain the purpose for conducting a hypothesis test. Give an example of a
situation in which it would be useful to conduct a hypothesis test.
c. Define, or explain, the term statistical inference.
d. In each situation, explain whether the summary described is a population parameter or a
sample statistic.
(i) Mean GPA of a random sample of 144 students at your school.
(ii) The proportion of all U.S. residents who are under 20 years old.
(iii) The Gallup Poll surveys 1,000 randomly selected American adults and finds that 48%
approve of the president’s job performance.
Activity 9.2 (for Section 9.2 in the book)
a. Read Example 9.1 on page 316 in the book. What does the parameter d represent in
this example? Why are we not able to know the value of this parameter? What is the value
of the sample statistic that estimates the parameter d in this example?
b. Section 9.2 describes five different population parameters (“The Big Five Parameters”)
that will be covered in Chapters 9-13. List the five population parameters. Describe each
parameter in words and write the symbol used for each parameter. Give an example for
each parameter; give different examples from those given in Section 9.2.
Activity 9.3 Read the original Journal Article on the companion website for Chapter 13 –
Exercise 13.39: “A Prospective Study of Holiday Weight Gain.”
a. Identify a population parameter of interest in this study. Be sure to describe both the
population of interest and the summary characteristic of interest.
b. From the journal article, what is the value of the sample estimate of the population
parameter that you identified in part a? What was the sample size for the dataset that the
researchers used to determine this sample estimate?
c. Explain why it would have been useful for the researchers to have used a confidence
interval to estimate the population parameter that you described in part a. (Confidence
intervals are discussed on pages 313-315 in the book.)
Activity 9.4 This is Thought Question 9.1 on page 326 in the book.
For Example 9.4 (p. 326), into what range of possible values should the sample proportion
fall 95% of the time, according to the Empirical Rule? Suppose that the polling
organization used a sample of only 600 voters instead. Would the range of possible sample
proportions be wider, narrower, or the same as it was for a sample size of 2400? Explain
your answer, and explain why it makes intuitive sense.
Activity 9.5 To begin, review Example 9.4 on page 326 in the book.
a. Suppose that the sample size for Example 9.4 was n = 1000 voters, rather than 2400 as in
the book example. With n = 1000, what will be the values of the mean and standard
deviation of the sampling distribution of the sample proportions?
b. Still referring to Example 9.4 on page 326 – If the sample size is n = 1000 (rather than
2400), what will be the interval that covers about 99.7% (nearly all) of possible sample
proportions in favor of Candidate X?
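The calculations in Activities 9.4 and 9.5 use the mean p and standard deviation √(p(1 − p)/n) of the sampling distribution of a sample proportion. The sketch below assumes the true proportion in Example 9.4 is p = .4 (as stated in Activity 9.14) and compares n = 2400, 600, and 1000; the function names are hypothetical.

```python
from math import sqrt


def phat_sampling_summary(p, n):
    """Mean and standard deviation of the sampling distribution of a sample proportion."""
    return p, sqrt(p * (1 - p) / n)


def empirical_rule_interval(p, n, k):
    """Mean ± k standard deviations (k = 2 for about 95%, k = 3 for about 99.7%)."""
    mean, sd = phat_sampling_summary(p, n)
    return mean - k * sd, mean + k * sd


p = 0.4  # true proportion favoring Candidate X in Example 9.4
for n in (2400, 600, 1000):
    mean, sd = phat_sampling_summary(p, n)
    lo95, hi95 = empirical_rule_interval(p, n, 2)
    lo997, hi997 = empirical_rule_interval(p, n, 3)
    print(f"n = {n}: mean = {mean}, sd = {sd:.4f}, "
          f"95% range {lo95:.3f} to {hi95:.3f}, 99.7% range {lo997:.3f} to {hi997:.3f}")
```

Cutting n from 2400 to 600 (one fourth) doubles the standard deviation, which is why the range of likely sample proportions widens.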
Activity 9.6 To begin, review Example 9.5 on pages 327-328 in the book.
a. In the study described in Example 9.5, suppose that there had been 45 participants, so
that the combined number of trials is n = 45 × 20 = 900. Determine the mean and standard
deviation of the sampling distribution of possible proportions of correct guesses.
b. For a study with 45 participants and n = 900 trials, draw a sketch similar to Figure 9.4
(p. 327) that shows the approximate sampling distribution of possible sample proportions,
assuming that all participants guess every time.
Activity 9.7 The purpose of this activity is to examine characteristics of sample means for
different samples from the same population (Section 9.6 in the book). Use the
Student0405 dataset on the companion website. The data are from student surveys in
statistics classes at a large university in the years 2004-2005. The variable StudyHrs gives
responses to, “How many hours do you typically study per week?”
a. Using statistical software, draw a histogram of the StudyHrs variable. Is the shape bell-shaped, skewed to the right, or skewed to the left? About what is the most common
response for weekly hours of study?
b. Determine the mean and standard deviation of the StudyHrs variable.
Mean study hours =
_________
Standard Deviation of study hours data = __________
c. Now, treat the dataset as if it were population data, so think of the values found in part b
as population parameters. We will take many different random samples of n = 36 individuals
from the population of responses for the StudyHrs variable. What are the mean and
standard deviation of the distribution of possible values of the sample mean, for samples of
n = 36, in this situation?
Mean of possible sample means is μ = _______
Standard deviation of possible sample means is σ/√n = ____________
d. Assuming that the sampling distribution for a sample mean (p. 333) holds, fill in the
blanks in this statement: For about 68% of all random samples of n = 36 students,
the sample mean hours of study will be between
______ and _______.
e. Use statistical software to select a random sample of n = 36 values from the StudyHrs
column in the dataset. (Minitab users: Calc>Random Data>Sample From Columns.)
Then, use the software to determine the sample mean for the sample you selected.
Sample Mean for selected sample = _________
f. Now, repeat the process for the previous part nineteen additional times. Each time, get a
new random sample of n = 36 values from the StudyHrs column and find the mean for the
sample. List all 20 sample means that you’ve generated (the mean from part e and the
nineteen new means from this part).
g. Of your 20 sample means listed in part f, how many were within the interval that you
computed in part d? Explain whether this is about what would be expected.
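Parts b through g can be rehearsed in Python before working with the real data. Because the StudyHrs column is not reproduced here, the sketch uses a HYPOTHETICAL right-skewed population of 400 values; replace `population` with the actual StudyHrs data if you load it yourself. Samples are drawn without replacement, as when selecting rows from the dataset.

```python
import random
import statistics

rng = random.Random(7)

# HYPOTHETICAL stand-in for the StudyHrs column: 400 right-skewed values.
population = [round(rng.expovariate(1 / 12)) for _ in range(400)]

mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation (part b)
n = 36
se = sigma / n ** 0.5                  # standard deviation of possible sample means (part c)
low, high = mu - se, mu + se           # range for about 68% of sample means (part d)

# Parts e and f: 20 random samples of n = 36 values, drawn without replacement.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(20)]
inside = sum(low <= m <= high for m in sample_means)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, interval {low:.2f} to {high:.2f}")
print([round(m, 2) for m in sample_means])
print(f"{inside} of 20 sample means fall in the interval (about 68% expected)")
```

With only 20 samples, the count inside the interval will vary from run to run around 68% of 20 (about 13 or 14).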
h. What fraction of your 20 sample means were within ±1 hour of the population mean
(from part b)? Suppose that we were to take 20 different random samples of n = 100. Do
you think that the fraction of sample means within ±1 hour would be more, less, or the
same as it was for your samples of n = 36? Explain.
i. Suppose that the sample means generated by all students in your class are combined and
a histogram of these sample means is drawn. Approximately what would you expect the
shape of this histogram to be? Explain.
j. Refer back to the list of means that you wrote for part f of this activity. On the basis of
that list, which one of the following statements could be a “moral of the story” for this
activity?
• The value of a sample statistic varies from sample to sample.
• The value of a population parameter varies from sample to sample.
Activity 9.8 To begin, review Example 9.8 on page 333 in the book.
a. Suppose that the sample size in Example 9.8 is changed to n = 64. What will be the
values of the mean and standard deviation of the sampling distribution of potential sample
means?
b. Explain what is being shown in Figure 9.7 on page 334.
Activity 9.9
a. This is Thought Question 9.2 on page 333 in the book.
Construct an example of interest to you personally for which the Rule for Sample Means
applies and for which a study could be done to estimate a population mean.
b. This is Thought Question 9.3 on page 335 in the book.
From the weight-loss example discussed in Example 9.7 (p. 331) and also on pages 334-335, we learned that increasing the sample size fourfold would about halve the range of
possible sample means. Would the range of individual weight losses in the sample be likely
to increase, decrease, or remain about the same if the sample size were increased fourfold?
Explain.
Activity 9.10 For this activity use the SampleMeans applet described in Section 9.11
(pages 349-351) in the book. The applet is on the companion website. Additional activities
for this applet are on page 366 in the book.
a. Use the default sample size of n = 25 to generate 500 different samples. What is the
approximate shape of the histogram of sample means? (This will be the histogram in red,
the bottom histogram in the display.) Is this the shape that would be predicted by the
normal curve approximation rule for sample means?
b. In this situation, what does the normal curve approximation rule predict that the mean
and standard deviation of possible sample means will be, for samples of size n = 25? Does
it look like the mean of the histogram is about what it should be? (Note that for the
population of individual measurements, µ = 8 and σ = 5.)
c. According to the Empirical Rule, what interval of values will contain about 99.7% of the
possible values of sample means for samples of size n = 25? Does the histogram of sample
means appear to span about this range?
d. For one simple random sample of n = 25 individuals, how likely is it that the mean
weight loss would be 4 pounds or less? Use the histogram of sample means to evaluate
this.
e. For one simple random sample of n = 25 individuals, explain whether it is likely that the
mean weight loss could be 9 pounds or more. Use the histogram of sample means to
evaluate this.
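To compare the applet's histogram with theory, the sketch below computes what the normal curve approximation predicts for means of samples of size n = 25, using µ = 8 and σ = 5 from part b. These are model predictions to check against the applet, not the applet's own output.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 5, 25  # population values from part b, default sample size
se = sigma / sqrt(n)     # standard deviation of possible sample means
means = NormalDist(mu, se)

print("Predicted mean and sd of sample means:", mu, se)
print("About 99.7% interval:", mu - 3 * se, "to", mu + 3 * se)
print(f"P(mean <= 4) ≈ {means.cdf(4):.6f}")      # part d
print(f"P(mean >= 9) ≈ {1 - means.cdf(9):.4f}")  # part e
```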
Activity 9.11 For this activity, use the TVMeans applet. This applet is essentially the same
as the SampleMeans applet described in Section 9.11 (pages 349-351) in the book, but the
population consists of responses given by college students to a question asking how many hours
they watch television in a typical week. The TVMeans applet is on the companion
website.
a. What is the shape of the histogram of the population of individual measurements (the top
histogram)?
b. Generate 2000 samples with n = 4 observations. (Use 4 batches of 500 samples without
clearing.) Observe the bottom histogram, which gives the histogram of the means for the
2000 samples. Is the histogram approximately bell-shaped? If not, is the shape what you
would expect? Explain.
c. Repeat part b using random samples of n = 16.
d. Repeat part b using random samples of n = 25.
e. Repeat part b using random samples of n = 36.
f. Repeat part b using random samples of n = 49.
g. Based on the results of this activity, about how large should the sample size n be for the
Normal Curve Approximation Rule for Sample Means to work in this situation?
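The pattern this activity looks for (sample means from a skewed population becoming more bell-shaped as n grows) can also be seen numerically. The sketch uses a HYPOTHETICAL right-skewed population in place of the TV-hours data and measures shape by skewness (0 for a symmetric histogram). Samples are drawn with replacement, which may differ from the applet.

```python
import random
import statistics


def skewness(values):
    """Population-style skewness: mean cubed standardized deviation (0 if symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(11)
# HYPOTHETICAL right-skewed population standing in for weekly TV hours.
population = [rng.expovariate(1 / 10) for _ in range(5000)]

results = {}
for n in (4, 16, 25, 36, 49):
    # 2000 samples of size n (drawn with replacement); keep each sample mean.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(2000)]
    results[n] = skewness(means)
    print(f"n = {n:2d}: skewness of the 2000 sample means = {results[n]:.2f}")
```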
Activity 9.12
a. This is Thought Question 9.4 on page 346 in the book.
Verify that if the raw data for each individual in a sample is 1 when the individual has a
certain trait and 0 otherwise, then the sample mean is equivalent to the sample proportion
with the trait. You can do this by using a formula, explaining it in words, or constructing a
numerical example.
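A numerical example for part a, with HYPOTHETICAL data for 10 individuals: the sum of 0/1 values counts the individuals with the trait, so dividing by n gives both the sample mean and the sample proportion.

```python
from statistics import mean

# HYPOTHETICAL data: 1 = has the trait, 0 = does not (10 individuals).
has_trait = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

sample_mean = mean(has_trait)
sample_proportion = has_trait.count(1) / len(has_trait)
print(sample_mean, sample_proportion)  # 0.4 0.4
```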
b. This is Thought Question 9.5 on page 347 in the book.
The Central Limit Theorem does not specify what is meant by “a sufficiently large
sample.” What factor(s) about the population of values do you think determine how large is
large enough for the approximate normal shape to hold? Consider the California Decco
example. Do you think n = 30 would be large enough for the distribution of possible values
for the average loss to be approximately normal? Why or why not? Now consider the
handspan measurements of females. Do you think n = 30 would be large enough for the
approximate normal shape to hold? What is different about these two examples?
Activity 9.13 This is Thought Question 9.6 on page 348 in the book.
Example 9.15 (p. 347) described a sample statistic, H = highest number drawn, for the
Cash 5 lottery game. Give another example of a sample statistic for the Cash 5 game, and
describe what you think the shape of its sampling distribution would be.
Activity 9.14 On the companion website, refer to Chapter 9 of the Technology Manual for
the statistical software that you use in your class. Read over the description of how to carry
out a simulation for Example 9.4 (page 326) in Section 9.4 of the book.
a. Change the sample size to 1000 voters per sample and carry out the simulation with (at
least) 400 repeated samples. Create a histogram of the sample proportions. What is the
shape of the histogram? Is it about what you would expect? Is the range of
sample proportions about what you would expect? Explain.
b. Change the sample size to 600 voters per sample and carry out the simulation with (at
least) 400 repeated samples. Create a histogram of the sample proportions. What is the
shape of the histogram? Is it about what you would expect? Is the range of
sample proportions about what you would expect? Explain.
c. Compare the range of simulated sample proportions for samples of n = 2400 (see
Example 9.4), n = 1000 (part a), and n = 600 (part b). What is indicated about the benefits
of increasing the sample size of a survey? (Remember that the “true” proportion is p = .4 in
this situation.)
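If your class software is not at hand, a similar simulation can be sketched in Python. This assumes each simulated voter independently favors the candidate with probability p = .4; the procedure in the Technology Manual may differ in detail.

```python
import random


def simulate_phats(p, n, reps, rng):
    """Sample proportions from `reps` simulated polls of n voters each."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(14)
p = 0.4
results = {}
for n in (2400, 1000, 600):
    phats = simulate_phats(p, n, 400, rng)
    results[n] = (min(phats), max(phats))
    print(f"n = {n}: sample proportions ranged from {min(phats):.3f} to {max(phats):.3f}")
```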
CHAPTER 10 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 10 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 10.1 This is Thought Question 10.1 on page 372 in the book. Each day, Maria gets
dozens of e-mail messages. She keeps track of what proportion of the messages is spam or
other junk, and what proportion is interesting. Suppose she got 50 messages yesterday, and
20 of them were interesting.
a. If the collection of messages on a single day is considered to be a sample of all e-mail
messages she ever receives, explain the meaning of each of the following definitions in the
context of this example, and give numerical values where possible: unit, population,
sample, sample size, population parameter, sample statistic (or sample estimate).
Unit =
Population =
Sample =
Sample size =
Population parameter =
Sample statistic =
b. Discuss whether you think the Fundamental Rule for Using Data for Inference (p. 372)
would allow Maria to draw conclusions about the population proportion based on the
sample proportion.
Activity 10.2
a. Describe the basic purpose of a confidence interval.
b. Give an example, not given in Chapter 10, of a situation in which a confidence interval
could be used to estimate the unknown value of a population proportion. Indicate the
population and population proportion of interest in your example.
Activity 10.3 Review Example 10.2 on page 375 in the book, and use that example for
guidance in this activity. Suppose that the sample result for a different poll on the same
issue is that 33% of a randomly selected sample of 635 American adults said they are
allergic to something.
a. For this new sample, what is the value of p̂, the sample statistic?
b. For this new sample, calculate the value of the standard error of p̂.
c. Using this new sample, calculate the 95% confidence interval that estimates p, the
proportion of all American adults who have an allergy.
d. Write a sentence that interprets the confidence interval calculated in the previous part.
Activity 10.4 This is Thought Question 10.2 on page 374 in the book.
Explain in your own words what it means to say that we have 95% confidence in the
interval estimate. Then give an example of something you do in your life that illustrates the
same concept: You follow the same procedure each time, and it either works (most of the
time) or does not work to produce the desired result. What confidence level would you
assign to the procedure in your example; that is, what percentage of the time do you think
it produces your desired result?
Activity 10.5 To begin, review Example 10.3 on page 378 in the book.
a. Using the information given in Example 10.3, calculate a 95% confidence interval that
estimates p = proportion of all Americans who think there is intelligent life on other
planets.
b. Write a sentence that interprets the confidence interval found in the previous part of this
activity.
c. In Example 10.3 of the book, what is the population parameter of interest and what is the
value of the sample statistic that estimates this parameter?
Activity 10.6 Suppose that a Gallup Poll is done using random digit dialing to reach
individuals in households with land-line telephones. The purpose is to estimate the
proportion of U.S. adults who favor stronger gun control laws. In the survey, n = 900
individuals are sampled. In this sample, 567 individuals favor stronger gun control.
a. What is the population of interest in this situation?
b. In words, describe the population parameter of interest. What mathematical symbol is
used to represent this parameter?
c. Describe the sample in this situation.
d. What is the value of the sample statistic in this situation? What mathematical symbol is
used to represent this statistic?
e. The formula for the standard error of a sample proportion is √[p̂(1 − p̂)/n]. Calculate the
standard error of the sample proportion in this situation.
f. Use the general format Sample Statistic ± 2 × Standard Error to calculate an
approximate 95% confidence interval that estimates the parameter of interest in this
situation, and write a sentence that interprets the confidence interval.
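Parts d through f can be checked with the sketch below, which applies the standard error formula and the "Sample Statistic ± 2 × Standard Error" format from this activity. The function name is hypothetical.

```python
from math import sqrt


def proportion_ci(p_hat, n, multiplier=2):
    """Standard error of a sample proportion and the interval p_hat ± multiplier × SE."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    return se, (p_hat - multiplier * se, p_hat + multiplier * se)


p_hat = 567 / 900  # sample proportion favoring stronger gun control
se, (low, high) = proportion_ci(p_hat, 900)
print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, approximate 95% CI: {low:.3f} to {high:.3f}")
```

The same function applies to Activity 10.3 with `proportion_ci(0.33, 635)`; passing `multiplier=1.96` gives the slightly narrower interval based on the more precise 95% multiplier.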
Activity 10.7 This is a modified version of Thought Question 10.3 on page 380 in the book.
Suppose the legislature in a particular state wanted to know what proportion of students
graduating from the state university last year were permanent residents of the state. The
university had information for all students showing that 3900 of the 5000 graduates were
state residents.
a. In this situation, what is the population? What is the sample?
b. Is a confidence interval appropriate for this situation? If so, compute the appropriate
interval. If not, explain why not.
Activity 10.8 To begin, read Example 10.10 on page 388 in the book.
a. In words, describe the population parameter that is estimated in Example 10.10. What
mathematical symbol(s) is used for this parameter?
b. What is the value of the sample statistic that estimates the parameter of interest in this
example? What mathematical symbol(s) is used for this statistic?
c. What is the 95% confidence interval that estimates the parameter of interest in this
situation? Write a sentence that interprets this confidence interval.
d. Explain why the 95% confidence interval in this situation makes it reasonable to
conclude that 12th grade female drivers are more likely than 12th grade male drivers to
always wear seatbelts when driving.
Activity 10.9 This is Thought Question 10.6 on page 389 in the book. An environmental
group is suing a manufacturer because chemicals dumped into a nearby river may be
harming fish. A sample of fish from upstream (no chemicals) is compared with a sample
from downstream (chemicals), and a 95% confidence interval for the difference in
proportions of healthy fish is .01 to .11 (with a higher proportion of healthy fish upstream).
First, interpret this interval. The statistician for the manufacturer produces a 99%
confidence interval ranging from –0.01 to +0.13. He tells the judge that because the
interval includes 0, and because it has higher accompanying confidence than the other
interval, we can’t conclude that there is a problem. Comment.
Activity 10.10 This is Thought Question 10.7 on page 392 in the book.
A randomly selected sample of 400 students is surveyed about whether additional coed
dorms should be created at their school. Of those surveyed, 57% say that there should be
more coed dorms. The 95% margin of error for the survey is 5%.
a. Compute a 95% confidence interval for the population percentage in favor of more coed
dorms.
b. On the basis of this confidence interval, can we conclude that more than 50% of all
students favor more coed dorms? Explain. Can we reject the possibility that the population
proportion is .60?
Activity 10.11 This is a dataset activity. Use the GSS-02 dataset, which gives data from
the 2002 General Social Survey. It’s a survey of randomly selected U.S. adults. A
description of the dataset is on the companion website. You’ll have to consult that
description to learn what variables are in this dataset.
a. Describe a population proportion that could be estimated using the GSS-02 data. (Don’t
estimate the proportion who are of a particular sex; make it more interesting than that!) Use
the data to calculate a 95% confidence interval that estimates this proportion. Write a
sentence that interprets the interval.
b. Analyze the difference between males and females with regard to the population
proportion that you considered in part a. What are the values of the sample proportions for
males and females? Calculate a 95% confidence interval for the difference in two
proportions. Interpret the result.
c. Write a short paragraph summarizing the results of parts a and b.
d. Now use the GSS-93 dataset, which gives data from the 1993 General Social Survey for
most of the same variables in the GSS-02 dataset. Examine the same variable and
corresponding proportion that you considered in part a of this activity. Compare the results
for the two different years (1993 and 2002).
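For part b, the interval for a difference in two proportions has the same "Sample statistic ± Multiplier × Standard error" form. It uses the standard error $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$. Below is a minimal sketch, assuming a 95% multiplier of 1.96; the function name is hypothetical. To use it, plug in the counts from your own GSS-02 tabulation (for example, females as group 1 and males as group 2).

```python
import math

def diff_prop_ci(x1: int, n1: int, x2: int, n2: int, multiplier: float = 1.96):
    """CI for p1 - p2 from two independent samples, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - multiplier * se, diff + multiplier * se
```

If the resulting interval lies entirely above (or below) 0, the sample gives evidence of a real difference between the two population proportions.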
Activity 10.12 Use the methods discussed in this chapter to estimate the proportion of all
cars in your area that are red. Stand near a busy street and count cars as they pass by. Count
100 cars and keep track of how many are red.
a. Using your data, compute a 95% confidence interval for the proportion of cars in your
area that are red.
b. On the basis of how you collected the data, describe any possible biases that are likely to
influence your results.
Alternative suggestion #1 for Activity 10.12: Rather than keeping track of red cars, keep
track of how many drivers out of 100 drivers are talking on a cell phone while driving.
Then carry out parts a and b of the activity.
Alternative suggestion #2 for Activity 10.12: Observe 100 pedestrians on your school
campus. Keep track of how many are talking on a cell phone. Create a confidence interval
for the proportion of pedestrians in your area who are talking on a cell phone at any given
time. Discuss any possible biases in your sampling method. For more fun yet, observe 100
male pedestrians and 100 female pedestrians. Calculate 95% confidence intervals that
estimate the proportions of male and female pedestrians talking on a cell phone and the
difference in the proportions of male and female pedestrians talking on cell phones. For
this variation of the activity, what do you think is the population represented by your
sample?
CHAPTER 11 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 11 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 11.1 Each part describes a scenario in which a sample will be used to estimate a
population value. In each scenario, (i) describe the parameter of interest (in words) and
give notation for that parameter, and (ii) give the value of the sample estimate of the
parameter and give notation for the estimate. See pages 405-407 for guidance.
a. We ask “What’s the average amount that students at our school sleep per night?” In a
survey, a random sample of students at a school is asked how much they typically sleep each
night. A summary of the observed data is:
Variable    N     Mean    StDev   SE Mean
HrsSleep   994   6.7943  1.2116   0.0384
Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):
b. We ask “How much difference is there in the mean GPAs of fraternity members and
men who aren’t in fraternities at our school?” This question was asked in the same survey described
in part a. Here are the relevant observed data:
Group               N    Mean   StDev   SE Mean
Not in fraternity  341   3.18   0.4518  0.0245
In a fraternity     87   3.11   0.4910  0.0526
Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):
c. An anthropologist asks, “For adult women, how much difference is there, on average,
between the lengths of the forearm and the foot?” He measures the forearm and foot
lengths of a sample of 467 women. Here’s a summary of the sample data.
Variable     N    Mean   StDev   SE Mean
Arm         467  24.99  2.6115   0.1208
Foot        467  24.18  2.1905   0.1014
Difference  467   0.81  2.6592   0.1231
Parameter description (in words):
Symbol for population parameter:
Value of sample estimate (with notation):
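The "SE Mean" columns in these summaries follow from $s/\sqrt{n}$. Below is a small sketch (helper names are illustrative) that reproduces them from the StDev and N columns. It also computes the unpooled standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ for the difference in GPA means in part b. In part c the data are paired, so the relevant standard error comes from the StDev of the differences, not from the arm and foot columns separately.

```python
import math

def se_mean(s: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_means(s1: float, n1: int, s2: float, n2: int) -> float:
    """Unpooled standard error of xbar1 - xbar2 for two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Part a: hours of sleep
print(round(se_mean(1.2116, 994), 4))
# Part b: GPA, not in a fraternity vs. in a fraternity (independent samples)
print(round(se_mean(0.4518, 341), 4), round(se_mean(0.4910, 87), 4))
print(round(se_diff_means(0.4518, 341, 0.4910, 87), 4))
# Part c: paired data, so use the StDev of the arm - foot differences
print(round(se_mean(2.6592, 467), 4))
```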
Activity 11.2 For each of the three situations described on pages 405-407 in the book, give
one example that is different from the ones given on pages 405-407. For each of your
examples, give an example research question and a description of the parameter of interest.
Activity 11.3 This is Thought Question 11.2 on page 411 in the book.
Notice that all of the standard error formulas in this section (Section 11.1) have the sample
size(s) in the denominator. This tells us that if the sample size is increased, the standard
error will decrease (assuming that the sample statistics remain about the same). Refer to the
rough definition of standard error on page 409, and explain why this relationship between
sample size and standard error makes sense, based on that definition.
Activity 11.4 To begin, review Example 11.5 on pages 414-415. Now, suppose that we
have a sample of n = 12 forearm lengths for college women. For this sample, the mean is
$\bar{x} = 23.1$ cm and the standard deviation is s = 1.28 cm.
a. Calculate a 95% confidence interval that estimates the population mean forearm length
for women. Use the top portion of page 415 in the book for guidance.
b. Write a sentence that interprets the confidence interval calculated in part a. For
guidance, see the last sentence of Example 11.5.
c. What is the value of the standard error of the mean for the sample in this activity? In the
context of this activity, explain what this standard error measures. (See page 409 in the
book for a helpful definition.)
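Part a uses the interval $\bar{x} \pm t^* \times s/\sqrt{n}$ with df = n − 1 = 11. The sketch below is a minimal version of that calculation. It assumes the t-table multiplier $t^* \approx 2.201$ for 95% confidence and df = 11, which is the value `scipy.stats.t.ppf(0.975, 11)` also returns. The function name is illustrative.

```python
import math

def t_interval(xbar: float, s: float, n: int, t_star: float):
    """Confidence interval xbar +/- t* x (s / sqrt(n))."""
    se = s / math.sqrt(n)
    return xbar - t_star * se, xbar + t_star * se

# Activity 11.4: n = 12, so df = 11; t* is about 2.201 for 95% confidence (from a t-table)
low, high = t_interval(23.1, 1.28, 12, t_star=2.201)
print(f"SE = {1.28 / math.sqrt(12):.3f} cm")
print(f"95% CI: {low:.2f} to {high:.2f} cm")
```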
Activity 11.5 Give an example of a situation in which we would estimate the difference in
two population means based on independent samples and an example of a situation in
which we would estimate a mean difference based on paired data. Give examples different
from any given in Section 11.1 in the book. See pages 407-409 for a discussion of paired
data and independent samples.
Activity 11.6 This is Thought Question 11.3 on page 420 in the book.
a. What population do you think is represented by the sample of 175 students in Example
11.8 (p. 420)? Do you think the Fundamental Rule for Using Data for Inference (reviewed
on p. 372 of Chapter 10) holds in this case?
b. Does the confidence interval in Example 11.8 tell us that 95% of the students watch
television between 1.842 and 2.338 hours per day? If not, what exactly does the interval
tell us?
Activity 11.7 This is Thought Question 11.4 on page 423 in the book.
For a fixed sample, explain why it is logical that a 95% confidence interval covers a wider
range of values than a 90% confidence interval. Explain this in terms of our confidence that
the procedure works in any given case.
Activity 11.8 Six individuals print as many letters of the alphabet, in alphabetical order, as they can
in fifteen seconds using their dominant hand. They then repeat the task with their
nondominant hand. The numbers of letters printed for the six individuals are as follows:
Individual   Dominant   Nondominant
1            25         13
2            39         16
3            34         13
4            27         10
5            30         17
6            43         19
a. This is paired data! Compute the difference between the numbers of letters printed using the
two hands for each individual. List the sample of differences.
b. Using either statistical software, a calculator, or “by hand” work, calculate the sample
mean and the sample standard deviation of the sample of six differences.
Mean difference =
Standard deviation of differences =
c. Calculate a 95% confidence interval that estimates the mean difference in letters printed
using the two hands. Use either statistical software, a calculator, or “by hand” work to do
the calculations.
d. Write a sentence that interprets the 95% confidence interval for the mean difference.
e. Consider the format Sample statistic ± (Multiplier × Standard error). For the
confidence interval found in part c (for the mean difference), give values for each element
of this formula.
Sample statistic =
Multiplier =
Standard error =
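As a check on parts a through e, here is a minimal sketch that works with the six differences (dominant minus nondominant). It uses the standard library's `statistics.stdev`, which divides by n − 1. It assumes the t-table multiplier $t^* \approx 2.571$ for 95% confidence with df = 5. The variable names are illustrative.

```python
import math
import statistics

dominant = [25, 39, 34, 27, 30, 43]
nondominant = [13, 16, 13, 10, 17, 19]

# Paired data: analyze the per-person differences, not the two columns separately
diffs = [d - nd for d, nd in zip(dominant, nondominant)]
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)        # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(len(diffs))     # standard error of the mean difference
t_star = 2.571                       # t-table value for 95% confidence, df = 5

print("differences:", diffs)
print(f"mean = {d_bar:.3f}, sd = {s_d:.3f}, SE = {se:.3f}")
print(f"95% CI: {d_bar - t_star * se:.2f} to {d_bar + t_star * se:.2f}")
```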
Activity 11.9 This is Thought Question 11.5 on page 430 in the book.
In Section 11.3, we learned how to find a confidence interval for the mean of paired
differences, which we used in Example 11.9 (p. 422) to estimate the mean difference in
weekly computer and TV hours for a population of liberal arts students.
a. Explain why it would not have been appropriate to use the methods in Section 11.4 for
two independent samples to estimate that mean difference, even though in either case the
sample estimate is the difference in sample means, 5.36 hours.
b. If the methods in this section had been erroneously used by treating computer usage and
TV viewing hours as independent samples, do you think the standard error of the sample
estimate would have been larger or smaller than it was in Example 11.9? Explain your
answer using common sense, not formulas.
Think about how much natural variability there would be in the data for two independent
samples, compared with measuring both sets of hours on the same individuals.
Activity 11.10 This is Thought Question 11.6 on page 435 in the book.
Part of the quote in Case Study 11.1 (p. 435) said, “For any vertex baldness (i.e., mild,
moderate, and severe combined), the age-adjusted RR was 1.4 (95% CI, 1.2 to 1.9).”
Explain what is wrong with the following interpretation of this result, and write a correct
interpretation:
Incorrect Interpretation: There is a .95 probability that the age-adjusted relative risk of a
heart attack (for men with any vertex baldness compared to men without any) is between
1.2 and 1.9.
Activity 11.11 For this activity, use the ConfidenceLevel skillbuilder applet described in
Section 11.6 (pages 436-438) of the book. The applet is on the companion website.
a. Generate one sample with the Confidence level set at 68%. Now move the slider to
increase the confidence level. What happens to the center and the width of the interval?
Explain why this happens.
b. Generate one sample at a time with the Confidence level set at 68% until you get an
interval that does not cover the true mean of 170 (the red line). Now move the slider to
increase the confidence level. (The applet uses the same sample to create confidence
intervals with the new confidence levels.) Does the interval ever cover the true mean of
170? Explain what has happened.
c. If different random samples of the same size are taken from a population and a 95%
confidence interval estimate of a population mean is created each time, which of the
following change and which stay the same?
The endpoints of the interval
The true value of the population mean
The sample mean
d. Use the Reset button and then set the confidence level to 90%. Click animate!. Stop the
process after about 100 intervals have been generated. What percentage of the intervals
included the population mean value (170)? Is this percentage about what you would
expect?
e. Based on what you have learned from these activities and your reading of Chapters
10 and 11:
(i) Explain in your own words what the confidence level for a confidence interval is.
(ii) If 100 researchers each use data from a random sample to construct a 90%
confidence interval, will exactly ninety intervals cover the true population
parameter value and ten intervals not cover the true population value? Explain.
(iii) For any given sample and confidence interval, will the researcher know whether it
has covered the truth?
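Parts d and e(ii) can also be explored by simulation, in the spirit of the applet. The sketch below is a hypothetical setup, not the applet's code. It assumes a normal population with mean 170, σ = 10, and samples of n = 25, and it treats σ as known so that z-intervals are used. Across many samples, the fraction of intervals that cover 170 comes out near the confidence level, but not exactly equal to it.

```python
import math
import random
from statistics import NormalDist, mean

def coverage_rate(true_mean=170.0, sigma=10.0, n=25, conf=0.90, reps=2000, seed=1):
    """Fraction of z-intervals (sigma treated as known) that cover true_mean."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)   # about 1.645 when conf = 0.90
    half_width = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

print("90% intervals covering 170:", coverage_rate(conf=0.90))
print("68% intervals covering 170:", coverage_rate(conf=0.68))
```

Changing `seed` gives a different set of samples and a slightly different coverage fraction. This is the same point raised in e(ii).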
Activity 11.12 This is a dataset activity. Use the Student0405 dataset, which gives data
from a survey in statistics classes at a large university. The variable StudyHrs gives the
self-reported number of hours that a student studies per week.
a. Analyze the StudyHrs variable, assuming that the sample is representative of all students
at the university. In particular, create and interpret appropriate graph(s) of the data,
calculate useful descriptive statistics, and calculate and interpret a 95% confidence interval
that estimates the mean weekly study hours for all students at the university.
b. Compare weekly study hours for females and males. Your analysis should include a
graphical comparison of males and females, appropriate descriptive statistics, and a 95%
confidence interval that estimates the difference between mean weekly study hours for
females and males at the school. Interpret the 95% confidence interval to make a
conclusion about the difference between the means of the populations of males and
females.
Activity 11.13 This is a dataset activity. Use the pulsemarch dataset, which gives pulse
rates before and after marching in place for one minute, for 40 college students. The sex of
each student is also in the dataset.
a. Analyze the Before variable, which is the resting pulse rate before marching in place. In
particular, create and interpret appropriate graph(s) of the data, calculate useful descriptive
statistics, and calculate and interpret a 95% confidence interval that estimates the mean
pulse rate for the population represented by this sample.
b. Analyze the difference between the pulse rates after and before marching. The variables
are After and Before. In particular, create and interpret appropriate graph(s) of the
differences between the two pulse rates, calculate useful descriptive statistics, and calculate
and interpret a 95% confidence interval that estimates the mean difference in pulse rate
caused by marching for the population represented by this sample.
Activity 11.14 This is a dataset activity. Use the UCDavis2 dataset, which gives data
collected in a survey of a statistics class. Among other things, students reported their
estimates of their parents’ heights. The variables are dadheight and momheight.
Treating the parents’ heights as paired data, analyze the difference between the heights of
the two parents. In particular, create and interpret appropriate graph(s) of the differences
between the parents’ heights, calculate useful descriptive statistics, and calculate and
interpret a 95% confidence interval that estimates the mean difference in the heights of
students’ parents, for the population of students represented by this sample.
Outlier alert: Be on the lookout for outliers. You might find some unusual data here. If you
omit any data, describe what you did and why.
Activity 11.15 Use a dataset of your choosing from the companion website. Use the data to
create and interpret a 95% confidence interval that estimates the difference between two
population means based on two independent samples. Describe the population parameter of
interest and the sample used for your analysis. In addition to reporting the confidence
interval, create appropriate graphs of the data and report useful descriptive statistics for the
comparison.
Activity 11.16 Collect data on a quantitative variable of interest to you. Collect at least 30
observations. Using the data, compute a 95% confidence interval for the mean of the
population from which you sampled your observations. Explain how you collected your
sample and discuss whether you think there may be any biases in the results due to your
sampling method. Interpret the 95% confidence interval to make a conclusion about the
mean of the population.
CHAPTER 12 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 12 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 12.1 Review page 454 in the book. Then, explain the difference between a one-sided or one-tailed hypothesis test and a two-sided or two-tailed hypothesis test. Give an
example of each.
Activity 12.2 This is Thought Question 12.1 on page 455 in the book.
Confidence intervals and hypothesis testing are the two major categories of statistical
inference. On the basis of this information, do you think null and alternative hypotheses are
generally statements about populations, samples, or both? Explain.
Activity 12.3 Think of a problem of interest to you which could involve testing
hypotheses. Describe the problem and state the null and alternative hypotheses of interest.
Activity 12.4 This is a modified and expanded version of Thought Question 12.2 on page
455 in the book.
a. For Example 12.4 (p. 455), explain what the null hypothesis and the alternative
hypothesis would be.
b. For Example 12.4 (p. 455), write a sentence that describes the probability question on
which the hypothesis testing was based. The sentence should be of the form: “If [fill in null
hypothesis] is true, then the likelihood that [fill in event] would have happened is [fill in
likelihood].”
Activity 12.5 This is Thought Question 12.3 on page 458 in the book.
Suppose that an ESP test is conducted by having someone guess whether each of n coin
flips will result in heads or tails. The null hypothesis is that p = .5, and the alternative is that
p > .5, where p = probability of guessing correctly. Suppose that one participant guesses
n = 10 times and gets 6 right, while another guesses n = 100 times and gets 60 right. In each
case, the percent correct was 60%. Do you think the p-value would be lower in one case
than in the other, or would it be the same in both cases? Explain.
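After you have thought about the question, you can check your intuition by computing both p-values. This sketch uses the exact binomial distribution, $P(X \ge k)$ with $X \sim \text{Binomial}(n, .5)$, rather than a normal-approximation z-test. Either approach shows the same pattern. The function name is illustrative.

```python
from math import comb

def upper_tail_pvalue(k: int, n: int, p0: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p0): the one-sided p-value for Ha: p > p0."""
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(k, n + 1))

print(f"6 of 10 correct:   p-value = {upper_tail_pvalue(6, 10):.4f}")
print(f"60 of 100 correct: p-value = {upper_tail_pvalue(60, 100):.4f}")
```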
Activity 12.6 Think of a decision between two possible actions or statements that is of
interest to you. Think of a situation where it’s not absolutely certain what the right decision
might be (for instance, whether or not to accept a request to go out on a date). Label one of
the actions or statements as the “null hypothesis” and one as the “alternative hypothesis.”
Now, discuss what the type 1 and type 2 errors are in your situation and how the potential
consequences of each type might affect the decision making process.
Activity 12.7 Use Example 12.12 on pages 467-468 for guidance, but note that the null
value is different for this activity than in Example 12.12. In a marketing survey for an
automobile manufacturer, 90 randomly selected adults are asked which car color they
would choose, if a particular model was available in both blue and red body colors.
a. Let p = population proportion that would choose “blue.” The manufacturer wants to
learn if a majority of buyers would pick blue. Keeping in mind that a majority is p>0.5,
write null and alternative hypotheses about p in this situation.
H0:
Ha:
b. In the survey, 53 of the 90 respondents said “blue.” What is the value of $\hat{p}$ = sample
proportion that picked blue?
c. “By hand,” calculate the value of the test statistic $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$.
d. On the basis of the p-value, decide between the null and alternative hypotheses. Then,
write a general conclusion about whether a majority of adults would prefer blue as their car
color for this model.
e. Draw a sketch that illustrates the relationship between the p-value in this situation and
the value of the test statistic computed in part c. Use Table 12.1 on page 465 and Figure
12.2 on page 467 for guidance.
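After doing part c by hand, you can check it with the minimal sketch below. It computes $\hat{p}$, the z statistic from part c, and the one-sided p-value for $H_a: p > 0.5$ as the normal area above z. It uses the standard library's `statistics.NormalDist`. The function name is illustrative.

```python
import math
from statistics import NormalDist

def one_prop_z(x: int, n: int, p0: float):
    """Return p_hat, z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), and the p-value for Ha: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)   # area under the standard normal curve above z
    return p_hat, z, p_value

p_hat, z, p_value = one_prop_z(53, 90, 0.5)
print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```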
Activity 12.8 (For Section 12.3): This is a dataset activity in which you’ll compare two
population proportions. Use the GSS-02 dataset, which contains data from the 2002
General Social Survey, a survey of randomly selected adults in the U.S. We’ll compare the
proportions of males and females who are opposed to capital punishment. The relevant
variables in the dataset are sex and cappun.
a. Define the population parameter for examining the difference in the proportions of males
and females opposed to capital punishment. Describe it in words and give the appropriate
notation.
b. A researcher thinks that females may be more likely to be opposed to capital
punishment than males are. Write appropriate null and alternative hypotheses for this
researcher. Write the hypotheses in words and using appropriate statistical notation.
c. Use statistical software to test the hypotheses written in part b.
What is the p-value for the test?
On the basis of the p-value, is the result statistically significant at the .05 level of
significance?
What conclusion can be made about this situation? Write your conclusion in the context of
this situation.
d. What proportion of females in the sample is opposed to capital punishment? What
proportion of males in the sample is opposed to capital punishment? What is the difference
between the sample proportions opposed to capital punishment for females versus males?
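Part c asks for software output. To see what such software computes, here is a minimal sketch of a two-proportion z-test in its commonly used pooled form, where the standard error uses the combined proportion $(x_1 + x_2)/(n_1 + n_2)$ under $H_0: p_1 - p_2 = 0$. It is written for the one-sided alternative $p_1 - p_2 > 0$, with females as group 1. The function name is hypothetical. Supply the counts from your own GSS-02 tabulation of sex by cappun.

```python
import math
from statistics import NormalDist

def two_prop_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test of H0: p1 - p2 = 0 vs. Ha: p1 - p2 > 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # common proportion if H0 is true
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_value = 1 - NormalDist().cdf(z)                   # upper-tail area
    return p1 - p2, z, p_value
```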
Activity 12.9 Construct a situation for which you can test null and alternative hypotheses
about a population proportion. Describe the population parameter of interest, the method
you would use to collect data, and what your null and alternative hypotheses are. If
feasible (and at the discretion of your instructor), collect data for this situation, use the data
to test your hypotheses, and report your result.
Activity 12.10 Construct a situation for which you can test null and alternative hypotheses
about the difference in two population proportions. Describe the parameter of interest, the
method you would use to collect data, and what your null and alternative hypotheses are.
If feasible (and at the discretion of your instructor), collect data for this situation, use the
data to test your hypotheses, and report your result.
Activity 12.11 This is Thought Question 12.4 on page 474 in the book.
Here are two questions about p-values and one-sided versus two-sided tests:
1. Under what conditions would the p-value for a one-sided z-test be greater than .5?
2. When the data are consistent with the direction of the alternative hypothesis for a one-sided test, the p-value for the corresponding two-sided test is double what it would be for
the one-sided test. Use this information to explain why it would be cheating to look at the
data before deciding whether to do a one- or two-sided test.
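The relationship in question 2 can be seen numerically. In this sketch (the function name is illustrative), the one-sided p-value for $H_a$: parameter > null value is the normal area above z. The two-sided p-value is twice the smaller tail area. For a z in the direction of the alternative, the two-sided value is exactly double the one-sided one. For a z on the opposite side, the one-sided p-value exceeds .5.

```python
from statistics import NormalDist

def p_values(z: float):
    """One-sided p-value (Ha: parameter > null value) and two-sided p-value for a z statistic."""
    upper = 1 - NormalDist().cdf(z)
    two_sided = 2 * min(upper, 1 - upper)
    return upper, two_sided

for z in (1.8, -0.5):
    one, two = p_values(z)
    print(f"z = {z:5.2f}: one-sided = {one:.4f}, two-sided = {two:.4f}")
```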
CHAPTER 13 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 13 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 13.1 (For Section 13.2) In a class survey, students in a statistics class were asked
to report their heights (in inches). The data from the n = 87 males in the class were used to
test whether the mean height for men is 70 inches (as is often reported by the media) or is
greater than 70 inches. Computer output for the test follows. (Data source: pennstate1
dataset for the book.)
Test of mu = 70 vs mu > 70
Variable   N     Mean    StDev   SE Mean     T      P
Height    87  71.5632   2.7042    0.2899  5.39  0.000
a. Write the null and alternative hypotheses using proper statistical notation.
Null:
Alternative:
b. Read the output to find the values of the t-statistic and the p-value. Then, state a
conclusion about the hypotheses and about the “real world” situation. Be careful about
what population might be represented by this sample and any possible biases due to how
the data were collected.
t=
p-value =
Conclusion:
c. Draw a sketch that shows the connection between the value of the t-statistic and the p-value in this situation. Use the table and figure(s) at the top of page 503 in the book for
guidance.
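The T value in the output comes from $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$. The short sketch below reproduces the "SE Mean" and "T" columns from the other summary values. The function name is illustrative.

```python
import math

def one_sample_t(xbar: float, s: float, n: int, mu0: float):
    """Return (standard error, t statistic) for a one-sample t-test of H0: mu = mu0."""
    se = s / math.sqrt(n)
    return se, (xbar - mu0) / se

se, t = one_sample_t(71.5632, 2.7042, 87, 70)
print(f"SE Mean = {se:.4f}, T = {t:.2f}")
```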
Activity 13.2 This is Thought Question 13.2 on page 510 in the book.
Suppose that in Example 13.2 (pp. 508-509), the purpose of the study was to determine
whether pilots should be allowed to consume alcohol the evening prior to their flights and
the alcohol consumption occurred 12 hours before the measurement of time of useful
performance. Refer to the discussion of type 1 and type 2 errors (p. 461 in Chapter 12).
Explain what the consequences of each type of error would be in this example. Which
would be more serious? Given the data and results of the study, which type of error could
have been made?
Activity 13.3 To begin, review Example 13.4 on page 512 in the book.
a. Explain why a one-sided alternative hypothesis was used for this example, rather than a
two-sided alternative hypothesis.
b. What was done in this example to verify the necessary data conditions for doing a two-sample t-test? (You might look back at Example 11.11 on pages 427-428 in the book.)
Activity 13.4 In each part,
(1) identify whether the comparison is based on two independent samples or paired data
AND
(2) write null and alternative hypotheses using proper statistical notation.
Use the notation $\mu_1 - \mu_2$ for the difference in population means when the data are from
independent samples and the notation $\mu_d$ for the mean of a paired difference.
a. Mean scores on a memory test are compared for women aged 40 to 49 years old versus
women aged 60 to 69 years old. We wish to determine if the mean is higher for 40 to 49
year olds.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …
b. Fifty students have their blood pressures measured before and after an exam. We wish to know if
there is an increase, on average.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …
c. A class survey is used to compare the mean GPAs of male and female students. We wish
to know if there is a difference.
Type of comparison is …
Null hypothesis is …
Alternative hypothesis is …
Activity 13.5 (For Section 13.3, pages 507-512 in the book): Sixty-three college men
report their actual weights and also their desired weights in a survey. A paired t-test is used
to test whether the mean difference is 0 or not. Thus the alternative hypothesis is two-sided. Computer output for the test is as follows. (Data source: idealwtmen dataset for the
book.)
Paired T for actual - ideal
             N      Mean      StDev    SE Mean
actual      63   176.095     26.202      3.301
ideal       63   173.619     21.151      2.665
Difference  63   2.47619   13.76983    1.73484
95% CI for mean difference: (-0.99170, 5.94408)
T-Test of mean difference = 0 (vs not = 0): T-Value = 1.43  P-Value = 0.159
a. Explain why it was appropriate to use a paired t-test rather than a two-sample t-test for
independent samples.
b. Write the null and alternative hypotheses using proper statistical notation. (Use Example
13.2 on pages 508-510 for guidance.)
Null:
Alternative:
c. Read the output to find the values of the t-statistic and the p-value. Then, state a
conclusion about the hypotheses and about the “real world” situation.
t=
p-value =
Conclusion:
d. Draw a sketch that shows the connection between the value of the t-statistic and the p-value in this situation. Use Figure 13.5 on page 510 in the book for guidance.
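The output's test and interval come from the same summary values. The sketch below recomputes the standard error, the t-statistic, and the 95% interval from the "Difference" row. It assumes the t-table multiplier $t^* \approx 1.999$ for df = 62. The interval includes 0, which is consistent with a two-sided p-value above .05.

```python
import math

# Summary values from the "Difference" row of the paired t output
d_bar, s_d, n = 2.47619, 13.76983, 63
se = s_d / math.sqrt(n)            # standard error of the mean difference
t = (d_bar - 0) / se               # null value of mu_d is 0
t_star = 1.999                     # t-table value for 95% confidence, df = 62
low, high = d_bar - t_star * se, d_bar + t_star * se
print(f"SE = {se:.5f}, t = {t:.2f}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
```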
Activity 13.6 Use Example 13.5 on pages 515-516 in the book as guidance for this activity.
The output below shows results for a 2-sample t-test for comparing the mean hours of
studying per week for students who say they prefer to sit in the front of classrooms and
students who prefer to sit in the back (data came from a statistics class). The alternative
hypothesis was a two-sided (not equal) hypothesis.
        N    Mean   StDev   SE Mean
Front  99    16.4   10.85     1.09
Back   94    10.9    8.41     0.87
T-Test of difference = 0 (vs not =): T-Value = 4.01  P-Value = 0.000  DF = 183
a. Write the null and alternative hypotheses being tested using appropriate statistical
notation. Explain what the parameters in your hypothesis represent. (See Step 1 on page
515 in the book.)
b. Explain why the necessary conditions for a two-sample t-test are met in this situation.
(See Step 2 on page 516.)
c. Read the output to find the values of the t-statistic and the p-value. Then, state a
conclusion about the hypotheses and the “real world” situation.
t=
p-value =
Conclusion:
d. Draw a clearly labeled sketch that shows how the p-value is related to the value of the t-statistic in this situation. (See Figure 13.6 on page 512 for guidance.)
e. The formula for the t-statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$. Give values for each of the elements in
the formula. (Use the output shown above.)
$\bar{x}_1$ =
$\bar{x}_2$ =
$s_1$ =
$s_2$ =
$n_1$ =
$n_2$ =
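Here is a minimal sketch of the unpooled t-statistic from part e, together with the Welch-Satterthwaite degrees of freedom that software typically reports for this test. The function name is illustrative. With the rounded means shown in the output, t comes out near 3.95 rather than 4.01; presumably the software used unrounded means. The computed df is about 183.7, and the output's DF = 183 appears to be that value rounded down.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2      # squared standard errors of each sample mean
    se = math.sqrt(v1 + v2)
    t = (x1 - x2) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch_t(16.4, 10.85, 99, 10.9, 8.41, 94)
print(f"t = {t:.2f}, df = {df:.1f}")
```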
Activity 13.7 This is Thought Question 13.3 on page 520 in the book.
The paired t-test introduced in Section 13.3 and the two-sample t-test introduced in
Section 13.4 are both used to compare two sets of measurements. The null hypothesis in
both cases is usually that the mean population difference is 0. Explain the difference in the
situations for which they are used. Suppose researchers wanted to know if college students
spend more time watching TV or exercising. Explain how they could collect data
appropriate for a paired t-test and how they could collect data appropriate for a two-sample t-test.
Activity 13.8 This is a dataset activity for section 13.4, Lesson 1. You’ll use an unpooled
two-sample t-test. Use the GSS-02 dataset, which contains data from the 2002 General
Social Survey, a survey of randomly selected adults in the U.S. We’ll consider the
difference between the mean hours of self-reported television watching per day for females
versus males. The relevant variables in the dataset are sex and tvhours.
a. Define the population parameter of interest in this situation. Describe it in words and
give the appropriate notation.
b. Write appropriate null and alternative hypotheses for this situation. Explain why you
chose the alternative hypothesis that you did.
c. Use statistical software to test the hypotheses written in part b.
What is the p-value for the test? ________
Is the result statistically significant at the .05 level of significance? ____________
What conclusion can be made about this situation? Write your conclusion in the context of
this situation.
d. What is the mean television watching time for females in the sample? What is the mean
television watching time for males in the sample? What is the difference between the
sample mean television watching times per day for females versus males?
e. Write a short summary of this activity that could be understood by somebody with
minimal training in statistics.
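Part d asks for the two sample means and their difference. Statistical software computes these directly; the Python sketch below shows the same computation on a handful of hypothetical records. Only the variable names `sex` and `tvhours` come from the activity. The values, the use of `None` for a missing answer, and the helper `group_means` are assumptions for illustration.

```python
from statistics import mean

# Hypothetical records standing in for GSS-02 rows; None marks a missing response
rows = [
    {"sex": "female", "tvhours": 3}, {"sex": "male", "tvhours": 2},
    {"sex": "female", "tvhours": 4}, {"sex": "male", "tvhours": None},
    {"sex": "female", "tvhours": 2}, {"sex": "male", "tvhours": 3},
]


def group_means(rows, group_var, value_var):
    """Mean of value_var within each level of group_var, skipping missing values."""
    groups = {}
    for row in rows:
        value = row[value_var]
        if value is not None:
            groups.setdefault(row[group_var], []).append(value)
    return {level: mean(values) for level, values in groups.items()}


means = group_means(rows, "sex", "tvhours")
diff = means["female"] - means["male"]   # female mean minus male mean
print(means, diff)
```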
Activity 13.9 (For Section 13.6):
In each part of this activity, a research question is briefly described. For each research
question:
1. Specify the population parameter(s) of interest. Give the symbol and describe it
in words.
2. Explain whether the primary method for analyzing the data should be a
confidence interval or a hypothesis test. If a hypothesis test should be done, write
the null hypothesis.
The following table gives the notation for the “big five” parameters that we covered in
Chapters 9-13.

| Value of interest | Population parameter |
|---|---|
| 1. One proportion | $p$ |
| 2. One mean | $\mu$ |
| 3. Difference in two proportions (independent groups) | $p_1 - p_2$ |
| 4. Difference in two means (independent groups) | $\mu_1 - \mu_2$ |
| 5. Mean difference for paired data | $\mu_d$ |
a. Research question: What proportion of adults in the United States is in favor of the death
penalty for persons convicted of murder?
b. Research question: Is the mean systolic blood pressure of women who use oral
contraceptives greater than the mean systolic blood pressure of women who do not use oral
contraceptives?
c. Research question: Is mean normal human body temperature less than 98.6°F?
d. Research question: On average, how much difference is there between the adult heights
of a father and his son?
e. Research question: Is the proportion of men who experience sleep apnea (irregular
breathing during sleep) higher than the proportion of women who do?
f. Research question: What is the mean weight of six-year-old children?
g. Two weight-loss diets are compared. Sixty participants are randomly divided into
two groups, and each group uses a different diet.
Research question: Is there a difference between the mean weight losses for the two
programs?
h. Research question: How much difference is there in the proportions of patients
successfully treated for two different treatments of a medical condition?
i. In a survey done by a car manufacturer, people are asked which color they would pick
for a new car if the car were available in silver, blue, and green.
Research question: Will more than 1/3 of all people pick silver?
Activity 13.10 Read Section 13.8 (pages 530-532) about evaluating significance in
research reports.
a. Is the problem discussed in item 6 on page 532 equivalent to saying that one in 20
statistically significant results is erroneous and just due to chance? Explain.
b. Would at least 20 hypotheses have to be tested in a study in order for the issue raised in
item 6 on page 532 to be a problem? Explain.
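For parts a and b, it can help to see how quickly the chance of at least one false alarm grows with the number of tests. The sketch below assumes that the tests are independent and that every null hypothesis is true. Real studies rarely satisfy these conditions exactly, so treat the numbers as a rough guide.

```python
def prob_at_least_one_false_positive(k, alpha=0.05):
    """Chance of at least one significant result among k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(prob_at_least_one_false_positive(k), 3))
```

Under these assumptions, even 5 tests give roughly a 23% chance of at least one significant result arising by chance alone.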
c. Find an example of a study in the news or on the Internet in which it is clear that
multiple hypotheses were tested. Comment on whether the news article mentioned that as a
problem. Discuss the extent to which you think this issue affected the conclusions made in
the news story.
d. To begin, see items 2 and 5 on page 532 in the book. Find an example of a study in the
news or on the Internet in which a result is described as being “significant.” Discuss
whether you think the word “significant” is used in the everyday sense or in the statistical
sense only. Discuss whether information is given in the article about the magnitude of the
“significant” difference or relationship.
e. See item 3 on page 532 in the book. Find an example of a study in the news or on the
Internet for which you suspect that the small sample size issue described in item 3 may be
a problem.
Activity 13.11 This is Thought Question 13.4 on page 522 in the book.
Refer to Example 13.9 (p. 521). A 95% confidence interval for the difference in
proportions who would get ear infections with placebo compared to with Xylitol was 0.02
to 0.226. On the basis of this information, specify a one-sided 97.5% confidence interval
and explain how you would use it to test $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$ with
$\alpha = .025$.
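The link between the two intervals can be written out as follows. This assumes the interval in Example 13.9 was computed as estimate ± 1.96 SE (the normal-approximation form), with $p_1$ the placebo proportion and $p_2$ the Xylitol proportion. Because $P(Z \le 1.96) = 0.975$, keeping only the lower endpoint of the two-sided 95% interval gives a statement held with 97.5% confidence:

```latex
(\hat{p}_1 - \hat{p}_2) \pm 1.96\,\mathrm{SE} = (0.02,\ 0.226)
\;\Longrightarrow\;
(\hat{p}_1 - \hat{p}_2) - 1.96\,\mathrm{SE} = 0.02
\;\Longrightarrow\;
\text{one-sided 97.5\% interval: } p_1 - p_2 > 0.02
```

The test decision then depends only on whether 0 falls inside this one-sided interval.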
CHAPTER 14 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 14 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 14.1 This is Thought Question 14.1 on page 556 in the book.
Regression equations can be used to predict the value of a response variable for an
individual. What is the connection between the accuracy of predictions based on a
particular regression line and the value of the standard deviation from the line? If you were
deciding between two different regression models for predicting the same response
variable, how would your decision be affected by the relative values of the standard
deviations for the two models?
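A small numerical illustration may help. The Python sketch below fits a least-squares line and computes the standard deviation from the line, $s = \sqrt{\mathrm{SSE}/(n-2)}$, for two hypothetical predictors of the same response. The data and function names are made up for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def sd_from_line(x, y):
    """s = sqrt(SSE / (n - 2)), the typical size of a prediction error."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


# Hypothetical data: one response y and two candidate predictors
y = [2, 4, 5, 4, 5]
x_a = [1, 2, 3, 4, 5]
x_b = [1, 3, 5, 4, 6]
print(round(sd_from_line(x_a, y), 3), round(sd_from_line(x_b, y), 3))
```

For these made-up numbers, predictor `x_b` has the smaller s, so its predictions would typically miss by less.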
Activity 14.2 This is Thought Question 14.2 on page 557 in the book.
Look at the formula for SSE, and explain in words under what condition SSE= 0. Now
explain what happens to r 2 when SSE = 0, and explain whether that makes sense
according to the definition of r 2 as “proportion of variation in y explained by x.”
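To experiment with this question, the sketch below computes $r^2$ as $(\mathrm{SSTO} - \mathrm{SSE})/\mathrm{SSTO}$, where SSTO is the total sum of squares $\sum (y_i - \bar{y})^2$. Both data sets are hypothetical: the first lies exactly on a line, and the second does not.

```python
def r_squared(x, y):
    """r^2 = (SSTO - SSE) / SSTO for the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ssto = sum((yi - ybar) ** 2 for yi in y)                         # total variation
    return (ssto - sse) / ssto


print(r_squared([0, 1, 2, 3], [1, 3, 5, 7]))    # points exactly on y = 1 + 2x
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```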
Activity 14.3 This is Thought Question 14.3 on page 560 in the book.
In previous chapters, we learned that a confidence interval can be used to determine
whether a hypothesized value for a parameter can be rejected. How would you use a
confidence interval for the population slope to determine whether there is a statistically
significant relationship between x and y? For example, why is the interval that we just
computed for the sign-reading example (Example 14.7) evidence that sign-reading distance
and age are related?
Activity 14.4 Read pages 560-561 in the book. Summarize the connection between testing
whether a population correlation value is 0 or not and testing whether the slope of a
population regression line is 0 or not.
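The connection can be checked numerically. In simple linear regression, the t-statistic for testing a zero slope is $b_1/\mathrm{SE}(b_1)$, with $\mathrm{SE}(b_1) = s/\sqrt{\sum (x_i - \bar{x})^2}$. This statistic equals $r\sqrt{n-2}/\sqrt{1-r^2}$, the statistic for testing a zero correlation. The Python sketch below computes both on hypothetical data.

```python
import math

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# t-statistic for H0: population slope = 0
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
t_slope = b1 / (s / math.sqrt(sxx))

# t-statistic for H0: population correlation = 0
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_slope, 4), round(t_corr, 4))
```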
Activity 14.5 Use the GSS-02 dataset, which gives data from the 2002 General Social
Survey. Analyze the relationship between the variables emailtime = hours spent per week
using email and age = respondent’s age. Draw a scatterplot, determine the correlation
value, determine the linear regression equation for the sample using emailtime as the y-variable, and assess the statistical significance of the observed relationship. What does this
activity indicate about the effect of sample size on the significance of an observed
relationship? Explain. (See page 561 in the book for guidance.)
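One way to see the role of sample size is to hold the sample correlation fixed and let $n$ grow. The sketch below uses $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with a hypothetical $r = 0.1$. As a rough benchmark, it assumes that $|t|$ above about 2 corresponds to significance at the .05 level in large samples.

```python
import math


def t_from_r(r, n):
    """t-statistic for testing H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# The same weak (hypothetical) correlation at increasing sample sizes
for n in (20, 100, 1000):
    print(n, round(t_from_r(0.1, n), 2))
```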
Activity 14.6 This is Thought Question 14.5 on page 566 in the book.
Draw a picture similar to the one in Figure 14.3 (p. 553), illustrating the regression line and
the normal curves for the y values at several values of x. Use it to illustrate the difference
between a prediction interval for y and a confidence interval for the mean of the y’s at a
specific value of x.
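The difference between the two intervals shows up in their standard errors. The usual simple-regression formulas are $\mathrm{SE(fit)} = s\sqrt{1/n + (x^* - \bar{x})^2/\sum (x_i - \bar{x})^2}$ for the mean response and $\sqrt{s^2 + \mathrm{SE(fit)}^2}$ for an individual prediction. Both intervals then have the form $\hat{y} \pm t^*$ times the relevant standard error. The Python sketch below evaluates both standard errors for hypothetical data; the function name is made up.

```python
import math


def interval_standard_errors(x, y, x_star):
    """Return (predicted y, SE for the mean response, SE for one new y) at x_star."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_fit = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)
    # An individual y also varies around the mean response, so s^2 is added
    se_pred = math.sqrt(s ** 2 + se_fit ** 2)
    return b0 + b1 * x_star, se_fit, se_pred


y_hat, se_fit, se_pred = interval_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_star=3)
print(round(y_hat, 2), round(se_fit, 3), round(se_pred, 3))
```

As the sample grows, `se_fit` shrinks toward 0, but `se_pred` never drops below s. That is why prediction intervals stay wide.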
Activity 14.7 This is Thought Question 14.6 on page 567 in the book.
A residual is the difference between an observed value of y and the predicted value of y for
that observation. Based on the size of a residual for an observation, how would you decide
whether an observation was an outlier? Is it enough to know the value of the residual, or do
you need to know other information to make this judgment? How could you apply the
methods for detecting outliers described in Chapter 2?
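One common approach is to judge a residual relative to $s$, the standard deviation from the line, and to flag observations whose residual is more than about 3 standard deviations from 0. This cutoff is offered here as an assumption, not as the book's rule. The sketch below uses a hypothetical fitted line; the values of $b_0$, $b_1$, and $s$ are made up.

```python
def flag_outliers(x, y, b0, b1, s, cutoff=3.0):
    """Return (x, y, residual / s) for points whose residual is more than cutoff * s from 0."""
    flagged = []
    for xi, yi in zip(x, y):
        residual = yi - (b0 + b1 * xi)          # observed y minus predicted y
        if abs(residual) / s > cutoff:
            flagged.append((xi, yi, residual / s))
    return flagged


# Hypothetical fitted line y-hat = 10 + 2x with s = 1.5
x = [1, 2, 3, 4]
y = [12.5, 13.0, 22.0, 18.4]
print(flag_outliers(x, y, b0=10, b1=2, s=1.5))
```

A residual of 6 would be unremarkable if s were 10. This is why the residual alone is not enough.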
CHAPTER 15 ACTIVITIES
INSTRUCTORS: Two group/team projects for Chapter 15 suitable for in-class work are
given in the Course Support (Class Projects) section of the companion website.
Activity 15.1 Find a survey question that has been asked at two different time periods or
by two different sources. For instance, many polling organizations ask opinions about
certain issues on an annual or other regular basis.
[Suggestion: You might be able to use the GSS-93 and GSS-02 datasets on the companion
website for the book.]
a. Create a contingency table where “time period” is one categorical variable and “response
to poll question” is the other.
b. State the null and alternative hypotheses for comparing responses across the two time
periods.
c. Carry out a chi-square test to see if opinions have changed over the two time periods.
Write a brief summary of your findings. Be clear about the conclusion and to what
population the conclusion applies. Also in your summary, indicate where you located the
data and when the survey questions were asked.
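Software will do the calculation in part c, but the statistic itself is short to compute. The sketch below finds each expected count as (row total)(column total)/(grand total) and then sums (observed - expected)^2/expected over all cells. The 2 x 2 table of counts and the function name are hypothetical.

```python
def chi_square_two_way(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = two time periods, columns = two poll responses
table = [[60, 40],
         [45, 55]]
chi2, df = chi_square_two_way(table)
print(round(chi2, 3), df)
```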
Activity 15.2 Carefully collect data cross-classified by two categorical variables for which
you are interested in determining whether there is a relationship. Collect the data yourself
(or with a group). Be sure to get a large enough sample so that there are at least five in each
cell.
a. Create a contingency table for the data and calculate appropriate descriptive percentages
for looking at the possible relationship.
b. Use a chi-square test to determine whether there is a statistically significant relationship
in the observed data.
c. Discuss the role of sample size in making the determination in part b.
d. Write a summary of your findings.
Activity 15.3 This is Thought Question 15.1 on page 590 in the book.
Consider Example 15.2 (p. 585) about gender and the question about with whom it’s
easiest to make friends.
a. What are the degrees of freedom for the chi-square statistic for these data?
b. To be statistically significant at the .05 level, how large would the calculated chi-square
have to be?
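A table with r rows and c columns has (r - 1)(c - 1) degrees of freedom. The sketch below pairs that rule with a few standard upper .05 critical values of the chi-square distribution. Those values are typed in from standard chi-square tables rather than computed. The one exception is df = 2, where the upper-tail area is exactly $e^{-x/2}$, which gives a quick check. The 2 x 3 table size is hypothetical.

```python
import math

# Upper .05 critical values of the chi-square distribution (standard table values)
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for the chi-square test of a two-way table."""
    return (n_rows - 1) * (n_cols - 1)


df = chi_square_df(2, 3)            # a hypothetical 2 x 3 table
print(df, CHI2_CRITICAL_05[df])

# Check for df = 2: P(chi-square > x) = exp(-x / 2), so the .05 cutoff is -2 ln(.05)
print(round(-2 * math.log(0.05), 3))
```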
Activity 15.4 Refer to Example 15.10 on pages 594-595 in the book.
a. Verify that the value of the chi-square statistic for the data in Table 15.5 on page 594 is
(about) 13.66. Use either statistical software or “by hand” calculations.
b. What are the degrees of freedom for the chi-square test done in Example 15.10?
c. What is the “real world” conclusion in Example 15.10?
Activity 15.5 Read the original Journal Article on the companion website for Chapter 11—
Example 11.12: “Development and Initial Validation of the Hangover Symptoms Scale:
Prevalence and Correlates of Hangover Symptoms in College Students.” The authors give
the results of some chi-square tests on pages 1445-1446 of the article. For any two of those
chi-square tests, describe the variables involved, the null and alternative hypotheses, and
the conclusion about the variables.
Activity 15.6 This is Thought Question 15.2 on page 594 in the book.
Suppose that you read that men are more likely to be left-handed than women are. To
investigate this claim, you survey your class and find that 11 of 84 men and 7 of 78 women
are left-handed. Should you compare the men and women using a z-test or a chi-square
test? Or does it matter?
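For a 2 x 2 table, the two approaches are closely linked. The pooled two-proportion z-statistic, squared, equals the chi-square statistic computed without a continuity correction, so the two-sided p-values agree. The Python sketch below checks this with the counts from the question; the function names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic for H0: p1 = p2."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic for the table of (yes, no) counts in two groups."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for row, row_total in zip(table, (n1, n2)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# 11 of 84 men and 7 of 78 women are left-handed
z = two_proportion_z(11, 84, 7, 78)
chi2 = chi_square_2x2(11, 84, 7, 78)
print(round(z, 3), round(z ** 2, 3), round(chi2, 3))
```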
Activity 15.7 This is Thought Question 15.4 on page 601 in the book.
Remember that the “degrees of freedom” for the chi-square test for a two-way table
represents the largest number of cells for which you were “free” to find expected counts.
The remaining expected counts were determined because the row and column totals had to
be the same as they were for the observed counts. Explain how the same principle applies
in specifying the degrees of freedom for a chi-square goodness-of-fit test, which are k – 1
when there are k categories.
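The same counting idea appears in a goodness-of-fit calculation. The expected counts n times p_i must add up to n, so once k - 1 of them are known, the last one is fixed. The sketch below makes this explicit for hypothetical counts in k = 3 categories under a null hypothesis of equal probabilities. It assumes the null probabilities sum to 1.

```python
def goodness_of_fit(observed, null_probs):
    """Chi-square goodness-of-fit statistic and its degrees of freedom (k - 1)."""
    n = sum(observed)
    # Only k - 1 expected counts are free; the last is whatever makes the total equal n
    free = [n * p for p in null_probs[:-1]]
    expected = free + [n - sum(free)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


chi2, df = goodness_of_fit([45, 30, 25], [1 / 3, 1 / 3, 1 / 3])
print(round(chi2, 2), df)
```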
CHAPTER 16 ACTIVITIES
INSTRUCTORS: One group/team project for Chapter 16 suitable for in-class work is
given in the Course Support (Class Projects) section of the companion website.
Activity 16.1 This is Thought Question 16.1 on page 621 in the book.
To what populations do the conclusions of Example 16.1 on pages 618-619 apply? Do you
think it matters that the data were collected at a single university? Does it matter that the
surveys were done only in statistics classes?
Activity 16.2 This is a dataset activity. Use the GSS-02 dataset, which gives data from the
2002 General Social Survey, a nationwide sample of randomly selected adults. The
variable degree gives the highest educational degree achieved (five categories) and the
variable emailtime gives self-reported hours per week spent doing email. Use one-way
analysis of variance to examine differences in mean weekly email time for the five groups.
Write a summary of your findings. Include appropriate graphical analysis, descriptive
statistics, the findings of the analysis of variance, and a discussion of the ways in which the
educational degree groups differ.
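Software reports the F-statistic directly. The sketch below shows how the statistic is built as (mean square for groups) / (mean square error), using the between-group and within-group sums of squares. The three groups of values are hypothetical (the activity itself has five degree groups), and the function name is made up.

```python
def one_way_anova_f(groups):
    """Return the F-statistic and its (numerator, denominator) degrees of freedom."""
    values = [v for g in groups for v in g]
    N, k = len(values), len(groups)
    grand_mean = sum(values) / N
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_stat = (ss_groups / (k - 1)) / (ss_error / (N - k))
    return f_stat, (k - 1, N - k)


# Hypothetical weekly email hours for three groups
f_stat, dfs = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), dfs)
```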
Activity 16.3 Read the original Journal Article for Chapter 6 – Case Study 6.2: “The
Effects of Different Resistance Training Protocols on Muscular Strength and Endurance
Development in Children.” This study was discussed in Case Study 6.2 on page 195 in the
book; the purpose was to investigate the effects of different weight lifting programs for
children.
a. How many children participated? What were the three different “treatment” groups in
the study? How were children assigned to the groups?
b. Examine Table 1 on the second page (the page numbered “2 of 7” at the lower left). This
table presents a comparison of the characteristics of the children in the three treatment
groups at the beginning of the study. ANOVA results are presented for three different
variables. For each variable, describe the null hypothesis in words and give the conclusion.
(Note: weight = child’s body weight). Given the nature of this study, is there any important
way that the three groups differed at the beginning of the study? Explain.
c. Table 3 in the article (on the page numbered “4 of 7” at the lower left) gives results for
muscle endurance at the end of the study. The response is the number of times a child can
now lift (or press) the maximum weight that they could lift (or press) one time at the
beginning of the study. There are two variables – one for a chest press task and one for a
leg extension task. ANOVA results are not directly given in Table 3, but ANOVAs were
carried out by the authors. What would be the null hypotheses for these ANOVAs? On the
basis of statements written about Table 3 and the values in Table 3, what do you think were
the results of these ANOVAs?
d. Continue to examine the information in Table 3 of the article. Which of the
necessary conditions for using the F-statistic to compare means is violated by the observed
data? Explain. What condition are we not able to evaluate from the information given in the article?
(See page 620 in the textbook for a summary of the necessary conditions.)
Activity 16.4 This is Thought Question 16.2 on page 630 in the book.
In Example 16.9 (p. 629), each 95% confidence interval had the same width. Why did this
happen? When would the 95% confidence intervals have different widths?
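Assume the intervals in Example 16.9 use the pooled standard deviation $s_p = \sqrt{\mathrm{MSE}}$. Each interval then has the form $\bar{x}_i \pm t^* s_p/\sqrt{n_i}$, where $t^*$ is the same for every group. The sketch below computes the half-widths. The MSE, the sample sizes, and the multiplier `t_star = 2.0` are hypothetical placeholders rather than a looked-up t value.

```python
import math


def ci_half_widths(sample_sizes, mse, t_star):
    """Half-width t* x s_p / sqrt(n_i) of each group's confidence interval for its mean."""
    s_p = math.sqrt(mse)          # pooled standard deviation, shared by all groups
    return [t_star * s_p / math.sqrt(n) for n in sample_sizes]


print(ci_half_widths([10, 10, 10], mse=4.0, t_star=2.0))
print(ci_half_widths([5, 10, 20], mse=4.0, t_star=2.0))
```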
Activity 16.5 This is Thought Question 16.4 on page 637 in the book.
In Example 16.14 (p. 636), there was only one server of each sex. What problem does this
cause in the interpretation and generalization of the results? How would you have designed
the experiment to better examine the interaction between sex of the server and drawing a
happy face (or not)?
Activity 16.6 (For Section 16.4): In a statistics class, students were randomly assigned
either to run in place or not. After the running (or not), all students took their pulse rates.
Mean pulse rates for combinations of gender and whether the student ran or not are given
in the following table. (Data Source: The Pulse dataset bundled with the Minitab computer
program, Minitab, Inc.)
| | Did not run | Ran |
|---|---|---|
| Female | 74.8 (n = 24) | 112.8 (n = 11) |
| Male | 70.6 (n = 33) | 83.2 (n = 24) |
a. For each sex separately, determine the difference between mean pulse rates for those
who ran in place versus those who did not.
Difference for females = _________ Difference for males = ___________
b. Explain why the results for part a give evidence of an interaction between the gender and
running in place (or not) variables.
c. Graph the means using the same format used for Figure 16.13 (p. 636) and Figure 16.14
(p. 637) in the book. Put mean pulse rate on the vertical axis and use the horizontal axis to
indicate whether students either did not or did run. Then, show two lines on the graph –
one connecting the two means for females and one connecting the two means for males.
Briefly discuss what the graph shows about how gender, running (or not), and the
combination of the two variables affect pulse rate.
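The differences in part a, and the "difference of differences" that signals an interaction, can be organized as below. The means are the ones in the table above. The dictionary layout is just one convenient choice.

```python
# Mean pulse rates from the table, keyed by (sex, running condition)
means = {
    ("Female", "Did not run"): 74.8, ("Female", "Ran"): 112.8,
    ("Male", "Did not run"): 70.6, ("Male", "Ran"): 83.2,
}

diff_female = means[("Female", "Ran")] - means[("Female", "Did not run")]
diff_male = means[("Male", "Ran")] - means[("Male", "Did not run")]

# With no interaction, these two differences would be roughly equal
interaction_contrast = diff_female - diff_male
print(round(diff_female, 1), round(diff_male, 1), round(interaction_contrast, 1))
```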
Activity 16.7 Design an experiment or survey for which the response variable is a
quantitative variable of interest to you and the purpose is to compare (at least) three
different groups or treatment conditions. Collect the data and analyze the results. Write a
summary in which you discuss the purpose of your study, your data collection method, and
your data analysis. Present a conclusion and indicate what population was represented by
your sample.