Probability and Statistics Teacher’s Edition - Assessment
CK-12 Foundation
March 15, 2010
CK-12 Foundation is a non-profit organization with a mission to reduce the cost of textbook
materials for the K-12 market both in the U.S. and worldwide. Using an open-content, web-based collaborative model termed the “FlexBook,” CK-12 intends to pioneer the generation
and distribution of high-quality educational content that will serve both as core text and
as an adaptive environment for learning.
Copyright ©2009 CK-12 Foundation
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States
License. To view a copy of this license, visit http://creativecommons.org/licenses/
by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San
Francisco, California, 94105, USA.
Contents
1 Probability and Statistics TE - Assessment
1.1 An Introduction to Analyzing Statistical Data
1.2 Visualization of Data
1.3 Introduction to Probability
1.4 Discrete Probability Distributions
1.5 Normal Distribution
1.6 Planning and Conducting an Experiment or Study
1.7 Sampling Distributions and Estimations
1.8 Hypothesis Testing
1.9 Regressions and Correlation Quizzes
1.10 Chi-Square
1.11 Analysis of Variance and the F-Distribution
www.ck12.org
Chapter 1
Probability and Statistics TE - Assessment
Proposed Pacing Guide for AP Probability and Statistics
Understanding and Describing Data
Chapters 1 and 2 - 2-3 weeks
Scatterplots, Correlation, Association and Linear Regression
Chapter 9 - 4 weeks
Probability
Chapters 3, 4, 5 - 4 weeks
Gathering Data and Experimental Design
Chapter 6 - 2-3 weeks
Sampling Distributions
Chapter 7 - 5 weeks
Hypothesis Testing & Chi-Square
Chapters 8, 10 - 4 weeks
All of the topics above are included in the AP syllabus and appear on the AP Exam.
ANOVA and nonparametrics are not in the AP syllabus, so I recommend that
these topics be studied after the exam in the spring.
ANOVA
Chapter 11 - 2 weeks
Nonparametrics
Chapter 12 - 2 weeks
1.1 An Introduction to Analyzing Statistical Data
Definition of Statistical Terminology
Quiz 1
For each description of data, identify the variables, classify each variable as categorical or
quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.
1. Researchers caught and measured 27 lions, recording their weight, neck size, length and
sex.
2. A restaurant posts, for each of the sandwiches it sells, the type of meat, the number of
calories and the serving size in ounces.
Complete the following:
3. In statistics, the total group being studied is called ________
4. A small, representative subset of the population is called a ________
Quiz 2
For each description of data, identify the variables, classify each variable as categorical or
quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.
1. A large hospital in New York keeps data on the babies born there. It records the
mother’s age, the number of weeks the pregnancy lasted, and the birth weight and gender of the baby.
2. A survey of cars in a large parking lot recorded the make of each car, the country of
origin, the type of vehicle and the age of the car.
Complete the following:
3. A small, representative subset of the population is called a ________________
4. A value of a population variable is called a _________________
Quiz 3
For each description of data, identify the variables, classify each variable as categorical or
quantitative, and, if the variable is quantitative, state whether it is discrete or continuous.
1. A telephone poll of voters recorded each voter’s region of the country, age and party
affiliation.
2. A survey of students at a large university, motivated by environmental concerns, recorded
how far each student lived from campus, the mode of transportation used to get to campus
(car, bus, bike, walk, etc.), whether or not the student owned a car, and the year in school
(freshman, sophomore, etc.).
Complete the following:
3. An estimate of a parameter computed from a sample is called a _________________
4. Whenever a sample is used instead of the entire population, results are merely estimates
and have some chance of being incorrect. This is called _______________
An Overview of Data
Quiz 1
1. Arrange in order from highest to lowest the four levels of measurement.
2. To a physicist, the colors red, orange, yellow, green, blue and violet correspond to specific
wavelengths of light and thus are an example of which level of measurement?
Indicate whether the following describes an experiment or an observational study:
3. Researchers are investigating the effect of two drugs designed to help people quit smoking.
They found that 40 people out of 100 who decided to use drug A at the beginning of 2001
were no longer smoking at the end of 2001. Only 18 people out of 125 who chose to use drug
B at the beginning of 2001 had quit smoking by the end of 2001.
4. True or False: An observational study is a way to establish a cause and effect relationship.
Quiz 2
1. Arrange in order from lowest to highest the four levels of measurement.
2. To an electronics student familiar with color-coded resistors, colors are in an ascending
order and thus represent at least what level of measurement?
3. Indicate whether the following describes an experiment or an observational study.
A company would like to know the baking time and oven temperature that will produce the
best bread. They consider 4 oven temperatures (300, 325, 350 and 375 degrees) and three
baking times (40, 50, 60 minutes). Three breads are cooked at each time/temp combination.
4. True or False? In an experiment the researcher observes subjects in the real world without
manipulating them.
Quiz 3
1. Describe the difference between ratio and interval levels of measurement.
2. To a 3-year-old child, black, brown, red, orange and yellow are just names of colors and
thus represent what level of measurement?
3. Indicate if the following describes an experiment or an observational study:
It is believed that students who study a musical instrument have higher GPAs than students who do not.
Of the music students, 18% had all A’s, compared with only 7% among the students who
did not study a musical instrument.
4. True or False: Cause-and-effect relationships can be established through an experiment.
Measures of Center
Quiz 1
1. The annual salaries of ten office workers are $23,000, $38,000, $46,000, $23,000, $24,000,
$23,000, $23,000, $38,000, $23,000 and $32,000.
a. Find the mean, median and modal salaries.
b. Explain why the mode is an unsatisfactory measure of the middle in this case.
2. Find x if 5, 9, 11, 12, 13, 14, 17 and x have a mean of 12.
3. How many data points must be removed from each end of a sample of 400 values in order
to calculate a 10% trimmed mean?
4. Sarah took 8 tests. Her scores for seven of these were 29, 36, 32, 38, 35, 34, and 39 (each
out of 40). What was her score on the eighth test if her average for all eight tests was 35?
5. Create a data set that fits this description:
The median age of Sarah and her six siblings is 14. The range of their ages is 12 years and
the mode is 10.
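The arithmetic in Problems 1 to 4 can be checked with a short script. This is a minimal sketch, not part of the original CK-12 materials. It assumes Python 3 and uses only the standard `statistics` module. The helper `missing_value` is a hypothetical name. It relies on the fact that the sum of n values equals n times their mean.

```python
import statistics

# Problem 1: annual salaries of the ten office workers (in dollars).
salaries = [23000, 38000, 46000, 23000, 24000,
            23000, 23000, 38000, 23000, 32000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)   # average of the 5th and 6th sorted values
mode_salary = statistics.mode(salaries)


def missing_value(known, target_mean):
    """Return the single unknown value that gives the whole set the target mean.

    The n values (the known ones plus the missing one) must sum to
    n * target_mean, so the missing value is that total minus the known sum.
    """
    n = len(known) + 1
    return n * target_mean - sum(known)


# Problem 2: 5, 9, 11, 12, 13, 14, 17 and x have a mean of 12.
x = missing_value([5, 9, 11, 12, 13, 14, 17], 12)

# Problem 3: a 10% trimmed mean of 400 values drops 10% of them from each end.
trim_count = 400 * 10 // 100

# Problem 4: Sarah's eighth test score when her eight-test average is 35.
eighth = missing_value([29, 36, 32, 38, 35, 34, 39], 35)

print(mean_salary, median_salary, mode_salary)  # 29300 23500.0 23000
print(x, trim_count, eighth)                     # 15 40 37
```

The same `missing_value` idea answers the "find a" and "find the next score" questions in Quizzes 2 and 3.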
Quiz 2
1. The following raw data is the daily rainfall (to the nearest millimeter) for a month in the
desert.
3, 1, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 7, 1, 1, 0, 3, 8, 0, 0, 0, 42, 21, 3, 0, 3, 1, 0, 0
a. Find the mean, median and mode for the data.
b. Give a reason why the mode is not the most suitable measure of center for this data.
2. Find a, given that 3, 0, a, a, 6, a, 4, a and 3 have a mean of 4.
3. How many data points must be removed from each end of a sample of 425 values in order
to calculate a 15% trimmed mean?
4. The mean of 10 scores is 11.6. What is the sum of the scores?
5. Create a data set that fits the description:
George took six math tests during the current marking period. His mean mark is 83 and his
median mark is 85.
Quiz 3
1. How many data points must be removed from each end of a sample of 80 values in order
to calculate a 10% trimmed mean?
2. Bill drove an average of 262 miles each day for a period of 12 days. How many miles did
he drive total?
3. The selling prices of the last 10 houses sold in a certain district were as follows:
$146,400, $127,600, $211,000, $192,500, $256,400, $132,400, $148,000, $129,500, $131,400 and $162,500.
a. Calculate the mean and median selling prices of these houses.
b. Which measure would you use if you were
i. a real estate agent wanting to sell your house?
ii. looking to buy a house in the district?
4. A basketball team scored 43, 55, 41 and 37 points in their first four matches. How many
points will the team need to score in the next match to maintain the same mean score?
5. Create a data set that fits the description:
Lara took a survey of the number of coins eight students had in their pockets. The minimum
was 7, the mode was 11, the median was 10 and the range was 9.
Measures of Spread
Quiz 1
1. Find the median, upper and lower quartiles, the range and the interquartile range for the
set of data:
2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9
2. Find the mean and standard deviation of the following distribution:
Score      0  1  2  3  4  5  6  7  8
Frequency  1  0  1  1  2  6  5  3  1
3. Use your calculator to find the mean and standard deviation of the following:
23, 24, 25, 26, 27, 28, 29, 30
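Problem 2 asks for the mean and standard deviation of a frequency distribution. One way to check the hand calculation is to weight each score by its frequency, as in the sketch below. The function name `grouped_mean_sd` is hypothetical. The problem does not say whether the population standard deviation (divide by n) or the sample standard deviation (divide by n - 1) is expected, so the sketch returns both. TI-83/84 calculators report these as σx and Sx.

```python
import math

# Problem 2 (Quiz 1): each score and how often it occurs.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8]
freqs = [1, 0, 1, 1, 2, 6, 5, 3, 1]


def grouped_mean_sd(values, counts):
    """Mean, population SD and sample SD of a frequency distribution."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    # Sum of squared deviations, with each deviation counted c times.
    ss = sum(c * (v - mean) ** 2 for v, c in zip(values, counts))
    return mean, math.sqrt(ss / n), math.sqrt(ss / (n - 1))


mean, pop_sd, sample_sd = grouped_mean_sd(scores, freqs)
print(round(mean, 2), round(pop_sd, 3), round(sample_sd, 3))
```

Passing the frequency tables from Quizzes 2 and 3 to the same function checks those answers too.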
Quiz 2
1. Find the median, upper and lower quartiles, the range and the interquartile range for the
set of data:
10, 12, 15, 12, 24, 28, 19, 18, 18, 15, 16, 20, 21, 17, 18, 16, 22, 14
2. Find the mean and standard deviation of the following distribution:
Score      11  12  13  14  15  16  17  18
Frequency   2   1   4   5   6   4   2   1
3. Use your calculator to find the mean and standard deviation of the following:
7, 19, 5, 14, 13, 18, 21, 14, 11, 13, 15, 8
Quiz 3
1. Find the median, upper and lower quartiles, the range and the interquartile range for the
set of data:
21.8, 22.4, 23.5, 23.5, 24.6, 24.9, 25, 25.3, 26.1, 26.4, 29.5
2. The number of toothpicks in 48 boxes was counted and the results tabulated:
Number of toothpicks  33  35  36  37  38  39  40
Frequency              1   5   7  13  12   8   2
Find the mean and standard deviation of the distribution.
3. Use your calculator to find the mean and standard deviation of the following:
−3, −2, −1, 0, 1, 2, 3
Test 1
1. A distribution of 6 scores has a median of 21. If the highest score increases 3 points, the
median will be:
a. 21
b. 21.5
c. 24
d. Cannot be determined with information given
e. None of the above
2. If you are told that a data set has a mean of 25 and a variance of 0, you can conclude
that:
a. There is only one observation in the data set
b. There are no observations in the data set.
c. All of the observations in the data set are 25
d. Someone has made a mistake
e. None of the above
3. Of the following measures: mean, median, IQR and standard deviation, which are resistant:
a. Mean and median
b. Median and IQR
c. Mean and standard deviation
d. Median and standard deviation
e. None of the above
4. The quantity $\sum (x_i - \bar{x})$ is not used as a measure of variation because
a. It is always equal to zero
b. It is always a negative value
c. It is too difficult to work with
d. It is always a positive value
e. None of the above
5. The variance of the following sample of five numbers: 1, 2, 3, 4, 5 is:
a. 2
b. 9
c. 10
d. 13.3
e. 55
6. If you add 5 to each value in a data set, then the standard deviation will
a. Decrease by 5
b. Increase by 5
c. Stay the same
d. Depend on the values of the data in the data set
e. None of the above
7. Last year a small accounting firm paid each of its five clerks $25,000, two junior accountants
$60,000 each, and the firm’s owner $255,000.
a. What is the mean salary paid at this firm?
b. How many of the employees earn less than the mean?
c. What is the median salary?
8. The National Institutes of Health is studying the birth weight of babies born to mothers
who smoke cigarettes. A sample of the weights of 14 babies is selected and the weights are
listed below (in pounds).
6.1 5.9 5.8 7.2 6.3 6.2 6.1 5.1 6.0 6.5 6.7 5.3 6.5 5.9
a. Calculate the mean and median.
b. If the largest value were changed to 15.2, how would this impact the median?
c. If the largest value were changed to 15.2, how would this impact the mean?
9. Define the following:
a. Trimmed mean
b. Midrange
10. The following table presents the years of service of eight college professors:
Professor  Baric  Baxter  Hastings  Prevost  Reed  Rossman  Stodghill  Tesman
Yrs        31     15      7         3        1     6        28         6
a. Use your calculator to compute the mean and median of these years of service.
b. Suppose Baxter’s years of service had been mistakenly recorded as 155. Use your calculator to recompute the mean and the median. Did they change? Explain.
c. Suppose Baric’s years of service had been mistakenly recorded as 3. Use your calculator
to recompute the mean and the median. Did they change? Explain.
11. Construct a data set with ten hypothetical exam scores so that the mean does not equal
the median and none of the scores are between the mean and the median.
12. Find the range, sample standard deviation, and sample variance for the following
data set:
a. inauguration ages of U.S. presidents: 57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49,
54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54.
b. Sort the presidential inauguration age data on your calculator. Count how many data
elements are within one standard deviation of the mean (i.e., between 54.8 − 6.2 = 48.6 and
54.8 + 6.2 = 61.0). Convert this to a percentage.
c. Sort the presidential inauguration age data on your calculator. Count how many data
elements are within two standard deviations of the mean. Convert this to a percentage.
13. For each of the following, determine which level of measurement is most appropriate.
a. Daily high and low temperatures at the Niles airport for 2004.
b. Time (in days) for a sunspot to be visible from the earth.
Test 2
1. Which of the following variables is a continuous variable?
a. The lifetime of a 9 volt battery
b. The number of cracked eggs in a carton of 12 eggs
c. The number of people in a family
d. The brand of laundry detergent used
2. If you add 7 to each value in a data set, then the mean will
a. Decrease by 7
b. Increase by 7
c. Stay the same
d. Depend on the values of the data in the data set
e. None of the above
3. Which of the following is a discrete variable?
a. The lifetime of a 9 volt battery
b. The height of all students in the high school
c. The number of cracked eggs in a carton
d. The brand of laundry detergent used
e. The types of music sold in a music store
4. The mean of the sample of five numbers: −1, −2, −3, −4, −5 is:
a. 0
b. −3
c. −1
d. −6
e. 3
5. A distribution of 7 scores has a median of 23. If the lowest score decreases 4 points, the
median will be
a. 23
b. 23.5
c. 22.7
d. Cannot be determined by the information given
e. None of the above
6. A police officer gave 20 speeding tickets last week on a stretch of road having a 60 miles
per hour speed limit. The speeds recorded for each of the tickets are given below:
72 68 79 79 67 81 76 71 82 80 80 73 78 78 75 70 79 70 69 74
a. Is this categorical or quantitative data?
b. What is the range?
c. What is the IQR?
7. Define the following:
a. Categorical variable
b. Interquartile range
8. Explain the difference between an experiment and an observational study.
9. Construct a data set with ten hypothetical exam scores so that 90% of the scores are
greater than the mean. Assume the exam scores are integers between 0 and 100.
10. For each of the following determine which level of measurement is most appropriate.
a. Colors of Skittles® brand candies.
b. Final course grades of A, B, C, D, and F .
11. Last year a small private school paid each of its five interns $25,000, two lead teachers
$60,000 each, and the school’s principal $255,000.
a. What is the mean salary paid at this school?
b. How many of the employees earn less than the mean?
c. What is the median salary?
1.2 Visualization of Data
Histograms and Frequency Distributions
Quiz 1
1. Given below is a cumulative frequency distribution table showing the marks secured by
50 students of a class.
Table 1.1:
Marks             below 20  below 40  below 60  below 80  below 100
No. of students   17        20        29        37        50
a. Form a frequency table from the above data
b. The frequency for the fourth class interval is ________
c. The class interval with the highest frequency is ________
2. The data set below is the test scores (out of 100) for a math test for 50 students.
a. Construct a frequency table for this data using class intervals 0 − 9, 10 − 19, 20 − 29, and so on.
b. What percentage of the students scored 80 or more for the test?
c. Draw the ogive for this data.
56 29 78 67 68 69 80 89 92 71 58 66 56 88 81 70 73 63 74 38
67 64 62 55 56 75 90 92 47 44 59 64 89 62 51 87 89 76 59 88
72 80 95 68 80 64 53 43 61 39
Quiz 2
The following data is the number of points scored by the eight winning teams in the first
five rounds of the 2001 AFL season.
94 196 154 131 129 134 152 140 124 162 103 139 82 170 110 111 116 160 104 110
98 106 187 149 165 88 118 123 137 128 113 130 145 139 125 154 126 141 122 106
1. Construct a frequency table for this data using class intervals 80 − 89, 90 − 99, 100 − 109,
. . . , 180 − 189.
2. Construct a cumulative frequency table for this data.
3. What percentage of matches had winning scores of 99 points or less?
4. Draw an ogive for this data.
5. Describe the distribution of the data.
Quiz 3
The number of matches in a box is stated as 50 but the actual number of matches has been
found to vary. The number of matches in a box has been counted for a sample of 60 boxes.
Here is the data:
51 50 50 51 52 49 50 48 51 47 50 52 48 50 49 51 50 50 52 52 51 50 50 52
50 53 48 50 51 50 50 49 48 51 49 52 50 49 50 50 52 50 51 49 52 52 50 49 50
49 51 50 50 51 50 53 48 49 49 50
1. Construct a frequency table for this data.
2. Display the data using a histogram
3. Describe the distribution of this data.
4. What percentage of the boxes contains exactly 50 matches?
5. Construct a cumulative frequency table for this data.
6. Draw an ogive for this data.
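Building the frequency and cumulative frequency tables by hand is error-prone with 60 values. The sketch below is illustrative and not from the original text. It uses only the Python standard library, and the variable names are hypothetical. A teacher could use it to check answers to Problems 1, 4 and 5. The cumulative column is the one plotted in the ogive.

```python
from collections import Counter
from itertools import accumulate

# Number of matches counted in each of the 60 sampled boxes.
matches = [
    51, 50, 50, 51, 52, 49, 50, 48, 51, 47, 50, 52, 48, 50, 49, 51, 50, 50,
    52, 52, 51, 50, 50, 52, 50, 53, 48, 50, 51, 50, 50, 49, 48, 51, 49, 52,
    50, 49, 50, 50, 52, 50, 51, 49, 52, 52, 50, 49, 50, 49, 51, 50, 50, 51,
    50, 53, 48, 49, 49, 50,
]

counts = Counter(matches)
values = sorted(counts)                  # distinct box counts, smallest first
freq = [counts[v] for v in values]       # frequency table
cum_freq = list(accumulate(freq))        # running totals for the ogive

for v, f, cf in zip(values, freq, cum_freq):
    print(f"{v:>3} {f:>4} {cf:>4}")

pct_exactly_50 = 100 * counts[50] / len(matches)
print(f"{pct_exactly_50:.1f}% of boxes contain exactly 50 matches")
```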
Common Graphs and Data Plots
Quiz 1
1. For the following data set construct a dot plot and comment on the distribution.
4, 2, 5, 6, 7, 4, 5, 3, 5, 4, 7, 6, 3, 5, 8, 6, 5
2. The data supplied below is the diameter (in cm) of a number of bacteria colonies as
measured by a microbiologist 12 hours after seeding.
0.4 2.1 3.4 3.9 4.7 3.7 0.8 3.6 4.1 4.9 2.5 3.1
1.7 3.6 2.8 3.7 2.8 3.2 3.3
1.5 2.6 4.0 1.3 3.5 0.9 1.5 4.2 3.5 2.1
a. Produce a stemplot for this data.
b. Comment on the skewness of the data
3. The stem and leaf plot represents how much Jacob spent this month:
Money
0|1 7 7
1|1 2 4
2|1 1 6
3|9
4|1 2 7
5|6 6 8
Spent
8
66778
9
List out how much Jacob spent.
4. Fast food is often considered unhealthy because much fast food is high in fat and sodium.
Are fat and sodium related? Following are the fat and sodium content of several brands of
burgers. Create a scatterplot and describe the direction and the strength of the relationship.
Fat (g)       19   31    34    35   39    39   43
Sodium (mg)   920  1500  1310  860  1180  940  1260
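Problem 4 asks for a verbal description of direction and strength. The Pearson correlation coefficient r is a numerical companion to the scatterplot, and it can be computed directly from its definition. The sketch below is illustrative, and the function name `pearson_r` is hypothetical. r measures only linear association, so read it together with the plot.

```python
import math

# Fat (g) and sodium (mg) for the seven burgers in Problem 4.
fat = [19, 31, 34, 35, 39, 39, 43]
sodium = [920, 1500, 1310, 860, 1180, 940, 1260]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(fat, sodium)
print(f"r = {r:.3f}")   # sign gives direction; closeness to +/-1 gives strength
```

The same function applies to the fat/calories and calories/sodium tables in Quizzes 2 and 3.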
Quiz 2
1. Make a stem plot of the money that Fiona spent this month:
Fiona: $71, $57, $68, $57, $83, $88, $64, $75, $66, $74, $81
2. The following sets of test scores were compared using a back-to-back stem plot:
Ian
Dan
9 8 6 4 |4| 6 7 8
9 8 7 5 2 2 |5| 2 3 7 8
9 5 4 3 |6| 0 0 1 7 8
6 2 2 0 |7| 6
List the scores that Dan and Ian had.
3. Draw a dot plot of the sodium data and comment on the distribution.
Table 1.2:
Cereal                 Sodium (mg)  Sugar (g)
Frosted Mini Wheats    0            7
Raisin Bran            210          12
All Bran               260          5
Apple Jacks            125          14
Capt. Crunch           220          12
Cheerios               290          1
Cinnamon Toast         210          13
Crackling Oat Bran     140          10
Crispix                220          3
Frosted Flakes         200          11
Fruit Loops            125          13
Grape Nuts             170          3
Honey Nut Cheerios     250          10
Life                   150          6
Oatmeal Raisin Crisp   170          10
Sugar Smacks           70           15
Special K              230          3
Wheaties               200          3
Corn Flakes            290          2
Honeycomb              180          11
4. Is there a relationship between fat in burgers and calories? Draw a scatterplot of the
following data and describe the direction and strength of the relationship.
Fat (g)    19   31   34   35   39   39   43
Calories   410  580  590  570  640  680  660
Quiz 3
1. Draw a dot plot of the sugar data and comment on the distribution.
Table 1.3:
Cereal                 Sodium (mg)  Sugar (g)
Frosted Mini Wheats    0            7
Raisin Bran            210          12
All Bran               260          5
Apple Jacks            125          14
Capt. Crunch           220          12
Cheerios               290          1
Cinnamon Toast         210          13
Crackling Oat Bran     140          10
Crispix                220          3
Frosted Flakes         200          11
Fruit Loops            125          13
Grape Nuts             170          3
Honey Nut Cheerios     250          10
Life                   150          6
Oatmeal Raisin Crisp   170          10
Sugar Smacks           70           15
Special K              230          3
Wheaties               200          3
Corn Flakes            290          2
Honeycomb              180          11
2. Go Shop Already is a babysitting service in a mall. For a month, they kept track of the
number of babies they were left with. Make a dot plot of the data:
31, 24, 29, 24, 16, 14, 25, 17, 30, 18, 19, 26, 18, 23, 26, 17, 16, 24, 30, 27, 19, 29, 22, 20, 32, 30,
20, 21, 29, 23
3. At Ritzy Stuff store, management kept track of sales during a recent workday. They made
a dot plot of the data. List out the sales for that workday:
4. Use the following data to examine the relationship between sodium and calories in hamburgers. Draw a scatterplot and describe the strength and direction of the association.
Calories      410  580   590   570  640   680  660
Sodium (mg)   920  1500  1310  860  1180  940  1260
Box and Whisker Plots
Quiz 1
1. Find the five-number summary for the data set: {37, 44, 5, 8, 20, 11, 14}
2. Circle the points that represent the five number summary values in the dot plots below:
3. Create a data set with the five number summary 6, 10, 12, 15, 20 so that the set contains
11 values.
4. The table shows the number of bachelor’s degrees earned in various fields at a private
university for 1994.
Degree Field 1994
Architecture 78
Biological Sciences 172
Business and management 422
Computer science 205
Cultural studies 46
Education 261
Engineering 370
English literature 143
Law 29
Mathematics 65
Philosophy 52
Physical sciences 110
Visual and performing arts 141
a. Give the five number summary and the mean for the data.
b. Create a box plot for the data set.
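Textbooks and software compute quartiles in slightly different ways. The sketch below uses the "median of each half" rule: when the number of values is odd, the overall median is left out of both halves. This is the convention TI-83/84 calculators use. The function name `five_number_summary` is hypothetical, and the sketch assumes Python 3 with the standard `statistics` module. Other quartile methods can give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values below the median position
    upper = xs[half + (n % 2):]     # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


# Problem 1
print(five_number_summary([37, 44, 5, 8, 20, 11, 14]))

# Problem 4: bachelor's degrees by field, 1994
degrees_1994 = [78, 172, 422, 205, 46, 261, 370, 143, 29, 65, 52, 110, 141]
print(five_number_summary(degrees_1994), round(statistics.mean(degrees_1994), 2))
```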
Quiz 2
1. Find the five-number summary for the data set: {10, 1, 3, 4, 30, 4, 20, 22, 10, 25, 30}
2. Circle the points that represent the five number summary values in the dotplot below.
3. Create a data set with the five-number summary 6, 10, 12, 15, 20 that contains 12 values.
4. Here are summary statistics for Verbal SAT scores for a high school graduating class.
         Male   Female
N        80     82
Mean     590    602
Median   600    625
SD       97.2   102.0
Min      310    360
Max      800    770
Q1       515    530
Q3       650    680
a. Create parallel boxplots comparing the scores of males and females from the information
given.
b. Discuss the shape, center and spread of the scores.
Quiz 3
1. Find the five number summary for the data set:
{25, 27, 33, 14, 31, 16, 22, 24, 43, 25, 37, 39, 42}
2. Create a data set with the five-number summary 6, 10, 12, 15, 25 that contains 11 values.
3. The table shows the number of bachelor’s degrees earned in various fields at a private
university for 1985.
Degree Field 1985
Architecture 76
Biological Sciences 158
Business and management 410
Computer science 132
Cultural studies 25
Education 247
Engineering 351
English literature 129
Law 18
Mathematics 62
Philosophy 43
Physical sciences 107
Visual and performing arts 154
a. Give the five number summary and the mean for the data.
b. Create a box plot for the data set.
Test 1
1. In a study of hatchling resting metabolism, three species were studied. These are labeled
A, B, and C in the pie chart below. In total, 36 hatchlings were studied.
Based on the pie chart, approximately how many of the hatchlings were Species C?
a) 8
b) 12
c) 20
d) 24
e) 30
2. In the following frequency table, what proportion of values are less than 60?
Table 1.4:
Class Interval   Frequency
15 − < 30        15
30 − < 45        14
45 − < 60        16
60 − < 75        12
75 − < 90        18
TOTAL            75
a. 0.187
b. 0.200
c. 0.213
d. 0.240
e. 0.600
3. The back-to-back stem-and-leaf plot below gives the percentage of students who dropped
out of school at each of the 49 high schools in a large city school district.
School Year 1989 − 1990
School Year 1992 − 1993
0
4
566677788899
00001111222224444
555666677778
2
13
9999887
0
4444433222211110
1
9997766665
1
4222100
88876
2
2
766
3
3
5
4
0112
stem = tens
leaf = ones
Which of the following statements is NOT justified by these data?
a. The drop-out rate decreased in each of the 49 high schools between the 1989-1990 and
1992-1993 school years
b. For the school years shown, most students in the 49 schools did not drop out of high
school.
c. In general, drop-out rates decreased between the 1989-1990 and 1992-1993 school years
d. The median drop-out rate of the 49 high schools decreased between the 1989-1990 and
1992-1993 school years
e. The spread between the schools with the lowest drop-out rates and those with the highest
drop-out rates did not change much between the 1989-1990 and 1992-1993 school years
4. What is the median of the data for the School Year 1989-1990?
a. 15
b. 16
c. 19
d. 20
e. cannot be determined
5. The frequency table below shows the heights (in inches) of 130 members of a choir.
Height  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
Count    2   6   9   7   5  20  18   7  12   5  11   8   9   4   2   4   1
a. Find the five number summary for these data.
b. Display the data with a boxplot.
c. Find the mean and standard deviation.
d. Display these data with a histogram.
e. Write a few sentences describing the distribution of heights.
6. A police officer gave 20 speeding tickets last week on a stretch of road having a 60 mile
per hour speed limit. The speeds recorded for each of the tickets are given below:
72 68 79 79 67 81 76 71 82 80 80 73 78 78 75 70 79 70 69 74
a. Construct a dotplot of the data.
b. Construct a stem-and-leaf display of the data.
c. Find the five number summary and create a box plot.
d. What is the range?
e. What is the IQR?
f. Are there any outliers in the data set?
g. Cannot be determined with the full data set.
7. I have a data set consisting of 33 whole number observations. Its five number summary
is (16, 20, 22, 30, 46)
a. What is the range of the data?
b. Identify the five numbers in the five number summary.
c. How many observations are strictly less than 22?
d. How many observations are strictly less than 20?
e. What is the interquartile range?
f. Construct a box plot.
g. Test for outliers. Are there any outliers?
8. Draw a scatterplot of the following data and describe the strength and direction of the
association between x and y.
X   6   10   14   19   21
Y   5   3    7    8    12
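Problem 7(g) asks for a test for outliers. A common rule taught alongside box plots, and the one referenced in Test 2, Problem 2, flags any value more than 1.5 × IQR below Q1 or above Q3. The sketch below implements that rule. The function names are hypothetical, and the sketch works only from the quartiles, not the raw data.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences of the k x IQR rule (k = 1.5 by default)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3):
    """True if value lies outside the 1.5 x IQR fences."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


# Problem 7: five-number summary (16, 20, 22, 30, 46), so Q1 = 20 and Q3 = 30.
print(outlier_fences(20, 30))   # (5.0, 45.0)
print(is_outlier(46, 20, 30))   # True: the maximum lies above the upper fence
print(is_outlier(16, 20, 30))   # False: the minimum lies inside the lower fence
```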
Test 2
1. Since Hill Valley High School eliminated the use of bells between classes, teachers have
noticed that students seem to be arriving to class a few minutes late. One teacher decided
to collect data to determine whether the students’ and teachers’ watches were displaying
the correct time. At exactly 12 noon, the teacher asked 9 randomly selected students and 9
randomly selected teachers to record the times on their watches to the nearest half minute.
The data are recorded in the table below, with minutes after noon recorded as positive values
and minutes before noon shown as negative values.
Students   −4.5  −3.0  −0.5   0     0     0.5   0.5   1.5   5.0
Teachers   −2.0  −1.5  −1.5  −1.0  −1.0  −0.5   0     0     0.5
a. Construct parallel boxplots using these data.
b. Based on the boxplots in part a) how do the groups compare? Discuss shape, center and
spread.
2. A data set has the following five-number summary:
Min 7
Q1 10
Med 12
Q3 17
Max 26
Is 26 an outlier?
a. Yes because it is the highest value in the set
b. No, because it is the maximum
c. Yes, because it is 1.5 IQR above the median.
d. No, because it is not 1.5 IQR above Q3.
3. Here are the numbers of home runs Babe Ruth hit in each of his 15 years with the New
York Yankees:
54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
Roger Maris, who broke Ruth’s single-year record, had these home run totals in his 10 years
in the American League:
14, 28, 16, 39, 61, 33, 23, 26, 8, 13
a. Compute the five number summary for each player.
b. Make side-by-side box plots of the home run distributions. What does your comparison
show about Ruth and Maris as home run hitters?
4. A garage wants to understand how long customers have to wait to have their car serviced.
Below is the data they collected.
Table 1.5:
Service time (minutes)   Frequency   Cumulative Frequency
< 30                     3           3
> 29 but < 45            45          48
> 44 but < 60            54          102
> 59 but < 75            36          138
> 74 but < 90            21          159
≥ 90                     6           165
Total Frequencies = 165
a. Draw a cumulative frequency curve (ogive).
b. Answer the following multiple choice questions:
a. The median service time was:
i. 40 minutes
ii. About half an hour
iii. More than an hour
b. Service time was under one hour:
i. 137 times
ii. 102
iii. Not very many
c. The number of occasions when service time was less than 30 minutes
i. 3
ii. 40
iii. 55
d. The number of times service time was over three-quarters of an hour
i. 27
ii. 110 times
iii. 90 times
5. The rapid growth of internet publishing is reflected in the number of electronic academic journals
made available in the 1990s.
Table 1.6:
Year                 1991  1992  1993  1994  1995  1996  1997
Number of Journals   27    36    45    181   306   1093  2459
Make a scatterplot and describe the strength and direction of the association.
6. The dot plot below shows the number of televisions owned by each family on a city block.
Which of the following statements are true?
A. The distribution is right-skewed with no outliers.
B. The distribution is right-skewed with one outlier.
C. The distribution is left-skewed with no outliers.
D. The distribution is left-skewed with one outlier.
E. The distribution is symmetric.
1.3 Introduction to Probability
Events, Sample Spaces and Probability
Quiz 1
1. You are going to roll a die three times and note how many odd numbers you get. What
is the sample space?
2. Make a Venn diagram for the following:
31% of my students got an A on the exam
29% of my students studied for the exam
40% of my students bought me flowers
15% of my students studied for the exam, bought me flowers, and got an A
2% of my students got an A but did not study or buy me flowers
28% of my students bought me flowers and got an A
18% of my students studied and bought me flowers
3. A marble is randomly selected from a box containing 5 green, 3 red and 7 blue marbles.
Determine the probability that the marble is:
a. Red
b. Green
c. Neither green nor blue
Quiz 2
1. A marble is randomly selected from a box containing 5 green, 3 red and 7 blue marbles.
Determine the probability that the marble is
a. Blue
b. Not red
c. Green or red
2. In a class of 30 students, 19 study physics, 17 study chemistry and 15 study both of these
subjects. Display the information on a Venn diagram and determine the probability that a
randomly selected class member studies:
a. Both subjects
b. Physics but not chemistry
c. Chemistry if it is known that the student studies physics
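The four regions of the Venn diagram can be checked with exact fractions. This Python sketch uses illustrative variable names. It also produces the "neither" count needed for Quiz 3, Question 2.

```python
from fractions import Fraction

class_size = 30
physics = 19
chemistry = 17
both = 15

# Split the Venn diagram into its four regions.
physics_only = physics - both                        # 4
chemistry_only = chemistry - both                    # 2
neither = class_size - (physics + chemistry - both)  # 9

p_both = Fraction(both, class_size)
p_physics_not_chem = Fraction(physics_only, class_size)
# Conditioning on physics shrinks the sample space to the 19 physics students.
p_chem_given_physics = Fraction(both, physics)
print(p_both, p_physics_not_chem, p_chem_given_physics)  # prints 1/2 2/15 15/19
```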
3. A coin is tossed and a square spinner labeled A, B, C, D is twirled. Determine the
probability of obtaining:
a. A head and a consonant
b. A tail and C
c. A tail or a vowel
Quiz 3
1. A dart board has 36 sectors, labeled 1 to 36. Determine the probability that a dart
thrown at the board hits:
a. A multiple of 4
b. 9
c. A number greater than 20
2. In a class of 30 students, 19 study physics, 17 study chemistry and 15 study both of these
subjects. Display the information on a Venn diagram and determine the probability that a
randomly selected class member studies:
a. Neither subject
b. At least one of the subjects
c. Exactly one of the subjects
3. A coin is tossed and a square spinner labeled A, B, C, D is twirled. Determine the
probability of obtaining:
a. A tail and a vowel
b. A vowel and B
Compound Events
Quiz 1
1. A fair coin is tossed four times. Two events are defined as follows:
A: {at least one tail is observed}
B: {the number of tails observed is even}
a. List the outcomes of A
b. List the outcomes of B
c. Find P(A^C), P(B), P(A^C ∪ B)
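Listing 16 outcomes by hand is error-prone, so a short Python enumeration can serve as a check. It treats the four tosses of a fair coin as 16 equally likely sequences. The helper `prob` is a hypothetical name used only here.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely sequences of four tosses, e.g. ('H', 'T', 'T', 'H').
outcomes = list(product("HT", repeat=4))

A = {o for o in outcomes if o.count("T") >= 1}      # at least one tail
B = {o for o in outcomes if o.count("T") % 2 == 0}  # even number of tails (0, 2 or 4)
A_complement = set(outcomes) - A                    # only HHHH


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


print(prob(A_complement), prob(B), prob(A_complement | B))
```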
2. The probability that a person has traveled to Canada is .18, to Mexico is .09 and to
both countries is .04.
a. Draw a Venn diagram to illustrate this situation
b. What is the probability that a person chosen at random has
i. Traveled to Canada but not Mexico
ii. Traveled to either Canada or Mexico
iii. Not traveled to either country
Quiz 2
1. A fair coin is tossed four times. Two events are defined as follows:
A: {at least one tail is observed}
B: {the number of tails observed is even}
a. List the outcomes of A
b. List the outcomes of B
c. Find P(A^C), P(B), P(A^C ∪ B)
2. Data from a large company reveal that 72% of the workers are married, that 44% are
college graduates and that half of the college graduates are married.
a. Draw a Venn diagram to illustrate this situation
b. Find the probability that a randomly chosen worker
i. Is neither married nor a college graduate
ii. Is married but not a college graduate
iii. Is married or a college graduate
Quiz 3
1. A check on dorm rooms on a large college campus revealed that 38% had refrigerators,
52% had TVs and 21% had both a TV and a refrigerator.
a. Draw a Venn diagram of this situation.
b. Find the probability that a randomly selected dorm room has
i. A TV but no refrigerator
ii. A TV or a refrigerator but not both
iii. Neither a TV nor a refrigerator
2. Suppose two events A and B satisfy the following:
P(A) = .78
P(B) = .36
P(A ∩ B) = .22
Find:
a. P(A ∩ B^C)
b. P(B ∩ A^C)
c. P(A^C ∩ B^C)
Conditional Probability
Quiz 1
1. In a class of 25 students, 14 like pizza and 16 like coffee. One student likes neither and 6
students like both. One student is selected from the class. What is the probability that the
student:
a. Likes pizza
b. Likes pizza given that he/she likes coffee
2. Given P(H) = 13/52, P(H ∩ R) = 13/52 and P(R) = 26/52, find:
a. P(H|R)
b. P(R|H)
3. In a group of 50 students, 40 study math, 32 study physics and each student studies at
least one of these subjects.
a. Use a Venn diagram to find how many students study both subjects.
b. If a student from this group is randomly selected, find the probability that he/she studies
physics given that he/she studies mathematics.
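Questions 1 and 3 both reduce to counting: when students are equally likely to be chosen, P(X | Y) is the number in both groups divided by the number in the given group. A minimal Python sketch (the function name `conditional` is hypothetical):

```python
from fractions import Fraction


def conditional(count_both, count_given):
    """P(X | Y) = n(X and Y) / n(Y) when outcomes are equally likely."""
    return Fraction(count_both, count_given)


# Question 3: everyone studies at least one subject, so
# n(math and physics) = n(math) + n(physics) - n(group).
group, math_students, physics_students = 50, 40, 32
both = math_students + physics_students - group
p_physics_given_math = conditional(both, math_students)

# Question 1: 6 of the 16 coffee lovers also like pizza.
p_pizza_given_coffee = conditional(6, 16)
print(both, p_physics_given_math, p_pizza_given_coffee)
```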
Quiz 2
1. A box of chocolates contains 6 with hard centers (H) and 12 with soft centers (S). Find:
a. P(H)
b. P(S)
c. P(H ∩ S)
d. P(H ∪ S)
2. In a class of 40, 34 like bananas, 22 like pineapples and 2 dislike both fruits. If a student
is randomly selected, find the probability that the student:
a. Likes both fruits
b. Likes at least one fruit
c. Likes bananas given that he/she likes pineapples
d. Dislikes pineapples given that he/she likes bananas
3. Four hundred families are surveyed. It is found that 90% had a TV set and 60% had a computer.
Every family had at least one of these items. If one of these families is randomly selected,
find the probability that it has a TV set given that it has a computer.
Quiz 3
1. If P(B) = 1/3, P(A ∪ B) = 1/2 and P(A) = 2/5, find:
a. P(A ∩ B)
b. P(B|A)
c. P(A|B)
2. A class has 25 students. 13 play tennis, 14 play volleyball and 1 plays neither of these
two sports. A student is randomly selected from the class. Determine the probability that
the student:
a. Plays both tennis and volleyball
b. Plays at least one of these two sports.
c. Plays volleyball given that he/she does not play tennis.
3. The probability that a boy eats his lunch is .5 and the probability that his sister eats her
lunch is .6. The probability that the girl eats her lunch given that the boy eats his lunch is
.9. Determine the probability that:
a. Both eat their lunch
b. The boy eats his lunch given that the girl eats hers
Additive and Multiplicative Rules
Quiz 1
1. A box of chocolates contains 6 with hard centers (H) and 12 with soft centers (S). Are
the events H and S mutually exclusive?
2. Given the following information:
P(A) = .78
P(B) = .3
P(A ∩ B) = .22
a. Are the events A and B mutually exclusive? Explain
b. Are the events A and B independent? Explain
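The two checks in this quiz can be written as a small Python helper. It is a sketch: the tolerance `tol` is an assumption added to cope with floating-point rounding. The same helper applies to the course question in Quiz 2 and the benefits question in Quiz 3.

```python
def classify(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return (mutually_exclusive, independent) for events A and B."""
    # Mutually exclusive events cannot occur together: P(A and B) = 0.
    mutually_exclusive = abs(p_a_and_b) < tol
    # Independence requires P(A and B) = P(A) * P(B).
    independent = abs(p_a_and_b - p_a * p_b) < tol
    return mutually_exclusive, independent


print(classify(0.78, 0.3, 0.22))   # Quiz 1, Question 2
print(classify(0.52, 0.23, 0.07))  # Quiz 2, Question 1
print(classify(0.56, 0.68, 0.49))  # Quiz 3, Question 1
```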
Quiz 2
1. A university requires its biology majors to take a course called Bioresearch. The prerequisite for this course is either a statistics course or a computer course. By the time they are
juniors, 52% of the biology majors have taken statistics, 23% have had a computer course,
and 7% have done both.
a. Are the events "has taken statistics" and "has taken a computer course" mutually exclusive? Explain.
b. Are these two events independent? Explain.
2. Let P(A) = 1/2, P(B) = 1/3 and P(A ∪ B) = p. Find p if:
a. A and B are mutually exclusive
b. A and B are independent
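Taking P(A) = 1/2 and P(B) = 1/3, the two cases differ only in the overlap term of the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A Python sketch with exact fractions:

```python
from fractions import Fraction

p_a = Fraction(1, 2)
p_b = Fraction(1, 3)

# Mutually exclusive: P(A and B) = 0, so there is no overlap to subtract.
p_exclusive = p_a + p_b
# Independent: P(A and B) = P(A) * P(B).
p_independent = p_a + p_b - p_a * p_b
print(p_exclusive, p_independent)  # prints 5/6 2/3
```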
Quiz 3
1. Fifty-six percent of American workers have a retirement plan, 68% have health insurance
and 49% have both benefits.
a. Are having health insurance and a retirement plan independent events? Explain.
b. Are having these two benefits mutually exclusive? Explain
2. If P(X) = .5 and P(Y) = .7, and X and Y are independent, determine the probability of
the occurrence of:
a. Both X and Y
b. X or Y
c. X given that Y occurs
Basic Counting Rules
Quiz 1
1. If the NCAA has applications from 7 universities for hosting its tennis championships in
2003 and 2004, in how many ways may they select the hosts for these championships:
a. If they are not both to be held at the same university?
b. If they may both be held at the same university?
2. A multiple-choice test consists of 15 questions, each permitting a choice of 4 alternatives.
In how many ways may a student fill in the answers if he/she answers each question?
3. David owns 4 pairs of pants, 8 shirts and 2 sweaters. In how many ways may he choose
2 of the pairs of pants, 3 of the shirts and 1 of the sweaters to pack for a trip?
4. An art collector who owns 12 original paintings is preparing a will. In how many ways
may the collector leave these paintings to four heirs?
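Questions 1 to 4 use the multiplication principle, permutations and combinations. The Python sketch below relies on `math.perm` and `math.comb`, which are available from Python 3.8. For Question 4 it assumes that each heir may receive any number of paintings, including none.

```python
import math

# Question 1: 7 universities, two different years.
hosts_distinct = math.perm(7, 2)  # order matters, no university repeated
hosts_repeat = 7 ** 2             # the same university may host both years
# Question 2: 15 questions with 4 choices each.
answer_sheets = 4 ** 15
# Question 3: choose 2 of 4 pants, 3 of 8 shirts and 1 of 2 sweaters.
packings = math.comb(4, 2) * math.comb(8, 3) * math.comb(2, 1)
# Question 4 (assumption above): each of 12 paintings goes to any of 4 heirs.
bequests = 4 ** 12
print(hosts_distinct, hosts_repeat, answer_sheets, packings, bequests)
```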
5. Bin A contains 3 red and 2 white tickets. Bin B contains 4 red and 1 white ticket. A die
has 4 faces marked A and two faces marked B. The die is rolled to select bin A or bin B,
and a ticket is then selected from that bin. Use a tree diagram to show all the possible
outcomes (choices of tickets).
Quiz 2
1. The probability that Ann’s mother takes her shopping is 2/5. When Ann goes shopping
with her mother, she gets an ice cream 75% of the time. When Ann does not go shopping
with her mother, she gets an ice cream 25% of the time. Draw a tree diagram to illustrate
the possible outcomes.
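A tree diagram can be mirrored in code by multiplying probabilities along each path. This Python sketch assumes that the probability Ann's mother takes her shopping is 2/5. The nested layout of `tree` is an illustrative choice, not a standard format.

```python
from fractions import Fraction

# Each first-level branch: (label, probability, {second-level label: probability}).
tree = [
    ("with mother", Fraction(2, 5),
     {"ice cream": Fraction(3, 4), "no ice cream": Fraction(1, 4)}),
    ("without mother", Fraction(3, 5),
     {"ice cream": Fraction(1, 4), "no ice cream": Fraction(3, 4)}),
]

# Multiply along each path to get the probability of that outcome.
paths = {
    (first, second): p1 * p2
    for first, p1, branches in tree
    for second, p2 in branches.items()
}
p_ice_cream = sum(p for (_, second), p in paths.items() if second == "ice cream")
for path, p in paths.items():
    print(path, p)
print("P(ice cream) =", p_ice_cream)
```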
2. There are five finalists in a contest. In how many ways may the judges choose a winner
and a first runner-up?
3. In a primary election, there are four candidates for mayor, five candidates for treasurer
and three candidates for secretary. In how many ways may voters mark their ballots if they
vote in all three of the races?
4.
a. How many permutations are there of the letters in the word great?
b. How many permutations are there of the letters in the word greet?
5. In how many ways may one A, three B’s, two C’s and one F be distributed among seven
students in a statistics class?
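Arrangements with repeated items, as in Questions 4b and 5, follow the formula n!/(n1! n2! ...). Here is a Python sketch (helper names hypothetical). The same function also covers the football record and the word “statistics” in Quiz 3.

```python
from collections import Counter
from math import factorial


def arrangements(counts):
    """n! / (n1! n2! ...): distinct orderings of items with repeated types."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)  # each division is exact
    return total


def word_permutations(word):
    """Distinct orderings of the letters of a word."""
    return arrangements(Counter(word).values())


print(word_permutations("great"))  # 5 distinct letters
print(word_permutations("greet"))  # the two e's are interchangeable
print(arrangements([1, 3, 2, 1]))  # one A, three B's, two C's, one F
```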
Quiz 3
1. Suppose a true-false test has 20 questions. In how many ways may a student mark the test
if each question is answered?
2. A football team plays 12 games during the season. In how many ways can it end the season
with 6 wins, 5 losses and 1 tie?
3. Urn A contains 2 red and 3 blue marbles, and urn B contains 4 red and 1 blue marble. Peter
tosses a coin; if the coin comes up heads he chooses a marble from urn A, otherwise he chooses
a marble from urn B. Draw a tree diagram to represent all the possible outcomes.
4. How many distinct permutations are there of the letters in the word “statistics”?
5. A bank has a pool of 8 tellers and 8 customer service representatives. How many ways
can the manager select 4 tellers and 2 service reps to work on a given day?
Test 1
1. You are going to roll a die three times and note how many odd numbers you get. What
is the sample space?
2. Give an example of two sets A and B which are mutually exclusive.
3. Here are the probabilities associated with winning certain prizes in a raffle:
Prize | Probability
Car | .03
Boat | .07
TV | .12
Can Opener | .33
a. What is the probability of winning nothing?
b. What is the probability of winning the car or the TV?
c. What is the probability of winning the boat and the can opener?
d. What is the probability of not winning the car or the can opener?
4. You play two games against the same opponent. The probability you win the first game
is .4. If you win the first game, the probability you also win the second is .2. If you lose
the first game, the probability that you win the second is .3.
a. Are the games independent? Explain your answer
b. What is the probability that you lose both games?
c. What is the probability that you win both games?
5. Events A and B are defined by the given Venn diagram:
A and B are:
a. Independent and disjoint
b. Dependent and disjoint
c. Independent and not disjoint
d. Dependent and not disjoint
e. Cannot be determined
6. If performances on AP Statistics tests are independent and the probability of passing an
AP Statistics test is .2, then the probability of passing three AP Statistics tests is
a. .6
b. .2
c. .04
d. .008
e. 0
7. 45% of a high school student body is male. 80% of the females love math, while only 60%
of the males love math. What percentage of the student body loves math?
a. 70% b. 50% c. 71% d. 60% e. 100%
8. If 3 coins are tossed, what is the number of equally likely outcomes?
a. 3 b. 4 c. 6 d. 8 e. 9
9. If P(X) = .23, P(X ∩ Y) = .12 and P(X ∪ Y) = .34, then P(Y′) =
a. .23 b. .52 c. .11 d. .77 e. .48
10. How many 5-character code words are possible if the first two characters are
letters and the last three characters are digits? (No character may be repeated.)
a. 468000 b. 82 c. 676000 d. 78
Test 2
1. In a recent survey of 100 10-year-olds, the following information was obtained:
53 liked McDonald’s
24 liked Burger King
42 liked Wendy’s
12 liked both McDonald’s and Burger King
23 liked McDonald’s and Wendy’s
6 liked all three
4 liked only Burger King
a. Draw a Venn diagram illustrating this information.
b. How many 10-year-olds don’t like any of these three?
c. What percentage of these 10-year-olds like Burger King and Wendy’s?
d. What is the probability that a 10-year-old likes Burger King, given that he likes
Wendy’s?
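The Venn regions can be filled in from the centre outward. A Python sketch, assuming that "12 liked both McDonald's and Burger King" and "23 liked McDonald's and Wendy's" include the 6 who liked all three:

```python
# Counts from the survey of 100 ten-year-olds.
total = 100
m, b, w = 53, 24, 42  # McDonald's, Burger King, Wendy's
mb, mw = 12, 23       # pairs given in the survey (assumed to include all three)
mbw = 6               # liked all three
b_only = 4

# Work from the centre of the diagram outward.
mb_only = mb - mbw                    # McDonald's and Burger King only
mw_only = mw - mbw                    # McDonald's and Wendy's only
bw_only = b - b_only - mb_only - mbw  # Burger King and Wendy's only
m_only = m - mb_only - mw_only - mbw
w_only = w - mw_only - bw_only - mbw
liked_any = m_only + b_only + w_only + mb_only + mw_only + bw_only + mbw
liked_none = total - liked_any

bw = bw_only + mbw  # Burger King and Wendy's, including those who liked all three
p_b_given_w = bw / w
print(liked_none, bw / total, round(p_b_given_w, 3))
```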
2. If P(A) = .2 and P(B) = .3, find P(A ∪ B) if:
a. A and B are independent
b. A and B are mutually exclusive
3. A survey of families revealed that 8% of all families eat turkey at holiday meals, 44% eat
ham, and 16% have both turkey and ham.
a. What is the probability that a family selected at random had neither turkey nor ham at
their holiday meal?
b. What is the probability that a family selected at random had only ham without having
turkey at their holiday meal?
c. What is the probability that a randomly selected family had ham at their holiday meal,
given that they had turkey?
d. Are having turkey and having ham mutually exclusive events? Explain.
4. Draw a tree diagram to answer the following question:
A college was making plans for staffing and used the following information: of the students,
22.5% were seniors, 25% were juniors, 25% were sophomores and the rest were freshmen. Also,
40% of the seniors major in the humanities, as do 39% of the juniors, 40% of the sophomores
and 36% of the freshmen. What is the probability that a randomly selected humanities major
is a junior?
5. Assume that 75% of the AP Stat students studied for this test. If 40% of those who study
get an A, but only 10% of those who don’t study get an A, what is the probability that
someone who gets an A actually studied for the test?
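Questions 4 and 5 both reverse a conditional probability (Bayes' rule): multiply each group's share by its rate, then divide the target group's product by the total. A Python sketch. The freshman share of 27.5% is the remainder after the other three classes, and the helper name `bayes` is hypothetical.

```python
def bayes(priors, likelihoods, target):
    """P(target group | evidence) from P(group) and P(evidence | group)."""
    joint = {g: priors[g] * likelihoods[g] for g in priors}
    return joint[target] / sum(joint.values())


# Question 4: class year and the chance of majoring in the humanities.
year_share = {"senior": 0.225, "junior": 0.25, "sophomore": 0.25, "freshman": 0.275}
humanities_rate = {"senior": 0.40, "junior": 0.39, "sophomore": 0.40, "freshman": 0.36}
p_junior_given_humanities = bayes(year_share, humanities_rate, "junior")

# Question 5: studied or not, and the chance of an A.
study_share = {"studied": 0.75, "did not study": 0.25}
a_rate = {"studied": 0.40, "did not study": 0.10}
p_studied_given_a = bayes(study_share, a_rate, "studied")
print(round(p_junior_given_humanities, 4), round(p_studied_given_a, 4))
```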
6. Insurance company records indicate that 12% of all teenage drivers have been ticketed for
speeding and 9% for going through a red light. If 4% have been ticketed for both, what is the
probability that a teenage driver has been issued a ticket for speeding but not for running a
red light?
a. 3%
b. 8%
c. 12%
d. 13%
e. 17%
7. Which two events are most likely to be independent?
a. Being a senior; going to homeroom
b. Registering to vote; being left-handed
c. Having a car accident; having a junior license
d. Doing statistics homework; getting an A on the test
e. Having 3 inches of snow in the morning; being on time for school
8. How many different three member teams can be formed from six students?
a. 20
b. 120
c. 216
d. 720
9. How many different 6-letter arrangements can be formed using the letters in the word
ABSENT, if each letter is used only once?
a. 6
b. 36
c. 46656
d. 720
10. How many elements are in the sample space of rolling one die?
a. 6
b. 12
c. 24
d. 36
11. A movie theater sells 3 sizes of popcorn (small, medium, and large) with 3 choices of
toppings (no butter, butter, extra butter). How many possible ways can a bag of popcorn
be purchased?
a. 1
b. 3
c. 9
d. 27
1.4 Discrete Probability Distributions
Probability Distribution for a Discrete Random Variable
Quiz 1
1. Classify the following random variables as continuous or discrete:
a. The quantity of fat in a lamb chop
b. The mark out of 50 for a geography test
c. The weight of a seventeen-year-old student.
2. To measure the rainfall over a 24-hour period, the height of water collected in a rain
gauge (up to 200 mm) is used. Identify the random variable being considered, give the
possible values for the random variable and indicate whether the variable is continuous or
discrete.
3. A magazine store recorded the number of magazines purchased by its customers in one
day. 23% purchased one magazine, 38% purchased two, 21% purchased three, 13% purchased
four and 5% purchased five.
a. What is the random variable?
b. What are the possible values of the random variable?
c. Make a random variable probability table.
d. Graph the probability distribution.
Quiz 2
1. Classify the following random variables as continuous or discrete.
a. The volume of water in a cup of coffee
b. The number of trout in a lake
c. The number of hairs on a cat.
2. To investigate the stopping distance for a tire with a new tread pattern, a braking experiment is carried out. Identify the random variable being considered, give the possible values
for the random variable and indicate whether the variable is continuous or discrete.
3. Following is a probability distribution table:
X | P(x)
0 | a
1 | .3333
2 | .1088
3 | .0084
4 | .0007
5 | .0000
a. What is the value of a?
b. What is the value of P(2)?
c. Graph the probability distribution.
Quiz 3
1. Classify the following random variables as continuous or discrete.
a. The length of hairs on a horse
b. The height of a sky-scraper
2. To check the reliability of a new type of light switch, switches are repeatedly turned off
and on until they fail. Identify the random variable being considered, give the possible values
for the random variable and indicate whether the variable is continuous or discrete.
3. Given the following probability distribution:
X | P(x)
0 | 0.07
1 | 0.14
2 | K
3 | 0.46
4 | 0.08
5 | 0.02
a. Find K
b. Find
i. P(x ≥ 2)
ii. P(1 < x ≤ 3)
c. Graph the probability distribution.
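Because the probabilities in a distribution sum to 1, the unknown K can be found by subtraction. A short Python check (variable names illustrative):

```python
# Probability table from Question 3, with the unknown K left out.
known = {0: 0.07, 1: 0.14, 3: 0.46, 4: 0.08, 5: 0.02}

# All probabilities in a distribution must sum to 1.
K = 1 - sum(known.values())
dist = {**known, 2: K}

p_at_least_2 = sum(p for x, p in dist.items() if x >= 2)
p_between = sum(p for x, p in dist.items() if 1 < x <= 3)
print(round(K, 2), round(p_at_least_2, 2), round(p_between, 2))
```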
Mean and Standard Deviation of Discrete Random Variables
Quiz 1
1. Consider the following probability distribution:
X | P(x)
0 | 0.00
1 | 0.23
2 | 0.38
3 | 0.21
4 | 0.13
5 | 0.05
a. Find the mean of the distribution.
b. Find the variance.
c. Find the standard deviation.
2. Find the expected value of the following probability distribution:
X | P(X = x)
2 | 0.3
4 | 0.4
6 | 0.2
8 | 0.1
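Here is a Python sketch for the mean, variance and standard deviation of a discrete distribution, using mean = Σ x·P(x) and variance = Σ (x − mean)²·P(x). The helper `summarize` is a hypothetical name. It applies equally to the tables in Quiz 2 and Quiz 3.

```python
import math


def summarize(dist):
    """Mean, variance and standard deviation of a discrete distribution {x: P(x)}."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance, math.sqrt(variance)


quiz1_q1 = {0: 0.00, 1: 0.23, 2: 0.38, 3: 0.21, 4: 0.13, 5: 0.05}
quiz1_q2 = {2: 0.3, 4: 0.4, 6: 0.2, 8: 0.1}

mean, variance, sd = summarize(quiz1_q1)
expected_value = summarize(quiz1_q2)[0]
print(round(mean, 2), round(variance, 4), round(sd, 3), round(expected_value, 1))
```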
Quiz 2
1. The probability model below describes the number of repair calls that an appliance repair
shop may receive during an hour.
Repair Calls | Probability
0 | 0.1
1 | 0.3
2 | 0.4
3 | 0.2
a. How many calls should the shop expect per hour?
b. What is the standard deviation?
2. Following is a discrete probability distribution:
X | P(X)
0 | 0.54
1 | 0.26
2 | 0.15
3 | 0.03
4 | 0.01
5 | 0.01
>5 | 0.00
a. Find the mean of the distribution
b. Find the variance
c. Find the standard deviation
Quiz 3
1. A random variable X has the following probability distribution:
X | P(X)
1 | 0.1
2 | 0.2
3 | k
4 | 0.2
5 | 0.1
a. Find k
b. Find the mean of the distribution
c. Find the variance of the distribution
d. Find the standard deviation of the distribution
2. Given the following probability distribution:
X | P(x)
0 | 0.9675
8000 | 0.03
20000 | 0.0025
Find the expected value of X.
Binomial Distribution
Quiz 1
1. Suppose x is a binomial random variable with n = 5, p = 0.25. Calculate p(x) for the
values: x = 0, 1, 2, 3, 4, 5. Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 4 and p = 0.5.
a. Display p(x) in tabular form.
b. Compute the mean and the variance of x.
3. In a test for ESP, a subject is told that cards the experimenter can see but the subject cannot
each contain a star, a circle, a wave or a square. As the experimenter looks at each of the 20
cards in turn, the subject names the shape on the card. A subject who is just guessing has
probability .25 of guessing correctly on each card.
a. The count of correct guesses in 20 cards has a binomial distribution. What are n and p?
b. What is the mean number of correct guesses?
c. What is the probability of exactly 5 correct guesses?
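The binomial formula P(X = x) = C(n, x) p^x (1 − p)^(n − x) is easy to tabulate in code. This Python sketch uses `math.comb` (Python 3.8+). The helper `binomial_at_least` is an added convenience for "at least" questions such as those in the chapter tests, not something the original defines.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_table(n, p):
    """Whole distribution {x: P(X = x)} for x = 0, ..., n."""
    return {x: binomial_pmf(x, n, p) for x in range(n + 1)}


def binomial_at_least(k, n, p):
    """P(X >= k), summing the upper tail."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))


# Question 1: n = 5, p = 0.25.
for x, px in binomial_table(5, 0.25).items():
    print(x, round(px, 4))

# Question 3 (ESP test): n = 20 cards, p = 0.25 per card.
n, p = 20, 0.25
mean, variance = n * p, n * p * (1 - p)
p_exactly_5 = binomial_pmf(5, n, p)
```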
Quiz 2
1. Suppose x is a binomial random variable with n = 3, p = 0.2. Calculate p(x) for the
values: x = 0, 1, 2, 3. Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 5 and p = 0.4.
a. Display p(x) in tabular form.
b. Compute the mean and the variance of x.
3. A federal report finds that lie detector tests given to truthful persons have probability .2 of
suggesting that the person is deceptive. A company asks 12 job applicants about thefts
from previous employers, using a lie detector to assess their truthfulness. Suppose that all
12 answer truthfully.
i. What is the probability that the lie detector says all 12 are truthful?
ii. What is the probability that the lie detector says at least one is deceptive?
iii. What is the mean number among 12 truthful persons who will be classified as deceptive?
Quiz 3
1. Suppose x is a binomial random variable with n = 7, p = 0.2. Calculate p(x) for the
values: x = 0, 1, 2, 3, 4, 5, 6, 7. Give the probability distribution in tabular form.
2. Suppose x is a binomial random variable with n = 7 and p = 0.5.
a. Display p(x) in tabular form.
b. Compute the mean and the variance of x.
3. A test for the presence of antibodies to the AIDS virus in blood has probability 0.99
of detecting the antibodies when they are present. Suppose that during a year 20 units of
blood with AIDS antibodies pass through a blood bank.
a. Take X to be the number of these 20 units that the test detects. What is the distribution
of X?
b. What is the probability that the test detects all 20 contaminated units?
c. What is the probability that at least one unit is not detected?
d. What is the mean number of units among the 20 that will be detected?
Geometric Distribution
Quiz 1
1. A basketball player has made 75% of his foul shots during the season. Assuming the shots
are independent, find the probability that in tonight’s game he
a. Misses for the first time on his fifth attempt.
b. Makes his first basket on his fourth shot.
c. Makes his first basket on one of his first 3 shots.
2. Suppose the average number of lions seen on a 1-day safari is 5.
a. What is the probability that tourists will see exactly four lions on the next 1-day safari?
b. What is the probability that tourists will see exactly one lion on the next 1-day safari?
3. At a private university, on average only 36% of the pre-med students enrolled in a given
section of organic chemistry pass. What is the probability that Sarah will have
to take the class three times in order to pass?
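Here is a Python sketch of the two formulas these questions need. The first is the geometric probability (1 − p)^(k − 1)·p of a first success on trial k. The second, for the lion question (a Poisson count rather than a geometric one), is e^(−λ)·λ^k / k!. The function names are hypothetical.

```python
from math import exp, factorial


def geometric_pmf(k, p):
    """P(first success on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson count with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


# Question 1a: treat a miss as the "success", with probability 0.25.
p_first_miss_on_5th = geometric_pmf(5, 0.25)
# Question 1b: first made basket on the fourth shot.
p_first_make_on_4th = geometric_pmf(4, 0.75)
# Question 1c: first basket within three shots = 1 - P(three misses).
p_make_within_3 = 1 - 0.25 ** 3
# Question 2a: exactly four lions, Poisson with mean 5.
p_four_lions = poisson_pmf(4, 5)
# Question 3: Sarah first passes on her third attempt.
p_pass_on_3rd = geometric_pmf(3, 0.36)
```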
Quiz 2
1. A tool hire shop has six lawn mowers which it hires out on a daily basis. The number
of lawn mowers requested per day follows a Poisson probability distribution with mean 4.5.
Find the probability that:
i. exactly three lawn mowers are hired out on any one day;
2. Bob is a high school basketball player. He is a 70% free throw shooter. That means his
probability of making a free throw is 0.70. What is the probability that Bob makes his first
free throw on his fifth shot?
3. Over the course of a season, a basketball player is a 26% free throw shooter. In practice,
her coach tells her to take 50 free throws. What is the probability that she makes her first
basket on the 4th shot?
Quiz 3
1. A statistics professor finds that when she schedules an office hour for student help, an
average of two students arrive. Find the probability that in a randomly selected office hour,
the number of student arrivals is five.
2. In a deck of 52 cards there are 12 face cards.
a. What is the probability of drawing a face card from the deck?
b. If you draw cards with replacement (that is, you replace the card before you choose another
card), what is the probability that the first face card you draw is the tenth card?
3. The mean number of wiring faults in a new house is 8. What is the probability of buying
a new house with exactly 1 wiring fault?
Test 1
1. In a population of students, the number of calculators owned is a random variable x with
P(x = 0) = .2, P(x = 1) = .6 and P(x = 2) = .2. The mean of this probability distribution
is
a. 0
b. 2
c. 1
d. .5
2. Refer to the previous problem. The variance of this probability distribution is
a. 1
b. .63
c. .5
d. .4
e. The answer cannot be computed from the information given.
3. A psychologist studied the number of puzzles subjects were able to solve in a five-minute
period while listening to soothing music. Let X be the number of puzzles completed successfully by a subject. X had the following distribution:
X | P(x)
1 | .2
2 | .4
3 | .3
4 | .1
What is the probability that a randomly chosen subject completes at least 3 puzzles in the
five-minute period while listening to soothing music?
a. .3
b. .4
c. .6
d. .9
e. The answer cannot be computed from the information given.
4. Using the data in problem 3, P(X < 3) =
a. .3
b. .4
c. .6
d. .9
5. Which of the following is not a property of a binomial experiment?
a. It consists of a fixed number of trials n.
b. Outcomes of different trials are independent.
c. Each trial can result in one of several different outcomes.
d. X = the number of successes observed when the experiment is performed.
6. The probability that 0, 1, 2, 3, or 4 people will seek treatment for the flu during any given
hour at an emergency room is shown in the distribution below:
X | P(X)
0 | .12
1 | .25
2 | .32
3 | .24
4 | .06
a. What does the random variable count or measure?
b. What is the mean of X?
c. What is the variance and standard deviation of X?
7. There is a probability of 0.08 that a vaccine will cause a certain side effect. Suppose that
a number of patients are inoculated with the vaccine. We are interested in the number of
patients vaccinated until the first side effect is observed.
a. Define the random variable of interest, X =
b. Find the probability that exactly 5 patients must be vaccinated in order to observe the
first side effect.
c. Construct a probability distribution table for X (up through X = 5).
The National Association of Retailers reports that 65% of all purchases are now made by
credit card; on a typical day a retailer makes 20 sales.
8. Explain why the sales can be considered as Bernoulli trials.
9. What is the probability that the fifth customer is the first one who uses a credit card?
10. Let X = number of customers who use a credit card on a typical day. What is the
probability model for X? Give the mean and standard deviation.
11. What is the probability that on a typical day at least half of the customers use a credit
card?
Test 2
1. Which of the following is not a property of a geometric experiment?
a. It consists of a fixed number of trials n.
b. Outcomes of different trials are independent.
c. Each trial can result in one of two possible outcomes.
d. The probability of success is the same for all trials.
2. If x is a binomial random variable with n = 10 and p = .25, then
a. σX = 1.875
b. σX = √2.5
c. σX = √1.875
d. σX² = √1.875
3. A friend of yours plans to toss a fair coin 150 times. You watch the first 30 tosses, noticing
that she got only 11 heads. Then you get bored and leave. If the coin is fair, how many
heads do you expect her to have when she has finished the 150 tosses?
a. 80
b. 75
c. 92
d. 100
e. 96
4. Which of these has a geometric model?
a. The number of black cards in a 10-card hand
b. The colors of the cars in a parking lot
c. The number of hits a baseball player gets in 6 times at bat.
d. The number of cards drawn from a deck until we find all four aces.
e. The number of people we survey until we find someone who owns an iPod.
5. Which of these has a binomial model?
a. The number of black cards in a 10-card hand
b. The colors of the cars in a parking lot
c. The number of hits a baseball player gets in 6 times at bat.
d. The number of cards drawn from a deck until we find all four aces.
e. The number of people we survey until we find someone who owns an iPod.
6. Coke is running a sales promotion in which 13% of all bottles have a “FREE” logo under
the cap. What is the probability that you find three free ones in a 6-pack? (.0289)
a. 1%
b. 3%
c. 13%
d. 12%
e. 23%
The owner of a store is trying to decide whether to discontinue selling tabloid newspapers.
He suspects that only 5% of the customers buy a tabloid. He decides that for one day he’ll
keep track of the number of customers and whether or not they buy a tabloid.
7. Assuming the owner is correct in thinking that 5% of the customers purchase tabloids,
how many customers should he expect before someone buys a tabloid?
8. What is the probability that he does not sell a tabloid until the 5th customer?
9. What is the probability that exactly 3 of the first 15 customers buy tabloids?
10. What is the probability that at least 5 of his first 40 customers buy tabloids?
11. He had 300 customers that day. Assuming this day was typical for his store, what would
be the mean and standard deviation of the number of customers who buy tabloids each day?
1.5 Normal Distribution
Standard Normal Probability Distribution
Quiz 1
1. When a specific vegetable is grown in a certain manner without fertilizer, the weights of the
vegetables produced are normally distributed with a mean of 40 grams and a standard deviation of
10 grams. Determine the proportion of the vegetables grown:
a. With weights less than 50 grams
b. With weights greater than or equal to 60 grams
2. A clock manufacturer investigated the accuracy of its clocks after a year of continuous use.
He found that the mean error was 0 minutes with a standard deviation of 2 minutes. If a
buyer purchases 600 of these clocks, find the expected number that will be on time or up to
4 minutes fast after a year of continuous use.
3. IQ tests are standardized to a normal model with a mean of 100 and a standard deviation
of 16. Draw the model for these IQ scores. Clearly label it, showing what the empirical rule
predicts about the scores.
4. The average reading speed of students completing a speed-reading course is 450 words
per minute. If the standard deviation is 70 words per minute, find the z score associated
with a reading speed of 420 words per minute.
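Normal probabilities can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+) instead of a printed table. This sketch assumes that "fast" clock errors are recorded as positive minutes. Table lookups may differ from these values in the fourth decimal place.

```python
from statistics import NormalDist

# Question 1: vegetable weights ~ Normal(mean 40 g, sd 10 g).
weights = NormalDist(mu=40, sigma=10)
p_under_50 = weights.cdf(50)
p_at_least_60 = 1 - weights.cdf(60)

# Question 2: clock error ~ Normal(mean 0 min, sd 2 min).
# Assumption: "fast" errors are recorded as positive minutes.
error = NormalDist(mu=0, sigma=2)
p_on_time_to_4_fast = error.cdf(4) - error.cdf(0)
expected_clocks = 600 * p_on_time_to_4_fast

# Question 4: z score for a reading speed of 420 words per minute.
z_420 = (420 - 450) / 70
print(round(p_under_50, 4), round(p_at_least_60, 4),
      round(expected_clocks, 1), round(z_420, 2))
```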
5. Given the following set of data, find the mean, the standard deviation and the z score of each
speed, and create a normal probability plot from your results. Based on your plot, comment
on the normality of the distribution.
Speed: 31, 29, 31, 34, 27, 34, 37, 28, 29, 30, 26, 29, 24, 38, 34, 31, 36, 29, 31, 34, 34, 32, 36
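Here is a Python sketch of the z scores and the points of a normal probability plot. It uses the sample standard deviation and the plotting positions (i − 0.5)/n. Both are common conventions but are assumptions here, since the exercise does not specify them. If the plotted points lie close to a straight line, the data are consistent with a normal model.

```python
from statistics import NormalDist, mean, stdev

speeds = [31, 29, 31, 34, 27, 34, 37, 28, 29, 30, 26, 29,
          24, 38, 34, 31, 36, 29, 31, 34, 34, 32, 36]

m = mean(speeds)
s = stdev(speeds)  # sample standard deviation (n - 1 in the denominator)
z_scores = [(x - m) / s for x in speeds]

# Normal probability plot: pair each sorted value with the z value expected
# at its rank. Assumption: plotting positions (i - 0.5) / n, one common choice.
n = len(speeds)
expected_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
plot_points = list(zip(expected_z, sorted(speeds)))
for ez, x in plot_points:
    print(f"{ez:6.2f}  {x}")
```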
Quiz 2
1. When a certain vegetable is grown without fertilizer, the vegetables produced have weights
that are normally distributed with a mean of 140 grams and a standard deviation of 40 grams.
Determine the proportion of the vegetables grown:
a. With weights less than 60 grams.
b. With weights between 20 and 60 grams.
2. A clock manufacturer investigated the accuracy of its clocks after a year of continuous use.
He found that the mean error was 0 minutes with a standard deviation of 2 minutes. If a
buyer purchases 600 of these clocks, find the expected number that will be on time or up to
6 minutes slow after a year of continuous use.
3. IQ tests are standardized to a normal model with a mean of 100 and a standard deviation
of 16. Draw the model for these IQ scores. In what interval would you expect the central
68% of the IQ scores to be found?
4. The average reading speed of students completing a speed-reading course is 450 words
per minute. If the standard deviation is 70 words per minute, find the z score associated
with a reading speed of 475 words per minute.
5. IQ scores for a random sample of people are shown below.
72, 79, 87, 91, 99, 101, 103, 106, 111, 113, 116, 126
Find the mean, standard deviation and z score for each and create a normal probability plot.
Based on your plot, comment on the normality of this data.
Quiz 3
1. The height of male students is normally distributed with a mean of 170 cm and a standard
deviation of 8 cm. Find the percentage of male students whose height is between 162 cm
and 170 cm.
2. A clock manufacturer investigated the accuracy of its clocks after a year of continuous
use. He found that the mean error was 0 minutes with a standard deviation of 2 minutes.
If a buyer purchases 600 of these clocks, find the expected number that will be between
4 minutes slow and 6 minutes fast after a year of continuous use.
3. Recent tests of automobiles found a mean of 24.8 mpg and a standard deviation of
6.2 mpg for highway driving. Assume a Normal model can be applied. Draw
the model for auto fuel economy. Clearly label it, showing what the empirical rule predicts
about miles per gallon.
4. The average reading speed of students completing a speed-reading course is 450 words
per minute. If the standard deviation is 70 words per minute, find the z score associated
with a reading speed of 320 words per minute.
5. Given the following data: 22 17 18 29 22 23 24 23 17 21. Find the mean and standard
deviation of the data and the z score for each value, and create a normal probability plot.
Based on your plot, comment on the normality of the data.
The Density Curve of the Normal Distribution
Quiz 1
1. Given that a random variable X is normally distributed with a mean of 70 and a standard
deviation of 4, find P(X ≥ 74) by first converting to the standard variable z and then using
the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with a mean of 64 cm and a
standard deviation of 4 cm. Find the percentage of 18-year-old females whose arm lengths are
between 59 cm and 74 cm.
3. Use the table to verify the empirical rule for P(−2 ≤ Z ≤ 2).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal
distribution with a mean of 41 cm and a variance of 11 square cm. Find the proportion of
fish measuring between 40 cm and 50 cm.
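A common slip in question 4 is using the variance where the standard deviation belongs. The sketch below takes the square root of the variance first, then computes the probability. It assumes Python 3.8+ for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# The problem gives the variance (11 cm^2); the standard deviation is its square root.
sigma = math.sqrt(11)
lengths = NormalDist(mu=41, sigma=sigma)

p = lengths.cdf(50) - lengths.cdf(40)  # P(40 < X < 50)
print(f"sigma = {sigma:.3f} cm, P(40 < X < 50) = {p:.4f}")
```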
Quiz 2
1. Given that a random variable X is normally distributed with a mean of 70 and a standard
deviation of 4, find P(X ≤ 68) by first converting to the standard variable z and then using
the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with a mean of 64 cm and a
standard deviation of 4 cm. Find the percentage of 18-year-old females whose arm lengths are
greater than 61 cm.
3. Use the table to verify the empirical rule for P(−3 ≤ Z ≤ 3).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal
distribution with a mean of 41 cm and a variance of 11 square cm. If a fish is randomly
selected, find the probability that it is at least 50 cm.
Quiz 3
1. Given that a random variable X is normally distributed with a mean of 70 and a standard
deviation of 4, find P(60.6 ≤ X ≤ 68.4) by first converting to the standard variable z and
then using the table of standard normal probabilities.
2. The arm lengths of 18-year-old females are normally distributed with a mean of 64 cm and a
standard deviation of 4 cm. Find the probability that a randomly chosen 18-year-old female
has an arm length in the range 55 cm to 67 cm.
3. Use the table to find P(−1.64 ≤ Z ≤ 1.64).
4. Fish are washed onto a beach after a storm. Their lengths are found to have a normal
distribution with a mean of 41 cm and a variance of 11 square cm. How many fish from a
sample of 200 would you expect to measure at least 45 cm?
Applications of the Normal Distribution
Quiz 1
1. Circular tokens are used to operate a washing machine. The diameters of the tokens are
known to be normally distributed. Only tokens with diameters between 1.93 and 2.05 cm
will operate the machine.
a. Find the mean and standard deviation of the distribution given that 2% of the tokens are
too small and 3% are too large.
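When two percentiles of a normal model are known, each gives one equation of the form x = μ + zσ. Subtracting the two equations gives σ, and back-substituting gives μ. The sketch below automates this in Python 3.8+ with `statistics.NormalDist`. The helper `solve_mean_sd` is a hypothetical name, not a library function. For the tokens, 2% lie below 1.93 cm, and 3% lie above 2.05 cm, so 97% lie below 2.05 cm.

```python
from statistics import NormalDist

def solve_mean_sd(a, p_a, b, p_b):
    """Return (mu, sigma) of a normal model with P(X < a) = p_a and P(X < b) = p_b.

    Uses a = mu + z_a * sigma and b = mu + z_b * sigma, where z_a and z_b are
    standard normal quantiles; subtracting the equations gives sigma.
    """
    std = NormalDist()  # standard normal
    z_a, z_b = std.inv_cdf(p_a), std.inv_cdf(p_b)
    sigma = (b - a) / (z_b - z_a)
    mu = a - z_a * sigma
    return mu, sigma

# Tokens: 2% below 1.93 cm, 97% below 2.05 cm
mu, sigma = solve_mean_sd(1.93, 0.02, 2.05, 0.97)
print(f"mean = {mu:.4f} cm, sd = {sigma:.4f} cm")
```

The same helper reproduces the answer key for the walking-age question in Quiz 3: `solve_mean_sd(10, 0.05, 13, 0.75)` gives a mean of about 12.1 months and a standard deviation of about 1.3 months.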
2. From the results of a statistics test, the average score was 46 with a standard deviation of
25. The teacher decided to give an A to the top 7% of the students in the class. Assuming
that the scores were normally distributed, find the lowest score that a student must obtain
in order to achieve an A.
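The lowest A score is the 93rd percentile of the score distribution, because the top 7% lie above it. A minimal Python 3.8+ sketch using the inverse normal function `NormalDist.inv_cdf`:

```python
from statistics import NormalDist

scores = NormalDist(mu=46, sigma=25)

# The top 7% starts at the 93rd percentile
cutoff = scores.inv_cdf(0.93)
print(f"lowest A score = {cutoff:.1f}")
```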
3. Assume a normal distribution. Find the missing parameter.
µ = 20, 45% above 30, σ =?
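For the missing-parameter questions, the known value sits z standard deviations from the mean, so σ = (x − μ)/z. The sketch below is a Python 3.8+ helper with a hypothetical name, `missing_sigma`. It gets z from the fraction of the distribution above x.

```python
from statistics import NormalDist

def missing_sigma(mu, x, fraction_above):
    """Standard deviation of a normal model with mean mu and P(X > x) = fraction_above."""
    z = NormalDist().inv_cdf(1 - fraction_above)  # z score of x
    return (x - mu) / z

sigma = missing_sigma(mu=20, x=30, fraction_above=0.45)
print(f"sigma = {sigma:.2f}")
```

Only 5% of the distribution lies between the mean of 20 and the value 30, so 30 is close to the center in z units and σ comes out large (about 79.6). The same helper applies to Quiz 2, question 1: `missing_sigma(0.64, 0.70, 0.12)` gives about 0.051.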
Quiz 2
1. Assume a normal distribution. Find the missing parameter.
µ = 0.64, 12% above 0.70, σ =?
2. a. The average weight of eggs produced by young hens is 50.9 grams, and only 28% of
the eggs exceed 54 grams. Assuming the normal distribution is appropriate, what would the
standard deviation of the egg weights be? (5.3 grams)
b. When the hens have reached the age of 1 year, the eggs they produce average 67.1 grams,
and 98% of them are above 54 grams. Again assuming a normal distribution, what is the
standard deviation of these egg weights? (6.4 grams)
c. Are egg sizes more consistent for the younger hens or the older ones? Explain. (younger
since sd is smaller)
3. A tire manufacturer believes that the tread life of its snow tires can be described by a
normal distribution with a mean of 32,000 miles and a standard deviation of 2,500 miles. He
wants to offer a refund to any customer whose tires fail to last a certain number of miles.
He is willing to give a refund to no more than 1 of every 25 customers. For what mileage can
he guarantee these tires to last?
Quiz 3
1. Assuming a normal distribution, find the missing parameter.
σ = 5, 80% below 100, µ =? (95.79)
2. While only 5% of babies have learned to walk by the age of 10 months, 75% are walking
by the age of 13 months. If the age at which babies develop the ability to walk can be
described by a normal model, find the mean and standard deviation of the model. (mean
= 12.1, sd = 1.3)
3. A department store sells furniture that is advertised with the claim that it “takes less
than an hour to assemble.” However, through surveys the store has learned that only 25% of
its customers succeeded in building the furniture in under an hour; 5% said it took them
over 2 hours. The store assumes that consumer assembly time follows a normal distribution.
a. Find the mean and standard deviation of this model. (mean = 1.29 hours, sd = .43 hours)
b. The store wants to change its advertising claim. What assembly time should the
store quote in order that 60% of the customers succeed in finishing the assembly by then?
(1.4 hours)
Test 1
1. The weights of cockroaches living in a typical college dormitory are approximately normally distributed with a mean of 80 grams and a standard deviation of 4 grams. The
percentage of cockroaches weighing between 77 grams and 83 grams is about:
a. 99.7%
b. 95%
c. 68%
d. 55%
e. 34%
2. Scores on the ACT are normally distributed with a mean of 18 and a standard deviation
of 6. The interquartile range of the scores is approximately:
a. 8.1
b. 12
c. 6
d. 10.3
e. 7
3. The test grades at a large school have an approximately normal distribution with a mean
of 50. What is the standard deviation of the data so that 80% of the students are within 12
points (above or below) of the mean?
a. 5.875
b. 9.375
c. 10.375
d. 14.5
e. cannot be determined from the given information
4. The average cost per ounce for glass cleaner is 7.7 cents with a standard deviation of
2.5 cents. What is the z-score of Windex with a cost of 10.1 cents per ounce?
a. .96
b. 1.31
c. 1.94
d. 2.25
e. 3.00
5. Jay Olshansky from the University of Chicago was quoted in Chance News as arguing
that for the average life expectancy to reach 100, 18% of the people would have to live to
age 120. What standard deviation is he assuming for this statement to make sense?
a. 21.7
b. 24.4
c. 25.2
d. 35.0
e. 111.1
6. The best male long jumpers for State College since 1973 have averaged a jump of
263.0 inches with a standard deviation of 14.0 inches. The best female long jumpers have
averaged 201.2 inches with a standard deviation of 7.7 inches. This year Joey jumped
275 inches and his sister, Carla, jumped 207 inches. Both are State College students. Assume that the lengths of jumps for both males and females are approximately normal. Within
their groups, which athlete had the more impressive performance?
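Comparing performances from different groups comes down to comparing z-scores, which measure distance from each group's mean in standard-deviation units. The function below is a minimal Python sketch; its name is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

joey = z_score(275, 263.0, 14.0)   # males: mean 263.0 in, sd 14.0 in
carla = z_score(207, 201.2, 7.7)   # females: mean 201.2 in, sd 7.7 in
print(f"Joey z = {joey:.2f}, Carla z = {carla:.2f}")
```

Joey's jump is about 0.86 standard deviations above his group's mean and Carla's is about 0.75, so Joey's performance is the more impressive one within its group.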
7. The length of pregnancies from conception to natural birth among a certain female
population is a normally distributed random variable with a mean of 270 days and a standard
deviation of 10 days.
a. What is the percent of pregnancies that last more than 300 days?
b. How short must a pregnancy be in order to fall in the shortest 10% of all pregnancies?
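Part (a) is an upper-tail probability, and part (b) is a percentile found with the inverse normal. A short Python 3.8+ sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

days = NormalDist(mu=270, sigma=10)

p_long = 1 - days.cdf(300)   # (a) P(X > 300), i.e. z > 3
cutoff = days.inv_cdf(0.10)  # (b) 10th percentile of pregnancy length

print(f"P(X > 300) = {p_long:.4f}, shortest 10% last less than {cutoff:.1f} days")
```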
8. For a normally distributed population, fill in the following blanks:
a. ____% of the population observations lie within 1.96 standard deviations on either side of
the mean.
b. ____% of the population observations lie within 1.64 standard deviations on either side of
the mean.
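The proportion within k standard deviations of the mean of any normal population is P(−k < Z < k) = 2Φ(k) − 1. In the sketch below, Φ is taken from Python 3.8+'s `statistics.NormalDist`. The function name `within` is illustrative.

```python
from statistics import NormalDist

def within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return 2 * NormalDist().cdf(k) - 1

for k in (1.96, 1.64):
    print(f"within {k} sd: {100 * within(k):.1f}%")
```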
9. Find the proportion of observations from a standard normal distribution that satisfies each
of these statements. In both cases, sketch a standard normal curve and shade the area
under the curve that answers the question.
a. Z > −1.68
b. −0.84 < Z < 1.26
Test 2
1. The empirical rule can be used to assess a distribution if
a. The distribution is approximately normal
b. The distribution is skewed
c. The distribution is heavily tailed
d. The standard deviation is close to the interquartile range
e. The mean is equal to 0 and the standard deviation is equal to 1
2. Which of the following are true statements?
I. The area under the normal curve is always equal to 1.
II. The smaller the standard deviation of a normal curve, the higher and narrower the graph.
III. Normal curves with different means are centered around different numbers.
a. I and II
b. I and III
c. II and III
d. All of the above
e. None of the above
3. The heights of adult women are approximately normally distributed about a mean of
65 inches with a standard deviation of 2 inches. If Rachel is at the 99th percentile in height
for all adult women, then her height, in inches, is closest to
a. 60
b. 62
c. 68
d. 70
e. 74
4. Joan’s doctor told her that the standardized score for her systolic blood pressure, as
compared to the blood pressure of other women her age, is 1.50. Which of the following is
the best interpretation of her standardized score?
a. Joan’s systolic blood pressure is 150.
b. Joan’s systolic blood pressure is 1.5 standard deviations above the average systolic blood
pressure of women her age.
c. Joan’s systolic blood pressure is 1.5 above the average systolic blood pressure of women
her age
d. Joan’s systolic blood pressure is 1.5 times the average systolic blood pressure of women
her age
e. Only 1.5% of women Joan’s age have a higher systolic blood pressure than she does.
5. Suppose the test scores of 600 students are normally distributed with a mean of 76 and
standard deviation of 8. The number of students scoring between 72 and 80 is
a. 272
b. 164
c. 230
d. 136
e. 328
6. Which of the following is NOT CORRECT about a standard normal distribution?
a. P (0 < Z < 1.50) = .4332
b. P (Z < −1.0) = .1587
c. P (Z > 2.0) = .0228
d. P (Z < 1.5) = .9332
e. P (Z < −2.5) = .4938
7. At a college the scores on the chemistry final exam are approximately normally distributed,
with a mean of 75 and a standard deviation of 12. The scores on the calculus final are also
approximately normally distributed with a mean of 80 and a standard deviation of 8. A
student scored 81 on the chemistry exam and 84 on the calculus final. Relative to the other
students in each class, in which subject did this student do better?
8. Men’s shirt sizes are determined by their neck sizes. Suppose that men’s neck sizes are
approximately normally distributed with a mean of 15.7 inches and a standard deviation of
0.7 inch. A retailer sells men’s shirts in sizes S, M, L, and XL, where the shirt sizes are
defined as follows:
Table 1.7:
Shirt Size    Neck Size
S             14 ≤ neck size < 15
M             15 ≤ neck size < 16
L             16 ≤ neck size < 17
XL            17 ≤ neck size < 18
a. Because the retailer only stocks the sizes listed above, what proportion of customers will
find that the retailer does not carry any shirts in their sizes? Show your work.
b. Using a sketch of a normal curve, illustrate the percentage of men whose shirt size is M.
Calculate this percentage.
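Both parts of question 8 reduce to normal areas for neck size, modeled as N(15.7, 0.7). In part (a), the stocked sizes cover 14 ≤ neck size < 18, so the customers without a shirt are those in the two tails. The sketch assumes Python 3.8+ for `statistics.NormalDist`. Because the model is continuous, ≤ and < give the same areas.

```python
from statistics import NormalDist

neck = NormalDist(mu=15.7, sigma=0.7)

# (a) Below 14 or at least 18: no stocked size fits
p_no_shirt = neck.cdf(14) + (1 - neck.cdf(18))

# (b) Size M covers 15 <= neck size < 16
p_medium = neck.cdf(16) - neck.cdf(15)

print(f"no shirt: {100 * p_no_shirt:.2f}%, size M: {100 * p_medium:.1f}%")
```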
9. If Z has a standard normal distribution, use technology to find k such that P(Z ≤ k) = 0.878.
10. Find the mean and the standard deviation of a normally distributed random variable X if
P(X ≥ 50) = 0.2 and P(X ≤ 20) = 0.3.
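Questions 9 and 10 both use the inverse normal function. In question 10, 50 is the 80th percentile and 20 is the 30th percentile. That gives two equations of the form x = μ + zσ, which can be solved for σ and then μ. A minimal Python 3.8+ sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

# Question 9: k is the 0.878 quantile of Z
k = std.inv_cdf(0.878)

# Question 10: 50 = mu + z_80 * sigma and 20 = mu + z_30 * sigma
z_80, z_30 = std.inv_cdf(0.80), std.inv_cdf(0.30)
sigma = (50 - 20) / (z_80 - z_30)
mu = 50 - z_80 * sigma

print(f"k = {k:.3f}; mean = {mu:.2f}, sd = {sigma:.2f}")
```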
11.
1.6 Planning and Conducting an Experiment or Study
Surveys and Sampling
Quiz 1
1. A magazine mailed a questionnaire to the human resource directors of all of the Fortune
500 companies, and received responses from 28% of them. Those responding reported that
they did not find that such surveys intruded significantly on their workday.
a. Identify the population of interest
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. The following question was part of a survey taken by the PTA (parent/teacher association)
in an effort to obtain parents’ opinions. Do you think the response to the following question
might be biased? If yes, propose a question with more neutral wording that might better
assess parental response. (answers will vary – wording bias)
Should elementary school-age children have to pass high stakes tests in order to remain with
their classmates?
3. What type of sampling is evident in the following:
You want to determine the reading level of a book. You choose a chapter of the book at
random, then a page from that chapter and you find 560 words on the page. You take your
sample by choosing every 28th word on the page.
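Choosing every 28th word is systematic sampling. On a 560-word page, it yields 560/28 = 20 words. The sketch below assumes the starting word is picked at random from the first 28, which is the usual textbook form. The question does not say how the start is chosen. The helper name and the placeholder "words" are hypothetical.

```python
import random

def systematic_sample(items, step, rng=random):
    """Take every `step`-th item, starting at a random position among the first `step`."""
    start = rng.randrange(step)  # random start: 0 .. step - 1
    return items[start::step]

# Hypothetical 560-word page (placeholder tokens, not real text)
page = [f"word{i}" for i in range(1, 561)]
sample = systematic_sample(page, 28, random.Random(1))
print(len(sample), sample[:3])
```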
4. Define simple random sample and give an example.
Quiz 2
1. A consumers union asked all subscribers whether they had used alternative medical treatments and, if so, whether they had benefited from them. For almost all of the treatments,
approximately 24% of those responding reported cures or substantial improvement in their
condition.
a. Identify the population of interest.
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. The following question was part of a survey taken by the PTA (parent/teacher association)
in an effort to obtain parents’ opinions. Do you think the response to the following question
might be biased? If yes, propose a question with more neutral wording that might better
assess parental response.
Should schools and students be held accountable for meeting yearly learning goals by testing
students before they advance to the next grade? (Answers will vary – wording bias)
3. What type of sampling is evident in the following:
You want to survey how students feel about funding for the basketball team at a large
university. The campus is 65% men and 35% women. You select 65 men at random and
then 35 women at random.
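Drawing a separate simple random sample from each group, sized in proportion to the group's share of the campus, is proportional stratified sampling. The Python sketch below uses `random.sample` on a hypothetical roster of 6,500 men and 3,500 women. The roster and the helper name are illustrative.

```python
import random

def stratified_sample(strata, sizes, rng=random):
    """Draw a simple random sample of the given size from each stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical roster matching the 65% / 35% split
strata = {
    "men": [f"M{i}" for i in range(6500)],
    "women": [f"W{i}" for i in range(3500)],
}
sample = stratified_sample(strata, {"men": 65, "women": 35}, random.Random(7))
print({name: len(chosen) for name, chosen in sample.items()})
```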
4. Define cluster sampling and give an example.
Quiz 3
1. Researchers waited outside a bar they had randomly selected from a list of all bars. They
stopped every 10th person who came out of the bar and asked whether he or she thought
drinking and driving was a serious problem.
a. Identify the population of interest.
b. The population parameter of interest
c. The sampling frame
d. The sample
e. The sampling method
f. Any potential source of bias
2. Examine each of the following questions for possible bias. If you think the question is
biased, indicate how and propose a better question.
a. Should companies that pollute the environment be compelled to pay the costs of cleanup?
b. Given that 18-year-olds are old enough to vote and to serve in the military, is it fair to
set the drinking age at 21?
3. What type of sampling is evident in the following: You want to assess the reading level of
a book. You pick one page at random and use every word on that page.
4. Define stratified sampling and give an example.
Experimental Design
Quiz 1
1. Over a 6-month period, among 25 people with a mental disorder, patients who were
given a high dose of omega-3 fats from fish oil improved more than those given a placebo.
a. Is this an observational study or an experiment?
b. If it is an experiment, identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Dentists in a dental clinic were trying to determine whether the number of new cavities
differs for people who eat an apple each day and for people who eat less than one apple
per week. Two groups of clinic patients would be studied. One group would consist of 50
patients who report that they eat an apple each day; the other group would consist of 50
patients who report that they eat less than one apple per week. Dentists would examine the
patients and their records to determine the number of new cavities the patients had over the
preceding year and compare the two groups.
a. Explain why this is an observational study
b. What is the confounding variable?
3. A medical researcher is interested in testing a new medicine for migraine headaches. She
decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches
at a rate of one per week or more. Although age and gender are not of primary interest in
the trials, the researcher is concerned that these factors may impact the effectiveness of the
drug.
Describe how she should set up her experiment for the 100 subjects if she wishes to control
for gender.
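Controlling for gender means using gender as a blocking variable. The subjects are split into a women's block and a men's block, treatments are assigned at random separately within each block, and results are compared within blocks. The sketch below assumes the two treatments are the new medicine and a placebo, since the question does not name the control. It also uses a hypothetical roster of 52 women and 48 men. The function name is illustrative.

```python
import random

def block_randomize(subjects, block_of, treatments=("new medicine", "placebo"), rng=random):
    """Randomly assign treatments separately within each block.

    Within a block, subjects are shuffled and dealt out to the treatments in turn,
    so each treatment receives (nearly) equal numbers from every block.
    """
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        members = members[:]  # copy before shuffling
        rng.shuffle(members)
        for i, s in enumerate(members):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical trial: 100 subjects, 52 women (F) and 48 men (M)
gender = {f"S{i:03d}": ("F" if i < 52 else "M") for i in range(100)}
plan = block_randomize(list(gender), gender.get, rng=random.Random(3))
```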
Quiz 2
1. There is some concern that if women who take estrogen after menopause also drink
alcohol, their estrogen levels will rise too high. Twenty-four volunteers,
12 who were receiving supplemental estrogen and 12 who were not, were randomly divided
into two groups. One group drank an alcoholic beverage and the other drank a nonalcoholic
beverage. An hour later everyone’s estrogen level was checked. Only those on supplemental
estrogen who drank alcohol showed a marked increase.
a. Is this an experiment or an observational study?
b. If it is an experiment identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Researchers wanted to compare the effects of a new drug to the effects of an existing
drug for reducing cholesterol levels in patients. They designed a completely randomized
experiment using volunteers with a history of high cholesterol levels. An improved design
would incorporate blocking. Name two different variables that one might use for the block
design.
3. A medical researcher is interested in testing a new medicine for migraine headaches. She
decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches
at a rate of one per week or more. Although age and gender are not of primary interest in
the trials, the researcher is concerned that these factors may impact the effectiveness of the
drug.
Describe how she should set up her experiment for the 100 subjects if she wishes to control
for age. She decides on age categories of young (21–35), middle (36–55) and elderly (over
55).
Quiz 3
1. Some gardeners prefer to use non-chemical methods to control insects in their gardens.
Researchers have designed two kinds of traps, and want to know which design will be more
effective. They randomly choose 10 locations in a large garden and place one of each kind
of trap at each location. After a week they count the number of insects in each trap.
a. Is this an experiment or an observational study?
b. If it is an experiment identify (if possible)
i. The subjects studied
ii. The factor(s) in the experiment
iii. The design
iv. Whether it was blind or double blind
2. Researchers wanted to compare the weight gain for salmon raised on an old type of food
with their weight gain using a new type of food. The fish were randomly assigned to tanks,
but the tanks were located in areas where room temperature varied greatly.
a. What is the response variable?
b. What is the explanatory variable?
c. How many treatments are there?
d. What is the blocking variable?
3. A medical researcher is interested in testing a new medicine for migraine headaches. She
decides to conduct a clinical trial on 100 randomly selected adults who get migraine headaches
at a rate of one per week or more. Although age and gender are not of primary interest in
the trials, the researcher is concerned that these factors may impact the effectiveness of the
drug.
Describe how she should set up her experiment for the 100 subjects if she wishes to control
both for age and gender.
Test 1
1. The student government at a high school wants to conduct a survey of student opinion. It
wants to begin with a random sample of 60 students. Which of the following survey methods
will produce a stratified random sample?
a. Survey the first 60 students to arrive at school in the morning.
b. Survey every 10th student entering the school library until 60 are surveyed.
c. Number all students on the official school roster and then use random numbers to choose
15 freshmen, 15 sophomores, 15 juniors and 15 seniors.
d. Number the cafeteria seats, and use a table of random numbers to choose seats and
interview the students until 60 have been interviewed.
e. Number the students in the official school roster, and then use random numbers to choose
60 students from the roster for the survey.
2. Which of the following can be used to show a cause-and-effect relationship between two
variables?
a. A census
b. A controlled experiment
c. An observational study
d. A sample survey
e. A cross-sectional survey
3. To check the effects of cold temperatures on the elasticity of two brands of rubber bands,
one box of Brand A and one box of Brand B rubber bands are tested. Ten rubber bands
from the Brand A box are placed in a freezer for two hours, and ten bands from the Brand
B box are kept at room temperature. The amount of stretch before breaking is measured on
each rubber band, and the mean for the cold bands is compared to the mean for the others.
Is this good experimental design?
a. No, because means are not proper statistics for comparison.
b. No, because more than two brands should be used.
c. No, because more temperatures should be used.
d. No, because temperature is confounded with brand.
e. Yes
4. The Physicians’ Health Study, a large medical experiment involving 22,000 male physicians,
attempted to determine whether aspirin could help prevent heart attacks. In this
study, one group of 11,000 physicians took an aspirin every other day, while a control group
took a placebo. After several years, it was determined that the physicians in the group
that took the aspirin had significantly fewer heart attacks than the physicians in the control
group. Which of the following explains why it would NOT be appropriate to say that
everyone should take an aspirin every other day?
I. The study included only physicians, and different results may occur in individuals in other
occupations.
II. The study included only males, and there may be different results for females.
III. Although taking aspirin may be helpful in preventing heart attacks, it may be harmful
to some other aspects of health.
a. I only
b. II only
c. III only
d. II and III only
e. I, II and III
5. Which of the following is NOT a source of bias in survey design?
a. Undercoverage
b. Non-response
c. Wording of questions
d. Voluntary response
e. All are sources of bias
6. Suppose you wish to compare the AP Statistics exam results for the male and female
students taking AP Statistics at your school. Which is the most appropriate technique for
gathering the needed data?
a. Census
b. Sample survey
c. Experiment
d. Observational study
e. None of these is appropriate
7. In addition to control by comparing several treatments, the TWO other basic principles
which distinguish experiments from observational studies include:
I. Randomization, i.e., assigning researchers by chance
II. Randomization, i.e., assigning subjects by chance
III. Replication, i.e., doing a study more than once
IV. Replication, i.e., doing a study with many subjects
a. I and III
b. I and IV
c. II and III
d. II and IV
e. IV and V
8. Which of the following statements are true?
I. Random sampling is a good way to reduce response bias.
II. To guard against bias from undercoverage, use a convenience sample.
III. Increasing the sample size tends to reduce survey bias.
IV. To guard against nonresponse bias, use a mail-in survey.
A. I only
B. II only
C. III only
D. IV only
E. None of the above.
9. A food company assesses the nutritional quality of a new “instant breakfast” product by
feeding it to newly weaned male white rats. The response variable is a rat’s weight gain over
a 28 day period. A control group of rats eats a standard diet but otherwise receives exactly
the same treatment as the experimental group.
a. How many factors does this experiment have?
b. How many levels for each factor?
c. The experimenters had 30 rats for this experiment. How should they set up the experiment?
10. Many utility companies have introduced programs to encourage energy conservation
among their customers. An electric company considers placing electronic indicators in a
household to show what the cost would be if the electricity use at that moment continued for
a month. Will indicators reduce electricity use? Would a cheaper method work almost as well?
The company decides to design an experiment. One cheaper method is to give customers
a chart and information about monitoring their electricity use. The experiment compares
these two approaches (indicator, chart) and a control. The control receives information
about energy conservation but no help in monitoring electricity use. The response variable
is total electricity used in a year. The company finds 60 single family residences in the same
city that are willing to participate. What will the design look like?
Test 2
1. A marketing company offers to pay $35 to the first 200 persons who respond to their
advertisement and complete a questionnaire regarding displays of their client’s product.
The situation is an example of which of the following?
a. Simple random sample
b. Convenience sample
c. Voluntary response sample
d. Multistage cluster sample
e. None of the above
2. A simple random sample was selected of large urban school districts throughout New
England. The selected districts were identified as target districts. Within each district, a
simple random sample of its high schools was chosen and the principals of those high schools
were interviewed. Which of the following statements regarding this design is NOT true?
a. This is an example of multi-stage cluster sample.
b. Results from the interviews cannot be used to infer responses of the population of interest.
c. The population of interest is the set of all high school principals from large urban school
districts in New England.
d. Not every subset of principals has the same chance of selection.
e. All of these statements are true.
3. Which of the following is an example of a census?
a. Every fifth person leaving a supermarket is asked to name his or her favorite brand of
peanut butter.
b. Each employee in a corporation fills out a questionnaire for a management survey.
c. All the students who are at a school on a particular day rate the food in the cafeteria.
d. A telephone political poll selects ten names from every page of a city directory.
e. All the commuters who are dissatisfied with the service of their commuter train company
are asked to write a letter of complaint.
4. Subjects are randomly assigned to watch either a horror movie or a comedy, and the
amount of popcorn they eat during the movie is measured. For this experiment, the type of
movie is
a. An independent variable.
b. A dependent variable
c. A continuous variable.
d. A constant
5. Use the following excerpt from a random digit table to answer this question:
21052 65031 45074 92846 67815 78231 01548 20235 56410 82713
If data are labeled 1. Chevy; 2. Plymouth; 3. Lincoln; 4. Volkswagen; 5. Porsche; 6. Ford;
and single-digit random selection begins at the left side of the row, which cars would be
included in a simple random sample of three cars?
a. Plymouth, Lincoln, Chevy
b. Plymouth, Chevy, Porsche
c. Plymouth, Ford, Porsche
d. Lincoln, Plymouth, Porsche
e. None of the above
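Selecting from a random digit table means reading the digits one at a time, skipping any digit without a label (0 and 7 to 9 here), and skipping repeats until three cars are chosen. The Python sketch below mirrors that procedure. It assumes the ten digit groups form a single row read left to right. The names `LABELS`, `ROW` and `select` are illustrative.

```python
LABELS = {1: "Chevy", 2: "Plymouth", 3: "Lincoln", 4: "Volkswagen", 5: "Porsche", 6: "Ford"}
ROW = "21052 65031 45074 92846 67815 78231 01548 20235 56410 82713"

def select(row, labels, k):
    """Read single digits left to right, keeping labeled, not-yet-chosen items until k are found."""
    chosen = []
    for ch in row.replace(" ", ""):
        d = int(ch)
        if d in labels and labels[d] not in chosen:  # skip unlabeled digits and repeats
            chosen.append(labels[d])
            if len(chosen) == k:
                break
    return chosen

cars = select(ROW, LABELS, 3)
print(cars)
```

The digits read 2, 1, 0, 5, ...; the 0 is skipped, so the sample is Plymouth, Chevy, Porsche (choice b).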
6. A nutritionist wants to study the effect of storage time (6, 12, 18 months) on the amount
of vitamin C present in freeze dried fruit when stored for these lengths of time. Vitamin
C is measured in milligrams per 100 milligrams of fruit. Six fruit packs were randomly
assigned to each of the three storage times. The treatment, experimental unit and response
are respectively,
a. A specific storage time, amount of vitamin C, a fruit pack
b. A fruit pack, amount of vitamin C, a specific storage time
c. Random assignment, a fruit pack, amount of vitamin C
d. A specific storage time, a fruit pack, amount of vitamin C
e. A specific storage time, the nutritionist, amount of vitamin C
7. Match the words in the first column with their correct definitions (second column).
Hypothesis              any factor in an experiment that changes
Constant                any factor that is not allowed to change
Control                 a statement of a possible relationship between the independent and dependent variables
Independent variable    used to reduce the effects of chance errors
Dependent variable      the factor in an experiment that is changed on purpose
Variable                the factor in an experiment that responds to the purposely changed factor
8. Researchers are interested in the effects of repeated exposure to an advertising message.
All subjects view a 40 minute television program that includes ads for a camera. Some
subjects saw a 30 second commercial, some a 90 second commercial. The same commercial
was repeated 1, 3, or 5 times during the program. After viewing, the subjects answered
questions about their recall of the ad, their attitude toward the camera and their intention
to purchase it. There were 36 subjects, 24 women and 12 men.
a. What are the treatments?
b. What are the response variables?
c. Why would a block design be desired?
d. Outline the design of this experiment.
9. Some schools teach reading using phonics and others using whole language (word recognition). Suppose a school district wants to know which method works better. Suggest a design
for an appropriate experiment.
1.7 Sampling Distributions and Estimations
Z Score and Central Limit Theorem
Quiz 1
1. A sample is chosen randomly from a population that can be described by a normal model.
a. What is the sampling distribution model for the sample mean? Describe shape, center
and spread.
b. If we choose a larger sample, what is the effect on the sampling distribution model?
2. Assume that the duration of human pregnancies can be described by a Normal model
with mean 266 days and standard deviation 16 days.
a. What percentage of pregnancies should last between 260 days and 278 days?
b. Suppose a doctor is currently providing prenatal care to 70 pregnant women. He is
interested in the mean length of their pregnancies. What is the distribution of the mean
length of pregnancy?
c. What is the probability that the mean duration of these patients’ pregnancies will be less
than 260 days?
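Question 2 contrasts the model for one pregnancy, N(266, 16), with the Central Limit Theorem model for the mean of n = 70 pregnancies, N(266, 16/√70). The sketch assumes Python 3.8+ for `statistics.NormalDist`. It also assumes the 70 patients can be treated as a random sample of independent pregnancies.

```python
import math
from statistics import NormalDist

mu, sigma, n = 266, 16, 70

# (a) One pregnancy: P(260 < X < 278)
one = NormalDist(mu, sigma)
p_a = one.cdf(278) - one.cdf(260)

# (b) Sample mean of n pregnancies: approximately N(mu, sigma / sqrt(n))
se = sigma / math.sqrt(n)
mean_model = NormalDist(mu, se)

# (c) P(sample mean < 260)
p_c = mean_model.cdf(260)

print(f"(a) {p_a:.4f}  (b) N({mu}, {se:.2f})  (c) {p_c:.5f}")
```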
3. State the Central Limit Theorem.
Quiz 2
1. There is a city in upstate New York that gets an average of 35.4 inches of rain each year
with a standard deviation of 4.2 inches. Assume the Normal model applies.
a. During what percentage of years does this city get more than 45 inches of rain?
b. Less than how much rain falls in the driest 25% of all years?
2. Using the same information in problem 1, let ȳ represent the mean amount of rain for
eight years.
a. Describe the sampling distribution model of this sample mean.
b. What is the probability that those 8 years average less than 32 inches of rain?
3. State the conditions that the Central Limit Theorem requires. (random sampling, independent values)
Quiz 3
1. Carbon monoxide emissions for a certain kind of car vary with mean 2.7 gm/mi and
standard deviation 0.6 gm/mi. A company has 90 of these cars in its fleet.
a. What percentage of cars would have an emission of carbon monoxide between 2.5 gm/mi
and 3.0 gm/mi?
2. Using the same information as in problem 1, let ȳ represent the mean carbon monoxide level
for the company’s fleet.
a. What is the approximate model for the distribution of ȳ?
b. Estimate the probability that ȳ is between 3.0 and 3.1 gm/mi.
c. There is only a 6% chance that the fleet’s mean carbon monoxide level is greater than what
level?
3. True or False: The Central Limit Theorem only applies to samples that are drawn from
a population that is normally distributed.
Binomial Distribution and Binomial Experiments
Quiz 1
1. In a test for ESP, a subject is told that cards that the experimenter can see, but he cannot,
contain a star, a circle, a wave, or a square. As the experimenter looks at each of the 20
cards in turn, the subject names the shape on the card. A subject who is just guessing has
probability .25 of guessing correctly on each card.
a. The count of correct guesses in 20 cards has a binomial distribution. What are n and p?
b. What is the mean number of correct guesses?
c. What is the probability of exactly 5 correct guesses?
2. An engineer chooses an SRS of 10 switches from a shipment of 10,000 switches. Suppose
that (unknown to the engineer) 10% of the switches in the shipment are bad. The engineer
counts the number X of bad switches in the sample. Is this a binomial setting? Explain.
3. Suppose the probability that an environmental engineer successfully lands a consulting
job is .30 for each job bid on. Assume that the consulting jobs bid on are independent, and
let x be the number of jobs landed in 5 bids.
a. What is the probability of landing exactly 3 consulting jobs?
b. Find the probability of landing at most 2 jobs.
c. Find the probability of landing at least 4 jobs.
d. Determine the mean and standard deviation of x.
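The binomial answers in Quiz 1 follow from P(X = k) = C(n, k)p^k(1 − p)^(n−k), with mean np and standard deviation √(np(1 − p)). A minimal sketch in Python (3.8+ for math.comb); binom_pmf and binom_cdf are hypothetical helper names written for this example, not part of any library.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k), adding the pmf for 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Problem 1 (ESP): n = 20 cards, p = 0.25 chance of a correct guess on each.
esp_mean = 20 * 0.25
esp_exactly_5 = binom_pmf(5, 20, 0.25)

# Problem 3 (consulting bids): n = 5 independent bids, p = 0.30 each.
n, p = 5, 0.30
exactly_3 = binom_pmf(3, n, p)
at_most_2 = binom_cdf(2, n, p)
at_least_4 = 1 - binom_cdf(3, n, p)    # complement of "at most 3"
mean, sd = n * p, sqrt(n * p * (1 - p))

print(esp_mean, round(esp_exactly_5, 4), exactly_3, at_most_2, at_least_4, mean, sd)
```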
Quiz 2
1. A federal report finds that lie detector tests given to truthful persons have a probability of
.2 of suggesting that the person is deceptive.
a. A company asks 12 job applicants about thefts from previous employers, using a lie
detector to assess their truthfulness. Suppose that all 12 answer truthfully. What is the
probability that the lie detector says all 12 are truthful? What is the probability that the
lie detector says at least one is deceptive?
b. What is the mean number among 12 truthful persons who will be classified as deceptive?
What is the standard deviation of this number?
2. You observe the sex of the next 20 children born at a local hospital; X is the number of
girls among them. Is this a binomial situation?
3. Blood type is inherited. If both parents carry genes for the O and A blood types, each
child has probability .25 of getting two O genes and so of having blood type O. Different
children inherit independently of each other. The number of O blood types among 5 children
of these parents is the count X of successes in 5 independent observations with probability
.25 of a success on each observation. So X has the binomial distribution with n = 5 and
p = .25. What is the probability that at least two children will be born with blood type O?
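For Quiz 2, the complement rule does most of the work, so a sketch needs no helper function at all. It assumes the binomial conditions stated in the problems (independent trials with a constant success probability).

```python
from math import sqrt

# Problem 1: each of n = 12 truthful applicants is flagged "deceptive" with p = 0.2.
n, p = 12, 0.2
p_all_truthful = (1 - p) ** n             # P(X = 0): nobody flagged
p_some_flagged = 1 - p_all_truthful       # complement rule: P(X >= 1)
mean_flagged = n * p                      # binomial mean np
sd_flagged = sqrt(n * p * (1 - p))        # binomial SD sqrt(np(1 - p))

# Problem 3: X ~ Binomial(5, 0.25) children with type O blood.
# P(X >= 2) = 1 - P(X = 0) - P(X = 1), where P(X = 1) = 5 * 0.25 * 0.75**4.
p_two_or_more = 1 - 0.75**5 - 5 * 0.25 * 0.75**4

print(round(p_all_truthful, 4), round(p_some_flagged, 4), mean_flagged, round(sd_flagged, 4))
print(p_two_or_more)
```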
Quiz 3
1. A test for the presence of antibodies to the AIDS virus in blood has probability 0.99
of detecting the antibodies when they are present. Suppose that during a year 20 units of
blood with AIDS antibodies pass through a blood bank.
a. Take X to be the number of these 20 units that the test detects. What is the distribution
of X?
b. What is the probability that the test detects all 20 contaminated units? What is the
probability that at least one unit is not detected?
c. What is the mean number of units among the 20 that will be detected? What is the
standard deviation of the number detected?
2. A couple decides to continue to have children until their first girl is born; X is the total
number of children the couple has.
3. Bolts produced by a machine vary in quality. The probability that a given bolt is defective
is 0.03. A random sample of 35 bolts is taken from the week’s production. If X denotes the
number of defectives in the sample, find the mean and standard deviation of X.
Confidence Intervals
Quiz 1
1. The average composite ACT score for students who took the test in 2003 was 21.4.
Assume that the standard deviation is 1.05.
a. In a random sample of 36 students who took the exam, what is the probability that the
average composite ACT score is 22 or more?
b. Find a 90% confidence interval for µ, based on the sample information given above.
2. A survey designed to obtain information on the proportion of registered voters who are in
favor of a constitutional amendment requiring a balanced budget results in a sample size of
n = 400. Of the 400 voters sampled 272 are in favor of a constitutional amendment requiring
a balanced budget.
a. Give a point estimate of the population proportion in favor of a balanced budget amendment.
b. Determine the estimated standard deviation of your point estimate.
c. Calculate a 99% confidence interval for the population proportion in favor of the amendment.
d. How large would n have to be in order to have estimated the population proportion to
within .03 with 95% confidence?
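A sketch of the Quiz 1 computations using statistics.NormalDist (Python 3.8+). For part 2d the problem does not say which planning value of p to use, so the sketch shows both the conservative choice p = 0.5 and the sample estimate 0.68; z_star and n_for_proportion are hypothetical helper names.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value, e.g. about 2.576 for 99% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def n_for_proportion(margin, confidence, planning_p=0.5):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= margin."""
    return ceil(z_star(confidence) ** 2 * planning_p * (1 - planning_p) / margin**2)

# Problem 1a: P(x-bar >= 22) when n = 36, mu = 21.4, sigma = 1.05.
se_act = 1.05 / sqrt(36)
p_act = 1 - NormalDist(21.4, se_act).cdf(22)

# Problem 2a-c: 272 of n = 400 sampled voters favor the amendment.
n = 400
p_hat = 272 / n
se_hat = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star(0.99) * se_hat
ci_99 = (p_hat - margin, p_hat + margin)

# Problem 2d: margin of .03 at 95% confidence, under two planning values.
n_conservative = n_for_proportion(0.03, 0.95)        # p = 0.5
n_from_sample = n_for_proportion(0.03, 0.95, p_hat)  # p = 0.68

print(p_act, p_hat, se_hat, ci_99, n_conservative, n_from_sample)
```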
Quiz 2
1. Let x denote the variable which represents the amount of money spent by a tourist
visiting the Grand Canyon. Historical information reveals that x is normally distributed
with a mean of $250 and a standard deviation of $60.
a. For a sample of n = 16, determine the mean and the standard deviation of the sampling
distribution of the sample mean.
b. For a sample of n = 36, determine the approximate probability that X̄ is greater than
$280.
c. For a sample of n = 36, determine the approximate probability that the total amount of
money spent by the 36 tourists is greater than $9,000.
2. A survey of 40,000 American households in 1987 found that 30.5% of those in the sample
had a pet cat.
a. Use this sample information to form a 99% confidence interval to estimate the true
proportion of all American households that owned a cat in 1987.
b. Write a sentence interpreting the meaning of this interval.
Quiz 3
1. In a survey of American households, 75.1% of the households claimed to have made a
financial contribution to charity in the past year.
a. If the survey had involved 1000 households, what would a 95% confidence interval be?
b. Interpret, in words, the meaning of this confidence interval.
c. Describe how increasing the number of households involved in the survey would change
the 95% confidence interval.
2. The manager of an electronics department at a large department store is interested in
knowing the mean size of TV screens (µ) that a customer purchases. Based upon industry
standards it is believed that the standard deviation is 4 inches.
a. If a sample of n = 36 yields a sample average TV screen size of 21.2 inches, calculate a
90% confidence interval for µ.
b. Determine how large a sample is needed in order to estimate µ within 1 inch with 95%
confidence.
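The known-σ interval x̄ ± z*σ/√n and the sample-size formula n = (z*σ/E)², rounded up, can be sketched as follows (Python 3.8+ standard library). The names z_star, z_interval, and n_for_mean are illustrative, not a library API.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_star(confidence):
    """Two-sided standard normal critical value for the given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def z_interval(xbar, sigma, n, confidence):
    """x-bar +/- z* sigma / sqrt(n), for a known population SD."""
    margin = z_star(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def n_for_mean(sigma, margin, confidence):
    """Smallest n whose margin of error z* sigma / sqrt(n) is at most `margin`."""
    return ceil((z_star(confidence) * sigma / margin) ** 2)

# Problem 2: sigma = 4 inches, n = 36, x-bar = 21.2 inches.
tv_ci = z_interval(21.2, 4, 36, 0.90)
tv_n = n_for_mean(4, 1, 0.95)
print(tv_ci, tv_n)
```

The same n_for_mean(10, 0.5, 0.95) call gives 1537, matching choice d of Test 1, problem 2.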
Sums and Differences of Independent Random Variables
Quiz 1
1. Consider the following two experiments: the first has outcome X taking on the values
0, 1, and 2, with equal probabilities; the second results in an (independent) outcome Y
taking on the value 3 with probability 1/4 and 4 with probability 3/4.
a. Find the distribution of Y + X
b. Find the mean and variance of Y + X.
2. Suppose X and Y are independent random variables. The variance of X is equal to 16,
and the variance of Y is equal to 9. Let Z = X − Y.
a. Find the standard deviation of Z.
3. Given independent random variables with means and standard deviations as shown, find
the mean and standard deviation of each of these variables:
Table 1.8:
     Mean   SD
X    80     12
Y    12     3
a. 2Y + 20
b. .25X + Y
c. X − 5Y
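The rules behind problems 2 and 3 are E(aX + bY + c) = aE(X) + bE(Y) + c and, for independent X and Y, Var(aX + bY + c) = a²Var(X) + b²Var(Y). A minimal sketch, assuming Table 1.8 is read as X: mean 80, SD 12 and Y: mean 12, SD 3 (the flattened table leaves the layout open to interpretation); linear_combo is a hypothetical helper.

```python
from math import sqrt

def linear_combo(a, b, c, mx, sx, my, sy):
    """Mean and SD of a*X + b*Y + c for independent X and Y.

    Means combine linearly; variances add with squared coefficients,
    so the constant c shifts the mean but leaves the SD unchanged.
    """
    mean = a * mx + b * my + c
    sd = sqrt(a**2 * sx**2 + b**2 * sy**2)
    return mean, sd

MX, SX, MY, SY = 80, 12, 12, 3   # assumed reading of Table 1.8

part_a = linear_combo(0, 2, 20, MX, SX, MY, SY)     # 2Y + 20
part_b = linear_combo(0.25, 1, 0, MX, SX, MY, SY)   # .25X + Y
part_c = linear_combo(1, -5, 0, MX, SX, MY, SY)     # X - 5Y

# Problem 2: Var(X - Y) = Var(X) + Var(Y) = 16 + 9; variances add even for a difference.
sd_z = sqrt(16 + 9)
print(part_a, part_b, part_c, sd_z)
```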
Quiz 2
1. Consider the following two experiments: the first has outcome X taking on the values
0, 1, and 2, with equal probabilities; the second results in an (independent) outcome Y
taking on the value 3 with probability 1/4 and 4 with probability 3/4.
a. Find the distribution of Y − X
b. Find the mean and variance of Y − X.
2. You roll a die. If it comes up a 6, you win $100. If not, you get to roll again. If you get a
6 the second time, you win $50. If not, you lose.
a. Create a probability model for the amount you win at this game.
b. Find the expected amount you’ll win.
c. How much would you be willing to pay to play this game?
3. Given independent random variables with means and standard deviations as shown, find
the mean and standard deviation of each of these variables:
Table 1.9:
     Mean   SD
X    120    12
Y    300    16
a. .8Y
b. 2X − 100
c. 3X − Y
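Distributions such as Y − X and the die game can be enumerated exactly with fractions. This sketch assumes "you lose" in problem 2 means winning $0 (no entry fee is given), and combine and mean_var are hypothetical helpers written for illustration.

```python
from fractions import Fraction as F
from itertools import product

def combine(dist_a, dist_b, op):
    """Distribution of op(A, B) for independent A and B given as {value: probability}."""
    out = {}
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        value = op(a, b)
        out[value] = out.get(value, 0) + pa * pb
    return out

def mean_var(dist):
    """Mean and variance of a discrete distribution."""
    m = sum(v * p for v, p in dist.items())
    return m, sum((v - m) ** 2 * p for v, p in dist.items())

X = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}
Y = {3: F(1, 4), 4: F(3, 4)}
y_minus_x = combine(Y, X, lambda y, x: y - x)   # Quiz 2, problem 1

# Problem 2: $100 if the first roll is a 6, $50 if only the second is, else $0.
winnings = {100: F(1, 6), 50: F(5, 6) * F(1, 6), 0: F(5, 6) ** 2}
expected_win, _ = mean_var(winnings)

print(sorted(y_minus_x.items()), mean_var(y_minus_x), float(expected_win))
```

Passing lambda y, x: y + x instead gives the distribution of Y + X asked for in Quiz 1.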
Quiz 3
1. A couple plans to have children until they get a girl but they agree that they will not
have more than three children even if all are boys. Assume boys and girls are equally likely.
a. Create a probability model for the number of children they will have.
b. Find the expected number of children.
c. Find the expected number of boys they will have.
2. Given independent random variables with means and standard deviations as shown, find
the mean and standard deviation of each of these variables:
Table 1.10:
     Mean   SD
X    80     12
Y    12     3
a. X − 20
b. .5Y
c. X + Y
3. A grocery supplier believes that in a dozen eggs, the mean number of broken eggs is 0.7,
with a standard deviation of .4 eggs. You buy 4 dozen eggs without checking them.
a. How many broken eggs do you expect to get?
b. What is the standard deviation?
c. What assumptions do you have to make about the eggs in order to answer the questions?
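Quiz 3 can be checked the same way. The egg calculation assumes the four dozens are independent, which is exactly the assumption part c asks about: without it the variances cannot simply be added.

```python
from fractions import Fraction as F
from math import sqrt

# Problem 1: stop at the first girl or after three children; P(girl) = 1/2.
children = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)}         # 3 covers BBG and BBB
boys = {0: F(1, 2), 1: F(1, 4), 2: F(1, 8), 3: F(1, 8)}  # G, BG, BBG, BBB
expected_children = sum(k * p for k, p in children.items())
expected_boys = sum(k * p for k, p in boys.items())

# Problem 3: 4 dozen eggs, each dozen with mean 0.7 and SD 0.4 broken eggs.
# Adding the four variances requires the dozens to be independent.
dozens = 4
mean_broken = dozens * 0.7
sd_broken = sqrt(dozens * 0.4**2)
print(expected_children, expected_boys, mean_broken, round(sd_broken, 3))
```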
Student’s T
Quiz 1
1. Find the critical value of t for a 90% confidence interval with 18 degrees of freedom.
2. Given the following sample data about automobile speeds in a residential area, find the
90% confidence interval for the true mean speed of the vehicles. Assume that the data
satisfy the conditions needed to use the t-distribution.
Speed: 31 29 31 34 27 34 37 28 29 30 26 29 24 38 34 31 36 29 31 34 34 32 36
3. True or False: In order to use the t statistic, our sample must come from a normal
population.
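Critical values of t are usually read from a table or a statistics package. To keep the example dependency-free, this sketch builds the t density from math.lgamma, integrates it with Simpson's rule, and inverts the CDF by bisection; it illustrates the idea and is not a replacement for a tested library such as SciPy. The helper names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(a, df)
    area += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area *= h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(confidence, df):
    """t* with P(-t* < T < t*) = confidence, found by bisection on t_cdf."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, confidence):
    """x-bar +/- t* s / sqrt(n), using n - 1 degrees of freedom."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    margin = t_critical(confidence, n - 1) * s / sqrt(n)
    return xbar - margin, xbar + margin

# Quiz 1, problem 2: the 23 recorded speeds.
speeds = [31, 29, 31, 34, 27, 34, 37, 28, 29, 30, 26, 29, 24,
          38, 34, 31, 36, 29, 31, 34, 34, 32, 36]
low, high = t_interval(speeds, 0.90)

for conf, df in [(0.90, 18), (0.95, 15), (0.99, 18)]:
    print(f"t* for {conf:.0%} confidence, df = {df}: {t_critical(conf, df):.3f}")
print(f"90% interval for mean speed: ({low:.2f}, {high:.2f})")
```

The printed critical values should agree with a standard t table (1.734, 2.131, and 2.878 for the first problem of Quizzes 1 to 3) to about three decimal places.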
Quiz 2
1. Find the critical value of t for a 95% confidence interval with 15 degrees of freedom.
2. Students weighed six bags of chips and recorded the following weights (in grams):
29.2, 28.5, 27.7, 27.9, 28.1, 28.5
The company claims bags of their chips weigh 28.3 grams.
a. Find the mean and standard deviation of the observed values.
b. Create a 95% confidence interval for the mean weight of such bags of chips.
c. Explain in context what your interval means.
3. True or False: A t statistic is used when the sample is taken from a normal distribution
and the standard deviation is known.
Quiz 3
1. Find the critical value of t for a 99% confidence interval with 18 degrees of freedom.
2. A consumer group tested 14 brands of vanilla ice cream and found the following numbers
of calories per serving:
160 200 220 230 120 180 140 130 170 190 80 120 100 170
a. Create a 95% confidence interval for the average calorie content of vanilla ice cream.
b. Explain what your interval means. (Based on the sample, we are 95% confident that the average
calorie content of vanilla ice cream is between
3. True or False: To use the t statistic, the sample should come from an underlying population
which is normal, the standard deviation of the population is unknown, and the sample size
is small.
Test 1
1. A 95% confidence interval for µ, computed using the bound on the error of estimation, is
(112.4, 121.6). Then
a. 95% of the time, µ falls within the interval (112.4, 121.6)
b. There is a 95% chance that µ falls within the interval (112.4, 121.6)
c. 95% of all the possible values of µ fall within the interval (112.4, 121.6)
d. 95% of all the possible samples produce intervals that do capture µ
2. If σ = 10, then the sample size required to estimate a population mean µ to within
.5 with 95% confidence is
a. 40
b. 119
c. 1257
d. 1537
3. Which of the following is not a property of the t distribution?
a. The t curve is centered at 0 and is bell shaped
b. The t curve is more spread out than a z curve.
c. The t curve tends to spread out as the degrees of freedom increase.
d. The formula for a t variable is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
4. Which of the following statements concerning the Central Limit Theorem is true?
a. The Central Limit Theorem predicts that the distribution of x̄ follows a normal distribution
for every sample size n.
b. The Central Limit Theorem predicts that the distribution of p follows a normal distribution for every sample size n.
c. The Central Limit Theorem predicts that the distribution of p will be reasonably close to
a normal distribution when n ≥ 30.
d. The Central Limit Theorem predicts that the distribution of x̄ will be reasonably close
to a normal distribution when n ≥ 30.
5. The t distribution that you use to find your critical values closely resembles the normal
distribution when:
a. The sample mean is large
b. The sample variance is large
c. The sample size is large
d. The population standard deviation is large
6. A pharmacist finds that 55% (π = .55) of all customers prefer name brand prescription
drugs to generic prescription drugs. For a sample of n = 25 customers, let p be the sample
proportion of customers who prefer name brand prescription drugs.
a. Determine the mean and the standard deviation of the sampling distribution of p.
b. Is it reasonable to assume that the sampling distribution of p is approximately normally
distributed?
c. What is the approximate probability that the sample proportion p is between .50 and .70?
7. The gas mileage for a certain model of car is known to have a standard deviation of
5 miles/gallon. A simple random sample of 64 cars of this model is chosen and found to have
a mean gas mileage of 27.5 miles/gallon. Construct a 95% confidence interval for the mean
gas mileage for this car model. Interpret the interval in words.
8. Using the following data and your calculator, construct a 90% confidence interval for the
mean of the population from which the data was taken. Assume the underlying population
is normally distributed.
29 34 34 28 30 29 38 31 29 34 32
9. Bolts are packed in boxes of 20. The probability of a bolt being defective is 0.1. What is
the probability of a box containing 2 defective bolts? (.2852)
10. Let x1 and x2 be random variables with means and standard deviations given below:
Table 1.11:
Random variable   Mean   Standard deviation
x1                12     2
x2                14     15
a. Determine $\mu_{x_1+x_2}$
b. Determine $\sigma_{x_1+x_2}$
Test 2
1. The bound on the error of estimation associated with a 95% confidence interval for µ is
a. $1.96\sigma$
b. $1.96\frac{\sigma}{\sqrt{n}}$
c. $1.96\frac{\sigma}{n}$
d. 1.96
2. Which of the following does not influence the width of a large-sample confidence interval
for µ?
a. x̄
b. The standard deviation of the population.
c. The confidence level
d. The sample size
3. The Central Limit Theorem predicts that
a. The sampling distribution of x̄ will be approximately normal for reasonably large samples
b. The sampling distribution of µx will be approximately normal for reasonably large samples.
c. The mean of the sampling distribution of x̄ will tend to be close to µ for reasonably large
samples.
d. The mean of the sampling distribution of µx will tend to be close to µ for reasonably
large samples.
4. A t interval is used in place of the z interval when which of the following must be
estimated?
a. The sample mean
b. The population mean
c. The sample standard deviation
d. The population standard deviation.
5. The critical t values that are used to find a confidence interval for the population mean
will get larger if
a. The sample size becomes smaller
b. The level of confidence is made smaller
c. The standard deviation becomes smaller
d. The population mean becomes larger
6. A random sample is selected from a population that has a proportion of successes π = .7.
a. Determine the mean and the standard deviation of the sample proportion for a sample
size of n = 9.
b. For a sample of n = 25, determine the approximate probability that the sample proportion
will be within .25 of π.
7. A simple random sample of 75 female adults living in a particular city was taken to study
the amount of time they spent per week doing rigorous exercise. It indicated a mean of
73 minutes with a standard deviation of 21 minutes. Find the 99.5% confidence interval of the
mean for all females in this city. Interpret this interval in words.
8. Using the following data and your calculator, construct a 90% confidence interval for the
mean of the population from which the data was taken. Assume the underlying population
is normally distributed.
31 27 37 29 26 24 34 36 31 34 36 21
9. Batteries are packaged in boxes of 10. The probability of a battery being faulty is 2%.
What is the probability of a box containing 2 faulty batteries? (0.0153)
10. Let x1 and x2 be random variables with means and standard deviations given below:
Table 1.12:
Random variable   Mean   Standard deviation
x1                12     2
x2                15     1
a. Determine $\mu_{x_1+x_2}$
b. Determine $\sigma_{x_1+x_2}$
1.8 Hypothesis Testing
Hypothesis testing and the P value
Quiz 1
1. Bars of Choco are claimed by the manufacturer to have a mean mass of 102.5 grams. A
test is carried out to see whether the mean mass of Choco bars is less than 102.5 grams.
State the null and alternative hypotheses.
2. In a criminal trial in the United States the jury is always told that the defendant is
“innocent until proven guilty”.
a. What must a member of the jury assume about the defendant at the beginning of the
trial?
H0 :
(HINT: one word)
b. It is the prosecuting attorney’s job to present evidence to the jury. If there is enough
evidence (“beyond a reasonable doubt”), then the jury will convict the defendant of the
crime. If the defendant is convicted, the jury is rejecting the null hypothesis.
Ha :
c. When the jury convicts someone of a crime, their verdict is GUILTY.
Is this “Reject H0” OR “Fail to Reject H0”?
d. If the jury fails to convict someone of a crime, their verdict is NOT GUILTY.
Is this “Reject H0” OR “Fail to Reject H0”?
e. Sometimes the jury makes a correct decision and sometimes the jury makes a mistake.
i. Write a sentence describing a Type I error in the U.S. criminal justice system.
ii. Write a sentence describing a Type II error in the U.S. criminal justice system.
3. Using a z distribution find the critical value for a two tailed test at α = .02
4. In a one sided hypothesis test at α = .02 if the p value is .001, what is the decision you
will make regarding the null hypothesis?
5. A clean air standard requires that vehicle exhaust emissions not exceed specified limits for
various pollutants. Many states require that cars be tested annually to ensure that they meet
the standard. State regulators are checking up on repair shops to see if they are certifying
cars that do not meet the standard.
a. In this problem, what is meant by the power of the test the regulators are conducting?
(The probability of detecting that a shop is not meeting the standard when it truly is not.)
b. Will the power be greater if they test 20 or 40 cars? (40 cars)
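Critical values and p-value decisions like those in problems 3 and 4 can be sketched with the standard normal from statistics.NormalDist (Python 3.8+). The sketch rejects H0 when p ≤ α, which is one common convention; z_critical and decide are illustrative names.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and SD 1

def z_critical(alpha, two_tailed=True):
    """Critical |z|: alpha is split across both tails for a two-tailed test."""
    tail_area = alpha / 2 if two_tailed else alpha
    return Z.inv_cdf(1 - tail_area)

def decide(p_value, alpha):
    """Decision rule used in this sketch: reject H0 when p <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"

print(round(z_critical(0.02), 3))   # Quiz 1, problem 3
print(decide(0.001, 0.02))          # Quiz 1, problem 4
```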
Quiz 2
1. The mean factory assembly time for a particular electronic component is 84 seconds. It is
required to test whether the introduction of a new procedure results in a different assembly
time. State the null and alternative hypotheses.
2. Medical tests have been developed to detect many serious diseases (such as cancer and
HIV). A medical test is designed to give correct results as often as possible. That is, to
minimize the occurrence of “false positives” and “false negatives”.
A doctor starts by assuming that a patient is healthy (no disease), then looks for evidence
to contradict that assumption. If the patient has a negative test result, the doctor continues
to assume that the patient is healthy. If the patient has a positive test result, the doctor
concludes that the patient has a disease.
a. State H0 and Ha.
b. When will the doctor Reject H0?
c. When will the doctor fail to Reject H0?
d. What kind of an error is a “false positive”? EXPLAIN.
e. What kind of an error is a “false negative”? EXPLAIN.
f. What are the consequences of a false positive? Of a false negative?
3. Using a z distribution, find the critical value at α = .03 for a two-tailed test, and for a
one-tailed test with Ha : µ > d for some constant d.
4. In a two sided hypothesis test at α = .02 if the p value is .03, what is the decision you
will make regarding the null hypothesis?
5. A company is sued for job discrimination because only 17% of the newly hired candidates
were minorities when 30% of all who applied were minorities. Is this strong evidence that
the company’s hiring practices are discriminatory?
a. In this problem, describe what is meant by the power of the test.
b. If the hypothesis is tested at the 5% level of significance instead of 1%, how will the power
be affected?
Quiz 3
1. In a report it was stated that the average age of all hospital patients was 53 years. A
newspaper believes that this figure is an underestimate.
State the null and alternative hypotheses.
2. Consider the following conjecture: “The teacher will never check homework today.”
a. Describe in words the null and alternative hypotheses.
b. Describe the type I and type II errors for this conjecture
c. Describe the ramifications of making these errors in the context of the problem.
3. Using a z distribution find the critical value for a two tailed test at α = .05
4. In a one sided hypothesis test at α = .05 if the p value is .039, what is the decision you
will make regarding the null hypothesis?
5. A professor notes that for the past several years about 11% of the students who initially
enroll in her class withdraw before the end of the semester. She is offered some software to
buy which, the salesperson claims, will help make the class more interesting. She can use the
software for a semester at no cost to see if the dropout rate goes down significantly. She
only has to pay for the software if she chooses to continue to use it beyond the semester.
a. Is this a one or two tailed test?
b. Explain what will happen if the professor makes a type II error.
c. What is meant by the power of this test?
Testing a Mean Hypothesis
Quiz 1
1. Two pharmacists are concerned about their supply of an antibiotic. Mean release potency
for unaffected antibiotic pills is 910 with a standard deviation of 6.8. The pharmacists test
10 lots of antibiotic and get the following potency data:
900 901 910 877 913 909 908 905 916 918
The pharmacists are concerned that the mean release potency of the antibiotic is less than the
standard release potency of 910 and are going to run a significance test.
a. What are the null and alternate hypotheses?
b. Use a .01 level of significance and a z test to test the hypotheses.
c. Calculate your p value.
d. What is your conclusion?
e. Describe a type I error in this context. What is the probability of making a type I error?
f. Describe power in this situation.
2. Data for male mean earnings indicates that this figure is $24000. What can you say about
the validity of this figure if a simple random sample of 200 families showed an average earnings
level of $23500, with a standard deviation of $4000? Use a .05 level of significance. Would
your conclusions be any different at a .01 level of significance? (To calculate the standard
error:
3. In an advertisement, a pizza shop claims that its mean delivery time is less than
30 minutes. A random selection of 36 delivery times yields a sample mean of 28.5 minutes
and a standard deviation of 3.5 minutes. Does this provide sufficient evidence to support
the claim at a significance level of α = .01 ?
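The z tests in this quiz share one calculation, z = (x̄ − µ0)/(σ/√n), followed by a tail area. A minimal sketch (Python 3.8+). Problems 2 and 3 give only a sample standard deviation; with n = 200 and n = 36 this sketch substitutes s for σ, which is the large-sample approximation rather than an exact t test. z_test is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean

Z = NormalDist()

def z_test(xbar, mu0, sd, n, alternative):
    """z = (x-bar - mu0) / (sd / sqrt(n)) and its p-value.

    alternative is "less", "greater", or "two-sided".
    """
    z = (xbar - mu0) / (sd / sqrt(n))
    if alternative == "less":
        p_value = Z.cdf(z)
    elif alternative == "greater":
        p_value = 1 - Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

# Problem 1: ten lots, sigma = 6.8, H0: mu = 910 vs Ha: mu < 910.
lots = [900, 901, 910, 877, 913, 909, 908, 905, 916, 918]
z1, p1 = z_test(mean(lots), 910, 6.8, len(lots), "less")

# Problem 2: n = 200, s = 4000 used in place of sigma, two-sided.
z2, p2 = z_test(23500, 24000, 4000, 200, "two-sided")

# Problem 3: n = 36, s = 3.5 used in place of sigma, H0: mu = 30 vs Ha: mu < 30.
z3, p3 = z_test(28.5, 30, 3.5, 36, "less")

for z, p in [(z1, p1), (z2, p2), (z3, p3)]:
    print(f"z = {z:.3f}, p = {p:.4f}")
```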
Quiz 2
1. Let µ denote the mean cholesterol of heart attack patients under the age of 50. The
American Medical Association (AMA) has claimed that cholesterol levels of 240 and higher
dramatically increase the risk of heart attacks. A random sample of the cholesterol levels
of 15 heart attack patients age 50 and under yields x̄ = 247 and s = 17.3. For testing
H0 : µ = 240 versus Ha : µ > 240
a. Calculate the value of the test statistic for testing the null hypothesis.
b. Assuming that cholesterol levels are normally distributed, determine as closely as possible
the p-value associated with the value of the test statistic you found in part a).
c. Using a significance level of .05, does the sample data support the hypothesis that heart
attack patients under the age of 50 have mean cholesterol level greater than 240? Explain.
2. The University of Higher Learning maintains that the average annual income of a graduate
one year after graduation is 47,550. You suspect this is actually lower. You take an SRS of
size 30 of individuals who graduated from the University of Higher Learning one year ago,
and find the average annual income to be 38,790. Suppose that the standard deviation of
annual average income of all one-year old graduates from the University of Higher Learning
is 12,500.
a. Calculate the z-test statistic.
b. Perform a z-test of significance and give the p-value.
c. What is your conclusion?
3. A franchise reports an average of 150 sales per day. You suspect that this average is
inaccurate, so you randomly select 43 days and determine the number of sales for each day.
The mean turns out to be 143 sales per day, with a standard deviation of 15 sales. At α = .05,
is there enough evidence to support your claim? At the level α = .01?
Quiz 3
1. The department of natural resources reports that a fish is unsafe to eat if the polychlorinated biphenyl (PCB) concentration exceeds 5 parts per billion (ppb). A sample of 10 fish
taken from a local lake results in the data listed below.
2.9 4.7 7.6 6.9 4.8 4.9 5.2 3.7 5.1 3.8
a. Calculate point estimates of µ and σ.
b. Will you use a t or z test statistic? Explain.
c. For testing H0 : µ = 5 versus Ha : µ > 5 calculate the value of the test statistic.
d. Assuming that the levels of PCBs are normally distributed, determine the p-value as
closely as possible.
e. Does the data suggest that the fish from this lake should not be eaten? Explain.
2. You want to estimate the average height of an enchanted ogre. No one knows the standard
deviation of heights of enchanted ogres. You take a random sample of 80 enchanted ogres
and find that the average height is 7.8 feet, with a standard deviation of 1.2 feet.
a. Ted the Wizard says the mean height of enchanted ogres is 8.5 feet. Your sample leads
you to believe that the mean height is actually lower. Perform a test of significance and
state your conclusion.
b. No one knows if the distribution of heights of enchanted ogres is normal. Why does this
not matter in this case?
3. Suppose you work for a Drug Corporation and are testing their expensive new drug
which has been approved. The pill form of the drug is to be manufactured at 325 mg.
Conduct a test at the α = .01 significance level, for the purpose of either recommending
that the manufacturing plant adjust their manufacturing process or not. Assume that you
have obtained 185 pills from the manufacturing plant, for which the mean concentration is
327.55 mg and s = 9.2 mg.
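When σ is unknown, the p-value comes from a t distribution with n − 1 degrees of freedom. This self-contained sketch computes the t CDF with Simpson's rule (no SciPy assumed) and applies it to the PCB data in problem 1 and the pill problem in problem 3; the function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(a, df)
    area += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area *= h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def one_sample_t(xbar, s, n, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1

# Problem 1: PCB levels, H0: mu = 5 vs Ha: mu > 5 (upper-tail p-value).
pcb = [2.9, 4.7, 7.6, 6.9, 4.8, 4.9, 5.2, 3.7, 5.1, 3.8]
t_pcb, df_pcb = one_sample_t(mean(pcb), stdev(pcb), len(pcb), 5)
p_pcb = 1 - t_cdf(t_pcb, df_pcb)

# Problem 3: pills, H0: mu = 325 vs Ha: mu != 325 (two-sided p-value).
t_pill, df_pill = one_sample_t(327.55, 9.2, 185, 325)
p_pill = 2 * (1 - t_cdf(abs(t_pill), df_pill))

print(f"PCB:  t = {t_pcb:.4f}, df = {df_pcb}, p = {p_pcb:.3f}")
print(f"Pill: t = {t_pill:.3f}, df = {df_pill}, p = {p_pill:.5f}")
```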
Testing a Proportion Hypothesis
Quiz 1
1. A lawsuit against a chemical company alleges that neighbors of a chemical plant have
higher than normal cancer rates. The group filing the lawsuit randomly sampled 71 people
who lived within 10 miles of the plant, and they found that 18 of those people had the cancer.
By comparison, a random sample of 500 people in the general population found 74 cases of
the same cancer. Is the cancer rate in the sample significantly higher than the rate in the
general population?
2. In July 2002 the American Journal of Clinical Nutrition reported that 42% of 1546 randomly selected African American women studied had vitamin D deficiency. The data came
from a national nutrition study conducted by the Centers for Disease Control in Atlanta.
a. Find the value of $\sigma_{\hat{p}}$ and sketch the sampling distribution
b. Create a 95% confidence interval.
3. Cancer Rates. A lawsuit against a chemical company alleges that neighbors of a
chemical plant have higher than normal cancer rates. The group filing the lawsuit randomly
sampled 71 people who lived within 10 miles of the plant, and they found that 18 of those
people had the cancer. It is known that the cancer rate in the general population is .15. Is
the cancer rate in the sample significantly higher than the rate in the general population?
(You may assume that the events and the two samples are independent.)
a. Write appropriate hypotheses.
b. Test the hypothesis at the 1% significance level (α = .01).
c. Explain your conclusion in the context of the problem.
d. Describe two ways to increase the power of this test.
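The one-proportion z test uses the null value p0 in its standard error, while the confidence interval uses p̂. A minimal sketch of both (Python 3.8+), applied to problem 3 (cancer rates against p0 = .15) and the vitamin D interval in problem 2; the helper names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def one_prop_z_test(successes, n, p0, alternative="greater"):
    """z statistic and p-value for H0: p = p0; the SE uses p0 because H0 is assumed true."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

def one_prop_ci(p_hat, n, confidence=0.95):
    """p-hat +/- z* sqrt(p-hat(1 - p-hat) / n); also returns the estimated SE."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = Z.inv_cdf(0.5 + confidence / 2)
    return p_hat - z_star * se, p_hat + z_star * se, se

# Problem 3: 18 of 71 residents versus p0 = .15, Ha: p > .15.
z_cancer, p_cancer = one_prop_z_test(18, 71, 0.15)

# Problem 2: 42% of 1546 women, 95% interval.
lo_d, hi_d, se_d = one_prop_ci(0.42, 1546)

print(f"z = {z_cancer:.3f}, p = {p_cancer:.4f}")
print(f"SE = {se_d:.4f}, 95% CI = ({lo_d:.4f}, {hi_d:.4f})")
```

The same helpers apply to the parcel-delivery claim (152 of 200, with alternative "less") and the dropout rate (20 of 95) in Quizzes 2 and 3.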
Quiz 2
1. A parcel delivery service claims that at least 80% of their parcels are delivered within
48 hours of posting. A check on 200 randomly selected parcels found that 152 were delivered
within 48 hours of posting. Test the delivery service’s claim at the 5% significance level.
2. Vitamin D. In July 2002 the American Journal of Clinical Nutrition reported that 42%
of 1546 randomly selected African American women studied had vitamin D deficiency. The
data came from a national nutrition study conducted by the Centers for Disease Control in
Atlanta. (You may assume independence.)
b. Find the value of $\sigma_{\hat{p}}$
c. Create a 95% confidence interval.
d. Interpret the meaning of this interval.
3. Explain the difference between a p value and p̂.
Quiz 3
1. The dropout rate of students enrolled at a certain university is reported to be 13.2%. The
Dean of Students suspects that the drop-out rate for science students is greater than 13.2%,
and she examines the records of a random sample of 95 of these students. The number of
drop-outs was found to be 20. Test the Dean’s suspicion at the 5% significance level.
2. A random sample of 500 males was selected from a town in France and 53 men were found
to be color-blind. Find a 95% confidence interval for the proportion of color-blind males in
France.
3. Free Throws. During her first basketball season, a player made 49 out of 55 free throws
attempted.
a. Find a 95% confidence interval for her percentage of free throws.
b. Based on your interval would you say she is a better free throw shooter than her teammates
whose percentage is .75?
Testing a Hypothesis for dependent and independent samples
Quiz 1
1. An animal researcher carried out an experiment to see if the type of food fed to a rat
would affect the rat’s running time through a maze. Four rats were fed bran and 8 rats were
fed the regular food. The running time is in seconds. Here is the data:
Table 1.13:
Bran   Regular Food
32     118
38     78
27     82
45     91
       67
       41
       97
       42
a. Does this scenario involve dependent or independent samples? Explain.
b. What would the hypotheses be for this scenario?
c. Compute the pooled estimate for population variance.
d. Calculate the estimated standard error for this scenario.
e. What is the test statistic and at an alpha level of .05 what conclusions would you make
about the null hypothesis?
2. A researcher was concerned about the effects of anxiety on test scores and investigated
the effectiveness of relaxation training. The subjects were given a test before and after the
training and a measure of their anxiety was taken after each test. Following is the data:
Table 1.14:
Before
After
84
76
104
103
91
90
72
70
90
94
93
90
a. What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario. (Level 2)
c. Compute the standard error of the difference for these samples. (Level 2)
d. What is the test statistic and at an alpha level of .05 what conclusions would you make
about the null hypothesis?
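For problem 1, the pooled-variance two-sample t statistic can be sketched as below. It assumes equal population variances (the premise of pooling) and reads Table 1.13 as four bran times (32, 38, 27, 45) and eight regular-food times; pooled_t is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_1, sample_2):
    """Pooled-variance two-sample t; assumes equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))          # estimated SE of x-bar1 - x-bar2
    t = (mean(sample_1) - mean(sample_2)) / se
    return sp2, se, t, df

bran = [32, 38, 27, 45]
regular = [118, 78, 82, 91, 67, 41, 97, 42]
sp2, se, t, df = pooled_t(bran, regular)
print(f"pooled variance = {sp2:.1f}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
```

With t ≈ −3.0 on 10 degrees of freedom, |t| exceeds the two-sided table value 2.228 at α = .05.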
Quiz 2
1. You want to know if attending summer school helps students’ grades to improve. Six
students repeat a course they did poorly in during the school year. Assume that these six
students are representative of all students who might attend summer school. Do these results
provide evidence that the summer school program is worthwhile?
June   August
54     50
49     65
68     74
66     64
62     68
62     72
a. What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario.
c. Compute the standard error of the difference for these samples.
d. What is the test statistic and at an alpha level of .05 what conclusions would you make
about the null hypothesis?
2. Can boys do more push-ups than girls? To answer this question, students at a high school
were asked, as part of a physical fitness test, to do as many push-ups as they could. Assume
that students at the high school were assigned to gym classes randomly. Here are the data:
Boys   Girls
11     24
34     7
17     14
27     16
31     2
17     15
25     19
32     25
28     10
23     27
25     31
16     8
a. Does this scenario involve dependent or independent samples? Explain.
b. What would the hypotheses be for this scenario?
c. Compute the pooled estimate for population variance.
d. Calculate the estimated standard error for this scenario.
e. What is the test statistic, and at an alpha level of .05, what conclusion would you make
about the null hypothesis?
Quiz 3
1. You want to know whether people are likely to offer a different amount for a used bicycle
when buying from a friend than when buying from a stranger. Following is the data collected:
Table 1.15:
Buying from a friend   Buying from a stranger
275                    260
300                    250
260                    175
300                    130
255                    200
275                    225
290                    240
300
a. State the null and alternative hypotheses.
b. Perform a two-sample t test.
c. Calculate the p-value.
d. State your decision.
2. Do you get better gas mileage when you use premium gas rather than regular gas? To
study this question, 10 cars from a company fleet were used. Each car was filled with regular
or premium gas, depending on the toss of a coin, and the mileage for that tank was recorded.
Then the mileage was recorded for the same cars for a tankful of the other kind of gas.
Here are the results:
Car #   Regular   Premium
1       16        19
2       20        22
3       21        24
4       22        24
5       23        25
6       22        25
7       27        26
8       25        26
9       27        28
10      28        32
a. What would be the hypotheses for this scenario?
b. Calculate the estimated standard deviation for this scenario.
c. Compute the standard error of the difference for these samples.
d. What is the test statistic, and at an alpha level of .05, what conclusion would you make
about the null hypothesis?
Test 1
1. Samples of hamburger were selected from two different outlets of a large supermarket to
measure the percentage of fat present in the meat, with the following summary data.
Table 1.16:
           N    Mean (percent)   Std. dev. (percent)
Outlet 1   5    10.3             1.6
Outlet 2   10   10.7             2.3
It is reasonable to believe that both outlets have the same variability. Hence, the pooled
standard deviation is:
a. 1.95
b. 2.08
c. 4.38
d. 2.09
e. 2.11
(Solution: e)
2. The degrees of freedom of the pooled estimate in the previous question is:
a. 15
b. 13
c. 7.5
d. 5
e. 10
(Solution: b)
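As a check on questions 1 and 2, the pooled standard deviation can be computed directly from the summary statistics in Table 1.16 (Outlet 1: n = 5, s = 1.6; Outlet 2: n = 10, s = 2.3). The Python sketch below is illustrative only: the helper names `pooled_sd` and `pooled_t` are hypothetical, and, like the question, it assumes equal population variances.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Return (pooled standard deviation, degrees of freedom) for two independent samples."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    return math.sqrt(pooled_var), df


def pooled_t(mean1, mean2, n1, s1, n2, s2):
    """Two-sample t statistic for (mean2 - mean1) using the pooled standard deviation."""
    sp, df = pooled_sd(n1, s1, n2, s2)
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean2 - mean1) / standard_error, df


# Table 1.16: Outlet 1 has n = 5, s = 1.6; Outlet 2 has n = 10, s = 2.3
sp, df = pooled_sd(5, 1.6, 10, 2.3)
print(f"pooled s = {sp:.2f}, df = {df}")  # pooled s = 2.11, df = 13
```

The same `pooled_t` helper applies to summary data such as the GPA question in Test 2 (men: ȳ = 2.898, s = 0.583, n = 13; women: ȳ = 3.330, s = 0.395, n = 19); the resulting t is compared with a critical value from a t table with n1 + n2 − 2 degrees of freedom.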
3. In a test of H0 : µ = 100 against HA : µ ≠ 100, a sample of size 10 produces a sample
mean of 103 and a p−value of 0.08. Thus, at the 0.05 level of significance:
a. there is sufficient evidence to conclude that µ ≠ 100.
b. there is sufficient evidence to conclude that µ = 100.
c. there is insufficient evidence to conclude that µ = 100.
d. there is insufficient evidence to conclude that µ ≠ 100.
e. there is sufficient evidence to conclude that µ = 103.
Solution: d. A hypothesis test collects evidence against the null hypothesis; because p = 0.08
exceeds 0.05, there is not enough evidence to conclude that µ ≠ 100.
4. In a test of H0 : µ = 100 against HA: µ ≠ 100, a sample of size 80 produces Z = 0.8
for the value of the test statistic. The p−value of the test is thus equal to:
a. 0.20
b. 0.40
c. 0.29
d. 0.42
e. 0.21
Solution: d
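The two-tailed p-value in question 4 is 2(1 − Φ(|z|)), where Φ is the standard normal CDF. A minimal Python sketch using only the standard library, with Φ written in terms of `math.erf`; the function names are illustrative, not taken from the text.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_tailed_p(z):
    """Two-tailed p-value P(|Z| >= |z|) for a z test statistic."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))


print(f"{two_tailed_p(0.8):.2f}")   # 0.42
print(f"{two_tailed_p(1.68):.4f}")  # 0.0930
```

The second call corresponds to question 4 of Test 2 (z = 1.68).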
5. In order to study the amounts owed to the city, a city clerk takes a random sample of 16
files from a cabinet containing a large number of delinquent accounts and finds the average
amount x̄ owed to the city to be $230 with a sample standard deviation of $36. It has been
claimed that the true mean amount owed on accounts of this type is greater than $250.
a. What are the null and alternate hypotheses?
b. At the .05 level of significance, compute the test statistic and p-value.
c. State your decision.
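The test statistic in question 5 is t = (x̄ − µ0)/(s/√n) with n − 1 degrees of freedom. The sketch below assumes a right-tailed alternative Ha: µ > 250, matching the stated claim; the critical value 1.753 is t(0.05, 15) read from a t table, since Python's standard library has no t distribution. The function name is illustrative.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t statistic for H0: mu = mu0 from a sample mean, sample SD, and sample size."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error


# Question 5: n = 16, sample mean = 230, s = 36, hypothesized mean = 250
t = one_sample_t(230, 250, 36, 16)
df = 16 - 1
t_crit = 1.753  # upper-tail critical value for alpha = .05, df = 15 (t table)
print(f"t = {t:.3f}, df = {df}")  # t = -2.222, df = 15
print("reject H0" if t > t_crit else "fail to reject H0")  # fail to reject H0
```

The same function covers the cholesterol question in Test 2: `one_sample_t(247, 240, 17.3, 15)` is about 1.57, below the one-tailed critical value t(0.05, 14) = 1.761.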
6. A physician wants to compare the blood pressures of six patients before and after treatment with a drug. The blood pressures are as follows:
Table 1.17:
Patient   Before Drug   After Drug
1         168           171
2         171           170
3         182           180
4         167           173
5         174           178
6         170           172
The physician wants to test, at the 0.05 level of significance, whether blood pressure
changes significantly after taking the drug.
a. State the null and alternate hypotheses.
b. Find the value of the test statistic.
c. What is your decision about the drug?
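A paired design is analyzed through the differences d = after − before, with t = d̄/(s_d/√n) and n − 1 degrees of freedom. A minimal Python sketch, assuming the Table 1.17 readings run patient by patient as 168, 171, 182, 167, 174, 170 (before) and 171, 170, 180, 173, 178, 172 (after); the two-tailed critical value 2.571 for df = 5 comes from a t table, and the function name is illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for the differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    return mean_d / (sd_d / math.sqrt(n)), n - 1


before = [168, 171, 182, 167, 174, 170]
after = [171, 170, 180, 173, 178, 172]
t, df = paired_t(before, after)
t_crit = 2.571  # two-tailed critical value for alpha = .05, df = 5 (t table)
print(f"t = {t:.2f}, df = {df}")  # t = 1.62, df = 5
print("reject H0" if abs(t) > t_crit else "fail to reject H0")  # fail to reject H0
```

The same function applies to the other paired problems in this section (relaxation training, summer school, gas mileage, starting salaries, tire wear); for a one-sided alternative, compare t with the one-tailed critical value instead.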
7. A paired difference experiment is conducted to compare the starting salaries of male and
female college graduates who find jobs. Pairs are formed by choosing a male and a female
with the same major and similar grade-point averages. Suppose a random sample of 5 pairs
yields the following starting salaries (in thousands of dollars):
Pair   Male   Female
1      25.9   24.9
2      20.0   18.5
3      28.7   27.7
4      13.5   13.0
5      18.8   17.8
Test whether the mean starting salary for males is less than that of females at the .05 level
of significance.
Test 2
1. Suppose that the variable you have measured in a sample of subjects does not have a
normal distribution in the population. Which of the following is recommended?
a. Convert all the measurements to z−scores
b. Eliminate as many measurements as necessary until your sample distribution looks like
the normal distribution.
c. Use a fairly large sample size (at least 30 or 40)
d. Choose another variable to measure – only a normally distributed variable will give you
valid results.
(Solution: c)
2. The main advantage of a one-tailed test, compared to a two-tailed test, is that:
a. Only half the calculation is required
b. Only half of the calculated t value is required
c. There is only half the risk of a type I error
d. A smaller critical value must be exceeded
(Solution: d)
3. As the calculated z−score for a one-sample test gets larger:
a. P gets larger
b. P gets smaller
c. P remains the same but alpha gets larger
d. P remains the same but alpha gets smaller
(Solution: b)
4. In a two-tailed large sample z test with calculated z = 1.68, the p−value is
a. 0.0930
b. 0.0465
c. 0.9170
d. 0.9535
5. Which of the calculated values of a test statistic would have the smallest p−value?
a. Z = 3.05
b. T = 3.05 with 10 degrees of freedom
c. T = 3.05 with 15 degrees of freedom
d. T = 3.05 with 30 degrees of freedom
6. β is the
a. Significance level
b. P −value
c. Probability of making a type II error
d. Probability of making a type I error
7. Let µ denote the mean cholesterol of heart attack patients under the age of 50. The
American Medical Association has claimed that cholesterol levels of 240 and higher dramatically increase the risk of heart attacks. A random sample of cholesterol levels of 15 heart
attack patients age 50 and under yields x̄ = 247 and s = 17.3. For testing H0: µ = 240 versus
Ha: µ > 240:
a. Calculate the value of the test statistic for testing the null hypothesis.
b. Assuming that cholesterol levels are normally distributed, determine as closely as possible
the p value associated with the value of the test statistic you found in part a).
c. Using a significance level of .05, does the sample data support the hypothesis that heart
attack patients under the age of 50 have mean cholesterol level greater than 240?
8. It is claimed that college women tend to have higher GPAs than do college men. A random
sample of 13 men and 19 women in a college class reported their grade point averages. Here
are the summary statistics for that data:
Table 1.18:
     Men     Women
ȳ    2.898   3.330
s    0.583   0.395
a. Calculate the pooled sample standard deviation.
b. Does this sample support the claim? Test an appropriate hypothesis and state your
conclusion.
9. A manufacturer wished to compare the wearing qualities of two different types of automobile tires, A and B. To make the comparison, a tire of type A and one of type B
were randomly assigned and mounted on the rear wheels of each of five automobiles. The
automobiles were then operated for a specified number of miles and the amount of wear was
recorded for each tire. These measurements appear below:
Table 1.19:
Automobile   Tire A   Tire B
1            10.6     10.2
2            9.8      9.4
3            12.3     11.8
4            9.7      9.1
5            8.8      8.3
Test the null hypothesis that there is no difference in the average length of wear for the two
types of tires.
1.9 Regressions and Correlation Quizzes
Scatterplots and Linear Correlation
Quiz 1
A teacher believes that the number of hours a student spends studying can, to some degree,
predict the student’s score on a quiz. Consider the following data:
Table 1.20:
Study Hours   Quiz Score
7             5
6             10
11            6
5             4
15            9
11            10
12            8
11            9
7             7
1. Compute the Pearson correlation coefficient between the study hours and quiz score.
2. Find r².
3. Interpret both r and r squared in words.
4. Draw a scatter plot of this data and describe the direction and strength of the relationship.
Quiz 2
The following data describe the percentage of rotten apples in a case of fruit based on the
number of days of transport to the store.
Days   Percent Rotten
1      5
2      7
3      8
4      12
5      16
6      21
1. Compute the Pearson correlation coefficient between the number of days of transport and the percent of rotten apples.
2. Find r².
3. Interpret both r and r squared in words.
4. Draw a scatter plot of this data and describe the direction and strength of the relationship.
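Pearson's r is S_xy / √(S_xx · S_yy), where the S terms are sums of products of deviations from the means. A plain-Python sketch (the function name is illustrative) applied to the rotten-apple data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of deviation products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


days = [1, 2, 3, 4, 5, 6]
percent_rotten = [5, 7, 8, 12, 16, 21]
r = pearson_r(days, percent_rotten)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")  # r = 0.974, r^2 = 0.949
```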
Quiz 3
Consider the following data set:
X    Y
7    18
6    46
11   8
10   25
9    25
18   7
1. Compute the Pearson correlation coefficient between X and Y.
2. Find r².
3. Interpret both r and r squared in words.
4. Draw a scatter plot of this data and describe the direction and strength of the relationship.
Least Squares Regression
Quiz 1
Following are data on the amount of money (in dollars) a customer spent on a product and
the customer's satisfaction (on a scale of 1 to 10) with that product.
Table 1.21:
Dollars   Satisfaction
11        6
18        8
17        10
15        4
9         9
5         6
12        3
19        5
22        2
25        10
1. Plot these data on a scatterplot (x-axis: dollars spent; y-axis: customer satisfaction).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. What is the predicted satisfaction of a customer who spends $16?
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary?
Explain your answer.
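The least squares slope is b1 = S_xy/S_xx and the intercept is b0 = ȳ − b1·x̄; each residual is y − ŷ. A minimal Python sketch (function names are illustrative), assuming the Table 1.21 values pair up as dollars 11, 18, 17, 15, 9, 5, 12, 19, 22, 25 with satisfaction 6, 8, 10, 4, 9, 6, 3, 5, 2, 10:

```python
def least_squares(xs, ys):
    """Return (intercept b0, slope b1) of the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed minus predicted value for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]
b0, b1 = least_squares(dollars, satisfaction)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")
print(f"predicted satisfaction at $16: {b0 + b1 * 16:.2f}")
for x, e in zip(dollars, residuals(dollars, satisfaction, b0, b1)):
    print(x, round(e, 2))  # residual for each customer, ready to plot against x
```

The same two functions reproduce Test 1, question 4: for the points (2, 11), (3, 17), (4, 29) the fitted line is ŷ = −8 + 9x, the residuals are 1, −2, and 1, and the minimum sum of squared residuals is 6.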
Quiz 2
The table below shows the percent of persons below the poverty level in selected years.
Let x = 0 correspond to 1980.
Year   %
1980   13
1985   14
1990   13.5
1991   14.2
1992   14.8
1993   15.1
1994   14.5
1995   13.8
1. Plot these data on a scatterplot (x-axis: year; y-axis: percent).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. Estimate the percent of people below the poverty level in 1989.
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary?
Explain your answer.
Quiz 3
The following table shows the number of deaths per 100,000 people from heart disease.
Year   Deaths
1950   510
1960   521
1970   496
1980   436
1990   368
1996   358
1. Plot these data on a scatterplot (x-axis: year; y-axis: number of deaths).
2. Does there appear to be a linear relationship?
3. Calculate the regression equation for these data.
4. Sketch the regression line on the scatterplot.
5. Estimate the number of deaths due to heart disease in 1974.
6. Calculate the residuals for each of the observations and plot these residuals on a scatterplot.
7. Examine the scatterplot of the residuals. Is a transformation of the data necessary?
Explain your answer.
Inferences about Regression
Quiz 1
In examining the relationship between a child’s height (in cm) and his or her age (in months)
the following summary statistics were output from a computer program:
n = 12
Table 1.22:
Parameter   Estimate   Standard Error
Age         .6350      .0214
Constant    64.9283    .5084
1. What is the predictor variable?
2. Do you think the two variables are correlated? Explain.
3. What would be the regression equation for predicting height from the age?
4. Use the regression equation and predict the height of a child who is 18 months old.
5. Test the null hypothesis that the regression coefficient for this scenario is zero.
a. Develop the null and alternate hypotheses
b. Set the critical values at the .01 level of significance.
c. Compute the test statistic
d. Make a decision regarding the null hypothesis.
6. Develop a 95% confidence interval for β.
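Inference about the slope uses t = b1/SE(b1) with n − 2 degrees of freedom, and the interval b1 ± t*·SE(b1). The sketch below plugs in the Age row of Table 1.22 (b1 = .6350, SE = .0214, n = 12); the critical value t* = 2.228 (df = 10, 95% confidence) is taken from a t table and supplied by hand, and the function name is illustrative.

```python
def slope_inference(b1, se_b1, t_star):
    """Return the t statistic for H0: beta = 0 and the interval b1 +/- t_star * SE."""
    t = b1 / se_b1
    margin = t_star * se_b1
    return t, (b1 - margin, b1 + margin)


n = 12
df = n - 2
b0, b1, se_b1 = 64.9283, 0.6350, 0.0214  # Constant and Age rows of Table 1.22
t, (low, high) = slope_inference(b1, se_b1, t_star=2.228)  # t* for df = 10, 95% CI
print(f"t = {t:.2f} on {df} df")
print(f"95% CI for beta: ({low:.4f}, {high:.4f})")
print(f"predicted height at 18 months: {b0 + b1 * 18:.2f} cm")
```

For the .01-level test in question 5, compare |t| with the two-tailed critical value t(0.005, 10) = 3.169.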
Quiz 2
Given the following summary statistics:
n = 56
x̄ = 39.0
ȳ = 26.5
sx = 5.4
sy = 13.4
r = −.848
1. Find the estimate of the regression coefficient.
2. Find the estimate of the constant.
3. What would the regression equation be for predicting Y from X?
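With only summary statistics, the slope is b1 = r·sy/sx and the least squares line passes through (x̄, ȳ), so b0 = ȳ − b1·x̄. A small sketch using the Quiz 2 numbers (the function name is illustrative):

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Least squares intercept and slope from means, standard deviations, and r."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = line_from_summary(x_bar=39.0, y_bar=26.5, s_x=5.4, s_y=13.4, r=-0.848)
print(f"y-hat = {b0:.2f} + ({b1:.4f})x")
```

The slope formula also settles Test 2, question 7: 0.63 × 1.9 / 2.5 ≈ 0.48.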
Quiz 3
When inflation is high, lenders require higher interest rates to make up for the loss of
purchasing power of their money while it is loaned out. The data are the returns on one-year Treasury bills and the rate of inflation as measured by the change in the government’s
Consumer Price Index in the same year. The data cover 51 years, from 1950 to 2000.
Following is output from a computer program that analyzed the data.
Linear Fit
T-bill = 2.6662262 + 0.6269356 Inflation
Summary of Fit
RSquare 0.448878
RSquare Adj 0.437631
Root Mean Square Error 2.18016
Mean of Response 5.198431
Observations (or Sum Wgts) 51
Table 1.23: Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob > |t|
Intercept   2.6662262   0.503848    5.29      < .0001
Inflation   0.6269356   0.099239    6.32      < .0001
1. What is the correlation between inflation rates and T-bill returns?
2. What is the slope b1 of the fitted line and its standard error? (see output)
3. Calculate the t-statistic for testing the hypothesis that there is no straight line relationship
between inflation rate and T-bill return against the alternative that the return on T-bills
increases as the rate of inflation increases.
a. State the hypotheses:
b. Calculate t and report its degrees of freedom:
4. Find the regression equation.
5. Find a 90% confidence interval for the slope of the regression line.
Multiple Regression
Quiz 1
The experiment took place in February of 1986 at a student dormitory. Sixteen students
volunteered to be the subjects in the experiment. Each student blew into a breathalyzer to
indicate that his or her initial BAC was zero. The number (between 1 and 9) of 12 ounce
beers to be drunk was assigned to each of the subjects by drawing tickets from a bowl. Thirty
minutes after consuming their final beer, students had their BAC measured by a police officer
of the OSU police department. The officer also administered a road sobriety test before and
after the alcohol consumption. This involved performing four simple tasks, graded on a scale
of 1 to 10 (ten being a perfect rating), demonstrating coordination: balancing on one foot,
touching the tip of one’s nose with a forefinger, placing one’s head back with one’s eyes
closed, and walking heel to toe. The police officer was not aware of how much alcohol each
subject had consumed.
- taken from the Electronic Encyclopedia for Statistical Examples and Exercises entitled
‘BAC.’
The Variables:
ID = identification number
Gender = indicated by female or male
Weight = weight of each subject in pounds.
Beers = number of 12 ounce beers consumed
BAC = blood alcohol content
1st-Sobriety = combined score on the four road sobriety tests before alcohol consumption
2nd-Sobriety = combined score on the four road sobriety tests after alcohol consumption
Following are the summary statistics:
Dependent Variable: BAC vs. Independent Variable(s): Beers, Weight, Gender = male
Table 1.24: Parameter estimates:
Variable            Estimate        Std. Err.       Tstat         P-value
Intercept           0.03870783      0.010972462     3.5277252     0.0042
Beers               0.019895956     0.0013093256    15.195577     < 0.0001
Weight_OSU          −3.444049E−4    6.842001E−5     −5.0336866    0.0003
Gender_OSU = male   −0.0032403069   0.0062860446    −0.5154763    0.6156
Table 1.25: Analysis of variance table for multiple regression model:
Source   DF   SS             MS               F-stat     P-value
Model    3    0.027846638    0.009282213      80.81081   < 0.0001
Error    12   0.0013783621   1.14863506E−4
Total    15   0.029225
Root MSE: 0.010717439
R-squared: 0.9528
1. How many predictor variables are there in this scenario?
2. What does the regression coefficient for beers tell us?
3. What is the regression model for this analysis?
4. What is R square and what does it indicate?
5. Which of the predictor variables are statistically significant? Explain.
Quiz 2
Researchers are interested in predicting the carbon monoxide (CO) output from cigarettes
by using the amount of nicotine and the amount of tar. They take an SRS of 29 brands of
cigarettes and measure these quantities. Computer output for the analysis is given below:
Regression Analysis: CO versus NICOTINE, TAR
The regression equation is
CO = 2.47 − 6.43 NICOTINE + 1.32 TAR
Table 1.26:
Predictor   Coef     SE Coef   T       P
Constant    2.4711   0.9766    2.53    0.018
NICOTINE    −6.426   3.416     −1.88   0.071
TAR         1.3184   0.2312    5.70    0.000
S = 1.559   R-Sq = 88.7%   R-Sq(adj) = 87.8%
Analysis of Variance
Table 1.27:
Source           DF   SS       MS       F        P
Regression       2    495.60   247.80   101.90   0.000
Residual Error   26   63.23    2.43
Total            28   558.83
1. What is the equation of the least squares regression line?
2. What is the predicted value of CO content when the level of nicotine in a cigarette is 1
mg and the level of tar is 10 mg?
3. Find the coefficient of determination, R², and interpret it in terms of the problem.
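A prediction from a fitted multiple regression substitutes each predictor value into the equation, and R² from the ANOVA table is SS(Regression)/SS(Total). A short sketch using the coefficients in Table 1.26 and the sums of squares in Table 1.27; the helper names are illustrative.

```python
def predict(intercept, coefficients, values):
    """Evaluate y-hat = b0 + b1*x1 + b2*x2 + ... for one observation."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def r_squared(ss_regression, ss_total):
    """Coefficient of determination from an ANOVA table."""
    return ss_regression / ss_total


# CO = 2.4711 - 6.426 NICOTINE + 1.3184 TAR, evaluated at nicotine = 1 mg, tar = 10 mg
co_hat = predict(2.4711, [-6.426, 1.3184], [1, 10])
print(f"predicted CO: {co_hat:.2f}")  # predicted CO: 9.23
print(f"R^2 = {r_squared(495.60, 558.83):.3f}")  # R^2 = 0.887
```

The same `predict` call handles the Corvette question later in this section: with Age = 10 and Mileage = 55 (thousand miles), `predict(34.315, [-1.1400, -0.20065], [10, 55])` gives about 11.88, that is, roughly $11,880.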
How much alcohol can one consume before one’s Blood Alcohol Content (BAC) is above the
legal limit? An undergraduate statistics project was conducted at The Ohio State University
in Columbus, Ohio that explored the relationship between BAC and other factors such as
amount of alcohol consumed, gender, weight, and age.
The Study:
The experiment took place in February of 1986 at a student dormitory. Sixteen students
volunteered to be the subjects in the experiment. Each student blew into a breathalyzer to
indicate that his or her initial BAC was zero. The number (between 1 and 9) of 12 ounce
beers to be drunk was assigned to each of the subjects by drawing tickets from a bowl. Thirty
minutes after consuming their final beer, students had their BAC measured by a police officer
of the OSU police department. The officer also administered a road sobriety test before and
after the alcohol consumption. This involved performing four simple tasks, graded on a scale
of 1 to 10 (ten being a perfect rating), demonstrating coordination: balancing on one foot,
touching the tip of one’s nose with a forefinger, placing one’s head back with one’s eyes
closed, and walking heel to toe. The police officer was not aware of how much alcohol each
subject had consumed.
- taken from the Electronic Encyclopedia for Statistical Examples and Exercises entitled
‘BAC.’
Simple linear regression results:
Dependent Variable: BAC
Independent Variable: Beers
BAC = −0.012700604 + 0.017963761 Beers
Sample size: 16
R (correlation coefficient) = 0.8943
R-sq = 0.79984075
Estimate of error standard deviation: 0.020440951
Table 1.28: Parameter estimates
Parameter   Estimate       Std. Err.      DF   T-Stat       P-Value
Intercept   −0.012700604   0.0126375025   14   −1.0049932   0.332
Slope       0.017963761    0.0024017035   14   7.479592     < 0.0001
Table 1.29: Analysis of variance table for regression model
Source   DF   SS            MS            F-stat      P-value
Model    1    0.023375345   0.023375345   55.944298   < 0.0001
Error    14   0.005849655   4.178325E−4
Total    15   0.029225
4. Interpret the intercept in the context of the problem.
5. Interpret the slope in the context of the problem.
6. What is the regression equation?
7. Predict the BAC of a person who has consumed 15 beers.
Quiz 3
When the Dow Jones stock index first reached 10,000, the New York Times reported the
dates on which the Dow first crossed each of the “thousand” marks, starting with reaching
1000 in 1972. A regression of the Dow prices on year looks (in part) like this:
Dependent variable is: Dow
R-squared = 65.8%
Variable   Coefficient
Constant   −603335.00
Year       305.47
1. What is the correlation between the Dow index and the year?
2. Write the regression equation.
3. Explain in this context what the equation says.
A car dealer, specializing in Corvette sports cars, enlarged his facilities and offered a number
of models for sale. His sales list includes data on age (in years), mileage (in thousand
miles), and selling price (in thousand dollars) of cars. Use the explanatory variables, age
and mileage, to predict the selling price of a car.
Regression Analysis
The regression equation is
Price = 34.3 − 1.14 Age − 0.201 Mileage
Table 1.30:
Predictor   Coef       StdErr    T       P
Constant    34.315     1.950     17.59   0.000
Age         −1.1400    0.2943    −3.87   0.002
Mileage     −0.20065   0.06027   −3.33   0.006
S = 3.259   R-Sq = 89.2%   R-Sq(adj) = 87.4%
Analysis of Variance
Table 1.31:
Source           DF   SS        MS       F       P
Regression       2    1048.36   524.18   49.35   0.000
Residual Error   12   127.45    10.62
Total            14   1175.82
4. Write the least-squares fitted regression line.
5. If x1 is held constant, what is the change in ŷ when x2 is increased by 1 unit? This is
considered the “slope” for Mileage (x2). Interpret it.
6. Use the least-squares fitted regression line to predict the selling price of a car that is
10 years old with a mileage of 55,000 miles. Assume the values given for age and mileage are
within the range of data used to calculate the least-squares fitted regression line.
Test 1
1. Foresters use regression to predict the volume of timber in a tree using easily measured
quantities. Let y be the volume of timber measured in cubic feet and x be the diameter in
feet (measured 3 feet above the ground). The fitted line is ŷ = −30 + 60x. The predicted
volume for a tree with a diameter of 18 inches is
a. 1050 cubic feet
b. 600 cubic feet
c. 105 cubic feet
d. 90 cubic feet
e. 60 cubic feet
2. Given the least squares regression line (cost of a Monopoly property) = 67.3 + 6.78
(spaces from GO), determine the residual for Reading Railroad, which costs $200 and is 5
spaces from GO.
a. −98.8
b. −9.88
c. 98.8
d. −1418.3
e. A residual has no meaning since one of the variables is categorical.
3. With regard to regression, which of the following statements about outliers are true?
I Outliers have large residuals
II A point may not be an outlier even though its x value is an outlier in the x−variable and
its y−value is an outlier in the y−variable.
III Removal of an outlier sharply affects the regression line
a. I and II
b. I and III
c. II and III
d. I, II, and III
e. None of the above gives the complete set of true responses.
4. Consider the three points (2, 11), (3, 17), and (4, 29). Given any straight line, we can
calculate the sum of the squares of the three vertical distances from these points to the line.
What is the smallest possible value this sum can be?
a. 6
b. 9
c. 29
d. 57
e. Cannot be determined
5. A bivariate data set has r² = .81. Which is an appropriate conclusion?
a. r = 0.9
b. 81% of the data is usable
c. There is an 81% chance that the regression line will fit the data.
d. 81% of the variation between the variables is accounted for by the model.
e. None of these is appropriate.
6. Consider the following data set:
Table 1.32:
School   Football Players’ SAT   All Students’ SAT
1        872                     1140
2        741                     1007
3        826                     1190
4        788                     998
5        838                     1050
6        1034                    1250
7        820                     986
8        897                     1083
9        881                     1009
10       825                     1090
a. Which school’s scores would be influential if you were to make a scatterplot?
b. Using football players’ SAT as your explanatory variable, find the least squares regression
line for this data.
c. If a school’s players have an average SAT score of 814, what score would you predict for
the entire student body?
d. Find the residual for school number 8.
7. Consider the following computer output of a linear regression analysis:
The two variables are Percent 2002 (independent variable) and Total SAT 2002 (dependent
variable).
Summary of Fit
RSquare .770503
RSquare Adj .76582
Root Mean Square Error 32.11888
Observations 51
Table 1.33: Analysis of Variance
Source     DF   Sum of Squares
Model      1    169713.02
Error      49   50549.49
C. Total   50   220262.51
Table 1.34: Parameter Estimates
Term           Estimate    Std Error
Intercept      1145.1989   7.57007
Percent 2002   2.046837    0.159583
What is the equation for the least squares regression line?
8. Data was collected on two variables X and Y and a least squares regression line was fitted
to the data. The estimated equation is Y = −2.29 + 1.70X. What is the residual for the
point (5, 6)?
Test 2
1. If the coefficient of determination is calculated as 0.81, then the correlation coefficient is:
a. 0.81
b. 0.9
c. −0.9
d. 0.405
e. Cannot be determined
2. A regression analysis of company profits and the amount of money the company spent on
advertising found r² to be .72. Which of these is true?
I This model can correctly predict the profit for 72% of the companies.
II On average, 72% of a company’s profit results from advertising
III On average, companies spend about 72% of their profit on advertising
a. I only
b. II only
c. III only
d. I and III
e. None are correct
3. Medical records indicate that people with more education tend to live longer; the correlation is .48. The slope of the linear model that predicts lifespan from years of education suggests
that on average people tend to live .8 extra years for each additional year of education they
have. The slope of the line that would predict years of education from lifespan is
a. .288
b. .384
c. .8
d. 1.25
e. 1.67
4. A regression analysis examines the relationship between the number of years of formal
education a person has and his or her annual income. The dependent variable is Income, and
R-squared is 25.8%. The constant is 3984.45 and the coefficient of education is 2668.45.
According to this model, about how much more money do people who finish a 4-year college
program earn each year, on average, than those with only a 2-year degree?
a. $2006
b. $2710
c. $5337
d. $7968
e. $9321
5. The correlation between a family’s weekly income and the amount they spend on restaurant meals is found to be r = .30. Which must be true?
I Families tend to spend about 30% of their income in restaurants
II In general, the higher the income, the more the family spends in restaurants
III The line of best fit passes through 30% of the data points (income, restaurant$)
a. I only
b. II only
c. III only
d. II and III only
e. I, II and III
6. Which of the following statements about the correlation coefficient are true?
I The correlation coefficient and the slope of the regression line may have opposite signs.
II A correlation of 1 indicates a perfect cause-and-effect relationship between the variables.
III Correlations of +.87 and −.87 indicate the same degree of clustering around the regression
line.
a. I only
b. II only
c. III only
d. I and II
e. I, II and III
7. Given a set of ordered pairs (x, y) with sx = 2.5, sy = 1.9, r = .63 , what is the slope of
the regression line of y on x?
a. 1.9
b. 2.63
c. 0.65
d. 1.32
e. 0.48
8. Lydia and Bob were searching the Internet to find information on air travel in the United
States. They found data on the number of commercial aircraft flying in the United States
during the years 1990-1998. The dates were recorded as years since 1990. Thus, the year
1990 was recorded as year 0. They fit a least squares regression line to the data. The graph
of the residuals and part of the computer output for their regression are given below.
y = 2939.93 + 233.517x
r = 0.88
a. Is a line an appropriate model to use for these data? What information tells you this?
b. What is the value of the slope of the least squares regression line?
Interpret the slope in the context of this situation.
c. What is the value of the intercept of the least squares regression line?
Interpret the intercept in the context of this situation.
d. What is the predicted number of commercial aircraft flying in 1992?
e. What was the actual number of commercial aircraft flying in 1992?
9. Following are the distances (in miles) and cheapest airline fares (in dollars) to certain
destinations for passengers flying out of Baltimore, Maryland as of January 8, 1995.
Table 1.35:
Destination   Distance (miles)   Airfare (dollars)
Atlanta       576                178
Boston        370                138
Chicago       612                94
Dallas        1216               278
Detroit       409                158
Denver        1502               258
Miami         946                198
New Orleans   998                188
New York      189                98
Orlando       787                179
Pittsburgh    210                138
St. Louis     737                98
a. Write the equation of the least squares line for predicting airfare from distance.
b. What airfare does the least squares line predict for a destination which is 300 miles away?
c. What airfare does the least squares line predict for a destination which is 1500 miles
away?
d. Use the equation of the regression line to predict the airfare to a destination 900 miles
away.
e. What airfare would the regression line predict for a flight to San Francisco which is
2842 miles from Baltimore? Would you take this prediction as seriously as the one for
900 miles? Explain.
1.10 Chi-Square
The Goodness of Fit Test
Quiz 1
1. Complete the following sentence:
The chi-square goodness of fit test is used to
2. Following is information on the ethnicity distribution of holders of the highest academic
degree for the year 1981:
Table 1.36:
Race/Ethnicity                   Percent
White, non-Hispanic              78.9
Black, non-Hispanic              3.9
Hispanic                         1.4
Asian or Pacific Islander        2.7
American Indian/Alaskan Native   .4
Non-resident alien               12.8
A random sample of 300 doctoral degree recipients in 1994 showed the following frequency
distribution:
Table 1.37:
Race/Ethnicity                   Observed
White, non-Hispanic              189
Black, non-Hispanic              10
Hispanic                         6
Asian or Pacific Islander        14
American Indian/Alaskan Native   1
Non-resident alien               80
a. If the distribution from 1981 is accurate how many recipients, out of the 300, would you
expect to see of each ethnicity in 1994?
b. Perform a goodness of fit test to determine if the distribution in 1994 is significantly
different from the distribution in 1981.
i. State the null and alternate hypotheses.
ii. State the number of degrees of freedom for this test.
iii. Use a chi-square table to determine the critical value at the .05 level of significance.
iv. Calculate the test statistic
v. State your decision and your conclusion.
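The goodness of fit statistic is χ² = Σ(O − E)²/E with k − 1 degrees of freedom, where each expected count is n times the hypothesized proportion. A plain-Python sketch for the doctoral-degree data; the helper name is illustrative, and the critical value 11.070 (df = 5, α = .05) is read from a chi-square table.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness of fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


percent_1981 = [78.9, 3.9, 1.4, 2.7, 0.4, 12.8]
observed_1994 = [189, 10, 6, 14, 1, 80]
n = sum(observed_1994)  # 300 recipients
expected_1994 = [n * p / 100 for p in percent_1981]
statistic = chi_square_gof(observed_1994, expected_1994)
df = len(observed_1994) - 1
critical = 11.070  # chi-square critical value, df = 5, alpha = .05
print([round(e, 1) for e in expected_1994])  # [236.7, 11.7, 4.2, 8.1, 1.2, 38.4]
print(f"chi-square = {statistic:.2f}, df = {df}")
print("reject H0" if statistic > critical else "fail to reject H0")
```

Two of the expected counts (Hispanic and American Indian/Alaskan Native) fall below 5, which the usual rule of thumb for this test flags. For the equal-proportion questions that follow (zodiac signs, law-firm partners), pass the same expected count for every category: n divided by the number of categories.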
Quiz 2
1. The Chi-Square test of independence is used to
2. Following is information that has been gathered about the number of births for each
zodiac sign in a given period of time.
Table 1.38:
Sign          Births
Aries         23
Taurus        20
Gemini        18
Cancer        23
Leo           20
Virgo         19
Libra         18
Scorpio       21
Sagittarius   19
Capricorn     22
Aquarius      24
Pisces        29
Use the chi-square goodness of fit test to determine if births are uniformly distributed over
the zodiac signs.
a. State the null and alternate hypotheses
b. How many degrees of freedom does this test have?
c. Use a chi-square table to determine the critical value at the .01 level of significance
d. Calculate the test statistic.
e. State your decision and your conclusion.
Quiz 3
1. The chi-square test of homogeneity is used to
2. The partners in a law firm brought in the following numbers of new clients during the
past year:
Partner   Number of new clients
Jones     35
Smith     42
Brown     22
Allen     41
Cross     30
Is there sufficient evidence at the .10 level of significance that partners do not bring in equal
numbers of new clients?
a. State the null and alternate hypotheses
b. How many degrees of freedom does this test have?
c. Use a chi-square table to determine the critical value at the .10 level of significance
d. Calculate the test statistic.
e. State your decision and your conclusion.
Test of Independence
Quiz 1
A company held a blood pressure screening for its employees. The results are summarized
in the following table. The information is categorized by age group and blood pressure level.
Table 1.39:
             Low    Normal    High
Under 30      27      48       23
30-49         37      91       51
Over 50       31      93       73
1. What proportion of employees under 30 have high blood pressure?
2. What proportion of people with high blood pressure are over 50?
3. Does there appear to be an association between age and high blood pressure among these
employees?
i. State the null and alternate hypotheses.
ii. How many degrees of freedom are in this chi-square test?
iii. Calculate the chi-square statistic.
iv. Use a chi-square table to determine the critical value for this test at the .05 level of significance.
v. State your decision and your conclusion.
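A short Python sketch of the test of independence, assuming Table 1.39 has the age groups as rows and the blood-pressure levels as columns. The statistic itself does not depend on which variable is placed in the rows; the row/column choice matters only for questions 1 and 2.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected counts) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total)(column total) / (grand total)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Table 1.39: rows Under 30, 30-49, Over 50; columns Low, Normal, High.
bp = [
    [27, 48, 23],
    [37, 91, 51],
    [31, 93, 73],
]

stat, df, expected = chi_square_independence(bp)
p_high_given_under30 = bp[0][2] / sum(bp[0])                 # question 1
p_over50_given_high = bp[2][2] / sum(row[2] for row in bp)   # question 2

print(f"P(high | under 30) = {p_high_given_under30:.3f}")
print(f"P(over 50 | high) = {p_over50_given_high:.3f}")
print(f"chi-square = {stat:.3f}, df = {df}")
```

For these counts the statistic comes out near 9.35 on 4 degrees of freedom, just under the .05 table value of 9.488.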
Quiz 2
The following table shows the political affiliation of American voters and their positions on
the death penalty. This data is hypothetical.
Table 1.40:
                              Republican    Democrat    Other
Support the death penalty        .26           .12        .24
Oppose the death penalty         .04           .24        .10
1. What is the probability that a randomly chosen voter supports the death penalty?
2. What is the probability that a randomly chosen voter is not a Republican?
3. What is the probability that someone who favors the death penalty is a Democrat?
4. What is the probability that a Republican supports the death penalty?
5. What is the probability that a voter chosen at random is a Democrat and opposes the
death penalty?
6. What is the probability that a voter chosen at random is either a Democrat OR opposes
the death penalty?
7. Do party affiliation and opinion about the death penalty seem to be independent?
a. State the null and alternative hypotheses.
b. How many degrees of freedom does your test have?
c. Using technology, calculate the chi-square statistic.
d. What is the p−value associated with your statistic?
e. Using the .01 level of significance, state your decision and your conclusion.
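Questions 1 through 6 are marginal, conditional, and addition-rule probabilities read off the joint table. The sketch below assumes Table 1.40 has the opinions as rows and the parties as columns, with entries that are joint probabilities summing to 1.

```python
# Table 1.40 joint probabilities: rows are opinions, columns are parties.
joint = {
    "support": {"Republican": 0.26, "Democrat": 0.12, "Other": 0.24},
    "oppose": {"Republican": 0.04, "Democrat": 0.24, "Other": 0.10},
}


def p_opinion(opinion):
    """Marginal probability of an opinion (sum across parties)."""
    return sum(joint[opinion].values())


def p_party(party):
    """Marginal probability of a party (sum across opinions)."""
    return sum(joint[op][party] for op in joint)


answers = {
    1: p_opinion("support"),                                    # P(support)
    2: 1 - p_party("Republican"),                               # P(not Republican)
    3: joint["support"]["Democrat"] / p_opinion("support"),     # P(Democrat | support)
    4: joint["support"]["Republican"] / p_party("Republican"),  # P(support | Republican)
    5: joint["oppose"]["Democrat"],                             # P(Democrat and oppose)
    # Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
    6: p_party("Democrat") + p_opinion("oppose") - joint["oppose"]["Democrat"],
}

for question, p in answers.items():
    print(f"Q{question}: {p:.3f}")
```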
Quiz 3
Some researchers were interested in a possible relationship between heart disease and baldness
and so they asked a sample of 663 male heart patients to classify their degree of baldness.
They also asked a control group of 772 males to do the same baldness assessment. The
following table has the results:
Table 1.41:
                 None    Little    Some    Much    Extreme
Heart Disease     251      165      195      50        2
Control           331      221      185      34        1
1. What proportion of these men identified themselves as having little or no baldness?
2. Of those who had heart disease, what proportion claimed to have some, much or extreme
baldness?
3. Of those who declared themselves as having little or no baldness, what proportion was in
the control group?
4. Determine whether a relationship seems to exist between heart disease and baldness.
a. State the null and alternative hypotheses.
b. How many degrees of freedom does your test have?
c. Using technology, calculate the chi-square statistic.
d. What is the p−value associated with your statistic?
e. Using the .05 level of significance, state your decision and your conclusion.
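One way to do part (c) "using technology" is the short Python sketch below, assuming Table 1.41 has the two groups as rows and the five baldness levels as columns. It also flags cells whose expected count falls below 5, since that condition matters for this table.

```python
# Table 1.41: rows Heart Disease, Control;
# columns None, Little, Some, Much, Extreme.
baldness = [
    [251, 165, 195, 50, 2],
    [331, 221, 185, 34, 1],
]

row_totals = [sum(row) for row in baldness]
col_totals = [sum(col) for col in zip(*baldness)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(baldness, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(baldness) - 1) * (len(col_totals) - 1)

# Flag cells whose expected count is below the usual rule-of-thumb minimum of 5.
small_cells = [
    (i, j, round(e, 2))
    for i, exp_row in enumerate(expected)
    for j, e in enumerate(exp_row)
    if e < 5
]

# Question 1: proportion reporting little or no baldness.
p_little_or_none = sum(row[0] + row[1] for row in baldness) / n

print(f"n = {n}, chi-square = {stat:.2f}, df = {df}")
print("cells with expected count < 5:", small_cells)
print(f"P(little or none) = {p_little_or_none:.3f}")
```

The statistic is roughly 14.6 on 4 degrees of freedom, but both "Extreme" cells have expected counts of only about 1.4 and 1.6, so the chi-square approximation is shaky there. One common remedy (a judgment call, not something the quiz specifies) is to merge "Much" and "Extreme" before testing.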
Testing One Variance
Quiz 1
Suppose a sample of 30 observations is drawn from a population with $\sigma_0^2 = 4.55$. The sample variance is $s^2 = 6.7$. Test the hypothesis that the sample comes from a population with a variance greater than 4.55.
1. State the null and alternate hypotheses.
2. How many degrees of freedom are there?
3. Compute the chi-square statistic.
4. What is the p−value for your test?
5. What is your decision and your conclusion?
6. Construct a 95% confidence interval for the population variance.
7. Complete the following: In testing a single variance with the chi-square statistic, three pieces of information are needed: the sample standard deviation, the number of data pieces in your sample ($n$) and ________.
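A minimal Python sketch of questions 3 and 6, using $\chi^2 = (n-1)s^2/\sigma_0^2$ and the standard interval $\left[(n-1)s^2/\chi^2_{\text{upper}},\ (n-1)s^2/\chi^2_{\text{lower}}\right]$. The two chi-square values for 29 degrees of freedom are typed in from a standard table, not computed.

```python
# Quiz 1 values: n = 30, s^2 = 6.7, hypothesized sigma_0^2 = 4.55.
n, s2, sigma0_sq = 30, 6.7, 4.55
df = n - 1

# Test statistic for one variance: chi^2 = (n - 1) s^2 / sigma_0^2
stat = df * s2 / sigma0_sq

# Chi-square table values for df = 29 (typed in, not computed):
# 45.722 cuts off .025 in the upper tail, 16.047 cuts off .975.
CHI2_UPPER_025, CHI2_UPPER_975 = 45.722, 16.047

# 95% CI for sigma^2: [(n-1)s^2 / larger value, (n-1)s^2 / smaller value]
ci = (df * s2 / CHI2_UPPER_025, df * s2 / CHI2_UPPER_975)

print(f"chi-square = {stat:.3f}, df = {df}")
print(f"95% CI for the variance: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The statistic of about 42.70 sits just above the upper-tail .05 value for 29 degrees of freedom (42.557), so the one-sided p-value is slightly below .05.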
Quiz 2
Math instructors are often interested in how their students' exam scores vary, so the variance is important to them. Suppose a math instructor believes that the standard deviation for his final exam is 7 points, but a student disagrees: the student claims that the standard deviation is more than 7 points and wants to conduct a hypothesis test. The student takes a random sample of 15 tests and finds the sample standard deviation to be 6.5 points.
1. State the null and alternate hypotheses for this test.
2. How many degrees of freedom are there for this test?
3. Compute the chi-square statistic.
4. What is your p−value?
5. What is your decision and your conclusion?
6. Construct a 90% confidence interval for the population variance.
7. Complete the following: In testing a single variance with the chi-square statistic, three pieces of information are needed: the hypothesized population variance, the number of data pieces in your sample ($n$) and ________.
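The same calculation for the exam data, sketched in Python with a 90% interval. As before, the two chi-square values for 14 degrees of freedom are typed in from a standard table.

```python
# Quiz 2 values: n = 15 exams, s = 6.5 points, claimed sigma_0 = 7 points.
n, s, sigma0 = 15, 6.5, 7.0
df = n - 1

stat = df * s ** 2 / sigma0 ** 2     # (n - 1) s^2 / sigma_0^2

# Table values for df = 14 (typed in): 23.685 cuts off .05 in the upper tail,
# 6.571 cuts off .95, which together give a 90% interval.
CHI2_UPPER_05, CHI2_UPPER_95 = 23.685, 6.571
ci = (df * s ** 2 / CHI2_UPPER_05, df * s ** 2 / CHI2_UPPER_95)

print(f"chi-square = {stat:.3f}, df = {df}")
print(f"90% CI for the variance: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the sample standard deviation (6.5) is below the claimed 7, the statistic (about 12.07) is smaller than its degrees of freedom, and an upper-tailed test cannot reject $H_0$ here.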
Quiz 3
1. Complete the following: In testing a single variance with the chi-square statistic, three pieces of information are needed: the sample standard deviation, the hypothesized population standard deviation and ________.
A post office finds that the standard deviation for waiting times on a Monday afternoon is 6.8 minutes. The post office experiments with a single main waiting line and finds that, for a random sample of 25 customers, the waiting times have a standard deviation of 7.1 minutes.
2. State the null and alternate hypotheses for this test.
3. How many degrees of freedom are there for this test?
4. Compute the chi-square statistic.
5. What is your p−value?
6. What is your decision and your conclusion?
7. Construct a 90% confidence interval for the population variance.
Test 1
In the paper “Color Association of Male and Female fourth-Grade School Children” (J.
Psych., 1988, 39=83-8) children were asked to indicate what emotion they associated with
the color red. The response and the sex of the child are in the table below.
Table 1.42:
          Females    Males
Anger        27        34
Happy        19        12
Love         39        38
Pain         17        28
1. Under an appropriate null hypothesis (that there is no association between sex and
emotion felt when seeing the color red), the expected frequency for the cell corresponding to
Anger and Male is
a. 15.9
b. 55.7
c. 30.4
d. 31.9
e. 29.1
2. The null hypothesis will be rejected at the .05 level of significance if the test statistic exceeds
a. 3.84
b. 5.99
c. 7.81
d. 9.49
e. 14.07
3. The approximate p−value is
a. Between .100 and .900
b. Between .050 and .100
c. Between .025 and .050
d. Between .010 and .025
e. Between .005 and .010
4. Which of the following is not correct?
a. The children were cross-classified by sex and emotion associated with red. Each child was
counted in one and only one cell.
b. The null hypothesis is that the type of emotion associated with red is independent of the
sex of the child.
c. The null hypothesis is that the proportion of emotions associated with red is the same for
both sexes.
d. All expected cell counts should be greater than 5 in order that the distribution of the test
statistic is an approximate chi-square distribution.
e. If we reject the null hypothesis, then we have proven that the two sexes associate red with emotions in different ways.
5. A Type I error would be committed if:
a. We conclude that the sex of the child and the emotion associated with red are independent
when in fact they are not independent.
b. We conclude that the sex of the child and the emotion associated with red are not
independent when in fact they are not independent.
c. We conclude that the proportion of emotions associated with red differs between males
and females when in fact they are the same.
d. We conclude that the proportion of emotions associated with red is the same for male
and female when in fact they are the same.
e. We fail to find any association between the color red and emotions for either sex.
6. The test statistic and approximate p−value are:
a. 4.661 .1983
b. 4.661 .3966
c. 4.629 .2011
d. 4.629 .4022
e. 4.629 .1006
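To check the answer choices for questions 1 and 6, a short Python sketch, assuming Table 1.42 has the emotions as rows and the sexes as columns. It reproduces the expected count and the test statistic but not the p-value, which still comes from a table or statistical software.

```python
# Table 1.42: rows Anger, Happy, Love, Pain; columns Females, Males.
red = [
    [27, 34],
    [19, 12],
    [39, 38],
    [17, 28],
]

row_totals = [sum(row) for row in red]
col_totals = [sum(col) for col in zip(*red)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Question 1: expected count for (Anger, Male) = (row total)(column total) / n
anger_male_expected = expected[0][1]

# Question 6: Pearson statistic with (4 - 1)(2 - 1) = 3 degrees of freedom
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(red, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(red) - 1) * (len(red[0]) - 1)

print(f"E(Anger, Male) = {anger_male_expected:.1f}")
print(f"chi-square = {stat:.3f}, df = {df}")
```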
7. Each person in a random sample of males and females was asked to state his/her sex and
preferred color. The resulting frequencies are shown below:
Table 1.43:
          Male    Female
Red         3       17
Blue       11       11
Green       6        2
Which of the following is false?
a. 55% of males prefer the color blue
b. Of those who prefer the color green, 75% are males
c. 44% of people surveyed prefer the color blue
d. A higher percentage of males preferred the color blue than females.
e. 15% of people are males who prefer the color red.
8. A rescue service wishes to study the behavior of lost hikers. Two hundred hikers selected at random from those applying for hiking permits are asked whether they would head uphill, head downhill, or remain in the same place if they became lost while hiking. Each hiker in the sample was also classified according to whether he or she was an experienced or novice hiker. The resulting data are summarized below:
Table 1.44:
                          Novice    Experienced
Uphill                      20          10
Downhill                    50          30
Remain in same place        50          40
Do these data provide convincing evidence of an association between the level of hiking
expertise and the direction the hiker would head if lost? Give appropriate statistical evidence
to support your conclusion.
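A Python sketch of the supporting calculation, assuming Table 1.44 has the directions as rows and experience levels as columns. Exact fractions are used because every expected count here is rational, and the sketch also checks the expected-count condition.

```python
from fractions import Fraction

# Table 1.44: rows Uphill, Downhill, Remain in same place;
# columns Novice, Experienced.
hikers = [
    [20, 10],
    [50, 30],
    [50, 40],
]

row_totals = [sum(row) for row in hikers]
col_totals = [sum(col) for col in zip(*hikers)]
n = sum(row_totals)

# Expected counts under independence, kept as exact fractions.
expected = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(hikers, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(hikers) - 1) * (len(hikers[0]) - 1)
all_expected_at_least_5 = all(e >= 5 for row in expected for e in row)

print(f"chi-square = {stat} = {float(stat):.3f}, df = {df}")
print("all expected counts >= 5:", all_expected_at_least_5)
```

Under this table layout the statistic is about 1.50 on 2 degrees of freedom, well short of the .05 critical value of 5.991.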
Test 2
1. It is generally agreed that the use of the chi-squared distribution is appropriate when the
a. Sample size is at least 30
b. Sample size is large enough so that all of the observed cell counts are at least 5
c. Sample size is large enough so that all of the expected cell counts are at least 5
d. Sample size is large enough so that at least one of the expected cell counts is at least 5
e. Sample size is large enough so that the average of the expected cell counts is at least 5
2. In a chi-square test of the null hypothesis that is based on a sample of n = 100 observations
classified according to 10 class intervals, the test statistic has
a. 99 degrees of freedom
b. 97 degrees of freedom
c. 9 degrees of freedom
d. 7 degrees of freedom
e. The number of degrees of freedom cannot be determined without the data
3. Which of the following statements are true?
I The chi-square inference procedures deal with categorical variables.
II The chi-square distribution is symmetric
III A chi-square test of independence on a 2 × 2 table produces the same result as a two-tailed difference-of-proportions test.
a. I only
b. I and II only
c. I and III only
d. I, II and III
e. None of the above
4. A random sample of 100 faculty members of a university is asked to respond to two questions. Question 1: Are you happy with your financial situation? Question 2: Do you approve of the government’s economic policies? The responses are in the following table:
                       Question 2
                       Yes     No
Question 1    Yes       22     12
              No        48     18
To test the hypothesis that the response to Question 1 is independent of response to Question
2 at the 5% level of significance, the expected frequency for the cell (yes, yes) and the critical
value of the associated test statistic are:
a. 23.8 and 1.96 respectively
b. 10.2 and 3.84 respectively
c. 23.8 and 3.84 respectively
d. 23.8 and 7.81 respectively
e. 10.2 and 7.81 respectively
5. A survey was conducted to investigate whether alcohol consumption and smoking are
related. The following information was gathered for 600 people:
Table 1.45:
               Drinker    Non-drinker
Smoker           193          89
Non-smoker       165         153
Which of the following statements is true?
a. The appropriate alternative hypothesis is: Smoking and alcohol consumption are independent.
b. The appropriate null hypothesis is: Smoking and alcohol consumption are not independent.
c. The calculated value of the test statistic is 3.84
d. The calculated value of the test statistic is 7.86
e. At the level .01 we conclude that smoking and alcohol consumption are related.
6. A surprising study of 1437 male hospital admissions reported in the New York Times (February 24, 1993, page C12) found that, of 665 patients admitted with heart attacks, 214 had baldness, while of the remaining 772 non-heart-related admissions, 175 had baldness. Is this evidence sufficient at the 5% significance level to say that there is a relationship between heart attacks and baldness? Give appropriate statistical evidence to support your conclusion.
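Questions 5 and 6 both reduce to 2 × 2 tables, where the chi-square statistic has a well-known closed form. A Python sketch, assuming the row/column layouts named in the comments (the statistic is unchanged if rows and columns are swapped). The helper name `chi_square_2x2` is illustrative only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] via the shortcut
    n(ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d)), which equals the sum of
    (O - E)^2 / E when no continuity correction is applied."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Question 5, Table 1.45: rows Smoker, Non-smoker; columns Drinker, Non-drinker.
smoking_stat = chi_square_2x2(193, 89, 165, 153)

# Question 6: rows heart-attack admissions, other admissions; columns bald, not bald.
baldness_stat = chi_square_2x2(214, 665 - 214, 175, 772 - 175)

# Both tests have df = (2 - 1)(2 - 1) = 1.
print(f"Q5 chi-square = {smoking_stat:.2f}")
print(f"Q6 chi-square = {baldness_stat:.2f}")
```

Both values (about 17.02 and 16.37) far exceed the 1-degree-of-freedom critical values of 3.841 (α = .05) and 6.635 (α = .01).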
The following grades were earned by students in three teachers’ classes.
Table 1.46:
Mrs. C
Mr. M
Ms. L
A
B
C
12
6
15
24
12
6
12
18
3
7. Determine if these teachers, as a group, meet the established standard of 30% A’s, 40%
B’s, and 30% C’s.
8. Is there evidence that the grading patterns are associated with the teacher who awards the grades?
1.11
Analysis of Variance and the F-Distribution
F Distribution and Testing Two Variances
Quiz 1
1. True or False: The F distribution is symmetrical.
The variability in the amount of impurities present in a batch of chemicals used for a particular process depends on the length of time that the process is in operation. Suppose a
sample of 25 is drawn from the normal process which is to be compared to a sample of a
new process.
Table 1.47:
             n     $s^2$
Sample 1    25     1.04
Sample 2    25     0.51
2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.
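A small Python sketch of the F ratio for two sample variances. It assumes the common convention of placing the larger sample variance in the numerator, so that only an upper-tail critical value is needed (for a two-sided test at level α, compare against the α/2 upper-tail value). The helper name `f_ratio` is illustrative.

```python
def f_ratio(n1, var1, n2, var2):
    """F statistic for two sample variances, larger variance in the numerator.
    Returns (F, numerator df, denominator df)."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


# Table 1.47: Sample 1 (n = 25, s^2 = 1.04), Sample 2 (n = 25, s^2 = 0.51).
F, df_num, df_den = f_ratio(25, 1.04, 25, 0.51)
print(f"F = {F:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The same helper applies to Tables 1.48 and 1.49; only the sample sizes and variances change.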
Quiz 2
1. True or False: The F distribution ranges across all real numbers.
A manufacturer wishes to determine whether there is less variability in the silver plating done by company 1 than in that done by company 2. Independent random samples yield the following results.
Table 1.48:
             n     $s^2$
Sample 1    12     0.035
Sample 2    12     0.062
2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.
Quiz 3
1. Complete the following: The F distribution is a family of distributions based on ________.
A math test is given in two classrooms. The principal of the school wanted to know if the
two classroom variances were different.
Table 1.49:
             n     $s^2$
Sample 1    21     16.8
Sample 2    16     42.6
2. What are the null and alternative hypotheses for this scenario?
3. What is the critical value with α = .05?
4. Calculate the F ratio.
5. Would you reject or fail to reject the null hypothesis? Explain your reasoning.
One Way ANOVA
Quiz 1
Three different machines were being considered for purchase by a manufacturer. Initially, five of each machine were borrowed, and each was randomly assigned to one of 15 technicians, all equal in skill. Each machine was put through a series of tasks and rated; the higher the score on the test, the better the performance of the machine. The data are:
Table 1.50:
Machine 1    Machine 2    Machine 3
  24.5         28.4         26.1
  23.5         34.2         28.3
  26.4         29.5         24.3
  27.1         32.2         26.2
  29.9         30.1         27.8
1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.
Table 1.51:
                                                        Machine 1   Machine 2   Machine 3   Totals
Number ($n_k$)                                                          5
Total ($T_k$)                                                         154.4
Mean ($\bar{X}$)                                                      30.88
Sum of squared obs. ($\sum_{i=1}^{n_k} X_{ik}^2$)
Sum of obs. squared / number of obs. ($T_k^2 / n_k$)
3. What is the mean square between groups ($MS_B$)?
4. What is the mean square within groups ($MS_W$)?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to find the critical value.
7. What decision would you make regarding the null hypothesis? Why?
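The worksheet in Table 1.51 is the computational form of one-way ANOVA: $SS_B = \sum T_k^2/n_k - G^2/N$ and $SS_W = \sum\sum X_{ik}^2 - \sum T_k^2/n_k$, where $G$ is the grand total. A Python sketch of those formulas, assuming the machine scores are grouped as in Table 1.50 (Machine 2 totals 154.4, matching the worksheet). The helper name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """One-way ANOVA from the worksheet quantities (T_k, n_k, sum of X^2)."""
    k = len(groups)
    N = sum(len(g) for g in groups)                          # total observations
    G = sum(sum(g) for g in groups)                          # grand total
    sum_sq = sum(x * x for g in groups for x in g)           # sum of all X_ik^2
    sum_t2_n = sum(sum(g) ** 2 / len(g) for g in groups)     # sum of T_k^2 / n_k
    ss_between = sum_t2_n - G ** 2 / N
    ss_within = sum_sq - sum_t2_n
    df_b, df_w = k - 1, N - k
    ms_b, ms_w = ss_between / df_b, ss_within / df_w
    return {"SSB": ss_between, "SSW": ss_within, "df": (df_b, df_w),
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


machines = [
    [24.5, 23.5, 26.4, 27.1, 29.9],   # Machine 1
    [28.4, 34.2, 29.5, 32.2, 30.1],   # Machine 2
    [26.1, 28.3, 24.3, 26.2, 27.8],   # Machine 3
]
result = one_way_anova(machines)
print({key: (round(v, 3) if isinstance(v, float) else v) for key, v in result.items()})
```

For these data the F ratio comes out near 7.14 on (2, 12) degrees of freedom; the .05 critical value from an F table is 3.89.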
Quiz 2
A sociology professor was interested in whether the presence of others influences helping behavior when a person is in some kind of distress. Data were kept on the number of seconds it took a subject to respond to the person in distress while the subject was in a room with 0, 2, or 4 other people. Following are the data:
# people present    Response times (seconds)
0                   25   30   20   32
2                   30   33   29   40   36
4                   32   39   35   41   44
1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.
Table 1.52:
                                                         0       2       4      Totals
Number ($n_k$)                                                   5
Total ($T_k$)                                                   168
Mean ($\bar{X}$)                                                33.6
Sum of squared obs. ($\sum_{i=1}^{n_k} X_{ik}^2$)
Sum of obs. squared / number of obs. ($T_k^2 / n_k$)
3. What is the mean square between groups ($MS_B$)?
4. What is the mean square within groups ($MS_W$)?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to find the critical value.
7. What decision would you make regarding the null hypothesis? Why?
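The same ANOVA can be computed in deviation form, $SS_B = \sum n_k(\bar{X}_k - \bar{X})^2$ and $SS_W = \sum\sum (X_{ik} - \bar{X}_k)^2$, which gives the same sums of squares as the $T_k^2/n_k$ worksheet. A Python sketch for the helping-behavior data (note the unequal group sizes: 4, 5, 5).

```python
from statistics import mean

# Response times in seconds, keyed by the number of other people present.
times = {
    0: [25, 30, 20, 32],
    2: [30, 33, 29, 40, 36],
    4: [32, 39, 35, 41, 44],
}

all_times = [t for group in times.values() for t in group]
grand_mean = mean(all_times)

# Deviation form of the sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in times.values())
ss_within = sum((t - mean(g)) ** 2 for g in times.values() for t in g)

df_between = len(times) - 1                  # k - 1
df_within = len(all_times) - len(times)      # N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(f"MSB = {ms_between:.2f}, MSW = {ms_within:.2f}, "
      f"F = {F:.2f} on ({df_between}, {df_within}) df")
```

F is about 6.21 on (2, 11) degrees of freedom; the .05 value from an F table is 3.98.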
Quiz 3
The data below come from a study by Hogg and Ledolter (Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.) of bacteria counts in shipments of milk. The columns represent different shipments. The rows are bacteria counts from cartons of milk chosen randomly from each shipment. Do some shipments have higher counts than others?
Shipment 1   Shipment 2   Shipment 3   Shipment 4   Shipment 5
    24           14           11            7           19
    15            7            9            7           24
    21           12            7            4           19
    27           17           13            7           15
    33           14           12           12           10
    23           16           18           18           20
1. State the null hypothesis.
2. Using the data above, please fill out the missing values in the table below.
Table 1.53:
                                                         1      2      3      4      5     Totals
Number ($n_k$)
Total ($T_k$)                                                   80
Mean ($\bar{X}$)                                               13.3
Sum of squared obs. ($\sum_{i=1}^{n_k} X_{ik}^2$)
Sum of obs. squared / number of obs. ($T_k^2 / n_k$)
3. What is the mean square between groups ($MS_B$)?
4. What is the mean square within groups ($MS_W$)?
5. What is the F ratio of these two values?
6. Using α = .05, use the F distribution to find the critical value.
7. What decision would you make regarding the null hypothesis? Why?
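A Python sketch of the milk-shipment ANOVA with exact arithmetic (`fractions.Fraction`), so the worksheet entries can be checked without rounding. It assumes the counts form five shipments of six cartons each, which is consistent with the total of 80 and mean of 13.3 given in Table 1.53.

```python
from fractions import Fraction

# Bacteria counts, one list per shipment (five shipments of six cartons each).
shipments = [
    [24, 15, 21, 27, 33, 23],
    [14, 7, 12, 17, 14, 16],
    [11, 9, 7, 13, 12, 18],
    [7, 7, 4, 7, 12, 18],
    [19, 24, 19, 15, 10, 20],
]

k = len(shipments)
N = sum(len(s) for s in shipments)
G = sum(sum(s) for s in shipments)                                   # grand total
sum_sq = sum(x * x for s in shipments for x in s)                    # sum of X_ik^2
sum_t2_n = sum(Fraction(sum(s) ** 2, len(s)) for s in shipments)     # sum of T_k^2 / n_k

ss_between = sum_t2_n - Fraction(G ** 2, N)
ss_within = sum_sq - sum_t2_n
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F = ms_between / ms_within

print("T_k:", [sum(s) for s in shipments])
print(f"MSB = {float(ms_between):.2f}, MSW = {float(ms_within):.2f}, F = {float(F):.2f}")
```

Here F is about 9.01 on (4, 25) degrees of freedom, against a .05 table value of 2.76.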
Two Way ANOVA Test and Experimental Design
Quiz 1
A research study was conducted to examine the impact of eating a high protein breakfast
on adolescents’ performance during a physical education physical fitness test. Half of the
subjects received a high protein breakfast and half were given a low protein breakfast. All of
the adolescents, both male and female, were given a fitness test with high scores representing
better performance. Test scores are recorded below.
Table 1.54:
Group
Males
High Protein
Low Protein
10
7
9
6
8
5
5
4
7
4
5
3
Table 1.54: (continued)
Group
Females
High Protein
Low Protein
4
6
3
2
4
5
1
2
1. Complete the following ANOVA table.
Table 1.55:
Source                      SS    df    MS    F
Protein Level               20     1
Gender                      45     1
Protein Level x Gender       5     1
Within                      36    16
Total
2. State the three hypotheses associated with this two-way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses? Why or why not?
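Completing a two-way ANOVA table only needs $MS = SS/df$ for each source and $F = MS/MS_{\text{within}}$ for each effect. A Python sketch using the SS and df values from Table 1.55.

```python
# (SS, df) for each effect, read from Table 1.55.
effects = {
    "Protein Level": (20, 1),
    "Gender": (45, 1),
    "Protein Level x Gender": (5, 1),
}
SS_WITHIN, DF_WITHIN = 36, 16

ms_within = SS_WITHIN / DF_WITHIN             # error mean square
anova = {}
for source, (ss, df) in effects.items():
    ms = ss / df                               # MS = SS / df
    anova[source] = {"SS": ss, "df": df, "MS": ms, "F": ms / ms_within}

ss_total = sum(ss for ss, _ in effects.values()) + SS_WITHIN
df_total = sum(df for _, df in effects.values()) + DF_WITHIN

for source, row in anova.items():
    print(f"{source:<24} SS={row['SS']:>4} df={row['df']} MS={row['MS']:.2f} F={row['F']:.2f}")
print(f"{'Within':<24} SS={SS_WITHIN:>4} df={DF_WITHIN} MS={ms_within:.2f}")
print(f"{'Total':<24} SS={ss_total:>4} df={df_total}")
```

Each F here has (1, 16) degrees of freedom, for which the .05 critical value is 4.49. The same loop works for Tables 1.57 and 1.59 after swapping in their SS and df values.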
Quiz 2
Researchers have sought to examine the effect of various types of music on agitation levels
in patients who are in the early and middle stages of Alzheimer’s disease. Patients were
selected to participate in the study based on their stage of Alzheimer’s disease. Three forms of music were tested: easy listening, Mozart, and piano interludes. Agitation levels were recorded while the patients listened to the music, with a high score indicating a higher level of agitation. Scores are recorded below.
Table 1.56:
Group
Piano Interlude
Mozart
Easy Listening
21
24
9
12
29
26
Table 1.56: (continued)
Group
Early
Alzheimer’s
Middle
Alzheimer’s
Piano Interlude
Mozart
Easy Listening
Stage
22
10
30
Stage
18
20
22
20
25
5
9
14
18
11
24
26
15
18
20
18
20
9
13
13
19
1. Complete the following ANOVA table.
Table 1.57:
Source                         SS    df    MS    F
Type of Music                 740     2
Degree of Alzheimer’s          30     1
Music x Alzheimer’s           260     2
Within                        178    24
2. State the three hypotheses associated with this two-way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses? Why or why not?
5. Interpret your answer.
Quiz 3
A study examining differences in life satisfaction between young adult, middle adult, and
older adult men and women was conducted. Each individual who participated in the study
completed a life satisfaction questionnaire. A high score on the test indicates a higher level
of life satisfaction. Test scores are recorded below.
Table 1.58:
Group
Male
Female
Young Adult
Middle Adult
Older Adult
4
2
3
4
2
7
4
3
6
5
7
5
7
5
6
8
10
7
7
8
10
7
9
8
11
10
9
12
11
13
1. Complete the following ANOVA table.
Table 1.59:
Source           SS    df    MS    F
Age             180     2
Gender           30     1
Age x Gender      0     2
Within           44    24
2. State the three hypotheses associated with this two-way ANOVA.
3. What are the critical values for each of these three hypotheses?
4. Would you reject the null hypotheses? Why or why not?
5. Interpret your answers.
Test 1
1. True or False? If False, correct it.
In a one-way classification ANOVA, when the null hypothesis is false, the probability of
obtaining an F-ratio exceeding that reported in the F table at the .05 level of significance is
greater than .05.
2. In a study, subjects are randomly assigned to one of three groups: control, experimental
A, or experimental B. After treatment, achievement test scores for the three groups are
compared. The appropriate statistical test for this comparison is:
a. the correlation coefficient
b. chi square
c. the t-test
d. the analysis of variance
3.
Table 1.60:
Source
SS
df
Between
Within Total
30.5
165.0
4
99
What decision would be made regarding $H_0$: the population means are equal?
a. Reject H0 at the .05 level
b. Fail to reject H0 at the .01 level
c. Insufficient information is given to answer
4. Nine children were randomly split into three groups of three each. In a spelling unit,
individuals in one group were criticized each time they misspelled a word. The individuals
in another group were praised each time they correctly spelled a word, while the individuals
in the third group were neither praised nor criticized. At the end of the unit, each child was
given ten words to spell with the following results (number correct is given for each child):
Table 1.61:
Praised    Neutral    Criticized
   8          3           9
   9          2          10
   7          5           7
MEANS FOR RESPONSE
TREATMENT MEAN(WORDS CORRECT)
CRITICIZED 8.66667
PRAISED 8
NEUTRAL 3.33333
ANALYSIS OF VARIANCE:
Table 1.62:
SOURCE OF VARIATION    DF    SS         MEAN SQUARE
RESPONSE
EXPERIMENTAL            2    50.6667    25.33333
ERROR                   6    11.3333     1.88889
TOTAL                   8    62.0000
PROBABILITY LEVEL FOR COMPARING MEANS = 0.05
Is there any evidence for significant differences among the methods?
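The printout in Table 1.62 reports sums of squares and mean squares but no F ratio. A small Python sketch with exact fractions, assuming the grouping shown in Table 1.61, reproduces the printout and completes the ratio.

```python
from fractions import Fraction

# Table 1.61: words spelled correctly (out of 10) for each treatment.
groups = {
    "Praised": [8, 9, 7],
    "Neutral": [3, 2, 5],
    "Criticized": [9, 10, 7],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = Fraction(sum(all_scores), len(all_scores))
means = {name: Fraction(sum(s), len(s)) for name, s in groups.items()}

# Between groups: n_k (group mean - grand mean)^2; within: (x - group mean)^2
ss_between = sum(len(s) * (means[name] - grand_mean) ** 2 for name, s in groups.items())
ss_within = sum((x - means[name]) ** 2 for name, s in groups.items() for x in s)
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)

print({name: round(float(m), 5) for name, m in means.items()})
print(f"SS between = {float(ss_between):.4f}, SS within = {float(ss_within):.4f}")
print(f"F = {float(F):.3f} on ({df_between}, {df_within}) df")
```

This gives F = 228/17 ≈ 13.41 on (2, 6) degrees of freedom, well beyond the .05 critical value of 5.14.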
5. Following is computer output for an ANOVA.
Table 1.63:
Source        SS        df    MS
Treatment     2356.5     3    785.5
Error         2014.9    20    100.74
Total         4371.4    23
What is the critical F value that defines the rejection region at the .05 level of significance for the above ANOVA?
a. 2.78
b. 2.87
c. 3.03
d. 3.10
e. 3.49
6. A one-way ANOVA was conducted on a dataset with five levels. Sample sizes for each level were 5, 6, 5, 5, and 4. The correct degrees of freedom for the ANOVA are:
a. DFG = 5, DFE = 20, DFT = 25
b. DFG = 5, DFE = 19, DFT = 24
c. DFG = 4, DFE = 21, DFT = 25
d. DFG = 4, DFE = 20, DFT = 24
7. A client tells you that he wants to conduct a one-way ANOVA for four means. Based only
on this information, i.e., four means only, can you conduct a one-way ANOVA? Explain.
8. For a two-way ANOVA with four levels of Factor A, five levels of Factor B, and 100 total observations, which of the following is the appropriate combination of degrees of freedom?
a. DFA = 4, DFB = 5, DFAB = 20, DFE = 71, DFT = 100
b. DFA = 3, DFB = 4, DFAB = 7, DFE = 86, DFT = 100
c. DFA = 3, DFB = 4, DFAB = 12, DFE = 80, DFT = 99
d. DFA = 3, DFB = 4, DFAB = 7, DFE = 90, DFT = 99
A research study was conducted to examine the clinical efficacy of a new antidepressant.
Depressed patients were randomly assigned to one of three groups: a placebo group, a group
that received a low dose of the drug, and a group that received a moderate dose of the drug.
After four weeks of treatment, the patients completed the Beck Depression Inventory. The
higher the score, the more depressed the patient. The data are presented below. Compute
the appropriate test.
Table 1.64:
Placebo
Low Dose
Moderate Dose
38
47
39
25
42
22
19
8
23
31
14
26
11
18
5
9. What is your computed answer?
10. What would be the null hypothesis in this study?
11. What would be the alternate hypothesis?
12. What probability level did you choose and why?
13. What were your degrees of freedom?
14. Is there a significant difference between the three testing conditions?
15. Interpret your answer.
16. If you have made an error, would it be a Type I or a Type II error? Explain your answer.
Test 2
1. Suppose the critical region for a certain test of hypothesis is of the form F > 9.48773 and
the computed value of F from the data is 86 (F refers to an F statistic.) Then:
a. H0 should be rejected.
b. Ha is two-tailed.
c. The significance level is given by the area to the right of 9.48773 under the appropriate F distribution.
d. None of these.
2. True or False? If False, correct it.
In ANOVA, if we wish to investigate the difference among five means, it is good statistical
procedure to perform a t-test on each pair of means.
3. Samples of size 11 are taken from each of 5 populations. Complete the following analysis
of variance table:
Table 1.65:
Source            S.S.    d.f.    M.S.    F
Between means     1000     a       c      e
Within samples    5000     b       d
Total             6000
a. a = 4, b = 44, c = 250, d = 113.6, e = 2.2
b. a = 4, b = 44, c = 250, d = 113.6, e = 0.2
c. a = 5, b = 55, c = 200, d = 90.9, e = 0.2
d. a = 5, b = 50, c = 200, d = 100, e = 2.0
e. a = 4, b = 50, c = 250, d = 100, e = 2.5
4. Following is computer output for an ANOVA.
Table 1.66:
Source        SS        df    MS
Treatment     2356.5     3    785.5
Error         2014.9    20    100.74
Total         4371.4    23
What is the F-value for the above ANOVA?
a. 1.17
b. 23.39
c. 0.13
d. 6.67
e. 7.80
5. A one-way ANOVA was conducted on a dataset with four levels. Sample sizes for each
level were 5, 6, 5, and 4. The F statistic has which of the following distributions, when H0
is true?
a. T (4)
b. T (5, 20)
c. F (3, 16)
d. F (4, 18)
e. F (4, 15)
6. A client tells you that her dog ate her ANOVA table. She says that she only has
SSG = 456.3 and MSE = 35.3. She remembers that the total sample size is 30 and the number of levels is 4 in her analysis. What is the F-statistic in her ANOVA?
a. 12.926
b. 0.0774
c. 7.5
d. 4.309
e. Not enough information to answer this question.
7. For a two-way ANOVA with four levels of Factor A, six levels of Factor B, and 121 total
observations, which of the following is the appropriate combination of degrees of freedom?
a. DFA = 3, DFB = 7, DFAB = 21, DFE = 90, DFT = 121
b. DFA = 4, DFB = 6, DFAB = 10, DFE = 101, DFT = 121
c. DFA = 3, DFB = 5, DFAB = 15, DFE = 97, DFT = 120
d. DFA = 3, DFB = 5, DFAB = 14, DFE = 98, DFT = 120
A researcher is concerned about the level of knowledge possessed by university students
regarding United States history. Students completed a high school senior level standardized
U.S. history exam. Major for students was also recorded. Data in terms of percent correct is
recorded below for 32 students. Compute the appropriate test for the data provided below.
Table 1.67:
Education
Business/Management Behavioral/Social
Science
Fine Arts
62
81
75
58
67
48
26
36
72
49
63
68
39
79
40
15
80
57
87
64
28
29
62
45
42
52
31
80
22
71
68
76
8. What is your computed answer?
9. What would be the null hypothesis in this study?
10. What would be the alternate hypothesis?
11. What probability level did you choose and why?
12. What were your degrees of freedom?
13. Is there a significant difference between the four testing conditions?
14. Interpret your answer.
15. If you have made an error, would it be a Type I or a Type II error? Explain your answer.