MATH 138
Statistics Classroom Handouts
Unit 3 and Final Review
Name: _________________________
Course Number: ________________
Instructor: _____________________
Rev. 11/2015
MATH-138: Objectives
Intro Stats, by Richard D. De Veaux, Paul F. Velleman, and David E. Bock
Unit III
The student will be able to:
Chapter 15: Sampling Distribution Models
1. State and apply the conditions and uses of the Central Limit Theorem.
2. Determine the mean and standard deviation (standard error) for a sampling distribution of proportions or means.
3. Apply the sampling distribution of a proportion or a mean to application problems.
Chapter 16: Confidence Intervals for Proportions
4. Determine and interpret the margin of error in the context of the problem.
5. Construct a confidence interval for a proportion and interpret in the context of the problem, checking the necessary assumptions.
6. Determine the sample size necessary to produce a specific margin of error.
Chapter 17: Testing Hypotheses about Proportions
7. Perform a one-proportion z-test, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
Chapter 18: Inferences about Means
8. Perform a t-test for the population mean, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
9. Compute and interpret in context a t-based confidence interval for the population mean, checking the necessary assumptions.
Chapter 19: More about Tests and Intervals
10. Interpret the meaning of the P-value in the context of the problem.
11. Determine when a result is statistically significant.
12. Explain Type I and Type II errors in the context of the problem.
Chapter 20: Comparing Groups
13. Find and interpret a confidence interval for the difference between two proportions, checking the necessary assumptions.
14. Perform a two-proportion z-test, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
15. Compute and interpret in context a t-based confidence interval for the difference between two population means, checking the necessary assumptions.
16. Perform and interpret a two-sample t-test for two population means, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
Chapter 21: Paired Samples & Blocks
17. Find and interpret in context a paired confidence interval, checking the necessary assumptions.
18. Perform a paired t-test, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
Chapter 22: Comparing Counts
19. Perform chi-square tests for goodness-of-fit, homogeneity, and independence, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem.
Unit 3: Technology Instructions
Unit 3
Chapter 15 – Sampling Distribution Models
Define the Sampling Distribution of Proportions:
Sampling distribution of $\hat{p}$: $N\left(p, \sqrt{\frac{pq}{n}}\right)$
Conditions to check:
– We use the normal model to answer questions about how likely it is for a sample proportion
to be in a certain range of values
– Note: To use the sampling distribution of proportions to determine how unusual sample
statistics are, we need to know the true population proportion p
1) Assume that 30% of the students at a university wear contacts and we randomly pick 100
students to see what percentage of them wear contacts.
a) What is the sampling distribution?
b) What is the probability that more than one third of them wear contacts?
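As a check on parts a) and b), here is a minimal Python sketch (an illustration, not part of the course materials) that uses the standard library's `statistics.NormalDist` where the TI uses normalcdf. It assumes, as the problem does, that the true proportion is p = 0.30.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 100                     # true proportion and sample size (given)
sd = sqrt(p * (1 - p) / n)           # SD(p-hat) = sqrt(pq / n)
model = NormalDist(mu=p, sigma=sd)   # a) sampling model for p-hat

# Success/Failure condition: np and nq should both be at least 10
assert n * p >= 10 and n * (1 - p) >= 10

prob = 1 - model.cdf(1 / 3)          # b) P(p-hat > 1/3), the upper tail
print(f"a) N({p}, {sd:.4f})   b) P(p-hat > 1/3) = {prob:.4f}")
```

The upper tail is `1 - cdf`, which matches normalcdf(1/3, 1E99, 0.30, 0.0458) on the TI; the probability comes out a little over 0.23.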
2) Based on past experience, a bank believes that 7% of the people who receive loans will not
make payments on time. The bank has recently approved 200 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who
may not make timely payments?
b) What assumptions underlie your model? Are the conditions met?
c) What is the probability that over 10% of these clients will not make timely payments?
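A minimal sketch of parts a) through c), using `statistics.NormalDist` from the Python standard library. Only the Success/Failure condition can be checked with arithmetic; randomization and the 10% condition (these 200 loans behaving like a random sample that is less than 10% of all the bank's loans) are assumptions to argue from the context.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.07, 200                  # believed late-payment rate and number of loans
mean = p                          # a) mean of the sampling distribution of p-hat
sd = sqrt(p * (1 - p) / n)        # a) SD(p-hat) = sqrt(pq / n)

# b) Success/Failure: expected successes and failures should both be at least 10
successes, failures = n * p, n * (1 - p)
success_failure_ok = successes >= 10 and failures >= 10

prob = 1 - NormalDist(mean, sd).cdf(0.10)   # c) P(p-hat > 0.10)
print(f"mean = {mean}, SD = {sd:.4f}, np = {successes:.0f}, nq = {failures:.0f}")
print(f"Success/Failure met: {success_failure_ok}; P(p-hat > 0.10) = {prob:.4f}")
```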
Define the Sampling Distribution of Means:
Sampling distribution of $\bar{y}$: $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Conditions to check:
– We use the normal model to answer questions about how likely it is for a sample mean ($\bar{y}$)
to be in a certain range of values
– Note: To use the sampling distribution of means to determine how unusual sample statistics
are, we need to know the true population mean 𝜇 and the true population standard
deviation 𝜎
3) Ithaca, New York gets an average of 35.4” of rain each year with a standard deviation of
4.2”. Assume the Normal model applies to their yearly rainfall.
a) During what percentage of years does Ithaca get more than 40” of rainfall?
b) Describe the amount of rainfall that separates the driest 20% of all years.
c) You plan to stay in Ithaca for 4 years. Describe the sampling distribution of the means for
the average rainfall over a 4-year period.
d) What is the probability that those four years average less than 30” of rain?
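A minimal sketch of all four parts using `statistics.NormalDist`: `cdf` plays the role of normalcdf and `inv_cdf` the role of invNorm. Part d) can use a Normal sampling model even though n = 4 only because the problem tells us yearly rainfall itself follows a Normal model.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 35.4, 4.2                      # yearly rainfall model (inches)
yearly = NormalDist(mu, sigma)

pct_over_40 = 1 - yearly.cdf(40)           # a) share of years above 40"
driest_20 = yearly.inv_cdf(0.20)           # b) cutoff for the driest 20% (like invNorm)

n = 4
means = NormalDist(mu, sigma / sqrt(n))    # c) sampling model for a 4-year mean
prob_under_30 = means.cdf(30)              # d) P(4-year average < 30")

print(f"a) {pct_over_40:.4f}  b) {driest_20:.2f} in")
print(f"c) N({mu}, {means.stdev:.2f})  d) {prob_under_30:.4f}")
```

Note how much smaller the SD of the 4-year mean is than the SD of a single year; that is why 30" is far less likely as a 4-year average.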
4) A college's data show that the high school GPAs of its incoming freshmen are normally distributed
with a mean of 3.4 and a standard deviation of 0.35. The students are randomly
assigned to freshman writing seminars in groups of 25.
a) Describe the distribution of the students’ high school GPAs.
b) Describe the sampling distribution model that applies to the average GPAs for students in
a writing seminar.
c) Find the probability a student has a GPA greater than 3.5.
d) Find the probability that one of the groups has an average GPA greater than 3.5.
Extra Practice/Homework
5) A national study found that 44% of college students engage in binge drinking (5 drinks at a
sitting for men, 4 for women). A professor surveyed a random sample of 244 students at his
college.
What is the probability that fewer than 96 of them admitted to binge drinking in the past week?
6) Suppose the weights of college textbooks are normally distributed with a population mean of
4 pounds and a population standard deviation of 1.2 pounds. Suppose we randomly sample and
weigh 16 textbooks. Answer the following questions:
a) What is the probability that the sample average of the 16 textbooks is less than 3.5
pounds? (use 2 decimal places in your answer)
b) What is the probability that a given (i.e. one) textbook weighs more than 5.5 pounds?
(use 2 decimal places in your answer)
c) Imagine the distribution of all possible sample averages of textbook weights (with
n=16). What is the weight that separates the lightest 25% of sample averages from the
heaviest 75% of sample averages? (round your answer to the nearest tenth of a pound)
Chapter 16 – Confidence Intervals for Proportions
1) Your local newspaper polls a random sample of 330 voters, finding 144 who say they will
vote “yes” on the upcoming school budget. What is the true support for the school budget?
a) Try this by hand:
One-proportion z-interval: $\hat{p} \pm z^* SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Conditions to check:
b) What is the margin of error?
c) Now try using the TI: STAT → TESTS → 1-PropZInt
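A sketch of the by-hand calculation in parts a) and b), using `statistics.NormalDist().inv_cdf` to find z*. The problem does not state a confidence level, so the code assumes 95%; change `conf` to match the level you are asked for.

```python
from math import sqrt
from statistics import NormalDist

x, n = 144, 330
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)              # SE(p-hat) = sqrt(p-hat q-hat / n)
conf = 0.95                                     # assumed: the problem gives no level
z_star = NormalDist().inv_cdf((1 + conf) / 2)   # about 1.96 for 95%
me = z_star * se                                # b) margin of error
lo, hi = p_hat - me, p_hat + me
print(f"p-hat = {p_hat:.4f}, ME = {me:.4f}, CI = ({lo:.4f}, {hi:.4f})")
```

The result should agree with 1-PropZInt (x = 144, n = 330, C-Level = .95) up to rounding.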
2) A random sample of 416 teenagers showed that 76.4% of them had experimented with
alcohol.
a) What is the 95% confidence interval for the proportion of the population?
b) Interpret this confidence interval
3) Direct mail advertisers send solicitations to thousands of potential customers in the hope that
some will buy the company’s product. The response rate usually is quite low. Suppose a
company wants to test the response to a new flyer and sends it to 1000 people randomly
selected from their mailing list of over 200,000 people. They get 123 orders from the
recipients.
a) Create a 90% confidence interval for the percentage of people the company contacts who
may buy something.
b) Explain what the interval means.
c) The company must decide whether to now do a mass mailing. The mailing won’t be
cost-effective unless it produces at least a 5% return. What does your confidence interval
suggest?
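A minimal sketch of parts a) and c): build the 90% interval, then compare the whole interval (not just p-hat) with the 5% break-even rate.

```python
from math import sqrt
from statistics import NormalDist

x, n, conf = 123, 1000, 0.90
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf((1 + conf) / 2)   # about 1.645 for 90%
lo, hi = p_hat - z_star * se, p_hat + z_star * se
print(f"90% CI: ({lo:.3f}, {hi:.3f})")

# c) if even the lower end clears 5%, the interval supports a mass mailing
print("entire interval above 5%" if lo > 0.05 else "interval reaches 5% or below")
```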
4) An experiment finds that 27% of 53 subjects report improvement after using a new
medication. Create a 95% confidence interval for the actual cure rate.
a) Check the conditions:
b) Find the 95% confidence interval:
c) Why is our interval so wide?
d) Make it narrower – 90% confidence.
e) What are the advantages and disadvantages?
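The sketch below puts the 95% and 90% intervals side by side so the trade-off in parts c) through e) is visible: the small sample (n = 53) makes the standard error large, and lowering the confidence level narrows the interval at the cost of being less sure it captures the true rate. Randomization still has to be argued from how the subjects were chosen.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 0.27, 53

# a) Success/Failure: about 14 successes and 39 failures, both at least 10
print("successes:", round(n * p_hat), "failures:", round(n * (1 - p_hat)))

se = sqrt(p_hat * (1 - p_hat) / n)   # small n => large SE => wide interval
intervals = {}
for conf in (0.95, 0.90):            # b) 95% and d) 90%
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    me = z_star * se
    intervals[conf] = (p_hat - me, p_hat + me)
    print(f"{conf:.0%} CI: ({p_hat - me:.3f}, {p_hat + me:.3f}), width {2 * me:.3f}")
```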
Sample Size Computations
– To find the sample size needed to obtain a certain level of confidence within a set margin
of error, we can use the following formula (derived from our margin of error formula):
$n = \frac{(z^*)^2 \, \hat{p}\hat{q}}{ME^2}$
o $z^*$ is the z-critical value needed for the confidence level
o $\hat{p}$ and $\hat{q}$ are our estimates from a previous study; use 0.5 for both if no estimates are given
5) Recall that an experiment finds that 27% of 53 subjects report improvement after using a new
medication.
What sample size would we need in a follow up study if we want a margin of error ± 5%
with 98% confidence?
Hints:
– Solve for z* of the 98% confidence interval using invNorm(.01), then drop the negative sign.
– Use our previous estimates of p̂ and q̂ and solve for n.
– Always round up, because we need to ensure a large enough sample.
6) In preparing a report on the economy, we need to estimate the percentage of businesses that
plan to hire additional employees in the next 60 days.
a) How many randomly selected employers must we contact in order to create an estimate
in which we are 98% confident with a margin of error of 5%?
b) Suppose we want to reduce the margin of error to 3%. What sample size will suffice?
c) Why might it not be worth the effort to try to get an interval with a margin of error of
only 1%?
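The sample size formula can be wrapped in a small function for problems 5 and 6. `sample_size` is a hypothetical helper written for illustration; it finds z* with `NormalDist().inv_cdf` and rounds up with `math.ceil`, as the hints require.

```python
import math
from statistics import NormalDist

def sample_size(me, conf, p_hat=0.5):
    """Smallest whole n with n >= z*^2 * p-hat * q-hat / ME^2 (always rounds up)."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / me ** 2)

# Problem 5: prior estimate of 27%, ME = 5%, 98% confidence
n5 = sample_size(0.05, 0.98, p_hat=0.27)

# Problem 6: no prior estimate, so use 0.5; ME = 5%, 3%, then 1%
n6 = [sample_size(me, 0.98) for me in (0.05, 0.03, 0.01)]
print(n5, n6)
```

Cutting the margin of error to one fifth multiplies the required sample size by 25, which is the point of problem 6 c). The same call, with your own inputs, applies to problem 8.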
Extra Practice / Homework
7) A survey of 95 families showed that 40 of the families owned at least one tv.
a) Find the 95% confidence interval on the proportion of families who own at least one tv.
b) Find the Margin of Error associated with this confidence interval.
c) Interpret this interval in the context of the problem. Carefully explain what 95%
confidence means in this context.
d) Suppose you reduce your level of confidence to 90%. Will this create a wider or narrower
interval? Does this mean that precision will increase or decrease?
8) A survey of shoppers is planned to see what percentage use credit cards. Prior surveys
suggest 53% of shoppers use credit cards. How many randomly selected shoppers must we
survey in order to estimate the proportion of shoppers who use credit cards to within 3% with
95% confidence?
Chapter 17 – Testing Hypotheses about Proportions
1) Previously, the cracking rate of iron casts was 20%. After a new engineering process was introduced,
the cracking rate in a sample of 400 casts fell to 17%. Is this due to the new engineering or just random
chance?
a) What is the null hypothesis (no change)?
b) What is the alternative hypothesis (this is what we want to show)?
c) Compute a one-proportion z-test by hand:
$z = \frac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}}$
Conditions to check:
d) Sketch the z distribution
e) What is the probability that this data would be seen by chance?
f) How convincing is the data?
g) Now try this test using the TI: STAT → TESTS → 1-PropZTest
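A minimal sketch of the by-hand z-test, assuming the alternative is H_A: p < 0.20 (the new process lowered the cracking rate), so the P-value is the lower tail. Note that a test uses p0 in the standard deviation, not p-hat.

```python
from math import sqrt
from statistics import NormalDist

p0, p_hat, n = 0.20, 0.17, 400
sd = sqrt(p0 * (1 - p0) / n)        # SD(p-hat) under H0 uses p0 and q0
z = (p_hat - p0) / sd               # c) test statistic
p_value = NormalDist().cdf(z)       # e) lower tail, since H_A: p < 0.20
print(f"SD = {sd:.3f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

A P-value near 0.07 means a drop this large would happen by chance about 7% of the time if the rate were still 20%, which is suggestive but not strong evidence at the usual 0.05 level.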
2) Hypothesis Writing Practice -- Write the null and the alternative hypotheses for the
following:
a) In the 1950s only about 40% of high school graduates went on to college. Has the
percentage changed?
b) 20% of the cars of a certain model have needed costly transmission work after being
driven between 50,000 and 100,000 miles. The manufacturer hopes that the redesign of the
transmission has solved this problem.
c) We field test a new flavor of soft drink, planning to market it only if we are sure that at
least 60% of the people like the flavor.
d) The drug Lipitor is meant to lower cholesterol. Is there evidence to support the claim that
over 1.9% of the users experience flu-like symptoms as a side effect?
e) According to the U.S. Department of Health, 16.3% of Americans did not have health
insurance coverage in 1998. A politician claims that this percentage has decreased since
1998.
f) During the past forty years, the monthly rate of return for a particular item has been 4.2
percent. A store analyst claims that it is different.
3) It is believed that 70% of convicted felons have a history of juvenile delinquency. Out of
200 convicted felons, we find that 154 have a history of juvenile delinquency. Is there evidence
to support the claim that the actual percentage is more than 70%?
Alpha = .05
a) Write the null and alternative hypotheses.
b) Check the assumptions
c) Draw the curve.
d) Find the critical value and the p value.
e) Should you reject or fail to reject the null hypothesis? Why?
f) State your conclusion
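A minimal sketch of parts d) and e) for this one-tailed (upper tail) test. The Success/Failure check with p0 gives 140 expected successes and 60 expected failures, both at least 10; randomization and the 10% condition are assumed from the problem statement.

```python
from math import sqrt
from statistics import NormalDist

p0, x, n, alpha = 0.70, 154, 200, 0.05
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)     # d) critical value, upper tail
p_value = 1 - NormalDist().cdf(z)            # d) P-value for H_A: p > 0.70
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, z* = {z_crit:.3f}, P-value = {p_value:.4f}: {decision}")
```

Both views agree: z is beyond z*, and the P-value is below alpha.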
4) In the 1980’s it was generally believed that autism affected about 6% of the nation’s children.
Some people believe that the increase in the number of chemicals in the environment has led
to an increase in the incidence of autism. A recent study examined 384 children and found
that 46 of them showed signs of some form of autism. Is there strong evidence that the level
of autism has increased? (Use an alpha of 0.05)
a) Write the null and alternative hypotheses.
b) Check the assumptions
c) Draw the curve.
d) Find the critical value and the p value.
e) Should you reject or fail to reject the null hypothesis? Why?
f) State your conclusion
Extra Practice / Homework
5) Past studies have indicated that 85.6% of all enrolled college students in the U.S. are
undergraduates. A random sample of 500 enrolled college students in a particular state
revealed that 420 of the students were undergraduates. Is there enough evidence to conclude
that the percentage of undergraduates from this particular state differs from the national
percentage?
a) What type of test would be appropriate for this situation? (Name the test and indicate whether it
should be a one-tailed or two-tailed test.)
b) State the null and the alternative hypotheses.
c) Perform the appropriate test and report the P-value.
d) Use your P-value to make a conclusion about the problem. Do you reject or fail to reject
the null hypothesis?
Chapter 18 – Inferences about Means
Confidence Intervals for the Mean
1) Recently, realtors randomly sampled 36 bids from potential buyers to estimate the average
loss in home values. The sample showed an average loss of $9560 with a standard deviation
of $1500.
Conditions to check:
a) Check the assumptions and conditions.
b) Find a 95% confidence interval for the mean loss in value per home by hand using the
formula:
One-sample t-interval: $\bar{y} \pm t^*_{n-1} \, SE(\bar{y})$, where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$
Hint: you will need to look up t* using DISTR → invT or the StatCrunch t distribution
calculator
c) Also try this using your TI: STAT → TESTS → TInterval
d) Interpret the interval.
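Python's standard library has no t distribution, so this sketch includes two small helpers of our own: `t_cdf` integrates the t density with Simpson's rule, and `t_inv` inverts it by bisection (the job invT does). They are illustrative, not a library API; with SciPy installed, `scipy.stats.t.cdf(x, df)` and `scipy.stats.t.ppf(q, df)` give the same values.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv(p, df):
    """Critical value t* with t_cdf(t*, df) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, y_bar, s = 36, 9560, 1500
se = s / math.sqrt(n)                 # SE(y-bar) = s / sqrt(n) = 250
t_star = t_inv(0.975, n - 1)          # 95% leaves 2.5% in each tail; df = 35
lo, hi = y_bar - t_star * se, y_bar + t_star * se
print(f"t* = {t_star:.3f}, 95% CI: (${lo:,.0f}, ${hi:,.0f})")
```

The interval should match TInterval with Stats input (x̄ = 9560, Sx = 1500, n = 36, C-Level = .95) up to rounding.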
2) Hoping to lure more shoppers downtown, a city builds a new public parking garage. The city
plans to pay for the structure through parking fees. During a two-month period (44
weekdays), daily fees collected averaged $126 with a standard deviation of $15. If a consultant
predicted that the average daily income would be $130, would he be correct at the 90%
confidence level?
3) In 1998, the Nabisco Company announced a “1000 Chips Challenge,” claiming that every
18-ounce bag of Chips Ahoy contained at least 1000 chocolate chips. Below are the counts of
chips in selected bags.
1219 1214 1087 1200 1419 1121 1325 1345 1244 1258 1356 1132
1191 1270 1295 1135
What does this evidence say about Nabisco’s claim?
4) The survival times in weeks are given for 20 male rats which were exposed to a high level of
radiation.
152 152 115 109 137 88 94 77 160 165
20 128 123 136 101 62 152 83 69 125
Determine the 95% confidence interval on the mean survival time for rats.
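A sketch of the same t-interval computed from raw data, the way TInterval works with Data input. Because the standard library has no t distribution, it repeats two small helpers of our own (`t_cdf` by Simpson's-rule integration, `t_inv` by bisection); these are illustrative, not a library API. `statistics.stdev` uses the n - 1 divisor, matching Sx on the TI.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv(p, df):
    """Critical value t* with t_cdf(t*, df) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


weeks = [152, 152, 115, 109, 137, 88, 94, 77, 160, 165,
         20, 128, 123, 136, 101, 62, 152, 83, 69, 125]
n = len(weeks)
y_bar = statistics.mean(weeks)
s = statistics.stdev(weeks)           # sample SD, n - 1 divisor
se = s / math.sqrt(n)
t_star = t_inv(0.975, n - 1)          # df = 19
lo, hi = y_bar - t_star * se, y_bar + t_star * se
print(f"y-bar = {y_bar}, s = {s:.2f}, 95% CI: ({lo:.2f}, {hi:.2f}) weeks")
```

Before trusting the interval, check the Nearly Normal condition with a histogram; the 20-week survival time sits well below the rest of the data and may be worth a closer look.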
Hypothesis test for the mean - T Test:
5) Judy is an ad designer who designs the newspaper ads for the Giant grocery store. Electronic
counters at the entrance total the number of people entering the store. Before Judy was hired,
the mean number of people entering every day was 3018. Since she started working at
the Giant, management thinks that this average has increased. A random sample of 42
business days gave an average of 3333 people entering the store daily with a standard
deviation of 287. Does this indicate that the average number of people entering the store
every day has increased? Use an alpha of 0.01.
a) Write the hypotheses:
b) Try this test by hand:
$t = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$, where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$
Conditions to check:
c) Sketch the t distribution
d) Find the p-value
DISTR → tcdf(lower, upper, df), where df = n - 1
e) Try again using the TI: STAT → TESTS → T-Test
– Inpt: choose Data or Stats according to what is available (usually you use Stats)
– μ0: the hypothesized mean from your null hypothesis. In this case we are testing against
the previous 3018.
– x̄ is the sample mean of 3333
– Sx is the sample standard deviation of 287
– n: the number in your sample, 42 days
– μ: choose the inequality used in the alternative hypothesis. In this case we are looking
for an increase, so choose >
– Calculate
f) Write a conclusion
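A minimal sketch of parts b) and d). The standard library has no t distribution, so `t_cdf` below is a small helper of our own (Simpson's-rule integration of the t density), not a library API; with SciPy, `scipy.stats.t.sf(t, df)` gives the upper tail directly. The upper tail here is what tcdf(t, 1E99, 41) returns on the TI.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


mu0, n, y_bar, s, alpha = 3018, 42, 3333, 287, 0.01
se = s / math.sqrt(n)                 # SE(y-bar) = s / sqrt(n)
t = (y_bar - mu0) / se                # b) test statistic
df = n - 1
p_value = 1 - t_cdf(t, df)            # d) upper tail, since H_A: mu > 3018
print(f"t = {t:.3f}, df = {df}, P-value = {p_value:.1e}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```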
6) Hoping to lure more shoppers downtown, a city builds a new public parking garage. The city
plans to pay for the structure through parking fees. During a two-month period (44
weekdays), daily fees collected averaged $126 with a standard deviation of $15. If a consultant
claimed that the average daily income would be $130, should we reject her claim using
alpha=0.10 (perform a 2-sided test)?
How is this similar to, or different from, using the confidence interval?
7) When consumers apply for credit, their credit is rated using FICO scores. A random sample
of credit ratings is obtained, and the FICO scores are summarized with these statistics: n=18,
ȳ = 660.3, s = 95.9. Use an alpha of 0.05 and do a 2-sided hypothesis test to test the claim
that the mean credit score (of the general population) is equal to 700 (Triola 2008).
8) Different cereals are randomly selected, and the sugar content is obtained for each cereal,
with the results given below for Cheerios, Harmony, Smart Start, Cocoa Puffs, Lucky
Charms, Corn Flakes, Fruit Loops, Wheaties, Cap’n Crunch, Frosted Flakes, Apple Jacks,
Bran Flakes, Special K, Rice Krispies, Corn Pops, and Trix. Use an alpha of 0.05 to test the
claim of a cereal lobbyist that the mean sugar content of all cereals is LESS than 0.3 g (Triola
2008).
0.03 0.24 0.30 0.47 0.43 0.07 0.47 0.13
0.44 0.39 0.48 0.17 0.13 0.09 0.45 0.43
Extra Practice / Homework
9) The State Superintendent of Schools consistently states that the average salary of all
substitute teachers in every school district is at least $60 per day. The president of the
teachers' union believes that the average salary is less than $60 per day, so she samples eight
school districts to help test her claim.
The daily salaries she found for the districts were $60 $56 $60 $55 $70 $55 $60 $55
a) What statistical test should be performed? Should this test be one sided or two?
b) Which condition has to be satisfied for the test that we did not have to check for the test
on proportions?
c) Write the two hypotheses for the problem.
d) Make a sketch of the distribution indicating the test statistic and p.
e) Compute the appropriate statistical test and report the test statistic and associated P-value.
f) In the context of the problem, write a conclusion to the study and substantiate your
conclusion using statistical arguments. Use an alpha level of 0.05.
g) Find a 95% confidence interval for the true daily salary based on the data.
Chapter 19 – More about Tests and Intervals
Statistical Significance
– What is a p-value?
– What is the difference between practical significance and statistical significance?
1) Are they speeding?
– College Terrace speed limit: 25 mph. Even after traffic-calming measures, a resident
complains that cars still speed.
– 250 randomly selected cars were clocked with mean speed 25.55 mph, s = 3.618.
Is the mean speed of all cars greater than 25 mph?
a) Write the hypotheses:
b) Check conditions
c) Perform hypothesis test
d) Are the results statistically significant?
e) Find the 95% confidence interval
f) Are the results practically significant?
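This sketch runs the test in part c) and the interval in part e) together, which makes the difference between statistical and practical significance concrete. The two t helpers are our own (Simpson's-rule integration and bisection), not a library API; `scipy.stats.t` would serve the same purpose if SciPy is available.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv(p, df):
    """Critical value t* with t_cdf(t*, df) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, y_bar, s, mu0 = 250, 25.55, 3.618, 25
df = n - 1
se = s / math.sqrt(n)
t = (y_bar - mu0) / se                # c) test statistic
p_value = 1 - t_cdf(t, df)            # c) upper tail, H_A: mu > 25
t_star = t_inv(0.975, df)             # e) 95% interval
lo, hi = y_bar - t_star * se, y_bar + t_star * se
print(f"t = {t:.3f}, P-value = {p_value:.4f}")
print(f"95% CI: ({lo:.2f}, {hi:.2f}) mph")
```

With n = 250 the P-value is small, yet the interval says the typical excess is only about 0.1 to 1 mph; whether that matters in practice is a judgment about traffic safety, not something the P-value decides.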
Types of Errors
– What is a Type I error?
– What is a Type II error?
2) Have harsher penalties and ad campaigns increased seatbelt use among drivers and
passengers? Observations of commuter traffic failed to find evidence of a significant change
compared with three years ago.
a) Explain what a P-value of 0.17 means in this context.
b) What are the hypotheses?
c) In this context, what are the Type I and Type II errors?
3) Production managers on an assembly line must monitor to be sure that the level of
defective products remains small. They periodically inspect a random sample of the items
produced. If they find a significant increase in the production of items that must be rejected,
they will halt the assembly process until the problem can be identified and repaired.
a) What are the hypotheses?
b) In this context, what is a Type I error? A Type II error?
c) What are the consequences of each type of error?
4) A company is willing to renew its advertising contract with a local radio station only if the
station can prove that more than 20% of the residents of the city have heard the ad and
recognize the company’s product. The radio station conducts a random phone survey of 600
people and finds 133 people that recognize the product. In this context, what are the Type I
and Type II errors?
5) A statistics professor has observed that about 13% of the students who initially enroll in his
Introductory Stats class withdraw before the end of the semester. A salesman suggests that
he try a statistics software package that gets students more involved with computers,
predicting it will cut the dropout rate. The software is expensive so the salesman offers to let
the professor use it for a semester to see if the dropout rate goes down significantly.
a) Is this a one-tailed or two-tailed test?
b) Write the null and alternative hypotheses
c) In this context, what would happen if the professor makes a Type I error?
d) In this context, what would happen if the professor makes a Type II error?
e) Initially 203 students signed up for the course. They used the software suggested by the
salesman and only 11 dropped out of the course.
f) Should the professor spend the money for this software? Support your recommendation
with an appropriate test.
g) Explain what your P-value means
Extra Practice / Homework
6) New road signs are made with the intention of improving visibility to drivers. Highway
safety engineers set up a test course that included both the old and the new signs. Volunteers
drove the test course and rated the old and new signs in terms of visibility.
a) Write the null and alternative hypotheses "in words" using "improved visibility" and "not
improved visibility".
b) Using the hypotheses, write what would constitute a TYPE I error in the context of the
problem.
c) What would be the consequence of the TYPE I error?
d) What would be a TYPE II error in the context of the problem?
e) What would be the consequence of the TYPE II error?
Chapter 20 – Comparing Groups
Comparing Two Proportions
1) The Sleep in America Poll found that 205 of 293, or 70%, of Gen-Y use the Internet before
sleep. 235 of 469, or 50%, of Gen-X use the Internet before sleep (De Veaux et al., 2014).
Two-proportion z-interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* SE(\hat{p}_1 - \hat{p}_2)$, where $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
a) Use the two-proportion z-interval to build a confidence interval for the true difference in
the proportions of Gen-Y’ers and Gen-X’ers who use the internet before sleep
b) Now also try using your TI:
STAT → TESTS → 2-PropZInt
Conditions to check:
c) Interpret this confidence interval in the context of
the problem:
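A minimal sketch of part a). The problem does not give a confidence level, so the code assumes 95%; group 1 is Gen-Y, so a positive difference means Gen-Y uses the Internet before sleep more often.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 205, 293    # Gen-Y
x2, n2 = 235, 469    # Gen-X
p1, p2 = x1 / n1, x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled SE for an interval
conf = 0.95                                          # assumed level
z_star = NormalDist().inv_cdf((1 + conf) / 2)
diff = p1 - p2
lo, hi = diff - z_star * se, diff + z_star * se
print(f"p1 - p2 = {diff:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

Both groups should pass Success/Failure (at least 10 successes and 10 failures each) before this interval is trusted.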
2) (Using TI) 84.9% of 12,460 males and 88.1% of 12,678 females indicated that they had high
school diplomas. Find the 95% confidence interval for the difference in graduation rates
between males and females.
– STAT → TESTS → 2-PropZInt
o x1: number of males with a diploma; .849 times 12,460, or approximately 10,579
o n1: 12460
o x2: number of females with a diploma; .881 times 12,678, or approximately 11,169
o n2: 12678
o C-level: .95
– You should get a confidence interval from -4.1% to -2.4%. The percentages are negative
because group 1 (x1, n1) represents the males, and their percentage of graduates was lower
than the female percentage.
– When we interpret this confidence interval we should discuss the positive differences:
o We are 95% confident that the true difference in the proportion of males and
females who graduate high school is between 2.4% and 4.1%, with females
having the higher graduation rate.
Hypothesis Testing for the Difference in Proportions
In the prior example we are 95% confident that the difference in the proportions of males and
females that graduate is between 2.4 and 4.1 percentage points.
– Is it likely that the proportions of males and females that graduate are actually the same?
– What would have to be true of the difference of the proportions for this to be the case?
– What might our 95% confidence interval look like if the true proportions are really the same?
It turns out we have an easier way of testing whether or not two sample proportions are really
from the same population (i.e. the same)
3) Try a hypothesis test for example 2) 84.9% of 12,460 males and 88.1% of 12,678 females
indicated that they had high school diplomas
a) Write your hypotheses:
b) STAT → TESTS → 2-PropZTest
– You are testing to see if there is a difference, so choose p1 ≠ p2.
c) What conclusion can you make?
4) A survey of 430 randomly selected adults found that 21% of the 222 men and 18% of the 208
women had purchased books online. Is there evidence that men are more likely to make
online purchases of books?
5) Would being part of a support group that meets regularly help people who are wearing the
nicotine patch actually quit smoking? A county health department tries an experiment using
several hundred volunteers who are planning to use the patch. The subjects were randomly
divided into two groups.
– People in Group 1 were given the patch and attended weekly discussion meetings with
counselors and others trying to quit. People in Group 2 also used the patch but did not
participate in the counseling groups.
– After six months 46 of the 143 smokers in Group 1 and 30 of the 151 smokers in Group 2
had successfully stopped smoking.
Do these results suggest that such support groups could be an effective way to help people
stop smoking?
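A minimal sketch of a two-proportion z-test for this experiment, assuming the one-sided alternative H_A: p1 > p2 (the support group raises the quit rate). Unlike the interval, the test pools the two groups, because H0 says the proportions are equal. The groups were randomly assigned, and each has at least 10 quitters and 10 non-quitters.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 46, 143     # Group 1: patch + support group
x2, n2 = 30, 151     # Group 2: patch only
p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                     # pooled under H0: p1 = p2
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled
p_value = 1 - NormalDist().cdf(z)                    # upper tail for H_A: p1 > p2
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

On the TI, 2-PropZTest with p1 > p2 should report the same z and P-value up to rounding.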
6) 120 subjects, 60 men and 60 women, were given the “water-level task” to test their
perceptual and conceptual skills. Subjects were shown a drawing of a glass tilted at a 45-degree
angle. They were then asked to draw a line representing the surface of the water. The
researchers recorded whether or not the line drawn was within 5 degrees of the true line.
The table gives the number of subjects (x), out of n, whose line was within 5 degrees. Are the proportions the same?
         x    n
Women    24   60
Men      39   60
Comparing Two Means
7) In the morning class, the mean on the first exam was 78 with a standard deviation of 2.3 for
the 57 students in the class. In the afternoon class, the mean was 81 with a standard deviation
of 5.7 for the 37 students in the class. Determine the 90% confidence interval for the
difference in the means of the two groups.
STAT → TESTS → 2-SampTInt
– For Input choose Stats unless you have placed the data in L1 and L2.
– 𝑥̅1 is 78 and Sx1 is 2.3 with n1 as 57
– 𝑥̅2 is 81 and Sx2 is 5.7 with n2 as 37
– C-level is .9
– Pooled: No (never pool)
– Calculate
Conditions to check:
Report and interpret the confidence interval
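A sketch of the unpooled (two-sample) t-interval behind 2-SampTInt with Pooled: No. The degrees of freedom come from the Welch-Satterthwaite approximation, which is generally not a whole number; the TI reports this value as df. The t helpers are our own (Simpson's-rule integration and bisection), not a library API.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv(p, df):
    """Critical value t* with t_cdf(t*, df) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x1, s1, n1 = 78, 2.3, 57     # morning class
x2, s2, n2 = 81, 5.7, 37     # afternoon class
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)      # unpooled SE of the difference
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # Welch-Satterthwaite
t_star = t_inv(0.95, df)     # 90% leaves 5% in each tail
diff = x1 - x2
lo, hi = diff - t_star * se, diff + t_star * se
print(f"df = {df:.2f}, 90% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```

Both endpoints are negative, so the interval points to a higher mean in the afternoon class.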
8) The data below show the sugar content (as a percentage of weight) of several national brands
of children’s and adults’ cereals (De Veaux et al., 2014). Create and interpret a 95%
confidence interval for the difference in mean sugar content. Be sure to check the
necessary assumptions and conditions.
Children’s: 40.3 55 45.7 43.3 50.3 45.9 53.5 43 44.2 44 47.4 44 33.6 55.1
48.8 50.4 37.8 60.3 46.6
Adults’: 20 30.2 2.2 7.5 4.4 22.2 16.6 14.5 21.4 3.3 6.6 7.8 10.6 16.2 14.5
4.1 15.8 4.1 2.4 3.5 8.5 10 1 4.4 1.3 8.1 4.7 18.4
a) Check the assumptions and conditions
b) Report the confidence interval
c) Interpret the confidence interval
d) Does this interval suggest a true difference in mean sugar content?
2-Sample t-Test
Use the example above to conduct a 2-Sample t-Test for the difference between two means.
e) Hypotheses:
f) Run the test using STAT → TESTS → 2-SampTTest
– Never pool
g) Results:
9) It has been suggested that college students learn more and obtain higher grades in small
classes (40 or less) when compared to large classes (150 or more).
To test this claim, a university assigned a professor to teach a small and a large class of the
same course. At the end of the course, the classes were given the same exam. The following
are the final grade results. Test the claim at an alpha of 0.05.
                     Small class   Large class
Sample size          35            170
Sample mean          74.2          71.7
Standard deviation   14            13
Hypotheses:
H0: μsmall=μlarge
HA: μsmall>μlarge
Test:
STAT → TESTS → 2-SampTTest and select Stats, not Data
fill in the appropriate statistics
For μ1: choose >
don't pool
Calculate
a) Sketch the t distribution, indicate your test statistic, and shade the area corresponding
with your p-value.
b) What conclusion should be made?
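A sketch of what 2-SampTTest (Stats input, μ1 > μ2, Pooled: No) computes for the class-size data. `t_cdf` is a helper of our own (Simpson's-rule integration of the t density), not a library API; the df is the unpooled Welch-Satterthwaite approximation.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on the density over [0, |x|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps
    total = pdf(0) + pdf(abs(x))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


x1, s1, n1 = 74.2, 14, 35      # small class
x2, s2, n2 = 71.7, 13, 170     # large class
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # Welch-Satterthwaite
t = (x1 - x2) / se
p_value = 1 - t_cdf(t, df)     # upper tail for H_A: mu_small > mu_large
print(f"t = {t:.3f}, df = {df:.2f}, P-value = {p_value:.4f}")
```

For the sketch in part a), shade the area to the right of t under a t curve with this df; that area is the P-value.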
10) A study was conducted to assess the effects that occur when children are exposed to cocaine
before birth. 190 children born to cocaine users had a mean score of 7.3 (with a standard
deviation of 3.0) on a certain aptitude test. 186 children not exposed to cocaine had a mean
score of 8.2 with a standard deviation of 3.0. Use an alpha of 0.05 to test the claim that
cocaine use is harmful to children’s aptitude (Triola 2008).
11) Researchers investigated how the size of a bowl affects how much ice cream people tend to
scoop when serving themselves. At an “ice cream social”, people were randomly given
either a 17 oz or a 34 oz bowl and were invited to scoop as much ice cream as they liked.
Did the bowl size change the selected portion size? (De Veaux et al., 2009)
Small bowl: n = 26, ȳ = 5.07 oz, s = 1.84 oz
Large bowl: n = 22, ȳ = 6.58 oz, s = 2.91 oz
Test an appropriate hypothesis and state your conclusions (for assumptions and conditions
that you cannot test, assume they are sufficiently satisfied to proceed)
Extra Practice / Homework
12) A study was conducted to determine if the percentage of women who receive financial aid is
different from the percentage of men who receive financial aid. A random sample of students
revealed the results shown in the table. Using α = 0.01, is there significant
evidence to reject the null hypothesis?
                       Women   Men
Sample size            250     300
Number receiving aid   200     225
a) Find the 90% confidence interval for the difference in financial aid rates between men
and women and interpret this interval in the context of the problem.
b) Write the two hypotheses for the problem.
c) Compute the appropriate statistical hypothesis test and report the test statistic and p-value.
d) Write the conclusion for the problem regarding the null hypothesis using statistical
language.
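The calculator's 2-PropZTest pools the two samples when computing the standard error (because H0 says p1 = p2), while 2-PropZInt does not. A minimal Python sketch of both, using the standard library's `statistics.NormalDist`; the helper names are hypothetical, and it assumes the table's counts are 200 of 250 women (group 1) and 225 of 300 men (group 2) receiving aid.

```python
import math
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value (H0: p1 = p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided: HA is p1 != p2
    return z, p_value


def two_prop_interval(x1, n1, x2, n2, confidence):
    """Unpooled z interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value z*
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


z, p = two_prop_z(200, 250, 225, 300)
low, high = two_prop_interval(200, 250, 225, 300, 0.90)
print(f"z = {z:.3f}, p-value = {p:.4f}")
print(f"90% CI for p_women - p_men: ({low:.4f}, {high:.4f})")
```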
13) Given below are samples of enrollment from medical schools that specialize in research and
from those that specialize in primary care. You need to decide if there is a difference in the
average enrollment for the two specializations.
Research 474 577 605 663 783 467 670 414 813 443 565 696 692 694 277 419 884
Primary Care 783 605 427 728 546 474 371 107 442 587 293 277 662 555 527 320
a) Find the 95% confidence interval for the difference between the means and interpret the
interval in the context of the problem.
b) Write the two hypotheses for this problem.
c) Find the test-statistic (i.e. t) and p-value. Sketch the distribution of test-statistics and
indicate t and the shaded region corresponding to the p-value.
d) In the context of the problem, make a conclusion and substantiate with statistical
arguments.
Chapter 21 – Paired Samples and Blocks
1) Ages: Are husbands older than their wives?
This data set contains the ages of the husband and the wife
for 20 couples. We want to know what the difference is
between the ages of husbands and wives. Since these data are
paired (and clearly not independent), we should not use the
2-Sample T Interval or 2-Sample T Test.
Instead, consider the pairwise differences.
Using your calculator, enter the Age of Husband in L1 and the
Age of Wife in L2. Move your cursor over L3, type L1 - L2, and press ENTER. L3 will fill with the pairwise differences.
We will treat this as our data set.
a) Create a 90% confidence interval for the true
(pairwise) differences and interpret it in the context
of the problem
Age of Husband   Age of Wife   Pairwise Difference
29               30
38               33
47               45
23               22
45               46
48               33
35               35
34               37
56               57
52               47
44               36
30               31
26               29
65               62
25               25
37               38
72               70
27               28
50               45
36               34
b) Conduct a t-test using α = 0.05. Our hypotheses now need to be in terms of the mean
difference μd:
H0: μd = 0
HA: μd > 0 (we are trying to see if husbands are older than wives)
Conditions to check:
c) What conclusion is appropriate?
d) Considering both a) and b), are our results statistically significant? Practically
significant?
e) What type of error might we have made?
Worked Example:
2) A psychologist interested in the relationship between stress and short-term
memory administered a test to 10 subjects prior to their exposure to a stressful situation and
then retested them after the stressful situation. From the following data, can we conclude that the
stressful situation decreases one's performance on a test that measures short-term memory?
Use an alpha of 0.05.
Subject       1    2    3    4    5    6    7    8    9    10
Pre stress    13   15   9    13   15   17   13   16   11   13
Post stress   10   14   7    15   11   14   13   14   9    14
– This is a matched-pairs problem: the same group of subjects is measured under two different
conditions, so the two samples are dependent. We are interested in their mean difference. We look
at each subject's difference in memory score before and after the stress, and then at the average
of all of those differences. Since the question is whether stress decreases short-term memory, we
need to discover whether pre-stress memory > post-stress memory. So the difference we take is
pre - post.
Hypotheses:
H0: μd = 0
HA: μd > 0 (i.e. pre-stress > post-stress)
– In L1 put the pre-stress measures and in L2 put the post-stress measures. Go to the very
top of L3 (on top of the "L3" heading itself), type 2nd L1 - 2nd L2, and press ENTER. The
data that appear in L3 are each subject's difference. Now run a T-Test on L3 using Data:
Stat -> Tests -> T-Test (same as we did for single samples)
Inpt: Data
μ0 = 0
List: L3
Freq: 1
μ: choose >μ0
Calculate
Results
You should find a test statistic of t = 2.33333 and a p-value of about 0.02. Since 0.02 < 0.05,
we reject the null hypothesis.
Therefore, the evidence suggests that stressful situations can decrease short-term
memory.
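To check the calculator's result, here is a minimal Python sketch of the same paired t statistic, using only the standard library. It computes the pairwise differences (the L3 column), their mean and standard deviation, and t = d̄ / (s_d / √n); the p-value in the worked example still comes from the TI's T-Test, because Python's standard library has no t-distribution CDF.

```python
import math
from statistics import mean, stdev

pre = [13, 15, 9, 13, 15, 17, 13, 16, 11, 13]
post = [10, 14, 7, 15, 11, 14, 13, 14, 9, 14]

# Same as typing L1 - L2 into the header of L3: one difference (pre - post) per subject.
diffs = [a - b for a, b in zip(pre, post)]

n = len(diffs)
d_bar = mean(diffs)  # sample mean difference
s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
t = (d_bar - 0) / (s_d / math.sqrt(n))  # H0: mu_d = 0
print(f"mean difference = {d_bar}, s_d = {s_d:.4f}, t = {t:.5f}, df = {n - 1}")
```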
3) To test the effectiveness of a drug to relieve asthma, a group of subjects was randomly given
a drug and a placebo on two different occasions. After 1 hour, an asthmatic relief index was
obtained for each subject, with these results (low numbers are good!). Is the drug more effective?
Subject   Drug   Placebo
1         28     32
2         31     33
3         17     19
4         22     26
5         12     17
6         32     30
7         24     26
8         18     19
9         25     25
4) A composition teacher wishes to see whether a new grammar program will reduce the
number of grammatical errors her students make when writing a two-page essay. The data are
shown below. Can it be concluded that the number of errors has been reduced using this new
program?
Student   Errors before   Errors after
1         12              9
2         9               6
3         0               1
4         5               3
5         4               2
6         3               3
5) Use the following data (representing hospital admissions from motor vehicle crashes) and an
alpha of 0.05 to test the claim that Friday the 13ths are unlucky (Triola 2008). Friday the 6th
is the Friday immediately preceding the 13th.
Friday the 6th   Friday the 13th
9                13
6                12
11               14
11               10
3                4
5                12
Extra Practice / Homework
6) Eight students in a statistics class were asked to report the number of hours they slept on
weeknights and on weekends. The table shows the results.
Student #
1 2 3 4 5 6 7 8
weeknight hours 8 5.5 7.5 8 7 6 6 8
weekend hours 4 7 10.5 12 11 9 6 9
a) How can you tell that this is a problem involving dependent samples?
b) Create a 90% confidence interval on the mean difference and interpret the interval in the
context of the problem.
c) Write the two hypotheses for the problem.
d) Perform a statistical test on the difference in the number of hours slept on weeknights vs.
weekend nights and report your test statistic and p-value. Sketch the distribution of test
statistics and indicate your test statistic as well as the shaded region corresponding to p.
e) Make a conclusion based on your test.
Chapter 22 – Comparing Counts
χ² Goodness-of-Fit Test
Hypotheses:
– H0: The distribution of counts occurs in a manner consistent with our model.
o p1 = p2 = ... = pn or p1 = #, p2 = #, ... pn = #
– HA: The distribution of counts occurs in a manner inconsistent with our model.
o HA: not all pi's are as stated
o Note: the distribution of counts can vary from our model in many different ways!
1) Is there a birth-month advantage to becoming a major league baseball player?
To answer this question, we need to see how the distribution of birth months for major league
baseball players compares to the overall distribution of birth months for the national
population (that is our model).
a) We start by finding the expected counts: assuming the null hypothesis is true, how
many players in our sample would we expect to be born in each month? Each expected
count is n·p = (the total number of players) × (the national birth % for that month from our model).
b) Make a table that shows the Observed and Expected Counts for each month.
c) We compute the χ² test statistic as follows:
$$\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
The χ² statistic follows the χ² distribution with n - 1 degrees of freedom (n = # of categories).
Conditions to check:
d) To find the p-value, you can use a χ² distribution calculator such as χ²cdf on
your TI.
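The arithmetic in steps a) through c) can be sketched in Python as follows (standard library only; `chi_square_gof` is a hypothetical helper name). It is illustrated with the loaded-die counts from problem 2, where a fair-die model gives each face probability 1/6. The sketch stops at the statistic and degrees of freedom; the p-value is the upper-tail area, e.g. χ²cdf(statistic, 1E99, df) on the TI.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic: sum of (Obs - Exp)^2 / Exp, with df = categories - 1."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # each expected count is n * p
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1


# Loaded die: 200 rolls; under H0 each face has probability 1/6.
rolls = [27, 31, 42, 40, 28, 32]
stat, expected, df = chi_square_gof(rolls, [1 / 6] * 6)
print(f"expected per face = {expected[0]:.2f}, chi-square = {stat:.2f}, df = {df}")
```

If SciPy happens to be installed, `scipy.stats.chi2.sf(stat, df)` returns the same upper-tail area as the TI's χ²cdf.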
2) A die is filled with a lead weight and then rolled 200 times with the following results:
1: 27
2: 31
3: 42
4: 40
5: 28
6: 32
Use an alpha of 0.05 to test the claim that the outcomes are not equally likely (Triola 2008).
3) You are planning to open an old-time soda fountain, and your partner claims that the public
will not prefer any flavor over another. The flavors you serve are cherry, strawberry, orange,
lime, and grape. After serving several customers, you stop to look at how sales are going. The
following numbers of people ordered each flavor: Cherry 35, Strawberry 32, Orange 29, Lime 26,
and Grape 25. Test to see whether there was a preference.
χ² Test of Homogeneity
4) Are students’ post-graduation plans the same at different colleges within the same university?
(De Veaux et al. 2014)
Hypotheses:
– H0: The distribution of ________ is the same for __________
o e.g. Students’ post-graduation activities are
distributed in the same way for all four colleges.
– HA: The distribution of ________ is not the same for __________
o e.g. Students’ plans do not have the same distribution.
$$\text{Expected counts} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}, \qquad \chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}},$$
and the distribution has (r - 1)(c - 1) degrees of freedom.
Using the TI:
– Enter the contingency table into the TI using the Matrix function.
– MATRX -> EDIT -> ENTER to get into matrix [A].
o A matrix's dimensions are given rows first, then columns. Enter these numbers
accordingly (e.g. MATRIX[A] 3×4).
o Next, fill in the matrix as it appears. Remember to press ENTER after each data entry.
o Then 2nd QUIT.
– Stat -> Tests -> C: χ²-Test.
o We placed the observed data in [A], and the TI will place the expected values in
[B] (or anywhere you specify).
o Then ask the TI to calculate.
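The expected-count and χ² formulas above can be sketched in Python for any two-way table (standard library only; `chi_square_table` is a hypothetical helper, not a TI function). The same code serves the test of homogeneity and the test of independence, since only the wording of the hypotheses differs. It is illustrated with the post-high-school choices from example 5 (rows 1980, 1990, 2000; columns College, Employment, Military, Travel).

```python
def chi_square_table(table):
    """Chi-square statistic, expected counts, and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count = (row total)(column total) / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return stat, expected, df


choices = [
    [320, 98, 18, 17],  # 1980
    [245, 24, 19, 2],  # 1990
    [288, 17, 5, 5],  # 2000
]
stat, expected, df = chi_square_table(choices)
print(f"chi-square = {stat:.2f}, df = {df}")
print("smallest expected count:", round(min(min(row) for row in expected), 2))
```

Printing the smallest expected count is a quick way to check the condition that expected counts are large enough (here it is about 6.6).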
5) The following chart shows the post-high-school choices for three graduating classes.
Determine whether post-high-school choices have changed over the three decades.
        College   Employment   Military   Travel   Total
1980    320       98           18         17       453
1990    245       24           19         2        290
2000    288       17           5          5        315
Total   853       139          42         24       1058
Although example 5 was worded as a test of homogeneity, we could have written it as a test of
independence by asking if post-high school choices were independent of decade of graduation.
*the test mechanics are the same!
χ² Test of Independence
Hypotheses:
H0: _______ and _______ are independent
HA: _______ and _______ are not independent
The χ² statistic has (r – 1)(c – 1) degrees of freedom.
6) Recall this contingency table showing who survived the sinking of the Titanic based on
whether they were crew members or passengers booked in first-, second-, or third-class
staterooms. Determine whether surviving was independent of status.
        Crew   First   Second   Third   Total
Alive   212    202     118      178     710
Dead    673    123     167      528     1491
Total   885    325     285      706     2201
7) A sociologist wishes to see whether the number of years of college a person has completed is
related to his or her place of residence. A sample of 84 is selected and classified as shown.
Location   No College   B.S. Degree   Master's Degree   PhD
Urban      15           12            16                5
Suburban   8            15            11                9
Rural      6            7             12                7
At α = 0.05, can the sociologist conclude that the years of college education are independent of
the residence location?
8) Use the following data to do a test of independence to see if left-handedness is independent
of gender (use alpha = 0.05):
         Left-Handed   Right-Handed
Male     17            83
Female   16            184
Extra Practice / Homework
9) The makers of M&Ms claim that the colors of M&Ms are distributed as follows: 20%
yellow, 20% red, 10% orange, 10% blue, 10% green and the rest are brown.
You buy a bag and get 29 yellow, 23 red, 12 orange, 14 blue, 8 green, and 20 brown.
a) If the M&Ms are packaged in the stated proportions, how many of each color should you
expect to get? Create a table of observed and expected counts.
b) To see if your bag was unusual, should you test goodness-of-fit, homogeneity, or
independence?
c) State the hypotheses
d) How many degrees of freedom are there?
e) Find the chi-square statistic and the p-value
f) Draw a sketch of this distribution and label your chi-square statistic and p-value.
g) State a conclusion
10) The table below shows the rank attained by male and female officers in the New York City
Police Department (NYPD). Do these data indicate that men and women are equitably
represented at all levels of the department? (De Veaux et al., 2009)
Rank           Male     Female
Officer        21,900   4,281
Detective      4,058    806
Sergeant       3,898    415
Lieutenant     1,333    89
Captain        359      12
Higher Ranks   218      10
a) State what type of test is appropriate and give your hypotheses.
b) Find the chi-square statistic, the p-value, and report the degrees of freedom.
c) State and justify your conclusions. Are gender and rank independent or is there an
association?
Confidence Interval & Hypothesis Testing Calculator Functions
Procedure                                               Calculator Function
   Hypotheses
1-Proportion Confidence Interval                        A: 1-PropZInt
1-Proportion Hypothesis Test                            5: 1-PropZTest
   H0: p = ___ (% as a decimal)
   HA: p ≠, <, > ___
1-Mean Confidence Interval                              8: TInterval
1-Mean Hypothesis Test                                  2: T-Test
   H0: µ = ___
   HA: µ ≠, <, > ___
2-Proportion Confidence Interval                        B: 2-PropZInt
2-Proportion Hypothesis Test                            6: 2-PropZTest
   H0: p1 = p2 (be sure to indicate which group is group 1)
   HA: p1 ≠, <, > p2
2-Mean Confidence Interval (Independent Samples)        0: 2-SampTInt
2-Mean Hypothesis Test (Independent Samples)            4: 2-SampTTest
   H0: µ1 = µ2 (be sure to indicate which group is group 1)
   HA: µ1 ≠, <, > µ2
2-Mean Confidence Interval (Paired/Dependent Samples)   8: TInterval (on Pairwise Δs)
2-Mean Hypothesis Test (Paired/Dependent Samples)       2: T-Test (on Pairwise Δs)
   H0: µd = 0 (note: µd = µ1 - µ2)
   HA: µd ≠, <, > 0
χ² Goodness-of-Fit Test                                 D: χ²GOF-Test (new OS only)
   H0: The distribution of ______ occurs in a manner consistent with the reported percentages
   HA: The distribution of ______ occurs in a manner inconsistent with the reported percentages
χ² Test of Homogeneity                                  C: χ²-Test
   H0: _____ are distributed in the same way for all ____
   HA: _____ are not distributed in the same way for all ____
χ² Test of Independence                                 C: χ²-Test
   H0: The variables _____ and _____ are independent
   HA: The variables _____ and _____ are not independent
1: One Proportion (or mean) Confidence Interval:
We are ___% confident that the true population proportion (or mean) of _________________ is between _____ and ______.
2: Two Proportion (or mean) Confidence Interval:
We are ___% confident that the true difference in population proportions (or means) of ____________ and _____________ is
between _____ and _____, with ________________ having a larger proportion (or mean). (if applicable)
3: Hypothesis Test (p-value<α):
Since my p-value of ____ is less than alpha, I reject the null hypothesis. The statistical evidence seems to indicate that
_______________ is not equal to _______________. (or is greater than ___________ or less than __________).
4: Hypothesis Test (p-value≥α):
Since my p-value of ____ is greater than or equal to alpha, I fail to reject the null hypothesis. There is not enough statistical evidence to
conclude that _______________ is not equal to _______________. (or is greater than ___________ or less than __________).
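For a one-proportion interval, 1-PropZInt computes p̂ ± z*·sqrt(p̂q̂/n); solving that margin of error for n gives the sample-size formula n = (z*)²p̂q̂ / ME², always rounded up. A minimal Python sketch with hypothetical helper names, using the ghost poll from the Unit 3 Review (38% of 1012 adults, 90% confidence) purely as an illustration; z* comes from the standard library's `statistics.NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist


def one_prop_interval(p_hat, n, confidence):
    """One-proportion z interval: p_hat +/- z* * sqrt(p_hat * q_hat / n)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # e.g. about 1.645 for 90%
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - me, p_hat + me, me


def sample_size(p_hat, margin, confidence):
    """Smallest n whose margin of error is at most `margin` (always round up)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / margin ** 2)


low, high, me = one_prop_interval(0.38, 1012, 0.90)
print(f"ME = {me:.4f}; 90% CI = ({low:.4f}, {high:.4f})")
print("n needed for ME = 0.01:", sample_size(0.38, 0.01, 0.90))
```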
Sampling Distributions
We know μ and σ, or p (population parameters). This is unusual! Usually we only know
about our sample.
We want to know the probability of seeing our sample data, x̄ or p̂.
Proportions: use $N\left(p, \sqrt{\frac{pq}{n}}\right)$
Means: use $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Solve using the appropriate Normal curve.
(Diagram: population (μ, σ, p) → sample (x̄, p̂))
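The two sampling-distribution models can be sketched in Python with the standard library's `statistics.NormalDist`, assuming the conditions for a Normal model are met (for proportions, np and nq at least 10). The helper names are hypothetical; the numbers come from Unit 3 Review problems 8 (smokers) and 12b (GPA groups) and are used only as illustrations.

```python
import math
from statistics import NormalDist


def prob_phat_above(p, n, cutoff):
    """P(p_hat > cutoff) using the N(p, sqrt(pq/n)) sampling model."""
    sd = math.sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=sd).cdf(cutoff)


def prob_xbar_above(mu, sigma, n, cutoff):
    """P(x_bar > cutoff) using the N(mu, sigma/sqrt(n)) sampling model."""
    return 1 - NormalDist(mu=mu, sigma=sigma / math.sqrt(n)).cdf(cutoff)


print(round(prob_phat_above(0.178, 100, 0.20), 4))  # more than 20% smokers among 100
print(round(prob_xbar_above(3.4, 0.35, 25, 3.5), 4))  # group of 25 averages above 3.5
```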
Hypothesis Testing/Confidence Intervals
Here we know about our sample (e.g. x̄ or p̂) and want to make inferences about the population (μ, p).
(Diagram: sample (x̄, p̂) → population (μ, p))
Is the data proportions?
   One group: 1 Prop Z Test, 1 Prop Z Interval
   Two groups: 2 Prop Z Test, 2 Prop Z Interval
Is the data quantitative/means?
   One group: T Test, T Interval
   Independent samples: 2 Sample T Test, 2 Sample T Interval
   Paired samples: T Test on Pairwise Differences, T Interval on Pairwise Differences
Is the data counts of categorical variables?
   One variable and a hypothesized model: Goodness of Fit
   Two-way (contingency) table: Test of Homogeneity** and Test of Independence
** These are the same test, but the hypotheses are written differently; pick the one to use
based on the question being asked.
Final Review
Unit 3 Review
Instructions: These review questions are in no particular order. Show your work, including
calculations or calculator inputs. For each question requiring a hypothesis test, be sure to write
your hypotheses using correct notation.
1) A company that produces snack foods uses a machine to fill 454 g bags of pretzels. A
simple random sample of 25 bags has the following weights (in grams):
465   456   438   454   447
449   442   449   446   447
468   433   454   463   450
446   447   456   452   444
447   456   456   435   450
a) Does the data provide sufficient evidence to conclude that the packaging machine is
working properly?
b) What assumptions and conditions did you need to check for your test?
2) The US Substance Abuse and Mental Health Services Administration conducts surveys on
drug use by type and age group. According to the National Household Survey on Drug
Abuse, 13.5% of 18-25 year olds were current users of marijuana or hashish in 2000. A
recent poll of 1283 randomly selected 18-25 year olds revealed that 205 currently use
marijuana or hashish. At the 10% significance level (alpha of 0.10), do the data provide
sufficient evidence to conclude that the percentage of 18-25 year olds who currently use
marijuana or hashish has changed from the 2000 percentage?
3) The Acme company has developed a new battery. The engineer in charge claims that the
new battery will operate longer than the old battery. To test the claim, the company selects a
simple random sample of 100 new batteries and 100 old batteries. The old batteries run
continuously for an average of 190 minutes with a standard deviation of 10 minutes; the
new batteries run for an average of 200 minutes with a standard deviation of 40 minutes.
Test the engineer’s claim that the new batteries run longer than the old. Use a 0.05 level of
significance.
4) Democrats and Republicans were surveyed for their opinions on gun control, and the results
are shown in the table below.
         Democrat   Republican   Total
Favor    440        320          760
Oppose   400        480          880
Unsure   120        100          220
Total    960        900          1860
a) Based on this sample, does a person’s opinion on gun control depend on party affiliation,
at the 0.05 level of significance?
b) What assumptions and conditions did you need to check for your test?
5) Suppose the Acme Drug Company develops a new drug designed to prevent colds. The
company states that the drug is equally effective for men and women. To test this claim, they
choose a random sample of 100 women and 200 men from a population of 100,000
volunteers and give them the new drug. At the end of the study, 38% of the women caught a
cold and 51% of the men caught a cold.
a) Based on the findings, can we reject the company's claim that the drug is equally
effective for men and women? Use a 0.05 level of significance.
b) What assumptions and conditions did you need to check for your test?
6) A poll found that 38% of a random sample of 1012 American adults said that they believe in
ghosts.
a) Find the margin of error for this poll if we want 90% confidence in our estimate of
American adults who believe in ghosts.
b) Explain what a “90% confidence interval” means and find the interval.
c) What sample size would be needed to reduce our margin of error to within 1%, while
maintaining 90% confidence?
7) Within a school district, students were randomly assigned to one of two math teachers, Mrs.
Smith and Mrs. Jones. After the assignment, Mrs. Smith had 30 students and Mrs. Jones had
25 students.
At the end of the year, each class took the same standardized test. Mrs. Smith’s students had
an average test score of 78, with a standard deviation of 10; and Mrs. Jones’ students had an
average test score of 85 with a standard deviation of 15.
a) Test the hypothesis that Mrs. Smith and Mrs. Jones are not equally effective teachers.
Use a 0.05 level of significance. Assume that student performance is approximately
normal.
b) Sketch the distribution and shade the area corresponding to the p-value. Label the test
statistic
c) Find the 95% confidence interval of the difference in average student exam scores
between the two teachers.
d) Is this confidence interval consistent with your decision to reject or fail to reject the null
hypothesis in part a? Why or why not?
8) The Centers for Disease Control states that in 2013, 17.8% of American adults were smokers.
What is the probability that in a random group of 100 college students sharing a dorm, more
than 20% of them are smokers?
9) You think that people’s ice cream favorites are reflected by the following model:
25% vanilla, 25% chocolate, 20% strawberry, 15% butter pecan, 8% rocky road, 7% other or
no preference.
You survey 1000 people and find the following preferences:
220 vanilla, 255 chocolate, 190 strawberry, 170 butter pecan, 95 rocky road, 70 other/no
preference. Using alpha = 0.05, was your idea right or wrong?
10) To see if running raises HDL (“good”) cholesterol levels, five female volunteers had their
HDL level measured before they started running and after each had run regularly an average
of 4 miles daily for 6 months.
See if you can support the claim that the average person’s HDL cholesterol level would be
raised after all that running.
Volunteer        1    2    3    4    5
Before running   30   34   36   34   40
After running    35   39   42   33   48
11) For several years, evidence has been mounting that folic acid reduces major birth defects.
Results were published in the New England Journal of Medicine. For the study, doctors
enrolled women prior to conception and divided them randomly into two groups. One group,
consisting of 2701 women, took daily multivitamins containing 0.8 mg of folic acid; the
other group, consisting of 2052 women, received only trace elements. Major birth defects
occurred in 35 cases when the women took folic acid and in 47 cases when the women did
not.
a) At the 1% (0.01) significance level, do the data provide sufficient evidence to conclude
that women who take folic acid are at lesser risk of having children with major birth
defects?
b) Using your hypotheses above, describe a Type I error in the context of the problem.
What are the consequences of this type of error?
c) Using your hypotheses above, describe a Type II error in the context of the problem.
What are the consequences of this type of error?
d) Which type of error do you think would have worse consequences?
e) If there is a difference in birth defect rate, find the 98% confidence interval for the
difference in proportion of birth defects.
12) A college’s data about the incoming freshmen indicates that the mean of their high school
GPAs is 3.4 with a standard deviation of 0.35. The distribution is normal. The students are
randomly assigned to freshmen writing seminars in groups of 25.
a) Find the probability a student has a GPA greater than 3.5.
b) Find the probability that one of the groups has an average GPA greater than 3.5.
13) A researcher wanted to see whether the English like soccer more than the Scots. She asked
eight English and eight Scots to rate their liking for soccer on a numerical scale of 1 (hate) to
10 (love), and recorded these responses:
English:
Scots:
6.4
5.1
5.9
4.0
2.9
7.2
8.2
6.9
7.0
4.4
7.1
1.3
5.5
2.2
9.3
7.7
From the above data, can the researcher conclude that the English have a stronger liking for
soccer than the Scots? Use alpha = 0.05.
14) Soccer again! A researcher wanted to see whether the English and the Scots are equally big fans of
soccer. She polled random samples with the question: “Do you watch football at least once a
week?” (In the UK, soccer is called football.) She got these results:
          Sample size   Number of “yes”
English   150           104
Scots     200           160
a) At the 0.05 significance level, are the English and the Scots equally big fans of soccer?
15) This year’s water pollution readings at State Park Beach seem to be lower than last year’s. A
sample of 10 readings was randomly selected from this year’s daily readings:
3.5   3.9   2.8   3.1   3.1   3.4   3.2   2.5   3.5   3.1
a) Does this sample provide sufficient evidence (at the 0.05 level) to conclude that the mean
of this year’s pollution readings is significantly lower than last year’s mean of 3.8?
b) Interpret the meaning of your p-value in the context of this problem.
16) 425 school children were surveyed about what they want to be when they grow up, out of a
choice of five professions. The results were: Teacher 80, Doctor 105, Lawyer 70, Police
Officer 70, Firefighter 100. Obviously these particular children preferred some occupations
over others. You want to test whether their preferences reflect a real difference in the
population.
a) Indicate the test you should do and write your hypotheses
b) What counts would you expect for each profession if the null hypothesis were true? You
can show this in a table.
c) Run the test and report your test statistic and p-value.
d) Sketch the distribution and shade the area corresponding to the p-value. Label the test
statistic
e) State your conclusion using appropriate notation and vocabulary.
Math 138 Final Exam Review
Although there are many problems on this review, it does not fully cover all the material in
MATH 138. For additional review problems, review your homework problems, projects, exams,
and other classroom handouts.
Unit 1
1) Classify the variables as categorical or quantitative:
a) name
b) employee ID number
c) annual income
2) A survey of patients at a hospital classified the patients by gender and blood type, as seen in
the table.
                    Gender
Blood type    Male     Female
A             105      93
B             98       84
O             160      145
AB            15       18
a) What percentage of the patients with type-B blood are male?
b) What percentage of the female patients have type-O blood?
c) What percentage of the patients are male and have type-A blood?
d) What percentage of the patients are female or have type-O blood?
e) Give the conditional distribution of blood type for the males
f) Give the conditional distribution of blood type for the females
g) Are blood type and gender independent?
3) The numbers of days off that 30 police detectives took in a given year are provided below.
Create a histogram of the data. (You can do this by hand or using technology, as long as the
bins are reasonable and you can reproduce an accurate sketch.)
10   5   5   1   1   4   3   0   1   5
9    7   4   11  7   7   1   11  0   5
1    6   7   5   6   10  6   1   1   0
4) Describe what these boxplots tell you about the relationship between fuel efficiency and the
number of cylinders an engine has.
5) The stem-and-leaf diagram shows the ages of males playing basketball at a public gym over
the course of a day. Describe the shape, center, spread, and unusual features of the
distribution.
4 | 8 9
4 | 0 1 2 3
3 | 6 6 8 8 9
3 | 0 0 0 1 4 4
2 | 6 7 9 9 9
2 |
1 | 5 5 5 5 6 6 6 6 6 6 7 7 7
1 | 2 3 4 4 4 4
0 |
0 |
6) The volumes of soda in quart soda bottles can be described by a Normal model with a mean
of 32.3 oz and a standard deviation of 1.2 oz.
a) What percentage of bottles can we expect to have a volume less than 32 oz?
b) 5% of bottles have a volume smaller than what amount?
7) The ages of the 21 members of a track and field team are listed below.
15   24   28   18   24   28   18
25   30   19   25   32   22   26
33   23   26   40   24   27   42
a) Report the 5-Number Summary for the data
b) Create a boxplot for the data. Use fences to identify potential outliers, if there are any.
c) Find the mean and the standard deviation for the data
d) What is the z-score for the age of the team member who is 40 years old?
e) Interpret the meaning of this z-score.
8) The distances traveled to work in miles by the employees at a large company are normally
distributed with a mean of 35 miles and a standard deviation of 10 miles.
a) What is the z-score of an employee who travels 40 miles?
b) What percent of employees travel more than 45 miles?
c) Find the 90th percentile of the travel distances
d) In what interval do the middle 70% of travel distances fall?
9) Match the given correlation coefficients with the scatter plots.
10) The attendance at Camden Yards during the 12 years from 2000 to 2011 is given as follows:
(Source: http://www.ballparksofbaseball.com/attendance.htm)
Year   Attendance
2000   3,296,031
2001   3,094,841
2002   2,682,439
2003   2,454,523
2004   2,744,013
2005   2,624,804
2006   2,153,150
2007   2,164,822
2008   1,950,075
2009   1,907,163
2010   1,733,018
2011   1,755,461
a) Find the equation of the regression line.
b) Interpret the slope in context.
c) Find and interpret the residual for 2008.
d) Overall, is this a good linear fit? Explain.
e) Is it feasible to use this model to predict attendance in 2014? Why or why not?
f) What is the predicted attendance for 2014? (The actual attendance was 2,102,240.)
11) The March 2000 Consumer Reports compared various brands of supermarket enchiladas in
cost and sodium content. Use the scatterplot and regression analysis to answer the questions.
(Figure: Fitted Line Plot, a scatterplot of Sodium content (mg) versus Cost (per serving) with the fitted regression line)
Sodium content (mg) = 2185 - 607.0 Cost (per serving)
S = 250.702   R-Sq = 77.3%   R-Sq(adj) = 74.0%
a) Use the scatterplot above to describe the relationship between Cost and Sodium Content.
b) What is the correlation coefficient for the relationship between cost and sodium content?
c) How much sodium would you expect if the cost is $2.90?
Unit 2
12) A real estate company kept a database on the apartments in a certain city. The percentages of
various types of apartments are listed below.
Number of bedrooms   Percent
0 (Studio)           15.9
1                    25.5
2                    45.8
3                    10.1
a) Are events represented in this table disjoint?
b) What is the probability that a randomly selected apartment in this city is a 1-bedroom or
2-bedroom apartment?
c) What is the expected value for the number of bedrooms that an apartment will have?
13) Explain the difference between an experiment and an observational study
14) Of the coffee makers sold in an appliance store, 6.0% have either a faulty switch or a
defective cord, 2.0% have a faulty switch, and 0.8% have both defects. What percent of the
coffee makers will have a defective cord?
15) Suppose a computer chip manufacturer rejects 15% of the chips produced because they fail
presale testing.
a) What is the probability that the first chip fails and the second chip passes presale testing?
b) If you test 4 chips, what is the probability that not all of the chips fail?
16) Suppose that on any given day, there is a 65% chance of it being sunny, a 10% chance of
rain, and a 25% chance of being cloudy.
a) Clearly explain how you would use the random numbers from 1 to 100 to conduct a
simulation to model this situation
b) If you were asked to find the probability that it will be sunny at least two days in a week,
clearly state what the response variable would be.
c) For each trial below, fill in the resulting outcome
Trial #   Random Numbers           Number of Sunny Days   Response Variable
1         65 91 5 72 41 69 48
2         90 82 93 74 63 67 71
3         34 35 56 83 42 4 95
4         91 42 18 37 81 85 67
5         80 57 63 38 76 35 12
6         63 84 65 36 82 89 68
d) Using your simulation, estimate the probability that it will be sunny at least two days in a
week.
e) Using your simulation, how many sunny days would you estimate per week?
17) Assume that 11% of people are left-handed. If we select 10 people at random, find the
probability that
a) Exactly 3 are left-handed.
b) At most 2 are left-handed
c) At least 2 are left-handed
d) Between 1 and 4 people are left-handed (inclusive)
e) Explain how you know that the Binomial model applies to this situation
f) What is the mean and the standard deviation of the number of left-handed people in a
group of 10?
18) Suppose that in a given suburb 60% of the houses have garages, 40% have decks, and 30% of
the houses have both.
a) What is the probability that a randomly selected house has a garage or a deck?
b) What is the probability that a randomly selected house will have neither a garage nor a
deck?
c) What is the probability that a randomly selected house will have a deck but not a garage?
d) What is the probability that a randomly selected house will have a deck if it is known that
the house has a garage?
e) Are the events of a house having a garage and a house having a deck independent?
Justify your answer with a probability test.
19) In a certain college, 33% of the physics majors belong to ethnic minorities. If 10 students are
selected at random from the physics majors, what is the probability that no more than 6
belong to an ethnic minority?
20) Suppose you buy 1 ticket for $1 out of a lottery of 100 tickets where the prize for the one
winning ticket is to be $50. What is your expected value?
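A sketch of the expected-value calculation for problem 20, assuming the $50 prize is the total payout, so a winning ticket nets $49 after the $1 cost. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Net gain for each outcome of one $1 ticket, with its probability.
outcomes = {
    49: Fraction(1, 100),   # the single winning ticket: $50 prize minus $1 cost
    -1: Fraction(99, 100),  # any of the 99 losing tickets
}
expected_value = sum(value * prob for value, prob in outcomes.items())
print(expected_value, float(expected_value))
```

Equivalently, $E = 50 \cdot \tfrac{1}{100} - 1$, the expected winnings minus the price of the ticket.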
21) A tax auditor has a pile of 191 tax returns of which he would like to select 17 for a special
audit. Describe a method for selecting the sample which involves
a) systematic sampling
b) stratified sampling
c) simple random sampling
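The three sampling designs in problem 21 can be sketched with Python's `random` module, assuming the returns are numbered 1 to 191. The strata used below (individual filers numbered 1 to 100, business filers 101 to 191) are hypothetical; the auditor would choose real strata, and the 9/8 split is roughly proportional to the stratum sizes.

```python
import random

N, n = 191, 17  # tax returns numbered 1..191, sample of 17

def simple_random_sample(rng):
    """(c) Every set of 17 returns is equally likely to be chosen."""
    return sorted(rng.sample(range(1, N + 1), n))

def systematic_sample(rng):
    """(a) Random start, then every k-th return."""
    k = N // n                  # 191 // 17 = 11
    start = rng.randint(1, k)   # random starting return in 1..k
    return [start + i * k for i in range(n)]

def stratified_sample(rng, strata):
    """(b) Simple random sample within each stratum.
    `strata` maps a label to (list of return numbers, how many to draw)."""
    chosen = []
    for returns, count in strata.values():
        chosen.extend(rng.sample(returns, count))
    return sorted(chosen)

rng = random.Random(2015)  # seeded only so the run is repeatable
srs = simple_random_sample(rng)
syst = systematic_sample(rng)
# HYPOTHETICAL strata for illustration only.
strata = {
    "individual": (list(range(1, 101)), 9),
    "business": (list(range(101, 192)), 8),
}
strat = stratified_sample(rng, strata)
print(srs, syst, strat, sep="\n")
```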
22) At a college there are 120 freshmen, 90 sophomores, 110 juniors, and 80 seniors. A school
administrator selects a random sample of 12 of the freshmen, 9 of the sophomores, 11 of the
juniors and 8 of the seniors. She then interviews all the students selected. Identify the type
of sampling used in this example.
23) A car insurance company is interested in the association between age and the frequency of
car accidents. They obtained the following sample data.
Number of accidents |            Age Group
in past 3 years     | Under 25 | 25-45 | Over 45 | Total
0                   | 74       | 89    | 82      | 245
1                   | 18       | 8     | 12      | 38
More than 1         | 8        | 3     | 6       | 17
Total               | 100      | 100   | 100     | 300
a) What is the probability that a randomly selected participant was under age 25?
b) What is the probability that a randomly selected participant was under age 25 or in more
than 1 car accident?
c) What is the probability that a randomly selected participant was over age 45 and in 0 car
accidents?
d) What is the probability that a randomly selected participant was over age 45 if it is known
that they were in 0 car accidents?
e) Do the events of being over 45 and having had 0 accidents in the past 3 years appear to
be independent? Justify your answer using probabilities.
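Every probability in problem 23 comes straight from the table counts. This sketch stores the table in a Python dictionary (a layout chosen here for illustration) and computes each answer as an exact fraction.

```python
from fractions import Fraction

# Rows: number of accidents in past 3 years; columns: age group.
ages = ["Under 25", "25-45", "Over 45"]
counts = {
    "0":           [74, 89, 82],
    "1":           [18, 8, 12],
    "More than 1": [8, 3, 6],
}
total = sum(sum(row) for row in counts.values())                  # 300
col_total = [sum(row[j] for row in counts.values()) for j in range(len(ages))]
row_total = {label: sum(row) for label, row in counts.items()}

def P(count):
    """Probability of a cell or margin count out of all participants."""
    return Fraction(count, total)

under25, over45 = ages.index("Under 25"), ages.index("Over 45")

p_a = P(col_total[under25])                                        # (a)
# (b) Addition rule: subtract the overlap so it is not counted twice.
p_b = P(col_total[under25] + row_total["More than 1"]
        - counts["More than 1"][under25])
p_c = P(counts["0"][over45])                                       # (c)
p_d = Fraction(counts["0"][over45], row_total["0"])                # (d) given 0 accidents
p_over45 = P(col_total[over45])                                    # for (e)
print(p_a, p_b, p_c, p_d, p_over45)
```

Comparing `p_d` (about 0.335) with `p_over45` (about 0.333) is the comparison part (e) asks you to justify.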
24) Suppose that in a given court system 20% of defendants are truly innocent and 80% of
defendants are truly guilty. Defendants are given the choice to plead innocent or plead
guilty. Suppose that defendants who are innocent plead innocent 99% of the time and that
defendants who are guilty plead innocent 70% of the time.
a) Find the probability that a defendant is guilty and pleads guilty.
b) Find the probability that a defendant is innocent and pleads innocent.
c) Find the probability that a defendant pleads innocent.
d) Find the probability that a defendant who pleads innocent is actually innocent.
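Problem 24 is a tree-diagram calculation: multiply along branches for the "and" probabilities, add the branches that end in an innocent plea for the total probability, then divide for the conditional probability (Bayes' rule). A sketch:

```python
p_innocent, p_guilty = 0.20, 0.80
p_plead_inn_given_inn = 0.99      # innocent defendants who plead innocent
p_plead_inn_given_guilty = 0.70   # guilty defendants who plead innocent

# (a) guilty AND pleads guilty: multiply along the branch
p_guilty_and_plead_guilty = p_guilty * (1 - p_plead_inn_given_guilty)
# (b) innocent AND pleads innocent
p_inn_and_plead_inn = p_innocent * p_plead_inn_given_inn
# (c) total probability of pleading innocent: add both branches
p_plead_inn = p_inn_and_plead_inn + p_guilty * p_plead_inn_given_guilty
# (d) Bayes' rule: P(innocent | pleads innocent)
p_inn_given_plead_inn = p_inn_and_plead_inn / p_plead_inn
print(p_guilty_and_plead_guilty, p_inn_and_plead_inn, p_plead_inn,
      round(p_inn_given_plead_inn, 4))
```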
Unit 3
25) Assume that 25% of students at a university wear contact lenses. We randomly select 200
students.
a) What are the mean and standard deviation of the proportion of students in this group who
wear contact lenses?
b) What is the probability that the sample proportion $\hat{p}$ of students who wear contact
lenses is 30% or more?
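For problem 25, the sampling distribution of $\hat{p}$ is approximately Normal with mean $p$ and standard deviation $\sqrt{pq/n}$ once the success/failure condition holds. This sketch uses `statistics.NormalDist` (Python 3.8+) in place of a z-table, so its tail area may differ from a table lookup in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.25, 200
# Success/failure condition: np and nq should both be at least 10.
assert n * p >= 10 and n * (1 - p) >= 10

mean_phat = p                              # (a) mean of p-hat
sd_phat = sqrt(p * (1 - p) / n)            # (a) SD(p-hat) = sqrt(pq/n)
z = (0.30 - mean_phat) / sd_phat           # standardize 30%
prob_30_or_more = 1 - NormalDist().cdf(z)  # (b) upper-tail area
print(mean_phat, round(sd_phat, 4), round(z, 3), round(prob_30_or_more, 4))
```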
26) The number of hours per week that high school seniors spend on computers is normally
distributed, with a mean of 4 hours and a standard deviation of 2 hours. Sixty students are
chosen at random. Let $\bar{y}$ be the mean number of hours spent on the computer for this group.
Find the probability that $\bar{y}$ is between 4.2 and 4.4 hours.
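Because the population in problem 26 is Normal, $\bar{y}$ is Normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. A sketch using `statistics.NormalDist` (Python 3.8+) instead of a z-table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4, 2, 60
se = sigma / sqrt(n)                 # SD of y-bar = sigma / sqrt(n)
ybar_dist = NormalDist(mu, se)       # sampling distribution of y-bar
prob = ybar_dist.cdf(4.4) - ybar_dist.cdf(4.2)
print(round(se, 4), round(prob, 4))
```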
27) A researcher wishes to estimate the proportion of fish in a certain lake that are inedible due to
pollution of the lake. How large a sample should be tested in order to be 99% confident that
the true proportion of inedible fish is estimated to within 6%?
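Problem 27 solves the margin-of-error formula $ME = z^* \sqrt{\hat{p}\hat{q}/n}$ for $n$. With no prior estimate of the proportion, this sketch assumes the conservative guess $\hat{p} = 0.5$ and always rounds up.

```python
from math import ceil
from statistics import NormalDist

confidence, me = 0.99, 0.06
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576
p_guess = 0.5   # no prior estimate, so use the most conservative value
# n = z*^2 * p * q / ME^2, rounded UP so the margin is at most 6%
n = ceil(z_star**2 * p_guess * (1 - p_guess) / me**2)
print(round(z_star, 3), n)
```

Using the table value $z^* = 2.576$ instead gives the same whole number.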
28) A mayoral election race is tightly contested. In a random sample of 2200 likely voters, 1144
said that they were planning to vote for the current mayor. Based on a 95% confidence
interval, would you claim that the mayor will win a majority of the votes? Explain.
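A sketch of the one-proportion z-interval for problem 28; the decision rests on whether the whole interval lies above 0.5.

```python
from math import sqrt
from statistics import NormalDist

x, n = 1144, 2200
p_hat = x / n                              # sample proportion, 0.52
se = sqrt(p_hat * (1 - p_hat) / n)         # SE(p-hat)
z_star = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence
lo, hi = p_hat - z_star * se, p_hat + z_star * se
# A majority claim needs every plausible value in the interval above 0.5.
majority_supported = lo > 0.5
print(round(lo, 4), round(hi, 4), majority_supported)
```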
29) Of 8,500 people vaccinated against a certain disease, 7 later developed the disease. Of
10,000 people given a placebo, 18 later developed the disease. Test the claim that the
vaccine is effective in lowering the incidence of the disease. Use a significance level of 0.02.
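Problem 29 compares two proportions, so this sketch uses a two-proportion z-test with a pooled standard error and the one-sided alternative $p_{vaccine} < p_{placebo}$. Note that the vaccine group has only 7 cases, fewer than the usual guideline of at least 10 successes per group, so the Normal approximation here is rough.

```python
from math import sqrt
from statistics import NormalDist

x_vac, n_vac = 7, 8500        # vaccine group: cases, group size
x_pla, n_pla = 18, 10000      # placebo group: cases, group size
alpha = 0.02

p_vac, p_pla = x_vac / n_vac, x_pla / n_pla
# Under H0 (p_vac = p_pla), pool both samples to estimate the common p.
p_pool = (x_vac + x_pla) / (n_vac + n_pla)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_vac + 1 / n_pla))
z = (p_vac - p_pla) / se_pooled
p_value = NormalDist().cdf(z)   # HA: p_vac < p_pla, so lower tail
reject_h0 = p_value < alpha
print(round(z, 3), round(p_value, 4), reject_h0)
```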
30) Suppose the proportion of sophomores at a particular college who purchased used textbooks
in the past year is $p_s$ and the proportion of freshmen at the college who purchased used
textbooks in the past year is $p_f$. A study found a 95% confidence interval for $p_s - p_f$ of
$(0.235, 0.427)$.
Does this interval suggest that sophomores are more likely than freshmen to
buy used textbooks? Explain.
31) A skeptical paranormal researcher claims that the proportion of Americans that have seen a
UFO, p, is less than 4%. He surveys 500 randomly selected Americans and finds that 11 of
them claim to have seen a UFO.
a) What type of test would be appropriate?
b) Write the hypotheses
c) Check the assumptions and conditions
d) Conduct the hypothesis test. Report the test statistic and p-value, sketch the curve, and
make an appropriate conclusion.
e) What type of error might have occurred?
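A sketch of the one-proportion z-test for problem 31, with $H_0: p = 0.04$ and $H_A: p < 0.04$. The success/failure check uses the hypothesized $p_0$, and `statistics.NormalDist` (Python 3.8+) stands in for a z-table.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x = 0.04, 500, 11
# Success/failure condition, checked with the hypothesized p0.
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n                        # 0.022
sd_p0 = sqrt(p0 * (1 - p0) / n)      # SD of p-hat if H0 is true
z = (p_hat - p0) / sd_p0
p_value = NormalDist().cdf(z)        # HA: p < 0.04, so lower tail
print(p_hat, round(z, 3), round(p_value, 4))
```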
32) A police officer pulls over an individual who was driving recklessly. The officer is
trying to determine whether the individual should be arrested for driving under the influence of
alcohol. Given the null and alternative hypotheses below, identify the type of error in each case.
$H_0$: The individual is not driving under the influence of alcohol.
$H_A$: The individual is driving under the influence of alcohol.
a) The police officer determines that the individual should be arrested and is driving under
the influence of alcohol when the individual is not driving under the influence of alcohol.
b) The police officer determines that the individual should not be arrested and is not driving
under the influence of alcohol when the individual is under the influence of alcohol.
33) In the past, the mean running time for a certain type of flashlight battery has been 8.5 hours.
The manufacturer has introduced a change in the production method and wants to perform a
hypothesis test to determine whether the mean running time has increased as a result. He
samples 30 newly produced batteries and finds a sample mean of 9.2 hours with a standard
deviation of 0.6 hours.
a) What type of test would be appropriate?
b) Write the hypotheses:
c) Conduct the test (report the test statistic, p-value, sketch the curve, and write an
appropriate conclusion). You can assume that all assumptions and conditions have been
met.
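Python's standard library has no t distribution, so this sketch for problem 33 integrates the t density numerically with Simpson's rule (a substitute for a t-table or a statistics package; if SciPy is installed, `scipy.stats.t.sf(t_stat, df)` gives the same upper-tail area). It assumes $H_0: \mu = 8.5$ and $H_A: \mu > 8.5$.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule area under the density from 0 to |x|."""
    b = abs(x)
    h = b / steps                     # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu0 = 8.5                          # H0: mu = 8.5 hours
n, ybar, s = 30, 9.2, 0.6
se = s / math.sqrt(n)              # standard error of the sample mean
t_stat = (ybar - mu0) / se
df = n - 1
p_value = 1 - t_cdf(t_stat, df)    # HA: mu > 8.5, so upper tail
print(round(t_stat, 3), df, p_value)
```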
34) Using the data below and a 0.05 significance level, test the claim that the responses occur
with percentages of 15%, 20%, 25%, 25%, and 15% respectively.
Response  | A  | B  | C  | D  | E
Frequency | 12 | 15 | 16 | 18 | 19
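For problem 34, the chi-square goodness-of-fit statistic is $\chi^2 = \sum (Obs - Exp)^2 / Exp$ with $k - 1$ degrees of freedom. The P-value helper below relies on the closed form of the chi-square upper tail that holds only for an even number of degrees of freedom, which covers this problem ($df = 4$); it is a sketch, not a general-purpose routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail area P(X^2 > x) for chi-square with an EVEN number of df,
    via exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form requires even df")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

observed = [12, 15, 16, 18, 19]             # responses A-E
claimed = [0.15, 0.20, 0.25, 0.25, 0.15]    # hypothesized percentages
n = sum(observed)                           # 80
expected = [n * p for p in claimed]
# Expected-counts condition: every expected count should be at least 5.
assert all(e >= 5 for e in expected)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                      # 4
p_value = chi2_sf_even_df(chi2, df)
reject_h0 = p_value < 0.05
print(round(chi2, 4), df, round(p_value, 4), reject_h0)
```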
35) A coach uses a new technique to train gymnasts. 7 gymnasts were randomly selected and
their competition scores were recorded before and after the training. The results are shown
below.
Subject | A   | B   | C   | D   | E   | F   | G
Before  | 9.4 | 9.5 | 9.6 | 9.6 | 9.4 | 9.6 | 9.6
After   | 9.5 | 9.7 | 9.6 | 9.5 | 9.5 | 9.9 | 9.4
Do the data suggest that the training technique is effective in raising the gymnasts’ scores?
Perform a hypothesis test at the 5% significance level.
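Problem 35 is a paired design (the same gymnasts before and after), so this sketch works with the differences After minus Before and runs a one-sample t-test on them with $H_A: \mu_d > 0$, assuming the differences are nearly Normal. Because the standard library has no t distribution, the helper integrates the t density with Simpson's rule.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule area under the density from 0 to |x|."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

before = [9.4, 9.5, 9.6, 9.6, 9.4, 9.6, 9.6]   # gymnasts A-G
after = [9.5, 9.7, 9.6, 9.5, 9.5, 9.9, 9.4]
diffs = [a - b for a, b in zip(after, before)]  # After minus Before

n = len(diffs)
d_bar, s_d = mean(diffs), stdev(diffs)
t_stat = d_bar / (s_d / math.sqrt(n))           # H0: mean difference = 0
df = n - 1
p_value = 1 - t_cdf(t_stat, df)                 # HA: mean difference > 0
reject_h0 = p_value < 0.05
print(round(d_bar, 4), round(s_d, 4), round(t_stat, 3), round(p_value, 3), reject_h0)
```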
36) A laboratory tested twelve chicken eggs and found that the mean amount of cholesterol was
240 milligrams with $s = 19.8$ milligrams.
a) Construct a 95% confidence interval for the true mean cholesterol content of all such
eggs.
b) Interpret this confidence interval.
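A sketch of the one-sample t-interval for problem 36, assuming the Nearly Normal condition holds for the egg data. The critical value $t^*$ with 11 degrees of freedom is found by bisection on a Simpson's-rule t CDF (the standard library has no t distribution); a t-table gives $t^* \approx 2.201$.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus the Simpson's-rule area under the density from 0 to |x|."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """Solve t_cdf(t*, df) = 1 - (1 - conf)/2 for t* by bisection on [0, 20]."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, ybar, s = 12, 240, 19.8
t_star = t_critical(0.95, n - 1)       # 11 degrees of freedom
me = t_star * s / math.sqrt(n)         # margin of error
ci = (ybar - me, ybar + me)
print(round(t_star, 3), tuple(round(v, 2) for v in ci))
```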
37) Suppose you have obtained a confidence interval for $\mu$, but wish to obtain a greater degree of
precision. Which of the following would result in a narrower confidence interval?
a) Increasing the sample size while keeping the confidence level fixed
b) Decreasing the sample size while keeping the confidence level fixed
c) Increasing the confidence level while keeping the sample size fixed
d) Decreasing the confidence level while keeping the sample size fixed
38) A car insurance company performed a study to determine whether an association exists
between age and the frequency of car accidents. They obtained the following sample data.
Perform a test to see if there is an association between age and frequency of car accidents.
Use $\alpha = 0.05$.
Number of accidents |            Age Group
in past 3 years     | Under 25 | 25-45 | Over 45 | Total
0                   | 74       | 89    | 82      | 245
1                   | 18       | 8     | 12      | 38
More than 1         | 8        | 3     | 6       | 17
Total               | 100      | 100   | 100     | 300
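For problem 38, expected counts under independence are (row total × column total) / grand total, and the test has $(r-1)(c-1)$ degrees of freedom. Since $df = 4$ is even, this sketch uses the closed-form chi-square upper tail $e^{-x/2}\sum_{k=0}^{df/2-1}(x/2)^k/k!$, which does not apply to odd df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail area P(X^2 > x) for chi-square with an EVEN number of df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires even df")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

# Rows: 0, 1, more than 1 accidents; columns: under 25, 25-45, over 45.
observed = [
    [74, 89, 82],
    [18, 8, 12],
    [8, 3, 6],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]
# Expected-counts condition: every expected count should be at least 5.
assert all(e >= 5 for row in expected for e in row)

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3-1)(3-1) = 4
p_value = chi2_sf_even_df(chi2, df)
reject_h0 = p_value < 0.05
print(round(chi2, 4), df, round(p_value, 4), reject_h0)
```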