Download Sample – margin of error

Math 140 Notes and Activity Packet (Word)
Sampling Variability, Sampling Distributions,
Standard Error and Confidence Intervals
Confidence Intervals Act 1
Exploring Sampling Variability for Mean Averages with a Sampling Distribution
The goal of this activity is to explore how well random samples approximate population values.
Normally we do not know population values and we must use a sample value to approximate
the population value. This is called a “point estimate”. For this activity we will look at some
population data from the International Coffee Organization (ICO). We will be using the “Columbian
Mild” price data in U.S. cents per pound. The population mean average price was 134.338
cents per pound. Again, in real data analysis we often do not know the population value, but
for this activity it is useful for comparison purposes.
Open the “Confidence Interval Act 1 Data Set A” in Excel. 81 random samples have been taken
from the Columbian Mild data. All the data sets have 30 coffee prices. Each person in the class
will be finding the mean of a few of these data sets. Once you find the mean, you will put a
magnet up or draw a dot on the board to represent the sample mean you found. When
everyone’s magnets or dots are up on the board, we will have generated a “sampling
distribution”.
Answer the following questions:
1. The population mean was 134.338 cents. How many cents away from the population mean was
the sample mean you calculated? (If you calculated more than one sample mean, answer the
question for each sample mean you calculated.)
2. Look at the dots or magnets on the board. Did all the sample means come out to be the
same as the population mean of 134.338 cents? Why do you think this happened? Aren’t
random samples supposed to be good approximations of the population? What does this tell
you about sampling variability?
3. Normally we may have only one random sample. If all you knew was one of the random
samples on the board, how difficult would it be to determine that the population mean is really
134.338 cents? What does this tell us about the difficulty in determining population values
from 1 random sample?
4. Estimate the shape and center of the sampling distribution on the board. Is the center of the
graph close to the population mean of 134.338? Would the center of the sampling distribution
be a better approximation of the population mean than a single sample mean?
5. The standard deviation of a sampling distribution is often called the “standard error” and is
an important part of inferential statistics. Let’s see if we can estimate the standard deviation of
the sampling distribution on the board.
What is 95% of the 81 total dots on the board? (Round to the ones place.) Find two values that
about 95% of the dots fall in between. How far apart are these two values? The empirical rule
says that if the data set is bell shaped, the middle 95% should be about 4 total standard
deviations. So divide your distance between the values by 4. This should be pretty close to the
actual standard deviation. This is the standard deviation of the sampling distribution which is
called “standard error”.
Picture of Conf Int Act 1 (PDF online only) board work is available.
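The shortcut in question 5 (find the middle 95% of the dots, then divide the distance by 4) can also be sketched in code. This is only an illustration: the 81 "dots" below are made-up, bell-shaped numbers centered at 134.338 with an assumed spread of 5 cents, not the class board data, and the function name is hypothetical.

```python
import random
import statistics

def standard_error_from_middle_95(sample_means):
    """Estimate the standard error as (width of the middle 95% of dots) / 4."""
    ordered = sorted(sample_means)
    n = len(ordered)
    # Trim roughly 2.5% of the dots from each end so about 95% remain.
    trim = round(0.025 * n)
    low, high = ordered[trim], ordered[n - 1 - trim]
    return (high - low) / 4

# Hypothetical board: 81 bell-shaped dots (assumed center 134.338, assumed spread 5).
random.seed(1)
dots = [random.gauss(134.338, 5.0) for _ in range(81)]

estimate = standard_error_from_middle_95(dots)
print("95% of 81 dots is about", round(0.95 * 81), "dots")
print("Empirical Rule estimate of the standard error:", round(estimate, 2))
print("Standard deviation computed directly:", round(statistics.stdev(dots), 2))
```

The two printed values should be in the same neighborhood, which is the point of the divide-by-4 shortcut.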
Confidence Intervals Act 2
Using Technology to Create a Sampling Distribution for Sample Means
Let’s look again at the Columbian Mild coffee data. Remember that the population mean price for the
Columbian mild coffee was 134.338 cents per pound. In activity 1 we wanted to explore how far
random sample means would be from the population value. So we created our own Sampling
distribution for sample means. This was a lot of work. Remember, everyone in our class calculated a
few sample means from samples of size 30. They then put the dots on the board for each sample mean
to create the distribution.
Could we do the same thing, but this time use technology? Definitely.
As we saw above, we want to create a sampling distribution from the Columbian Mild coffee data. Go
to www.matt-teachout.org and click on the “Statistics” tab. Now go to “data sets” and open the coffee
data. Copy the column that says “Columbian mild”.
There are several good programs on the market. We will look at “Stat Key” by the Lock family. Go to
the website www.lock5stat.com . Click on the button that says “StatKey”. Look for the tab that says
“sampling distributions mean” and click on it. Click on the edit data tab. Delete the data set currently
there and paste in the Columbian Mild data. Click on “samples of size n” and put in 30. (Remember our
activity yesterday used samples of size 30.) Turn off the button that says “first column is identifier” as
we have only a single column of data. Now click ok.
You are now ready to create your sampling distribution. It is always best to start slowly by having the
computer take 1 random sample for you. So click on “Generate 1 sample”. Let’s see if we understand
what we are looking at. The computer picked 30 numbers randomly from our list and calculated the
sample mean. The right side at the bottom shows the actual numbers. Then the computer put a dot on
the graph exactly like we did yesterday when we put magnets on the board. Click it a couple more times
and each time note that the computer took a random sample of 30, found the sample mean and put a
dot for the sample mean on the graph. Notice the sample means are different each time. This also
happened yesterday when our class made the distribution.
Now let’s speed up the process. Click on “generate 10 samples” a few times. This calculates 10 random
sample means at a time for us. Let’s go faster. Click on “generate 100 samples” a few times. This
calculates 100 random sample means at a time for us. Notice the computer has already collected more
random samples than our whole class was able to do yesterday. You know what is next. Let’s really
speed this up and get a ton of random samples. Click on “generate 1000 samples” a few times. This
calculates 1000 random sample means at a time!!
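What the "generate samples" buttons do can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of how StatKey itself is programmed: the population below is 500 made-up prices (not the ICO coffee data), and the samples are drawn without replacement.

```python
import random
import statistics

random.seed(2)
# Hypothetical stand-in population of 500 prices in cents per pound (NOT the ICO data).
population = [random.uniform(80, 190) for _ in range(500)]
population_mean = statistics.mean(population)

def sample_means(population, sample_size, how_many):
    """Draw `how_many` random samples and return the mean of each one."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(how_many)
    ]

# Like clicking "generate 1000 samples" once, with samples of size 30.
means = sample_means(population, sample_size=30, how_many=1000)
center = statistics.mean(means)
standard_error = statistics.stdev(means)

print("Population mean:", round(population_mean, 3))
print("Center of the sampling distribution:", round(center, 3))
print("Standard error (standard deviation of the sample means):", round(standard_error, 3))
```

Each entry in `means` plays the role of one dot on the graph.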
You have created a sampling distribution of sample means. Use the sampling distribution to answer
some of the same questions we answered yesterday.
1. Notice not all the dots (sample means) are exactly the same as the population mean 134.338 cents
per pound. Again, what does this tell us about the difficulty of taking one random sample and using it to
estimate the population value? (Remember this is called a “point estimate”.)
2. Compare the sampling distribution from “Stat Key” on the Lock website to the sampling distribution
we did by hand yesterday. Stat Key kept track of how many random samples you took. How many were
there? As the number of sample means (dots) increases, is it easier or harder to determine the shape?
What is the shape of the distribution?
3. Stat Key calculated the mean average of all the sample means and put a pointer at the center. Is
the center close to the population mean of 134.338 cents per pound? As the number of random
samples increases, does the mean of the distribution get more or less accurate as an estimate of the
population value? Why do you think that is?
4. In the last activity, we estimated the standard deviation of the distribution. We said that this has a
special name and is called the “standard error”. Notice that Stat Key calculated the standard deviation
for the distribution. What was the standard error for the distribution?
5. Recall in the last activity that we used the distribution on the board to find an approximate standard
error. Go to the upper left corner of Stat Key and click on where it says “two tails”. Notice it calculated
the two Columbian mild coffee prices that 95% of the sample means fell in between. If we look at the
total distance between these numbers, how many standard errors apart are they? Does this agree with
the Empirical Rule?
Confidence Intervals Act 3
Exploring Sampling Variability for Percentages with a Sampling Distribution
The goal of this activity is to explore how well random sample percentages approximate
population percentages. Normally we do not know population percentage and we must use a
sample percentage to approximate the population percentage. This is called a “point
estimate”. For this activity we will flip a coin 20 times and count the number of tails,
then calculate the sample percentage of tails. Each person will do two sets of 20 flips and therefore
get two sample percentages. Again, in real data analysis we often do not know the population
value, but for this activity it is useful for comparison purposes. Our goal is to see how well
random sample percentages approximate population percentages.
Each person in the class will be finding two sample percentages. Once you find each sample
percent, you will put a magnet up or draw a dot on the board to represent the sample percent
you found. When everyone’s magnets or dots are up on the board, we will have generated a
“sampling distribution” of sample percentages.
Answer the following questions:
1. In a perfect world with a fair coin, what should the population percentage for getting tails
be? So in a sample of 20 how many times do we expect to get tails? In sampling we often do
not get what we expect. How far were the sample percentages you calculated from the
population percentage?
2. Look at the dots or magnets on the board. Did all the sample percentages come out to be
the same as the population percentage? Why do you think this happened? Aren’t random
samples supposed to be good approximations of the population? What does this tell you about
sampling variability?
3. Normally we may have only one random sample. If all you knew was one of the sample
percentages on the board, and you never knew the expected population value, how difficult
would it be to determine what the population percentage really is? What does this tell us
about the difficulty in determining population values from 1 random sample?
4. Estimate the shape and center of the sampling distribution on the board. Is the center of the
graph close to the population percentage of 0.5? Would the center of the sampling distribution
be a better approximation of the population percentage than a single sample percentage?
5. The standard deviation of a sampling distribution is often called the “standard error” and is
an important part of inferential statistics. Let’s see if we can estimate the standard deviation of
the sampling distribution on the board.
What is 95% of the total number of dots on the board? (Round to the ones place.) Find two
values that about 95% of the dots fall in between. How far apart are these two values? The
empirical rule says that if the data set is bell shaped, the middle 95% should be about 4 total
standard deviations. So divide your distance between the values by 4. This should be pretty
close to the actual standard deviation. This is the standard deviation of the sampling
distribution which is called “standard error”.
Picture of Conf Int Act 3 (PDF online only) board work is available.
Confidence Intervals Act 4
Using Technology to Create a Sampling Distribution for Sample Percentages
Let’s look again at the topic of flipping a coin. Remember that the population percentage of tails should
be 0.5 (50%). In activity 3 we wanted to explore how far random sample percentages would be from the
population value. So we created our own Sampling distribution for sample percentages. This was a lot
of work. Remember, everyone in our class flipped the coin 20 times and calculated the sample
percentage of tails. They then put the dots on the board for each sample percentage to create the
distribution.
Could we do the same thing, but this time use technology? Definitely.
As we saw above, we want to create a sampling distribution for sample percentages. As with activity 2,
we can use “Stat Key” by the Lock family. Go to the website www.lock5stat.com . Click on the button
that says “StatKey”. Look for the tab that says “sampling distributions proportion” and click on it. Click
on the edit proportion tab. Change the proportion to 0.5 and then push ok. Yesterday we flipped the
coins 20 times, so click on “samples of size n” and put in 20 and click ok.
You are now ready to have the computer flip some coins and create your sampling distribution. It is
always best to start slowly by having the computer take 1 random sample for you. So click on “Generate
1 sample”. Let’s see if we understand what we are looking at. The computer simulated flipping the coin
20 times and calculated the sample percentage for getting tails. The right side at the bottom shows the
actual flips. Then the computer put a dot on the graph exactly like we did when we put magnets on the
board. Click it a couple more times and each time note that the computer took a random sample of 20,
found the sample percentage and put a dot for the sample percentage on the graph. Notice the sample
percentages are different each time. This also happened yesterday when our class made the
distribution.
Now let’s speed up the process. Click on “generate 10 samples” a few times. This calculates 10 random
sample percentages at a time for us. Let’s go faster. Click on “generate 100 samples” a few times. This
calculates 100 random sample percentages at a time for us. Notice the computer has already collected
more random samples than our whole class was able to do. You know what is next. Let’s really speed
this up and get a ton of random samples. Click on “generate 1000 samples” a few times. This calculates
1000 random sample percentages at a time!
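The coin-flip version of the simulation can be sketched the same way. This is a minimal illustration, not StatKey's actual code; the function name is hypothetical, and the fair coin (0.5) and 20 flips per sample come from the activity.

```python
import random
import statistics

def sample_proportion_of_tails(flips=20, p_tails=0.5):
    """Simulate `flips` coin flips and return the proportion that came up tails."""
    tails = sum(1 for _ in range(flips) if random.random() < p_tails)
    return tails / flips

random.seed(3)
# Like clicking "generate 1000 samples" once: 1000 sets of 20 flips.
proportions = [sample_proportion_of_tails() for _ in range(1000)]

center = statistics.mean(proportions)
standard_error = statistics.stdev(proportions)
print("Center of the sampling distribution:", round(center, 3))
print("Standard error:", round(standard_error, 3))
```

Each value in `proportions` is one dot: a single sample percentage from 20 flips.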
You have created a sampling distribution of sample percentages (proportions). Use the sampling
distribution to answer some of the same questions we answered in activity 3.
1. Notice not all the dots (sample percentages) are exactly the same as the population percentage of 0.5
. Again, what does this tell us about the difficulty of taking one random sample (one dot) and using it to
estimate the population value? (Remember this is called a “point estimate”.)
2. Compare the sampling distribution from “Stat Key” on the Lock website to the sampling distribution
we did by hand on the board. Stat Key kept track of how many random samples you took. How many
were there? As the number of sample percentages (dots) increases, is it easier or harder to determine
the shape? What is the shape of the distribution?
3. Stat Key calculated the mean average of all the sample percentages and put a pointer at the
center. Is the center close to the population value of 0.5? As the number of random samples increases,
does the center of the distribution get more or less accurate as an estimate of the population value?
Why do you think that is?
4. In activity 3, we estimated the standard deviation of the distribution. We said that this has a special
name and is called the “standard error”. Notice that Stat Key calculated the standard deviation for the
distribution. What was the standard error for the distribution?
5. Recall that in activity 3 we used the distribution on the board to find an approximate standard
error. Go to the upper left corner of Stat Key and click on where it says “two tails”. Notice it calculated
the two sample percentages that 95% of the sample percentages fell in between. If we look at the
total distance between these numbers, how many standard errors apart are they? Does this agree with
the Empirical Rule?
Reflection: What is the difference between mathematical reasoning and statistical reasoning? This is a
rather large question. They are in fact different. Number 5 above may give you some insight as to how
they are different. Look at the following question.
What is the probability of getting tails during coin flips?
Mathematician: The exact probability is 0.5 or 50%.
Statistician: The probability could be anywhere from 0.3 (30%) to 0.7 (70%).
Notice the statistician recognizes that when people flip coins they will not always get tails exactly 50% of
the time. Statistical reasoning recognizes the role that random chance plays and that samples will have
a lot of variability.
It is important to say that both mathematical reasoning and statistical reasoning are equally important.
The point is that they are different.
Math 140 Notes: Interpreting Confidence Intervals
Sampling Distributions (lots of random samples) → Center of distribution is a
pretty good estimate of Population Value
One Random Sample → Bad estimate of the population value
How Bad??
Estimate Margin of Error (How far off the sample value is from the population
value)??
If we can estimate the margin of error, we can create something called a
confidence interval.
Confidence Interval: Two values we think the population value is in between.
How to Calculate a Confidence interval:
Sample Value ± Margin of Error
Confidence Level : 90%, 95% and 99% (most common is 95%)
Define “95% confident”: 95% of confidence intervals created contain the
population value and 5% of them don’t contain the population value.
Example 1
IQ tests: Sample mean of 99 and a margin of error of 26.
Create Confidence interval (99 ± 26)
Sample – margin of error < population value < sample + margin of error
99 – 26 < population value < 99 + 26
73 < population value < 125
Sentence to explain this confidence interval?
“We are 95% confident that the population mean average IQ score is between
73 and 125.”
Write Confidence intervals in three ways
1. Sample Value ± margin of error
99 (± 26 error)
2. Inequality notation: 73 < µ < 125
3. Interval notation: ( 73 , 125 )
Example 2
Sample percentage was 0.365 (36.5%)
Margin of Error = 0.049 (4.9%)
Confidence interval: 0.365 ± 0.049
36.5% (± 4.9% error)
0.365 – 0.049 < population percentage < 0.365 + 0.049
0.316 < P < 0.414
( 0.316 , 0.414 )
Sentence to explain interval? We are 95% confident that the population percentage
is in between 31.6% and 41.4%.
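The arithmetic in Examples 1 and 2 can be sketched as a small Python helper. The function names are hypothetical and only illustrate "sample value ± margin of error" written in the three ways listed above.

```python
def confidence_interval(sample_value, margin_of_error):
    """Return (lower limit, upper limit) for sample value ± margin of error."""
    return sample_value - margin_of_error, sample_value + margin_of_error

def three_ways(sample_value, margin_of_error, symbol, digits=3):
    """Write the same interval as ± notation, an inequality, and interval notation."""
    low, high = confidence_interval(sample_value, margin_of_error)
    low, high = round(low, digits), round(high, digits)
    return [
        f"{sample_value} ± {margin_of_error}",
        f"{low} < {symbol} < {high}",
        f"({low}, {high})",
    ]

# Example 1: IQ scores (sample mean 99, margin of error 26).
for line in three_ways(99, 26, "µ"):
    print(line)

# Example 2: sample percentage 0.365 with margin of error 0.049.
for line in three_ways(0.365, 0.049, "P"):
    print(line)
```

Rounding to three decimal places keeps small floating-point leftovers out of the printed limits.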
Note:
StatCrunch often gives the confidence interval without the Margin of Error.
Suppose we know the confidence interval. Can we figure out the sample value
and margin of error? Sure.
The sample value will be at the middle of the interval. The margin of error is the
distance from the middle. Here are two formulas commonly seen in stat books.
Sample Value = ( upper limit + lower limit ) /2
Margin of Error = ( upper limit - lower limit ) /2
Example 3
Suppose we have a 95% confidence interval estimate of the population mean
weight (in kilograms)
(51.7 kg , 63.4 kg)
What is the sample value and margin of error used to make this confidence
interval?
Sample Value = (63.4 + 51.7 ) / 2 = 115.1 / 2 = 57.55 kg
Margin of Error = (63.4 - 51.7 ) / 2 = 11.7 / 2 = 5.85 kg
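The two formulas in the note can be checked with a short Python sketch. The function name is hypothetical; the numbers are the ones from Example 3.

```python
def sample_value_and_margin(lower_limit, upper_limit):
    """Recover the sample value (midpoint) and margin of error (half the width)."""
    sample_value = (upper_limit + lower_limit) / 2
    margin_of_error = (upper_limit - lower_limit) / 2
    return sample_value, margin_of_error

# Example 3: the interval (51.7 kg, 63.4 kg).
sample_value, margin_of_error = sample_value_and_margin(51.7, 63.4)
print(f"Sample value = {sample_value:.2f} kg")
print(f"Margin of error = {margin_of_error:.2f} kg")
```

Adding and subtracting the recovered margin from the recovered sample value gives back the original two limits, which is a quick way to check the work.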
Math 140 Confidence Intervals Activity#5
Interpreting Confidence Intervals
1. The following confidence intervals were given in Magazine articles and Newspapers. For
each interval, identify the sample statistic (p̂ or x̄) and the margin of error. Then write the
confidence interval using inequalities and using interval notation. Now write a sentence
explaining the confidence interval to someone. (Assume these are all using a confidence
level of 95%)
a) “Roughly 39% (± 2.5% error) of the adult population is infected with this disease…”
b) “Men have an average height of 69.2 inches with an error of 1.9 inches.”
c) “The vaccine has proved very effective. Vaccinated children have only a 4.7%
chance of getting infected with an error of 1.2% .”
2. For each of the following confidence intervals, identify the sample statistic (p̂ or x̄) and the
margin of error. Then write a sentence explaining the true meaning of the confidence
interval. Write another sentence explaining the true meaning of the confidence level.
a) A 95% confidence interval estimate of the population proportion of cows having the
disease is (0.731, 0.764).
b) A 99% confidence interval estimate of the population mean number of miles is
13.4 < µ < 17.2
c) A 90% confidence interval estimate of the population proportion of people who will
vote for the Independent party candidate is 0.068 < p < 0.083.
d) A 95% confidence interval estimate of the population mean amount of milk in gallons
is (8.35, 10.21).
Math 140 Confidence Intervals Activity#6
Exploring Z-scores to use in Confidence intervals
Directions: Recall that to compute a 95% confidence interval estimate of a population value, we
use approximately 2 standard deviations from the center. Recall we get the two standard
deviations from the Empirical Rule. But the Empirical rule says that 95% of the data will fall
within “approximately” two standard deviations. But how accurate is 2 standard deviations if it
says “approximately”? Can we find a more accurate answer to the number of standard
deviations from the center that 95% of the data is in between? What about 90% or 99% since
those are also commonly used? These values are Z-scores and are often called “Critical Values”
in Statistics.
StatCrunch Directions: Open a blank page in StatCrunch. Go to the Stat menu, then click on
Calculator, then click on Normal. Now click on the “Between” tab. (Stat → Calculator →
Normal → Between) Z-scores have a mean of 0 and a standard deviation of 1. These will be
the default when you first open the normal calculator. Leave the mean at 0 and the standard
deviation at 1. Make sure the two x-value boxes are empty. In the last box where the probability
goes, put in 0.9 , 0.95 and 0.99 to find the famous critical value Z-scores.
1. Use Statcrunch to find the two Z-scores that corresponds to the middle 95%. Draw a picture
showing the Z-scores and 95%. Remember that the area under the curve between these Z-scores
must be 0.95. Do you remember what mean and standard deviation we use to find Z-scores on
Statcrunch? These Z-scores are pretty famous and are the Z-scores for 95% confidence intervals
(Zc for short).
2. If we use a 99% confidence level instead of 95% do you think the Z-score will be more or less
than + or – 2? Draw a picture and explain why you think so. Now use Statcrunch to find the two
Z-scores that we could use to calculate a 99% confidence interval. How well did your first guess
agree with what we found on Statcrunch?
3. Now repeat #2, but use a 90% confidence interval. If we use a 90% confidence level instead
of 95% do you think the Z-score will be more or less than + or – 2? Draw a picture and explain
why you think so. Now use Statcrunch to find the two Z-scores that we could use to calculate a
90% confidence interval. How well did your first guess agree with what we found on
Statcrunch?
Let’s Summarize the Z-score critical values that we found. These are important to memorize as
we will be using them constantly in inferential statistics.
Confidence Level     Z-score for the confidence interval
90%                  Zc = ± ______
95%                  Zc = ± ______
99%                  Zc = ± ______
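The same critical values can be found outside of StatCrunch. As a sketch, Python's standard library has a `statistics.NormalDist` class whose `inv_cdf` method works like the normal calculator run in reverse: give it an area to the left and it returns the Z-score. The helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Positive Z-score with `confidence_level` of the area between -Z and +Z."""
    # The leftover area is split evenly between the two tails.
    one_tail = (1 - confidence_level) / 2
    # Standard normal curve: mean 0 and standard deviation 1.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - one_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: Zc = ±{critical_z(level):.3f}")
```

Compare the printed values with the ones you found on StatCrunch and wrote in the table.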
Go over Conf Int Notes (PDF online only) on constructing 1 pop mean and
1 pop proportion confidence intervals before doing Conf Int Act 7.
Math 140 Confidence Intervals #7
Constructing Confidence Intervals for 1 population mean
and 1 population proportion (percentage)
Confidence intervals give two values that we think the population value is in between. To
construct a confidence interval, we start with the sample value (point estimate) and then add
and subtract a certain number of standard deviations from the sample value. These standard
deviations are also called standard errors. The number of standard errors is the critical Z-score
corresponding to a certain confidence level. Later, we will see that we can also use the
t-distribution to calculate the number of standard errors, but for now we will just use the
standard normal distribution (z-scores).
Directions: For numbers 1-3, use the formula x̄ ± Zc · s/√n and your calculator to calculate the
confidence interval estimate of the population mean µ.
1. A random sample of 650 high school students has a normal distribution. The sample
mean average ACT exam score was 21 with a 3.2 sample standard deviation. Construct
a 99% confidence interval estimate of the population mean average ACT exam score.
2. A random sample of 200 adults found that they had a sample mean temperature of 98.2
degrees and a standard deviation of 1.8 degrees. Construct a 95% confidence interval
estimate of the population mean body temperature of adults. Does the confidence
interval indicate that normal body temperature could be 98.6 degrees?
3. A random sample of 315 adults found that the sample mean amount of credit card debt was
$435 with a standard deviation of $106. Construct a 90% confidence interval estimate of
the population mean amount of credit card debt.
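The formula for numbers 1-3 (sample mean plus or minus the critical Z-score times s/√n) can be sketched in Python as follows. The function name and the numbers in the example are made up for illustration; they are not one of the exercises.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_for_mean(sample_mean, sample_sd, n, confidence_level=0.95):
    """Confidence interval for a population mean using a critical Z-score."""
    z_c = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sample_sd / sqrt(n)   # s / sqrt(n)
    margin_of_error = z_c * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Made-up numbers: sample mean 50, sample standard deviation 10, sample size 100.
low, high = z_interval_for_mean(sample_mean=50.0, sample_sd=10.0, n=100)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

Changing `confidence_level` to 0.90 or 0.99 swaps in the matching critical value without touching the rest of the calculation.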
pˆ 1  pˆ 
and your calculator to
n
calculate the confidence interval estimate of the population percent p. You may have to use
x
the formula pˆ  to calculate the sample percent p̂ . Also remember to write the sample
n
proportion p̂ as a decimal before plugging into the formula.
Directions: For numbers 4-6, use the formula pˆ  Z c
4. In a random sample of 72 adults in Santa Clarita, CA, each person was asked if they
support the death penalty. 31 adults in the sample said that they do support the death
penalty. What was the sample proportion of adults in Santa Clarita that support the
death penalty? Now calculate a 95% confidence interval estimate of the population proportion of
people in Santa Clarita that support the death penalty. Remember to use the appropriate
critical value Z-score.
5. In a random sample of 400 Americans, each person was asked if they are satisfied with
the amount of vacation time they are given by their employers. 84% of them said that they
were not satisfied with their vacation time. Calculate the following. What was the
sample proportion of Americans that were not satisfied with their vacation time? Now
construct a 99% confidence interval in order to estimate the percent of Americans that
are not satisfied with their vacation time.
6. What percent of eligible Americans vote? In 2008, a random sample of 3000 American
adults that were eligible to vote was taken and we found that 2040 of them voted.
Construct a 90% confidence interval estimate of the population percent of Americans
that vote. Now construct another confidence interval. This time construct a 90%
confidence interval estimate of the population percent of Americans that do not vote.
Hint: For the “do not vote” group, the sample percent will change.
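The proportion formula for numbers 4-6 can be sketched the same way. Again, the function name and the example counts (60 out of 150) are made up for illustration and are not taken from the exercises.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_for_proportion(events, n, confidence_level=0.95):
    """Confidence interval for a population proportion using a critical Z-score."""
    p_hat = events / n                               # sample proportion as a decimal
    z_c = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_c * standard_error
    return p_hat, p_hat - margin_of_error, p_hat + margin_of_error

# Made-up numbers: 60 "yes" answers in a random sample of 150 people.
p_hat, low, high = z_interval_for_proportion(events=60, n=150)
print(f"Sample proportion: {p_hat:.3f}")
print(f"95% confidence interval: ({low:.4f}, {high:.4f})")

# The "no" group has sample proportion 1 - p_hat and the same margin of error.
q_hat, q_low, q_high = z_interval_for_proportion(events=150 - 60, n=150)
print(f"Interval for the other group: ({q_low:.4f}, {q_high:.4f})")
```

The second call shows why the hint in number 6 matters: the sample percent changes for the opposite group, but the standard error does not, because p̂(1 − p̂) is the same either way.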
Go over Conf Int Notes (PDF online only) on constructing
confidence intervals with technology before doing Conf Int Act 8.
Math 140 Confidence Intervals Activity #8
Calculating Confidence Intervals with Statcrunch, Minitab or Statcato
Confidence intervals can be constructed with Statistics software like Statcrunch, Minitab and
Statcato.
Statcrunch: To construct a confidence interval for one proportion, go to “Stat” tab, then
“Proportion stats”, then “One Sample” and “with summary”. Plug in the number of events and
trials and choose “confidence interval”. To construct a confidence interval for the mean with
the z-score standard normal curve, go to “Stat” tab, then “Z-stats”, then “One Sample” and
“with summary”. Plug in the sample mean, sample standard deviation and sample size and
choose “confidence interval”. (Note: We will see later that we may choose to use the t-distribution instead of the z-score).
Statcato: To construct a confidence interval on Statcato, go to the “Statistics” tab, then to
“Confidence Intervals”. Then click on the type of confidence interval you want (1 proportion or 1
mean). Under summary statistics enter the information required.
(#1-7) Directions: Use Statcrunch, Minitab or Statcato to calculate the confidence interval
estimate of the population mean µ. Then write a sentence that explains the meaning of the
confidence interval in the context of the problem.
1. A random sample of 650 high school students has a normal distribution. The sample
mean average ACT exam score was 21 with a 3.2 sample standard deviation. Construct
a 90% confidence interval estimate of the population mean average ACT exam score. Also
construct a 95% confidence interval estimate of the population mean average ACT
exam score.
2. A random sample of 200 adults found that they had a sample mean temperature of 98.2
degrees and a standard deviation of 1.8 degrees. Construct a 95% confidence interval
estimate of the population mean body temperature of adults. Construct a 99%
confidence interval estimate also. Do the confidence intervals indicate that normal
body temperature could be 98.6 degrees?
3. A random sample of 315 adults found that the sample mean amount of credit card debt
was $435 with a standard deviation of $106. Construct a 90% confidence interval
estimate of the population mean amount of credit card debt. Also construct a 95%
confidence interval estimate of the population mean amount of credit card debt.
4. A local Starbucks wants to estimate how long, on average, its customers have to
wait for their drinks to be ready after they have ordered. Random customers were
selected and the staff measured the number of minutes between when the person
ordered and when their drink was ready. Here are the wait times. Find the sample
mean and the sample standard deviation and construct a histogram. What is the shape?
Construct a 95% confidence interval estimate of the average number of minutes
Starbucks customers have to wait for their drink. Write a sentence that interprets your
confidence interval.
2.5 3.1 0.4 4.5 2.3
1.5 3.6 3.9 5.0 4.2
1.7 0.6 2.8 1.4 5.5
2.4 2.8 2.3 3.8 3.1
5. Redwood trees are the tallest plants on Earth. California is famous for its giant
Redwood trees. But just how tall are they? A random sample of 47 California Redwood
trees was taken and their heights measured. (This was not easy by the way.) The
sample mean average height was 248 feet with a standard deviation of 26 feet. Create a
90% confidence interval estimate of the average height of Redwood trees in California.
What does “90% confident” mean?
6. Maria is planning to attend UCLA. She is curious what the average age of UCLA students
is. Since most students that attend UCLA are in their 20’s yet there are also students up
to 70 years old, the population is positively skewed. The college conducted a random
sample of 65 students and found that the sample mean was 29.0 years old with a
standard deviation of 5.2 years. Construct a 95% confidence interval estimate of the
average age of all students at UCLA. Write a sentence that interprets the interval.
7. Mike wants to know the average price of a hamburger. So he randomly selects 30
restaurants and records the price of a regular hamburger. Here are
the prices. Find the sample mean and the sample standard deviation. Construct a
histogram of the data. What is the shape? Construct a 99% confidence interval
estimate of the average price of a hamburger. Write a sentence that interprets the
meaning of the interval. What does “99% confident” mean?
$6.00 $4.28 $4.76 $2.56 $3.81
$2.79 $3.50 $3.96 $4.61 $4.56
$4.30 $3.24 $3.31 $5.21 $3.98
$5.12 $2.03 $3.90 $5.15 $3.06
$5.35 $6.32 $2.07 $3.72 $2.69
$2.12 $4.83 $3.45 $3.31 $3.86
(#8-10) Directions: Use Statcrunch, Minitab or Statcato to calculate the confidence interval
estimate of the population percent p . Then write a sentence that explains the meaning of
the confidence interval in the context of the problem. You may have to use the formula
x = p̂ · n if the number of events is not given.
8. What percent of eligible Americans vote? In 2008, a random sample of 3000 American
adults was taken and we found that 68% of them voted. Construct a 90% confidence
interval estimate of the population percent of Americans that vote. Now construct a
99% confidence interval estimate.
9. In a random sample of 72 adults in Santa Clarita, CA, each person was asked if they
support the death penalty. 31 adults in the sample said that they do support the death
penalty. What was the sample proportion of adults in Santa Clarita that support the
death penalty? What is the standard error for this sampling distribution? Now
calculate a 95% confidence interval population estimate of people in Santa Clarita that
support the death penalty. Then calculate a 90% confidence interval estimate of the
population.
10. In a random sample of 400 Americans, each person was asked if they are satisfied with
the amount of vacation time they are given by their employers. 16% of them said that they
were satisfied and 84% of them said they were not satisfied. Calculate the following.
What was the sample proportion that was satisfied with their vacation time? What was
the standard error for the sampling distribution? Now calculate a 99% confidence
interval population estimate of Americans that are satisfied with their vacation time and
a 99% confidence interval population estimate of Americans that are not satisfied with
their vacation time.
11. Answer the following questions.
a. As the confidence level gets higher, does the confidence interval get narrower
or wider? Why?
b. As the confidence level decreases, what happens to the margin of error? Why?
c. As the sample size increases, what happens to the standard error? Why?
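The intervals in problems 8-10 have the form sample proportion ± (critical z-value) × (standard error). Here is a minimal Python sketch of that calculation, using the counts from problem 9 (31 of 72). It assumes the usual large-sample formula SE = sqrt(p̂(1 − p̂)/n), which may differ slightly from what Statcrunch, Minitab or Statcato do internally. The function name proportion_ci is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample (z) confidence interval for a population proportion."""
    p_hat = successes / n                                # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)                   # standard error
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical z-value
    margin = z_star * se                                 # margin of error
    return p_hat, se, p_hat - margin, p_hat + margin

# Problem 9: 31 of 72 adults said they support the death penalty.
for level in (0.90, 0.95, 0.99):
    p_hat, se, low, high = proportion_ci(31, 72, level)
    print(f"{level:.0%}: p-hat = {p_hat:.3f}, SE = {se:.3f}, "
          f"interval = ({low:.3f}, {high:.3f})")
```

Running it at several confidence levels, or with a different n, is one way to check your reasoning in question 11.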
Confidence Intervals Act 9
Understanding “Confidence”
Revisiting Sampling Distributions for Sample Mean and Sample Percentages
We said that being 95% confident means that 95% of confidence intervals created contain or capture the
true population value and 5% don’t. This is a sentence that students sometimes memorize, but rarely
understand.
The goal of the activity is to understand this with a sampling distribution. Recall that we calculated a
sampling distribution with the help of “statkey” on the website www.lock5stat.com.
Part I (Confidence Intervals with Sample means)
Let’s look again at the Columbian Mild coffee data. Remember that the population mean price for the
Columbian mild coffee was 134.338 cents per pound. In Conf Int Act 1 we wanted to explore how far
random sample means would be from the population value. So we created our own Sampling
distribution for sample means. This was a lot of work. Remember, everyone in our class calculated a
few sample means from samples of size 30. They then put the dots on the board for each sample mean
to create the distribution. In Conf Int Act 2 we saw that we could use technology to create the sampling
distribution. Let’s use the technology again to create lots and lots of samples with StatKey at lock5stat.com.
This time though we are going to create confidence intervals from each sample, so we can learn more
about how confidence intervals work.
As we saw above, we want to create a sampling distribution from the Columbian Mild coffee data. Go
to www.matt-teachout.org and click on the “Statistics” tab. Now go to “data sets” and open the coffee
data. Copy the column that says “Columbian mild”.
Go to the website www.lock5stat.com . Click on the button that says “StatKey”. Look for the tab that
says “sampling distributions mean” and click on it. Click on the edit data tab. Delete the data set
currently there and paste in the Columbian Mild data. Click on “samples of size n” and put in 30.
(Remember Act1 and Act2 used samples of size 30.) Turn off the button that says “first column is
identifier” as we have only a single column of data. Now click ok.
You are now ready to create your sampling distribution. This time we want the computer to create a
confidence interval for each sample it takes. On the right side of the screen, click on the button that
says confidence intervals. StatKey will take a random sample from the population data, find the sample
mean and place a dot for the sample mean in the distribution. It will also create a confidence interval
from that sample mean. Remember that the population mean price for the Columbian mild coffee was
134.338 cents per pound. StatKey will keep track of whether the true population mean is actually
contained in the confidence interval or not. Green means the confidence interval did contain the
population value and red means that the confidence interval did not contain the population value.
It is always best to start slowly by having the computer take 1 random sample for you. So click on
“Generate 1 sample”. Let’s see if we understand what we are looking at. The computer picked 30
numbers randomly from our list and calculated the sample mean. The right side at the bottom shows
the actual numbers. Then the computer put a dot on the graph exactly like we did when we put
magnets on the board. The computer also made a confidence interval from the sample. Click it a couple
more times and each time note that the computer took a random sample of 30, found the sample mean
and put a dot for the sample mean on the graph. Notice the sample means and the confidence intervals
created are different each time.
Now let’s speed up the process. Click on “generate 10 samples” a couple times. This calculates 10
random sample means at a time for us. Let’s go faster. Click on “generate 100 samples” a couple times.
This calculates 100 random sample means at a time for us. Notice that some of the confidence intervals
contain the population value (green) and some don’t (red). Let’s really speed this up and get a ton of
random samples. Click on “generate 1000 samples”. This calculates 1000 random sample means and
1000 confidence intervals. You have created a sampling distribution of sample means and confidence
intervals from each sample.
Answer the following questions about the confidence intervals. Remember green means it contained
the population value and red means it did not contain the population value.
1. Notice the confidence intervals for sample means were different for each random sample. Discuss
the implications of sampling variability on the accuracy of a confidence interval from a random
sample.
2. Did all the confidence intervals contain the population value 134.338 cents per pound? What does
it mean that the interval “contained” or “captured” the population value?
3. How many total random samples did you take? How many of them contained the population
value? What percent of the confidence intervals contained the population value?
4. How many confidence intervals did not contain the population value? What percent of the
confidence intervals did not contain the population value?
5. As the number of random samples increased, did the percentage get closer or farther away from
95%? Why do you think that is?
6. Rewrite the definition of 95% confident and explain it now in your own words using what you have
learned from the sampling distribution for sample means.
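The StatKey experiment above can also be imitated in a few lines of Python. This is only a sketch under stated assumptions: the population below is a made-up, right-skewed stand-in (not the real coffee prices), and each interval is a 95% t-interval built from the sample mean and sample standard deviation, which may not be exactly what StatKey computes. The names interval_from_sample and coverage are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# ASSUMPTION: a made-up, right-skewed stand-in population (not the coffee data).
population = [random.gammavariate(4, 30) for _ in range(5000)]
true_mean = mean(population)

T_STAR = 2.045   # critical t-value for 95% confidence with df = 30 - 1 = 29
N = 30           # sample size used in Act 1 and Act 2

def interval_from_sample(sample):
    """95% t-interval for the mean, built from one sample."""
    margin = T_STAR * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

def coverage(num_samples):
    """Fraction of intervals that capture the true population mean."""
    hits = 0
    for _ in range(num_samples):
        low, high = interval_from_sample(random.sample(population, N))
        if low <= true_mean <= high:
            hits += 1   # this would be a "green" interval
    return hits / num_samples

for k in (10, 100, 1000):
    print(k, "samples:", coverage(k))
```

With only 10 samples the fraction of "green" intervals jumps around a lot from run to run. Compare that with what happens at 1000 samples when you answer question 5.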
Part II (Confidence Intervals with Sample Percentages)
Let’s look again at the topic of flipping a coin. Remember that the population percentage of tails should
be 0.5 (50%). In activity 3 we wanted to explore how far random sample percentages would be from the
population value. So we created our own Sampling distribution for sample percentages. This was a lot
of work. Remember, everyone in our class flipped the coin 20 times and calculated the sample
percentage of tails. They then put the dots on the board for each sample percentage to create the
distribution.
As with the sample means we can use StatKey at www.lock5stat.com to create random samples and find
sample proportions (percentages). This time though we are going to create confidence intervals from
each sample, so we can learn more about how confidence intervals work.
Go to the website www.lock5stat.com . Click on the button that says “StatKey”. Look for the tab that
says “sampling distributions proportion” and click on it. Click on the edit proportion tab. Change the
proportion to 0.5 and then push ok. Yesterday we flipped the coins 20 times, so click on “samples of size
n” and put in 20 and click ok.
You are now ready to have the computer flip some coins and create your sampling distribution. It is
always best to start slowly by having the computer take 1 random sample for you. So click on “Generate
1 sample”. Let’s see if we understand what we are looking at. The computer simulated flipping the coin
20 times and calculated the sample percentage for getting tails. The right side at the bottom shows the
actual flips. Then the computer put a dot on the graph exactly like we did when we put magnets on the
board. On the right side again, click the button that says “confidence intervals”. Click it a couple more
times and each time note that the computer took a random sample of 20, found the sample percentage
and put a dot for the sample percentage on the graph. It also calculated the confidence interval from the
sample percentage. Each time look and see if the interval contains the population percentage of 0.5
(50%). Remember green means it does and red means it does not. Notice the sample percentages and
the confidence intervals are different each time.
As with sample means, let’s speed up the process. Click on “generate 10 samples” a couple times. This
calculates 10 random sample percentages at a time for us. Note how many intervals contain the
population percentage of 0.5 and how many don’t. Let’s go faster. Click on “generate 100 samples” a
couple times. This calculates 100 random sample percentages at a time for us. Let’s really speed this up
and get a ton of random samples. Now click on “generate 1000 samples”. This calculates 1000 random
sample percentages and 1000 confidence intervals.
Answer the following questions about the confidence intervals. Remember green means it contained
the population value and red means it did not contain the population value.
7. Notice the confidence intervals for sample percentages were different for each random sample.
Discuss the implications of sampling variability on the accuracy of a confidence interval from a
random sample.
8. Did all the confidence intervals contain the population value 0.5? What does it mean that the
interval “contained” or “captured” the population value?
9. How many total random samples did you take? How many of them contained the population
value? What percent of the confidence intervals contained the population value?
10. How many confidence intervals did not contain the population value? What percent of the
confidence intervals did not contain the population value?
11. As the number of random samples increased, did the percentage get closer or farther away from
95%? Why do you think that is?
12. Rewrite the definition of “95% confident” and explain it now in your own words using what you
have learned from the sampling distribution for sample percentages.
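For coin flips we do not even need to simulate: there are only 21 possible results for 20 flips (0 tails up to 20 tails), so we can add up the probabilities of the results whose interval captures 0.5. The sketch below does this in Python. It assumes the interval is the large-sample one, p̂ ± z × sqrt(p̂(1 − p̂)/n); whether StatKey uses exactly this formula is an assumption. The function name exact_coverage is made up for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_coverage(n, p, confidence=0.95):
    """Long-run fraction of large-sample intervals that contain p."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    total = 0.0
    for tails in range(n + 1):
        p_hat = tails / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            # probability of getting exactly this many tails in n flips
            total += comb(n, tails) * p**tails * (1 - p)**(n - tails)
    return total

print(round(exact_coverage(20, 0.5), 4))
```

Under these assumptions the long-run percentage comes out close to 95% but not exactly 95%, because a sample of 20 flips can only give a limited set of sample percentages.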
Optional: Go over Conf Int Notes (PDF online only) on the
history of the t-distribution before doing Conf Int Act 10.
Math 140 Confidence Interval Activity 10
Exploring the T-distribution
There is often a noticeable difference between the population standard deviation and a sample
standard deviation, especially with small sample sizes. The t-distribution is a bell-shaped
distribution, similar to the normal curve but with thicker tails, which takes this difference into
account when counting the number of standard deviations. The t-distribution takes into
account the sample size n. We measure the sample size by using degrees of freedom = n-1.
(You can read my article “why n-1” on my website under the articles section for further
information.) Think of t-scores like z-scores but with a different bell-shaped curve depending on the
sample size.
Directions: Use the t-distribution applet on OLI page 190 to answer the following questions.
Recall that the critical “z” values for 90%, 95% and 99% confidence are
$\pm 1.645$, $\pm 1.960$, $\pm 2.576$ respectively.
1. For a sample size of n = 5, find the degrees of freedom. Use the applet and the degrees of
freedom to find the critical t-values for 90%, 95% and 99% confidence. How do they compare
to the critical z-values?
2. For a sample size of n = 17, find the degrees of freedom. Use the applet and the degrees of
freedom to find the critical t-values for 90%, 95% and 99% confidence. How do they compare
to the critical z-values?
3. For a sample size of n = 45, find the degrees of freedom. Use the applet and the degrees of
freedom to find the critical t-values for 90%, 95% and 99% confidence. How do they compare
to the critical z-values?
4. What do you notice about the difference between the t-scores and the z-scores as the
sample size increases?
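If you want to double check the applet, the critical t-values can be approximated in Python without any special statistics library. The sketch below adds up the area under the t-curve numerically (Simpson's rule) and then searches for the t-value that leaves the right amount of area in the middle. This is just one way to do it and is not necessarily how the applet works. The function names central_area and t_critical are made up for this illustration.

```python
import math

def central_area(t, df, steps=2000):
    """Area under the t-curve between -t and +t (Simpson's rule)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * total * h / 3     # double the area from 0 to t

def t_critical(confidence, df):
    """Find t* so that the middle `confidence` of the curve lies within +/- t*."""
    low, high = 0.0, 100.0
    for _ in range(40):          # bisection search
        mid = (low + high) / 2
        if central_area(mid, df) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for n in (5, 17, 45):
    df = n - 1
    values = [round(t_critical(c, df), 3) for c in (0.90, 0.95, 0.99)]
    print(f"n = {n}, df = {df}: critical t for 90%, 95%, 99% = {values}")
```

Compare each row of the output with the critical z-values listed in the directions.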
Confidence Intervals Notes
Checking Assumptions for Confidence Intervals for Means and Proportions
Remember that a lot can go wrong when we use sample data to estimate a population value. There are
several types of bias that can mess up our data and make it not reflect the population. Was it a random
sample or a census? Remember convenience or voluntary response has an inherent sampling bias and
usually will not represent the population well. Was the sample size large enough? A sample of size 5,
even if random, usually will not give us enough data to reflect the population. Are there other sources
of bias (Question Bias, Response Bias, Non-Response Bias, Deliberate Bias) that could mess up our data?
We want to make sure these are addressed as well.
Understandably, if we have done our best to deal with possible sources of bias in our sampling
technique, we will need to check the following assumptions. All assumptions check that the data was
collected correctly and is large enough.
Assumptions for Quantitative Data (Sample Means) (These need to be met in order for the data to have
a chance of estimating the population mean.)
1. Random Sample or Census
2. Is the sample large enough? For sample means we like our random samples to have a sample size of
30 or higher ($n \ge 30$).
Note: If the sample size is less than 30, we will need the data to be nearly
normal (almost bell shaped).
Assumptions for Categorical Data (Sample Percentages) (These need to be met in order for the data to
have a chance of estimating the population percentage.)
1. Random Sample or Census
2. Is the sample Large enough? For sample percentages we like our random samples to have at least 10
successes and at least 10 failures. If we are looking for the percentage of people in Brazil that have type
II diabetes, we will need at least 10 people that have type II diabetes (success) and at least 10 people
that do not have type II diabetes. Note: This does not mean a sample size of 20. If an event is rare, you
may have to collect data on 2000 people before you find 10 successes.
Note about sample size. As long as the data is a random sample or a census, the larger the sample size
(frequency) the better. A random sample of 200 will be a more accurate representation of the
population than a random sample of 30. However, a random sample of 30 is much more accurate than
a voluntary response sample of 10,000!! (Method trumps size)
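The sample size conditions above are simple enough to write as two small Python checks. This is only a sketch: the function names are made up, the example numbers are hypothetical, and no code can tell you whether the data came from a random sample or whether there is bias. Those still have to be judged from how the data was collected.

```python
def mean_size_ok(n, nearly_normal=False):
    """Size condition for a mean: n >= 30, or a smaller n with nearly normal data."""
    return n >= 30 or nearly_normal

def proportion_size_ok(successes, n):
    """Size condition for a percentage: at least 10 successes and 10 failures."""
    failures = n - successes
    return successes >= 10 and failures >= 10

# Hypothetical examples (not taken from the activity problems):
print(mean_size_ok(45))                      # 45 is at least 30
print(mean_size_ok(12))                      # small sample, shape unknown
print(mean_size_ok(12, nearly_normal=True))  # small sample but bell shaped
print(proportion_size_ok(7, 2000))           # rare event: only 7 successes
print(proportion_size_ok(40, 2000))          # 40 successes and 1960 failures
```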
Math 140 Conf Int Act 11
Checking Assumptions for Confidence Intervals
If we want to say something about a population, we need our data to be collected correctly (random or
census) and the size of the data set needs to be large enough. Remember the “garbage” principle.
Garbage data is data that has sources of bias or is not large enough. It cannot help us find population
values as it does not represent the population.
Directions Part I: Look at the following problems and check to see if they meet the assumptions for
estimating a population mean. Also look out for possible sources of bias that might make our data
not representative of the population. If you feel the data meets all the assumptions, then use
Statcrunch to create a 95% confidence interval estimate of the mean. Remember to use the
T-distribution. If it does not meet all the assumptions or has an obvious source of bias, do not make
the confidence interval.
Here are the assumptions that we like to check before making a confidence interval estimate of a
population mean.
Assumptions for Quantitative Data (Sample Means) (These need to be met in order for the data to have
a chance of estimating the population mean.)
• Random Sample or Census
• Is the sample large enough? For sample means we like our random samples to have a sample
size of 30 or higher ($n \ge 30$). Note: If the sample size is less than 30, we will need the data to
be nearly normal (almost bell shaped).
1. A company wants to know the mean average amount of alcohol drunk by people vacationing in Las
Vegas per day. They posted a survey on Facebook asking how much people drink when on vacation in
Las Vegas. 8,355 people responded and listed how much they drink. The sample mean was 3.5 drinks
per day with a standard deviation of 1.2 drinks.
2. Jimmy works for a company that designs new homes. The company just moved to Oklahoma City
and his boss wants him to find out the average price of all homes in Oklahoma City. Jimmy had a
computer randomly select 28 homes. He then found out the price paid for each home. A histogram of
the home prices showed a bell shaped distribution. The sample mean average price of the 28 homes was
$89650 with a standard deviation of $7625.
3. Rick works for a sports equipment manufacturing company. He wants to know how much the
average customer spends at his stores. So he went to the store by his house and kept track of how
much each customer spent. There were a total of 45 customers and the histogram of the data showed a
skewed right distribution. The mean average price of the sample was $71 with a standard deviation of
$19.
4. Rachael works for a company that manages apartment buildings in Northridge, CA. She wants to
determine the mean average price of a 1 bedroom apartment in Northridge. To determine this, she
goes to every apartment building listed in Northridge and asks the managers the average price of all
their apartments. Her sample size was 213 and her sample mean average was $1325. The standard
deviation was $185.
5. Mike is trying to take an opinion poll to find out the average amount of money in thousands of
dollars that people in Los Angeles would pay per year in order to have an NFL football team. He
randomly selects 3 streets in Los Angeles and asks every person living on those streets. His sample size
was 63 people, but the histogram was drastically skewed right. The mean average amount of money
was 1.3 (thousand dollars) with a standard deviation of 0.6 (thousand dollars).
Directions Part II: Look at the following problems and check to see if they meet the assumptions for
estimating a population percentage. Also look out for possible sources of bias that might make our
data not representative of the population. If you feel the data meets all the assumptions, then use
Statcrunch to create a 95% confidence interval estimate of the population percentage. If it does not
meet all the assumptions or has an obvious source of bias, do not make the confidence interval.
Assumptions for Categorical Data (Sample Percentages) (These need to be met in order for the data to
have a chance of estimating the population percentage.)
• Random Sample or Census
• Is the sample large enough? For sample percentages we like our random samples to have at
least 10 successes and at least 10 failures.
• Individuals in the data should be independent of each other.
6. Marsha works for the Republican Party and is asked to estimate what percentage of people in
Sacramento will vote for the republican candidate in the next election. She has a computer randomly
pick phone numbers with a Sacramento area code. She then calls the phone numbers and asks people if
they would vote for the republican candidate. She spoke with a total of 123 people and 37 said they
would vote for the republican candidate.
7. A health organization is doing a study on smoking tobacco and is trying to determine what
percentage of tobacco smokers under 25 years old use a pipe. They had people hang out in stores that
sell tobacco and pipes and counted how many total people under 25 years old came to the store and
how many of them used a pipe. They found that out of the total of 79 people under 25 years of age, 8 of
them used a pipe.
8. The COC Admissions department wants to see what percentage of students would be in favor of
using a new program to register for classes. They put a link on their website so that any students that
want to try out the program can. The students can then take a survey and say how well they like the
new system. 247 students tried the system and 112 of them said they liked the new system.
9. Michelle, a teacher at Valencia High, wants to see what percentage of students at Valencia High
school will be attending COC. She gives the students in her English 1 class a questionnaire to fill out that
asks where they will be attending college. Of the 34 students in her class, 26 are planning on attending
COC.
Math 140 Confidence Intervals Act 12
Central Limit Theorem
Directions: Watch the videos on the central limit theorem in the OLI book and answer the following
questions. Page numbers may vary. The videos can be found in Mod 27 on the pages that say the
following: “Distribution of Sample Means (1 of 4)” and “Distribution of Sample Means (3 of 4)”.
1. The videos refer to the term “The distribution of sample means”. Explain that phrase to a person
that does not know statistics.
2. If the original population is normally distributed, will random sample means and random sample
percentages have a distribution that is also normally distributed? Explain.
3. If the original population is skewed, will random sample means and random sample percentages have
a distribution that is also normally distributed? If not, what sample size will ensure that the sampling
distribution is normal? How does this relate to the assumptions we check when trying to estimate a
population value? Explain
4. State the Central Limit Theorem (One of the most important theorems in statistics.)
5. What are some of the consequences of the central limit theorem and how does it relate to the
assumptions we check when trying to estimate a population value?
Go over Conf Int Notes (PDF online only) on Central Limit Theorem
after doing Conf Int Act 12 to make sure students have right answers.
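The idea in the videos can also be seen with a small Python simulation. This is a sketch under stated assumptions: an exponential population (strongly skewed right) stands in for "skewed data", and skewness is measured with the average cubed z-score, which is near 0 for a symmetric bell shape. The function names skewness and sample_means are made up for this illustration.

```python
import random
from statistics import mean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

def skewness(values):
    """Simple skewness measure: average cubed z-score (near 0 if symmetric)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def sample_means(n, how_many=2000):
    """Means of `how_many` random samples of size n from a skewed population."""
    # ASSUMPTION: an exponential population with mean 1 plays the skewed population.
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(how_many)]

for n in (2, 10, 30):
    print(f"n = {n}: skewness of the sample means = {skewness(sample_means(n)):.2f}")
```

Watch how the skewness of the sample means changes as the sample size goes from 2 to 30, and relate that to the "30 or higher" assumption.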
Confidence Intervals Act 13
Exploring Sampling Variability for 2 Population Means or
2 Population Percentages with a Sampling Distribution
Part I – Columbian coffee versus Brazilian coffee: Difference between two sample means
Our goal is to understand how random samples can estimate the population mean difference
between two groups. To explore this topic we will look at random sample means from two
populations (Columbian mild and Brazilian coffee price data).
Normally we never know the difference between two population means, but for this exercise
we have the population census data. The population mean for the Columbian mild coffee
was 134.338 cents per pound and the population mean for the Brazilian natural coffee was
111.617 cents per pound. If we subtract the Brazilian population mean from the Columbian
population mean we get 134.338 – 111.617 = +22.721 cents per pound. Our goal is to
compare random sample mean differences with the population mean difference of +22.721
cents per pound.
Open the Sampling Distribution II data on the statistics / data sets page on
www.teachoutcoc.org . We have taken 60 random samples from the Columbian coffee data
and 60 random samples from the Brazilian coffee data. Each person will compute 2 sample
mean differences. For example, the first person in class will calculate the sample mean from
“Columbian 1” and the sample mean from “Brazilian 1”. Then subtract Columbian mean –
Brazilian mean. Repeat for “Columbian 2” and “Brazilian 2”. Etc. Once everyone in class has
calculated two sample mean differences, go up to the board and put up magnets for each of
your sample mean differences.
Answer the following questions.
1. Describe how the sampling data was collected. Were we comparing means from two
independent groups or was it matched pair? How do you know? Suppose that instead of
randomly selecting 30 prices from the Columbian data and 30 prices from the Brazilian data, we
had the computer randomly select months and then subtracted the prices in those months.
Would that be independent groups or matched pairs? Explain.
2. What were your two sample means? Was your Columbian sample mean or the Brazilian
sample mean more expensive? How much more expensive? When you subtracted
Columbian – Brazilian, did you get a positive or negative value? Remember, the population
difference when we subtract Columbian minus Brazilian is +22.721 . How far was your sample
difference from +22.721? What does this tell us about sampling variability when trying to
estimate a population difference? (Repeat these questions twice for each sample difference
calculated.)
3. In general, what does a positive difference tell us? What does a negative difference tell us?
If the difference came out to be exactly zero, what would that tell us?
4. Now look at the sampling distribution on the board. Each person in class has put up two
magnets (or drawn two dots). (Remember, the population difference when we subtract
Columbian minus Brazilian is +22.721) . Were all the sample differences the same or was there
a lot of variability in sample mean differences? Were all the sample differences exactly the
same as the population value 22.721 or was there a lot of variability from the actual population
mean difference? Discuss the implications on the difficulty of finding an unknown population
mean difference when all you have is one sample from each of your two groups.
5. What is 95% of the 60 dots on the board? Find two values that 95% of the dots fall in
between.
6. Find the shape of the sampling distribution and estimate the center and standard deviation
of the sampling distribution. This is “Standard Error”. (Hint: If you look at the two numbers in
question #5, find the difference and divide by 4 to find an approximate standard deviation.)
7. Does your sample data meet the assumptions for estimating the difference between two
population means? Explain how.
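The board work in Part I can be imitated in Python to see how the "divide by 4" hint compares with the actual standard deviation of the dots. This is a sketch under stated assumptions: the two population means come from the activity, but the populations are taken to be normal with a standard deviation of 25 cents, which is a made-up value for illustration and not from the coffee data. It also uses 1000 differences instead of 60 so the pattern is easier to see.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(3)  # fixed seed so the run is repeatable

# Population means from the activity; the spread is an ASSUMPTION for illustration.
MU_COLUMBIAN, MU_BRAZILIAN = 134.338, 111.617
ASSUMED_SD = 25.0
N = 30

def one_difference():
    """Columbian sample mean minus Brazilian sample mean for one pair of samples."""
    col = [random.gauss(MU_COLUMBIAN, ASSUMED_SD) for _ in range(N)]
    bra = [random.gauss(MU_BRAZILIAN, ASSUMED_SD) for _ in range(N)]
    return mean(col) - mean(bra)

diffs = sorted(one_difference() for _ in range(1000))

# Middle 95% of the dots: cut off 2.5% (25 of 1000) at each end.
low, high = diffs[25], diffs[974]
se_from_rule = (high - low) / 4   # the "divide by 4" hint
se_direct = stdev(diffs)          # standard deviation of all the dots
se_formula = sqrt(ASSUMED_SD**2 / N + ASSUMED_SD**2 / N)  # two-sample standard error

print(f"center = {mean(diffs):.2f}")
print(f"middle 95%: {low:.2f} to {high:.2f}")
print(f"SE: rule = {se_from_rule:.2f}, direct = {se_direct:.2f}, formula = {se_formula:.2f}")
```

The three standard error values should land close to each other, and the center should land close to the population difference of +22.721.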
Part II – Flipping a coin with the Left hand versus Right hand: Difference between two sample
percentages.
Our goal is to understand how random samples can estimate the population percentage
(proportion) difference between two groups. To explore this topic we will look at random
sample percentages from two populations, coin flips with the left hand versus coin flips with
the right hand.
Normally we never know the difference between two population percentages, but for this
exercise we do. The population probability of getting tails with the left hand is 0.5 and the
population probability of getting tails with the right hand is also 0.5. Hence the population
percentage difference is 0.
Just because the population difference is zero does not mean that two random sample
percentages will always give zero when subtracted. Flip the coin 20 times with the right hand
counting how many times you got tails. Calculate the sample percentage of getting tails with
right handed coin tosses. (Leave your answer in decimal form.) Now flip the coin 20 times
with the left hand counting how many times you got tails. Calculate the sample percentage
of getting tails with left handed coin tosses. (Leave your answer in decimal form.) Now
subtract your right handed sample percent minus your left handed sample percent. Put your
difference on the board with a magnet or drawn dot. Repeat this process twice so that you
will calculate two sample percent differences.
Answer the following questions.
8. Describe how the sampling data was collected.
9. What were your two sample percentages? Was your right handed sample percent or your
left handed sample percent greater? How much more? When you subtracted
right handed sample percent minus left handed sample percent, did you get a positive value, a
negative value or zero? Remember, the population difference was zero. How far was your
sample difference from zero? What does this tell us about sampling variability when trying to
estimate a population difference? (Repeat these questions twice for each sample difference
calculated.)
10. In general, what does a positive difference tell us? What does a negative difference tell us?
If the difference came out to be exactly zero, what would that tell us?
11. Now look at the sampling distribution on the board. Each person in class has put up two
magnets (or drawn two dots). (Remember, the population difference should be zero.) Were all
the sample differences the same or was there a lot of variability in sample percent differences?
Were all the sample differences exactly the same as the population value of zero or was there a
lot of variability from the actual population percent difference? Discuss the implications on the
difficulty of finding an unknown population percentage difference when all you have is one
sample from each of your two groups.
12. Count the dots on the board. What is 95% of the number of dots on the board? Find two
values that 95% of the dots fall in between.
13. Find the shape of the sampling distribution and estimate the center and standard deviation
of the sampling distribution. This is “Standard Error”. (Hint: If you look at the two numbers in
question number 12, find the difference and divide by 4 to find an approximate standard
deviation.)
14. Does your sample data meet the assumptions for estimating the difference between two
population percentages? Explain how.
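The coin flipping in Part II is easy to hand over to the computer. The Python sketch below flips a fair coin 20 times for the "right hand" and 20 times for the "left hand", subtracts the two sample proportions, and repeats this 2000 times. It assumes both hands really do give tails with probability 0.5; the function name tails_proportion is made up for this illustration.

```python
import random
from statistics import mean, stdev

random.seed(4)  # fixed seed so the run is repeatable
FLIPS = 20

def tails_proportion():
    """Flip a fair coin FLIPS times and return the sample proportion of tails."""
    return sum(random.random() < 0.5 for _ in range(FLIPS)) / FLIPS

# right-hand proportion minus left-hand proportion, repeated many times
diffs = [tails_proportion() - tails_proportion() for _ in range(2000)]

exactly_zero = sum(abs(d) < 1e-9 for d in diffs) / len(diffs)
print(f"center = {mean(diffs):.3f}")
print(f"standard error (sd of the differences) = {stdev(diffs):.3f}")
print(f"share of differences equal to zero = {exactly_zero:.3f}")
```

Notice that the center of the differences is near zero even though most individual sample differences are not exactly zero.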
Two pictures of Conf Int Act 13 (PDF online only) board work are available.
Go over Conf Int Notes (PDF online only) on two population
confidence intervals to provide closure for Act 13 and to set up Conf Int Act 14
Math 140 Confidence Intervals Activity #14
Confidence Intervals for Difference between 2 Means
or 2 Percentages with Statcrunch, Minitab or Statcato
Confidence intervals can be constructed with Statistics software like Statcrunch, Minitab and
Statcato.
Statcrunch: To construct a confidence interval for two proportions, go to “Stat” tab, then
“Proportion stats”, then “Two Sample” and “with summary”. Plug in the number of events and
trials and choose “confidence interval”. To construct a confidence interval for the mean with
the T-score, go to “Stat” tab, then “T-stats”, then “Two Sample” (if independent) or “paired”(if
paired data) and “with summary”(if you have summary stats) or “with Data” (if you have the
raw data). If you have summary stats, you will also need to plug in the sample means, sample
standard deviations and sample sizes and choose “confidence interval”. (Note: We will see
later that we may choose to use the t-distribution instead of the z-score).
Statcato: To construct a confidence interval on Statcato, go to the “Statistics” tab, then to
“Confidence Intervals”. Then click on the type of confidence interval you want (2 proportion or 2
mean or paired). Under summary statistics enter the information required.
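To see roughly what the software is doing for two proportions, here is a minimal Python sketch. It assumes the usual large-sample formula, (p̂1 − p̂2) ± z × sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), which may not match every program's output to the last decimal. The function name two_proportion_ci is made up for this illustration, and the counts are the ones given in problem 2 of this activity.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample interval for (population proportion 1) - (population proportion 2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error of the difference
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # critical z-value
    diff = p1 - p2
    return diff, diff - z_star * se, diff + z_star * se

# Counts from problem 2: men 146 of 857, women 137 of 794.
for level in (0.90, 0.95):
    diff, low, high = two_proportion_ci(146, 857, 137, 794, level)
    print(f"{level:.0%}: difference = {diff:+.4f}, interval = ({low:+.4f}, {high:+.4f})")
```

When you read the output, check whether zero falls inside the interval. An interval that includes zero means a population difference of zero is plausible.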
Directions: For each problem, find your confidence intervals with Statcrunch, Minitab or
Statcato and answer the following questions.
a) Does the data meet the assumptions for inference with two population proportions or
two population means? If it is two means, are the groups independent or matched
pair? List the assumptions needed and how the problem meets them or does not
meet them.
b) Find the difference between the sample means or the sample percentages. Does it
look like a significant difference? (Make a guess.)
c) Use Statcrunch, Minitab or Statcato and the given confidence level to calculate a
confidence interval estimate of the difference between the population proportions or means.
d) Was there any significant difference between the two populations? How do you
know? If so which population has a larger percent? Use the confidence interval to
estimate how much larger it is?
e) Write a sentence explaining the confidence interval. Should be similar to part (c).
1. The ACT exam is used by many colleges to test the readiness of high school students for
college. Many high school students are now taking ACT prep classes. A local high school
offers an ACT prep class, but wants to know if it really helps. Twenty-eight students were
randomly selected. They took the ACT exam before and after taking the ACT prep class.
For each student the difference between the after and before scores was measured (d =
after – before). The mean of the differences was 5.8 with a standard deviation of 4.3 . A
histogram of the differences yielded a bell shaped distribution. Were the two samples
independent or were they matched pairs? Construct a 90% confidence interval estimate of
the difference between the ACT scores after and before the ACT prep class. Write a
sentence interpreting the interval.
2. A question was asked in a tattoo magazine whether a man or a woman is more likely to
have a tattoo. A random sample of 857 men found that 146 of them had at least one
tattoo. A random sample of 794 women found that 137 of them had at least one tattoo.
(Create two confidence intervals using a 90% and 95% confidence level.)
3. Cotinine is an alkaloid found in tobacco and is used as a biomarker for exposure to
cigarette smoke. It is especially useful in examining a person’s exposure to second hand
smoke. A random sample of 90 non-smoking American adults was collected. These adults
were not smokers and did not live with any smokers. The average cotinine level for this
sample was 7.2 ng/mL with a standard deviation of 5.8 ng/mL. A second sample of 85
non-smoking American adults was then collected. These adults did not smoke themselves, but
did live with one or more smokers. The average cotinine level for this sample was 28.5 ng/mL
and had a standard deviation of 11.4 ng/mL. Were the two samples independent or were they
matched pairs? Construct a 95% confidence interval estimate of the difference between
the cotinine levels of those that live with smokers and those that do not live with
smokers. What does this tell us about the effects of second hand smoke?
4. A body mass index of 20-25 indicates that a person is of normal weight. A random sample
of 745 women and 760 men found that 198 of the women and 273 of the men had a
normal BMI score. But is there a significant difference between the percent of individuals
with a normal BMI for men and women? (Create two confidence intervals using a 95% and
99% confidence level.)
5. Open the Male Health Data set. Copy and paste the systolic and diastolic blood pressure
columns into Statcato. This is data from 40 randomly selected men throughout the U.S.
We want to explore the relationship between a man’s systolic blood pressure and his
diastolic blood pressure. Were the two samples independent or were they matched pairs?
Construct a 98% confidence interval estimate of the difference between the systolic and
diastolic blood pressure. Write a sentence interpreting the confidence interval. Which is
greater, diastolic blood pressure or systolic blood pressure?
6. A new medicine has been developed that treats high cholesterol. An experiment was
conducted and adults were randomly selected into two groups. The groups had similar
gender, ages, exercise patterns and diet. Of the 420 adults in the placebo group, 38 of
them showed a decrease in cholesterol. Of the 410 adults in the treatment group, 49 of
them showed a decrease in cholesterol. Was the medicine effective in lowering
cholesterol? (Create two confidence intervals using a 97% and 99% confidence level.)
7. Now open the Male and Female Health Data set. Copy and paste the Male Cholesterol and
Female Cholesterol levels into Statcato. (You may want to rename the columns to help
distinguish between the two.) Were the two samples independent or were they matched
pairs? Construct a 95% confidence interval estimate of the difference between men’s
cholesterol and women’s cholesterol. Write a sentence interpreting the confidence
interval. Are men and women’s cholesterol levels about the same, or is one greater than
the other?
8. In March 2003, a research group asked 2400 randomly selected Americans whether they
believe that the U.S. made the right or wrong decision to use military force in Iraq? Of the
2400 adults, 1862 said that they believed that the U.S. did make the correct decision. In
February 2008, the question was asked again to 2180 randomly selected Americans and
684 of them said that the U.S. did make the correct decision. Has the proportion of
Americans changed between 2003 and 2008? If so, how much? (Create two confidence
intervals using a 90% and 95% confidence level.)
9. Wrap up question: When you go from a higher confidence level to a lower confidence
level, were you likely to change your answer as to whether the two populations had a
significant difference? What about if you go from a lower confidence to a higher
confidence? Do you think that is always the case? Why?
Go over Conf Int Review Notes (PDF online only) before doing
the Conf Int Review Sheet. Review Sheet answers included.
Confidence Intervals and Sampling Distributions
Review Sheet with Answers
1. Write a paragraph discussing the topic of sampling variability. Make sure to address the
following questions. Will random samples always give the same sample means and sample
percentages (proportions)? Will random sample means and random sample percentages
always be the same as the population values? How difficult is it to estimate a population value
from 1 random sample? What does this tell us about the accuracy of most population claims in
the news and online?
2. Define the following terms
a) Sampling Distribution
b) Standard Error
c) Margin of Error
d) 95% confident
e) Confidence Interval
3. We spent a lot of time thinking about and working with sampling distributions. What is a
sampling distribution and why is it better than just looking at a single random data set? What
can the shape, center and spread of a sampling distribution tell us? What is the difference
between standard error and standard deviation? How can we calculate standard error from the
sampling distribution?
4. State the Central Limit Theorem and discuss its many implications.
5. For each of the following, identify the sample value and the margin of error. Then calculate
the confidence interval and write a sentence explaining the interval to someone in context.
These all came from large, random samples.
a) Weights of a small breed of dog: 6.8 pounds ± 1.7 pounds error
b) Percentage of seniors worldwide with Alzheimer’s Disease: 11.2% ± 1.3% error
c) Difference between the heights of women and men worldwide (women – men)
-5.3 inches ± 1.7 inches
d) Difference between the percentage of people with high blood pressure in Bulgaria
and the percentage of people with high blood pressure in Finland.
(Bulgaria percent – Finland percent)
48.9% ± 2.6% error
6. For each of the following confidence intervals, write a sentence explaining the interval in
context. Then use the formulas below to calculate the sample value and the margin of error. Is
the sample value a mean, a percentage, a difference between two means, or the difference
between two percentages?
Sample Value = (Upper Limit + Lower Limit) / 2
Margin of Error = (Upper Limit – Lower Limit) / 2
a) 95% confidence interval estimate of price of gas in dollars (U.S.A. November 2015).
( $2.16 , $2.24 )
b) The population percentage chance of getting Lyme Disease if bitten by a tick:
( 1.2% , 1.5% )
c) Difference between the amount of hemoglobin per deciliter (g/dL) of blood in men
and in women worldwide (men – women)
( 1.6 grams , 2.0 grams )
(Note: men is population 1 and women is population 2.)
d) In track and field, the population difference between men’s world speed record
times and women’s world speed record times in percentage (men – women) is
( -10.35% , -9.65% ). (Note: men is population 1 and women is
population 2.)
7. As a confidence level increases, does the margin of error increase or decrease? Explain why.
As a confidence level increases, does the confidence interval get wider or narrower? Explain
why. As the sample size increases does the standard error decrease or increase? Explain why.
8. Check the assumptions in order to determine whether or not we can estimate a population
value from the sample data. If the problem does not meet the assumptions, explain why. If the
problem does meet the assumptions, explain how and then use Statcrunch to calculate a 95%
confidence interval estimate from the data. Then write a sentence to explain the confidence
interval.
a) We wish to know the average salary for all K-12 teachers in the U.S. We took a random
sample of teachers across the U.S. and found the following data. A histogram showed a skewed
right shape.
Sample Size = 75
Sample Mean = $44,666
Sample Standard Deviation = $7,362
b) We wish to know what percent of people in Los Angeles have AB negative blood type. We
advertised on a TV commercial for people to come and donate blood for free ice cream. 325
people came to give blood, but only 4 had AB negative blood.
c) A hospital wants to know what percent of all of their patients are satisfied with their care
while at the hospital. They had a computer randomly select Hospital ID numbers. Those
patients selected were given a survey to fill out. Of the 115 patients selected, 87 said they were
satisfied with their care.
d) Jimmy is a skate board designer. He wants to know if the percentage of elementary and
junior high school students that like his skate boards is greater than the percentage of high
school students that like his skate boards. To find out, Jimmy brings some of his boards over to
the elementary, junior high, and high schools near his house. He then asks kids after school if
they like his skate boards. Of the 135 elementary and junior high school students that Jimmy
asked, 102 said they liked the skate boards. Of the 87 high school students that Jimmy asked,
44 said they like the skate boards.
e) Use the health data on the website. This data describes the health statistics for 40 randomly
selected men and 40 randomly selected women. We want to use the data to estimate the
difference between the average BMI (body mass index) for women and the average BMI for
men. Is there a significant difference between women and men?
Review Sheet Answers - Confidence Interval/Sampling Distribution
1. Sampling variability implies that when we take different random samples we get
different means and percentages. They do not come out the same. Sampling variability
also implies that a random sample mean or percent will usually not be the same as the
population mean or population percent. In fact we can calculate the margin of error, i.e.
how far the sample value could be from the population value. Adding the margin of error
to the sample value and subtracting it from the sample value gives us a confidence
interval. It is almost impossible to pin down a population value exactly from a single
sample. The news is often very inaccurate when they use a sample value and tell the
public it is a population value.
2. Sampling Distribution : Take a lot of random samples and calculate the mean or percent
from each sample. We then make a graph of all of the thousands of sample means or
sample percentages. (We can analyze the shape, center and spread of the distribution
to better understand the population.)
Standard Error: Standard deviation of the sampling distribution. Not the standard
deviation of a single data set, but the standard deviation of the sample values for
thousands of data sets.
Margin of Error: How far we think one sample value could be from the population
value. Margin of error is calculated by multiplying a z-score or t-score times the
standard error.
95% confidence: 95% of confidence intervals created contain the population value and
5% of them don’t.
Confidence Interval: two numbers that we think the population value is in between.
“We are 95% confident that the population value is between ## and ##.”
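The meaning of "95% confident" can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions: the population is a made-up bell shaped population (mean 100, standard deviation 15), the population standard deviation is treated as known, and the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage_rate(pop_mean, pop_sd, n, trials, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the population mean."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # margin of error = z-score times the standard error
    margin = z_star * pop_sd / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - margin <= pop_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


# Made-up population values for illustration only
rate = coverage_rate(pop_mean=100, pop_sd=15, n=30, trials=2000)
print(f"Share of intervals containing the population mean: {rate:.3f}")
```

The share printed should land close to 0.95, which matches the idea that about 95% of the intervals created contain the population value and about 5% do not.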
3. A sampling distribution is when we take a lot of random samples and calculate the mean
or percent from each sample. We then make a graph of all of the thousands of sample
means or sample percentages. (We can analyze the shape, center and spread of the
distribution to better understand the population.) Standard deviation is the variability
in one data set. Standard Error is a measure of variability for the sampling distribution
(thousands of data sets). Standard Error measures the variability in sample means or
sample percentages. To estimate the standard error from a sampling distribution, find the
two values that the middle 95% of the dots fall in between. Dividing the distance between
those two values by 4 gives an approximate standard error, because the middle 95% of a
bell shaped distribution spans about 2 standard errors on each side of the center. The
center of the distribution is very close to the population value. Using technology, you can
also have the computer find the standard deviation (standard error) of the sampling
distribution directly, which is more accurate.
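Both ways of getting a standard error from a sampling distribution can be tried in Python. This is a rough sketch, not the class activity itself: the population here is simulated (a made-up bell shaped population with mean 50 and standard deviation 12), and the sample size of 36 and the 2,000 samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)

# Hypothetical population: 10,000 values from a bell shaped distribution
population = [rng.gauss(50, 12) for _ in range(10_000)]

# Build a sampling distribution: 2,000 random samples of size 36, one mean each
sample_means = sorted(mean(rng.sample(population, 36)) for _ in range(2000))

# Method 1: find the middle 95% of the sample means and divide the width by 4
lower = sample_means[int(0.025 * len(sample_means))]
upper = sample_means[int(0.975 * len(sample_means)) - 1]
approx_se = (upper - lower) / 4

# Method 2: standard deviation of the sample means, computed directly
direct_se = stdev(sample_means)

print(f"middle 95% of sample means: ({lower:.2f}, {upper:.2f})")
print(f"approximate standard error (width / 4): {approx_se:.2f}")
print(f"direct standard error: {direct_se:.2f}")
print(f"center of sampling distribution: {mean(sample_means):.2f}")
print(f"population mean: {mean(population):.2f}")
```

The two standard errors should be close to each other, and the center of the sampling distribution should be close to the population mean.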
4. CLT: Central Limit Theorem : If the samples are large enough, the distribution of sample
means or sample percentages will be approximately normal even if the population is
skewed. If the original population is close to bell shaped (nearly normal), then any size
sample will give means that will also be nearly normal. For a distribution of means to be
bell shaped we like the sample size to be at least 30 or the population to be nearly
normal. For a distribution of percentages we like the sample to have at least 10
successes and at least 10 failures. The larger the data set the smaller the standard error.
The standard deviation of 1 data set is quite a bit larger than the standard error (the
standard deviation of the sampling distribution).
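The Central Limit Theorem can be seen in a simulation. The Python sketch below assumes a made-up skewed right population (exponential shape with mean 10) and uses a simple skewness measure written for this example; none of these values come from the class data sets.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric shape, positive if skewed right."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


rng = random.Random(2)

# Hypothetical skewed right population (exponential shape with mean 10)
population = [rng.expovariate(1 / 10) for _ in range(20_000)]

results = {}
for n in (2, 30, 100):
    # 2,000 random samples of size n (drawn with replacement), one mean per sample
    means = [mean(rng.choices(population, k=n)) for _ in range(2000)]
    results[n] = (skewness(means), pstdev(means))
    print(f"n = {n:>3}: skewness of sample means = {results[n][0]:.2f}, "
          f"standard error = {results[n][1]:.2f}")

print(f"population: skewness = {skewness(population):.2f}, "
      f"standard deviation = {pstdev(population):.2f}")
```

As the sample size grows, the skewness of the sample means should move toward 0 and the standard error should shrink well below the standard deviation of the population.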
5. a) Sample mean = 6.8 Lbs , Margin of Error = 1.7 Lbs
Confidence Interval : ( 5.1 Lbs , 8.5 Lbs ) We are 95% confident that the population mean
average weight of this breed of small dog is in between 5.1 pounds and 8.5 pounds.
b) Sample percent = 11.2% (or 0.112) ,
Margin of Error = 1.3% (or 0.013)
Confidence Interval : (9.9% , 12.5%) or (0.099 , 0.125)
We are 95% confident that between 9.9% and 12.5% of seniors worldwide have
Alzheimer’s disease.
c) Sample mean difference = -5.3 inches
Margin of Error = 1.7 inches
Confidence interval : ( -7.0 , -3.6)
We are 95% confident that the average height of women is between 3.6 and 7 inches
less than the average height of men.
d) Sample Percent difference = 48.9% or 0.489
Margin of Error = 2.6% or 0.026
Confidence interval : ( 46.3% , 51.5% ) or ( 0.463 , 0.515 )
We are 95% confident that the percent of people with high blood pressure in Bulgaria
is between 46.3 and 51.5 percentage points higher than the percent of people with high
blood pressure in Finland.
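The intervals in answers 5a through 5d all come from the same step: sample value minus the margin of error and sample value plus the margin of error. A short Python sketch of that step, using the numbers from the answers above (the function name is just for illustration):

```python
def interval(sample_value, margin_of_error):
    """Confidence interval = sample value minus and plus the margin of error."""
    return sample_value - margin_of_error, sample_value + margin_of_error


# (label, sample value, margin of error) taken from answers 5a-5d
cases = [
    ("5a dog weight (lbs)", 6.8, 1.7),
    ("5b Alzheimer's (%)", 11.2, 1.3),
    ("5c height difference (in)", -5.3, 1.7),
    ("5d blood pressure difference (%)", 48.9, 2.6),
]

for label, value, moe in cases:
    low, high = interval(value, moe)
    print(f"{label}: ({low:.1f}, {high:.1f})")
```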
6. a) We are 95% confident that the mean average price of gas in the U.S. in November
2015 is between $2.16 and $2.24 .
sample mean = (2.24 + 2.16) /2 = $2.20
margin of error = (2.24 – 2.16) / 2 = 0.08/2 = $0.04 (4 cents)
b) We are 95% confident that the population percent chance of getting Lyme disease if
bitten by a tick is between 1.2% and 1.5%.
sample percent = ( 0.015 + 0.012)/2 = 0.0135 or 1.35%
margin of error = ( 0.015 – 0.012)/2 = 0.0015 or 0.15%
c) We are 95% confident that the amount of hemoglobin per deciliter of blood for men
is between 1.6 g/dL and 2.0 g/dL greater than for women.
sample mean difference = 1.8 g/dL
margin of Error = 0.2 g/dL
d) We are 95% confident that world speed record times for men (population 1) are
between 9.65% and 10.35% lower than the world speed record times for women
(population 2).
Sample percent difference = -10% (or -0.1)
margin of error = 0.35% (or 0.0035)
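The two formulas used in answer 6 (half the sum of the limits and half the distance between the limits) can be checked with a few lines of Python. The function name is only for illustration; the limits are the ones given in problems 6a through 6d.

```python
def sample_value_and_margin(lower_limit, upper_limit):
    """Recover the sample value and margin of error from a confidence interval."""
    sample_value = (upper_limit + lower_limit) / 2
    margin_of_error = (upper_limit - lower_limit) / 2
    return sample_value, margin_of_error


# (label, lower limit, upper limit) from problems 6a-6d
cases = [
    ("6a gas price ($)", 2.16, 2.24),
    ("6b Lyme disease (%)", 1.2, 1.5),
    ("6c hemoglobin difference (g/dL)", 1.6, 2.0),
    ("6d record time difference (%)", -10.35, -9.65),
]

for label, low, high in cases:
    value, moe = sample_value_and_margin(low, high)
    print(f"{label}: sample value = {value:.2f}, margin of error = {moe:.2f}")
```

Notice that the margin of error always comes out positive, even when both limits are negative as in 6d.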
7. As the confidence level increases, the margin of error also increases (larger z score) and
the interval gets wider. As the confidence level decreases, the margin of error also
decreases (smaller z score) and the interval gets narrower.
As the sample size increases, the standard error and margin of error decrease, which
gives a narrower interval. If the sample size decreases, you have more error so the
standard error and the margin of error increase, which gives us a wider interval.
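To see these patterns in numbers, here is a small Python sketch. It assumes a z-score based margin of error for a mean (z-score times standard deviation divided by the square root of the sample size), and the standard deviation of 10 and the sample sizes are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z-score that captures the middle `confidence` share of a normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_for_mean(confidence, std_dev, n):
    """Margin of error = z-score * standard error, with standard error = s / sqrt(n)."""
    return z_critical(confidence) * std_dev / sqrt(n)


std_dev = 10.0  # hypothetical sample standard deviation

# Higher confidence level -> larger z-score -> larger margin of error
for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z = {z_critical(confidence):.3f}, "
          f"margin (n = 100) = {margin_for_mean(confidence, std_dev, 100):.2f}")

# Larger sample size -> smaller standard error -> smaller margin of error
for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {std_dev / sqrt(n):.2f}, "
          f"95% margin = {margin_for_mean(0.95, std_dev, n):.2f}")
```

Because the standard error divides by the square root of the sample size, the sample has to be 4 times as large to cut the margin of error in half.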
8. a) The data does meet the assumptions for estimating a population mean. The data is
random and even though it is skewed the sample size is greater than or equal to 30. Using
Statcrunch T-stat, 1 sample, with summary, we obtained the 95% confidence interval
( $42972.16 , $46359.84 ) . So we are 95% confident that the average yearly salary for
K-12 teachers in the U.S. is between $42,972.16 and $46,359.84 .
b) This is voluntary response data, which is never representative of the population. The
sample also had only 4 people with AB negative blood, fewer than the 10 successes we
like to have. We would be wasting our time trying to use this data to estimate a
population percent or calculate a confidence interval. It needs to be random.
c) The data does meet assumptions for calculating a population percent. The data was
random and it had at least 10 people that were satisfied and at least 10 people that
were not satisfied. Using Statcrunch , proportion stat, 1 sample, with summary, we
obtained the following 95% confidence interval (0.678 , 0.835 ). So we are 95%
confident that the population percent of patients that are satisfied with their care is
between 67.8% and 83.5%.
d) The data set was large enough, however this is convenience data and never
representative of the population. We would be wasting our time trying to use this data
to estimate the difference between two population percentages or calculating a
confidence interval for the difference. It needs to be random.
e) The data does meet the assumptions for estimating the difference between two
population means. Both data sets are random. The groups are independent of each other
and both sample sizes are greater than or equal to 30. Using women’s BMI as population 1
and men’s BMI as population 2, we used Statcrunch T-stat, 2 sample, with data, and
obtained the 95% confidence interval ( -2.48 BMI points , +1.96 BMI points ) . Note: If you
had made men’s BMI population 1 then you would have gotten ( -1.96 , +2.48 ) . Both
answers tell us the same thing. Because the interval contains zero, there is no significant
difference between women’s average BMI and men’s average BMI at the 95% confidence
level.
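The Statcrunch intervals in answers 8a and 8c can be checked by hand with a short Python sketch. This is only a rough check under stated assumptions: the t critical value of 1.9925 (95% confidence, 74 degrees of freedom) is read from a t-table and typed in rather than computed, and the proportion interval uses the usual z-score formula, which may not be exactly what the software does internally.

```python
from math import sqrt
from statistics import NormalDist

# 8a: one sample interval for a mean from summary statistics
n, sample_mean, sample_sd = 75, 44666, 7362
t_star = 1.9925  # assumed value from a t-table (95% confidence, 74 degrees of freedom)
std_error = sample_sd / sqrt(n)
margin = t_star * std_error
mean_interval = (sample_mean - margin, sample_mean + margin)
print(f"8a: (${mean_interval[0]:,.2f} , ${mean_interval[1]:,.2f})")

# 8c: one sample interval for a proportion from counts
successes, trials = 87, 115
# assumption check: at least 10 successes and at least 10 failures
meets_counts = successes >= 10 and (trials - successes) >= 10
p_hat = successes / trials
z_star = NormalDist().inv_cdf(0.975)  # z-score for 95% confidence
prop_margin = z_star * sqrt(p_hat * (1 - p_hat) / trials)
prop_interval = (p_hat - prop_margin, p_hat + prop_margin)
print(f"8c: counts large enough? {meets_counts}")
print(f"8c: ({prop_interval[0]:.3f} , {prop_interval[1]:.3f})")
```

Small differences of a few cents from the Statcrunch answer in 8a are expected because the t critical value was rounded.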