Math 140 Notes and Activity Packet (Word)
Hypothesis Testing & Simulation
Math 140 Notes and Activity#1 (Card Activity)
Introduction to Hypothesis Testing
Articles in newspapers, magazines, and online make claims about populations all the time. Often they
take a sample (which may not even be random) and then tell the world that their sample value is the
same as the population value. We saw in our study of sampling variability how misleading this is.
Even random samples will give values that differ from the population value.
With all these erroneous population claims, we may want to test what someone says about a
population and see if it is consistent with data. This is called a “Hypothesis Test”. Before
getting into the specifics of formal hypothesis testing it is important to understand the main
idea of hypothesis testing and the role that simulation plays in this process.
Cards and Candy Activity
Your teacher will open a brand new deck of cards, shuffle the cards and start allowing students
to pick cards. Remember, a deck of cards has 52 cards, half red and half black. Each student
gets to pick a card. Your instructor will identify whether red or black is the winning color. If you pick a
card with the winning color, you win a piece of candy, but if you pick a card with the losing
color you have to sit back down (no candy).
Questions to answer about the Cards and Candy Activity
1. Were there any assumptions you made before the activity started? As we started doing the
activity, when did you grow suspicious that the assumptions might not be true? (This is like
when someone makes a population claim, but when we start seeing data, we grow suspicious
that the population claim might not be true) What happened in the activity that gave you an
idea that something might be wrong?
2. In hypothesis testing we often have two hypotheses, the “null hypothesis” and the
“alternative hypothesis”. What do you think the null and alternative hypothesis might be for
the Cards and Candy activity? If our sample data disagrees significantly with the null
hypothesis, we will “Reject the null hypothesis” and therefore support the Alternative.
3. If the null hypothesis was true, what was the probability of getting 5 of the same color in a
row? We couldn’t put our finger on it, but this probability is part of what was bothering us. If
the null hypothesis is true and the chance of getting our sample data is very small, it gives us
the idea that the null hypothesis might not be true. This probability of getting the sample data if
we assume the null hypothesis is true is called a “P-value”. What was the approximate P-value
for the Cards and Candy activity? Our P-value has to be very low to bother us. People in
statistics often like the P-value to be less than 5% to count as a significant disagreement with the
null hypothesis. Was our P-value lower than 5%?
4. We now want to write a conclusion about what we think is true about the cards. What
do the cards drawn and the probability of that happening (P-value) tell us about the cards?
5. Here are some important questions to consider in Hypothesis testing.
a) If the null hypothesis is true, could the sample data have happened by random chance
(sampling variability)? (If the answer is no or it is extremely unlikely, then the null hypothesis
may be wrong.) (If the answer is yes or likely, then the null hypothesis may be correct). Apply
this question to the Cards and Candy activity.
b) There is a difference between “convincing evidence” and “proof”. What do you
think the difference is between these? In the Cards and Candy activity, which one did we
have? Was it possible that we might have got it wrong?
c) You should always think about the ramifications of your hypothesis test conclusion
being wrong. Sampling variability is difficult to predict and many educated Statisticians have
made wrong conclusions about populations. In the Card and Candy activity, what were the
ramifications of getting the conclusion wrong?
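The probability asked about in question 3 can be worked out directly. The sketch below is only an illustration, under the assumption that the null hypothesis is "the deck is a fair, well-shuffled 52-card deck with 26 cards of each color" and that the sample data was 5 losing-color cards in a row. The function name `prob_run_of_one_color` is made up for this example.

```python
from fractions import Fraction

def prob_run_of_one_color(run_length, color_count=26, deck_size=52):
    """Probability that the first `run_length` cards drawn (without
    replacement) from a fair shuffled deck are all one specified color."""
    prob = Fraction(1)
    for i in range(run_length):
        # After i cards of that color are gone, color_count - i remain
        # out of deck_size - i cards.
        prob *= Fraction(color_count - i, deck_size - i)
    return prob

exact = prob_run_of_one_color(5)
rough = 0.5 ** 5  # quick approximation that ignores the shrinking deck

print(f"exact: {float(exact):.4f}")
print(f"rough: {rough:.4f}")
```

Both versions give a probability of roughly 3% or less, which is below the 5% guideline mentioned in question 3.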
Hypothesis Test Notes
Finding the Null and Alternative Hypothesis
Null Hypothesis: H0
Alternative Hypothesis: HA or H1
These are competing ideas about the population.
Hypothesis Test: Procedure for checking what someone
has said about the population. (i.e. checking the “claim”)
Claim: What the person actually said about the population.
Steps for finding the Null and Alternative Hypothesis
1. Write down the claim (what person said) in
symbolic language. Write the word “claim” next to it.
2. Write down the opposite of the claim (opposing view)
in symbolic language
3. The statement that has = or ≤ or ≥ is the null hypothesis.
Put an H0 next to it.
4. The statement that has ≠ or < or > is the alternative
hypothesis. Put an HA next to it.
Symbols for population parameters:
μ (population mean)
p (population percentage)
σ (population standard deviation)
Example 1: A mechanic claims that the (mean) average weight of
all transmissions is more than 52 kg.
Note: The sample data is never part of the null or alternative
hypotheses. Only population statements.
Write down the claim (what person said) in
symbolic language. Write the word “claim” next to it.
μ > 52 (claim)
Write down the opposite of the claim (opposing view)
in symbolic language
μ ≤ 52
The statement that has = or ≤ or ≥ is the null hypothesis.
Put an H0 next to it.
The statement that has ≠ or < or > is the alternative
hypothesis. Put an HA next to it.
HA: μ > 52 (claim)
H0: μ ≤ 52
Example 2: The FDA says that at least 2.5% of people that
take this medicine will have serious side effects.
Write down the claim (what person said) in
symbolic language. Write the word “claim” next to it.
p ≥ 0.025 (claim) (Note: 2.5% = 0.025)
Write down the opposite of the claim (opposing view)
in symbolic language
p ≥ 0.025 (claim)
p < 0.025
The statement that has = or ≤ or ≥ is the null hypothesis.
Put an H0 next to it.
The statement that has ≠ or < or > is the alternative
hypothesis. Put an HA next to it.
H0: p ≥ 0.025 (claim)
HA: p < 0.025
Example 3: The school board claims that the average
SAT score for female high school students is the same
as the average SAT score for male high school students.
μ1: female
μ2: male
H0: μ1 = μ2 (claim)
HA: μ1 ≠ μ2
Notes: There are 3 types of Hypothesis Tests.
Right Tailed Test (Alternative is >)
Left Tailed Test (Alternative is <)
Two Tailed Test (Alternative is ≠)
Look at the three previous examples. What type of test is
being used?
Example 1
HA: μ > 52 (claim)
H0: μ ≤ 52
Type of Test? Right Tailed Test
Example 2
H0: p ≥ 0.025 (claim)
HA: p < 0.025
Type of Test? Left Tailed Test
Example 3
H0: μ1 = μ2 (claim)
HA: μ1 ≠ μ2
Type of Test? Two Tailed Test
Note: Sometimes you will see the null written as “=”
even if technically it is ≤ or ≥. Either answer will be accepted.
Example 1
HA: μ > 52 (claim)
H0: μ ≤ 52
OR
HA: μ > 52 (claim)
H0: μ = 52
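The four steps for finding the null and alternative hypothesis can also be written as a small lookup. This is only a sketch of the steps in these notes. The function name `set_up_hypotheses` and the dictionary names are invented for illustration.

```python
# Steps 1-2: each claim symbol paired with its opposite.
OPPOSITE = {"=": "≠", "≠": "=", "≤": ">", ">": "≤", "≥": "<", "<": "≥"}
# Step 3: a statement with =, ≤ or ≥ is the null hypothesis.
NULL_SYMBOLS = {"=", "≤", "≥"}
# The symbol in the alternative decides the type of test.
TAIL = {">": "right tailed", "<": "left tailed", "≠": "two tailed"}

def set_up_hypotheses(parameter, symbol, value):
    """Return H0, HA and the type of test for a claim such as μ > 52."""
    claim = (symbol, f"{parameter} {symbol} {value} (claim)")
    opposite = (OPPOSITE[symbol], f"{parameter} {OPPOSITE[symbol]} {value}")
    if symbol in NULL_SYMBOLS:
        null, alternative = claim, opposite
    else:
        null, alternative = opposite, claim
    return {"H0": null[1], "HA": alternative[1], "test": TAIL[alternative[0]]}

print(set_up_hypotheses("μ", ">", 52))      # Example 1
print(set_up_hypotheses("p", "≥", 0.025))   # Example 2
print(set_up_hypotheses("μ1", "=", "μ2"))   # Example 3
```

The three calls reproduce Examples 1 to 3: a right tailed, a left tailed, and a two tailed test.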
Math 140 Activity #2
Null and Alternative Hypothesis
For each of the following problems:
a) Write the null and alternative hypothesis.
b) Label whether the null or the alternative is the original claim.
c) Tell whether this is a left tail test, a right tail test, or a two tail test.
1. According to a CNN report, besides cell phones, 93% of Americans also own a traditional
phone. But has that percentage decreased as more and more Americans opt to only use a cell
phone and throw away their traditional phones?
2. According to a recent newspaper article, people in California spend 1.25 hours a day eating
and drinking. Suppose we want to test the claim that the number of hours spent eating and
drinking is really 1.25 hours.
3. More and more Americans are becoming financially sound and opting to not own a credit card.
According to an article in USA Today, 74% of Americans still have at least one credit card. But
this claim seems a little on the low side. We think that more than 74% of Americans own a credit
card.
4. It has long been thought that normal body temperature is really 98.6 degrees Fahrenheit. A
recent study is now claiming that normal body temperature is really lower than 98.6 degrees.
5. The standard deviation for the heights of men was thought to be 2.9 inches. New studies
disagree with this. Test the claim that the standard deviation for heights of men is not 2.9 inches.
6. Wikipedia suggests that at least 10% of the world population is left handed. Wikipedia may
not be very accurate. Test the claim that at least 10% of the world population is left handed.
7. The percent of women that hold CEO level jobs is lower than the percent of men that hold
CEO level jobs.
8. The average cholesterol level for American men and women is about the same.
9. The majority of Republicans support decreasing taxes.
Hypothesis Test Notes
Test Statistics
The ability to “test” what someone says about a population depends largely on being able
to tell if the random sample value is significantly different from the population value. In other
words, does the random sample value significantly disagree with what the person said about the
population?
That is a problem, because it is sometimes impossible to tell with your own eyes. Let
us suppose that the population percentage is 0.25 (25%) and the sample value is 0.22 (22%). Is
the sample significantly different? We don’t know. Sometimes 3% is a significant difference and
sometimes 3% is not significant.
So how do we tell if our sample data significantly disagrees with a population value?
The answer is that we need to measure the difference in a very special way. Knowing how
many miles or percentage points apart they are is not going to help us. We need to know
how many “standard errors” apart they are.
This is called a Test Statistic.
Example 1
Let’s look at the percentage problem where the population percentage is 0.25 (25%) and the
sample value is 0.22 (22%). We know the sample value is 3 percentage points lower, but we do not know if that is
significant. Another important bit of information is the sample size. In this case it was 100.
To find out, we calculate the test statistic.
Formulas for Test Statistics follow a general pattern. Remember a test statistic counts how
many standard errors that the sample value is above or below the population value. So the
formula looks like the following:
$$\text{Test Statistic} = \frac{\text{Sample Value} - \text{Population Value}}{\text{Standard Error}}$$
Remember in the last unit we saw that statisticians often used formulas to approximate the
standard error.
Here is the formula for the test statistic when comparing a sample percentage (p̂) and
a population percentage (p). Recall that the number of standard deviations is often represented
with a Z-score or T-score, so it is not surprising that the test statistic
is often a T or Z. Also remember that z-scores and t-scores are often rounded to the hundredths
place.
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
Let’s plug in our numbers. Remember that the sample percentage p̂ = 0.22, the population
percentage p = 0.25, and the sample size n = 100.
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(1 - 0.25)}{100}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(0.75)}{100}}} = \frac{-0.03}{0.0433} \approx -0.69$$
So is it significant? We learned in the last unit that for a z-score to be significant it should be
around 2 or higher, or -2 or lower. (Recall the 3 famous z-scores of 1.645, 1.96 and 2.576.) So if we
want to be 95% confident, we should have a z-score of around 2. To be 90% confident we need
about 1.6 and to be 99% confident we need about 2.6. These are just general guidelines. We will
get a whole lot more accurate when we talk about P-value and significance levels. For now use 2
and -2 as your guideline.
Our z-score was about -0.69. So our sample percentage (22%) was only 0.69 standard errors
below the population percentage 25%. Hence it is not significant.
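The three famous z-scores recalled above can be checked with Python's standard library. This is only a sketch. The helper name `two_sided_critical_z` is invented here, and it assumes the usual two-sided setup where the leftover area is split evenly between the two tails.

```python
from statistics import NormalDist

def two_sided_critical_z(confidence):
    """z-score that leaves (1 - confidence) / 2 in each tail of the
    standard normal curve."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for confidence in (0.90, 0.95, 0.99):
    z = two_sided_critical_z(confidence)
    print(f"{confidence:.0%} confident: z is about {z:.3f}")
```

Rounded to three decimal places, these come out to 1.645, 1.960 and 2.576.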
Example 2
Sample size plays a key role in significance. Let’s look at the same example but with a sample
size of 1000. Remember that the sample percentage p̂ = 0.22 and the population percentage
p = 0.25.
Let’s calculate the test statistic for this one.
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(1 - 0.25)}{1000}}} = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(0.75)}{1000}}} = \frac{-0.03}{0.013693} \approx -2.19$$
Notice now our sample value of 22% is significantly lower than our population value of 25%. In
fact, our sample value of 22% is 2.19 standard errors below the population percentage of 25%.
This is significant since our z-score is less than -2 (more than 2 standard errors away).
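To see how the sample size alone changes the test statistic, here is a short sketch that applies the z formula from Examples 1 and 2. The function name `one_proportion_z` is made up for this illustration.

```python
import math

def one_proportion_z(p_hat, p, n):
    """Number of standard errors the sample percentage p_hat is above
    (positive) or below (negative) the population percentage p."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error

for n in (100, 1000):
    z = one_proportion_z(p_hat=0.22, p=0.25, n=n)
    print(f"n = {n}: z = {z:.2f}")
```

The same 3 percentage point gap gives z = -0.69 with n = 100 but z = -2.19 with n = 1000, because the standard error shrinks as the sample size grows.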
Example 3
What do we do if we want to know if a sample mean is significantly different from a population
mean? Sometimes 13 pounds is a lot and sometimes 13 pounds is very little. An article in a
health magazine claims that the mean average weight of all men is about 175 pounds. A random
sample of 60 men found that the sample mean was 188 pounds with a standard deviation of 96
pounds. So the sample mean 188 is 13 pounds heavier than the population mean of 175. Is that
significant?
The answer again is we don’t know. We would need to calculate a test statistic to see if 13 pounds
is a lot in this situation. Here is the formula for calculating test statistics to compare sample and
population means. Notice it follows the same general pattern and seeks to count how many
standard errors the sample value is above or below the population value. Notice we label the
test statistic as a T-score since again it is the number of standard deviations (errors) and the T-score
is more accurate than the Z-score for small quantitative data sets.
$$T = \frac{\text{Sample Value} - \text{Population Value}}{\text{Standard Error}} = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)}$$
Now use the formula to calculate the test statistic. Remember the sample mean x-bar is 188,
the population mean mu is 175, the standard deviation s is 96 and sample size n is 60.
$$T = \frac{\bar{x} - \mu}{\left(\dfrac{s}{\sqrt{n}}\right)} = \frac{188 - 175}{\left(\dfrac{96}{\sqrt{60}}\right)} = \frac{13}{12.39355} \approx 1.0489 \approx 1.05$$
Notice our sample mean of 188 pounds is not a significant disagreement with population mean
of 175 pounds. Our sample mean of 188 pounds is only 1.05 standard errors above the
population mean of 175. Remember it needs to be close to 2 or higher to be significant.
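The calculation in Example 3 can be reproduced in a few lines. This is a sketch of the T formula above, with a function name (`one_sample_t`) chosen only for illustration.

```python
import math

def one_sample_t(sample_mean, population_mean, sample_sd, n):
    """Number of standard errors the sample mean is above (positive)
    or below (negative) the population mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

t = one_sample_t(sample_mean=188, population_mean=175, sample_sd=96, n=60)
print(f"standard error = {96 / math.sqrt(60):.5f}")
print(f"T = {t:.2f}")
```

With a standard deviation as large as 96 pounds, the standard error is about 12.4 pounds, so a 13 pound difference is only about one standard error.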
Two Important Notes:
• It is important to understand how test statistics work and what the formulas mean. It is
not important to calculate these by hand. Statistics programs like StatCrunch can
calculate the test statistic in half a second with much better accuracy. It is important that
you can explain the meaning of the test statistic and what it tells us about significance.
• Test Statistics can sometimes be borderline significant, which makes them hard to
interpret. Think of a z-score test statistic of 1.90. Is that significant? It is close to 2
standard errors away, but is it close enough to be considered significant? Sometimes the
answer to this is yes and sometimes it is no. In the past, statisticians would look up a
critical value and compare the test statistic to it to decide whether it was significant.
Critical values are difficult to work with because they change for every situation. We will
see that P-value is a much better way to decide significance, especially in borderline
cases.
Math 140 Activity #3
Calculating and Interpreting Test Statistics
1. Wikipedia claims that 10% of people are left handed. A sample of 250 randomly selected adults
found that 32 of them were left handed.
a) Use the formulas $\hat{p} = \frac{x}{n}$ and $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$ to find the z-score test statistic.
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?
2. According to a CNN report, 93% of Americans also own a traditional phone (not a cell phone).
We took a sample of 850 randomly selected Americans and found that 785 of them own a traditional
phone.
a) Use the formulas $\hat{p} = \frac{x}{n}$ and $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$ to find the z-score test statistic.
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?
3. Normal body temperature has long been thought to be 98.6°F. A random sample of 50
adults was found to have a mean average temperature of 98.2°F with a standard deviation
of 0.765°F.
a) Use the formula $t = \frac{\bar{x} - \mu}{\left(\frac{s}{\sqrt{n}}\right)}$ to find the t-score test statistic. (Note: A t-score is just like
a z-score. It measures the number of standard deviations the sample value is from the
population value.)
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?
4. The average height for men has long been thought to be 69.2 inches. A random sample of 275
adult men was found to have a mean average height of 69.5 inches with a
standard deviation of 2.7 inches.
a) Use the formula $t = \frac{\bar{x} - \mu}{\left(\frac{s}{\sqrt{n}}\right)}$ to find the t-score test statistic. (Note: A t-score is just like
a z-score. It measures the number of standard deviations the sample value is from the
population value.)
b) Write a sentence to interpret the meaning of the test statistic.
c) Do the sample values seem to be significantly different than the claim?
Hypothesis Test Activity 4
Randomized Simulation - Simulating the Null Hypothesis
Notes on Simulation
As we discussed yesterday, the key to hypothesis testing is to see if your random sample data
significantly disagrees with the population value being tested. This is difficult to do because of
sampling variability. Random samples almost always give different values and will be different
than the population value being tested most of the time.
The key question in hypothesis testing is the following:
Key Question: Could the sample data be different from the population value just because of
sampling variability (random chance)? Or is the sample value so significantly different from the
population value, that it causes us to think that the population value may be wrong. (i.e. The
sample data is not what we would expect by random chance!)
How can we answer this key question? We need to simulate a distribution based on the null
hypothesis. This is often called a randomized simulation or a “randomization technique”. We
need to simulate what a distribution should look like if the population value is the same as the
null hypothesis. Then we can compare our random sample data to the distribution to see how
likely it is to happen.
Note: A “Sampling Distribution” is different than a “Randomized Simulation”.
A sampling distribution is taking lots and lots of random samples from a population. We use this
to understand sampling variability, and to estimate a population value, standard error and
confidence intervals. A sampling distribution is not based on the null hypothesis. It is just lots
and lots of samples taken from a population.
A randomized simulation simulates the null hypothesis. Eventually we will compare one
random sample to the simulation. A simulation assumes that the null hypothesis is true and is
totally based on the null hypothesis! It is not designed to estimate a population value, but to test
what someone has said about the population value.
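To make the idea of "simulating the null hypothesis" concrete, here is a minimal sketch for a single mean. It rests on several assumptions that are not in these notes: it uses one common approach (shift the sample so its mean equals the null value, then resample with replacement), which may or may not match what StatKey does internally. The data is synthetic stand-in data, not the real body temperature data set, and the function names are invented.

```python
import random
import statistics

def randomization_means(sample, null_mean, num_simulations, rng):
    """Simulated sample means from a population whose mean equals null_mean."""
    # Shift the data so that the null hypothesis is true for it.
    shift = null_mean - statistics.mean(sample)
    shifted = [x + shift for x in sample]
    # Each simulated data set has the same size as the original sample.
    return [
        statistics.mean(rng.choices(shifted, k=len(shifted)))
        for _ in range(num_simulations)
    ]

def left_tail_p_value(simulated_means, observed_mean):
    """Fraction of simulated means at or below the observed sample mean."""
    at_or_below = sum(1 for m in simulated_means if m <= observed_mean)
    return at_or_below / len(simulated_means)

rng = random.Random(140)  # fixed seed so the sketch is repeatable
# Synthetic stand-in data (NOT the actual StatKey data set).
sample = [round(rng.gauss(98.3, 0.75), 1) for _ in range(50)]
null_means = randomization_means(sample, null_mean=98.6, num_simulations=1000, rng=rng)
p_value = left_tail_p_value(null_means, statistics.mean(sample))

print(f"center of simulation = {statistics.mean(null_means):.2f}")
print(f"observed mean = {statistics.mean(sample):.2f}, P-value = {p_value:.3f}")
```

Notice that the simulated distribution is centered at the null value 98.6, not at the sample mean. That is the difference between a randomized simulation and a sampling distribution described in the note above.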
Directions: We will now look at the following problems and use StatKey at
www.lock5stat.com. Look on the right side of the screen where it says “Randomization
Hypothesis Tests” and click on the appropriate link (single mean, single proportion, difference
of means, difference of proportions). We will be focusing on single mean and single
proportion in this activity.
1. Normal body temperature has long been thought to be 98.6°F. Many scientists now think that normal body
temperature may be lower than 98.6°F. We want to test if sample evidence supports the scientists’ claim.
Let’s look at a random sample of body temperatures from 50 adults. On the lock5 website, click on
“randomization hypothesis tests” and then “single mean”. The 50 body temperatures have already been
entered. If you do not see them there, click on “custom data set” and then “body temperature”.
a) What is the null and alternative hypothesis? Which one is the claim?
b) List the shape, sample size, mean and standard deviation for the “original sample” (actual data
not simulated. )
c) We want to simulate a distribution under the assumption that the population value really is 98.6°F.
Make sure that μ = 98.6 and click on “generate 1 sample” to simulate the null hypothesis. It is easy to get
lost. Remember the “original sample” is our actual sample data we listed in part (b). The “randomized
sample” is a simulated (made up) data set of the same size 50 from a population with a mean of 98.6°F.
What is the mean and standard deviation of the first randomized simulation? How far is the first
simulated mean from 98.6?
d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these
are not actual samples from a population. Each dot represents the mean of a simulated data set of size 50.
You now have 1001 samples and have created a randomization distribution. In a sense, we have predicted
how we expect data sets from a population with a mean of 98.6°F to behave. What is the shape, center
(mean) and spread (estimated standard error) from the distribution?
e) Our goal was to know if getting a sample mean of 98.26°F was something that could happen by
random chance from a population with a population mean of 98.6°F. Look at the distribution. Since this
was a left tail test, let’s look at how many dots were 98.26°F. Here is an important question. If we are
wondering if 98.26°F is significantly lower than the population value 98.6°F, wouldn’t dots that have a
temperature lower than the sample value 98.26°F also cause us to doubt the validity of the population
value 98.6°F? Of course. So we don’t want to just count how many dots are exactly 98.26°F, but how
many dots are that or lower. (Left Tail) This is an important idea in Statistics.
Click on the button that says “left tail”. In the box at the bottom of the distribution, type in “98.26”.
How many dots were lower than 98.26? What percent of the distribution was lower than 98.26?
This percent is called a “P-value”.
P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true.
The randomized simulation has helped us flesh out what we expect to happen if the null hypothesis was
really true.
f) Decision time. In a simulation of samples from a population with a mean of 98.6°F, would a
sample mean of 98.26°F be likely to happen by random chance? What does this tell us about the
validity of the “so-called” population value for normal body temperature 98.6°F? Do you still
agree with the population value? How about the scientists that said they think it is lower? Do you
have any evidence that supports their claim? Do you have convincing evidence to back up your
opinion? Do you have proof?
2. Let’s repeat the activity in #1 with another data set. An article claims that the mean average price of
houses in New York is greater than 265 Thousand Dollars. To test this claim, we took a random sample of
30 homes in New York. We listed the prices in thousands of dollars. The sample data is already in the
lock5stat website. Again, go to “Randomization Hypothesis Test” and “Test for Single Mean”. On the
upper left, click on the button that says “body temperature” and change it to “Home Prices – NY”. We are
going to use a randomized simulation to test this population value of 265 Thousand Dollars. We want to
simulate what data sets that have a population value of 265 would look like. Click on “generate 1 sample”
and then “generate 1000 samples” and answer the following questions.
a) What is the null and alternative hypothesis? Which one is the claim?
b) List the shape, sample size, mean and standard deviation for the “original sample” (actual data
not simulated. ) Is the original sample mean lower or higher than the population value?
c) What is the mean and standard deviation of the first randomized simulation? How far is the first
simulated mean from 265?
d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these
are not actual samples from a population. Each dot represents the mean of a simulated data set of size 30.
You now have 1001 samples and have created a randomization distribution. In a sense, we have predicted
how we expect data sets from a population with a mean of 265 to behave. What is the shape, center
(mean) and spread (estimated standard error) from the distribution?
e) Our goal was to know if getting a sample mean of 565.633 was something that could happen by
random chance from a population with a population mean of 265? Look at the distribution. Since this was
a right tail test, let’s look at how many dots were 565.633. Again, if we are wondering if 565.633 is
significantly higher than the population value 265, wouldn’t dots that have a value greater than 565.633
also cause us to doubt the validity of the population value 265? Of course. So we don’t want to just count
how many dots are exactly 565.633, but how many dots are that or higher. (Right Tail)
Click on the button that says “right tail”. In the box at the bottom of the distribution, type in “ 565.633 ”.
How many dots were higher than 565.633? What percent of the distribution was higher than
565.633? This percent is called a “P-value”.
P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true.
The randomized simulation has helped us flesh out what we expect to happen if the null hypothesis was
really true.
f) Decision time. In a simulation of samples of size 30 from a population with a mean of 265, would a
sample mean of 565.633 be likely to happen by random chance? What does this tell us about the
validity of the “so-called” population value of 265 thousand dollars? Do you still agree that 265
thousand dollars is the mean average price of all homes in New York? What about the article that
said that the actual population value is greater than 265 thousand dollars? Do you agree with the
article? Do you have convincing evidence to back up your opinion? Do you have proof?
Now let’s look at a simulation involving checking a single population percentage (proportion). People
make claims about population percentages all the time. Now we have a way to check their claims. For #3
go to the lock5stat website and under the “Randomization Hypothesis Tests” click on “Test for Single
Proportion”.
3. In the last election, we were wondering if President Obama would be re-elected. We took a random
poll of 1057 Americans and asked them if they would vote for Obama to be re-elected. Of the 1057 people
in the poll, 583 said they would support Obama. Is this evidence convincing enough for us to know if
more than 50% of all Americans would vote to re-elect President Obama? To answer this question we will
look at simulations from a population with a population percentage of 0.5, create a distribution, and then see
how it behaves. If you go to the top left corner you can click on the link for “election poll support Obama”
and the numbers will be automatically entered for you.
a) What is the null and alternative hypothesis? Which one is the claim?
b) What was the original sample percent in the poll? Is the original sample percent lower or higher
than the population value 0.5?
c) Click on generate 1 sample. The computer has simulated talking to 1057 people when the population
percent is 50% (0.5). How many people said they support Obama in the first simulation? What was the
simulated percent?
d) Now click on generate 1000 samples. Let’s see if we understand what we are looking at. Again, these
are not actual samples from a population. Each dot represents the percent of a simulated data set of size
1057. You now have 1001 samples and have created a randomization distribution. In a sense, we have
predicted how we expect data sets from a population with a percentage of exactly 50% to behave. What is the
shape, center (mean) and spread (estimated standard error) from the distribution?
e) Our goal was to know if getting a sample percent of 0.552 was something that could happen by
random chance from a population with a population percent of 0.5. Look at the distribution. Since this
was a right tail test, let’s look at how many dots were 0.552. Again, if we are wondering if 0.552 is
significantly higher than the population value 0.5, wouldn’t dots that have a value greater than 0.552 also
cause us to doubt the validity of the population value 0.5? Of course. So we don’t want to just count how
many dots are exactly 0.552, but how many dots are that or higher. (Right Tail)
Click on the button that says “right tail”. In the box at the bottom of the distribution, type in “0.552”.
How many dots were greater than or equal to 0.552? What percent of the simulated distribution was
at or above 0.552? This percent is called a “P-value”.
P-value = The probability of getting the sample data or more extreme, if the null hypothesis is really true.
The randomized simulation has helped us flesh out what we expect to happen if the null hypothesis was
really true.
f) Decision time. In a simulation of samples of size 1057 from a population with a population
proportion of 0.5, would a sample percent of 0.552 be likely to happen by random chance? What does
this tell us about the validity of the “so-called” population value of 0.5 (50%)? Do you still agree
that 50% of people will vote for Obama? Obama’s campaign managers are worried. Do we have
convincing evidence that more than 50% of Americans will vote for Obama? (This is the same as
asking if we have convincing evidence that Obama will win the re-election.) Do you have proof?
(Obama of course did win the re-election.)
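The simulation in problem 3 can be imitated in a few lines of code. This is only a sketch of the idea and not a description of how StatKey works internally. Each simulated poll "talks to" 1057 made-up people who each support the candidate with probability 0.5 (the null hypothesis). The function name and the seed are arbitrary choices for this illustration.

```python
import random

def simulate_null_proportions(n, null_p, num_simulations, rng):
    """Simulated sample percentages from polls of size n when the
    population percentage really is null_p."""
    proportions = []
    for _ in range(num_simulations):
        # Count simulated "yes" answers in one poll of n people.
        successes = sum(1 for _ in range(n) if rng.random() < null_p)
        proportions.append(successes / n)
    return proportions

rng = random.Random(2012)  # fixed seed so the sketch is repeatable
observed = 583 / 1057      # sample percent from the poll
null_props = simulate_null_proportions(n=1057, null_p=0.5, num_simulations=1000, rng=rng)

# Right tail: fraction of simulated polls at or above the observed percent.
p_value = sum(1 for prop in null_props if prop >= observed) / len(null_props)
print(f"observed = {observed:.3f}, simulated P-value = {p_value:.3f}")
```

Very few (if any) of the simulated polls reach 0.552, so the simulated P-value is far below 5%. The exact count changes from run to run when a different seed is used.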
Hypothesis Test Notes
P-value and Significance Levels
Hypothesis Test: Using Random Sample data to
decide whether a population claim seems reasonable
or if it looks very wrong.
Problem: Sampling Variability!!
Random samples usually differ from each other, and random
samples usually differ from the population
value. There are two possibilities and we will need
to decide which one we think is correct.
Option 1: Is our random sample data different than the
population value because all random samples are
different (random chance)? In which case the population
value might be correct.
OR
Option 2: Is our random sample data different than the
population value because the population value is wrong?
Important Note:
Think of random chance (sampling variability) as a
confounding variable. In order to show that the population
value in H0 is wrong (option 2), we have to rule out random
chance (sampling variability). In other words we have to
make sure option 1 is not correct (or at least highly unlikely),
to be able to say that population value is probably
wrong (option 2).
P-value to the rescue!!
P-value can help us decide between the two options.
Definition
P-value : The probability of getting the
sample data or more extreme by random chance if
the null hypothesis is true.
Probability & Logic principle: If the probability of an
event happening is very low, but the event keeps happening,
then we should look for a different explanation. Our
assumption about how that event works might be wrong.
Assumption: Suppose the population value is correct
(in null hypothesis H0). The P-value is the probability
of getting the sample data (or more extreme) by random chance
based on that assumption.
Low P-value
If the P-value is very low, then the sample data probably
did not happen by random chance. This means option 1
is very unlikely. So probably option 2 is true and our
assumption about population value being correct in the
null hypothesis is probably wrong. When that happens
we say we “Reject the Null Hypothesis”.
High P-value
If the P-value is high, then the sample data could have
happened by random chance. The sample data is probably
different because of sampling variability (all random
samples are different). Option 1 could be true, meaning
we cannot rule out random chance (sampling variability)
as a confounding variable. So we will not be able to tell if
the population value is wrong. Since it is likely that option 1
occurred, the population value in the null hypothesis
might be correct.
Since we can’t tell if the population value is right or wrong,
we say we “Fail to Reject the Null Hypothesis”.
Important Notes:
• Failing to reject H0 does not mean that H0 is true!!
It means we cannot tell if the population value is right
or wrong. Sampling variability has struck again.
• A low P-value occurs when the sample value significantly
disagrees with the population value in the null hypothesis.
In other words, a low P-value corresponds with a
large test statistic. Both mean that the population value
is probably wrong.
• A high P-value occurs when the sample value is pretty
close to the population value in the null hypothesis. In
other words, a high P-value corresponds with a
small test statistic. Both mean that the population value
might be correct.
Significance Levels
Sometimes a P-value might be borderline. Remember we
want the P-value to be low to ensure that the sample data
was unlikely to occur by random chance. But how low do
we need it?
Significance Levels (also called alpha levels)
α (Greek letter alpha)
Significance levels (α) are a number we can compare the
P-value to. We will see later that they are also associated
with avoiding certain types of errors in statistics.
Remember confidence levels? Significance levels (α) are
the complement of confidence levels (1 − α). If you want to
be 95% confident, for example, the significance level would
be 100% − 95% = 5%. This is the most common significance
level used.
Common Confidence Levels and Significance Levels
Confidence Level | Significance Level
90% (0.90) | 10% (0.10)
95% (0.95) | 5% (0.05)
99% (0.99) | 1% (0.01)
So before you do your hypothesis test you should choose which significance level
you want to use. If you are unsure, use 5% as this is the most common.
Summarize:
If the P-value ≤ significance level, Reject the null hypothesis.
If the P-value > significance level, Fail to reject the null hypothesis
Take a look at top half of the “P-value Diagram”.
(Can be found on the website on the hypothesis test page.)
Summary:
Low P-value
• Sample data significantly different than the
population value
• Sample data probably did not happen by random
chance (sampling variability)
• Reject H0
High P-value
• Sample data close to the population value
(not significantly different)
• Sample data could have happened by random chance
(sampling variability)
• Fail to Reject H0 (Does not mean H0 is true)
Example 1 (Write the null and alternative hypothesis and
interpret what we can learn from the given p-value. Assume
the problem meets assumptions and use a 5% significance level)
An article online said that the average typing speed for all
adults is about 40 words per minute (wpm). We took a large
random sample in order to test this claim. Our sample mean
was 38 wpm and our P-value was 0.216.
H0: μ = 40 (claim)
HA: μ ≠ 40
P-value 0.216 > 0.05 sig level
Fail to reject H0
(The average typing speed might be 40 wpm,
but we are not sure.)
Sample mean was not significantly different than
population value of 40.
The random sample mean of 38 (wpm) could have
happened by random chance.
Example 2 (Write the null and alternative hypothesis and
interpret what we can learn from the given p-value. Assume
the problem meets assumptions and use a 1% significance level)
A pharmaceutical company is developing a new medicine
to help people with diabetes. They want to see if the medicine
will help at least 50% of people that take it. They took a random
sample of people taking the medicine. They got a P-value
of 8.74 × 10^-4.
H0: p ≥ 50% (claim)
HA: p < 50%
Notice the P-value is in scientific notation.
8.74 × 10^-4 = 0.000874
P-value 0.000874 < sig level 0.01
Reject H 0
It is probably not true that the medicine helps at
least 50% of people that take it.
Data supports that it is less than 50%
Sample data was significantly different than
population percent (50%).
The random sample data is very unlikely to occur
by random chance.
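The decision rule used in Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `decide` is made up for this sketch, and the P-values are the ones given in the two examples.

```python
def decide(p_value, significance_level=0.05):
    """Apply the rule: reject H0 when P-value <= significance level."""
    if p_value <= significance_level:
        return "Reject H0"
    return "Fail to reject H0"

# Example 1: P-value 0.216 at the 5% significance level.
example_1 = decide(0.216, 0.05)

# Example 2: P-value 8.74 x 10^-4 at the 1% significance level.
# Python reads scientific notation directly: 8.74e-4 is 0.000874.
example_2 = decide(8.74e-4, 0.01)

print(example_1)
print(example_2)
```

Notice the rule never returns "Accept H0". A high P-value only means we cannot rule out random chance.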
Go over the P-value Diagram Hyp Test Notes (PDF online only) and the P-value table below
Hypothesis Test Notes
P-Value, Test Statistic & Simulation Summary Table
 | Large Test Statistic (more than 2 standard errors) OR Small P-value (close to zero) OR Sample Value in Tail (when simulating Ho) | Small Test Statistic (about 1 standard error or less) OR Large P-value (over 10%) OR Sample Value not in Tail (when simulating Ho)
Is sample data significant? | Significant | Not Significant
Could the sample data happen by random chance? | Very unlikely | Could happen
Reject Ho or Fail to Reject Ho? | Reject Ho | Fail to Reject Ho
Is there evidence? | Yes. Evidence | No evidence
Math 140 Activity#5
Exploring the Meaning of P-value
For each of the following problems:
a) Write the null and alternative hypothesis.
b) Use the p-value and the significance level to decide whether we should reject the
null hypothesis or fail to reject the null hypothesis.
c) Write a detailed sentence describing the true meaning of the p-value in the
context of the problem.
d) Was the sample data significantly different than the population value?
e) How likely is it that the sample data happened by random chance?
1. According to a CNN report, besides cell phones, 93% of Americans also own a traditional
phone. But has that percentage decreased as more and more Americans opt to only use a cell
phone and throw away their traditional phones? A random sample of 500 Americans was taken
and 454 of them owned a traditional phone. The p-value was found to be 0.0269. Use a 5%
significance level.
2. According to a recent Newspaper article, people in California spend 1.25 hours a day eating
and drinking. Suppose we want to test the claim that the number of hours spent eating and
drinking is really 1.25 hours. In order to do this, we take a random sample of 400 people in
California. The average number of hours for the sample was 1.22 and a p-value of 0.248 was
found. Use a 10% significance level.
3. More and more Americans are becoming financially sound and opting to not own a credit card.
According to an article in USA Today, 74% of Americans still have at least one credit card. But
this claim seems a little on the low side. In order to verify the claim that more than 74% of
Americans have a credit card, a random sample of 900 Americans was taken and 76% of them
owned a credit card and a p-value of 0.0857 was found. Use a 5% significance level.
4. It has long been thought that normal body temperature is really 98.6 degrees Fahrenheit. A
recent study is now claiming that normal body temperature is really lower than 98.6 degrees. A
random sample of 10,000 adults worldwide was taken. The average temperature was 98.2
degrees and a p-value of 0.0023 was found. Use a 1% significance level.
Go over Conclusion Notes (PDF online only)
Math 140 Activity#6
Writing Conclusions for Hypothesis Tests
Directions: For each of the following claims:
a) Find the null and alternative hypothesis.
b) If the null hypothesis was rejected, write a detailed conclusion statement.
c) If we failed to reject the null hypothesis, then write a detailed conclusion
statement. (Note: You will be writing two detailed conclusions for each
problem.)
1. “The hospital claims that less than 4% of people who received the medication showed
symptoms of side effects.”
2. “We think that the average height of women is more than 63.5 inches.”
3. “Latest polls show that the republican candidate should receive about 54% of the
vote.”
4. “The average electrically powered car weighs less than 2000 pounds.”
5. “The medication Toprol is showing real promise in treating migraines. The majority
(more than 50%) of patients taking Toprol have seen an improvement in their
Migraine symptoms.”
Math 140 Hypothesis Test Notes
1 population mean and proportion (percentage)
Steps for any hypothesis test
Null /Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion
StatCrunch - 1 population percentage (proportion) Hypothesis Test
Stat => Proportion-stat => 1 sample => “with data” or “with summary”
=> Hypothesis Test
Hypothesized proportion (percentage)? p = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, ≠ (left tail, right tail or two tail)
StatCrunch - 1 population mean (average) Hypothesis Test
Stat => T-stat => 1 sample => “with data” or “with summary”
=> Hypothesis Test
Hypothesized mean? μ = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, ≠ (left tail, right tail or two tail)
“Histogram with mean marker”
Assumptions for 1 population proportion (clarification)
Confidence Interval Assumptions (We don’t know the population percentage p so
we have to use the sample percentage p̂ )
(Random and at least 10 successes and at least 10 failures in sample data)
n p̂ = x ≥ 10
n (1 − p̂) = n − x ≥ 10
Hypothesis Test Assumptions (Someone has made a guess at the population
percentage p. This formula is also useful before you collect the data to see if you
are likely to get 10 successes and 10 failures. If you are not likely to get 10, collect
more data.)
(Random and at least 10 expected successes and at least 10 expected failures)
n p ≥ 10
n (1 − p) ≥ 10
Significance Levels: (# to compare the P-value with – Is my P-value low enough?)
• 1%, 5%, 10%
• Most Common: 5%
• If the P-value is lower than the significance level, then the sample data was
significantly different and significantly disagrees with the null hypothesis.
• If the P-value is higher than the significance level, then the sample data was
not significantly different and does not significantly disagree with the null
hypothesis.
1 Population Proportion Example
Ex) A doctor thinks that the percent of people in a small rural community that
have a certain infection is about 6%. He took some random sample data. He
interviewed 175 people and found that 13 of them had the infection. Test the
doctor’s claim that exactly 6% have the infection. (Use a 5% significance level.)
Steps for any hypothesis test
Null /Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion
H0: p = 0.06 (claim)
HA: p ≠ 0.06
Type of hypothesis test? 1 population proportion test (two tail)
Assumptions?
Notice that the sample data was random and had 13 successes and 175 − 13 = 162
failures, so it does meet the assumptions.
Here are the expected successes and expected failures based on a population
percentage of 6% and a sample size of 175. Notice the expected successes were
barely over 10. Before collecting data, we would have advised the doctor to use a
larger sample to be more confident of getting at least 10 successes.
n × p = 175 × 0.06 = 10.5
n × (1 − p) = 175 × (1 − 0.06) = 175 × 0.94 = 164.5
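The "at least 10" checks above can be verified with a short sketch. The helper name `at_least_ten` is made up for this illustration. The numbers are the doctor's data from this example.

```python
def at_least_ten(n, p):
    """Return the counts n*p and n*(1-p), and whether both are at least 10."""
    successes = n * p
    failures = n * (1 - p)
    return successes, failures, successes >= 10 and failures >= 10

n = 175
# Expected counts, using the hypothesized population proportion p = 0.06.
exp_s, exp_f, exp_ok = at_least_ten(n, 0.06)
# Sample counts, using the sample proportion p-hat = 13/175.
obs_s, obs_f, obs_ok = at_least_ten(n, 13 / n)

print(round(exp_s, 1), round(exp_f, 1), exp_ok)
print(round(obs_s), round(obs_f), obs_ok)
```

Running the same check with a planned sample size before collecting data shows whether the sample is likely to be large enough.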
StatCrunch - 1 population percentage (proportion) Hypothesis Test
Stat => Proportion-stat => 1 sample => “with data” or “with summary”
=> Hypothesis Test
Hypothesized proportion (percentage)? p = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, ≠ (left tail, right tail or two tail)
Hypothesis test results: (from StatCrunch)
Proportion | Count | Total | Sample Prop. | Std. Err. | Z-Stat | P-value
p | 13 | 175 | 0.074285714 | 0.017952318 | 0.7957 | 0.4262
Test stat z = 0.796
Test Stat Sentence: The sample percent 7.4% was only 0.796 standard errors
above the population value 6%.
• The Z test statistic is small (it needs to be around 2 or higher to be significant)
• This tells us that the sample value (7.4%) was close to the population value
(6%)
• Not a significant difference
P-value = 0.426
P-Value Sentence: If Ho is true, and the population percent really is 6%, we had a
42.6% probability of getting the sample percentage of 7.4% or more extreme by
random chance.
• P-value was very large
• This tells us that the sample data could have happened by random chance
(sampling variability).
• Sample value of 7.4% is not significantly different than the population
value of 6%.
• There is not a significant disagreement between the sample data and the
null hypothesis
P-value (0.426) > sig level 0.05
Fail to reject Ho
Conclusion?
There is not significant sample evidence to reject the claim that 6% of the
population have the infection.
(The doctor might be correct, but we cannot be sure.)
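For readers who want to see where the StatCrunch numbers come from, here is a rough sketch of the calculation. It assumes the usual one-proportion z test, with the standard error built from the null value p = 0.06 and a two-tailed P-value from the normal curve. The function name is made up for this sketch, and StatCrunch's internal method is not shown in these notes.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p_null):
    """Two-tailed z test for one proportion, using the null value in the SE."""
    p_hat = x / n
    std_err = sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / std_err
    # Two-tailed P-value: area in both tails beyond |z| under the normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, std_err, z, p_value

p_hat, std_err, z, p_value = one_proportion_z_test(x=13, n=175, p_null=0.06)
print(f"sample prop = {p_hat:.4f}, std err = {std_err:.4f}")
print(f"z = {z:.4f}, P-value = {p_value:.4f}")
```

Under these assumptions the results agree with the StatCrunch output above to about three decimal places.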
1 Population Mean Example
Ex 2. Test the claim that Math 140 students at COC work less than 25 hours per
week on average. (Use a 10% significance level.)
(Our class made this claim.) (Use the Fall 2015 math 140 survey data)
Steps for any hypothesis test
Null /Alt hypothesis
Check Assumptions
Put sample data into StatCrunch (P-value and Test Statistic)
Reject Ho or Fail to reject Ho
Conclusion
HA: μ < 25 (claim)
H0: μ ≥ 25
1 population mean test (Left tailed test)
Population Mean Average Assumptions?
• Random?
• Sample size at least 30 or bell shaped?
The sample was not random, but it was an incomplete census so we can assume it
represents Math 140 students relatively well.
The data was skewed (not bell shaped), but because the sample size was 331 (over
30) it does meet the “at least 30 OR bell shaped” criterion.
StatCrunch - 1 population mean (average) Hypothesis Test
Stat => T-stat => 1 sample => “with data” or “with summary”
=> Hypothesis Test
Hypothesized mean? μ = ??? (the population # in Ho and Ha)
Alternative Hypothesis? <, >, ≠ (left tail, right tail or two tail)
“Histogram with mean marker”
Hypothesis test results: (from StatCrunch)
Variable | Sample Mean | Std. Err. | DF | T-Stat | P-value
Hours Work | 21.265861 | 0.8851419 | 330 | -4.2187 | <0.0001
Sig level = 10% or 0.1
Test Stat T = -4.219
Test Stat Sentence: The sample mean was 4.219 standard errors below the
population mean of 25 hours.
• 4.2 is a lot for a z-score or t-score (way over 2).
• There was a significant difference between the sample mean (21.3 hours) and
the population mean (25 hours).
P-value ≈ 0 (< 0.0001)
P-Value Sentence: If Ho is true and Math 140 students work 25 hours per week on average,
then there was about a 0 probability of getting the sample data or more extreme by
random chance.
• The sample data is very unlikely to have happened by random chance (sampling variability).
• Sample data is significantly different than the population value.
• Sample data significantly disagrees with the null hypothesis.
P-value (≈ 0) < sig level (0.1 or 10%)
Reject Ho
Conclusion: There is significant sample evidence to support the claim that Math
140 students at COC work less than 25 hours per week on average.
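The T test statistic in this example can be reproduced from the StatCrunch summary numbers. The sketch below is an illustration only, and the function name is made up. As an assumption, it approximates the left-tail P-value with the normal curve instead of the t curve (reasonable here because there are 330 degrees of freedom), so the P-value is approximate.

```python
from statistics import NormalDist

def t_statistic(sample_mean, null_mean, std_err):
    """Number of standard errors the sample mean is from the null value."""
    return (sample_mean - null_mean) / std_err

t = t_statistic(sample_mean=21.265861, null_mean=25, std_err=0.8851419)

# Assumption: with 330 degrees of freedom the t curve is close to the normal
# curve, so the left-tail area is approximated with the normal distribution.
approx_p_value = NormalDist().cdf(t)

print(f"t = {t:.4f}")
print(f"approximate P-value below 0.0001: {approx_p_value < 0.0001}")
```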
Math 140 Activity#7
Hypothesis Tests for One Population Means
Directions: Use Statcrunch to perform the hypothesis tests. Make sure to give the null and
alternative hypothesis, check the assumptions, give the t test statistic and p-value, state
whether or not you reject the null hypothesis, and a conclusion. Write a sentence explaining
the meaning of the test statistic and state whether the sample value was significantly different
than the population value or not. Write a sentence explaining the meaning of the p-value and
determine if the sample data was likely to happen by random chance or not.
1. The manager at a local Starbucks wants to make sure that customers wait less than 4 minutes
from the time they order to the time that they pick up their coffee. In order to test this, twenty
random customers were selected and the staff measured the number of minutes between when
the person ordered and when their drink was ready. The sample mean was 2.870 minutes and
the sample standard deviation was 1.379 minutes. Here is a histogram of the twenty wait times.
Does this data meet the assumptions necessary to perform a hypothesis test? If so, use a 1%
significance level to test the claim that the average wait time is less than 4 minutes.
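[Histogram of C1 (the twenty wait times): horizontal axis C1 from 0 to 6 minutes, vertical axis Frequency from 0 to 5. The bar heights did not survive in this transcript.]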
2. Redwood trees are the tallest plants on Earth. California is famous for its giant Redwood
trees. But just how tall are they? A random sample of 47 California Redwood trees was taken
and their heights measured. (This was not easy by the way.) The sample mean average height
was 248 feet with a standard deviation of 26 feet. Does this data meet the assumptions
necessary to perform a hypothesis test? If so, use a 5% significance level to test the claim that
Redwood trees have an average height greater than 240 feet.
3. Maria is planning to attend UCLA. She is curious what the average age of UCLA students is.
Since most students that attend UCLA are in their 20’s yet there are also students up to 70 years
old, the population is positively skewed. The college conducted a random sample of 65 students
and found that the sample mean was 29.0 years old with a standard deviation of 5.2 years. Does
this data meet the assumptions necessary to perform a hypothesis test? If so, use a 10%
significance level to test the claim that the average age of students at UCLA is 30 years old.
4. Mike wants to know the average price of a hamburger. So he randomly selects 24
restaurants and records the price of a regular hamburger. The sample mean price was
$3.88 and the sample standard deviation was $1.14. A histogram of the data is below. Does this
data set meet the assumptions necessary to perform a hypothesis test? If so, use a 10%
significance level to test the claim that the average price of a hamburger is greater than $3.50.
[Histogram of C1 (the 24 hamburger prices): horizontal axis C1 from $2.00 to $6.00, vertical axis Frequency from 0 to 9. The bar heights did not survive in this transcript.]
For #5-7, use the Math 140 Survey data from Fall 2015 and StatCrunch to perform the following
hypothesis tests. Use a 5% significance level for all of the problems. Make sure to make a
histogram of the data and check if the data set meets the assumptions necessary to perform the
hypothesis test.
5. Test the claim that the average age of Math 140 students is higher than 21 years old.
6. Test the claim that the average weight of Math 140 students is less than 160 pounds.
7. Test the claim that the average height of Math 140 students is 64 inches.
Math 140 Activity#8
Hypothesis Testing for One Proportion
Directions: Use StatCrunch to perform the hypothesis tests. Make sure to give the null and
alternative hypothesis, check the assumptions, give the z test statistic and p-value, state
whether or not you reject the null hypothesis, and a conclusion. Write a sentence explaining
the meaning of the test statistic and state whether the sample value was significantly different
than the population value or not. Write a sentence explaining the meaning of the p-value and
determine if the sample data was likely to happen by random chance or not.
Notes:
• Use the formulas n p ≥ 10 and n (1 − p) ≥ 10 when checking assumptions.
• If the problem only gives the sample percent, you will need to calculate the number of
successes (x value) using the formula x = p̂ n before you can enter the data into StatCrunch.
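The conversion from a sample percent to a count of successes is a one-line calculation. The sketch below uses a made-up helper name, with the sample percent and sample size taken from problem 2 of this activity. The result is rounded because StatCrunch needs a whole number of successes.

```python
def successes_from_percent(sample_percent, n):
    """Convert a sample percent (15 means 15%) to a whole number of successes."""
    return round(sample_percent / 100 * n)

# From problem 2: 15% of a random sample of 260 course materials were digital.
x = successes_from_percent(15, 260)
print(x)  # 39
```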
1. The United States has the highest teen pregnancy rate in the industrialized world. The
Centers for Disease Control (CDC) says that as of 2011, 33% of girls get pregnant before the age
of 20. We are wondering if the teen pregnancy rate is even higher than what the CDC claims.
A random sample of 400 girls is taken. Of the 400 girls randomly selected, 144 of them were
pregnant before the age of 20. (Use a 5% significance level.)
2. Campus bookstores have increased the number of digital textbooks this school year, as
students weaned on Facebook and iPads seek virtual alternatives to heavy tomes. Digital
textbooks are projected to account for approximately 13% of course materials sold by the fall
of 2012, compared with just 3 percent of the $5.85 billion sold last year, according to the
National Association of College Stores. How accurate is the claim of 13%? Is it too high or
too low an estimate? In a random sample of 260 college course materials, 15% of the sample
course materials were digital. But is this sample percentage significant enough to contradict
the claim that the true population percentage is 13%? (Use a 10% significance level.)
3. Childhood obesity has more than tripled in the past 30 years. The percentage of children aged
6–11 years in the United States who were obese increased from 7% in 1980 to nearly 20% in
2008. If this trend continues we can expect that the percent of young children that are obese
in 2012 to be significantly greater than 20%. In order to test this claim, a random sample of
800 children in the U.S. was taken and 179 of them were found to be obese. (Use a 5%
significance level.)
4. About 1 in 3 U.S. adults (an estimated 68 million) have high blood pressure, which
increases the risk for heart disease and stroke, leading causes of death in the United States.
High blood pressure is called the "silent killer" because it often has no warning signs or
symptoms, and many people don't realize they have it. That's why it's important to get your
blood pressure checked regularly. But is the rate of U.S. adults really this high? Another web
site claims that the true percentage of U.S. adults with high blood pressure is actually
dramatically lower than 1 in 3 (33.3%). To test this claim we randomly selected 500 adults
across the U.S. and found that 165 of them had high blood pressure. (Use a 1% significance
level.)
(#5-6) You will be using the “Math 140 Survey Data” from Fall 2015 to test the following claims.
The survey was taken in all Math 140 classes, so though it is not random, it is an attempt at a
census, so you can assume it represents the population. In StatCrunch, go to Proportion Stats, 1
Sample, and With Data.
5. Test the claim that more than 50% of Math 140 students are female. You will need to get the
gender data and paste it into Statcrunch. Type in “Female” in the Success box.
6. Test the claim that exactly 1/3 (33.3%) of Math 140 students take their class at the Canyon
Country Campus. You will need to get the campus data and paste it into Statcrunch. Type in
“Canyon Country” in the Success box.
Hypothesis Test Notes
Type 1 and Type 2 Errors
Sampling Variability can sometimes really mess up a hypothesis test. When that
happens, there can be severe consequences. Type 1 and Type 2 errors occur when
the sample data is not reflective of the population and gives us a wrong view about
the population.
Type 1 Error (Thinking the alternative hypothesis HA is correct when it is not.)
• Rejecting H0 by mistake
• A bad random sample gives a low P-value that is not reflective of the
population. The person analyzing the data then rejects H0 and supports HA
by mistake when, in actuality, H0 is correct.
Type 2 Error (Thinking the null hypothesis H0 might be correct when it is not.)
• Failing to reject H0 by mistake
• A bad random sample gives a high P-value that is not reflective of the
population. The person analyzing the data then fails to reject H0 by mistake
(thinks H0 might be correct) when, in actuality, H0 is wrong and HA is
correct.
How to stop a Type 1 Error? Lower the significance level!!
• The significance level (alpha level) is the probability of a type 1 error when H0 is true.
So to limit the chances of a type 1 error, lower the significance level, for example from
5% to 1%.
• Remember, for a fixed sample size, Type 1 and Type 2 are on a see-saw. As one goes up the other
goes down. If you decrease the significance level from 5% to 1%, the
probability of type 2 error (beta level) will increase.
How to stop a Type 2 Error? Increase the sample size!! (Collect more sample data)
• Remember, for a fixed sample size, Type 1 and Type 2 are on a see-saw. As one goes up the other
goes down. If you increase the significance level from 5% to 10%, the
probability of type 1 error will increase, but the probability of type 2 error
(beta level) will decrease.
• Increasing the significance level is sometimes not an option, especially when
a type 1 error is really bad. So instead of increasing the significance level,
increase the sample size. More data results in a more powerful test and a
lower probability of type 2 error (decreased beta level).
Significance levels and type 1 and type 2 errors
• 5% significance level (95% confidence level) is a good balance between type
1 and type 2 errors. Both are relatively low.
• 1% significance level (99% confidence level) will have a lower probability of
type 1 error but a higher probability of type 2 error.
• 10% significance level (90% confidence level) will have a higher probability
of type 1 error but a lower probability of type 2 error.
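The see-saw between the two error types, and the effect of sample size, can be illustrated with a rough calculation. The sketch below approximates the probability of a type 2 error (beta) for a left-tailed one-proportion z test using the normal curve. Everything in it is an assumption made for illustration and is not from these notes: the function name, the claimed value of 4%, the true value of 3%, and the sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def type_2_error_probability(p_null, p_true, n, alpha):
    """Approximate beta for a left-tailed one-proportion z test (H0: p >= p_null)."""
    normal = NormalDist()
    se_null = sqrt(p_null * (1 - p_null) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Reject H0 when the sample proportion falls below this cutoff.
    cutoff = p_null + normal.inv_cdf(alpha) * se_null
    # Beta = chance the sample proportion lands at or above the cutoff
    # (so we fail to reject) when the true proportion is p_true.
    return 1 - normal.cdf((cutoff - p_true) / se_true)

# Hypothetical values: claimed p = 4%, true p = 3%.
for alpha in (0.01, 0.05, 0.10):
    beta = type_2_error_probability(0.04, 0.03, n=1000, alpha=alpha)
    print(f"n=1000, alpha={alpha:.2f}: beta is about {beta:.2f}")

beta_big_n = type_2_error_probability(0.04, 0.03, n=4000, alpha=0.05)
print(f"n=4000, alpha=0.05: beta is about {beta_big_n:.2f}")
```

With the sample size held fixed, beta shrinks as alpha grows. Raising the sample size lowers beta without changing alpha.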
Summary of Type 1 and Type 2 Errors
• Type 1 error is believing that the alternative hypothesis is correct
when it is not.
• Limit the chances of type 1 error? Decrease the significance level
(alpha level)
• Type 2 error is believing that the null hypothesis is correct when it is not.
• Limit the chances of type 2 error? Increase the sample size
Examples
When exploring type 1 and type 2 errors, the key is to write down the null and
alternative hypothesis and the consequences of believing the null is true and the
consequences of believing the alternative is true.
Remember above all: Type 1 and Type 2 errors are MISTAKES!!
Example
A pharmaceutical company wants to sell a new medicine in the U.S. To get
approval they need to convince the FDA that the medicine is safe and has few side
effects. If serious side effects happen in 4% or more of the people taking the
medicine, then the FDA may not approve sale of the medicine in the U.S. If serious
side effects happen in less than 4% of people taking the medicine, then the FDA
may approve sale of the medicine in the U.S.
What is the null and alternative hypothesis?
Ho: p ≥ 4% (FDA does not allow medicine to be sold in U.S.)
Ha: p < 4% (FDA does allow medicine to be sold in U.S.)
Describe the consequences of a type 1 error and what we could do to limit the
probability of a type 1 error.
Because of sampling variability, the random sample gave a low P-value and we rejected the null
hypothesis by mistake. So we think that the alternative hypothesis is correct when
it is not. That would mean that the FDA approved sale of the medicine by mistake.
The medicine causes serious side effects in a lot of people. People could die or
become very sick. They may sue the pharmaceutical company or the FDA.
To make this less likely, lower the significance level to 1%.
Describe the consequences of a type 2 error and what we could do to limit the
probability of a type 2 error.
Because of sampling variability, the random sample gave a high P-value and we failed to reject the
null hypothesis by mistake. So we think that the null hypothesis might be correct when it
is not. That would mean that the FDA blocked the sale of a good medicine that
rarely causes any side effects. Patients will be deprived of a good medicine and the
company will lose a lot of money in potential profits.
To make this less likely, increase the sample size.
Math 140 Hypothesis Testing Activity 9
Type I and Type II Errors
Directions: For each of the following problems, find the null and alternative hypothesis. Then write a
description of a type I error and the consequences of that error in the context of the problem. Then write
a description of a type II error and the consequences of that error in the context of the problem.
1. A new medication has been developed to help alleviate the symptoms of stress. In doing sample
testing, the company that created the medicine found that it seems to work fine on men, but not so well
on women. The FDA does not want to approve sale of the medicine in the U.S. if it is true that the
percent of women that the medicine helps is significantly less than the percent of men that the medicine
helps.
2. The Acura car company is debating whether to recall its latest sedan because of a malfunction in its
airbags. Acura executives think that the defect rate is probably low, but if the airbags malfunction and do
not open in 2% or more of crashes, then they will need to put out a general recall.
3. Mike and his advertisement team have created an advertisement plan for a new flavor of Pepsi.
Right now, approximately 4% of soda drinkers are purchasing this type of Pepsi. Mike needs to show his
bosses that his advertisement plan will increase the percentage of soda drinkers purchasing this new
flavor.
4. What could we do to decrease the chances of a type I error occurring?
5. What could we do to decrease the chances of a type II error occurring?
6. If we increase the significance level from 5% to 10% what will happen to the probabilities for type I
and type II errors?
7. If we decrease the significance level from 5% to 1% what will happen to the probabilities for type I
and type II errors?
8. What significance level achieves a good balance between type I and type II errors?
Hypothesis Test Notes
Two Population Tests
We sometimes would like to know if a population value (a mean or a percentage) is
larger or smaller in one population than in another. This is a two population hypothesis test.
Label which group is population 1 and which is population 2!!! (It does not matter
which group you pick to be population 1 or 2, but however you label it, make sure
you put the data into StatCrunch in that order!)
Key Question?????
We are comparing two populations by looking at sample data. Remember, like any
hypothesis test, we have to rule out sampling variability (random chance) to be
able to reject the null hypothesis.
Key Question: Why are my two samples different?
Option 1: (Random Chance) The populations are the same, and the samples are
different because all random samples are different.
Option 2: (Populations are different) The samples are different because the
populations are different.
In a two population hypothesis test, to determine if populations are different, we
first must rule out option 1 (random chance).
How can we rule out random chance???
Important Note: You cannot just look at the two sample values. Remember
sometimes a 10 pound difference is a lot and sometimes it is not a lot. Sometimes
a 3% difference is a lot and sometimes it is not a lot.
Test Statistic, P-value, or Simulation to the rescue!!
We are able to rule out random chance when the samples are significantly different
and the probability of that difference happening by random chance is very low.
• Large Test Statistic (T-stat or Z-stat close to +2 or higher or close to -2 or
lower.)
• Low P-value (P-value is close to zero or less than the significance level)
• Simulate what samples would look like when the populations are the same.
(If our sample difference is in the tail, then our sample difference is
significant and the probability of that sample difference (P-value) or more
extreme is very low.)
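One way to picture the simulation idea is a shuffle (randomization) sketch. If the populations are the same, the group labels do not matter, so we can mix the two samples together, deal them back out at random many times, and see how often the shuffled difference is at least as large as the one we observed. This is an assumption about how such a simulation could be done, not the simulation tool used in class. The function name and the data are made up for illustration.

```python
import random

def average(values):
    return sum(values) / len(values)

def simulated_p_value(group_1, group_2, num_shuffles=2000, seed=1):
    """Estimate a two-tailed P-value for a difference in sample means."""
    rng = random.Random(seed)
    observed = abs(average(group_1) - average(group_2))
    pooled = list(group_1) + list(group_2)
    at_least_as_extreme = 0
    for _ in range(num_shuffles):
        # If the populations are the same, group labels are interchangeable,
        # so shuffle the pooled data and split it into two new "samples".
        rng.shuffle(pooled)
        new_1 = pooled[:len(group_1)]
        new_2 = pooled[len(group_1):]
        if abs(average(new_1) - average(new_2)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_shuffles

# Hypothetical data, made up for illustration only.
group_1 = [12, 15, 14, 10, 13, 16, 11, 14]
group_2 = [22, 25, 21, 24, 23, 26, 20, 24]
p_value = simulated_p_value(group_1, group_2)
print(p_value)
```

When the observed difference sits far out in the tail of the shuffled differences, the estimated P-value is close to zero.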
Setting up your two population hypothesis test
Step 1: Label which group is population 1 and which is population 2
and stick to it.
For example:
Population 1: women
Population 2: men
Step 2: Null and Alternative Hypothesis
(There are various ways of writing the null and alternative hypothesis, they are all
equally correct and you can use any of them)
Example Claim: Mean average salary for women (μ1) is lower than the mean
average salary of men (μ2)
H0: μ1 ≥ μ2
HA: μ1 < μ2 (claim)
Subtracting μ2 from both sides gives the version below. Remember, saying group 1 is lower than
group 2 is the same as saying the difference (group 1 – group 2) is negative.
H0: μ1 − μ2 ≥ 0
HA: μ1 − μ2 < 0 (claim)
If the data is matched pair (husband and wife or same person measured twice),
then you will sometimes see μ1 − μ2 written as μd
H0: μd ≥ 0
HA: μd < 0 (claim)
Example Claim: The percentage of women (p1) is higher than the percentage of
men (p2)
H0: p1 ≤ p2
HA: p1 > p2 (claim)
Subtracting p2 from both sides gives the version below. Remember, saying group 1 is higher than
group 2 is the same as saying the difference (group 1 – group 2) is positive.
H0: p1 − p2 ≤ 0
HA: p1 − p2 > 0 (claim)
What does this mean?
$H_0: \mu_d = 0$
$H_A: \mu_d \neq 0$ (claim)
Think:
Think:
So
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$ (claim)
What does that mean?
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \neq \mu_2$ (claim)
$H_0: \mu_d = 0$
$H_A: \mu_d \neq 0$ (claim)
means that the two populations are the same or different.
Assumptions
2 population mean average (Check these twice)
• Random
• At least 30 or bell shaped (normal)
• Matched Pair or Independent? Remember matched pair is a one-to-one
pairing (not just something in common)
2 population proportion (percentage) (Check these twice)
• Random
• At least 10 successes
• At least 10 failures
• Two groups should be independent
Test Statistics
1 population test statistic sentence: the number of standard errors that the
sample value is above or below the population value.
2 population test statistic sentence: the number of standard errors that the
sample value from group 1 is above or below the sample value from group 2.
Formula for two population test statistic (Z or T)
$\text{test statistic} = \dfrac{\text{sample value 1} - \text{sample value 2}}{\text{standard error}}$
Example: group 1: women , group 2: men
Comparing the percentage of women to the percentage of men.
Test Statistic Z = +2.48
Sample percentage from group 1 (women) is 2.48 standard errors above the
sample percentage from group 2 (men).
Example: group 1: Valencia High School , group 2: Saugus High School
Compare the mean average SAT scores
Test Statistic T = -1.06
Sample mean average for group 1 (Valencia) was 1.06 standard errors below the
sample mean average for group 2 (Saugus).
StatCrunch Directions (Alternate null and alternative with “zero”)
Two Population proportion (percentage)
Stat => Proportion-Stats => Two Sample => with data or with summary
Two Population mean average (Independent groups)
Stat => T-Stats => Two Sample => with data or with summary
Two Population mean average (matched pair with raw data)
Stat => T-Stats => Paired => columns?
Two Population mean average (matched pair with summary data $\bar{d}$, $s_d$, $n$)
Stat => T-Stats => 1 sample =>with summary =>
put in mean, standard deviation, sample size
Pool or Not to Pool? (That is the question)
1. Pooling in 2 population proportion problems (categorical data)
P-pooled is combining the # of successes and the sample sizes of your two groups
into one large sample: $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
Note: You are allowed to pool the two sample percentages if the population
percentages are equal.
 In confidence intervals we do not know if the populations are the same or
not. So for 2 population proportion confidence intervals: Do not pool.
 In two population proportion hypothesis tests, it is OK to Pool, because you
are assuming the population percentages are the same in null hypothesis.
(Some programs ask if you want to pool for two population proportion, but
StatCrunch does this automatically. It automatically pools for the 2 population
proportion hypothesis test standard error and automatically does not pool for
confidence interval standard error. You will see a slight difference in the standard
error for hypothesis test versus confidence interval.)
2. Pooling the variances in 2 population mean average problems.
(Quantitative data)
You should not pool the sample variances unless you are sure the population
variances are equal. Since we rarely know the population variances, do not pool
the variances in StatCrunch.
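As a rough illustration of the pooling idea in item 1 above, here is a small Python sketch that computes the pooled proportion, the pooled standard error and the Z test statistic for a two population proportion hypothesis test. It uses the counts from the marijuana table in Activity 12 #2 (87 of 213 and 26 of 219). The function name two_proportion_z is made up for this example, and this is only a sketch of the usual pooled formula, not a description of what StatCrunch does internally.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, pooled proportion, pooled standard error) for H0: p1 = p2."""
    p1_hat = x1 / n1                      # sample proportion, group 1
    p2_hat = x2 / n2                      # sample proportion, group 2
    p_pooled = (x1 + x2) / (n1 + n2)      # combine successes and sample sizes
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se            # standard errors between the two samples
    return z, p_pooled, se


# Group 1: marijuana users, Group 2: non-marijuana users (Activity 12 #2 table)
z, p_pooled, se = two_proportion_z(87, 213, 26, 219)

# Right tail P-value from the standard normal curve
p_value_right = 1 - NormalDist().cdf(z)

print("pooled proportion:", round(p_pooled, 4))
print("Z test statistic:", round(z, 2))
print("right tail P-value below 0.0001:", p_value_right < 0.0001)
```

With these counts the sketch gives a Z test statistic of about 6.85, which agrees with the value reported in the notes for Act 12 #2.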
Act 11 #1 (Matched Pair with summary data)
Group 1: After ACT scores
Group 2: Before ACT scores
$H_A: \mu_1 > \mu_2$ (claim)
$H_0: \mu_1 = \mu_2$
Note: Alternate way of writing null and alternative
$H_A: \mu_1 - \mu_2 > 0$ (claim)
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_d > 0$ (claim)
$H_0: \mu_d = 0$
Two Population mean average (matched pair with summary data $\bar{d}$, $s_d$, $n$)
Stat => T-Stats => 1 sample =>with summary =>
put in mean, standard deviation, sample size
T test statistic = +2.9166
The sample mean of the after scores was 2.92 standard errors above the sample mean of
the before scores.
After scores are significantly higher than before scores (class is effective)
P-value = 0.0044
If Ho is true, then there is a 0.0044 probability of getting the sample data (sample
difference) or more extreme by random chance.
(Unlikely to happen by random chance, so Ho is probably wrong.)
P-value (0.0044) < sig level (0.05)
Reject Ho
Conclusion: There is significant sample evidence to support the claim that the ACT
prep class is effective. (After > Before)
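The numbers in Act 11 #1 can be checked with a short Python sketch. It assumes the summary data from the problem (mean difference 1.5, standard deviation 2.3, n = 20) and treats the matched pair test as a one sample T test on the differences. The helper names t_pdf and t_right_tail are invented for this sketch, and the right tail area is approximated numerically with Simpson's rule, so the P-value is an approximation and not StatCrunch's exact output.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Height of the Student t curve with df degrees of freedom at x."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t               # half the curve lies to the right of 0


# Summary data for the differences (after - before)
d_bar, s_d, n = 1.5, 2.3, 20

standard_error = s_d / sqrt(n)
t_stat = d_bar / standard_error            # compares the mean difference to 0
p_value = t_right_tail(t_stat, n - 1)      # right tail test, df = n - 1

print("T test statistic:", round(t_stat, 4))
print("approximate P-value:", round(p_value, 4))
```

The sketch should reproduce a T test statistic of about 2.9166 and a right tail P-value close to the 0.0044 shown above.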
Act 12/#2
Population 1: Marijuana
Population 2: Non-marijuana
$H_A: p_1 > p_2$ (claim)
$H_0: p_1 = p_2$
Note: in StatCrunch null and alternative
$H_A: p_1 - p_2 > 0$ (claim)
$H_0: p_1 - p_2 = 0$
Z test statistic = 6.85
Percentage of group 1 (marijuana users) was 6.85 standard errors above the
percentage of group 2 (non-marijuana users)
Percent of marijuana users that use other drugs is significantly greater.
P-value = 0 (< 0.0001)
If Ho is true, there is close to 0 probability (less than 0.0001) of getting the sample data (sample difference)
or more extreme by random chance.
(Very unlikely to have happened by random chance. Population 1 is significantly different than
population 2.) Ho is probably wrong.
Reject Ho
There is significant sample evidence to support the claim that the percent of marijuana
users that use other illegal drugs is higher than the percent of non-marijuana users that
use other illegal drugs.
Math 140 Hypothesis Test Activity#10
Using Simulation to Understand Hypothesis Tests
Difference of Means and Difference of Proportions
Notes on Using Simulation to compare two groups
The key to two population hypothesis testing is to see if one group’s sample data significantly
disagrees with the other group’s sample data. This is difficult to do because of sampling
variability. Random samples almost always give different values, so the sample values for both
groups can be different yet not indicate that the populations are different.
The key question in hypothesis testing is the following:
Key Question: Could the sample data for both groups be different just because of sampling
variability (random chance)? Or are the sample values so significantly different that they cause us
to think that the population values may be different? (i.e. The sample difference is not what we
would expect by random chance!)
How can we answer this key question? We need to simulate a distribution based on the null
hypothesis. This is often called a randomized simulation or a “randomization technique”. We
need to simulate what a distribution should look like if the two groups are the same. Remember,
if the groups are the same then their population mean difference would be zero. When
simulating the difference between two groups, we simulate taking samples from populations
whose difference is zero.
3 options
• If our real sample data (not simulated) shows a difference that is significantly greater
than zero, that will give evidence that population 1 is greater than population 2.
• If our real sample data (not simulated) shows a difference that is significantly smaller
than zero, that will give evidence that population 1 is less than population 2.
• If our real sample data (not simulated) shows a difference that is close to zero, then that
may indicate that the populations are not significantly different.
Simulation and P-value: The key is that we will need simulation to determine what is significant
and what is not. Again, we will be looking at simulated P-values to determine this. The P-value is
the probability of getting the sample difference, or one more extreme, if the two populations are really the same. The
lower the P-value, the more evidence we will have that the two populations are different. The
higher the P-value, the less evidence we have of a significant difference between the groups.
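To make the randomization idea concrete, here is a minimal Python sketch of one common way to simulate the null hypothesis for a difference of proportions. It pools all outcomes together, shuffles them, and deals them back out to two groups of the original sizes, so any difference between the simulated groups is due to random chance alone. The counts are the smoker and non-smoker numbers from problem 2 of this activity (38 of 135 and 206 of 543). The function names, the number of repetitions and the seed are choices made for this sketch, and StatKey's own simulation may differ in its details.

```python
import random


def diff_in_proportions(x1, n1, total_successes, n2):
    """Group 1 proportion minus group 2 proportion, given group 1's successes."""
    return x1 / n1 - (total_successes - x1) / n2


def randomization_p_value_left(x1, n1, x2, n2, reps=5000, seed=140):
    """Estimate a left tail P-value by reshuffling the pooled outcomes."""
    rng = random.Random(seed)
    total_successes = x1 + x2
    # 1 = success (became pregnant), 0 = failure; group labels are ignored under Ho
    pool = [1] * total_successes + [0] * (n1 + n2 - total_successes)
    observed = diff_in_proportions(x1, n1, total_successes, n2)
    count_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pool)
        sim_x1 = sum(pool[:n1])            # successes dealt to simulated group 1
        sim_diff = diff_in_proportions(sim_x1, n1, total_successes, n2)
        if sim_diff <= observed:           # as low as the real difference, or lower
            count_as_extreme += 1
    return observed, count_as_extreme / reps


# Group 1: smokers, Group 2: non-smokers (subtracting smokers - nonsmokers)
observed_diff, p_value = randomization_p_value_left(38, 135, 206, 543)

print("observed difference:", round(observed_diff, 3))
print("estimated left tail P-value:", p_value)
```

The observed difference comes out to about -0.098, matching the value used in problem 2. The estimated P-value will change a little with the seed and the number of simulations, just as it does from run to run in StatKey.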
Directions: We will now look at the following problems and use StatKey at
www.lock5stat.com. Look on the right side of the screen where it says “Randomization
Hypothesis Tests” and click on the appropriate link (single mean, single proportion, difference
of means, difference of proportions). We will be focusing on difference of means and
difference of proportions in this activity.
1. Do men exercise more than women? Let’s test this claim. Go to the Lock website and click
on StatKey and “Randomization test for a difference of means”. In the upper left corner
change the box to “Male/Female Exercise Hours Per Week”. We took random data from 20
men and 30 women. Population 1 was men and population 2 was women, so the computer is
subtracting in the order of men – women.
a) What is the null and alternative hypothesis? Which is the claim? Is this a right tail, left tail,
or two tailed test?
b) Are the groups matched pairs or independent?
c) Does this problem meet the assumptions to do the test?
d) If gender does not matter and men and women work out the same, what would we expect
the difference between the groups to be?
e) The computer is making simulated samples from two populations that work out the same
number of hours. Will all the simulations have a mean difference of zero? Why not?
f) What was the sample mean difference in our real (original) sample data? Do you think this
is significant? Discuss why it is difficult to judge significance without simulation, P-value or a
Test Statistic.
g) Simulate taking samples of men and women a few thousand times. We want to know if our
original non-simulated difference of +3 hours was significant. Plug it in the bottom box. What
percent of simulations had a difference of +3 or higher? What is the P-value? Why did we
look at the values higher than +3 also and not just +3?
h) Finish the test. (Use a 5% significance level) Do you reject the null hypothesis or fail to
reject the null hypothesis? Write a conclusion that addresses the original claim (question).
2. Do women that don’t smoke have a greater chance of getting pregnant than those that
smoke? Let’s test this claim. Go to the Lock website and click on StatKey and “Randomization
test for a difference of proportions”. In the upper left corner change the box to “Get Pregnant
(by Smoker status)”. We took a random sample of 135 smokers and found that 38 became
pregnant. We also took a random sample of 543 non-smokers and found that 206 became
pregnant. Population 1 was smokers and population 2 was non-smokers, so the computer is
subtracting in the order of smokers – nonsmokers.
a) What is the null and alternative hypothesis? Which is the claim? Is this a right tail, left tail,
or two tailed test?
b) Does this problem meet the assumptions to do the test?
c) If smoking does not matter and the chance of getting pregnant is the same for smokers
and non-smokers, what would we expect the difference between the groups to be?
d) The computer is making simulated samples from two populations that have the same
population percentage. Will all the simulations have a difference of zero? Why not?
e) What was the sample percent difference in our real (original) sample data? Do you think
this is significant? Discuss why it is difficult to judge significance without simulation, P-value
or a Test Statistic.
f) Simulate taking samples of smokers and non-smokers. We want to know if our original
non-simulated percent difference of -0.098 was significant. Click on “left tail” and plug it in
the bottom box. What percent of simulations had a difference of -0.098 or less? What is the
estimated P-value? Why did we look at the values less than -0.098 also and not just -0.098?
g) Finish the test. (Use a 5% significance level) Do you reject the null hypothesis or fail to
reject the null hypothesis? Write a conclusion that addresses the original claim (question).
Math 140 Hypothesis Test Activity#11
Hypothesis Tests for Comparing Two Population Means
Directions: For each of the following problems:
a. Write the null and alternative hypothesis. Is this a right tailed, left tailed or two
tailed test? Is it two independent samples or matched pairs?
b. Check whether the problem meets the assumptions necessary to perform the
hypothesis test. List all the assumptions and how the problem meets the
assumptions or does not meet the assumptions.
c. Use Statcrunch and the information given in the problem to calculate
the t test statistic. Write a sentence explaining the meaning of the test statistic.
d. Use Statcrunch and the information given in the problem to calculate
the P-value. Write a sentence explaining the meaning of the P-value.
e. State whether you reject or fail to reject the null hypothesis and then write the
conclusion for the hypothesis test.
1. The ACT exam is used by many colleges to test the readiness of high school students for
college. Many high school students are now taking ACT prep classes. A local high school
offers an ACT prep class, but wants to know if it really helps. Twenty students were
randomly selected. They took the ACT exam before and after taking the ACT prep class.
For each student the difference between the after and before scores was measured (d =
after – before). The mean of the differences was 1.5 with a standard deviation of 2.3. A
histogram of the differences yielded a bell shaped distribution. Use a 5% significance level
to test the claim that the prep class was effective in raising ACT scores. Make sure to give
the null and alternative hypothesis, the test statistic, the p-value and a detailed conclusion.
2. Cotinine is an alkaloid found in tobacco and is used as a biomarker for exposure to
cigarette smoke. It is especially useful in examining a person’s exposure to second hand
smoke. A random sample of 32 non-smoking American adults was collected. These adults
were not smokers and did not live with any smokers. The average cotinine level for this
sample was 7.2 ng/mL with a standard deviation of 5.8 ng/mL. A second random sample of
35 non-smoking American adults was then collected. These adults did not smoke
themselves, but did live with one or more smokers. The average cotinine level for this
sample was 28.5 and had a standard deviation of 11.4. Use a 1% significance level to test
the claim that people that do not live with smokers have a lower cotinine level than those
people that do live with smokers. Make sure to give the null and alternative hypothesis, the
test statistic, p-value and a detailed conclusion. What does this tell us about the effects of
second hand smoke?
3. Open the Female Health Data set. Copy and paste the systolic and diastolic blood
pressure columns into Statcrunch. This is data from 40 randomly selected women
throughout the U.S. We want to explore the relationship between a woman’s systolic blood
pressure and her diastolic blood pressure. Use a 1% significance level to test the claim that
systolic blood pressure is higher than diastolic blood pressure. Make sure to give the null
and alternative hypothesis, the test statistic, the p-value and a detailed conclusion.
4. Now open the Male and Female Health Data set. Copy and paste the Male Cholesterol
and Female Cholesterol levels into Statcrunch. (You may want to rename the columns to
help distinguish between the two.) Use a 5% significance level to test the claim that the
cholesterol levels of men and women are different. Make sure to give the null and
alternative hypothesis, the test statistic, the p-value and a detailed conclusion.
5. Now open the Male and Female Health Data set. Copy and paste the Male Systolic Blood
Pressure and Female Systolic Blood Pressure into Statcrunch. (You may want to rename
the columns to help distinguish between the two.) Use a 5% significance level to test the
claim that the average Systolic Blood pressures for men and women are the same. Make
sure to give the null and alternative hypothesis, the test statistic, the p-value and a detailed
conclusion.
Math 140 Hypothesis Test Activity#12
Hypothesis Testing for Two Proportions
Directions: For each of the following problems:
a. Write the null and alternative hypothesis. Is this a right tailed, left tailed or two
tailed test?
b. Check whether the problem meets the assumptions necessary to perform the
hypothesis test. List all the assumptions and how the problem meets the
assumptions or does not meet the assumptions.
c. Use Statcrunch and the information given in the problem to calculate
the z test statistic. Write a sentence explaining the meaning of the test statistic.
d. Use Statcrunch and the information given in the problem to calculate
the P-value. Write a sentence explaining the meaning of the P-value.
e. State whether you reject or fail to reject the null hypothesis and then write the
conclusion for the hypothesis test.
1. The United States has the highest teen pregnancy rate in the industrialized world. In 2008
a random sample of 1014 teenage girls found that 326 of them were pregnant before the
age of 20. Has the proportion of teenage pregnancy increased as of 2012? In 2012, a
random sample of 1025 teenage girls was taken and 334 were found to be pregnant
before the age of 20. (Use a 10% significance level.)
2. While many Americans favor the legalization of marijuana, opponents of legalization argue
that marijuana may be a gateway drug. They believe that if a person uses marijuana, then
they are more likely to use other more dangerous illegal drugs. Use the table of random
data given below to test the claim that marijuana users have a higher percentage of other
drug use than non-marijuana users. (Hint: Use a 5% significance level.)
                          Uses Other Drugs    Total
Uses Marijuana                   87            213
Does not use Marijuana           26            219
3. An article recently suggested that the percent of women worldwide that abstain from
drinking alcohol is significantly higher than the percent of men that abstain from drinking. Use
the following sample data to test this claim. We took a random sample of 250 women and
found that 137 of them never drink. We took a random sample of 190 men and found that
66 of them never drink. (Use a 5% significance level.)
4. A health magazine claims that marriage status is one of the most telling factors for a
person’s happiness. Use the table below to test the claim that the percent of married people
that are unhappy is lower than the percent of single or divorced people that are unhappy. The
data was collected randomly. (Use a 10% significance level.)
                      Unhappy    Total
Married                  74       200
Single or Divorced       97       200
5. A tattoo magazine claimed that the percent of men that have at least one tattoo is greater
than the percent of women with at least one tattoo. Test this claim with the following sample
data. A random sample of 794 women found that 137 of them had at least one tattoo. A
random sample of 857 men found that 146 of them had at least one tattoo. (Use a 5%
significance level.)
6. A body mass index of 20-25 indicates that a person is of normal weight. A random sample of
745 men and 760 women found that 198 of the women and 273 of the men had a normal BMI
score. A fitness magazine claims that the percent of women with a normal BMI is lower than
the percent of men with a normal BMI. (Use a 10% significance level.)
7. A new medicine has been developed that treats high cholesterol. An experiment was
conducted and adults were randomly selected into two groups. The groups had similar gender,
ages, exercise patterns and diet. Of the 420 adults in the placebo group, 38 of them showed a
decrease in cholesterol. Of the 410 adults in the treatment group, 49 of them showed a
decrease in cholesterol. The drug company claims the medicine is effective, that is they claim
that the percent of adults that have lower cholesterol is greater in the treatment group than in
the placebo group. Use the sample data to test this claim. (Use a 1% significance level.)
Use the Math 140 Survey Data. Remember this was a census of non-PAL Math 140 students
from this semester. But what does this data allow us to say about the entire population of Math 140
students?
8. Test the claim that the percent of female 140 students is greater than the percent of
male 140 students.
9. Test the claim that the percent of 140 students that are Republican is different than the
percent of 140 students that are Democrat.
10. Test the claim that the percent of 140 students that use Instagram is less than the
percent of Math 140 students that use Facebook.
Hypothesis Test Notes
Chi-Squared Test Statistic & Goodness of Fit Test
Remember when comparing a sample percentage to a claimed population percentage we use a
1 proportion hypothesis test and a Z-test statistic.
When comparing a sample percentage for 1 group to a sample percentage from a second group
we use a 2 proportion hypothesis test and a Z-test statistic.
In both cases the Z-test statistic counts the number of standard errors one thing is from
another.
But what if we have more than 2 groups we are comparing? Or what if we are comparing
multiple variables in multiple groups (two way table)?
The answer to both of these is the “Chi-Squared Test Statistic” ($\chi^2$).
The basic idea of any test statistic is to compare the sample data to the null hypothesis. In Chi-Squared, we will calculate the “Expected Values” if the null hypothesis is true.
Example 1
Let’s suppose that the percentage of high school students that graduate is the same at five
different high schools. This multiple P hypothesis test is often called a “Goodness of Fit Test”.
$H_0: p_1 = p_2 = p_3 = p_4 = p_5$
$H_A$: at least one is $\neq$
Suppose we have a total of 105 students that graduate. How many would we expect from each
school? These are the “Expected Values”.
Notice if Ho is true then, all the schools would have the same number of graduates from the
105 total. In other words the expected values should all be 21 (105 divided by 5).
Now we need to compare what really happened to those expected values. Here is the
observed sample data. (Observed Values)
$O_1 = 17$
$O_2 = 24$
$O_3 = 13$
$O_4 = 25$
$O_5 = 26$
So when doing Chi-Squared Hypothesis Tests, think “Expected” means Ho, but
“Observed” means sample data.
The formula for the chi-squared test statistic is pretty formidable. Remember, the computer
will be doing the heavy lifting. We need to understand the formula and be able to explain it.
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Notice we are finding the difference between the observed values (sample data) and expected
values (null hypothesis). Since we will sometimes get negative numbers we are squaring the
differences. This is why the test statistic is called “Chi-Squared”. We are dividing by the
expected value so we are looking at an average of the squares. Then adding all of these
together gives us the total Chi-Squared. Remember this is a way to compare complex
categorical data to the null hypothesis.
Chi-Squared Sentence:
The sum of the averages of the squares of the difference between the observed sample data
and the expected values if the null hypothesis is true.
Let’s calculate the chi-squared test statistic for example 1. Remember all of the expected
values are 21 and the observed values are given below.
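$O_1 = 17$
$O_2 = 24$
$O_3 = 13$
$O_4 = 25$
$O_5 = 26$
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(17 - 21)^2}{21} + \dfrac{(24 - 21)^2}{21} + \dfrac{(13 - 21)^2}{21} + \dfrac{(25 - 21)^2}{21} + \dfrac{(26 - 21)^2}{21}$
$= \dfrac{(-4)^2}{21} + \dfrac{(3)^2}{21} + \dfrac{(-8)^2}{21} + \dfrac{(4)^2}{21} + \dfrac{(5)^2}{21}$
$= \dfrac{16}{21} + \dfrac{9}{21} + \dfrac{64}{21} + \dfrac{16}{21} + \dfrac{25}{21} = \dfrac{130}{21} \approx 6.19$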
Note: While 6.19 is a lot for a Z-score or T-score, 6.19 may not be significant for a Chi-Squared.
Remember Chi-Squared comes from adding up squared numbers and can be rather large. We
would need to see a simulation or a P-value to see if 6.19 is significant or not.
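The Example 1 calculation above can also be written as a few lines of Python. This is only a sketch of the formula; the function name chi_squared is made up for this illustration, and the expected values assume the null hypothesis that all five schools have the same percentage.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [17, 24, 13, 25, 26]            # graduates observed at each school
n = sum(observed)                          # total graduates (105)
k = len(observed)                          # number of groups (5)
expected = [n / k] * k                     # all groups equal under Ho, so 21 each

chi2 = chi_squared(observed, expected)

print("expected values:", expected)
print("chi-squared test statistic:", round(chi2, 2))
```

The result agrees with the hand calculation of 130/21, or about 6.19.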
Let’s simulate what chi-squared test statistics we would expect if the null hypothesis was true.
Here is a simulation created with StatKey.
First of all, what is the shape of the chi-squared distribution?
Notice the Chi-Squared distribution is not bell shaped (normal). It is always Skewed Right.
Chi-Squared hypothesis tests are always right tail. Remember squared numbers are never
negative, and adding up squared numbers gives you a sum that is zero or positive. So it is impossible for Chi-Squared to be negative. Chi-Squared hypothesis tests are never left tailed or two tailed.
Chi-Squared takes complicated categorical data and condenses it into 1 right tail test.
Now what about the Chi-Squared test statistic of 6.19 that we computed?
Is it significant? (In the tail)
Could it happen by random chance?
What is the estimated P-value?
Remember, like all hypothesis tests, there are two reasons for the sample data being different
than the null hypothesis. Either the null may be true and the sample data is different because
all samples are different (random chance), or, the null hypothesis is wrong.
Which is it in this case?
Notice the data is not significant (in the tail) and could have happened by random chance
(18.9%). So there is not a significant difference between the observed sample data and the
expected values from the null hypothesis. Since we have not ruled out random chance, we
cannot be sure if the null hypothesis is indeed wrong. So we would fail to reject the null
hypothesis.
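A simulation like the one described for Example 1 can be sketched in Python as follows. Each simulated sample assigns 105 graduates to the five schools completely at random (every school equally likely, as the null hypothesis says), and the chi-squared test statistic is computed for each simulated sample. The estimated P-value is the fraction of simulated statistics at least as large as the one from the real data. The function names, the 5000 repetitions and the seed are assumptions of this sketch, and StatKey's own procedure may differ in its details.

```python
import random


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_null_chi2(n, k, reps=5000, seed=140):
    """Chi-squared statistics from samples where all k groups are equally likely."""
    rng = random.Random(seed)
    expected = [n / k] * k
    stats = []
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1  # each graduate lands in a random school
        stats.append(chi_squared(counts, expected))
    return stats


observed = [17, 24, 13, 25, 26]
n, k = sum(observed), len(observed)
observed_chi2 = chi_squared(observed, [n / k] * k)

sims = simulate_null_chi2(n, k)
# Right tail: simulated statistics as large as the real one or larger
# (the tiny tolerance guards against floating point rounding on ties)
p_value = sum(s >= observed_chi2 - 1e-9 for s in sims) / len(sims)

print("observed chi-squared:", round(observed_chi2, 2))
print("estimated P-value:", p_value)
```

The estimate should land in the same neighborhood as the 18.9% from the StatKey simulation, though the exact figure changes with the seed and the number of simulations.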
Example 2
Let us suppose that someone had a different claim they wanted to test with the school
graduation data. They claim that 15% of the graduates come from school 2, 15% from school 4,
15% from school 5, 25% from school 1, and 30% from school 3. This is also a multiple P test
(Goodness of Fit Test) though the null hypothesis looks a little different. Notice the groups are
checking the percentage for the same success variable (graduating). We are only checking one
percentage in each group. This is the trademark of a Goodness of Fit test.
$H_0: p_1 = 25\%,\ p_2 = 15\%,\ p_3 = 30\%,\ p_4 = 15\%,\ p_5 = 15\%$
$H_A$: at least one is $\neq$
Let’s calculate the Chi-Squared test statistic again.
Let’s start by calculating the expected values from the null hypothesis. This null hypothesis
suggests that each group has a different percentage and therefore a different expected value.
Remember our total number of graduates was 105. The null hypothesis suggests that 25% of
those will come from school 1, 15% of them will come from each of school 2, school 4 and school 5, and
30% will come from school 3. Remember, to calculate a percentage of a total, simply convert the
percentage into a decimal and multiply by the total. Here are our expected values.
$E_1 = 0.25 \times 105 = 26.25$
$E_2 = 0.15 \times 105 = 15.75$
$E_3 = 0.30 \times 105 = 31.5$
$E_4 = 0.15 \times 105 = 15.75$
$E_5 = 0.15 \times 105 = 15.75$
Remember these are what we expect to get if the null hypothesis is true. We can compare
these with the Observed sample data values.
$O_1 = 17$
$O_2 = 24$
$O_3 = 13$
$O_4 = 25$
$O_5 = 26$
Now let’s calculate the Chi-Squared test statistic.
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(17 - 26.25)^2}{26.25} + \dfrac{(24 - 15.75)^2}{15.75} + \dfrac{(13 - 31.5)^2}{31.5} + \dfrac{(25 - 15.75)^2}{15.75} + \dfrac{(26 - 15.75)^2}{15.75}$
$= \dfrac{(-9.25)^2}{26.25} + \dfrac{(8.25)^2}{15.75} + \dfrac{(-18.5)^2}{31.5} + \dfrac{(9.25)^2}{15.75} + \dfrac{(10.25)^2}{15.75}$
$= \dfrac{85.5625}{26.25} + \dfrac{68.0625}{15.75} + \dfrac{342.25}{31.5} + \dfrac{85.5625}{15.75} + \dfrac{105.0625}{15.75}$
$\approx 3.2595 + 4.3214 + 10.8651 + 5.4325 + 6.6706$
$\approx 30.55$
Is a Chi-Squared Test Statistic of 30.55 significant? Remember Chi-Squared test statistics are
squared numbers added up, so they can be very large.
Let’s calculate a P-value with StatCrunch this time to determine if it is significant.
To calculate a P-value for a Goodness of Fit test we will need to do the following.
First type in the observed sample values in a column of StatCrunch. If the null hypothesis has
specific percentages instead of equal, then type these percentages (written as decimals) in
another column. Remember the percentage has to coincide with the observed value from the
same variable.
Stat => Goodness of Fit => Chi-Squared Test
Tell StatCrunch what column your observed sample data is in. If the null hypothesis is all
groups equal then click the button that says “all cells in equal proportion” under the “Expected”
menu. In this case each school had a different percentage in the null hypothesis. So under the
“Expected” menu, click the column where the percentages are. Now click compute. Notice the
P-value says “<0.0001”. This is what StatCrunch writes when the P-value is very close to zero.
$P\text{-value} \approx 0$
Remember, like all hypothesis tests, there are two reasons for the sample data being different
than the null hypothesis. Either the null may be true and the sample data is different because
all samples are different (random chance), or, the null hypothesis is wrong.
Which is it in this case?
A P-value close to 0 is very significant. Since the P-value is the probability of getting sample data
this extreme or more extreme by random chance if the null hypothesis is true, this data was very unlikely to happen by random chance. So
there is a significant difference between the observed sample data and the expected values
from the null hypothesis. We have ruled out random chance and can Reject the null
hypothesis.
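For readers who want to check Example 2 without StatCrunch, here is a hedged Python sketch. It computes the expected values as n times p, the chi-squared test statistic, and a right tail P-value. The P-value uses a closed form for the chi-squared right tail that only works when the degrees of freedom are even; here the degrees of freedom are the number of groups minus 1, which is 4. The function names are invented for this sketch, and it is not a description of how StatCrunch computes its output.

```python
from math import exp, factorial


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_right_tail_even_df(x, df):
    """P(chi-squared >= x) using the closed form that holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


observed = [17, 24, 13, 25, 26]                # schools 1 through 5
null_props = [0.25, 0.15, 0.30, 0.15, 0.15]    # percentages claimed in Ho
n = sum(observed)
expected = [n * p for p in null_props]         # E = n x p for each school

chi2 = chi_squared(observed, expected)
df = len(observed) - 1                         # number of groups minus 1
p_value = chi2_right_tail_even_df(chi2, df)

print("expected values:", [round(e, 2) for e in expected])
print("chi-squared test statistic:", round(chi2, 2))
print("P-value below 0.0001:", p_value < 0.0001)
```

The test statistic comes out to about 30.55 and the P-value is far below 0.0001, consistent with the “<0.0001” that StatCrunch displays.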
Key Points about the Goodness of Fit Test
• A Goodness of Fit test checks the same success variable in multiple groups. The sample
data will be a single row or column of observed values. (Not a two-way table).
• Sample Null and Alternative Hypothesis (two types)
$H_0: p_1 = p_2 = p_3 = p_4 = p_5$
$H_A$: at least one is $\neq$
$H_0: p_1 = 25\%,\ p_2 = 15\%,\ p_3 = 30\%,\ p_4 = 15\%,\ p_5 = 15\%$
$H_A$: at least one is $\neq$
• Chi-Squared Test Statistic and P-value can be calculated with simulation (StatKey) or
with StatCrunch. (Do not calculate this by hand.)
• The Chi-Squared distribution is always skewed right. Any hypothesis test using the Chi-Squared test statistic will always be a right tailed test.
• Chi-Squared Sentence:
The sum of the averages of the squares of the difference between the observed sample
data and the expected values if the null hypothesis is true.
• What are the assumptions? All Chi-Squared hypothesis tests have the same
assumptions:
1. Random
2. All expected values must be at least 5 (observed sample data is large enough)
• Large Chi-Squared Test Statistic (in the tail of the simulation) and Small P-value both tell
us that the data probably did not happen by random chance and is significant. The
observed sample data significantly disagrees with the expected values from the null
hypothesis. We can therefore Reject the null hypothesis.
• Small Chi-Squared Test Statistic (Not in the tail of the simulation) and Large P-value both
tell us that the data could have happened by random chance and is not significant. The
observed sample data does not significantly disagree with the expected values from the
null hypothesis. Since we cannot rule out random chance, we don’t know if the null is
right or wrong, so we Fail to Reject the null hypothesis.
• Conclusions may be written in the same way as all hypothesis tests. If the claim is the
null hypothesis, then you will either have evidence to reject the claim (small P-value) or
not have evidence to reject the claim (Large P-value). If the claim is the alternative
hypothesis, then you will either have evidence to support the claim (small P-value) or
not have evidence to support the claim (Large P-value).
• Degrees of Freedom = K – 1 (K is # of groups)
• Expected Values (Automatically calculated with StatCrunch)
$E = \dfrac{n}{K}$ (n = sample size total, K = # of groups; use this for the case when all groups are
assumed to be equal in the null hypothesis)
$E = n \times p$ (n = sample size total, p = percentage from each group; use this for the case
when each group has a different percentage in the null hypothesis)
Math 140 Hypothesis Tests Activity #13
Goodness of Fit Tests
Directions: For numbers 1-3, use statkey at www.lock5stat.com to simulate the following chi-squared
goodness of fit tests. Go to “more advanced randomization tests” at the bottom of the statkey page.
Click on the button that says “chi-squared goodness of fit”.
1. It is a big job to write and grade the AP-statistics exam for high school students each year. It is a
difficult multiple choice exam. All questions have five possible answers A-E. Test the claim that the percent
of A answers is the same as the percent of B answers which is the same as C, D and E. The data has
already been entered in StatKey. How many categories are we checking? What are the degrees of
freedom? If all the categories are equal, what percent would we expect each of them to be in the null
hypothesis? Give the null and alternative hypothesis. Simulate the null hypothesis. Notice statkey
simulates chi-squared values from a population that is equal. Use the right tailed function to check if
the real original sample data is significant. What is test statistic? What is the P-value? In the original
sample data, were the observed sample values significantly different than the expected values in the
null hypothesis? Could the original sample data have happened by random chance (sampling
variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5%
significance level.) (You can assume that the data meets the assumptions for inference.)
2. Alameda County needs to verify that its juries in courts of law are representative of the population.
On the top left of StatKey, change the data to “Alameda County Juries”. Alameda County is supposedly
made up of 54% Caucasian, 18% African American, 12% Hispanic American, 15% Asian American and 1%
other. Test the claim that Alameda’s court juries do not represent these theoretical percentages of race.
How many categories are we checking? What are the degrees of freedom? Give the null and alternative
hypotheses. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a
population that is the same as the county’s theoretical percentages. Use the right tailed function to
check if the real original sample data is significant. What is the test statistic? What is the P-value? In the
original sample data, were the observed sample values significantly different than the expected values
in the null hypothesis? Could the original sample data have happened by random chance (sampling
variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5%
significance level.) (You can assume that the data meets the assumptions for inference.)
3. Open the Math 140 survey data. Copy and paste the column describing the type of transportation
data into StatKey. You will need to push the “edit data” button. A person that works at COC claimed
that 80% of COC students drive alone, 10% carpool, 5% are dropped off by someone, 2% walk, 1% bike,
and 2% use public transportation. Let us check if the Math 140 stat students are different than these
claimed percentages. How many categories are we checking? What are the degrees of freedom? Give
the null and alternative hypotheses. Simulate the null hypothesis. Notice StatKey simulates chi-squared
values from a population that is the same as the theoretical percentages. Use the right tailed function
to check if the real original sample data is significant. What is the test statistic? What is the P-value? In the
original sample data, were the observed sample values significantly different than the expected values
in the null hypothesis? Could the original sample data have happened by random chance (sampling
variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5%
significance level.) (You can assume that the data meets the assumptions for inference.)
Directions: For numbers 4-7, use statcrunch to find the chi-squared test statistic and P-value and
complete the hypothesis test. You will need to go to the “stat” menu, then “goodness of fit”, then
“chi-squared test”.
4. In a random sample of 60 COC students, 29 were liberal, 23 were conservative and 8 were moderate.
Test the claim that the percent of people in each political group is equal at COC. Find the expected
values for each category. Does this data meet the assumptions necessary to perform a Goodness of Fit
Test? If so, test the claim that the probability of being in each party is the same. Make sure to give the
null and alternative hypothesis, the Chi Squared test statistic, the p-value, whether we reject the null
hypothesis or fail, and a conclusion.
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
5. An online sports magazine wrote an article about the favorite sports in America. It said that 43% of
Americans prefer Football, 23% of Americans prefer Baseball, 20% of Americans prefer Basketball, 8%
of Americans prefer Hockey, and 6% of Americans prefer Soccer. When 130 randomly selected COC
students were asked their favorite sport we found the following: 44 said Football, 26 said Baseball, 29
said Basketball, 13 said Hockey, 18 said Soccer. Test the claim that COC students do not match the
distribution claimed in the magazine article. Find the expected values for each category. Does this data
meet the assumptions necessary to perform a Goodness of Fit Test? If so, test the claim that the
probability of liking each sport is different than what the magazine suggested. Make sure to give the
null and alternative hypothesis, the Chi Squared test statistic, the p-value, whether we reject the null
hypothesis or fail, and a conclusion.
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
6. Thousands of people die from car accidents across the U.S. every year, but are the probabilities of
dying in a car accident the same for every day of the week? The following data summary gives the
observed number of deaths from car accidents in the U.S. for each day of a randomly
selected week. The total number of deaths for the week was 805. Find the expected values for each
category. Does this data meet the assumptions necessary to perform a Goodness of Fit Test? If so, test
the claim that the probability of dying in a car accident is the same for each day. Make sure to give the
null and alternative hypothesis, the Chi Squared test statistic, the p-value, whether we reject the null
hypothesis or fail, and a conclusion.
Day                Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
Number of Deaths   106      104       103         113        130      132        117
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
7. The National Highway Traffic Safety Administration (NHTSA) publishes reports about motorcycle
fatalities and helmet use. The following distribution shows the proportion of fatalities by location of
injury for motorcycle accidents.
Location of Injury   Multiple Locations   Head   Neck   Thorax   Abdomen/Spine
Proportion           0.57                 0.31   0.03   0.06     0.03
The sample data below shows the distribution of 2068 randomly selected fatalities from riders that were
not wearing a helmet. Use a 0.05 significance level to test the claim that the distribution for the sample
does not match the proportions given by the NHTSA. Make sure to find the expected values and verify
the assumptions for a Goodness of Fit test. Make sure to give the null and alternative hypothesis, the
Chi Squared test statistic, the p-value, whether we reject the null hypothesis or fail, and a conclusion.
Where is the largest discrepancy between the observed and expected value? What does this tell us
about the importance of wearing helmets?
Location of Injury   Multiple Locations   Head   Neck   Thorax   Abdomen/Spine
Number of Deaths     1036                 864    38     83       47
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
Hypothesis Test Notes
Chi-Squared Independence & Homogeneity Test
The Chi-Squared test statistic is very versatile and can be used for many different categorical
hypothesis tests. We often want to study relationships between categorical variables. Making
a two-way table and studying conditional probabilities help us to understand the categorical
data.
Example 1
A sample of 75 college students was taken to determine if listening to music helps or hurts a
person’s ability to retain information. The students were randomly put into three groups. One
group got to listen to their favorite music, one group had to listen to music they hated, and one
group had no music. The students were assigned to memorize the same information and were
given a test to determine how much of the information they remembered.
                 Liked Music   Disliked Music   No Music   Total
High Retention   10            11               18         39
Low Retention    14            15               7          36
Total            24            26               25         Grand Total = 75
We learned in our unit on probability that conditional probabilities are important to explore when
analyzing relationships between categorical variables. We also learned two very important principles.
1. When conditional probabilities were close or equal it indicated that the variables were
not related to each other. (Independent)
2. When conditional probabilities were significantly different it indicated that the variables were
related to each other. (Dependent)
This is a very important principle that is the guiding idea behind the hypothesis tests for Independence
and Homogeneity.
Let’s look at some conditional probabilities in the music problem.
P( high retention | liked music ) = 10/24 = 0.417 or 41.7%
P( high retention | disliked music ) = 11/26 = 0.423 or 42.3%
If we just looked at these two conditional probabilities, we might think that music and retention are
not related. (Independent)
Here lies the fundamental problem. We are not really taking the entire two-way table and all of the
conditional probabilities into account. If we look at another conditional probability, we may come to a
different conclusion. Look at these two.
P( high retention | liked music ) = 10/24 = 0.417 or 41.7%
P( high retention | no music ) = 18/25 = 0.72 or 72%
These two probabilities are significantly different and would lead us to the conclusion that music and
retention are related (dependent).
So it is difficult to determine if categorical variables are related or not by just looking at two conditional
probabilities. We need a better way to do this.
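To see all three conditional probabilities for high retention side by side, the table above can be typed into a short Python sketch. The dictionary layout and the function name are assumptions chosen for this illustration; only the counts come from the music example.

```python
# Observed counts from the music and retention example (totals are not typed in).
observed = {
    "liked music": {"high retention": 10, "low retention": 14},
    "disliked music": {"high retention": 11, "low retention": 15},
    "no music": {"high retention": 18, "low retention": 7},
}


def conditional_probability(table, outcome, condition):
    """P(outcome | condition) = count of the outcome in that column / column total."""
    column = table[condition]
    return column[outcome] / sum(column.values())


probabilities = {
    music: conditional_probability(observed, "high retention", music)
    for music in observed
}

for music, p in probabilities.items():
    print(f"P( high retention | {music} ) = {p:.3f}")
```

Two of the three probabilities are close and one is far away, which is exactly why comparing only a pair of them can be misleading.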
Chi-Squared Hypothesis Test for Independence and Homogeneity
The Chi-Squared test statistic can do a much better job. Not only will it take into account every
conditional probability, but it will also see if the data is significant enough to apply to the population.
Null and Alternative Hypotheses
Remember the guiding principle. If the distribution of conditional probabilities is the same, this
indicates independence (not related). If the distribution of conditional probabilities is different, this
indicates dependence (related).
There are two hypothesis tests that are equivalent statements and can be tested with the same data,
same test statistic and same P-value. Homogeneity tests whether the distributions of conditional
probabilities are the same versus different. Independence tests whether the variables are independent or
dependent. The key is that these are equivalent statements.
Homogeneity Test
H 0 : The distribution for retaining information is the same for the various music options
H A : The distribution for retaining information is different for the various music options
Independence Test
H 0 : Music and Retaining Info are Independent (not related)
H A : Music and Retaining Info are Dependent (related)
Notes about the null and alternative hypothesis:

The null hypothesis is that the various conditional probabilities for each variable are the same.
This implies that the condition does not matter and the variables are independent (not related).

The alternative hypothesis is that the various conditional probabilities for each variable are
different. This implies that the condition does matter and the variables are dependent (related).

When describing lots of probabilities for lots of variables, we often use the word “distribution”
and say “the distribution is the same” or “the distribution is different”. It does not mean all the
probabilities are the same or close, but only those probabilities from the same row or column are
the same or close.

Though the null and alternative hypotheses for the homogeneity test and independence test are
equivalent, you should still write the null and alternative hypotheses that match the wording of
the claim.

If the claim is that the variables are related, not related, dependent or independent, you should
give the null and alternative for an Independence Test.
H 0 : Categorical variables are Independent (not related)
H A : Categorical variables are Dependent (related)

If the claim is that the variables have the same or different conditional probabilities, or if the
claim is that the distribution is the same or different, you should give the null and alternative for
a Homogeneity Test.
H 0 : The distribution is the same
H A : The distribution is different
Expected Values
Remember the Chi-Squared test statistic compares observed sample data to the expected values.
These “expected values” are what we expect to happen if the null hypothesis is true. For the
Independence Test, the null is that they are independent, but remember that is equivalent to the null
for the Homogeneity Test that the distribution of conditional probabilities is the same.
So if the null is true we should expect the conditional probabilities for each variable to be the same.
Let’s work this out for the music and retention problem.
                 Liked Music   Disliked Music   No Music   Total
High Retention   10            11               18         39
Low Retention    14            15               7          36
Total            24            26               25         Grand Total = 75
P (high retention) = 39/75 = 0.52
If the null hypothesis is true, we expect this probability to be the same regardless of the music choice.
Remember the expected values are found by a percent of a total (multiplying the total in the column or
row by the probability).
$E = n \times p$
Here n is not the grand total; it is the total for each column (music choice).
So if the null is true we expect the p for high retention to always be 0.52 and the expected values will be
0.52 x total students for each music choice.
$E_{\text{like/high}} = n \times p = 24 \times 0.52 = 12.48$
$E_{\text{dislike/high}} = n \times p = 26 \times 0.52 = 13.52$
$E_{\text{none/high}} = n \times p = 25 \times 0.52 = 13.0$
P (low retention) = 36/75 = 0.48
If the null hypothesis is true, we expect this probability to be the same regardless of the music choice.
Remember the expected values are found by multiplying the amount by the probability.
$E = n \times p$
Again, n is not the grand total; it is the total for each column (music choice).
So if the null is true we expect the p for low retention to always be 0.48 and the expected values will be
0.48 x total students for each music choice.
$E_{\text{like/low}} = n \times p = 24 \times 0.48 = 11.52$
$E_{\text{dislike/low}} = n \times p = 26 \times 0.48 = 12.48$
$E_{\text{none/low}} = n \times p = 25 \times 0.48 = 12.0$
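The six expected values can also be checked with a short Python sketch. It follows the same idea as the notes: take the overall proportion for each row (row total divided by grand total) and multiply it by each column total. The function name and list layout are assumptions made for this illustration.

```python
# Observed counts from the music example (no totals typed in).
observed = [
    [10, 11, 18],  # High Retention: liked, disliked, no music
    [14, 15, 7],   # Low Retention:  liked, disliked, no music
]


def expected_counts(table):
    """Expected count for each cell if the null is true: E = n * p,
    where n is the column total and p is row total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [n * (row_total / grand_total) for n in column_totals]
        for row_total in row_totals
    ]


expected = expected_counts(observed)
for label, row in zip(["High Retention", "Low Retention"], expected):
    print(label, [round(e, 2) for e in row])

# Assumption check: every expected value should be at least 5.
all_at_least_5 = all(e >= 5 for row in expected for e in row)
print("All expected values at least 5:", all_at_least_5)
```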
Calculate the Chi-Squared Test Statistic
We learned that the Chi-Squared test statistic is a comparison of the observed sample values and the
expected values from the null hypothesis. Here is the formula again.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
So Chi-Squared subtracts the observed and expected values to find the difference. Since some
differences are negative, it squares the differences. It also divides by E to make it a kind of average of
squares and finally it adds up these values for every cell in the table.
Here is the sentence to explain Chi-Squared again:
“The sum of the averages of the squares of the differences between the observed sample data and the
expected values if the null hypothesis were true.”
                 Liked Music   Disliked Music   No Music   Total
High Retention   10            11               18         39
Low Retention    14            15               7          36
Total            24            26               25         Grand Total = 75
In this example the numbers in the two-way table are the observed values. Note: The observed values
do not include the totals! This two-way table has 2 rows and 3 columns (not counting totals). So we
have a total of 6 observed values and 6 expected values.
                 Liked Music   Disliked Music   No Music
High Retention   10            11               18
Low Retention    14            15               7
Let’s calculate the Chi-Squared test statistic for this problem. Here are the expected values to compare
to. It is good to label so that you subtract the correct observed value with the correct expected value.
$E_{\text{like/high}} = n \times p = 24 \times 0.52 = 12.48$
$E_{\text{dislike/high}} = n \times p = 26 \times 0.52 = 13.52$
$E_{\text{none/high}} = n \times p = 25 \times 0.52 = 13.0$
$E_{\text{like/low}} = n \times p = 24 \times 0.48 = 11.52$
$E_{\text{dislike/low}} = n \times p = 26 \times 0.48 = 12.48$
$E_{\text{none/low}} = n \times p = 25 \times 0.48 = 12.0$
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10 - 12.48)^2}{12.48} + \frac{(11 - 13.52)^2}{13.52} + \frac{(18 - 13)^2}{13} + \frac{(14 - 11.52)^2}{11.52} + \frac{(15 - 12.48)^2}{12.48} + \frac{(7 - 12)^2}{12}$$
$$= \frac{(-2.48)^2}{12.48} + \frac{(-2.52)^2}{13.52} + \frac{(5)^2}{13} + \frac{(2.48)^2}{11.52} + \frac{(2.52)^2}{12.48} + \frac{(-5)^2}{12}$$
$$= \frac{6.1504}{12.48} + \frac{6.3504}{13.52} + \frac{25}{13} + \frac{6.1504}{11.52} + \frac{6.3504}{12.48} + \frac{25}{12}$$
$$= 0.49282 + 0.46970 + 1.92308 + 0.53389 + 0.50885 + 2.08333$$
$$= 6.01167$$
Notice again the numbers that were added to get the Chi-Squared test statistic. These are called the
contributions to Chi-Squared. Which cells had the greatest contribution to Chi-Squared? These are the
ones where the observed values disagreed with the null hypothesis the most.
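The contributions can be reproduced with a short Python sketch, which also picks out the cell that disagrees with the null hypothesis the most. The observed and expected values are the ones from this example; the variable names and the way the table is stored are assumptions made for illustration.

```python
labels = [
    ["liked/high", "disliked/high", "none/high"],
    ["liked/low", "disliked/low", "none/low"],
]
observed = [[10, 11, 18], [14, 15, 7]]
expected = [[12.48, 13.52, 13.0], [11.52, 12.48, 12.0]]

# Contribution of each cell: (O - E)^2 / E
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The Chi-Squared test statistic is the sum of all the contributions.
chi_sq = sum(sum(row) for row in contributions)

# Find the cell with the greatest contribution.
largest_label, largest_value = max(
    (
        (labels[i][j], contributions[i][j])
        for i in range(len(labels))
        for j in range(len(labels[i]))
    ),
    key=lambda pair: pair[1],
)

for label_row, contribution_row in zip(labels, contributions):
    for label, value in zip(label_row, contribution_row):
        print(f"{label}: {value:.5f}")
print(f"Chi-Squared = {chi_sq:.5f}")
print(f"Largest contribution: {largest_label} ({largest_value:.5f})")
```

Both "no music" cells contribute far more than the other four, which matches what the conditional probabilities suggested.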
Assumptions
The assumptions for any Chi-Squared hypothesis test are as follows:
1. Random
2. All Expected Values at least 5 ($E \ge 5$)
Did this problem meet the assumptions?
(notice the data was random and all of the expected values were at least 5)
Is it significant?
We have had a little experience with Chi-Squared. We know Chi-Squared distributions are always skewed right and the tests are right
tailed. Chi-Squared values can be pretty large, so you may not be sure if $\chi^2 = 6.01167$ is significant.
There are two ways to handle this question. P-value or Simulation.
Let’s start with simulating Chi-Squared test statistics.
A simulation of this data with StatKey gave the following. Notice StatKey calls the two-way table
hypothesis tests a “Chi-Square Test for Association”. Association means dependent or related.
Was our Chi-Squared test statistic of 6.012 significant (in the tail)?
What was the estimated P-value?
Was it likely or unlikely that this data happened by random chance?
If we were using a 5% significance level, would we reject the null hypothesis or fail to reject?
The test statistic was in the tail. The estimated P-value was 0.048 or (4.8%). It was unlikely to happen by
random chance (4.8%). This is a borderline case. Significance will depend on the significance level. If the
P-value is less than the significance level it is significant.
At a 5% significance level, we would reject the null hypothesis. (P-value < sig level)
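A randomization simulation of this kind can be sketched in Python. The sketch below is one common way to simulate the null hypothesis, offered as an assumption: it shuffles the 75 students' music labels so that music cannot be related to retention, rebuilds the two-way table, and records the Chi-Squared value each time. StatKey's exact procedure may differ, and the number of simulations and the random seed are arbitrary choices, so the estimated P-value will not match StatKey exactly.

```python
import random


def chi_squared(table):
    """Chi-Squared statistic for a two-way table of counts (no totals)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            total += (o - e) ** 2 / e
    return total


def randomization_p_value(table, simulations=2000, seed=140):
    """Estimate the P-value by shuffling column labels among the individuals."""
    rng = random.Random(seed)
    n_rows, n_columns = len(table), len(table[0])
    # One row label and one column label per individual in the sample.
    row_labels = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    column_labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    observed_stat = chi_squared(table)
    at_least_as_large = 0
    for _ in range(simulations):
        rng.shuffle(column_labels)  # breaks any relationship between the variables
        simulated = [[0] * n_columns for _ in range(n_rows)]
        for i, j in zip(row_labels, column_labels):
            simulated[i][j] += 1
        # Right tailed: count simulated statistics at least as large as the real one.
        if chi_squared(simulated) >= observed_stat - 1e-9:
            at_least_as_large += 1
    return observed_stat, at_least_as_large / simulations


observed = [[10, 11, 18], [14, 15, 7]]
observed_stat, p_value = randomization_p_value(observed)
print(f"Chi-Squared = {observed_stat:.5f}")
print(f"Estimated P-value = {p_value:.3f}")
```

Because the P-value here sits so close to 5%, different runs of a simulation can land on either side of the significance level, which is why the notes call this a borderline case.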
Writing a conclusion
Remember we said that there were two different hypothesis tests that we could do with this data and
Chi-Squared test statistic, Independence test or Homogeneity test. Here are the null and alternative
hypotheses again. Remember a conclusion must address the claim.
Homogeneity Test
H 0 : The distribution for retaining information is the same for the various music options
H A : The distribution for retaining information is different for the various music options
Suppose the claim was that the distribution is different for various music options. (Alternative hypothesis
for homogeneity test) Since we rejected the null hypothesis, our conclusion would be:
“We have significant sample evidence to support the claim that the distribution for retaining
information is different.”
(Notice this implies that we also have evidence to support dependence.)
Independence Test
H 0 : Music and Retaining Info are Independent (not related)
H A : Music and Retaining Info are Dependent (related)
Suppose the claim was that music and retaining information are not related (independent). (Null
hypothesis for Independence test). Since we rejected the null hypothesis, the conclusion would be:
“We have significant sample evidence to reject the claim that music and retaining information are
independent.”
(Notice this also implies something about homogeneity. The conditional probabilities must also be
significantly different.)
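The rule for wording a conclusion can be summarized in a small Python sketch. The function name and its arguments are hypothetical, chosen only to illustrate the logic: the P-value decides whether we reject the null, and whether the claim is the null or the alternative decides if we talk about "rejecting" or "supporting" the claim. The P-value of 0.048 is the simulation estimate from this example.

```python
def conclusion(p_value, significance_level, claim_is_null, claim):
    """Return a conclusion sentence that addresses the claim."""
    # A small P-value (below the significance level) means we reject the null.
    reject_null = p_value < significance_level
    # Claim is the null: evidence (or not) to reject the claim.
    # Claim is the alternative: evidence (or not) to support the claim.
    verb = "reject" if claim_is_null else "support"
    prefix = "We have" if reject_null else "We do not have"
    return f"{prefix} significant sample evidence to {verb} the claim that {claim}."


homogeneity = conclusion(
    0.048, 0.05, claim_is_null=False,
    claim="the distribution for retaining information is different",
)
independence = conclusion(
    0.048, 0.05, claim_is_null=True,
    claim="music and retaining information are independent",
)
print(homogeneity)
print(independence)
```

Changing the significance level to 0.01 in either call switches the sentence to "We do not have significant sample evidence...", since 0.048 is not below 0.01.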
Calculating the Chi-Squared test statistic and P-value with StatCrunch
Remember, do not calculate Chi-Squared by hand with a calculator. That is the job of statistics programs
like StatCrunch.
StatCrunch Directions
To calculate the Chi-Squared test statistic and P-value with StatCrunch, start by typing or pasting in the
two-way table exactly as you see it in the problem. Remember do not type in the totals. Then go to the
“Stat” menu, click on “Tables” then on “Contingency”. Then click on “with summary”. Hold the control
key and highlight all of the columns with numbers. Under “Row Labels” highlight the column where
your row labels are. Click on “Expected Count” and “Contribution to Chi-Squared”. Then push compute.
Let’s try it.
Put in the two-way table from the last example into StatCrunch and calculate the Expected values,
contribution to Chi-Squared, Chi-Squared test statistic and P-value.
You should see the following:
Contingency table results:
Rows: var1
Columns: None
Cell format
Count
(Expected count)
(Contributions to Chi-Square)
                 Liked Music   Disliked Music   No Music   Total
High Retention   10            11               18         39
                 (12.48)       (13.52)          (13)
                 (0.49)        (0.47)           (1.92)
Low Retention    14            15               7          36
                 (11.52)       (12.48)          (12)
                 (0.53)        (0.51)           (2.08)
Total            24            26               25         75

Chi-Square test:
Statistic    DF   Value     P-value
Chi-square   2    6.01167   0.0495
Notice each cell now gives 3 numbers. The first is the observed sample data. The second is the expected
counts (expected values). These are the ones you need to write down and determine if they are at least
5. The last is the contribution to Chi-Squared. Notice the largest contributions to Chi-Squared came from
the “no music” group. We had a lot more “high retention” students than we expected and a lot fewer “low
retention” than we expected. This gives some evidence that when trying to retain information,
“no music” seems to be best.
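The DF and P-value in the StatCrunch output can be checked with a short Python sketch. It relies on two facts: the degrees of freedom for a two-way table are (rows - 1)(columns - 1), and for the special case of exactly 2 degrees of freedom the right-tail area of the Chi-Squared distribution is exp(-χ²/2). The function names are hypothetical, and the shortcut is only valid for df = 2; for other degrees of freedom a statistics program (for example, SciPy's `scipy.stats.chi2.sf` in Python) would be needed.

```python
import math


def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a two-way table (totals not counted)."""
    return (rows - 1) * (columns - 1)


def p_value_df2(chi_sq):
    """Right-tail area of the Chi-Squared distribution, valid ONLY when df = 2."""
    return math.exp(-chi_sq / 2)


df = degrees_of_freedom(rows=2, columns=3)
p_value = p_value_df2(6.01167)

print("DF =", df)
print(f"P-value = {p_value:.4f}")
```

The result agrees with the StatCrunch P-value of 0.0495, just under the 5% significance level.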
Notes about the Chi-Squared Independence Test and Homogeneity Test

Though the Independence and Homogeneity tests use the same test statistic, you should write
the correct null and alternative hypothesis. If the claim is that the variables are related,
not related, dependent, independent, associated, or not associated, you are doing an
Independence test. If the claim is that the distribution of conditional probabilities is the same or
different, then you are doing a Homogeneity test.

The assumptions for any Chi-Squared hypothesis test are as follows:
1. Random
2. All Expected Values at least 5 ($E \ge 5$)

The Degrees of Freedom for a two-way table are as follows:
Degrees of Freedom (df) $= (r - 1)(c - 1)$
Where “r” is the number of rows (not counting totals) and “c” is the number of columns (not
counting totals).
In the example, the degrees of freedom would be:
Degrees of Freedom (df) $= (r - 1)(c - 1) = (2 - 1)(3 - 1) = (1)(2) = 2$

Remember Chi-Squared distribution is always skewed right and can be very large. The
hypothesis tests that use Chi-Squared are right tailed tests.

Do not calculate Chi-Squared test statistic by hand with a calculator. Use a statistics program
like StatKey or StatCrunch.

Remember Chi-Squared test statistics can be very large. Refer to either the simulation (in the
tail) or the P-value (close to zero) to determine significance.

To simulate a Chi-Squared test statistic with StatKey, we go to “Test for Association” under the
“advanced randomization tests” menu.

To calculate the Chi-Squared test statistic and P-value with StatCrunch, start by typing or pasting
in the two-way table exactly as you see it in the problem. Remember do not type in the totals.
Then go to the “Stat” menu, click on “Tables” then on “Contingency”. Then click on “with
summary”. Hold the control key and highlight all of the columns with numbers. Under “Row
Labels” highlight the column where your row labels are. Click on “Expected Count” and
“Contribution to Chi-Squared”. Then push compute.
Math 140 Hypothesis Test Activity #14
Conditional Probabilities, Simulation and Independence Tests
Directions: For numbers 1 and 2, use StatKey at www.lock5stat.com to simulate the following chi-squared tests for association. Go to “more advanced randomization tests” at the bottom of the
statkey page. Click on the button that says “chi-squared test for association”. Remember a test for
association has the same test statistic as a homogeneity test.
1. We want to know if the state a home is built in is related to whether or not the home is large. A
random sample of homes in the U.S was taken. They determined the state the home was built in and
whether or not the home was large. In statkey, you will find this data already entered. If not, look at
the top left button. It should say “homes for sale(size by state)”. Test the claim that where a home is
built is related to its size. What are the degrees of freedom? Give the null and alternative hypotheses.
Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has
the same distribution (i.e. as if the states were the same). Use the right tailed function to check if
the real original sample data is significant. What is the test statistic? What is the P-value? In the original
sample data, were the observed sample values significantly different than the expected values in the
null hypothesis? Could the original sample data have happened by random chance (sampling
variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for the test. (Use a 5%
significance level.) (You can assume that the data meets the assumptions for inference.)
2. Let’s look at another data set in StatKey. We want to know if gender is independent of getting an
award. A random sample of people that won famous awards in the Olympic, Academia, and Nobel categories was
taken. They determined the gender of each of the people that won the award. In StatKey, you will find
this data already entered. If not, look at the top left button. It should say “student survey (award by
gender)”. Test the claim that gender is totally independent of awards. What are the degrees of freedom?
Give the null and alternative hypotheses. Simulate the null hypothesis. Notice StatKey simulates chi-squared values from a population that has the same distribution (i.e. as if the genders had the same
distribution of awards). Use the right tailed function to check if the real original sample data is significant. What is the test
statistic? What is the P-value? In the original sample data, were the observed sample values
significantly different than the expected values in the null hypothesis? Could the original sample data
have happened by random chance (sampling variability)? Do we reject or fail to reject the null
hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the
data meets the assumptions for inference.)
Directions for #3-6: The following table describes the gender and majors of 692 randomly selected
students at a local college. Recall that two variables are considered independent if one event
occurring does not significantly change the probability of the other event occurring. For example, in
chapter 5 we said that two events are independent if $P(A \mid B) = P(A \mid C)$. In other words, changing
the condition from B to C did not matter. The probability stayed the same. But what if $P(A \mid B)$ is
not equal to $P(A \mid C)$, but it is close? What would we consider a significant difference in order to say
that the categories are dependent? That is the question behind the Chi Squared independence test.
For numbers 3c, 4c, 5c and 6c, use statcrunch to find the chi-squared test statistic and P-value and
complete the hypothesis test. First type the two way table as you see it into statcrunch. You will
need the row and column labels also. Now go to the “stat” menu, then “tables”, then “contingency”.
Tell statcrunch the columns your counts are in and under “row labels” the column your row labels are
in. Be sure to click on expected counts and contribution to chi-squared also.
3.
         Business   English   History   Music   Biology   Math
Female   89         71        62        48      56        9
Male     112        58        59        53      62        13
a) Find all the row and column totals for the table.
b) Do you think that gender and major are independent (not related) or dependent (related)? Find a
couple conditional probabilities to back up your answer. Hint: Use the probability formula:
$P(A \mid B) = P(A \mid C)$.
c) Now use a Chi Squared Independence Test on Statcrunch to test the claim that gender and major
are dependent (related). Write down all the expected values. Does the problem meet all the
assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi
Squared test statistic, p-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #3b. Would you change your mind?
4.
         Type A   Type B   Type AB   Type O
Male     40       9        5         60
Female   30       8        7         50
a) Find all the row and column totals for the table.
b) Do you think that gender and blood type are independent (not related) or dependent (related)?
Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula:
$P(A \mid B) = P(A \mid C)$.
c) Now use a Chi Squared Independence Test on Statcrunch to test the claim that gender and blood
type are independent (not related). Write down all the expected values. Does the problem meet all the
assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi
Squared test statistic, p-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #4b. Would you change your mind?
5.
        Med/Surg   ICU   SDS   ER
18-35   19         4     25    16
36-49   27         7     22    9
50-64   17         13    15    17
65+     12         21    8     19
a) Find all the row and column totals for the table.
b) Do you think that age and what part of the hospital the person went to are independent (not
related) or dependent (related)? Find a couple conditional probabilities to back up your answer. Hint:
Use the probability formula: $P(A \mid B) = P(A \mid C)$.
c) Now use a Chi Squared Independence Test on Statcrunch to test the claim that age and part of the
hospital are independent (not related). Write down all the expected values. Does the problem meet all
the assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi
Squared test statistic, p-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #5b. Would you change your mind?
6.
       Type A   Type B   Type AB   Type O
Rh+    35       24       11        91
Rh-    12       6        10        21
a) Find all the row and column totals for the table.
b) Do you think that blood type and Rh factor are independent (not related) or dependent (related)?
Find a couple conditional probabilities to back up your answer. Hint: Use the probability formula:
$P(A \mid B) = P(A \mid C)$.
c) Now use a Chi Squared Independence Test on Statcrunch to test the claim that blood type and Rh
factor are dependent (related). Write down all the expected values. Does the problem meet all the
assumptions necessary to perform the test? Include the null and alternative hypothesis, the Chi
Squared test statistic, p-value, whether or not you reject the null hypothesis and a conclusion.
d) Now go back and analyze your answer to #6b. Would you change your mind?
7. Follow-up question: Write a few sentences talking about the difficulty in determining whether there
is a “significant” difference between values. Why do we need an independence test? Can’t we just use
conditional probabilities to prove independence?
Math 140 Hypothesis Tests Activity #15
Homogeneity Tests and Simulation
Directions: For numbers 1 and 2, use StatKey at www.lock5stat.com to simulate the following chi-squared tests for association. Go to “more advanced randomization tests” at the bottom of the
statkey page. Click on the button that says “chi-squared test for association”. Remember a test for
association has the same test statistic as a homogeneity test.
1. A random sample of people were asked whether bottled water from various companies or filtered
water tastes better. In statkey, you will find this data already entered. If not, look at the top left button.
It should say “water taste”. Test the claim that the distributions of percentages that prefer bottled
water and filtered water are different depending on the company. What are the degrees of freedom?
Give the null and alternative hypothesis. Simulate the null hypothesis. Notice statkey simulates chi-squared values from a population that has the same distribution (i.e. as if the companies were the
same). Use the right tailed function to check if the real original sample data is significant. What is the test
statistic? What is the P-value? In the original sample data, were the observed sample values
significantly different than the expected values in the null hypothesis? Could the original sample data
have happened by random chance (sampling variability)? Do we reject or fail to reject the null
hypothesis? Write a conclusion for the test. (Use a 5% significance level.) (You can assume that the
data meets the assumptions for inference.)
2. Let’s look at another data set in Statkey. Click on the top left button and change the data to
“profession by handedness”. This looks at various professions and if the person was left or right
handed. Test the claim that the distributions of right and left handed should be the same for the various
professions. What are the degrees of freedom? Give the null and alternative hypothesis. Simulate the
null hypothesis. Notice statkey simulates chi-squared values from a population that has the same
distribution (i.e. as if the professions had the same percentages of right and left handed). Use the right
tailed function to check if the real original sample data is significant. What is the test statistic? What is the
P-value? In the original sample data, were the observed sample values significantly different than the
expected values in the null hypothesis? Could the original sample data have happened by random
chance (sampling variability)? Do we reject or fail to reject the null hypothesis? Write a conclusion for
the test. (Use a 5% significance level.) (You can assume that the data meets the assumptions for
inference.)
Directions: For numbers 3-6, use statcrunch to find the chi-squared test statistic and P-value and
complete the hypothesis test. First type the two way table as you see it into statcrunch. You will
need the row and column labels also. Now go to the “stat” menu, then “tables”, then “contingency”.
Tell statcrunch the columns your counts are in and under “row labels” the column your row labels are
in. Be sure to click on expected counts and contribution to chi-squared also.
3. Three random samples were taken of Democrats, Republicans and Independents. Use a
Homogeneity test to answer the following question. Does the evidence suggest that the proportion of
individuals for or against the legalization of marijuana is the same for each political affiliation? Use a
0.05 significance level. Make sure to check if the data meets the assumptions necessary for a
homogeneity test. Include the null and alternative hypothesis, chi-square test statistic, p-value, whether
or not you reject the null hypothesis and a detailed conclusion.
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
                            Democrat   Republican   Independent
Legalize Marijuana          240        121          292
Do not Legalize Marijuana   326        370          446
4. A random sample of American adults was taken and their health and education status obtained. Use
a Homogeneity Test to test the claim that health is the same regardless of education level. Use a 0.05
significance level. Make sure to check if the data meets the assumptions necessary for a Homogeneity
Test. Include the null and alternative hypothesis, chi-square test statistic, p-value, whether or not you
reject the null hypothesis and a detailed conclusion.
                                 Excellent Health   Good Health   Fair Health   Poor Health
Less than High School            72                 202           199           62
High School Diploma              465                877           358           108
Some College/Associates Degree   80                 138           49            11
Bachelor’s Degree                229                276           64            12
Graduate Degree                  130                147           32            2
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
5. Use a homogeneity test to test the following claim. An obstetrician wants to learn whether the
amount of prenatal care is different depending on how wanted the pregnancy was. He randomly
selects 939 women and finds the following information about when they started prenatal care (if ever)
and how wanted the pregnancy was. Use a 0.05 significance level. Make sure to check if the data meets
the assumptions necessary for a Homogeneity Test. Include the null and alternative hypothesis, chi-square test statistic, p-value, whether or not you reject the null hypothesis and a detailed conclusion.
                       Prenatal care in   Prenatal care in   Prenatal care in >5 months
                       < 3 months         3-5 months         (or never)
Intended Pregnancy     593                26                 33
Unintended Pregnancy   64                 8                  11
Mistimed Pregnancy     169                19                 16
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
6. In 2005, a random sample of 1000 chickens sold in grocery stores were tested for presence of
salmonella or campylobacter. These are dangerous and can cause illness in people. The study was
repeated in 2008. Use a Homogeneity test to determine whether there has been a difference in the
proportions of outbreaks in 2005 vs 2008. Use a 0.05 significance level. Make sure to check if the data
meets the assumptions necessary for a Homogeneity Test. Include the null and alternative hypothesis,
chi-square test statistic, p-value, whether or not you reject the null hypothesis and a detailed
conclusion.
        Salmonella or Campylobacter Present   Salmonella or Campylobacter Not Present
2005    86                                    914
2008    74                                    926
As a follow up, answer the following two questions: Are the observed values significantly different than
the expected values from the null hypothesis? Could the sample data have happened by random chance
(sampling variability)?
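The simulation idea from numbers 1 and 2 (generate chi-squared values as if the groups really had the same distribution, then see where the original sample falls in the right tail) can be sketched in Python. This is a rough illustration using the chicken data from problem 6. It shuffles the outcomes across the two years, which is one common way to simulate the null hypothesis; it is not claimed to be exactly what statkey does, and the function names, the number of repetitions and the seed are all assumptions made for this sketch.

```python
import random

# Observed counts from problem 6: [present, not present] for each year.
observed = {"2005": [86, 914], "2008": [74, 926]}

def chi_squared(rows):
    """Chi-squared statistic for a two-way table given as a list of rows."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    stat = 0.0
    for r, rt in zip(rows, row_totals):
        for o, ct in zip(r, col_totals):
            e = rt * ct / grand          # expected count for this cell
            stat += (o - e) ** 2 / e
    return stat

def randomization_p_value(rows, reps=1000, seed=1):
    """Shuffle the outcomes across the groups, as if the null hypothesis were true."""
    rng = random.Random(seed)
    sizes = [sum(r) for r in rows]
    n_cols = len(rows[0])
    # One outcome label (column index) per individual in the study.
    outcomes = [j for r in rows for j, count in enumerate(r) for _ in range(count)]
    observed_stat = chi_squared(rows)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        shuffled, start = [], 0
        for size in sizes:
            group = outcomes[start:start + size]
            shuffled.append([group.count(j) for j in range(n_cols)])
            start += size
        # Right tail: count simulated statistics at least as large as the observed one.
        if chi_squared(shuffled) >= observed_stat - 1e-9:
            extreme += 1
    return observed_stat, extreme / reps

stat, p_value = randomization_p_value(list(observed.values()))
print("chi-squared =", round(stat, 4), "simulated P-value =", p_value)
```

The simulated P-value is the proportion of shuffled tables whose chi-squared value is at least as large as the one from the original sample, so it changes slightly with the seed and the number of repetitions.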
Hypothesis Test Notes
Analysis of Variance (ANOVA)
Recall that the goodness of fit categorical data test can be used when comparing a percentage
in 3 or more groups. What if we have quantitative data from 3 or more groups and want to
compare the mean averages?
The answer to this is the ANOVA test.
ANOVA stands for “Analysis of Variance”.
It is a favorite of statisticians because it is very versatile and can be used for comparing the means
of quantitative data sets.
ANOVA Null and Alternative Hypothesis
The “one-way” ANOVA hypothesis test is used to compare the mean average of one quantitative
variable between several groups, where the groups come from a single categorical variable. If the
groups come from two categorical variables at once, that is called a
“Two-way ANOVA”. (We will only cover one-way ANOVA)
Example 1-Mean Average Salaries for people living in five states in Australia.
Suppose we want to compare the mean average weekly salary for people living in 5 states in
Australia (Northern Territory, New South Wales, Queensland, Victoria, and Tasmania). We think
they are different. As with all multiple population hypothesis tests, you should label the
populations.
$\mu_1$: Northern Territory
$\mu_2$: New South Wales
$\mu_3$: Queensland
$\mu_4$: Victoria
$\mu_5$: Tasmania
Here is the null and alternative hypothesis for the ANOVA test. Remember an ANOVA is a
multiple mean ($\mu$) test for 3 or more groups.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_A$: at least one is $\neq$ (claim)
When doing an ANOVA test, it is good to find the sample size (n), the sample mean of each
group, and the standard deviation or variance for each group. Here is the sample data from
StatCrunch.
Summary statistics:
Column            n    Mean        Std. dev.   Variance
North Territory   35   1534.5395   701.52474   492136.96
New South Wales   35   1536.8228   677.14095   458519.87
Queensland        35   1368.2912   536.31969   287638.81
Victoria          35   1149.0504   516.55309   266827.09
Tasmania          35   898.69512   386.35397   149269.39
Note: The key question: Are these sample means different because of sampling variability
(random chance) OR are they different because at least one of the populations really is
different?
To answer this, we need a test statistic and a P-value.
ANOVA Test Statistic – F distribution
The T test statistic can only be used to compare two things, either a sample mean to a
population mean or the mean averages from two populations. Either way, the T-test statistic
cannot handle 3 or more groups.
F-test statistic to the rescue
The F-test statistic uses variance to measure how different the sample means are. It relies on
two specific variances. Remember variance (standard deviation squared) is a measure of
spread that determines how far values are from the mean.
• Variance between the groups (How far the sample means for each group are from the
overall mean of all the groups combined)
• Variance within the groups (How far each sample value is from its own sample mean.)
The F-test statistic
$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$
F-test statistic sentence: The ratio of the variance between the groups to the variance within
the groups.
Now let’s watch the 3 ANOVA videos on Khan Academy to see how the F is calculated.
Notes about the F-test statistic
• In a fraction, when the numerator is significantly larger than the denominator, the
overall fraction is large. So if the variance between the groups is much larger than the
variance within the groups, this will give a large F-test statistic (small P-value) and
indicates that the sample means are significantly different. (Unlikely to happen by
random chance, reject the null hypothesis)
• In a fraction, when the numerator is the same as or smaller than the denominator, the
overall fraction is small. So if the variance between the groups is about the same as or smaller than the
variance within the groups, this will give a small F-test statistic (large P-value) and
indicates that the sample means are not significantly different. (Could have happened by
random chance, fail to reject the null hypothesis)
• There are three degrees of freedom in an ANOVA: df within the groups, df between the
groups, and df total.
Here are the degrees of freedom in our last example of Australia weekly salaries
df between = # groups – 1 = 5 – 1 = 4
total df = total # of data values from all groups combined – 1 = 35×5 – 1 = 175 – 1 = 174
df within the groups
= (35 – 1) + (35 – 1) + (35 – 1) + (35 – 1) + (35 – 1) = 34 + 34 + 34 + 34 + 34 = 170
Notice df between (4) + df within (170) = df total (174)
• Do not calculate the F test statistic by hand with a calculator. It is a really difficult
calculation. Calculate the F-test statistic with computer software like StatCrunch or
StatKey.
• Like Chi-Squared, the F-distribution is always skewed right and the ANOVA test is always
a right tailed test.
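To see how the pieces above fit together, here is a minimal Python sketch that builds the F-test statistic and the degrees of freedom from the summary statistics listed earlier for the five Australian states. The function name and the layout of the data are assumptions made for this illustration; in practice the notes recommend letting StatCrunch or StatKey do this calculation.

```python
# Summary statistics from the StatCrunch printout: (n, sample mean, sample variance).
groups = {
    "North Territory": (35, 1534.5395, 492136.96),
    "New South Wales": (35, 1536.8228, 458519.87),
    "Queensland":      (35, 1368.2912, 287638.81),
    "Victoria":        (35, 1149.0504, 266827.09),
    "Tasmania":        (35, 898.69512, 149269.39),
}

def one_way_anova_from_summary(stats):
    """Return (F, df_between, df_within) from per-group n, mean and variance."""
    k = len(stats)
    total_n = sum(n for n, _, _ in stats)
    grand_mean = sum(n * mean for n, mean, _ in stats) / total_n
    # How far each group's sample mean is from the overall mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in stats)
    # How far the values are from their own group mean: (n - 1) * variance per group.
    ss_within = sum((n - 1) * var for n, _, var in stats)
    df_between = k - 1
    df_within = total_n - k
    ms_between = ss_between / df_between   # variance between the groups
    ms_within = ss_within / df_within      # variance within the groups
    return ms_between / ms_within, df_between, df_within

f_stat, df_between, df_within = one_way_anova_from_summary(list(groups.values()))
print("F =", round(f_stat, 4), "df between =", df_between, "df within =", df_within)
```

Because the sample sizes are equal here, the variance within the groups works out to the plain average of the five sample variances.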
How to do an ANOVA test with StatCrunch
Copy and paste your raw quantitative data from each group into some columns of StatCrunch.
To check assumptions, you will want to create histograms of all your data sets to check shape.
Go to the “graph” menu and click on “histogram”. (Or you can check the dot plot option in the one-way ANOVA menu.) A side by side boxplot is also a nice summary of center and spread.
To calculate the F-test statistic and P-value, go to the “stat” menu, then “ANOVA”, then “one
way”.
Stat → ANOVA → One Way
Hold the control key down to select the columns where your data is and push compute. Here is
the printout we got.
Analysis of Variance results:
Data stored in separate columns.
Column statistics
Column            n    Mean        Std. Dev.   Std. Error
North Territory   35   1534.5395   701.52474   118.57932
New South Wales   35   1536.8228   677.14095   114.45771
Queensland        35   1368.2912   536.31969   90.654574
Victoria          35   1149.0504   516.55309   87.313408
Tasmania          35   898.69512   386.35397   65.305741
ANOVA table
Source    DF    SS         MS          F-Stat      P-value
Columns   4     10484499   2621124.8   7.9217156   <0.0001
Error     170   56249332   330878.43
Total     174   66733832
Let’s see if we understand what we are seeing. Notice the MS (mean sum of squares) is the
sum of squares (SS) divided by degrees of freedom (df).
MS (columns) is the variance between the groups (2621124.8)
MS (Error) is the variance within the groups (330878.43)
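As a quick check of the statement that MS is SS divided by df, the two mean squares can be recomputed from the SS and DF columns of the ANOVA table. The subscripts below are just labels chosen for this worked example, and the small difference in the last digits comes from the rounded SS values in the printout.

```latex
\[
MS_{\text{columns}} = \frac{SS_{\text{columns}}}{df_{\text{columns}}}
                    = \frac{10484499}{4} \approx 2621124.8,
\qquad
MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}
                  = \frac{56249332}{170} \approx 330878.4
\]
```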
So the F-test statistic is calculated by the formula
$F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}} = \dfrac{2621124.8}{330878.43} \approx 7.9217$
So the variance between the groups is almost 8 times greater than the variance within the
groups. Is this significantly large for an F?
Again, when unsure about a test statistic refer to a simulation or the P-value. If the sample
data is in the tail of the simulation or if the P-value is close to zero it is significant.
Notice in our printout from StatCrunch we got the following P-value: “<0.0001”. This means
the actual P-value is very close to 0.
P-value $\approx 0$
From our study of P-values, we know this is very significant. So the F test statistic is significantly
large and the variance between the groups is significantly greater than the variance within the
groups. This is highly unlikely to happen by random chance.
Reject the null hypothesis $H_0$
Conclusion?
Recall the claim was that at least one state was different than the others (alternative
hypothesis). Since we rejected the null, we support this claim. Our P-value is very small and
our test statistic very large, so we have significant sample evidence.
Conclusion: There is significant sample evidence to support the claim that the mean average
salaries of people in Northern Territory, New South Wales, Queensland, Victoria, and Tasmania
are different.
Assumptions for ANOVA
• Random
• Sample sizes at least 30 or bell shaped
• Groups independent of each other
• Equal population variances (no group has a standard deviation
more than twice as big as any other group)
Check the assumptions for the Australian states problem.
Random? The data was a random sample of people living in these states in Australia.
30 or bell shaped? Some histograms were bell shaped and some a little skewed, but since the
sample sizes were 35 (over 30), it does pass this assumption.
Independent groups? The data may fail this one. Salaries from state to state all may be related
due to their reliance on the overall economy and unemployment rates of Australia.
Equal Population Variances? Look at the standard deviations listed in the ANOVA printout.
The largest standard deviation is 701.5 and the smallest is 386.4, so no standard deviation is
more than twice as large as any other. So it passes this assumption.
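The “no more than twice as big” rule of thumb for the equal variance assumption is easy to check with a short calculation. The sketch below uses the standard deviations from the ANOVA printout; the function name and the `max_ratio` parameter are assumptions made for this illustration.

```python
# Sample standard deviations from the ANOVA printout.
std_devs = {
    "North Territory": 701.52474,
    "New South Wales": 677.14095,
    "Queensland": 536.31969,
    "Victoria": 516.55309,
    "Tasmania": 386.35397,
}

def equal_variance_check(sds, max_ratio=2.0):
    """Rule of thumb from the notes: largest SD is at most twice the smallest SD."""
    ratio = max(sds) / min(sds)
    return ratio, ratio <= max_ratio

ratio, passes = equal_variance_check(list(std_devs.values()))
print("largest SD / smallest SD =", round(ratio, 3), "passes:", passes)
```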
Simulating the F-distribution
We can also determine if the F-test statistic is sufficiently large by simulation. Go to
www.lock5stat.com and click on “StatKey”. Click on “ANOVA for difference in means”.
In the top right corner, change the “ants” problem to “Fish Gill, Gill rates by Calcium”. This is
exploring if the amount of calcium is related to how well the gills of a fish work.
Here is the null and alternative hypothesis. Remember when the population means are equal that
means calcium is not related to how well a fish’s gills function. When the population means are
different, this means that calcium is related.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_A$: at least one is $\neq$ (claim)
Notice the F-test statistic has already been calculated.
Let’s see if the F = 4.648 is significant by simulating F test statistic assuming the null hypothesis
is true and the population means are equal.
2000 simulations gave the following. Notice we are looking for the probability that the sample
data (original F test statistic) or more extreme happened by random chance. This is the P-value.
Notice the original sample F test statistic did fall in the tail. It also has a very small P-value
(0.012). Both of these things tell us that the F test statistic of 4.648 was significant.
Reject the null hypothesis.
There is sufficient sample evidence to support the claim that calcium is related to the function
of a fish’s gills (the population means are different).
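The simulation described above can be imitated in Python by shuffling all of the data values among the groups (which is what we would expect to be harmless if the population means really were equal) and recomputing F each time. This is only a sketch: the three data sets below are made-up numbers, not the StatKey fish gill data, and the function names, seed and number of repetitions are assumptions. StatKey's own simulation method may differ in its details.

```python
import random

def f_statistic(groups):
    """Variance between the groups divided by variance within the groups."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))

def simulated_p_value(groups, reps=2000, seed=1):
    """Right-tailed P-value: proportion of shuffled F values >= the observed F."""
    rng = random.Random(seed)
    observed_f = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                  # mix the groups as if the null were true
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed_f:
            extreme += 1
    return observed_f, extreme / reps

# Hypothetical measurements for three groups (NOT the StatKey fish gill data).
low = [55, 63, 78, 85, 65, 98, 68, 84]
medium = [38, 42, 63, 46, 55, 63, 36, 58]
high = [59, 45, 63, 52, 59, 78, 72, 53]

observed_f, p_value = simulated_p_value([low, medium, high])
print("F =", round(observed_f, 3), "simulated P-value =", p_value)
```

A small simulated P-value means the original F fell far out in the right tail of the shuffled F values, which is the same reading used with the StatKey dot plot.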
Math 140 Hypothesis Test Act 16
Exploring ANOVA and the F-distribution through Simulation
ANOVA stands for Analysis of Variance. It is a popular test among statisticians. A one-way ANOVA
hypothesis test is like a multiple mean average test (usually 3 or more groups). Here is a sample null and
alternative hypothesis.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_A$: at least one population mean is different
ANOVA uses a very ingenious test statistic: the ratio of the variance between the
groups to the variance within the groups. We call this ratio the “F-test statistic”; its distribution when
the null hypothesis is true is the “F-distribution”. Recall that variance is
the square of the standard deviation and is a measure of spread from the mean.
ANOVA Test Statistic $F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$
So the main idea is this:
If the variance between the groups is very large compared to the variance within the groups then the
population means are probably not the same, and the F ratio will be rather large. This will result in a
small P-value and rejecting the null hypothesis.
If the variance between the groups is small compared to the variance within the groups then the
population means might be the same, and the F ratio will be rather small. This will result in a large
P-value and we will fail to reject the null hypothesis.
Here are the key questions:
• How do I know if the F-test statistic is large enough to be considered significant? (Is the
variance between significantly greater than the variance within the groups?)
• Is it likely or unlikely that the sample data occurred by random chance from equal populations,
or does this give evidence that the population means really are different?
These questions and more can be answered by studying simulation. Specifically we are going to
simulate random data from groups with equal population means and compare the original sample data
to the simulation.
Note: ANOVA does have several assumptions that we check, but for this activity you can assume the
assumptions are met. We will focus on understanding the ANOVA test and the F-test statistic through
simulation.
Go to www.lock5stat.com and click on the StatKey button. Under the “More advanced randomization
tests” menu click on “ANOVA for Difference in Means”.
1. The first data we are going to look at is the “Sandwich Ants” data. It should be entered, but if not,
you can click on the button at the top left of the page and click on “Sandwich Ants”. We are studying
the number of ants that are drawn to different kinds of food. In this data, we are looking at the mean
average number of ants that come to three different types of sandwiches. Test the claim that the
number of ants will be different depending on which sandwich is left out.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F test stats from populations with equal means. What shape is the simulated
distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the
groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.
2. Now click on the Pulse rate and award data. This data looks at the average pulse rates of those
people that have won Olympic, Academy and Nobel awards. Test the claim that the population mean
average pulse rate is the same for the three groups.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F test stats from populations with equal means. What shape is the simulated
distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the
groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.
3. Now click on the Homes for Sale (price by state) data. This data looks at the average selling price of
homes in four different states. Test the claim that the population mean average home price is different
in the various states.
a) Write the null and alternative hypotheses for the test. Which is the claim?
b) What is the F-test statistic for the sample data? Does it look significant? Let’s find out by simulating.
c) Simulate random F test stats from populations with equal means. What shape is the simulated
distribution?
d) Estimate the P-value from the simulation. Write a sentence to explain the P-value.
e) Is it likely or unlikely that the original sample F test statistic happened by random chance?
f) Do you think the variance between the groups is significantly greater than the variance within the
groups? Why?
g) Will you reject or fail to reject the null hypothesis?
h) Write a conclusion for the ANOVA hypothesis test.
Math 140 Hypothesis Test Activity 17
Analysis of Variance (ANOVA) with Statcrunch
One-Way ANOVA test assumptions
• Random samples of three or more groups, each measuring the same quantitative variable.
• Groups are independent of each other.
• Check if all the samples are at least 30 or nearly normal.
• Populations have same variance. Check if the largest sample standard deviation is less than or
equal to twice the smallest standard deviation.
ANOVA Test Statistic $F = \dfrac{\text{Variance between the groups}}{\text{Variance within the groups}}$
Directions: Copy and paste the ANOVA test data sets into Statcrunch and answer the following
questions.
1. Random samples of black bears were weighed at various times of the year. Some of the bears
were weighed in April through July. Others were weighed in August and September or October
and November.
a) Create a side by side box plot for the three data sets. Draw the boxplot below and describe
the graph. Also find the mean average, standard deviation and variance for each of the
three data sets. How do they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is
small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average weight of black bears is different
depending on what time of year they are measured. (Can you think of a reason why this
might be true?) Give the null and alternative hypothesis, the F-test statistic and the P-value.
Write a sentence describing the meaning of the F-test statistic and another sentence
describing the meaning of the P-value. Give the degrees of freedom between the groups
and the degrees of freedom within the groups. Did you reject the null hypothesis or fail to
reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within
the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups
that have the same mean average? (Explain how you know.)
2. Now we are going to look at the relationships between how much sleep Math 075 students get
and how many units they have completed at COC. Since the Math 075 data is census data we
can assume it is representative of the population of all 075 students. Julie thinks that the
average number of units will be the same no matter how much sleep the person gets. Analyze
the data for Julie. The number of units have been broken up into three data sets (less than 6
hours, 6-8 hours, more than 8 hours).
a) Create a side by side box plot for the three data sets. Draw the boxplot below and describe
the graph. Also find the mean average and variance for each of the three data sets. How do
they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is
small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average number of units completed at COC is
the same regardless of how much sleep a person gets. Give the null and alternative
hypothesis, the F-test statistic and the P-value. Write a sentence describing the meaning of
the F-test statistic and another sentence describing the meaning of the P-value. Give the
degrees of freedom between the groups and the degrees of freedom within the groups. Did
you reject the null hypothesis or fail to reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within
the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups
that have the same mean average? (Explain how you know.)
3. Let’s look again at the Math 075 data and explore the relationship between political party and
how much alcohol someone drinks. Since the Math 075 data is census data we can assume it is
representative of the population of all 075 students. The amount of alcohol drunk has been
separated into four data sets corresponding to four political affiliations (democrat, republican,
independent, other). Is the average amount of alcohol drunk different depending on political
affiliation?
a) Create a side by side box plot for the four data sets. Draw the boxplot below and describe
the graph. Also find the mean average and variance for each of the four data sets. How do
they compare?
b) Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is
small, be sure to check the nearly normal assumption by making a histogram.)
c) Use an ANOVA test to test the claim that the average amount of alcohol a person drinks
differs depending on political affiliation. Give the null and alternative hypothesis, the F-test
statistic and the P-value. Write a sentence describing the meaning of the F-test statistic and
another sentence describing the meaning of the P-value. Give the degrees of freedom
between the groups and the degrees of freedom within the groups. Did you reject the null
hypothesis or fail to reject? Write a conclusion for your test.
d) Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within
the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups
that have the same mean average? (Explain how you know.)
4. Let’s look at the math 140 survey from fall 2015. We are analyzing the number of minutes per
week spent on social media. We separated the data by the type of social media being used.
Test the claim that the number of minutes spent is the same no matter what social media was
being used.
a. Create a side by side box plot for the three data sets. Draw the boxplot below and
describe the graph. Also find the mean average and variance for each of the three data
sets. How do they compare?
b. Does the data meet the assumptions necessary to do an ANOVA test? (If the data set is
small, be sure to check the nearly normal assumption by making a histogram.)
c. Use an ANOVA test to test the claim that the number of minutes spent is the same no
matter what social media was being used. Give the null and alternative hypothesis, the
F-test statistic and the P-value. Write a sentence describing the meaning of the F-test
statistic and another sentence describing the meaning of the P-value. Give the degrees
of freedom between the groups and the degrees of freedom within the groups. Did you
reject the null hypothesis or fail to reject? Write a conclusion for your test.
d. Follow up questions:
i.) Was the variance between the groups significantly higher than the variance within
the groups? (Explain how you know.)
ii.) Is it likely or unlikely that the sample data occurred by random chance from groups
that have the same mean average? (Explain how you know.)
Correlation Hypothesis Test Notes
Sample Correlation Coefficient: r
Population Correlation Coefficient (rho): $\rho$
• Two variables have correlation if $\rho$ (rho) is close to +1 or -1. (r is significantly different
than zero)
• Two variables have positive correlation if $\rho$ (rho) is close to +1. (r is significantly
greater than zero)
• Two variables have negative correlation if $\rho$ (rho) is close to -1. (r is significantly less
than zero)
• Two variables do not have correlation if $\rho$ (rho) is close to zero. (r is not significantly
different than zero)
Correlation Hypothesis Test Example
X : Amount of Tar (mg)
Y : Amount of CO (ppm)
Test the claim that there is a positive correlation between tar and carbon monoxide.
$H_A: \rho > 0$ (or Rho > 0) (claim) (Is positive correlation)
$H_0: \rho = 0$ (or Rho = 0) (no correlation)
r = 0.9335
P-value < 0.0001
Assumptions: Random, 2 quantitative data sets, scatterplot shows a linear trend (points are
close to the line) and there are no influential outliers. The histogram of the residuals showed a
nearly normal distribution centered close to zero. The residual plot showed a slight fan
shape, so it fails the homoscedasticity requirement. My sample size was less than 30 (29) but
the data was bell shaped, so it passes the 30 or normal requirement.
P-value < 0.0001
Sentence: If Ho is true (no correlation) then there is less than 0.0001 chance of getting the
sample data or more extreme.
Could this data happen by random chance from a population with no correlation? Very unlikely
(0.0001)
Is the r value of 0.9335 significantly different than zero? Yes. There is a significant difference
(low p-value)
Reject Ho
There is significant sample evidence to support the claim that there is a positive correlation
between the amount of tar in a cigarette and the amount of carbon monoxide.
Note about the null and alternative hypothesis of a correlation test
You may see the null and alternative hypothesis written differently in various books and programs.
Look at the following formula:
$\text{slope} = r \left( \dfrac{s_y}{s_x} \right)$
When there is no correlation, the r value gets close to zero. The standard deviations of the x and y
variables are both positive numbers. So as the correlation coefficient r gets close to zero, so does the
slope of the regression line.
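The link between r and the slope can be checked numerically. The sketch below uses a small made-up data set (not the tar and carbon monoxide data) and compares the slope computed from r and the two standard deviations with the least squares slope computed directly; the function names are assumptions made for this illustration.

```python
import math
import statistics

def correlation(xs, ys):
    """Sample correlation coefficient r."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_slope(xs, ys):
    """Slope of the regression line computed directly from the data."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, chosen only to illustrate the formula.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
slope_from_r = r * statistics.stdev(y) / statistics.stdev(x)   # slope = r * (s_y / s_x)
slope_direct = least_squares_slope(x, y)
print("r =", round(r, 4))
print("slope from r:", round(slope_from_r, 4), "direct slope:", round(slope_direct, 4))
```

Since both standard deviations are positive, the slope always has the same sign as r and is zero exactly when r is zero.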
Three ways of writing correlation null and alternative hypothesis. These are all equivalent statements.
$H_0$: No Correlation
$H_A$: Is Correlation
OR
$H_0: \rho = 0$
$H_A: \rho \neq 0$
OR
$H_0$: Slope $= 0$
$H_A$: Slope $\neq 0$
Similarly, there are three ways of writing positive correlation null and alternative hypothesis (right tail).
These are also equivalent statements.
$H_0$: No Correlation
$H_A$: Is Positive Correlation
OR
$H_0: \rho = 0$
$H_A: \rho > 0$
OR
$H_0$: Slope $= 0$
$H_A$: Slope $> 0$
Similarly, there are three ways of writing negative correlation null and alternative hypothesis (left tail).
These are also equivalent statements.
$H_0$: No Correlation
$H_A$: Is Negative Correlation
OR
$H_0: \rho = 0$
$H_A: \rho < 0$
OR
$H_0$: Slope $= 0$
$H_A$: Slope $< 0$
Note about test statistics used in a correlation test
There are three different test statistics that can be used in a correlation test. A StatCrunch printout will
show you all three. The important thing to remember is that all of them give virtually the same P-value.
• The sample correlation coefficient (r) can be used. This works especially well with simulation.
You can use simulation to see if r was significant and to calculate an approximate P-value.
• Like a two population mean hypothesis test, you can use a T-test statistic. However, the t will
not be measuring how many standard errors one sample is from another. Remember, in
correlation the x and y have different units. You cannot directly compare them. Instead the T
test statistic will measure how many standard errors the slope of the regression line is from
zero. Recall that when there is no correlation, the slope of the regression line goes to zero.
• Some statisticians also use an F-test statistic like an ANOVA test, though ANOVA is designed for
comparing means from 3 or more groups.
Note: The P-value for a y-intercept hypothesis test is very different than the P-value for a correlation
test. Do not use the y-intercept P-value on StatCrunch. That is for testing a y-intercept or initial value.
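The link between the test statistics described above can be sketched in a few lines. This is an illustration only, using made-up data and the Python standard library: it computes the T statistic as the slope divided by its standard error, recomputes it from r alone using t = r·sqrt(n − 2)/sqrt(1 − r²), and computes the F statistic from the regression and residual sums of squares. For simple linear regression F equals t squared, which is why the T and F versions of a two-tailed test report the same P-value.

```python
import math

# Hypothetical illustration data (not from the packet)
x = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 8.0, 9.5]
y = [1.8, 3.9, 3.1, 5.0, 6.4, 6.1, 8.2, 8.8]
n = len(x)

mx = sum(x) / n
my = sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)      # sample correlation coefficient
slope = sxy / sxx                   # least-squares slope
intercept = my - slope * mx

# Sum of squared residuals (SSE) and regression sum of squares (SSR)
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
ssr = syy - sse

# T statistic: how many standard errors the slope is from zero
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_from_slope = slope / se_slope

# The same T statistic written in terms of r only
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# F statistic (1 and n - 2 degrees of freedom)
f_stat = ssr / (sse / (n - 2))

print("t from slope / SE:", round(t_from_slope, 4))
print("t from r:         ", round(t_from_r, 4))
print("F:                ", round(f_stat, 4))
print("t squared:        ", round(t_from_slope ** 2, 4))
```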
Hypothesis Test Activity 18
Understanding Correlation through Simulation
Introduction: We learned yesterday that the correlation coefficient “r” can be used to
determine if there is a correlation (linear relationship) between two quantitative variables.
Today we ask a follow-up question: when the sample data show a correlation, is the sample
correlation coefficient significant enough to show that there is correlation between the
populations? Answering this question requires a hypothesis test.
We learned that if r is close to +1, there is strong positive correlation between the samples. If r
is close to -1, there is strong negative correlation between the samples. When r is close to 0,
that indicates that there is no correlation between the samples.
This can help us understand the samples, but how do we know how close the r value has to be
to 1 or -1 to be considered significant? Also is it significant enough to perform a correlation
hypothesis test (rho test)? These are important questions to answer through simulation and
understanding of r and P-value.
Here are some possible null and alternative hypotheses for a correlation rho test. The Greek
letter “rho” is often used for the population correlation coefficient. Remember “r” is a sample
correlation coefficient.
H0: rho = 0 (no correlation)
HA: rho ≠ 0 (is correlation)

H0: rho = 0 (no correlation)
HA: rho > 0 (is positive correlation)

H0: rho = 0 (no correlation)
HA: rho < 0 (is negative correlation)
We are going to use Statkey on www.lock5stat.com to simulate random correlation data. That
way we can compare the correlation coefficient (r) of the original sample data to the simulated
r values in the simulation. Through this we can tell if the r from the original sample was
significant. We can also estimate the P-value and determine whether the sample correlation
coefficient r could have occurred by random sampling variability (random chance) or whether
there really is correlation between the two populations.
Go to www.lock5stat.com and click on the “Statkey” button. Under “Randomization Hypothesis
Tests” click the one that says “Test for Slope, Correlation”. Make sure the top of the graph says
“Randomization Dotplot of correlation”. Notice the null hypothesis is rho = 0. Remember rho
looks like “ρ”, but it is not a “P”. Normally we would of course check the assumptions, but
for this activity we will just be focused on understanding the simulation. You can assume the
assumptions are met.
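The idea behind a randomization test for correlation can be sketched in code. This is a rough illustration and not Statkey's actual implementation: it assumes the simulation breaks any link between x and y by shuffling the y values, recomputes r for each shuffle, and estimates the P-value as the proportion of simulated r values at least as extreme as the sample r. The data, the function names, and the two-tailed rule (comparing absolute values) are assumptions made for this example.

```python
import math
import random

def correlation(xs, ys):
    # Sample correlation coefficient r
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def randomization_p_value(xs, ys, tail="right", reps=2000, seed=0):
    # Simulate the null hypothesis (rho = 0) by shuffling the y values
    rng = random.Random(seed)
    r_obs = correlation(xs, ys)
    shuffled = list(ys)
    count = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        r_sim = correlation(xs, shuffled)
        if tail == "right" and r_sim >= r_obs:
            count += 1
        elif tail == "left" and r_sim <= r_obs:
            count += 1
        elif tail == "two" and abs(r_sim) >= abs(r_obs):
            count += 1
    return r_obs, count / reps

# Hypothetical illustration data (not one of the Statkey data sets)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.3, 2.9, 4.1, 3.8, 5.5, 6.0, 6.2, 7.9, 8.1, 9.4, 9.0, 11.2]

r_obs, p_right = randomization_p_value(x, y, tail="right")
_, p_left = randomization_p_value(x, y, tail="left")

print("sample r =", round(r_obs, 3))
print("estimated right-tail P-value =", p_right)
print("estimated left-tail P-value  =", p_left)
```

A small estimated P-value means that shuffled data almost never produce an r as extreme as the sample r, so the sample r is unlikely to have happened by random chance if the null hypothesis is true.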
1. Let’s look at the Uniform Violence data from the NFL. This should be already entered, but if
not you can find it in the top left button. It should say “Malevolent Uniforms”. They looked at
NFL team uniforms and measured how scary their uniforms are on a scale of about 2 to 5. They
then looked at how much those teams are penalized. Is there a positive correlation between
having a scarier uniform and being more prone to penalties during the games?
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail,
left tail or two tailed test?
b) What is the r value for the sample data? Do you think the r-value will be significant enough
to show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to
explain the P-value. Is it likely or unlikely that the original sample r-value happened by random
chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.
2. Is there a negative correlation between the pH (acidity) of Florida lakes and the amount of
Mercury in the lake? This is the question we are striving to answer in the next simulation. On
the top left button, click on “Florida Lakes” data. The data was part of a study about dangerous
Mercury levels in many of the Florida lakes.
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail,
left tail or two tailed test?
b) What is the r value for the sample data? Do you think this value will be significant enough to
show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to
explain the P-value. Is it likely or unlikely that the original sample r-value happened by random
chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.
3. In the top left button in Statkey, change to “ICU Admission” data. Test the claim that there
is no correlation between Systolic Blood pressure and heart rate.
a) Write the null and alternative hypotheses for the test. Which is the claim? Is this a right tail,
left tail or two tailed test?
b) What is the r value for the sample data? Do you think this value will be significant enough to
show correlation between populations?
c) Perform the simulation and estimate the P-value from the simulation. Write a sentence to
explain the P-value. Is it likely or unlikely that the original sample r-value happened by random
chance?
d) Will you reject or fail to reject the null hypothesis?
e) Write a conclusion for the correlation hypothesis test.
Math 140 Hypothesis Test Activity 19
Correlation Hypothesis Test with Statcrunch
Directions: For each of the following problems, determine whether there is a linear relationship
(correlation) between the quantitative variables by performing a correlation rho test. For each
problem, decide which variable should be the explanatory variable and which should
be the response variable. Go to the “Stat” menu, click on “Regression”, then “Simple
Linear”. Put in the columns for the explanatory (x) and the response (y). Click on Fitted line
plot, Residuals versus x variable, and a Histogram of the residuals. You should get the
scatterplot, the Residuals versus x variable plot, and a Histogram of the residuals, along
with the r value, r-squared, standard deviation of the residuals, and the equation of the
regression line. You do not have to save the graphs in a Word document, but you will need to
look at the graphs when you check the assumptions. Be sure to check the assumptions. You may
assume the data has been collected randomly. Be sure to give the null and alternative
hypotheses, the correlation coefficient, and the P-value. Also state whether or not you reject
the null hypothesis and give a conclusion.
Assumptions for Correlation Rho Test

Two quantitative variables

Random ordered pair data with a sample size of at least 30.

Scatterplot shows a linear shape (Points in scatterplot follow a general linear
pattern.)

There are NO influential outliers in the scatterplot.

Histogram of the residuals is nearly normal (close to bell shaped).

Histogram of the residuals is centered at zero.

Points in the residual plot are evenly spread out from the regression line.
(Homoscedasticity) (No fan shape in the residual plot – residuals versus x values)
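The residual-based assumptions above are checked by looking at the graphs, but the quantities behind those graphs can be sketched in code. The following is a rough illustration with simulated data (not the women's health or bear data); the variable names and the idea of comparing the residual spread for the lower and upper halves of the x values are assumptions made for this example, and no numeric cutoff replaces looking at the plots.

```python
import random
import statistics

# Simulated illustration data: 30 ordered pairs with a linear trend plus noise
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [2.0 + 0.5 * a + rng.gauss(0, 1) for a in x]
n = len(x)

# Fit the least-squares regression line
mx, my = statistics.mean(x), statistics.mean(y)
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Residual = observed y minus predicted y
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Center of the residuals (a histogram of residuals should be centered near zero)
mean_resid = statistics.mean(residuals)
median_resid = statistics.median(residuals)

# Compare residual spread for small x values and large x values.
# A ratio far from 1 hints at a fan shape in the residual plot.
ordered = [res for _, res in sorted(zip(x, residuals))]
lower_half, upper_half = ordered[: n // 2], ordered[n // 2 :]
spread_ratio = statistics.stdev(upper_half) / statistics.stdev(lower_half)

print("sample size:", n)
print("mean of residuals:", round(mean_resid, 6))
print("median of residuals:", round(median_resid, 4))
print("spread ratio (upper half / lower half):", round(spread_ratio, 3))
```

Note that least-squares residuals always average to zero when the line includes a y-intercept, so the shape and center of the histogram carry more information than the mean alone.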
1. Open the women’s health data. Test the claim that there is a positive linear correlation
between the height of a woman and her weight.
2. Open the women’s health data. Test the claim that there is no linear correlation between
the diastolic blood pressure of a woman and her systolic blood pressure.
3. Open the women’s health data. Maria claims that there is a linear correlation between the
age of a woman and her cholesterol. Test Maria’s claim.
4. Open the Bear data. A park ranger claims that there is not a linear correlation between the
chest size of a bear and the width of its skull. Test this claim.
5. Open the Bear data. A park ranger claims that there is a positive linear correlation between
the neck size of a bear and its weight. Test this claim.
6. Open the Bear Data. A small boy named Joe went to the zoo. When looking at the bears, he
claims that you will not be able to show there is a correlation between the age of a bear and
the length of a bear because the bear data does not meet the assumptions necessary to do a
linear correlation hypothesis test. (Joe is a really smart kid.) A zoo keeper disagrees with him.
Do you agree with Joe or the zoo keeper? Why?