Introduction to Hypothesis Testing





Hypothesis testing methodology and terminology
Z test for the mean (σ known)
One-tail tests
T test for the mean (σ unknown)
Z test for the proportion
Purpose: To illustrate the use of the third Building Block:
“In order to evaluate an error, compare it to the standard error:

\[ \frac{\text{sample mean} - \text{population mean}}{\text{standard error}} \tag{A} \]

Note (a) The standard error consists of two components: a measure of variability and a measure of
knowledge.
(b) We evaluate the error using probability.
(c) If the probability is low, either the sample was unlikely or one of the population values in the
above ratio is not correct.”
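The ratio in (A) can be sketched in a few lines of Python. This is only an illustration: it assumes the standard error of a sample mean is σ/√n (the form used in the Z tests later in these notes), and the function name and the numbers are hypothetical.

```python
import math

def standardized_error(sample_mean, population_mean, sigma, n):
    """Return (sample mean - population mean) / standard error.

    The standard error sigma / sqrt(n) combines a measure of variability
    (sigma) with a measure of knowledge (the sample size n).
    """
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Hypothetical numbers: an error of 2 units with sigma = 10 and n = 25
ratio = standardized_error(52.0, 50.0, 10.0, 25)
print(ratio)  # 1.0
```

An error of 2 units is exactly one standard error here, which is not unusual; the same error with n = 400 would be four standard errors and very unlikely.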
1. Hypothesis testing methodology and terminology
Make a claim about a population characteristic, determine how much evidence is needed to reject this claim,
obtain data, and make a conclusion about the claim.
1.1 Hypothesis
A claim about a population characteristic such as the population average or proportion.
1.1.1 Null hypothesis, H0
1.1.1.1 Notes


A claim that represents the status quo or what is assumed to be true.
Always contains the equal sign
1.1.1.2 Examples


In a jury trial the person is assumed innocent.
The average starting salary of college graduates at a university was $53,000 last year.
(μ = $53,000)
1.1.2 Alternative hypothesis, H1
1.1.2.1 Notes

A claim that is the opposite of the null and is what you wish to support.

Never contains the equal sign
1.1.2.2 Examples


In a jury trial the person is guilty.
The average starting salary of college graduates at a university was not $53,000 last
year. (μ ≠ $53,000)
1.2 Types of errors and their probability

Wrong decisions can be made due to incomplete information
1.2.1 First type of error (Type I)
1.2.1.1 Notes


Saying the null hypothesis is wrong when actually the null hypothesis is true.
The probability of a Type I error is denoted by α and is also called the level of
significance.
1.2.1.2 Examples


Saying the person is guilty when they are actually innocent.
Making the decision that the average starting salary of college graduates at a university
was not $53,000 last year when actually it was.
1.2.2 Second type of error (Type II)
1.2.2.1 Notes



Failing to reject the null hypothesis when actually the null hypothesis is wrong.
The probability of this type of error is denoted by β.
(The probability of rejecting the null when it is wrong is 1 - β and is called the power of the
test.)
1.2.2.2 Examples


Saying the person is innocent when they are actually guilty.
Making the decision that the average starting salary of college graduates at a university
was $53,000 last year when actually it was not $53,000.
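To make the meaning of α concrete, the following sketch simulates many samples from a population in which the null hypothesis really is true and counts how often a two-sided Z test rejects it anyway. The population standard deviation ($8,000), sample size (30), number of trials, and function name are all hypothetical choices made for illustration; they do not come from the notes.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Estimate how often a two-sided Z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = 0.05
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null really is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true null
    return rejections / trials

# Hypothetical settings: sigma = 8,000 and n = 30 are assumptions
rate = type_i_error_rate(mu0=53_000, sigma=8_000, n=30, alpha=0.05, trials=5_000)
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α of 0.05, which is what "taking a 5% chance of a Type I error" means in practice.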
1.3 Decision Rules


Determine what evidence would cause you to reject the null and support the alternative.
Two approaches: how far is too far away or how unlikely must the evidence be in order to
reject the null.
1.3.1 Rejection Region Approach
1.3.1.1 Notes

Divide the number line into two sets: sample averages that would contradict the null
hypothesis claim and those that would not.

If the null hypothesis was true, it is possible some sample averages would fall in the
rejection region and you would make an error by rejecting the null when you shouldn’t.

Based on the chance of a Type I error decide how far is too far from the null.
1.3.1.2 Example:
H0: The population average is $53,000 (μ = $53,000).
Sample averages far above $53,000 or far below $53,000 would cause you to doubt the null is
true.
Since the null could be true, you decide to take only a 5% chance of making a Type I error.
Your rejection region would be any sample mean more than 1.96 standard errors above or more
than 1.96 standard errors below $53,000.
How far is too far if you are willing to take a 10% chance of a Type I error?
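The cutoff for "how far is too far" comes from the standard normal distribution. As a small sketch (the function name is hypothetical), Python's standard library can reproduce the table lookup for a two-sided test, where half of the Type I error chance goes in each tail:

```python
from statistics import NormalDist

def two_sided_critical_z(alpha):
    """Number of standard errors beyond which a two-sided Z test rejects."""
    # Half of alpha goes in each tail, so look up the 1 - alpha/2 quantile
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.10):
    z_crit = two_sided_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject if |Z| > {z_crit:.3f}")
```

Accepting a larger chance of a Type I error moves the cutoff closer to the null value, so less extreme sample averages fall in the rejection region.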
1.3.2 Probability Approach (p-value)
1.3.2.1 Notes

Decide on the chance you are willing to take of a Type I error.

Compute the p-value: the probability, if the null were true, of observing evidence at least as
extreme as your data. If the p-value is smaller than the chance you chose, reject the null.
1.3.2.2 Example:
H0: The population average is $53,000 (μ = $53,000).
Sample averages that are unlikely to be observed if the null was true would cause you to doubt
the null is true.
Since the null could be true, you decide to take only a 5% chance of making a Type I error.
Your rejection region would be any sample mean that has a less than 5% chance of being
observed if the null was true.
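The rejection region approach and the probability approach always lead to the same decision. The sketch below checks this for a two-sided Z test on a handful of arbitrary Z values; the function names and the Z values are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def two_sided_p_value(z):
    """Probability of a Z value at least this far from 0, in either direction."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def reject_by_region(z, alpha):
    """Rejection region approach: is Z beyond the critical value?"""
    return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)

def reject_by_p_value(z, alpha):
    """Probability approach: is the p-value below alpha?"""
    return two_sided_p_value(z) < alpha

alpha = 0.05
for z in (-2.5, -1.0, 0.3, 1.9, 2.1):
    print(z, round(two_sided_p_value(z), 4),
          reject_by_region(z, alpha), reject_by_p_value(z, alpha))
```

Each row prints the same True or False twice, because a Z value lies beyond the critical value exactly when its tail probability is below α.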
1.4 General Approach to Hypothesis Testing
State the H0

State the H1


Choose α

Choose n

Choose Test

Set up rejection region(s)

Collect data

Compute test statistic and p-value

Make statistical decision

Express conclusion in terms of rejecting null or not.
2. Hypothesis Test for the Mean (σ known)
2.1 Example of a two-sided test:
Test the claim that on average there are three TV sets in each U.S. home. Assume you know that the
population standard deviation is 1 set. You have collected a simple random sample of 100
households and found the average to be 3.2. Can you conclude that this claim is false? Let α = 0.05.



H0: μ = 3
H1: μ ≠ 3
This is a two sided rejection region since sample values far above or below three would cause
you to reject the null.
Rejection Region: Z > 1.96 or Z < -1.96

Test statistic:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3.2 - 3}{1 / \sqrt{100}} = 2 \]
The p-value is the probability of finding a sample average more extreme than the one observed if the null
is true. The probability of finding a sample mean more than a distance of 0.2 away from the mean
of 3 is 2(0.0228) = 0.0456.


Make statistical decision. Reject H0: μ = 3.
Conclusion: We do have enough evidence to conclude that the average number of TV sets in
U.S. homes differs from three.
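The TV set example can be checked numerically. This sketch uses the numbers given in the example and a 5% significance level (the level implied by the ±1.96 rejection region); the variable names are illustrative. The exact normal tail area gives a p-value of about 0.0455, which differs from the table-based 2(0.0228) = 0.0456 only by rounding.

```python
import math
from statistics import NormalDist

# Values from the TV set example; alpha = 0.05 matches the 1.96 cutoff
x_bar, mu0, sigma, n, alpha = 3.2, 3.0, 1.0, 100, 0.05

standard_error = sigma / math.sqrt(n)          # 1 / 10 = 0.1 sets
z = (x_bar - mu0) / standard_error             # about 2
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
if abs(z) > z_crit:
    print("Reject H0: the average number of TV sets differs from three.")
else:
    print("Fail to reject H0.")
```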
2.2 Example of a one-sided test:
In the past the average waist size of adult males in a town has been 36 inches with a standard deviation
of 3 inches. You wish to determine if the average waist size of males in a town is now greater than 36.
You collect a random sample of 36 men and determine that the average waist size is 37.5 inches. Again
let α = 0.05.



H0: μ ≤ 36
H1: μ > 36
This is a one-sided rejection region since only sample averages far above 36 would cause you to
reject the null and support the alternative.
Rejection Region: Z > 1.645

Test statistic:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{37.5 - 36}{3 / \sqrt{36}} = 3 \]
The p-value can be written in several forms. Using the “>” column of the Z-table, (a) 0.0013 is
the probability of finding a Z value greater than 3. By the third Building Block this is equivalent to
(b) the probability of finding an error larger than three standard errors. By the second Building
Block this is equivalent to (c) the probability of finding an error of more than 1.5 inches. Finally, in
the context of this problem, this can also be stated as (d) the probability of finding a sample mean
more than 1.5 inches above the population average of 36 inches. You will need to be able to
discuss p-values using form (d).


Make statistical decision. Reject H0: μ ≤ 36.
Conclusion: We do have enough evidence to conclude that the average waist size of males in this
town exceeds 36 inches.
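For the one-sided waist size example, the whole 5% chance of a Type I error sits in the upper tail. The sketch below (variable names are illustrative) also converts the Z cutoff back into inches, which shows directly how large a sample average would have to be before the null is rejected.

```python
import math
from statistics import NormalDist

# Values from the waist size example
x_bar, mu0, sigma, n, alpha = 37.5, 36.0, 3.0, 36, 0.05

standard_error = sigma / math.sqrt(n)            # 3 / 6 = 0.5 inches
z = (x_bar - mu0) / standard_error               # 1.5 / 0.5 = 3 standard errors
z_crit = NormalDist().inv_cdf(1 - alpha)         # all of alpha in the upper tail
cutoff_inches = mu0 + z_crit * standard_error    # rejection region in inches
p_value = 1 - NormalDist().cdf(z)                # upper tail only

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}")
print(f"Reject if the sample average exceeds {cutoff_inches:.2f} inches")
print(f"p-value = {p_value:.4f}: chance of a sample mean more than "
      f"{x_bar - mu0:.1f} inches above {mu0:.0f} if the null is true")
```

The observed average of 37.5 inches is well beyond the cutoff in inches, matching the decision to reject the null.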
For other examples, double click the embedded Excel file below. Scroll down to see solution. Press F9 to
see another example.
Show that the mean sales of all LCD panels is less than 308.
After collecting a random sample of 15 LCD panels
you find a sample mean of 134. Assume the distribution
of sales is normal with the population standard
deviation of 58.
Test at a 1% significance level.
If the embedded Excel file does not work, then click on the following link
http://wweb.uta.edu/faculty/eakin/busa3321/ZhyptestforMu.xls
Be sure to work a left-sided, a right-sided and a two-sided example.
3. Hypothesis Test for the Mean (σ unknown)
3.1 Notes

Same requirements as for t confidence interval:
o Random Sample
o Sampling from normal or large sample size

For this chapter degrees of freedom is n-1
3.2 Example:

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes
showed X = 372.5, ands  15. Test at the 0.01 level.


H0: μ ≤ 368
H1: μ > 368

This is a one-sided rejection region since only sample averages far above 368 would cause you
to reject the null and support the alternative.
Rejection Region: t > 2.438
Test statistic:
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{36}} = 1.80 \]
Using the “>” column of the T-Table, find 1.80 on row 35. It falls between the 0.05 and 0.025
columns. The interpretation of the p-value is that there was between a 2.5% and a 5% chance of
finding a sample average more than 4.5 grams above a population average of 368. Since the
p-value is greater than 0.01, it is not considered unlikely and the population mean might still be 368 or
something close to that. (To determine the value of the population mean use a confidence
interval.)


Make statistical decision. Fail to reject H0: μ ≤ 368.
Conclusion: We do not have enough evidence to conclude that the average box of cereal contains more
than 368 grams of cereal.
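The cereal example can be reproduced without a T-Table. Python's standard library has no t distribution, so this sketch approximates the upper tail area by integrating the t density numerically with Simpson's rule; that numerical method is a stand-in for the table lookup, not part of the notes, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area from 0 to t
    return 0.5 - total * h / 3

# Values from the cereal example
x_bar, mu0, s, n, alpha = 372.5, 368.0, 15.0, 36, 0.01
df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))   # 4.5 / 2.5 = 1.80
p_value = t_upper_tail(t_stat, df)

print(f"t = {t_stat:.2f} with {df} degrees of freedom, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The computed p-value falls between 0.025 and 0.05, in agreement with the table bracket, and is larger than 0.01, so the null is not rejected.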
For other examples, double click the embedded Excel file below. Scroll down to see solution. Press F9 to
see another example.
Show that the mean mpg of all cars is smaller than 33.
After collecting a random sample of 9 cars
you find a sample mean of 21 and a sample standard
deviation of 4. Assume the distribution of mpg is normal.
Test at a 5% significance level.
If the embedded Excel file does not work, then click on the following link
http://wweb.uta.edu/faculty/eakin/busa3321/thyptest.xls
Be sure to work a left-sided, a right-sided and a two-sided example.
4. Hypothesis Test for the Population Proportion
4.1 Notes

Same requirements as for confidence interval of the proportion:
o Random Sample
o Sample size large enough so that np and n(1-p) are both at least 5

Uses the z formula covered in the sampling distribution section
4.2 Example:
A marketing company claims that it receives 4% responses from its mailing. To test this claim, a
random sample of 500 were surveyed with 25 responses. Test at the α = 0.05 significance level.


H0: p = 0.04
H1: p ≠ 0.04

This is a two-sided rejection region since sample proportions that are either much smaller than
0.04 or much larger than 0.04 would cause you to reject the null and support the alternative.
Rejection Region: Z > 1.96 or Z < -1.96
Test statistic:
\[ Z = \frac{\hat{p} - p}{\sqrt{p(1 - p) / n}} = \frac{0.05 - 0.04}{\sqrt{0.04(1 - 0.04) / 500}} = 1.14 \]
 The p-value of 0.2542 is the probability of finding a sample proportion more than 0.01 away
from the value of 0.04. The p-value was found using the Outside column of the z-table.


Make statistical decision. Fail to reject that p = 0.04.
Conclusion: We do not have enough evidence to conclude that the marketing company's claim is false. It is
possible that it receives 4% responses from its mailing.
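The mailing response example can be verified with the same standard normal tools. The sketch below (variable names are illustrative) first checks the sample size requirement from the notes, then computes the test statistic using the hypothesized proportion in the standard error. With the unrounded Z the p-value comes out near 0.254; the table value 0.2542 corresponds to Z rounded to 1.14.

```python
import math
from statistics import NormalDist

# Values from the mailing response example
responses, n, p0, alpha = 25, 500, 0.04, 0.05

# Requirement from the notes: np and n(1 - p) are both at least 5
large_enough = n * p0 >= 5 and n * (1 - p0) >= 5

p_hat = responses / n                                # 0.05
standard_error = math.sqrt(p0 * (1 - p0) / n)        # uses the null value of p
z = (p_hat - p0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))         # two-sided

print(f"Sample size large enough: {large_enough}")
print(f"Z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```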
For other examples, double click the embedded Excel file below. Scroll down to see solution. Press F9 to see
another example.
Show that the proportion of all young buyers differs from 0.5.
After collecting a random sample of 62 buyers
you find that 47% of these are young buyers
Test at a 10% significance level.
If the embedded Excel file does not work, then click on the following link
http://wweb.uta.edu/faculty/eakin/busa3321/zHypTestForP.xls
Be sure to work a left-sided, a right-sided and a two-sided example.