Hypothesis Tests
Regarding a Parameter –
Single Mean & Single Proportion
Overview
• This is the other part of inferential statistics, hypothesis
testing
• Hypothesis testing and estimation are two different
approaches to two similar problems
– Estimation is the process of using sample data to
estimate the value of a population parameter
– Hypothesis testing is the process of using sample
data to test a claim about the value of a population
parameter
The Language of
Hypothesis Testing
Determine the null and alternative
hypotheses from a claim
Hypothesis Testing
• The environment of our problem is that we
want to test whether a particular claim is
believable, or not
• The process that we use is called
hypothesis testing
• This is one of the most common goals of
statistics
Hypothesis Testing
• Hypothesis testing involves two steps
– Step 1 – to state what we think is true
– Step 2 – to quantify how confident we are in
our claim
• The first step is relatively easy
• The second step is why we need statistics
Hypothesis Testing
• We are usually told what the claim is, what the
goal of the test is
• As with estimation in the previous unit, we will again use
the sampling distribution of the sample mean to quantify
how confident we are in our claim
Example
• An example of what we want to quantify
– A car manufacturer claims that a certain model of car
achieves 29 miles per gallon
– To test the claim, we test some number of
cars
– We calculate the sample mean … it is 27
– Is 27 miles per gallon consistent with the
manufacturer’s claim? How confident are we that the
manufacturer has significantly overstated the miles
per gallon achievable?
Example
• How confident are we that the gas economy is
definitely less than 29 miles per gallon?
• We would like to make either a statement
“We’re pretty sure that
the mileage is less than 29 mpg”
or
“It’s believable that the
mileage is equal to 29 mpg”
Level of Significance
• A hypothesis test for an unknown
parameter is a test of a specific claim
– Compare this to a confidence interval which
gives an interval of numbers, not a “believe it”
or “don’t believe it” answer
• The level of significance reflects the
confidence we have in our conclusion
Null Hypothesis
• How do we state our claim?
• Our claim
– Is the statement to be tested
– Is called the null hypothesis
– Is written as H0 (and is read as “H-naught”)
Alternative Hypothesis
• How do we state our counter-claim?
• Our counter-claim
– Is the opposite of the statement to be tested
– Is called the alternative hypothesis
– Is written as H1 (and is read as “H-one”)
Two-tailed Test
• There are different types of null hypothesis / alternative
hypothesis pairs, depending on the claim and the
counter-claim
• One type of H0 / H1 pair, called a two-tailed (or two-sided) test, tests whether the parameter is equal
to, versus not equal to, some value
– H0: parameter = some value
– H1: parameter ≠ some value
Example
• An example of a two-tailed test
• A bolt manufacturer claims that the diameter of the bolts
average 10 mm
– H0: Diameter = 10
– H1: Diameter ≠ 10
• An alternative hypothesis of “≠ 10” is appropriate since
– A sample diameter that is too high may be a problem
– A sample diameter that is too low may also be a problem
That is, we may reject the claim under the H0 , if the sample value
is either too high or too low
• Thus this is a two-tailed test
Left-tailed Test
• Another type of pair, called a left-tailed test,
tests whether the parameter is either equal to,
versus less than, some value
– H0: parameter = some value (This actually means
parameter ≥ some value)
– H1: parameter < some value
Note: Equality sign appears only in the null hypothesis.
Example
• An example of a left-tailed test
• A car manufacturer claims that the mpg of a certain model car is at least 29.0
– H0: MPG = 29.0 (This does not mean that MPG is exactly 29.0; it means
MPG ≥ 29.0)
– H1: MPG < 29.0
• An alternative hypothesis of “< 29” is appropriate since
– An mpg that is too low is a problem
– An mpg that is too high is not a problem
That is, we reject the claim under H0 if the observed sample mpg is too low,
much lower than 29.
• Thus this is a left-tailed test. (The side of the tail follows the direction of H1,
which supports a lower value of MPG, and lower values lie to the left
of higher values on a number line.)
Note: By convention, we write only the equality sign for the claim in H0, even
though it should be MPG ≥ 29.0. We can tell the actual direction of
the inequality under H0 by looking at the sign in H1, because H0 is the opposite of H1:
since MPG is less than 29 in H1, MPG is no less than 29 under H0.
Right-tailed Test
• A third type of pair, called a right-tailed test,
tests whether the parameter is equal to, versus
greater than, some value
– H0: parameter = some value (This actually means
parameter ≤ some value.)
– H1: parameter > some value
Note: Equality sign appears only under the null
hypothesis H0
Example
• An example of a right-tailed test
• A bolt manufacturer claims that the defective
rate of their product is at most 1 part in 1,000
– H0: Defect Rate = 0.001
– H1: Defect Rate > 0.001
• An alternative hypothesis of “> 0.001” is
appropriate since
– A defect rate that is too low is not a problem
– A defect rate that is too high is a problem
That is, a higher observed defect rate tends to
favor H1 and count against H0.
• Thus this is a right-tailed test
One-tailed and Two-tailed Tests
• A comparison of the three types of tests
• The null hypothesis
– We believe that this is true
• The alternative hypothesis
Type of test         Sample value that is too low    Sample value that is too high
Two-tailed test      A problem                       A problem
Left-tailed test     A problem                       Not a problem
Right-tailed test    Not a problem                   A problem
Example 1
• A manufacturer claims that there are at least two
scoops of cranberries in each box of cereal
• What would be a problem?
– The parameter to be tested is the number of scoops
of cranberries in each box of cereal
– If the sample mean is too low, that is a problem
– If the sample mean is too high, that is not a problem
• This is a left-tailed test
– The “bad case” is when there are too few
Example 2
• A manufacturer claims that there are exactly 500
mg of a medication in each tablet
• What would be a problem?
– The parameter to be tested is the amount of a
medication in each tablet
– If the sample mean is too low, that is a problem
– If the sample mean is too high, that is a problem too
• This is a two-tailed test
– A “bad case” is when there is too little or too much
Example 3
• A manufacturer claims that there are at most 8
grams of fat per serving
• What would be a problem?
– The parameter to be tested is the number of grams of
fat in each serving
– If the sample mean is too low, that is not a problem
– If the sample mean is too high, that is a problem
• This is a right-tailed test
– The “bad case” is when there are too many
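The pattern in Examples 1 to 3 can be sketched in code. This is only an illustration and not part of the original slides; the function name tail_for_claim and the three wording keys are hypothetical choices.

```python
def tail_for_claim(claim_wording: str) -> str:
    """Map the wording of a claim to the type of test it leads to.

    Hypothetical helper: the claim (with its equality sign) goes in H0,
    and the tail is the direction of H1, the opposite of the claim.
    """
    tails = {
        "at least": "left-tailed",   # claim: parameter >= value, H1: parameter < value
        "exactly": "two-tailed",     # claim: parameter = value,  H1: parameter != value
        "at most": "right-tailed",   # claim: parameter <= value, H1: parameter > value
    }
    return tails[claim_wording]


# The three examples: cranberries, medication, grams of fat.
for wording in ("at least", "exactly", "at most"):
    print(wording, "->", tail_for_claim(wording))
```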
Reject or Not to reject H0
• There are two possible results for a hypothesis
test
• If we believe that the null hypothesis could be
true, this is called not rejecting the null
hypothesis
– Note that this is only “we believe … could be”
• If we are pretty sure that the null hypothesis is
not true, so that the alternative hypothesis is
true, this is called rejecting the null hypothesis
– Note that this is “we are pretty sure that … is”
Understand Type I and Type II
errors
Decision Errors
• In comparing our conclusion (not reject or reject the null
hypothesis) with reality, we could either be right or we
could be wrong
– When we reject (and state that the null hypothesis is
false) but the null hypothesis is actually true
– When we do not reject (and state that the null hypothesis
could be true) but the null hypothesis is actually false
• These would be undesirable errors
Type I and II Errors
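• A summary of the errors is
                      H0 is actually true   H0 is actually false
Do not reject H0      Correct decision      Type II error
Reject H0             Type I error          Correct decision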
• We see that there are four possibilities … in two
of which we are correct and in two of which we
are incorrect
Type I and II Errors
• When we reject the null hypothesis (and state
that the null hypothesis is false) but the null
hypothesis is actually true … this is called a
Type I error
• When we do not reject the null hypothesis (and
state that the null hypothesis could be true) but
the null hypothesis is actually false … this is called
a Type II error
• In general, Type I errors are considered the
more serious of the two
Example
• A very good analogy for Type I and Type II
errors is in comparing it to a criminal trial
• In the US judicial system, the defendant “is
innocent until proven guilty”
– Thus the defendant is presumed to be
innocent
– The null hypothesis is that the defendant is
innocent
– H0: the defendant is innocent
Example (continued)
• If the defendant is not innocent, then
– The defendant is guilty
– The alternative hypothesis is that the
defendant is guilty
– H1: the defendant is guilty
• The summary of the set-up
– H0: the defendant is innocent
– H1: the defendant is guilty
Example (continued)
• Our possible conclusions
• Reject the null hypothesis
– Go with the alternative hypothesis
– H1: the defendant is guilty
– We vote “guilty”
• Do not reject the null hypothesis
– Go with the null hypothesis
– H0: the defendant is innocent
– We vote “not guilty” (which is not the same as voting
innocent! Voting “not guilty” does not prove the
defendant is innocent; we just do not have enough
evidence against the defendant.)
Example (continued)
• A Type I error
– Reject the null hypothesis
– The null hypothesis was actually true
– We voted “guilty” for an innocent defendant
• A Type II error
– Do not reject the null hypothesis
– The alternative hypothesis was actually true
– We voted “not guilty” for a guilty defendant
Example (continued)
• Which error do we try to control?
• Type I error (sending an innocent person to jail)
– The evidence was “beyond a reasonable doubt”
– We must be pretty sure
– Very bad! We want to minimize this type of error
• A Type II error (letting a guilty person go)
– The evidence wasn’t “beyond a reasonable doubt”
– We weren’t sure enough
– If this happens … well … it’s not as bad as a Type I
error (according to the US system)
State Conclusion to Hypothesis
Tests
Reject or Not to reject H0
• “Innocent” versus “Not Guilty”
• This is an important concept
• Innocent is not the same as not guilty
– Innocent – the person did not commit the crime
– Not guilty – there is not enough evidence to convict …
that the reality is unclear
• To not reject the null hypothesis – doesn’t mean
that the null hypothesis is true – just that there
isn’t enough evidence to reject
Summary
• A hypothesis test tests whether a claim is
believable or not, compared to the alternative
• We test the null hypothesis H0 versus the
alternative hypothesis H1
• If there is sufficient evidence to conclude that H0
is false, we reject the null hypothesis
• If there is insufficient evidence to conclude that
H0 is false, we do not reject the null hypothesis
Hypothesis Tests for a Population Mean
Assuming the Population Standard
Deviation is Known
Learning Objectives
• Understand the logic of hypothesis testing
• Test hypotheses about a population mean with σ
known using the classical approach
• Test hypotheses about a population mean with σ
known using P-values approach
• Test hypotheses about a population mean with σ
known using confidence intervals approach
Understand the logic of
hypothesis testing
Decision Rule
• A hypothesis test sets up a decision rule
for using the sample data to reject or not
reject the null hypothesis
– How do we quantify how “unlikely” it is that the null
hypothesis is true?
– What is the exact procedure to get to a “do
not reject” or “reject” conclusion?
Methods of Hypothesis Testing
• There are three equivalent ways to
perform a hypothesis test
• They will reach the same conclusion
• The methods
– The classical approach
– The P-value approach
– The confidence interval approach
Methods of Hypothesis Testing
• The classical approach
– If the observed sample value is too many standard
deviations away from the value claimed under H0,
then H0 is too unlikely to be true
• The P-value approach
– If the probability of the sample value being that far
away is small, then H0 is too unlikely to be true
• The confidence interval approach
– If the value claimed under H0 falls outside the confidence
interval built from the sample, then H0 is too unlikely to be true
• Don’t worry … we’ll be explaining more
Basic Steps to Test the Hypothesis
Step 1: We set up the null hypothesis that the
actual mean μ is equal to a value μ0 and the
alternative hypothesis
Step 2: We set up a criterion (to reject H0)
– A criterion that quantifies when it is “unlikely” that
the null hypothesis (the actual mean μ equals the
specified value μ0) is true
Collect Sample Data
• The three methods all need information
– We run an experiment
– We collect the data
– We calculate the sample mean
• The three methods all make the same assumptions to be
able to make the statistical calculations
– That the sample is a simple random sample
– That the sample mean has a normal distribution
Choose a Test Statistic
• We first assume that the population standard deviation σ
is known
• We use a sample estimate, for instance the sample mean
x̄, to test a claim about a population parameter, the population
mean μ
• We can apply our techniques if either
– The population has a normal distribution
– Our sample size n is large (n ≥ 30)
• In those cases, the distribution of the sample mean x̄ is
(at least approximately) normal with mean μ and standard deviation σ / √ n
Check the criterion for Unlikely
• The three methods all compare the observed results with
the criterion that quantifies “unlikely”:
– Classical – how many standard deviations
– P-value – the size of the probability
– Confidence interval – inside or outside the interval
• If the results are unlikely based on these criteria, we
reject the claim under the null hypothesis.
Statistical Significance
• The three methods all conclude similarly
– We do not reject the null hypothesis, or
– We reject the null hypothesis
• When we reject the null hypothesis, we
say that the result is statistically
significant
Perform Hypothesis Testing
• We now will cover how each of the
– Classical
– P-value, and
– Confidence interval
approaches will show us how to conclude
whether the result is statistically significant
or not
Test hypotheses about a
population mean with σ known
using the classical approach
The Classical Approach
• We compare the sample mean x̄ to the
hypothesized population mean μ0
– Measure the difference in units of standard
deviations, which is called the test statistic:
$$ z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$
– A lot of standard deviations is far … few standard
deviations is not far
– Just like using a general normal distribution
α Level of Significance
• How far is too far?
• For example, we can set α = 0.05 as the size of
“unlikely”, so-called “the level of significance”
• “Unlikely” means that a difference this large occurs with
probability α = 0.05 or less when the null
hypothesis is true
• This concept applies to two-tailed tests, left-tailed tests,
and right-tailed tests
Note: α is often determined subjectively before the
experiment. It sets up a rule to reject the null
hypothesis. So, it is also the size of the risk for
committing a type I error of rejecting the null hypothesis
by mistake.
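The note that α is the risk of a Type I error can be checked with a small simulation. The sketch below is an illustration only: it assumes a normal population with the bolt example's values (μ0 = 10.0, σ = 0.3, n = 40), draws many samples with H0 actually true, and counts how often a two-tailed test rejects. The function name, the number of trials, and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type_i_rate(mu0=10.0, sigma=0.3, n=40, alpha=0.05,
                         trials=10_000, seed=1):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z0 = (sum(sample) / n - mu0) / std_error
        if abs(z0) > z_crit:                       # rejecting here is a Type I error
            rejections += 1
    return rejections / trials
```

Calling simulate_type_i_rate() should return a proportion close to 0.05, since every rejection in this setup is a Type I error.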
5% Level of Significance
• For two-tailed tests
– The least likely 5% is the lowest 2.5% and highest 2.5% (below –
1.96 and above +1.96 standard deviations) … –1.96 and +1.96
are the critical values (There are two critical values for 2-tailed
test)
– The region outside this is the rejection region (or critical region)
which covers the range of “ unlikely” values for the test statistic
to reject H0
5% Level of Significance
• For left-tailed tests
– The least likely 5% is the lowest 5% (below –1.645 standard
deviations) … –1.645 is the critical value (only one critical value
for one-tailed test.)
– The region less than this is the rejection region
5% Level of Significance
• For right-tailed tests
– The least likely 5% is the highest 5% (above 1.645 standard
deviations) … +1.645 is the critical value
– The region greater than this is the rejection region
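The critical values quoted on the three 5% slides can be reproduced from the standard normal distribution. A minimal sketch using Python's standard library; the function name critical_values and the tail labels "two", "left", "right" are assumptions made for this illustration.

```python
from statistics import NormalDist


def critical_values(alpha: float, tail: str):
    """Return the z critical value(s) for a given level of significance."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)         # lowest alpha/2 and highest alpha/2
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)       # lowest alpha
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)   # highest alpha
    raise ValueError("tail must be 'two', 'left', or 'right'")


for tail in ("two", "left", "right"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
```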
Example 1
• An example of a two-tailed test
• A bolt manufacturer claims that the diameter of
the bolts average 10.0 mm
– H0: Diameter = 10.0
– H1: Diameter ≠ 10.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
the population is 0.3 mm
– The sample mean is 10.12 mm
– We’ll use a level of significance α = 0.05
Example 1 (continued)
• Do we reject the null hypothesis?
– 10.12 is 0.12 higher than 10.0
– The standard error is (0.3 / √ 40) = 0.047
– The test statistic is 2.53
– The upper of the two critical values, zα/2 for α =
0.05, is 1.96
– 2.53 is more than 1.96, so it is in the rejection
region.
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean
diameter is not 10.0
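Written out as a single calculation, using the values given in this example (the standard error is rounded to four decimals in the intermediate step):

```latex
z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
    = \frac{10.12 - 10.0}{0.3 / \sqrt{40}}
    \approx \frac{0.12}{0.0474}
    \approx 2.53 > 1.96 = z_{\alpha/2}
```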
Example 2
• An example of a left-tailed test
• A car manufacturer claims that the mpg of a
certain model car is at least 29.0
– H0: MPG = 29.0
– H1: MPG < 29.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
the population is 0.5
– The sample mean mpg is 28.89
– We’ll use a level of significance α = 0.05
Example 2 (continued)
• Do we reject the null hypothesis?
– 28.89 is 0.11 lower than 29.0
– The standard error is (0.5 / √ 40) = 0.079
– The test statistic is -1.39
– -1.39 is greater than -1.645, which is the left-tailed
critical value -zα, with α = 0.05. -1.39 is not in the
rejection region.
• Our conclusion
– We do not reject the null hypothesis
– We have insufficient evidence that the population
mean mpg is less than 29.0
Example 3
• An example of a right-tailed test
• A bolt manufacturer claims that the defective
rate of their product is at most 1.70 per 1,000
– H0: Defect Rate = 1.70
– H1: Defect Rate > 1.70
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
the population is .06
– The sample defect rate is 1.78
– We’ll use a level of significance α = 0.05
Example 3 (continued)
• Do we reject the null hypothesis?
– 1.78 is 0.08 higher than 1.70
– The standard error is (0.06 / √ 40) = 0.009
– The test statistic is 8.43
– 8.43 is more than 1.645, which is the right-tailed
critical value zα, with α = 0.05
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean
rate is more than 1.70
Critical Value(s) and Rejection
Region
• Two-tailed test
– The critical values are zα/2 and -zα/2
– The rejection region includes {less than -zα/2} and
{greater than zα/2}
• Left-tailed test
– The critical value is -zα
– The rejection region is {less than -zα}
• Right-tailed test
– The critical value is zα
– The rejection region is {greater than zα}
Summary
• The general picture for a level of significance α
Decision Rule for Classical
Approach
• Calculate the test statistic
$$ z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$
• For a significance level α provided, we locate the critical
value(s) and corresponding rejection region.
• Reject the null hypothesis, if a calculated test statistic z0
is in the rejection region; Do not reject, otherwise.
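The decision rule above can be sketched as one function. This is an illustrative implementation under the slides' assumptions (σ known, sample mean normally distributed); the name classical_z_test and the tail labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def classical_z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z0, reject) for a z-test of a mean with sigma known."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))      # test statistic
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z0) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z0 < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z0 > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z0, reject


# The three worked examples from the classical-approach slides.
print(classical_z_test(10.12, 10.0, 0.3, 40, tail="two"))
print(classical_z_test(28.89, 29.0, 0.5, 40, tail="left"))
print(classical_z_test(1.78, 1.70, 0.06, 40, tail="right"))
```

Run on the three examples, the function should reproduce their test statistics (2.53, -1.39, 8.43) and their reject / do not reject conclusions.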
Test hypotheses about a population
mean with σ known using P-values
The P-value Approach
• The P-value is the probability of observing
a sample mean that is as or more extreme
than the observed
• The probability is calculated assuming that
the null hypothesis is true
• We use the P-value to quantify how
unlikely the sample mean is
P-value
• Just like in the classical approach, we calculate the test
statistic
$$ z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$
• We then calculate the P-value, the probability that the
sample mean would be this extreme, or more extreme, if the null
hypothesis were true.
• It measures how likely a test statistic at least as extreme as the
observed z0 is under the null hypothesis.
• The two-tailed, left-tailed, and right-tailed calculations
are slightly different
P-value
• For the two-tailed test, the “unlikely” region consists of values that are too
high or too low
• Small P-values correspond to situations where it is unlikely to be
this far away
P-value
• For the left-tailed test, the “unlikely” region consists of values that are too
low
• Small P-values correspond to situations where it is unlikely to be
this low
P-value
• For the right-tailed test, the “unlikely” region consists of values
that are too high
• Small P-values correspond to situations where it is
unlikely to be this high
Summary
• For all three models (two-tailed, left-tailed,
right-tailed)
– The larger P-values mean that the difference
is not relatively large … that it’s not an unlikely
event
– The smaller P-values mean that the difference
is relatively large … that it’s an unlikely event
Example
• Larger P-values
– A P-value of 0.30, for example, means that this value,
or more extreme, could happen 30% of the time
– 30% of the time is not unusual
• Smaller P-values
– A P-value of 0.01, for example, means that this value,
or more extreme, could happen only 1% of the time
– 1% of the time is unusual
Decision Rule for P-value Approach
• The decision rule is
• For a significance level α provided
– Do not reject the null hypothesis if the P-value is
greater than α
– Reject the null hypothesis if the P-value is less than α
• For example, if α = 0.05
– A P-value of 0.30 is likely enough, compared to a
criterion of 0.05 level of significance
– A P-value of 0.01 is unlikely, compared to a criterion
of 0.05 level of significance
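The three P-value calculations and the decision rule can be sketched as follows. This is an illustration using the standard normal distribution from Python's standard library, not the calculator commands used in the slides; the function name z_test_p_value is hypothetical.

```python
from statistics import NormalDist


def z_test_p_value(z0: float, tail: str) -> float:
    """P-value of an observed z statistic for the given type of test."""
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z0)))   # both tails, at least this far from 0
    if tail == "left":
        return cdf(z0)                  # this low or lower
    if tail == "right":
        return 1 - cdf(z0)              # this high or higher
    raise ValueError("tail must be 'two', 'left', or 'right'")


alpha = 0.05
for z0, tail in ((2.53, "two"), (-1.39, "left")):
    p = z_test_p_value(z0, tail)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z0 = {z0}, {tail}-tailed: P-value = {p:.4f}, {decision}")
```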
Example 1
• An example of a two-tailed test
• A bolt manufacturer claims that the diameter of the bolts average
10.0 mm
– H0: Diameter = 10.0
– H1: Diameter ≠ 10.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of the
population is 0.3 mm
– The sample mean is 10.12 mm
– We’ll use a level of significance α = 0.05
Note: The claim is about whether the average is equal to 10.0 or not. It does not
indicate whether the average would be greater than or less than 10.0 if it is not
equal to 10.0. Since we do not know the direction, we set up two-tailed
hypotheses.
Example 1 (continued)
• Do we reject the null hypothesis?
– 10.12 is 0.12 higher than 10.0
– The standard error is (0.3 / √ 40) = 0.047
– The test statistic z0 is 2.53:
(10.12 - 10.0) / (0.3 / √ 40) = 2.53
– The 2-sided P-value of 2.53 is 0.0114 < 0.05 = α
P-value = 2 P( x̄ ≥ 10.12 ) or 2 P( z ≥ 2.53 )
2 x normalcdf(2.53, E99) = 0.0114 or
2 x normalcdf(10.12, E99, 10, 0.3 / √ 40 ) = 0.0114
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean diameter is not
10.0
Example 2
• An example of a left-tailed test
• A car manufacturer claims that the mpg of a certain model car is at
least 29.0
– H0: MPG = 29.0
– H1: MPG < 29.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of the
population is 0.5
– The sample mean mpg is 28.89
– We’ll use a level of significance α = 0.05
Note: Since the equality sign should appear only in the null
hypothesis H0, the claim of at least (greater than or equal to) 29.0 is
placed under the null hypothesis. The alternative hypothesis
H1 is the opposite of the null hypothesis H0, so MPG < 29 under H1.
Example 2 (continued)
• Do we reject the null hypothesis?
– 28.89 is 0.11 lower than 29.0
– The standard error is ( 0.5 / √ 40 ) = 0.079
– The test statistic is -1.39
– The 1-sided P-value of -1.39 is 0.0823 > 0.05 = α
P-value = P( x̄ ≤ 28.89 ) or P( z ≤ -1.39 )
normalcdf(-E99, -1.39) = 0.0823 or
normalcdf(-E99, 28.89, 29, 0.5 / √ 40 ) = 0.0823
• Our conclusion
– We do not reject the null hypothesis
– We have insufficient evidence that the population mean mpg is
less than 29.0
Example 3
• An example of a right-tailed test
• A bolt manufacturer claims that the defective rate of their product is
at most 1.70 per 1,000
– H0: Defect Rate = 1.70 (This is the claim of at most 1.70 per 1,000)
– H1: Defect Rate > 1.70
• We take a sample of size 40
– (Somehow) We know that the standard deviation of the
population is .06
– The sample defect rate is 1.78
– We’ll use a level of significance α = 0.05
Example 3 (continued)
• Do we reject the null hypothesis?
– 1.78 is 0.08 higher than 1.70
– The standard error is ( 0.06 / √ 40 ) = 0.009
– The test statistic is 8.43
– The 1-sided P-value of 8.43 is extremely small
P-value = P( x̄ ≥ 1.78 ) or P( z ≥ 8.43 )
normalcdf(1.78, E99, 1.70, 0.06 / √ 40 ) = 1.75E-17
normalcdf(8.43, E99) = 1.75E-17
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean rate is
more than 1.70
Classical and P-value Approaches
• Compare the rejection regions for the classical approach
and the P-value approach
• They are the same
Classical
P-Value
Note: The classical approach sets a criterion for “unlikely” in terms of
a z value; the P-value approach sets a criterion in terms of a probability.
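The claim that both approaches lead to the same decision can be checked numerically. The sketch below is an illustration only: it compares the two decisions over a grid of z0 values from -4.00 to 4.00; the function name decisions_agree and the grid are arbitrary choices.

```python
from statistics import NormalDist


def decisions_agree(alpha=0.05, tail="two") -> bool:
    """Check that the classical and P-value rules reject for the same z0."""
    std_normal = NormalDist()
    for i in range(-400, 401):
        z0 = i / 100
        if tail == "two":
            classical = abs(z0) > std_normal.inv_cdf(1 - alpha / 2)
            p_value = 2 * (1 - std_normal.cdf(abs(z0)))
        elif tail == "left":
            classical = z0 < std_normal.inv_cdf(alpha)
            p_value = std_normal.cdf(z0)
        else:  # right-tailed
            classical = z0 > std_normal.inv_cdf(1 - alpha)
            p_value = 1 - std_normal.cdf(z0)
        if classical != (p_value < alpha):
            return False
    return True


print(all(decisions_agree(0.05, t) for t in ("two", "left", "right")))
```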
Test hypotheses about a
population mean with σ known
using confidence intervals
Level of Significance α and Level of
Confidence (1 – α)
• The confidence interval approach yields the
same result as the classical approach and as the
P-value approach
• We compare
– A hypothesis test with a level of significance α
to
– A confidence interval with confidence (1 – α) •100%
• These are the same α’s
Decision Rule for Confidence
Interval Approach
• The relationship is
Not rejecting the hypothesis    μ0 is inside the confidence interval
Rejecting the hypothesis        μ0 is outside the confidence interval
• The hypothesis test calculation and the
confidence interval calculation are very similar
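For the two-tailed case, the relationship can be sketched in a few lines. This is an illustration that assumes σ is known and uses the bolt example's numbers (sample mean 10.12, μ0 = 10.0, σ = 0.3, n = 40); the function name ci_two_tailed_test is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_two_tailed_test(xbar, mu0, sigma, n, alpha=0.05):
    """Build the (1 - alpha) confidence interval and test H0: mu = mu0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    margin = z * sigma / sqrt(n)
    lower, upper = xbar - margin, xbar + margin
    reject = not (lower <= mu0 <= upper)      # reject when mu0 is outside
    return (lower, upper), reject


interval, reject = ci_two_tailed_test(10.12, 10.0, 0.3, 40)
print([round(v, 2) for v in interval], "reject H0" if reject else "do not reject H0")
```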
Example 1
• An example of a two-tailed test
• A bolt manufacturer claims that the diameter of
the bolts average 10.0 mm
– H0: Diameter = 10.0
– H1: Diameter ≠ 10.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
this measurement is 0.3 mm
– The sample mean is 10.12 mm
– We’ll use a level of significance α = 0.05
Example 1 (continued)
• Do we reject the null hypothesis?
– 10.12 is 0.12 higher than 10.0
– The standard error is (0.3 / √ 40) = 0.047
– The confidence interval is 10.12 ± 1.96 • 0.047, or
10.03 to 10.21
– 10.0 is outside (10.03, 10.21)
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean
diameter is not 10.0
Note: 1.96 is z0.025 for a 95% confidence interval.
Example 2
• An example of a left-tailed test
• A car manufacturer claims that the mpg of a
certain model car is at least 29.0
– H0: MPG = 29.0
– H1: MPG < 29.0
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
the population is 0.5
– The sample mean mpg is 28.89
– We’ll use a level of significance α = 0.05
Example 2 (continued)
• Do we reject the null hypothesis?
– 28.89 is 0.11 lower than 29.0
– The standard error is (0.5 / √ 40) = 0.079
– The upper confidence interval limit is 28.89 + 1.645 •
0.079, or 29.02
– 29.0 is inside (-∞, 29.02)
• Our conclusion
– We do not reject the null hypothesis
– We have insufficient evidence that the population
mean mpg is less than 29.0
Note: 1.645 is z0.05 for a 95% upper confidence interval
limit.
Example 3
• An example of a right-tailed test
• A bolt manufacturer claims that the defective
rate of their product is at most 1.70 per 1,000
– H0: Defect Rate = 1.70
– H1: Defect Rate > 1.70
• We take a sample of size 40
– (Somehow) We know that the standard deviation of
the population is .06
– The sample defect rate is 1.78
– We’ll use a level of significance α = 0.05
Example 3 (continued)
• Do we reject the null hypothesis?
– 1.78 is 0.08 higher than 1.70
– The standard error is (0.06 / √ 40) = 0.009
– The lower confidence interval limit is
1.78 – 1.645 • 0.009 = 1.76
– 1.70 is outside (1.76, ∞)
• Our conclusion
– We reject the null hypothesis
– We have sufficient evidence that the population mean
rate is more than 1.70
Note: 1.645 is z0.05 for a 95% lower confidence interval
limit.
Summary
• A hypothesis test of means compares whether the true
mean is either
– Equal to, or not equal to, μ0
– Equal to, or less than, μ0
– Equal to, or more than, μ0
• There are three equivalent methods of performing the
hypothesis test
– The classical approach
– The P-value approach
– The confidence interval approach
Hypothesis Tests for a
Population Mean in Practice
Test hypotheses about a population
mean with σ unknown
Test of Mean in Practice
• So far, we assumed that the population standard
deviation, σ, was known
• This is not a realistic assumption
• There is a parallel between last unit and this unit
– solving the problems assuming that σ was
known
– solving the problem assuming that σ was not
known
• σ not being known is a much more practical
assumption
Test of Mean in Practice
• The parallel between Confidence Intervals and
Hypothesis Tests carries over here too
• For Confidence Intervals
– We estimate the population standard deviation σ by
the sample standard deviation s
– We use the Student’s t-distribution with n-1 degrees
of freedom
• For Hypothesis Tests, we do the same
– Use s for σ
– Use the Student’s t for the normal
t-test Statistic
• Thus instead of the z-test statistic, which requires knowing
σ
$$ z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$
we calculate a t-test statistic using s
$$ t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
• This is the appropriate test statistic to use
when σ is unknown
Hypotheses
• We can set up our hypotheses for tests of a
population mean in the same way as when
the population standard deviation is known
Two-tailed
H0: μ = μ0
H1: μ ≠ μ0
Left-tailed
H0: μ = μ0
H1: μ < μ0
Right-tailed
H0: μ = μ0
H1: μ > μ0
Test of Mean in Practice
• The process for a hypothesis test of a mean
when σ is unknown is no different from the test
of a mean when σ is known
– Set up the problem with a null and alternative
hypotheses
– Collect the data and compute the sample
mean
– Compute the test statistic
Classical and P-value Approaches
• Either the classical or the P-value approach
can be applied to determine the significance
Classical approach
P-value approach
Test of Mean in Practice
• There are thus only two differences between
this process and the one using the normal
distribution previously
– We use the sample standard deviation s
instead of the population standard deviation σ
– We use the Student’s t-distribution, with n-1
degrees of freedom, instead of the normal
distribution
Example
• An example
• A gasoline manufacturer wants to make sure
that the octane in their gasoline is at least 87.0
– The testing organization takes a sample of size 40
– The sample standard deviation is 0.5 (i.e. s = 0.5)
– The sample mean octane is 86.94
• Our null and alternative hypotheses
– H0: Mean octane = 87
– H1: Mean octane < 87
Example (continued)
Classical Approach:
• Do we reject the null hypothesis under 0.05 level of significance?
– 86.94 is 0.06 lower than 87.0
– The standard error is (0.5 / √ 40) = 0.08
– 0.06 is about 0.75 standard errors below 87.0
$$ t_0 = \frac{86.94 - 87.00}{0.5 / \sqrt{40}} = -0.7589 $$
– The critical t value -t0.05, with 39 degrees of freedom, is
-1.685 [obtained from TI calculator invT(.05,39)=-1.685]
– -0.75 is greater than -1.685, so it is not in the rejection region; the result is not unusual
• Our conclusion
– We do not reject the null hypothesis
– We have insufficient evidence that the true population mean
(mean octane) is less than 87.0
Example (continued)
P-value approach:
• Do we reject the null hypothesis under 0.05 level of significance?
– 86.94 is 0.06 lower than 87.0
– The standard error is (0.5 / √ 40) = 0.08
– 0.06 is about 0.75 standard errors below 87.0
– The 1-sided P-value of -0.75 is 0.2289 > 0.05 = α
P-value = P( t ≤ -0.75 ) = 0.2289 or
tcdf(-E99, -0.75, 39) = 0.2289
• Our conclusion
– We do not reject the null hypothesis
– We have insufficient evidence that the true population mean
(mean octane) is less than 87.0
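The P-value above can be reproduced without a calculator. The sketch below is one possible way to do it, using only the Python standard library: it integrates the Student's t density numerically with Simpson's rule, which stands in for the TI tcdf call. The helper names t_pdf and t_cdf are hypothetical, not part of any library.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), via Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

alpha = 0.05
# Left-tailed P-value for the rounded test statistic used on the slide
p_value = t_cdf(-0.75, df=39)
reject_h0 = p_value < alpha

print(f"P-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Using the unrounded statistic -0.7589 instead of -0.75 gives a slightly smaller P-value, but the conclusion is the same.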
Compare t-test with z-test
• Comparing using the classical approach
Compare t-test with z-test
• Comparing using the P-value approach
Summary
• A hypothesis test of means, with σ unknown, has the
same general structure as a hypothesis test of means
with σ known
• Any one of our three methods can be used, with the
following two changes to all the calculations
– Use the sample standard deviation s in place of the
population standard deviation σ
– Use the Student’s t-distribution in place of the normal
distribution
Hypothesis Tests for a
Population Proportion
Test hypotheses about a population
proportion using the normal model
Test of Population Proportion
• In a sample of size n, with x successes, the best
estimate of the population proportion is
$\hat{p} = \frac{x}{n}$
• Similar to tests for means, we have
– Two-tailed tests
– Left-tailed tests
– Right-tailed tests
Standard Error of Sample
Proportion
• Just as for confidence intervals, the standard error of
the sample proportion is
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
Standard Error of the Sample
Proportion under H0
• To test for the population proportion, we use the
following standard error of the sample proportion:
$\sqrt{\frac{p_0(1 - p_0)}{n}}$ (Yes, use this)
and not
$\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ (No, don’t use this)
z-test Statistic for Testing the
Proportion
• Because we assume that the null hypothesis H0: p = p0
is true, we should use
$\sqrt{\frac{p_0(1 - p_0)}{n}}$
as the standard error
• The test statistic is thus
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Hypothesis Test of Proportion
• We can set up the hypotheses for tests
of a population proportion in the same way
as for the hypothesis tests of a population
mean
Two-tailed
H0: p = p0
H1: p ≠ p0
Left-tailed
H0: p = p0
H1: p < p0
Right-tailed
H0: p = p0
H1: p > p0
Hypothesis Test of Proportion
• The process for a hypothesis test of a proportion is
– Set up the problem with a null and alternative
hypotheses
– Collect the data and compute the sample proportion
– Compute the test statistic
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Classical and P-value Approaches
• Either the classical or the P-value approach can be
applied to determine the significance
Classical approach
P-value approach
Example
• An example
• We believe that 60% of students prefer hamburgers
over hot dogs
• A random sample of 200 students found that 102 of
them preferred hamburgers.
• At α = 0.05, does the data support our belief?
– The sample size n = 200
– The hypothesized proportion p0 = 0.60
– The sample proportion $\hat{p} = 102 / 200 = 0.51$
Example (continued)
• Our hypotheses
– H0: p = 0.60
– H1: p ≠ 0.60
• The standard error is
$\sqrt{\frac{p_0(1 - p_0)}{n}} = 0.035$
• The test statistic is
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = -2.60$
Example (continued)
• The critical values for α = 0.05 are
± z0.025 = ± 1.96
• The test statistic –2.60 is outside the critical
values, so we reject the null hypothesis
• There is significant evidence that the proportion
of students who prefer hamburgers is not 60%
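The hamburger example can be worked end to end in a few lines of Python. This is a minimal sketch using the standard library's statistics.NormalDist; the variable names are illustrative, and the two-sided P-value is an extra check that the slides do not compute.

```python
import math
from statistics import NormalDist

# Values from the hamburger example
n = 200        # sample size
x = 102        # students who preferred hamburgers
p_0 = 0.60     # hypothesized proportion under H0
alpha = 0.05   # level of significance

p_hat = x / n

# Standard error under H0: uses p_0, not p_hat
standard_error = math.sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error

std_normal = NormalDist()  # mean 0, standard deviation 1

# Classical approach: two-tailed critical value z_{alpha/2}
z_critical = std_normal.inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_critical

# P-value approach (not shown on the slides): area in both tails
p_value = 2 * std_normal.cdf(-abs(z))

print(f"p_hat = {p_hat:.2f}, standard error = {standard_error:.4f}")
print(f"z = {z:.2f}, critical values = +/-{z_critical:.2f}")
print(f"two-sided P-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Both approaches lead to the same decision here, since the P-value falls below α exactly when |z| exceeds the critical value.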
Summary
• We can perform hypothesis tests of
proportions in similar ways as hypothesis
tests of means
– Two-tailed, left-tailed, and right-tailed tests
• The normal distribution should be used to
compute the critical value(s) for this test
Putting It All Together:
Which Method Do I Use?
Determine the appropriate
hypothesis test to perform
Which Test?
• Parallels between hypothesis tests and
confidence intervals
– Both use the concept of the variability of a
sample statistic
– Both use critical values from the normal and
Student’s t-distributions
– Both have cases for means with known σ, means
with unknown σ, proportions, and standard
deviations
Which Test?
• It should not be surprising that the
decision process for which hypothesis test
to use is very similar to the decision
process for which confidence interval to
use
• Start with
– Is the parameter a mean?
– Is the parameter a proportion?
Which Test?
• In analyzing population means
• Is the population variance known?
– If so, then we can use the normal distribution
• If the population variance is not known
– If we have “enough” data (30 or more values), we still
can use the normal distribution
– If we don’t have “enough” data (29 or fewer values),
we should use the t-distribution
• We don’t have to ask this question in the
analysis of proportions
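The decision process on this slide can be written down as a small function. This is only a sketch of the rule of thumb exactly as stated here; the function name and argument names are hypothetical. Note that the octane example earlier uses the t-distribution with n = 40, so with σ unknown the t-distribution remains a valid choice for larger samples as well.

```python
def choose_distribution(parameter: str, sigma_known: bool = False, n: int = 0) -> str:
    """Return the reference distribution suggested by the slide's rule of thumb.

    parameter   -- "mean" or "proportion"
    sigma_known -- whether the population variance is known (means only)
    n           -- sample size (only matters for means with sigma unknown)
    """
    if parameter == "proportion":
        # The sigma-known question is not asked for proportions
        return "normal"
    if parameter == "mean":
        if sigma_known:
            return "normal"
        # Sigma unknown: the slide treats 30 or more values as "enough" data
        return "normal" if n >= 30 else "t"
    raise ValueError("parameter must be 'mean' or 'proportion'")

print(choose_distribution("mean", sigma_known=True, n=12))
print(choose_distribution("mean", sigma_known=False, n=12))
print(choose_distribution("mean", sigma_known=False, n=45))
print(choose_distribution("proportion", n=200))
```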
Which Test?
• For the test of a population mean
• If
The data is OK (reasonably normal)
The variance is known
then we can use the normal distribution with a
test statistic of
$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Which Test?
• For the test of a population mean
• If
The data is OK (reasonably normal)
The variance is NOT known
then we can use the Student’s t-distribution with
a test statistic of
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Which Test?
• For the test of a population proportion
• If the sample size is large enough, we can
use the proportions method with a test
statistic of
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Summary
• The main questions that determine the method
• Is it a
– Population mean?
– Population proportion?
• In the case of a population mean, we need to
determine
– Is the population variance known?
– Does the data look reasonably normal?
Summary
• The process of hypothesis testing is very similar across
the testing of different parameters
• The major steps in hypothesis testing are
– Formulate the appropriate null and alternative
hypotheses
– Calculate the test statistic
– Determine the appropriate critical value or values
– Reach the reject / do not reject conclusions
Summary
• Similarities in hypothesis test processes
Parameter              | H0     | (2-tailed) H1 | (L-tailed) H1 | (R-tailed) H1 | Test statistic | Critical value
Mean (Std Dev known)   | μ = μ0 | μ ≠ μ0        | μ < μ0        | μ > μ0        | Difference     | Normal
Mean (Std Dev unknown) | μ = μ0 | μ ≠ μ0        | μ < μ0        | μ > μ0        | Difference     | Student t
Proportion             | p = p0 | p ≠ p0        | p < p0        | p > p0        | Difference     | Normal
Summary
• We can test whether sample data supports a hypothesis
claim about a population mean or a proportion
• We can use any one of three methods
– The classical method
– The P-Value method
– The Confidence Interval method
• The commonality between the three methods is that they
set a criterion for rejecting or not rejecting the null
hypothesis. The classical approach sets a criterion in terms
of a z (or t) value; the P-value approach sets a criterion in
terms of a probability.
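For comparison, here is a sketch of the confidence interval method applied to the hamburger example (n = 200, x = 102, p0 = 0.60, α = 0.05). It assumes the usual large-sample interval for a proportion, which uses p̂ in the standard error; the variable names are illustrative and the interval itself does not appear in the slides.

```python
import math
from statistics import NormalDist

# Values from the hamburger example
n = 200
x = 102
p_0 = 0.60
alpha = 0.05

p_hat = x / n

# For a confidence interval the standard error uses p_hat
# (there is no hypothesized value to plug in)
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(1 - alpha / 2)

lower = p_hat - z_star * se_hat
upper = p_hat + z_star * se_hat

# Two-tailed test: reject H0 if p_0 lies outside the interval
reject_h0 = not (lower <= p_0 <= upper)

print(f"95% confidence interval: ({lower:.4f}, {upper:.4f})")
print("reject H0" if reject_h0 else "do not reject H0")
```

The hypothesized value 0.60 falls outside the interval, which agrees with the classical two-tailed test of the same data.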