Chapter 21
More About Tests
Copyright © 2009 Pearson Education, Inc.
Objectives:

The student will be able to:
• Explain Type I and Type II errors in the context of the problem.
Zero In on the Null

• Null hypotheses have special requirements.
• To perform a hypothesis test, the null must be a statement about the value of a parameter for a model.
• We then use this value to compute the probability that the observed sample statistic, or something even farther from the null value, will occur.
Zero In on the Null (cont.)

• There is a temptation to state your claim as the null hypothesis.
  – However, you cannot prove a null hypothesis true.
• So, it makes more sense to use what you want to show as the alternative.
  – This way, when you reject the null, you are left with what you want to show.
How to Think About P-Values

• A P-value is a conditional probability: the probability of the observed statistic given that the null hypothesis is true.
  – The P-value is NOT the probability that the null hypothesis is true.
  – It's not even the conditional probability that the null hypothesis is true given the data.
• Be careful to interpret the P-value correctly.
What to Do with a High P-Value

• When we see a small P-value, it means our observed result would be very rare if the null hypothesis were true. We use this as evidence to reject the null hypothesis.
• A big P-value just means that what we observed isn't surprising. That is, the results are in line with our assumption that the null hypothesis models the world, so we have no reason to reject it.
• A big P-value doesn't prove that the null hypothesis is true, but it certainly offers no evidence that it is not true.
  – Thus, when we see a large P-value, all we can say is that we "don't reject the null hypothesis."
Alpha Levels

• We can define "rare event" arbitrarily by setting a threshold for our P-value.
  – If our P-value falls below that point, we'll reject H0. We call such results statistically significant.
  – The threshold is called an alpha level, denoted by α.
Alpha Levels (cont.)

• Common alpha levels are 0.10, 0.05, and 0.01.
  – You have the option, almost the obligation, to consider your alpha level carefully and choose an appropriate one for the situation.
• The alpha level is also called the significance level.
  – When we reject the null hypothesis, we say that the test is "significant at that level."
Alpha Levels (cont.)

• What can you say if the P-value does not fall below α?
  – You should say that "the data have failed to provide sufficient evidence to reject the null hypothesis."
  – Don't say that you "accept the null hypothesis."
Alpha Levels (cont.)

• Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is "not guilty"; we don't say that the defendant is "innocent."
What Not to Say About Significance

• What do we mean when we say that a test is statistically significant?
  – All we mean is that the test statistic had a P-value lower than our alpha level.
• Don't be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.
What Not to Say About Significance (cont.)

• For large samples, even small, unimportant ("insignificant") deviations from the null hypothesis can be statistically significant.
• On the other hand, if the sample is not large enough, even large, financially or scientifically "significant" differences may not be statistically significant.
Confidence Intervals and Hypothesis Tests

• Confidence intervals and hypothesis tests are built from the same calculations.
  – They have the same assumptions and conditions.
• You can approximate a hypothesis test by examining a confidence interval.
  – Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.
Confidence Intervals and Hypothesis Tests (cont.)

• Because confidence intervals are two-sided, they correspond to two-sided tests.
  – In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 – C)%.
Recall …

• There are supposed to be 20% orange M&Ms in a bag. Suppose a bag of 122 has only 21 orange ones. Does this contradict the company's 20% claim?
• We previously solved this using a two-sided one-proportion z-Test: H0: p = 0.2, HA: p ≠ 0.2.
• Alternatively, we can look at the 95% confidence interval on p^. This reflects 95% confidence that the true proportion is within the identified range.
  – p^ ± 1.96 * SE(p^)
  – We ask: does this interval include the null hypothesis proportion?
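A short Python sketch of this interval check (a worked illustration, using the standard formula SE(p^) = √( p^·q^ / n )):

```python
import math

# M&M example: does the 95% CI for p contain the null value p0 = 0.20?
n, x, p0 = 122, 21, 0.20
p_hat = x / n                                   # observed proportion ≈ 0.172
se = math.sqrt(p_hat * (1 - p_hat) / n)         # SE(p^) uses the observed proportion
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se   # 95% confidence interval

print(f"95% CI: ({lo:.3f}, {hi:.3f})")          # about (0.105, 0.239)
print("p0 in CI, so fail to reject H0:", lo <= p0 <= hi)
```

Since 0.20 sits inside the interval, the two-sided test at α = 5% fails to reject the company's claim.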
Confidence Intervals and Hypothesis Tests (cont.)

• The relationship between confidence intervals and one-sided hypothesis tests is a little more complicated.
  – A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α-level of ½(100 – C)%.
  – So 90% confidence corresponds to an alpha of 5%. For a one-sided test, we use a "looser" confidence interval because we are only really concerned with one side of the interval.
Recall our example from chapter 20

• A 1996 report from the U.S. Consumer Product Safety Commission claimed that at least 90% of all American homes have at least one smoke detector. A city's fire department has been running a public safety campaign about smoke detectors consisting of posters, billboards, and ads on radio, TV, and in the newspaper. The city wonders if this concerted effort has raised the local level above the 90% national rate.
• Building inspectors visit 400 randomly selected homes and find that 376 have smoke detectors. Is this strong evidence that the local rate is higher than the national rate?
We solved this using a one-proportion z-Test

• SD(p^) = √( p0·q0 / n ),  z = (p^ – p0) / SD(p^)
• Mechanics: n = 400, x = 376, so p^ = 376/400 = 0.94; p0 = 0.9 and q0 = 0.1.
• z = (0.94 – 0.90) / √( (0.9)(0.1) / 400 ) = 2.67
• P = P(z > 2.67) = 0.004
• Alternatively, for an alpha level of 0.05 (5%), we could have looked at the 90% confidence interval on p^.
  – This interval is p^ ± z*·SE(p^).
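Both routes can be checked numerically. A Python sketch under the normal model (statistics.NormalDist from the standard library supplies the normal CDF; z* = 1.645 for a 90% interval):

```python
import math
from statistics import NormalDist

# One-proportion z-Test for the smoke-detector example.
n, x, p0 = 400, 376, 0.90
p_hat = x / n                                   # 0.94
sd = math.sqrt(p0 * (1 - p0) / n)               # SD(p^) uses the null values p0, q0
z = (p_hat - p0) / sd                           # ≈ 2.67
p_value = 1 - NormalDist().cdf(z)               # P(z > 2.67) ≈ 0.004
print(f"z = {z:.2f}, P = {p_value:.3f}")

# Equivalent 90% CI check, matching a one-sided test at alpha = 0.05.
se = math.sqrt(p_hat * (1 - p_hat) / n)         # SE(p^) uses the observed proportion
lo, hi = p_hat - 1.645 * se, p_hat + 1.645 * se
print(f"90% CI: ({lo:.3f}, {hi:.3f}); p0 below interval, reject H0: {p0 < lo}")
```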
Practice

• In May 2007, George W. Bush's approval rating stood at 30% according to a CBS News/New York Times national survey of 1125 randomly selected adults.
  – Make a 95% confidence interval for his approval rating.
  – Based on the confidence interval, test the null hypothesis that Bush's approval rating was no better than the 27% level established by Richard Nixon during the Watergate scandal.
  – What is the significance level of this test?
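A hedged Python sketch of this practice problem (one possible worked check, not an official solution):

```python
import math

# 95% CI for the approval rating, then the CI-based test against p0 = 0.27.
n, p_hat, p0 = 1125, 0.30, 0.27
se = math.sqrt(p_hat * (1 - p_hat) / n)         # SE(p^) ≈ 0.0137
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se   # about (0.273, 0.327)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")

# 0.27 falls (just) below the interval, so the one-sided test rejects H0;
# by the earlier rule, a 95% interval matches a one-sided alpha of 2.5%.
print("reject H0:", p0 < lo)
```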
Aside…

• Notation, etc.
Which P is Which?

• p is the true population proportion. This is usually unknown. We write p0 for its assumed value under the null hypothesis.
• p^ is our observed sample proportion.
• P is the P-value found using a z-Test, which reflects the probability of observing a certain proportion in a sample (p^) under the null hypothesis.
• P(z > z*) is the probability that a z-score z is greater than a critical z-score z*.
When to use SE(p^) and when to use SD(p^)

• We use SE(p^) when we are looking at confidence intervals of an observed proportion and we don't know the actual population parameters:
  – as part of creating a one-proportion z-interval;
  – as part of our conclusion statement after rejecting the null hypothesis in a one-proportion z-Test.
• We use SD(p^) when we are conducting a one-proportion z-Test.
  – In this case we are assuming the null hypothesis, that p = p0, so we use p0 and q0 in our calculation and call it a standard deviation because we have assumed a known model.
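A small Python sketch contrasting the two formulas (an illustration of this convention, using the document's p0, q0 notation):

```python
import math

def se_phat(p_hat: float, n: int) -> float:
    """SE(p^): standard error from the observed proportion (for intervals)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def sd_phat(p0: float, n: int) -> float:
    """SD(p^): standard deviation under the null model p = p0 (for z-Tests)."""
    return math.sqrt(p0 * (1 - p0) / n)

# Smoke-detector numbers: the two differ because p^ = 0.94 while p0 = 0.90.
print(round(se_phat(0.94, 400), 4))  # 0.0119, used for the confidence interval
print(round(sd_phat(0.90, 400), 4))  # 0.015, used for the z statistic
```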
What if we're wrong…

• We can be incorrect in two different ways.
• Let's talk about these ways.
Making Errors

• Nobody's perfect. Even with lots of evidence we can still make the wrong decision.
• When we perform a hypothesis test, we can make mistakes in two ways:
  I. The null hypothesis is true, but we mistakenly reject it. (Type I error)
  II. The null hypothesis is false, but we fail to reject it. (Type II error)
Making Errors (cont.)

• Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent.
• Here's an illustration of the four situations in a hypothesis test:

                        H0 is true          H0 is false
  Reject H0             Type I error        correct decision
  Fail to reject H0     correct decision    Type II error
Making Errors (cont.)

• How often will a Type I error occur?
  – Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.
• When H0 is false and we reject it, we have done the right thing.
  – A test's ability to detect a false null hypothesis is called the power of the test.
Making Errors (cont.)

• When H0 is false and we fail to reject it, we have made a Type II error.
  – We assign the letter β to the probability of this mistake.
  – It's harder to assess the value of β because we don't know what the value of the parameter really is.
Examples – Type I and Type II errors

• Production managers on an assembly line must monitor to be sure that the level of defective products remains small. They periodically inspect a random sample of the items produced. If they find a significant increase in the production of items that must be rejected, they will halt the assembly process until the problem can be identified and repaired.
  – What are the hypotheses?
  – Would you do a one-tailed or two-tailed test?
  – In this context, what is a Type I error? A Type II error?
• Highway safety engineers test new road signs, hoping that increased reflectivity will make them more visible to drivers. Volunteers drive through a test course with several of the new and old style signs and rate which kind shows up the best.
  – What are the hypotheses?
  – In this context, what are the Type I and Type II errors?
• In 2005 the US Census Bureau reported that 68.9% of American families owned their homes. Census data reveal that the ownership rate in one small city is much lower. The city council is debating a plan to offer tax breaks to first-time home buyers in order to encourage people to become home owners. They decide to adopt the plan on a 2-year trial basis and use the data they collect to make a decision about continuing the tax breaks. Since this plan costs the city tax revenues, they will continue to use it only if there is strong evidence that the rate of home ownership is increasing.
• What will their hypotheses be?
  – H0: the rate of home ownership does not change.
  – HA: the rate of home ownership increases.
• What would a Type I error be?
  – A Type I error is rejecting H0 when H0 is true. So we think that the ownership rate has increased when really it has stayed the same.
• What would a Type II error be?
  – A Type II error is failing to reject H0 when H0 is really false. This occurs if we think that the rate of ownership has stayed the same when really it has increased.
• For each type of error, who will be harmed?
  – Type I: the taxpayers/city would be harmed. Type II: future first-time home buyers who will not get the tax break.
More practice

• Dropouts
  – A statistics professor has observed that about 13% of the students who initially enroll in his Introductory Stats class withdraw before the end of the semester. A salesman suggests that he try a statistics software package that gets students more involved with computers, predicting it will cut the dropout rate. The software is expensive, so the salesman offers to let the professor use it for a semester to see if the dropout rate goes down significantly.
  – Is this a one-tailed or two-tailed test?
  – Write the null and alternative hypotheses.
  – In this context, what would happen if the professor makes a Type I error?
  – In this context, what would happen if the professor makes a Type II error?
• Dropouts, part II
  – Initially 203 students signed up for the course. They used the software suggested by the salesman and only 11 dropped out of the course.
  – Should the professor spend the money for this software? Support your recommendation with an appropriate test.
  – Explain what your P-value means.
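A hedged Python sketch of one reasonable test for this exercise (a one-sided one-proportion z-Test under the normal model; not an official answer):

```python
import math
from statistics import NormalDist

# Dropouts part II: H0: p = 0.13 vs HA: p < 0.13.
n, x, p0 = 203, 11, 0.13
p_hat = x / n                                   # ≈ 0.054
sd = math.sqrt(p0 * (1 - p0) / n)               # SD(p^) under the null ≈ 0.0236
z = (p_hat - p0) / sd                           # ≈ -3.21
p_value = NormalDist().cdf(z)                   # P(z < -3.21) ≈ 0.0007
print(f"p^ = {p_hat:.3f}, z = {z:.2f}, P = {p_value:.4f}")
```

A P-value this small says a dropout rate this low would be very unlikely if the true rate were still 13%.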
Remaining slides are For Your Information
Power – motivating example

• Suppose you are testing a new type of running shoe to see if it's worth the money. If the new shoe made you 5% faster, how easy would it be to tell?
• It would take a lot of runs to make that difference clear.
• On the other hand, if the shoe made you twice as fast, it should be clear from a very small sample size (of test runs).
• The effect size and the sample size are both important in determining the power of a test.
Power

• The power of a test is the probability that it correctly rejects a false null hypothesis.
  – Noticing a difference that is really there.
• When the power is high, we can be confident that we've looked hard enough at the situation.
• The power of a test is 1 – β.
  – β is the probability of failing to reject a false null hypothesis.
Power (cont.)

• Whenever a study fails to reject its null hypothesis, the test's power comes into question.
  – We have to ask ourselves: was the null hypothesis really true, or did our test lack the power to detect a real difference?
• When we calculate power, we imagine that the null hypothesis is false.
• The value of the power depends on how far the truth lies from the null hypothesis value.
  – The distance between the null hypothesis value, p0, and the truth, p, is called the effect size.
  – Power depends directly on effect size.
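For concreteness, here is a minimal Python sketch of a power calculation for a one-sided one-proportion z-Test (an illustration under the normal approximation; the proportions and sample sizes below are hypothetical):

```python
import math
from statistics import NormalDist

def power_one_sided(p0: float, p_true: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided one-proportion z-Test of H0: p = p0 vs HA: p > p0,
    when the true proportion is p_true (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)          # e.g. 1.645 for alpha = 0.05
    sd0 = math.sqrt(p0 * (1 - p0) / n)                # SD(p^) under the null
    threshold = p0 + z_crit * sd0                     # smallest p^ that rejects H0
    sd_true = math.sqrt(p_true * (1 - p_true) / n)    # spread under the truth
    return 1 - NormalDist().cdf((threshold - p_true) / sd_true)

# Hypothetical numbers: power grows with effect size and with sample size.
print(round(power_one_sided(0.90, 0.93, n=400), 3))   # ≈ 0.66 for a modest effect
print(round(power_one_sided(0.90, 0.93, n=1600), 3))  # ≈ 1.00 with a larger sample
```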
A Picture Worth 1/P(z > 3.09) Words

• The larger the effect size, the easier it should be to see it.
• Obtaining a larger sample size decreases the probability of a Type II error, so it increases the power.
• It also makes sense that the more we're willing to accept a Type I error, the less likely we will be to make a Type II error.
A Picture Worth 1/P(z > 3.09) Words (cont.)

• This diagram shows the relationship between these concepts:

[Figure: two sampling distributions, one under H0 and one under the truth, marking the α, β, and power regions.]
Reducing Both Type I and Type II Error

• The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error.
• But we can reduce both types of error by making both curves narrower.
• How do we make the curves narrower? Increase the sample size.
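A quick numerical illustration (with assumed numbers) of how a larger sample narrows the sampling distribution:

```python
import math

# Narrower curves with larger n: SD(p^) shrinks like 1/sqrt(n).
p0 = 0.5
for n in (100, 400, 1600):
    print(n, round(math.sqrt(p0 * (1 - p0) / n), 4))
# 100 -> 0.05, 400 -> 0.025, 1600 -> 0.0125 (quadrupling n halves the SD)
```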
Reducing Both Type I and Type II Error (cont.)

• This figure has means that are just as far apart as in the previous figure, but the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced:

[Figure: the same two sampling distributions, now narrower, with smaller α and β regions.]
Reducing Both Type I and Type II Error (cont.)

• [Figure, left: original comparison of errors.]
• [Figure, right: comparison of errors with a larger sample size.]
What Can Go Wrong?

• Don't interpret the P-value as the probability that H0 is true.
  – The P-value is about the data, not the hypothesis.
  – It's the probability of the data given that H0 is true, not the other way around.
• Don't believe too strongly in arbitrary alpha levels.
  – It's better to report your P-value and a confidence interval so that the reader can make his or her own decision.
What Can Go Wrong? (cont.)

• Don't confuse practical and statistical significance.
  – Just because a test is statistically significant doesn't mean that it is significant in practice.
  – Sample size can also impact your decision about a null hypothesis, making you miss an important difference or find an "insignificant" difference.
• Don't forget that in spite of all your care, you might make a wrong decision.