Stat 281 Chapter 8a
Chapter 8
Hypothesis Tests
Hypothesis Testing
• We now begin the phase of this course that
discusses the highest achievement of statistics.
• Statistics, as the analytic branch of science, has
provided scientists with a tool that, since the
early part of the last century, has made possible
many of the huge achievements of that century.
• You are aware, from science classes, of terms
like conjecture, hypothesis, theory, and law.
• It is at the stage of “hypothesis” that most
scientific research is done.
• Until now, you may not have been aware of the
important role of statistics in this process. A
whole new world is about to open up to you.
What is a Hypothesis?
• In scientific parlance, a hypothesis is an
educated guess about something research may
reveal, or a potential answer to a question that is
being investigated.
• In science, this is often called the “research
hypothesis” and is a statement of the anticipated
conclusion of the experiment.
• In the statistical analysis of an experiment, we
state two hypotheses: The null hypothesis and
the alternative hypothesis.
The Alternative Hypothesis
• The alternative hypothesis usually
corresponds to the scientist’s research
hypothesis.
• It is often a statement of what one hopes
to prove in the experiment.
• In statistics, the alternative hypothesis
may be written symbolically. It is denoted
Ha or H1 (when there are multiple
alternatives, they can be numbered H1, H2, and so on)
The Null Hypothesis
• The null hypothesis is a statement that
expresses the conclusion if the experiment
doesn’t prove anything.
• It is often the “status quo,” what is
currently accepted, or a standard we hope
to beat.
• The null hypothesis is given the symbolic
name H0 (read “H-naught”).
Clear Thinking
• There are strong philosophical reasons for
stating hypotheses in this way.
• Science progresses by proposing new ideas,
which are tested, and accepted only if there is
sufficient evidence to support the new idea over
the old.
• The effect is to prevent science from going off on
wild tangents. The benefit of the doubt goes to
currently accepted beliefs, which ensures some
stability and enforces a standard of proof for new
ideas.
Courtroom Analogy
• An important analogy is found in the American system of
justice: Innocent until proven guilty.
• Here, the H0 is innocence. That is what will be accepted
in the event that evidence is insufficient or inconclusive.
• Ha is guilt. If evidence is sufficient (beyond a reasonable
doubt), the alternative will be accepted.
• Note that a conviction is a conclusion that Ha is true, and
thus a rejection of the innocence hypothesis, but an
acquittal is not a declaration of innocence, only a
conclusion that there is insufficient evidence to convict.
• We avoid saying “accept H0,” since H0 was assumed to
begin with. We prefer to say either that we “reject H0” or
“do not reject H0.” It is OK to say “accept Ha.”
Example Testing Problems
• In the previous chapter, we discussed
estimating parameters.
• For example, use a sample mean to estimate
μ, giving both a point estimate and a CI.
• Now we take a different approach. Suppose
we have an existing belief about the value of
μ. This could come from previous research,
or it could be a standard that needs to be met.
• Examples:
– Previous corn hybrids have achieved 100 bu/acre.
We want to show that our new hybrid does better.
– Advertising claims have been made that there are
20 chips in every chocolate chip cookie. Support
or refute this claim.
Stating the Null Hypothesis
• We start with a null hypothesis.
• The null hypothesis is denoted by H0: μ=μ0 where
μ0 corresponds to the current belief or status quo.
• Example:
– In the corn problem, if our hybrid is not better, it doesn’t
beat the previous yield achievement of 100 bu/acre.
Then we have H0: μ=100 or possibly H0: μ≤100.
– In the cookie problem, if the advertising claims are
correct, we have H0: μ=20 or possibly H0: μ≥20.
• Notice the choice of null hypothesis is not based
on what we hope to prove, but on what is currently
accepted.
Stating the Alternative
• The alternative hypothesis is the result that
you will get if your research proves something
is different from status quo or from what is
expected.
• It is denoted by Ha: μ≠μ0. Sometimes there is
more than one alternative, so we can write
H1: μ≠μ0, H2: μ>μ0, and H3: μ<μ0.
• In the corn problem, if our yield is more than
100 we have proved that our hybrid is better,
so the alternative Ha: μ>100 is appropriate.
Stating the Alternative
• For the cookie example, if there are fewer than
20 chips per cookie, the advertisers are
wrong and possibly guilty of false advertising,
so we want to prove Ha: μ<20.
• A jar of peanut butter is supposed to have 16
oz in it. If there is too much, the cost goes up,
while if it is too little, consumers will complain.
Therefore we have H0: μ=16 and Ha: μ≠16.
• From these examples, we can see that some
tests focus on one direction and some do not.
Comparison with Confidence Intervals
• In a confidence interval, our focus is to
provide an estimate of a parameter.
• A hypothesis test makes use of an estimate,
such as the sample mean, but is not directly
concerned with estimation.
• The point is to determine whether the data
provide evidence that a proposed value of
the parameter is untrue.
Test of the Mean, σ Known
• The null hypothesis is initially assumed true.
• It states that the mean has a particular value, μ0.
• Therefore, it follows that the distribution of x-bar has the
same mean, μ0.
• The logic goes something like this. If we take a sample,
we get a particular sample mean. If the null hypothesis
is true, that mean is not likely to be “far away” from the
hypothesized mean. It could happen, but it’s not likely.
Therefore, if the sample mean is “too far away,” we will
suspect something is wrong, and reject the null
hypothesis.
• The next slide shows this graphically.
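• As a minimal sketch of the calculation behind this logic, the standardized distance of x-bar from μ0 is Z* = (x-bar − μ0)/(σ/√n); the numbers below are invented purely for illustration:

```python
import math

# Hypothetical numbers, invented for illustration (not from the slides)
mu0 = 100      # mean claimed by the null hypothesis
sigma = 15     # population standard deviation (assumed known)
n = 36         # sample size
xbar = 104.5   # observed sample mean

# Standard error of x-bar and the standardized test statistic
std_err = sigma / math.sqrt(n)
z_star = (xbar - mu0) / std_err
print(f"Z* = {z_star:.2f}")  # how many standard errors x-bar lies from mu0
```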
Comments on the Graph
• What we see in the previous graph is the idea
that lots of sample means will fall close to the
true mean. About 68% fall within one standard
deviation. There is still a 32% chance of
getting a sample mean farther away than that.
So, if a mean occurs more than one standard
deviation away, we may still consider it quite
possible that this is a random fluctuation,
rather than a sign that something is wrong
with the null hypothesis.
More Comments
• If we go to two standard deviations, about
95% of observed means would be
included. There is only a 5% chance of
getting a sample mean farther away than
that. So, if a far-away mean occurs (more
than two standard deviations out), we think
it is more likely that it comes from a
different distribution, rather than the one
specified in the null hypothesis.
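• A quick numerical check of the 68% and 95% figures quoted above, using the standard normal distribution:

```python
from scipy.stats import norm

# Chance that a standardized sample mean lands within 1 or 2
# standard deviations of the hypothesized mean when H0 is true
within_1 = norm.cdf(1) - norm.cdf(-1)   # about 0.68
within_2 = norm.cdf(2) - norm.cdf(-2)   # about 0.95
print(f"P(|Z| <= 1) = {within_1:.3f}, P(|Z| > 1) = {1 - within_1:.3f}")
print(f"P(|Z| <= 2) = {within_2:.3f}, P(|Z| > 2) = {1 - within_2:.3f}")
```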
Choosing a Significance Level
• The next graph shows what it means to
choose a 5% significance level.
• If the null hypothesis is true, there is only a
5% chance that the standardized sample
mean will be above 1.96 or below -1.96.
• These values will serve as a cutoff for the
test.
• We are dealing only with cases where the
sample mean can be assumed normal.
[Graph: standard normal curve showing the 5% significance level, with cutoffs at ±1.96]
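• As a small sketch, the ±1.96 cutoff can be recovered from the chosen significance level with the inverse normal CDF (α below is the 5% level discussed above):

```python
from scipy.stats import norm

alpha = 0.05
# Two-tailed test: split alpha evenly between the two tails
cutoff = norm.ppf(1 - alpha / 2)
print(f"Reject H0 if Z* > {cutoff:.2f} or Z* < {-cutoff:.2f}")  # about +/-1.96
```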
Decision Time
• We have already shown that we can use a
standardized value instead of x-bar to decide
when to reject. We will call this value Z*, the
standard normal test statistic.
• The criterion by which we decide when to
reject the null hypothesis is called a “decision
rule.”
• We establish a cutoff value, beyond which is
the rejection region. If Z* falls into that region,
we will reject H0.
• The next slide shows this for α=.05.
[Graph: rejection region for Z* with α = .05 cutoffs at ±1.96]
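• A minimal sketch of this decision rule at α = .05, reusing the same hypothetical numbers as the earlier sketch (all data values are invented for illustration):

```python
import math
from scipy.stats import norm

# Hypothetical data, invented for illustration
mu0, sigma, n, xbar = 100, 15, 36, 104.5
alpha = 0.05

z_star = (xbar - mu0) / (sigma / math.sqrt(n))
cutoff = norm.ppf(1 - alpha / 2)   # two-tailed cutoff, about 1.96

# Decision rule: reject H0 if Z* falls in the rejection region
if abs(z_star) > cutoff:
    print(f"Z* = {z_star:.2f} is beyond +/-{cutoff:.2f}: reject H0")
else:
    print(f"Z* = {z_star:.2f} is within +/-{cutoff:.2f}: do not reject H0")
```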
One-tailed Tests
• Our graphs so far have shown tests with
two tails.
• We have also seen that the alternative
hypothesis could be of the form H2: μ>μ0,
or H3: μ<μ0.
• These are one-tailed tests. The rejection
region only goes to one side, and all of α
goes into one tail (it doesn’t split).
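• A brief sketch of how the cutoff changes for one-tailed tests (α = .05 assumed, as before):

```python
from scipy.stats import norm

alpha = 0.05

# Right-tailed test (Ha: mu > mu0): reject H0 if Z* > right_cutoff
right_cutoff = norm.ppf(1 - alpha)   # about 1.645
# Left-tailed test (Ha: mu < mu0): reject H0 if Z* < left_cutoff
left_cutoff = norm.ppf(alpha)        # about -1.645

print(f"Right-tailed: reject H0 if Z* > {right_cutoff:.3f}")
print(f"Left-tailed:  reject H0 if Z* < {left_cutoff:.3f}")
```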


Making Mistakes
• Hypothesis testing is a statistical process,
involving random events. As a result, we could
make the wrong decision.
• A Type I Error occurs if we reject H0 when it is
true. The probability of this is known as α, the
level of significance.
• A Type II Error occurs when we fail to reject a
false null hypothesis. The probability of this is
known as β.
• The Power of a test is 1-β. This is the probability
of rejecting the null hypothesis when it is false.
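• To make β and power concrete, here is a sketch that computes both for one specific alternative value of the mean; the alternative μ = 105 and the other numbers are assumptions chosen only for illustration:

```python
import math
from scipy.stats import norm

# Hypothetical setup, invented for illustration
mu0, sigma, n, alpha = 100, 15, 36, 0.05
mu_alt = 105   # one particular value of mu under Ha: mu > mu0

std_err = sigma / math.sqrt(n)
# Right-tailed test: reject H0 when x-bar exceeds this cutoff
cutoff_xbar = mu0 + norm.ppf(1 - alpha) * std_err

# beta = P(do not reject H0) when the true mean is mu_alt
beta = norm.cdf((cutoff_xbar - mu_alt) / std_err)
power = 1 - beta   # P(reject H0) when H0 is false (at mu = mu_alt)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```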
Classification of Errors
                          Actual: H0 True                 Actual: H0 False
Decision: Reject          Type I Error, P(Error) = α      Correct (Type B)
Decision: Do Not Reject   Correct (Type A)                Type II Error, P(Error) = β
Two important numbers
• The significance level of a test is α, the
probability of rejecting H0 if it is true.
• The power of a test is 1-β, the probability
of rejecting H0 if it is false.
• There is a kind of trade-off between
significance and power. We want
significance small and power large, but
they tend to increase or decrease together.
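• A small sketch of that trade-off, using the same hypothetical one-tailed setup as above: as α is made smaller, the power shrinks too.

```python
import math
from scipy.stats import norm

# Same hypothetical one-tailed setup as before (illustration only)
mu0, sigma, n, mu_alt = 100, 15, 36, 105
std_err = sigma / math.sqrt(n)

for alpha in (0.10, 0.05, 0.01):
    cutoff_xbar = mu0 + norm.ppf(1 - alpha) * std_err
    power = 1 - norm.cdf((cutoff_xbar - mu_alt) / std_err)
    print(f"alpha = {alpha:.2f} -> power = {power:.3f}")
```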
Steps in Hypothesis Testing
1. State the null and alternative hypotheses
2. Determine the appropriate type of test
(check assumptions)
3. Define the rejection region
4. Calculate the test statistic
5. State the conclusion in terms of the
original problem
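• To tie the five steps together, here is a sketch that walks the corn example through them; the sample values (σ, n, x-bar) are invented for illustration.

```python
import math
from scipy.stats import norm

# Step 1: state the hypotheses
#   H0: mu = 100 bu/acre    Ha: mu > 100 (right-tailed)
mu0 = 100

# Step 2: z-test for a mean with sigma known; assume x-bar is normal
sigma, n, xbar = 15, 36, 104.5   # hypothetical values, not from the slides
alpha = 0.05

# Step 3: rejection region for a right-tailed test at alpha = .05
cutoff = norm.ppf(1 - alpha)     # about 1.645

# Step 4: calculate the test statistic
z_star = (xbar - mu0) / (sigma / math.sqrt(n))

# Step 5: state the conclusion in terms of the original problem
if z_star > cutoff:
    print(f"Z* = {z_star:.2f} > {cutoff:.2f}: reject H0; the new hybrid yields more")
else:
    print(f"Z* = {z_star:.2f} <= {cutoff:.2f}: do not reject H0; no evidence the hybrid is better")
```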
p-Value Testing
• Say you are reporting some research in biology
and in your paper you state that you have rejected
the null hypothesis at the .10 level. Someone
reviewing the paper may say, “What if you used a
.05 level? Would you still have rejected?”
• To avoid this kind of question, researchers began
reporting the p-value, which is actually the
smallest α that would result in a rejection.
• It’s kind of like coming at the problem from
behind. Instead of looking at α to determine a
critical region, we let the estimate show us the
critical region that would “work.”
How p-Values Work
• To simplify the explanation,
let’s look at a right-tailed
means test. We assume a
distribution with mean μ0 and
we calculate a sample mean.
• What if our sample mean fell right on the
boundary of the critical region, so that its
standardized value Z* lands exactly on the cutoff?
• This is just at the point where we would reject H0.
• So if we calculate the probability, under H0, of a value greater than
the observed x-bar, this corresponds to the smallest α that results in a rejection.
• If the test is two-tailed, we have to double that probability,
because Z* marks one part of the rejection region, and its
negative marks the other part, on the other side (other tail).
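• A sketch of this p-value calculation, for both the right-tailed and the two-tailed case (same hypothetical numbers as in the earlier sketches):

```python
import math
from scipy.stats import norm

# Hypothetical data, invented for illustration
mu0, sigma, n, xbar = 100, 15, 36, 104.5
alpha = 0.05

z_star = (xbar - mu0) / (sigma / math.sqrt(n))

# Right-tailed test: chance of a value at least as large as Z* under H0
p_one_tailed = norm.sf(z_star)              # sf(z) = 1 - cdf(z)
# Two-tailed test: double it, since -|Z*| marks the other tail
p_two_tailed = 2 * norm.sf(abs(z_star))

print(f"one-tailed p-value = {p_one_tailed:.4f}")
print(f"two-tailed p-value = {p_two_tailed:.4f}")
# The decision rule from the next slide: reject H0 when p < alpha
print("reject H0" if p_one_tailed < alpha else "do not reject H0")
```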
Using a p-Value
• Using a p-Value couldn’t be easier. If p<α, we reject H0.
That’s it.
• p-Values tell us something about the “strength” of a rejection.
If p is really small, we can be very confident in the decision.
• In real world problems, many p-Values turn out to be like
.001 or even less. We can feel very good about a rejection in
this case. However, if p is around .05 or .1, we might be a
little nervous.
• When Fisher originally proposed these ideas early in the last
century, he suggested three categories of decision:
– p < .05 → Reject H0
– .05 ≤ p ≤ .20 → more research needed
– p > .20 → Accept H0