Download Hypothesis Testing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Hypothesis Testing
Steps to Answering the
Questions with Data
How does science advance knowledge? How do we
answer questions about the world using
observations? Generally, science forms a question
and brings data to bear to answer it. Informally, the
process is:
• Understand what data says and form clear questions
about its implications.
• State a scientific question as either "just random
variability" or "unusual outcomes"?
• Compare the actual data with these two choices and
decide which to believe.
• Write down our understanding.
Detailed Steps
Phase 1: State the Question
1. Evaluate and describe the data
2. Review assumptions
3. State the question-in the form of hypotheses
Phase 2: Decide How to Answer the Question
4. Decide on a summary number-a statistic-that
reflects the question
5. How could random variation affect that statistic?
6. State a decision rule, using the statistic, to
answer the question
Detailed Steps (cont)
Phase 3: Answer the Question
7. Calculate the statistic
8. Make a statistical decision
9. State the substantive conclusion
Phase 4: Communicate the Answer to the
Question
10. Document our understanding with text,
tables, or figures
Tonight we’ll let the statistics speak for themselves
Ed Koren,
Koren, © The New Yorker,
Yorker, 9 December 1974.
Phase 1: State the Question
• The goal of phase 1 is to state the question. This
may at first seem trivial but it turns out that this is
often the hardest part!
• Perhaps the question is clear. If so, then in
phase 1 of the process, we look at the data and
its relationship to the question at hand.
Sometimes this relationship is obvious.
• More commonly, we may have data and it's not
entirely clear what the question is. Or we have a
clear question but the relationship of data to this
question in tenuous.
Step 1. Evaluate & Describe the
Data
1. The goal is to understand the data and to
fully describe the data. The questions we
should ask are:
• "Where did the data come from?" and
• "What are the observed statistics?"
Where did the data come
from?
At this point, we need to satisfy ourselves that this was a
well-designed study and that the folks in the study are
representative.
At the moment, all that is necessary is to know that the
study's methods were clearly defined and relevant, that
there were adequate controls in place, that the subjects
in the study were representative of cardiac-arrest
victims, and that the measurements of mortality were
without bias.
Bias could result, for instance, if only the "easier" cardiacarrest cases were included in the study.
What are the observed
statistics?
This case study "found that 29 of 278 CPR
patients survived." How can we
summarize that statement numerically?
With a statistic.
A statistic is a single descriptive number
computed from the data
The proportion of people who survived is:
xsurvived
p=
nphone
• where x(survived) is the number of individuals studied
who survived,
• n(phone) is the number of individuals studied (those
receiving the phone-CPR)
• p is the observed proportion surviving.
In the NEJM paper, the observed proportion
surviving is:
xsurvived
29
p=
=
= 0.104
nphone
278
~ 10%
So, about 10% of cardiac-arrest victims
survived after phone-CPR.
2. Review Assumptions
The next step is stating what assumptions we're making. In
this case study, the main assumptions are that the
cardiac arrest victims in this study are representative of
cardiac victims in general and that each victim's
experience is independent of the next.
Since the study had well-designed entry criteria and no
subjects were excluded for arbitrary reasons, we can
conclude that the subjects in this study are
representative of others who may have an emergency
response system similar to that of Seattle.
3. State the Question
What, in statistical terms, is the question?
Informally, we want to know if phone-CPR is an
improvement.
We want to know if our observed 10.4% phone-CPR
survival rate is better than the old survival rate of
6%.
Informally, we might be tempted to say "10.4% is
greater than 6% so phone-CPR is better." But be
more critical and ask whether this difference could
be due to chance.
That is, if we'd observed a 6.2% survival rate, would
we be tempted to say "6.2% is greater than 6% so
phone-CPR is better?"
Defining a Hypothesis
A statistical hypothesis is a hypothesis stated so
that it can be evaluated using statistical
techniques.
We begin by conceiving the true state of nature as
being either "no difference" or "a difference."
We state two hypotheses to reflect these two states
of nature.
A hypothesis is a statement about a population (Not
about a sample or the data).
First, we phrase a hypothesis of no difference,
termed a null-hypothesis.
Null Hypothesis
The null hypothesis, H0, is the statement that is
tested.
A null-hypothesis is the simplest explanation of
events: There is no difference. There is no
change. There is no improvement. Nothing
unusual is occurring.
A null-hypothesis is the statement we hope to
contradict with data. We usually hope to reject
the null hypothesis.
We always start by assuming that nothing is
going on-that any apparent differences are
purely because of chance.
Our preference, as scientists, is to believe
the simplest explanation for a
phenomenon.
Randomness
Thou shalt not interpret randomness.
• Randomness just "is"; making an
interpretation that goes beyond this requires
justification.
• If random noise, measurement error, or
chance occurrence can account for
observations, then there is no need to
formulate a more complicated explanation.
• We embody this preference in the statement
of the null hypothesis.
CPR Study
We start by assuming that nothing is going onthat any apparent improvement over the 6%
survival rate is purely because of chance.
What would the survival rate be if essentially
no intervention was made?
From the information in this case, we see that
we'd expect a 6% survival rate. Two ways to
state the null-hypothesis are:
H0: true percentage surviving is ≤ 6%, or
H0: true proportion surviving is ≤ 0.06.
Thus, one proposed state of nature is that
survival is 0.06 or less.
Then, the other possible state of nature is
that survival is greater than 0.06
The question
Thus we actually state the question addressed by
the study by positing two possible states of
nature.
H0: true proportion ≤ 0.06, or
HA: true proportion > 0.06.
The first state, as we've seen, is called the nullhypothesis.
The second state is the alternative-hypothesis.
Alternative Hypothesis
The alternative hypothesis, HA, is the statement we
hope to be able to conclude.
The statement about a population that is true if the
null hypothesis is not true. The two are
complementary: One and only one of the two
hypotheses is true.
Also called the research hypothesis.
Proof by contradiction
When forced to choose between a simple
conceptual model and real-world data that
clearly contradicts that model, we choose
to trust the data.
We won’t PROVE the null hypothesis, we
will accept the alternative by contradiction.
Phase 2: Decide How to Answer
the Question
In Phase 2, we’ll decide how to answer the
question.
We’ll consider all the outcomes that could
happen and assess the impact of random
variation.
4. Decide on a summary
statistic
Our summary statistic is p, the proportion
surviving.
If the null-hypothesis is true, this should be
approximately 0.06.
5. How could random variation affect
the observed statistic?
What survival proportions would we expect
to observe if
1) the null-hypothesis is true, and
2) we repeated the study over and over
again?
A simulation
How often would we observe various proportions
surviving if we run this study 1000 times, each
time on n = 278 subjects and each subject has a
0.06 chance of surviving?
Every time this study is done you'll observe a
different number of survivors; the estimated
proportion surviving will be different.
What we actually observed in this study is just one
of a number of possibilities of what we could
have observed.
Variability
Here we see that the range of proportions includes
p = 0.018.
In one experiment only five out of 278 survived. Up
through two experiments where p = 0.108, or an
observed proportion almost twice the true,
underlying survival rate of 0.06.
One answer to the question "how much variability
do we expect?" is "proportions between 0.018
and 0.108."
• The single most common proportion is
0.058 but it occurs only 10.4% of the time.
Other more typical values occur around this
middle.
• 29% of the time we observed proportion
between 0.054 and 0.061.
• Over 46% of the time between 0.050 and
0.065.
• How much variability do we expect? If the
true underlying proportion is 0.06 then more
than half the time (61.3%) we'll observe
values between 0.047 and 0.068.
It's useful to consider what we would choose to
believe if various outcomes occur.
In our simulation of 1000 experiments, we saw
p = 0.058 (n = 16 survivors) in 104
experiments.
We saw p = 0.061 (n = 17 survivors) in 88
experiments.
We saw p = 0.065 (n = 18 survivors) in 88
experiments. ... and so on.
Overall, in 1000 experiments, we saw 613 this
extreme or more.
So, observing 16 survivors or more is fairly
common; it happened in 61.3% of the
experiments.
What would this mean?
1. The survival proportion is 0.06 (or less).
To choose this, we're saying that an event
occurred that is within what we'd expect if the
null hypothesis is true.
2. The survival proportion is larger than 0.06.
To choose this, we're saying that the observed
data-survival p = 0.058-is compelling evidence
that the null hypothesis is not true.
After observing only 16 survivors, it would be
extremely difficult to defend choice 2.
We just can't observe a proportion less than 0.06
as evidence that the real proportion is greater
than 0.06.
Observing p = 0.058 clearly supports the null
hypothesis. We'd choose to stay with our
preference that the actual proportion is
unchanged at 0.06.
6. State a decision rule to
answer the question
• Which outcomes lead us to retain our
preference for the null hypothesis?
• And which lead us to conclude that the
alternative is compelling?
• The answer to these questions will be our
decision rule.
Consider three possible outcomes: n = 17 survivors, n = 18
survivors, or n = 19 survivors.
• The simulation shows that about half of the time
(509/1000) we'd see proportions of p = 0.061 or greater
(x = 17 or more survivors).
• About 40% of the time (421/1000) we'd see proportions
of p = 0.065 or greater (x = 18 or more survivors).
• About a third of the time (333/1000) we'd see proportions
of p = 0.068 or greater (x = 19 or more survivors).
Decision
if we observed p = 0.068 we would retain
our preference for the null-hypothesis.
That is, we presume the null-hypothesis is
true and for the three examples above, we
would agree that there was no compelling
evidence to reject the null-hypothesis.
Significance
Choice 2 implies what an observed result of
p = 0.108 is so "significant" a departure
from what we'd observe by chance, that
we reject the null hypothesis
• The significance level is represented by the
Greek symbol "alpha",α. It is the probability of
rejecting a true null hypothesis.
• The researcher chooses the risk of making this
error: concluding that the null hypothesis is false
when it really is true.
• The most frequent values are α = 0.05, 0.01, or
0.10.
Choosing α
• The researcher chooses the risk of making
this error, prior to conducting the analysis.
• The significance level is the percentage or
proportion of time the researcher is willing
to conclude that the null hypothesis is
false when it really is true.
CPR Study
Our decision rule is thus:
Ho: The population survival proportion is 0.06 or
less if the observed proportion p ≤ 0.083
(x = 23 survivors or less).
HA: The population survival proportion is larger
than 0.06 if the observed proportion p > 0.086
(x = 24 or more survivors).
Phase 3: Answer the Question
• Now we return to what we did observe.
• Recall the question: Does dispatcherinstructed bystander-administered CPR
improve survival?
7. Calculate the statistic
• In the actual study, we observed
proportion of 0.104.
8. Make a decision
• Based on our simulated results, our observed
proportion falls under the decision rule for
rejecting the null hypothesis in favor of the
alternative hypothesis.
• That is, we choose to believe the alternative
hypothesis:
HA: The survival proportion is > 0.06, since the
observed proportion p ≥ 0.086 (x = 24 or more
survivors).
Associated p-value
• So, if the survival rate really is 6% it would apparently be
a rare event to observe 10.4% survival.
• Therefore, the p-value for our experiment is thus
approximately 3 in 1000 (p-value = 0.003).
• Recall that this calculation assumes that the nullhypothesis is true: That the survival proportion is really
0.06.
• It is important to note that it is possible to observe
proportions as large as 0.104, but the likelihood is very
small (three in a thousand).
P-value
A p-value is the probability that randomness
alone leads to a test statistic ≥ to the
observed statistic.
When H0 is true, the p-value is the
probability of the test statistic as extreme,
or more extreme than the one actually
observed
9. State the conclusion
Does dispatcher-instructed bystanderadministered CPR improve the chances of
survival over no CPR? With this set of observed
data, it would be hard to defend the null
hypothesis. We reject the null hypothesis of
survival rate being 0.06 in favor of the rate being
> 0.06. Bystander CPR improves the survival
rate compared to no CPR.
Phase 4: Communicate the
Answer to the Question
• The final, and perhaps most important,
phase of this process is to communicate
what you understand.
• We’ve formalized a precise question,
described how to answer it, then brought
data to bear on the question, and finally
answered it
10. Document our understanding with
text, tables, or figures
Bystander-administered CPR was administered to
n = 278 cardiac arrest victims. Survival
probabilities of cardiac arrest victims without this
intervention is 6%. In this study, the observed
survival probability was 10.4% (x = 29), which
was a significant increase (p-value < 0.01).
Bystander CPR improves the survival to hospital
discharge compared to non CPR.
Summary
In this section, we used a case study and
simulation to go through the steps to answer
questions with data.
These steps are called hypothesis testing. This
was done using simulated results, specific to this
study.
Often, this isn’t possible, so in the next section,
we’ll standardize the process to a formal set of
steps to use in general.