5. Hypothesis Testing
Motivating example (‘mechanical aptitude’ scores): Is μ > 75?

Using Data to Test Hypotheses
“When it is not in our power to
determine what is true, we ought to
follow what is most probable.”
Confidence intervals are one of the two most common types of formal statistical inference. They are appropriate when our goal is to estimate a population parameter, e.g. intervals such as [68.5, 70.5] or [1222, 1300]. The second common type of inference is directed at a different goal: to assess the evidence provided by the data in favour of some claim about the population.
Hypotheses
• In Statistics, a hypothesis proposes a model. Then
we look at the data.
• If the data are consistent with that model, we have
no reason to disbelieve the hypothesis. Data
consistent with the model lend support to the
hypothesis, but do not prove it.
• But if the facts are inconsistent with the model, we
need to make a choice as to whether they are
inconsistent enough to disbelieve the model. If
they are inconsistent enough, we can reject the
model.
Hypotheses (cont.)
• Think about the logic of jury trials: To
prove someone is guilty, we start by
assuming they are innocent. We retain that
hypothesis until the facts make it unlikely
beyond a reasonable doubt. Then, and only
then, we reject the hypothesis of innocence
and declare the person guilty.
Hypotheses (cont.)
• The same logic used in jury trials is used in
statistical tests of hypotheses: We begin by
assuming that a hypothesis is true. Next we
consider whether the data are consistent
with the hypothesis. If they are, all we can
do is retain the hypothesis we started with.
If they are not, then like a jury, we ask
whether they are unlikely beyond a
reasonable doubt.
Testing Hypotheses
• In Statistics, we can quantify our doubt by
finding the probability that data like we saw
could occur based on our hypothesized
model.
• The null hypothesis, which we denote H0, specifies a population model parameter of interest and proposes a value for that parameter. We might have, for example, H0: μ = 75 (tested against HA: μ > 75), as in the mechanical aptitude example.
• We want to compare our data to what we would
expect given that H0 is true.
• We then ask how likely it is to get results from a
sample like we did if the null hypothesis were
true.
– If the results seem consistent with what we
would expect from natural sampling variability,
we’ll retain the hypothesis.
– If the probability of seeing results like our data
is really low, we reject the hypothesis.
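The phrase “how likely it is to get results from a sample like we did if the null hypothesis were true” can be made concrete by simulation. Below is a minimal sketch in Python; all of the numbers (a population with μ0 = 75 and σ = 10, a sample of size 25, an observed mean of 79) are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative assumptions only: H0 says mu = 75; suppose sigma = 10,
# n = 25, and our sample happened to give a mean of 79.
mu0, sigma, n, observed_mean = 75, 10, 25, 79

# Draw many samples from the world in which H0 is true
sim_means = rng.normal(mu0, sigma, size=(100_000, n)).mean(axis=1)

# How often does natural sampling variability alone produce a mean
# at least as large as the one we observed?
prop = (sim_means >= observed_mean).mean()
print(f"Proportion of simulated means >= {observed_mean}: {prop:.3f}")
```

If this proportion is tiny, results like ours are hard to explain by sampling variability under H0, which is exactly the reasoning the formal tests below make precise.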
Strategy
1. State a hypothesis
2. Take a random sample from the population of interest and calculate a suitable statistic
3. Investigate how likely the value of that statistic is if your specified hypothesis is true
4. Make a decision as to whether your hypothesis is true given step 3
1. State a hypothesis
– The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. In general, we have H0: parameter = value.
– The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we accept if we reject the null. HA comes in three basic forms:
• HA: parameter < value
• HA: parameter ≠ value
• HA: parameter > value
4. Decision
– The decision in a hypothesis test is always a statement about the null hypothesis.
– The decision must state either that we reject or that we fail to reject the null hypothesis.

(a) Tests Concerning Means
(i) One Sample Test
Given a random sample of size n from a population with mean μ and standard deviation σ, how do we test H0: μ = μ0 against various alternative hypotheses?
Case 1: n ≥ 30
1. σ unknown? Use s.
2. TS: z = (x̄ - μ0) / (σ/√n)
3. Distribution of TS if H0 true: N(0,1)

Example 1
A ‘production process’ is in control if μ = 35.50 mm (σ = 0.45 mm). In a random sample of size 40 he obtains a mean of 35.62 mm. Should the process be shut down? What do you advise?
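As a quick check of the arithmetic in Example 1, here is a minimal sketch of the large-sample z test in Python (SciPy assumed; the two-sided alternative and α = 0.05 are my own choices for illustration):

```python
import math
from scipy.stats import norm

# Example 1: the process is "in control" if mu = 35.50 mm, sigma = 0.45 mm
mu0, sigma = 35.50, 0.45
n, xbar = 40, 35.62                    # sample size and observed sample mean

# Test statistic for the large-sample (n >= 30) case
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")                  # about 1.69

# Two-sided critical value at alpha = 0.05, for comparison
z_crit = norm.ppf(1 - 0.05 / 2)        # about 1.96
print("reject H0 at alpha = 0.05 (two-sided)?", abs(z) > z_crit)
```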
P-values in Hypothesis Tests
• Once we have our test statistic, we can
calculate a P-value—the probability of
observing a value for a test statistic at least
as far from the hypothesized value as the
statistic value actually observed if the null
hypothesis is true.
• The smaller the P-value, the more evidence
we have against the null hypothesis.
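For instance, the two-sided P-value for the z statistic of Example 1 (z ≈ 1.69) can be read off the standard normal distribution; a short sketch, again assuming SciPy:

```python
from scipy.stats import norm

z = 1.69                                # test statistic from Example 1 (approx.)
p_two_sided = 2 * norm.sf(abs(z))       # sf(x) = 1 - cdf(x), the upper-tail area
print(f"two-sided P-value = {p_two_sided:.3f}")   # about 0.09
```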
Alpha Levels
• Sometimes we need to make a firm decision about
whether or not to reject the null hypothesis.
• When the P-value is small, it tells us that our data
are rare given H0. How rare is “rare”?
• We can define “rare event” arbitrarily by setting a
threshold for our P-value. If our P-value falls
below that point, we’ll reject H0. We call such
results statistically significant. The threshold is
called an alpha level, denoted by α.
• Common alpha levels are .10, .05, and .01.
Alpha Levels (cont.)
• The alpha level is also called the significance level. (When we reject the null hypothesis, we say that the test is “significant at that level.”)
• You need to consider your alpha level
carefully and choose an appropriate one for
the situation.
Hypothesis testing at α = 0.01 (large samples, two-tailed test)
[Figure: standard normal curve with critical values -2.58 and 2.58; Z* is significant at α = 0.01 if it falls in either tail (area 0.005 each) and not significant if it falls in the central region (area 0.99).]

Critical Values Again (cont.)
• Rather than looking up your test statistic
value in the table, you could just check it
directly against these critical values.
– Any test statistic score larger in magnitude than
a particular critical value leads us to reject H0.
– Any test statistic score smaller in magnitude
than a particular critical value leads us to fail to
reject H0.
Sig. level α                               0.10            0.05            0.01
Critical values of z for one-tailed test   -1.28 or 1.28   -1.65 or 1.65   -2.33 or 2.33
Critical values of z for two-tailed test   -1.65 or 1.65   -1.96 or 1.96   -2.58 or 2.58
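These critical values can be reproduced from the standard normal quantile function; a short sketch (SciPy assumed):

```python
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.01):
    one_tailed = norm.ppf(1 - alpha)        # upper-tail critical value
    two_tailed = norm.ppf(1 - alpha / 2)    # each tail holds alpha/2
    print(f"alpha = {alpha:.2f}: one-tailed ±{one_tailed:.2f}, "
          f"two-tailed ±{two_tailed:.2f}")
```

Rounded to two decimal places this recovers the table above (the 1.645 value is usually quoted as 1.65).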
P-Values and Decisions:
What to Tell About a Hypothesis Test
• How small should the P-value be in order for you
to reject the null hypothesis?
• It turns out that our decision criterion is context-dependent.
– When we’re screening for a disease and want to be sure
we treat all those who are sick, we may be willing to
reject the null hypothesis of no disease with a fairly
large P-value.
– A longstanding hypothesis, believed by many to be
true, needs stronger evidence (and a correspondingly
small P-value) to reject it.
• Another factor in choosing a P-value is the
importance of the issue being tested.
P-Values and Decisions (cont.)
• Your conclusion about any null hypothesis should
be accompanied by the P-value of the test.
• Don’t just declare the null hypothesis rejected or
not rejected—report the P-value to show the
strength of the evidence against the hypothesis.
This will let each reader decide whether or not to
reject the null hypothesis.
Exercise.
Calculate a 95% C.I. for the population
mean using the sample data and see if
the resulting interval ‘agrees’ with
the hypothesis test you just carried
out.
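One possible sketch of this exercise, using the Example 1 figures on the assumption that this is the test just carried out:

```python
import math
from scipy.stats import norm

n, xbar, sigma = 40, 35.62, 0.45        # Example 1 sample
z975 = norm.ppf(0.975)                  # 1.96 for a 95% interval
half_width = z975 * sigma / math.sqrt(n)

lo, hi = xbar - half_width, xbar + half_width
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")  # about [35.48, 35.76]

# The interval contains mu0 = 35.50, which agrees with failing to
# reject H0: mu = 35.50 at alpha = 0.05 with a two-sided alternative.
```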
Alternative Alternatives
• As stated earlier, there are three possible
alternative hypotheses:
• HA: parameter < value
• HA: parameter ≠ value
• HA: parameter > value
• HA: parameter ≠ value is known as a two-sided
alternative because we are equally interested in
deviations on either side of the null hypothesis
value. For two-sided alternatives, the P-value is
the probability of deviating in either direction
from the null hypothesis value.
Alternative Alternatives (cont.)
• The other two alternative hypotheses are
called one-sided alternatives. A one-sided
alternative focuses on deviations from the
null hypothesis value in only one direction.
Thus, the P-value for one-sided alternatives
is the probability of deviating only in the
direction of the alternative away from the
null hypothesis value.
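As a small illustration of one-sided versus two-sided P-values, here is a hedged sketch using SciPy's one-sample t test (the alternative argument requires SciPy 1.6 or later; the data below are made up):

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=20)   # made-up data

# Two-sided: deviations on either side of mu0 = 5 count as evidence
print(ttest_1samp(sample, popmean=5.0, alternative='two-sided'))

# One-sided: only deviations above mu0 count as evidence
print(ttest_1samp(sample, popmean=5.0, alternative='greater'))
```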
Example 2
The current mean check-in time is 3.8 mins. In a random sample of 50 under the ‘new system’ a mean of 3.3 mins (s = 1.1) is observed. Should the new system be implemented? What do you advise?
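A minimal sketch of Example 2 as a large-sample z test; the one-sided framing (HA: μ < 3.8, i.e. the new system is faster) and the use of s in place of σ (reasonable since n = 50 is large) are my own choices:

```python
import math
from scipy.stats import norm

mu0 = 3.8                      # current mean check-in time (mins)
n, xbar, s = 50, 3.3, 1.1      # sample under the new system

z = (xbar - mu0) / (s / math.sqrt(n))
p_one_sided = norm.cdf(z)      # HA: mu < mu0, so the lower tail is the evidence
print(f"z = {z:.2f}, one-sided P-value = {p_one_sided:.4f}")   # z ≈ -3.21
```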
Case 2: n < 30
1. σ unknown? Use s.
2. TS: t = (x̄ - μ0) / (s/√n)
3. Distribution of TS if H0 true: t(n - 1 df)

HA                    CR at sig. level α               Dist under H0
μ ≠ μ0 (two-sided)    t* < -tα/2 or t* > tα/2          t(n - 1 df)
μ > μ0 (one-sided)    t* > tα                          t(n - 1 df)
μ < μ0 (one-sided)    t* < -tα                         t(n - 1 df)

[Figure: t(n - 1) densities with the rejection regions shaded beyond ±tα/2 (two-sided) or tα (one-sided).]
Example 3
In a sample of 23 a mean speed of 31 mph (s = 4.25) is observed. Are cars obeying the speed limit in general?
[Figure: histogram of the 23 observed speeds (mph), ranging from roughly 24 to 40.]
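A sketch of Example 3 as a one-sample t test (Case 2, n < 30). The slide does not state the speed limit, so μ0 = 30 mph below is purely an assumption for illustration, as is the one-sided alternative HA: μ > 30:

```python
import math
from scipy.stats import t

mu0 = 30.0                      # assumed speed limit (not given on the slide)
n, xbar, s = 23, 31.0, 4.25     # sample of observed speeds

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_one_sided = t.sf(t_stat, df=n - 1)     # HA: mu > mu0, upper tail
t_crit = t.ppf(0.95, df=n - 1)           # critical value at alpha = 0.05

print(f"t = {t_stat:.2f}, critical value = {t_crit:.2f}, "
      f"P-value = {p_one_sided:.3f}")
```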
1. A significance test is a formal procedure
for comparing observed data with a
hypothesis whose truth we want to
assess.
2. The hypothesis is a statement about the
parameter(s) in a population or model.
3. The results of a test are expressed in
terms of a probability that measures
how well the data and the hypothesis
agree.
Strategy Revisited
1. State the Null and Alternative Hypotheses
2. Collect a random sample of data
3. Calculate an appropriate test statistic (TS)
4. Determine the distribution of the TS when H0 is true
5. Decide on the ‘Significance Level’ α and corresponding Critical Region CR.
6. Check whether the value of the TS is in the critical region, report the P-value and make a decision.
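To close, here is a hedged sketch that strings the six steps together for the large-sample case and applies it to the Example 2 numbers; the helper function and its name are my own:

```python
import math
from scipy.stats import norm

def z_test(xbar, s, n, mu0, alpha=0.05, alternative='two-sided'):
    """Large-sample one-sample z test following the six-step strategy."""
    # Steps 3-4: test statistic and its distribution under H0, N(0, 1)
    z = (xbar - mu0) / (s / math.sqrt(n))
    # Step 5: critical region at significance level alpha
    if alternative == 'two-sided':
        crit = norm.ppf(1 - alpha / 2)
        reject = abs(z) > crit
        p = 2 * norm.sf(abs(z))
    elif alternative == 'greater':
        crit = norm.ppf(1 - alpha)
        reject = z > crit
        p = norm.sf(z)
    else:                                  # 'less'
        crit = norm.ppf(1 - alpha)
        reject = z < -crit
        p = norm.cdf(z)
    # Step 6: report the test statistic, the P-value and the decision
    return z, p, reject

# Example 2 again: H0: mu = 3.8 vs HA: mu < 3.8
print(z_test(xbar=3.3, s=1.1, n=50, mu0=3.8, alternative='less'))
```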