Chapter 8
Statistical inference: Significance
Tests About Hypotheses
Learn:
• To use an inferential method called
a Significance Test
• To analyze evidence that data provide
• To make decisions based on data
Agresti/Franklin Statistics, 1 of 122
Two Major Methods for Making
Statistical Inferences about a
Population

Confidence Interval

Significance Test
Agresti/Franklin Statistics, 2 of 122
Questions that Significance
Tests Attempt to Answer

Does a proposed diet truly result in
weight loss, on the average?

Is there evidence of discrimination
against women in promotion decisions?

Does one advertising method result in
better sales, on the average, than another
advertising method?
Agresti/Franklin Statistics, 3 of 122
Section 8.1
What Are the Steps For
Performing a Significance Test?
Agresti/Franklin Statistics, 4 of 122
Hypothesis


A hypothesis is a statement about a
population, usually of the form that a
certain parameter takes a particular
numerical value or falls in a certain
range of values
The main goal in many research
studies is to check whether the data
support certain hypotheses
Agresti/Franklin Statistics, 5 of 122
Significance Test

A significance test is a method of
using data to summarize the evidence
about a hypothesis

A significance test about a hypothesis
has five steps
Agresti/Franklin Statistics, 6 of 122
Step 1: Assumptions

A (significance) test assumes that the
data production used randomization

Other assumptions may include:
• Assumptions about the sample size
• Assumptions about the shape of the
population distribution
Agresti/Franklin Statistics, 7 of 122
Step 2: Hypotheses

Each significance test has two
hypotheses:
• The null hypothesis is a statement that the
parameter takes a particular value
• The alternative hypothesis states that the
parameter falls in some alternative range
of values
Agresti/Franklin Statistics, 8 of 122
Null and Alternative Hypotheses

The value in the null hypothesis usually
represents no effect
• The symbol Ho denotes null hypothesis

The value in the alternative hypothesis
usually represents an effect of some type
• The symbol Ha denotes alternative
hypothesis
Agresti/Franklin Statistics, 9 of 122
Null and Alternative Hypotheses

A null hypothesis has a single
parameter value, such as Ho: p = 1/3

An alternative hypothesis has a range
of values that are alternatives to the
one in Ho such as
• Ha: p ≠ 1/3 or
• Ha: p > 1/3 or
• Ha: p < 1/3
Agresti/Franklin Statistics, 10 of 122
Step 3: Test Statistic


The parameter to which the
hypotheses refer has a point
estimate: the sample statistic
A test statistic describes how far that
estimate (the sample statistic) falls
from the parameter value given in the
null hypothesis
Agresti/Franklin Statistics, 11 of 122
Step 4: P-value

To interpret a test statistic value, we use a
probability summary of the evidence
against the null hypothesis, Ho
• First, we presume that Ho is true
• Next, we consider the sampling
distribution from which the test statistic
comes
• We summarize how far out in the tail of
this sampling distribution the test statistic
falls
Agresti/Franklin Statistics, 12 of 122
Step 4: P-value

We summarize how far out in the tail
the test statistic falls by the tail
probability of that value and values
even more extreme
• This probability is called a P-value
• The smaller the P-value, the stronger
the evidence is against Ho
Agresti/Franklin Statistics, 13 of 122
Step 4: P-value


The P-value is the probability that the
test statistic equals the observed
value or a value even more extreme
It is calculated by presuming that the
null hypothesis H0 is true
Agresti/Franklin Statistics, 15 of 122
Step 5: Conclusion

The conclusion of a significance test
reports the P-value and interprets
what it says about the question that
motivated the test
Agresti/Franklin Statistics, 16 of 122
Summary: The Five Steps of a
Significance Test
1. Assumptions
2. Hypotheses
3. Test Statistic
4. P-value
5. Conclusion
Agresti/Franklin Statistics, 17 of 122
Is the Statement a Null Hypothesis
or an Alternative Hypothesis?
In Canada, the proportion of adults who
favor legalized gambling is 0.50.
a. Null Hypothesis
b. Alternative Hypothesis
Agresti/Franklin Statistics, 18 of 122
Is the Statement a Null Hypothesis
or an Alternative Hypothesis?
The proportion of all Canadian college
students who are regular smokers is
less than 0.24, the value it was ten years
ago.
a. Null Hypothesis
b. Alternative Hypothesis
Agresti/Franklin Statistics, 19 of 122
Section 8.2
Significance Tests About
Proportions
Agresti/Franklin Statistics, 20 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?

Scientific “test of astrology”
experiment:
• For each of 116 adult volunteers, an
astrologer prepared a horoscope based on
the positions of the planets and the moon
at the moment of the person’s birth
• Each adult subject also filled out a
California Personality Index Survey
Agresti/Franklin Statistics, 21 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?

For a given adult, his or her birth data and
horoscope were shown to an astrologer
together with the results of the personality
survey for that adult and for two other
adults randomly selected from the group

The astrologer was asked which
personality chart of the 3 subjects was the
correct one for that adult, based on his or
her horoscope
Agresti/Franklin Statistics, 22 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?

28 astrologers were randomly chosen
to take part in the experiment

The National Council for Geocosmic
Research claimed that the probability
of a correct guess on any given trial
in the experiment was larger than 1/3,
the value for random guessing
Agresti/Franklin Statistics, 23 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?

Put this investigation in the context of
a significance test by stating null and
alternative hypotheses
Agresti/Franklin Statistics, 24 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?



With random guessing, p = 1/3
The astrologers’ claim: p > 1/3
The hypotheses for this test:
• Ho: p = 1/3
• Ha: p > 1/3
Agresti/Franklin Statistics, 25 of 122
What Are the Steps of a Significance
Test about a Population Proportion?
Step 1: Assumptions
• The variable is categorical
• The data are obtained using
randomization
• The sample size is sufficiently large
that the sampling distribution of the
sample proportion is approximately
normal:
• np ≥ 15 and n(1-p) ≥ 15
Agresti/Franklin Statistics, 26 of 122
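The sample-size condition above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `large_sample_ok` (not part of the slides); the threshold of 15 comes from the slide, and the value of p used in the check is taken to be the null-hypothesis value p0.

```python
def large_sample_ok(n, p0, threshold=15):
    """Return True when both expected counts under H0 reach the threshold."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= threshold and expected_failures >= threshold


# Astrology example from this chapter: n = 116, p0 = 1/3
print(large_sample_ok(116, 1 / 3))
# Dog example from this chapter: n = 54, p0 = 1/7
print(large_sample_ok(54, 1 / 7))
```

The helper only checks the expected counts; it says nothing about whether the data were obtained using randomization.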
What Are the Steps of a Significance
Test about a Population Proportion?
Step 2: Hypotheses

The null hypothesis has the form:
• Ho: p = po

The alternative hypothesis has the form:
• Ha: p > po (one-sided test) or
• Ha: p < po (one-sided test) or
• Ha: p ≠ po (two-sided test)
Agresti/Franklin Statistics, 27 of 122
What Are the Steps of a Significance
Test about a Population Proportion?
Step 3: Test Statistic


The test statistic measures how far the sample
proportion falls from the null hypothesis value,
po, relative to what we’d expect if Ho were true
The test statistic is:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
Agresti/Franklin Statistics, 28 of 122
What Are the Steps of a Significance
Test about a Population Proportion?
Step 4: P-value
 The P-value summarizes the evidence
 It describes how unusual the data
would be if H0 were true
Agresti/Franklin Statistics, 29 of 122
What Are the Steps of a Significance
Test about a Population Proportion?
Step 5: Conclusion
 We summarize the test by reporting
and interpreting the P-value
Agresti/Franklin Statistics, 30 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 1: Assumptions
• The data are categorical: each prediction
falls in the category “correct” or
“incorrect” prediction
• Each subject was identified by a random
number. Subjects were randomly selected
for each experiment.
• np = 116(1/3) > 15
• n(1-p) = 116(2/3) > 15
Agresti/Franklin Statistics, 31 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 2: Hypotheses
• H0: p = 1/3
• Ha: p > 1/3
Agresti/Franklin Statistics, 32 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 3: Test Statistic:
• In the actual experiment, the astrologers were
correct with 40 of their 116 predictions (a success
rate of 0.345)
$$z = \frac{0.345 - 1/3}{\sqrt{(1/3)(2/3)/116}} = 0.26$$
Agresti/Franklin Statistics, 33 of 122
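The arithmetic for the astrology test statistic can be reproduced in a few lines. This is a sketch only: the function name `z_for_proportion` is an assumption made for illustration, and the right-tail probability is computed with Python's standard-library `statistics.NormalDist` rather than a normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_for_proportion(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), using the null standard error."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se0


z = z_for_proportion(40, 116, 1 / 3)
# Ha: p > 1/3, so the P-value is the right-tail probability beyond z
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 2))
```

The standard error uses the null value 1/3, not the sample proportion, because the calculation presumes H0 is true.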
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 4: P-value
 The P-value is 0.40
Agresti/Franklin Statistics, 34 of 122
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 5: Conclusion
 The P-value of 0.40 is not especially
small
 It does not provide strong evidence
against H0: p = 1/3
 There is not strong evidence that
astrologers have special predictive
powers
Agresti/Franklin Statistics, 35 of 122
How Do We Interpret the
P-value?



A significance test analyzes the
strength of the evidence against the
null hypothesis
We start by presuming that H0 is true
The burden of proof is on Ha
Agresti/Franklin Statistics, 36 of 122
How Do We Interpret the
P-value?



The approach used in hypothesis
testing is called proof by
contradiction
To convince ourselves that Ha is true,
we must show that data contradict H0
If the P-value is small, the data
contradict H0 and support Ha
Agresti/Franklin Statistics, 37 of 122
Two-Sided Significance Tests



A two-sided alternative hypothesis
has the form Ha: p ≠ p0
The P-value is the two-tail probability
under the standard normal curve
We calculate this by finding the tail
probability in a single tail and then
doubling it
Agresti/Franklin Statistics, 38 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Study: investigate whether dogs can
be trained to distinguish a patient
with bladder cancer by smelling
compounds released in the patient’s
urine
Agresti/Franklin Statistics, 39 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
• Experiment:
• Each of 6 dogs was tested with 9
trials
• In each trial, one urine sample from a
bladder cancer patient was randomly
placed among 6 control urine samples
Agresti/Franklin Statistics, 40 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Results:
In a total of 54 trials with the six
dogs, the dogs made the correct
selection 22 times (a success rate of
0.407)
Agresti/Franklin Statistics, 41 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Does this study provide strong
evidence that the dogs’ predictions
were better or worse than with
random guessing?
Agresti/Franklin Statistics, 42 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 1: Check the sample size requirement:
 Is the sample size sufficiently large to use
the hypothesis test for a population
proportion?
• Is np0 ≥ 15 and n(1-p0) ≥ 15?
• 54(1/7) = 7.7 and 54(6/7) = 46.3
• The first, np0, is not large enough
• We will see that the two-sided test is robust
when this assumption is not satisfied
Agresti/Franklin Statistics, 43 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 2: Hypotheses
• H0: p = 1/7
• Ha: p ≠ 1/7
Agresti/Franklin Statistics, 44 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 3: Test Statistic
$$z = \frac{0.407 - 1/7}{\sqrt{(1/7)(6/7)/54}} = 5.6$$
Agresti/Franklin Statistics, 45 of 122
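The same calculation for the dog study, with the two-sided P-value obtained by doubling a single tail, might look like the sketch below. The variable names are illustrative, and `statistics.NormalDist` stands in for a normal table.

```python
from math import sqrt
from statistics import NormalDist

n, correct, p0 = 54, 22, 1 / 7
p_hat = correct / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# Ha: p != 1/7, so double the single-tail probability beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 1), p_value)
```

Because np0 is below 15 here, the normal-based P-value should be read as an approximation.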
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 4: P-value
Agresti/Franklin Statistics, 46 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 5: Conclusion
 Since the P-value is very small and
the sample proportion is greater than
1/7, the evidence strongly suggests
that the dogs’ selections are better
than random guessing
Agresti/Franklin Statistics, 47 of 122
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Insight:
• In this study, the subjects were a convenience
sample rather than a random sample from
some population
• Also, the dogs were not randomly selected
• Any inferential predictions are highly tentative
• The predictions become more conclusive if
similar results occur in other studies
Agresti/Franklin Statistics, 48 of 122
Summary of P-values for Different
Alternative Hypotheses
Alternative Hypothesis    P-value
Ha: p > p0                Right-tail probability
Ha: p < p0                Left-tail probability
Ha: p ≠ p0                Two-tail probability
Agresti/Franklin Statistics, 49 of 122
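The table above maps each alternative hypothesis to a tail of the standard normal curve. A minimal sketch of that mapping follows; the function name `p_value_from_z` and the labels "greater", "less", and "two-sided" are assumptions chosen for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """Tail probability under the standard normal curve.

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0),
    or "two-sided" (Ha: p != p0). These labels are illustrative.
    """
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value_from_z(1.96, alt), 3))
```

For a positive z, the two-sided value is twice the right-tail value, which matches the doubling rule stated for two-sided tests.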
The Significance Level Tells Us How
Strong the Evidence Must Be



Sometimes we need to make a decision
about whether the data provide sufficient
evidence to reject H0
Before seeing the data, we decide how
small the P-value would need to be to reject
H0
This cutoff point is called the significance
level
Agresti/Franklin Statistics, 50 of 122
Significance Level



The significance level is a number
such that we reject H0 if the P-value is
less than or equal to that number
In practice, the most common
significance level is 0.05
When we reject H0 we say the results
are statistically significant
Agresti/Franklin Statistics, 52 of 122
Possible Decisions in a Test
with Significance Level = 0.05
P-value:    Decision about H0:
≤ 0.05      Reject H0
> 0.05      Fail to reject H0
Agresti/Franklin Statistics, 53 of 122
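The decision table can be written as a one-line rule. This is a sketch with a hypothetical function name, `decide`; the default of 0.05 is the significance level used in the table.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is at most alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


print(decide(0.40))   # astrology example P-value from this chapter
print(decide(0.036))  # anorexia example P-value from this chapter
```

Note that the rule returns "Fail to reject H0", never "Accept H0".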
Report the P-value


Learning the actual P-value is more
informative than learning only
whether the test is “statistically
significant at the 0.05 level”
The P-values of 0.01 and 0.049 are
both statistically significant in this
sense, but the first P-value provides
much stronger evidence against H0
than the second
Agresti/Franklin Statistics, 54 of 122
“Do Not Reject H0” Is Not the
Same as Saying “Accept H0”

Analogy: Legal trial
• Null Hypothesis: Defendant is Innocent
• Alternative Hypothesis: Defendant is Guilty
• If the jury acquits the defendant, this does not
mean that it accepts the defendant’s claim of
innocence
• Innocence is plausible, because guilt has not
been established beyond a reasonable doubt
Agresti/Franklin Statistics, 55 of 122
One-Sided vs Two-Sided Tests

Things to consider in deciding on the
alternative hypothesis:
• The context of the real problem
• In most research articles, significance
tests use two-sided P-values
• Confidence intervals are two-sided
Agresti/Franklin Statistics, 56 of 122
The Binomial Test for Small
Samples

The test about a proportion assumes normal
sampling distributions for p̂ and the z-test
statistic.
• It is a large-sample test that requires that the
expected numbers of successes and failures
be at least 15. In practice, the large-sample z
test still performs quite well for two-sided
alternatives even for small samples.
• Warning: For one-sided tests, when p0 differs
from 0.50, the large-sample test does not work
well for small samples
Agresti/Franklin Statistics, 57 of 122
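The slide names the binomial test without giving its details. As a hedged illustration, the sketch below computes an exact right-tail binomial probability, which is the usual small-sample alternative to the normal approximation for Ha: p > p0. The function name `binomial_right_tail` and the sample numbers are assumptions, not taken from the slides.

```python
from math import comb


def binomial_right_tail(successes, n, p0):
    """Exact P(X >= successes) for X ~ Binomial(n, p0), i.e. the
    one-sided P-value for Ha: p > p0 without a normal approximation."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical small sample: 8 successes in 10 trials, H0: p = 0.50
print(binomial_right_tail(8, 10, 0.5))
```

With n = 10 and p0 = 0.50 the expected counts are only 5 each, so the large-sample requirement of 15 is not met and an exact calculation is the safer choice.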
For a test of H0: p = 0.50:
The z test statistic is 1.04. Find the
P-value for Ha: p > 0.50.
a. .15
b. .20
c. .175
d. .222
Agresti/Franklin Statistics, 58 of 122
For a test of H0: p = 0.50:
The z test statistic is 1.04. Find the
P-value for Ha: p ≠ 0.50.
a. .15
b. .22
c. .30
d. .175
Agresti/Franklin Statistics, 59 of 122
For a test of H0: p = 0.50:
The z test statistic is 1.04. Does the
P-value for Ha: p ≠ 0.50 give strong
evidence against H0?
a. yes
b. no
Agresti/Franklin Statistics, 60 of 122
For a test of H0: p = 0.50:
The z test statistic is 2.50. Find the
P-value for Ha: p > 0.50.
a. .05
b. .10
c. .0062
d. .0124
Agresti/Franklin Statistics, 61 of 122
For a test of H0: p = 0.50:
The z test statistic is 2.50. Find the
P-value for Ha: p ≠ 0.50.
a. .05
b. .10
c. .0062
d. .0124
Agresti/Franklin Statistics, 62 of 122
For a test of H0: p = 0.50:
The z test statistic is 2.50. Does the
P-value for Ha: p ≠ 0.50 give strong
evidence against H0?
a. yes
b. no
Agresti/Franklin Statistics, 63 of 122
 Section 8.3
Significance Tests about
Means
Agresti/Franklin Statistics, 64 of 122
What Are the Steps of a Significance
Test about a Population Mean?

Step 1: Assumptions
• The variable is quantitative
• The data are obtained using
randomization
• The population distribution is
approximately normal. This is most
crucial when n is small and Ha is one-sided.
Agresti/Franklin Statistics, 65 of 122
What Are the Steps of a Significance
Test about a Population Mean?

Step 2: Hypotheses:

The null hypothesis has the form:
• H0: µ = µ0

The alternative hypothesis has the form:
• Ha: µ > µ0 (one-sided test) or
• Ha: µ < µ0 (one-sided test) or
• Ha: µ ≠ µ0 (two-sided test)
Agresti/Franklin Statistics, 66 of 122
What Are the Steps of a Significance
Test about a Population Mean?

Step 3: Test Statistic
• The test statistic measures how far the sample
mean falls from the null hypothesis value µ0
relative to what we’d expect if H0 were true
• The test statistic is:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Agresti/Franklin Statistics, 67 of 122
What Are the Steps of a Significance
Test about a Population Mean?

Step 4: P-value
• The P-value summarizes the evidence
• It describes how unusual the data would
be if H0 were true
Agresti/Franklin Statistics, 68 of 122
What Are the Steps of a Significance
Test about a Population Mean?

Step 5: Conclusion
• We summarize the test by reporting and
interpreting the P-value
Agresti/Franklin Statistics, 69 of 122
Summary of P-values for Different
Alternative Hypotheses
Alternative Hypothesis    P-value
Ha: µ > µ0                Right-tail probability
Ha: µ < µ0                Left-tail probability
Ha: µ ≠ µ0                Two-tail probability
Agresti/Franklin Statistics, 70 of 122
Example: Mean Weight Change
in Anorexic Girls


A study compared different
psychological therapies for teenage
girls suffering from anorexia
The variable of interest was each
girl’s weight change: ‘weight at the
end of the study’ – ‘weight at the
beginning of the study’
Agresti/Franklin Statistics, 71 of 122
Example: Mean Weight Change
in Anorexic Girls



One of the therapies was cognitive
therapy
In this study, 29 girls received the
therapeutic treatment
The weight changes for the 29 girls
had a sample mean of 3.00 pounds
and standard deviation of 7.32
pounds
Agresti/Franklin Statistics, 72 of 122
Example: Mean Weight Change
in Anorexic Girls



How can we frame this investigation
in the context of a significance test
that can detect a positive or negative
effect of the therapy?
Null hypothesis: “no effect”
Alternative hypothesis: therapy has
“some effect”
Agresti/Franklin Statistics, 74 of 122
Example: Mean Weight Change
in Anorexic Girls

Step 1: Assumptions
• The variable (weight change) is
quantitative
• The subjects were a convenience sample,
rather than a random sample. The
question is whether these girls are a good
representation of all girls with anorexia.
• The population distribution is
approximately normal
Agresti/Franklin Statistics, 75 of 122
Example: Mean Weight Change
in Anorexic Girls

Step 2: Hypotheses
• H0: µ = 0
• Ha: µ ≠ 0
Agresti/Franklin Statistics, 76 of 122
Example: Mean Weight Change
in Anorexic Girls

Step 3: Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.00 - 0}{7.32/\sqrt{29}} = 2.21$$
Agresti/Franklin Statistics, 77 of 122
Example: Mean Weight Change
in Anorexic Girls

Step 4: P-value
• Minitab Output
Test of mu = 0 vs not = 0
Variable  N   Mean   StDev   SE Mean  95% CI              T     P
wt_chg    29  3.000  7.3204  1.3594   (0.21546, 5.78454)  2.21  0.036
Agresti/Franklin Statistics, 78 of 122
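The T and P columns of the Minitab output can be approximated from the summary statistics alone. The sketch below is not how Minitab computes them; it is an illustration that integrates the Student's t density numerically with Simpson's rule, using only the standard library. The function names `t_pdf` and `t_two_sided_p` are assumptions.

```python
from math import sqrt, lgamma, exp, pi


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-tail probability beyond |t_stat|, via Simpson's rule on [0, |t_stat|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t_stat|)
    return 2 * (0.5 - central_half)


n, mean, sd, mu0 = 29, 3.00, 7.32, 0
se = sd / sqrt(n)
t_stat = (mean - mu0) / se
p_value = t_two_sided_p(t_stat, df=n - 1)
print(round(se, 2), round(t_stat, 2), round(p_value, 3))
```

The degrees of freedom are n - 1 = 28. Small differences from the Minitab figures can arise because the slide rounds the standard deviation to 7.32.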
Example: Mean Weight Change
in Anorexic Girls

Step 5: Conclusion
• The small P-value of 0.036 provides
considerable evidence against the null
hypothesis (the hypothesis that the
therapy had no effect)
Agresti/Franklin Statistics, 79 of 122
Example: Mean Weight Change
in Anorexic Girls

“The therapy had a statistically significant
positive effect on weight (mean change =
3 pounds, n = 29, t = 2.21, P-value = 0.04)”

The effect, however, may be small in
practical terms
• 95% CI for µ: (0.2, 5.8) pounds
Agresti/Franklin Statistics, 80 of 122
Results of Two-Sided Tests and
Results of Confidence Intervals
Agree

Conclusions about means using two-sided
significance tests are consistent with
conclusions using confidence intervals
• If P-value ≤ 0.05 in a two-sided test, a 95%
confidence interval does not contain the H0
value
• If P-value > 0.05 in a two-sided test, a 95%
confidence interval does contain the H0 value
Agresti/Franklin Statistics, 81 of 122
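The agreement between the two-sided test and the confidence interval can be checked on the anorexia example. This is a sketch under one stated assumption: the critical value 2.048 is the t-table value for 95% confidence with 28 degrees of freedom, hard-coded here rather than computed.

```python
from math import sqrt

n, mean, sd, mu0 = 29, 3.00, 7.32, 0
# Assumption: 2.048 is the t-table value for 95% confidence with df = 28
t_crit = 2.048
margin = t_crit * sd / sqrt(n)
ci = (mean - margin, mean + margin)
contains_h0 = ci[0] <= mu0 <= ci[1]
print(tuple(round(v, 1) for v in ci), contains_h0)
```

The interval excludes the H0 value of 0, which is consistent with the two-sided P-value of 0.036 being below 0.05.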
What If the Population Does Not
Satisfy the Normality Assumption

For large samples (roughly about 30
or more) this assumption is usually
not important
• The sampling distribution of x̄ is
approximately normal regardless of the
population distribution
Agresti/Franklin Statistics, 82 of 122
What If the Population Does Not
Satisfy the Normality Assumption

In the case of small samples, we
cannot assume that the sampling
distribution of x̄ is approximately
normal
• Two-sided inferences using the t
distribution are robust against violations
of the normal population assumption
• They still usually work well if the actual
population distribution is not normal
Agresti/Franklin Statistics, 83 of 122
Regardless of Robustness, Look
at the Data

Whether n is small or large, you
should look at the data to check for
severe skew or for severe outliers
• In these cases, the sample mean could
be a misleading measure
Agresti/Franklin Statistics, 84 of 122
A study has a random sample of 20
subjects. The test statistic for
testing Ho:µ=100 is t = 2.40.
Find the approximate P-value for the
alternative, Ha: µ > 100.
a. between .100 and .050
b. between .050 and .025
c. between .025 and .010
d. between .010 and .005
Agresti/Franklin Statistics, 85 of 122
A study has a random sample of 20
subjects. The test statistic for
testing Ho:µ=100 is t = 2.40.
Find the approximate P-value for the
alternative, Ha: µ ≠ 100.
a. between .100 and .050
b. between .050 and .020
c. between .025 and .010
d. between .020 and .010
Agresti/Franklin Statistics, 86 of 122
 Section 8.4
Decisions and Types of Errors in
Significance Tests
Agresti/Franklin Statistics, 87 of 122
Type I and Type II Errors

When H0 is true, a Type I Error occurs
when H0 is rejected

When H0 is false, a Type II Error
occurs when H0 is not rejected
Agresti/Franklin Statistics, 88 of 122
Significance Test Results
Agresti/Franklin Statistics, 89 of 122
An Analogy: Decision Errors in
a Legal Trial
Agresti/Franklin Statistics, 90 of 122
P(Type I Error) = Significance
Level α

Suppose H0 is true. The probability of
rejecting H0, thereby committing a
Type I error, equals the significance
level, α, for the test.
Agresti/Franklin Statistics, 91 of 122
P(Type I Error)


We can control the probability of a
Type I error by our choice of the
significance level
The more serious the consequences
of a Type I error, the smaller α should
be
Agresti/Franklin Statistics, 92 of 122
Type I and Type II Errors

As P(Type I Error) goes Down, P(Type
II Error) goes Up
• The two probabilities are inversely
related
Agresti/Franklin Statistics, 93 of 122
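The inverse relation can be illustrated numerically with the astrology setup from this chapter (n = 116, H0: p = 1/3, Ha: p > 1/3). The sketch below relies on normal approximations; the function name `type2_probability` and the true proportion 0.45 are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type2_probability(alpha, p_true, p0=1 / 3, n=116):
    """P(fail to reject H0) for the one-sided test Ha: p > p0,
    using normal approximations for the sample proportion."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    return STD_NORMAL.cdf((cutoff - p_true) / se_true)


# p_true = 0.45 is a hypothetical value chosen only for illustration
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type2_probability(alpha, p_true=0.45), 3))
```

As the significance level shrinks, the rejection cutoff moves farther from p0, so a false H0 is rejected less often.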
A significance test about a
proportion is conducted using a
significance level of 0.05.
The test statistic is 2.58. The P-value is 0.01.
If Ho is true, for what probability of a
Type I error was the test designed?
a. .01
b. .05
c. 2.58
d. .02
Agresti/Franklin Statistics, 94 of 122
A significance test about a
proportion is conducted using a
significance level of 0.05.
The test statistic is 2.58. The P-value is 0.01.
If this test resulted in a decision error,
what type of error was it?
a. Type I
b. Type II
Agresti/Franklin Statistics, 95 of 122
 Section 8.5
Limitations of Significance Tests
Agresti/Franklin Statistics, 96 of 122
Statistical Significance Does
Not Mean Practical Significance

When we conduct a significance test,
its main relevance is studying
whether the true parameter value is:
• Above, or below, the value in H0 and
• Sufficiently different from the value in H0
to be of practical importance
Agresti/Franklin Statistics, 97 of 122
What the Significance Test
Tells Us

The test gives us information about
whether the parameter differs from
the H0 value and its direction from
that value
Agresti/Franklin Statistics, 98 of 122
What the Significance Test Does
Not Tell Us

It does not tell us about the
practical importance of the
results
Agresti/Franklin Statistics, 99 of 122
Statistical Significance vs.
Practical Significance


A small P-value, such as 0.001, is
highly statistically significant, but it
does not imply an important finding in
any practical sense
In particular, whenever the sample
size is large, small P-values can occur
when the point estimate is near the
parameter value in H0
Agresti/Franklin Statistics, 100 of 122
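The effect of sample size on the P-value can be seen by holding the point estimate fixed and changing only n. The numbers below are hypothetical (an estimate of 0.51 against H0: p = 0.50), and `two_sided_p` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(p_hat, p0, n):
    """Two-sided P-value for a proportion using the large-sample z test."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: the same estimate 0.51 against H0: p = 0.50
small_n = two_sided_p(0.51, 0.50, 100)
large_n = two_sided_p(0.51, 0.50, 1_000_000)
print(round(small_n, 2), large_n)
```

The estimate differs from the H0 value by the same 0.01 in both cases; only the sample size changes the strength of the statistical evidence.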
Significance Tests Are Less Useful
Than Confidence Intervals


A significance test merely indicates
whether the particular parameter
value in H0 is plausible
When a P-value is small, the
significance test indicates that the
hypothesized value is not plausible,
but it tells us little about which
potential parameter values are
plausible
Agresti/Franklin Statistics, 101 of 122
Significance Tests are Less Useful
than Confidence Intervals

A Confidence Interval is more
informative, because it displays the
entire set of believable values
Agresti/Franklin Statistics, 102 of 122
Misinterpretations of Results of
Significance Tests

“Do Not Reject H0” does not mean
“Accept H0”
• A P-value above 0.05, when the
significance level is 0.05, does not mean
that H0 is correct
• A test merely indicates whether a
particular parameter value is plausible
Agresti/Franklin Statistics, 103 of 122
Misinterpretations of Results of
Significance Tests

Statistical significance does not mean
practical significance
• A small P-value does not tell us whether
the parameter value differs by much in
practical terms from the value in H0
Agresti/Franklin Statistics, 104 of 122
Misinterpretations of Results of
Significance Tests

The P-value cannot be interpreted as
the probability that H0 is true
Agresti/Franklin Statistics, 105 of 122
Misinterpretations of Results of
Significance Tests

It is misleading to report results only
if they are “statistically significant”
Agresti/Franklin Statistics, 106 of 122
Misinterpretations of Results of
Significance Tests

Some tests may be statistically
significant just by chance
Agresti/Franklin Statistics, 107 of 122
Misinterpretations of Results of
Significance Tests

True effects may not be as large as
initial estimates reported by the
media
Agresti/Franklin Statistics, 108 of 122
 Section 8.6
How Likely is a Type II Error?
Agresti/Franklin Statistics, 109 of 122
Type II Error

A Type II error occurs in a
hypothesis test when we fail to reject
H0 even though it is actually false
Agresti/Franklin Statistics, 110 of 122
Calculating the Probability of a
Type II Error

To calculate the probability of a Type
II error, we must do a separate
calculation for various values of the
parameter of interest
Agresti/Franklin Statistics, 111 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions

Scientific “test of astrology”
experiment:
• For each of 116 adult volunteers, an
astrologer prepared a horoscope based on
the positions of the planets and the moon
at the moment of the person’s birth
• Each adult subject also filled out a
California Personality Index Survey
Agresti/Franklin Statistics, 112 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions

For a given adult, his or her birth data and
horoscope were shown to an astrologer
together with the results of the personality
survey for that adult and for two other
adults randomly selected from the group

The astrologer was asked which
personality chart of the 3 subjects was the
correct one for that adult, based on his or
her horoscope
Agresti/Franklin Statistics, 113 of 122
Example: Reconsider the Experiment
to test Astrologers’ Predictions

28 astrologers were randomly chosen
to take part in the experiment

The National Council for Geocosmic
Research claimed that the probability
of a correct guess on any given trial
in the experiment was larger than 1/3,
the value for random guessing
Agresti/Franklin Statistics, 114 of 122
Example: Reconsider the Experiment
to test Astrologers’ Predictions




With random guessing, p = 1/3
The astrologers’ claim: p > 1/3
The hypotheses for this test:
• Ho: p = 1/3
• Ha: p > 1/3
The significance level used for the test is
0.05
Agresti/Franklin Statistics, 115 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions

For what values of the sample
proportion can we reject H0?

A test statistic of z = 1.645 has a
P-value of 0.05. So, we reject H0 for
z ≥ 1.645 and we fail to reject H0 for
z < 1.645.
Agresti/Franklin Statistics, 116 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions

Find the value of the sample proportion that
would give us a z of 1.645:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
Solving for p̂:
$$\hat{p} = 1/3 + 1.645\sqrt{\frac{(1/3)(2/3)}{116}} = 0.405$$
Agresti/Franklin Statistics, 117 of 122
Example: Reconsider the Experiment
to test Astrologers’ Predictions
So, we fail to reject H0 if p̂ < 0.405

Suppose that in reality astrologers can make
the correct prediction 50% of the time (that is,
p = 0.50)
In this case (p = 0.50), we can now calculate
the probability of a Type II error

Agresti/Franklin Statistics, 118 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions

We calculate the probability of a sample
proportion < 0.405 assuming that the true
proportion is 0.50
$$z = \frac{0.405 - 0.50}{\sqrt{0.50(1 - 0.50)/116}} = -2.04$$
Agresti/Franklin Statistics, 119 of 122
Example: Reconsider the
Experiment to test Astrologers’
Predictions



The area to the left of -2.04 in the
standard normal table is 0.02
The probability of making a Type II
error and failing to reject H0: p = 1/3 is
only 0.02 in the case in which the true
proportion is 0.50
This is only a small chance of making
a Type II error
Agresti/Franklin Statistics, 120 of 122
Power of a Test

Power = 1 – P(Type II error)

The higher the power, the better

In practice, it is ideal for studies to
have high power while using a
relatively small significance level
Agresti/Franklin Statistics, 121 of 122
Example: Reconsider the Experiment
to test Astrologers’ Predictions


In this example, the Power of the test
at p = 0.50 is: 1 – 0.02 = 0.98
Since higher power is better,
a test power of 0.98 is quite good
Agresti/Franklin Statistics, 122 of 122
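The Type II error and power calculations for the astrology example can be traced step by step. The following is a sketch using the slide's values (n = 116, p0 = 1/3, true p = 0.50, z cutoff 1.645); the variable names are illustrative, and `statistics.NormalDist` replaces the standard normal table.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
n, p0, p_true, z_crit = 116, 1 / 3, 0.50, 1.645

# Smallest sample proportion that leads to rejecting H0 (about 0.405)
cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
# Under the true proportion, p_hat has standard error sqrt(p(1-p)/n)
z = (cutoff - p_true) / sqrt(p_true * (1 - p_true) / n)
type2 = std_normal.cdf(z)  # P(p_hat < cutoff | p = 0.50)
power = 1 - type2
print(round(cutoff, 3), round(z, 2), round(type2, 2), round(power, 2))
```

The cutoff is computed with the null standard error, while the Type II probability uses the standard error at the true proportion, which is why the two square roots differ.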