Lecture 20 - Rice Statistics

Section 8.2
Significance Tests About
Proportions
Agresti/Franklin Statistics, 1 of 114
Example: Are Astrologers’
Predictions Better Than Guessing?

Scientific “test of astrology”
experiment:
• For each of 116 adult volunteers, an
astrologer prepared a horoscope based on
the positions of the planets and the moon
at the moment of the person’s birth
• Each adult subject also filled out a
California Personality Index Survey
Example: Are Astrologers’
Predictions Better Than Guessing?

For a given adult, his or her birth data and
horoscope were shown to an astrologer
together with the results of the personality
survey for that adult and for two other
adults randomly selected from the group

The astrologer was asked which
personality chart of the 3 subjects was the
correct one for that adult, based on his or
her horoscope
Example: Are Astrologers’
Predictions Better Than Guessing?

28 astrologers were randomly chosen
to take part in the experiment

The National Council for Geocosmic
Research claimed that the probability
of a correct guess on any given trial
in the experiment was larger than 1/3,
the value for random guessing
Example: Are Astrologers’
Predictions Better Than Guessing?

Put this investigation in the context of
a significance test by stating null and
alternative hypotheses
Example: Are Astrologers’
Predictions Better Than Guessing?



• With random guessing, p = 1/3
• The astrologers’ claim: p > 1/3
• The hypotheses for this test:
  • H0: p = 1/3
  • Ha: p > 1/3
What Are the Steps of a Significance
Test about a Population Proportion?
Step 1: Assumptions
• The variable is categorical
• The data are obtained using
randomization
• The sample size is sufficiently large
that the sampling distribution of the
sample proportion is approximately
normal:
• np0 ≥ 15 and n(1 - p0) ≥ 15, where p0 is the
null hypothesis value of the proportion
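The sample size assumption can be checked with a few lines of Python. This is a minimal sketch; the function name `large_sample_ok` and its `minimum` parameter are illustrative choices, not part of the slides.

```python
def large_sample_ok(n, p0, minimum=15):
    """Return expected successes and failures under H0, and whether both reach the minimum."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    ok = expected_successes >= minimum and expected_failures >= minimum
    return expected_successes, expected_failures, ok


# n = 116 trials with null value p0 = 1/3 (the astrology experiment)
successes, failures, ok = large_sample_ok(116, 1 / 3)
print(f"expected successes = {successes:.1f}, expected failures = {failures:.1f}, ok = {ok}")
```

Both expected counts are well above 15 here, so the normal approximation is reasonable for that experiment.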
What Are the Steps of a Significance
Test about a Population Proportion?
Step 2: Hypotheses

The null hypothesis has the form:
• H0: p = p0

The alternative hypothesis has the form:
• Ha: p > p0 (one-sided test) or
• Ha: p < p0 (one-sided test) or
• Ha: p ≠ p0 (two-sided test)
What Are the Steps of a Significance
Test about a Population Proportion?
Step 3: Test Statistic


• The test statistic measures how far the sample
proportion falls from the null hypothesis value,
p0, relative to what we’d expect if H0 were true
• The test statistic is:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
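As a rough sketch, the test statistic can be written as a small Python function. The name `z_statistic` is illustrative, and the numbers in the usage line (60 successes in 100 trials, p0 = 0.5) are made up purely to show the call.

```python
import math


def z_statistic(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), using the standard error under H0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se0


# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5
print(f"z = {z_statistic(60, 100, 0.5):.2f}")
```

Note that the standard error uses p0 rather than the sample proportion, because the test presumes H0 is true.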
What Are the Steps of a Significance
Test about a Population Proportion?
Step 4: P-value
• The P-value summarizes the evidence
• It describes how unusual the data
would be if H0 were true
What Are the Steps of a Significance
Test about a Population Proportion?
Step 5: Conclusion
• We summarize the test by reporting
and interpreting the P-value
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 1: Assumptions
• The data are categorical: each prediction
falls in the category “correct” or
“incorrect” prediction
• Each subject was identified by a random
number. Subjects were randomly selected
for each experiment.
• np0 = 116(1/3) = 38.7 > 15
• n(1 - p0) = 116(2/3) = 77.3 > 15
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 2: Hypotheses
• H0: p = 1/3
• Ha: p > 1/3
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 3: Test Statistic:
• In the actual experiment, the astrologers were
correct with 40 of their 116 predictions (a success
rate of 0.345)

$$z = \frac{0.345 - 1/3}{\sqrt{(1/3)(2/3)/116}} = 0.26$$
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 4: P-value
• The P-value is 0.40, the right-tail probability
above z = 0.26 under the standard normal curve
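The z statistic and P-value for the astrology experiment can be reproduced with the Python standard library. This is a sketch, not the slides' own computation; the helper `normal_right_tail` is an illustrative name, and it uses the identity P(Z ≥ z) = erfc(z/√2)/2 for a standard normal Z.

```python
import math


def normal_right_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n, correct, p0 = 116, 40, 1 / 3          # values from the experiment
p_hat = correct / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = normal_right_tail(z)           # Ha: p > 1/3, so use the right tail

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.2f}")
```

The printed values round to the 0.345, 0.26, and 0.40 reported for this example.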
Example: Are Astrologers’
Predictions Better Than Guessing?
Step 5: Conclusion
• The P-value of 0.40 is not especially
small
• It does not provide strong evidence
against H0: p = 1/3
• There is not strong evidence that
astrologers have special predictive
powers
How Do We Interpret the
P-value?



• A significance test analyzes the
strength of the evidence against the
null hypothesis
• We start by presuming that H0 is true
• The burden of proof is on Ha
How Do We Interpret the
P-value?



• The approach used in hypothesis
testing is called a proof by
contradiction
• To convince ourselves that Ha is true,
we must show that the data contradict H0
• If the P-value is small, the data
contradict H0 and support Ha
Two-Sided Significance Tests



• A two-sided alternative hypothesis
has the form Ha: p ≠ p0
• The P-value is the two-tail probability
under the standard normal curve
• We calculate this by finding the tail
probability in a single tail and then
doubling it
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Study: investigate whether dogs can
be trained to distinguish a patient
with bladder cancer by smelling
compounds released in the patient’s
urine
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
• Experiment:
• Each of 6 dogs was tested with 9
trials
• In each trial, one urine sample from a
bladder cancer patient was randomly
placed among 6 control urine samples
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Results:
In a total of 54 trials with the six
dogs, the dogs made the correct
selection 22 times (a success rate of
0.407)
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?

Does this study provide strong
evidence that the dogs’ predictions
were better or worse than with
random guessing?
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 1: Check the sample size requirement:
• Is the sample size sufficiently large to use
the hypothesis test for a population
proportion?
  • Is np0 ≥ 15 and n(1 - p0) ≥ 15?
  • 54(1/7) = 7.7 and 54(6/7) = 46.3
• The first, np0, is not large enough
• We will see that the two-sided test is robust
when this assumption is not satisfied
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 2: Hypotheses
• H0: p = 1/7
• Ha: p ≠ 1/7
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 3: Test Statistic
$$z = \frac{0.407 - 1/7}{\sqrt{(1/7)(6/7)/54}} = 5.6$$
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 4: P-value
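For the two-sided alternative Ha: p ≠ 1/7, the P-value is the single-tail probability beyond |z|, doubled. The sketch below computes it with the Python standard library, using erfc(|z|/√2)/2 for the standard normal tail; the variable names are illustrative.

```python
import math

n, correct, p0 = 54, 22, 1 / 7           # values from the experiment
p_hat = correct / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))   # probability in a single tail
p_value = 2 * one_tail                              # two-tail probability

print(f"z = {z:.1f}, two-sided P-value = {p_value:.1e}")
```

The resulting P-value is far below 0.001, so data like these would be extremely unusual if the dogs were guessing at random.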
Example: Dr Dog: Can Dogs
Detect Cancer by Smell?
Step 5: Conclusion
• Since the P-value is very small and
the sample proportion is greater than
1/7, the evidence strongly suggests
that the dogs’ selections are better
than random guessing
Summary of P-values for Different
Alternative Hypotheses
Alternative Hypothesis    P-value
Ha: p > p0                Right-tail probability
Ha: p < p0                Left-tail probability
Ha: p ≠ p0                Two-tail probability
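The table maps directly onto a small function. The sketch below is one possible way to write it; the function name `p_value` and the strings "greater", "less", and "two-sided" are illustrative labels for the three alternatives, not notation from the slides.

```python
import math


def normal_right_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, alternative):
    """P-value of a z test for the given form of Ha."""
    if alternative == "greater":        # Ha: p > p0, right-tail probability
        return normal_right_tail(z)
    if alternative == "less":           # Ha: p < p0, left-tail probability
        return normal_right_tail(-z)
    if alternative == "two-sided":      # Ha: p != p0, two-tail probability
        return 2 * normal_right_tail(abs(z))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(0.26, alt), 2))
```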
The Significance Level Tells Us How
Strong the Evidence Must Be



• Sometimes we need to make a decision
about whether the data provide sufficient
evidence to reject H0
• Before seeing the data, we decide how
small the P-value would need to be to reject
H0
• This cutoff point is called the significance
level
Significance Level



• The significance level is a number
such that we reject H0 if the P-value is
less than or equal to that number
• In practice, the most common
significance level is 0.05
• When we reject H0 we say the results
are statistically significant
Possible Decisions in a Test
with Significance Level = 0.05
P-value    Decision about H0
≤ 0.05     Reject H0
> 0.05     Fail to reject H0
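The decision rule in the table can be sketched as a short function. The name `decide` and the default `alpha=0.05` are illustrative; the rule itself (reject H0 when the P-value is less than or equal to the significance level) is the one stated above.

```python
def decide(p_value, alpha=0.05):
    """Apply the significance-level rule: reject H0 when P-value <= alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a P-value must lie between 0 and 1")
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


for p in (0.01, 0.049, 0.05, 0.40):
    print(f"P-value = {p}: {decide(p)}")
```

The function returns only the decision, so it is still worth reporting the P-value alongside it.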
Report the P-value


• Learning the actual P-value is more
informative than learning only
whether the test is “statistically
significant at the 0.05 level”
• The P-values of 0.01 and 0.049 are
both statistically significant in this
sense, but the first P-value provides
much stronger evidence against H0
than the second
“Do Not Reject H0” Is Not the
Same as Saying “Accept H0”

Analogy: Legal trial
• Null Hypothesis: Defendant is Innocent
• Alternative Hypothesis: Defendant is Guilty
• If the jury acquits the defendant, this does not
mean that it accepts the defendant’s claim of
innocence
• Innocence is plausible, because guilt has not
been established beyond a reasonable doubt
One-Sided vs Two-Sided Tests

Things to consider in deciding on the
alternative hypothesis:
• The context of the real problem
• In most research articles, significance
tests use two-sided P-values
• Confidence intervals are two-sided
The Binomial Test for Small
Samples

The test about a proportion assumes normal
sampling distributions for p̂ and the z-test
statistic.
• It is a large-sample test that requires that the
expected numbers of successes and failures
be at least 15. In practice, the large-sample z
test still performs quite well for two-sided
alternatives even for small samples.
• Warning: For one-sided tests, when p0 differs
from 0.50, the large-sample test does not work
well for small samples
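When the sample is too small for the normal approximation, a tail probability can be computed exactly from the binomial distribution. The sketch below is one way to do this with the standard library; the function name `binomial_right_tail` is illustrative, and it covers only the right-tail case Ha: p > p0. It is applied to the counts from the dog study (22 correct in 54 trials, p0 = 1/7) only as an illustration.

```python
import math


def binomial_right_tail(k, n, p0):
    """Exact P(X >= k) when X ~ Binomial(n, p0), i.e. the right-tail P-value under H0."""
    return sum(
        math.comb(n, x) * p0**x * (1 - p0) ** (n - x)
        for x in range(k, n + 1)
    )


tail = binomial_right_tail(22, 54, 1 / 7)
print(f"exact right-tail probability = {tail:.1e}")
```

Because this sums exact binomial probabilities, it does not rely on the expected counts being at least 15.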
Section 8.3
Significance Tests about
Means
What Are the Steps of a Significance
Test about a Population Mean?

Step 1: Assumptions
• The variable is quantitative
• The data are obtained using
randomization
• The population distribution is
approximately normal. This is most
crucial when n is small and Ha is one-sided.
What Are the Steps of a Significance
Test about a Population Mean?

Step 2: Hypotheses:

The null hypothesis has the form:
• H0: µ = µ0

The alternative hypothesis has the form:
• Ha: µ > µ0 (one-sided test) or
• Ha: µ < µ0 (one-sided test) or
• Ha: µ ≠ µ0 (two-sided test)
What Are the Steps of a Significance
Test about a Population Mean?

Step 3: Test Statistic
• The test statistic measures how far the sample
mean falls from the null hypothesis value µ0
relative to what we’d expect if H0 were true
• The test statistic is:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
What Are the Steps of a Significance
Test about a Population Mean?

Step 4: P-value
• The P-value summarizes the evidence
• It describes how unusual the data would
be if H0 were true
What Are the Steps of a Significance
Test about a Population Mean?

Step 5: Conclusion
• We summarize the test by reporting and
interpreting the P-value
Summary of P-values for Different
Alternative Hypotheses
Alternative Hypothesis    P-value
Ha: µ > µ0                Right-tail probability
Ha: µ < µ0                Left-tail probability
Ha: µ ≠ µ0                Two-tail probability
Example: Mean Weight Change
in Anorexic Girls


• A study compared different
psychological therapies for teenage
girls suffering from anorexia
• The variable of interest was each
girl’s weight change: ‘weight at the
end of the study’ – ‘weight at the
beginning of the study’
Example: Mean Weight Change
in Anorexic Girls



• One of the therapies was cognitive
therapy
• In this study, 29 girls received the
therapeutic treatment
• The weight changes for the 29 girls
had a sample mean of 3.00 pounds
and standard deviation of 7.32
pounds
Example: Mean Weight Change
in Anorexic Girls



• How can we frame this investigation
in the context of a significance test
that can detect a positive or negative
effect of the therapy?
• Null hypothesis: “no effect”
• Alternative hypothesis: therapy has
“some effect”
Example: Mean Weight Change
in Anorexic Girls

Step 1: Assumptions
• The variable (weight change) is
quantitative
• The subjects were a convenience sample,
rather than a random sample. The
question is whether these girls are a good
representation of all girls with anorexia.
• The population distribution is
approximately normal
Example: Mean Weight Change
in Anorexic Girls

Step 2: Hypotheses
• H0: µ = 0
• Ha: µ ≠ 0
Example: Mean Weight Change
in Anorexic Girls

Step 3: Test Statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.00 - 0}{7.32/\sqrt{29}} = 2.21$$
Example: Mean Weight Change
in Anorexic Girls

Step 4: P-value
• Minitab Output

Test of mu = 0 vs not = 0

Variable   N    Mean    StDev    SE Mean   95% CI               T      P
wt_chg     29   3.000   7.3204   1.3594    (0.21546, 5.78454)   2.21   0.036
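The two-sided P-value in the Minitab output can be approximated without statistical software by numerically integrating the t density. The sketch below uses Simpson's rule with the standard library only; the function names `t_pdf` and `t_two_sided_p` and the `steps` parameter are illustrative, and this is not how Minitab itself computes the value.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Two-tail probability beyond |t|, via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3             # P(0 <= T <= |t|)
    return 2 * (0.5 - central)          # both tails beyond |t|


t = 3.000 / 1.3594                      # Mean / SE Mean from the output
p = t_two_sided_p(t, df=29 - 1)
print(f"t = {t:.2f}, two-sided P-value = {p:.3f}")
```

The result should agree with the reported P of 0.036 to about three decimal places.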
Example: Mean Weight Change
in Anorexic Girls

Step 5: Conclusion
• The small P-value of 0.036 provides
considerable evidence against the null
hypothesis (the hypothesis that the
therapy had no effect)
Example: Mean Weight Change
in Anorexic Girls

• “The therapy had a statistically significant
positive effect on weight (mean change =
3 pounds, n = 29, t = 2.21, P-value = 0.04)”
• The effect, however, may be small in
practical terms
  • 95% CI for µ: (0.2, 5.8) pounds
Results of Two-Sided Tests and
Results of Confidence Intervals
Agree

Conclusions about means using two-sided
significance tests are consistent with
conclusions using confidence intervals
• If P-value ≤ 0.05 in a two-sided test, a 95%
confidence interval does not contain the H0
value
• If P-value > 0.05 in a two-sided test, a 95%
confidence interval does contain the H0 value
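The agreement between the two-sided test and the confidence interval can be illustrated with the weight-change summary statistics (mean 3.00, standard deviation 7.32, n = 29). In this sketch the critical value 2.048 is an assumption taken from a standard t table (0.975 quantile, df = 28); it is not given in the slides.

```python
import math

mean, sd, n, mu0 = 3.00, 7.32, 29, 0.0
t_crit = 2.048                          # assumed: t table value for 95% confidence, df = 28

se = sd / math.sqrt(n)
lower, upper = mean - t_crit * se, mean + t_crit * se
t = (mean - mu0) / se

ci_excludes_mu0 = not (lower <= mu0 <= upper)
test_rejects = abs(t) >= t_crit         # same event as a two-sided P-value <= 0.05

print(f"95% CI: ({lower:.1f}, {upper:.1f})")
print(f"CI excludes mu0: {ci_excludes_mu0}, test rejects H0 at 0.05: {test_rejects}")
```

Both checks give the same answer because the interval and the test are built from the same standard error and the same critical value.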
What If the Population Does Not
Satisfy the Normality Assumption

For large samples (roughly about 30
or more) this assumption is usually
not important
• The sampling distribution of x̄ is
approximately normal regardless of the
population distribution
What If the Population Does Not
Satisfy the Normality Assumption

In the case of small samples, we
cannot assume that the sampling
distribution of x̄ is approximately
normal
• Two-sided inferences using the t
distribution are robust against violations
of the normal population assumption
• They still usually work well if the actual
population distribution is not normal
Regardless of Robustness, Look
at the Data

Whether n is small or large, you
should look at the data to check for
severe skew or for severe outliers
• In these cases, the sample mean could
be a misleading measure