Chapter 21
What Is a Confidence Interval?
Thought Question 1
Suppose that 40% of a certain population
favor the use of nuclear power for energy.
(a) If you randomly sample 10 people from
this population, will exactly four (40%) of
them be in favor of the use of nuclear
power? Would you be surprised if only
two (20%) of them are in favor? How
about if none of the sample are in favor?
Thought Question 2
Suppose that 40% of a certain population
favor the use of nuclear power for energy.
(b) Now suppose you randomly sample 1000
people from this population. Will exactly
400 (40%) of them be in favor of the use of
nuclear power? Would you be surprised if
only 200 (20%) of them are in favor? How
about if none of the sample are in favor?
Thought Question 3
A 95% confidence interval for the proportion
of adults in the U.S. who have diabetes
extends from .07 to .11, or 7% to 11%. What
does it mean to say that the interval from .07
to .11 represents a 95% confidence
interval for the proportion of adults in the
U.S. who have diabetes?
Thought Question 4
Would a 99% confidence interval for the
proportion described in Question 3 be
wider or narrower than the 95% interval
given? Explain. (Hint: what is the difference
between a 68% interval and a 95% interval?)
Thought Question 5
In a May 2006 Zogby America poll of 1000
adults, 70% said that past efforts to enforce
immigration laws have been inadequate.
Based on this poll, a 95% confidence interval
for the proportion in the population who feel
this way is about 67% to 73%. If this poll had
been based on 5000 adults instead, would the
95% confidence interval be wider or narrower
than the interval given? Explain.
Recall from previous chapters:
Parameter
fixed, unknown number that describes the population
Statistic
known value calculated from a sample
a statistic is used to estimate a parameter
Sampling Variability
different samples from the same population may yield
different values of the sample statistic
estimates from samples will be closer to the true values
in the population if the samples are larger
Recall from previous chapters:
Example:
The amount by which the proportion obtained from the
sample (p̂) will differ from the true population proportion
(p) rarely exceeds the margin of error.
Sampling Distribution
tells what values a statistic takes and how often it takes
those values in repeated sampling.
Example:
sample proportions (p̂’s) from repeated sampling would
have a normal distribution with a certain mean and
standard deviation.
Case Study
Comparing Fingerprint Patterns
Science News, Jan. 27, 1995, p. 451.
Case Study: Fingerprints
Fingerprints are a “sexually dimorphic
trait…which means they are among traits that
may be influenced by prenatal hormones.”
• It is known that:
– Most people have more ridges in the fingerprints of the right hand. (People with more ridges in the left hand have “leftward asymmetry.”)
– Women are more likely than men to have leftward asymmetry.
• Compare fingerprint patterns of heterosexual and homosexual men.
Case Study: Fingerprints
Study Results
• 66 homosexual men were studied.
– 20 (30%) of the homosexual men showed left asymmetry.
• 186 heterosexual men were also studied.
– 26 (14%) of the heterosexual men showed left asymmetry.
Case Study: Fingerprints
A Question
Assume that the proportion of all men
who have leftward asymmetry is 15%.
Is it unusual to observe a sample
of 66 men with a sample
proportion (p̂) of 30% if the true
population proportion (p) is 15%?
Twenty Simulated Samples (n=66)
[Figure: twenty simulated samples of size 66; vertical axis from 0 to 1, horizontal axis labeled Sample Size]
The Rule for Sample Proportions
If numerous simple random samples of size n
are taken from the same population, the sample
proportions (p̂) from the various samples will
have an approximately normal distribution. The
mean of the sample proportions will be p (the
true population proportion). The standard
deviation will be:
$\sqrt{\dfrac{p(1-p)}{n}}$
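The rule can be checked by simulation. The sketch below assumes Python's standard `random` and `statistics` modules and borrows the fingerprint values used in this chapter (p = 0.15, n = 66); the function name `simulate_proportions`, the seed, and the 10,000 repetitions are illustrative choices, not part of the original material. It draws many random samples, records each sample proportion, and compares their mean and spread with what the rule predicts.

```python
import math
import random
import statistics

def simulate_proportions(p, n, reps, seed=1):
    """Return `reps` simulated sample proportions, each from a random sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    props = []
    for _ in range(reps):
        # each sampled person is a "success" with probability p
        successes = sum(rng.random() < p for _ in range(n))
        props.append(successes / n)
    return props

p, n = 0.15, 66
props = simulate_proportions(p, n, reps=10_000)

sim_mean = statistics.mean(props)
sim_sd = statistics.pstdev(props)
rule_sd = math.sqrt(p * (1 - p) / n)  # standard deviation given by the rule

print(f"mean of p-hats: {sim_mean:.3f}  (rule says {p})")
print(f"s.d. of p-hats: {sim_sd:.3f}  (rule says {rule_sd:.3f})")
```

The simulated mean should sit very close to 0.15 and the simulated standard deviation close to 0.044; a histogram of `props` would look roughly bell-shaped.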
Rule Conditions and Illustration
For the rule to be valid, we must have:
• a random sample
• a ‘large’ sample size
Case Study: Fingerprints
Sampling Distribution
p = 0.15 (= mean); n = 66
s.d. = $\sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.15(1-0.15)}{66}} = 0.044$
Case Study: Fingerprints
Answer to Question
• Where should about 95% of the sample proportions lie?
– mean plus or minus two standard deviations
– 0.15 − 2(0.044) = 0.062
– 0.15 + 2(0.044) = 0.238
• 95% should fall between 0.062 & 0.238
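The same arithmetic can be written as a short Python check. This is a sketch that assumes only the chapter's values (p = 0.15, n = 66, observed p̂ = 0.30) and the "mean plus or minus two standard deviations" rule; it also expresses 0.30 as a number of standard deviations above the assumed mean.

```python
import math

p, n = 0.15, 66
observed = 0.30          # sample proportion for the 66 homosexual men

sd = math.sqrt(p * (1 - p) / n)          # s.d. of the sample proportion
low, high = p - 2 * sd, p + 2 * sd        # range holding about 95% of p-hats

# how many standard deviations 0.30 lies above the assumed mean of 0.15
z = (observed - p) / sd

print(f"s.d. = {sd:.3f}")
print(f"95% range: {low:.3f} to {high:.3f}")
print(f"0.30 is {z:.1f} s.d. above 0.15; inside range? {low <= observed <= high}")
```

A value more than three standard deviations above the mean is well outside the range that holds about 95% of sample proportions.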
1000 Simulated Samples (n=66)
[Histogram: Simulated Data: p = 0.15, n = 66, s.d. = $\sqrt{0.15(1-0.15)/66} = 0.044$; horizontal axis: Proportion of Successes]
1000 Simulated Samples (n=66)
[Histogram: Simulated Data: p = 0.15, n = 66; horizontal axis: Proportion of Successes]
Approximately 95% of sample proportions fall in this interval (0.062 to 0.238).
Is it likely we would observe a sample proportion as large as 0.30?
1000 Simulated Samples (n=30)
[Histogram: Simulated Data: p = 0.15, n = 30, s.d. = $\sqrt{0.15(1-0.15)/30} = 0.065$; horizontal axis: Proportion of Successes]
1000 Simulated Samples (n=30)
[Histogram: Simulated Data: p = 0.15, n = 30; horizontal axis: Proportion of Successes]
Approximately 95% of sample proportions fall in this interval (0.15 ± 2(0.065), i.e., 0.020 to 0.280).
Is it likely we would observe a sample proportion as large as 0.30?
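The histograms answer the question by simulation. As a cross-check, the chance of a sample proportion of 0.30 or more can be computed exactly from the binomial distribution; this is a technique not used in the chapter, swapped in here only for illustration, and it assumes p = 0.15 as before. The helper name `prob_at_least` is hypothetical.

```python
import math

def prob_at_least(p, n, threshold):
    """Exact binomial probability that the sample proportion is >= threshold."""
    total = 0.0
    for k in range(n + 1):
        # small tolerance guards against floating-point error in threshold * n
        if k >= threshold * n - 1e-9:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

p = 0.15
for n in (66, 30):
    print(f"n = {n}: P(p-hat >= 0.30) = {prob_at_least(p, n, 0.30):.4f}")
```

A proportion of 0.30 is very unlikely with n = 66 (roughly a tenth of a percent), and still uncommon, though noticeably more likely, with the smaller sample of 30, which matches the wider spread of the n = 30 histogram.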
Confidence Interval for a
Population Proportion
• An interval of values, computed from sample data, that is almost sure to cover the true population proportion.
• “We are ‘highly confident’ that the true population proportion is contained in the calculated interval.”
• Statistically (for a 95% C.I.): in repeated samples, 95% of the calculated confidence intervals should contain the true proportion.
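The repeated-samples meaning of "95% confidence" can be illustrated with a small simulation. This sketch assumes a true proportion of 0.40 (the value used in the thought questions), a sample size of 500, 2,000 repetitions, and the 95% interval p̂ ± 2√(p̂(1 − p̂)/n); the function name `coverage` and the seed are hypothetical choices.

```python
import math
import random

def coverage(p, n, reps, seed=2):
    """Fraction of intervals p-hat ± 2*SE that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error from the sample
        if p_hat - 2 * se <= p <= p_hat + 2 * se:
            hits += 1
    return hits / reps

rate = coverage(p=0.40, n=500, reps=2000)
print(f"{rate:.1%} of the intervals contained the true proportion 0.40")
```

Each individual interval either contains 0.40 or it does not; the "95%" describes how often the procedure succeeds over many samples.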
Formula for a 95% Confidence
Interval for the Population
Proportion (Empirical Rule)
• sample proportion plus or minus two standard deviations of the sample proportion:
$\hat{p} \pm 2\sqrt{\dfrac{p(1-p)}{n}}$
• since we do not know the population proportion p (needed to calculate the standard deviation), we will use the sample proportion p̂ in its place.
Formula for a 95% Confidence
Interval for the Population
Proportion (Empirical Rule)
$\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where the square-root term is the standard error (estimated standard deviation of p̂).
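Margin of Error
(the plus or minus part of the C.I.)
$2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \le 2\sqrt{\dfrac{0.5(1-0.5)}{n}} = \dfrac{1}{\sqrt{n}}$
Because p̂(1 − p̂) is largest when p̂ = 0.5, 1/√n is a conservative margin of error.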
Formula for a C-level (%) Confidence
Interval for the Population Proportion
$\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where z* is the critical value of the standard
normal distribution for confidence level C
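A minimal sketch of this formula in Python, assuming only the standard `math` module; the function name `proportion_ci` is hypothetical. It is applied to the Zogby poll from Thought Question 5 (70% of 1000 adults), once at 95% (z* = 2), once at 99% (z* = 2.58), and once with 5000 adults instead of 1000.

```python
import math

def proportion_ci(p_hat, n, z_star):
    """C-level confidence interval p-hat ± z* * sqrt(p-hat(1 - p-hat)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    margin = z_star * se
    return p_hat - margin, p_hat + margin

cases = (("95%, n=1000", 1000, 2), ("99%, n=1000", 1000, 2.58), ("95%, n=5000", 5000, 2))
for label, n, z_star in cases:
    low, high = proportion_ci(0.70, n, z_star)
    print(f"{label}: {low:.3f} to {high:.3f}")
```

The first line reproduces the reported 67% to 73%; raising the confidence level widens the interval, while a larger sample narrows it.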
Common Values of z*
Confidence level C | Critical value z*
50% | 0.67
60% | 0.84
68% | 1
70% | 1.04
80% | 1.28
90% | 1.64
95% | 1.96 (or 2)
99% | 2.58
99.7% | 3
99.9% | 3.29
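The table values come from the standard normal distribution: z* cuts off the middle C of the curve, leaving (1 − C)/2 in each tail. A sketch using Python's `statistics.NormalDist` (standard library, Python 3.8+); the function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the middle `confidence` fraction of N(0, 1) lies in (-z*, z*)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.50, 0.60, 0.68, 0.70, 0.80, 0.90, 0.95, 0.99, 0.997, 0.999):
    print(f"{c:.1%}: z* = {critical_value(c):.2f}")
```

Note that the table rounds a few entries: 68% gives about 0.99 (shown as 1), 95% gives 1.96 (often rounded to 2), and 99.7% gives about 2.97 (shown as 3).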
Case Study
Parental Discipline
Brown, C. S., (1994) “To spank or not to spank.” USA
Weekend, April 22-24, pp. 4-7.
What are parents’ attitudes and
practices on discipline?
Case Study: Survey
Parental Discipline
• Nationwide random telephone survey of 1,250 adults.
– 474 respondents had children under 18 living at home
– results on behavior based on the smaller sample
• Reported margin of error
– 3% for the full sample
– 5% for the smaller sample
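The reported margins of error agree with the conservative formula 1/√n. The source does not say how the survey computed its margins, so this is a consistency check under that assumption, not a reconstruction of the survey's method; the function name `conservative_margin` is hypothetical.

```python
import math

def conservative_margin(n):
    """Margin of error 1/sqrt(n): the largest value of 2*sqrt(p(1-p)/n), reached at p = 0.5."""
    return 1 / math.sqrt(n)

for label, n in (("full sample", 1250), ("parents only", 474)):
    print(f"{label} (n = {n}): about {conservative_margin(n):.1%}")
```

The results, about 2.8% and 4.6%, round to the reported 3% and 5%.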
Case Study: Results
Parental Discipline
• “The 1994 survey marks the first time a majority of parents reported not having physically disciplined their children in the previous year. Figures over the past six years show a steady decline in physical punishment, from a peak of 64 percent in 1988”
– The 1994 proportion who did not spank or hit was 51%!
Case Study: Results
Parental Discipline
• Disciplining methods over the past year:
– denied privileges: 79%
– confined child to his/her room: 59%
– spanked or hit: 49%
– insulted or swore at child: 45%
• Margin of error: 5%
– Which of the above appear to show a true value different from 50%?
Case Study: Confidence Intervals
Parental Discipline
• denied privileges: 79%
– p̂: 0.79
– standard error of p̂: $\sqrt{\dfrac{.79(1-.79)}{474}} = 0.019$
– 95% C.I.: .79 ± 2(.019): (.752, .828)
• confined child to his/her room: 59%
– p̂: 0.59
– standard error of p̂: $\sqrt{\dfrac{.59(1-.59)}{474}} = 0.023$
– 95% C.I.: .59 ± 2(.023): (.544, .636)
Case Study: Confidence Intervals
Parental Discipline
• spanked or hit: 49%
– p̂: 0.49
– standard error of p̂: $\sqrt{\dfrac{.49(1-.49)}{474}} = 0.023$
– 95% C.I.: .49 ± 2(.023): (.444, .536)
• insulted or swore at child: 45%
– p̂: 0.45
– standard error of p̂: $\sqrt{\dfrac{.45(1-.45)}{474}} = 0.023$
– 95% C.I.: .45 ± 2(.023): (.404, .496)
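The four intervals above can be reproduced in a few lines, which also answers which methods show a true value different from 50%. This sketch uses the survey's n = 474 and rounds each standard error to three decimals to match the slides; the dictionary labels are shortened versions of the survey categories.

```python
import math

n = 474  # parents with children under 18 living at home
methods = {
    "denied privileges": 0.79,
    "confined child to room": 0.59,
    "spanked or hit": 0.49,
    "insulted or swore at child": 0.45,
}

results = {}
for method, p_hat in methods.items():
    se = round(math.sqrt(p_hat * (1 - p_hat) / n), 3)  # rounded as on the slides
    low, high = p_hat - 2 * se, p_hat + 2 * se
    results[method] = (low, high)
    verdict = "could be 50%" if low <= 0.5 <= high else "differs from 50%"
    print(f"{method}: ({low:.3f}, {high:.3f}) -> {verdict}")
```

Only "spanked or hit" has an interval that includes 0.50; the two higher proportions lie entirely above 50% and "insulted or swore at child" lies entirely below it.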
Case Study: Results
Parental Discipline
• Asked of the full sample (n = 1,250):
“How often do you think repeated yelling or swearing at a child leads to long-term emotional problems?”
– very often or often: 74%
– sometimes: 17%
– hardly ever or never: 7%
– no response: 2%
• Margin of error: 3%
Case Study: Confidence Intervals
Parental Discipline
• hardly ever or never: 7%
– p̂: 0.07
– standard error of p̂: $\sqrt{\dfrac{.07(1-.07)}{1250}} = 0.007$
– 95% C.I.: .07 ± 2(.007): (.056, .084)
• Few people believe such behavior is harmless, but almost half (45%) of parents engaged in it!
Key Concepts (1st half of Ch. 21)
• Different samples (of the same size) will generally give different results.
• We can specify what these results look like in the aggregate.
• Rule for Sample Proportions
• Compute and interpret Confidence Intervals for population proportions based on sample proportions
Inference for Population Means
Sampling Distribution, Confidence Intervals
• The remainder of this chapter discusses the situation when interest is in making conclusions about population means rather than population proportions
– includes the rule for the sampling distribution of sample means (x̄’s)
– includes confidence intervals for one mean or a difference in two means
Thought Question 6
(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all women
at a university is 135 pounds, with a
standard deviation of 10 pounds.
• Recalling the material from Chapter 13
about bell-shaped curves, in what range
would you expect 95% of the women’s
weights to fall? (Answer: 115 to 155 pounds)
Thought Question 6 (cont.)
• If you were to randomly sample 10
women at the university, how close do
you think their average weight would be
to 135 pounds?
• If you randomly sample 1000 women,
would you expect the average to be
closer to 135 pounds than it would be
for the sample of 10 women?
Thought Question 7
A study compared the serum HDL cholesterol
levels in people with low-fat diets to people with
diets high in fat intake. From the study, a 95%
confidence interval for the mean HDL cholesterol
for the low-fat group extends from 43.5 to 50.5...
a. Does this mean that 95% of all people with
low-fat diets will have HDL cholesterol levels
between 43.5 and 50.5? Explain.
Thought Question 7 (cont.)
… a 95% confidence interval for the mean HDL
cholesterol for the low-fat group extends from
43.5 to 50.5. A 95% confidence interval for the
mean HDL cholesterol for the high-fat group
extends from 54.5 to 61.5.
[Figure: number line from 40 to 65 marking the two intervals, (43.5, 50.5) and (54.5, 61.5)]
b. Based on these results, would you conclude
that people with low-fat diets have lower
HDL cholesterol levels, on average, than
people with high-fat diets?
Thought Question 8
The first confidence interval in Question 7
was based on results from 50 people. The
confidence interval spans a range of 7 units.
If the results had been based on a much
larger sample, would the confidence interval
for the mean cholesterol level have been
wider, narrower, or about the same?
Explain.
Thought Question 9
In Question 7, we compared average HDL
cholesterol levels for two diet groups by
computing separate confidence intervals for
the two means. Is there a more direct value
(and single C.I.) to examine in order to make
the comparison between the two groups?
Case Study
Weights of Females at a Large University
Hypothetical
(from Seeing Through Statistics, 2nd Edition, by Jessica M. Utts, p. 316)
Suppose the mean weight of all
women is μ = 135 pounds with a
standard deviation of σ = 10 pounds
and the weight values follow a bell-shaped curve.
Case Study: Weights
Questions
• Where should 95% of all women’s weights fall?
– mean plus or minus two standard deviations
– 135 − 2(10) = 115
– 135 + 2(10) = 155
– 95% should fall between 115 & 155
• What about the mean (average) of a sample of n women? What values would be expected?
Twenty Simulated Samples (n=1000)
[Figure: twenty simulated samples; vertical axis from 130 to 140, horizontal axis labeled Sample Size]
The Rule for Sample Means
If numerous simple random samples of size n
are taken from the same population, the sample
means (x̄) from the various samples will have
an approximately normal distribution. The
mean of the sample means will be μ (the
population mean). The standard deviation will
be: $\dfrac{\sigma}{\sqrt{n}}$ (σ is the population s.d.)
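As with proportions, the rule can be checked by simulation. This sketch assumes a normal population with μ = 135 and σ = 10 (the hypothetical weights case study) and samples of size n = 25; the function name `simulate_means`, the seed, and the 5,000 repetitions are illustrative choices.

```python
import random
import statistics

def simulate_means(mu, sigma, n, reps, seed=3):
    """Means of `reps` random samples of size n from a normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

means = simulate_means(mu=135, sigma=10, n=25, reps=5000)
print(f"mean of sample means: {statistics.fmean(means):.2f}  (rule: 135)")
print(f"s.d. of sample means: {statistics.pstdev(means):.2f}  (rule: 10/sqrt(25) = 2)")
```

The spread of the sample means comes out near 2, one fifth of the population standard deviation of 10, as σ/√n predicts for n = 25.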
Conditions for the Rule for
Sample Means
• Random sample
• Population of measurements…
– Follows a bell-shaped curve
- or -
– Not bell-shaped, but sample is ‘large’
Case Study: Weights
Sampling Distribution
(for n = 10)
μ = 135 (= mean for population and for x̄)
σ = 10 (= s.d. for population)
n = 10
$\dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{10}} = 3.16$ (= s.d. for x̄)
Case Study: Weights
Answer to Question
(for n = 10)
• Where should 95% of the sample mean weights fall (from samples of size n = 10)?
– mean plus or minus two standard deviations
– 135 − 2(3.16) = 128.68
– 135 + 2(3.16) = 141.32
– 95% should fall between 128.68 & 141.32
Sampling Distribution of Mean (n=10)
[Histogram: Simulated Data: Sample Size=10; horizontal axis: Sample Means, 120 to 150]
Case Study: Weights
Sampling Distribution
(for n = 25)
μ = 135
σ = 10
n = 25
$\dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{25}} = 2$
Case Study: Weights
Answer to Question
(for n = 25)
• Where should 95% of the sample mean weights fall (from samples of size n = 25)?
– mean plus or minus two standard deviations
– 135 − 2(2) = 131
– 135 + 2(2) = 139
– 95% should fall between 131 & 139
Sampling Distribution of Mean (n=25)
[Histogram: Simulated Data: Sample Size=25; horizontal axis: Sample Means, 120 to 150]
Case Study: Weights
Sampling Distribution
(for n = 100)
μ = 135
σ = 10
n = 100
$\dfrac{\sigma}{\sqrt{n}} = \dfrac{10}{\sqrt{100}} = 1$
Case Study: Weights
Answer to Question
(for n = 100)
• Where should 95% of the sample mean weights fall (from samples of size n = 100)?
– mean plus or minus two standard deviations
– 135 − 2(1) = 133
– 135 + 2(1) = 137
– 95% should fall between 133 & 137
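The three answers (n = 10, 25, 100) follow one pattern, μ ± 2σ/√n, so they can be computed together. A short sketch assuming the hypothetical population values μ = 135 and σ = 10 from this case study; the function name `middle_95` is hypothetical.

```python
import math

mu, sigma = 135, 10  # hypothetical population of women's weights (pounds)

def middle_95(n):
    """Range mu ± 2*sigma/sqrt(n) that should hold about 95% of sample means."""
    sd_mean = sigma / math.sqrt(n)
    return mu - 2 * sd_mean, mu + 2 * sd_mean

for n in (10, 25, 100):
    low, high = middle_95(n)
    print(f"n = {n:3d}: {low:.2f} to {high:.2f}")
```

Quadrupling the sample size halves the width of the range, because the standard deviation of x̄ shrinks with √n rather than with n.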
Sampling Distribution of Mean (n=100)
[Histogram: Simulated Data: Sample Size=100; horizontal axis: Sample Means, 120 to 150]
Case Study
Exercise and Pulse Rates
Hypothetical
Is the mean resting pulse rate of adult
subjects who regularly exercise different
from the mean resting pulse rate of
those who do not regularly exercise?
Find Confidence Intervals for the means
Case Study: Results
Exercise and Pulse Rates
A random sample of n1 = 31 nonexercisers yielded a sample mean of x̄1 = 75 beats per minute (bpm) with a sample standard deviation of s1 = 9.0 bpm. A random sample of n2 = 29 exercisers yielded a sample mean of x̄2 = 66 bpm with a sample standard deviation of s2 = 8.6 bpm.
Group | n | mean | std. dev.
Nonexercisers | 31 | 75 | 9.0
Exercisers | 29 | 66 | 8.6
The Rule for Sample Means
If numerous simple random samples of size n
are taken from the same population, the sample
means (x̄) from the various samples will have
an approximately normal distribution. The
mean of the sample means will be μ (the
population mean). The standard deviation will
be: $\dfrac{\sigma}{\sqrt{n}}$
We do not know the value of σ!
Standard Error of the
(Sample) Mean
SEM = standard error of the mean
= (standard deviation from the sample) divided by (square root of the sample size)
= $\dfrac{s}{\sqrt{n}}$
Case Study: Results
Exercise and Pulse Rates
Group | n | mean | std. dev. | std. err.
Nonexercisers | 31 | 75 | 9.0 | 1.6
Exercisers | 29 | 66 | 8.6 | 1.6
• Typical deviation of an individual pulse rate (for Exercisers) is s = 8.6
• Typical deviation of a mean pulse rate (for Exercisers) is $\dfrac{s}{\sqrt{n}} = \dfrac{8.6}{\sqrt{29}} = 1.6$
Case Study: Confidence Intervals
Exercise and Pulse Rates
• 95% C.I. for the population mean:
sample mean ± 2 × (standard error), i.e., $\bar{x} \pm 2\dfrac{s}{\sqrt{n}}$
Nonexercisers: 75 ± 2(1.6) = 75 ± 3.2 = (71.8, 78.2)
Exercisers: 66 ± 2(1.6) = 66 ± 3.2 = (62.8, 69.2)
• Do you think the population means are different?
– Yes, because the intervals do not overlap
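The standard errors and one-sample intervals for the pulse-rate study can be computed directly from the table. This sketch assumes only the summary values given (n, mean, s for each group); it keeps the SEMs unrounded, which gives the same intervals to one decimal place.

```python
import math

groups = {
    # name: (n, sample mean, sample s.d.) from the pulse-rate study
    "Nonexercisers": (31, 75, 9.0),
    "Exercisers": (29, 66, 8.6),
}

cis = {}
for name, (n, mean, s) in groups.items():
    sem = s / math.sqrt(n)                 # standard error of the mean, s / sqrt(n)
    cis[name] = (mean - 2 * sem, mean + 2 * sem)
    print(f"{name}: SEM = {sem:.2f}, 95% C.I. = ({cis[name][0]:.1f}, {cis[name][1]:.1f})")

# two intervals overlap only if each one starts before the other one ends
ex, nonex = cis["Exercisers"], cis["Nonexercisers"]
overlap = ex[1] >= nonex[0] and nonex[1] >= ex[0]
print("intervals overlap:", overlap)
```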
Formula for a C-level (%) Confidence
Interval for the Population Mean
$\bar{x} \pm z^*\dfrac{s}{\sqrt{n}}$
where z* is the critical value of the standard
normal distribution for confidence level C
Careful Interpretation of a
Confidence Interval
• “We are 95% confident that the mean resting pulse rate for the population of all exercisers is between 62.8 and 69.2 bpm.” (We feel that plausible values for the population of exercisers’ mean resting pulse rate are between 62.8 and 69.2.)
• ** This does not mean that 95% of all people who exercise regularly will have resting pulse rates between 62.8 and 69.2 bpm. **
• Statistically: 95% of all samples of size 29 from the population of exercisers should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the C.I.s should contain the true population mean.
Case Study: Confidence Intervals
Exercise and Pulse Rates
• 95% C.I. for the difference in population means (nonexercisers minus exercisers):
(difference in sample means) ± 2 × (SE of the difference)
• Difference in sample means: x̄1 − x̄2 = 9
• SE of the difference = 2.26 (given)
• 95% confidence interval: (4.48, 13.52)
– interval does not include zero (so the means are different)
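The slide gives the SE of the difference. A common formula for two independent samples is √(s1²/n1 + s2²/n2); this sketch assumes that formula, since the source does not say how 2.26 was obtained. The function name `se_difference` is hypothetical.

```python
import math

def se_difference(s1, n1, s2, n2):
    """SE of (mean1 - mean2) for two independent samples: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

diff = 75 - 66                          # nonexercisers minus exercisers, in bpm
se = se_difference(9.0, 31, 8.6, 29)    # about 2.27 from the unrounded SEMs
low, high = diff - 2 * se, diff + 2 * se
print(f"SE = {se:.2f}, 95% C.I. = ({low:.2f}, {high:.2f}), includes 0? {low <= 0 <= high}")

# combining the rounded SEMs (1.6 and 1.6) reproduces the slide's 2.26
print(f"SE from rounded SEMs = {math.hypot(1.6, 1.6):.2f}")
```

With unrounded SEMs the interval is about (4.46, 13.54), a little wider than the slide's (4.48, 13.52), which used SE = 2.26; either way the interval excludes zero.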
Case Study
An Experiment Testing a Vaccine for
Those with Genital Herpes
Adler, T., (1994) “Therapeutic vaccine fights herpes.”
Science News, Vol. 145, June 18, p. 388.
Does a new vaccine prevent the
outbreak of herpes in people already
infected?
Case Study: Sample
An Experiment Testing a Vaccine for
Those with Genital Herpes
• 98 men and women aged 18 to 55
• Experience between 4 and 14 outbreaks per year
• Experiment
– Double-blind experiment
– Randomized to vaccine or placebo
Case Study: Report
An Experiment Testing a Vaccine for
Those with Genital Herpes
“The vaccine was well tolerated. gD2
recipients reported fewer recurrences per
month than placebo recipients (mean 0.42
[sem 0.05] vs 0.55 [0.05]…)…”
Case Study: Confidence Intervals
An Experiment Testing a Vaccine for
Those with Genital Herpes
• 95% C.I. for population mean recurrences:
– Vaccine group: 0.42 ± 2(0.05): (.32, .52)
– Placebo group: 0.55 ± 2(0.05): (.45, .65)
• 95% C.I. for the difference in population means:
– Difference = -0.13, SE = 0.07 (given)
– C.I.: (-0.27, 0.01) (contains 0, so we cannot conclude that the means are different)
Key Concepts (2nd half of Ch. 21)
• Rule for Sample Means
• Compute confidence intervals for means based on one sample
• Compute confidence intervals for means based on two samples
• Interpret Confidence Intervals for Means