Chapter 8
Statistical Inference:
Confidence Intervals
Section 8.1
Point and Interval Estimates of Population
Parameters
Copyright © 2013, 2009, and 2007, Pearson Education, Inc.
Point Estimate and Interval Estimate
A point estimate is a single number that is our
“best guess” for the parameter.
An interval estimate is an interval of numbers
within which the parameter value is believed to fall.
Figure 8.1 A point estimate predicts a parameter by a single number. An interval
estimate is an interval of numbers that are believable values for the parameter.
Question: Why is a point estimate alone not sufficiently informative?
Point Estimate and Interval Estimate
A point estimate doesn’t tell us how close the
estimate is likely to be to the parameter.
An interval estimate is more useful because it incorporates a
margin of error, which helps us gauge the accuracy of
the point estimate.
Properties of Point Estimators
Property 1: A good estimator has a sampling
distribution that is centered at the parameter.
An estimator with this property is unbiased.
• The sample mean is an unbiased estimator of the
population mean.
• The sample proportion is an unbiased estimator of
the population proportion.
Properties of Point Estimators
Property 2: A good estimator has a small standard
deviation compared to other estimators.
• This means it tends to fall closer than other
estimates to the parameter.
• The sample mean has a smaller standard error than
the sample median when estimating the population
mean of a normal distribution.
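Both properties can be checked with a small simulation. The following is a minimal sketch, assuming a normal population with mean 50 and standard deviation 10 and samples of size 25; these values and the helper name `simulate_estimators` are illustrative choices, not taken from the slides.

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=25, reps=5000, seed=1):
    """Draw many samples; record the sample mean and sample median of each."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians

means, medians = simulate_estimators()

# Property 1: the sampling distribution of the mean is centered near mu.
center_of_means = statistics.mean(means)

# Property 2: compare the spread (standard error) of the two estimators.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)

print(f"average of sample means: {center_of_means:.2f}")
print(f"se of mean: {se_mean:.2f}, se of median: {se_median:.2f}")
```

With a normal population, the simulated standard error of the mean should come out smaller than that of the median, which is the comparison Property 2 describes.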
Confidence Interval
A confidence interval is an interval containing the
most believable values for a parameter.
The probability that this method produces an interval
that contains the parameter is called the confidence
level.
This is a number chosen to be close to 1, most
commonly 0.95.
The Logic behind Constructing a
Confidence Interval
To construct a confidence interval for a population
proportion, start with the sampling distribution of a
sample proportion, which gives the possible values for the
sample proportion and their probabilities.
The sampling distribution:
• Is approximately a normal distribution for large
random samples, by the Central Limit Theorem (CLT).
• Has mean equal to the population proportion.
• Has a standard deviation called the standard error.
The Logic behind Constructing a
Confidence Interval
Fact: Approximately 95% of a normal distribution
falls within 2 standard deviations of the mean.
With probability 0.95, the sample proportion falls
within about 1.96 standard errors of the population
proportion.
The distance of 1.96 standard errors is the margin of
error in calculating a 95% confidence interval for the
population proportion.
Margin of Error
The margin of error measures how accurate the
point estimate is likely to be in estimating a parameter.
It is a multiple of the standard error of the sampling
distribution when the sampling distribution is a normal
distribution.
For a 95% confidence interval based on a normal sampling
distribution, the margin of error is a distance of 1.96
standard errors.
Example: Confidence Interval for a
Proportion: A Wife's Career
Example: The GSS asked 1805 respondents whether
they agreed with the statement “It is more important for
a wife to help her husband’s career than to have one
herself”. 19% agreed. Assuming the standard error is
0.01, calculate a 95% confidence interval for the
population proportion who agreed with the statement.
• Margin of error = 1.96 × se = 1.96 × 0.01 ≈ 0.02
• 95% CI = 0.19 ± 0.02, or (0.17 to 0.21)
• We predict that the population proportion who would
agree is somewhere between 0.17 and 0.21.
Section 8.2
Constructing a Confidence Interval to
Estimate a Population Proportion
Finding the 95% Confidence Interval for
a Population Proportion
• We symbolize a population proportion by p.
• The point estimate of the population proportion is
the sample proportion.
• We symbolize the sample proportion by p̂, called
“p-hat”.
Finding the 95% Confidence Interval for
a Population Proportion
A 95% confidence interval uses a margin of error =
1.96(standard deviation):
$$\text{CI} = \text{point estimate} \pm \text{margin of error} = \hat{p} \pm 1.96(\text{standard deviation})$$
Finding the 95% Confidence Interval for
a Population Proportion
The exact standard deviation of a sample proportion
equals:
$$\sqrt{\frac{p(1-p)}{n}}$$
This formula depends on the unknown population
proportion, p.
In practice, we don’t know p, and we need to estimate the
standard error as
$$se = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Finding the 95% Confidence Interval for
a Population Proportion
A 95% confidence interval for a population proportion p is:
$$\hat{p} \pm 1.96(se), \quad \text{with } se = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where p̂ denotes the sample proportion based on n
observations.
Example: Paying Higher Prices to
Protect the Environment
In 2010, the GSS asked subjects if they would be
willing to pay much higher prices to protect the
environment.
Of n = 1,361 respondents, 637 were willing to do so.
Find and interpret a 95% confidence interval for the
population proportion of adult Americans willing to do so
at the time of the survey.
Example: Paying Higher Prices to
Protect the Environment
The sample proportion is
$$\hat{p} = 637/1361 = 0.468$$
The standard error of the sample proportion is
$$se = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.468)(0.532)}{1361}} = 0.0135$$
Using this se, a 95% confidence interval for the population
proportion is
$$\hat{p} \pm 1.96(se), \text{ which is } 0.468 \pm 1.96(0.0135) = 0.468 \pm 0.026, \text{ or } (0.441, 0.494)$$
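The calculation in this example can be reproduced with a few lines of code. This is a minimal sketch of the large-sample formula p̂ ± z(se); the function name `proportion_ci` is a hypothetical helper, not something from the slides.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a proportion: p_hat +/- z * se."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z * se                           # margin of error
    return p_hat, se, (p_hat - margin, p_hat + margin)

# GSS example: 637 of 1,361 respondents were willing to pay much higher prices.
p_hat, se, (lower, upper) = proportion_ci(637, 1361)
print(f"p_hat = {p_hat:.3f}, se = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Because the code carries full precision through every step, the printed endpoints can differ from the slide in the last digit, where intermediate values were rounded.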
Sample Size Needed for Validity of
Confidence Interval for a Proportion
For the 95% confidence interval for a proportion p to be
valid, you should have at least 15 successes and 15
failures:
$$n\hat{p} \ge 15 \quad \text{and} \quad n(1-\hat{p}) \ge 15$$
At the end of Section 8.4, we’ll learn about a simple
adjustment to this confidence interval formula that you
should use when this guideline is violated.
How Can We Use Confidence Levels
Other than 95%?
“95% confidence” means that there’s a 95% chance
that a sample proportion value occurs such that the
confidence interval $\hat{p} \pm 1.96(se)$ contains the unknown
value of the population proportion, p.
With probability 0.05, however, the method produces
a confidence interval that misses p, and the inference
is incorrect.
How Can We Use Confidence Levels
Other than 95%?
In practice, the confidence level 0.95 is the most common
choice. But, some applications require greater (or less) confidence.
To increase the chance of a correct inference, we can use a
larger confidence level, such as 0.99.
Figure 8.4 A 99% Confidence Interval Is Wider Than a 95% Confidence Interval.
Question: If you want greater confidence, why would you expect a wider interval?
How Can We Use Confidence Levels
Other than 95%?
In using confidence intervals, we must compromise
between the desired margin of error and the desired
confidence of a correct inference.
As the desired confidence level increases, the margin
of error gets larger.
Example: A Husband Choosing Not to
Have Children
A recent GSS asked, “If the wife in a family wants
children, but the husband decides that he does not want
any children, is it all right for the husband to refuse to
have children?”
Of 598 respondents, 366 said yes and 232 said no.
Calculate the 99% confidence interval.
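One way to work this example is sketched here, under the assumption that the z-score for 99% confidence is about 2.58 (a standard normal value not stated on this slide):

$$\hat{p} = 366/598 \approx 0.612, \quad se = \sqrt{\frac{(0.612)(0.388)}{598}} \approx 0.0199$$
$$\hat{p} \pm 2.58(se) = 0.612 \pm 2.58(0.0199) \approx 0.612 \pm 0.051, \text{ or about } (0.56, 0.66)$$

Both counts (366 and 232) exceed 15, so the large-sample guideline is satisfied.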
What is the Error Probability for the
Confidence Interval Method?
The general formula for the confidence interval for a
population proportion is:
Sample proportion ± (z-score)(std. error)
which in symbols is $\hat{p} \pm z(se)$.
Table 8.2 z-Scores for the Most Common Confidence Levels. The large-sample
confidence interval for the population proportion is $\hat{p} \pm z(se)$.
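The body of Table 8.2 did not survive in this transcript, but its entries can be recomputed from the standard normal distribution. The following sketch uses Python's standard-library `statistics.NormalDist`; the helper name `z_for_confidence` and the choice of levels 0.90, 0.95, and 0.99 are assumptions made for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-score leaving (1 - level)/2 probability in each tail of the standard normal."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    error_probability = 1 - level
    print(f"confidence {level:.2f}: error probability {error_probability:.2f}, "
          f"z = {z_for_confidence(level):.3f}")
```

The error probability is simply 1 minus the confidence level, split equally between the two tails.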
Summary: Confidence Interval for a
Population Proportion, p
A confidence interval for a population proportion p is:
$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Assumptions
• Data obtained by randomization
• A large enough sample size n so that the number of
successes, $n\hat{p}$, and the number of failures, $n(1-\hat{p})$, are
both at least 15.
Effects of Confidence Level and
Sample Size on Margin of Error
The margin of error for a confidence interval:
• Increases as the confidence level increases
• Decreases as the sample size increases
For instance, a 99% confidence interval is wider than
a 95% confidence interval, and a confidence interval
with 200 observations is narrower than one with 100
observations at the same confidence level. These
properties apply to all confidence intervals, not just the
one for the population proportion.
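Both effects can be seen numerically. This is a small sketch for the proportion case, assuming a sample proportion of 0.5 purely for illustration; `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, level=0.95):
    """Margin of error z * sqrt(p_hat(1 - p_hat)/n) for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 200):
    for level in (0.95, 0.99):
        m = margin_of_error(0.5, n, level)
        print(f"n = {n}, confidence level = {level}: margin of error = {m:.3f}")
```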
Interpretation of the Confidence Level
So what does it mean to say that we have “95%
confidence”?
The meaning refers to a long-run interpretation—how
the method performs when used over and over with
many different random samples.
If we used the 95% confidence interval method over
time to estimate many population proportions, then in
the long run about 95% of those intervals would give
correct results, containing the population proportion.
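The long-run interpretation can be illustrated by simulation. The sketch below assumes a true population proportion of 0.3 and samples of size 500 (both arbitrary illustrative values), builds a 95% interval from each simulated sample, and counts how often the interval contains the true proportion.

```python
import math
import random

def coverage(p=0.3, n=500, reps=2000, z=1.96, seed=2):
    """Fraction of simulated 95% intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / reps

rate = coverage()
print(f"share of intervals containing p: {rate:.3f}")
```

The share should land close to 0.95. Any single interval either contains p or it does not; the 95% describes the method, not one interval.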
Section 8.3
Constructing a Confidence Interval to
Estimate a Population Mean
How to Construct a Confidence Interval
for a Population Mean
The confidence interval again has the form
Point estimate ± margin of error
The sample mean is the point estimate of the
population mean.
The exact standard error of the sample mean is $\sigma/\sqrt{n}$.
In practice, we estimate σ by the sample standard
deviation, s, so $se = s/\sqrt{n}$.
How to Construct a Confidence Interval
for a Population Mean
For large n from any population, and also
for small n from an underlying population that is normal,
the confidence interval for the population mean is:
$$\bar{x} \pm z\left(\frac{\sigma}{\sqrt{n}}\right)$$
How to Construct a Confidence
Interval for a Population Mean
In practice, we don’t know the population standard
deviation σ.
Substituting the sample standard deviation s for σ to get
$se = s/\sqrt{n}$ introduces extra error. To account for this
increased error, we must replace the z-score by a slightly larger
score, called a t-score. The confidence interval is then
a bit wider. The distribution that t-scores come from is called
the t distribution.
Summary: Properties of the t-Distribution
• The t-distribution is bell shaped and symmetric about 0.
• The probabilities depend on the degrees of freedom,
df = n − 1.
• The t-distribution has thicker tails than the standard
normal distribution, i.e., it is more spread out.
• A t-score multiplied by the standard error gives the
margin of error for a confidence interval for the mean.
t-Distribution
Figure 8.7 The t Distribution Relative to the Standard Normal Distribution. The t distribution
gets closer to the standard normal as the degrees of freedom (df) increase. The two are
practically identical when df ≥ 30. Question: Can you find z-scores (such as 1.96) for a
normal distribution on the t table (Table B)?
t-Distribution
Table 8.3 Part of Table B Displaying t-Scores. The scores have right-tail probabilities of
0.100, 0.050, 0.025, 0.010, 0.005, and 0.001. When n = 7, df = 6, and $t_{.025} = 2.447$ is the
t-score with right-tail probability = 0.025 and two-tail probability = 0.05. It is used in a 95%
confidence interval, $\bar{x} \pm 2.447(se)$.
t-Distribution
Figure 8.8 The t Distribution with df = 6. 95% of the distribution falls between -2.447
and 2.447. These t -scores are used with a 95% confidence interval when n = 7.
Question: Which t -scores with df = 6 contain the middle 99% of a t distribution (for a
99% confidence interval)?
Using the t Distribution to Construct a
Confidence Interval for a Mean
Summary: 95% Confidence Interval for a Population Mean
When the standard deviation of the population is unknown,
a 95% confidence interval for the population mean μ is:
$$\bar{x} \pm t_{.025}\left(\frac{s}{\sqrt{n}}\right); \quad df = n - 1$$
To use this method, you need:
• Data obtained by randomization
• An approximately normal population distribution
Example: Stock Market Activity on
Different Days of the Week
Companies often finance their daily operations by
selling shares of common stock.
One statistic monitored closely is the number of
shares, or trading volume, bought and sold
each day.
In this example, we compare the number of shares
traded of General Electric stock on Mondays and
Fridays during February through April of 2011.
Example: Stock Market Activity on
Different Days of the Week
Figure 8.9 MINITAB Dot Plot for General Electric Trading Volume. Data are shown for
11 Mondays and 12 Fridays during February, March, and April 2011. Question: What
does this plot tell you about how the trading volumes compare for these two days of
the week?
Example: Stock Market Activity on
Different Days of the Week
Let μ denote the population mean for Monday volume. The
estimate of μ is the sample mean of 45, 43, 43, 66, 91, 53,
35, 45, 29, 64, and 56, which is $\bar{x} = 51.82$. The sample
standard deviation is s = 17.19. The standard error of the
sample mean is
$$se = s/\sqrt{n} = 17.19/\sqrt{11} = 5.18$$
Example: Stock Market Activity on
Different Days of the Week
Since n = 11, the degrees of freedom are df = n − 1 = 10.
For a 95% confidence interval, from Table 8.3 we use
$t_{.025} = 2.228$. The 95% confidence interval is
$$\bar{x} \pm t_{.025}(se) = 51.82 \pm 2.228(5.18)$$
which is 51.82 ± 11.541, or (40.27, 63.37).
With 95% confidence, the range of believable values for
the mean Monday volume is 40.27 million shares to 63.37
million shares.
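The Monday-volume calculation can be checked in code. This sketch uses only the Python standard library, which has no t distribution, so the t-score 2.228 is taken from the text as a constant rather than computed.

```python
import math
import statistics

monday_volume = [45, 43, 43, 66, 91, 53, 35, 45, 29, 64, 56]  # millions of shares
t_score = 2.228  # t_.025 for df = 10, as quoted in the text

n = len(monday_volume)
x_bar = statistics.mean(monday_volume)
s = statistics.stdev(monday_volume)   # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                 # standard error of the sample mean
margin = t_score * se
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, se = {se:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```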
Finding a t Confidence Interval for
Other Confidence Levels
The 95% confidence interval uses $t_{.025}$ since 95%
of the probability falls between $-t_{.025}$ and $t_{.025}$.
For 99% confidence, the error probability is 0.01,
with 0.005 in each tail, and the appropriate t-score
is $t_{.005}$.
To get other confidence intervals, use the
appropriate t-value from Table B.
How Do We Find a t- Confidence
Interval for Other Confidence Levels?
Table 8.5 Part of Table B Displaying t-Scores for Large df Values. The z-score of 1.96
is the t-score $t_{.025}$ with right-tail probability of 0.025 and df = ∞.
If the Population is Not Normal, is the
Method “Robust”?
A basic assumption of the confidence interval using
the t-distribution is that the population distribution is
normal.
Many variables have distributions that are far from
normal.
We say the t confidence interval is a robust method with
respect to the normality assumption: it performs adequately
even when that assumption is violated.
If the Population is Not Normal, is the
Method “Robust”?
How problematic is it if we use the t- confidence
interval even if the population distribution is not normal?
For large random samples, it’s not problematic
because of the Central Limit Theorem.
What if n is small?
Confidence intervals using t-scores usually work quite
well except for when extreme outliers are present.
The Standard Normal Distribution is the
t-Distribution with df = ∞
Table 8.5 Part of Table B Displaying t-Scores for Large df Values. The z-score of
1.96 is the t-score $t_{.025}$ with right-tail probability of 0.025 and df = ∞.
Section 8.4
Choosing the Sample Size for a Study
Sample Size for Estimating a Population
Proportion
To determine the sample size:
1. First, we must decide on the desired margin of error.
2. Second, we must choose the confidence level for
achieving that margin of error.
In practice, 95% confidence intervals are most common.
Summary: Sample Size for Estimating
a Population Proportion
The random sample size n for which a confidence
interval for a population proportion p has margin of
error m (such as m = 0.04) is
$$n = \frac{\hat{p}(1-\hat{p})z^2}{m^2}$$
The z-score is based on the confidence level, such
as z = 1.96 for 95% confidence.
You either guess the value you’d get for the sample
proportion p̂ based on other information or take the
safe approach of setting p̂ = 0.5.
Example: Sample Size For Exit Poll
An election is expected to be close. Pollsters
planning an exit poll decide that a margin of error of
0.04 is too large.
How large should the sample size be for the margin
of error of a 95% confidence interval to equal 0.02?
Example: Sample Size For Exit Poll
The z-score is based on the confidence level, such as
z = 1.96 for 95% confidence.
The 95% confidence interval for a population proportion p
is: $\hat{p} \pm 1.96(se)$.
The required sample size is:
$$n = \frac{\hat{p}(1-\hat{p})z^2}{m^2} = \frac{(0.50)(0.50)(1.96)^2}{(0.02)^2} = 2401$$
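The exit-poll calculation can be written as a short function. This is a sketch of the sample-size formula for a proportion with the safe guess p̂ = 0.5 as the default; `sample_size_for_proportion` is a hypothetical helper name, and rounding up to a whole respondent is a choice made here.

```python
import math

def sample_size_for_proportion(m, z=1.96, p_guess=0.5):
    """n = p(1 - p) z^2 / m^2, rounded up to a whole number of respondents."""
    raw = p_guess * (1 - p_guess) * z ** 2 / m ** 2
    # Subtract a tiny tolerance so floating-point noise does not push
    # an exact answer such as 2401 up to 2402.
    return math.ceil(raw - 1e-9)

n_exit_poll = sample_size_for_proportion(m=0.02)   # margin of error wanted in the example
n_wider = sample_size_for_proportion(m=0.04)       # the margin judged too large
print(n_exit_poll, n_wider)
```

Comparing the two results shows that halving the margin of error requires roughly four times as many respondents, because m is squared in the denominator.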
Choosing the Sample Size for
Estimating a Population Mean
The random sample size n for which a confidence
interval for a population mean has margin of error
approximately equal to m is
$$n = \frac{\sigma^2 z^2}{m^2}$$
where the z-score is based on the confidence level,
such as z = 1.96 for 95% confidence.
Choosing the Sample Size for
Estimating a Population Mean
In practice, you don’t know the value of the standard
deviation, σ.
You must substitute an educated guess for σ.
Sometimes you can use the sample standard
deviation from a similar study.
When no prior information is known, a crude estimate
is the estimated range of the data divided by 6, since for
a bell-shaped distribution we expect almost all of the
data to fall within 3 standard deviations of the mean.
Example: Estimating Mean Education in
South Africa
A social scientist plans a study of adult South
Africans to investigate educational attainment in the
black community.
How large a sample size is needed so that a 95%
confidence interval for the mean number of years of
education has margin of error equal to 1 year?
Assume that the education values will fall within a
range of 0 to 18 years.
Dividing the range by 6 gives the crude guess σ ≈ 18/6 = 3, so
$$n = \frac{\sigma^2 z^2}{m^2} = \frac{3^2(1.96)^2}{1^2} \approx 35$$
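The same steps can be sketched in code, using the range/6 guess for the standard deviation. The helper name `sample_size_for_mean` is hypothetical, and rounding up to a whole subject is a choice made here.

```python
import math

def sample_size_for_mean(m, sigma_guess, z=1.96):
    """n = sigma^2 z^2 / m^2, rounded up to a whole number of subjects."""
    raw = (sigma_guess ** 2) * (z ** 2) / (m ** 2)
    # Small tolerance guards against floating-point noise on exact answers.
    return math.ceil(raw - 1e-9)

data_range = 18 - 0            # education assumed to fall between 0 and 18 years
sigma_guess = data_range / 6   # crude guess: range divided by 6
n_needed = sample_size_for_mean(m=1, sigma_guess=sigma_guess)
print(sigma_guess, n_needed)
```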
Other Factors That Affect the Choice of
the Sample Size
• The first is the desired precision, as measured by the
margin of error, m.
• The second is the confidence level.
• The third factor is the variability in the data.
• The fourth factor is cost.
What if You Have to Use a Small n?
The t methods for a mean are valid for any n.
However, you need to be extra cautious to look for
extreme outliers or great departures from the normal
population assumption.
In the case of the confidence interval for a
population proportion, the method works poorly for
small samples because the normal approximation from
the CLT is then unreliable.
Confidence Interval for a Proportion with
Small Samples
If a random sample does not have at least 15
successes and 15 failures, the confidence interval
formula
$$\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$$
is still valid if we use it after adding 2 to the original
number of successes and 2 to the original number of
failures.
This results in adding 4 to the sample size n.
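A minimal sketch of this adjustment follows. The data (3 successes in 20 trials) are hypothetical, chosen only because they violate the 15-successes guideline, and the function name `plus_four_ci` is an assumption.

```python
import math

def plus_four_ci(successes, n, z=1.96):
    """Add 2 successes and 2 failures, then apply p_hat +/- z * se."""
    adj_successes = successes + 2
    adj_n = n + 4
    p_adj = adj_successes / adj_n
    se = math.sqrt(p_adj * (1 - p_adj) / adj_n)
    # Clipping to [0, 1] is a presentational choice made here, not part of the text.
    lower = max(0.0, p_adj - z * se)
    upper = min(1.0, p_adj + z * se)
    return p_adj, (lower, upper)

# Hypothetical data: 3 successes in a sample of 20 (fewer than 15 successes).
p_adj, (lower, upper) = plus_four_ci(3, 20)
print(f"adjusted p_hat = {p_adj:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```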
Section 8.5
Using Computers to Make New Estimation
Methods Possible
The Bootstrap: Using Simulation to
Construct a Confidence Interval
When it is difficult to derive a standard error or a
confidence interval formula that works well, you can use
simulation.
The bootstrap is a simulation method that resamples
from the observed data. It treats the data distribution as
if it were the population distribution.
The Bootstrap: Using Simulation to
Construct a Confidence Interval
To use the bootstrap method:
• Resample, with replacement, n observations from the
data distribution.
• For the new sample of size n, construct the point
estimate of the parameter of interest.
• Repeat the process a very large number of times (e.g.,
selecting 10,000 separate samples of size n and calculating
the 10,000 corresponding parameter estimates).
Example: How Variable Are Your Weight
Readings on a Scale?
To investigate how much weight readings tend to
vary, suppose you weigh yourself ten times, taking a 30-second
break after each trial to allow the scale to reset.
Suppose the readings (in pounds) are:
160.2, 160.8, 161.4, 162.0, 160.8,
162.0, 162.0, 161.8, 161.6, 161.8.
These data have a mean of 161.44 and a standard
deviation of 0.63.
Use the bootstrap method to find a 95% confidence
interval for the population standard deviation.
The Bootstrap: Using Simulation to
Construct a Confidence Interval
Re-sample with replacement from this sample of size 10
and compute the standard deviation of the new sample.
Figure 8.11 A Bootstrap Frequency Distribution of Standard Deviation Values. These
were obtained by taking 100,000 samples of size 10 each from the sample data
distribution. Question: What is the practical reason for using the bootstrap method?
The Bootstrap: Using Simulation to
Construct a Confidence Interval
Now, identify the middle 95% of these 100,000
sample standard deviations (take the 2.5th and 97.5th
percentiles).
For this example, these percentiles are 0.26 and
0.80.
This is our 95% bootstrap confidence interval for the
population standard deviation.
In Summary: A typical deviation of a weight reading
from the mean might be rather large, nearly a pound.
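The bootstrap procedure for this example can be sketched as follows. This version uses 10,000 resamples rather than 100,000 to keep the run short, and a fixed seed for reproducibility; both are assumptions made here, as is the helper name `bootstrap_sd_interval`. Exact endpoints depend on the random resamples, so they will only approximately match the percentiles quoted above.

```python
import random
import statistics

weights = [160.2, 160.8, 161.4, 162.0, 160.8,
           162.0, 162.0, 161.8, 161.6, 161.8]

def bootstrap_sd_interval(data, reps=10_000, level=0.95, seed=3):
    """Percentile bootstrap interval for the standard deviation."""
    rng = random.Random(seed)
    n = len(data)
    sds = []
    for _ in range(reps):
        resample = rng.choices(data, k=n)       # n draws with replacement
        sds.append(statistics.stdev(resample))  # point estimate for this resample
    sds.sort()
    tail = (1 - level) / 2
    lower = sds[int(tail * reps)]               # about the 2.5th percentile
    upper = sds[int((1 - tail) * reps) - 1]     # about the 97.5th percentile
    return lower, upper

lower, upper = bootstrap_sd_interval(weights)
print(f"95% bootstrap interval for the standard deviation: ({lower:.2f}, {upper:.2f})")
```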