Download ch_07 - ROHAN Academic Computing

Chapter 7
Statistical Inference:
Confidence Intervals

Learn ….
How to Estimate a Population
Parameter Using Sample Data
Agresti/Franklin Statistics, 1 of 87
 Section 7.1
What Are Point and Interval
Estimates of Population
Parameters?
Point Estimate

A point estimate is a single
number that is our “best guess” for
the parameter
Interval Estimate

An interval estimate is an interval
of numbers within which the
parameter value is believed to fall.
Point Estimate vs Interval
Estimate


A point estimate doesn’t tell us how
close the estimate is likely to be to
the parameter
An interval estimate is more useful
• It incorporates a margin of error which
helps us to gauge the accuracy of the
point estimate
Point Estimation: How Do We Make
a Best Guess for a Population
Parameter?

Use an appropriate sample statistic:
• For the population mean, use the sample mean
• For the population proportion, use the sample proportion

Point estimates are the most common
form of inference reported by the
mass media
Properties of Point Estimators

Property 1: A good estimator has a
sampling distribution that is centered at
the parameter
• An estimator with this property is
unbiased
• The sample mean is an unbiased estimator
of the population mean
• The sample proportion is an unbiased
estimator of the population proportion

Property 2: A good estimator has a
small standard error compared to
other estimators
• This means it tends to fall closer than
other estimates to the parameter
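
The two properties above can be checked by simulation. The following is a minimal sketch, not part of the original slides: the population proportion 0.30, the sample size 200, and the function name `simulate_sample_proportions` are assumptions chosen only for illustration. It draws many random samples and looks at where the sample proportions are centered and how spread out they are.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` random samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Hypothetical population proportion and sample size (illustration only).
p_true, n = 0.30, 200
props = simulate_sample_proportions(p_true, n, reps=5000)

center = statistics.mean(props)   # close to p_true if the estimator is unbiased
spread = statistics.stdev(props)  # empirical standard error of the estimator
theory = (p_true * (1 - p_true) / n) ** 0.5  # standard formula sqrt(p(1-p)/n)

print(f"mean of sample proportions: {center:.4f} (parameter {p_true})")
print(f"empirical standard error:   {spread:.4f} (formula {theory:.4f})")
```

The mean of the simulated sample proportions should land very near the parameter, and their spread should agree closely with the standard error formula.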
Interval Estimation: Constructing an
Interval that Contains the Parameter
(We Hope!)

Inference about a parameter should
provide not only a point estimate but
should also indicate its likely
precision
Confidence Interval


A confidence interval is an interval
containing the most believable values
for a parameter
The probability that this method
produces an interval that contains the
parameter is called the confidence
level
• This is a number chosen to be close to 1, most commonly 0.95
What is the Logic Behind
Constructing a Confidence Interval?

To construct a confidence interval for
a population proportion, start with the
sampling distribution of a sample
proportion
The Sampling Distribution of the
Sample Proportion




Gives the possible values for the sample
proportion and their probabilities
Is approximately a normal distribution for
large random samples
Has a mean equal to the population
proportion
Has a standard deviation called the
standard error
A 95% Confidence Interval for a
Population Proportion

Fact: Approximately 95% of a normal
distribution falls within 1.96 standard
deviations of the mean
• That means: with probability 0.95, the sample proportion falls within about 1.96 standard errors of the population proportion
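
The 1.96 figure can be verified directly. This short sketch (an added illustration, using Python's standard-library `statistics.NormalDist`) computes the probability that a standard normal variable falls within 1.96 standard deviations of its mean.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling between -1.96 and +1.96 standard deviations.
within_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"P(-1.96 < Z < 1.96) = {within_196:.4f}")  # prints 0.9500
```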
Margin of Error


The margin of error measures how
accurate the point estimate is likely to
be in estimating a parameter
The distance of 1.96 standard errors
is the margin of error for a 95%
confidence interval
Confidence Interval


A confidence interval is constructed
by adding and subtracting a margin of
error from a given point estimate
When the sampling distribution is
approximately normal, a 95%
confidence interval has margin of
error equal to 1.96 standard errors
 Section 7.2
How Can We Construct a
Confidence Interval to Estimate a
Population Proportion?
Finding the 95% Confidence Interval
for a Population Proportion



We symbolize a population proportion by p
The point estimate of the population
proportion is the sample proportion
We symbolize the sample proportion by p̂


A 95% confidence interval uses a margin of
error = 1.96(standard errors)
[point estimate ± margin of error] =
\(\hat{p} \pm 1.96\)(standard errors)

The exact standard error of a sample proportion
equals:
\(\sqrt{p(1 - p)/n}\)


This formula depends on the unknown population
proportion, p
In practice, we don’t know p, and we need to
estimate the standard error

In practice, we use an estimated standard
error:
\(se = \sqrt{\hat{p}(1 - \hat{p})/n}\)

A 95% confidence interval for a population
proportion p is:
\(\hat{p} \pm 1.96(se)\), with \(se = \sqrt{\hat{p}(1 - \hat{p})/n}\)
Example: Would You Pay Higher
Prices to Protect the Environment?

In 2000, the GSS asked: “Are you
willing to pay much higher prices in
order to protect the environment?”
• Of n = 1154 respondents, 518 were
willing to do so

Find and interpret a 95% confidence
interval for the population proportion
of adult Americans willing to do so at
the time of the survey
\(\hat{p} = 518/1154 = 0.45\)
\(se = \sqrt{(0.45)(0.55)/1154} = 0.015\)
\(\hat{p} \pm 1.96(se) = 0.45 \pm 1.96(0.015) = 0.45 \pm 0.03 = (0.42, 0.48)\)
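
The same calculation can be reproduced in a few lines of code. This is a sketch added for illustration; the function name `proportion_ci` is an assumption, while the counts (518 of 1154) come from the GSS example above.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                     # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    margin = z * se                           # margin of error
    return p_hat, se, (p_hat - margin, p_hat + margin)


# GSS example: 518 of 1154 respondents were willing to pay higher prices.
p_hat, se, (low, high) = proportion_ci(518, 1154)
print(f"p-hat = {p_hat:.2f}, se = {se:.3f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Carrying full precision instead of rounding at each step gives the same interval to two decimals, (0.42, 0.48).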
Sample Size Needed for Large-Sample
Confidence Interval for a Proportion

For the 95% confidence interval for a
proportion p to be valid, you should have at
least 15 successes and 15 failures:
\(n\hat{p} \ge 15\) and \(n(1 - \hat{p}) \ge 15\)
“95% Confidence”


With probability 0.95, a sample
proportion value occurs such that the
confidence interval contains the
population proportion, p
With probability 0.05, the method
produces a confidence interval that
misses p
How Can We Use Confidence
Levels Other than 95%?



In practice, the confidence level 0.95
is the most common choice
But, some applications require
greater confidence
To increase the chance of a correct
inference, we use a larger confidence
level, such as 0.99
A 99% Confidence Interval for p
\(\hat{p} \pm 2.58(se)\)
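
The multipliers 1.96 and 2.58 are z-scores that leave half of the error probability in each tail of the standard normal distribution. A small sketch of how they can be looked up, assuming Python's standard-library `statistics.NormalDist` (the helper name `z_for_confidence` is hypothetical):

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z-score that leaves (1 - level) / 2 probability in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.2f}")
```

For 95% confidence this gives 1.96, and for 99% confidence about 2.58, matching the intervals above.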
Different Confidence Levels

In using confidence intervals, we
must compromise between the
desired margin of error and the
desired confidence of a correct
inference
• As the desired confidence level
increases, the margin of error gets
larger
What is the Error Probability for
the Confidence Interval Method?

The general formula for the confidence
interval for a population proportion is:
Sample proportion ± (z-score)(std. error)
which in symbols is
\(\hat{p} \pm z(se)\)
Summary: Confidence Interval
for a Population Proportion, p

A confidence interval for a population
proportion p is:
\(\hat{p} \pm z\sqrt{\hat{p}(1 - \hat{p})/n}\)
Summary: Effects of Confidence
Level and Sample Size on Margin of
Error

The margin of error for a confidence
interval:
• Increases as the confidence level
increases
• Decreases as the sample size
increases
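
Both effects can be seen numerically. The sketch below is an added illustration: it reuses the sample proportion 0.45 and sample size 1154 from the GSS example, while the function name `margin_of_error` and the alternative sample sizes are assumptions.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, level=0.95):
    """Margin of error z * se for a proportion at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.45  # sample proportion from the GSS example

# Effect of the confidence level at a fixed sample size.
for level in (0.90, 0.95, 0.99):
    print(f"n = 1154, level {level:.0%}: m = {margin_of_error(p_hat, 1154, level):.4f}")

# Effect of the sample size at a fixed 95% confidence level.
for n in (1154, 4 * 1154, 16 * 1154):
    print(f"n = {n}, level 95%: m = {margin_of_error(p_hat, n):.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.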
What Does It Mean to Say that
We Have “95% Confidence”?

If we used the 95% confidence
interval method to estimate many
population proportions, then in the
long run about 95% of those intervals
would give correct results, containing
the population proportion
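
This long-run interpretation can be illustrated by simulation. The following is a sketch under stated assumptions, not part of the original slides: the true proportion 0.45, the sample size 500, and the number of repetitions are arbitrary choices, and `coverage` is a hypothetical helper name.

```python
import math
import random


def coverage(p, n, reps, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps


# Assumed population proportion and sample size (illustration only).
rate = coverage(p=0.45, n=500, reps=2000)
print(f"share of intervals containing p: {rate:.3f}")
```

The share should come out close to 0.95. Any single interval either contains p or misses it; the 95% describes the method over many repetitions.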
A recent survey asked: “During the
last year, did anyone take something
from you by force?”

Of 987 subjects, 17 answered “yes”
Find the point estimate of the proportion
of the population who were victims
a. .17
b. .017
c. .0017
 Section 7.3
How Can We Construct a
Confidence Interval To Estimate a
Population Mean?
How to Construct a Confidence
Interval for a Population Mean




Point estimate ± margin of error
The sample mean is the point
estimate of the population mean
The exact standard error of the
sample mean is \(\sigma/\sqrt{n}\)
In practice, we estimate σ by the
sample standard deviation, s



For large n…
• and also
For small n from an underlying population
that is normal…
The confidence interval for the population
mean is:
\(\bar{x} \pm z(\sigma/\sqrt{n})\)



In practice, we don’t know the
population standard deviation
Substituting the sample standard
deviation s for σ to get \(se = s/\sqrt{n}\)
introduces extra error
To account for this increased error,
we replace the z-score by a slightly
larger score, the t-score


In practice, we estimate the standard
error of the sample mean by \(se = s/\sqrt{n}\)
Then, we multiply se by a t-score from
the t-distribution to get the margin of
error for a confidence interval for the
population mean
Properties of the t-distribution



The t-distribution is bell shaped and
symmetric about 0
The probabilities depend on the
degrees of freedom, df
The t-distribution has thicker tails and
is more spread out than the standard
normal distribution
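
The thicker tails can be seen by comparing densities far from 0. This sketch is an added illustration that codes the textbook formula for the t density using only the standard library; the helper name `t_pdf` and the evaluation point x = 3 are assumptions.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


normal_density = NormalDist().pdf(3.0)
print(f"standard normal density at 3: {normal_density:.5f}")
for df in (2, 6, 30, 1000):
    print(f"t density at 3, df = {df:4d}: {t_pdf(3.0, df):.5f}")
```

For small df the t density at 3 is several times the normal density, and as df grows it approaches the standard normal value.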
t-Distribution
Summary: 95% Confidence
Interval for a Population Mean

A 95% confidence interval for the
population mean µ is:
\(\bar{x} \pm t_{.025}(s/\sqrt{n})\); df = n - 1

To use this method, you need:
• Data obtained by randomization
• An approximately normal population distribution
Example: eBay Auctions of
Palm Handheld Computers

Do you tend to get a higher, or a
lower, price if you give bidders the
“buy-it-now” option?


Consider some data from sales of the
Palm M515 PDA (personal digital
assistant)
During the first week of May 2003, 25
of these handheld computers were
auctioned off, 7 of which had the
“buy-it-now” option

“Buy-it-now” option:
235 225 225 240 250 250 210

Bidding only:
250 249 255 200 199 240 228
255 232 246 210 178 246 240
245 225 246 225

Summary of selling prices for the two
types of auctions:
buy_now   N    Mean     StDev   Minimum   Q1       Median   Q3       Maximum
no        18   231.61   21.94   178.00    221.25   240.00   246.75   255.00
yes       7    233.57   14.64   210.00    225.00   235.00   250.00   250.00

To construct a confidence interval
using the t-distribution, we must
assume a random sample from an
approximately normal population of
selling prices



Let µ denote the population mean for
the “buy-it-now” option
The estimate of µ is the sample mean:
\(\bar{x}\) = $233.57
The sample standard deviation is:
s = $14.64

The 95% confidence interval for the “buy-it-now” option (df = 6) is:
\(\bar{x} \pm t_{.025}(s/\sqrt{n}) = 233.57 \pm 2.447(14.64/\sqrt{7})\)

which is 233.57 ± 13.54 or (220.03, 247.11)
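
The interval can be recomputed from the raw selling prices listed earlier. This is a sketch for illustration: the function name `t_confidence_interval` is hypothetical, and the t-scores are hard-coded from a standard t-table (2.447 for df = 6, 2.110 for df = 17) because the Python standard library has no t quantile function.

```python
import math
import statistics


def t_confidence_interval(data, t_score):
    """Mean, standard deviation, and t interval for the population mean."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)            # sample standard deviation
    margin = t_score * s / math.sqrt(n)   # t-score times estimated se
    return mean, s, (mean - margin, mean + margin)


buy_now = [235, 225, 225, 240, 250, 250, 210]
bidding = [250, 249, 255, 200, 199, 240, 228, 255, 232,
           246, 210, 178, 246, 240, 245, 225, 246, 225]

# t.025 values taken from a t-table (assumption): df = 6 and df = 17.
for label, prices, t_score in (("buy-it-now", buy_now, 2.447),
                               ("bidding only", bidding, 2.110)):
    mean, s, (low, high) = t_confidence_interval(prices, t_score)
    print(f"{label}: mean = {mean:.2f}, s = {s:.2f}, "
          f"95% CI = ({low:.2f}, {high:.2f})")
```

The same helper applies to the bidding-only prices once the t-score for df = 17 is supplied.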

The 95% confidence interval for the
mean sales price for the bidding only
option is:
(220.70, 242.52)

Notice that the two intervals overlap
a great deal:
• “Buy-it-now”: (220.03, 247.11)
• Bidding only: (220.70, 242.52)

There is not enough information for us to
conclude that one probability distribution
clearly has a higher mean than the other
How Do We Find a t- Confidence
Interval for Other Confidence
Levels?

The 95% confidence interval uses t.025
since 95% of the probability falls
between - t.025 and t.025

For 99% confidence, the error
probability is 0.01 with 0.005 in each
tail and the appropriate t-score is t.005
If the Population is Not Normal,
is the Method “Robust”?


A basic assumption of the confidence
interval using the t-distribution is that
the population distribution is normal
Many variables have distributions that
are far from normal

How problematic is it if we use the t-confidence interval even if the
population distribution is not normal?


For large random samples, it’s not
problematic
The Central Limit Theorem applies:
for large n, the sampling distribution
is bell-shaped even when the
population is not



What about a confidence interval using the
t-distribution when n is small?
Even if the population distribution is not
normal, confidence intervals using t-scores
usually work quite well
We say the t-distribution is a robust method
in terms of the normality assumption
Cases Where the t- Confidence
Interval Does Not Work

With binary data

With data that contain extreme
outliers
The Standard Normal Distribution is
the t-Distribution with df = ∞
The 2002 GSS asked: “What do you
think is the ideal number of children in
a family?”

The 497 females who responded had a median
of 2, mean of 3.02, and standard deviation of
1.81. What is the point estimate of the
population mean?
a. 497
b. 2
c. 3.02
d. 1.81
 Section 7.4
How Do We Choose the Sample
Size for a Study?
How are the Sample Sizes
Determined in Polls?

It depends on how much precision is
needed as measured by the margin of
error

The smaller the margin of error, the
larger the sample size must be
Choosing the Sample Size for
Estimating a Population Proportion?



First, we must decide on the desired
margin of error
Second, we must choose the
confidence level for achieving that
margin of error
In practice, 95% confidence intervals
are most common
Example: What Sample Size Do
You Need For An Exit Poll?

A television network plans to predict
the outcome of an election between
two candidates – Levin and Sanchez

They will do this with an exit poll that
randomly samples votes on election
day

The final poll a week before election
day estimated Levin to be well ahead,
58% to 42%
• So the outcome is not expected to be
close

The researchers decide to use a
sample size for which the margin of
error is 0.04

What is the sample size for which a
95% confidence interval for the
population proportion has margin of
error equal to 0.04?

The 95% confidence interval for a
population proportion p is:
\(\hat{p} \pm 1.96(se)\)

If the sample size is such that 1.96(se) =
0.04, then the margin of error will be
0.04

Find the value of the sample size n for
which 0.04 = 1.96(se):
\(0.04 = 1.96\sqrt{\hat{p}(1 - \hat{p})/n}\)
solve algebraically for n:
\(n = (1.96)^2 \hat{p}(1 - \hat{p})/(0.04)^2\)

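Substituting the pre-election estimate \(\hat{p} = 0.58\) gives
\(n = (1.96)^2(0.58)(0.42)/(0.04)^2 \approx 584.9\)
A random sample of size n = 585
should give a margin of error of about
0.04 for a 95% confidence interval for
the population proportion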
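
The arithmetic behind n = 585 can be sketched as follows. This is an added illustration: it assumes the pre-election figure of 0.58 for Levin is used as the guess for the sample proportion, and the function name `sample_size_for_proportion` is hypothetical.

```python
import math


def sample_size_for_proportion(p_guess, margin, z=1.96):
    """Smallest whole n whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Exit-poll example: guess 0.58 from the pre-election poll, margin 0.04.
n_exit_poll = sample_size_for_proportion(p_guess=0.58, margin=0.04)
print(n_exit_poll)  # 585
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.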
How Can We Select a Sample Size
Without Guessing a Value for the
Sample Proportion


In the formula for determining n, setting
p̂ = 0.50 gives the largest value for n
out of all the possible values to
substitute for p̂
Doing this is the “safe” approach that
guarantees we’ll have enough data
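
A quick scan over possible guesses shows why 0.50 is the safe choice. This sketch is an added illustration using the exit-poll settings (margin 0.04, z = 1.96); the helper name `raw_sample_size` and the grid of guesses are assumptions.

```python
import math


def raw_sample_size(p_guess, margin=0.04, z=1.96):
    """Unrounded sample size z^2 * p(1 - p) / m^2 for a guessed proportion."""
    return z ** 2 * p_guess * (1 - p_guess) / margin ** 2


# Try every guess from 0.01 to 0.99 and find the one needing the most data.
guesses = [i / 100 for i in range(1, 100)]
worst_case_p = max(guesses, key=raw_sample_size)
safe_n = math.ceil(raw_sample_size(worst_case_p))

print(f"guess needing the largest n: {worst_case_p}")
print(f"safe sample size: {safe_n}")
print(f"sample size with guess 0.58: {math.ceil(raw_sample_size(0.58))}")
```

The product p(1 - p) peaks at 0.50, so the safe sample size (601 here) is never smaller than the size obtained from any other guess.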
Sample Size for Estimating a
Population Parameter

The random sample size n for which a
confidence interval for a population proportion p
has margin of error m (such as m = 0.04) is
\(n = \hat{p}(1 - \hat{p})z^2/m^2\)

The z-score is based on the confidence
level, such as z = 1.96 for 95%
confidence

You either guess the value you’d get for
the sample proportion based on other
information or take the safe approach of
setting p̂ = 0.50
Sample Size for Estimating a
Population Mean

The random sample size n for which a 95%
confidence interval for a population mean has
margin of error approximately equal to m is
\(n = 4s^2/m^2\)

To use this formula, you guess the value you’ll
get for the sample standard deviation, s


In practice, since you don’t yet have
the data, you don’t know the value of
the sample standard deviation, s
You must substitute an educated
guess for s
• You can use the sample standard
deviation from a similar study
Example: Finding n to Estimate
Mean Education in South Africa

A social scientist plans a study of
adult South Africans to investigate
educational attainment in the black
community

How large a sample size is needed so
that a 95% confidence interval for the
mean number of years of education
has margin of error equal to 1 year?

No prior information about the
standard deviation of educational
attainment is available

We might guess that the sample
education values fall within a range of
about 18 years




If the data distribution is bell-shaped,
the range from \(\bar{x} - 3s\) to \(\bar{x} + 3s\) will
contain nearly all the distribution
The distance from \(\bar{x} - 3s\) to \(\bar{x} + 3s\) equals
6s
Solving 18 = 6s for s yields s = 3
So ‘3’ is a crude estimate of s

The desired margin of error is m = 1 year

The required sample size is:
\(n = 4s^2/m^2 = 4(3)^2/(1)^2 = 36\)
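
The two steps of this example, guessing s from the range and then applying the sample size formula, can be sketched in code. This is an added illustration with a hypothetical function name; the factor 4 in the formula is roughly the square of the 95% z-score (1.96 rounded to 2).

```python
import math


def sample_size_for_mean(s_guess, margin):
    """Approximate n for a 95% interval: n = 4 s^2 / m^2."""
    return math.ceil(4 * s_guess ** 2 / margin ** 2)


value_range = 18             # guessed range of education values, in years
s_guess = value_range / 6    # bell-shaped data span about 6 standard deviations
n_needed = sample_size_for_mean(s_guess, margin=1)
print(f"s guess = {s_guess}, required n = {n_needed}")
```

Because s enters the formula squared, a guess of s = 6 instead of 3 would quadruple the required sample size.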
What Factors Affect the Choice
of the Sample Size?


The first is the desired precision, as
measured by the margin of error, m
The second is the confidence level
What Other Factors Affect the
Choice of the Sample Size?

A third factor is the variability in the
data
• If subjects have little variation (that is, s
is small), we need fewer data than if
they have substantial variation

A fourth factor is financial
• Cost is often a major constraint
What if You Have to
Use a Small n?

The t- methods for a mean are valid
for any n
• However, you need to be extra cautious to
look for extreme outliers or great
departures from the normal population
assumption

In the case of the confidence interval
for a population proportion, the
method works poorly for small
samples
Constructing a Small-Sample
Confidence Interval for a Proportion


Suppose a random sample does not have at least 15
successes and 15 failures
The confidence interval formula:
\(\hat{p} \pm z\sqrt{\hat{p}(1 - \hat{p})/n}\)
● Is still valid if we use it after adding ‘2’ to the original
number of successes and ‘2’ to the original number of
failures

This results in adding ‘4’ to the sample size n
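
The adjustment described above can be sketched as follows. This is an added illustration: the function name `plus_four_ci` is hypothetical, and the data (3 successes in 20 trials) are made up to show a case with fewer than 15 successes.

```python
import math


def plus_four_ci(successes, n, z=1.96):
    """Interval computed after adding 2 successes and 2 failures."""
    adj_successes = successes + 2
    adj_n = n + 4                        # 2 extra successes + 2 extra failures
    p_adj = adj_successes / adj_n
    se = math.sqrt(p_adj * (1 - p_adj) / adj_n)
    return p_adj, (p_adj - z * se, p_adj + z * se)


# Hypothetical small sample: 3 successes in 20 trials.
p_adj, (low, high) = plus_four_ci(successes=3, n=20)
print(f"adjusted estimate = {p_adj:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The adjusted estimate 5/24 is pulled slightly toward 0.5 compared with the raw proportion 3/20, which is what keeps the interval from collapsing when the raw count is very small.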