Download Topic 06

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Misuse of statistics wikipedia , lookup

German tank problem wikipedia , lookup

Transcript
Topic 6 - Confidence intervals based on
a single sample
• Sampling distribution of the sample mean pages 187 - 189
• Sampling distribution of the sample
variance - pages 189 - 190
• Confidence interval for a population mean pages 209 - 212
• Confidence interval for a population
variance - pages 268 - 270
• Confidence interval for a population
proportion - pages 278 - 281
Confidence Intervals
• To use the CLT in our examples, we had to
know the population mean, m, and the
population standard deviation, s.
• This is okay if we have a huge amount of
sample data to estimate these quantities
with X and s, respectively.
• In most cases, the primary goal of the
analysis of our sample data is to estimate
and to determine a range of likely values for
these population values.
Confidence interval for m with s known
• Suppose for a moment we know s but not m
• The CLT says that X is approximately normal
with a mean of m and a variance of s2/n.
• So, Z 
X m
s
will be standard normal.
n
• For the standard normal distribution, let za be
such that P(Z > za) = a
Confidence interval for m with s known
• Normal calculator
Confidence interval for m with s unknown
• For large samples (n ≥ 30), we can replace s with
s, so that X  za /2 s n is a (1-a)100%
confidence interval for m.
• For small samples (n < 30) from a normal
population,
X  ta /2,n 1 s
n
is a (1-a)100% confidence interval for m.
• The value ta,n-1 is the appropriate quantile from a
t distribution with n-1 degrees of freedom.
• t-distribution demonstration
• t-distribution calculator
Acid rain data
• The EPA states that any area where the average pH of rain
is less than 5.6 on average has an acid rain problem.
• pH values collected at Shenandoah National Park are
listed below.
• Calculate 95% and 99% confidence intervals for the
average pH of rain in the park.
Light bulb data
• The lifetimes in days of 10 light bulbs of a certain
variety are given below. Give a 95% confidence
interval for the expected lifetime of a light bulb of
this type. Do you trust the interval?
Interpreting confidence statements
• In making a confidence statement, we have
the desired level of confidence in the
procedure used to construct the interval.
• Confidence interval demonstration
Sample size effect
• As sample size increases, the width of our
confidence interval clearly decreases.
• If s is known, the width of our interval for m
is
L  2za /2
• Solving for n,
n  (2za /2
s
n
s
)2
L
• If s is known or we have a good estimate, we
can use this formula to decide the sample
size we need to obtain a certain interval
length.
Paint example
• Suppose we know from past data that the
standard deviation of the square footage
covered by a one gallon can of paint is
somewhere between 2 and 4 square feet.
How many one gallon cans do we need to
test so that the width of a 95% confidence
interval for the mean square footage covered
will be at most 1 square foot?
Confidence interval for a variance
• In many cases, especially in manufacturing,
understanding variability is very important.
• Often times, the goal is to reduce the variability
in a system.
• For samples from a normal population,
 (n  1)s 2 (n  1)s 2 
, 2
 2

 ca /2,n 1 c1a /2,n 1 
is a (1-a)100% confidence interval for s2.
• The value c2a/2,n-1 is the appropriate quantile from
a c2 distribution with n-1 degrees of freedom.
• c2 calculator
Acid rain data
• Calculate a 95% confidence interval for the variance of pH
of rain in the park.
Confidence interval for a proportion
• A binomial random variable, X, counts the
number of successes in n Bernoulli trials where
the probability of success on each trial is p.
• In sampling studies, we are often times interested
in the proportion of items in the population that
have a certain characteristic.
• We think of each sample, Xi, from the population
as a Bernoulli trial, the selected item either has
the characteristic (Xi = 1) or it does not (Xi = 0).
• The total number with the characteristic in the
sample is then
n
X   Xi
i 1
Confidence interval for a proportion
• The proportion in the sample with the
characteristic is then
n
pˆ  X /n   X i /n
i 1
• The CLT says that the distribution of this sample
proportion is then normally distributed for large
n with
– E(X/n) =
– Var(X/n) =
• For the CLT to work well here, we need
– X ≥ 5 and n-X ≥ 5
– n to be much smaller than the population size
Confidence interval for a proportion
• How can we use this to develop a (1-a)100%
confidence interval for p, the population
proportion with the characteristic?
ˆ  za /2 p
ˆ (1  p
ˆ )/n
p
Murder case example
• Find a 99% confidence interval for the
proportion of African Americans in the jury pool.
(22 out of 295 African American in sample)
• StatCrunch
Sample size considerations
• The width of the confidence interval is
ˆ (1  p
ˆ )/n
L  2za /2 p
• To get the sample size required for a specific
length, we have
ˆ (1  p
ˆ)
n  (2za /2 /L )2 p
• We might use prior information to estimate
the sample proportion or substitute the
most conservative value of ½.
Nurse employment case
• How many sample records should be
considered if we want the 95% confidence
interval for the proportion all her records
handled in a timely fashion to be 0.02 wide?
Prediction intervals
• Sometimes we are not interested in a
confidence interval for a population
parameter but rather we are interested in a
prediction interval for a new observation.
• For our light bulb example, we might want a
95% prediction interval for the time a new
light bulb will last.
• For our paint example, we might want a
95% prediction interval for the amount of
square footage a new can of paint will cover.
Prediction intervals
• Given a random sample of size n, our best
guess at a new observation, Xn+1, would be
the sample mean, X .
• Now consider the difference, X n 1  X .
• E (X n 1  X ) 
• Var (X n 1  X ) 
Prediction intervals
• If s is known and our population is normal,
then using a pivoting procedure gives
• If s is unknown and our population is
normal, a (1-a)100% prediction interval for a
new observation is
1
X  ta /2,n 1s 1 
n
Acid rain example
• For the acid rain data, the sample mean pH
was 4.577889 and the sample standard
deviation was 0.2892. What is a 95%
prediction interval for the pH of a new rainfall?
• Does this prediction interval apply for the light
bulb data?