Chapter 7
Estimates, Confidence
Intervals, and Sample Sizes
Inferences Based on a Single Sample
Estimation with Confidence Intervals
Learning Objectives
• State what is estimated
• Distinguish point and interval estimates
• Explain interval estimates
• Estimate a population mean when σ is known
• Estimate a population mean when σ is not known
• Estimate a population proportion
• Compute the required sample size
Statistical Methods
[Diagram: Statistical Methods divide into Descriptive Statistics and Inferential Statistics; Inferential Statistics divides into Estimation and Hypothesis Testing.]
Estimation Process
[Diagram: The population mean μ is unknown. A random sample is drawn from the population and gives the sample mean X̄ = 79, which leads to the statement "I am 95% confident that μ is between 75 and 83."]
Unknown Population Parameters
Are Estimated
Population parameter    Estimated with sample statistic
Mean μ                  X̄
Proportion p            p̂
Variance σ²             s²
National Unemployment Rates (%), 2008 - 2015
Source: Bureau of Labor Statistics
Year   Jan.   Feb.   Mar.   April  May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.
2015   5.7    5.5    5.5    5.4    5.5    5.3    5.3    5.1    5.1
2014   6.6    6.7    6.7    6.3    6.3    6.1    6.2    6.1    5.9    5.8    5.8    5.6
2013   7.9    7.7    7.5    7.5    7.5    7.5    7.3    7.2    7.2    7.2    7.0    6.7
2012   8.3    8.3    8.2    8.1    8.2    8.2    8.2    8.1    7.8    7.9    7.8    7.8
2011   9.0    8.9    8.8    9.0    9.1    9.2    9.1    9.1    9.1    9.0    8.6    8.5
2010   9.7    9.7    9.7    9.9    9.7    9.5    9.5    9.6    9.6    9.6    9.8    9.4
2009   7.6    8.1    8.5    8.9    9.4    9.5    9.4    9.7    9.8    10.2   10.0   10.0
2008   4.9    4.8    5.1    5.0    5.5    5.6    5.8    6.2    6.2    6.6    6.8    7.2
Point Estimation
[Diagram: Estimation divides into Point Estimation and Interval Estimation; this part covers Point Estimation.]
A point estimate of a parameter (μ, p, or σ²) is the value of a statistic (X̄, p̂, or s²) used to estimate the parameter.
1. It provides a single value.
2. It is based on observations from one sample.
3. It gives no information about how close the value is to the unknown population parameter.
Example: The sample mean X̄ = 79 is a point estimate of the unknown population mean.
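As a rough illustration of the three point estimates named above, the sketch below computes X̄, s², and p̂ with Python's standard library. The data values are hypothetical and chosen only so that the sample mean comes out to 79; they are not from the slides.

```python
from statistics import mean, variance

# Hypothetical measurements (illustrative only, not from the slides).
sample = [78.0, 81.0, 77.0, 80.0, 79.0]
# Hypothetical 0/1 indicators: 1 means the member has the attribute.
has_attribute = [1, 0, 0, 1, 1, 0, 1, 1]

x_bar = mean(sample)                              # point estimate of the mean
s_squared = variance(sample)                      # sample variance, n - 1 divisor
p_hat = sum(has_attribute) / len(has_attribute)   # point estimate of p

print(f"x_bar = {x_bar}, s^2 = {s_squared}, p_hat = {p_hat}")
```

Each result is a single number and, on its own, says nothing about how far it may be from the parameter it estimates.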
Interval Estimation
[Diagram: Estimation divides into Point Estimation and Interval Estimation; this part covers Interval Estimation.]
• The sample mean rarely equals the population mean; that is, sampling error is to be expected.
• Therefore, in addition to the point estimate for μ, we need to provide some information that indicates the accuracy of the point estimate.
• We do so by giving a confidence-interval estimate for μ.
Interval Estimation
• A confidence interval or interval estimate is a range of values used to estimate the true value of a population parameter.
• The interval is obtained from a point estimate of the parameter and a percentage that specifies the probability that the interval actually does contain the population parameter.
• This percentage is called the confidence level of the interval.
Confidence Level (CL)
• The confidence level is the probability (when given in decimal form) that the confidence interval contains the unknown population parameter. The CL is denoted (1 - α).
• That is, α is the probability that the parameter is not within the confidence interval.
• Typical values of CL are 99%, 95%, and 90%. The corresponding values of α are 0.01, 0.05, and 0.10.
Key Elements of the CI
• The center of the interval is the point estimate X̄.
• E is called the margin of error and is half the length of the confidence interval.
• E is determined by α, σ, and the sample size n.
• The interval contains the parameter μ 100(1 - α)% of the time:
$$P(\bar{X} - E \le \mu \le \bar{X} + E) = 1 - \alpha$$
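One way to read the probability statement above is as a long-run frequency: if many samples are drawn, about 100(1 - α)% of the resulting intervals should contain μ. The simulation below is a minimal sketch of that idea. It assumes a normal population with known σ and uses the margin E = z·σ/√n; the function name `coverage` and all parameter values are illustrative choices, not part of the slides.

```python
import random
from statistics import NormalDist

def coverage(mu, sigma, n, conf_level, trials, seed=0):
    """Fraction of simulated intervals X_bar +/- E that contain mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sigma / n ** 0.5                  # E for the sigma-known case
    hits = 0
    for _ in range(trials):
        draws = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(draws) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=79, sigma=10, n=25, conf_level=0.95, trials=2000)
print(f"Observed coverage: {rate:.3f}")
```

The observed rate should land near 0.95, with some run-to-run variation that shrinks as `trials` grows.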
Margin of Error and CI
Interval Estimation
1. Provides a range of values.
2. Is based on observations from one sample.
3. Gives information about how close the estimate is to the unknown population parameter.
4. Is stated in terms of probability.
5. Example: the unknown population mean lies between 66 and 70 with 95% confidence.
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Margin of Error or Interval Width
Factors Affecting Interval Width
1. As the data dispersion, measured by σ, increases, the margin of error E (and with it the interval width) increases.
2. As the sample size n increases, E decreases.
3. As the level of confidence 100(1 - α)% increases, E increases because the critical value Zα/2 increases.
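The three factors can be checked numerically. The sketch below assumes the σ-known margin of error E = Zα/2·σ/√n and obtains Zα/2 from `statistics.NormalDist`; the helper name `margin_of_error` and the baseline values (σ = 10, n = 25, 95%) are illustrative.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, conf_level):
    """E = z_{alpha/2} * sigma / sqrt(n) for the sigma-known case."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    return z * sigma / n ** 0.5

base = margin_of_error(sigma=10, n=25, conf_level=0.95)
more_spread = margin_of_error(sigma=20, n=25, conf_level=0.95)
bigger_sample = margin_of_error(sigma=10, n=100, conf_level=0.95)
more_confidence = margin_of_error(sigma=10, n=25, conf_level=0.99)

print(f"baseline E          = {base:.3f}")
print(f"sigma doubled       = {more_spread:.3f}")      # larger than baseline
print(f"n quadrupled        = {bigger_sample:.3f}")    # smaller than baseline
print(f"99% instead of 95%  = {more_confidence:.3f}")  # larger than baseline
```

Because n enters through √n, quadrupling the sample size only halves E.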
Confidence Interval Estimates
[Diagram: Confidence intervals are covered for the mean (σ known and σ unknown), the proportion, and the variance.]
CI for the Mean (σ known)
1. Assumptions
• The population standard deviation σ is known.
• The population is normally distributed, or the sampling distribution can be approximated by a normal distribution (n ≥ 30).
2. Confidence interval estimate
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
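Example 1
The mean of a random sample of n = 25 is X̄ = 50. Set up a 95% confidence interval estimate for μ if σ = 10.
$$\bar{X} - z_{0.05/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05/2}\frac{\sigma}{\sqrt{n}}$$
$$50 - 1.96\cdot\frac{10}{\sqrt{25}} \le \mu \le 50 + 1.96\cdot\frac{10}{\sqrt{25}}$$
$$46.08 \le \mu \le 53.92$$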
Example 2
You are a Quality Control inspector for Norton. The σ for 2-liter bottles is 0.05 liters. A random sample of 100 bottles showed X̄ = 1.99 liters. What is the 90% confidence interval estimate of the true mean amount in 2-liter bottles?
Solution
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$1.99 - 1.645\cdot\frac{0.05}{\sqrt{100}} \le \mu \le 1.99 + 1.645\cdot\frac{0.05}{\sqrt{100}}$$
$$1.982 \le \mu \le 1.998$$
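Both σ-known examples can be reproduced with a few lines of Python. This is a sketch under the stated assumptions (σ known, normal sampling distribution); the function name `z_interval` is illustrative, and the critical value comes from `statistics.NormalDist` instead of a printed z-table, so the last digits can differ slightly from hand calculations that use 1.96 or 1.645.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_level):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{alpha/2}
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

# Example 1: n = 25, sample mean 50, sigma = 10, 95% confidence.
low1, high1 = z_interval(x_bar=50, sigma=10, n=25, conf_level=0.95)
print(f"Example 1: {low1:.2f} <= mu <= {high1:.2f}")

# Example 2: n = 100, sample mean 1.99, sigma = 0.05, 90% confidence.
low2, high2 = z_interval(x_bar=1.99, sigma=0.05, n=100, conf_level=0.90)
print(f"Example 2: {low2:.3f} <= mu <= {high2:.3f}")
```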
CI for the Mean (σ unknown)
• If X is a normally distributed variable with mean μ and standard deviation σ, then, for samples of size n, the variable X̄ is also normally distributed and has mean μ and standard deviation σ/√n.
• Equivalently, the standardized version of X̄,
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution.
CI for the Mean (σ unknown)
• In practice, σ is unknown; therefore we cannot base our CI procedure on the standardized version of X̄.
• The best we can do is estimate σ using the sample standard deviation s and replace σ with s in the equation
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
and base our CI procedure on the new variable,
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
t-Distributions and t-Curves
• The t-distribution depends on the sample size n through its degrees of freedom, df = n - 1, and for each value of df there is a different t-curve.
• Notice that to find a t-value you need to compute the sample mean and the sample standard deviation. This will usually require the use of a calculator.
Finding the t-Value Having a
Specified Area to Its Right
 For a t-curve with 13 degrees of freedom, find t0.05;
that is, find the t-value having area 0.05 to its right,
as shown in the figure.
Finding the t-Value Having a
Specified Area to Its Right
• To find the t-value in question, we use Table IV. For ease of reference, we have repeated a portion of Table IV in the next slide.
• Notice that the table provides the t-score only when you know the area to its right.
• Unlike the table for z-scores, the t-table cannot be used to find probabilities when you know what the t-score is. For this, you need to use the t-distribution in the calculator.
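Example 1
A random sample of n = 25 has X̄ = 50 and s = 8. Set up a 95% confidence interval estimate for μ.
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$50 - 2.0639\cdot\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\cdot\frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30$$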
Example 2
You are a time study analyst in manufacturing. You have recorded the following task times (in minutes): 3.6, 4.2, 4.0, 3.5, 3.8, 3.1. What is the 90% confidence interval estimate of the population mean task time?
Solution
X̄ = 3.7
s = 0.38987
n = 6, df = 6 - 1 = 5
s/√n = 0.38987/√6 = 0.1592
t0.05 = 2.015
$$3.7 - 2.015\cdot 0.1592 \le \mu \le 3.7 + 2.015\cdot 0.1592$$
$$3.379 \le \mu \le 4.021$$
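The task-time example can be checked from the raw data. Python's standard library has no t-distribution, so this sketch takes the critical value as an argument and uses the table value t0.05 = 2.015 (df = 5) quoted in the solution; the function name `t_interval` is illustrative.

```python
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2} for df = len(data) - 1.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                  # sample standard deviation, n - 1 divisor
    margin = t_crit * s / n ** 0.5
    return x_bar - margin, x_bar + margin

task_times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]     # minutes, from Example 2
low, high = t_interval(task_times, t_crit=2.015)
print(f"s = {stdev(task_times):.5f}")
print(f"{low:.3f} <= mu <= {high:.3f}")
```

Passing the critical value in keeps the sketch dependency-free; a statistics package with a t-distribution could supply it instead.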
Finding Sample Sizes
Estimating the Sample Size
Example 1
What sample size is needed to be 90% confident of being correct within ± 5? A pilot study suggested that the standard deviation is 45.
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.645\cdot 45}{5}\right)^2 = 219.2$$
Rounding up gives n = 220.
Example 2
You work in Human Resources at Merrill Lynch. You plan to survey employees to find their average medical expenses. You want to be 95% confident that the sample mean is within ± $50. A pilot study showed that σ was about $400. What sample size do you use?
Solution
$$n = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2 = \left(\frac{1.96\cdot 400}{50}\right)^2 = 245.86$$
Rounding up gives n = 246.
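The sample-size formula for a mean, n = (Zα/2·σ/E)², can be wrapped in a small helper that always rounds up. This is a sketch: the name `sample_size_for_mean` is illustrative, and the critical value is taken from `statistics.NormalDist`, so the unrounded values differ slightly from the hand calculations that use 1.645 and 1.96.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf_level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil((z * sigma / margin) ** 2)   # always round up

n1 = sample_size_for_mean(sigma=45, margin=5, conf_level=0.90)
n2 = sample_size_for_mean(sigma=400, margin=50, conf_level=0.95)
print(n1, n2)
```

Rounding up, not to the nearest integer, is what keeps the achieved margin of error at or below the target.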
Confidence Intervals for
One Population
Proportion
Proportion Notation and Terminology
• Many statistical studies are concerned with obtaining the proportion (percentage) of a population that has a specified attribute.
• For example, we might be interested in
  • the percentage of U.S. adults who have health insurance,
  • the percentage of cars in the U.S. that are imports,
  • the percentage of U.S. adults who favor stricter clean air health standards, or
  • the percentage of Canadian women in the labor force.
Proportion Notation and Terminology
• Notice that in the previous examples, a given individual in the population will have the specified attribute or not.
• This means that we are interested in an experiment that can have only two possible outcomes.
• For instance,
  • a U.S. adult does have health insurance or does not;
  • a car in the U.S. is either an import or is not;
  • and so on.
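Proportion Notation and Terminology
• We introduce some notation and terminology used when we make inferences about a population proportion: p denotes the population proportion and p̂ = x/n denotes the sample proportion, where x is the number of members in the sample that have the specified attribute and n is the sample size.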
Proportion Notation and Terminology
 Sometimes we refer to x (the number of members in
the sample that have the specified attribute) as the
number of successes and to n − x (the number of
members in the sample that do not have the specified
attribute) as the number of failures.
Proportion Notation and Terminology
• Notice that for a given sample of size n, the quotient x/n is the mean number of successes per trial in n trials.
• That is, p̂ is the mean of the variable X which is 1 when the member in the sample has the attribute and 0 when the member does not.
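The claim that a sample proportion is just a sample mean of 0/1 values can be seen directly. The indicator data below are hypothetical and serve only to illustrate the identity p̂ = x/n = mean of X.

```python
from statistics import mean

# Hypothetical sample: 1 = has the attribute (success), 0 = does not (failure).
indicators = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

n = len(indicators)
x = sum(indicators)            # number of successes
failures = n - x               # number of failures
p_hat = x / n                  # sample proportion
indicator_mean = mean(indicators)

print(f"x = {x}, n - x = {failures}, p_hat = {p_hat}, mean = {indicator_mean}")
```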
The Sampling Distribution of the
Sample Proportion
• To make inferences about a population proportion p we need to know the sampling distribution of the sample proportion, that is, the distribution of the variable p̂.
• Because a proportion can always be regarded as a mean, we can use our knowledge of the sampling distribution of the sample mean to derive the sampling distribution of the sample proportion.
• For large n, p̂ is approximately normally distributed with mean p and standard deviation √(p(1 - p)/n).
• The accuracy of the normal approximation depends on n and p. If p is close to 0.5, the approximation is quite accurate, even for moderate n. The farther p is from 0.5, the larger n must be for the approximation to be accurate.
• As a rule of thumb, we use the normal approximation when np and n(1 - p) are both 5 or greater. In this section, when we say that n is large, we mean that np and n(1 - p) are both 5 or greater.
• Since in practice we do not know the value of p, we replace the conditions np ≥ 5 and n(1 - p) ≥ 5 with the conditions np̂ ≥ 5 and n(1 - p̂) ≥ 5.
• This is the same as requiring that the number of successes x and the number of failures n - x are both 5 or greater.
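The rule of thumb reduces to a check on two counts. A minimal sketch, with an illustrative function name and the threshold of 5 taken from the text:

```python
def normal_approx_ok(x, n, threshold=5):
    """Rule of thumb: successes x and failures n - x are both >= threshold."""
    return x >= threshold and (n - x) >= threshold

# 32 successes out of 400 and 35 out of 200 both pass the check;
# a hypothetical 3 successes out of 400 does not.
print(normal_approx_ok(32, 400))
print(normal_approx_ok(35, 200))
print(normal_approx_ok(3, 400))
```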
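CI for the Proportion p
Margin of Error or Interval Width
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$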
Estimating the Sample Size
• The margin of error E and the confidence level 100(1 - α)% of a CI are often specified in advance. We must then determine the sample size required to meet those specifications.
• If we solve for n in the formula for E, we obtain
$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$
• This formula cannot be used to obtain the required sample size because the sample proportion, p̂, is not known prior to sampling.
Estimating the Sample Size
• The way around this problem is to observe that the largest p̂(1 - p̂) can be is 0.25, which occurs when p̂ = 0.5.
Estimating the Sample Size
• Because the largest possible value of p̂(1 - p̂) is 0.25, the most conservative approach for determining sample size is to use that value in the equation
$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$
which gives
$$n = 0.25\left(\frac{z_{\alpha/2}}{E}\right)^2$$
rounded up to the nearest whole number.
• The sample size obtained then will generally be larger than necessary and the margin of error less than required. Nonetheless, this approach guarantees that the specifications will be met or bettered.
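The conservative approach can be sketched as follows. The function name `sample_size_for_proportion` and the target (margin of error 0.03 at 95% confidence) are illustrative and not taken from the slides; `p_guess` defaults to 0.5, the worst case described above.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, conf_level, p_guess=0.5):
    """n = p(1 - p) * (z_{alpha/2} / E)^2, rounded up.

    p_guess = 0.5 maximizes p(1 - p) at 0.25, the conservative choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)

# Hypothetical target: margin of error 0.03 at 95% confidence.
n_conservative = sample_size_for_proportion(margin=0.03, conf_level=0.95)
# If a prior guess for p is available (hypothetically 0.08), fewer are needed.
n_with_guess = sample_size_for_proportion(0.03, 0.95, p_guess=0.08)
print(n_conservative, n_with_guess)
```

The gap between the two results shows how much larger than necessary the conservative sample can be when p is far from 0.5.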
Estimating the Sample Size
Example 1
A random sample of 400 graduates showed 32 went to grad school. Set up a 95% confidence interval estimate for the proportion p of students that go to grad school.
Here p̂ = 32/400 = 0.08.
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.08 - 1.96\sqrt{\frac{0.08\,(1-0.08)}{400}} \le p \le 0.08 + 1.96\sqrt{\frac{0.08\,(1-0.08)}{400}}$$
$$0.053 \le p \le 0.107$$
Example 2
You are a production manager for a newspaper. You want to find the % defective newspapers. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion of defective newspapers?
Solution
Here p̂ = 35/200 = 0.175.
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.175 - 1.645\sqrt{\frac{0.175\,(0.825)}{200}} \le p \le 0.175 + 1.645\sqrt{\frac{0.175\,(0.825)}{200}}$$
$$0.1308 \le p \le 0.2192$$
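Both proportion examples follow the same recipe, which the sketch below captures. The function name `proportion_interval` is illustrative, and the critical value comes from `statistics.NormalDist`, so the results agree with the hand calculations only up to rounding.

```python
from statistics import NormalDist

def proportion_interval(x, n, conf_level):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{alpha/2}
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Example 1: 32 of 400 graduates, 95% confidence.
low1, high1 = proportion_interval(32, 400, 0.95)
print(f"Example 1: {low1:.3f} <= p <= {high1:.3f}")

# Example 2: 35 of 200 newspapers, 90% confidence.
low2, high2 = proportion_interval(35, 200, 0.90)
print(f"Example 2: {low2:.4f} <= p <= {high2:.4f}")
```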