Confidence Intervals - Gordon State College

Confidence Intervals
Point Estimate
• A specific numerical value used to estimate a parameter.
• The best point estimate for the population mean is the sample mean.
Properties of a Good Estimator
1. Unbiased
2. Consistent
3. Relatively Efficient
Since there’s no way of knowing how good a point estimate is, statisticians generally prefer . . .
. . . an Interval Estimate.
• A range of values used to estimate the parameter.
• The interval may or may not contain the parameter (but typically does).
• The most common kind is the . . .
. . . Confidence Interval
• Confidence level = the probability that the interval estimate will contain the parameter.
• Common confidence levels: 90%, 95%, 98%, 99%
Confidence Intervals

Tradeoff: A greater confidence level comes at the cost of a wider interval.

Important z-values to remember:
• 90%: 1.645 (some use 1.65)
• 95%: 1.96
• 98%: 2.33
• 99%: 2.575 (some use 2.58)
Confidence Intervals
• The maximum error of estimate (E) is the maximum difference between the point estimate of a parameter and the actual value of the parameter.
Confidence Intervals
• Four steps:
– Compute α = 1 – confidence level.
– Divide α by 2.
– Look up the associated z-value(s) for α/2 and 1 – α/2.
– Compute the confidence interval.
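The first three steps can be sketched in Python. This is only an illustration: it uses the standard library's `statistics.NormalDist` in place of a printed z table, and the function name `z_critical_values` is made up for this example.

```python
from statistics import NormalDist

def z_critical_values(confidence_level):
    """Return the lower and upper z critical values for a confidence level."""
    alpha = 1 - confidence_level            # step 1: alpha = 1 - confidence level
    tail = alpha / 2                        # step 2: split alpha between the two tails
    std_normal = NormalDist()               # standard normal: mean 0, std. dev. 1
    z_lower = std_normal.inv_cdf(tail)      # step 3: z with area alpha/2 to its left
    z_upper = std_normal.inv_cdf(1 - tail)  # step 3: z with area 1 - alpha/2 to its left
    return z_lower, z_upper

# Print the critical values for the common confidence levels.
for level in (0.90, 0.95, 0.98, 0.99):
    z_lower, z_upper = z_critical_values(level)
    print(f"{level:.0%}: {z_lower:.3f} and {z_upper:.3f}")
```

The computed values may differ slightly in the last digit from rounded table values such as 2.33 or 2.575.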
NOTATION FOR PROPORTIONS
p = population proportion
p̂ = x/n = sample proportion of x successes in a sample of size n.
q̂ = 1 − p̂ = sample proportion of failures in a sample of size n.
MARGIN OF ERROR OF THE ESTIMATE FOR p
$$E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
NOTE: n is the size of the sample.
CONFIDENCE INTERVAL FOR THE POPULATION PROPORTION p
$$\hat{p} - E < p < \hat{p} + E \quad \text{where} \quad E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
The confidence interval is often expressed in the following equivalent formats:
$\hat{p} \pm E$
or $(\hat{p} - E, \hat{p} + E)$
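As a rough sketch of how the proportion interval could be computed, the Python below applies the margin-of-error formula directly. The function name `proportion_interval` and the counts in the example (120 successes out of 400) are invented for illustration, and the z value comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence_level=0.95):
    """Confidence interval for a population proportion p."""
    p_hat = x / n                      # sample proportion of successes
    q_hat = 1 - p_hat                  # sample proportion of failures
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    margin = z * sqrt(p_hat * q_hat / n)      # margin of error E
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes in a sample of 400.
low, high = proportion_interval(x=120, n=400)
print(f"{low:.4f} < p < {high:.4f}")
```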
SAMPLE SIZES FOR ESTIMATING A PROPORTION p
When an estimate p̂ is known:
$$n = \frac{(z_{\alpha/2})^2 \, \hat{p}\hat{q}}{E^2}$$
When no estimate p̂ is known:
$$n = \frac{(z_{\alpha/2})^2 \cdot 0.25}{E^2}$$
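Both sample-size formulas can be sketched in one small Python function. The name `sample_size_proportion` and the example inputs are hypothetical, and rounding the result up to the next whole number is a common convention assumed here rather than something stated on the slide.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin, confidence_level=0.95, p_hat=None):
    """Sample size needed to estimate p within the given margin of error E."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    # Use p_hat * q_hat when an estimate is known, otherwise 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Assumption: round up so the margin of error is not exceeded.
    return ceil(z ** 2 * pq / margin ** 2)

print(sample_size_proportion(margin=0.03))             # no estimate known
print(sample_size_proportion(margin=0.03, p_hat=0.3))  # prior estimate of 0.3
```

Using 0.25 gives the larger sample size, since p̂q̂ can never exceed 0.25.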
Confidence Intervals for the Mean
• When the population standard deviation or variance is known, the standard normal distribution can be used depending on the sample size and the shape of the original distribution . . . .
Confidence Intervals for the Mean
• When n ≤ 30, the original variable must be normally distributed.
• When n > 30, the distribution of sample means will be approximately normal even if the original distribution isn’t normal.
MARGIN OF ERROR FOR THE MEAN
The margin of error for the mean is the maximum likely difference observed between sample mean x̄ and population mean µ, and is denoted by E. When the standard deviation, σ, for the population is known, the margin of error is given by
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
where 1 − α is the desired confidence level.
CONFIDENCE INTERVAL ESTIMATE OF THE POPULATION MEAN μ (WITH σ KNOWN and n > 30)
$$\bar{x} - E < \mu < \bar{x} + E \quad \text{where} \quad E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
or $\bar{x} \pm E$
or $(\bar{x} - E, \bar{x} + E)$
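A minimal Python sketch of this interval follows, assuming σ is known. The function name `mean_interval_sigma_known` and the example numbers (x̄ = 50, σ = 10, n = 100) are made up, and the z value is computed with `statistics.NormalDist` instead of being read from Table A-2.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval_sigma_known(x_bar, sigma, n, confidence_level=0.95):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    margin = z * sigma / sqrt(n)              # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical summary statistics.
low, high = mean_interval_sigma_known(x_bar=50, sigma=10, n=100)
print(f"{low:.2f} < mu < {high:.2f}")
```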
SAMPLE SIZE FOR ESTIMATING µ
$$n = \left[ \frac{z_{\alpha/2} \cdot \sigma}{E} \right]^2$$
where zα/2 = critical z score based on desired confidence level
E = desired margin of error
σ = population standard deviation
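The sample-size formula for the mean can be sketched the same way. Here `sample_size_mean` is a hypothetical name, the inputs (σ = 15, E = 2) are invented, and rounding up to the next whole number is an assumed convention.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, margin, confidence_level=0.95):
    """Sample size needed to estimate mu within the given margin of error E."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    # Assumption: round up so the margin of error is not exceeded.
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 15, desired margin of error = 2.
print(sample_size_mean(sigma=15, margin=2))
```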
Confidence Intervals for the Mean
• To summarize, we use the standard normal distribution (z values from Table A-2) in either of these cases:
– σ is known and the original variable is normally distributed
OR
– σ is known and n > 30
Confidence Intervals for the Mean
• Now, if σ is unknown, s can be substituted for σ, and we use a new distribution . . .
. . . the (Student) t distribution.
PROPERTIES OF THE
STUDENT t DISTRIBUTION
The Student t distribution is different for different
sample sizes (see Figure below for the cases n = 3
and n = 12).
Features of the t-distribution
• Bell-shaped
• Symmetrical about the mean
• The mean, median and mode are equal to 0 and located at the center.
• The curve never touches the x-axis.

Features of the t-distribution
• Variance is greater than 1.
• Actually a family of curves based on degrees of freedom (related to sample size)
• As d.f. increases, t approaches the standard normal distribution.

ASSUMPTIONS:
σ NOT KNOWN
1. The sample is a simple random sample.
2. Either the sample is from a normally
distributed population OR n > 30.
When σ is not known we will use the
Student t Distribution.
THE STUDENT t DISTRIBUTION
If the distribution of a population is essentially normal, then the distribution of
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
is essentially a Student t distribution for all samples of size n, and is used to find critical values denoted by tα/2. The Student t distribution is often referred to as the t distribution.
Confidence Intervals

The degrees of freedom are the number of
values that are free to vary after a sample
statistic has been computed.
d.f. = n – 1
Confidence Intervals
• Two steps:
– Use the appropriate confidence level and the appropriate degrees of freedom [d.f. = n – 1] to look up the associated t-values in the table (A-3).
– Compute the interval.
MARGIN OF ERROR ESTIMATE OF µ (WITH σ NOT KNOWN)
$$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
where (1 − α) is the confidence level and
tα/2 has n − 1 degrees of freedom.
NOTE: The values for tα/2 are found in
Table A-3 which is found on page 606,
inside the back cover, and on the Formulas
and Tables card.
CONFIDENCE INTERVAL ESTIMATE OF THE POPULATION MEAN μ (WITH σ NOT KNOWN)
$$\bar{x} - E < \mu < \bar{x} + E \quad \text{where} \quad E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
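The t interval can be sketched in Python as follows. Because the standard library has no t distribution, this sketch assumes the critical value tα/2 has already been looked up in a t table and is passed in. The function name `mean_interval_sigma_unknown` and the five data values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_interval_sigma_unknown(sample, t_critical):
    """Confidence interval for the population mean when sigma is not known.

    t_critical is the table value t_(alpha/2) for n - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = mean(sample)                   # sample mean
    s = stdev(sample)                      # sample standard deviation (n - 1 divisor)
    margin = t_critical * s / sqrt(n)      # margin of error E
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 5 values, so d.f. = 4.
# For 95% confidence and 4 degrees of freedom, the table value is 2.776.
sample = [10, 12, 14, 16, 18]
low, high = mean_interval_sigma_unknown(sample, t_critical=2.776)
print(f"{low:.3f} < mu < {high:.3f}")
```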
CHOOSING THE
APPROPRIATE DISTRIBUTION
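The choice can be summarized with a short Python sketch. It encodes only the conditions stated on the earlier slides (σ known or not, normal population or n > 30). The function name `choose_distribution` and the "neither" label for the remaining case are assumptions made for this example.

```python
def choose_distribution(sigma_known, population_normal, n):
    """Pick the distribution for a confidence interval for the mean."""
    # Both methods require a normal population or a sample size above 30.
    if not (population_normal or n > 30):
        return "neither"        # assumption: neither method on these slides applies
    # With that requirement met, sigma known -> z, sigma not known -> t.
    return "z" if sigma_known else "t"

print(choose_distribution(sigma_known=True, population_normal=False, n=50))
print(choose_distribution(sigma_known=False, population_normal=True, n=12))
print(choose_distribution(sigma_known=False, population_normal=False, n=12))
```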
CI for a Standard Deviation
1. Given sample values, estimate the population standard deviation σ or the population variance σ².
2. Determine the sample size required to estimate a population standard deviation or variance.
COMMENT: Estimating standard deviations is very useful in areas such as quality control in a manufacturing process. This is because manufacturers want the products to be consistent.
ASSUMPTIONS
1. The sample is a simple random sample.
2. The population must have normally distributed values (even if the sample is large).
CHI-SQUARE DISTRIBUTION
To estimate a population variance we use the chi-square distribution.
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
where
n = sample size
s² = sample variance
σ² = population variance
PROPERTIES OF THE CHI-SQUARE DISTRIBUTION
1. The chi-square distribution is not symmetric,
unlike the normal and Student t distributions.
As the number of degrees of freedom
increases, the distribution becomes more
symmetric.
Figure 7-8 Chi-Square Distribution
Figure 7-9 Chi-Square Distribution
for df = 10 and df = 20
PROPERTIES (CONTINUED)
2. The values of chi-square can be zero or positive, but they cannot be negative.
3. The chi-square distribution is different for each number of degrees of freedom, which is df = n – 1 for this CI. As the number increases, the chi-square distribution approaches the shape of a normal distribution.
CRITICAL VALUES
In Table A-4, each critical value of χ²
corresponds to an area given in the top row
of the table, and that area represents the
cumulative area located to the right of the
critical value in the body of the table.
NOTE: Since the chi-square distribution is not symmetric, the left critical value $\chi^2_L$ and the right critical value $\chi^2_R$ are not just opposites of each other.
ESTIMATORS OF σ² and σ
The sample variance s² is the best point estimate of the population variance σ².
The sample standard deviation s is the best point estimate of the population standard deviation σ.
CONFIDENCE INTERVAL FOR POPULATION VARIANCE σ²
$$\frac{(n - 1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$$
where $\chi^2_L$ is the left-tail critical value and $\chi^2_R$ is the right-tail critical value.
CONFIDENCE INTERVAL FOR POPULATION STANDARD DEVIATION σ
$$\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$$
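The two intervals can be sketched in Python as below. The standard library has no chi-square distribution, so this sketch assumes both critical values have already been read from a chi-square table and are passed in. The function names and the example inputs (n = 10, s = 2.0) are invented for illustration.

```python
from math import sqrt

def variance_interval(n, s, chi2_left, chi2_right):
    """Confidence interval for the population variance.

    chi2_left and chi2_right are the left-tail and right-tail critical
    values for n - 1 degrees of freedom, taken from a chi-square table.
    """
    numerator = (n - 1) * s ** 2
    # Dividing by the larger (right) critical value gives the lower limit.
    return numerator / chi2_right, numerator / chi2_left

def std_dev_interval(n, s, chi2_left, chi2_right):
    """Confidence interval for the population standard deviation."""
    low, high = variance_interval(n, s, chi2_left, chi2_right)
    return sqrt(low), sqrt(high)

# Hypothetical sample: n = 10 (df = 9), s = 2.0, 95% confidence.
# Table values for df = 9: left 2.700, right 19.023.
print(variance_interval(10, 2.0, chi2_left=2.700, chi2_right=19.023))
print(std_dev_interval(10, 2.0, chi2_left=2.700, chi2_right=19.023))
```

Note that the resulting interval is not centered on s² or s, which reflects the asymmetry of the chi-square distribution.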
DETERMINING SAMPLE SIZE
To determine sample size, use Table 7-2 on page 364.