Download Section 8.1 - USC Upstate: Faculty

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Probability wikipedia , lookup

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Section 8.1
Estimating  When  is Known
In this section, we develop techniques for
estimating the population mean μ
using sample data.
We assume that the population standard
deviation σ is known.
8.1 / 1
Estimating  When  is Known
Assumptions:
1.We have a simple random sample of size n
from a population of x values.
2.The value of σ, the population standard
deviation of x, is known.
3.If the x distribution is normal, then our
methods work for any sample size n.
4.If the distribution is unknown, a sample size of
at least 30 (sometimes even more) is required.
8.1 / 2
Point Estimate
• A point estimate is an estimate of a
population parameter given by a single
number.
• We use x (the sample mean) as a point
estimate for μ (the population mean)
•
x
is the point estimate for μ.
8.1 / 3
Examples of Point Estimates
x is used as a point estimate for  .
s is used as a point estimate for .
Margin of Error
Is the magnitude of the difference between the
point estimate and the true parameter value.
• The margin of error using x
as a point estimate for  is
•
•
•
•
x   or
x 
8.1 / 4
Confidence Level
A confidence level, c, is a measure of the degree of
assurance we have in our results.
The value of c may be any number between zero
and one.
Typical values for c include 0.90, 0.95, and 0.99.
Critical Value for a Confidence Level, c
the value zc such that the area under the
standard normal curve falling between – zc and
zc is equal to c.
8.1 / 5
Confidence Level
The area under the normal curve from  zc to zc is the
probability that the standardized normal variable z lies in
that interval. This means that P( zc  z  zc )  c 8.1 / 6
Example
Find the critical value
Find a number z0.90 such that 90% of the area under the
standard normal curve lies between z0.90 and z0.90
That is, we will find z0.90 such that P(– z0.90 < z < z0.90 ) = 0.90
Solution
We know that to find the z value when we were given the
area between -z and z. The first thing we do is to find the
corresponding area to the left of –z. If A is the area
between –z and z, then (1-A)/2 is the area to the left of z.
In our case the area between –z and z is 0.90.
The corresponding area in the left tail is
(1-0.90)/2=0.05
8.1 / 7
Example cont.
According to Appendix Table 3, 0.0500 lies exactly
halfway between two values in the table ( .0505 and
.0495).
• Averaging the z values associated with areas gives
• z0.90 = 1.645 and z0.90 = 1.645.
• z0.90 = 1.645 is the critical value for a confidence level of
c = 0.90.
• We have
• P(-1.645 < z < 1.645) = 0.90.
8.1 / 8
Some Common Levels of Confidence and
Their Corresponding Critical Values
8.1 / 9
An estimate is not very valuable unless we have some kind
of measure of how “good” it is. The probability can give
us an idea of the size of the margin of error caused by
using the sample mean x as an estimate for the
population mean.
Remember that x is a random variable. Each time we draw
x
a sample of size n from a population,
we can get a
different value for x . According to the central limit
theorem, if the sample size is large, then x has a
distribution that is approximately normal with mean  x  
the population mean we are trying to estimate.
The standard deviation is  x   / n.
If x has a normal distribution, these results are true for any
sample size.
8.1 / 10
This information, together with our work on confidence
levels, leads us to the probability statement

 

P   zc
 x    zc
c
n
n

(1)
Comment:
To derive Equation (1), we start with the probability
statement P( zc  z  zc )  c Since n  30, we can
use the central limit theorem and replace z
by ( x   ) / ( / n ). Finally, we multiply all parts of the
inequality by  / n to obtain Equation (1).
8.1 / 11
Equation (1) uses the language of probability to give us an
idea of the size of the margin of error for the
corresponding confidence level c. In other words,
Equation (1) states that the probability is c that our point
estimate x is within a distance  zc ( / n ) of the
population mean μ.
The margin of error (or absolute error) using x is a point
estimate for μ is | x - μ|. In most practical problems, μ
is unknown, so the margin of error is also unknown.
However, Equation (1) allows us to compute an error
tolerance E, which serves as a bound on the margin of
error.

E  zc
n
8.1 / 12
Using a c% level of confidence, we can say that the point
estimate x differs from the population mean μ by a maximal
margin of error

(2)
E  zc
n
Note: Formula (2) for E is based on the fact that the sampling
distribution for x is exactly normal, with mean μ and standard
deviation  / n .
This occurs whenever the x distribution is normal with mean μ
and standard deviation σ. If the x distribution is not normal,
then according to the central limit theorem,
large samples ( n  30) produce an x distribution that is
approximately normal with mean μ and standard deviation  / n.
8.1 / 13
Using Equations (1) and (2), we conclude that
P(-E < x - μ < E) = c
(3)
and
P( x - E < μ < x + E) = c (4)
Equation (4) states that there is a chance of c that the
x
interval from x - E to x + E contains the population
mean μ. We call this interval a c confidence interval
for μ.
A c confidence interval for  is an interval computed
from sample data in such a way that c is the
probability of generating an interval containing the
actual value of .
8.1 / 14
A c confidence interval for  is an interval
computed from sample data in such a way
that c is the probability of generating an
interval containing the actual value of .
We may get a different confidence interval for
each different sample that is taken.
Some intervals will contain the population mean
μ and others will not. However, in the long
run, the proportion of confidence intervals
that contain μ is c.
8.1 / 15
How Find a Confidence
Interval for  When  is Known:
Let x be a random variable appropriate to your application.
Obtain a simple random sample (of size n) of x values for
which you compute the sample mean, x . The value of σ
is already known (perhaps from a previous study).
If you cannot assume x has a normal distribution, use a
sample size of 30 or more.
Confidence Interval for  When  is Known
xE xE
x  Sample Mean

E  zc
n
c  confidence level
z c  critical value for confidence level
where
based on the standard normal distributi on
Example
Confidence interval for μ with σ known
Create a 95% confidence interval for the mean driving time
between Philadelphia and Boston.
Assume that the mean driving time of 64 trips was x =6.4
hours and that the standardx deviation is σ = 0.9 hours.
Solution
The interval from x - E to x + E will be a 95% confidence
interval for μ. In this case,
x = 6.4 hours  = 0.9 hours
E  zc

 1.96
0.9
 .2205
64
n
c = 95%, so zc = 1.96 and n = 64
and x  E    x  E or 6.4 – .2205 < μ < 6.4 + .2205
6.1795 < μ < 6.6205
We are 95% sure that the true time
is between 6.18 and 6.62 hours.8.1 / 17
Comments
of the term confidence interval
It is important to realize that the endpoints x  E are really
statistical variables the equation P( x  E    x  E )  c
States that we have a chance c of obtaining a sample such that
the interval, once it is computed, will obtain the parameter μ.
Of course, after the confidence interval, is numerically fixed, it
either does or does not contain μ.
So the probability is 1 or 0 that the interval, when it is fixed, will
contain μ. A nontrivial probability statement can be made
only about variables, not constants.
Therefore, the above equation really states that if we repeat the
experiment many times and get lots of confidence intervals
(for the same sample size), then the proportion of all intervals
that will turn out to contain the mean μ is c.
8.1 / 18
We may get different confidence
intervals for different samples.
8.1 / 19
Guided Exercise
Walter jogs 3 miles. He knows that the standard
deviation is σ = 2.40 minutes for his jogging
times. For a random sample of 90 jogging times, his
mean time was x = 22.50 minutes. Let μ be the
z x 2.58
mean jogging time for the
entire distribution of
Walter’s 3-mile jogging times over the past several
years. Find the 0.99 confidence interval for μ.
z0.99  2.58
(a) What is the value of z0.99 ? (Tables)
(b) Is the x distribution approximately normal?
0.99
Yes, we know this from the central limit theorem.
8.1 / 20
Guided Exercise cont.
(c) What is the value of E?

3.40
E  zc
 2.58(
)  0.65
n
90
(d) What are the end points for a 0.99 confidence interval for μ?
x  E  22.50  0.65  21.85
x  E  22.50  0.65  23.15
(e) How can we interpret the confidence interval?
We are 99% certain that the interval from 21.85 to 23.15
is an interval that contains the population mean time μ.
8.1 / 21
PROCEDURE
How to find the sample size n for
estimating μ when σ is known
When estimating the mean, how large a sample must
be used in order to assure a given level of confidence?
Use the formula:
 z c 
n

 E 
2
If n is not a whole number, increase n to the next higher
whole number.
Note that n is the minimal sample size for a specified
confidence level and maximal error of estimate E.
Note: Use the standard deviation, s, of a preliminary
sample of size 30 or larger to estimate .
Example
Determine the sample size necessary to determine (with 99%
confidence) the mean time it takes to drive from Philadelphia to
Boston. We wish to be within 15 minutes of the true time.
Assume that a preliminary sample of 45 trips had a standard
deviation of 0.8 hours.
Solution:
z0.99 = 2.58 E = 15 minutes = 0.25 hours
Since the preliminary sample is large enough, we can assume that
the population standard deviation is approximately equal to 0.8
hours.
Minimum Sample Size is:
n  68.16
Round to the next higher whole number.
To be 99% confident in our results, the
minimum sample size = 69.
 zc   2.58(2.15) 
n
 
  769.2
 E   0.20 
2
2
8.1 / 23
Example
A study is designed to find the mean weight of salmon
caught by an Alaskan fishing company. A recent study of
a random sample of 50 salmon showed s = 2.15 lb.
How large a sample should be taken to be 99% confident
that the sample mean is xwithin 0.20 lb of the true
x
mean weight μ?
Solution: z0.99  2.58
E  0.20
  s  2.15
 zc   2.58(2.15) 
n
 
  769.2
 E   0.20 
2
2
WE conclude that a sample size of 770 or larger is enough
to satisfy the specifications.
Assignment 19
8.1 / 24