DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS
Introduction to Business Statistics
QM 220
Chapter 8
Estimation of the mean and proportion
Spring 2008
Dr. Mohammad Zainal
Estimation: An introduction
Estimation is a procedure by which a numerical value or
values are assigned to a population parameter based on the
information collected from a sample.
In inferential statistics, μ is called the true population mean and p is
called the true population proportion. There are many other
population parameters, such as the median, mode, variance, and
standard deviation.
Examples of estimation:
• mean fuel consumption for a particular model of a car
• average time taken by new employees to learn a job
• mean housing expenditure per month incurred by households
QM-220, M. Zainal
If we can conduct a census each time we want to find the value
of a population parameter, then the estimation procedures are
not needed.
For example, if the Kuwaiti Census Bureau can contact every
household in Kuwait to find the mean housing expenditure
of households, the result of the survey will actually be a census.
However, conducting a census is:
• too expensive,
• very time consuming,
• virtually impossible, because every member of the population must be contacted.
That is why we usually take a sample from the population and
calculate the value of the appropriate sample statistic. Then we
assign a value or values to the corresponding population
parameter based on the value of the sample statistic.
For example, to estimate the mean housing expenditure per month
of all households in Kuwait, the Census Bureau will:
• take a sample of households
• collect the information on the housing expenditure per month
• compute the value of the sample mean
• assign a value or values to the population mean
The value assigned to a population parameter based on the
value of a sample statistic is called an estimate of the
population parameter.
The sample statistic used to estimate a population parameter is
called an estimator.
The estimation procedure involves the following steps:
• Select a sample.
• Collect the required information from the members of the sample.
• Calculate the value of the sample statistic.
• Assign value(s) to the corresponding population parameter.
Point and interval estimates
An estimate may be a point estimate or an interval estimate.
A Point Estimate
The value of a sample statistic that is used to estimate a
population parameter is called a point estimate.
Suppose the Census Bureau takes a sample of 10,000 households and
determines that the mean housing expenditure per month, x̄, for this
sample is $1370. Then, using x̄ as a point estimate of μ, the
bureau can state that the mean housing expenditure per month,
μ, for all households is about $1370.
Usually, whenever we use point estimation, we calculate the
margin of error associated with that point estimate.
For the estimation of the population mean, the margin of error
is calculated as follows:
\[ \text{Margin of error} = \pm 1.96\,\sigma_{\bar{x}} \quad \text{or} \quad \pm 1.96\,s_{\bar{x}} \]
An Interval Estimate
In interval estimation, instead of assigning a single value to
a population parameter, an interval is constructed around the
point estimate.
For the example, instead of saying that the mean housing
expenditure per month for all households is $1370, we may
obtain an interval by subtracting a number from $1370 and adding
the same number to $1370.
Then we say that this interval contains the population mean, μ.
For purposes of illustration, suppose we subtract $240 from $1370
and add $240 to $1370. Consequently, we obtain the interval
($1370 - $240) to ($1370 + $240), or $1130 to $1610.
Then we state that the interval $1130 to $1610 is likely to
contain the population mean, μ, and that the mean housing
expenditure per month for all households is
between $1130 and $1610.
This procedure is called interval estimation.
The value $1130 is called the lower limit of the interval and
$1610 is called the upper limit of the interval.
The question is: what number should we add to and subtract
from the point estimate?
The answer to this question depends on two considerations:
• The standard deviation of the sample mean, \(\sigma_{\bar{x}}\)
• The level of confidence to be attached to the interval
First, the larger the standard deviation, the greater is the
number subtracted from and added to the point estimate.
Second, the quantity subtracted and added must be larger if we
want to have a higher confidence in our interval.
Confidence Level and Confidence Interval: Each interval is
constructed with regard to a given confidence level and is called
a confidence interval.
The confidence level associated with a confidence interval
states how much confidence we have that this interval contains
the true population parameter.
The confidence level is denoted by (1 - α)100%, where α is the
Greek letter alpha. When expressed as a probability, it is called the
confidence coefficient and is denoted by 1 - α.
α is called the significance level.
Any value of the confidence level can be chosen to construct a
confidence interval; the more common values are 90%, 95%, and
99%. The corresponding confidence coefficients are .90, .95, and
.99.
Interval estimation of a population mean: large samples
If the population standard deviation σ is not known, then we
use the sample standard deviation s, in which
\[ s_{\bar{x}} = \frac{s}{\sqrt{n}} \]
is used instead of
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
The (1 - α)100% confidence interval for μ is
\[ \bar{x} \pm z\,\sigma_{\bar{x}} \quad \text{if } \sigma \text{ is known} \]
\[ \bar{x} \pm z\,s_{\bar{x}} \quad \text{if } \sigma \text{ is unknown} \]
The value of z used here is read from the standard normal
distribution table for the given confidence level.
The quantity \(z\sigma_{\bar{x}}\) (or \(z s_{\bar{x}}\) when σ is not known) in the confidence
interval formula is called the maximum error of estimate and is
denoted by E.
To find z:
1- Divide (1 - α) by 2.
2- Locate the answer in the body of the standard normal
distribution table and record the corresponding value of z.
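The two table-lookup steps above can be mirrored in code. The following is a minimal sketch, assuming the table gives the area between 0 and z, so the cumulative area up to z is 0.5 + (1 - α)/2. The function name `z_for_confidence` is illustrative and not part of the course material.

```python
from statistics import NormalDist

def z_for_confidence(confidence_level: float) -> float:
    """Return z so that the area between -z and z equals confidence_level."""
    # Step 1: divide (1 - alpha) by 2 to get the area between 0 and z.
    half_area = confidence_level / 2.0
    # Step 2: "locate" that area; with a cumulative distribution this means
    # finding the point whose left-hand area is 0.5 + half_area.
    return NormalDist().inv_cdf(0.5 + half_area)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z = {z_for_confidence(level):.3f}")
```

Printed tables usually carry two decimals, so hand calculations often use rounded values such as 1.96 and 2.58, and results can differ slightly from the software values.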
Example: A publishing company has just published a new college
textbook. Before the company decides the price at which to sell this
textbook, it wants to know the average price of all such textbooks in
the market. The research department at the company took a sample of
36 comparable textbooks and collected information on their prices.
This information produced a mean price of $90.50 for this sample. It is
known that the standard deviation of the prices of all such textbooks is
$7.50.
(a) What is the point estimate of the mean price of all such
college textbooks? What is the margin of error for this
estimate?
(b) Construct a 90% confidence interval for the mean price of
all such college textbooks.
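One way to work through this example is sketched below, using only the figures stated in the problem (n = 36, x̄ = $90.50, σ = $7.50). The variable names are illustrative, and the margin of error uses the factor 1.96 from the margin-of-error formula given earlier.

```python
from math import sqrt
from statistics import NormalDist

n = 36          # sample size
x_bar = 90.50   # sample mean price, in dollars
sigma = 7.50    # known population standard deviation, in dollars

# Standard deviation of the sample mean: sigma / sqrt(n)
sigma_x_bar = sigma / sqrt(n)

# (a) Point estimate and its margin of error (1.96 * sigma_x_bar)
point_estimate = x_bar
margin_of_error = 1.96 * sigma_x_bar

# (b) 90% confidence interval: 5% of the area lies in each tail
z_90 = NormalDist().inv_cdf(0.95)
E = z_90 * sigma_x_bar
lower, upper = x_bar - E, x_bar + E

print(f"point estimate = ${point_estimate:.2f}, margin of error = ${margin_of_error:.2f}")
print(f"90% confidence interval: ${lower:.2f} to ${upper:.2f}")
```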
Example: According to CardWeb.com, the mean bank credit card debt
for households was $7868 in 2004. Assume that this mean was based
on a random sample of 900 households and that the standard
deviation of such debts for all households in 2004 was $2070. Make a
99% confidence interval for the 2004 mean bank credit card debt for all
households.
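A worked sketch of this interval, assuming the table value z = 2.58 for a 99% confidence level (software gives about 2.576, which changes the limits slightly):

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2070}{\sqrt{900}} = 69
\qquad
\bar{x} \pm z\,\sigma_{\bar{x}} = 7868 \pm 2.58(69) = 7868 \pm 178.02
```

Under that assumption, the interval runs from $7689.98 to $8046.02.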
The width of a confidence interval depends on the size of the
maximum error, E, which depends on the values of z, σ, and n.
Why?
But we have no control over σ. Why?
So, the width depends only on:
• The value of z, which depends on the confidence level.
• The sample size n.
The value of z increases as the confidence level increases.
For the same value of σ, an increase in n decreases the value of
\(\sigma_{\bar{x}}\), which in turn decreases the size of E when the confidence
level remains unchanged.
If we want to decrease the width of a confidence interval, we
have two choices:
• Lower the confidence level.
• Increase the sample size.
Lowering the confidence level is not a good choice because a
lower confidence level may give less reliable results.
Increasing the sample size n is the best way to decrease the
width of a confidence interval.
Confidence level and the width of the confidence interval
Reconsider the last example. Suppose all the information given
in that example remains the same. First, let us decrease the
confidence level to 95%.
From the normal distribution table, z = 1.96 for a 95%
confidence level. Then, using z = 1.96 in the confidence interval
formula, we obtain a new interval.
The 95% confidence interval is narrower than the 99% interval.
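The calculation can be sketched as follows, assuming "the last example" is the credit card debt example (x̄ = $7868, σ = $2070, n = 900):

```latex
\sigma_{\bar{x}} = \frac{2070}{\sqrt{900}} = 69
\qquad
\bar{x} \pm z\,\sigma_{\bar{x}} = 7868 \pm 1.96(69) = 7868 \pm 135.24
```

This gives $7732.76 to $8003.24, a width of $270.48. With the table value z = 2.58, the 99% interval has half-width 2.58(69) = 178.02 and so a width of $356.04.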
Sample size and the width of the confidence interval
Reconsider the last example. Suppose we change n to 2500
and all other information remains the same.
The width of the confidence interval for n = 2500 is smaller
than that for n = 900.
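The effect of the sample size on the width can be checked numerically. This is a minimal sketch, assuming the credit card debt figures (x̄ = $7868, σ = $2070) and a 99% confidence level. The helper `mean_interval` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(x_bar, sigma, n, confidence_level):
    """Large-sample confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    E = z * sigma / sqrt(n)  # maximum error of estimate
    return x_bar - E, x_bar + E

widths = {}
for n in (900, 2500):
    lower, upper = mean_interval(7868, 2070, n, 0.99)
    widths[n] = upper - lower
    print(f"n = {n}: ${lower:.2f} to ${upper:.2f}, width = ${widths[n]:.2f}")
```

Because σ/√n falls from 69 to 41.4 when n rises from 900 to 2500, the width shrinks by the same factor of 0.6.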
Example: The standard deviation for a population is 6.30. A
random sample selected from this population gave a mean
equal to 81.90.
a. Make a 99% confidence interval for μ assuming n = 36.
b. Make a 99% confidence interval for μ assuming n = 81.
c. Make a 99% confidence interval for μ assuming n = 100.
d. Does the width of the confidence intervals constructed in parts a
through c decrease as the sample size increases? Why?
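A worked sketch for parts a through c, assuming the table value z = 2.58 for the 99% confidence level:

```latex
\begin{array}{lll}
n = 36:  & \sigma_{\bar{x}} = 6.30/\sqrt{36} = 1.05,  & 81.90 \pm 2.58(1.05) = 81.90 \pm 2.709 \\
n = 81:  & \sigma_{\bar{x}} = 6.30/\sqrt{81} = 0.70,  & 81.90 \pm 2.58(0.70) = 81.90 \pm 1.806 \\
n = 100: & \sigma_{\bar{x}} = 6.30/\sqrt{100} = 0.63, & 81.90 \pm 2.58(0.63) = 81.90 \pm 1.625
\end{array}
```

The maximum error falls from 2.709 to 1.806 to 1.625, so the intervals narrow as n increases, because \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) decreases while z stays fixed.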
Interval estimation of a population proportion: large samples
Many times we want to estimate the population proportion.
Examples:
• The production manager of a company wants to estimate the
proportion of defective items produced on a machine.
• A bank manager may want to know the percentage of customers who
are satisfied with the bank services.
Recall:
• The sampling distribution of the sample proportion p̂ is
(approximately) normal.
• The mean of the sampling distribution of p̂ is equal to the population
proportion p.
• The standard deviation of the sampling distribution of the sample
proportion is \(\sigma_{\hat{p}} = \sqrt{pq/n}\), where q = 1 - p. Because p is unknown, it is estimated by
\[ s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} \]
The margin of error is
\[ z\,s_{\hat{p}} \]
The (1 - α)100% confidence interval for p is
\[ \hat{p} \pm z\,s_{\hat{p}} \]
Example: According to a 2002 survey, 20% of Americans needed
legal advice during the past year to resolve such thorny issues
as family trusts and landlord disputes. Suppose a recent sample
of 1000 adult Americans showed that 20% of them needed legal
advice during the past year to resolve such family-related
issues.
(a) What is the point estimate of the population proportion? What is the
margin of error for this estimate?
(b) Construct a 99% confidence interval for the proportion of all adult Americans who
needed legal advice during the past year.
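The example can be worked through along the following lines. This is a sketch under stated assumptions: part (a) uses the factor 1.96 for the margin of error, as in the earlier margin-of-error formula for the mean, and part (b) uses the software value of z rather than the rounded table value 2.58. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 1000
p_hat = 0.20         # sample proportion (also the point estimate of p)
q_hat = 1 - p_hat

# Estimated standard deviation of the sample proportion
s_p_hat = sqrt(p_hat * q_hat / n)

# (a) Margin of error for the point estimate (assumed factor 1.96)
margin_of_error = 1.96 * s_p_hat

# (b) 99% confidence interval: 0.5% of the area lies in each tail
z_99 = NormalDist().inv_cdf(0.995)
E = z_99 * s_p_hat
lower, upper = p_hat - E, p_hat + E

print(f"s_p_hat = {s_p_hat:.5f}, margin of error = {margin_of_error:.4f}")
print(f"99% confidence interval for p: {lower:.4f} to {upper:.4f}")
```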
Example: According to the analysis of a CNN-USA TODAY-
Gallup poll conducted in October 2002, "Stress has become a
common part of everyday life in the United States. The demands
of work, family, and home place an increasing burden on the
average American." According to this poll, 40% of Americans
included in the survey indicated that they had a limited amount
of time to relax (Gallup.com, November 8, 2002). The poll was
based on a randomly selected national sample of 1502 adults
aged 18 and older. Construct a 95% confidence interval for the
corresponding population proportion.
Example:
a. A sample of 400 observations taken from a population
produced a sample proportion of .63. Make a 95% confidence
interval for p.
b. Another sample of 400 observations taken from the same
population produced a sample proportion of .59. Make a 95%
confidence interval for p.
c. Another sample of 400 observations taken from the same
population produced a sample proportion of .67. Make a 95%
confidence interval for p.
Determining the sample size for the estimation of mean
The main reason why we usually conduct a sample survey
instead of a census is our limited resources.
If a smaller sample can serve our purpose, then there is no need
to take a bigger sample.
Suppose we run a test to estimate the mean life of a battery.
If 40 batteries can give us the required confidence
interval, why should we waste our money by buying
more batteries?
The question is: how can we decide the minimum
sample size that produces a confidence interval with a
given α?
Recall that E is a function of z, σ, and n. That is,
\[ E = z \cdot \frac{\sigma}{\sqrt{n}} \]
If we fix z, σ, and E and solve for n, the sample size can be
found using
\[ n = \frac{z^2 \sigma^2}{E^2} \]
If we don't know σ, then s can be used instead by taking a pilot
sample of any arbitrary size.
Example: An alumni association wants to estimate the mean
debt of this year's college graduates. It is known that the
population standard deviation of the debts of this year's college
graduates is $11,800. How large a sample should be selected so
that the estimate with a 99% confidence level is within $800 of
the population mean?
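The sample-size formula n = z²σ²/E² can be applied to this example as sketched below. The function name and the optional `z` argument are illustrative assumptions. The result is rounded up, since a sample size must be a whole number that is at least the computed value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence_level, z=None):
    """Smallest n such that z * sigma / sqrt(n) <= E."""
    if z is None:
        # Software value of z; pass z explicitly to use a rounded table value.
        z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return ceil(z**2 * sigma**2 / E**2)

n_software = sample_size_for_mean(sigma=11_800, E=800, confidence_level=0.99)
n_table = sample_size_for_mean(sigma=11_800, E=800, confidence_level=0.99, z=2.58)
print(n_software, n_table)
```

The two answers differ slightly (1444 versus 1449) only because the table value 2.58 is a rounded version of z.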
Determining the sample size for the estimation of proportion
Similar to the sample mean, we can determine the sample
size for the sample proportion.
The only difference is the standard deviation.
The sample size can be found using
\[ E = z \cdot \sqrt{\frac{pq}{n}} \quad \Rightarrow \quad n = \frac{z^2\,p\,q}{E^2} \]
If p is not known, we choose a conservative sample size n by
using p = q = .50. Why?
Otherwise, we estimate p using a preliminary sample and use
p̂ and q̂ in place of p and q.
Example: Lombard Electronics Company has just installed a
new machine that makes a part that is used in clocks. The
company wants to estimate the proportion of these parts
produced by this machine that are defective. The company
manager wants this estimate to be within .02 of the population
proportion for a 95% confidence level. What is the most
conservative estimate of the sample size that will limit the
maximum error to within .02 of the population proportion?
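A minimal sketch of the most conservative sample size, assuming the standard formula n = z²pq/E² with p = q = .50. The function name is illustrative. The short grid search at the end shows why .50 is the conservative choice: the product p(1 - p) is largest there, so it yields the largest required n.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(E, confidence_level, p=0.5):
    """Smallest n such that z * sqrt(p * (1 - p) / n) <= E."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return ceil(z**2 * p * (1 - p) / E**2)

n_conservative = sample_size_for_proportion(E=0.02, confidence_level=0.95)
print(n_conservative)

# p * (1 - p) over a grid of candidate values peaks at p = 0.5
most_conservative_p = max(range(1, 100), key=lambda k: (k / 100) * (1 - k / 100)) / 100
print(most_conservative_p)
```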
Example: Consider the previous example again. Suppose a
preliminary sample of 200 parts produced by this machine
showed that 7% of them are defective. How large a sample
should the company select so that the 95% confidence interval
for p is within .02 of the population proportion?
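A worked sketch, assuming the preliminary-sample values p̂ = .07 and q̂ = .93 are substituted for p and q, with z = 1.96:

```latex
n = \frac{z^2\,\hat{p}\,\hat{q}}{E^2} = \frac{(1.96)^2 (.07)(.93)}{(.02)^2} = \frac{0.2501}{0.0004} \approx 625.22
```

Rounding up gives n = 626, far fewer than the sample size required when p = q = .50 is assumed.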
Interval estimation of a population mean: small samples
In a previous section, we considered estimating the population
mean for large samples (n ≥ 30).
Using the CLT, we assumed that the sampling distribution of
the sample mean is approximately normal regardless of the shape of
the population and whether or not σ is known.
Unfortunately, many times we are restricted to small samples
due to the nature of the experiment.
For instance:
• Clinical trials
• Space missions
If we are dealing with small sample sizes, we will have two
scenarios:
1- The original population is normal and σ is known.
2- The original population is (approximately) normal and σ is unknown.
In the first scenario, we use the normal distribution to construct
the confidence interval of μ.
In the second scenario, we can't use the normal distribution to
construct the confidence interval of μ. Instead, we will use
another distribution called the t-distribution.
Conditions under which the t-distribution is used to make a confidence interval about μ:
1- The population from which the sample is drawn is (approximately) normally distributed
2- The sample size is small (that is, n < 30)
3- The population standard deviation, σ, is not known
The t distribution
The t distribution is a specific type of bell-shaped distribution
with a lower height and a wider spread than the standard normal distribution.
As the sample size becomes larger, the t distribution approaches the standard normal distribution.
The t distribution has only one parameter, called the degrees of freedom (df). The mean of the t distribution is equal to 0 and its
standard deviation is √[df/(df - 2)].
The units of the t distribution are denoted by t.
The number of degrees of freedom (df) is the only parameter of
the t distribution.
df = n - 1
Example: Find the value of t for n = 10 and .05 area in the right
tail. Also, find its standard deviation.
Solution: df = n - 1 = 9 → standard deviation = √(9/7) = 1.134
The required value of t for 9 df and
.05 area in the right tail is read from the t distribution table: t = 1.833.
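Example: Find the value of t for n = 10 and .05 area in the left
tail. Also, find its standard deviation.
Solution: df = n - 1 = 9 → standard deviation = 1.134. Because the t distribution is symmetric about 0, the required value is t = -1.833.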
Confidence interval for μ using the t distribution
If the following three conditions hold true, we use the
t distribution to make a confidence interval about μ.
1- The population from which the sample is drawn is
(approximately) normally distributed
2- The sample size is small (that is, n < 30)
3- The population standard deviation, σ, is not known
The (1 - α)100% confidence interval for μ for small samples is
\[ \bar{x} \pm t\,s_{\bar{x}} \quad \text{where} \quad s_{\bar{x}} = \frac{s}{\sqrt{n}} \]
The value of t is obtained from the t distribution table
for n - 1 df and the given confidence level.
Example: A doctor wanted to estimate the mean cholesterol level
for all adult men living in Dasmah. He took a sample of 25 adult
men from Dasmah and found that the mean cholesterol level
for this sample is 186 with a standard deviation of 12. Assume
that the cholesterol levels for all adult men in Dasmah are
(approximately) normally distributed. Construct a 95%
confidence interval for the population mean μ.
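In place of a printed t table, the critical value can be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names are illustrative assumptions, and a statistics package would normally be used for this step. It then applies x̄ ± t·s/√n to the cholesterol example.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, right_tail_area):
    """Value of t with the given area to its right, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > right_tail_area:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Cholesterol example: n = 25, x_bar = 186, s = 12, 95% confidence level
n, x_bar, s = 25, 186.0, 12.0
t = t_critical(df=n - 1, right_tail_area=0.025)
E = t * s / sqrt(n)
lower, upper = x_bar - E, x_bar + E
print(f"t = {t:.3f}, interval: {lower:.2f} to {upper:.2f}")
```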
Example: Twenty-five randomly selected adults who buy books
for general reading were asked how much they usually spend
on books per year. The sample produced a mean of $1450 and a
standard deviation of $300 for such annual expenses. Assume
that such expenses for all adults who buy books for general
reading have an approximate normal distribution. Determine a
99% confidence interval for the corresponding population mean
μ.
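A worked sketch, assuming the t table value t = 2.797 for df = 24 and .005 area in each tail:

```latex
s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{300}{\sqrt{25}} = 60
\qquad
\bar{x} \pm t\,s_{\bar{x}} = 1450 \pm 2.797(60) = 1450 \pm 167.82
```

Under that assumption, the interval runs from $1282.18 to $1617.82.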