Chapter 8:
ESTIMATION OF THE MEAN AND
PROPORTION
ESTIMATION: AN
INTRODUCTION
Definition
The assignment of value(s) to a population
parameter based on a value of the
corresponding sample statistic is called
estimation.
ESTIMATION: AN
INTRODUCTION cont.
Definition
The value(s) assigned to a population
parameter based on the value of a sample
statistic is called an estimate.
The sample statistic used to estimate a
population parameter is called an
estimator.
ESTIMATION: AN
INTRODUCTION cont.
The estimation procedure involves the following steps.
1. Select a sample.
2. Collect the required information from the members of the sample.
3. Calculate the value of the sample statistic.
4. Assign value(s) to the corresponding population parameter.
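The four steps can be sketched in a few lines of Python. This is only an illustration: the population below is simulated (hypothetical values, not data from these slides), and the sample mean is used as the estimator.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated prices (not data from the slides).
random.seed(8)
population = [random.gauss(70.0, 4.5) for _ in range(10_000)]

# Step 1: select a sample (simple random sampling without replacement).
sample = random.sample(population, 36)

# Step 2: collect the required information from the sample members.
# Here the "information" is just each member's recorded value.
observations = list(sample)

# Step 3: calculate the value of the sample statistic (the estimator).
sample_mean = statistics.mean(observations)

# Step 4: assign that value to the population parameter (the estimate).
estimate_of_mu = sample_mean

print(f"Estimate of the population mean: {estimate_of_mu:.2f}")
```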
POINT AND INTERVAL
ESTIMATES


A Point Estimate
An Interval Estimate
A Point Estimate
Definition
The value of a sample statistic that is used
to estimate a population parameter is
called a point estimate.
A Point Estimate cont.


Usually, whenever we use point estimation, we
calculate the margin of error associated with that
point estimation.
The margin of error is calculated as follows:
Margin of error = \( \pm 1.96\,\sigma_{\bar{x}} \) or \( \pm 1.96\,s_{\bar{x}} \)
An Interval Estimation
Definition
In interval estimation, an interval is
constructed around the point estimate,
and it is stated that this interval is likely to
contain the corresponding population
parameter.
Figure 8.1 Interval estimation. [Figure: an interval from $1130 to $1610 constructed around the point estimate \( \bar{x} \) = $1370.]
An Interval Estimation cont.


Definition
Each interval is constructed with regard to a given
confidence level and is called a confidence
interval. The confidence level associated with a
confidence interval states how much confidence we
have that this interval contains the true population
parameter. The confidence level is denoted by
(1 – α)100%.
INTERVAL ESTIMATION OF A
POPULATION MEAN: LARGE SAMPLES
Confidence Interval for μ for Large Samples
The (1 – α)100% confidence interval for μ is
\( \bar{x} \pm z\sigma_{\bar{x}} \) if σ is known
\( \bar{x} \pm z s_{\bar{x}} \) if σ is not known
where \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \) and \( s_{\bar{x}} = s/\sqrt{n} \).
The value of z used here is read from the standard normal distribution table for the given confidence level.
INTERVAL ESTIMATION OF A
POPULATION MEAN: LARGE SAMPLES
cont.
Definition
The maximum error of estimate for μ, denoted by E, is the quantity that is subtracted from and added to the value of \( \bar{x} \) to obtain a confidence interval for μ. Thus,
\( E = z\sigma_{\bar{x}} \) or \( E = z s_{\bar{x}} \)
Figure 8.2 Finding z for a 95% confidence level. [Figure: standard normal curve with total shaded area .9500 (95%) between z = -1.96 and z = 1.96, made up of area .4750 on each side of the mean (z = 0).]
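The table lookup can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is made up for this illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that the central area under the standard normal curve equals `level`."""
    alpha = 1.0 - level
    # The area to the left of z is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
```

The computed values are about 1.645, 1.960, and 2.576. The slides use the table-based roundings 1.65 and 2.58 for the 90% and 99% levels, so results computed with the unrounded values can differ slightly in the last digit.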
Figure 8.3 Area in the tails. [Figure: standard normal curve with area (1 – α) between -z and z and area α/2 in each tail.]
Example 8-1
A publishing company has just published a new
college textbook. Before the company decides the
price at which to sell this textbook, it wants to
know the average price of all such textbooks in
the market. The research department at the
company took a sample of 36 comparable
textbooks and collected information on their
prices. This information produces a mean price of
$70.50 for this sample. It is known that the
standard deviation of the prices of all such
textbooks is $4.50.
Example 8-1
(a) What is the point estimate of the mean price of all such textbooks? What is the margin of error for the estimate?
(b) Construct a 90% confidence interval for the mean price of all such college textbooks.
Solution 8-1
a) n = 36, \( \bar{x} \) = $70.50, and σ = $4.50
\( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 4.50/\sqrt{36} = \$.75 \)
Point estimate of μ = \( \bar{x} \) = $70.50
Margin of error = \( \pm 1.96\sigma_{\bar{x}} = \pm 1.96(.75) = \pm\$1.47 \)
Solution 8-1
b) Confidence level is 90% or .90.
z = 1.65.
\( \bar{x} \pm z\sigma_{\bar{x}} = 70.50 \pm 1.65(.75) = 70.50 \pm 1.24 \)
= (70.50 - 1.24) to (70.50 + 1.24)
= $69.26 to $71.74
Solution 8-1
We can say that we are 90% confident that
the mean price of all such college textbooks
is between $69.26 and $71.74.
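The arithmetic of Example 8-1 can be reproduced as a quick check. This sketch uses the values given in the example and the table value z = 1.65; the variable names are chosen here for illustration.

```python
import math

n = 36          # sample size (from Example 8-1)
x_bar = 70.50   # sample mean price in dollars
sigma = 4.50    # known population standard deviation in dollars

# Standard deviation of the sample mean: sigma / sqrt(n).
sigma_x_bar = sigma / math.sqrt(n)

# (a) Margin of error of the point estimate (1.96 standard deviations of x_bar).
margin_of_error = 1.96 * sigma_x_bar

# (b) 90% confidence interval, using the table value z = 1.65 from the slide.
z = 1.65
E = z * sigma_x_bar
lower, upper = x_bar - E, x_bar + E

print(f"sigma_x_bar = {sigma_x_bar:.2f}")
print(f"margin of error = +/-{margin_of_error:.2f}")
print(f"90% CI: {lower:.2f} to {upper:.2f}")
```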
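Figure 8.4 Confidence intervals. [Figure: the intervals \( \bar{x}_1 \pm 1.65\sigma_{\bar{x}} \), \( \bar{x}_2 \pm 1.65\sigma_{\bar{x}} \), and \( \bar{x}_3 \pm 1.65\sigma_{\bar{x}} \) drawn around three sample means \( \bar{x}_1 \), \( \bar{x}_2 \), \( \bar{x}_3 \) along the \( \bar{x} \) axis.]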
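One way to read the 90% confidence level is through repeated sampling: if many samples are drawn and an interval \( \bar{x} \pm 1.65\sigma_{\bar{x}} \) is built from each, roughly 90% of those intervals should contain μ. The simulation below is a sketch under assumed values (a normal population with hypothetical μ = 70 and σ = 4.50, and 2000 repetitions); none of these settings come from the slides.

```python
import math
import random

random.seed(2002)            # fixed seed so the run is repeatable
mu, sigma = 70.0, 4.50       # hypothetical "true" population values
n, z, trials = 36, 1.65, 2000
sigma_x_bar = sigma / math.sqrt(n)

covered = 0
for _ in range(trials):
    # Draw one sample of size n and compute its mean.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Does x_bar +/- z * sigma_x_bar contain the true mean?
    if x_bar - z * sigma_x_bar <= mu <= x_bar + z * sigma_x_bar:
        covered += 1

coverage = covered / trials
print(f"Intervals containing mu: {coverage:.1%}")
```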
Example 8-2
According to a report by the Consumer
Federation of America, National Credit Union
Foundation, and the Credit Union National
Association, households with negative assets
carried an average of $15,528 in debt in 2002
(CBS.MarketWatch.com, May 14, 2002). Assume
that this mean was based on a random sample of
400 households and that the standard deviation
of debts for households in this sample was
$4200. Make a 99% confidence interval for the
2002 mean debt for all such households.
Solution 8-2

Confidence level is 99% or .99
\( s_{\bar{x}} = s/\sqrt{n} = 4200/\sqrt{400} = \$210 \)
The sample is large (n > 30)
Therefore, we use the normal distribution
z = 2.58
Solution 8-2
\( \bar{x} \pm z s_{\bar{x}} = 15{,}528 \pm 2.58(210) = 15{,}528 \pm 541.80 \)
= $14,986.20 to $16,069.80
Thus, we can state with 99% confidence that the 2002 mean debt for all households with negative assets was between $14,986.20 and $16,069.80.
INTERVAL ESTIMATION OF A
POPULATION MEAN: SMALL SAMPLES


The t Distribution
Confidence Interval for μ Using the t
Distribution
The t Distribution
Conditions Under Which the t Distribution Is Used to Make a Confidence Interval About μ
The t distribution is used to make a confidence interval about μ if
1. The population from which the sample is drawn is (approximately) normally distributed
2. The sample size is small (that is, n < 30)
3. The population standard deviation, σ, is not known
The t Distribution cont.
The t distribution is a specific type of bell-shaped distribution with a lower height and a wider spread than the standard normal distribution. As the sample size becomes larger, the t distribution approaches the standard normal distribution. The t distribution has only one parameter, called the degrees of freedom (df). The mean of the t distribution is equal to 0 and its standard deviation is \( \sqrt{df/(df - 2)} \).
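A short check of the standard deviation formula shows how the t distribution tightens toward the standard normal (standard deviation 1.0) as df grows. The function name `t_std_dev` is made up for this sketch, and the formula only applies for df > 2.

```python
import math

def t_std_dev(df):
    """Standard deviation of the t distribution, sqrt(df / (df - 2)); needs df > 2."""
    if df <= 2:
        raise ValueError("the formula requires df > 2")
    return math.sqrt(df / (df - 2))

for df in (9, 30, 100, 1000):
    print(f"df = {df:>4}: standard deviation = {t_std_dev(df):.3f}")
```

For df = 9 this gives 1.134.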
Figure 8.5 The t distribution for df = 9 and the standard normal distribution. [Figure: both curves centered at μ = 0; the standard deviation of the t distribution is \( \sqrt{9/(9 - 2)} = 1.134 \), and the standard deviation of the standard normal distribution is 1.0.]
Example 8-3
Find the value of t for 16 degrees of
freedom and .05 area in the right tail of a
t distribution curve.
Table 8.1 Determining t for 16 df and .05 Area in the Right Tail

        Area in the Right Tail Under the t Distribution Curve
df      .10      .05      .025     …    .001
1       3.078    6.314    12.706   …    318.309
2       1.886    2.920    4.303    …    22.327
3       1.638    2.353    3.182    …    10.215
.       …        …        …        …    …
16      1.337    1.746    2.120    …    3.686
.       …        …        …        …    …

The required value of t for 16 df and .05 area in the right tail is 1.746.
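The table values can be approximated without a table. The Python standard library has no t-distribution quantile function, so the sketch below integrates the t density numerically (Simpson's rule) and searches for t by bisection. The function names and the search bound of 50 are assumptions made for this illustration, not a method from the slides.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def right_tail_area(t, df, steps=2000):
    """Area to the right of t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_value(area, df):
    """Find t with the given right-tail area by bisection (assumes 0 < t < 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if right_tail_area(mid, df) > area:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_value(0.05, 16):.3f}")
```

Calling `t_value(0.05, 16)` should agree with the table entry 1.746 to three decimal places.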
Figure 8.6 The value of t for 16 df and .05 area in the right tail. [Figure: t distribution curve for df = 16 with area .05 to the right of t = 1.746, the required value of t.]
Figure 8.7 The value of t for 16 df and .05 area in the left tail. [Figure: t distribution curve for df = 16 with area .05 to the left of t = -1.746.]
Confidence Interval for μ Using
the t Distribution
The (1 – α)100% confidence interval for μ is
\( \bar{x} \pm t s_{\bar{x}} \)
where \( s_{\bar{x}} = s/\sqrt{n} \)
The value of t is obtained from the t distribution table for n – 1 degrees of freedom and the given confidence level.
Example 8-4
Dr. Moore wanted to estimate the mean
cholesterol level for all adult men living in
Hartford. He took a sample of 25 adult men from
Hartford and found that the mean cholesterol
level for this sample is 186 with a standard
deviation of 12. Assume that the cholesterol
levels for all adult men in Hartford are
(approximately) normally distributed. Construct a
95% confidence interval for the population mean
μ.
Solution 8-4





Confidence level is 95% or .95
\( s_{\bar{x}} = s/\sqrt{n} = 12/\sqrt{25} = 2.40 \)
df = n – 1 = 25 – 1 = 24
Area in each tail = .5 – (.95/2) = .5 – .4750 = .025
The value of t in the right tail is 2.064
Figure 8.8 The value of t. [Figure: t distribution curve for df = 24 with area .4750 on each side of 0 between t = -2.064 and t = 2.064, and area .025 in each tail.]
Solution 8-4
\( \bar{x} \pm t s_{\bar{x}} = 186 \pm 2.064(2.40) = 186 \pm 4.95 \)
= 181.05 to 190.95
Thus, we can state with 95% confidence that the mean cholesterol level for all adult men living in Hartford lies between 181.05 and 190.95.
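The small-sample interval is easy to wrap in a helper. In this sketch the t value is passed in from the table, because the Python standard library has no t-distribution quantile function; the name `t_interval` is made up for illustration.

```python
import math

def t_interval(x_bar, s, n, t):
    """Return (lower, upper) for x_bar +/- t * s / sqrt(n).

    `t` is the table value for n - 1 degrees of freedom and the
    chosen confidence level.
    """
    s_x_bar = s / math.sqrt(n)
    E = t * s_x_bar
    return x_bar - E, x_bar + E

# Example 8-4: n = 25, x_bar = 186, s = 12, t = 2.064 for df = 24.
lower, upper = t_interval(186, 12, 25, 2.064)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```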
Example 8-5
Twenty-five randomly selected adults who buy
books for general reading were asked how much
they usually spend on books per year. The
sample produced a mean of $1450 and a
standard deviation of $300 for such annual
expenses. Assume that such expenses for all
adults who buy books for general reading have
an approximate normal distribution. Determine a
99% confidence interval for the corresponding
population mean.
Solution 8-5





Confidence level is 99% or .99
\( s_{\bar{x}} = s/\sqrt{n} = 300/\sqrt{25} = \$60 \)
df = n – 1 = 25 – 1 = 24
Area in each tail = .5 – (.99/2) = .5 – .4950 = .005
The values of t are 2.797 and -2.797
Solution 8-5
The 99% confidence interval for μ is
\( \bar{x} \pm t s_{\bar{x}} = \$1450 \pm 2.797(60) \)
= $1450 ± $167.82
= $1282.18 to $1617.82
INTERVAL ESTIMATION OF A
POPULATION PROPORTION: LARGE
SAMPLES
Estimator of the Standard Deviation of p̂
The value of \( s_{\hat{p}} \), which gives a point estimate of \( \sigma_{\hat{p}} \), is calculated as
\( s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} \)
INTERVAL ESTIMATION OF A
POPULATION PROPORTION: LARGE
SAMPLES cont.
Confidence Interval for the Population
Proportion, p
The (1 – α)100% confidence interval for the population proportion, p, is
\( \hat{p} \pm z s_{\hat{p}} \)
The value of z used here is obtained from the standard normal distribution table for the given confidence level, and \( s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} \).
Example 8-6
According to a 2002 survey by FindLaw.com,
20% of Americans needed legal advice
during the past year to resolve such thorny
issues as family trusts and landlord disputes
(CBS.MarketWatch.com, August 6, 2002).
Suppose a recent sample of 1000 adult
Americans showed that 20% of them
needed legal advice during the past year to
resolve such family-related issues.
Example 8-6
a) What is the point estimate of the
population proportion? What is the
margin of error of this estimate?
b) Find, with a 99% confidence level, the
percentage of all adult Americans who
needed legal advice during the past year
to resolve such family-related issues.
Solution 8-6



n = 1000, p̂ = .20, and q̂ = .80
\( s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} = \sqrt{(.20)(.80)/1000} = .01264911 \)
Note that np̂ and nq̂ are both greater than 5.
Solution 8-6
a) Point estimate of p = p̂ = .20
Margin of error = \( \pm 1.96 s_{\hat{p}} \)
= ±1.96(.01264911)
= ±.025 or ±2.5%
Solution 8-6
b) The confidence level is 99%, or .99.
The z value for .4950 is approximately 2.58.
\( \hat{p} \pm z s_{\hat{p}} = .20 \pm 2.58(.01264911) = .20 \pm .033 \)
= .167 to .233 or 16.7% to 23.3%
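The numbers in Solution 8-6 can be verified with a few lines of Python. The sketch uses the example's values and the table value z = 2.58; variable names are chosen here for illustration.

```python
import math

n = 1000       # sample size (Example 8-6)
p_hat = 0.20   # sample proportion
q_hat = 1 - p_hat

# Large-sample check used on the slide: n*p_hat and n*q_hat both exceed 5.
assert n * p_hat > 5 and n * q_hat > 5

# Estimated standard deviation of the sample proportion.
s_p_hat = math.sqrt(p_hat * q_hat / n)

# (a) Margin of error of the point estimate.
margin_of_error = 1.96 * s_p_hat

# (b) 99% confidence interval with the table value z = 2.58.
z = 2.58
lower = p_hat - z * s_p_hat
upper = p_hat + z * s_p_hat

print(f"s_p_hat = {s_p_hat:.8f}")
print(f"margin of error = +/-{margin_of_error:.3f}")
print(f"99% CI: {lower:.3f} to {upper:.3f}")
```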
Example 8-7
According to the analysis of a CNN–USA TODAY–Gallup
poll conducted in October 2002, “Stress has become a
common part of everyday life in the United States. The
demands of work, family, and home place an increasing
burden on the average American.” According to this poll,
40% of Americans included in the survey indicated that
they had a limited amount of time to relax (Gallup.com,
November 8, 2002). The poll was based on a randomly
selected national sample of 1502 adults aged 18 and
older. Construct a 95% confidence interval for the
corresponding population proportion.
Solution 8-7

Confidence level = 95% or .95
\( s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} = \sqrt{(.40)(.60)/1502} = .01264069 \)
The value of z for .95 / 2 = .4750 is 1.96.
Solution 8-7
\( \hat{p} \pm z s_{\hat{p}} = .40 \pm 1.96(.01264069) \)
= .40 ± .025
= .375 to .425 or 37.5% to 42.5%
DETERMINING THE SAMPLE SIZE FOR
THE ESTIMATION OF THE MEAN
Given the confidence level and the standard deviation of the population, the sample size that will produce a predetermined maximum error E of the confidence interval estimate of μ is
\( n = \dfrac{z^2\sigma^2}{E^2} \)
where E is
\( E = z\sigma_{\bar{x}} = z \cdot \dfrac{\sigma}{\sqrt{n}} \)
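The sample-size formula follows from solving the maximum-error expression for n. A short derivation, using only the symbols defined on this slide:

```latex
\begin{align*}
E &= z\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z\,\sigma}{E} \\
n &= \frac{z^2\sigma^2}{E^2}
\end{align*}
```

Because n must be a whole number, a fractional result is rounded up so that the maximum error does not exceed E.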
Example 8-8
An alumni association wants to estimate
the mean debt of this year’s college
graduates. It is known that the population
standard deviation of the debts of this
year’s college graduates is $11,800. How
large a sample should be selected so that
the estimate with a 99% confidence level
is within $800 of the population mean?
Solution 8-8
\( n = \dfrac{z^2\sigma^2}{E^2} = \dfrac{(2.58)^2(11{,}800)^2}{(800)^2} = 1448.18 \approx 1449 \)
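The same calculation as a small function. The name `sample_size_for_mean` is hypothetical; the result is rounded up to the next whole number, as in the solution above.

```python
import math

def sample_size_for_mean(z, sigma, E):
    """Smallest whole n with z * sigma / sqrt(n) <= E, i.e. ceil(z^2 sigma^2 / E^2)."""
    n = (z ** 2) * (sigma ** 2) / (E ** 2)
    # Round to 6 places first so floating-point noise cannot bump the ceiling.
    return math.ceil(round(n, 6))

# Example 8-8: 99% confidence (table z = 2.58), sigma = $11,800, E = $800.
print(sample_size_for_mean(2.58, 11_800, 800))
```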
DETERMINING THE SAMPLE SIZE FOR
THE ESTIMATION OF PROPORTION
Given the confidence level and the values of p and q, the sample size that will produce a predetermined maximum error E of the confidence interval estimate of p is
\( n = \dfrac{z^2 pq}{E^2} \)
where E is
\( E = z\sigma_{\hat{p}} = z\sqrt{\dfrac{pq}{n}} \)
DETERMINING THE SAMPLE SIZE FOR
THE ESTIMATION OF PROPORTION cont.
In case the values of p and q are not known:
1. Take the most conservative estimate of the sample size n by using p = .5 and q = .5. For a given E, these values of p and q will give the largest sample size in comparison to any other pair of values of p and q, since their product is greater than the product of any other pair.
DETERMINING THE SAMPLE SIZE FOR
THE ESTIMATION OF PROPORTION cont.
2. Take a preliminary sample of arbitrarily determined size and calculate p̂ and q̂ from this sample. Then use them to find n.
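The claim behind the first option, that p = .5 and q = .5 give the largest product pq, can be checked by brute force. The grid of p values from .01 to .99 is an arbitrary choice for this sketch.

```python
# Product p*q for a grid of p values, with q = 1 - p.
products = {p / 100: (p / 100) * (1 - p / 100) for p in range(1, 100)}

# The p that maximizes p*q; a larger p*q means a larger required n.
best_p = max(products, key=products.get)

print(f"p*q is largest at p = {best_p:.2f}, where p*q = {products[best_p]:.4f}")
```

Since n is proportional to pq for fixed z and E, the largest product yields the largest, and therefore most conservative, sample size.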
Example 8-9
Lombard Electronics Company has just installed a
new machine that makes a part that is used in
clocks. The company wants to estimate the
proportion of these parts produced by this
machine that are defective. The company
manager wants this estimate to be within .02 of
the population proportion for a 95% confidence
level. What is the most conservative estimate of
the sample size that will limit the maximum error
to within .02 of the population proportion?
Solution 8-9




The value of z for a 95% confidence level is 1.96.
p = .50 and q = .50
\( n = \dfrac{z^2 pq}{E^2} = \dfrac{(1.96)^2(.50)(.50)}{(.02)^2} = 2401 \)
Thus, if the company takes a sample of 2401 parts, there is a 95% chance that the estimate of p will be within .02 of the population proportion.
Example 8-10
Consider Example 8-9 again. Suppose a
preliminary sample of 200 parts produced
by this machine showed that 7% of them
are defective. How large a sample should
the company select so that the 95%
confidence interval for p is within .02 of the
population proportion?
Solution 8-10
p̂ = .07 and q̂ = .93
\( n = \dfrac{z^2\hat{p}\hat{q}}{E^2} = \dfrac{(1.96)^2(.07)(.93)}{(.02)^2} = \dfrac{(3.8416)(.07)(.93)}{.0004} = 625.22 \approx 626 \)
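Both proportion examples can be reproduced with one helper. The function name `sample_size_for_proportion` is made up for this sketch; it rounds up to the next whole number, matching Solutions 8-9 and 8-10.

```python
import math

def sample_size_for_proportion(z, p, E):
    """Smallest whole n with z * sqrt(p * q / n) <= E, where q = 1 - p."""
    q = 1 - p
    n = (z ** 2) * p * q / (E ** 2)
    # Round to 6 places first so floating-point noise cannot bump the ceiling.
    return math.ceil(round(n, 6))

# Example 8-9: most conservative estimate, p = q = .50.
print(sample_size_for_proportion(1.96, 0.50, 0.02))

# Example 8-10: the preliminary sample gave p_hat = .07.
print(sample_size_for_proportion(1.96, 0.07, 0.02))
```

The preliminary-sample estimate needs far fewer observations (626) than the conservative one (2401), because .07 × .93 is much smaller than .5 × .5.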