Estimation and Confidence Intervals
Learning Objectives

Know the difference between point and interval estimation.

Estimate a population mean from a sample mean when σ is known.

Estimate a population mean from a sample mean when σ is unknown.

Estimate a population proportion from a sample proportion.

Estimate the population variance from a sample variance.

Estimate the minimum sample size necessary to achieve given statistical goals.
Statistical Estimation

Point estimate -- the single value of a statistic calculated from a
sample.

A point estimate is the statistic, computed from sample information,
which is used to estimate the population parameter.
Statistical Estimation
A confidence interval estimate is a range of values constructed from
sample data so that the population parameter is likely to occur within
that range at a specified probability. The specified probability is called
the level of confidence.
Interval estimate - a range of values calculated from a sample statistic (or statistics) and a standardized statistic, such as z.
Selection of the standardized statistic is determined by the sampling distribution.
Selection of critical values of the standardized statistic is determined by the desired level of confidence.
Factors Affecting Confidence Interval Estimates
The factors that determine the width of a confidence interval are:
1. The sample size, n.
2. The variability in the population, usually σ, estimated by s.
3. The desired level of confidence.
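The effect of each factor can be sketched numerically. The snippet below is a minimal illustration, assuming the z-based interval for the mean used elsewhere in these slides (full width 2·z·σ/√n); the function name `interval_width` and the baseline numbers are hypothetical, chosen only for the demonstration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width of a z-based interval for the mean: 2 * z * sigma / sqrt(n)."""
    # Two-sided critical value: (1 - confidence) / 2 is left in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


# Hypothetical baseline: sigma = 10, n = 25, 95% confidence.
baseline = interval_width(10, 25, 0.95)
larger_sample = interval_width(10, 100, 0.95)      # 4x the sample size
more_variability = interval_width(20, 25, 0.95)    # 2x the standard deviation
higher_confidence = interval_width(10, 25, 0.99)   # higher confidence level

print(f"baseline width:          {baseline:.2f}")
print(f"n = 100 instead of 25:   {larger_sample:.2f}")
print(f"sigma = 20 instead of 10: {more_variability:.2f}")
print(f"99% instead of 95%:      {higher_confidence:.2f}")
```

Quadrupling n halves the width, doubling σ doubles it, and raising the confidence level widens the interval.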
Point and Interval Estimates


A point estimate is a single number; a confidence interval provides additional information about variability.
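[Figure: a confidence interval running from the lower confidence limit to the upper confidence limit, with the point estimate marked; the distance between the two limits is the width of the confidence interval.]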
Confidence Intervals

How much uncertainty is associated with a point estimate of a
population parameter?

An interval estimate provides more information about a
population characteristic than does a point estimate

Such interval estimates are called confidence intervals
Interval Estimates - Interpretation
For a 95% confidence interval, about 95% of similarly constructed intervals will contain the parameter being estimated. Also, 95% of the sample means for a specified sample size will lie within 1.96 standard errors (standard deviations of the sampling distribution of the mean) of the hypothesized population mean.
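This interpretation can be checked by simulation. The sketch below is only an illustration under assumed values (a normal population with μ = 50 and σ = 10, samples of n = 30, 2,000 repetitions); none of these numbers come from the slides, and `coverage_rate` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import mean


def coverage_rate(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of z-based intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build its interval around the sample mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


# Assumed population and sample settings, for illustration only.
rate = coverage_rate(mu=50, sigma=10, n=30, trials=2000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95, which is what "95% confident" refers to: the long-run success rate of the procedure, not a probability statement about any single interval.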
Estimation Process
[Diagram: a random sample is drawn from a population whose mean μ is unknown; the sample mean is x̄ = 50; conclusion: "I am 95% confident that μ is between 40 and 60."]
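Confidence Interval to Estimate μ when σ is Known

Point estimate:
$$\bar{x} = \frac{\sum x}{n}$$

Interval estimate:
$$\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{\sigma}{\sqrt{n}}$$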
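95% Confidence Interval for μ
$\bar{x} = 4.26$, $\sigma = 1.1$, and $n = 60$.
$$\bar{x} - z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{\sigma}{\sqrt{n}}$$
$$4.26 - 1.96 \frac{1.1}{\sqrt{60}} \le \mu \le 4.26 + 1.96 \frac{1.1}{\sqrt{60}}$$
$$4.26 - 0.28 \le \mu \le 4.26 + 0.28$$
$$3.98 \le \mu \le 4.54$$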
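95% Confidence Interval for μ
$\bar{x} = 153$, $\sigma = 46$, and $n = 85$.
$$\bar{x} - z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{\sigma}{\sqrt{n}}$$
$$153 - 1.96 \frac{46}{\sqrt{85}} \le \mu \le 153 + 1.96 \frac{46}{\sqrt{85}}$$
$$153 - 9.78 \le \mu \le 153 + 9.78$$
$$143.22 \le \mu \le 162.78$$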
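The calculation above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides; the function name `z_interval` is an assumption, and the critical value z is passed in as read from a z table.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n), with sigma known."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Inputs from the example above: x-bar = 153, sigma = 46, n = 85, 95% confidence.
lower, upper = z_interval(x_bar=153, sigma=46, n=85, z=1.96)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The same function applies to any of the σ-known problems by changing the four inputs.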
Question:
A survey was taken of U.S. companies that do business with firms in India.
One of the questions on the survey was: Approximately how many years has
your company been trading with firms in India? A random sample of 44
responses to this question yielded a mean of 10.455 years. Suppose the
population standard deviation for this question is 7.7 years. Using this
information, construct a 90% confidence interval for the mean number of
years that a company has been trading in India for the population of U.S.
companies trading with firms in India.
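Solution:
$\bar{x} = 10.455$, $\sigma = 7.7$, $n = 44$.
90% confidence ⇒ z = 1.645
$$\bar{x} - z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{\sigma}{\sqrt{n}}$$
$$10.455 - 1.645 \frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645 \frac{7.7}{\sqrt{44}}$$
$$10.455 - 1.91 \le \mu \le 10.455 + 1.91$$
$$8.545 \le \mu \le 12.365$$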
The analyst is 90% confident that if a census of all U.S. companies trading
with firms in India were taken at the time of this survey, the actual population
mean number of years a company would have been trading with firms in India
would be between 8.545 and 12.365. The point estimate is 10.455 years.
Question:
A study is conducted in a company that employs 800
engineers. A random sample of 50 engineers reveals that
the average sample age is 34.3 years. Historically, the
population standard deviation of the age of the company’s
engineers is approximately 8 years. Construct a 98%
confidence interval to estimate the average age of all the
engineers in this company.
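Solution:
$\bar{x} = 34.3$, $\sigma = 8$, $N = 800$, and $n = 50$.
98% confidence ⇒ z = 2.33
$$\bar{x} - z \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
$$34.3 - 2.33 \frac{8}{\sqrt{50}} \sqrt{\frac{800-50}{800-1}} \le \mu \le 34.3 + 2.33 \frac{8}{\sqrt{50}} \sqrt{\frac{800-50}{800-1}}$$
$$34.3 - 2.554 \le \mu \le 34.3 + 2.554$$
$$31.75 \le \mu \le 36.85$$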
The finite correction factor takes into account the fact that the population is only
800 instead of being infinitely large. The sample, n = 50, is a greater proportion
of the 800 than it would be of a larger population, and thus the width of the
confidence interval is reduced.
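To see how much the correction narrows the interval, the sketch below recomputes the engineers example with and without the factor √((N − n)/(N − 1)). The function name `z_interval_finite` is hypothetical, and z = 2.33 is the table value used in the solution.

```python
from math import sqrt


def z_interval_finite(x_bar, sigma, n, N, z):
    """z-based interval for the mean with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))            # finite correction factor, always < 1 when n > 1
    margin = z * sigma / sqrt(n) * fpc
    return x_bar - margin, x_bar + margin, fpc


# Engineers example: x-bar = 34.3, sigma = 8, N = 800, n = 50, 98% confidence.
lower, upper, fpc = z_interval_finite(x_bar=34.3, sigma=8, n=50, N=800, z=2.33)
uncorrected_margin = 2.33 * 8 / sqrt(50)

print(f"correction factor:    {fpc:.4f}")
print(f"margin without it:    {uncorrected_margin:.3f}")
print(f"margin with it:       {(upper - lower) / 2:.3f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```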
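Confidence Interval to Estimate μ when n is Large and σ is Known
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$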
Question:
Suppose a U.S. car rental firm wants to estimate the
average number of miles traveled per day by each of
its cars rented in California. A random sample of 110
cars rented in California reveals that the sample mean
travel distance per day is 85.5 miles, with a sample
standard deviation of 19.3 miles. Compute a 99%
confidence interval to estimate µ.
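Solution:
$\bar{x} = 85.5$, $s = 19.3$, and $n = 110$. Only the sample standard deviation is given; with n this large, s is used in place of σ.
99% confidence ⇒ z = 2.575
$$\bar{x} - z \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z \frac{s}{\sqrt{n}}$$
$$85.5 - 2.575 \frac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575 \frac{19.3}{\sqrt{110}}$$
$$85.5 - 4.7 \le \mu \le 85.5 + 4.7$$
$$80.8 \le \mu \le 90.2$$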
Z Values for Some of the More Common Levels of Confidence

Confidence Level    z Value
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.575
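These critical values come from the standard normal distribution: for a confidence level c, z is the point that leaves (1 − c)/2 in the upper tail. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an assumption.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z for a confidence level given as a fraction (0.95 for 95%)."""
    alpha = 1 - confidence
    # inv_cdf returns the z value with the given cumulative probability below it.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```

Computed to three decimals, the 98% and 99% values are 2.326 and 2.576; the table's 2.33 and 2.575 are the usual z-table roundings and change the intervals in these slides only negligibly.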
Estimating the Mean of a Normal Population: Unknown σ

The population has a normal distribution.

The value of the population standard deviation is unknown.

The z distribution is not appropriate for these conditions.

The t distribution is appropriate.
The t Distribution

Developed by the British statistician William Gosset.

A family of distributions - a unique distribution for each value of its parameter, degrees of freedom (d.f.).

Symmetric, unimodal, mean = 0, flatter than the z distribution.

t formula:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Confidence Intervals for μ of a Normal Population: Unknown σ
$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$
or
$$\bar{x} - t \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t \frac{s}{\sqrt{n}}$$
$$df = n - 1$$
Example
A tire manufacturer wishes to investigate the tread life of its
tires. A sample of 10 tires driven 50,000 miles revealed a
sample mean of 0.32 inch of tread remaining with a
standard deviation of 0.09 inch. Construct a 95 percent
confidence interval for the population mean. Would it be
reasonable for the manufacturer to conclude that after
50,000 miles the population mean amount of tread
remaining is 0.30 inch?
Confidence Interval for the Mean: Example Using the t-distribution
Given in the problem:
$n = 10$, $\bar{x} = 0.32$, $s = 0.09$
Compute the C.I. using the t-distribution (since σ is unknown):
$$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$
Student’s t-distribution Table
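The slides stop at the formula, so the following sketch carries the tire example through to a numerical interval. It assumes the critical value t = 2.262, which is the standard t-table entry for 95% confidence with df = 10 − 1 = 9; that value is supplied here and does not appear in the transcript.

```python
from math import sqrt

# Values given in the tire example.
n = 10
x_bar = 0.32   # sample mean tread remaining, inches
s = 0.09       # sample standard deviation, inches

# Assumption: t-table value for alpha/2 = 0.025 and df = n - 1 = 9.
t_crit = 2.262

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.3f} <= mu <= {upper:.3f}")
print("0.30 inside the interval:", lower <= 0.30 <= upper)
```

Because 0.30 falls inside the resulting interval, it is a reasonable value for the population mean amount of tread remaining.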
Question:
The owner of a large equipment rental company wants to make a rather
quick estimate of the average number of days a piece of ditch-digging
equipment is rented out per person per time. The company has records of all
rentals, but the time required to audit all accounts would be prohibitive.
The owner decides to take a random sample of rental invoices. Fourteen
different rentals of ditch-diggers are selected randomly from the files,
yielding the following data. She uses these data to construct a 99%
confidence interval to estimate the average number of days that a
ditch-digger is rented and assumes that the number of days per rental
is normally distributed in the population.
3 1 3 2 5 1 2 1 4 2 1 3 1 1
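Solution:
$\bar{x} = 2.14$, $s = 1.29$, $n = 14$, $df = n - 1 = 13$
$$\frac{\alpha}{2} = \frac{1 - .99}{2} = 0.005$$
$$t_{.005,13} = 3.012$$
$$\bar{x} - t \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t \frac{s}{\sqrt{n}}$$
$$2.14 - 3.012 \frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012 \frac{1.29}{\sqrt{14}}$$
$$2.14 - 1.04 \le \mu \le 2.14 + 1.04$$
$$1.10 \le \mu \le 3.18$$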
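The sample statistics in this solution can be recomputed directly from the fourteen rental durations. The sketch below uses Python's `statistics` module for the mean and the sample standard deviation (n − 1 divisor) and takes the critical value 3.012 from the solution as given.

```python
from math import sqrt
from statistics import mean, stdev

# Rental durations (days) from the problem statement.
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]

n = len(days)
x_bar = mean(days)
s = stdev(days)      # sample standard deviation, divides by n - 1

# Critical value from the solution: t for alpha/2 = 0.005, df = 13.
t_crit = 3.012

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```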
Example:
There are 250 families in Scandia, Pennsylvania. A random sample of 40 of
these families revealed the mean annual church contribution was $450 and
the standard deviation of this was $75.
Develop a 90 percent confidence interval for the population mean.
Interpret the confidence interval.
Solution:
Given in problem:
N = 250
n = 40
s = $75
Since n/N = 40/250 = 0.16, the finite population correction factor must be used.
The population standard deviation is not known, so use the t-distribution (the z-distribution may also be used since n > 30).
Use the formula below to compute the confidence interval:
$$\bar{X} \pm t \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$
$$= \$450 \pm t_{.10,\,40-1} \frac{\$75}{\sqrt{40}} \sqrt{\frac{250-40}{250-1}}$$
$$= \$450 \pm 1.685 \frac{\$75}{\sqrt{40}} \sqrt{\frac{250-40}{250-1}}$$
$$= \$450 \pm \$19.98 \sqrt{.8434}$$
$$= \$450 \pm \$18.35$$
$$= (\$431.65, \$468.35)$$
It is likely that the population mean is more than $431.65 but less than $468.35.
To put it another way, could the population mean be $445? Yes, but it is not
likely that it is $425, because the value $445 is within the confidence
interval and $425 is not.
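Confidence Interval to Estimate the Population Proportion
$$\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
$p$ = population proportion
$n$ = sample size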
Problem:
A clothing company produces men's jeans. The jeans are made
and sold with either a regular cut or a boot cut. In an effort to
estimate the proportion of their men's jeans market in
Oklahoma City that prefers boot-cut jeans, the analyst takes a
random sample of 212 jeans sales from the company's two
Oklahoma City retail outlets. Only 34 of the sales were for boot-cut jeans.
Construct a 90% confidence interval to estimate the
proportion of the population in Oklahoma City who prefer boot-cut jeans.
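Solution:
$$n = 212, \quad x = 34, \quad \hat{p} = \frac{x}{n} = \frac{34}{212} = 0.16$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.16 = 0.84$$
90% confidence ⇒ z = 1.645
$$\hat{p} - z \sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.16 - 1.645 \sqrt{\frac{(0.16)(0.84)}{212}} \le p \le 0.16 + 1.645 \sqrt{\frac{(0.16)(0.84)}{212}}$$
$$0.16 - 0.04 \le p \le 0.16 + 0.04$$
$$0.12 \le p \le 0.20$$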
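A short sketch of the same calculation follows. The function name `proportion_interval` is hypothetical, and unlike the hand solution it keeps the unrounded sample proportion 34/212 throughout, which still rounds to the same limits.

```python
from math import sqrt


def proportion_interval(successes, n, z):
    """Return (lower, upper) for p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Jeans example: 34 boot-cut sales out of 212, 90% confidence (z = 1.645).
lower, upper = proportion_interval(successes=34, n=212, z=1.645)
print(f"{lower:.2f} <= p <= {upper:.2f}")
```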
Population Variance

Variance is an inverse measure of the group’s homogeneity.

Variance is an important indicator of total quality in standardized products and
services. Managers improve processes to reduce variance.

Variance is a measure of financial risk. Variance of rates of return helps managers
assess financial and capital investment alternatives.

Variability is a reality in global markets. Productivity, wages, and costs of living vary
between regions and nations.
Estimating the Population Variance

Population parameter: $\sigma^2$

Estimator of $\sigma^2$:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$\chi^2$ formula for single variance:
$$\chi^2 = \frac{(n - 1) s^2}{\sigma^2}$$
degrees of freedom = n - 1
Confidence Interval for σ²
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$df = n - 1$$
$$\alpha = 1 - \text{level of confidence}$$
90% Confidence Interval for σ²
$s^2 = .0022125$, $n = 8$, $df = n - 1 = 7$, $\alpha = .10$
$$\chi^2_{\alpha/2} = \chi^2_{.10/2} = \chi^2_{.05} = 14.0671$$
$$\chi^2_{1-\alpha/2} = \chi^2_{1-.10/2} = \chi^2_{.95} = 2.16735$$
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{(8-1)(.0022125)}{14.0671} \le \sigma^2 \le \frac{(8-1)(.0022125)}{2.16735}$$
$$.001101 \le \sigma^2 \le .007146$$
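The interval above can be checked with a few lines of code. In this sketch the two chi-square critical values are passed in as read from a table, since the Python standard library has no chi-square inverse; the function and parameter names are assumptions made for illustration.

```python
def variance_interval(s_squared, n, chi2_upper_tail, chi2_lower_tail):
    """Confidence interval for the population variance.

    chi2_upper_tail is the table value for alpha/2 (the larger value);
    chi2_lower_tail is the table value for 1 - alpha/2 (the smaller value).
    """
    df = n - 1
    lower = df * s_squared / chi2_upper_tail   # larger divisor gives the lower limit
    upper = df * s_squared / chi2_lower_tail   # smaller divisor gives the upper limit
    return lower, upper


# Example above: s^2 = .0022125, n = 8, chi-square values for df = 7 at .05 and .95.
lower, upper = variance_interval(0.0022125, 8, 14.0671, 2.16735)
print(f"{lower:.6f} <= sigma^2 <= {upper:.6f}")
```

Note that the interval is not symmetric about s², because the chi-square distribution is skewed.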
Problem:
The U.S. Bureau of Labor Statistics publishes data on the hourly
compensation costs for production workers in manufacturing for various
countries. The latest figures published for Greece show that the average
hourly wage for a production worker in manufacturing is $9.63. Suppose
the business council of Greece wants to know how consistent this figure is.
They randomly select 25 production workers in manufacturing from across
the country and determine that the standard deviation of hourly wages for
such workers is $1.12. Use this information to develop a 95% confidence
interval to estimate the population variance for the hourly wages of
production workers in manufacturing in Greece. Assume that the hourly
wages for production workers across the country in manufacturing are
normally distributed.
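Solution:
$s^2 = 1.2544$, $n = 25$, $df = n - 1 = 24$, $\alpha = .05$
$$\chi^2_{\alpha/2} = \chi^2_{.05/2} = \chi^2_{.025} = 39.3641$$
$$\chi^2_{1-\alpha/2} = \chi^2_{1-.05/2} = \chi^2_{.975} = 12.4011$$
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{(25-1)(1.2544)}{39.3641} \le \sigma^2 \le \frac{(25-1)(1.2544)}{12.4011}$$
$$0.7648 \le \sigma^2 \le 2.4277$$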
Problem:
The Interstate Conference of Employment Security Agencies
says the average workweek in the United States is down to only
35 hours, largely because of a rise in part-time workers.
Suppose this figure was obtained from a random sample of 20
workers and that the standard deviation of the sample was 4.3
hours. Assume hours worked per week are normally
distributed in the population. Use this sample information to
develop a 98% confidence interval for the population variance
of the number of hours worked per week for a worker.
Solution:
n = 20, s = 4.3, $s^2 = 18.49$, df = 20 - 1 = 19
98% C.I.
$$\chi^2_{.99,19} = 7.63273, \quad \chi^2_{.01,19} = 36.191$$
$$\frac{(20-1)(18.49)}{36.191} < \sigma^2 < \frac{(20-1)(18.49)}{7.63273}$$
$$9.71 < \sigma^2 < 46.03$$
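Determining Sample Size when Estimating μ

z formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Error of estimation (tolerable error):
$$E = \bar{x} - \mu$$

Estimated sample size:
$$n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} = \left( \frac{z_{\alpha/2}\, \sigma}{E} \right)^2$$

Estimated σ:
$$\sigma \approx \frac{1}{4} \text{range}$$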
Problem:
Suppose a researcher wants to estimate the average monthly
expenditure on bread by a family in Chicago. She wants to
be 90% confident of her results. How much error is she willing
to tolerate in the results? Suppose she wants the estimate
within $1.00 of the actual figure and the standard deviation of
average monthly bread purchases is $4.00. What is the sample
size estimation for this problem?
Sample Size When Estimating µ: Example
$E = 1$, $\sigma = 4$
90% confidence ⇒ z = 1.645
$$n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} = \frac{(1.645)^2 (4)^2}{1^2} = 43.30 \text{ or } 44$$
Question:
Suppose you want to estimate the average age of all Boeing 727
airplanes now in active domestic U.S. service. You want to be 95%
confident, and you want your estimate to be within two years of the
actual figure. The 727 was first placed in service about 30 years ago,
but you believe that no active 727s in the U.S. domestic fleet are more
than 25 years old. How large a sample should you take?
Solution:
$E = 2$, range = 25
95% confidence ⇒ z = 1.96
$$\text{estimated } \sigma = \frac{1}{4} \text{range} = \frac{1}{4}(25) = 6.25$$
$$n = \frac{z^2 \sigma^2}{E^2} = \frac{(1.96)^2 (6.25)^2}{2^2} = 37.52 \text{ or } 38$$
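Both sample-size problems follow the same pattern: compute (zσ/E)² and round up, since a fractional observation cannot be sampled. A minimal sketch, with the hypothetical function name `sample_size_for_mean`:

```python
from math import ceil


def sample_size_for_mean(z, sigma, E):
    """Smallest whole n with z * sigma / sqrt(n) <= E, that is ceil((z * sigma / E)^2)."""
    return ceil((z * sigma / E) ** 2)


# Bread expenditure problem: 90% confidence, sigma = 4, E = 1.
print(sample_size_for_mean(z=1.645, sigma=4, E=1))

# Boeing 727 problem: 95% confidence, sigma estimated as range / 4 = 25 / 4, E = 2.
print(sample_size_for_mean(z=1.96, sigma=25 / 4, E=2))
```

Rounding is always upward: rounding 43.30 down to 43 would leave the error of estimation slightly larger than the tolerated $1.00.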
Determining Sample Size when Estimating p



z formula:
$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$

Error of estimation (tolerable error):
$$E = \hat{p} - p$$

Estimated sample size:
$$n = \frac{z^2 pq}{E^2}$$
Question:
Hewitt Associates conducted a national survey to determine the extent to
which employers are promoting health and fitness among their employees.
One of the questions asked was, "Does your company offer on-site exercise
classes?" Suppose it was estimated before the study that no more than 40%
of the companies would answer "Yes." How large a sample would Hewitt
Associates have to take in estimating the population proportion to ensure a
98% confidence in the results and to be within .03 of the true population
proportion?
Solution:
$E = 0.03$
98% confidence ⇒ z = 2.33
estimated $p = 0.40$
$q = 1 - p = 0.60$
$$n = \frac{z^2 pq}{E^2} = \frac{(2.33)^2 (0.40)(0.60)}{(0.03)^2} = 1{,}447.7 \text{ or } 1{,}448$$
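Determining Sample Size when Estimating p with No Prior Information

p       pq
0.5     0.25
0.4     0.24
0.3     0.21
0.2     0.16
0.1     0.09

[Figure: required sample size n (vertical axis, 0 to 400) plotted against p (horizontal axis, 0 to 1) for z = 1.96 and E = 0.05.]

With no prior estimate of p, use p = 0.5, so that pq = 1/4:
$$n = \frac{1}{4} \cdot \frac{z^2}{E^2}$$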
Example: Determining n when Estimating p with No Prior Information
$E = 0.05$
90% confidence ⇒ z = 1.645
With no prior estimate of p, use $p = 0.50$
$q = 1 - p = 0.50$
$$n = \frac{z^2 pq}{E^2} = \frac{(1.645)^2 (0.50)(0.50)}{(0.05)^2} = 270.6 \text{ or } 271$$
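The reason p = 0.50 is the safe choice is that the product pq is largest there, so it yields the largest required n for a given z and E. The sketch below tabulates pq and applies the resulting rule n = z²/(4E²); the function name `conservative_sample_size` is hypothetical.

```python
from math import ceil

# p * q is largest at p = 0.5, so p = 0.5 gives the most conservative n.
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"p = {p:.1f}  pq = {p * (1 - p):.2f}")


def conservative_sample_size(z, E):
    """Sample size with no prior estimate of p, using p = q = 0.5 (pq = 1/4)."""
    return ceil(z ** 2 / (4 * E ** 2))


print(conservative_sample_size(z=1.645, E=0.05))   # the 90% example above
print(conservative_sample_size(z=1.96, E=0.05))    # 95% confidence, same E
```

If a prior estimate of p is available and is far from 0.5, using it in the general formula gives a smaller required sample.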