Statistical Inference: Estimation for Single Populations
Learning Objectives
Estimate a population mean from a sample mean when σ is known.
Estimate a population mean from a sample mean when σ is unknown.
Estimate a population proportion using the z statistic.
Use the chi-square distribution to estimate the population variance given the sample variance.
Determine the sample size needed to estimate the population mean and population proportion.
Statistical Estimation
A confidence interval, or interval estimate, is a range of values constructed from sample data so that the population parameter is likely to lie within that range with a specified probability. The specified probability is called the level of confidence.
Interval estimate: a range of values calculated from a sample statistic and a standardized statistic, such as z.
The choice of standardized statistic is determined by the sampling distribution.
The critical values of the standardized statistic are determined by the desired level of confidence.
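This minimal Python sketch shows how the desired level of confidence fixes the critical value. It uses only the standard library (Python 3.8+ for `statistics.NormalDist`) and computes z_{α/2} from the inverse standard normal CDF instead of reading it from a table. The helper name `z_critical` is hypothetical and chosen for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}, the two-sided critical value for a confidence level."""
    alpha = 1.0 - confidence
    # The area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Printed tables round these values. That is why 2.33 is commonly used for 98% confidence even though the unrounded value is about 2.326.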
Estimating the Population Mean
A point estimate is a statistic taken from a sample that is used to estimate a population parameter.
Interval estimate: a range of values within which the analyst can declare, with some confidence, that the population parameter lies.
Factors Affecting Confidence Interval Estimates
The factors that determine the width of a confidence interval are:
1. The sample size, n.
2. The variability in the population, σ, usually estimated by s.
3. The desired level of confidence.
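The effect of each factor can be checked directly from the width of the known-σ z interval, 2·z_{α/2}·σ/√n. This sketch assumes σ is known. Its baseline values (σ = 10, n = 25, 95% confidence) are made up purely for illustration, and `interval_width` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Full width of the z interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

base = interval_width(sigma=10, n=25, confidence=0.95)
print(f"baseline (sigma=10, n=25, 95%):   {base:.2f}")
print(f"four times the sample size (n=100): {interval_width(10, 100, 0.95):.2f}")
print(f"twice the variability (sigma=20): {interval_width(20, 25, 0.95):.2f}")
print(f"higher confidence (99%):          {interval_width(10, 25, 0.99):.2f}")
```

Quadrupling n halves the width, doubling σ doubles it, and raising the confidence level widens it.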
Point and Interval Estimates
A point estimate is a single number; a confidence interval provides additional information about variability.
[Figure: width of confidence interval]
Confidence Intervals
How much uncertainty is associated with a point estimate of a population parameter?
An interval estimate provides more information about a population characteristic than does a point estimate.
Such interval estimates are called confidence intervals.
Interval Estimates - Interpretation
For a 95% confidence interval, about 95% of similarly constructed intervals will contain the parameter being estimated. Equivalently, 95% of the sample means for a specified sample size will lie within 1.96 standard errors (σ/√n) of the population mean.
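The repeated-sampling interpretation can be illustrated with a small simulation. This sketch assumes a normal population with μ = 50 and σ = 10, both arbitrary illustrative values. It draws many samples of size 30, builds a 95% z interval around each sample mean, and counts how often the interval captures μ. The function `coverage` and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=30, trials=2000, confidence=0.95, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(f"observed coverage: {coverage():.3f}")
```

The observed fraction varies slightly with the seed and the number of trials, but it should stay near 0.95.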
Estimation Process
[Figure: a random sample is drawn from a population whose mean, μ, is unknown. The sample mean is x̄ = 50, and the resulting statement is "I am 95% confident that μ is between 40 & 60."]
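Confidence Interval to Estimate μ when σ is Known
Point estimate: x̄
Interval estimate:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$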
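Distribution of Sample Means for 95% Confidence
[Figure: normal distribution of sample means x̄ centered at μ (z = 0). The central 95% of the area lies between z = -1.96 and z = 1.96 (.4750 on each side of the mean), with .025 in each tail.]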
Estimating the Population Mean
For a 95% confidence interval:
α = 0.05
α/2 = 0.025
To find z.025, use the standard normal distribution table, which gives the area between the mean and z: .5000 - .0250 = .4750.
In Table A5, locate 0.4750 in the body of the table and read z = 1.96 from the corresponding row and column.
Estimating the Population Mean
α is used to locate the z value used in constructing the confidence interval.
The confidence interval yields a range within which the researcher can feel, with some confidence, that the population mean is located.
z score: the number of standard deviations a value (x) lies above or below the mean of a set of numbers when the data are normally distributed. For a sample mean, the relevant standard deviation is the standard error σ/√n:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
95% Confidence Intervals for μ
[Figure: 95% confidence intervals built around several sample means x̄, shown relative to μ.]
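95% Confidence Interval for μ
x̄ = 1300, σ = 160, n = 85, z_{α/2} = 1.96
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$1300 - 1.96\frac{160}{\sqrt{85}} \le \mu \le 1300 + 1.96\frac{160}{\sqrt{85}}$$
$$1300 - 34.01 \le \mu \le 1300 + 34.01$$
$$1265.99 \le \mu \le 1334.01$$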
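The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch of the known-σ formula; the function name `z_interval` is hypothetical.

```python
import math

def z_interval(xbar, sigma, n, z):
    """Confidence interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=1300, sigma=160, n=85, z=1.96)
print(f"{low:.2f} <= mu <= {high:.2f}")  # 1265.99 <= mu <= 1334.01
```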
Problem:
A survey was taken of U.S. companies that do
business with firms in India. One of the questions
on the survey was: Approximately how many years
has your company been trading with firms in India?
A random sample of 44 responses to this question
yielded a mean of 10.455 years. Suppose the
population standard deviation for this question
is 7.7 years. Using this information, construct a 90%
confidence interval for the mean number of years that
a company has been trading in India for the population
of U.S. companies trading with firms in India.
Demonstration Problem 8.1
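x̄ = 10.455, σ = 7.7, n = 44
90% confidence: z = 1.645
$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$10.455 - 1.645\frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\frac{7.7}{\sqrt{44}}$$
$$10.455 - 1.91 \le \mu \le 10.455 + 1.91$$
$$8.545 \le \mu \le 12.365$$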
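Demonstration Problem 8.1 can be checked the same way. As an illustrative addition that is not part of the original solution, this sketch also compares the table value z = 1.645 with the unrounded 90% critical value from Python's `statistics.NormalDist`. The two intervals differ only in the third decimal place.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 10.455, 7.7, 44
std_error = sigma / math.sqrt(n)

z_table = 1.645                       # 90% value read from the z table
z_exact = NormalDist().inv_cdf(0.95)  # 90% two-sided leaves .05 in each tail

for label, z in (("table z", z_table), ("exact z", z_exact)):
    margin = z * std_error
    print(f"{label} = {z:.4f}: {xbar - margin:.3f} <= mu <= {xbar + margin:.3f}")
```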
Problem:
A study is conducted in a company that employs
800 engineers. A random sample of 50 engineers
reveals that the average sample age is 34.3 years.
Historically, the population standard deviation of
the age of the company’s engineers is
approximately 8 years. Construct a 98%
confidence interval to estimate the average age
of all the engineers in this company.
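Solution:
x̄ = 34.3, σ = 8, N = 800, n = 50
98% confidence: z = 2.33
Because the sample is a sizable fraction of a finite population (50/800 = 6.25%), the finite population correction factor is applied:
$$\bar{x} - z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
$$34.3 - 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} \le \mu \le 34.3 + 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}}$$
$$34.3 - 2.554 \le \mu \le 34.3 + 2.554$$
$$31.75 \le \mu \le 36.85$$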
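This minimal sketch computes the interval with the finite population correction factor and reproduces the 98% example above. The function name `z_interval_fpc` is hypothetical.

```python
import math

def z_interval_fpc(xbar, sigma, n, N, z):
    """z interval for mu with the finite population correction sqrt((N - n)/(N - 1))."""
    fpc = math.sqrt((N - n) / (N - 1))
    margin = z * sigma / math.sqrt(n) * fpc
    return xbar - margin, xbar + margin, margin

low, high, margin = z_interval_fpc(xbar=34.3, sigma=8, n=50, N=800, z=2.33)
print(f"margin = {margin:.3f}; {low:.2f} <= mu <= {high:.2f}")  # margin = 2.554; 31.75 <= mu <= 36.85
```

Without the correction factor, the margin would be 2.33 × 8/√50 ≈ 2.636, so the correction narrows the interval slightly.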
Estimating the Mean of a Normal Population: Sample Size Is Small and Population Standard Deviation Is Unknown
The distribution of sample means is approximately normal if the population has a normal distribution.
The z formulas can be used to estimate a population mean if the value of the population standard deviation is known.
t Distribution
A family of distributions: a unique distribution for each value of its parameter, degrees of freedom (df).
The t distribution is used instead of the z distribution for inferential statistics on the population mean when the population standard deviation is unknown and the population is normally distributed.
With the t distribution, you use the sample standard deviation, s.
t formula:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
t Distribution Characteristics
The t distribution is symmetric and unimodal with mean 0. t distributions are flatter in the middle and have more area in their tails than the normal distribution.
The t distribution approaches the normal curve as n becomes larger.
The t distribution is to be used when the population variance or population standard deviation is unknown, regardless of the size of the sample.
Reading the t Distribution
The t table uses the area in the tail of the distribution.
The emphasis in the t table is on α. When confidence intervals are constructed, each tail of the distribution contains α/2 of the area under the curve.
t values are located at the intersection of the df value and the selected α/2 value.
Confidence Intervals for μ of a Normal Population: Unknown σ
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
or
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
df = n - 1
Table of Critical Values of t (area in the upper tail)
df    t0.100  t0.050  t0.025  t0.010  t0.005
1     3.078   6.314   12.706  31.821  63.656
2     1.886   2.920   4.303   6.965   9.925
3     1.638   2.353   3.182   4.541   5.841
4     1.533   2.132   2.776   3.747   4.604
5     1.476   2.015   2.571   3.365   4.032
...
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
...
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.327   2.576
[Figure: t distribution with the upper-tail area α shaded to the right of t = 1.711.]
With df = 24 and an upper-tail area of α = 0.05, t = 1.711.
Problem
The owner of a large equipment rental company wants to make a
rather quick estimate of the average number of days a piece of
ditch digging equipment is rented out per person per time. The
company has records of all rentals, but the amount of time
required to conduct an audit of all accounts would be prohibitive.
The owner decides to take a random sample of rental invoices.
Fourteen different rentals of ditch diggers are selected randomly
from the files, yielding the following data. She uses these data to
construct a 99% confidence interval to estimate the average
number of days that a ditch digger is rented and assumes that the
number of days per rental is normally distributed in the
population.
3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1
Solution for Demonstration Problem 8.3
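x̄ = 2.14, s = 1.29, n = 14, df = n - 1 = 13
1 - α = .99, so α/2 = 0.005
t.005,13 = 3.012
$$\bar{x} - t\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t\frac{s}{\sqrt{n}}$$
$$2.14 - 3.012\frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\frac{1.29}{\sqrt{14}}$$
$$2.14 - 1.04 \le \mu \le 2.14 + 1.04$$
$$1.10 \le \mu \le 3.18$$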
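The solution can be reproduced from the raw rental data with Python's `statistics` module. `statistics.stdev` uses the n - 1 divisor, which matches s. The t value is taken from the table rather than computed, so this is a sketch of the arithmetic only.

```python
import math
from statistics import mean, stdev

days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]
n = len(days)
xbar = mean(days)
s = stdev(days)   # sample standard deviation (divides by n - 1)
t = 3.012         # t_.005,13 from the t table (99% confidence, df = 13)

margin = t * s / math.sqrt(n)
print(f"xbar = {xbar:.2f}, s = {s:.2f}")               # xbar = 2.14, s = 1.29
print(f"{xbar - margin:.2f} <= mu <= {xbar + margin:.2f}")  # 1.10 <= mu <= 3.18
```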
Confidence Interval to Estimate the Population Proportion
Estimates of the population proportion often must be made.
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where:
p̂ = sample proportion
q̂ = 1 - p̂
p = population proportion
n = sample size
Demonstration Problem 8.5
A clothing company produces men’s jeans. The jeans
are made and sold with either a regular cut or a boot
cut. In an effort to estimate the proportion of their
men’s jeans market in Oklahoma City that prefers
boot-cut jeans, the analyst takes a random sample
of 423 jeans sales from the company’s two Oklahoma
City retail outlets. Only 72 of the sales were for
boot-cut jeans. Construct a 90% confidence interval
to estimate the proportion of the population in
Oklahoma City who prefer boot-cut jeans.
Solution for Demonstration Problem 8.5
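n = 423, x = 72
$$\hat{p} = \frac{x}{n} = \frac{72}{423} = 0.17$$
$$\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$$
90% confidence: z = 1.645
$$\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.17 - 1.645\sqrt{\frac{(0.17)(0.83)}{423}} \le p \le 0.17 + 1.645\sqrt{\frac{(0.17)(0.83)}{423}}$$
$$0.17 - 0.03 \le p \le 0.17 + 0.03$$
$$0.14 \le p \le 0.20$$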
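This minimal sketch applies the proportion interval to Demonstration Problem 8.5; `proportion_interval` is a hypothetical helper name. It keeps p̂ = 72/423 unrounded, which gives the same interval to two decimals.

```python
import math

def proportion_interval(successes, n, z):
    """Interval for p: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=72, n=423, z=1.645)
print(f"{low:.2f} <= p <= {high:.2f}")  # 0.14 <= p <= 0.20
```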
Estimating the Population Variance
Population parameter: σ²
Estimator of σ²:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
χ² formula for a single variance:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
degrees of freedom = n - 1
Confidence Interval for σ²
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
df = n - 1
α = 1 - level of confidence
Two Table Values of χ²
[Figure: chi-square distribution with df = 7, horizontal axis from 0 to 20. An area of .05 lies to the left of χ².95 = 2.16735 and an area of .05 lies to the right of χ².05 = 14.0671.]
Critical values of χ² (area in the upper tail)
df    χ².950        χ².050
1     3.93219E-03   3.84146
2     0.102586      5.99148
3     0.351846      7.81472
4     0.710724      9.48773
5     1.145477      11.07048
6     1.63538       12.5916
7     2.16735       14.0671
8     2.73263       15.5073
9     3.32512       16.9190
10    3.94030       18.3070
...
20    10.8508       31.4104
21    11.5913       32.6706
22    12.3380       33.9245
23    13.0905       35.1725
24    13.8484       36.4150
25    14.6114       37.6525
90% Confidence Interval for σ²
s² = .0022125, n = 8, df = n - 1 = 7, α = .10
$$\chi^2_{\alpha/2} = \chi^2_{.10/2} = \chi^2_{.05} = 14.0671$$
$$\chi^2_{1-\alpha/2} = \chi^2_{1-.10/2} = \chi^2_{.95} = 2.16735$$
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{(8-1)(.0022125)}{14.0671} \le \sigma^2 \le \frac{(8-1)(.0022125)}{2.16735}$$
$$.001101 \le \sigma^2 \le .007146$$
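The variance interval is simple arithmetic once the two χ² values have been read from the table. In this sketch, the χ² values are passed in rather than computed, because Python's standard library has no chi-square quantile function. The helper name `variance_interval` is hypothetical.

```python
def variance_interval(s2, n, chi2_right, chi2_left):
    """Interval for sigma^2: (n-1)s^2 / chi2_right <= sigma^2 <= (n-1)s^2 / chi2_left.

    chi2_right: table value with alpha/2 in the upper tail (the larger value).
    chi2_left:  table value with 1 - alpha/2 in the upper tail (the smaller value).
    """
    numerator = (n - 1) * s2
    return numerator / chi2_right, numerator / chi2_left

low, high = variance_interval(s2=0.0022125, n=8, chi2_right=14.0671, chi2_left=2.16735)
print(f"{low:.6f} <= sigma^2 <= {high:.6f}")  # 0.001101 <= sigma^2 <= 0.007146
```

Note that the larger χ² value produces the lower limit, because it sits in the denominator.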
Demonstration Problem 8.6
The U.S. Bureau of Labor Statistics publishes data on the hourly
compensation costs for production workers in manufacturing
for various countries. The latest figures published for Greece
show that the average hourly wage for a production worker in
manufacturing is $19.58. Suppose the business council of
Greece wants to know how consistent this figure is. They
randomly select 25 production workers in manufacturing from
across the country and determine that the standard deviation
of hourly wages for such workers is $1.12. Use this information
to develop a 95% confidence interval to estimate the
population variance for the hourly wages of production
workers in manufacturing in Greece. Assume that the hourly
wages for production workers across the country in
manufacturing are normally distributed.
Solution for Demonstration Problem 8.6
s = 1.12, s² = 1.2544, n = 25, df = n - 1 = 24, α = .05
$$\chi^2_{\alpha/2} = \chi^2_{.05/2} = \chi^2_{.025} = 39.3641$$
$$\chi^2_{1-\alpha/2} = \chi^2_{1-.05/2} = \chi^2_{.975} = 12.40115$$
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
$$\frac{(25-1)(1.2544)}{39.3641} \le \sigma^2 \le \frac{(25-1)(1.2544)}{12.40115}$$
$$0.7648 \le \sigma^2 \le 2.4276$$
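This sketch applies the same arithmetic to Demonstration Problem 8.6, using the table χ² values above. As an aside that is not part of the original solution, it also takes square roots of the variance limits to get the corresponding interval for the population standard deviation σ.

```python
import math

s = 1.12              # sample standard deviation of hourly wages ($)
n = 25
chi2_right = 39.3641  # chi-square_.025 with df = 24 (from the table)
chi2_left = 12.40115  # chi-square_.975 with df = 24 (from the table)

numerator = (n - 1) * s ** 2
var_low, var_high = numerator / chi2_right, numerator / chi2_left
print(f"{var_low:.4f} <= sigma^2 <= {var_high:.4f}")  # 0.7648 <= sigma^2 <= 2.4276

# Square roots of the variance limits give the corresponding limits for sigma.
print(f"{math.sqrt(var_low):.4f} <= sigma <= {math.sqrt(var_high):.4f}")
```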
Determining Sample Size when Estimating μ
It may be necessary to estimate the sample size when planning a project.
In studies where μ is being estimated, the size of the sample can be determined by using the z formula for sample means to solve for n.
The difference between x̄ and μ is the error of estimation.
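z formula:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Error of estimation (tolerable error):
$$E = \bar{x} - \mu = z\frac{\sigma}{\sqrt{n}}$$
Estimated sample size:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
Estimated σ:
$$\sigma \approx \tfrac{1}{4}\,\text{range}$$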
Sample Size When Estimating μ: Example
E = 1, σ = 4
90% confidence: z = 1.645
$$n = \frac{z^2\sigma^2}{E^2} = \frac{(1.645)^2(4)^2}{1^2} = 43.30 \text{ or } 44$$
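The sample-size formula translates directly into code. Note the rounding up: rounding down would give a margin of error slightly larger than E. The helper name `sample_size_mean` is hypothetical.

```python
import math

def sample_size_mean(z, sigma, error):
    """n = (z * sigma / E)^2, rounded up to the next whole observation."""
    return math.ceil((z * sigma / error) ** 2)

print(sample_size_mean(z=1.645, sigma=4, error=1))  # 44
```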
Demonstration Problem 8.7
Suppose you want to estimate the average age of all
Boeing 737-300 airplanes now in active domestic U.S.
service. You want to be 95% confident, and you want
your estimate to be within one year of the actual
figure. The 737-300 was first placed in service about
24 years ago, but you believe that no active 737-300s
in the U.S. domestic fleet are more than 20 years old.
How large of a sample should you take?
Solution for Demonstration Problem 8.7
E = 1, range = 20
95% confidence: z = 1.96
Estimated σ:
$$\sigma \approx \tfrac{1}{4}\,\text{range} = \tfrac{1}{4}(20) = 5$$
$$n = \frac{z^2\sigma^2}{E^2} = \frac{(1.96)^2(5)^2}{1^2} = 96.04 \text{ or } 97$$
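Demonstration Problem 8.7 combines the sample-size formula with the range rule for estimating σ. Here is a minimal sketch that uses the hypothetical helper `sigma_from_range`.

```python
import math

def sigma_from_range(value_range):
    """Rough planning estimate of sigma: one-fourth of the range."""
    return value_range / 4

sigma = sigma_from_range(20)         # ages from 0 to 20 years -> sigma about 5
n_exact = (1.96 * sigma / 1) ** 2    # E = 1 year, 95% confidence
print(f"n = {n_exact:.2f}, take {math.ceil(n_exact)}")  # n = 96.04, take 97
```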
Determining Sample Size when Estimating p
z formula:
$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$
Error of estimation (tolerable error):
$$E = \hat{p} - p$$
Estimated sample size:
$$n = \frac{z^2\,pq}{E^2}$$
Demonstration Problem 8.8
Hewitt Associates conducted a national survey to
determine the extent to which employers are
promoting health and fitness among their employees.
One of the questions asked was, Does your company
offer on-site exercise classes? Suppose it was
estimated before the study that no more than 40% of
the companies would answer Yes. How large a sample
would Hewitt Associates have to take in estimating
the population proportion to ensure a 98% confidence
in the results and to be within .03 of the true
population proportion?
Solution for Demonstration Problem 8.8
E = 0.03
98% confidence: z = 2.33
Estimated p = 0.40
q = 1 - p = 0.60
$$n = \frac{z^2\,pq}{E^2} = \frac{(2.33)^2(0.40)(0.60)}{(0.03)^2} = 1{,}447.7 \text{ or } 1{,}448$$
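This minimal sketch applies the proportion sample-size formula to Demonstration Problem 8.8; `sample_size_proportion` is a hypothetical helper name.

```python
import math

def sample_size_proportion(z, p, error):
    """n = z^2 * p * q / E^2 with q = 1 - p, rounded up to a whole number."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / error ** 2)

print(sample_size_proportion(z=2.33, p=0.40, error=0.03))  # 1448
```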