Quantitative Methods
Varsha Varde
Estimation
Contents.
1. Introduction
2. Point Estimators and Their Properties
3. Single Quantitative Population
4. Single Binomial Population
5. Two Quantitative Populations
6. Two Binomial Populations
7. Choosing the Sample Size
• Statistical inference has two main
branches:
• Estimation
• Hypothesis testing.
Estimation
• The objective of statistical estimation is to
estimate the value of some unknown
population parameter on the basis of the
sample drawn from this population.
• Estimation is of two types:
• (i) point estimation, which gives a single-valued
estimate of the population parameter,
and
• (ii) interval estimation, which provides a
range of values bounded by two numbers.
Point Estimate
• Sample statistics are used as estimates of population
parameters.
• The sample mean (x̄) is used as an estimate of the population
mean (µ).
• The sample standard deviation (s) is used as an estimate of the
population standard deviation (σ), where s is as shown in
the next slide.
• The proportion of items in a sample (p̄) with a given
characteristic is taken as an estimate of the proportion
of such items in the population (p).
• Such estimates are called point estimates because
they provide a single-valued estimate of the parameter.
Sample Standard Deviation
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
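The sample standard deviation divides the sum of squared deviations from the sample mean by n - 1 and takes the square root. A minimal Python sketch follows; the helper name `sample_std` and the data values are hypothetical and do not come from the slides.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = sample_std(data)

# The standard library's statistics.stdev uses the same n - 1 divisor.
print(s, statistics.stdev(data))
```

Both printed values should agree, since `statistics.stdev` also computes the sample (not population) standard deviation.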
Interval Estimate
• Interval Estimate: It is an estimate that
includes a range of values in which a
population parameter is expected to lie.
• The population parameter would be within this
interval not with certainty but with a specified
probability.
• This probability is known as the level of
confidence. Commonly used levels of
confidence are 0.90, 0.95 and 0.99.
Correspondingly, the levels of significance
are 0.10, 0.05 and 0.01.
Interval Estimate
• Confidence Level: It is the probability with which the
population parameter lies within the stated interval of
values. The stated interval of values is known as the
confidence interval.
• Confidence Interval: It gives an interval of values,
centered on sample statistics, in which the
population parameter is expected to lie with a known
level of confidence
• When we say that a 95% confidence interval for the
population parameter is (2.5, 3.5), we mean that the
procedure used to construct the interval captures the
true value of the population parameter in 95% of
repeated samples. In other words, we can expect to be
right in this assertion 95% of the time and wrong 5%
of the time.
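The "right 95% of the time" reading can be illustrated by simulation. The sketch below assumes a normal population with a known standard deviation and made-up values for µ, σ and n (none of these come from the slides); `coverage_rate` is a hypothetical helper name. It counts how often an interval of the form x̄ ± 1.96 σ/√n contains the true mean.

```python
import math
import random


def coverage_rate(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bound = z * sigma / math.sqrt(n)  # margin of error, same for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - bound <= mu <= x_bar + bound:
            hits += 1
    return hits / trials


# Illustrative (assumed) population and sample size.
rate = coverage_rate(mu=3.0, sigma=1.0, n=50, z=1.96, trials=2000)
print(rate)  # expected to be close to 0.95
```

Any single interval either contains µ or does not; the 95% describes the long-run share of intervals that do.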
Example
Type of Estimates: Point or Interval
• The average age of a bank clerk is 27
• We are 95% confident that the average age
of a bank clerk is in the range of 22 to 30
• The proportion of customers who experience
helping attitude of bank employees is 60%
• We are 90% confident that the proportion of
customers who experience helping attitude of
bank employees is in the range of 50% to
70%.
Desired Properties of Point Estimators.
• (i) Unbiased: Mean of the sampling
distribution is equal to the parameter.
• (ii) Efficient: Minimum variance, Small
standard error of point estimator.
• (iii) Consistent: The error of estimation (the
distance between a parameter and its
point estimate) decreases as the sample size
increases.
• (iv) Sufficient: Maximum usage of sample
information.
Desired Properties of Interval Estimators.
• Confidence level, (1 - α)100%,
should be as high as possible.
(α)100% is the level of significance and
(1 - α)100% is the level of confidence.
α usually equals 0.10, 0.05 or 0.01.
• Margin of error or precision (bound
on the error of estimation) should be
as small as possible.
Parameters of Interest.
• Single Population: µ (mean of the population)
• Single Population: p (population proportion)
• Two Populations: µ1, µ2 (means of the two
populations)
• Two Populations: p1, p2 (proportions in the two
populations)
Single Quantitative Population
• Parameter of interest: µ
• Sample data: n, x̄, s
• Other information: (1 - α)100% level
of confidence
• Point estimator of µ: x̄
• Mean of x̄: E(x̄) = µ_x̄ = µ
• Standard error of x̄: SE(x̄) = σ/√n (also
denoted as σ_x̄)
Single Quantitative Population
• Confidence Interval (C.I.) for µ:
• x̄ ± z_{α/2} σ/√n (point estimate ± bound)
• Confidence level: (1 - α)100%, which is the probability
that the interval estimator contains the parameter.
• z_{α/2} = 1.96 for 95% level of confidence
• z_{α/2} = 1.645 for 90% level of confidence
• z_{α/2} = 2.58 for 99% level of confidence
• Margin of Error (or Bound on the Error
of Estimation): B = z_{α/2} σ/√n
• Width of the Confidence Interval (C.I.):
W = 2 z_{α/2} σ/√n
• Assumptions.
• 1. Large sample (n ≥ 30)
Examples
• Example 1. We are interested in
estimating the mean number of
unoccupied seats per flight, µ, for a major
airline. A random sample of n = 225 flights
shows that the sample mean is 11.6 and
the standard deviation is 4.1.
• Data summary: n = 225; x̄ = 11.6; s = 4.1.
• Question 1. What is the point estimate of
µ (do not give the margin of error)?
• x̄ = 11.6
Example
• Question 2. Give a 95% bound on the
error of estimation (also known as the
margin of error).
• B = z_{α/2} σ/√n, with σ estimated by s = 4.1
• = 1.96 × 4.1/√225 = 0.5357
• Question 3. Find a 90% confidence
interval for µ.
• x̄ ± z_{α/2} σ/√n
• 11.6 ± 1.645 × 4.1/√225
• 11.6 ± 0.45 = (11.15, 12.05)
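The calculations for the flight example can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names `z_critical` and `mean_confidence_interval` are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than the rounded table values (1.645, 1.96), so results agree with the slide figures only after rounding.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_confidence_interval(x_bar, s, n, confidence):
    """Large-sample interval x_bar +/- z * s / sqrt(n); s estimates sigma."""
    bound = z_critical(confidence) * s / math.sqrt(n)
    return x_bar - bound, x_bar + bound, bound


# Example 1 data: n = 225, sample mean 11.6, sample standard deviation 4.1.
low, high, bound = mean_confidence_interval(11.6, 4.1, 225, 0.90)
_, _, bound_95 = mean_confidence_interval(11.6, 4.1, 225, 0.95)

print(round(low, 2), round(high, 2))  # 90% interval
print(round(bound_95, 4))             # 95% margin of error
```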
Example
• Question 4. Interpret the CI found in Question 3.
• We are 90% confident that the interval (11.15, 12.05)
contains the true value of the population parameter µ:
in repeated sampling, about 90% of the intervals
constructed this way would contain µ.
• Question 5. What is the width of the CI found in
Question 3?
• The width of the CI is
• W = 2 z_{α/2} σ/√n
• W = 2(0.45) = 0.90
• OR
• W = 12.05 - 11.15 = 0.90
Example
• Question 6. If n, the sample size, is increased,
what happens to the width of the CI?
What happens to the margin of error?
• The width of the CI decreases.
• The margin of error decreases.
• Sample size:
• n ≈ (z_{α/2})² σ² / B²
• where σ is estimated by s.
• Note: In the absence of data, σ is sometimes
approximated by R/4, where R is the range.
Example
• Example 2. Suppose you want to construct a
99% CI for µ so that W = 0.05. You are told that
preliminary data shows a range from 13.3 to
13.7. What sample size should you choose?
• Data summary: α = 0.01; R = 13.7 - 13.3 = 0.4;
• so σ ≈ 0.4/4 = 0.1. Now
• B = W/2 = 0.05/2 = 0.025. Therefore
• n = (z_{α/2})² σ² / B² = (2.58)² (0.1)² / (0.025)² = 106.50.
• So n = 107 (round up).
• Exercise 1. Find the sample size necessary to
reduce W in the flight example to 0.6. Use
α= 0.05.
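The sample-size calculation in Example 2 can be sketched in Python as follows. The function name `sample_size_for_mean` is hypothetical; the inputs are the figures quoted in the example (z = 2.58, range 13.3 to 13.7, W = 0.05), with σ approximated by R/4.

```python
import math


def sample_size_for_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. n >= (z*sigma/bound)^2."""
    return math.ceil((z * sigma / bound) ** 2)


sigma_guess = (13.7 - 13.3) / 4   # range / 4 approximation to sigma
bound = 0.05 / 2                  # B = W / 2
n = sample_size_for_mean(2.58, sigma_guess, bound)
print(n)
```

Rounding up (rather than to the nearest integer) keeps the margin of error at or below the requested bound.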
Single Binomial Population
• Parameter of interest: p
• Sample data: n, x, p̄ = x/n (x here is the number of
occurrences of a particular event in n trials).
• Other information: α, level of significance
• Point estimator: p̄
• Mean of p̄: µ_p̄ = p
• Standard error of p̄: σ_p̄ = √(pq/n), where q = 1 - p
• Confidence Interval (C.I.) for p: p̄ ± z_{α/2} √(p̄q̄/n)
• Confidence level: (1 - α)100%, which is the probability that the
interval estimator contains the parameter.
• Margin of Error: B = z_{α/2} √(p̄q̄/n)
• Assumptions.
• 1. Large sample (np ≥ 5; nq ≥ 5)
• 2. Sample is randomly selected
Example
• Example 3. A random sample of n = 484
voters in a community produced x = 257
voters in favor of candidate A.
• Data summary: n = 484; x = 257;
• p̄ = x/n = 257/484 = 0.531.
• Question 1. Do we have a large sample
size?
• np̄ = 484(0.531) = 257, which is ≥ 5.
• nq̄ = 484(0.469) = 227, which is ≥ 5.
• Therefore we have a large sample size.
Example
• Question 2. What is the point estimate of
p and its margin of error at 95% level of
confidence?
• p̄ = x/n = 257/484 = 0.531
• B = z_{α/2} √(p̄q̄/n)
• = 1.96 √((0.531)(0.469)/484)
• = 0.044
• Question 3. Find a 90% confidence
interval for p.
• p̄ ± z_{α/2} √(p̄q̄/n)
• = 0.531 ± 1.645 √((0.531)(0.469)/484)
• = 0.531 ± 0.037 = (0.494, 0.568)
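The voter example can be checked with a short Python sketch. The helper name `proportion_confidence_interval` is hypothetical; the data (x = 257, n = 484) and the table values z = 1.645 and z = 1.96 are those used in the example.

```python
import math


def proportion_confidence_interval(x, n, z):
    """Large-sample interval p_bar +/- z * sqrt(p_bar * q_bar / n)."""
    p_bar = x / n
    q_bar = 1.0 - p_bar
    # Large-sample condition from the slides: n*p_bar >= 5 and n*q_bar >= 5.
    if n * p_bar < 5 or n * q_bar < 5:
        raise ValueError("sample too small for the normal approximation")
    bound = z * math.sqrt(p_bar * q_bar / n)
    return p_bar - bound, p_bar + bound, bound


low, high, bound_90 = proportion_confidence_interval(257, 484, 1.645)
_, _, bound_95 = proportion_confidence_interval(257, 484, 1.96)

print(round(low, 3), round(high, 3))  # 90% interval
print(round(bound_95, 3))             # 95% margin of error
```

Note that the square root applies to the whole ratio p̄q̄/n, not just to the numerator.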
Example
• Question 4. What is the width of the CI
found in Question 3?
• The width of the CI is
• W = 2 z_{α/2} √(p̄q̄/n) = 2(0.037) = 0.074
• Question 5. Interpret the CI found in
Question 3.
• We are 90% confident that the interval contains p.
Equivalently, if repeated sampling is used,
then 90% of the CIs constructed would contain
p.
Example
• Question 6. If n, the sample size, is
increased,
• what happens to the width of the CI?
• what happens to the margin of error?
• The width of the CI decreases.
• The margin of error decreases.
• Sample size:
• n ≈ (z_{α/2})² (p̄q̄) / B²
• Note: In the absence of data, choose p̄ = q̄
= 0.5, or simply p̄q̄ = 0.25.
Example
• Example 4. Suppose you want to provide an
accurate estimate of the proportion of customers preferring one
brand of coffee over another. You need to
construct a 95% CI for p so that B = 0.015.
• You are told that preliminary data shows p̄ =
0.35. What sample size should you choose?
Use α = 0.05.
• Data summary: α = 0.05; p̄ = 0.35; B = 0.015
• n = (z_{α/2})² (p̄q̄) / B² = (1.96)² (0.35)(0.65) / (0.015)²
= 3,884.28
• So n = 3,885 (round up).
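The sample-size arithmetic of Example 4 can be sketched in Python. The function name `sample_size_for_proportion` is hypothetical; the inputs (z = 1.96, preliminary p̄ = 0.35, B = 0.015) are those given in the example.

```python
import math


def sample_size_for_proportion(z, p_guess, bound):
    """Smallest n with z * sqrt(p*q/n) <= bound, i.e. n >= z^2 * p * q / bound^2."""
    q_guess = 1.0 - p_guess
    return math.ceil(z ** 2 * p_guess * q_guess / bound ** 2)


n = sample_size_for_proportion(1.96, 0.35, 0.015)
print(n)
```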
Example
• Exercise 2. Suppose that no
preliminary estimate of p̄ is
available. Find the new sample
size. Use α = 0.05.
• Exercise 3. Suppose that no
preliminary estimate of p̄ is
available. Find the sample size
necessary so that α = 0.01.
Two Quantitative Populations
• Parameter of interest: µ1 - µ2
• Sample data:
• Sample 1: n1, x̄1, s1; Sample 2: n2, x̄2, s2
• Point estimator: x̄1 - x̄2
• Estimator mean: µ_(x̄1 - x̄2) = µ1 - µ2
• Standard error: SE(x̄1 - x̄2) = √(σ1²/n1 + σ2²/n2)
• Confidence Interval: (x̄1 - x̄2) ± z_{α/2} √(σ1²/n1 + σ2²/n2)
• Assumptions.
• 1. Large samples (n1 ≥ 30; n2 ≥ 30)
• 2. Samples are randomly selected
• 3. Samples are independent
• Sample size (for equal sample sizes n1 = n2 = n): n ≈ (z_{α/2})² (σ1² + σ2²) / B²
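The slides give no worked example for two quantitative populations, so the following Python sketch uses made-up summary statistics purely for illustration. The function name `two_mean_confidence_interval` and all data values are assumptions; as in the single-population case, the sample standard deviations s1 and s2 stand in for σ1 and σ2.

```python
import math


def two_mean_confidence_interval(x_bar1, s1, n1, x_bar2, s2, n2, z):
    """Large-sample interval for mu1 - mu2 from two independent samples."""
    diff = x_bar1 - x_bar2
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    bound = z * standard_error
    return diff - bound, diff + bound


# Hypothetical summary data (not from the slides); both samples have n >= 30.
low, high = two_mean_confidence_interval(
    x_bar1=52.0, s1=6.0, n1=100,
    x_bar2=50.0, s2=8.0, n2=100,
    z=1.96,
)
print(low, high)
```

With these assumed figures the standard error is √(0.36 + 0.64) = 1, so the 95% interval is 2 ± 1.96.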
Two Binomial Populations
• Parameter of interest: p1 - p2
• Sample 1: n1, x1, p̄1 = x1/n1
• Sample 2: n2, x2, p̄2 = x2/n2
• p1 - p2 (unknown parameter)
• α (significance level)
• Point estimator: p̄1 - p̄2
• Estimator mean: µ_(p̄1 - p̄2) = p1 - p2
• Estimated standard error: σ_(p̄1 - p̄2) = √(p̄1q̄1/n1 + p̄2q̄2/n2)
• Confidence Interval: (p̄1 - p̄2) ± z_{α/2} √(p̄1q̄1/n1 + p̄2q̄2/n2)
• Assumptions:
• 1. Large samples (n1p1 ≥ 5, n1q1 ≥ 5, n2p2 ≥ 5, n2q2 ≥ 5)
• 2. Samples are randomly and independently selected
• Sample size (for equal sample sizes n1 = n2 = n): n ≈ (z_{α/2})² (p1q1 + p2q2) / B²
• For unknown parameters: n ≈ (z_{α/2})² (0.5) / B²
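As with the two-means case, the slides do not include a worked example here, so the Python sketch below uses invented counts for illustration only. The function name `two_proportion_confidence_interval` and the data are assumptions; the interval uses the sample proportions x1/n1 and x2/n2 in the estimated standard error.

```python
import math


def two_proportion_confidence_interval(x1, n1, x2, n2, z):
    """Large-sample interval for p1 - p2 from two independent samples."""
    p1 = x1 / n1  # sample proportion in sample 1
    p2 = x2 / n2  # sample proportion in sample 2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    bound = z * standard_error
    diff = p1 - p2
    return diff - bound, diff + bound


# Hypothetical counts (not from the slides): 120 of 200 versus 100 of 200.
low, high = two_proportion_confidence_interval(120, 200, 100, 200, z=1.96)
print(round(low, 3), round(high, 3))
```

The "(0.5)" in the sample-size formula for unknown parameters comes from setting p1q1 = p2q2 = 0.25, the largest value each product can take.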
Sample size:
• Sample size for estimating a population mean:
• n ≈ (z_{α/2})² σ² / B²
• where σ is estimated by s (sample standard
deviation)
• Note: In the absence of data, σ is sometimes
approximated by R/4, where R is the range.
• z_{α/2} = 1.96 for 95% level of confidence
• z_{α/2} = 1.645 for 90% level of confidence
• z_{α/2} = 2.58 for 99% level of confidence
• B = precision, bound, or permissible margin of
error
Sample size
• Sample size for estimating a population
proportion:
• n ≈ (z_{α/2})² (p̄q̄) / B²
• Note: In the absence of data, choose p̄ = q̄
= 0.5, or simply p̄q̄ = 0.25.
• p̄ is the sample proportion
• q̄ = 1 - p̄
• z_{α/2} = 1.96 for 95% level of confidence
• z_{α/2} = 1.645 for 90% level of confidence
• z_{α/2} = 2.58 for 99% level of confidence
• We are doing a customer satisfaction
study for a washing machine.
• We are measuring satisfaction on a scale
of 1 to 10.
• Determine the sample size required for a
95% level of confidence and a level of
precision (margin of error in estimation)
of 0.3.
• We are estimating average customer
satisfaction.
• So, n ≈ (z_{α/2})² σ² / B²
• σ is not known. The range is 10 - 1 = 9, so we
estimate σ as R/4 = 9/4 = 2.25.
• z_{α/2} = 1.96; B = 0.3
• n = (1.96 × 2.25/0.3)² = 216.09
• So the sample size required is 217 (round up).
• We are doing a study to estimate the
proportion of the population who use
the toothpaste brand Colgate.
• Determine the sample size required for a
95% level of confidence and a level of
precision (margin of error in estimation)
of 0.02.
• We are estimating a proportion of customers.
• So, n ≈ (z_{α/2})² (p̄q̄) / B²
• p̄ is not known, so we use 0.5 as the estimate of
p̄. So p̄q̄ = 0.25.
• z_{α/2} = 1.96; B = 0.02
• n = 0.25 (1.96/0.02)² = 2401
• So the sample size required is 2401.
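Both closing planning problems use a fallback when no prior data are available: σ ≈ R/4 for a mean and p̄ = 0.5 for a proportion. The Python sketch below applies the two sample-size formulas under those fallbacks with the inputs stated in the problems (B = 0.3 on a 1 to 10 scale, and B = 0.02). The function names are hypothetical, and the small tolerance before rounding up is an implementation choice to stop floating-point noise from pushing an exact integer to the next one.

```python
import math

TOLERANCE = 1e-9  # guards math.ceil against floating-point noise


def sample_size_for_mean(z, sigma, bound):
    """n >= (z * sigma / bound)^2, rounded up."""
    return math.ceil((z * sigma / bound) ** 2 - TOLERANCE)


def sample_size_for_proportion(z, p_guess, bound):
    """n >= z^2 * p * q / bound^2, rounded up."""
    raw = z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2
    return math.ceil(raw - TOLERANCE)


# Satisfaction study: range R = 10 - 1 = 9, so sigma is approximated by R / 4.
n_satisfaction = sample_size_for_mean(1.96, (10 - 1) / 4, 0.3)

# Brand-usage study: no prior estimate, so p_guess = 0.5 (p*q = 0.25).
n_toothpaste = sample_size_for_proportion(1.96, 0.5, 0.02)

print(n_satisfaction, n_toothpaste)
```

For the satisfaction study the raw value is 216.09, which rounds up to 217; for the brand-usage study the raw value is 2401 with B = 0.02 and p̄q̄ = 0.25.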