Download N - Universidade Católica Portuguesa

MBACATÓLICA
Quantitative Methods
Miguel Gouveia
Manuel Leite Monteiro
Faculdade de Ciências Económicas e Empresariais
UNIVERSIDADE CATÓLICA PORTUGUESA
9. SAMPLING DISTRIBUTIONS
MBACatólica 2006/07
Métodos Quantitativos
7-2
Problem
• A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 200 milliliters and a standard deviation of 15 milliliters. What is the probability that the average amount dispensed in a random sample of 36 is at least 204 milliliters:
a) if the random variable is normally distributed?
b) if the distribution is unknown?
Distribution of the sample mean
• The sample mean (computed from n observations drawn from a population) is a random variable.
• Our objective is to study the distribution of that variable and to see how it is related to the distribution of the population from which the sample was drawn.
Distribution of the sample mean
Example: samples (with replacement) of size n = 2 from a population with four values: 1, 2, 3, 4 ($\mu = 2.5$ and $\sigma^2 = 1.25$).
• Possible samples: 16
1,1   1,2   1,3   1,4
2,1   2,2   2,3   2,4
3,1   3,2   3,3   3,4
4,1   4,2   4,3   4,4
• Sample means:
1.0   1.5   2.0   2.5
1.5   2.0   2.5   3.0
2.0   2.5   3.0   3.5
2.5   3.0   3.5   4.0
Distribution of the sample mean
Sample mean   Nº of samples   Probability
1.0           1               1/16
1.5           2               2/16
2.0           3               3/16
2.5           4               4/16
3.0           3               3/16
3.5           2               2/16
4.0           1               1/16
Total         16              1
Distribution of the sample mean
[Figure: two bar charts. Left: distribution of the population, f(x) for x = 1, 2, 3, 4. Right: distribution of the sample mean, $f(\bar{x})$ for $\bar{x}$ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0.]
Distribution of the sample mean
$E[\bar{X}] = \sum \bar{x} \, f(\bar{x}) = 2.5 = \mu$
• The mean of the sample mean’s distribution is the mean of the population.
• Concepts of mean being used:
– $E[\bar{X}]$: expected value (parameter of the mean's distribution)
– $\bar{X}$: random variable
– $\mu$: parameter (parameter of the universe)
Distribution of the sample mean
$V[\bar{X}] = \sum (\bar{x} - \mu)^2 \, f(\bar{x}) = 0.625$
$V[\bar{X}] = \sigma^2 / n = 1.25 / 2$
• The standard deviation of the sample mean is: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• As the sample size (n) increases, the standard deviation of the mean decreases.
• As the standard deviation (σ) decreases, the standard deviation of the mean also decreases.
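The example above is small enough to enumerate in full. The following is a minimal sketch, assuming Python with only the standard library (the variable names are illustrative, not from the slides), that lists the 16 samples, builds the distribution of the sample mean, and checks that its mean is µ and its variance is σ²/n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(population, repeat=n))

# Each sample is equally likely, so counting sample means gives their distribution.
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean and variance of the sampling distribution of the sample mean.
mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

# Population mean and variance, for comparison.
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

for m, p in dist.items():
    print(float(m), p)
print(mean_of_means == mu)          # E[X-bar] equals mu
print(var_of_means == sigma2 / n)   # V[X-bar] equals sigma^2 / n
```

Exact fractions are used instead of floats so that the two equalities can be checked without rounding error.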
Distribution of the sample mean
Population: N = 4, $\mu = 2.5$, $\sigma^2 = 1.25$
Sample mean (n = 2): $E[\bar{X}] = 2.5$, $V[\bar{X}] = 0.625$
[Figure: bar charts of the population distribution f(x), x = 1, 2, 3, 4, and of the sample-mean distribution $f(\bar{x})$, $\bar{x}$ = 1.0 to 4.0.]
Distribution of the sample mean
$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \ldots + X_n}{n}\right] = \frac{\mu + \mu + \ldots + \mu}{n} = \frac{n\mu}{n} = \mu$
$V[\bar{X}] = V\left[\frac{X_1 + X_2 + \ldots + X_n}{n}\right] = \frac{\sigma^2 + \sigma^2 + \ldots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
Distribution of the sample mean for Normal Populations
• A linear combination of independent normal random variables is itself a normal random variable.
• Application: if $X \sim N(\mu, \sigma^2)$ then $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, i.e. the sample mean has standard deviation $\sigma/\sqrt{n}$.
• X×Y and X/Y do not have a normal distribution.
Solution
a) X: quantity of soft drink dispensed, with µ = 200 and σ = 15. Sample size: n = 36.
If $X \sim N(200, 15^2) \Rightarrow \bar{X} \sim N\left(200, \frac{15^2}{36}\right)$
Probability that the average amount is at least 204:
$P[\bar{X} \geq 204] = P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \geq \frac{204 - 200}{15/\sqrt{36}}\right] = P[Z \geq 1.6] = 1 - 0.9452 = 5.48\%$
And if the distribution were unknown?
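The calculation above can be reproduced numerically. This is a small sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 15, 36

# Standard deviation of the sample mean: sigma / sqrt(n) = 15 / 6 = 2.5
se = sigma / sqrt(n)

# Sampling distribution of the mean for a Normal population.
sampling_dist = NormalDist(mu=mu, sigma=se)

z = (204 - mu) / se                          # standardized threshold
p_at_least_204 = 1 - sampling_dist.cdf(204)  # P[X-bar >= 204]

print(round(z, 2), round(p_at_least_204, 4))
```

The result agrees with the table lookup 1 − 0.9452 up to rounding.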
Central Limit Theorem
• The distribution of a random variable obtained from the sum (or mean) of n independent and identically distributed (i.i.d.) random variables approaches a normal distribution as n increases.
• This result does not depend on the distribution of the population.
• If $X_1, X_2, \ldots, X_n$ are n i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then, approximately for large n:
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Central Limit Theorem
[Figure: sampling distributions of $\bar{x}$ for increasing sample sizes.] As the sample size increases, the distribution of the sample mean becomes almost Normal, independently of the population’s distribution.
Central Limit Theorem
• What sample size (n) is “large enough”?
– For most population distributions, n > 30
– For distributions that are fairly symmetric, n > 15 may suffice
– For normally distributed populations, the sampling distribution of the mean is always normal, regardless of the sample size.
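The rule of thumb above can be explored by simulation. The sketch below is only an illustration under stated assumptions: it uses an exponential population (a skewed distribution chosen here, not taken from the slides), a fixed random seed, and hypothetical helper names. It estimates how often the standardized sample mean falls within ±1.96, which should be close to 95% when the Normal approximation is good.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(0)  # fixed seed so that repeated runs give the same numbers

def standardized_mean(n, rate=1.0):
    """Draw n exponential observations; return (x-bar - mu) / (sigma / sqrt(n))."""
    mu = sigma = 1.0 / rate  # an exponential population has mean = std dev = 1/rate
    xbar = fmean(random.expovariate(rate) for _ in range(n))
    return (xbar - mu) / (sigma / sqrt(n))

def share_within(n, reps=5000, z=1.96):
    """Fraction of simulated standardized means that fall in (-z, z)."""
    hits = sum(abs(standardized_mean(n)) < z for _ in range(reps))
    return hits / reps

for n in (2, 36):
    print(n, share_within(n))
```

Comparing the printed share for a very small n with the one for n = 36 gives a feel for how the approximation improves; the exact figures depend on the seed and on the population chosen.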
Solution
b) X: quantity of soft drink dispensed, with µ = 200 and σ = 15. Sample size: n = 36.
Since n is "large" $\Rightarrow \bar{X} \approx N\left(200, \frac{15^2}{36}\right)$
Probability that the average amount is at least 204:
$P[\bar{X} \geq 204] = P\left[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \geq \frac{204 - 200}{15/\sqrt{36}}\right] \approx P[Z \geq 1.6] = 1 - 0.9452 = 5.48\%$
10. INTRODUCTION TO STATISTICAL INFERENCE
Statistical Inference
11. Point Estimation
12. Confidence Intervals
13. Hypothesis Tests
Problem
• BankX plans to launch a new financial product different from all the existing ones. A sample of 25 potential investors provided the following information regarding the amount they wish to invest in the new product (normally distributed): $\sum x_i = 1000$ and $\sum (x_i - \bar{x})^2 = 9600$.
a) Compute a point estimate for the average amount invested.
b) Compute a 90% confidence interval for the average amount invested.
Parameters and Statistics
• Parameter: a numerical value that characterizes the distribution or the universe studied.
• Estimator: a random variable that can take different values depending on the particular sample drawn.
• Estimate: a number that is obtained from a specific sample.
11. Point Estimation
Estimators for the mean, variance and proportion
Population’s parameter       Estimator    Estimate
Mean                 $\mu$        $\bar{X}$    $\bar{x}$
Variance             $\sigma^2$   $S^2$        $s^2$
Standard deviation   $\sigma$     $S$          $s$
Proportion           $p$          $f_n$        $(f_n)$
Estimator’s properties
• Unbiasedness
An estimator is unbiased if the mean of its distribution equals the parameter.
• Efficiency
An unbiased estimator is the most efficient if its variance (around the parameter) is the smallest among unbiased estimators.
• Consistency
An estimator is consistent if, as the sample size increases, its mean approaches the parameter and its variance decreases.
Unbiasedness
[Figure: sampling distributions f(·) of an unbiased estimator, centered at µ, and of a biased estimator, centered away from µ.]
Efficiency
[Figure: sampling distribution of the mean and sampling distribution of the median, both plotted around µ.]
Consistency
[Figure: sampling distributions f(·) for a large sample and for a small sample, both plotted around µ.]
Solution
a) Compute a point estimate for the average amount invested.
n = 25; $\bar{x} = 1000/25 = 40$; $s^2 = 9600/24 = 400$.
Point estimate: $\hat{\mu} = \bar{x} = 1000/25 = 40$
b) Compute a 90% confidence interval for the average amount invested.
12. CONFIDENCE INTERVALS
Point Estimation vs. Confidence Intervals
[Figure: a population whose mean, µ, is unknown; a random sample is drawn and gives the sample mean $\bar{x} = 50$; conclusion: “I’ve got 95% confidence that µ is located between 40 and 60.”]
Confidence Intervals for the mean
• Example for a Normal population (or for “large” samples)
As $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, we have $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Thus: $P\left[-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right] = 0.95$
Confidence Intervals for the mean
which can also be written as:
$P\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right] = 0.95$
• So, we have a 95% confidence interval for the mean:
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
Interpretation of a (1-α)% confidence interval
• (1-α)% is the percentage of confidence intervals,
– from successive samples,
– all with size n,
– drawn from the same population
that include the true value of the parameter being estimated.
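Interpretation of a (1-α)% confidence interval
[Figure: sampling distribution of $\bar{x}$ centered at $E[\bar{X}] = \mu$, with area $1 - \alpha$ between $\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and area $\alpha/2$ in each tail; below it, confidence intervals for 10 different samples.]
(1 − α)% of the intervals contain µ and α% don’t.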
(1- α)% CI for the mean: Normal Pop. or n large, and σ known
• For a Normal population (or large n) with σ known:
1. Define the level of confidence (1- α)%
2. Collect a sample with size n. Compute $\bar{x}$
3. Obtain $z_{\alpha/2}$ from the statistical tables
4. The confidence interval is given by:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Solution
b) Compute a 90% CI for the average amount invested.
n = 25; $\bar{x} = 1000/25 = 40$; $s^2 = 9600/24 = 400$, so s = 20.
$\alpha = 10\%$. Because σ is unknown and is estimated by s from a Normal sample of only 25 observations, the critical value comes from the t distribution with 24 degrees of freedom: $t^{(24)}_{0.05} = 1.711$. (The Normal value $z_{0.05} = 1.645$ would give the narrower interval (33.42, 46.58).)
$\bar{x} - t^{(n-1)}_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t^{(n-1)}_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 40 \pm 1.711 \times \frac{20}{\sqrt{25}}$
CI for µ: (33.156, 46.844)
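As a rough check on the BankX interval, the sketch below recomputes it from the summary statistics. It rests on an assumption worth stating: the critical value 1.711 is the standard table value for the 0.95 quantile of a t distribution with 24 degrees of freedom, supplied here by hand. The helper name `interval` is illustrative. The quoted interval (33.156, 46.844) matches the t value, while the Normal value 1.645 gives a slightly narrower interval.

```python
from math import sqrt

n = 25
sum_x = 1000       # sum of the x_i
sum_sq_dev = 9600  # sum of (x_i - x-bar)^2

xbar = sum_x / n                 # point estimate of the mean
s = sqrt(sum_sq_dev / (n - 1))   # sample standard deviation (divisor n - 1)

def interval(center, spread, size, critical_value):
    """Return center -/+ critical_value * spread / sqrt(size)."""
    margin = critical_value * spread / sqrt(size)
    return center - margin, center + margin

T_CRIT = 1.711  # assumed table value: t, 24 df, 0.95 quantile
Z_CRIT = 1.645  # Normal 0.95 quantile

t_low, t_high = interval(xbar, s, n, T_CRIT)
z_low, z_high = interval(xbar, s, n, Z_CRIT)

print(xbar, s)
print(round(t_low, 3), round(t_high, 3))
print(round(z_low, 3), round(z_high, 3))
```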
Conflict between credibility and precision
• Credibility – confidence level of an interval
Precision – width of the confidence interval
• For a given sample size n:
– More precision means a narrower interval, which implies a lower level of confidence.
– A higher level of confidence implies a wider interval (less precision).
• The only way to increase simultaneously the precision and the credibility of the inference is to increase n.
Problem
• A vending machine is calibrated to pour a quantity of liquid that follows a normal distribution with variance equal to 16 ml². In a sample of 25 drinks, the average was $\bar{x} = 250$ ml.
We want:
a) To construct a 95% confidence interval for the true average quantity of liquid in the served drinks;
b) To determine how many drinks should be included in a new sample if the width of the interval is to be reduced to 2 ml.
Solution
a)
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
$250 - 1.96\frac{4}{\sqrt{25}} < \mu < 250 + 1.96\frac{4}{\sqrt{25}}$
$248.432 < \mu < 251.568$
The width of the interval is 3.136 ml.
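The interval above can be sketched as a small function. This is a minimal illustration, assuming Python's standard-library `statistics.NormalDist` for the Normal quantile; `z_interval` is a hypothetical helper name, not something from the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Vending machine data: variance 16 ml^2 (so sigma = 4), n = 25, x-bar = 250 ml.
low, high = z_interval(xbar=250, sigma=4, n=25, confidence=0.95)
print(round(low, 3), round(high, 3), round(high - low, 3))
```

Because the quantile is computed rather than read from a table, results can differ from hand calculations in the last decimal places.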
Solution
b)
Width $= 2 \times z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$2 = 2 \times 1.96 \times \frac{4}{\sqrt{n}}$
$\sqrt{n} = 7.84$
$n = 7.84^2 = 61.47$, rounded up to n = 62
Problem
• Ten analysts have given the following next-year earnings forecasts for a stock, which are normally distributed:
Forecast ($X_i$)   Number of analysts ($n_i$)
1.40             1
1.43             1
1.44             3
1.45             2
1.47             1
1.48             1
1.50             1
Compute a 95% confidence interval for the population mean of the forecasts.
Population’s Variance unknown
• Until now we have assumed that the variance of the population was known. However, it usually is unknown and has to be estimated.
• We know that
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
is an unbiased estimator for the population variance: $E[S^2] = \sigma^2$
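The unbiasedness of $S^2$ can be checked exactly on the small population 1, 2, 3, 4 used earlier in these slides. The sketch below is an illustration only, assuming Python's standard library and illustrative names: it averages the sample variance over all 16 samples of size 2 (drawn with replacement), once with divisor n − 1 and once with divisor n.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All equally likely samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Expected value of the estimator with divisor n - 1 and with divisor n.
expected_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
expected_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, expected_s2, expected_biased)
```

With divisor n − 1 the average equals the population variance, while divisor n underestimates it on average.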
Distribution of the sample mean from a Normal population with unknown σ
• If the population is Normal, is the sample mean distribution still given by
$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ ?
For small samples the answer is NO!
Distribution of the sample mean from a Normal population with unknown σ
• With σ unknown, we have a “t” distribution:
$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$
where: $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
t distribution (Student’s distribution)
[Figure: densities of Normal (0,1), t (df = 13) and t (df = 5), plotted against z, t and centered at 0.]
• Also bell shaped
• Also symmetric
• But with wider tails
Student’s t distribution
Values of t for a given F(x), by number of degrees of freedom n:
n      F(x) = 0.90   0.95     0.975    0.99     0.995
1      3.078         6.314    12.706   31.821   63.656
2      1.886         2.920    4.303    6.965    9.925
3      1.638         2.353    3.182    4.541    5.841
4      1.533         2.132    2.776    3.747    4.604
5      1.476         2.015    2.571    3.365    4.032
6      1.440         1.943    2.447    3.143    3.707
7      1.415         1.895    2.365    2.998    3.499
8      1.397         1.860    2.306    2.896    3.355
9      1.383         1.833    2.262    2.821    3.250
10     1.372         1.812    2.228    2.764    3.169
11     1.363         1.796    2.201    2.718    3.106
12     1.356         1.782    2.179    2.681    3.055
13     1.350         1.771    2.160    2.650    3.012
14     1.345         1.761    2.145    2.624    2.977
15     1.341         1.753    2.131    2.602    2.947
26     1.315         1.706    2.056    2.479    2.779
27     1.314         1.703    2.052    2.473    2.771
28     1.313         1.701    2.048    2.467    2.763
29     1.311         1.699    2.045    2.462    2.756
inf    1.282         1.645    1.960    2.326    2.576
[Figure: density of t (df = 3), centered at 0; the area to the right of 3.182 is 1 − 0.975.]
(1- α)% CI for the mean: Normal Pop. and σ unknown
• For a Normal population with σ unknown:
1. Define the level of confidence (1- α)%
2. Collect a sample with size n. Compute $\bar{x}$ and s
3. Obtain $t^{(n-1)}_{\alpha/2}$ from the statistical tables
4. The confidence interval is given by:
$\bar{x} - t^{(n-1)}_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t^{(n-1)}_{\alpha/2}\frac{s}{\sqrt{n}}$
Solution
$\bar{x} = 1.45$; s = 0.02789; n = 10; df = 9
$t^{(9)}_{0.025} = 2.262$
$1.45 - 2.262\frac{0.02789}{\sqrt{10}} \leq \mu \leq 1.45 + 2.262\frac{0.02789}{\sqrt{10}}$
$1.43 \leq \mu \leq 1.47$
• For a 99% confidence level, the interval would be:
$1.45 - 3.250\frac{0.02789}{\sqrt{10}} \leq \mu \leq 1.45 + 3.250\frac{0.02789}{\sqrt{10}}$
$1.421 \leq \mu \leq 1.479$
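The summary statistics and both intervals above can be recomputed from the frequency table. This is a sketch using only Python's standard library; the critical values 2.262 and 3.250 are read from the t table (9 degrees of freedom) rather than computed, and `t_interval` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev

# Forecast -> number of analysts, expanded into one value per analyst.
table = {1.40: 1, 1.43: 1, 1.44: 3, 1.45: 2, 1.47: 1, 1.48: 1, 1.50: 1}
forecasts = [x for x, count in table.items() for _ in range(count)]

n = len(forecasts)
xbar = mean(forecasts)
s = stdev(forecasts)  # sample standard deviation, divisor n - 1

def t_interval(center, spread, size, t_crit):
    """Interval center -/+ t_crit * spread / sqrt(size); t_crit comes from a table."""
    margin = t_crit * spread / sqrt(size)
    return center - margin, center + margin

ci95 = t_interval(xbar, s, n, t_crit=2.262)  # t table, df = 9, F = 0.975
ci99 = t_interval(xbar, s, n, t_crit=3.250)  # t table, df = 9, F = 0.995

print(n, round(xbar, 2), round(s, 5))
print([round(v, 3) for v in ci95])
print([round(v, 3) for v in ci99])
```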
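Distribution of the sample mean
Normal population:
– σ known, n < 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
– σ known, n ≥ 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
– σ unknown, n ≥ 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (CLT)
– σ unknown, n < 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$
Not Normal population:
– σ known, n < 30: we don’t know the distribution
– σ known, n ≥ 30: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ (CLT)
– σ unknown, n ≥ 30: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (CLT)
– σ unknown, n < 30: we don’t know the distribution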
Confidence interval for a proportion
• The true proportion of a population is p. The estimator of p is the proportion in the sample, i.e., $f_n = X/n$, where X is a binomial variable:
$E[f_n] = \frac{1}{n}E[X] = \frac{np}{n} = p$
$V[f_n] = \frac{1}{n^2}V[X] = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$
Confidence interval for a proportion
• For a large n: $\frac{f_n - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1)$
• The confidence interval is given by:
$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$
(1- α)% CI for a proportion: with large samples
1. Define the level of confidence (1- α)%
2. Collect a sample of size n. Compute $f_n$
3. Obtain $z_{\alpha/2}$ from the statistical tables
4. The confidence interval is given by:
$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$
Problem
• We want to estimate the proportion of voters who support a political party. 400 citizens were interviewed and 140 of them revealed the intention to vote for that party.
Compute a 99% confidence interval for the proportion of votes for that party.
Solution
n = 400
$f_n = 140/400 = 0.35$, $1 - f_n = 0.65$
$1 - \alpha = 0.99$, $\alpha/2 = 0.005$, $z_{\alpha/2} = 2.57$
$0.35 - 2.57\sqrt{\frac{0.35 \times 0.65}{400}} \leq p \leq 0.35 + 2.57\sqrt{\frac{0.35 \times 0.65}{400}}$
$0.28871 \leq p \leq 0.41129$
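The same interval can be sketched in code. The example below assumes Python's standard-library `statistics.NormalDist`; `proportion_interval` is a hypothetical helper name. Note that it computes the Normal quantile (about 2.576) instead of using the rounded table value 2.57, so its endpoints differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a proportion."""
    f_n = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(f_n * (1 - f_n) / n)
    return f_n - margin, f_n + margin

# 140 of 400 interviewed citizens intend to vote for the party.
low, high = proportion_interval(successes=140, n=400, confidence=0.99)
print(round(low, 4), round(high, 4))
```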
Selection of the sample size
• The sample size is a decision variable reflecting a conflict between precision and the cost of sampling.
Very large:
• Too expensive
Very small:
• Imprecise results
Selection of the sample size
• Question: for a desired minimum precision, what should be the minimum sample size to be drawn?
The choice of n is affected by 3 factors:
1. The level of precision, or margin of error (interval width)
2. The level of confidence
3. The dispersion of the population
Sample size: Estimation of a proportion
• Since the confidence interval is given by:
$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1 - f_n)}{n}}$
it can also be written as
$f_n - e < p < f_n + e$
with e being the margin of error.
Sample size: Estimation of a proportion
• Fixing e, it is possible to obtain n as:
$n = (z_{\alpha/2})^2 \frac{f_n(1 - f_n)}{e^2}$
• BUT: the value of $f_n$ is unknown before the sample is drawn.
The value used for $f_n$ should be the one that maximizes p(1-p), i.e., $f_n = 0.5$.
Problem
• Determine the minimum size of a sample in order to compute a 95% confidence interval for the proportion of consumers who are willing to buy a new product, with a margin of error of one percentage point.
• Recompute that sample size if you were sure that, given the high price of the product, no more than 25% of consumers would buy it.
Solution
e = 0.01
α = 5%
$z_{\alpha/2} = 1.96$
$n = 1.96^2 \times \frac{0.5 \times 0.5}{0.01^2} = 9604$
• If we knew “a priori” that p ≤ 0.25, then
$n = 1.96^2 \times \frac{0.25 \times 0.75}{0.01^2} = 7203$
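The two sample sizes above follow directly from the formula for n. The sketch below is a minimal illustration with a hypothetical helper name; it uses exact fractions (an implementation choice made here) so that rounding up with `ceil` is not disturbed by floating-point error when the result is a whole number.

```python
from fractions import Fraction
from math import ceil

def sample_size_proportion(e, z, p="0.5"):
    """Smallest n with z * sqrt(p(1-p)/n) <= e; inputs are decimal strings."""
    e, z, p = Fraction(e), Fraction(z), Fraction(p)
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# Worst case p = 0.5, then the case where p is known not to exceed 0.25.
print(sample_size_proportion("0.01", "1.96"))
print(sample_size_proportion("0.01", "1.96", p="0.25"))
```

Passing the inputs as strings keeps values such as 0.01 exact; plain floats could also be used if a small rounding tolerance is acceptable.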
Sample size: Estimation of the mean
• The confidence interval is given by:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
and it can be written as:
$\bar{x} - e < \mu < \bar{x} + e$
Thus: $n = (z_{\alpha/2})^2 \frac{\sigma^2}{e^2}$
Sample size: Estimation of the mean
• If σ is unknown:
1. Collect a pilot sample, with a smaller size, to estimate σ.
2. If the population is approximately normal:
$P[\mu - 2\sigma < X < \mu + 2\sigma] \approx 0.95$ and $P[\mu - 3\sigma < X < \mu + 3\sigma] \approx 0.997$
Therefore (and using past data or subjective evaluations of the population), we can “estimate”:
i. σ = (Percentile 97.5 − Percentile 2.5)/4
ii. σ = (MAX − MIN)/6
Problem
• Suppose you want to estimate the population mean of the analysts' forecasts for next year's stock earnings to within ± 0.01 with 95% confidence.
On the basis of past studies, you believe the standard deviation of those forecasts to be 0.03.
Find the minimum sample size needed.
Solution
e = 0.01
σ = 0.03
α = 5%
$z_{\alpha/2} = 1.96$
$n = 1.96^2 \times \frac{0.03^2}{0.01^2} = 34.6$
We need at least 35 forecasts in our sample.
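The sample-size calculation above can be sketched as follows. The helper name is illustrative, and exact fractions are an implementation choice made here so that the rounding-up step is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil

def sample_size_mean(e, z, sigma):
    """Return (exact n, n rounded up) from n = z^2 * sigma^2 / e^2."""
    e, z, sigma = Fraction(e), Fraction(z), Fraction(sigma)
    exact = z ** 2 * sigma ** 2 / e ** 2
    return exact, ceil(exact)

# Analysts' forecasts: margin of error 0.01, sigma believed to be 0.03.
exact, n = sample_size_mean("0.01", "1.96", "0.03")
print(float(exact), n)
```

The same function covers the earlier vending machine question: an interval width of 2 ml corresponds to a margin of error e = 1 with σ = 4, which also calls for rounding the exact value up to the next whole drink.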