```Supplement 8:
(proportions)
*The part on bias (including the proof) is the result of correspondence
between Dr. Ka-fu Wong and YueShen Zhou. The example was drawn from a clip
sent by Nipun Sharma. Use it at your own risk. Comments, if any, should
be sent to [email protected]
ECON1003: Analysis of Economic Data
Supplement8-1
Mean and variance (1)
• Suppose a proportion p of the population is female. An observation is
randomly drawn from the population. Code x = 1 if the drawn observation
is female and x = 0 if it is male. What are the population mean and
variance of this random variable X?
• Mean:
\[
\begin{aligned}
E(X) &= (1)\Pr(x=1) + (0)\Pr(x=0) \\
     &= (1)p + (0)(1-p) \\
     &= p
\end{aligned}
\]
• Variance:
\[
\begin{aligned}
\operatorname{Var}(X) &= (1-p)^2 \Pr(x=1) + (0-p)^2 \Pr(x=0) \\
                      &= (1-p)^2 p + p^2 (1-p) \\
                      &= (1-p)\,p\,[(1-p) + p] \\
                      &= (1-p)\,p
\end{aligned}
\]
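A quick numerical check of the two results above, using p = 0.6 as an illustrative value (an assumption for this sketch, not a figure from the course data):
\[
\begin{aligned}
E(X) &= (1)(0.6) + (0)(0.4) = 0.6 \\
\operatorname{Var}(X) &= (1-0.6)^2(0.6) + (0-0.6)^2(0.4) = 0.096 + 0.144 = 0.24 = (0.4)(0.6)
\end{aligned}
\]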
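Mean and variance (2)
• Suppose a proportion p of the population is female. A sample of n
observations is randomly drawn with replacement from the
population. Code x = 1 if a drawn observation is female and x = 0
otherwise. What are the mean and variance of the sample mean
m = (x1+…+xn)/n?
• Mean:
\[
\begin{aligned}
E(m) &= E\left[\frac{x_1+\dots+x_n}{n}\right] = \frac{E(x_1)+E(x_2)+\dots+E(x_n)}{n} \\
     &= E(X) \\
     &= p
\end{aligned}
\]
• Variance (the draws are independent, so the variance of the sum is the sum of the variances):
\[
\begin{aligned}
\operatorname{Var}(m) &= \operatorname{Var}\left[\frac{x_1+\dots+x_n}{n}\right] = \frac{\operatorname{Var}(x_1+\dots+x_n)}{n^2} \\
                      &= \frac{\operatorname{Var}(x_1)+\dots+\operatorname{Var}(x_n)}{n^2} \\
                      &= \frac{\operatorname{Var}(X)}{n} \\
                      &= \frac{(1-p)\,p}{n}
\end{aligned}
\]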
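To see how the sample size enters, the same kind of check can be run for m, again with the illustrative value p = 0.6 and with assumed sample sizes n = 100 and n = 400 (none of these numbers come from the course data):
\[
\begin{aligned}
n = 100: \quad \operatorname{Var}(m) &= \frac{0.24}{100} = 0.0024, & \sqrt{\operatorname{Var}(m)} &\approx 0.0490 \\
n = 400: \quad \operatorname{Var}(m) &= \frac{0.24}{400} = 0.0006, & \sqrt{\operatorname{Var}(m)} &\approx 0.0245
\end{aligned}
\]
In this example, quadrupling n halves the standard deviation of m, while E(m) stays at 0.6.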
Central limit theorem of proportion
Let x1,…,xn be an i.i.d. sample from a population in which a proportion p are successes.
(Failure is coded as 0 and success as 1.)
• ∑xi/n is the proportion of successes in the sample, which is also the simple average of
the outcomes of the n trials.
• By the CLT, ∑xi/n is approximately normally distributed when n is large:
\[
\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \hat{p} \sim N\left(p,\ \frac{p(1-p)}{n}\right)
\]
Do you think Chinese officials spent too much
government money on the following?
Based on a poll of 18,000 persons.
Can we construct the 95% confidence intervals for the
population proportion?
Standard error = [p(1-p)/n]^(1/2)

n       p      std error    lower limit  upper limit
18000   0.958  0.001495103  0.95507      0.9609303
18000   0.862  0.002570733  0.856961     0.8670385
18000   0.86   0.002586289  0.854931     0.865069
18000   0.85   0.002661453  0.844784     0.8552164
18000   0.807  0.00294157   0.801235     0.8127654
18000   0.802  0.002970185  0.796179     0.8078215
18000   0.679  0.003479775  0.67218      0.6858202
18000   0.5    0.00372678   0.492696     0.5073044
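The tabulated limits are consistent with p ± 1.96 × (std error), where 1.96 is the usual two-sided 95% critical value of the standard normal. The multiplier is not stated on the slide; it is inferred here from the numbers. A worked sketch for the first row (n = 18000, p = 0.958):
\[
\begin{aligned}
\text{std error} &= \sqrt{\frac{0.958\,(1-0.958)}{18000}} = \sqrt{\frac{0.040236}{18000}} \approx 0.001495 \\
\text{lower limit} &\approx 0.958 - 1.96\,(0.001495) \approx 0.95507 \\
\text{upper limit} &\approx 0.958 + 1.96\,(0.001495) \approx 0.96093
\end{aligned}
\]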
Why would some books use a different
formula?
• The population proportion p is unknown, so the variance of the sample
proportion has to be estimated.
\[
\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \hat{p} \sim N\left(p,\ \frac{p(1-p)}{n}\right), \qquad \operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}
\]
\[
\text{EstimatedVar}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}\ ?
\]
\[
E\left[\frac{\hat{p}(1-\hat{p})}{n}\right] \neq \frac{p(1-p)}{n}, \qquad E\left[\frac{\hat{p}(1-\hat{p})}{n-1}\right] = \frac{p(1-p)}{n}
\]
When n is large, the difference between the two estimators of the variance of
the sample proportion is negligible. This is why some books use n and others use (n-1).
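To get a sense of how small the difference is, take n = 18000 and a sample proportion of 0.5, the values in the last row of the confidence-interval table (used here purely as an illustration):
\[
\frac{0.5\,(1-0.5)}{18000} \approx 1.38889 \times 10^{-5}, \qquad \frac{0.5\,(1-0.5)}{17999} \approx 1.38897 \times 10^{-5}
\]
The two estimates differ by the factor n/(n-1) = 18000/17999 ≈ 1.000056, so the implied standard errors differ by less than 0.003%.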
Why would some books use a different
formula?
• $\hat{p}(1-\hat{p})/n$ is a biased estimator for $p(1-p)/n$.
• $\hat{p}(1-\hat{p})/(n-1)$ is an unbiased estimator for $p(1-p)/n$.
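Proof: A biased estimator
\[
\begin{aligned}
E\left[\frac{\hat{p}(1-\hat{p})}{n}\right] &= \frac{1}{n}\,E(\hat{p} - \hat{p}^2) \\
&= \frac{1}{n}\left[E(\hat{p}) - E(\hat{p}^2)\right] \\
&= \frac{1}{n}\left\{p - \left[\operatorname{Var}(\hat{p}) + E(\hat{p})^2\right]\right\} \\
&= \frac{1}{n}\left[p - \frac{p(1-p)}{n} - p^2\right] \\
&= \frac{1}{n^2}\left[np - p(1-p) - np^2\right] \\
&= \frac{1}{n^2}\left[np(1-p) - p(1-p)\right] \\
&= \frac{(n-1)\,p(1-p)}{n^2} = \left(\frac{n-1}{n}\right)\frac{p(1-p)}{n}
\end{aligned}
\]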




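Proof: An unbiased estimator
\[
\begin{aligned}
E\left[\frac{\hat{p}(1-\hat{p})}{n-1}\right] &= \frac{1}{n-1}\,E(\hat{p} - \hat{p}^2) \\
&= \frac{1}{n-1}\left[E(\hat{p}) - E(\hat{p}^2)\right] \\
&= \frac{1}{n-1}\left\{p - \left[\operatorname{Var}(\hat{p}) + E(\hat{p})^2\right]\right\} \\
&= \frac{1}{n-1}\left[p - \frac{p(1-p)}{n} - p^2\right] \\
&= \frac{1}{(n-1)\,n}\left[np - p(1-p) - np^2\right] \\
&= \frac{1}{(n-1)\,n}\left[np(1-p) - p(1-p)\right] \\
&= \frac{(n-1)\,p(1-p)}{(n-1)\,n} = \frac{p(1-p)}{n}
\end{aligned}
\]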
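Both expectations can be checked by hand in a small case. The sketch below assumes n = 2 and p = 0.5 (illustrative values, not from the slides), so that \hat{p} takes the values 0, 1/2 and 1 with probabilities 1/4, 1/2 and 1/4, and p(1-p)/n = 1/8:
\[
\begin{aligned}
E\left[\hat{p}(1-\hat{p})\right] &= \tfrac{1}{4}(0) + \tfrac{1}{2}\left(\tfrac{1}{2}\cdot\tfrac{1}{2}\right) + \tfrac{1}{4}(0) = \tfrac{1}{8} \\
E\left[\frac{\hat{p}(1-\hat{p})}{n}\right] &= \frac{1/8}{2} = \frac{1}{16} = \left(\frac{n-1}{n}\right)\frac{p(1-p)}{n} = \frac{1}{2}\cdot\frac{1}{8} \\
E\left[\frac{\hat{p}(1-\hat{p})}{n-1}\right] &= \frac{1/8}{1} = \frac{1}{8} = \frac{p(1-p)}{n}
\end{aligned}
\]
With n this small, dividing by n understates the true variance by half, while dividing by (n-1) recovers it exactly.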


