Supplement 8: Additional real-life examples (proportions)

*The part on biasedness (including the proof) is a result of a correspondence between Dr. Ka-fu Wong and YueShen Zhou. The example was drawn from a clip sent over by Nipun Sharma. Use it at your own risk. Comments, if any, should be sent to [email protected].

Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data

Mean and variance (1)

Suppose a proportion p of the population is female. An observation is randomly drawn from the population. Code x = 1 if the drawn observation is female and x = 0 if it is male. What are the population mean and variance of this random variable X?

\[
E(X) = (1)\Pr(x=1) + (0)\Pr(x=0) = (1)p + (0)(1-p) = p
\]

\[
\begin{aligned}
\operatorname{Var}(X) &= (1-p)^2 \Pr(x=1) + (0-p)^2 \Pr(x=0) \\
&= (1-p)^2 p + p^2 (1-p) \\
&= (1-p)\,p\,[(1-p) + p] \\
&= (1-p)\,p
\end{aligned}
\]

Mean and variance (2)

Suppose a proportion p of the population is female. A sample of n observations is randomly drawn with replacement from the population. Code x = 1 if a drawn observation is female, 0 otherwise. What are the mean and variance of m = (x_1 + … + x_n)/n?

\[
E(m) = E\!\left[\frac{x_1 + \dots + x_n}{n}\right] = \frac{E(x_1) + E(x_2) + \dots + E(x_n)}{n} = E(X) = p
\]

Because the draws are made with replacement, they are independent, so the variance of the sum is the sum of the variances:

\[
\operatorname{Var}(m) = \operatorname{Var}\!\left[\frac{x_1 + \dots + x_n}{n}\right] = \frac{\operatorname{Var}(x_1 + \dots + x_n)}{n^2} = \frac{\operatorname{Var}(x_1) + \dots + \operatorname{Var}(x_n)}{n^2} = \frac{\operatorname{Var}(X)}{n} = \frac{(1-p)\,p}{n}
\]

Central limit theorem of proportion

Let x_1, …, x_n be an iid sample from a population with proportion p of successes (failure coded as 0 and success as 1). ∑x_i/n is simply the proportion of successes, and hence the simple average of the outcomes of the n trials. By the CLT, ∑x_i/n is approximately normal:

\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{p} \sim N\!\left(p,\ \frac{p(1-p)}{n}\right)
\]

Do you think Chinese officials spent too much government money on the following?

Based on a poll of 18,000 persons.
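As a check on the mean and variance results for the sample proportion, the short Python sketch below computes the moments of m exactly by summing over the binomial distribution of the number of successes. It also computes the exact probability that m falls within 1.96 standard deviations of p, which the CLT suggests should be close to 0.95. The function names and the values n = 10, n = 400 and p = 0.3 are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction
from math import comb, sqrt


def sample_proportion_moments(n, p):
    """Exact mean and variance of m = (x1 + ... + xn)/n for n iid 0/1 draws.

    The number of successes k follows a Binomial(n, p) distribution, so the
    moments of m = k/n are obtained by summing over k = 0, ..., n.
    """
    p = Fraction(p)
    mean = Fraction(0)
    second_moment = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        m = Fraction(k, n)
        mean += m * prob
        second_moment += m * m * prob
    return mean, second_moment - mean**2


def normal_range_probability(n, p, z=1.96):
    """Exact P(|m - p| <= z * sqrt(p(1-p)/n)) under the binomial model."""
    half_width = z * sqrt(p * (1 - p) / n)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) <= half_width
    )


# Hypothetical illustration values (not from the slides).
p = Fraction(3, 10)
mean, var = sample_proportion_moments(10, p)
print(mean, var)                 # 3/10 21/1000
print(var == p * (1 - p) / 10)   # True

# Should be near 0.95 if the normal approximation is reasonable.
print(normal_range_probability(400, 0.3))
```

Exact fractions are used for the moments so that the comparison with p and p(1-p)/n is free of rounding error. The probability check uses floats, and because m is discrete the result will generally be close to, but not exactly, 0.95.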
Can we construct the 95% confidence intervals for the population proportion?

Standard error = [p(1-p)/n]^{1/2}. Each limit is p minus or plus approximately 1.96 standard errors.

n       p       std error     lower limit   upper limit
18000   0.958   0.001495103   0.95507       0.9609303
18000   0.862   0.002570733   0.856961      0.8670385
18000   0.86    0.002586289   0.854931      0.865069
18000   0.85    0.002661453   0.844784      0.8552164
18000   0.807   0.00294157    0.801235      0.8127654
18000   0.802   0.002970185   0.796179      0.8078215
18000   0.679   0.003479775   0.67218       0.6858202
18000   0.5     0.00372678    0.492696      0.5073044

Why would some books use a different formula?

The population proportion is unknown. Thus, the variance of the sample proportion has to be estimated.

\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{p} \sim N\!\left(p,\ \frac{p(1-p)}{n}\right), \qquad \operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}
\]

\[
\text{Estimated } \operatorname{Var}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}\ ?
\]

\[
E\!\left[\frac{\hat{p}(1-\hat{p})}{n}\right] \neq \frac{p(1-p)}{n}, \qquad E\!\left[\frac{\hat{p}(1-\hat{p})}{n-1}\right] = \frac{p(1-p)}{n}
\]

Why would some books use a different formula? (continued)

\hat{p}(1-\hat{p})/n is a biased estimator for p(1-p)/n.

\hat{p}(1-\hat{p})/(n-1) is an unbiased estimator for p(1-p)/n.

When n is large, the difference between the two estimators of the variance of the sample proportion is negligible. This is why some books use n and some use (n-1).
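The claims about the two estimators can be checked numerically. The Python sketch below computes the exact expectation of p̂(1-p̂) divided by n and by (n-1), and then compares the 95% confidence limits obtained with each divisor for the first row of the poll table (p = 0.958, n = 18000). The function names and the small-sample values n = 10 and p = 0.3 are illustrative assumptions. The 95% critical value is taken from the standard normal distribution, which appears consistent with the tabulated limits.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist


def expected_variance_estimate(n, p, divisor):
    """Exact E[p_hat * (1 - p_hat) / divisor], where n * p_hat ~ Binomial(n, p)."""
    p = Fraction(p)
    total = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        p_hat = Fraction(k, n)
        total += prob * p_hat * (1 - p_hat) / divisor
    return total


def confidence_interval(p_hat, divisor, level=0.95):
    """Normal-approximation interval using sqrt(p_hat(1-p_hat)/divisor) as std error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    se = sqrt(p_hat * (1 - p_hat) / divisor)
    return p_hat - z * se, p_hat + z * se


# Small hypothetical case where the bias is visible.
n, p = 10, Fraction(3, 10)
target = p * (1 - p) / n
with_n = expected_variance_estimate(n, p, n)
with_n_minus_1 = expected_variance_estimate(n, p, n - 1)
print(target, with_n, with_n_minus_1)  # 21/1000 189/10000 21/1000

# First row of the poll table: the divisor barely matters when n = 18000.
lower_n, upper_n = confidence_interval(0.958, 18000)
lower_n1, upper_n1 = confidence_interval(0.958, 17999)
print(lower_n, upper_n)
print(lower_n - lower_n1, upper_n1 - upper_n)
```

With n = 10 the estimator that divides by n falls short of p(1-p)/n by the factor (n-1)/n, while dividing by (n-1) recovers it exactly. With n = 18000 the two sets of limits differ by well under one part in a million.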
Proof: A biased estimator

\[
\begin{aligned}
E\!\left[\frac{\hat{p}(1-\hat{p})}{n}\right] &= \frac{1}{n}\,E(\hat{p} - \hat{p}^2) \\
&= \frac{1}{n}\,\bigl[E(\hat{p}) - E(\hat{p}^2)\bigr] \\
&= \frac{1}{n}\,\Bigl\{p - \bigl[\operatorname{Var}(\hat{p}) + E(\hat{p})^2\bigr]\Bigr\} \\
&= \frac{1}{n}\,\left[p - \frac{p(1-p)}{n} - p^2\right] \\
&= \frac{1}{n^2}\,\bigl[np - p(1-p) - np^2\bigr] \\
&= \frac{1}{n^2}\,\bigl[np(1-p) - p(1-p)\bigr] \\
&= \frac{(n-1)\,p(1-p)}{n^2} = \frac{n-1}{n}\cdot\frac{p(1-p)}{n}
\end{aligned}
\]

Proof: An unbiased estimator

\[
\begin{aligned}
E\!\left[\frac{\hat{p}(1-\hat{p})}{n-1}\right] &= \frac{1}{n-1}\,E(\hat{p} - \hat{p}^2) \\
&= \frac{1}{n-1}\,\bigl[E(\hat{p}) - E(\hat{p}^2)\bigr] \\
&= \frac{1}{n-1}\,\Bigl\{p - \bigl[\operatorname{Var}(\hat{p}) + E(\hat{p})^2\bigr]\Bigr\} \\
&= \frac{1}{n-1}\,\left[p - \frac{p(1-p)}{n} - p^2\right] \\
&= \frac{1}{(n-1)\,n}\,\bigl[np - p(1-p) - np^2\bigr] \\
&= \frac{1}{(n-1)\,n}\,\bigl[np(1-p) - p(1-p)\bigr] \\
&= \frac{(n-1)\,p(1-p)}{(n-1)\,n} = \frac{p(1-p)}{n}
\end{aligned}
\]

Supplement 8: Additional real-life examples

- END -