Survey
Outline for Class meeting 6 (Chapter 3 (3.1), Lohr, 2/6/06)

Ratio Estimation

I. Recall ratio estimators under SRS.

\[ \hat t_{yr} = \frac{\hat t_y}{\hat t_x}\, t_x = \hat B\, t_x, \qquad \hat{\bar y}_r = \hat B\, \bar x_U . \]

We also denote the ratio itself by $B = \bar y_U / \bar x_U$.

II. Ratio estimators are biased. However, the ratio estimators of the mean and total have smaller variance than the mean-per-unit estimators when the correlation between X and Y is large enough. The approximate mean and variance of the ratio estimator can be calculated using the delta method, or Taylor series expansion, applied to $f(x, y) = y/x$:

\[ f(x, y) \approx \frac{y_0}{x_0} - \frac{y_0}{x_0^2}(x - x_0) + \frac{1}{x_0}(y - y_0) + \frac{y_0}{x_0^3}(x - x_0)^2 - \frac{1}{x_0^2}(x - x_0)(y - y_0). \]

Then letting $x = \bar x_S$ and $y = \bar y_S$, with the expansion point $(x_0, y_0) = (\bar x_U, \bar y_U)$, we have the following. (In the factor $(1 - f)$ below, $f = n/N$ is the sampling fraction, not the function $f(x, y)$ above.)

A. Bias

\[ E(\hat B - B) \approx \frac{\bar y_U}{\bar x_U^3}\, V(\bar x_S) - \frac{1}{\bar x_U^2}\, \mathrm{Cov}(\bar x_S, \bar y_S) = \frac{1 - f}{n\, \bar x_U^2}\, [\, B S_x^2 - S_{xy} \,]. \]

B.

1. Variance

\[ E(\hat t_{yr} - t_y)^2 = t_x^2\, E(\hat B - B)^2 \approx \frac{t_x^2}{\bar x_U^2}\, [\, B^2 V(\bar x_S) + V(\bar y_S) - 2B\, \mathrm{Cov}(\bar y_S, \bar x_S) \,] = \frac{N^2 (1 - f)}{n}\, [\, B^2 S_x^2 + S_y^2 - 2B S_{xy} \,]. \]

This is equivalent to the form favored by Lohr:

\[ E(\hat t_{yr} - t_y)^2 \approx \frac{N^2 (1 - f)}{n}\, S_d^2, \]

where $S_d^2$ is the population variance of the variable $d_i = y_i - B x_i$.

2. Variance estimation

Either form is estimated in the obvious way; i.e., by substituting the sample variance (of x, y, or d) for the population variance, the sample covariance for the population covariance, and $\hat B$ for $B$.

3. When is ratio estimation better for estimating the total (or mean)?

a. If the bias can be ignored, then ratio estimation is superior when $V(\hat t_{yr}) < V(\hat t_y)$, or equivalently when

\[ R > \frac{CV(x)}{2\, CV(y)}, \]

where $R$ is the population correlation between x and y.

b. When can the bias be ignored? The following result (due to Hartley and Ross) gives an upper bound. First note that

\[ \mathrm{cov}(\hat B, \bar x_S) = E\!\left( \frac{\bar y_S}{\bar x_S}\, \bar x_S \right) - E(\hat B)\, E(\bar x_S) = \bar y_U - \bar x_U\, E(\hat B). \]

Therefore $E(\hat B) - B = -\,\mathrm{cov}(\hat B, \bar x_S) / \bar x_U$, so that

\[ |\text{bias in } \hat B| \le \frac{\sigma_{\hat B}\, \sigma_{\bar x_S}}{\bar x_U}, \]

since a correlation cannot exceed 1 in absolute value. Thus the size of the bias relative to the standard error of $\hat B$ satisfies

\[ \frac{|\text{bias in } \hat B|}{\sigma_{\hat B}} \le \text{CV of } \bar x_S . \]

So if the CV of the sample mean of the auxiliary variable is sufficiently small, then the bias of the ratio estimator is negligible.

4. Estimation of variance and confidence interval estimation

When the bias is negligible, it makes sense to compute a confidence interval based on the variance. One can show that the ratio estimator is approximately normally distributed in sufficiently large samples, so that the usual confidence interval procedure can be used.

Example 1: Summary of SRS from Tourist Data

    CVC totals (X):     4589   3921   4589   5672    4629   6046   9489   4300   3955   5072
    Survey totals (Y): 51325  38742  81540  62954  115832  73842  66590  27932  42653  33985

    Corr = 0.21001

    Statistic             X = CVC totals    Y = survey totals
    Mean                        5226.2             59539.5
    Standard Error              520.8123           8403.724
    Median                      4609               57139.5
    Mode                        4589               #N/A
    Standard Deviation          1646.953           26574.91
    Sample Variance             2712454            7.06E+08
    Kurtosis                    5.669522           0.91967
    Skewness                    2.247655           0.95995
    Range                       5568               87900
    Minimum                     3921               27932
    Maximum                     9489               115832
    Sum                         52262              595395
    Count                       10                 10

[Scatter plot: survey totals (vertical axis, 0 to 140000) against CVC totals (horizontal axis, 0 to 10000).]

Is the auxiliary data useful?

Example 2: Library Data

Univariate Procedure

Variable=INQ

    Moments
    N              369    Sum Wgts        369
    Mean      29009.98    Sum        10704683
    Std Dev   226077.3    Variance   5.111E10
    Skewness  12.75344    Kurtosis   169.6792
    USS       1.912E13    CSS        1.881E13
    CV        779.3087    Std Mean   11769.11

    Quantiles(Def=5)
    100% Max   3278281    99%    288737
     75% Q3       4882    95%     56070
     50% Med      1200    90%     21301
     25% Q1        153    10%         0
      0% Min         0     5%         0
                           1%         0

    Histogram                                                #
    3300000+*                                                1
           .
           .
           .*                                                1
           .
           .
           .
           .
    1700000+
           .
           .
           .
           .
           .*                                                1
           .
           .*                                                6
     100000+*********************************************  360
            ----+----+----+----+----+----+----+----+----+
            * may represent up to 8 counts

[Boxplot of INQ accompanies the histogram.]

An SRS without replacement (srswor) of size 50 was selected, and the ratio and mean-per-unit estimators were calculated. The moments below show N = 1000 values of each estimator.

Variable=TOTINQ

    Moments
    N             1000    Sum Wgts       1000
    Mean      10706326    Sum        1.071E10
    Std Dev   11030933    Variance   1.217E14
    Skewness   1.38368    Kurtosis   1.080211
    USS       2.362E17    CSS        1.216E17
    CV        103.0319    Std Mean   348828.7

    Quantiles(Def=5)
    100% Max   52868460    99%   48522312
     75% Q3    21551840    95%   30029143
     50% Med    5241128    90%   28031668
     25% Q1     3258388    10%    1973124
      0% Min   846286.7     5%    1607578
                            1%    1134284

    Histogram                                                  #
     5.25E7+*                                                  5
           .**                                                12
           .
           .*                                                  3
           .***                                               30
     2.75E7+*************                                    126
           .*********                                         83
           .
           .***                                               28
           .*************************                        246
    2500000+***********************************************  467
            ----+----+----+----+----+----+----+----+----+-
            * may represent up to 10 counts

[Boxplot of TOTINQ accompanies the histogram.]

Correlation Analysis

    2 'VAR' Variables: CIRC INQ

    Simple Statistics
    Variable     N      Mean    Std Dev        Sum   Minimum   Maximum
    CIRC       369    127885     466496   47189555         0   6384212
    INQ        369     29010     226077   10704683         0   3278281

    Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 369

               CIRC       INQ
    CIRC    1.00000   0.90218
             0.0       0.0001
    INQ     0.90218   1.00000
             0.0001    0.0

Univariate Procedure

Variable=TOTINQR

    Moments
    N             1000    Sum Wgts       1000
    Mean       8810325    Sum        8.8103E9
    Std Dev    5499547    Variance   3.025E13
    Skewness  0.949433    Kurtosis   -0.59795
    USS       1.078E17    CSS        3.021E16
    CV        62.42161    Std Mean   173910.9

    Quantiles(Def=5)
    100% Max   22561369    99%   21302769
     75% Q3    13813802    95%   19200545
     50% Med    6338470    90%   17978239
     25% Q1     4716290    10%    3780556
      0% Min    2552644     5%    3316143
                            1%    2875174

    Histogram                                           #
     2.25E7+*                                           2
           .****                                       14
     2.05E7+*****                                      19
           .*****                                      18
     1.85E7+************                               46
           .****************                           62
     1.65E7+**********                                 38
           .*******                                    27
     1.45E7+******                                     21
           .***                                        10
     1.25E7+*                                           3
           .***                                        10
     1.05E7+****                                       16
           .*******                                    26
    8500000+************                               46
           .*****************                          67
    6500000+********************************          127
           .***************************************   154
    4500000+****************************************  159
           .****************************              111
    2500000+******                                     24
            ----+----+----+----+----+----+----+----+
            * may represent up to 4 counts

[Boxplot of TOTINQR accompanies the histogram.]
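
The repeated-sampling comparison in Example 2 can be imitated on made-up data. The sketch below assumes a synthetic population of 369 units (generated values, not the library data), draws 1000 SRS without replacement of size 50, and computes the mean-per-unit and ratio estimates of the total for each sample. The population model, the seed, and all variable names are assumptions chosen for illustration only.

```python
import random
import statistics

random.seed(2006)  # arbitrary seed, only for reproducibility

# Hypothetical skewed population; y is roughly proportional to x.
# These values are simulated and are NOT the library data.
N, n, reps = 369, 50, 1000
x = [random.lognormvariate(10.0, 1.0) for _ in range(N)]
y = [0.25 * xi * random.lognormvariate(0.0, 0.3) for xi in x]
t_x, t_y = sum(x), sum(y)  # population totals (t_x is assumed known)

mpu, ratio = [], []
for _ in range(reps):
    idx = random.sample(range(N), n)  # SRS without replacement
    x_bar = statistics.fmean(x[i] for i in idx)
    y_bar = statistics.fmean(y[i] for i in idx)
    mpu.append(N * y_bar)              # mean-per-unit estimate of t_y
    ratio.append(y_bar / x_bar * t_x)  # ratio estimate: B_hat * t_x

sd_mpu = statistics.stdev(mpu)
sd_ratio = statistics.stdev(ratio)
print(f"true total       {t_y:.4g}")
print(f"mean-per-unit    mean {statistics.fmean(mpu):.4g}  sd {sd_mpu:.4g}")
print(f"ratio estimator  mean {statistics.fmean(ratio):.4g}  sd {sd_ratio:.4g}")
```

Comparing each estimator's mean with the true total, and the two standard deviations with each other, shows the bias and variance trade-off for this synthetic population. Results for real data need not look the same.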
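
For the question posed in Example 1, one rough check is to plug sample quantities into the condition R > CV(x) / (2 CV(y)) and into the Hartley-Ross bound. The sketch below does this with the ten listed pairs. It assumes the two rows are paired by position and ignores the finite population correction, since N is not given. The variable names are illustrative.

```python
import math
import statistics

# Sample values from Example 1, assumed paired by position in the listing.
cvc = [4589, 3921, 4589, 5672, 4629, 6046, 9489, 4300, 3955, 5072]
survey = [51325, 38742, 81540, 62954, 115832, 73842, 66590, 27932, 42653, 33985]

n = len(cvc)
x_bar, y_bar = statistics.mean(cvc), statistics.mean(survey)
s_x, s_y = statistics.stdev(cvc), statistics.stdev(survey)

# Sample covariance and correlation.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cvc, survey)) / (n - 1)
r = s_xy / (s_x * s_y)

b_hat = y_bar / x_bar  # estimated ratio B_hat

# Sample version of the condition R > CV(x) / (2 CV(y)).
cv_x, cv_y = s_x / x_bar, s_y / y_bar
threshold = cv_x / (2 * cv_y)
ratio_is_better = r > threshold

# Hartley-Ross: |bias| / SE(B_hat) is at most the CV of the sample mean of x.
# The fpc is ignored here because N is not given.
cv_x_bar = cv_x / math.sqrt(n)

# Standard error of B_hat from the residuals d_i = y_i - B_hat * x_i (fpc ignored).
d = [y - b_hat * x for x, y in zip(cvc, survey)]
se_b_hat = math.sqrt(statistics.variance(d) / n) / x_bar

print(f"B_hat = {b_hat:.4f}, SE = {se_b_hat:.4f}")
print(f"r = {r:.5f}, threshold = {threshold:.5f}, ratio better: {ratio_is_better}")
print(f"bound on |bias| / SE: {cv_x_bar:.4f}")
```

With these ten pairs the sample correlation (about 0.21) falls below the threshold (about 0.35), which suggests the auxiliary variable would not help here. A sample of ten gives only a rough indication.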