Outline for Class meeting 6 (Chapter 3 (3.1), Lohr, 2/6/06)
Ratio Estimation
I.
Recall ratio estimators under SRS:
$$\hat{t}_{yr} = \frac{\hat{t}_y}{\hat{t}_x}\, t_x = \hat{B}\, t_x, \qquad \hat{\bar{y}}_r = \hat{B}\, \bar{x}_U .$$
We also denote the ratio itself by $B = \bar{y}_U / \bar{x}_U$.
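As a minimal sketch of the estimators recalled above, the following computes the estimated ratio, the ratio estimate of the total, and the ratio estimate of the mean from a paired sample. The function name `ratio_estimates`, its argument names, and the toy numbers are illustrative assumptions, not part of the course material.

```python
def ratio_estimates(x, y, t_x, N):
    """Ratio estimates of the population total and mean of y under SRS.

    x, y : paired sample values
    t_x  : known population total of the auxiliary variable x
    N    : population size
    (All names are hypothetical, chosen for this sketch.)
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    b_hat = sum(y) / sum(x)   # B-hat = ybar_S / xbar_S = t-hat_y / t-hat_x
    total = b_hat * t_x       # t-hat_yr = B-hat * t_x
    mean = total / N          # ybar-hat_r = B-hat * xbar_U, since xbar_U = t_x / N
    return b_hat, total, mean


# Toy data, made up purely for illustration
b_hat, total, mean = ratio_estimates([2.0, 4.0, 6.0], [5.0, 9.0, 16.0],
                                     t_x=100.0, N=20)
print(b_hat, total, mean)
```

Note that the auxiliary total `t_x` must be known from an outside source; the sample alone is not enough.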
II.
Ratio estimators are biased. However, estimators of mean and total have smaller
variance than that of mean-per-unit estimators when correlation between X and Y
is large enough. The approximate mean and variance of the ratio estimator can be
calculated using the delta method, or Taylor series expansion, of $f(x, y) = y/x$ about $(x_0, y_0)$:
$$f(x, y) \approx \frac{y_0}{x_0} - \frac{y_0}{x_0^2}(x - x_0) + \frac{1}{x_0}(y - y_0) + \frac{y_0}{x_0^3}(x - x_0)^2 - \frac{1}{x_0^2}(x - x_0)(y - y_0).$$
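The second-order expansion of y/x can be checked numerically. The sketch below evaluates the five-term approximation and compares it with the exact ratio at two distances from the expansion point; the function name `ratio_taylor2` and the chosen numbers are assumptions made for illustration.

```python
def ratio_taylor2(x, y, x0, y0):
    """Second-order Taylor approximation of f(x, y) = y / x about (x0, y0)."""
    dx, dy = x - x0, y - y0
    return (y0 / x0
            - y0 / x0**2 * dx        # first-order term in x
            + dy / x0                # first-order term in y
            + y0 / x0**3 * dx**2     # (1/2) f_xx (x - x0)^2
            - dx * dy / x0**2)       # f_xy (x - x0)(y - y0); f_yy is zero


x0, y0 = 10.0, 50.0                  # illustrative expansion point
errors = {}
for h in (1.0, 0.1):
    x, y = x0 + h, y0 + 2.0 * h
    exact = y / x
    approx = ratio_taylor2(x, y, x0, y0)
    errors[h] = abs(exact - approx)
    print(f"h={h}: exact={exact:.6f} approx={approx:.6f} error={errors[h]:.2e}")
```

The error shrinks much faster than h itself, as expected of a remainder of third order.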
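Then letting $x = \bar{x}_S$ and $y = \bar{y}_S$, with the expansion point $(x_0, y_0) = (\bar{x}_U, \bar{y}_U)$, we have:
A. Bias
$$E(\hat{B} - B) \approx \frac{\bar{y}_U}{\bar{x}_U^3} V(\bar{x}_S) - \frac{1}{\bar{x}_U^2} \mathrm{Cov}(\bar{x}_S, \bar{y}_S) = \frac{1 - f}{n \bar{x}_U^2} \left[ B S_x^2 - S_{xy} \right],$$
where $f = n/N$ is the sampling fraction. The first-order terms drop out because $E(\bar{x}_S) = \bar{x}_U$ and $E(\bar{y}_S) = \bar{y}_U$; when $f$ is negligible the leading factor is simply $1/(n \bar{x}_U^2)$.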
B. 1. Variance
$$E(\hat{t}_{yr} - t_y)^2 = t_x^2 E(\hat{B} - B)^2 \approx \frac{t_x^2}{\bar{x}_U^2} \left[ B^2 V(\bar{x}_S) + V(\bar{y}_S) - 2B\, \mathrm{Cov}(\bar{y}_S, \bar{x}_S) \right] = \frac{N^2 (1 - f)}{n} \left[ B^2 S_x^2 + S_y^2 - 2 B S_{xy} \right].$$
This is equivalent to the form favored by Lohr:
$$E(\hat{t}_{yr} - t_y)^2 \approx \frac{N^2 (1 - f)}{n} S_d^2,$$
where $S_d^2$ is the population variance of the variable $d_i = y_i - B x_i$.
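The equivalence of the two variance forms is just the variance of a linear combination, and it can be confirmed numerically. The sketch below uses a small made-up population (an assumption for illustration only) and checks that B^2 S_x^2 + S_y^2 - 2 B S_xy equals the variance of d_i = y_i - B x_i.

```python
from statistics import mean, variance


def s_xy(x, y):
    """Covariance with divisor N - 1, matching the S^2 convention."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Toy population, made up purely for illustration
x = [3.0, 5.0, 8.0, 10.0, 14.0]
y = [7.0, 9.0, 18.0, 19.0, 30.0]
N, n = len(x), 2                       # hypothetical sample size

B = mean(y) / mean(x)
expanded = B**2 * variance(x) + variance(y) - 2 * B * s_xy(x, y)

d = [yi - B * xi for xi, yi in zip(x, y)]   # d_i = y_i - B x_i (sums to zero)
s_d2 = variance(d)

approx_var = N**2 * (1 - n / N) / n * s_d2  # approximate variance of t-hat_yr
print(expanded, s_d2, approx_var)
```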
2. Variance estimation
Either form is estimated in the obvious way, i.e., by substituting the sample variance (of x, y,
or d) for the population variance, the sample covariance for the population covariance, and $\hat{B}$
for $B$.
3.
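When is ratio estimation better for estimating the total (or mean)?
1. If the bias can be ignored, then ratio estimation is superior when
$V(\hat{t}_{yr}) < V(\hat{t}_y)$, or equivalently (for $B > 0$) when
$$R > \frac{CV(x)}{2\, CV(y)},$$
where $R$ is the population correlation between x and y.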
2. When can the bias be ignored? The following result (due to Hartley and Ross) gives
an upper bound. First note that
$$\mathrm{Cov}(\hat{B}, \bar{x}_S) = E\!\left( \frac{\bar{y}_S}{\bar{x}_S}\, \bar{x}_S \right) - E(\hat{B}) E(\bar{x}_S) = \bar{y}_U - \bar{x}_U E(\hat{B}).$$
Therefore,
$$E(\hat{B}) - B = -\frac{\mathrm{Cov}(\hat{B}, \bar{x}_S)}{\bar{x}_U},$$
so that
$$|\text{bias in } \hat{B}| \le \frac{\sigma_{\hat{B}}\, \sigma_{\bar{x}_S}}{\bar{x}_U},$$
since correlation cannot exceed 1. Thus the size of the bias relative to the standard error
of $\hat{B}$ is
$$\frac{|\text{bias in } \hat{B}|}{\sigma_{\hat{B}}} \le \text{CV of } \bar{x}_S .$$
So if the CV of the sample mean of the auxiliary variable is sufficiently small, then the bias
of the ratio estimator is negligible.
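Because the Hartley and Ross bound is exact, it can be verified by enumerating every possible SRS from a small population. The sketch below does this for a made-up population of six units and samples of size three (both are assumptions for illustration); since all samples are equally likely, plain averages over the samples are exact expectations.

```python
from itertools import combinations
from statistics import mean, pstdev

# Toy population, made up purely for illustration
x = [3.0, 5.0, 8.0, 10.0, 14.0, 20.0]
y = [7.0, 9.0, 18.0, 19.0, 30.0, 37.0]
n = 3

B = sum(y) / sum(x)
b_hats, xbars = [], []
for idx in combinations(range(len(x)), n):   # every srswor of size n
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    b_hats.append(sum(ys) / sum(xs))         # B-hat for this sample
    xbars.append(mean(xs))                   # xbar_S for this sample

bias = mean(b_hats) - B                      # exact bias of B-hat
sd_b = pstdev(b_hats)                        # exact standard error of B-hat
cv_xbar = pstdev(xbars) / mean(x)            # exact CV of xbar_S

# Cov(B-hat, xbar_S) = E(B-hat * xbar_S) - E(B-hat) E(xbar_S)
cov = (mean(b * xb for b, xb in zip(b_hats, xbars))
       - mean(b_hats) * mean(xbars))

print(f"|bias| / SE = {abs(bias) / sd_b:.4f}  <=  CV(xbar_S) = {cv_xbar:.4f}")
print(f"bias = {bias:.6f},  -Cov / xbar_U = {-cov / mean(x):.6f}")
```

The second printed line checks the intermediate identity E(B-hat) - B = -Cov(B-hat, xbar_S) / xbar_U.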
4. Estimation of variance and confidence interval estimation
When the bias is negligible, it makes sense to compute a confidence interval based on the
variance. One can show that the ratio estimator is approximately normally distributed in
sufficiently large samples, so that the usual confidence interval procedure can be used.
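A minimal sketch of the variance estimate and the normal-theory interval follows. It substitutes the sample variance of the residuals e_i = y_i - B-hat x_i for S_d^2 in the form N^2 (1 - f) S_d^2 / n. The function name `ratio_total_ci`, its arguments, and the toy data are assumptions for illustration, and the three-unit sample is far too small for the normal approximation to be trusted in practice.

```python
from math import sqrt
from statistics import NormalDist


def ratio_total_ci(x, y, t_x, N, level=0.95):
    """Ratio estimate of the total, its estimated SE, and a normal-theory CI.

    Hypothetical helper: x, y are paired sample values, t_x is the known
    population total of x, N is the population size.
    """
    n = len(x)
    b_hat = sum(y) / sum(x)
    e = [yi - b_hat * xi for xi, yi in zip(x, y)]   # residuals, sum to zero
    s_e2 = sum(ei**2 for ei in e) / (n - 1)         # sample variance of e
    se = sqrt(N**2 * (1 - n / N) * s_e2 / n)        # estimated SE of t-hat_yr
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    est = b_hat * t_x
    return est, se, (est - z * se, est + z * se)


# Toy data, made up purely for illustration
est, se, (lo, hi) = ratio_total_ci([2.0, 4.0, 6.0], [5.0, 9.0, 16.0],
                                   t_x=100.0, N=20)
print(est, se, lo, hi)
```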
Example 1: Summary of SRS from Tourist Data

CVC totals   Survey totals
      4589           51325
      3921           38742
      4589           81540
      5672           62954
      4629          115832
      6046           73842
      9489           66590
      4300           27932
      3955           42653
      5072           33985

Corr = 0.21001

                     X = CVC totals   Y = survey totals
Mean                         5226.2             59539.5
Standard Error             520.8123            8403.724
Median                         4609             57139.5
Mode                           4589                #N/A
Standard Deviation         1646.953            26574.91
Sample Variance             2712454            7.06E+08
Kurtosis                   5.669522             0.91967
Skewness                   2.247655             0.95995
Range                          5568               87900
Minimum                        3921               27932
Maximum                        9489              115832
Sum                           52262              595395
Count                            10                  10

[Scatterplot of the sample: vertical axis 0 to 140000, horizontal axis 0 to 10000]
Is the auxiliary data useful?
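One way to examine whether the auxiliary data in Example 1 are useful is to compare the sample correlation with CV(x) / (2 CV(y)), using sample quantities in place of the population ones (an assumption of this sketch, since the criterion is stated for the population). The data are the ten pairs listed in Example 1; the variable names are illustrative.

```python
from statistics import mean, stdev

# Example 1 sample: CVC totals (x) and survey totals (y)
cvc = [4589, 3921, 4589, 5672, 4629, 6046, 9489, 4300, 3955, 5072]
survey = [51325, 38742, 81540, 62954, 115832, 73842, 66590, 27932, 42653, 33985]

n = len(cvc)
mx, my = mean(cvc), mean(survey)
s_xy = sum((a - mx) * (b - my) for a, b in zip(cvc, survey)) / (n - 1)

r = s_xy / (stdev(cvc) * stdev(survey))       # sample correlation
cv_x = stdev(cvc) / mx
cv_y = stdev(survey) / my
threshold = cv_x / (2 * cv_y)                 # ratio estimation helps if r exceeds this
b_hat = sum(survey) / sum(cvc)                # estimated ratio B-hat

print(f"r = {r:.5f}, CV(x)/(2 CV(y)) = {threshold:.5f}, B-hat = {b_hat:.4f}")
print("ratio estimation favored" if r > threshold else "mean-per-unit favored")
```

With these ten pairs the correlation falls below the threshold, which suggests that the CVC totals add little as an auxiliary variable in this sample.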
Example 2: Library Data

Univariate Procedure
Variable=INQ

Moments
N          369         Sum Wgts   369
Mean       29009.98    Sum        10704683
Std Dev    226077.3    Variance   5.111E10
Skewness   12.75344    Kurtosis   169.6792
USS        1.912E13    CSS        1.881E13
CV         779.3087    Std Mean   11769.11

Quantiles(Def=5)
100% Max   3278281     99%   288737
 75% Q3       4882     95%    56070
 50% Med      1200     90%    21301
 25% Q1        153     10%        0
  0% Min         0      5%        0
                        1%        0

Histogram                                                #   Boxplot
3300000+*                                                1      *
       .
       .
       .*                                                1      *
       .
       .
       .
       .
1700000+
       .
       .
       .
       .
       .*                                                1      *
       .
       .*                                                6      *
 100000+*********************************************  360   +--0--+
        ----+----+----+----+----+----+----+----+----+
        * may represent up to 8 counts
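
An srswor of size 50 was selected, and the ratio and mean-per-unit estimates of the total were calculated. The summaries below (N = 1000) describe 1000 such estimates: TOTINQ is the mean-per-unit estimate and TOTINQR is the ratio estimate.
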
Variable=TOTINQ

Moments
N          1000        Sum Wgts   1000
Mean       10706326    Sum        1.071E10
Std Dev    11030933    Variance   1.217E14
Skewness   1.38368     Kurtosis   1.080211
USS        2.362E17    CSS        1.216E17
CV         103.0319    Std Mean   348828.7

Quantiles(Def=5)
100% Max   52868460    99%   48522312
 75% Q3    21551840    95%   30029143
 50% Med    5241128    90%   28031668
 25% Q1     3258388    10%    1973124
  0% Min   846286.7     5%    1607578
                        1%    1134284

Histogram                                                  #   Boxplot
 5.25E7+*                                                  5      0
       .**                                                12      0
       .                                                          |
       .*                                                  3      |
       .***                                               30      |
 2.75E7+*************                                    126      |
       .*********                                         83   +-----+
       .                                                       |     |
       .***                                               28   |  +  |
       .*************************                        246   *-----*
2500000+***********************************************  467   +-----+
        ----+----+----+----+----+----+----+----+----+-
        * may represent up to 10 counts

2 'VAR' Variables: CIRC INQ

Simple Statistics
Variable     N      Mean   Std Dev        Sum   Minimum   Maximum
CIRC       369    127885    466496   47189555         0   6384212
INQ        369     29010    226077   10704683         0   3278281

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 369
            CIRC        INQ
CIRC     1.00000    0.90218
         0.0        0.0001
INQ      0.90218    1.00000
         0.0001     0.0

Univariate Procedure
Variable=TOTINQR

Moments
N          1000        Sum Wgts   1000
Mean       8810325     Sum        8.8103E9
Std Dev    5499547     Variance   3.025E13
Skewness   0.949433    Kurtosis   -0.59795
USS        1.078E17    CSS        3.021E16
CV         62.42161    Std Mean   173910.9

Quantiles(Def=5)
100% Max   22561369    99%   21302769
 75% Q3    13813802    95%   19200545
 50% Med    6338470    90%   17978239
 25% Q1     4716290    10%    3780556
  0% Min    2552644     5%    3316143
                        1%    2875174

Histogram                                           #   Boxplot
 2.25E7+*                                           2      |
       .****                                       14      |
 2.05E7+*****                                      19      |
       .*****                                      18      |
 1.85E7+************                               46      |
       .****************                           62      |
 1.65E7+**********                                 38      |
       .*******                                    27      |
 1.45E7+******                                     21      |
       .***                                        10   +-----+
 1.25E7+*                                           3   |     |
       .***                                        10   |     |
 1.05E7+****                                       16   |     |
       .*******                                    26   |     |
8500000+************                               46   |  +  |
       .*****************                          67   |     |
6500000+********************************          127   *-----*
       .***************************************   154   |     |
4500000+****************************************  159   +-----+
       .****************************              111      |
2500000+******                                     24      |
        ----+----+----+----+----+----+----+----+
        * may represent up to 4 counts
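
The repeated-sampling comparison in Example 2 can be imitated with a short simulation. The library data themselves are not reproduced in these notes, so the sketch below builds a SYNTHETIC, right-skewed population (lognormal x, with y roughly proportional to x); only the population size 369, the sample size 50, and the 1000 replications are taken from the example. The distribution parameters, seed, and variable names are assumptions, and the printed numbers will not match the SAS output above.

```python
import random
from statistics import mean, pstdev

random.seed(0)                       # arbitrary seed, for reproducibility
N, n, reps = 369, 50, 1000           # sizes taken from Example 2

# SYNTHETIC population (not the library data): skewed x, y roughly proportional to x
x = [random.lognormvariate(8.0, 1.2) for _ in range(N)]
y = [0.25 * xi * random.lognormvariate(0.0, 0.3) for xi in x]
t_x, t_y = sum(x), sum(y)

mpu, ratio = [], []
for _ in range(reps):
    idx = random.sample(range(N), n)          # srswor of size n
    sx = sum(x[i] for i in idx)
    sy = sum(y[i] for i in idx)
    mpu.append(N * sy / n)                    # mean-per-unit estimate of t_y
    ratio.append(sy / sx * t_x)               # ratio estimate of t_y

print(f"true total          : {t_y:.0f}")
print(f"mean-per-unit: mean {mean(mpu):.0f}, std dev {pstdev(mpu):.0f}")
print(f"ratio        : mean {mean(ratio):.0f}, std dev {pstdev(ratio):.0f}")
```

Comparing the two standard deviations, and each mean against the true total, mirrors the comparison of TOTINQ and TOTINQR in the example.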