Journal of Reliability and Statistical Studies [ISSN: 0974-8024 (Print), 2229-5666 (Online)]
Vol. 3, Issue 2 (2010): 93-101
ON CLASSES OF UNBIASED SAMPLING STRATEGIES
Shashi Bhushan1 and Shubhra Katara2
1. Department of Statistics, PUC Campus, Mizoram University, Aizawl, Mizoram, India. E-mail: [email protected]
2. Department of Statistics, Bareilly College, Bareilly, U.P., India. E-mail: [email protected]
Abstract
In this paper, we propose two classes of sampling strategies based on the modified ratio estimator of Singh (2003), which uses the standard deviation and the coefficient of skewness of the auxiliary variable, for estimating the population mean (total) of the study variable in a finite population. The properties of the proposed sampling strategies are studied and some concluding remarks are given. An empirical study is also included as an illustration.
Keywords: Ratio estimator, Variance, Unbiasedness, Mean square error.
1. Introduction
The use of information on an auxiliary variable for increasing the efficiency of a sampling strategy is well established in sampling from finite populations. Many transformed estimation procedures are also available in the literature for increasing efficiency, but unfortunately this always comes at the cost of unbiasedness. There are numerous instances where the bias of these estimation procedures is very large relative to their mean square error, so they may not be advisable. The aim of this paper is to introduce some efficient as well as unbiased sampling strategies for finite populations. Further, it may be of interest to note that very few works in the sampling literature focus on devising sampling strategies that are better in terms of both efficiency and unbiasedness. This may be done by innovating in both aspects of a sampling strategy, namely the sampling scheme and the estimation procedure. In this paper, we use prior information about the coefficient of skewness and the standard deviation at both stages, estimation and sampling scheme, so that the accuracy of the sampling strategy is improved. Recently, some attempts have been made by various authors, including Singh (2003), Senapati et al. (2006) and Singh et al. (2005) among others, to improve existing sampling strategies by using information on an auxiliary variable.
The use of prior values of the coefficient of skewness and the standard deviation in estimating the population mean of the characteristic under study y was made by Singh (2003), who noted that such prior information leads to more efficient estimation of the population mean (total) of the study variable.
The ratio estimator for estimating the population mean of the study variable y is given by

\bar{y}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X} = \hat{R}\,\bar{X}   (1.1)
where \hat{R} = \bar{y}/\bar{x}, \bar{y} is the sample mean of the study variable y, \bar{x} is the sample mean of the auxiliary variable x, and \bar{X} is the population mean of x, which is assumed to be known.
If the population standard deviation of x, denoted by \sigma_x, and the population coefficient of skewness, denoted by \beta_{1x}, are known, then Singh (2003) proposed a modified product estimator for estimating the population mean \bar{Y} of the study variable, given by

\bar{y}_{SP} = \bar{y}\,\frac{\bar{x}\beta_{1x} + \sigma_x}{\bar{X}\beta_{1x} + \sigma_x}   (1.2)
The bias and mean square error of \bar{y}_{SP} under simple random sampling are given by

Bias(\bar{y}_{SP}) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}\rho\nu C_x C_y   (1.3)

and

MSE(\bar{y}_{SP}) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2\{C_y^2 + \nu C_x^2(\nu + 2K)\}   (1.4)

where C_y and C_x are the population coefficients of variation of y and x, \nu = \bar{X}\beta_{1x}/(\bar{X}\beta_{1x} + \sigma_x), K = \rho C_y/C_x, and \rho is the population correlation coefficient between x and y.
Similar to (1.2), a modified ratio estimator can be given by

\bar{y}_{SR} = \bar{y}\,\frac{\bar{X}\beta_{1x} + \sigma_x}{\bar{x}\beta_{1x} + \sigma_x}

having bias and mean square error

Bias(\bar{y}_{SR}) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}(\nu^2 C_x^2 - \rho\nu C_x C_y)

and

MSE(\bar{y}_{SR}) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2\{C_y^2 + \nu C_x^2(\nu - 2K)\},

respectively.
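To make the estimators concrete, here is a minimal Python sketch (not part of the original paper; the function and variable names are illustrative) that computes (1.1), (1.2) and \bar{y}_{SR} from a drawn sample, assuming \bar{X}, \sigma_x and \beta_{1x} are known in advance:

    import numpy as np

    def singh_estimators(y_sample, x_sample, X_bar, sigma_x, beta_1x):
        # sample means of the study and auxiliary variables
        y_bar, x_bar = np.mean(y_sample), np.mean(x_sample)
        y_R = y_bar / x_bar * X_bar                                               # (1.1)
        y_SP = y_bar * (x_bar * beta_1x + sigma_x) / (X_bar * beta_1x + sigma_x)  # (1.2)
        y_SR = y_bar * (X_bar * beta_1x + sigma_x) / (x_bar * beta_1x + sigma_x)
        return y_R, y_SP, y_SR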
In this paper, the Singh (2003) estimator is modified along the lines of Walsh (1970) and Reddy (1973). We propose two classes of unbiased sampling strategies built on the following estimator of the population mean \bar{Y}:

\bar{y}_{SA} = \bar{y}\,\frac{\bar{X}\beta_{1x} + \sigma_x}{A(\bar{x}\beta_{1x} + \sigma_x) + (1 - A)(\bar{X}\beta_{1x} + \sigma_x)}   (1.5)

Note that for A = 1 the proposed estimator reduces to \bar{y}_{SR}. We now consider this estimator under the following sampling schemes:
(i) simple random sampling without replacement along with the jack-knife technique, denoting the resulting estimator by \hat{\bar{Y}}^*_{SS};
(ii) a Midzuno (1952)-Lahiri (1951)-Sen (1952) type sampling scheme, denoting the resulting estimator by \bar{y}_{SM}.
Both sampling strategies aim at providing classes of strategies that are better than the existing ones in the sense of unbiasedness and smaller mean square error.
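As a small illustration of the class (1.5), the following hypothetical Python function evaluates \bar{y}_{SA} for a chosen characterizing scalar A; setting A = 0 returns the sample mean, while A = 1 returns \bar{y}_{SR}:

    def y_SA(y_bar, x_bar, X_bar, sigma_x, beta_1x, A):
        # class estimator (1.5); A = 0 gives the sample mean, A = 1 gives y_SR
        num = X_bar * beta_1x + sigma_x
        den = A * (x_bar * beta_1x + sigma_x) + (1 - A) * num
        return y_bar * num / den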
2. Bias and MSE under SRSWOR
Consider the estimator \bar{y}_{SA} under simple random sampling without replacement and denote it by \bar{y}_{SS}. Let \bar{y} = \bar{Y} + e_0 and \bar{x} = \bar{X} + e_1 such that E(e_0) = E(e_1) = 0. Substituting these in (1.5) and using a Taylor series expansion, we have

\bar{y}_{SS} - \bar{Y} = e_0 - \frac{A\beta_{1x}\bar{Y}e_1}{\bar{X}\beta_{1x} + \sigma_x} + \frac{A^2\beta_{1x}^2\bar{Y}e_1^2}{(\bar{X}\beta_{1x} + \sigma_x)^2} - \frac{A\beta_{1x}e_0 e_1}{\bar{X}\beta_{1x} + \sigma_x} + \cdots   (2.1)

Taking expectation on both sides of (2.1), we have, up to the first order of approximation,

Bias(\bar{y}_{SS}) = E(\bar{y}_{SS}) - \bar{Y} = \frac{A^2\beta_{1x}^2\bar{Y}}{(\bar{X}\beta_{1x} + \sigma_x)^2}\,E(e_1^2) - \frac{A\beta_{1x}}{\bar{X}\beta_{1x} + \sigma_x}\,E(e_0 e_1)   (2.2)

Since

E(e_0^2) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2 C_y^2,
E(e_1^2) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{X}^2 C_x^2,
E(e_0 e_1) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{X}\bar{Y}\rho C_x C_y,   (2.3)

it follows that

Bias(\bar{y}_{SS}) = \bar{Y}\left(\frac{1}{n} - \frac{1}{N}\right)\{A^2\nu^2 C_x^2 - A\nu\rho C_x C_y\}   (2.4)
Now, for the mean square error, considering (2.1) to the first order of approximation, we get

MSE(\bar{y}_{SS}) = E(\bar{y}_{SS} - \bar{Y})^2 = E\left[e_0 - \frac{A\beta_{1x}\bar{Y}e_1}{\bar{X}\beta_{1x} + \sigma_x}\right]^2
= E(e_0^2) + \frac{A^2\beta_{1x}^2\bar{Y}^2}{(\bar{X}\beta_{1x} + \sigma_x)^2}\,E(e_1^2) - \frac{2A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\,E(e_0 e_1)
= \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (2.5)

Note that putting A = 1 in (2.4) and (2.5) recovers the expressions for the bias and mean square error of \bar{y}_{SR}. The optimizing value of the characterizing scalar A is given by

A = \frac{K}{\nu} = A_{opt} (say)   (2.6)

The minimum mean square error under the optimizing value A = A_{opt} is

MSE(\bar{y}_{SS})_{min} = \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)(1 - \rho^2)C_y^2   (2.7)
which is the same as the mean square error of the linear regression estimator. Also note that Bias(\bar{y}_{SS}) = 0 under the optimizing value of A.
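The following sketch (illustrative names; all population parameters assumed known) evaluates the optimizing scalar (2.6) and the corresponding minimum mean square error (2.7):

    def A_opt_and_min_mse(Y_bar, X_bar, C_x, C_y, rho, sigma_x, beta_1x, n, N):
        # nu and K as defined below (1.4); A_opt from (2.6), minimum MSE from (2.7)
        nu = X_bar * beta_1x / (X_bar * beta_1x + sigma_x)
        K = rho * C_y / C_x
        A_opt = K / nu
        mse_min = (1 / n - 1 / N) * Y_bar**2 * (1 - rho**2) * C_y**2
        return A_opt, mse_min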
3. Unbiasedness and MSE of Jack-Knife Sampling Strategy
Let us now apply Quenouille's (1956) method of jack-knifing, whereby the sample of size n = 2m from a population of size N is split at random into two sub-samples of size m each. For further details one may refer to Gray and Schucany (1972).
Let us define

\bar{y}_{SS}^{(i)} = \bar{y}_i\,\frac{\bar{X}\beta_{1x} + \sigma_x}{A(\bar{x}_i\beta_{1x} + \sigma_x) + (1 - A)(\bar{X}\beta_{1x} + \sigma_x)}, \quad i = 1, 2

\bar{y}_{SS}^{(3)} = \bar{y}_s\,\frac{\bar{X}\beta_{1x} + \sigma_x}{A(\bar{x}_s\beta_{1x} + \sigma_x) + (1 - A)(\bar{X}\beta_{1x} + \sigma_x)}   (3.1)

where A is the characterizing scalar to be chosen suitably, s = s_1 + s_2, with s_1 and s_2 the two sub-samples of size m each and + denoting the disjoint union. Here \bar{y}_1, \bar{y}_2 and \bar{y}_s denote the sample means of the characteristic y based on the two sub-samples and on the entire sample of size n = 2m, and \bar{x}_1, \bar{x}_2 and \bar{x}_s denote the corresponding sample means of the characteristic x.
It can easily be seen that

Bias(\bar{y}_{SS}^{(i)}) = \bar{Y}\left(\frac{1}{m} - \frac{1}{N}\right)\{A^2\nu^2 C_x^2 - A\nu\rho C_x C_y\}, \quad i = 1, 2

Bias(\bar{y}_{SS}^{(3)}) = \bar{Y}\left(\frac{1}{2m} - \frac{1}{N}\right)\{A^2\nu^2 C_x^2 - A\nu\rho C_x C_y\} = B_1 (say)   (3.2)

Let us define \hat{\bar{Y}}'_{SS} = (\bar{y}_{SS}^{(1)} + \bar{y}_{SS}^{(2)})/2 as an alternative estimator of the population mean \bar{Y}. The bias of \hat{\bar{Y}}'_{SS} is

Bias(\hat{\bar{Y}}'_{SS}) = \bar{Y}\left(\frac{1}{m} - \frac{1}{N}\right)\{A^2\nu^2 C_x^2 - A\nu\rho C_x C_y\} = B_2 (say)   (3.3)
We propose the jack-knife estimator \hat{\bar{Y}}^*_{SS} for estimating the population mean \bar{Y}, given by

\hat{\bar{Y}}^*_{SS} = \frac{\bar{y}_{SS}^{(3)} - R\,\hat{\bar{Y}}'_{SS}}{1 - R}, \quad \text{where } R = \frac{B_1}{B_2} = \frac{N - 2m}{2(N - m)} = \frac{N - n}{2N - n}   (3.4)

Taking expectation of (3.4) and using (3.2) and (3.3), we obtain E(\hat{\bar{Y}}^*_{SS}) = \bar{Y}, showing that \hat{\bar{Y}}^*_{SS} is an unbiased estimator of the population mean \bar{Y} to the first order of approximation.
Consider

MSE(\hat{\bar{Y}}^*_{SS}) = E\left[\frac{\bar{y}_{SS}^{(3)} - R\,\hat{\bar{Y}}'_{SS}}{1 - R} - \bar{Y}\right]^2
= \frac{1}{(1 - R)^2}\,E\left[(\bar{y}_{SS}^{(3)} - \bar{Y}) - R(\hat{\bar{Y}}'_{SS} - \bar{Y})\right]^2
= \frac{1}{(1 - R)^2}\left[E(\bar{y}_{SS}^{(3)} - \bar{Y})^2 + R^2 E(\hat{\bar{Y}}'_{SS} - \bar{Y})^2 - 2R\,E(\bar{y}_{SS}^{(3)} - \bar{Y})(\hat{\bar{Y}}'_{SS} - \bar{Y})\right]   (3.5)
Also,

E(\bar{y}_{SS}^{(3)} - \bar{Y})^2 = MSE(\bar{y}_{SS}^{(3)}) = \bar{Y}^2\left(\frac{1}{2m} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (3.6)

Further,

E(\hat{\bar{Y}}'_{SS} - \bar{Y})^2 = E\left[\frac{\bar{y}_{SS}^{(1)} + \bar{y}_{SS}^{(2)}}{2} - \bar{Y}\right]^2
= \frac{1}{4}\,E\left[(\bar{y}_{SS}^{(1)} - \bar{Y}) + (\bar{y}_{SS}^{(2)} - \bar{Y})\right]^2
= \frac{1}{4}\left[E(\bar{y}_{SS}^{(1)} - \bar{Y})^2 + E(\bar{y}_{SS}^{(2)} - \bar{Y})^2 + 2E(\bar{y}_{SS}^{(1)} - \bar{Y})(\bar{y}_{SS}^{(2)} - \bar{Y})\right]   (3.7)

Since

E(\bar{y}_{SS}^{(i)} - \bar{Y})^2 = MSE(\bar{y}_{SS}^{(i)}) = \bar{Y}^2\left(\frac{1}{m} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}, \quad i = 1, 2   (3.8)

Let \bar{y}_i = \bar{Y} + e_0^{(i)} and \bar{x}_i = \bar{X} + e_1^{(i)} such that E(e_0^{(i)}) = E(e_1^{(i)}) = 0, i = 1, 2.
Consider

E(\bar{y}_{SS}^{(1)} - \bar{Y})(\bar{y}_{SS}^{(2)} - \bar{Y}) = E\left[\left(e_0^{(1)} - \frac{A\beta_{1x}\bar{Y}e_1^{(1)}}{\bar{X}\beta_{1x} + \sigma_x}\right)\left(e_0^{(2)} - \frac{A\beta_{1x}\bar{Y}e_1^{(2)}}{\bar{X}\beta_{1x} + \sigma_x}\right)\right]
= E(e_0^{(1)}e_0^{(2)}) - \frac{A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\{E(e_0^{(2)}e_1^{(1)}) + E(e_0^{(1)}e_1^{(2)})\} + \left(\frac{A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\right)^2 E(e_1^{(1)}e_1^{(2)})

Substituting the results given in Sukhatme and Sukhatme (1997),

E(e_0^{(1)}e_0^{(2)}) = -\frac{1}{N}\bar{Y}^2 C_y^2,
E(e_1^{(1)}e_1^{(2)}) = -\frac{1}{N}\bar{X}^2 C_x^2,
E(e_0^{(1)}e_1^{(2)}) = E(e_0^{(2)}e_1^{(1)}) = -\frac{1}{N}\bar{X}\bar{Y}\rho C_x C_y,

we have

E(\bar{y}_{SS}^{(1)} - \bar{Y})(\bar{y}_{SS}^{(2)} - \bar{Y}) = -\frac{1}{N}\bar{Y}^2\{C_y^2 + A^2\nu^2 C_x^2 - 2A\nu\rho C_x C_y\} = -\frac{1}{N}\bar{Y}^2\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (3.9)

Putting the values from (3.8) and (3.9) in (3.7), we have

E(\hat{\bar{Y}}'_{SS} - \bar{Y})^2 = \bar{Y}^2\,\frac{1}{4}\left[2\left(\frac{1}{m} - \frac{1}{N}\right) - \frac{2}{N}\right]\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}
= \bar{Y}^2\left(\frac{1}{2m} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (3.10)

Now consider
E(\bar{y}_{SS}^{(3)} - \bar{Y})(\hat{\bar{Y}}'_{SS} - \bar{Y}) = E\left[(\bar{y}_{SS}^{(3)} - \bar{Y})\left(\frac{\bar{y}_{SS}^{(1)} + \bar{y}_{SS}^{(2)}}{2} - \bar{Y}\right)\right]
= \frac{1}{2}\{E(\bar{y}_{SS}^{(3)} - \bar{Y})(\bar{y}_{SS}^{(1)} - \bar{Y}) + E(\bar{y}_{SS}^{(3)} - \bar{Y})(\bar{y}_{SS}^{(2)} - \bar{Y})\}

Since

E(\bar{y}_{SS}^{(3)} - \bar{Y})(\bar{y}_{SS}^{(i)} - \bar{Y}) = E\left[\left(e_0 - \frac{A\beta_{1x}\bar{Y}e_1}{\bar{X}\beta_{1x} + \sigma_x}\right)\left(e_0^{(i)} - \frac{A\beta_{1x}\bar{Y}e_1^{(i)}}{\bar{X}\beta_{1x} + \sigma_x}\right)\right], \quad i = 1, 2
= E(e_0 e_0^{(i)}) - \frac{A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\{E(e_0^{(i)}e_1) + E(e_0 e_1^{(i)})\} + \left(\frac{A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\right)^2 E(e_1 e_1^{(i)}),

and using the following results given in Sukhatme and Sukhatme (1997),

E(e_0 e_0^{(i)}) = \left(\frac{1}{2m} - \frac{1}{N}\right)\bar{Y}^2 C_y^2,
E(e_1 e_1^{(i)}) = \left(\frac{1}{2m} - \frac{1}{N}\right)\bar{X}^2 C_x^2,
E(e_0 e_1^{(i)}) = E(e_0^{(i)}e_1) = \left(\frac{1}{2m} - \frac{1}{N}\right)\bar{X}\bar{Y}\rho C_x C_y, \quad i = 1, 2,

we have

E(\bar{y}_{SS}^{(3)} - \bar{Y})(\bar{y}_{SS}^{(i)} - \bar{Y}) = \bar{Y}^2\left(\frac{1}{2m} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (3.11)
Putting the values from (3.6), (3.10) and (3.11) in (3.5), we have

MSE(\hat{\bar{Y}}^*_{SS}) = \frac{1}{(1 - R)^2}\,\bar{Y}^2\left(\frac{1}{2m} - \frac{1}{N}\right)(1 + R^2 - 2R)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}
= \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\}   (3.12)

which is equal to the mean square error of \bar{y}_{SS}. Therefore, the optimizing value of the characterizing scalar A is given by (2.6) and the minimum mean square error under the optimizing value A = A_{opt} is given by (2.7).
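The whole construction of this section can be sketched in a few lines of Python (an illustrative implementation under the stated assumptions: y and x are NumPy arrays holding an SRSWOR sample of even size n = 2m, and R = B_1/B_2 as in (3.4)):

    import numpy as np

    def jackknife_Y_SS(y, x, X_bar, sigma_x, beta_1x, A, N, rng=None):
        # split the sample at random into the two sub-samples s1, s2
        rng = np.random.default_rng() if rng is None else rng
        n = len(y)
        idx = rng.permutation(n)
        s1, s2 = idx[:n // 2], idx[n // 2:]

        def t(ys, xs):
            # the class estimator (1.5) evaluated on a (sub-)sample
            den = A * (np.mean(xs) * beta_1x + sigma_x) \
                  + (1 - A) * (X_bar * beta_1x + sigma_x)
            return np.mean(ys) * (X_bar * beta_1x + sigma_x) / den

        t1, t2, t3 = t(y[s1], x[s1]), t(y[s2], x[s2]), t(y, x)
        t_prime = (t1 + t2) / 2                  # Y'_SS of (3.3)
        R = (N - n) / (2 * N - n)                # R = B1/B2, cf. (3.4)
        return (t3 - R * t_prime) / (1 - R)      # unbiased to first order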
4. Unbiasedness and MSE of Midzuno-Lahiri-Sen Type Sampling Strategy
Let us consider \bar{y}_{SA} under a Midzuno (1952)-Lahiri (1951)-Sen (1952) type sampling scheme and denote it by \bar{y}_{SM}. The proposed Midzuno-Lahiri-Sen type scheme for selecting a sample s of size n selects the first unit with probability proportional to \bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(x_i - \bar{X}), where x_i is the size of the first selected unit, such that

P(i) = P(\text{selecting first unit } i \text{ with size } x_i) = \frac{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(x_i - \bar{X})}{N(\bar{X}\beta_{1x} + \sigma_x)}   (4.1)

and then selects the remaining n - 1 units from the remaining N - 1 units of the population by simple random sampling without replacement. Thus the probability of selecting the sample s of size n is

P(s) = \frac{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(\bar{x}_s - \bar{X})}{(\bar{X}\beta_{1x} + \sigma_x)\binom{N}{n}}   (4.2)
where \bar{x}_s and \bar{y}_s are the sample means of x and y, respectively, based on the sample s. Consider

E(\bar{y}_{SM}) = E\left[\frac{\bar{y}_s(\bar{X}\beta_{1x} + \sigma_x)}{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(\bar{x}_s - \bar{X})}\right]
= \sum_{s=1}^{\binom{N}{n}} \frac{\bar{y}_s(\bar{X}\beta_{1x} + \sigma_x)}{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(\bar{x}_s - \bar{X})}\,P(s)
= \sum_{s=1}^{\binom{N}{n}} \frac{\bar{y}_s}{\binom{N}{n}}
= E(\bar{y}_s)   (under simple random sampling without replacement)
= \bar{Y}   (4.3)

showing that \bar{y}_{SM} is an unbiased estimator of the population mean \bar{Y} for all values of A under the proposed Midzuno-Lahiri-Sen type sampling scheme.
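The scheme and the exact unbiasedness shown in (4.3) can be verified by brute force on a small artificial population; the sketch below (illustrative data and constants, not from the paper) enumerates all \binom{N}{n} samples, checks that (4.2) is a proper probability distribution, and confirms E(\bar{y}_{SM}) = \bar{Y} for an arbitrary A:

    from itertools import combinations
    import numpy as np

    # illustrative population and constants (any weights keeping (4.1) positive will do)
    x_pop = np.array([3.0, 5.0, 8.0, 11.0, 13.0])
    y_pop = np.array([10.0, 14.0, 21.0, 30.0, 33.0])
    A, beta_1x, sigma_x, n = 0.7, 0.9, np.std(x_pop), 2

    X_bar = np.mean(x_pop)
    g = X_bar * beta_1x + sigma_x
    samples = list(combinations(range(len(x_pop)), n))

    def P_s(s):
        # sample-selection probability (4.2)
        xs_bar = np.mean(x_pop[list(s)])
        return (g + A * beta_1x * (xs_bar - X_bar)) / (g * len(samples))

    assert abs(sum(P_s(s) for s in samples) - 1.0) < 1e-12   # probabilities sum to 1

    E = sum(np.mean(y_pop[list(s)]) * g
            / (g + A * beta_1x * (np.mean(x_pop[list(s)]) - X_bar)) * P_s(s)
            for s in samples)
    print(E, np.mean(y_pop))   # identical: y_SM is exactly unbiased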
Now, since

MSE(\bar{y}_{SM}) = V(\bar{y}_{SM}) = E(\bar{y}_{SM}^2) - \bar{Y}^2,

where

E(\bar{y}_{SM}^2) = \sum_{s=1}^{\binom{N}{n}} \bar{y}_{SM}^2\,P(s)
= \sum_{s=1}^{\binom{N}{n}} \left[\frac{\bar{y}_s(\bar{X}\beta_{1x} + \sigma_x)}{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(\bar{x}_s - \bar{X})}\right]^2 P(s)
= \sum_{s=1}^{\binom{N}{n}} \bar{y}_s^2\,\frac{\bar{X}\beta_{1x} + \sigma_x}{\bar{X}\beta_{1x} + \sigma_x + A\beta_{1x}(\bar{x}_s - \bar{X})}\,\binom{N}{n}^{-1}
= E\left[(\bar{Y} + e_0)^2\left(1 + \frac{A\beta_{1x}e_1}{\bar{X}\beta_{1x} + \sigma_x}\right)^{-1}\right]
= E\left[(\bar{Y}^2 + e_0^2 + 2\bar{Y}e_0)\left(1 - \frac{A\beta_{1x}e_1}{\bar{X}\beta_{1x} + \sigma_x} + \frac{A^2\beta_{1x}^2 e_1^2}{(\bar{X}\beta_{1x} + \sigma_x)^2} - \cdots\right)\right]
= \bar{Y}^2 + E(e_0^2) + \frac{A^2\beta_{1x}^2\bar{Y}^2}{(\bar{X}\beta_{1x} + \sigma_x)^2}\,E(e_1^2) - \frac{2A\beta_{1x}\bar{Y}}{\bar{X}\beta_{1x} + \sigma_x}\,E(e_0 e_1)   (up to first order of approximation)

Therefore, by using (2.3), we have

MSE(\bar{y}_{SM}) = \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)\{C_y^2 + \nu C_x^2(A^2\nu - 2AK)\} = MSE(\bar{y}_{SS}) = MSE(\hat{\bar{Y}}^*_{SS}) = MSE(\bar{y}_{SA}) (say)   (4.4)

Thus the optimizing value of the characterizing scalar A is given by (2.6), and the minimum mean square error under the optimizing value A = A_{opt} is given by (2.7). Further, let this minimum mean square error be denoted by MSE(\bar{y}_{SA})_{min}.
5. Concluding Remarks
If the minimizing value A = K/\nu = A_{opt} is known, then we have

MSE(\bar{y}) - MSE(\bar{y}_{SA})_{min} = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2\rho^2 C_y^2 \ge 0   (5.1)
MSE(\bar{y}_R) - MSE(\bar{y}_{SA})_{min} = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2(C_x - \rho C_y)^2 \ge 0   (5.2)
MSE(\bar{y}_P) - MSE(\bar{y}_{SA})_{min} = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2(C_x + \rho C_y)^2 \ge 0   (5.3)
MSE(\bar{y}_{SR}) - MSE(\bar{y}_{SA})_{min} = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2(\nu C_x - \rho C_y)^2 \ge 0   (5.4)
MSE(\bar{y}_{SP}) - MSE(\bar{y}_{SA})_{min} = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2(\nu C_x + \rho C_y)^2 \ge 0   (5.5)

Hence, under the optimizing value of the characterizing scalar A = K/\nu = A_{opt}, the proposed sampling strategies are always better than \bar{y}, \bar{y}_{SR}, \bar{y}_{SP}, \bar{y}_P, \bar{y}_R and \bar{y}_{lr} in the sense of unbiasedness and gain in efficiency.
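Each of (5.1)-(5.5) reduces to a perfect square. As an illustration, the following SymPy sketch (illustrative, with the common factor (1/n - 1/N)\bar{Y}^2 dropped) verifies identity (5.4):

    import sympy as sp

    C_x, C_y, rho, nu = sp.symbols('C_x C_y rho nu', positive=True)
    K = rho * C_y / C_x
    mse_SR = C_y**2 + nu * C_x**2 * (nu - 2 * K)   # MSE(y_SR), Section 1
    mse_min = (1 - rho**2) * C_y**2                # minimum MSE, (2.7)
    print(sp.simplify(mse_SR - mse_min - (nu * C_x - rho * C_y)**2))   # -> 0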
6. Empirical Study
Let us consider the following example from Singh and Chaudhary (1986), for which \bar{Y} = 1467.545, \bar{X} = 22.62, C_x = 1.460904, C_y = 1.745871, \beta_{1x} = 3.307547, \sigma_x = 32.28717 and \rho = 0.902147. The bias, mean square error and percent relative efficiency (PRE) with respect to \bar{y} of the sample mean \bar{y}, the ratio estimator \bar{y}_R, the product estimator \bar{y}_P, \bar{y}_{lr}, \bar{y}_{SR} and \bar{y}_{SP} are given below.

Est. (t)                              Bias(t)     MSE(t)      PRE(t; \bar{y})
\bar{y}                               0           6564590     100
\bar{y}_R                             -244.68     1249925     525.20
\bar{y}_P                             3376.77     21072230    31.15
\bar{y}_{lr}                          3867.58     1221872     537.26
\bar{y}_{SR}                          -830.47     1884104     348.42
\bar{y}_{SP}                          2358.86     15731012    41.73
\bar{y}_{SM} / \hat{\bar{Y}}^*_{SS}   0           1221872     537.26

Note that the above results are scaled by the factor (1/n - 1/N). It can easily be observed from the table that only the sample mean \bar{y} and the proposed sampling strategies are unbiased. Further, \bar{y}_{lr} and the proposed sampling strategies attain the minimum mean square error, but \bar{y}_{lr} is biased. It is evident from this empirical study that the proposed sampling strategies are better than the remaining ones in terms of both unbiasedness and mean square error.
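The MSE and PRE columns of the table can be reproduced directly from the expressions of Sections 1 and 2; the following Python sketch (illustrative, using the population values quoted above and omitting the common factor (1/n - 1/N)) does so:

    Y, X = 1467.545, 22.62
    C_x, C_y = 1.460904, 1.745871
    beta_1x, sigma_x, rho = 3.307547, 32.28717, 0.902147

    nu = X * beta_1x / (X * beta_1x + sigma_x)
    K = rho * C_y / C_x

    mse = {
        'y':    Y**2 * C_y**2,
        'y_R':  Y**2 * (C_y**2 + C_x**2 - 2 * rho * C_x * C_y),
        'y_P':  Y**2 * (C_y**2 + C_x**2 + 2 * rho * C_x * C_y),
        'y_lr': Y**2 * (1 - rho**2) * C_y**2,   # also the proposed strategies
        'y_SR': Y**2 * (C_y**2 + nu * C_x**2 * (nu - 2 * K)),
        'y_SP': Y**2 * (C_y**2 + nu * C_x**2 * (nu + 2 * K)),
    }
    for name, m in mse.items():
        print(name, round(m), round(100 * mse['y'] / m, 2))   # MSE and PRE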
References
1. Gray, H. L. and Schucany, W. R. (1972). The Generalized Jackknife Statistic, Marcel Dekker, New York.
2. Lahiri, D. B. (1951). A method of sample selection providing unbiased ratio estimates, Bull. Int. Stat. Inst., 33, p. 133-140.
3. Midzuno, H. (1952). On the sampling system with probability proportional to the sum of the sizes, Ann. Inst. Stat. Math., 3, p. 99-107.
4. Murthy, M. N. (1967). Sampling Theory and Methods, Statistical Publishing Society, Calcutta, India.
5. Quenouille, M. H. (1956). Notes on bias in estimation, Biometrika, 43, p. 353-360.
6. Reddy, V. N. (1973). On ratio and product methods of estimation, Sankhya, Series B, 35(3), p. 307-316.
7. Sen, A. R. (1952). Present status of probability sampling and its use in the estimation of a characteristic, Econometrica, 20, p. 103.
8. Senapati, S. C., Sahoo, L. N. and Misra, G. (2006). On a scheme of sampling of two units with inclusion probability proportional to size, Austrian Journal of Statistics, 35(4), p. 445-454.
9. Singh, G. N. (2003). On the improvement of product method of estimation in sample surveys, J. I. S. A. S., 56(3), p. 267-275.
10. Singh, H. P., Singh, R., Espejo, M. R., Pineda, M. D. and Nadarajah, S. (2005). On the efficiency of a dual to ratio-cum-product estimator in sample surveys, Mathematical Proceedings of the Royal Irish Academy, 105A(2), p. 51-56.
11. Singh, D. and Chaudhary, F. S. (1986). Theory and Analysis of Sample Survey Designs, Wiley Eastern Limited, New Delhi.
12. Sukhatme, P. V. and Sukhatme, B. V. (1997). Sampling Theory of Surveys with Applications, Piyush Publications, Delhi.
13. Walsh, J. E. (1970). Generalization of ratio estimator for population total, Sankhya, Series A, 32, p. 99-106.