Asymptotic Efficiencies of the Goodness-of-Fit Test Based on Linear Statistics

Muhammad Naeem
Deanship of Preparatory Year, Umm Al Qura University, Makkah, KSA
[email protected]
Abstract
A large literature is available on the Pitman asymptotic efficiencies (AE) of goodness-of-fit tests based on higher-order non-overlapping spacings. In this paper the performance of the linear statistic based on higher-order non-overlapping spacings is assessed asymptotically. Since the linear statistic satisfies Cramér's condition, probability of large deviation results are applicable to it. It is observed that, just as in Pitman's sense, the linear test is also the most efficient in the intermediate sense (in all three cases), because it satisfies the above-mentioned condition.
Key words: Asymptotic Efficiency, Goodness of fit, Spacing, Cramér’s condition, Large
Deviation.
1. Introduction
Let $X_1, X_2, \ldots, X_{n-1}$ be an ordered (ascending) sample from a population having continuous cumulative distribution function (cdf) $F$ and probability density function (pdf) $f(x)$. The goodness-of-fit problem is to test whether this distribution equals a specified one. A common approach to such problems is to transform the data via the probability integral transformation $U = F(X)$, so that the support of $F$ is reduced to $[0,1]$ and the specified cdf reduces to that of a uniform random variable on $[0,1]$. Therefore, without loss of generality, one may deal with the problem of testing the null hypothesis
$$H_0:\ f(x) = 1, \quad 0 \le x \le 1, \qquad (1.1)$$
against the alternative that $f$ is the pdf of some other random variable (different from the uniform) having support on $[0,1]$. From now on, we deal with this reduced problem. There are two basic approaches to the goodness-of-fit problem: tests based on observed frequencies and tests based on spacings. It is known that while tests based on frequencies perform better in detecting differences between the distribution functions, tests based on spacings are useful for detecting differences between the corresponding densities. It is also worth noticing that Jammalamadaka and Tiwari (1987) have shown that a comparable test based on k-spacings is better than the chi-squared test in terms of local power. With the notations $X_0 = 0$ and $X_n = 1$, the non-overlapping $k$-spacings are defined as
$$D_j^{(k)} = X_{jk} - X_{(j-1)k}, \quad j = 1, 2, \ldots, N', \qquad D_{N'+1}^{(k)} = 1 - X_{N'k},$$
where the integer $k \in [1, n]$ and $N' = [n/k]$ is the greatest integer less than or equal to $n/k$. Set $N = N'$ if $n/k$ is an integer and $N = N' + 1$ otherwise. Note that $D_j^{(k)}$ and $X_j$ also depend on $n$, but the extra suffix is omitted for notational simplicity. We test hypothesis (1.1) against the sequence of alternatives
$$H_{1,n}:\ f(x) = 1 + d\, l(x)\, \delta(n), \quad 0 \le x \le 1, \qquad (1.2)$$
where $\delta(n) \to 0$ as $n \to \infty$, $d > 0$ is a distance between $H_1$ and $H_0$, and $l(x)$ is a direction of $H_1$ such that
$$\int_0^1 l(x)\,dx = 0, \qquad \int_0^1 l^2(x)\,dx = 1. \qquad (1.3)$$
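For instance (an illustrative choice of direction, not one singled out in the paper), $l(x) = \sqrt{3}\,(2x - 1)$ satisfies both conditions in (1.3):
$$\int_0^1 \sqrt{3}\,(2x-1)\,dx = \sqrt{3}\,\bigl[x^2 - x\bigr]_0^1 = 0, \qquad \int_0^1 3\,(2x-1)^2\,dx = \Bigl[\tfrac12 (2x-1)^3\Bigr]_0^1 = \tfrac12\bigl(1-(-1)\bigr) = 1,$$
so the corresponding alternative (1.2) tilts the uniform density linearly, with slope proportional to $d\,\delta(n)$.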
Assume that $k$ may tend to infinity as $n \to \infty$. We consider the test based on the statistic
$$\mathcal{L}_N = \sum_{m=1}^{N} a_{m,n}\, D_{m,N}^{(k)}, \qquad (1.4)$$
where $a_{m,n}$, $m = 1, 2, \ldots, N$, are given weight coefficients.
Large values of $\mathcal{L}_N$ reject the hypothesis. Tests based on simple spacings, i.e. 1-spacings, have been proposed by many authors (see, for example, Pyke (1965) and the references therein). The distribution theory of such statistics and their asymptotic efficiencies have been studied, for instance, by Rao and Sethuraman (1975) and Holst and Rao (1981). The asymptotic normality of statistics based on disjoint $k$-spacings was first discussed by Del Pino (1979), who showed that they are more efficient in the Pitman sense than simple-spacings statistics. We also refer to a series of papers by Jammalamadaka Rao and co-authors (see, for example, Morgan Kuo and Jammalamadaka (1981) and Jammalamadaka et al. (1989)). Here the statistic (1.4) is called the linear statistic based on $k$-spacings; it was studied by Holst and Rao (1981) for $k = 1$. The linear statistic belongs to the class of non-symmetric statistics based on spacings, and it is well known that tests based on such statistics can detect the alternative (1.2) with $\delta(n) = n^{-1/2}$. The present paper discusses the goodness-of-fit test based on (1.4) with $k \ge 1$, where $k$ may increase jointly with $n$. It is shown that the Kallenberg intermediate efficiency of the statistic (1.4) coincides with its Pitman efficiency.
2. Asymptotic normality of $\mathcal{L}_N$
In the following discussion, let $U_1 \le U_2 \le \cdots \le U_{n-1}$ be an ordered sample from the uniform $[0,1]$ distribution, set $U_0 = 0$ and $U_n = 1$, and let $T_m^{(k)}$ denote their non-overlapping $k$-spacings. Let $h_m(x, m/N)$, $m = 1, 2, \ldots, N$, be measurable functions. Consider the statistic
$$R_N = \sum_{m=1}^{N} h_m\bigl(nT_m^{(k)}, m/N\bigr). \qquad (2.1)$$
Let $Z$ and $Z_{m,k}$, $m = 1, 2, \ldots, N$, be independent and identically distributed random variables with common density function $\gamma_k(u) = u^{k-1} e^{-u} / \Gamma(k)$, $u \ge 0$, where $\Gamma(k)$ is the well-known gamma function. Put $S_{N,k} = Z_{1,k} + \cdots + Z_{N,k}$, $Q_N = \sum_{m=1}^{N} h_m(Z_{m,k}, m/N)$, $\rho_N = \operatorname{corr}(Q_N, S_{N,k})$,
$$g_m(u, m/N) = h_m(u, m/N) - Eh_m(Z, m/N) - (u - k)\,\rho_N \sqrt{\frac{\operatorname{Var} Q_N}{Nk}}, \qquad (2.2)$$
$$A_N = \sum_{m=1}^{N} Eh_m(Z_{m,k}, m/N), \qquad \sigma_N^2 = \sum_{m=1}^{N} \operatorname{Var} g_m(Z_{m,k}, m/N).$$
The following Assertion is Corollary 2 of Mirakhmedov (2005).

Assertion. If $\sigma_N^{-3} \sum_{m=1}^{N} E\bigl|g_m(Z_{m,k}, m/N)\bigr|^3 \to 0$ as $N \to \infty$, then the random variable $R_N$ has an asymptotically normal distribution with expectation $A_N$ and variance $\sigma_N^2$.

The statistic $\mathcal{L}_N$ is a special case of (2.1) with $h_m(x, y) = a(y)\,x$, where $a(m/N) = a_{m,n}$. Therefore the following result is a consequence of the Assertion. Put
$$d_{j,N} = \frac{1}{N} \sum_{m=1}^{N} a_{m,n}^{\,j}, \qquad b_{j,N} = \frac{1}{N} \sum_{m=1}^{N} \bigl(a_{m,n} - d_{1,N}\bigr)^{j}.$$

Theorem 2.1. If $b_{4,N}/(N b_{2,N}^{2}) \to 0$ as $N \to \infty$, then the r.v. $\mathcal{L}_N$ has an asymptotically normal distribution with expectation $n d_{1,N}$ and variance $n b_{2,N}$.
Proof. The r.v. $\mathcal{L}_N$ belongs to the family of statistics (2.1) with kernel function $h_m(u) = a_{m,n} u$. It is well known that
$$EZ^{s} = k(k+1)\cdots(k+s-1),\ s \ge 1; \qquad EZ = \operatorname{Var} Z = k, \qquad E(Z-k)^{4} = 3k(k+2). \qquad (2.3)$$
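For completeness, the last equality in (2.3) follows from the factorial moments just stated: writing $EZ^2 = k(k+1)$, $EZ^3 = k(k+1)(k+2)$, $EZ^4 = k(k+1)(k+2)(k+3)$ and expanding,
$$E(Z-k)^4 = EZ^4 - 4k\,EZ^3 + 6k^2\,EZ^2 - 4k^3\,EZ + k^4 = 3k^2 + 6k = 3k(k+2).$$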
Using these in the notation (2.2) we have $Eh_m(Z, m/N) = a_{m,n} k$, $\operatorname{Var} Q_N = Nk\, d_{2,N}$, $\rho_N = d_{1,N}/\sqrt{d_{2,N}}$, $g_m(Z_{m,k}) = (a_{m,n} - d_{1,N})(Z_{m,k} - k)$, and
$$\sigma_N^2 = Nk\, d_{2,N}\Bigl(1 - \frac{d_{1,N}^2}{d_{2,N}}\Bigr) = Nk\bigl(d_{2,N} - d_{1,N}^2\bigr) = Nk\, b_{2,N} \approx n b_{2,N}, \qquad (2.4)$$
and by (2.3)
$$\sum_{m=1}^{N} E g_m^{4}(Z_{m,k}) = N b_{4,N}\, E(Z - k)^{4} = 3k(k+2)\, N b_{4,N}. \qquad (2.5)$$
Applying relations (2.4) and (2.5) in the above-mentioned Assertion, together with the well-known moment inequality $\beta_{3,N} \le \beta_{4,N}^{1/2}$, one gets
$$\Bigl| P\bigl\{\mathcal{L}_N < x\sqrt{n b_{2,N}} + n d_{1,N}\bigr\} - \Phi(x) \Bigr| \le C\Bigl(1 + \frac{3}{k}\Bigr)\frac{1}{\sqrt{N}}\sqrt{\frac{b_{4,N}}{b_{2,N}^{2}}}. \qquad (2.6)$$
This proves Theorem 2.1.
Remark 2.1. Actually, (2.6) contains a more general result than Theorem 2.1, namely an estimate of the remainder term in the CLT for $\mathcal{L}_N$.

Remark 2.2. The r.v. $\mathcal{L}_N$ coincides with a linear combination of order statistics, in our case based on a sample from the uniform $(0,1)$ distribution. As usual, when considering an r.v. of the $\mathcal{L}_N$ type it is assumed that
$$\sum_{m=1}^{N} a_{m,n} = 0, \qquad \sum_{m=1}^{N} a_{m,n}^{2} = 1. \qquad (2.7)$$
This condition does not lose generality, because otherwise one can take $a^{*}_{m,n} = (a_{m,n} - d_{1,N})/\sqrt{N b_{2,N}}$ instead of $a_{m,n}$. Under condition (2.7), if additionally $N b_{4,N} \to 0$ as $N \to \infty$ (the condition of Theorem 2.1 in this normalization), then the r.v. $\mathcal{L}_N$ has an asymptotically normal distribution with zero expectation and variance $k$.
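As a quick numerical illustration of Remark 2.2 (a simulation sketch that is not part of the paper; the values $n = 2000$, $k = 4$, the number of replications, and the weight choice $l(u) = \sqrt{3}(2u-1)$ are arbitrary assumptions), one can check that, with the coefficients normalized as in (2.7) and the spacings scaled by $n$ as in the kernel $h_m(u) = a_{m,n} u$ used in the proof of Theorem 2.1, the statistic has approximately zero mean and variance close to $k$ under $H_0$:

import numpy as np

rng = np.random.default_rng(1)
n, k = 2000, 4                      # sample size and spacing order (n divisible by k)
N = n // k                          # number of non-overlapping k-spacings
raw = np.sqrt(3.0) * (2.0 * np.arange(1, N + 1) / N - 1.0)          # raw weights l(m/N)
a = (raw - raw.mean()) / np.sqrt(np.sum((raw - raw.mean()) ** 2))   # enforce condition (2.7)

reps = 5000
vals = np.empty(reps)
for r in range(reps):
    x = np.sort(rng.uniform(size=n - 1))
    grid = np.concatenate(([0.0], x, [1.0]))
    d = np.diff(grid[::k])          # the N non-overlapping k-spacings (n is a multiple of k)
    vals[r] = np.dot(a, n * d)      # spacings scaled by n, matching h_m(u) = a_{m,n} u

print(vals.mean(), vals.var())      # expected to be close to 0 and to k = 4, respectively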
4. Probability of Large Deviation
Cramér's condition, namely that there exists $H > 0$ such that $E \exp\{H\, g(Z, m/N)\} < \infty$, $m = 1, 2, \ldots, N$, is obviously satisfied by the statistic $\mathcal{L}_N$. Therefore, by the theorem of Mirakhmedov et al. (2011), the following holds.
Theorem 4.1. For all $x \ge 0$, $x = o(\sqrt{n})$,
$$P_0\bigl\{\mathcal{L}_N \ge x\sqrt{n b_{2,N}} + n d_{1,N}\bigr\} = \bigl(1 - \Phi(x)\bigr)\exp\Bigl\{\frac{x^{3}}{\sqrt{N}}\,\ell_N\Bigl(\frac{x}{\sqrt{N}}\Bigr)\Bigr\}\Bigl(1 + O\Bigl(\frac{1+x}{\sqrt{N}}\Bigr)\Bigr),$$
where $\ell_N(u) = \ell_{0N} + \ell_{1N} u + \cdots$ is a special Cramér-type power series; for $N$ large enough its coefficients are bounded, $|\ell_{jN}| \le C < \infty$, $j = 0, 1, 2, \ldots$.
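A useful consequence (a standard reading of such Cramér-type expansions, recorded here for orientation): since the coefficients $\ell_{jN}$ are bounded, the exponential factor tends to one whenever $x^3/\sqrt{N} \to 0$, so that
$$P_0\bigl\{\mathcal{L}_N \ge x\sqrt{n b_{2,N}} + n d_{1,N}\bigr\} = \bigl(1 - \Phi(x)\bigr)(1 + o(1)) \quad \text{for } x \to \infty,\ x = o\bigl(N^{1/6}\bigr),$$
i.e. the normal tail approximation of Theorem 2.1 remains valid throughout this moderate deviation zone.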
5. Asymptotic Efficiency.
From Theorem 2.1 and the well-known theorem on convergence of moments (see, for example, Theorem 6.14 of Moran (1948b)) it follows that
$$E_1 \mathcal{L}_N = N d_{1,N} + o(\sqrt{N}) \quad \text{and} \quad \operatorname{Var}_1 \mathcal{L}_N = N b_{2,N}^{2}\,(1 + o(1)). \qquad (5.1)$$
Let $P_j$, $E_j$ and $\operatorname{Var}_j$ denote the probability, expectation and variance computed under $H_j$, $j = 0, 1$; we assume that $E_1 \mathcal{L}_N - E_0 \mathcal{L}_N \ge 0$. For non-symmetric statistics,
$$x_N(f) = \frac{E_1 \mathcal{L}_N - E_0 \mathcal{L}_N}{\sqrt{\operatorname{Var}_1 \mathcal{L}_N}} = \sqrt{n}\,\delta(n)\, d\, \rho(f)\,(1 + o(1)), \quad \text{with} \quad \rho(f) = \int_0^1 l(u)\operatorname{corr}\bigl(f(Z, u), Z\bigr)\,du, \qquad (5.2)$$
because $\operatorname{Var} Z = k$. By Hölder's inequality and (1.3), $\rho(f) \le 1$, and equality can be achieved if and only if $f(y, u) = l(u)\, y$. Therefore, non-symmetric tests discriminate the alternatives $H_1$ of (1.2) with $\delta(n) = n^{-1/2}$. The asymptotically optimal test in Pitman's sense is the linear test, which rejects $H_0$ if $\mathcal{L}_N = \sum_{m=1}^{N} a_{m,n} D_{m,N}^{(k)} \ge C$. This result was also obtained by
Rao & Sethuraman (1975) and Holst & Rao (1981) for k=1. Asymptotically most
powerful test among non symmetric tests is a linear test based on linear statistic

N

L N   l (m / N ) Dm( k, N) critical region of which is given by x : x  u n and asymptotic
m 1
power is   d  u  and it can be applied to detect alternatives with given direction l ( x) ,
because all statistical characteristics of this test depend on l ( x) . The Linear test, being
non symmetric, can detect alternatives at a distance n and sequence of alternatives,
(1.2) with  (n)  n1/ 2 , are called Pitman’s alternatives. The Pitman’s alternatives, when
the sequence of alternatives H1,n converges with a rate  (n)  (n) 1/ 2 , is one of extreme
cases. Another extreme case arises in Bahadur’s concept which proposes that alternative
is fixed i.e. H1,n does not approach to null hypothesis. One can say, there seems no need
to use statistical methods in the case of alternatives far from the null hypothesis. Between
these extreme cases there is intermediate approach. For our case intermediate alternatives
determine  (n ) in (1.2) such that
n (n)   .
 ( n)  0 ,
(5.3)
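For example (an illustrative rate, not one used in the paper), $\delta(n) = n^{-1/4}$ satisfies (5.3), since $\delta(n) \to 0$ while $\sqrt{n}\,\delta(n) = n^{1/4} \to \infty$; by contrast, the Pitman rate $\delta(n) = n^{-1/2}$ fails the second requirement.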
These situations give rise to the concept of intermediate asymptotic efficiency (IAE) due to Kallenberg (1983); see also Inglot (1999) and Ivchenko and Mirakhmedov (1995). According to the classification of Kallenberg (1983), this efficiency is called weak IAE if $\sqrt{n}\,\delta(n) = O(\sqrt{\ln n})$, middle IAE if $\sqrt{n}\,\delta(n) = o(n^{1/6})$, and is considered to be strong IAE when $\sqrt{n}\,\delta(n) = o(n^{1/2})$. The asymptotic relative efficiency of one test with respect to another is defined as the ratio of their asymptotic slopes. For Pitman alternatives it equals Pitman's asymptotic relative efficiency; see, for example, Fraser (1957). We have
Theorem 5.1. Let the alternative $H_1$ be specified by (1.2) and (5.3). If $\int_0^1 E \exp\{H f(Z, u)\}\,du < \infty$ for some $H > 0$, then the slope of the corresponding test satisfies
$$e_n(f) = \frac{d^{2}}{2}\, n\,\delta^{2}(n)\,\rho^{2}(f)\,(1 + o(1)).$$
Proof. Following Mirakhmedov (2006), under the conditions of the theorem we have $e_N(f) = -\log \Phi\bigl(-x_N(f)\bigr) + o\bigl(x_N^{2}(f)\bigr)$. Since $-\log \Phi(-x) = \tfrac{1}{2} x^{2}(1 + o(1))$ as $x \to \infty$, from (5.2) we obtain $e_N(f) = \frac{d^{2}}{2}\, n\,\delta^{2}(n)\,\rho^{2}(f)\,(1 + o(1))$. Theorem 5.1 is proved.
It follows from Theorem 5.1 that $\rho^{2}(f)$ should be taken as the measure of the IAE of the test based on $f$. Thus, for intermediate alternatives (1.2) and (5.3) with direction $l(x)$, within the class of non-symmetric tests the linear test (based on the statistic $\mathcal{L}_N$) is the most efficient in the IAE sense, since it satisfies the condition of Theorem 5.1. In terms of Kallenberg's classification, one can say that the linear test is optimal in the strong IAE sense.
Conclusions
(1) We measure the performance of the linear test by the asymptotic value of its slope $e_N(\mathcal{L}_N) = -\log P_0\bigl\{\mathcal{L}_N \ge N d_{1,N} + o(\sqrt{N})\bigr\}$.
(2) Since Cramér's condition is satisfied for $\mathcal{L}_N$, the probability of large deviation results are applicable to this statistic.
(3) Just as in Pitman's sense, the linear test is the most efficient in the intermediate sense (in all three cases), because it satisfies the conditions of Theorem 5.1.
References
1. Del Pino, G. (1979). On the asymptotic distribution of k-spacings with applications to goodness-of-fit tests. Ann. Statist. 7, 1058-1065.
2. Fraser, D.A.S. (1957). Nonparametric Methods in Statistics. John Wiley, New York.
3. Inglot, T. (1999). Generalized intermediate efficiency of goodness-of-fit tests. Math. Methods Statist. 8, 487-509.
4. Ivchenko, G.I. and Mirakhmedov, M.A. (1995). Large deviations and intermediate efficiency of decomposable statistics in a multinomial scheme. Math. Methods Statist. 4(3), 294-311.
5. Holst, L. and Rao, J.S. (1981). Asymptotic spacings theory with applications to the two-sample problem. Canadian J. Statist. 9, 603-610.
6. Jammalamadaka, S. Rao and Tiwari, R.C. (1987). Efficiencies of some disjoint spacings tests relative to a chi-square test. In: New Perspectives in Theoretical and Applied Statistics (eds. M.L. Puri, J. Vilaplana and W. Wertz), John Wiley, 311-317.
7. Jammalamadaka, S.R., Zhou, X. and Tiwari, R.C. (1989). Asymptotic efficiencies of spacings tests for goodness of fit. Metrika 36, 355-377.
8. Kallenberg, W.C.M. (1983). Intermediate efficiency. Ann. Statist. 11, 170-182.
9. Mirakhmedov, Sh.M. (2005). Lower estimation of the remainder term in the CLT for a sum of the functions of k-spacings. Statist. Probab. Letters 73, 411-424.
10. Mirakhmedov, Sh.M. (2006). Probability of large deviations for the sum of functions of spacings. Intern. J. Math. and Math. Sciences, Article ID 58738, 1-22.
11. Mirakhmedov, Sh.M. and Naeem, M. (2008). Asymptotic efficiency of the goodness of fit test based on extreme k-spacings statistic. J. Appl. Probab. Statist. 3(1), 43-53.
12. Mirakhmedov, Sh.M., Tirmizi, S.I. and Naeem, M. (2011). Cramér-type large deviation theorem for the sum of functions of higher ordered spacings. Metrika 74(1).
13. Moran, P.A.P. (1948b). Some theorems on time series II. The significance of the serial correlation coefficient. Biometrika 35, 255-260.
14. Morgan, K. and Rao, J.S. (1981). Limit theory and efficiencies for tests based on higher order spacings. In: Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, Calcutta, 333-352.
15. Pyke, R. (1965). Spacings. J. Roy. Statist. Soc. Ser. B 27, 395-449.
16. Rao, J.S. and Sethuraman, J. (1975). Weak convergence of empirical distribution functions of random variables subject to perturbations and scale factors. Ann. Statist. 3, 299-313.