AN UPPER BOUND FOR THE VARIANCE OF CERTAIN STATISTICS¹

by Wassily Hoeffding
University of North Carolina

Institute of Statistics Mimeograph Series No. 193
March, 1958

¹ This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 18(600)-458. Reproduction in whole or in part is permitted for any purpose of the United States Government.

1. Results. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables (real- or vector-valued). Let $f(X_1, X_2)$ be a bounded function such that $f(X_1, X_2) = f(X_2, X_1)$. With no loss of generality we shall assume that the bounds are $0 \le f(X_1, X_2) \le 1$. Let

$$U = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} f(X_i, X_j).$$

Examples of statistics of this form are given below. The mean of $U$ is $E(U) = p$, where $p = E\, f(X_1, X_2)$, and the variance is

$$\operatorname{var}(U) = \frac{2}{n(n-1)} \left( 2(n-2)(r - p^2) + s - p^2 \right),$$

where $r = E\, f(X_1, X_2) f(X_1, X_3)$ and $s = E\, f(X_1, X_2)^2$. As $n$ tends to infinity, $n^{1/2}(U - p)$ has a normal limiting distribution [2]. Hence if we have an upper bound for the variance of $U$ which depends only on $p$ and $n$, we can obtain an approximate confidence region for $p$ and a lower bound for the power of a test based on $U$ when $n$ is large (see [3]).

It is known [2] that $2(r - p^2) \le s - p^2$, and since obviously $s \le p$, we have

$$\operatorname{var}(U) \le \frac{2}{n}\, p(1-p).$$

In this note we shall show that, under the stated assumptions,

(1) $\quad r - p^2 \le R(p)$, where $R(p) = (1-p)^{3/2} - (1-p)^2$ if $p \le \tfrac12$, and $R(p) = p^{3/2} - p^2$ if $p \ge \tfrac12$.

It is easily seen that the sign of equality holds in (1) if, with probability one, $f(X_1, X_2) = 1 - g(X_1)\, g(X_2)$ (for $p \le \tfrac12$) or $f(X_1, X_2) = g(X_1)\, g(X_2)$ (for $p \ge \tfrac12$), where $g(X)$ takes the values 0 and 1 only. Inequality (1) implies that

(2) $\quad \operatorname{var}(U) \le \dfrac{2}{n(n-1)} \left( 2(n-2)\, R(p) + p(1-p) \right).$

An inequality analogous to (1) was conjectured by Daniels and Kendall [1] for the variance of the finite population analogue of the statistic $t$ defined in Example 1 below. A proof of this conjecture suggested by Sundrum [4] does not seem to be complete.

We now give three examples of statistics to which the present bound is applicable; in the first two examples the bound can be attained.

Example 1. Let $X_i = (Y_i, Z_i)$ be a random vector with two continuously distributed components, and let $f(X_1, X_2) = 1$ or 0 according as $(Y_1 - Y_2)(Z_1 - Z_2)$ is positive or negative. In this case $t = 2U - 1$ is the well-known rank correlation coefficient. The condition for equality in (1) is satisfied if $Z$ is a function of $Y$ of a certain form, for instance positive and decreasing for $Y < 0$ and negative and increasing for $Y > 0$ (if $p \ge \tfrac12$); or negative and increasing for $Y < 0$ and positive and decreasing for $Y > 0$ (if $p \le \tfrac12$). For if we let $g(X) = 0$ or 1 according as $Y < 0$ or $Y > 0$, then, with probability one, $f(X_1, X_2) = g(X_1)\, g(X_2)$ in the first case and $f(X_1, X_2) = 1 - g(X_1)\, g(X_2)$ in the second case. These two cases correspond to the inverse canonical rankings.

Example 2. Let $X_i$ be real-valued, and let $f(X_1, X_2) = 1$ or 0 according as $X_1 + X_2$ is positive or negative. The sign of equality in (1) is attained, for instance, if $X_i$ can take only two values, $a$ and $b$, such that either $a + b < 0 < b$ (if $p \ge \tfrac12$) or $a < 0 < a + b$ (if $p \le \tfrac12$). Here we may take $g(X) = 0$ or 1 according as $X < 0$ or $X > 0$ in the first case, and $g(X) = 0$ or 1 according as $X > 0$ or $X < 0$ in the second case.

Example 3. Let $X_i$ again be real-valued, and let

$$f(X_1, X_2) = 1 + F_0^2(X_1) + F_0^2(X_2) - 2\, F_0(\max(X_1, X_2)),$$

where $F_0(x)$ is a continuous (cumulative) distribution function. Then $0 \le f(X_1, X_2) \le 1$, and if $X_i$ has the distribution function $F(x)$, $(U - \tfrac13)/2$ differs in large samples negligibly from the Cramér-von Mises goodness of fit criterion

$$\int \left[ \frac{F_n(x)}{n} - F_0(x) \right]^2 dF_0(x),$$

where $F_n(x)$ denotes the number of observations $< x$. In this case the condition for equality in (1) cannot be satisfied, and presumably the upper bound for the variance can be further improved.

2. Proof of Inequality (1). We first assume that $f(X_1, X_2)$ takes on the values 0 and 1 only. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables, write $f_{ij} = f(X_i, X_j)$ for $i \ne j$ and $f_{ii} = 0$, and put

(3) $\quad \hat p = n^{-2} \sum_{i=1}^n \sum_{j=1}^n f_{ij}, \qquad \hat r = n^{-3} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n f_{ij} f_{ik}.$

As $n \to \infty$, $\hat p$ and $\hat r$ converge in probability to $p$ and $r$, respectively. We shall show that

(4) $\quad \hat r - \hat p^2 \le R(\hat p) + \epsilon_n,$

where $R(p)$ is the function defined in (1) and the $\epsilon_n$ are numbers which converge to 0 as $n \to \infty$. Since the function $R(p)$ is continuous, inequality (4) easily implies inequality (1).

Both $\hat p$ and $\hat r$ are functions of the $n \times n$ matrix $\|f_{ij}\|$, whose elements satisfy the conditions

(5) $\quad f_{ij} = 0$ or 1, $\quad f_{ji} = f_{ij}, \quad f_{ii} = 0 \qquad (i, j = 1, \ldots, n).$

Let $F_i = \sum_{j=1}^n f_{ij}$. Then

$$n^2 \hat p = \sum_{i=1}^n F_i, \qquad n^3 \hat r = \sum_{i=1}^n F_i^2.$$

We first show that, in order to find an upper bound for $\hat r$ when $\hat p$ is fixed, we may assume that

(6) $\quad f_{ik} \ge f_{jk}$ whenever $F_i \ge F_j$ and $k \ne i$, $k \ne j$.

For suppose that there are integers $i, j, k$ such that $F_i \ge F_j$, $k \ne i$, $k \ne j$, and $f_{ik} < f_{jk}$; thus $f_{ik} = 0$, $f_{jk} = 1$. Let $\|f'_{uv}\|$ be the $n \times n$ matrix defined by $f'_{ik} = f'_{ki} = 1$, $f'_{jk} = f'_{kj} = 0$, $f'_{uv} = f_{uv}$ otherwise. The transformed matrix satisfies (5), and the value of $\hat p$ is not affected by the transformation. Also, writing $F'_u$ for the sum of the $u$-th row of the new matrix, $F'_i = F_i + 1$, $F'_j = F_j - 1$, $F'_u = F_u$ otherwise. Therefore

$$\sum_u (F'_u)^2 - \sum_u F_u^2 = (F_i + 1)^2 - F_i^2 + (F_j - 1)^2 - F_j^2 = 2(F_i - F_j) + 2 > 0.$$

Thus the value of $\hat r$ is increased by the transformation. By repeated application of this transformation we can obtain a matrix which satisfies (6), for instance as follows. We first take one of the rows with the largest sum to be the $i$-th row, and for every $j \ne i$ we apply the transformation for each $k$ different from $i$ with $f_{ik} < f_{jk}$. Having exhausted all $k$ and all $j$, we are left with a matrix such that every column that has a 0 in the $i$-th place consists of zeros only.
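The single-step transformation just described is easy to check numerically. The following Python sketch (ours, not part of the original paper; the helper names are ours) applies one transformation to a small symmetric 0-1 matrix with zero diagonal and verifies that the total of the row sums, and hence $\hat p$, is unchanged, while the sum of squared row sums, and hence $\hat r$, increases by $2(F_i - F_j) + 2$.

```python
def row_sums(f):
    # F_u = sum of the u-th row.
    return [sum(row) for row in f]

def transform(f, i, j, k):
    """Set f_ik = f_ki = 1 and f_jk = f_kj = 0 (assumes f_ik = 0, f_jk = 1)."""
    g = [row[:] for row in f]
    g[i][k] = g[k][i] = 1
    g[j][k] = g[k][j] = 0
    return g

# A small symmetric 0-1 matrix with zero diagonal.
f = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]

F = row_sums(f)        # F = [2, 1, 2, 1]
i, j, k = 0, 2, 3      # F_i >= F_j, f_ik = 0, f_jk = 1, k distinct from i, j
g = transform(f, i, j, k)
G = row_sums(g)

assert sum(G) == sum(F)                                   # p-hat unchanged
assert sum(x * x for x in G) > sum(x * x for x in F)      # r-hat increased
print(F, G)
```

The increase in the sum of squares equals $2(F_i - F_j) + 2 = 2$ here, in agreement with the display above.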
This implies that any further transformations will not affect the $i$-th row. We next take one of the rows with the largest sum among the remaining $n-1$ rows, repeat the procedure described above, and so on. Eventually we obtain a matrix which satisfies the conditions (5) and (6) without changing the value of $\hat p$ and without decreasing the value of $\hat r$.

With no loss of generality we may assume that, in addition to (5) and (6), the matrix $\|f_{ij}\|$ satisfies the condition

(7) $\quad F_1 \ge F_2 \ge \cdots \ge F_n.$

For this can always be achieved by rearranging suitable rows and columns, which will not affect the values of $\hat p$ and $\hat r$. Thus we now restrict ourselves to symmetric matrices whose every row and every column, apart from a 0 in the main diagonal, consists of a sequence of 1's followed by a sequence of 0's. A typical matrix of this form is shown below.

0 1 1 1 1 1 0
1 0 1 1 0 0 0
1 1 0 1 0 0 0
1 1 1 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
0 0 0 0 0 0 0

For a fixed matrix $\|f_{ij}\|$ of this type let $m$ denote the greatest integer such that $f_{m,m+1} = 1$. Then $f_{ij} = 1$ if $i \le m$, $j \le m$, $i \ne j$, and $f_{ij} = 0$ if $i \ge m+1$, $j \ge m+1$. Hence

$$n^2 \hat p = \sum_{i=1}^m \sum_{j=1}^m f_{ij} + 2 \sum_{i=1}^m \sum_{j=m+1}^n f_{ij} = m(m-1) + 2 \sum_{i=1}^m \sum_{j=m+1}^n f_{ij}$$

and

$$n^3 \hat r = \sum_{i=1}^m \Big( m - 1 + \sum_{j=m+1}^n f_{ij} \Big)^2 + \sum_{i=m+1}^n \Big( \sum_{j=1}^m f_{ij} \Big)^2$$
$$= m(m-1)^2 + 2(m-1) \sum_{i=1}^m \sum_{j=m+1}^n f_{ij} + \sum_{i=1}^m \Big( \sum_{j=m+1}^n f_{ij} \Big)^2 + \sum_{j=m+1}^n \Big( \sum_{i=1}^m f_{ij} \Big)^2.$$

If we put

$$a_i = \frac{1}{n-m} \sum_{j=m+1}^n f_{ij}, \qquad b_j = \frac{1}{m} \sum_{i=1}^m f_{ij},$$
$$d = \frac{1}{m(n-m)} \sum_{i=1}^m \sum_{j=m+1}^n f_{ij} = \frac{1}{m} \sum_{i=1}^m a_i = \frac{1}{n-m} \sum_{j=m+1}^n b_j,$$

we obtain

(8) $\quad n^2 \hat p = m(m-1) + 2\, m(n-m)\, d$

and

(9) $\quad n^3 \hat r = (m-1)\, n^2 \hat p + (n-m)^2 \sum_{i=1}^m a_i^2 + m^2 \sum_{j=m+1}^n b_j^2$
$$= (m-1)\, n^2 \hat p + m(n-m)\, n\, d^2 + m(n-m) \Big[ \frac{n-m}{m} \sum_{i=1}^m (a_i - d)^2 + \frac{m}{n-m} \sum_{j=m+1}^n (b_j - d)^2 \Big].$$

Now

$$\frac{1}{m(n-m)} \sum_{i=1}^m \sum_{j=m+1}^n \big[ (f_{ij} - d) - (a_i - d) - (b_j - d) \big]^2 = d - d^2 - \frac{1}{m} \sum_{i=1}^m (a_i - d)^2 - \frac{1}{n-m} \sum_{j=m+1}^n (b_j - d)^2.$$

Since the left hand side is nonnegative, we obtain

(10) $\quad \dfrac{1}{m} \sum_{i=1}^m (a_i - d)^2 + \dfrac{1}{n-m} \sum_{j=m+1}^n (b_j - d)^2 \le d - d^2.$

It now follows from (9) and (10) that

(11) $\quad n^3 \hat r \le (m-1)\, n^2 \hat p + m(n-m)\, n\, d^2 + m(n-m) \max(m, n-m)\, (d - d^2).$

We thus have obtained an upper bound for $\hat r$ in terms of $\hat p$ for all matrices $\|f_{ij}\|$ with $m$ fixed. We now put $c = m/n$. By equation (8), since $0 \le d \le 1$, the range of $c$ is given by

(12) $\quad c^2 - \dfrac{c}{n} \le \hat p \le 1 - (1-c)^2 - \dfrac{c}{n}.$

We may therefore assume that $c$ is bounded away from 0 and 1 as $n \to \infty$. Indeed, inequality (1) is trivially true if $p = 0$ or 1; and if $0 < p < 1$, since $\hat p$ tends to $p$ in probability, the inequalities (12) imply that $c$ is bounded away from 0 and 1 with a probability which approaches one as $n$ tends to infinity.

From (11) we obtain

$$\hat r \le c\, \hat p + c(1-c)\, d^2 + c(1-c) \max(c, 1-c)\, (d - d^2) + \epsilon'_n,$$

where $\epsilon'_n \to 0$ as $n \to \infty$.²

² Inequalities (10) and (11) are closely related to the sharp upper bound for the variance of the Wilcoxon-Mann-Whitney two-sample statistic in terms of its mean; see D. van Dantzig, "On the consistency and the power of Wilcoxon's two sample test," Nederl. Akad. Wetensch. Proc. Ser. A, Vol. 54, No. 1 (1951), pp. 1-8, footnote 4.

If we express $d$ in terms of $\hat p$ and $c$ by means of (8), we have

$$\hat r - \hat p^2 \le R(\hat p, c) + \epsilon_n,$$

where

$$R(p, c) = -p^2 + cp + \frac{p - c^2}{2} \left[ \frac{p - c^2}{2c(1-c)} + \max(c, 1-c) \left( 1 - \frac{p - c^2}{2c(1-c)} \right) \right]$$

and $\epsilon_n \to 0$ as $n \to \infty$. We now maximize $R(p, c)$ with respect to $c$. By (12) we may assume that $c \le p^{1/2}$ and $1 - c \le (1-p)^{1/2}$. We can write

$$R(p, c) = G(1-p,\, 1-c), \quad c \le \tfrac12; \qquad R(p, c) = G(p, c), \quad c \ge \tfrac12,$$

where

$$G(p, c) = -p^2 + pc + \tfrac14\, p^2 c^{-1} - \tfrac14\, c^3.$$

We calculate

$$G(p, p^{1/2}) - G(p, c) = \frac{1}{4c}\, (p^{1/2} - c)^2\, (c^2 + 2 p^{1/2} c - p).$$

If $c \ge \tfrac12$, the right hand side is nonnegative, and hence

$$G(p, c) \le G(p, p^{1/2}) = p^{3/2} - p^2.$$

It follows that

$$\hat r - \hat p^2 \le \max\big( \hat p^{3/2} - \hat p^2,\ (1-\hat p)^{3/2} - (1-\hat p)^2 \big) + \epsilon_n,$$

and the function on the right is just $R(\hat p)$ as defined in (1).
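The closed-form steps above can be verified numerically. The following Python sketch (ours, not part of the original paper; the function names `G`, `R_pc`, and `R` are ours, with `R_pc` implementing the displayed formula for $R(p, c)$) checks the factorization of $G(p, p^{1/2}) - G(p, c)$ and the final bound $R(p, c) \le R(p)$ over a grid of $(p, c)$ values with $c$ in the range allowed by (12).

```python
import math

def G(p, c):
    # G(p, c) = -p^2 + pc + (1/4) p^2 / c - (1/4) c^3
    return -p**2 + p * c + 0.25 * p**2 / c - 0.25 * c**3

def R_pc(p, c):
    # The displayed R(p, c), with d = (p - c^2) / (2c(1-c)) from (8).
    d = (p - c**2) / (2 * c * (1 - c))
    return -p**2 + c * p + (p - c**2) / 2 * (d + max(c, 1 - c) * (1 - d))

def R(p):
    # R(p) as defined in inequality (1).
    q = max(p, 1 - p)
    return q**1.5 - q**2

ok_factor = ok_bound = True
for ip in range(1, 20):
    p = ip / 20
    lo, hi = 1 - math.sqrt(1 - p), math.sqrt(p)   # feasible c, from (12)
    for t in range(1, 50):
        c = lo + (hi - lo) * t / 50
        lhs = G(p, math.sqrt(p)) - G(p, c)
        rhs = (math.sqrt(p) - c)**2 * (c**2 + 2 * math.sqrt(p) * c - p) / (4 * c)
        ok_factor &= abs(lhs - rhs) < 1e-9        # the factorization holds
        ok_bound &= R_pc(p, c) <= R(p) + 1e-9     # R(p, c) <= R(p)

print(ok_factor, ok_bound)
```

Both checks pass on this grid, consistent with the algebra above.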
This completes the proof for the case where $f(X_1, X_2)$ takes on the values 0 and 1 only.

Now let $f(X_1, X_2)$ be any function which satisfies the assumptions of the theorem. Then $f(X_1, X_2)$ can be approximated by a function $f'(X_1, X_2)$ which also satisfies the assumptions of the theorem and takes only finitely many values, in such a way that $p' = E\, f'(X_1, X_2)$ and $r' = E\, f'(X_1, X_2)\, f'(X_1, X_3)$ are as close as we please to $p$ and $r$, respectively. It will therefore be sufficient to assume that $f(X_1, X_2)$ takes on the $k$ values $f_1, f_2, \ldots, f_k$. Then it can be written in the form

$$f(X_1, X_2) = \sum_{i=1}^k f_i\, c_i(X_1, X_2),$$

where $c_i(X_1, X_2) = 1$ if $f(X_1, X_2) = f_i$ and $c_i(X_1, X_2) = 0$ otherwise.

Now let $Y_1, Y_2, Y_3$ be mutually independent random variables, independent of $X_1, X_2, X_3$, where each $Y_i$ is uniformly distributed on the interval $0 \le Y_i \le 1$. The random vectors $Z_j = (X_j, Y_j)$, $j = 1, 2, 3$, are mutually independent and identically distributed. Now define

$$f^*(Z_1, Z_2) = \sum_{i=1}^k d_i(Y_1)\, d_i(Y_2)\, c_i(X_1, X_2),$$

where

$$d_i(y) = 1 \text{ if } 0 \le y \le f_i^{1/2}, \qquad d_i(y) = 0 \text{ otherwise}.$$

Then $f^*(Z_1, Z_2)$ takes the values 0 and 1 only. The conditional expected value of $f^*(Z_1, Z_2)$ for $X_1, X_2$ fixed is

$$E\big[ f^*(Z_1, Z_2) \mid X_1, X_2 \big] = \sum_{i=1}^k f_i\, c_i(X_1, X_2) = f(X_1, X_2).$$

Hence

(13) $\quad E\, f^*(Z_1, Z_2) = E\, f(X_1, X_2) = p.$

Also

$$f^*(Z_1, Z_2)\, f^*(Z_1, Z_3) = \sum_{i=1}^k \sum_{j=1}^k d_i(Y_1)\, d_i(Y_2)\, c_i(X_1, X_2)\, d_j(Y_1)\, d_j(Y_3)\, c_j(X_1, X_3),$$

so that

$$E\big[ f^*(Z_1, Z_2)\, f^*(Z_1, Z_3) \mid X_1, X_2, X_3 \big] = \sum_{i=1}^k \sum_{j=1}^k \big( \min(f_i, f_j) \big)^{1/2} f_i^{1/2} f_j^{1/2}\, c_i(X_1, X_2)\, c_j(X_1, X_3)$$
$$\ge \sum_{i=1}^k \sum_{j=1}^k f_i f_j\, c_i(X_1, X_2)\, c_j(X_1, X_3) = f(X_1, X_2)\, f(X_1, X_3).$$

Thus

(14) $\quad E\, f^*(Z_1, Z_2)\, f^*(Z_1, Z_3) \ge r.$

Since, by the first part of the proof, inequality (1) is true for $f^*$, it follows from (13) and (14) that it is also true for $f$. The proof is complete.

References

[1] H. E. Daniels and M. G. Kendall, "The significance of rank correlations where parental correlation exists," Biometrika, Vol. 34 (1947), pp. 197-208.

[2] W. Hoeffding, "A class of statistics with asymptotically normal distribution," Ann. Math. Stat., Vol. 19 (1948), pp. 293-325.

[3] M. G. Kendall, Rank Correlation Methods, 2nd ed., London (Griffin) and New York (Hafner), 1955.

[4] R. M. Sundrum, "Moments of the rank correlation coefficient in the general case," Biometrika, Vol. 40 (1953), pp. 409-420.
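As a supplementary illustration of the randomization device in the last part of the proof (ours, not part of the original paper), the $Y$-averages of the indicators $d_i$ can be computed exactly: $E\, d_i(Y) = f_i^{1/2}$ and $E\, d_i(Y)\, d_j(Y) = \min(f_i, f_j)^{1/2}$, so each term of $E\, f^* f^*$ dominates the corresponding term $f_i f_j$. A short Python sketch, with an illustrative set of kernel values:

```python
import math

values = [0.0, 0.1, 0.3, 0.5, 0.9, 1.0]   # illustrative values f_1, ..., f_k

for fi in values:
    for fj in values:
        # E[d_i(Y1) d_j(Y1)] * E[d_i(Y2)] * E[d_j(Y3)]
        term_star = math.sqrt(min(fi, fj)) * math.sqrt(fi) * math.sqrt(fj)
        # corresponding term of E[f(X1,X2) f(X1,X3)]
        term = fi * fj
        # min(fi, fj) >= fi * fj since 0 <= fi, fj <= 1, so term_star >= term
        assert term_star >= term - 1e-12

# E[d_i(Y)] = sqrt(f_i), so E[f*(Z1,Z2) | X1,X2] = sum_i sqrt(f_i)^2 c_i = f(X1,X2)
assert all(abs(math.sqrt(f) * math.sqrt(f) - f) < 1e-12 for f in values)
print("ok")
```

This is exactly the term-by-term comparison that gives (13) and (14).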