This work was supported by the National Institutes of Health, Institute of General Medical Sciences, Grant No. GM-12868-04.

A CENTRAL TOLERANCE REGION FOR THE MULTIVARIATE NORMAL DISTRIBUTION II

by

David G. Kleinbaum and S. John¹

University of North Carolina
Institute of Statistics Mimeo Series No. 620
January 1969

¹ Now at the Australian National University, Canberra.

1. Introduction

In a recent paper of the same title, John (1968) considered the problem of determining, from a random sample of size N from a p-variate normal distribution, a region which with probability β includes the region

    R = {x : (x − μ)′ Σ⁻¹ (x − μ) ≤ χ²_p(1 − α)}.

Here μ and Σ are respectively the mean vector and covariance matrix of the distribution considered, and χ²_p(1 − α) is the number exceeded by a chi square with p degrees of freedom with probability 1 − α. The solution to the problem when μ and Σ are unknown is the region

    R̂_{x̄,S} = {x : (x − x̄)′ S⁻¹ (x − x̄) ≤ K_{N,n}},

where x̄ is the vector of sample means and S is the sample covariance matrix; independently of x̄, nS is distributed as a p × p Wishart matrix of n = N − 1 degrees of freedom. K_{N,n} is the number exceeded with probability 1 − β by U = sup_{x ∈ R} (x − x̄)′ S⁻¹ (x − x̄). [Here t_p is a random variable distributed as the smallest root of a standard p × p Wishart matrix of n degrees of freedom.] Since K_{N,n} is difficult to determine exactly, two approximations were given:

(1)  K̂⁽¹⁾_{N,n} = [{n χ²_p(1 − α)/t_p(0.5)}^{1/2} + {N(n − p + 1)}^{−1/2} (np)^{1/2} {F_{p,n−p+1}(1 − β)}^{1/2}]²

and

(2)  K̂⁽²⁾_{N,n} = [{n χ²_p(1 − α)/(n − p − 1)}^{1/2} + {N(n − p + 1)}^{−1/2} (np)^{1/2} {F_{p,n−p+1}(1 − β)}^{1/2}]²,

where F_{p,n−p+1}(1 − β) is the number exceeded with probability 1 − β by a random variable having the F-distribution with p and n − p + 1 degrees of freedom, and t_p(β) is the number which is exceeded by t_p with probability β. The corresponding regions approximating R̂_{x̄,S} will henceforth be denoted by
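For readers who wish to evaluate approximations (1) and (2) numerically, the following sketch uses SciPy (a modern convenience, not part of the paper). Note the paper's convention: χ²_p(1 − α) and F_{p,n−p+1}(1 − β) are the points exceeded with probabilities 1 − α and 1 − β, i.e., the α- and β-quantiles. The value t_p(0.5) needed for (1) must be supplied from a table such as Kleinbaum and John (1969); the function names here are my own.

```python
from scipy.stats import chi2, f

def k_hat(p, n, N, alpha, beta, t_center):
    """Common form of approximations (1) and (2).

    t_center plays the role of t_p(0.5) in (1) or of n - p - 1 in (2).
    chi2.ppf(alpha, df=p) is the point exceeded with probability
    1 - alpha, i.e. the paper's chi^2_p(1 - alpha); likewise for F.
    """
    term1 = (n * chi2.ppf(alpha, df=p) / t_center) ** 0.5
    term2 = ((n * p) / (N * (n - p + 1))) ** 0.5 \
            * f.ppf(beta, dfn=p, dfd=n - p + 1) ** 0.5
    return (term1 + term2) ** 2

def k_hat_2(p, n, N, alpha, beta):
    """Approximation (2), which requires no table of t_p."""
    return k_hat(p, n, N, alpha, beta, n - p - 1)
```

As a sanity check, for very large n and N the second term vanishes and n/(n − p − 1) → 1, so K̂⁽²⁾_{N,n} approaches χ²_p(1 − α) itself.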
adding the appropriate superscript to R̂_{x̄,S}.

In this paper we show that the above approximations to K_{N,n} are inadequate unless n is unusually large, and we propose two new approximations which are accurate. These results were obtained primarily by computer simulation, and only the case p = 2 was considered. Also, we have allowed n to be independent of N, so that S may be an estimate of Σ obtained independently of the N observations in our sample.

2. Evidence for the Inadequacy of the Earlier Approximations

Since t_p(0.5) and n − p − 1 both represent central values of the t_p distribution, it was assumed that the adequacy or inadequacy of (2) implied the adequacy or inadequacy of (1) (and vice versa). For the computer simulation it was assumed without loss of generality that the true parameters were μ = 0 and Σ = I. The region R was thus the circle

    R = {x : x′x ≤ χ²₂(1 − α)}.

For specified values of α, β, N and n, 100 independent sample values of x̄ and S were generated by the method outlined in Section 5, and the proportion P_R of R̂⁽²⁾_{x̄,S}'s that contained R was computed. The proportion P_R was then compared with β. A few examples of the results are given in Table 1. For each specified value of α, β and N it was found that a value of n greater than 300 was required in order for the proportion P_R to be equal to β. Also, for all values of n less than 300, P_R was smaller than β.

TABLE 1
Evidence of the Inadequacy of Approximation K̂⁽²⁾_{N,n} for (β, α, N) = (.95, .95, 25)

     n    24   75  120  300  500  600
    P_R  .35  .74  .81  .87  .92  .95

3. The New Approximation to K_{N,n} and Evidence in Support of It

The following approximation was shown by simulation to be adequate for p = 2:

(3)  K̂⁽³⁾_{N,n} = [{n χ²_p(1 − α)/t_p(β)}^{1/2} + {N(n − p + 1)}^{−1/2} (np)^{1/2} {F_{p,n−p+1}(1 − β)}^{1/2}]².

(A table of values of t_p(β) for p = 2 has been prepared by Kleinbaum and John (1969).) This approximation was suggested by the fact that (3) is exact in the two special cases n = ∞ and N = ∞
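The simulation loop described above (and detailed in Section 5) can be sketched in modern terms as follows. This is an illustrative reconstruction, not the authors' program: it uses NumPy/SciPy to draw x̄ ~ N(0, I/N) and S with nS ~ W₂(n, I), and it checks containment by scanning the boundary of the circle R on a fine grid rather than by solving the paper's fourth-degree equation exactly. The candidate constant K (e.g., a value of K̂⁽²⁾_{N,n}) must be supplied; all function names are my own.

```python
import numpy as np
from scipy.stats import chi2, wishart

rng = np.random.default_rng(0)

def contains_R(xbar, S, K, c, n_grid=720):
    """Approximate check that the ellipse {x : (x-xbar)' S^{-1} (x-xbar) <= K}
    contains the circle {x : x'x <= c}.  Since the ellipse is convex, it
    contains the disc iff it contains every point of the disc's boundary."""
    Sinv = np.linalg.inv(S)
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    pts = np.sqrt(c) * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    d = pts - xbar
    return bool(np.all(np.einsum('ij,jk,ik->i', d, Sinv, d) <= K))

def estimate_PR(K, N, n, alpha, runs=100):
    """Monte Carlo estimate of P_R, the proportion of regions containing R."""
    c = chi2.ppf(alpha, df=2)          # chi^2_2(1 - alpha) in the paper's notation
    hits = 0
    for _ in range(runs):
        xbar = rng.standard_normal(2) / np.sqrt(N)          # xbar ~ N(0, I/N)
        S = wishart.rvs(df=n, scale=np.eye(2) / n, random_state=rng)  # nS ~ W_2(n, I)
        hits += contains_R(xbar, S, K, c)
    return hits / runs
```

For a huge K the estimated P_R should be 1, and for a tiny K it should be 0; intermediate values of K trace out curves like those in Tables 1 and 2.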
(John, 1968). The method of simulation was the same as that used for studying the earlier approximations, except that 200 runs were made for each given set of (α, β, N, n) in order to obtain more accuracy. However, it was necessary to compute β for specified values of t₂(β). This was accomplished using the following formula, obtained by John (1963):

    β = [1 − F_{2n−2}(2t₂(β))] − {π^{1/2}/Γ(½n)} {½t₂(β)}^{½(n−1)} e^{−½t₂(β)} [1 − F_{n−1}(t₂(β))],

where F_m(t) = Pr{χ²_m ≤ t}.

Table 2 presents some of the results obtained for various values of α, β, N and n. It was generally found that the approximation was best for small n, but different values of N and α produced no specific trend with regard to the accuracy of the approximation.

4. A Fourth Approximation

For the computer simulation described in the previous section, P_R exceeded β for all but four of the parameter sets (α, β, N, n) used; the four exceptions can be attributed to sampling error. Thus approximation (3) was experimentally "safe." Nevertheless, this is difficult to prove mathematically. This leads us to a fourth approximation which can be mathematically proved to be safe, although it is generally larger than approximation (3). Here we say that an approximation K̂_{N,n} is safe if

    Pr{R̂_{x̄,S} ⊇ R} ≥ β.

It is easy to see that this is so if Pr{U ≤ K̂_{N,n}} ≥ β. Approximation (4) is given by

(4)  K̂⁽⁴⁾_{N,n} = [{n χ²_p(1 − α)/t_p(β₁)}^{1/2} + {N(n − p + 1)}^{−1/2} (np)^{1/2} {F_{p,n−p+1}(1 − β₂)}^{1/2}]²,

where β₁ + β₂ − 1 ≥ β. The proof that (4) is safe is as follows:

    Pr{U ≤ [{n χ²_p(1 − α)/t_p(β₁)}^{1/2} + {N(n − p + 1)}^{−1/2} (np)^{1/2} {F_{p,n−p+1}(1 − β₂)}^{1/2}]²}
      ≥ Pr{t_p ≥ t_p(β₁) and (x̄ − μ)′ S⁻¹ (x̄ − μ) ≤ {N(n − p + 1)}⁻¹ (np) F_{p,n−p+1}(1 − β₂)}
      ≥ Pr{t_p ≥ t_p(β₁)} + Pr{(x̄ − μ)′ S⁻¹ (x̄ − μ) ≤ {N(n − p + 1)}⁻¹ (np) F_{p,n−p+1}(1 − β₂)} − 1
      = 1 − (1 − β₁) − (1 − β₂) = β₁ + β₂ − 1 ≥ β,

for 0 < β_i < 1, i = 1, 2, provided β₁ + β₂ − 1 ≥ β. In particular we may choose β₁ and β₂ so that K̂⁽⁴⁾_{N,n} is minimum.
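The closed form for β quoted above from John (1963) is, in the notation F_m(t) = Pr{χ²_m ≤ t}, the survival function of the smaller latent root of a standard 2 × 2 Wishart matrix with n degrees of freedom. A sketch of it in Python follows (SciPy and the function name are my additions); a convenient check is that for n = 3 the expression collapses to e^{−t}.

```python
import math
from scipy.stats import chi2

def beta_of_t2(t, n):
    """Pr{t2 > t}, where t2 is the smaller latent root of a standard
    2 x 2 Wishart matrix with n degrees of freedom."""
    coef = (math.sqrt(math.pi) / math.gamma(n / 2)) \
           * (t / 2) ** ((n - 1) / 2) * math.exp(-t / 2)
    return (1 - chi2.cdf(2 * t, df=2 * n - 2)) \
           - coef * (1 - chi2.cdf(t, df=n - 1))
```

Inverting this function numerically (e.g., by bisection in t) recovers t₂(β) for a target β, which is how entries of the Kleinbaum and John (1969) table can be spot-checked.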
In this case Outline of Simulation Method Values of x were generated using a standard computer procedure for generating independent N(O,l) variables and then dividing each of a pair by ~ resulti~g S was generated using a computer technique by Odell and Fefveson (1966) and specialized for our problem as follows: (i) (ii) Generate 3 independent N(O,l) variates U 'U and U • 3 1 2 Use U and U to generate two independent chi square variates 2 l VI and V2 using the Wilson-Hilferty approximation, where '--:""X 2 , (iii) " VI =n V2 = 'V n-1' 2 • ~ X 2 n-2 and {I - 2/ [9 n] + U1 [2 I 9 n]~ } 3 (n - l){ 1 - 2 / [9(n - 1)] + U [2 / 9(n 2 l)]~ 3 The variance covariance matrix S is given by S = «sij) where 2 sll = Vl/n, s22 = [V 2 + U3 ]/ n s12 = U3 .JV"; / n = s2l· A To determine whether or not Rcontained R, it was first checked x,s A whether the origin (0,0) was inside R- • x,s ~ A r/> R. If not, then Rx,s If so, it was then determined by solving a fourth degree polynomial whether the circle R 6 A and the ellipse Rhad any points of intersection. x,s there were any real roots of the equation. This depended on whether If so, then R~ ~ R. x,s I \ C Z ~ R provided the point (~X2 X,s the roots were imaginary, then R_ If all (1 - a), 0) " _ • Conversly, if the roots were imaginary and was within the ellipse R x,s 2 (~X (1 - a), 0) was outside the ellipse, then R. 2 X,s R_ ;P 7 4 TABLE 2 : . Support EV1. dence 1n e 0 fA ' t i on ~ "(3) pprox~a ,n Input Parameters .s. .a N n .880 .879 .95 120 5 .895 .879 .50 120 5 .880 .879 .05 120 5 .885 .879 .95 11 5 .890 .879 .50 11 5 .885 .879 .95 5 5 .875 .879 .50 5 5 .900 .835 .95 120 61 .915 .835 .05 120 61 .785 .716 .95 120 61 .795 .716 .95 61 61 .815 .716 .95 25 61 .860 .716 .95 11 61 .835 .716 .95 5 61 .995 .974 .95 120 61 .990 .977 .95 120 25 .990 .981 .95 120 11 .970 .986 .95 120 5 .470 .518 .95 61 5 .545 .518 .95 120 5 .808 .777 .95 120 5 .805 .777 .95 61 5 PR 4 PR = ~ # of times R- X,s 200 contains R (200 runs) REFERENCES 1. John, S. 
(1963). A tolerance region for multivariate normal distributions. Sankhyā, Series A, Vol. 25, pp. 363-368.

2. John, S. (1968). A central tolerance region for the multivariate normal distribution. Journal of the Royal Statistical Society, Series B.

3. Kleinbaum, David G. and John, S. (1969). A table of percentage points of the smallest latent root of a 2 × 2 Wishart matrix. (Submitted for publication.)

4. Odell, P. L. and Feiveson, A. H. (1966). A numerical procedure to generate a sample covariance matrix. Journal of the American Statistical Association, Vol. 61, pp. 199-203.