This work was supported by the National Institutes of Health,
Institute of General Medical Sciences, Grant No. GM-12868-04
A CENTRAL TOLERANCE REGION
FOR THE MULTIVARIATE NORMAL DISTRIBUTION II

by

David G. Kleinbaum and S. John¹

University of North Carolina
Institute of Statistics Mimeo Series No. 620
January 1969

¹ Now at the Australian National University, Canberra

A CENTRAL TOLERANCE REGION
FOR THE MULTIVARIATE NORMAL DISTRIBUTION II
1. Introduction

In a recent paper of the same title, John (1968) considered the problem of determining, from a random sample of size N from a p-variate normal distribution, a region which with probability β includes the region

R = { x : (x − μ)' Σ⁻¹ (x − μ) ≤ χ²_p(1 − α) }.

Here μ and Σ are respectively the mean vector and covariance matrix of the distribution considered, and χ²_p(1 − α) is the number exceeded by a chi-square of p degrees of freedom with probability 1 − α.
The solution to the problem when μ and Σ are unknown is the region

R̂_{x̄,S} = { x : (x − x̄)' S⁻¹ (x − x̄) ≤ K̂_{N,n} },

where x̄ is the vector of sample means, S is the sample covariance matrix, and K̂_{N,n} is the number exceeded with probability 1 − β by a random variable U (defined in John, 1968) distributed independently of x̄ and S. Here t_p is a random variable distributed as the smallest root of a standard p × p Wishart matrix of n = N − 1 degrees of freedom. Since it is difficult to determine K̂_{N,n} exactly, two approximations were given:

(1)  K̂^(1)_{N,n} = [{n χ²_p(1 − α) / t_p(0.5)}^½ + {N(n − p + 1)}^{−½} (np)^½ {F_{p,n−p+1}(1 − β)}^½]²

and

(2)  K̂^(2)_{N,n} = [{n χ²_p(1 − α) / (n − p − 1)}^½ + {N(n − p + 1)}^{−½} (np)^½ {F_{p,n−p+1}(1 − β)}^½]²
where F_{p,n−p+1}(1 − β) is the number exceeded with probability 1 − β by a random variable having the F-distribution with p and n − p + 1 degrees of freedom, and t_p(β) is the number which is exceeded by t_p with probability β. The corresponding regions approximating R̂_{x̄,S} will henceforth be denoted by adding the appropriate superscript to R̂_{x̄,S}.
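As a concrete illustration, approximations (1) and (2) can be assembled directly from tabled quantiles. The sketch below is ours, not the paper's: it takes χ²_p(1 − α), F_{p,n−p+1}(1 − β) and t_p(0.5) as inputs, since the paper reads these from tables.

```python
import math

def k_hat_1(p, n, N, chi2_q, f_q, t_median):
    """Approximation (1).  chi2_q = chi^2_p(1 - alpha),
    f_q = F_{p,n-p+1}(1 - beta), t_median = t_p(0.5);
    all three are table lookups supplied by the caller."""
    return (math.sqrt(n * chi2_q / t_median)
            + math.sqrt(n * p * f_q / (N * (n - p + 1)))) ** 2

def k_hat_2(p, n, N, chi2_q, f_q):
    """Approximation (2): t_p(0.5) is replaced by n - p - 1."""
    return k_hat_1(p, n, N, chi2_q, f_q, n - p - 1)
```

Note that (2) is just (1) with the median t_p(0.5) replaced by the central value n − p − 1 of the t_p distribution.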
In this paper we show that the above approximations to K̂_{N,n} are inadequate unless n is unusually large, and we propose two new approximations which are accurate. These results were obtained primarily by computer simulation, and only the case p = 2 was considered. Also we have allowed n to be independent of N, so that S may be an estimate of Σ obtained independently of the N observations in our sample.
2. Evidence for the Inadequacy of the Earlier Approximation

Since t_p(0.5) and n − p − 1 both represented central values of the t_p distribution, it was assumed that the adequacy or inadequacy of (2) implied the adequacy or inadequacy of (1) (and vice versa). For the computer simulation it was assumed without loss of generality that the true parameters were μ = 0 and Σ = I. The region R was thus the circle

R = { x : x'x ≤ χ²_2(1 − α) }.

For specified values of α, β, N and n, 100 independent sample values of x̄ and S were generated by the method outlined in Section 5, and the proportion P_R of R̂^(2)_{x̄,S}'s that contained R (the number of runs in which R̂^(2)_{x̄,S} contained R, divided by 100) was computed. The proportion P_R was then compared with β. A few examples of the results are given in Table 1. For each specified value of α, β and N it was found that a value of n greater than 300 was required in order for the proportion P_R to be equal to β. Also, for all values of n less than 300, P_R was smaller than β.
TABLE 1: Evidence of the Inadequacy of Approximation K̂^(2)_{N,n}
for (β, α, N) = (.95, .95, 25); P_R based on 100 runs

    n     P_R
    24    .35
    75    .74
   120    .81
   300    .87
   500    .92
   600    .95
3. The New Approximation to K̂_{N,n} and Evidence in Support of It

The following approximation was shown by simulation to be adequate for p = 2:

(3)  K̂^(3)_{N,n} = [{n χ²_p(1 − α) / t_p(β)}^½ + {N(n − p + 1)}^{−½} (np)^½ {F_{p,n−p+1}(1 − β)}^½]²

(A table of values of t_p(β) for p = 2 has been prepared by Kleinbaum and John (1969).) This approximation was suggested by the fact that the approximation (3) is exact in the two special cases n = ∞ and N = ∞ (John, 1968). The method of simulation was the same as that used for studying the earlier approximations, except that 200 runs were made for each given set of (α, β, N, n) in order to obtain more accuracy.
However, it was necessary to compute β for specified values of t₂(β). This was accomplished using the following formula obtained by John (1963):

β = [1 − F_{2n}(2 t₂(β))] − {Γ(½) / Γ(½n)} {½ t₂(β)}^{½(n−1)} e^{−½ t₂(β)} [1 − F_{n+1}(t₂(β))],

where F_m(t) = Pr{ χ²_m ≤ t }.
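The formula above is straightforward to evaluate numerically. The sketch below is ours, not the paper's: F_m is computed by the standard regularized lower incomplete-gamma series. As a check, for n = 3 the formula collapses analytically to β = e^{−t}.

```python
import math

def chi2_cdf(m, t):
    """F_m(t) = Pr{chi^2_m <= t}, via the regularized lower
    incomplete-gamma series P(a, x); adequate for moderate t."""
    if t <= 0:
        return 0.0
    a, x = m / 2.0, t / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term >= 1e-15 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def beta_from_t2(n, t):
    """John's (1963) formula: the level beta corresponding to a
    given percentage point t = t_2(beta) of the smallest root
    of a standard 2 x 2 Wishart matrix with n degrees of freedom."""
    return ((1.0 - chi2_cdf(2 * n, 2 * t))
            - (math.gamma(0.5) / math.gamma(n / 2.0))
            * (0.5 * t) ** ((n - 1) / 2.0)
            * math.exp(-0.5 * t)
            * (1.0 - chi2_cdf(n + 1, t)))
```

In the paper this relation is used in reverse: t₂(β) is fixed and the corresponding β is computed for each n, which is why the β column of Table 2 varies with n.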
Table 2 presents some of the results obtained for various values of α, β, N and n. It was generally found that the approximation was best for small n, but different values of N and α produced no specific trend with regard to the accuracy of the approximation.
4.
A Fourth Approximation
For the computer simulation described in the previous section, PR
exceeded 8 for all but four of the parameter sets ( a, 8, Nand n) used; the
four exceptions can be attributed to sampling error.
was experimentally "safe."
matically.
Thus approximation (3)
Nevertheless, this is difficult to prove mathe-
This leads us to a fourth approximation which can be mathematically
proved to be safe, although it is generally larger than the approximation (3).
Here we say that an approximation ~_
is safe if
-~,n
pI-{
R_X,s .:>
lL
X,s
} ~
8
It is easy to see that this is so
Pr· { U ~
if
~,n} ~ 8.
Approximation (4) is given by

(4)  K̂^(4)_{N,n} = [{n χ²_p(1 − α) / t_p(β₁)}^½ + {N(n − p + 1)}^{−½} (np)^½ {F_{p,n−p+1}(1 − β₂)}^½]²

where β₁ + β₂ − 1 ≥ β.
The proof that (4) is safe is as follows:

Pr{ U ≤ [{n χ²_p(1 − α) / t_p(β₁)}^½ + {N(n − p + 1)}^{−½} (np)^½ {F_{p,n−p+1}(1 − β₂)}^½]² }
  ≥ Pr{ t_p ≥ t_p(β₁) and (x̄ − μ)' S⁻¹ (x̄ − μ) ≤ {N(n − p + 1)}⁻¹ (np) F_{p,n−p+1}(1 − β₂) }
  ≥ 1 − Pr{ t_p < t_p(β₁) } − Pr{ (x̄ − μ)' S⁻¹ (x̄ − μ) > {N(n − p + 1)}⁻¹ (np) F_{p,n−p+1}(1 − β₂) }
  = 1 − (1 − β₁) − (1 − β₂)
  = β₁ + β₂ − 1
  ≥ β,  for 0 < βᵢ < 1, i = 1, 2, if β₁ + β₂ − 1 ≥ β.

In particular, we may choose β₁ and β₂ so that K̂^(4)_{N,n} is minimum.
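The minimizing choice of β₁ and β₂ can be found by a one-dimensional search along the constraint β₁ + β₂ − 1 = β. The sketch below is an illustration, not the paper's procedure: the quantile functions t_of_beta1 and f_of_beta2 are hypothetical callables standing in for the table lookups of t_p(β₁) and F_{p,n−p+1}(1 − β₂).

```python
import math

def k_hat_4(p, n, N, chi2_q, t_of_beta1, f_of_beta2, beta, grid=99):
    """Minimize approximation (4) over beta_1 + beta_2 - 1 = beta
    by a grid search in beta_1.  chi2_q = chi^2_p(1 - alpha);
    t_of_beta1(b) and f_of_beta2(b) return t_p(b) and
    F_{p,n-p+1}(1 - b) (table lookups in the paper)."""
    best = math.inf
    for i in range(1, grid + 1):
        b1 = beta + (1.0 - beta) * i / (grid + 1)  # beta < b1 < 1
        b2 = beta + 1.0 - b1                        # so b1 + b2 - 1 = beta
        k = (math.sqrt(n * chi2_q / t_of_beta1(b1))
             + math.sqrt(n * p * f_of_beta2(b2) / (N * (n - p + 1)))) ** 2
        best = min(best, k)
    return best
```

A coarser grid followed by local refinement would do as well; the objective is smooth in β₁ along the constraint.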
5. Outline of Simulation Method
Values of x̄ were generated using a standard computer procedure for generating independent N(0,1) variables and then dividing each of the resulting pair by √N. S was generated using a computer technique of Odell and Feiveson (1966), specialized for our problem as follows:

(i) Generate 3 independent N(0,1) variates U₁, U₂ and U₃.

(ii) Use U₁ and U₂ to generate two independent chi-square variates V₁ ~ χ²_n and V₂ ~ χ²_{n−1} using the Wilson-Hilferty approximation, where

V₁ = n { 1 − 2/[9n] + U₁ [2/(9n)]^½ }³
V₂ = (n − 1) { 1 − 2/[9(n − 1)] + U₂ [2/(9(n − 1))]^½ }³

(iii) The variance-covariance matrix S is given by S = ((s_ij)), where

s₁₁ = V₁/n,  s₂₂ = [V₂ + U₃²]/n,  s₁₂ = U₃ √V₁ / n = s₂₁.
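The outline above translates directly into code. The following sketch is ours: random.gauss stands in for the paper's normal generator, and the rare negative draw from the Wilson-Hilferty cube is clamped at zero, an assumption not stated in the paper.

```python
import math
import random

def simulate_xbar_S(N, n):
    """One simulated pair (xbar, S) for p = 2, mu = 0, Sigma = I,
    following steps (i)-(iii) above."""
    # Sample means: independent N(0,1) variates divided by sqrt(N).
    xbar = (random.gauss(0, 1) / math.sqrt(N),
            random.gauss(0, 1) / math.sqrt(N))
    u1, u2, u3 = (random.gauss(0, 1) for _ in range(3))
    # Wilson-Hilferty approximations to chi^2_n and chi^2_{n-1},
    # clamped at zero against a rare negative cube.
    v1 = max(0.0, n * (1 - 2 / (9 * n)
                       + u1 * math.sqrt(2 / (9 * n))) ** 3)
    v2 = max(0.0, (n - 1) * (1 - 2 / (9 * (n - 1))
                             + u2 * math.sqrt(2 / (9 * (n - 1)))) ** 3)
    # Odell-Feiveson (Bartlett-type) assembly of S.
    s11 = v1 / n
    s22 = (v2 + u3 ** 2) / n
    s12 = u3 * math.sqrt(v1) / n
    return xbar, ((s11, s12), (s12, s22))
```

With Σ = I the diagonal entries of S should average 1, which gives a quick sanity check on the construction.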
A
To determine whether or not Rcontained R, it was first checked
x,s
A
whether the origin (0,0) was inside R- •
x,s
~
A
r/> R.
If not, then Rx,s
If so, it
was then determined by solving a fourth degree polynomial whether the circle R
6
A
and the ellipse Rhad any points of intersection.
x,s
there were any real roots of the equation.
This depended on whether
If so, then R~ ~ R.
x,s
I \ C Z
~ R provided the point
(~X2
X,s
the roots were imaginary, then R_
If all
(1 - a), 0)
" _ • Conversly, if the roots were imaginary and
was within the ellipse R
x,s
2
(~X
(1 - a), 0) was outside the ellipse, then
R.
2
X,s
R_ ;P
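The same containment decision can be made without forming the quartic explicitly. The sketch below is our numerical stand-in, not the paper's root-finding method: it applies the origin check described above and then maximizes the ellipse's quadratic form over the circle's boundary on a fine angular grid.

```python
import math

def disk_in_ellipse(r2, xbar, Sinv, K, steps=3600):
    """Check R = {x : x'x <= r2} subset of the ellipse
    {x : (x - xbar)' Sinv (x - xbar) <= K} for p = 2.
    Sinv is a symmetric 2 x 2 matrix as nested tuples."""
    (a, b), (_, c) = Sinv
    # Origin check from the text: if (0,0) lies outside the
    # ellipse, R cannot be contained.
    x0, y0 = -xbar[0], -xbar[1]
    if a * x0 * x0 + 2 * b * x0 * y0 + c * y0 * y0 > K:
        return False
    # Maximize the quadratic form over the circle's boundary;
    # containment holds iff the maximum does not exceed K.
    r = math.sqrt(r2)
    worst = 0.0
    for i in range(steps):
        t = 2 * math.pi * i / steps
        dx = r * math.cos(t) - xbar[0]
        dy = r * math.sin(t) - xbar[1]
        worst = max(worst, a * dx * dx + 2 * b * dx * dy + c * dy * dy)
    return worst <= K
```

The grid approach trades the exactness of the quartic test for simplicity; near-tangent configurations would need a finer grid or the paper's root test.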
TABLE 2: Evidence in Support of Approximation K̂^(3)_{N,n}

   β      α     N     n     P_R
  .879   .95   120    5    .880
  .879   .50   120    5    .895
  .879   .05   120    5    .880
  .879   .95    11    5    .885
  .879   .50    11    5    .890
  .879   .95     5    5    .885
  .879   .50     5    5    .875
  .835   .95   120   61    .900
  .835   .05   120   61    .915
  .716   .95   120   61    .785
  .716   .95    61   61    .795
  .716   .95    25   61    .815
  .716   .95    11   61    .860
  .716   .95     5   61    .835
  .974   .95   120   61    .995
  .977   .95   120   25    .990
  .981   .95   120   11    .990
  .986   .95   120    5    .970
  .518   .95    61    5    .470
  .518   .95   120    5    .545
  .777   .95   120    5    .808
  .777   .95    61    5    .805

P_R = (number of times R̂^(3)_{x̄,S} contained R)/200 (200 runs).
REFERENCES

1. John, S. (1963). A tolerance region for multivariate normal distributions. Sankhya, Series A, Vol. 25, pp. 363-368.

2. John, S. (1968). A central tolerance region for the multivariate normal distribution. J. Roy. Statist. Soc., Ser. B.

3. Kleinbaum, David G. and S. John (1969). A table of percentage points of the smallest latent root of a 2 x 2 Wishart matrix. (Submitted for publication.)

4. Odell, P. L. and Feiveson, A. H. (1966). A numerical procedure to generate a covariance matrix. Journal of the American Statistical Association, Vol. 61, pp. 199-203.