Preprint BUW-SC 09/2
Bergische Universität Wuppertal
Fachbereich C – Mathematik und Naturwissenschaften
Mathematik
Andreas Frommer, Behnam Hashemi
Verified Computation of Square Roots of a Matrix
April 2009
http://www.math.uni-wuppertal.de/SciComp/
VERIFIED COMPUTATION OF SQUARE ROOTS OF A MATRIX
ANDREAS FROMMER∗ AND BEHNAM HASHEMI†
Abstract. We present methods to compute verified square roots of a square matrix A. Given an
approximation X to the square root, obtained by a classical floating point algorithm, we use interval
arithmetic to find an interval matrix which is guaranteed to contain the error of X. Our approach
is based on the Krawczyk method which we modify in two different ways in such a manner that
the computational complexity for an n × n matrix is reduced to O(n^3). The methods are based on the
spectral decomposition or, in the case that the eigenvector matrix is ill conditioned, on a similarity
transformation to block diagonal form. Numerical experiments prove that our methods are computationally efficient and that they yield narrow enclosures provided X is a good approximation. This
is particularly true for symmetric matrices, since their eigenvector matrix is perfectly conditioned.
Keywords: matrix square root, Brouwer’s fixed point theorem, Krawczyk’s method, Kronecker
structures, interval analysis, circular arithmetic.
1. Introduction. Let A be a given n × n matrix. In this paper we are interested in computing enclosing intervals for all entries of a matrix square root A^{1/2} of A. Here, the matrix square root A^{1/2} is to be understood, as usual, as the extension of the square root function (·)^{1/2} : C → C, z ↦ z^{1/2}, to n × n matrices in the operator theoretic sense. Such an A^{1/2}, which is called a primary square root, is usually not unique; it can be characterized as p(A), where p is any polynomial which interpolates (·)^{1/2} at the eigenvalues of A in the Hermite sense w.r.t. their algebraic multiplicity. The non-uniqueness of a primary square root A^{1/2} is due to the fact that z^{1/2} takes two different values for z ≠ 0, and we are free to choose either value for the interpolation
polynomial p. As an example let A = I, the identity in C^{2×2}. Then A has exactly two primary square roots, which are given by

    [ 1  0 ]        [ -1   0 ]
    [ 0  1 ]  and   [  0  -1 ].
An outstanding reference on the various aspects of the matrix square root is
Higham’s recent book [14] on which this introduction is based.
Every square root X of a matrix A satisfies the equation

    F(X) = X^2 − A = 0.    (1.1)
But there might be solutions to (1.1) other than primary square roots of A. For example, if again A = I ∈ C^{2×2}, the matrices

    [ -1  0 ]        [ 1   0 ]
    [  0  1 ]  and   [ 0  -1 ]

as well as, e.g.,

    [ 0  1 ]
    [ 1  0 ],
∗ Department of Mathematics,
University of Wuppertal, 42097 Wuppertal, Germany
[email protected]
† Department of Applied Mathematics, Faculty of Mathematics and Computer Sciences, Amirkabir University of Technology, No.424 Hafez Avenue, Tehran 15914, Iran hashemi am @aut.ac.ir,
[email protected]
also solve (1.1). Any solution of (1.1) is called a square root of A; if it is not a matrix function in the operator theoretic sense, it will be called "non-primary".
Computing (primary) matrix square roots numerically has been considered previously by several authors including [2, 5, 6, 8, 7, 10, 11, 12, 16, 22]. Björck and
Hammarling [5] have offered a method based on the Schur decomposition and a fast
recursion. However, if A is real this method may require complex arithmetic even
if the desired root is itself real. The method of [5] was extended by Higham [11] to
compute real square roots of a real matrix using real arithmetic. Newton’s method
has also been used to compute matrix square roots of A ∈ Cn×n by Higham [10].
Any nonsingular complex (real) square matrix A has a primary square root. While
the equation (1.1) may have infinitely many solutions (for n ≥ 2 any involutory matrix
is a solution to (1.1) for A = I), a nonsingular Jordan block has precisely two primary
square roots [12]. If A is singular, the existence of a square root depends on the Jordan
structure of the zero eigenvalue. More precisely, the matrix A has a primary square
root if and only if rank(A) = rank(A^2) [15]. If A is real and nonsingular, it may or may not have a real primary square root; a sufficient condition for one to exist is that A have no real negative eigenvalues [12]. Any symmetric positive (semi)definite
matrix has a unique symmetric positive (semi)definite primary square root [15].
Any matrix A having no nonpositive real eigenvalues (e.g., every nonsingular M-matrix) has a unique primary square root for which every eigenvalue has positive
real part, called the principal square root. One way to compute the principal square
root of a real diagonalisable matrix A is to use the spectral decomposition [33]. A
method for computing the principal (symmetric positive definite) square root A1/2 of
a symmetric positive definite matrix A is based on a Cholesky decomposition of A
and the polar decomposition of the Cholesky factor [12].
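For orientation, the spectral-decomposition approach just mentioned amounts to only a few lines of MATLAB. The following is a plain floating point sketch for a given diagonalizable matrix A with no nonpositive real eigenvalues; it makes no claim to the numerical stability of the Schur-based methods cited above.

    % Principal square root via the spectral decomposition A = V*Lam*inv(V)
    % (floating point only, no verification).
    [V, Lam] = eig(A);
    X = V * diag(sqrt(diag(Lam))) / V;   % then X*X equals A up to rounding errors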
The general interest in matrix square roots reflects their importance in many applications, such as the Neumann-Dirichlet mapping for elliptic boundary value problems and related numerical methods for their solution, the inverse scaling and squaring method for the matrix logarithm which, for example, is used to produce generators of Markov models, the orthogonal Procrustes problem, and quadratic nonlinear eigenproblems; see [14].
The number of matrix square roots depends on the Jordan structure of A. Higham
(see [11] or [14]) has classified the square roots of a nonsingular matrix A in a manner
which makes clear the distinction between primary and non-primary ones. The precise
result is given in the following lemma.
Lemma 1.1. Let the nonsingular matrix A ∈ C^{n×n} have the Jordan canonical form Z^{-1}AZ = J = diag(J_1, J_2, . . . , J_p), and let s ≤ p be the number of distinct eigenvalues of A. Then A has precisely 2^s primary square roots.
Moreover, all primary square roots of A are isolated solutions of (1.1), characterized by the fact that the sum of any two of their eigenvalues is nonzero. The non-primary square roots of A form a finite number of parametrized families of matrices; each family contains infinitely many solutions of (1.1) which share the same spectrum. The non-primary square roots are non-isolated, i.e. each neighborhood contains infinitely many other solutions of (1.1).
The purpose of this paper is to develop methods based on interval arithmetic
which obtain guaranteed error bounds for a primary square root of a matrix A. A
classical, floating point numerical computation will always yield a result which is not
an exact square root of A but rather an approximation to it. The reasons for this are manifold, such as the presence of rounding errors or the need to stop an iteration which in theory should run forever. Given a thus computed approximate square root
X̌ of A, our methods will compute an interval matrix with a small diameter, close
to X̌, for which, by the computation, it has been mathematically proven that the
interval matrix contains an exact square root of A. In this manner, we get verified,
reliable error bounds for each entry of the square root of A.
Interval methods of this kind have a long tradition, and many approaches have been developed to compute enclosures for solutions of systems of linear and
nonlinear equations. We refer to the monographs [1, 17, 24, 25], e.g., for further
reference. Interval computations have to be supported by software providing for a
correct implementation of machine interval arithmetic including outward rounding,
see [1, 17, 24, 25], and for an easy use of interval operations. Prominent examples
are C-XSC [19] and INTLAB [32]. INTLAB is a MATLAB toolbox which is freely
available for non-commercial use. The numerical examples presented here have been
obtained using INTLAB.
The organization of this paper is as follows. In Section 2, we introduce our notation, review some basic concepts and prove a theoretical result relating the fixed points of one function to the zeros of another. This result will be used in Section 3 to develop two variants of the standard Krawczyk operator which are particularly useful for computing enclosures for a matrix square root, since they keep the complexity down to cubic. These variants can be used in computational existence tests based on ε-inflation, which we discuss in Section 4. Possible extensions of our methods are discussed briefly in Section 5 before we report numerical results for several test problems in Section 6.
2. Notation and fundamental results. Even if A is a real matrix, its square
root may be complex, so C is the natural field to work with. There are two different
established ways, circular and rectangular arithmetic, of extending the concept of real
interval arithmetic to C. We will use circular arithmetic throughout, implying that
the set of complex “intervals” IC = ICdisc is given by all finite discs a in C with
midpoint mid (a) and radius rad (a). We refer to [1] for details on the definition of
the arithmetic operations in IC.
We aim, at least in part, at using the standard notation of interval analysis defined in [18]. So IC^n and IC^{n×n} denote the set of all interval vectors and the set of all n × n interval matrices, respectively. All interval quantities will be typeset in boldface. For
a ∈ IC its absolute value is |a| := max{|a| : a ∈ a} = |mid(a)| + rad(a). The hull □(a, b) of two intervals in IC is the interval of smallest radius containing a and b.
Since the set theoretic intersection of two intervals from IC will usually not be in IC,
we may define intersect(a, b) to be just any operator which produces an interval c ∈ IC
such that c ⊇ a ∩ b with c being empty if a and b are disjoint. The construction of a
“sharp” intersect operator is somewhat tedious; see intersect.m in INTLAB which
also provides a machine circular arithmetic using outward rounding.
For interval vectors and matrices, rad, mid, |·|, □ and intersect will be applied componentwise, thus producing results of the same dimension as the arguments.
Note that for reasons of computational efficiency, the default arithmetic for real intervals in INTLAB is midpoint-radius arithmetic, i.e. the real analogue of complex circular arithmetic; see [32].
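For readers unfamiliar with INTLAB, the following lines illustrate the midpoint-radius representation used throughout; midrad, mid and rad are INTLAB functions, and the last line evaluates |a| as defined above.

    % A complex disc and a real midpoint-radius interval in INTLAB.
    a = midrad(1+2i, 1e-10);        % disc with midpoint 1+2i and radius 1e-10
    x = midrad(1, 1e-12);           % real interval in midpoint-radius form
    m = mid(a);  r = rad(a);        % recover midpoint and radius
    abs_a = abs(mid(a)) + rad(a);   % |a| = |mid(a)| + rad(a)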
The following two results will be needed in the proof of Theorem 2.3 below. We
use the projections of a set S ⊆ IC^n onto its components, defined for i = 1, . . . , n as

    P_i S = {s_i : s = (s_1, . . . , s_n)^T ∈ S} ⊆ C.
Lemma 2.1. Let A ∈ C^{n×n}.
a) Let x ∈ IC^n, y = Ax and S = {Ax : x ∈ x}. Then

    y_i = P_i S for i = 1, . . . , n,

and

    rad(y) = |A| rad(x).

b) If there exists u ∈ R^n, u_i > 0 for i = 1, . . . , n, such that

    |A| u < u,    (2.1)

then ρ(A) < 1, where ρ denotes the spectral radius.
Proof. To show a), let a_ij ∈ C denote the entries of A, fix i and observe that

    P_i S = { Σ_{j=1}^{n} a_ij x_j : x = (x_1, . . . , x_n)^T ∈ x }.    (2.2)

In complex circular arithmetic we have a · b = {ab : b ∈ b} for any a ∈ C, b ∈ IC, as well as c + d = {c + d : c ∈ c, d ∈ d} for c, d ∈ IC; see [1, Ch. 6], for example. Therefore, (2.2) gives

    P_i S = Σ_{j=1}^{n} a_ij x_j,

i.e. P_i S = y_i, which proves the first part of a). The second part of a) follows since rad(ab) = |a| rad(b) and rad(a + b) = rad(a) + rad(b) for a ∈ C, a, b ∈ IC; see again [1, Ch. 6]. The inequality (2.1) in part b) implies ||A||_u < 1 for the weighted maximum norm ||x||_u = max_{i=1,...,n} {|x_i|/u_i}, and thus ρ(A) ≤ ||A||_u < 1. For details of this standard result see [25, Prop. 3.7.2] or [34, Cor. 1.14], e.g.
Any arithmetic expression involving the components x1 , . . . , xN of a vector x ∈
CN defines a function ϕ : D ⊆ CN → C, and N such expressions give a function
f : D ⊆ C^N → C^N, one for each component f_i of f. By slight abuse of notation we identify the functions with their arithmetic expressions. Replacing the vector x by an interval vector x ∈ IC^N we thus obtain an interval extension f of f. Note
that since the distributive law does not hold for intervals, f will actually depend on
the expression representing f . By the inclusion property of interval arithmetic, the
range of f over an interval is contained in its interval extension, i.e.
{f (x) : x ∈ x} ⊆ f (x).
A mapping

    A : D × D → C^{n×n}

is called a slope for f if

    f(y) − f(x) = A(y, x)(y − x) for all x, y ∈ D.    (2.3)
Let A be an interval matrix containing all slopes A(y, x) for y ∈ x, x a given interval
vector. For example, if x ∈ x and f is continuously differentiable on x, due to the mean value theorem, applied to each component f_i individually, we can take any A which contains the set {f′(y) : y ∈ x}. A standard choice is thus to take A = f′(x), the interval arithmetic evaluation of (an arithmetic expression for) f′(x).
For a given matrix R ∈ C^{n×n}, the Krawczyk operator k(x̌, x) is now defined as

    k(x̌, x) = x̌ − Rf(x̌) + (I − R·A)(x − x̌),   x̌ ∈ x ⊂ D.    (2.4)
The crucial property of k is summarized in the following theorem.
Theorem 2.2. Assume that f : D ⊂ C^N → C^N is continuous in D. Let x̌ ∈ D and x ∈ IC^n with x ⊆ D. Assume that A is an interval matrix containing all slopes A(y, x̌) for y ∈ x and let R ∈ C^{n×n}. If we have

    k(x̌, x) ⊆ int x,    (2.5)

where int x is the topological interior of x, then f has a zero x∗ in k(x̌, x). Moreover, if A also contains all slopes A(y, x) for x, y ∈ x, the zero x∗ is the only zero of f in x.
In the stated form, this theorem is due to Rump [31]; it goes back to Krawczyk
[20], see also [23]. The relation (2.5) is likely to hold if f(x̌) is small, i.e. x̌ is a good approximation to a zero of f, and all entries of the interval matrix I − RA are small, i.e. RA is close to the identity. The standard choice is to take R as a (numerically
computed and thus approximate) inverse of mid (A).
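As an illustration of Theorem 2.2 and this choice of R, here is a minimal INTLAB-style sketch of one Krawczyk test for the scalar equation f(x) = x^2 − 2 = 0; the function names midrad, intval, mid and in0 are INTLAB's, and the tolerances are chosen arbitrarily.

    % One standard Krawczyk test (2.4)/(2.5) for f(x) = x^2 - 2.
    f  = @(x) x.^2 - 2;             % evaluated in interval arithmetic by overloading
    df = @(x) 2*x;                  % interval extension of f'
    xc = 1.41421356;                % floating point approximation of sqrt(2)
    x  = midrad(xc, 1e-6);          % candidate interval around xc
    A  = df(x);                     % interval containing all slopes on x
    R  = 1/mid(A);                  % approximate inverse of mid(A)
    k  = xc - R*f(intval(xc)) + (1 - R*A)*(x - xc);
    if in0(k, x)                    % (2.5): k contained in the interior of x
        disp('verified: f has a (unique) zero in k')
    end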
In the case of the matrix square root, as we will see in Section 3, it is too costly
to compute such an approximate inverse R. However, a version relying on a factorization of R will be available at reasonable cost. It is not entirely trivial to see that
such a modification can successfully be used to prove the existence of a zero in the
same manner as with the original Krawczyk operator. For this reason, we prove the
following slight generalization of Theorem 2.2 which we believe to express the essence
of all Krawczyk type verification methods. In its formulation we represent x as x̌ + z,
thus separating the approximate zero x̌ from the enclosure of its error, z.
Theorem 2.3. Assume that f : D ⊂ C^N → C^N is continuous in D. Let x̌ ∈ D and z ∈ IC^n be such that x̌ + z ⊆ D. Moreover, assume that A ⊂ C^{n×n} is a set of matrices containing all slopes A(x̌, y) for y ∈ x̌ + z =: x. Finally, let R ∈ C^{n×n}. Denote by K_f(x̌, R, z, A) the set

    K_f(x̌, R, z, A) := {−Rf(x̌) + (I − RA)z : A ∈ A, z ∈ z}.    (2.6)

Then, if

    K_f(x̌, R, z, A) ⊆ int z,    (2.7)

the function f has a zero x∗ in x̌ + K_f(x̌, R, z, A) ⊆ x. Moreover, if A also contains all slope matrices A(y, x) for x, y ∈ x, then this zero is unique in x.
Proof. Define
g(x) = x − Rf (x), x ∈ x.
We have
g(x) = x − Rf (x)
= x̌ + (x − x̌) − R(f (x̌) + A(x, x̌)(x − x̌))
= x̌ − Rf (x̌) + (I − RA(x, x̌))(x − x̌).
Thus, (2.7) implies {g(x) : x ∈ x} ⊆ x, so that by Brouwer’s fixed point theorem g
has a fixed point x∗ in x. It remains to show that such x∗ is a zero of f and that
it is unique. We first show that (2.7) implies that R as well as every A ∈ A are
non-singular. To this purpose, fix A ∈ A. By Lemma 2.1 a) we have
Pi {(I − RA)z : z ∈ z} = ((I − RA)z)i , i = 1, . . . , n.
Therefore, by (2.7),
−Rf (x̌) + (I − RA)z ⊆ int z.
Herein, both sides represent interval vectors from IC^n. We can therefore turn to radii to obtain

    rad((I − RA)z) = |I − RA| rad(z) = rad(−Rf(x̌) + (I − RA)z) < rad(z),    (2.8)

where the first equality is due to Lemma 2.1 a). By Lemma 2.1 b) we obtain ρ(I − RA) < 1, which proves that R as well as A are non-singular.
Now, R being non-singular shows that every fixed point x∗ of g is indeed a zero
of f. Moreover, if x∗∗ ≠ x∗ is another zero of f in x, then the corresponding slope
matrix A = A(x∗∗ , x∗ ) satisfies 0 = f (x∗∗ ) − f (x∗ ) = A(x∗∗ − x∗ ). But if A contains
all these slopes, then this is impossible since we just showed that every A ∈ A is
non-singular.
We will use Theorem 2.3 for the case where A is obtained as the result A of an interval arithmetic evaluation of (an expression for) f′ and where 0 ∈ z, i.e. x̌ ∈ x. So both the existence and the uniqueness result will hold once (2.7) is satisfied. Let us
note, in passing, that our proof of Theorem 2.3 does not require A to be an interval
matrix, whereas the fact that z is an interval vector appears to be essential to obtain
the crucial inequality (2.8).
To end this section, we turn to the Kronecker product of matrices. For two matrices A ∈ C^{m×n} and B ∈ C^{k×t} it is given by the mk × nt block matrix

    A ⊗ B = [ a_11 B  · · ·  a_1n B ]
            [   ⋮      ⋱      ⋮    ]
            [ a_m1 B  · · ·  a_mn B ].
For A = (aij ) ∈ Cm×n the vector vec(A) ∈ Cmn is obtained by stacking the
columns of A, i.e.,
vec(A) = (a11 , . . . , am1 , a12 , . . . , am2 , . . . , a1n , . . . , amn )T .
We will need the pointwise or Hadamard division of two matrices A, B ∈ Cn×m which
we denote as ·/, i.e. we have
A · /B = C ∈ Cn×m , where C = (cij ) with cij = aij /bij .
Finally, for d = (d1 , . . . , dn )T ∈ Cn , the matrix Diag (d) denotes the diagonal matrix
in Cn×n whose i-th diagonal entry is di . We extend this to matrices: For D ∈ Cn×m
we put Diag (D) = Diag (vec(D)) ∈ Cnm×nm .
The following properties of the Kronecker product and the vec operator will turn
out to be useful. For parts a) and b) see [15], e.g.; part c) is trivial.
Lemma 2.4. For any real matrices A, B, C and D with compatible sizes we have
a) (A ⊗ B)(C ⊗ D) = (AC ⊗ BD).
b) vec(ABC) = (C T ⊗ A)vec(B).
c) Diag (A)−1 vec(B) = vec(B · /A).
Part c) also holds for interval matrices. Note that there is only a subdistributive,
but not a distributive, law in interval arithmetic, i.e. (a+b)c ⊆ ac+bc for a, b, c ∈ IC.
This implies that interval matrix multiplication is not associative and that equalities
analogous to a) and b) do in general not hold for interval matrices. However, due to
the inclusion property of interval arithmetic, we have the following result which will
be repeatedly used without further mention in the next section.
Lemma 2.5. Let A, B, C be interval matrices of compatible sizes. Then the set

    {x = (C^T ⊗ A) vec(B) : A ∈ A, B ∈ B, C ∈ C}

is contained in both vec((AB)C) and vec(A(BC)).
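For point matrices, the identities of Lemma 2.4 b) and c) are easily checked numerically; a short MATLAB sketch with arbitrarily chosen sizes:

    % Check Lemma 2.4 b) and c) in floating point arithmetic.
    A = randn(3,4); B = randn(4,5); C = randn(5,2);
    norm(reshape(A*B*C,[],1) - kron(C.',A)*B(:))   % b): zero up to roundoff
    P = randn(4,5); Q = rand(4,5) + 1;             % Q has nonzero entries
    norm(diag(Q(:)) \ P(:) - reshape(P./Q,[],1))   % c): zero up to roundoff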
3. Computing enclosures for K_f. For A, X ∈ C^{n×n}, the equation

    F(X) = X^2 − A = 0

can be reformulated, interpreting matrices as vectors in C^{n^2} via x = vec(X), a = vec(A), and using Lemma 2.4, as

    f(x) = (I ⊗ X)x − a = 0.
In the sequel we will use uppercase letters to denote matrices and the corresponding lowercase letters to denote the vector related to the matrix via the vec operator. With the aim of keeping the exposition simple, we will do so implicitly but systematically. From

    F(X + H) = F(X) + XH + HX + H^2

we see that the Fréchet derivative of F, applied to a direction H ∈ C^{n×n}, is given as

    F′(X)H = XH + HX.

By Lemma 2.4, this translates into

    f′(x)h = (I ⊗ X + X^T ⊗ I)h   and   f′(x) = I ⊗ X + X^T ⊗ I.
Consequently, the standard Krawczyk operator (2.4) for this particular function f is given as

    k(x̌, x) = x̌ − R[(I ⊗ X̌)x̌ − a] + (I − R(I ⊗ X + X^T ⊗ I))(x − x̌),

where R ∈ C^{n^2×n^2} is a computed (approximate) inverse of I ⊗ mid X + mid X^T ⊗ I.
Computational experiments to be reported in Section 6 show that with this Krawczyk operator, we can indeed obtain quite tight enclosures for a square root of A. Its severe disadvantage, however, is its computational complexity: The matrix R will not have a nice Kronecker structure, and it will usually be a full matrix. Each of the n^2 columns of (I ⊗ X + X^T ⊗ I) has 2n non-zeros, so computing its product with R requires O(n^3) operations per column, i.e. a total cost of O(n^5), notwithstanding the cost for computing R. If we take the approach to compute an interval matrix containing R(I ⊗ X + X^T ⊗ I) by solving n^2 linear systems with the matrix I ⊗ mid X + mid X^T ⊗ I (and interval right hand sides), we can use the Schur decomposition of X̌ to solve each system with O(n^3) operations, as is done routinely in methods for the Sylvester equation, see [3]. The resulting total cost is again O(n^5). So O(n^5) represents a prohibitively fast growing dominant cost in evaluating k(x̌, x). Note that [21] suggested a method for computing enclosures of the matrix square root with similar complexity, based on an interval Newton method rather than on Krawczyk's method.
We now develop variants of the Krawczyk operator, relying on Theorem 2.3, which have a computational cost of O(n^3). For simplicity, we assume that A is diagonalizable. An extension to the general case will be described in Section 5. So assume that

    A = V ΛW, with V, W, Λ ∈ C^{n×n}, Λ = Diag(λ_1, . . . , λ_n) diagonal, V W = I.    (3.1)

The square roots of A from (3.1) are given as V Λ^{1/2} W, where

    Λ^{1/2} = Diag(λ_1^{1/2}, . . . , λ_n^{1/2}).
So if X is a computed accurate approximate square root of A, we can expect W X W^{-1} as well as V^{-1} X V to be close to the diagonal matrix Λ^{1/2}. From the identity

    f′(x) = I ⊗ X + X^T ⊗ I
          = (V^{-T} ⊗ W^{-1}) · [ I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I ] · (V^T ⊗ W),

we see that an approximate inverse for f′(x) is given in factorized form by the matrix

    R = (V^{-T} ⊗ W^{-1}) · ∆^{-1} · (V^T ⊗ W), where ∆ = I ⊗ Λ^{1/2} + Λ^{1/2} ⊗ I.    (3.2)
Now assume that V, W and Λ are numerically computed quantities obtained by using a standard method to get the decomposition (3.1), such as MATLAB's eig function. Note that we assume V, the matrix of right eigenvectors, to be a computed quantity, as well as W, the matrix of left eigenvectors. This implies that we do not assume that W V = I holds exactly. Similarly, the computed diagonal matrix Λ will generally not have the exact eigenvalues on its diagonal. Note also that we could as well obtain V, W from an eigendecomposition of X, but this might require complex arithmetic while it is not required for the eigendecomposition of A. Let D denote the matrix for which

    ∆ = Diag(D), D = [d_1 | . . . | d_n] ∈ C^{n×n}.    (3.3)
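In MATLAB notation, D and ∆ from (3.2) and (3.3) are easily formed from the vector lam of computed eigenvalues; the following point-arithmetic sketch assumes n and lam are given.

    % D(i,j) = sqrt(lam(i)) + sqrt(lam(j)), so that Delta = Diag(vec(D)).
    s = sqrt(lam(:));
    D = s*ones(1,n) + ones(n,1)*s.';
    % Delta is never formed explicitly: entrywise division by D realizes
    % multiplication by Delta^(-1), cf. Lemma 2.4 c), e.g. N = Q ./ D.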
For any matrix X ∈ C^{n×n} and any vector z ∈ C^{n^2} we have

    (I − R(I ⊗ X + X^T ⊗ I)) z
      = (V^{-T} ⊗ W^{-1}) ∆^{-1} ( ∆ − I ⊗ (W X W^{-1}) − (V^{-1} X V)^T ⊗ I ) (V^T ⊗ W) z.
The latter expression is rich in Kronecker products, so that due to Lemma 2.4 b) we can efficiently compute u = (I − R(I ⊗ X + X^T ⊗ I))z as described in Algorithm 1. Lines 4-6 compute the matrix Q with vec(Q) = (∆ − I ⊗ (W X W^{-1}) − (V^{-1} X V)^T ⊗ I) y in a way which at first glance may look artificial. The fact that we gather all parts which belong to a "diagonal block", i.e. the n × n matrix (Diag(d_i) − S_ii I − T), before multiplying with Y_i, will be crucial when turning to the interval arithmetic counterpart: Without building the entire diagonal blocks first, we would obtain wider intervals due to the subdistributive law, and the computed result would no longer reflect the fact that I − Rf′(x) is close to the zero matrix.

Algorithm 1 Efficient computation of u = (I − Rf′(x))z
 1: Compute Y = W Z V             {the j-th column of Y will be denoted Y_j}
 2: Compute S = V^{-1} X V        {S is an n × n matrix with entries S_ij}
 3: Compute T = W X W^{-1}
 4: for i = 1, . . . , n do        {compute the columns Q_i of the matrix Q}
 5:   Compute Q_i = − Σ_{j=1, j≠i}^{n} S_ji Y_j + (Diag(d_i) − S_ii I − T) Y_i    {see (3.3)}
 6: end for
 7: Compute N = Q ·/ D
 8: Compute U = W^{-1} N V^{-1}
Note also that machine interval arithmetic as implemented in INTLAB or C-XSC [19] is particularly efficient if the level 3 BLAS are used as much as possible. Therefore, we will actually use an alternative way to compute Q from lines 4-6: Define S_0 = S − Diag(S_11, . . . , S_nn), i.e. the diagonal entries of S are replaced by zeros in S_0. Then, using Lemma 2.4 b), we see that

    Q = −Y S_0 + [c_1 | . . . | c_n], where c_i = (Diag(d_i) − S_ii I − T) Y_i, i = 1, . . . , n.    (3.4)

The following lemma analyzes the cost of Algorithm 1.
Lemma 3.1. Algorithm 1 requires O(n^3) arithmetic operations, independently of whether we use lines 4-6 or (3.4) instead.
Proof. One sees immediately that the computations of Y, S, T, N and U have cost O(n^3). In lines 4-6, the cost for each i is O(n^2), since we have n − 1 scalings of n-vectors Y_j, one matrix-vector multiplication and the addition of n vectors. In total, the for-loop thus also has cost O(n^3). If we replace it by (3.4), the cost is one matrix-matrix multiplication and n matrix-vector multiplications, which is again O(n^3).
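A point-arithmetic MATLAB transcription of Algorithm 1, using (3.4) for lines 4-6, might read as follows; it uses no directed rounding and is therefore for illustration only. The inputs V, W, lam are the computed spectral data from (3.1).

    function U = apply_I_minus_Rfprime(X, Z, V, W, lam)
    % U = unvec((I - R f'(x)) vec(Z)) with R from (3.2), computed as in Algorithm 1.
    n  = size(X,1);
    s  = sqrt(lam(:));
    D  = s*ones(1,n) + ones(n,1)*s.';        % cf. (3.3)
    Y  = W*Z*V;                              % line 1
    S  = V\(X*V);                            % line 2: S = V^(-1) X V
    T  = W*X/W;                              % line 3: T = W X W^(-1)
    S0 = S - diag(diag(S));
    C  = zeros(n);
    for i = 1:n                              % lines 4-6 via (3.4)
        C(:,i) = (diag(D(:,i)) - S(i,i)*eye(n) - T) * Y(:,i);
    end
    Q  = -Y*S0 + C;
    N  = Q ./ D;                             % line 7
    U  = W\(N/V);                            % line 8: U = W^(-1) N V^(-1)
    end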
Algorithm 1 paves the way for an efficient computation of an interval vector containing the set K_f(x̌, R, z, A) with A = f′(x̌ + z) = I ⊗ (X̌ + Z) + (X̌ + Z)^T ⊗ I from (2.6). Basically, due to the inclusion property of interval arithmetic, we just have to replace the point quantities X and Z in Algorithm 1 by X̌ + Z and Z to obtain an interval vector u = vec(U) containing

    K_f(x̌, R, z, f′(x̌ + z)) + Rf(x̌) = {(I − RA)z : A ∈ f′(x̌ + z), z ∈ z}.

There is one particularity, however: V^{-1} and W^{-1} will usually not be available as exact inverses of the computed matrices V and W. We therefore assume that we have precomputed interval matrices I_V, I_W which are known to contain V^{-1} and W^{-1}, resp. Incorporating the computation of −Rf(x̌), we obtain Algorithm 2 which, when implemented in machine interval arithmetic, will output an interval vector k = vec(K) containing the set K_f(x̌, R, z, f′(x̌ + z)). Note that at several places (starting at line 1), we do not indicate in which order the various multiplications of interval matrices have to be performed. The inclusion property of interval arithmetic will guarantee k ⊇ K_f(x̌, R, z, f′(x̌ + z)) for whatever order we choose.
In a manner completely analogous to the proof of Lemma 3.1 one gets
Lemma 3.2. Algorithm 2 requires O(n^3) arithmetic operations.
Algorithm 2 Computation of an interval matrix K such that vec(K) contains K_f from (2.6)
 1: Compute Y = W Z V             {the j-th column of Y will be denoted Y_j}
 2: Compute S = I_V (Z + X̌) V     {S is an n × n interval matrix with entries S_ij}
 3: Compute T = W (Z + X̌) I_W
 4: for i = 1, . . . , n do        {we use (3.4)}
 5:   Compute c_i = (Diag(d_i) − S_ii I − T) Y_i
 6: end for
 7: Compute Q = −Y S_0 + [c_1 | . . . | c_n]
 8: Compute N = Q ·/ D
 9: Compute U = I_W N I_V
10: {lines 11-14 evaluate −Rf(x̌)}
11: Compute F = X̌ · X̌ − A         {F is an interval matrix due to outward rounding}
12: Compute G = W F V
13: Compute H = G ·/ D
14: Compute L = −I_W H I_V
15: Compute K = L + U

The wrapping effect, i.e. the increase of diameters due to the multiplication of interval matrices, will be quite noticeable in Algorithm 2. For example, when we evaluate −Rf(x̌) in lines 11-14, the computation of G and the computation of L
produce two wrapping effects (on n × n matrices), each. Similarly, the computations
of Y , S, T , Q and N also produce wrapping effects. The wrapping effect will be
more pronounced if the dimension n is large and if the matrices are ill conditioned.
Algorithm 2 will then compute an interval matrix K which will be substantially larger than the set K_f(x̌, R, z, f′(x̌ + z)), and this might then prevent us from computationally verifying the crucial condition (2.7).
We therefore now develop a second approach which suffers from fewer wrappings.
We expect it to work for larger and less well conditioned matrices. As we will see later,
it will, on the other hand, yield a slightly weaker result with respect to uniqueness.
We start again from the spectral decomposition (3.1),

    A = V ΛW,

and we assume again that we have computed a quite accurate approximate square root X, so that W X W^{-1} and V^{-1} X V are close to Λ^{1/2} from (3.2). The new idea is to transform f affinely such that the derivative at X is close to ∆ from (3.2), i.e. we put

    f̂(x̂) = (V^T ⊗ W) · f((V^{-T} ⊗ W^{-1}) x̂).

Clearly, for any x̂,

    f̂′(x̂) = (V^T ⊗ W) f′((V^{-T} ⊗ W^{-1}) x̂) · (V^{-T} ⊗ W^{-1})
           = (V^T ⊗ W) · (I ⊗ X + X^T ⊗ I) · (V^{-T} ⊗ W^{-1}), where X = W^{-1} X̂ V^{-1},
           = I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I.

Consequently, if X is close to a square root of A, then f̂′(x̂) ≈ ∆, so that we can use R̂ = ∆^{-1} as an approximate inverse in a Krawczyk type approach, now for the function f̂. If this approach yields an interval vector x̂ known to contain exactly one zero of f̂, then by transforming back we see that the parallelepiped

    Π = {(V^{-T} ⊗ W^{-1}) x̂ : x̂ ∈ x̂}
contains exactly one zero of f. Using (machine) interval arithmetic, we get an interval matrix X containing Π as X = W^{-1} X̂ V^{-1}. Note that we cannot assert that the computed X contains exactly one zero of f; we just know that it contains an isolated zero. This is a qualitative difference compared to the first approach. It represents the price to pay for obtaining a method which will be applicable to larger matrices and which will usually yield narrower enclosures.
Let us mention the following subtlety: We will usually have computed an approximate square root X̌ of A. With X̌^ = W X̌ V we see that

    f̂(x̌^) = (V^T ⊗ W) f(x̌) = vec(W(X̌^2 − A)V).

This relation will allow us to avoid computing any transformation from x̌ to x̌^, see Algorithm 5 below.
So the point is now to compute an interval matrix K̂ such that vec(K̂) contains the set

    K̂ = K_f̂(x̌^, ∆^{-1}, ẑ, Â) = −∆^{-1}(V^T ⊗ W) f(x̌) + {(I − ∆^{-1} Â) ẑ : ẑ ∈ ẑ, Â ∈ Â},    (3.5)

where Â = I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I, X = W^{-1} Ẑ V^{-1} + X̌.
Algorithm 3 does so in analogy to Algorithm 2.
Algorithm 3 Computation of an interval matrix K̂ such that vec(K̂) contains K̂ from (3.5)
 1: Compute Z = I_W Ẑ I_V
 2: Compute S = I_V (Z + X̌) V     {S is an n × n interval matrix with entries S_ij}
 3: Compute T = W (Z + X̌) I_W
 4: for i = 1, . . . , n do
 5:   Compute c_i = (Diag(d_i) − S_ii I − T) Ẑ_i
 6: end for
 7: Compute Q̂ = −Ẑ S_0 + [c_1 | . . . | c_n]
 8: Compute N̂ = Q̂ ·/ D
 9: {lines 10-12 evaluate −∆^{-1}(V^T ⊗ W) f(x̌)}
10: Compute F = X̌ · X̌ − A         {F is an interval matrix due to outward rounding}
11: Compute F̂ = W F V
12: Compute Ĥ = −F̂ ·/ D
13: Compute K̂ = Ĥ + N̂
The computation of Ĥ in lines 10-12 now produces only two wrappings as opposed
to four in the corresponding part of Algorithm 2, and similar savings in wrappings
also arise in the other parts of the algorithm.
The following result is immediate.
Lemma 3.3. Algorithm 3 requires O(n^3) arithmetic operations. It also requires O(n^3) fewer operations than Algorithm 2.
4. Krawczyk type verification methods. Algorithm 2 presented in Section 3 computes an interval vector k = vec(K) containing the set

    K_f = −Rf(x̌) + {(I − RA)z : A ∈ f′(z + x̌), z ∈ z},

with R from (3.2). By Theorem 2.3, if 0 ∈ z and

    k ⊆ int z,    (4.1)

then x̌ + k contains a solution of (1.1) which is unique in x̌ + z. We then even know that this solution of (1.1) is a primary square root of A, since otherwise it could not be isolated by Lemma 1.1.
The question now arises of how to choose the interval vector z such that (4.1) is likely to take place. One of the most successful approaches is to use the so-called ε-inflation, see [29, 30]. It starts from the observation that it is natural to assume 0 ∈ K, so −Rf(x̌) has to be contained in z. Since 0 ∈ z is mandatory, we start with an interval vector z which is slightly larger than the interval hull of −Rf(x̌) and 0. To be precise, we say that we ε-inflate a given interval vector z if we apply the following manipulations: We first increase the radius of each component by 10% plus δ, while keeping the midpoint. Here, δ is the smallest positive number of the floating point screen, i.e. δ = 2^{-1023} in IEEE double precision. We then form the interval hull of this vector and the zero vector. If the condition (4.1) is not satisfied with the ε-inflation z of −Rf(x̌), we perform a second test using the intersection z ∩ k. If this still fails, we start over with the ε-inflation of a new vector. The details are given in Algorithm 4 below, which uses exactly the same inflation mechanism as verifynlss.m from INTLAB, a routine for computing enclosures for zeros of general nonlinear systems based on the standard Krawczyk operator and the results from [29, 30].
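In INTLAB terms, one ε-inflation step as just described can be written as follows; hull, midrad, mid and rad are INTLAB functions, and the constants are those given in the text.

    % epsilon-inflation of an interval quantity Z: radius *1.1 + delta, then hull with 0.
    delta = 2^(-1023);
    Z = hull(midrad(mid(Z), 1.1*rad(Z) + delta), 0);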
Algorithm 4 If successful, this algorithm obtains an interval matrix X containing exactly one matrix X with X^2 − A = 0
 1: Use a floating point algorithm to get an approximate square root X̌ of A.
 2: Use a floating point algorithm to get approximations for V, W, Λ in the spectral decomposition (3.1). These approximations will again be denoted V, W, Λ.
 3: {The remaining computations will be performed using machine interval arithmetic}
 4: Compute interval matrices I_V, I_W containing V^{-1} and W^{-1}, resp.   {take verifylss.m from INTLAB, e.g.}
 5: Compute L, an interval matrix containing −Rf(x̌), as in lines 11-14 of Algorithm 2.
 6: Z = L
 7: for k = 1, . . . , kmax do
 8:   ε-inflate Z
 9:   compute U for input X̌, Z as in lines 1-9 of Algorithm 2
10:   if K := L + U ⊆ int Z then {successful}
11:     output X = X̌ + K and stop
12:   else {second try}
13:     put Z^(2) = Z ∩ K
14:     compute U^(2) for input X̌, Z^(2) as in lines 1-9 of Algorithm 2
15:     if K^(2) := L + U^(2) ⊆ int Z^(2) then {successful}
16:       output X = X̌ + K^(2) and stop
17:     else
18:       overwrite Z as intersect(Z, K^(2))
19:     end if
20:   end if
21: end for
In the alternative approach, where we use f̂ instead of f to avoid wrappings, we basically proceed in the same manner, applying the back transformation at the very end. This is summarized in Algorithm 5, in which we do not repeat the steps for the computation of X̌, V, W, Λ, I_V, I_W.
Algorithm 5 If successful, this algorithm obtains an interval matrix X containing a matrix X with X^2 − A = 0
 1: Compute Ĥ, an interval matrix containing −∆^{-1}(V^T ⊗ W) f(x̌), as in lines 10-12 of Algorithm 3.
 2: Ẑ = Ĥ
 3: for k = 1, . . . , kmax do
 4:   ε-inflate Ẑ
 5:   compute N̂ for input X̌, Ẑ as in lines 1-8 of Algorithm 3
 6:   if K̂ := Ĥ + N̂ ⊆ int Ẑ then {successful}
 7:     output X = X̌ + (I_W K̂ I_V) and stop   {back transformation}
 8:   else
 9:     put Ẑ^(2) = Ẑ ∩ K̂
10:     compute N̂^(2) for input X̌, Ẑ^(2) as in lines 1-8 of Algorithm 3
11:     if K̂^(2) := Ĥ + N̂^(2) ⊆ int Ẑ^(2) then {successful}
12:       output X = X̌ + (I_W K̂^(2) I_V) and stop   {back transformation, again}
13:     else
14:       overwrite Ẑ as intersect(Ẑ, K̂^(2))
15:     end if
16:   end if
17: end for
Before reporting our numerical experiments, let us briefly comment on a possible modification of the methods presented. A careful inspection of all algorithms developed so far shows that they remain valid (they still yield enclosures for the sets K and K̂) if we replace W by I_V and I_W by V everywhere. In doing so, we avoid the computation of W and I_W, but the computed enclosures for −Rf(x̌) and −∆^{-1}(V^T ⊗ W)f(x̌) will have the tendency to become larger since they involve more interval quantities. The ε-inflation mechanisms of Algorithms 4 and 5, if successful, will provide enclosures which are the narrower, the smaller the computed enclosures for −Rf(x̌) and −∆^{-1}(V^T ⊗ W)f(x̌), respectively. So we expect the quality of the enclosures to be (slightly) worse if we work with the modifications which do not use W and I_W.
5. Extensions. In this section we briefly describe two possible extensions of Algorithms 4 and 5 which increase their accuracy or their range of applicability.
The first extension results from the observation that the radii of the "initial interval vectors" Z and Ẑ obtained via ε-inflation in Algorithms 4 and 5, resp., crucially depend on the modulus of Rf(x̌). Herein, we have to evaluate f(x̌) = vec(X̌^2 − A) using machine interval arithmetic to make sure that the computed value contains the exact value. This means that we use outward rounding in each step of the sequence of operations used to obtain X̌^2 − A which, for each entry, basically represents an inner product between a row and a column of X̌. Since cancellation has to take place (after all, x̌ is an approximate zero), we will have the situation that the computed interval matrix X̌^2 − A will consist of entries which have relatively large widths. For example, the exact value of an entry could be b = 2 · 10^-15, but the computed enclosing interval b would be something like [−10^-12, +10^-12]. In a programming language providing a scalar product with maximum accuracy, the same entry of X̌^2 − A would be obtained as an interval whose two bounds are consecutive floating point numbers enclosing b. Clearly, the modulus of this interval (which is very close to |b|) can be orders of magnitude smaller than that of the wide interval b above.
For reasons of computational speed, INTLAB does not provide an exact scalar product. It provides, however, a fairly efficient mechanism which allows one to evaluate inner products and to compute an enclosing interval as if computed in k-fold precision. This mechanism, described in detail in [27], is based on error-free transformations and implemented as the INTLAB function dot_.m. In this context it is advisable to consider the final enclosure to be given as the pair (X̌, X) with A^{1/2} ∈ X̌ + X. The explicit computation of X̌ + X would always result in an interval matrix with double precision bounds, whereas keeping the pair (X̌, X) allows one to obtain radii less than machine epsilon in X although the entries in X̌ are of order 1.
The second extension¹ considers the case where A is not diagonalizable or, as shows up in computational practice, where the eigenvector matrix V is ill conditioned. Due to the wrapping effect, the radii of the entries of the various interval matrices to be computed in our algorithms will then tend to become very large, and Algorithms 4 and 5 will fail because the conditions L + U ⊆ int Z and Ĥ + N̂ ⊆ int Ẑ, resp., will not hold.
In such a situation we consider, instead of (3.1), a more stable block diagonalization of X̌ given as

    X̌ = V B W, where V, W, B ∈ C^{n×n}, B block diagonal, V W = I.    (5.1)
Herein,

    B = diag(B_1, B_2, . . . , B_m)

is block diagonal with square, upper triangular blocks B_i ∈ C^{n_i×n_i}, i = 1, . . . , m.
The point is that we can adapt the block sizes in such a way that we have control over the condition of V. An algorithm providing such a block factorization is due to Bavely and Stewart [4]; it is available as the function bdschur.m in the MATLAB Control Toolbox. An upper bound for the condition number can be given as an input to the algorithm, which will then adapt the number of blocks m and their sizes n_i accordingly.
Going through the derivation of Algorithms 2 and 3 we see that they remain valid (i.e. they still compute interval matrices K and K̂ such that vec(K) ⊇ K_f from (2.6) and vec(K̂) ⊇ K̂ from (3.5)) if we replace ∆ by the matrix

    I ⊗ B + B^T ⊗ I.

This includes reformulating the pointwise divisions with the matrix D for which ∆ = Diag(D) in the appropriate manner. Indeed, instead of the generic computation

    y = ∆^{-1} x,

we now have to compute an interval vector y containing all solutions y of the equations

    (I ⊗ B + B^T ⊗ I) y = x,  x ∈ x.    (5.2)

¹ We are grateful to Arnold Neumaier who brought this idea to our attention.
Note that I ⊗ B + B^T ⊗ I is block diagonal with m diagonal blocks of size n_i · n. For example, the first such diagonal block B_1 is given as

    B_1 = [ B + b_11 I      0            · · ·          0          ]
          [ b_12 I          B + b_22 I   · · ·          ⋮          ]
          [   ⋮               ⋱            ⋱            0          ]
          [ b_1n_1 I        b_2n_1 I     · · ·    B + b_n_1 n_1 I  ].
The part of (5.2) corresponding to this block can be dealt with via a back substitution process giving an interval vector y_1 containing the first block of all solutions of (5.2) for x ∈ x. More precisely, we put the whole interval vector x into the right hand side and use the fact that the diagonal blocks of B_1 are upper triangular and that the whole matrix B_1 is block lower triangular. The computational cost of the resulting overall process giving all blocks of y will depend on the sizes n_i of the blocks. If the size of the blocks is bounded by a constant, it will still be O(n^3), but if we have blocks with large sizes, the complexity may be perceptibly higher.
6. Numerical experiments. In this section we test and compare Algorithms 4 and 5. The approximate square roots X̌ are approximate principal square roots obtained using Higham's numerically stable Schur method in floating point, available as the MATLAB m-file rootm.m [13]. Whenever an expression in the algorithms is not specified exactly due to missing brackets, we evaluate 'from right to left', i.e. we compute Y = W(ZV) in the first line of Algorithm 2, etc.
We also present results using a verification routine called vermatfun from Rohn's VERSOFT [28], a collection of INTLAB programs. For a function f and a square, diagonalizable matrix A, this routine first tries to compute enclosing intervals for all quantities in the spectral decomposition A = V ΛV^{-1}. If this is successful, an enclosure for f(A) is obtained via an interval counterpart to the definition f(A) = V f(Λ) V^{-1}.
In all computations we use a PC with a 2.00 GHz Pentium 4 CPU and 1 GB of RAM. We tested our algorithms on the set of matrices listed in Table A.1 in the appendix, where further details on the matrices and their properties are given.
6.1. Results for the standard versions of the algorithms. Figure 6.1 as well as Table A.1 in the appendix present a comparison of the results obtained by Algorithms 4 and 5 and VERSOFT. They report the wall clock time representing the total computational cost, i.e. the time for getting the approximation via rootm.m and for the verification. The time spent solely for rootm.m is also given. In order to appreciate the quality of the enclosures obtained, we display two quantities. As a measure for the absolute quality of the enclosures obtained we report the maximum radius mr of the entries of the enclosing interval matrix X, i.e.

    mr = max_{i,j=1,...,n} rad(X_ij).

The corresponding relative quantity is denoted mrr and defined as

    mrr = max_{i,j=1,...,n} rad(X_ij) / |X_ij|.
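Given an enclosure X computed with INTLAB, both quantities can be read off directly using only mid and rad, with |X_ij| evaluated as |mid| + rad in line with Section 2:

    % Maximum absolute and relative radii of the enclosing interval matrix X.
    mr  = max(max(rad(X)));
    mrr = max(max(rad(X) ./ (abs(mid(X)) + rad(X))));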
[Figure 6.1 here: mr and mrr for each test matrix (left); total time in seconds and time for rootm.m in seconds versus dimension (right), for Algorithm 4, Algorithm 5 and VERSOFT.]
Fig. 6.1. Maximum radius of enclosure (left) and time (right) for different algorithms
We may call − log10 mrr the number of correct significant decimal digits, since it
roughly corresponds to the number of digits to which the upper and the lower bounds
coincide, i.e. the number of significant digits we know to be correct for every entry.
The quantity mr, on the other hand, is an absolute quantity. It represents the number
of correct digits for each entry including “leading zeros”. In situations where the sizes
of the entries of X vary substantially, mrr can be significantly larger than mr. Finally,
k stands for the number of iterations executed in Algorithms 4 and 5, where we put kmax = 5. Any execution time longer than one hour was rated "long" and the computation was aborted.
The left part of Figure 6.1 plots mr and mrr for each of the 26 matrices numbered
as they appear in Table A.1; the right part plots the execution time with the horizontal
axis giving the dimensions of the matrices. The plots from Figure 6.1 are detailed
in Table A.1 in the appendix, revealing the following: Whenever it is successful,
Algorithm 4 is quite comparable to Algorithm 5 with respect to execution time as
well as with respect to the quality of the enclosure. However, there are cases where
Algorithm 4 is not successful, and this comprises cases with small dimensions as well
as cases with large dimensions. Algorithm 5 never failed, whereas VERSOFT seems unable to succeed for matrices with multiple eigenvalues. The quality of the enclosures obtained via VERSOFT, as measured via mr and mrr, is sometimes comparable to that from Algorithms 4 and 5, but quite often the proposed algorithms are one or two orders of magnitude more accurate, an extreme case being the matrix e05r0000. Considering the execution time, we see that the new algorithms are faster by a factor of 8 to 30 as compared to VERSOFT. In the lower right corner of Figure 6.1 we report the wall clock time for the execution of rootm.m, i.e. for the floating point method which produces the approximate square root. We see that the complete Algorithms 4 and 5 scale similarly with the dimension as does rootm.m.

[Figure 6.2 here: three histograms.]
Fig. 6.2. Histograms for correct digits. Left: number of correct digits in the floating point approximation. Middle: number of correct digits in computed enclosing intervals. Right: additional correct digits obtained through enclosing intervals
For many matrices, especially the symmetric ones, Algorithms 4 and 5 spend
about one fourth of their time for rootm.m, the remaining 75% representing the
overhead to pay for the verification part. In the symmetric indefinite and the nonsymmetric cases, the matrix square root A^{1/2} is likely to be complex even though A is real. This means that all INTLAB computations are (automatically) done in complex circular arithmetic, which is more costly than the real circular arithmetic that INTLAB uses when all quantities are real. For this reason, the verification part now becomes
more costly relative to rootm.m, which tries to avoid complex arithmetic as long as
possible, particularly through the use of MATLAB’s Schur decomposition schur.m.
Indeed, the verification part may now represent up to 85% of the total cost, i.e. up
to 7 times the cost for rootm.m.
6.2. Accuracy of the floating point approximation. Once we have computed an enclosure X via Algorithm 4 or 5 we can determine the accuracy of the floating point approximation X̌ obtained via rootm.m. Indeed, if for some entry X̌_ij we have X̌_ij ∉ X_ij, we can compute the (relative) distance

    d_ij = ( |X̌_ij − mid(X_ij)| − rad(X_ij) ) / |X̌_ij|,

so that ⌊−log10 d_ij⌋ represents the number of correct decimal digits in X̌_ij, whereas ⌊−log10 mrr⌋ represents the number of correct decimal digits known from the enclosing interval.
As an example, Figure 6.2 reports our findings for the matrix H, the 50 × 50 matrix A from [14], i.e.

    a_ij = 0.1 for i ≠ j,  a_ii = i^2,  i, j = 1, . . . , 50.
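For concreteness, here is a sketch of this experiment; MATLAB's sqrtm stands in for rootm.m, and X denotes an enclosure assumed to have been computed beforehand with Algorithm 4 or 5.

    % The 50x50 test matrix: a_ii = i^2, a_ij = 0.1 for i ~= j.
    n  = 50;
    A  = 0.1*ones(n) + diag((1:n).^2 - 0.1);
    Xc = sqrtm(A);                              % floating point approximation
    % ... compute an interval enclosure X with Algorithm 4 or 5, then:
    d         = (abs(Xc - mid(X)) - rad(X)) ./ abs(Xc);
    outside   = d > 0;                          % entries with Xc(i,j) not in X(i,j)
    digits_fp = floor(-log10(d(outside)));      % exact digit counts, cf. Figure 6.2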
The floating point approximation X̌ obtained via rootm.m had 2.5% (out of 2500) entries for which X̌_ij ∈ X_ij. For these entries, X_ij cannot be used to determine the exact number of correct digits in X̌_ij. For the remaining ones, the leftmost histogram in Figure 6.2 shows the distribution of the exact number of correct digits. The number of correct digits varies between 11 and 16. For example, 107 entries of X̌ have (exactly) 14 correct digits. In a similar manner, the middle plot of Figure 6.2 shows the (known) correct digits obtained from the interval entries in X. One sees that we always have at least 13 correct digits. Finally, the rightmost histogram in Figure 6.2 shows for how many entries X_ij yields 1, 2, 3 or 4 more correct digits than X̌_ij. These statistics show quite drastically that our interval arithmetic based enclosure methods not only give results with a computed guaranteed bound for the error, but that they also obtain higher accuracy than the "pure" floating point methods.

[Figure 6.3 here: mr and mrr for each test matrix (left); total time in seconds and time for the residual in seconds versus dimension, comparing double and simulated quadruple precision (right).]
Fig. 6.3. Maximum radius of enclosure (left) and time (right) using simulated quadruple precision for the residual
6.3. Higher precision for the residuals. We now turn to results obtained with the first extension of Algorithms 4 and 5 described in Section 5. To be specific, we compute an enclosure for the residual X̌^2 − A with simulated quadruple precision, i.e. we use INTLAB's dot_.m with k = 2. Figure 6.3 reports the results obtained using this improvement; details are in Table A.2 of the appendix. Although all other computations are done in standard double precision, we now very often arrive at relative accuracies which are better than 10^-16. Algorithm 4 fails in two cases only; for all other matrices it is successful in its first iteration, see Table A.2. Algorithm 5 is always successful; in three cases it needs more than one iteration. Comparing the timings, we see that the high precision computation of an enclosure for X̌^2 − A via dot_.m now makes up the major part of the total time. For this reason, we do not report results for the largest matrices of our test set.
Matrix    n  | Standard Krawczyk: time  k  mr / mrr      | With simulated quadruple prec. residual: time  k  mr / mrr
frank     8  | 0.1   1  1.3·10^-10 / 7.7·10^-12          | 0.1   1  1.2·10^-22 / 6.9·10^-24
frank     9  | 0.1   1  5.6·10^-9  / 6.5·10^-11          | 0.1   1  6.2·10^-20 / 7.4·10^-22
frank    10  | 0.2   1  3.6·10^-7  / 6.7·10^-10          | 0.1   1  5.1·10^-18 / 9.9·10^-21
gcdmat   25  | 1.9   1  6.8·10^-15 / 4.6·10^-14          | 2.1   1  2.1·10^-28 / 3.2·10^-27
gcdmat   50  | 89.2  1  2.1·10^-14 / 3.6·10^-13          | 98.4  1  2.4·10^-27 / 1.0·10^-25
Table 6.1. Standard Krawczyk
6.4. Block diagonalization. Finally, we briefly turn to the second extension discussed in Section 5, where we proposed to use a block diagonalization of X̌ in cases where the eigenvector matrix of X̌ is badly conditioned or where A is not diagonalizable. Our example is the (real) gear matrix of dimension 50 from MATLAB's gallery, to which we added a multiple of the identity such that the smallest eigenvalue becomes 1 and, consequently, its principal square root is real. This matrix is defective, having exactly one Jordan block of size 2. Numerically, an eigendecomposition can be computed, but the computed eigenvector matrix has condition number 2.7 · 10^15. Algorithms 4 and 5, whether in their standard form or using simulated quadruple precision computation for the residuals, all failed for this example, and so did VERSOFT. However, bdschur.m allows one to compute a block diagonal decomposition of the form (5.1) where B is diagonal except for one 2 × 2 diagonal block and where V and W are nicely conditioned. Note that, unfortunately, bdschur.m does not allow for complex input, which explains why we shifted the original gear matrix as described above in order to guarantee a real square root.
The modification of Algorithm 4 using this block decomposition succeeds in computing an enclosure for the square root, and it obtains mr = 1.7 · 10^-12 and mrr = 1.0. The time required was 4.8 seconds.
Note that for this small-dimensional example we might as well use the standard Krawczyk method, which we can view as a method relying on a block diagonalization with just one diagonal block. Standard Krawczyk also succeeds for gear and obtains sharper enclosures with mr = 3.9 · 10^-15 and mrr = 1.0. However, the time required by our implementation now is 160 seconds.
We take the opportunity to report some further results for the standard Krawczyk method for selected matrices in Table 6.1. The bottom line is that standard Krawczyk is not sensitive to whether A is diagonalizable or not, nor is it sensitive to the condition number of the eigenvector matrix. Its big disadvantage, which prevents the method from being useful for matrices of dimension 100 or higher, is its computational complexity, as can be seen from the timings reported.
7. Conclusions. We presented two variants of Krawczyk's method which, if successful, obtain interval enclosures for the square root of a matrix A. Both methods require an accurate floating point approximation of the square root. If the eigenvector matrix of A is well conditioned, both algorithms are likely to be successful, even for quite large matrices. While both methods presented achieve comparable accuracy, the range of application of the second algorithm is larger. It suffers less from the wrapping effect, so that it is easier to computationally verify that the hypothesis of Brouwer's fixed point theorem, the mathematical basis of Krawczyk's method, holds. We outlined extensions of these methods, including high precision residual calculations and variants which use a block diagonalization in cases where the eigenvector matrix is ill conditioned. In terms of computational cost, the overhead to be paid for the verification part is 3 to 7 times the cost for obtaining the floating point approximation. This cost is compensated for by the fact that we obtain results with guaranteed accuracy. We showed by an example that this accuracy can actually amount to one to four more correct decimal digits than in the floating point approximation. Since the complexity of our algorithms is O(n^3), we can treat relatively large matrices in a reasonable amount of time. The largest example reported here is the poisson matrix of dimension 1,600. The 1,600^2 = 2.56 million entries of its square root are obtained by computing an enclosure for the solution of a nonlinear system with that many entries.
Acknowledgement. We are grateful to Nick Higham for his helpful comments
on a first draft of this paper.
Appendix A. Details of computations.
This appendix contains tables reporting the results of our numerical experiments.
We used 26 matrices in total, listed in the first column of Table A.1. The matrices
frank, poisson, fiedler, minij, circul, parter, riemann and gcdmat are from
MATLAB’s gallery, matrices tolosa, bwm200, fidap001 and e05r0000 are from the
Matrix Market [26], and matrices dwt 87, bcspwr03, bcsstk22 and bcsstk08 are
from the Harwell-Boeing collection [9]. The 350 × 350 positive definite matrix H with h_ii = i^2 and h_ij = 0.1, i ≠ j, was taken from [14]. Note that the frank matrices are
low dimensional matrices with increasingly ill-conditioned eigenvalues.
Table A.1 gives all details for Algorithms 4 and 5 as well as for vermatfun from VERSOFT. The plots corresponding to Table A.1 are given in Figure 6.1; the meaning of mrr and mr is explained in Section 6. For each matrix, its dimension n and characteristic property (symmetric, unsymmetric etc.) are reported in the first column. The matrices are sorted by increasing dimension. The column k for Algorithm 4 reports the counter for the iteration of Algorithm 4 at which the algorithm succeeded, i.e. when we obtained

    K ⊆ int Z or K^(2) ⊆ int Z^(2).    (A.1)

We set kmax = 5, i.e. a maximum of 5 iterations was performed. Only in one case, namely for the smallest of the frank matrices, did it pay off to try more than one iteration. For 7 matrices, Algorithm 4 failed because after kmax iterations (A.1) was not yet satisfied. In these cases, the algorithm requires noticeably more computational work. Algorithm 5 always succeeded in the first step, i.e. for k = 1, so that we do not report this value in Table A.1.
Table A.2 gives the details for Algorithms 4 and 5 using simulated quadruple
precision to compute enclosures for the residuals. The time spent in the evaluation of
the residuals is also reported.
Table A.1
Details of computations for the standard algorithms; all times are in seconds and k = 1 for all
matrices in Alg. 5. NaN means that the algorithm was not successful.

Matrix (property, n)          | Algorithm 4: time (trootm)  k  mr / mrr  | Algorithm 5: time  mr / mrr  | VERSOFT: time  mr / mrr
frank (unsym., 8)             | 0.1 (0.0)     2   5.7e-8 / 3.7e-9        | 0.0     5.7e-8 / 3.7e-9      | 0.2     2.1e-8 / 7.8e-9
frank (unsym., 9)             | 0.2 (0.0)     5   NaN / NaN              | 0.0     5.5e-6 / 6.6e-8      | 0.3     2.7e-6 / 1.1e-6
frank (unsym., 10)            | 0.3 (0.0)     5   NaN / NaN              | 0.0     7.9e-4 / 1.5e-6      | 0.3     3.0e-4 / 9.4e-5
gcdmat (spd, 50)              | 0.2 (0.1)     1   4.7e-13 / 5.9e-11      | 0.2     4.7e-13 / 5.9e-11    | 3.6     1.2e-11 / 1.5e-9
dwt 87 (sym. indef., 87)      | 0.8 (0.2)     1   6.5e-13 / 1.1e-8       | 0.8     6.5e-13 / 1.1e-8     | 9.4     1.6e-3 / 5.7e-1
tolosa (unsym., 90)           | 3.6 (0.2)     5   NaN / NaN              | 1.1     8.2e-8 / 2.0e-7      | 11.4    1.2e-9 / 2.1e-9
poisson (spd, 100)            | 0.7 (0.3)     1   5.0e-13 / 1.8e-7       | 0.6     5.0e-13 / 1.8e-7     | 24.6    NaN / NaN
bcspwr03 (sym. indef., 118)   | 1.5 (0.3)     1   7.4e-13 / 6.4e-8       | 1.5     7.4e-13 / 6.4e-8     | 20.1    NaN / NaN
fiedler (sym. indef., 130)    | 1.9 (0.5)     1   3.4e-9 / 8.7e-9        | 1.9     1.9e-9 / 4.7e-9      | 23.6    5.4e-8 / 9.5e-8
bcsstk22 (sym. indef., 138)   | 1.2 (0.3)     1   1.1e-10 / 1.0          | 1.2     1.1e-10 / 1.0        | 25.5    1.1e-7 / 1.0
minij (spd, 140)              | 3.3 (0.4)     5   NaN / NaN              | 1.2     1.5e-9 / 1.9e-7      | 26.2    1.2e-7 / 7.9e-6
circul (circulant, 150)       | 4.2 (0.6)     1   1.6e-9 / 2.8e-8        | 4.0     1.1e-9 / 1.9e-8      | 35.0    1.7e-9 / 3.0e-8
fiedler (sym. indef., 160)    | 12.1 (0.6)    5   NaN / NaN              | 3.2     4.2e-9 / 1.1e-8      | 36.3    1.4e-7 / 2.4e-7
minij (spd, 190)              | 7.0 (0.7)     5   NaN / NaN              | 2.5     5.0e-9 / 8.9e-7      | 50.8    4.7e-7 / 3.9e-5
bwm200 (unsym., 200)          | 8.7 (1.1)     1   7.0e-11 / 1.4e-7       | 8.6     7.0e-11 / 1.4e-7     | 61.2    1.8e-10 / 8.2e-7
fidap001 (unsym., 216)        | 7.5 (1.0)     1   3.6e-14 / 1.0          | 7.4     3.6e-14 / 1.0        | 68.3    6.1e-12 / 1.0
e05r0000 (unsym., 236)        | 9.5 (1.2)     1   2.2e-12 / 1.8e-5       | 9.4     2.2e-12 / 1.8e-5     | 82.9    2.3e-3 / 1.0
gcdmat (spd, 250)             | 5.4 (1.3)     1   1.9e-11 / 2.8e-6       | 5.3     1.9e-11 / 2.8e-6     | 93.7    1.3e-9 / 1.4e-4
helmert (unsym., 260)         | 67.2 (1.8)    5   NaN / NaN              | 20.5    2.3e-9 / 8.2e-7      | 171.8   3.2e-10 / 3.2e-7
parter (Toeplitz, 325)        | 39.6 (3.3)    1   1.7e-9 / 2.8e-7        | 38.4    1.0e-9 / 1.7e-7      | 328.1   4.8e-8 / 8.0e-6
H (spd, 350)                  | 15.4 (3.4)    1   1.7e-12 / 6.8e-12      | 15.0    1.7e-12 / 6.8e-12    | 219.8   4.9e-12 / 4.1e-12
poisson (spd, 400)            | 23.0 (4.8)    1   8.2e-12 / 2.5e-4       | 22.4    7.3e-12 / 2.2e-4     | long
riemann (unsym., 520)         | 156.2 (53.1)  1   1.6e-8 / 8.8e-7        | 153.0   1.2e-8 / 6.0e-7      | long
gcdmat (spd, 800)             | 187.0 (33.3)  1   3.8e-10 / 1.2e-2       | 176.1   3.6e-10 / 1.1e-2     | long
bcsstk08 (spd, 1074)          | 474.0 (77.4)  1   3.5e-8 / 1.0           | 464.4   3.5e-8 / 1.0         | long
poisson (spd, 1600)           | 5515 (248.1)  5   NaN / NaN              | 1140    1.1e-10 / 2.6e-1     | long
Table A.2
Details using simulated quadruple precision for residual computation.

Matrix      n     | Algorithm 4: time  k  mr / mrr     | Algorithm 5: time  k  mr / mrr
frank       8     | 0.1     1   4.7e-18 / 3.0e-19      | 0.0     1   1.5e-17 / 9.5e-19
frank       9     | 0.2     3   9.3e-15 / 6.4e-16      | 0.1     1   1.7e-14 / 1.4e-15
frank       10    | 0.2     5   4.3e-11 / 2.0e-13      | 0.2     3   1.1e-11 / 1.9e-13
gcdmat      50    | 0.3     1   8.3e-26 / 1.2e-23      | 0.3     1   9.0e-26 / 1.2e-23
dwt 87      87    | 1.6     1   1.2e-24 / 2.0e-20      | 1.5     1   1.2e-24 / 2.2e-20
tolosa      90    | 4.3     5   NaN / NaN              | 2.5     2   1.1e-10 / 2.6e-10
poisson     100   | 0.9     1   2.9e-24 / 1.0e-18      | 0.9     1   3.3e-23 / 1.2e-18
bcspwr03    118   | 3.4     1   3.1e-24 / 3.2e-19      | 3.3     1   3.6e-24 / 3.7e-19
fiedler     130   | 4.5     1   4.7e-20 / 1.2e-19      | 4.5     1   7.3e-20 / 1.8e-19
bcsstk22    138   | 1.8     1   3.7e-22 / 2.2e-7       | 1.7     1   8.9e-22 / 3.6e-7
minij       140   | 1.9     1   1.1e-20 / 1.4e-18      | 1.8     1   5.9e-20 / 7.8e-18
circul      150   | 7.7     1   4.6e-21 / 7.9e-20      | 7.4     1   3.4e-21 / 5.8e-20
fiedler     160   | 7.7     1   7.6e-20 / 1.9e-19      | 7.7     1   1.5e-19 / 3.8e-19
minij       190   | 4.0     1   7.9e-20 / 1.4e-17      | 4.0     1   5.2e-19 / 9.4e-17
bwm200      200   | 19.1    1   2.4e-21 / 6.9e-18      | 18.9    1   1.7e-21 / 4.5e-18
fidap001    216   | 21.0    1   2.2e-23 / 4.0e-6       | 20.9    1   2.5e-23 / 4.2e-6
e05r0000    236   | 27.9    1   3.4e-22 / 4.7e-14      | 27.5    1   4.6e-22 / 6.5e-14
gcdmat      250   | 8.9     1   2.3e-23 / 3.8e-18      | 8.5     1   2.1e-23 / 3.4e-18
helmert     260   | 90.8    5   NaN / NaN              | 58.0    2   3.3e-15 / 6.6e-13
parter      325   | 91.6    1   4.7e-19 / 7.5e-17      | 91.1    1   2.3e-18 / 3.7e-16
H           350   | 26.9    1   1.8e-22 / 7.6e-22      | 26.4    1   1.8e-22 / 7.7e-22
poisson     400   | 40.5    1   6.9e-22 / 2.1e-14      | 40.3    1   7.0e-22 / 2.1e-14
riemann     520   | 496.5   1   1.1e-17 / 1.1e-15      | 480.2   1   1.3e-18 / 8.7e-17
REFERENCES
[1] G. Alefeld and J. Herzberger, Introduction to Interval Computations, Computer Science
and Applied Mathematics, Academic Press, New York, 1983.
[2] G. Alefeld and N. Schneider, On square roots of M-matrices, Linear Algebra Appl., 42
(1982), pp. 119–132.
[3] R. Bartels and G. Stewart, Solution of the matrix equation AX + XB = C, Comm. ACM,
15 (1972), pp. 820–826.
[4] A. Bavely and G. Stewart, An algorithm for computing reducing subspaces by block diagonalization, SIAM J. Numer. Anal., 16 (1979), pp. 359–367.
[5] A. Björck and S. Hammarling, A Schur method for the square root of a matrix, Linear
Algebra Appl., 52/53 (1983), pp. 127–140.
[6] G. Cross and P. Lancaster, Square Roots of Complex Matrices, Linear Multilinear Algebra,
1 (1974), pp. 289–293.
[7] E. D. Denman, Roots of real matrices, Linear Algebra Appl., 36 (1981), pp. 133–139.
[8] E. D. Denman and A. N. Beavers, The matrix sign function and computations in systems,
Appl. Math. Comput., 2 (1976), pp. 63–94.
[9] I. Duff, R. Grimes, and J. Lewis, Users guide for Harwell-Boeing sparse matrix test problems
collection, Tech. Report RAL-92-086, Computing and Information Systems Department,
Rutherford Appleton Laboratory, Didcot, U.K., 1992.
[10] N. J. Higham, Newton’s method for the matrix square root, Math. Comput., 46 (1986), pp. 537–
549.
[11] N. J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl., 88/89 (1987), pp. 405–430.
[12] N. J. Higham, Stable iterations for the matrix square root, Numer. Algorithms, 15 (1997), pp. 227–242.
[13] N. J. Higham, The matrix computation toolbox for MATLAB (version 1.0), Tech. Report Numerical Analysis Report No. 410, University of Manchester, 2002. http://www.ma.man.ac.uk/~nareports.
[14] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, 2008.
[15] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press,
Cambridge, 1994.
[16] W. Hoskins and D. Walton, A faster method of computing the square root of a matrix, IEEE
Trans. Autom. Control, 23 (1978), pp. 494–495.
[17] R. B. Kearfott, Rigorous Global Search: Continuous Problems, Kluwer Academic Publishers,
Dordrecht, 1996.
[18] R. B. Kearfott, M. Nakao, A. Neumaier, S. Rump, S. Shary, and P. van Hentenryck, Standardized notation in interval analysis, 2005. http://www.mat.univie.ac.at/~neum/ms/notation.pdf.
[19] R. Klatte, U. W. Kulisch, A. Wiethoff, C. Lawo, and M. Rauch, C-XSC. A C++ Class
Library for Extended Scientific Computing, Springer-Verlag, Berlin, 1993.
[20] R. Krawczyk, Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehlerschranken,
Computing, 4 (1969), pp. 187–201.
[21] W. Luther and W. Otten, The complex arithmetic-geometric mean and multiple-precision
matrix functions, in Scientific Computing and Validated Numerics, G. Alefeld, A. Frommer,
and B. Lang, eds., no. 90 in Math. Res., Berlin, 1996, Akademie Verlag, pp. 52–58.
[22] B. Meini, The matrix square root from a new functional perspective: Theoretical results and
computational issues, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 362–376.
[23] R. E. Moore, A test for existence of solutions to nonlinear systems, SIAM J. Numer. Anal.,
14 (1977), pp. 611–615.
[24] R. E. Moore, R. B. Kearfott, and M. J. Cloud, Introduction to Interval Analysis, SIAM,
Philadelphia, 2009.
[25] A. Neumaier, Interval Methods for Systems of Equations, no. 37 in Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 1990.
[26] National Institute of Standards and Technology, Matrix Market, a web resource for test matrix collections. http://math.nist.gov/Matrix-Market/.
[27] T. Ogita, S. M. Rump, and S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput.,
26 (2005), pp. 1955–1988.
[28] J. Rohn, VERSOFT: Verification software in MATLAB/INTLAB. http://uivtx.cs.cas.cz/~rohn/matlab.
[29] S. M. Rump, Solving algebraic problems with high accuracy, in A New Approach to Scientific
Computation, W. Miranker and E. Kaucher, eds., vol. 7 of Comput. Sci. Appl. Math., New
York, 1983, Academic Press, pp. 51–120.
[30] S. M. Rump, Verification methods for dense and sparse systems of equations, in Topics in Validated Computations, J. Herzberger, ed., vol. 5 of Stud. Comput. Math., Amsterdam, 1994, Elsevier, pp. 63–135.
[31] S. M. Rump, Expansion and estimation of the range of nonlinear functions, Math. Comput., 65 (1996), pp. 1503–1512.
[32] S. M. Rump, INTLAB – INTerval LABoratory, in Developments in Reliable Computing, T. Csendes, ed., Dordrecht, 1999, Kluwer Academic Publishers, pp. 77–104.
[33] M. Smith, Numerical Computation of Matrix Functions, PhD thesis, University of Manchester, 2002.
[34] R. S. Varga, Matrix Iterative Analysis, vol. 27 of Springer Series in Computational Mathematics, Springer, Berlin, 2nd ed., 2000.