Preprint BUW-SC 09/2
Bergische Universität Wuppertal
Fachbereich C – Mathematik und Naturwissenschaften
Mathematik
Andreas Frommer, Behnam Hashemi
Verified Computation of Square Roots of a Matrix
April 2009
http://www.math.uni-wuppertal.de/SciComp/
VERIFIED COMPUTATION OF SQUARE ROOTS OF A MATRIX
ANDREAS FROMMER∗ AND BEHNAM HASHEMI†
Abstract. We present methods to compute verified square roots of a square matrix A. Given an
approximation X to the square root, obtained by a classical floating point algorithm, we use interval
arithmetic to find an interval matrix which is guaranteed to contain the error of X. Our approach
is based on the Krawczyk method which we modify in two different ways in such a manner that
the computational complexity for an n × n matrix is reduced to O(n^3). The methods are based on the
spectral decomposition or, in the case that the eigenvector matrix is ill conditioned, on a similarity
transformation to block diagonal form. Numerical experiments prove that our methods are computationally efficient and that they yield narrow enclosures provided X is a good approximation. This
is particularly true for symmetric matrices, since their eigenvector matrix is perfectly conditioned.
Keywords: matrix square root, Brouwer’s fixed point theorem, Krawczyk’s method, Kronecker
structures, interval analysis, circular arithmetic.
1. Introduction. Let A be a given n × n matrix. In this paper we are interested in computing enclosing intervals for all entries of a matrix square root A^{1/2} of A. Here, the matrix square root A^{1/2} is to be understood, as usual, as the extension of the square root function (·)^{1/2} : C → C, z ↦ z^{1/2}, to n × n matrices in the operator theoretic sense. Such an A^{1/2}, which is called a primary square root, is usually not unique; it can be characterized as p(A), where p is any polynomial which interpolates (·)^{1/2} at the eigenvalues of A in the Hermite sense w.r.t. their algebraic multiplicity. The non-uniqueness of a primary square root A^{1/2} is due to the fact that z^{1/2} takes two different values for z ≠ 0, and we are free to choose either value for the interpolation
polynomial p. As an example let A = I, the identity in C^{2×2}. Then A has exactly two primary square roots, which are given by

    [ 1  0 ]        [ -1   0 ]
    [ 0  1 ]  and   [  0  -1 ].
An outstanding reference on the various aspects of the matrix square root is
Higham’s recent book [14] on which this introduction is based.
Every square root X of a matrix A satisfies the equation

    F(X) = X^2 − A = 0.    (1.1)
But there might be solutions to (1.1) other than primary square roots of A. For example, if again A = I ∈ C^{2×2}, the matrices

    [ -1  0 ]        [ 1   0 ]
    [  0  1 ]  and   [ 0  -1 ]

as well as, e.g.,

    [ 0  1 ]
    [ 1  0 ],
∗ Department of Mathematics,
University of Wuppertal, 42097 Wuppertal, Germany
[email protected]
† Department of Applied Mathematics, Faculty of Mathematics and Computer Sciences, Amirkabir University of Technology, No.424 Hafez Avenue, Tehran 15914, Iran hashemi am @aut.ac.ir,
[email protected]
also solve (1.1). Any solution of (1.1) is called a square root of A; if it is not a matrix function in the operator theoretic sense, it will be called "non-primary".
Computing (primary) matrix square roots numerically has been considered previously by several authors including [2, 5, 6, 8, 7, 10, 11, 12, 16, 22]. Björck and
Hammarling [5] have offered a method based on the Schur decomposition and a fast
recursion. However, if A is real this method may require complex arithmetic even
if the desired root is itself real. The method of [5] was extended by Higham [11] to
compute real square roots of a real matrix using real arithmetic. Newton’s method
has also been used to compute matrix square roots of A ∈ Cn×n by Higham [10].
Any nonsingular complex (real) square matrix A has a primary square root. While
the equation (1.1) may have infinitely many solutions (for n ≥ 2 any involutory matrix
is a solution to (1.1) for A = I), a nonsingular Jordan block has precisely two primary
square roots [12]. If A is singular, the existence of a square root depends on the Jordan
structure of the zero eigenvalue. More precisely, the matrix A has a primary square
root if and only if rank(A) = rank(A^2) [15]. If A is real and nonsingular, it may or may not have a real primary square root; a sufficient condition for one to exist is that A have no real negative eigenvalues [12]. Any symmetric positive (semi)definite
matrix has a unique symmetric positive (semi)definite primary square root [15].
Any matrix A having no nonpositive real eigenvalues (e.g., every nonsingular M-matrix) has a unique primary square root for which every eigenvalue has positive
real part, called the principal square root. One way to compute the principal square
root of a real diagonalisable matrix A is to use the spectral decomposition [33]. A
method for computing the principal (symmetric positive definite) square root A1/2 of
a symmetric positive definite matrix A is based on a Cholesky decomposition of A
and the polar decomposition of the Cholesky factor [12].
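For orientation, the spectral-decomposition approach just mentioned amounts to only a few lines of MATLAB. The following is a plain floating point sketch for a given diagonalizable matrix A with no nonpositive real eigenvalues; it makes no claim to the numerical stability of the Schur-based methods cited above.

    % Principal square root via the spectral decomposition A = V*Lam*inv(V)
    % (floating point only, no verification).
    [V, Lam] = eig(A);
    X = V * diag(sqrt(diag(Lam))) / V;   % then X*X equals A up to rounding errors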
The general interest in matrix square roots reflects their importance in many applications, such as the Neumann-Dirichlet mapping for elliptic boundary value problems and related numerical methods for their solution, the inverse scaling and squaring method for the matrix logarithm which, for example, is used to produce generators of Markov models, the orthogonal Procrustes problem, and quadratic nonlinear eigenproblems; see [14].
The number of matrix square roots depends on the Jordan structure of A. Higham
(see [11] or [14]) has classified the square roots of a nonsingular matrix A in a manner
which makes clear the distinction between primary and non-primary ones. The precise
result is given in the following lemma.
Lemma 1.1. Let the nonsingular matrix A ∈ C^{n×n} have the Jordan canonical form Z^{-1}AZ = J = diag(J_1, J_2, . . . , J_p), and let s ≤ p be the number of distinct eigenvalues of A. Then A has precisely 2^s primary square roots.
Moreover, all primary square roots of A are isolated solutions of (1.1), characterized by the fact that the sum of any two of their eigenvalues is nonzero. The non-primary square roots of A form a finite number of parametrized families of matrices; each family contains infinitely many solutions of (1.1) which share the same spectrum. The non-primary square roots are non-isolated, i.e. each neighborhood contains infinitely many other solutions of (1.1).
The purpose of this paper is to develop methods based on interval arithmetic
which obtain guaranteed error bounds for a primary square root of a matrix A. A
classical, floating point numerical computation will always yield a result which is not
an exact square root of A but rather an approximation to it. The reasons for this are manifold, such as the presence of rounding errors or the need to stop an iteration which in theory should run forever. Given a thus computed approximate square root
X̌ of A, our methods will compute an interval matrix with a small diameter, close
to X̌, for which, by the computation, it has been mathematically proven that the
interval matrix contains an exact square root of A. In this manner, we get verified,
reliable error bounds for each entry of the square root of A.
Interval methods of this kind have a long tradition, and many approaches have been developed to compute enclosures for solutions of systems of linear and
nonlinear equations. We refer to the monographs [1, 17, 24, 25], e.g., for further
reference. Interval computations have to be supported by software providing for a
correct implementation of machine interval arithmetic including outward rounding,
see [1, 17, 24, 25], and for an easy use of interval operations. Prominent examples
are C-XSC [19] and INTLAB [32]. INTLAB is a MATLAB toolbox which is freely
available for non-commercial use. The numerical examples presented here have been
obtained using INTLAB.
The organization of this paper is as follows. In Section 2, we introduce our notation, review some basic concepts and prove a theoretical result relating the fixed points of one function to the zeros of another. This result will be used in Section 3 to develop two variants of the standard Krawczyk operator which are particularly useful for computing enclosures for a matrix square root, since they keep the complexity down to cubic. These variants can be used in computational existence tests based on ε-inflation, which we discuss in Section 4. Possible extensions of our methods are discussed briefly in Section 5 before we report numerical results for several test problems in Section 6.
2. Notation and fundamental results. Even if A is a real matrix, its square
root may be complex, so C is the natural field to work with. There are two different
established ways, circular and rectangular arithmetic, of extending the concept of real
interval arithmetic to C. We will use circular arithmetic throughout, implying that
the set of complex “intervals” IC = ICdisc is given by all finite discs a in C with
midpoint mid (a) and radius rad (a). We refer to [1] for details on the definition of
the arithmetic operations in IC.
We aim, at least in part, at using the standard notation of interval analysis defined in [18]. So IC^n and IC^{n×n} denote the set of all interval vectors and the set of all n × n interval matrices, respectively. All interval quantities will be typeset in boldface. For
a ∈ IC its absolute value is |a| := max{|a| : a ∈ a} = |mid(a)| + rad(a). The hull □(a, b) of two intervals in IC is the interval of smallest radius containing a and b.
Since the set theoretic intersection of two intervals from IC will usually not be in IC,
we may define intersect(a, b) to be just any operator which produces an interval c ∈ IC
such that c ⊇ a ∩ b with c being empty if a and b are disjoint. The construction of a
“sharp” intersect operator is somewhat tedious; see intersect.m in INTLAB which
also provides a machine circular arithmetic using outward rounding.
For interval vectors and matrices, rad, mid, |·|, □ and intersect will be applied componentwise, thus producing results of the same dimension as the arguments.
Note that for reasons of computational efficiency, the default arithmetic for real intervals in INTLAB is midpoint-radius arithmetic, i.e. the real analogue of complex circular arithmetic; see [32].
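For readers unfamiliar with INTLAB, the following lines illustrate the midpoint-radius representation used throughout; midrad, mid and rad are INTLAB functions, and the last line evaluates |a| as defined above.

    % A complex disc and a real midpoint-radius interval in INTLAB.
    a = midrad(1+2i, 1e-10);        % disc with midpoint 1+2i and radius 1e-10
    x = midrad(1, 1e-12);           % real interval in midpoint-radius form
    m = mid(a);  r = rad(a);        % recover midpoint and radius
    abs_a = abs(mid(a)) + rad(a);   % |a| = |mid(a)| + rad(a)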
The following two results will be needed in the proof of Theorem 2.3 below. We
use the projections of a set S ⊆ IC^n onto its components, defined for i = 1, . . . , n as

    P_i S = {s_i : s = (s_1, . . . , s_n)^T ∈ S} ⊆ C.
Lemma 2.1. Let A ∈ C^{n×n}.
a) Let x ∈ IC^n, y = Ax and S = {Ax : x ∈ x}. Then

    y_i = P_i S for i = 1, . . . , n,

and

    rad(y) = |A| rad(x).

b) If there exists u ∈ R^n, u_i > 0 for i = 1, . . . , n, such that

    |A| u < u,    (2.1)

then ρ(A) < 1, where ρ denotes the spectral radius.
Proof. To show a), let a_ij ∈ C denote the entries of A, fix i and observe that

    P_i S = { Σ_{j=1}^{n} a_ij x_j : x = (x_1, . . . , x_n)^T ∈ x }.    (2.2)

In complex circular arithmetic we have a · b = {ab : b ∈ b} for any a ∈ C, b ∈ IC, as well as c + d = {c + d : c ∈ c, d ∈ d} for c, d ∈ IC; see [1, Ch. 6], for example. Therefore, (2.2) gives

    P_i S = Σ_{j=1}^{n} a_ij x_j,

i.e. P_i S = y_i, which proves the first part of a). The second part of a) follows since rad(ab) = |a| rad(b) and rad(a + b) = rad(a) + rad(b) for a ∈ C, a, b ∈ IC; see again [1, Ch. 6]. The inequality (2.1) in part b) implies ||A||_u < 1 for the weighted maximum norm ||x||_u = max_{i=1,...,n} {|x_i|/u_i}, and thus ρ(A) ≤ ||A||_u < 1. For details of this standard result see [25, Prop. 3.7.2] or [34, Cor. 1.14], e.g.
Any arithmetic expression involving the components x1 , . . . , xN of a vector x ∈
CN defines a function ϕ : D ⊆ CN → C, and N such expressions give a function
f : D ⊆ C^N → C^N, one for each component f_i of f. By slight abuse of notation we identify the functions with their arithmetic expressions. Replacing the vector x by an interval vector x ∈ IC^N we thus obtain an interval extension f of f. Note
that since the distributive law does not hold for intervals, f will actually depend on
the expression representing f . By the inclusion property of interval arithmetic, the
range of f over an interval is contained in its interval extension, i.e.
{f (x) : x ∈ x} ⊆ f (x).
A mapping

    A : D × D → C^{n×n}

is called a slope for f if

    f(y) − f(x) = A(y, x)(y − x) for all x, y ∈ D.    (2.3)
Let A be an interval matrix containing all slopes A(y, x) for y ∈ x, x a given interval
vector. For example, if x ∈ x and f is continuously differentiable on x, due to the mean value theorem, applied to each component f_i individually, we can take any A which contains the set {f′(y) : y ∈ x}. A standard choice is thus to take A = f′(x), the interval arithmetic evaluation of (an arithmetic expression for) f′(x).
For a given matrix R ∈ C^{n×n}, the Krawczyk operator k(x̌, x) is now defined as

    k(x̌, x) = x̌ − Rf(x̌) + (I − R·A)(x − x̌),   x̌ ∈ x ⊂ D.    (2.4)
The crucial property of k is summarized in the following theorem.
Theorem 2.2. Assume that f : D ⊂ C^N → C^N is continuous in D. Let x̌ ∈ D and x ∈ IC^n with x ⊆ D. Assume that A is an interval matrix containing all slopes A(y, x̌) for y ∈ x and let R ∈ C^{n×n}. If we have

    k(x̌, x) ⊆ int x,    (2.5)

where int x is the topological interior of x, then f has a zero x∗ in k(x̌, x). Moreover, if A also contains all slopes A(y, x) for x, y ∈ x, the zero x∗ is the only zero of f in x.
In the stated form, this theorem is due to Rump [31]; it goes back to Krawczyk
[20], see also [23]. The relation (2.5) is likely to hold if f(x̌) is small, i.e. x̌ is a good approximation to a zero of f, and all entries of the interval matrix I − RA are small, i.e. RA is close to the identity. The standard choice is to take R as a (numerically
computed and thus approximate) inverse of mid (A).
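As an illustration of Theorem 2.2 and this choice of R, here is a minimal INTLAB-style sketch of one Krawczyk test for the scalar equation f(x) = x^2 − 2 = 0; the function names midrad, intval, mid and in0 are INTLAB's, and the tolerances are chosen arbitrarily.

    % One standard Krawczyk test (2.4)/(2.5) for f(x) = x^2 - 2.
    f  = @(x) x.^2 - 2;             % evaluated in interval arithmetic by overloading
    df = @(x) 2*x;                  % interval extension of f'
    xc = 1.41421356;                % floating point approximation of sqrt(2)
    x  = midrad(xc, 1e-6);          % candidate interval around xc
    A  = df(x);                     % interval containing all slopes on x
    R  = 1/mid(A);                  % approximate inverse of mid(A)
    k  = xc - R*f(intval(xc)) + (1 - R*A)*(x - xc);
    if in0(k, x)                    % (2.5): k contained in the interior of x
        disp('verified: f has a (unique) zero in k')
    end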
In the case of the matrix square root, as we will see in Section 3, it is too costly
to compute such an approximate inverse R. However, a version relying on a factorization of R will be available at reasonable cost. It is not entirely trivial to see that
such a modification can successfully be used to prove the existence of a zero in the
same manner as with the original Krawczyk operator. For this reason, we prove the
following slight generalization of Theorem 2.2 which we believe to express the essence
of all Krawczyk type verification methods. In its formulation we represent x as x̌ + z,
thus separating the approximate zero x̌ from the enclosure of its error, z.
Theorem 2.3. Assume that f : D ⊂ C^N → C^N is continuous in D. Let x̌ ∈ D and z ∈ IC^n be such that x̌ + z ⊆ D. Moreover, assume that A ⊂ C^{n×n} is a set of matrices containing all slopes A(x̌, y) for y ∈ x̌ + z =: x. Finally, let R ∈ C^{n×n}. Denote by K_f(x̌, R, z, A) the set

    K_f(x̌, R, z, A) := {−Rf(x̌) + (I − RA)z : A ∈ A, z ∈ z}.    (2.6)

Then, if

    K_f(x̌, R, z, A) ⊆ int z,    (2.7)

the function f has a zero x∗ in x̌ + K_f(x̌, R, z, A) ⊆ x. Moreover, if A also contains all slope matrices A(y, x) for x, y ∈ x, then this zero is unique in x.
Proof. Define
g(x) = x − Rf (x), x ∈ x.
We have
g(x) = x − Rf (x)
= x̌ + (x − x̌) − R(f (x̌) + A(x, x̌)(x − x̌))
= x̌ − Rf (x̌) + (I − RA(x, x̌))(x − x̌).
Thus, (2.7) implies {g(x) : x ∈ x} ⊆ x, so that by Brouwer’s fixed point theorem g
has a fixed point x∗ in x. It remains to show that such x∗ is a zero of f and that
it is unique. We first show that (2.7) implies that R as well as every A ∈ A are
non-singular. To this purpose, fix A ∈ A. By Lemma 2.1 a) we have
Pi {(I − RA)z : z ∈ z} = ((I − RA)z)i , i = 1, . . . , n.
Therefore, by (2.7),
−Rf (x̌) + (I − RA)z ⊆ int z.
Herein, both sides represent interval vectors from IC^n. We can therefore turn to radii to obtain

    rad((I − RA)z) = |I − RA| rad(z) = rad(−Rf(x̌) + (I − RA)z) < rad(z),    (2.8)

where the first equality is due to Lemma 2.1 a). By Lemma 2.1 b) we obtain ρ(I − RA) < 1, which proves that R as well as A are non-singular.
Now, R being non-singular shows that every fixed point x∗ of g is indeed a zero
of f. Moreover, if x∗∗ ≠ x∗ is another zero of f in x, then the corresponding slope
matrix A = A(x∗∗ , x∗ ) satisfies 0 = f (x∗∗ ) − f (x∗ ) = A(x∗∗ − x∗ ). But if A contains
all these slopes, then this is impossible since we just showed that every A ∈ A is
non-singular.
We will use Theorem 2.3 for the case where A is obtained as the result A of an interval arithmetic evaluation of (an expression for) f′ and where 0 ∈ z, i.e. x̌ ∈ x. So both the existence and the uniqueness result will hold once (2.7) is satisfied. Let us
note, in passing, that our proof of Theorem 2.3 does not require A to be an interval
matrix, whereas the fact that z is an interval vector appears to be essential to obtain
the crucial inequality (2.8).
To end this section, we turn to the Kronecker product of matrices. For two matrices A ∈ C^{m×n} and B ∈ C^{k×t} it is given by the mk × nt block matrix

    A ⊗ B = [ a_11 B  · · ·  a_1n B ]
            [   ⋮      ⋱      ⋮    ]
            [ a_m1 B  · · ·  a_mn B ].
For A = (aij ) ∈ Cm×n the vector vec(A) ∈ Cmn is obtained by stacking the
columns of A, i.e.,
vec(A) = (a11 , . . . , am1 , a12 , . . . , am2 , . . . , a1n , . . . , amn )T .
We will need the pointwise or Hadamard division of two matrices A, B ∈ Cn×m which
we denote as ·/, i.e. we have
A · /B = C ∈ Cn×m , where C = (cij ) with cij = aij /bij .
Finally, for d = (d1 , . . . , dn )T ∈ Cn , the matrix Diag (d) denotes the diagonal matrix
in Cn×n whose i-th diagonal entry is di . We extend this to matrices: For D ∈ Cn×m
we put Diag (D) = Diag (vec(D)) ∈ Cnm×nm .
The following properties of the Kronecker product and the vec operator will turn
out to be useful. For parts a) and b) see [15], e.g.; part c) is trivial.
Lemma 2.4. For any real matrices A, B, C and D with compatible sizes we have
a) (A ⊗ B)(C ⊗ D) = (AC ⊗ BD).
b) vec(ABC) = (C T ⊗ A)vec(B).
c) Diag (A)−1 vec(B) = vec(B · /A).
Part c) also holds for interval matrices. Note that there is only a subdistributive,
but not a distributive, law in interval arithmetic, i.e. (a+b)c ⊆ ac+bc for a, b, c ∈ IC.
This implies that interval matrix multiplication is not associative and that equalities
analogous to a) and b) do in general not hold for interval matrices. However, due to
the inclusion property of interval arithmetic, we have the following result which will
be repeatedly used without further mention in the next section.
Lemma 2.5. Let A, B, C be interval matrices of compatible sizes. Then the set

    {x = (C^T ⊗ A) vec(B) : A ∈ A, B ∈ B, C ∈ C}

is contained in both vec((AB)C) and vec(A(BC)).
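For point matrices, the identities of Lemma 2.4 b) and c) are easily checked numerically; a short MATLAB sketch with arbitrarily chosen sizes:

    % Check Lemma 2.4 b) and c) in floating point arithmetic.
    A = randn(3,4); B = randn(4,5); C = randn(5,2);
    norm(reshape(A*B*C,[],1) - kron(C.',A)*B(:))   % b): zero up to roundoff
    P = randn(4,5); Q = rand(4,5) + 1;             % Q has nonzero entries
    norm(diag(Q(:)) \ P(:) - reshape(P./Q,[],1))   % c): zero up to roundoff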
3. Computing enclosures for K_f. For A, X ∈ C^{n×n}, the equation

    F(X) = X^2 − A = 0

can be reformulated, interpreting matrices as vectors in C^{n^2} via x = vec(X), a = vec(A), and using Lemma 2.4, as

    f(x) = (I ⊗ X)x − a = 0.
In the sequel we will use uppercase letters to denote matrices and the corresponding lowercase letters to denote the vector related to the matrix via the vec operator. With the aim of keeping the exposition simple, we will do so implicitly but systematically. From

    F(X + H) = F(X) + XH + HX + H^2

we see that the Fréchet derivative of F, applied to a direction H ∈ C^{n×n}, is given as

    F′(X)H = XH + HX.

By Lemma 2.4, this translates into

    f′(x)h = (I ⊗ X + X^T ⊗ I)h   and   f′(x) = I ⊗ X + X^T ⊗ I.
Consequently, the standard Krawczyk operator (2.4) for this particular function f is given as

    k(x̌, x) = x̌ − R[(I ⊗ X̌)x̌ − a] + (I − R(I ⊗ X + X^T ⊗ I))(x − x̌),

where R ∈ C^{n^2×n^2} is a computed (approximate) inverse of I ⊗ mid X + mid X^T ⊗ I.
Computational experiments to be reported in Section 6 show that with this Krawczyk operator, we can indeed obtain quite tight enclosures for a square root of A. Its severe disadvantage, however, is its computational complexity: The matrix R will not have a nice Kronecker structure, and it will usually be a full matrix. Each of the n^2 columns of (I ⊗ X + X^T ⊗ I) has 2n non-zeros, so computing its product with R requires O(n^3) operations per column, i.e. a total cost of O(n^5), notwithstanding the cost for computing R. If we take the approach to compute an interval matrix containing R(I ⊗ X + X^T ⊗ I) by solving n^2 linear systems with the matrix I ⊗ mid X + mid X^T ⊗ I (and interval right hand sides), we can use the Schur decomposition of X̌ to solve each system with O(n^3) operations, as is done routinely in methods for the Sylvester equation, see [3]. The resulting total cost is again O(n^5). So O(n^5) represents a prohibitively fast growing dominant cost in evaluating k(x̌, x). Note that [21] suggested a method for computing enclosures of the matrix square root with similar complexity, based on an interval Newton method rather than on Krawczyk's method.
We now develop variants of the Krawczyk operator, relying on Theorem 2.3, which have a computational cost of O(n^3). For simplicity, we assume that A is diagonalizable. An extension to the general case will be described in Section 5. So assume that

    A = V ΛW, with V, W, Λ ∈ C^{n×n}, Λ = Diag(λ_1, . . . , λ_n) diagonal, V W = I.    (3.1)

The square roots of A from (3.1) are given as V Λ^{1/2} W, where

    Λ^{1/2} = Diag(λ_1^{1/2}, . . . , λ_n^{1/2}).
So if X is a computed accurate approximate square root of A, we can expect W X W^{-1} as well as V^{-1} X V to be close to the diagonal matrix Λ^{1/2}. From the identity

    f′(x) = I ⊗ X + X^T ⊗ I
          = (V^{-T} ⊗ W^{-1}) · [ I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I ] · (V^T ⊗ W),

we see that an approximate inverse for f′(x) is given in factorized form by the matrix

    R = (V^{-T} ⊗ W^{-1}) · ∆^{-1} · (V^T ⊗ W), where ∆ = I ⊗ Λ^{1/2} + Λ^{1/2} ⊗ I.    (3.2)
Now assume that V, W and Λ are numerically computed quantities obtained by using a standard method to get the decomposition (3.1), such as MATLAB's eig function. Note that we assume V, the matrix of right eigenvectors, to be a computed quantity, as well as W, the matrix of left eigenvectors. This implies that we do not assume that W V = I holds exactly. Similarly, the computed diagonal matrix Λ will generally not have the exact eigenvalues on its diagonal. Note also that we could as well obtain V, W from an eigendecomposition of X, but this might require complex arithmetic while it is not required for the eigendecomposition of A. Let D denote the matrix for which

    ∆ = Diag(D), D = [d_1 | . . . | d_n] ∈ C^{n×n}.    (3.3)
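In MATLAB notation, D and ∆ from (3.2) and (3.3) are easily formed from the vector lam of computed eigenvalues; the following point-arithmetic sketch assumes n and lam are given.

    % D(i,j) = sqrt(lam(i)) + sqrt(lam(j)), so that Delta = Diag(vec(D)).
    s = sqrt(lam(:));
    D = s*ones(1,n) + ones(n,1)*s.';
    % Delta is never formed explicitly: entrywise division by D realizes
    % multiplication by Delta^(-1), cf. Lemma 2.4 c), e.g. N = Q ./ D.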
For any matrix X ∈ C^{n×n} and any vector z ∈ C^{n^2} we have

    (I − R(I ⊗ X + X^T ⊗ I)) z
      = (V^{-T} ⊗ W^{-1}) ∆^{-1} ( ∆ − I ⊗ (W X W^{-1}) − (V^{-1} X V)^T ⊗ I ) (V^T ⊗ W) z.
The latter expression is rich in Kronecker products, so that due to Lemma 2.4 b) we can efficiently compute u = (I − R(I ⊗ X + X^T ⊗ I))z as described in Algorithm 1. Lines 4-6 compute the matrix Q with vec(Q) = (∆ − I ⊗ (W X W^{-1}) − (V^{-1} X V)^T ⊗ I) y in a way which at first glance may look artificial. The fact that we gather all parts which belong to a "diagonal block", i.e. the n × n matrix (Diag(d_i) − S_ii I − T), before multiplying with Y_i, will be crucial when turning to the interval arithmetic counterpart: Without building the entire diagonal blocks first, we would obtain wider intervals due to the subdistributive law, and the computed result would no longer reflect the fact that I − Rf′(x) is close to the zero matrix.

Algorithm 1 Efficient computation of u = (I − Rf′(x))z
 1: Compute Y = W Z V             {the j-th column of Y will be denoted Y_j}
 2: Compute S = V^{-1} X V        {S is an n × n matrix with entries S_ij}
 3: Compute T = W X W^{-1}
 4: for i = 1, . . . , n do        {compute the columns Q_i of the matrix Q}
 5:   Compute Q_i = − Σ_{j=1, j≠i}^{n} S_ji Y_j + (Diag(d_i) − S_ii I − T) Y_i    {see (3.3)}
 6: end for
 7: Compute N = Q ·/ D
 8: Compute U = W^{-1} N V^{-1}
Note also that machine interval arithmetic as implemented in INTLAB or C-XSC [19] is particularly efficient if the level 3 BLAS are used as much as possible. Therefore, we will actually use an alternative way to compute Q from lines 4-6: Define S_0 = S − Diag(S_11, . . . , S_nn), i.e. the diagonal entries of S are replaced by zeros in S_0. Then, using Lemma 2.4 b), we see that

    Q = −Y S_0 + [c_1 | . . . | c_n], where c_i = (Diag(d_i) − S_ii I − T) Y_i, i = 1, . . . , n.    (3.4)

The following lemma analyzes the cost of Algorithm 1.
Lemma 3.1. Algorithm 1 requires O(n^3) arithmetic operations, independently of whether we use lines 4-6 or (3.4) instead.
Proof. One sees immediately that the computations of Y, S, T, N and U have cost O(n^3). In lines 4-6, the cost for each i is O(n^2), since we have n − 1 scalings of n-vectors Y_j, one matrix-vector multiplication and the addition of n vectors. In total, the for-loop thus also has cost O(n^3). If we replace it by (3.4), the cost is one matrix-matrix multiplication and n matrix-vector multiplications, which is again O(n^3).
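A point-arithmetic MATLAB transcription of Algorithm 1, using (3.4) for lines 4-6, might read as follows; it uses no directed rounding and is therefore for illustration only. The inputs V, W, lam are the computed spectral data from (3.1).

    function U = apply_I_minus_Rfprime(X, Z, V, W, lam)
    % U = unvec((I - R f'(x)) vec(Z)) with R from (3.2), computed as in Algorithm 1.
    n  = size(X,1);
    s  = sqrt(lam(:));
    D  = s*ones(1,n) + ones(n,1)*s.';        % cf. (3.3)
    Y  = W*Z*V;                              % line 1
    S  = V\(X*V);                            % line 2: S = V^(-1) X V
    T  = W*X/W;                              % line 3: T = W X W^(-1)
    S0 = S - diag(diag(S));
    C  = zeros(n);
    for i = 1:n                              % lines 4-6 via (3.4)
        C(:,i) = (diag(D(:,i)) - S(i,i)*eye(n) - T) * Y(:,i);
    end
    Q  = -Y*S0 + C;
    N  = Q ./ D;                             % line 7
    U  = W\(N/V);                            % line 8: U = W^(-1) N V^(-1)
    end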
Algorithm 1 paves the way for an efficient computation of an interval vector containing the set K_f(x̌, R, z, A) with A = f′(x̌ + z) = I ⊗ (X̌ + Z) + (X̌ + Z)^T ⊗ I from (2.6). Basically, due to the inclusion property of interval arithmetic, we just have to replace the point quantities X and Z in Algorithm 1 by X̌ + Z and Z to obtain an interval vector u = vec(U) containing

    K_f(x̌, R, z, f′(x̌ + z)) + Rf(x̌) = {(I − RA)z : A ∈ f′(x̌ + z), z ∈ z}.

There is one particularity, however: V^{-1} and W^{-1} will usually not be available as exact inverses of the computed matrices V and W. We therefore assume that we have precomputed interval matrices I_V, I_W which are known to contain V^{-1} and W^{-1}, resp. Incorporating the computation of −Rf(x̌), we obtain Algorithm 2 which, when implemented in machine interval arithmetic, will output an interval vector k = vec(K) containing the set K_f(x̌, R, z, f′(x̌ + z)). Note that at several places (starting at line 1), we do not indicate in which order the various multiplications of interval matrices have to be performed. The inclusion property of interval arithmetic will guarantee k ⊇ K_f(x̌, R, z, f′(x̌ + z)) for whatever order we choose.
In a manner completely analogous to the proof of Lemma 3.1 one gets
Lemma 3.2. Algorithm 2 requires O(n^3) arithmetic operations.
Algorithm 2 Computation of an interval matrix K such that vec(K) contains K_f from (2.6)
 1: Compute Y = W Z V             {the j-th column of Y will be denoted Y_j}
 2: Compute S = I_V (Z + X̌) V     {S is an n × n interval matrix with entries S_ij}
 3: Compute T = W (Z + X̌) I_W
 4: for i = 1, . . . , n do        {we use (3.4)}
 5:   Compute c_i = (Diag(d_i) − S_ii I − T) Y_i
 6: end for
 7: Compute Q = −Y S_0 + [c_1 | . . . | c_n]
 8: Compute N = Q ·/ D
 9: Compute U = I_W N I_V
10: {lines 11-14 evaluate −Rf(x̌)}
11: Compute F = X̌ · X̌ − A         {F is an interval matrix due to outward rounding}
12: Compute G = W F V
13: Compute H = G ·/ D
14: Compute L = −I_W H I_V
15: Compute K = L + U

The wrapping effect, i.e. the increase of diameters due to the multiplication of interval matrices, will be quite noticeable in Algorithm 2. For example, when we evaluate −Rf(x̌) in lines 11-14, the computation of G and the computation of L
produce two wrapping effects (on n × n matrices), each. Similarly, the computations
of Y , S, T , Q and N also produce wrapping effects. The wrapping effect will be
more pronounced if the dimension n is large and if the matrices are ill conditioned.
Algorithm 2 will then compute an interval matrix K which will be substantially larger than the set K_f(x̌, R, z, f′(x̌ + z)), and this might then prevent us from computationally verifying the crucial condition (2.7).
We therefore now develop a second approach which suffers from fewer wrappings.
We expect it to work for larger and less well conditioned matrices. As we will see later,
it will, on the other hand, yield a slightly weaker result with respect to uniqueness.
We start again from the spectral decomposition (3.1),

    A = V ΛW,

and we assume again that we have computed a quite accurate approximate square root X, so that W X W^{-1} and V^{-1} X V are close to Λ^{1/2} from (3.2). The new idea is to transform f affinely such that the derivative at X is close to ∆ from (3.2), i.e. we put

    f̂(x̂) = (V^T ⊗ W) · f((V^{-T} ⊗ W^{-1}) x̂).

Clearly, for any x̂,

    f̂′(x̂) = (V^T ⊗ W) f′((V^{-T} ⊗ W^{-1}) x̂) · (V^{-T} ⊗ W^{-1})
           = (V^T ⊗ W) · (I ⊗ X + X^T ⊗ I) · (V^{-T} ⊗ W^{-1}), where X = W^{-1} X̂ V^{-1},
           = I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I.

Consequently, if X is close to a square root of A, then f̂′(x̂) ≈ ∆, so that we can use R̂ = ∆^{-1} as an approximate inverse in a Krawczyk type approach, now for the function f̂. If this approach yields an interval vector x̂ known to contain exactly one zero of f̂, then by transforming back we see that the parallelepiped

    Π = {(V^{-T} ⊗ W^{-1}) x̂ : x̂ ∈ x̂}
contains exactly one zero of f. Using (machine) interval arithmetic, we get an interval matrix X containing Π as X = W^{-1} X̂ V^{-1}. Note that we cannot assert that the computed X contains exactly one zero of f; we just know that it contains an isolated zero. This is a qualitative difference compared to the first approach. It represents the price to pay for obtaining a method which will be applicable to larger matrices and which will usually yield narrower enclosures.
Let us mention the following subtlety: We will usually have computed an approximate square root X̌ of A. With X̌^ = W X̌ V we see that

    f̂(x̌^) = (V^T ⊗ W) f(x̌) = vec(W(X̌^2 − A)V).

This relation will allow us to avoid computing any transformation from x̌ to x̌^, see Algorithm 5 below.
So the point is now to compute an interval matrix K̂ such that vec(K̂) contains the set

    K̂ = K_f̂(x̌^, ∆^{-1}, ẑ, Â) = −∆^{-1}(V^T ⊗ W) f(x̌) + {(I − ∆^{-1} Â) ẑ : ẑ ∈ ẑ, Â ∈ Â},    (3.5)

where Â = I ⊗ (W X W^{-1}) + (V^{-1} X V)^T ⊗ I, X = W^{-1} Ẑ V^{-1} + X̌.
Algorithm 3 does so in analogy to Algorithm 2.
Algorithm 3 Computation of an interval matrix K̂ such that vec(K̂) contains K̂ from (3.5)
 1: Compute Z = I_W Ẑ I_V
 2: Compute S = I_V (Z + X̌) V     {S is an n × n interval matrix with entries S_ij}
 3: Compute T = W (Z + X̌) I_W
 4: for i = 1, . . . , n do
 5:   Compute c_i = (Diag(d_i) − S_ii I − T) Ẑ_i
 6: end for
 7: Compute Q̂ = −Ẑ S_0 + [c_1 | . . . | c_n]
 8: Compute N̂ = Q̂ ·/ D
 9: {lines 10-12 evaluate −∆^{-1}(V^T ⊗ W) f(x̌)}
10: Compute F = X̌ · X̌ − A         {F is an interval matrix due to outward rounding}
11: Compute F̂ = W F V
12: Compute Ĥ = −F̂ ·/ D
13: Compute K̂ = Ĥ + N̂
The computation of Ĥ in lines 10-12 now produces only two wrappings as opposed
to four in the corresponding part of Algorithm 2, and similar savings in wrappings
also arise in the other parts of the algorithm.
The following result is immediate.
Lemma 3.3. Algorithm 3 requires O(n^3) arithmetic operations. It also requires O(n^3) fewer operations than Algorithm 2.
4. Krawczyk type verification methods. Algorithm 2 presented in Section 3 computes an interval vector k = vec(K) containing the set

    K_f = −Rf(x̌) + {(I − RA)z : A ∈ f′(z + x̌), z ∈ z},

with R from (3.2). By Theorem 2.3, if 0 ∈ z and

    k ⊆ int z,    (4.1)

then x̌ + k contains a solution of (1.1) which is unique in x̌ + z. We then even know that this solution of (1.1) is a primary square root of A, since otherwise it could not be isolated by Lemma 1.1.
The question now arises of how to choose the interval vector z such that (4.1) is likely to take place. One of the most successful approaches is to use the so-called ε-inflation, see [29, 30]. It starts from the observation that it is natural to assume 0 ∈ K, so −Rf(x̌) has to be contained in z. Since 0 ∈ z is mandatory, we start with an interval vector z which is slightly larger than the interval hull of −Rf(x̌) and 0. To be precise, we say that we ε-inflate a given interval vector z if we apply the following manipulations: We first increase the radius of each component by 10% plus δ, while keeping the midpoint. Here, δ is the smallest positive number of the floating point screen, i.e. δ = 2^{-1023} in IEEE double precision. We then form the interval hull of this vector and the zero vector. If the condition (4.1) is not satisfied with the ε-inflation z of −Rf(x̌), we perform a second test using the intersection z ∩ k. If this still fails, we start over with the ε-inflation of a new vector. The details are given in Algorithm 4 below, which uses exactly the same inflation mechanism as verifynlss.m from INTLAB, a routine for computing enclosures for zeros of general nonlinear systems based on the standard Krawczyk operator and the results from [29, 30].
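In INTLAB terms, one ε-inflation step as just described can be written as follows; hull, midrad, mid and rad are INTLAB functions, and the constants are those given in the text.

    % epsilon-inflation of an interval quantity Z: radius *1.1 + delta, then hull with 0.
    delta = 2^(-1023);
    Z = hull(midrad(mid(Z), 1.1*rad(Z) + delta), 0);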
Algorithm 4 If successful, this algorithm obtains an interval matrix X containing exactly one matrix X with X^2 − A = 0
 1: Use a floating point algorithm to get an approximate square root X̌ of A.
 2: Use a floating point algorithm to get approximations for V, W, Λ in the spectral decomposition (3.1). These approximations will again be denoted V, W, Λ.
 3: {The remaining computations will be performed using machine interval arithmetic}
 4: Compute interval matrices I_V, I_W containing V^{-1} and W^{-1}, resp.   {take verifylss.m from INTLAB, e.g.}
 5: Compute L, an interval matrix containing −Rf(x̌), as in lines 11-14 of Algorithm 2.
 6: Z = L
 7: for k = 1, . . . , kmax do
 8:   ε-inflate Z
 9:   compute U for input X̌, Z as in lines 1-9 of Algorithm 2
10:   if K := L + U ⊆ int Z then {successful}
11:     output X = X̌ + K and stop
12:   else {second try}
13:     put Z^(2) = Z ∩ K
14:     compute U^(2) for input X̌, Z^(2) as in lines 1-9 of Algorithm 2
15:     if K^(2) := L + U^(2) ⊆ int Z^(2) then {successful}
16:       output X = X̌ + K^(2) and stop
17:     else
18:       overwrite Z as intersect(Z, K^(2))
19:     end if
20:   end if
21: end for
In the alternative approach, where we use f̂ instead of f to avoid wrappings, we basically proceed in the same manner, applying the back transformation at the very end. This is summarized in Algorithm 5, in which we do not repeat the steps for the computation of X̌, V, W, Λ, I_V, I_W.
Algorithm 5 If successful, this algorithm obtains an interval matrix X containing a matrix X with X^2 − A = 0
 1: Compute Ĥ, an interval matrix containing −∆^{-1}(V^T ⊗ W) f(x̌), as in lines 10-12 of Algorithm 3.
 2: Ẑ = Ĥ
 3: for k = 1, . . . , kmax do
 4:   ε-inflate Ẑ
 5:   compute N̂ for input X̌, Ẑ as in lines 1-8 of Algorithm 3
 6:   if K̂ := Ĥ + N̂ ⊆ int Ẑ then {successful}
 7:     output X = X̌ + (I_W K̂ I_V) and stop   {back transformation}
 8:   else
 9:     put Ẑ^(2) = Ẑ ∩ K̂
10:     compute N̂^(2) for input X̌, Ẑ^(2) as in lines 1-8 of Algorithm 3
11:     if K̂^(2) := Ĥ + N̂^(2) ⊆ int Ẑ^(2) then {successful}
12:       output X = X̌ + (I_W K̂^(2) I_V) and stop   {back transformation, again}
13:     else
14:       overwrite Ẑ as intersect(Ẑ, K̂^(2))
15:     end if
16:   end if
17: end for
Before reporting our numerical experiments, let us briefly comment on a possible modification of the methods presented. A careful inspection of all algorithms developed so far shows that they remain valid (they still yield enclosures for the sets K and K̂) if we replace W by I_V and I_W by V everywhere. In doing so, we avoid the computation of W and I_W, but the computed enclosures for −Rf(x̌) and −∆^{-1}(V^T ⊗ W)f(x̌) will have the tendency to become larger since they involve more interval quantities. The ε-inflation mechanisms of Algorithms 4 and 5, if successful, will provide enclosures which are the narrower, the smaller the computed enclosures for −Rf(x̌) and −∆^{-1}(V^T ⊗ W)f(x̌), respectively. So we expect the quality of the enclosures to be (slightly) worse if we work with the modifications which do not use W and I_W.
5. Extensions. In this section we briefly describe two possible extensions of Algorithms 4 and 5 which increase their accuracy or their range of applicability.
The first extension results from the observation that the radii of the "initial interval vectors" Z and Ẑ obtained via ε-inflation in Algorithms 4 and 5, resp., crucially depend on the modulus of Rf(x̌). Herein, we have to evaluate f(x̌) = vec(X̌^2 − A) using machine interval arithmetic to make sure that the computed value contains the exact value. This means that we use outward rounding in each step of the sequence of operations used to obtain X̌^2 − A which, for each entry, basically represents an inner product between a row and a column of X̌. Since cancellation has to take place (after all, x̌ is an approximate zero), we will have the situation that the computed interval matrix X̌^2 − A will consist of entries which have relatively large widths. For example, the exact value of an entry could be b = 2 · 10^-15, but the computed enclosing interval b would be something like [−10^-12, +10^-12]. In a programming language providing a scalar product with maximum accuracy, the same entry of X̌^2 − A would be obtained as an interval whose two bounds are consecutive floating point numbers enclosing b. Clearly, the modulus of this interval (which is very close to |b|) can be orders of magnitude smaller than that of the wide interval b above.
For reasons of computational speed, INTLAB does not provide an exact scalar product. It provides, however, a fairly efficient mechanism which allows one to evaluate inner products and to compute an enclosing interval as if computed in k-fold precision. This mechanism, described in detail in [27], is based on error-free transformations and implemented as the INTLAB function dot_.m. In this context it is advisable to consider the final enclosure to be given as the pair (X̌, X) with A^{1/2} ∈ X̌ + X. The explicit computation of X̌ + X would always result in an interval matrix with double precision bounds, whereas keeping the pair (X̌, X) allows one to obtain radii less than machine epsilon in X although the entries in X̌ are of order 1.
The second extension¹ considers the case where A is not diagonalizable or, as shows up in computational practice, where the eigenvector matrix V is ill conditioned. Due to the wrapping effect, the radii of the entries of the various interval matrices to be computed in our algorithms will then tend to become very large, and Algorithms 4 and 5 will fail because the conditions L + U ⊆ int Z and Ĥ + N̂ ⊆ int Ẑ, resp., will not hold.
In such a situation we consider, instead of (3.1), a more stable block diagonalization of X̌ given as

    X̌ = V B W, where V, W, B ∈ C^{n×n}, B block diagonal, V W = I.    (5.1)
Herein,

    B = diag(B_1, B_2, . . . , B_m)

is block diagonal with square, upper triangular blocks B_i ∈ C^{n_i×n_i}, i = 1, . . . , m.
The point is that we can adapt the block sizes in such a way that we have control over the condition of V. An algorithm providing such a block factorization is due to Bavely and Stewart [4]; it is available as the function bdschur.m in the MATLAB Control Toolbox. An upper bound for the condition number can be given as an input to the algorithm, which will then adapt the number of blocks m and their sizes n_i accordingly.
Going through the derivation of Algorithms 2 and 3 we see that they remain valid (i.e. they still compute interval matrices K and K̂ such that vec(K) ⊇ K_f from (2.6) and vec(K̂) ⊇ K̂ from (3.5)) if we replace ∆ by the matrix

    I ⊗ B + B^T ⊗ I.

This includes reformulating the pointwise divisions with the matrix D for which ∆ = Diag(D) in the appropriate manner. Indeed, instead of the generic computation

    y = ∆^{-1} x,

we now have to compute an interval vector y containing all solutions y of the equations

    (I ⊗ B + B^T ⊗ I) y = x,  x ∈ x.    (5.2)

¹ We are grateful to Arnold Neumaier who brought this idea to our attention.
Note that I ⊗ B + B^T ⊗ I is block diagonal with m diagonal blocks of size n_i · n. For example, the first such diagonal block B_1 is given as

    B_1 = [ B + b_11 I      0            · · ·          0          ]
          [ b_12 I          B + b_22 I   · · ·          ⋮          ]
          [   ⋮               ⋱            ⋱            0          ]
          [ b_1n_1 I        b_2n_1 I     · · ·    B + b_n_1 n_1 I  ].
The part of (5.2) corresponding to this block can be dealt with via a back substitution process giving an interval vector y_1 containing the first block of all solutions of (5.2) for x ∈ x. More precisely, we put the whole interval vector x into the right hand side and use the fact that the diagonal blocks of B_1 are upper triangular and that the whole matrix B_1 is block lower triangular. The computational cost of the resulting overall process giving all blocks of y will depend on the sizes n_i of the blocks. If the size of the blocks is bounded by a constant, it will still be O(n^3), but if we have blocks with large sizes, the complexity may be perceptibly higher.
6. Numerical experiments. In this section we test and compare Algorithms 4 and 5. The approximate square roots X̌ are approximate principal square roots obtained using Higham's numerically stable Schur method in floating point, available as the MATLAB m-file rootm.m [13]. Whenever an expression in the algorithms is not specified exactly due to missing brackets, we evaluate 'from right to left', i.e. we compute Y = W(ZV) in the first line of Algorithm 2, etc.
We also present results using a verification routine called vermatfun from Rohn's VERSOFT [28], a collection of INTLAB programs. For a function f and a square, diagonalizable matrix A, this routine first tries to compute enclosing intervals for all quantities in the spectral decomposition A = V ΛV^{-1}. If this is successful, an enclosure for f(A) is obtained via an interval counterpart to the definition f(A) = V f(Λ) V^{-1}.
In all computations we use a PC with a 2.00 GHz Pentium 4 CPU and 1 GB of RAM. We tested our algorithms on the set of matrices listed in Table A.1 in the appendix, where further details on the matrices and their properties are given.
6.1. Results for the standard versions of the algorithms. Figure 6.1 as well as Table A.1 in the appendix present a comparison of the results obtained by Algorithms 4 and 5 and VERSOFT. They report the wall clock time representing the total computational cost, i.e. the time for getting the approximation via rootm.m and for the verification. The time spent solely for rootm.m is also given. In order to appreciate the quality of the enclosures obtained, we display two quantities. As a measure for the absolute quality of the enclosures obtained we report the maximum radius mr of the entries of the enclosing interval matrix X, i.e.

    mr = max_{i,j=1,...,n} rad(X_ij).

The corresponding relative quantity is denoted mrr and defined as

    mrr = max_{i,j=1,...,n} rad(X_ij) / |X_ij|.
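Given an enclosure X computed with INTLAB, both quantities can be read off directly using only mid and rad, with |X_ij| evaluated as |mid| + rad in line with Section 2:

    % Maximum absolute and relative radii of the enclosing interval matrix X.
    mr  = max(max(rad(X)));
    mrr = max(max(rad(X) ./ (abs(mid(X)) + rad(X))));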
[Figure 6.1 here: mr and mrr for each test matrix (left); total time in seconds and time for rootm.m in seconds versus dimension (right), for Algorithm 4, Algorithm 5 and VERSOFT.]
Fig. 6.1. Maximum radius of enclosure (left) and time (right) for different algorithms
We may call − log10 mrr the number of correct significant decimal digits, since it
roughly corresponds to the number of digits to which the upper and the lower bounds
coincide, i.e. the number of significant digits we know to be correct for every entry.
The quantity mr, on the other hand, is an absolute quantity. It represents the number
of correct digits for each entry including “leading zeros”. In situations where the sizes
of the entries of X vary substantially, mrr can be significantly larger than mr. Finally,
k stands for the number of iterations executed in Algorithms 4 and 5, where we put kmax = 5. Any execution time longer than one hour was rated "long" and the computation was aborted.
The left part of Figure 6.1 plots mr and mrr for each of the 26 matrices numbered
as they appear in Table A.1; the right part plots the execution time with the horizontal
axis giving the dimensions of the matrices. The plots from Figure 6.1 are detailed
in Table A.1 in the appendix, revealing the following: Whenever it is successful,
Algorithm 4 is quite comparable to Algorithm 5 with respect to execution time as
well as with respect to the quality of the enclosure. However, there are cases where
Algorithm 4 is not successful, and this comprises cases with small dimensions as well
as cases with large dimensions. Algorithm 5 never failed, whereas VERSOFT seems unable to succeed for matrices with multiple eigenvalues. The quality of the enclosures obtained via VERSOFT, as measured via mr and mrr, is sometimes comparable to that from Algorithms 4 and 5, but quite often the proposed algorithms are one or two orders of magnitude more accurate, an extreme case being the matrix e05r0000. Considering the execution time, we see that the new algorithms are faster by a factor of 8 to 30 as compared to VERSOFT. In the lower right corner of Figure 6.1 we report the wall clock time for the execution of rootm.m, i.e. for the floating point method which produces the approximate square root. We see that the complete Algorithms 4 and 5 scale similarly with the dimension as does rootm.m.

[Figure 6.2 here: three histograms.]
Fig. 6.2. Histograms for correct digits. Left: number of correct digits in the floating point approximation. Middle: number of correct digits in computed enclosing intervals. Right: additional correct digits obtained through enclosing intervals
For many matrices, especially the symmetric ones, Algorithms 4 and 5 spend
about one fourth of their time for rootm.m, the remaining 75% representing the
overhead to pay for the verification part. In the symmetric indefinite and the nonsymmetric cases, the matrix square root A^{1/2} is likely to be complex even though A is real. This means that all INTLAB computations are (automatically) done in complex circular arithmetic, which is more costly than the real circular arithmetic that INTLAB uses when all quantities are real. For this reason, the verification part now becomes
more costly relative to rootm.m, which tries to avoid complex arithmetic as long as
possible, particularly through the use of MATLAB’s Schur decomposition schur.m.
Indeed, the verification part may now represent up to 85% of the total cost, i.e. up
to 7 times the cost for rootm.m.
6.2. Accuracy of the floating point approximation. Once we have computed an enclosure X via Algorithm 4 or 5 we can determine the accuracy of the floating point approximation X̌ obtained via rootm.m. Indeed, if for some entry X̌_ij we have X̌_ij ∉ X_ij, we can compute the (relative) distance

    d_ij = ( |X̌_ij − mid(X_ij)| − rad(X_ij) ) / |X̌_ij|,

so that ⌊−log10 d_ij⌋ represents the number of correct decimal digits in X̌_ij, whereas ⌊−log10 mrr⌋ represents the number of correct decimal digits known from the enclosing interval.
As an example, Figure 6.2 reports our findings for the matrix H, the 50 × 50 matrix A from [14], i.e.

    a_ij = 0.1 for i ≠ j,  a_ii = i^2,  i, j = 1, . . . , 50.
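For concreteness, here is a sketch of this experiment; MATLAB's sqrtm stands in for rootm.m, and X denotes an enclosure assumed to have been computed beforehand with Algorithm 4 or 5.

    % The 50x50 test matrix: a_ii = i^2, a_ij = 0.1 for i ~= j.
    n  = 50;
    A  = 0.1*ones(n) + diag((1:n).^2 - 0.1);
    Xc = sqrtm(A);                              % floating point approximation
    % ... compute an interval enclosure X with Algorithm 4 or 5, then:
    d         = (abs(Xc - mid(X)) - rad(X)) ./ abs(Xc);
    outside   = d > 0;                          % entries with Xc(i,j) not in X(i,j)
    digits_fp = floor(-log10(d(outside)));      % exact digit counts, cf. Figure 6.2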
The floating point approximation X̌ obtained via rootm.m had 2.5% (out of 2500) entries for which X̌_ij ∈ X_ij. For these entries, X_ij cannot be used to determine the exact number of correct digits in X̌_ij. For the remaining ones, the leftmost histogram in Figure 6.2 shows the distribution of the exact number of correct digits. The number of correct digits varies between 11 and 16. For example, 107 entries of X̌ have (exactly) 14 correct digits. In a similar manner, the middle plot of Figure 6.2 shows the (known) correct digits obtained from the interval entries in X. One sees that we always have at least 13 correct digits. Finally, the rightmost histogram in Figure 6.2 shows for how many entries X_ij yields 1, 2, 3 or 4 more correct digits than X̌_ij. These statistics show quite drastically that our interval arithmetic based enclosure methods not only give results with a computed guaranteed bound for the error, but that they also obtain higher accuracy than the "pure" floating point methods.

[Figure 6.3 here: mr and mrr for each test matrix (left); total time in seconds and time for the residual in seconds versus dimension, comparing double and simulated quadruple precision (right).]
Fig. 6.3. Maximum radius of enclosure (left) and time (right) using simulated quadruple precision for the residual
6.3. Higher precision for the residuals. We now turn to results obtained with the first extension of Algorithms 4 and 5 described in Section 5. To be specific, we compute an enclosure for the residual X̌^2 − A with simulated quadruple precision, i.e. we use INTLAB's dot_.m with k = 2. Figure 6.3 reports the results obtained using this improvement; details are in Table A.2 of the appendix. Although all other computations are done in standard double precision, we now very often arrive at relative accuracies which are better than 10^-16. Algorithm 4 fails in two cases only; for all other matrices it is successful in its first iteration, see Table A.2. Algorithm 5 is always successful; in three cases it needs more than one iteration. Comparing the timings, we see that the high precision computation of an enclosure for X̌^2 − A via dot_.m now makes up the major part of the total time. For this reason, we do not report results for the largest matrices of our test set.
Matrix    n  | Standard Krawczyk: time  k  mr / mrr      | With simulated quadruple prec. residual: time  k  mr / mrr
frank     8  | 0.1   1  1.3·10^-10 / 7.7·10^-12          | 0.1   1  1.2·10^-22 / 6.9·10^-24
frank     9  | 0.1   1  5.6·10^-9  / 6.5·10^-11          | 0.1   1  6.2·10^-20 / 7.4·10^-22
frank    10  | 0.2   1  3.6·10^-7  / 6.7·10^-10          | 0.1   1  5.1·10^-18 / 9.9·10^-21
gcdmat   25  | 1.9   1  6.8·10^-15 / 4.6·10^-14          | 2.1   1  2.1·10^-28 / 3.2·10^-27
gcdmat   50  | 89.2  1  2.1·10^-14 / 3.6·10^-13          | 98.4  1  2.4·10^-27 / 1.0·10^-25
Table 6.1. Standard Krawczyk
6.4. Block diagonalization. Finally, we briefly turn to the second extension discussed in Section 5, where we proposed to use a block diagonalization of X̌ in cases where the eigenvector matrix of X̌ is badly conditioned or where A is not diagonalizable. Our example is the (real) gear matrix of dimension 50 from MATLAB's gallery, to which we added a multiple of the identity such that the smallest eigenvalue becomes 1 and, consequently, its principal square root is real. This matrix is defective, having exactly one Jordan block of size 2. Numerically, an eigendecomposition can be computed, but the computed eigenvector matrix has condition number 2.7 · 10^15. Algorithms 4 and 5, whether in their standard form or using simulated quadruple precision computation for the residuals, all failed for this example, and so did VERSOFT. However, bdschur.m allows one to compute a block diagonal decomposition of the form (5.1) where B is diagonal except for one 2 × 2 diagonal block and where V and W are nicely conditioned. Note that, unfortunately, bdschur.m does not allow for complex input, which explains why we shifted the original gear matrix as described above in order to guarantee a real square root.
The modification of Algorithm 4 using this block decomposition succeeds in computing an enclosure for the square root, and it obtains mr = 1.7 · 10^-12 and mrr = 1.0. The time required was 4.8 seconds.
Note that for this small-dimensional example we might as well use the standard Krawczyk method, which we can view as a method relying on a block diagonalization with just one diagonal block. Standard Krawczyk also succeeds for gear and obtains sharper enclosures with mr = 3.9 · 10^-15 and mrr = 1.0. However, the time required by our implementation now is 160 seconds.
We take the opportunity to report some further results for the standard Krawczyk method for selected matrices in Table 6.1. The bottom line is that standard Krawczyk is not sensitive to whether A is diagonalizable or not, nor is it sensitive to the condition number of the eigenvector matrix. Its big disadvantage, which prevents the method from being useful for matrices of dimension 100 or higher, is its computational complexity, as can be seen from the timings reported.
7. Conclusions. We presented two variants of Krawczyk's method which, if successful, obtain interval enclosures for the square root of a matrix A. Both methods require an accurate floating point approximation of the square root. If the eigenvector matrix of A is well conditioned, both algorithms are likely to be successful, even for quite large matrices. While both methods presented achieve comparable accuracy, the range of application of the second algorithm is larger. It suffers less from the wrapping effect, so that it is easier to computationally verify that the hypothesis of Brouwer's fixed point theorem, the mathematical basis of Krawczyk's method, holds. We outlined extensions of these methods, including high precision residual calculations and variants which use a block diagonalization in cases where the eigenvector matrix is ill conditioned. In terms of computational cost, the overhead to be paid for the verification part is 3 to 7 times the cost for obtaining the floating point approximation. This cost is compensated for by the fact that we obtain results with guaranteed accuracy. We showed by an example that this accuracy can actually amount to one to four more correct decimal digits than in the floating point approximation. Since the complexity of our algorithms is O(n^3), we can treat relatively large matrices in a reasonable amount of time. The largest example reported here is the poisson matrix of dimension 1,600. The 1,600^2 = 2.56 million entries of its square root are obtained by computing an enclosure for the solution of a nonlinear system with that many entries.
Acknowledgement. We are grateful to Nick Higham for his helpful comments
on a first draft of this paper.
Appendix A. Details of computations.
This appendix contains tables reporting the results of our numerical experiments.
We used 26 matrices in total, listed in the first column of Table A.1. The matrices
frank, poisson, fiedler, minij, circul, parter, riemann and gcdmat are from
MATLAB’s gallery, matrices tolosa, bwm200, fidap001 and e05r0000 are from the
Matrix Market [26], and matrices dwt 87, bcspwr03, bcsstk22 and bcsstk08 are
from the Harwell-Boeing collection [9]. The 350 × 350 positive definite matrix H with h_ii = i^2 and h_ij = 0.1, i ≠ j, was taken from [14]. Note that the frank matrices are
low dimensional matrices with increasingly ill-conditioned eigenvalues.
Table A.1 gives all details for Algorithms 4 and 5 as well as for vermatfun from VERSOFT. The plots corresponding to Table A.1 are given in Figure 6.1; the meaning of mrr and mr is explained in Section 6. For each matrix, its dimension n and characteristic property (symmetric, unsymmetric etc.) are reported in the first column. The matrices are sorted by increasing dimension. The column k for Algorithm 4 reports the counter for the iteration of Algorithm 4 at which the algorithm succeeded, i.e. when we obtained

    K ⊆ int Z or K^(2) ⊆ int Z^(2).    (A.1)

We set kmax = 5, i.e. a maximum of 5 iterations was performed. Only in one case, namely for the smallest of the frank matrices, did it pay off to try more than one iteration. For 7 matrices, Algorithm 4 failed because after kmax iterations (A.1) was not yet satisfied. In these cases, the algorithm requires noticeably more computational work. Algorithm 5 always succeeded in the first step, i.e. for k = 1, so that we do not report this value in Table A.1.
Table A.2 gives the details for Algorithms 4 and 5 using simulated quadruple
precision to compute enclosures for the residuals. The time spent in the evaluation of
the residuals is also reported.
Table A.1
Details of computations for the standard algorithms; all times are in seconds and k = 1 for all
matrices in Alg. 5. NaN means that the algorithm was not successful.

Matrix (property, n)          | Algorithm 4: time (trootm)  k  mr / mrr  | Algorithm 5: time  mr / mrr  | VERSOFT: time  mr / mrr
frank (unsym., 8)             | 0.1 (0.0)     2   5.7e-8 / 3.7e-9        | 0.0     5.7e-8 / 3.7e-9      | 0.2     2.1e-8 / 7.8e-9
frank (unsym., 9)             | 0.2 (0.0)     5   NaN / NaN              | 0.0     5.5e-6 / 6.6e-8      | 0.3     2.7e-6 / 1.1e-6
frank (unsym., 10)            | 0.3 (0.0)     5   NaN / NaN              | 0.0     7.9e-4 / 1.5e-6      | 0.3     3.0e-4 / 9.4e-5
gcdmat (spd, 50)              | 0.2 (0.1)     1   4.7e-13 / 5.9e-11      | 0.2     4.7e-13 / 5.9e-11    | 3.6     1.2e-11 / 1.5e-9
dwt 87 (sym. indef., 87)      | 0.8 (0.2)     1   6.5e-13 / 1.1e-8       | 0.8     6.5e-13 / 1.1e-8     | 9.4     1.6e-3 / 5.7e-1
tolosa (unsym., 90)           | 3.6 (0.2)     5   NaN / NaN              | 1.1     8.2e-8 / 2.0e-7      | 11.4    1.2e-9 / 2.1e-9
poisson (spd, 100)            | 0.7 (0.3)     1   5.0e-13 / 1.8e-7       | 0.6     5.0e-13 / 1.8e-7     | 24.6    NaN / NaN
bcspwr03 (sym. indef., 118)   | 1.5 (0.3)     1   7.4e-13 / 6.4e-8       | 1.5     7.4e-13 / 6.4e-8     | 20.1    NaN / NaN
fiedler (sym. indef., 130)    | 1.9 (0.5)     1   3.4e-9 / 8.7e-9        | 1.9     1.9e-9 / 4.7e-9      | 23.6    5.4e-8 / 9.5e-8
bcsstk22 (sym. indef., 138)   | 1.2 (0.3)     1   1.1e-10 / 1.0          | 1.2     1.1e-10 / 1.0        | 25.5    1.1e-7 / 1.0
minij (spd, 140)              | 3.3 (0.4)     5   NaN / NaN              | 1.2     1.5e-9 / 1.9e-7      | 26.2    1.2e-7 / 7.9e-6
circul (circulant, 150)       | 4.2 (0.6)     1   1.6e-9 / 2.8e-8        | 4.0     1.1e-9 / 1.9e-8      | 35.0    1.7e-9 / 3.0e-8
fiedler (sym. indef., 160)    | 12.1 (0.6)    5   NaN / NaN              | 3.2     4.2e-9 / 1.1e-8      | 36.3    1.4e-7 / 2.4e-7
minij (spd, 190)              | 7.0 (0.7)     5   NaN / NaN              | 2.5     5.0e-9 / 8.9e-7      | 50.8    4.7e-7 / 3.9e-5
bwm200 (unsym., 200)          | 8.7 (1.1)     1   7.0e-11 / 1.4e-7       | 8.6     7.0e-11 / 1.4e-7     | 61.2    1.8e-10 / 8.2e-7
fidap001 (unsym., 216)        | 7.5 (1.0)     1   3.6e-14 / 1.0          | 7.4     3.6e-14 / 1.0        | 68.3    6.1e-12 / 1.0
e05r0000 (unsym., 236)        | 9.5 (1.2)     1   2.2e-12 / 1.8e-5       | 9.4     2.2e-12 / 1.8e-5     | 82.9    2.3e-3 / 1.0
gcdmat (spd, 250)             | 5.4 (1.3)     1   1.9e-11 / 2.8e-6       | 5.3     1.9e-11 / 2.8e-6     | 93.7    1.3e-9 / 1.4e-4
helmert (unsym., 260)         | 67.2 (1.8)    5   NaN / NaN              | 20.5    2.3e-9 / 8.2e-7      | 171.8   3.2e-10 / 3.2e-7
parter (Toeplitz, 325)        | 39.6 (3.3)    1   1.7e-9 / 2.8e-7        | 38.4    1.0e-9 / 1.7e-7      | 328.1   4.8e-8 / 8.0e-6
H (spd, 350)                  | 15.4 (3.4)    1   1.7e-12 / 6.8e-12      | 15.0    1.7e-12 / 6.8e-12    | 219.8   4.9e-12 / 4.1e-12
poisson (spd, 400)            | 23.0 (4.8)    1   8.2e-12 / 2.5e-4       | 22.4    7.3e-12 / 2.2e-4     | long
riemann (unsym., 520)         | 156.2 (53.1)  1   1.6e-8 / 8.8e-7        | 153.0   1.2e-8 / 6.0e-7      | long
gcdmat (spd, 800)             | 187.0 (33.3)  1   3.8e-10 / 1.2e-2       | 176.1   3.6e-10 / 1.1e-2     | long
bcsstk08 (spd, 1074)          | 474.0 (77.4)  1   3.5e-8 / 1.0           | 464.4   3.5e-8 / 1.0         | long
poisson (spd, 1600)           | 5515 (248.1)  5   NaN / NaN              | 1140    1.1e-10 / 2.6e-1     | long
Table A.2
Details using simulated quadruple precision for residual computation.

Matrix      n     | Algorithm 4: time  k  mr / mrr     | Algorithm 5: time  k  mr / mrr
frank       8     | 0.1     1   4.7e-18 / 3.0e-19      | 0.0     1   1.5e-17 / 9.5e-19
frank       9     | 0.2     3   9.3e-15 / 6.4e-16      | 0.1     1   1.7e-14 / 1.4e-15
frank       10    | 0.2     5   4.3e-11 / 2.0e-13      | 0.2     3   1.1e-11 / 1.9e-13
gcdmat      50    | 0.3     1   8.3e-26 / 1.2e-23      | 0.3     1   9.0e-26 / 1.2e-23
dwt 87      87    | 1.6     1   1.2e-24 / 2.0e-20      | 1.5     1   1.2e-24 / 2.2e-20
tolosa      90    | 4.3     5   NaN / NaN              | 2.5     2   1.1e-10 / 2.6e-10
poisson     100   | 0.9     1   2.9e-24 / 1.0e-18      | 0.9     1   3.3e-23 / 1.2e-18
bcspwr03    118   | 3.4     1   3.1e-24 / 3.2e-19      | 3.3     1   3.6e-24 / 3.7e-19
fiedler     130   | 4.5     1   4.7e-20 / 1.2e-19      | 4.5     1   7.3e-20 / 1.8e-19
bcsstk22    138   | 1.8     1   3.7e-22 / 2.2e-7       | 1.7     1   8.9e-22 / 3.6e-7
minij       140   | 1.9     1   1.1e-20 / 1.4e-18      | 1.8     1   5.9e-20 / 7.8e-18
circul      150   | 7.7     1   4.6e-21 / 7.9e-20      | 7.4     1   3.4e-21 / 5.8e-20
fiedler     160   | 7.7     1   7.6e-20 / 1.9e-19      | 7.7     1   1.5e-19 / 3.8e-19
minij       190   | 4.0     1   7.9e-20 / 1.4e-17      | 4.0     1   5.2e-19 / 9.4e-17
bwm200      200   | 19.1    1   2.4e-21 / 6.9e-18      | 18.9    1   1.7e-21 / 4.5e-18
fidap001    216   | 21.0    1   2.2e-23 / 4.0e-6       | 20.9    1   2.5e-23 / 4.2e-6
e05r0000    236   | 27.9    1   3.4e-22 / 4.7e-14      | 27.5    1   4.6e-22 / 6.5e-14
gcdmat      250   | 8.9     1   2.3e-23 / 3.8e-18      | 8.5     1   2.1e-23 / 3.4e-18
helmert     260   | 90.8    5   NaN / NaN              | 58.0    2   3.3e-15 / 6.6e-13
parter      325   | 91.6    1   4.7e-19 / 7.5e-17      | 91.1    1   2.3e-18 / 3.7e-16
H           350   | 26.9    1   1.8e-22 / 7.6e-22      | 26.4    1   1.8e-22 / 7.7e-22
poisson     400   | 40.5    1   6.9e-22 / 2.1e-14      | 40.3    1   7.0e-22 / 2.1e-14
riemann     520   | 496.5   1   1.1e-17 / 1.1e-15      | 480.2   1   1.3e-18 / 8.7e-17
REFERENCES
[1] G. Alefeld and J. Herzberger, Introduction to Interval Computations, Computer Science
and Applied Mathematics, Academic Press, New York, 1983.
[2] G. Alefeld and N. Schneider, On square roots of M-matrices, Linear Algebra Appl., 42
(1982), pp. 119–132.
[3] R. Bartels and G. Stewart, Solution of the matrix equation AX + XB = C, Comm. ACM,
15 (1972), pp. 820–826.
[4] A. Bavely and G. Stewart, An algorithm for computing reducing subspaces by block diagonalization, SIAM J. Numer. Anal., 16 (1979), pp. 359–367.
[5] A. Björck and S. Hammarling, A Schur method for the square root of a matrix, Linear
Algebra Appl., 52/53 (1983), pp. 127–140.
[6] G. Cross and P. Lancaster, Square Roots of Complex Matrices, Linear Multilinear Algebra,
1 (1974), pp. 289–293.
[7] E. D. Denman, Roots of real matrices, Linear Algebra Appl., 36 (1981), pp. 133–139.
[8] E. D. Denman and A. N. Beavers, The matrix sign function and computations in systems,
Appl. Math. Comput., 2 (1976), pp. 63–94.
[9] I. Duff, R. Grimes, and J. Lewis, Users guide for Harwell-Boeing sparse matrix test problems
collection, Tech. Report RAL-92-086, Computing and Information Systems Department,
Rutherford Appleton Laboratory, Didcot, U.K., 1992.
[10] N. J. Higham, Newton’s method for the matrix square root, Math. Comput., 46 (1986), pp. 537–
549.
[11] N. J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl., 88/89 (1987), pp. 405–430.
[12] N. J. Higham, Stable iterations for the matrix square root, Numer. Algorithms, 15 (1997), pp. 227–242.
[13] N. J. Higham, The matrix computation toolbox for MATLAB (version 1.0), Tech. Report Numerical Analysis Report No. 410, University of Manchester, 2002. http://www.ma.man.ac.uk/~nareports.
[14] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, 2008.
[15] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press,
Cambridge, 1994.
[16] W. Hoskins and D. Walton, A faster method of computing the square root of a matrix, IEEE
Trans. Autom. Control, 23 (1978), pp. 494–495.
[17] R. B. Kearfott, Rigorous Global Search: Continuous Problems, Kluwer Academic Publishers,
Dordrecht, 1996.
[18] R. B. Kearfott, M. Nakao, A. Neumaier, S. Rump, S. Shary, and P. van Hentenryck, Standardized notation in interval analysis, 2005. http://www.mat.univie.ac.at/~neum/ms/notation.pdf.
[19] R. Klatte, U. W. Kulisch, A. Wiethoff, C. Lawo, and M. Rauch, C-XSC. A C++ Class
Library for Extended Scientific Computing, Springer-Verlag, Berlin, 1993.
[20] R. Krawczyk, Newton-Algorithmen zur Bestimmung von Nullstellen mit Fehlerschranken,
Computing, 4 (1969), pp. 187–201.
[21] W. Luther and W. Otten, The complex arithmetic-geometric mean and multiple-precision
matrix functions, in Scientific Computing and Validated Numerics, G. Alefeld, A. Frommer,
and B. Lang, eds., no. 90 in Math. Res., Berlin, 1996, Akademie Verlag, pp. 52–58.
[22] B. Meini, The matrix square root from a new functional perspective: Theoretical results and
computational issues, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 362–376.
[23] R. E. Moore, A test for existence of solutions to nonlinear systems, SIAM J. Numer. Anal.,
14 (1977), pp. 611–615.
[24] R. E. Moore, R. B. Kearfott, and M. J. Cloud, Introduction to Interval Analysis, SIAM,
Philadelphia, 2009.
[25] A. Neumaier, Interval Methods for Systems of Equations, no. 37 in Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 1990.
[26] National Institute of Standards and Technology, Matrix Market, a web resource for test matrix collections. http://math.nist.gov/Matrix-Market/.
[27] T. Ogita, S. M. Rump, and S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput.,
26 (2005), pp. 1955–1988.
[28] J. Rohn, VERSOFT: Verification software in MATLAB/INTLAB. http://uivtx.cs.cas.cz/~rohn/matlab.
[29] S. M. Rump, Solving algebraic problems with high accuracy, in A New Approach to Scientific
Computation, W. Miranker and E. Kaucher, eds., vol. 7 of Comput. Sci. Appl. Math., New
York, 1983, Academic Press, pp. 51–120.
[30] S. M. Rump, Verification methods for dense and sparse systems of equations, in Topics in Validated Computations, J. Herzberger, ed., vol. 5 of Stud. Comput. Math., Amsterdam, 1994, Elsevier, pp. 63–135.
[31] S. M. Rump, Expansion and estimation of the range of nonlinear functions, Math. Comput., 65 (1996), pp. 1503–1512.
[32] S. M. Rump, INTLAB – INTerval LABoratory, in Developments in Reliable Computing, T. Csendes, ed., Dordrecht, 1999, Kluwer Academic Publishers, pp. 77–104.
[33] M. Smith, Numerical Computation of Matrix Functions, PhD thesis, University of Manchester, 2002.
[34] R. S. Varga, Matrix Iterative Analysis, vol. 27 of Springer Series in Computational Mathematics, Springer, Berlin, 2nd ed., 2000.