Numerical methods for Vandermonde systems with particular points distribution

T. Tommasini
Dipartimento di Matematica, Università di Bologna
A.A. 2002-2003, revised February 2004

INDEX
1. Introduction
2. Inverse matrix, determinant and matrix-vector product
3. System solution
   3.1. Primal system
   3.2. Dual system
4. Parallel implementation and computational analysis
5. Errors and perturbation bounds
6. Rectangular Vandermonde matrices
7. Least squares solution
8. Computational analysis
9. Perturbation bounds for the QRD factorization
Appendix
Conclusions
References

1. Introduction

In this paper we consider column-wise Vandermonde matrices of order n = kq of the form

(1.1)   V(ω_1, ω_2, …, ω_n) =
            [ 1           1           …   1          ]
            [ ω_1         ω_2         …   ω_n        ]
            [ …           …           …   …          ]
            [ ω_1^{n-1}   ω_2^{n-1}   …   ω_n^{n-1}  ]

where the points ω_{(j-1)k+s} (s = 1, …, k) are the k-th roots of q distinct complex numbers a_j ≠ 0 (j = 1, …, q), that is,

(1.2)   ω_{(j-1)k+t}^k = a_j ,   t = 1, …, k .

In other words, they are equally spaced points on concentric circles in the complex plane. The transpose of V is called a row-wise Vandermonde matrix, and it arises in the polynomial interpolation problem on this particular points distribution.

We also consider the simpler case of a Vandermonde matrix of order n = 2q (k = 2), where ±λ_j (j = 1, …, q) are q distinct pairs of nonzero complex numbers, the two square roots of the complex numbers a_j (j = 1, …, q):

(1.3)   V(-λ_1, λ_1, …, -λ_q, λ_q) =
            [ 1              1            …   1              1            ]
            [ -λ_1           +λ_1         …   -λ_q           +λ_q         ]
            [ …              …            …   …              …            ]
            [ (-λ_1)^{n-1}   +λ_1^{n-1}   …   (-λ_q)^{n-1}   +λ_q^{n-1}   ]

For simplicity we have used the same index for the two roots of a_j that are opposite in sign. We will call this the symmetric case.

In the next pages we derive a compact expression of V by means of the Kronecker product, and then formulas and methods for the solution of the following problems:
a. inverse matrix and determinant;
b. primal and dual system solution;
c. generalized inverse;
d. least squares problem;
e. errors and perturbation bounds.

In the general case (1.1), let V_j denote the k×k matrices

(1.4)   V_j =
            [ 1                    1                    …   1             ]
            [ ω_{(j-1)k+1}         ω_{(j-1)k+2}         …   ω_{jk}        ]
            [ …                    …                    …   …             ]
            [ ω_{(j-1)k+1}^{k-1}   ω_{(j-1)k+2}^{k-1}   …   ω_{jk}^{k-1}  ]
        ,   j = 1, …, q ;

in the symmetric case (k = 2, n = 2q) they are simply

(1.5)   V_j = [ 1      1   ]
              [ -λ_j   λ_j ] ,   j = 1, …, q .

Then the matrix V can be written in the block form (see [10])

(1.6)   V = [ V_1              V_2              …   V_q              ]
            [ a_1 V_1          a_2 V_2          …   a_q V_q          ]
            [ …                …                …   …                ]
            [ a_1^{q-1} V_1    a_2^{q-1} V_2    …   a_q^{q-1} V_q    ]

Let A_q = {a_j^{i-1}}, i = 1, …, q, j = 1, …, q, denote the q×q column-wise Vandermonde matrix on the points a_j (j = 1, …, q). By means of the Kronecker product, V can be expressed as

(1.7)   V = (A_q ⊗ I_k) D ,

where I_k is the identity matrix of order k and D is the block diagonal matrix

(1.8)   D = diag(V_1, V_2, …, V_q) .

From (1.7) we obtain the following formulation of the conjugate transpose of V:

(1.9)   V^H = D^H (A_q^H ⊗ I_k) .

In the symmetric case the expressions (1.7), (1.8) and (1.9) hold with k = 2 and the V_j in D given by (1.5). The extension to the rectangular case is discussed in Sect. 6.

2. Inverse matrix, determinant and matrix-vector product

From the previous formulation and from known properties of the Kronecker product (see [4], [5]), the inverses of V and V^H can be expressed as

(2.1)   V^{-1} = D^{-1} (A_q^{-1} ⊗ I_k) ,

(2.2)   V^{-H} = (A_q^{-H} ⊗ I_k) D^{-H} ,

where

(2.3)   D^{-1} = diag(V_1^{-1}, V_2^{-1}, …, V_q^{-1}) ,

(2.4)   D^{-H} = diag(V_1^{-H}, V_2^{-H}, …, V_q^{-H}) .

These formulas involve the inversion of the Vandermonde matrices A_q and A_q^H of order q = n/k (q = n/2 in the symmetric case) and of the well known k-order matrices V_j and their conjugate transposes.
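The block identity (1.7) and the inversion formula (2.1) are easy to check numerically. The following NumPy sketch is ours, not the paper's (all function and variable names are illustrative); it builds V on the k-th roots of a few sample numbers a_j, taking the principal k-th root of each a_j to generate the points, and verifies both identities.

    import numpy as np

    def structured_vandermonde(a, k):
        """Return (V, A_q, D) for the points given by the k-th roots of the a_j."""
        q = len(a)
        eps = np.exp(2j * np.pi * np.arange(k) / k)                # k-th roots of unity
        w = np.concatenate([(aj ** (1.0 / k)) * eps for aj in a])  # all k-th roots of each a_j
        n = k * q
        V = np.vander(w, n, increasing=True).T                     # V[i, m] = w_m^i (column-wise)
        A_q = np.vander(a, q, increasing=True).T                   # A_q[i, j] = a_j^i
        D = np.zeros((n, n), dtype=complex)                        # D = diag(V_1, ..., V_q)
        for j in range(q):
            D[j*k:(j+1)*k, j*k:(j+1)*k] = np.vander(w[j*k:(j+1)*k], k, increasing=True).T
        return V, A_q, D

    a = np.array([1.5, 2.0 + 1.0j, -3.0, 0.5j], dtype=complex)     # q = 4 distinct nonzero a_j
    k = 3
    V, A_q, D = structured_vandermonde(a, k)                       # n = kq = 12
    print(np.allclose(V, np.kron(A_q, np.eye(k)) @ D))             # checks (1.7)
    V_inv = np.linalg.inv(D) @ np.kron(np.linalg.inv(A_q), np.eye(k))   # (2.1)
    print(np.allclose(V_inv @ V, np.eye(k * len(a))))              # checks (2.1)

Only the q×q matrix A_q and the k×k diagonal blocks of D are ever inverted here, which is the point of the decomposition.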
Moreover we can use the same formula (1.7) to evaluate the determinant of the Vandermonde matrix V. By means of the Binet theorem and of a property of the Kronecker product, we can write

(2.5)   det(V) = det[(A_q ⊗ I_k) D] = det(A_q)^k det(D) ,

where

(2.6)   det(D) = ∏_{j=1,q} det(V_j) .

In other words, the determinant of V may be obtained from the determinant of the reduced-dimension matrix A_q, with a computational cost of order q²/2. The computation of det(D) is of order qk²/2 and is not relevant when k << q.

We can also use formula (1.7) to compute the matrix-vector product p = Vu, where u is an arbitrary n-dimensional vector:

(2.7)   p = Vu = (A_q ⊗ I_k) D u .

Let z = Du, and let Z be the q×k matrix obtained from the vector z row-wise, that is,

(2.8)   z = vect(Z) ,   Z = [ z_1            …   z_k     ]
                            [ z_{k+1}        …   z_{2k}  ]
                            [ …              …   …       ]
                            [ z_{(q-1)k+1}   …   z_{qk}  ]

and in the same way let P be the q×k matrix obtained from the vector p row-wise, p = vect(P). Then

(2.9)   p = vect(P) = (A_q ⊗ I_k) z = (A_q ⊗ I_k) vect(Z) .

If M, N and Q^T are matrices of dimensions such that their product is defined, and vect(·) denotes the vector obtained from a matrix by rows, then (see [1])

(2.10)  vect(M N Q^T) = (M ⊗ Q) vect(N) .

Applying (2.10) to (2.9), the matrix-vector multiplication p = Vu can be expressed as

(2.11)  p = vect(P) = vect(A_q Z) .

Let us consider the following partition of the vectors u and z into subvectors of dimension k:

(2.12)  u = [u_1; u_2; …; u_q] ,   z = [z_1; z_2; …; z_q] .

Then, in order to reduce the computational effort, the product p = Vu can be computed in the following way:

Algorithm 2.1
  ω_{(j-1)k+t}^k = a_j ,  t = 1, …, k ,  j = 1, …, q
  A_q = {a_j^{i-1}}, i = 1, …, q, j = 1, …, q
  V_j (j = 1, …, q) as in (1.4)
  u = [u_1; …; u_q]
  Compute z_j = V_j u_j ,  j = 1, …, q
  z = vect(Z)  (by rows)
  Compute P = A_q Z
  p = vect(P)  (by rows)

This means that the product Vu may be computed by executing the product A_q Z, with a computational cost c_new = 2kq² + 2k²q instead of c = 2n² = 2k²q². Disregarding the lower-order cost 2k²q of the computation of z = Du (generally k << q), an approximate speed-up factor c/c_new ≈ k is obtained.

3. System solution

3.1. Primal system

We now consider linear systems of equations of the form

(3.1.1)   V x = b ,

where V is the matrix (1.1) and x, b are n-dimensional vectors. Using the expression (2.1) of the inverse of V, we can write the solution x of (3.1.1) as

(3.1.2)   x = D^{-1} (A_q^{-1} ⊗ I_k) b .

In order to reduce the computational effort of the solution, let B be the q×k matrix obtained from the vector b row-wise,

(3.1.3)   b = vect(B) ,   B = [ b_1            …   b_k     ]   = [b_{.1}, b_{.2}, …, b_{.k}] ,
                              [ b_{k+1}        …   b_{2k}  ]
                              [ …              …   …       ]
                              [ b_{(q-1)k+1}   …   b_{qk}  ]

and then, using some known formulas for the Kronecker product (see [1]), we obtain the following expression of x:

(3.1.4)   x = D^{-1} (A_q^{-1} ⊗ I_k) vect(B) = D^{-1} vect(A_q^{-1} B) .

In [10], starting from this formulation of the solution, a less expensive algorithm is derived to compute x, as follows. Let C = A_q^{-1} B, and let c_{.i} and b_{.i} denote the columns of C and B respectively; then we may compute C by solving the k Vandermonde systems

(3.1.5)   A_q c_{.i} = b_{.i} ,   i = 1, …, k ,

with the Björck-Pereyra algorithm. Let c = vect(C), and consider the partition of the vectors c and x into k-dimensional subvectors c_j (j = 1, …, q) and x_j (j = 1, …, q) respectively; then we may solve the q systems of order k

(3.1.6)   V_j x_j = c_j ,   j = 1, …, q .

Moreover,

(3.1.7)   V_j = B_j E ,   j = 1, …, q ,

where B_j = diag(1, λ_j, …, λ_j^{k-1}) is a diagonal matrix (λ_j = ω_{(j-1)k+1} is a fixed k-th root of a_j) and E = V(ε_1, ε_2, …, ε_k) is the Vandermonde matrix on the k-th roots ε_i (i = 1, …, k) of 1. Then, instead of solving (3.1.6) directly, we may compute the DFT of the vectors d_j = B_j^{-1} c_j (j = 1, …, q):

(3.1.8)   x_j = (1/k) E^H d_j ,   j = 1, …, q .
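As a small illustration of (3.1.7)-(3.1.8), the sketch below (helper names are assumptions of ours) solves a single k-order block system V_j x_j = c_j with one FFT: since E is the Vandermonde matrix on the k-th roots of unity, E^H d is a plain DFT of d.

    import numpy as np

    def solve_block(l_j, c_j):
        """Solve V_j x_j = c_j, V_j the Vandermonde matrix on l_j * (k-th roots of 1)."""
        k = len(c_j)
        d_j = c_j / l_j ** np.arange(k)        # d_j = B_j^{-1} c_j, B_j = diag(1, l_j, ..., l_j^{k-1})
        return np.fft.fft(d_j) / k             # x_j = (1/k) E^H d_j ; E^H d is the DFT of d

    # check against a direct solve for one block
    k, l_j = 4, 1.3 - 0.7j
    w = l_j * np.exp(2j * np.pi * np.arange(k) / k)    # the k roots of a_j = l_j^k
    V_j = np.vander(w, k, increasing=True).T
    c_j = np.random.randn(k) + 1j * np.random.randn(k)
    print(np.allclose(solve_block(l_j, c_j), np.linalg.solve(V_j, c_j)))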
Given the Vandermonde matrix V (1.1) and the right-hand side vector b, the algorithm for the solution of the primal system on the special points distribution satisfying (1.2) can be summarized in this way:

Algorithm SV
  b = vect(B) ,   B = [b_{.1}, b_{.2}, …, b_{.k}]
  Solve with the Björck-Pereyra algorithm  A_q c_{.i} = b_{.i} ,  i = 1, …, k
  C = A_q^{-1} B = [c_{.1}, c_{.2}, …, c_{.k}] ,   c = vect(C) = [c_1; …; c_q]
  B_j = diag(1, λ_j, …, λ_j^{k-1}) ,  j = 1, …, q  (λ_j a fixed k-th root of a_j)
  E = V(ε_1, ε_2, …, ε_k)  (ε_i , i = 1, …, k , the k-th roots of 1)
  Compute d_j = B_j^{-1} c_j ,  j = 1, …, q
  Compute by DFT  x_j = (1/k) E^H d_j ,  j = 1, …, q ,   x = [x_1; …; x_q]

In the symmetric case, when k = 2 and V is the matrix (1.3), we have to solve only two Vandermonde systems (3.1.5) of halved dimension q = n/2. Then we may use the inverse of V_j in order to compute the solution directly:

(3.1.9)   x_j = V_j^{-1} c_j = (1/(2λ_j)) [ λ_j   -1 ] c_j ,   j = 1, …, q .
                                          [ λ_j    1 ]

Given the Vandermonde matrix V (1.3) and the right-hand side vector b, the algorithm for the primal system solution on a symmetric points distribution can be expressed as:

Algorithm SSV
  b = vect(B) ,   B = [b_{.1}, b_{.2}]
  Solve with the Björck-Pereyra algorithm  A_q c_{.i} = b_{.i} ,  i = 1, 2
  C = A_q^{-1} B = [c_{.1}, c_{.2}] ,   c = vect(C) = [c_1; …; c_q]
  V_j = [ 1  1 ; -λ_j  λ_j ] ,  j = 1, …, q
  Compute x_j = V_j^{-1} c_j ,  j = 1, …, q ,   x = [x_1; …; x_q]

In a different way, it is possible to compute the inverse of A_q directly, for example with the Parker-Traub algorithm, and then the system solution is

(3.1.10)   x = D^{-1} (A_q^{-1} ⊗ I_k) vect(B) = D^{-1} vect(A_q^{-1} B) .

Therefore a second algorithm for the primal system solution may be the following:

Algorithm SV1
  1. compute the inverse of A_q with the Parker-Traub algorithm;
  2. compute the matrix product A_q^{-1} B;
  3. compute x by DFT using (3.1.7) and (3.1.8).

3.2. Dual system

In a similar way, consider the dual system

(3.2.1)   V^H y = f ,

where V is the Vandermonde matrix of order n = kq given by (1.1) and f is the right-hand side vector. We can write the solution vector y using the expression of the inverse of V^H:

(3.2.2)   y = (A_q^{-H} ⊗ I_k) D^{-H} f ,

where D^{-H} = diag(V_1^{-H}, V_2^{-H}, …, V_q^{-H}). If we set

(3.2.3)   g = D^{-H} f ,   g = vect(G) ,

where G is the q×k matrix obtained from the vector g row-wise, then (3.2.2) may be written as

(3.2.4)   y = (A_q^{-H} ⊗ I_k) vect(G) = vect(A_q^{-H} G) .

In order to compute g, consider the following partition of g and f into k-dimensional subvectors,

(3.2.5)   g = [g_1; …; g_q] ,   f = [f_1; …; f_q] ,

and the q Vandermonde systems of order k

(3.2.6)   V_j^H g_j = f_j ,   j = 1, …, q .

Using (3.1.7) in (3.2.6), we obtain

(3.2.7)   E^H B_j^H g_j = f_j ,   j = 1, …, q .

Setting h_j = B_j^H g_j (j = 1, …, q), we can compute g_j = B_j^{-H} h_j (j = 1, …, q), with h_j given by the DFT algorithm:

(3.2.8)   h_j = (1/k) E f_j ,   j = 1, …, q .

Then, if y = vect(Y), (3.2.4) is equivalent to

(3.2.9)   Y = A_q^{-H} G .

Considering the columns of the matrices Y and G,

(3.2.10)   Y = [y_{.1}, y_{.2}, …, y_{.k}] ,   G = [g_{.1}, g_{.2}, …, g_{.k}] ,

(3.2.9) may be written as the k dual Vandermonde systems of order q = n/k

(3.2.11)   A_q^H y_{.i} = g_{.i} ,   i = 1, …, k ,

which can then be solved by the Björck-Pereyra algorithm.
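The dual block step (3.2.7)-(3.2.8) admits a similar one-FFT sketch (again with assumed names): here the inverse DFT supplies h_j = (1/k) E f_j, followed by the diagonal scaling g_j = B_j^{-H} h_j.

    import numpy as np

    def solve_block_dual(l_j, f_j):
        """Solve V_j^H g_j = f_j with V_j = B_j E as in (3.1.7)."""
        k = len(f_j)
        h_j = np.fft.ifft(f_j)                      # h_j = (1/k) E f_j
        return h_j / np.conj(l_j) ** np.arange(k)   # g_j = B_j^{-H} h_j

    # check against a direct solve for one block
    k, l_j = 4, 0.8 + 0.5j
    w = l_j * np.exp(2j * np.pi * np.arange(k) / k)
    V_j = np.vander(w, k, increasing=True).T
    f_j = np.random.randn(k) + 1j * np.random.randn(k)
    print(np.allclose(solve_block_dual(l_j, f_j), np.linalg.solve(V_j.conj().T, f_j)))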
Given the Vandermonde matrix V^H and the right-hand side vector f, the algorithm for the solution of the dual system (3.2.1) can be summarized in this way:

Algorithm SVH
  f = [f_1; …; f_q] ,   g = D^{-H} f = [g_1; …; g_q] ,   g = vect(G)
  B_j = diag(1, λ_j, …, λ_j^{k-1}) ,  j = 1, …, q  (λ_j a fixed k-th root of a_j)
  E = V(ε_1, ε_2, …, ε_k)  (ε_i , i = 1, …, k , the k-th roots of 1)
  Compute by DFT  h_j = (1/k) E f_j ,  j = 1, …, q
  Compute g_j = B_j^{-H} h_j ,  j = 1, …, q
  G = [g_{.1}, g_{.2}, …, g_{.k}] ,   Y = [y_{.1}, y_{.2}, …, y_{.k}] ,   y = vect(Y)
  Solve with the Björck-Pereyra algorithm  A_q^H y_{.i} = g_{.i} ,  i = 1, …, k

In the symmetric case, the q systems (3.2.6) can be solved easily by means of the inverse of V_j^H:

(3.2.12)   g_j = V_j^{-H} f_j ,   j = 1, …, q .

This method for the solution of the dual system on a symmetric points distribution can be summarized in this way:

Algorithm SSVH
  f = [f_1; …; f_q] ,   g = D^{-H} f = [g_1; …; g_q] ,   g = vect(G)
  V_j = [ 1  1 ; -λ_j  λ_j ] ,  j = 1, …, q
  Compute g_j = V_j^{-H} f_j ,  j = 1, …, q
  G = [g_{.1}, g_{.2}] ,   Y = [y_{.1}, y_{.2}] ,   y = vect(Y)
  Solve with the Björck-Pereyra algorithm  A_q^H y_{.i} = g_{.i} ,  i = 1, 2

In the same manner as for the primal system, a second algorithm for the solution of (3.2.1) may be the following:

Algorithm SVH1
  1. compute the inverse of A_q^H with the Parker-Traub algorithm;
  2. compute g by DFT using (3.2.7) and (3.2.8);
  3. compute the matrix product A_q^{-H} G as in (3.2.9).

4. Parallel implementation and computational analysis

The compact matrix formulation of the algorithms SV, SSV and SVH, SSVH derived in the previous section is particularly suitable for parallel implementation. We therefore suppose that k parallel processors are available, with k << q and, to simplify, q = kl; moreover, every processor may be vectorial. Under these hypotheses, the parallel formulation of the algorithm SV can be expressed in the following way:

Algorithm PARSV
  1. compute simultaneously on the k processors the k columns c_{.i} (i = 1, …, k) of C with the Björck-Pereyra algorithm, applied to the systems (3.1.5);
  2. compute simultaneously k subvectors d_j = B_j^{-1} c_j (j = 1, …, q) in l major steps;
  3. compute simultaneously the DFT of k vectors d_j; after l major steps all the q systems are solved.

The parallel implementation of the dual system solution algorithm is analogous. Both for the primal and the dual system, when k = 2 the parallel version also simplifies.

It is well known that an n×n Vandermonde system can be solved sequentially with O(n²) operations. The sequential implementation of the SV algorithm requires the following number of operations:

(4.1)   C_1 = α k q² + β q k log k + q k ,

where α and β are constants. With k processors, the parallel algorithm PARSV requires

(4.2)   C_k = α q² + β q log k + q ,

that is, on k processors a theoretical speed-up

(4.3)   S_k = C_1 / C_k = k

is obtained and, consequently, maximum efficiency.

Moreover, it is important to underline that the previous algorithm SV is competitive also in sequential computation. Indeed, if we consider the speed-up of its implementation with respect to the Björck-Pereyra algorithm, the following gain is obtained:

(4.4)   S = α k² q² / (α k q² + β q k log k + q k)
          = α k q / (α q + β log k + 1)
          = k (1 + (β log k + 1)/(α q))^{-1} .

If we use, for example, the coefficients α = 5/2 and β = 5 of the known solution algorithms (see [2], [4]), the following operation counts (×, /, +, -) are obtained:

(4.5)   C_k = (5/2) q² + 5 q log k + q ,

(4.6)   C_1 = (5/2) k q² + 5 q k log k + q k ,

(4.7)   S = k (1 + 2(5 log k + 1)/(5q))^{-1} .

This also means that, for Vandermonde matrices of this type, the sequential implementation of the presented algorithm is about k times faster than the Björck-Pereyra algorithm when k << q. In other words, the asymptotic speed-up is k.
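For concreteness, the short snippet below evaluates the counts (4.5)-(4.6) and the speed-up they imply for a few block sizes; the base-2 logarithm and the constants α = 5/2, β = 5 are the assumptions stated above, so the numbers are only indicative.

    import math

    def speedup(k, q, alpha=2.5, beta=5.0):
        c_direct = alpha * (k * q) ** 2                                  # O(n^2) solver, n = kq
        c_sv = alpha * k * q**2 + beta * q * k * math.log2(k) + q * k    # (4.6)
        return c_direct / c_sv

    for k in (2, 4, 8, 16):
        print(k, round(speedup(k, q=256), 2))    # approaches k when k << q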
Similar considerations may be made in the symmetric case, when k = 2. In practice, we solve two Vandermonde systems of halved dimension q = n/2 instead of the system (3.1.1) of order n, and the cost of the computation of the x_j is negligible. Therefore, whatever O(n²) algorithm we use for the Vandermonde system solution, the computational cost is one half of the original one (speed-up 2). If we use the procedure given in [2], of complexity 5q²/2, the SSV algorithm requires 5q² multiplications. Recently, some new procedures of order n(log n)² have been devised for the solution of Vandermonde systems. Using them within the proposed method, and omitting the computational cost of the small systems solution (see (3.1.6) and (3.2.6)), we obtain a speed-up of

(4.8)   S' = [log n / log(n/2)]² .

Moreover, step 1 can be performed in parallel using 2 processors with a speed-up (k = 2) S_k = 2 with respect to the sequential version. We observe that the less expensive step 2 may also be executed in parallel. When the λ_j (j = 1, …, q) are real and the b_j (j = 1, …, n) are real, all the steps may be performed using only real arithmetic.

On the other hand, the operation count (×, /) of the algorithm SV1 is

(4.9)   C_1^{(1)} = (k+6) q² + β q k log k + q k ;

with k processors the computational cost is

(4.10)   C_k^{(1)} = 7 q² + β q log k + q .

Therefore, when k is small with respect to q, this algorithm is more expensive. Similar remarks as for the algorithms SV, SSV can be made on the computational cost of the dual system algorithms SVH and SSVH.

5. Errors and perturbation bounds

Now, starting from the compact expression of the Vandermonde matrix V given by (1.7) and using the exact decomposition A_q = U_q L_q, we can write

(5.1)   V = (U_q ⊗ I_k)(L_q ⊗ I_k) D .

In order to evaluate the perturbation errors involved in using this decomposition of A_q, we can consider the perturbed matrix factorization

(5.2)   A_q + ΔA_q = (U_q + ΔU_q)(L_q + ΔL_q)

and write

(5.3)   V + ΔV = (U + ΔU)(L + ΔL) D ,

where

(5.4)   L = L_q ⊗ I_k ,   ΔL = ΔL_q ⊗ I_k ,

(5.5)   U = U_q ⊗ I_k ,   ΔU = ΔU_q ⊗ I_k .

The matrix D is considered free of error. For the norms t = 1, 2, ∞, it is easy to verify that

(5.6)   ||L||_t = ||L_q||_t ,  ||U||_t = ||U_q||_t ,  ||ΔL||_t = ||ΔL_q||_t ,  ||ΔU||_t = ||ΔU_q||_t ,   t = 1, 2, ∞ .

If we use the Frobenius norm, by a norm property of the Kronecker product it follows that

(5.7)   ||L||_F = ||L_q||_F ||I_k||_F = √k ||L_q||_F ,   ||U||_F = ||U_q||_F ||I_k||_F = √k ||U_q||_F ,

and similar relations hold for ||ΔL||_F and ||ΔU||_F. Therefore we can write the equalities

(5.8)   ||ΔL||_F / ||L||_F = ||ΔL_q||_F / ||L_q||_F ,   ||ΔU||_F / ||U||_F = ||ΔU_q||_F / ||U_q||_F .

In other words, formula (5.6) states that the error norm (norms t = 1, 2, ∞) of the UL decomposition of V equals the UL decomposition error norm of a Vandermonde matrix of order q = n/k. In the Frobenius norm there is a factor √k (see (5.7)); by (5.8) we can say that the relative factorization errors of a Vandermonde matrix of order n = kq equal those of the decomposition of a Vandermonde matrix of dimension q = n/k. This result is interesting because the ill-conditioning of Vandermonde matrices increases with their order. In the symmetric case (k = 2, n = 2q), A_q is a matrix of order reduced by a factor 2. The same considerations hold for V^H.

Moreover, an upper bound for the condition number of A_q in the norms t = 1, 2, ∞ may be computed. From (1.7), applying the previous norm properties, we can write

(5.9)   ||A_q ⊗ I_k||_t = ||A_q||_t = ||V D^{-1}||_t ≤ ||V||_t ||D^{-1}||_t ,   t = 1, 2, ∞ .

In a similar way, from (2.1) it follows that

(5.10)   ||A_q^{-1}||_t ≤ ||V^{-1}||_t ||D||_t ,   t = 1, 2, ∞ .
Hence the following upper bound for the condition number of A_q is obtained:

(5.11)   cond_t(A_q) ≤ cond_t(D) cond_t(V) ,   t = 1, 2, ∞ .

In a similar way, an upper bound for the condition number of A_q may be computed in the Frobenius norm. From (1.7), applying the properties of the Frobenius norm, we can write

(5.12)   ||A_q ⊗ I_k||_F = ||A_q||_F ||I_k||_F = ||V D^{-1}||_F ≤ ||V||_F ||D^{-1}||_F

and hence

(5.13)   ||A_q||_F ≤ (1/√k) ||V||_F ||D^{-1}||_F .

In a similar way, from (2.1) it follows that

(5.14)   ||A_q^{-1}||_F ≤ (1/√k) ||V^{-1}||_F ||D||_F .

Hence the following upper bound for the condition number of A_q is obtained:

(5.15)   cond_F(A_q) ≤ (1/k) cond_F(D) cond_F(V) .

Interesting tests and results on the stability of the Björck-Pereyra algorithms are contained in [6] and [7], for the case in which the points defining the matrix are non-negative and arranged in increasing order. As shown in Section 4, the procedures SV and SVH achieve an approximate computational reduction factor of k over any O(n²) algorithm used for the Vandermonde system solution; the speed-up factor is 2 for SSV and SSVH. Moreover, in our experience, in addition to the advantage of lower computing time, these methods generally give more accuracy, in agreement with the expectation derived from the bounds (5.11) and (5.15). Some interesting numerical results are included in the Appendix.

6. Rectangular Vandermonde matrices

The expression of V by means of the Kronecker product given in (1.7) can be extended to a rectangular Vandermonde matrix. Starting from this decomposition, we derive a formula for a factorization of type QRD of a column-wise rectangular matrix given by a particular points distribution in the complex plane, and we obtain an algorithm for the least squares solution of an overdetermined Vandermonde system of type

(6.1)   V x = b ,

where x and b are an n-vector and an m-vector respectively, V is the m×n complex Vandermonde matrix

(6.2)   V = {ω_j^{i-1}} ,   i = 1, …, m ,  j = 1, …, n ,

and the n = kq points ω_j (j = 1, …, n) defining the matrix are the k-th roots of q distinct complex numbers a_j ≠ 0 (j = 1, …, q), as in (1.2). We suppose that the number of rows is m = kp with m > n. The advantage is a drastic reduction in the number of operations with respect to any known QR factorization of order O(mn) (see [3], [4]). Finally, error bounds for the new factorization are derived, which appear to be sharper than the classic ones, since they depend on a matrix of reduced order.

Extending a result obtained in [10] for square matrices to rectangular ones (see [11]), the Vandermonde matrix (6.2) can be expressed in the simplified block form

(6.3)   V = {a_j^{i-1} V_j} ,   i = 1, …, p ,  j = 1, …, q ,

where the V_j are the k-order Vandermonde matrices defined in (1.4). We introduce the rectangular Vandermonde matrix

(6.4)   A_pq = {a_j^{i-1}} ,   i = 1, …, p ,  j = 1, …, q ,

and use the Kronecker product, in a similar way as in the square case, to express the matrix V by means of the compact formula

(6.5)   V = (A_pq ⊗ I_k) D ,

where D is the block diagonal matrix of order n = kq defined in (1.8). The row-wise rectangular Vandermonde matrix V^H is the conjugate transpose of V and is given by

(6.6)   V^H = D^H (A_pq^H ⊗ I_k) .

In many applications the points defining the Vandermonde matrix are symmetric with respect to the origin of the complex plane. This special case is obtained from the previous, more general one when k = 2 (m = 2p, n = 2q): if we call ±λ_j the two symmetric points, roots of the complex numbers a_j (j = 1, …, q), the submatrices V_j have the simple form (1.5) and the matrix V is

(6.7)   V = (A_{m/2,n/2} ⊗ I_2) D ,

where

(6.8)   D = diag(V_1, V_2, …, V_{n/2}) .

If the numbers a_j are real and
positive, all the points lie on the real axis and all the matrices involved are real.

Moreover, using some Kronecker product properties, we can express the generalized inverse of V in this way:

(6.9)   V^+ = [D^H (A_pq^H ⊗ I_k)(A_pq ⊗ I_k) D]^{-1} D^H (A_pq^H ⊗ I_k)
            = D^{-1} [(A_pq^H A_pq)^{-1} ⊗ I_k] D^{-H} D^H (A_pq^H ⊗ I_k)
            = D^{-1} [(A_pq^H A_pq)^{-1} ⊗ I_k] (A_pq^H ⊗ I_k)
            = D^{-1} [(A_pq^H A_pq)^{-1} A_pq^H ⊗ I_k]
            = D^{-1} (A_pq^+ ⊗ I_k) .

In a different way, it is possible to use the singular value decomposition of the reduced-order matrix A_pq,

(6.10)   A_pq = U_p Σ_pq W_q^H ,

in order to write the generalized inverse of the matrix V by means of A_pq^+, in this way:

(6.11)   V^+ = D^{-1} (W_q Σ_pq^+ U_p^H ⊗ I_k) .

7. Least squares solution

Now we can rewrite the overdetermined system (6.1), with m > n, as

(7.1)   (A_pq ⊗ I_k) D x = b .

Using the standard properties of the Kronecker product, the least squares solution may be obtained by means of the generalized inverse of V given by (6.9), that is,

(7.2)   x = D^{-1} (A_pq^+ ⊗ I_k) b .

Moreover, if we call B the p×k matrix obtained from the components of b row-wise, with b = vect(B), the least squares solution x of (7.1) may be written as

(7.3)   x = D^{-1} vect(A_pq^+ B) .

As the dimensions of the matrices A_pq^+ and B are small compared with those of the original matrix V, the implementation of this formula reduces the amount of work with respect to the explicit computation of V^+. Nevertheless, the larger condition number of A_pq^H A_pq suggests the use of a different method. It is better to compute a QR factorization of the reduced-dimension matrix A_pq, with any known algorithm for Vandermonde matrices, and then, by some properties of the Kronecker product, to express the factorization of V and the least squares solution of the overdetermined system (6.1) in terms of these factors. We write this factorization as

(7.4)   A_pq = Q_pq R_q ,

where the rectangular matrix Q_pq and the square upper triangular matrix R_q are of dimensions p×q and q×q respectively. Moreover, the q columns of the matrix Q_pq are orthonormal, that is, Q_pq^H Q_pq = I_q. Factorizations of this type, together with the inverse of the factor R_q, are computed with a complexity of 5mn + (7/2)n² for an m×n Vandermonde matrix (see [3]). Accordingly, the matrix V of (6.5), using also some properties of the Kronecker product, can be written as

(7.5)   V = (Q_pq ⊗ I_k)(R_q ⊗ I_k) D .

So we have determined a factorization of V of type QRD, where the matrices

(7.6)   Q = Q_pq ⊗ I_k ,   R = R_q ⊗ I_k ,

easily obtained from the reduced-dimension matrices Q_pq and R_q, have the same properties as the reduced-dimension ones, that is, Q^H Q = (Q_pq^H ⊗ I_k)(Q_pq ⊗ I_k) = I_n and R is upper triangular; D is the known block diagonal matrix (1.8).

Moreover, using (7.4), the previous system (7.1) becomes

(7.7)   (Q_pq R_q ⊗ I_k) D x = b .

Multiplying both sides by (R_q^{-1} Q_pq^H ⊗ I_k) and using the properties of the Kronecker product, we obtain

(7.8)   (R_q^{-1} Q_pq^H ⊗ I_k)(Q_pq R_q ⊗ I_k) D x = (R_q^{-1} Q_pq^H ⊗ I_k) b
        (R_q^{-1} Q_pq^H Q_pq R_q ⊗ I_k) D x = (R_q^{-1} Q_pq^H ⊗ I_k) b
        (R_q^{-1} I_q R_q ⊗ I_k) D x = (R_q^{-1} Q_pq^H ⊗ I_k) b
        (I_q ⊗ I_k) D x = (R_q^{-1} Q_pq^H ⊗ I_k) b ,

that is,

(7.9)   D x = (R_q^{-1} Q_pq^H ⊗ I_k) b .

Then, by the equality b = vect(B), we can write

(7.10)   D x = vect(R_q^{-1} Q_pq^H B) .

Therefore, if c = vect(C), where C is the matrix product

(7.11)   C = R_q^{-1} Q_pq^H B ,

the block diagonal system (7.10) can be written as

(7.12)   D x = c ,

where D = diag(V_1, V_2, …, V_q) and V_j = B_j E (j = 1, …, q). Hence, partitioning x and c into k-order subvectors and (7.12) into q systems of order k depending on the k-th roots of 1, these systems can be solved by the DFT algorithm as in (3.1.8).
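A compact sketch of the whole procedure (7.4), (7.11), (7.12) follows; NumPy's general Householder QR is used here as a stand-in for a fast Vandermonde QR factorization such as [3], so only the structure of the method (summarized as Algorithm QRV below), not its operation count, is reproduced. All names are illustrative, not the paper's.

    import numpy as np

    def qrv_least_squares(a, k, p, b):
        """Least squares solution of Vx = b, V = (A_pq kron I_k) D as in (6.5)."""
        q = len(a)
        l = a ** (1.0 / k)                                  # one fixed k-th root of each a_j
        A_pq = np.vander(a, p, increasing=True).T           # p x q, p >= q
        Q_pq, R_q = np.linalg.qr(A_pq)                      # (7.4), reduced QR
        B = b.reshape(p, k)                                 # b = vect(B), row-wise
        C = np.linalg.solve(R_q, Q_pq.conj().T @ B)         # (7.11)
        X = np.fft.fft(C / l[:, None] ** np.arange(k), axis=1) / k   # DFT block solves (7.12)
        return X.reshape(-1)                                # x = vect(X)

    a = np.array([1.5, 2.0 - 1.0j, -0.7 + 0.3j], dtype=complex)   # q = 3 distinct a_j
    k, p = 2, 5                                                   # m = kp = 10 > n = kq = 6
    eps = np.exp(2j * np.pi * np.arange(k) / k)
    w = np.concatenate([(aj ** (1.0 / k)) * eps for aj in a])     # the n points
    V = np.vander(w, k * p, increasing=True).T                    # the m x n matrix (6.2)
    b = np.random.randn(k * p) + 1j * np.random.randn(k * p)
    print(np.allclose(qrv_least_squares(a, k, p, b),
                      np.linalg.lstsq(V, b, rcond=None)[0]))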
In this way the least squares solution x of the system (6.1) can be written in a simple compact matrix formulation by means of (7.4), (7.11) and (7.12). The implementation of the previous formulas gives rise to a method for the least squares solution of a Vandermonde system on these special points distributions, which may be summarized in three major steps:

Algorithm QRV
  1. Compute the QR factorization of the Vandermonde matrix A_pq;
  2. Compute the two matrix products, as in formula (7.11);
  3. Compute the solution of the block diagonal system (7.12), solving q systems of order k with the DFT algorithm.

The resulting computational complexity is considerably lower than that of the known algorithms, as analysed in the next section, and the computer storage is obviously reduced.

An important application is the symmetric case described at the end of Section 6, when k = 2 and the matrices V and D have the forms (6.7) and (6.8) respectively. In this situation of symmetric points distribution, formula (7.3) for the overdetermined system solution becomes

(7.13)   x = D^{-1} vect(A_{m/2,n/2}^+ B) ,

where B is the (m/2)×2 matrix obtained from the vector b and the inverse of D is given by the block diagonal matrix

(7.14)   D^{-1} = diag(V_1^{-1}, V_2^{-1}, …, V_{n/2}^{-1}) ,

where the 2-order matrices V_j^{-1} (j = 1, …, n/2) are known. Moreover, the compact matrix formulas (7.10) and (7.11) for the least squares solution become

(7.15)   x = D^{-1} vect(R_{n/2}^{-1} Q_{m/2,n/2}^H B) .

If the numbers a_j (j = 1, …, n/2) are real and positive, the whole algorithm can be performed using only real arithmetic.

In a different way, it is possible to use the singular value decomposition of the reduced-order matrix A_pq and the generalized inverse of V given by (6.11) to compute the least squares solution. In this case the solution vector x can be expressed as

(7.16)   x = D^{-1} (W_q Σ_pq^+ U_p^H ⊗ I_k) b

and, considering also b = vect(B), it is possible to write

(7.17)   x = D^{-1} vect(W_q Σ_pq^+ U_p^H B) .

Then another algorithm may be formulated in this way:

Algorithm SVDV
  1. Compute the SVD A_pq = U_p Σ_pq W_q^H;
  2. Compute the product P = W_q Σ_pq^+ U_p^H B;
  3. Solve the block diagonal system D x = vect(P).

8. Computational analysis

The implementation of the QRD factorization (7.5), involving the QR decomposition of a matrix of order reduced by a factor k, gives rise to a speed-up factor of k² with respect to the known algorithms. If we use the algorithm described in [3], the operation count (×, /) is 5pq + (7/2)q². It is important to observe that the computational reduction is general, for any algorithm of this order used for the QR decomposition. In the symmetric case, when k = 2, the speed-up factor is obviously 4.

Here, too, for the overdetermined Vandermonde system solution we have to take into account the matrix products of step 2 of the algorithm, while the operation count required by step 3 may be neglected. Hence, using the previously mentioned factorization method for step 1, the total cost of this method for the least squares solution is

(8.1)   C_k = (k+5) p q + ((k+7)/2) q²

in the general case, and

(8.2)   C_2 = 7 p q + (9/2) q²

when the points defining the matrix V are symmetric (k = 2). Better results may be obtained by using the Strassen algorithm for the matrix products. In any case, it is interesting to observe that the speed-up factor is

(8.3)   S_k = k² (12p + 8q) / [(2p + q) k + 10p + 7q] ,

a function increasing very quickly with k, the dimension of the submatrices V_j of the block decomposition of V. The function S_k has an asymptote of slope m_a = 2(12p + 8q)/(2p + q), bounded below and above by 12 and 16 respectively.
In other words, the behaviour of the function S_k is similar to that of a line forming with the x-axis an angle θ_a with 85.23° < θ_a < 86.42°. Furthermore, the speed-up S_k is almost independent of the numbers p and q.

On the symmetric points distribution (k = 2), the implementation of the previous least squares algorithm produces a considerable complexity reduction, and the speed-up is

(8.4)   S_2 = 4 (12m + 8n) / (14m + 9n) ,

a number between 3 and 4.

Moreover, the new algorithm for the least squares solution derived from the compact matrix formulation (7.4), (7.10) and (7.11) is particularly suitable for vectorial and parallel implementation. The whole problem reduces to the use of known and efficient parallel algorithms for the QR factorization of A_pq and for the matrix products in (7.11). Also the solution of the q systems of step 3 may be performed simultaneously.

9. Perturbation bounds for the QRD factorization

Our aim is now to evaluate the perturbation matrices of the factors Q and R of the previous QRD factorization of V. We suppose the factor D free of error. The method proposed in (7.9) utilizes a computed factorization of A_pq, that is, the exact decomposition of a perturbed matrix

(9.1)   A_pq + ΔA_pq = (Q_pq + ΔQ_pq)(R_q + ΔR_q) .

Then we can derive the consequent perturbation errors of the factorization of V in this way:

(9.2)   V + ΔV = [(Q_pq + ΔQ_pq) ⊗ I_k] [(R_q + ΔR_q) ⊗ I_k] D .

If we call

(9.3)   ΔQ = ΔQ_pq ⊗ I_k ,   ΔR = ΔR_q ⊗ I_k ,

then, using (7.6) and (9.3), we can write

(9.4)   V + ΔV = (Q + ΔQ)(R + ΔR) D .

Using the norms t = 1, 2, ∞, we can write

(9.5)   ||R||_t = ||R_q||_t ,  ||ΔR||_t = ||ΔR_q||_t ,  ||Q||_t = ||Q_pq||_t ,  ||ΔQ||_t = ||ΔQ_pq||_t ,   t = 1, 2, ∞ ,

while with the Frobenius norm it follows that

(9.6)   ||R||_F = √k ||R_q||_F ,  ||ΔR||_F = √k ||ΔR_q||_F ,  ||Q||_F = √k ||Q_pq||_F ,  ||ΔQ||_F = √k ||ΔQ_pq||_F .

Bounds for the Frobenius norms of the matrices ΔQ_pq and ΔR_q, depending on the condition number of A_pq, are derived in [9]. Then we can derive the consequent perturbation error bounds of the factorization of V as follows:

(9.7)   ||ΔR||_F / ||R||_t = √k ||ΔR_q||_F / (τ_t ||R_q||_t) ≤ √2 (√k / τ_t) cond_2(A_pq) ||ΔA_pq||_F / ||A_pq||_t ,
        with τ_2 = 1 for t = 2 and τ_F = √k for t = F ,

(9.8)   ||ΔQ||_F = ||ΔQ_pq||_F ||I_k||_F ≤ √k (1 + √2) cond_2(A_pq) ||ΔA_pq||_2 / ||A_pq||_2 ,

where cond_2(A_pq) = ||A_pq||_2 ||A_pq^+||_2. Hence the proposed factorization reduces the errors, because they have the same norms (t = 1, 2, ∞) as the errors of the factorization of a lower-order Vandermonde matrix, and the error bounds (t = F) depend on the condition number of A_pq, a matrix of order reduced by a factor k. As a consequence, considering also that Vandermonde matrices tend to be ill-conditioned with increasing order, the previous method is promising also from the point of view of its stability.

Appendix

Real case (k = 2). In the symmetric case (k = 2), when the λ_j (j = 1, …, q) are real, the determinant of D in (2.6) is given by

(a.1)   det(D) = ∏_{j=1,q} det(V_j) = 2^q ∏_{j=1,q} λ_j

and then

(a.2)   det(V) = 2^q ∏_{j=1,q} λ_j · det(A_q)² .

In other words, the determinant of V may be obtained from the determinant of the halved-dimension matrix A_q, with a computational cost of order q²/2 (lower-order operations are not counted). Moreover, cond_F(D) in (5.15) can be computed easily as

(a.3)   cond_F(D) = (q + Σ_{j=1,q} a_j)^{1/2} (q + Σ_{j=1,q} 1/a_j)^{1/2}

and it may be used in the following bound for the condition number of A_q:

(a.4)   cond_F(A_q) ≤ (1/2) cond_F(D) cond_F(V) .

Some numerical experiments have been performed, comparing the results obtained by the procedure SSV with those of the well known Björck-Pereyra algorithm. Obviously, the results depend on the machine precision and on the matrix dimensions.
Even if they confirm the ill-conditioning of the problem, in most cases the new algorithms obtain better solution accuracy than the classical ones. In Table 6.1 the performance of the procedure SSV is compared with the Björck-Pereyra algorithm for the primal system solution, when the points λ_i (i = 1, …, q) are the first q natural numbers taken in decreasing order, the right-hand side vector b is chosen as the sum of the columns of the matrix, and the exact solution is x = (1, 1, …, 1)^T. Considering the error vector norm ||e|| and its ratio with the machine precision u = 0.222E-15, the following improvement is observed.

TABLE 6.1
               Björck-Pereyra               SSV
  n = 2q    ||e||        ||e||/u         ||e||        ||e||/u
  20        1.37e-09     6.19e+06        1.27e-09     5.75e+06
  22        1.51e-08     6.81e+07        1.96e-09     8.83e+06
  24        4.54e-08     2.04e+08        1.03e-08     4.66e+07
  26        1.47e-06     6.65e+09        1.31e-06     5.90e+09
  28        4.53e-06     2.04e+10        2.84e-06     1.28e+10
  30        1.63e-05     7.37e+10        4.54e-07     2.04e+09

Conclusions

Vandermonde matrices of dimensions m×n (m = kp, n = kq) on a particular points distribution have been analysed, including the symmetric case (k = 2) of pairs of symmetric points in the complex plane. Expressions of the determinant, the inverse (m = n) and the generalized inverse (m > n) are derived, depending on Vandermonde matrices of dimensions p×q. The matrix-vector product may be computed with a speed-up factor k. Numerical algorithms for the primal and dual system solution (m = n) are proposed and analysed, depending on reduced-order Vandermonde matrices. Both the serial and the parallel implementations of the algorithms SV and SVH turn out to be less expensive from a computational point of view. On the other hand, the algorithms SV1 and SVH1 may be competitive on particular vectorial and parallel architectures. In the same way, the algorithm QRV for the Vandermonde least squares problem (m > n) turns out to be faster than the traditional one. The algorithm SVDV might be preferred on particular machines, and it should produce more accurate results.

REFERENCES

[1] S. Barnett, Matrix Differential Equations and Kronecker Products, SIAM J. Appl. Math., 24 (1973), pp. 1-5.
[2] Å. Björck - V. Pereyra, Solution of Vandermonde Systems of Equations, Math. Comp., 24 (1970), pp. 893-904.
[3] C. J. Demeure, Fast QR factorization of Vandermonde matrices, Linear Algebra Appl., 122/123/124 (1989), pp. 165-194.
[4] G. H. Golub - C. F. Van Loan, Matrix Computations (Third Edition), Johns Hopkins University Press, 1989.
[5] A. Graham, Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Limited, 1981.
[6] N. J. Higham, Error Analysis of the Björck-Pereyra Algorithms for Solving Vandermonde Systems, Numer. Math., 50 (1987), pp. 613-632.
[7] N. J. Higham, Stability Analysis of Algorithms for Solving Confluent Vandermonde-like Systems, SIAM J. Matrix Anal. Appl., 11, n. 1 (1990), pp. 23-41.
[8] R. A. Horn - C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1991.
[9] J.-G. Sun, Perturbation bounds for the Cholesky and QR factorizations, BIT, 31 (1991), pp. 341-352.
[10] T. Tommasini, A New Algorithm for Special Vandermonde Systems, Numerical Algorithms, 2 (1992), pp. 299-306.
[11] T. Tommasini, Complexity Reduction of Least Squares Problems Involving Special Vandermonde Matrices, Advances in Comp. Math., 6 (1996), pp. 77-86.

T. Tommasini, Dipartimento di Matematica, [email protected]
24/03/03, revised February 2004