Lecture: 5

FOUNDATIONS of SPARSE MATRIX TECHNOLOGY

The main goal of this lecture is to discuss how to store and process data when solving complex engineering problems that demand intensive calculations on large volumes of data.

Introduction

Nowadays complex engineering problems that demand intensive calculations, such as physically based simulation in computer graphics and radiosity algorithms, are mostly solved with the finite element method. Such a problem reduces to the solution of a system of linear algebraic equations $A x = b$, where $A$ is a sparse or band matrix of coefficients, $x$ is the vector of unknown node values and $b$ is the right-hand-side vector. We consider symbolic and numerical algorithms for processing such data.

STORAGE of DATA STRUCTURES

Sparse matrix technology requires list processing, where the elements of a list are numbers, matrices, arrays or switches. The simplest structure for storing data is the ARRAY. Example: A(I), B(I,J). Here and below FORTRAN notation is used.

A diagonal scheme for symmetric matrix storage

A band matrix often has a wide band and can contain a large number of nonzero elements. One way to store such a matrix is the diagonal scheme. A matrix is a band matrix if all its nonzero elements are confined to a band formed by diagonals parallel to the main diagonal. Thus $a_{ij} = 0$ if $|i - j| > b$, and $a_{k,k-b} \neq 0$ or $a_{k,k+b} \neq 0$ for at least one value of $k$. Here $b$ is the half bandwidth and $2b + 1$ is the bandwidth of the matrix. Consider the matrix depicted in Figure 1. It is a symmetric 7 by 7 band matrix with bandwidth 5, so $b = 2$.

          1     2     3     4     5     6     7
  1 |   1.0                                     |
  2 |         2.0   8.0   9.0                   |
  3 |         8.0   3.0                         |
  4 |         9.0         4.0  10.0             |
  5 |                    10.0   5.0  11.0  12.0 |
  6 |                          11.0   6.0       |
  7 |                          12.0         7.0 |

Figure 1.

To store the matrix A, an array AN(I,J) can be used. For a matrix of order $N$ with half bandwidth $b$ this array has the size $N \times (b + 1)$. Because the matrix is symmetric, only the main diagonal and the lower codiagonals are stored. The main diagonal occupies the last column, and the lower codiagonals occupy the remaining columns, each shifted down by one more position. This is the so-called diagonal scheme.

       |  0.   0.   1. |
       |  0.   0.   2. |
       |  0.   8.   3. |
  AN = |  9.   0.   4. |
       |  0.  10.   5. |
       |  0.  11.   6. |
       | 12.   0.   7. |

Row-wise format

The row-wise format is one of the most widely used storage schemes for sparse matrices. It has minimal memory requirements and is also very convenient for processing sparse matrices. The values of the nonzero matrix elements and the corresponding column indices are kept in two arrays, say AN and JA. An additional pointer array IA marks the positions in AN and JA where the description of each row begins. One extra component of IA contains the pointer to the first free position in JA and AN. Consider, for example, the matrix A:

         1   2   3   4   5   6   7   8   9  10
      | 0.  0.  1.  3.  0.  0.  0.  5.  0.  0. |
  A = | 0.  0.  0.  0.  0.  0.  0.  0.  0.  0. |
      | 0.  0.  0.  0.  0.  7.  0.  1.  0.  0. |

A is represented in the following way:

  position = 1   2   3   4   5   6
  IA       = 1   4   4   6
  JA       = 3   4   8   6   8
  AN       = 1.  3.  5.  7.  1.

The description of the first row of matrix A starts at position IA(1) = 1 of the arrays AN and JA. Since the description of the second row starts at IA(2) = 4, the first-row elements occupy positions 1, 2 and 3 in AN and JA. In general, the description of row r occupies positions IA(r) to IA(r+1) - 1 in the arrays JA and AN. If IA(r+1) = IA(r), row r is empty; here this is the case for row 2. The last component, IA(4) = 6, points to the first free position. This storage method is called "Row-wise Representation Complete and Ordered" (RR(C)O), because the matrix A is represented completely and the elements of each row are kept in order of increasing column index.

Algebra for sparse matrices

Elementary algebraic operations on sparse matrices include transposition, column permutation, ordering of a sparse representation, multiplication of a sparse matrix by a vector, and so on. The main requirement in designing such algorithms is that the amount of computation depends linearly on the number of nonzero elements.

Multiplication of a sparse matrix by a vector

In this algorithm the calculations are performed directly, without a symbolic step.
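The row-wise arrays of the example matrix A in the row-wise format section can be generated mechanically. The following C sketch is only an illustration under stated assumptions: the function name `dense_to_rowwise` is hypothetical, the matrix is supplied as a dense array stored row by row, and all indices are 0-based as is usual in C, so every index is one less than in the FORTRAN-style tables of the lecture.

```c
/* Build the row-wise representation of a dense nrows x ncols matrix `a`
 * (stored row by row). Columns are scanned left to right, so the result
 * is ordered. All indices are 0-based (assumption of this sketch).
 * ia needs nrows + 1 entries; ja and an need one entry per nonzero.
 * Returns the number of nonzero elements stored. */
int dense_to_rowwise(const double *a, int nrows, int ncols,
                     int *ia, int *ja, double *an)
{
    int k = 0;                          /* next free position in ja/an */
    for (int i = 0; i < nrows; ++i) {
        ia[i] = k;                      /* description of row i starts here */
        for (int j = 0; j < ncols; ++j) {
            double v = a[i * ncols + j];
            if (v != 0.0) {
                ja[k] = j;              /* column index of the element */
                an[k] = v;              /* value of the element */
                ++k;
            }
        }
    }
    ia[nrows] = k;                      /* pointer to the first free position */
    return k;
}
```

For the 3 by 10 example matrix this yields ia = {0, 3, 3, 5}, ja = {2, 3, 7, 5, 7} and an = {1., 3., 5., 7., 1.}, which is the RR(C)O table shifted to 0-based indices. An empty row shows up as two equal consecutive entries of ia.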
The simplest case is to use the "Row-wise Representation Complete and Unordered" (RR(C)U) representation. Most algorithms for matrix calculations do not require ordered representations. For the example given above, the RR(C)U representation looks as follows:

  position = 1   2   3   4   5
  IA       = 1   4   4   6
  JA       = 8   3   4   8   6
  AN       = 5.  1.  3.  1.  7.

Consider the multiplication of a general sparse matrix by a full vector b. This vector is stored in an array B, and the result c = Ab is stored in an array C. Let N be the number of matrix rows. If b is a full vector, its elements can be accessed in any order. If the vector b is sparse and is stored in the array B in packed form, we first need to create a pointer array IP; then, to find the element $b_i$, we use B(IP(i)). The order in which the terms of each scalar product are accumulated is simply the order of the elements in the matrix representation, which may be arbitrary.

For every matrix row I we determine the first position IAA and the last position IAB at which the elements of row I are located in the arrays JA and AN. After that we simply scan JA and AN from IAA to IAB to calculate the scalar product of row I and the vector b. Each value in JA is a column index and is used to extract the element of B that has to be multiplied by the corresponding number from AN. The result of each multiplication is added to C(I).

The following is the algorithm in FORTRAN notation for a full vector b:

  Input:  IA, JA, AN - matrix in the form RR(C)U
          B          - vector
          N          - number of matrix rows
  Output: C          - vector of result (size N)

  1          DO 20 I = 1, N
  2          U = 0.
  3          IAA = IA(I)
  4          IAB = IA(I + 1) - 1
  5          IF(IAB.LT.IAA) GO TO 20
  6          DO 10 K = IAA, IAB
  7    10    U = U + AN(K) * B(JA(K))
  8    20    C(I) = U

The loop DO 20 processes the N rows of the given matrix. The variable U is set to zero in line 2. In line 7, U is used in place of C(I) to accumulate the scalar product. The statement in line 5 detects empty rows of the matrix. K is a pointer to a position in the arrays JA and AN.
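The FORTRAN loop above carries over to C almost line for line. The following is a minimal sketch rather than the lecture's own code: the function name `rowwise_matvec` is hypothetical and all indices are assumed to be 0-based, so row i occupies positions ia[i] to ia[i+1] - 1 of ja and an.

```c
/* c = A * b for a matrix A in row-wise format. The order of the elements
 * inside a row does not matter, so RR(C)U and RR(C)O both work.
 * n is the number of rows; b is a full (dense) vector. */
void rowwise_matvec(int n, const int *ia, const int *ja, const double *an,
                    const double *b, double *c)
{
    for (int i = 0; i < n; ++i) {
        double u = 0.0;                 /* accumulates the scalar product */
        for (int k = ia[i]; k < ia[i + 1]; ++k)
            u += an[k] * b[ja[k]];      /* ja[k] is the column of an[k] */
        c[i] = u;                       /* stays 0 for an empty row */
    }
}
```

With the RR(C)U arrays of the example shifted to 0-based indices (ia = {0, 3, 3, 5}, ja = {7, 2, 3, 7, 5}, an = {5., 1., 3., 1., 7.}) and b = (1, 2, ..., 10), the result is c = (55, 0, 50). The explicit empty-row test of line 5 has no counterpart here, because a C for loop over an empty range does not execute at all.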
In line 7 of the listing, JA(K) is the column index corresponding to the element AN(K).

NUMERICAL ALGORITHMS for SOLUTION of LARGE SPARSE SYSTEMS of EQUATIONS

Iterative and direct methods

Numerical methods for solving a system of linear algebraic equations (SLAE) fall into two classes: iterative and direct methods. A typical iterative method chooses an initial approximation $x^{(1)}$ to $x$ and constructs a sequence $x^{(2)}, x^{(3)}, \ldots$ such that $\lim_{i \to \infty} x^{(i)} = x$. In practice the iterations are stopped when a prescribed accuracy is reached. Direct methods, on the other hand, obtain the solution in a finite number of calculation steps. Which method is better? Iterative methods have minimal memory requirements, but their computation time depends on the type of problem.

Parallel dissection method

This method was developed for 2D finite element method (FEM) models, but it can be generalized to the 3D case. The method is simple and easy to implement. If the size of the problem is moderate, this algorithm demands less memory than any other method. Fig. 2(a) illustrates the main idea of the method. The figure depicts rectangles that represent sets of nodes of a 2D FEM mesh. Once the number of separators has been chosen (here it is equal to 3), the mesh is partitioned into blocks R1, R2, ..., one more block than there are separators. Collecting the separators into one block creates a tree partitioning, as explained by Figure 2(b). Such a partitioning decreases fill-in and the number of operations. Now we number the nodes of each set R sequentially from left to right, starting from the lower-left corner; after that, all separators are numbered in the vertical direction. This is a monotone numbering of the tree. Thus the matrix associated with the FEM mesh is split into blocks, see Figure 2(c). Fill-in can appear only in the shaded areas. In this example, the four diagonal blocks have a banded structure and can be stored in the sparse row format.
In this example, m and l are the numbers of mesh nodes in the two orthogonal directions.

Figure 2. Parallel dissection of a regular net.

The resulting SLAE of size N can be written as follows:

$$A_i x_i + B_i y_i = E_i,$$
$$B_i^T x_i + D_i y_i = F_i, \qquad i = 1, 2, \ldots, N - 1.$$

Express $x_i$ from the first equation and substitute it into the second:

$$x_i = A_i^{-1} E_i - A_i^{-1} B_i y_i, \qquad D_i^* y_i = F_i^*,$$

where

$$D_i^* = D_i - B_i^T A_i^{-1} B_i, \qquad F_i^* = F_i - B_i^T A_i^{-1} E_i.$$

We call $A_i$, $B_i$ and $D_i$ the interior, boundary and dissection matrices, respectively. The calculations are carried out according to the following algorithm:

1. Decomposition of $A_i$ into the product of a lower and an upper triangular matrix, $A_i = L_i R_i$. For each i this decomposition takes $m l^3 / N^3$ operations. All the matrices together take $(N + 1)\, m l^3 / N^3 \approx m l^3 / N^2$ operations.

2. Calculating $D_i^*$. We can use the implicit asymmetric method, which is based on the equality

$$B_i^T A_i^{-1} B_i = B_i^T (L_i R_i)^{-1} B_i = B_i^T \left( R_i^{-1} \left( L_i^{-1} B_i \right) \right).$$

The calculations are carried out in three steps for each column b of the matrix $B_i$:
   - Calculating $L_i^{-1} b = c$. This is equivalent to solving the SLAE $L_i c = b$ and takes $m l^2 / N^2$ operations.
   - Calculating $R_i^{-1} c = d$. This takes $m l^2 / N^2$ operations.
   - Calculating $B_i^T d$. The matrix $B_i^T$ contains 2m rows, and each row has only three nonzero elements, so this step takes 6m operations. It is in this step that we use the fact that the matrices $B_i$ and $B_i^T$ are sparse.
   So for each i, $2 m l^2 / N^2$ operations are needed.

3. Solving the system $D_i^* y = F_i^*$. The size of this matrix is equal to mN. This SLAE can be factorized by the Gauss method in

$$N \sum_{i=1}^{m} (2m - i)^2 \approx N \sum_{i=m}^{2m} i^2 \approx N \left( \frac{8 m^3}{3} - \frac{m^3}{3} \right) = \frac{7 N m^3}{3}$$

operations.

So the total number of operations for this method (without forward and backward substitution) is

$$T(N) = \frac{m l^3}{N^2} + \frac{m l^2}{N^2} + \frac{7 N m^3}{3}.$$

Subroutine for a numerical solution of a triangular system

This subroutine solves the sparse SLAE $L y = b$, where $L$ is a lower triangular matrix.
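Since this section suggests rewriting the subroutine in C, one possible transcription is sketched here. It is not the lecture's own code: the function name `envelope_lower_solve` is an assumption, indices are 0-based, and the storage is assumed to be the profile (envelope) format used in this section, with diag holding the diagonal of L, env holding each row from its first nonzero up to but excluding the diagonal, and xenv[i] marking where row i starts in env.

```c
/* Solve L y = b for a lower triangular L in profile (envelope) storage.
 * On entry rhs holds b; on exit it holds y. All indices are 0-based. */
void envelope_lower_solve(int neqns, const int *xenv, const double *env,
                          const double *diag, double *rhs)
{
    /* Find the first nonzero of rhs; the solution is zero before it. */
    int ifirst = 0;
    while (ifirst < neqns && rhs[ifirst] == 0.0)
        ++ifirst;
    if (ifirst == neqns)
        return;                         /* b = 0, so y = 0 */

    int last = -1;  /* index of the last nonzero solution component so far */
    for (int i = ifirst; i < neqns; ++i) {
        int iband = xenv[i + 1] - xenv[i];  /* entries stored for row i */
        if (iband > i)
            iband = i;
        double s = rhs[i];
        int l = i - iband;              /* column of the first stored entry */
        rhs[i] = 0.0;
        /* Skip rows whose envelope is empty or meets only zero components. */
        if (iband > 0 && last >= l) {
            for (int k = xenv[i + 1] - iband; k < xenv[i + 1]; ++k, ++l)
                s -= env[k] * rhs[l];
        }
        if (s != 0.0) {
            rhs[i] = s / diag[i];
            last = i;
        }
    }
}
```

For the test data suggested in this section, diag = {2, 0.5, 1, 0.5, 1}, env = {0.5, 1, -1, 0.25, -0.25, -0.5, 1, -1, -2, -3}, xenv = {0, 0, 1, 3, 6, 10} and b = {7, 3, 7, -4, -4}, the routine leaves y = (3.5, 2.5, 6, -2.5, -0.5) in rhs.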
Let the matrix L be given:

  | a11                          |
  |      a22                     |
  | a31       a33                |
  |           a43  a44           |
  |           a53       a55      |
  |      a62       a64       a66 |

This matrix is stored in the following profile format:

  DIAG : a11  a22  a33  a44  a55  a66
  ENV  : a31  0    a43  a53  0    a62  0    a64  0
  XENV : 1    1    1    3    4    6    10

DIAG holds the diagonal elements. ENV holds, row by row, the elements from the first nonzero of each row up to (but not including) the diagonal, zeros inside this span included. XENV(I) points to the start of row I in ENV, so XENV(I+1) - XENV(I) is the number of elements stored for row I.

The algorithm is given in FORTRAN notation (.LT. is <, .GE. is >=, .EQ. is ==, .NE. is !=). To understand the algorithm, you can rewrite it in C, enter the data, compile, run and print the results. You can use the following data for the matrix L: {2}, {0.50, 0.50}, {1, -1, 1}, {0.25, -0.25, -0.50, 0.50}, {1, -1, -2, -3, 1}, and for the vector b: {7, 3, 7, -4, -4}.

  Input:  NEQNS       - integer number of equations
          (XENV, ENV) - array pair for the envelope of L (XENV integer, ENV real)
          DIAG        - real array of diagonal elements
  Output: RHS         - contains the real input vector b on entry and the real solution vector y on exit

  Additional integer variables: I, IBAND, IFIRST, K, KSTOP, KSTRT, L, LAST; real variable: S.

C     Search for the first nonzero element in RHS
      IFIRST = 0
  100 IFIRST = IFIRST + 1
      IF (RHS(IFIRST) .NE. 0.) GO TO 200
      IF (IFIRST .LT. NEQNS) GO TO 100
      RETURN
  200 LAST = 0
C     LAST contains the index of the last calculated nonzero
C     component of the solution
      DO 500 I = IFIRST, NEQNS
      IBAND = XENV(I+1) - XENV(I)
      IF (IBAND .GE. I) IBAND = I - 1
      S = RHS(I)
      L = I - IBAND
      RHS(I) = 0.
C     Envelope row is empty, or the corresponding components
C     of the solution are all zero
      IF (IBAND .EQ. 0 .OR. LAST .LT. L) GO TO 400
      KSTRT = XENV(I+1) - IBAND
      KSTOP = XENV(I+1) - 1
      DO 300 K = KSTRT, KSTOP
      S = S - ENV(K)*RHS(L)
      L = L + 1
  300 CONTINUE
  400 IF (S .EQ. 0.) GO TO 500
      RHS(I) = S/DIAG(I)
      LAST = I
  500 CONTINUE
      RETURN
      END

Iterative methods

Iterative methods are usually used for solving a sparse SLAE.

Gauss-Seidel method

Let the SLAE $A x = b$ be given, where $A$ is a matrix of size N by N and $x$ and $b$ are vectors of size N. Let $x^{(0)}$ be an initial approximation to the solution of the SLAE. We can choose $x_i^{(0)} = b_i / a_{ii}$.
We consider the following iteration:

$$x_i^{(m+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(m+1)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(m)} \right) / a_{ii},$$

where $x_i^{(m+1)}$ is the i-th coordinate of the vector x at the (m+1)-th iteration, $x_j^{(m)}$ is the j-th coordinate of the vector x at the m-th iteration, $b_i$ is the i-th coordinate of the vector b, and $a_{ij}$ is the (i,j) element of the matrix A. At each iteration the residuals $r_i = x_i^{(m+1)} - x_i^{(m)}$ can be calculated, and the process can be stopped when the error $\max_i |r_i| < \varepsilon$.

Sometimes the speed of the iteration process can be improved by applying the relaxation method. First we define an intermediate result:

$$y_i^{(m+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(m+1)} - \sum_{j=i+1}^{N} a_{ij} x_j^{(m)} \right) / a_{ii},$$

and then

$$x_i^{(m+1)} = x_i^{(m)} + w \left( y_i^{(m+1)} - x_i^{(m)} \right).$$

Usually the relaxation coefficient w has a value from 1 to 2. In the algorithm below it is denoted F.

Algorithm for sparse SLAE

  Input:  IA, JA, AN - matrix in the form RR(LU)U, where LU means that only the
                       off-diagonal elements (lower and upper triangles) are stored;
                       the formula above needs both triangles of every row
          AD         - diagonal elements of the matrix A
          B          - right-hand-side vector
          N          - number of matrix rows
          F          - relaxation multiplier
          EPS        - accuracy of solution
  Output: X          - vector of result (size N)

      DO 10 I = 1, N
   10 X(I) = B(I)/AD(I)
      IT = 0
   20 IT = IT + 1
      IEND = 0
      DO 40 I = 1, N
      IAA = IA(I)
      IAB = IA(I+1) - 1
      IF(IAB .LT. IAA) GO TO 40
      U = B(I)
      DO 30 J = IAA, IAB
   30 U = U - AN(J) * X(JA(J))
      U = U/AD(I) - X(I)
      IF(ABS(U) .GT. EPS) IEND = 1
      X(I) = X(I) + F * U
   40 CONTINUE
      IF(IEND .EQ. 1) GO TO 20

In the loop DO 10 the initial values of the unknown vector x are calculated. The variable IT is the current iteration number. The loop DO 40 processes the N given equations. The loop DO 30 performs the calculations according to the formula above. IEND is set to 1 whenever a correction exceeds EPS, which forces another iteration.

Exercises

1. Rewrite the subroutine for the Gauss-Seidel method for a sparse SLAE in C, compile it and run it to solve the following system of equations:

   20.9 x1 +  1.2 x2 +  2.1 x3 +  0.9 x4 = 21.70
    1.2 x1 + 21.2 x2 +  1.5 x3 +  2.5 x4 = 27.46
    2.1 x1 +  1.5 x2 + 19.8 x3 +  1.3 x4 = 28.76
    0.9 x1 +  2.5 x2 +  1.3 x3 + 32.1 x4 = 49.72

2. Use the C program code (file NM5TEST); compile and run the program to see sparse matrix technology in use.
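One possible starting point for Exercise 1 is sketched below. It is an illustration, not the lecture's NM5TEST code: the function name `gauss_seidel`, the 0-based indexing and the `max_iter` safeguard are assumptions of this sketch, and the arrays ia, ja, an are assumed to hold every off-diagonal element of each row (both below and above the diagonal), with the diagonal kept in ad.

```c
/* Gauss-Seidel iteration with relaxation multiplier f for A x = b.
 * ia, ja, an: off-diagonal elements of A in row-wise format (0-based,
 *             any order inside a row); ad: diagonal elements of A.
 * Returns the number of iterations used, or -1 if max_iter is reached
 * (max_iter is a safeguard added here; the FORTRAN listing has none). */
int gauss_seidel(int n, const int *ia, const int *ja, const double *an,
                 const double *ad, const double *b, double f, double eps,
                 int max_iter, double *x)
{
    for (int i = 0; i < n; ++i)
        x[i] = b[i] / ad[i];            /* initial approximation */

    for (int it = 1; it <= max_iter; ++it) {
        int again = 0;                  /* plays the role of IEND */
        for (int i = 0; i < n; ++i) {
            if (ia[i + 1] <= ia[i])
                continue;               /* row has a diagonal element only */
            double u = b[i];
            for (int j = ia[i]; j < ia[i + 1]; ++j)
                u -= an[j] * x[ja[j]];  /* uses already updated x[0..i-1] */
            u = u / ad[i] - x[i];       /* correction for component i */
            if ((u < 0.0 ? -u : u) > eps)
                again = 1;
            x[i] += f * u;
        }
        if (!again)
            return it;
    }
    return -1;
}
```

For the system of Exercise 1 the off-diagonal arrays are ia = {0, 3, 6, 9, 12}, ja = {1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2} and an = {1.2, 2.1, 0.9, 1.2, 1.5, 2.5, 2.1, 1.5, 1.3, 0.9, 2.5, 1.3}, with ad = {20.9, 21.2, 19.8, 32.1}. With f = 1 and eps = 1e-10 the iteration approaches x = (0.8, 1.0, 1.2, 1.4), which can be checked by substituting these values into the four equations.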
