Lecture: 5
FOUNDATIONS of SPARSE MATRIX TECHNOLOGY
The main goal of this lecture is to discuss how to store and process data
when solving complex engineering problems that demand intensive
calculations on huge volumes of data.
Introduction
Nowadays, complex engineering problems that demand intensive
calculations, such as physically based simulation in computer graphics
and radiosity algorithms, are mostly solved with the finite element
method. Such a problem reduces to the solution of a system of linear
algebraic equations:
A x = b,
where A is a sparse or band matrix of coefficients, x is the vector of
unknown node values, and b is the vector of right-hand sides. We consider
symbolic and numerical algorithms for processing such data.
STORAGE of DATA STRUCTURES
Sparse matrix technology relies on processing lists whose elements are
numbers, matrices, arrays or switches.
The simplest structure for storing data is the ARRAY.
Example: A(I), B(I,J)
FORTRAN notation is used here and below.
A diagonal scheme for symmetric matrix storage
A band matrix often has a wide band and can contain a large number of
nonzero elements. One way to store such a matrix is the diagonal
scheme. A matrix is a band matrix if all its nonzero elements are confined
to a band formed by diagonals parallel to the main diagonal. Thus
$a_{ij} = 0$ if $|i - j| > b$, and $a_{k,k-b} \neq 0$ or $a_{k,k+b} \neq 0$
for at least one value of k. Here b is the half-bandwidth and 2b + 1 is
the bandwidth of the matrix.
Consider the matrix depicted in Figure 1. It is a symmetric 7 x 7 band
matrix with bandwidth 5 (b = 2).
          1     2     3     4     5     6     7
    1 |  1.0                                      |
    2 |        2.0   8.0   9.0                    |
    3 |        8.0   3.0                          |
A = 4 |        9.0         4.0  10.0              |
    5 |                   10.0   5.0  11.0  12.0  |
    6 |                         11.0   6.0        |
    7 |                         12.0         7.0  |
Figure 1.
To store the matrix A, a two-dimensional array AN(I,J) can be used. For a
matrix of order N with half-bandwidth b this array has N(b + 1) elements.
The main diagonal occupies the last column, and the lower codiagonals
occupy the remaining columns, shifted down so that every element stays in
the row it occupies in A (the k-th codiagonal starts in row k + 1). This
is the so-called diagonal scheme.
     |  0.   0.   1. |
     |  0.   0.   2. |
     |  0.   8.   3. |
AN = |  9.   0.   4. |
     |  0.  10.   5. |
     |  0.  11.   6. |
     | 12.   0.   7. |
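The diagonal scheme can be tried out with a short sketch. The lecture itself uses FORTRAN notation; the Python below is only an illustration with 0-based indices, and the function names to_diagonal_scheme and band_get are hypothetical helpers, not part of the lecture. The matrix is the 7 x 7 example with half-bandwidth b = 2.

```python
def to_diagonal_scheme(a, b):
    """Pack a symmetric band matrix (list of rows) into an n x (b+1) array.

    Column b holds the main diagonal; column b-k holds the k-th lower
    codiagonal. Positions with no matrix element are left as 0.0.
    """
    n = len(a)
    an = [[0.0] * (b + 1) for _ in range(n)]
    for i in range(n):
        for k in range(b + 1):
            j = i - k              # column of the element k places left of the diagonal
            if j >= 0:
                an[i][b - k] = a[i][j]
    return an


def band_get(an, b, i, j):
    """Return a(i, j) from the packed array, using symmetry when j > i."""
    if j > i:
        i, j = j, i
    k = i - j
    return an[i][b - k] if k <= b else 0.0


# The 7 x 7 example matrix (half-bandwidth b = 2), written out in full.
A = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 2, 8, 9, 0, 0, 0],
    [0, 8, 3, 0, 0, 0, 0],
    [0, 9, 0, 4, 10, 0, 0],
    [0, 0, 0, 10, 5, 11, 12],
    [0, 0, 0, 0, 11, 6, 0],
    [0, 0, 0, 0, 12, 0, 7],
]
AN = to_diagonal_scheme(A, 2)
for row in AN:
    print(row)
```

Only 7 x 3 = 21 numbers are stored instead of 49, and band_get recovers any element of A, including those above the diagonal, from the packed array.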
Row-wise format
The row-wise format is one of the most widely used storage schemes for
sparse matrices. It has minimal memory requirements and is also very
convenient for processing sparse matrices. The values of the nonzero
matrix elements and the corresponding column indices are kept row by row
in two arrays, say AN and JA.
An additional pointer array IA is needed to mark where each row starts in
the arrays AN and JA. One extra component at the end of IA points to the
first free position in JA and AN.
Consider, for example, the matrix A:
1 2 3 4 5 6 7 8 9 10
| 0. 0. 1. 3. 0. 0. 0. 5. 0. 0. |
A = | 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. |
| 0. 0. 0. 0. 0. 7. 0. 1. 0. 0. |
A is presented in the following way:
position = 1  2  3  4  5  6
IA       = 1  4  4  6
JA       = 3  4  8  6  8
AN       = 1. 3. 5. 7. 1.
The description of the first row of A starts at position IA(1) = 1 of the
arrays AN and JA. Since the description of the second row starts at
IA(2) = 4, the first-row elements occupy positions 1, 2 and 3 in AN
and JA.
In general, the description of row r occupies positions IA(r) to
IA(r+1) - 1 of the arrays JA and AN. If IA(r+1) = IA(r), row r is empty.
This storage method is called "Row-wise Representation Complete and
Ordered" (RR(C)O), because the matrix A is represented completely and the
elements of each row are stored in order of increasing column index.
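As an illustration of the row-wise format, the following Python sketch builds the three arrays from a dense matrix and reads one row back. It keeps the 1-based values of IA and JA used in the text; the helper names dense_to_rrco and row_entries are hypothetical.

```python
def dense_to_rrco(a):
    """Build IA, JA, AN (1-based pointers and column indices) from dense rows."""
    ia, ja, an = [1], [], []
    for row in a:
        # Scanning columns in ascending order yields the ordered form.
        for j, value in enumerate(row, start=1):
            if value != 0:
                ja.append(j)
                an.append(float(value))
        ia.append(len(ja) + 1)     # start of the next row = first free position
    return ia, ja, an


def row_entries(ia, ja, an, r):
    """Return the (column, value) pairs of row r, with r counted from 1."""
    start, stop = ia[r - 1] - 1, ia[r] - 1   # 1-based pointers to a Python slice
    return list(zip(ja[start:stop], an[start:stop]))


A = [
    [0, 0, 1, 3, 0, 0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 7, 0, 1, 0, 0],
]
IA, JA, AN = dense_to_rrco(A)
print(IA, JA, AN)
print(row_entries(IA, JA, AN, 2))   # the empty second row
print(row_entries(IA, JA, AN, 3))
```

The arrays produced for this matrix agree with the IA, JA and AN listed above, and the empty second row shows up as IA(2) = IA(3).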
Algebra for sparse matrices
Elementary algebraic operations on sparse matrices include transposition,
column permutation, ordering of a sparse representation, multiplication
of a sparse matrix by a vector, etc. The main requirement in designing
such algorithms is that the amount of computation should grow linearly
with the number of nonzero elements.
Multiplication of a sparse matrix by a vector
In this algorithm the calculations are performed directly, without a
symbolic step. The simplest case uses the "Row-wise Representation
Complete and Unordered" (RR(C)U) format. Most algorithms for matrix
calculations do not require ordered representations. For the example
given above, the RR(C)U representation looks as follows:
position = 1  2  3  4  5
IA       = 1  4  4  6
JA       = 8  3  4  8  6
AN       = 5. 1. 3. 1. 7.
Consider the multiplication of a general sparse matrix by a full vector
b. This vector is stored in an array B, and the result c is stored in an
array C:
c = Ab.
Let N be the number of matrix rows. If b is a full vector, its elements
can be accessed in arbitrary order. If the vector b is sparse and is
stored in the array B in packed form, we first need to create a pointer
array IP; the element b_i is then found as B(IP(i)). The order in which
the scalar products are accumulated is therefore arbitrary. For every
matrix row I we determine the first position IAA and the last position
IAB at which the elements of the I-th row are located in the arrays JA
and AN. After that we simply scan JA and AN from IAA to IAB to calculate
the scalar product of row I and the vector b. Each value in JA is a
column index and is used to extract the element of B that has to be
multiplied by the corresponding number from AN. The result of each
multiplication is added to C(I).
The following is the algorithm in FORTRAN notation for a full vector b:
Input  : IA, JA, AN - matrix in the form RR(C)U
         B          - vector
         N          - number of matrix rows
Output : C          - vector of result (size N)
1        DO 20 I = 1, N
2        U = 0.
3        IAA = IA(I)
4        IAB = IA(I + 1) - 1
5        IF(IAB.LT.IAA) GO TO 20
6        DO 10 K = IAA, IAB
7  10    U = U + AN(K) * B(JA(K))
8  20    C(I) = U
The loop DO 20 processes the N rows of the given matrix. The variable U
is set to zero in line 2. U is used in place of C(I) in line 7 to
accumulate the scalar product. The test in line 5 detects empty rows of
the matrix. K is a pointer to a position in the arrays JA and AN. JA(K)
in line 7 is the column index corresponding to the element AN(K).
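For readers who prefer to experiment outside FORTRAN, here is a minimal Python sketch of the same row-by-row product. It assumes the 1-based IA and JA values of the RR(C)U example, so 1 is subtracted whenever they are used as Python indices; the function name sparse_matvec and the test vector b = (1, 2, ..., 10) are illustrative choices.

```python
def sparse_matvec(ia, ja, an, b, n):
    """Return c = A b for A stored row-wise with 1-based IA and JA."""
    c = [0.0] * n
    for i in range(n):
        u = 0.0
        # Positions IA(i) .. IA(i+1)-1 hold row i; the range is empty for an empty row.
        for k in range(ia[i] - 1, ia[i + 1] - 1):
            u += an[k] * b[ja[k] - 1]
        c[i] = u
    return c


# The unordered representation of the 3 x 10 example matrix.
IA = [1, 4, 4, 6]
JA = [8, 3, 4, 8, 6]
AN = [5.0, 1.0, 3.0, 1.0, 7.0]
b = [float(j) for j in range(1, 11)]
c = sparse_matvec(IA, JA, AN, b, 3)
print(c)   # [55.0, 0.0, 50.0]
```

The work is proportional to the number of stored nonzero elements, and the unordered column indices cause no difficulty because each product is simply added to the running sum.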
NUMERICAL ALGORITHMS for SOLUTION of LARGE SPARSE
SYSTEM of EQUATIONS
Iterative and direct methods
Numerical methods for solving a system of linear algebraic equations
(SLAE) fall into two classes: iterative and direct methods. A typical
iterative method consists of choosing an initial approximation $x^{(1)}$
for x and constructing a sequence $x^{(2)}, x^{(3)}, \ldots$ such that
$\lim_{i \to \infty} x^{(i)} = x$. In practice we stop the iterations when
the required accuracy is reached. Direct methods, on the other hand,
finish in a finite number of calculation steps. Which method is better?
Iterative methods have minimal memory requirements, but their computation
time depends on the type of problem.
Parallel dissection method
This method was developed for 2D finite element method (FEM) models, but
can be generalized to the 3D case. The method is simple enough to be
easily implemented on computers. If the size of the problem is moderate,
this algorithm demands less memory than any other method.
Fig. 2(a) illustrates the main idea of the method. The figure depicts
rectangles that represent sets of nodes of a 2D FEM mesh. Once the number
of separators has been chosen (three in this example), the mesh is
partitioned into one more block than there are separators: R1, R2, ...
Collecting the separators into one block creates a tree partitioning, as
shown in Figure 2(b). Such a partitioning decreases fill-in and the number
of operations. We can now number the nodes of each set R sequentially
from left to right, starting from the lower left corner; after that, all
separators are numbered in the vertical direction. This is a monotone
numbering of the tree. The matrix associated with the FEM mesh is thus
split into blocks, see Figure 2(c). Fill-in can appear only in the shaded
areas. In this example, the four diagonal blocks have banded structure
and can be stored in the sparse row format. Here m and l are the numbers
of mesh nodes in the two orthogonal directions.
Figure 2. Parallel dissection of a regular mesh.
The derived SLAE of size N can be written as follows:
$A_i x_i + B_i y_i = E_i$,
$B_i^T x_i + D_i y_i = F_i$,
$i = 1, 2, \ldots, N - 1$.
Express $x_i$ from the first equation and substitute it into the second:
$x_i = A_i^{-1} E_i - A_i^{-1} B_i y_i$,
$D_i^* y_i = F_i^*$,
where
$D_i^* = D_i - B_i^T A_i^{-1} B_i$,
$F_i^* = F_i - B_i^T A_i^{-1} E_i$.
We call $A_i$, $B_i$ and $D_i$ the interior, boundary and dissection
matrices, respectively.
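The elimination of the interior unknowns can be checked numerically on a tiny dense example. The Python sketch below is an illustration only: the blocks A, B, D and the right-hand sides are made-up test data, and the dense helper solve (Gaussian elimination with partial pivoting) merely stands in for whatever factorization of the interior matrix is actually used.

```python
def solve(a, rhs):
    """Solve a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]      # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):
        z[k] = (m[k][n] - sum(m[k][c] * z[c] for c in range(k + 1, n))) / m[k][k]
    return z


def column(mat, j):
    return [row[j] for row in mat]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def eliminate_interior(A, B, D, E, F):
    """Return (x, y) for  A x + B y = E,  B^T x + D y = F."""
    nb = len(D)
    ainv_e = solve(A, E)                                   # A^{-1} E
    ainv_b = [solve(A, column(B, j)) for j in range(nb)]   # columns of A^{-1} B
    # D* = D - B^T A^{-1} B  and  F* = F - B^T A^{-1} E
    d_star = [[D[i][j] - dot(column(B, i), ainv_b[j]) for j in range(nb)]
              for i in range(nb)]
    f_star = [F[i] - dot(column(B, i), ainv_e) for i in range(nb)]
    y = solve(d_star, f_star)
    # x = A^{-1} E - A^{-1} B y
    x = [ainv_e[r] - sum(ainv_b[j][r] * y[j] for j in range(nb))
         for r in range(len(A))]
    return x, y


# Made-up data whose exact solution is x = (1, 2), y = (3).
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0], [2.0]]
D = [[5.0]]
E = [9.0, 13.0]
F = [20.0]
x, y = eliminate_interior(A, B, D, E, F)
print(x, y)
```

The reduced system for y has only as many unknowns as there are separator nodes, which is why the elimination pays off when the interior blocks are large.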
Calculations are carried out according to the following algorithm:
1. Decomposition of $A_i$ into the product of a lower and an upper
triangular matrix, $A_i = L_i R_i$. For each i this decomposition takes
about $m l^3 / N^3$ operations. All the matrices can be processed in
$(N + 1)\, m l^3 / N^3 \approx m l^3 / N^2$ operations.
2. Calculating $D_i^*$. We can use the implicit asymmetric method, which
is based on the equality
$B_i^T A_i^{-1} B_i = B_i^T (L_i R_i)^{-1} B_i = B_i^T (R_i^{-1} (L_i^{-1} B_i))$.
Calculations are carried out in three steps for each column b of the
matrix $B_i$:
- Calculating $c = L_i^{-1} b$. This is equivalent to solving the SLAE
$L_i c = b$. It takes $m l^2 / N^2$ operations.
- Calculating $d = R_i^{-1} c$. It takes $m l^2 / N^2$ operations.
- Calculating $B_i^T d$. The matrix $B_i^T$ contains 2m rows, and each row
has only three nonzero elements, so this step takes 6m operations. It is
in this step that we use the fact that the matrices $B_i$ and $B_i^T$
are sparse.
So for each i, $2 m l^2 / N^2$ operations are needed.
3. Solving the system $D_i^* y_i = F_i^*$.
The size of this matrix is equal to mN. This SLAE can be decomposed by
the Gauss method in
$N \sum_{i=1}^{m} (2m - i)^2 \approx N (8m^3/3 - m^3/3) = 7 N m^3 / 3$
operations.
So the total number of operations for this method (without forward and
backward substitution) is
$T(N) = m l^3 / N^2 + m l^2 / N^2 + 7 N m^3 / 3$.
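To get a feeling for how the three terms of T(N) trade off, the estimate can simply be tabulated. The Python sketch below evaluates the formula exactly as written above; treating N as a free parameter to scan over, and the mesh sizes m = 10 and l = 100, are assumptions made only for this illustration.

```python
def total_operations(n, m, l):
    """Operation estimate T(N) = m l^3/N^2 + m l^2/N^2 + 7 N m^3/3."""
    return m * l**3 / n**2 + m * l**2 / n**2 + 7.0 * n * m**3 / 3.0


def best_parameter(m, l, n_max):
    """Scan N = 1 .. n_max and return the N with the smallest estimate."""
    return min(range(1, n_max + 1), key=lambda n: total_operations(n, m, l))


# Hypothetical mesh: m = 10 nodes in one direction, l = 100 in the other.
n_best = best_parameter(10, 100, 50)
for n in (1, 5, 10, n_best, 50):
    print(n, round(total_operations(n, 10, 100)))
```

The first two terms shrink as N grows while the last one grows linearly, so the estimate has a minimum at a moderate value of N.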
Subroutine for a numerical solution of a triangular system
This subroutine solves the sparse SLAE
L y = b,
where L is a lower triangular matrix. Let the matrix L be given as
follows:
    | a11                           |
    |      a22                      |
L = | a31       a33                 |
    |           a43  a44            |
    |           a53       a55       |
    |      a62       a64       a66  |
This matrix is stored in the following profile format:
DIAG : a11  a22  a33  a44  a55  a66
ENV  : a31  0    a43  a53  0    a62  0    a64  0
XENV : 1    1    1    3    4    6    10
The algorithm is given in FORTRAN notation (.LT. is <, .GE. is >=, .EQ.
is ==, .NE. is !=). To understand the algorithm, you can rewrite this
FORTRAN algorithm in C, enter the input data, compile, run, and print
the results.
You can use the following data for the matrix L (row by row):
{2}, {0.50, 0.50}, {1, -1, 1}, {0.25, -0.25, -0.50, 0.50}, {1, -1, -2, -3, 1},
and for the vector b:
{7, 3, 7, -4, -4}.
Input : NEQNS - integer number of equations
      : (XENV, ENV) - arrays storing the envelope of L
....
      : DIAG - a real array for diagonal elements
Output: RHS - contains the real input vector b on entry and the real
        solution vector y on return
//Additional integer variables: I, IBAND, IFIRST, K, KSTOP, KSTRT, L, LAST;
//real variable: S.
//search for the first nonzero element in RHS
      IFIRST = 0
  100 IFIRST = IFIRST + 1
      IF (RHS(IFIRST) .NE. 0.) GO TO 200
      IF (IFIRST .LT. NEQNS) GO TO 100
      RETURN
  200 LAST = 0
//LAST contains the index of the last computed nonzero component of the solution
      DO 500 I = IFIRST, NEQNS
      IBAND = XENV(I+1) - XENV(I)
      IF (IBAND .GE. I) IBAND = I - 1
      S = RHS(I)
      L = I - IBAND
      RHS(I) = 0.
//envelope row is empty or the corresponding components of the solution are zero
      IF (IBAND .EQ. 0 .OR. LAST .LT. L) GO TO 400
      KSTRT = XENV(I+1) - IBAND
      KSTOP = XENV(I+1) - 1
      DO 300 K = KSTRT, KSTOP
      S = S - ENV(K)*RHS(L)
      L = L + 1
  300 CONTINUE
  400 IF (S .EQ. 0.) GO TO 500
      RHS(I) = S/DIAG(I)
      LAST = I
  500 CONTINUE
      RETURN
      END
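Before rewriting the subroutine in C, it may help to see the same steps in a compact form. The Python sketch below mirrors the FORTRAN listing, but with assumptions of its own: indices are 0-based, xenv holds 0-based offsets into env (row i owns env[xenv[i]:xenv[i+1]]), and the name envelope_lower_solve is hypothetical. The sample data for L is read row by row, which is an assumption about how the flattened list is meant; because every row is full, each envelope row simply holds all subdiagonal elements.

```python
def envelope_lower_solve(xenv, env, diag, rhs):
    """Solve L y = b in place: rhs holds b on entry and y on return."""
    n = len(diag)
    # Skip leading zeros of b; the matching components of y are zero as well.
    ifirst = 0
    while ifirst < n and rhs[ifirst] == 0.0:
        ifirst += 1
    last = -1                      # index of the last nonzero component of y so far
    for i in range(ifirst, n):
        iband = min(xenv[i + 1] - xenv[i], i)   # envelope width of row i
        s = rhs[i]
        l = i - iband              # column of the first envelope entry of row i
        rhs[i] = 0.0
        # Skip the inner loop if the row is empty or only multiplies zeros of y.
        if iband > 0 and last >= l:
            for k in range(xenv[i + 1] - iband, xenv[i + 1]):
                s -= env[k] * rhs[l]
                l += 1
        if s != 0.0:
            rhs[i] = s / diag[i]
            last = i
    return rhs


# Sample data from the text: rows {2}, {0.5, 0.5}, {1, -1, 1}, ...
diag = [2.0, 0.5, 1.0, 0.5, 1.0]
env = [0.5, 1.0, -1.0, 0.25, -0.25, -0.5, 1.0, -1.0, -2.0, -3.0]
xenv = [0, 0, 1, 3, 6, 10]
y = envelope_lower_solve(xenv, env, diag, [7.0, 3.0, 7.0, -4.0, -4.0])
print(y)   # [3.5, 2.5, 6.0, -2.5, -0.5]
```

Substituting y back into the five equations reproduces the vector b, which gives a reference result for checking a C version.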
Iterative methods
Iterative methods are commonly used for solving sparse SLAEs.
Gauss-Seidel method
Let the SLAE Ax = b be given, where A is an N x N matrix and x and b
are vectors of size N.
Suppose we have an initial approximation $x^{(0)}$ to the solution of the
SLAE. We can choose $x_i^{(0)} = b_i / a_{ii}$.
We consider the following equation:
$x_i^{(m+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(m+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(m)} \right) / a_{ii}$,
where $x_i^{(m+1)}$ is the i-th coordinate of the vector x at the (m+1)-th
iteration,
$x_i^{(m)}$ is the i-th coordinate of the vector x at the m-th iteration,
$b_i$ is the i-th coordinate of the vector b,
$a_{ij}$ is the (i,j) element of the matrix A, and n = N.
At each iteration the residuals $r_i = x_i^{(m+1)} - x_i^{(m)}$ can be
calculated, and the process can be stopped when the error
$\max_i |r_i| < \varepsilon$.
Sometimes the convergence of the iteration process can be accelerated by
applying the relaxation method. First we define an intermediate result:
$y_i^{(m+1)} = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(m+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(m)} \right) / a_{ii}$,
and then
$x_i^{(m+1)} = x_i^{(m)} + w \left( y_i^{(m+1)} - x_i^{(m)} \right)$.
Usually the relaxation coefficient w has a value between 1 and 2.
Algorithm for sparse SLAE
Input  : IA, JA, AN - matrix in the form RR(LU)U, where LU means that the
                      off-diagonal elements of both the lower and the upper
                      triangle are stored (the loop DO 30 needs every
                      off-diagonal element of a row)
         AD         - diagonal elements of the matrix A
         B          - vector of right-hand sides
         N          - number of matrix rows
         F          - relaxation multiplier
         EPS        - accuracy of solution
Output : X          - vector of result (size N)
      DO 10 I = 1, N
   10 X(I) = B(I)/AD(I)
      IT = 0
   20 IT = IT + 1
      IEND = 0
      DO 40 I = 1, N
      IAA = IA(I)
      IAB = IA(I+1) - 1
      IF (IAB .LT. IAA) GO TO 40
      U = B(I)
      DO 30 J = IAA, IAB
   30 U = U - AN(J) * X(JA(J))
      U = U/AD(I) - X(I)
      IF (ABS(U) .GT. EPS) IEND = 1
      X(I) = X(I) + F * U
   40 CONTINUE
      IF (IEND .EQ. 1) GO TO 20
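In the loop DO 10 the initial values of the unknown vector x are
calculated. The variable IT is the current iteration number. The loop
DO 40 processes the N given equations. The loop DO 30 performs the
calculations according to the formula above. IEND is set to 1 whenever
some component changes by more than EPS, in which case another iteration
is started from label 20.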
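The iteration can also be sketched in Python on a small system of its own (not the exercise system). The sketch makes several assumptions: each row of IA, JA, AN lists all off-diagonal elements of that row with 1-based indices, the diagonal is kept separately in ad, the name sor_sparse is hypothetical, and max_iter is an added safeguard that has no counterpart in the FORTRAN listing.

```python
def sor_sparse(ia, ja, an, ad, b, f, eps, max_iter=1000):
    """Gauss-Seidel with relaxation multiplier f (f = 1 is plain Gauss-Seidel).

    Returns the solution vector and the number of iterations performed.
    """
    n = len(ad)
    x = [b[i] / ad[i] for i in range(n)]        # initial approximation b_i / a_ii
    for it in range(1, max_iter + 1):
        converged = True
        for i in range(n):
            u = b[i]
            for j in range(ia[i] - 1, ia[i + 1] - 1):
                u -= an[j] * x[ja[j] - 1]       # x already holds updated values for j < i
            u = u / ad[i] - x[i]                # change proposed for component i
            if abs(u) > eps:
                converged = False
            x[i] += f * u
        if converged:
            return x, it
    raise RuntimeError("no convergence within max_iter iterations")


# Test system with exact solution (1, 1, 1):
#   4 x1 -   x2        = 3
#  -  x1 + 4 x2 -   x3 = 2
#        -   x2 + 4 x3 = 3
IA = [1, 2, 4, 5]
JA = [2, 1, 3, 2]
AN = [-1.0, -1.0, -1.0, -1.0]
AD = [4.0, 4.0, 4.0]
B = [3.0, 2.0, 3.0]
x, iterations = sor_sparse(IA, JA, AN, AD, B, f=1.0, eps=1e-10)
print(x, iterations)
```

Because x is updated in place, the components with index below i are already the new ones when row i is processed, which is exactly what distinguishes Gauss-Seidel from a simultaneous update of all components.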
Exercises.
1. Rewrite the subroutine for the Gauss-Seidel method for sparse SLAE
in C, compile and run it to solve the following system of equations:
20.9 x1 + 1.2 x2 + 2.1 x3 + 0.9 x4 = 21.70
1.2 x1 + 21.2 x2 + 1.5 x3 + 2.5 x4 = 27.46
2.1 x1 + 1.5 x2 + 19.8 x3 + 1.3 x4 = 28.76
0.9 x1 + 2.5 x2 + 1.3 x3 + 32.1 x4 = 49.72
2. Use the C program code (file NM5TEST); compile and run the program to
see the use of sparse matrix technology.