Parallel Iterative Methods for Sparse Linear Systems

H. Martin Bücker
Lehrstuhl für Hochleistungsrechnen / Institute for Scientific Computing, RWTH Aachen
www.sc.rwth-aachen.de

[Opening slides contrast "large and sparse" with "small and dense" matrices.]

Outline
• Problem with Direct Methods
• Iterative Methods
• Krylov Subspace Methods
• Selected Issues in Parallelism

Solution of Linear Systems

  Ax = b

A: coefficient matrix
• of size N × N
• regular, so a unique solution exists
• large
• sparse
x, b: vectors of dimension N

Origin of Sparse Systems
• Finite Element Method
• Finite Volume Method
• Finite Difference Method
• further sources as well
Increasing the problem size N increases the sparsity. Therefore: sparsity becomes more and more important.

Problem with Direct Methods

Consider a factorization under different orderings:
• original ordering
• reverse Cuthill-McKee (CMK) ordering
• column count ordering
• minimum degree ordering
[Slides show the sparsity pattern, the fill-in, and the factorization time for each ordering.]

Explicit Use of Matrix

"Pure" direct methods need Ω(N²) storage.
Reality check: N = 10^7 →
≈ 600 terabytes.
Fill-in and time depend on the ordering. Finding an ordering with minimal fill-in is a hard combinatorial problem ("NP-complete").

Sparse Direct Methods
• A. George and J.W. Liu: Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, 1981.
• I.S. Duff, A.M. Erisman, and J. Reid: Direct Methods for Sparse Matrices, Clarendon Press, 1986.
• C.H. Bischof: "Introduction to High-Performance Computing", winter 2001/02, Aachen University of Technology.

Implicit Use of Matrix

Avoid explicit use of the matrix (operations on rows & columns); instead rely on a "fast" matrix-vector multiplication y → Ay:
• O(N) for sparse matrices
• O(N log N) for dense and structured matrices

Classical Iterative Methods

The iterative scheme
  M x_n = S x_{n-1} + b
with the matrix splitting A = M − S converges if M is nonsingular and ρ(M⁻¹ S) < 1.
Choice of the splitting → Jacobi, Gauss-Seidel, ...

Alexei Nikolaevich Krylov (1863-1945)

Maritime engineer; some 300 papers and books on shipbuilding, magnetism, artillery, mathematics, and astronomy.
1890: theory of the oscillating motions of the ship.
1931: the paper that gave rise to Krylov subspace methods.

Krylov Subspace Methods

Conjugate Gradients (CG)

For symmetric positive definite systems:
  for n = 1, 2, 3, ...
    ... A p_{n-1} ...
    x_n = x_{n-1} + α_n p_{n-1}
    ...
  endfor
Optimal: x_n minimizes || x* − x_n ||_A.
Efficient: storage and work per iteration are fixed.
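The slide elides most of the CG recurrences. A minimal NumPy sketch of the full loop, assuming the standard CG formulation (the function and variable names are mine, not from the slides):

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=None):
    """Minimal conjugate gradients for a symmetric positive definite A.

    Per iteration: one matrix-vector product plus O(N) vector work,
    with fixed storage -- the "efficient" corner of the classification.
    """
    N = b.size
    max_iter = max_iter or N
    x = np.zeros(N)
    r = b.copy()                    # residual r_0 = b - A x_0 (x_0 = 0)
    p = r.copy()                    # first search direction p_0
    rho = r @ r
    for _ in range(max_iter):
        if np.sqrt(rho) < tol:      # ||r_{n-1}|| small enough
            break
        v = A @ p                   # the single MatVec per iteration
        alpha = rho / (p @ v)       # step length alpha_n
        x += alpha * p              # x_n = x_{n-1} + alpha_n p_{n-1}
        r -= alpha * v              # short recurrence for the residual
        rho_new = r @ r
        p = r + (rho_new / rho) * p # next A-conjugate search direction
        rho = rho_new
    return x

# Usage: a small SPD system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg(A, b)
```

The short recurrences are what keep storage and work per iteration fixed: only the current x, r, and p are retained, never the whole Krylov basis.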
Generalized Minimum Residual Method (GMRES)

For general nonsymmetric systems:
  for n = 1, 2, 3, ...
    ... A p_{n-1} ...
    for k = 1, ..., n
      α_k^(n) = p_k^T v
    endfor
    ...
  endfor

GMRES

Let the residual vector be r_n := b − A x_n.
Goal of any Krylov subspace method: r_n → 0.
Optimal: x_n minimizes || b − A x_n ||_2.
Inefficient: storage and work per iteration grow with n.

Classification

The build-up slides fill a 2×2 grid, cost per iteration vs. optimality:

                                   Optimal                      Not Optimal
  Efficient:   MatVec + O(N)       not possible in general      short recurrences
                                   (CG: symmetric only!)
  Inefficient: MatVec + O(n·N)     GMRES (long recurrences)     not useful

Iterative Methods
• Y. Saad: Iterative Methods for Sparse Linear Systems, PWS Publishing, 1996.
• L.N. Trefethen and D. Bau, III: Numerical Linear Algebra, SIAM, 1997.
• H.M. Bücker: "Parallel Algorithms and Software for Iterative Methods", summer 2001, Aachen University of Technology.
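The growing cost per GMRES iteration is easiest to see in code. A minimal sketch of unrestarted GMRES via the Arnoldi process (names are mine; production implementations update a QR factorization of H with Givens rotations instead of solving a fresh least-squares problem every step, but the long recurrences are visible either way):

```python
import numpy as np

def gmres(A, b, max_iter=50, tol=1e-10):
    """Minimal unrestarted GMRES for a general nonsymmetric system.

    Iteration n stores n+1 Krylov basis vectors and orthogonalizes the
    new vector against ALL of them: storage and work grow with n
    (long recurrences), but x_n minimizes ||b - A x_n||_2.
    """
    N = b.size
    beta = np.linalg.norm(b)                 # r_0 = b for x_0 = 0
    Q = np.zeros((N, max_iter + 1))          # Krylov basis, grows with n
    H = np.zeros((max_iter + 1, max_iter))   # upper Hessenberg matrix
    Q[:, 0] = b / beta
    for n in range(max_iter):
        v = A @ Q[:, n]                      # one MatVec per iteration
        for k in range(n + 1):               # orthogonalize vs. all q_k
            H[k, n] = Q[:, k] @ v
            v -= H[k, n] * Q[:, k]
        H[n + 1, n] = np.linalg.norm(v)
        # Small least-squares problem: minimize ||beta e_1 - H y||_2
        e1 = np.zeros(n + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:n + 2, :n + 1], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:n + 2, :n + 1] @ y)
        if res < tol or H[n + 1, n] < tol:   # converged or (lucky) breakdown
            return Q[:, :n + 1] @ y
        Q[:, n + 1] = v / H[n + 1, n]
    return Q[:, :max_iter] @ y

# Usage: a small nonsymmetric system
A = np.array([[2.0, 1.0], [0.5, 3.0]])
b = np.array([1.0, 1.0])
x = gmres(A, b)
```

The inner `for k` loop is exactly the O(n·N) term of the classification: it touches every stored basis vector, so iteration n costs more than iteration n−1.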
Selected Issues in Parallelism

Parallel Matrix-Vector Product

z = A y with

  z_i = a_ii y_i + Σ_{k=1..N, (i,k) ∈ E} a_ik y_k

Distribute data and work over p processors:
• balance the computational load
• minimize the communication

Symmetric Matrix Pattern / Graph Representation
[Slides show a symmetric sparsity pattern and the undirected graph it induces.]

Graph Partitioning

Given an undirected graph G = (V, E), find a partition of the nodes, V = V_1 ∪ V_2 ∪ ... ∪ V_p, such that the number of edges connecting nodes in different V_i is minimal. This is a hard combinatorial problem: NP-complete (already for p = 2).
[Slides show an example partition of the graph.]

Elimination of Syncs

Iterative methods involve synchronization points in reduction operations such as
• inner products
• vector norms
Avoid such data dependencies when designing new iterative methods.

Convergence History
[Slide plots the relative residual norm || r_n || / || r_0 || against the iteration number n.]

Parallel Performance
[Slides plot performance against the number of processors on an Intel Paragon, 1997; parallel performance depends on the architecture.]

Summary
• Direct methods → fill-in
• Don't use classical iterations
• Krylov subspace methods (long vs. short recurrences)
• New issues in parallelism (graph partitioning, new methods)
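The sum for z_i touches only the nonzeros of row i, which is what makes the product O(N) for sparse matrices. A minimal sketch in CSR (compressed sparse row) storage — the format is standard, the helper name `csr_matvec` is mine; distributing blocks of rows over processors is where the load-balancing and communication questions above arise:

```python
import numpy as np

def csr_matvec(indptr, indices, data, y):
    """Sparse matrix-vector product z = A y in CSR format.

    Work is O(nnz), i.e. O(N) for a sparse matrix. Rows are independent,
    so they can be distributed over processors; load balance then depends
    on the nonzeros per row block, and communication on which entries of
    y the rows of a processor touch.
    """
    N = len(indptr) - 1
    z = np.zeros(N)
    for i in range(N):                       # each row i yields one z_i
        for j in range(indptr[i], indptr[i + 1]):
            z[i] += data[j] * y[indices[j]]  # z_i += a_ik * y_k
    return z

# Usage: the 3x3 sparse matrix [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
indptr = [0, 2, 4, 5]          # row i occupies data[indptr[i]:indptr[i+1]]
indices = [0, 1, 0, 1, 2]      # column index of each stored nonzero
data = [4.0, 1.0, 1.0, 3.0, 2.0]
y = np.array([1.0, 1.0, 1.0])
z = csr_matvec(indptr, indices, data, y)   # -> [5., 4., 2.]
```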