* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Slide 1
Euclidean vector wikipedia , lookup
Rotation matrix wikipedia , lookup
Exterior algebra wikipedia , lookup
Vector space wikipedia , lookup
Linear least squares (mathematics) wikipedia , lookup
Jordan normal form wikipedia , lookup
Determinant wikipedia , lookup
Covariance and contravariance of vectors wikipedia , lookup
Matrix (mathematics) wikipedia , lookup
Eigenvalues and eigenvectors wikipedia , lookup
Perron–Frobenius theorem wikipedia , lookup
Orthogonal matrix wikipedia , lookup
Singular-value decomposition wikipedia , lookup
Non-negative matrix factorization wikipedia , lookup
Cayley–Hamilton theorem wikipedia , lookup
System of linear equations wikipedia , lookup
Ordinary least squares wikipedia , lookup
Gaussian elimination wikipedia , lookup
Four-vector wikipedia , lookup
Linear Algebra on GPUs Jens Krüger Technische Universität München Why LA on GPUs? 1. Why should we care about Linear Algebra at all? Use LA to solve PDEs solving PDEs can increase realism for VR, Education, Simulations, Games, … Why Linear Algebra on GPUs? 2. … and why do it on the GPU? a) The GPU is a fast streaming processor • b) LA operations are easily “streamable” The result of the computation is already on the GPU and ready for display Getting started … Visual simulation Education and Training Basis linear algebra operators High bandwidth Computer graphics applications Visual computing GPU as workhorse for numerical computations Programmable GPUs General linear algebra package Parallel computing Representation Vector representation – 2D textures best we can do • Per-fragment vs. per-vertex operations • High texture memory bandwidth • Read-write access, dependent fetches 1 N 1 N Representation (cont.) Dense Matrix representation – treat a dense matrix as a set of column vectors – again, store these vectors as 2D textures i N Vectors Matrix N 2D-Textures N ... ... N N 1 1 i N i ... ... N Representation (cont.) Banded Sparse Matrix representation – treat a banded matrix as a set of diagonal vectors i Matrix 2 Vectors 2 2D-Textures 1 N N N 1 2 2 Representation (cont.) Banded Sparse Matrix representation – combine opposing vectors to save space i Matrix 2 Vectors 2 2D-Textures 1 N N N-i N 1 2 2 Operations • Vector-Vector Operations – Reduced to 2D texture operations – Coded in vertex/fragment shaders return tex0 OP tex1 Operations (cont.) • Vector-Vector Operations – Reduce operation for scalar products original Texture st 1 pass nd 2 pass ... ... ... Reduce m x n region in fragment shader ... Operations (cont.) In depth example: Vector / Banded-Matrix Multiplication A b x = Example (cont.) Vector / Banded-Matrix Multiplication A A b x = b Example (cont.) Compute the result in 2 Passes Pass 1: = b A x multiply Example (cont.) Compute the result in 2 Passes Pass 2: A = b shift b‘ x multiply Representation (cont.) Random sparse matrix representation – Textures do not work • Splitting yields highly fragmented textures • Difficult to find optimal partitions – Idea: encode only non-zero entries in vertex arrays Column as TexCoord 1-4 Matrix Vector = Result Matrix Values as TexCoord 0 Vertex Array 1 Tex0 Pos Tex1-4 Vertex Array 2 Tex0 Pos Tex1-4 Vector Result = N Row Index as Position N Building a Framework Presented so far: • representations on the GPU for – single float values – vectors – matrices • dense • banded • random sparse • operations on these representations – add, multiply, reduce, … – upload, download, clear, clone, … Building a Framework Example: CG Encapsulate into Classes for more complex algorithms – Example use: Conjugate Gradient Method, complete source: void clCGSolver::solveInit() { m_clMatrix->matrixVectorOp(CL_SUB,m_clvX,m_clvB,m_clvR); // R = A*x-b m_clvR->addVector(m_clvR,m_clvR,-1.0f,0.0f); // R = -R m_clvR->addVector(m_clvR,m_clvP,1.0f,0.0f); // P = R m_clvR->reduceAdd(m_clvR, clfRho); // rho = sum(R*R); } void clCGSolver::solveIteration() { m_clMatrix->matrixVectorOp(CL_NULL,m_clvP,NULL,m_clvQ); // Q = Ap; m_clvP->reduceAdd(m_clvQ,clfTemp); // temp = sum(P*Q); clfRho->divZ(clfTemp,clfAlpha); // alpha = rho/temp; m_clvX->addVector(m_clvP,m_clvX,1,clfAlpha); // X = X + alpha*P m_clvR->subtractVector(m_clvQ,m_clvR,1,clfAlpha); // R = R - alpha*Q m_clvR->reduceAdd(m_clvR,clfNewRho); // newrho = sum(R*R); clfNewRho->divZ(clfRho,clfBeta); // beta = newrho/rho m_clvR->addVector(m_clvP,m_clvP,1,clfBeta); // P = R+beta*P; clFloat *temp; temp=clfNewRho; clfNewRho=clfRho; clfRho=temp; // swap rho and newrho pointer } void clCGSolver::solve(int maxiter) { solveInit(); for (int i = 0;i< maxiter;i++) solveIteration(); } int clCGSolver::solve(float rhoTresh, int maxiter) { solveInit(); clfRho->clone(clfNewRho); for (int i = 0;i< maxiter && clfNewRho.getData() > rhoTresh;i++) solveIteration(); return i; } Building a Framework Example: Jacobi • The Jacobi method for a problem A x = b where A = L + D +U can be expressed with matrices as follows: x (k +1) -1 (k ) = D (b - ( L + U ) x ) If the Matrix is stored in the diagonal form the matrix D-1 can be computed on the fly by inverting the diagonal elements during the vector-vector product. So one Jacobi step becomes one matrix-vector product, one vector-vector product and one vector subtract. Example 1 2D Waves (explicit) • usually you would compute: t +1 i, j x ( = x t i -1, j +x t i , j -1 +x t i +1, j +x t i , j +1 - 4 x t i, j ) • you need to write a custom shader for this filter • think about this as a matrix-vector operation! - - - - - - - - - x 1t x1t +1 x 2t x2t +1 x 3t x3t +1 x 4t x4t +1 x 5t x 6t x 7t x 7t x 9t = x5t +1 x6t +1 x7t +1 x8t +1 x9t +1 Example 2 2D Waves (implicit) 2D wave equation – Finite difference discretization – Implicit Crank-Nicholson scheme Key Idea: Rewrite as Matrix-Vector Product Example 2: 2D wave equation (cont.) ( ) cit = xit-1, j + xit, j -1 + xit+1, j + xit, j +1 - 4 xit, j + 2 xit, j - xit,-j1 t 2 c 2 where = 2 h 2 Navier Stokes on GPUs The Equations The Navier-Stokes Equations for 2D: Advection du 1 2 = u - V u + f x - p dt Re dv 1 2 = v - V v + f y - p dt Re Diffusion div (V ) = 0 Pressure Zero Divergence External Forces NSE Discretization ( ) v p 1 2 v 2 v (uv ) v 2 = + + gy 2 2 t Re x y x y y • Diffusion 2v vi +1, j - 2vi , j + vi -1, j 2 = (dy )2 x i, j 2v vi , j +1 - 2vi , j + vi , j -1 2 = (dy )2 y i , j • Advection (uv ) 1 (ui , j + ui , j +1 )(vi , j + vi +1, j ) (ui -1, j + ui -1, j +1 )(vi -1, j + vi , j ) 1 ui , j + ui , j +1 (vi , j - vi +1, j ) (ui -1, j + ui -1, j +1 )(vi -1, j - vi , j ) = + x d d x 2 2 2 2 x 2 2 2 2 i, j • Pressure p pi , j +1 - pi , j = dy y i, j Navier-Stokes Equations (cont.) Rewrite the Navier Stokes Equations (+ ) () ui ,t j 1 = Fi , tj - dt (t +1) (t +1) (pi+1, j - pi, j ) dx (+ ) () vi ,t j 1 = Gi ,tj - where Fi , j = u i , j Gi , j dt (t +1) (t +1) (p - p ) dy i , j +1 i , j ( ) 2 u u 2 (uv ) 1 2 u ; + t + + g x Re x 2 y 2 i , j x i , j y i , j , i j ( ) 2 v (uv ) v2 1 2 v ; = v i , j + t + + g y Re x 2 y 2 i , j x i , j y i , j , i j now F and G can be computed Navier-Stokes Equations (cont.) Problem: Pressure is still unknown! (+ ) () ui ,t j 1 = Fi , tj - dt (t +1) (t +1) (pi+1, j - pi, j ) dx (+ ) () vi ,t j 1 = Gi ,tj - dt (t +1) (t +1) (p - p ) dy i , j +1 i , j From div (V ) = 0 derive: u t +1 v t +1 G t 2 p t +1 F t 2 p t +1 + = - t + - t 0= 2 x y x x y y 2 ..to get this Poisson Equation: ( +1) ( +1) ( +1) pi +n1, j - 2 pi ,nj + pi -n1, j (dx ) 2 + ( +1) ( +1) ( +1) pi ,nj +1 - 2 pi ,nj + pi ,nj -1 (dy )2 (n ) (n ) (n ) (n ) 1 Fi , j - Fi -1, j Gi , j - Gi , j -1 = + dt dx dy Navier-Stokes Equations (cont.) The basic algorithm: 1. Compute F and G 1. add external forces 2. advect 3. diffuse easy semi-lagrange [Stam 1999] explicit 2. solve the Poisson equation use the CG solver 3. update velocities subtract pressure gradient Multigrid on GPUs Multigrid “in english” 1. do a few Jacobi/Gauss-Seidel iterations on the fine grid – – Jacobi/G-S eliminate high frequencies in the error conjugate gradient does not have this property !!! 2. compute the residuum of the last approximaton 3. propagate this residuum to the next coarser grid – 4. solve the coarser grid for the absolute error – 5. can be done by the means of a matrix multiplication for the solution you can use another multigrid step backpropagate the error to the finer grid – can be done by another matrix multiplication (transposed matrix from 3.) 6. use the error to correct the first approximation 7. do another few Jacobi/Gauss-Seidel iterations to remove noise introduced by the propagation steps Multigrid “in greek” A x =b A h x h + e h = b h h h ( h ) • Consider this problem: x is the solution vector of a set of linear equations • this equation holds for current approximation x’ with the error e h • - A x A e = b142 43 h h h h rearranging leads to the residual equation with residuum r rh I 2h h A e{ = I h 2h h h 2h h I 2 h e (1I 42 A I ) e 4 43 4 2h h h h 2h 2h = r 2h A2 h 2h 2h = A e r 2h r h • now multiply both sides with a non quadratic interpolation matrix and replace the error with an error times the transposed interpolation matrix • Let A2h be the product of the Interpolation Matrix, A and the transposed Interpolation matrix. • Finally we end up with a new set of linear equations with only half the size in every dimension of the old one. We solve this set for the error at this grid level. Propagating the error the above steps we can derive the error for the large system and use it to correct out approximation Multigrid (cont.) • „fine grid“, „coarse grid“ only makes sense if the problem to solve corresponds to a grid – this is the case in the finite difference methods described before – need to find an “interpolation matrix” for the propagation step to generate the coarser grid (for instance simple linear interpolation) – need an “extrapolation matrix” to move from the coarse to the fine grid • the coarse grid matrices can be pre-computed Multigrid on GPUs Observation: • you only need matrix-vector operations and a “Jacobi smoother” to do multigrid • to put it on the GPU simply use the matrix-vector operations from the framework Improvement: • to do the interpolation an extrapolation steps we can use the fast bilinear interpolation hardware of the GPU instead of a vector-matrix multiplication Multigrid on GPUs (cont.) We can embed the new “rescale” easily into the vector representation by simply rendering one textured quad. Vector Texture Interpolation Render Target Render Target Extrapolation Vector Texture Selected References • Chorin, A.J., Marsden, J.E. A Mathematical Introduction to Fluid Mechanics. 3rd ed. Springer. New York, 1993 • Briggs, Henson, McCormick A Multigrid Tutorial, 2nd ed. siam, ISBN 0-89871-462-1 • Acton Numerical Methods that Work, The Mathematical Association of America ISBN 0-88385-450-3 • Krüger, J. Westermann, R. Linear algebra operators for GPU implementation of numerical algorithms, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH • Bolz , J., Farmer, I., Grinspun, E., Schröder, P. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH • Hillesland, K. E. Nonlinear Optimization Framework for Image-Based Modeling on Programmable Graphics Hardware, In Proceedings of SIGGRAPH 2003, ACM Press / ACM SIGGRAPH • Fedkiw, R., Stam, J. and Jensen, H.W. Visual Simulation of Smoke. In Proceedings of SIGGRAPH 2001, ACM Press / ACM SIGGRAPH. 2001. • Stam, J. Stable Fluids. In Proceedings of SIGGRAPH 1999, ACM Press / ACM SIGGRAPH, 121-128. 1999. • Harris, M., Coombe, G., Scheuermann, T., and Lastra, A. Physically-Based Visual Simulation on Graphics Hardware.. Proc. 2002 SIGGRAPH / Eurographics Workshop on Graphics Hardware 2002.