Bindel, Fall 2012 Matrix Computations (CS 6210)
Week 5: Monday, Sep 17

The slippery inverse

The concept of the inverse of a matrix is generally more useful in theory than in numerical practice. We work with the inverse implicitly all the time through solving linear systems via LU, but we rarely form it explicitly, unless the inverse has some special structure we want to study or to use.

If we did want to form A^{-1} explicitly, the usual approach is to compute PA = LU, then use that factorization to solve the systems A x_k = e_k, where e_k is the kth column of the identity matrix and x_k is thus the kth column of A^{-1}. As discussed last time, forming the LU factorization takes n^3/3 multiply-adds (2n^3/3 flops), and a pair of triangular solves takes n^2 multiply-adds. Therefore, computing the inverse explicitly via an LU factorization takes about n^3 multiply-adds in total, or roughly three times as much arithmetic as the factorization alone. Furthermore, multiplying a vector by an explicit inverse takes almost exactly the same amount of arithmetic as a pair of triangular solves. So computing and using an explicit inverse is, on balance, more expensive than simply solving linear systems using the LU factorization.

To make matters worse, multiplying by the explicit inverse of a matrix is not a backward stable algorithm. Even if we could compute A^{-1} essentially exactly, committing rounding errors only when storing the entries and when performing the matrix-vector multiplication, we would find

    fl(A^{-1} b) = (A^{-1} + F) b,  where |F| <= n eps_mach |A^{-1}|.

But this corresponds to a backward error of roughly -A F A, whose norm can be as large as n eps_mach kappa(A) ||A||, which is potentially much larger than eps_mach ||A||.
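As a concrete illustration (a made-up NumPy sketch, not part of the notes), we can compare solving a system directly with forming the inverse and multiplying; the test matrix is made diagonally dominant so that both approaches behave well and the comparison is about cost, not accuracy:

```python
import numpy as np

# Hypothetical illustration: compare solving Ax = b directly against
# forming A^{-1} explicitly and multiplying. The matrix is made
# diagonally dominant so it is well conditioned.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

x_solve = np.linalg.solve(A, b)   # one factorization + two triangular solves
Ainv = np.linalg.inv(A)           # roughly n extra solves: about 3x the work
x_inv = Ainv @ b                  # ~n^2 multiply-adds, same as the solves

# Both give good answers here, but x_solve is the backward-stable one.
print(np.linalg.norm(A @ x_solve - b) / np.linalg.norm(b))
print(np.linalg.norm(A @ x_inv - b) / np.linalg.norm(b))
```

The cost gap grows with n: `np.linalg.inv` does the work of n right-hand sides just to recover the single product A^{-1} b.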
In summary: you should get used to the idea that any time you see an inverse in the description of a numerical method, it is probably shorthand for "solve a linear system here." Except in special circumstances, forming and multiplying by an explicit inverse is both slower and less numerically stable than solving a linear system by Gaussian elimination.

Iterative refinement revisited

At the end of last lecture, we discussed iterative refinement:

    x_{k+1} = x_k + A_hat^{-1} (b - A x_k).

The fixed point for this iteration is x = A^{-1} b, and if we define the error e_k = x_k - x, we can write a simple error recurrence

    e_{k+1} = A_hat^{-1} E e_k,  where E = A_hat - A.

Therefore, if ||A_hat^{-1} E|| < 1, then iterative refinement converges, at least in exact arithmetic. In floating point arithmetic, we actually compute something like

    x_{k+1} = x_k + A_hat^{-1} (b - A x_k + delta_k) + mu_k,

where delta_k is an error associated with computing the residual, and mu_k is an error associated with the update. This gives us the error recurrence

    e_{k+1} = A_hat^{-1} E e_k + A_hat^{-1} delta_k + mu_k.

If ||delta_k|| < alpha and ||mu_k|| < beta for all k, then we can show that

    ||x_k - x|| <= ||A_hat^{-1} E||^k ||x_0 - x|| + (alpha ||A_hat^{-1}|| + beta) / (1 - ||A_hat^{-1} E||).

If we evaluate the residual in the obvious way, we typically have

    alpha <= c_1 eps_mach ||A|| ||x||,  beta <= c_2 eps_mach ||x||,

for some modest c_1 and c_2; and for large enough k, we end up with

    ||x_k - x|| / ||x|| <= C_1 eps_mach kappa(A) + C_2 eps_mach.

That is, iterative refinement leads to a relative error not too much greater than we would expect from a small relative perturbation to A; and we can show that in this case the result is backward stable. And if we use mixed precision to evaluate the residual accurately enough relative to kappa(A) (i.e. alpha kappa(A) is at most on the order of beta), we can actually achieve a small forward error.

Condition estimation

Suppose now that we want to compute kappa_1(A) (or kappa_inf(A) = kappa_1(A^T)). The most obvious approach would be to compute A^{-1}, and then to evaluate ||A^{-1}||_1 and ||A||_1.
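In NumPy terms, this obvious approach is only a couple of lines (an illustrative sketch with a made-up matrix; `np.linalg.cond` computes the same quantity):

```python
import numpy as np

# The "obvious" computation of kappa_1(A): form A^{-1} explicitly and
# take 1-norms. Illustrative only; the point of what follows is to avoid
# the O(n^3) cost of forming the inverse.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
kappa1 = np.linalg.norm(A, 1) * np.linalg.norm(np.linalg.inv(A), 1)
print(kappa1)
```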
But the computation of A^{-1} involves solving n linear systems, for a total cost of O(n^3): the same order of magnitude as the initial factorization. Error estimates that cost too much typically don't get used, so we want a different approach to estimating kappa_1(A), one that does not cost so much. The only expensive piece is the evaluation of ||A^{-1}||_1, so we will focus on this.

Note that ||A^{-1} x||_1 is a convex function of x, and that {x : ||x||_1 <= 1} is a convex set. So finding

    ||A^{-1}||_1 = max over ||x||_1 <= 1 of ||A^{-1} x||_1

is a convex optimization problem. Also, note that the 1-norm is differentiable almost everywhere: if all the components of y are nonzero, then xi^T y = ||y||_1 for xi = sign(y); and if delta_y is small enough that all the components of y + delta_y have the same sign as the corresponding components of y, then xi^T (y + delta_y) = ||y + delta_y||_1. More generally, we have xi^T u <= ||xi||_inf ||u||_1 = ||u||_1; that is, even when delta_y is big enough that the linear approximation to ||y + delta_y||_1 no longer holds, we at least have a lower bound.

Since y = A^{-1} x, we actually have

    |xi^T A^{-1} (x + delta_x)| <= ||A^{-1} (x + delta_x)||_1,

with equality when delta_x is sufficiently small (assuming y has no zero components). This suggests that we move from an initial guess x to a new guess x_new by maximizing |xi^T A^{-1} x_new| over ||x_new||_1 <= 1. This yields x_new = e_j, where j is chosen so that the jth component of z^T = xi^T A^{-1} has the greatest magnitude.

Putting everything together, we have the following algorithm:

    % Hager's algorithm to estimate norm(A^{-1},1)
    % We assume solveA and solveAT are O(n^2) solution algorithms
    % for linear systems involving A or A' (e.g. via LU)
    x = ones(n,1)/n;            % Initial guess
    while true
      y = solveA(x);            % Evaluate y = A^{-1} x
      xi = sign(y);             % and z = A^{-T} sign(y), the
      z = solveAT(xi);          % (sub)gradient of x -> norm(A^{-1} x, 1).

      % Find the largest magnitude component of z
      [znorm, j] = max(abs(z)); % znorm = |z_j| is our lower bound on norm(A^{-1} e_j, 1)

      % If this lower bound is no better than where we are now, quit
      if znorm <= norm(y,1)
        invA_normest = norm(y,1);
        break;
      end

      % Update x to e_j and repeat
      x = zeros(n,1); x(j) = 1;
    end

This method is not infallible, but it usually gives estimates that are the right order of magnitude. There are various alternatives, refinements, and extensions to Hager's method, but they generally have the same flavor of probing A^{-1} through repeated solves with A and A^T.

Scaling

Suppose we wish to solve Ax = b where A is ill-conditioned. Sometimes the ill-conditioning is artificial, the result of a poor choice of units, and the problem appears better conditioned if we instead write

    D_1 A D_2 y = D_1 b,  with x = D_2 y,

where D_1 and D_2 are diagonal scaling matrices. If the original problem was poorly scaled, we will likely find kappa(D_1 A D_2) << kappa(A), which may be great for Gaussian elimination. But by scaling the matrix, we are really changing the norms that we use to measure errors, and that may not be the right thing to do. For physical problems, a good rule of thumb is to non-dimensionalize before computing. The non-dimensionalization will usually reveal a good scaling that (one hopes) simultaneously is appropriate for measuring errors and does not lead to artificially inflated condition numbers.
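A small made-up example of such artificial ill-conditioning (a sketch, not from the notes): the matrix A below is well conditioned, but measuring one variable in absurdly small units scales a column by 2^-30 and inflates kappa_1 by a factor of about 10^9. A diagonal rescaling D_2 undoes the damage exactly, since the scale factor is a power of two:

```python
import numpy as np

# Made-up example: A itself is well conditioned, but a bad choice of
# units multiplies the second column by 2^-30. Rescaling with a diagonal
# D_2 recovers the original, well-conditioned problem.
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
s = 2.0 ** -30                        # exact power of two, so scaling is exact
A_bad = A @ np.diag([1.0, s])         # poorly scaled version of the same problem
D2 = np.diag([1.0, 1.0 / s])

print(np.linalg.cond(A_bad, 1))       # huge: artificial ill-conditioning
print(np.linalg.cond(A_bad @ D2, 1))  # back to kappa_1(A) = 9
```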