Condition estimation and scaling

Bindel, Fall 2012
Matrix Computations (CS 6210)
Week 5: Monday, Sep 17
The slippery inverse
The concept of the inverse of a matrix is generally more useful in theory
than in numerical practice. We work with the inverse implicitly all the time
through solving linear systems via LU; but we rarely form it explicitly, unless
the inverse has some special structure we want to study or to use.
If we did want to form $A^{-1}$ explicitly, the usual approach is to compute $PA = LU$, then use that factorization to solve the systems $Ax_k = e_k$, where $e_k$ is the $k$th column of the identity matrix and $x_k$ is thus the $k$th column of $A^{-1}$. As discussed last time, forming the LU factorization takes $n^3/3$ multiply-adds ($2n^3/3$ flops), and a pair of triangular solves takes $n^2$ multiply-add operations. Therefore, computing the inverse explicitly via an LU factorization takes about $n^3$ multiply-add operations, or roughly three times as much arithmetic as the original LU factorization.
Furthermore, multiplying by an explicit inverse is almost exactly the same
amount of arithmetic work as a pair of triangular solves. So computing and
using an explicit inverse is, on balance, more expensive than simply solving
linear systems using the LU factorization.
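As a concrete illustration of this cost accounting, here is a short MATLAB sketch (for illustration only, assuming a square matrix A is already defined) that forms the inverse column by column from a single LU factorization:

% Sketch: forming A^{-1} explicitly from PA = LU (illustration only)
[L, U, P] = lu(A);              % one factorization: about n^3/3 multiply-adds
n = size(A,1);
Ainv = zeros(n);
for k = 1:n
  Ainv(:,k) = U \ (L \ P(:,k)); % two triangular solves (about n^2 multiply-adds) per column
end
% In practice, prefer x = U \ (L \ (P*b)) to x = Ainv*b.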
To make matters worse, multiplying by the explicit inverse of a matrix is not a backward stable algorithm. Even if we could compute $A^{-1}$ essentially exactly, only committing rounding errors when storing the entries and when performing matrix-vector multiplication, we would find $\mathrm{fl}(A^{-1}b) = (A^{-1} + F)b$, where $|F| \le n\,\epsilon_{\mathrm{mach}}\,|A^{-1}|$. But this corresponds to a backward error of roughly $-AFA$, which is potentially much larger than $\epsilon_{\mathrm{mach}} \|A\|$.
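To see where the $-AFA$ comes from, expand to first order in $F$:
$$(A^{-1}+F)^{-1} = \bigl(A^{-1}(I + AF)\bigr)^{-1} = (I + AF)^{-1}A \approx (I - AF)A = A - AFA,$$
so $\mathrm{fl}(A^{-1}b) = (A + E)^{-1}b$ with backward error $E \approx -AFA$, whose norm can be as large as roughly $n\,\epsilon_{\mathrm{mach}}\,\kappa(A)\,\|A\|$.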
In summary: you should get used to the idea that any time you see an
inverse in the description of a numerical method, it is probably shorthand for
“solve a linear system here.” Except in special circumstances, forming and
multiplying by an explicit inverse is both slower and less numerically stable
than solving a linear system by Gaussian elimination.
Iterative refinement revisited
At the end of last lecture, we discussed iterative refinement:
$$x_{k+1} = x_k + \hat{A}^{-1}(b - Ax_k).$$
The fixed point for this iteration is $x = A^{-1}b$, and we can write a simple recurrence for the error $e_k = x_k - x$:
$$e_{k+1} = \hat{A}^{-1} E e_k,$$
where $E = \hat{A} - A$. Therefore, if $\|\hat{A}^{-1}E\| < 1$, then iterative refinement converges, at least in exact arithmetic.
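To see this, subtract $x$ from both sides of the iteration and use $b = Ax$:
$$e_{k+1} = e_k + \hat{A}^{-1}(b - Ax_k) = e_k - \hat{A}^{-1}A e_k = \hat{A}^{-1}(\hat{A} - A)e_k = \hat{A}^{-1}E e_k.$$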
In floating point arithmetic, we actually compute something like
$$x_{k+1} = x_k + \hat{A}^{-1}(b - Ax_k + \delta_k) + \mu_k,$$
where $\delta_k$ is an error associated with computing the residual, and $\mu_k$ is an error associated with the update. This gives us the error recurrence
$$e_{k+1} = \hat{A}^{-1}E e_k + \hat{A}^{-1}\delta_k + \mu_k.$$
If $\|\delta_k\| < \alpha$ and $\|\mu_k\| < \beta$ for all $k$, then we can show that
$$\|x_k - x\| \le \|\hat{A}^{-1}E\|^k \|x_0 - x\| + \frac{\alpha\|\hat{A}^{-1}\| + \beta}{1 - \|\hat{A}^{-1}E\|}.$$
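The bound follows by taking norms in the error recurrence and summing the resulting geometric series:
$$\|e_{k+1}\| \le \|\hat{A}^{-1}E\|\,\|e_k\| + \alpha\|\hat{A}^{-1}\| + \beta,$$
so unrolling the recursion gives
$$\|e_k\| \le \|\hat{A}^{-1}E\|^k\,\|e_0\| + \bigl(\alpha\|\hat{A}^{-1}\| + \beta\bigr)\sum_{j=0}^{k-1}\|\hat{A}^{-1}E\|^j \le \|\hat{A}^{-1}E\|^k\,\|e_0\| + \frac{\alpha\|\hat{A}^{-1}\| + \beta}{1-\|\hat{A}^{-1}E\|}.$$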
If we evaluate the residual in the obvious way, we typically have
$$\alpha \le c_1 \epsilon_{\mathrm{mach}} \|A\|\,\|x\|, \qquad \beta \le c_2 \epsilon_{\mathrm{mach}} \|x\|,$$
for some modest $c_1$ and $c_2$; and for large enough $k$, we end up with
$$\frac{\|x_k - x\|}{\|x\|} \le C_1 \epsilon_{\mathrm{mach}} \kappa(A) + C_2 \epsilon_{\mathrm{mach}}.$$
That is, iterative refinement leads to a relative error not too much greater than we would expect from a small relative perturbation to $A$; and we can show that in this case the result is backward stable. If we use mixed precision to evaluate the residual accurately enough relative to $\kappa(A)$ (i.e. $\alpha\kappa(A) \lesssim \beta$), we can actually achieve a small forward error.
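As a minimal MATLAB sketch of the refinement loop above (assuming $A$ and $b$ are already in the workspace, and using $\hat{A} = A$ with its LU factors as the approximate solver):

% Sketch: iterative refinement with LU factors playing the role of Ahat^{-1}
[L, U, P] = lu(A);
x = U \ (L \ (P*b));        % initial solve
for k = 1:5                 % a few steps usually suffice
  r = b - A*x;              % residual; evaluate in extended precision for a small forward error
  d = U \ (L \ (P*r));      % correction d = Ahat^{-1} r
  x = x + d;
  if norm(d) <= eps*norm(x), break; end   % stop when the update is negligible
end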
Condition estimation
Suppose now that we want to compute $\kappa_1(A)$ (or $\kappa_\infty(A) = \kappa_1(A^T)$). The most obvious approach would be to compute $A^{-1}$, and then to evaluate
$\|A^{-1}\|_1$ and $\|A\|_1$. But the computation of $A^{-1}$ involves solving $n$ linear systems for a total cost of $O(n^3)$, the same order of magnitude as the initial factorization. Error estimates that cost too much typically don't get used, so we want a different approach to estimating $\kappa_1(A)$, one that does not cost so much. The only piece that is expensive is the evaluation of $\|A^{-1}\|_1$, so we will focus on this.
Note that $\|A^{-1}x\|_1$ is a convex function of $x$, and that the set $\{x : \|x\|_1 \le 1\}$ is convex. So finding
$$\|A^{-1}\|_1 = \max_{\|x\|_1 \le 1} \|A^{-1}x\|_1$$
is a convex optimization problem. Also, note that $\|\cdot\|_1$ is differentiable almost everywhere: if all the components of $y$ are nonzero, then
$$\xi^T y = \|y\|_1, \quad \text{for } \xi = \operatorname{sign}(y);$$
and if $\delta y$ is small enough so that all the components of $y + \delta y$ have the same sign as the corresponding components of $y$, then
$$\xi^T (y + \delta y) = \|y + \delta y\|_1.$$
More generally, we have
$$\xi^T u \le \|\xi\|_\infty \|u\|_1 = \|u\|_1,$$
i.e. even when $\delta y$ is big enough so that the linear approximation to $\|y + \delta y\|_1$ no longer holds, we at least have a lower bound.
Since $y = A^{-1}x$, we actually have
$$|\xi^T A^{-1}(x + \delta x)| \le \|A^{-1}(x + \delta x)\|_1,$$
with equality when $\delta x$ is sufficiently small (assuming $y$ has no zero components). This suggests that we move from an initial guess $x$ to a new guess $x_{\mathrm{new}}$ by maximizing
$$|\xi^T A^{-1} x_{\mathrm{new}}|$$
over $\|x_{\mathrm{new}}\|_1 \le 1$. This actually yields $x_{\mathrm{new}} = e_j$, where $j$ is chosen so that the $j$th component of $z^T = \xi^T A^{-1}$ has the greatest magnitude.
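To see why, write $z = A^{-T}\xi$; then $\xi^T A^{-1} x_{\mathrm{new}} = z^T x_{\mathrm{new}}$ is linear in $x_{\mathrm{new}}$, and a linear functional over the $1$-norm unit ball is maximized in magnitude at a coordinate vector:
$$\max_{\|x_{\mathrm{new}}\|_1 \le 1} |z^T x_{\mathrm{new}}| = \|z\|_\infty = |z_j|, \qquad \text{attained at } x_{\mathrm{new}} = e_j.$$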
Putting everything together, we have the following algorithm:
% Hager's algorithm to estimate norm(A^{-1},1)
% We assume solveA and solveAT are O(n^2) solution algorithms
% for linear systems involving A or A' (e.g. via LU)

x = ones(n,1)/n;        % Initial guess
while true
  y = solveA(x);        % Evaluate y = A^{-1} x
  xi = sign(y);         % and z = A^{-T} sign(y), the
  z = solveAT(xi);      % (sub)gradient of x -> \|A^{-1} x\|_1.

  % Find the largest magnitude component of z;
  % znorm = |z_j| is our lower bound on \|A^{-1} e_j\|_1.
  [znorm, j] = max(abs(z));

  % If this lower bound is no better than where we are now, quit
  if znorm <= norm(y,1)
    invA_normest = norm(y,1);
    break;
  end

  % Otherwise, update x to e_j and repeat
  x = zeros(n,1); x(j) = 1;
end
This method is not infallible, but it usually gives estimates that are the right order of magnitude. There are various alternatives, refinements, and extensions to Hager's method, but they generally have the same flavor of probing $A^{-1}$ through repeated solves with $A$ and $A^T$.
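For concreteness, here is one way (a sketch, not part of the algorithm above) to supply solveA and solveAT from an LU factorization and turn the output into a condition number estimate; MATLAB's built-in condest works along similar lines.

% Sketch: hooking the estimator up to LU factors of A
[L, U, P] = lu(A);
solveA  = @(b) U \ (L \ (P*b));      % solves A x = b in O(n^2) per call
solveAT = @(b) P' * (L' \ (U' \ b)); % solves A' x = b, since A' = U' L' P
% ... run the loop above to compute invA_normest, then:
kappa1_est = norm(A,1) * invA_normest;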
Scaling
Suppose we wish to solve $Ax = b$ where $A$ is ill-conditioned. Sometimes the ill-conditioning is artificial because we made a poor choice of units, and the problem
appears to be better conditioned if we write
$$D_1 A D_2 y = D_1 b,$$
where $D_1$ and $D_2$ are diagonal scaling matrices (and $x = D_2 y$). If the original problem was poorly scaled, we will likely find $\kappa(D_1 A D_2) \ll \kappa(A)$, which may be great for Gaussian elimination. But by scaling the matrix, we are really changing the norms that we use to measure errors, and that may not be the right thing to do.
For physical problems, a good rule of thumb is to non-dimensionalize before computing. The non-dimensionalization will usually reveal a good scaling that (one hopes) is simultaneously appropriate for measuring errors and free of artificially inflated condition numbers.
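For example, here is a simple (and by no means canonical) MATLAB sketch of one such scaling, which equilibrates the rows and then the columns of $A$ by their largest magnitudes; it assumes $A$ and $b$ are already defined and that $A$ has no zero rows or columns:

% Sketch: simple diagonal equilibration D1*A*D2 before solving
D1 = diag(1 ./ max(abs(A), [], 2));      % scale each row by its largest entry
D2 = diag(1 ./ max(abs(D1*A), [], 1));   % then scale each column of the row-scaled matrix
B  = D1 * A * D2;                        % hopefully kappa(B) << kappa(A)
y  = B \ (D1 * b);                       % solve the scaled system
x  = D2 * y;                             % recover the solution of A x = b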