Linear algebra review and Matlab linear algebra examples
Web resources
Those marked with ✓ I have verified still work in 2013.
Linear algebra
þ Strang, G., Lecture videos for 18.06 Linear Algebra, Massachusetts Institute of
Technology Open Courseware project http://ocw.mit.edu/courses/mathematics/18-06sclinear-algebra-fall-2011/index.htm (The navigation panel on the left shows 3 units in the
syllabus – click the + to see the individual lecture topics.)
Regression
þ Glover, D., W. Jenkins and S. Doney, (2002), Lecture notes for 12.747 Modeling,
Data Analysis and Numerical Techniques for Geochemistry, MIT/WHOI,
http://w3eos.whoi.edu/12.747/notes/lect03/lectno03.html
Optimal interpolation
þ Hartmann, D., Lecture notes for ATMS 552 Objective Analysis. University of
Washington, Section on Mapping Data Fields:
http://www.atmos.washington.edu/~dennis/552_Notes_5.pdf
Kriging
þ Chu, Dezhang. The GLOBEC Kriging Software Package - EasyKrig2.1,
http://globec.whoi.edu/software/kriging/easy_krig/easy_krig.html
þ Lafleur, Caroline. Matlab Kriging toolbox,
http://globec.whoi.edu/software/kriging/V3/intro_v3.html
Empirical Orthogonal Functions
Bjornsson, H. and S. A. Venegas (1997), A manual for EOF and SVD analyses of climate data. McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52 pp.
þ Hartmann, D., Lecture notes for ATMS 552 Objective Analysis. University of
Washington, Section on EOF/PC analysis:
http://www.atmos.washington.edu/~dennis/552_Notes_4.pdf
Varimax rotation algorithm for Matlab:
http://erizo.ucdavis.edu/~dmk/notes/EOFs/EOFs.html
þ The SVD song on Youtube
http://www.youtube.com/watch?v=JEYLfIVvR9I&uid=_viSqzGQPXi2RAgvnetDoA&lr
=1
Lectures: John Wilkin
Feb 14: Matrix and vector algebra – a review
Geophysical data are frequently presented to us as a set of coinciding time series of
observations at a set of spatial locations. (Sometimes the spatial arrangement is a regular
grid).
It is natural to seek a reduction of these data to a small set of spatial patterns and common
associated time series that describe the dominant coherent patterns of variability that
underlie the data.
Tidal harmonic analysis, and filtering or averaging to a mean seasonal cycle, are
examples of such a reduction of the data. These use prior knowledge of the dominant
dynamics to fit a preconceived model to the data, e.g. the tidal harmonic forcing
frequencies.
But what if we don’t know the temporal patterns that are common across space?
We might be able to propose likely spatial structures instead, based on physical
arguments such as dynamical modes of propagating waves in the atmosphere and ocean
(which, incidentally, are the solutions to an eigenvalue/eigenvector problem defined by
the governing equations and boundary conditions).
However, determining a reasonable set of mode structures might be intractable
(mathematically, or computationally), or we simply may have no idea (yet) what the
dominant processes are.
EOF analysis extracts coherent spatial/temporal patterns of variability from the
covariance of the data themselves. The modes are empirical in that they do not
necessarily bear any correspondence to dynamical modes, and interpreting them as such
could be misleading. But it is very often the case that we see a correspondence between
EOFs and features of ocean and atmosphere dynamics that we recognize, which can
guide further interpretation.
Put simply, EOFs are the eigenvectors of a matrix formed from the covariance between
all possible pairs of the time series.
Treating a time series of data as a vector, we can describe the mean, variance, and
covariance of time series in terms of vector and matrix operations. Before we proceed,
this lecture reviews some essentials of matrix and vector algebra, drawing on the “Basic
Machinery” described in Chapter 3 of Wunsch, C., The Ocean Circulation Inverse
Problem, Cambridge University Press, 442 pp., 1996.
Linear algebra definitions:
Matrix of M by N values:
$$A = \{a_{i,j}\}, \qquad 1 \le i \le M, \quad 1 \le j \le N$$
A dataset in multiple dimensions might be represented by a matrix:
Each row of D is a vector $\mathbf{d}_i$ of the time series at position i.
$$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mn} \end{pmatrix} \qquad \text{(rows: position } i \downarrow\text{; columns: time } j \rightarrow)$$
There are m rows (one for each position).
There are n columns (one for each observation time).
The elements $d_{i,j}$ might be complex for 2-D vector data, e.g. current, wind, stress …
$$d = u + \sqrt{-1}\,v = u + iv$$
or even $d_{i,j} = \text{temperature} + i\,\text{salinity}$. This can be useful if you believe the data co-vary
and are not independent. In the case of mixing temperature and salinity you would want
some kind of normalization so as not to mix units.
>> In Matlab, the constant 1i is always sqrt(-1).
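For example, a minimal Matlab sketch (with made-up u and v components, not course data) of forming a complex data matrix:

u = randn(3,5);        % hypothetical eastward component at 3 positions, 5 times
v = randn(3,5);        % hypothetical northward component
D = u + 1i*v;          % complex data matrix, d = u + i*v
real(D);               % recovers u
imag(D);               % recovers v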
Transpose
Vector of N values and its transpose
$$\mathbf{q} = \begin{bmatrix} q_1 \\ \vdots \\ q_i \\ \vdots \\ q_N \end{bmatrix}, \qquad \mathbf{q}^T = \begin{bmatrix} q_1 & \cdots & q_i & \cdots & q_N \end{bmatrix}$$
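In Matlab the non-conjugate transpose is the .' operator; the ' operator is the conjugate (Hermitian) transpose, which differs when the data are complex as above. A quick sketch:

q  = [1; 2; 3];        % column vector of N = 3 values
qt = q.';              % its transpose, a row vector
z  = [1+2i; 3-1i];     % complex example
z.'                    % plain transpose:     [1+2i, 3-1i]
z'                     % conjugate transpose: [1-2i, 3+1i]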
Inner product
The inner, or “dot” product, of two vectors is
$$\mathbf{a}^T\mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta$$
where θ is the angle (in N-dimensional space) between the two vectors.
If $\theta = 0$ then the vectors are parallel.
If $\theta = \pi/2$ then the vectors are orthogonal.
In more general terms,
$$\mathbf{a}^T\mathbf{b} = \sum_{i=1}^{N} a_i b_i$$
from which it follows that both vectors must be of length N (i.e. they are conforming) in
order to compute the summation.
If b is a unit vector (having length 1), then the inner product can be thought of as the
projection of a onto the coordinate direction of b. (More on this later.)
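A small Matlab sketch of the inner product and the angle it defines (arbitrary example vectors):

a = [1; 2; 2];
b = [2; 0; 1];
ip = a'*b;                          % inner product, same as sum(a.*b) or dot(a,b)
theta = acos(ip/(norm(a)*norm(b))); % angle between a and b, in radians
bhat = b/norm(b);                   % unit vector in the direction of b
proj = a'*bhat;                     % projection of a onto that direction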
Norm or length
The length or norm of a vector can be defined in many ways, but the conventional $l_2$ norm is defined
$$\|\mathbf{f}\| = \left(\mathbf{f}^T\mathbf{f}\right)^{1/2} = \left(\sum_{i=1}^{N} f_i^2\right)^{1/2}$$
The Cartesian distance between two points is the length of their vector difference
$$\|\mathbf{a} - \mathbf{b}\| = \left[(\mathbf{a} - \mathbf{b})^T(\mathbf{a} - \mathbf{b})\right]^{1/2} = \left[(x_a - x_b)^2 + (y_a - y_b)^2\right]^{1/2}$$
Sometimes the distance between two vectors is weighted
$$\|\mathbf{c}\| = \left(\sum_{i=1}^{n} c_i W_{ii} c_i\right)^{1/2} = \left(\mathbf{c}^T \mathbf{W}\mathbf{c}\right)^{1/2}$$
To be useful, a weighting matrix is usually symmetric and positive definite.
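A sketch of these norms in Matlab (the weight matrix W below is an arbitrary illustrative choice, symmetric and positive definite):

f = [3; 4];
sqrt(f'*f)            % l2 norm, 5; identical to norm(f)
a = [0; 0]; b = [3; 4];
norm(a - b)           % Cartesian distance between the two points, also 5
W = diag([1 4]);      % illustrative weights
sqrt(f'*W*f)          % weighted norm (c'*W*c)^(1/2)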
Matrix multiplication
Matrix multiplication is
$$C_{ij} = \sum_{p=1}^{P} A_{ip} B_{pj} \qquad (i\text{th row of } A \text{ times } j\text{th column of } B)$$
which requires the dimensions be conformable:
$$[M \times N] = [M \times P]\,[P \times N]$$
We write
C = AB
The requirement that matrix operations be conformable is your friend in Matlab.
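A sketch of conformable dimensions in Matlab (random matrices for illustration); when the inner dimensions do not match, Matlab raises an error, which catches many bookkeeping mistakes:

A = rand(3,2);   % M x P
B = rand(2,4);   % P x N
C = A*B;         % 3 x 4 result; C(i,j) = sum over p of A(i,p)*B(p,j)
size(C)          % [3 4]
% B*A would fail because the inner dimensions (4 and 3) do not agree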
Matrix operation rules:
Multiplication is not commutative: $AB \ne BA$
Multiplication is associative: $ABC = (AB)C = A(BC)$
The transpose of a product expands as $(AB)^T = B^T A^T$
$$\mathrm{trace}(A) = \sum_{i=1}^{N} a_{ii} \qquad \text{the sum of the diagonal elements}$$
A symmetric matrix has the property $C = C^T$, so the product $C^T C$ is the dot product of all rows of the matrix with themselves.
We can easily make a symmetric matrix B from any other matrix A:
$$B = A^T A, \qquad B^T = (A^T A)^T = A^T A = B$$
The identity matrix:
$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The inverse of a matrix A is denoted $A^{-1}$ and defined such that $A^{-1}A = I$.
It follows that
$$(AB)^{-1} = B^{-1} A^{-1}$$
Some of these concepts in linear algebra are demonstrated in Matlab script
jw_linearalgebra.m at:
http://marine.rutgers.edu/dmcs/ms615/jw_matlab
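A minimal sketch in the same spirit (not the actual jw_linearalgebra.m) that verifies several of the rules above numerically:

A = rand(4,3);
B = A'*A;                            % symmetric by construction
norm(B - B')                         % ~0
P = rand(3); R = rand(3);            % two random square matrices
norm((P*R)' - R'*P')                 % ~0: (PR)' = R'P'
norm(inv(P*R) - inv(R)*inv(P))       % ~0: (PR)^(-1) = R^(-1)P^(-1)
trace(P) - sum(diag(P))              % 0: trace is the sum of the diagonal elements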
Data set operations using matrices
With an entire dataset in matrix form it is straightforward to calculate certain properties
of the data using matrix operations and functions in Matlab™.
Our data matrix D
$$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mn} \end{pmatrix} \qquad \text{(rows: position } i \downarrow\text{; columns: time } j \rightarrow)$$
[Matlab example: Compute the mean and variance of each time series.]
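A possible sketch of that example (using a stand-in random D; each row is the time series at one position):

m = 5; n = 100;
D = randn(m,n);                      % stand-in data: m positions, n observation times
dbar = mean(D,2);                    % time mean of each row (each time series)
dvar = var(D,0,2);                   % variance of each time series
Danom = D - repmat(dbar,1,n);        % anomalies: remove the time mean from each row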
$DD^T$ will be a matrix with elements proportional to the covariance of each time series
with all the others.
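Continuing the sketch, the covariance interpretation of D*D' is easy to check against Matlab's cov (which treats columns as variables, so the anomaly matrix is transposed first):

C = Danom*Danom'/(n-1);   % m x m matrix: covariance of each time series with the others
norm(C - cov(Danom'))     % ~0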
Basis set
Suppose we had N vectors ei , each of dimension (length) N.
If it is possible to represent any arbitrary N-dimensional vector, f , as a weighted sum of
these N vectors, ei
$$\mathbf{f} = \sum_{i=1}^{N} \alpha_i \mathbf{e}_i$$
then the $\mathbf{e}_i$ are called a spanning set, or basis set, because they are sufficient to span the entire N dimensions.
[This is somewhat analogous to Fourier analysis, where any function f(x) is represented
as the weighted sum of a set of functions; namely sin kx and cos kx . These are the basis
functions. You need an infinite set if x is continuous, or a finite set of size N if x is
discrete with N elements.]
To have this property, the ei must be independent, meaning that no single one of the ei
can be represented as a weighted sum of the others excluding itself.
The coefficients α k of the expansion can be found by solving a set of simultaneous
equations describing the projection of f onto each of the ei .
$$\mathbf{e}_k^T \mathbf{f} = \sum_{i=1}^{N} \alpha_i\, \mathbf{e}_k^T \mathbf{e}_i$$
This is easily solved in the case that the ei are mutually orthogonal and normal (have unit
length), in which case we call them orthonormal.
More on this in a moment … but first, a class of vectors with this orthogonality property: the eigenvectors of a symmetric matrix.
Eigenvectors and eigenvalues
-9-
Matrix multiplication can be thought of as a transformation of vector x into vector y
Ax = y
If a vector v has the property that transformation by matrix A leaves its direction
unchanged, then v is said to be an eigenvector of matrix A, and satisfies the property
Av = λ v
where $\lambda$ is a scalar. If A is square of dimension N, there are up to N eigenvectors, each with a corresponding eigenvalue $\lambda_n$:
$$A \mathbf{v}_n = \lambda_n \mathbf{v}_n$$
If A is a symmetric matrix, it will have N real eigenvalues and orthonormal eigenvectors
that form a basis set.
A matrix constructed of the N orthonormal eigenvectors, say Q, will have the property
$$Q^T Q = I$$
and therefore
$$Q^{-1} = Q^T$$
It follows that
$$AQ = QL, \qquad Q^T A Q = Q^T Q L = L$$
where L is a matrix of zeros except for the eigenvalues λn on the diagonal.
Matrix Q is said to “diagonalize” matrix A. Also, A is said to “factorize” according to:
$$A = Q L Q^T$$
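A sketch using Matlab's eig on an arbitrary symmetric matrix to confirm these properties:

A = rand(4); A = A + A';     % make a symmetric test matrix
[Q, L] = eig(A);             % columns of Q are eigenvectors; L is diagonal with eigenvalues
norm(Q'*Q - eye(4))          % ~0: the eigenvectors are orthonormal
norm(A*Q - Q*L)              % ~0: A*v_n = lambda_n*v_n, column by column
norm(A - Q*L*Q')             % ~0: the factorization A = Q*L*Q'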
Orthonormal vectors
Orthonormal vectors satisfy the property:
$$\mathbf{e}_k^T \mathbf{e}_i = \delta_{ik}$$
where $\delta_{ik}$ is the Kronecker delta: $\delta_{ik} = 1$ if $i = k$ (normal) and $\delta_{ik} = 0$ if $i \ne k$ (orthogonal).
Then
$$\mathbf{e}_k^T \mathbf{f} = \sum_{i=1}^{N} \alpha_i \delta_{ki} = \alpha_k$$
is the projection of $\mathbf{f}$ onto basis vector $\mathbf{e}_k$, and we have easily solved for the coefficients $\alpha_k$.
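A sketch of this projection in Matlab (the orthonormal basis here comes from a QR factorization of a random matrix, purely for illustration):

N = 4;
[Q, ~] = qr(rand(N));    % columns of Q form an orthonormal basis e_1 ... e_N
f = rand(N,1);           % arbitrary vector to expand
alpha = Q'*f;            % alpha_k = e_k' * f, the projection onto each basis vector
norm(f - Q*alpha)        % ~0: f is recovered exactly as the weighted sum of the e_i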
Some of these concepts in linear algebra are demonstrated in Matlab script
jw_linearalgebra.m at:
http://marine.rutgers.edu/dmcs/ms615/jw_matlab
That was algebra – what do we need to know about calculus with matrices …
Differentiation
Consider a scalar, J, (a single number, not a vector) that is the product
$$J = \mathbf{r}^T\mathbf{q} = \mathbf{q}^T\mathbf{r}$$
(so the vectors must be conformable)
Differentiating this scalar with respect to the vector q produces a vector gradient
$$\frac{\partial}{\partial \mathbf{q}}\left(\mathbf{q}^T\mathbf{r}\right) = \frac{\partial}{\partial \mathbf{q}}\left(\mathbf{r}^T\mathbf{q}\right) = \mathbf{r}$$
This is like differentiating, with respect to q, the simple product of two scalar variables r and q:
$$\frac{\partial}{\partial q}(rq) = r$$
For a quadratic form, the scalar J would be written
$$J = \mathbf{q}^T A \mathbf{q}$$
(this requires the matrix A be $N \times N$)
We get
$$\frac{\partial J}{\partial \mathbf{q}} = \left(A + A^T\right)\mathbf{q}$$
much like the differentiation of a quadratic product $Aq^2$:
$$\frac{\partial}{\partial q}\left(Aq^2\right) = 2Aq$$
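A numerical sketch (arbitrary A and q) that checks the gradient of the quadratic form against centered finite differences:

N = 3;
A = rand(N); q = rand(N,1);
g = (A + A')*q;                        % analytic gradient of J = q'*A*q
dq = 1e-6; gfd = zeros(N,1);
for k = 1:N
    qp = q; qp(k) = qp(k) + dq;        % perturb the k-th element up
    qm = q; qm(k) = qm(k) - dq;        % and down
    gfd(k) = (qp'*A*qp - qm'*A*qm)/(2*dq);
end
norm(g - gfd)                          % small, confirming dJ/dq = (A + A')*q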
Most spatial analysis that entails fitting or smoothing data with some statistical or dynamical model involves some form of weighted least squares fitting. Least squares involves minimization: differentiating, with respect to the model parameters, a scalar functional J that represents the norm of the model-data misfit.