Web resources

Those marked with þ I have verified still work in 2013.

Linear algebra
þ Strang, G., Lecture videos for 18.06 Linear Algebra, Massachusetts Institute of Technology OpenCourseWare project, http://ocw.mit.edu/courses/mathematics/18-06sclinear-algebra-fall-2011/index.htm (The navigation panel on the left shows 3 units in the syllabus – click the + to see the individual lecture topics.)

Regression
þ Glover, D., W. Jenkins and S. Doney (2002), Lecture notes for 12.747 Modeling, Data Analysis and Numerical Techniques for Geochemistry, MIT/WHOI, http://w3eos.whoi.edu/12.747/notes/lect03/lectno03.html

Optimal interpolation
þ Hartmann, D., Lecture notes for ATMS 552 Objective Analysis, University of Washington, Section on Mapping Data Fields: http://www.atmos.washington.edu/~dennis/552_Notes_5.pdf

Kriging
þ Chu, Dezhang, The GLOBEC Kriging Software Package - EasyKrig2.1, http://globec.whoi.edu/software/kriging/easy_krig/easy_krig.html
þ Lafleur, Caroline, Matlab Kriging toolbox, http://globec.whoi.edu/software/kriging/V3/intro_v3.html

Empirical Orthogonal Functions
Bjornsson, H. and S. A. Venegas (1997), A manual for EOF and SVD analyses of climate data, McGill University, CCGCR Report No. 97-1, Montréal, Québec, 52 pp.
þ Hartmann, D., Lecture notes for ATMS 552 Objective Analysis, University of Washington, Section on EOF/PC analysis: http://www.atmos.washington.edu/~dennis/552_Notes_4.pdf
Varimax rotation algorithm for Matlab: http://erizo.ucdavis.edu/~dmk/notes/EOFs/EOFs.html
þ The SVD song on YouTube: http://www.youtube.com/watch?v=JEYLfIVvR9I&uid=_viSqzGQPXi2RAgvnetDoA&lr=1

Lectures: John Wilkin

Feb 14: Matrix and vector algebra – a review

Geophysical data are frequently presented to us as a set of coinciding time series of observations at a set of spatial locations. (Sometimes the spatial arrangement is a regular grid.)
It is natural to seek a reduction of these data to a small set of spatial patterns and common associated time series that describe the dominant coherent patterns of variability that underlie the data. Tidal harmonic analysis, and filtering or averaging to a mean seasonal cycle, are examples of such a reduction of the data. These use prior knowledge of the dominant dynamics to fit a preconceived model to the data, e.g. the tidal harmonic forcing frequencies.

But what if we don't know the temporal patterns that are common across space? We might be able to propose likely spatial structures instead, based on physical arguments such as dynamical modes of propagating waves in the atmosphere and ocean (which, incidentally, are the solutions to an eigenvalue/eigenvector problem defined by the governing equations and boundary conditions). However, determining a reasonable set of mode structures might be intractable (mathematically, or computationally), or we simply may have no idea (yet) what the dominant processes are.

EOF analysis extracts coherent spatial/temporal patterns of variability from the covariance of the data themselves. The modes are empirical in that they do not necessarily bear any correspondence to dynamical modes, and interpreting them as such could be misleading. But it is very often the case that we see a correspondence between EOFs and features of ocean and atmosphere dynamics that we recognize, which can guide further interpretation.

Put simply, EOFs are the eigenvectors of a matrix formed from the covariance between all possible pairs of the time series. Treating a time series of data as a vector, we can describe the mean, variance, and covariance of time series in terms of vector and matrix operations. Before we proceed, this lecture reviews some essentials of matrix and vector algebra, drawing on the "Basic Machinery" described in Chapter 3 of Wunsch, C., The Ocean Circulation Inverse Problem, Cambridge University Press, 442 pp., 1996.
Linear algebra definitions

Matrix of M by N values:

$A = \{a_{i,j}\}, \quad 1 \le i \le M, \quad 1 \le j \le N$

A dataset in multiple dimensions might be represented by a matrix. Each row of D is a vector $d_i$ of the time series at position i:

$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{m1} & \cdots & d_{mn} \end{pmatrix}$  (rows index position, columns index time)

There are m rows (one for each position) and n columns (one for each observation time).

The elements $d_{i,j}$ might be complex for 2-D vector data, e.g. current, wind, stress, ...

$d = u + \sqrt{-1}\,v = u + iv$

or even $d_{i,j} = \mathrm{temperature} + i\,\mathrm{salinity}$. This can be useful if you believe the data co-vary and are not independent. In the case of mixing temperature and salinity you would want some kind of normalization so as not to mix units.

>> the variable 1i in Matlab is always sqrt(-1)

Transpose

A vector of N values and its transpose:

$q = [q_1, \ldots, q_i, \ldots, q_N]^T, \qquad q^T = [q_1 \,\cdots\, q_i \,\cdots\, q_N]$

Inner product

The inner, or "dot", product of two vectors is

$a^T b = \|a\|\,\|b\| \cos\theta$

where $\theta$ is the angle (in N-dimensional space) between the two vectors. If $\theta = 0$ the vectors are parallel; if $\theta = \pi/2$ the vectors are orthogonal.

In more general terms,

$a^T b = \sum_{i=1}^{N} a_i b_i$

from which it follows that both vectors must be of length N (i.e. they are conforming) in order to compute the summation.

If b is a unit vector (having length 1), then the inner product can be thought of as the projection of a onto the coordinate direction of b. (More on this later.)

Norm or length

The length or norm of a vector can be defined in many ways, but the conventional $l_2$ norm is

$\|f\| = (f^T f)^{1/2} = \left( \sum_{i=1}^{N} f_i^2 \right)^{1/2}$

The Cartesian distance between two points is the length of their vector difference:

$\|a - b\| = \left[ (a - b)^T (a - b) \right]^{1/2} = \left[ (x_a - x_b)^2 + (y_a - y_b)^2 \right]^{1/2}$

Sometimes the distance between two vectors is weighted:

$\left( \sum_{i=1}^{n} c_i W_{ii} c_i \right)^{1/2} = (c^T W c)^{1/2}$

To be useful, a weighting matrix is usually symmetric and positive definite.
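The definitions above are one-liners in code. The course demos use Matlab; purely as a sketch, here is the same arithmetic in Python/NumPy with made-up example vectors and an arbitrary diagonal weight matrix (none of these numbers come from the course material):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

# Inner product a'b = |a||b| cos(theta); these two vectors are orthogonal
dot = a @ b
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))

# l2 norm: ||f|| = (f'f)^(1/2)
norm_a = np.sqrt(a @ a)

# Cartesian distance between two points is the norm of their difference
dist = np.linalg.norm(a - b)

# Weighted distance (c'Wc)^(1/2) with a symmetric positive-definite W
W = np.diag([2.0, 0.5])
c = a - b
wdist = np.sqrt(c @ W @ c)
```

In Matlab the equivalents are `a'*b`, `sqrt(f'*f)` or `norm(f)`, and `sqrt(c'*W*c)`.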
Matrix multiplication

Matrix multiplication is

$C_{ij} = \sum_{p=1}^{P} A_{ip} B_{pj}$  (the ith row of A times the jth column of B)

which requires the dimensions be conformable: $(M \times N) = (M \times P)(P \times N)$. We write $C = AB$.

The requirement that matrix operations be conformable is your friend in Matlab.

Matrix operation rules:

$AB \ne BA$  Multiplication is not commutative
$ABC = (AB)C = A(BC)$  Multiplication is associative
$(AB)^T = B^T A^T$  The expansion of the transpose of a product
$\mathrm{trace}(A) = \sum_{i=1}^{N} a_{ii}$  The sum of diagonal elements

A symmetric matrix has the property $C = C^T$, so the product $C^T C$ is the dot product of all rows of the matrix with themselves.

We can easily make a symmetric matrix B from any other matrix A:

$B = A^T A, \qquad B^T = (A^T A)^T = A^T A = B$

The identity matrix:

$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1 \end{pmatrix}$

The inverse of a matrix A is denoted $A^{-1}$ and defined such that $A^{-1} A = I$. It follows that $(AB)^{-1} = B^{-1} A^{-1}$.

Some of these concepts in linear algebra are demonstrated in the Matlab script jw_linearalgebra.m at: http://marine.rutgers.edu/dmcs/ms615/jw_matlab

Data set operations using matrices

With an entire dataset in matrix form it is straightforward to calculate certain properties of the data using matrix operations and functions in Matlab. Our data matrix D again has rows indexed by position and columns indexed by time.

[Matlab example: Compute the mean and variance of each time series.]

$DD^T$ will be a matrix with elements proportional to the covariance of each time series with all the others.

Basis set

Suppose we had N vectors $e_i$, each of dimension (length) N. If it is possible to represent any arbitrary N-dimensional vector f as a weighted sum of these N vectors,

$f = \sum_{i=1}^{N} \alpha_i e_i$

then the $e_i$ are called a spanning set, or basis set, because they are sufficient to span the entire N dimensions.

[This is somewhat analogous to Fourier analysis, where any function f(x) is represented as the weighted sum of a set of functions, namely $\sin kx$ and $\cos kx$. These are the basis functions.
You need an infinite set if x is continuous, or a finite set of size N if x is discrete with N elements.]

To have this property, the $e_i$ must be independent, meaning that no single one of the $e_i$ can be represented as a weighted sum of the others.

The coefficients $\alpha_k$ of the expansion can be found by solving a set of simultaneous equations describing the projection of f onto each of the $e_i$:

$e_k^T f = \sum_{i=1}^{N} \alpha_i \, e_k^T e_i$

This is easily solved in the case that the $e_i$ are mutually orthogonal and normal (have unit length), in which case we call them orthonormal. More on this in a moment ... but first, a class of vectors with this orthogonality property: the eigenvectors of a symmetric matrix.

Eigenvectors and eigenvalues

Matrix multiplication can be thought of as a transformation of vector x into vector y:

$Ax = y$

If a vector v has the property that transformation by matrix A leaves its direction unchanged, then v is said to be an eigenvector of matrix A, and satisfies

$Av = \lambda v$

where $\lambda$ is a scalar. If A is square of dimension N there are, in general, N eigenvectors, each with a corresponding eigenvalue $\lambda_n$:

$A v_n = \lambda_n v_n$

though for an arbitrary square A the eigenvectors need not be orthogonal. If A is a symmetric matrix, it will have N real eigenvalues and orthonormal eigenvectors that form a basis set.

A matrix Q constructed with the N orthonormal eigenvectors as its columns will have the property

$Q^T Q = I \quad \Rightarrow \quad Q^{-1} = Q^T$

It follows that

$AQ = QL$
$Q^T A Q = Q^T Q L = L$

where L is a matrix of zeros except for the eigenvalues $\lambda_n$ on the diagonal. Matrix Q is said to "diagonalize" matrix A. Also, A is said to "factorize" according to

$A = Q L Q^T$

Orthonormal vectors

Orthonormal vectors satisfy the property

$e_k^T e_i = \delta_{ik}$

where $\delta_{ik}$ is the Kronecker delta: $\delta_{ik} = 1$ if $i = k$ (normal), and $\delta_{ik} = 0$ if $i \ne k$ (orthogonal).

Then

$e_k^T f = \sum_{i=1}^{N} \alpha_i \delta_{ki} = \alpha_k$

is the projection of f onto basis vector $e_k$, and we have easily solved for the coefficients $\alpha_k$.
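Several of these pieces fit together in a few lines: the symmetric covariance-like matrix $DD^T$, its orthonormal eigenvectors, the diagonalization $A = QLQ^T$, and the projection coefficients $\alpha_k = e_k^T f$. The course script for this is jw_linearalgebra.m in Matlab; the following is only a sketch of the same checks in Python/NumPy, using a synthetic random data matrix (sizes and values are arbitrary, not course data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data matrix D: m positions (rows) by n observation times (columns)
m, n = 4, 200
D = rng.standard_normal((m, n))
D = D - D.mean(axis=1, keepdims=True)   # remove the time mean of each series

# A = DD' is symmetric, with elements proportional to the covariance
# of each time series with all the others
A = D @ D.T
assert np.allclose(A, A.T)

# eigh handles symmetric matrices: real eigenvalues, orthonormal eigenvectors
lam, Q = np.linalg.eigh(A)   # columns of Q are the eigenvectors
L = np.diag(lam)

# Q'Q = I, so the inverse of Q is its transpose
assert np.allclose(Q.T @ Q, np.eye(m))

# AQ = QL, Q'AQ = L (diagonalization), and the factorization A = QLQ'
assert np.allclose(A @ Q, Q @ L)
assert np.allclose(Q.T @ A @ Q, L)
assert np.allclose(A, Q @ L @ Q.T)

# Projection of an arbitrary f onto the orthonormal basis: alpha_k = e_k'f,
# and f is recovered exactly as the weighted sum of the basis vectors
f = rng.standard_normal(m)
alpha = Q.T @ f
assert np.allclose(Q @ alpha, f)
```

In Matlab the analogous call is `[Q, L] = eig(A)`, which for a symmetric A also returns orthonormal eigenvectors (eigenvalue ordering may differ).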
That was algebra – what do we need to know about calculus with matrices?

Differentiation

Consider a scalar J (a single number, not a vector) that is the product

$J = r^T q = q^T r$

(so the vectors must be conformable). Differentiating this scalar with respect to the vector q produces a vector gradient:

$\frac{\partial}{\partial q}(q^T r) = \frac{\partial}{\partial q}(r^T q) = r$

This is like differentiation using the product rule for any two variables r and q:

$\frac{\partial}{\partial q}(rq) = r$

For a quadratic form the scalar J would be written

$J = q^T A q$

(this requires the matrix A be N×N). We get

$\frac{\partial J}{\partial q} = (A + A^T)\,q$

much like the differentiation of a quadratic product $Aq^2$:

$\frac{\partial}{\partial q}(Aq^2) = 2Aq$

Most spatial analysis of data that entails fitting or smoothing data to some statistical or dynamical model involves a form of weighted least squares. Least squares will involve minimization, by differentiating some scalar functional J that represents the norm of a model-data misfit with respect to model parameters.
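The gradient formula for the quadratic form is easy to verify numerically. As a minimal sketch in Python/NumPy, with an arbitrary (deliberately non-symmetric) random A and q, compare the analytic gradient $(A + A^T)q$ against a centered finite difference of $J = q^T A q$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))   # deliberately non-symmetric
q = rng.standard_normal(N)

def J(q):
    """Quadratic form J = q'Aq (a scalar)."""
    return q @ A @ q

# Analytic gradient: dJ/dq = (A + A')q
grad = (A + A.T) @ q

# Centered finite-difference estimate of each gradient component
eps = 1e-6
I = np.eye(N)
fd = np.array([(J(q + eps * I[k]) - J(q - eps * I[k])) / (2 * eps)
               for k in range(N)])

assert np.allclose(grad, fd, atol=1e-5)
```

Note that only when A is symmetric does the gradient reduce to $2Aq$, the exact analogue of the scalar result $\partial(Aq^2)/\partial q = 2Aq$.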