Multivariate Statistics
Matrix Algebra I
W. M. van der Veld
University of Amsterdam
Overview
• Introduction
• Definitions
• Special names
• Matrix transposition
• Matrix addition
• Matrix multiplication
Introduction
• The mathematics in which multivariate analysis is cast is matrix algebra.
• We will present enough matrix algebra to describe the operations needed for the multivariate analyses discussed in this course. In addition, this basic understanding is necessary for the more advanced courses of the Research Master.
• Basically, all we need is a few basic tricks, at least at first. Let us summarize them, so that you will have some idea of what is coming and, more importantly, of why these topics must be mastered.
Introduction
• Our point of departure is always a multivariate data matrix with a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables.
• In most applications of multivariate analysis, we shall not be interested in variable means. They have their interest, of course, in each study, but multivariate analysis instead focuses on variances and covariances. Therefore, the data matrix will in general be transformed into a matrix whose columns have zero means, so that the numbers in a column represent deviations from the mean.
• Such a matrix, with n rows and m columns, is the basis for the variance-covariance matrix. For a variable i, the variance is defined as Σxi²/n, whereas for two variables i and j the covariance is defined as Σxixj/n, xi and xj being taken as deviations from the mean. Variances and covariances can be collected in the variance-covariance matrix, in which the number in row i, column i (on the diagonal) gives the variance of variable i, while the number in row i, column j (i ≠ j) gives the covariance between the pair of variables i and j, and is the same number as in row j, column i.
• An often useful transformation is to standardize the data matrix: we first take deviations from the mean for each column, then divide the deviations by the standard deviation of the same column. The result is that the values in a column will have zero mean and unit variance.
• The standardized data matrix is then the basis for calculating a correlation matrix, which is nothing but a variance-covariance matrix for standardized variables. In the diagonal of this matrix we therefore find values equal to unity. In the other cells we find correlations: in row i, column j, we shall find the correlation coefficient rij = Σxixj/(nσiσj).
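• To make these definitions concrete, here is a minimal NumPy sketch (our addition, not part of the original slides); the data values are the hypothetical ones from the data-matrix example later on, and, following the slides, all divisors are n rather than n - 1.

import numpy as np

# Hypothetical data matrix: n = 4 observation units, m = 3 variables.
X = np.array([[49.0, 4.0, 2.0],
              [25.0, 6.0, 7.0],
              [23.0, 2.0, 7.0],
              [68.0, 7.0, 1.0]])
n, m = X.shape

# Deviation-score matrix: every column gets zero mean.
Xd = X - X.mean(axis=0)

# Variance-covariance matrix (m x m): variances on the diagonal,
# covariances Sum(x_i * x_j) / n off the diagonal.
S = Xd.T @ Xd / n

# Standardized data matrix: deviations divided by the column standard
# deviations (np.std divides by n by default, matching the slides).
Z = Xd / X.std(axis=0)

# Correlation matrix: the variance-covariance matrix of standardized variables;
# its diagonal is all 1s and the off-diagonal cells are r_ij.
R = Z.T @ Z / n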
Introduction
• Very often we shall need a variable that is a linear compound of the initial variables. The linear compound is simply a variable whose values are obtained by a weighted addition of values of the original variables. For example, with two initial variables x1 and x2, values of the compound are defined as y = w1x1 + w2x2, where w1 and w2 are weights. A linear compound could also be called a weighted sum; a small numerical sketch follows this slide.
• For some techniques of multivariate analysis, we need to be able to solve simultaneous equations. Doing so usually requires a computational routine called matrix inversion.
• Multivariate analysis nearly always comes down to finding a minimum or a maximum of some sort. A typical example is to find a linear compound of some variables that has maximum correlation with some other variable (multiple correlation), or to find a linear compound of the observed scores that has maximum variance (factor analysis). Therefore, among our stock of basic tricks, we need to include procedures for finding extreme values of functions.
• In addition, we shall often need to find maxima (or minima) of functions where the procedure is limited by certain side-conditions. For instance, we are given two sets of variables, and are required to find a linear compound from the first set, and another from the second set, such that the value of the correlation between these two compounds is maximum. This task can be reformulated as follows: find the two compounds in such a way that the covariance between them is maximum, given that the compounds both have unit variance.
• Very often in multivariate analysis, a maximization procedure under certain side-conditions takes on a very specific and recognizable form, namely, finding eigenvectors and eigenvalues of a given matrix.
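• As a small illustration of a linear compound (our sketch, with made-up weights): the weighted sum y = w1*x1 + w2*x2 is simply the product of the data matrix with a vector of weights.

import numpy as np

# Hypothetical data matrix with two variables x1 and x2 as columns.
X = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])

# Weights w1 and w2 of the linear compound y = w1*x1 + w2*x2.
w = np.array([0.5, 2.0])

y = X @ w   # one compound score per observation unit: [8.5, 11.0, 13.5]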
Definitions
• For multivariate statistics the most important matrix is the data matrix.

Data file:

resp id   Age   Satlife   SocTrst   PolTrst
00000     49    4         2         3
00001     25    6         7         5
00002     23    2         7         5
00007     68    7         1         7

Data matrix:

[ 49  4  2  3 ]
[ 25  6  7  5 ]
[ 23  2  7  5 ]
[ 68  7  1  7 ]

• The data matrix has a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables.
Definitions
• In general, a matrix has dimension n by m.
• The convention is to denote matrices by boldface uppercase letters.

X = [ x11  x12  x13 ]
    [ x21  x22  x23 ]

• The first subscript of a matrix element (xij) refers to the row and the second subscript refers to the column.
• It is important to remember this convention when matrix algebra is performed.
Definitions
• A vector is a special type of matrix that has only one row (called a row vector) or one column (called a column vector). Below, a is a column vector while b is a row vector.
• The convention is to denote vectors by boldface lowercase letters.

a = [ x1 ]      b = [ x1  x2  x3 ]
    [ x2 ]
Definitions
• A scalar is a matrix with only one row and one column.
• The convention is to denote scalars by italicized, lowercase letters (e.g., x).
Special names
• If n = m then the matrix is called a square matrix.
• The data matrix is normally not square, but the variance-covariance matrix is; and so are many others.
• Matrix A is square but matrix B is not square.

A = [  3   4  5 ]      B = [ 3   4  5 ]
    [  2  12  5 ]          [ 2  12  5 ]
    [ -1   7  0 ]
Special names
• A symmetric matrix is a square matrix in which xij = xji, for all i and j.
• The data matrix is normally not symmetric, but the variance-covariance matrix is.
• Matrix A is symmetric; matrix B is not symmetric.

A = [  1   2  -1 ]      B = [  1   2  -1 ]
    [  2  12  10 ]          [ 10  12   2 ]
    [ -1  10   0 ]          [ -1  10   0 ]
Special names
• A diagonal matrix is a symmetric matrix in which all the off-diagonal elements are 0.
• The data matrix is normally not diagonal, and neither is the variance-covariance matrix; the variance matrix, however, is diagonal.
• These matrices are often denoted by D; matrix D is diagonal.

D = [ 1   0  0 ]
    [ 0  12  0 ]
    [ 0   0  7 ]
Special names
• An identity matrix is a diagonal matrix with 1s, and only 1s, on the diagonal; it is also sometimes called the unity matrix.
• This is a useful matrix in matrix algebra.
• The convention is to denote the identity matrix by I.

I = [ 1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]
Special names
• A unit vector is a vector containing only 1s.
• This is a useful vector in matrix algebra.
• The convention is to denote the unit vector by u.

u = [ 1 ]
    [ 1 ]
    [ 1 ]
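• The special matrices above are easy to construct in NumPy; a short sketch (ours): np.eye builds an identity matrix, np.diag a diagonal matrix, and np.ones a unit vector.

import numpy as np

I = np.eye(3)                  # identity matrix: 1s on the diagonal, 0s elsewhere
D = np.diag([1.0, 12.0, 7.0])  # diagonal matrix with the given diagonal elements
u = np.ones((3, 1))            # unit vector: a column of 1s

# A symmetric matrix satisfies x_ij = x_ji for all i and j,
# i.e. it is equal to its own transpose.
A = np.array([[ 1,  2, -1],
              [ 2, 12, 10],
              [-1, 10,  0]])
print(np.array_equal(A, A.T))  # True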
Matrix transposition
• Matrix transposition is a useful transformation, with many purposes.
• The transpose of a matrix is denoted by a prime (A') or a superscript t or T (At or AT).
• What does it do? The first row of the matrix becomes the first column of the transpose, the second row of the matrix becomes the second column of the transpose, etc.

A = [ 1  5  1 ]      A' = [ 1  0 ]
    [ 0  2  1 ]           [ 5  2 ]
                          [ 1  1 ]
Matrix transposition
• What is the transpose of A? And what are the dimensions of A'?

A = [ 1  3 ]   =>   A' = ?
    [ 0  4 ]

• The transpose of a square matrix is a square matrix.
• What type of special matrix is the matrix below? What is its transpose?

A = [ 1  3  5 ]
    [ 3  1  7 ]   =>   A' = ?
    [ 5  7  1 ]

• The transpose of a symmetric matrix is simply the original matrix.
Matrix transposition
• The transpose of a row vector will be a column vector, and the transpose of a column vector will be a row vector.

a = [ 2 ]   =>   a' = [ 2  3 ]
    [ 3 ]

b = [ 4  0  2 ]   =>   b' = [ 4 ]
                            [ 0 ]
                            [ 2 ]
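• In NumPy the transpose is available as the .T attribute; a quick sketch (ours) reproducing the examples above.

import numpy as np

A = np.array([[1, 5, 1],
              [0, 2, 1]])
print(A.T)            # 3 x 2: the rows of A become the columns of A'

a = np.array([[2],
              [3]])   # a column vector
print(a.T)            # [[2 3]]: its transpose is a row vector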
Matrix addition
• To add two matrices:
  – they both must have the same number of rows, and
  – they both must have the same number of columns.
• The elements of the two matrices are simply added together, element by element, to produce the result.
• That is, for R = A + B, rij = aij + bij.

A = [  1   2  -1 ]      B = [  1   2  -1 ]
    [  2  12  10 ]          [ 10  12   2 ]
    [ -1  10   0 ]          [ -1  10   0 ]

R = [ 1+1    2+2    -1-1 ]   [  2   4  -2 ]
    [ 2+10   12+12  10+2 ] = [ 12  24  12 ]
    [ -1-1   10+10  0+0  ]   [ -2  20   0 ]
Matrix addition
• Matrix subtraction works in the same way, except that elements are subtracted instead of added.
• What is the result of this subtraction?

[ 2 ]   [ 3 ]
[ 4 ] - [ 0 ]
[ 1 ]   [ 5 ]

• What is the result of this addition?

[ 2   3 ]   [ -1  -3 ]
[ 0  -4 ] + [  0   5 ]
Matrix addition
• Rules for matrix addition and subtraction:
  – A + B = B + A                  Commutative
  – (A + B) + C = A + (B + C)      Associative
  – (A + B)' = A' + B'
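• These rules are easy to verify numerically; a sketch (ours, with the matrices from the addition exercise above).

import numpy as np

A = np.array([[2,  3],
              [0, -4]])
B = np.array([[-1, -3],
              [ 0,  5]])

print(A + B)                                 # elementwise sums (the identity matrix here)
print(np.array_equal(A + B, B + A))          # True: addition is commutative
print(np.array_equal((A + B).T, A.T + B.T))  # True: (A + B)' = A' + B'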
Matrix multiplication
• Multiplication between a scalar and a vector.
• Each element in the product is simply the scalar multiplied by the corresponding element in the vector.
• That is, for p = xa, pij = x*aij for all i and j. Thus,

a = [ 2 ],   x = 4
    [ 3 ]

p = xa = 4 [ 2 ] = [ 4*2 ] = [  8 ]
           [ 3 ]   [ 4*3 ]   [ 12 ]

p = ax = ?

• The following multiplication is also defined: p = ax. That is, scalar multiplication is commutative.
Matrix multiplication
• Multiplication between two vectors.
• To perform this, the row vector must have as many columns as the column vector has rows.
• The product is simply the sum of the first row vector element multiplied by the first column vector element, plus the second row vector element multiplied by the second column vector element, plus the product of the third elements, etc.
• In algebra, if p = ab, then p = Σ(i=1..n) ai*bi.

              [ 0 ]
[ 0  1  2 ]   [ 1 ]  =  0*0 + 1*1 + 2*2 = 5
              [ 2 ]
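• The same row-times-column product in NumPy (our sketch); the @ operator computes the sum of products directly.

import numpy as np

a = np.array([0, 1, 2])   # the row vector
b = np.array([0, 1, 2])   # the column vector

print(a @ b)              # 0*0 + 1*1 + 2*2 = 5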
Matrix multiplication
• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [ 1  5  1 ]      B = [ 1   2 ]
    [ 0  2  1 ]          [ 0   4 ]
                         [ 7  -1 ]

then each element of P is a row of A times a column of B:

p11 = a1•b•1 = 1*1 + 5*0 + 1*7    = 8
p12 = a1•b•2 = 1*2 + 5*4 + 1*(-1) = 21
p21 = a2•b•1 = 0*1 + 2*0 + 1*7    = 7
p22 = a2•b•2 = 0*2 + 2*4 + 1*(-1) = 7

P = AB = [ 8  21 ]
         [ 7   7 ]
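• The product above can be checked in NumPy (our sketch); the @ operator performs exactly the row-by-column procedure just shown.

import numpy as np

A = np.array([[1, 5, 1],
              [0, 2, 1]])
B = np.array([[1,  2],
              [0,  4],
              [7, -1]])

P = A @ B
print(P)   # [[ 8 21]
           #  [ 7  7]]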
Matrix multiplication
• Summary of multiplication procedure.
[ a  b  c ]  [ g  j ]     [ ag+bh+ci  aj+bk+cl ]
[ d  e  f ]  [ h  k ]  =  [ dg+eh+fi  dj+ek+fl ]
             [ i  l ]
Matrix multiplication
• For matrix multiplication to be legal, the first matrix must have as many columns as the second matrix has rows. This, of course, is the requirement for multiplying a row vector by a column vector.
• The resulting matrix will have as many rows as the first matrix and as many columns as the second matrix.
• In the example, A had 2 rows and 3 columns while B had 3 rows and 2 columns; the matrix multiplication was therefore defined, resulting in a matrix with 2 rows and 2 columns.
• Or in general:
  – Dimension of A is na by ma, dimension of B is nb by mb.
  – Then the product P = AB is defined if ma = nb.
  – And the dimension of P is na by mb.
Matrix multiplication
• Rules for matrix and vector multiplication:
  – AB ≠ BA                        Not commutative
  – A(BC) = (AB)C                  Associative
  – A(B + C) = AB + AC             Distributive
  – (B + C)A = BA + CA
  – (AB)' = B'A'
  – (ABC)' = C'B'A'
• Rules for scalar multiplication:
  – xA = Ax                        Commutative
  – x(A + B) = xA + xB             Distributive
  – x(AB) = (xA)B = A(xB)          Associative
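• Two of these rules checked numerically (our sketch, with arbitrary small matrices).

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(np.array_equal(A @ B, B @ A))          # False: multiplication is not commutative
print(np.array_equal((A @ B).T, B.T @ A.T))  # True: (AB)' = B'A'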
Matrix multiplication
• What is the product of:

[ 2 ]
[ 3 ] [ 2  4 ]  =  ?
[ 4 ]

[ 2 ]              [ 4   8 ]
[ 3 ] [ 2  4 ]  =  [ 6  12 ]
[ 4 ]              [ 8  16 ]

[ 2 ]
[ 3 ] [ 1  1 ]  =  ?
[ 4 ]

[ 2 ]              [ 2  2 ]
[ 3 ] [ 1  1 ]  =  [ 3  3 ]
[ 4 ]              [ 4  4 ]

          [ 2 ]
[ 1  1 ]  [ 3 ]  =  ?
          [ 4 ]

Not possible: [1x2][3x1]
Matrix multiplication
• What is the product of:

[ 4   8 ]
[ 6  12 ] [ 2  3  4 ]        Not defined: [3x2] by [1x3]
[ 8  16 ]

[ 4   8 ]           [ 4*1 +  8*0 ]   [ 4 ]
[ 6  12 ] [ 1 ]  =  [ 6*1 + 12*0 ] = [ 6 ]
[ 8  16 ] [ 0 ]     [ 8*1 + 16*0 ]   [ 8 ]

            [ 3  7 ]
[ 0  2  3 ] [ 5  5 ]  =  [ 0*3 + 2*5 + 3*7   0*7 + 2*5 + 3*3 ]  =  [ 31  19 ]
            [ 7  3 ]

2 [ 1  0 ]  =  [ 2  0 ]
  [ 0  1 ]     [ 0  2 ]
Matrix multiplication
• Matrix division.
• For simple numbers, division can be reduced to multiplication by the reciprocal of the divisor:
  – 32 divided by 4 is the same as
  – 32 multiplied by 1/4, or
  – multiplied by 4^-1,
  – where 4^-1 is defined by the general equality a^-1 * a = 1.
• When working with matrices, we shall adopt the latter idea, and therefore not use the term division at all; instead we take multiplication by an inverse matrix as the equivalent of division.
• However, the computation of the inverse matrix is quite complex, and will be discussed next time.
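• As a preview, a sketch (ours) of "division" as multiplication by the inverse, using NumPy's np.linalg.inv; A must be square and non-singular for the inverse to exist.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve the simultaneous equations Ax = b by "dividing" by A:
# multiply both sides by the inverse, so x = A^-1 b.
x = np.linalg.inv(A) @ b
print(x)                      # [1. 3.]
print(np.allclose(A @ x, b))  # True

# A^-1 A = I, the matrix analogue of a^-1 * a = 1.
print(np.allclose(np.linalg.inv(A) @ A, np.eye(2)))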