Chapter 0 - Review of Linear Algebra and Statistics
Chapter 0 - Review of Linear Algebra
Objectives
• Basic definitions on matrices
• Matrix multiplications
• Addition and subtraction of matrices
• Computing the determinant of a matrix
• Scaling of a matrix
• Matrix transformation
• Transpose and Inverse of a matrix
• DOT and CROSS Products of vectors
What is a matrix?
A matrix is a two dimensional array that stores the elements
(numbers, or symbols representing numbers) in m rows and n
columns. A matrix might be denoted by a letter such as A and
is said to be m-by-n (m×n) in size.
Here is an example of a 4-by-4 matrix:
\[
A = \begin{bmatrix} 16 & 2 & 3 & 13 \\ 5 & 11 & 10 & 8 \\ 9 & 7 & 6 & 12 \\ 4 & 14 & 15 & 1 \end{bmatrix}
\]
The 4-by-4 matrix above can be created in MATLAB using:
A = [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1]
Each row is separated by a ";" and each element within a row by a blank space. Note that indices start from 1 in MATLAB. If you type the above at the MATLAB command line, you will get:
A=
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
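Because indices start from 1, individual elements of A can be read by row and column number. The following is a minimal sketch of that idea; the variable names first, elem, m and n are illustrative and do not come from the original notes.

```matlab
% Build the 4-by-4 example matrix; ";" separates rows, blanks separate elements
A = [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1];

% Indices start from 1: A(row, column)
first = A(1, 1);    % element in row 1, column 1
elem  = A(2, 3);    % element in row 2, column 3

% size returns the number of rows (m) and columns (n)
[m, n] = size(A);

fprintf('A(1,1) = %d, A(2,3) = %d, size = %d-by-%d\n', first, elem, m, n);
```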
Multiplying two matrices
Let B and C be m-by-n and q-by-p matrices respectively. Here is
what we can say about the product of these two:
B * C is possible iff n = q
C * B is possible iff p = m
Thus, the product of two matrices is possible when the number of
columns on the left matrix is the same as the number of rows on
the right matrix.
How do we multiply two matrices? We show this with an example in which B is 2-by-3 and C is 3-by-3:
7 10 13
1
3
5

 
   1* 7  3 * 8  5 * 9 1*10  3 *11  5 *12 1*13  3 *14  5 *15 
B *C  
*
8
11
14
 

 
2 4 6 9 12 15 2 * 7  4 * 8  6 * 9 2 *10  4 *11  6 *12 2 *13  4 *14  6 *15


 76 103 130


100 136 172
Going through this example shows how each row of the first matrix is multiplied by each column of the second one, element by element, with the products added together. Note that we could not multiply C by B. Why?
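The size rule and the worked product above can be checked in MATLAB. This is a small sketch using the B and C from the example; the variable names P and canMultiplyCB are illustrative.

```matlab
B = [1 3 5; 2 4 6];               % 2-by-3
C = [7 10 13; 8 11 14; 9 12 15];  % 3-by-3

% B * C is defined because B has 3 columns and C has 3 rows
P = B * C;
disp(P);

% C * B needs the number of columns of C to equal the number of rows of B
canMultiplyCB = (size(C, 2) == size(B, 1));
fprintf('C * B possible: %d\n', canMultiplyCB);
```

Here C has 3 columns while B has only 2 rows, so the check prints 0 and MATLAB would report a dimension error for C * B.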
Adding or Subtracting two matrices
To add or subtract two matrices, they must be exactly of the
same type and size.
Example (Addition)
1 4 7 10 13 16 1  10  11 4  13  17 7  16  23
B  C  2 5 8  11 14 17  2  11  13 5  14  19 8  17  25
3 6 9 12 15 18 3  12  15 6  15  21 9  18  27
Example (Subtraction)
1 4 7 10 13 16 1  10  9 4  13  9 7  16  9
B  C  2 5 8  11 14 17  2  11  9 5  14  9 8  17  9
3 6 9 12 15 18 3  12  9 6  15  9 9  18  9
In this case, both B+C and C+B are possible and they produce the same result.
B-C and C-B are also both possible, but they do not produce the same result: C-B is the negative of B-C.
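The addition and subtraction examples can be reproduced as follows. This is only a sketch; the names S, D1 and D2 are illustrative.

```matlab
B = [1 4 7; 2 5 8; 3 6 9];
C = [10 13 16; 11 14 17; 12 15 18];

S  = B + C;   % element-by-element sum
D1 = B - C;   % every element is -9
D2 = C - B;   % every element is +9

disp(S);
fprintf('B + C equals C + B: %d\n', isequal(B + C, C + B));
fprintf('B - C equals C - B: %d\n', isequal(D1, D2));
```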
Multiplying a matrix by a constant or Identity Matrix
When you multiply a matrix by a constant, every element of the matrix is multiplied by that constant. This is referred to as scaling. Example (const = 4):
\[
\mathrm{Const} * A = 4 * \begin{bmatrix} 16 & 5 & 9 & 4 \\ 2 & 11 & 7 & 14 \\ 3 & 10 & 6 & 15 \\ 13 & 8 & 12 & 1 \end{bmatrix}
= \begin{bmatrix} 64 & 20 & 36 & 16 \\ 8 & 44 & 28 & 56 \\ 12 & 40 & 24 & 60 \\ 52 & 32 & 48 & 4 \end{bmatrix}
\]
The identity matrix is a square matrix (the number of rows equals the number of columns) in which the diagonal elements are all 1 and the remaining elements are 0. When multiplied by another matrix of the same size, the identity matrix produces the original matrix.
1 0 0 0 16 5 9 4  16 5 9 4 
0 1 0 0  2 11 7 14  2 11 7 14
*


I*A 
0 0 1 0  3 10 6 15  3 10 6 15

 
 

0
0
0
1
13
8
12
1
13
8
12
1

 
 

 Identity
*
Matrix A 
Matrix A
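Scaling and multiplication by the identity matrix can both be tried in MATLAB. The sketch below uses the matrix from the two examples above; eye is the built-in function that creates an identity matrix, and the names const, scaled and same are illustrative.

```matlab
A = [16 5 9 4; 2 11 7 14; 3 10 6 15; 13 8 12 1];

% Scaling: every element is multiplied by the constant
const  = 4;
scaled = const * A;

% eye(n) builds the n-by-n identity matrix
I = eye(4);
same = isequal(I * A, A) && isequal(A * I, A);

disp(scaled);
fprintf('I * A equals A: %d\n', same);
```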
Computing the determinant of a matrix
Every square matrix, whatever its size, has an associated number called the determinant of the matrix. Let's try a 2-by-2 matrix first:
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{the determinant of this matrix is: } |A| = a * d - b * c
\]
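This is a bit more complicated when the matrix is of larger size. Let's try a 3-by-3 matrix.
\[
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \quad \text{the determinant must be computed in two steps:}
\]
\[
|A| = a * \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b * \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c * \begin{vmatrix} d & e \\ g & h \end{vmatrix}
= a(e * i - f * h) - b(d * i - f * g) + c(d * h - e * g)
\]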
Similarly, for a 4-by-4 matrix, you will go through three steps:
1) Take each element of the first row together with the corresponding 3-by-3 matrix underneath it (the matrix left after deleting that element's row and column), with alternating signs +, -, +, -.
2) Process each 3-by-3 matrix by repeating the steps in the above example.
3) Compute the determinants of the resulting 2-by-2 matrices and unfold them to find the final result.
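The 3-by-3 expansion can be compared against MATLAB's built-in det function. The matrix M below is a hypothetical example chosen only for illustration; it does not appear in the original notes.

```matlab
% Hypothetical 3-by-3 matrix chosen only for illustration
M = [1 2 3; 4 5 6; 7 8 10];

% Expansion along the first row: a(e*i - f*h) - b(d*i - f*g) + c(d*h - e*g)
manualDet = M(1,1) * (M(2,2)*M(3,3) - M(2,3)*M(3,2)) ...
          - M(1,2) * (M(2,1)*M(3,3) - M(2,3)*M(3,1)) ...
          + M(1,3) * (M(2,1)*M(3,2) - M(2,2)*M(3,1));

% Built-in determinant; it is computed numerically, so compare with a tolerance
builtinDet = det(M);

fprintf('manual = %g, built-in = %g\n', manualDet, builtinDet);
```

The manual expansion works out to 1(50 - 48) - 2(40 - 42) + 3(32 - 35) = -3, and det should agree up to rounding error.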
Transpose and Inverse of a Matrix
The transpose of a matrix is the same matrix with its rows and
columns switched. The matrix A and its transpose are:
16 2 3 13
 5 11 10 8 

A
 9 7 6 12


4
14
15
1


16 5 9 4 
 2 11 7 14

AT  
 3 10 6 15


13
8
12
1


The inverse of a matrix A, written $A^{-1}$, is the matrix that produces the identity
matrix I when it is multiplied by the matrix itself: $I = A^{-1} * A$.
There are several ways to compute the inverse of a matrix. We will
introduce the most common one here. First, however, we need to find out
whether the matrix has an inverse (is invertible). If a matrix is not
invertible, it is singular. An n-by-n matrix A is invertible if there
exists an n-by-n matrix C such that AC = CA = I, where I is the
identity matrix.
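In MATLAB the transpose is written with the .' operator. One way to test invertibility (an approach that is not discussed in the original notes and is shown here only as an assumption-laden sketch) is to check whether the rank of an n-by-n matrix equals n. The code below applies both to the matrix A from this chapter, which happens to be singular, and to a 2-by-2 matrix S, a name introduced here for illustration.

```matlab
A = [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1];

% Transpose: rows and columns are switched
At = A.';
disp(At);

% An n-by-n matrix is invertible only if its rank is n
rankA = rank(A);          % rank 3, so this 4-by-4 matrix is singular
S = [2 4; 6 8];
rankS = rank(S);          % rank 2, so this 2-by-2 matrix is invertible

fprintf('rank(A) = %d of %d, rank(S) = %d of %d\n', rankA, size(A, 1), rankS, size(S, 1));
```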
Inverse of a Matrix
How do we find the inverse of a matrix? See the example below. We wish to compute $A^{-1}$ for:
\[
A = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}
\]
First place the identity matrix of the same size on the right-hand side of the original matrix:
\[
[A | I] = \left[\begin{array}{cc|cc} 2 & 4 & 1 & 0 \\ 6 & 8 & 0 & 1 \end{array}\right]
\]
Then, work through the steps below to move the identity matrix to the left-hand side.
Once that is accomplished, the matrix on the right-hand side will be the inverse.
Step 1: we need to get rid of the 4 in the first row. Thus, we will use: row1 = row1 - (0.5) row2
\[
\left[\begin{array}{cc|cc} 2 - 0.5*6 & 4 - 0.5*8 & 1 - 0.5*0 & 0 - 0.5*1 \\ 6 & 8 & 0 & 1 \end{array}\right]
= \left[\begin{array}{cc|cc} -1 & 0 & 1 & -0.5 \\ 6 & 8 & 0 & 1 \end{array}\right]
\]
Step 2: we need to get rid of the 6 in the second row. Thus, row2 = row2 + 6 row1
\[
\left[\begin{array}{cc|cc} -1 & 0 & 1 & -0.5 \\ 6 + 6(-1) & 8 + 6(0) & 0 + 6(1) & 1 + 6(-0.5) \end{array}\right]
= \left[\begin{array}{cc|cc} -1 & 0 & 1 & -0.5 \\ 0 & 8 & 6 & -2 \end{array}\right]
\]
Step 3: the matrix on the left is almost ready. The next thing we need to do is to divide each row
by a number to produce the identity matrix on the left: row1 = row1 * (-1), row2 = row2 / 8
\[
\left[\begin{array}{cc|cc} -1*(-1) & 0*(-1) & 1*(-1) & -0.5*(-1) \\ 0/8 & 8/8 & 6/8 & -2/8 \end{array}\right]
= \left[\begin{array}{cc|cc} 1 & 0 & -1 & 0.5 \\ 0 & 1 & 0.75 & -0.25 \end{array}\right]
\]
As you can see, the identity matrix has moved to the left-hand side, and thus the right-hand-side
matrix is the inverse:
\[
A^{-1} = \begin{bmatrix} -1 & 0.5 \\ 0.75 & -0.25 \end{bmatrix}
\]
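The three row operations above can be replayed on the augmented matrix in MATLAB. This is a sketch of this particular example rather than a general-purpose inversion routine; the names AI, Ainv and check are illustrative.

```matlab
A  = [2 4; 6 8];
AI = [A eye(2)];                      % augmented matrix [A | I]

AI(1,:) = AI(1,:) - 0.5 * AI(2,:);    % Step 1: removes the 4 in row 1
AI(2,:) = AI(2,:) + 6 * AI(1,:);      % Step 2: removes the 6 in row 2
AI(1,:) = AI(1,:) * (-1);             % Step 3: scale row 1 ...
AI(2,:) = AI(2,:) / 8;                %         ... and row 2

Ainv = AI(:, 3:4);                    % right-hand block is the inverse
disp(Ainv);

% Check: multiplying back should give the identity matrix
check = A * Ainv;
disp(check);
```

For everyday use, the built-in inv(A) returns the same matrix up to rounding error.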
DOT Product
The DOT product of
\[
v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \text{ and } u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
\]
is defined as:
\[
u \cdot v = v_1 u_1 + v_2 u_2 + v_3 u_3
\]
The DOT product of two vectors is a scalar and can also be defined as:
\[
u \cdot v = |u| |v| \cos(\theta)
\]
where $\theta$ is the angle between the two vectors
and |u| and |v| denote the magnitudes of the vectors u and v respectively.
The DOT product of two orthogonal (perpendicular) vectors is 0. Example:
v = 2i + 3j - 4k and u = 2i - 3j + 2k
\[
u \cdot v = (2)(2) + (3)(-3) + (-4)(2) = -13
\]
What is the angle between these two vectors?
First we need to compute the magnitude (length) of each vector.
\[
|v| = \sqrt{2^2 + 3^2 + (-4)^2} = \sqrt{29} = 5.3852 \quad \text{and} \quad |u| = \sqrt{2^2 + (-3)^2 + 2^2} = \sqrt{17} = 4.1231
\]
\[
\cos(\theta) = \frac{u \cdot v}{|u| |v|} = \frac{-13}{(4.1231)(5.3852)} = -0.5855, \text{ which results in } \theta = \arccos(-0.5855) = 125.84^\circ
\]
CROSS Product
The CROSS product of
\[
v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \text{ and } u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}
\]
is defined as:
\[
v \times u = \begin{vmatrix} i & j & k \\ v_1 & v_2 & v_3 \\ u_1 & u_2 & u_3 \end{vmatrix}
= i(v_2 u_3 - v_3 u_2) - j(v_1 u_3 - v_3 u_1) + k(v_1 u_2 - v_2 u_1)
\]
The magnitude of the CROSS product can be defined as:
\[
|u \times v| = |u| |v| \sin(\theta)
\]
where $\theta$ is the angle between the two vectors
and |u| and |v| denote the magnitudes of the vectors u and v respectively.
Example: v = 2i + 3j - 4k and u = 2i - 3j + 2k
\[
v \times u = \begin{vmatrix} i & j & k \\ 2 & 3 & -4 \\ 2 & -3 & 2 \end{vmatrix}
= i((3)(2) - (-4)(-3)) - j((2)(2) - (-4)(2)) + k((2)(-3) - (3)(2))
= -6i - 12j - 12k
\]
Note: The result is a vector.
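The same example can be checked with the built-in cross function. The sketch below also verifies two properties as an added illustration: the result is perpendicular to both input vectors, and its magnitude equals |u||v|sin(θ). The names w, perpV, perpU and theta are illustrative.

```matlab
v = [2 3 -4];
u = [2 -3 2];

w = cross(v, u);                   % components along i, j, k
disp(w);

% The result is perpendicular to both inputs, so both dot products are 0
perpV = dot(w, v);
perpU = dot(w, u);

% Magnitude check: |v x u| = |u||v| sin(theta)
theta = acos(dot(u, v) / (norm(u) * norm(v)));
fprintf('|v x u| = %.4f, |u||v|sin(theta) = %.4f\n', norm(w), norm(u) * norm(v) * sin(theta));
```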
Chapter 0 - Review of Statistics
Objectives
• Computing the mean (average)
• Finding the median
• Computing the variance
• Computing standard deviation
• Probability and random numbers
• Probability distribution function
• Cumulative distribution function
Computing mean and median
The mean of a set of values is the sum of all values divided by the number of values.
It can also be written as a weighted average of the distinct values in the set, as in the
second sum below, where $f_j$ denotes the number of occurrences of the j-th distinct value
observed in the set, m is the number of distinct values and N is the total number of values.
\[
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\sum_{j=1}^{m} f_j x_j}{N}
\]
The median is the point in the sorted sequence where the set is divided into two
equal parts each containing ½ of the values.
Example: X = {2, 12, 4, 12, 7, 3, 5, 12, 4, 8, 2}
Sorted X = {2, 2, 3, 4, 4, 5, 7, 8, 12, 12, 12}
The number in the middle of this set is 5; that is the median. What if we had an even
number of values? In that case the median is taken as the average of the two middle values.
The distinct values and their frequencies are:
Values          2   3   4   5   7   8   12
f (frequency)   2   1   2   1   1   1   3
What is the average?
\[
\bar{x} = \frac{\sum_{i=1}^{11} x_i}{11} = \frac{2 + 12 + 4 + 12 + 7 + 3 + 5 + 12 + 4 + 8 + 2}{11} = 6.454545
\]
or
\[
\bar{x} = \frac{\sum_{j=1}^{7} f_j x_j}{11} = \frac{2*2 + 1*3 + 2*4 + 1*5 + 1*7 + 1*8 + 3*12}{11} = 6.454545
\]
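The mean, the median and the frequency table for X can be reproduced in MATLAB. This is a minimal sketch; mean, median, unique and accumarray are built-in functions, while the names xbar, med, vals, f and xbarWeighted are illustrative.

```matlab
X = [2 12 4 12 7 3 5 12 4 8 2];

xbar = mean(X);          % sum of all values divided by N
med  = median(X);        % middle value of the sorted set

% Distinct values (sorted) and how often each one occurs
[vals, ~, idx] = unique(X);
f = accumarray(idx(:), 1).';

% Mean as a frequency-weighted sum
xbarWeighted = sum(f .* vals) / numel(X);

fprintf('mean = %.6f, median = %d, weighted mean = %.6f\n', xbar, med, xbarWeighted);
```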
Measure of Variation
The mean and median do not describe the amount of dispersion or variation
among the observed values. See examples below:
[Figure: three example plots of observed values with different amounts of variation; horizontal axes run from 0 or 1 to 5]
All three have the same median and mean. The first measure of variation is the
range. Once again let's consider X = {2, 12, 4, 12, 7, 3, 5, 12, 4, 8, 2}. The range
is the difference between the largest and the smallest values.
Range = 12 - 2 = 10.
The second measure of variation is the mean deviation, which is defined as:
\[
\text{Mean deviation} = \frac{\sum_{i=1}^{n} |x_i - \mu|}{n}
\]
where $\mu$ denotes the mean of the values and n is the number of values. For X this gives:
\[
\frac{2(|2 - 6.4545|) + 3(|12 - 6.4545|) + 2(|4 - 6.4545|) + (|3 - 6.4545|) + (|5 - 6.4545|) + (|7 - 6.4545|) + (|8 - 6.4545|)}{11}
\]
\[
= \frac{2(4.4545) + 3(5.5455) + 2(2.4545) + 3.4545 + 1.4545 + 0.5455 + 1.5455}{11} = 3.40
\]
The most commonly used measure of variation is the sample variance,
usually referred to simply as the variance.
\[
s^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}
\]
For the X = {2, 12, 4, 12, 7, 3, 5, 12, 4, 8, 2}, this will be:
\[
s^2 = \frac{2(2 - 6.4545)^2 + 3(12 - 6.4545)^2 + 2(4 - 6.4545)^2 + (3 - 6.4545)^2 + (5 - 6.4545)^2 + (7 - 6.4545)^2 + (8 - 6.4545)^2}{10}
\]
\[
= \frac{2(-4.4545)^2 + 3(5.5455)^2 + 2(-2.4545)^2 + (-3.4545)^2 + (-1.4545)^2 + (0.5455)^2 + (1.5455)^2}{10} = 16.0727
\]
The standard deviation is an important measure of deviation, usually referred
to as the error, and is defined as:
\[
s = \sqrt{s^2} = \sqrt{16.0727} \approx 4.0091
\]
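The range, mean deviation, variance and standard deviation of X can be computed as follows. This is a sketch under the assumption that the default behaviour of the built-in var and std functions (division by n - 1) matches the sample variance defined above; the variable names are illustrative.

```matlab
X = [2 12 4 12 7 3 5 12 4 8 2];

rangeX  = max(X) - min(X);                 % largest minus smallest value
meanDev = mean(abs(X - mean(X)));          % mean absolute deviation from the mean
s2      = var(X);                          % sample variance, divides by n - 1
s       = std(X);                          % square root of the sample variance

fprintf('range = %d, mean deviation = %.4f, variance = %.4f, std = %.4f\n', rangeX, meanDev, s2, s);
```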
MATLAB Examples
• Let the columns represent heart rate, weight and
hours of exercise per week
Random numbers and probability
The probability of an event, $x_i$, is the chance of observing that event when a
large number of observations have been made. This is defined as:
\[
P(x_i) = \frac{N_{x_i}}{N}
\]
where $N_{x_i}$ represents the number of times that particular event has
been observed and N represents the total number of observations. Of
course, the sum of all probabilities must be 1, i.e., if you account for all the
possibilities, then you must observe all events. For example, in the set:
X = {2, 12, 4, 12, 7, 3, 5, 12, 4, 8, 2}
\[
P(2) = P(4) = \frac{N_2}{N} = \frac{N_4}{N} = \frac{2}{11}, \quad P(12) = \frac{N_{12}}{N} = \frac{3}{11}, \quad P(3) = P(5) = P(7) = P(8) = \frac{1}{11}
\]
Note: The sum of all these probabilities is 1:
\[
\frac{2}{11} + \frac{2}{11} + \frac{3}{11} + \frac{1}{11} + \frac{1}{11} + \frac{1}{11} + \frac{1}{11} = 1
\]
A random set is a set in which the probability of any one value appearing is the
same as that of every other value. Since we cannot create perfectly random
values, we use pseudo-random generators to produce a set of values.
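The probabilities for X and a small pseudo-random sample can be produced in MATLAB. This is only a sketch; randi is the built-in pseudo-random integer generator, and the range 1 to 12 and sample length 11 are arbitrary choices made for illustration.

```matlab
X = [2 12 4 12 7 3 5 12 4 8 2];
N = numel(X);

% Count how many times each distinct value is observed
[vals, ~, idx] = unique(X);
counts = accumarray(idx(:), 1).';

P = counts / N;                      % P(x_i) = N_xi / N
totalP = sum(P);                     % should be 1 up to rounding

for k = 1:numel(vals)
    fprintf('P(%d) = %d/%d\n', vals(k), counts(k), N);
end

% Pseudo-random integers between 1 and 12 (different on every run)
sample = randi(12, 1, 11);
disp(sample);
```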
Probability Distribution and Cumulative Distribution
In the previous example we had:
\[
P(2) = P(4) = \frac{N_2}{N} = \frac{N_4}{N} = \frac{2}{11}, \quad P(12) = \frac{N_{12}}{N} = \frac{3}{11}, \quad P(3) = P(5) = P(7) = P(8) = \frac{1}{11}
\]
PDF
[Figure: plot of the probability of each value; horizontal axis from 1 to 12, vertical axis from 0 to 0.3]
This is simply a plot of the probability of each value as it is observed.
Using this type of distribution we can easily tell which one of the values was
seen most often.
CDF
[Figure: plot of the cumulative probability; horizontal axis from 1 to 12, vertical axis from 0 to 1]
This is the cumulative probability as we move on to the next value. The final
result is always 1. That is the point where all possible values have been observed.
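The two distributions can be computed from X with a short loop and the built-in cumsum function. This is a minimal sketch; the names pdfX, cdfX and mostSeen are illustrative, and the plotting commands are left commented out as one possible way to draw the figures.

```matlab
X = [2 12 4 12 7 3 5 12 4 8 2];
x = 1:12;                            % values on the horizontal axis

% Probability of each value from 1 to 12 (0 if never observed)
pdfX = zeros(size(x));
for k = x
    pdfX(k) = sum(X == k) / numel(X);
end

% Running total of the probabilities
cdfX = cumsum(pdfX);

[~, mostSeen] = max(pdfX);           % value seen most often
fprintf('most frequent value = %d, final CDF value = %.4f\n', mostSeen, cdfX(end));

% Uncomment to draw the two plots
% subplot(1, 2, 1); bar(x, pdfX); title('PDF');
% subplot(1, 2, 2); stairs(x, cdfX); title('CDF');
```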