Lecture 3: Math Primer II
Machine Learning
Andrew Rosenberg
Today
• Wrap-up of probability
• Vectors, matrices
• Calculus
• Differentiation with respect to a vector
Properties of probability density functions
• Sum Rule
• Product Rule
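The slide's equations did not survive the transcript; the standard forms (an assumed reconstruction, following the usual presentation) are:
p(X) = \sum_{Y} p(X, Y)   % sum rule
p(X, Y) = p(Y \mid X)\, p(X)   % product rule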
Expected Values
• Given a random variable X with a distribution
p(X), what is the expected value of X?
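The defining formula is not preserved in the transcript; in standard notation:
\mathbb{E}[X] = \sum_{x} x\, p(x) \ \text{(discrete)}, \qquad \mathbb{E}[X] = \int x\, p(x)\, dx \ \text{(continuous)}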
Multinomial Distribution
• If a variable x can take 1-of-K states, we
represent the distribution of this variable
as a multinomial distribution.
• The probability of x being in state k is μ_k.
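With 1-of-K coding (x_k ∈ {0, 1}, Σ_k x_k = 1), the distribution is usually written as:
p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}, \qquad \mu_k \ge 0, \quad \sum_{k=1}^{K} \mu_k = 1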
Expected Value of a Multinomial
• The expected value is the mean: E[x_k] = μ_k,
i.e., E[x | μ] = μ.
Gaussian Distribution
• One Dimension
• D-Dimensions
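The density formulas are missing from the transcript; the standard Gaussian forms are:
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)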
Gaussians
How machine learning uses statistical modeling
• Expectation
– The expected value of a function is the
hypothesis
• Variance
– The variance is the confidence in that
hypothesis
Variance
• The variance of a random variable describes
how much variability there is around the
expected value.
• It is calculated as the expected squared error.
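In symbols (not preserved in the transcript):
\operatorname{Var}[X] = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = \mathbb{E}[X^2] - \mathbb{E}[X]^2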
Covariance
• The covariance of two random variables
expresses how they vary together.
• If two variables are independent, their
covariance equals zero.
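The defining formula, in standard notation:
\operatorname{cov}[X, Y] = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]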
Linear Algebra
• Vectors
– A one-dimensional array.
– If not specified, assume x is a column
vector.
• Matrices
– A two-dimensional array.
– Typically denoted with capital letters.
– n rows by m columns.
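NumPy is not part of the lecture; a minimal sketch of the objects just defined:

import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # a column vector (3-by-1)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])            # a matrix with n=3 rows, m=2 columns
print(x.shape, A.shape)               # (3, 1) (3, 2)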
Transposition
• Transposing a matrix swaps columns and
rows.
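In index notation (the slide's formula is not in the transcript): transposing an n-by-m matrix A yields the m-by-n matrix Aᵀ with
(A^{\mathsf T})_{ij} = A_{ji}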
Addition
• Two matrices can be added iff they have
the same dimensions.
– A and B are both n-by-m matrices.
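Addition is elementwise:
(A + B)_{ij} = A_{ij} + B_{ij}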
Multiplication
• To multiply two matrices, the inner dimensions must
be the same.
– An n-by-m matrix can be multiplied by an m-by-k
matrix, giving an n-by-k result.
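Entrywise (the standard definition; the slide's formula is missing):
(AB)_{ij} = \sum_{l=1}^{m} A_{il}\, B_{lj}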
Inversion
• The inverse of an n-by-n (square) matrix A,
when it exists, is denoted A⁻¹ and satisfies
A A⁻¹ = A⁻¹ A = I.
• Here I, the identity matrix, is the n-by-n
matrix with ones along the diagonal.
– I_ij = 1 if i = j, 0 otherwise.
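A hedged sketch (NumPy is assumed, not from the lecture) verifying the defining property:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])            # det(A) = 1, so the inverse exists
A_inv = np.linalg.inv(A)
print(A @ A_inv)                      # approximately the 2-by-2 identity
print(A_inv @ A)                      # same from the other side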
Identity Matrix
• Matrices are invariant under multiplication
by the identity matrix: AI = IA = A.
Helpful matrix inversion properties
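The slide's formulas are not in the transcript; the identities it most likely listed (standard results) are:
(AB)^{-1} = B^{-1} A^{-1}
(A^{-1})^{-1} = A
(A^{\mathsf T})^{-1} = (A^{-1})^{\mathsf T}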
Norm
• The norm of a vector x represents the
Euclidean length of the vector.
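In symbols:
\|\mathbf{x}\| = \sqrt{\mathbf{x}^{\mathsf T}\mathbf{x}} = \sqrt{\textstyle\sum_i x_i^2}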
Positive Definiteness
• Quadratic form
– Scalar
– Vector
• Positive definite matrix M
• Positive semi-definite
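The definitions are missing from the transcript; the standard statements: for a vector x and square matrix M, the quadratic form is the scalar xᵀMx (for scalar x it reduces to Mx²). Then:
\mathbf{x}^{\mathsf T} M \mathbf{x} > 0 \ \ \forall\, \mathbf{x} \neq \mathbf{0} \quad \text{(positive definite)}
\mathbf{x}^{\mathsf T} M \mathbf{x} \ge 0 \ \ \forall\, \mathbf{x} \quad \text{(positive semi-definite)}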
Calculus
• Derivatives and Integrals
• Optimization
Derivatives
• The derivative of a function gives the
slope of the function at a point x.
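The limit definition (not preserved in the transcript):
f'(x) = \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}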
Derivative Example
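The slide's worked example is lost; a representative one:
f(x) = 3x^2 + 4x + 1 \quad\Rightarrow\quad f'(x) = 6x + 4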
Integrals
• Integration is the inverse operation of
differentiation (up to a constant).
• Graphically, an integral can be considered
the area under the curve defined by f(x).
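In symbols:
\int f'(x)\, dx = f(x) + C, \qquad \int_a^b f(x)\, dx = \text{area under } f \text{ on } [a, b]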
Integration Example
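Again, the slide's example is lost; one of the same flavor:
\int_0^1 (3x^2 + 4x)\, dx = \left[x^3 + 2x^2\right]_0^1 = 3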
Vector Calculus
• Differentiation with respect to a matrix or
vector
• Gradient
• Change of Variables with a Vector
Derivative w.r.t. a vector
• Given a vector x and a function f(x), how
can we find f'(x)?
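The answer on the slide is missing; the standard object is the gradient, the vector of partial derivatives:
\nabla_{\mathbf{x}} f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right]^{\mathsf T}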
Example Derivation
Also referred to as the gradient of a function.
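The specific example is lost; standard examples of this kind are:
f(\mathbf{x}) = \mathbf{x}^{\mathsf T}\mathbf{a} \ \Rightarrow\ \nabla_{\mathbf{x}} f = \mathbf{a}, \qquad f(\mathbf{x}) = \mathbf{x}^{\mathsf T} A \mathbf{x} \ \Rightarrow\ \nabla_{\mathbf{x}} f = (A + A^{\mathsf T})\,\mathbf{x}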
Useful Vector Calculus identities
• Scalar Multiplication
• Product Rule
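The formulas are missing; the usual forms of the two identities named above are:
\nabla_{\mathbf{x}}\,[\alpha f(\mathbf{x})] = \alpha\, \nabla_{\mathbf{x}} f(\mathbf{x}) \quad \text{(scalar multiplication)}
\nabla_{\mathbf{x}}\,[f(\mathbf{x})\, g(\mathbf{x})] = g(\mathbf{x})\, \nabla_{\mathbf{x}} f(\mathbf{x}) + f(\mathbf{x})\, \nabla_{\mathbf{x}} g(\mathbf{x}) \quad \text{(product rule)}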
Useful Vector Calculus identities
• Derivative of an inverse
• Change of Variable
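Hedged reconstructions of the two identities named above:
\frac{\partial}{\partial x} A^{-1} = -A^{-1} \frac{\partial A}{\partial x} A^{-1} \quad \text{(derivative of an inverse)}
\int f(\mathbf{x})\, d\mathbf{x} = \int f(\mathbf{g}(\mathbf{u})) \left| \det \frac{\partial \mathbf{g}}{\partial \mathbf{u}} \right| d\mathbf{u}, \quad \mathbf{x} = \mathbf{g}(\mathbf{u}) \quad \text{(change of variables)}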
Optimization
• We have an objective function, f(x), that we
would like to maximize or minimize.
• Set the first derivative to zero and solve.
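The stationarity condition:
f'(x^*) = 0 \ \text{(scalar case)}, \qquad \nabla_{\mathbf{x}} f(\mathbf{x}^*) = \mathbf{0} \ \text{(vector case)}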
Optimization with constraints
• What if we want to constrain the parameters
of the model?
– e.g., the mean is less than 10.
• Find the best likelihood, subject to a
constraint.
• Two functions:
– An objective function to maximize
– An inequality that must be satisfied
Lagrange Multipliers
• Find maxima of
f(x,y) subject to a
constraint.
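The slide's figure is not preserved; the fact it illustrates is that at a constrained extremum the gradient of f is parallel to the gradient of the constraint g:
\nabla f = \lambda\, \nabla g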
General form
• Maximizing: f(x)
• Subject to: g(x) = 0
• Introduce a new variable λ, and find a
maximum of the resulting function.
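The resulting function is the Lagrangian (the slide's formula is missing; this is the standard form):
L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda\, g(\mathbf{x}), \qquad \nabla_{\mathbf{x}} L = \mathbf{0}, \quad \frac{\partial L}{\partial \lambda} = 0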
Example
• Maximizing:
• Subject to:
• Introduce a new variable, and find a
maximum.
Example
Now we have 3 equations in 3 unknowns.
Example
Eliminate λ, then substitute and solve.
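The slide's specific functions did not survive the transcript; a standard example with exactly this shape (an assumption, not necessarily the lecture's): maximize f(x, y) = 1 - x² - y² subject to g(x, y) = x + y - 1 = 0.
L(x, y, \lambda) = 1 - x^2 - y^2 + \lambda (x + y - 1)
\frac{\partial L}{\partial x} = -2x + \lambda = 0, \quad \frac{\partial L}{\partial y} = -2y + \lambda = 0, \quad \frac{\partial L}{\partial \lambda} = x + y - 1 = 0
Eliminating λ gives x = y; substituting into the constraint gives x = y = 1/2.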
Why does Machine Learning need these tools?
• Calculus
– We need to identify the maximum likelihood
or minimum risk: optimization.
– Integration allows the marginalization of
continuous probability density functions.
• Linear Algebra
– Many features lead to high-dimensional spaces.
– Vectors and matrices allow us to compactly
describe and manipulate high-dimensional
feature spaces.
Why does Machine Learning need these tools?
• Vector Calculus
– All of the optimization needs to be performed
in high-dimensional spaces.
– Optimization of multiple variables
simultaneously: gradient descent (a minimal
sketch follows this list).
– We want to take marginals over high-dimensional
distributions like Gaussians.
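The lecture only names gradient descent; a minimal sketch, assuming a simple quadratic objective and a hand-picked step size (neither is from the slides):

import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
# A is positive definite, so f has a unique minimizer: the solution of A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)                  # starting point
lr = 0.1                         # step size (assumed, not tuned)
for _ in range(200):
    grad = A @ x - b             # gradient of f at the current x
    x = x - lr * grad            # move downhill along the negative gradient

print(x)                         # ~ [0.2, 0.4]
print(np.linalg.solve(A, b))     # exact minimizer, for comparison

The same update, applied to a log-likelihood or risk in many dimensions, is how the optimization described above is carried out in practice.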
Next Time
• Linear Regression
– Then Regularization