Lecture 3: Math Primer II
Machine Learning
Andrew Rosenberg

Today
• Wrap-up of probability
• Vectors, matrices
• Calculus
• Differentiation with respect to a vector

Properties of Probability Density Functions
• Sum rule
• Product rule

Expected Values
• Given a random variable X with a distribution p(X), what is the expected value of X?

Multinomial Distribution
• If a variable x can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution.
• The probability of x being in state k is μk.

Expected Value of a Multinomial
• The expected value is the mean value.

Gaussian Distribution
• One dimension
• D dimensions

Gaussians

How Machine Learning Uses Statistical Modeling
• Expectation: the expected value of a function is the hypothesis.
• Variance: the variance is the confidence in that hypothesis.

Variance
• The variance of a random variable describes how much variability there is around the expected value.
• It is calculated as the expected squared error.

Covariance
• The covariance of two random variables expresses how they vary together.
• If two variables are independent, their covariance is zero.

Linear Algebra
• Vectors
  – A one-dimensional array.
  – If not specified, assume x is a column vector.
• Matrices
  – A higher-dimensional array.
  – Typically denoted with capital letters.
  – n rows by m columns.

Transposition
• Transposing a matrix swaps its rows and columns.

Addition
• Two matrices can be added iff they have the same dimensions.
  – A and B are both n-by-m matrices.

Multiplication
• To multiply two matrices, the inner dimensions must match.
  – An n-by-m matrix can be multiplied by an m-by-k matrix.

Inversion
• The inverse of an n-by-n (square) matrix A is denoted A⁻¹ and satisfies AA⁻¹ = A⁻¹A = I,
• where I, the identity matrix, is an n-by-n matrix with ones along the diagonal and zeros elsewhere.
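The matrix operations above (transposition, addition, multiplication, inversion, and the identity) can be sketched in Python with NumPy; the library choice and the concrete matrices are my additions, not part of the lecture.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])     # a 2-by-2 matrix
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Transposition swaps rows and columns: (A^T)_ij = A_ji.
assert A.T[0, 1] == A[1, 0]

# Addition requires identical dimensions.
C = A + B

# Multiplication requires matching inner dimensions:
# (n-by-m) @ (m-by-k) -> (n-by-k).
x = np.array([[1.0], [2.0]])   # column vector, 2-by-1
y = A @ x                      # 2-by-2 @ 2-by-1 -> 2-by-1

# The inverse of a square matrix satisfies A @ A^-1 = I.
A_inv = np.linalg.inv(A)
I = np.eye(2)                  # identity: ones along the diagonal
assert np.allclose(A @ A_inv, I)

# Matrices are invariant under multiplication by the identity.
assert np.allclose(A @ I, A)
```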
Identity Matrix
• Iij = 1 iff i = j, and 0 otherwise.
• Matrices are invariant under multiplication by the identity matrix.

Helpful Matrix Inversion Properties

Norm
• The norm of a vector x represents the Euclidean length of the vector.

Positive Definiteness
• Quadratic form
  – Scalar
  – Vector
• Positive definite matrix M
• Positive semi-definite

Calculus
• Derivatives and integrals
• Optimization

Derivatives
• The derivative of a function gives its slope at a point x.

Derivative Example

Integrals
• Integration is the inverse operation of differentiation (up to a constant).
• Graphically, an integral can be viewed as the area under the curve defined by f(x).

Integration Example

Vector Calculus
• Differentiation with respect to a matrix or vector
• Gradient
• Change of variables with a vector

Derivative w.r.t. a Vector
• Given a vector x and a function f(x), how can we find f′(x)?

Example Derivation
• Also referred to as the gradient of a function.

Useful Vector Calculus Identities
• Scalar multiplication
• Product rule
• Derivative of an inverse
• Change of variable

Optimization
• We have an objective function f(x) that we would like to maximize or minimize.
• Set the first derivative to zero.

Optimization with Constraints
• What if we want to constrain the parameters of the model?
  – For example: the mean is less than 10.
• Find the best likelihood, subject to a constraint.
• Two functions:
  – An objective function to maximize.
  – An inequality that must be satisfied.

Lagrange Multipliers
• Find maxima of f(x, y) subject to a constraint.

General Form
• Maximizing:
• Subject to:
• Introduce a new variable, λ, and find a maximum.

Example
• Maximizing:
• Subject to:
• Introduce a new variable and find a maximum.
• We now have 3 equations with 3 unknowns.
Example
• Eliminate λ.
• Substitute and solve.

Why Does Machine Learning Need These Tools?
• Calculus
  – We need to identify the maximum likelihood or the minimum risk (optimization).
  – Integration allows the marginalization of continuous probability density functions.
• Linear Algebra
  – Many features lead to high-dimensional spaces.
  – Vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
• Vector Calculus
  – All of the optimization must be performed in high-dimensional spaces.
  – Optimization over multiple variables simultaneously.
  – Gradient descent.
  – We want to take marginals over high-dimensional distributions such as Gaussians.

Next Time
• Linear Regression
  – Then regularization
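The Lagrange-multiplier recipe from the slides (set partial derivatives of the Lagrangian to zero, get three equations in three unknowns, eliminate λ, substitute and solve) can be checked numerically. The slide's actual objective and constraint were on the slide images and are not in this transcript, so the sketch below uses a classic stand-in: maximize f(x, y) = x + y subject to x² + y² = 1.

```python
import numpy as np

# Hypothetical stand-in for the slides' example (the original equations
# are not in the transcript): maximize f(x, y) = x + y subject to the
# constraint g(x, y) = x^2 + y^2 - 1 = 0.
#
# Lagrangian: L(x, y, lam) = x + y - lam * (x^2 + y^2 - 1)
# Setting the partial derivatives to zero gives 3 equations, 3 unknowns:
#   dL/dx:   1 - 2*lam*x     = 0
#   dL/dy:   1 - 2*lam*y     = 0
#   dL/dlam: x^2 + y^2 - 1   = 0
# Eliminating lam from the first two equations gives x = y; substituting
# into the constraint gives 2*x^2 = 1, so x = y = 1/sqrt(2).

x = y = 1.0 / np.sqrt(2.0)
lam = 1.0 / (2.0 * x)

# Verify all three stationarity conditions hold at the solution.
assert np.isclose(1 - 2 * lam * x, 0.0)
assert np.isclose(1 - 2 * lam * y, 0.0)
assert np.isclose(x**2 + y**2, 1.0)

# Spot-check optimality: no sampled point on the circle does better.
theta = np.linspace(0, 2 * np.pi, 1000)
f_on_circle = np.cos(theta) + np.sin(theta)
assert f_on_circle.max() <= x + y + 1e-6
```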