4: Regression (continued)
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
The slides in this presentation are adapted from the Stanford online ML course: http://www.ml-class.org/

Last time
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: simple example using Excel

Today
• How to apply gradient descent to minimize the cost function for regression
• Linear algebra refresher

Reminder: sample problem
Housing prices (Portland, OR).
[Scatter plot: price (in 1000s of dollars, 0–500) against size (feet², 0–3000).]

Reminder: notation
Training set of housing prices (Portland, OR):

  Size in feet² (x)   Price ($) in 1000's (y)
  2104                460
  1416                232
  1534                315
   852                178
  …                   …

Notation:
  m   = number of training examples
  x's = "input" variable / features
  y's = "output" variable / "target" variable

Reminder: learning algorithm for hypothesis function h
The training set is fed to a learning algorithm, which outputs a hypothesis function h. Given the size of a house, h estimates its price.
Linear hypothesis: hθ(x) = θ0 + θ1x   (univariate linear regression)

Gradient descent algorithm
Repeat until convergence:
  θj := θj − α (∂/∂θj) J(θ0, θ1)   (for j = 0 and j = 1)
Linear regression model:
  hθ(x) = θ0 + θ1x
  J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²

Today
• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2.
applying gradient descent to find the minimum of the cost function
• linear algebra refresher

Cost function
Hypothesis:     hθ(x) = θ0 + θ1x
Parameters:     θ0, θ1
Cost function:  J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal:           minimize J(θ0, θ1) over θ0, θ1

Simplified case: θ0 = 0
Hypothesis:     hθ(x) = θ1x   (the line passes through the origin)
Parameter:      θ1
Cost function:  J(θ1) = (1/2m) Σi=1..m (θ1x(i) − y(i))²
Goal:           minimize J(θ1)

For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1.
[Plots: for the training set (1,1), (2,2), (3,3), the left panel shows the data with the line hθ(x) = θ1x, and the right panel shows J(θ1) for θ1 ranging from −0.5 to 2.5:
  • hθ(x) = x (θ1 = 1) fits the data exactly, so J(1) = 0;
  • hθ(x) = 0.5x gives J(0.5) ≈ 0.58;
  • hθ(x) = 0 gives J(0) ≈ 2.33.
Tracing out J(θ1) for all θ1 gives a bowl-shaped curve with its minimum at θ1 = 1.]

What if θ0 ≠ 0?
Hypothesis:     hθ(x) = θ0 + θ1x
Parameters:     θ0, θ1
Cost function:  J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
Goal:           minimize J(θ0, θ1)

For fixed θ0, θ1, hθ(x) is a function of x; J(θ0, θ1) is a function of the parameters θ0, θ1.
[Plots: the housing data with the line hθ(x) = 10 + 0.1x (price ($) in 1000's against size in feet², 0–3000). With two parameters, J(θ0, θ1) is a bowl-shaped surface, shown both as a 3-D surface plot and as contour plots: each point in the (θ0, θ1) plane corresponds to one candidate line, and the center of the contours corresponds to the best fit.]

Today
• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function
• linear algebra refresher
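Before moving on, the cost function above is easy to check numerically. A minimal sketch in plain Python, assuming the three-point training set (1,1), (2,2), (3,3) used in the plots:

```python
# Cost function J(theta1) for the simplified hypothesis h(x) = theta1 * x,
# using the three-point training set from the slides: (1,1), (2,2), (3,3).

def cost(theta1, xs, ys):
    """J(theta1) = (1/2m) * sum of (theta1*x - y)^2 over the training set."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1, 2, 3]
ys = [1, 2, 3]

print(cost(1.0, xs, ys))   # 0.0   -- h(x) = x fits the data exactly
print(cost(0.5, xs, ys))   # ~0.58 -- h(x) = 0.5x
print(cost(0.0, xs, ys))   # ~2.33 -- h(x) = 0
```

Evaluating `cost` over a grid of θ1 values traces out exactly the bowl-shaped J(θ1) curve from the slides.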
Gradient descent: outline
Have some function J(θ0, θ1); want to minimize it over θ0, θ1.
Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

Gradient descent algorithm
Repeat until convergence:
  θj := θj − α (∂/∂θj) J(θ0, θ1)   (simultaneously for j = 0 and j = 1)
Here α is the learning rate: it controls the size of each downhill step.

Choosing the learning rate
• If α is too small, gradient descent can be slow.
• If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

At a local minimum, the derivative is zero, so the update leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a minimum, the derivative (and hence the step size) automatically becomes smaller, so there is no need to decrease α over time.

Gradient descent for linear regression
Applying gradient descent to the linear regression model
  hθ(x) = θ0 + θ1x,   J(θ0, θ1) = (1/2m) Σi=1..m (hθ(x(i)) − y(i))²
and working out the partial derivatives gives:
Repeat until convergence, updating θ0 and θ1 simultaneously:
  θ0 := θ0 − α (1/m) Σi=1..m (hθ(x(i)) − y(i))
  θ1 := θ1 − α (1/m) Σi=1..m (hθ(x(i)) − y(i)) · x(i)

[Surface plot of J(θ0, θ1): for linear regression the cost function is convex (bowl-shaped), so it has a single global minimum.]

[Iteration plots: for fixed θ0, θ1, hθ(x) is a function of x, while J(θ0, θ1) is a function of the parameters. A sequence of slides shows successive gradient descent steps on the housing data: on the left, the current hypothesis line over the data; on the right, the corresponding point descending the contour plot of J(θ0, θ1) toward its minimum.]
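The simultaneous-update rule above can be sketched in a few lines of plain Python. This is an illustrative implementation (function name, data, and hyperparameter values are chosen for the example), again using the (1,1), (2,2), (3,3) training set, for which the best fit is h(x) = x:

```python
# Batch gradient descent for univariate linear regression, following
# the simultaneous-update rule:
#   theta0 := theta0 - alpha * (1/m) * sum(h(x_i) - y_i)
#   theta1 := theta1 - alpha * (1/m) * sum((h(x_i) - y_i) * x_i)

def gradient_descent(xs, ys, alpha=0.1, iterations=5000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0          # start at (0, 0)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # simultaneous update: both gradients are computed
        # before either parameter changes
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

theta0, theta1 = gradient_descent([1, 2, 3], [1, 2, 3])
print(theta0, theta1)   # converges close to (0, 1), i.e. h(x) = x
```

Note how the gradients are computed from the current (θ0, θ1) before either parameter is overwritten; updating θ0 first and then using the new θ0 inside θ1's gradient would not be the algorithm from the slides.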
[The iteration plots continue: with each step, the hypothesis line fits the housing data better as the parameters move down the contours of J(θ0, θ1).]

"Batch" gradient descent
"Batch": each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm (as in stochastic or mini-batch gradient descent).

What's next?
We are not in univariate regression anymore:

  x0  Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
  1   2104          5                   1                 45                   460
  1   1416          3                   2                 40                   232
  1   1534          3                   2                 30                   315
  1    852          2                   1                 36                   178

(The leading column of 1's is the constant feature x0 = 1 that multiplies θ0.)

Today
• How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function
• linear algebra refresher

Linear Algebra Review

Matrix: rectangular array of numbers.
Matrix elements (entries): the "i, j entry" Aij is in the ith row, jth column.
Dimension of a matrix: number of rows × number of columns.
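These definitions map directly onto NumPy arrays. A small sketch, assuming NumPy and using the multivariate housing table above (note that NumPy is 0-indexed, while the slides' Aij notation is 1-indexed):

```python
import numpy as np

# The multivariate training data from the "What's next?" slide,
# with the leading column of 1's as the constant feature x0.
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
])
y = np.array([460, 232, 315, 178])   # price in $1000s

print(X.shape)    # (4, 5): 4 rows x 5 columns
print(X[0, 1])    # 2104 -- the slides' 1-indexed "1,2 entry" is X[0, 1] here
```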
For example, a 4 × 2 matrix has 4 rows and 2 columns.

Another example: representing communication links in a network
A network with nodes a, b, c, d, e can be represented by an adjacency matrix, whose i, j entry records the links between node i and node j:

     a  b  c  d  e
  a  0  1  2  0  3
  b  1  0  0  0  0
  c  2  0  0  1  1
  d  0  0  1  0  1
  e  3  0  1  1  0

A different network gives a different adjacency matrix:

     a  b  c  d  e
  a  0  1  0  0  2
  b  0  1  0  0  0
  c  1  0  0  1  0
  d  0  0  1  0  1
  e  0  0  0  0  0

Vector: an n × 1 matrix; yi denotes the ith element.
Vectors can be 1-indexed (elements y1 … yn, the usual convention in mathematics) or 0-indexed (elements y0 … yn−1, the usual convention in programming).

Matrix addition: two matrices of the same dimension are added element by element.
Scalar multiplication: every element of the matrix is multiplied by the scalar.
Combination of operands: these operations can be combined, e.g. 3A + B − C/3 for matrices A, B, C of the same dimension.

Matrix-vector multiplication
Details: an m × n matrix A (m rows, n columns) times an n × 1 matrix (an n-dimensional vector) gives an m-dimensional vector y. To get yi, multiply A's ith row with the elements of vector x, and add them up:
  yi = Σj=1..n Aij xj

Example: house sizes
Given a list of house sizes and a hypothesis hθ(x) = θ0 + θ1x, prepending a column of 1's to the sizes and multiplying by the parameter vector (θ0, θ1)ᵀ computes the prediction for every house in a single matrix-vector multiplication.

Example: matrix-matrix multiplication
Details: an m × k matrix (m rows, k columns) times a k × n matrix (k rows, n columns) gives an m × n matrix. The ith column of the matrix C = AB is obtained by multiplying A with the ith column of B (for i = 1, 2, …, n).

Example: house sizes with 3 competing hypotheses
With three competing hypotheses hθ(x) = θ0 + θ1x, put each hypothesis's parameters (θ0, θ1) into a column of a parameter matrix. Multiplying the data matrix by this parameter matrix computes every hypothesis's prediction for every house in a single matrix-matrix multiplication.

Matrix multiplication is not commutative
Let A and B be matrices. Then in general, A × B ≠ B × A: computing A × B and B × A for concrete matrices usually gives different results.

Identity matrix
Denoted I (or In×n or In): 1's on the diagonal, 0's everywhere else (e.g. the 2 × 2, 3 × 3, and 4 × 4 identity matrices). For any matrix A,
  A · I = I · A = A.
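The competing-hypotheses example can be sketched with NumPy. The three hypotheses below are illustrative choices, not taken from the slides; what matters is the shape of the computation, one matrix-matrix product giving every prediction at once:

```python
import numpy as np

# Predicting house prices for several competing hypotheses at once.
# Each hypothesis h(x) = theta0 + theta1*x becomes one column of Theta;
# prepending a column of 1's to the sizes turns prediction into a single
# matrix-matrix product.
sizes = np.array([2104, 1416, 1534, 852])
X = np.column_stack([np.ones_like(sizes), sizes])   # 4 x 2 data matrix

Theta = np.array([[-40.0, 200.0, -150.0],    # theta0 of each hypothesis
                  [0.25,    0.1,    0.4]])   # theta1 of each hypothesis

predictions = X @ Theta   # 4 x 3: one column of predictions per hypothesis
print(predictions)

# Matrix multiplication is not commutative in general:
A = np.array([[1, 1], [0, 0]])
B = np.array([[0, 0], [2, 0]])
print(np.array_equal(A @ B, B @ A))   # False

# The identity matrix does commute: A @ I == I @ A == A
I = np.eye(2)
assert np.array_equal(A @ I, A) and np.array_equal(I @ A, A)
```

For the first house (2104 feet²), the first hypothesis predicts −40 + 0.25 · 2104 = 486, i.e. $486,000.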
Matrix inverse: A⁻¹
If A is an m × m matrix, and if it has an inverse, then
  A A⁻¹ = A⁻¹ A = I.
Matrices that don't have an inverse are "singular" or "degenerate".

Matrix transpose
Example: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, and Bij = Aji.

What's next?
We are not in univariate regression anymore: next time, linear regression with multiple input variables (size, number of bedrooms, number of floors, age of home), using the linear algebra reviewed above.
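The inverse and transpose can likewise be checked with NumPy. A short sketch (the matrices are illustrative):

```python
import numpy as np

# Matrix inverse: A @ inv(A) gives the identity (up to floating-point error).
A = np.array([[3.0, 4.0],
              [2.0, 16.0]])
A_inv = np.linalg.inv(A)            # A is square and non-singular
print(np.round(A @ A_inv, 10))     # identity matrix, up to rounding

# A singular ("degenerate") matrix has no inverse:
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row is twice the first
print(np.linalg.det(S))            # 0: np.linalg.inv(S) raises LinAlgError

# Transpose: an m x n matrix becomes n x m, with (A^T)_ij = A_ji.
B = np.array([[1, 2, 0],
              [3, 5, 9]])
print(B.T.shape)                   # (3, 2)
assert B.T[2, 1] == B[1, 2]
```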