4: Regression (continued)
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University
Course website:
www.csc.villanova.edu/~map/4510/
The slides in this presentation are adapted from:
• The Stanford online ML course http://www.ml-class.org/
CSC 4510 - M.A. Papalaskari - Villanova University
Last time
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: Simple example using Excel
Today
• How to apply gradient descent to minimize the cost function for regression
• linear algebra refresher
Reminder: sample problem
[Figure: Housing Prices (Portland, OR), plot of Price (in 1000s of dollars, 0 to 500) against Size (feet², 0 to 3000)]
Reminder: Notation
Training set of housing prices (Portland, OR)

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
Reminder: Learning algorithm for hypothesis function h
Training Set → Learning Algorithm → h
Size of house (x) → h → Estimated price (y)
Linear hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (univariate linear regression)
Gradient descent algorithm
Linear Regression Model
Today
• How to apply gradient descent to minimize the cost function for regression
1. a closer look at the cost function
2. applying gradient descent to find the minimum of the cost function
• linear algebra refresher
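Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $(x^{(i)}, y^{(i)})$ is the ith training example
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$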
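A minimal Python sketch of the cost function, assuming the squared-error form $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$ used in the Stanford course. The function name `cost` is hypothetical; the data are the four housing examples from the Notation slide, and the parameter pairs evaluated are arbitrary.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x  # hypothesis h_theta(x) = theta0 + theta1 * x
        total += (h - y) ** 2    # squared error for this training example
    return total / (2 * m)

# Housing training set: size in feet^2 (x) and price in $1000s (y).
sizes = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]

print(cost(0, 0, sizes, prices))     # 49541.625
print(cost(10, 0.1, sizes, prices))  # approximately 11713.84
```

Smaller values of J mean the line fits the training set better, which is why the goal is to minimize J over the parameters.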
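Simplified: $\theta_0 = 0$
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameters: $\theta_1$
Cost Function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$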
θ0 = 0
$h_\theta(x)$: for fixed θ1, this is a function of x
$J(\theta_1)$: function of the parameter θ1
[Figure: y against x (0 to 3) with the line $h_\theta(x) = x$; $J(\theta_1)$ against θ1 (−0.5 to 2.5)]
θ0 = 0
$h_\theta(x)$: for fixed θ1, this is a function of x
$J(\theta_1)$: function of the parameter θ1
[Figure: y against x (0 to 3) with the line $h_\theta(x) = 0.5x$; plot of $J(\theta_1)$]
θ0 = 0
$h_\theta(x)$: for fixed θ1, this is a function of x
$J(\theta_1)$: function of the parameter θ1
[Figure: y against x (0 to 3) with the line $h_\theta(x) = 0$; plot of $J(\theta_1)$]
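Each choice of θ1 gives one line $h_\theta(x) = \theta_1 x$ and one value of $J(\theta_1)$. The slides' actual data points are not recoverable from this transcript, so this sketch assumes a hypothetical training set of three points lying exactly on y = x, and evaluates J for the three lines plotted on the preceding slides ($h_\theta(x) = x$, $0.5x$, and $0$). The function name `cost_theta1` is hypothetical.

```python
def cost_theta1(theta1, xs, ys):
    """J(theta1) for the simplified hypothesis h(x) = theta1 * x (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical toy data: three points on the line y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]

for theta1 in (1, 0.5, 0):
    print(theta1, round(cost_theta1(theta1, xs, ys), 3))
# 1 0.0
# 0.5 0.583
# 0 2.333
```

With this data the line through every point (θ1 = 1) gives J = 0, the minimum of the curve.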
What if θ0 ≠ 0?
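Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$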
$h_\theta(x)$: for fixed θ0, θ1, this is a function of x
$J(\theta_0, \theta_1)$: function of the parameters θ0, θ1
[Figure: Price ($) in 1000’s (0 to 500) against Size in feet² (x) (0 to 3000), with the line $h_\theta(x) = 10 + 0.1x$]
$h_\theta(x)$: for fixed θ0, θ1, this is a function of x
$J(\theta_0, \theta_1)$: function of the parameters θ0, θ1
[Figures (four slides): each pairs $h_\theta(x)$ for one fixed choice of θ0, θ1 with a plot of $J(\theta_0, \theta_1)$]
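Every point (θ0, θ1) in parameter space corresponds to one line through the data and one value of J. A rough numerical version of that idea, assuming the squared-error cost with the 1/(2m) factor and a hypothetical, deliberately coarse grid of parameter values, evaluates J at each grid point on the housing data and keeps the smallest.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) with the 1/(2m) factor."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

sizes = [2104, 1416, 1534, 852]   # feet^2
prices = [460, 232, 315, 178]     # $1000s

# Hypothetical coarse grid of parameter values.
theta0_values = [-50, 0, 50]
theta1_values = [0.1, 0.2, 0.3]

# One J value per (theta0, theta1) pair.
grid = {(t0, t1): cost(t0, t1, sizes, prices)
        for t0 in theta0_values for t1 in theta1_values}

best = min(grid, key=grid.get)  # grid point with the smallest cost
print(best)  # (0, 0.2)
```

Exhaustive grids scale badly as the number of parameters grows, which is one motivation for gradient descent.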
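Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Gradient descent algorithm outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum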
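Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Gradient descent algorithm:
repeat until convergence {
    $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for j = 0 and j = 1)
}
α is the learning rate.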
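A minimal sketch of the update rule θ := θ − α · dJ/dθ for a single parameter, assuming a hypothetical cost $J(\theta) = (\theta - 3)^2$ whose derivative is $2(\theta - 3)$. A fixed iteration count stands in for "repeat until convergence"; the function name `gradient_descent` and the values of α and the start point are illustrative choices.

```python
def gradient_descent(dJ, theta, alpha, num_iters):
    """Repeat theta := theta - alpha * dJ(theta) for a fixed number of iterations."""
    for _ in range(num_iters):
        theta = theta - alpha * dJ(theta)
    return theta

def dJ(theta):
    """Derivative of the hypothetical cost J(theta) = (theta - 3)**2."""
    return 2 * (theta - 3)

theta_min = gradient_descent(dJ, theta=0.0, alpha=0.1, num_iters=100)
print(round(theta_min, 4))  # 3.0
```

With two parameters the same rule is applied to θ0 and θ1, using the partial derivative with respect to each.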
If α is too small, gradient
descent can be slow.
If α is too large, gradient descent
can overshoot the minimum. It
may fail to converge, or even
diverge.
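The effect of α can be checked on a hypothetical cost $J(\theta) = \theta^2$ (derivative 2θ), where each update multiplies θ by (1 − 2α). This sketch runs 20 steps from θ = 1 with three illustrative learning rates: one too small, one moderate, and one too large.

```python
def gradient_descent_on_square(alpha, theta=1.0, num_iters=20):
    """Gradient descent on the hypothetical cost J(theta) = theta**2 (derivative 2*theta)."""
    for _ in range(num_iters):
        theta = theta - alpha * 2 * theta
    return theta

for alpha in (0.01, 0.4, 1.1):
    print(alpha, gradient_descent_on_square(alpha))
# alpha = 0.01: about 0.67 after 20 steps (slow)
# alpha = 0.4:  about 1e-14 (close to the minimum at 0)
# alpha = 1.1:  about 38.3 (each step overshoots 0 and |theta| grows: divergence)
```

For this particular J, any α above 1 makes |1 − 2α| exceed 1, so the iterates move further from the minimum at every step.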
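If θ is already at a local minimum, the derivative of J there is zero, so the update $\theta := \theta - \alpha \cdot 0$ leaves the current value of θ unchanged.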
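Gradient descent can converge to a local minimum even with the learning rate α fixed. As we approach a local minimum, the derivative gets smaller, so gradient descent automatically takes smaller steps; there is no need to decrease α over time.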
Gradient descent algorithm
Linear Regression Model
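Gradient descent algorithm (linear regression, $h_\theta(x) = \theta_0 + \theta_1 x$):
repeat until convergence {
    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
update θ0 and θ1 simultaneously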
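A minimal sketch of batch gradient descent for univariate linear regression. It assumes the squared-error cost with the 1/(2m) factor, whose partial derivatives are $\frac{1}{m}\sum (h_\theta(x^{(i)}) - y^{(i)})$ for θ0 and $\frac{1}{m}\sum (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}$ for θ1. The toy data (points on y = x), α, and the iteration count are hypothetical. Both gradients are computed from the current parameters before either parameter changes, which is what "simultaneously" requires.

```python
def batch_gradient_descent(xs, ys, alpha, num_iters, theta0=0.0, theta1=0.0):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    for _ in range(num_iters):
        # Errors h(x) - y for every training example, using the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both temporaries come from the old theta0, theta1.
        temp0 = theta0 - alpha * grad0
        temp1 = theta1 - alpha * grad1
        theta0, theta1 = temp0, temp1
    return theta0, theta1

# Hypothetical toy data on the line y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]
theta0, theta1 = batch_gradient_descent(xs, ys, alpha=0.1, num_iters=2000)
print(theta0, theta1)  # close to 0 and 1
```

Updating θ0 first and then using the new θ0 to compute θ1's gradient would be a different algorithm from the one on the slide.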
J(0,1)
1
0
$h_\theta(x)$: for fixed θ0, θ1, this is a function of x
$J(\theta_0, \theta_1)$: function of the parameters θ0, θ1
[Figures (nine slides): each pairs $h_\theta(x)$ for one fixed choice of θ0, θ1 with a plot of $J(\theta_0, \theta_1)$]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
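The slide does not name the alternative; one common form is mini-batch gradient descent, where each update uses only a small subset of the training examples (with a batch size of 1 it is usually called stochastic gradient descent). A minimal sketch under stated assumptions: the function name, batch size, seed, learning rate, and the toy data on y = 2x + 1 are all hypothetical.

```python
import random

def minibatch_gradient_descent(xs, ys, alpha, num_epochs, batch_size, seed=0):
    """Linear-regression gradient descent that uses one mini-batch per update."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    indices = list(range(len(xs)))
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_epochs):
        rng.shuffle(indices)   # visit the examples in a new order on each pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [theta0 + theta1 * xs[i] - ys[i] for i in batch]
            grad0 = sum(errors) / len(batch)
            grad1 = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            # Simultaneous update, as in the batch version.
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
theta0, theta1 = minibatch_gradient_descent(xs, ys, alpha=0.2, num_epochs=2000, batch_size=2)
print(theta0, theta1)  # close to 1 and 2
```

Each update is cheaper than a full batch step, at the price of noisier steps when the data do not fit the line exactly.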
What’s next? We are not in univariate regression anymore:
    Size (feet²)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000)
1   2104          5                   1                 45                   460
1   1416          3                   2                 40                   232
1   1534          3                   2                 30                   315
1   852           2                   1                 36                   178
Linear Algebra Review
Matrix: Rectangular array of numbers
Matrix Elements (entries of matrix): $A_{ij}$ = “i, j entry” in the ith row, jth column
Dimension of matrix: number of rows x number of columns, e.g. 4 x 2
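A small sketch of the two definitions above, storing a matrix as a Python list of rows. The helper names `dimension` and `entry` and the 4 x 2 example values are hypothetical; `entry` follows the 1-indexed math convention for $A_{ij}$, while Python itself indexes from 0.

```python
def dimension(matrix):
    """Return (number of rows, number of columns) of a rectangular list-of-rows matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("not a rectangular array")
    return rows, cols

def entry(matrix, i, j):
    """A_ij with 1-indexed i (row) and j (column), as in the math notation."""
    return matrix[i - 1][j - 1]

A = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]  # hypothetical 4 x 2 matrix

print(dimension(A))    # (4, 2)
print(entry(A, 3, 2))  # 6
```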
Another Example:
Representing communication links in a network
[Figure: two network diagrams on nodes a, b, c, d, e]
Adjacency matrix
a b c d e
a 0 1 2 0 3
b 1 0 0 0 0
c 2 0 0 1 1
d 0 0 1 0 1
e 3 0 1 1 0
Adjacency matrix
a b c d e
a 0 1 0 0 2
b 0 1 0 0 0
c 1 0 0 1 0
d 0 0 1 0 1
e 0 0 0 0 0
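The first of the two adjacency matrices is symmetric: the entry for (u, v) equals the entry for (v, u). This sketch rebuilds that matrix from a list of links read off it, assuming the links are undirected and that each entry is the number (or weight) of links between two nodes; the edge-list representation is an illustration, not something taken from the slides.

```python
nodes = ["a", "b", "c", "d", "e"]
# Links read off the first adjacency matrix: (node, node, entry value).
links = [("a", "b", 1), ("a", "c", 2), ("a", "e", 3),
         ("c", "d", 1), ("c", "e", 1), ("d", "e", 1)]

index = {name: k for k, name in enumerate(nodes)}  # node name -> row/column position
n = len(nodes)
adj = [[0] * n for _ in range(n)]
for u, v, count in links:
    # An undirected link is recorded in both (u, v) and (v, u).
    adj[index[u]][index[v]] = count
    adj[index[v]][index[u]] = count

print(" ", *nodes)
for name, row in zip(nodes, adj):
    print(name, *row)
```

For a directed network, only the (u, v) entry would be set for each link, which in general gives a non-symmetric matrix.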
Vector: An n x 1 matrix.
$y_i$ = ith element
1-indexed vs 0-indexed: elements numbered $y_1, \dots, y_n$ or $y_0, \dots, y_{n-1}$
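In code the 1-indexed vs 0-indexed distinction shows up directly: math notation writes $y_1, \dots, y_n$, while Python lists start at index 0. A short sketch, using the four prices from the housing training set as the vector; the helper name `element` is hypothetical.

```python
# A 4-dimensional vector written as a 4 x 1 matrix (a list of one-element rows).
y_column = [[460], [232], [315], [178]]
y = [row[0] for row in y_column]  # the same vector as a flat Python list
n = len(y)

def element(vec, i):
    """y_i in the 1-indexed math convention (Python lists are 0-indexed)."""
    return vec[i - 1]

print(element(y, 1), y[0])      # 460 460  (first element)
print(element(y, n), y[n - 1])  # 178 178  (last element)
```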
Matrix Addition
Scalar Multiplication
Combination of Operands
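The worked examples for matrix addition, scalar multiplication, and their combination are not recoverable from this transcript. A minimal sketch of the three operations on lists of rows, with hypothetical matrices and coefficients: addition is element-wise and needs equal dimensions, and scalar multiplication scales every entry.

```python
def mat_add(A, B):
    """Element-wise sum; A and B must have the same dimension."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimension")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

A = [[1, 0], [2, 5], [3, 1]]    # hypothetical 3 x 2 matrices
B = [[4, 0.5], [2, 5], [0, 1]]

print(mat_add(A, B))     # [[5, 0.5], [4, 10], [3, 2]]
print(scalar_mul(3, A))  # [[3, 0], [6, 15], [9, 3]]

# A combination of operands, e.g. 3*A + 0.5*B (hypothetical coefficients).
combo = mat_add(scalar_mul(3, A), scalar_mul(0.5, B))
print(combo)
```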
Matrix-vector multiplication
Details: A (m x n matrix: m rows, n columns) times x (n x 1 matrix: n-dimensional vector) gives y (m-dimensional vector).
To get $y_i$, multiply A’s ith row with elements of vector x, and add them up: $y_i = \sum_{j=1}^{n} A_{ij} x_j$.
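The row-times-vector rule translates directly into code. A minimal sketch with a hypothetical 3 x 2 matrix and 2-dimensional vector; the function name `mat_vec` is ours.

```python
def mat_vec(A, x):
    """y = A x for an m x n matrix A (list of rows) and an n-dimensional vector x."""
    n = len(x)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal the length of x")
    # y_i = sum over j of A_ij * x_j: multiply row i by x element-wise and add up.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 3],
     [4, 0],
     [2, 1]]  # hypothetical 3 x 2 matrix
x = [1, 5]    # hypothetical 2-dimensional vector

print(mat_vec(A, x))  # [16, 4, 7]
```

The result has one entry per row of A, so a 3 x 2 matrix times a 2-vector gives a 3-vector.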
Example
House sizes:
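Matrix-vector multiplication lets one line of code evaluate a hypothesis on every house at once: put a column of 1s next to the sizes and multiply by the parameter vector [θ0, θ1]. The hypothesis on the original slide is not recoverable from this transcript, so this sketch assumes a hypothetical $h_\theta(x) = -40 + 0.25x$ and uses the four sizes from the training set.

```python
sizes = [2104, 1416, 1534, 852]  # feet^2, from the training set
theta = [-40, 0.25]              # hypothetical [theta0, theta1]

# Data matrix: a 1 (which multiplies theta0) next to each house size.
X = [[1, size] for size in sizes]

# predictions = X * theta, i.e. h_theta(x) = theta0 * 1 + theta1 * x for each house.
predictions = [sum(a * t for a, t in zip(row, theta)) for row in X]
print(predictions)  # [486.0, 314.0, 343.5, 173.0]
```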
Example matrix-matrix multiplication
Details: A (m x k matrix: m rows, k columns) times B (k x n matrix: k rows, n columns) gives C (m x n matrix).
The ith column of the matrix C is obtained by multiplying A with the ith column of B (for i = 1, 2, … , n).
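The column-by-column rule can be coded literally: take each column of B, multiply A by it as a matrix-vector product, and stack the resulting columns into C. A minimal sketch with hypothetical 2 x 3 and 3 x 2 matrices; the function names are ours.

```python
def mat_vec(A, x):
    """Matrix-vector product: one dot product per row of A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_mul(A, B):
    """C = A B for an m x k matrix A and a k x n matrix B, built one column of C at a time."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    columns = []
    for i in range(n):
        b_col = [B[r][i] for r in range(k)]  # ith column of B
        columns.append(mat_vec(A, b_col))     # ith column of C = A times ith column of B
    # Reassemble the columns into a list of rows (m x n).
    return [list(row) for row in zip(*columns)]

A = [[1, 3, 2],
     [4, 0, 1]]   # hypothetical 2 x 3
B = [[1, 3],
     [0, 1],
     [5, 2]]      # hypothetical 3 x 2

print(mat_mul(A, B))  # [[11, 10], [9, 14]]
```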
Example: Matrix-matrix multiplication
House sizes:
Have 3 competing hypotheses:
1.
2.
3.
Matrix
Matrix
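With several hypotheses, a single matrix-matrix product evaluates all of them on all houses: the data matrix (a column of 1s plus the sizes) times a matrix whose jth column holds hypothesis j's parameters. The three hypotheses on the slide are not recoverable here, so this sketch assumes three hypothetical (θ0, θ1) pairs.

```python
sizes = [2104, 1416, 1534, 852]  # feet^2, from the training set
# Hypothetical competing hypotheses, each given as (theta0, theta1).
hypotheses = [(-40, 0.25), (200, 0.1), (-150, 0.4)]

X = [[1, s] for s in sizes]                   # 4 x 2 data matrix
Theta = [[t0 for t0, _ in hypotheses],        # 2 x 3 parameter matrix:
         [t1 for _, t1 in hypotheses]]        # column j holds hypothesis j

# Entry (i, j) of the 4 x 3 product is hypothesis j's prediction for house i.
P = [[sum(X[i][r] * Theta[r][j] for r in range(2)) for j in range(3)]
     for i in range(len(X))]
for row in P:
    print(row)
```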
Let A and B be matrices. Then in general, $A \times B \neq B \times A$ (not commutative).
E.g.
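A quick check that the order of the factors matters, using two hypothetical 2 x 2 matrices. Even when both products are defined, they can differ; and for an m x n matrix A and an n x m matrix B, AB is m x m while BA is n x n.

```python
def mat_mul(A, B):
    """Product of an m x k and a k x n matrix, both stored as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 1],
     [0, 0]]
B = [[0, 0],
     [2, 0]]

print(mat_mul(A, B))  # [[2, 0], [0, 0]]
print(mat_mul(B, A))  # [[0, 0], [2, 2]]
```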
Let
Compute
Let
Compute
Identity Matrix
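Denoted I (or $I_{n \times n}$ or $I_n$).
Examples of identity matrices:
2x2: $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
3x3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
4x4: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
For any matrix A, $A \cdot I = I \cdot A = A$ (for an m x n matrix A this reads $A I_n = I_m A = A$).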
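A short sketch of the identity matrix and its defining property. It uses a hypothetical non-square 2 x 3 matrix to show that multiplying on the left needs $I_2$ and on the right needs $I_3$.

```python
def identity(n):
    """n x n identity matrix: 1s on the diagonal, 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Product of an m x k and a k x n matrix, both stored as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]  # hypothetical 2 x 3 matrix

print(identity(2))                   # [[1, 0], [0, 1]]
print(mat_mul(identity(2), A) == A)  # True: I_2 A = A
print(mat_mul(A, identity(3)) == A)  # True: A I_3 = A
```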
Matrix inverse: $A^{-1}$
If A is an m x m matrix, and if it has an inverse, then $A A^{-1} = A^{-1} A = I$.
Matrices that don’t have an inverse are “singular” or “degenerate”.
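For a 2 x 2 matrix the inverse has a closed form, $\frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, and the matrix is singular exactly when $ad - bc = 0$. A minimal sketch using exact fractions so the check against I is not disturbed by rounding; the example matrices are hypothetical, and for larger matrices one would normally call a library routine such as NumPy's `numpy.linalg.inv`.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of an m x k and a k x n matrix, both stored as lists of rows."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the closed-form formula; raises if A is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular (degenerate) matrix: no inverse")
    f = Fraction(1) / det  # exact rational arithmetic avoids rounding error
    return [[d * f, -b * f], [-c * f, a * f]]

A = [[3, 4],
     [2, 16]]  # hypothetical invertible matrix (determinant 40)
A_inv = inverse_2x2(A)
print(mat_mul(A, A_inv) == [[1, 0], [0, 1]])  # True
print(mat_mul(A_inv, A) == [[1, 0], [0, 1]])  # True
```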
Matrix Transpose
Example:
Let A be an m x n matrix, and let $B = A^T$. Then B is an n x m matrix, and $B_{ij} = A_{ji}$.
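In Python the transpose of a list-of-rows matrix is a one-liner with `zip(*A)`, which groups the ith entries of every row into the ith row of the result. A minimal sketch with a hypothetical 2 x 3 matrix.

```python
def transpose(A):
    """B = A^T, so B[i][j] = A[j][i]; an m x n input gives an n x m result."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0],
     [3, 5, 9]]  # hypothetical 2 x 3 matrix
B = transpose(A)
print(B)  # [[1, 3], [2, 5], [0, 9]]
```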