Linear regression
By gradient descent
(with thanks to Prof. Ng’s
machine learning course)
Extending the single-variable case to multivariate linear regression
h_Θ(x) = Θ_0 + Θ_1 x
h_Θ(x) = Θ_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n
e.g. start with house prices versus sq ft, then move to house prices
versus sq ft, number of bedrooms, age of house
h_Θ(x) = Θ_0 x_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n
with x_0 = 1
h_Θ(x) = Θ^T x
Cost function
J(Θ) = (1/(2m)) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i))^2
Gradient descent:
Repeat {
    Θ_j := Θ_j − α ∂J(Θ)/∂Θ_j
} updating all j simultaneously
Taking the partial derivative gives
Θ_j := Θ_j − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_j^(i)
so, written out for the first few parameters:
Θ_0 := Θ_0 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_0^(i)     (x_0^(i) = 1)
Θ_1 := Θ_1 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_1^(i)
Θ_2 := Θ_2 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_2^(i)
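In vectorized form, with X the m×(n+1) design matrix whose rows are the x^(i), all of these updates collapse into one step (this is the form the Matlab code below implements):
Θ := Θ − (α/m) X^T (XΘ − y)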
What the Equations Mean
The matrices:
y (PRICE)      x (x0  SQFT  AGE  FEATS)
 2050              1  2650   13    7
 2150              1  2664    6    5
 2150              1  2921    3    6
 1999              1  2580    4    4
 1900              1  2580    4    4
 1800              1  2774    2    4
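As a sketch of how those matrices are used in Matlab (theta here is a hypothetical starting value, not a fitted result):
X = [1 2650 13 7;
     1 2664  6 5;
     1 2921  3 6;
     1 2580  4 4;
     1 2580  4 4;
     1 2774  2 4];
y = [2050; 2150; 2150; 1999; 1900; 1800];
theta = zeros(4, 1);    % hypothetical initial theta
h = X * theta;          % evaluates h_theta(x^(i)) for all six examples at once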
Feature Scaling
Would like all features to fall roughly into the range −1 ≤ x_i ≤ +1
Replace x_i with (x_i − µ_i)/s_i, where µ_i is the mean and s_i is the range;
alternatively, use the mean and standard deviation
Don't scale x_0
Converting results back: any new input must be scaled with the same µ_i and s_i before predicting, as sketched below
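A minimal sketch, assuming mu, sigma, and theta come from the featureNormalize and gradientDescentMulti functions given below, and x_new is a raw (unscaled) feature row:
x_new = [2700 5 4];                % hypothetical SQFT, AGE, FEATS
x_scaled = (x_new - mu) ./ sigma;  % same mu and sigma as the training set
price = [1, x_scaled] * theta;     % prepend x0 = 1, which is never scaled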
Learning Rate and Debugging
With small enough α, J should decrease on each iteration: this is the first test. An α too
large could have you going past the minimum and climbing the other side of the curve.
With α too small, convergence is too slow.
Try a series of α values, say 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, … (see the sweep sketched below)
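One way to run that sweep (a sketch, assuming gradientDescentMulti as defined below and an already-normalized design matrix X):
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
for k = 1:length(alphas)
    theta0 = zeros(size(X, 2), 1);
    [~, J_history] = gradientDescentMulti(X, y, theta0, alphas(k), 50);
    plot(1:50, J_history); hold on;   % J should fall steadily for a good alpha
end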
Matlab Implementation
Feature Normalization
function [X_norm, mu, sigma] = featureNormalize(X)
% Scale every column of X to zero mean and unit standard deviation.
mu = mean(X);             % 1 x n row vector of column means
sigma = std(X);           % 1 x n row vector of column standard deviations
m = size(X, 1);           % number of training examples
A = repmat(mu, m, 1);     % tile mu to match the shape of X
X_norm = X - A;
A = repmat(sigma, m, 1);  % tile sigma the same way
X_norm = X_norm ./ A;
end
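For example, applied so that the constant column is left alone (column indices here assume the design matrix from the table above):
[Xs, mu, sigma] = featureNormalize(X(:, 2:end));  % scale everything except x0
X_norm = [ones(size(X, 1), 1), Xs];               % re-attach the x0 = 1 column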
Gradient Descent
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y);                         % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    A = X*theta - y;                   % residuals h_theta(x) - y, m x 1
    theta = theta - (alpha/m)*(X'*A);  % simultaneous update of every theta_j
    J_history(iter) = computeCostMulti(X, y, theta);  % record J for debugging
end
end
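Putting the pieces together (alpha = 0.1 and 400 iterations are hypothetical choices, to be tuned as described above):
theta = zeros(size(X_norm, 2), 1);
[theta, J_history] = gradientDescentMulti(X_norm, y, theta, 0.1, 400);
plot(J_history);   % debugging check: the curve should decrease on every iteration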
Cost Function
function J = computeCostMulti(X, y, theta)
m = length(y);           % number of training examples
A = X*theta - y;         % residuals, m x 1
J = (1/(2*m))*(A'*A);    % A'*A is the sum of squared residuals
end
Polynomials
h_Θ(x) = Θ_0 + Θ_1 x + Θ_2 x^2 + Θ_3 x^3
Treat each power as its own feature: replace x with x_1, x^2 with x_2, x^3 with x_3
Scale the x, x^2, x^3 values (their ranges differ enormously), e.g. as sketched below
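A sketch of building and scaling cubic features with the featureNormalize function above (x is assumed to be a column vector):
X_poly = [ones(length(x), 1), x, x.^2, x.^3];      % features 1, x, x^2, x^3
[Xs, mu, sigma] = featureNormalize(X_poly(:, 2:end));
X_poly = [X_poly(:, 1), Xs];                       % x0 stays unscaled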
Normal Equations
Θ = (A^T A)^(-1) A^T y — solved directly, with no iterations and no feature scaling needed
In Matlab, for a degree-n polynomial:
A(:, n+1) = ones(length(x), 1, class(x));  % constant column (x^0)
for j = n:-1:1
    A(:, j) = x .* A(:, j+1);              % column j holds x.^(n+1-j)
end
W = A'*A;
Y = A'*y;
theta = W\Y;   % solve W*theta = Y; coefficients come out highest degree first
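As a sanity check (a sketch; Matlab's polyfit solves the same least-squares problem and also orders its coefficients highest degree first, matching the columns of A here):
p = polyfit(x, y, n);   % built-in fit for comparison
theta_check = p';       % should agree with theta up to rounding error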