Method of Least Squares
Least Squares

Method of Least Squares:
 A deterministic approach

 The inputs u(1), u(2), ..., u(N) are applied to the system
 The outputs y(1), y(2), ..., y(N) are observed
 Find a model f(n, u(n)) which fits the input-output relation to a (linear?) curve
 The 'best' fit is obtained by minimising the sum of the squares of the differences f - y

[Figure: scatter plot of the observed input-output data with the fitted least-squares line; both axes span 0 to 50]
Least Squares

The curve fitting problem can be formulated as
  y(n) = f(n, u(n)) + e(n)
where f is the model, u(n) is the variable, and y(n) are the observations.

Error:
  e(n) = y(n) - f(n, u(n))
Sum of error squares:
  E = Σ_{n=1}^{N} |e(n)|²

The minimum (least squares of error) is achieved when the gradient of E with respect to the model parameters is zero.
Problem Statement

For the inputs to the system, u(i), the observed desired response is d(i).

The relation is assumed to be linear:
  d(i) = Σ_{k=0}^{M-1} w_k* u(i-k) + e_o(i)

where e_o(i) is an unobservable measurement error which is
 zero mean,
 white.
Problem Statement

Design a transversal filter whose tap weights w_0, ..., w_{M-1} give the least-squares solution, i.e., minimise the sum of error squares
  E = Σ_{i=i1}^{i2} |e(i)|²,   e(i) = d(i) - Σ_{k=0}^{M-1} w_k* u(i-k)
Data Windowing

We will express the input in matrix form. Depending on the limits i1 and i2, this matrix changes:

 Covariance method:       i1 = M,  i2 = N
 Postwindowing method:    i1 = M,  i2 = N+M-1
 Autocorrelation method:  i1 = 1,  i2 = N+M-1
 Prewindowing method:     i1 = 1,  i2 = N
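A small sketch of how these four windowing choices shape the data matrix, assuming 1-based sample indexing and zero padding outside 1..N as on the slide (the function name and test signal are illustrative):

```python
import numpy as np

def data_matrix(u, M, method="covariance"):
    """Build the least-squares data matrix A for an M-tap transversal filter.

    Rows are [u(i), u(i-1), ..., u(i-M+1)] for i = i1..i2, with u taken as
    zero outside 1..N (1-based indexing, as on the slide).
    """
    N = len(u)
    i1, i2 = {"covariance":      (M, N),
              "postwindowing":   (M, N + M - 1),
              "autocorrelation": (1, N + M - 1),
              "prewindowing":    (1, N)}[method]
    def u_at(i):                       # u(i) with zero padding outside 1..N
        return u[i - 1] if 1 <= i <= N else 0.0
    return np.array([[u_at(i - k) for k in range(M)]
                     for i in range(i1, i2 + 1)])

u = np.array([1.0, 2.0, 3.0, 4.0])
print(data_matrix(u, M=2, method="covariance"))       # (N-M+1) x M = 3 x 2
print(data_matrix(u, M=2, method="autocorrelation"))  # (N+M-1) x M = 5 x 2
```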
Principle of Orthogonality

Error signal:
  e(i) = d(i) - Σ_{k=0}^{M-1} w_k* u(i-k)

Least squares (the minimum of the sum of squares) is achieved when
  ∂E/∂w_k = 0,   k = 0, 1, ..., M-1
i.e., when
  Σ_{i=i1}^{i2} u(i-k) e_min*(i) = 0,   k = 0, 1, ..., M-1

!Time averaging!
(For Wiener filtering this was an ensemble average.)

The minimum-error time series e_min(i) is orthogonal to the time series of the input u(i-k) applied to tap k of a transversal filter of length M, for k = 0, 1, ..., M-1, when the filter is operating in its least-squares condition.
Corollary of Principle of Orthogonality

The LS estimate of the desired response is
  d̂(i) = Σ_{k=0}^{M-1} ŵ_k* u(i-k)

Multiply the principle of orthogonality by ŵ_k* and take the summation over k; then
  Σ_{i=i1}^{i2} d̂(i) e_min*(i) = 0

When a transversal filter operates in its least-squares condition, the least-squares estimate of the desired response (produced at the output of the filter) and the minimum estimation error time series are orthogonal to each other over time i.
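Both statements are easy to verify numerically. The sketch below uses a randomly generated data matrix and desired response (purely illustrative) and checks that every tap-input series, and the filter output, are orthogonal to the minimum-error series:

```python
import numpy as np

# Hypothetical data matrix A (rows: delayed inputs) and desired response d.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))            # K = 50 snapshots, M = 4 taps
d = rng.normal(size=50)

w_hat = np.linalg.solve(A.conj().T @ A, A.conj().T @ d)  # LS tap weights
e_min = d - A @ w_hat                                    # minimum error series

# Principle of orthogonality: each tap-input series is orthogonal to e_min.
print(A.conj().T @ e_min)               # ~0 for every tap k
# Corollary: the filter output (LS estimate of d) is orthogonal to e_min.
print(np.vdot(A @ w_hat, e_min))        # ~0
```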
Energy of Minimum Error

Write d(i) = d̂(i) + e_min(i) and expand the energy of d(i). Due to the principle of orthogonality, the second and third (cross) terms vanish, hence
  E_min = E_d - E_est
where
  E_d = Σ_{i=i1}^{i2} |d(i)|²,   E_est = Σ_{i=i1}^{i2} |d̂(i)|²

 E_min = 0 when e_o(i) = 0 for all i: impossible in practice.
 E_min = 0 when the problem is underdetermined: with fewer data points than parameters there are infinitely many solutions (no unique solution)!
Normal Equations

Substituting the minimum error
  e_min(i) = d(i) - Σ_{t=0}^{M-1} ŵ_t* u(i-t)
into the principle of orthogonality yields
  Σ_{t=0}^{M-1} ŵ_t φ(t,k) = z(-k),   k = 0, 1, ..., M-1
where
  φ(t,k) = Σ_{i=i1}^{i2} u(i-k) u*(i-t),   0 ≤ t, k ≤ M-1
is the time-average autocorrelation function of the input, and
  z(-k) = Σ_{i=i1}^{i2} u(i-k) d*(i),   0 ≤ k ≤ M-1
is the time-average cross-correlation between the desired response and the input.

Hence we obtain the expanded system of the normal equations for linear least-squares filters.
Normal Equations (Matrix Formulation)

Matrix form of the normal equations for linear least-squares filters:
  Φ ŵ = z   →   ŵ = Φ⁻¹ z   (if Φ⁻¹ exists!)

This is the linear least-squares counterpart of the Wiener-Hopf equations. Here Φ and z are time averages, whereas in the Wiener-Hopf equations they were ensemble averages.
Minimum Sum of Error Squares

The energy contained in the time series d̂(i) is
  E_est = Σ_{i=i1}^{i2} |d̂(i)|² = ŵ^H Φ ŵ
or, using the normal equations Φ ŵ = z,
  E_est = z^H ŵ = z^H Φ⁻¹ z

Then the minimum sum of error squares is
  E_min = E_d - E_est = E_d - z^H Φ⁻¹ z
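A short numerical sketch of the matrix normal equations and the resulting minimum sum of error squares, with randomly generated data standing in for the slide's unspecified signals:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 5))       # data matrix
d = rng.normal(size=100)            # desired response

Phi = A.conj().T @ A                # time-average correlation matrix
z = A.conj().T @ d                  # time-average cross-correlation vector

w_hat = np.linalg.solve(Phi, z)     # normal equations: Phi @ w = z
E_min = np.vdot(d, d) - np.vdot(z, w_hat)   # E_d - z^H w_hat
print(w_hat, E_min.real)
```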
Properties of the Time-Average Correlation Matrix Φ

Property I: The correlation matrix Φ is Hermitian symmetric, Φ^H = Φ.

Property II: The correlation matrix Φ is nonnegative definite: x^H Φ x ≥ 0 for every M-by-1 vector x.

Property III: The correlation matrix Φ is nonsingular if and only if det(Φ) is nonzero.

Property IV: The eigenvalues of the correlation matrix Φ are all real and nonnegative.
Properties of the Time-Average Correlation Matrix Φ

Property V: The correlation matrix Φ is the product of two rectangular Toeplitz matrices that are the Hermitian transpose of each other: Φ = A^H A, where A is the data matrix.
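Properties I, III and IV can be checked numerically. The sketch below uses a generic random complex data matrix rather than a true Toeplitz one, so it illustrates Φ = A^H A without the Toeplitz structure of Property V:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 4)) + 1j * rng.normal(size=(30, 4))
Phi = A.conj().T @ A                        # Property V form: A^H A

print(np.allclose(Phi, Phi.conj().T))       # Property I: Hermitian
print(np.linalg.eigvalsh(Phi))              # Property IV: real and >= 0
print(np.linalg.det(Phi) != 0)              # Property III: nonsingular iff det != 0
```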
Normal Equations (Reformulation)

But we know that Φ = A^H A and z = A^H d; then
  A^H A ŵ = A^H d
which yields
  ŵ = (A^H A)⁻¹ A^H d = A⁺ d
! Pseudo-inverse !

Substituting into the minimum sum of error squares expression gives
  E_min = d^H d - d^H A (A^H A)⁻¹ A^H d
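The following sketch confirms, on random full-column-rank data, that the normal-equations solution coincides with the pseudo-inverse solution computed by np.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(50, 4))
d = rng.normal(size=50)

w_normal = np.linalg.solve(A.T @ A, A.T @ d)   # (A^H A)^{-1} A^H d
w_pinv = np.linalg.pinv(A) @ d                 # A^+ d
print(np.allclose(w_normal, w_pinv))           # identical for full column rank
```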
Projection

The LS estimate of d is given by
  d̂ = A ŵ = A (A^H A)⁻¹ A^H d

The matrix
  P = A (A^H A)⁻¹ A^H
is a projection operator
 onto the linear space spanned by the columns of the data matrix A,
 i.e. the space Ui.

The orthogonal complement projector is
  I - A (A^H A)⁻¹ A^H
which maps d onto the minimum error: e_min = (I - P) d.
Projection - Example

M = 2 tap filter, N = 4 → K = N - M + 1 = 3, so A is 3x2 and d is 3x1.

Then the projector P = A (A^H A)⁻¹ A^H yields the LS estimate d̂ = P d,
and the error e_min = (I - P) d is
orthogonal to d̂.
Projection - Example

[Figure: geometric illustration of projecting d onto the column space of A]
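Since the original example's matrices did not survive, here is a sketch with hypothetical numbers of the same dimensions (M = 2, N = 4, so A is 3x2):

```python
import numpy as np

# Hypothetical data matrix and desired vector for a 2-tap, 4-sample example.
A = np.array([[2.0, 1.0],
              [3.0, 2.0],
              [4.0, 3.0]])
d = np.array([1.0, 2.0, 4.0])

P = A @ np.linalg.inv(A.conj().T @ A) @ A.conj().T   # projector onto span(A)
d_hat = P @ d                                        # LS estimate of d
e_min = (np.eye(3) - P) @ d                          # orthogonal complement

print(np.vdot(d_hat, e_min))   # ~0: estimate and error are orthogonal
print(np.allclose(P @ P, P))   # P is idempotent, as a projector must be
```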
Uniqueness of the LS Solution

LS always has a solution; is that solution unique?

The least-squares estimate ŵ is unique if and only if the nullity (the dimension of the null space) of the data matrix A equals zero.

Let A be KxM, with K = N - M + 1.

The solution is unique when A is of full column rank, K ≥ M:
 all columns of A are linearly independent,
 overdetermined system (more equations than variables (taps)),
 A^H A is nonsingular → (A^H A)⁻¹ exists and ŵ is unique.

There are infinitely many solutions when A has linearly dependent columns, K < M:
 A^H A is singular, so (A^H A)⁻¹ does not exist.
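A quick check of the rank/nullity condition on two small hypothetical data matrices, one of full column rank and one with linearly dependent columns:

```python
import numpy as np

A_full = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])            # K=3 > M=2, full column rank
A_defic = np.array([[1.0, 2.0],
                    [2.0, 4.0],
                    [3.0, 6.0]])           # second column = 2 * first column

for A in (A_full, A_defic):
    rank = np.linalg.matrix_rank(A)
    nullity = A.shape[1] - rank            # rank-nullity theorem
    print(f"rank={rank}, nullity={nullity}, unique LS solution: {nullity == 0}")
```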
Properties of the LS Estimates

Property I: The least-squares estimate ŵ is unbiased, provided that the measurement error process e_o(i) has zero mean.

Property II: When the measurement error process e_o(i) is white with zero mean and variance σ², the covariance matrix of the least-squares estimate ŵ equals σ²Φ⁻¹.

Property III: When the measurement error process e_o(i) is white with zero mean, the least-squares estimate ŵ is the best linear unbiased estimate (BLUE).

Property IV: When the measurement error process e_o(i) is white and Gaussian with zero mean, the least-squares estimate ŵ achieves the Cramer-Rao lower bound for unbiased estimates.
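Properties I and II can be demonstrated with a small Monte Carlo experiment; the data matrix, true weights, and noise variance below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(40, 3))               # fixed hypothetical data matrix
w_true = np.array([1.0, -0.5, 2.0])        # hypothetical true tap weights
sigma2 = 0.25                              # white-noise variance

estimates = []
for _ in range(5000):
    d = A @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=40)
    estimates.append(np.linalg.solve(A.T @ A, A.T @ d))
estimates = np.array(estimates)

print(estimates.mean(axis=0))              # ~w_true (Property I: unbiased)
print(np.cov(estimates.T))                 # ~sigma2 * Phi^{-1} (Property II)
print(sigma2 * np.linalg.inv(A.T @ A))
```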
Computation of the LS Estimates

The rank W of a KxN matrix A (K ≥ N or K < N) gives
 the number of linearly independent columns/rows,
 the number of nonzero singular values (equivalently, nonzero eigenvalues of A^H A).

The matrix is said to be full rank (full column or row rank) if W = min(K, N); otherwise, it is said to be rank-deficient.

Rank is an important parameter for matrix inversion:
 If K = N (square matrix) and the matrix is full rank (W = K = N, nonsingular), the inverse of the matrix can be calculated as A⁻¹ = adj(A)/det(A).
 If the matrix is not square (K ≠ N), and/or it is rank-deficient (singular), A⁻¹ does not exist; instead we can use the pseudo-inverse (a projection of the inverse), A⁺.
SVD

We can calculate the pseudo-inverse using the SVD.

Any KxN matrix A (K ≥ N or K < N) can be decomposed using the singular value decomposition (SVD) as follows:
  A = U Σ V^H
where U (KxK) and V (NxN) are unitary matrices and Σ (KxN) carries the singular values σ₁ ≥ σ₂ ≥ ... ≥ σ_W > 0 on its main diagonal.
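A sketch of the decomposition and of building A⁺ = V Σ⁺ U^H from it, checked against NumPy's built-in pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))                 # K x N, here K > N

U, s, Vh = np.linalg.svd(A)                 # A = U @ Sigma @ V^H
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vh))       # decomposition reproduces A

# Pseudo-inverse from the SVD: A^+ = V @ Sigma^+ @ U^H
Sigma_plus = np.zeros((3, 5))
Sigma_plus[:3, :3] = np.diag(1.0 / s)
A_plus = Vh.conj().T @ Sigma_plus @ U.conj().T
print(np.allclose(A_plus, np.linalg.pinv(A)))
```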
SVD

The system of equations A w = d
 is overdetermined if K > N (more equations than unknowns):
   unique solution (if A is full rank),
   non-unique, infinitely many solutions (if A is rank-deficient);
 is underdetermined if K < N (more unknowns than equations):
   non-unique, infinitely many solutions.

In either case the solution(s) is (are)
  w = A⁺ d
where
  A⁺ = V Σ⁺ U^H,   Σ⁺ = diag(σ₁⁻¹, ..., σ_W⁻¹, 0, ..., 0)
Computation of the LS Estimates

Find the solution of A ŵ = d (A: KxM).

If K > M and rank(A) = M (full column rank), the unique solution is
  ŵ = (A^H A)⁻¹ A^H d = A⁺ d

Otherwise there are infinitely many solutions, but the pseudo-inverse
  ŵ = A⁺ d
gives the minimum-norm solution to the least-squares problem: the shortest length possible in the Euclidean norm sense.
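Finally, a sketch of the minimum-norm property on a hypothetical underdetermined system: the pseudo-inverse solution solves the equations exactly, and adding any null-space component yields another solution of strictly larger norm:

```python
import numpy as np

# Underdetermined system: K = 2 equations, M = 4 unknowns (hypothetical numbers).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
d = np.array([1.0, 2.0])

w_min = np.linalg.pinv(A) @ d              # minimum-norm least-squares solution
print(np.allclose(A @ w_min, d))           # solves the system exactly

n = np.linalg.svd(A)[2][-1]                # a unit vector in the null space of A
w_other = w_min + n
print(np.allclose(A @ w_other, d))         # also a solution...
print(np.linalg.norm(w_min), np.linalg.norm(w_other))  # ...but with larger norm
```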