Lecture 3
Review of Linear Algebra
Simple least-squares
Set up for standard Least Squares
yi = a + b xi

[ y1 ]   [ 1  x1 ]
[ y2 ] = [ 1  x2 ]  [ a ]
[ …  ]   [ …  …  ]  [ b ]
[ yN ]   [ 1  xN ]

   d         G        m
Standard Least-squares Solution
mest = [GTG]-1 GT d
practice
Set up a simple least-squares problem, identifying
the vectors d and m and the matrix G
Solve it using the least-squares formula,
mest = [GTG]-1 GT d
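A minimal sketch of this practice problem in Python/NumPy (the x and y values are made up for illustration; np.linalg.solve is used in place of an explicit matrix inverse):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # observations y_i

# G has a column of ones and a column of x_i; m = [a, b]^T
G = np.column_stack([np.ones_like(x), x])

# least-squares formula: m_est = [G^T G]^-1 G^T d
m_est = np.linalg.solve(G.T @ G, G.T @ d)
print("a, b =", m_est)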
Lecture 4
Probability
and what it has to do with
data analysis
the Gaussian or normal
distribution
variance
expected value
p(x) = [ 1 / (√(2π) σ) ] exp{ -(x - x̄)2 / (2σ2) }
Memorize me!
Properties of the normal distribution
Expectation = Median = Mode = x̄
95% of the probability lies within 2σ of the expected value
[figure: p(x) with the 95% interval from x̄ - 2σ to x̄ + 2σ marked]
Functions of a random variable
any function of a random variable is itself
a random variable
Errors propagate from
observations to inferences
General rule
given a distribution p(x)
e.g. where x are observations
and a function y(x)
e.g. where y are inferences
p(y) = p[x(y)] |dx/dy|
Suppose y(x) is a linear function y=Mx
Then, regardless of the type of distribution, p(x):
y=Mx
C y = M C x MT
In the special case that p(x) is a normal distribution,
p(y) is a normal distribution, too.
Means and Variances Add
Special case: y = Mx with y = Ax1 ± Bx2, so that M = [A, B]
Then Cy = M Cx MT gives
σy2 = A2 σx12 + B2 σx22
Note that the variances always add, whether y is the sum or the difference.
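A quick numerical check of the propagation rule Cy = M Cx MT, sketched in NumPy (the coefficients and variances are made up):

import numpy as np

A, B = 2.0, 3.0
M = np.array([[A, -B]])                  # y = A*x1 - B*x2, so M = [A, -B]
Cx = np.diag([0.5**2, 0.2**2])           # uncorrelated x1, x2

Cy = M @ Cx @ M.T                        # propagation rule
print(Cy)                                # equals A**2 * 0.25 + B**2 * 0.04: variances add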
practice
I would say …
practice transforming a distribution of two
variables, p(x1,x2) → p(y1,y2)
when the functions y1(x1,x2) and y2(x1,x2) are
simple (but nonlinear) expressions and p(x1,x2)
is simple, too.
… but actually, even the simplest
version would be too long for a midterm.
Lecture 5
Probability and Statistics
Rule for propagating error in least squares
Cy = M Cx MT
M=[GTG]-1GT
Uncorrelated data with equal variance Cd = σd2 I
Cm = M Cd MT = σd2 [GTG]-1
From this follows the famous rule for the error
associated with the mean.
If G = [1, 1, … 1]T, so that M = N-1 [1, 1, … 1],
σm = σd / √N
the estimated mean is a normally-distributed random variable
the width of this distribution, σm, decreases with the square
root of the number of measurements
practice
Set up a simple (e.g. linear) error-propagation
problem by identifying the matrices M and Cd
Compute and interpret Cm using the rule
Cy = M Cx MT
And then write down 95% confidence intervals
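A sketch of this practice problem, reusing the straight-line fit above with a made-up data standard deviation:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
sd = 0.1                                  # assumed data standard deviation

G = np.column_stack([np.ones_like(x), x])
GTG_inv = np.linalg.inv(G.T @ G)
m_est = GTG_inv @ G.T @ d

Cm = sd**2 * GTG_inv                      # Cm = sd^2 [G^T G]^-1
sm = np.sqrt(np.diag(Cm))
for name, m, s in zip(["a", "b"], m_est, sm):
    print(name, "=", round(m, 3), "+/-", round(2 * s, 3), "(95% confidence)")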
Lecture 6
Bootstraps
Maximum Likelihood Methods
[figure: soup-pot analogy for resampling: take 1 cup from a pot of p(y),
pour it into a new pot; more or less the same thing in the 2 pots?]
Bootstrap method
random sampling with replacement
use the original dataset x
to create many new datasets x(i)
compute a y(x) from each
and empirically examine their distribution
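A minimal sketch of the bootstrap in Python (the dataset, its size, and the choice of the mean as the statistic y(x) are all made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)         # original dataset x

n_boot = 1000
y = np.empty(n_boot)
for i in range(n_boot):
    x_i = rng.choice(x, size=len(x), replace=True)  # new dataset x(i), sampled with replacement
    y[i] = x_i.mean()                               # the statistic y computed from it

print("spread of y:", y.std())
print("empirical 95% interval:", np.percentile(y, [2.5, 97.5]))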
The Principle of Maximum Likelihood
Given a parameterized distribution p(x;m)
Choose m so that it maximizes L(m)
L(m) = Σi ln p(xi; m)
∂L/∂mi = 0
the dataset that was in fact observed is the most
probable one that could have been observed
Application to Normal Distribution
Sample mean and sample variance are
the maximum likelihood estimates of
the true mean and variance of a
normal distribution
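A hedged numerical check of this claim, assuming SciPy is available: maximize L(m) for a made-up normal dataset and compare with the sample mean and the (1/N) sample variance:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_L(params):
    mean, log_sigma = params
    return -np.sum(norm.logpdf(x, loc=mean, scale=np.exp(log_sigma)))

res = minimize(neg_L, x0=[0.0, 0.0])
mean_ml, var_ml = res.x[0], np.exp(res.x[1]) ** 2
print("maximum-likelihood estimates:", mean_ml, var_ml)
print("sample mean and (1/N) variance:", x.mean(), x.var())   # should agree closely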
practice
I would say …
use maximum likelihood to find the m
associated with a parameterized distribution
p(d,m)
when p(d,m) is something fairly simple
… but I think even the simplest such problem
would be too long for a midterm
Lecture 7
Advanced Topics in
Least Squares
When the data
are normally-distributed with variance Cd
Maximum likelihood implies generalized least-squares:
Minimize (d-Gm)T Cd-1 (d-Gm)
Which has solution
m = [GT Cd-1G]-1GTCd-1d
and
Cm = [GTCd-1G]-1
In the special case of uncorrelated data with different
variances, Cd = diag( σ12, σ22, … σN2 ) =
[ σ12  0    0   … ]
[ 0    σ22  0   … ]
[ 0    0    σ32 … ]
[ …               ]
di’ = σi-1 di (multiply each datum by the reciprocal of its error)
Gij’ = σi-1 Gij (multiply each row of the data kernel by the same amount)
Then solve by ordinary least squares
practice
Set up a simple least-squares problem when the data
have non-uniform variance
Solve it: work out a formula for the least-squares
estimate of the unknowns, and their variance as well.
Interpret the results, e.g. write down 95% confidence
intervals for the unknowns
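A minimal sketch of this practice problem (a straight-line fit; the data values and the per-datum standard deviations are made up):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
sigma = np.array([0.1, 0.1, 0.5, 0.1, 0.5])   # per-datum standard deviations

G = np.column_stack([np.ones_like(x), x])
Gp = G / sigma[:, None]                       # G'_ij = sigma_i^-1 G_ij
dp = d / sigma                                # d'_i  = sigma_i^-1 d_i

Cm = np.linalg.inv(Gp.T @ Gp)                 # = [G^T Cd^-1 G]^-1
m_est = Cm @ Gp.T @ dp
print("m_est:", m_est)
print("95% intervals: +/-", 2 * np.sqrt(np.diag(Cm)))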
Lecture 8
Advanced Topics in
Least Squares
- Part Two -
prior information
assumptions about the behavior of the
unknowns
that ‘fill in’ the data gaps
Overall Strategy
1. Represent the observed data as a normal
probability distribution with d=dobs, Cd
2. Represent prior information as a probability
distribution with m=mA, Cm…
…
5. Apply maximum likelihood to the combined
distribution
Generalized least-squares solution
mest = mA + M [ dobs – GmA]
where M = [GTCd-1G + Cm-1]-1 GT Cd-1
Special case: uncorrelated data and prior
constraints
Cd = σd2 I and Cm = σm2 I
M = [ GTG + (σd/σm)2 I ]-1 GT
Called damped least-squares
Unknown m’s filled in with their prior
values mA
Another special case: Smoothness …
Dm is a measure of roughness of m
e.g. second derivative
D =
[ 1 -2  1  0  0  0 … ]
[ 0  1 -2  1  0  0 … ]
[ …                  ]
[ 0  0  0  … 1 -2  1 ]
d2m/dx2 → Dm
solution corresponds to generalized
least-squares with the choices
mA = 0
Cm-1 = (DTD)
practice
Set up a simple least-squares problem when prior
information about the model parameters is available.
Most importantly, specify mA and Cm in sensible
ways.
Solve it: work out a formula for the estimate of the
unknowns, and their variance as well.
Interpret the results, e.g. write down 95% confidence
intervals for the unknowns
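A minimal sketch of the damped least-squares special case (Cd = σd2 I, Cm = σm2 I); the data, the standard deviations, and the prior model mA are all made up, and the variance of the estimate is obtained from the propagation rule Cy = M Cx MT:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
G = np.column_stack([np.ones_like(x), x])

sd, sm = 0.1, 1.0                             # data and prior standard deviations
mA = np.array([0.0, 1.0])                     # prior values of the unknowns
eps2 = (sd / sm) ** 2

M = np.linalg.solve(G.T @ G + eps2 * np.eye(2), G.T)   # [G^T G + (sd/sm)^2 I]^-1 G^T
m_est = mA + M @ (d - G @ mA)

C_est = sd**2 * (M @ M.T)                     # variance of the estimate via Cy = M Cd M^T
print("m_est:", m_est)
print("95% intervals: +/-", 2 * np.sqrt(np.diag(C_est)))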
Lecture 9
Interpolation and Splines
cubic splines
[figure: data points (xi, yi) and (xi+1, yi+1) on the curve y(x);
a cubic a + bx + cx2 + dx3 in this interval,
a different cubic in the next interval]
Properties
curve goes through the points at the ends of its interval
dy/dx match at interior points
d2y/dx2 match at interior points
d2y/dx2 =0 at end points
practice
Memorize the properties of cubic splines
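A short sketch using SciPy's CubicSpline (not part of the lecture); bc_type='natural' imposes the d2y/dx2 = 0 end conditions listed above, and the data points are made up:

import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

spline = CubicSpline(xi, yi, bc_type='natural')   # d2y/dx2 = 0 at the end points

x_fine = np.linspace(0.0, 4.0, 9)
print(spline(x_fine))   # passes through the data; y, dy/dx, d2y/dx2 continuous at interior points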
Lecture 10
Hypothesis Testing
The Null Hypothesis
always a variant of this theme:
the result of an experiment differs
from the expected value only because
of random variation
5 tests
mobs = mprior when mprior and sprior are known
normal distribution
sobs = sprior when mprior and sprior are known
chi-squared distribution
mobs = mprior when mprior is known but sprior is unknown
t distribution
s1obs = s2obs when m1prior and m2prior are known
F distribution
m1obs = m2obs when s1prior and s2prior are unknown
modified t distribution (not on the midterm)
practice
Work through an example of each of the 4 tests
identify which test is being used, and why
identify the Null Hypothesis
compute the probability that the results deviate from
the Null Hypothesis only because of random noise
interpret the results
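A sketch of the first test (mobs = mprior with sprior known, so the normal distribution applies); the observations and the prior values are made up:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
d = rng.normal(loc=10.3, scale=1.0, size=25)   # observations
m_prior, s_prior = 10.0, 1.0                   # assumed known

z = (d.mean() - m_prior) / (s_prior / np.sqrt(len(d)))
p_value = 2 * norm.sf(abs(z))                  # probability of a deviation this large under the Null
print("z =", z, "p =", p_value)                # small p: unlikely to be random variation alone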
Lecture 11
Linear Systems
output (“response”) of a linear system
can be calculated by
convolving its input (“forcing”)
with its impulse response
[figure: sketches of the impulse response h(τ) and of the output q(t),
built up from the contributions h(τ) g(t-τ)]
Convolution integral
q(t) = ∫ g(t-τ) h(τ) dτ   (τ from -∞ to t)
how to do convolution by hand
x=[x0, x1, x2, x3, x4, …]T and y=[y0, y1, y2, y3, y4, …]T
Reverse one time-series, line them up as
shown, and multiply rows. This is the first
element of x*y
x0, x1, x2, x3, x4, …
… y4, y3, y2, y1, y0
x0y0
Slide to increase the overlap by one, multiply
rows and add products. This is the second
element
x0, x1, x2, x3, x4, …
… y4, y3, y2, y1, y0
x0y1+x1y0
Slide again, multiply and add. This is
the third element
x0, x1, x2, x3, x4, …
… y4, y3, y2, y1, y0
x0y2+x1y1+x2y0
Repeat until time-series no longer overlap
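A quick way to check a by-hand convolution, using numpy.convolve on two short made-up series:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.5])

print(np.convolve(x, y))   # [x0*y0, x0*y1 + x1*y0, x1*y1 + x2*y0, x2*y1] = [1.0, 2.5, 4.0, 1.5]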
Mathematically equivalent ways to
write the convolution
q(t) = ∫ g(t-τ) h(τ) dτ   (τ from -∞ to t)
h(τ) is “forward in time”
or alternatively
q(t) = ∫ g(τ) h(t-τ) dτ   (τ from 0 to ∞)
g(τ) is “forward in time”
Matrix formulations
q = Gh

[ q0 ]        [ g0   0    0   …   0        ]   [ h0 ]
[ q1 ]  = Δt  [ g1   g0   0   …   0        ]   [ h1 ]
[ …  ]        [ …                          ]   [ …  ]
[ qN ]        [ gN   …   g3  g2  g1  g0    ]   [ hN ]

and

[ q0 ]        [ h0   0    0   …   0        ]   [ g0 ]
[ q1 ]  = Δt  [ h1   h0   0   …   0        ]   [ g1 ]
[ …  ]        [ …                          ]   [ …  ]
[ qN ]        [ hN   …   h3  h2  h1  h0    ]   [ gN ]

q = Gg
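A sketch of the matrix formulation in NumPy/SciPy: build the lower-triangular matrix from g with scipy.linalg.toeplitz and check q = Δt G h against the direct convolution (the series and Δt are made up):

import numpy as np
from scipy.linalg import toeplitz

dt = 1.0
g = np.array([1.0, 0.5, 0.25, 0.0])
h = np.array([2.0, -1.0, 0.0, 0.5])

# first column g, first row [g0, 0, 0, ...] -> lower-triangular convolution matrix
G = toeplitz(g, np.concatenate(([g[0]], np.zeros(len(g) - 1))))

q_matrix = dt * (G @ h)
q_direct = dt * np.convolve(g, h)[:len(g)]
print(np.allclose(q_matrix, q_direct))   # True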
practice
Do some convolutions by hand
Make sketch-plots of the input, output and
impulse response
Lecture 12
Filter Theory
a generic way to construct a time-series
yk is obtained from xk by
convolving by filter fk
yk = Σp fk-p xp   (sum over p from -∞ to k)
[figure: input → “digital” filter → output]
the z-transform
turns a time-series into a polynomial, and vice versa
Z-transform
time-series x = [x0, x1, x2, x3, x4, …]T
polynomial x(z) = x0 + x1z + x2z2 + x3z3 + x4z4 + …
Convolving time-series is equivalent to multiplying their z-transforms
If
f = [1, -f1]T
then
finv = [1, f1, f12, f13, …]T
The inverse filter only exists when |f1|<1,
for otherwise the elements of finv grow
without bound
any filter of length N can be written as a
cascade of N-1 length-2 filters
f = [f0, f1, f2, f3, … fN-1]T
= [-r1, 1]T* [-r2, 1]T *…* [-rN-1, 1]T
where ri are the roots of f(z)
In the general case, an inverse filter only exists
when the roots ri of the corresponding f(z)
satisfy |ri|>1
such a filter is said to be
“minimum phase”
practice
Given a relatively short filter, f (3 or 4 coefficients)
Factor it into a cascade of 2-element filters, by
computing the roots of f(z)
Determine whether the filter f has an inverse
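A minimal sketch of this practice problem (the filter coefficients are made up): compute the roots of f(z) with numpy.roots, form the length-2 factors, and test whether all roots satisfy |ri| > 1:

import numpy as np

f = np.array([1.0, -0.9, 0.2])            # f(z) = 1 - 0.9 z + 0.2 z^2

# np.roots wants the highest power first, so reverse the coefficient order
roots = np.roots(f[::-1])
print("roots of f(z):", roots)            # here 2.5 and 2.0

# each root gives a length-2 factor [-r_i, 1] (up to an overall scale factor)
factors = [np.array([-r, 1.0]) for r in roots]
print("minimum phase, inverse exists:", bool(np.all(np.abs(roots) > 1)))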