Lecture 3: Review of Linear Algebra; Simple Least Squares

Set-up for standard least squares, e.g. fitting a straight line y_i = a + b x_i. Write the problem as d = G m, with

d = [y_1, y_2, …, y_N]^T,  m = [a, b]^T,

G = | 1  x_1 |
    | 1  x_2 |
    | …   …  |
    | 1  x_N |

Standard least-squares solution:

m_est = [G^T G]^{-1} G^T d

Practice: Set up a simple least-squares problem, identifying the vectors d and m and the matrix G. Solve it using the least-squares formula m_est = [G^T G]^{-1} G^T d. (A numerical sketch follows the Lecture 5 notes.)

Lecture 4: Probability and What It Has to Do with Data Analysis

The Gaussian or normal distribution, with expected value x̄ and variance σ²:

p(x) = [1 / (√(2π) σ)] exp{ −(x − x̄)² / (2σ²) }    ← memorize me!

Properties of the normal distribution: expectation = median = mode = x̄, and 95% of the probability lies within 2σ of the expected value, i.e. between x̄ − 2σ and x̄ + 2σ.

Functions of a random variable: any function of a random variable is itself a random variable, so errors propagate from observations to inferences.

General rule: given a distribution p(x) (e.g. where x are observations) and a function y(x) (e.g. where y are inferences),

p(y) = p[x(y)] |dx/dy|

Suppose y(x) is a linear function, y = M x. Then, regardless of the type of distribution p(x),

C_y = M C_x M^T

In the special case that p(x) is a normal distribution, p(y) is a normal distribution, too.

Means and variances add. Special case: y = A x_1 ± B x_2, so that y = M x with M = [A, ±B], and

σ_y² = A² σ_x1² + B² σ_x2²

Note that the variances always add, whichever sign is chosen.

Practice: I would say … practice transforming a distribution of two variables, p(x_1, x_2) → p(y_1, y_2), when the functions y_1(x_1, x_2) and y_2(x_1, x_2) are simple (but nonlinear) expressions and p(x_1, x_2) is simple, too … but actually, even the simplest version would be too long for a midterm.

Lecture 5: Probability and Statistics

Rule for propagating error in least squares:

C_m = M C_d M^T, with M = [G^T G]^{-1} G^T

Uncorrelated data with equal variance, C_d = σ_d² I:

C_m = M C_d M^T = σ_d² [G^T G]^{-1}

From this follows the famous rule for the error associated with the mean. If G = [1, 1, …, 1]^T (so that M = N^{-1} [1, 1, …, 1]), then

σ_m = σ_d / √N

The estimated mean is a normally distributed random variable, and the width of this distribution, σ_m, decreases with the square root of the number of measurements.

Practice: Set up a simple (e.g. linear) error-propagation problem by identifying the matrices M and C_d. Compute and interpret C_m using the rule C_y = M C_x M^T, and then write down 95% confidence intervals. (See the sketches below.)
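A minimal numerical sketch of the Lecture 3 straight-line fit and the Lecture 5 error-propagation rule, in Python with numpy. The data x, d and the error σ_d are hypothetical values, invented only for illustration.

```python
# Straight-line least squares (Lecture 3) with error propagation (Lecture 5).
# The data and sigma_d below are hypothetical.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # observations y_i
sigma_d = 0.2                              # assumed uniform data error

# data kernel G for y_i = a + b*x_i, so m = [a, b]^T
G = np.column_stack([np.ones_like(x), x])

# standard least-squares solution m_est = [G^T G]^{-1} G^T d
GTG_inv = np.linalg.inv(G.T @ G)
m_est = GTG_inv @ G.T @ d

# error propagation: C_m = sigma_d^2 [G^T G]^{-1}
C_m = sigma_d**2 * GTG_inv

# 95% confidence intervals: m_i +/- 2*sqrt(C_m[i,i])
for name, mi, var in zip("ab", m_est, np.diag(C_m)):
    print(f"{name} = {mi:.3f} +/- {2.0 * np.sqrt(var):.3f} (95%)")
```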
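The Lecture 4 propagation rule C_y = M C_x M^T can likewise be checked numerically for the special case y = A x_1 + B x_2. The constants and variances below are arbitrary illustrative choices; note that the variances add even though B is negative.

```python
# Check that C_y = M C_x M^T reproduces sigma_y^2 = A^2 sx1^2 + B^2 sx2^2.
import numpy as np

A, B = 2.0, -3.0
sx1, sx2 = 0.5, 0.4
M = np.array([[A, B]])            # y = M x, with x = [x1, x2]^T
C_x = np.diag([sx1**2, sx2**2])   # uncorrelated x1, x2

C_y = M @ C_x @ M.T
print(C_y[0, 0], A**2 * sx1**2 + B**2 * sx2**2)   # the two agree
```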
Lecture 6: Bootstraps and Maximum Likelihood Methods

[Figure: the soup-pot analogy. Take one cup from a pot of p(y) and pour it into a new pot; there is more or less the same distribution in the two pots.]

Bootstrap method: random sampling with replacement. Use the original dataset x to create many new datasets x^(i), compute y(x^(i)) from each, and empirically examine their distribution. (A sketch follows the Lecture 8 notes.)

The Principle of Maximum Likelihood: given a parameterized distribution p(x; m), choose m so that it maximizes the likelihood

L(m) = Σ_i ln p(x_i; m), that is, ∂L/∂m_i = 0

In other words, the dataset that was in fact observed is taken to be the most probable one that could have been observed.

Application to the normal distribution: the sample mean and sample variance are the maximum-likelihood estimates of the true mean and variance.

Practice: I would say … use maximum likelihood to find the m associated with a parameterized distribution p(d; m), when p(d; m) is something fairly simple … but I think even the simplest such problem would be too long for a midterm.

Lecture 7: Advanced Topics in Least Squares

When the data are normally distributed with covariance C_d, maximum likelihood implies generalized least squares: minimize

(d − G m)^T C_d^{-1} (d − G m)

which has the solution

m = [G^T C_d^{-1} G]^{-1} G^T C_d^{-1} d, with C_m = [G^T C_d^{-1} G]^{-1}

In the special case of uncorrelated data with different variances,

C_d = diag(σ_1², σ_2², …, σ_N²)

and the problem reduces to a simple weighting:

d_i' = σ_i^{-1} d_i    (multiply each datum by the reciprocal of its error)
G_ij' = σ_i^{-1} G_ij  (multiply each row of the data kernel by the same amount)

Then solve the primed system by ordinary least squares. (A sketch follows the Lecture 8 notes.)

Practice: Set up a simple least-squares problem in which the data have non-uniform variance. Solve it: work out a formula for the least-squares estimate of the unknowns, and for their variance as well. Interpret the results, e.g. write down 95% confidence intervals for the unknowns.

Lecture 8: Advanced Topics in Least Squares, Part Two: Prior Information

Prior information: assumptions about the behavior of the unknowns that "fill in" the data gaps.

Overall strategy:
1. Represent the observed data as a normal probability distribution with d = d_obs and covariance C_d.
2. Represent the prior information as a probability distribution with m = m_A and covariance C_m.
…
5. Apply maximum likelihood to the combined distribution.

Generalized least-squares solution:

m_est = m_A + M [d_obs − G m_A], where M = [G^T C_d^{-1} G + C_m^{-1}]^{-1} G^T C_d^{-1}

Special case of uncorrelated data and prior constraints, C_d = σ_d² I and C_m = σ_m² I:

M = [G^T G + (σ_d/σ_m)² I]^{-1} G^T

This is called damped least squares; unknown m's are filled in with their prior values m_A.

Another special case: smoothness. Here D m is a measure of the roughness of m, e.g. the second derivative d²m/dx²:

D = | 1  −2   1   0   0   0  … |
    | 0   1  −2   1   0   0  … |
    | …                        |
    | …  0    0   0   1  −2  1 |

The smooth solution corresponds to generalized least squares with the choices m_A = 0 and C_m^{-1} = D^T D. (See the sketch below.)

Practice: Set up a simple least-squares problem in which prior information about the model parameters is available. Most importantly, specify m_A and C_m in sensible ways. Solve it: work out a formula for the estimate of the unknowns, and for their variance as well. Interpret the results, e.g. write down 95% confidence intervals for the unknowns.
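A minimal sketch of the Lecture 6 bootstrap. The dataset and the choice of the median as the statistic y(x) are hypothetical.

```python
# Bootstrap: resample with replacement, recompute the statistic each time,
# and examine the empirical spread of the results.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)   # stand-in "original dataset"

y_boot = []
for _ in range(1000):
    x_i = rng.choice(x, size=len(x), replace=True)   # resampled dataset x^(i)
    y_boot.append(np.median(x_i))                    # statistic y(x^(i))

print("bootstrap estimate of the median:", np.mean(y_boot))
print("bootstrap standard error:        ", np.std(y_boot))
```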
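A sketch of the Lecture 7 weighting trick for uncorrelated data with unequal variances: scale each row of d and G by 1/σ_i, solve by ordinary least squares, and confirm that the result matches the generalized formula. All numbers are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([2.2, 3.8, 6.1, 8.3])
sigma = np.array([0.1, 0.1, 0.5, 0.5])   # per-datum errors sigma_i

G = np.column_stack([np.ones_like(x), x])

# d_i' = d_i / sigma_i and G_ij' = G_ij / sigma_i
d_p = d / sigma
G_p = G / sigma[:, None]

# ordinary least squares on the primed system ...
m_weighted = np.linalg.solve(G_p.T @ G_p, G_p.T @ d_p)

# ... agrees with m = [G^T Cd^{-1} G]^{-1} G^T Cd^{-1} d
Cd_inv = np.diag(sigma**-2.0)
m_gls = np.linalg.solve(G.T @ Cd_inv @ G, G.T @ Cd_inv @ d)
print(m_weighted, m_gls)   # the same answer
```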
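A sketch of the Lecture 8 smoothness prior, with m_A = 0 and C_m^{-1} proportional to D^T D. The data kernel G here simply samples a few of the model points, so the prior has to fill in the gaps between them; the sizes, indices, and variances are all invented for illustration.

```python
import numpy as np

M_pts = 20                                # number of model parameters
d_idx = np.array([0, 5, 10, 19])          # model points that are observed
d_obs = np.array([0.0, 1.0, 0.5, 2.0])
sigma_d, sigma_m = 0.1, 1.0

# G picks out the observed model points
G = np.zeros((len(d_idx), M_pts))
G[np.arange(len(d_idx)), d_idx] = 1.0

# second-derivative roughness matrix D, rows [1, -2, 1]
D = np.zeros((M_pts - 2, M_pts))
for i in range(M_pts - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]

# generalized least squares with m_A = 0:
# m_est = [G^T G + (sigma_d/sigma_m)^2 D^T D]^{-1} G^T d_obs
eps2 = (sigma_d / sigma_m) ** 2
m_est = np.linalg.solve(G.T @ G + eps2 * (D.T @ D), G.T @ d_obs)
print(m_est.round(2))   # varies smoothly between the observed points
```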
Lecture 9: Interpolation and Splines

Cubic splines: one cubic, a + b x + c x² + d x³, in the interval [x_i, x_{i+1}], and a different cubic in the next interval. Properties:

- the curve goes through the points at the ends of its interval
- dy/dx matches at the interior points
- d²y/dx² matches at the interior points
- d²y/dx² = 0 at the end points

Practice: Memorize the properties of cubic splines. (A sketch appears at the end of these notes.)

Lecture 10: Hypothesis Testing

The Null Hypothesis is always a variant of this theme: the result of an experiment differs from its expected value only because of random variation.

Five tests:
- m_obs = m_prior, when m_prior and σ_prior are known: normal distribution
- σ_obs = σ_prior, when m_prior and σ_prior are known: chi-squared distribution
- m_obs = m_prior, when m_prior is known but σ_prior is unknown: t distribution
- σ1_obs = σ2_obs, when m1_prior and m2_prior are known: F distribution
- m1_obs = m2_obs, when σ1_prior and σ2_prior are unknown: modified t distribution (not on the midterm)

Practice: Work through an example of each of the four tests. Identify which test is being used, and why; identify the Null Hypothesis; compute the probability that the results deviate from the Null Hypothesis only because of random noise; interpret the results. (A sketch of the t-distribution case appears at the end of these notes.)

Lecture 11: Linear Systems

The output ("response") of a linear system can be calculated by convolving its input ("forcing") g(t) with its impulse response h(t).

[Figure: sketch-plots of an input g(t), an impulse response h(t), and the resulting output q(t).]

Convolution integral:

q(t) = ∫_{−∞}^{t} g(τ) h(t − τ) dτ

How to do convolution by hand, for x = [x_0, x_1, x_2, x_3, x_4, …]^T and y = [y_0, y_1, y_2, y_3, y_4, …]^T: reverse one time series and line the two up so that only the first samples overlap, then multiply rows. This gives the first element of x*y:

x_0 y_0

Slide to increase the overlap by one, multiply rows and add the products. This gives the second element:

x_0 y_1 + x_1 y_0

Slide again, multiply and add. This gives the third element:

x_0 y_2 + x_1 y_1 + x_2 y_0

Repeat until the time series no longer overlap.

Mathematically equivalent ways to write the convolution:

q(t) = ∫_0^∞ g(t − τ) h(τ) dτ    (h(τ) is "forward in time")

or alternatively

q(t) = ∫_{−∞}^{t} g(τ) h(t − τ) dτ    (g(τ) is "forward in time")

Matrix formulation, q = G h:

| q_0 |      | g_0   0    0   …   0   | | h_0 |
| q_1 | = Δt | g_1  g_0   0   …   0   | | h_1 |
|  …  |      |  …    …    …       …   | |  …  |
| q_N |      | g_N   …    g_2 g_1 g_0 | | h_N |

and the same formula with the roles of g and h interchanged, q = H g, where the matrix H is built from h_0, h_1, …, h_N in the same way.

Practice: Do some convolutions by hand. Make sketch-plots of the input, output, and impulse response. (A numerical check appears at the end of these notes.)

Lecture 12: Filter Theory

A generic way to construct one time series from another: the output y_k is obtained from the input x_k by convolving it with a "digital" filter f_k:

y_k = Σ_{p=0}^{k} f_{k−p} x_p

The z-transform turns a time series into a polynomial, and vice versa:

x = [x_0, x_1, x_2, x_3, x_4, …]^T ↔ x(z) = x_0 + x_1 z + x_2 z² + x_3 z³ + x_4 z⁴ + …

Convolving time series is equivalent to multiplying their z-transforms.

If f = [1, −f_1]^T then f_inv = [1, f_1, f_1², f_1³, …]^T. The inverse filter only exists when |f_1| < 1, for otherwise the elements of f_inv grow without bound.

Any filter of length N can be written (up to an overall constant) as a cascade of N−1 length-2 filters,

f = [f_0, f_1, f_2, …, f_{N−1}]^T = [−r_1, 1]^T * [−r_2, 1]^T * … * [−r_{N−1}, 1]^T

where the r_i are the roots of f(z). In the general case, an inverse filter exists only when the roots r_i of the corresponding f(z) satisfy |r_i| > 1; such a filter is said to be "minimum phase".

Practice: Given a relatively short filter f (3 or 4 coefficients), factor it into a cascade of two-element filters by computing the roots of f(z), and determine whether the filter f has an inverse. (A sketch follows.)
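The Lecture 9 properties are exactly those of a "natural" cubic spline, so scipy can be used to check them. This is a minimal sketch; the knot values are made up.

```python
# Natural cubic spline: passes through the knots, has continuous first and
# second derivatives, and has zero second derivative at the end points.
import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 2.0, 1.0, 3.0, 2.5])

cs = CubicSpline(xi, yi, bc_type='natural')

print(cs(xi))               # reproduces yi: the curve goes through each point
print(cs(xi[[0, -1]], 2))   # second derivative is ~0 at the end points
```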
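A sketch of one of the Lecture 10 tests: m_obs = m_prior when σ_prior is unknown, which calls for the t distribution. The measurements and m_prior are invented for illustration.

```python
# One-sample t test: does the observed mean differ from m_prior only
# because of random variation (the Null Hypothesis)?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d = rng.normal(loc=10.2, scale=1.0, size=15)   # stand-in measurements
m_prior = 10.0

t_stat, p_value = stats.ttest_1samp(d, popmean=m_prior)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# a small p (say, < 0.05) rejects the Null Hypothesis at the 95% level
```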
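The Lecture 11 reverse, slide, and multiply recipe, written out directly and checked against numpy's built-in convolution. The two short series are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.5])

q = np.zeros(len(x) + len(y) - 1)
for k in range(len(q)):
    # element k: sum x_p * y_{k-p} over every pair of samples that overlaps
    for p in range(len(x)):
        if 0 <= k - p < len(y):
            q[k] += x[p] * y[k - p]

print(q)                  # [1.  2.5 4.  1.5]
print(np.convolve(x, y))  # the same result
```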
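Finally, a sketch of the Lecture 12 practice exercise: compute the roots of f(z) for a short filter, check the minimum-phase condition |r_i| > 1, and verify the inverse of a length-2 factor. The filter coefficients are illustrative.

```python
import numpy as np

f = np.array([1.0, -0.5, 0.06])   # f(z) = 1 - 0.5 z + 0.06 z^2

# np.roots expects the highest power first, so reverse the coefficients
roots = np.roots(f[::-1])
print("roots of f(z):", roots)                        # 5.0 and 3.33...
print("minimum phase:", np.all(np.abs(roots) > 1.0))  # True: inverse exists

# inverse of a length-2 filter [1, -f1] with |f1| < 1: [1, f1, f1^2, ...]
f1 = 0.3
finv = f1 ** np.arange(8)
print(np.convolve([1.0, -f1], finv)[:8])   # ~[1, 0, 0, ...]
```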