QMDA Review Session: Things you should remember

1. Probability & Statistics

The Gaussian (normal) distribution, with expected value x̄ and variance σ²:

    p(x) = (2πσ²)^{-1/2} exp{ -(x - x̄)² / (2σ²) }

Properties of the normal distribution:
- Expectation = Median = Mode = x̄
- 95% of the probability lies within 2σ of the expected value, i.e. between x̄ - 2σ and x̄ + 2σ.

Multivariate distributions. The covariance matrix, C, is very important:
- The diagonal elements give the variance of each x_i: σ_{x_i}² = C_{ii}.
- The off-diagonal elements of C indicate whether pairs of x's are correlated. E.g., C_{12} > 0 means x_1 and x_2 are positively correlated; C_{12} < 0 means they are negatively correlated.

The multivariate normal distribution

    p(x) = (2π)^{-N/2} |C_x|^{-1/2} exp{ -(1/2) (x - x̄)^T C_x^{-1} (x - x̄) }

has expectation x̄ and covariance C_x, and is normalized to unit area.

If y is linearly related to x, y = Mx, then

    ȳ = M x̄          (rule for means)
    C_y = M C_x M^T   (rule for propagating error)

These rules hold regardless of the distribution of x (see the sketch following part 5).

2. Least Squares

Simple least squares. Linear relationship between data, d, and model, m:

    d = Gm

Minimize the prediction error E = e^T e with e = d^obs - Gm:

    m^est = [G^T G]^{-1} G^T d

If the data are uncorrelated with variance σ_d², then C_m = σ_d² [G^T G]^{-1}.

Least squares with prior constraints. Given uncorrelated data with variance σ_d² that satisfy a linear relationship d = Gm, and prior information with variance σ_m² that satisfies a linear relationship h = Dm, the best estimate of the model parameters, m^est, solves the augmented system

    [ G ; εD ] m = [ d ; εh ]   with   ε = σ_d / σ_m

Previously we discussed only the special case h = 0. (A sketch follows part 5.)

Newton's method for non-linear least-squares problems. Given data that satisfy a non-linear relationship d = g(m), guess a solution m^(k) with k = 0 and linearize around it:

    Δm = m - m^(k),   Δd = d - g(m^(k)),   Δd = G Δm

with G_{ij} = ∂g_i/∂m_j evaluated at m^(k). Then iterate,

    m^(k+1) = m^(k) + Δm   with   Δm = [G^T G]^{-1} G^T Δd

hoping for convergence. (A sketch follows part 5.)

3. Bootstraps

Investigate the statistics of y by creating many datasets y' and examining their statistics. Each y' is created through random sampling, with replacement, of the original dataset y: draw N random integers in the range 1-N and set each y'_i to the corresponding element of y, e.g.

    original data:    y_1   y_2   y_3   y_4   ...   y_N
    random integer:    4     3     7    11    ...    6
    resampled data:   y'_1 = y_4,  y'_2 = y_3,  y'_3 = y_7,  ...

Example: statistics of the mean of y, given N data. Compute the estimate

    (1/N) Σ_i y'_i

then repeat a gazillion times and examine the resulting distribution of estimates. (A sketch follows part 5.)

4. Interpolation and Splines

Linear splines: in the interval (x_i, x_{i+1}),

    y(x) = y_i + (y_{i+1} - y_i)(x - x_i)/(x_{i+1} - x_i)

The 1st derivative is discontinuous at the nodes x_i.

Cubic splines: a cubic, a + bx + cx² + dx³, in each interval, with a different cubic in each interval; the 1st and 2nd derivatives are continuous at the nodes. (A sketch follows part 5.)

5. Hypothesis Testing

The Null Hypothesis is always a variant of this theme: the results of an experiment differ from the expected value only because of random variation.

Test of significance of results, say to 95% significance: the Null Hypothesis would generate the observed result less than 5% of the time.

Four important distributions (with the x_i uncorrelated and normally distributed):
- Normal distribution: distribution of the x_i themselves
- Chi-squared distribution: distribution of χ² = Σ_{i=1}^N x_i²
- Student's t-distribution: distribution of t = x_0 / √( N^{-1} Σ_{i=1}^N x_i² )
- F-distribution: distribution of F = ( N^{-1} Σ_{i=1}^N x_i² ) / ( M^{-1} Σ_{i=1}^M x_{N+i}² )

Five tests:
- m^obs = m^prior, when m^prior and σ^prior are known: normal distribution
- σ^obs = σ^prior, when m^prior and σ^prior are known: chi-squared distribution
- m^obs = m^prior, when m^prior is known but σ^prior is unknown: t-distribution
- σ_1^obs = σ_2^obs, when m_1^prior and m_2^prior are known: F-distribution
- m_1^obs = m_2^obs, when σ_1^prior and σ_2^prior are unknown: modified t-distribution

(Critical values for these distributions are sketched below.)
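The two propagation rules from part 1 can be checked numerically. Below is a minimal numpy sketch; the mean x̄, covariance C_x, and matrix M are made-up illustrative values, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: x has mean xbar and covariance Cx (assumed values)
xbar = np.array([1.0, 2.0])
Cx = np.array([[2.0, 0.8],
               [0.8, 1.0]])
M = np.array([[1.0,  1.0],
              [1.0, -1.0]])       # linear relationship y = M x

# Rule for means and rule for propagating error
ybar = M @ xbar
Cy = M @ Cx @ M.T

# Check against sample statistics of many realizations
x = rng.multivariate_normal(xbar, Cx, size=100_000)
y = x @ M.T
print(ybar, y.mean(axis=0))       # should agree closely
print(Cy)
print(np.cov(y.T))                # should agree closely
```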
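A numpy sketch of part 2's two recipes: simple least squares, m^est = [G^T G]^{-1} G^T d, and the augmented system for prior information h = Dm. The straight-line data, the noise level, and the choice D = I, h = 0 ("prefer small parameters") are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic straight-line data d = m1 + m2*t + noise (illustrative)
t = np.linspace(0.0, 10.0, 50)
sigma_d = 1.0
d = 2.0 + 0.5 * t + rng.normal(0.0, sigma_d, t.size)

G = np.column_stack([np.ones_like(t), t])

# Simple least squares: m_est = [G^T G]^{-1} G^T d
m_est = np.linalg.solve(G.T @ G, G.T @ d)
Cm = sigma_d**2 * np.linalg.inv(G.T @ G)   # Cm = sigma_d^2 [G^T G]^{-1}

# Least squares with prior information h = D m (here h = 0, D = I),
# weighted by eps = sigma_d / sigma_m
sigma_m = 10.0
eps = sigma_d / sigma_m
D = np.eye(2)
h = np.zeros(2)
F = np.vstack([G, eps * D])
f = np.concatenate([d, eps * h])
m_damped = np.linalg.solve(F.T @ F, F.T @ f)

print(m_est, m_damped)   # nearly identical here, since the prior is weak
```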
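A sketch of part 2's Newton iteration on an assumed exponential-decay model, d = m_1 exp(-m_2 t); the model, noise level, and starting guess are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Nonlinear relationship d = g(m) = m1 * exp(-m2 * t)  (illustrative)
t = np.linspace(0.0, 5.0, 40)
m_true = np.array([3.0, 0.7])
d = m_true[0] * np.exp(-m_true[1] * t) + rng.normal(0.0, 0.05, t.size)

def g(m):
    return m[0] * np.exp(-m[1] * t)

def G(m):
    # G_ij = dg_i/dm_j evaluated at the current guess m^(k)
    return np.column_stack([np.exp(-m[1] * t),
                            -m[0] * t * np.exp(-m[1] * t)])

m = np.array([1.0, 1.0])          # initial guess m^(0)
for k in range(20):
    Dd = d - g(m)                 # Dd = d - g(m^(k))
    Gk = G(m)
    Dm = np.linalg.solve(Gk.T @ Gk, Gk.T @ Dd)   # [G^T G]^{-1} G^T Dd
    m = m + Dm                    # m^(k+1) = m^(k) + Dm

print(m, m_true)                  # hoping for convergence
```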
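A sketch of part 3's bootstrap of the mean; the dataset, its size, and the number of resamplings are arbitrary. Note that Python indexes from 0, so the random integers run 0..N-1 rather than 1..N.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dataset y (any real data would do)
y = rng.normal(5.0, 2.0, size=100)
N = y.size

# Bootstrap the mean: resample with replacement, many times
n_boot = 10_000
means = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, N, size=N)   # N random integers in 0..N-1
    means[i] = y[idx].mean()           # estimate (1/N) sum_i y'_i

# Examine the resulting distribution of estimates
print(means.mean(), means.std())           # bootstrap mean, standard error
print(np.percentile(means, [2.5, 97.5]))   # 95% confidence interval
```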
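Part 4's two interpolants, sketched with numpy's linear interpolation and scipy's CubicSpline; the nodes are made-up values.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Nodes (x_i, y_i); illustrative values
xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

x = np.linspace(0.0, 4.0, 401)

# Linear spline: y(x) = y_i + (y_{i+1}-y_i)(x-x_i)/(x_{i+1}-x_i),
# with a discontinuous first derivative at the nodes
y_lin = np.interp(x, xi, yi)

# Cubic spline: a different cubic in each interval, with continuous
# first and second derivatives at the nodes
cs = CubicSpline(xi, yi)
y_cub = cs(x)

print(np.interp(2.5, xi, yi), cs(2.5))   # the two interpolants differ
```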
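Each test in part 5 reduces to comparing a statistic against a critical value of one of the four distributions. Here is a scipy.stats sketch of those critical values; the degrees of freedom are illustrative.

```python
from scipy.stats import norm, chi2, t, f

# 95% two-sided critical value of the normal distribution
print(norm.ppf(0.975))             # ~1.96, i.e. the "2 sigma" rule above

# 95% value of chi-squared with, e.g., 2 degrees of freedom
print(chi2.ppf(0.95, df=2))        # ~5.99

# Student's t with N-1 = 9 degrees of freedom (illustrative N = 10)
print(t.ppf(0.975, df=9))

# F with (9, 14) degrees of freedom (illustrative N = 10, M = 15)
print(f.ppf(0.95, dfn=9, dfd=14))
```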
6. Filters

Filtering operation, g(t) = f(t)*h(t), "convolution":

    g(t) = ∫_{-∞}^{+∞} f(t-τ) h(τ) dτ,   discretized   g_k = Δt Σ_p f_{k-p} h_p

or alternatively

    g(t) = ∫_{-∞}^{+∞} f(τ) h(t-τ) dτ,   discretized   g_k = Δt Σ_p f_p h_{k-p}

(For causal time series that start at t = 0, the sums run over p = 0, ..., k.)

How to do convolution by hand, with x = [x_0, x_1, x_2, x_3, x_4, ...]^T and y = [y_0, y_1, y_2, y_3, y_4, ...]^T: reverse one time series, line the two up so that y_0 sits under x_0, and multiply rows. This is the first element of x*y:

                        x_0, x_1, x_2, x_3, x_4, ...
    ..., y_4, y_3, y_2, y_1, y_0
    [x*y]_1 = x_0 y_0

Then slide the reversed series one sample to the right (y_0 under x_1), multiply rows, and add, to get the second element of x*y:

                        x_0, x_1, x_2, x_3, x_4, ...
         ..., y_4, y_3, y_2, y_1, y_0
    [x*y]_2 = x_0 y_1 + x_1 y_0

and so on. (A sketch follows part 7.)

Matrix formulations of g(t) = f(t)*h(t):

g = Fh:

    [ g_0 ]        [ f_0   0    0   ...   0  ] [ h_0 ]
    [ g_1 ] =  Δt  [ f_1  f_0   0   ...   0  ] [ h_1 ]
    [ ... ]        [ ...                     ] [ ... ]
    [ g_N ]        [ f_N  ...  f_2  f_1  f_0 ] [ h_N ]

and g = Hf:

    [ g_0 ]        [ h_0   0    0   ...   0  ] [ f_0 ]
    [ g_1 ] =  Δt  [ h_1  h_0   0   ...   0  ] [ f_1 ]
    [ ... ]        [ ...                     ] [ ... ]
    [ g_N ]        [ h_N  ...  h_2  h_1  h_0 ] [ f_N ]

Least-squares equation for the filter f in g = Hf:

    [H^T H] f = H^T g

The matrix H^T H is built from the autocorrelation, A, of h, and the right-hand side H^T g is the cross-correlation, X, of h and g:

    [ A(0)   A(1)    A(2)   ...  A(N) ] [ f_0 ]   [ X(0) ]
    [ A(1)   A(0)    A(1)   ...       ] [ f_1 ] = [ X(1) ]
    [ A(2)   A(1)    A(0)   ...       ] [ ... ]   [ ...  ]
    [ A(N)   A(N-1)  A(N-2) ...  A(0) ] [ f_N ]   [ X(N) ]

Auto-correlation of a time series T(t):

    A(τ) = ∫_{-∞}^{+∞} T(t) T(t-τ) dt,   A_i = Σ_j T_j T_{j-i}

Cross-correlation of two time series T^(1)(t) and T^(2)(t):

    X(τ) = ∫_{-∞}^{+∞} T^(1)(t) T^(2)(t-τ) dt,   X_i = Σ_j T^(1)_j T^(2)_{j-i}

7. Fourier Transforms and Spectra

Integral transforms:

    C(ω) = ∫_{-∞}^{+∞} T(t) exp(-iωt) dt
    T(t) = (1/2π) ∫_{-∞}^{+∞} C(ω) exp(+iωt) dω

Discrete transforms (DFT):

    C_k = Σ_{n=0}^{N-1} T_n exp(-2πikn/N),   with k = 0, ..., N-1
    T_n = N^{-1} Σ_{k=0}^{N-1} C_k exp(+2πikn/N),   with n = 0, ..., N-1

Frequency step: Δω Δt = 2π/N. Maximum (Nyquist) frequency: f_max = 1/(2Δt), i.e. ω_max = π/Δt.

Aliasing and cyclicity in a digital world: ω_{n+N} = ω_n, and since time and frequency play symmetrical roles in exp(-iωt), t_{k+N} = t_k.

One Fourier transform you should know:

    C(ω) = ∫_{-∞}^{+∞} δ(t) exp(-iωt) dt = exp(0) = 1

The transform of a spike at t = 0 is a constant.

Error estimates for the DFT. Assume uncorrelated, normally distributed data, d_n = T_n, with variance σ_d². The matrix G in Gm = d is G_nk = N^{-1} exp(+2πikn/N). The problem Gm = d is linear, so the unknowns, m_k = C_k (the coefficients of the complex exponentials), are also normally distributed. Since the exponentials are orthogonal, G^H G = N^{-1} I is diagonal, and C_m = σ_d² [G^H G]^{-1} = N σ_d² I is diagonal, too. Apportioning variance equally between the real and imaginary parts of each C_k, each has variance s² = N σ_d² / 2. The spectrum s_m² = C_re² + C_im² is the sum of the squares of two uncorrelated, normally distributed random variables and is thus χ²_2-distributed (after scaling by s²). The 95% value of χ²_2 is about 5.99, so to be significant a peak must exceed 5.99 N σ_d² / 2.

Convolution theorem:

    transform[ f(t)*g(t) ] = transform[ f(t) ] · transform[ g(t) ]

Power spectrum of a stationary time series. With T(t) a stationary time series,

    C(ω) = ∫_{-T/2}^{+T/2} T(t) exp(-iωt) dt,   S(ω) = lim_{T→∞} T^{-1} |C(ω)|²

S(ω) is called the power spectral density: the spectrum normalized by the length of the time series.

Relationship of power spectral density to the DFT. To compute the Fourier transform, C(ω), you multiply the DFT coefficients, C_k, by Δt. So to get the power spectral density,

    T^{-1} |C(ω)|² = (NΔt)^{-1} |Δt C_k|² = (Δt/N) |C_k|²

you multiply the DFT spectrum, |C_k|², by Δt/N. (A sketch follows this part.)

Windowed time series. The Fourier transform of a windowed time series is the Fourier transform of the long time series convolved with the Fourier transform of the windowing function.

Window functions:
- Boxcar: its Fourier transform is a sinc function, which has a narrow central peak but large side lobes.
- Hanning (cosine) taper: its Fourier transform has a somewhat wider central peak but low side lobes.
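A sketch of the slide-and-multiply recipe from part 6, checked against numpy's built-in convolution; the two short series are made-up, and Δt = 1 is assumed so the Δt factor is omitted.

```python
import numpy as np

# Two short time series (illustrative values); Dt = 1 assumed
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.5, 0.25])

# "By hand": reverse y, slide it along x, multiply rows and add
g = np.zeros(x.size + y.size - 1)
for k in range(g.size):
    for p in range(x.size):
        if 0 <= k - p < y.size:
            g[k] += x[p] * y[k - p]   # g_k = sum_p x_p y_{k-p}

print(g)                  # first element x0*y0, second x0*y1 + x1*y0, ...
print(np.convolve(x, y))  # same result
```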
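A numpy sketch of part 7's DFT, the Δt/N scaling of the power spectral density, and the 95% significance level for a spectral peak of white noise; the series length, sampling interval, and σ_d are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Uncorrelated, normally distributed time series (white noise)
N, Dt, sigma_d = 1024, 0.01, 1.0
T = rng.normal(0.0, sigma_d, N)

# DFT: numpy's convention matches C_k = sum_n T_n exp(-2 pi i k n / N)
C = np.fft.fft(T)

# Power spectral density: multiply the DFT spectrum |C_k|^2 by Dt/N
S = (Dt / N) * np.abs(C)**2
freq = np.fft.fftfreq(N, d=Dt)
print(np.abs(freq).max(), 1 / (2 * Dt))   # Nyquist frequency, both ways

# 95% significance level for a single peak of white noise:
# real and imaginary parts of C_k each have variance N sigma_d^2 / 2,
# so |C_k|^2 / (N sigma_d^2 / 2) is chi-squared with 2 dof (95% ~5.99)
threshold = 5.99 * N * sigma_d**2 / 2
print((np.abs(C[1:N//2])**2 > threshold).mean())   # ~5% exceed it
```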
8. EOFs and Factor Analysis

Representation of samples as a linear mixing of factors: S = CF, where
- S (N×M) is the sample matrix: S_ij = amount of element j (e.g. A, B, C) in sample s_i
- C (N×M) is the coefficient matrix: C_ij = amount of factor f_j in sample s_i
- F (M×M) is the factor matrix: F_ij = amount of element j in factor f_i

The data can be approximated with only the p most important factors (those with the biggest coefficients): keep only the first p columns of C and the first p rows of F,

    S ≈ C'F',   with C' of size N×p (selected coefficients) and F' of size p×M (selected factors)

e.g., for p = 2 and M = 3, ignore factor f_3.

Singular Value Decomposition (SVD). Any N×M matrix S can be written as the product of three matrices,

    S = U Λ V^T

where U is N×N and satisfies U^T U = U U^T = I, V is M×M and satisfies V^T V = V V^T = I, and Λ is an N×M diagonal matrix of singular values, λ_i.

SVD decomposition of S:

    S = U Λ V^T = [U Λ] [V^T] = C F

So the coefficients are C = UΛ and the factors are F = V^T. The factors with the biggest λ_i's are the most important. (A sketch appears at the end of these notes.)

Transformations of factors. If you choose the p most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique, and can be changed through a transformation, T:

    F^new = T F^old

A requirement is that T^{-1} exists, else F^new will not span the same subspace as F^old:

    S = C F = C I F = (C T^{-1}) (T F) = C^new F^new

So you can try to implement more desirable factors by designing an appropriate transformation matrix, T.

9. Metropolis Algorithm and Simulated Annealing

The Metropolis algorithm is a method to generate a vector x of realizations of the distribution p(x). The process is iterative:
- start with an x, say x^(i)
- then randomly generate another x in its neighborhood, say x^(i+1), using a distribution Q(x^(i+1) | x^(i))
- then test whether you will accept the new x^(i+1)
- if it passes, append x^(i+1) to the vector x that you are accumulating; if it fails, append x^(i)

A reasonable choice for Q(x^(i+1) | x^(i)) is a normal distribution with mean x^(i) and a variance σ_x² that quantifies the sense of "neighborhood".

The acceptance test is as follows. First compute the quantity

    a = [ p(x^(i+1)) Q(x^(i) | x^(i+1)) ] / [ p(x^(i)) Q(x^(i+1) | x^(i)) ]

If a ≥ 1, always accept x^(i+1). If a < 1, accept x^(i+1) with probability a and accept x^(i) with probability 1-a.

Simulated annealing is an application of the Metropolis algorithm to nonlinear optimization: find the m that minimizes E(m) = e^T e, where e = d^obs - g(m). It is based on using the Boltzmann distribution for p in the Metropolis algorithm,

    p(m) = exp{ -E(m)/T }

where the temperature, T, is slowly decreased during the iterations. (A sketch appears at the end of these notes.)

10. Some Final Words

- Start simple! Examine a small subset of your data and look it over carefully.
- Build processing scripts incrementally, checking intermediate results at each stage.
- Make lots of plots and look them over carefully.
- Do reality checks.
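A numpy sketch of part 8's S = UΛV^T = CF factorization and the truncation to the p most important factors; the sample matrix is synthesized here from two assumed factors plus a little noise.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative sample matrix S (N samples, M elements), built from
# p = 2 underlying factors plus noise
N, M, p = 50, 3, 2
true_F = rng.random((p, M))
true_C = rng.random((N, p))
S = true_C @ true_F + 0.01 * rng.normal(size=(N, M))

# SVD: S = U Lambda V^T; coefficients C = U Lambda, factors F = V^T
U, lam, VT = np.linalg.svd(S, full_matrices=False)
C = U * lam               # scales column i of U by lambda_i
F = VT

print(lam)                # two large singular values, one near zero

# Keep only the p most important factors (biggest lambda_i)
Cp, Fp = C[:, :p], F[:p, :]
print(np.allclose(S, Cp @ Fp, atol=0.1))   # good approximation
```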
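A sketch of part 9's simulated annealing on an assumed one-parameter problem, d = sin(mt); the proposal width σ_x, the starting temperature, and the cooling schedule are illustrative choices, and convergence is hoped for rather than guaranteed, as the notes say.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed one-parameter nonlinear problem: d = g(m) = sin(m t)
t = np.linspace(0.0, 10.0, 100)
m_true = 1.3
d_obs = np.sin(m_true * t) + rng.normal(0.0, 0.1, t.size)

def E(m):
    # E(m) = e^T e with e = d_obs - g(m)
    e = d_obs - np.sin(m * t)
    return e @ e

# Simulated annealing: Metropolis on p(m) = exp(-E(m)/T) with a
# symmetric normal proposal Q (so the Q factors in a cancel), and
# temperature T slowly decreased during the iterations
m, T, sigma_x = 2.0, 10.0, 0.1     # start, initial T, neighborhood size
for it in range(5000):
    m_new = m + sigma_x * rng.normal()
    dE = E(m_new) - E(m)
    # a = p(m_new)/p(m) = exp(-dE/T); accept if a >= 1, else with prob. a
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        m = m_new
    T = max(0.01, T * 0.999)       # cooling schedule (illustrative)

print(m, m_true)                   # hoping for convergence
```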