Linear discrete inverse problems (parameter estimation)
Least squares and all that...

Least squares problems
Least squares is the basis of many parameter estimation and data fitting procedures. A concise tutorial can be found in Chapter 15 of Numerical Recipes, Press et al. (1992), Cambridge Univ. Press, available for free online at http://www.nrbook.com. A good explanation of the essentials is given in Aster et al. (2005).

Linear discrete inverse problems
Can a and b be resolved? That depends on whether the problem is under-determined, over-determined or even-determined.

Over-determined: Linear discrete inverse problem
To find the best-fit model we can minimize the prediction error of the solution. But the data contain errors. Let us assume these are independent and normally distributed; then we weight each residual inversely by the standard deviation of the corresponding (known) error distribution. We obtain a least squares solution by minimizing the weighted prediction error of the solution.

We seek the model vector m which minimizes

    $\chi^2(m) = \sum_{i=1}^{N}\left(\frac{d_i - (Gm)_i}{\sigma_i}\right)^2 = (d - Gm)^T C_D^{-1}(d - Gm)$

Compare with maximum likelihood: for Gaussian errors, minimizing this misfit is equivalent to maximizing the likelihood of the data. Note that this is a quadratic function of the model vector. Solution: differentiate with respect to m and solve for the model vector which gives a zero gradient. This gives the least-squares solution, a solution to the normal equations:

    $m_{LS} = (G^T C_D^{-1} G)^{-1} G^T C_D^{-1} d$

How does the least-squares solution compare to the standard equations of linear regression? Given N data $y_i$ with independent, normally distributed errors of standard deviation $\sigma_i$, what are the expressions for the model parameters $m = [a, b]^T$?

Linear discrete inverse problem: Least squares
What happens in the under- and even-determined cases?
Under-determined, N = 1: the matrix has a zero determinant (a zero eigenvalue), so an infinite number of solutions exist. What is m?
Even-determined, N = 2: what is the prediction error? Here $r = d - Gm = 0$; the prediction error is zero!

Example: Over-determined, Linear discrete inverse problem
The ballistics example. Given the data and noise, calculate G. Is the data fit good enough? And how do errors in the data propagate into the solution?

The two questions in parameter estimation
We have our fitted model parameters... but we are far from finished! We need to:
1. Assess the quality of the data fit. Goodness of fit: does the model fit the data to within the statistical uncertainty of the noise?
2. Estimate how errors in the data propagate into the model. What are the errors on the model parameters?

Goodness of fit
Once we have our least squares solution $m_{LS}$, how do we know whether the fit is good enough given the errors in the data? Use the prediction error at the least squares solution! If the data errors are Gaussian, this is a chi-square statistic:

    $\chi^2 = \sum_{i=1}^{N}\left(\frac{d_i - (G m_{LS})_i}{\sigma_i}\right)^2$

Why do we always assume errors are Gaussian?
The Central Limit Theorem. [Figure: probability density functions of five random variables $X_1, \dots, X_5$, estimated from 100000 deviates each; the densities approach a Gaussian.]

Mathematical Background: Probability distributions
A random variable x follows a Normal or Gaussian probability density function if

    $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

In the multivariate case,

    $f(d) = \frac{1}{(2\pi)^{N/2}|C_D|^{1/2}}\exp\left(-\frac{1}{2}(d - \bar{d})^T C_D^{-1}(d - \bar{d})\right)$

If the data covariance matrix $C_D$ is diagonal then the data errors are independent.
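The figure on the Central Limit Theorem slide is lost in this text version. As a stand-in, here is a minimal MATLAB sketch of the usual demonstration (my construction, not necessarily the original figure's recipe): the sum of n uniform deviates is compared against the Gaussian density with the matching mean and variance.

    % Central Limit Theorem demo: the sum of n independent uniform
    % deviates approaches a Gaussian as n grows.
    n = 5;                           % number of uniform deviates summed
    Q = 100000;                      % number of realizations, as in the slide figure
    X = sum(rand(Q, n), 2);          % Q realizations of X = U1 + ... + Un

    mu    = n / 2;                   % mean of a sum of n U(0,1) variables
    sigma = sqrt(n / 12);            % its standard deviation (Var(U) = 1/12)

    [counts, centers] = hist(X, 50); % empirical density from a histogram
    binw = centers(2) - centers(1);
    plot(centers, counts / (Q * binw), 'ko'); hold on
    x = linspace(min(X), max(X), 200);
    plot(x, exp(-(x - mu).^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma), 'r-');
    xlabel('X'); ylabel('probability density');
    legend('histogram of sums', 'Gaussian pdf');

Increasing n from 1 to 5 reproduces the progression the slide's figure showed: the density of the sum becomes visibly Gaussian after only a handful of terms.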
Mathematical Background: Probability distributions
The expectation operator $E[\cdot]$ gives the mean of a random variable. Expectation of a Gaussian random variable: $E[X] = \mu$. Variance: $\mathrm{Var}(X) = E[(X - E[X])^2]$. Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. If X and Y are independent then $\mathrm{Cov}(X, Y) = 0$.

Multi-dimensional Gaussian: the expectation of a Gaussian random vector is its mean vector, and the covariance matrix collects the variances and covariances of its components, e.g. for four variables

    $C_D = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} & \sigma_{1,4} \\ \sigma_{2,1} & \sigma_{2,2} & \sigma_{2,3} & \sigma_{2,4} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_{3,3} & \sigma_{3,4} \\ \sigma_{4,1} & \sigma_{4,2} & \sigma_{4,3} & \sigma_{4,4} \end{bmatrix}$

The correlation matrix is the covariance matrix with each entry normalized, $\rho_{i,j} = \sigma_{i,j}/\sqrt{\sigma_{i,i}\,\sigma_{j,j}}$. A zero off-diagonal entry means independence between X and Y; non-zero entries mean positive or negative correlation.

Background: Chi-square distribution
If x follows a Normal distribution, what distribution does the square of x follow? Answer: a chi-square distribution with 1 degree of freedom. If $x_1$ and $x_2$ are Normal, what distribution does $y = x_1^2 + x_2^2$ follow? Answer: a chi-square distribution with 2 degrees of freedom.

Goodness of fit
For Gaussian data errors the data prediction error is a sum of squares of Gaussian random variables, hence it has a chi-square probability density function with N - M degrees of freedom.

    ndf   χ²(5%)   χ²(50%)   χ²(95%)
      5     1.15      4.35     11.07
     10     3.94      9.34     18.31
     20    10.85     19.34     31.41
     50    34.76     49.33     67.50
    100    77.93     99.33    124.34

The test provides a means of testing the assumptions that went into producing the least squares solution. It gives the likelihood that the fit actually achieved is reasonable.

Example: Goodness of fit
The ballistics problem. Given the data and noise, how many degrees of freedom? ndf = N - M = 10 - 3 = 7. In practice, p-values between 0.1 and 0.9 are plausible.

Variance ratio
Another way of looking at the chi-square statistic is as the variance ratio of two distributions. Given two sets of random samples, did they come from the same distribution or not? The ratio of the variances of the two distributions follows a chi-square distribution, so chi-square tables tell us the likelihood that the two sets of observables come from the same distribution.

Goodness of fit
For Gaussian data errors the chi-square statistic has a chi-square distribution with $\nu$ = N - M degrees of freedom (see the table above).
Exercise:
- If I fit 7 data points with a straight line and get $\chi^2 = \dots$, what would you conclude?
- If I fit 102 data points with a straight line and get $\chi^2 = \dots$, what would you conclude?
- If I fit 52 data points with a straight line and get $\chi^2 = \dots$, what would you conclude?

What could be the cause if the prediction error is much too large (poor data fit)?
- Truly unlikely data errors
- Errors in the forward theory
- Under-estimated data errors
What if the prediction error is too small (too good a data fit)?
- Truly unlikely data errors
- Over-estimated data errors
- Fraud!

Solution error
Once we have our least squares solution $m_{LS} = G^{-g} d$ (where $G^{-g}$ denotes the generalized inverse) and we know that the data fit is acceptable, how do we find the likely errors in the model parameters arising from errors in the data? The data set we actually observed is only one realization of the many that could have been observed:

    $d' \rightarrow m'_{LS} = G^{-g} d'$
    $d + \epsilon \rightarrow m_{LS} + \epsilon_m = G^{-g}(d + \epsilon)$

The effect of adding noise to the data is to add noise to the solution,

    $\epsilon_m = G^{-g}\epsilon$

The model noise is a linear combination of the data noise!
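As a concrete version of the goodness-of-fit recipe above, here is a minimal MATLAB sketch. The variable names are mine, and G, d and sigma are assumed already defined; gammainc (the regularized incomplete gamma function, equal to the chi-square CDF after rescaling) is used so that no Statistics Toolbox is needed.

    % Weighted least squares fit and chi-square goodness-of-fit test.
    % Assumes column vectors d (N x 1), sigma (N x 1) and a matrix G (N x M).
    W   = diag(1 ./ sigma);          % weight each residual by 1/sigma_i
    Gw  = W * G;  dw = W * d;
    mLS = (Gw' * Gw) \ (Gw' * dw);   % solve the normal equations

    r    = (d - G * mLS) ./ sigma;   % weighted residuals
    chi2 = sum(r.^2);                % chi-square statistic
    nu   = length(d) - length(mLS);  % N - M degrees of freedom

    % P(chi-square(nu) <= chi2) via the regularized incomplete gamma
    % function, so the Statistics Toolbox is not required.
    P = gammainc(chi2 / 2, nu / 2);
    fprintf('chi2 = %.2f, ndf = %d, P = %.3f\n', chi2, nu, P);
    % As on the goodness-of-fit slide, P between roughly 0.1 and 0.9
    % suggests a plausible fit.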
Solution error: Model Covariance
The data errors follow a multivariate Gaussian distribution. How do we turn this into a probability distribution for the model errors? We know that the solution error is a linear combination of the data error, $\epsilon_m = G^{-g}\epsilon$. The covariance of any linear combination $Ad$ of Gaussian-distributed random variables d is $A C_D A^T$, so we have the covariance of the model parameters:

    $C_M = G^{-g} C_D (G^{-g})^T$

The model covariance for a least squares problem depends on the data errors and not on the data itself! G is controlled by the design of the experiment. For the least squares solution, $G^{-g} = (G^T C_D^{-1} G)^{-1} G^T C_D^{-1}$, which gives

    $C_M = (G^T C_D^{-1} G)^{-1}$

The data error distribution gives a model error distribution!

For the special case of independent data errors, $C_D = \sigma^2 I$ and $C_M = \sigma^2 (G^T G)^{-1}$: independent data errors still produce correlated model errors. The standard error formulas for the linear regression problem follow from this expression.

Confidence intervals by projection (1-D)
The model covariance $C_M$ is a symmetric M × M matrix. In the multi-dimensional model space the value of

    $D^2 = (m - m_{LS})^T C_M^{-1} (m - m_{LS})$

follows a chi-square distribution with M degrees of freedom. Projecting onto the $m_i$ axis, the 1-D confidence interval becomes

    $m_i \pm \sqrt{D^2\,(C_M)_{ii}}$

where $D^2$ now follows a $\chi^2_1$ distribution; e.g. for the 90% interval on $m_1$, $D^2 = 2.71$. Note this is the 90% confidence interval on $m_1$ alone; the joint $(m_1, m_2)$ 90% confidence ellipse is wider than this.

Example: Model Covariance and confidence intervals
For the ballistics problem, the 95% confidence interval for parameter i is $m_i \pm 1.96\sqrt{(C_M)_{ii}}$.

Confidence intervals by projection
The M-dimensional confidence ellipsoid can be projected onto any subset (or combination) of D parameters to obtain the corresponding confidence ellipsoid:
- Full M-dimensional ellipsoid: $(m - m_{LS})^T C_M^{-1} (m - m_{LS}) \le \Delta^2$
- Projected D-dimensional ellipsoid: $(m' - m'_{LS})^T (C'_M)^{-1} (m' - m'_{LS}) \le \Delta^2$
Here $m'$ is the projected model vector, $C'_M$ the corresponding submatrix of the covariance matrix, and $\Delta^2$ the chosen percentage point of the $\chi^2_D$ distribution. To find the 90% confidence ellipse for (x, y) from a 3-D (x, y, z) ellipsoid, take the 2 × 2 (x, y) submatrix of $C_M$ and the 90% point of the $\chi^2_2$ distribution. Can you see that this procedure gives the same formula for the 1-D case obtained previously?

Recap: Goodness of fit and model covariance
- Once a best fit solution has been obtained, test goodness of fit with a chi-square test (assuming Gaussian statistics).
- If the model passes the goodness of fit test, we may proceed to evaluating model covariance (if not, then your data errors are probably too small).
- Evaluate the model covariance matrix.
- Plot the model, or projections of it onto chosen subsets of parameters.
- Calculate confidence intervals using the projection equation, where $D^2$ follows a $\chi^2_1$ distribution.

Robust data fitting with the L1 norm
Least squares solutions are not robust to outliers. Instead, minimize the 1-norm misfit

    $\sum_{i=1}^{N}\frac{|d_i - (Gm)_i|}{\sigma_i}$

We can calculate an L1 solution with the IRLS (iteratively reweighted least squares) algorithm, in which a diagonal weighting matrix R that depends on the current model is updated at each iteration. See section 2.4 of Aster et al. (2005).

Monte Carlo error propagation
It is possible to define an approximate p statistic for the L1 solution and hence test the goodness of fit of the solution. However, there is no analytical solution for error propagation... but Monte Carlo error propagation can be used (a sketch follows below):
1. Calculate the data prediction from the solution.
2. Add a random realization of the noise to the data and repeat the IRLS algorithm.
3. Repeat Q times and generate difference vectors.
For the ballistics problem, the resulting error estimates can be compared to the LS solution without the outlier.
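The following MATLAB sketch combines the IRLS algorithm and the Monte Carlo loop above. It is an illustrative reading of the cited algorithm (Aster et al., section 2.4), not the original course code; G, d and sigma are assumed given, and the iteration limits and tolerances are arbitrary choices.

    % Robust L1 fit by IRLS, plus Monte Carlo error propagation.
    % Local functions at the end of a script require MATLAB R2016b+.
    Gw = diag(1 ./ sigma) * G;       % pre-weight so the misfit is
    dw = d ./ sigma;                 % sum |d_i - (G m)_i| / sigma_i

    mL1  = irls_l1(Gw, dw);          % robust solution
    dpre = G * mL1;                  % 1. data predicted by the solution

    Q  = 1000;                       % number of Monte Carlo realizations
    dm = zeros(length(mL1), Q);
    for q = 1:Q
        dq      = dpre + sigma .* randn(size(dpre));  % 2. add a noise realization
        dm(:,q) = irls_l1(Gw, dq ./ sigma) - mL1;     % 3. difference vectors
    end
    Cm = (dm * dm') / Q;             % empirical model covariance

    function m = irls_l1(G, d)
    % IRLS for min ||d - G m||_1: at each step solve a weighted least
    % squares problem with diagonal weights 1/|r_i|.
    m = (G' * G) \ (G' * d);         % start from the L2 solution
    for k = 1:100
        r = d - G * m;
        R = diag(1 ./ max(abs(r), 1e-8));   % guard against division by zero
        mnew = (G' * R * G) \ (G' * R * d);
        if norm(mnew - m) <= 1e-6 * (1 + norm(m)), m = mnew; break, end
        m = mnew;
    end
    end

The diagonal matrix R plays the role of the model-dependent weighting matrix in the slide: each iteration downweights data with large residuals, which is what makes the L1 solution insensitive to outliers.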
What if we do not know the errors on the data?
Both chi-square goodness of fit tests and model covariance calculations require knowledge of the variance of the data. What can we do if we do not know it? Consider the case $C_D = \sigma^2 I$ of independent data errors, with the variance calculated from the least squares solution:

    $s^2 = \frac{1}{N - M}\sum_{i=1}^{N}\left(d_i - (G m_{LS})_i\right)^2$

So we can still estimate model errors using the calculated data errors, but we can no longer claim anything about goodness of fit.

Example: Over-determined, Linear discrete inverse problem
MATLAB exercise: generate data with Gaussian noise for a linear regression problem and invert for the best fitting gradient and intercept (a sketch solution is given at the end of these notes):
1. Generate $x_i$ points randomly between 0 and 10.
2. Calculate data $y_i$.
3. Add $N(0, \sigma^2)$ noise to the data $y_i$.
4. Calculate the G matrix.
5. Use MATLAB matrix routines to solve the normal equations.
6. Plot the data, plot the data errors and plot the least squares solution.

Model Resolution matrix
If we obtain a solution to an inverse problem we can ask what its relationship is to the true solution. We have $m_{est} = G^{-g} d$, but we know $d = G m_{true}$, and hence

    $m_{est} = G^{-g} G m_{true} = R\, m_{true}$

The matrix $R = G^{-g} G$ measures how 'good an inverse' $G^{-g}$ is: it shows how the elements of $m_{est}$ are built from linear combinations of the true model $m_{true}$, and hence measures the amount of blurring produced by the inverse operator. For the least squares solution we have

    $G^{-g} = (G^T C_D^{-1} G)^{-1} G^T C_D^{-1} \;\Rightarrow\; R = I$

Example: Model resolution in a tomographic experiment
$m_{est} = R\, m_{true}$. Suppose the calculated model resolution matrix looked like this:

    $R = \begin{bmatrix} 0.75 & -0.25 & 0.25 & 0.25 \\ -0.25 & 0.75 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.75 & -0.25 \\ 0.25 & 0.25 & -0.25 & 0.75 \end{bmatrix}$

A spike test (a true model with a single non-zero element) then shows how the recovered model is blurred by R. What units do the elements of R have?

Data Resolution matrix
If we obtain a solution to an inverse problem we can ask how it compares to the data. We have $d_{pre} = G m_{est}$, but we know $m_{est} = G^{-g} d_{obs}$, and hence

    $d_{pre} = G G^{-g} d_{obs} = D\, d_{obs}$

The matrix D is analogous to the model resolution matrix R, but measures how independently the model produced by $G^{-g}$ can reproduce the data. If D = I then the data are fit exactly and the prediction error $d - Gm$ is zero.

Recap: Linear discrete inverse problems
- The least squares solution minimizes the prediction error.
- Goodness of fit criteria tell us whether the least squares model adequately fits the data, given the level of noise: chi-square with N - M degrees of freedom.
- The covariance matrix describes how noise propagates from the data to the estimated model: chi-square with M degrees of freedom; gives confidence intervals.
- The resolution matrix describes how the estimated model relates to the true model.
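Appendix: a possible solution to the MATLAB exercise above. This is a sketch only; the true intercept and gradient, the noise level and the number of points are arbitrary choices of mine.

    % Linear regression exercise: simulate noisy data, then recover
    % intercept a and gradient b by least squares.
    N     = 20;
    a     = 2.0;  b = 0.5;           % true parameters (arbitrary choices)
    sigma = 0.5 * ones(N, 1);        % known data standard deviations

    x = 10 * rand(N, 1);             % 1. x_i random in [0, 10]
    y = a + b * x;                   % 2. exact data
    d = y + sigma .* randn(N, 1);    % 3. add N(0, sigma^2) noise

    G = [ones(N, 1), x];             % 4. G matrix for d = a + b*x
    m = (G' * G) \ (G' * d);         % 5. normal equations (equal sigmas,
                                     %    so the weights cancel here)

    errorbar(x, d, sigma, 'ko'); hold on    % 6. data with error bars
    xf = [0 10];
    plot(xf, m(1) + m(2) * xf, 'r-');       %    fitted line
    xlabel('x'); ylabel('y'); legend('data', 'least squares fit');

With the fitted m in hand, the goodness-of-fit and model covariance sketches earlier in these notes can be applied directly to this G, d and sigma.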