1. Given a set of data (x_i, y_i), 1 ≤ i ≤ N, we seek to find a representation of the data ŷ_i such that
$$ S = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $$
is minimized. Geometrically this minimizes the sum of the squares of the distances between the observed points and the proposed model points ŷ_i.
2. Developed in the fields of Astronomy and Geodesy:
a) Combining different observations as the best estimate of the true value, with errors
decreasing on aggregation - first expressed by Roger Cotes in 1722.
b) Method of averages - combining different observations under the same
conditions. Used by Tobias Mayer while studying librations of the moon
in 1750 and by Laplace in explaining the differences in motion of Jupiter
and Saturn in 1788.
c) Combination of different observations taken under different conditions - Roger Joseph Boscovich in 1757 and Laplace in 1788.
d) Development of a criterion that can be evaluated to determine when the
solution with the minimum error has been achieved - Laplace.
e) First clear and precise explanation by Legendre in 1805, though in 1809 Gauss
published a method for calculating the orbits of celestial bodies and claimed to
have known about the method since 1795. However, Gauss went beyond Legendre and
invented the Gaussian or normal distribution. He used the method to predict the
future location of the newly discovered asteroid Ceres.
f) In 1810, after reading Gauss's work and having proved the central limit theorem,
Laplace gave a large-sample justification for the method of least squares and the
normal distribution.
g) In 1822, Gauss showed that the least squares approach to regression analysis is
optimal in the sense that in a linear model where the errors have mean zero, are
uncorrelated, and have equal variances, the best linear unbiased estimator of the
coefficients is the least squares estimator - the Gauss-Markov theorem.
3. Linear least squares: for the data above, fit a linear model y = a + bx. Typically,
we have to specify an error distribution: y = a + bx + ǫ, where ǫ ∼ N(0, σ²). Here,
x is the independent variable and y is the dependent or response variable.
a) Minimize the error
$$ S = \sum_{i=1}^{N} \left( y_i - (a + b x_i) \right)^2. $$
b) So find the a, b such that
$$ \frac{\partial S}{\partial a} = 0, \qquad \frac{\partial S}{\partial b} = 0. $$
c) Find the theoretical values for the least squares estimates for â, b̂.
d) Define
$$ SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, $$
$$ \hat{b} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad
   \hat{a} = \bar{y} - \hat{b}\bar{x} = \frac{\sum_{i=1}^{N} y_i}{N} - \hat{b}\,\frac{\sum_{i=1}^{N} x_i}{N}. $$
Note some other definitions:
$$ SS_{xx} = \sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}, $$
$$ SS_{xy} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{N} x_i y_i - \frac{\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)}{N}, $$
$$ SS_{yy} = \sum_{i=1}^{N} (y_i - \bar{y})^2 = \sum_{i=1}^{N} y_i^2 - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}. $$
Note,
$$ \hat{b} = \frac{SS_{xy}}{SS_{xx}}, \qquad
   \mathrm{cov}(x, y) = \frac{SS_{xy}}{n - 1}, \qquad
   s_x^2 = \frac{SS_{xx}}{n - 1}, \qquad
   \hat{b} = \frac{\mathrm{cov}(x, y)}{s_x^2}. $$
The standard error of the regression estimates σ and is given by
$$ S_{YX} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}}. $$
The standard error on the estimate of b, b̂, is
$$ s_{\hat{b}} = \sqrt{\frac{\frac{1}{n-2} \sum_{i=1}^{n} \hat{\epsilon}_i^{\,2}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}, $$
where ǫ̂_i = y_i − ŷ_i. The standard error on the intercept a, â, is
$$ s_{\hat{a}} = s_{\hat{b}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}. $$
(These estimates are illustrated in the short numerical sketch at the end of this list.)
e) Here you are estimating a, b with â, b̂ - a case of parameter estimation.
Note we assume the errors for each observation are independent of each other
and have constant variance (homoskedasticity).
f) The residual sum of squares is
$$ SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. $$
The regression sum of squares (SS_R) and the total sum of squares (SS_T) are
$$ SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2. $$
g) Also (in some cases - simple linear regression)
$$ SS_T = SS_R + SS_{res}. $$
h) The coefficient of determination is defined as
$$ R^2 = 1 - \frac{SS_{res}}{SS_T}. $$
Therefore in many cases a value of R² close to 1 means that the regression
does a good job of explaining the variance in the data. In linear least squares
with an intercept term, R² equals the square of the Pearson correlation
coefficient between the observed and predicted values of the dependent variable.
i) R² gives some information about the goodness of fit of a model. In regression,
it is a measure of how well the regression line approximates the real data
points. R² = 1 suggests the regression line fits the data perfectly.
j) In many cases R² increases as we increase the number of variables in the
model. Obviously N data points can be explained with a model with N parameters.
Hence we have the adjusted R². This is almost the same as before but penalizes
the statistic as extra variables are added. The adjusted R² is denoted by R̄² and is
$$ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}, $$
where p is the total number of regressors or independent variables in the
model. R̄² can be negative. An unbiased estimate of σ² is
$$ \hat{\sigma}^2 = \frac{SS_{res}}{n - 2} = MSE_{res}. $$
k) So far we have not made any use of the normality assumption for the error
ǫ. Hence up to now we only need homoskedasticity, or constant variance.
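To make the formulas in part d) concrete, here is a minimal Python sketch (illustrative, not part of the original notes) that computes â, b̂, their standard errors, the standard error of the regression, and R² directly from the defining sums; the sample data at the bottom are assumed purely for demonstration.

```python
import numpy as np

def simple_least_squares(x, y):
    """Least squares fit of y = a + b*x using the SSxx/SSxy sums."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()

    SSxx = np.sum((x - xbar) ** 2)
    SSxy = np.sum((x - xbar) * (y - ybar))
    SSyy = np.sum((y - ybar) ** 2)

    b_hat = SSxy / SSxx                      # slope
    a_hat = ybar - b_hat * xbar              # intercept

    y_hat = a_hat + b_hat * x
    SSE = np.sum((y - y_hat) ** 2)           # residual sum of squares

    s_yx = np.sqrt(SSE / (n - 2))            # standard error of the regression
    s_b = s_yx / np.sqrt(SSxx)               # standard error of b_hat
    s_a = s_b * np.sqrt(np.sum(x ** 2) / n)  # standard error of a_hat

    r2 = 1.0 - SSE / SSyy                    # coefficient of determination
    return a_hat, b_hat, s_a, s_b, s_yx, r2

# Illustrative data (assumed, not from the notes)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(simple_least_squares(x, y))
```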
8. Using the normality assumption, we have
$$ t = \frac{\hat{b} - b}{s_{\hat{b}}} \sim t_{n-2}, $$
which is a Student's t distribution with n − 2 degrees of freedom. We can
construct a confidence interval for the slope b as
$$ b \in [\hat{b} - s_{\hat{b}}\, t^*_{n-2},\; \hat{b} + s_{\hat{b}}\, t^*_{n-2}] $$
at confidence level (1 − γ), where t*_{n−2} is the (1 − γ/2) quantile of the t_{n−2}
distribution. Similarly, a confidence interval for the intercept a is
$$ a \in [\hat{a} - s_{\hat{a}}\, t^*_{n-2},\; \hat{a} + s_{\hat{a}}\, t^*_{n-2}] $$
at confidence level (1 − γ).
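A minimal sketch of this confidence interval construction, assuming the slope estimate b_hat and its standard error s_b have already been computed (for example with the sketch after item 3); scipy.stats.t.ppf supplies the t*_{n−2} quantile, and the numbers in the example call are illustrative.

```python
from scipy import stats

def slope_confidence_interval(b_hat, s_b, n, gamma=0.05):
    """(1 - gamma) confidence interval for the slope b."""
    t_star = stats.t.ppf(1.0 - gamma / 2.0, df=n - 2)  # (1 - gamma/2) quantile of t_{n-2}
    return b_hat - s_b * t_star, b_hat + s_b * t_star

# Example (illustrative numbers): 95% interval for a slope estimate
print(slope_confidence_interval(b_hat=1.97, s_b=0.08, n=5))
```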
12. In the more general case with more than 2 parameters, we specify the model as
$$ y_i = \sum_{j=1}^{n} X_{ij} \beta_j, \qquad (i = 1, \ldots, m). $$
That is, we have m linear equations in n unknown coefficients β_1, β_2, ..., β_n. Written
in matrix form this is
$$ X\beta = y. $$
Minimizing the sum of squares leads to the normal equations for the least
squares estimate β̂,
$$ (X^T X)\hat{\beta} = X^T y. $$
The solution is
$$ \hat{\beta} = (X^T X)^{-1} X^T y. $$
There is a large literature on the solution of these equations. This is sometimes
referred to as the general linear model. Maximum likelihood estimation
with normally distributed errors is equivalent to least squares.
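As an illustration of the normal equations above, a short NumPy sketch follows; the design matrix and response are assumed for demonstration, and np.linalg.lstsq is shown as the numerically preferable alternative to forming (X^T X)^{-1} explicitly.

```python
import numpy as np

# Illustrative design matrix X (m observations, n parameters) and response y
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.uniform(0, 1, 10)])  # intercept + one regressor
y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.1, 10)

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically more stable alternative to inverting X^T X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)
```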
13. In cases where the variance differs from observation to observation, the
residual sum of squares can be expressed as
$$ S = \sum_{i=1}^{n} W_{ii}\, r_i^2, $$
where
$$ W_{ii} = \frac{1}{\sigma_i^2} $$
and r_i = y_i − ŷ_i. If the weight matrix W_{ij} is diagonal (observational errors are
uncorrelated), then the normal equations become
$$ (X^T W X)\hat{\beta} = X^T W y. $$
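A sketch of these weighted normal equations, assuming uncorrelated errors so that W is diagonal with W_ii = 1/σ_i²; the data, uncertainties, and straight-line design matrix are illustrative assumptions.

```python
import numpy as np

def weighted_least_squares(X, y, sigma):
    """Solve (X^T W X) beta = X^T W y with W = diag(1/sigma_i^2)."""
    W = np.diag(1.0 / np.asarray(sigma) ** 2)
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Illustrative straight-line fit with per-point uncertainties
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.1, 0.2, 0.1, 0.3])
X = np.column_stack([np.ones_like(x), x])   # columns: intercept, slope
print(weighted_least_squares(X, y, sigma))  # estimates of a, b
```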
14. If we call the uncertainty on a given observation σ_i, then the method of
least squares amounts to minimizing
$$ \chi^2 = \sum_i \left[ \frac{y_i - a - b x_i}{\sigma_i} \right]^2. $$
Under appropriate assumptions, this χ² is distributed as a χ² variable with
n − 2 degrees of freedom (what assumptions are these?). So in some cases least
squares is equivalent to minimizing χ². In these situations, we can use the
value of χ² as a measure of goodness of fit. In higher dimensional problems,
one can plot "contours of constant χ²".
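The sketch below evaluates this χ² for a straight-line fit and, under the assumptions referred to above, converts it to a tail probability with n − 2 degrees of freedom using scipy.stats.chi2.sf; the data, uncertainties, and fitted parameters are illustrative.

```python
import numpy as np
from scipy import stats

def chi2_of_fit(x, y, sigma, a, b):
    """chi^2 = sum_i [(y_i - a - b*x_i) / sigma_i]^2 for a fitted line."""
    residuals = (y - (a + b * x)) / sigma
    return np.sum(residuals ** 2)

# Illustrative data, uncertainties, and fitted parameters
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 7.8, 10.2])
sigma = np.full_like(x, 0.2)
a, b = 0.1, 2.0

chi2 = chi2_of_fit(x, y, sigma, a, b)
dof = len(x) - 2                       # two fitted parameters
p_value = stats.chi2.sf(chi2, dof)     # probability of a chi^2 this large or larger
print(chi2, p_value)
```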
15. Least Squares fit to a Polynomial
a) Suppose we want to fit
$$ y(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_m x^{m-1}, $$
or more generally
$$ y(x) = \sum_{k=1}^{m} a_k f_k(x), $$
where the functions f_k(x) could be powers of x but do not involve the
parameters a_i. Under normality assumptions, we have
$$ P(a_1, \ldots, a_m) = \prod_i \left( \frac{1}{\sigma_i \sqrt{2\pi}} \right)
   \exp\left[ -\frac{1}{2} \sum_i \frac{1}{\sigma_i^2} \Big[ y_i - \sum_{k=1}^{m} a_k f_k(x_i) \Big]^2 \right], $$
so that
$$ \chi^2 = \sum_i \frac{1}{\sigma_i^2} \Big[ y_i - \sum_{k=1}^{m} a_k f_k(x_i) \Big]^2, $$
and the method of least squares amounts to minimizing this expression. Problem:
show that under the normal distribution, the maximum likelihood estimator is the
least squares estimator. Consider a model
$$ y_i = a_1 + a_2 x_i + a_3 x_i^2, $$
for 2l measurements (x_i, y_i), i = 1, ..., 2l, with each measurement having
standard deviation σ_i, and the observations being normally distributed.
Formulate the least squares equations and develop a matrix equation for
the unknown parameters a_1, a_2, a_3.
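For the problem just stated, the following sketch shows one way to set up and solve the matrix equation for the parameters: the design matrix has columns f_k(x) = x^(k−1), and weighting each row by 1/σ_i makes ordinary least squares minimize the χ² above. The quadratic test data are assumed for illustration.

```python
import numpy as np

def polynomial_chi2_fit(x, y, sigma, degree=2):
    """Minimize chi^2 for y(x) = a_1 + a_2 x + ... by solving the weighted least squares problem."""
    x, y, sigma = map(np.asarray, (x, y, sigma))
    # Design matrix with basis functions f_k(x) = x^(k-1)
    A = np.vander(x, degree + 1, increasing=True)
    # Divide each row by sigma_i so ordinary least squares minimizes chi^2
    Aw = A / sigma[:, None]
    yw = y / sigma
    coeffs, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
    return coeffs  # [a_1, a_2, ..., a_{degree+1}]

# Illustrative quadratic data with constant uncertainty
x = np.linspace(0.0, 4.0, 9)
y = 1.0 + 0.5 * x + 0.25 * x ** 2
sigma = np.full_like(x, 0.1)
print(polynomial_chi2_fit(x, y, sigma, degree=2))
```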
16. Examples:
a) Look at the following data representing the potential difference as a function of position along a current-carrying wire: (position(cm), voltage(V))
(10, 0.37), (20.0, 0.58), (30.0, 0.83), (40.0, 1.15), (50.0, 1.36),
(60.0, 1.62), (70.0, 1.90), (80.0, 2.18), (90.0, 2.45).
Is the voltage linearly related to the position in the wire?
b) Number of counts detected in 7.5 min. intervals as a function of distance
from source: (distance(m), Counts),
(0.2, 901), (0.25, 652), (0.30, 443), (0.35, 339), (0.40, 283),
(0.45, 281), (0.50, 240), (0.60, 220), (0.75, 180), (1.0, 154).
Is the number of counts linearly related to the inverse square of the distance?
d) Derive a formula for making a linear fit to data with an intercept at the
origin so that y = bx. Apply your method to fit a straight line through
the origin to the following coordinate pairs, assuming uniform uncertainties
(σ_i = 1.5) in y_i. Find χ² of the fit and the uncertainty in b (see the sketch after this list).
e) A student measures the temperature (T) of water in an insulated flask at
times (t) separated by 1 minute and obtains (t(min), T(°C)),
(0, 98.51), (1, 98.50), (2, 98.50), (3, 98.49), (4, 98.52),
(5, 98.49), (6, 98.52), (7, 98.45), (8, 98.47).
f) Calculate the mean temperature and its standard error.
g) Plot a graph of temperature vs. time and make a least squares fit of a straight
line to the data. Is there a statistically significant slope to the graph?
h) The intercept is not equal to the mean value of the temperature you calculated. Now shift the time coordinate so that the mean time is 0. Refit
the data with the new values of t. Is the intercept now identical to the
mean value of T?
i) Show that, if the mean value of x is equal to zero, then the intercept â
calculated from least squares is identical to the mean value of y.
j) Example: Cepheid Period-Luminosity Relation.
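As referenced in example d), here is a sketch of the fit through the origin, y = bx: minimizing the sum of squares with uniform uncertainty σ gives b̂ = Σ x_i y_i / Σ x_i², with uncertainty σ_b = σ/√(Σ x_i²). The coordinate pairs below are placeholders, since the example's data are not reproduced in this transcript.

```python
import numpy as np

def fit_through_origin(x, y, sigma=1.5):
    """Fit y = b*x with uniform uncertainty sigma on each y_i."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b_hat = np.sum(x * y) / np.sum(x ** 2)          # least squares slope
    s_b = sigma / np.sqrt(np.sum(x ** 2))           # uncertainty on b_hat
    chi2 = np.sum(((y - b_hat * x) / sigma) ** 2)   # goodness of fit
    return b_hat, s_b, chi2

# Placeholder data (the example's coordinate pairs are not listed in the transcript)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.3, 4.1, 6.2, 7.9])
print(fit_through_origin(x, y, sigma=1.5))
```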