Linear Regression (by Dr. Tomas Co, 5/8/2008)

Definition: Linear regression refers to the process of fitting a linear model of the form given by equation (1) to a set of given data,

$$y = a_0 + a_1 x_1 + \cdots + a_n x_n \tag{1}$$

where $x_1, \ldots, x_n$ are the independent variables, $y$ is the dependent variable, and $a_0, a_1, \ldots, a_n$ are the coefficients of the model. The simplest case is the equation of a line, when $n = 1$, often written as

$$y = m x + b \tag{2}$$

where $m$ is the slope and $b$ is the intercept.

Approaches: Let the $i$th data point be denoted by $(x_{1,i}, \ldots, x_{n,i}, y_i)$, $i = 1, \ldots, N$, where $N$ is the number of data points, which must be larger than $n + 1$.

Method 1: Matrix Calculations

1. Set up matrix M and vector h. (Note: the size of M is $N$ rows by $n + 1$ columns, while the length of vector h is $N$.)

$$M = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{n,1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,N} & \cdots & x_{n,N} \end{pmatrix}, \qquad h = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} \tag{3}$$

2. Set up solution vector v. (Note: the length of column vector v is $n + 1$.)

$$v = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix} \tag{4}$$

3. Solve for v using the least squares formula (aka the normal equation),

$$M v \approx h \quad \Rightarrow \quad v = (M^T M)^{-1} M^T h \tag{5}$$

Figure 1. Linear regression via matrix calculations. (Note: this is an array formula, i.e. select the range for vector v and then use [CTRL Shift ENTER].)

Method 2: Using the LINEST Function

1. Set up the data and the solution region.

Figure 2. Setup for data and solution region.

2. Select the solution region and then enter the LINEST function (Note: use [CTRL Shift ENTER] to invoke the array function) as shown in Figure 3.

Figure 3. Implement the LINEST function.

The range B4:B12 is the set of $y$ values, while the range C4:D12 holds the $x_1$ and $x_2$ values. The third argument is set to TRUE to mean that we need the value of $a_0 \neq 0$; otherwise, we set it to FALSE if we want to force $a_0 = 0$. The fourth argument is set to FALSE to mean that we are not requesting the calculation of statistical parameters. (See the section below for the case when the fourth argument is set to TRUE.)

Statistics from LINEST:

1. Some statistics are available when the fourth argument of LINEST is set to TRUE. To access these, one needs to expand the result region to include four more rows, as shown in Figure 4.

2.
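The result region has $n + 1$ columns and 5 rows, as described in Table 1.

Table 1. Regression and statistics resulting from LINEST.

| Row | Column 1 | Column 2 | ... | Column $n+1$ |
|---|---|---|---|---|
| 1 | $a_n$ | $a_{n-1}$ | ... | $a_0$ |
| 2 | $\Delta a_n$ | $\Delta a_{n-1}$ | ... | $\Delta a_0$ |
| 3 | $R^2$ | $\sigma_{res}$ | | |
| 4 | $F$ | $\nu$ | | |
| 5 | $SS_{reg}$ | $SS_{res}$ | | |

The first row still contains the coefficients $a_n, \ldots, a_1, a_0$. The second row contains the standard error values for each corresponding coefficient. For example, the coefficient reported directly above cell F5 has a standard error of 0.0802, given in cell F5. The other results are summarized in Table 2, where the $i$th regressed data value is denoted by $\hat{y}_i$ and the average of $y$ is denoted by $\bar{y}$, i.e.

$$\hat{y}_i = a_0 + a_1 x_{1,i} + \cdots + a_n x_{n,i} \tag{6}$$

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{7}$$

Table 2. Description of the LINEST Statistics.

| Item | Description | Formula | Application |
|---|---|---|---|
| $R^2$ | Coefficient of determination | $\dfrac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$ | Range: $0 \le R^2 \le 1$. If close to 1, the model is good. |
| $\sigma_{res}$ | Standard deviation of the residuals | $\sqrt{SS_{res} / \nu}$ | A smaller value means the model-predicted points are close to the data points. |
| $F$ | F-distribution value | $\dfrac{SS_{reg} / n}{SS_{res} / \nu}$ | If greater than $F_{critical}(\alpha, n, \nu)$, where $n$ is the number of coefficients minus one, then the regression is justified with an $\alpha$-confidence level. |
| $\nu$ | Degrees of freedom for residuals | $\nu = N - n - 1$ | If zero, then the solution demands an equality instead of an approximation. |
| $SS_{reg}$ | Sum of squares of regressor errors | $\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$ | Indicates variability of the regressed data. |
| $SS_{res}$ | Sum of squares of residual errors | $\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$ | Indicates variability of the residuals. |

Remarks:

a) The formula for the standard errors of the coefficients is given by

$$\Delta a_i = \sigma_{res} \sqrt{Q_{ii}} \tag{8}$$

where $Q_{ii}$ is the $i$th diagonal entry (counting from $i = 0$) of $Q = (M^T M)^{-1}$, with matrix $M$ as defined in equation (3).

b) To obtain a 95% confidence interval for each parameter, we multiply the standard error by the t-distribution value corresponding to the desired confidence level. In Excel, this can be found using the TINV function. For example, for a 95% confidence interval of one of the coefficients in the example above (written here as $a_j$), we have

$$t_{95} = \mathrm{TINV}(1 - 0.95, \nu) = 2.4469$$

$$\Delta a_{j,95\%} = \Delta a_j \cdot t_{95} = 0.2007$$

so the 95% confidence interval is $a_j = -3.0202 \pm 0.2007$.

c) To perform the F-test, we can also use the FINV function in Excel. For our example, $F_{95} = \mathrm{FINV}(1 - 0.95, n, \nu) = 5.1432$.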
Since $F = 1509.3 > 5.1432$, we can justify the proposed linear model with 95% confidence.
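
As a cross-check outside Excel, the matrix method of equation (5) and the statistics of Table 2 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the spreadsheet from the figures: the function names (`invert`, `fit_linear`, `regression_stats`) and the four-point data set are hypothetical, chosen so the fit can be checked by hand. It assumes $\nu > 0$ and that $M^T M$ is invertible, and it returns the coefficients in the order of equation (4), which is the reverse of the LINEST row order.

```python
import math


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(A)
    # Augment A with the identity matrix: [A | I]
    aug = [[float(x) for x in A[i]] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M^T M is singular; the data do not determine the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * c for x, c in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def fit_linear(xs, ys):
    """Return (v, Q) with v = [a0, a1, ..., an] and Q = (M^T M)^-1.

    xs: list of N rows, each holding the n independent-variable values
    ys: list of the N dependent-variable values
    """
    M = [[1.0] + [float(x) for x in row] for row in xs]  # N rows, n+1 columns
    k = len(M[0])
    MtM = [[sum(r[i] * r[j] for r in M) for j in range(k)] for i in range(k)]
    Mth = [sum(r[i] * y for r, y in zip(M, ys)) for i in range(k)]
    Q = invert(MtM)
    v = [sum(Q[i][j] * Mth[j] for j in range(k)) for i in range(k)]
    return v, Q


def regression_stats(xs, ys):
    """Compute the quantities listed in Table 2 plus the standard errors."""
    v, Q = fit_linear(xs, ys)
    N, n = len(ys), len(v) - 1
    y_hat = [v[0] + sum(a * x for a, x in zip(v[1:], row)) for row in xs]
    y_bar = sum(ys) / N
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_res = sum((yh - y) ** 2 for yh, y in zip(y_hat, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    nu = N - n - 1                       # degrees of freedom, assumed > 0
    sigma_res = math.sqrt(ss_res / nu)
    return {
        "coefficients": v,               # a0 first, as in equation (4)
        "std_errors": [sigma_res * math.sqrt(Q[i][i]) for i in range(n + 1)],
        "r2": ss_reg / ss_tot,
        "sigma_res": sigma_res,
        "F": (ss_reg / n) / (ss_res / nu),
        "nu": nu,
        "ss_reg": ss_reg,
        "ss_res": ss_res,
    }


# Hypothetical data: the line y = 1 + 2x plus a disturbance that is
# orthogonal to both columns of M, so the fitted line is recovered exactly.
sample_x = [[0], [1], [2], [3]]
sample_y = [2, 2, 4, 8]
for name, value in regression_stats(sample_x, sample_y).items():
    print(name, value)
```

For this sample data the fit gives $a_0 = 1$ and $a_1 = 2$ (up to rounding), with $SS_{reg} = 20$, $SS_{res} = 4$, $\nu = 2$, and $F = 10$. The critical values from TINV and FINV are not computed here, since they require the inverse t and F distributions, which the Python standard library does not provide.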