Linear Regression
Industrial Engineering Majors
Authors: Autar Kaw, Luke Snyder
http://numericalmethods.eng.usf.edu
Transforming Numerical Methods Education for STEM Undergraduates
8/11/2017

What is Regression?

Given $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, best fit $y = f(x)$ to the data. The best fit is generally based on minimizing the sum of the squares of the residuals, $S_r$.

Residual at a point $x_i$:
$$\varepsilon_i = y_i - f(x_i)$$

Sum of the squares of the residuals:
$$S_r = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$

Figure. Basic model for regression.

Linear Regression – Criterion #1

Given $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, best fit $y = a_0 + a_1 x$ to the data, where the residual at a typical point $x_i$ is
$$\varepsilon_i = y_i - (a_0 + a_1 x_i)$$

Figure. Linear regression of y vs. x data showing residuals at a typical point, x_i.

Does minimizing $\sum_{i=1}^{n} \varepsilon_i$ work as a criterion?

Example for Criterion #1

Given the data points (2,4), (3,6), (2,6) and (3,8), best fit the data to a straight line using Criterion #1.

Table. Data points.
  x     y
  2.0   4.0
  3.0   6.0
  2.0   6.0
  3.0   8.0

Figure. Data points for y vs. x data.

Linear Regression – Criterion #1

Using y = 4x − 4 as the regression curve:

Table. Residuals at each point for the regression model y = 4x − 4.
  x     y     y_predicted   ε = y − y_predicted
  2.0   4.0   4.0            0.0
  3.0   6.0   8.0           −2.0
  2.0   6.0   4.0            2.0
  3.0   8.0   8.0            0.0

$$\sum_{i=1}^{4} \varepsilon_i = 0$$

Figure. Regression curve y = 4x − 4 for the y vs. x data.

Linear Regression – Criterion #1

Using y = 6 as the regression curve:

Table. Residuals at each point for the regression model y = 6.
  x     y     y_predicted   ε = y − y_predicted
  2.0   4.0   6.0           −2.0
  3.0   6.0   6.0            0.0
  2.0   6.0   6.0            0.0
  3.0   8.0   6.0            2.0

$$\sum_{i=1}^{4} \varepsilon_i = 0$$

Figure. Regression curve y = 6 for the y vs. x data.
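The two residual tables above can be reproduced with a few lines of code. This is a minimal sketch in Python (not part of the original slides); `residual_sum`, `line_a`, and `line_b` are illustrative names for Criterion #1 and the two candidate lines from the example.

```python
# Criterion #1: sum of the raw residuals y_i - f(x_i) over the data points.
x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

def residual_sum(model):
    """Sum of raw residuals y_i - f(x_i) over the four data points."""
    return sum(yi - model(xi) for xi, yi in zip(x, y))

line_a = lambda t: 4.0 * t - 4.0   # y = 4x - 4
line_b = lambda t: 6.0             # y = 6

print(residual_sum(line_a))  # 0.0
print(residual_sum(line_b))  # 0.0
```

Both models drive the sum of the residuals to zero because positive and negative residuals cancel, which is exactly why this criterion cannot single out one line.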
Linear Regression – Criterion #1

$\sum_{i=1}^{4} \varepsilon_i = 0$ for both regression models, y = 4x − 4 and y = 6. The sum of the residuals is as small as possible (zero), but the regression model is not unique. Hence the criterion of minimizing the sum of the residuals is a bad criterion.

Linear Regression – Criterion #2

Will minimizing $\sum_{i=1}^{n} \left| \varepsilon_i \right|$ work any better?

Figure. Linear regression of y vs. x data showing residuals at a typical point, x_i.

Using y = 4x − 4 as the regression curve:

Table. Absolute residuals at each point for the regression model y = 4x − 4.
  x     y     y_predicted   |ε| = |y − y_predicted|
  2.0   4.0   4.0           0.0
  3.0   6.0   8.0           2.0
  2.0   6.0   4.0           2.0
  3.0   8.0   8.0           0.0

$$\sum_{i=1}^{4} \left| \varepsilon_i \right| = 4$$

Figure. Regression curve y = 4x − 4 for the y vs. x data.

Using y = 6 as the regression curve:

Table. Absolute residuals at each point for the regression model y = 6.
  x     y     y_predicted   |ε| = |y − y_predicted|
  2.0   4.0   6.0           2.0
  3.0   6.0   6.0           0.0
  2.0   6.0   6.0           0.0
  3.0   8.0   6.0           2.0

$$\sum_{i=1}^{4} \left| \varepsilon_i \right| = 4$$

Figure. Regression curve y = 6 for the y vs. x data.

Linear Regression – Criterion #2

$\sum_{i=1}^{4} \left| \varepsilon_i \right| = 4$ for both regression models, y = 4x − 4 and y = 6. The sum of the absolute residuals has been made as small as possible (4), but the regression model is not unique. Hence the criterion of minimizing the sum of the absolute values of the residuals is also a bad criterion.

Can you find a regression line for which $\sum_{i=1}^{4} \left| \varepsilon_i \right| = 4$ and which has unique regression coefficients?

Least Squares Criterion

The least squares criterion minimizes the sum of the squares of the residuals in the model, and also produces a unique line.
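The Criterion #2 tie between the two candidate models can likewise be checked numerically. A minimal Python sketch (not from the slides; `abs_residual_sum` is an illustrative name) using the four data points of the example:

```python
# Criterion #2: sum of the absolute residuals |y_i - f(x_i)|.
x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

def abs_residual_sum(model):
    """Sum of |y_i - f(x_i)| over the four data points."""
    return sum(abs(yi - model(xi)) for xi, yi in zip(x, y))

line_a = lambda t: 4.0 * t - 4.0   # y = 4x - 4
line_b = lambda t: 6.0             # y = 6

print(abs_residual_sum(line_a))  # 4.0
print(abs_residual_sum(line_b))  # 4.0
```

Taking absolute values removes the cancellation problem of Criterion #1, yet two very different lines still achieve the same minimum, so the fit is still not unique.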
$$S_r = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

Figure. Linear regression of y vs. x data showing residuals at a typical point, x_i.

Finding the Constants of the Linear Model

Minimize the sum of the squares of the residuals:
$$S_r = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

To find $a_0$ and $a_1$, we minimize $S_r$ with respect to $a_0$ and $a_1$:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) x_i = 0$$

giving the normal equations
$$n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

Solving for $a_0$ and $a_1$ directly yields
$$a_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$
and
$$a_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} = \bar{y} - a_1 \bar{x}$$

Example 1

As machines are used over long periods of time, the output product can drift off target. Below is the average value of how far off target a product is being manufactured, as a function of machine use.

Table. Off-target value as a function of machine use.
  Hours of Machine Use, t   Millimeters Off Target, h
  30                        1.10
  33                        1.21
  34                        1.25
  35                        1.23
  39                        1.30
  44                        1.40
  45                        1.42

Figure. Data points for h vs. t data.

Regress the data to $h = a_0 + a_1 t$ and find when the product will be 2 mm off target.

Table. Summation data for linear regression.
  t (hours)   h (mm)   t² (hours²)   t·h (mm·hours)
  30          1.10       900          33.00
  33          1.21      1089          39.93
  34          1.25      1156          42.50
  35          1.23      1225          43.05
  39          1.30      1521          50.70
  44          1.40      1936          61.60
  45          1.42      2025          63.90
  Σ = 260     8.91      9852         334.68

With $n = 7$,
$$a_1 = \frac{n \sum_{i=1}^{7} t_i h_i - \sum_{i=1}^{7} t_i \sum_{i=1}^{7} h_i}{n \sum_{i=1}^{7} t_i^2 - \left( \sum_{i=1}^{7} t_i \right)^2} = \frac{7(334.68) - (260)(8.91)}{7(9852) - (260)^2} = 0.019179 \ \text{mm/hour}$$
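The closed-form slope and intercept can be computed directly from the summations. A Python sketch (not part of the original slides) applied to the Example 1 machine-wear data; variable names like `St` and `Sth` are illustrative shorthand for the column sums:

```python
# Least-squares fit h = a0 + a1*t via the closed-form normal-equation
# solution, using the Example 1 machine-wear data.
t = [30, 33, 34, 35, 39, 44, 45]                 # hours of machine use
h = [1.10, 1.21, 1.25, 1.23, 1.30, 1.40, 1.42]   # millimeters off target
n = len(t)

St  = sum(t)                                     # 260
Sh  = sum(h)                                     # 8.91
Stt = sum(ti * ti for ti in t)                   # 9852
Sth = sum(ti * hi for ti, hi in zip(t, h))       # 334.68

a1 = (n * Sth - St * Sh) / (n * Stt - St ** 2)   # slope, mm per hour
a0 = Sh / n - a1 * St / n                        # intercept, mm

print(f"a1 = {a1:.6f} mm/hour, a0 = {a0:.5f} mm")
# a1 = 0.019179 mm/hour, a0 = 0.56050 mm

t_2mm = (2.0 - a0) / a1                          # when h reaches 2 mm
print(f"h reaches 2 mm at t = {t_2mm:.2f} hours")
# h reaches 2 mm at t = 75.06 hours
```

Working from the unrounded coefficients gives about 75.06 hours, consistent with the 75.056 hours obtained on the following slides from the rounded coefficients.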
Example 1 (cont.)

The value of $a_0$ can then be found using $a_0 = \bar{h} - a_1 \bar{t}$, where
$$\bar{h} = \frac{\sum_{i=1}^{7} h_i}{n} = \frac{8.91}{7} = 1.2729 \ \text{mm}$$
$$\bar{t} = \frac{\sum_{i=1}^{7} t_i}{n} = \frac{260}{7} = 37.143 \ \text{hours}$$
$$a_0 = \bar{h} - a_1 \bar{t} = 1.2729 - (0.019179)(37.143) = 0.56050 \ \text{mm}$$

The linear regression model is now given by
$$h = 0.56050 + 0.019179\, t$$

Figure. Linear regression of hours of use vs. millimeters off target.

Solving for when $h = 2$ mm yields
$$2 = 0.56050 + 0.019179\, t$$
$$t = \frac{2 - 0.56050}{0.019179} = 75.056 \ \text{hours}$$

Example 2

To find the longitudinal modulus of a composite, the following data are collected. Find the longitudinal modulus $E$ using the regression model $\sigma = E \varepsilon$ and the sum of the squares of the residuals.

Table. Stress vs. strain data.
  Strain (%)   Stress (MPa)
  0               0
  0.183         306
  0.36          612
  0.5324        917
  0.702        1223
  0.867        1529
  1.0244       1835
  1.1774       2140
  1.329        2446
  1.479        2752
  1.5          2767
  1.56         2896

Figure. Data points for stress vs. strain data.

The residual at each point is given by
$$r_i = \sigma_i - E \varepsilon_i$$

The sum of the squares of the residuals then is
$$S_r = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left( \sigma_i - E \varepsilon_i \right)^2$$

Differentiating with respect to $E$,
$$\frac{\partial S_r}{\partial E} = \sum_{i=1}^{n} 2 \left( \sigma_i - E \varepsilon_i \right) \left( -\varepsilon_i \right) = 0$$

Therefore
$$E = \frac{\sum_{i=1}^{n} \varepsilon_i \sigma_i}{\sum_{i=1}^{n} \varepsilon_i^2}$$
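The zero-intercept formula $E = \sum \varepsilon_i \sigma_i / \sum \varepsilon_i^2$ can be evaluated directly from the Example 2 data. A Python sketch (not from the slides; `eps` and `sigma` are illustrative names), converting strain from percent to m/m and stress from MPa to Pa:

```python
# Zero-intercept least squares for the stress-strain model sigma = E*eps:
# E = sum(eps*sigma) / sum(eps**2), using the Example 2 composite data.
strain_pct = [0, 0.183, 0.36, 0.5324, 0.702, 0.867,
              1.0244, 1.1774, 1.329, 1.479, 1.5, 1.56]   # percent
stress_mpa = [0, 306, 612, 917, 1223, 1529,
              1835, 2140, 2446, 2752, 2767, 2896]        # MPa

eps   = [s / 100.0 for s in strain_pct]   # strain in m/m
sigma = [s * 1.0e6 for s in stress_mpa]   # stress in Pa

E = sum(e * s for e, s in zip(eps, sigma)) / sum(e * e for e in eps)
print(f"E = {E / 1e9:.2f} GPa")  # E = 182.84 GPa
```

Note that the zero-intercept model uses a single normal equation, so no intercept term appears; this matches the derivation above where $S_r$ is minimized over $E$ alone.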
Example 2 (cont.)

Table. Summation data for the regression model (strain converted to m/m, stress to Pa).
  i    ε (m/m)        σ (Pa)        ε²             ε·σ
  1    0.0000         0.0000        0.0000         0.0000
  2    1.8300×10⁻³    3.0600×10⁸    3.3489×10⁻⁶    5.5998×10⁵
  3    3.6000×10⁻³    6.1200×10⁸    1.2960×10⁻⁵    2.2032×10⁶
  4    5.3240×10⁻³    9.1700×10⁸    2.8345×10⁻⁵    4.8821×10⁶
  5    7.0200×10⁻³    1.2230×10⁹    4.9280×10⁻⁵    8.5855×10⁶
  6    8.6700×10⁻³    1.5290×10⁹    7.5169×10⁻⁵    1.3256×10⁷
  7    1.0244×10⁻²    1.8350×10⁹    1.0494×10⁻⁴    1.8798×10⁷
  8    1.1774×10⁻²    2.1400×10⁹    1.3863×10⁻⁴    2.5196×10⁷
  9    1.3290×10⁻²    2.4460×10⁹    1.7662×10⁻⁴    3.2507×10⁷
  10   1.4790×10⁻²    2.7520×10⁹    2.1874×10⁻⁴    4.0702×10⁷
  11   1.5000×10⁻²    2.7670×10⁹    2.2500×10⁻⁴    4.1505×10⁷
  12   1.5600×10⁻²    2.8960×10⁹    2.4336×10⁻⁴    4.5178×10⁷
  Σ                                  1.2764×10⁻³    2.3337×10⁸

With $n = 12$,
$$\sum_{i=1}^{12} \varepsilon_i^2 = 1.2764 \times 10^{-3} \quad \text{and} \quad \sum_{i=1}^{12} \varepsilon_i \sigma_i = 2.3337 \times 10^{8}$$

Using
$$E = \frac{\sum_{i=1}^{12} \varepsilon_i \sigma_i}{\sum_{i=1}^{12} \varepsilon_i^2} = \frac{2.3337 \times 10^{8}}{1.2764 \times 10^{-3}} = 182.84 \ \text{GPa}$$

Example 2 Results

The equation $\sigma = 182.84\, \varepsilon$ GPa describes the data.

Figure. Linear regression for stress vs. strain data.

Additional Resources

For all resources on this topic, such as digital audiovisual lectures, primers, textbook chapters, multiple-choice tests, worksheets in MATLAB, MATHEMATICA, MathCad and MAPLE, blogs, and related physical problems, please visit http://numericalmethods.eng.usf.edu/topics/linear_regression.html

THE END