Class 22. Understanding Regression
Sections 1-3 and 7 of Pfeifer Regression note; EMBS, part of 12.7

What is the regression line?
• It is a line drawn through a cloud of points.
• It is the line that minimizes the sum of squared errors.
  – Errors are also known as residuals; predicted values are also known as fitted values.
  – Error = Actual - Predicted.
  – The error is the vertical distance from the point (actual) to the line (predicted).
  – Points above the line are positive errors.
• The average of the errors will always be zero.
• The regression line will always "go through" the point (average X, average Y).

Can you draw the regression line?
[Scatter plot exercise]

Which is the regression line?
[Scatter plot with candidate lines A through F; the regression line is D.]

Which is the regression line?
• Data points: (1,1), (2,7), (3,1). The regression line is the horizontal line y = 3, so the fitted points are (1,3), (2,3), (3,3).
• Errors: 1 - 3 = -2, 7 - 3 = +4, 1 - 3 = -2. The sum of the errors is 0!
• SSE = (-2)² + 4² + (-2)² = 24, which is smaller than the SSE from any other line.
• The line goes through (2,3), the average.

Draw in the regression line…
[Two practice scatter plots]

Two points determine a line… and regression can give you the equation.

  Degrees C   Degrees F
      0           32
    100          212

Regressing Degrees F on Degrees C with these two points returns the exact conversion line: y = 1.8x + 32.

Four Sets of X,Y Data

  Data Set A      Data Set B      Data Set C      Data Set D
   X     Y         X     Y         X     Y         X     Y
  10   9.14       10   8.04       10   7.47       19  12.08
   8   8.14        8   6.95        8   6.47       19  11.26
  13   8.74       13   7.58       13   8.97       19  13.21
   9   8.77        9   8.81        9   6.97       19  14.34
  11   9.25       11   8.33       11  10.87       19  13.97
  14   8.10       14   9.96       14   9.47       19  12.54
   6   6.13        6   7.24        6   5.47       19  10.75
   4   3.10        4   4.26        4   4.47        8   7.00
  12   9.13       12  10.84       12   8.47       19  11.06
   7   7.26        7   4.82        7   8.87       19  13.41
   5   4.74        5   5.68        5   4.97       19  12.39

[Scatter plots of data sets A, B, C, and D]

Four Sets of X,Y Data: Data Analysis/Regression
Identical regression output for A, B, C, and D!

SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.8166
  R Square             0.6669
  Adjusted R Square    0.6299
  Standard Error       1.2357
  Observations        11

ANOVA
               df       SS        MS        F      Significance F
  Regression    1    27.5100   27.5100   18.0164       0.0022
  Residual      9    13.7425    1.5269
  Total        10    41.2525

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept     2.9993          2.1532       1.3929    0.1971    -1.8716      7.8702
  X             0.5001          0.1178       4.2446    0.0022     0.2336      0.7666

Assumptions (no regression)
• Y is normal and we sample n independent observations.
  – The sample mean Ȳ is the estimate of μ.
  – The sample standard deviation s is the estimate of σ.
  – We use Ȳ, s, and n to test hypotheses about μ, using the t-statistic and the t-distribution with n - 1 dof.
  – We never forecasted "the next Y" (although our point forecast for a new Y would be Ȳ).

Example: Section 4 IQs

  IQ
  Mean                108.545   (Ȳ)
  Standard Error        3.448
  Median              110
  Mode                102
  Standard Deviation   19.807   (s)
  Sample Variance     392.318
  Kurtosis              0.228
  Skewness             -0.499
  Range                85
  Minimum              57
  Maximum             142
  Sum                3582
  Count                33       (n)

To test H0: μ = 100, use t = (Ȳ - 100) / (s/√n) with n - 1 dof. The CLT tells us this test works even if Y is not normal.
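The slides run this test in Excel. As a minimal sketch (not part of the original note), the same calculation can be reproduced from the summary statistics above in Python with scipy; the numbers plugged in are the slide's Ȳ, s, and n.

```python
# Sketch: one-sample t-test of H0: mu = 100 for the Section 4 IQ data,
# using only the summary statistics reported on the slide.
from math import sqrt
from scipy import stats

y_bar, s, n = 108.545, 19.807, 33   # sample mean, sample std dev, count (from the slide)
mu_0 = 100                          # hypothesized mean

t_stat = (y_bar - mu_0) / (s / sqrt(n))               # t = (Ybar - 100) / (s / sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # t-distribution with n - 1 dof

print(round(t_stat, 2), round(p_two_sided, 3))        # t is about 2.48, p is about 0.019
```

Note that s/√n reproduces the "Standard Error" of 3.448 shown in the Excel descriptive statistics.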
Regression Assumptions
• Y│X is normal with mean a + bX and standard deviation σ, and we sample n independent observations.
  – We use regression to estimate a, b, and σ.
  – â, b̂, and the regression "standard error" are the appropriate estimates.
• Our point forecast for a new observation is â + b̂(X).
  – (Plug X into the regression equation.)
• At some point, we will learn how to use regression output to test interesting hypotheses.
• What about a probability forecast of the new Y│X? (EMBS 12.14)

Summary: the key assumption of linear regression
• Y ~ N(μ, σ) (no regression)
• Y│X ~ N(a + bX, σ) (with regression)
  – In other words, μ = a + b(X), or E(Y│X) = a + b(X): the mean of Y given X is a linear function of X.
• Without regression, we used data to estimate and test hypotheses about the parameter μ.
• With regression, we use (x, y) data to estimate and test hypotheses about the parameters a and b.
  – In both cases, we use the t because we don't know σ.
• With regression, we also want to use X to forecast a new Y.

Example: Assignment 22

  Standard MSF    Hours
      26           2
      34.2         4.17
      29           4.42
      34.3         4.75
      85.9         4.83
     143.2         6.67
      85.5         7
     140.6         7.08
     140.6         7.17
      40.4         7.17
     101          10
     239.7        12
     179.3        12.5
     126.5        13.67
     140.8        15.08

Regression Statistics
  Multiple R           0.72600331
  R Square             0.527080806
  Adjusted R Square    0.490702407
  Standard Error       2.773595935   (the "error" estimate of σ)
  Observations        15             (n)

ANOVA: df = 1 (Regression), 13 (Residual), 14 (Total)

             Coefficients
  Intercept   3.312316042   (â)
  MSF         0.044489502   (b̂)

Forecasting Y│X = 157.3: GOOD METHOD
• Plug X = 157.3 into the regression equation to get 10.31 as the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• Under certain assumptions…
  – The GOOD METHOD assumes â, b̂, and "standard error" are exactly a, b, and σ.
• Pr(Y < 8) = NORMDIST(8, 10.31, 2.77, TRUE) = 0.202.

Example: Assignment 22 (same data and regression output as above)

                               Job A      Job B
  X (MSF)                      157.3       64.7
  Point forecast â + b̂(X)     10.3105     6.1908
  sigma ("standard error")      2.77        2.77
  Pr(Y < 8) via NORMDIST        0.2021      0.7432

This is the probability that Y < 8 given X, assuming we know a, b, and σ.

Forecasting Y│X = 157.3: BETTER METHOD
• Plug X = 157.3 into the regression equation to get 10.31, the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• Under certain assumptions…
  – The BETTER METHOD assumes â and b̂ are a and b, but accounts for the fact that "standard error" is not σ.
• t = (8 - 10.31)/2.77 = -0.83
• Pr(Y < 8) = 1 - T.DIST.RT(-0.83, 13) = 0.210, with dof = n - 2.

Forecasting Y│X = 157.3: PERFECT METHOD
• Plug X = 157.3 into the regression equation to get 10.31, the point forecast.
  – The point forecast is the mean of the probability distribution forecast.
• To account for using â and b̂ to estimate a and b, we must increase the standard deviation used in the forecast. The "correct" standard deviation is called the "standard error of prediction", which here is 2.93.
• t = (8 - 10.31)/2.93 = -0.79
• Pr(Y < 8) = 1 - T.DIST.RT(-0.79, 13) = 0.222, with dof = n - 2.
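The slides do these three calculations with Excel's NORMDIST and T.DIST.RT. The following is a minimal Python/scipy sketch of the same three probability forecasts; the coefficients, "standard error", the 2.93 "standard error of prediction", and dof = 13 are taken straight from the slides above.

```python
# Sketch: GOOD / BETTER / PERFECT probability forecasts of Pr(Hours < 8 | MSF = 157.3),
# mirroring the Excel calculations on the slides.
from scipy.stats import norm, t

point_forecast = 3.312316042 + 0.044489502 * 157.3   # a_hat + b_hat * X, about 10.31
std_error = 2.773595935                               # regression "standard error"
se_prediction = 2.93                                  # "standard error of prediction" (from the slide)
dof = 15 - 2                                          # n - 2 = 13

good    = norm.cdf(8, loc=point_forecast, scale=std_error)      # normal, sigma = std error  -> about 0.202
better  = t.cdf((8 - point_forecast) / std_error, df=dof)       # t with n-2 dof             -> about 0.210
perfect = t.cdf((8 - point_forecast) / se_prediction, df=dof)   # t with se of prediction    -> about 0.222

print(round(good, 3), round(better, 3), round(perfect, 3))
```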
Probability Forecasting with Regression: summary
• Plug X into the regression equation to calculate the point forecast.
  – This becomes the mean.
• GOOD: use the normal with "standard error" in place of σ.
• BETTER: use the t (with n - 2 dof) to account for using "standard error" to estimate σ.
• PERFECT: use the t with the "standard error of prediction" to account for using â and b̂ to estimate a and b.

Probability Forecasting with Regression
• "Standard error of prediction" is larger than "standard error" and depends on
  – 1/n (the larger the n, the smaller the "standard error of prediction"), and
  – (X - X̄)² (the farther X is from the average X, the larger the "standard error of prediction").
• As n gets big, the "standard error of prediction" approaches "standard error". (EMBS 12.26)

  standard error of prediction = standard error × √( 1 + 1/n + (X - X̄)² / Σ(Xᵢ - X̄)² )

  where X is the value for which we predict Y, and the sum in the denominator runs over the n data points. The GOOD and BETTER methods ignore the 1/n and (X - X̄)²/Σ(Xᵢ - X̄)² terms, which is okay the bigger n is. (A worked version of this calculation appears at the end of these notes.)

BOTTOM LINE
• You will be asked to use the BETTER METHOD:
  – use the t with n - 2 dof, and
  – just use "standard error".
• Know that "standard error" is smaller than the correct "standard error of prediction".
  – As a result, your probability distribution is a little too narrow.
• Know that the "standard error of prediction" depends on 1/n and (X - X̄)², which means it approaches "standard error" as n gets big.

Much ado about nothing?
[Chart: 95% prediction intervals for Hours vs. MSF. The PERFECT intervals are the widest and curved, the GOOD intervals are straight and the narrowest, and the BETTER intervals sit in between.]

TODAY
• Got a better idea of how the "least squares" regression line goes through the cloud of points.
• Saw that several "clouds" can have exactly the same regression line… so chart the cloud.
• Practiced using a regression equation to calculate a point forecast (a mean).
• Saw three methods for creating a probability distribution forecast of Y│X.
  – We will use the BETTER method.
  – We will know that it understates the actual uncertainty… a problem that goes away as n gets big.

Next Class
• We will learn about "adjusted R square".
  – (p. 9-10, Pfeifer note)
  – The most over-rated statistic of all time.
• We will learn the four assumptions required to use regression to make a probability forecast of Y│X.
  – (Section 5, Pfeifer note; 12.4, EMBS)
  – And how to check each of them.
• We will learn how to test H0: b = 0.
  – (p. 12-13, Pfeifer note; 12.5, EMBS)
  – And why this is such an important test.
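Worked calculation of the "standard error of prediction" (referenced above). This is a sketch, not part of the original slides: it recomputes the PERFECT method's 2.93 from the Assignment 22 MSF values and the regression "standard error" shown earlier, using Python in place of Excel.

```python
# Sketch: standard error of prediction for X = 157.3, built from the formula
#   se_pred = standard error * sqrt(1 + 1/n + (X - Xbar)^2 / sum((Xi - Xbar)^2)).
from math import sqrt
from scipy.stats import t

msf = [26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6,
       140.6, 40.4, 101, 239.7, 179.3, 126.5, 140.8]   # Assignment 22 MSF values
std_error = 2.773595935          # regression "standard error" (estimate of sigma)
x_new = 157.3                    # the X for which we predict Y

n = len(msf)
x_bar = sum(msf) / n
sxx = sum((x - x_bar) ** 2 for x in msf)               # summed over the n data points

se_pred = std_error * sqrt(1 + 1/n + (x_new - x_bar) ** 2 / sxx)
print(round(se_pred, 2))                               # about 2.93

# PERFECT-method probability forecast: Pr(Hours < 8 | MSF = 157.3)
point_forecast = 3.312316042 + 0.044489502 * x_new     # about 10.31
print(round(t.cdf((8 - point_forecast) / se_pred, df=n - 2), 3))   # about 0.222
```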