Simple Linear Regression and Correlation

Learning Objectives
- Describe the linear regression model
- State the regression modeling steps
- Explain ordinary least squares
- Compute regression coefficients
- Predict the response variable
- Describe residual & influence analysis
- Interpret computer output

Models
- A model is a representation of some phenomenon.
- A mathematical model is a mathematical expression of some phenomenon, often describing a relationship between variables.
- Two types: deterministic models and probabilistic models.

Deterministic Models
- Hypothesize exact relationships.
- Suitable when prediction error is negligible.
- Example: force is exactly mass times acceleration, $F = m \cdot a$.

Probabilistic Models
- Hypothesize two components: a deterministic part and random error.
- Example: sales volume is 10 times advertising spending plus random error, $Y = 10X + \varepsilon$. The random error may be due to factors other than advertising.

Types of Probabilistic Models
- Regression models, correlation models, and other models.

Regression Models
- Answer the question 'What is the relationship between the variables?'
- Use an equation with one numerical dependent (response) variable (what is to be predicted) and one or more numerical or categorical independent (explanatory) variables.
- Used mainly for prediction.

Regression Modeling Steps
1. Define the problem or question
2. Specify the model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate the model
7. Use the model for prediction

Problem Definition
- The most critical step: you don't want the right answer to the wrong question.
- What are the model objectives? Who will use the model? What will be the benefits? Are resources (data, etc.) available? How will the results be implemented?

Specifying the Model
- Define the variables: conceptual (e.g., advertising, price), empirical (e.g., list price, regular price), and measurement (e.g., dollars, units).
- Hypothesize the nature of the relationship: expected effects (i.e., the signs of the coefficients), functional form (linear or nonlinear), and interactions.

Model Specification Is Based on Theory
1. Economic & business theory
2. Mathematical theory
3. Previous research
4. 'Common sense'

Which Functional Form?
[Figure: four candidate sales-versus-advertising curves illustrating different possible functional forms.]

Types of Regression Models
- Simple regression: 1 explanatory variable (linear or nonlinear).
- Multiple regression: 2+ explanatory variables (linear or nonlinear).

Linear Equations
- $Y = mX + b$, where $m$ is the slope (the change in Y per unit change in X) and $b$ is the Y-intercept.

Linear Regression Model
- The relationship between the variables is a linear function:
  $Y_i = \alpha + \beta_1 X_i + \varepsilon_i$
  where $\alpha$ is the population Y-intercept, $\beta_1$ is the population slope, $\varepsilon_i$ is random error, $Y$ is the dependent (response) variable, and $X$ is the independent (explanatory) variable.

Population & Sample Regression Models
- Population (unknown relationship): $Y_i = \alpha + \beta_1 X_i + \varepsilon_i$.
- Random sample estimate: $\hat{Y}_i = a + b_1 X_i$, with residual $e_i = Y_i - \hat{Y}_i$.

Population Linear Regression Model
[Figure: observed values scattered about the population line $\mu_{Y|X} = \alpha + \beta_1 X$; the random error $\varepsilon_i$ is the vertical distance from an observed value to the line.]

Sample Linear Regression Model
[Figure: sampled and unsampled observations about the fitted line $\hat{Y}_i = a + b_1 X_i$; the residual $e_i = Y_i - \hat{Y}_i$ is the vertical distance from an observed point to the fitted line.]

Regression Modeling Steps (recap of the seven steps above).
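To make the deterministic-plus-random-error idea concrete, here is a minimal Python sketch (an addition, not part of the original slides) that simulates the advertising example $Y = 10X + \varepsilon$; the error standard deviation of 5 is an arbitrary assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic component: sales = 10 * advertising
advertising = np.arange(1, 11, dtype=float)          # X
deterministic = 10 * advertising                      # 10X

# Random error: factors other than advertising (std. dev. of 5 is assumed)
error = rng.normal(loc=0.0, scale=5.0, size=advertising.size)

sales = deterministic + error                         # Y = 10X + e
for x, y in zip(advertising, sales):
    print(f"advertising={x:4.1f}  sales={y:6.1f}")
```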
Scatter Diagram
1. Plot of all $(X_i, Y_i)$ pairs.
2. Suggests how well the model will fit.
[Figure: scatter plot of Y versus X.]

Regression Modeling Steps (recap of the seven steps above).

Thinking Challenge
- How would you draw a line through the points?
- How do you determine which line 'fits best'?

Ordinary Least Squares
- 'Best fit' means the differences between the actual values $Y_i$ and the predicted values $\hat{Y}_i$ are a minimum.
- But positive differences offset negative ones, so OLS minimizes the sum of the squared differences (errors):
  $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$

Ordinary Least Squares Graphically
[Figure: for n = 4 points, OLS minimizes $\sum_{i=1}^{4} e_i^2 = e_1^2 + e_2^2 + e_3^2 + e_4^2$, the squared vertical distances from the points to the fitted line $\hat{Y}_i = a + b_1 X_i$.]

Coefficient Equations
- Sample regression equation: $\hat{Y}_i = a + b_1 X_i$
- Sample slope: $b_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$
- Sample Y-intercept: $a = \bar{Y} - b_1\bar{X}$

Computation Table

Xi     Yi     Xi^2     Yi^2     XiYi
X1     Y1     X1^2     Y1^2     X1Y1
X2     Y2     X2^2     Y2^2     X2Y2
...    ...    ...      ...      ...
Xn     Yn     Xn^2     Yn^2     XnYn
ΣXi    ΣYi    ΣXi^2    ΣYi^2    ΣXiYi

Parameter Estimation Example
You're a marketing analyst for Hasbro Toys. You gather the following data:

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

What is the relationship between sales & advertising?

Scatter Diagram: Sales vs. Advertising
[Figure: scatter plot of sales (units) against advertising ($).]

Parameter Estimation Solution Table

Xi    Yi    Xi^2   Yi^2   XiYi
1     1     1      1      1
2     1     4      1      2
3     2     9      4      6
4     2     16     4      8
5     4     25     16     20
15    10    55     26     37

Parameter Estimation Solution
$b_1 = \dfrac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} = \dfrac{37 - 5(3)(2)}{55 - 5(9)} = 0.70$
$a = \bar{Y} - b_1\bar{X} = 2 - 0.70(3) = -0.10$
$\hat{Y}_i = -0.10 + 0.70X_i$

Coefficient Interpretation Solution
- Slope ($b_1$): sales volume (Y) is expected to increase by 0.7 units for each $1 increase in advertising (X).
- Y-intercept ($a$): the average value of sales volume (Y) is -0.10 units when advertising (X) is 0. This is difficult to explain to a marketing manager, who would expect some sales without advertising.

Interpretation of Coefficients
- Slope ($b_1$): the estimated Y changes by $b_1$ for each 1-unit increase in X. If $b_1 = 2$, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X).
- Y-intercept ($a$): the average value of Y when X = 0. If $a = 4$, then average sales (Y) is expected to be 4 when advertising (X) is 0.

Parameter Estimation SPSS Output
[SPSS scatter plot of SALES versus ADVERT and a coefficients table; the printed output is not legible in this transcript.]

Parameter Estimation Thinking Challenge
You're an economist for the county cooperative. You gather the following data:

Fertilizer (lb.)   Yield (lb.)
4                  3.0
6                  5.5
10                 6.5
12                 9.0

What is the relationship between fertilizer & crop yield?

Scatter Diagram: Crop Yield vs. Fertilizer*
[Figure: scatter plot of yield (lb.) against fertilizer (lb.).]

Parameter Estimation Solution Table*

Xi    Yi     Xi^2   Yi^2     XiYi
4     3.0    16     9.00     12
6     5.5    36     30.25    33
10    6.5    100    42.25    65
12    9.0    144    81.00    108
32    24.0   296    162.50   218

Parameter Estimation Solution*
$b_1 = \dfrac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2} = \dfrac{218 - 4(8)(6)}{296 - 4(64)} = 0.65$
$a = \bar{Y} - b_1\bar{X} = 6 - 0.65(8) = 0.80$
$\hat{Y}_i = 0.80 + 0.65X_i$

Coefficient Interpretation Solution*
- Slope ($b_1$): crop yield (Y) is expected to increase by 0.65 lb. for each 1-lb. increase in fertilizer (X).
- Y-intercept ($a$): average crop yield (Y) is expected to be 0.8 lb. when no fertilizer (X) is used.
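A quick way to check these hand computations is to run the same least-squares formulas in code. The sketch below (Python with NumPy, not part of the original slides) reproduces the Hasbro Toys solution; swapping in the fertilizer data gives the starred solution.

```python
import numpy as np

def ols_simple(x, y):
    """Return (intercept a, slope b1) for simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    a = y.mean() - b1 * x.mean()
    return a, b1

# Hasbro Toys example: advertising ($) vs. sales (units)
ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
a, b1 = ols_simple(ad, sales)
print(f"Y-hat = {a:.2f} + {b1:.2f} X")   # Y-hat = -0.10 + 0.70 X
```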
Regression Modeling Steps (recap of the seven steps above).

Evaluating the Model
1. How well does the model describe the relationship between the variables?
2. Closeness of 'best fit': the closer the points to the line, the better.
3. Are the assumptions met?
4. Significance of the parameter estimates.
5. Outliers (unusual observations).

Evaluating Model Steps
1. Examine variation measures
2. Test coefficients for significance
3. Do residual analysis
4. Do influence analysis

Random Error Variation
- Variation of the actual Y around the predicted $\hat{Y}$.
- Measured by the standard error of estimate: the sample standard deviation of the residuals $e$, denoted $S_{YX}$.
- Affects several things, including parameter significance and prediction accuracy.

Standard Error of Estimate
$S_{YX} = \sqrt{\dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - k - 1}} = \sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{n - k - 1}} = \sqrt{\dfrac{\sum_{i=1}^{n} Y_i^2 - a\sum_{i=1}^{n} Y_i - b_1\sum_{i=1}^{n} X_iY_i}{n - k - 1}}$
where $k$ is the number of independent variables.

Standard Error of the Estimate Example
You're a marketing analyst for Hasbro Toys. You find $a = -0.1$ and $b_1 = 0.7$.

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

What is the standard error of the estimate?

Solution Table
(Same computation table as in the parameter estimation solution: $\sum X_i = 15$, $\sum Y_i = 10$, $\sum X_i^2 = 55$, $\sum Y_i^2 = 26$, $\sum X_iY_i = 37$.)

Standard Error of Estimate Solution
$S_{YX} = \sqrt{\dfrac{\sum Y_i^2 - a\sum Y_i - b_1\sum X_iY_i}{n - k - 1}} = \sqrt{\dfrac{26 - (-0.1)(10) - (0.7)(37)}{5 - 1 - 1}} = 0.6055$

Rule of Thumb for Interpreting the Standard Error of Estimate
- Regression line ± 1 standard error: about 68% of the data points are expected to fall in this interval.
- Regression line ± 2 standard errors: about 95% of the data points are expected to fall in this interval.
- Regression line ± 3 standard errors: about 99.7% of the data points are expected to fall in this interval.

Graphic Representation of Standard Error of Estimate
[Figure: a band of one standard error on each side of the regression line.]

Regression Modeling Steps (recap of the seven steps above).

Prediction With Regression Models
- Types of predictions: point estimates and interval estimates.
- What is predicted:
  » The population mean response $\mu_{Y|X}$ for a given X (a point on the population regression line).
  » An individual response $Y_i$ for a given X.

What Is Predicted
[Figure: at a given X, the prediction $\hat{Y}$ estimates both the mean response $\mu_{Y|X} = \alpha + \beta_1 X$ and an individual Y value.]

Confidence Interval Estimate of Mean Y ($\mu_{Y|X}$)
$\hat{Y} - t_{n-k-1,\alpha/2}\, S_{\hat{Y}} \le \mu_{Y|X} \le \hat{Y} + t_{n-k-1,\alpha/2}\, S_{\hat{Y}}$
where
$S_{\hat{Y}} = S_{YX}\sqrt{\dfrac{1}{n} + \dfrac{(X_{given} - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$

Factors Affecting Interval Width
1. Level of confidence $(1 - \alpha)$: width increases as confidence increases.
2. Data dispersion ($S_{YX}$): width increases as variation increases.
3. Sample size: width decreases as sample size increases.
4. Distance of $X_{given}$ from the mean $\bar{X}$: width increases as distance increases.

Why Distance from the Mean?
[Figure: sample regression lines pivot around $(\bar{X}, \bar{Y})$, so predictions at an X far from $\bar{X}$ show greater dispersion than predictions at an X near $\bar{X}$.]

Confidence Interval Estimate Example
You're a marketing analyst for Hasbro Toys. You find $b_0 = -0.1$, $b_1 = 0.7$, and $S_{YX} = 0.60553$.

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

Estimate the mean sales when advertising is $4, at the .05 level.

Solution Table
(Same computation table as before: $\sum X_i = 15$, $\sum Y_i = 10$, $\sum X_i^2 = 55$, $\sum Y_i^2 = 26$, $\sum X_iY_i = 37$.)
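As a cross-check on the hand calculation above, this short Python sketch (not from the original slides) computes $S_{YX}$ directly from the residuals of the fitted Hasbro Toys line.

```python
import numpy as np

ad = np.array([1, 2, 3, 4, 5], dtype=float)      # advertising ($)
sales = np.array([1, 1, 2, 2, 4], dtype=float)   # sales (units)

a, b1 = -0.10, 0.70                              # fitted coefficients from the example
y_hat = a + b1 * ad                              # predicted sales
resid = sales - y_hat                            # residuals e_i = Y_i - Y-hat_i

n, k = sales.size, 1                             # n observations, k = 1 explanatory variable
s_yx = np.sqrt(np.sum(resid**2) / (n - k - 1))   # standard error of estimate
print(f"S_YX = {s_yx:.4f}")                      # about 0.6055
```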
Confidence Interval Estimate Solution
$\hat{Y}_i = -0.10 + 0.70X_i$
$\hat{Y} - t_{n-k-1,\alpha/2}\, S_{\hat{Y}} \le \mu_{Y|X} \le \hat{Y} + t_{n-k-1,\alpha/2}\, S_{\hat{Y}}$
X to be predicted is 4, so $\hat{Y} = -0.1 + 0.7(4) = 2.7$.
$S_{\hat{Y}} = 0.60553\sqrt{\dfrac{1}{5} + \dfrac{(4-3)^2}{55 - 5(3)^2}} = 0.3316$
$2.7 - 3.1824(0.3316) \le \mu_{Y|X} \le 2.7 + 3.1824(0.3316)$
$1.6445 \le \mu_{Y|X} \le 3.7553$

Prediction Interval of Individual Response
$\hat{Y} - t_{n-k-1,\alpha/2}\, S_{ind} \le Y_P \le \hat{Y} + t_{n-k-1,\alpha/2}\, S_{ind}$
where
$S_{ind} = S_{YX}\sqrt{1 + \dfrac{1}{n} + \dfrac{(X_{given} - \bar{X})^2}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$
Note the extra 1 under the square root.

Why the Extra $S_{YX}$?
[Figure: the individual Y we are trying to predict deviates from the expected (mean) Y, $\mu_{Y|X} = \alpha + \beta_1 X$, by an additional error $\varepsilon$, so the interval for an individual response is wider than the interval for the mean.]

Prediction Interval of Individual Response Solution
$\hat{Y} = -0.1 + 0.7(4) = 2.7$
$S_{ind} = 0.60553\sqrt{1 + \dfrac{1}{5} + \dfrac{(4-3)^2}{55 - 5(3)^2}} = 0.60553\sqrt{1.3} = 0.6904$
$2.7 - 3.1824(0.6904) \le Y_P \le 2.7 + 3.1824(0.6904)$
$0.5028 \le Y_P \le 4.8972$

Hyperbolic Interval Bands
[Figure: the confidence and prediction bands around the fitted line are narrowest at $\bar{X}$ and widen as $X_{given}$ moves away from $\bar{X}$.]

Interval Estimate SPSS Output

ad     sales   lmci_1    umci_1    lici_1     uici_1
1.00   1.00    -.89270   2.09270   -1.83757   3.03757
2.00   1.00    .24450    2.35550   -.89719    3.49719
3.00   2.00    1.13819   2.86181   -.11100    4.11100
4.00   2.00    1.64450   3.75550   .50281     4.89719
5.00   4.00    1.90730   4.89270   .96243     5.83757

(lmci_1/umci_1 are the lower/upper confidence limits for the mean response; lici_1/uici_1 are the lower/upper prediction limits for an individual response.)

Regression Modeling Steps (recap of the seven steps above).

Evaluating the Model
(Recap of the five questions above: fit of the relationship, closeness of the 'best fit', assumptions met, significance of the parameter estimates, and outliers.)

Evaluating Model Steps
(Recap: examine variation measures, test coefficients for significance, do residual analysis, do influence analysis.)

Measures of Variation in Regression
- Total sum of squares (SST): measures the variation of the observed $Y_i$ around the mean $\bar{Y}$.
- Explained variation (SSR): variation due to the relationship between X and Y.
- Unexplained variation (SSE): variation due to other factors.

Variation Measures
[Figure: for a point $(X_i, Y_i)$, the total deviation $(Y_i - \bar{Y})$ splits into an explained part $(\hat{Y}_i - \bar{Y})$ and an unexplained part $(Y_i - \hat{Y}_i)$; SST $= \sum(Y_i - \bar{Y})^2$, SSR $= \sum(\hat{Y}_i - \bar{Y})^2$, SSE $= \sum(Y_i - \hat{Y}_i)^2$.]

Relationship
$SST = SSR + SSE$, so dividing through by SST gives $1 = \dfrac{SSR}{SST} + \dfrac{SSE}{SST}$.

Coefficient of Determination
- Proportion of variation 'explained' by the relationship between X and Y; $0 \le r^2 \le 1$.
$r^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} = \dfrac{SSR}{SST} = \dfrac{a\sum_{i=1}^{n} Y_i + b_1\sum_{i=1}^{n} X_iY_i - n\bar{Y}^2}{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}$

Coefficient of Determination Examples
[Figure: scatter plots with fitted lines $\hat{Y}_i = b_0 + b_1X_i$ illustrating $r^2 = 1$ (points exactly on the line), $r^2 = .8$, and $r^2 = 0$ (no linear relationship).]

Adjusted Coefficient of Determination
- Proportion of variation 'explained' by the relationship between X and Y, adjusted to reflect the sample size and the number of independent variables:
$r^2_{adj} = 1 - (1 - r^2)\dfrac{n - 1}{n - 2}$

Coefficient of Determination Example
You're a marketing analyst for Hasbro Toys. You find $a = -0.1$ and $b_1 = 0.7$. (Data as before: Ad $ 1 through 5; Sales 1, 1, 2, 2, 4.) What is the coefficient of determination?

Solution Table
(Same computation table as before: $\sum X_i = 15$, $\sum Y_i = 10$, $\sum X_i^2 = 55$, $\sum Y_i^2 = 26$, $\sum X_iY_i = 37$.)

Coefficient of Determination Solution
$\hat{Y}_i = -0.10 + 0.70X_i$
$r^2 = \dfrac{a\sum Y_i + b_1\sum X_iY_i - n\bar{Y}^2}{\sum Y_i^2 - n\bar{Y}^2} = \dfrac{-0.10(10) + 0.70(37) - 5(2)^2}{26 - 5(2)^2} = 0.8167$
81.67% of the variation in sales is due to advertising.

Coefficient of Determination SPSS Output

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .904 a   .817       .756                .6055
a. Predictors: (Constant), ADVERT
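The interval formulas above are easy to verify numerically. This Python sketch (an addition, not from the slides; scipy is assumed to be available for the t critical value) reproduces the 95% confidence interval for the mean response and the prediction interval for an individual response at X = 4.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5], dtype=float)   # advertising
y = np.array([1, 1, 2, 2, 4], dtype=float)   # sales

a, b1, s_yx = -0.10, 0.70, 0.60553           # estimates from the examples
n, k = x.size, 1
x_given = 4.0
y_hat = a + b1 * x_given                      # point prediction = 2.7

ssx = np.sum(x**2) - n * x.mean()**2          # sum of squares of X about its mean
t_crit = stats.t.ppf(0.975, df=n - k - 1)     # 3.1824 for df = 3

s_mean = s_yx * np.sqrt(1/n + (x_given - x.mean())**2 / ssx)       # S_Y-hat
s_ind  = s_yx * np.sqrt(1 + 1/n + (x_given - x.mean())**2 / ssx)   # S_ind

print("CI for mean response:", y_hat - t_crit * s_mean, y_hat + t_crit * s_mean)
print("Prediction interval: ", y_hat - t_crit * s_ind,  y_hat + t_crit * s_ind)
```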
Types of Probabilistic Models (recap): regression models, correlation models, other models.

Correlation Models
- Answer the question 'How strong is the linear relationship between 2 variables?'
- Use the coefficient of correlation:
  » The population correlation coefficient is denoted $\rho$ (rho).
  » Values range from -1 to +1.
  » Measures the degree of association.
- Used mainly for understanding.

Sample Coefficient of Correlation
- Pearson product-moment coefficient of correlation; its magnitude equals the square root of the coefficient of determination, with the sign of the slope:
$r = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$

Coefficient of Correlation Values
- -1.0 is perfect negative correlation, 0 is no correlation, and +1.0 is perfect positive correlation.
- Values between 0 and -1 show an increasing degree of negative correlation as they approach -1; values between 0 and +1 show an increasing degree of positive correlation as they approach +1.

Coefficient of Correlation & Regression Model
[Figure: scatter plots with fitted lines $\hat{Y}_i = a + b_1X_i$ illustrating $r = 1$, $r = -1$, $r = .89$, and $r = 0$.]

Test of Coefficient of Correlation
- Tests whether there is a linear relationship between 2 numerical variables.
- Gives the same conclusion as testing the population slope $\beta_1$.
- Hypotheses: $H_0: \rho = 0$ (no correlation); $H_1: \rho \ne 0$ (correlation).

Evaluating Model Steps (recap): examine variation measures, test coefficients for significance, do residual analysis, do influence analysis.

Test of Slope Coefficient
- Tests whether there is a linear relationship between X and Y.
- Involves the population slope $\beta_1$.
- Hypotheses: $H_0: \beta_1 = 0$ (no linear relationship); $H_1: \beta_1 \ne 0$ (linear relationship).
- The theoretical basis is the sampling distribution of slopes.

Sampling Distribution of Sample Slopes
[Figure: different samples from the same population give different sample lines; all possible sample slopes (e.g., 2.5, 1.6, 1.8, 2.1, and so on over a very large number of samples) form the sampling distribution of $b_1$, with standard error $\sigma_{b_1}$.]

Test of Slope Coefficient
Test statistic:
$t_{n-k-1} = \dfrac{b_1 - \beta_1}{S_{b_1}}$, where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$

Test of Slope Coefficient Example
You're a marketing analyst for Hasbro Toys. You find $b_0 = -0.1$, $b_1 = 0.7$, and $S_{YX} = 0.60553$. (Data as before.) Is the relationship significant at the .05 level?

Solution Table
(Same computation table as before: $\sum X_i = 15$, $\sum Y_i = 10$, $\sum X_i^2 = 55$, $\sum Y_i^2 = 26$, $\sum X_iY_i = 37$.)

Test Statistic Solution
$S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum X_i^2 - n\bar{X}^2}} = \dfrac{0.60553}{\sqrt{55 - 5(3)^2}} = 0.1915$
$t_{n-k-1} = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{0.70 - 0}{0.1915} = 3.656$

Test of Slope Parameter Solution
- $H_0: \beta_1 = 0$; $H_1: \beta_1 \ne 0$; $\alpha = .05$; df $= 5 - 1 - 1 = 3$; critical values $\pm 3.1824$.
- Test statistic: $t = \dfrac{0.70 - 0}{0.1915} = 3.656$
- Decision: reject $H_0$ at $\alpha = .05$.
- Conclusion: there is evidence of a linear relationship.

Test of Slope Parameter Computer Output

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
ADVERT     1    0.7000               0.1914           3.656               0.0354

(For ADVERT: $b_1 = 0.7000$, $S_{b_1} = 0.1914$, $t = b_1/S_{b_1} = 3.656$, p-value = 0.0354.)

Test of Slope Parameter SPSS Output

Variables in the Equation
Variable     B          SE B      Beta      T       Sig T
AD           .700000    .191485   .903696   3.656   .0354
(Constant)   -.100000   .635085             -.157   .8849

Evaluating Model Steps (recap): examine variation measures, test coefficients for significance, do residual analysis, do influence analysis.

Residual Analysis
- Purposes:
  » Examine the functional form (linear vs. nonlinear model).
  » Evaluate violations of the assumptions.
- Graphical analysis of residuals: plot the residuals vs. the X values.
- Residuals are the errors: the difference between the actual Y and the predicted Y.
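To tie the slope test to the computer output shown above, here is a small Python sketch (not part of the slides; scipy assumed) that computes $S_{b_1}$, the t statistic, and its two-sided p-value for the Hasbro Toys data.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5], dtype=float)   # advertising
y = np.array([1, 1, 2, 2, 4], dtype=float)   # sales

b1, s_yx = 0.70, 0.60553                     # slope and standard error of estimate
n, k = x.size, 1
df = n - k - 1

s_b1 = s_yx / np.sqrt(np.sum(x**2) - n * x.mean()**2)   # standard error of the slope
t_stat = (b1 - 0.0) / s_b1                               # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)                # two-sided p-value

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.3f}, p = {p_value:.4f}")   # ~0.1915, 3.656, 0.0354
```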
Population and Sample Regression Lines
- For one value $X_1$, the population contains many Y values; their mean is $\mu_{Y_1}$.
- The population regression line is $\mu_Y = \alpha + \beta X$.
- A sample regression line $\hat{y} = a + bx$ approximates the population regression line; each of many possible samples gives a different sample line.

Histogram of Y Values at $X = X_1$
[Figure: the Y values at a given $X_1$ form a distribution centered at $\mu_{Y_1} = \alpha + \beta X_1$ on the population line $\mu_Y = \alpha + \beta X$.]

Normal Distribution of Y Values when $X = X_1$
[Figure: that distribution is normal, and its standard deviation is the standard error of estimate.]

Normality & Constant Variance Assumptions
[Figure: the normal distributions of Y at $X_1$ and $X_2$ have the same spread; every cross-sectional slice of the regression surface is a normal curve.]

Linear Regression Assumptions
- Normality: the Y values are normally distributed for each X; $\varepsilon$ is a normally distributed random variable with a mean of zero, $E(\varepsilon) = 0$.
- Homoscedasticity (constant variance): the standard deviation of the $\varepsilon$ values is the same regardless of the given value of X; the variance of $\varepsilon$ is the same for all values of X.
- Independence of errors: the residuals $e$ are independent of each other; the size of the error for a particular value of X is not related to the size of the error for any other value of X.
- Each of these distributions is normal, has the same standard deviation (estimated by $S_{YX}$), and has its mean on the line of regression.

Residual Plots for Normality
- Construct a histogram of the residuals.
- Plot the residuals vs. the X values.

Residual Plot 1 for Normality
- The histogram of residuals should be nearly symmetric, centered near or at zero, and approximately normal in shape.
[Figure: histogram of residuals (Std. Dev = 1.61, Mean = 0.0, N = 31.00).]

Residual Plot 2 for Normality
- The residuals plotted vs. the X values should be distributed about the horizontal line at 0; otherwise, normality is violated.
[Figure: residuals plotted against X, scattered about the zero line.]

Residual Plots for Normality
[Figure: the histogram of residuals and the plot of residuals vs. X values shown side by side.]

Using SPSS to Test for Normality of Residuals
- Statistics/Regression/Linear
  » Dependent: Earnings
  » Independent: Rdexpend
  » Plots/Standardized Residual Plot: Histogram
  » Save: Predicted Value (Unstandardized or Standardized), Residual (Unstandardized or Standardized)
- Graph/Scatter/Simple
  » Y-axis: res_1 (or zre_1)
  » X-axis: rdexpend

DUNTON'S WORLD OF SOUND: List of Data, Predicted Values and Residuals

NUMBER   SALES    PRE_1      RES_1
2        24.00    24.05556   -.05556
5        28.00    26.88889   1.11111
1        22.00    23.11111   -1.11111
3        26.00    25.00000   1.00000
4        25.00    25.94444   -.94444
1        24.00    23.11111   .88889
5        26.00    26.88889   -.88889

DUNTON'S WORLD OF SOUND: Histogram of Standardized Residuals
[Figure: histogram of the standardized residuals for the dependent variable SALES (Std. Dev = .91, Mean = 0.00, N = 7.00).]

DUNTON'S WORLD OF SOUND: Plot of Residuals vs. NUMBER
[Figure: residuals plotted against NUMBER, scattered about the zero line.]

The Electronic Firms
An accounting standards board investigating the treatment of research and development expenses by the nation's major electronic firms was interested in the relationship between a firm's research and development expenditures and its earnings.
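Before turning to the fitted Electronic Firms equation, here is a Python sketch (an addition, not from the slides; matplotlib assumed) that fits the Dunton's World of Sound data listed above and reproduces the two residual diagnostics: a histogram of the residuals and a plot of residuals versus X.

```python
import numpy as np
import matplotlib.pyplot as plt

number = np.array([2, 5, 1, 3, 4, 1, 5], dtype=float)        # X
sales = np.array([24, 28, 22, 26, 25, 24, 26], dtype=float)  # Y

# Fit the simple regression by ordinary least squares
n = number.size
b1 = (np.sum(number * sales) - n * number.mean() * sales.mean()) \
     / (np.sum(number**2) - n * number.mean()**2)
a = sales.mean() - b1 * number.mean()

pre_1 = a + b1 * number        # predicted values (PRE_1)
res_1 = sales - pre_1          # residuals (RES_1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(res_1, bins=5)                       # should look roughly symmetric around 0
ax1.set(title="Histogram of residuals", xlabel="residual")
ax2.scatter(number, res_1)                    # should scatter evenly about the zero line
ax2.axhline(0.0, linestyle="--")
ax2.set(title="Residuals vs. NUMBER", xlabel="NUMBER", ylabel="residual")
plt.tight_layout()
plt.show()
```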
The fitted regression equation is:
Earnings = 6.840 + 10.671(rdexpend)

ELECTRONIC FIRMS: List of Data, Predicted Values and Residuals

RDEXPEND   EARNINGS   PRE_1       RES_1       ZPR_1      ZRE_1
15.00      221.00     166.90075   54.09925    1.84527    2.39432
8.50       83.00      97.54224    -14.54224   .48229     -.64361
12.00      147.00     134.88913   12.11087    1.21620    .53600
6.50       69.00      76.20116    -7.20116    .06291     -.31871
4.50       41.00      54.86008    -13.86008   -.35647    -.61342
2.00       26.00      28.18373    -2.18373    -.88070    -.09665
.50        35.00      12.17792    22.82208    -1.19523   1.01006
1.50       40.00      22.84846    17.15154    -.98554    .75909
14.00      125.00     156.23021   -31.23021   1.63558    -1.38218
9.00       97.00      102.87751   -5.87751    .58713     -.26013
7.50       53.00      86.87170    -33.87170   .27260     -1.49909
.50        12.00      12.17792    -.17792     -1.19523   -.00787
2.50       34.00      33.51900    .48100      -.77585    .02129
3.00       48.00      38.85427    9.14573     -.67101    .40477
6.00       64.00      70.86589    -6.86589    -.04194    -.30387

(RDEXPEND and EARNINGS are the data; PRE_1 is the predicted value, RES_1 the residual, ZPR_1 the standardized predicted value, and ZRE_1 the standardized residual.)

ELECTRONIC FIRMS: Histogram of Standardized Residuals
[Figure: histogram of the standardized residuals for the dependent variable EARNINGS (Std. Dev = .96, Mean = 0.00, N = 15.00).]

ELECTRONIC FIRMS: Plot of Residuals vs. R&D Expenditures
[Figure: residuals plotted against RDEXPEND, scattered about the zero line.]

ELECTRONIC FIRMS: Plot of Standardized Residuals vs. RDexpend
[Figure: standardized residuals plotted against RDEXPEND.]

Homoscedasticity
[Figure: two standardized-residual plots against X. Under correct specification (constant variance), the residuals scatter evenly about zero; under heteroscedasticity, the plot is fan-shaped. Standardized residuals are used.]

Using SPSS to Test for Homoscedasticity of Residuals
- Graph/Scatter/Simple
  » Y-axis: res_1 (or zres_1)
  » X-axis: rdexpend

Test for Homoscedasticity
[Figures: the DUNTON'S WORLD OF SOUND plot of residuals vs. NUMBER and the ELECTRONIC FIRMS plot of residuals vs. RDEXPEND shown earlier, examined for a roughly constant spread about the zero line.]

Residual Plot for Independence
[Figure: two standardized-residual plots against X. Under correct specification the residuals show no pattern; when the errors are not independent, a pattern appears. The plots reflect the sequence in which the data were collected.]
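As a sketch of how the ZRE_1 column above could be reproduced outside SPSS (an illustration, not the slides' procedure; it assumes the standardized residual is the residual divided by the standard error of estimate), the following Python code fits the Electronic Firms data and plots standardized residuals against RDEXPEND to look for a fan shape.

```python
import numpy as np
import matplotlib.pyplot as plt

rdexpend = np.array([15.0, 8.5, 12.0, 6.5, 4.5, 2.0, 0.5, 1.5,
                     14.0, 9.0, 7.5, 0.5, 2.5, 3.0, 6.0])
earnings = np.array([221.0, 83.0, 147.0, 69.0, 41.0, 26.0, 35.0, 40.0,
                     125.0, 97.0, 53.0, 12.0, 34.0, 48.0, 64.0])

# Least-squares fit: earnings = a + b1 * rdexpend (about 6.840 + 10.671 X)
n = rdexpend.size
b1 = (np.sum(rdexpend * earnings) - n * rdexpend.mean() * earnings.mean()) \
     / (np.sum(rdexpend**2) - n * rdexpend.mean()**2)
a = earnings.mean() - b1 * rdexpend.mean()

resid = earnings - (a + b1 * rdexpend)
s_yx = np.sqrt(np.sum(resid**2) / (n - 2))   # standard error of estimate
z_resid = resid / s_yx                        # standardized residuals (ZRE_1-style)

plt.scatter(rdexpend, z_resid)                # look for a fan shape (heteroscedasticity)
plt.axhline(0.0, linestyle="--")
plt.xlabel("RDEXPEND")
plt.ylabel("standardized residual")
plt.show()
```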
Two Types of Autocorrelation
- Positive autocorrelation: successive terms in a time series are directly related.
- Negative autocorrelation: successive terms are inversely related.

Positive Autocorrelation
[Figure: residuals plotted against time period t; residuals tend to be followed by residuals with the same sign.]

Negative Autocorrelation
[Figure: residuals plotted against time period t; residuals tend to change sign from one period to the next.]

Problems with Autocorrelated Time-Series Data
- $S_{YX}$ and $S_b$ are biased downward.
- Probability statements about the regression equation and slopes are invalid.
- F and t tests won't be valid.
- May imply that cycles exist.
- May induce a falsely high or low agreement between 2 variables.

Using SPSS to Test for Independence of Errors
- Graph/Sequence » Variables: res_1
- Durbin-Watson statistic

Time Sequence of Residuals
[Figures: residuals plotted in sequence order for DUNTON'S WORLD OF SOUND (7 observations) and for ELECTRONIC FIRMS (15 observations).]

Durbin-Watson Procedure
- Used to detect autocorrelation: residuals in one time period are related to residuals in another period, a violation of the independence assumption.
- Durbin-Watson test statistic:
$D = \dfrac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
- $H_0$: no positive autocorrelation exists (the residuals are random); $H_1$: positive autocorrelation exists.
- Accept $H_0$ if $d > d_U$; reject $H_0$ if $d < d_L$; the test is inconclusive if $d_L < d < d_U$.

Testing for Positive Autocorrelation
[Figure: the d scale from 0 to 4. Below $d_L$ there is positive autocorrelation; between $d_L$ and $d_U$ the test is inconclusive; beyond $d_U$ (toward 2) there is no evidence of autocorrelation.]

Using SPSS with Autocorrelation
- Statistics/Regression/Linear: Dependent; Independent
- Statistics/Durbin-Watson (use only with time-series data)
- If DW indicates autocorrelation, then Statistics/Time Series/Autoregression, Cochrane-Orcutt, OK.

Model Summary (SPSS)
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .904 a   .817       .756                .6055                        2.509
a. Predictors: (Constant), ADVERT
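The Durbin-Watson statistic in the Model Summary above can be reproduced from the residuals. Here is a brief Python sketch (an addition, not from the slides) applying the formula $D = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$ to the Hasbro Toys residuals, treating the observations as a time sequence.

```python
import numpy as np

def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

# Hasbro Toys example, treating the five observations as a time sequence
ad = np.array([1, 2, 3, 4, 5], dtype=float)
sales = np.array([1, 1, 2, 2, 4], dtype=float)
residuals = sales - (-0.10 + 0.70 * ad)       # residuals from the fitted line

print(f"Durbin-Watson D = {durbin_watson(residuals):.3f}")   # about 2.509
```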
Solutions for Autocorrelation
- Use changes in the dependent and independent variables (first differences).
- Transform the variables.
- Include an independent variable that measures the time of the observation.
- Use lagged variables (once a lagged value of the dependent variable is introduced as an independent variable, the Durbin-Watson test is no longer valid).

Residual Plot for Linearity (Functional Form)
[Figure: two residual plots against X. Under correct specification the residuals show no pattern; a curved pattern suggests adding an $X^2$ term.]

ELECTRONIC FIRMS: Plot of Residuals vs. R&D Expenditures
[Figure: the residuals-vs-RDEXPEND plot shown earlier, examined for curvature.]

Evaluating Model Steps (recap): examine variation measures, test coefficients for significance, do residual analysis, do influence analysis.

Influence Analysis
- Outliers.
- Influence analysis examines observations that strongly affect the coefficient values (a brief code sketch after the Conclusion illustrates this).
- Example: a union strike occurred during data collection.
- You should try to understand why influential observations occurred, and only cautiously consider deleting an observation.

Effect of Influential Observation
[Figure: the line fitted with an influential observation included differs noticeably from the line fitted without it.]

Regression Cautions
- Violated assumptions
- Relevancy of historical data
- Level of significance
- Extrapolation
- Cause & effect

Extrapolation
[Figure: interpolation is prediction within the relevant range of X; extrapolation is prediction outside that range.]

Cause & Effect
- A regression or correlation does not establish causation; for example, liquor consumption and the number of teachers may be strongly correlated without one causing the other.

Conclusion
- Described the linear regression model
- Stated the regression modeling steps
- Explained ordinary least squares
- Computed regression coefficients
- Described residual & influence analysis
- Predicted the response variable
- Interpreted computer output
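To illustrate the 'line with vs. line without an influential observation' idea from the influence analysis slide, here is a small Python sketch (the data and the dropped point are hypothetical, invented only for illustration) that refits the regression after removing one extreme observation and compares the coefficients.

```python
import numpy as np

def ols_simple(x, y):
    """Return (intercept, slope) from ordinary least squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    return y.mean() - b1 * x.mean(), b1

# Hypothetical data: the last observation lies far from the pattern of the others
x = np.array([1, 2, 3, 4, 5, 10], dtype=float)
y = np.array([2, 3, 4, 5, 6, 30], dtype=float)

a_all, b_all = ols_simple(x, y)              # line with the influential observation
a_red, b_red = ols_simple(x[:-1], y[:-1])    # line without it

print(f"with point   : Y-hat = {a_all:.2f} + {b_all:.2f} X")
print(f"without point: Y-hat = {a_red:.2f} + {b_red:.2f} X")
```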