Interpreting Bi-variate OLS Regression
• Stata Regression Output
• Regression plots and RSS
• R²: Coefficient of Determination
  – Adjusted R²
• Sample Covariance/Correlation
• Hypothesis Testing
  – Standard Errors
  – T-tests and P-values
February 14, 2006 Lecture 5a Slide #1

Stata Regression Model: Regressing “Militant” on the Political Ideology Scale
Y is the average of variables 81 and 82 (reversed), 84, 85 and 86
X is variable 98: Political Ideology, 1 = “strong Lib”, 7 = “strong Cons”
[Figure: two density histograms, one of militant and one of p98_ideol, each on a 0 to 8 axis]
Slide #2

Regression Output
regress militant p98_ideo, beta

      Source |       SS       df       MS              Number of obs =    2584
-------------+------------------------------           F(  1,  2582) =  914.58
       Model |  885.217261     1  885.217261           Prob > F      =  0.0000
    Residual |  2499.09626  2582  .967891658           R-squared     =  0.2616
-------------+------------------------------           Adj R-squared =  0.2613
       Total |  3384.31352  2583  1.31022591           Root MSE      =  .98381

------------------------------------------------------------------------------
    militant |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
    p98_ideo |   .3650334   .0120704    30.24   0.000                 .5114341
       _cons |   2.289487   .0547324    41.83   0.000                        .
------------------------------------------------------------------------------
Slide #3

Regression Descriptive Statistics
corr militant p98_ideo, means

    Variable |       Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------
    militant |   3.837771   1.144651          1          7
    p98_ideo |   4.241486   1.603726          1          7

             | militant p98_ideo
-------------+------------------
    militant |   1.0000
    p98_ideo |   0.5114   1.0000
Slide #4

Regression Plot
[Figure: scatter plot of militant (vertical axis, 0 to 8) against p98_ideol (horizontal axis, 0 to 8), with the fitted values and their 95% CI]
Slide #5

Measuring “Goodness of Fit”
• Root of Mean Squared Error (“Root MSE”)
  $s_e = \sqrt{\dfrac{\mathrm{RSS}}{n - K}}$, where $\mathrm{RSS} = \sum e_i^2$ and $K$ = number of parameters
  – Measures spread around the regression line
• Coefficient of Determination (R²)
  $\mathrm{ESS} = \sum (\hat{Y}_i - \bar{Y})^2$ is the “model” or explained sum of squares
  $\mathrm{TSS} = \sum (Y_i - \bar{Y})^2$ is the “total” sum of squares
  $R^2 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}}$ and $(1 - R^2) = \dfrac{\mathrm{RSS}}{\mathrm{TSS}} = \dfrac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$
Slide #6

Explaining R²
For each observation Yi, variation around the mean can be decomposed into that which is “explained” by the regression and that which is not:
  $Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$
[Figure: one observation's unexplained deviation and explained deviation, with $\bar{Y}$ and $\hat{Y}$ marked]
Book terminology:
  TSS = Σ(all)²
  RSS = Σ(unexplained)²
  ESS = Σ(explained)²
Stata terminology:
  Residual = Σ(unexplained)²
  Model = Σ(explained)²
  Total = Σ(all)²
Slide #7

Sample Covariance & Correlation
• Sample covariance for a bivariate model is defined as:
  $s_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
• Sample correlations (r) “standardize” covariance by dividing by the product of the X and Y standard deviations:
  $r = \dfrac{s_{XY}}{s_X s_Y}$
Sample correlations range from -1 (perfect negative relationship) to +1 (perfect positive relationship)
Slide #8

Standardized Regression Coefficients (aka “Beta Weights” or “Betas”)
• Formula:
  $b_1^* = b_1 \dfrac{s_X}{s_Y}$
• In our example:
  $b_1^* = 0.365 \times \dfrac{1.604}{1.145} = 0.511$
• Interpretation: the number of standard deviations of change in Y one should expect from a one standard deviation change in X.
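The goodness-of-fit numbers and the beta weight can be rechecked by hand from the sums of squares and descriptive statistics that Stata reports. The sketch below does that arithmetic in Python rather than Stata, using only values copied from the output above; the variable names are illustrative and not part of the lecture.

```python
import math

# Values copied from the regress and corr output (Slides #3 and #4).
ess = 885.217261   # "Model" sum of squares (explained)
rss = 2499.09626   # "Residual" sum of squares (unexplained)
tss = 3384.31352   # "Total" sum of squares
n = 2584           # number of observations
k = 2              # parameters estimated: intercept and slope
b1 = 0.3650334     # unstandardized slope on p98_ideo
s_x = 1.603726     # std. dev. of p98_ideo
s_y = 1.144651     # std. dev. of militant
r_xy = 0.5114      # sample correlation of militant and p98_ideo

r2 = ess / tss                                   # R-squared = ESS / TSS
adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))   # adjusts for degrees of freedom
root_mse = math.sqrt(rss / (n - k))              # s_e, spread around the line
f_stat = (ess / (k - 1)) / (rss / (n - k))       # Model MS / Residual MS
beta = b1 * s_x / s_y                            # standardized slope

print(f"R-squared      = {r2:.4f}")
print(f"Adj R-squared  = {adj_r2:.4f}")
print(f"Root MSE       = {root_mse:.5f}")
print(f"F(1, {n - k})     = {f_stat:.2f}")
print(f"Beta           = {beta:.4f}")
print(f"r and r^2      = {r_xy:.4f}, {r_xy ** 2:.4f}")
```

With a single predictor the beta weight equals the sample correlation r, and R² equals r² up to rounding, which is why 0.5114 appears in both the Beta column and the correlation matrix.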
Slide #9

Hypothesis Tests for Regression Coefficients
• For our model: Yi = 2.289 + 0.365*Xi + ei
• Another sample of 2584 observations would lead to different estimates for b0 and b1. If we drew many such samples, we’d get the sampling distribution of the estimates
• We need to estimate the sampling distribution (because we usually can’t see it) based on our sample size and variance
Slide #10

To do that we calculate SEbs (Bivariate case only)
  $SE_{b_1} = \dfrac{s_e}{\sqrt{\mathrm{TSS}_X}}$, where $\mathrm{TSS}_X = \sum (X_i - \bar{X})^2$
  $SE_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{X}^2}{\mathrm{TSS}_X}}$
Slide #11

Interpreting Standard Errors
• For our model:
  – b0 = 2.289, and SEb0 = 0.055
  – b1 = 0.365, and SEb1 = 0.012
The t-statistic reports the number of standard errors our estimate falls away from zero. Thus, the “t” for b1 is 30.24 for our model. (Dividing the rounded values, 0.365 / 0.012, gives 30.4; the gap is rounding.)
Assuming that we estimated the standard error correctly, we can identify how many standard errors our estimate is away from zero.
[Figure: Estimated Sampling Distribution for b1, centered on b1 = 0.365, with b1 - SEb1 = 0.353 and b1 + SEb1 = 0.377 marked; zero lies 30.24 SEb1 “units” away from b1]
Slide #12

Classical Hypothesis Testing
Assume that b1 is zero. What is the probability that your sample would have resulted in an estimate for b1 that is 30.24 SEb1’s away from zero?
To find out, determine the share of the estimated sampling distribution (the tail probability) that falls more than 30.24 SEb1’s away from zero.
See Table A4.1, page 350, in Hamilton. It reports discrete “p-values”, given the sample size and t-values. Note the distinction between 1 and 2 sided tests.
In general, if the absolute value of the t-stat is above 2, the two-sided p-value will be < 0.05, which is the conventional upper limit in a classical hypothesis test.
Note: in Stata-speak, a p-value is a “P>|t|”
[Figure: the sampling distribution under the assumption that b1 = 0.0 (null hypothesis), compared with the estimated b1 = 0.365 (working hypothesis)]
Slide #13

Coming up...
• For Tuesday
  – Use variables 87-89 to make an “egalitarian” index for your dependent variable (Y)
  – Use p98_ideo (ideology) as the independent variable (X) to predict egalitarianism. Fully interpret the results.
    • Walk through the entire interpretation
    • Build a Stata do-file as you go
• For Next Week:
  – Remainder of Chapter 2
• Schedule:
  – Feb 21: Residual Analysis & Exam Review
  – Feb 28: Exam
Slide #14
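As a check on the standard-error and t-test slides (Slides #11 to #13), the reported standard errors and t-statistics can be rebuilt from the Root MSE and the descriptive statistics for X. This is a minimal sketch in Python, not Stata, and it rests on two assumptions that are not in the lecture: TSS_X is recovered as (n - 1) times the squared standard deviation of X, and the p-value uses a standard normal approximation to the t distribution, which is reasonable only because there are 2582 degrees of freedom. The variable names are illustrative.

```python
import math

# Values copied from the Stata output (Slides #3 and #4).
n = 2584
root_mse = 0.98381               # s_e
mean_x = 4.241486                # mean of p98_ideo
sd_x = 1.603726                  # std. dev. of p98_ideo
b0, b1 = 2.289487, 0.3650334     # intercept and slope

# Assumption: s_X^2 = TSS_X / (n - 1), so TSS_X can be recovered from sd_x.
tss_x = (n - 1) * sd_x ** 2

se_b1 = root_mse / math.sqrt(tss_x)
se_b0 = root_mse * math.sqrt(1 / n + mean_x ** 2 / tss_x)

t_b1 = b1 / se_b1                # standard errors between b1 and zero
t_b0 = b0 / se_b0

# Assumption: normal approximation to the t distribution (large df).
# Two-sided tail probability: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
p_b1 = math.erfc(abs(t_b1) / math.sqrt(2))
p_at_t2 = math.erfc(2 / math.sqrt(2))   # p-value when |t| is exactly 2

print(f"SE(b1) = {se_b1:.7f}, t = {t_b1:.2f}")
print(f"SE(b0) = {se_b0:.7f}, t = {t_b0:.2f}")
print(f"b1 -/+ one SE: {b1 - se_b1:.3f} to {b1 + se_b1:.3f}")
print(f"two-sided p for b1 (normal approx.) = {p_b1:.3g}")
print(f"two-sided p at |t| = 2 (normal approx.) = {p_at_t2:.4f}")
```

The p-value for b1 is far too small to show in three decimals, which is why Stata prints 0.000, and the last line illustrates the rule of thumb that a t-statistic beyond 2 in absolute value gives a two-sided p-value under 0.05.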