Chapter 10: Simple Linear Regression

A model in which a variable, X, explains another variable, Y, using a linear structure, with allowance for error, e, the unexplained part of Y:

Y = b + m*X + e

Regression analysis assesses two sets of issues:
• How well does X explain Y? (Regression Analysis)
• Do the regression "residuals" behave as they theoretically should? (Residuals Analysis)

Regression Analysis: 4 issues
1. R2: coefficient of determination. Evaluates the fit of the regression line to the data. 0 ≤ R2 ≤ 1. Ideally, R2 ≈ 1.
2. SE: standard error of the regression. Measures the dispersion of the actual data points around the regression line. The SE is measured in units of Y, and ideally, SE ≈ 0. To assess the magnitude of SE, compare it to average(Y) to obtain a coefficient of variation, SE / average(Y).
3. ANOVA table: Significance F is the p-value for the test of the null hypothesis that the regression line is statistically insignificant (H0: m = 0 vs. Ha: m ≠ 0).
4. Coefficients table: reports the estimated intercept and slope of the regression line, their respective standard errors, test statistics, and p-values for their statistical significance (H0: m = 0 vs. Ha: m ≠ 0 for the slope, and H0: b = 0 vs. Ha: b ≠ 0 for the intercept).

Regression Statistics: Coefficient of Determination, r2, and Standard Error

$$ r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} $$

$$ S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}} $$

SSR, SSE, and SST are read from the ANOVA table:

ANOVA
Source        df         SS     MS                   F          Significance F
Regression    k          SSR    MSR = SSR/k          MSR/MSE    p-value of the F test
Residuals     n-k-1      SSE    MSE = SSE/(n-k-1)
Total         n-1        SST

Estimation for Regression Analysis Using Least Squares

Assumptions: 2 equations to solve for 2 unknowns, the intercept b0 and the slope b1 (b and m in the model above):
• Unbiased explanation: $\sum_{i=1}^{n} e_i = 0$, which gives $\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i$
• Explanatory factor, X, uncorrelated with e: $\sum_{i=1}^{n} X_i e_i = 0$, which gives $\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2$

Residuals Analysis: 3 issues
1. Checking for normality of residuals: construct a histogram of the residuals, a box-and-whisker plot of the residuals, or a normal probability plot of the residuals with the assistance of MS Excel.
2. Checking for uniform variation in residuals relative to X: the residuals plot (the scatterplot of e against X) should show no pattern or regularities. Otherwise, the linear model inconsistently explains Y as a function of X, and a nonlinear function of X would better explain Y.
3. Checking for autocorrelation in residuals: use Excel to compute the Durbin-Watson statistic from the residuals calculated by the regression process. 1.4 ≤ DW ≤ 2.6 indicates no significant autocorrelation.

Durbin-Watson Calculations
Sum of Squared Difference of Residuals    2123665.578
Sum of Squared Residuals                  870949.4547
Durbin-Watson Statistic                   2.438333897

The Durbin-Watson statistic is the first sum divided by the second. Want this value to be "close" to 2.00.
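The regression statistics and the Durbin-Watson check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Excel procedure the slides rely on: the function names and the five-point data set are hypothetical, the fit solves the two least-squares equations directly, and the Significance F p-value is left out because it needs an F-distribution routine.

```python
import math


def simple_linear_regression(x, y):
    """Fit Y = b0 + b1*X by solving the two least-squares equations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Two equations, two unknowns:
    #   sum(Y)  = n*b0      + b1*sum(X)      (residuals sum to zero)
    #   sum(XY) = b0*sum(X) + b1*sum(X^2)    (X uncorrelated with residuals)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mean_y = sum_y / n
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
    sse = sum(e ** 2 for e in residuals)        # residual sum of squares
    ssr = sst - sse                             # regression sum of squares

    k = 1                                       # one explanatory variable
    mse = sse / (n - k - 1)
    return {
        "b0": b0,
        "b1": b1,
        "r2": ssr / sst,
        "se": math.sqrt(mse),                   # S_YX, in units of Y
        "F": (ssr / k) / mse,
        "residuals": residuals,
    }


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_linear_regression(x, y)
dw = durbin_watson(fit["residuals"])
print("b0 =", round(fit["b0"], 4), "b1 =", round(fit["b1"], 4))
print("r2 =", round(fit["r2"], 4), "SE =", round(fit["se"], 4))
print("F =", round(fit["F"], 2), "DW =", round(dw, 4))

# Recompute the slide's Durbin-Watson figure from its two reported sums.
dw_slide = 2123665.578 / 870949.4547
print("slide DW =", round(dw_slide, 6),
      "within 1.4..2.6:", 1.4 <= dw_slide <= 2.6)
```

Dividing the two reported sums reproduces the slide's Durbin-Watson statistic up to rounding, and the value falls inside the 1.4 to 2.6 band. The toy data set is far too small for a meaningful autocorrelation test; it is there only to show the arithmetic.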