Chapter 15: Model Building and Model Diagnostics
McGraw-Hill/Irwin. Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter outline:
15.1 The Quadratic Regression Model
15.2 Interaction
15.3 Logistic Regression
15.4 Model Building and the Effects of Multicollinearity
15.5 Improving the Regression Model I: Diagnosing and Using Information about Outlying and Influential Observations
15.6 Improving the Regression Model II: Transforming the Dependent and Independent Variables
15.7 Improving the Regression Model III: The Durbin-Watson Test and Dealing with Autocorrelation

LO 1: Model quadratic relationships by using the quadratic regression model.

15.1 The Quadratic Regression Model
One useful form of linear regression is the quadratic regression model. Assume we have n observations of x and y. The quadratic regression model relating y to x is

y = β0 + β1x + β2x² + ε

1. β0 + β1x + β2x² is the mean value of the dependent variable y when the value of the independent variable is x
2. β0, β1, and β2 are unknown regression parameters relating the mean value of y to x
3. ε is an error term that describes the effects on y of all factors other than x and x²

More Variables
So far we have looked only at the simple case with a single predictor x, which gave us the quadratic regression model

y = β0 + β1x + β2x² + ε

However, we are not limited to just two terms. The following is also a valid quadratic regression model:

y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ε

LO 2: Detect and model interaction between two independent variables.
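The quadratic model above is still linear in its parameters, so it can be fitted by ordinary least squares. A minimal sketch with NumPy, using made-up data generated from assumed coefficients (β0 = 1, β1 = 0.5, β2 = 0.7 are chosen purely for illustration):

```python
import numpy as np

# Hypothetical data: y follows y = 1 + 0.5x + 0.7x^2 exactly,
# so the least-squares fit should recover these coefficients.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1.0 + 0.5 * x + 0.7 * x ** 2

# polyfit with deg=2 returns least-squares estimates (b2, b1, b0),
# highest power first, for the model y = b0 + b1*x + b2*x^2 + error.
b2, b1, b0 = np.polyfit(x, y, deg=2)

# Point estimate of the mean of y at a new x value.
x_new = 7.0
y_hat = b0 + b1 * x_new + b2 * x_new ** 2
```

In practice one simply adds a computed x² column to the design matrix; any multiple-regression routine then estimates β0, β1, and β2 in the usual way.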
15.2 Interaction
Multiple regression models often contain interaction variables: variables formed by multiplying two independent variables together, for example x1·x2. In this case, the x1·x2 variable appears in the model along with both x1 and x2. We use interaction variables when the relationship between the mean value of y and one independent variable depends on the value of another independent variable.

LO 3: Use a logistic model to estimate probabilities and odds ratios.

15.3 Logistic Regression
Logistic regression and least squares regression are very similar: both produce prediction equations. The y variable is what makes logistic regression different. With least squares regression, the y variable is a quantitative variable; with logistic regression, it is usually a dummy 0/1 variable. With large data sets, the y variable may be the proportion of a set of observations having a dummy-variable value of one.

General Logistic Regression Model

p(x1, x2, …, xk) = e^(β0 + β1x1 + β2x2 + … + βkxk) / (1 + e^(β0 + β1x1 + β2x2 + … + βkxk))

p(x1, x2, …, xk) is the probability that the event under consideration will occur when the values of the independent variables are x1, x2, …, xk. The odds of the event occurring are p(x1, x2, …, xk) / (1 − p(x1, x2, …, xk)): the probability that the event will occur divided by the probability that it will not.

LO 4: Describe and measure multicollinearity.

15.4 Model Building and the Effects of Multicollinearity
Multicollinearity is the condition in which the independent variables are dependent on, related to, or correlated with each other.
Effects:
- Hinders the ability to use t statistics and p-values to assess the relative importance of predictors
- Does not hinder the ability to predict the dependent (or response) variable
Detection:
- Scatter plot matrix
- Correlation matrix
- Variance inflation factors (VIF)

LO 5: Use various model comparison criteria to identify one or more appropriate regression models.
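Given fitted coefficients, the probability and odds formulas above are direct to evaluate. A sketch with NumPy; the coefficient values and predictor values here are invented for illustration, not taken from any fitted model:

```python
import numpy as np

# Hypothetical fitted coefficients b0, b1, b2 for two predictors x1, x2.
b = np.array([-4.0, 0.05, 1.2])   # assumed values: b0, b1, b2
x = np.array([1.0, 40.0, 1.0])    # leading 1 pairs with the intercept b0

# Linear predictor z = b0 + b1*x1 + b2*x2, then the logistic transform
# p = e^z / (1 + e^z), which always lies strictly between 0 and 1.
z = b @ x
p = np.exp(z) / (1.0 + np.exp(z))

# Odds that the event occurs: p / (1 - p).  Algebraically, odds = e^z,
# so each one-unit increase in x_j multiplies the odds by e^(b_j).
odds = p / (1.0 - p)
```

The last comment is why logistic coefficients are usually interpreted through odds ratios: e^(b_j) is the multiplicative change in the odds per unit change in x_j, holding the other predictors fixed.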
Comparing Regression Models on R², s, Adjusted R², and Prediction Interval Length
Multicollinearity causes problems in evaluating the p-values of the model. Therefore, we need to evaluate more than the additional importance of each independent variable; we also need to evaluate how the variables work together. One way to do this is to determine whether the overall model gives a high R² and adjusted R², a small s, and short prediction intervals.

The C Statistic
Another quantity for comparing regression models is the C statistic, also known as the Cp statistic. First, calculate the mean square error for the model containing all p potential independent variables; denote it s²p. Next, calculate the SSE for a reduced model with k independent variables. Then calculate

C = SSE / s²p − [n − 2(k + 1)]

LO 6: Use diagnostic measures to detect outlying and influential observations.

15.5 Diagnosing and Using Information About Outlying and Influential Observations
- Observation 1: outlying with respect to its y value
- Observation 2: outlying with respect to its x value
- Observation 3: outlying with respect to its x value, with a y value not consistent with the regression relationship (influential)

LO 7: Use data transformations to help remedy violations of the regression assumptions.

15.6 Transforming the Dependent and Independent Variables
A possible remedy for violations of the constant variance, correct functional form, and normality assumptions is to transform the dependent variable. Possible transformations include:
- Square root
- Quartic root
- Logarithmic
The appropriate transformation will depend on the specific problem with the original data set.

LO 8: Use the Durbin-Watson test to detect autocorrelated error terms.

15.7 The Durbin-Watson Test and Dealing with Autocorrelation
One type of autocorrelation is called first-order autocorrelation. This is when the error term in time period t (εt) is related to the error term in time period t−1 (εt−1). The Durbin-Watson statistic checks for first-order autocorrelation.
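The Durbin-Watson statistic itself is straightforward to compute from the regression residuals: d = Σ(et − et−1)² / Σet², summed over the time-ordered residuals. A sketch, using an invented residual series that drifts slowly (a pattern suggestive of positive first-order autocorrelation):

```python
import numpy as np

# Hypothetical time-ordered residuals e_1..e_n from a regression;
# note how neighboring residuals tend to have the same sign.
e = np.array([0.5, 0.6, 0.4, -0.2, -0.5, -0.4, 0.1, 0.3])

# Durbin-Watson statistic: d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_t e_t^2.
# d always lies in [0, 4].  Values near 2 suggest no first-order
# autocorrelation; values well below 2 suggest positive autocorrelation,
# and values well above 2 suggest negative autocorrelation.
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```

Whether a given d is significantly below 2 is judged against the Durbin-Watson critical-value bounds (dL, dU), which depend on n and the number of predictors.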