Regression
Adv. Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of Cognitive Science
PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2012, Michael Kalsher

Introduction to Regression
• If two variables covary, we should be able to predict the value of one variable from the other.
• Correlation tells us only how much two variables covary.
• In regression, we construct an equation that uses one or more variables (the IV(s) or predictor variable(s)) to predict another variable (the DV or outcome variable).
  – Predicting from one IV = Simple Regression
  – Predicting from multiple IVs = Multiple Regression

Simple Regression: The Model
• The general equation: $\text{Outcome}_i = (\text{model}) + \text{error}_i$
• In regression the model is linear, and we summarize a data set with a straight line.
• The regression line is determined through the method of least squares.

The Regression Line: Model = "things that define the line we fit to the data"
• Any straight line can be defined by:
  – The slope of the line ($b_1$)
  – The point at which the line crosses the ordinate, termed the intercept of the line ($b_0$)
• The general equation $\text{Outcome}_i = (\text{model}) + \text{error}_i$ becomes $Y_i = (b_0 + b_1 X_i) + \varepsilon_i$
• $b_1$ and $b_0$ are termed regression coefficients:
  – $b_1$ tells us what the model looks like (its shape).
  – $b_0$ tells us where the model is in geometric space.
• $\varepsilon_i$ is the residual term and represents the difference between participant i's obtained and predicted scores.
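As a minimal sketch of the model above, the snippet below estimates $b_0$ and $b_1$ with the standard closed-form least-squares formulas (slope = covariation of X and Y divided by variation of X; intercept = mean of Y minus slope times mean of X). The slides do not show these formulas, so treat them as a supplement. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (numerator of the slope).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sum of squared deviations of the predictor (denominator of the slope).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up illustrative data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)

# Residual for each case: obtained score minus predicted score.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

For a least-squares line with an intercept, the residuals sum to zero (up to floating-point error), which the last line lets you check.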

Method of Least Squares: Finding the line of best fit
[Figure: scatterplot of individual data points with the fitted regression line. Labels: slope = $b_1$ = dy / dx; intercept (constant) = $b_0$; residual (error in prediction); sum of residuals = 0.]
The method of least squares selects the line (the regression line) that has the lowest sum of squared differences between the observed data points and the line, and that therefore best represents the observed data.
Once we determine the slope ($b_1$) and intercept ($b_0$) of the line, we can insert different values of our predictor variable into the model to estimate the value of the outcome variable.

Assessing Goodness of Fit
• Even the best-fitting line can be a lousy fit to the data, so we need to assess the goodness of fit of the model against our best estimate in the absence of a model: the mean.
• Let's consider an example (see Field, p. 201):
  – A music mogul wants to know how many records her company will sell if she spends £100,000 on advertising.
  – In the absence of a model of the relationship between advertising and sales, the best guess would be the mean number of record sales (say 200,000), regardless of the amount of advertising.
  – So, as a basic strategy for predicting the outcome, we could use the mean, because on average it is a good guess.

Assessing Goodness of Fit
• SST: represents the total amount of differences present when the most basic model (the mean) is applied to the data. SST uses the differences between the observed data and the mean value of Y.
• SSR: represents the degree of inaccuracy when the best model is fitted to the data. SSR uses the differences between the observed data and the regression line.
• SSM: shows the reduction in inaccuracy resulting from fitting the regression model to the data. SSM uses the differences between the mean value of Y and the regression line.
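The three sums of squares can be sketched in a few lines of Python. The data are made up for illustration, and the coefficients $b_0 = 2.2$, $b_1 = 0.6$ are the least-squares line for that toy data; the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSM) for the line y = b0 + b1 * x."""
    mean_y = sum(y) / len(y)
    predicted = [b0 + b1 * xi for xi in x]
    # SST: observed data vs. the mean of Y (the most basic model).
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # SSR: observed data vs. the regression line (what the model misses).
    ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    # SSM: regression line vs. the mean of Y (improvement due to the model).
    ssm = sum((pi - mean_y) ** 2 for pi in predicted)
    return sst, ssr, ssm


# Made-up illustrative data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = 2.2, 0.6  # least-squares line for this toy data

sst, ssr, ssm = sums_of_squares(x, y, b0, b1)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSM = {ssm:.2f}")
```

When the line is the least-squares line, SST = SSM + SSR, so the total variation splits into a part the model explains and a part it does not. For an arbitrary line that identity generally fails.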
A large SSM implies the regression model predicts the outcome variable better than the mean.

Assessing Goodness of Fit
How big is big? The size of SSM is assessed in two ways: (1) via $R^2$ and (2) via the F-test, which assesses the ratio of systematic to unsystematic variance.
• $R^2 = \mathrm{SSM} / \mathrm{SST}$ represents the amount of variance in the outcome explained by the model relative to how much variance there was to explain.
• $F = \dfrac{\mathrm{SSM} / df_M}{\mathrm{SSR} / df_R} = \dfrac{\mathrm{MSM}}{\mathrm{MSR}}$
  – df for SSM ($df_M$) = number of predictor variables in the model
  – df for SSR ($df_R$) = number of observations minus the number of parameters being estimated

Simple Regression Using SPSS: Predicting Record Sales (Y) from Advertising Budget (X)
Data file: Record1.sav
What's the overall relationship between record sales and advertising budget?

Interpreting a Simple Regression: Overall Fit of the Model
[SPSS output annotated with SSM, SSR, SST, MSM, and MSR.]
• Advertising expenditure accounts for 33.5% of the variation in record sales.
• The significant F-test allows us to conclude that the regression model results in significantly better prediction of record sales than the mean value of record sales.
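The two fit statistics can be sketched as follows. The function names are hypothetical. The example does not use the raw sums of squares from the SPSS output; because F depends only on the ratio of SSM to SSR, it works with proportions instead (SSM = 0.335 and SSR = 0.665 of a total of 1, from the 33.5% figure). The sample size of 200 is an assumption inferred from the reported degrees of freedom (1, 198) plus the two estimated parameters.

```python
def r_squared(ssm, sst):
    """Proportion of variance in the outcome explained by the model."""
    return ssm / sst


def f_ratio(ssm, ssr, n_obs, n_predictors):
    """Return (F, df_model, df_residual) for a regression with an intercept."""
    df_model = n_predictors
    # Parameters estimated: one slope per predictor plus the intercept.
    df_resid = n_obs - (n_predictors + 1)
    ms_model = ssm / df_model   # MSM
    ms_resid = ssr / df_resid   # MSR
    return ms_model / ms_resid, df_model, df_resid


# Proportions of total variation, taken from R^2 = .335 (assumed SST = 1).
ssm, ssr = 0.335, 0.665
r2 = r_squared(ssm, ssm + ssr)

# n_obs = 200 is inferred from df = (1, 198), not stated in the slides.
f_value, df_m, df_r = f_ratio(ssm, ssr, n_obs=200, n_predictors=1)
print(f"R^2 = {r2:.3f}, F({df_m}, {df_r}) = {f_value:.2f}")
```

This gives an F of roughly 99.7, close to the F = 99.587 in the SPSS output. The small gap is presumably due to $R^2$ being rounded to three decimals.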

[SPSS ANOVA output: df = 1, 198; F = 99.587.]
[Table: critical values for F.]

Interpreting a Simple Regression: Model Parameters
[SPSS coefficients output annotated with $b_0$, the Y intercept, and $b_1$, the slope, or the change in the outcome associated with a unit change in the predictor.]
• The ANOVA tells us whether the overall model results in a significantly good prediction of the outcome variable, not about the individual contribution of variables in the model.
• $b_0$ = 134.14. When no money is spent on ads, the model predicts that 134,140 records will be sold (record sales are measured in thousands).
• $b_1$ = 0.096. This is the amount of change in the outcome associated with a unit change in the predictor. Thus, we can predict 96 extra record sales for every £1,000 spent on advertising.
• Regression coefficients should be significantly different from 0 and large relative to their standard errors.

Interpreting a Simple Regression: Model Parameters
• Unstandardized regression weights: $Y_{\text{pred}} = b_0 + b_1 X$. The intercept and slope are in the original units of X and Y, and so aren't directly comparable.
• Standardized regression weights: $Z_{Y(\text{pred})} = \beta Z_X$, where $\beta$ is the standardized weight. Standardized regression weights tell us the number of standard deviations by which the outcome will change as a result of a one standard deviation change in the predictor.
Richards (1982). Standardized versus unstandardized regression weights. Applied Psychological Measurement, 6, 201-212.

Interpreting a Simple Regression: Using the Model
Since we've demonstrated that the model significantly improves our ability to predict the outcome variable (record sales), we can plug in different values of the predictor variable(s).
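Plugging a value into the fitted model can be sketched as below, using the reported coefficients ($b_0$ = 134.14, $b_1$ = 0.096) and the music mogul's original question about a £100,000 budget. The units are an assumption inferred from the interpretation on the slide: advertising in thousands of pounds, sales in thousands of records. The function name `predict_record_sales` is hypothetical.

```python
B0 = 134.14  # intercept: predicted sales (thousands of records) at zero budget
B1 = 0.096   # slope: thousands of records per £1,000 of advertising


def predict_record_sales(ad_budget_pounds):
    """Predicted number of records sold for a budget given in pounds."""
    # Assumed units: the model's X is the budget in thousands of pounds.
    budget_thousands = ad_budget_pounds / 1000
    sales_thousands = B0 + B1 * budget_thousands
    # Convert the prediction from thousands of records to records.
    return sales_thousands * 1000


predicted = predict_record_sales(100_000)
print(f"Predicted sales for a £100,000 budget: {predicted:,.0f} records")
```

Under these unit assumptions, a £100,000 budget gives 134.14 + 0.096 × 100 = 143.74 thousand records. The same function can be called with any other budget.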
$\text{record sales}_i = b_0 + b_1 \, \text{advertising budget}_i = 134.14 + (0.096 \times \text{advertising budget}_i)$
What could the record executive expect if she spent £500,000 on advertising? How about £1,000,000?

Simple Regression: Supermodel.sav
A fashion student interested in the factors that predict the salaries of catwalk models collects data from 231 models. She asks each model for her salary per day on days she works (salary), her age (age), and the number of years she has worked as a model (years). She then gets a panel of experts from modeling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty).
Use simple regression to examine how well each of the potential predictor variables (age, years, beauty) predicts a model's salary.
[SPSS output and plots for the three predictors: attractiveness, age, and years.]