PSYC 381 – Statistics: Regression
Arlo Clark-Foos
2/8/2016

Regression: Predicting the Future
• From correlation to regression
• Examples:
  – Car insurance: age, sex, car, driving history
  – WHO & avian flu: spread, poverty

Regression vs. Correlation
• Regression: prediction
• Correlation: relationship
• Simple linear regression
  – A statistical tool that predicts an individual's score on the DV from the score on one IV
  – Uses a straight line: if we know x, we can find y

Linear Regression Using z Scores
• A student who knows they will miss X days of class: what can I tell them about their probable exam grade?
• Prediction equation:
      z_Ŷ = (r_XY)(z_X)
  – Ŷ = "y hat," the predicted score on variable Y
  – r_XY = the correlation between X and Y
  – z_X = the z score for a raw score on variable X
• Note: Predicted z scores for Y are smaller (i.e., closer to the mean) than the actual z scores for X; they are regressing to the mean.

Regression to the Mean
• The tendency of scores that are particularly high or low to drift toward the mean over time
• Examples: Air Force flight training (good and bad days flying), operant conditioning (reward vs. punishment)
• Converting a predicted z score to a predicted raw score:
      Ŷ = z_Ŷ(SD_Y) + M_Y

Creating a Regression Line
• Familiar form: y = m(x) + b
• Regression form: Ŷ = a + b(X)
  – a = intercept: the value of Y when X = 0
  – b = slope: the amount of increase in Y for every increase of 1 in X

Calculating the Intercept (a)
1. Calculate a z score for X = 0: z_X = (0 − M_X)/SD_X
2. Calculate the predicted z score for Y: z_Ŷ = (r_XY)(z_X)
3. Calculate the predicted raw score from the predicted z score: Ŷ = z_Ŷ(SD_Y) + M_Y; this Ŷ is the intercept a.

Calculating the Slope (b)
• Repeat the same steps for X = 1, then compute
      slope = rise/run = (y₂ − y₁)/(x₂ − x₁)
• How does Ŷ change as X goes from 0 to 1?
  – If positive, the line goes up to the right.
  – If negative, the line goes down to the right.
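The z-score recipe for the intercept and slope can be sketched in Python. This is a minimal illustration, not course material: the absence/grade data are invented, and sd() uses the population formula (dividing by N) so the z scores match the definitions above.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    # population standard deviation (divide by N)
    m = mean(xs)
    return math.sqrt(sum((v - m) ** 2 for v in xs) / len(xs))

def correlation(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

def predict(x, xs, ys):
    # z_X for the raw score, then z_Yhat = r_XY * z_X, then back to a raw score
    z_x = (x - mean(xs)) / sd(xs)
    z_yhat = correlation(xs, ys) * z_x
    return z_yhat * sd(ys) + mean(ys)

# Hypothetical data: days absent (X) and exam grade (Y)
absences = [0, 1, 2, 4, 6]
grades = [95, 90, 85, 75, 60]

a = predict(0, absences, grades)      # intercept: Yhat when X = 0
b = predict(1, absences, grades) - a  # slope: change in Yhat from X = 0 to X = 1
```

Because b is negative here, the line goes down to the right: each extra absence predicts a lower grade.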
• Drawing a regression line
  – Calculate several pairs of X and Ŷ, plot them on your scatterplot, and draw a straight line through the points.

Standardized Slope (β)
• Used when comparing regression equations for variables measured on different scales.
• β = the standardized version of the slope in a regression equation, expressed in standard deviation (σ) units:
      β = b(√SS_X / √SS_Y)

Errors in Prediction
• Example: predicting the cost of moving from GA to MI
  – Planned: truck rental, gas, hotels
  – Oops: pet fee at hotels, food on the way up, furniture pads for the truck
• Standard error of the estimate
  – A statistic indicating the typical distance between the regression line and the actual data points

Effect Size of Regression
• Proportionate reduction in error (r²)
  – AKA: the coefficient of determination
  – A statistic that quantifies how much more accurate our predictions are when we use the regression line instead of the mean as a prediction tool
  – Goal: how accurate is our regression equation at predicting the future?

Coefficient of Determination (r²)
• SS_Total: the total error we have if we use only the mean to predict
      SS_Total = Σ(Y − M_Y)²
• SS_Error: the total error we have if we use Ŷ from the regression equation
      SS_Error = Σ(Y − Ŷ)²
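The standardized slope and the two error sums can be computed directly from sums of squares. A minimal sketch with invented data; note the standard-error divisor N follows one common textbook convention (some texts divide by N − 2):

```python
import math

# Hypothetical data: days absent (X) and exam grade (Y)
xs = [0, 1, 2, 4, 6]
ys = [95, 90, 85, 75, 60]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
ss_x = sum((x - mx) ** 2 for x in xs)                     # sum of squares for X
ss_y = sum((y - my) ** 2 for y in ys)                     # sum of squares for Y
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))     # sum of cross-products

b = sp / ss_x                                  # raw slope
a = my - b * mx                                # intercept
beta = b * math.sqrt(ss_x) / math.sqrt(ss_y)   # standardized slope

# Error using only the mean vs. error using the regression line
ss_total = sum((y - my) ** 2 for y in ys)
ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Standard error of the estimate: typical distance from line to data points
see = math.sqrt(ss_error / n)
```

SS_Error is far smaller than SS_Total here, which is exactly what the proportionate reduction in error on the next slide quantifies.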
Coefficient of Determination (r²)
• Putting the two error sums together:
      r² = (SS_Total − SS_Error) / SS_Total
• r² is the amount of variance in the DV that is explained by the IV
  – The proportion of variance accounted for

Multiple Regression & R²
• Prediction equation with two IVs:
      Y′ᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ
• Using several variables to predict future scores
  – Orthogonal variable: an IV that makes a separate and distinct contribution to the prediction of a DV

Stepwise Multiple Regression
• Software determines the order in which IVs are included in the regression equation
  – The IV with the largest significant r² comes first
  – Pros: good if we have no strong theory about our predictors
  – Cons: may ignore nonorthogonal, overlapping variables, implying they are unimportant

Hierarchical Multiple Regression
• The researcher uses theory to determine the order in which IVs are included in the regression equation
  – Example (PSYC 465): age, gender, sleep, depression
• Pros: based on theory, so it is less likely to identify bad predictors by accident
• Cons: sometimes our theory is lacking
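The multiple-regression equation Y′ = b₀ + b₁X₁ + b₂X₂ can be fit with ordinary least squares. A hypothetical sketch: the hours-slept and days-absent predictors and the grades are invented, and NumPy's lstsq stands in for the statistical software the slides mention (it fits all IVs at once rather than stepwise or hierarchically).

```python
import numpy as np

# Hypothetical data: predict exam grade (Y) from hours slept (X1)
# and days absent (X2); values invented for illustration
X1 = np.array([8, 7, 6, 5, 4], dtype=float)
X2 = np.array([0, 1, 2, 4, 6], dtype=float)
Y = np.array([95, 90, 85, 75, 60], dtype=float)

# Design matrix with a column of 1s for the intercept b0
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = coef

# R^2 via the same proportionate-reduction-in-error logic as above
Y_hat = A @ coef
ss_total = np.sum((Y - Y.mean()) ** 2)
ss_error = np.sum((Y - Y_hat) ** 2)
r_squared = (ss_total - ss_error) / ss_total
```

With two IVs the same formula (SS_Total − SS_Error)/SS_Total gives the multiple R², the variance in the DV explained by all predictors together.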