2/8/2016
Regression
PSYC 381 – Statistics
Arlo Clark-Foos
Regression: Predicting the Future
• Correlation → Regression
• Examples:
– Car Insurance
• Age, Male, Car, Driving History
– WHO & Avian Flu
• Spread, Poverty
Regression vs. Correlation
• Regression: Prediction
• Correlation: Relationship
• Simple Linear Regression
– Statistical tool that predicts an individual’s score on the
dependent variable (DV) from the score on one independent variable (IV)
– Uses a straight line…if we know x, we can find y
Linear Regression Using z Scores
• A student who knows they will miss X days…What can I tell
them about their probable exam grade?
$z_{\hat{y}} = (r_{xy})(z_x)$
$\hat{y}$ = y “hat” (predicted score on variable y)
$r_{xy}$ = correlation between x and y
$z_x$ = z score for a raw score on variable x
Note: Unless the correlation is perfect (r = ±1), predicted z scores
for Y are smaller in magnitude (i.e., closer to the mean) than the
actual z scores for X…they are regressing to the mean.
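The z-score prediction formula can be sketched in a few lines of Python. The correlation and z score below are hypothetical values chosen for illustration, not figures from the lecture.

```python
def predict_z_y(r_xy, z_x):
    """Predicted z score on Y: z_yhat = (r_xy)(z_x)."""
    return r_xy * z_x

# Hypothetical values: absences and exam grades correlate at -0.85,
# and a student's absences are 1.5 standard deviations above the mean.
r_xy = -0.85
z_x = 1.5
z_y_hat = predict_z_y(r_xy, z_x)

print(f"predicted z score for Y: {z_y_hat:.3f}")
# The prediction is closer to the mean (0) than the z score for X was.
print(abs(z_y_hat) < abs(z_x))
```

Because the correlation is multiplied into the z score, any correlation weaker than ±1 shrinks the prediction toward the mean.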
Regression to the Mean
• The tendency of scores that are particularly high or
low to drift toward the mean over time
• Example: Air Force flight training
– Good and bad days flying
– Operant conditioning: reward vs. punishment
Linear Regression Using z Scores
• Predicted z score to predicted raw score
$z = \frac{X - \mu}{\sigma}$
$X = z(\sigma) + \mu$
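A minimal sketch of the two conversions, with a hypothetical mean and standard deviation (the function names are illustrative only):

```python
def raw_to_z(x, mean, sd):
    """z = (X - mean) / sd"""
    return (x - mean) / sd

def z_to_raw(z, mean, sd):
    """X = z(sd) + mean"""
    return z * sd + mean

# Hypothetical exam distribution: mean 78, standard deviation 12.
exam_mean, exam_sd = 78.0, 12.0

predicted_z = -1.275  # e.g., a predicted z score for Y
predicted_grade = z_to_raw(predicted_z, exam_mean, exam_sd)
print(f"predicted raw score: {predicted_grade:.1f}")

# Converting back recovers the original z score.
print(raw_to_z(predicted_grade, exam_mean, exam_sd))
```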
Creating a Regression Line
$y = m(x) + b$
$\hat{Y} = a + b(X)$
a = intercept…the value of Y when X = 0
b = slope, the amount of increase in Y for
every increase of 1 in X
Calculating Intercept (a)
1. Calculate a z score for X = 0
$z_x = \frac{(0 - M_x)}{SD_x}$
2. Calculate predicted z score for Y
$z_{\hat{y}} = (r_{xy})(z_x)$
3. Calculate predicted raw score from
predicted z score
$\hat{Y} = z_{\hat{y}}(SD_Y) + M_Y$
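The three steps can be traced one line at a time. The summary statistics below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical summary statistics (assumed for illustration).
M_x, SD_x = 2.0, 1.5    # mean and SD of X (days absent)
M_y, SD_y = 78.0, 12.0  # mean and SD of Y (exam grade)
r_xy = -0.9             # correlation between X and Y

# Step 1: z score for X = 0
z_x = (0 - M_x) / SD_x

# Step 2: predicted z score for Y
z_y_hat = r_xy * z_x

# Step 3: predicted raw score, which is the intercept a
a = z_y_hat * SD_y + M_y

print(f"z_x = {z_x:.4f}, z_y_hat = {z_y_hat:.4f}, intercept a = {a:.1f}")
```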
Calculating Slope (b)
• Repeat steps for X = 1
$\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{y_2 - y_1}{x_2 - x_1}$
• How does Y-hat change as X goes from 0 to 1?
– If positive, then the line goes up to the right.
– If negative, then the line goes down to the right.
– Drawing a regression line
• Calculate several pairs of Y-hat and X, then plot them on your
scatter plot and draw a straight line through the points.
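As a rough sketch, the slope can be found by predicting Y-hat at X = 0 and X = 1 with the z-score method and taking rise over run. The data set is hypothetical, and the population standard deviation (`pstdev`) is an assumption about how SD is defined in this course.

```python
from statistics import mean, pstdev

# Hypothetical data: days absent (X) and exam grade (Y).
absences = [0, 1, 2, 3, 4]
grades = [95, 85, 80, 70, 60]

def pearson_r(xs, ys):
    """Correlation as the mean product of paired z scores."""
    m_x, m_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    products = (((x - m_x) / sd_x) * ((y - m_y) / sd_y) for x, y in zip(xs, ys))
    return sum(products) / len(xs)

def predict(x, xs, ys):
    """Y-hat for a raw X: raw -> z_x -> z_yhat -> raw Y-hat."""
    z_x = (x - mean(xs)) / pstdev(xs)
    z_y_hat = pearson_r(xs, ys) * z_x
    return z_y_hat * pstdev(ys) + mean(ys)

a = predict(0, absences, grades)           # intercept: Y-hat when X = 0
y_hat_at_1 = predict(1, absences, grades)  # repeat the steps for X = 1
b = (y_hat_at_1 - a) / (1 - 0)             # slope = rise / run

# Several (X, Y-hat) pairs to plot when drawing the regression line.
points = [(x, predict(x, absences, grades)) for x in (0, 2, 4)]
print(f"a = {a:.2f}, b = {b:.2f}")
print(points)
```

A negative `b` here means the line goes down to the right: each additional absence lowers the predicted grade.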
Standardized Slope (β)
• Used when comparing regression equations for
variables measured on different scales.
– β = standardized version of the slope in a regression
equation (in standard deviation (σ) units).
$\beta = (b)\frac{\sqrt{SS_X}}{\sqrt{SS_Y}}$
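A small sketch of the standardized slope, using hypothetical data. It also checks a known property of simple (one-predictor) regression: β comes out equal to the correlation r.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: days absent (X) and exam grade (Y).
absences = [0, 1, 2, 3, 4]
grades = [95, 85, 80, 70, 60]

m_x, m_y = mean(absences), mean(grades)
ss_x = sum((x - m_x) ** 2 for x in absences)  # sum of squares for X
ss_y = sum((y - m_y) ** 2 for y in grades)    # sum of squares for Y
sp_xy = sum((x - m_x) * (y - m_y) for x, y in zip(absences, grades))

b = sp_xy / ss_x                     # unstandardized slope (least squares)
beta = b * sqrt(ss_x) / sqrt(ss_y)   # standardized slope
r = sp_xy / sqrt(ss_x * ss_y)        # Pearson correlation

print(f"b = {b:.2f}, beta = {beta:.4f}, r = {r:.4f}")
```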
Errors in Prediction
• Predicting the cost of moving to MI from GA
– Truck Rental, Gas, Hotels
– Oops…pet fee at hotels, food on the way
up, furniture pads for truck
– Standard Error of the Estimate
• A statistic indicating the typical distance
between regression line and actual data points
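The slide gives no formula for the standard error of the estimate. The sketch below assumes one common definition, the square root of SS_Error divided by (N − 2); some textbooks divide by N instead. The data are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: days absent (X) and exam grade (Y).
absences = [0, 1, 2, 3, 4]
grades = [95, 85, 80, 70, 60]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y-hat = a + b(X)."""
    m_x, m_y = mean(xs), mean(ys)
    sp_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    a = m_y - b * m_x
    return a, b

def standard_error_of_estimate(xs, ys):
    """Typical distance between the line and the data points.

    Assumes the sqrt(SS_Error / (N - 2)) definition.
    """
    a, b = fit_line(xs, ys)
    ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_error / (len(xs) - 2))

see = standard_error_of_estimate(absences, grades)
print(f"standard error of the estimate: {see:.3f}")
```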
Effect Size of Regression
• Proportionate Reduction in Error (r²)
– AKA: Coefficient of determination
– Statistic that quantifies how much more
accurate our predictions are when we use
the regression line instead of the mean as
a prediction tool.
– Goal: How accurate is our regression
equation at predicting the future?
Coefficient of Determination (r²)
• $SS_{Total}$
– Total error we have if we use only the mean to
predict
$SS_{Total} = \sum (Y - M_Y)^2$
Coefficient of Determination (r²)
• $SS_{Error}$
– Total error we have if we use Y-hat from
regression equation.
$SS_{Error} = \sum (Y - \hat{Y})^2$
Coefficient of Determination (r²)
$r^2 = \frac{(SS_{Total} - SS_{Error})}{SS_{Total}}$
• The amount of variance in DV that is
explained by the IV
– Proportion of variance accounted for
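The proportionate reduction in error can be computed directly from the two sums of squares. In this sketch the data are hypothetical, and the line Y-hat = 95 − 8.5(X) is taken as given (it is the least-squares line for these particular numbers).

```python
from statistics import mean

# Hypothetical data: days absent (X) and exam grade (Y).
absences = [0, 1, 2, 3, 4]
grades = [95, 85, 80, 70, 60]

# Assumed regression equation for this data set: Y-hat = 95 - 8.5(X)
predicted = [95 - 8.5 * x for x in absences]

def r_squared(observed, predicted):
    """r^2 = (SS_Total - SS_Error) / SS_Total"""
    m_y = mean(observed)
    ss_total = sum((y - m_y) ** 2 for y in observed)  # error using only the mean
    ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return (ss_total - ss_error) / ss_total

r2 = r_squared(grades, predicted)
print(f"r squared = {r2:.4f}")
```

Predicting every grade with the mean alone gives a reduction of zero, which is the baseline that r² is measured against.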
Multiple Regression & R²
$Y'_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
• Using several variables to predict future scores
– Orthogonal Variable
• An IV that makes a separate and distinct
contribution in the prediction of a DV
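A minimal sketch of fitting the two-predictor equation by least squares, solving the normal equations on deviation scores with Cramer's rule. The function name and data are hypothetical; the Y values were generated exactly as 10 + 2(X1) − 1(X2) so the recovered coefficients are easy to check.

```python
from statistics import mean

def fit_two_predictors(x1, x2, y):
    """Least-squares b0, b1, b2 for Y' = b0 + b1*X1 + b2*X2."""
    m1, m2, m_y = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]   # deviation scores for X1
    d2 = [v - m2 for v in x2]   # deviation scores for X2
    dy = [v - m_y for v in y]   # deviation scores for Y

    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))  # overlap between the predictors
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))

    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")

    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = m_y - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical predictors and outcome.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [10, 13, 12, 15, 14]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y' = {b0:.2f} + {b1:.2f}(X1) + {b2:.2f}(X2)")
```

The `s12` term is where overlap between predictors enters: when it is zero the predictors are orthogonal, and each slope reduces to its simple-regression value.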
Stepwise Multiple Regression
• Software determines the order in which IVs
are included in the regression equation
– Largest significant r² comes first
– Pros: Good if we have no good theory
about our predictions
– Cons: May ignore nonorthogonal,
overlapping, variables…implying they are
unimportant
Hierarchical Multiple Regression
• Researcher uses theory to determine the
order in which IVs are included in the
regression equation
• PSYC 465: Age, Gender, Sleep, Depression
• Pros: Based on theory, so it is less likely to
identify bad predictors by accident
• Cons: Sometimes our theory is lacking