Regression
Outline of Today’s Discussion
1.
Coefficient of Determination
2.
Regression Analysis: Introduction
3.
Regression Analysis: SPSS
4.
Regression Analysis: Excel
5.
Independent Predictors
Part 1
Coefficient of Determination
Coefficient of Determination
In correlational research
Researchers often use the “r-squared” statistic, also
called the “coefficient of determination”, to describe
the proportion of Y variability explained by X.
Coefficient of Determination
What range of values is possible for the
coefficient of determination (the r-squared statistic)?
Coefficient of Determination
Example: What is the evidence that IQ is heritable?
Coefficient of Determination
R-value for the IQ of
identical twins reared apart = 0.6.
What is the value of r-squared in this case?
Coefficient of Determination
So what proportion of the IQ is
unexplained (unaccounted for) by genetics?
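A minimal sketch of the arithmetic behind the two questions above, taking the slide's r = 0.6 as given. The variable names are illustrative only.

```python
# Correlation reported on the slide for identical twins reared apart.
r = 0.6

# Coefficient of determination: proportion of Y variability explained by X.
r_squared = r ** 2

# Whatever is not explained is left over.
unexplained = 1 - r_squared

print(f"r-squared   = {r_squared:.2f}")
print(f"unexplained = {unexplained:.2f}")
```

Because r always lies between -1 and +1, squaring it always gives a value between 0 and 1.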
Coefficient of Determination
Different sciences are characterized by
the r-squared values that are deemed impressive.
(Chemists might expect r-squared to be > 0.99).
Coefficient of Determination
As we have already seen, r-squared
is the same as “eta-squared”.
Part 2
Regression Analysis Introduction
Regression Analysis Introduction
• Correlation is the process of finding a relationship
between variables.
• Regression is the process of finding the best-fitting
trend (line) that describes the relationship between
variables.
• So, correlation and regression are very similar!
Regression Analysis Introduction
• The ‘r’ statistic can be tested for statistical significance!
• Potential Pop Quiz Question: What two factors
determine the critical value (i.e., the number to beat)
when we engage in hypothesis testing?
Regression Analysis Introduction
DF for Correlation & Regression
df = n - 2
Here n stands for the number of
pairs of scores.
Why would this be n-2,
rather than the usual n-1?
Regression Analysis Introduction
• In general, the formula for the degrees of freedom is
the number of observations minus the number of
parameters estimated.
• For correlation, we have one estimate for the mean of
X, and another estimate for the mean of Y.
• For regression, we have one estimate for the slope, and
another estimate for y intercept.
Regression Analysis Introduction
Slope can also be thought of as
“rise over run”.
Regression Analysis Introduction
The “rise” on the ordinate = Y2 - Y1.
The “run” on the abscissa = X2 - X1.
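As a small illustration of "rise over run", the sketch below computes a slope from two points. The function name and the example points are hypothetical, chosen only to show the arithmetic.

```python
def slope_from_points(x1, y1, x2, y2):
    """Return rise over run for the line through (x1, y1) and (x2, y2)."""
    rise = y2 - y1  # change on the ordinate (Y-axis)
    run = x2 - x1   # change on the abscissa (X-axis)
    if run == 0:
        raise ValueError("run is zero: a vertical line has no defined slope")
    return rise / run

# Hypothetical points (1, 2) and (3, 8): rise = 6, run = 2.
print(slope_from_points(1, 2, 3, 8))
```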
Regression Analysis Introduction
“Rise over run” in pictures.
Regression Analysis Introduction
Here, the regression is “linear”…
Regression Analysis Introduction
Here, the regression is non-linear!
What would the equation look like for this trend?
Regression Analysis Introduction
• Let’s now return to linear regression, and learn how to
manually compute the slope and y-intercept.
• To compute the slope, we need two quantities that we
have already learned. These are SPxy (sums of
products) and SSx (sums of squares for X)…
Regression Analysis Introduction
$$\text{slope} = \frac{SP_{xy}}{SS_x}$$
$$SP_{xy} = \sum (X - \bar{X})(Y - \bar{Y})$$
Regression Analysis Introduction
Once we have the slope,
it’s easy to get the y-intercept!
$$\text{intercept} = \bar{Y} - (\text{slope} \times \bar{X})$$
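The manual computation can be sketched in a few lines of Python. This assumes the usual definitions (SPxy as the sum of products of deviations, SSx as the sum of squared X deviations, and the intercept computed from the means of X and Y). The function name and the scores are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept from paired scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SPxy: sum of products of deviations from the two means.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SSx: sum of squared deviations of X from its mean.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("all X scores are identical; slope is undefined")
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For these scores SPxy = 6 and SSx = 10, so the slope is 0.6 and the intercept is 4 - (0.6 * 3) = 2.2.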
Part 3
Regression Analysis:
SPSS
Regression Analysis: SPSS
• Later we’ll go to SPSS and get some practice with
regression.
• The steps in SPSS will be Analyze --> Regression --> Linear.
• We will place the criterion (i.e., the Y-axis variable) in
the “Dependent” box, and the predictor (i.e., the X-axis
variable) in the “Independent(s)” box.
• Click the “Statistics” button, and check “Estimates”,
“Model fit”, and “Descriptives”.
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
The “Coefficients” Section in the SPSS output
contains all the info needed for the regression equation,
the r statistic, and the evaluation of Ho (retain or reject).
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
The constant is the “b” in Y = mX + b.
Here, b = -9923.665
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
The slope is the “m” in Y = mX + b.
Here, m = 1807.836
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
So, our regression equation is Y = mX + b,
or
Y = 1807.836X - 9923.665.
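Once the equation is known, prediction is just substitution. The sketch below uses the slope and constant from the slides. The predictor score of 12 is a hypothetical input, not a value taken from the SPSS output.

```python
# Coefficients read off the SPSS "Coefficients" table on the slides.
SLOPE = 1807.836       # m
INTERCEPT = -9923.665  # b (the "constant")

def predict(x):
    """Predicted criterion score for predictor score x, using Y = mX + b."""
    return SLOPE * x + INTERCEPT

# x = 12 is a hypothetical predictor score, used only for illustration.
print(round(predict(12), 3))
```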
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
With a single predictor, the r statistic is the
standardized coefficient, Beta.
r = .705
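To see why Beta and r coincide when there is only one predictor, the sketch below computes both from the same made-up scores. The helper name and the data are illustrative assumptions; SPSS itself is not involved.

```python
import math

def deviation_sums(xs, ys):
    """Return SPxy, SSx and SSy for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy, ss_x, ss_y

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sp_xy, ss_x, ss_y = deviation_sums(xs, ys)

r = sp_xy / math.sqrt(ss_x * ss_y)      # Pearson correlation
slope = sp_xy / ss_x                    # unstandardized slope
beta = slope * math.sqrt(ss_x / ss_y)   # slope rescaled to z-score units

print(f"r = {r:.4f}, Beta = {beta:.4f}")
```

The rescaling multiplies the slope by the ratio of the X spread to the Y spread, which algebraically reduces to the formula for r.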
Regression Analysis: SPSS
The “Coefficients” Section In SPSS Output
Lastly, we look at the ‘sig’ value for the predictor,
(which is “EDU” in this case)
to determine whether the predictor (x-axis variable)
is significantly correlated with the
criterion (y-axis variable).
Evaluate Ho: …do we retain or reject?
Part 4
Regression Analysis:
Excel
Regression Analysis: Excel
• Correlation and regression are very similar.
• If we have a significant correlation, the best-fitting
regression line is said to have a slope significantly
different from zero.
• Sometimes it is stated that “the slope departs
significantly from zero”.
Regression Analysis: Excel
• Note: A slope can be very modestly different from zero,
and still be “statistically significant” if all data points
fall very close to the line.
• In correlation and regression, statistical significance is
determined by the strength of the correlation between
two variables (the r-value), and NOT by the slope of the
regression line.
• The significance of the r-value, as always, depends on
the alpha level, and the df (which is n-2).
Take a peek at the r-value table.
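The point that significance follows r rather than the slope can be illustrated with a small sketch. It rescales a made-up set of Y scores, which shrinks the slope a thousandfold while leaving r unchanged. The t formula used here, t = r * sqrt(n - 2) / sqrt(1 - r²), is the standard test of r against zero with df = n - 2. The data and function names are assumptions for illustration.

```python
import math

def slope_and_r(xs, ys):
    """Return the least-squares slope and Pearson r for paired scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / ss_x, sp_xy / math.sqrt(ss_x * ss_y)

def t_for_r(r, n):
    """t statistic for testing r against zero, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ys_small = [y / 1000 for y in ys]  # same pattern, Y expressed in smaller units

slope_a, r_a = slope_and_r(xs, ys)
slope_b, r_b = slope_and_r(xs, ys_small)

print(f"original: slope = {slope_a:.4f}, r = {r_a:.4f}, t = {t_for_r(r_a, len(xs)):.4f}")
print(f"rescaled: slope = {slope_b:.4f}, r = {r_b:.4f}, t = {t_for_r(r_b, len(xs)):.4f}")
```

Both versions give the same r and the same t, so they are compared against the same critical value even though the slopes differ by a factor of 1000.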
Regression Analysis: Excel
• Remember: The regression line (equation) can help us
predict one score, given another score, but only if there
is a significant r-value.
• The terminology would be: “the regression line explains (or
accounts for) 42% of the variability in the scores” (if r-squared = .42).
• To “explain” or “account for” does NOT mean “to
cause”.
Correlation does not imply causation!
Regression Analysis Continued
• A synonym for regression is prediction! Recall that
prediction is one of the four goals of the scientific
method. What were the others?
• A significant correlation implies a significant capacity
for prediction, i.e., a prediction that is reliably better
than chance!
Regression Analysis Continued
• The equation for a straight line, again, is:
y = mx + b
or
Criterion = ( slope * Predictor) + Intercept
• How many “parameters” in a linear equation?
• How about a quadratic equation?
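One way to set out the answer, assuming the general rule stated earlier that degrees of freedom equal the number of observations minus the number of parameters estimated (the coefficient names a, b, c for the quadratic are a conventional choice, not taken from the slides):

```latex
\begin{align*}
\text{linear:}    \quad & y = mx + b        && \text{2 parameters } (m,\ b)     && df = n - 2 \\
\text{quadratic:} \quad & y = ax^2 + bx + c && \text{3 parameters } (a,\ b,\ c) && df = n - 3
\end{align*}
```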
Part 5
Independent Predictors
Independent Predictors
• So far, we’ve attempted to use regression for
prediction.
• Specifically, we’ve tried to predict one variable
Y (called the criterion), using one other
variable (called the predictor).
• Multiple Regression - the process by which one
variable Y (called the criterion) is predicted on
the basis of more than one variable (say, X1,
X2, X3…).
Independent Predictors
Here’s the simple case of one predictor variable.
The overlap (in gray) indicates the predictive strength.
Independent Predictors
If the overlap in the Venn diagram were to grow,
the r-value would grow, too!
Independent Prediction
Variable X1
Criterion (Y)
Here’s the same thing again…
but we’ll call the predictor variable X1.
Independent Prediction
Variable X2
Variable X1
Criterion (Y)
By adding another predictor variable X2,
we could sharpen our predictions. Why?
Independent Prediction
Variable X2
Variable X1
Criterion (Y)
Unfortunately, X1 and X2 provide some redundant
information about Y, so the predictive increase is small.
Independent Prediction
Variable X2
Variable X1
Variable X3
Criterion (Y)
By contrast, variable X3 has no overlap with either
X1 or X2, so it would add the most new information.
Independent Prediction
Variable X2
Variable X1
Variable X3
Criterion (Y)
In short, since all three predictors provide some unique
information, predictions will be best when using all three.
Independent Prediction
Variable X2
Variable X1
Variable X3
Criterion (Y)
If you wanted to be more parsimonious and use only
two of the three, which two would you pick, and why?
Independent Predictors
• That was a conceptual introduction to Multiple
Regression (predicting Y scores from more than one
variable).
• We will not learn about the computations for multiple
regression in this course (but you will if you take the
PSYCH 370 course).
• For our purposes, simply know that predictions
improve to the extent that the various predictors are
independent of each other.
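Although the computations are outside this course, the two-predictor case gives a rough feel for the Venn-diagram idea. The sketch below assumes the standard formula for R-squared from the three pairwise correlations. The function name and the correlation values are hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared for predicting Y from X1 and X2.

    r_y1, r_y2: correlations of each predictor with the criterion Y.
    r_12: correlation between the two predictors (their redundancy).
    """
    if abs(r_12) >= 1:
        raise ValueError("X1 and X2 must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical values: each predictor correlates 0.5 with Y.
independent = r_squared_two_predictors(0.5, 0.5, 0.0)  # no overlap between X1 and X2
redundant = r_squared_two_predictors(0.5, 0.5, 0.5)    # X1 and X2 overlap

print(f"independent predictors: R-squared = {independent:.3f}")
print(f"redundant predictors:   R-squared = {redundant:.3f}")
```

With these numbers, the independent pair explains more of Y than the overlapping pair, which matches the small predictive increase described for X1 and X2 above.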