I. Descriptive Statistics (3)
• Correlation
• Relationship between variables
• Scatterplots (Scatter Diagrams)
• Measuring Correlation:
  - The Correlation Coefficient: r
Correlational studies
• Studies in which two or more variables are measured to find the direction and degree to which they covary.
• Covary: Two variables covary when a change in one variable is related to a consistent change in the other variable.
Relationship between variables
• Positive relationship
  - A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to increase also
• Negative relationship
  - A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to decrease
• No relationship
  - Lack of relationship: no regularity among the pairs of values of the variables
Scatterplots (Scatter Diagrams)
• Bivariate distribution:
  - A distribution in which two scores are obtained from each subject
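A bivariate distribution can be held as a list of (x, y) pairs, one per subject. The minimal Python sketch below, using made-up scores, labels the direction of a relationship from the sign of the summed cross products of deviation scores (the same cross products that r is built on). The function names and the `tol` cut-off are hypothetical choices for this illustration; deciding whether there is really no relationship in real data calls for the correlation coefficient r introduced below, not just a sign check.

```python
from statistics import fmean

# One (x, y) pair of scores per subject: a bivariate distribution.
Pairs = list[tuple[float, float]]


def sum_of_cross_products(pairs: Pairs) -> float:
    """Sum of (x - mean of x) * (y - mean of y) over all subjects."""
    mean_x = fmean(x for x, _ in pairs)
    mean_y = fmean(y for _, y in pairs)
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs)


def direction(pairs: Pairs, tol: float = 1e-9) -> str:
    """Label the relationship 'positive', 'negative' or 'none' by the sign of the cross products.

    `tol` is an arbitrary (hypothetical) cut-off that only absorbs floating-point noise.
    """
    sp = sum_of_cross_products(pairs)
    if sp > tol:
        return "positive"
    if sp < -tol:
        return "negative"
    return "none"


# Made-up scores, for illustration only
positive = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
negative = [(1, 6), (2, 4), (3, 5), (4, 3), (5, 1)]
unrelated = [(1, 3), (2, 1), (3, 3), (4, 1), (5, 3)]

for name, data in [("positive", positive), ("negative", negative), ("unrelated", unrelated)]:
    print(name, "->", direction(data))
```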
Scatterplots (Scatter Diagrams)
• Scatterplot:
  - A graph of a bivariate distribution in which the X variable is plotted on the horizontal axis and the Y variable is plotted on the vertical axis
[Figure: scatterplot showing a positive correlation]
Scatterplots (Scatter Diagrams)
[Figure: scatterplot showing a negative correlation]
Scatterplots (Scatter Diagrams)
[Figure: scatterplot showing no correlation]
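To make the axis convention concrete (X across the horizontal axis, Y up the vertical axis), here is a minimal text-mode scatterplot in Python. It is only a sketch under simple assumptions: the function name `text_scatterplot`, the grid size and the sample scores are hypothetical, and each point is snapped to the nearest character cell.

```python
def text_scatterplot(pairs, width=21, height=11, mark="*"):
    """Render (x, y) pairs as lines of text: X runs left to right, Y runs bottom to top."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all X (or all Y) scores are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first (top line), so flip Y so that larger Y values appear higher.
        grid[height - 1 - row][col] = mark
    lines = ["|" + "".join(cells) for cells in grid]  # "|" stands in for the vertical (Y) axis
    lines.append("+" + "-" * width)                   # bottom line is the horizontal (X) axis
    return lines


if __name__ == "__main__":
    # Hypothetical scores showing a positive relationship
    data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7), (7, 7)]
    for line in text_scatterplot(data):
        print(line)
```

A real course would use a plotting library; the character grid simply keeps the example dependency-free.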
Scatterplots (Scatter Diagrams)
• Linear relationship
  - A relationship between two variables that can be described by a straight line
• Curvilinear relationship
  - A relationship between two variables that is best described by a curved line
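The distinction matters because Pearson's r (defined on the next slides) measures only the linear part of a relationship. In the small Python sketch below (made-up X values, hypothetical helper `pearson_r`), Y is perfectly determined by X in both cases, yet r is 1 for the straight line and 0 for the symmetric U-shaped curve.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross products
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # straight-line (linear) relationship
curved = [x ** 2 for x in xs]     # U-shaped (curvilinear) relationship

print("linear:", pearson_r(xs, linear))
print("curvilinear:", pearson_r(xs, curved))
```

SP / sqrt(SSx · SSy) is algebraically the same as the deviation-score formula with n·s_x·s_y in the denominator, since the factors of n cancel.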
Measuring Correlation
• Correlation Coefficient
  - A number between −1 and 1 that describes the direction and strength of the relationship between pairs of variables
• Pearson Correlation Coefficient
  - A statistic, symbolized by r, that indicates the degree of linear relationship between two variables measured at the interval or ratio level
  - −1 ≤ r ≤ 1
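Measuring Correlation
• Pearson Correlation Coefficient
$$r = \frac{\sum (x - \bar{X})(y - \bar{Y})}{n\, s_x s_y}$$
$$r = \frac{\sum z_x z_y}{n} \quad \text{(z score formula)}$$
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \quad \text{(computational formula)}$$
  - Here $s_x$ and $s_y$ are standard deviations computed with n in the denominator, $z_x = (x - \bar{X})/s_x$ and $z_y = (y - \bar{Y})/s_y$; with this convention all three formulas give the same r.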
• What does r mean?
  - r is a type of mean: the mean of the products of paired z scores
  - It is based on a measure of covariation: the cross products $(x - \bar{X})(y - \bar{Y})$
  - The value of r is a measure of how well a straight line describes the cluster of dots in a scatterplot
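As a check on the three formulas, the Python sketch below computes r from deviation scores, as the mean of paired z-score products, and with the computational formula, on hypothetical paired scores. It uses `statistics.pstdev`, which divides by n; switching to `statistics.stdev` (n − 1) would make the first two versions disagree with the third, which is why the n convention matters here.

```python
from math import sqrt
from statistics import fmean, pstdev


def r_deviation(xs, ys):
    """r = sum of cross products / (n * s_x * s_y), with SDs that divide by n."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / (n * pstdev(xs) * pstdev(ys))


def r_zscores(xs, ys):
    """r = the mean of the products of paired z scores."""
    mx, my, sx, sy = fmean(xs), fmean(ys), pstdev(xs), pstdev(ys)
    return fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


def r_computational(xs, ys):
    """Computational formula: uses only raw sums, no deviation scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical paired scores (for example, hours studied and exam score)
x = [2, 4, 5, 7, 9]
y = [50, 62, 60, 75, 83]

for formula in (r_deviation, r_zscores, r_computational):
    print(formula.__name__, round(formula(x, y), 4))
```

Since Python 3.10 the standard library also offers `statistics.correlation`, which can serve as an independent cross-check.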
I. Descriptive Statistics (4)
• Regression
  - Regression Line and Predictive Errors
  - Least Squares Regression Line
  - Least Squares Regression Equation
  - Standard Error of prediction
  - What is r²?
• But keep in mind that… a high correlation does not mean that there is a cause-effect relation!
  - Experimentation is needed!
Regression: Building on Correlation
• Prediction (regression) vs. relation (correlation)
• (Simple) linear regression:
  - A statistical tool used to predict scores on the dependent variable from scores on (one) independent variable
Regression Line and Predictive errors
• When a bivariate distribution shows a linear relationship, it is sometimes useful to try to predict Y from X using a regression line.
• This line is conceived as an approximation to the cloud of data observed in the scatterplot.
Regression Line and Predictive errors
Equation for a line
• Slope: the amount that Y is predicted to increase for an increase of 1 in X.
• Y-intercept: the predicted value for Y when X is 0 (the point at which the line intercepts the y-axis).
• Predictive error: for each X, the difference between the observed Y and the Y coordinate of the line at that X.
• The position of the regression line should minimize the total predictive error; because positive and negative errors cancel when simply added, the errors are squared before being summed.
Example: y = 2x + 5 (slope 2, Y-intercept 5)
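The sketch below computes the predictive errors of the example line y = 2x + 5 for a few made-up observations (the data and the helper names `predict` and `predictive_errors` are hypothetical). It also shows why a plain sum of errors is a poor criterion: a flat line at the mean of Y also has a total error of zero, yet its total squared error is much larger.

```python
# Hypothetical observed (X, Y) scores, not taken from the slides
observed = [(0, 6), (1, 6), (2, 10), (3, 10), (4, 13)]


def predict(x, slope=2.0, intercept=5.0):
    """Y coordinate of the line y = slope * x + intercept at this x."""
    return slope * x + intercept


def predictive_errors(pairs, slope=2.0, intercept=5.0):
    """Observed Y minus the line's Y coordinate, one error per data point."""
    return [y - predict(x, slope, intercept) for x, y in pairs]


for label, slope, intercept in [("y = 2x + 5", 2.0, 5.0), ("y = 9 (mean of Y)", 0.0, 9.0)]:
    errors = predictive_errors(observed, slope, intercept)
    print(label, "errors:", errors,
          "total:", sum(errors),
          "total squared:", sum(e ** 2 for e in errors))
```

Both lines have a total error of 0, but the total squared errors are 4 and 36, so only the squared criterion tells the two lines apart.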
Least Squares Regression line
• Least squares regression line:
  - the prediction line that minimizes the total squared predictive error
  - It has the form y = mx + n (written Ŷ = bX + a in the least squares equation)
Least Squares equation
• The least squares regression equation minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.
$$\hat{Y} = bX + a$$
• The slope is:
$$b = r\,\frac{S_y}{S_x}$$
• The Y-intercept is:
$$a = \bar{Y} - b\bar{X}$$
• Assumptions: Linearity and Homoscedasticity
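A minimal Python sketch of the slide's formulas, on made-up data: it computes b = r(S_y/S_x) and a = Ȳ − bX̄, then nudges the slope and intercept to confirm that the total squared prediction error only goes up. The helper names are hypothetical. Because r and the ratio S_y/S_x do not depend on whether the SDs divide by n or n − 1 (as long as both use the same convention), b comes out the same either way. The sketch does not check the linearity or homoscedasticity assumptions.

```python
from statistics import fmean, pstdev


def least_squares_line(xs, ys):
    """Return (b, a) for Y-hat = b*X + a, using b = r * (Sy / Sx) and a = mean(Y) - b * mean(X)."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    b = r * (sy / sx)
    a = my - b * mx
    return b, a


def total_squared_error(xs, ys, b, a):
    """Sum of squared differences between observed Y and predicted Y-hat."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only
xs = [0, 1, 2, 3, 4]
ys = [6, 6, 10, 10, 13]

b, a = least_squares_line(xs, ys)
best = total_squared_error(xs, ys, b, a)
print(f"Y-hat = {b:.2f} X + {a:.2f}, total squared error = {best:.2f}")

# Moving the line away from (b, a) in any direction increases the squared error
for db, da in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(f"b{db:+.1f}, a{da:+.1f}:", round(total_squared_error(xs, ys, b + db, a + da), 4))
```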
Standard error of prediction
• How can we determine the amount of error associated with these predictions?
• Standard error of prediction (or Standard error of the estimate):
  - A statistic that indicates the typical distance between a regression line and the actual data points
• Where do the values for a and b come from? And what do they mean? (see pdf notes)
Standard error of prediction
• Standard error of prediction (or Standard error of the estimate):
$$S_{y|x} = S_y \sqrt{1 - r^2}$$
  - It is a (rough) measure of the average amount of predictive error by which known Y values deviate from predicted Ŷ values
  - It reflects the degree to which the points diverge from the regression line.
  - It reflects the accuracy of the prediction: the smaller $S_{y|x}$, the more accurate the predictions.
What is r²?
• The squared correlation coefficient (r²):
  - is the proportion of total variance in one variable that is predictable from its relationship with the other variable
  - provides a measure of the worth of the least squares predictions
$$r^2 = \frac{SS_{\text{total}} - SS_{\text{error}}}{SS_{\text{total}}}$$
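Both formulas on these last two slides can be checked numerically. This Python sketch, on made-up data, fits the least squares line, computes the standard error of prediction directly from the residuals (dividing by n) and from $S_y\sqrt{1 - r^2}$, and computes r² both from the sums of squares and by squaring r. The agreement relies on dividing by n throughout; many textbooks and statistics packages instead divide SS_error by n − 2 when estimating the standard error of the estimate, which gives a somewhat larger value.

```python
from math import sqrt
from statistics import fmean, pstdev

# Made-up data, for illustration only
xs = [0, 1, 2, 3, 4]
ys = [6, 6, 10, 10, 13]

n = len(xs)
mx, my = fmean(xs), fmean(ys)
sx, sy = pstdev(xs), pstdev(ys)  # SDs with n in the denominator
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

# Least squares line: b = r * (Sy / Sx), a = mean(Y) - b * mean(X)
b = r * sy / sx
a = my - b * mx

residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
ss_error = sum(e ** 2 for e in residuals)
ss_total = sum((y - my) ** 2 for y in ys)

# Standard error of prediction, two ways
se_direct = sqrt(ss_error / n)
se_formula = sy * sqrt(1 - r ** 2)

# r squared, two ways
r2_from_ss = (ss_total - ss_error) / ss_total
r2_squared = r ** 2

print("S_y|x:", round(se_direct, 4), round(se_formula, 4))
print("r^2:", round(r2_from_ss, 4), round(r2_squared, 4))
```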