Wednesday, October 12
Correlation and Linear Regression
Explain to a non-statistician what it means to say “reading and math
scores are correlated r=.69 in this population”.
[Figure: scatterplot matrix of the RDG, MATH, SCI, CIV, and CONCPT scores; the value .69 is marked in the RDG/MATH cell.]
Explain to a non-statistician what it means to say “reading and math
scores are correlated r=.69 in this population”.
• Using either the reading or math score,
you can predict the other value by how
many SDs it is from its mean. Or r=.69
means both values are 0.69 SD from their
means.
• Regarding the question you gave in class
today, I think the statement “reading and math
scores are correlated r=.69 in this population”
means that reading scores can explain 69% of
math scores, or vice versa. For example,
considering the relationship between reading and
math scores (a regression equation, in
statistical terms), that relationship (regression
equation) can explain 69% of the other score, but 31%
of uncertainty remains.
• In general terms, when reading scores go
up, math scores also go up or when reading
scores go down, math scores go down. This
correlation says nothing about causation or
what caused what, only that the two scores
tend to move in the same direction
together. .69 indicates a strong degree of
correlation. This means that you can predict
fairly well, within a range of scores, what a
student's math score will be based on their
reading score or vice versa.
• .69 is the slope of the line expressing the estimated
relationship between reading and math, in terms of
standardized scores. This slope is based on a set of
data points, their means, and the standard deviations,
which are basically the averages of the differences
between the actual scores in the sample and the
estimated scores in the population. Standardized
scores are the conversion of raw scores into scores in
terms of standard deviation units, and are centered
around zero. We can say that a 1-standard deviation
increase in reading is associated with a .69 standard
deviation increase in math, and vice versa. Reading
and math are strongly positively correlated.
• Saying that "reading and math scores are
correlated r=.69 in this population" tells you
that, knowing a given reading score's
deviation from the reading score mean, you
can make a pretty good prediction of how
much the math score will deviate from the
math score mean. When deviations are
expressed in standardized units, the ratio of a
predicted math score's deviation from the
math score mean to the given reading score's
deviation from the reading score mean is
.69.
The correlation value gives us a sense of how variables move in relation to
one another. The correlation ranges from -1 to 1. Positive correlations are
like attractive magnets: as one variable moves up or down, it tugs the other
value in the same direction. Negative correlations are like repulsive, same-pole
magnets: as one variable increases, it pushes the other variable farther
away from it.
The first thing to note is that our correlation value, 0.69, is positive. A
positive correlation indicates that as math scores increase, reading scores
will also increase (think of the attracting magnets). If the correlation value
were -0.69, this would move the other way: higher math scores would tend
to pair up with lower reading scores in a given student.
0.69 is also a high correlation value. This gives us a sense that we can
make fairly reliable predictions about a second variable if we know the first.
The closer the correlation value is to 0, the less reliably we can predict one
variable from another. In this case, given a student's math score, we could
predict his/her reading score with fair reliability.
The correlation coefficient r indicates the strength and direction of association between two variables.
In this example, it measures the association between math scores and reading scores amongst a
group of students, that is, the extent to which they vary together rather than independently of one
another. Math scores and reading scores could be correlated positively, meaning that high math
scores tend to go hand in hand with high reading scores and low math scores tend to go hand in
hand with low reading scores. If they were correlated negatively, math scores would decrease as
reading scores increased, and vice versa. The coefficient r can take a value between -1 and 1, where 1 indicates
a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates that
there is no relationship between the two variables at all (i.e. they vary completely independently
from one another). The more perfect the relationship, the more accurately we can predict one
variable from another. To say that reading and math scores are correlated r=.69 in this population
therefore implies that as math scores get higher, reading scores get higher (a positive correlation)
and further, that this association is fairly strong - looking at a particular math score would give us a
pretty good idea of the corresponding reading score.
We can more deeply understand the magnitude of the correlation coefficient r via standardized scores:
z scores. A z score is a measurement of a raw (the original) score in terms of standard deviations,
that is, in terms of how much the raw score deviates from the mean (standard deviation measures
the variability of scores from the mean). As an example, if a raw score of 50 corresponds to a z
score of 2, that implies that it is 2 standard deviations above the mean.
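The conversion described above can be sketched in a few lines. This is only an illustration: the mean of 40 and standard deviation of 5 are assumed values chosen so that a raw score of 50 lands at z = 2, and they do not come from the lecture data.

```python
def z_score(raw, mean, sd):
    """Return how many standard deviations `raw` lies above (+) or below (-) `mean`."""
    return (raw - mean) / sd


# Hypothetical distribution (assumed, not from the lecture): mean 40, SD 5.
print(z_score(50, mean=40, sd=5))  # 2.0  -> two SDs above the mean
print(z_score(35, mean=40, sd=5))  # -1.0 -> one SD below the mean
```

A z score of 0 always corresponds to a raw score exactly at the mean, whatever the original units were.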
Now imagine we have two variables, such as in our example reading scores and math scores, and
convert the raw scores into z scores. If all the paired z scores (i.e. a student's transformed
reading and math scores) are the same and have the same sign (i.e. both lie either above or
below the mean, and are the same number of standard deviations away from it), we have a
perfect positive correlation. If the paired z scores are the same in size but have different signs (i.e. one
lies above and the other below the mean, but both are the same number of standard deviations
away from it), we have a perfect negative correlation. Saying that reading and math scores are
correlated r=.69 in this population, then, in terms of paired z scores, means that we can predict
the standardised math score by multiplying the standardised reading score by .69 - i.e. a reading
score that is 1.5 standard deviations from the mean (of reading scores) corresponds to a
You will not leave the room until…
• you have understood that a correlation is a
systematic quantitative expression of the
proportion of explained and unexplained
co-variation of two variables … and you
love knowing this fact!
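One standard way to make "explained and unexplained" concrete is the squared correlation: in simple linear regression, r squared equals the share of the variation in one variable that the least-squares line on the other reproduces. The sketch below checks this on a small made-up data set (the six score pairs are assumptions for illustration, not the class data) and then applies the same arithmetic to r = .69.

```python
def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired scores."""
    sxx, syy, sxy = sums_of_squares(xs, ys)
    return sxy / (sxx * syy) ** 0.5


def explained_share(xs, ys):
    """Share of the variation in ys reproduced by the least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx, syy, sxy = sums_of_squares(xs, ys)
    slope = sxy / sxx
    predicted = [mean_y + slope * (x - mean_x) for x in xs]
    ss_explained = sum((p - mean_y) ** 2 for p in predicted)
    return ss_explained / syy


# Made-up scores for six students (illustrative only).
reading = [35, 42, 50, 58, 61, 70]
math_scores = [40, 38, 55, 52, 66, 64]

r = pearson_r(reading, math_scores)
share = explained_share(reading, math_scores)
print(f"r squared = {r ** 2:.4f}, explained share = {share:.4f}")  # the two agree

# Applying the same arithmetic to the correlation from the question:
print(f"{0.69 ** 2:.4f} explained, {1 - 0.69 ** 2:.4f} unexplained")  # 0.4761 explained, 0.5239 unexplained
```

Under this reading, r = .69 corresponds to a little under half of the variation being explained, not 69% of it.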
When X and Y are perfectly correlated:
$z_y = z_x$
We can say that $z_x$ perfectly predicts $z_y$:
$z_y' = z_x$
Or
$\hat{z}_y = z_x$
When they are imperfectly correlated, i.e., $r_{xy} \neq 1$ or $-1$:
$z_y' = r_{xy} z_x$
$r$ is the slope of the prediction line, with a zero intercept ($z_y' = 0$ when $z_x = 0$).
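The claim that r is the slope of the prediction line for standardized scores can be checked numerically. The sketch below is a minimal illustration on made-up scores (the six pairs are assumptions, not the class data). It standardizes both variables using the population standard deviation, computes r as the average product of paired z scores, and compares it with the least-squares slope fitted to the z scores. The helper names are hypothetical.

```python
from math import sqrt


def standardize(values):
    """Convert raw scores to z scores (population SD, dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def r_from_z_scores(zx, zy):
    """Correlation as the average product of paired z scores."""
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def least_squares_slope(xs, ys):
    """Slope of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def predict_zy(r, zx):
    """Predicted standardized y for a given standardized x: z_y' = r * z_x."""
    return r * zx


# Made-up scores for six students (illustrative only).
reading = [35, 42, 50, 58, 61, 70]
math_scores = [40, 38, 55, 52, 66, 64]

z_reading = standardize(reading)
z_math = standardize(math_scores)

r = r_from_z_scores(z_reading, z_math)
slope = least_squares_slope(z_reading, z_math)
print(f"r = {r:.4f}, slope on z scores = {slope:.4f}")  # the two agree

# With r = .69, a reading score 1.5 SD above the mean predicts:
print(round(predict_zy(0.69, 1.5), 3))  # 1.035
```

Because both sets of z scores have mean 0, the fitted line passes through the origin, which is the zero intercept noted above.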