Wednesday, October 12
Correlation and Linear Regression

Explain to a non-statistician what it means to say “reading and math scores are correlated r=.69 in this population”.

[Figure: scatterplot matrix of RDG, MATH, SCI, CIV, and CONCPT; the RDG/MATH panel is labeled .69]

• Using either the reading or math score, you can predict the other value by how many sd it is from its mean. Or r=.69 means both values are 0.69 sd from their means.

• Regarding the question you gave today in class, I think that the statement “reading and math scores are correlated r=.69 in this population” means that reading scores can explain 69% of math scores, or vice versa. For example, considering a relationship between reading and math scores (a regression equation, in statistical terms), that relationship can explain 69% of the other score, but 31% of uncertainty remains.

• In general terms, when reading scores go up, math scores also go up, and when reading scores go down, math scores go down. This correlation says nothing about causation or what caused what, only that the two scores tend to move in the same direction together. .69 indicates a strong degree of correlation. This means that you can predict fairly well, within a range of scores, what a student's math score will be based on their reading score, or vice versa.

• .69 is the slope of the line expressing the estimated relationship between reading and math, in terms of standardized scores.
This slope is based on a set of data points, their means, and the standard deviations, which are basically the averages of the differences between the actual scores in the sample and the estimated scores in the population. Standardized scores are the conversion of raw scores into scores in terms of standard deviation units, and are centered around zero. We can say that a 1-standard-deviation increase in reading is associated with a .69-standard-deviation increase in math, and vice versa. Reading and math are strongly positively correlated.

• Saying that "reading and math scores are correlated r=.69 in this population" tells you that, knowing a given reading score's deviation from the reading score mean, you can make a pretty good prediction of how much the math score will deviate from the math score mean. When deviations are expressed in standardized units, the ratio of a predicted math score's deviation from the math score mean to the given reading score's deviation from the reading score mean is .69.

• The correlation value gives us a sense of how variables move in relation to one another. The correlation ranges from -1 to 1. Positive correlations are like attractive magnets: as one variable moves up or down, it tugs the other value in the same direction. Negative correlations are like repulsive, same-pole magnets: as one variable increases, it pushes the other variable farther away from it. The first thing to note is that our correlation value, 0.69, is positive. A positive correlation indicates that as math scores increase, reading scores will also tend to increase (think of the attracting magnets). If the correlation value were -0.69, this would move the other way: higher math scores would tend to pair up with lower reading scores in a given student. 0.69 is also a high correlation value. This gives us a sense that we can make fairly reliable predictions about a second variable if we know the first.
The closer the correlation value is to 0, the less reliably we can predict one variable from another. In this case, given a student's math score, we could predict his/her reading score with fair reliability.

• The correlation coefficient r indicates the strength and direction of association between two variables. In this example, it measures the association between math scores and reading scores amongst a group of students, that is, the extent to which they vary together rather than independently of one another. Math scores and reading scores could be correlated positively, meaning that high math scores tend to go hand in hand with high reading scores and low math scores tend to go hand in hand with low reading scores. If they were correlated negatively, math scores would decrease as reading scores increased, and vice versa. r can take a value between -1 and 1, where 1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates that there is no relationship between the two variables at all (i.e. they vary completely independently from one another). The more perfect the relationship, the more accurately we can predict one variable from another. To say that reading and math scores are correlated r=.69 in this population therefore implies that as math scores get higher, reading scores get higher (a positive correlation) and, further, that this association is fairly strong: looking at a particular math score would give us a pretty good idea of the corresponding reading score.

We can more deeply understand the magnitude of the correlation coefficient r via standardized scores: z scores. A z score is a measurement of a raw (the original) score in terms of standard deviations, that is, in terms of how much the raw score deviates from the mean (standard deviation measures the variability of scores from the mean).
As an example, if a raw score of 50 corresponds to a z score of 2, that implies that it is 2 standard deviations above the mean. Now imagine we have two variables, such as in our example reading scores and math scores, and convert the raw scores into z scores. If all the paired z scores (i.e. a student's transformed reading and math scores) are the same and have the same sign (i.e. both lie either above or below the mean, and are the same number of standard deviations away from it), we have a perfect positive correlation. If the paired z scores are the same, but have different signs (i.e. one lies above and the other below the mean, but both are the same number of standard deviations away from it), we have a perfect negative correlation. Saying that reading and math scores are correlated r=.69 in this population, then, in terms of paired z scores, means that we can predict the standardised math score by multiplying the standardised reading score by .69, i.e. a reading score that is 1.5 standard deviations from the mean (of reading scores) corresponds to a …

You will not leave the room until…
• you have understood that a correlation is a systematic quantitative expression of the proportion of explained and unexplained co-variation of two variables
… and you love knowing this fact!

When X and Y are perfectly correlated, we can say that $z_x$ perfectly predicts $z_y$:
$z_y' = z_x$, or equivalently $\hat{z}_y = z_x$

When they are imperfectly correlated, i.e., $r_{xy} \neq 1$ or $-1$:
$z_y' = r_{xy} z_x$

r is the slope of the predicted line, with a zero intercept: $z_y' = 0$ when $z_x = 0$.
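
The standardized-score relationship above can be checked numerically. The following is a minimal sketch, not part of the original slides: the `reading` and `math_scores` lists are made-up illustrative data (not the class data set, so r will not come out to .69), and the helper names `z_scores`, `correlation`, `standardized_slope`, and `predict_z` are hypothetical. It assumes population standard deviations (dividing by n), under which r equals the average product of paired z scores and also equals the least-squares slope when z scores of Y are regressed on z scores of X.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to z scores (population SD, dividing by n)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """Pearson r computed as the average product of paired z scores."""
    zx, zy = z_scores(x), z_scores(y)
    return mean([a * b for a, b in zip(zx, zy)])


def standardized_slope(x, y):
    """Least-squares slope of z_y on z_x; the intercept is zero."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


def predict_z(r, zx):
    """Predicted standardized Y for a given standardized X: z_y' = r * z_x."""
    return r * zx


# Hypothetical scores for eight students (illustration only).
reading = [35, 42, 50, 58, 61, 47, 53, 66]
math_scores = [40, 38, 55, 52, 63, 45, 58, 60]

r = correlation(reading, math_scores)
slope = standardized_slope(reading, math_scores)

print("r for the illustrative data:", round(r, 3))
print("slope of z_math on z_reading:", round(slope, 3))

# With the r = .69 from the slides, a reading score 1.5 SD above the mean
# gives a predicted math z score of r * 1.5.
print("predicted z_math:", round(predict_z(0.69, 1.5), 3))
```

The first two printed values agree, which is the sense in which r is the slope of the prediction line in standardized units. Because |r| < 1 here, the predicted math z score is always closer to the mean than the reading z score it is predicted from.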