VARIANCE
Psychologists try to explain and/or predict behavior. They do this by showing that the
behavior of interest is related to other factors. For example, suppose that you want to
know if aggression in children is related to the number of hours that children watch TV.
In this case, the behavior of interest (the dependent variable) is aggression, and you want
to know if aggression can be systematically related to hours of TV watching.
To do this study, you would first have to select a group of children and measure each
child's aggression and the number of hours each child watches TV. If you took these
measurements you would find a range of values for each factor. Some children watch a
lot of TV, others very little. Some children are more aggressive than others. We refer to
the different values associated with a measure (e.g. aggression) as a distribution of scores.
For example, hours of TV watching would have a distribution of score values.
When you try to establish a relationship between variables, what you are trying to do is to
show that variability in one set of scores (e.g. aggression) can be systematically related to
variability in another set of scores (e.g. hours of TV). For example, you might find that
aggression tends to increase as hours of TV watching increase.
How do we show that two variables such as aggression and hours of TV watching are
related to one another? There are a number of ways to do this, but each depends on
relating one distribution of scores to another distribution of scores. In order to show that
distributions are related, we have to describe or measure the amount of variability in each
distribution. There are various ways to describe or measure variability. For example, you
can create a visual representation of a distribution of scores: A frequency distribution is a
visual representation of variability in a set of scores. You can also produce visual
representations showing how different score distributions are related to one another:
Scatterplots are commonly used to illustrate relationships between distributions of scores.
Visual representations are very important statistical tools for evaluating data we have
collected in a research project. However, they have a shortcoming. They cannot be
evaluated mathematically. We can only "eyeball" them, looking for relationships. Of
course, this creates problems because people will have no objective way to interpret the
figures. What we need is a way to mathematically measure the amount of variability in a
set of scores and to mathematically show that variability in one set of scores is related to
variability in another set of scores.
It turns out that statisticians have developed methods to define mathematically the
amount of variability in a set of scores and the extent to which variability in one set of
scores can be related to variability in another set of scores. The mathematical term for
indexing the amount of variability in a set of scores is VARIANCE. Variance is a number
that is based on the extent to which individual scores in a distribution of scores deviate
from the mean of that distribution of scores. For example, suppose that after measuring
aggression in our sample of children, we find aggression scores varying from 1 to 30 (I
made up these values). We can use these raw score values to compute a mean aggression
score (an average). If we subtract the mean from each individual's aggression score we
can create a new set of scores called deviation scores. Now each individual has a raw
score and a deviation score that represents the extent to which the individual's raw score
differs from the mean score. Variance is calculated from these deviation scores. If the
deviation scores are large then variance will be large, and vice versa. For descriptive
purposes, variances are often converted to values called standard deviations. A standard
deviation is like an average or mean: it can be thought of as the typical size of the
deviation scores. For example, if the standard deviation for our aggression scores is 1.4,
a typical raw score will be about 1.4 score units away from the mean. As an example (I am
making this up), suppose that the mean aggression score is 15. Then, if the scores are
roughly normally distributed, about two-thirds of the aggression raw scores would fall
between 13.6 and 16.4.
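The steps just described (mean, deviation scores, variance, standard deviation) can be sketched in a few lines of Python. This is only an illustration: the five scores are made up, and the variable names are my own. The variance is computed with the N - 1 formula given at the end of these notes.

```python
from math import sqrt

# Hypothetical aggression scores for five children (made up for illustration).
scores = [12, 15, 14, 17, 17]

n = len(scores)
mean = sum(scores) / n

# A deviation score is the raw score minus the mean.
deviations = [x - mean for x in scores]

# Variance: sum of squared deviation scores divided by N - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print("mean:", mean)
print("deviation scores:", deviations)
print("variance:", variance)
print("standard deviation:", round(sd, 2))
```

Note that the deviation scores always sum to zero, which is why they are squared before being added up.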
Virtually all statistical procedures used to establish relationships between variables
depend on the notion of variance. Remember that variance is one way to measure
variability in a set or distribution of scores. It is the mathematical holy grail. Only the
mean is a more important statistical concept, and it is so only by virtue of its role in
defining variance.
So how do we use variance to establish relationships between variables? For example,
how can variance be used to find out if aggression is related to TV watching? Here is one
way to think about this problem. Suppose that there is no relationship between aggression
and TV watching. If this were the case, then a child's aggression score would have no
connection to the child's TV watching score. It follows that a child's deviation aggression
score would then have no relation to a child's deviation TV watching score. Because
aggression variance and TV watching variance are both based on their respective
deviation scores, aggression variance would be unrelated to TV watching variance. In this
case, we would say that none of the variance associated with aggression can be accounted
for by variance in TV watching.
Now suppose that aggression and TV watching are perfectly related. That is, suppose that
if we knew how much TV a child watched we could predict with perfect accuracy their
aggression score. In this case, a child's aggression score could be predicted from their
TV watching score. If this were the case, then all of the aggression variance would be
accounted for by TV watching variance. In effect, TV watching scores could be
substituted for aggression scores.
Of course, in the empirical world there are probably no perfect relationships between
variables. You might be able to predict aggression based on knowledge of TV watching,
but you couldn't predict with perfect accuracy. Differences in aggression are likely
related to a wide variety of factors. It turns out that there are mathematical procedures
that allow you to compute how much of the total variance in your dependent variable
(aggression) is related to your predictor variable (TV watching), and how much of the
aggression variance is unrelated to your predictor variable. The latter source of variance
is often referred to as "residual variance" or sometimes "error variance". If two variables
are closely linked with one another, there will be very little residual variance, and we
would probably conclude that the variables are related to one another in some way.
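These notes do not name a specific procedure for splitting up the variance. As one possible sketch, the example below assumes a simple least-squares straight line predicting aggression from TV hours, and then divides the total variability in aggression into a part accounted for by the line and a residual part. The data and all names are made up for illustration. The split is shown with sums of squared deviations; dividing each by N - 1 would turn them into variances without changing the proportions.

```python
# Hypothetical data (made up): weekly TV hours and aggression scores for six children.
tv_hours = [2, 4, 6, 8, 10, 12]
aggression = [5, 9, 8, 14, 13, 17]

def mean(values):
    return sum(values) / len(values)

mean_tv = mean(tv_hours)
mean_agg = mean(aggression)

# Least-squares line (an assumed procedure, not one named in the notes).
slope = (
    sum((x - mean_tv) * (y - mean_agg) for x, y in zip(tv_hours, aggression))
    / sum((x - mean_tv) ** 2 for x in tv_hours)
)
intercept = mean_agg - slope * mean_tv

# Predicted aggression for each child, and what the prediction misses.
predicted = [intercept + slope * x for x in tv_hours]
residuals = [y - p for y, p in zip(aggression, predicted)]

# Sums of squared deviations: total, accounted for, and residual.
total_ss = sum((y - mean_agg) ** 2 for y in aggression)
explained_ss = sum((p - mean_agg) ** 2 for p in predicted)
residual_ss = sum(r ** 2 for r in residuals)

proportion_explained = explained_ss / total_ss

print("total:", round(total_ss, 2))
print("accounted for by TV watching:", round(explained_ss, 2))
print("residual:", round(residual_ss, 2))
print("proportion accounted for:", round(proportion_explained, 3))
```

The accounted-for part and the residual part add up to the total. When the residual part is small relative to the total, the two variables are closely linked.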
FORMULA FOR VARIANCE:
$$s_x^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

where $X$ is a score, $\bar{X}$ is the mean of the score distribution, and $N$ is the number of
scores. The standard deviation is the square root of the variance:

$$s_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
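As a small worked example of the formula (the four scores are made up), take the scores 2, 4, 6 and 8. Here $\bar{X}$ denotes the mean, $N = 4$, and the deviation scores are $-3, -1, 1, 3$:

```latex
\bar{X} = \frac{2 + 4 + 6 + 8}{4} = 5

\sum (X - \bar{X})^2 = (-3)^2 + (-1)^2 + 1^2 + 3^2 = 9 + 1 + 1 + 9 = 20

s_x^2 = \frac{20}{4 - 1} \approx 6.67

s_x = \sqrt{6.67} \approx 2.58
```

So a typical score in this small set lies roughly 2.6 score units away from the mean of 5.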