Download Slides

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Sufficient statistic wikipedia , lookup

Linear least squares (mathematics) wikipedia , lookup

Density of states wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Student's t-test wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Transcript
Variability
1
Definition
Variability provides a measure of the degree
to which scores in a distribution are spread
out or clustered together
The more similar the scores are to each other,
the lower the measure of dispersion will be
The less similar the scores are to each other, the
higher the measure of dispersion will be
In general, the more spread out a distribution is,
the larger the measure of dispersion will be
2
Measures of Dispersion
Which of the
distributions of scores
has the larger
dispersion?
The upper distribution
has more dispersion
because the scores are
more spread out
That is, they are less
similar to each other
3
1
Measures of Dispersion
There are three main measures of
dispersion:
The range
The interquartile range
Variance / standard deviation
4
The Range
The range is defined as the difference between
the largest score in the set of data and the
smallest score in the set of data, XMax- XMin
What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
The largest score (XMax) is 9; the smallest score
(XMin) is 1; the range is XMax - XMin = 9 - 1 = 8
5
When To Use the Range
The range is used when
you have ordinal data or
you are presenting your results to people with little
or no knowledge of statistics
The range is rarely used in scientific work as it
is fairly insensitive
It depends on only two scores in the set of data,
XMax and XMin
Two very different sets of data can have the same
range:
1 1 1 1 9 vs 1 3 5 7 9
6
2
The Interquartile Range
The interquartile range is defined as the
difference of the first and third quartiles
The first quartile is the 25th percentile
The third quartile is the 75th percentile
IR = Q3 - Q1
The semi-interquartile range is one half of
the interquartile range
7
IR Example
What is the IR for the data
to the right?
Lower half of the data:
N=5; index of middle score
= (N+1)/2=3
6 is the first quartile
Upper half of the data:
N=5; index of middle score
= (N+1)/2=3
20 is the third quartile
IR = (Q3 - Q1) = (20 - 6) =
14
8
IR Example
n = 10
p = 25
i = (n X p) / 100 + 0.5 = 3
k = integer part of i = 3
f=i–k=0
Q1 = (1 – f) X Obsk + f X Obsk + 1
= (1 – 0) X 6 + 0 X 8 = 6
9
3
IR Example
n = 10
p = 75
i = (n X p) / 100 + 0.5 = 8
k = integer part of i = 8
f=i–k=0
Q3 = (1 – f) X Obsk + f X Obsk + 1
= (1 – 0) X 20 + 0 X 30 = 20
IR = Q3 – Q1 = 20 – 6 = 14
10
When To Use the IR
The IR is often used with skewed data as it
is insensitive to the extreme scores
11
Variance of a Population
Variance is defined as the average of the
square deviations:
12
4
What Does the Variance Formula
Mean?
First, it says to subtract the mean from each
of the scores
This difference is called a deviate or a deviation
score
The deviate tells us how far a given score is
from the typical, or average, score
Thus, the deviate is a measure of dispersion for
a given score
13
What Does the Variance Formula
Mean?
Why can’t we simply take the average of
the deviates? That is, why isn’t variance
defined as:
This is not the
formula for
variance!
14
What Does the Variance Formula
Mean?
One of the definitions of the mean was that
it always made the sum of the scores minus
the mean equal to 0
Thus, the average of the deviates must be 0
since the sum of the deviates must equal 0
To avoid this problem, statisticians square
the deviate score prior to averaging them
Squaring the deviate score makes all the
squared scores positive
15
5
What Does the Variance Formula
Mean?
Variance is the mean of the squared
deviation scores
The larger the variance is, the more the
scores deviate, on average, away from the
mean
The smaller the variance is, the less the
scores deviate, on average, from the mean
16
Standard Deviation
When the deviate scores are squared in variance,
their unit of measure is squared as well
E.g. If people’s weights are measured in pounds,
then the variance of the weights would be expressed
in pounds2 (or squared pounds)
Since squared units of measure are often
awkward to deal with, the square root of variance
is often used instead
The standard deviation is the square root of variance
17
Standard Deviation
Standard deviation = √variance
Variance = standard deviation2
18
6
Computational Formula
When calculating variance, it is often easier to use
a computational formula which is algebraically
equivalent to the definitional formula:
σ2 is the population variance, X is a score, µ is the
population mean, and N is the number of scores 19
Computational Formula Example
20
Computational Formula Example
21
7
Variance of a Sample
Samples are usually less variable than the
population
Because the extreme values of the population are
rare, they are less likely to appear in the sample
If we used the formula for the variance of a
population with a sample, the sample variance
would systematically underestimate the
population variance
The estimate would be biased
22
Variance of a Sample
Thus, to avoid being biased, we need to increase
the size of the estimate by a little. That is we need
to divide the sum of the squares by a slightly
smaller number:
s2 is the sample variance, X is a score, X is the
sample mean, and N is the number of scores
23
Variance of a Sample
Thus, to avoid being biased, we need to increase
the size of the estimate by a little. That is we need
to divide the sum of the squares by a slightly
smaller number:
s2 is the sample variance, X is a score,  is the
sample mean, and n is the number of scores
24
8
Sum of Squares (SS) &
Degrees of Freedom (df)
The numerator of the formula, Σ(X - )² , is
called the sum of squares or SS
The denominator of the formula, n – 1, is called
the degrees of freedom or df
The degrees of freedom are the number of scores that
are free to take on any value after some restriction
(such as specifying the value of the mean) has been
placed on a set of data
25
Transformations of Scale
Adding or subtracting a constant from all
scores does not change the standard
deviation
Multiplying or dividing all scores by a
constant causes the standard deviation to be
multiplied or divided by the same constant
26
9