Measures of Variation
Recap: Measures of Central Tendency (Mean, Median, and Mode)
Mean

Measure of central tendency

Most common measure

Acts as "balance point"

Affected by extreme values ("outliers")

Median

Measure of central tendency

Middle value in ordered sequence

If n is odd, middle value of sequence

If n is even, average of 2 middle values

Not affected by extreme values

The mean is generally used unless extreme values (outliers) exist, in which case the median is often
used, since the median is not sensitive to extreme values.
Mode

A measure of central tendency

Value that occurs most often

Not affected by extreme values

Used for either numerical or nominal data

There may be no mode

There may be several modes
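
As a brief illustration of the recap, the sketch below uses Python's standard `statistics` module on a small made-up data set. The numbers and variable names are hypothetical and do not come from these notes. It shows how one extreme value shifts the mean but barely moves the median, and how a data set can have more than one mode.

```python
from statistics import mean, median, multimode

# Hypothetical values chosen for illustration only.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]  # add one extreme value

# The mean shifts a lot when the outlier is added; the median barely moves.
print("mean:  ", mean(values), "->", mean(with_outlier))
print("median:", median(values), "->", median(with_outlier))

# multimode returns every value tied for the highest count,
# so it can report a single mode or several.
print("modes:", multimode(values))
print("modes:", multimode([1, 1, 2, 2, 3]))
```

When every value occurs exactly once, `multimode` returns all of the values; this is the situation the notes describe as having no mode.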
Measures of Variation
Range: difference between the extreme values (max - min). In the literature, the actual extreme
values are most often reported (as min - max) rather than the difference.
Variance: measure of variation in a set of data; the mean of the squared deviations of the values
from the mean, often referred to as the mean square or MS.
Standard deviation: square root of the variance; measures the amount of variation of values
around the mean.
Standard error: measure of the variability of sample means around a population mean.
Coefficient of variation: used to compare variability among different variables whose values
differ in magnitude (elephant weight versus mouse weight).
Measures of variation give information on the spread or variability of the
data values.
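
The last two measures in the list can be sketched in a few lines. This example assumes the common estimates: standard error of the mean as s / sqrt(n), and coefficient of variation as the sample standard deviation expressed as a percentage of the mean. The function names and the weights are hypothetical, chosen only to echo the elephant and mouse comparison.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(sample) / mean(sample) * 100

# Hypothetical weights: elephants in kilograms, mice in grams.
elephants_kg = [4000, 5000, 6000]
mice_g = [18, 20, 22]

for name, data in [("elephants", elephants_kg), ("mice", mice_g)]:
    print(name,
          "SD:", stdev(data),
          "SE:", standard_error(data),
          "CV %:", coefficient_of_variation(data))
```

The standard deviations differ by a factor of 500 because the units and magnitudes differ, while the coefficient of variation puts both variables on the same relative scale.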
The "data sets" have the same mean, median, and mode, yet clearly differ.
Range
Simplest measure of variation

Difference between the largest and the smallest observations
Range = Maximum – Minimum
Team I has range 6 inches, Team II has range 17 inches.
Disadvantages of the Range

Ignores the way in which data are distributed

Only uses two entries from the data set

Sensitive to outliers
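
A minimal sketch of these disadvantages, using made-up heights in inches (not the Team I or Team II data) and a hypothetical helper `value_range`:

```python
def value_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

# Hypothetical heights in inches.
evenly_spread = [70, 71, 72, 73, 74]
clustered = [70, 70, 70, 70, 74]       # distributed very differently
with_outlier = evenly_spread + [90]    # one extreme value added

print(value_range(evenly_spread))  # 4
print(value_range(clustered))      # 4, same range despite the different shape
print(value_range(with_outlier))   # 20, a single outlier dominates
```

Only the two extreme entries enter the calculation, which is why the first two data sets are indistinguishable by range alone.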
Variance and Standard Deviation

Measures of variation

Most common measures

Consider how the data are distributed

Use all the entries in a data set

Show variation about the mean
First you must know what is meant by the deviation of an entry in a data set. The deviation of an
entry x in a population data set is the difference between the entry and the mean of the data set.
Deviation of x = x – mean
Example: Find the deviation of each player's height for Team I.
Notice that the sum of the deviations is zero. Because this is true for any data set, it does not make
sense to find the average of the deviations. To overcome this problem, you can square each
deviation.
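
The point about deviations summing to zero can be checked numerically. The heights below are hypothetical (the Team I values are not reproduced here), and the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical heights in inches (not the Team I data).
heights = [68, 71, 74, 75, 77]
mu = mean(heights)  # 73

# Deviation of x = x - mean
deviations = [x - mu for x in heights]
squared = [d ** 2 for d in deviations]

print(deviations, "sum =", sum(deviations))  # positives and negatives cancel
print(squared, "sum =", sum(squared))        # squares cannot cancel
```

Squaring removes the signs, so the sum of the squared deviations is zero only when every entry equals the mean.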
In a population data set, the mean of the squares of the deviations is called the population variance.
The population variance of a population data set of N entries is
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where μ is the population mean. The symbol σ is the lowercase Greek letter sigma.
The $\sum (x - \mu)^2$ portion of the variance formula is referred to as the sum of squares and is denoted
by SSx.
Example: Find the population variance of the players' heights for Team I.
Now notice the unit of the answer: inches squared, even though we were not working with area. This is
the disadvantage of variance: its units are usually not meaningful. You can return to the
original unit of the data by using the standard deviation.
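
A short sketch of the population variance as the mean of the squared deviations, checked against `statistics.pvariance` from the Python standard library. The heights are hypothetical stand-ins, and `population_variance` is an illustrative helper name.

```python
from statistics import pvariance

def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    sum_of_squares = sum((x - mu) ** 2 for x in data)
    return sum_of_squares / n

# Hypothetical heights in inches (not the Team I data).
heights = [68, 71, 74, 75, 77]

print(population_variance(heights))  # 10.0, in inches squared
print(pvariance(heights))            # library result for comparison
```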
The population standard deviation of a population data set of N entries is the square root of the
population variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
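
Taking the square root brings the result back to the original unit. A minimal sketch, again on hypothetical heights in inches:

```python
from math import sqrt
from statistics import pstdev, pvariance

# Hypothetical heights in inches (not the Team I data).
heights = [68, 71, 74, 75, 77]

variance = pvariance(heights)  # inches squared
std_dev = sqrt(variance)       # inches

print(variance, std_dev)
print(pstdev(heights))         # library function that does both steps
```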
Example: Find the population standard deviation of the players' heights for Team I.
Example: Find the population variance and standard deviation of Team II.
Since the population standard deviation of Team I is less than that of Team II, this is an indicator that
the heights of Team I are, as a whole, closer to the mean height of the team.
Spread of data

Variances and standard deviations can be used to determine the spread of the
data. If the variance or standard deviation is large, the data are more dispersed.
This information is useful in comparing two or more data sets to determine which
is more variable.
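
One way to sketch such a comparison is shown below. The two data sets are hypothetical teams with the same mean height, labelled "Team A" and "Team B" so they are not confused with Team I and Team II from the examples.

```python
from statistics import mean, pstdev

# Hypothetical heights in inches; both teams have a mean of 73.
team_a = [68, 71, 74, 75, 77]
team_b = [60, 66, 73, 80, 86]

spread = {"Team A": pstdev(team_a), "Team B": pstdev(team_b)}
more_variable = max(spread, key=spread.get)

print(mean(team_a), mean(team_b))
print(spread)
print("More variable:", more_variable)
```

The means alone cannot tell the teams apart; the larger standard deviation identifies the team whose heights lie farther from the mean.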
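The sample variance and sample standard deviation of a sample data set of n entries are
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{and} \quad s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where x̄ is the sample mean.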
Notice that when you find the population variance, you divide by N, the number of entries. When you
find the sample variance, you divide by n – 1, one less than the number of entries.
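
The difference between the two divisors can be seen by treating the same numbers first as a population and then as a sample. The heights are hypothetical, and `sample_variance` is an illustrative helper checked against `statistics.variance`.

```python
from statistics import pvariance, stdev, variance

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Hypothetical heights in inches.
heights = [68, 71, 74, 75, 77]

print(pvariance(heights))        # divides by N:     50 / 5 = 10
print(sample_variance(heights))  # divides by n - 1: 50 / 4 = 12.5
print(variance(heights), stdev(heights))  # library sample versions
```

Dividing by n - 1 always gives a slightly larger value than dividing by N, and the gap shrinks as the number of entries grows.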
Example: The heights of the two teams represent a sample of heights of all basketball players. Find
the variance and standard deviation of this sample of all basketball players.