Measuring Dispersion

Why statisticians were created
Measure of dispersion
FETP India
Competency to be gained
from this lecture
Calculate a measure of variation that is
adapted to the sample studied
Key issues
• Range
• Inter-quartile range
• Standard deviation
Measures of spread, dispersion or
variability
• The measure of central tendency provides
important information about the distribution
• However, it does not provide information
concerning the relative position of other
data points in the sample
• Measures of spread, dispersion or variability
are needed to address this
Range
Why one needs to measure variability
Marks obtained
Student    Biology  Physics  Chemistry
1          200      199      100
2          200      200      200
3          200      201      300
Mean       200      200      200
Variation  Nil      Slight   Substantial
Range      0        2        200
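The table above can be reproduced with a few lines of Python. This is only a sketch: the dictionary `marks` and the name `summary` are illustrative choices, and the figures are the marks listed in the table.

```python
# Marks of students 1, 2 and 3 in each subject, as listed in the table.
marks = {
    "Biology": [200, 200, 200],
    "Physics": [199, 200, 201],
    "Chemistry": [100, 200, 300],
}

# For each subject, store (mean, range); the range is highest minus lowest.
summary = {
    subject: (sum(values) / len(values), max(values) - min(values))
    for subject, values in marks.items()
}

for subject, (mean, spread) in summary.items():
    print(f"{subject}: mean = {mean:.0f}, range = {spread}")
```

All three subjects share a mean of 200, yet the range goes from 0 to 200, which is the point of the slide: the mean alone hides the variation.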
Every concept comes from a failure of
the previous concept
• Mean is distorted by outliers
• Median takes care of the outliers
The range:
A simple measure of dispersion
• Take the difference between the lowest
value and the highest value
• Limitations:
  - The range says nothing about the values between
    the extreme values
  - The range is not stable: as the sample size
    increases, the range can change dramatically
  - The range is not suited to further statistical
    analysis
Example of a range
• Take a sample of 10 heights:
  - 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm
• Lowest (minimum) value
  - 70 cm
• Highest (maximum) value
  - 140 cm
• Range
  - 140 - 70 = 70 cm
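As a quick check of the example above, the same calculation can be sketched in Python. The function name `value_range` is an illustrative choice, not part of the lecture.

```python
# The 10 heights (in cm) from the example.
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]


def value_range(values):
    """Return the difference between the highest and the lowest value."""
    return max(values) - min(values)


print(value_range(heights_cm))  # 70
```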
Three different distributions with the
same range (35 kg)
[Figure: dot plots of three distributions labelled Even, Uneven and Clumped, each on an axis marked from 30 to 70]
The range increases with the sample size
Set                       Values                     Lowest  Highest  Range
Initial set (5 values)    30 40 53 58 65  -  -  -    30      65       35
New set (3 more values)   30 40 53 58 65 48 51 64    30      65       35
New set (3 more values)   30 40 53 58 65 48 51 70    30      70       40
New set (3 more values)   30 40 53 58 65 28 51 70    28      70       42
Two ranges based on different sample sizes are not comparable
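The effect of adding observations can be sketched in Python with the four sets from the slide. The labels in `sets` and the helper `value_range` are illustrative names only.

```python
def value_range(values):
    """Return the difference between the highest and the lowest value."""
    return max(values) - min(values)


initial = [30, 40, 53, 58, 65]

# Each new set is the initial set plus three further values, as on the slide.
sets = {
    "initial set": initial,
    "adding 48, 51, 64": initial + [48, 51, 64],
    "adding 48, 51, 70": initial + [48, 51, 70],
    "adding 28, 51, 70": initial + [28, 51, 70],
}

ranges = {label: value_range(values) for label, values in sets.items()}
for label, spread in ranges.items():
    print(f"{label}: range = {spread}")
```

Adding values can leave the range unchanged or widen it, but it can never shrink it, which is why ranges from samples of different sizes should not be compared directly.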
Percentiles and quartiles
• Percentiles
  - Those values in a series of observations, arranged
    in ascending order of magnitude, which divide
    the distribution into 100 equal parts
  - The median is the 50th percentile
• Quartiles
  - The values which divide a series of observations,
    arranged in ascending order, into 4 equal parts
  - The median is the 2nd quartile
Inter-quartile range
Sorting the data in increasing order
• Median
  - Middle value (if n is odd)
  - Average of the two middle values (if n is even)
  - A measure of the “centre” of the data
• Quartiles divide the set of ordered values
into 4 equal parts
  First 25% | Q1 | 2nd 25% | Q2 (Median) | 3rd 25% | Q3 | 4th 25%
The inter-quartile range
• The central portion of the distribution
• Calculated as the difference between the
third quartile and the first quartile
• Includes about one-half of the observations
• Leaves out one quarter of the observations at each end
• Limitations:
  - Only takes into account two values
  - Not a mathematical concept upon which theories
    can be developed
The inter-quartile range: Example
• Values
  - 29, 31, 24, 29, 30, 25
• Arrange
  - 24, 25, 29, 29, 30, 31
• Q1
  - Position (n+1)/4 = 7/4 = 1.75
  - Q1 = 24 + 0.75 × (25 - 24) = 24.75
• Q3
  - Position (n+1)*3/4 = 21/4 = 5.25
  - Q3 = 30 + 0.25 × (31 - 30) = 30.25
• Inter-quartile range = Q3 - Q1 = 30.25 - 24.75 = 5.5
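The (n+1)-based positions used in this example can be sketched in Python as follows. The function name `quantile_n_plus_1` is a hypothetical helper written for this illustration; the comparison uses `statistics.quantiles` from the standard library (Python 3.8 or later), whose default "exclusive" method applies the same (n+1) rule. Other software may use different quartile conventions and give slightly different values.

```python
import statistics


def quantile_n_plus_1(values, p):
    """Quantile at 1-based position (n + 1) * p, interpolating linearly."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part of the position
    if lower < 1:                # position falls before the first value
        return ordered[0]
    if lower >= n:               # position falls at or after the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


values = [29, 31, 24, 29, 30, 25]
q1 = quantile_n_plus_1(values, 0.25)
q3 = quantile_n_plus_1(values, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)  # 24.75 30.25 5.5

# Cross-check against the standard library (quartile cut points Q1, Q2, Q3).
lib_q1, lib_q2, lib_q3 = statistics.quantiles(values, n=4)
print(lib_q1, lib_q3)
```

Without intermediate rounding, Q3 comes out at 30.25 and the inter-quartile range at 5.5.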
Graphic representation of the
inter-quartile range
[Figure not reproduced in the transcript]
The mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value
and the mean
• Calculate the average difference between
each value and the mean
• Limitations:
  - Negative and positive deviations cancel each other out,
    so the average is always 0, even when there is
    substantial variation
Standard deviation
The mean deviation from the mean:
Example
Data: 10 20 30 40 50 60 70
Mean = 280/7 = 40
Deviations from the mean (10-40, 20-40, …):
-30 -20 -10 0 10 20 30
Sum = 0
Absolute mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value
and the mean and take the absolute value
• Calculate the average of these absolute differences
• Limitations:
  - Absolute values are awkward to handle mathematically
Absolute mean deviation from the mean:
Example
Data: 10 20 30 40 50 60 70
Mean = 280/7 = 40
Deviations from the mean (10-40, 20-40, …):
-30 -20 -10 0 10 20 30
Absolute values:
30 20 10 0 10 20 30
Absolute mean deviation from the mean = 120/7 = 17.1
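The contrast between the signed and the absolute mean deviation can be sketched in Python with the same seven values. The variable names are illustrative.

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)            # 280 / 7 = 40
deviations = [x - mean for x in data]   # -30, -20, -10, 0, 10, 20, 30

# Signed deviations cancel out, so their average is 0.
mean_deviation = sum(deviations) / len(data)

# Taking absolute values first keeps the information about spread.
absolute_mean_deviation = sum(abs(d) for d in deviations) / len(data)

print(mean_deviation)                     # 0.0
print(round(absolute_mean_deviation, 1))  # 17.1
```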
Calculating the variance (1/2)
1. Calculate the mean as a measure of central
location (MEAN)
2. Calculate the difference between each
observation and the mean (DEVIATION)
3. Square the differences (SQUARED
DEVIATION)
• Negative and positive deviations will not cancel
each other out
• Values further from the mean have a bigger
impact
Calculating the variance (2/2)
4. Sum up these squared deviations (SUM OF
THE SQUARED DEVIATIONS)
5. Divide this SUM OF THE SQUARED
DEVIATIONS by the total number of
observations minus 1 (n-1) to give the
VARIANCE
• Why divide by n - 1?
  - Adjustment for the fact that the mean is just an
    estimate of the true population mean
  - Tends to make the variance larger
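The five steps can be followed one by one in Python. This is a sketch only: it reuses the seven values from the mean-deviation example (10 to 70), the variable names are illustrative, and `statistics.variance` from the standard library is used as a cross-check because it also divides by n - 1.

```python
import math
import statistics

data = [10, 20, 30, 40, 50, 60, 70]
n = len(data)

mean = sum(data) / n                         # step 1: MEAN
deviations = [x - mean for x in data]        # step 2: DEVIATION
squared = [d ** 2 for d in deviations]       # step 3: SQUARED DEVIATION
sum_of_squares = sum(squared)                # step 4: SUM OF THE SQUARED DEVIATIONS
variance = sum_of_squares / (n - 1)          # step 5: VARIANCE (divide by n - 1)

# Dividing by n instead gives a smaller value, shown here for comparison.
variance_divided_by_n = sum_of_squares / n

print(sum_of_squares)            # 2800.0
print(round(variance, 1))        # 466.7
print(variance_divided_by_n)     # 400.0
print(math.isclose(variance, statistics.variance(data)))  # True
```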
The standard deviation
• Take the square root of the variance
• Limitations:
  - Sensitive to outliers
$$SD = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$$
Example
Patient  No. of X-rays  Deviation from mean  Absolute deviation  Squared deviation  Square of observation
A        10             10 - 9 = 1           1                   1² = 1             10² = 100
B        8              8 - 9 = -1           1                   (-1)² = 1          8² = 64
C        6              6 - 9 = -3           3                   (-3)² = 9          6² = 36
D        12             12 - 9 = 3           3                   3² = 9             12² = 144
E        9              9 - 9 = 0            0                   0² = 0             9² = 81
Total    45             0                    8                   20                 425
Mean = 45/5 = 9 x-rays
Mean deviation = 8/5 = 1.6 x-rays
Variance = 20/(5-1) = 20/4 = 5 x-rays²
Standard deviation = √5 = 2.2 x-rays
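The worked example can be checked in Python in two ways: from the squared deviations, and from the totals of the observations and of their squares (45 and 425). The second route is the shortcut form of the standard deviation, written here under the assumption that it is the formula shown on the preceding slide. Variable names are illustrative.

```python
import math

xrays = [10, 8, 6, 12, 9]   # patients A to E
n = len(xrays)

# Route 1: squared deviations from the mean.
mean = sum(xrays) / n                                     # 45 / 5 = 9
sum_squared_deviations = sum((x - mean) ** 2 for x in xrays)   # 20
sd_from_deviations = math.sqrt(sum_squared_deviations / (n - 1))

# Route 2: totals only, sqrt((n * sum(x^2) - (sum(x))^2) / (n * (n - 1))).
sum_x = sum(xrays)                   # 45
sum_x2 = sum(x * x for x in xrays)   # 425
sd_from_totals = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(round(sd_from_deviations, 1))  # 2.2
print(round(sd_from_totals, 1))      # 2.2
```

Both routes give a variance of 5 and a standard deviation of about 2.2 x-rays.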
Properties of the standard deviation
• Unaffected if same constant is added to (or
subtracted from) every observation
• If each value is multiplied (or divided) by a
constant, the standard deviation is also
multiplied (or divided) by the same constant
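Both properties can be verified numerically. The sketch below uses the X-ray counts from the worked example and `statistics.stdev` from the Python standard library; the constants 100 and 3 are arbitrary choices made for illustration.

```python
import math
import statistics

xrays = [10, 8, 6, 12, 9]
sd = statistics.stdev(xrays)

# Adding the same constant to every observation leaves the SD unchanged.
sd_shifted = statistics.stdev([x + 100 for x in xrays])

# Multiplying every observation by a constant multiplies the SD by it too.
sd_scaled = statistics.stdev([x * 3 for x in xrays])

print(math.isclose(sd_shifted, sd))     # True
print(math.isclose(sd_scaled, 3 * sd))  # True
```

For a negative constant, the standard deviation is multiplied by the absolute value of the constant, since a standard deviation is never negative.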
Need for a measure of variation that is
independent of the measurement unit
• The standard deviation is expressed in the
same unit as the mean:
  - e.g., 3 cm for height, 1.4 kg for weight
• Sometimes, it is useful to express variability
as a percentage of the mean
  - e.g., in the case of laboratory tests, the
    experimental variation is ± 5% of the mean
The coefficient of variation
• Calculate the standard deviation
• Divide by the mean
  - The standard deviation becomes “unit free”
• Coefficient of variation (%) =
  - [SD / Mean] x 100 (pure number)
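A minimal Python sketch of this calculation follows. The function name `coefficient_of_variation` is an illustrative choice, and the two data sets are the X-ray counts and the heights (in cm) used in earlier examples of this lecture.

```python
import statistics


def coefficient_of_variation(values):
    """Return the standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


xrays = [10, 8, 6, 12, 9]
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]

cv_xrays = coefficient_of_variation(xrays)
cv_heights = coefficient_of_variation(heights_cm)

print(f"X-rays:  {cv_xrays:.1f}%")   # 24.8%
print(f"Heights: {cv_heights:.1f}%")
```

Because both results are percentages, they can be compared directly even though one variable is a count and the other is measured in centimetres.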
Uses of the coefficient of variation
• Compare the variability in two variables
studied which are measured in different
units
  - Height (cm) and weight (kg)
• Compare the variability in two groups with
widely different mean values
  - Incomes of persons in different socio-economic
    groups
A summary of measures of dispersion
Measure               Advantages                        Disadvantages
Range                 • Obvious                         • Uses only 2 observations
                      • Easy to calculate               • Increases with the sample size
                                                        • Can be distorted by outliers
Inter-quartile range  • Not affected by extreme values  • Uses only 2 observations
                                                        • Not amenable to further statistical treatment
Standard deviation    • Uses every value                • Highly influenced by extreme values
                      • Suitable for further analysis
Choosing a measure of central tendency
and a measure of dispersion
Type of distribution        Measure of central tendency  Measure of dispersion
Normal                      • Mean                       • Standard deviation
Skewed                      • Median                     • Inter-quartile range
Exponential or logarithmic  • Geometric mean             • Consult with the statistician
Key messages
• Report the range but be aware of its
limitations
• Report the inter-quartile range when you
use the median
• Report the standard deviation when you use
a mean