Measure of Dispersion or Variability
Lecture 4
The mean, mode and median do a good job of telling where the center of a data
set is, but often we are interested in more. For example, a pharmaceutical engineer
develops a new drug that regulates sugar in the blood. Suppose he finds that the
average sugar content after taking the medication is at the optimal level. This does not
mean that the drug is effective. It is possible that half of the patients have
dangerously low sugar content while the other half have dangerously high content.
Instead of being an effective regulator, the drug would be a deadly poison. What the
engineer needs is a measure of how far the data spread apart. This is what the
variance and standard deviation provide.
The most common measures of dispersion or variability are the range, the variance,
the standard deviation and the coefficient of variation.
Range (R):
The range is the difference between the largest (XL) and the smallest (XS) values
in a set of observations.
R = XL - XS
Note: The range is a poor measure of dispersion because it takes into account only
two of the values, the largest and the smallest.
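The limitation noted above can be seen in a short Python sketch. The two data sets are made up for illustration and are not part of the lecture: they share the same smallest and largest values, so the range cannot tell them apart even though the values in between are arranged very differently.

```python
# Two hypothetical data sets with the same smallest (2) and largest (8) values.
clustered = [2, 5, 5, 5, 5, 5, 8]    # most values sit at the center
spread_out = [2, 2, 3, 5, 7, 8, 8]   # values pile up near the extremes

def value_range(values):
    """Range R = largest value minus smallest value."""
    return max(values) - min(values)

print(value_range(clustered))   # 6
print(value_range(spread_out))  # 6
```

Both calls give R = 6, because only the two extreme observations enter the calculation.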
Variance:
The variance is the most commonly used measure of spread in biological
statistics. For a population, it is defined as the sum of squares of the deviations from the
mean (SS) divided by the total number of observations (N). For a sample, the sum of
squares is divided by one less than the number of observations (n-1, the degrees of
freedom, df).
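$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}\right] \qquad \text{(Population)}$$

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \quad \text{or} \quad S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right] \qquad \text{(Sample)}$$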
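As a rough check that the two forms of each formula agree, the following Python sketch evaluates both for a small made-up data set. The function name `variance_both_forms` and the observations are assumptions chosen for illustration, not part of the lecture.

```python
def variance_both_forms(values, sample=True):
    """Return (definitional, computational) variance of `values`.

    sample=True divides by n - 1 (sample variance);
    sample=False divides by N (population variance).
    """
    n = len(values)
    mean = sum(values) / n
    divisor = n - 1 if sample else n
    # Definitional form: sum of squared deviations from the mean.
    definitional = sum((x - mean) ** 2 for x in values) / divisor
    # Computational form: sum of squares minus (sum)^2 / n.
    computational = (sum(x * x for x in values) - sum(values) ** 2 / n) / divisor
    return definitional, computational

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
pop_def, pop_comp = variance_both_forms(values, sample=False)
sam_def, sam_comp = variance_both_forms(values, sample=True)
print(pop_def, pop_comp)                      # 4.0 4.0
print(round(sam_def, 3), round(sam_comp, 3))  # 4.571 4.571
```

The computational form avoids calculating each deviation separately, which is convenient by hand; in floating-point code the definitional form is usually the safer choice when the values are large relative to their spread.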




Notes:
1. The variance is a measure that uses the mean as a point of reference.
2. The variance is small when all values are close to the mean.
3. The variance is large when all values are spread away from the mean.
Why the separate formula for a sample?
The sample formula divides by n-1 in order to:
1. Correct for the likelihood that the most extreme values will be missing from a small
sample.
2. Make the sample variance more representative of the population variance, particularly for very small samples.
3. Reduce the denominator, which matters most in small samples (if n=5, the
denominator is reduced by 20%). In large samples the n-1 correction has little
effect.
Example:
We want to compute the sample variance of the following sample values:
10, 21, 33, 53, 54.
Solution: n=5
* First method:
$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2$$

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5-1}$$

$$S^2 = \frac{(10-34.2)^2 + (21-34.2)^2 + (33-34.2)^2 + (53-34.2)^2 + (54-34.2)^2}{4} = \frac{1506.8}{4} = 376.7$$
* Second method:
xi       xi - 34.2    (xi - 34.2)^2
10       -24.2        585.64
21       -13.2        174.24
33       -1.2         1.44
53       18.8         353.44
54       19.8         392.04

$$\sum_{i=1}^{5} x_i = 171, \qquad \sum_{i=1}^{5} (x_i - \bar{x}) = 0, \qquad \sum_{i=1}^{5} (x_i - \bar{x})^2 = 1506.8$$

$$\bar{x} = \frac{171}{5} = 34.2, \qquad S^2 = \frac{1506.8}{4} = 376.7$$
* Third method:
xi       xi^2
10       100
21       441
33       1089
53       2809
54       2916

$$\sum x_i = 171, \qquad \sum x_i^2 = 7355$$

$$S^2 = \frac{7355 - 5(34.2)^2}{5-1} = \frac{1506.8}{4} = 376.7$$
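The three hand calculations for the sample 10, 21, 33, 53, 54 can be cross-checked with a short Python sketch. It uses the deviation form, the shortcut form, and the standard library function `statistics.variance`, which also divides by n-1. The variable names are chosen for illustration.

```python
import statistics

values = [10, 21, 33, 53, 54]
n = len(values)
mean = sum(values) / n

# First and second methods: squared deviations from the mean, divided by n - 1.
ss_dev = sum((x - mean) ** 2 for x in values)
s2_deviation = ss_dev / (n - 1)

# Third method: sum of squares minus n * mean^2, divided by n - 1.
s2_shortcut = (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

# Library check: statistics.variance computes the sample variance.
s2_library = statistics.variance(values)

print(round(mean, 1), round(ss_dev, 1))  # 34.2 1506.8
print(round(s2_deviation, 1), round(s2_shortcut, 1), round(s2_library, 1))  # 376.7 376.7 376.7
```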
Standard Deviation (s) or (sd):
It is defined as the positive square root of the variance.
$$\sigma = \sqrt{\sigma^2} \qquad \text{(Population)}$$
$$s = \sqrt{s^2} \qquad \text{(Sample)}$$
The mean squared difference from the sample mean will, on average,
underestimate the population variance. In some samples it will overestimate it, but
most of the time it will underestimate it. If the formula is modified so that the sum of
squared deviations is divided by n-1 rather than n, the tendency to underestimate
the population variance is eliminated.
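The underestimation described above can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the lecture: a normal population with mean 0 and standard deviation 2 (variance 4), samples of size 5, 20000 repetitions, and a fixed random seed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_VARIANCE = 4.0  # assumed population: Normal(mean=0, sd=2)
SAMPLE_SIZE = 5
N_SAMPLES = 20000

def average_variance_estimates():
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(N_SAMPLES):
        sample = [random.gauss(0.0, 2.0) for _ in range(SAMPLE_SIZE)]
        mean = sum(sample) / SAMPLE_SIZE
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / SAMPLE_SIZE
        total_div_n_minus_1 += ss / (SAMPLE_SIZE - 1)
    return total_div_n / N_SAMPLES, total_div_n_minus_1 / N_SAMPLES

avg_div_n, avg_div_n_minus_1 = average_variance_estimates()
print("divide by n:    ", round(avg_div_n, 2))
print("divide by n - 1:", round(avg_div_n_minus_1, 2))
```

Under these assumptions the divide-by-n average should settle near 3.2, which is (n-1)/n times the true variance of 4, while the divide-by-(n-1) average should settle near 4.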
Coefficient of Variation (C.V):
The variance and the standard deviation are useful as measures of variation of
the values of a single variable for a single population or sample, but if we want to
compare the variation of two variables or two samples we cannot use the variance and
the standard deviation because:
1. The variables or samples might have different units.
2. The variables or samples might have different means.
The coefficient of variation is defined as the ratio of the standard deviation to the
mean. It is independent of the units employed (unitless).
$$C.V = \frac{\sigma}{\mu} \times 100\% \qquad \text{(Population)}$$
$$C.V = \frac{s}{\bar{X}} \times 100\% \qquad \text{(Sample)}$$
Notes:
In biological experiments, if the coefficient of variation (C.V) is:
1) 10% to 15%, the variation in the data is acceptable.
2) 5% or less, the data are homogeneous, with little variation.
3) 25% or more, the variation is very considerable.
1. C.V for Ungrouped Data
Example: The data below present the results obtained by technicians from two different
laboratories, all making the same specific blood chemistry determination using a solution
with a known concentration (5 mg/ml).

Laboratory   Technician results   Mean   Standard deviation
1            5, 7, 6, 6           6      0.82
2            6, 4, 9, 5           6      2.16

$$C.V_1 = \frac{0.82}{6} \times 100 \approx 13.6\%, \qquad C.V_2 = \frac{2.16}{6} \times 100 = 36\%$$

Lab. 1 gives the most precise (least variable) results.
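The laboratory comparison can be reproduced with a minimal Python sketch. The helper name `coefficient_of_variation` is an assumption for illustration; `statistics.stdev` and `statistics.mean` are standard library functions, and `stdev` uses the n-1 divisor.

```python
import statistics

def coefficient_of_variation(values):
    """Sample C.V in percent: (s / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

lab1 = [5, 7, 6, 6]
lab2 = [6, 4, 9, 5]

for name, results in (("Lab 1", lab1), ("Lab 2", lab2)):
    sd = statistics.stdev(results)
    cv = coefficient_of_variation(results)
    print(name, round(sd, 2), round(cv, 1))
# Lab 1 0.82 13.6
# Lab 2 2.16 36.0
```

Both laboratories have the same mean of 6, so the difference in C.V comes entirely from the difference in standard deviation.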
Example: For the set of data (4, 6, 3, 4, 5 and 2), compute the range, the variance, the
standard deviation and the coefficient of variation.
Solution:
$$R = X_L - X_S = 6 - 2 = 4$$

$$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]$$

$$S^2 = \frac{1}{6-1}\left[(4^2 + 6^2 + 3^2 + 4^2 + 5^2 + 2^2) - \frac{(4 + 6 + 3 + 4 + 5 + 2)^2}{6}\right] = 2$$

$$s = \sqrt{s^2} = \sqrt{2} = 1.414$$

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{4 + 6 + 3 + 4 + 5 + 2}{6} = 4$$

$$C.V = \frac{s}{\bar{X}} \times 100 = \frac{1.414}{4} \times 100 = 35.35\%$$
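As a check on this example, the sketch below computes all four measures for the data 4, 6, 3, 4, 5, 2 in one pass. The function name `dispersion_summary` is hypothetical. Because the code carries the full value of the square root of 2 rather than the rounded 1.414, its C.V comes out as 35.36% instead of 35.35%.

```python
import statistics

def dispersion_summary(values):
    """Return the range, sample variance, sample SD and C.V (%) of `values`."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # square root of the n - 1 variance
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "cv_percent": sd / mean * 100,
    }

data = [4, 6, 3, 4, 5, 2]
summary = dispersion_summary(data)
print(summary["range"], summary["variance"])  # 4 2
print(round(summary["sd"], 3))                # 1.414
print(round(summary["cv_percent"], 2))        # 35.36
```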
2. C.V for Grouped Data
Example: The following table shows the hemoglobin values (g/100ml) of 30 children
receiving treatment for hemolytic anemia. Compute the variance, the
standard deviation and the coefficient of variation.

Hemoglobin      Midpoint (mi)   Frequency (fi)
6.5 – 7.5       7               1
7.5 – 8.5       8               5
8.5 – 9.5       9               11
9.5 – 10.5      10              9
10.5 – 11.5     11              3
11.5 – 12.5     12              1
Total                           30
Solution:
Variance for grouped data:
$$S^2 = \frac{1}{\sum_{i=1}^{k} f_i - 1}\left[\sum_{i=1}^{k} m_i^2 f_i - \frac{\left(\sum_{i=1}^{k} m_i f_i\right)^2}{\sum_{i=1}^{k} f_i}\right]$$

$$S^2 = \frac{1}{30-1}\left[(7^2 \times 1 + 8^2 \times 5 + ... + 12^2 \times 1) - \frac{(7 \times 1 + 8 \times 5 + ... + 12 \times 1)^2}{30}\right] = \frac{1}{29}\left[2667 - \frac{281^2}{30}\right] = 1.206$$

$$s = \sqrt{s^2} = \sqrt{1.206} = 1.098$$

$$\bar{X} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} = \frac{7 \times 1 + 8 \times 5 + 9 \times 11 + 10 \times 9 + 11 \times 3 + 12 \times 1}{30} = \frac{281}{30} = 9.367$$

$$C.V = \frac{s}{\bar{X}} \times 100 = \frac{1.098}{9.367} \times 100 = 11.72\%$$
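The grouped-data calculation can be checked with a short Python sketch that applies the grouped variance formula to the class midpoints and frequencies of the hemoglobin table. The function name `grouped_statistics` is an assumption for illustration. Carrying full precision gives a variance of about 1.206; rounding the mean to 9.367 before squaring it would shift the result to about 1.199, so small differences in the third decimal place can come from rounding alone.

```python
import math

midpoints = [7, 8, 9, 10, 11, 12]
frequencies = [1, 5, 11, 9, 3, 1]

def grouped_statistics(midpoints, frequencies):
    """Return (mean, sample variance, SD, C.V in %) for grouped data."""
    n = sum(frequencies)                                                 # sum of fi
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))          # sum of mi*fi
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))     # sum of mi^2*fi
    mean = sum_mf / n
    variance = (sum_m2f - sum_mf ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    return mean, variance, sd, sd / mean * 100

mean, variance, sd, cv = grouped_statistics(midpoints, frequencies)
print(round(mean, 3), round(variance, 3), round(sd, 3), round(cv, 2))
# 9.367 1.206 1.098 11.72
```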
Homework 3:
Q1: The following table gives the results of a survey to study the ages and hemoglobin
levels of patients of a certain clinic.
Age (Years)
Hemoglobin level (g/dl)
Mean
30
60
24
Standard Deviation
6
10
Determine whether hemoglobin levels of the patients are more variable than ages?
Q2: The weights (in kg) of 8 pregnant women gave the following results:
$$\sum_{i=1}^{8} x_i = 495, \qquad \sum_{i=1}^{8} x_i^2 = 30659$$
Find: (a) The mean.
(b) The variance.
(c) The standard deviation.
(d) The coefficient of variation.
Q3: The following are the glucose levels (mg/100ml) of a sample of 50 children.
126  117  101  100  116  111  120  138  118  108
114  115  113  112  113  132  130  128  122  121
115   88  113   90   89  106  104  126  127  111
115  116  109  108  122  123  149  140  121  137
110  119  115   83  109  117  118  110  108  134
Required:
1. Group the data into a frequency distribution.
2. Find the mean, median and mode.
3. Compute R, S^2, S and C.V.