Why statisticians were created

Measure of dispersion
FETP India

Competency to be gained from this lecture
• Calculate a measure of variation that is adapted to the sample studied

Key issues
• Range
• Inter-quartile range
• Standard deviation

Measures of spread, dispersion or variability
• The measure of central tendency provides important information about the distribution
• However, it does not provide information concerning the relative position of other data points in the sample
• Measures of spread, dispersion or variability are needed to address this

Range

Why one needs to measure variability
Marks obtained by three students:

Student      Biology   Physics   Chemistry
1            200       199       100
2            200       200       200
3            200       201       300
Mean         200       200       200
Variation    Nil       Slight    Substantial
Range        0         2         200

Every concept comes from a failure of the previous concept
• Mean is distorted by outliers
• Median takes care of the outliers

The range: A simple measure of dispersion
• Take the difference between the lowest value and the highest value
• Limitations:
  - The range says nothing about the values between the extreme values
  - The range is not stable: as the sample size increases, the range can change dramatically
  - The range is not amenable to further statistical treatment

Example of a range
• Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm
• Lowest (minimum) value: 70 cm
• Highest (maximum) value: 140 cm
• Range: 140 - 70 = 70 cm

Three different distributions with the same range (35 kg)
[Figure: three dot plots labelled Even, Uneven and Clumped, each drawn on an axis marked from 30 to 70]

The range increases with the sample size

Set                        Values                      Min   Max   Range
Initial set (5 values)     30 40 53 58 65              30    65    35
New set (3 more values)    30 40 53 58 65 48 51 64     30    65    35
New set (3 more values)    30 40 53 58 65 48 51 70     30    70    40
New set (3 more values)    30 40 53 58 65 28 51 70     28    70    42

Two ranges based on different sample sizes are not comparable

Inter-quartile range

Percentiles and quartiles
• Percentiles: the values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts
  - The median is the 50th percentile
• Quartiles: the values which divide a series of observations, arranged in ascending order, into 4 equal parts
  - The median is the 2nd quartile

Sorting the data in increasing order
• Median
  - Middle value (if n is odd)
  - Average of the two middle values (if n is even)
  - A measure of the "centre" of the data
• Quartiles divide the set of ordered values into 4 equal parts

First 25% | Q1 | 2nd 25% | Q2 (Median) | 3rd 25% | Q3 | 4th 25%

The inter-quartile range
• The central portion of the distribution
• Calculated as the difference between the third quartile and the first quartile
• Includes about one-half of the observations
• Leaves out one quarter of the observations at each end
• Limitations:
  - Only takes into account two values
  - Not a mathematical concept upon which theories can be developed

The inter-quartile range: Example
• Values: 29, 31, 24, 29, 30, 25
• Arranged in order: 24, 25, 29, 29, 30, 31 (n = 6)
• Q1: position (n+1)/4 = 1.75, so Q1 = 24 + 0.75 x (25 - 24) = 24.75
• Q3: position 3(n+1)/4 = 5.25, so Q3 = 30 + 0.25 x (31 - 30) = 30.25
• Inter-quartile range = Q3 - Q1 = 30.25 - 24.75 = 5.5

Graphic representation of the inter-quartile range
[Figure not recoverable from the source]

Standard deviation

The mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value and the mean
• Calculate the average difference between each value and the mean
• Limitations: Negative and positive deviations cancel each other out, so the average may be 0 even when there is substantial variation

The mean deviation from the mean: Example
Data:                  10   20   30   40   50   60   70
Mean = 280/7 = 40
Deviation from mean:   -30  -20  -10  0    10   20   30
Sum of deviations = 0

Absolute mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value and the mean and take the absolute value
• Calculate the average of these absolute differences
• Limitations: Absolute values are awkward to handle mathematically

Absolute mean deviation from the mean: Example
Data:                  10   20   30   40   50   60   70
Mean = 280/7 = 40
Deviation from mean:   -30  -20  -10  0    10   20   30
Absolute values:       30   20   10   0    10   20   30
Absolute mean deviation from the mean = 120/7 = 17.1

Calculating the variance (1/2)
1. Calculate the mean as a measure of central location (MEAN)
2. Calculate the difference between each observation and the mean (DEVIATION)
3. Square the differences (SQUARED DEVIATION)
• Negative and positive deviations will not cancel each other out
• Values further from the mean have a bigger impact

Calculating the variance (2/2)
4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)
5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE
• Why divide by n - 1?
  - Adjustment for the fact that the mean is just an estimate of the true population mean
  - Tends to make the variance larger

The standard deviation
• Take the square root of the variance
• Limitations: Sensitive to outliers

$SD = \sqrt{\dfrac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$

Example

Patient   No. of X-rays   Deviation from mean   Absolute deviation   Squared deviation   Square of observation
A         10              10 - 9 = 1            1                    1² = 1              10² = 100
B         8               8 - 9 = -1            1                    (-1)² = 1           8² = 64
C         6               6 - 9 = -3            3                    (-3)² = 9           6² = 36
D         12              12 - 9 = 3            3                    3² = 9              12² = 144
E         9               9 - 9 = 0             0                    0² = 0              9² = 81
Total     45              0                     8                    20                  425

Mean = 45/5 = 9 x-rays
Mean deviation = 8/5 = 1.6 x-rays
Variance = 20/(5-1) = 20/4 = 5 (x-rays)²
Standard deviation = √5 = 2.2 x-rays

Properties of the standard deviation
• Unaffected if the same constant is added to (or subtracted from) every observation
• If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the same constant

Need of a measure of variation that is independent from the measurement unit
• The standard deviation is expressed in the same unit as the mean: e.g., 3 cm for height, 1.4 kg for weight
• Sometimes, it is useful to express variability as a percentage of the mean: e.g., in the case of laboratory tests, the experimental variation is ± 5% of the mean

The coefficient of variation
• Calculate the standard deviation
• Divide by the mean
• The standard deviation becomes "unit free"
• Coefficient of variation (%) = [SD / Mean] x 100 (pure number)

Uses of the coefficient of variation
• Compare the variability in two variables which are measured in different units
  - Height (cm) and weight (kg)
• Compare the variability in two groups with widely different mean values
  - Incomes of persons in different socio-economic groups

A summary of measures of dispersion

Measure                Advantages                              Disadvantages
Range                  • Obvious                               • Uses only 2 observations
                       • Easy to calculate                     • Increases with the sample size
                                                               • Can be distorted by outliers
Inter-quartile range   • Not affected by extreme values        • Uses only 2 observations
                                                               • Not amenable to further statistical treatment
Standard deviation     • Uses every value                      • Highly influenced by extreme values
                       • Suitable for further analysis

Choosing a measure of central tendency and a measure of dispersion

Type of distribution         Measure of central tendency   Measure of dispersion
Normal                       Mean                          Standard deviation
Skewed                       Median                        Inter-quartile range
Exponential or logarithmic   Geometric mean                Consult with the statistician

Key messages
• Report the range but be aware of its limitations
• Report the inter-quartile range when you use the median
• Report the standard deviation when you use a mean
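
The measures covered in this lecture can be checked with a few lines of Python. The sketch below is illustrative only and is not part of the original slides: the function names (`value_range`, `quartile`, `mean_abs_deviation`, `sample_variance`, `coefficient_of_variation`) are hypothetical, and the quartile helper assumes the position rule used in the inter-quartile example, k(n+1)/4, with linear interpolation between neighbouring ordered values. Other software may use different quartile conventions and give slightly different results.

```python
import math


def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k*(n+1)/4.

    Assumption: linear interpolation between the two neighbouring
    ordered values, as in the lecture's inter-quartile example.
    """
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4
    if pos <= 1:          # position falls before the first value
        return ordered[0]
    if pos >= n:          # position falls after the last value
        return ordered[-1]
    lower = int(pos)      # 1-based index of the value just below pos
    frac = pos - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


def mean_abs_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sample_variance(values)) / mean * 100


# Data taken from the lecture examples.
heights = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
iqr_values = [29, 31, 24, 29, 30, 25]
xrays = [10, 8, 6, 12, 9]

q1 = quartile(iqr_values, 1)
q3 = quartile(iqr_values, 3)
sd = math.sqrt(sample_variance(xrays))

print("Range of heights:", value_range(heights))
print("Q1, Q3, IQR:", q1, q3, q3 - q1)
print("Mean absolute deviation (x-rays):", mean_abs_deviation(xrays))
print("Variance and SD (x-rays):", sample_variance(xrays), round(sd, 1))
print("Coefficient of variation (%):", round(coefficient_of_variation(xrays), 1))
```

With the exact positions 1.75 and 5.25, this rule gives Q1 = 24.75 and Q3 = 30.25 for the six example values, and the x-ray data give a variance of 5 and a standard deviation of about 2.2.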