Measures of Dispersion

Series I: 70 70 70 70 70 70 70 70 70 70
Series II: 66 67 68 69 70 70 71 72 73 74
Series III: 1 19 50 60 70 80 90 100 110 120
For each series, n = 10, ΣX = 700 and Mean = 700/10 = 70. The three series share the same mean but differ greatly in how widely their values are spread.

Measures of Variability
• A single summary figure that describes the spread of observations within a distribution.

MEASURES OF DISPERSION
• Range
• Interquartile range
• Variance
• Standard deviation

Measures of Variability
• Range: the difference between the smallest and largest observations.
• Interquartile range: the range of the middle half of the scores.
• Variance: the mean of all squared deviations from the mean.
• Standard deviation: a rough measure of the average amount by which observations deviate from the mean; the square root of the variance.

Variability Example: Range
Hotel rates: 52, 76, 100, 136, 186, 196, 205, 250, 254, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Range = 891 − 52 = 839

Pros and Cons of the Range
• Pros
  - Very easy to compute.
  - Scores exist in the data set.
• Cons
  - Value depends on only two scores.
  - Very sensitive to outliers.
  - Influenced by sample size (the larger the sample, the larger the range tends to be).

Interquartile Range
• The interquartile range is Q3 − Q1.
• 50% of the observations in the distribution lie within the interquartile range.
• The following figure shows the relationship between the quartiles, the median and the interquartile range.

Interquartile Range: Quartiles
With the observations sorted in ascending order:
$$Q_1 = \left(\tfrac{n+1}{4}\right)^{\text{th}} \text{ value}, \qquad Q_2 = \left(\tfrac{2(n+1)}{4}\right)^{\text{th}} = \left(\tfrac{n+1}{2}\right)^{\text{th}} \text{ value (the median)}, \qquad Q_3 = \left(\tfrac{3(n+1)}{4}\right)^{\text{th}} \text{ value}$$
$$\text{IQR} = Q_3 - Q_1$$

Pros and Cons of the Interquartile Range
• Pros
  - Fairly easy to compute.
  - Scores exist in the data set.
  - Eliminates the influence of extreme scores.
• Cons
  - Discards much of the data.

Percentiles and Quartiles
• The maximum is the 100th percentile: 100% of values lie at or below the maximum.
• The median is the 50th percentile: 50% of values lie at or below the median.
• Any percentile can be calculated.
• But the most common are the 25th percentile (1st quartile) and the 75th percentile (3rd quartile).

Locating Percentiles in a Frequency Distribution
• A percentile is a score below which a specific percentage of the distribution falls.
• The 75th percentile is a score below which 75% of the cases fall.
• The median is the 50th percentile: 50% of the cases fall below it.
• Quartiles are another type of percentile: the lower quartile is the 25th percentile and the upper quartile is the 75th percentile.

Locating Percentiles in a Frequency Distribution
[Figure: cumulative frequency table with the 25th, 50th and 80th percentiles marked at the values where 25%, 50% and 80% of the cases are included]

Five Number Summary
• Minimum value
• 1st quartile
• Median
• 3rd quartile
• Maximum value

VARIANCE: take the deviation of each observation from the mean, square these deviations, and average the squares.
STANDARD DEVIATION: the "root-mean-square deviation".

Variance
• The average amount that a score deviates from the typical score.
• Score − Mean = Difference score
• The difference scores always average to 0, because positive and negative differences cancel out.
• To obtain a nonzero measure, square the difference scores so that negatives cannot cancel out positives.

Variance: Computational Formula
Population:
$$\sigma^2 = \frac{N\sum X^2 - \left(\sum X\right)^2}{N^2}$$
Sample:
$$S^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n^2}$$
Both formulas divide by the number of observations, as in the worked examples that follow. When a sample is used to estimate the population variance, the usual (unbiased) version divides by n − 1 instead:
$$s^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n(n-1)}$$

Variance
• Use the computational formula to calculate the variance.
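As a quick numerical check of the opening example (three series with the same mean of 70), the range and the computational formula can be applied in a few lines of Python. This is a minimal sketch: the function names `value_range`, `variance_computational` and `variance_definitional` are illustrative, not from the source, and the variance divides by n as in the worked examples.

```python
# Range and variance (computational formula, dividing by n as in the slides)
# for the three series that open this section. All three have mean 70.

SERIES = {
    "Series I": [70, 70, 70, 70, 70, 70, 70, 70, 70, 70],
    "Series II": [66, 67, 68, 69, 70, 70, 71, 72, 73, 74],
    "Series III": [1, 19, 50, 60, 70, 80, 90, 100, 110, 120],
}


def value_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)


def variance_computational(xs):
    """(n * sum(X^2) - (sum X)^2) / n^2, i.e. the divide-by-n variance."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / n ** 2


def variance_definitional(xs):
    """Mean of the squared deviations from the mean (gives the same result)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


for name, xs in SERIES.items():
    print(f"{name}: mean={sum(xs) / len(xs):.1f}, "
          f"range={value_range(xs)}, "
          f"variance={variance_computational(xs):.1f}")
```

Running it prints ranges of 0, 8 and 119 and variances of 0.0, 6.0 and 1336.2. Equal means can hide very different amounts of spread.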
X    X²
3    9
4    16
4    16
4    16
6    36
7    49
7    49
8    64
8    64
9    81
Sum: 60    Sum: 400

$$S^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n^2} = \frac{10(400) - (60)^2}{10^2} = \frac{4000 - 3600}{100} = 4.0$$

Variability Example: Variance (Hotel Rates)

X      X²
472    222784
303    91809
280    78400
282    79524
417    173889
400    160000
254    64516
205    42025
384    147456
264    69696
317    100489
76     5776
643    413449
480    230400
136    18496
250    62500
100    10000
732    535824
317    100489
264    69696
384    147456
750    562500
402    161604
422    178084
373    139129
325    105625
313    97969
749    561001
791    625681
196    38416
891    793881
283    80089
52     2704
186    34596
693    480249
Sum: 13386    Sum: 6686202

$$S^2 = \frac{n\sum X^2 - \left(\sum X\right)^2}{n^2} = \frac{35(6686202) - (13386)^2}{35^2} = \frac{234017070 - 179184996}{1225} = 44760.88$$

Pros and Cons of Variance
• Pros
  - Takes all of the data into account.
  - Lends itself to the computation of other stable measures (and is a prerequisite for many of them).
• Cons
  - Hard to interpret, because it is expressed in squared units.
  - Can be influenced by extreme scores.

Standard Deviation
• To "undo" the squaring of the difference scores, take the square root of the variance.
• This returns the measure to the original units rather than squared units.

Quantifying Uncertainty
• Standard deviation: measures the variation of a variable in the sample.
• Technically,
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Standard Deviation
• A rough measure of the average amount by which observations deviate on either side of the mean.
• The square root of the variance.
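The hotel-rate variance can be reproduced from the raw rates. This is a minimal Python sketch that assumes the 35 rates in the variance table are the full data set. It applies the computational formula exactly as in the example (dividing by n²). It also computes the n − 1 version that matches the "Technically" formula for s, and cross-checks both against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

# Hotel rates from the variance worked example (n = 35).
HOTEL_RATES = [
    472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
    643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
    373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693,
]

n = len(HOTEL_RATES)
sum_x = sum(HOTEL_RATES)                   # sum of X
sum_x2 = sum(x * x for x in HOTEL_RATES)   # sum of X^2

# Computational formula as in the example: divide by n^2.
var_n = (n * sum_x2 - sum_x ** 2) / n ** 2
# Same numerator over n(n - 1): the n - 1 version used to estimate a population variance.
var_n_minus_1 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Cross-check against the standard library.
assert math.isclose(var_n, statistics.pvariance(HOTEL_RATES))
assert math.isclose(var_n_minus_1, statistics.variance(HOTEL_RATES))

print(f"n={n}, sum X={sum_x}, sum X^2={sum_x2}")
print(f"variance (divide by n):     {var_n:.2f}")
print(f"variance (divide by n - 1): {var_n_minus_1:.2f}")
print(f"s (divide by n - 1):        {math.sqrt(var_n_minus_1):.2f}")
```

The first variance reproduces the example's 44760.88. The n − 1 version is slightly larger because it scales the same numerator by n/(n − 1).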
Population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{N\sum X^2 - \left(\sum X\right)^2}{N^2}}$$
Sample:
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{n\sum X^2 - \left(\sum X\right)^2}{n^2}}$$

Variability Example: Standard Deviation
Using the definitional formula (mean = 6):
$$S = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} = \sqrt{\frac{40}{10}} = 2.0$$
Using the computational formula:
$$S = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{\frac{4000 - 3600}{100}} = \sqrt{4.0} = 2.0$$
Mean: 6
Standard deviation: 2

Variability Example: Standard Deviation (Las Vegas Hotel Rates)
[Figure: histogram of Las Vegas hotel rates; x-axis: Rates, in $100 bins from 0-99 to 800-899; y-axis: Frequency]
Mean: 382.46 (= 13386/35)
Standard deviation:
$$S = \sqrt{\frac{35(6686202) - (13386)^2}{35^2}} = \sqrt{\frac{234017070 - 179184996}{1225}} = \sqrt{44760.88} \approx \$211.57$$

Pros and Cons of Standard Deviation
• Pros
  - Lends itself to the computation of other stable measures (and is a prerequisite for many of them).
  - Summarizes the typical deviation around the mean, in the original units.
  - For many distributions, the majority of the data lie within one standard deviation above or below the mean.
• Cons
  - Influenced by extreme scores.

Mean and Standard Deviation
• Using the mean and standard deviation together:
  - Is an efficient way to describe a distribution with just two numbers.
  - Allows a direct comparison between distributions that are on different scales.
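For the hotel rates, the mean-and-SD summary can be compared with the quartile-based one. This is a minimal Python sketch that assumes the 35 rates from the variance table. It computes the mean and the divide-by-n SD and counts how many rates fall within one SD of the mean. It also builds the five-number summary and the IQR using the (n + 1)/4 quartile positions from the interquartile-range slide. `quartile` is a hypothetical helper. Its linear interpolation for fractional positions is an assumption, since the slide gives only the positions.

```python
import math

# Hotel rates from the worked examples (n = 35), sorted ascending.
HOTEL_RATES = sorted([
    472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
    643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
    373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693,
])


def quartile(sorted_xs, k):
    """Q_k taken at the 1-based position k(n + 1)/4.

    Interpolating linearly between neighbours for fractional positions is an
    assumption; the slide only gives the position.
    """
    n = len(sorted_xs)
    pos = k * (n + 1) / 4
    lower = math.floor(pos)
    if lower < 1:
        return sorted_xs[0]
    if lower >= n:
        return sorted_xs[-1]
    frac = pos - lower
    a, b = sorted_xs[lower - 1], sorted_xs[lower]
    return a + frac * (b - a)


n = len(HOTEL_RATES)
mean = sum(HOTEL_RATES) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in HOTEL_RATES) / n)  # divide by n
within_one_sd = sum(1 for x in HOTEL_RATES if mean - sd <= x <= mean + sd)

five_numbers = (HOTEL_RATES[0], quartile(HOTEL_RATES, 1), quartile(HOTEL_RATES, 2),
                quartile(HOTEL_RATES, 3), HOTEL_RATES[-1])
iqr = five_numbers[3] - five_numbers[1]

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"{within_one_sd} of {n} rates lie within one SD of the mean")
print("five-number summary:", five_numbers)
print("range =", five_numbers[4] - five_numbers[0], " IQR =", iqr)
```

With n = 35, every quartile position is a whole number (the 9th, 18th and 27th values), so no interpolation occurs here. The summary is 52, 254, 317, 472, 891, with an IQR of 218, and 24 of the 35 rates lie within one SD of the mean. The mean (about 382) sits well above the median (317), which suggests right skew. By the "which measure to use" rule, the median and quartiles would be the better summary for these rates.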
Descriptive Statistics
Variable: Age
[Figure: graphical summary of Age; panels: 95% Confidence Interval for Mu, 95% Confidence Interval for Median]

Anderson-Darling Normality Test
  A-Squared: 0.962
  P-Value: 0.014
Mean: 36.4500
StDev: 15.7356
Variance: 247.608
Skewness: 0.679626
Kurtosis: 8.51E-02
N: 60
Minimum: 11.0000
1st Quartile: 25.0000
Median: 31.5000
3rd Quartile: 46.7500
Maximum: 79.0000
95% Confidence Interval for Mu: 32.3851 to 40.5149
95% Confidence Interval for Sigma: 13.3380 to 19.1921
95% Confidence Interval for Median: 28.0000 to 42.0000

WHICH MEASURE TO USE?
• If the distribution of the data is symmetric, use the mean and S.D.
• If the distribution of the data is skewed, use the median and quartiles.

Mean, Median and Mode
Distributions
• Bell-shaped (also known as "symmetric" or "normal")
• Skewed:
  - Positively (skewed to the right): it tails off toward larger values.
  - Negatively (skewed to the left): it tails off toward smaller values.

One hundred samples were selected, each containing 100 normal individuals. The mean systolic BP of each sample is presented: 110, 140, 110, 120, 130, 160, 100, 130, 90, 120, 100, 120, etc.

Systolic BP level (mmHg)   90   100   110   120   130   140   150   160
No. of samples              5    10    20    34    20    10     5     2
Mean = 120, SD = 10

Normal Distribution
Mean = 120, SD = 10
[Figure: normal curve with mean 120 and SD 10; x-axis from 90 to 150]
• The curve describes the probability of getting any range of values, e.g., P(X > 120), P(X < 100), P(110 < X < 130).
• Area under the curve = probability.
• Area under the whole curve = 1.
• The probability of getting any single specific value is 0, e.g., P(X = 120) = 0.

ANY QUESTIONS?
THANK YOU
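The three probabilities named on the normal-distribution slide can be read off a normal curve with mean 120 and SD 10. This is a minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+). It assumes the sample means of systolic BP follow that normal curve exactly.

```python
from statistics import NormalDist

# Normal model for the sample means of systolic BP: mean 120, SD 10.
bp = NormalDist(mu=120, sigma=10)

p_above_120 = 1 - bp.cdf(120)              # P(X > 120): area to the right of 120
p_below_100 = bp.cdf(100)                  # P(X < 100): area to the left of 100
p_110_to_130 = bp.cdf(130) - bp.cdf(110)   # P(110 < X < 130): area between the two

print(f"P(X > 120)       = {p_above_120:.4f}")
print(f"P(X < 100)       = {p_below_100:.4f}")
print(f"P(110 < X < 130) = {p_110_to_130:.4f}")
```

These come out to about 0.5000, 0.0228 and 0.6827. Half the area lies above the mean, about 2.3% lies more than two SDs below it, and about 68% lies within one SD of it. A single exact value such as X = 120 corresponds to a slice of zero width, so its area, and hence P(X = 120), is 0.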