MEASURES OF SPREAD: VARIABILITY, DIVERSITY, VARIATION, DISPERSION

Variability in nature, life, and the various processes we investigate is fundamental to the theory of statistics.

[Figure: histograms of two data sets, A (x-axis ticks 8 to 12) and B (x-axis ticks -20 to 40).]

MEASURES OF VARIABILITY
Range, Inter-Quartile Range (IQR), Standard Deviation

RANGE
The range is the difference between the largest and the smallest observations.
Range = maximum value - minimum value
The more variability or spread there is in the data, the larger the difference between the min and max, and the larger the range.

Example. For the two data sets summarized in the histograms:
A: min = 6.96, max = 13, range = 13 - 6.96 = 6.04
B: min = -22.74, max = 40.93, range = 40.93 - (-22.74) = 63.67

INTER-QUARTILE RANGE (IQR)
Percentiles divide the data into 100ths. E.g. a GRE score in the 85th percentile means that 85 percent of the students taking the test scored lower and 15 percent scored higher.
QUARTILES are special percentiles: the 25th, 50th and 75th percentiles.
50th percentile = median.
25th percentile = the first quartile, Q1 = median of the lower half of the data.
75th percentile = the third quartile, Q3 = median of the upper half of the data.

Example. Data are the number of km to school for a sample of 18 kids.
2 5 3 4 7 7 8 5 4 3 7 8 9 11 2 3 3 1
Sorted data: 1 2 2 3 3 3 3 4 4 5 5 7 7 7 8 8 9 11
Median = (4 + 5)/2 = 4.5
Data below the median: 1 2 2 3 3 3 3 4 4; median of this data: 3 = Q1.
Data above the median: 5 5 7 7 7 8 8 9 11; median of this data: 7 = Q3.
Inter-Quartile Range, IQR = Q3 - Q1 = 7 - 3 = 4.

FIVE POINT SUMMARY: (MIN, Q1, MEDIAN, Q3, MAX)
Example. Five point summary of the distance to school data.
Min.   Q1   Median   Q3   Max.
1      3    4.5      7    11

BOXPLOTS: graphical display of the five point summary
Box: between Q1 and Q3.
Line inside the box: median.
Whiskers drawn to max/min.
Example. Boxplot of the distance to school data.
[Figure: boxplot of the distance to school data, axis ticks 2 to 10.]
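The quartile and five point summary calculations above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name five_point_summary is hypothetical, and for an odd number of observations it assumes the median is left out of both halves (the slides only show an even-sized sample, so that case is a guess).

```python
from statistics import median

def five_point_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                      # observations below the median
    # Assumption: for odd n the middle observation is excluded from both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Distance to school (km) for the sample of 18 kids from the example.
km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]

minimum, q1, med, q3, maximum = five_point_summary(km)
iqr = q3 - q1                  # inter-quartile range
data_range = maximum - minimum # range = max - min

print(five_point_summary(km))  # (1, 3, 4.5, 7, 11)
print("IQR =", iqr, "range =", data_range)
```

Library routines for quartiles often use interpolation rules that differ from the median-of-halves rule, so their Q1 and Q3 may not match the values in the example exactly.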
BOXPLOT WITH OUTLIERS
Whiskers drawn to:
max/min, if there are no outliers;
1.5 x IQR above/below Q3/Q1, if high/low outliers are present.
Outlier(s) are indicated as an open circle or a line above/below the whiskers.

Example. Boxplot of the distance to school data with an outlier.
2 5 3 4 7 7 8 5 4 3 7 8 9 11 2 3 3 30
[Figure: boxplot of the distance to school data with an outlier, axis ticks 5 to 30.]

STANDARD DEVIATION
The standard deviation measures the average deviation of the data from the mean.
x_1 - \bar{x} = deviation of the first observation from the sample mean
x_i - \bar{x} = deviation of the ith observation from the sample mean
Could try
\[ \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]
as the average deviation from the mean.
Problem: difficult to deal with mathematically because of the absolute value.
Solution: use squared deviations.

SAMPLE VARIANCE
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} \]

SAMPLE VARIANCE AND STANDARD DEVIATION
Notes on sample variance:
Sample variance is always nonnegative, s^2 \ge 0.
Sample variance = 0 only if all deviations from the mean are zero, that is, all observations are the same.

SAMPLE STANDARD DEVIATION
s = positive square root of the sample variance, s = \sqrt{s^2}.

Example. Standard deviations of the distance to school data with and without an outlier.
Original data set: variance s^2 = 7.87, standard deviation s = 2.81.
Data set with an outlier: variance s^2 = 40.57, standard deviation s = 6.37.

UNITS: All measures of center are in the same units as the observations. Variance is in the squared units of the observations. Standard deviation is in the same units as the observations.

MEASURES OF CENTER AND SYMMETRY OF THE DATA
[Figure: three histograms with their medians and means.
Skewed to the right (x-axis 0 to 10): Median ? Mean ?
Symmetric histogram (x-axis 6 to 14): Median 9.9992, Mean 9.9924
Skewed to the left (x-axis 0 to 1000): Median 977.107, Mean 949.489]

MEASURES OF CENTER AND SYMMETRY OF THE DATA - SUMMARY
HISTOGRAM SKEWED to the RIGHT: MEAN > MEDIAN
HISTOGRAM SYMMETRIC: MEAN = MEDIAN (approximately, for sample data)
HISTOGRAM SKEWED to the LEFT: MEAN < MEDIAN
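As a numerical check on the sample variance formulas and on the mean versus median rule above, here is a short Python sketch using the standard statistics module. It is an illustration under assumptions, not the slides' own method: the helper shortcut_variance is a hypothetical name, and the outlier data set is assumed to be the original distance data with the observation 1 replaced by 30, which reproduces the quoted variance of 40.57.

```python
from statistics import mean, median, stdev, variance

def shortcut_variance(xs):
    """Sample variance via the computational formula
    (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (n * total_sq - total ** 2) / (n * (n - 1))

# Distance to school (km), original sample of 18 kids.
km = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 1]
# Assumption: same data with the value 1 replaced by the outlier 30.
km_outlier = [2, 5, 3, 4, 7, 7, 8, 5, 4, 3, 7, 8, 9, 11, 2, 3, 3, 30]

for name, xs in [("original", km), ("with outlier", km_outlier)]:
    s2 = variance(xs)   # definition-based sample variance (divides by n - 1)
    s = stdev(xs)       # positive square root of the sample variance
    print(f"{name}: variance = {s2:.2f}, sd = {s:.2f}, "
          f"shortcut variance = {shortcut_variance(xs):.2f}, "
          f"mean = {mean(xs):.2f}, median = {median(xs)}")
```

Under these assumptions the two variance formulas agree, and in both data sets the mean exceeds the median, which is what the summary suggests for data with a longer right tail. The single outlier moves the mean and the standard deviation much more than it moves the median.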