DESCRIBING DISTRIBUTIONS NUMERICALLY

MEASURES OF CENTER:
• MIDRANGE = (MAX + MIN) / 2
• MEDIAN IS THE MIDDLE VALUE, WITH HALF OF THE DATA ABOVE AND HALF BELOW IT.
• MEAN = (SUM OF DATA) / (NUMBER OF OBSERVATIONS n)

EXAMPLE:
DATA: 45, 46, 49, 35, 76, 80, 89, 94, 37, 61, 62, 64, 68, 56, 57, 57, 59, 71, 72.
SORTED DATA: 35, 37, 45, 46, 49, 56, 57, 57, 59, 61, 62, 64, 68, 71, 72, 76, 80, 89, 94.
MIDRANGE = (94 + 35) / 2 = 64.5
MEDIAN = 61
MEAN = (35 + 37 + … + 94) / 19 = 1178 / 19 = 62
NOTE: FOR SKEWED DISTRIBUTIONS THE MEDIAN IS A BETTER MEASURE OF THE CENTER THAN THE MEAN.

MEASURES OF THE SPREAD
• RANGE = MAX – MIN
• INTERQUARTILE RANGE (IQR) = Q3 – Q1
  Q3 = UPPER QUARTILE = MEDIAN OF UPPER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
  Q1 = LOWER QUARTILE = MEDIAN OF LOWER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
• VARIANCE (later)
• STANDARD DEVIATION (later)

Quartiles
EXAMPLE: (odd number of observations, 19)
Median = 61
UPPER HALF: 35 37 45 46 49 56 57 57 59 [61 62 64 68 71 72 76 80 89 94]
Q3 = (71 + 72) / 2 = 71.5
LOWER HALF: [35 37 45 46 49 56 57 57 59 61] 62 64 68 71 72 76 80 89 94
Q1 = (49 + 56) / 2 = 52.5
IQR = 71.5 – 52.5 = 19
Note: Include the median in the calculation of both quartiles.

Quartiles
EXAMPLE: (even number of observations, 18; the value 94 is dropped)
35 37 45 46 49 56 57 57 59 [60] 61 62 64 68 71 72 76 80 89
Median = 60 = (59 + 61) / 2 (average of the middle two numbers)
UPPER HALF: 35 37 45 46 49 56 57 57 59 [60] [61 62 64 68 71 72 76 80 89]
Q3 = 71
LOWER HALF: [35 37 45 46 49 56 57 57 59] [60] 61 62 64 68 71 72 76 80 89
Q1 = 49
IQR = 71 – 49 = 22

5-NUMBER SUMMARY:
• THE 5-NUMBER SUMMARY OF A DISTRIBUTION REPORTS ITS MEDIAN, QUARTILES, AND EXTREMES (MINIMUM AND MAXIMUM)
FOR THE 19-VALUE EXAMPLE:
• MAX = 94
• Q3 = 71.5
• MEDIAN = 61
• Q1 = 52.5
• MIN = 35

OUTLIERS: DATA VALUES WHICH ARE BEYOND THE FENCES
IQR = Q3 – Q1 = 19
UPPER FENCE = Q3 + 1.5 × IQR = 71.5 + 1.5 × 19 = 100
LOWER FENCE = Q1 – 1.5 × IQR = 52.5 – 1.5 × 19 = 24
IN THE EXAMPLE CONSIDERED ABOVE, ALL VALUES LIE BETWEEN 24 AND 100, SO THERE ARE NO OUTLIERS.
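The quartile rule used above (take the median of each half, and include the overall median in both halves when n is odd) and the 1.5 × IQR fences can be sketched in Python. This is a minimal illustration, not a standard library routine: the function names `five_number_summary` and `fences` are made up for this example, and other textbooks and software packages define quartiles slightly differently, so their results may not match.

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the quartile rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Odd n: the median (index `half`) belongs to both halves.
    # Even n: the data split cleanly into two halves of n/2 values.
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def fences(q1, q3):
    """Return (lower fence, upper fence) at 1.5 IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

print((max(data) + min(data)) / 2)   # midrange: 64.5
print(mean(data))                    # 62
print(five_number_summary(data))     # (35, 52.5, 61, 71.5, 94)

lo, q1, med, q3, hi = five_number_summary(data)
low_fence, high_fence = fences(q1, q3)
print(low_fence, high_fence)         # 24.0 100.0

# Outliers are the values beyond the fences; here the list is empty.
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)                      # []

# Even case: the same data without 94 (18 observations).
even_data = sorted(data)[:-1]
_, q1_even, med_even, q3_even, _ = five_number_summary(even_data)
print(q1_even, med_even, q3_even, q3_even - q1_even)   # 49 60.0 71 22
```

Sorting once and slicing keeps the odd and even cases in a single function; only the lower half needs the extra element when n is odd, because the upper half already starts at the median's position.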
BOXPLOTS
WHENEVER WE HAVE A 5-NUMBER SUMMARY OF A (QUANTITATIVE) VARIABLE, WE CAN DISPLAY THE INFORMATION IN A BOXPLOT.
• THE CENTER OF A BOXPLOT IS A BOX THAT SHOWS THE MIDDLE HALF OF THE DATA, BETWEEN THE QUARTILES.
• THE HEIGHT OF THE BOX IS EQUAL TO THE IQR.
• IF THE MEDIAN IS ROUGHLY CENTERED BETWEEN THE QUARTILES, THEN THE MIDDLE HALF OF THE DATA IS ROUGHLY SYMMETRIC. IF IT IS NOT CENTERED, THE DISTRIBUTION IS SKEWED.
• THE MAIN USE FOR BOXPLOTS IS TO COMPARE GROUPS.

BOXPLOTS
[Figure: Boxplot of C1; vertical axis C1, scale from 30 to 100]

Examples:
1. Here are the costs of 10 electric smoothtop ranges rated very good or excellent by Consumer Reports in August 2002.
850 1000 900 750 1400 1250 1200 1050 1050 565
Find the following statistics by hand:
a) mean
b) median and quartiles
c) range and IQR

VARIANCE = “AVERAGE” SQUARED DEVIATION FROM THE MEAN
• DEVIATION = (each data value) – mean
FOR THE 19-VALUE EXAMPLE (MEAN = 62), THE SQUARED DEVIATIONS SUM TO 4658:
• VARIANCE = 4658 / (19 – 1) = 258.8
• STANDARD DEVIATION = SQUARE ROOT (VARIANCE) = 16.1

Examples: solution to 1
• Step 1: Sort Data: 565 750 850 900 1000 1050 1050 1200 1250 1400
Mean = 1001.5
Median = 1025
Q1 = 850
Q3 = 1200
Range = 835
IQR = 350

Computing the Variance
• DEVIATION = (each data value) – mean
• Squared Deviation = ((each data value) – mean)^2
• Sum all squared deviations
• Variance = (sum of all squared deviations) / (n – 1), where n is the number of observations

Variance Example:
Data   Squared Deviations
35     54.76
37     29.16
45     6.76
46     12.96
49     43.56
Mean = 42.4
• Variance = 147.2 / 4 = 36.8
• Std Deviation = square root of variance
• Std dev = 6.07

Some Remarks
• If the shape is skewed, report the median and IQR.
• The mean and median will be very different.
• You may want to include the mean and std deviation, but you should point out why the mean and the median differ.
• If the histogram is symmetric, report the mean and the std deviation and possibly the median and IQR.
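The variance recipe from the slides (deviation from the mean, square it, sum, divide by n – 1) can be sketched in Python as follows. The function names `sample_variance` and `sample_std_dev` are illustrative choices for this example; Python's standard `statistics.variance` and `statistics.stdev` use the same n – 1 divisor and could be used instead.

```python
from math import sqrt
from statistics import mean


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(data)
    squared_deviations = [(x - m) ** 2 for x in data]
    return sum(squared_deviations) / (len(data) - 1)


def sample_std_dev(data):
    """Standard deviation is the square root of the variance."""
    return sqrt(sample_variance(data))


# Five-value example: mean 42.4, squared deviations sum to 147.2.
small = [35, 37, 45, 46, 49]
print(round(sample_variance(small), 1))   # 36.8
print(round(sample_std_dev(small), 2))    # 6.07

# Nineteen-value example: mean 62.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]
print(round(sample_variance(data), 1))    # 258.8
print(round(sample_std_dev(data), 1))     # 16.1
```

Dividing by n – 1 rather than n is what makes this the sample variance described in the slides; dividing by n would give the smaller population variance.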