S. Soleymani
Math 52
Percentiles and Outliers

Consider the following data set:

38 41 43 44 44 45 47 48 51 52 57 57 58 62 68 79 85 98 100

The histogram below suggests that the distribution of the data is right-skewed (positively skewed).

[Figure: Histogram. Vertical axis: Frequency; bin labels 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5, More.]

Some descriptive statistics are found using Excel: Median = 52, Mean = 58.8, Standard Deviation = 18.9.

Because the data are right-skewed, the median is less than the mean. This is because a few data points on the right of the distribution pull up the mean. We have already discussed that the median of 52 is a better measure of the center of this data set.

Definition: the kth percentile ($P_k$) of a data set is the data value that separates the lower k% from the upper (100 − k)% of the data.

To find $P_k$ of a data set, you first need to sort the data and then evaluate the locator $L$, which is the location of $P_k$. The locator $L$ can be found by:

$L = \frac{k}{100} \cdot n$

where k is the percent being used and n is the number of data points.

CASE 1: If $L$ isn't a whole number, then round it up. The value of $P_k$ is the $L$th value counting from the lowest data value.

EXAMPLE: Find the 50th percentile. Here k = 50 and n = 19, so $L = \frac{50}{100} \cdot 19 = 9.5$. Round $L$ up to $L = 10$. Now count 10 values from the minimum value. The 10th value is 52, therefore $P_{50} = 52$. Note that the 50th percentile is the same as the median.

CASE 2: If $L$ is a whole number, then $P_k$ is the average of the $L$th value and the next value in the sorted data set.

Sometimes we are interested in finding the quartiles of a data set. These quartiles are:

$Q_1 = P_{25}$ = 25th percentile
$Q_2 = P_{50}$ = Median
$Q_3 = P_{75}$ = 75th percentile

Definition: the Interquartile Range (IQR) is a measure of variation and is computed by taking the difference of $Q_3$ and $Q_1$, that is, $\mathrm{IQR} = Q_3 - Q_1$. The IQR for our data set is $\mathrm{IQR} = 68 - 44 = 24$.

In most cases we are interested in detecting and eliminating outliers in a data set.
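Before turning to outliers, the locator rule and the quartiles above can be checked with a short Python sketch. The function name `percentile` is an assumption made for illustration (the handout itself uses Excel). The sketch assumes k is a whole number strictly between 0 and 100, and it follows the handout's two cases rather than any library's interpolation method.

```python
def percentile(data, k):
    """Return P_k using the handout's locator rule L = (k / 100) * n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    # divmod keeps the arithmetic exact for whole-number k (no float rounding).
    whole, remainder = divmod(k * n, 100)
    if remainder != 0:
        # CASE 1: L is not a whole number, so round it up.
        L = whole + 1
        return values[L - 1]  # Python lists are 0-indexed
    # CASE 2: L is a whole number; average the Lth value and the next one.
    L = whole
    return (values[L - 1] + values[L]) / 2


data = [38, 41, 43, 44, 44, 45, 47, 48, 51, 52,
        57, 57, 58, 62, 68, 79, 85, 98, 100]

q1 = percentile(data, 25)
q2 = percentile(data, 50)
q3 = percentile(data, 75)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 44 52 68 24
```

Note that spreadsheet and library percentile functions often interpolate between neighboring values, so their quartiles can differ slightly from the ones this rule produces.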
Outliers are extreme, unusual values that influence the mean and standard deviation of a data set. The borders that separate the outliers from the rest of the data are called fences, and they are computed as follows:

upper fence = $Q_3 + 1.5 \times \mathrm{IQR}$
lower fence = $Q_1 - 1.5 \times \mathrm{IQR}$

Any value above the upper fence or below the lower fence is an outlier. Find the outliers for the data set given in this handout.

SOME PROPERTIES OF MEASURES OF CENTER AND VARIATION

If every data point is increased or decreased by a nonzero constant, then the measure of center increases or decreases by the same constant. The measure of variation remains the same.

If the data are rescaled by multiplying each data point by a positive constant, then the measures of center and the measures of variation discussed here (such as the standard deviation and the IQR) rescale by the same constant.
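The fence rule and the two properties above can be checked numerically. The following is a minimal Python sketch, assuming the quartiles $Q_1 = 44$ and $Q_3 = 68$ found earlier in the handout. The variable names, the shift of 10, and the scale factor of 2 are arbitrary choices made for illustration.

```python
import math
import statistics

data = [38, 41, 43, 44, 44, 45, 47, 48, 51, 52,
        57, 57, 58, 62, 68, 79, 85, 98, 100]

# Quartiles as found in the handout (Q1 = P25, Q3 = P75).
q1, q3 = 44, 68
iqr = q3 - q1

# Fences: values outside [lower_fence, upper_fence] count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("lower fence:", lower_fence)
print("upper fence:", upper_fence)
print("outliers:", outliers)

# Property 1: adding a constant shifts the center, not the variation.
shifted = [x + 10 for x in data]
print("median shift:", statistics.median(shifted) - statistics.median(data))
print("same stdev after shift:",
      math.isclose(statistics.stdev(shifted), statistics.stdev(data)))

# Property 2: multiplying by a positive constant rescales center and variation.
scaled = [2 * x for x in data]
print("median ratio:", statistics.median(scaled) / statistics.median(data))
print("stdev ratio:", statistics.stdev(scaled) / statistics.stdev(data))
```

Running it on the handout's data is one way to check your answer to the outlier exercise. `statistics.stdev` computes the sample standard deviation, which is consistent with the value of 18.9 reported above.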