Lecture Note, June 25, 2014
Chih-Hsin Hsueh

I. Measures of relative standing

1. Percentile: The pth percentile is a number such that p% of the total observations fall below it.

2. Quartile: Quartiles are special percentiles.
(a) first (lower) quartile: 25th percentile.
(b) second quartile (median): 50th percentile.
(c) third (upper) quartile: 75th percentile.

Calculating quartiles: Given n measurements, the lower and upper quartiles (Q1 and Q3) can be calculated as follows, after ordering the data from smallest to largest:
• The position of Q1 is 0.25(n + 1).
• The position of Q3 is 0.75(n + 1).
• If the positions are not integers, find the quartiles by taking a weighted average of the two closest measurements: for a position with integer part k and fractional part f, take the kth measurement plus f times the gap to the (k + 1)th measurement.

Example: Years of service of professors in a statistics department:
8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42

3. z-score: It measures how many standard deviations a measurement lies from the mean. The z-score for a datum x is

$$z = \frac{x - \bar{x}}{s}$$

(a) Compute the z-score corresponding to the datum x = 90, if x̄ = 89 and s = 2.
(b) Suppose that 40 and 90 are two elements of a population data set and that their z-scores are −2 and 3, respectively. Determine the mean and standard deviation.

II. Interquartile range (IQR)

The range of the middle 50% of the measurements is called the interquartile range.

IQR = Q3 − Q1

Note:
• The IQR is another measure of the spread of the data.
• The five-number summary of the data: min, Q1, median, Q3, max

III. Outlier

An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers are typically attributable to one of the following causes:
• The measurement is observed, recorded, or entered into the computer incorrectly.
• The measurement comes from a different population.
• The measurement is correct, but represents a rare event.

Note: To detect outliers:
(1) use fences (explained later)
(2) use the z-score if the data are normally distributed (covered in the probability section)

IV. Boxplot

A boxplot is a graph of the five-number summary, with outliers plotted individually. From a boxplot we can get a quick idea of the center, variability, and shape of the distribution, and see whether there are any outliers in the data set.

Steps for constructing a boxplot
1. Compute the quartiles and the IQR.
2. Draw a horizontal line to represent the scale of measurements.
3. Represent the three quartiles as short vertical lines along the horizontal scale.
4. Join the end points horizontally to create a box.
5. Build fences:
• lower fence: Q1 − 1.5 × IQR
• upper fence: Q3 + 1.5 × IQR
6. Measurements outside the fences are considered outliers and are marked with (*).
7. Draw "whiskers" from the ends of the box to the minimum and maximum measurements that are not outliers.

Data Example

A dietician measures the amount of sugar (in centigrams) per 100 centigrams (1 gram) of each of 10 different cereals, including Cheerios, Corn Flakes, Fruit Loops, and 7 others. The values are listed below.

3, 24, 30, 47, 43, 7, 47, 37, 44, 39

• mean, standard deviation
• five-number summary
• outliers?
• boxplot
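
The quartile position rule from Section I, the five-number summary from Section II, and the fences from Section IV can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The function names (`position_rule_quantile`, `five_number_summary`, `fences`, `find_outliers`) are hypothetical, and the code assumes that s denotes the sample standard deviation (dividing by n − 1). Other software may use a different quartile convention and give slightly different values.

```python
import math
import statistics


def position_rule_quantile(data, fraction):
    """Return the value at position fraction * (n + 1) of the sorted data.

    Positions are 1-based. A non-integer position is resolved by a weighted
    average (linear interpolation) of the two closest measurements.
    """
    values = sorted(data)
    n = len(values)
    # Assumption: clamp the position to [1, n] so tiny samples stay in range.
    position = min(max(fraction * (n + 1), 1), n)
    k = math.floor(position)   # 1-based index of the lower neighbour
    weight = position - k      # fractional part of the position
    if k == n:                 # position is exactly the last measurement
        return float(values[-1])
    return values[k - 1] + weight * (values[k] - values[k - 1])


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the position rule."""
    return (
        min(data),
        position_rule_quantile(data, 0.25),
        position_rule_quantile(data, 0.50),
        position_rule_quantile(data, 0.75),
        max(data),
    )


def fences(data):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1 = position_rule_quantile(data, 0.25)
    q3 = position_rule_quantile(data, 0.75)
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


def find_outliers(data):
    """Return the measurements that fall outside the fences."""
    lower, upper = fences(data)
    return [x for x in data if x < lower or x > upper]


# Years of service of professors (data from Section I).
professor_years = [8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42]
print(position_rule_quantile(professor_years, 0.25))  # 17.5  (position 4.25)
print(position_rule_quantile(professor_years, 0.75))  # 31.75 (position 12.75)

# Sugar content of 10 cereals (data from the Data Example).
cereal_sugar = [3, 24, 30, 47, 43, 7, 47, 37, 44, 39]
print(statistics.mean(cereal_sugar))    # sample mean
print(statistics.stdev(cereal_sugar))   # sample standard deviation (n - 1)
print(five_number_summary(cereal_sugar))  # (3, 19.75, 38.0, 44.75, 47)
print(fences(cereal_sugar))               # (-17.75, 82.25)
print(find_outliers(cereal_sugar))        # []
```

Under this rule the cereal data have no measurements outside the fences, so the whiskers of the boxplot would extend to the minimum (3) and the maximum (47). The low values pull the mean (32.1) below the median (38), which suggests a distribution skewed to the left.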