Lecture Note, June 25, 2014
Chih-Hsin Hsueh
I. Measures of relative standing
1. Percentile: The pth percentile is a number such that p% of the total observations fall
below it.
2. Quartile: Quartiles are special percentiles.
(a) first (lower) quartile: 25th percentile.
(b) second (median) quartile: 50th percentile.
(c) third (upper) quartile: 75th percentile.
Calculating quartiles: Given n measurements ordered from smallest to largest, the lower and upper quartiles (Q1 and Q3) can be calculated as follows:
• The position of Q1 is 0.25(n + 1)
• The position of Q3 is 0.75(n + 1)
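• If a position is not an integer, find the quartile by taking a weighted average of the two closest measurements. For example, position 4.25 gives x(4) + 0.25[x(5) − x(4)], where x(i) denotes the ith smallest measurement.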
Example: Years of service of professors in a statistics department: 8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42
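The position rule above can be checked on the professor data with a short Python sketch. The helper name `position_quantile` is hypothetical (not from these notes); it uses 1-based positions as in the bullets, and clamping to the minimum or maximum when a position falls outside 1..n is an added assumption the notes do not address.

```python
import math


def position_quantile(data, p):
    """Hypothetical helper: value at position p * (n + 1) of the sorted data.

    Positions are 1-based, as in the notes. A non-integer position is
    resolved by a weighted average of the two closest ordered values.
    Positions outside 1..n are clamped to the min/max (an assumption).
    """
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1)
    k = math.floor(pos)   # 1-based position of the lower neighbour
    frac = pos - k        # weight given to the upper neighbour
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


years = [8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42]
q1 = position_quantile(years, 0.25)  # position 0.25 * 17 = 4.25
q3 = position_quantile(years, 0.75)  # position 0.75 * 17 = 12.75
print(q1, q3)
```

For these 16 measurements Q1 sits at position 4.25, between 17 and 19, giving 17 + 0.25(19 − 17) = 17.5; Q3 sits at position 12.75, between 31 and 32, giving 31 + 0.75(32 − 31) = 31.75.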
3. z-score: The z-score measures how many standard deviations a measurement lies from the mean.
z-score for datum $x$: $z = \dfrac{x - \bar{x}}{s}$
(a) Compute the z-score corresponding to the datum x = 90, if x̄ = 89 and s = 2.
(b) Suppose that 40 and 90 are two elements of a population data set and that their z-scores are −2 and 3, respectively. Determine the population mean and standard deviation.
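Both exercises can be checked numerically. Part (b) rearranges the z-score definition to x = μ + zσ for each of the two values; subtracting the two equations eliminates μ and gives σ = (x₂ − x₁)/(z₂ − z₁). The function names below are hypothetical, chosen only for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def mean_sd_from_z(x1, z1, x2, z2):
    """Hypothetical helper: recover (mean, sd) from two values and their z-scores.

    Each value satisfies x = mean + z * sd, so
    x2 - x1 = (z2 - z1) * sd gives sd, then mean = x1 - z1 * sd.
    """
    sd = (x2 - x1) / (z2 - z1)
    mean = x1 - z1 * sd
    return mean, sd


# (a) x = 90, mean 89, standard deviation 2
print(z_score(90, 89, 2))

# (b) 40 has z = -2 and 90 has z = 3
print(mean_sd_from_z(40, -2, 90, 3))
```

Part (a) gives z = (90 − 89)/2 = 0.5. In part (b), σ = (90 − 40)/(3 − (−2)) = 50/5 = 10 and μ = 40 + 2 × 10 = 60.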
II. Interquartile range (IQR)
The range of the middle 50% of the measurements is called the interquartile range.
IQR = Q3 − Q1
Note:
• The IQR is another measure of the spread of the data.
• The five-number summary of the data is: min, Q1, median, Q3, max.
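As a sketch, the five-number summary and IQR of the professor data can be computed with Python's standard `statistics` module. This assumes `statistics.quantiles(..., n=4, method="exclusive")`, which places cut points at the same p(n + 1) positions with linear interpolation, so for data sets like this one (where every position falls between 1 and n) it agrees with the hand calculation from Section I. The helper name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(data):
    """Hypothetical helper: return (min, Q1, median, Q3, max) and the IQR.

    statistics.quantiles(..., method="exclusive") returns the three
    quartile cut points using the p * (n + 1) position rule.
    """
    q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
    summary = (min(data), q1, med, q3, max(data))
    return summary, q3 - q1


years = [8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42]
summary, iqr = five_number_summary(years)
print(summary, iqr)
```

This gives the summary (8, 17.5, 26.5, 31.75, 42) and IQR = 31.75 − 17.5 = 14.25.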
III. Outlier
An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers are typically attributable to one of the following causes:
• The measurement is observed, recorded, or entered into the computer incorrectly.
• The measurement comes from a different population.
• The measurement is correct, but represents a rare event.
Note: To detect outliers, (1) use fences (explained in Section IV), or (2) use z-scores if the data are normally distributed (covered in the probability section).
IV. Boxplot
A boxplot is a graph of the five-number summary, with outliers plotted individually. A boxplot gives a quick idea of the center, variability, and shape of the distribution, and shows whether there are any outliers in the data set.
Steps for constructing a boxplot
1. Compute quartiles and IQR
2. Draw a horizontal line to represent the scale of measurements
3. Represent the three quartiles as short vertical lines along the horizontal scale
4. Join the ends of the Q1 and Q3 lines horizontally to create a box.
5. Build fences
• lower fence: Q1 − 1.5 × IQR
• upper fence: Q3 + 1.5 × IQR
6. Measurements outside the fences are considered outliers and are marked with an asterisk (*).
7. Draw "whiskers" from the ends of the box to the smallest and largest measurements that are not outliers.
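Steps 1, 5, 6 and 7 reduce to a few numbers that can be computed before anything is drawn. The sketch below assumes Python's `statistics.quantiles` with `method="exclusive"` (the same p(n + 1) positions as Section I); `boxplot_stats` is a hypothetical helper, and treating a value lying exactly on a fence as not an outlier is an assumption ("outside the fences" read strictly).

```python
import statistics


def boxplot_stats(data, k=1.5):
    """Hypothetical helper: the numbers needed to draw a boxplot.

    k = 1.5 is the fence multiplier from step 5.
    """
    # Step 1: quartiles and IQR
    q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    # Step 5: fences
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Step 6: values strictly outside the fences are outliers (assumption)
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    # Step 7: whiskers reach the most extreme values that are not outliers
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
        "whiskers": (min(inside), max(inside)),
    }


years = [8, 13, 15, 17, 19, 20, 20, 26, 27, 28, 30, 31, 32, 37, 39, 42]
stats = boxplot_stats(years)
print(stats)
```

For the professor data, IQR = 14.25 and 1.5 × IQR = 21.375, so the fences are 17.5 − 21.375 = −3.875 and 31.75 + 21.375 = 53.125. No measurement falls outside them, so the whiskers run to the minimum 8 and the maximum 42.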
Data Example: A dietician obtains the amount of sugar (in centigrams) per 100 centigrams (1 gram) of each of 10 different cereals, including Cheerios, Corn Flakes, Fruit Loops, and 7 others. The values are listed below.
3, 24, 30, 47, 43, 7, 47, 37, 44, 39
• mean, standard deviation
• five number summary
• outliers?
• boxplot
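A sketch that works through the four tasks for the cereal data. It assumes the sample standard deviation (divisor n − 1, via `statistics.stdev`), quartiles by the p(n + 1) position rule via `statistics.quantiles(..., method="exclusive")`, and fences at 1.5 × IQR as in Section IV. The boxplot itself is not drawn; the printed numbers are what the construction steps need.

```python
import statistics

# Sugar in centigrams per 100 centigrams of cereal, from the data example
sugar = [3, 24, 30, 47, 43, 7, 47, 37, 44, 39]

mean = statistics.mean(sugar)
sd = statistics.stdev(sugar)  # sample standard deviation (divides by n - 1)

q1, med, q3 = statistics.quantiles(sugar, n=4, method="exclusive")
five_num = (min(sugar), q1, med, q3, max(sugar))

iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in sugar if x < lower_fence or x > upper_fence]

print(f"mean = {mean:.2f}, s = {sd:.2f}")
print("five-number summary:", five_num)
print("fences:", (lower_fence, upper_fence), "outliers:", outliers)
```

These data give x̄ = 32.1, s ≈ 16.07, five-number summary (3, 19.75, 38, 44.75, 47), IQR = 25, and fences −17.75 and 82.25, so none of the 10 cereals is flagged as an outlier. The box runs from 19.75 to 44.75 with the median line at 38, and the whiskers extend to 3 and 47.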