Percentiles
The pth percentile of a data set is the data value such that p percent
of the data is less than or equal to it.
If you scored at the 90th percentile on the SAT, 90% of people scored at or
below your score.
The median is the 50th percentile.
The First Quartile or Q1 is the 25th percentile.
The Third Quartile or Q3 is the 75th percentile.
You guessed it, Q2=Median!
The difference of Q3 and Q1 is called the Interquartile Range and gives
the range of the middle 50% of the data. It is a good measure of spread for
skewed data.
IQR = Q3 − Q1
• Actually, you need to say a bit more to specify a percentile, because of tied values and
whether the number of data points divides evenly. Conventions are inconsistent here, and
we will not worry about those details.
• People also talk about deciles (10th, 20th, etc.). I have seen books define octiles, and even
once nanile (ninths!!!) though I don’t believe I have ever seen these used in real life.
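As a minimal sketch of the definition above, the following Python function uses the nearest-rank convention: sort the data and take the smallest value that has at least p percent of the data at or below it. This is only one of the several conventions in use. The function name `percentile` and the list of grades are made up for illustration.

```python
import math

def percentile(data, p):
    """Nearest-rank percentile: smallest value with at least p% of the data <= it."""
    ordered = sorted(data)
    # Rank counted from 1; forced to be at least 1 so that p = 0 still works.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical test grades, invented for this example.
grades = [88, 37, 93, 70, 82, 55, 99, 74, 85]

q1 = percentile(grades, 25)      # first quartile
median = percentile(grades, 50)  # second quartile
q3 = percentile(grades, 75)      # third quartile
iqr = q3 - q1                    # interquartile range

print(q1, median, q3, iqr)  # 70 82 88 18
```

A different convention (for example, one that interpolates between neighboring values) can give slightly different quartiles for the same data.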
The Five Number Summary
The five numbers Min, Q1, Median, Q3, Max, divide the data into quarters and are a convenient summary of the data set.
Ex: For grades on a test:
Min   Q1   Med   Q3   Max
37    70   82    88   99
• 25% of grades are between 37 and 70
• 25% of grades are between 70 and 82
• 25% of grades are between 82 and 88
• 25% of grades are between 88 and 99
We will represent the 5 number summary with a beautiful graphical representation, the Boxplot.
• The 5 number summary gives a similar amount of information for skewed data as the mean
and standard deviation give for bell shaped data. Why do you need 5 numbers instead of 2?
Because skewed data is more complicated!
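A five number summary can be computed with Python's standard library. The sketch below assumes the `"inclusive"` quartile convention of `statistics.quantiles` (Python 3.8 or later); the helper name `five_number_summary` and the grades are hypothetical, chosen so the result matches the summary above.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" treats the data as the whole population;
    # other quartile conventions exist and may differ slightly.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

# Hypothetical test grades, invented for this example.
grades = [88, 37, 93, 70, 82, 55, 99, 74, 85]

summary = five_number_summary(grades)
print(summary)  # (37, 70.0, 82.0, 88.0, 99)
```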
The Boxplot
Popularized by the great 20th century statistician and proponent of exploratory data analysis John Tukey.
• Draw number line for range
• Draw rectangle from Q1 to Q3
• Draw vertical line at Median
• Draw horizontal lines out to Min and Max
[Figure: boxplot of the test grades on a number line from 30 to 100, with Mn = 37, Q1 = 70, Md = 82, Q3 = 88, Mx = 99.]
• I will usually give you the three quartiles after a test, but not the max and min, because
those are individual students' grades!
• Notice you can see at a glance that the data is skewed left.
• Boxplots are particularly useful for comparing multiple distributions (test scores throughout
the semester perhaps).
• Boxplots are sometimes called box-and-whisker plots because, I guess, the lines to the extremes look like whiskers. I hope my whiskers don’t look like that!
• People will often stop the lines at the last score that they do not consider an outlier, and
mark the outliers as a *.
z-score
Just as percentile tells you where a particular data value fits in a skewed distribution, the z-score tells you where a value fits in a bell-shaped distribution.
The z-score of a data value is the number of standard deviations above (or
below) the mean it is. Specifically
z = (x − µ)/σ.
Ex: Women’s heights are bell shaped with a mean of µ = 65.5 in. and a
standard deviation of σ = 2.5 in. The z-score of a woman whose height is 69
inches would be
z = (69 − 65.5)/2.5 = 3.5/2.5 = 1.4
so she is 1.4 s.d.s above the mean.
Examples of z-score
z = (x − µ)/σ.
Women’s heights are bell-shaped with a mean of 65.5 and an s.d. of 2.5.
Ex: What would be the z-score of a woman who is 60.5 inches tall?
z = (60.5 − 65.5)/2.5 = −5/2.5 = −2
so she is 2 s.d.s below the mean.
Ex: What would be the z-score of a woman who is 72 inches tall?
z = (72 − 65.5)/2.5 = 6.5/2.5 = 2.6
so she is 2.6 s.d.s above the mean.
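The same arithmetic can be checked in a few lines of Python. This is a small sketch; the function name `z_score` and the constant names are chosen here for illustration, while the mean, s.d., and heights are the ones used in the examples above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

MU, SIGMA = 65.5, 2.5  # women's heights in inches, from the examples above

for height in (69, 60.5, 72):
    z = z_score(height, MU, SIGMA)
    side = "above" if z >= 0 else "below"
    print(f"{height} in: z = {z:+.1f}, so {abs(z):.1f} s.d.s {side} the mean")
```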
z-score as Universal Measure of Distance
The z-score gives units in which every bell-shaped distribution looks approximately the same.
No matter what the variable is (if symmetric, unimodal), a z-score . . .
  between −1 and 1 (2/3)                  . . . is typical
  between −2 and 2 (95%)                  . . . is usual
  over 2 or under −2 (2.5% each)          . . . is unusual
  over 3 or under −3 (1 in 1000 each)     . . . is shocking
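The fractions quoted above are rounded rules of thumb. As a rough check, assuming the data follow an exactly normal (bell-shaped) curve, Python's `statistics.NormalDist` gives the corresponding areas; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1, i.e. the z-score scale

within_1 = std_normal.cdf(1) - std_normal.cdf(-1)  # share with z between -1 and 1
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)  # share with z between -2 and 2
above_2 = 1 - std_normal.cdf(2)                    # share with z over 2
above_3 = 1 - std_normal.cdf(3)                    # share with z over 3

print(f"between -1 and 1: {within_1:.3f}")
print(f"between -2 and 2: {within_2:.3f}")
print(f"over 2:           {above_2:.4f}")
print(f"over 3:           {above_3:.4f}")
```

Under that assumption the areas come out to about 0.68, 0.95, 0.023, and 0.0013, so "2/3", "95%", "2.5% each", and "1 in 1000 each" are convenient roundings.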
• If you see a woman whose height z-score is between 2 and 3, you would call her tall, but you
would not be shocked. If your z-score on the test was between 2 and 3, you are doing well,
but you are not blowing your teacher away. If the z-score for how much TV you watch per
week is between 2 and 3, everyone would say you watch a lot of TV, but no one is going to
do an intervention.
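The labels above can be turned into a small lookup. This is only a sketch: the function name `describe_z` is made up, and how to treat a z-score of exactly 1, 2, or 3 is an assumption, since the lecture does not say.

```python
def describe_z(z):
    """Map a z-score to the lecture's informal label."""
    size = abs(z)  # only the distance from the mean matters, not the direction
    if size <= 1:
        return "typical"
    if size <= 2:
        return "usual"
    if size <= 3:
        return "unusual"
    return "shocking"

# z-scores for the three heights worked out earlier, plus one extreme value.
for z in (1.4, -2, 2.6, 3.5):
    print(z, describe_z(z))
```

With these cutoffs the woman who is 72 inches tall (z = 2.6) is "unusual" but not "shocking", which matches the description in the bullet above.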
Lecture 8 Key Points
After this lecture you should be able to
• Interpret and use the quartiles and the IQR
• Interpret box plots, relate to shape of histogram
• Know when to use mean and s.d. versus median, quartiles, etc.
• Compute z-scores.
After processing this lecture you should be able to
• Use z-score to give a universal picture of the place of a data value in the
data set.
• Compute percentiles, quartiles, and z-scores in Excel.