Measures of Relative Standing
A measure of relative standing is a measure of where a data value stands relative to the
distribution of the whole data set. With an idea of relative standing, we can say things like “You
got a really high score compared to the rest of the class” or “That man is unusually short.”
We’ll discuss three measures of relative standing: z-Scores, Quartiles, and Percentiles. But first,
a note about how to read these curves.
Viewing Distribution Curves
When viewing these graphs, imagine the data stacked up under the curve in columns, where each
column holds the data that share the same x value. Where the curve is higher, more of the data have that x value.
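To make the “stacked columns” picture concrete, here is a minimal Python sketch that counts how many data values share each x value and prints one row of stars per value. The picture is turned on its side, so a longer row plays the role of a higher point on the curve. The function name `stacked_columns` and the sample scores are hypothetical, chosen only for illustration.

```python
from collections import Counter

def stacked_columns(data):
    """Return one text row per distinct value: the value, then one '*' per data point.

    This is the curve picture turned on its side: a longer row means
    more data share that x value (a higher point on the curve).
    """
    counts = Counter(data)
    return [f"{x:>4} | {'*' * counts[x]}" for x in sorted(counts)]

# Hypothetical quiz scores, used only to illustrate the idea.
for row in stacked_columns([6, 7, 7, 8, 8, 8, 8, 9, 9, 10]):
    print(row)
```

For these sample scores the row for 8 is the longest (four stars), which is where a smooth curve drawn over the columns would peak.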
The z-Score
The z-score of a data value is the number of standard deviations the value lies away from the mean:
positive when the value is above the mean, negative when it is below. It’s the basis of the naming system shown below.
Name            z-score        Where the value lies
Unusually High  z > 2          Beyond two standard deviations above the mean
Really High     z ≈ 2          At or near two standard deviations above the mean
Very High       1 < z < 2      Between one and two standard deviations above the mean
Pretty High     z ≈ 1          At or near one standard deviation above the mean
High            0 < z < 1      Between the mean and one standard deviation above the mean
Average         z ≈ 0          At or near the mean
Low             -1 < z < 0     Between the mean and one standard deviation below the mean
Pretty Low      z ≈ -1         At or near one standard deviation below the mean
Very Low        -2 < z < -1    Between one and two standard deviations below the mean
Really Low      z ≈ -2         At or near two standard deviations below the mean
Unusually Low   z < -2         Beyond two standard deviations below the mean
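The naming chart can be expressed as a small lookup. This is a minimal sketch, not a rule from the chart itself: the chart never says how close counts as “at or near”, so the tolerance `tol = 0.1` is a hypothetical choice, and the function name `describe_z` is illustrative.

```python
def describe_z(z, tol=0.1):
    """Name a z-score using the naming chart.

    tol is a hypothetical cutoff for "at or near": the chart does not say
    how close counts as near, so 0.1 is only an illustrative choice.
    """
    # Points on the chart that have their own "at or near" label.
    near_labels = {2: "Really High", 1: "Pretty High", 0: "Average",
                   -1: "Pretty Low", -2: "Really Low"}
    for point, label in near_labels.items():
        if abs(z - point) <= tol:
            return label
    # Otherwise z lies strictly between (or beyond) those points.
    for lower_bound, label in [(2, "Unusually High"), (1, "Very High"),
                               (0, "High"), (-1, "Low"), (-2, "Very Low")]:
        if z > lower_bound:
            return label
    return "Unusually Low"

print(describe_z(-3.0))   # a value three standard deviations below the mean
print(describe_z(1.5))
```

Checking the “at or near” points first keeps values like z = 2.05 from being labeled “Unusually High” just because they are slightly above 2.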
It’s easy to compute the z-score of a data value $x$: subtract the mean $\bar{x}$ and divide by the
standard deviation $s$.
$$z = \frac{x - \bar{x}}{s}$$
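The formula translates directly into code. This sketch assumes $s$ is the sample standard deviation, computed with Python's `statistics.stdev` (which divides by n − 1); if your course uses the population standard deviation, `statistics.pstdev` is the drop-in replacement. The names `z_score` and `z_scores` are hypothetical.

```python
from statistics import mean, stdev

def z_score(x, xbar, s):
    """Number of standard deviations x lies above (+) or below (-) the mean xbar."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - xbar) / s

def z_scores(data):
    """z-score of every value, using the sample mean and sample standard deviation."""
    xbar = mean(data)
    s = stdev(data)  # assumption: sample standard deviation (divides by n - 1)
    return [z_score(x, xbar, s) for x in data]

print(z_scores([2, 4, 6]))  # mean 4, s = 2, so the z-scores are -1, 0 and 1
```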
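Example. If you scored a 55 on a test that had a mean of $\bar{x} = 85$ and a standard deviation of
$s = 10$, would your score be unusually low?
Yes. The z-score for a grade of 55 is
$$z = \frac{55 - 85}{10} = -3,$$
which is below $-2$, so the score lies more than two standard deviations below the mean.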
Example. If you scored a 55 on a test that had a mean of $\bar{x} = 75$ and a standard deviation of
$s = 15$, would your score be unusually low?
No. In this case the z-score of 55 is
$$z = \frac{55 - 75}{15} \approx -1.33.$$
Since $-2 < z < -1$, according to our naming chart above it would be considered only “very low”, not “unusually low”.
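Both examples can be checked in a few lines. This sketch simply repeats the arithmetic above and applies the chart's threshold for “unusually low” (z < −2).

```python
# The two test-score examples as (score, mean, standard deviation).
examples = [(55, 85, 10), (55, 75, 15)]

results = []
for x, xbar, s in examples:
    z = (x - xbar) / s  # z = (x - mean) / s
    # On the naming chart, "unusually low" means z < -2.
    verdict = "unusually low" if z < -2 else "not unusually low"
    results.append((z, verdict))
    print(f"score {x}, mean {xbar}, s = {s}: z = {z:.2f}, {verdict}")
```

The first line printed reports z = -3.00 (unusually low) and the second z = -1.33 (not unusually low), matching the two answers above.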
Quartiles and Box Plots
This is another way of measuring relative standing, and it is an extension of the median. Recall that
the median is the value at the center of the sorted data list, so half the data have values below the median
and half have values above it.
Look at the left-hand figure below. Locate the first quartile Q1 to the left of the median: 25% of the
data have values lower than Q1. Locate the third quartile Q3 to the right of the median: 75%
of the data lie to the left of (have values less than) Q3.
Together with the Min and Max values, Q1, the median, and Q3 form the 5 Number Summary of a data
set. We also refer to these five numbers as the Quartiles.
Value   Position in the data
Max     100% of the data have values at or below Max
Q3      75% of the data have values less than Q3
Med     50% of the data have values less than Med
Q1      25% of the data have values less than Q1
Min     No data have values less than Min
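Here is one way to compute a 5 Number Summary in Python. It is a sketch assuming the common textbook rule “Q1 is the median of the lower half and Q3 is the median of the upper half, leaving out the middle value when the count is odd”. Other textbooks and software use slightly different quartile rules (Python's `statistics.quantiles`, for example, interpolates), so their Q1 and Q3 can differ a little. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max) using the 'median of each half' rule.

    Assumption: when the number of values is odd, the middle value is
    left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = values[:half]             # values below the median
    upper = values[half + n % 2:]     # values above the median
    return (values[0], median(lower), median(values), median(upper), values[-1])

print(five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5]))
```

For the values 1 through 9 this gives Min = 1, Q1 = 2.5, Med = 5, Q3 = 7.5 and Max = 9.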
A Box Plot is a visual representation of a 5 Number Summary. The box plots for the two
distributions are shown underneath the graphs. The box spans the values between Q1 and Q3, a
line across the box marks the median, and lines extend out to the Min and Max values on either side.
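The parts of a box plot can be mimicked with plain text. This is only an illustrative sketch: the function name `text_box_plot`, the character choices, and the default width of 41 characters are all assumptions, not a standard format.

```python
def text_box_plot(summary, width=41):
    """Draw a one-line box plot for a five-number summary (Min, Q1, Med, Q3, Max).

    '|' marks Min and Max, '-' the lines out to them, '[' and ']' the ends
    of the box (Q1 and Q3), '=' the inside of the box, and 'M' the median.
    """
    lo, q1, med, q3, hi = summary
    if hi == lo:
        return "M"  # every value is the same; there is nothing to spread out

    def col(v):
        # Map a data value onto a character position from 0 to width - 1.
        return round((v - lo) / (hi - lo) * (width - 1))

    line = ["-"] * width              # lines out to Min and Max by default
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                 # inside of the box
    line[0], line[-1] = "|", "|"      # Min and Max
    line[col(q1)], line[col(q3)] = "[", "]"
    line[col(med)] = "M"
    return "".join(line)

print(text_box_plot((0, 10, 20, 30, 40)))
```

With the evenly spaced summary above, the box sits in the middle with equally long lines on each side; a skewed summary would stretch one side.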
As seen in the right-hand figure, if the data distribution is skewed, the positions of the
quartiles shift: they are no longer spaced symmetrically around the median.
Any data value can be compared to the positions of the quartiles. We can say things like, “My
score is above the third quartile.”
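Comparing a single value with the quartiles is a chain of comparisons. A minimal sketch, assuming the five numbers are already known; the function name `quartile_position` and the wording it returns are hypothetical.

```python
def quartile_position(value, summary):
    """Describe where value sits relative to a five-number summary (Min, Q1, Med, Q3, Max)."""
    lo, q1, med, q3, hi = summary
    if value < lo or value > hi:
        return "outside the range of the data"
    if value > q3:
        return "above the third quartile"
    if value > med:
        return "above the median, at or below the third quartile"
    if value > q1:
        return "above the first quartile, at or below the median"
    return "at or below the first quartile"

summary = (1, 2.5, 5, 7.5, 9)
print(quartile_position(8, summary))  # "My score is above the third quartile."
```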
Moreover, the relative positions of the five locations can help describe the distribution of the data
set. Look at the two examples above. Notice that where the data is clustered together, the
quartiles are closer together on the data axis. And where the data is spread out, the quartiles are
farther apart. This is the basis for the Box Plot graphs that are used to compare data sets.
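Each of the four stretches between neighbouring numbers of the 5 Number Summary holds roughly a quarter of the data, so a short stretch means those data are packed closely and a long one means they are spread out. A minimal sketch with a hypothetical helper `quartile_gaps` and two made-up summaries:

```python
def quartile_gaps(summary):
    """Lengths of the four stretches between neighbouring numbers of a
    five-number summary (Min, Q1, Med, Q3, Max).

    Each stretch holds roughly a quarter of the data, so a short stretch
    means those data are clustered and a long one means they are spread out.
    """
    lo, q1, med, q3, hi = summary
    return {
        "Min to Q1": q1 - lo,
        "Q1 to Med": med - q1,
        "Med to Q3": q3 - med,
        "Q3 to Max": hi - q3,
    }

# Hypothetical summaries: one roughly symmetric, one skewed to the right.
symmetric = (10, 40, 50, 60, 90)
right_skewed = (10, 20, 25, 40, 90)
print(quartile_gaps(symmetric))     # the stretches mirror each other around the median
print(quartile_gaps(right_skewed))  # the stretches right of the median are longer
```

In the symmetric case the data are clustered near the median (short middle stretches); in the skewed case the long stretch from Q3 to Max shows the spread-out right tail.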
Percentiles
Percentiles are an extension of the Quartiles. The best way to define them is by example.
P50 is the “50th Percentile”: 50% of the data are less than P50.
P30 is the “30th Percentile”: 30% of the data are less than P30.
P92 is the “92nd Percentile”: 92% of the data are less than P92.
Convince yourself that Med = P50 and Q1 = P25. What percentile is Q3? Q3 = P75.
Note that Max = P100. Which quartile is P0? P0 = Min.
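The definition “k% of the data are less than Pk” can be turned around: given a value, count the share of data strictly below it. This sketch (the name `percentile_rank` is hypothetical) does exactly that count; it is one simple convention, and statistics software offers several others.

```python
def percentile_rank(value, data):
    """Percent of the data values that are strictly less than value.

    If this returns k, then value sits at (about) the kth percentile, P_k.
    """
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for d in data if d < value)
    return 100 * below / len(data)

scores = list(range(1, 101))         # hypothetical data: the values 1 to 100
print(percentile_rank(51, scores))   # 50 values are below 51
print(percentile_rank(93, scores))   # 92 values are below 93
```

Because only values strictly below are counted, the Min always gets rank 0, which matches P0 = Min.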