3.4-Measures of Relative Standing
This section introduces measures of relative standing, which are numbers showing the location of data values
relative to the other values within a data set. They can be used to compare values from different data sets, or to
compare values within the same data set. The most important concept is the z score. We will also discuss
percentiles and quartiles, as well as a new statistical graph called the boxplot.
Z Scores:
A z score is found by converting a value to a standardized scale and represents the number of standard
deviations that a data value is from the mean. It is calculated as follows:
For a sample, the z score uses the sample mean and standard deviation:
$$z = \frac{x - \bar{x}}{s}$$
For a population, the z score uses the population mean and standard deviation:
$$z = \frac{x - \mu}{\sigma}$$
Z scores should always be rounded to two decimal places.
Example: Scores on a test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48.
Convert Michelle's score to a z-score.
Solution: Because the sample and population formulas have the same form, it makes no difference which one we
use. Assuming we have all the scores, we use the population formula:
$$z = \frac{x - \mu}{\sigma} = \frac{48 - 70}{11} = -2.00$$
Michelle has a z score of −2.00. This means that Michelle's score of 48 is 2 standard deviations below the mean.
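The calculation above can be sketched in Python. This is a minimal illustration rather than part of the original notes; the function name `z_score` is an assumption chosen for this example.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean.

    The same arithmetic applies to a sample (x-bar, s) and to a
    population (mu, sigma); only the inputs differ.
    """
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Michelle's test score from the example: mean 70, standard deviation 11.
michelle_z = round(z_score(48, 70, 11), 2)
print(michelle_z)  # -2.0
```

The negative sign carries the meaning here: it shows that the score lies below the mean.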
Z Scores, Unusual Values, and Outliers:
A value is considered unusual if it is more than 2 standard deviations from the mean. Whenever a value is less
than the mean, its corresponding z score is negative. Therefore unusual values will have z scores of less than -2
or greater than +2.
Ordinary values: –2 ≤ z score ≤ 2
Unusual values: z score < –2 or z score > 2
An outlier is a data value that is far away from the other data values. In general we will define an outlier as an
unusual value; that is, more than 2 standard deviations from the mean.
Example: A department store, on average, has daily sales of $29,876.76. The standard deviation of sales is
$1000. On Tuesday, the store sold $34,893.71 worth of goods. Find Tuesday's z score. Was Tuesday an
unusually good day?
Solution: Find the z score.
$$z = \frac{x - \bar{x}}{s} = \frac{34{,}893.71 - 29{,}876.76}{1000} = 5.02$$
The z score of 5.02 means that Tuesday's sales were slightly more than 5 standard deviations above the mean.
Because 5.02 is far greater than 2, Tuesday was an unusually good day.
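The rule for unusual values can be written as a short check. The sketch below assumes the cutoff of 2 standard deviations used in this section; the name `is_unusual` and the `cutoff` parameter are illustrative choices, not part of the original notes.

```python
def is_unusual(x, mean, std_dev, cutoff=2.0):
    """True when x is more than `cutoff` standard deviations from the mean."""
    z = (x - mean) / std_dev
    # Ordinary values satisfy -cutoff <= z <= cutoff, so the test is strict.
    return abs(z) > cutoff


# Tuesday's sales from the department store example.
tuesday_z = round((34893.71 - 29876.76) / 1000, 2)
tuesday_unusual = is_unusual(34893.71, 29876.76, 1000)
print(tuesday_z, tuesday_unusual)  # 5.02 True
```

A z score of exactly −2 or +2 counts as ordinary under this rule, so Michelle's score of 48 (z = −2.00) would not be flagged.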
Example: Which is better: a score of 82 on a test with a mean of 70 and a standard deviation of 8, or a score of
82 on a test with a mean of 75 and a standard deviation of 4?
Solution: Find the z score for each value.
First test:
$$z = \frac{x - \bar{x}}{s} = \frac{82 - 70}{8} = 1.50$$
Second test:
$$z = \frac{x - \bar{x}}{s} = \frac{82 - 75}{4} = 1.75$$
The second score is a better score because it is 1.75 standard deviations above the mean as compared to the
first score which is 1.5 standard deviations above the mean.
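The comparison above amounts to putting both scores on one standardized scale and picking the larger z score. A minimal sketch, with the labels `first test` and `second test` introduced only for this illustration:

```python
# (label, score, mean, standard deviation) taken from the example above.
tests = [
    ("first test", 82, 70, 8),
    ("second test", 82, 75, 4),
]

# Convert each score to a z score so the two tests share one scale.
z_scores = {label: (score - mean) / sd for label, score, mean, sd in tests}

# The larger z score marks the better relative standing.
best = max(z_scores, key=z_scores.get)
print(z_scores)  # {'first test': 1.5, 'second test': 1.75}
print(best)      # second test
```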
Percentiles:
Percentiles are one type of quantile; quantiles partition data into groups. Percentiles are measures of location
which divide a set of data into 100 groups with about 1% of the values in each group.
To find the percentile of a data value, use the formula:
$$\text{Percentile of a value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100\%$$
Example: Find the percentile for the data value 14, given the following data set.
Data set: 4 6 14 10 4 10 18 18 22 6 6 18 12 2 18
Solution: There are 9 data values less than 14 and a total of 15 data values. Therefore we have:
$$\text{Percentile of 14} = \frac{9}{15} \cdot 100\% = 60\%$$
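The counting step in this example can be checked with a few lines of Python. This is a sketch of the formula as stated in this section (values strictly less than x); the function name `percentile_of_value` is an assumption made for illustration.

```python
def percentile_of_value(data, x):
    """Percentage of values in data that are strictly less than x."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)


data = [4, 6, 14, 10, 4, 10, 18, 18, 22, 6, 6, 18, 12, 2, 18]
pct = percentile_of_value(data, 14)
print(pct)  # 60.0
```

The data do not need to be sorted for this direction, because only a count of smaller values is required.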
Converting a Percentile to a Data Value:
To find a data value that corresponds to a given percentile, use the following formula on a sorted data set.
L represents the location of the data value in the sorted data set.
$$L = \frac{k}{100} \cdot n$$
n = total number of values in the data set
k = percentile being used
L = locator that gives the position of a value
P_k = kth percentile
Example: The test scores of 32 students are listed below. Find P46.
89 56 70 57 37 80 48 82
41 59 74 83 99 63 74 86
46 77 75 32 71 66 65 92
53 68 78 95 55 69 79 44
Solution: Use the formula to find the location of the data value that is in the 46th percentile.
$$L = \frac{46}{100} \cdot 32 = 14.72$$
Because 14.72 is not a whole number, round up to the nearest whole number, which is 15. When the data set is
sorted from lowest to highest, the 15th value from the bottom is 68. Therefore P46 = 68.
Sorted data (read left to right, top to bottom):
32 37 41 44 46 48 53 55
56 57 59 63 65 66 68 69
70 71 74 74 75 77 78 79
80 82 83 86 89 92 95 99
Quartiles:
Quartiles are measures of location which divide a set of data into four groups with about 25% of the values in
each group.
1st Quartile: 25% of the sorted values are less than or equal to this value.
2nd Quartile: 50% of the sorted values are less than or equal to this value.
3rd Quartile: 75% of the sorted values are less than or equal to this value.
Example: The test scores of 32 students are listed below. Find Q3.
89 56 70 57 37 80 48 82
41 59 74 83 99 63 74 86
46 77 75 32 71 66 65 92
53 68 78 95 55 69 79 44
Solution: Because Q3 = P75, use the formula to find the location of the data value that is in the 3rd Quartile.
$$L = \frac{75}{100} \cdot 32 = 24$$
Because 24 is a whole number, the value of the 3rd Quartile (75th Percentile) is halfway between the 24th and
25th values in the sorted data set, which are 79 and 80. Therefore, the 3rd Quartile is 79.5.
Sorted data (read left to right, top to bottom):
32 37 41 44 46 48 53 55
56 57 59 63 65 66 68 69
70 71 74 74 75 77 78 79
80 82 83 86 89 92 95 99
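The two locator rules used in the percentile and quartile examples (round up when L is not a whole number; average the Lth and (L+1)th values when it is) can be sketched as one function. The name `percentile_value` is an assumption for this illustration, and the sketch assumes k is a whole number from 1 to 99. Statistical software often uses other percentile conventions, so library results may differ slightly from this rule.

```python
def percentile_value(data, k):
    """Value of the k-th percentile P_k, using the locator L = (k / 100) * n."""
    if not (isinstance(k, int) and 0 < k < 100):
        raise ValueError("k must be a whole number from 1 to 99")
    if not data:
        raise ValueError("data must not be empty")
    values = sorted(data)  # lowest to highest
    # Integer arithmetic avoids floating-point error in the locator.
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # L is a whole number: average the L-th and (L+1)-th values.
        return (values[whole - 1] + values[whole]) / 2
    # L is not a whole number: round up and take that position.
    return values[whole]


scores = [89, 56, 70, 57, 37, 80, 48, 82, 41, 59, 74, 83, 99, 63, 74, 86,
          46, 77, 75, 32, 71, 66, 65, 92, 53, 68, 78, 95, 55, 69, 79, 44]
p46 = percentile_value(scores, 46)
q3 = percentile_value(scores, 75)
print(p46, q3)  # 68 79.5
```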
For a set of data, the 5-number summary consists of the minimum value; the first quartile Q1; the median (or
second quartile Q2); the third quartile, Q3; and the maximum value. A 5-number summary is used to create a
boxplot.
Boxplots:
A boxplot (or box-and-whisker-diagram) is a graph of a data set that consists of a line extending from the
minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the
third quartile, Q3.
[Figure: boxplot from a normal distribution]
[Figure: boxplot from a skewed distribution]
Outliers and Modified Boxplots:
An outlier is a value that lies very far away from the vast majority of the other values in a data set.
An outlier can have a dramatic effect on the mean.
An outlier can have a dramatic effect on the standard deviation.
An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the
distribution is totally obscured.
Modified Boxplots:
For purposes of constructing modified boxplots, we can consider outliers to be data values meeting specific
criteria. In modified boxplots, a data value is an outlier if it is above Q3 by an amount greater than 1.5 × IQR
or below Q1 by an amount greater than 1.5 × IQR, where IQR is the interquartile range, IQR = Q3 − Q1.
A modified boxplot is constructed with these specifications:
A special symbol (such as an asterisk) is used to identify outliers.
The solid horizontal line extends only as far as the minimum data value that is not an outlier and the
maximum data value that is not an outlier.
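The outlier criterion and the whisker rule for a modified boxplot can be sketched as follows. The quartiles and data values in this example are hypothetical, chosen only to illustrate the rule, and the function names are assumptions; the interquartile range is IQR = Q3 − Q1.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) limits beyond which a value is an outlier."""
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(data, q1, q3):
    """Separate data into (non-outliers, outliers) for a modified boxplot."""
    lower, upper = outlier_fences(q1, q3)
    inside = [v for v in data if lower <= v <= upper]
    outside = [v for v in data if v < lower or v > upper]
    return inside, outside


# Hypothetical quartiles and data, used only to illustrate the rule.
q1, q3 = 10, 20
data = [2, 9, 12, 15, 18, 21, 40]

lower, upper = outlier_fences(q1, q3)
inside, outliers = split_outliers(data, q1, q3)
print(lower, upper)              # -5.0 35.0
print(outliers)                  # [40]
print(min(inside), max(inside))  # 2 21
```

In this hypothetical case the value 40 would be drawn with the special symbol, and the line would extend only from 2 to 21.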
Example: The weights (in pounds) of 30 newborn babies are listed below. Construct a boxplot for the data set.
5.5 5.7 5.8 5.9 6.1 6.1 6.3 6.4 6.5 6.6
6.7 6.7 6.7 6.9 7.0 7.0 7.0 7.1 7.2 7.2
7.4 7.5 7.7 7.7 7.8 8.0 8.1 8.1 8.3 8.7
Solution: Determine the 5-number summary: minimum = 5.5, Q1 = 6.4, median = 7.0, Q3 = 7.7, and maximum = 8.7.
Then construct the boxplot.
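The five values in this solution can be reproduced with the locator formula L = (k / 100) · n from the percentile section. The sketch below is one way to do so; the helper names are assumptions made for this illustration, and plotting libraries may compute quartiles with a different convention.

```python
def sorted_percentile(values, k):
    """k-th percentile (k whole, 1..99) of values already sorted ascending."""
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # Whole-number locator: average that value and the next one.
        return (values[whole - 1] + values[whole]) / 2
    # Otherwise round the locator up to the next position.
    return values[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    return (values[0],
            sorted_percentile(values, 25),
            sorted_percentile(values, 50),
            sorted_percentile(values, 75),
            values[-1])


weights = [5.5, 5.7, 5.8, 5.9, 6.1, 6.1, 6.3, 6.4, 6.5, 6.6,
           6.7, 6.7, 6.7, 6.9, 7.0, 7.0, 7.0, 7.1, 7.2, 7.2,
           7.4, 7.5, 7.7, 7.7, 7.8, 8.0, 8.1, 8.1, 8.3, 8.7]
summary = five_number_summary(weights)
print(summary)  # (5.5, 6.4, 7.0, 7.7, 8.7)
```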