Box Plot

Quartiles
Quartiles are dividers that sort data into four groups of approximately equal size: Q1 (the first 25% of the data
lies below it), Q2 (the first 50% lies below it; Q2 is also the median), and Q3 (the first 75% lies below it, leaving the upper 25% above).
Lower 25% │ Second 25% │ Third 25% │ Upper 25%
          Q1           Q2          Q3
You can think of Q1 as the “median” of the values below Q2, and similarly Q3 as the “median” of the
values above Q2.
How to find the quartiles:
Step 1: Sort the observations from smallest to largest.
Step 2: To find the location of each quartile use:
$\text{Location} = n\left(\frac{P}{100}\right)$, where P is the percentile (25, 50, or 75) and n = sample size
If Location is NOT an integer, round Location up to the next integer (4.2, 4.5, 4.7 would all round up to 5);
the quartile is the data value $X_{\text{Location}}$.
If Location is an integer, average the data values at Location and (Location + 1).
The quartile is $\frac{X_{\text{Location}} + X_{\text{Location}+1}}{2}$.
Example #1: Suppose you have this data set:
14 15 10 11 19 12 17
The first step is to sort from smallest to largest:
10 11 12 14 15 17 19
To find the quartiles use:
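Q1: $\text{Location} = 7\left(\frac{25}{100}\right) = 1.75$; round up to 2, so Q1 = second observation = 11
Q2: $\text{Location} = 7\left(\frac{50}{100}\right) = 3.5$; round up to 4, so Q2 = fourth observation = 14
Q3: $\text{Location} = 7\left(\frac{75}{100}\right) = 5.25$; round up to 6, so Q3 = sixth observation = 17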
10 11 12 14 15 17 19
*Note: not all segments contain the same amount of data when n is odd. The first and fourth sections have 1 ½ observations; the second
and third sections have ½ + 1 + ½ = 2 observations. This is OK. Different textbooks compute Q1 and Q3 differently to try to
solve this, but we will not worry about it for our class.
Example #2: Suppose you have this data set:
14 15 10 11 19 12 17 24
The first step is to sort from smallest to largest:
10 11 12 14 15 17 19 24
To find the quartiles use:
Q1: $\text{Location} = 8\left(\frac{25}{100}\right) = 2$. Since 2 is an integer, add the 2nd and 3rd observations and divide by 2 to get $Q1 = \frac{11 + 12}{2} = 11.5$
Q2: $\text{Location} = 8\left(\frac{50}{100}\right) = 4$. Since 4 is an integer, add the 4th and 5th observations and divide by 2 to get $Q2 = \frac{14 + 15}{2} = 14.5$
Q3: $\text{Location} = 8\left(\frac{75}{100}\right) = 6$. Since 6 is an integer, add the 6th and 7th observations and divide by 2 to get $Q3 = \frac{17 + 19}{2} = 18$
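The location rule can also be checked by machine. The following is a minimal Python sketch, not part of the original handout: the function name quartile is hypothetical, and the code assumes the percentile is given as a whole number (25, 50, or 75) so that the integer test can be done exactly. It applies the rule to the data from Examples #1 and #2.

```python
def quartile(sorted_data, percentile):
    """Quartile by the handout's rule: Location = n * (P / 100).

    Assumes sorted_data is already in ascending order and that
    percentile is a whole number such as 25, 50, or 75.
    """
    n = len(sorted_data)
    # Integer arithmetic makes the "is Location an integer?" test exact.
    whole, remainder = divmod(n * percentile, 100)
    if remainder:
        # Not an integer: round up to position whole + 1.
        # Positions are 1-based, so that position is index `whole`.
        return sorted_data[whole]
    # Integer: average the values at positions Location and Location + 1.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


example_1 = sorted([14, 15, 10, 11, 19, 12, 17])
example_2 = sorted([14, 15, 10, 11, 19, 12, 17, 24])

for data in (example_1, example_2):
    q1, q2, q3 = (quartile(data, p) for p in (25, 50, 75))
    print(data, "-> Q1, Q2, Q3 =", q1, q2, q3)
```

Other textbooks and software packages use different quartile rules, so a statistics library may not return the same values as this sketch.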
Five Point Summary of a Sample Data Set.
In a five point summary, the following five characteristic numbers are used to summarize the data.
1. The minimum data value
2. First Quartile (Q1)
3. Median (Q2)
4. Third Quartile (Q3)
5. The maximum data value
Box and Whiskers Display (Box Plot)
This is an approach to graphically summarizing data that allows you to study it by quartile groupings.
How to construct a box and whiskers display:
1. Compute the five point summary (smallest, Q1, Q2, Q3, largest)
2. Compute the interquartile range.
The interquartile range contains the middle fifty percent of the data.
IQR = Q3 - Q1
3. Compute the fences.
Fences help determine if there are outliers or extreme values in the data. These fences are typically not
shown in the box plot.
Fences = 1.5(IQR) above Q3 and below Q1, that is, lower fence = Q1 - 1.5(IQR) and upper fence = Q3 + 1.5(IQR)
4. To draw the box plot first draw the horizontal axis (number line) and place the five number summary
on it (the axis should extend past the smallest and largest data values). Next draw the box with ends at
quartiles 1 and 3 and mark the median or Q2 with a vertical line within the box.
5. NOTE: The ends of the whiskers can represent several possible alternative values depending on what
textbook (or software!) you use, among them:
• the lowest and highest data values still within the lower and upper fences
• the minimum and maximum of all of the data
• one standard deviation above and below the mean of the data
• the 9th percentile and the 91st percentile
• the 2nd percentile and the 98th percentile.
6. To draw the whiskers we will use the first method above. The lower line (whisker) starts at the lower
end of the box and continues until the smallest value in the data set that is MORE THAN the lower
fence value. Likewise, the upper dashed (whisker) line starts at the upper end of the box and continues
until the largest value in the data set that is LESS THAN the upper fence value.
7. Add asterisks (*) to represent any data points less than the lower fence or larger than the upper fence.
Consider all *’s outside the fences as possible outliers requiring attention.
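Steps 2, 3, 6, and 7 above reduce to a few comparisons. The following is a minimal Python sketch under stated assumptions: the function name box_plot_parts and its dictionary keys are hypothetical, the quartiles are passed in rather than computed, and a value lying exactly on a fence is treated as neither a whisker end nor an outlier, which is a literal reading of steps 6 and 7. The data and quartiles are those of Example #2 (Q1 = 11.5, Q3 = 18).

```python
def box_plot_parts(sorted_data, q1, q3):
    """Fences, whisker ends, and possible outliers for a box plot."""
    iqr = q3 - q1                  # spread of the middle fifty percent
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values strictly inside the fences.
    inside = [x for x in sorted_data if lower_fence < x < upper_fence]
    # Values strictly beyond a fence would be drawn as asterisks (*).
    outliers = [x for x in sorted_data if x < lower_fence or x > upper_fence]
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


data = [10, 11, 12, 14, 15, 17, 19, 24]   # Example #2, sorted
parts = box_plot_parts(data, q1=11.5, q3=18)
for name, value in parts.items():
    print(f"{name}: {value}")
```

For this data set the fences fall outside the whole range of the data, so the whiskers run to the minimum and maximum and no asterisks are needed.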
Example: Study the following example carefully as it explains how to calculate the percentiles.
Consider the following sample data set. Note that it is arranged in an ascending order.
5 5.4 6.3 8.2 8.7 9.3 9.5 9.8 10.5 11.6 11.8 12.1 12.6 20
Find the five number summary:
Smallest = 5
Q1: location = (percentile/100)n = (25/100)14 = 3.5. Since 3.5 is not an integer, we round up to 4 and look for
the number in the 4th position in the ordered data, which is 8.2. Twenty-five percent of the data is below 8.2.
Q2: location = (percentile/100)n = (50/100)14 = 7 (notice that Q2 is also the median). Since 7 is an integer,
we take the 7th and 8th numbers in the ordered data, add them, and divide by 2 to find the answer, which is
(9.5 + 9.8)/2 = 9.65. Fifty percent of the data is below 9.65.
Q3: location = (percentile/100)n = (75/100)14 = 10.5. Since 10.5 is not an integer, we handle it as we did the
first quartile: round up to 11. The number in the 11th position in the ordered data is 11.8. Therefore seventy-five percent of the
data is below 11.8.
Largest = 20
To draw the chart first draw the horizontal axis to scale and place the five number summary on it. Next draw
the box with ends at quartile 1 and 3 and mark the median or Q2.
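[Figure: number line with the smallest value 5 and the largest value 20 marked, and a box running from Q1 = 8.2 to Q3 = 11.8 with the median Q2 = 9.65 marked inside it.]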
Now we need to find the interquartile range and the fences.
Interquartile range (IQR): The interquartile range contains the middle fifty percent of the data.
IQR = Q3 - Q1
IQR = 11.8 - 8.2 = 3.6
Fences are markers we can add to the chart to help determine if there are outliers or extreme values in the data.
If there are, it may be appropriate to throw out some or all of the outliers before continuing to study the data.
Sometimes fences are called hinges or limits.
Fences = 1.5(IQR) above Q3 and below Q1
1.5(IQR) = 1.5(3.6) = 5.4
Upper fence: 5.4 above Q3 = 11.8 + 5.4 = 17.2
Lower fence: 5.4 below Q1 = 8.2 - 5.4 = 2.8
Note: there is one potential outlier above 17.2 (the value 20)
Note: there are no potential outliers below 2.8
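The arithmetic in this example can be reproduced end to end. The following is a minimal Python sketch, not part of the original handout: the helper name quartile and the variable names are hypothetical, and the quartile rule assumes whole-number percentiles. Because the data are decimals, the printed values are rounded to two places to hide floating-point noise.

```python
def quartile(sorted_data, percentile):
    """Quartile by the location rule Location = n * (P / 100)."""
    whole, remainder = divmod(len(sorted_data) * percentile, 100)
    if remainder:
        # Location is not an integer: round up (1-based position whole + 1).
        return sorted_data[whole]
    # Location is an integer: average positions Location and Location + 1.
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


data = [5, 5.4, 6.3, 8.2, 8.7, 9.3, 9.5, 9.8, 10.5, 11.6, 11.8, 12.1, 12.6, 20]

q1, q2, q3 = (quartile(data, p) for p in (25, 50, 75))
five_number_summary = (data[0], q1, q2, q3, data[-1])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Possible outliers lie strictly beyond a fence; whiskers end at the
# most extreme data values strictly inside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low = min(x for x in data if x > lower_fence)
whisker_high = max(x for x in data if x < upper_fence)

print("five-number summary:", [round(v, 2) for v in five_number_summary])
print("IQR:", round(iqr, 2))
print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("whisker ends:", whisker_low, whisker_high)
print("possible outliers:", outliers)
```

The sketch should agree with the hand calculation: quartiles 8.2, 9.65, and 11.8, fences at about 2.8 and 17.2, and 20 as the only value beyond a fence.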
Here’s how our box plot looks (typically you do not actually show the fences on the box plot). The dashed
lines represent the “whiskers” and show the range for the data within the inner fences (2.8 and 17.2). There is one mild outlier,
designated with an asterisk (*), outside the inner fences but inside the outer fences, which lie 3(IQR) = 10.8 beyond the quartiles, at -2.6 and 22.6.
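[Figure: box plot on a number line. Outer fences at -2.6 and 22.6; inner fences at 2.8 and 17.2; box from Q1 = 8.2 to Q3 = 11.8 with the median Q2 = 9.65 marked inside; dashed whiskers from 5 to the box and from the box to 12.6; an asterisk (*) marking the outlier above the upper inner fence.]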
NOTE: FYI, if a fifteenth observation were added you would have the same value for Q1:
5 5.4 6.3 8.2 8.7 9.3 9.5 9.8 10.5 11.6 11.8 12.1 12.6 20 23
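For Q1, location = (25/100)15 = 3.75, which rounds up to 4, so Q1 is still the fourth number = 8.2
For Q2, location = (50/100)15 = 7.5, which rounds up to 8, so Q2 is the eighth number = 9.8
For Q3, location = (75/100)15 = 11.25, which rounds up to 12, so Q3 is the twelfth number = 12.1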
The top and bottom points of the mean diamond show the upper and lower 95% confidence points