Download Descriptive Statistics: Box Plot

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Data mining wikipedia , lookup

World Values Survey wikipedia , lookup

Time series wikipedia , lookup

Transcript
OpenStax-CNX module: m16296
1
Descriptive Statistics: Box Plot
∗
Susan Dean
Barbara Illowsky, Ph.D.
This work is produced by OpenStax-CNX and licensed under the
†
Creative Commons Attribution License 3.0
Box plots or box-whisker plots give a good graphical image of the concentration of the data. They
also show how far from most of the data the extreme values are. The box plot is constructed from ve values:
the smallest value, the rst quartile, the median, the third quartile, and the largest value. The median, the
rst quartile, and the third quartile will be discussed here, and then again in the section on measuring data
in this chapter. We use these values to compare how close other data values are to them.
The median, a number, is a way of measuring the "center" of the data. You can think of the median
as the "middle value," although it does not actually have to be one of the observed values. It is a number
that separates ordered data into halves. Half the values are the same number or smaller than the median
and half the values are the same number or larger. For example, consider the following data:
1; 11.5; 6; 7.2; 4; 8; 9; 10; 6.8; 8.3; 2; 2; 10; 1
Ordered from smallest to largest:
1; 1; 2; 2; 4; 6; 6.8 ; 7.2 ; 8; 8.3; 9; 10; 10; 11.5
The median is between the 7th value, 6.8, and the 8th value 7.2. To nd the median, add the two values
together and divide by 2.
6.8 + 7.2
=7
2
(1)
The median is 7. Half of the values are smaller than 7 and half of the values are larger than 7.
Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the
data. To nd the quartiles, rst nd the median or second quartile. The rst quartile is the middle value
of the lower half of the data and the third quartile is the middle value of the upper half of the data. To get
the idea, consider the same data set shown above:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
The median or second quartile is 7. The lower half of the data is 1, 1, 2, 2, 4, 6, 6.8. The middle value
of the lower half is 2.
1; 1; 2; 2 ; 4; 6; 6.8
The number 2, which is part of the data, is the rst quartile. One-fourth of the values are the same or
less than 2 and three-fourths of the values are more than 2.
The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is 9.
7.2; 8; 8.3; 9 ; 10; 10; 11.5
The number 9, which is part of the data, is the third quartile. Three-fourths of the values are less than
9 and one-fourth of the values are more than 9.
To construct a box plot, use a horizontal number line and a rectangular box. The smallest and largest
data values label the endpoints of the axis. The rst quartile marks one end of the box and the third
∗
†
Version 1.13: Mar 28, 2013 1:32 am +0000
http://creativecommons.org/licenses/by/3.0/
http://cnx.org/content/m16296/1.13/
OpenStax-CNX module: m16296
2
quartile marks the other end of the box. The middle fty percent of the data fall inside the box.
The "whiskers" extend from the ends of the box to the smallest and largest data values. The box plot gives
a good quick picture of the data.
You may encounter box and whisker plots that have dots marking outlier values. In those
cases, the whiskers are not extending to the minimum and maximum values.
note:
Consider the following data:
1; 1; 2; 2; 4; 6; 6.8 ; 7.2; 8; 8.3; 9; 10; 10; 11.5
The rst quartile is 2, the median is 7, and the third quartile is 9. The smallest value is 1 and the largest
value is 11.5. The box plot is constructed as follows (see calculator instructions in the back of this book or
on the TI web site1 ):
The two whiskers extend from the rst quartile to the smallest value and from the third quartile to the
largest value. The median is shown with a dashed line.
Example 1
The following data are the heights of 40 students in a statistics class.
59; 60; 61; 62; 62; 63; 63; 64; 64; 64; 65; 65; 65; 65; 65; 65; 65; 65; 65; 66; 66; 67; 67; 68; 68; 69;
70; 70; 70; 70; 70; 71; 71; 72; 72; 73; 74; 74; 75; 77
Construct a box plot:
Using the TI-83, 83+, 84, 84+ Calculator
• Enter data into the list editor (Press STAT 1:EDIT). If you need to clear the list, arrow up
•
•
•
•
to the name L1, press CLEAR, arrow down.
Put the data values in list L1.
Press STAT and arrow to CALC. Press 1:1-VarStats. Enter L1.
Press ENTER
Use the down and up arrow keys to scroll.
•
•
•
•
•
Smallest value = 59
Largest value = 77
Q1: First quartile = 64.5
Q2: Second quartile or median= 66
Q3: Third quartile = 70
Using the TI-83, 83+, 84, 84+ to Construct the Box Plot
Go to 14:Appendix for Notes for the TI-83, 83+, 84, 84+ Calculator. To create the box plot:
• Press Y=. If there are any equations, press CLEAR to clear them.
• Press 2nd Y=.
1
http://education.ti.com/educationportal/sites/US/sectionHome/support.html
http://cnx.org/content/m16296/1.13/
OpenStax-CNX module: m16296
•
•
•
•
•
•
•
•
3
Press 4:Plotso. Press ENTER
Press 2nd Y=
Press 1:Plot1. Press ENTER.
Arrow down and then use the right arrow key to go to the 5th picture which is the box plot.
Press ENTER.
Arrow down to Xlist: Press 2nd 1 for L1
Arrow down to Freq: Press ALPHA. Press 1.
Press ZOOM. Press 9:ZoomStat.
Press TRACE and use the arrow keys to examine the box plot.
a. Each quarter has 25% of the data.
b. The spreads of the four quarters are 64.5 - 59 = 5.5 (rst quarter), 66 - 64.5 = 1.5 (second
quarter), 70 - 66 = 4 (3rd quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter
has the smallest spread and the fourth quarter has the largest spread.
c. Interquartile Range: IQR = Q3 − Q1 = 70 − 64.5 = 5.5.
d. The interval 59 through 65 has more than 25% of the data so it has more data in it than the
interval 66 through 70 which has 25% of the data.
e. The middle 50% (middle half) of the data has a range of 5.5 inches.
For some sets of data, some of the largest value, smallest value, rst quartile, median, and third
quartile may be the same. For instance, you might have a data set in which the median and the
third quartile are the same. In this case, the diagram would not have a dotted line inside the box
displaying the median. The right side of the box would display both the third quartile and the
median. For example, if the smallest value and the rst quartile were both 1, the median and the
third quartile were both 5, and the largest value was 7, the box plot would look as follows:
Example 2
Test scores for a college statistics class held during the day are:
99; 56; 78; 55.5; 32; 90; 80; 81; 56; 59; 45; 77; 84.5; 84; 70; 72; 68; 32; 79; 90
Test scores for a college statistics class held during the evening are:
98; 78; 68; 83; 81; 89; 88; 76; 65; 45; 98; 90; 80; 84.5; 85; 79; 78; 98; 90; 79; 81; 25.5
Problem
http://cnx.org/content/m16296/1.13/
(Solution on p. 5.)
OpenStax-CNX module: m16296
What are the smallest and largest data values for each data set?
What is the median, the rst quartile, and the third quartile for each data set?
Create a boxplot for each set of data.
Which boxplot has the widest spread for the middle 50% of the data (the data between the
rst and third quartiles)? What does this mean for that set of data in comparison to the
other set of data?
• For each data set, what percent of the data is between the smallest value and the rst quartile?
(Answer: 25%) the rst quartile and the median? (Answer: 25%) the median and the third
quartile? the third quartile and the largest value? What percent of the data is between the
rst quartile and the largest value? (Answer: 75%)
•
•
•
•
The rst data set (the top box plot) has the widest spread for the middle 50% of the data. IQR =
Q3 − Q1 is 82.5 − 56 = 26.5 for the rst data set and 89 − 78 = 11 for the second data set.
So, the rst set of data has its middle 50% of scores more spread out.
25% of the data is between M and Q3 and 25% is between Q3 and Xmax.
http://cnx.org/content/m16296/1.13/
4
OpenStax-CNX module: m16296
Solutions to Exercises in this Module
Solution to Example 2, Problem (p. 3)
First Data Set
•
•
•
•
•
Xmin
Q1 =
M =
Q3 =
Xmax
= 32
56
74.5
82.5
= 99
Second Data Set
•
•
•
•
•
Xmin
Q1 =
M =
Q3 =
Xmax
= 25.5
78
81
89
= 98
Glossary
Denition 1: Median
A number that separates ordered data into halves. Half the values are the same number or smaller
than the median and half the values are the same number or larger than the median. The median
may or may not be part of the data.
Denition 2: Quartiles
The numbers that separate the data into quarters. Quartiles may or may not be part of the data.
The second quartile is the median of the data.
http://cnx.org/content/m16296/1.13/
5