
MAT 142 Ch15 Note
Park
15.1 Graphs and Charts
I. Representing Categorical or Discrete Data
The following data set consists of m&m colors in a bag.
RED     BLUE    YELLOW  BLUE    RED     GREEN
BLUE    BROWN   RED     YELLOW  RED     RED
RED     RED     BLUE    GREEN   BLUE    YELLOW
BROWN   RED     BLUE    RED     BLUE    YELLOW
YELLOW  BLUE    BROWN   YELLOW  BROWN   RED
A. Frequency & Relative Frequency Table
Color     Frequency     Relative Frequency
RED
BLUE
BROWN
GREEN
YELLOW
Which color do you have the most of? What is the relative frequency of that color?
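A hand tally can be cross-checked with a short Python sketch. It uses the 30 colors listed above; the variable names are illustrative only, and relative frequency is taken to be frequency divided by the total count.

```python
from collections import Counter

# The 30 M&M colors from the data set above, in the order listed.
colors = """
RED BLUE YELLOW BLUE RED GREEN BLUE BROWN RED YELLOW
RED RED RED RED BLUE GREEN BLUE YELLOW BROWN RED
BLUE RED BLUE YELLOW YELLOW BLUE BROWN YELLOW BROWN RED
""".split()

frequency = Counter(colors)
total = len(colors)

# Relative frequency = frequency / total number of observations.
relative_frequency = {color: count / total for color, count in frequency.items()}

print(f"{'Color':<8}{'Freq':>5}{'Rel. freq':>11}")
for color, count in frequency.most_common():
    print(f"{color:<8}{count:>5}{relative_frequency[color]:>11.3f}")
```

The relative frequencies should add up to 1, which is a quick way to catch a miscount.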
B. Bar Graph
*Note: The bars do not touch. The bars are centered on the category or discrete numerical value.
C. Dot Plot (Line Plot)
D. Circle Graph
To make a circle graph:
1) Know the fraction of the circular region allocated to each value of the variable.
2) Because there are 360° in a circle, draw a sector whose angle is 360° × (the fraction for that value).
*Circle graphs often, though not always, represent percents of the total rather than frequencies.
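The sector-angle rule can be sketched in Python. The frequencies below are tallied from the M&M data set earlier in these notes; the dictionary and variable names are illustrative only.

```python
# Frequencies tallied from the M&M data set (30 candies in total).
frequency = {"RED": 10, "BLUE": 8, "YELLOW": 6, "BROWN": 4, "GREEN": 2}
total = sum(frequency.values())

# Sector angle = 360 degrees x (fraction of the whole for that color).
angles = {color: 360 * count / total for color, count in frequency.items()}

for color, angle in angles.items():
    print(f"{color:<8}{angle:6.1f} degrees")
```

The angles should add up to 360°, just as the fractions add up to 1.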
E. Pictograph
*Note: equally spaced intervals and key assigning weight to a picture
Potential dangers of pictographs: What is wrong with each of the following pictographs? How could we “fix” them?
II. Representing Numerical Data
Record the height (in inches) of your classmates below:
A. Stem-and-leaf plot
Stem (tens-digit)
Leaf (ones-digit)
*Note: order leaves from smallest to largest to make an ordered stem-and-leaf
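As a rough illustration, an ordered stem-and-leaf plot can be built by splitting each value into its tens digit and ones digit. The heights below are hypothetical (no class data appears in these notes), and the function name `stem_and_leaf` is made up for this sketch; it assumes whole-number values.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and ones digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):        # sorting first keeps each row of leaves ordered
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical heights in inches, for illustration only.
heights = [62, 71, 65, 68, 59, 66, 70, 64, 73, 67, 61, 69]

for stem, leaves in sorted(stem_and_leaf(heights).items()):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```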
B. Histogram
While there is some flexibility involved in determining the details of a histogram, some general guidelines should be followed:
- Use a reasonable number of bins (usually 4 to 7 bins is recommended)
- Every data value must fall into exactly one bin, which implies
    - bin boundaries must be defined precisely
    - your bins must cover the entire range of your data set
- Bins must have equal widths
- When drawn, the bars of a histogram have no gaps between adjacent bins (unlike the bars of a bar graph)
Let’s construct a histogram based on the data set we got!
1. Order the data!
2. Range
3. Width
4. Class-count table
5. Histogram
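The five steps above can be sketched in Python under some stated assumptions: the heights are hypothetical, the values are whole numbers, each bin includes its lower boundary but not its upper one, and the width is the smallest whole number that lets the chosen number of bins cover the range. The function name `class_counts` is made up for this sketch.

```python
def class_counts(data, bins):
    """Return (lower, upper, count) rows for equal-width bins [lower, upper)."""
    ordered = sorted(data)                    # 1. order the data
    data_range = ordered[-1] - ordered[0]     # 2. range = maximum - minimum
    width = data_range // bins + 1            # 3. whole-number width covering the range
    table = []
    for i in range(bins):                     # 4. class-count table
        lower = ordered[0] + i * width
        upper = lower + width
        count = sum(lower <= x < upper for x in ordered)
        table.append((lower, upper, count))
    return table

# Hypothetical heights in inches, for illustration only.
heights = [62, 71, 65, 68, 59, 66, 70, 64, 73, 67, 61, 69]
table = class_counts(heights, bins=5)

for lower, upper, count in table:             # 5. a text version of the histogram
    print(f"{lower}-{upper - 1}: " + "#" * count)
```

Because every value lands in exactly one bin, the counts in the table add up to the number of data values.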
EX1
The following histogram represents fuel economy of several cars sold in the United States and
Europe. What descriptive properties of this data set does this histogram make clear?
Certain properties of data sets are described with specific terminology.
DEF1
An _________________________ is a data value that does not follow the overall pattern of the data set.
A data set which contains a high outlier – resulting in a histogram which will have a “tail” on the right – is
called _____________________________________________.
A data set which contains a low outlier – resulting in a histogram which will have a “tail” on the left – is called
_____________________________________________.
Histograms and other graphical representations of data allow us to see certain properties of an entire data set. In
some cases, though, we prefer precise and concise numerical summaries – one or two numerical values which
capture the essential properties of a data set.
Two essential properties we will focus on are the ____________________________ and
__________________________________ of a data set.
The center of a data set can be thought of as its “average” – the one number which is the best representation of
all of the numbers in the data set. Two common measures of the center of a data set are the
____________________________ and the ___________________________________.
EX 2
Find the mean and median of the values 2, 13, 16, 4, 4, 19, 22, 19.
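One way to check a hand calculation for EX 2 is Python's standard `statistics` module. This is only a sketch for verifying the arithmetic, not a replacement for working the problem by hand.

```python
from statistics import mean, median

values = [2, 13, 16, 4, 4, 19, 22, 19]

print(sorted(values))   # the median is read from the ordered list
print(mean(values))     # sum of the values divided by how many there are
print(median(values))   # with 8 values, the average of the 4th and 5th ordered values
```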
EX 3
Find the mean and median of the following data set, which represents patient wait times at an emergency
room, measured in minutes.
11, 16, 18, 21, 24, 24, 25, 29, 31, 33, 38, 41, 42, 91, 94
For the above data set, is the mean or the median a better representation of the center of the set? Why?
Note that the ________________________is strongly affected by __________________________, while the
__________________ is not. Specifically, the mean is __________________________________
________________________________ outliers. Thus, comparing the values of the mean and median can be an
indication of skewness.
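To see the effect numerically, the sketch below compares the EX 3 wait times with and without the two largest values (91 and 94). Leaving those values out is done here for comparison only; it is not a step from the notes.

```python
from statistics import mean, median

wait_times = [11, 16, 18, 21, 24, 24, 25, 29, 31, 33, 38, 41, 42, 91, 94]
# Same data with the two largest values left out, for comparison only.
without_high_values = [t for t in wait_times if t < 90]

for label, data in [("all 15 values", wait_times),
                    ("without 91 and 94", without_high_values)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data)}")
```

In this data set the mean shifts by more than the median does when the two high values are left out.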
EX 4
State the mean and median of each of the three data sets below.
A: 75, 75, 75, 75, 75
B: 65, 70, 75, 80, 85
C: 45, 60, 75, 90, 105
The data sets from EX 4 demonstrate that describing a data set by simply stating its center doesn’t always give a complete
picture of the properties of the set.
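A quick Python sketch of EX 4 makes the point concrete: the center is reported alongside the smallest and largest value of each set. The dictionary name is illustrative only.

```python
from statistics import mean, median

data_sets = {
    "A": [75, 75, 75, 75, 75],
    "B": [65, 70, 75, 80, 85],
    "C": [45, 60, 75, 90, 105],
}

for name, data in data_sets.items():
    low, high = min(data), max(data)
    print(f"{name}: mean = {mean(data)}, median = {median(data)}, "
          f"values run from {low} to {high}")
```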
In addition to describing the center of a data set, it is often helpful to also measure its __________________________,
that is, the extent to which the data varies.
Two common measures of spread we will focus on are standard deviation and the five-number summary.
The purpose of calculating the standard deviation of a data set is to find how far the data values are from the mean “on
average.” This calculation entails several steps, but throughout the process, keeping this purpose in mind will help clarify
the steps required.
EX 5
Calculate the standard deviation of the following data set.
1.8, 2.2, 2.4, 2.4, 3.4, 3.4, 4
Since standard deviation measures how far data are from the mean, the first step will be to find the mean.
EX 5 (Continued)
The next several steps can be efficiently organized in the following table:
Data values     Subtract the mean     Square
Once we have completed the table, there are still steps to finish.
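The table and the remaining steps for EX 5 can be sketched in Python. One assumption is flagged loudly: these notes do not state whether the squared deviations are averaged by dividing by n or by n - 1, so the sketch shows both versions.

```python
import math

data = [1.8, 2.2, 2.4, 2.4, 3.4, 3.4, 4]

mean = sum(data) / len(data)          # first step: find the mean

print("value  value - mean  squared")
squared_deviations = []
for x in data:
    deviation = x - mean              # subtract the mean
    squared_deviations.append(deviation ** 2)   # square
    print(f"{x:5.1f}  {deviation:12.2f}  {deviation ** 2:7.2f}")

total = sum(squared_deviations)

# Remaining steps: average the squared deviations, then take the square root.
# ASSUMPTION: the divisor is not stated in the notes, so both conventions are shown.
sample_sd = math.sqrt(total / (len(data) - 1))   # divide by n - 1
population_sd = math.sqrt(total / len(data))     # divide by n
print(f"dividing by n - 1: {sample_sd:.3f}")
print(f"dividing by n:     {population_sd:.3f}")
```

Taking the square root at the end returns the result to the same units as the original data.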
If the data values 2.8, 2.8, 2.9, and 2.9 were added to our data set, would we expect the new set to have a
higher or lower standard deviation than we just calculated? Why?
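After reasoning through the question, the prediction can be checked with a short sketch. It uses `statistics.stdev`, which divides by n - 1; whether the course uses that divisor is an assumption, though `statistics.pstdev` (divide by n) leads to the same comparison for this data.

```python
from statistics import stdev

original = [1.8, 2.2, 2.4, 2.4, 3.4, 3.4, 4]
extended = original + [2.8, 2.8, 2.9, 2.9]   # the four added values sit close to the mean

print(f"original: {stdev(original):.3f}")
print(f"extended: {stdev(extended):.3f}")
```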
EX 6
Calculate the standard deviation of the following data set.
40, 42, 43, 43, 48, 48, 49, 51
Data values     Subtract the mean     Square
EX 7
Without performing any calculations, identify which of the following data sets would have the largest
standard deviation. Which would have the lowest?
A: 34, 38, 41, 46
B: 1.2, 16.75, 17.2, 34.6, 48.2
C: 116, 119, 119, 120, 121, 122, 122, 122, 123
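Once a prediction has been made by inspection, it can be checked with the sketch below. It again uses `statistics.stdev` (divide by n - 1), which is an assumption about the convention in use; the ranking of these three sets is the same when dividing by n.

```python
from statistics import stdev

data_sets = {
    "A": [34, 38, 41, 46],
    "B": [1.2, 16.75, 17.2, 34.6, 48.2],
    "C": [116, 119, 119, 120, 121, 122, 122, 122, 123],
}

spreads = {name: stdev(data) for name, data in data_sets.items()}

# List the sets from smallest to largest standard deviation.
for name, sd in sorted(spreads.items(), key=lambda item: item[1]):
    print(f"{name}: {sd:.2f}")
```

The size of the numbers in a set does not matter here, only how far the values lie from their own mean.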
In earlier examples, we noted that if our data set has ___________________________, we prefer to use the
____________________________ to describe the center of the data set. In such cases, the standard deviation – which is
based on the mean of the data set – may be an ineffective measure of the spread of the data.
Thus, in some circumstances – such as when our data set contains outliers – we prefer to describe the spread of the data
using the five-number summary.
As its name suggests, the five-number summary is a list of five key numbers which divide the ordered data set into four parts of roughly equal size.
By identifying these key numbers, we can gauge the variation of the data across the range of the entire set –
approximately 25% of the data values in a set fall between each pair of consecutive numbers in the five-number summary.
DEF 2
The five-number summary consists of the following five numerical values derived from the data set, listed in
increasing order:
Minimum :: Lower Quartile :: Median :: Upper Quartile :: Maximum
The lower quartile of a data set (also called the first quartile, denoted Q1) is the median of the data values
which fall below the overall median of the entire data set.
The upper quartile of a data set (also called the third quartile, denoted Q3) is the median of the data values
which fall above the overall median of the entire data set.
EX 8
Consider the following stem-and-leaf plot based on the first test in one instructor’s MAT 142 course:
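[Stem-and-leaf plot. Stems (tens digit): 3 4 5 6 7 8 9. The 45 leaves (ones digit) in transcript order, with row breaks not preserved: 9 0 0 1 0 0 0 0 0 3 0 0 1 8 1 3 2 1 1 8 1 3 3 1 9 2 5 3 2 2 6 9 2 3 8 9 8 8 8 9 8 9 8 9 9]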
State the five-number summary for this data set.
If the instructor is feeling generous and decides to give the top 25% of students an A, what test score becomes
the cut-off for an A?
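The five-number summary procedure can be sketched in Python. The example reuses the emergency-room wait times from EX 3 rather than the test scores. The function name `five_number_summary` is made up for this sketch, and it assumes the two halves are taken by position in the ordered list, so the middle value of an odd-sized set belongs to neither half.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]     # values below the overall median position
    upper_half = ordered[-half:]    # values above the overall median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Wait times (in minutes) from EX 3.
wait_times = [11, 16, 18, 21, 24, 24, 25, 29, 31, 33, 38, 41, 42, 91, 94]
print(five_number_summary(wait_times))
```

Since about 25% of the values lie above the upper quartile, that value is the natural place to look for a "top 25%" cut-off.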
EX 9
Soil ecologists measured levels of a specific nutrient in 28 samples collected from two different sites. The
data from the two sites is represented by the following five-number summaries.
Site A: 2.6 3.2 3.8 4.2 4.8
Site B: .01 3.7 3.9 4.3 4.8
Label the following descriptions with A or B, based on which site best fits the property described.
____ has an outlier
____ more symmetric distribution
____ has the larger median
____ would have the larger standard deviation
____ has the larger third quartile
____ has the larger mean
A convenient visual representation of a five-number summary is a boxplot.
EX 10
On the axes below, construct two separate boxplots for the two five-number summaries given in EX 9.
[Axes for boxplots A and B; horizontal scale marked 0, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8]
Based on the above boxplots, what percentage of samples from site A had a nutrient level between 3.2 and 4.2? _______
How many samples from site B had a nutrient level higher than 4.3? _______
How many samples from site B had a nutrient level between .01 and 3.7? _______
Within what range of nutrient levels did the middle 50% of the samples from site B fall?
____________________
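Questions like these rely on each section of a boxplot holding about 25% of the data. The sketch below shows that arithmetic for site B. It assumes 28 samples were collected at each site; the wording of EX 9 could also be read as 28 samples in total, in which case the counts would change.

```python
# ASSUMPTION: 28 samples per site (EX 9 could also be read as 28 in total).
samples_per_site = 28
site_b = {"min": 0.01, "q1": 3.7, "median": 3.9, "q3": 4.3, "max": 4.8}

# Each gap between consecutive five-number-summary values holds about 25% of the data.
per_quarter = samples_per_site * 0.25

print(f"about {per_quarter:.0f} samples between {site_b['min']} and {site_b['q1']}")
print(f"about {per_quarter:.0f} samples above {site_b['q3']}")
print(f"middle 50%: {site_b['q1']} to {site_b['q3']}, about {2 * per_quarter:.0f} samples")
```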