© 2011 Pearson Education, Inc
Statistics for Business and Economics
Chapter 2
Methods for Describing Sets of Data
Contents
1. Describing Qualitative Data
2. Graphical Methods for Describing Quantitative Data
3. Summation Notation
4. Numerical Measures of Central Tendency
5. Numerical Measures of Variability
6. Interpreting the Standard Deviation
7. Numerical Measures of Relative Standing
8. Methods for Detecting Outliers: Box Plots and z-scores
9. Graphing Bivariate Relationships
10. The Time Series Plot
11. Distorting the Truth with Descriptive Techniques
Learning Objectives
1. Describe data using graphs
2. Describe data using numerical measures
2.1 Describing Qualitative Data
Key Terms
A class is one of the categories into which
qualitative data can be classified.
The class frequency is the number of
observations in the data set falling into a
particular class.
The class relative frequency is the class
frequency divided by the total number of
observations in the data set.
The class percentage is the class relative
frequency multiplied by 100.
Data Presentation
• Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
• Quantitative Data: Dot Plot, Stem-&-Leaf Display, Frequency Distribution, Histogram
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both
Major         Count
Accounting    130
Economics     20
Management    50
Total         200
(Each row is a category; counts are obtained from a tally such as |||| ||||.)
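The key terms above (class frequency, class relative frequency, class percentage) can be sketched in a few lines of Python. This is only an illustration: the list `responses` is a hypothetical set of raw answers rebuilt from the counts in the summary table, not data from the slides.

```python
from collections import Counter

# Hypothetical raw responses, reconstructed from the counts in the summary table.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

class_frequency = Counter(responses)        # class -> number of observations
n = sum(class_frequency.values())           # total number of observations

# Relative frequency = class frequency / n; percentage = relative frequency * 100.
class_relative_frequency = {c: f / n for c, f in class_frequency.items()}
class_percentage = {c: 100 * rf for c, rf in class_relative_frequency.items()}

for major, freq in class_frequency.items():
    print(f"{major:<12}{freq:>5}{class_relative_frequency[major]:>8.2f}"
          f"{class_percentage[major]:>8.1f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built summary table.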
Bar Graph
[Figure: bar graph of frequency (vertical axis, 0 to 150) by major (Acct., Econ., Mgmt.)]
• Vertical bars for qualitative variables
• Equal bar widths
• Bar height shows frequency or %
• Zero point on the vertical axis
• Percent may be used in place of frequency
Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent), e.g., (360°)(10%) = 36°
[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10% (a 36° slice)]
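The angle rule, (360°)(percent), can be checked for all three slices. A minimal sketch, assuming the percentages read off the pie chart; the dictionary name `percentages` is illustrative.

```python
# Percentages taken from the pie chart of majors.
percentages = {"Acct.": 65, "Econ.": 10, "Mgmt.": 25}

# Angle of each slice in degrees: 360 * (percent / 100).
angles = {major: 360 * pct / 100 for major, pct in percentages.items()}

for major, angle in angles.items():
    print(f"{major:<6}{percentages[major]:>4}% -> {angle:.0f} degrees")
```

The slice angles add up to 360°, just as the percentages add up to 100%.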
Pareto Diagram
Like a bar graph, but with the categories arranged by
height in descending order from left to right.
[Figure: Pareto diagram of frequency (vertical axis, 0 to 150) by major in the order Acct., Mgmt., Econ.]
• Vertical bars for qualitative variables
• Equal bar widths
• Bar height shows frequency or %
• Zero point on the vertical axis
• Percent may be used in place of frequency
Summary
Bar graph: The categories (classes) of the qualitative
variable are represented by bars, where the height of
each bar is either the class frequency, class relative
frequency, or class percentage.
Pie chart: The categories (classes) of the qualitative
variable are represented by slices of a pie (circle). The
size of each slice is proportional to the class relative
frequency.
Pareto diagram: A bar graph with the categories
(classes) of the qualitative variable (i.e., the bars)
arranged by height in descending order from left to
right.
Thinking Challenge
You’re an analyst for IRI. You want to show the
market shares held by Web browsers in 2006.
Construct a bar graph, pie chart, & Pareto diagram to
describe the data.
Browser              Mkt. Share (%)
Firefox              14
Internet Explorer    81
Safari               4
Others               1
Bar Graph Solution*
[Figure: bar graph of market share (%), vertical axis 0% to 100%, by browser in the order Firefox, Internet Explorer, Safari, Others]
Pie Chart Solution*
[Figure: pie chart of market share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]
Pareto Diagram Solution*
[Figure: bar graph of market share (%), vertical axis 0% to 100%, with browsers in descending order: Internet Explorer, Firefox, Safari, Others]
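The only difference between the bar graph and the Pareto diagram is the ordering of the categories. A small sketch of that ordering step, using the browser shares from the Thinking Challenge; the text "bars" are a stand-in for a real plot.

```python
# Market shares (%) from the Thinking Challenge table.
market_share = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}

# Pareto ordering: sort categories by height, tallest first.
pareto_order = sorted(market_share.items(), key=lambda item: item[1], reverse=True)

for browser, share in pareto_order:
    bar = "#" * (share // 2)          # crude text bar, one '#' per 2 percentage points
    print(f"{browser:<18}{share:>3}% {bar}")
```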
2.2 Graphical Methods for Describing Quantitative Data
Dot Plot
1. Horizontal axis is a scale for the quantitative variable,
e.g., percent.
2. The numerical value of each measurement is located
on the horizontal scale by a dot.
Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
• Stems are listed in order in a column
• Leaf value is placed in corresponding stem row to right of bar
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
2 | 144677
3 | 028
4 | 1
For example, 26 has stem 2 and leaf 6.
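The construction can be sketched in Python. The helper `stem_and_leaf` is a hypothetical name, and the sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: leaves} for two-digit integers (stem = tens, leaf = units)."""
    rows = defaultdict(str)
    for value in sorted(data):            # sorting keeps leaves in order within a row
        stem, leaf = divmod(value, 10)
        rows[stem] += str(leaf)
    return dict(rows)

display = stem_and_leaf([21, 24, 24, 26, 27, 27, 30, 32, 38, 41])
for stem, leaves in display.items():
    print(stem, "|", leaves)
```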
Frequency Distribution
Table Steps
1. Determine range
2. Select number of classes
• Usually between 5 & 15 inclusive
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes
Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class (Boundaries)   Midpoint   Frequency
15.5 – 25.5          20.5       3
25.5 – 35.5          30.5       5
35.5 – 45.5          40.5       2
Width = upper boundary – lower boundary (10 for each class)
Midpoint = (Lower + Upper Boundaries) / 2
Relative Frequency & % Distribution Tables
Class          Relative Frequency (Prop.)   Percentage (%)
15.5 – 25.5    .3                           30.0
25.5 – 35.5    .5                           50.0
35.5 – 45.5    .2                           20.0
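The frequency, relative frequency, and percentage tables can be rebuilt from the raw data. A minimal sketch, assuming the class boundaries used on the slides; the variable names are illustrative. Because the boundaries end in .5, no observation falls exactly on a boundary.

```python
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
boundaries = [15.5, 25.5, 35.5, 45.5]      # class boundaries from the slides

table = []
for lower, upper in zip(boundaries, boundaries[1:]):
    freq = sum(lower < x <= upper for x in raw)    # count observations in the class
    table.append({
        "class": (lower, upper),
        "midpoint": (lower + upper) / 2,
        "frequency": freq,
        "relative": freq / len(raw),
        "percent": 100 * freq / len(raw),
    })

for row in table:
    lo, hi = row["class"]
    print(f"{lo} - {hi}  mid={row['midpoint']}  freq={row['frequency']}  "
          f"prop={row['relative']:.1f}  pct={row['percent']:.1f}")
```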
Histogram
Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2
[Figure: histogram of the classes above. The horizontal axis is marked at the lower class boundaries (15.5, 25.5, 35.5, 45.5); the vertical axis shows count (frequency, relative frequency, or percent may be used); bars touch]
2.3 Summation Notation
Summation Notation
Most formulas we use require a summation of numbers.
$\sum_{i=1}^{n} x_i$
Sum the measurements on the variable that appears to the
right of the summation symbol, beginning with the 1st
measurement and ending with the nth measurement.
Summation Notation
For the data $x_1 = 5$, $x_2 = 3$, $x_3 = 8$, $x_4 = 5$, $x_5 = 4$:
$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2$
$= 5^2 + 3^2 + 8^2 + 5^2 + 4^2$
$= 25 + 9 + 64 + 25 + 16 = 139$
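The same sum of squares, written as a short Python check. The list `x` holds the five measurements from the slide; the comparison with the squared sum is added only to show that the two quantities differ.

```python
x = [5, 3, 8, 5, 4]                             # x_1 ... x_5 from the slide

sum_of_squares = sum(xi ** 2 for xi in x)       # sum of x_i^2: square first, then add
square_of_sum = sum(x) ** 2                     # (sum of x_i)^2: add first, then square

print(sum_of_squares, square_of_sum)
```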
2.4 Numerical Measures of Central Tendency
Thinking Challenge
... employees cite low pay - most workers earn only $20,000.
... President claims average pay is $70,000!
[Figure: salaries shown are $400,000, $70,000, $50,000, $30,000, and $20,000]
Two Characteristics
Central Tendency (Location): The central tendency of the set of
measurements, that is, the tendency of the data to
cluster, or center, about certain numerical values.
Variation (Dispersion): The variability of the set of measurements, that
is, the spread of the data.
Standard Notation
Measure   Sample      Population
Mean      $\bar{x}$   $\mu$
Size      n           N
Mean
1. Most common measure of central tendency
2. Acts as ‘balance point’
3. Affected by extreme values (‘outliers’)
4. Denoted $\bar{x}$ where
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6}$
$= \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence
Positioning Point = $\frac{n + 1}{2}$
4. Not affected by extreme values
Median Example
Odd-Sized Sample
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
Positioning Point = $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$
Median = 22.6
Median Example
Even-Sized Sample
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Positioning Point = $\frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$
Median = $\frac{7.7 + 8.9}{2} = 8.30$
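The positioning-point rule covers both the odd and the even case. A minimal sketch, assuming a hypothetical helper named `median_by_position`; it returns the positioning point alongside the median so the two worked examples can be checked.

```python
def median_by_position(data):
    """Return (positioning point, median) using the (n + 1) / 2 rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2                    # 1-based position in the ordered data
    if n % 2 == 1:
        # Odd n: the positioning point is a whole number; take that value.
        return position, ordered[int(position) - 1]
    # Even n: the point falls halfway between two positions; average those values.
    lower, upper = ordered[n // 2 - 1], ordered[n // 2]
    return position, (lower + upper) / 2

print(median_by_position([24.1, 22.6, 21.5, 23.7, 22.6]))
print(median_by_position([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))
```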
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
Thinking Challenge
You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of new
stock issues: 17, 16, 21, 18,
13, 16, 12, 11.
Describe the stock prices
in terms of central
tendency.
Central Tendency Solution*
Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_8}{8}$
$= \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$
Central Tendency Solution*
Median
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8
Positioning Point = $\frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$
Median = $\frac{16 + 16}{2} = 16$
Central Tendency Solution*
Mode
Raw Data:
17 16 21 18 13 16 12 11
Mode = 16
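All three answers for the stock prices can be cross-checked with Python's standard `statistics` module. This is just a verification sketch; `statistics.multimode` needs Python 3.8 or later and returns every value tied for most frequent.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]     # closing stock prices from the slide

mean = statistics.mean(prices)                # arithmetic mean
median = statistics.median(prices)            # average of the two middle values (n is even)
modes = statistics.multimode(prices)          # list of all most-frequent values

print(mean, median, modes)
```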
Summary of Central Tendency Measures
Measure   Formula                        Description
Mean      $\sum x_i / n$                 Balance Point
Median    Position $\frac{n + 1}{2}$     Middle Value When Ordered
Mode      none                           Most Frequent
Shape
1. Describes how data are distributed
2. Measures of Shape
• Skew describes departure from symmetry
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
2.5 Numerical Measures of Variability
Range
1. Measure of dispersion
2. Difference between largest & smallest observations
Range = x_largest – x_smallest
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10 with differently distributed data; in both, Range = 10 – 7 = 3]
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or μ)
[Figure: data plotted on a scale from 4 to 12 with $\bar{x}$ = 8.3 marked]
Standard Notation
Measure              Sample      Population
Mean                 $\bar{x}$   $\mu$
Standard Deviation   $s$         $\sigma$
Variance             $s^2$       $\sigma^2$
Size                 n           N
Sample Variance Formula
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
n – 1 in denominator!
Sample Standard Deviation Formula
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 8.3$
$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
Thinking Challenge
• You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of
new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
• What are the variance
and standard deviation
of the stock prices?
Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 15.5$
$s^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$
Variation Solution*
Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{11.14} \approx 3.34$
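The sample variance and standard deviation formulas translate directly into code. A minimal sketch with hypothetical helper names `sample_variance` and `sample_std`, applied to both data sets used in this section; note the n – 1 denominator.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

example = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
prices = [17, 16, 21, 18, 13, 16, 12, 11]

print(round(sample_variance(example), 3))
print(round(sample_variance(prices), 2), round(sample_std(prices), 2))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n – 1 convention and can serve as a cross-check.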
Summary of Variation Measures
Measure: Range
Formula: x_largest – x_smallest
Description: Total Spread
Measure: Standard Deviation (Sample)
Formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Description: Dispersion about Sample Mean
Measure: Standard Deviation (Population)
Formula: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Description: Dispersion about Population Mean
Measure: Variance (Sample)
Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Description: Squared Dispersion about Sample Mean
2.6 Interpreting the Standard Deviation
Interpreting Standard Deviation:
Chebyshev’s Theorem
• Applies to data sets of any shape
• No useful information about the fraction of data in the
interval $\bar{x} - s$ to $\bar{x} + s$
• At least 3/4 of the data lies in the interval
$\bar{x} - 2s$ to $\bar{x} + 2s$
• At least 8/9 of the data lies in the interval
$\bar{x} - 3s$ to $\bar{x} + 3s$
• In general, for k > 1, at least $1 - 1/k^2$ of the data lies
in the interval $\bar{x} - ks$ to $\bar{x} + ks$
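Interpreting Standard Deviation:
Chebyshev’s Theorem
[Figure: number line marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. Within $\bar{x} \pm s$: no useful information. Within $\bar{x} \pm 2s$: at least 3/4 of the data. Within $\bar{x} \pm 3s$: at least 8/9 of the data]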
Chebyshev’s Theorem Example
• Previously we found the mean
closing stock price of new stock
issues is 15.5 and the standard
deviation is 3.34.
• Use this information to form an
interval that will contain at least
75% of the closing stock prices of
new stock issues.
Chebyshev’s Theorem Example
At least 75% of the closing stock prices of new stock
issues will lie within 2 standard deviations of the mean.
$\bar{x}$ = 15.5
s = 3.34
($\bar{x}$ – 2s, $\bar{x}$ + 2s) = (15.5 – 2∙3.34, 15.5 + 2∙3.34)
= (8.82, 22.18)
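The interval and the guaranteed fraction can be computed together. A minimal sketch, assuming a hypothetical helper named `chebyshev_interval`; it rejects k ≤ 1 because the theorem gives no useful bound there.

```python
def chebyshev_interval(mean, s, k):
    """Return ((low, high), minimum fraction of data inside) for mean +/- k*s."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return (mean - k * s, mean + k * s), 1 - 1 / k ** 2

interval, min_fraction = chebyshev_interval(15.5, 3.34, 2)
print(f"({interval[0]:.2f}, {interval[1]:.2f}) holds at least {min_fraction:.0%}")

prices = [17, 16, 21, 18, 13, 16, 12, 11]
inside = sum(interval[0] <= p <= interval[1] for p in prices)
print(f"{inside} of {len(prices)} observed prices fall inside")
```

For the eight stock prices every observation falls inside the interval, which is consistent with the bound: Chebyshev's theorem guarantees a minimum, not an exact share.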
Interpreting Standard Deviation:
Empirical Rule
• Applies to data sets that are mound shaped and
symmetric
• Approximately 68% of the measurements lie in
the interval $\bar{x} - s$ to $\bar{x} + s$
• Approximately 95% of the measurements lie in
the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
• Approximately 99.7% of the measurements lie
in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$
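Interpreting Standard Deviation:
Empirical Rule
[Figure: mound-shaped distribution marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$. Within $\bar{x} \pm s$: approximately 68% of the measurements. Within $\bar{x} \pm 2s$: approximately 95%. Within $\bar{x} \pm 3s$: approximately 99.7%]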
Empirical Rule Example
Previously we found the mean
closing stock price of new
stock issues is 15.5 and the
standard deviation is 3.34. If
we can assume the data are
symmetric and mound shaped,
calculate the percentage of the
data that lie within the intervals
$\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.
Empirical Rule Example
• According to the Empirical Rule, approximately 68%
of the data will lie in the interval ($\bar{x}$ – s, $\bar{x}$ + s),
(15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval
($\bar{x}$ – 2s, $\bar{x}$ + 2s),
(15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval
($\bar{x}$ – 3s, $\bar{x}$ + 3s),
(15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
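The three intervals can be generated in a loop and compared with the observed data. This is an illustrative sketch only: with just eight prices, the observed shares cannot be expected to match the Empirical Rule closely, and the rule itself assumes a mound-shaped, symmetric data set.

```python
prices = [17, 16, 21, 18, 13, 16, 12, 11]
mean, s = 15.5, 3.34                          # values found earlier in the chapter

rule = {1: 0.68, 2: 0.95, 3: 0.997}           # Empirical Rule approximations
observed = {}
for k, approx in rule.items():
    low, high = mean - k * s, mean + k * s
    observed[k] = sum(low <= p <= high for p in prices) / len(prices)
    print(f"k={k}: ({low:.2f}, {high:.2f})  rule ~{approx:.1%}  observed {observed[k]:.1%}")
```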
2.7 Numerical Measures of Relative Standing
Numerical Measures of
Relative Standing: Percentiles
• Describes the relative location of a
measurement compared to the rest of the data
• The pth percentile is a number such that p% of
the data falls below it and (100 – p)% falls
above it
• Median = 50th percentile
Percentile Example
• You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
• What percentage of test takers scored lower
than you did?
• What percentage of test takers scored higher
than you did?
Percentile Example
• What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
• What percentage of test takers scored higher
than you did?
(100 – 58)% = 42% of test takers scored
higher than 560.
Numerical Measures of
Relative Standing: z–Scores
• Describes the relative location of a
measurement compared to the rest of the data
• Sample z–score: $z = \frac{x - \bar{x}}{s}$
• Population z–score: $z = \frac{x - \mu}{\sigma}$
• Measures the number of standard deviations
away from the mean a data value is located
Z–Score Example
• The mean time to assemble a
product is 22.5 minutes with a
standard deviation of 2.5 minutes.
• Find the z–score for an item that
took 20 minutes to assemble.
• Find the z–score for an item that
took 27.5 minutes to assemble.
Z–Score Example
x = 20, μ = 22.5, σ = 2.5
$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$
x = 27.5, μ = 22.5, σ = 2.5
$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$
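Both z-scores follow from one small function. A minimal sketch; `z_score` is a hypothetical helper name, and the same function serves the sample and population versions because only the inputs differ.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(20, 22.5, 2.5))      # item assembled in 20 minutes
print(z_score(27.5, 22.5, 2.5))    # item assembled in 27.5 minutes
```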
Interpretation of z–Scores for
Mound-Shaped Distributions
of Data
1. Approximately 68% of the measurements
will have a z-score between –1 and 1.
2. Approximately 95% of the measurements
will have a z-score between –2 and 2.
3. Approximately 99.7% of the measurements
will have a z-score between –3 and 3.
[Figure: Interpretation of z–Scores]
2.8 Methods for Detecting Outliers: Box Plots and z-Scores
Outlier
An observation (or measurement) that is unusually large
or small relative to the other values in a data set is called
an outlier. Outliers typically are attributable to one of
the following causes:
1. The measurement is observed, recorded, or entered
into the computer incorrectly.
2. The measurement comes from a different
population.
3. The measurement is correct but represents a rare
(chance) event.
Quartiles
Measure of noncentral tendency
Split ordered data into 4 quarters
[Figure: ordered data split into four parts of 25% each by Q1, Q2, and Q3]
Lower quartile QL (Q1) is the 25th percentile.
Middle quartile m (Q2) is the median.
Upper quartile QU (Q3) is the 75th percentile.
Interquartile range: IQR = QU – QL
Quartile (Q2) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Q2 is the median, the average of the two middle
scores (7.7 + 8.9)/2 = 8.3
Quartile (Q1) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
QL is median of bottom half = 6.3
Quartile (Q3) Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
QU is median of top half = 10.3
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
• Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values
Thinking Challenge
• You’re a financial analyst for
Prudential-Bache Securities.
You have collected the
following closing stock prices
of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
• What are the quartiles, Q1
and Q3, and the interquartile
range?
Quartile Solution*
Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
QL is the median of the bottom half, the average
of the two middle scores (12 + 13)/2 = 12.5
Quartile Solution*
Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
QU is the median of the top half, the average
of the two middle scores (17 + 18)/2 = 17.5
Interquartile Range Solution*
Interquartile Range
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Interquartile Range = Q3 – Q1 = 17.5 – 12.5 = 5
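The median-of-halves method used in these solutions can be sketched as follows. The helper name `quartiles` is hypothetical. For an odd number of observations the sketch leaves the middle value out of both halves, which is an assumption: the slides only show even-sized samples, and other quartile conventions (including those in some software) give slightly different answers.

```python
import statistics

def quartiles(data):
    """Return (QL, median, QU) using the median of the bottom and top halves."""
    ordered = sorted(data)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    bottom, top = ordered[:half], ordered[-half:]   # middle value excluded when n is odd
    return statistics.median(bottom), statistics.median(ordered), statistics.median(top)

q_l, m, q_u = quartiles([17, 16, 21, 18, 13, 16, 12, 11])
iqr = q_u - q_l
print(q_l, m, q_u, iqr)
```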
Box Plot
1. Graphical display of data using 5-number
summary
[Figure: box plot on a scale from 4 to 12 marking X_smallest, Q1, Median, Q3, and X_largest]
Box Plot
1. Draw a rectangle (box) with the ends
(hinges) drawn at the lower and upper
quartiles (QL and QU). The median is
shown by a line or symbol (such as “+”).
2. The points at distances 1.5(IQR) from each
hinge define the inner fences of the data set.
Lines (whiskers) are drawn from each hinge
to the most extreme measurements inside the
inner fences.
Box Plot
3. A second pair of fences, the outer fences, are
defined at a distance of 3(IQR) from the hinges.
One symbol (*) represents measurements falling
between the inner and outer fences, and another (0)
represents measurements beyond the outer fences.
4. Symbols that represent the median and extreme data
points vary depending on software used. You may
use your own symbols if you are constructing a box
plot by hand.
Shape & Box Plot
[Figure: box plots for left-skewed, symmetric, and right-skewed data, each marking Q1, Median, and Q3]
Detecting Outliers
Box Plots: Observations falling between the
inner and outer fences are deemed suspect
outliers. Observations falling beyond the
outer fence are deemed highly suspect
outliers.
z-scores: Observations with z-scores greater than
3 in absolute value are considered outliers.
(For some highly skewed data sets,
observations with z-scores greater than 2 in
absolute value may be outliers.)
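Both detection rules can be sketched in code. The function names and the labels they return are hypothetical, and the sketch assumes that a value lying exactly on a fence is not flagged, a detail the slides do not specify. The values 28, 40, and 27 below are made-up test points, not observations from the stock price data.

```python
def classify_by_fences(x, q_l, q_u):
    """Classify x using inner fences (1.5 * IQR) and outer fences (3 * IQR)."""
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    if x < outer[0] or x > outer[1]:
        return "highly suspect outlier"
    if x < inner[0] or x > inner[1]:
        return "suspect outlier"
    return "not flagged"

def is_outlier_by_z(x, mean, sd, threshold=3):
    """True when the z-score of x exceeds the threshold in absolute value."""
    return abs((x - mean) / sd) > threshold

# Stock price summary from earlier: QL = 12.5, QU = 17.5, mean = 15.5, s = 3.34.
for value in (21, 28, 40):
    print(value, classify_by_fences(value, 12.5, 17.5))
print(is_outlier_by_z(21, 15.5, 3.34), is_outlier_by_z(27, 15.5, 3.34))
```

With QL = 12.5 and QU = 17.5 the inner fences sit at 5 and 25 and the outer fences at –2.5 and 32.5, so none of the eight observed prices is flagged.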
2.9 Graphing Bivariate Relationships
Graphing Bivariate Relationships
• Describes a relationship between two
quantitative variables
• Plot the data in a scattergram (or scatterplot)
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]
Scattergram Example
• You’re a marketing analyst for Hasbro Toys.
You gather the following data:
Ad $ (x)   Sales (Units) (y)
1          1
2          1
3          2
4          2
5          4
• Draw a scattergram of the data
Scattergram Example
[Figure: scattergram of Sales (vertical axis, 0 to 4) against Advertising (horizontal axis, 0 to 5)]
2.10 The Time Series Plot
Time Series Plot
• Used to graphically display data produced over
time
• Shows trends and changes in the data over
time
• Time recorded on the horizontal axis
• Measurements recorded on the vertical axis
• Points connected by straight lines
Time Series Plot Example
• The following data show
the average retail price of
regular gasoline in New
York City for 8 weeks in
2006.
• Draw a time series plot
for these data.
Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
Time Series Plot Example
[Figure: time series plot of Price (vertical axis, 2.05 to 2.35) against Date (horizontal axis, weekly from 10/16 to 12/4)]
2.11 Distorting the Truth with Descriptive Statistics
Errors in Presenting Data
1. Use area to equate to value
2. No relative basis in
comparing data batches
3. Compress the vertical axis
4. No zero point on the vertical
axis
5. Gap in the vertical axis
6. Use of misleading wording
7. Knowing central tendency
without knowing variability
Reader Equates Area to Value
[Figure: two panels titled Bad Presentation and Good Presentation, each headed Minimum Wage. Values shown: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. One panel has a vertical $ axis (0, 2, 4) with the years 1960 to 1990 on the horizontal axis]
No Relative Basis
[Figure: A’s by Class for FR, SO, JR, SR. The Bad Presentation plots frequency (Freq., 0 to 300); the Good Presentation plots percent (%, 0% to 30%)]
Compressing Vertical Axis
[Figure: Quarterly Sales ($) for Q1 to Q4. The Bad Presentation uses a vertical axis from 0 to 200; the Good Presentation uses a vertical axis from 0 to 50]
No Zero Point on Vertical Axis
[Figure: Monthly Sales ($) for J, M, M, J, S, N. The Bad Presentation uses a vertical axis from 36 to 45; the Good Presentation uses a vertical axis from 0 to 60]
Gap in the Vertical Axis
Bad Presentation
Changing the Wording
Changing the title of the graph can influence the reader.
We’re not doing so well.
Still in prime years!
Knowing only central tendency
Knowing ONLY the central tendency might lead one
to purchase Model A. Knowing the variability as
well may change one’s decision!
Key Ideas
Describing Qualitative Data
1. Identify category classes
2. Determine class frequencies
3. Class relative frequency = (class freq)/n
4. Graph relative frequencies
Key Ideas
Graphing Quantitative Data
1 Variable
1. Identify class intervals
2. Determine class interval frequencies
3. Class interval relative frequency =
(class interval frequency)/n
4. Graph class interval relative frequencies
Key Ideas
Graphing Quantitative Data
2 Variables
Scatterplot
Key Ideas
Numerical Description of Quantitative Data
Central Tendency
Mean
Median
Mode
Key Ideas
Numerical Description of Quantitative Data
Variation
Range
Variance
Standard Deviation
Interquartile range
Key Ideas
Numerical Description of Quantitative Data
Relative standing
Percentile score
z-score
Key Ideas
Rules for Detecting Quantitative Outliers
Interval            Chebyshev’s Rule   Empirical Rule
$\bar{x} \pm s$     At least 0%        ≈ 68%
$\bar{x} \pm 2s$    At least 75%       ≈ 95%
$\bar{x} \pm 3s$    At least 89%       ≈ All
Key Ideas
Rules for Detecting Quantitative Outliers
Method     Suspect                                  Highly Suspect
Box plot   Values between inner and outer fences    Values beyond outer fences
z-score    2 < |z| < 3                              |z| > 3