Business Statistics:
A Decision-Making Approach
7th Edition
Chapter 3
Describing Data Using
Numerical Measures
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.
Review


A frequency histogram is an effective way of
converting quantitative data into useful and
meaningful information.

Visual indication of where data are centered

How much spread there is in the data around the center
Relative frequency: use when the total number of
observations differs across the surveys being compared.

Simply change frequency to relative frequency (%)
Review

Grouped Data Frequency Distributions

Continuous or discrete data: a large number of possible
values are spread over a large range
1. Determine the number of classes or groups
2. Establish class width
3. Determine boundaries
Chapter Goals
After completing this chapter, you should be able to:

Compute and interpret the mean, median, and mode for a
set of data

Compute the range, variance, and standard deviation and
know what these values mean

Construct and interpret a box and whisker graph

Compute and explain the coefficient of variation and
z scores

Use numerical measures along with graphs, charts, and
tables to describe data
Chapter Topics

Measures of Center and Location
  Mean, median, mode

Other measures of Location
  Weighted mean, percentiles, quartiles

Measures of Variation
  Range, interquartile range, variance and standard
  deviation, coefficient of variation

Using the mean and standard deviation together
  Coefficient of variation, z-scores
Summary Measures
Describing Data Numerically
Center and Location: Mean, Median, Mode, Weighted Mean
Other Measures of Location: Percentiles, Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Center and Location
Overview
Center and Location: Mean, Median, Mode, Weighted Mean
Mean (sample): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Mean (population): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Weighted mean (sample): $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$
Weighted mean (population): $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$
Mean (Arithmetic Average)

The Mean is the arithmetic average of data
values

Population mean (N = Population Size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

Sample mean (n = Sample Size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Mean (Arithmetic Average)
(continued)



The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis]
Values 1, 2, 3, 4, 5: Mean = $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
Values 1, 2, 3, 4, 10: Mean = $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
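The effect of a single extreme value on the mean can be checked with a short Python sketch. It uses the two five-value data sets from the slide; the variable names are illustrative only.

```python
from statistics import mean

# The two data sets from the slide: identical except for the last value.
base = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_base = mean(base)                # 15 / 5
mean_outlier = mean(with_outlier)     # 20 / 5

# Replacing one value (5 -> 10) moves the mean from 3 to 4.
print(mean_base, mean_outlier)
```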
Median


In an ordered array, the median is the “middle”
number, i.e., the number that splits the
distribution in half
The median is not affected by extreme values

Useful when data are highly skewed
[Figure: two dot plots on a 0 to 10 axis, one with an extreme value; Median = 3 in both]
Median Example
(continued)
Data array:
4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24

Sort the data (array data), note that n = 13

Find the i = (1/2)n position:
i = (1/2)(13) = 6.5

Since 6.5 is not an integer, round up to 7

The median is the value in the 7th position:
Md = 12
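The position rule above can be sketched in Python as follows. The function name is hypothetical, and the handling of an integer i (averaging the values in positions i and i + 1) is an assumption, since the slide only shows the non-integer case.

```python
import math

def median_by_position(values):
    """Median via the i = (1/2)n position rule used on the slide."""
    ordered = sorted(values)
    n = len(ordered)
    i = 0.5 * n
    if i != int(i):
        # Not an integer: round up and take the value in that position.
        position = math.ceil(i)
        return ordered[position - 1]   # positions are 1-based
    # Assumption (not shown on the slide): for an integer i,
    # average the values in positions i and i + 1.
    position = int(i)
    return (ordered[position - 1] + ordered[position]) / 2

data = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
md = median_by_position(data)
print(md)
```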
Shape of a Distribution

Describes how data is distributed

Symmetric or skewed
Left-Skewed: Mean < Median (longer tail extends to left)
Symmetric: Mean = Median
Right-Skewed: Median < Mean (longer tail extends to right)
Mode

The value that is repeated most often



Another measure of central location
Not affected by extreme values
Used for either numerical or categorical data


There may be no mode
There may be several modes
[Figure: dot plot on a 0 to 14 axis with Mode = 5; dot plot on a 0 to 6 axis with No Mode]
Weighted Mean

Used when values are grouped by frequency or
relative importance (GPA)
Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
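The repair-project calculation can be reproduced with a minimal Python sketch. The helper name `weighted_mean` is illustrative; the frequencies act as the weights.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

days = [5, 6, 7, 8]        # days to complete
freq = [4, 12, 8, 2]       # number of projects (the weights)

xw = weighted_mean(days, freq)   # 164 / 26
print(round(xw, 2))
```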
Review Example

Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
[Figure: the five houses on the hill, labeled $2,000 K, $500 K, $300 K, $100 K, $100 K]
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000

Mean: ($3,000,000/5)
= $600,000

Median: middle value of ranked data
= $300,000

Mode: most frequent value
= $100,000
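As a quick check, the three summary statistics for the house prices can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(prices)      # pulled upward by the $2,000,000 house
price_median = median(prices)  # middle value of the ranked data
price_mode = mode(prices)      # most frequent value

print(price_mean, price_median, price_mode)
```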
Which measure of location
is the “best”?

Mean is generally used, unless
extreme values (outliers) exist

Then Median is often used, since
the median is not sensitive to
extreme values.

Example: Realtors report “Median Home
Prices” rather than average prices, because the
median is less sensitive to outliers
Other Location Measures

Percentiles
The pth percentile in a data array (where 0 ≤ p ≤ 100):
  p% are less than or equal to this value
  (100 – p)% are greater than or equal to this value

Quartiles
  1st quartile = 25th percentile
  2nd quartile = 50th percentile = median
  3rd quartile = 75th percentile
Percentiles

The pth percentile in an ordered array of n values is the
value in ith position, where
$i = \frac{p}{100}(n)$

If i is not an integer, round up to the next higher integer value

Example: Find the 60th percentile in an ordered array of 19 values.
$i = \frac{p}{100}(n) = \frac{60}{100}(19) = 11.4$
So use the value in the i = 12th position
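A minimal Python sketch of the position rule follows. The function names are hypothetical. The slide does not say what to do when i is an integer; this sketch simply uses position i in that case, which is an assumption. The nine-value array is the one used in the quartile example in this chapter.

```python
import math

def percentile_position(p, n):
    """Return (i, position) for the pth percentile of n ordered values."""
    i = p * n / 100
    # Round up when i is not an integer; never go below position 1.
    position = max(1, math.ceil(i))
    return i, position

def percentile_value(ordered, p):
    """Value at the percentile position (assumes `ordered` is sorted)."""
    _, position = percentile_position(p, len(ordered))
    return ordered[position - 1]   # positions are 1-based

# 60th percentile of 19 values: i = 11.4, so use the 12th position.
i, position = percentile_position(60, 19)
print(i, position)

# 25th percentile of the nine-value quartile example.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile_value(sample, 25)
print(q1)
```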
Quartiles

Quartiles split the ranked data into 4 equal
groups:
[Figure: ranked data divided by Q1, Q2, and Q3 into four groups, each holding 25% of the values]
Note that the second quartile (the 50th percentile)
is the median
Quartiles

Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 = 25th percentile, so find i :
$i = \frac{25}{100}(9) = 2.25$
so round up and use the value in the 3rd position: Q1 = 13
Box and Whisker Plot

A box-and-whisker plot (often simply called a box plot)
is a graphical way of showing data. It is useful for
quickly finding outliers - data points out of line with the
rest of the data set.
Example:
[Figure: box-and-whisker plot marking outliers (* *), lower limit, 1st quartile, median, 3rd quartile, and upper limit, with 25% of the values in each of the four segments]
Shape of Box and Whisker Plots

The Box and central line are centered between the
endpoints if data is symmetric around the median

(A Box and Whisker plot can be shown in either
vertical or horizontal format)
Distribution Shape and
Box and Whisker Plot
[Figure: box-and-whisker plots for Left-Skewed, Symmetric, and Right-Skewed distributions, each marked with Q1, Q2, Q3]
Constructing the Box and Whisker Plot
See the additional note for quartiles on the class website
[Figure: box-and-whisker plot marking outliers (* *), lower limit, 1st quartile, median, 3rd quartile, and upper limit]
The lower limit is Q1 – 1.5 (Q3 – Q1)
The upper limit is Q3 + 1.5 (Q3 – Q1)
The center box extends from Q1 to Q3
The line within the box is the median
The whiskers extend to the smallest and largest values within
the calculated limits
Outliers are plotted outside the calculated limits
Box-and-Whisker Plot Example
Try this using the online tool on the class website

Below is a Box-and-Whisker plot for the following data:
Data: 0 2 2 2 3 3 4 5 6 11 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 6, Max = 27
[Figure: box-and-whisker plot on an axis marked 0, 2, 3, 6, 12, 27, with 27 plotted as *]
Upper limit = Q3 + 1.5 (Q3 – Q1)
= 6 + 1.5 (6 – 2) = 12
27 is above the upper limit so is shown as an outlier
This data is right skewed, as the plot depicts
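The limit and outlier calculation for this example can be sketched in Python. The function name is hypothetical, and Q1 = 2 and Q3 = 6 are taken from the slide rather than recomputed.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [0, 2, 2, 2, 3, 3, 4, 5, 6, 11, 27]
q1, q3 = 2, 6                      # quartiles as given on the slide

lower, upper = box_plot_limits(q1, q3)

# Values outside the limits are plotted as outliers.
outliers = [x for x in data if x < lower or x > upper]

# Whiskers reach the smallest and largest values inside the limits.
inside = [x for x in data if lower <= x <= upper]
whisker_low, whisker_high = min(inside), max(inside)

print(lower, upper, outliers, whisker_low, whisker_high)
```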
Measures of Variation
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Variation

Measures of variation give information on
the spread or variability of the data
values.
Same center,
different variation
Range


Simplest measure of variation
Difference between the largest and the smallest
observations:
Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range

Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with different distributions; in both, Range = 12 - 7 = 5]

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
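The sensitivity of the range to a single outlier can be checked with a short Python sketch using the two data sets above. The helper name is illustrative.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# The two 25-value data sets from the slide; only the last value differs.
typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)
print(range_typical, range_outlier)
```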
Interquartile Range

Can eliminate some outlier problems by using
the interquartile range

Eliminate some high- and low-valued
observations and calculate the range from the
remaining values.

Interquartile range = 3rd quartile – 1st quartile
Interquartile Range Example
Example:
[Figure: box plot with X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70; 25% of the values fall in each of the four segments]
Interquartile range
= 57 – 30 = 27
Variance

Average of squared deviations of values from
the mean

Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation



Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Calculation Example:
Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8
Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
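The same calculation can be followed step by step in Python. This is a sketch using the eight sample values from the slide; the final line cross-checks the result against `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)     # squared deviations
s = math.sqrt(sum_sq_dev / (n - 1))                  # divide by n - 1

print(x_bar, sum_sq_dev, round(s, 4))
print(abs(s - stdev(data)) < 1e-9)   # agrees with the library function
```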
Comparing Standard Deviations
Same mean, but different
standard deviations:
[Figure: three dot plots on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Is used to compare two or more sets of data
measured in different units
Population:
$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$

Sample:
$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Comparing Coefficients
of Variation

Stock A:
  Average price last year = $50
  Standard deviation = $5
  $CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$

Stock B:
  Average price last year = $100
  Standard deviation = $5
  $CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$

Both stocks have the same standard deviation, but
stock B is less variable relative to its price
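The stock comparison can be reproduced with a small Python sketch. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

# Same standard deviation, but B varies less relative to its price.
print(cv_a, cv_b)
```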
The Empirical Rule


If the data distribution is bell-shaped, then
the interval:
μ ± 1σ contains about 68% of the values in
the population or the sample
[Figure: bell curve with 68% of the area between μ – 1σ and μ + 1σ]
The Empirical Rule


μ ± 2σ contains about 95% of the values in
the population or the sample
μ ± 3σ contains about 99.7% of the values
in the population or the sample
[Figure: bell curves with 95% of the area within μ ± 2σ and 99.7% within μ ± 3σ]
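The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below assumes that "bell-shaped" means an exactly normal distribution, in which case the share of values within k standard deviations of the mean is erf(k / √2); the function name is illustrative.

```python
import math

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations
    of the mean (assumes an exactly normal distribution)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_share_within(k), 1))
```

For real data that are only roughly bell-shaped, the observed shares will differ somewhat from these values, which is why the rule says "about".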
Tchebysheff’s Theorem

Regardless of how the data are distributed,
at least (1 – 1/k²) of the values will fall within
k standard deviations of the mean

Examples:
At least                within
(1 – 1/1²) = 0%         k=1 (μ ± 1σ)
(1 – 1/2²) = 75%        k=2 (μ ± 2σ)
(1 – 1/3²) = 89%        k=3 (μ ± 3σ)
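The bound for any k can be computed with a one-line Python function. This is a sketch; the function name is illustrative.

```python
def tchebysheff_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(100 * tchebysheff_bound(k)))
```

Because the bound holds for any distribution, it is weaker than the Empirical Rule: for k = 2 it guarantees only 75%, compared with about 95% for bell-shaped data.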
Standardized Data Values

A standardized data value refers to
the number of standard deviations a
value is from the mean

Standardized data values are
sometimes referred to as z-scores
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
  x = original data value
  μ = population mean
  σ = population standard deviation
  z = standard score
  (number of standard deviations x is from μ)
Standardized Sample Values
xx
z
s
where:
 x = original data value
 x = sample mean
 s = sample standard deviation
 z = standard score
(number of standard deviations x is from μ)
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.
Chap 3-44
Standardized Value Example

IQ scores in a large population have a bell-shaped distribution with mean μ = 100 and
standard deviation σ = 15
Find the standardized score (z-score) for a
person with an IQ of 121.
Answer:
$z = \frac{x - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4$
Someone with an IQ of 121 is 1.4 standard deviations
above the mean
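The IQ example can be reproduced with a minimal Python sketch. The function name is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev

z = z_score(121, 100, 15)   # (121 - 100) / 15
print(round(z, 1))
```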
Using Microsoft Excel

Descriptive Statistics are easy to obtain
from Microsoft Excel

Use menu choice:
Data / data analysis / descriptive statistics

Enter details in dialog box
Using Excel

Select:
Data / data analysis / descriptive statistics
Using Excel
(continued)

Enter dialog box
details

Check box for
summary statistics

Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Chapter Summary

Described measures of center and location

Mean, median, mode, weighted mean

Discussed percentiles and quartiles

Created Box and Whisker Plots

Illustrated distribution shapes

Symmetric, skewed
Chapter Summary
(continued)

Described measures of variation

Range, interquartile range, variance,
standard deviation, coefficient of variation

Discussed Tchebysheff’s Theorem

Calculated standardized data values