Download The Population Variance

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Section 2.3 Measures of Variation
Figure 2.31 indicates that we need measures of
variation to express how the two distributions differ.
Figure 2.31 Repair Times for Personal Computers at Two Service Centers
Chapter 2 Descriptive Statistics
Measures of Variation
Range
Largest minus the smallest measurement
The Population Variance  2 (pronounced sigma
squared)
The average of the squared deviations of all
the population measurements from the
population mean
Standard Deviation  (pronounced sigma)
The square root of the variance
Chapter 2 Descriptive Statistics
The Range
Range = largest measurement - smallest measurement
The range measures the interval spanned by all the data
Example 2.3: Internist’s Salaries (in thousands of
dollars)
127 132 138 141 144 146 152 154 165 171 177 192 241
Range = 241 - 127 = 114 ($114,000)
Chapter 2 Descriptive Statistics
The Variance
Population X1, X2, …, XN
Sample x1, x2, …, xn
s2
2
Population Variance
N
2 
Sample Variance
n
 X i -  
2
s 2=
i=1
N
Chapter 2 Descriptive Statistics
2


x
x
 i
i=1
n-1
The Variance
For a population of size N, the population variance 2 is defined
as
N
2 
2


x


 i
i 1
N
2
2
2

x1     x2       xN   

N
For a sample of size n, the sample variance s2 is defined as
n
s2 
2


x

x
 i
i 1
n 1
2
2
2

x1  x   x2  x     xn  x 

n 1
and is a point estimate for 2
Chapter 2 Descriptive Statistics
The Standard Deviation
Population Standard Deviation, :
Sample Standard Deviation, s:
 
s s
Chapter 2 Descriptive Statistics
2
2
Example 2.5
Consider the population of profit margins for five of the
best big companies in America as rated by Forbes
magazine on its website on March 16, 2005. These profit
margins are 8%, 10%, 15%, 12% and 5%.
Population Mean

8  10  15  12  5 50

 10 %
5
5
Population 2 8  102  10  102  15  102  12  102  5  102
 
Variance
5
2
2

 2   0 2  52  2 2   5

5
4  0  25  4  25 58
2


 11.6% 
5
5
Population Standard Deviation

Chapter 2 Descriptive Statistics
 2  11 .6  3.406 %
Example 2.6 The Car Mileage Case
Sample variance and standard deviation for first five car
mileages from Table 2.1
30.8, 31.7, 30.1, 31.6, 32.1
so x  31.26
2
  xi  x 
5
s2 
i 1
5 1

30.8  31.26 2  31.7  31.26 2  30.1  31.26 2  31.6  31.26 2  32.1  31.26 2

4
= 2.572 /4 = 0.643
s  s 2  .643  0.8019
Chapter 2 Descriptive Statistics
Sample variance and standard deviation for all car mileages
from Table 2.1, Recall x  31.5531 .
49
s 
2
2


x

x
 i
i 1
49  1
30.66204

 0.638793
48
s  s 2  0.638793  0.7992
The point estimate of the variance of all cars is 0.638793 mpg2
and the point estimate of the standard deviation of all cars is
0.7992 mpg.
Chapter 2 Descriptive Statistics
The computational formula for the sample variance
s2
2
n


 
  xi  

1  n
 

xi2   i 1


n  1  i 1
n






The Payment Time Case
Example 2.7
Consider the sample of 65 payment times in Table 2.2.
65
x
i 1
i
65
x
i 1
2
i
 x1  x2    x65  22  19    21  1,177
2
 x12  x22    x65
 (22) 2  (19) 2    (21) 2  22,317
Therefore
1 
(1,177) 2  1,004.2464
s 
 15.69135
22,317 

(65  1) 
65 
64
2
and s  s 2  15.69135  3.9612 Days.
Chapter 2 Descriptive Statistics
The Empirical Rule for
Normal Populations
If a population has mean  and standard deviation  and
is described by a normal curve (symmetrical, bell-shaped
distribution), then
1. 68.26% of the population measurements lie within one
standard deviation of the mean: [, ]
2. 95.44% of the population measurements lie within two
standard deviations of the mean: [2, 2]
3. 99.73% of the population measurements lie within
three standard deviations of the mean: [3, 3]
Chapter 2 Descriptive Statistics
3- 12
Empirical Rule
68%
95%
99.7%
3
2 1

1 2  3
Chapter 2 Descriptive Statistics
Tolerance Intervals
An Interval that contains a specified percentage of the
individual measurements in a population is called a
tolerance interval.
 The one, two, and three standard deviation intervals
around  given in (1), (2) and (3) are tolerance
intervals containing, respectively, 68.26 percent, 95.44
percent and 99.73 percent of the measurements in a
normally distributed population.
 The three-sigma interval   3 ] to be a tolerance
interval that contains almost all of the measurements
in a normally distributed population.
Chapter 2 Descriptive Statistics
Example 2.8
The Car Mileage Case
Recall x  31.5531 and standard deviation S= 0.7992

68.26% of all individual cars will have mileages in
the range
x  s]  31.6  0.8]  30.8,32.4] mpg

95.44% of all individual cars will have mileages in
the range
x  2s]  31.6 1.6]  30.0,33.2] mpg

99.73% of all individual cars will have mileages in
the range
x  3s]  31.6  2.4]  29.2,34.0] mpg
Chapter 2 Descriptive Statistics
Example 2.8
The Car Mileage Case
Chapter 2 Descriptive Statistics
Skewness and the Empirical Rule
 The Empirical Rule holds for normally distributed
populations.
 This rule also approximately holds for populations
having mound-shaped (single-peaked) distributions that
are not very skewed to the right or left.
 For example , Recall that the distribution of 65
payment times, it indicates that the empirical rule holds.
Chapter 2 Descriptive Statistics
Chebyshev’s Theorem
Let  and  be a population’s mean and standard
deviation, then for any value k > 1,
At least 100(1 - 1/k2 )% of the population measurements
lie in the interval:
[k, k]
Use only for non-mound distributions
• Skewed
 But not extremely skewed. (If a population is
extremely skewed, it is best to measure variation
by using percentiles.)
• Bimodal
Chapter 2 Descriptive Statistics
Empirical Rule and Chebyshev’s
Theorem
Population
Empirical Rule
Normal
Mound-shaped,
not very skewed
Chebyshev
Any population but
not extremely
skewed
[μ-σ, μ+σ]
About 68%
[μ-2σ, μ+2σ]
About 95%
At least 75%
[μ-3σ, μ+3σ]
About 99.7%
At least 88.89%
Chapter 2 Descriptive Statistics
Exercise Shipping Times
(Chebychev’s Theorem and the Empirical Rule)
A group of 13 students is studying in Istanbul, Tukey,
for five weeks. As part of their study of the local
economy, they each purchased an Oriental rug and
arranged for its shipment back to the United States. The
shipping time, in days, For each rug was
31 31 42 39 42 43 34
30 28 36 37 35 40
Estimate the percentage of days that are within two
standard deviations of the Mean. Is it likely to take 2
months for a delivery?
Chapter 2 Descriptive Statistics
z Scores

For any x in a population or sample, the associated z
score is
x  mean
z
standard deviation

The z score is the number of standard deviations that
x is from the mean
 A positive z score is for x above (greater than) the
mean
 A negative z score is for x below (less than) the
mean
Chapter 2 Descriptive Statistics
Example 2.9
Population of profit margins for five big American companies:
8%, 10%, 15%, 12%, 5%
 = 10%,  = 3.406%
x
x – mean
z score
8
8 – 10 = -2
-2/3.406 = -0.59
10
10- 10 = 0
0/3.406 = 0.00
15
15 – 10 = 5
5/3.406 = 1.47
12
12 – 10 = 2
2/3.406 = 0.59
5
5 – 10 = -5
-5/3.406 = -1.47
Chapter 2 Descriptive Statistics
Coefficient of Variation



Measures the size of the standard deviation relative to the
size of the mean
Coefficient of Variation = standard deviation ×100%
mean
Used to:
 compare the relative variabilities of values about the
mean
 Compare the relative variability of populations or
samples with different means and different standard
deviations
 Measure risk
Chapter 2 Descriptive Statistics
Table: Rates of Return: Asset A and B
Rates of Return
Year
5 years ago
4 years ago
3 years ago
2 years ago
1 year ago
Total
Average rate of Return
Standard deviation
Expected
Return
Asset A (house)
Asset B (stock)
11.3%
12.5
13.0
12.0
12.2
61.0
12.2%
0.63
9.4%
17.1
10.3
11.2
13.5
61.5
12.3%
3.08
Which of these two single asset is better?
Chapter 2 Descriptive Statistics
Section 2.4 Percentiles, Quartiles and Boxand-Whiskers Display
For a set of measurements arranged in increasing order,
the pth percentile is a value such that p percent of the
measurements fall at or below the value and (100-p)
percent of the measurements fall at or above the value
The first quartile Q1 is the 25th percentile
The second quartile (or median) Md is the 50th percentile
The third quartile Q3 is the 75th percentile
The interquartile range IQR is Q3 - Q1
Chapter 2 Descriptive Statistics
One of Procedures for calculating
Percentiles
1. Arrange the measurements in increasing order .
p
2. Calculate the index i  n 
100
A. If i is not integer, the next integer greater than i denotes the
Position of the pth percentile in the ordered arrangement.
B. If i is integer, then the pth percentile is the average of the
The measurement in positions i and i+1 in the ordered
arrangement
Chapter 2 Descriptive Statistics
Example 2.10
DVD Recorder Satisfaction
20 customer satisfaction ratings:
1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10
Md = (8+8)/2 = 8
Q1 = (7+8)/2 = 7.5
Q3 = (9+9)/2 = 9
IQR = Q3  Q1 = 9  7.5 = 1.5
Chapter 2 Descriptive Statistics
Five-Number Summary





Smallest Value
First Quartile
Median
Third Quartile
Largest Value
Chapter 2 Descriptive Statistics
Using stem-and-leaf displays to find percentiles
(a)The 75th percentile of the 65 payment times, and a fivenumber summary
Chapter 2 Descriptive Statistics
(b) The 5th percentile of the 60 bottle design ratings and a
five-number summary
Chapter 2 Descriptive Statistics
Box-and-Whiskers Plots

The box plots the:
 first quartile, Q1
 median, Md
 third quartile, Q3
 inner fences, located 1.5IQR away from the quartiles:



= Q1 – (1.5  IQR)
= Q3 + (1.5  IQR)
outer fences, located 3IQR away from the quartiles:


= Q1 – (3  IQR)
= Q3 + (3  IQR)
Chapter 2 Descriptive Statistics

The “whiskers” are dashed lines that plot the range of
the data
 A dashed line drawn from the box below Q1 down to
the smallest measurement
 Another dashed line drawn from the box above Q3
up to the largest measurement
Example: 20 customer satisfaction ratings:
1 3 5 5 7 8 8 8 8 8 8 9 9 9 9 9 10 10 10 10
Chapter 2 Descriptive Statistics
Outliers


Outliers are measurements that are very different from
most of the other measurements
 Because they are either very much larger or very much
smaller than most of the other measurements
Outliers lie beyond the fences of the box-and-whiskers plot
 Measurements between the inner and outer fences are
mild outliers
 Measurements beyond the outer fences are severe
outliers
Chapter 2 Descriptive Statistics
Chapter 2 Descriptive Statistics
Section 2.5 Describing Qualitative Data
Pie charts of the proportion (as percent) of all cars sold
in the United States by different manufacturers, 1970
versus 1997
Chapter 2 Descriptive Statistics
Population and Sample Proportions
Population X1, X2, …, XN
Sample x1, x2, …, xn
pˆ
p
Sample Proportion
Population Proportion
n
pˆ 
x
i
i =1
n
p^ is the point estimate of p
Chapter 2 Descriptive Statistics
Example 2.11 The Marketing Ethics Case
117 out of 205 marketing researchers disapproved
of action taken in a hypothetical scenario
X = 117, number of researches who disapprove
n = 205, number of researchers surveyed
Sample Proportion:
p̂ 
X 117

 0.57
n 205
Chapter 2 Descriptive Statistics
Bar Chart
Percentage of Automobiles Sold by Manufacturer, 1970
versus 1997
Chapter 2 Descriptive Statistics
Pie Chart
Percentage of Automobiles Sold by Manufacturer,1997
Chapter 2 Descriptive Statistics
An Bar Chart of U.S Automobile Sales in 1997
Chapter 2 Descriptive Statistics
Misleading Graphs and Charts:
Scale Break
Break the vertical scale to exaggerate effect
Mean Salaries at a Major University, 2002 - 2005
Chapter 2 Descriptive Statistics
Misleading Graphs and Charts: Scale Effects
Compress vs. stretch the vertical axis to exaggerate or minimize
the effect
Mean Salary Increases at a Major University, 2002 - 2005
Chapter 2 Descriptive Statistics
Weighted Means

Sometimes, some measurements are more important than
others
 Assign numerical “weights” to the data


Weights measure relative importance of the value
Calculate weighted mean as
w x
w
i
i
i
where wi is the weight assigned to the ith measurement xi
Chapter 2 Descriptive Statistics
Example 2.12
June 2001 unemployment rates in the U.S. by region
Census Region
Civilian Labor Force Unemployment
(millions)
Rate (%)
Northeast
26.9
4.1
South
50.6
4.7
Midwest
34.7
4.4
West
32.5
5.0
Want the mean unemployment rate for the U.S.
Chapter 2 Descriptive Statistics

Calculate it as a weighted mean
 So that the bigger the region, the more heavily it counts
in the mean
The data values are the regional unemployment
rates
 The weights are the sizes of the regional labor
forces




26 .9  4.1  50 .6  4.7   34 .7  4.4  32 .5  5.0
26 .9  50 .6  34 .7  25 .5  32 .5
663 .29
 4.58 %
144 .7
Note that the unweigthed mean is 4.55%, which
underestimates the true rate by 0.03%

That is, 0.0003  144.7 million = 43,410 workers
Chapter 2 Descriptive Statistics
Related documents