Statistics
Numerical Representation of Data
Part 2 – Measures of Variation
Warm-up
Approximate the mean of the frequency distribution.
Class | Frequency, f
1–6 | 21
7–12 | 16
13–18 | 28
19–24 | 13
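One way to check this warm-up is sketched below in Python. It assumes the usual convention for grouped data: every entry sits at its class midpoint. The sketch takes each midpoint as x and then divides Σxf by n = Σf. The variable names are illustrative and do not come from the lesson.

```python
# Approximate the mean of a frequency distribution, assuming each class is
# represented by its midpoint (the usual convention for grouped data).
classes = [(1, 6), (7, 12), (13, 18), (19, 24)]  # (lower limit, upper limit)
freqs = [21, 16, 28, 13]                          # frequency, f

midpoints = [(lo + hi) / 2 for lo, hi in classes]      # 3.5, 9.5, 15.5, 21.5
n = sum(freqs)                                         # Σf = 78
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))  # Σxf = 939.0
approx_mean = sum_xf / n                               # Σxf / n

print(f"n = {n}, sum of xf = {sum_xf}, mean ≈ {approx_mean:.2f}")  # mean ≈ 12.04
```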
Warm-up – What can be said about the relationship between the mean and median in the dotplot below?
a) The mean is smaller than the median.
b) The mean is bigger than the median.
c) The mean is equal to the median.
d) Nothing can be determined based on the graph.
Warm-up – An investor was interested in determining how much gain she had in her 401(k) plan in the last 6 quarters. The data are listed below. Find the median and the mean of the data.
-510 110 1230 1900 -680 1700
a) Mean = 1021.7, Median = 670
b) Mean = 1021.7, Median = 1565
c) Mean = 625, Median = 3.5
d) Mean = 625, Median = 670
e) Mean = 625, Median = 1565
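You can check the answer with Python's standard `statistics` module, which computes both measures directly. With an even number of entries, `median` averages the two middle values of the sorted data.

```python
from statistics import mean, median

# Quarterly gains (dollars) in the investor's 401(k) plan
gains = [-510, 110, 1230, 1900, -680, 1700]

print(sorted(gains))   # [-680, -510, 110, 1230, 1700, 1900]
print(mean(gains))     # 625    (3750 / 6)
print(median(gains))   # 670.0  ((110 + 1230) / 2)
```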
Agenda
• Warm-up
• Homework Review
• Lesson Objectives
  • Determine the range of a data set
  • Determine the variance and standard deviation of a population and of a sample
  • Use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
  • Approximate the sample standard deviation for grouped data
• Summary
• Homework
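Measures of Variation
Consider the following wait times (in minutes) at two banks:
Statistic | Bank 1 | Bank 2
Median | 7.20 | 7.20
Mean | 7.15 | 7.15
Complaints/mo | 3 | 22
The two banks have identical medians and means, yet very different complaint rates. Measures of center alone cannot describe how the wait times differ.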
Objective 1
Compute the range of a variable from raw data
Range
• The difference between the maximum and minimum data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)
Example: Finding the Range
A corporation hired 10 graduates. The starting
salaries for each graduate are shown. Find the
range of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution: Finding the Range
• Ordering the data helps to find the least and greatest salaries.
37 38 39 41 41 41 42 44 45 47
(minimum = 37, maximum = 47)
Range = (Max. salary) – (Min. salary) = 47 – 37 = 10
The range of starting salaries is 10, or $10,000.
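The range calculation takes one line in Python. This sketch applies it to the salary data above and to the travel-time data in the next example. `data_range` is an illustrative helper name and is not part of the lesson.

```python
# Range = (maximum data entry) - (minimum data entry); data must be quantitative.
def data_range(data):
    """Hypothetical helper: difference between the largest and smallest entries."""
    return max(data) - min(data)

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # starting salaries, $1000s
travel_times = [23, 36, 23, 18, 5, 26, 43]           # minutes

print(sorted(salaries))          # [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
print(data_range(salaries))      # 10  ($10,000)
print(data_range(travel_times))  # 38  (43 - 5 minutes)
```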
EXAMPLE Finding the Range of a Set of Data
The following data represent the travel times (in minutes)
to work for all seven employees of a start-up web
development company.
23, 36, 23, 18, 5, 26, 43
Find the range.
Range = 43 – 5 = 38 minutes
Objective 2
Compute the variance of a variable from raw data
Deviation, Variance, and Standard Deviation
Deviation
• The difference between the data entry, x, and the mean of the data set.
• Population data set: Deviation of x = x – μ
• Sample data set: Deviation of x = x – x̄
Example: Finding the Deviation
A corporation hired 10 graduates. The starting
salaries for each graduate are shown. Find the
deviation of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution:
• First determine the mean starting salary.
$$\mu = \frac{\sum x}{N} = \frac{415}{10} = 41.5$$
Solution: Finding the Deviation
• Determine the deviation for each data entry.
Salary ($1000s), x | Deviation: x – μ
41 | 41 – 41.5 = –0.5
38 | 38 – 41.5 = –3.5
39 | 39 – 41.5 = –2.5
45 | 45 – 41.5 = 3.5
47 | 47 – 41.5 = 5.5
41 | 41 – 41.5 = –0.5
44 | 44 – 41.5 = 2.5
41 | 41 – 41.5 = –0.5
37 | 37 – 41.5 = –4.5
42 | 42 – 41.5 = 0.5
Σx = 415 | Σ(x – μ) = 0
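The same deviation table can be computed in a few lines. The sketch also confirms that deviations about the mean sum to zero. For these half-integer deviations the float sum is exactly 0.0. For arbitrary data, expect only a value very close to zero because of floating-point rounding.

```python
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # $1000s, treated as a population
N = len(salaries)
mu = sum(salaries) / N                 # 415 / 10 = 41.5

deviations = [x - mu for x in salaries]  # x - μ for each entry
for x, d in zip(salaries, deviations):
    print(f"{x:>3}  {d:+.1f}")

# Deviations about the mean always sum to zero (up to floating-point rounding)
print(sum(deviations))                 # 0.0
```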
To order food at a McDonald’s or a Wendy’s restaurant, you must stand in line. The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers at each restaurant during the lunch hour. For each sample, answer the following:
(a) What was the mean wait time?
(b) Draw a histogram of each restaurant’s wait time.
(c) Which restaurant’s wait time appears more dispersed? Which line would you prefer to wait in? Why?
Wait Time at Wendy’s (minutes)
1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00
Wait Time at McDonald’s (minutes)
3.50, 0.00, 1.97, 0.00, 3.08, 0.00, 0.26, 0.71, 0.28, 2.75,
0.38, 0.14, 2.22, 0.44, 0.36, 0.43, 0.60, 4.54, 1.38, 3.10,
1.82, 2.33, 0.80, 0.92, 2.19, 3.04, 2.54, 0.50, 1.17, 0.23
(a) The mean wait time in each line is 1.39 minutes.
(b) [Histograms of each restaurant’s wait times not reproduced.]
The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the mean of the squared deviations about the population mean:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
The population variance is symbolically represented by σ² (lowercase Greek sigma squared).
Note: When using the above formula, do not round until the last computation. Use as many decimals as your calculator allows in order to avoid round-off errors.
EXAMPLE Computing a Population Variance
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population variance of this data. Recall that
$$\mu = \frac{174}{7} \approx 24.85714$$
xi | μ | xi – μ | (xi – μ)²
23 | 24.85714 | –1.85714 | 3.44898
36 | 24.85714 | 11.14286 | 124.1633
23 | 24.85714 | –1.85714 | 3.44898
18 | 24.85714 | –6.85714 | 47.02041
5 | 24.85714 | –19.8571 | 394.3061
26 | 24.85714 | 1.142857 | 1.306122
43 | 24.85714 | 18.14286 | 329.1633
Totals | | | Σ(xi – μ)² = 902.8571
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{902.8571}{7} \approx 129.0 \text{ minutes}^2$$
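The definitional formula translates directly into a short Python sketch. The mean is kept at full precision, following the "do not round until the last computation" advice. As a cross-check, the result is compared with `statistics.pvariance`, which divides by N.

```python
from statistics import pvariance

# Travel times (minutes) for all seven employees: this is a population
travel = [23, 36, 23, 18, 5, 26, 43]
N = len(travel)
mu = sum(travel) / N                 # 174 / 7, kept at full precision (no early rounding)

squared_devs = [(x - mu) ** 2 for x in travel]
ss = sum(squared_devs)               # sum of squared deviations, about 902.8571
pop_var = ss / N                     # population variance: divide by N

print(f"mu = {mu:.5f}  SS = {ss:.4f}  sigma^2 = {pop_var:.1f} minutes^2")
# mu = 24.85714  SS = 902.8571  sigma^2 = 129.0 minutes^2
print(abs(pop_var - pvariance(travel)) < 1e-9)   # True: matches the library result
```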
The Computational Formula
$$\sigma^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}$$
EXAMPLE Computing a Population Variance Using the Computational Formula
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population variance of this data using the computational formula.
Data set: 23, 36, 23, 18, 5, 26, 43
$$\sum x_i^2 = 23^2 + 36^2 + \cdots + 43^2 = 5228$$
$$\sum x_i = 23 + 36 + \cdots + 43 = 174$$
$$\sigma^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N} = \frac{5228 - \frac{174^2}{7}}{7} \approx 129.0$$
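The sketch below computes the variance both ways and confirms that the computational (shortcut) formula agrees with the definitional one. In floating-point arithmetic the shortcut can lose precision when the values are large and close together, because it subtracts two nearly equal numbers. For small classroom data sets like this one, the difference is negligible.

```python
travel = [23, 36, 23, 18, 5, 26, 43]   # minutes, population of seven employees
N = len(travel)
sum_x = sum(travel)                     # Σx_i = 174
sum_x2 = sum(x * x for x in travel)     # Σx_i^2 = 5228

# Computational (shortcut) formula: σ^2 = [Σx^2 - (Σx)^2 / N] / N
pop_var_shortcut = (sum_x2 - sum_x ** 2 / N) / N

# Definitional formula for comparison: σ^2 = Σ(x - μ)^2 / N
mu = sum_x / N
pop_var_definition = sum((x - mu) ** 2 for x in travel) / N

print(sum_x, sum_x2)                                      # 174 5228
print(round(pop_var_shortcut, 1))                         # 129.0
print(abs(pop_var_shortcut - pop_var_definition) < 1e-9)  # True
```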
The sample variance is computed by determining the sum of squared deviations about the sample mean and then dividing this result by n – 1:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the mean by n – 1.
EXAMPLE Computing a Sample Variance
Previously, we obtained the following simple random sample for the travel time data: 5, 36, 26.
Compute the sample variance of travel time.
Travel Time, xi | Sample Mean, x̄ | Deviation about the Mean, xi – x̄ | Squared Deviation about the Mean, (xi – x̄)²
5 | 22.333 | 5 – 22.333 = –17.333 | (–17.333)² = 300.432889
36 | 22.333 | 13.667 | 186.786889
26 | 22.333 | 3.667 | 13.446889
Totals | | | Σ(xi – x̄)² = 500.66667
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{500.66667}{3 - 1} \approx 250.333 \text{ square minutes}$$
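Here is a short sketch of the sample variance. It also shows what happens if you divide by n instead of n – 1, which tends to underestimate σ² as the bias note explains. Python's `statistics.variance` divides by n – 1, so it serves as a cross-check.

```python
from statistics import variance

sample = [5, 36, 26]                       # simple random sample of travel times (minutes)
n = len(sample)
xbar = sum(sample) / n                     # 67 / 3 ≈ 22.333

ss = sum((x - xbar) ** 2 for x in sample)  # Σ(x_i - x̄)^2 ≈ 500.667
s2 = ss / (n - 1)                          # divide by n - 1: unbiased estimate of σ^2
s2_biased = ss / n                         # dividing by n tends to underestimate σ^2

print(f"{s2:.3f}")                         # 250.333
print(f"{s2_biased:.3f}")                  # 166.889
print(abs(s2 - variance(sample)) < 1e-9)   # True: statistics.variance uses n - 1
```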
Objective 3
Compute the standard deviation of a variable from raw data
The population standard deviation is denoted by σ. It is obtained by taking the square root of the population variance, so that
$$\sigma = \sqrt{\sigma^2}$$
The sample standard deviation is denoted by s. It is obtained by taking the square root of the sample variance, so that
$$s = \sqrt{s^2}$$
EXAMPLE Computing a Population Standard Deviation
The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.
23, 36, 23, 18, 5, 26, 43
Compute the population standard deviation of this data.
Recall, from the last objective, that σ² = 129.0 minutes². Therefore,
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{902.8571}{7}} \approx 11.4 \text{ minutes}$$
EXAMPLE Computing a Sample Standard Deviation
Recall that the sample data 5, 26, 36 result in a sample variance of
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{500.66667}{3 - 1} \approx 250.333 \text{ square minutes}$$
Use this result to determine the sample standard deviation.
$$s = \sqrt{s^2} = \sqrt{\frac{500.666667}{3 - 1}} \approx 15.8 \text{ minutes}$$
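Both standard deviations above are square roots of the matching variances. The only difference is the divisor: N for the population and n – 1 for the sample. In this sketch, `sum_sq_dev` is a hypothetical helper name. The results are cross-checked against `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by n – 1).

```python
import math
from statistics import pstdev, stdev

population = [23, 36, 23, 18, 5, 26, 43]   # travel times for all seven employees
sample = [5, 26, 36]                        # simple random sample of those times

def sum_sq_dev(data):
    """Hypothetical helper: sum of squared deviations about the mean of data."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

sigma = math.sqrt(sum_sq_dev(population) / len(population))   # σ = √σ², divide by N
s = math.sqrt(sum_sq_dev(sample) / (len(sample) - 1))          # s = √s², divide by n - 1

print(round(sigma, 1), round(pstdev(population), 1))   # 11.4 11.4
print(round(s, 1), round(stdev(sample), 1))            # 15.8 15.8
```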
EXAMPLE Comparing Standard Deviations
Determine the standard deviation of the waiting time for Wendy’s and for McDonald’s. Which is larger? Why?
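Sample standard deviation for Wendy’s: 0.738 minutes
Sample standard deviation for McDonald’s: 1.265 minutes
McDonald’s has the larger standard deviation: its wait times are more spread out around the same mean of 1.39 minutes.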
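The sketch below reproduces the comparison using Python's `statistics` module. `stdev` divides by n – 1, so it gives sample standard deviations, which fit here because each data set is a random sample of 30 customers.

```python
from statistics import mean, stdev

wendys = [1.50, 2.53, 1.88, 3.99, 0.90, 0.79, 1.20, 2.94, 1.90, 1.23,
          1.01, 1.46, 1.40, 1.00, 0.92, 1.66, 0.89, 1.33, 1.54, 1.09,
          0.94, 0.95, 1.20, 0.99, 1.72, 0.67, 0.90, 0.84, 0.35, 2.00]
mcdonalds = [3.50, 0.00, 1.97, 0.00, 3.08, 0.00, 0.26, 0.71, 0.28, 2.75,
             0.38, 0.14, 2.22, 0.44, 0.36, 0.43, 0.60, 4.54, 1.38, 3.10,
             1.82, 2.33, 0.80, 0.92, 2.19, 3.04, 2.54, 0.50, 1.17, 0.23]

for name, times in [("Wendy's", wendys), ("McDonald's", mcdonalds)]:
    # Same center, different spread: compare the means and the sample SDs
    print(f"{name:<11} mean = {mean(times):.2f}  s = {stdev(times):.3f}")
# Wendy's     mean = 1.39  s = 0.738
# McDonald's  mean = 1.39  s = 1.265
```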
Deviation, Variance, and Standard Deviation
Population Variance
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where Σ(x – μ)² is the sum of squares, SSx.
Population Standard Deviation
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Finding the Population Variance & Standard Deviation
In Words | In Symbols
1. Find the mean of the population data set. | μ = Σx / N
2. Find the deviation of each entry. | x – μ
3. Square each deviation. | (x – μ)²
4. Add to get the sum of squares. | SSx = Σ(x – μ)²
5. Divide by N to get the population variance. | σ² = Σ(x – μ)² / N
6. Find the square root to get the population standard deviation. | σ = √(Σ(x – μ)² / N)
Example: Finding the Population Standard
Deviation
A corporation hired 10 graduates. The starting
salaries for each graduate are shown. Find the
population variance and standard deviation of
the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Recall μ = 41.5.
Solution: Finding the Population Standard Deviation
• Determine SSx (N = 10).
Salary, x | Deviation: x – μ | Squares: (x – μ)²
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
38 | 38 – 41.5 = –3.5 | (–3.5)² = 12.25
39 | 39 – 41.5 = –2.5 | (–2.5)² = 6.25
45 | 45 – 41.5 = 3.5 | (3.5)² = 12.25
47 | 47 – 41.5 = 5.5 | (5.5)² = 30.25
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
44 | 44 – 41.5 = 2.5 | (2.5)² = 6.25
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
37 | 37 – 41.5 = –4.5 | (–4.5)² = 20.25
42 | 42 – 41.5 = 0.5 | (0.5)² = 0.25
Totals | Σ(x – μ) = 0 | SSx = 88.5
Solution: Finding the Population Standard Deviation
Population Variance
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} = 8.85 \approx 8.9$$
Population Standard Deviation
$$\sigma = \sqrt{\sigma^2} = \sqrt{8.85} \approx 3.0$$
The population standard deviation is about 3.0, or $3000.
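The six "In Words" steps map one-to-one onto lines of code. `population_variance_and_sd` is a hypothetical function name used only for illustration. The sketch returns SSx, σ², and σ, so each intermediate result from the example can be checked.

```python
import math

def population_variance_and_sd(data):
    """Hypothetical helper following the six steps for population data."""
    N = len(data)
    mu = sum(data) / N                          # 1. mean of the population
    deviations = [x - mu for x in data]         # 2. deviation of each entry
    squares = [d ** 2 for d in deviations]      # 3. square each deviation
    ss_x = sum(squares)                         # 4. sum of squares, SSx
    variance = ss_x / N                         # 5. divide by N
    return ss_x, variance, math.sqrt(variance)  # 6. square root

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # $1000s
ss_x, var, sd = population_variance_and_sd(salaries)
print(ss_x, var, round(sd, 1))   # 88.5 8.85 3.0
```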
Deviation, Variance, and Standard Deviation
Sample Variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Sample Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Finding the Sample Variance & Standard Deviation
In Words | In Symbols
1. Find the mean of the sample data set. | x̄ = Σx / n
2. Find the deviation of each entry. | x – x̄
3. Square each deviation. | (x – x̄)²
4. Add to get the sum of squares. | SSx = Σ(x – x̄)²
5. Divide by n – 1 to get the sample variance. | s² = Σ(x – x̄)² / (n – 1)
6. Find the square root to get the sample standard deviation. | s = √(Σ(x – x̄)² / (n – 1))
Example: Finding the Sample Standard
Deviation
The starting salaries are for the Chicago
branches of a corporation. The corporation has
several other branches, and you plan to use the
starting salaries of the Chicago branches to
estimate the starting salaries for the larger
population. Find the sample standard deviation
of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution: Finding the Sample Standard Deviation
• Determine SSx (n = 10, x̄ = 41.5).
Salary, x | Deviation: x – x̄ | Squares: (x – x̄)²
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
38 | 38 – 41.5 = –3.5 | (–3.5)² = 12.25
39 | 39 – 41.5 = –2.5 | (–2.5)² = 6.25
45 | 45 – 41.5 = 3.5 | (3.5)² = 12.25
47 | 47 – 41.5 = 5.5 | (5.5)² = 30.25
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
44 | 44 – 41.5 = 2.5 | (2.5)² = 6.25
41 | 41 – 41.5 = –0.5 | (–0.5)² = 0.25
37 | 37 – 41.5 = –4.5 | (–4.5)² = 20.25
42 | 42 – 41.5 = 0.5 | (0.5)² = 0.25
Totals | Σ(x – x̄) = 0 | SSx = 88.5
Solution: Finding the Sample Standard Deviation
Sample Variance
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{88.5}{10 - 1} \approx 9.8$$
Sample Standard Deviation
$$s = \sqrt{s^2} = \sqrt{\frac{88.5}{9}} \approx 3.1$$
The sample standard deviation is about 3.1, or $3100.
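Because the Chicago salaries are treated as a sample here, the divisor is n – 1. The sketch uses `statistics.variance` and `statistics.stdev` for the sample calculation. It then shows that treating the same numbers as a population (`pstdev`, divisor N) gives the smaller value from the previous example.

```python
from statistics import pstdev, stdev, variance

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # Chicago branches only: a sample

s2 = variance(salaries)   # SSx / (n - 1) = 88.5 / 9
s = stdev(salaries)       # √s²

print(round(s2, 1), round(s, 1))      # 9.8 3.1
# Treating the same numbers as a whole population divides by N instead of n - 1:
print(round(pstdev(salaries), 1))     # 3.0
```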
Example: Using Technology to Find the Standard Deviation
Sample office rental rates (in dollars per square foot per year) for Miami’s central business district are shown in the table. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from: Cushman & Wakefield Inc.)
Office Rental Rates
35.00, 33.50, 37.00, 23.75, 26.50, 31.25, 36.50, 40.00,
32.00, 39.25, 37.50, 34.75, 37.75, 37.25, 36.75, 27.00,
35.75, 26.00, 37.00, 29.00, 40.50, 24.50, 33.00, 38.00
Solution: Using Technology to Find the Standard Deviation
Sample mean: x̄ ≈ 33.73
Sample standard deviation: s ≈ 5.09
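Any statistics package will do. As one option, Python's standard `statistics` module gives the sample mean and the sample standard deviation (divisor n – 1) for the rental data.

```python
from statistics import mean, stdev

# Miami central business district office rental rates ($ per sq ft per year)
rates = [35.00, 33.50, 37.00, 23.75, 26.50, 31.25, 36.50, 40.00,
         32.00, 39.25, 37.50, 34.75, 37.75, 37.25, 36.75, 27.00,
         35.75, 26.00, 37.00, 29.00, 40.50, 24.50, 33.00, 38.00]

print(len(rates))               # 24
print(round(mean(rates), 2))    # 33.73
print(round(stdev(rates), 2))   # 5.09  (sample standard deviation, n - 1)
```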
Interpreting Standard Deviation
• Standard deviation is a measure of the typical amount an entry deviates from the mean.
• The more the entries are spread out, the greater the standard deviation.
Estimating Standard Deviation
• Range Rule of Thumb: s ≈ R/4, where R is the range of the data.
• A quick estimation tool to check whether a standard deviation calculation is approximately correct.
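Here is the rule of thumb as a quick sanity check on the salary data. `range_rule_estimate` is a hypothetical helper name. The estimate is only rough; it flags gross errors (for example, a computed s of 31 instead of 3.1) rather than confirming the exact value.

```python
from statistics import stdev

def range_rule_estimate(data):
    """Hypothetical helper for the Range Rule of Thumb: s is roughly (max - min) / 4."""
    return (max(data) - min(data)) / 4

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # $1000s
estimate = range_rule_estimate(salaries)   # 10 / 4 = 2.5
actual = stdev(salaries)                   # about 3.1

print(estimate, round(actual, 1))          # 2.5 3.1
```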
Interpreting Standard Deviation: Empirical
Rule (68 – 95 – 99.7 Rule)
For data with a (symmetric) bell-shaped
distribution, the standard deviation has the
following characteristics:
• About 68% of the data lie within one standard
deviation of the mean.
• About 95% of the data lie within two standard
deviations of the mean.
• About 99.7% of the data lie within three standard
deviations of the mean.
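[Figure: bell-shaped curve marked at x̄ – 3s, x̄ – 2s, x̄ – s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. About 68% of the data lie within 1 standard deviation (34% on each side of the mean), 95% within 2 standard deviations (13.5% in each band between 1 and 2 standard deviations), and 99.7% within 3 standard deviations (2.35% in each band between 2 and 3 standard deviations).]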
Standard Deviation and Area (Normal Distribution)
Zσ | Percentage within CI | Percentage outside CI | Ratio outside CI
1σ | 68.2689492% | 31.7310508% | 1 / 3.1514871
1.645σ | 90% | 10% | 1 / 10
1.960σ | 95% | 5% | 1 / 20
2σ | 95.4499736% | 4.5500264% | 1 / 21.977894
2.576σ | 99% | 1% | 1 / 100
3σ | 99.7300204% | 0.2699796% | 1 / 370.398
3.2906σ | 99.9% | 0.1% | 1 / 1000
4σ | 99.993666% | 0.006334% | 1 / 15,788
5σ | 99.9999426697% | 0.0000573303% | 1 / 1,744,278
6σ | 99.9999998027% | 0.0000001973% | 1 / 506,800,000
7σ | 99.9999999997440% | 0.0000000002560% | 1 / 390,600,000,000
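The table's percentages come from the normal distribution. For a normal distribution, the fraction of data within k standard deviations of the mean is erf(k / √2). The sketch below, which assumes normality, reproduces several rows with `math.erf`. `fraction_within` is a hypothetical helper name.

```python
import math

def fraction_within(k):
    """Hypothetical helper: P(|Z| < k) for a normal distribution, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in [1, 1.645, 1.960, 2, 2.576, 3]:
    inside = fraction_within(k)
    outside = 1 - inside
    # Percent within, percent outside, and "1 in r" ratio outside the interval
    print(f"{k}σ: within {inside:.7%}, outside {outside:.7%}, about 1 in {1 / outside:,.1f}")
```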
Example: Using the Empirical Rule
In a survey conducted by the National Center
for Health Statistics, the sample mean height of
women in the United States (ages 20-29) was
64 inches, with a sample standard deviation of
2.71 inches. Estimate the percent of the women
whose heights are between 64 inches and
69.42 inches.
Solution: Using the Empirical Rule
• Because the distribution is bell-shaped, you can use the Empirical Rule.
x̄ – 3s = 55.87, x̄ – 2s = 58.58, x̄ – s = 61.29, x̄ = 64, x̄ + s = 66.71, x̄ + 2s = 69.42, x̄ + 3s = 72.13
• About 34% of the data lie between x̄ and x̄ + s, and about 13.5% lie between x̄ + s and x̄ + 2s.
34% + 13.5% = 47.5% of women are between 64 and 69.42 inches tall.
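The sketch below marks x̄ ± ks for the height data and adds the Empirical Rule band percentages between x̄ and x̄ + 2s. It assumes the heights are bell-shaped, as the solution does. `BAND_PERCENT` is an illustrative name for the approximate 34% / 13.5% / 2.35% bands on one side of the mean.

```python
# Approximate Empirical Rule bands on ONE side of the mean (bell-shaped data):
# key k means the band between (k - 1) and k standard deviations from the mean.
BAND_PERCENT = {1: 34.0, 2: 13.5, 3: 2.35}

mean_height, sd_height = 64.0, 2.71   # inches, women ages 20-29 (sample statistics)

marks = {k: round(mean_height + k * sd_height, 2) for k in range(-3, 4)}
print(marks)   # {-3: 55.87, -2: 58.58, -1: 61.29, 0: 64.0, 1: 66.71, 2: 69.42, 3: 72.13}

# 69.42 is x̄ + 2s, so the region from x̄ to x̄ + 2s covers the first two bands
percent = BAND_PERCENT[1] + BAND_PERCENT[2]
print(percent)   # 47.5
```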
Chebychev’s Theorem
• The portion of any data set lying within k standard deviations (k > 1) of the mean is at least
$$1 - \frac{1}{k^2}$$
• k = 2: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the data lie within 2 standard deviations of the mean.
• k = 3: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.9%, of the data lie within 3 standard deviations of the mean.
Example: Using Chebychev’s Theorem
The age distribution for Florida is shown in the
histogram. Apply Chebychev’s Theorem to the
data using k = 2. What can you conclude?
Solution: Using Chebychev’s Theorem
k = 2: μ – 2σ = 39.2 – 2(24.8) = –10.4 (use 0, since age can’t be negative)
μ + 2σ = 39.2 + 2(24.8) = 88.8
At least 75% of the population of Florida is between 0 and 88.8 years old.
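Chebychev's bound and the Florida interval are sketched below. `chebychev_min_fraction` is a hypothetical helper name. The lower limit is clipped at 0 because ages cannot be negative, exactly as in the solution. The values μ = 39.2 and σ = 24.8 are taken from the example.

```python
def chebychev_min_fraction(k):
    """Hypothetical helper: at least 1 - 1/k^2 of ANY data set lies within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

print(chebychev_min_fraction(2))            # 0.75
print(round(chebychev_min_fraction(3), 3))  # 0.889

# Florida ages: mu = 39.2, sigma = 24.8 (values from the example)
mu, sigma, k = 39.2, 24.8, 2
lower = max(0.0, mu - k * sigma)   # -10.4 is impossible for an age, so clip at 0
upper = mu + k * sigma             # 88.8
print(lower, round(upper, 1))      # 0.0 88.8
```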
Standard Deviation for Grouped Data
• Sample standard deviation for a frequency distribution:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$$
where n = Σf (the number of entries in the data set).
• When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.
Example: Finding the Standard Deviation for
Grouped Data
You collect a random sample
of the number of children per
household in a region. Find
the sample mean and the
sample standard deviation of
the data set.
Number of Children in 50 Households
1 3 1 1 1 1 2 2 1 0
1 1 0 0 0 1 5 0 3 6
3 0 3 1 1 1 1 6 0 1
3 6 6 1 2 2 3 0 1 1
4 1 1 2 2 0 3 0 2 4
Solution: Finding the Standard Deviation for Grouped Data
• First construct a frequency distribution.
x | f | xf
0 | 10 | 0(10) = 0
1 | 19 | 1(19) = 19
2 | 7 | 2(7) = 14
3 | 7 | 3(7) = 21
4 | 2 | 4(2) = 8
5 | 1 | 5(1) = 5
6 | 4 | 6(4) = 24
Totals | Σf = 50 | Σxf = 91
• Find the mean of the frequency distribution.
$$\bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8$$
The sample mean is about 1.8 children.
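The sketch below builds the frequency distribution with `collections.Counter` and then applies the grouped-data formula to get the sample standard deviation. Here each "class" is a single exact value, so the grouped result matches `statistics.stdev` on the raw list. With class midpoints it would only be an approximation.

```python
import math
from collections import Counter
from statistics import stdev

children = [1, 3, 1, 1, 1, 1, 2, 2, 1, 0,
            1, 1, 0, 0, 0, 1, 5, 0, 3, 6,
            3, 0, 3, 1, 1, 1, 1, 6, 0, 1,
            3, 6, 6, 1, 2, 2, 3, 0, 1, 1,
            4, 1, 1, 2, 2, 0, 3, 0, 2, 4]

freq = dict(sorted(Counter(children).items()))   # x -> f
n = sum(freq.values())                           # Σf = 50
xbar = sum(x * f for x, f in freq.items()) / n   # Σxf / n = 91 / 50

# Grouped-data formula: s = sqrt( Σ(x - x̄)^2 f / (n - 1) )
ss = sum((x - xbar) ** 2 * f for x, f in freq.items())
s = math.sqrt(ss / (n - 1))

print(freq)                             # {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}
print(n, xbar, round(s, 1))             # 50 1.82 1.7
print(abs(s - stdev(children)) < 1e-9)  # True
```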
Coefficient of Variation
The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. It is a measure of the variability of the data: the higher the percentage, the more variable the data.
Sample
$$\mathrm{CV} = \frac{s}{\bar{x}} \cdot 100\%$$
Population
$$\mathrm{CV} = \frac{\sigma}{\mu} \cdot 100\%$$
Coefficient of Variation Example
• The average score in a calculus class is 110, with a standard deviation of 5; the average score in a statistics class is 106, with a standard deviation of 4. Which class is more variable in terms of scores?
• CVc = 5/110 ≈ 4.5%; CVs = 4/106 ≈ 3.8%
The calculus class is more variable.
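The CV comparison reduces to one division per class. `coefficient_of_variation` is a hypothetical helper name. Like the definition above, it assumes nonnegative data with a positive mean.

```python
def coefficient_of_variation(sd, mean):
    """Hypothetical helper: CV = (standard deviation / mean) * 100%, nonnegative data."""
    return sd / mean * 100

cv_calc = coefficient_of_variation(5, 110)   # calculus class
cv_stat = coefficient_of_variation(4, 106)   # statistics class
print(f"calculus {cv_calc:.1f}%, statistics {cv_stat:.1f}%")  # calculus 4.5%, statistics 3.8%
```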
Summary
• Determined the range of a data set
• Determined the variance and standard deviation of a population and of a sample
• Used the Empirical Rule and Chebychev’s Theorem to interpret standard deviation
• Approximated the sample standard deviation for grouped data
Homework
• Pt 1 – Pg. 84–86; # 1–21 odd
• Pt 2 – Pg. 86–91; # 23–49 odd