Measures of Variability
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation
Consider the sample of
starting salaries of business
grads. We would be
interested in knowing if there
was a low or high degree of
variability or dispersion in
starting salaries received.
Range
• Range is simply the difference between the largest and smallest values in the sample.
• Range is the simplest measure of variability.
• Note that range is highly sensitive to the largest and smallest values.
Example: Apartment Rents
Seventy studio apartments
were randomly sampled in
a small college town. The
monthly rent prices for
these apartments are listed
in ascending order on the next slide.
Range
Range = largest value - smallest value
Range = 615 - 425 = 190

Monthly rents ($) in ascending order (read down each column):
425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615
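As a quick check of the arithmetic, the range can be recomputed from the 70 rent values. This is only a minimal sketch. The names `rents` and `data_range` are illustrative and do not come from the slides.

```python
# The 70 monthly rents ($) from the example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


print(data_range(rents))  # 615 - 425 = 190
```

Because only the two extreme values enter the calculation, changing a single rent at either end changes the result, which is the sensitivity noted in the bullets.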
Interquartile Range
• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
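The quartiles quoted for the rent data can be reproduced with a percentile location rule. The sketch below assumes the rule i = (p/100)n, rounding up when i is not an integer and averaging the values in positions i and i + 1 when it is. The integer case is an assumption, as it is not spelled out in the slides, and the function name `percentile` is illustrative.

```python
# The 70 monthly rents ($) from the example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """Return the p-th percentile (integer p, 0 < p < 100) of sorted data.

    Assumed location rule: i = (p / 100) * n with 1-based positions.
    Non-integer i is rounded up; integer i averages positions i and i + 1.
    """
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)  # integer arithmetic, no float error
    if remainder:
        # i is not an integer: 1-based position whole + 1 is 0-based index whole
        return sorted_values[whole]
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2


q1 = percentile(rents, 25)  # i = 17.5, so the 18th value
q3 = percentile(rents, 75)  # i = 52.5, so the 53rd value
print(q1, q3, q3 - q1)
```

Spreadsheet and library quartile functions often interpolate between neighbouring values instead, so they may report slightly different quartiles for the same data.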
Variance
• The variance is a measure of variability that uses all the data.
• The variance is based on the difference between each observation ($x_i$) and the mean ($\bar{x}$ for the sample and $\mu$ for the population).
The variance is the average of the squared differences between the observations and the mean value.
For the population:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
For the sample:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation
• The standard deviation of a data set is the square root of the variance.
• The standard deviation is measured in the same units as the data, making it easy to interpret.
Computing a standard deviation
For the population:
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
For the sample:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
Coefficient of Variation
Just divide the standard deviation by the mean and multiply by 100.
Computing the coefficient of variation:
For the population:
$$\frac{\sigma}{\mu} \times 100$$
For the sample:
$$\frac{s}{\bar{x}} \times 100$$
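A minimal sketch of the sample version of this calculation, using the 12 starting salaries from the z-score table of this deck. The function name `coefficient_of_variation` is illustrative.

```python
import statistics

# Starting salaries of the 12 graduates from the z-score table in this deck.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]


def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: (s / x-bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


s = statistics.stdev(salaries)  # sample standard deviation, same units as the data
cv = coefficient_of_variation(salaries)
print(f"s = {s:.2f}, CV = {cv:.2f}%")
```

Because the standard deviation is divided by the mean, the result is unit-free, which makes it possible to compare the variability of data sets measured on different scales.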
The heights (in inches) of 25 individuals were recorded and the following statistics were calculated:
mean = 70, range = 20, mode = 73, variance = 784, median = 74
The coefficient of variation equals
1. 11.2%
2. 1120%
3. 0.4%
4. 40%
If index i (which is used to determine the location of the pth percentile) is not an integer, its value should be
1. squared
2. divided by (n - 1)
3. rounded down
4. rounded up
Which of the following symbols represents the variance of the population?
1. $\sigma^2$
2. $\sigma$
3. $\mu$
Which of the following symbols represents the size of the sample?
1. $\sigma^2$
2. $\sigma$
3. N
4. n
The symbol s is used to represent
1. the variance of the population
2. the standard deviation of the sample
3. the standard deviation of the population
4. the variance of the sample
The numerical value of the variance
1. is always larger than the numerical value of the standard deviation
2. is always smaller than the numerical value of the standard deviation
3. is negative if the mean is negative
4. can be larger or smaller than the numerical value of the standard deviation
If the coefficient of variation is 40% and the mean is 70, then the variance is
1. 28
2. 2800
3. 1.75
4. 784
Problem 22, page 94

                               Broker-Assisted         Online
                               (100 shares at $50      (500 shares at $50
                               per share)              per share)
Range                          45.05                   57.50
Interquartile range            23.98                   11.475
Variance                       190.67                  140.633
Standard deviation             13.8                    11.859
Coefficient of variation       38.02                   57.949
Position of 25th percentile    6                       (not given)
Position of 75th percentile    18                      (not given)
25th percentile                24.995                  13.475
75th percentile                48.975                  24.95
Mean                           36.32                   20.46

Measured by the interquartile range, variance, and standard deviation, the variability of commissions is greater for broker-assisted trades. The range and the coefficient of variation are larger for online trades.
Using Excel to Compute the Sample Variance,
Standard Deviation, and Coefficient of Variation

Formula Worksheet

     A          B                  C   D          E
1    Apartment  Monthly Rent ($)
2    1          525                    Mean       =AVERAGE(B2:B71)
3    2          440                    Median     =MEDIAN(B2:B71)
4    3          450                    Mode       =MODE(B2:B71)
5    4          615                    Variance   =VAR(B2:B71)
6    5          480                    Std. Dev.  =STDEV(B2:B71)
7    6          510                    C.V.       =E6/E2*100

Note: Rows 8-71 are not shown.
Using Excel to Compute the Sample Variance,
Standard Deviation, and Coefficient of Variation

Value Worksheet

     A          B                  C   D          E
1    Apartment  Monthly Rent ($)
2    1          525                    Mean       490.80
3    2          440                    Median     475.00
4    3          450                    Mode       450.00
5    4          615                    Variance   2996.16
6    5          480                    Std. Dev.  54.74
7    6          510                    C.V.       11.15

Note: Rows 8-71 are not shown.
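For readers working outside Excel, the same six statistics can be sketched with Python's standard `statistics` module. The list `rents` holds the 70 rent values from the apartment example in sorted order. It is assumed here that these are the same values stored in B2:B71, and the order does not affect any of the results.

```python
import statistics

# The 70 monthly rents ($) from the apartment example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)          # like =AVERAGE(B2:B71)
median = statistics.median(rents)      # like =MEDIAN(B2:B71)
mode = statistics.mode(rents)          # like =MODE(B2:B71)
variance = statistics.variance(rents)  # like =VAR(B2:B71), sample variance
std_dev = statistics.stdev(rents)      # like =STDEV(B2:B71), sample std. dev.
cv = std_dev / mean * 100              # like =E6/E2*100

for label, value in [("Mean", mean), ("Median", median), ("Mode", mode),
                     ("Variance", variance), ("Std. Dev.", std_dev), ("C.V.", cv)]:
    print(f"{label:<10}{value:>10.2f}")
```

Rounded to two decimals, the printed values agree with column E of the value worksheet.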
Using Excel’s
Descriptive Statistics Tool
Step 4: When the Descriptive Statistics dialog box appears:
• Enter B1:B71 in the Input Range box
• Select Grouped By Columns
• Select Labels in First Row
• Select Output Range
• Enter D1 in the Output Range box
• Select Summary Statistics
• Click OK
[Figure: Descriptive Statistics dialog box]
Using Excel’s
Descriptive Statistics Tool

Value Worksheet (Partial)

     A          B                  C   D                    E
1    Apartment  Monthly Rent ($)       Monthly Rent ($)
2    1          525
3    2          440                    Mean                 490.8
4    3          450                    Standard Error       6.542348114
5    4          615                    Median               475
6    5          480                    Mode                 450
7    6          510                    Standard Deviation   54.73721146
8    7          575                    Sample Variance      2996.162319

Note: Rows 9-71 are not shown.
Using Excel’s
Descriptive Statistics Tool

Value Worksheet (Partial)

     A     B      C   D          E
9    8     430        Kurtosis   -0.334093298
10   9     440        Skewness   0.924330473
11   10    450        Range      190
12   11    470        Minimum    425
13   12    485        Maximum    615
14   13    515        Sum        34356
15   14    575        Count      70
16   15    430

Note: Rows 1-8 and 17-71 are not shown.
Measures of Relative Location
and Detecting Outliers
• z-scores
• Chebyshev’s Theorem
• Detecting Outliers
By using the mean
and standard
deviation together,
we can learn more
about the relative
location of
observations in a
data set
z-score
Here we compare
the deviation from
the mean of a single
observation to the
standard deviation
The z-score is computed for each $x_i$:
$$z_i = \frac{x_i - \bar{x}}{s}$$
where
$z_i$ is the z-score for $x_i$
$\bar{x}$ is the sample mean
$s$ is the sample standard deviation
The z-score can be interpreted as the number of standard deviations $x_i$ is from the sample mean.
Z-scores for the starting salary data

Graduate   Starting Salary   x_i - x̄   z-score
1          2850              -90        -0.543
2          2950              10         0.060
3          3050              110        0.664
4          2880              -60        -0.362
5          2755              -185       -1.117
6          2710              -230       -1.388
7          2890              -50        -0.302
8          3130              190        1.147
9          2940              0          0.000
10         3325              385        2.324
11         2920              -20        -0.121
12         2880              -60        -0.362
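The table can be reproduced with a few lines of Python. This is a sketch using the standard `statistics` module, and the variable names are illustrative.

```python
import statistics

# Starting salaries of the 12 graduates, in table order.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

x_bar = statistics.mean(salaries)  # sample mean
s = statistics.stdev(salaries)     # sample standard deviation (n - 1 divisor)

# z_i = (x_i - x_bar) / s for every observation
z_scores = [(x - x_bar) / s for x in salaries]

for graduate, (x, z) in enumerate(zip(salaries, z_scores), start=1):
    print(f"{graduate:>2}  {x:>5}  {x - x_bar:>6.0f}  {z:>7.3f}")
```

Graduate 10, with a salary of 3325, is the observation farthest from the mean, a little more than 2.3 standard deviations above it.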
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is greater than 1.
This theorem enables us to
make statements about the
proportion of data values
that must be within a
specified number of
standard deviations from
the mean
Implications of Chebyshev’s Theorem
• At least .75, or 75 percent, of the data values must be within 2 (z = 2) standard deviations of the mean.
• At least .89, or 89 percent, of the data values must be within 3 (z = 3) standard deviations of the mean.
• At least .94, or 94 percent, of the data values must be within 4 (z = 4) standard deviations of the mean.
Note: z must be greater than one but need not be an integer.
Chebyshev’s Theorem
For example:
Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74
At least $1 - 1/(1.5)^2 = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$
(Actually, 86% of the rent values are between 409 and 573.)
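The bound and the rent example can be sketched in Python as follows. The function name `chebyshev_bound` is illustrative, and the mean and standard deviation are the rounded summary values quoted in the example.

```python
def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


# The three implications listed for z = 2, 3, and 4.
for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.2f}")

# Rent example: z = 1.5 with the sample mean and standard deviation.
x_bar, s, z = 490.80, 54.74, 1.5
lower = x_bar - z * s
upper = x_bar + z * s
print(f"At least {chebyshev_bound(z):.2f} of the rents lie "
      f"between {lower:.0f} and {upper:.0f}")
```

The theorem gives only a guaranteed minimum that holds for any distribution, which is why the observed 86% can be well above the 56% bound.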
Detecting Outliers
You can use z-scores to
detect extreme values in the
data set, or “outliers.” In the
case of very high z-scores
(absolute values) it is a good
idea to recheck the data for
accuracy.
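A minimal sketch of such a check is shown below. The slides do not state a cutoff for a "very high" z-score, so the default threshold of 3 is an assumption, and the function name `flag_outliers` is hypothetical.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return (value, z-score) pairs whose |z| exceeds the threshold.

    The default threshold of 3 is an assumption, not taken from the slides.
    """
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x, (x - x_bar) / s) for x in values
            if abs(x - x_bar) / s > threshold]


# Starting salaries of the 12 graduates from the z-score table.
salaries = [2850, 2950, 3050, 2880, 2755, 2710,
            2890, 3130, 2940, 3325, 2920, 2880]

print(flag_outliers(salaries))       # nothing is more than 3 standard deviations away
print(flag_outliers(salaries, 2.0))  # a looser cutoff flags only the 3325 salary
```

A flagged value is not necessarily an error. It is a prompt to recheck that observation before deciding whether to keep it.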