MEASURES OF VARIABILITY
• Variance
– Population variance
– Sample variance
• Standard Deviation
– Population standard deviation
– Sample standard deviation
• Coefficient of Variation (CV)
– Sample CV
– Population CV
POPULATION VARIANCE
• The population variance is the mean squared
deviation from the population mean:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Where
• $\sigma^2$ stands for the population variance
• $\mu$ is the population mean
• N is the total number of values in the population
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation
SAMPLE VARIANCE
• The sample variance is defined as follows:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Where
• $s^2$ stands for the sample variance
• $\bar{x}$ is the sample mean
• n is the total number of values in the sample
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation
SAMPLE VARIANCE
• A sample of monthly advertising expenses (in 000$)
is taken. The data for five months are as follows: 2.5,
1.3, 1.4, 1.0 and 2.0. Compute the sample variance.
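The computation asked for above can be sketched in a few lines of Python. This is only an illustration of the definition (sum of squared deviations divided by n - 1); the function name `sample_variance` is a hypothetical helper, not something from the slides.

```python
# Monthly advertising expenses (in 000$) from the example above.
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


s2_advertising = sample_variance(advertising)
print(f"sample mean     = {sum(advertising) / len(advertising):.2f}")
print(f"sample variance = {s2_advertising:.3f}")
```

By hand, the sample mean is 1.64, the squared deviations sum to 1.452, and 1.452 / 4 = 0.363.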
SAMPLE VARIANCE
• Notice that the sample variance is defined as the sum
of the squared deviations divided by n-1.
• Sample variance is computed to estimate the
population variance.
• An unbiased estimate of the population variance may
be obtained by defining the sample variance as the
sum of the squared deviations divided by n-1 rather
than by n.
• Defining sample variance as the mean squared
deviation from the sample mean tends to
underestimate the population variance.
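The claim about n - 1 versus n can be checked exactly on a toy case. The sketch below assumes a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement (both are assumptions made for illustration, not data from the slides). It enumerates every possible sample and averages the two candidate estimators.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population, chosen only so every sample can be listed.
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def squared_deviation_sum_over(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor


n = 2
# All ordered samples of size n drawn with replacement (assumption: each is equally likely).
samples = list(product(population, repeat=n))

avg_div_n_minus_1 = sum(squared_deviation_sum_over(s, n - 1) for s in samples) / len(samples)
avg_div_n = sum(squared_deviation_sum_over(s, n) for s in samples) / len(samples)

print("population variance     :", sigma2)
print("average when dividing by n - 1:", avg_div_n_minus_1)
print("average when dividing by n    :", avg_div_n)
```

Under these assumptions the n - 1 version averages to exactly the population variance (2/3), while the version that divides by n averages to 1/3, which is an underestimate.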
SAMPLE VARIANCE
• A shortcut formula for the sample variance:
$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
Where
• $s^2$ is the sample variance
• n is the total number of values in the sample
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation
SAMPLE VARIANCE
• A sample of monthly sales (in 000 units) is
taken. The data for five months are as follows: 264,
116, 165, 101 and 209. Compute the sample
variance using the shortcut formula.
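As a sketch of this exercise, the shortcut formula needs only the sum of the values and the sum of their squares. The variable names below are illustrative; the definitional formula is computed alongside as a cross-check.

```python
# Monthly sales (in 000 units) from the example above.
sales = [264, 116, 165, 101, 209]
n = len(sales)

sum_x = sum(sales)                         # sum of the measurements
sum_x_squared = sum(x * x for x in sales)  # sum of the squared measurements

# Shortcut formula: s^2 = [sum(x^2) - (sum(x))^2 / n] / (n - 1)
s2_shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Definitional formula, for comparison.
mean = sum_x / n
s2_definition = sum((x - mean) ** 2 for x in sales) / (n - 1)

print("sum of x      =", sum_x)
print("sum of x^2    =", sum_x_squared)
print("s^2 shortcut  =", s2_shortcut)
print("s^2 definition=", s2_definition)
```

By hand: the sum is 855, the sum of squares is 164259, and (164259 - 855²/5) / 4 = 18054 / 4 = 4513.5, the same value the definition gives.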
SAMPLE VARIANCE
• The shortcut formula for the sample variance:
$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
• If you have the sum of the measurements, $\sum_{i=1}^{n} x_i$,
already computed, the above formula is a shortcut
because you need only to compute the sum of the
squares, $\sum_{i=1}^{n} x_i^2$.
POPULATION/SAMPLE STANDARD DEVIATION
• The standard deviation is the positive square root of
the variance:
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
• Compute the standard deviations of advertising and
sales.
POPULATION/SAMPLE STANDARD DEVIATION
• Compute the sample standard deviation of
advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample standard deviation of sales
data: 264, 116, 165, 101 and 209
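One way to carry out these two computations is with Python's standard `statistics` module, whose `stdev` function uses n - 1 in the denominator (the sample standard deviation). This is a sketch, not the method prescribed by the slides.

```python
import statistics

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # in 000$
sales = [264, 116, 165, 101, 209]        # in 000 units

# statistics.stdev is the positive square root of the sample variance (n - 1).
s_advertising = statistics.stdev(advertising)
s_sales = statistics.stdev(sales)

print(f"s (advertising) = {s_advertising:.6f}")
print(f"s (sales)       = {s_sales:.6f}")
```

The results are the square roots of the sample variances 0.363 and 4513.5, roughly 0.6025 and 67.18.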
POPULATION/SAMPLE CV
• The coefficient of variation is the standard deviation
divided by the mean
Population coefficient of variation: $CV = \dfrac{\sigma}{\mu}$
Sample coefficient of variation: $cv = \dfrac{s}{\bar{x}}$
POPULATION/SAMPLE CV
• Compute the sample coefficient of variation of
advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample coefficient of variation of sales
data: 264, 116, 165, 101 and 209
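A short sketch of the two computations, again using the standard `statistics` module; the helper name `sample_cv` is hypothetical.

```python
import statistics

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # in 000$
sales = [264, 116, 165, 101, 209]        # in 000 units


def sample_cv(values):
    """Sample standard deviation (n - 1) divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv_advertising = sample_cv(advertising)
cv_sales = sample_cv(sales)

print(f"cv (advertising) = {cv_advertising:.4f}")
print(f"cv (sales)       = {cv_sales:.4f}")
```

Because the coefficient of variation is unit-free, the two series can be compared directly even though one is measured in dollars and the other in units: roughly 0.37 for advertising and 0.39 for sales.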
MEASURES OF ASSOCIATION
• A scatter diagram provides a graphical description
of a positive/negative, linear/non-linear relationship
• Numerical descriptions of the positive/negative,
linear/non-linear relationship are obtained by:
– Covariance
• Population covariance
• Sample covariance
– Coefficient of correlation
• Population coefficient of correlation
• Sample coefficient of correlation
MEASURES OF ASSOCIATION: EXAMPLE
• A sample of monthly advertising and sales data are
collected and shown below:
Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0
• What is the relationship between sales and
advertising? Is it linear or non-linear,
positive or negative?
POPULATION COVARIANCE
• The population covariance is the mean of the products of
deviations from the population means:
$$COV(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Where
• COV(X,Y) is the population covariance
• $\mu_x$, $\mu_y$ are the population means of X and Y respectively
• N is the total number of values in the population
• $x_i$, $y_i$ are the values of the i-th observations of X and Y
respectively
• $\sum$ represents a summation
SAMPLE COVARIANCE
• The sample covariance is the sum of the products of
deviations from the sample means, divided by n - 1:
$$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Where
• cov(X,Y) is the sample covariance
• $\bar{x}$, $\bar{y}$ are the sample means of X and Y respectively
• n is the total number of values in the sample
• $x_i$, $y_i$ are the values of the i-th observations of X and Y
respectively
• $\sum$ represents a summation
SAMPLE COVARIANCE
Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0
Mean    171                 1.64
SD      67.18258703         0.602495
Total =
cov =
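The sample covariance for the sales and advertising data can be sketched as follows. The code only spells out the definition (sum of products of deviations, divided by n - 1); the variable names are illustrative.

```python
sales = [264, 116, 165, 101, 209]        # in 000 units
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # in 000$
n = len(sales)

mean_sales = sum(sales) / n
mean_advertising = sum(advertising) / n

# Product of the two deviations for each month.
products = [
    (x - mean_sales) * (y - mean_advertising)
    for x, y in zip(sales, advertising)
]

total = sum(products)          # sum of products of deviations
cov_sample = total / (n - 1)   # sample covariance

print(f"total = {total:.2f}")
print(f"cov   = {cov_sample:.2f}")
```

By hand, the products are 79.98, 18.70, 1.44, 44.80 and 13.68, which total 158.60, so cov = 158.60 / 4 = 39.65.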
POPULATION/SAMPLE COVARIANCE
• If two variables increase/decrease together,
covariance is a large positive number and the
relationship is called positive.
• If the relationship is such that when one variable
increases, the other decreases and vice versa, then
covariance is a large negative number and the
relationship is called negative.
• If two variables are unrelated, the covariance may be
a small number.
• How large is large? How small is small?
POPULATION/SAMPLE COVARIANCE
• How large is large? How small is small? A drawback
of covariance is that it is usually difficult to give
any guideline on how large a covariance must be to indicate a strong
relationship and how small it must be to indicate no
relationship.
• The coefficient of correlation can overcome this drawback
to a certain extent.
POPULATION COEFFICIENT OF CORRELATION
• The population coefficient of correlation is the
population covariance divided by the population
standard deviations of X and Y:
$$\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$$
• Where $\rho$ is the population coefficient of correlation
• COV(X,Y) is the population covariance
• $\sigma_x$, $\sigma_y$ are the population standard deviations of X and Y
respectively
SAMPLE COEFFICIENT OF CORRELATION
• The sample coefficient of correlation is the sample
covariance divided by the sample standard deviations
of X and Y:
$$r = \frac{cov(X,Y)}{s_x s_y}$$
• Where r is the sample coefficient of correlation
• cov(X,Y) is the sample covariance
• $s_x$, $s_y$ are the sample standard deviations of X and Y respectively
SAMPLE COEFFICIENT OF CORRELATION
Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0
Mean    171                 1.64
SD      67.18258703         0.602495
Total =
cov =
r =
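A sketch of the sample coefficient of correlation for the same data, computed as the sample covariance divided by the product of the two sample standard deviations. The helper names are hypothetical.

```python
import math

sales = [264, 116, 165, 101, 209]        # in 000 units
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # in 000$


def sample_cov(xs, ys):
    """Sum of products of deviations from the sample means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return math.sqrt(sample_cov(xs, xs))


cov_xy = sample_cov(sales, advertising)
r = cov_xy / (sample_sd(sales) * sample_sd(advertising))

print(f"cov = {cov_xy:.2f}")
print(f"r   = {r:.4f}")
```

With cov = 39.65 and standard deviations of about 67.18 and 0.6025, r comes out near 0.98, which indicates a strong positive linear relationship between sales and advertising in this sample.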
POPULATION/SAMPLE
COEFFICIENT OF CORRELATION
• The coefficient of correlation is always between -1
and +1.
– Values near -1 or +1 show a strong linear relationship
– Values near 0 show no linear relationship
– Values near +1 show a strong positive linear
relationship
– Values near -1 show a strong negative linear
relationship
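These boundary cases can be illustrated with made-up data (the x and y values below are hypothetical, chosen only for the demonstration). An exactly linear increasing series gives r = +1, an exactly linear decreasing series gives r = -1, and a symmetric curve such as y = x² gives r = 0 even though y depends entirely on x, which is why a value near 0 means no linear relationship rather than no relationship at all.

```python
import math


def sample_r(xs, ys):
    """Sample coefficient of correlation: cov(X,Y) / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The (n - 1) divisors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]                    # hypothetical values
y_increasing = [2 * v + 1 for v in x]    # exact positive linear relation
y_decreasing = [-3 * v for v in x]       # exact negative linear relation
y_curved = [v ** 2 for v in x]           # non-linear, symmetric about 0

r_increasing = sample_r(x, y_increasing)
r_decreasing = sample_r(x, y_decreasing)
r_curved = sample_r(x, y_curved)

print("r, increasing line:", round(r_increasing, 6))
print("r, decreasing line:", round(r_decreasing, 6))
print("r, y = x^2        :", round(r_curved, 6))
```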
EXAMPLE
• Salary and expenses for cultural activities and sports-related
activities are collected from 100 households. Data
for only 5 households are shown below:
Salary and expenses data for 100 households
Salary    Culture   Sports
$54,600   $1,020    $990
$57,500   $1,100    $460
$53,300   $900      $780
$43,500   $570      $860
$57,200   $900      $1,390
How are the relationships (linear/non-linear, positive/negative)
between (i) salary and culture, (ii) salary and sports, and
(iii) sports and culture?
SALARY-CULTURE
[Scatter plot: Salary (horizontal axis) vs. Expenses for Cultural Activities (vertical axis)]
cov = 1094787, r = 0.5065 (positive, linear)
SPORTS-CULTURE
[Scatter plot: Expenses for sports related activities (horizontal axis) vs. Expenses for cultural activities (vertical axis)]
cov = -33608, r = -0.5201 (negative, linear)
SALARY-SPORTS
[Scatter plot: Salary (horizontal axis) vs. Expenses for sports related activities (vertical axis)]
cov = -219026, r = -0.08122 (no linear relationship)