Chapter 3c: Measures of association between variables

Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value
Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Monthly rents ($) for the 70 studio apartments (sorted; read down the columns):
425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615
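As a rough check, the five-number summary can be recomputed from the 70 rents. The slide does not say which quartile rule it uses; this sketch assumes the common textbook rule (compute i = (p/100)n, round up if i is not an integer, otherwise average positions i and i + 1). The function names are illustrative only.

```python
# Monthly rents for the 70 studio apartments, copied from the transcript.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]


def percentile(sorted_values, p):
    """p-th percentile under the assumed index rule i = (p/100) * n."""
    n = len(sorted_values)
    # Integer arithmetic avoids floating-point error when testing whether i is whole.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (zero-based index `whole`).
        return sorted_values[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2


def five_number_summary(values):
    data = sorted(values)
    return (data[0], percentile(data, 25), percentile(data, 50),
            percentile(data, 75), data[-1])


summary = five_number_summary(rents)
print(summary)
```

Other quartile conventions (for example, the interpolating defaults in many statistics libraries) can give slightly different quartiles for the same data.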
Box Plot
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot on an axis running from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked.]
Box Plot
• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.
Box Plot
• The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80.
• The lower limit is located 1.5(IQR) below Q1.
  Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
  Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
Box Plot
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot on an axis running from 375 to 625, with whiskers extending to the smallest value inside the limits (425) and the largest value inside the limits (615).]
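The box-plot construction can be sketched in a few lines. This example takes Q1 = 445 and Q3 = 525 from the five-number summary, so IQR = 525 - 445 = 80, and then locates the limits, any outliers, and the whisker ends. Variable names are illustrative.

```python
# Monthly rents for the 70 studio apartments, copied from the transcript.
rents = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

q1, q3 = 445, 525  # quartiles as stated in the five-number summary

iqr = q3 - q1                    # interquartile range
lower_limit = q1 - 1.5 * iqr     # located, not drawn
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are outliers (plotted with *).
outliers = [r for r in rents if r < lower_limit or r > upper_limit]

# Whiskers run to the smallest and largest values inside the limits.
inside = [r for r in rents if lower_limit <= r <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(iqr, lower_limit, upper_limit)
print(outliers)
print(whisker_low, whisker_high)
```

Because every rent lies between the two limits, the outlier list is empty and the whiskers reach the minimum and maximum of the data.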
Measures of Association Between Two Variables
• Covariance
• Correlation coefficient
Covariance
• Covariance is a measure of linear association
between variables.
• Positive values indicate a positive correlation
between variables.
• Negative values indicate a negative correlation
between variables.
To compute a covariance for variables x and y:

For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Mortgage Interest Rates and Monthly Home Sales, 1980-2004
[Figure: scatter plot of Mortgage Interest Rate (percent, vertical axis) against Monthly Home Sales (thousands, horizontal axis), n = 299. Lines at \bar{x} = 60.3 and \bar{y} = 9.02 divide the plot into quadrants I, II, III, and IV.]
If the majority of the sample points are located in quadrants II and IV, you have a negative correlation between the variables, as we do in this case. Thus the covariance will have a negative sign.
The (Pearson) Correlation Coefficient
A covariance will tell you whether two variables are positively or negatively correlated, but it will not tell you the degree of correlation. Moreover, the covariance is sensitive to the unit of measurement. The correlation coefficient does not suffer from these defects.
The (Pearson) Correlation Coefficient

For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

Note that:
$$-1 \le \rho_{xy} \le 1 \quad \text{and} \quad -1 \le r_{xy} \le 1$$
Correlation Coefficient = 1
[Figure: Distance Traveled in 5 Hours (miles, 0 to 500) plotted against Average Speed (MPH, 0 to 100).]
Correlation Coefficient = -1
"I have 7 hours per week for exercise."
[Figure: Time Spent Swimming (hours, 0 to 8) plotted against Time Spent Jogging (hours, 0 to 8).]
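The two extreme cases can be reproduced numerically. In this sketch the sample points are hypothetical, chosen only to match the figure titles: distance is exactly 5 times the average speed, and swimming time is exactly 7 hours minus jogging time. The helper `pearson_r` is an illustrative name implementing the sample formula r = s_xy / (s_x s_y).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical points: distance covered in 5 hours is 5 * average speed.
speeds = [20, 40, 60, 80, 100]
distances = [5 * s for s in speeds]

# Hypothetical points: 7 hours of exercise split between jogging and swimming.
jogging = [0, 1, 2, 3, 4, 5, 6, 7]
swimming = [7 - j for j in jogging]

r_positive = pearson_r(speeds, distances)
r_negative = pearson_r(jogging, swimming)
print(r_positive, r_negative)
```

Any exact linear relationship with positive slope gives r = 1 and any with negative slope gives r = -1, up to floating-point rounding.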
Example: Golf Stats
A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69
Using Excel to Compute the Covariance and Correlation Coefficient
Formula Worksheet

     A               B
1    Average Drive   18-Hole Score
2    277.6           69
3    259.5           71
4    269.1           70
5    267.0           70
6    255.6           71
7    272.9           69

Pop. Covariance:    =COVAR(A2:A7,B2:B7)
Samp. Correlation:  =CORREL(A2:A7,B2:B7)
Using Excel to Compute the Covariance and Correlation Coefficient
Value Worksheet

     A               B
1    Average Drive   18-Hole Score
2    277.6           69
3    259.5           71
4    269.1           70
5    267.0           70
6    255.6           71
7    272.9           69

Pop. Covariance:    -5.9
Samp. Correlation:  -0.9631
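The worksheet results can be cross-checked outside Excel. The sketch below applies the covariance and correlation formulas directly to the six golf observations; the function names are illustrative and not part of any library.

```python
from math import sqrt

drive = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                    # average 18-hole score


def cross_deviation_sum(xs, ys):
    """Sum of (x_i - mean_x)(y_i - mean_y) over all observations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def population_covariance(xs, ys):
    return cross_deviation_sum(xs, ys) / len(xs)         # divide by N


def sample_covariance(xs, ys):
    return cross_deviation_sum(xs, ys) / (len(xs) - 1)   # divide by n - 1


def sample_correlation(xs, ys):
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


pop_cov = population_covariance(drive, score)
samp_cov = sample_covariance(drive, score)
r = sample_correlation(drive, score)
print(round(pop_cov, 2), round(samp_cov, 2), round(r, 4))
```

The population covariance matches the COVAR result, and the correlation matches CORREL. The sample covariance (dividing by n - 1 = 5) is larger in magnitude than the population value, while the correlation coefficient is the same under either convention because the divisor cancels.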
The Weighted Mean and Working with Grouped Data
• Weighted mean
• Mean for grouped data
• Variance for grouped data
• Standard deviation for grouped data
GPA Example
A grade point average is a weighted mean. That is, 4-hour courses are weighted more heavily than 3-hour courses when computing a GPA.
The Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where w_i is the weight attached to observation i.
Example: Raw Materials Purchase

Purchase   Cost per Pound ($)   Number of Pounds
1          3.00                 1200
2          3.40                 500
3          2.80                 2750
4          2.90                 1000
5          3.25                 800
Let x1 = 3.00, x2 = 3.40, x3 = 2.80, x4 = 2.90, and x5 = 3.25
Let w1 = 1200, w2 = 500, w3 = 2750, w4 = 1000, and w5 = 800
Thus:
$$\bar{x} = \frac{1200(3.00) + 500(3.40) + 2750(2.80) + 1000(2.90) + 800(3.25)}{1200 + 500 + 2750 + 1000 + 800} = \frac{18{,}500}{6250} = 2.96$$
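The same weighted-mean arithmetic can be written as a short function. This is a minimal sketch using the five purchases from the example; `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]  # x_i, dollars
pounds = [1200, 500, 2750, 1000, 800]            # w_i, pounds purchased

mean_cost = weighted_mean(cost_per_pound, pounds)
print(round(mean_cost, 2))
```

For comparison, the unweighted average of the five prices is 3.07, which overstates the cost per pound because the largest purchase was made at the lowest price.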
Grouped Data
• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.
Sample Mean for Grouped Data

For populations:
$$\mu = \frac{\sum f_i M_i}{N}$$

For samples:
$$\bar{x} = \frac{\sum f_i M_i}{n}$$

where f_i is the frequency of class i and M_i is the midpoint of class i.
Example: Apartment Rents
Given below is the previous sample of monthly rents for 70 studio apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)   Frequency
420-439    8
440-459    17
460-479    12
480-499    8
500-519    7
520-539    4
540-559    2
560-579    4
580-599    2
600-619    6
Sample Mean for Grouped Data

Rent ($)   f_i   M_i     f_i M_i
420-439    8     429.5   3436.0
440-459    17    449.5   7641.5
460-479    12    469.5   5634.0
480-499    8     489.5   3916.0
500-519    7     509.5   3566.5
520-539    4     529.5   2118.0
540-559    2     549.5   1099.0
560-579    4     569.5   2278.0
580-599    2     589.5   1179.0
600-619    6     609.5   3657.0
Total      70            34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

This approximation differs by $2.41 from the actual sample mean of $490.80.
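The grouped-data mean is a weighted mean of class midpoints, which the following sketch computes from the frequency distribution. Each midpoint is taken as the average of the class bounds, which reproduces the M_i column; `grouped_mean` is an illustrative name.

```python
# Frequency distribution from the example: (lower bound, upper bound, frequency).
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_mean(classes):
    """Approximate the sample mean as sum(f_i * M_i) / n."""
    n = sum(f for _, _, f in classes)
    # The class midpoint M_i stands in for every observation in the class.
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n


n = sum(f for _, _, f in classes)
mean_rent = grouped_mean(classes)
print(n, round(mean_rent, 2))
```

The result is only an approximation because the individual rents inside each class are replaced by the class midpoint.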
Variance for Grouped Data

For populations:
$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

For samples:
$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$
Sample Variance for Grouped Data

Rent ($)   f_i   M_i     M_i - x̄   (M_i - x̄)^2   f_i (M_i - x̄)^2
420-439    8     429.5   -63.7     4058.96       32471.71
440-459    17    449.5   -43.7     1910.56       32479.59
460-479    12    469.5   -23.7     562.16        6745.97
480-499    8     489.5   -3.7      13.76         110.11
500-519    7     509.5   16.3      265.36        1857.55
520-539    4     529.5   36.3      1316.96       5267.86
540-559    2     549.5   56.3      3168.56       6337.13
560-579    4     569.5   76.3      5820.16       23280.66
580-599    2     589.5   96.3      9271.76       18543.53
600-619    6     609.5   116.3     13523.36      81140.18
Total      70                                    208234.29
Sample Variance for Grouped Data
• Sample Variance
  $$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$
• Sample Standard Deviation
  $$s = \sqrt{3{,}017.89} = 54.94$$
This approximation differs by only $.20 from the actual standard deviation of $54.74.
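The grouped-data variance and standard deviation can be checked the same way. This sketch uses the unrounded grouped mean, whereas the table appears to use the mean rounded to 493.21, so intermediate values may differ in the last decimal place; the final results agree to two decimals. `grouped_sample_stats` is an illustrative name.

```python
from math import sqrt

midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]


def grouped_sample_stats(midpoints, freqs):
    """Return (mean, variance, standard deviation) approximated from grouped data."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Frequencies weight each squared deviation of a midpoint from the mean.
    sum_sq = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    variance = sum_sq / (n - 1)  # sample variance divides by n - 1
    return mean, variance, sqrt(variance)


mean_rent, variance, std_dev = grouped_sample_stats(midpoints, freqs)
print(round(mean_rent, 2), round(variance, 2), round(std_dev, 2))
```

For a population rather than a sample, the divisor n - 1 would be replaced by N, as in the population formula.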