Chapter 5 - danagoins

Chapter 5
Summarizing Bivariate Data
Correlation
Variables:
• Response variable (y): measures an outcome (dependent variable).
• Explanatory variable (x): helps explain or influences changes in a response variable (independent variable).
Suppose we found the age and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the age and weight of these adults?

Age (years)   24   30   41   28   50   46   49   35   20   39
Weight (lb)  256  124  320  185  158  129  103  196  110  130
Does there seem to be a relationship between the age and weight of these adults?
Suppose we found the height and weight of a sample of 10 adults. Create a scatterplot of the data below. Is there any relationship between the height and weight of these adults? Is it positive or negative? Weak or strong?

Height (in)   74   65   77   72   68   60   62   73   61   64
Weight (lb)  256  124  320  185  158  129  103  196  110  130
Does there seem to be a relationship between the height and weight of these adults?
When describing the relationship between two variables, you should address:
• Direction (positive, negative, or neither)
• Strength of the relationship (how much scatter is there?)
• Form (linear or some other pattern)
• Unusual features (outliers or influential points)
And ALWAYS describe it in the context of the problem!
Identify each pair as having a positive association, a negative association, or no association.
1. Heights of mothers and heights of their adult daughters: + (positive)
2. Age of a car in years and its current value: − (negative)
3. Weight of a person and calories consumed: + (positive)
4. Height of a person and the person's birth month: NO association
5. Number of hours spent in safety training and the number of accidents that occur: − (negative)
The closer the points in a scatterplot are to a straight line, the stronger the relationship.
The farther the points are from a straight line, the weaker the relationship.
Correlation
Correlation measures the direction and the strength of a linear relationship between 2 quantitative variables.
Find the mean and standard deviation of the heights and weights of the 10 students:

Height (in)   Weight (lb)
    74            256
    65            124
    77            320
    72            185
    68            158
    60            129
    62            103
    73            196
    61            110
    64            130

x̄ = 67.6, s_x = 6.06
ȳ = 171.1, s_y = 70.25

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}$$
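The summary statistics and the deviation form of the formula can be checked with a short script. This is a sketch that assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the variable names are hypothetical.

```python
from statistics import mean, stdev  # stdev uses the n - 1 (sample) divisor

heights = [74, 65, 77, 72, 68, 60, 62, 73, 61, 64]            # x, inches
weights = [256, 124, 320, 185, 158, 129, 103, 196, 110, 130]  # y, pounds

x_bar, s_x = mean(heights), stdev(heights)
y_bar, s_y = mean(weights), stdev(weights)
n = len(heights)

# Deviation form: sum of cross-products divided by (n - 1) * s_x * s_y
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
r = cross / ((n - 1) * s_x * s_y)

print(f"x-bar = {x_bar:.1f}, s_x = {s_x:.2f}")  # x-bar = 67.6, s_x = 6.06
print(f"y-bar = {y_bar:.1f}, s_y = {s_y:.2f}")  # y-bar = 171.1, s_y = 70.25
print(f"r = {r:.4f}")                           # r = 0.9159
```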
Standardize each height and weight, then multiply the two z-scores for each student, where z_x = (x − x̄)/s_x and z_y = (y − ȳ)/s_y:

Height (in)  Weight (lb)    z_x     z_y    z_x · z_y
    74           256        1.06    1.21     1.28
    65           124        -.43    -.67      .29
    77           320        1.55    2.12     3.29
    72           185         .73     .20      .14
    68           158         .07    -.19     -.01
    60           129       -1.25    -.60      .75
    62           103        -.92    -.97      .90
    73           196         .89     .35      .32
    61           110       -1.09    -.87      .95
    64           130        -.59    -.59      .35

Sum of the products ≈ 8.24 (using unrounded z-scores), so r ≈ 8.24/9 ≈ .916.
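The z-score form of the calculation can be sketched as follows, assuming sample standard deviations (n − 1). Adding the two-decimal products from the table gives 8.26, while the unrounded products sum to about 8.24, which is why hand calculations can differ from software in the third decimal place. The helper `z_scores` is a hypothetical name for illustration.

```python
from statistics import mean, stdev

heights = [74, 65, 77, 72, 68, 60, 62, 73, 61, 64]
weights = [256, 124, 320, 185, 158, 129, 103, 196, 110, 130]

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

zx, zy = z_scores(heights), z_scores(weights)
products = [a * b for a, b in zip(zx, zy)]

total = sum(products)            # sum of z-score products (unrounded)
r = total / (len(products) - 1)  # divide by n - 1 = 9
print(f"sum of products = {total:.2f}, r = {r:.4f}")  # 8.24, 0.9159
```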
Correlation Coefficient (r)
• A quantitative assessment of the strength and direction of the linear relationship between bivariate, quantitative data
• Pearson's sample correlation is the most commonly used
• parameter: ρ (rho)
• statistic: r

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}$$
Speed limit (mph)                 55   50   45   40   30   20
Avg. # of accidents (weekly)      28   25   21   17   11    6

Calculate r. Interpret r in context.
r = .9964
There is a strong, positive, linear relationship between speed limit and average number of accidents per week.
Properties of r (correlation coefficient)
• legitimate values of r are in the interval [−1, 1]
[Figure: scale of r from −1 to 1 (ticks at −1, −.8, −.5, 0, .5, .8, 1) with regions labeled strong, moderate, weak, and no correlation]
[Figure: Some Correlation Pictures]
x (in mm)  12  15  21  32  26  19  24
y           4   7  10  14   9   8  12

Find r. r = .9181
Interpret r in context.
There is a strong, positive, linear relationship between x (in mm) and y.
• value of r is not changed by linear transformations of x or y (changing units, adding a constant, or multiplying by a positive constant)
Find r: r = .9181.
Change x to cm and find r: r = .9181.
Do the following transformations and calculate r:
1) 5(x + 14)
2) (y + 30) ÷ 4
r is STILL .9181. The correlations are the same.
• value of r does not depend on which of the two variables is labeled x
Switch x and y and find r. Type: LinReg L2, L1
The correlations are the same.
• value of r is non-resistant
x  12  15  21  32  26  19  24
y   4   7  10  14   9   8  22
Find r.
Outliers affect the correlation coefficient.
• value of r is a measure of the extent to which x and y are linearly related
Find the correlation for these points:
x  −3  −1   1   3   5   7   9
y  40  20   8   4   8  20  40
What does this correlation mean? Sketch the scatterplot.
r = 0, but there is a definite relationship!
1) Correlation makes no distinction between explanatory and response variables. It is unitless.
2) Correlation does not change when we change the units of measurement of x, y, or both.
3) Correlation requires both variables to be quantitative.
4) Correlation does not describe curved relationships between variables, no matter how strong; it describes only the linear relationship.
5) Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations.
Correlation does not imply causation