Measuring Association
September 10, 2001
Statistics for Psychosocial Research
Lecture 2
Today’s Topics
• Covariance
• Pearson correlation
• Spearman correlation
• Association with non-linear data
– tetrachoric / polychoric correlation
– odds ratios
• Association matrices
Measuring Associations
• Goal: Evaluate associations between pairs of
variables being used to measure a construct of
interest
• Examples:
– depression: sleeping problems ~ guilt?
– disability: time to walk 10 m ~ self-reported difficulty
walking 10 m?
– schizophrenia: social class ~ schizophrenia?
– SES: education ~ income?
Associations in Psychosocial Research
• Crucial to the process of defining a
construct
(1) “too” associated?
(2) not associated?
• not appropriately describing “construct”
• measuring different dimensions of “construct”
(e.g. mood versus somatic symptoms of
depression)
Associations between variables affect….
• Reliability
• Validity
• Factor Analysis
• Latent Class Analysis
• Structural Equation Models
All of the above: Measurement Issue
Variance and Covariance
• Variance: Measures variability in one variable, X.
$$ s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \approx \sigma_x^2 $$
• Covariance: Measures how two variables, X and Y, covary.
$$ s_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \approx \sigma_{xy} $$
Examples of Variance
[Figure: two panels, one for X and one for Y]
Examples of Covariance
[Figure: two panels plotted against X]
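The variance and covariance formulas can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the data are the six (x, y) pairs from the Spearman example later in this lecture.

```python
def mean(values):
    return sum(values) / len(values)

def sample_variance(x):
    """s_x^2 = 1/(N-1) * sum of squared deviations from the mean."""
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

def sample_covariance(x, y):
    """s_xy = 1/(N-1) * sum of products of paired deviations."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

# Six (x, y) pairs taken from the Spearman example in this lecture.
x = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0]
y = [0.4, 0.6, 0.5, 0.9, 1.8, 1.2]

var_x = sample_variance(x)
var_y = sample_variance(y)
cov_xy = sample_covariance(x, y)
print(f"var(x) = {var_x:.3f}, var(y) = {var_y:.3f}, cov(x, y) = {cov_xy:.3f}")
```

Note that the covariance of a variable with itself is its variance, so sample_covariance(x, x) returns the same value as sample_variance(x).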
Correlation, r
Correlation is a scaled version of covariance
$$ r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}} $$
$-1 \le r \le 1$
r = 1: perfect positive correlation
r = -1: perfect negative correlation
r = 0: uncorrelated
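A minimal sketch of correlation as a scaled covariance follows. The function name is hypothetical and the data are the six (x, y) pairs from the Spearman example in this lecture. It also illustrates why the scaling matters: changing the units of x changes the covariance but leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """r = s_xy / sqrt(s_x^2 * s_y^2); the 1/(N-1) factors cancel."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0]
y = [0.4, 0.6, 0.5, 0.9, 1.8, 1.2]

r = pearson_r(x, y)
# Rescale x (e.g. a change of units): r should not move.
r_rescaled = pearson_r([100 * xi for xi in x], y)
print(f"r = {r:.2f}, r after rescaling x = {r_rescaled:.2f}")
```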
Covariance and Correlation
• When are they appropriate measures of
association?
• What type of association do they describe?
• Transformations
• Scatterplots
• Outliers
Spearman Correlation
• Use when:
– skewed data
– outliers
– sparse data
• Effect:
– downweights outliers
– smooths a curve to a straight line
Spearman Correlation
• Method:
– sort x and y
– replace data with ranks
– calculate Pearson correlation on ranks.

data            ranks
x     y         x*    y*
0.1   0.4       1     1
0.3   0.6       2     3
0.5   0.5       3     2
0.6   0.9       4     4
0.8   1.8       5     6
1.0   1.2       6     5
r=0.79          r=0.89

[Figure: scatterplot of y against x]
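The method above can be sketched as follows, using the six data pairs from this slide. The helper names are hypothetical, and giving tied values their average rank is an assumption of this sketch (the slide's data contain no ties).

```python
import math

def average_ranks(values):
    """Rank from 1 (smallest) to N; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0]
y = [0.4, 0.6, 0.5, 0.9, 1.8, 1.2]

print("ranks of y:", average_ranks(y))
print(f"Pearson r = {pearson_r(x, y):.2f}, Spearman r = {spearman_r(x, y):.2f}")
```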
Spearman Correlation
[Figure: two scatterplot panels, one labeled Pearson r = 0.72 and one labeled Spearman r = 0.59]
Problems with Correlation/Covariance
between variables
What if one (or both) variables is (are) not really continuous?
e.g. number of pregnancies and education level
[Figure: number of pregnancies and education; r = -0.6]
Is correlation appropriate?
Other issues
• Binary (example figure: r = 0.35)
• Highly skewed or “floor” or “ceiling” effects
– e.g. number of hospital admissions, daily percent humidity in Baltimore in July, mini-mental exam score
• Ordinal: Takes a finite number of values
– e.g. on a scale of 1 to 5
Binary Example: Disability
• Two types of association
– redundancy: b and c cells are close to 0
– hierarchy: either b OR c is close to 0, but the other is not.
• Pearson correlation mixes up association and similarity of “marginal” distributions
• Consequences: If hierarchy is relevant, you get low reliability, low consistency, and misleading internal validity by using Pearson correlation.

                            Difficulty Walking 1/4 Mile
                            No      Yes     Total
Difficulty        No        40      0       40
Walking 1 mile    Yes       40      20      60
                  Total     80      20      100
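To see the problem numerically, here is a small sketch that computes the Pearson correlation for the table above. For two 0/1 variables the Pearson correlation can be computed directly from the cell counts (it is often called the phi coefficient). The function name and the cell layout a, b, c, d (rows top to bottom, columns left to right) are assumptions of this sketch, and the second table is a hypothetical "redundancy" table made up for contrast.

```python
import math

def phi_coefficient(a, b, c, d):
    """Pearson correlation between two 0/1 variables from 2x2 cell counts.

    Assumed layout: a = (No, No), b = (No, Yes), c = (Yes, No), d = (Yes, Yes).
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Table from the slide: nobody reports difficulty with 1/4 mile
# without also reporting difficulty with 1 mile (b = 0, hierarchy).
phi_hierarchy = phi_coefficient(40, 0, 40, 20)

# Hypothetical redundancy table (b = c = 0), for comparison only.
phi_redundancy = phi_coefficient(80, 0, 0, 20)

print(f"hierarchy:  r = {phi_hierarchy:.2f}")
print(f"redundancy: r = {phi_redundancy:.2f}")
```

The hierarchical table has a perfectly ordered pattern, yet its Pearson correlation is well below 1 because the two margins (20% versus 60% "Yes") differ.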
Alternative Measures
• Tetrachoric Correlation
– binary variables
• Polychoric Correlation
– ordinal variables
• Odds Ratio
– binary variables
Tetrachoric Correlation
• Estimates what the
correlation between two
binary variables would be
if the “ratings” were made
on a continuous scale.
• Example: difficulty
walking up 10 steps and
difficulty lifting 10 lbs.
[Figure: “no difficulty” versus “difficulty” regions along a continuous level]
Tetrachoric Correlation
• Assumes that both
“traits” are normally
distributed
• Correlation, r,
measures how narrow
the ellipse is.
• a, b, c, d are the
proportions in each
quadrant
[Figure: ellipse divided into quadrants a, b, c, d]
Tetrachoric Correlation
For $\alpha = ad/bc$,
Approximation 1:
$$ Q = \frac{\alpha - 1}{\alpha + 1} $$
Approximation 2 (Digby):
$$ Q = \frac{\alpha^{3/4} - 1}{\alpha^{3/4} + 1} $$
Tetrachoric Correlation
• Example:
– Tetrachoric correlation = 0.61
– Pearson correlation = 0.41
– Odds ratio = 6
• Interpretation?
– Same as Pearson correlation.

                            Difficulty Walking Up 10 Steps
                            No      Yes     Total
Difficulty        No        40      10      50
Lifting 10 lb.    Yes       20      30      50
                  Total     60      40      100
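The two approximations, and a direct numerical estimate, can be compared on this table. The sketch below is not the lecture's own procedure. It assumes the cell layout a, b, c, d (rows top to bottom, columns left to right), uses hypothetical function names, and estimates the tetrachoric correlation by setting the two thresholds from the margins and then searching for the bivariate normal correlation that reproduces the observed (Yes, Yes) proportion. It relies on the identity P(Z1 > h, Z2 > k) = P(Z1 > h) P(Z2 > k) + the integral from 0 to rho of the bivariate normal density at (h, k), evaluated here with Simpson's rule.

```python
import math
from statistics import NormalDist  # Python 3.8+

def yule_q(a, b, c, d):
    """Approximation 1: (alpha - 1) / (alpha + 1), with alpha = ad/bc."""
    alpha = (a * d) / (b * c)
    return (alpha - 1) / (alpha + 1)

def digby(a, b, c, d):
    """Approximation 2 (Digby): same form with alpha raised to the 3/4 power."""
    alpha = (a * d) / (b * c)
    return (alpha ** 0.75 - 1) / (alpha ** 0.75 + 1)

def bivariate_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    one_minus_r2 = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus_r2)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(one_minus_r2))

def upper_quadrant_prob(h, k, rho, steps=400):
    """P(Z1 > h, Z2 > k) under correlation rho (Simpson's rule, even steps)."""
    nd = NormalDist()
    base = (1.0 - nd.cdf(h)) * (1.0 - nd.cdf(k))
    width = rho / steps
    total = bivariate_density(h, k, 0.0) + bivariate_density(h, k, rho)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * bivariate_density(h, k, i * width)
    return base + total * width / 3.0

def tetrachoric(a, b, c, d, tol=1e-6):
    """Bisection search for rho; assumes all four margins are non-zero."""
    n = a + b + c + d
    nd = NormalDist()
    h = nd.inv_cdf((a + c) / n)  # threshold for the column variable
    k = nd.inv_cdf((a + b) / n)  # threshold for the row variable
    target = d / n               # observed proportion in the (Yes, Yes) cell
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_quadrant_prob(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

a, b, c, d = 40, 10, 20, 30  # table from the slide
q1 = yule_q(a, b, c, d)
q2 = digby(a, b, c, d)
rho = tetrachoric(a, b, c, d)
print(f"Approximation 1 = {q1:.2f}, Digby = {q2:.2f}, numerical = {rho:.2f}")
```

On this table the Digby approximation lands much closer to the slide's tetrachoric value than Approximation 1 does.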
Odds Ratio
• Measure of association
between two binary
variables
• Risk associated with x
given y.
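• Example: ratio of the odds of difficulty walking up 10 steps among those with difficulty lifting 10 lb ($p_1 = 30/50$) to the same odds among those without difficulty lifting ($p_2 = 10/50$):
$$ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{ad}{bc} = \frac{(40)(30)}{(20)(10)} = 6 $$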
Odds Ratio
                            Difficulty Walking 1/4 Mile
                            No      Yes     Total
Difficulty        No        40      0       40
Walking 1 mile    Yes       40      20      60
                  Total     80      20      100

$$ \frac{ad}{bc} = \frac{(40)(20)}{(40)(0)} = \infty $$
Pros and Cons
• Tetrachoric correlation
– same interpretation as Spearman and Pearson
correlation
– “difficult” to calculate
• Odds Ratio
– easy to understand, but no “perfect” association that is
manageable
– easy to calculate
– not comparable to correlations
• May give you different results/inference!
Association Matrices
• Age, income, education
• Correlation Matrix
          grade    income   age
grade     1.00     0.45     -0.25
income    0.45     1.00     -0.13
age       -0.25    -0.13    1.00
• Covariance Matrix
          grade    income   age
grade     6.61     28.18    -5.77
income    28.18    592.69   -29.10
age       -5.77    -29.10   81.23
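The two matrices carry the same information about association: dividing each covariance by the product of the two standard deviations (the square roots of the diagonal entries) recovers the correlation matrix. A minimal sketch, with a hypothetical function name and the covariance values from this slide:

```python
import math

names = ["grade", "income", "age"]
cov = [
    [6.61, 28.18, -5.77],
    [28.18, 592.69, -29.10],
    [-5.77, -29.10, 81.23],
]

def cov_to_corr(cov):
    """r_ij = s_ij / (s_i * s_j), with s_i the square root of the i-th diagonal entry."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]

corr = cov_to_corr(cov)
for name, row in zip(names, corr):
    print(f"{name:<8}" + "  ".join(f"{value:6.2f}" for value in row))
```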
Association Matrices
• Depression: depressed mood, sleep
problems, fatigue
• Odds Ratio Matrix
           depress   sleep    fatigue
depress    ---       8.17     10.91
sleep      8.17      ---      16.12
fatigue    10.91     16.12    ---