Measuring Association
September 10, 2001
Statistics for Psychosocial Research, Lecture 2

Today’s Topics
• Covariance
• Pearson correlation
• Spearman correlation
• Association with non-linear data
  – tetrachoric / polychoric correlation
  – odds ratios
• Association matrices

Measuring Associations
• Goal: Evaluate associations between pairs of variables being used to measure a construct of interest
• Examples:
  – depression: sleeping problems ~ guilt?
  – disability: time to walk 10 m ~ self-reported difficulty walking 10 m?
  – schizophrenia: social class ~ schizophrenia?
  – SES: education ~ income?

Associations in Psychosocial Research
• Crucial to the process of defining a construct
  (1) “too” associated?
  (2) not associated?
    – not appropriately describing “construct”
    – measuring different dimensions of “construct” (e.g. mood versus somatic symptoms of depression)

Associations between variables affect...
• Reliability
• Validity
• Factor Analysis
• Latent Class Analysis
• Structural Equation Models
Measurement Issue

Variance and Covariance
• Variance: Measures variability in one variable, X.
  $$ s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \hat{\sigma}_x^2 $$
• Covariance: Measures how two variables, X and Y, covary.
  $$ s_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = \hat{\sigma}_{xy} $$

Examples of Variance
[Figure: two panels, one for X and one for Y]

Examples of Covariance
[Figure: two panels, each with X on the horizontal axis]

Correlation, r
Correlation is a scaled version of covariance:
  $$ r_{xy} = \frac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}} $$
  $-1 \le r \le 1$
  r = 1: perfect positive correlation
  r = -1: perfect negative correlation
  r = 0: uncorrelated

Covariance and Correlation
• When are they appropriate measures of association?
• What type of association do they describe?
• Transformations
• Scatterplots
• Outliers

Spearman Correlation
• Use when:
  – skewed data
  – outliers
  – sparse data
• Effect:
  – downweights outliers
  – smooths a curve to a straight line

Spearman Correlation
• Method:
  – sort x and y
  – replace data with ranks
  – calculate Pearson correlation on ranks

     data            ranks
    x     y         x*    y*
   0.1   0.4         1     1
   0.3   0.6         2     3
   0.5   0.5         3     2
   0.6   0.9         4     4
   0.8   1.8         5     6
   1.0   1.2         6     5
   r = 0.79          r = 0.89

[Figure: scatterplot of y against x for the data above]

Spearman Correlation
[Figure: scatterplots with horizontal axes x1 and x, annotated with Pearson r = 0.72 and Spearman r = 0.59]

Problems with Correlation/Covariance between variables
What if one (or both) of the variables is not really continuous?
e.g. number of pregnancies and education level
[Figure: panels for Number of Pregnancies and Education, annotated with r = -0.6]
Is correlation appropriate?

Other issues
• Binary:
  [Figure: binary variable plotted against x, annotated with r = 0.35]
• Highly skewed or “floor” or “ceiling” effects
  – e.g. number of hospital admissions, daily percent humidity in Baltimore in July, mini-mental exam score
• Ordinal: Takes a finite number of values
  – e.g. on a scale of 1 to 5

Binary Example: Disability
• Two types of association
  – redundancy: b and c cells are close to 0
  – hierarchy: either b OR c is close to 0, but the other is not
• Pearson correlation mixes up association and similarity of “marginal” distribution
• Consequences: If hierarchy is relevant, using Pearson correlation gives low reliability, low consistency, and misleading internal validity.

                                Difficulty Walking 1/4 Mile
  Difficulty Walking 1 mile       No     Yes    Total
    No                            40       0      40
    Yes                           40      20      60
    Total                         80      20     100

Alternative Measures
• Tetrachoric Correlation
  – binary variables
• Polychoric Correlation
  – ordinal variables
• Odds Ratio
  – binary variables

Tetrachoric Correlation
• Estimates what the correlation between two binary variables would be if the “ratings” were made on a continuous scale.
• Example: difficulty walking up 10 steps and difficulty lifting 10 lbs.
[Figure: axis labeled “Level”, with regions labeled “no difficulty” and “difficulty”]

Tetrachoric Correlation
• Assumes that both “traits” are normally distributed
• Correlation, r, measures how narrow the ellipse is.
• a, b, c, d are the proportions in each quadrant
[Figure: ellipse divided into quadrants labeled a, b, c, d]

Tetrachoric Correlation
For $\alpha = ad/bc$,
  Approximation 1:
  $$ Q = \frac{\alpha - 1}{\alpha + 1} $$
  Approximation 2 (Digby):
  $$ Q = \frac{\alpha^{3/4} - 1}{\alpha^{3/4} + 1} $$

Tetrachoric Correlation
• Example:
  – Tetrachoric correlation = 0.61
  – Pearson correlation = 0.41
  – Odds ratio = 6
• Interpretation?
  – Same as Pearson correlation.

                                Difficulty Walking Up 10 Steps
  Difficulty Lifting 10 lb.       No     Yes    Total
    No                            40      10      50
    Yes                           20      30      50
    Total                         60      40     100

Odds Ratio
• Measure of association between two binary variables
• Risk associated with x given y.
• Example: the odds of difficulty walking up 10 steps for those with difficulty lifting 10 lb, relative to the odds for those without:
  $$ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{ad}{bc} = \frac{(40)(30)}{(20)(10)} = 6 $$

Odds Ratio

                                Difficulty Walking 1/4 Mile
  Difficulty Walking 1 mile       No     Yes    Total
    No                            40       0      40
    Yes                           40      20      60
    Total                         80      20     100

  $$ OR = \frac{ad}{bc} = \frac{(40)(20)}{(40)(0)} $$
  Because one off-diagonal cell is 0, the denominator is 0 and the odds ratio is infinite.

Pros and Cons
• Tetrachoric correlation
  – same interpretation as Spearman and Pearson correlation
  – “difficult” to calculate
• Odds Ratio
  – easy to understand, but there is no manageable value for “perfect” association (the odds ratio is infinite when b or c is 0)
  – easy to calculate
  – not comparable to correlations
• May give you different results/inference!

Association Matrices
• Age, income, education
• Correlation Matrix

             grade   income     age
    grade     1.00     0.45   -0.25
    income    0.45     1.00   -0.13
    age      -0.25    -0.13    1.00

• Covariance Matrix

             grade   income     age
    grade     6.61    28.18   -5.77
    income   28.18   592.69  -29.10
    age      -5.77   -29.10   81.23

Association Matrices
• Depression: depressed mood, sleep problems, fatigue
• Odds Ratio Matrix

              depress   sleep   fatigue
    depress     ---      8.17    10.91
    sleep       8.17     ---     16.12
    fatigue    10.91    16.12     ---
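
The worked numbers in this lecture can be rechecked with a short script. The following is a minimal sketch in plain Python, not the lecturer's own code. The function names (`pearson`, `ranks`, `spearman`, `two_by_two`, `cov_to_corr`) are hypothetical, the 2x2 cells are assumed to be laid out with a, b in the first row and c, d in the second, and the ranking helper assumes there are no ties, which holds for the six data points on the Spearman slide. The tetrachoric values are only the two approximations given on the slides, not a full estimate.

```python
import math


def pearson(x, y):
    """Pearson r = s_xy / sqrt(s_x^2 * s_y^2), with N - 1 denominators."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Ascending ranks 1..N. Assumption: no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def two_by_two(a, b, c, d):
    """Association measures for a 2x2 table (assumed rows: a, b and c, d)."""
    # Odds ratio alpha = ad / bc; treated as infinite when bc = 0.
    alpha = (a * d) / (b * c) if b * c != 0 else math.inf

    def q(power):
        # (alpha^power - 1) / (alpha^power + 1); tends to 1 as alpha grows.
        if math.isinf(alpha):
            return 1.0
        t = alpha ** power
        return (t - 1) / (t + 1)

    # Pearson correlation of two 0/1 variables, computed from the cell counts.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return {
        "odds_ratio": alpha,
        "pearson": phi,
        "approx1": q(1.0),   # Approximation 1 from the slides
        "digby": q(0.75),    # Approximation 2 (Digby)
    }


def cov_to_corr(cov):
    """Scale a covariance matrix into a correlation matrix."""
    size = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(size)] for i in range(size)]


# Data from the Spearman slide.
x = [0.1, 0.3, 0.5, 0.6, 0.8, 1.0]
y = [0.4, 0.6, 0.5, 0.9, 1.8, 1.2]
print("Pearson:", round(pearson(x, y), 2), "Spearman:", round(spearman(x, y), 2))

# 2x2 tables: lifting vs. 10 steps, and 1 mile vs. 1/4 mile.
lifting = two_by_two(40, 10, 20, 30)
walking = two_by_two(40, 0, 40, 20)
print("Lifting / steps:", lifting)
print("1 mile / 1/4 mile:", walking)

# Covariance matrix for grade, income, age.
cov = [[6.61, 28.18, -5.77], [28.18, 592.69, -29.10], [-5.77, -29.10, 81.23]]
corr = cov_to_corr(cov)
for row in corr:
    print([round(value, 2) for value in row])
```

Under these assumptions the script gives a Pearson r of about 0.79 and a Spearman r of about 0.89 for the slide data. It also gives a Pearson correlation near 0.41 for both 2x2 tables, even though the odds ratio is 6 for one and infinite for the other, which illustrates why the two kinds of measure can lead to different inferences. The Digby approximation comes out near 0.59, close to but not identical to the tetrachoric correlation of 0.61 quoted on the slide.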