• Spearman’s correlation coefficient, $r_s$, can be computed as Pearson’s $r$ on the ranks; i.e., rank the X’s (among the X’s) and the Y’s (among the Y’s) and then compute the correlation of the ranks.
• See Table 5.2.1 and let’s do it in R (use cor with method="s" on the data, or with method="p" on the ranks).
• We may test the null hypothesis of no association between X and Y by doing a permutation test on the ranks, considering all possible assignments of the ranks of the Y’s to the ranks of the X’s. If our observed correspondence yields an unusually high (or low) value of $r_s$, then we should reject the hypothesis of no association between X and Y.
• We may also test the above hypothesis with the same normal approximation used for Pearson’s $r$: $Z = r_s\sqrt{n-1}$; i.e., under the null hypothesis $r_s$ is approximately normal with mean 0 and standard deviation $1/\sqrt{n-1}$.
• What about ties? There are two methods mentioned on p.155ff:
– compute adjusted ranks (midranks) and apply the same formulas we’ve just mentioned
– use the tie-adjusted formulae given on page 156 (see the next slide...)
The author (and I too!) recommend the former.
• The following formula for Spearman’s rank correlation (without ties) appears in the literature and we’ll mention it here. It is the one that can be modified for ties; see page 156, where it is defined.
$$ r_s = 1 - \frac{6D}{n(n^2 - 1)}, \qquad \text{where } D = \sum_{i=1}^{n} \bigl( R(X_i) - R(Y_i) \bigr)^2 $$
Verify that it gives the same results; see problem #13 on page 192-193 for an outline of the theoretical proof of the equivalence of this formula to the definition of $r_s$.
• Another measure of association is Kendall’s tau, $\tau$, which looks at the distribution of concordant and discordant pairs of the (X,Y)s:
• $(X_i, Y_i)$ and $(X_j, Y_j)$ are concordant if $X_i < X_j$ implies $Y_i < Y_j$, and discordant if $X_i < X_j$ implies $Y_i > Y_j$ (or equivalently, concordant if $(X_i - X_j)(Y_i - Y_j) > 0$; discordant if $(X_i - X_j)(Y_i - Y_j) < 0$). X and Y are positively associated if pairs are more likely to be concordant than discordant, and negatively associated if pairs are more likely to be discordant than concordant.
$$ \tau = 2\,P\bigl[ (X_i - X_j)(Y_i - Y_j) > 0 \bigr] - 1 $$
• Note that tau is just the probability of a concordant pair, rescaled to lie between -1 and +1; if there is no association, then the probability of a concordant pair is the same as the probability of a discordant pair, 0.5, so $\tau = 0$.
• We estimate tau by counting the fraction of concordant pairs in the data, doubling it and subtracting 1:
$$ r_\tau = 2\,\frac{\sum_{i=1}^{n-1} V_i}{\binom{n}{2}} - 1 $$
• Here, $V_i = \sum_{j=i+1}^{n} U_{ij}$, where
$$ U_{ij} = \begin{cases} 1, & \text{if } (X_i - X_j)(Y_i - Y_j) > 0 \\ 0, & \text{if } (X_i - X_j)(Y_i - Y_j) < 0 \end{cases} $$
• Ranks may also be used to compute tau, since pairs of ranks are concordant or discordant according to whether the original pairs are concordant or discordant.
• R computes Kendall’s tau in cor.test and SAS computes it in PROC CORR.
• Exact p-values for testing the hypothesis of no association between X and Y may be obtained by a permutation test; approximate p-values may be obtained from the large-sample properties of Kendall’s tau statistic: under the null hypothesis, $r_\tau$ is approximately $N(0, SD(r_\tau))$, where
$$ \mathrm{VAR}(r_\tau) = \frac{4n + 10}{9(n^2 - n)} $$
• HW: Read Chapter 5 through page 163. We will complete this topic (association between two continuous variables) on Thursday, so have your questions ready by then. Do problems #3-5 on page 189-190; we’ll discuss them next class.
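For the Spearman part of these notes, here is a minimal R sketch that computes $r_s$ three ways (built in, Pearson on the ranks, and the $D$ formula) and then applies the normal approximation and a permutation test. The data are made-up toy values with no ties, not the values from Table 5.2.1, and the variable names are only illustrative. The notes describe a permutation test over all possible assignments; the sketch swaps in a random sample of permutations (a Monte Carlo approximation) to keep the code short.

```r
# Hypothetical toy data with no ties (NOT the data from Table 5.2.1).
x <- c(2.1, 3.4, 1.7, 5.0, 4.2, 6.3)
y <- c(1.0, 2.9, 1.5, 3.1, 4.8, 4.0)
n <- length(x)

# Spearman's r_s three ways.
rs_builtin <- cor(x, y, method = "spearman")             # method = "s" on the data
rs_ranks   <- cor(rank(x), rank(y), method = "pearson")  # method = "p" on the ranks
D          <- sum((rank(x) - rank(y))^2)                 # sum of squared rank differences
rs_formula <- 1 - 6 * D / (n * (n^2 - 1))

# Normal approximation: Z = r_s * sqrt(n - 1), two-sided p-value.
z        <- rs_builtin * sqrt(n - 1)
p_normal <- 2 * pnorm(-abs(z))

# Monte Carlo permutation test (an assumption of this sketch, standing in for
# full enumeration): shuffle the Y ranks and recompute the correlation.
set.seed(1)
n_perm  <- 5000
rs_perm <- replicate(n_perm, cor(rank(x), sample(rank(y))))
p_perm  <- mean(abs(rs_perm) >= abs(rs_builtin) - 1e-12)

print(c(builtin = rs_builtin, ranks = rs_ranks, formula = rs_formula))
print(c(p_normal = p_normal, p_perm = p_perm))
```

With ties, rank() returns midranks by default, so the first two computations still apply, while the untied $D$ formula would no longer be expected to match them.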
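The Kendall’s tau estimate can be sketched the same way: count the concordant pairs directly, compare with R’s built-in value, and apply the large-sample variance. The toy data are again made up and contain no ties; `pairs`, `U`, and the other names are illustrative only.

```r
# Hypothetical toy data with no ties.
x <- c(2.1, 3.4, 1.7, 5.0, 4.2, 6.3)
y <- c(1.0, 2.9, 1.5, 3.1, 4.8, 4.0)
n <- length(x)

# All index pairs i < j, one pair per column.
pairs <- combn(n, 2)
i <- pairs[1, ]
j <- pairs[2, ]

# U_ij = 1 for a concordant pair, 0 for a discordant pair.
U <- as.numeric((x[i] - x[j]) * (y[i] - y[j]) > 0)

# r_tau = 2 * (fraction of concordant pairs) - 1.
r_tau <- 2 * sum(U) / choose(n, 2) - 1

# The same value from the built-in estimator applied to the ranks.
r_tau_ranks <- cor(rank(x), rank(y), method = "kendall")

# Large-sample approximation: VAR(r_tau) = (4n + 10) / (9(n^2 - n)).
var_tau  <- (4 * n + 10) / (9 * (n^2 - n))
z        <- r_tau / sqrt(var_tau)
p_normal <- 2 * pnorm(-abs(z))

# cor.test reports tau and a p-value for the test of no association.
kt <- cor.test(x, y, method = "kendall")

print(c(by_hand = r_tau, from_ranks = r_tau_ranks, cor_test = unname(kt$estimate)))
print(c(p_normal = p_normal, p_cor_test = kt$p.value))
```

For a sample this small, the normal approximation is rough, so the two p-values need not agree closely.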