CHAPTER 19
Correspondence Analysis

Tables, Figures, and Equations

From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon. http://www.pcord.com

Figure 19.1. A synthetic data set of eleven species with noiseless hump-shaped responses to an environmental gradient. The gradient was sampled at eleven points (sample units), numbered 1-11.

Figure 19.2. Comparison of PCA and CA of the data set shown in Figure 19.1. PCA curves the ends of the gradient in, while CA does not. The vectors indicate the correlations of the environmental gradient with the axis scores. [Figure: paired ordination diagrams, PCA and CA, Axis 1 vs. Axis 2, sample units SU1-SU11, vector EnvGradA.]

Figure 19.3. One-dimensional ordinations of the data set in Figures 19.1 and 19.2, using nonmetric multidimensional scaling (NMS), PCA, and CA. Both NMS and CA ordered the points correctly on the first axis, while PCA did not.

How it works

Axes are rotated simultaneously through species space and sample space with the object of maximizing the correspondence between the two spaces. This produces two sets of linear equations, X and Y, where

    X = A Y   and   Y = A' X

A is the original data matrix with n rows (henceforth sample units) and p columns (henceforth species). X contains the coordinates for the n sample units on k axes (dimensions). Y contains the coordinates for the p species on k axes. Note that both X and Y refer to the same coordinate system.

The goal is to maximize the correspondence, R, defined as:

    R = \frac{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij} x_i y_j}{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij}}

under the constraints that

    \sum_{i=1}^{n} x_i^2 = 1   and   \sum_{j=1}^{p} y_j^2 = 1

Major steps: eigenanalysis approach

1. Calculate weighting coefficients based on the reciprocals of the sample unit totals and the species totals.
v contains the sample unit weights and w contains the species weights, where a_{i+} is the total for sample unit i and a_{+j} is the total for species j:

    v_i = \frac{1}{a_{i+}}   and   w_j = \frac{1}{a_{+j}}

The square roots of these weights are placed in the diagonals of the two matrices V^{1/2} and W^{1/2}, which are otherwise filled with zeros. Given n sample units and p species, V^{1/2} has dimensions n × n and W^{1/2} has dimensions p × p.

2. Weight the data matrix A by V^{1/2} and W^{1/2}:

    B = V^{1/2} A W^{1/2}

In other words,

    b_{ij} = \frac{a_{ij}}{\sqrt{a_{i+} a_{+j}}}

This is a simultaneous weighting by row and column totals. The resulting matrix B has n rows and p columns.

3. Calculate a cross-products matrix:

    S = B B' = V^{1/2} A W A' V^{1/2}

The dimensions of S are n × n; the term on the right has dimensions (n × n)(n × p)(p × p)(p × n)(n × n). Note that S, the cross-products matrix, is a variance-covariance matrix as in PCA, except that the cross-products are weighted by the reciprocals of the square roots of the sample unit totals and the species totals.

4. Now find the eigenvalues as in PCA. Each eigenvalue (latent root) is a \lambda that solves:

    |S - \lambda I| = 0

This is the "characteristic equation." Note that it is the same as that used in PCA, except for the contents of S.

5. Also find the eigenvectors Y (p × k) and X (n × k) for each of the k dimensions:

    [S - \lambda I] x = 0   and   [W^{1/2} A' V A W^{1/2} - \lambda I] y = 0

using the same set of \lambda in both cases. For each axis there is one \lambda and one vector x; for every \lambda there is one vector y.

6. At this point, we have found X and Y for k dimensions such that:

    X = A Y   (n × k) = (n × p)(p × k)

and

    Y = A' X   (p × k) = (p × n)(n × k)

where Y contains the species ordination, A is the original data matrix, and X contains the sample ordination. Each component or axis can be represented as a linear combination of the original variables. Each eigenvector contains the coefficients for the equation for one axis. For eigenvector 1 (the first column of Y), the score for entity i is:

    x_i = y_1 a_{i1} + y_2 a_{i2} + \cdots + y_p a_{ip}

The sample unit scores are scaled by multiplying each element of the SU eigenvectors, X, by

    SU scaling factor = \sqrt{a_{++} / a_{i+}}

where a_{++} is the grand total of the matrix A. The species scores are scaled by multiplying each element of the species eigenvectors, Y, by

    Species scaling factor = \sqrt{a_{++} / a_{+j}}

Major steps: reciprocal averaging approach

1. Arbitrarily assign scores, x, to the n sample units. The scores position the sample units on an ordination axis.

2. Calculate species scores as weighted averages, where a_{+j} is the total for species j:

    y_j = \sum_{i=1}^{n} \frac{a_{ij} x_i}{a_{+j}}

3. Calculate new site scores by weighted averaging of the species scores, where a_{i+} is the total for sample unit i:

    x_i = \sum_{j=1}^{p} \frac{a_{ij} y_j}{a_{i+}}

4. Center and standardize the site scores so that

    \sum_{i=1}^{n} a_{i+} x_i = 0   and   \sum_{i=1}^{n} a_{i+} x_i^2 = 1

5. Check for convergence of the solution. If the site scores are closer than a prescribed tolerance to the site scores of the preceding iteration, then stop. Otherwise, return to step 2.

Figure 19.4. One-dimensional CA ordination of the same data set used in the weighted averaging example in the previous chapter (Fig. 18.1). Scores were standardized to unit variance, then multiplied by 100. [Figure: sample units A = -192, B = 160; species 1 = -254, 2 = 56, 3 = 212.]

Figure 19.5. Comparison of 2-D CA (RA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure. The lines connect sample points along the major environmental gradient. The minor secondary gradient is nearly orthogonal to the major gradient. In the perfect ordination, the points would form a horizontally elongate grid. Inset: the ideal result is a regular 3 × 10 grid.

Table 19.1.
Comparison of correspondence analysis (CA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure.

                                             CA       NMS      PCA
  Proportion of variance represented*
    Axis 1                                  0.473    0.832    0.327
    Axis 2                                  0.242    0.084    0.194
    Cumulative                              0.715    0.917    0.521
  Correlations with environmental variables
    Axis 1
      Gradient 1                           -0.982    0.984   -0.852
      Gradient 2                           -0.067    0.022   -0.397
    Axis 2
      Gradient 1                           -0.058    0.102   -0.204
      Gradient 2                           -0.241    0.790    0.059

  * Eigenvalue-based for CA and PCA; distance-based for NMS.
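The five reciprocal-averaging steps above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not code from the book: the function name `ca_axis_ra` and the small abundance matrix in the usage note are made up for demonstration, and only the first CA axis is extracted.

```python
import numpy as np

def ca_axis_ra(A, tol=1e-10, max_iter=1000):
    """First CA axis by reciprocal averaging (hypothetical helper, not from the book).

    A is an n x p sample-by-species abundance matrix with no empty rows or columns.
    Returns standardized site scores x and the species scores y from the last pass.
    """
    A = np.asarray(A, dtype=float)
    row_tot = A.sum(axis=1)          # a_i+ : sample unit totals
    col_tot = A.sum(axis=0)          # a_+j : species totals
    grand = A.sum()                  # a_++ : grand total

    # Step 1: arbitrary initial site scores.
    x = np.arange(A.shape[0], dtype=float)
    y = np.zeros(A.shape[1])

    for _ in range(max_iter):
        # Step 2: species scores as weighted averages of site scores.
        y = (A.T @ x) / col_tot
        # Step 3: new site scores as weighted averages of species scores.
        x_new = (A @ y) / row_tot
        # Step 4: center (sum of a_i+ * x_i = 0), then standardize
        # (sum of a_i+ * x_i^2 = 1), removing the trivial constant solution.
        x_new = x_new - (row_tot @ x_new) / grand
        x_new = x_new / np.sqrt(row_tot @ x_new**2)
        # Step 5: stop when scores stabilize; otherwise iterate again.
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new
    return x, y
```

For a made-up abundance matrix with species peaking at successive positions along a gradient, such as `[[5,2,0,0],[2,5,2,0],[0,2,5,2],[0,0,2,5]]`, the returned site scores come out in monotonic gradient order (up to an arbitrary sign), matching the behavior shown for CA in Figure 19.3.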