CHAPTER 19
Correspondence Analysis
Tables, Figures, and Equations
From: McCune, B. & J. B. Grace. 2002. Analysis of
Ecological Communities. MjM Software Design,
Gleneden Beach, Oregon. http://www.pcord.com
Figure 19.1. A synthetic data set of eleven species with noiseless hump-shaped responses
to an environmental gradient. The gradient was sampled at eleven points (sample units),
numbered 1-11.
[Figure 19.2 graphic: side-by-side PCA and CA ordinations of sample units SU1-SU11 on Axis 1 vs. Axis 2, each panel with an EnvGradA vector.]
Figure 19.2. Comparison of PCA and CA of the data set shown in Figure 19.1.
PCA curves the ends of the gradient in, while CA does not. The vectors indicate the
correlations of the environmental gradient with the axis scores.
[Figure 19.3 graphic: one-dimensional NMS, PCA, and CA ordinations of sample units 1-11 along Axis 1.]
Figure 19.3. One-dimensional ordinations of the
data set in Figures 19.1 and 19.2, using nonmetric
multidimensional scaling (NMS), PCA, and CA.
Both NMS and CA ordered the points correctly on
the first axis, while PCA did not.
How it works
Axes are rotated simultaneously through species space and sample
space with the object of maximizing the correspondence between
the two spaces. This produces two sets of linear equations, X and
Y, where
X = A Y and Y = A' X
A is the original data matrix with n rows (henceforth sample units)
and p columns (henceforth species).
X contains the coordinates for the n sample units on k axes
(dimensions).
Y contains the coordinates for the p species on k axes.
Note that both Y and X refer to the same coordinate system.
The goal is to maximize the correspondence, R, defined as:
R = \frac{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij} x_i y_j}{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij}}

under the constraints that

\sum_{i=1}^{n} x_i^2 = 1 \quad \text{and} \quad \sum_{j=1}^{p} y_j^2 = 1
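To make the criterion concrete, here is a minimal numpy sketch (mine, not from the book); the matrix A and the candidate scores x and y are made-up values, rescaled to satisfy the unit-length constraints.

```python
import numpy as np

# Hypothetical 3 x 3 abundance matrix A (sample units x species); values made up.
A = np.array([[5.0, 2.0, 0.0],
              [1.0, 4.0, 2.0],
              [0.0, 1.0, 6.0]])

def correspondence(A, x, y):
    # R = (sum_i sum_j a_ij x_i y_j) / (sum_i sum_j a_ij)
    return (x @ A @ y) / A.sum()

# Candidate scores, rescaled so that sum(x_i^2) = 1 and sum(y_j^2) = 1.
x = np.array([1.0, 0.0, -1.0])
x /= np.linalg.norm(x)
y = np.array([1.0, 0.0, -1.0])
y /= np.linalg.norm(y)

print(correspondence(A, x, y))   # the quantity CA maximizes over x and y
```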
Major steps: eigenanalysis approach
1. Calculate weighting coefficients based on the
reciprocals of the sample unit totals and the species totals.
v contains the sample unit weights,
w contains the species weights,
ai+ is the total for sample unit i,
a+j is the total for species j.
v_i = \frac{1}{a_{i+}} \quad \text{and} \quad w_j = \frac{1}{a_{+j}}
The square roots of these weights are placed in the
diagonal of the two matrices V½ and W½, which are
otherwise filled with zeros.
Given n sample units and p species:
V½ has dimensions n × n
W½ has dimensions p × p.
2. Weight the data matrix A by V½ and W½:
B = V^{1/2} A W^{1/2}

In other words,

b_{ij} = \frac{a_{ij}}{\sqrt{a_{i+} a_{+j}}}
This is a simultaneous weighting by row and column totals.
The resulting matrix B has n rows and p columns.
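A minimal numpy sketch of steps 1 and 2 (mine, not the book's), assuming A is a nonnegative abundance matrix with no empty rows or columns, so every total is positive:

```python
import numpy as np

# Hypothetical sample-by-species abundance matrix (n = 3, p = 3); values made up.
A = np.array([[5.0, 2.0, 0.0],
              [1.0, 4.0, 2.0],
              [0.0, 1.0, 6.0]])

row_tot = A.sum(axis=1)   # a_i+, the sample unit totals
col_tot = A.sum(axis=0)   # a_+j, the species totals

# Step 1: square roots of the weights v_i = 1/a_i+ and w_j = 1/a_+j on the
# diagonals; zeros elsewhere.
V_half = np.diag(1.0 / np.sqrt(row_tot))   # n x n
W_half = np.diag(1.0 / np.sqrt(col_tot))   # p x p

# Step 2: b_ij = a_ij / sqrt(a_i+ * a_+j); B is n x p.
B = V_half @ A @ W_half
```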
3. Calculate a cross-products matrix: S = BB' = V½AWA'V½.
The dimensions of S are n × n.
The term on the right has dimensions:
(n × n)(n × p)(p × p)(p × n)(n × n)
Note that S, the cross-products matrix, is a variance-covariance matrix as in PCA, except that the cross-products are weighted by the reciprocals of the square roots of the sample unit totals and the species totals.
4. Now find eigenvalues as in PCA. Each eigenvalue (latent root) is a lambda (λ) that solves:

|S - λI| = 0

This is the “characteristic equation.” Note that it is the same as that used in PCA, except for the contents of S.

5. Also find the eigenvectors Y (p × k) and X (n × k) for each of k dimensions:

[S - λI]x = 0

and

[W½A'VAW½ - λI]y = 0

using the same set of λ in both cases. For each axis there is one λ and there is one vector x. For every λ there is one vector y.
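Continuing the sketch above, steps 3-5 in numpy. S is symmetric, so numpy.linalg.eigh applies; one point worth adding (mine, not in the text) is that with this weighting the largest eigenvalue is a trivial λ = 1, which CA software discards.

```python
# Step 3: n x n cross-products matrix, S = BB' = V^(1/2) A W A' V^(1/2).
S = B @ B.T

# Steps 4-5: eigh solves [S - lambda*I]x = 0 and returns eigenvalues in
# ascending order, so flip to sort axes by descending eigenvalue.
lam, X = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, X = lam[order], X[:, order]

# The species-side matrix W^(1/2) A' V A W^(1/2) equals B'B and shares the
# same nonzero eigenvalues lambda.
lam_y, Y = np.linalg.eigh(B.T @ B)
Y = Y[:, np.argsort(lam_y)[::-1]]

# lam[0] is the trivial eigenvalue 1; the first CA axis corresponds to lam[1].
```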
6. At this point, we have found X and Y for k dimensions such
that:
X = A Y
(n × k) = (n × p)(p × k)

and

Y = A' X
(p × k) = (p × n)(n × k)
where
Y contains the species ordination,
A is the original data matrix, and
X contains the sample ordination.
Each component or axis can be represented as a linear
combination of the original variables.
Each eigenvector contains the coefficients for the equation for
one axis.
For eigenvector 1 (the first column of Y):
x_i = y_1 a_{i1} + y_2 a_{i2} + ... + y_p a_{ip} for entity i
The sample unit scores are scaled by multiplying each element of the SU eigenvectors, X, by

SU scaling factor = \sqrt{a_{++} / a_{i+}}

where a_{++} is the grand total of the matrix A.

The species scores are scaled by multiplying each element of the species eigenvectors, Y, by

Species scaling factor = \sqrt{a_{++} / a_{+j}}
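Continuing the sketch, the step 6 scaling in numpy; the square-root form of the scaling factors is my reading of the extraction-damaged formulas above, so treat it as an assumption.

```python
grand_tot = A.sum()   # a_++, the grand total of matrix A

su_scale = np.sqrt(grand_tot / row_tot)   # sqrt(a_++ / a_i+), one per sample unit
sp_scale = np.sqrt(grand_tot / col_tot)   # sqrt(a_++ / a_+j), one per species

X_scaled = X * su_scale[:, None]   # scale each sample unit's row of X
Y_scaled = Y * sp_scale[:, None]   # scale each species' row of Y
```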
Major steps: reciprocal averaging approach
1. Arbitrarily assign scores, x, to the n sample units. The
scores position the sample units on an ordination axis.
2. Calculate species scores as weighted averages,
where a+j is the total for species j:
y_j = \frac{\sum_{i=1}^{n} a_{ij} x_i}{a_{+j}}
3. Calculate new site scores by weighted averaging of the
species scores, where ai+ is the total for sample unit i:
x_i = \frac{\sum_{j=1}^{p} a_{ij} y_j}{a_{i+}}
4. Center and standardize the site scores so that
\sum_{i=1}^{n} a_{i+} x_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} a_{i+} x_i^2 = 1
5. Check for convergence of the solution.
If the site scores are closer than a prescribed tolerance to
the site scores of the preceding iteration, then stop.
Otherwise, return to step 2.
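The whole iteration fits in a short function. Below is a minimal numpy sketch, assuming A is a nonnegative sample-by-species abundance matrix; the function name and tolerance arguments are mine. Because the weighted centering in step 4 removes the trivial constant solution on every pass, the iteration converges to the first CA axis.

```python
import numpy as np

def reciprocal_averaging(A, tol=1e-10, max_iter=1000):
    row_tot = A.sum(axis=1)   # a_i+
    col_tot = A.sum(axis=0)   # a_+j
    grand = A.sum()           # a_++

    # Step 1: arbitrary starting scores for the n sample units.
    x = np.random.default_rng(0).standard_normal(A.shape[0])

    for _ in range(max_iter):
        y = (A.T @ x) / col_tot      # step 2: species scores by weighted averaging
        x_new = (A @ y) / row_tot    # step 3: site scores by weighted averaging
        x_new -= (row_tot @ x_new) / grand      # step 4: so that sum a_i+ x_i = 0
        x_new /= np.sqrt(row_tot @ x_new**2)    #         and sum a_i+ x_i^2 = 1
        if np.max(np.abs(x_new - x)) < tol:     # step 5: convergence check
            return x_new, (A.T @ x_new) / col_tot
        x = x_new
    return x, y
```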
[Figure 19.4 graphic: species A and B and sample units 1-3 on a single CA axis. Species scores: A = -192, B = 160. Sample unit scores: 1 = -254, 2 = 56, 3 = 212.]
Figure 19.4. One dimensional CA ordination of the same
data set used in the weighted averaging example in the
previous chapter (Fig. 18.1). Scores were standardized to
unit variance, then multiplied by 100.
[Figure 19.5 graphic: three panels, CA (RA), NMS, and PCA, each plotting sample points on Axis 1 vs. Axis 2.]
Figure 19.5. Comparison of 2-D CA (RA),
nonmetric multidimensional scaling (NMS),
and principal components analysis (PCA) of
a data set with known underlying structure.
The lines connect sample points along the
major environmental gradient. The minor
secondary gradient is nearly orthogonal to
the major gradient. In the perfect ordination,
the points would form a horizontally
elongate grid. Inset: the ideal result is a
regular 3 × 10 grid.
Table 19.1. Comparison of correspondence analysis (CA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure.

                                         CA      NMS      PCA
Proportion of variance represented*
  Axis 1                              0.473    0.832    0.327
  Axis 2                              0.242    0.084    0.194
  Cumulative                          0.715    0.917    0.521
Correlations with environmental variables
  Axis 1
    Gradient 1                       -0.982   -0.852    0.102
    Gradient 2                       -0.067   -0.397    0.790
  Axis 2
    Gradient 1                        0.984   -0.058   -0.204
    Gradient 2                        0.022   -0.241    0.059

* Eigenvalue-based for CA and PCA; distance-based for NMS.