Correspondence Analysis: Simple (CA) and Detrended (DCA)
Vamsi, Sundus, Shawnalee

What is Correspondence Analysis?
• AKA Reciprocal Averaging (RA).
• Basically: an ordination technique that involves repeatedly calculating weighted averages.
• Popular mainly in France (due to Benzécri).

What is Detrended Correspondence Analysis?
• Designed specifically to solve certain problems found when using CA on ecological data, based on an “empirical desire to reshape data closer to the models visualized by ecologists.”
• Popular mainly in the ecological community.

Weighted Means
A weighted mean arises when some of the values in the data occur more than once: each distinct value is weighted by the number of times it occurs.
Consider:

Arithmetic mean:
\[ \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \]

Weighted mean, with the value 1 found 10 times:
\[ \frac{(10 \times 1) + 2 + 3 + 4 + 5}{10 + 1 + 1 + 1 + 1} = \frac{24}{14} \approx 1.71 \]

Application of Weighted Means
Let’s say we had some hypothetical data as follows:

Year     1    2    3    4    5    6    7    8    9   10
Counts 100   90   80   60   50   40   20    5    0    0

Application of Weighted Means
To find the average lifetime of the species, use the counts as weights and compute the weighted mean of the years:

\[ \frac{1 \times 100 + 2 \times 90 + 3 \times 80 + 4 \times 60 + 5 \times 50 + 6 \times 40 + 7 \times 20 + 8 \times 5}{100 + 90 + 80 + 60 + 50 + 40 + 20 + 5} \approx 3.21 \]

[Figure: the Year/Counts table above, with the mean year marked.]

Application
CA algorithm to find the “mean species” in a 3-species case.
In practice, most ecologists would be observing multiple species at the same time, and hence would have count data for multi-species groups such as the following:

Year   Counts (A)   Counts (B)   Counts (C)
  1       100            0            0
  2        90           10            0
  3        80           20            5
  4        60           35           10
  5        50           50           20
  6        40           60           30
  7        20           30           40
  8         5           20           60
  9         0           10           75
 10         0            0           90

[Figure: the three species tables, each with its mean year marked.]

Step 1
Start with an arbitrary (random) weighting for the species. It is conventional to spread the starting weights from 0.0 to 100.0 in whatever increments are needed.
In our case, we’ll use (0, 50, 100) for species (A, B, C).
Use this formula for the starting weight of the nth species:

\[ \frac{(n - 1) \times 100}{S - 1} \quad \text{for } S \text{ species} \]

Step 2
Use the starter weights (which are essentially arbitrary) to compute a weighting for each of the years.

Year   Counts (A)   Counts (B)   Counts (C)        Y1
  1       100            0            0      -->    0.0
  2        90           10            0      -->    5.0
  3        80           20            5      -->   14.3
  4        60           35           10      -->   26.2
  5        50           50           20      -->   37.5
  6        40           60           30      -->   46.2
  7        20           30           40      -->   61.1
  8         5           20           60      -->   82.4
  9         0           10           75      -->   94.1
 10         0            0           90      -->  100.0

For example, for Year 1:

\[ \frac{0 \times 100 + 50 \times 0 + 100 \times 0}{100 + 0 + 0} = 0.0 \]

Step 3
We can now calculate a new weighting for each species using these new year weightings. For species A:

\[ \frac{0 \times 100 + 5 \times 90 + 14.3 \times 80 + \dots + 94.1 \times 0 + 100 \times 0}{100 + 90 + \dots + 20 + 5} = 19.1 \]

Calculate similarly for B and C.

Species   Old weighting (S1_0)   New calculated weighting (S1_a)
   A               0                       19.1
   B              50                       43.9
   C             100                       78.5

Step 4
These new species weightings are not yet directly usable, because they now run from 19.1 to 78.5 instead of from 0 to 100. To rescale them back to 0 to 100, use a simple min-max rescaling:

\[ S1_b = \frac{100 \times (S1_a - \mathrm{MIN})}{\mathrm{MAX} - \mathrm{MIN}} \]

where MIN = 19.1 and MAX = 78.5 are the smallest and largest values of S1_a.

Step 4 cont.
So, after computing the rescaled values, we find the following:

Species   S1_0    S1_a     S1_b
   A        0     19.1     0.00
   B       50     43.9    41.75
   C      100     78.5   100.00

Step 5
This is now one cycle of the CA completed. “Weightings for each year are recalculated using the new, rescaled weightings for the species.” Eventually a stable pattern will emerge, typically after 10-20 iterations.

Correspondence Analysis
That was CA applied to a simple example.

Detrended Correspondence Analysis
• This technique is not purely mathematical.
• It’s a series of rules that are used to reshape data to make it friendlier for analysis.
• Once again, it is primarily used for ecological data, but it can be extended to anything (the data simply can’t contain negative values).
• The reason this technique is used is to overcome the arch effect (the horseshoe effect).
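Sketch: one CA cycle in code
The reciprocal-averaging cycle from Steps 1 to 5 can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions: the helper names (weighted_mean, year_scores, species_scores, rescale, ca_cycle) are made up for this sketch and do not come from Shaw or from any package. With the starting weights (0, 50, 100), the first cycle should reproduce the slide values up to rounding (the slides rescale the already-rounded 19.1, 43.9, 78.5, so the rescaled value for B comes out as roughly 41.7 here rather than exactly 41.75).

```python
# Counts per year (years 1..10) for the three species on the slides.
counts = {
    "A": [100, 90, 80, 60, 50, 40, 20, 5, 0, 0],
    "B": [0, 10, 20, 35, 50, 60, 30, 20, 10, 0],
    "C": [0, 0, 5, 10, 20, 30, 40, 60, 75, 90],
}


def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the matching entry of `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def year_scores(species_w):
    """Step 2: weight each year by the species weights, using counts as weights."""
    names = list(counts)
    n_years = len(counts[names[0]])
    return [
        weighted_mean([species_w[s] for s in names],
                      [counts[s][y] for s in names])
        for y in range(n_years)
    ]


def species_scores(year_w):
    """Step 3: weight each species by the year weightings, using counts as weights."""
    return {s: weighted_mean(year_w, c) for s, c in counts.items()}


def rescale(scores):
    """Step 4: min-max rescale the species weightings back to 0..100."""
    lo, hi = min(scores.values()), max(scores.values())
    return {s: 100 * (v - lo) / (hi - lo) for s, v in scores.items()}


def ca_cycle(species_w):
    """One full cycle: year weightings, raw species weightings, rescaled weightings."""
    years = year_scores(species_w)
    raw = species_scores(years)
    return years, raw, rescale(raw)


# Mean lifetime of species A (the "Application of Weighted Means" slide).
mean_year_a = weighted_mean(list(range(1, 11)), counts["A"])

# Step 1: starter weights for (A, B, C), then one cycle.
start = {"A": 0.0, "B": 50.0, "C": 100.0}
years, raw, scaled = ca_cycle(start)
print(round(mean_year_a, 2))
print([round(y, 1) for y in years])
print({s: round(v, 1) for s, v in raw.items()})
print({s: round(v, 1) for s, v in scaled.items()})

# Step 5: repeat the cycle; 20 passes is the upper end of the 10-20 quoted above.
weights = start
for _ in range(20):
    _, _, weights = ca_cycle(weights)
print({s: round(v, 2) for s, v in weights.items()})
```

Rescaling after every pass pins the lowest and highest species to 0 and 100, so only the species in between move from one cycle to the next; that is what lets the weightings settle into a stable pattern.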
Arch Effect (Horseshoe Effect)
• Found in data whenever “PCA or other distance conserving ordination techniques are applied to data which follow a continuous gradient, along which there is a progressive turnover of dominant variables.”
  – Such as in ecological succession
• When the first two axes from a distance-conserving ordination are plotted against each other, the points trace out an arch shape.

Steps of DCA
Two major stages:
• Ordination by CA (as previous)
• Then get rid of the arch effect by brute force.

Goal (the bold one)
[Figure not reproduced.]

Notice
There’s a loss of information, specifically the second CA axis (the Y-axis in this case).

Software
According to Shaw, the standard software is all based on the same source code, DECORANA, and is accessed through some front end to it. However, there is also a package to do this in R: the decorana function in the vegan package.

Basics in R
decorana(veg, iweigh=0, iresc=4, ira=0, mk=26, short=0, before=NULL, after=NULL)
• veg = data matrix
• iweigh = downweighting of rare species. Both CA and DCA are extremely sensitive to rare species, so turning this on decreases the importance of rare species.
• iresc = number of rescaling cycles.
• ira = type of analysis; switches between DCA and simple CA (0 = detrended, 1 = simple).
There’s no information in Shaw to extend this to the remaining arguments, so we are leaving them until a later time.

FIN