BioSim2 Workshop
New England Association of Environmental Biologists, 30th Annual Meeting
Bethel Inn, Bethel, Maine
March 30, 2006

Outline
– B, Index of Biotic Similarity - 1976 Paper
    Faults of other Indexes Available
    Basic Concept and BioSim
– B, Index of Biotic Similarity - 1992 Paper
    Options and BioSim1
    Preferred Options
– B, Index of Biotic Similarity - 2005 Paper
    BioSim2
    Macroinvertebrate Data
    Original Data Rearranged in Double Dendrogram Order
    Habitat Data
    Physical Chemical Data
    All Data Combined
– Working with Your Own Data

Community Structure (Species Composition)
Species occurrence and abundance (that is, both the kinds and numbers of species present)

Measures of Community Structure Being Used in Pollution Surveys in the 1970s
• Gleason's Richness Index
• Shannon's Diversity Index
• Menhinick's Richness Index
• Pielou's Evenness Index
• Simpson's Index of Dominance
• Brillouin's Index
• McIntosh's Index

Where Others Go Wrong

Where Presence-Absence Similarity Coefficients Go Wrong
P = number of matches in which a given taxon is present at both stations
N = number of matches in which a given taxon is absent from both stations
M = number of matches in which a given taxon is present at one station and absent from the other

Where Chutter's Biotic Index Fails
N = total number of individuals at Station a
Xi = number of individuals in the ith taxon at Station a
Qi = quality index for the ith taxon
k = the number of taxa at Station a
CB is calculated for only one station at a time.

Where Percentage Similarity of Community Fails
k = number of different taxa at Stations a and b
ria and rib = the relative abundances of the ith taxon at Stations a and b, respectively

Index of Biotic Similarity (Pinkham-Pearson Index)
Barbour et al. (1992), in a systematic comparison of the metrics proposed in EPA's rapid bioassessment protocol (Plafkin et al., 1989), concluded that B "may be the most appropriate metric to serve as a measure of community similarity."
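
A minimal sketch of how B could be computed for one pair of stations, assuming the commonly cited form of the index: the mean, over the k taxa compared, of the smaller abundance divided by the larger. The function name `biotic_similarity` and the `zero_zero` flag (how to treat a taxon absent from both stations) are illustrative choices and are not taken from BioSim; the counts are made up.

```python
def biotic_similarity(station_a, station_b, zero_zero="one"):
    """Sketch of B for two stations: mean of min/max over the matches.

    Assumed form: B = (1/k) * sum(min(x_ia, x_ib) / max(x_ia, x_ib)).
    zero_zero is a hypothetical option for taxa absent from both stations:
    "one" scores the match as 1, "ignore" drops it and reduces k by one.
    """
    if len(station_a) != len(station_b):
        raise ValueError("both stations need one count per taxon")
    if zero_zero not in ("one", "ignore"):
        raise ValueError("zero_zero must be 'one' or 'ignore'")
    total, k = 0.0, 0
    for x_a, x_b in zip(station_a, station_b):
        larger = max(x_a, x_b)
        if larger == 0:
            if zero_zero == "ignore":
                continue  # match ignored, so it does not count toward k
            ratio = 1.0
        else:
            ratio = min(x_a, x_b) / larger
        total += ratio
        k += 1
    if k == 0:
        raise ValueError("no matches left to compare")
    return total / k


# Made-up counts for three taxa at two stations (not workshop data).
upstream = [10, 0, 5]
downstream = [5, 0, 10]
print(biotic_similarity(upstream, downstream))            # 0/0 match scored as 1
print(biotic_similarity(upstream, downstream, "ignore"))  # 0/0 match dropped
```

Because every match contributes a ratio between 0 and 1, B is 1 only when the two stations agree in both the kinds and the numbers of organisms.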

Example of B, Index of Biotic Similarity

Comparing More Than Two Stations
Each comparison between stations is called a paired comparison (PC). When dealing with two or more stations, the number of paired comparisons is expressed by the formula:
PC = n(n - 1)/2, where n is the number of stations.

Matrix of B’s Between 11 Habitat Parameters

BioSim
• BioSim, 1976, Fortran IV
  – Calculated B’s from the original data matrix and produced dendrograms from the resulting matrices of B’s.

BioSim1
• BioSim1, 1992, DOS
  – As above, plus
  – Defined terms clearly
  – Provided a strategy for analyzing data
  – Contained new features as part of this strategy

BioSim1 – Terms
• Original Data Set
• The Matrix of B’s
• The Dendrogram

BioSim1 – Terms – Original Data Set
A data set is usually 3-dimensional. That is, it includes three variables:
    sample sites or replicates at one site
    taxa sampled
    sampling dates
Normally the data set is analyzed two variables at a time. Thus an original data set has two axes, one variable on each axis. This is a two-dimensional matrix.
The parameters are the actual values of the variables, such as
    Site 1
    Gammarus sp.
    June 10, 2005
Data points in the matrix are the numbers of organisms recorded in the sample involving a parameter on the x-axis and one on the y-axis. Each comparison between two data points is called a match.

BioSim1 – Terms – Components of an Original Data Set

BioSim1 – Terms – The Matrix of B’s
The triangular array of paired comparisons between all the possible pairs of parameters of a given variable forms a matrix of B's. Each paired comparison in the matrix is a B-value.

BioSim1 – Terms – Components of a Matrix of B’s

BioSim1 – Terms – The Dendrogram
The matrix of B's is surveyed for paired comparisons of parameters with high B-values. Each of these pairs is linked in a cluster. For the purpose of the discussion which follows, a single parameter is considered to comprise a cluster.
By an iterative process, additional clusters are linked to those already linked, based on a function of the average of the B-values between the two clusters. In this manner a dendrogram of clusters is formed which resembles a tree on its side, with the branches linking the various clusters at nodes. The nodes link clusters of parameters. Note that the B-value for a given cluster is found by extending the node lines to the coefficient of similarity scale (B-value scale).

BioSim1 – Terms – Components of a Dendrogram

BioSim1 – Strategy for Data Analysis
• Nature of the original data points
• Configuration of the original data matrix
• Variations of B
• Data points with low numbers of organisms
• Unweighted vs weighted clustering
• Configuration of the rearranged data matrix
• Establishing environmentally valid subclusters

BioSim1 – Strategy for Data Analysis
Nature of the original data points
• numbers of individuals in a taxon (density)
• percentage of entire sample represented by a single taxon (% composition)
• biomass, productivity, chlorophyll, etc.
• chemical parameters (will need scaling)
• physical parameters (will need scaling)
• habitat parameters (will need scaling)
• paired comparisons between any of the above (will need scaling)

BioSim1 – Strategy for Data Analysis
Configuration of the original data matrix
• Taxa across sites for a given date (2-dimensional matrix)
    clusters of sites based on taxa they contain & clusters of taxa at those sites
• Taxa across dates for a given site (2-dimensional matrix)
    clusters of dates based on taxa they contain & clusters of taxa on those dates
• Sites across dates for a given taxon (2-dimensional matrix)
    clusters of dates based on sites & clusters of sites on those dates
• Taxa across sites for given dates (3-dimensional matrix)
    clusters of sites on dates based on the taxa they contain
    clusters of taxa based on their distribution at sites on given dates

BioSim1 – Strategy for Data Analysis
Variations of B
• B, as displayed
• B1, used the original data matrix to calculate % composition and then developed a dendrogram based on the % composition values
• B2, as in B1, but calculated using a weighting factor for each match based on the average of the % composition values in that match

BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
• 0/0 matches scored as 1 or ignored?
• 1/1, 1/2, 2/2 matches ignored? Perkins (1981)
• Compressing the data matrix
    Delete any taxon that is represented by fewer than X total individuals over all the sites/dates being compared, as long as X/n, where n is the number of times it occurs, is < 3.
    Any similarly logical rule

BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
Further consideration of sampling error:
    0/1 = 0.0; 1/1 = 1
    0/100 = 0.0; 1/100 = 0.01
Solution (Clifford and Stephenson, 1975): use an adjustment factor (f). Add f to both numerator and denominator. They recommend f = 1/5 of the lowest non-zero entry in the original data matrix.
In the above case, f = 0.2:
    0.2/1.2 = 0.17
    0.2/100.2 = 0.002
which may reflect a more realistic relationship.

BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
• Ignored matches
    The number of matches (k in the formula for B) is decreased by one for each match ignored in each paired comparison.

CC725R2      7.7   2.1   0     1.1   1.8   0     0
CC725R3      9.5   4.7   0     0.5   5     0     0.2

BioSim1 – Strategy for Data Analysis
Unweighted vs weighted clustering
• The algorithm determines which of two candidate clusters to join to an already existing cluster. A cluster of 5 parameters could be linked to a cluster of 3 parameters or to another of 2 parameters. There would be 5 x 3 B-values to average for the first possibility and 5 x 2 for the second.
• Unweighted: The averages of the two sets of B-values determine the decision.
• Weighted: The number of parameters in the cluster to be joined plays a role in the decision.

BioSim1 – Strategy for Data Analysis
Configuration of the rearranged data matrix
Rearranging original data matrix in dendrogram order

sites/taxa   A     B     C     D     E     F     G
CPC720R1     3.8   0.2   0.9   15.7  0.7   6.9   0
CPC720R2     3.8   0.7   0.7   12.5  6.4   8.7   0
CPC720R3     8.4   0.5   2     22    3.8   9.6   1.5
CC725R1      5.7   3.2   0     0     1.8   0     0
CC725R2      7.7   2.1   0     1.1   1.8   0     0
CC725R3      9.5   4.7   0     0.5   5     0     0.2

sites/taxa   A     B     C     D     E     F     G
CC725R2      7.7   2.1   0     1.1   1.8   0     0
CC725R1      5.7   3.2   0     0     1.8   0     0
CC725R3      9.5   4.7   0     0.5   5     0     0.2
CPC720R2     3.8   0.7   0.7   12.5  6.4   8.7   0
CPC720R1     3.8   0.2   0.9   15.7  0.7   6.9   0
CPC720R3     8.4   0.5   2     22    3.8   9.6   1.5

BioSim1 – Strategy for Data Analysis
Configuration of the rearranged data matrix
Rearranging original data matrix in double-dendrogram order

sites/taxa   A     B     C     D     E     F     G
CC725R2      7.7   2.1   0     1.1   1.8   0     0
CC725R1      5.7   3.2   0     0     1.8   0     0
CC725R3      9.5   4.7   0     0.5   5     0     0.2
CPC720R2     3.8   0.7   0.7   12.5  6.4   8.7   0
CPC720R1     3.8   0.2   0.9   15.7  0.7   6.9   0
CPC720R3     8.4   0.5   2     22    3.8   9.6   1.5

sites/taxa   F     C     G     D     E     B     A
CC725R2      0     0     0     1.1   1.8   2.1   7.7
CC725R1      0     0     0     0     1.8   3.2   5.7
CC725R3      0     0     0.2   0.5   5     4.7   9.5
CPC720R2     8.7   0.7   0     12.5  6.4   0.7   3.8
CPC720R1     6.9   0.9   0     15.7  0.7   0.2   3.8
CPC720R3     9.6   2     1.5   22    3.8   0.5   8.4

BioSim1 – Strategy for Data Analysis
• Establishing environmentally valid subclusters

sites/taxa   F     C     G     D     E     B     A
CC725R2      0     0     0     1.1   1.8   2.1   7.7
CC725R1      0     0     0     0     1.8   3.2   5.7
CC725R3      0     0     0.2   0.5   5     4.7   9.5
CPC720R2     8.7   0.7   0     12.5  6.4   0.7   3.8
CPC720R1     6.9   0.9   0     15.7  0.7   0.2   3.8
CPC720R3     9.6   2     1.5   22    3.8   0.5   8.4

Preferred Options

BioSim2
• BioSim2, 2005, Java Format
  – Most features as in BioSim1
  – But, very user-friendly
  – Provides many of the options in BioSim1 as automatic output
  – Expands on capability of mining the data for insights

Figure 1. Opening Screen of BioSim2.

Demonstration of BioSim2
• Input original compressed data matrix
• Output of BioSim2
  – Row Dendrogram
  – Row Cophenetic Correlation Coefficient
  – Column Dendrogram
  – Column Cophenetic Correlation Coefficient
  – Original Data Matrix Rearranged in Double-Dendrogram Order
      0’s Present
      0’s Removed

Coding The Reordered Data Matrix

Synthesis

Habitat Parameters
• Silt Rating: (0 = none, 5 = extreme) [0-5]
• Pc (Pebble count) Boulder%: (>256mm) [0-100]
• PcCobble%: (>64-256mm) [0-100]
• PcCoarseGravel%: (>16-64mm) [0-100]
• PcGravel%: (>2-16mm) [0-100]
• PcSand%: (>0.6-2mm) [0-100]
• PcSilt%: (>0.004-0.6mm) [0-100]
• PcClay%: (<0.004mm) [0-100]
• Embeddedness: (>75%-poor to <5%-exc) [1-5]
• Canopy%: (0 to 100% in incr of 10) [0-100]
• Width: (bank full-m) [1.5-15]
• Filamentous Green%: (substrate cover) [0-100]

Physical/Chemical Parameters
• Station: Miles from mouth of watershed [0.1-14.8]
• Water Temperature: °C [21.7-5.5]
• pH: Standard units [6.97-8.4]
• Alkalinity: mg/l [4.2-247]
• Lab Conductivity: µmhos/cm [25.4-1700]
• Color: Pt-Co units [2.5-70]
• Turbidity: NTU [0.23-13.6]
• Total Suspended Solids: mg/l [0.5-7.4]
• Total Phosphorus: µg/l [5-75]
• Total Dissolved P: µg/l [5-44]

Physical/Chemical Parameters
• Water samples are collected at BASE FLOWS at the time of biological sampling.
• Water pH and D.O. are determined in the field using Hydro Lab meters.
• All other parameters are submitted to the VTDEC Chemistry Laboratory for analysis following an EPA-approved set of SOPs.
• Samples are collected for alkalinity, conductivity, base cations and anions, nutrients, and metals.
• Metals and dissolved phosphorus samples are filtered in the field and represent total dissolved concentrations.

Physical/Chemical Parameters Continued
• Total Nitrogen: mg/l [0.1-2.23]
• Total NOX: mgN/l [0.05-2.08]
• Total Cl: mg/l [1-468]
• Dissolved Na: µg/l [1.3-260]
• Dissolved K: µg/l [0.24-7.91]
• Total SO4: mg/l [1.9-56.1]
• Dissolved Ca: mg/l [1.96-83.5]
• Dissolved Mg: mg/l [1.1-74.4]
• Dissolved Fe: µg/l [46.6-538]
• Dissolved Mn: µg/l [10-218]

RESULTS
HABITAT DATA
• Scaling Sample Points
  – Discussion
• Reordered Data Matrix
  – 0’s Included
  – Synthesis
• Habitat Dendrogram
  – Synthesis

RESULTS
Physical/Chemical DATA
• Stream Dendrogram
• Physical/Chemical Parameter Dendrogram
• Reordered Data Matrix - Coded
• Synthesis

RESULTS
ALL DATA COMBINED
• Inverse of the Habitat and Physical Chemical Data
• Synthesis

Acknowledgements
• The EPSCoR Baccalaureate College Summer Research Program under NSF Grant Number EPS-0236976
• The Norwich University Faculty Development Program
• The VTDEC

BioSim2 Workshop
Questions – Discussion?
Contact: [email protected]
http://www2.norwich.edu/pinkhamc/
802 485-2319

BioSim2 Workshop
Practice with your Data
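
As a starting point for practice outside BioSim2, here is a minimal sketch that builds the triangular matrix of B's for the six replicate samples tabulated earlier in the workshop. It assumes B is the mean of the min/max ratios over the matches, scores 0/0 matches as 1, and adds the adjustment factor f to numerator and denominator. The function names are illustrative and are not BioSim2's own; BioSim2's actual options may differ.

```python
from itertools import combinations

# Sample matrix from the slides: rows are replicate samples, columns are taxa A-G.
DATA = {
    "CPC720R1": [3.8, 0.2, 0.9, 15.7, 0.7, 6.9, 0],
    "CPC720R2": [3.8, 0.7, 0.7, 12.5, 6.4, 8.7, 0],
    "CPC720R3": [8.4, 0.5, 2, 22, 3.8, 9.6, 1.5],
    "CC725R1": [5.7, 3.2, 0, 0, 1.8, 0, 0],
    "CC725R2": [7.7, 2.1, 0, 1.1, 1.8, 0, 0],
    "CC725R3": [9.5, 4.7, 0, 0.5, 5, 0, 0.2],
}


def adjustment_factor(data):
    """f = 1/5 of the lowest non-zero entry in the data matrix."""
    nonzero = [x for row in data.values() for x in row if x > 0]
    return min(nonzero) / 5


def b_value(row_a, row_b, f=0.0):
    """Assumed form of B: mean of (min + f) / (max + f) over all matches."""
    ratios = []
    for x_a, x_b in zip(row_a, row_b):
        low, high = min(x_a, x_b), max(x_a, x_b)
        # A 0/0 match is scored as 1 here; ignoring it is the other option.
        ratios.append(1.0 if high == 0 else (low + f) / (high + f))
    return sum(ratios) / len(ratios)


def matrix_of_bs(data, f=0.0):
    """One B-value per paired comparison: n(n - 1)/2 entries for n rows."""
    return {
        (a, b): b_value(data[a], data[b], f)
        for a, b in combinations(data, 2)
    }


f = adjustment_factor(DATA)
matrix = matrix_of_bs(DATA, f)
# List the paired comparisons from most to least similar.
for (a, b), value in sorted(matrix.items(), key=lambda item: -item[1]):
    print(f"{a:9s} {b:9s} {value:.2f}")
```

Sorting the paired comparisons by B-value is only a first look at which samples would link earliest; building the dendrogram itself still requires the iterative clustering step.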