BioSim2 Workshop
New England Association of Environmental
Biologists
30th Annual Meeting
Bethel Inn
Bethel, Maine
March 30, 2006
Outline
– B, Index of Biotic Similarity -1976 Paper
Faults of other Indexes Available
Basic Concept and Biosim
– B, Index of Biotic Similarity -1992 Paper
Options and BioSim1
Preferred Options
– B, Index of Biotic Similarity -2005 Paper
BioSim2
Macroinvertebrate Data
Original Data Rearranged in Double Dendrogram Order
Habitat Data
Physical Chemical Data
All Data Combined
– Working with Your Own Data
Community Structure
(Species Composition)
Species occurrence and abundance
(that is, both the kinds and numbers of species present)
Measures of Community Structure
Being Used in Pollution Surveys in 1970s
• Gleason's Richness Index
• Shannon's Diversity Index
• Menhinick's Richness Index
• Pielou's Evenness Index
• Simpson's Index of Dominance
• Brillouin's Index
• McIntosh's Index
Where Others Go Wrong
Where Presence-Absence Similarity
Coefficients Go Wrong
P = number of matches in
which a given taxon is
present at both stations
N = number of matches in
which a given taxon is
absent from both stations
M = number of matches in
which a given taxon is
present in one station and
absent from the other.
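The coefficient formulas from this slide are not preserved in the transcript. As a minimal sketch, the counts P, N, and M defined above can be tallied for two stations and fed into two widely used presence-absence coefficients. Jaccard and simple matching are chosen here only as illustrative assumptions, and the abundance values are hypothetical. The point it illustrates is that abundance is discarded once a taxon is scored as present or absent.

```python
def match_counts(station_a, station_b):
    """Tally P, N, M from two abundance lists given in the same taxon order."""
    p = n = m = 0
    for xa, xb in zip(station_a, station_b):
        if xa > 0 and xb > 0:
            p += 1          # present at both stations
        elif xa == 0 and xb == 0:
            n += 1          # absent from both stations
        else:
            m += 1          # present at one station, absent from the other
    return p, n, m


def jaccard(p, n, m):
    """Jaccard coefficient: ignores joint absences (N)."""
    return p / (p + m)


def simple_matching(p, n, m):
    """Simple matching coefficient: counts joint absences as agreement."""
    return (p + n) / (p + n + m)


# Hypothetical abundances for four taxa at two stations.
station_a = [100, 50, 0, 1]
station_b = [1, 1, 0, 0]

p, n, m = match_counts(station_a, station_b)
print(p, n, m)
print(jaccard(p, n, m), simple_matching(p, n, m))
```

In this example a taxon with 100 individuals at one station and 1 at the other is scored exactly like a taxon with equal counts at both.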
Where Chutter's Biotic Index Fails
N = total number of individuals at Station a
Xi = number of individuals in the ith taxon at Station a
Qi = quality index for the ith taxon
k = the number of taxa at Station a
CB is calculated for only one station at a time.
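The formula itself is missing from the transcript. A minimal sketch, assuming the usual weighted-average form CB = (1/N) Σ Xi·Qi over the k taxa, is shown below; the counts and quality values are hypothetical. It takes data from a single station only, which is the limitation noted above.

```python
def chutter_index(counts, quality):
    """CB = (1/N) * sum(X_i * Q_i) over the k taxa at one station (assumed form)."""
    n_total = sum(counts)                      # N: total individuals at the station
    if n_total == 0:
        raise ValueError("station has no individuals")
    return sum(x * q for x, q in zip(counts, quality)) / n_total


counts_a = [10, 30, 60]   # hypothetical X_i for k = 3 taxa at Station a
quality = [0, 5, 10]      # hypothetical Q_i for the same taxa

print(chutter_index(counts_a, quality))
```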
Where Percentage Similarity of Community Fails
k = number of different taxa at Stations a and b
ria and rib = the relative abundances of the ith taxa
in stations a and b, respectively
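The formula from this slide is not preserved. As a sketch, assuming the commonly used form PSC = 100 × Σ min(ria, rib) over the k taxa, the example below uses hypothetical counts to show one property that follows directly from working with relative abundances: two stations with the same proportions score as fully similar even when one holds ten times as many organisms.

```python
def relative_abundance(counts):
    """Convert raw counts to fractions of the station total."""
    total = sum(counts)
    return [x / total for x in counts]


def percentage_similarity(counts_a, counts_b):
    """PSC = 100 * sum of the smaller relative abundance for each taxon (assumed form)."""
    ra = relative_abundance(counts_a)
    rb = relative_abundance(counts_b)
    return 100.0 * sum(min(x, y) for x, y in zip(ra, rb))


# Hypothetical counts: same proportions, tenfold difference in density.
station_a = [10, 20, 70]
station_b = [100, 200, 700]

print(round(percentage_similarity(station_a, station_b), 6))
```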
Index of Biotic Similarity
(Pinkham-Pearson Index)
Barbour et al. (1992), in a systematic comparison of the metrics
proposed in EPA's rapid bioassessment protocol (Plafkin et al.,
1989), concluded that B "may be the most appropriate metric
to serve as a measure of community similarity."
Example of B, Index of Biotic Similarity
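The worked example from this slide is not reproduced in the transcript. As a sketch, assuming the commonly cited form of the Pinkham-Pearson index, B averages the ratio of the smaller to the larger abundance over the k matches. The symbols x_ia and x_ib (abundance of taxon i at stations a and b) are notation introduced here, and the counts in the second line are hypothetical.

```latex
B = \frac{1}{k} \sum_{i=1}^{k} \frac{\min(x_{ia},\, x_{ib})}{\max(x_{ia},\, x_{ib})}
% Hypothetical counts: station a = (10, 5, 0), station b = (5, 5, 2), so k = 3
B = \frac{1}{3}\left(\frac{5}{10} + \frac{5}{5} + \frac{0}{2}\right) = 0.5
```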
Comparing More Than Two Stations
Each comparison between stations is called a paired
comparison (PC).
When dealing with two or more stations, the number of
paired comparisons is expressed by the formula:
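For n stations, the number of distinct pairs is given by the standard combinatorial count below (supplied here as a known identity, since the slide's own rendering is not preserved). With 11 parameters, for instance, it gives 55 paired comparisons.

```latex
PC = \binom{n}{2} = \frac{n(n-1)}{2}
% Example: n = 11 gives PC = (11 \cdot 10)/2 = 55
```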
Matrix of B’s Between 11 Habitat
Parameters
BioSim
• BioSim, 1976 Fortran IV
– Calculated B’s from original data matrix and
produced dendrograms from resulting
matrices of B’s.
BioSim1
• BioSim1, 1992, DOS
– As above, plus
– Defined terms clearly
– Provided a strategy for analyzing data
– Contained new features as part of this
strategy
BioSim1 - Terms
• Original Data Set
• The Matrix of B’s
• The Dendrogram
BioSim1 – Terms - Original Data Set
A data set is usually 3-dimensional;
that is, it includes three variables:
sample sites or replicates at one site
taxa sampled
sampling dates
Normally the data set is analyzed two variables at a time.
Thus an original data set has two axes, one variable on each axis.
This is a two-dimensional matrix
The parameters are the actual values of the variables, such as
Site 1
Gammarus sp
June 10, 2005
Data points in the matrix are the number of organisms recorded in the
sample involving a parameter on the x-axis and one on the y-axis.
Each comparison between two data points is called a match.
BioSim1 – Terms
Components of an Original Data Set
BioSim1 – Terms - The Matrix of B’s
The triangular array of paired
comparisons between all the possible
pairs of parameters of a given
variable forms a matrix of B's.
Each paired comparison in the matrix
is a B-value.
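A minimal sketch of building such a triangular array is given below. It uses the six replicate samples that appear later in this workshop as the original data set, and it assumes B is the mean of min/max over all matches with 0/0 matches scored as 1; the function names are hypothetical and BioSim's own implementation may differ.

```python
from itertools import combinations


def b_index(row_a, row_b):
    """Mean of min/max over all matches; 0/0 matches are scored as 1 (assumption)."""
    scores = []
    for xa, xb in zip(row_a, row_b):
        hi = max(xa, xb)
        scores.append(1.0 if hi == 0 else min(xa, xb) / hi)
    return sum(scores) / len(scores)


# Rows = sample sites/replicates, columns = taxa A..G (values from the workshop tables).
data = {
    "CPC720R1": [3.8, 0.2, 0.9, 15.7, 0.7, 6.9, 0],
    "CPC720R2": [3.8, 0.7, 0.7, 12.5, 6.4, 8.7, 0],
    "CPC720R3": [8.4, 0.5, 2, 22, 3.8, 9.6, 1.5],
    "CC725R1": [5.7, 3.2, 0, 0, 1.8, 0, 0],
    "CC725R2": [7.7, 2.1, 0, 1.1, 1.8, 0, 0],
    "CC725R3": [9.5, 4.7, 0, 0.5, 5, 0, 0.2],
}


def matrix_of_b(table):
    """Return {(name_a, name_b): B} for every pair: the triangular array."""
    return {(a, b): b_index(table[a], table[b]) for a, b in combinations(table, 2)}


b_matrix = matrix_of_b(data)
for (a, b), value in b_matrix.items():
    print(f"{a:9s} {b:9s} {value:.3f}")
```

Six parameters yield 15 paired comparisons, one B-value per pair.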
BioSim1 – Terms
Components of a Matrix of B’s
BioSim1 – Terms - The Dendrogram
The matrix of B's is surveyed for paired comparisons of
parameters with high B-values. Each of these pairs is linked
in a cluster. For the purpose of the discussion which follows,
a single parameter is considered to comprise a cluster.
By an iterative process, additional clusters are linked to those
already linked, based on a function of the average for the B-values between the two clusters. In this manner a
dendrogram of clusters is formed which resembles a tree on
its side, with the branches linking the various clusters at
nodes. The nodes link clusters of parameters. Note that the
B-value for a given cluster is found by extending the node
lines to the coefficient of similarity scale (B-value scale).
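The iterative linking can be sketched as follows. This is a plain group-average linkage, chosen as one possible "function of the average"; the exact rule used by BioSim is not stated here, and the station names and B-values are hypothetical.

```python
def average_linkage(names, sim):
    """Agglomerative clustering on B-values.

    names: list of parameter names (each starts as its own cluster)
    sim:   dict mapping frozenset({a, b}) -> B-value for every pair
    Returns a list of (cluster_1, cluster_2, linking_B) in merge order.
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all B-values between members of the two clusters.
                pairs = [sim[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))   # one node of the dendrogram
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical matrix of B's for four stations.
names = ["S1", "S2", "S3", "S4"]
sim = {
    frozenset(("S1", "S2")): 0.9, frozenset(("S3", "S4")): 0.8,
    frozenset(("S1", "S3")): 0.3, frozenset(("S1", "S4")): 0.2,
    frozenset(("S2", "S3")): 0.4, frozenset(("S2", "S4")): 0.3,
}

for left, right, b_value in average_linkage(names, sim):
    print(left, right, round(b_value, 3))
```

Each returned triple corresponds to one node: the two clusters it links and the B-value at which the node sits on the similarity scale.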
BioSim1 – Terms
Components of a Dendrogram
BioSim1 – Strategy for Data Analysis
• Nature of the original data points
• Configuration of the original data matrix
• Variations of B
• Data points with low numbers of organisms
• Unweighted vs weighted clustering
• Configuration of the rearranged data matrix
• Establishing environmentally valid subclusters
BioSim1 – Strategy for Data Analysis
Nature of the original data points
• numbers of individuals in a taxon (density)
• percentage of entire sample represented
by a single taxon (% composition)
• biomass, productivity, chlorophyll, etc.
• chemical parameters (will need scaling)
• physical parameters (will need scaling)
• habitat parameters (will need scaling)
• paired comparisons between any of
the above (will need scaling)
BioSim1 – Strategy for Data Analysis
Configuration of the original data matrix
• Taxa across sites for a given date (2-dimensional matrix)
clusters of sites based on taxa they contain
& clusters of taxa at those sites
• Taxa across dates for a given site (2-dimensional matrix)
clusters of dates based on taxa they contain
& clusters of taxa on those dates
• Sites across dates for a given taxon (2-dimensional matrix)
clusters of dates based on sites
& clusters of sites on those dates
• Taxa across sites for given dates (3-dimensional matrix)
clusters of sites on dates based on the taxa they contain
clusters of taxa based on their distribution at sites on given dates
BioSim1 – Strategy for Data Analysis
Variations of B
• B, as displayed
• B1, used original data matrix to calculate %
composition and then developed a dendrogram
based on the % composition values
• B2, as in B1, but calculated using a weighting
factor for each match based on the average of
the % composition values in that match
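The two variations can be sketched as below. The % composition step follows the description above directly; the weighting in `b2_sketch` (each match weighted by the mean of its two % composition values, then normalised by the sum of the weights) is an assumption about how the weighting factor is applied, and the function names and counts are hypothetical.

```python
def percent_composition(row):
    """Convert raw counts at one station to % of that station's total."""
    total = sum(row)
    return [100.0 * x / total for x in row]


def b1_sketch(row_a, row_b):
    """Unweighted B computed on % composition; 0/0 matches scored as 1 (assumption)."""
    pa, pb = percent_composition(row_a), percent_composition(row_b)
    scores = [1.0 if max(xa, xb) == 0 else min(xa, xb) / max(xa, xb)
              for xa, xb in zip(pa, pb)]
    return sum(scores) / len(scores)


def b2_sketch(row_a, row_b):
    """B on % composition, each match weighted by the mean of its two values (assumption)."""
    pa, pb = percent_composition(row_a), percent_composition(row_b)
    num = den = 0.0
    for xa, xb in zip(pa, pb):
        hi = max(xa, xb)
        if hi == 0:
            continue                      # a 0/0 match carries zero weight
        weight = (xa + xb) / 2
        num += weight * min(xa, xb) / hi
        den += weight
    return num / den


# Hypothetical counts: the dominant taxon agrees, the two minor taxa do not.
a = [10, 30, 60]
b = [30, 10, 60]
print(round(b1_sketch(a, b), 3), round(b2_sketch(a, b), 3))
```

Under these assumptions the weighted form gives more influence to the abundant taxon, so agreement on the dominant taxon raises the score relative to the unweighted form.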
BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
• 0/0 matches
scored as 1 or ignored?
• 1/1, 1/2, 2/2 matches
ignored? Perkins (1981)
• Compressing the data matrix
Delete any taxon that is represented by fewer than X total
individuals over all the sites/dates being compared, as long as X/n,
where n is the number of times it occurs, is < 3.
Or apply any similarly logical rule.
BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
Further consideration of sampling error:
0/1 = 0.0 ; 1/1 = 1
0/100 = 0.0 ; 1/100 = 0.01
Solution (Clifford and Stephenson, 1975): use an
adjustment factor (f). Add f to both numerator and
denominator. They recommend f = 1/5 of the lowest non-zero
entry in the original data matrix. In the above case, f = 0.2:
(0 + 0.2)/(1 + 0.2) = 0.17
(0 + 0.2)/(100 + 0.2) = 0.002
which may reflect a more realistic relationship
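The adjustment can be sketched in a few lines. The function names are hypothetical; the rule for f (one fifth of the lowest non-zero entry) and the two worked ratios come from the slide above.

```python
def adjustment_factor(matrix):
    """One fifth of the lowest non-zero entry in the original data matrix."""
    nonzero = [x for row in matrix for x in row if x > 0]
    return min(nonzero) / 5


def match_ratio(xa, xb, f=0.0):
    """min/max ratio for one match, with f added to numerator and denominator."""
    lo, hi = min(xa, xb), max(xa, xb)
    if hi + f == 0:
        raise ValueError("0/0 match: decide separately whether to score or ignore it")
    return (lo + f) / (hi + f)


# Tiny hypothetical data matrix whose lowest non-zero entry is 1, so f = 0.2.
f = adjustment_factor([[0, 1], [100, 1]])

print(round(match_ratio(0, 1), 3), round(match_ratio(0, 100), 3))        # unadjusted
print(round(match_ratio(0, 1, f), 3), round(match_ratio(0, 100, f), 3))  # adjusted
```

Without the adjustment a 0/1 match and a 0/100 match both score 0; with it, the 0/1 match, which is more plausibly a sampling artifact, scores noticeably higher than the 0/100 match.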
BioSim1 – Strategy for Data Analysis
Data points with low numbers of organisms
• Ignored matches
The number of matches (k in the formula for B) is
decreased by one for each match ignored in each
paired comparison.
sites/taxa     A     B     C     D     E     F     G
CC725R2      7.7   2.1     0   1.1   1.8     0     0
CC725R3      9.5   4.7     0   0.5     5     0   0.2
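The effect of ignoring matches on k can be sketched with the two replicate rows shown on this slide. The sketch assumes B is the mean of min/max over the retained matches; the helper names are hypothetical, and only 0/0 matches are ignored in the second call.

```python
def b_with_ignored(row_a, row_b, ignore):
    """Return (B, k), where k counts only the matches that are not ignored."""
    kept = [(xa, xb) for xa, xb in zip(row_a, row_b) if not ignore(xa, xb)]
    k = len(kept)                       # k drops by one per ignored match
    if k == 0:
        raise ValueError("every match was ignored")
    total = 0.0
    for xa, xb in kept:
        hi = max(xa, xb)
        total += 1.0 if hi == 0 else min(xa, xb) / hi   # retained 0/0 scores 1
    return total / k, k


def is_double_zero(xa, xb):
    return xa == 0 and xb == 0


def keep_everything(xa, xb):
    return False


cc725r2 = [7.7, 2.1, 0, 1.1, 1.8, 0, 0]
cc725r3 = [9.5, 4.7, 0, 0.5, 5, 0, 0.2]

b_scored, k_scored = b_with_ignored(cc725r2, cc725r3, keep_everything)
b_ignored, k_ignored = b_with_ignored(cc725r2, cc725r3, is_double_zero)
print(k_scored, round(b_scored, 3))
print(k_ignored, round(b_ignored, 3))
```

Two of the seven matches are 0/0, so ignoring them reduces k from 7 to 5 and lowers B for this pair, because the two perfect scores are no longer counted.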
BioSim1 – Strategy for Data Analysis
Unweighted vs weighted clustering
• The algorithm that determines which of two candidate clusters to
join to an already existing cluster.
A cluster of 5 parameters could be linked to a cluster of 3
parameters or another of 2 parameters. There would be 5x3 B-values to average for the first possibility and 5x2 for the second.
• Unweighted: The averages of the two sets of B-values determine
the decision.
• Weighted: The number of parameters in the cluster to be joined
plays a role in the decision.
BioSim1 – Strategy for Data Analysis
Configuration of the rearranged data matrix
Rearranging original data matrix in dendrogram order
sites/taxa     A     B     C     D     E     F     G
CPC720R1     3.8   0.2   0.9  15.7   0.7   6.9     0
CPC720R2     3.8   0.7   0.7  12.5   6.4   8.7     0
CPC720R3     8.4   0.5     2    22   3.8   9.6   1.5
CC725R1      5.7   3.2     0     0   1.8     0     0
CC725R2      7.7   2.1     0   1.1   1.8     0     0
CC725R3      9.5   4.7     0   0.5     5     0   0.2
sites/taxa     A     B     C     D     E     F     G
CC725R2      7.7   2.1     0   1.1   1.8     0     0
CC725R1      5.7   3.2     0     0   1.8     0     0
CC725R3      9.5   4.7     0   0.5     5     0   0.2
CPC720R2     3.8   0.7   0.7  12.5   6.4   8.7     0
CPC720R1     3.8   0.2   0.9  15.7   0.7   6.9     0
CPC720R3     8.4   0.5     2    22   3.8   9.6   1.5
BioSim1 – Strategy for Data Analysis
Configuration of the rearranged data matrix
Rearranging original data matrix in double-dendrogram order
sites/taxa     A     B     C     D     E     F     G
CC725R2      7.7   2.1     0   1.1   1.8     0     0
CC725R1      5.7   3.2     0     0   1.8     0     0
CC725R3      9.5   4.7     0   0.5     5     0   0.2
CPC720R2     3.8   0.7   0.7  12.5   6.4   8.7     0
CPC720R1     3.8   0.2   0.9  15.7   0.7   6.9     0
CPC720R3     8.4   0.5     2    22   3.8   9.6   1.5
sites/taxa     F     C     G     D     E     B     A
CC725R2        0     0     0   1.1   1.8   2.1   7.7
CC725R1        0     0     0     0   1.8   3.2   5.7
CC725R3        0     0   0.2   0.5     5   4.7   9.5
CPC720R2     8.7   0.7     0  12.5   6.4   0.7   3.8
CPC720R1     6.9   0.9     0  15.7   0.7   0.2   3.8
CPC720R3     9.6     2   1.5    22   3.8   0.5   8.4
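The rearrangement itself is a simple permutation of rows and columns. In the sketch below the row and column orders are copied from the tables above; in practice they would be read off the row and column dendrograms. The function name is hypothetical.

```python
def reorder(matrix, row_names, col_names, row_order, col_order):
    """Return the matrix with rows and columns permuted into the given orders."""
    row_idx = [row_names.index(r) for r in row_order]
    col_idx = [col_names.index(c) for c in col_order]
    return [[matrix[i][j] for j in col_idx] for i in row_idx]


row_names = ["CPC720R1", "CPC720R2", "CPC720R3", "CC725R1", "CC725R2", "CC725R3"]
col_names = ["A", "B", "C", "D", "E", "F", "G"]
original = [
    [3.8, 0.2, 0.9, 15.7, 0.7, 6.9, 0],
    [3.8, 0.7, 0.7, 12.5, 6.4, 8.7, 0],
    [8.4, 0.5, 2, 22, 3.8, 9.6, 1.5],
    [5.7, 3.2, 0, 0, 1.8, 0, 0],
    [7.7, 2.1, 0, 1.1, 1.8, 0, 0],
    [9.5, 4.7, 0, 0.5, 5, 0, 0.2],
]

# Orders taken from the row (site) and column (taxon) arrangement shown above.
row_order = ["CC725R2", "CC725R1", "CC725R3", "CPC720R2", "CPC720R1", "CPC720R3"]
col_order = ["F", "C", "G", "D", "E", "B", "A"]

rearranged = reorder(original, row_names, col_names, row_order, col_order)
for name, row in zip(row_order, rearranged):
    print(f"{name:9s}", row)
```

After both permutations the zeros gather in one corner of the matrix, which is what makes blocks of sites and taxa visible by eye.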
BioSim1 – Strategy for Data Analysis
•Establishing environmentally valid subclusters
sites/taxa     F     C     G     D     E     B     A
CC725R2        0     0     0   1.1   1.8   2.1   7.7
CC725R1        0     0     0     0   1.8   3.2   5.7
CC725R3        0     0   0.2   0.5     5   4.7   9.5
CPC720R2     8.7   0.7     0  12.5   6.4   0.7   3.8
CPC720R1     6.9   0.9     0  15.7   0.7   0.2   3.8
CPC720R3     9.6     2   1.5    22   3.8   0.5   8.4
Preferred Options
BioSim2
• BioSim2, 2005, Java Format
– Most features as in BioSim1
– But, very user-friendly
– Provides many of the options in BioSim1 as
automatic output
– Expands on capability of mining the data for
insights
Figure 1. Opening Screen of BioSim2.
Demonstration of BioSim2
• Input original compressed data matrix
• Output of BioSim2
– Row Dendrogram
– Row Cophenetic Correlation Coefficient
– Column Dendrogram
– Column Cophenetic Correlation Coefficient
– Original Data Matrix Rearranged in Double-Dendrogram Order
0’s Present
0’s Removed
Coding The Reordered Data Matrix
Synthesis
Habitat Parameters
• Silt Rating (0 = none, 5 = extreme) [0-5]
• Pc (Pebble count) Boulder% (>256mm) [0-100]
• PcCobble% (>64-256mm) [0-100]
• PcCoarseGravel% (>16-64mm) [0-100]
• PcGravel% (>2-16mm) [0-100]
• PcSand% (>0.6-2mm) [0-100]
• PcSilt% (>0.004-0.6mm) [0-100]
• PcClay% (<0.004mm) [0-100]
• Embeddedness (>75%-poor to <5%-exc) [1-5]
• Canopy% (0 to 100% in incr of 10) [0-100]
• Width (bank full-m) [1.5-15]
• Filamentous Green% (substrate cover) [0-100]
Physical/Chemical Parameters
• Station (miles from mouth of watershed) [0.1-14.8]
• Water Temperature (°C) [21.7-5.5]
• pH (standard units) [6.97-8.4]
• Alkalinity (mg/l) [4.2-247]
• Lab Conductivity (µmhos/cm) [25.4-1700]
• Color (Pt-Co units) [2.5-70]
• Turbidity (NTU) [0.23-13.6]
• Total Suspended Solids (mg/l) [0.5-7.4]
• Total Phosphorus (µg/l) [5-75]
• Total Dissolved P (µg/l) [5-44]
Physical/Chemical Parameters
• Water samples are collected at BASE FLOWS at the time of
biological sampling.
• Water pH and D.O. are determined in the field using Hydro Lab
meters.
• All other parameters are submitted to the VTDEC Chemistry
Laboratory for analysis following an EPA-approved set of SOP’s.
• Samples are collected for alkalinity, conductivity, base cations and
anions, nutrients, and metals.
• Metals and dissolved phosphorus samples are filtered in the field
and represent total dissolved concentrations.
Physical/Chemical Parameters
Continued
• Total Nitrogen (mg/l) [0.1-2.23]
• Total NOX (mgN/l) [0.05-2.08]
• Total Cl (mg/l) [1-468]
• Dissolved Na (µg/l) [1.3-260]
• Dissolved K (µg/l) [0.24-7.91]
• Total SO4 (mg/l) [1.9-56.1]
• Dissolved Ca (mg/l) [1.96-83.5]
• Dissolved Mg (mg/l) [1.1-74.4]
• Dissolved Fe (µg/l) [46.6-538]
• Dissolved Mn (µg/l) [10-218]
RESULTS
HABITAT DATA
• Scaling Sample Points – Discussion
• Reordered Data Matrix – 0’s Included
– Synthesis
• Habitat Dendrogram
– Synthesis
RESULTS
Physical/Chemical DATA
• Stream Dendrogram
• Physical/Chemical Parameter Dendrogram
• Reordered Data Matrix - Coded
• Synthesis
RESULTS
ALL DATA COMBINED
• Inverse of the Habitat and Physical Chemical
Data
• Synthesis
Acknowledgements
• The EPSCoR Baccalaureate College
Summer Research Program under NSF
Grant Number, EPS-0236976
• The Norwich University Faculty
Development Program
• The VTDEC
BioSim2 Workshop
Questions – Discussion?
Contact:
[email protected]
http://www2.norwich.edu/pinkhamc/
802 485-2319
BioSim2 Workshop
Practice with your Data