THE ANALYSIS OF ECOLOGICAL SURVEY DATA WITH SAS AND EAP
Robert W. Smith, Ecological Data Analysis
1151 Avila Drive, Ojai, CA 93023
matrix with the rows (species) and
columns (samples) arranged in the same
order as appears on the corresponding
dendrograms (EAP PROC TWT). The data
values can be standardized and converted
to symbols for compactness and ease of
interpretation (Fig. 1). The symbols
in the two-way tables of Figures 1 and
2 are based on species mean (of values
>0) standardized data, and the values
corresponding to each symbol are as
follows:
INTRODUCTION
The Ecological Analysis Package (EAP)
is a set of user-written SAS procedures which are useful in the analysis
of ecological survey data. These types
of data are often collected as part of
environmental impact or monitoring
studies. Both biological (usually as
species importance values) and environmental data are collected at several
pertinent locations.
The first step in the analysis
usually consists of finding the biological patterns in the data. Although
single species can be studied, the main
emphasis here will involve the study of
biological patterns at the community
level.
After the biological patterns are
quantified and illustrated, they can be
correlated with the environmental
measurements. The results of this
analysis can lead to hypotheses of
cause and effect.
These types of analyses can conveniently be performed with SAS and EAP
procedures. As the methods are discussed, the procedures involved will
be noted.
Symbol    Range of Values
*         >2
+         >1 to 2
-         >.5 to 1
.         >0 to .5
blank     0
Such a table is extremely valuable
to the ecologist because the biological
patterns can easily be seen and interpreted. In addition, when choosing
groups from the dendrograms, reference
to the table is very useful.
The two-way tables are informative
because similar samples and species are
grouped; thus they will appear in contiguous positions on the two-way table.
However, the specific order of entities
along a dendrogram can be quite arbitrary since any one node on the dendrogram can be rotated 180° without changing
any groupings. The clustering algorithm
in EAP PROC DENDRO is modified to create
a maximally informative order of entities along the dendrogram. To accomplish this, the main trend in the data
is found by calculating scores for each
entity along an ordination axis (see
below). The order of entities along the
dendrogram is made to approach the order
of entities along the ordination axis
by appropriate rotation of the nodes
(Fig. 2). Note that the rows and
columns of the two-way table in Fig. 2
show continuous biological change. This
is not evident from observing the two-way table in Fig. 1.
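The node rotation described above can be sketched in a few lines of Python. This is an illustration only and not the algorithm used in EAP PROC DENDRO, whose details are not given here. The nested-tuple dendrogram, the labels, the scores and the function names are all hypothetical, and labels are assumed not to be tuples themselves.

```python
def leaves(node):
    """Return the leaf labels under a node; a node is a label or a (left, right) pair."""
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def mean_score(node, scores):
    """Mean ordination score of all entities under a node."""
    members = leaves(node)
    return sum(scores[m] for m in members) / len(members)


def rotate_nodes(node, scores):
    """Rotate each node so the branch with the lower mean score comes first.

    Only the left/right order of branches changes; group membership is untouched.
    """
    if not isinstance(node, tuple):
        return node
    left = rotate_nodes(node[0], scores)
    right = rotate_nodes(node[1], scores)
    if mean_score(left, scores) <= mean_score(right, scores):
        return (left, right)
    return (right, left)


# Hypothetical dendrogram and first-axis ordination scores (assumed values).
tree = (("s4", "s5"), (("s2", "s1"), "s3"))
scores = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0}
ordered = leaves(rotate_nodes(tree, scores))
```

Because rotation never moves an entity out of its group, the resulting order can only approach the ordination order; it matches it exactly only when the groupings allow.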
METHODS FOR FINDING BIOLOGICAL
(COMMUNITY) PATTERNS
Agglomerative Hierarchical Cluster
Analysis: This type of cluster analysis
consists of two steps. 1) Distances
are calculated between all pairs of
entities (the units being clustered,
which can be observations or variables).
These distance values are proportional
to the dissimilarity of the entities.
2) The most similar remaining pairs
of entities are successively fused to
form larger and larger groups until all
entities are in a single group. The
path and levels of fusion are shown
in a tree-like structure called a dendrogram (Fig. 1). All agglomerative
clustering methods are similar in this
respect, but differ in the manner in
which distances between groups of entities are calculated as the groups are
built. Some examples of clustering
strategies are complete linkage (used
by SAS PROC CLUSTER), single linkage,
centroid (Sneath and Sokal, 1973),
group average (EAP PROC DENDRO) and
flexible (Lance and Williams, 1967; EAP
PROC DENDRO).
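The two steps can be illustrated with a minimal Python sketch of the group-average strategy. It is not the code of PROC DENDRO or PROC CLUSTER. The function name, the dictionary layout of the distances and the four-sample example are assumptions made for illustration.

```python
from itertools import combinations


def group_average_clustering(labels, dist):
    """Fuse the closest pair of groups until one group remains.

    dist maps frozenset({a, b}) to the distance between entities a and b.
    Returns a list of (group_1, group_2, fusion_level) in fusion order.
    """
    def average(g1, g2):
        # Group-average strategy: mean of all between-group entity distances.
        total = sum(dist[frozenset((a, b))] for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    groups = [(label,) for label in labels]
    fusions = []
    while len(groups) > 1:
        g1, g2 = min(combinations(groups, 2), key=lambda pair: average(*pair))
        fusions.append((g1, g2, average(g1, g2)))
        groups = [g for g in groups if g not in (g1, g2)] + [g1 + g2]
    return fusions


# Hypothetical distances among four samples (assumed values).
demo = {
    frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 6.0, frozenset(("A", "D")): 7.0,
    frozenset(("B", "C")): 8.0, frozenset(("B", "D")): 9.0,
}
fusions = group_average_clustering(["A", "B", "C", "D"], demo)
```

The list of fusions is the information a dendrogram displays. Replacing the mean inside `average` with the maximum or the minimum of the between-group distances would give complete or single linkage, respectively.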
The dendrograms from a sample and
a species cluster analysis can be used
to construct a two-way coincidence table,
which is simply the biological data
Ordination Analysis: Here the
relationships between the entities
are displayed in a maximally informative
subset of a multidimensional space. The
entities being studied are represented
by points in the space and the distance
between two points should be proportional to the dissimilarity of the corresponding entities (Fig. 3).
The dimensions of the space are called axes and
the point coordinates are called scores.
As with agglomerative cluster
analysis, the ordination techniques can
Alternately, the groups can be formed
according to the distributions of
selected species (Green, 1971). Discriminant coefficients will show which
environmental variables are correlated
with the discriminant axes.
The results from this type of
analysis usually can be improved by
weighting the observations according
to how well each observation fits into
each group. There is one set of weights
for each observation for each group
(Smith, 1976, 1979). Additional important biological within- and between-group information can be conveyed with
the weights. When groups are defined
with cluster analysis, the weights can
be calculated from the same inter-sample distances used in the clustering
(EAP PROC GRSIM).
be based on inter-entity distances.
Principal coordinates analysis (EAP
PROC PCOORD) and multidimensional
scaling (SAS supplemental PROC ALSCAL)
directly use distances, while principal
components analysis (SAS PROC PRINCOMP)
is indirectly based on Euclidean
distances.
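As an illustration of an ordination computed directly from distances, the following Python sketch extracts scores on the first principal coordinate axis. It is not the code of PROC PCOORD. The function name and the use of power iteration are choices made here for brevity, and the three-sample example is hypothetical. The sketch assumes the dominant eigenvalue of the centred matrix is positive, which holds when the distances behave like Euclidean distances.

```python
import math


def first_principal_coordinate(dist, iterations=200):
    """Scores of the entities on the first principal coordinate axis."""
    n = len(dist)
    # Gower transformation: a_ij = -d_ij^2 / 2, then double-centre the matrix.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]

    def times_b(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Power iteration for the dominant eigenvector of the centred matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = times_b(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, times_b(v)))
    if eigenvalue <= 0:
        raise ValueError("dominant eigenvalue is not positive")
    # Scale the unit eigenvector so score differences are on the distance scale.
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical samples lying at positions 0, 1 and 3 along a single gradient.
positions = [0.0, 1.0, 3.0]
dist = [[abs(p - q) for q in positions] for p in positions]
axis_scores = first_principal_coordinate(dist)
```

For this one-dimensional example the differences between scores reproduce the input distances, which is the property the text asks of an ordination space.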
When ordination scores are plotted,
it is necessary to be able to identify
individual points. EAP PROC PLOTM has
an option which generates a set of plot
symbols which are easily identifiable
from an accompanying table (Fig. 4c).
In addition, symbols can automatically
be generated to distinguish groups
(Fig. 4b) or trends of selected variables in the plot (Fig. 4a).
METHODS FOR CORRELATING BIOLOGICAL
AND ENVIRONMENTAL PATTERNS
Non-parametric Analysis of
Distances: This method can be used
to test for community difference between a priori defined groups of
samples (Dyer, unpublished; EAP PROC
DCOMP). Inter-sample distances calculated from the biological data are
divided up into within-group distances
and between-group distances. A non-parametric test is then made to determine if the between-group distances
are significantly larger than the within-group distances. This method takes into
account the lack of independence of the
distance values.
The groupings can reflect some
hypothesis concerning biological-environmental relationships. For
example, samples taken in an area of
impact could be compared with samples
in a similar area without the impact.
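The test used by PROC DCOMP is unpublished and is not reproduced here. As a stand-in, the following Python sketch uses a label-permutation test, a different but related technique that also avoids treating the distances as independent observations: group labels are shuffled among samples and the observed gap between mean between-group and mean within-group distance is compared with the shuffled gaps. The function names, the number of permutations and the example data are assumptions.

```python
import random
from itertools import combinations


def mean_gap(dist, groups):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        (within if groups[i] == groups[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(dist, groups, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed gap."""
    rng = random.Random(seed)
    observed = mean_gap(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count label shufflings whose gap is at least as large as the observed one.
        if mean_gap(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: five impact-area samples and five control samples.
positions = [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = ["impact"] * 5 + ["control"] * 5
observed_gap, p_value = permutation_test(dist, groups)
```

Shuffling labels rather than individual distances keeps each sample's full set of distances together, which is how the sketch respects the dependence among distance values.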
Multiple Linear Regression:
Environmental factors which cause major
community changes will be correlated
with the first or first few ordination
axes. Multiple regression can be used
to possibly identify these factors
(SAS PROC GLM, SAS PROC REG). Here,
the dependent variable will represent
scores for an ordination axis, and the
independent variables will be the
measured environmental variables
(Cassie and Michael, 1968; Smith and
Greene, 1976).
It is also possible to use inter-sample distances as the dependent
variable, and corresponding changes
in environmental variables as independent variables (Dyer, 1978; EAP PROC
REGDIST). With such an analysis,
modified calculations are required
since distance values are not necessarily independent observations.
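The first form of the regression, with ordination axis scores as the dependent variable, can be sketched in Python as an ordinary least-squares fit through the normal equations. This only illustrates the calculation that PROC GLM or PROC REG would perform; it does not include the modified calculations needed when distances are the dependent variable. The function names and the data are hypothetical, and the environmental variables are assumed not to be collinear.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def regress(axis_scores, env):
    """Least-squares coefficients [intercept, b1, b2, ...] of scores on env variables."""
    design = [[1.0] + list(row) for row in env]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, axis_scores)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two environmental variables measured at five samples,
# with axis scores constructed as 2 + 3 * first variable - second variable.
env = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
axis_scores = [3.0, 7.0, 6.0, 11.0, 9.0]
coefficients = regress(axis_scores, env)
```

Large coefficients, judged relative to the scale of each variable, point to the environmental factors most closely associated with the axis.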
Canonical Correlations: Canonical
correlations can be used instead of
multiple regression to study the correlations between the ordination axes
and the environmental variables (SAS
PROC CANCORR). One set of variables
consists of the ordination axes scores
and the other set is the environmental
variables.
THE MEASUREMENT OF DISTANCE
Most of the methods mentioned
above require the calculation of
distances. There are several distance indices from which to choose.
When using species importance values,
some indices are more appropriate than
others. One of the most widely available distance indices is Euclidean. Unfortunately, this index is not well
suited for ecological data (Beals,
1973). EAP PROC DENDRO has distance
indices which are suitable for these
types of data.
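One well-known weakness of Euclidean distance with species data can be shown with a small Python example. The indices offered by PROC DENDRO are not named in this paper, so the Bray-Curtis index is used here purely as an assumed example of an index designed for species importance values. The three samples are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared species."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


# Hypothetical importance values for four species at three sites.
site_a = [1, 1, 0, 0]
site_b = [0, 0, 1, 1]   # shares no species with site_a
site_c = [9, 9, 0, 0]   # same species as site_a, at higher abundance

euclid_ab = euclidean(site_a, site_b)
euclid_ac = euclidean(site_a, site_c)
bc_ab = bray_curtis(site_a, site_b)
bc_ac = bray_curtis(site_a, site_c)
```

Euclidean distance places the two sites with no species in common closer together than the two sites holding the same species, while the Bray-Curtis index gives the no-overlap pair its maximum value.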
Inter-sample Distances: Inter-sample distances are used to measure
community changes. When used in this
manner, all distance indices have one
major shortcoming. As the actual community change increases beyond a moderate level, the distance index values
do not increase commensurately. This
is due to the fact that species change
takes place in a non-linear, non-monotonic manner (see Fig. 1a), and
the distance indices assume linear
Discriminant Analysis: SAS PROC
DISCRIM is mainly used for classifying observations into a priori defined
groups. Alternately, discriminant
analysis can be used to study between-group differences (EAP PROC WTDISC).
Here, the axes of a defined multidimensional space are positioned to
maximally separate the groups. The
original dimensions of the multidimensional space could represent
the measured environmental variables,
and the groups could be defined by a
cluster analysis of the samples using
biological data (Smith, 1976; Bernstein
et al., 1978; Green and Vascotto, 1978).
species change (Swan, 1970). The
relatively shorter and moderate distances can be somewhat improved with
proper data transformation and
standardization (Smith, 1976; Smith,
in prep; EAP PROC TSALL).
The relatively longer distances can be substantially improved by reestimation
using a "step-across" procedure
(Williamson, 1978) modified by Smith
(1981) (EAP PROC DENDRO).
Technical Memorandum 80/9. CSIRO Institute
of Earth Resources, Division of Land Use
Research, Canberra, Australia.
Bernstein, B.B., R.R. Hessler, R. Smith, and
P.A. Jumars, 1978. Spatial dispersion of
benthic Foraminifera in the abyssal central
North Pacific. Limnol. Oceanogr.
23(3):
Here, the
shorter distances are used to reestimate the longer distances.
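The idea of reestimating long distances from short ones can be sketched as a shortest-path calculation. This is not the procedure of Williamson (1978) or the modification by Smith (1981), whose details are not given here. It assumes that distances at or above a chosen threshold are unreliable and replaces them with the shortest chain of reliable distances. The function name, the threshold and the example matrix are hypothetical.

```python
def step_across(dist, threshold):
    """Reestimate distances at or above threshold from chains of shorter distances."""
    n = len(dist)
    inf = float("inf")
    # Keep only the distances considered reliable; treat the rest as unknown.
    path = [[dist[i][j] if dist[i][j] < threshold else inf for j in range(n)]
            for i in range(n)]
    # Floyd-Warshall shortest paths through the reliable distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if path[i][k] + path[k][j] < path[i][j]:
                    path[i][j] = path[i][k] + path[k][j]
    return path


# Hypothetical samples along a gradient; the index saturates at 1.0, so
# samples two or three steps apart all appear equally distant.
dist = [
    [0.0, 0.5, 1.0, 1.0],
    [0.5, 0.0, 0.5, 1.0],
    [1.0, 0.5, 0.0, 0.5],
    [1.0, 1.0, 0.5, 0.0],
]
reestimated = step_across(dist, threshold=1.0)
```

In the example, the two end samples move from the saturated value of 1.0 to 1.5, the sum of the three short steps between them. Pairs that cannot be linked by any chain of reliable distances remain infinite in this sketch and would need separate handling.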
401-416.
Green, R.H. and G.L. Vascotto, 1978. A method
for the analysis of environmental factors
controlling patterns of species composition
in aquatic communities. Water Res. 12: 583-590.
Howard-Williams, C. and B.H. Walker, 1974.
The vegetation of a tropical African lake:
classification and ordination of the
vegetation of Lake Chilwa (Malawi). J. Ecol.
Inter-species Distances: With
most indices, the data must first be
standardized by species maximum to
remove the irrelevant effects of
scale in the calculations (Smith, 1976).
The distance values will be inversely
proportional to the overlap between the
species being compared. These distance
values can be adversely affected by uneven sampling of the various habitats
in the survey area (Colwell and Futuyma,
1971). This can somewhat be corrected
for with the use of weights in the
distance calculations (EAP PROC UNIQWT,
TSALL).
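Standardization by species maximum, followed by a distance that falls as overlap rises, can be sketched in Python as follows. The overlap index shown (one minus the ratio of summed minima to summed maxima) is an assumption chosen for illustration; the indices and weights used by the EAP procedures are not specified here. The function names and data are hypothetical.

```python
def standardize_by_species_maximum(table):
    """Divide each species row by its maximum so every species peaks at 1.0."""
    result = []
    for row in table:
        peak = max(row)
        result.append([v / peak if peak > 0 else 0.0 for v in row])
    return result


def overlap_distance(x, y):
    """1 - (sum of minima / sum of maxima): 0 for identical rows, 1 for no overlap."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    total = sum(max(a, b) for a, b in zip(x, y))
    return 1.0 - shared / total if total > 0 else 0.0


# Hypothetical abundances of three species (rows) at three samples (columns).
table = [
    [2.0, 4.0, 0.0],     # a sparse species
    [100.0, 200.0, 0.0], # an abundant species with the same distribution
    [0.0, 0.0, 7.0],     # a species overlapping with neither
]
standardized = standardize_by_species_maximum(table)
same_shape = overlap_distance(standardized[0], standardized[1])
no_overlap = overlap_distance(standardized[0], standardized[2])
raw_same_shape = overlap_distance(table[0], table[1])
```

Before standardization, the sparse and the abundant species appear very different even though they are distributed identically across the samples; after standardization their distance is zero, which is the scale effect the standardization is meant to remove.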
Besides species overlap, ecologists
are often interested in the relative
habitat preferences of the species. For
example, the distance between two non-overlapping species will be the maximal
distance value, regardless of their
habitat preferences. The ecologist may
prefer that two non-overlapping species
occurring in similar habitats will be
separated by a shorter distance than
two non-overlapping species found in
very dissimilar habitats. The distances
measuring overlap can be converted to
distances reflecting relative habitat
preference with the "two-step" method
(Belbin, 1980; Austin and Belbin, unpublished; EAP PROC DENDRO).
62(3): 831-853.
Lance, G.N. and W.T. Williams, 1967. A general
theory of classificatory sorting strategies.
I. Hierarchical systems. Computer J. 9:
373-380.
Smith, R.W., 1976. Numerical analysis of ecological survey data. Ph.D. thesis, University
of Southern California, LA. 401 pp.
Smith, R.W., 1979. Discriminant analysis. EAP
Technical Report No. 1: 53 pp. Available
from author at 1151 Avila Dr., Ojai, CA 93023.
Smith, R.W., 1981. The re-estimation of ecological distance values using the step-across procedure. EAP Technical Report No. 2: 19 pp.
Available from author at 1151 Avila Drive,
Ojai, CA 93023.
Smith, R.W. and C.S. Greene, 1976. Biological
communities near submarine outfall. Journal
Water Pollution Control Fed. 48(8): 1894-1912.
Smith, R.W., in preparation. The improvement of
ecological distances with transformation and
standardization. To be an EAP technical report.
Sneath, P.A. and R.R. Sokal, 1973. Numerical
Taxonomy. W.H. Freeman and Co., San Francisco: 573 pp.
Swan, J.M.A., 1970. An examination of some
ordination problems by use of simulated
vegetational data. Ecology 51: 89-102.
Whittaker, R.H., 1973. Direct gradient analysis: Techniques. In Handbook of Vegetation
Science, Part 5: Ordination and Classification
of Communities. R.H. Whittaker, ed., Dr. W.
Junk Publishers, The Hague: 7-31.
Williamson, M.H., 1978. The Ordination of
Incidence Data. J. Ecol. 66: 911-920.
DISCUSSION
The authors of EAP are active in
the management and analysis of ecological survey data. Accordingly, every
effort is made to keep the EAP programs
user-oriented and the techniques state-of-the-art. Besides the EAP procedures
mentioned, there are several other procedures for analysis, display, and data
manipulation which can be useful to the
analyst.
REFERENCES
Austin, M.P. and L. Belbin, unpublished. A new
approach to the inverse classification problem in floristic analysis.
Beals, E.W., 1973. Ordination: Mathematical
elegance and ecological naivete. J. Ecol.
61(1): 23-35.
Cassie, R.M. and M.D. Michael, 1968. Fauna
and sediments of an intertidal mud flat:
A multivariate analysis. J. Exp. Mar.
Biol. & Ecol. 2: 1-23.
Colwell, R.K. and D.J. Futuyma, 1971. On the
measurement of niche breadth and overlap.
Ecology 52(4): 567-576.
Dyer, D.P., 1978. An analysis of species
dissimilarity using multiple environmental
variables. Ecology 59(1): 117-125.
Dyer, D.P., unpublished. A statistical test
for dissimilarity and similarity matrices.
Available from David P. Dyer, Moorman Mfg.
Co., 1000 N. 30th St., Quincy, IL 62301.
Green, R.H., 1971. A multivariate statistical
approach to the Hutchinsonian niche: bivalve
molluscs of central Canada. Ecology 52(4):
543-556.
Belbin, L., 1980. TWOSTP: A program incorporating asymmetric comparisons that use two
steps to produce a dissimilarity matrix.
Figure 1. Cluster Analysis of Biological Data Along a Moisture Gradient Using Unimproved Clustering Algorithm.
a. Plant Species Densities Along Environmental Moisture Gradient. After Whittaker (1973). Vertical axis: Density. Horizontal axis: Samples Along Moisture Gradient (Wet to Dry).
b. Cluster Analysis of Samples. Order of sites along the dendrogram: 15, 16, 14, 19, 11, 17, 18, 10, 13, 12, 7, 6, 9, 8, 4, 5, 3, 1, 2.
c. Cluster Analysis of Species. Order of species along the dendrogram: I, J, K, L, B, C, D, E, G, H, F, A.
d. Two-way Coincidence Table.
Figure 2. Cluster Analysis of Biological Data Along a Moisture Gradient Using Improved Clustering Algorithm.
a. Cluster Analysis of Samples. Order of sites along the dendrogram: 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
b. Cluster Analysis of Species. Order of species along the dendrogram: L, K, J, I, H, G, F, E, C, B, D, A.
c. Two-way Coincidence Table.
Figure 3. Ordination of Samples Based on Species Compositions Using Principal Components Analysis. After Howard-Williams and Walker, 1974. Labeled regions of the plot: Neutral to acidic marsh, Alkaline marsh, Swamp transition, Floodplain, Grassland.
Figure 4. Output From EAP PROC PLOTM. Point Positions on All Plots Are Identical But Symbols Are Different.
a. Symbols From Moisture Measurements at Sample Locations (1 = lowest moisture, 9 = highest moisture).
b. Symbols Signifying Groups of Samples.
c. Symbols For Identification of Points Using Symbol Table.
d. Symbol Table For Symbols in c (columns: OBS, SYM, ID).