COMPUTING AND PLOTTING FIRST ORDER SPATIAL AUTOCORRELATION
STATISTICS USING SAS®
Larry Layne, SUNY-College of Environmental Science and Forestry,
133 Illick Hall, Syracuse, NY 13210
ABSTRACT

Presence of spatial autocorrelation in classical statistical models
violates the assumption of independence and leads to biased estimates
of variation for tests used in inferential statistics (Griffith 1988).
The SAS program presented here computes 2 indices of spatial
autocorrelation, Moran's I and Geary's c ratio, and their respective
tests of significance. In addition, the program computes the spatial
mean, spatial median and standard distance, and plots these values on
the areal distribution of attribute values. The program assumes a
spatial arrangement of a regular lattice of squares, in any
configuration, in order to compute the connectivity matrix. The
program requires 3 variables in the data set: two variables
corresponding to the X-Y location and a variable containing the
attribute, or response, measurement for each areal unit. The program
requires SAS modules BASE and SAS/IML®. Plotting is accomplished using
SAS/GRAPH® PROC GPLOT, but is not required to compute the statistics.

INTRODUCTION

Development of faster and cheaper computing has enabled Geographic
Information Systems (GIS) to become a practical tool for storing and
displaying spatial information. The next step in working with spatial
information is the ability to ask questions and test hypotheses of
such data within a statistical context. The first question to ask of
spatial data is whether or not spatial arrangement of mapping units
has an effect on the attribute (or response) variable, or set of
variables, on the map. That is, are attribute values influenced due to
location of mapping units? The presence of any such influence is known
as spatial autocorrelation.

Within the context of classical statistics, presence of spatial
autocorrelation in a data set is important to consider, particularly
if a classical statistical model such as analysis of variance (ANOVA)
or regression will be used to analyze the data. Presence of spatial
autocorrelation will lead to biased estimates of variance components
and, consequently, tests of significance. Spatial autocorrelation
implies by its name that a correlation exists among the values of a
single variable on a map due to the observed spatial arrangement of
the values. This correlation, if present, must be identified and
accounted for since a correlation among a set of values will tend to
decrease the variance estimate, thus affecting the Type I and Type II
errors (Griffith 1988). Once any correlation among a set of values is
taken into account by a statistical model, inference testing can
continue normally, according to the specifications of the proposed
model.

METHODOLOGY

Two estimators of spatial autocorrelation frequently used are Moran's
I and Geary's c ratio. Moran's I is based on the concept of a Pearson
correlation coefficient and has an expected value of -1/(n-1). Moran's
I is intuitively appealing since with large n, the expected value
approaches 0, values greater than 0 indicate positive spatial
autocorrelation, values less than 0 denote negative spatial
autocorrelation, and the coefficient is approximately bounded by, but
not restricted to, the limits within -1 and +1. This, obviously, is
similar to the concept of a Pearson correlation coefficient.
On the other hand, Geary's c ratio uses a variance-covariance approach
and is not as easily likened to a Pearson correlation coefficient.
Geary's c ratio has an expected value of 1 with values less than 1
denoting positive spatial autocorrelation and values greater than 1
indicating negative spatial autocorrelation. The coefficient is
bounded on the lower end by 0, but is unbounded on the upper end.
However, most estimates of Geary's c fall within the bounds of 0 and
+2.

The formula for Moran's I is given as:

$$MC = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,(f_i - \bar{f})(f_j - \bar{f})}{\sum_{i=1}^{n} (f_i - \bar{f})^2}$$

and Geary's c ratio is:

$$GR = \frac{n - 1}{2\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,(f_i - f_j)^2}{\sum_{i=1}^{n} (f_i - \bar{f})^2}$$

For both of the above equations, n is the number of map units on the
map, $f_i$ is the attribute value of the ith map unit, $f_j$ is the
attribute value of the jth map unit, $\bar{f}$ is the mean of the
attribute values, and $c_{ij}$ is the weight coefficient corresponding
to the ith, jth map unit. For variance formulas corresponding to the
normal and random sampling perspectives for these two estimators,
consult Griffith (1987).

In order to compute the estimate of spatial autocorrelation, weighting
coefficients are required to take spatial arrangement of the data into
account. In the case of estimating first order spatial
autocorrelation, only those attribute values which are juxtaposed to
the reference map unit are considered. Computationally, the
coefficients are contained as a matrix of 0's and 1's (called the
connectivity matrix), which denotes to which other map units a
particular reference map unit is juxtaposed. If two map units are
juxtaposed, the covariation for the two attribute values is computed
and multiplied by 1, then added to the estimate sum. If two map units
are not juxtaposed, the covariation is effectively not computed since
the resulting covariation for the two attribute values is multiplied
by 0, then the 0 is added to the estimate sum. Although the weights
matrix can consist of values other than just 0's and 1's, using the
connectivity matrix is appropriate unless there is specific reason to
use other values. This is especially true in the case of a regular
lattice arrangement of mapping units. The connectivity is thus derived
by creating a matrix of n rows by n columns. A 1 is placed in the
matrix if the row's map unit is juxtaposed with the column's map unit,
and 0 otherwise. The connectivity matrix is therefore symmetric with
0's on the diagonal.

Spatial mean location is computed from the x-y coordinates using the
attribute value as a weight for each map unit and given as:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} f_i y_i}{\sum_{i=1}^{n} f_i}$$

where $f_i$ is the value of the attribute variable of a map unit, and
$x_i$ and $y_i$ are geocoordinates of the ith map unit. Spatial mean
location can be interpreted as a center of gravity (Griffith 1991).
One measure of spatial dispersion, the standard distance, is given as:

$$SD = \left[ \frac{\sum_{i=1}^{n} f_i \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]}{\sum_{i=1}^{n} f_i} \right]^{1/2}$$

The standard distance is measurement-scale specific and can be used to
construct contour lines of equidistant deviation around the spatial
mean (Griffith 1991).

Spatial median is that point (U, V) minimizing:

$$\sum_{i=1}^{n} f_i \left[ (x_i - U)^2 + (y_i - V)^2 \right]^{1/2},$$

where
$[(x_i - U)^2 + (y_i - V)^2]^{1/2}$ is the Euclidean distance
separating the areal unit centroid $(x_i, y_i)$ and the spatial median
(U, V) (Griffith 1991). Computing the spatial median is a recursive
process where at each step T, the spatial median is given as
$(U_T, V_T)$ where

$$U_T = \frac{\sum_{i=1}^{n} f_i x_i \,/\, \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}{\sum_{i=1}^{n} f_i \,/\, \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}},$$

$$V_T = \frac{\sum_{i=1}^{n} f_i y_i \,/\, \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}{\sum_{i=1}^{n} f_i \,/\, \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}},$$

and the initial value for the spatial median $(U_0, V_0)$ is the
spatial mean. The convergence criterion used is a negative power of 10
(the exponent is illegible in the source).

SAMPLE DATA AND THE PROGRAM

As described above, 2 pieces of information are required in order to
compute a particular spatial autocorrelation coefficient. These are
the attribute variable under consideration as well as the spatial
arrangement of the attribute values. For purposes of the program
presented in this paper, spatial arrangement of the attribute variable
is in the form of Cartesian coordinates. Thus, three variables are
required in order to use this program. The variables are F, X, and Y
where F represents the attribute value of the map unit and X and Y are
the associated Cartesian coordinates. The number of observations in
the input data set is equal to the number of map units contained in
the map. A macro window allows you to select the file containing your
data and specify the variables in your data set corresponding to the
F, X and Y values. The program automatically computes the connectivity
matrix based on the x-y coordinates, assuming the map units to be
square and equal in size. The map units, however, can be arranged in
any configuration, with missing map units allowed. Missing attribute
values (observations with x-y coordinates present but attribute value
missing) are deleted from the data set before analysis. Any
observations with missing x or y coordinates are likewise deleted from
the data set before analysis.

Following is a sample map containing a set of 8 map units:

    +---+---+---+
    | A | B | C |
    +---+---+---+
    | D | E |
    +---+---+---+
    | F | G | H |
    +---+---+---+

Notice that the right center cell is missing and that the side of any
one map unit can be juxtaposed with only one other map unit. Following
is the same map with attribute values included:

    +---+---+---+
    | 5 | 4 | 3 |
    +---+---+---+
    | 4 | 3 |
    +---+---+---+
    | 3 | 2 | 1 |
    +---+---+---+

The corresponding connectivity matrix for this map would be:

         A  B  C  D  E  F  G  H
    A    0  1  0  1  0  0  0  0
    B    1  0  1  0  1  0  0  0
    C    0  1  0  0  0  0  0  0
    D    1  0  0  0  1  1  0  0
    E    0  1  0  1  0  0  1  0
    F    0  0  0  1  0  0  1  0
    G    0  0  0  0  1  1  0  1
    H    0  0  0  0  0  0  1  0

Again, notice that the connectivity matrix consists of only 1's and
0's, the diagonal consists only of 0's, and that the matrix is square
and symmetric. The corresponding output produced from the program
follows:

    x-y coordinates of spatial mean     = 1.68   2.24
    standard distance                   = 1.095445115
    Spatial median value                = 25.633507391
    x-y coordinates of spatial median   = 1.622  2.279
    Number of iterations to convergence = 13

    First order spatial autocorrelation coefficients
    with variance estimates and tests of significance

    expected value of Moran's I           = -0.142857143
    Moran's I coefficient                 = 0.4508301405
    variance for Moran's I coefficient
      under assumption of random sampling = 0.0679712
      and z-value                         = 2.2771687
    variance for Moran's I coefficient
      under assumption of randomization   = 0.0669296
      and z-value                         = 2.2948206

    expected value of Geary ratio         = 1
    Geary ratio coefficient               = 0.3218390805
    variance for Geary ratio coefficient
      under assumption of random sampling = 0.0850480
      and z-value                         = 2.3254161
    variance for Geary ratio coefficient
      under assumption of randomization   = 0.0859774
      and z-value                         = 2.3128134
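
The figures in this listing can be cross-checked outside SAS. The
following is a minimal Python sketch, not the author's SAS/IML program.
It assumes unit-spaced coordinates (x and y running from 1 to 3, with
map unit A at the upper left), a 0/1 connectivity matrix in which two
squares are joined when they share a side, and the standard forms of
Moran's I and Geary's c. All names (cells, morans_i, gearys_c,
spatial_median and so on) are illustrative, and the tolerance in the
median iteration is an arbitrary choice.

```python
import math

# Sample map as (x, y) -> attribute value F. The unit-spaced coordinates are
# an assumption; the right-center cell (3, 2) is absent from the map.
cells = {
    (1, 3): 5, (2, 3): 4, (3, 3): 3,   # A B C
    (1, 2): 4, (2, 2): 3,              # D E
    (1, 1): 3, (2, 1): 2, (3, 1): 1,   # F G H
}
coords = list(cells)
f = [cells[xy] for xy in coords]
n = len(f)

# First-order connectivity: 1 if two unit squares share a side, else 0.
c = [[1 if abs(xi - xj) + abs(yi - yj) == 1 else 0 for (xj, yj) in coords]
     for (xi, yi) in coords]

f_bar = sum(f) / n
dev = [value - f_bar for value in f]
s0 = sum(sum(row) for row in c)        # sum of all weights
ss = sum(d * d for d in dev)           # sum of squared deviations
pairs = [(i, j) for i in range(n) for j in range(n)]

morans_i = (n / s0) * sum(c[i][j] * dev[i] * dev[j] for i, j in pairs) / ss
gearys_c = ((n - 1) / (2 * s0)) * sum(
    c[i][j] * (f[i] - f[j]) ** 2 for i, j in pairs) / ss

# Attribute-weighted spatial mean and standard distance.
w = sum(f)
x_bar = sum(fi * x for fi, (x, _) in zip(f, coords)) / w
y_bar = sum(fi * y for fi, (_, y) in zip(f, coords)) / w
std_dist = math.sqrt(sum(
    fi * ((x - x_bar) ** 2 + (y - y_bar) ** 2)
    for fi, (x, y) in zip(f, coords)) / w)


def weighted_distance_sum(u, v):
    """Quantity minimized by the spatial median."""
    return sum(fi * math.hypot(x - u, y - v) for fi, (x, y) in zip(f, coords))


def spatial_median(u, v, tol=1e-8, max_iter=1000):
    """Iteratively reweighted update of (u, v), started from the given point."""
    for _ in range(max_iter):
        # Each unit is weighted by F divided by its distance to the current
        # point; the small floor guards against division by zero.
        wts = [fi / max(math.hypot(x - u, y - v), 1e-12)
               for fi, (x, y) in zip(f, coords)]
        new_u = sum(wt * x for wt, (x, _) in zip(wts, coords)) / sum(wts)
        new_v = sum(wt * y for wt, (_, y) in zip(wts, coords)) / sum(wts)
        converged = math.hypot(new_u - u, new_v - v) < tol
        u, v = new_u, new_v
        if converged:
            break
    return u, v


u_med, v_med = spatial_median(x_bar, y_bar)

print(f"Moran's I = {morans_i:.10f}, Geary's c = {gearys_c:.10f}")
print(f"spatial mean = ({x_bar:.2f}, {y_bar:.2f}), standard distance = {std_dist:.9f}")
print(f"spatial median = ({u_med:.3f}, {v_med:.3f}), "
      f"value = {weighted_distance_sum(u_med, v_med):.6f}")
```

Under these assumptions the coefficients, spatial mean and standard
distance agree with the listing, and the iteration settles close to
the reported spatial median. The number of iterations depends on the
tolerance chosen and is not expected to match the listing.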
The Moran coefficient denotes positive spatial autocorrelation since
the value is greater than zero. Likewise, Geary's c ratio indicates
positive spatial autocorrelation as the value is less than 1.
Significance tests of the null hypothesis that spatial autocorrelation
is not present are carried out by comparing the z statistic with the z
distribution. At the .05 significance level, the two-tailed critical
value is z = 1.96. Since both the computed z statistics are greater
than 1.96, we conclude that the null hypothesis of no spatial
autocorrelation should be rejected.
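
The test just described can be sketched in a few lines. This is an
illustrative Python fragment, not part of the SAS program: it takes
the coefficients and random-sampling variances reported in the output
as given, standardizes each coefficient against its expected value,
and compares the magnitude with 1.96. The function names z_value and
two_sided_p are hypothetical.

```python
import math


def z_value(coefficient, expected, variance):
    """Standardize a coefficient against its expected value under the null."""
    return (coefficient - expected) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


n = 8  # number of map units in the sample map

# Coefficients and random-sampling variances as reported in the output.
z_moran = z_value(0.4508301405, -1.0 / (n - 1), 0.0679712)
z_geary = z_value(0.3218390805, 1.0, 0.0850480)

# Geary's c lies below its expected value of 1, so its signed z is negative;
# the test compares the magnitude with the critical value.
for name, z in (("Moran's I", z_moran), ("Geary's c", z_geary)):
    decision = "reject" if abs(z) > 1.96 else "do not reject"
    print(f"{name}: |z| = {abs(z):.4f}, p = {two_sided_p(z):.4f}, {decision} H0")
```

The magnitudes produced this way match the z-values in the listing for
the random sampling assumption; substituting the randomization
variances gives the other pair.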
A second option presented to you via a macro window is whether you
want to plot the map, along with some of the pertinent statistics.
Since SAS/GRAPH is used to plot the data you need to place the
appropriate GOPTIONS statement(s) at the beginning of the program
specifying the graphics device you are plotting to.
You will also be prompted to enter an
output file specification if you would
like a destination different from the
default output file. By default program
results are written to an output file
called SPACCORR RESULTS (on CMS) or
SPACCORR.OUT (on Unix or MS-DOS) by use
of a FILENAME statement.
The output
file includes Moran's I and its test of
significance, Geary's c and its test of
significance, x-y coordinates and values
of the spatial mean and spatial median,
and the standard distance. If desired,
the connectivity matrix can be output to
a file called SPACCORR LISTING (or
SPACCORR.LIS). The program can be run
from either the SAS Display Manager or
as batch. The program may be obtained
via snail mail from the author or
electronically by sending email to
[email protected].
REFERENCES

Griffith, D.A. 1991. Statistical Analysis for Geographers. Englewood
Cliffs, N.J.: Prentice Hall, 478 pp.

Griffith, D.A. 1987. Spatial Autocorrelation: A Primer. Association
of American Geographers, Washington, D.C., 75 pp.

Griffith, D.A. 1988. Advanced Spatial Statistics: Advanced Studies in
Theoretical and Applied Econometrics. Kluwer Academic Publishers,
Norwell, MA, 273 pp.

[Figure: Plot of geo-referenced coordinates with associated attribute
values, spatial mean and spatial median of attribute values. Axes:
horizontal and vertical geo-reference coordinates. Legend: spatial
mean, spatial median. z-values of spatial autocorrelation estimators
are for randomization perspective.]