COMPUTING AND PLOTTING FIRST ORDER SPATIAL AUTOCORRELATION STATISTICS USING SAS®

Larry Layne, SUNY-College of Environmental Science and Forestry, 133 Illick Hall, Syracuse, NY 13210

ABSTRACT

The SAS program presented here computes 2 indices of spatial autocorrelation, Moran's I and Geary's c ratio, and their respective tests of significance. In addition, the program computes the spatial mean, spatial median and standard distance, and plots these values on the areal distribution of attribute values. The program assumes a spatial arrangement of a regular lattice of squares, in any configuration, in order to compute the connectivity matrix. The program requires 3 variables in the data set: two variables corresponding to the X-Y location and a variable containing the attribute, or response, measurement for each areal unit. The program requires SAS modules BASE and SAS/IML®. Plotting is accomplished using SAS/GRAPH® PROC GPLOT, but is not required to compute the statistics.

INTRODUCTION

Development of faster and cheaper computing has enabled Geographic Information Systems (GIS) to become a practical tool for storing and displaying spatial information. The next step in working with spatial information is the ability to ask questions and test hypotheses of such data within a statistical context. The first question to ask of spatial data is whether or not the spatial arrangement of mapping units has an effect on the attribute (or response) variable, or set of variables, on the map. That is, are attribute values influenced by the location of mapping units? The presence of any such influence is known as spatial autocorrelation. Presence of spatial autocorrelation in classical statistical models violates the assumption of independence and leads to biased estimates of variation for tests used in inferential statistics (Griffith 1988).

Within the context of classical statistics, presence of spatial autocorrelation in a data set is important to consider, particularly if a classical statistical model such as analysis of variance (ANOVA) or regression will be used to analyze the data. Presence of spatial autocorrelation will lead to biased estimates of variance components and, consequently, tests of significance. Spatial autocorrelation implies by its name that a correlation exists among the values of a single variable on a map due to the observed spatial arrangement of the values. This correlation, if present, must be identified and accounted for, since a correlation among a set of values will tend to decrease the variance estimate, thus affecting the Type I and Type II errors (Griffith 1988). Once any correlation among a set of values is taken into account by a statistical model, inference testing can continue normally, according to the specifications of the proposed model.

METHODOLOGY

Two frequently used estimators of spatial autocorrelation are Moran's I and Geary's c ratio. Moran's I is based on the concept of a Pearson correlation coefficient and has an expected value of -1/(n-1). Moran's I is intuitively appealing: with large n the expected value approaches 0, values greater than 0 indicate positive spatial autocorrelation, values less than 0 denote negative spatial autocorrelation, and the coefficient is approximately bounded by, but not restricted to, the limits -1 and +1. This is similar to the concept of a Pearson correlation coefficient.

On the other hand, Geary's c ratio uses a variance-covariance approach and is not as easily likened to a Pearson correlation coefficient. Geary's c ratio has an expected value of 1, with values less than 1 denoting positive spatial autocorrelation and values greater than 1 indicating negative spatial autocorrelation. The coefficient is bounded on the lower end by 0, but is unbounded on the upper end. However, most estimates of Geary's c fall between 0 and +2.

The formula for Moran's I is given as:

$$MC = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} (f_i - \bar{f})(f_j - \bar{f})}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} \right) \sum_{i=1}^{n} (f_i - \bar{f})^2}$$

and Geary's c ratio is:

$$GR = \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} (f_i - f_j)^2}{2 \left( \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} \right) \sum_{i=1}^{n} (f_i - \bar{f})^2}$$

For both of the above equations, n is the number of map units on the map, $f_i$ is the attribute value of the ith map unit, $f_j$ is the attribute value of the jth map unit, $\bar{f}$ is the mean of the attribute values, and $c_{ij}$ is the weight coefficient corresponding to the ith and jth map units. For variance formulas corresponding to the normal and random sampling perspectives for these two estimators, consult Griffith (1987).

In order to compute the estimate of spatial autocorrelation, weighting coefficients are required to take the spatial arrangement of the data into account. In the case of estimating first order spatial autocorrelation, only those attribute values which are juxtaposed to the reference map unit are considered. Computationally, the coefficients are contained in a matrix of 0's and 1's (called the connectivity matrix), which denotes to which other map units a particular reference map unit is juxtaposed. If two map units are juxtaposed, the covariation for the two attribute values is computed and multiplied by 1, then added to the estimate sum. If two map units are not juxtaposed, the covariation is effectively not computed, since the resulting covariation for the two attribute values is multiplied by 0 and the 0 is then added to the estimate sum. Although the weights matrix can consist of values other than just 0's and 1's, using the connectivity matrix is appropriate unless there is specific reason to use other values. This is especially true in the case of a regular lattice arrangement of mapping units. The connectivity matrix is thus derived by creating a matrix of n rows by n columns. A 1 is placed in the matrix if the row's map unit is juxtaposed with the column's map unit, and 0 otherwise. The connectivity matrix is therefore symmetric with 0's on the diagonal.

Spatial mean location is computed from the x-y coordinates using the attribute value as a weight for each map unit and is given as:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} f_i y_i}{\sum_{i=1}^{n} f_i}$$

where $f_i$ is the value of the attribute variable of a map unit, and $x_i$ and $y_i$ are the geocoordinates of the ith map unit. Spatial mean location can be interpreted as a center of gravity (Griffith 1991). One measure of spatial dispersion, the standard distance, is given as:

$$SD = \left[ \frac{\sum_{i=1}^{n} f_i \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]}{\sum_{i=1}^{n} f_i} \right]^{1/2}$$

The standard distance is measurement-scale specific and can be used to construct contour lines of equidistant deviation around the spatial mean (Griffith 1991).

Spatial median is that point (U, V) minimizing:

$$\sum_{i=1}^{n} f_i \left[ (x_i - U)^2 + (y_i - V)^2 \right]^{1/2}$$

where $\left[ (x_i - U)^2 + (y_i - V)^2 \right]^{1/2}$ is the Euclidean distance separating the areal unit centroid $(x_i, y_i)$ and the spatial median (U, V) (Griffith 1991). Computing the spatial median is a recursive process where at each step T the spatial median is given as $(U_T, V_T)$, where

$$U_T = \frac{\sum_{i=1}^{n} f_i x_i / \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}{\sum_{i=1}^{n} f_i / \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}$$

$$V_T = \frac{\sum_{i=1}^{n} f_i y_i / \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}{\sum_{i=1}^{n} f_i / \left[ (x_i - U_{T-1})^2 + (y_i - V_{T-1})^2 \right]^{1/2}}$$

and the initial value for the spatial median, $(U_0, V_0)$, is the spatial mean. The convergence criterion used is 10^[exponent illegible].

SAMPLE DATA AND THE PROGRAM

As described above, 2 pieces of information are required in order to compute a particular spatial autocorrelation coefficient. These are the attribute variable under consideration and the spatial arrangement of the attribute values. For purposes of the program presented in this paper, spatial arrangement of the attribute variable is in the form of Cartesian coordinates. Thus, three variables are required in order to use this program. The variables are F, X, and Y, where F represents the attribute value of the map unit and X and Y are the associated Cartesian coordinates. The number of observations in the input data set is equal to the number of map units contained in the map. A macro window allows you to select the file containing your data and specify the variables in your data set corresponding to the F, X and Y values.

The program automatically computes the connectivity matrix based on the x-y coordinates, assuming the map units to be square and equal in size. The map units, however, can be arranged in any configuration, with missing map units allowed. Missing attribute values (observations with x-y coordinates present but attribute value missing) are deleted from the data set before analysis. Any observations with missing x or y coordinates are likewise deleted from the data set before analysis.

Following is a sample map containing a set of 8 map units:

    +---+---+---+
    | A | B | C |
    +---+---+---+
    | D | E |
    +---+---+---+
    | F | G | H |
    +---+---+---+

Notice that the right center cell is missing and that the side of any one map unit can be juxtaposed with only one other map unit. Following is the same map with attribute values included:

    +---+---+---+
    | 5 | 4 | 3 |
    +---+---+---+
    | 4 | 3 |
    +---+---+---+
    | 3 | 2 | 1 |
    +---+---+---+

The corresponding connectivity matrix for this map would be:

       A  B  C  D  E  F  G  H
    A  0  1  0  1  0  0  0  0
    B  1  0  1  0  1  0  0  0
    C  0  1  0  0  0  0  0  0
    D  1  0  0  0  1  1  0  0
    E  0  1  0  1  0  0  1  0
    F  0  0  0  1  0  0  1  0
    G  0  0  0  0  1  1  0  1
    H  0  0  0  0  0  0  1  0

Again, notice that the connectivity matrix consists of only 1's and 0's, the diagonal consists only of 0's, and that the matrix is square and symmetric. The corresponding output produced from the program follows:

    x-y coordinates of spatial mean = 1.68  2.24
    standard distance = 1.095445115
    Spatial median value = 25.633507391
    x-y coordinates of spatial median = 1.622  2.279
    Number of iterations to convergence = 13

    First order spatial autocorrelation coefficients with variance
    estimates and tests of significance

    expected value of Moran's I = -0.142857143
    Moran's I coefficient = 0.4508301405
    variance for Moran's I coefficient under assumption of
      random sampling = 0.0679712 and z-value = 2.2771687
    variance for Moran's I coefficient under assumption of
      randomization = 0.0669296 and z-value = 2.2948206
    expected value of Geary ratio = 1
    Geary ratio coefficient = 0.3218390805
    variance for Geary ratio coefficient under assumption of
      random sampling = 0.0850480 and z-value = 2.3254161
    variance for Geary ratio coefficient under assumption of
      randomization = 0.0859774 and z-value = 2.3128134

The Moran coefficient denotes positive spatial autocorrelation since the value is greater than zero. Likewise, Geary's c ratio indicates positive spatial autocorrelation as the value is less than 1. Significance tests of the null hypothesis that spatial autocorrelation is not present are performed by comparing the z statistic with the z distribution. At the .05 significance level, z.05 = 1.96. Since the computed z statistics for both coefficients are greater than 1.96, we conclude that the null hypothesis of no spatial autocorrelation should be rejected.

A second option presented to you via a macro window is whether you want to plot the map, along with some of the pertinent statistics. Since SAS/GRAPH is used to plot the data, you need to place the appropriate GOPTIONS statement(s) at the beginning of the program specifying the graphics device you are plotting to. You will also be prompted to enter an output file specification if you would like a destination different from the default output file. By default, program results are written to an output file called SPACCORR RESULTS (on CMS) or SPACCORR.OUT (on Unix or MS-DOS) by use of a FILENAME statement.
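The first order coefficients in the sample output can be checked independently of SAS. What follows is a minimal Python sketch, not the author's SAS/IML program. It assumes the 8 sample map units sit at unit-spaced integer coordinates (x = 1..3, y = 1..3 with y increasing upward), an assignment that is consistent with the reported spatial mean of (1.68, 2.24). The names `SAMPLE`, `connectivity` and `autocorrelation` are hypothetical. The variance expressions are the standard normality-assumption formulas, which this paper does not print (it refers the reader to Griffith 1987); on the sample data they agree with the variances that the output lists under "random sampling".

```python
from math import sqrt

# Sample map: unit -> (x, y, f). ASSUMPTION: unit-spaced cell centres with
# y increasing upward; the paper does not print these coordinates.
SAMPLE = {
    "A": (1, 3, 5), "B": (2, 3, 4), "C": (3, 3, 3),
    "D": (1, 2, 4), "E": (2, 2, 3),
    "F": (1, 1, 3), "G": (2, 1, 2), "H": (3, 1, 1),
}


def connectivity(units):
    """Return (names, matrix); matrix[i][j] is 1 when square cells i and j
    share a side (centres one unit apart along exactly one axis), else 0."""
    names = sorted(units)
    matrix = []
    for a in names:
        xa, ya, _ = units[a]
        matrix.append([
            1 if abs(xa - units[b][0]) + abs(ya - units[b][1]) == 1 else 0
            for b in names
        ])
    return names, matrix


def autocorrelation(units):
    """Moran's I and Geary's c with normality-based variances and z-values."""
    names, c = connectivity(units)
    f = [units[k][2] for k in names]
    n = len(f)
    mean = sum(f) / n
    dev = [v - mean for v in f]
    ss = sum(d * d for d in dev)                 # sum of squared deviations
    pairs = [(i, j) for i in range(n) for j in range(n)]
    s0 = sum(c[i][j] for i, j in pairs)          # sum of all weights
    moran = n * sum(c[i][j] * dev[i] * dev[j] for i, j in pairs) / (s0 * ss)
    geary = ((n - 1) * sum(c[i][j] * (f[i] - f[j]) ** 2 for i, j in pairs)
             / (2 * s0 * ss))
    # Auxiliary sums used by the normality-assumption variance formulas.
    s1 = 0.5 * sum((c[i][j] + c[j][i]) ** 2 for i, j in pairs)
    s2 = sum((sum(c[i]) + sum(row[i] for row in c)) ** 2 for i in range(n))
    e_moran = -1 / (n - 1)
    var_moran = ((n * n * s1 - n * s2 + 3 * s0 ** 2)
                 / ((n * n - 1) * s0 ** 2)) - e_moran ** 2
    var_geary = (((2 * s1 + s2) * (n - 1) - 4 * s0 ** 2)
                 / (2 * (n + 1) * s0 ** 2))
    return {
        "moran": moran, "e_moran": e_moran, "var_moran": var_moran,
        "z_moran": (moran - e_moran) / sqrt(var_moran),
        "geary": geary, "var_geary": var_geary,
        # Negative when c < 1; the sample output lists the same magnitude.
        "z_geary": (geary - 1) / sqrt(var_geary),
    }


if __name__ == "__main__":
    for key, value in autocorrelation(SAMPLE).items():
        print(f"{key:10s} {value: .7f}")
```

The adjacency test relies on integer coordinates; for real-valued coordinates a tolerance on the centre-to-centre distance would be needed. The randomization variances are not covered by this sketch.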
The output file includes Moran's I and its test of significance, Geary's c and its test of significance, x-y coordinates and values of the spatial mean and spatial median, and the standard distance. If desired, the connectivity matrix can be output to a file called SPACCORR LISTING (or SPACCORR.LIS). The program can be run from either the SAS Display Manager or in batch mode. The program may be obtained via snail mail from the author or electronically by sending email to [email protected].

REFERENCES

Griffith, D.A. 1991. Statistical Analysis for Geographers. Prentice Hall, Englewood Cliffs, N.J., 478 pp.

Griffith, D.A. 1987. Spatial Autocorrelation: A Primer. Association of American Geographers, Washington, D.C., 75 pp.

Griffith, D.A. 1988. Advanced Spatial Statistics. Advanced Studies in Theoretical and Applied Econometrics. Kluwer Academic Publishers, Norwell, MA, 273 pp.

Figure. Plot of geo-referenced coordinates with associated attribute values, spatial mean and spatial median of attribute values. The horizontal axis shows the horizontal geo-reference coordinates; plot symbols mark the spatial mean and the spatial median. z-values of spatial autocorrelation estimators are for the randomization perspective.
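The spatial mean, standard distance and spatial median that the program reports and plots can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's SAS/IML code. The coordinates in `POINTS` are assumed (unit-spaced integers, y increasing upward), the function names are hypothetical, and the tolerance `tol` is a placeholder, since it is not the program's own convergence criterion. The iteration starts from the spatial mean and re-weights each unit by its attribute value divided by its distance from the current estimate.

```python
from math import hypot, sqrt

# (x, y, f) for sample map units A..H. ASSUMPTION: unit-spaced integer
# coordinates with y increasing upward.
POINTS = [
    (1, 3, 5), (2, 3, 4), (3, 3, 3),
    (1, 2, 4), (2, 2, 3),
    (1, 1, 3), (2, 1, 2), (3, 1, 1),
]


def spatial_mean(points):
    """Attribute-weighted mean of the x and y coordinates."""
    total = sum(f for _, _, f in points)
    return (sum(f * x for x, _, f in points) / total,
            sum(f * y for _, y, f in points) / total)


def standard_distance(points):
    """Square root of the attribute-weighted mean squared distance
    from the spatial mean."""
    mx, my = spatial_mean(points)
    total = sum(f for _, _, f in points)
    return sqrt(sum(f * ((x - mx) ** 2 + (y - my) ** 2)
                    for x, y, f in points) / total)


def weighted_distance_sum(points, u, v):
    """Objective minimized by the spatial median: sum of f_i * d_i."""
    return sum(f * hypot(x - u, y - v) for x, y, f in points)


def spatial_median(points, tol=1e-8, max_iter=1000):
    """Iterate from the spatial mean until the estimate moves less than tol.
    Returns ((u, v), number of iterations performed)."""
    u, v = spatial_mean(points)
    steps = 0
    for steps in range(1, max_iter + 1):
        num_u = num_v = den = 0.0
        for x, y, f in points:
            # Guard against division by zero if the estimate lands on a unit.
            d = max(hypot(x - u, y - v), 1e-12)
            num_u += f * x / d
            num_v += f * y / d
            den += f / d
        new_u, new_v = num_u / den, num_v / den
        shift = hypot(new_u - u, new_v - v)
        u, v = new_u, new_v
        if shift < tol:
            break
    return (u, v), steps


if __name__ == "__main__":
    print("spatial mean     ", spatial_mean(POINTS))
    print("standard distance", standard_distance(POINTS))
    (u, v), steps = spatial_median(POINTS)
    print("spatial median   ", (u, v), "after", steps, "iterations")
    print("median objective ", weighted_distance_sum(POINTS, u, v))
```

Because the tolerance here is an assumption, the iteration count will generally differ from the 13 iterations reported in the sample output, even though the converged location should agree closely.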