Spatial Databases
First law of geography [Tobler]: Everything is related to everything, but nearby things are more related than distant things.
Lecture 8: Spatial Statistics, Autocorrelation & Geographically Weighted Regression
Pat Browne

Correlation
The correlation coefficient measures the degree of linear relationship between two variables, X and Y, that is, the strength of the relationship in the data. The correlation coefficient ranges from -1 to +1. Correlation does not mean that one thing causes the other; there could be other reasons the data are highly correlated. Unlike regression (discussed later), correlation also does not single out one variable as the response.

Regression
• Regression takes a numerical dataset and develops a mathematical formula that fits the data.
• The results can be used to predict future behaviour.
• Works well with continuous quantitative data such as weight, speed or age.
• Not good for categorical data where order is not significant, such as colour, name, gender, nest/no nest.
• Example: plotting snowfall against height above sea level.

Standard statistical concepts: Regression
Y = A + BX
The response variable is Y, and X is the continuous explanatory variable. Parameter A is the intercept. Parameter B is the slope coefficient. The difference between each data point and the value predicted by the line (the model) is called a residual.

Regression
[Figure: Y plotted against X], where $\bar{X}$ and $\bar{Y}$ are the means of X and Y.

Alternative terminology for the linear regression equation: Y = a + bX, where
• Y is the dependent variable
• a is the intercept
• b is the slope or regression coefficient
• X is the independent variable

Regression Model in R (see Lab)
• Moving the line to get a best fit
• Changing the slope of the line to get a best fit
R can calculate the maximum likelihood estimate of the intercept and slope, giving: y = 4.8 + (0.6 * x)

Local Versus Global Statistics
From “Geographically Weighted Regression” by Fotheringham, Brunsdon, Charlton

The ecological fallacy and the modifiable areal unit problem
From “Spatial data analysis” by Christopher D. Lloyd
We often need to use spatially aggregated data, for example census zones or cells in remotely sensed images. Such zones are unlikely to be internally homogeneous. A cell in a remotely sensed image has only one value, but in the real world there may be several features in the area covered by the cell. The variation within an area is lost if the area is larger than the individual features it contains.

Ecological fallacy / Modifiable areal unit problem (MAUP)
From “Spatial data analysis” by Christopher D. Lloyd
The ecological fallacy refers to the problem of making inferences about individuals from aggregate data. For example, not all people in one census zone are likely to share the same characteristics. The majority of people in a census zone may be wealthy, but if there is a high-density housing estate just inside one edge of the zone, then generalizations about the population of the zone may be unsound.

Modifiable areal unit problem (MAUP)
From “Spatial data analysis” by Christopher D. Lloyd
• The MAUP is composed of two parts:
– The scale effect: statistical analyses based on data aggregated over areas of different sizes will produce different results.
– The zoning effect: two sets of zones can have the same or similar areas but very different forms, and analyses based on two such sets of zones may vary.

Moving Window
From “Spatial data analysis” by Christopher D. Lloyd
Moving windows (MW) map how values change from place to place. MW are used in many contexts, including finding the local gradient of the terrain.

Spatial autocorrelation
• Spatial autocorrelation (SA) is the degree of correlation between neighbouring values of some property of a region (e.g. population).
SA occurs when the value of a variable at a location is correlated with values of the same variable in the neighbourhood. SA is measured with Moran’s I.
• Moran’s I measures the average correlation between the value of a variable at one location and the value at nearby locations. The essential idea is to specify pairs of locations that influence each other, along with the relative intensity of interaction. Moran’s I provides a global view of spatial autocorrelation.

Moran’s I
• The range of the Moran's I statistic depends on the spatial weight matrix.
• When Moran's I is scaled by its bounds, the statistic is restricted to the range ±1.
• Moran’s I can serve as a tool for modelling spatial dependencies in many data mining techniques.

Same Mean and SD but different Moran’s I

Spatial Autocorrelation: Moran’s I example
Figure 7.5, pp. 190
• The pixel value sets in (b) and (c) are the same, but their Moran's I values are different.
• Q: Which dataset, (b) or (c), has the higher spatial autocorrelation?

Neighbours
Immediate neighbours can be considered using either the rook's case or the queen's case. The neighbour relation can be weighted with simple adjacency or with more complex calculations, such as boundary length.

Geographical Weights
• Binary: rook or queen neighbours
• Distance based
• Boundary or perimeter based
• Weights can be row-normalized using the number of adjacent cells

Neighbourhood relationship contiguity matrix

Spatial Lag Example
[Figure: sample grid of regions, with region IDs at top left and values in the centre of each cell]
• Spatial lag = sum of spatially weighted values of neighbouring cells
• Lag for cell 2 = 1/3(7) + 1/3(5) + 1/3(4) = 5.3

Spatial Lag
• Map 1 and Map 2 represent a set of rainfall readings for regions labelled A to I. For both maps the mean is 10, and the standard deviation is 3.8.
• Lag for E in Map 1 = (6+7+13+14)/4 = 10
• Lag for E in Map 2 = (7+8+6+5)/4 = 6.5
• In Map 1 the lag equals the value at E; in Map 2 the lag is less than the value at E. Hence E is more like its neighbours in Map 1 than in Map 2 (rook's case).

Spatial autocorrelation
[Figure: range of spatial autocorrelation, from negative (dispersed) through spatial independence to positive (spatial clustering)]

Moran’s I
• Global Moran’s I
– What is the extent of clustering in the total area?
– Is this clustering significantly different from a random spatial distribution?
• Local Moran’s I
– Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?
– Are these local clusters and spatial outliers statistically significant?
• Local Moran is a special case of local indicators of spatial association (LISA)

Moran Scatter Plot
Scatter diagram between X and Lag-X, the “spatial lag” of X formed by averaging all the values of X for the neighbouring polygons. It identifies which type of spatial autocorrelation exists:
• High/High: positive SA
• Low/Low: positive SA
• Low/High: negative SA
• High/Low: negative SA
(Briggs, Henan University, 2010)

Moran’s I index

Spatial Autocorrelation: Case Study
[Figure panels: Nest locations; Distance to open water; Vegetation durability; Water depth]

Spatial Autocorrelation
Classical statistical assumptions (i.i.d.) do not hold for spatially dependent data.

Moran’s I - example
• Moran's I statistic for map 1 is 0.55316092
• Moran's I statistic for map 2 is -0.76724138

Spatial Autocorrelation: Moran Scatterplot Map
[Figure: Moran scatterplot for the old-aged population of São Paulo, with z on the x-axis and Wz on the y-axis; quadrants Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH]

Moran’s I: A measure of spatial autocorrelation
• Given $x = \{x_1, \ldots, x_n\}$ sampled over n locations, Moran's I is defined as
  $I = \dfrac{z W z^{t}}{z z^{t}}$
  where $z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$ and W is a normalized contiguity matrix.
Fig. 7.5, pp. 190

How to decide the weight wij?
The weight indicates the spatial interaction between entities.
1) Binary wij, also called absolute adjacency. Covers the general case, answering the question: is a value in a region similar to or different from its neighbours?
wij = 1 if two geographic entities are adjacent; otherwise, wij = 0. The adjacency definition can be the queen's case (8 neighbours) or the rook's case (4 neighbours).
2) The distance between geographic entities. Often the inverse distance is used: farther objects get less weight and nearer objects get more weight, e.g. distance from the centre of an epidemic. wij = f(dist(i,j)), where dist(i,j) is the distance between i and j.
3) The length of common boundary for area entities, e.g. policing borders: shorter borders get less weight. wij = f(leng(i,j)), where leng(i,j) is the length of the common boundary between i and j.

How to decide the weight wij? [1]
The choice of weights should ultimately be driven by a rationale for including those areas as neighbors that have a spatial effect on a given location. This rationale can be derived from theory or be the result of using exploratory spatial data analysis (ESDA) to experiment with different weights and connectivity orders. Since weights matrices are used to create spatial lags that average neighboring values, the choice of a weights matrix will determine which neighboring values will be averaged. For instance, since rook weights will usually have fewer neighbors than queen weights, on average, each neighboring observation has more influence.

The question of which weights to choose is more pertinent in the context of modeling than ESDA, since modeling is based on substantive notions of spatial effects while ESDA prioritizes the rejection of spatial randomness. Therefore, if there are no substantive reasons to guide the choice of weights in ESDA, using a weights file with as few neighbors as possible (such as rook) makes sense. Especially with irregular areal units (as opposed to grids), the difference between rook and queen weights is often minimal. However, it is advisable to test how sensitive your results are to your weights specification by comparing multiple weights matrices.
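
The spatial lag and Moran's I calculations described in these slides can be sketched in a few lines of plain Python (the lab itself uses R). This is only an illustrative sketch under stated assumptions: the function names and the two 3×3 grids are hypothetical, the weights are binary rook adjacency, and they are row-normalized so that each lag is the mean of the neighbouring values. With row-normalized weights, Moran's I reduces to z W z^t / z z^t.

```python
def rook_neighbours(rows, cols):
    """Map each cell index to the indices of its rook (edge-sharing) neighbours."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def spatial_lag(values, nbrs):
    """Row-normalized spatial lag: the mean of each cell's neighbouring values."""
    return [sum(values[j] for j in nbrs[i]) / len(nbrs[i])
            for i in range(len(values))]


def morans_i(values, nbrs):
    """Moran's I = z W z^t / z z^t, with W the row-normalized rook weights."""
    mean = sum(values) / len(values)
    z = [v - mean for v in values]          # deviations from the mean
    lag_z = spatial_lag(z, nbrs)            # W z^t
    return sum(a * b for a, b in zip(z, lag_z)) / sum(a * a for a in z)


# Hypothetical 3x3 grids (row by row) holding the same nine values,
# so they share the same mean and standard deviation.
nbrs = rook_neighbours(3, 3)
gradient = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # similar values sit together
dispersed = [9, 1, 8, 2, 5, 3, 7, 4, 6]    # high and low values alternate

print(round(morans_i(gradient, nbrs), 2))   # positive, about 0.56
print(round(morans_i(dispersed, nbrs), 2))  # negative, about -0.79
```

Swapping `rook_neighbours` for a queen's case or distance-based neighbour function, and comparing the resulting values, is one way to check how sensitive a result is to the weights specification.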
Spatial Outlier Detection
• Global outliers are observations that appear inconsistent with the remainder of the data set.
• Global outliers deviate so much from other observations that they may have been generated by a different mechanism.
• Spatial outliers are observations that appear inconsistent with their neighbours.
• Detecting spatial outliers has important applications in transportation, ecology, public safety, public health, climatology and location based services.
• Geographic objects have a spatial component (location, shape, metric & topological properties) and a non-spatial component (house owner, sensor id., soil type).
• Spatial neighbourhoods may be defined using spatial attributes & spatial relations.
• Comparisons between spatially referenced objects can be based on non-spatial attributes.
• A spatial outlier is a spatially referenced object whose non-spatial attribute values differ from those of other spatially referenced objects in its spatial neighbourhood.
• The upper left & lower right quadrants of figure 7.17 indicate a spatial association of dissimilar values: low values surrounded by high value neighbours (P & Q) and high values surrounded by low values (S).
• A Moran outlier is a point located in the upper left or lower right quadrant of a Moran scatter plot.
[Figure: Moran scatter plot with the values at a given location (z) on the x-axis and Wz on the y-axis; quadrants Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH]

Model Evaluation
• Consider the two-class classification problem ‘nest’ or ‘no-nest’. The four possible outcomes (or predictions) are shown on the next slide.
The desired predictions are:
– 1) where the model says there should be a nest and there is an actual nest (True Positive)
– 2) where the model says there is no nest and there is no nest (True Negative)
• The other outcomes (False Positive and False Negative) are not desirable and point to a flaw in the model.

Model Evaluation
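
As a rough illustration of the four outcomes for the ‘nest’ / ‘no-nest’ problem, the following Python sketch tallies them from paired actual and predicted labels. The function name and the six sample observations are hypothetical and are not taken from the case study data.

```python
def confusion_counts(actual, predicted):
    """Tally the four outcomes of a two-class (nest / no-nest) prediction."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for is_nest, says_nest in zip(actual, predicted):
        if says_nest and is_nest:
            counts["TP"] += 1   # model says nest, and there is a nest
        elif not says_nest and not is_nest:
            counts["TN"] += 1   # model says no nest, and there is none
        elif says_nest:
            counts["FP"] += 1   # model says nest, but there is none
        else:
            counts["FN"] += 1   # model says no nest, but there is one
    return counts


# Hypothetical observations: True means 'nest', False means 'no-nest'.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts)
print(accuracy)
```

Accuracy here is simply the share of desired outcomes (true positives plus true negatives) among all predictions.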