Answer Key to Problem Set #7
Geog 3000: Advanced Geographic Statistics
Instructor: Dr. Paul C. Sutton

Problem Set Number 7 focuses on descriptive spatial statistics and spatial interpolation methods. These problems draw from information in Chapters 3 and 4 of the McGrew & Monroe text and from Chapter 16 of the Chang Intro GIS text I have included in the zip file for this problem set. Kriging and other spatial interpolation techniques are mathematically complex (for most of us, anyway). These exercises focus on conceptual rather than computational understanding. You can teach a whole graduate course on kriging and spatial interpolation; we are skimming over some of the important basic concepts. For most of us, these computations take place with the click of a button. What I am trying to get across with these exercises is an understanding of what these computations are doing and how to interpret their results. Good luck!

Read all about Pirates and Global Warming at the Church of the Flying Spaghetti Monster (http://www.venganza.org/about/open-letter/). Is correlation causality?

#1) What is a Coefficient of Variation anyway?

Suppose the length of Pine Beetles is distributed normally N(1 cm, 0.1 cm) [that's mean and standard deviation, BTW]. Also assume that garter snakes are distributed normally N(60 cm, 3 cm). Which of these two creatures has greater variability of length? How can you appropriately compare apples and oranges? What statistics would you use to demonstrate this?

The standard deviation of the Pine Beetle is 10% (0.1 cm / 1 cm) of the average length of a Pine Beetle. The standard deviation of the garter snake is only 5% (3 cm / 60 cm) of its average length. By this measure (the coefficient of variation), the Pine Beetle has greater variability of length, even though the snake's standard deviation is 30 times larger in absolute terms. Wikipedia gives a pretty good definition: In probability theory and statistics, the coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution.
It is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$:

$c_v = \frac{\sigma}{\mu}$

This is only defined for non-zero mean, and is most useful for variables that are always positive. It is also known as unitized risk or the variation coefficient. The coefficient of variation should only be computed for data measured on a ratio scale. As an example, if a group of temperatures are analyzed, the standard deviation does not depend on whether the Kelvin or Celsius scale is used. However, the mean temperature of the data set would differ in each scale and thus the coefficient of variation would differ. So the coefficient of variation does not have any meaning for data on an interval scale.[1]

#2) Descriptive Spatial Statistics

Consider the following point locations in the scatterplot below. Note: the coordinates of these points are provided in the table on the right.

A) What is the mean center of these points?
Xmc = 3.86, Ymc = 3.5

B) What is the weighted mean center of these points if you use their Z value as the weight?
Xwmc = 3.493, Ywmc = 3.145

C) How can the mean center be interpreted?
The mean center is the location that minimizes the sum of squared distances to each point.

D) How can the weighted mean center be interpreted?
The weighted mean center minimizes the sum of (squared distance to each point times that point's respective z-value). The effect is to move the mean center closer to more heavily weighted points.

#3) Emergency Services for Prairie Dogs

Suppose you have been charged with providing ambulance service to seven prairie dogs scattered on a prairie called isotropia. The locations of these prairie dogs are given by the following table of coordinates:

X    Y    Prairie Dog
1    1    Fred
3    7    Barney
9    11   Wilma
4    2    Betty
8    8    Sam
2    10   Elma
9    4    Bob

Where should you locate your prairie dog ambulance if you wanted to minimize the average distance you would have to drive to get to these prairie dog locations? (NOTE: provide a verbal descriptive answer here, not a numerical one.) Assume all seven prairie dogs have equal probability of needing ambulance services and that on the prairie of 'isotropia' the shortest distance (i.e., the fastest prairie dog ambulance ride) between two points is a straight line. How does one go about obtaining a numerical answer to this question? How is this problem different from the spatial mean problems you explored in question #2?

You should locate your prairie dog ambulance at the Euclidean median of these prairie dog locations. The Euclidean median differs from the spatial mean in that it minimizes the sum of unsquared rather than squared distances to each of the seven prairie dogs. See this section in the McGrew & Monroe text (p. 55):

"For many geographic applications, another measure of 'center' is more useful than the mean center. Often, it is more practical to determine the central location that minimizes the sum of unsquared, rather than squared, distances. This location, which minimizes the sum of Euclidean distances from all other points in a spatial distribution to that central location, is called the Euclidean median, (Xe, Ye), or median center. Mathematically, this location minimizes the sum:

$\sum_{i=1}^{n} \sqrt{(X_i - X_e)^2 + (Y_i - Y_e)^2} \qquad (4.6)$

Unfortunately, determining coordinates of the Euclidean median is methodologically complex. Computer-based iterative algorithms (step by step procedures) must be used to reach a solution. These algorithms evaluate a sequence of possible coordinates and gradually converge on the best location for the Euclidean median."

#4) The First Law of Geography

Waldo Tobler is often credited with positing the First Law of Geography (aka Tobler's Law), which states: "Everything is related to everything else, but near things are more related than distant things." It can be argued that the first law is merely a qualitative statement concerning the nature of spatial autocorrelation.
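For question #3, the iterative approach described in the quotation can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's procedure: it assumes Weiszfeld's algorithm (a standard iterative method for the Euclidean median), starts from the mean center, and uses the prairie dog coordinates from the table in question #3. The helper names (`mean_center`, `euclidean_median`, `total_distance`) are illustrative, and the optional `weights` argument gives the weighted mean center from question #2.

```python
import math

# Prairie dog coordinates (x, y, name) from the table in question #3.
PRAIRIE_DOGS = [
    (1, 1, "Fred"), (3, 7, "Barney"), (9, 11, "Wilma"), (4, 2, "Betty"),
    (8, 8, "Sam"), (2, 10, "Elma"), (9, 4, "Bob"),
]


def mean_center(points, weights=None):
    """(Weighted) mean center: minimizes the (weighted) sum of squared distances."""
    if weights is None:
        weights = [1] * len(points)
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y


def total_distance(center, points):
    """Sum of unsquared Euclidean distances from center to every point."""
    cx, cy = center
    return sum(math.hypot(px - cx, py - cy) for px, py in points)


def euclidean_median(points, tol=1e-9, max_iter=10_000):
    """Approximate the Euclidean median with Weiszfeld's iterative algorithm."""
    x, y = mean_center(points)  # start the iteration at the mean center
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d == 0.0:
                # Plain Weiszfeld is undefined on a data point; this sketch just stops.
                return x, y
            num_x += px / d  # each point is weighted by 1 / distance
            num_y += py / d
            denom += 1.0 / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:  # converged
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


if __name__ == "__main__":
    pts = [(x, y) for x, y, _ in PRAIRIE_DOGS]
    for label, c in (("mean center", mean_center(pts)), ("Euclidean median", euclidean_median(pts))):
        print(f"{label:17s} ({c[0]:.3f}, {c[1]:.3f})  total distance {total_distance(c, pts):.3f}")
```

Comparing `total_distance` for the two centers shows the difference between minimizing squared and unsquared distances: the Euclidean median never has a larger total distance than the mean center.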
Autocorrelation is typically described as the self-similarity of phenomena in spatial or temporal domains. The Dow Jones Industrial Average (DJIA) can be used to describe temporal autocorrelation: if the DJIA closed at 10,000 yesterday, it is more likely to close at or near 10,000 today than a year from now. Rainfall is a good way to exemplify spatial autocorrelation: if it is raining at my house, it is more likely to be raining at my neighbor's house than across town. Human beings seem to have a natural or innate understanding of spatial and temporal autocorrelation (to some extent at least). If I show my 10-year-old son the graph on the right and tell him to guess the temperature at the location marked with an 'X' based on the other temperature measurements, he will do some sort of mental spatial interpolation in which nearer values have more 'weight' than distant values (he guessed 71).

[Figure: "Bivariate Fit of Y By x" scatterplot, x and y axes from 1 to 10, showing temperature measurements of 89, 110, and 65 around an unknown location marked 'X'.]

Chapter 16 from the Chang text that accompanies this problem set does a good job of describing numerous formal mathematical means for performing spatial interpolation, including Inverse Distance Weighting (IDW), splines, polynomial trend surface curve fitting, and kriging. Kriging actually involves mathematically characterizing spatial autocorrelation by fitting a curve to a variogram. A variogram characterizes variance (or its counterpart, correlation, which falls as variance rises) as a function of distance. For details on these methods read the file: Ch16introGISkangtsungChang.pdf. Answer the questions on the following pages that are associated with spatial interpolation.

A) Inverse Distance Weighting is a simple spatial interpolation method. Global methods use all the available control points to estimate a value at an unknown location. Local methods use all the points within a fixed distance, or the nearest 'N' neighbors, or the nearest 'N' neighbors within a given distance.
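A minimal Python sketch of IDW with the global and local variants described in part A follows. The function name `idw` and its `radius` and `k` parameters are hypothetical; `power` plays the role of 'exp'. The control points in the usage example are invented, because the five points in the figure for part A are not reproduced in the text.

```python
import math


def idw(points, target, power=1.0, radius=None, k=None):
    """Inverse distance weighted estimate at `target`.

    points: list of (x, y, z) control points
    power:  exponent applied to distance ('exp' in the question)
    radius: if given, keep only points within this distance (inclusive)
    k:      if given, keep only the k nearest of the remaining points
    """
    tx, ty = target
    # (distance, z) pairs sorted from nearest to farthest
    dz = sorted((math.hypot(x - tx, y - ty), z) for x, y, z in points)
    if radius is not None:
        dz = [(d, z) for d, z in dz if d <= radius]
    if k is not None:
        dz = dz[:k]
    if not dz:
        raise ValueError("no control points selected")
    if dz[0][0] == 0.0:
        return dz[0][1]  # target sits exactly on a control point
    weights = [1.0 / d ** power for d, _ in dz]
    return sum(w * z for w, (_, z) in zip(weights, dz)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical control points (x, y, z) and target; not the original figure's data.
    control = [(1, 9, 40.0), (4, 3, 55.0), (8, 9, 30.0), (6, 2, 60.0), (9, 6, 35.0)]
    x0 = (5, 5)
    print("global, exp=1:", round(idw(control, x0, power=1), 2))
    print("global, exp=2:", round(idw(control, x0, power=2), 2))
    print("local, radius 5.2:", round(idw(control, x0, power=1, radius=5.2), 2))
    print("local, 3 nearest:", round(idw(control, x0, power=1, k=3), 2))
```

Applying the radius filter before the nearest-neighbor cut gives the "nearest 'N' neighbors within a given distance" variant when both arguments are supplied.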
Given the following points, provide an estimate of the value of Z0 for the location denoted by 'X' using: 1) Global IDW with exp = 1 and exp = 2; 2) Local IDW with exp = 1 using an inclusive radius of 5.2; 3) Local IDW with exp = 1 using the three nearest neighbors.

[Figure: "Bivariate Fit of Y By x" scatterplot of the five control points and the unknown location 'X'.]

Global uses all 5 points. Local with radius 5.2 uses the 2nd and 4th points only. Local with three nearest neighbors uses the 1st, 2nd, and 4th points. If exp = 1, use simple distance; if exp = 2, use distance squared.

B) Given the image (raster/grid) below with only 6 known values, use IDW to fill in all the blanks. You would be well advised to think like a programmer and do this in Excel. Use inverse distance squared and go with a global approach. Draw the interpolated image on the right (i.e., fill in the blanks in the grid).

The Z estimates for the points in the grid above are pasted in on the right. The BOLDED known points have been estimated via a method known as cross-validation (CV). Essentially, you simply remove the known point and use the other known points to estimate the point you removed. The point at (x=1, y=4, z=1) had a CV estimate of 5.15, an error of over 400%. Below is an Excel spreadsheet I used to calculate all these values. Even for a dataset this tiny and simple, I think it is clear why we all believe that computers and computer programmers are GOOD.

C) How good do you think your IDW estimates of the empty cells in question 'B' are? How might you go about characterizing your 'skill' of estimation? How do you think the 'skill' (aka accuracy) of estimation will vary as a function of distance from known values?

The example point at X=1, Y=4, Z=1 had an estimated Z of 5.15 when you 'pretended' that you did not know its actual value was 1. That was a pretty bad estimate. A summary of the rest of the errors for the known values is in this table: In this case the average percentage error was over 100% of the actual value. This is not very good.
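Part B suggests doing the grid in Excel; the same global inverse-distance-squared fill can be sketched in Python. The 5 x 5 grid size and five of the six known cells below are hypothetical placeholders, since the original grid is not reproduced in the text; only the cell at x=1, y=4 with z=1 comes from the answer above. The names `idw2_global` and `fill_grid` are illustrative.

```python
# Hypothetical known cells {(x, y): z}; only (1, 4): 1.0 is taken from the text.
KNOWN = {(1, 4): 1.0, (2, 1): 8.0, (3, 3): 5.0, (4, 5): 2.0, (5, 2): 9.0, (5, 5): 3.0}


def idw2_global(known, x, y):
    """Global inverse-distance-squared estimate for cell (x, y)."""
    if (x, y) in known:
        return known[(x, y)]  # known cells keep their value
    num = den = 0.0
    for (kx, ky), z in known.items():
        w = 1.0 / ((kx - x) ** 2 + (ky - y) ** 2)  # 1 / d^2, so no square root is needed
        num += w * z
        den += w
    return num / den


def fill_grid(known, ncols, nrows):
    """Estimate every cell; rows are returned top (y = nrows) to bottom (y = 1)."""
    return [[idw2_global(known, x, y) for x in range(1, ncols + 1)]
            for y in range(nrows, 0, -1)]


if __name__ == "__main__":
    for row in fill_grid(KNOWN, ncols=5, nrows=5):
        print(" ".join(f"{v:5.2f}" for v in row))
```

Because every estimate is a weighted average of the known values, no filled cell can fall outside the range of the known values, which is one reason IDW surfaces look "flattened" between data points.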
Error will probably be a function of: 1) the natural variability of the phenomenon being measured, 2) the distance of known points from the location to be estimated, and 3) the number of known points and their distribution in space.

#5) The Variogram: Characterizing Spatial Autocorrelation

Imagine 1,000 rainfall gauges scattered throughout the conterminous United States. On April 15th, 2004 you assemble all the rainfall measurements at these 1,000 stations. You have a table that looks like this:

A) Describe in your own words how you would produce a variogram from the data above.

Building a variogram is called a 'simple' process by some. It took me a while to 'grok' this 'simple' procedure. I will try to explain it so it seems 'simple'. Here we go: First of all, you have a huge table of points in space with Z values. Now you have to create a table that has the distances between all possible 'pairs' of points (e.g., Bodip to Bumwallop, Bodip to Goofusville, Bodip to Teaneck, etc., AND Oxnard to Bumwallop, Oxnard to Goofusville, Oxnard to Teaneck, etc.). In other words, if you have a table of ONLY ten locations, you have a 'distance table' with 9+8+7+6+5+4+3+2+1 = 45 distances (90 if you list each pair in both directions, Bodip to Bumwallop as well as Bumwallop to Bodip, although both directions give the same distance and the same difference in Z). Ten locations making 45 distances might not seem so bad, but the count grows quadratically with the number of locations: n locations give n(n - 1)/2 pairs, so 1,000 rain gauges give 499,500 pairs. This 'distance table' can get pretty big pretty fast. So imagine this table and imagine sorting it on the 'Distance' column (see figure below; there may be logical inconsistencies in the tables below that you might be able to identify with multi-dimensional scaling techniques, but go with it, as it should work as an example for this purpose):

Sorting the above table on 'Dist A to B' gives something like the table above. Of course, there will be more records in between all these distance values.
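The 'distance table' step can be sketched in Python with `itertools.combinations`, which yields each unordered pair exactly once, so ten stations give 45 rows. The station names reuse the made-up towns from the text, but their coordinates and rainfall values are invented for illustration; `distance_table` is a hypothetical helper.

```python
import math
from itertools import combinations


def distance_table(stations):
    """Every unordered pair of stations as (distance, name_a, name_b, squared Z difference),
    sorted on distance ('Dist A to B').

    stations: dict mapping name -> (x, y, z)
    """
    rows = []
    for (a, (xa, ya, za)), (b, (xb, yb, zb)) in combinations(stations.items(), 2):
        rows.append((math.hypot(xa - xb, ya - yb), a, b, (za - zb) ** 2))
    rows.sort(key=lambda row: row[0])
    return rows


if __name__ == "__main__":
    # Invented coordinates (km) and rainfall (mm) for the made-up towns in the text.
    stations = {
        "Bodip": (0.0, 0.0, 12.0), "Bumwallop": (40.0, 30.0, 9.0),
        "Goofusville": (150.0, 20.0, 2.0), "Teaneck": (60.0, 140.0, 0.0),
        "Oxnard": (10.0, 35.0, 11.0),
    }
    table = distance_table(stations)
    print(len(table), "pairs")  # n(n - 1)/2 = 10 for 5 stations
    for dist, a, b, sq_diff in table:
        print(f"{a:>11s} to {b:<11s} {dist:7.1f} km  (dZ)^2 = {sq_diff:5.1f}")
```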
Now you do something similar to 'binning' for a histogram. You have sorted by the distance between points, and you 'bin' on that column (e.g., all the 'pairs of points' that are from 141 to 150 units apart). Let's say that bin (141 to 150) has 30 pairs of points in it. For those 30 pairs you can calculate the mean and the variance of the difference between their Z values; the semivariance for the bin is half the average of the squared differences. (Also, if you wanted to create a correlogram, you could calculate the correlation (R) between the Z values for each of these paired points.) In any case: lather, rinse, repeat for paired points that have distances of 131 to 140, 121 to 130, 111 to 120, ... 0 to 10 (or whatever bin size you choose). You should now be able to build a table that looks like this (variance, semivariance, or correlation as a function of distance):

Note: Variance typically increases with distance, whereas correlation decreases with increasing distance. With the information in the table above, you can plot variance or correlation as a function of distance, which in this case will be the bin center. Once this is done, you get to 'fit' that curve with a line, a Gaussian bell-shaped curve, or other curve forms. This is how you characterize the self-similarity of spatially distributed numbers. This fitted curve becomes a look-up table (i.e., you use a calculated distance to 'look up' a variance or correlation) used in the spatial interpolation technique known as kriging.

B) Explain what a variogram and/or correlogram is and how it is used in the spatial interpolation process known as kriging.

A variogram (or correlogram) is a device that characterizes spatial autocorrelation. It is, in essence, a quantification of Tobler's law. Kriging assumes that certain spatially variable phenomena (ore grade, for example; kriging was developed by mining and geologic engineers) have three components of variability: 1) spatial autocorrelation, 2) large overall trends, and 3) random variation.
The variogram characterizes the first (spatial autocorrelation) to improve our ability to predict unknown quantities in space.

C) Draw a generic variogram for a spatially autocorrelated phenomenon and label the 'Range', 'Nugget', and 'Sill'. Provide a conceptual explanation of these terms using a specific example of a spatially autocorrelated phenomenon such as temperature.

The nugget is the 'natural variation' of the phenomenon that can occur at zero distance. Imagine a room at constant temperature whose temperature you measured over and over again. If you got a mean of 72 degrees with a variance of 0.02 (square degrees), your nugget would be 0.02 (or the reliability of your thermometer might be suspect).

Figure taken from a paper posted to the discussion board at this URL: http://www.iasri.res.in/ebook/EBADAT/6-Other%20Useful%20Techniques/11-Spatial%20STATISTICAL%20TECHNIQUES.pdf

The sill represents the variability of the phenomenon in the aggregate. If you measured the temperature at 1,000 random points around the globe and got a mean of 72 degrees with a standard deviation of 15 degrees, then your sill would be about 15 squared, or 225 square degrees (the sill is a variance, not a standard deviation).

The range represents the distance over which knowing a value can inform your estimate at an unknown location. For example, if you are trying to guess the temperature of Bodip, Kansas, and your variogram of temperature has a range of 300 km, then you would need a known temperature within 300 km of Bodip in order to have a spatially informed estimate of the temperature. If you don't have any known points within the range of your variogram, you might as well simply guess the mean of the phenomenon you are trying to estimate.

D) Suppose you generated an artificial 'dataset' using 'X' coordinates drawn from a Uniform(0, 100) random variable, 'Y' coordinates from a Uniform(0, 100) random variable, and 'Z' values for these points from a Normal(100, 15) random variable. What would a variogram for this dataset look like? Draw one.
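Question D can also be checked by simulation. The sketch below draws the artificial dataset as described, then bins all point pairs by distance (as in part A) and computes the semivariance in each bin as half the mean squared Z difference. It assumes 400 points, 10-unit bins up to a distance of 100, and a fixed random seed, none of which come from the question; `empirical_semivariogram` and `simulate_question_d` are hypothetical names. Because the Z values are independent of location, every bin should come out near 15 squared (225), the flat line described in the answer to D.

```python
import math
import random
from itertools import combinations


def empirical_semivariogram(points, bin_width, max_dist):
    """Bin unique pairs by distance; return [(bin_center, pair_count, semivariance)].

    Semivariance in a bin = half the mean squared difference of Z over its pairs.
    Pairs at distance >= max_dist are ignored; empty bins are omitted.
    """
    nbins = int(math.ceil(max_dist / bin_width))
    sums = [0.0] * nbins
    counts = [0] * nbins
    for (xa, ya, za), (xb, yb, zb) in combinations(points, 2):
        d = math.hypot(xa - xb, ya - yb)
        if d >= max_dist:
            continue
        i = int(d // bin_width)
        sums[i] += (za - zb) ** 2
        counts[i] += 1
    return [((i + 0.5) * bin_width, counts[i], sums[i] / (2 * counts[i]))
            for i in range(nbins) if counts[i] > 0]


def simulate_question_d(n=400, seed=1):
    """X, Y ~ Uniform(0, 100) and Z ~ Normal(100, 15), all drawn independently."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 100), rng.uniform(0, 100), rng.gauss(100, 15)) for _ in range(n)]


if __name__ == "__main__":
    pts = simulate_question_d()
    for center, count, gamma in empirical_semivariogram(pts, bin_width=10, max_dist=100):
        print(f"{center:5.1f}  pairs={count:6d}  semivariance={gamma:7.1f}")
```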
I'm too lazy to sketch this. Basically, you would have a flat variogram in which the nugget and sill have the same value. It would be a uniformly flat line of (semi)variance as a function of distance at a height of 15 squared (225). If it were a correlogram, it would be R = 0 no matter what distance you choose.

#6) Cross-Validation, Bootstrapping, and Jack-Knifing Oh My!

In question #4-C you were asked how you might go about characterizing the accuracy or 'skill' of a particular spatial interpolation. A family of related resampling techniques, including cross-validation, bootstrapping, and jack-knifing, has been formalized to characterize the accuracy or 'skill' of temporal and spatial interpolations.

A) Explain how cross-validation is performed.
Cross-validation is done by 'removing' each known point (one at a time) and estimating its value with the remaining known points. If you have 'n' points, you will also have 'n' estimates of those points. This is a pretty good way of assessing the skill or accuracy of your interpolation.

B) Explain how cross-validation can be used to characterize the accuracy of a spatial interpolation.
For every known point you estimate its Z value and compare the estimate with the actual value. You can then calculate all the standard statistics for assessing error, such as mean error, mean percentage error, mean absolute deviation, coefficient of variation, etc.

C) Can the results of cross-validation be mapped? Is it possible to map levels or degrees of confidence of interpolation?
Yes, you can map your errors at all known locations and then interpolate your errors to get a map of estimated error. Kriging techniques are particularly good for this purpose, because kriging produces an estimation variance (the kriging variance) at every location it predicts.

D) How could you use cross-validation to compare the accuracy of two different spatial interpolation techniques (e.g., contrasting the skill of an IDW interpolation with the skill of a kriging interpolation of the same dataset)?
Simply do a cross-validation on your data using both IDS (inverse distance squared) and kriging.
It turns out that IDS often outperforms kriging in this benchmark, because the assumption of stationarity that is important to kriging is often not true. Stationarity means that the variogram characterizing spatial autocorrelation is the same everywhere across space. This is rarely true. And geographic data aren't independent either.

E) Conduct a cross-validation on the dataset provided in question #4-B.
This table pretty much sums it up.
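As a sketch of parts D and E, the Python below runs leave-one-out cross-validation and summarizes the errors. Kriging is too long to implement here, so this sketch swaps in a comparison of global IDW with exponent 1 against global IDW with exponent 2 (inverse distance squared); the same `leave_one_out` function would accept any estimator, including a kriging one. The six points are a hypothetical stand-in for the #4-B grid values (only the point at x=1, y=4, z=1 comes from the text), the helper names are illustrative, and the mean absolute percentage error assumes no known value is zero.

```python
import math


def idw(known, x, y, power):
    """Global IDW estimate at (x, y) from known [(x, y, z), ...]."""
    num = den = 0.0
    for kx, ky, z in known:
        d = math.hypot(kx - x, ky - y)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


def leave_one_out(known, estimator):
    """Remove each point in turn, estimate it from the rest; return (actual, estimate) pairs."""
    results = []
    for i, (x, y, z) in enumerate(known):
        others = known[:i] + known[i + 1:]
        results.append((z, estimator(others, x, y)))
    return results


def error_summary(results):
    """Mean error, mean absolute error, RMSE, and mean absolute percentage error."""
    n = len(results)
    errors = [est - act for act, est in results]
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mean_abs_pct_error": 100 * sum(abs(e) / abs(act) for e, (act, _) in zip(errors, results)) / n,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for the six known cells of question #4-B.
    known = [(1, 4, 1.0), (2, 1, 8.0), (3, 3, 5.0), (4, 5, 2.0), (5, 2, 9.0), (5, 5, 3.0)]
    for power in (1, 2):
        res = leave_one_out(known, lambda pts, x, y: idw(pts, x, y, power))
        print(f"IDW exp={power}:", {k: round(v, 2) for k, v in error_summary(res).items()})
```

Whichever method has the lower cross-validation errors on the same points has the better demonstrated skill for that dataset.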