Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Geographical analysis Overlay, cluster analysis, autocorrelation, trends, models, network analysis, spatial data mining Geographical analysis • Combination of different geographic data sets or themes by overlay or statistics • Discovery of patterns, dependencies • Discovery of trends, changes (time) • Development of models • Interpolation, extrapolation, prediction • Spatial decision support, planning • Consequence analysis (What if?) Example overlay • Two subdivisions with labeled regions soil Soil Soil Soil Soil vegetation type type type type 1 2 3 4 Birch forest Beech forest Mixed forest Birch forest on soil type 2 Kinds of overlay • Two subdivisions with the same boundaries - nominal and nominal Religion and voting per municipality - nominal and ratio Voting and income per municipality - ratio and ratio Average income and age of employees • Two subdivisions with different boundaries Soil type and vegetation • Subdivision and elevation model Vegetation and precipitation Kinds of overlay, cont’d • Subdivision and point set quarters in city, occurrences of violence on the street • Two elevation models elevation and precipitation • Elevation model and point set elevation and epicenters of earthquakes • Two point sets money machines, street robbery locations • Network and subdivision, other network, elevation model Result of overlay • New subdivision or map layer, e.g. for further processing • Table with combined data • Count, surface area Soil Type Type Type Type …. 1 2 3 4 Vegetation Area Beech Birch Mixed Beech …. 30 15 8 2 ha ha ha ha #patches 2 2 1 1 Buffer and overlay • Neighborhood analysis: data of a theme within a given distance (buffer) of objects of another theme Sightings of nesting locations of the great blue heron (point set) Rivers; buffer with width 500 m of a river Overlay Nesting locations great blue heron near river Overlay: ways of combination • Combination (join) of attributes • One layer as selection for the other – Vegetation types only for soil type 2 – Land use within 1 km of a river Overlay in raster • Pixel-wise operation, if the rasters have the same coordinate (reference) system Pixel-wise AND Forest Population increase above 2% per year Both Overlay in vector • E.g. the plane sweep algorithm as given in Computational Geometry (line segment intersection), to get the overlay in a topological structure • Using R-trees as an indexing structure to find intersections of boundaries Combined (multi-way) overlays • Site planning, new construction sites depending on multiple criteria • Another example (earth sciences): Parametric land classification: partitioning of the land based on chosen, classified themes Elevation Annual precipitation Types of rock Overlay: partitioning based on the three themes Analysis point set • Points in an attribute space: statistics, e.g. regression, principal component analysis, dendrograms (area, population, #crimes) (12, 34.000, (14, 45.000, (15, 41.000, (17, 63.000, (17, 66.000, …… …… #crimes #population 34) 31) 14) 82) 79) Analysis point set • Points in geographical space without associated value: clusters, patterns, regularity, spread Actual average nearest neighbor distance versus expected Av. NN. Dist. for this number of points in the region For example: volcanoes in a region; crimes in a city Analysis point set • Points in geographical space with value: up to what distance are measured values “similar” (or correlated)? 11 10 12 13 12 19 14 16 18 21 21 20 22 17 15 16 Analysis point set • Temperature at location x and 5 km away from x is expected to be nearly the same • Elevation (in Switzerland) at location x and 5 km away from x is not expected to be related (even over 1 km), but it is expected to be nearly the same 100 meters away • Other examples: – depth to groundwater – soil humidity – nitrate concentration in the soil Analysis point set • Points in geographical space with value: auto-correlation (~ up to what distance are measured values “similar”, or correlated) 11 10 12 13 12 19 14 16 18 21 21 20 22 17 15 16 n points (n choose 2) pairs; each pair has a distance and a difference in value difference 2 Average difference 2 observed expected 2 difference distance Classify distances and determine average per class distance Observed variogram Average 2 difference observed expected 2 difference Model variogram (linear) σ sill 2 nugget distance range distance Smaller distances more correlation, smaller variance Importance auto-correlation • Descriptive statistic of a data set: describes the distance-dependency of auto-correlation • Interpolation based on data further away than the range is nonsense 11 10 12 16 20 13 14 ?? 21 16 19 18 21 12 22 15 17 range Importance auto-correlation • If the range of a geographic variable is small, more sample point measurements are needed to obtain a good representation of the geographic variable through spatial interpolation influences cost of an analysis or decision procedure, and quality of the outcome of the analysis Analysis subdivision • Nominal subdivision: auto-correlation (~ clustering of equivalent classes) • Ratio subdivision: auto-correlation PvdA CDA VVD Auto-correlation No auto-correlation Auto-correlation nominal subdivision Join count statistic: PvdA CDA VVD • 22 neighbor relations (adjacencies) among 12 provinces • Pr(province A = VVD and province B = VVD) = 4/12 * 3/11 • E(VVD adj. VVD) = 22 * 12/132 = 2 • Reality: 4 times • E(CDA adj. PvdA) = 5.33; reality once 4/12 * 4/11 * 2 * 22 Geographical models • Properties of (geographical) models: – – – – – – selective (simplification, more ideal) approximative analogous (resembles reality) structured (usable, analyzable, transformable) suggestive re-usable (usable in related situations) Geographical models • Functions of models: – – – – – – psychological (for understanding, visualization) organizational (framework for definitions) explanatory constructive (beginning of theories, laws) communicative (transfer scientific ideas) predictive Example: forest fire • Is the Kröller-Müller museum well enough protected against (forest)fire? • Data: proximity fire dept., burning properties of land cover, wind, origin of fire • Model for: fire spread Time neighbor pixel on fire: [1.41 *] b * ws * (1- sh) * (0.2 + cos ) b = burn factor ws = wind speed = angle wind – direction pixel sh = soil humidity Forest fire Wind, speed 3 Forest; burn factor 0.8 Heath; burn factor 0.6 Road; burn factor 0.2 Museum Soil humidity Origin < 3 minutes < 6 minutes < 9 minutes > 9 minutes Forest fire model • Selective: only surface cover, humidity and wind; no temperature, seasonal differences, … • Approximative: surface cover in 4 classes; no distinction in forest type, etc., pixel based so direction discretized • Structured: pixels, simple for definition relations between pixels • Re-usable: approach/model also applies to other locations (and other spread processes) Network analysis • When distance or travel time on a network (graph) is considered • Dijkstra’s shortest path algorithm • Reachability measure for a destination: potential value potential (i ) w c j ij j wj = weight origin j = distance decay parameter c ij = distance cost between origin j and destination i Example reachability • Law Ambulance Transport: every location must be reachable within 15 minutes (from origin of ambulance) Example reachability • Physician’s practice: - optimal practice size: 2350 (minimum: 800) - minimize distance to practice - improve current situation with as few changes as possible Current situation: 16 practices, 30.000 people, average 1875 per practice Computed, improved situation: 13 practices Example in table Original New 16 13 Number of practice locations 9 7 Number of practices < 800 size 2 0 3957 4624 Average travel distance (km) 0,9 1,2 Largest distance (km) 5,2 5,4 Number of practices Number of people > 3 km Analysis elevation model • Landscape shape recognition: - peaks and pits - valleys and ridges - convexity, concavity • Water flow, erosion, watershed regions, landslides, avalanches Spatial data mining • Finding spatial patterns in large spatial data sets – within one spatial data set – across two or more data sets • With time: spatio-temporal data mining Spatial data mining and computation • “Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time” (Miller & Han, 2001) • Large data sets attempt to carefully define interesting patterns (to avoid finding non-interesting patterns) advanced algorithms needed for efficiency Clustering? • Are the people clustered in this room? How do we define a cluster? • In spatial data mining we have objects/ entities with a location given by coordinates • Cluster definitions involve distance between locations Clustering - options • • • • Determine Determine Determine Determine whether clustering occurs the degree of clustering the clusters the largest cluster • Determine the outliers Co-location • Are the men clustered? • Are the women clustered? • Is there a co-location of men and women? co-location pattern Co-location • Like before, we may be interested in – – – – – is there co-location? the degree of co-location the largest co-location the co-locations themselves the objects not involved in co-location Spatio-temporal data • Locations have a time stamp • Interesting patterns involve space and time • Example here: time-stamped point set Trajectory data • Entities with a trajectory (time-stamped motion path) • Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ... • n entities = trajectories; n = 10 – 100,000 • t time steps; t = 10 – 100,000 input size is nt • m size subgroup (unknown); m = 10 – 100,000 Trajectory data • • • • Migration patterns of animals Trajectories of tornadoes Tracking of (suspect) individuals for security Lifelines of people for social behavior Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius 4 entities Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius 3 entities Example pattern in trajectories • Compute buffer of each trajectory Example pattern in trajectories • Compute buffer of each trajectory • Compute the arrangement of the buffers and the cover count of each cell 1 1 1 2 0 1 1 Example pattern in trajectories • One trajectory has t time stamps; its buffer can be computed in O(t log t) time • All buffers can be computed in O(nt log t) time • The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)2) is the complexity of the arrangement • Cell cover counts are determined in O(k) time Example pattern in trajectories • Total: O(nt log (nt) + k) time • If the most visited location is visited by m entities, this is O(nt log (nt) + ntm) • Note: input size is nt ; n entities, each with location at t moments Patterns in entity data Spatial data Spatio-temporal data • n points (locations) • Distance is important • n trajectories, each has t time steps • Distance is timedependent – clustering pattern • Presence of attributes (e.g. male/female): – co-location patterns – flock pattern – meet pattern • Heading and speed are important and are also time-dependent Patterns in trajectories • n trajectories, each with t time steps n polygonal lines with t vertices Patterns in trajectories • Flock and meet patterns: large enough subset that has same “character” during a time interval – close to each other – same direction of motion – ... Flock: changing location Meet: fixed location • Determine the longest duration pattern Patterns in trajectories • Longest flock: given a radius r and subset size m, determine the longest time interval for which the m entities were within each other’s proximity (circle radius r) Time = 0 1 2 3 4 5 6 7 8 m=3 longest flock in [ 1.9 , 6.4 ] Patterns in trajectories • Computing the longest flock is NP-hard • This remains true for radius cr approximations with c < 2 • A radius 2 approximation of the longest flock can be computed in time O(n2 t log n) ... meaning: if the longest flock for radius r has duration , then we surely find a flock of duration for radius 2r Patterns in trajectories fixed radius m=3 flock meet fixed subset Patterns in trajectories • Go into 3D (space-time) for algorithms time 4 3 2 1 0 flock meet Patterns in trajectories Exact radius results flock meet NP-hard O(n2t2 (n2 log n + t)) Patterns in trajectories Approximate radius results flock O(n2t log n) factor 2 meet O((n2t log n) / (m2)) factor 1+ Patterns in trajectories • Flock and meet patterns require algorithms in 3dimensional space (space-time) • Exact algorithms are inefficient only suitable for smaller data sets • Approximation can reduce running time with an order of magnitude Summary • There are many types of geographical analysis, it is the main task of a GIS • Overlay and buffer analysis are most important • Statistics is also very important • Spatial and spatio-temporal data mining gives new types of analysis of geographic data