
Data Management – Data Structures and Models Part 3
Vicki Drake
Department of Earth Sciences

Data modeling: defining real-world geographic features in terms of their characteristics and relationships with each other

Three steps to data modeling and data abstraction
• Conceptual data modeling – the scope and requirements of a database – The Data Model
• Logical data modeling – the user's view of the database, defining attributes and relationships – The Data Structure
• Physical data modeling – specifying the internal storage structure and file organization of the database – The File Structure

Data Structure
• A higher level of data abstraction than information organization
• The design and implementation of information organization in terms of database models (a human-oriented view of data)
• Forms the basis for the next level of data abstraction in GIS: File Structure or File Format

File Structure
• The hardware implementation-oriented view of data
• Reflects the physical storage of data on specific computer media (magnetic tapes, disks, etc.)
Descriptive Data Structure
• Describes the design and implementation of information organization for non-spatial data (attributes)
• Based on the Relational Data Structure

Relational Data Structure – the Table (aka a Relation)
• A relation is a collection of tuples corresponding to the rows of a table
• A tuple is made up of attributes corresponding to the columns of a table
• Each relation has a unique identifier called the Primary Field (i.e., a primary key): a column or combination of columns that has no identical values in any two rows

© Vicki Drake SMC – F,2000 GIS lectures

Data Management – Graphical (Raster) Data Structure
• Graphic Data Structures – Raster Data Structure
• Space is subdivided into a regular grid of square cells (aka pixels) or other polygonal meshes
• The location of each cell is defined by its row and column numbers
• The area of each cell defines the spatial resolution of the data
• The position of a geographic feature is recorded only to the nearest pixel
• Different attributes at the same cell are stored as separate themes/layers (e.g., soil type, forest cover and slope themes covering the same area)
• The value stored for each cell indicates the type of object, phenomenon or condition found at that particular location
  – Values are coded as integers, real numbers or alphabetic characters
  – Integer values act as code numbers for a "look-up table" or attribute table

Raster Data Storage
• Raster data are acquired in that form from remote sensing, photogrammetry or scanning
  – Raster is a common way of structuring digital elevation data (DEM)
  – Raster is a common format for data interchange
  – Raster is used for merging with remote sensing data or DEMs
  – Raster algorithms are faster and simpler

Raster Data Storage

Data Management – Raster
• There are variants of the regular-grid raster data structure, including:
  – Irregular tessellation – TIN (triangulated irregular network)
  – Hierarchical tessellation (Quadtree)
  – Scan-line

Data Management – Quadtree
• The Quadtree data model provides a more compact raster representation by using variable-sized grid cells, obtained by subdividing large cells
• A coarse resolution (large cells) is used to encode large homogeneous areas, while a finer resolution (small cells) is used for areas of high spatial variability
• The physical structure of the computer file is organized according to the cell numbering scheme
  – Cells close together on a map are also close together in the file
  – Identifying the nearest neighbor of a selected point, or the area in which a point is located, is easily accomplished

Hierarchical Tessellation to Quadtree

Data Management – TIN
• In a TIN model, sample points are connected by lines to form triangles
• Each triangle's surface is defined by the elevations of its corner points
  – Within each triangle, the surface is usually represented by a plane
• TINs are a way to build a surface out of irregularly spaced points and are an alternative to the regular raster of a DEM (Digital Elevation Model)

Data Management – Scan Order
• There are various ways of compressing and storing raster data
  – Scan order is one way

Data Management – Creating a Raster
• Lay a grid over a geologic map
  – Code each cell with a value that represents the rock type occupying the majority of the cell's area
• Direct entry of each layer, cell by cell, is simplest
  – The process is tedious and time-consuming, as each layer can contain millions of cells
• Run-length encoding is more efficient
  – Data are entered as pairs: first the value, then the run length
  – The array 00011001110011101111 would be entered as: 0 3, 1 2, 0 2, 1 3, 0 2, 1 3, 0 1, 1 4

Creating a Raster (from NCGIA)

Data Management – Raster Data
• In run-length encoding, adjacent cells along a row that have the same value are treated as a group termed a "run"
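The run-length encoding of a raster row described above can be sketched in a few lines of Python. This is only an illustration, not the scheme used by any particular GIS package; the function names `rle_encode` and `rle_decode` are hypothetical, and the input is the 20-cell example array from the slide.

```python
from itertools import groupby


def rle_encode(row):
    """Encode a row of cell values as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the full row of cells."""
    row = []
    for value, length in pairs:
        row.extend([value] * length)
    return row


# The example array from the slide, one integer per cell.
cells = [int(c) for c in "00011001110011101111"]

pairs = rle_encode(cells)
print(pairs)
# → [(0, 3), (1, 2), (0, 2), (1, 3), (0, 2), (1, 3), (0, 1), (1, 4)]

# Decoding the pairs restores the original row, so no information is lost.
assert rle_decode(pairs) == cells
```

Here 20 cell values are stored as 8 pairs. The saving grows with the length of the runs, which is why the method suits rasters with large homogeneous areas.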
• Repeated values are coded with a more compact data structure
• The quantity of data needed to represent a "clumped" pattern of spatial variability is reduced

Data Management – Vector Data Structure
• The vector data model provides for precise positioning in space
• In the vector data model, objects are created by connecting points with straight lines, and areas are defined by sets of lines
• Points, lines and polygons are used to represent features of interest
• The location of features is referenced to map positions using an X,Y coordinate system (Cartesian coordinate system)

Data Management – Vector Data
• Once points are entered and geometric lines are created, topology must be "built"
• During topology generation, problems such as "overshoots", "undershoots" and "spikes" are edited
• Once topology is built, attributes can be keyed in or imported from other digital databases and are linked to the different objects

Data Management – Vector Example

Data Management – Vector Data
• Analysis functions with vector GIS are not quite the same as with raster GIS
  – More operations deal with objects
• Some operations are more accurate
  – Estimates of area based on polygons are more accurate than counts of pixels
• Some operations are slower
  – Overlaying layers, setting buffers
• Some operations are faster
  – Finding a path through a road network

Data Management – Vector Data
• There are many implementations of vector data structures, including:
• The Spaghetti Model – a direct line-for-line, unstructured translation of a paper map
  – Common boundaries between adjacent polygons must be recorded twice, once for each polygon
  – Spatial relationships between features are not recorded – only a collection of coordinate strings
  – Efficient for digitally reproducing maps, but not efficient for most types of spatial analysis

Data Management – Spaghetti Model

Data Management – Vector Data
• The Hierarchical Structure – a vector data structure developed to facilitate data retrieval by separately storing points, lines and areas in a logically hierarchical manner

Data Management – Hierarchical Vector Data
• The Topological Model – a vector data structure that retains spatial relationships by explicitly storing adjacency information
  – The logical feature for line and area coverages is the arc: a line, stored as a series of points, that starts and ends at a node
  – A node is an intersection where two or more arcs meet; individual line segments are defined by the coordinates of their nodes
  – Topological information is stored by recording:
    • The from-node and to-node of each line segment (arc)
    • The left-polygon and right-polygon (in the direction from the from-node to the to-node) of each line segment

Data Management – Topological

Data Management – Georelational Data Structure
• The georelational data structure was developed to handle geographic data and allows for the association between spatial (graphical) and non-spatial (descriptive) data
• Many vector-based GIS software packages now have this as their primary data structure
• Spatial and non-spatial data are stored in relational tables
  – Point, line and polygon data are stored in separate Feature Attribute Tables (FAT)
  – In the FAT, each entity is assigned a unique feature identifier (FID)
  – Entities in the spatial and non-spatial relational tables are linked by their common FID

Data Management – Georelational Data Structure

Data Management – Vector Data Capabilities
• Boolean operations can be performed using vector data
  – When two maps are overlaid, areas (polygons) that are superimposed have the "and" condition
  – Venn diagrams are a spatial representation used to illustrate Boolean operators in the study of logic; the GIS area overlay is then a geographical instance of a Venn diagram
  – "XOR" is the "exclusive or": A xor B means A or B but not both

Data Management – Venn Diagrams

Data Management – Reclassify, Dissolve, Merge
• Reclassify, dissolve and merge operations are used frequently when working with area objects, to aggregate areas based on attributes
• A soils map of major soil types can be produced from a layer whose polygons are based on a finely defined classification

Data Management – Dissolve…

Data Management – Dissolve…
• In city zoning, there is a need to know how many individual land-use zones have been created in the city and how they are distributed geographically
• Dissolving the boundaries between parcels that have the same zoning can result in a map showing large areas of similar zoning classes

Data Management – Overlays
• Reclassifying areas by a single attribute or some combination of attributes is the first step
• Dissolving boundaries between areas of the same type means deleting the arc between two polygons if the relevant attributes are the same in both polygons
• Merging the polygons into larger objects means recoding the sequence of line segments that connect to form the boundary (i.e., rebuilding topology) and assigning new identification codes to each new object

Data Management – Overlays…

Data Management – Topological Overlays
• Points, lines and polygons may all be overlaid or combined (previous slides)
• When polygons are overlaid, many new and smaller polygons are created – some of which may not represent true spatial variations
• These are spurious polygons, or sliver polygons, and represent a major problem
• In most cases, a GIS will allow the user to set a "tolerance" value for deleting spurious polygons during overlay operations – usually with a deletion rule based on shape (most slivers tend to be long and thin)

Vector Advantages and Disadvantages
• ADVANTAGES
  – REPRESENT DISCRETE ENTITIES
  – COMPACT DATA STRUCTURE
  – TOPOLOGY EASILY DESCRIBED
  – EASY COORDINATE TRANSFORMATION
• DISADVANTAGES
  – COMPLEX DATA STRUCTURE
  – OVERLAYING COMPUTATIONALLY EXPENSIVE
  – DISPLAY TIME-CONSUMING
  – SPATIAL MODELING DIFFICULT

Raster Advantages/Disadvantages
• ADVANTAGES
  – REPRESENT CONTINUOUS PHENOMENA
  – SIMPLE DATA STRUCTURE
  – SIMPLIFIES SPATIAL ANALYSIS AND
• DISADVANTAGES
  – MUST COMPROMISE BETWEEN RESOLUTION AND DATA SIZE
  – CRUDE RASTER MAPS OF POOR CARTOGRAPHIC QUALITY
  – COORDINATE
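Two trade-offs noted in this lecture, that polygon-based area estimates are more accurate than pixel counts and that raster data must compromise between resolution and data size, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names, the triangle used as a test polygon, the 100-unit square extent, and the rule of counting a cell when its centre falls inside the polygon (the lecture's rasterizing example uses a majority-of-area rule instead).

```python
def shoelace_area(vertices):
    """Exact area of a simple polygon given its (x, y) vertices in order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_area(vertices, extent, cell_size):
    """Estimate polygon area by counting cells whose centre is inside it.

    The grid covers the square from (0, 0) to (extent, extent).
    Returns (estimated_area, total_number_of_cells_in_grid).
    """
    n = round(extent / cell_size)
    inside_cells = 0
    for row in range(n):
        for col in range(n):
            cx = (col + 0.5) * cell_size
            cy = (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, vertices):
                inside_cells += 1
    return inside_cells * cell_size ** 2, n * n


# Hypothetical test polygon inside a 100 x 100 extent.
triangle = [(10, 10), (90, 20), (40, 80)]
print("vector (exact) area:", shoelace_area(triangle))

for size in (10, 5, 1):
    estimate, cells = raster_area(triangle, 100, size)
    print(f"cell size {size}: {cells} cells stored, estimated area {estimate}")
```

Cutting the cell size by a factor of 10 multiplies the number of stored cells by 100, while the vector representation of the same triangle needs only three coordinate pairs whatever precision is required.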
