Data Management – Data Structures and Models Part 3
Vicki Drake
Department of Earth Sciences
Data modeling: defining real-world geographic features in terms of their characteristics and
relationships with each other
Three steps to data modeling and data abstraction
• Conceptual data modeling – the scope and requirements of a database – The Data Model
• Logical data modeling – user’s view of the database, defining attributes and relationships – The Data Structure
• Physical data modeling – specifying the internal storage structure and file organization of the database – The File Structure
Data Structure – a higher level of data abstraction than information organization
• Design and implementation of information organization in terms of database models (human-oriented view of data)
• Forms the basis for the next level of data abstraction in GIS: File Structure or File Format
File structure is the hardware implementation-oriented view of data
• Reflects physical storage of data on a specific computer medium (magnetic tape, disk, etc.)
Descriptive Data Structure – describes the design and implementation of information organization for non-spatial data (attributes)
• Based on the Relational Data Structure
Relational Data Structure – the Table (aka a Relation)
• A relation is a collection of tuples corresponding to the rows of the table
• A tuple is made up of attributes corresponding to the columns of the table
• Each relation has a unique identifier called the Primary Field (more commonly, the primary key)
– a column or combination of columns that has no identical values in any two rows
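A minimal sketch of the relational idea above, assuming a table is held as a list of rows. The parcel table, its column names, and the helper `is_primary_key` are hypothetical and only illustrate the uniqueness rule.

```python
from typing import Dict, List, Sequence

# A relation (table): each tuple (row) maps attribute (column) names to values.
# The parcel records below are made up for illustration.
Relation = List[Dict[str, object]]

parcels: Relation = [
    {"parcel_id": 101, "owner": "Smith", "landuse": "residential"},
    {"parcel_id": 102, "owner": "Jones", "landuse": "commercial"},
    {"parcel_id": 103, "owner": "Smith", "landuse": "residential"},
]


def is_primary_key(relation: Relation, columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False  # two rows are identical on these columns
        seen.add(key)
    return True
```

Here `parcel_id` qualifies as a primary field, while `owner`, or `owner` combined with `landuse`, does not, because two rows repeat the same values.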
© Vicki Drake
SMC – F,2000
GIS lectures
Data Management – Graphical (Raster) Data Structure
• Graphic Data Structures – Raster Data Structure
• Space is subdivided into regular grids of square cells or other polygonal meshes (aka pixels)
• Location of each cell is defined by row/column numbers
• Area of each cell defines the spatial resolution of the data
• Position of a geographic feature is recorded only to the nearest pixel
• Different attributes at the same cell are stored as separate themes/layers (e.g., soil type, forest cover, and slope themes for the same area)
• Value stored for each cell indicates the type of object, phenomenon, or condition found at that particular location
– Values are coded as integers, real numbers, or alphabetic characters
– Integer values act as code numbers for a “look-up table” or attribute table
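The raster ideas above can be sketched as a small grid plus a look-up table. The soil codes, the 30-unit cell size, and the upper-left origin with y increasing downward are all assumptions made for this illustration.

```python
# Hypothetical 4x4 soil-type theme; integers are codes into a look-up table.
soil_codes = {1: "clay", 2: "sand", 3: "loam"}
soil_layer = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 2, 2],
    [3, 3, 3, 2],
]
CELL_SIZE = 30.0  # assumed cell edge length; this sets the spatial resolution


def cell_for_point(x, y, origin_x=0.0, origin_y=0.0, cell_size=CELL_SIZE):
    """Map a coordinate to (row, col).

    Assumes the origin is the upper-left corner and y increases downward.
    The position is only known to the nearest cell.
    """
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return row, col


def soil_at(x, y):
    """Look up the soil name stored for the cell containing (x, y)."""
    row, col = cell_for_point(x, y)
    return soil_codes[soil_layer[row][col]]
```

Any two points that fall in the same cell return the same value, which is the sense in which position is recorded only to the nearest pixel.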
Raster Data Storage
• Raster data are acquired in that form from remote sensing, photogrammetry, or scanning
– Raster is a common way of structuring Digital Elevation data (DEM)
– Raster is a common format for data interchange
– Raster is used for merging with remote sensing or DEMs
– Raster algorithms are faster and simpler
Raster Data Storage
Data Management - Raster
• There are variants to the regular grid raster data structure, including:
– Irregular tessellation – TIN (triangulated irregular network)
– Hierarchical tessellation (QuadTree)
– Scan-Line
Data Management - Quadtree
• The Quadtree data model provides a more compact raster representation by using variable-sized grid cells: large cells are subdivided only where needed
• A coarse resolution (large cells) is used to encode large homogeneous areas, while a finer resolution (small cells) is used for areas of high spatial variability
• The physical structure of the computer file is organized according to a numbering scheme, so that cells close together on a map are also close together in the file
– Identifying the nearest neighbor of a selected point, or the area in which a point is located, is easily accomplished
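A minimal sketch of the quadtree subdivision described above, assuming a square grid whose side is a power of two. The nested-list representation and the function names are illustrative choices, not a particular GIS file format, and the numbering scheme for file order is not modeled.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value if the block is homogeneous, otherwise a list of
    four sub-blocks in NW, NE, SW, SE order.

    Assumes a square grid whose side length is a power of two.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # one large cell encodes the whole homogeneous block
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    """Count the cells actually stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
```

For this grid only the south-east quadrant needs fine cells, so the tree stores 7 leaves in place of 16 regular cells.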
Hierarchical Tessellation to Quadtree
Data Management - TIN
• In a TIN model, sample points are connected by lines to form triangles
• Each triangle’s surface is defined by the elevations of its corner points
– Within each triangle, the surface is usually represented by a plane
• TINs are a way to build a surface out of irregularly spaced points and are an alternative to the regular raster of a DEM (Digital Elevation Model)
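As a sketch of the planar surface within one triangle, the elevation at an interior point can be computed with barycentric weights of the three corners. The function name and the sample coordinates are assumptions for illustration; building the triangulation itself is not shown.

```python
def interpolate_in_triangle(p, a, b, c):
    """Elevation at p = (x, y) on the plane through corners a, b, c.

    Each corner is (x, y, z). Returns None if p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: corners are collinear")
    # Barycentric weights: each one measures how close p is to a corner.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical sample points (x, y, elevation).
corner_a = (0.0, 0.0, 100.0)
corner_b = (10.0, 0.0, 200.0)
corner_c = (0.0, 10.0, 300.0)
elevation = interpolate_in_triangle((2.0, 3.0), corner_a, corner_b, corner_c)
```

Because the surface is a plane, the interpolated value at any corner equals that corner's sampled elevation.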
Data Management – Scan Order
• There are various ways of compressing and storing raster data – Scan Order is one way
Data Management – Creating a Raster
• Lay a grid over a geologic map – code each cell with a value that represents the rock type that appears in the majority of the cell’s area
• Direct entry of each layer cell by cell is simplest
– Process is tedious and time-consuming, as each layer can contain millions of cells
• Run-length encoding is more efficient
– Data are entered as pairs: first the value, then the run length
– The array 00011001110011101111 would be entered as: 0 3, 1 2, 0 2, 1 3, 0 2, 1 3, 0 1, 1 4
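The run-length pairs above can be reproduced with a short sketch. It assumes a row is given as a string of single-character cell values; `rle_encode` and `rle_decode` are illustrative names.

```python
from itertools import groupby


def rle_encode(cells):
    """Collapse each run of equal adjacent values into a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (value, length) pairs back into the original string of cells."""
    return "".join(value * length for value, length in pairs)


row = "00011001110011101111"
encoded = rle_encode(row)
```

For this row the encoder yields the same eight pairs listed above, and decoding them restores the original 20 cells.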
Creating a Raster (from NCGIA)
Data Management – Raster Data
• In Run-Length Encoding, adjacent cells along a row that have the same value are treated as a group termed a “run”
• Repeated values are coded with a more compact data structure
• The quantity of data needed to represent a “clumped” pattern of spatial variability is reduced
Data Management – Vector Data Structure
• The vector data model provides for precise positioning in space
• In the vector data model, objects are created by connecting points with straight lines, and areas are defined by sets of lines
• Points, lines, and polygons are used to represent features of interest
• Location of features is referenced to map positions using an X,Y coordinate system (Cartesian coordinate system)
Data Management – Vector Data
• Once points are entered and geometric lines are created, topology must be “built”
• During topology generation, problems such as “overshoots”, “undershoots”, and “spikes” are edited
• Once topology is built, attributes can be keyed in or imported from other digital databases and linked to the different objects
Data Management - Vector Example
Data Management – Vector Data
• Analysis functions with vector GIS are not quite the same as with raster GIS
– More operations deal with objects
• Some operations are more accurate
– Estimates of area based on polygons are more accurate than counts of pixels
• Some operations are slower
– Overlaying layers, setting buffers
• Some operations are faster
– Finding a path through a road network
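The accuracy point above can be illustrated with a small sketch that compares an area computed from polygon coordinates (the shoelace formula) with an estimate made by counting raster cells whose centers fall inside the polygon. The triangle, the 2-unit cell size, and the cell-center rule are assumptions chosen for illustration.

```python
def polygon_area(vertices):
    """Area from vertex coordinates using the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, vertices):
    """Ray-casting test: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_area(vertices, cell_size):
    """Estimate area by counting cells whose center lies inside the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    count = 0
    y = min(ys) + cell_size / 2
    while y < max(ys):
        x = min(xs) + cell_size / 2
        while x < max(xs):
            if point_in_polygon(x, y, vertices):
                count += 1
            x += cell_size
        y += cell_size
    return count * cell_size ** 2


triangle = [(0, 0), (10, 0), (0, 9)]
exact = polygon_area(triangle)
estimate = raster_area(triangle, 2)
```

With these inputs the polygon calculation gives 45 square units, while the pixel count gives 40. The size of the gap depends on the cell size and on how the grid happens to line up with the boundary.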
Data Management – Vector Data
• There are many implementations of vector data structures, including:
• The Spaghetti Model – a direct line-for-line unstructured translation of a paper map
– Common boundaries between adjacent polygons must be recorded twice, once for each polygon
– Spatial relationships between features are not recorded – only a collection of coordinate strings
– Efficient for digitally reproducing maps, but not efficient for most types of spatial analyses
Data Management – Spaghetti Model
Data Management – Vector Data
• The Hierarchical Structure – a vector data structure developed to facilitate data retrieval by separately storing points, lines, and areas in a logically hierarchical manner
Data Management – Hierarchical Vector Data
• The Topological Model – a vector data structure that retains spatial relationship by explicitly storing
adjacency information
– The logical feature for line and area coverages is the arc, a series of points joined by straight line segments that starts and ends at a node
– A node is an intersection where two or more arcs meet, and individual line segments are defined by the coordinates of their nodes
– Topological information is stored by recording
• The from-node and to-node of each line segment (arc)
• The left-polygon and right-polygon (in the direction from the from-node to the to-node) of each line segment
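A minimal sketch of an arc table that records the four topological items listed above. The three arcs, the node and polygon labels, and the `OUTSIDE` marker are hypothetical; coordinates are left out so that only the stored relationships are shown.

```python
from collections import defaultdict, namedtuple

Arc = namedtuple("Arc", "arc_id from_node to_node left_poly right_poly")

OUTSIDE = None  # assumed marker for the area beyond the coverage

# Hypothetical coverage: two polygons, A (upper) and B (lower), between
# nodes n1 and n2. Every arc is digitized from n1 to n2.
arcs = [
    Arc(1, "n1", "n2", OUTSIDE, "A"),  # upper outer boundary
    Arc(2, "n1", "n2", "A", "B"),      # shared boundary, stored only once
    Arc(3, "n1", "n2", "B", OUTSIDE),  # lower outer boundary
]


def adjacency(arc_table):
    """Derive which polygons are neighbors from left/right polygon labels."""
    neighbors = defaultdict(set)
    for arc in arc_table:
        left, right = arc.left_poly, arc.right_poly
        if left is not OUTSIDE and right is not OUTSIDE and left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def boundary_arcs(arc_table, poly):
    """List the arcs that form the boundary of one polygon."""
    return [a.arc_id for a in arc_table if poly in (a.left_poly, a.right_poly)]
```

Adjacency is read directly from the table without any coordinate geometry, and the shared boundary is stored once instead of twice as in the spaghetti model.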
Data Management - Topological
Data Management – Georelational Data Structure
• The Georelational Data structure was developed to handle geographic data and allows for the
association between spatial (graphical) and non-spatial (descriptive) data
• Many vector-based GIS software packages now have this as the primary data structure
• Spatial and Non-Spatial data are stored in relational tables
– Point, line, and polygon data are stored in separate Feature Attribute Tables (FAT)
– In the FAT, each entity is assigned a unique feature identifier (FID)
– Entities in spatial and non-spatial relational tables are linked by the common FID of entities
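The FID link can be sketched as two tables keyed by the same identifier. The polygon coordinates, attribute names, and helper functions below are hypothetical, and plain dictionaries stand in for relational tables.

```python
# Spatial table: FID -> polygon geometry (hypothetical coordinates).
polygon_fat = {
    1: [(0, 0), (4, 0), (4, 3), (0, 3)],
    2: [(4, 0), (8, 0), (8, 3), (4, 3)],
}

# Non-spatial table keyed by the same FID (made-up attribute values).
soil_attributes = {
    1: {"soil": "clay", "ph": 6.5},
    2: {"soil": "sand", "ph": 7.1},
}


def join_by_fid(spatial, descriptive):
    """Combine geometry and attributes for each feature sharing an FID."""
    return {
        fid: {"geometry": geometry, **descriptive.get(fid, {})}
        for fid, geometry in spatial.items()
    }


def fids_where(descriptive, column, value):
    """Select the FIDs whose attribute `column` equals `value`."""
    return [fid for fid, row in descriptive.items() if row.get(column) == value]


features = join_by_fid(polygon_fat, soil_attributes)
sandy = fids_where(soil_attributes, "soil", "sand")
```

A query on the descriptive table returns FIDs, and those FIDs retrieve the matching geometry from the spatial table.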
Data Management – Georelational Data Structure
Data Management – Vector Data Capabilities
• Boolean operations can be performed using vector data
– When two maps are overlaid, areas (polygons) that are superimposed satisfy the “and” condition
– Venn diagrams use a spatial representation to illustrate Boolean operators in the study of logic – a GIS area overlay is then a geographical instance of a Venn diagram
– “XOR” is the “exclusive or” – A xor B means A or B but not both
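The Boolean operators can be sketched with Python sets. As a simplifying assumption, each area is represented here by the set of unit cells it covers instead of by polygon boundaries, so the operators reduce to ordinary set operations.

```python
# Two overlapping 3x3 areas, each held as a set of (row, col) unit cells.
area_a = {(r, c) for r in range(0, 3) for c in range(0, 3)}
area_b = {(r, c) for r in range(1, 4) for c in range(1, 4)}

a_and_b = area_a & area_b  # AND: covered by both areas
a_or_b = area_a | area_b   # OR: covered by at least one area
a_xor_b = area_a ^ area_b  # XOR: covered by exactly one area
a_not_b = area_a - area_b  # A but not B
```

The XOR result is the OR result with the AND region removed, which matches the Venn diagram reading of "A or B but not both".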
Data Management – Venn Diagrams
Data Management – Reclassify, Dissolve, Merge
• Reclassify, dissolve, and merge operations are used frequently in working with area objects to aggregate areas based on attributes
• A map of major soil types can be produced from a layer whose polygons are based on a finely defined classification
Data Management – Dissolve…
Data Management – Dissolve…
• In city zoning, there is a need to know how many individual land-use zones have been created in the city and how they are distributed geographically
• Dissolving the boundaries between parcels that have the same zoning can result in a map showing large areas of similar zoning classes
Data Management - Overlays
• Reclassifying areas by a single attribute or some combination of attributes is the first step
• Dissolving boundaries between areas of the same type means deleting the arc between two polygons if the relevant attributes are the same in both polygons
• Merging the polygons into larger objects means recoding the sequence of line segments that connect to form the boundary (i.e., rebuilding topology) and assigning new identification codes to each new object
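The three steps above can be sketched on a tiny arc table. The soil classes, polygon names, and arc tuples are hypothetical, and the merge step only assigns new identification codes to groups of polygons; it does not rebuild boundary geometry as a full GIS would.

```python
# Hypothetical detailed soil classes and their major types.
soil_class = {"P1": "sandy loam", "P2": "silty loam", "P3": "clay"}
major_type = {"sandy loam": "loam", "silty loam": "loam", "clay": "clay"}

# Arcs as (arc_id, left_polygon, right_polygon); None marks the outside.
soil_arcs = [(1, None, "P1"), (2, "P1", "P2"), (3, "P2", "P3"), (4, "P3", None)]


def reclassify(poly_attr, lookup):
    """Step 1: replace each polygon's attribute with its broader class."""
    return {poly: lookup[value] for poly, value in poly_attr.items()}


def dissolve(arc_table, poly_class):
    """Step 2: keep only arcs whose two sides differ in class."""
    return [a for a in arc_table if poly_class.get(a[1]) != poly_class.get(a[2])]


def merge(arc_table, poly_class):
    """Step 3: group polygons joined across dissolved arcs; assign new IDs."""
    parent = {p: p for p in poly_class}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for _, left, right in arc_table:
        if left in poly_class and right in poly_class:
            if poly_class[left] == poly_class[right]:
                parent[find(left)] = find(right)
    roots = sorted({find(p) for p in poly_class})
    new_id = {root: i + 1 for i, root in enumerate(roots)}
    return {p: new_id[find(p)] for p in poly_class}


classes = reclassify(soil_class, major_type)
kept_arcs = dissolve(soil_arcs, classes)
merged_ids = merge(soil_arcs, classes)
```

In this example arc 2 is deleted because both of its sides reclassify to loam, so P1 and P2 receive the same new identifier while P3 keeps its own.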
Data Management – Overlays…
Data Management – Topological Overlays
• Points, lines, and polygons may all be overlaid or combined (previous slides)
• When polygons are overlaid, many new and smaller polygons are created – some of which may not represent true spatial variations
• These are spurious polygons, or sliver polygons, and represent a major problem
• In most cases, a GIS will allow the user to set a “tolerance” value for deleting spurious polygons during overlay operations – usually with a deletion rule based on shape (most slivers tend to be long and thin)
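One possible shape-based rule is sketched below. It is an assumption for illustration, not the rule of any particular GIS package: a polygon is flagged as a sliver when it is both small and far from circular, measured by the compactness ratio 4πA/P² (1.0 for a circle, near 0 for long thin shapes). The thresholds are arbitrary.

```python
import math


def area_and_perimeter(vertices):
    """Shoelace area and boundary length of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(vertices):
    """4*pi*A / P**2: equals 1.0 for a circle, tends to 0 for thin shapes."""
    area, perimeter = area_and_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


def is_sliver(vertices, max_area, min_compactness):
    """Flag polygons that are both small and elongated (assumed rule)."""
    area, _ = area_and_perimeter(vertices)
    return area <= max_area and compactness(vertices) < min_compactness


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
thin_strip = [(0, 0), (100, 0), (100, 0.1), (0, 0.1)]
```

With `max_area=20` and `min_compactness=0.1`, the thin strip is flagged and the square is not. In practice the user would tune both thresholds to the map scale.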
Vector Advantages and Disadvantages
• ADVANTAGES
– REPRESENT DISCRETE ENTITIES
– COMPACT DATA STRUCTURE
– TOPOLOGY EASILY DESCRIBED
– EASY COORDINATE TRANSFORMATION
• DISADVANTAGES
– COMPLEX DATA STRUCTURE
– OVERLAYING COMPUTATIONALLY EXPENSIVE
– DISPLAY TIME-CONSUMING
– SPATIAL MODELING DIFFICULT
RASTER ADVANTAGES/DISADVANTAGES
• ADVANTAGES
– REPRESENT CONTINUOUS PHENOMENA
– SIMPLE DATA STRUCTURE
– SIMPLIFIES SPATIAL ANALYSIS AND
• DISADVANTAGES
– MUST COMPROMISE BETWEEN RESOLUTION AND DATA SIZE
– CRUDE RASTER MAPS OF POOR CARTOGRAPHIC QUALITY
– COORDINATE