GE3180 Essay
How is data structured for use in Geographical Information
Systems? Giving examples, show what can be achieved with
structured data that cannot be achieved with unstructured
data.
In this essay I will discuss the nature of data structures in geographical information
systems, how data is (and can be) structured and why structure is so important. I
intend to use some examples that will illustrate the benefits that structuring data
can bring.
Using structured data in Geographical Information Systems (GIS) is important for
many reasons. GIS systems deal with data that is inherently spatial in nature.
Spatial data has some important characteristics and caveats that make it stand out
from other kinds of data. Spatial data represents points in space, which can be
referenced on the x-axis (easting), y-axis (northing) and z-axis (height, usually in
metres above sea level) and also, importantly, in time. Tied to this absolute
location of features is another important aspect of spatial data, namely that
objects appear in space. The purpose of GIS systems is to provide us with a
representation of these objects, that is, to generalise and abstract them so that
they might be analysed. This is not as simple an operation as it might seem. There
are many problems of scale and placement. For example, in a vector-based map, while
at a high resolution a house might be displayed as an area, at a lower resolution it
might not appear at all, since it is no longer significant. Spatial data tends to be
either very sparse or available in massive quantities, and this poses significant
problems of organisation and processing.
There are two important data types used by GIS systems, raster and vector
images. Raster maps are built up from grids of regular sized pixels, whereas vector
maps are built from points, lines and areas. Raster maps are often the less useful
of the two for analysis, as they cannot be easily scaled nor have topological data
added easily. The inherent nature of raster images means some detail is lost, as
features less than a pixel in size cannot be displayed. Rasters do have one
advantage: the numerical calculation required to find either the intersection or
the join of two or more polygons is quite intensive, and the use of raster
structures will cut down this overhead. When using vector
maps it is much easier to change the scale and add attribute data to the building
blocks, points, lines and areas. This ability to reference attributes of the
components means that complicated selection, boundary and other algorithms may
be used to interpret the data.
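The contrast between the two data types can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `VectorFeature` class, the helper functions and all coordinates and attribute values are hypothetical and do not come from any GIS package.

```python
from dataclasses import dataclass, field


@dataclass
class VectorFeature:
    """A point, line or area with attribute data attached (hypothetical model)."""
    kind: str                       # "point", "line" or "area"
    coords: list                    # list of (x, y) tuples
    attributes: dict = field(default_factory=dict)


# Invented example data.
features = [
    VectorFeature("area", [(0, 0), (0, 10), (10, 10), (10, 0)], {"use": "house"}),
    VectorFeature("line", [(0, 20), (50, 20)], {"use": "road"}),
    VectorFeature("point", [(30, 5)], {"use": "postbox"}),
]


def select(features, **criteria):
    """Return the features whose attributes match every given criterion."""
    return [f for f in features
            if all(f.attributes.get(k) == v for k, v in criteria.items())]


def scale(feature, factor):
    """Scaling a vector feature is a multiplication of its coordinates."""
    return VectorFeature(feature.kind,
                         [(x * factor, y * factor) for x, y in feature.coords],
                         dict(feature.attributes))


def rasterise_points(points, cell_size, width, height):
    """Burn points into a grid; anything smaller than a cell collapses into it."""
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        grid[int(y // cell_size)][int(x // cell_size)] = 1
    return grid


roads = select(features, use="road")
doubled = scale(features[2], 2)
# Two distinct points that fall in the same 5 x 5 cell become one pixel.
grid = rasterise_points([(1, 1), (3, 2)], cell_size=5, width=4, height=4)
```

In the vector form the attributes travel with each feature, so selection is a simple filter. In the raster form the two points can no longer be told apart once they share a cell.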
In answering this question it is important to understand the nature of data in
general, in order to place spatial data in context.
There are two kinds of data structure to consider. The first is the classic
computational structure of files and records. The second, which is particular to
GIS systems, is topological or spatial structure: does the data contain
geo-coded references and markers for use with GIS packages?
In a sense all digital data is structured, since it lives on structured filing systems.
Data is searchable and manipulable. However, anyone who has ever spent time
searching for a lost document will tell you that this sort of structure is not very
forgiving. The size and multidimensional nature of spatial data requires a more
comprehensive structure than most other data types. To be useful in a GIS system
the data needs to be searched, selected and displayed. To do this in a reasonable
amount of time, the data needs to have a standard structure. Filters can be built
for unstructured data but these have to be very complex and will, therefore, be
slower. Structured data often requires less storage space, and although structuring
adds an overhead, this is usually less significant than the cost of dealing with
many unpredictable and undocumented fields. As well as structure in the classical
computational sense, spatial data requires structuring of topological feature sets.
For analysis of spatial data users will often want to re-scale and select certain
features, which is possible if the data is topologically structured. Scaling of
vector images is possible without the use of topological information. This is useful
for some applications, but more useful is the ability to select certain features or
groups of features.
If topological information is added into the data structure it is possible to analyse
the information. The next problem is to deal with the storage. The most common
method of storing large volumes of data in a searchable and logical structure is a
database. There are several different types of database: flat file, relational,
object relational and object oriented.
Relational databases are currently the dominant database type. In a relational
database information is stored in tables, where each table is a grouping of
related data stored in fields and records, a record being a row of field data.
Tables can be linked by using key fields. A key field is a unique field which can
identify a particular record. Relational databases are based upon relational
algebra and therefore the database management has a strong mathematical basis.
Relational databases allow the creation of indexes to greatly speed up searches of
data. Until recently relational databases were not considered the best method of
handling spatial data. The relational schema was not good at sorting map objects,
so GIS packages such as ArcView used a proprietary database to store mapping
information and a relational database to store attribute data and metadata about
the map. This made data exchange difficult, especially for government agencies,
which found they had to keep multiple copies of the same data in different formats.
Modern database servers such as Oracle are now much more capable of storing
spatial data in a structured way. Binary Large Objects (BLOBs) allow large objects
to be accessed in a useful way through a traditional relational database.
Image 1 Example Relational Database, showing the link between two tables
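The idea of two tables linked by a key field, with an index to speed up searches, can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch only: the table names, column names and values are invented for the example and are not the tables shown in Image 1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- parcel_id is the key field: it uniquely identifies each record.
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        easting   REAL,
        northing  REAL
    );
    -- owner.parcel_id links each owner record back to a parcel record.
    CREATE TABLE owner (
        owner_id  INTEGER PRIMARY KEY,
        parcel_id INTEGER REFERENCES parcel(parcel_id),
        name      TEXT
    );
    -- An index on the linking field speeds up searches and joins.
    CREATE INDEX idx_owner_parcel ON owner(parcel_id);
""")

# Invented example records.
db.executemany("INSERT INTO parcel VALUES (?, ?, ?)",
               [(1, 389500.0, 385200.0), (2, 389650.0, 385310.0)])
db.executemany("INSERT INTO owner VALUES (?, ?, ?)",
               [(10, 1, "A. Smith"), (11, 2, "B. Jones")])

# Follow the key from the owner table to the parcel table.
rows = db.execute("""
    SELECT owner.name, parcel.easting, parcel.northing
    FROM owner JOIN parcel ON owner.parcel_id = parcel.parcel_id
    WHERE parcel.parcel_id = ?
""", (2,)).fetchall()
```

The query returns the owner's name together with the location stored in the other table, which is the kind of link the key field makes possible.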
According to Rigaux¹ some of the important aspects of a spatial DBMS are:
• Logical data representation extended to geometric data
• The query language must integrate new functions in order to capture the rich
set of possible operators applicable to geometric objects
• Efficient data access is essential; B-Trees are not appropriate
• The join algorithm is problematic
Rigaux makes another important point in that relational databases store data with
no notion of ordered lists. Therefore in cases where the order of the results is
important the order itself needs to be stored explicitly in another field.
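The point about ordering can be shown with a line stored as a set of vertices. In this sketch, which again uses `sqlite3` and an invented `vertex` table, the position of each vertex along the line is held in a separate `seq` field, because the table itself has no notion of order.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE vertex (
        line_id INTEGER,
        seq     INTEGER,   -- explicit position of the vertex along the line
        x       REAL,
        y       REAL,
        PRIMARY KEY (line_id, seq)
    )
""")

# Rows are deliberately inserted out of order (invented coordinates).
db.executemany("INSERT INTO vertex VALUES (?, ?, ?, ?)",
               [(1, 2, 10.0, 5.0), (1, 0, 0.0, 0.0), (1, 1, 5.0, 5.0)])

# Only the ORDER BY on the extra field guarantees the correct path.
path = db.execute(
    "SELECT x, y FROM vertex WHERE line_id = ? ORDER BY seq", (1,)
).fetchall()
```

Without the `seq` field and the `ORDER BY` clause the database would be free to return the vertices in any order, and the line could not be redrawn reliably.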
Object orientation is a method of representing the real world in abstract
conceptual models. Real world entities are modelled as objects. Objects can have
attributes and actions associated with them. Objects also allow inheritance, so you
can have a generic top level object (tree, for example), then more specific objects
which inherit the attributes and actions of the higher level one (for example, an
oak tree is a kind of tree). Object orientation is a good method of structuring
spatial data. However, it is a relatively new concept and takes time to be adopted.
Object relational databases attempt to build on the existing framework and
knowledge of relational databases whilst incorporating some of the new features
from object orientation. They are more optimised than flat file or hierarchical
structures, in which data is stored in a file or series of files. With flat files
access is generally slow, as there is the overhead of opening and closing the files
through the file system. Flat filing systems are also more difficult to index. Flat
file structures are not well suited to storing spatial data.
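The tree example of inheritance might look like this in Python. The class names follow the example in the text, but the attributes and the shade calculation are invented purely to give the objects something to inherit and override.

```python
class Tree:
    """Generic top level object with attributes and actions."""

    def __init__(self, easting, northing, height_m):
        self.easting = easting
        self.northing = northing
        self.height_m = height_m

    def describe(self):
        return (f"{type(self).__name__} at ({self.easting}, {self.northing}), "
                f"{self.height_m} m tall")

    def shade_radius_m(self):
        # Invented rule of thumb, for illustration only.
        return self.height_m * 0.5


class Oak(Tree):
    """More specific object: inherits everything from Tree, overrides one action."""

    def shade_radius_m(self):
        # Invented: assume an oak has a wider crown than the generic tree.
        return self.height_m * 0.8


oak = Oak(100, 200, 10)
```

`Oak` defines no location or height handling of its own; it receives these from `Tree` and changes only the behaviour that differs.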
Image 2 Hierarchical database schema
¹ Rigaux P 2002 pp 239-240
An alternative (or complementary) method to using a database is to use a mark-up
or meta language to attach metadata to the data. Longley describes metadata as
follows: “Strictly defined, metadata are data about data”². Standard Generalized
Markup Language (SGML) is an older standard defined to help businesses share,
describe and explain data more effectively. Extensible Markup Language (XML) is a
development of SGML. XML is a meta language, which uses tags to describe and
store both data and the structure of the data. Also, to facilitate the exchange of
information, XML supports Unicode and can, therefore, be built using different
alphabets (many databases only have native support for the Latin alphabet).
An example XML document.
<?xml version="1.0"?>
<!-- Demonstration XML DOC -->
<SpatialData id="houses">
  <house number="11">
    <postcode>SK71RQ</postcode>
    <latitude bearing="north">53.33904</latitude>
    <longitude bearing="west">2.17695</longitude>
  </house>
</SpatialData>
In the example document I have created a tag called SpatialData in which is nested
a house tag with various data about a house’s location. On its own it is a fairly
useless piece of information. However, if there are many other documents
conforming to the same tag format they could be parsed and interrogated in much
the same way as a database.
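As a rough illustration of how such documents could be parsed and interrogated, the sketch below reads a well-formed version of the example document with Python's standard `xml.etree.ElementTree` module. The function name `read_houses`, the attribute name `bearing` and the convention of turning a western longitude into a negative number are my own assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!-- Demonstration XML DOC -->
<SpatialData id="houses">
  <house number="11">
    <postcode>SK71RQ</postcode>
    <latitude bearing="north">53.33904</latitude>
    <longitude bearing="west">2.17695</longitude>
  </house>
</SpatialData>"""


def read_houses(xml_text):
    """Turn every <house> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    houses = []
    for house in root.iter("house"):
        lat = house.find("latitude")
        lon = house.find("longitude")
        # Assumed convention: south and west are stored as negative values.
        lat_value = float(lat.text) * (-1 if lat.get("bearing") == "south" else 1)
        lon_value = float(lon.text) * (-1 if lon.get("bearing") == "west" else 1)
        houses.append({
            "number": house.get("number"),
            "postcode": house.findtext("postcode"),
            "lat": lat_value,
            "lon": lon_value,
        })
    return houses


houses = read_houses(DOC)
# A simple "query": every house in a given postcode.
matches = [h for h in houses if h["postcode"] == "SK71RQ"]
```

Because every document uses the same tags, the same few lines would work on any number of files, which is what makes the comparison with a database reasonable.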
Since an increasing number of different agencies are now collecting data, the
exchange of information is becoming more important. Different agencies want to
link their spatial data sets together to give greater depth when using GIS for spatial
analysis. When using unstructured data, this exchange is extremely difficult, as
each different user must invent a way to make the data useful. If the data is
structured, XML documents can be validated against schemas or Document Type
Definitions (DTDs) to make sure they conform to a specific standard.
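The sketch below gives a flavour of what checking a document against an agreed structure involves. It is not a real DTD or XML Schema validator; it is a hand-written stand-in that checks only two rules, which I have assumed for the example: the root element must be `SpatialData`, and every `house` must contain `postcode`, `latitude` and `longitude`.

```python
import xml.etree.ElementTree as ET

# Assumed rule for this example: every <house> must contain these elements.
REQUIRED_CHILDREN = ("postcode", "latitude", "longitude")


def conformance_errors(xml_text):
    """Return a list of problems; an empty list means the document conforms."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The document is not even well-formed XML.
        return [f"not well-formed: {exc}"]

    errors = []
    if root.tag != "SpatialData":
        errors.append(f"unexpected root element <{root.tag}>")
    for i, house in enumerate(root.iter("house"), start=1):
        for tag in REQUIRED_CHILDREN:
            if house.find(tag) is None:
                errors.append(f"house {i}: missing <{tag}>")
    return errors


good = ("<SpatialData><house><postcode>SK71RQ</postcode>"
        "<latitude>53.33904</latitude><longitude>2.17695</longitude>"
        "</house></SpatialData>")
incomplete = ("<SpatialData><house><postcode>SK71RQ</postcode>"
              "<latitude>53.33904</latitude></house></SpatialData>")
malformed = "<SpatialData><longitude>2.17695</longditude></SpatialData>"
```

A real exchange standard would state these rules once in a schema or DTD and leave the checking to a validating parser, rather than relying on code written by each user.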
² Longley 2002 p 154
Metadata will have another benefit. As well as adding structure, it will make it
possible to conduct searches more effectively. New technologies such as XML and
Scalable Vector Graphics (SVG) will allow images to have metadata contained within
the image, and this will mean that it should be possible to conduct a free text
search to return appropriate image results. For GIS systems this sort of structure
means libraries or repositories of GIS data could be searched for relevant
information. Longley refers to this as Collection Level Metadata.
“Collection level metadata (CLM) defines the information needed to
make an intelligent choice, based on knowledge of each collection’s
contents”³
Searching using the make up of the document is far more efficient and should
return more relevant results.
There are times when structuring GIS data is neither necessary nor possible to
implement. Historically there were limitations on the amount of data that could be
stored and a limit to the amount of processing that could be carried out. However,
modern computers have a tremendous amount of processing power; even handheld
devices such as mobile phones are much more powerful than the computers used
to send the Apollo spacecraft to the Moon.
A situation in which structuring data can be deemed less important is a scenario
where the amount of information is small enough to be manipulated from its raw
form. In this case the extra overhead of structuring might cost more in terms of
effort than the benefits of structure would return.
There is still one major feature with which data structuring cannot cope well, that
is, temporal changes: “changes in land use over time, changes to a pipe network,
rainfall records or stream flow records. These are not well represented in current
GIS technology, but newer object oriented GIS should make this more readily
available”⁴.
³ Longley et al 2002 p 157
⁴ Elgy web-site
Object orientation and object relational databases will have the capability to add
time stamps and/or valid time periods so that temporal changes can be recorded
with more flexibility. Unstructured data cannot handle temporal change by its
very nature; with no ordering or time reference attached to the data it would be
impossible to show changes over time.
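A valid time period can be attached to a record quite simply. The following sketch assumes a hypothetical `LandUseRecord` object with a `valid_from` and an optional `valid_to` date; the parcel, the land uses and the dates are all invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LandUseRecord:
    parcel_id: int
    land_use: str
    valid_from: date
    valid_to: Optional[date] = None   # None means the record is still current

    def valid_on(self, day):
        """True if this record describes the parcel on the given day."""
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


# Invented history for one parcel: farmland that later became housing.
history = [
    LandUseRecord(1, "farmland", date(1980, 1, 1), date(1995, 6, 1)),
    LandUseRecord(1, "housing", date(1995, 6, 1)),
]


def land_use_on(history, parcel_id, day):
    """Answer the question: what was this parcel used for on a given day?"""
    for record in history:
        if record.parcel_id == parcel_id and record.valid_on(day):
            return record.land_use
    return None
```

With the time period stored as part of the structure, a question such as "what was this parcel in 1990?" becomes an ordinary selection, which is not possible when the data carries no time reference at all.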
To highlight the difference in usefulness between products that use structured and
unstructured data a good example would be Ordnance Survey’s MasterMap.
MasterMap uses structured data in the form of Geography Markup Language (GML,
a geographic standard based on XML). This gives a vector-based map, which
also stores topological data. So rather than just a vector area, it stores the
position of the area and its neighbouring components. Contrast this product with the
old Landline product, which again was a vector-based version of Ordnance Survey
maps. However, Landline did not include topological data and was merely a
series of points, lines and areas. This lack of data meant that in the GIS context
the data would be considered to be unstructured. Without the topological
properties its usefulness is severely limited. MasterMap allows the user to run
complicated selection, removal and boundary algorithms, which are possible
because the system has details of the relative placement of the various objects. In
Landline it is difficult to do these kinds of enquiries because the system has no
notion of relative placement. Landline maps were mostly used as a backdrop or
wallpaper to add value to other studies, while MasterMap images can be integrated
with the data.
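The value of knowing the relative placement of objects can be shown with a very small model. This sketch is not the MasterMap or GML data model; it simply assumes that each area records the identifiers of the boundary lines it is built from, so that two areas sharing a boundary can be recognised as neighbours. All names and identifiers are invented.

```python
# Hypothetical areas, each listing the boundary lines that enclose it.
areas = {
    "house":  {"boundaries": {"b1", "b2"}},
    "garden": {"boundaries": {"b2", "b3"}},
    "road":   {"boundaries": {"b3", "b4"}},
    "field":  {"boundaries": {"b5"}},
}


def neighbours(areas, name):
    """Areas that share at least one boundary line with the named area."""
    own = areas[name]["boundaries"]
    return sorted(other for other, record in areas.items()
                  if other != name and own & record["boundaries"])


def shared_boundaries(areas, first, second):
    """The boundary lines two areas have in common (empty if not adjacent)."""
    return areas[first]["boundaries"] & areas[second]["boundaries"]


next_to_garden = neighbours(areas, "garden")
```

If the same areas were held only as independent lists of coordinates, as in the unstructured case, finding the neighbours of the garden would require comparing geometry point by point rather than looking up a shared identifier.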
The Federal Geographic Data Committee (FGDC), a federal committee in the United
States set up to encourage the use of Geographical Information Systems and the
exchange of spatial data, drew up a framework for data exchange, the Content
Standard for Digital Geospatial Metadata (CSDGM).⁵
“The standard was developed from the perspective of defining the information
required by a prospective user to determine the availability of a set of geospatial
data, to determine the fitness the set of geospatial data for an intended use, to
determine the means of accessing the set of geospatial data, and to successfully
transfer the set of geospatial data. As such, the standard establishes the names of
data elements and compound elements to be used for these purposes, the
definitions of these data elements and compound elements, and information about
the values that are to be provided for the data elements.”
⁵ From the web-site of the Federal Geographic Data Committee
The original version of the CSDGM framework used SGML (see
http://www.fgdc.gov/metadata/contstan.html). Version 2 has now been released
to support XML / GML mark-up languages, and is in essence a framework for
information exchange.
One of the first studies to use the FGDC’s CSDGM framework was carried out by a
group of researchers in America plotting the location of boreholes. Information on the
various boreholes (for aquifers) was patchy, with some sites well documented,
others not so. By using the XML based structure the team found they could record
all the details in standard form for each site. It then became possible to compare
borehole data with other data, for example overlaying borehole data onto maps
showing levels of heavy metal contamination in order to see which water supplies
might be at risk from contamination. The survey was so successful that its use has
been extended to parts of Africa where water contaminated with heavy metals was
being used to irrigate crops.
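An overlay of this kind comes down to asking which points fall inside which areas. The sketch below is not taken from the study; the borehole names, coordinates and contamination zone are invented, and the test used is the standard ray casting method for a point in a polygon.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented data: one contaminated zone and two recorded boreholes.
contaminated_zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
boreholes = {"BH1": (5, 5), "BH2": (15, 5)}

at_risk = sorted(name for name, (x, y) in boreholes.items()
                 if point_in_polygon(x, y, contaminated_zone))
```

The overlay is only possible because both data sets record location in the same standard form; with patchy or inconsistent records there would be nothing reliable to compare.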
The governments of Australia and New Zealand, through the Australia and New
Zealand Land Information Council (ANZLIC), have devised a framework that uses
XML to structure spatial data. ANZLIC released a guideline for handling spatial
data, the Australian Spatial Data Infrastructure (ASDI), that defines a
metadata structure using SGML/XML.
Structuring data affords a whole host of benefits, including improved speed and the
ability to add topological features, which enable a richer set of GIS operations to
be carried out, such as boundary enquiries, generalisation, scaling and selection.
In fact so
many GIS operations are either enabled or vastly improved when the data is
structured that it is an integral part of a modern GIS package. Data structures also
enable data to be pooled more effectively as information exchange is an easier and
less time consuming operation. With the improvements in modern hardware,
increases in processor speed and storage there seems scant reason to use anything
other than structured data, as the benefits seem to far outweigh the drawbacks.
Bibliography
Australia and New Zealand Land Information Council “Spatial Data Infrastructure“.
http://www.anzlic.org.au/asdi/asdimain.htm
Elgy J, “GIS data structures”, http://wwwusers.aston.ac.uk/~elgyj/data_structures.htm Aston University, Birmingham
Federal Geographic Data Committee, “Federal Geographic Data Committee”
http://www.fgdc.gov
Houlding SW (2001), “XML and opportunity for <meaningful> data structure in
Geosciences”, from Computation and Geoscience Vol 27 Number 7 Aug 2001.
International Association for Mathematical Geology, http://www.iamg.org
Longley PA et al (1999), “GIS Management Issues and applications 2nd Edition”,
Wiley
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2002), “Geographic Information
Systems and Science”, Wiley
Openshaw S & Abrahart J (2000), “Geocomputation”, Taylor and Francis.
Ordnance Survey, “Ordnance Survey Online”, http://www.ordsvy.gov.uk/
Purdue University, “Basic Concepts of Spatial (and Aspatial) Data Structure”,
http://pasture.ecn.purdue.edu/~caagis/tgis/course/gistrc.html
Rigaux P, Scholl M & Voisard A (2002), “Spatial Databases with application to GIS”,
Morgan Kaufmann
9904279B