GE3180 Essay

How is data structured for use in Geographical Information Systems? Giving examples, show what can be achieved with structured data that cannot be achieved with unstructured data.

In this essay I will discuss the nature of data structures in geographical information systems, how data is (and can be) structured, and why structure is so important. I will use examples to illustrate the benefits that structuring data can bring.

Structured data matters in Geographical Information Systems (GIS) for many reasons. A GIS deals with data that is inherently spatial. Spatial data has characteristics and caveats that set it apart from other data stores. It represents points in space, which can be referenced on the x axis (easting), the y axis (northing) and the z axis (height, usually in metres above sea level), and, importantly, in time. Tied to this absolute location of features is another important aspect of spatial data, namely that objects appear in space. The purpose of a GIS is to provide us with a representation of these objects, that is, to generalise and abstract them so that they can be analysed.

This is not as simple an operation as it might seem. There are many problems of scale and placement. For example, in a vector-based map a house might be displayed as an area at a high resolution, but at a lower resolution it might not appear at all, since it is no longer significant. Spatial data tends to be either very sparse or available in massive quantities, and this poses significant problems of organisation and processing.

GIS use two main types of data, raster and vector. Raster maps are built up from grids of regularly sized pixels, whereas vector maps are built from points, lines and areas. Raster maps are the less useful of the two, as they cannot easily be scaled or have topological data added.
The inherent nature of raster images means some detail is lost, as features smaller than a pixel cannot be displayed. On the other hand, the numerical calculation required to find either the intersection or the join of two or more polygons is quite intensive, and raster structures cut down this overhead. With vector maps it is much easier to change the scale and to add attribute data to the building blocks: points, lines and areas. This ability to reference attributes of the components means that complicated selection, boundary and other algorithms can be used to interpret the data.

Page 1 of 9 9904279B GE3180 Essay

In answering this question it is important to understand the nature of data in general, in order to place spatial data in context. There are different kinds of data structure. The first is the classic computational structure of files and records. The second, important in the context of GIS, is topographic or spatial structure: does the data contain geo-coded references and markers for use with GIS packages?

In a sense all digital data is structured, since it lives on structured filing systems. Data is searchable and manipulable. However, anyone who has ever spent time searching for a lost document will tell you that this sort of structure is not very forgiving. The size and multidimensional nature of spatial data require a more comprehensive structure than most other data types.

To be useful in a GIS the data needs to be searched, selected and displayed. To do this in a reasonable amount of time, the data needs a standard structure. Filters can be built for unstructured data, but they have to be very complex and will therefore be slower. Structured data often requires less storage space, and although structuring adds an overhead, that overhead is usually less significant than the cost of dealing with many unpredictable and undocumented fields.
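The contrast between filtering structured and unstructured data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the field names (easting, northing, type) and the free-text notes are hypothetical and do not come from any real data set. The structured query is a single comparison on named fields, whereas the unstructured filter needs one pattern for every layout it has to recognise.

```python
import re

# Hypothetical structured records: every record has the same named fields.
structured = [
    {"id": 1, "type": "house", "easting": 389500.0, "northing": 385200.0},
    {"id": 2, "type": "road", "easting": 389900.0, "northing": 385950.0},
    {"id": 3, "type": "house", "easting": 391200.0, "northing": 386400.0},
]

# The same information as hypothetical free-text notes with no agreed layout.
unstructured = [
    "House no. 1 found at E 389500, N 385200",
    "road (id 2) - northing 385950 / easting 389900",
    "3: house; 391200 east, 386400 north",
]


def houses_in_box(records, min_e, max_e, min_n, max_n):
    """Select the ids of houses inside a bounding box using named fields."""
    return [
        r["id"]
        for r in records
        if r["type"] == "house"
        and min_e <= r["easting"] <= max_e
        and min_n <= r["northing"] <= max_n
    ]


def guess_coordinates(note):
    """Try to recover (easting, northing) from one free-text note.

    Each layout needs its own pattern; a note that matches none of them
    cannot be used and returns None.
    """
    patterns = [
        r"E\s*(?P<e>\d+),\s*N\s*(?P<n>\d+)",
        r"northing\s*(?P<n>\d+)\s*/\s*easting\s*(?P<e>\d+)",
        r"(?P<e>\d+)\s*east,\s*(?P<n>\d+)\s*north",
    ]
    for pattern in patterns:
        match = re.search(pattern, note)
        if match:
            return float(match.group("e")), float(match.group("n"))
    return None


print(houses_in_box(structured, 389000, 390000, 385000, 386000))
print([guess_coordinates(note) for note in unstructured])
```

The filter for the notes already needs three patterns for three records, and it silently fails on any layout it has not seen before, which is the complexity and slowness referred to above.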
As well as structure in the classical computational sense, spatial data requires structuring of topological feature sets. For analysis of spatial data, users will often want to re-scale and select certain features, which is possible if the data is topologically structured. Vector images can be scaled without topological information, which is useful for some applications, but more useful is the ability to select certain features or groups of features. If topological information is added to the data structure it becomes possible to analyse the information.

The next problem is storage. The most common way of storing large volumes of data in a searchable and logical structure is to use a database. There are several different types of database: flat file, relational, object relational and object oriented.

Relational databases are currently the dominant database type. In a relational database information is stored in tables, each table being a grouping of related data stored in fields and records, where a record is a row of field data. Tables can be linked using key fields. A key field is a unique field that identifies a particular record. Relational databases are based upon relational algebra, so database management has a strong mathematical basis. Relational databases also allow the creation of indexes, which greatly speed up searches of the data.

Image 1 Example Relational Database, showing the link between two tables

Until recently relational databases were not considered the best method of handling spatial data. The relational schema was not good at sorting map objects, so GIS packages such as ArcView used a proprietary database to store mapping information and a relational database to store attribute data and metadata about the map. This made data exchange difficult, especially for government agencies, which found they had to keep multiple copies of the same data in different formats.
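The relational ideas described above (tables, key fields linking them, and indexes) can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, column names and rows are hypothetical, and SQLite stands in for whatever database a real GIS package would use.

```python
import sqlite3

# An in-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

conn.executescript(
    """
    -- One record (row) per map feature; feature_id is the key field.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        kind       TEXT NOT NULL,
        easting    REAL NOT NULL,
        northing   REAL NOT NULL
    );

    -- Attribute data, linked back to its feature through the key field.
    CREATE TABLE attribute (
        attribute_id INTEGER PRIMARY KEY,
        feature_id   INTEGER NOT NULL REFERENCES feature(feature_id),
        name         TEXT NOT NULL,
        value        TEXT NOT NULL
    );

    -- An index on the linking field speeds up look-ups and joins.
    CREATE INDEX idx_attribute_feature ON attribute(feature_id);
    """
)

# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?)",
    [
        (1, "house", 389500.0, 385200.0),
        (2, "road", 389900.0, 385950.0),
        (3, "house", 391200.0, 386400.0),
    ],
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?, ?)",
    [
        (1, 1, "postcode", "SK71RQ"),
        (2, 2, "surface", "tarmac"),
        (3, 3, "postcode", "SK72AB"),
    ],
)

# Join the two tables on the key field. Rows in a table have no inherent
# order, so the query asks for one explicitly.
rows = conn.execute(
    """
    SELECT f.feature_id, a.value
    FROM feature AS f
    JOIN attribute AS a ON a.feature_id = f.feature_id
    WHERE f.kind = 'house' AND a.name = 'postcode'
    ORDER BY f.feature_id
    """
).fetchall()

print(rows)
```

Keeping the attribute data in its own table, linked by feature_id, mirrors the split between mapping information and attribute data described above, although here both halves live in the same database.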
Modern database servers such as Oracle are now much more capable of storing spatial data in a structured way. Binary Large Objects (BLOBs) allow large objects to be accessed in a useful way through a traditional relational database.

According to Rigaux [1], some of the important aspects of a spatial DBMS are:

- Logical data representation must be extended to geometric data.
- The query language must integrate new functions in order to capture the rich set of possible operators applicable to geometric objects.
- Efficient data access is essential, and B-trees are not appropriate.
- Join algorithms are problematic.

Rigaux makes another important point: relational databases store data with no notion of ordered lists. Therefore, in cases where the order of the results is important, the ordering needs to be stored in another field.

Object orientation is a method of representing the real world in abstract conceptual models. Real-world entities are modelled as objects. Objects can have attributes and actions associated with them. Objects also allow inheritance, so you can have a generic top-level object (tree, for example) and then more specific objects which inherit the attributes and actions of the higher-level one (for example, an oak tree is a kind of tree). Object orientation is a good method of structuring spatial data. However, it is a relatively new concept and takes time to be adopted.

Object relational databases attempt to build on the existing framework and knowledge of relational databases whilst incorporating some of the new features from object orientation. They are more optimised than flat file or hierarchical structures, in which data is stored in a file or series of files. With flat files, access is generally slow, as there is the overhead of opening and closing the files through the file system. Flat file systems are also more difficult to index. Flat file structures are not well suited to storing spatial data.
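The tree and oak tree example of inheritance given above can be written out as a short Python sketch. The class names, attributes and the describe action are hypothetical, chosen only to show a specific object inheriting attributes and actions from a generic one; they are not taken from any GIS product.

```python
class Tree:
    """Generic top-level object: attributes and an action shared by all trees."""

    species = "unknown"

    def __init__(self, easting, northing, height_m):
        self.easting = easting
        self.northing = northing
        self.height_m = height_m

    def describe(self):
        """An action that every kind of tree inherits."""
        return (
            f"{self.species} tree, {self.height_m} m tall "
            f"at ({self.easting}, {self.northing})"
        )


class OakTree(Tree):
    """More specific object: inherits from Tree and adds its own attribute."""

    species = "oak"

    def __init__(self, easting, northing, height_m, acorn_bearing=True):
        super().__init__(easting, northing, height_m)
        self.acorn_bearing = acorn_bearing


oak = OakTree(389500.0, 385200.0, 18.5)
print(oak.describe())         # action inherited from Tree
print(isinstance(oak, Tree))  # an oak tree is a kind of tree
```

The OakTree class defines no describe method of its own, so the call above is answered by the generic Tree class, which is the inheritance of actions that the essay refers to.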
Image 2 Hierarchical database schema

[1] Rigaux P 2002 pp 239-240

An alternative (or complementary) method to using a database is to use a mark-up or meta language. Longley defines the underlying idea as follows: “Strictly defined, metadata are data about data” [2]. Standard Generalized Markup Language (SGML) is an older standard defined to help businesses share, describe and explain data more effectively. Extensible Markup Language (XML) is a development of SGML. XML is a meta language which uses tags to describe and store both data and the structure of the data. To facilitate the exchange of information, XML also supports Unicode and can therefore be written using different alphabets (many databases only have native support for the Latin alphabet).

An example XML document:

    <?xml version="1.0"?>
    <!-- Demonstration XML document -->
    <SpatialData id="houses">
      <house number="11">
        <postcode>SK71RQ</postcode>
        <latitude bearing="north">53.33904</latitude>
        <longitude bearing="west">2.17695</longitude>
      </house>
    </SpatialData>

In the example document I have created a tag called SpatialData, in which is nested a house tag with various data about a house’s location. On its own it is a fairly useless piece of information. However, if there are many other documents conforming to the same tag format, they could be parsed and interrogated in much the same way as a database.

Since an increasing number of different agencies are now collecting data, the exchange of information is becoming more important. Different agencies want to link their spatial data sets together to give greater depth when using GIS for spatial analysis. With unstructured data this exchange is extremely difficult, as each user must invent a way to make the data useful. If the data is structured, XML documents can be validated against schemas or Document Type Definitions (DTDs) to make sure they conform to a specific standard.

Metadata will have another benefit.
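As an aside on the house document, the following minimal sketch shows how such a document could be parsed and interrogated using Python's standard xml.etree.ElementTree module. It embeds a well-formed version of the document so that it runs on its own. Two points are assumptions of mine rather than part of the essay's example: the attribute is spelled bearing, and west and south values are turned into negative numbers.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the essay's house document.
DOCUMENT = """<?xml version="1.0"?>
<!-- Demonstration XML document -->
<SpatialData id="houses">
  <house number="11">
    <postcode>SK71RQ</postcode>
    <latitude bearing="north">53.33904</latitude>
    <longitude bearing="west">2.17695</longitude>
  </house>
</SpatialData>
"""


def signed(element, negative_bearing):
    """Return the element's value, negated when it has the given bearing.

    Treating south and west as negative is an assumed convention here.
    """
    value = float(element.text)
    return -value if element.get("bearing") == negative_bearing else value


def read_houses(xml_text):
    """Parse a SpatialData document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    houses = []
    for house in root.findall("house"):
        houses.append(
            {
                "number": house.get("number"),
                "postcode": house.findtext("postcode"),
                "latitude": signed(house.find("latitude"), "south"),
                "longitude": signed(house.find("longitude"), "west"),
            }
        )
    return houses


houses = read_houses(DOCUMENT)
print(houses)

# Interrogating the parsed data much as one would query a database table.
print([h["number"] for h in houses if h["postcode"].startswith("SK7")])
```

Because every conforming document uses the same tags, the same read_houses function would work unchanged on any number of them, which is what makes the tag format useful for exchange.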
As well as adding structure, metadata will make it possible to conduct searches more effectively. New technologies such as XML and Scalable Vector Graphics (SVG) will allow images to have metadata contained within the image, which means that it should be possible to conduct a free text search that returns appropriate image results. For GIS this sort of structure means libraries or repositories of GIS data could be searched for relevant information. Longley refers to this as collection level metadata: “Collection level metadata (CLM) defines the information needed to make an intelligent choice, based on knowledge of each collections contents” [3]. Searching using the make-up of the document is far more efficient and should return more relevant results.

[2] Longley 2002 pp 154

There are times when structuring GIS data is neither necessary nor possible. Historically there were limits on the amount of data that could be stored and on the amount of processing that could be carried out. However, modern computers have a tremendous amount of processing power; even handheld devices such as mobile phones are much more powerful than the computers used to send the Apollo spacecraft to the Moon. Structuring data can be deemed less important where the amount of information is small enough to be manipulated in its raw form. In this case the extra overhead of structuring might cost more in effort than the benefits of structure would return.

Data structuring still copes poorly with one major feature, namely temporal change: “changes in land use over time, changes to a pipe network, rainfall records or stream flow records. These are not well represented in current GIS technology, but newer object oriented GIS should make this more readily available” [4].
Object orientation and object relational databases will have the capability to add time stamps and/or valid time periods, so that temporal changes can be recorded with more flexibility. Unstructured data, by its very nature, could not handle temporal change; using an unordered technology it would be impossible to show changes over time.

[3] Longley et al 2002 pp 157
[4] Elgy web-site

To highlight the difference in usefulness between products that use structured and unstructured data, a good example is Ordnance Survey’s Mastermap. Mastermap uses structured data in the form of Geography Markup Language (GML, a geographic standard based on XML). This gives a vector-based map which also stores topological data. So rather than just a vector area, it stores the position of the area and its neighbouring components. Contrast this with the older Landline product, which was also a vector-based version of Ordnance Survey maps. Landline did not include topological data and was merely a series of points, lines and areas. This lack of data meant that in the GIS context the data would be considered unstructured. Without the topological properties its usefulness is severely limited. Mastermap allows the user to run complicated selection, removal and boundary algorithms, which are possible because the system has details of the relative placement of the various objects. In Landline it is difficult to make these kinds of enquiries because the system has no notion of relative placement. Landline maps were mostly used as a backdrop or wallpaper to add value to other studies, while Mastermap images can be integrated with the data.
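What a notion of relative placement makes possible can be shown with a small Python sketch. It is not a description of how Mastermap or Landline actually store their data. It assumes a hypothetical structure in which each area records the identifiers of the boundary lines that enclose it, so that two areas sharing a line can be recognised as neighbours.

```python
from collections import defaultdict

# Hypothetical areas, each described by the ids of the boundary lines
# that enclose it. A line id appearing in two areas is a shared boundary.
areas = {
    "house": {"l1", "l2", "l3", "l4"},
    "garden": {"l4", "l5", "l6", "l7"},
    "road": {"l7", "l8", "l9"},
}


def build_adjacency(areas):
    """Work out which areas touch by finding boundary lines they share."""
    line_to_areas = defaultdict(set)
    for name, lines in areas.items():
        for line in lines:
            line_to_areas[line].add(name)

    neighbours = {name: set() for name in areas}
    for sharing in line_to_areas.values():
        for name in sharing:
            # Every other area on the same line is a neighbour.
            neighbours[name] |= sharing - {name}
    return neighbours


neighbours = build_adjacency(areas)
print(sorted(neighbours["garden"]))
```

With only an unrelated series of points, lines and areas there would be no shared identifiers to compare, and the same question could only be answered by comparing the geometry of every pair of shapes.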
The Federal Geographic Data Committee (FGDC), a federal commission in the United States set up to encourage the use of Geographical Information Systems and the exchange of spatial data, drew up a framework for data exchange, the Content Standard for Digital Geospatial Metadata (CSDGM) [5].

“The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data, to determine the fitness the set of geospatial data for an intended use, to determine the means of accessing the set of geospatial data, and to successfully transfer the set of geospatial data. As such, the standard establishes the names of data elements and compound elements to be used for these purposes, the definitions of these data elements and compound elements, and information about the values that are to be provided for the data elements.”

[5] From the web-site of the Federal Geographic Data Committee

The original version of the CSDGM framework used SGML (see http://www.fgdc.gov/metadata/contstan.html). Version 2 has now been released to support the XML / GML mark-up languages, and is in essence a framework for information exchange.

One of the first studies to use the FGDC’s CSDGM framework was carried out by a group of researchers in America plotting the location of boreholes. Information on the various boreholes (for aquifers) was patchy, with some sites well documented and others not. By using the XML-based structure the team found they could record all the details in a standard form for each site. It then became possible to compare borehole data with other data, for example by overlaying borehole data onto maps showing levels of heavy metal contamination in order to see which water supplies might be at risk. The survey was so successful that its use has been extended to parts of Africa where water contaminated with heavy metals was being used to irrigate crops.
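The kind of overlay described in the borehole study can be sketched in Python once every site is recorded in the same form. The sketch below is not the method the researchers used, which the essay does not describe. It is a standard ray-casting point-in-polygon test applied to hypothetical borehole positions and a hypothetical contamination zone, all given in an arbitrary local grid.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    The polygon is a list of (x, y) vertices in order. A horizontal ray is
    cast from the point to the right, and each edge it crosses toggles the
    result. Points lying exactly on an edge are not treated specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical data: a square contamination zone and three boreholes.
contamination_zone = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
boreholes = {"BH1": (2.0, 3.0), "BH2": (12.0, 5.0), "BH3": (9.0, 9.0)}

at_risk = sorted(
    name
    for name, (x, y) in boreholes.items()
    if point_in_polygon(x, y, contamination_zone)
)
print(at_risk)
```

The overlay is only this simple because every borehole carries its position in the same two fields; with patchy, differently documented sites each record would first have to be interpreted by hand.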
The governments of Australia and New Zealand have devised a framework that uses XML to structure spatial data. The body responsible is the Australia and New Zealand Land Information Council (ANZLIC). It released a guideline for handling spatial data, the Australian Spatial Data Infrastructure (ASDI), which defines a metadata structure using SGML/XML.

Structuring data affords a whole host of benefits, including improved speed and the addition of topological information, which enables a richer set of GIS operations to be carried out, such as boundary enquiries, generalisation, scaling and selection. In fact so many GIS operations are either enabled or vastly improved when the data is structured that structure is an integral part of a modern GIS package. Data structures also enable data to be pooled more effectively, as information exchange becomes an easier and less time-consuming operation. With the improvements in modern hardware, including increases in processor speed and storage, there seems scant reason to use anything other than structured data, as the benefits seem to far outweigh the drawbacks.

Bibliography

Australia and New Zealand Land Information Council, “Spatial Data Infrastructure”, http://www.anzlic.org.au/asdi/asdimain.htm
Elgy J, “GIS data structures”, Aston University, Birmingham, http://wwwusers.aston.ac.uk/~elgyj/data_structures.htm
Federal Geographic Data Committee, “Federal Geographic Data Committee”, http://www.fgdc.gov
Houlding SW (2001), “XML and opportunity for <meaningful> data structure in Geosciences”, from Computation and Geoscience Vol 27 Number 7 Aug 2001, International Association for Mathematical Geology, http://www.iamg.org
Longley PA et al (1999), “GIS Management Issues and applications 2nd Edition”, Wiley
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2002), “Geographic Information Systems and Science”, Wiley
Openshaw S & Abrahart J (2000), “Geocomputation”, Taylor and Francis
Ordnance Survey, “Ordnance Survey Online”, http://www.ordsvy.gov.uk/
Purdue University, “Basic Concepts of Spatial (and Aspatial) Data Structure”, http://pasture.ecn.purdue.edu/~caagis/tgis/course/gistrc.html
Rigaux P, Scholl M & Voisard A (2002), “Spatial Databases with application to GIS”, Morgan Kaufmann