Guidelines for Geographic Data Intended for the GISCO Reference Database

Lovell Johns Ltd, Witney, UK
Copyright © 2005 Eurostat
3rd November 2005

Contents

Introduction
i Introduction to Eurostat
ii What is GISCO?
iii INSPIRE and the GISCO project
iv GISCO Reference Database
v Contact Information
vi Purpose of guidelines
vii Intended audience

1 Geographic Guidelines: Normative Section
1.1 Naming Conventions
1.1.1 Introduction
1.1.2 Naming Conventions Description
1.2 Metadata
1.2.1 The Metadata Standard
1.2.2 The GISCO Profile
1.2.3 The mandatory elements of the GISCO profile
1.2.4 The Metadata Editor
1.2.5 Metadata Editor Use (Within ArcCatalog)
1.2.6 Validation of compliance of metadata to GISCO profile
1.2.7 Metadata Server
1.3 Spatial Reference System
1.3.1 Introduction
1.3.2 ETRS89 Ellipsoidal Coordinate Reference System (ETRS89)
1.3.3 ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn)
1.3.4 ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC)
1.3.5 ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA)
1.4 Grid Creation Standards
1.4.1 Introduction
1.4.2 European Grid Coding System
1.5 Generalisation and Generalisation Parameters
1.5.1 Introduction
1.5.2 Generalisation in ArcGIS
1.5.3 GISCO Reference Database Generalisation Parameters
1.6 Database Interoperability
1.6.1 Overview
1.6.2 Spatial Data Standards and GIS Interoperability

2 Geographic Guidelines: Non-Normative Section
2.1 What are good practices and what are bad practices for creation of data
2.1.1 Data delivery characterset
2.1.2 Data including names
2.1.3 Avoid specific Geodatabase features
2.2 Some tips and tricks to make the dataflow more fluent
2.2.1 Delivering data in (personal) geodatabase
2.2.2 Use GISCO standard database projection, extent and precision
2.3 Experienced difficulties in loading data in the new GISCO structure
2.3.1 Delivering in shapefile format
2.3.2 Problems with charactersets
2.4 Making GISCO Maps
2.4.1 Introduction
2.4.2 Why do we use maps?
2.4.3 Statistical Analysis in GIS
2.4.4 What is a thematic map?
2.4.5 Introduction to mapping concepts
2.4.6 Creating a statistical map using the ‘Mapping Tool’

3 Geographic Guidelines: Data Quality Section
3.1 Quality Assurance Principles
3.2 Geometric Quality
3.2.1 Scale and Resolution
3.2.2 Positional Accuracy
3.3 Topological Quality
3.3.1 Dangle Nodes
3.3.2 Polygon Topology
3.4 Attribute Consistency
3.4.1 Limited code lists
3.4.2 Unlimited code lists
3.4.3 Continuous values
3.5 Topological Consistency
3.5.1 Relationships between datasets
3.6 Completeness
3.6.1 Geographical Completeness
3.6.2 Attribute Completeness
3.7 Generalisation
3.7.1 Introduction
3.7.2 Point Remove
3.7.3 Bend Simplify
3.7.4 Choosing a suitable tolerance
3.7.5 Analysing and improving the results
3.8 Data Formats
3.8.1 Popular Vector Formats
3.8.2 Vector Format Limitations
3.8.3 Preferred Vector Formats
3.8.4 Popular Raster Formats
3.8.5 Raster Format Limitations
3.8.6 Preferred Raster Formats
3.9 Documentation of Data Quality
3.9.1 Data Quality Overview
3.9.2 Data Quality Elements
3.9.3 Descriptors of the Data Quality Sub-elements
3.9.4 Documenting Quality Information

APPENDICES
APPENDIX I – The elements of the GISCO profile
APPENDIX II - Glossary of GIS Terms
APPENDIX III - Glossary of Abbreviations
APPENDIX IV - List of Document Sources

Introduction

i. Introduction to Eurostat

A consistent European policy requires appropriate data, upon which well-founded decisions can be taken and a solid policy can be built. The collection and maintenance of such suitable data is one of the tasks of Eurostat, the body responsible for statistics within the European Commission. The initial impetus for the development of GISCO came from the Unit for Environment Statistics within the Directorate also responsible for agricultural statistics.
The Directorate had already been involved in the MARS Programme (Monitoring Agriculture with Remote Sensing), and the need to integrate environmental information with more classical statistical indicators encouraged the use of a geographical reference frame as a mechanism to relate such information. A significant influence in these developments was the CORINE Programme (CoORdination of INformation on the Environment), established in 1985, which assembled environmental and non-environmental geographical data in a GIS. As a result, the GISCO project was initiated and resulted in a broad range of GIS-related services for Eurostat, the European Commission and European organisations.

ii. What is GISCO?

ii(a). Organisation

GISCO is the Geographic Information System of the European Commission. Originally conceived as a prototype GIS cell that would serve a wide spectrum of users and uses, the GISCO project has developed a service-oriented dimension, namely in geographical database development, thematic mapping, desktop mapping and dissemination of data. Providing these types of services is directly related to key parts of the GISCO mandate.

The GISCO team consists of four distinct modules with the following tasks:
• GISCO Reference Database;
• Mapping and Spatial Analysis;
• Contact with users, producers and COGI;
• INSPIRE.

ii(b). Mandate

The mandate of GISCO can be subdivided according to the different actors that are involved. GISCO takes initiatives at the level of the Directorate General (Eurostat), at the level of the Commission, within the European statistical system and at the international level. The mandate of GISCO comprises the following tasks:

1. Within Eurostat
• Raise awareness concerning ways to combine geographical and statistical information;
• Promote and stimulate the use of GIS;
• Manage the GISCO reference database;
• Ensure quality control of Eurostat GIS products;
• Act as a consultancy and reference centre for map production and spatial analysis;
• Oversee GIS related projects.

2. Within the Commission
• Manage the GISCO reference database including database structure, architecture, data transfer, ...;
• Chair the GISCO user committee and technical committee, organise and chair the meetings of the COGI;
• Express and support users' needs in terms of software and hardware;
• Promote and actively participate in the co-ordination of Commission activities in the area of GI and GIS.

3. Within the European statistical system
• Promote geo-referencing of statistics and encourage the integration of GIS in the national statistical offices;
• Promote collaboration between national statistical institutes (NSI) and national mapping agencies (NMA);
• Promote harmonisation and co-ordination of the GI management systems used in statistical organisations;
• Ensure standardisation and harmonisation in the exchange of geographical information between Member States and Eurostat;
• Co-ordinate the participation of European statisticians in GI and GIS activities, promote their know-how in standardisation processes and ensure that their needs are taken into account in market developments.

4. International co-operation
• Promote co-operation between NMA at European level and harmonise their approaches on technical matters and commercial policies, including pricing and copyright;
• Pursue the harmonisation of EU and broader international initiatives in GI;
• Participate in GIS-related projects in other statistical international organisations, for example, the United Nations Economic Commission for Europe (UN/ECE).

iii. INSPIRE and the GISCO project

The Commission is currently preparing legislation aimed at improving the integration of spatial data in Europe. The initiative is known as INSPIRE (INfrastructure for SPatial InfoRmation in Europe). A spatial data infrastructure is considered to be an interacting system of basic geographical data, spatial information services, technical standards and specifications, and an institutional framework. The initiative is based on the following principles that guide its activities:

1. Data should be collected once and maintained at the level where this can be done most effectively;
2. It should be possible to combine seamless spatial information from different sources across Europe and share it between many users and applications;
3. It should be possible for information collected at one level to be shared between all the different levels, detailed for detailed investigations, general for strategic purposes;
4. Geographic information needed for good governance at all levels should be abundant and available under conditions that do not restrain its extensive use;
5. It should be easy to discover which geographic information is available, whether it fits the needs of a particular use, and under which conditions it can be acquired and used;
6. Geographic data should become easy to understand and interpret because it can be visualised within the appropriate context and selected in a user-friendly way.

The INSPIRE initiative was developed with the active collaboration of the main stakeholders concerned. During 2002, six working groups helped to draw up the various components of the infrastructure:
• joint reference data and metadata;
• environmental data;
• data policy and legal aspects;
• architecture and standards;
• financing and implementation structures;
• impact analysis.

The reports produced by these groups and other documents on the INSPIRE initiative can be found at the following address: http://www.ec-gis.org/inspire.
The proposal for an INSPIRE Directive has a framework structure, which needs further technical refinement through Implementing Rules. Five Implementing Rules have been identified in the proposal for a Directive, dealing respectively with:
• Creation and updating of metadata for spatial data and spatial data services;
• Harmonised spatial data specifications;
• Network services and interoperability;
• Rules governing access and rights of use to spatial data sets and services for Community institutions and bodies;
• Monitoring and reporting of the implementation of the Directive.

The future Directive will address the Member States, who will subsequently transpose the Directive into national/regional legislation. Following good governance practices, the Commission, as the initiator and facilitator of INSPIRE, is also committed to complying with future INSPIRE measures for all spatial data and services held and managed by the Commission itself. In line with Member States' expectations, GISCO and related SDI components within the Commission will have to progressively become the EU node in a distributed EU-wide SDI architecture. GISCO has to ensure INSPIRE-compliant development of GI service components as part of the Commission’s internal GI infrastructure. More information on INSPIRE can be found at http://inspire.jrc.it.

The new GISCO database should follow the INSPIRE principles. The technology used will have to comply with the INSPIRE standards. The applications should be compatible with applications of a future INSPIRE network.

iv. GISCO Reference Database

Within the framework of the GISCO project, an extensive geo-referenced database has been developed. One of the main topics of the GISCO mandate is to extend, maintain and update this database.
The numerous data sets offered by GISCO include:

Topographical data:
• hydrography (water patterns, lakes, ...);
• altimetry (digital elevation model);
• infrastructure data (ports, airports, roads, rail networks, ...);
• administrative entities (countries, regions, ...).

Thematic data:
• land resources (land cover, soil data, vegetation, climatic conditions, ...);
• Community support frameworks (structural funds, INTERREG, ...);
• environmental data (coastal erosion, soil erosion, ...);
• industrial themes (energy transport networks, location of nuclear power stations, ...).

The GISCO Reference Database is the central database of the GISCO System Architecture. This system will be extended to also contain other spatial databases, such as IMAGE 2000, that are regarded as complementary data. A user will be able to connect to more than one database via different interfaces: by connecting directly to another spatial database, or by using an Internet Map Server connection to retrieve and view web maps in a web browser, view the data in a web browser application, or access web services.

v. Contact Information

Points of contact for further information on how to access the GISCO service and GISCO database:

Eurostat GISCO project
Rue Alcide Gasperi
Batiment Bech D3/704
L-2920 LUXEMBOURG
Tel: (352) 4301 - 32076
Fax: (352) 4301 - 34029
E-mail functional mailbox for GISCO: [email protected]
Intranet website: www.cc.cec/gisco_eurostat

vi. Purpose of guidelines

These guidelines have been created in order to assure the compatibility of newly generated or converted geographic data with the GISCO standards. The guidelines are divided into three main sections:

The normative section addresses particular standards that must be met such as naming conventions, geographic reference systems, grids and metadata. Some of these chapters are already contained in the first section of the GISCO database manual.
The non-normative section discusses difficulties in data loading, good and bad practices in the creation of data and some tips and tricks to make data flow more fluent. In-depth guidelines are also included on how to create GISCO maps, presenting some of the main principles of cartography and thematic mapping.

The final section of the guidelines addresses data quality and consistency. This section discusses different quality elements that should be adhered to before GIS data are used with, or included in, the GISCO Reference Database.

There are appendices containing a glossary of common GIS terms and abbreviations used in the document. This document is a combination of new material and material collated from other sources which have been edited and/or updated. The appendices also contain a list of sources that have been used for the base material of various chapters.

vii. Intended audience

The main audience for the guidelines is the data integrators of the GISCO Reference Database. The users of the GISCO database should have a good knowledge of GIS and geographic data. These guidelines do not explain what GIS is, nor do they introduce basic GIS principles such as projections or metadata. They do not explain how to work with ArcMap or any other GIS software component.

The guidelines are also relevant for the GISCO Management, to give an understanding of the procedures used for the integration of geographic data. Other developers wanting to work in conformity with GISCO standardised procedures may also find these guidelines useful.

1 Geographic Guidelines: Normative Section

This section addresses particular standards that must be met when creating data intended for the GISCO Reference Database such as naming conventions, geographic reference systems, grids and metadata. Some of these sections are already contained in the first section of the GISCO database manual, others are collated from other sources or unique to this document.
The non-normative section discusses difficulties, tips, and guidelines. The final section of the document addresses data quality and consistency.

1.1 Naming Conventions

1.1.1 Introduction

GISCO database features (e.g. feature classes, tables) are held in a relational database structure. The aim of the naming conventions is:
• to reflect the contents of the feature in a standardised, concise way;
• to reflect the logical and physical location of the feature within the database;
• to assure uniqueness of the feature name within the database.

A sequence of abbreviations is therefore used to describe the contents of a database feature. The codes are grouped into code lists according to their meaning. Syntax rules define the sequence and the reading of the codes.

The names of features, tables and attributes are composed according to the following categories:
• Topic: feature data themes, feature classes, object classes and subtypes are named according to their topic category.
• Entity: the type of a feature or object, e.g. region, boundary, point.
• Scale, Accuracy, Precision
• Time stamp or Version
• Source

The naming conventions describe naming rules for the following database features:
• Feature data themes
• Feature, Object classes and subtypes
• Relationships
• Domains
• Attributes

Attribute and class names cannot exceed 30 characters in length. This restriction is due to the length limit for the names of tables and attributes in ORACLE. Long names are self-explanatory, but become uncomfortable to deal with in programs, scripts, table headings, etc. Sensible and defined contractions in the attribute and table names can improve the readability of documentation and programming code.

The name of the features and objects in the geodatabase is not meant to be a subset of the metadata.
These names must contain the minimum information required to uniquely identify the entity they represent.

1.1.2 Naming Conventions Description

1.1.2.1 Feature data themes

Alphanumeric data (object classes) and geometry (feature classes) will be conceptually grouped in feature data themes. This concept replaces the former “layers” and is not part of the geodatabase structure. Feature data themes are hierarchically independent of feature datasets, i.e. one feature dataset could contain several feature data themes or vice versa.

The first step in defining the names of the different classes and attributes is to identify the geographical entity modelled in the feature data theme. The datatype name can be as long as desired. It represents an abstract concept and is not bound by the limit on name length in databases.

Every datatype will have an associated short name. The short name will have a maximum of 4 characters. The generic words “area”, “zones”, “location”, “patterns” etc. will be disregarded when choosing the short name.

Table 1: Examples of short names

Feature Type (long name) | Short name
Territorial Units for Statistics (NUTS + Statistical Regions) | NUTS
Communes | COMM
Structural Funds Zones | STFD
Urban Audit Areas | URAU
Designated Areas | DSIG
Inland water | INWA

A feature data theme will comprise at least one feature class or object class.

1.1.2.2 Feature/object class & subtypes names

The class name must contain only the information needed to identify the class uniquely. Depending on the final model, this may or may not include the version, scale, time stamp or source. It should not include the extent (EU, EC, WD). The former “extent” segment in the name of the coverage will not appear in the new naming conventions. The projection is also dropped, as the vector data will be stored in geographical coordinates. Raster data will be stored in either of the standard coordinate reference systems.
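The two length limits stated so far (4 characters for short names, 30 characters for attribute and class names) lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the GISCO tooling: the function names are hypothetical, and the requirement that short names consist only of upper-case letters and digits is an assumption inferred from the examples in Table 1 rather than a stated rule.

```python
MAX_SHORT_NAME_LEN = 4    # limit for short names (section 1.1.2.1)
MAX_CLASS_NAME_LEN = 30   # limit for attribute and class names (section 1.1.1)

# Short names copied from Table 1.
THEME_SHORT_NAMES = {
    "Territorial Units for Statistics (NUTS + Statistical Regions)": "NUTS",
    "Communes": "COMM",
    "Structural Funds Zones": "STFD",
    "Urban Audit Areas": "URAU",
    "Designated Areas": "DSIG",
    "Inland water": "INWA",
}


def is_valid_short_name(short_name: str) -> bool:
    """Hypothetical check: 1 to 4 characters, upper-case letters or digits.

    The length limit comes from the guidelines; the character set is an
    assumption based on the examples in Table 1.
    """
    return (
        0 < len(short_name) <= MAX_SHORT_NAME_LEN
        and short_name.isalnum()
        and short_name == short_name.upper()
    )


def is_valid_name_length(name: str) -> bool:
    """Check the 30-character limit for attribute and class names."""
    return 0 < len(name) <= MAX_CLASS_NAME_LEN
```

A data integrator could run such a check over a list of proposed names before loading, so that over-long names are caught before they reach the database.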
1.1.2.3 Class identification

In order to identify feature classes within a feature data theme, they will be extended by a class identification. The class identification is conditional: it is used if the feature data theme name and entity type name do not uniquely identify a feature or object class.

The class identification must be exclusively based on concepts essential to the class. Scale or time stamps are not essential to any class.
- NUTS-1

The class identification will always be a singular noun. Every class identifier will have an associated short name. This short name will have a maximum of 4 characters.

1.1.2.4 Entity type

The entity type describes the concept for modelling a certain feature class. The table gives an overview of the keywords that should be used for describing the type of entity. The description of the entity type is mandatory. The entity type is abbreviated with 2 characters.

Table 2: Keywords for describing entity types

Short Name | Long Name | Description | Example
PO | Polygon | A closed, two-dimensional figure with at least three sides that represents an area. It is used in GIS to describe spatial elements with a discrete area, such as parcels, political districts. | Lake polygon
RG | Region | Area feature that can represent a single area feature as more than one polygon (multipart polygons). | NUTS regions
BN | Boundary | Line feature separating polygon features. | NUTS boundary
LI | Line | Line feature representing a geographical entity. |
NW | Network | An interconnected set of lines representing possible paths from one location to another (routing aspect). | Road network
LB | Label | Point feature, used as a reference of a polygon. | Centroid of NUTS region
PT | Point | Feature modelled as point. | Settlement
ND | Node | End point of a line feature. | Road junctions
AN | Annotation | Text feature for annotating a map. | Ocean names
RT | Route | Linear feature specifying a path through a network. |
GR | Grid | A data format for storing raster data that defines geographic space as an array of equally sized square cells arranged in rows and columns. Each cell stores a numeric value that represents a geographic attribute (such as elevation) for that unit of space. | Digital elevation model
IM | Image | A raster-based representation or description of a scene, typically produced by an optical or electronic device, such as a camera or a scanning radiometer. | Satellite image

Examples:

The datatype “NUTS” models only NUTS regions. The class identification can be dropped. The entity type is a generic one: NUTS-RG, NUTS-BN, NUTS-LB.

The datatype “Urban Audit” models 3 different entities: Cities, Kernels and Large Urban Zones. The generic class identifications cannot be used. Instead, specific class identifications are needed: URAU-CITY-PO, URAU-CITY-LB, URAU-LUZO-PO.

1.1.2.5 Additional identifiers

Additional identifiers have to be used in order to uniquely name a feature or table. Additional identifiers are scale or precision, version or time stamp and the source.

1.1.2.6 Syntax rules

The class name will be built up by adding the following strings, in this order:
- Feature data theme short name (compulsory)
- Class identification short name (conditional)
- Entity type (compulsory)
- Scale or precision: 100K, 1M, 200M (conditional). K stands for “thousand”, M stands for “million” or “metres” (no lower case allowed)
- Version: Vxx (if needed)
- Time stamp (if applicable and needed)
- Source (if needed)

No spaces are allowed. The different segments will be joined by a “-”.

In order to get a stable and logical sort order for feature and object class names, the use of leading zeroes in the scale, version and time stamp segments should be considered.
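The syntax rules above can be sketched as a small name builder. This is an illustrative sketch in Python, not an official GISCO utility: the function name `build_class_name`, its parameters and the regular expressions for the scale and version segments are assumptions made for the example. The entity type codes are taken from Table 2, and the 30-character limit from section 1.1.1.

```python
import re
from typing import Optional

# Two-character entity type codes from Table 2.
ENTITY_TYPES = {"PO", "RG", "BN", "LI", "NW", "LB",
                "PT", "ND", "AN", "RT", "GR", "IM"}

# Assumed patterns: digits followed by upper-case K or M (e.g. 100K, 01M),
# and "V" followed by digits (e.g. V09).
SCALE_PATTERN = re.compile(r"\d+[KM]")
VERSION_PATTERN = re.compile(r"V\d+")

MAX_CLASS_NAME_LEN = 30


def build_class_name(theme: str,
                     entity_type: str,
                     class_id: Optional[str] = None,
                     scale: Optional[str] = None,
                     version: Optional[str] = None,
                     time_stamp: Optional[str] = None,
                     source: Optional[str] = None) -> str:
    """Join the name segments with "-" in the order given by the syntax rules."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    if scale is not None and not SCALE_PATTERN.fullmatch(scale):
        raise ValueError(f"invalid scale segment: {scale!r}")
    if version is not None and not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"invalid version segment: {version!r}")

    # Order: theme, class identification, entity type, scale,
    # version, time stamp, source. Omitted segments are skipped.
    segments = [theme, class_id, entity_type, scale, version, time_stamp, source]
    parts = [s for s in segments if s]

    for part in parts:
        # Spaces are forbidden by the rules; rejecting "-" inside a segment
        # is an assumption that keeps the joined name unambiguous.
        if " " in part or "-" in part:
            raise ValueError(f"segment must not contain spaces or '-': {part!r}")

    name = "-".join(parts)
    if len(name) > MAX_CLASS_NAME_LEN:
        raise ValueError(f"name exceeds {MAX_CLASS_NAME_LEN} characters: {name!r}")
    return name
```

For instance, `build_class_name("NUTS", "RG", scale="01M", time_stamp="2003")` yields the name `NUTS-RG-01M-2003`. Writing the scale as `01M` rather than `1M` also illustrates the leading-zero recommendation: with plain string sorting, `01M` sorts before `10M`, whereas `1M` and `10M` would not sort in numeric order relative to, say, `3M`.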
Examples:

NUTS-RG-01M-2003 (Feature data theme – entity type – scale – time stamp)

If the NUTS levels are separated in different feature classes, the feature class identification expressing the NUTS level should be added: NUTS-1-RG-01M-2003

If all the generalised versions are hosted in the same feature class, then the scale should be omitted in the feature class name: NUTS-RG-2003, although it could appear again in the subtypes that separately host the different scales.

NUTS-LB-2003 (Feature data theme – feature class identification – time stamp). In this case the NUTS level should be an attribute of the feature class. If the labels are separated in different feature classes by NUTS level, then the NUTS level should be part of the feature class identification: NUTS-3LAB-2003. Neither scale nor precision is applicable.

- STFD-PO-2000_2006 (Feature data theme – Feature class identification – time stamp. If only one scale is available, then “1M” is not needed. This information appears in the metadata)
- URAU-CITY-PO (Urban Audit cities)
- URAU-LUZO-PO (Urban Audit Large Urban Zones)
- URAU-CITY-LB (Urban Audit City labels)

In the current design of the Urban Audit dataset, the geometries of Urban Audit I, Urban Audit II and the French National Urban Audit share the same coverages. Consequently, neither version nor time stamp should form part of the feature class name.

In general, version and time stamp will only exceptionally appear at the same time. Time stamp is preferable to version. The version number does not give much information: “V9” only means that there were 8 versions before it, while “2003” gives a more precise idea of the validity of the feature class. The source name will rarely be needed.

A list of “class identifications” and “class identification short names” must be defined and carefully updated. Before defining a new “class identification”, it must be verified that none of the existing ones is suitable for the new class.
Names that are defined for entity types must not be used as class identifiers, e.g. label, region, etc.

Table 3: Example of class identification short names

Feature/object class identification | Short name
Airport | AIRP
City | CITY
Commune | COMM
Condominium | COND
Country | CTRY

1.1.2.7 Relationship names and role names in ArcGIS (Forward/Backward path label)

The name of a relationship in ArcGIS will be a noun (when possible a “-ship” noun) that gives a general description of the relationship. The role of the entities involved in a relationship (called in ArcGIS terminology “Forward / Backward path label”) will be a verb. The forward and backward path labels can use different verbs, but it is common practice to label one with an active form and the other with the same verb in passive form.

Example:
Relationship name: drainage
Forward path label: drains to
Backward path label: is drained by

On the one hand, these names will have no correspondent in the Oracle database. On the other hand, these relationships will be implemented as tables or attributes in Oracle, as described in the following paragraphs.

1.1.2.8 1-to-1 relationships between feature class and object class

In a correct design, the attributes of an object class having a 1-to-1 relationship to a feature class should be integrated in the feature class, i.e. the object class should not exist. This is not applicable when the object class has a 1-to-1 relationship to at least two feature classes (for instance, different generalisations of the same entity). In this case redundancy should be avoided and the attributes will be either integrated in one of the feature classes or simply separated in an object class. In this last case the name of the object class will consist of all the common concepts identified in the attributed feature classes.

For instance:
- What should the attributes of NUTS 2003 be called?
Let us assume that the following feature classes are defined:
- NUTS-3-RG-01M-2003
- NUTS-3-RG-03M-2003
- NUTS-3-RG-10M-2003
- NUTS-3-LB-2003

The object class that contains the attributes for all these feature classes would be “NUTS-3-RG-2003-ATTR”. The concept “Region” is included in the identifier “LB”, i.e. “Region label”.

Warning! This method does not guarantee that EVERY feature class with the words “NUTS”, “3”, “RG” and “2003” in its name is related to the object class “NUTS-3-RG-2003-ATTR”, but a mismatch is very unlikely if the feature types and feature classes are sensibly chosen.

1.1.2.9 1-to-many relationships

The 1-to-many relationships should be implemented as a foreign key on the “many” side of the relationship. The foreign key name should follow the conventions described in the paragraph “Attribute names”.

1.1.2.10 Many-to-many relationship names

Many-to-many relationships are physically stored as an object class. The name of the object class that represents a many-to-many relationship will be built up from the following segments:
- First end class/subclass name
- Second end class/subclass name
- A noun (when possible a “-ship” noun) that gives a general description of the relationship

When both ends belong to the same feature data theme, the datatype short name should be omitted. If both have the same time stamp and/or version, these should appear only once, in the second end.

The noun for describing the relationship has a maximum length of 8 characters. If the resulting table name exceeds 30 characters, the noun will be shortened to reach the maximum number of characters allowed.

Example:
COMM-COND-2001-MANAGMNT
NUTS-RG-BN-2003-DELIMIT

1.1.2.11 Domain names

Factually, all classes that are pointed to by a foreign key in another class are domains.
In this paragraph we will refer only to the “coded value domains”. According to their scope, domains can be classified as general purpose domains (used in several classes in different datatypes) or specific usage domains (strongly related to one single datatype). In order to make the naming conventions easier to understand and to apply, both types of domains will be treated equally.

The domain name will not contain any reference to the datatype it belongs to (if any). The name will be as descriptive as possible. Sometimes it requires an effort to find the word that best describes the criteria followed to define the domain. The forbidden words (“type”, “class” or “status”) should not be used in the domain name. The name should be composed of the attribute name and the extension DOM. In case the domain name does not uniquely identify a table, additional identifiers, such as time stamps, can be added to the name.

Example: The name of the domain for the attribute “ISO-LANG-CODE” will be “ISO-LANG-CODE-DOM”. The name “ISO-LANG-CODE-2001-DOM” identifies the domain for ISO language codes valid in 2001.

1.1.2.12 Attribute names

The common object-oriented nomenclature concatenates class and attribute to uniquely identify an attribute (example: “NUTS-3-RG-2003.NUTS-ID”). This way, an attribute name can be repeated in different feature or object classes. The attribute name will omit all references to the feature class identification, scale or time stamp that are already included in the feature or object class name.

Example:
Good: NUTS-ID
Bad: NUTS-3-2003-ID

1.1.2.13 Primary key

The first step in naming attributes will be to identify the conceptual “primary key”. The primary key will be named after the feature type identifier short name + “ID”. When this is not sufficient, the name will be “Feature type name – Feature/object class identification – ID”.

Examples:
- “NUTS-ID”.
We can have “NUTS-3-RG-2003.NUTS-ID” and “NUTS-2-LB-1999.NUTS-ID”
- “COMM-COND-ID”

It might happen that several IDs can be chosen (so-called “candidate keys”). In such a case we recommend using an internal or ad hoc defined ID. The other candidate keys can be considered (and named) as foreign keys to object classes (example: airports, where there are several codes available: ICAO, IATA, and others).

1.1.2.14 Foreign keys

The attributes that are foreign keys will keep the same name as the attribute they point to, except for the suffix: “ID” will be replaced by “CODE”. It might happen that this attribute name is not sufficient to distinctly identify where this key is pointing to. In such a case, a version or time stamp should be included:
- NUTS-3-RG-2003-CODE
- COMM-2001-CODE

This “extra” information (the level and the time stamp) must be included only if other attributes with level and time stamps are available in the same feature/object class. The attribute name is not meant to substitute the metadata or the data model!

For instance: let us assume that Communes 2003 are available. The time stamp for the foreign key “NUTS CODE” should be sufficient, since NUTS 2003 will be generated based on communes 2003. In the case of “airports”, for instance, the level can be omitted:
- NUTS-2003-CODE
- NUTS-1999-CODE
- NUTS-1995-CODE

1.1.2.15 Duplicated Foreign keys in the same class

It might happen that the same foreign key appears twice in the same class, each one as the result of the implementation of a different relationship or a different role (end-name). In this case the foreign key name would be duplicated. This conflict must be resolved as follows:
- When the foreign key implements a relationship to/from a subclass or subtype, then the name of the subclass/subtype takes precedence over the name of the more generic class.
- If the previous criterion is not applicable, the attribute name will be suffixed with the role name of the relationship implemented by the foreign key.

Example: A condominium (subclass or subtype of Commune) is co-administrated by several communes and/or higher administrative local units. The many-to-many relationship COMM-COND-2001-MANAGMNT will have two foreign keys both pointing to COMM-COMM-2001: one represents the condominium and the other one represents the administrator of the condominium.

Object class name: COMM-COND-2001-MANAGMNT
Attribute names:
- COMM-COND-2001-MANAGMNT-ID (internal ID)
- COMM-COND-CODE (foreign key to COMM-COMM-2001, subtype Condominium)
- COMM-CODE (foreign key to COMM-COMM-2001)
- NUTS-1999-CODE (foreign key to NUTS-RG-1999 and NUTS-LB-1999)

The time stamp “2001” has been omitted in “COMM-COND-CODE” and “COMM-CODE” since it coincides with the time stamp in the class name. On the other hand, the time stamp is strictly needed in the NUTS code since it does not refer to NUTS 2001 (which does not exist).

1.1.2.16 Classification attributes

“Type”, “Class”, “Classification”, “Status”: these words must be avoided in attribute names. Every classification is done according to well-defined criteria. The name of the classification, type or status attribute must be derived from these criteria.

Example:
- Degree of urbanisation or DEGU. Never use TYPE or “Commune Type”.
- Eligibility (there is no need to add the word “Status”)

Classifications should never be grouped. Grouping classifications is a very bad database design habit.

Example:

BAD:
Airport type:
- Military active
- Military inactive
- Civil active
- Civil and military inactive
- Others

GOOD:
Airport usage:
- Military
- Civil Public
- Civil Private
- Military and Civil
Airport operability:
- Active
- Inactive
- Not known

Types, classes and status will be related to a domain class.
In such a case, these attributes must be treated as foreign keys to their respective domain classes (examples: “Airport management code” and “airport operability code”).

1.1.2.17 Other attribute names

Attributes that are not foreign keys are obviously only related to the feature/object class that hosts them. Consequently, no prefix will be used. When it is necessary to identify the feature/object class among other synonyms, the OO notation will be used (“Communes 2003.name”).

“NAME”: Word (or small number of words) by which an individual person, animal, place or thing is spoken of or to. These attributes will be called simply “NAME”. It might happen that several names are available for the same feature/object class. In this case the attribute name can be distinguished by a suffix, usually the source.

Example: NAME-SABE, NAME-SIRE

“DESCRIPTION”: Verbal portrait or portraiture of a person, object or event, more or less complete.

1.1.2.18 Definition

The “description” attributes will be named “DESCRIPTION”. It is important not to mix up “descriptions” and “names”: the description, by definition, is rather long. No attribute can be named “definition”. Attributes named this way will be renamed as “description”.

1.1.2.19 Keywords

A list of keywords and short keywords will be developed and constantly updated. Before naming an attribute, the list of forbidden words and keywords should be consulted.
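Consulting the list of forbidden words can be automated. The following Python sketch is an illustration only: the function name find_forbidden_words is an assumption, and the dictionary holds just the forbidden words named in paragraph 1.1.2.16 and in Table 5, with the alternatives suggested there. A real list would be maintained alongside the keyword list.

```python
import re

# Forbidden words and suggested alternatives, taken from paragraph
# 1.1.2.16 and Table 5 of this document (not an exhaustive list).
CRITERIA_HINT = "name the criteria used for the classification"
FORBIDDEN_WORDS = {
    "TYPE": CRITERIA_HINT,
    "CLASS": CRITERIA_HINT,
    "CLASSIFICATION": CRITERIA_HINT,
    "STATUS": CRITERIA_HINT,
    "DEFINITION": "Description",
    "NATION": "Country",
    "STATE": "Country",
}


def find_forbidden_words(attribute_name):
    """Return (word, alternative) pairs for forbidden words in a name.

    Hypothetical helper: the name is split on "-", "_" and spaces and
    compared case-insensitively with the forbidden word list.
    """
    words = re.split(r"[-_ ]+", attribute_name.upper())
    return [(w, FORBIDDEN_WORDS[w]) for w in words if w in FORBIDDEN_WORDS]


print(find_forbidden_words("COMMUNE-TYPE"))
# [('TYPE', 'name the criteria used for the classification')]
print(find_forbidden_words("NUTS-ID"))
# []
```

Only whole segments are compared, so a name such as “PROTOTYPE-ID” would not be flagged. Whether that is desirable is a design choice left open here.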
Examples:

Table 4: Examples of keywords

Keywords | Abbreviation
Name | NAME
Description | DESC
Longitude | LON
Latitude | LAT
Altitude | ALT
Objective | OBJ
Country | CNTR
Code | CODE
Identifier | ID

Table 5: Examples of forbidden words and alternatives

Forbidden words | Alternative
Type | Name the criteria used for the classification
Class | Name the criteria used for the classification
Definition | Description
Sequential number | ID or code
Nation | Country
State (political concept) | Country
NUTS 0 | Country

1.1.2.20 Topic Category Names

Table 6: Topic Category Names

ISO 19115 Topic Category INSPIRE data theme 01 - Farming Agricultural and Aquaculture Facilities Habitats and biotopes Biogeographical Regions Habitats and biotopes Statistical Units 02 - Biota 02 - Biota 02 - Biota 03 - Boundaries 03 - Boundaries 03 - Boundaries 04 - Climatology / Meteorology / Atmosphere 05 - Economy (14 Oceans) 06 - Elevation 06 - Elevation 07 - Environment 07 - Environment 08 - Geo-scientific Information 08 - Geo-scientific Information 08 - Geo-scientific Information 08 - Geo-scientific Information 08 - Geo-scientific Information (14 - Oceans) 10 - Imagery/Base maps/Earth cover 12 - Inland waters 12 - Inland waters 12 - Inland waters (07 Elevation) 13 - Locations 13 - Locations (01 Farming) 13 - Locations (16 Society) 13 - Locations (16 Society) 14 - Oceans 14 - Oceans 15 - Planning/Cadastre 15 - Planning/Cadastre Administrative Units Administrative Units Meteorological Spatial Features Area Management Feature Type (Long name) Farm Accountancy Data Network (FADN) Natural Vegetation Biogeographical Zones Biotopes Territorial Units for Statistics (NUTS + Statistical Regions) Communes Subcommunes Climate Short name FADN VEGT BIOG BIOT NUTS COMM SCOM CLIM Fishing Areas FISH Elevation Oceanographic Features Natural Risk Zones Protected sites Natural Risk Zones Digital Elevation Model Bathimetry Land Quality Designated Areas Soil Erosion Risk DEM BATH LNQU DSIG SOER Natural Risk Zones ERTR Soil Geology Geomorphology
ErosionTrend Soil Geology Sediments Discharges SDDS Natural Risk Zones Coastal Erosion COER Land Cover Land Cover LCOV Hydrography Hydrography Hydrography Water Patterns Lakes Watersheds WTPT LAKE WTSH Geographical Grids Geographical Grids Geographical Grid LUCAS GGGR LUCA Geographical Names Settlements STTL Geographical Names Gazetteer GAZZ Sea Regions Oceanographic Features Zones and Reporting Units Zones and Reporting Coastline boundaries Sea Level rise Inter Regional COAS SELV IREG Leader Zones LEAD SOIL 16 15 - Planning/Cadastre 15 - Planning/Cadastre 15 - Planning/Cadastre 15 - Planning/Cadastre 16 - Society 16 - Society 18 - Transportation 18 - Transportation 18 - Transportation 18 - Transportation 18 - Transportation 19 - Utilities / Communication 19 - Utilities / Communication 19 - Utilities / Communication Units Zones and Reporting Units Zones and Reporting Units Zones and Reporting Units Zones and Reporting Units Population distribution Demography Population distribution Demography Transport Networks Transport Networks Transport Networks Transport Networks Transport Networks Production and industrial facilities Production and industrial facilities Production and industrial facilities Less Favoured Areas LFAV National Support NTSU Structural Funds Zones STFU Urban Audit URAU Population POPU Degree of urbanisation DGUR Airports Ferry links Ports Road infrastructure Railway infrastructure Nuclear Power AIRP FERR PORT ROAD RAIL NUPW Energy Production ENPR Energy Transport ENTR 17 1.2 Metadata This section describes the metadata standard that should be used for data intended for the GISCO Reference database, and an example of how to produce and validate metadata to the metadata standard using ArcGIS. Chapter 3.9 of this document discusses metadata in concern with data quality and includes an example of data quality reporting. 1.2.1 The Metadata Standard The ISO 19115 consists of a comprehensive schema for describing geographic data. 
The schema comprises a total of more than 300 elements, of which 23 are core elements and 12 are mandatory for compliance with the international standard. The mandatory elements focus on the discovery aspect of the metadata. Besides information on the metadata itself, they provide information on the title, the category, the reference date, the geographic location, a short description of the data and the data provider. The core set expands the mandatory elements with additional information on the type, the scale, the format, the reference system and the data lineage. These elements give rough information on the potential usage of the data.

1.2.2 The GISCO Profile

For shared usage of spatial data within the Commission, additional information on the data might be necessary. A metadata profile comprises at least the mandatory elements of the ISO 19115 standard and defines additional elements from the ISO 19115 standard. The standard even describes a way of extending the profile to elements that are not part of the current standard. To sum up, a profile contains elements of the standard plus elements that extend the standard in a pre-defined way. The profile can change the obligation of elements from optional to mandatory. Usually, a profile is developed if the core elements are not sufficient for describing the data according to the needs of an organization.

The GISCO Profile is described in the document “D8.A.1Gisco Metadata Model Description”. The starting point for the implementation of this profile is the “Eurosion Metadata Model V3”, to which two packages have been added: the “Metadata Attribute Information” and the “Legislation Information”. This profile is also compatible with the INSPIRE model.
1.2.3 The mandatory elements of the GISCO profile

The exhaustive list of elements of the GISCO profile can be found in Appendix I. The following elements in the metadata editor and the validation are mandatory for the GISCO profile:
• Dataset title
• Dataset date (date, date type [creation, revision, publication])
• Responsible Party (either individual or position name and role)
• Dataset language
• Topic Category
• Spatial Resolution (equivalent scale, distance)
• Abstract
• Spatial representation type
• Reference System
• Geographic Location
• Lineage
• Metadata Point of contact (either individual or position name and role)
• Metadata date stamp (date, date type [creation, revision, publication])

The following elements are conditional:
• Dataset characterset
• Metadata language (if not defined by encoding)
• Metadata characterset (if ISO 10646-1 not used)

1.2.4 The Metadata Editor

To decide whether or not a data source is suitable to use in your map, you often need more information than its basic properties and a look at its features. You may need information about the data's accuracy, or how a set of measurements was collected. An item's metadata may include this type of documentation along with many properties that have been derived from the data automatically. The Metadata tab presents this information in an easy-to-read format.

You can view the same set of metadata in many ways by choosing a different stylesheet from the dropdown list on the Metadata toolbar. Stylesheets are similar to queries that select and process some data from a database and present the results as a report. Each stylesheet converts the metadata into a different-looking HTML page. You can explore its content as you would any HTML page in a browser. The metadata synchronisers ensure the automatic update of the metadata when the properties of geometric features change in the Geodatabase.

Metadata in ArcCatalog consists of properties and documentation.
Properties, such as the extent of a shapefile's features, are derived from the item itself. Documentation is descriptive information supplied by a person. By default, when you try to view an item's metadata, ArcCatalog will create it for you automatically if it doesn't already exist; it will then add many of the item's properties to it. Once created, metadata becomes part of the item itself. It is automatically moved, copied, and deleted along with the item.

Every time you view the metadata, ArcCatalog automatically updates the properties recorded in it with current values. This ensures that the metadata is kept up to date with changes to the data source. For example, the extent and count of a shapefile's features will be current when you look at its metadata, even if new features were recently added. If required, metadata can be imported into or exported from the Geodatabase in the form of XML files.

The current ESRI release of the ISO editor in ArcCatalog is only intended to support the "core" elements as defined by ISO. Therefore an ISO wizard editor has been developed in Visual Basic and ArcObjects. It manages the metadata of the core elements of the ISO norm and the additional elements.

1.2.5 Metadata Editor Use (Within ArcCatalog)

This chapter first describes how to visualize GISCO XML files with the Metadata Editor (using the GISCO metadata stylesheet made for this purpose). The second step describes the import of an XML file compliant with the GISCO metadata scheme into another XML file using the ISO_GISCO metadata scheme, which is compatible with the standard ESRI Metadata Editor, so that metadata files can be modified, updated or completed. The use of this Standard Metadata Editor Wizard is then shown. The last step describes how to export the modified ISO_GISCO XML file into a GISCO XML file compliant with the GISCO XML metadata scheme.
1.2.5.1 Visualisation of GISCO XML files

To view delivered GISCO-compliant XML files, proceed as follows:
- Copy the Eurosion stylesheet Gisco_Metadata.xsl into the directory: [location where ArcGIS is installed]…\Metadata\Stylesheets\
- Launch the ArcCatalog application.
- Select the tab called Metadata.
- Activate the menu View -> Toolbars -> Metadata; a window called Stylesheet appears.
- Choose “Gisco_Metadata”, previously installed.

Metadata compliant with the GISCO metadata stylesheet can now be viewed with the ArcCatalog tool.

Note: Under Windows 2000, the ArcCatalog application may need to be closed and re-launched for the modification to take effect.

1.2.5.2 Import of GISCO XML files

This function allows the conversion of a GISCO XML file compliant with the GISCO XML SCHEMA into an internal ESRI XML format. This operation is needed to allow metadata to be updated with the ArcGIS Standard Editor Wizard. In the example, the AdministratedArea XML metadata file is displayed with the GISCO_Metadata stylesheet.

The import is launched by FIRST selecting the file to import and then pushing the button corresponding to Import of Metadata. Browse to the XML file to be imported.

IMPORTANT: Disable the option “Enable automatic update of metadata” by unticking the box.

After validation with 'OK', the file has been imported. Its visualization now requires the use of the ISO_GISCO stylesheet. Visualizing the imported file using the ISO_GISCO stylesheet will allow the editing and modification of the imported file (in memory) with the Standard ArcGIS Metadata Editor Wizard. This is described in the next paragraph.

1.2.5.3 Editing metadata

Once the imported XML file is displayed using the ISO_GISCO stylesheet, the Metadata Editor Wizard is accessible through its toolbar button.

1.2.5.4 Export metadata to an XML format compliant with the GISCO XML SCHEMA

This function converts the previously modified ISO_GISCO XML file into a format compatible with the GISCO XML SCHEMA.
The export is launched by FIRST selecting the file to export and then pushing the button corresponding to Export of Metadata. The metadata file exported into an XML file compliant with the GISCO XML SCHEMA can be displayed by changing the stylesheet and selecting the GISCO_Metadata one. The file has been correctly exported and is now modified and still compliant with the GISCO Metadata Model.

1.2.6 Validation of compliance of metadata to GISCO profile

A Java tool is available to validate the compliance of newly created metadata with the GISCO profile. The tool has been built using the Xerces2 Java Parser 2.6.2 API. It is launched from the command line and takes the path to the XML instance to be validated as a parameter. The schema file to be used is indicated within the XML instance.

The steps for using this tool are as follows:
1. First, unzip the JavaValidator.zip file to a folder (e.g. myFolder).
2. Open a command line window (command cmd.exe) and change to the myFolder directory (i.e. c:>cd …\myFolder).
3. To run the tool, use the command java -jar.

Make sure to specify the jar file “giscoXmlValidator.jar” containing the Java classes and the path to the XML instance to be validated (here xml/catalog.xml). Make sure to put the XercesJar folder containing the required API in the same directory as giscoXmlValidator.jar. Also make sure that the xsd schema file used for the validation is correctly indicated in the value of the xsi:schemaLocation attribute of the root element in the XML instance file.
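Before running the validator, the xsi:schemaLocation value of an instance can be inspected with a few lines of Python. This is only a convenience sketch using the standard library, not a replacement for the Java tool, and it performs no validation. The helper name schema_location_pairs and the schema file name gisco.xsd in the sample are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_pairs(xml_text):
    """Return (namespace, schema file) pairs from the root element.

    xsi:schemaLocation holds whitespace-separated pairs of a namespace
    URI followed by the location of its schema file.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}schemaLocation" % XSI_NS)
    if value is None:
        return []
    tokens = value.split()
    return list(zip(tokens[0::2], tokens[1::2]))


# Hypothetical instance: "gisco.xsd" stands in for the real schema path.
sample = (
    '<MD_Metadata xmlns="http://www.giscoLN.org/metadataModel/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.giscoLN.org/metadataModel/ gisco.xsd"/>'
)
print(schema_location_pairs(sample))
# [('http://www.giscoLN.org/metadataModel/', 'gisco.xsd')]
```

Because the attribute value is split on whitespace, a schema path containing spaces would be reported incorrectly by this sketch.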
For example:

<MD_Metadata xmlns="http://www.giscoLN.org/metadataModel/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.giscoLN.org/metadataModel/ D:\GISCO\WorkingDirectory\GiscoXsd\xsd_LongNames\gisco_LN_20050426.xsd" … >

(NB: please use the backslash “\” symbol as separator in the absolute path to the xsd file.)

Then run the tool. The program writes a result message to the standard output and also to the log.txt log file located in the same directory as the giscoXmlValidator.jar file. If validation succeeds, a success message is displayed and the same message is written to the log.txt file. Otherwise, error messages are displayed.

1.2.7 Metadata Server

ArcIMS provides built-in functions to set up a metadata catalogue service. Metadata can be published, i.e. transferred from feature classes to the metadata server, and be used for data discovery. A web application running in a browser is able to search the metadata server for information that is contained in the metadata and retrieve the corresponding data or map service. Additionally, metadata can be organised hierarchically and browsed in order to find the desired spatial data.

A metadata server is part of the internet services provided by GISCO. GISCO has set up a metadata server that contains information on the spatial data of the GISCO reference database as it is visible through the Intranet service. The feature classes and datasets are grouped according to the ISO 19115 Topic Category Codes. For the time being, the INSPIRE spatial data themes are considered too unstable to be used as a basis for categorisation.

The metadata server is based on ArcIMS. Each metadata entry of spatial data is completed by a map service that is available for other web services too. The server is managed by GISCO; the content and the web map services are also managed by GISCO. Users are able to read and search the metadata server.
The server is configured in close cooperation with the INSPIRE developments, i.e. the server will be used as a node of a broader spatial data infrastructure. In addition to read-only access, selected users have authoring rights, i.e. these users are able to publish metadata on the GISCO server. In this case, there is no direct link from the metadata to the data held by another DG. Interested users have to approach the responsible persons if they wish to use the spatial data. This approach is chosen in order to ensure that data can be discovered, on the one hand, and to ensure the data integrity of the reference database on the other hand.

Technically, the metadata of the feature classes in the spatial database have been copied (published) to the metadata server. Additionally, a map service has been created that provides the link between metadata and data. Metadata authors outside GISCO will simply publish their data to the metadata server.

The intranet address of the Metadata Server is www.gisco.eurostat.cec/metadatacatalog.

1.3 Spatial Reference System

1.3.1 Introduction

The European Terrestrial Reference System 1989 (ETRS89) is the geodetic datum for Pan-European spatial data collection, storage and analysis [1]. This is based on the GRS80 ellipsoid and is the basis for a coordinate reference system using ellipsoidal coordinates. For many Pan-European purposes a plane coordinate system is preferred. But the mapping of ellipsoidal coordinates to plane coordinates cannot be made without distortion in the plane coordinate system. Distortion can be controlled, but not avoided.

Figure 1: ETRS89 geodetic datum

For many purposes the plane coordinate system should have minimum distortion of scale and direction. This can be achieved through a conformal map projection. The ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn) is recommended for conformal Pan-European mapping at scales larger than 1:500 000.
For Pan-European conformal mapping at scales smaller than or equal to 1:500 000, the ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC) is recommended. With conformal projection methods, attributes such as area will not be free of distortion. For Pan-European statistical mapping at all scales, or for other purposes where true area representation is required, the ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA) is recommended [2].

Figure 2: ETRS-LAEA Coordinate System

[1] See: Annoni, A., Luzet, C. (Eds) (2000) Proceedings of the workshop “Spatial Reference System for Europe”, Marne la Vallée, 23-24 Nov. 1999, EUR19575/EN
[2] See: Annoni, A., Luzet, C., Gubler, E., Ihde, J. (Eds) (2001) Map Projections for Europe, EUR20120/EN

The ETRS89 datum and the above projections are described below.

1.3.2 ETRS89 Ellipsoidal Coordinate Reference System (ETRS89)

1.3.2.1 ETRS89 Description

The European Terrestrial Reference System 1989 (ETRS89) is the geodetic datum for Pan-European spatial data collection, storage and analysis. This is based on the GRS80 ellipsoid and is the basis for a coordinate reference system using ellipsoidal coordinates. The ETRS89 Ellipsoidal Coordinate Reference System (ETRS89) is recommended to express and to store positions, as far as possible.

1.3.2.2 ETRS89 Definition

The next table contains the full description of the ETRS89 Ellipsoidal Coordinate Reference System (ETRS89) following ISO 19111 Spatial referencing by coordinates.

The coordinate lines of the Ellipsoidal Coordinate System are curvilinear lines on the surface of the ellipsoid. They are called parallels for constant latitude (phi) and meridians for constant longitude (lambda). When the ellipsoid is related to the shape of the Earth, the ellipsoidal coordinates are named geodetic coordinates. The term geographic coordinate system usually implies a geodetic coordinate system.
Table 7: ETRS89 Definition

Entity | Value
CRS ID | ETRS89
CRS alias | ETRS89 Ellipsoidal CRS
CRS valid area | Europe
CRS scope | Geodesy, Cartography, Geoinformation systems, Mapping
Datum ID | ETRS89
Datum alias | European Terrestrial Reference System 1989
Datum type | geodetic
Datum realization epoch | 1989
Datum valid area | Europe / EUREF
Datum scope | European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks | see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213, or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID | Greenwich
Prime meridian Greenwich longitude | 0°
Ellipsoid ID | GRS 80
Ellipsoid alias | New International
Ellipsoid semi-major axis | 6 378 137 m
Ellipsoid shape | TRUE
Ellipsoid inverse flattening | 298.2572221
Ellipsoid remarks | see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID | Ellipsoidal Coordinate System
Coordinate system type | geodetic
Coordinate system dimension | 3
Coordinate system axis name | geodetic latitude
Coordinate system axis direction | North
Coordinate system axis unit identifier | degree
Coordinate system axis name | geodetic longitude
Coordinate system axis direction | East
Coordinate system axis unit identifier | degree
Coordinate system axis name | ellipsoidal height
Coordinate system axis direction | up
Coordinate system axis unit identifier | metre

If the origin of a right-handed Cartesian coordinate system coincides with the centre of the ellipsoid, the Cartesian Z-axis coincides with the axis of rotation of the ellipsoid and the positive X-axis passes through the point "phi" = 0, "lambda" = 0.
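The relation between the ellipsoidal coordinates and the Cartesian system just described can be made concrete with the standard closed-form conversion. The Python sketch below is an illustration and not part of the GISCO specification: the function name geodetic_to_cartesian is an assumption, and the inverse flattening is entered with the full GRS80 value 298.257222101, of which Table 7 quotes the rounded form 298.2572221.

```python
import math

A = 6378137.0          # GRS80 semi-major axis in metres (Table 7)
INV_F = 298.257222101  # GRS80 inverse flattening (Table 7: 298.2572221)


def geodetic_to_cartesian(lat_deg, lon_deg, h=0.0):
    """Convert geodetic latitude, longitude (degrees) and ellipsoidal
    height (metres) to geocentric Cartesian X, Y, Z in metres."""
    f = 1.0 / INV_F
    e2 = f * (2.0 - f)  # first eccentricity squared
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    # Radius of curvature in the prime vertical
    n = A / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - e2) + h) * math.sin(phi)
    return x, y, z


# The point phi = 0, lambda = 0 lies on the positive X-axis at distance A.
print(geodetic_to_cartesian(0.0, 0.0))
# (6378137.0, 0.0, 0.0)
```

At the pole (phi = 90°) the same function returns a Z value equal to the semi-minor axis, about 6 356 752.314 m, which is a convenient check on the ellipsoid parameters.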
1.3.3 ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn)

1.3.3.1 ETRS-TMzn Description

The ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn) is identical to the Universal Transverse Mercator (UTM) grid system for the northern hemisphere applied to the ETRS89 geodetic datum and the GRS80 ellipsoid.

The UTM system was developed for worldwide application between 80° S and 84° N with the following basic features:

1. 60 zones of 6° longitudinal extension numbered consecutively from 1 to 60, beginning with number 1 for the zone between 180° W and 174° W and continuing eastward
2. central meridian scale factor of 0.9996, producing two lines of secancy approximately 180 000 m East and West of the central meridian
3. negative coordinates are avoided by assigning a false easting value of 500 000 m East at the central meridian, and false northing values at the equator of 0 m for the northern hemisphere and 10 000 000 m for the southern hemisphere
4. uniform conversion formulas from one zone to another
5. unique referencing for all zones in a plane rectangular coordinate system
6. meridional convergence (between the true and grid North) to be less than 5°
7. map distortion within the zones to be less than 1:2,500

ETRS-TMzn is a series of zones, where “zn” in the identifier is the zone number. Each zone runs from the equator northwards to latitude 84° North and is 6 degrees wide in longitude, reckoned from the Greenwich prime meridian. Zone 31 is centred on 3° East and is used between 0° and 6° East, zone 32 is centred on 9° East and is used between 6° and 12° East, etc.

Table 8: Zones of the ETRS-TMzn.
Zone number (zn)  Longitude of origin  West limit  East limit  South limit  North limit
26                27° West             30° West    24° West    0° North     84° North
27                21° West             24° West    18° West    0° North     84° North
28                15° West             18° West    12° West    0° North     84° North
29                9° West              12° West    6° West     0° North     84° North
30                3° West              6° West     0° West     0° North     84° North
31                3° East              0° East     6° East     0° North     84° North
32                9° East              6° East     12° East    0° North     84° North
33                15° East             12° East    18° East    0° North     84° North
34                21° East             18° East    24° East    0° North     84° North
35                27° East             24° East    30° East    0° North     84° North
36                33° East             30° East    36° East    0° North     84° North
37                39° East             36° East    42° East    0° North     84° North
38                45° East             42° East    48° East    0° North     84° North
39                51° East             48° East    54° East    0° North     84° North

1.3.3.2 ETRS-TMzn Definition

Table 9: ETRS-TMzn Definition

Entity: Value
CRS ID: ETRS-TMzn
CRS remarks: zn is the zone number, starting with 1 on the zone from 180° West to 174° West, increasing eastwards to 60 on the zone from 174° East to 180° East
CRS alias: ETRS89 Transverse Mercator CRS
CRS valid area: Europe
CRS scope: CRS for conformal pan-European mapping at scales larger than 1:500,000
Datum ID: ETRS89
Datum alias: European Terrestrial Reference System 1989
Datum type: geodetic
Datum realization epoch: 1989
Datum valid area: Europe / EUREF
Datum scope: European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks: see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213, or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID: Greenwich
Prime meridian Greenwich longitude: 0°
Ellipsoid ID: GRS 80
Ellipsoid alias: New International
Ellipsoid semi-major axis: 6 378 137 m
Ellipsoid shape: TRUE
Ellipsoid inverse flattening: 298.2572221
Ellipsoid remarks: see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics 115
Coordinate system ID: TMzn
Coordinate system type: projected
Coordinate system dimension: 2
Coordinate system remarks: Projection: Transverse Mercator in zones, 6° width
Coordinate system axis name: N
Coordinate system axis direction: North
Coordinate system axis unit identifier: metre
Coordinate system axis name: E
Coordinate system axis direction: East
Coordinate system axis unit identifier: metre
Operation ID: TMzn
Operation valid area: Europe
Operation scope: for conformal pan-European mapping at scales larger than 1:500,000
Operation method name: Transverse Mercator Projection
Operation method name alias: TMzn
Operation method formula: Transverse Mercator Mapping Equations, in Hooijberg, Practical Geodesy, 1997, pages 81-84, 111-114
Operation method parameters number: 7
Operation parameter name: latitude of origin
Operation parameter value: 0°
Operation parameter remarks: 0°, the Equator
Operation parameter name: longitude of origin
Operation parameter value: central meridian (CM) of each zone
Operation parameter remarks: central meridians ..., 3° W, 3° E, 9° E, 15° E, 21° E, ...
Operation parameter name: false northing
Operation parameter value: 0 m
Operation parameter remarks:
Operation parameter name: false easting
Operation parameter value: 500 000 m
Operation parameter remarks:
Operation parameter name: scale factor at central meridian
Operation parameter value: 0.9996
Operation parameter remarks:
Operation parameter name: width of zones
Operation parameter value: 6°
Operation parameter remarks:
Operation parameter name: latitude limits of system
Operation parameter value: 0° N and 84° N
Operation parameter remarks:

Note that the axes abbreviations for ETRS-TMzn and ETRS-LCC are N and E, whilst for ETRS-LAEA they are Y and X.

1.3.4 ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC)

1.3.4.1 ETRS-LCC Description

The ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC) is a single projected coordinate reference system for all of the pan-European area, applied to the ETRS89 geodetic datum and the GRS80 ellipsoid. Because the area has a greater extent in longitude than in latitude, a Lambert Conformal Conic projection with two standard parallels is used. The scale factor is a function only of the latitudes of the standard parallels and the latitude of the point where it is computed.

1.3.4.2 ETRS-LCC Definition

Table 10: ETRS-LCC Definition

Entity: Value
CRS ID: ETRS-LCC
CRS alias: ETRS89 Lambert Conformal Conic CRS
CRS valid area: Europe
CRS scope: CRS for conformal pan-European mapping at scales smaller than or equal to 1:500,000
Datum ID: ETRS89
Datum alias: European Terrestrial Reference System 1989
Datum type: geodetic
Datum realization epoch: 1989
Datum valid area: Europe / EUREF
Datum scope: European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks: see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213, or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID: Greenwich
Prime meridian Greenwich longitude: 0°
Ellipsoid ID: GRS 80
Ellipsoid alias: New International
Ellipsoid semi-major axis: 6 378 137 m
Ellipsoid shape: TRUE
Ellipsoid inverse flattening: 298.2572221
Ellipsoid remarks: see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID: LCC
Coordinate system type: projected
Coordinate system dimension: 2
Coordinate system axis name: N
Coordinate system axis direction: North
Coordinate system axis unit identifier: metre
Coordinate system axis name: E
Coordinate system axis direction: East
Coordinate system axis unit identifier: metre
Operation ID: LCC
Operation valid area: Europe
Operation scope: for conformal pan-European mapping at scales smaller than or equal to 1:500,000
Operation method name: Lambert Conformal Conic Projection with 2 standard parallels
Operation method formula: Lambert Conformal Conic Projection, in Hooijberg, Practical Geodesy, 1997, pages 133-139
Operation method parameters number: 6
Operation parameter name: lower parallel
Operation parameter value: 35° N
Operation parameter remarks:
Operation parameter name: upper parallel
Operation parameter value: 65° N
Operation parameter remarks:
Operation parameter name: latitude grid origin
Operation parameter value: 52° N
Operation parameter remarks:
Operation parameter name: longitude grid origin
Operation parameter value: 10° E
Operation parameter remarks:
Operation parameter name: false northing
Operation parameter value: 2 800 000 m
Operation parameter remarks:
Operation parameter name: false easting
Operation parameter value: 4 000 000 m
Operation parameter remarks:

1.3.5 ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA)

1.3.5.1 ETRS-LAEA Description

The ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA) is a single projected coordinate reference system for all of the Pan-European area. It is based on the ETRS89 geodetic datum and the GRS80 ellipsoid. Its defining parameters are given in the following table according to ISO 19111 Spatial referencing by coordinates.

1.3.5.2 ETRS-LAEA Definition

Table 11: ETRS-LAEA Definition

Entity: Value
CRS ID: ETRS-LAEA
CRS alias: ETRS89 Lambert Azimuthal Equal Area CRS
CRS valid area: Europe
CRS scope: CRS for Pan-European statistical mapping at all scales or other purposes where true area representation is required
Datum ID: ETRS89
Datum alias: European Terrestrial Reference System 1989
Datum type: geodetic
Datum realization epoch: 1989
Datum valid area: Europe / EUREF
Datum scope: European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks: see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213, or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines
Prime meridian ID: Greenwich
Prime meridian Greenwich longitude: 0°
Ellipsoid ID: GRS 80
Ellipsoid alias: New International
Ellipsoid semi-major axis: 6 378 137 m
Ellipsoid shape: TRUE
Ellipsoid inverse flattening: 298.2572221
Ellipsoid remarks: see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID: LAEA
Coordinate system type: projected
Coordinate system dimension: 2
Coordinate system axis name: Y
Coordinate system axis direction: North
Coordinate system axis unit identifier: metre
Coordinate system axis name: X
Coordinate system axis direction: East
Coordinate system axis unit identifier: metre
Operation ID: LAEA
Operation valid area: Europe
Operation scope: for Pan-European statistical mapping at all scales or other purposes where true area representation is required
Operation method name: Lambert Azimuthal Equal Area Projection
Operation method formula: US Geological Survey Professional Paper 1395, "Map Projections - A Working Manual" by John P. Snyder
Operation method parameters number: 4
Operation parameter name: latitude of origin
Operation parameter value: 52° N
Operation parameter name: longitude of origin
Operation parameter value: 10° E
Operation parameter remarks:
Operation parameter name: false northing
Operation parameter value: 3 210 000.0 m
Operation parameter remarks:
Operation parameter name: false easting
Operation parameter value: 4 321 000.0 m
Operation parameter remarks:

With these defining parameters, locations north of 25° latitude have positive grid northing and locations east of 30° West longitude have positive grid easting.

Note that the axes abbreviations for ETRS-LAEA are Y and X, whilst for ETRS-LCC and ETRS-TMzn they are N and E.

Caution: All EU projections are based on the ETRS89 datum and therefore use ellipsoidal formulas. In some GIS applications the Lambert Azimuthal Equal Area method is implemented only in spherical form. Geodetic latitude and longitude must not be used in these spherical implementations. To do so may cause significant error (up to 15 km). Use the example conversions above to test whether software uses appropriate formulas.

1.4 Grid Creation Standards

1.4.1 Introduction

A grid for representing thematic information is a system of regular and geo-referenced cells, with a specified shape and size, and an associated property.
The Workshop on European Reference Grids, held in Ispra in October 2003, recommended the adoption of a common European Grid Reference System for Reporting and Statistical Analysis³. The proposal for a European grid coding system is based on the initial discussion during the workshop, which continued during the preparation of the short proceedings. A decimal grid coding system was initially proposed by Albrecht Wirthmann during the workshop. Subsequently, during the consultation phase, a second proposal was submitted by Mark Greaves. The discussion involved other experts (e.g. Lars Bernard, Andrus Meiner) and demonstrated the need to modify the initial proposals by introducing additional levels between the hierarchical levels of the proposed decimal grid system. After the advantages and disadvantages of the different solutions had been analysed, a new proposal was formulated and put forward for adoption as a European specification for INSPIRE. This system is described in the following paragraphs.

1.4.2 European Grid Coding System

1.4.2.1 Basic assumptions and definitions

Coordinate Reference System

The geographical location of the grid points is based on the Lambert Azimuthal Equal Area coordinate reference system (ETRS-LAEA) as defined by the Spatial Reference and the Map Projections workshops in Marne la Vallée (1999, 2001) 22 . The cartographic projection is centred on the point N 52°, E 10°. The coordinate system is metric.

1.4.2.2 Hierarchical Structure

The grid is defined as a hierarchical grid in metric coordinates in powers of 10. The hierarchical structure determines the structure of the grid coding system.
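
Because the grid is anchored to ETRS-LAEA, any position held in geodetic coordinates has to be projected before a grid cell can be assigned to it. The following is a minimal sketch of the forward projection, assuming the ellipsoidal Lambert Azimuthal Equal Area equations as given in Snyder's working manual and the ETRS-LAEA parameters (origin 52° N, 10° E, false easting 4 321 000 m, false northing 3 210 000 m). The function and constant names are illustrative, and the sketch is not a substitute for a tested projection library.

```python
import math

A = 6378137.0                  # GRS80 semi-major axis (m)
INV_F = 298.2572221            # GRS80 inverse flattening as printed in the tables
LAT0_DEG, LON0_DEG = 52.0, 10.0              # projection centre
FALSE_E, FALSE_N = 4321000.0, 3210000.0      # false easting / northing (m)

F = 1.0 / INV_F
E2 = F * (2.0 - F)             # first eccentricity squared
E = math.sqrt(E2)


def _q(phi):
    """Authalic helper function q for latitude phi (radians)."""
    s = math.sin(phi)
    return (1.0 - E2) * (s / (1.0 - E2 * s * s)
                         - math.log((1.0 - E * s) / (1.0 + E * s)) / (2.0 * E))


def etrs89_to_laea(lat_deg, lon_deg):
    """Project geodetic ETRS89 degrees to (easting X, northing Y) in metres.

    Hypothetical helper; uses the ellipsoidal (not spherical) formulas.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi0, lam0 = math.radians(LAT0_DEG), math.radians(LON0_DEG)
    qp = _q(math.pi / 2.0)
    beta = math.asin(_q(phi) / qp)       # authalic latitude of the point
    beta0 = math.asin(_q(phi0) / qp)     # authalic latitude of the origin
    rq = A * math.sqrt(qp / 2.0)         # radius of the authalic sphere
    m0 = math.cos(phi0) / math.sqrt(1.0 - E2 * math.sin(phi0) ** 2)
    d = A * m0 / (rq * math.cos(beta0))
    dlam = lam - lam0
    b = rq * math.sqrt(2.0 / (1.0 + math.sin(beta0) * math.sin(beta)
                              + math.cos(beta0) * math.cos(beta) * math.cos(dlam)))
    x = b * d * math.cos(beta) * math.sin(dlam)
    y = (b / d) * (math.cos(beta0) * math.sin(beta)
                   - math.sin(beta0) * math.cos(beta) * math.cos(dlam))
    return FALSE_E + x, FALSE_N + y
```

The projection centre itself maps to the false easting and false northing, which gives a simple first check of any implementation.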
Figure 3: Hierarchical grid structure for the first 3 levels

3 See: Annoni, A., (Ed) (2005) European Reference Grids, workshop proceedings, EUR21494/EN

1.4.2.3 Code Definition

In agreement with the workshop recommendations, the coding system must satisfy the following principles:

· easy to manipulate,
· hierarchical,
· having a European Unique Code Identifier

For these reasons all systems proposed in the following sections are based on the coordinates of the grid. For clarity, all examples refer to the same pair of “raw” coordinates (5780354, 436102), given in metres.

1.4.2.4 Ordering of Axes

It is assumed that the first coordinate (in the example 5780354) identifies the Easting of the point, i.e. the coordinate value along the west-to-east axis. The second coordinate (in the example 436102) identifies the Northing of the point, i.e. the coordinate value along the south-to-north axis.

Grid code identifies south-western corner of a cell

To derive a code at a coarser resolution than that of the given pair of coordinates, truncation is always used, i.e. the grid code coordinates for a coarser resolution always describe the lower left corner of the cell that includes the given coordinates.

1.4.2.5 Direct Coordinate Coding System

This coding system concatenates the Easting and Northing coordinates of a grid point. The length of the coordinates defines the precision of the grid. A grid with a precision of 1 m would require a maximum of 7 digits for each dimension, so the resulting code would have 14 digits. A grid with a precision of 1 km would be defined by a code comprising 8 digits. Leading zeros are coded in order to preserve the precision information.

Figure 4: Direct Coordinate Coding System for 100 km & 1000 km resolution

1.4.2.6 Quad-tree Subdivision

The difference in resolution between the different hierarchical levels of the proposed decimal grid system is rather large. For some applications, it might be necessary to insert additional levels in between. This could be done by simply dividing a grid cell into 4 equally sized sub cells. Thus, a grid with a spacing of 1 km could be divided into cells of 500 m length. A second level could be introduced by dividing each sub cell again into 4 equally sized cells of 250 m length. A further subdivision would lead to grid cells of 125 m length. This is close to the next lower hierarchical level of the decimal grid. Therefore, it is suggested to introduce a maximum of 2 subdivisions for each grid level. This method of subdividing a grid is called quad-tree, as each cell is divided into 4 quarters. The graphical representation of the grid structure when traversing the grid from its root cell to its smallest sub cell results in a tree structure with 4 branches at each level:

Figure 5: Grid tree structure

1.4.2.7 Explicit indication of resolution level using powers of 10 and 2

An explicit indication of the hierarchical (resolution) level seems an asset, but all systems presented in the previous sections hide the notion of primary and secondary level and require some effort to remember the correspondence between level and precision. A new proposal (with two options) is formulated in this section that seems to overcome most of the problems identified in previous proposals. It is suggested to use a coordinate coding system for constructing the grid code with the following characteristics:

1. The system is based on a primary grid and on two additional sub-levels (secondary and tertiary grid)
2. The coordinate values are expressed in decimetres.
3. The primary grid (metric) will have 7 primary levels (first column in Table 12)
4. Two additional sub-levels are authorised as quad-tree subdivisions of the primary grid (second column in Table 12), except for level one, where only one sub-level is authorised in order to avoid sub-decimetre resolution.
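
The characteristics listed above, together with the truncation rule of section 1.4.2.4, can be sketched as follows. This is an illustration only: the function names are hypothetical, input coordinates are assumed to be whole metres, and the cell size is taken as 10 to the power of the primary level divided by 2 to the power of the quad-tree level, in decimetres.

```python
def cell_size_dm(primary_level, quadtree_level=0):
    """Cell edge length in decimetres for a primary level (power of 10)
    and a quad-tree sub-level (power of 2)."""
    if not 1 <= primary_level <= 7:
        raise ValueError("primary level must be between 1 and 7")
    # Level 1 allows only one sub-level, to avoid sub-decimetre cells.
    max_quadtree = 1 if primary_level == 1 else 2
    if not 0 <= quadtree_level <= max_quadtree:
        raise ValueError("quad-tree level not authorised for this primary level")
    return 10 ** primary_level // 2 ** quadtree_level


def snap_to_cell(easting_m, northing_m, primary_level, quadtree_level=0):
    """Return the south-western cell corner, in decimetres, of the cell
    that contains the given point (truncation, never rounding)."""
    size = cell_size_dm(primary_level, quadtree_level)
    east_dm = easting_m * 10       # metres to decimetres
    north_dm = northing_m * 10
    return (east_dm // size) * size, (north_dm // size) * size
```

For the example coordinates (5780354, 436102), primary level 2 with quad-tree level 2 (25 dm cells) gives the corner (57803525, 4361000), and primary level 6 with quad-tree level 2 gives (57750000, 4250000).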
Table 12: Primary (power of 10) and quad-tree levels (power of 2) for explicit indication

Primary Level  Quadtree Level  Value       in dm        in m/km
1              0               10^1 / 2^0  10           1 m
1              1               10^1 / 2^1  5            0.5 m
2              0               10^2 / 2^0  100          10 m
2              1               10^2 / 2^1  50           5 m
2              2               10^2 / 2^2  25           2.5 m
3              0               10^3 / 2^0  1 000        100 m
3              1               10^3 / 2^1  500          50 m
3              2               10^3 / 2^2  250          25 m
4              0               10^4 / 2^0  10 000       1000 m / 1 km
4              1               10^4 / 2^1  5 000        500 m
4              2               10^4 / 2^2  2 500        250 m
5              0               10^5 / 2^0  100 000      10 km
5              1               10^5 / 2^1  50 000       5 km
5              2               10^5 / 2^2  25 000       2500 m / 2.5 km
6              0               10^6 / 2^0  1 000 000    100 km
6              1               10^6 / 2^1  500 000      50 km
6              2               10^6 / 2^2  250 000      25 km
7              0               10^7 / 2^0  10 000 000   1 000 km
7              1               10^7 / 2^1  5 000 000    500 km
7              2               10^7 / 2^2  2 500 000    250 km

Two ways to express the code are proposed:

1. A fixed length code (here a point is used as a delimiter, which increases readability but is not necessary for automatic processing):
   Code=Level.QuadtreeLevel.EastCoordinate.NorthCoordinate
2. A floating length code (again with a point as delimiter), which makes the quad-tree level code (the last part) optional, i.e. it only has to be indicated if a quad-tree level is used:
   Code=EastCoordinate.NorthCoordinate.Level[.QuadtreeLevel]

Table 13: Two applications for the explicit indication coding for the example coordinates (5780354, 436102)

Primary Level  Quadtree Level  m        East      North    Fixed Length Code      Floating Code
1              0               1        57803540  4361020  1.0.57803540.04361020  5780354.0436102.1[.0]
1              1               0.5      57803540  4361020  1.1.57803540.04361020  57803540.04361020.1.1
2              0               10       57803500  4361000  2.0.57803500.04361000  578035.043610.2[.0]
2              1               5        57803500  4361000  2.1.57803500.04361000  5780350.0436100.2.1
2              2               2.5      57803525  4361000  2.2.57803525.04361000  57803525.04361000.2.2
3              0               100      57803000  4361000  3.0.57803000.04361000  57803.04361.3[.0]
3              1               50       57803500  4361000  3.1.57803500.04361000  578035.043610.3.1
3              2               25       57803500  4361000  3.2.57803500.04361000  5780350.0436100.3.2
4              0               1000     57800000  4360000  4.0.57800000.04360000  5780.0436.4[.0]
4              1               500      57800000  4360000  4.1.57800000.04360000  57800.04360.4.1
4              2               250      57802500  4360000  4.2.57802500.04360000  578025.043600.4.2
5              0               10000    57800000  4300000  5.0.57800000.04300000  578.043.5[.0]
5              1               5000     57800000  4350000  5.1.57800000.04350000  5780.0435.5.1
5              2               2500     57800000  4350000  5.2.57800000.04350000  57800.04350.5.2
6              0               100000   57000000  4000000  6.0.57000000.04000000  57.04.6[.0]
6              1               50000    57500000  4000000  6.1.57500000.04000000  575.040.6.1
6              2               25000    57750000  4250000  6.2.57750000.04250000  5775.0425.6.2
7              0               1000000  50000000  0        7.0.50000000.00000000  5.0.7[.0]
7              1               500000   55000000  0        7.1.55000000.00000000  55.00.7.1
7              2               250000   57500000  2500000  7.2.57500000.02500000  575.025.7.2

Both coding systems:

1. can be easily derived from the coordinate values.
2. are easily understandable.

The fixed length code clearly shows more redundancy and less flexibility (for example, when the precision changes), but the coding rules are straightforward and thus seem easier for computers to handle.

1.5 Generalisation and Generalisation Parameters

1.5.1 Introduction

Cartographic generalisation is the reduction of information in a map when the scale is reduced. Methods applied in this context are simplification, selection, deletion, exaggeration and symbolisation. One aspect of generalisation is the simplification of lines and polygons. In this context, generalisation methods simplify lines by removing small fluctuations or extraneous bends from them while preserving their essential shape. Generalisation allows you to create simplified datasets for displaying or publishing at smaller scales based on your larger scale data.

1.5.2 Generalisation in ArcGIS

In ArcGIS the generalisation tool is called ‘Simplify Line’. The ‘Simplify Line’ tool offers two different algorithms for generalisation:

Point Remove is a fast, simple algorithm that reduces a line quite effectively by removing redundant points, such as over-digitised vertices; however, the angularity of the resulting line will increase significantly as the tolerance increases, so the line may become aesthetically unpleasing.
Use Point Remove for data compression or a relatively low degree of simplification.

Bend Simplify applies advanced techniques to detect bends along a line, analyse their characteristics, and eliminate insignificant ones. It takes longer to process than Point Remove, but the resulting line is more faithful to the original and shows better aesthetic quality.

Figure 6: Types of generalisation algorithms used in ArcGIS

Both the ‘Point Remove’ and ‘Bend Simplify’ algorithms are described in more detail in section 3.7 of this document.

Many feature types within the GISCO Reference Database (e.g. Coastline) are also included at different display scales. These scaled versions have been generalised. Generalisation is discussed in more detail in section 3.7 of this document, along with tips on how to analyse and improve generalisation results.

1.5.3 GISCO Reference Database Generalisation Parameters

The following table shows, for the common display scales, the threshold below which small polygons can be deleted, along with the weed tolerance parameter used in ArcInfo. The ‘Bend Simplify’ algorithm should be used to generalise data intended for the GISCO Reference Database.

Table 14: Deletion thresholds and weed tolerance parameters

Scale      Max Area    Weed Tolerance
1:100 000  2500 m2     50 m
1:1m       0.25 km2    500 m
1:3m       2.25 km2    1500 m
1:4m       4.00 km2    2000 m
1:10m      25.00 km2   4500 m
1:20m      100.00 km2  9000 m

It must be remembered that there may be some exceptions when deleting polygons. For instance, it was found that the Austrian part of Lake Constance had disappeared in the NUTS 1:20 million feature class. Although that polygon represents just 32 km2, it should be shown at 1:20m. These kinds of issues are politically sensitive.

1.6 Database Interoperability

1.6.1 Overview

Geographic information system (GIS) technology is evolving beyond the traditional GIS community and becoming an integral part of the information infrastructure in many organizations.
The unique integration capabilities of a GIS allow disparate data sets to be brought together to create a complete picture of a situation. GIS technology illustrates relationships, connections, and patterns that are not necessarily obvious in any one data set, enabling organizations to make better decisions based on all relevant factors. Organizations are able to share, coordinate, and communicate key concepts between departments within an organization or between separate organizations using GIS as the central spatial data infrastructure. GIS technology is also being used to share crucial information across organizational boundaries via the Internet and the emergence of Web Services.

To fully realise the capability and benefits of geographic information and GIS technology, spatial data needs to be shared and systems need to be interoperable. GIS technology provides the framework for a shared spatial data infrastructure and a distributed architecture.

Interoperability is one way in which the integration of spatial data in Europe can be improved. It can aid the maintenance, availability and understanding of spatial data, and therefore contribute towards the aims of the principles set out by the INSPIRE Initiative.

The paragraphs below discuss the value of being "open" and the evolution of spatial standards with the development of new technologies, including the future of Web Services, and provide an overview of where efforts are being concentrated with regard to interoperability.

1.6.2 Spatial Data Standards and GIS Interoperability

1.6.2.2 What Does Being an "Open" GIS Mean?

To put this question into context, it is important to understand that during the past 20 years, the concepts, standards, and technology for implementing GIS interoperability have evolved through six stages.

1. Data converters (DLG, MOSS, GIRAS)
2. Standard interchange formats (SDTS, DXF™, GML)
3. Open file formats (VPF, shapefiles)
4. Direct read application programming interfaces (APIs) (ArcSDE® API, CAD Reader, ArcSDE CAD Client)
5. Common features in a database management system (DBMS) (OGC Simple Feature Specification for SQL™)
6. Integration of standardized GIS Web services (WMS, WFS)

All six of these approaches and related technologies are important and continue to play a significant role in GIS interoperability today. In the past, data sharing between organizations with different GIS vendor systems was limited to data converters, transfer standards, and later open file formats. Sharing spatial data with other core business applications was rarely achieved. Today, most GIS products directly read and sometimes dynamically transform data with minimal time delay.

The point here is that the GIS community has been pursuing open interoperability for many years, and the solutions to achieving this goal have changed with the development of new technologies.

Another factor to be considered is the still evolving view of the role that GIS plays in an organization. In the early days of GIS, the focus, with rare exceptions, was on individual, isolated projects. Today the focus is on the integration of spatial data and analysis in the mission-critical business processes and work flows of the enterprise and on increasing the return on investment (ROI) in GIS technology and databases by improving interoperability, decision making, and service delivery.

Finally, it is worthwhile to remember why GIS technology is implemented in the first place. Even if we have specialized responsibility for gathering and managing geographic data, we need to remember that a GIS is not an end in itself. A GIS must produce useful information products that can be shared among multiple users, while at the same time providing a consistent infrastructure to ensure data integrity. It is important not to get caught up in the technology and forget this basic principle.
Interoperability enables the integration of data between organizations and across applications and industries, resulting in the generation and sharing of more useful information.

1.6.2.3 The Value of Being Open

An open GIS allows for the sharing of geographic data, integration among different GIS technologies, and integration with other non-GIS applications. It is capable of operating on different platforms and databases and can scale to support a wide range of implementation scenarios, from the individual consultant or mobile worker using GIS on a workstation or laptop to enterprise implementations that support hundreds of users working across multiple regions and departments. An open GIS also exposes objects that allow for the customization and extension of functional capabilities using industry-standard development tools.

1.6.2.4 The Georelational Database

Gradually, GIS models evolved into georelational structures where related attribute data could be stored in a relational database that was linked to the file-based spatial features. However, the georelational format had limited scalability, and the dual data structure (spatial features stored in proprietary file-based format with attributes stored in a relational database) meant that the GIS could not take full advantage of relational database features such as backup and recovery, replication, and fail-over. In addition, supporting large data layers required the use of complex tiling structures to maintain performance, and sharing spatial information with other core business applications was still not possible.

1.6.2.5 The Spatially Enabled Database

In the mid-'90s, new technology emerged that enabled spatial data to be stored in relational databases (often referred to as spatially enabling the database), opening a new era of broad scalability and the support of large, non-tiled, continuous data layers.
When the new spatially enabled databases were combined with client development environments that could be embedded within core business applications, the sharing of spatial features with core business applications, such as customer management systems, became possible. In addition, these spatially enabled databases allowed organizations to take the first steps toward enterprise GIS and the elimination of organizational "spatial data islands."

Perhaps not coincidentally, the open GIS movement was spawned shortly after the arrival of the first all-relational models capable of storing both spatial and attribute data in a relational database, when standards organizations such as the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO) began promoting the idea of data sharing through spatial data standards. The early work of these organizations was focused on sharing simple spatial features in a relational database, thereby enabling interoperability between the commercial GIS vendors. OGC, an international industry consortium of private companies, government agencies, and universities, published an open spatial standard called the Simple Features Specification.

1.6.2.6 The Future with Web Services

Much of the focus of GIS developers today is on Web services, as these are seen as the best long-range solutions for data sharing and interoperability. Web services avoid the issues and complications of GIS applications being tied to the spatial schema of a specific RDBMS vendor and allow GIS vendors to manage their own data using the best methods and formats for their tools in whatever database environment they choose. In addition, Web services allow server-to-server sharing of data and services, as opposed to integration only happening at the client level as it does with standards that are focused on the DBMS. Some vendors choose to use an RDBMS with schema and methods that perform optimally for their tools. Others use file systems.
Web services mean that each GIS vendor can build and manage its own GIS data and readily provide GIS services (data, maps, and geoprocessing) to a larger audience in a common environment.

1.6.2.7 Web Services Framework

Web services are a fundamentally new framework and set of standards for computing. Web services envision a network of distributed computing nodes, which can include servers, workstations, desktop clients, and lightweight "pervasive" clients (phones, PDAs, etc.). Web services standards provide the glue by which these computers and devices interact to form a greater computing whole, accessed from any other device on the network. It is also important to recognize that Web services are not just for the Internet; they represent a powerful architecture for all types of distributed computing.

Web services provide a framework for fusing computing devices via open networks (the Internet, wireless, and local networks). In Web services, computing nodes have three roles: client, service, and broker.

A client is any computer that accesses functions from one or more other computing nodes on the network. Typical clients include desktop computers, Web browsers, Java applets, and mobile devices. A client process makes a request of a computing service and receives results for each request.

A service is a computing process that awaits requests, responds to each request, and returns a set of results.

A broker is essentially a service metadata portal for registering and discovering services. Any network client can search the portal for an appropriate service.

Server and broker technologies are typically used on UNIX, Linux, and Windows platforms.

Web services can support the integration of information and services that are maintained on a distributed network.
This is appealing in organizations, such as the Commission and the main stakeholders concerned with the INSPIRE Initiative, that have departments that independently collect and manage spatial data, and that require these data sets to be integrated. The use of Web services (a connecting technology) coupled with GIS (an integrating technology) can efficiently support this need. The result is that the various layers of information can be dynamically queried and integrated, while at the same time the custodians of the data can maintain this information in a distributed computing environment.

1.6.2.8 The Standards for Web Services

The key standards used for Web services are a series of protocols (i.e., XML; Simple Object Access Protocol [SOAP]; Web Services Description Language [WSDL]; and Universal Description, Discovery, and Integration [UDDI]) that support sophisticated communications between various nodes in a network. They enable smarter communication and collaborative processing among nodes built within any Web services-compliant architecture. Web services can be accessed with browsers, mobile devices such as telephones, desktop clients, and other information appliances.

To discover these services, a broker is provided. The discovery protocol is referred to as Universal Description, Discovery, and Integration (UDDI). In the GIS context, the UDDI node plays the role of a metadata server of registered Web services. A user can search a UDDI directory and find other distributed service providers or services that exist on a network.

Web services interoperate (communicate) through an XML-based protocol known as Simple Object Access Protocol (SOAP). This is an XML API to the functions provided by a Web service. Each Web service "advertises" its SOAP API using a mechanism called Web Services Description Language (WSDL), allowing easy discovery of any service's capabilities.
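To make the role of SOAP more concrete, the following minimal sketch builds a SOAP 1.1 request envelope with the Python standard library. The service namespace (`http://example.org/gis`), the operation name (`ListLayers`) and its parameter are purely hypothetical and are not taken from any GISCO or vendor service; a real service would publish its actual operations in its WSDL description.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (defined by the SOAP 1.1 specification).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace of an imaginary GIS service (assumption).
SERVICE_NS = "http://example.org/gis"


def build_soap_request(operation, params):
    """Return a SOAP envelope (as text) that calls `operation` with `params`."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its parameters live in the service namespace.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: ask the imaginary service for the layers of a region.
request_xml = build_soap_request("ListLayers", {"region": "EU"})
print(request_xml)
```

In practice the envelope would be sent to the service endpoint with HTTP POST, and the response would be another envelope whose body carries the results.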
Figure 7: Integration of standards-based web services

Web services provide an open, interoperable, and highly efficient framework for implementing systems. They are interoperable because each piece of software communicates with every other piece via the standard SOAP and XML protocols. This means that if a developer "wraps" an application with a SOAP API, it can talk with (call or serve) other applications. Web services are efficient because they build on the stateless (loosely coupled) environment of the Internet. A number of nodes can be dynamically connected only when necessary to carry out a specific task, such as updating a database or providing a particular service. While conceptually the basic computer components of a Web services system are still clients and servers, it is important to recognize that the network connections are dynamically created "just in time" and, therefore, do not require the overhead of "stateful" networks. These networks can be implemented in open as well as secure environments.

1.6.2.9 Web Services and GIS

This loosely coupled architecture provides a new and promising solution for the implementation of complex collaborative applications such as a distributed GIS. In some ways, the integration of GIS and Web services simply means that GIS can be more extensively implemented, and people will be able to take mapping, data, and geoprocessing services from many servers and integrate them into a common environment. Unique to GIS-based Web services is the ability not only to connect and interoperate but also to integrate data using the properties that are inherent within GIS itself (i.e., data integration and fusion based on geographic location).

Web services enable the realization of some of the principles on which the INSPIRE Initiative is based:
1. Data should be collected once and maintained at the level where this can be done most effectively;
2. It should be possible to combine seamless spatial information from different sources across Europe and share it between many users and applications;
3. It should be possible for information collected at one level to be shared between all the different levels, detailed for detailed investigations, general for strategic purposes;
4. Geographic information needed for good governance at all levels should be abundant under conditions that do not restrain its extensive use;
5. It should be easy to discover which geographic information is available, fits the needs for a particular use and under which conditions it can be acquired and used;
6. Geographic data should become easy to understand and interpret because it can be visualised within the appropriate context and selected in a user-friendly way.

GIS fundamentally involves the integration of data from multiple sources. The Web services architecture establishes a particular type of relationship between service providers and consumers of information that nicely supports the dynamic integration of data, which is key to creating a spatial data infrastructure.

1.6.2.12 WMS and WFS

Integration of web services can be achieved by using the OGC Web Map Service (WMS) and Web Feature Service (WFS) standards.

A Web Map Service produces maps of spatially referenced data dynamically from geographic information. This international standard defines a "map" to be a portrayal of geographic information as a digital image file suitable for display on a computer screen. A map is not the data itself. WMS-produced maps are generally rendered in a pictorial format such as PNG, GIF or JPEG, or occasionally as vector-based graphical elements in Scalable Vector Graphics (SVG) or Web Computer Graphics Metafile (WebCGM) formats.
This International Standard defines three operations:
• GetCapabilities returns service-level metadata
• GetMap returns a map whose geographic and dimensional parameters are well-defined
• GetFeatureInfo returns information about particular features shown on a map (optional)

Web Map Service operations can be invoked using a standard web browser by submitting requests in the form of URLs. The content of such URLs depends on which operation is requested. In particular, when requesting a map the URL indicates what information is to be shown on the map, what portion of the earth is to be mapped, the desired coordinate reference system, and the output image width and height. When two or more maps are produced with the same geographic parameters and output size, the results can be accurately overlaid to produce a composite map. The use of image formats that support transparent backgrounds (e.g., GIF or PNG) allows underlying maps to be visible. Furthermore, individual maps can be requested from different servers. The Web Map Service thus enables the creation of a network of distributed map servers from which clients can build customized maps.

The Web Feature Service is an interface that allows geographical features to be requested across the web in a highly interoperable way. It uses the XML-based GML for data exchange. The WFS specification defines interfaces for describing data manipulation operations on geographic features. Data manipulation operations include the ability to:
• Create a new feature instance
• Delete a feature instance
• Update a feature instance
• Get or query features based on spatial and non-spatial constraints

A WFS request describes discovery, query, or data transformation operations. The request is generated on the client and is posted to a web feature server using HTTP. The web feature server then executes the request. The WFS specification uses HTTP as the distributed computing platform, although this is not a hard requirement.
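As an illustration of the URL-based WMS requests described above, the sketch below assembles a GetMap URL using the parameter names of WMS version 1.1.1. The server addresses (`http://example.org/wms`, `http://example.net/wms`) and layer names are hypothetical placeholders; the layers, coordinate reference systems and formats that a real server offers are listed in its GetCapabilities response.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(base_url, layers, bbox, srs, width, height,
                     image_format="image/png", transparent=True):
    """Assemble a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),           # what is shown on the map
        "STYLES": "",                         # empty value = default styles
        "SRS": srs,                           # coordinate reference system
        "BBOX": ",".join(str(v) for v in bbox),  # portion of the earth mapped
        "WIDTH": width,                       # output image size in pixels
        "HEIGHT": height,
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return f"{base_url}?{urlencode(params)}"


# Two maps with identical BBOX, SRS and size can be overlaid exactly,
# even when they come from different (here: hypothetical) servers.
common = dict(bbox=(-10.0, 35.0, 30.0, 70.0), srs="EPSG:4326",
              width=800, height=700)
base_map_url = build_getmap_url("http://example.org/wms", ["coastlines"], **common)
overlay_url = build_getmap_url("http://example.net/wms", ["nuts_regions"], **common)
print(base_map_url)
print(overlay_url)
```

Requesting the overlay as PNG with a transparent background lets the base map remain visible underneath it, which is the composite-map behaviour described in the text.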
There are two encodings defined for WFS operations:
• XML (amenable to HTTP POST/SOAP)
• Keyword-Value pairs (amenable to HTTP GET/REST)

1.6.2.13 Standards Organizations

There are many international standards organizations associated with spatial interoperability:
• ISO - International Organization for Standardization
• OGC - Open Geospatial Consortium
• OGCE - Open Geospatial Consortium (Europe)
• W3C - World Wide Web Consortium
• ANSI - American National Standards Institute
• IHO - International Hydrographic Organization
• WS-I - Web Services Interoperability Organization
• LIF - Location Interoperability Forum
• WLIA - Wireless Location Industry Association
• FGDC - Federal Geographic Data Committee
• GSDI - Global Spatial Data Infrastructure
• CEN - European Committee for Standardization
• DGIWG - Digital Geographic Information Working Group

OGCE meets Europe's interoperability challenges and is actively involved in a number of European Union projects:
• ETeMII - European Territorial Management Information Infrastructure
• GETIS - Geoprocessing Networks in a European Territorial Interoperability Study
• GINIE - Geographic Information Network in Europe
• INSPIRE - INfrastructure for SPatial InfoRmation in Europe

2 Geographic Guidelines: Non-Normative Section

This section discusses difficulties in data loading, good and bad practices in the creation of data, and some tips and tricks to make the data flow more fluent. In-depth guidelines are also included on how to create GISCO maps, presenting some of the main principles of cartography and thematic mapping. The normative part (Section 1) addresses particular standards that must be met when creating data intended for the GISCO Reference Database. The final part of the document addresses data quality and consistency.

2.1 What are good practices and what are the bad practices for the creation of data.
The paragraphs below describe some good practices when creating data intended for the GISCO Reference Database, and bad practices that should be avoided.

2.1.1 Data delivery character set

ArcSDE supports Unicode, but with some limitations: it cannot load and display a feature class with attributes in multiple languages at the same time. Full Unicode support is available in the personal geodatabase. The GISCO SDE database is an Oracle database created with character set UTF8, and thus CHAR and VARCHAR columns can store characters in UTF8. However, only one language can be loaded or displayed at once. To load or display different languages, set the appropriate NLS_LANG value for each language. Therefore, the character set for each dataset should be specified on delivery.

2.1.2 Data including names

Also include an attribute with the names in ASCII. This provides an 'always readable' name in case something goes wrong with character conversion.

2.1.3 Avoid specific Geodatabase features (only usable for ArcGIS clients)

These features are only recognized by ESRI ArcGIS clients. They can be useful in an ESRI client environment, but they are not enforced at database level. Therefore, only use them if they are really needed, and take appropriate action at database level.

2.1.3.1 Composite relationship classes

This type of relationship class implements a kind of referential integrity rule between two feature classes (a child cannot exist without its parent), optionally with a cascading delete rule. This restriction is not enforced at database level, allowing third party tools to bypass the rule. If it is needed to maintain database integrity, a database foreign key constraint should also be created in order to enforce integrity at database level.

2.1.3.2 Relationship Rules

Relationship rules control how parent records relate to child records. These rules can be validated with ESRI ArcMap (but are not enforced).
They are not enforced at database level at all and there is no easy implementation at database level. Therefore, avoid the use of Relationship rules.

2.1.3.3 Domains

A domain is a declaration of acceptable attribute values. This restriction is not enforced at database level, allowing third party tools to bypass the rule. If it is needed to maintain database integrity:
• For a range domain, create a database check constraint.
• For a coded value domain, create a lookup table with a referential integrity constraint at database level.

2.1.3.4 Subtypes

Subtypes provide a way to group features of one feature class into subsets using values in an attribute. An integer value is added as an attribute, and the geodatabase can translate this value to a meaningful text. For third party products, a lookup table should be provided for translating the integers.

2.2 Some tips and tricks to make the dataflow more fluent.

The paragraphs below outline some basic tips to help make the dataflow to GISCO more fluent.

2.2.1 Delivering data in (personal) geodatabase

• Provide UML if possible;
• provide SDE-export files with specification of the character set used;
• if none of the above is possible, then describe the relations between the data;
• never create relationships on the OBJECTID of feature classes.

2.2.2 Use GISCO standard database projection, extent and precision.

Use the same spatial reference parameters as the GISCO database for setting up the geodatabase (personal or SDE) feature datasets and feature classes. This way rounding errors are avoided when re-projecting or changing to a different precision.

Figure 8: GISCO Database spatial reference properties [4]

2.3 Experienced difficulties in loading data into the new GISCO structure.

The paragraphs below describe some of the common difficulties that have already been discovered when loading data into the new GISCO Reference Database structure, and solutions to help GISCO avoid such problems.
2.3.1 Delivering in shapefile format

Problem: ArcSDE applies shape validation rules to any shape to be stored in the SDE database, while native shapefiles do not enforce these rules. SDE will reject shapes that do not conform to the SDE shape validation rules, which results in incomplete loading of data.

Solution: Be sure to deliver a 'clean shapefile'. Use the 'Check Geometry' tool in the 'Features' toolset of the 'Data Management Tools' of the ArcGIS toolbox to find feature geometry problems and correct them.

More Information: ArcSDE 9.0 Developer help -> Getting Started -> Geometry

2.3.2 Problems with Character sets

Problem: GISCO SDE uses an Oracle database created with character set UTF8 to store multiple European character sets. ArcSDE supports Unicode, but with some limitations. CHAR and VARCHAR columns can store characters in UTF8, but only one language can be loaded or displayed at once. Thus, you cannot load (or display) different character sets at the same time.

Solution: To load or display different languages, you need to set the appropriate NLS_LANG value for each language. For datasets delivered in sdeexport format, specify the NLS_LANG setting for each dataset, so that it can be applied when using sdeimport.

[4] ArcGIS assumes equal resolutions for the X and Y axes. Therefore, the maximum distance of both axes defines the spatial resolution of the dataset. As a side effect, ArcGIS automatically extends the upper limit of one axis with the maximum extent of the dataset.

Full Unicode support is available in the personal geodatabase. However, when loading the personal geodatabase into the SDE database, the correct NLS_LANG must be set. Therefore, if different language sets are loaded in the personal geodatabase, provide information on which subsets need to be loaded with which NLS_LANG settings.
More Information: ESRI knowledge base: Article ID 27341

2.4 Making GISCO Maps

2.4.1 Introduction

2.4.1.1 Overview

This section describes good practices for making thematic maps based on the GISCO Reference Database. GISCO aims to promote the appropriate use of maps for visualizing statistics, but also to provide non-experts with some basic guidelines on designing thematic statistical maps and on how to avoid the most common errors. This chapter is complemented by a more fundamental description of cartographic mapping principles, which can be found at http://www.gisco.eurostat.cec/mappingguide. The pdf version of the Desktop Mapping guide can be downloaded at http://www.gisco.eurostat.cec/gisco/cfm/reports_en.cfm.

2.4.1.2 Purpose of this chapter

This chapter is intended to give some principles of cartographic mapping and to introduce the GISCO mapping tool. The mapping tool allows statistical maps to be created within ArcMap in a semi-automated way.

2.4.2 Why do we use maps?

Maps are a great way of displaying statistical data.
• They can present complex data clearly and compactly
• They can be a great help in spotting patterns within data
• They are accessible:
  • people understand maps (or at least think they do)
  • people like maps
  • maps attract attention and brighten up a presentation

But maps of statistics do present a number of problems:
• A map always generalises and simplifies information.
• Maps can end up as decoration: unless you are careful, the appearance of the map can sometimes become more important than its value and validity for presenting statistics.
• Information on a map is always interpreted information.
• Maps can mislead as well as provide useful information. Bad design can give completely the wrong impression of the data. There is always the risk of unintentionally lying with maps.

Avoiding these problems and making sure that maps inform the reader, release new information from the data and present the statistics in a valid way isn't difficult.
It does, however, require that you are logical and careful, and that you think hard about what you are doing.

Economic, social and natural actions and phenomena all have a spatial component. By coupling statistical information with geographical territories we enhance the effectiveness with which they are presented or analysed.

2.4.4 What is a thematic map?

In the past, map production was rather exclusive, but today everyone with a PC and mapping software can produce maps. We then use the map to communicate with other people, and we want them to receive the message the way it was meant to be received.

We can distinguish between different types of maps: topographic, technical and thematic maps. A road map or survey map is a good example of a topographic map. Technical maps are those you receive from the technical division of the commune; they describe the border of your site and where to find technical equipment on your site.

In this guide we refer only to thematic maps. A thematic map connects non-geographical data sets (e.g. economic, social, demographic or traffic data) that carry an indirect geographical reference (e.g. region code, commune code, road number) to the map. This could be a starting point for future analyses, where the producer and/or the reader want to gain insight into the data set through a cartographic presentation.

Figure 9: Thematic maps

Thematic maps take their bases from existing topographic maps, but they are distinguished further by the subject matter, which usually is not the physical earth or locations upon it. The subject may be some distillation of physical phenomena, such as average annual temperature or precipitation values. Commonly, though, the subjects mapped are both abstract and non-physical, like crude birth rate per thousand inhabitants. The concern of thematic mapping is for a sound presentation of the essence of some distribution.
We consider a thematic map as the primary component of any spatial analysis, presenting statistical information on "how much" or "how many", but also "where" a phenomenon occurs. A strong sense of "visual logic" is vital, and a knack for choosing the right words to accompany the graphics is equally important.

Thematic Maps - Their design and production by David J. Cuff and Mark T. Mattson

2.4.5 Introduction to mapping concepts

In order to create a complete map, several important mapping concepts should be followed, such as:
• map features
• map characteristics
• structure of a thematic map

Besides the choice of the right symbol (point, line, area) to describe the specific theme analysed, the map characteristics (projection, scale, …) are essential, as they form the base elements of a map. As far as the structure of a map is concerned, certain elements such as title, legend, … are absolutely necessary to create a clear, well-elaborated map. In the following paragraphs all these ingredients of a successful map are explained in more detail.

2.4.5.1 Map features

The information conveyed by a map is represented graphically as a set of map components. Location information is usually represented by points, lines and areas.

Point feature: A point feature is represented by a single location. It defines a map object whose boundary or shape is too small to show as a line or area feature.

Line feature: A line feature is a set of connected, ordered coordinates. It represents the linear shape of a map object that may be too narrow to be displayed as an area, such as a road, or a feature that has no width, such as a contour line.

Area feature: An area feature is a closed figure whose boundary encloses a homogeneous area, such as a state, country, soil type or lake.

2.4.5.2 Map characteristics

Map projection: The map projection is the location framework of a thematic map. It is a systematic arrangement of the earth's meridians and parallels onto a plane surface.
There are different types of projections, but each inevitably generates some distortion of area, distance, shape and direction. No transformation process can eliminate all these distortions simultaneously. The user therefore has to select the most appropriate projection depending on the map's message.

Map scale: Map scale is the extent of reduction required to display a portion of the Earth's surface on a map. It can be expressed as a representative fraction, which is the ratio of the distance on the map page to the distance on the ground. Larger-scale maps show features in greater detail but represent less area. Smaller-scale maps show a larger area but represent less detail. It is important to remember that only maps of the same scale should be used as overlays. Maps at different scales serve different needs. A map at 1:3.000.000 should by no means be used to depict the location of wells in a region. However, it is adequate for presenting the ports of France on an A0 format poster.

Map resolution: The resolution of a map is the accuracy with which the location and shape of map features can be depicted for a given map scale. Scale affects resolution. In a larger-scale map, the resolution of features more closely matches real-world features because the extent of reduction from ground to map is less. When using a map we should always think about the scale and resolution.

Other map characteristics are the map accuracy and the map extent.

2.4.5.3 Structure of a thematic map

A complete map contains 5 elements: title, legend, scale, textual information, and the actual map. A map should be as self-explanatory as possible, so that a reader immediately sees what the map is all about without consulting the legend (e.g. for quantitative data). This can be achieved by following the visual rules that match the thematic information to the visual variables used.
Title: The title should identify which theme variables are involved and what the map is all about. Very often a long title is needed; in this case use a short main title and a subtitle. The subtitle should contain information about the area the map covers and, in the case of statistical information, the reference period of the data. An indication of the NUTS level showing the breakdown of the regional data is obligatory.

Legend: The legend should identify each of the theme variables used in the map, as well as which visual variable corresponds to which theme variable. Put simply, the variable used for mapping should be explicitly stated and not mixed with the title of the map: what does a line or point show, what is the difference between the blue and the red line, and so on. The unit of measurement of the variable is obligatory.

Scale: The scale is one of the most important elements on a map. Scale selection has important consequences for the map's appearance and its potential as a communication device. On a map, use a graphic scale bar rather than a numerical scale (1:50 000), because a numerical scale no longer corresponds to reality once the map is reduced or enlarged.

Textual information: This could be subtitles or footnotes connected to the map, as well as a declaration of the statistical and geographical data sources, the date of production of the map itself and the geographic orientation (N). Orientation need not always be shown by an arrow; you can also use the graticule (parallels and meridians) or grid ticks. Statistical data from different sources, estimates, or data with a different reference period should be explicitly declared. Any exceptions to the NUTS nomenclature should be mentioned.

The actual map: This is the thematic map produced from geographical information (e.g. NUTS boundaries, commune boundaries etc.) and statistical information for the NUTS regions, communes etc.
2.4.6 Creating a statistical map using the ‘Mapping Tool’

2.4.6.1 Introduction

The “Mapping Tool” software has been created for GISCO for the production of statistical maps based on NUTS regions from within the ArcMap environment. The tool can produce one-off single maps when run in wizard mode. Alternatively, it can be run in batch mode with the use of map spec files that define all the parameters used in creating the maps. A map spec file is also useful as a record of the maps created, and enables users to easily create maps again to the same or an adjusted spec.

The software takes external input from:
• Pre-created ArcMap template files in which the NUTS data source is referenced.
• Statistical data in a suitable tabular format (txt files, csv files, dbf, or Personal geodatabase tables)
• Map spec files that define all the map parameters.

Output from the software is:
• ArcMap documents. These can be used in the future to recreate the mapping and export to a suitable export format.
• Maps exported to a defined export format (EPS, JPEG, TIFF, PNG or PDF)

Figure 10: Example of map production process

The software is supplied in the form of an executable file (*.dll). This software can then be added to the ArcMap environment as a custom button. For full instructions on how to use the ‘Mapping Tool’ (including installation), the Mapping Tool user guide should be obtained. An overview of the map making process is described below, together with an example of the map output.

2.4.6.2 Map Making Process

The wizard allows the user to join statistical data to NUTS regions, and to display the data in a variety of ways by giving the user a number of options. The software uses predefined ArcMap templates (*.mxt), which define the page layout and the source data, as the basis for creating the maps. Templates can be found at ….
The software allows the use of pre-created files in which the map parameters can be stored, read and used in the map creation process. This allows maps to be recreated exactly and processed as a batch, and also provides an easier way of specifying the parameters. The software can also read a style file of pre-defined styles to define the colour scheme for the statistical mapping. The software defines the colour scheme as a “shade set”, which is in effect a range of colours that ArcGIS calls a “Color Ramp”.

On running the software, the user has 3 choices:
• Create using wizard - This is the default option.
• Create using a map spec file - This enables the user to create a map to a previously created specification, by loading a map spec file (*.msf) that defines all the map parameters.
• Process as a batch - Batch process the map production by selecting multiple map spec files.

The user can then run through a number of steps, which give options to customise the map display and then create the map output:
1. Page Layout - Stored templates can be selected.
2. Language - It is necessary to specify a language for the maps. This is the language in which the standard labels (e.g. the copyright clause) will be printed. The options in this dropdown list are read from the standard labels table, so other languages can be added where necessary.
3. Statistical Dataset - The user can navigate to the desired statistical data source. This data source can be a comma or tab separated text file, a dbf file or a geodatabase feature class.
4. Join field - The ‘Join field’ is used to join the statistical data to the NUTS regions.
5. Map Number - This is a map reference entered by the user and used to uniquely identify the map. It need not be a numeric value.
6. Symbology - The main dialog window displays a brief description of the method that the map production process will use to display the statistical data.
This method can be amended in a number of ways, including: amending the symbology category, the method of classification, the values to exclude from the classification and the definition of the symbology.
7. Shade Sets - It is necessary to define how the statistical data is displayed (i.e. the fill colour of each class). This is accomplished by defining a shade set to use. A “shade set” in this instance describes what ArcGIS defines as a “Color Ramp”, which is simply an ordered set of predefined colours or a graduated range of colours. This can then be applied to a statistical classification of data.
8. Export the map - Define whether the map is to be exported to any of the supported formats as part of the map creation process.
9. Map Labels - Define labels to print in the map legend.
10. Other options - The user has the option to save the parameters entered in the dialog to a map spec file, and to choose a filename.
11. Run the map creation process.

Figure 11: An example of the map output

3 Geographic Guidelines: Data Quality Section

3.1 Quality Assurance Principles

As the sources and amounts of digital spatial data increase, it becomes increasingly important to enable the integration of heterogeneous data environments. One way to tackle this challenge is to agree upon data quality standards. The availability of standards has many advantages for the data collector, processor and user, especially when many different sources are used.

The GISCO Reference environment is a particularly heterogeneous data environment. The reason for this is that GISCO does not create spatial data itself, nor does it ‘order’ data based on set specifications. Until now, GISCO has used ‘the best sources available’ in order to create a European-wide database. This means that for a particular data request, GISCO looks for suitable data already available on the European market, in the first place in the public domain. As these data sets already exist, GISCO has little influence on the data quality.
GISCO can only evaluate and assess the proposed data sets and then decide if a given data set is fit for integration in the GISCO Reference environment.

Data sets that are proposed for integration in the GISCO database have to be assessed for quality before any further processing is carried out. The reason for this is to ensure that no unnecessary work is done to process data that are not fit for integration. The supplied data sets have to pass controls on their overall quality, for example:
• is the source reliable?
• will necessary support be supplied?
• are updates foreseen?
• are data consistent and complete?
• is the coding consistent?
• does the data set cover the necessary geographical extent?

This section discusses different quality elements that should be adhered to before GIS data is used with, or included in, the GISCO Reference Database. Data quality information will need to be included in the metadata when data is submitted for use within the GISCO Reference Database.

Metadata provide information on data. They can be used for searching and accessing related data. GISCO has developed a metadata profile based on the ISO 19115 definition for geographic metadata for use within the Commission. Although some metadata elements, such as the dataset extent, can be created automatically in software such as ArcGIS, data quality elements such as scale and resolution should be recorded manually. Section 1.2 of this document shows how metadata conforming to the GISCO metadata profile can be created using a metadata editor as part of ArcCatalog.

3.2 Geometric quality

3.2.1 Scale and Resolution

3.2.1.1 Map Scale

Map scale specifies the amount of reduction between the real world and its graphic representation. It is usually expressed as a ratio (e.g. 1:1.000.000), or as an equivalence (e.g. 1 cm = 1.000.000 cm). When a map is printed it always has a fixed scale.

3.2.1.2 Display Scale

Even though a GIS allows you to zoom in on a map, the display scale should be the scale at which the map ‘looks right’. The display scale influences two things about a map:

• The amount of detail. If a very detailed map is displayed on a small scale it might become overwhelmed with detail, and become too crowded. On the other hand, when a small scale map is shown on a large scale, it will look over-generalised.
• The size and placement of text and symbols. These must be sized to be readable at the display scale, and placed so that they do not overlap each other.

However, it must be remembered that when the smaller scale of the range is used (e.g. 1:50 000 000) areas of detail might be merged into black blobs. If the map will be used only for illustration of a certain trend (e.g. statistical data), where the detail of the geographical features is less important, it can be acceptable to make a map on a larger scale. Therefore, the actual scale on which a particular data set is displayed or printed greatly depends on the purpose of the map.

A GIS map’s annotation (text and symbols) must be designed for a certain display scale. If not, the annotation will not look right compared to the displayed map.

3.2.1.3 Data Resolution

Data resolution is the smallest difference between adjacent positions that can be recorded; it also limits the minimum size of features that can be stored. The resolution is dependent on the method used to obtain digital data and is tied to the map scale. A typical minimum distance between co-ordinates that can be captured from a paper map is about ⅓ millimetre. The maximum distance between co-ordinates should be 2 millimetres. The table below gives the resolutions corresponding to a certain nominal scale. The resolutions should not be taken as absolute figures, but be used as indications of what resolution ranges should be expected. The maximum distance between co-ordinates is given in the third column.
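The conversion from map distance to ground distance is a simple multiplication by the scale factor. The following minimal sketch assumes the two paper-map distances quoted above (0.0003 m and 0.002 m); the function name resolution_for_scale is hypothetical and the results are indicative only.

```python
def resolution_for_scale(scale_denominator):
    """Return (resolution_m, max_vertex_spacing_m) for a nominal map scale.

    scale_denominator is the N in a scale of 1:N.
    """
    min_map_distance_m = 0.0003  # about 1/3 mm on the paper map
    max_map_distance_m = 0.002   # 2 mm on the paper map
    return (min_map_distance_m * scale_denominator,
            max_map_distance_m * scale_denominator)


for denominator in (100_000, 1_000_000, 3_000_000, 10_000_000):
    resolution, max_spacing = resolution_for_scale(denominator)
    print(f"1:{denominator}: resolution about {resolution:.0f} m, "
          f"maximum vertex spacing about {max_spacing:.0f} m")
```

The same function can be used to check whether the vertex spacing measured in a supplied data set falls within the range expected for its stated scale.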
More complex data need more co-ordinates. Therefore, for a complex data set such as a soil map, more vertices are needed than for a data set such as a parcel map. The appreciation of the quality of a data set thus remains subject to the expertise of the evaluator.

Table 15: Typical resolution and maximal distance between co-ordinates

Scale           Resolution in meters            Max distance in meters
                (= 0.0003 m * scale factor)     (= 0.002 m * scale factor)
1:100.000       30                              200
1:1.000.000     300                             2 000
1:3.000.000     900                             6 000
1:10.000.000    3 000                           20 000

Figure 12: Resolution sample testing on a map with a scale of 1:100 000 (labels: Node, Vertex, ~ 30 m)

3.2.2 Positional Accuracy

Even if the resolution is of the required quality, the accuracy of the map might not match it. The geometric accuracy refers to the degree to which information on a map or in a digital database matches true or accepted values. Since digital geographic data is an attempt to model and describe the real world, no map will ever be completely accurate. The accuracy depends on the way the data was created. The level of accuracy varies greatly with each data set.

Distances between 0.3 and 1 mm are the smallest distances that can be measured on a map. The distance depends on factors such as line thickness. Accuracy can be expressed in map units (expressed in cm or mm) or real world units (expressed in meters). The conversion between the two is done through the scale. If 1 millimetre is said to be the accuracy for a certain data set, this means that any feature on a map is allowed to move by up to 1 mm compared to the reference material. This implies that the digital map has been printed in the appropriate scale and the same projection system as the reference material. The actual displacement (in meters) of features compared to the reference material used depends on the map scale. The maximum displacement allowed is 1 mm, which corresponds to the distances in the table below.

Table 16: Displacements when an accuracy of maximum 1 mm is accepted

Scale           Max displacement in meters
                (= 0.001 m * scale factor)
1:100 000       100
1:1 000 000     1 000
1:3 000 000     3 000
1:10 000 000    10 000

In order to control the accuracy, the data set should be compared to reference material of appropriate scale. The difficulty within GISCO is that the maps used as source material for the assessed data set are not available. In order to be able to measure accuracy, it is proposed to use topographical maps as reference material, or maps of a similar quality. If the original maps are not used, the measured accuracy might be different from the one given by the data providers. Moreover, when maps of the same scale as those from which a data set is derived are not available, reference maps of a more detailed scale must be used. It then becomes more difficult to give a statement on accuracy. Furthermore, differences might be expected due to data manipulation such as edge-matching and projection conversions.

3.3 Topological quality

Topological quality is measured for vector data sets only. Topology defines the spatial relationships between the features in a vector data set. To have a correct topology, some basic quality measures must be fulfilled.

3.3.1 Dangle Nodes

Polygon datasets or line networks must be checked for dangle node errors. These are essentially end point nodes which represent an overshoot or undershoot of an intersecting line.

Figure 13: A Dangle node

In a network, dangle nodes should be snapped to the line they intersect. Where this intersection occurs, the intersecting lines should be split, so the eventual set has three lines, all with coincident end points.

Figure 14: Cleaning linework

Dangle nodes should normally not be present in a polygon coverage, as these can be an indication of non-closed polygons.

Figure 15: Closing polygons (label: Dangle nodes)

There are however situations when they can be present without representing an error.
This can for example occur for so-called ‘cul-de-sacs’. Since it becomes difficult to evaluate which dangle nodes are errors and which are not, it is strongly recommended not to have dangle nodes in a polygon coverage.

Figure 16: ‘Cul-de-sac’ dangle nodes (label: Dangle node)

Dangle nodes can be traced with GIS software that supports topology. For line coverages that are said to contain arc/node topology it is important to verify this. The topological data structure is used to represent connectivity between arcs and nodes. The relationship is built up in the attribute table by specifying the ‘from’ and the ‘to’ node for the different arcs.

Figure 17: Arc/node connectivity

Electricity line    From node    To node
1                   1            4
2                   4            3
3                   3            2
4                   2            1

3.3.2 Polygon Topology

All polygons should be labelled and a polygon should have one and only one label.

Figure 18: Polygon Labels (labels: A5432, A5433, A5434)

Polygons without a label can either be sliver polygons or polygons that are not correctly labelled. Sliver polygons are small unwanted polygons that are created when polygons are built from non-coincident lines. No sliver polygons should be present. Sliver polygons often occur when two or more coverages are overlaid.

Figure 19: ‘Sliver’ polygons (labels: Coastline, Sliver Polygons, Land Use, Soils)

The presence of sliver polygons in the data set can often be detected through a control of the polygon size. Polygons that are very small compared to the average polygon size are likely to be sliver polygons.

No dangle nodes are allowed to be present.

3.4 Attribute Consistency

The attribute data in the data set have to be assessed on the thematic content. Taking into account the number of different themes covered in the GISCO database, the wide EU and pan European coverage, the different definitions and nomenclatures and the limited expertise in a number of these themes, it is not possible to make an exhaustive assessment of the quality of the contents of a data set.
There are two general characteristics, consistency and completeness, which are discussed in the paragraphs below. There are three different types of attribute data. These are limited code lists, unlimited code lists and continuous values. Depending on the type of attribute the following controls are carried out:

3.4.1 Limited code lists

For data sets with limited code lists, such as land use, soil nomenclature and road types, it has to be controlled that each feature is coded with one of the values in the code list. A consistent data set implies that the coding is exhaustive and that no codes other than those available in the legend appear in the attribute table. The result of the assessment is a statement of whether each feature in the data set is coded according to the values in the code list. If this is not the case, it does not mean that the data set is refused for integration. For consistency, errors could be corrected if reference material is available. Whether this is worth doing will be decided after the overall quality assessment.

Figure 20: Example code list. The polygons that are coded with 127, 128 and 135 are not correctly coded. (Polygon labels in the figure: 123, 124, 125, 126, 127, 128, 135)

Code    Definition
123     Densely populated area
124     Intermediately populated area
125     Sparsely populated area
126     No data

3.4.2 Unlimited code lists

For unlimited code lists, for example settlement lists and administrative region lists, it is difficult to control that all data are actually included. One way to control this is to compare the data set with suitable reference material. Furthermore, it can be verified that all geographic objects have a related record in the attribute table. A map with the codes can be printed out and given to an expert or group of experts who can give a statement of the quality. If exhaustive reference material is not available, it is not possible to give an absolute statement about the quality. A qualitative statement can be given.
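The limited code list control described in section 3.4.1 lends itself to a simple automated check. The sketch below is a minimal illustration using the code list from Figure 20; the feature identifiers and the function name find_invalid_codes are hypothetical.

```python
# Code list taken from Figure 20.
VALID_CODES = {
    123: "Densely populated area",
    124: "Intermediately populated area",
    125: "Sparsely populated area",
    126: "No data",
}


def find_invalid_codes(features, valid_codes):
    """Return the (feature_id, code) pairs whose code is not in the code list."""
    return [(fid, code) for fid, code in features if code not in valid_codes]


# Hypothetical attribute table: (feature id, code) for each polygon.
features = [(1, 123), (2, 126), (3, 124), (4, 125), (5, 127), (6, 128), (7, 135)]

invalid = find_invalid_codes(features, VALID_CODES)
for fid, code in invalid:
    print(f"Feature {fid} has code {code}, which is not in the code list")
```

An empty result means the coding is consistent with the code list; a non-empty result lists the features to correct against reference material.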

Table 17: Code list; all the NUTS 1 regions have a related record in the attribute table.

Nuts region name (NUTS 1)
NORD-PAS-DE-CALAIS      …
ILE DE FRANCE           …
BASSIN PARISIEN         …
…                       …

3.4.3 Continuous values

Continuous values, for example altitude and population number, can to a certain extent be controlled by some statistical checks:

• the maximum and minimum values for the data set. In that way outliers can be detected and controlled. If these values are unreasonable, for example if the maximum population number in a Belgian settlement is 10 million, the thematic content of the data set is not correct.
• control of the sum, e.g. the sum of population figures in different communes. These figures should then be comparable with the corresponding population figures for the country.
• control of the rank distribution, e.g. rank the settlements according to their size and verify that the order is as expected.

Furthermore, the quality of the continuous values can be assessed through sample checking. The attribute data should then be compared to reference material. This work should be done by experts in the field.

3.5 Topological Consistency

3.5.1 Relationship between datasets

If a thematic dataset is based on the GISCO Reference Database, then there must be topological consistency between the thematic dataset and the GISCO Reference Database. For instance, if the thematic dataset uses NUTS regions, then these must overlay the NUTS data exactly. There should be no sliver polygons if the two were to be merged together. Similarly, coastline should be consistent between any thematic datasets and the GISCO Reference Database. Note that the NUTS regions and coastline are already consistent within the database.

You may have a point layer that should be consistent with a polygon layer. For a basic example, every country should have one capital city, but not more than one. Basic spatial overlay tests can be performed in ArcMap and other GIS packages.
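As an illustration of such a basic overlay test, the following minimal sketch counts capital points per country polygon using a plain ray-casting point-in-polygon test. It is not the GISCO quality control script; the country shapes, the capital coordinates and all function names are invented for the example, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def capitals_per_country(countries, capitals):
    """Count how many capital points fall inside each country polygon."""
    counts = {name: 0 for name in countries}
    for cx, cy in capitals.values():
        for name, ring in countries.items():
            if point_in_polygon(cx, cy, ring):
                counts[name] += 1
    return counts


# Hypothetical test data: three square "countries" and three "capitals".
countries = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
    "C": [(0, 4), (4, 4), (4, 8), (0, 8)],
}
capitals = {"a1": (1, 1), "b1": (5, 1), "b2": (7, 3)}

counts = capitals_per_country(countries, capitals)
# Every country should have exactly one capital; report the exceptions.
problems = {name: n for name, n in counts.items() if n != 1}
print(problems)
```

In this invented data set, country B contains two capitals and country C contains none, so both are reported for review.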
For data intended for the GISCO Reference Database, a more advanced script is written to test the quality of the topological consistency as part of the quality control procedures.

3.6 Completeness

3.6.1 Geographic completeness

Completeness measures the amount of spatial features included in a digital data set as a result of data input and data conversion. Completeness of the data should be evaluated for the geographic extent and for the geographic objects.

3.6.1.1 Geographic extent

The geographic extent should correspond to the metadata information. It is particularly important to control that all parts of a country’s territory are available in the data set. Special attention should be drawn to the presence of data on…

• the French DOMs: Reunion, Martinique, Guadeloupe, Guyane (France);
• the Canarias (Spain);
• Madeira and Açores (Portugal);

…because they are often forgotten.

3.6.1.2 Geographic objects

Geographic features that are included in the data set are mentioned and explained in the metadata. Normally, the feature density is scale dependent since a larger scale can include more features without losing its clarity. For each data set the actual feature content has to correspond with the metadata information. For example, if a data set is supposed to include all settlements with a population of more than 100.000 but actually only includes the capitals of each country, then the data set is not complete. To control the feature content, the data set is compared to suitable thematic reference material according to theme, for example road maps or soil maps, rather than to national topographical maps.

3.6.2 Attribute Completeness

The completeness of the attribute tables is also important to control. Firstly, it has to be ensured that all non-existing values are marked as no data. Secondly, there must not be too many no data values, since that would make the attribute table incomplete.
One way to get an idea of the completeness is to calculate the ratio between the number of no data records and the total number of records.

3.7 Generalisation

3.7.1 Introduction

Section 1.5 of this document introduces Generalisation in ArcGIS and states the parameters used for the GISCO reference database. This section looks further into the ‘Point Remove’ and ‘Bend Simplify’ algorithms and includes advice on how to analyse and improve generalisation results.

3.7.2 Point Remove

Point Remove applies a published algorithm with enhancements. It is a fast, simple line simplification algorithm. It keeps the so-called critical points that depict the essential shape of a line and removes all other points. The algorithm connects the endpoints of a line with a 'trend line'. The perpendicular distance of each vertex to the trend line is measured. Vertices closer to the line than the tolerance are eliminated. The line is then divided at the vertex farthest from the trend line, which makes two new trend lines. The remaining vertices are measured against these lines, and the process continues until all vertices within the tolerance are eliminated (see the diagram below).

Figure 21: Simplification process

Point Remove is efficient for data compression and for eliminating redundant details; however, the line that results may contain unpleasant sharp angles and spikes which reduce the cartographic quality of the line. Use Point Remove for relatively small amounts of data reduction or compression and when you don't need high cartographic quality.

3.7.3 Bend Simplify

Bend Simplify applies shape recognition techniques that detect bends, analyse their characteristics, and eliminate insignificant ones. A linear feature can be seen as composed of a series of bends, each of which is defined as having the same sign (positive or negative) for the inflection angles at its consecutive vertices.
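For comparison, the Point Remove procedure described in section 3.7.2 can be sketched in a few lines of Python. This is a plain recursive version in the style of the Douglas-Peucker algorithm, assumed here to be close to the published algorithm referred to above; it does not include ESRI's enhancements, and the function names and the sample line are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Perpendicular distance from point p to the trend line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate trend line: fall back to the distance to the single point.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def point_remove(points, tolerance):
    """Simplify a line, keeping only vertices farther than tolerance from the trend line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the vertex farthest from the trend line joining the endpoints.
    farthest_index, farthest_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > farthest_dist:
            farthest_index, farthest_dist = i, d
    if farthest_dist <= tolerance:
        # Every intermediate vertex is within the tolerance: drop them all.
        return [first, last]
    # Otherwise split at the farthest vertex and simplify both halves.
    left = point_remove(points[:farthest_index + 1], tolerance)
    right = point_remove(points[farthest_index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex


line = [(0, 0), (1, 0.2), (2, -0.1), (3, 3), (4, 0.1), (5, 0)]
simplified = point_remove(line, tolerance=1.0)
print(simplified)
```

The sample line keeps its sharp peak at (3, 3), which illustrates why the result of this kind of simplification can contain spikes.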
Several geometrical properties of each bend are compared with those of a reference half circle, the diameter of which is equal to the specified simplification tolerance. These measures determine whether a bend is kept or eliminated, meaning that the bend is replaced by its baseline (the line connecting the endpoints of the bend). The simplification takes place iteratively such that the smaller bends may "disappear" in the early rounds and, therefore, form bigger bends. The resulting line follows the main shape of the original line more faithfully and shows better cartographic quality than that from Point Remove.

3.7.4 Choosing a suitable tolerance

The tolerance value determines the degree of simplification. To produce cartographic outputs, set the tolerance equal to or greater than the threshold of separation (the minimum allowable spacing between graphic elements). Since the tolerance is used for the entire input, trial and error may be required to find a suitable tolerance for all features. Using the same tolerance, Point Remove produces a rougher and more simplified result than Bend Simplify.

3.7.5 Analysing and improving the results

When the ‘Resolve Errors’ option is used in the command line or in script, or when the "Resolve topological errors" checkbox is checked on the dialog, the process will check for topological errors such as line crossings, coincident lines, or collapsed zero-length lines. If any of these errors are detected after the first round of simplification, the line segments involved (not the entire lines) will be located and a reduced tolerance, 50% of that previously used, will be applied to re-simplify these segments. This iteration will repeat as many times as needed until no more topological errors are found. The output feature class will contain two new attributes, MaxSimpTol and MinSimpTol, which show the range of tolerances actually used in simplifying each line.
The two fields, MaxSimpTol and MinSimpTol, will not be added if no errors were found in the process.

If the output feature class contains MaxSimpTol and MinSimpTol fields, you can estimate how well the specified tolerance worked for the data. If the values of the MaxSimpTol and MinSimpTol fields for the majority of output lines are smaller than the specified tolerance, it means many conflicts were found during the process and you might want to lower the tolerance. Lines whose MaxSimpTol and MinSimpTol values are smaller than the tolerance may represent a narrow area, such as a narrow, double-line river or two very close boundary lines. In that case, simplification of the lines may not be the right solution. The narrow features may need to be represented differently, for example, by single lines.

The tool simplifies lines one by one and the longer a line runs, the more pleasing the result. Keep this in mind when you collect or construct the source data. Also, wherever possible, position endpoints of lines on long, smooth sections of lines, rather than on severely bent sections.

3.8 Data Formats

3.8.1 Popular Vector Formats

There are many vector formats within the GIS industry. As the GISCO Reference Database is based on ESRI architecture, ESRI formats are generally preferred, although open standard formats such as GML will also be accepted in the future. The following paragraphs describe the ESRI vector data formats and GML.

3.8.1.1 Coverage (ESRI)

Coverages have been part of the ESRI ArcInfo software (now part of the ArcGIS suite) since the early 90’s, when GIS was in its infancy, and are still a viable format for complex geo-processing and spatial analysis. Coverages are a collection of files located in multiple directories. Because of this layout, ArcCatalog must be used to relocate, copy, rename, delete and reformat the data.
ESRI provides the export format (e00), which allows all spatial and descriptive information for a coverage to be combined into a single ASCII file. The reverse operation of import recreates the original coverage from the e00 file with no loss in accuracy, detail or topology.

Coverages can contain feature classes which are classified as either primary, composite or secondary. Primary features include arcs (lines), nodes, polygons and label points. Composite features such as routes and regions are built from primary features. Secondary features include ground-registration TIC marks and annotation. Multiple feature classes can be contained within the same coverage. For example, line, point, route and annotation features could all exist in the same coverage.

Perhaps the most useful characteristic of a coverage is the ability to maintain and store topology. Topology is the set of spatial relationships between vector features within the data structure. These relationships include:

• Line Connectivity (i.e. to and from),
• Line Contiguity (i.e. adjacency and direction), and
• Area Definition

Topology is stored very efficiently in the ESRI coverage structure (i.e. no redundant coordinates).

3.8.1.2 Shapefile (ESRI)

Shapefiles were first introduced with ESRI’s GIS software ArcView 2. Although ArcView is now part of the ArcGIS 9 suite and has changed dramatically, shapefiles can still be viewed, created and edited. Shapefiles can contain either points, multipoints (multiple points linked to one record), lines or polygons. A shapefile stores non-topological geometry and attribute information for the spatial features in a data set. The geometry for a feature is stored as a shape comprising a set of vector coordinates.

An ESRI shapefile consists of a main file, an index file, and a dBASE table. The main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices.
In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. The dBASE table contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. From ArcView 3.2a onwards, a shapefile can also include a projection file (.prj).

Because shapefiles do not have the processing overhead of a topological data structure, they have advantages over other data sources such as faster drawing speed and editability. Shapefiles handle single features that overlap or that are non-contiguous. They also typically require less disk space and are easier to read and write.

3.8.1.3 Geodatabase (ESRI)

The ESRI Geodatabase format was introduced with ArcGIS 8. It is designed to hold all types of geographical data and support a relational database structure. It can hold vector and raster data. The types of vector data a Geodatabase supports include lines, points and polygons, as well as associated annotation. Geodatabases can also store topology.

Each geodatabase can have a number of feature datasets. Feature datasets exist in the geodatabase to define a scope for a particular spatial reference. All feature classes that participate in topological relationships with one another (for example, a geometric network or a topology) must have the same spatial reference. Tables can also be linked to feature classes. As with coverages, ArcInfo is used to view, create, copy and rename the different elements.

Figure 22: Geodatabase Structure

Geodatabases can be stored as a ‘personal’ Geodatabase or a ‘multi-user’ Geodatabase. Personal Geodatabase support is built into ArcInfo and provides access to local data. It is directed towards personal or small work-group use and can handle small to moderately sized datasets.
It is held in Microsoft Access format, although this is invisible to the user. Microsoft Access is not required.

Figure 23: Personal Geodatabase support

A ‘multi-user’ geodatabase (a Geodatabase served through ArcSDE) can manage very large datasets and serve large numbers of viewers and editors. It can also be ‘versioned’. A versioned Geodatabase allows editors to work concurrently and includes a framework for resolving edit conflicts. A Geodatabase served through ArcSDE is supported in ArcIMS. A multi-user Geodatabase can be stored in Oracle 8, SQL Server, Informix, DB2 or Sybase.

Figure 24: Multi-user geodatabase support

3.8.1.4 GML (Geography Markup Language)

GML or Geography Markup Language is an XML based encoding standard for geographic information developed by the OpenGIS Consortium (OGC). Like any XML encoding, GML represents geographic information in the form of text. Text has a certain simplicity and visibility on its side. It is easy to inspect and easy to change. Add XML and it can also be controlled.

GML encoding already allows for quite complex features. The geometry of a geographic feature can also be composed of many geometry elements. A geometrically complex feature can consist of a mix of geometry types including points, line strings and polygons. It can also hold topology and attribute information. GML is becoming more and more popular within the GIS industry.
Below is an example of GML:

<Feature fid="142" featureType="school">
  <Description>Balmoral Middle School</Description>
  <Property Name="NumFloors" type="Integer" value="3"/>
  <Property Name="NumStudents" type="Integer" value="987"/>
  <Polygon name="extent" srsName="epsg:27354">
    <LineString name="extent" srsName="epsg:27354">
      <CData>
        491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
        491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
        491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
        491953.999999466,5458017.99963357
      </CData>
    </LineString>
  </Polygon>
</Feature>

3.8.2 Vector Format Limitations

3.8.2.1 Coverages

Coverage names are limited to a maximum of 13 characters and cannot contain spaces. The options for naming the features are very limited, which can lead to very cryptic naming conventions. Coverages have a number of disadvantages compared with Geodatabases. They do not hold subtypes and domains, or feature linked annotation. The relationships in a Geodatabase are also more intelligent than those of a coverage. Coverages can no longer be edited from ArcInfo Desktop 8.3 onwards.

3.8.2.2 Shapefiles

Like coverages and Geodatabases, shapefiles can support point, line and area features. Unlike a coverage or a Geodatabase, however, only a single shape type can be contained per shapefile. More importantly, a shapefile is a non-topological data structure, which can limit spatial analysis since connectivity and adjacency information is not explicitly recorded. Shapefiles do not hold relationships. ESRI does not provide an export format for shapefiles. Instead, ESRI recommends that you package the shapefile (e.g. using WinZip) for transfers, archiving and internet access.

3.8.2.3 Geodatabases

To edit an advanced Geodatabase feature (e.g. topology), at least an ArcEditor license is required. A personal Geodatabase holding vector data has a size limit of 2 GB.
Multi-user Geodatabases need commercial database software such as Oracle 8, SQL Server, Informix, DB2 or Sybase.

3.8.2.4 GML

GML is only now being adopted by the GIS industry. As the file format is text based and uncompressed, data sizes can be large.

3.8.3 Preferred Vector Formats

Topology and relationships are important when it comes to integrating data within the GISCO Reference Database. Therefore the ESRI export format (e00) created from coverages, or Geodatabases, are the preferred formats for vector data delivery. GML will be accepted in the future.

3.8.4 Popular Raster Formats

There are many raster formats available commercially, most of which can be used within the GIS environment. Below are some of the popular formats that can be georeferenced and viewed in GIS in their correct geographical location. Using GIS software they can also be projected into different coordinate systems.

3.8.4.1 TIFF (Tagged Image File Format)

TIFF is a very common format. It can hold colour information at varying bit levels (e.g. a two colour or 1 bit image has a smaller file size than a 16 colour or 4 bit image). It can have up to 16 million colours (24 bit). A TIFF file can hold colours in a particular order, making it a good format for holding sets of raster tiles with a consistent palette (e.g. using an ESRI Image Catalogue). This also means that TIFF is a suitable format for holding DEM (Digital Elevation Model) data where each colour can represent a height value. You can store TIFF files as uncompressed or compressed (using LZW compression). Until recently the LZW algorithm used to compress the file was owned by Unisys. It is now royalty free. TIFF files display quickly in a GIS package when a user navigates around.

3.8.4.2 JPEG (Joint Photographic Experts Group)

JPEG is a particularly popular format for photographs. It uses 16 million colours, which as a TIFF file would make the file size large.
However, JPEG uses a compression algorithm which takes into account the different elements of colour in adjacent pixels. This can greatly reduce file size without losing much quality to the naked eye. The quality of a JPEG file can be specified as a percentage.

3.8.4.3 GIF (Graphics Interchange Format)

GIF uses a palette of 16 to 256 colours and is popular with internet developers. It also uses a compression algorithm to make the file size smaller, but with no loss of picture quality (for an image with 256 or fewer colours). As with compressed TIFFs, the algorithm used to compress the file was owned by Unisys. It is now royalty free.

3.8.4.4 PNG (Portable Network Graphics)

PNG is an extensible file format for the lossless, portable, well-compressed storage of raster images. It has no patent on the compression algorithm and, like TIFF, can store colours in a particular order, making it a good format for holding sets of raster tiles with a consistent palette.

3.8.4.5 DEM (Digital Elevation Model)

There are a number of commercial raster formats designed specifically to hold geographical raster data with height information. These include DEM, ESRI’s GRID format, IMG (ERDAS Image Format) and BIL (Band Interleaved by Line Format). All these are regularly used within GIS packages and, like TIFF, hold height information as colour values, and can also hold projection information.

3.8.4.6 Geodatabase (ESRI)

Geodatabases can import most raster data. This allows raster data and vector data to be stored within the same file.

3.8.5 Raster Format Limitations

A TIFF file can have a large file size depending on the number of colours used, although LZW compression can greatly reduce this. Many GIS packages have been slow to take up GIF and LZW compressed TIFF format because of the algorithm license. Some packages may require an extra license. ESRI’s ArcGIS 9.1 (and a fully patched version of 9.0) does not need a special license file.
JPEGs always hold the maximum number of colours (16 million). This is not a consistent palette. It is automatically created when the file is saved. Every time a JPEG is saved with compression it will lose more quality.

Although PNG is a good all-round format, it is slower to display than TIFF as it has to be decompressed on the fly.

DEM formats are usually not read by raster graphics packages, but have to be created from other raster formats in a GIS package. The same is true for Geodatabases.

3.8.6 Preferred raster formats

Generally, JPEG should be avoided as it is more suitable for photographs than for geographic data. However, photographs may be part of your geographic database to aid the recognition of landmarks. These can often be linked to geographic coordinates with the use of hyperlinks.

LZW compressed TIFF is the most suitable format for one-off images (such as hillshading) or sets of raster tiles (such as topographic or street maps with a large coverage area).

For DEMs, IMG or BIL formats are preferred.

3.9 Documentation of data quality

The framework for applying quality assurance procedures and reporting the results is set by the draft ISO standards on quality principles (19113), evaluation procedures (19114), and metadata (19115).

3.9.1 Data Quality Overview

According to the GISCO metadata profile, every GIS layer has to be complemented with overview information on data quality. It consists of descriptions of the purpose, the usage and information on the history (lineage) of the GIS layer. Purpose describes the original objectives for creating the GIS layer; usage illustrates the actual usage(s) of the layer by describing related applications. The lineage gives information on the history of the dataset. It covers the total life cycle of a dataset from initial collection and processing to its current form.
The lineage statement may contain the component “source information”, which describes the origin of the dataset, and the component “process step”, which records the events or transformations in the lifetime of the dataset. Lineage also includes information on the process and the intervals used to maintain a dataset.

3.9.2 Data Quality Elements

In addition to the general statements on data quality in the overview elements, it is recommended that the GIS layers include information on quantitative data quality elements. These are completeness, logical consistency, positional accuracy and thematic accuracy.

Table 18: Selected data quality elements and sub-elements

Quality Element | Quality Sub-Element
Completeness | Commission
Completeness | Omission
Logical consistency | Conceptual consistency
Logical consistency | Domain consistency
Logical consistency | Topological consistency
Logical consistency | Format consistency
Positional accuracy | Absolute or external accuracy
Thematic accuracy | Classification correctness

3.9.3 Descriptors of the Data Quality Sub-Elements

The results of the quality assurance for the above mentioned data quality sub-elements should be described using seven descriptors. The descriptors comprise the
• scope,
• measure,
• evaluation procedure,
• result,
• value type,
• value unit and
• date
of the data quality sub-element.

Quality measurements are only valid for defined scopes. The scope can be a geographic or a temporal extent, or a certain level of the data hierarchy (i.e. dataset series, dataset, features, or attributes). The scope may even be different within a single dataset, e.g. if the dataset is merged from different data providers.

The data quality measure describes briefly the test that is used for measuring the quality within the defined scope. The evaluation procedure should be described or, alternatively, there should be a reference to where a detailed description of the procedure can be found. This description is very important because it is necessary to understand the result of the applied test.
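As an illustration only, the seven descriptors listed in this section can be held in a simple record. The following Python sketch is not part of the GISCO profile or of ISO 19115. The names QualityResult and completeness_results, the descriptor values it fills in, and the choice of the reference count as the denominator are all assumptions made for this example. It reports commission (features present in the dataset but not in the reference) and omission (features in the reference but absent from the dataset) from two lists of feature identifiers.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityResult:
    """One data quality sub-element described by the seven descriptors."""
    sub_element: str           # e.g. "Completeness / Commission"
    scope: str                 # extent or hierarchy level the test applies to
    measure: str               # brief description of the test
    evaluation_procedure: str  # description of, or reference to, the procedure
    result: float              # value produced by the test
    value_type: str            # type of the reported value
    value_unit: str            # unit of measurement
    date: datetime.date        # date on which the test was performed


def completeness_results(dataset_ids, reference_ids, scope, tested_on):
    """Return commission and omission rates as percentages of the reference count.

    Assumption for this sketch: both rates use the number of reference
    features as the denominator.
    """
    dataset, reference = set(dataset_ids), set(reference_ids)
    if not reference:
        raise ValueError("reference list must not be empty")

    def make(sub_element, measure, count):
        return QualityResult(
            sub_element=sub_element,
            scope=scope,
            measure=measure,
            evaluation_procedure="comparison of feature identifiers with a reference list",
            result=100.0 * count / len(reference),
            value_type="float",
            value_unit="percent",
            date=tested_on,
        )

    return [
        make("Completeness / Commission", "rate of excess features", len(dataset - reference)),
        make("Completeness / Omission", "rate of missing features", len(reference - dataset)),
    ]


if __name__ == "__main__":
    for r in completeness_results(["A", "B", "C", "X"], ["A", "B", "C", "D", "E"],
                                  scope="dataset", tested_on=datetime.date.today()):
        print(r.sub_element, r.result, r.value_unit, r.date)
```

Keeping the scope and the evaluation procedure in the same record as the result means that a reported value is never separated from the information needed to interpret it.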
Each test yields a certain result that is part of the data quality report. In order to understand the result, it is necessary to give information on the type of the value and on the unit of measurement. The reporting is completed with the date on which the quality test was performed.

3.9.4 Documenting Quality Information

The results of applied quality tests should be documented as part of the metadata. ISO 19115 provides a defined structure that follows the logic of the data quality elements, sub-elements and descriptors described above. The metadata standard distinguishes between data quality information as a report and as information on the history (lineage) of the data. The report comprises information on quality measurements, grouped according to the data quality sub-elements.

Figure 25: Conceptual model of metadata description on data quality

Example for reporting data quality according to ISO 19115:

APPENDICES

APPENDIX I - The elements of the GISCO profile

Legend: Core metadata for geographic dataset Within GISCO Metadata Editor (ref: ISO19115 2003:(E)) Elements produced by the editor that do not conform to GISCO Schema Profile

Section Name: General Information (Mandatory)

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Metadata author (D) | Name | Yes
Metadata author (D) | Organization | No
Metadata author (D) | Position or role in the organization | No
Metadata author (D) | Function in relation to the metadata | Yes
Metadata author (D) | Address | No
Metadata Information (C) | Character coding used for the metadata set | No
Metadata Information (C) | File identifier of the metadata | No
Metadata Information (C) | Scope to which the metadata applies | No
Metadata Information (C) | Name of the hierarchy levels | No
Metadata Information (C) | Name of the metadata standard used | No
Metadata Information (C) | Version of the metadata standard used | No
Metadata Information (C) | DataSet URI | No
Metadata Information (C) | Date that the metadata was created | Yes
Title (D) | Title of the dataset | Yes
Title (D) | Alternative titles | No
Title (D) | Edition or version number | No
Title (D) | Date for the edition or version | No
Creation Date and Language (D) | Date when the dataset was first created | Yes
Creation Date and Language (D) | Language in the metadata | No
Creation Date and Language (D) | Language used in the data | Yes
Point of contact overview (D) | Used to specify the points of contact to add |
Point of contact (D) | Point of contact’s name | Yes if one of the page fields is encoded
Point of contact (D) | Point of contact’s organization | No
Point of contact (D) | Point of contact’s position or role | No
Point of contact (D) | Function in relation to the dataset | No
Point of contact (D) | Point of contact’s address | No
Themes or Categories (D) | Themes or categories | Yes
Abstract (C) | Narrative summary about the content of the dataset | Yes
Purpose (C) | Summary of the intentions with which the resource was developed | No

Section Name: Data Information (Mandatory)

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Additional Characteristics (D) | Used to add pages for: Keywords, Scale, Maintenance information, Restrictions on use of data |
Keywords Overview (D) | Used to specify keywords pages to add |
Theme Keywords (D) | Keywords | Yes if keyword info is encoded
Theme Keywords (D) | Thesaurus name | Yes if thesaurus info is encoded
Theme Keywords (D) | Thesaurus date | Yes if thesaurus info is encoded
Theme Keywords (D) | Thesaurus date type | Yes if thesaurus info is encoded
Place Keywords (D) | Keywords; Thesaurus name; Thesaurus date; Thesaurus date type | Same as Theme Keywords page
Temporal keywords (D) | Keywords; Thesaurus name; Thesaurus date; Thesaurus date type | Same as Theme Keywords page
Stratum keywords (D) | Keywords; Thesaurus name; Thesaurus date; Thesaurus date type | Same as Theme Keywords page
Discipline Keywords (D) | Keywords; Thesaurus name; Thesaurus date; Thesaurus date type | Same as Theme Keywords page
Scale (D) | Single scale | Yes if scale info is added
Scale (D) | Scale range | Yes if scale info is added
Scale (D) | Resolution distance | No
Scale (D) | Resolution units of measure | No
Maintenance Information (D) | How often is the dataset updated | Yes if following Next update date is encoded
Maintenance Information (D) | When was the dataset last revised | No
Maintenance Information (D) | When is the dataset next scheduled to be updated | No
Restrictions overview (D) | Used to specify which restriction pages to add: Use restrictions, Legal restrictions, Security restriction |
Use restrictions (D) | Use restrictions |
Legal restrictions (D) | Data access restrictions | No
Legal restrictions (D) | Data use restrictions | No
Legal restrictions (D) | Do any other restrictions apply | No
Security Restriction (D) | Which security restriction has been applied to the dataset | No
Status (C) | Status of the resource | No

Section Name: Spatial Information

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Spatial Representation (C) | Method used to spatially represent geographic information | No
Vector representation (C) | Code which identifies the degree of the complexity | No
Vector representation (C) | Name of point and vector spatial objects | Yes if 'Total number of points…' field is encoded
Vector representation (C) | Total number of the point or vector object | No
Grid representation (C) | Number of spatial-temporal axes | Yes if Grid representation info is added
Grid representation (C) | Identification of grid data as point or cell | Yes if Grid representation info is added
Grid representation (C) | Identification of whether or not parameters for transformation exist | Yes if Grid representation info is added
Grid representation (C) | Dimension name. Name of the axis | Yes if information about spatial-temporal properties is added
Grid representation (C) | Resolution. Degree of detail in the grid dataset | No
Grid representation (C) | Dimension size. Number of elements along the axis | Yes if information about spatial-temporal properties is added
Coordinate System (C) | Coordinate system | No
Geographic bounding box (D) | Northern-most coordinate | Yes
Geographic bounding box (D) | Western-most coordinate | Yes
Geographic bounding box (D) | Eastern-most coordinate | Yes
Geographic bounding box (D) | Southern-most coordinate | Yes
Additional extent information (D) | Page used to specify additional extent information (temporal information, vertical information) |
Temporal information (D) | Single date | No
Temporal information (D) | Range of date / Start date | Yes if End date is added
Temporal information (D) | Range of date / End date | Yes if Start date is added
Vertical Information (D) | Minimum height | Yes if vertical info is added
Vertical Information (D) | Maximum height | Yes if vertical info is added
Vertical Information (D) | Units of measure | Yes if vertical info is added
Vertical Information (D) | Vertical datum |

The Vector representation page is added only if the Spatial Representation value is set to 'vector'. The Grid representation page is added only if the Spatial Representation value is set to 'grid'.

Section Name: Data Quality Information

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
General Quality Information (C) | General explanation of the data producer's knowledge about the lineage of the dataset | No
Source Information (C) | Source Information | No
Process step Information (C) | Description | Yes if process step info is added
Data Quality Elements overview (C) | Used to include Data Quality Elements Sections |

Section Name:
Quality completeness commission.
Quality completeness omission.
Quality conceptual consistency.
Quality format consistency.
Quality topological consistency.
Quality absolute external positional accuracy.
Quality gridded data positional accuracy.
Quality relative internal positional accuracy.
Quality accuracy of a time measurement.
Quality temporal consistency.
Quality temporal validity.
Quality thematic classification correctness.
Quality non quantitative attribute accuracy.
Quality quantitative attribute accuracy.
Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
[Completeness: Commission] (C) | Name of the test applied to the data | No
[Completeness: Commission] (C) | Code identifying a registered standard procedure / Authority | No
[Completeness: Commission] (C) | Code identifying a registered standard procedure / Value Code | Yes if the above field is added (Authority info)
[Completeness: Commission] (C) | Description of the measure | No
[Completeness: Commission] (C) | Type of method used to evaluate quality | No
[Completeness: Commission] (C) | Description of the evaluation method | No
[Completeness: Commission] (C) | Reference to the procedure information | No
[Completeness: Commission] (C) | Date on which a data quality measure was applied | No
[Completeness: Commission] (C) | Specify a result type (radio button selection). This will generate either the Conformance result page or the Quantitative result page | No
Conformance result (C) | Explanation | Yes if data is added in [Completeness Commission] page
Conformance result (C) | Indication of the conformance result | Yes if data is added in [Completeness Commission] page
Quantitative result (C) | Value type for reporting a data quality result | No
Quantitative result (C) | Value unit for reporting a data quality result | Yes if Quantitative results info is added
Quantitative result (C) | Statistical method used | No
Quantitative result (C) | Quantitative value determined by the evaluation procedure | Yes if Quantitative results info is added

Section Name: Distribution Information

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Introduction (D) | Introduction page to specify the availability of the dataset to third parties |
Publication Date (D) | When was the dataset published | No
Distributor (D) | Distributor’s name | No
Distributor (D) | Distributor’s organization | No
Distributor (D) | Distributor’s position or role | No
Distributor (D) | Function in relation to the dataset | Yes if distributor info is added
Distributor (D) | Distributor’s address | No
Digital publication (D) | Page asking if the dataset is published in digital format |
Publication format (D) | Name of the format | Yes if Publication format info is added
Publication format (D) | Version of the data format | Yes if Publication format info is added
Off-line delivery options (D) | Medium on which the dataset is available | No
On-line delivery options (D) | Where is the dataset located | Yes if 'On-line delivery options' info is added
On-line delivery options (D) | What connection protocol must be used to access this location | No
On-line delivery options (D) | What function is performed at this location | No
On-line delivery options (D) | What happens or what is available at this location | No
Ordering process (D) | Ordering instructions | No
Ordering process (D) | Typical turnaround time for completing an order | No
Ordering process (D) | Fees and terms for purchasing the dataset | No

Section Name: Metadata Attribute Information (Optional)

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Table Description (C) | Name | Yes if Metadata Attribute info is added
Table Description (C) | Description | No
Fields Description: ‘i attribute name’ (C) | Attribute Label to add | Yes if attribute info is added
Fields Description: ‘i attribute name’ (C) | Label (read only field) | No
Fields Description: ‘i attribute name’ (C) | Description | No
Fields Description: ‘i attribute name’ (C) | Field Type | Yes if attribute info is added
Fields Description: ‘i attribute name’ (C) | Source | No
Fields Description: ‘i attribute name’ (C) | Binary Width | Yes if attribute info is added
Fields Description: ‘i attribute name’ (C) | Precision | No
Fields Description: ‘i attribute name’ (C) | Scale | No

Section Name: Legislation Information (Optional)

Page Name (Default/Custom) | Field Name / Encoded Elements | Mandatory for Validation
Legislation option (C) | Page used to specify whether or not to add legislation information |
Legislation Information (C) | Title | Yes if legislation info is added
Legislation Information (C) | Reference date type | Yes if legislation info is added
Legislation Information (C) | Reference date | Yes if legislation info is added
Legislation Information (C) | Legislation type | Yes if legislation info is added
Legislation Information (C) | Country or other entity to which the legislation corresponds | Yes if legislation info is added
Legislation Information (C) | Internal reference | Yes if legislation info is added

APPENDIX II - Glossary of Technical GIS Terms

ArcCatalog
The file management application of ArcGIS.

ArcGIS
ESRI's GIS suite of software.

ArcInfo
The highest license available for ArcGIS, allowing full functionality.

ArcMap
The central application in ArcGIS for all map-based tasks.

ArcSDE
Server software that provides ArcSDE client applications (for example, ArcGIS Desktop, ArcGIS Server, ArcIMS) with a gateway for storing, managing and using spatial data in one of the following commercial database management systems: IBM DB2 UDB, IBM Informix, Microsoft SQL Server, and Oracle.

Attribute
A property of a geographic feature described numerically or by characters. Attributes are mostly stored in tabular format, and are linked to the feature by a user-assigned identifier. In a geo-relational database model, it describes a spatial feature (e.g. point, line, node or area).

Coordinate System
A fixed reference framework superimposed onto the surface of an area to designate the position of a point within it; a reference system consisting of a set of points, lines, and/or surfaces, and a set of rules, used to define the positions of points in space in either two or three dimensions. The Cartesian coordinate system and the geographic coordinate system used on the earth's surface are common examples of coordinate systems.

Coverage
A data model for storing geographic features using ArcInfo software. A coverage stores a set of thematically associated data considered to be a unit. It usually represents a single layer, such as soils, streams, roads, or land use. In a coverage, features are stored as both primary features (points, arcs, polygons) and secondary features (tics, links, annotation). Feature attributes are described and stored independently in feature attribute tables. Coverages cannot be edited in ArcGIS.

Dangle Node
An end-point node that represents an overshoot or undershoot of an intersecting line.

Dataset
Any organized collection of data with a common theme.

Datum
In the most general sense, any set of numeric or geometric constants from which other quantities, such as coordinate systems, can be defined. A datum defines a reference surface. There are many types of datums, but most fall into two categories: horizontal and vertical.

Domain
In a geodatabase, a mechanism for enforcing data integrity. Attribute domains define what values are allowed in a field in a feature class or nonspatial attribute table. If the features or nonspatial objects have been grouped into subtypes, different attribute domains can be assigned to each of the subtypes.

Ellipsoid
A three-dimensional, closed geometric shape, all planar sections of which are ellipses or circles. An ellipsoid has three independent axes, and is usually specified by the lengths a, b, c of the three semi-axes. If an ellipsoid is made by rotating an ellipse about one of its axes, then two axes of the ellipsoid are the same, and it is called an ellipsoid of revolution, or spheroid. If the lengths of all three of its axes are the same, it is a sphere.

Feature Class
A collection of geographic features with the same geometry type (such as point, line, or polygon), the same attributes, and the same spatial reference. Feature classes can stand alone within a geodatabase or be contained within shapefiles, coverages, or other feature datasets. Feature classes allow homogeneous features to be grouped into a single unit for data storage purposes.

Feature Dataset
A collection of feature classes stored together that share the same spatial reference; that is, they have the same coordinate system, and their features fall within a common geographic area. Feature classes with different geometry types may be stored in a feature dataset.

Foreign Key
A column or combination of columns in one table whose values match the primary key in another table. A value in the foreign key can only exist if there is a corresponding value in the primary key, unless the value is NULL. Foreign key–primary key relationships define a relational join.

Generalisation
Simplification of map information, so that information remains clear and uncluttered when map scale is reduced. Usually involves a reduction in detail, a resampling to larger spacing, or a reduction in the number of points in a line. Traditionally this has been done manually by a cartographer, but increasingly semi-automated and even automated methods have been used, particularly in conjunction with a GIS.

Geodatabase
An object-oriented data model introduced by ESRI that represents geographic features and attributes as objects and the relationships between objects, but is hosted inside a relational database management system. A geodatabase can store objects such as feature classes, feature datasets, nonspatial tables, and relationship classes.

GIS
Geographic Information System: an organised set of computer hardware, software, geographic data and personnel designed to capture, store, maintain, analyse and present all kinds of spatially-referenced information in the most efficient way.

Legend
The reference area on a map that lists and explains the colors, symbols, line patterns, shadings, and annotation that have been used on the map to code the various elements and data values. The legend includes a sample of each symbol with text describing what it means. Legends often include the map's scale, origin, and projection.

Map Projection
A mathematical model that transforms the locations of features on the earth's surface to locations on a two-dimensional surface, normally using Cartesian co-ordinates.

Meridian
A great circle on the earth that passes through the poles, often used synonymously with longitude.

Metadata
Information about the content, quality, condition, and other characteristics of data.

Node
An endpoint of an arc. The from-node is the first vertex in the arc; the to-node is the last vertex.

Polygon
An area defined by the arcs that make up its boundary, including arcs defining any islands inside. It is a many-sided, closed figure, defined by a series of arcs and by a label point positioned inside the polygon.

Primary Key
A column or set of columns in a database that uniquely identifies each record. A primary key allows no duplicate values and cannot be NULL.

Raster
Spatial data recorded using cells. The location of the feature is described through its position in a grid (consisting of rows and columns). The cell at this particular position has a certain value, being the value for the geographic phenomenon represented.

Schema
The structure or design of a database or database object such as a table. In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables.

Shapefile
A vector data storage format for storing the location, shape, and attributes of geographic features. A shapefile is stored in a set of related files and contains one feature class.

Subtype
In geodatabases, a subset of features in a feature class or objects in a table that share the same attributes.

Thematic Data
Data in which non-geographical datasets (e.g. demographic data) are connected to an indirect geographical reference (e.g. a region code).

Topology
Procedure for the explicit definition of spatial relationships between connecting or adjacent coverage features.

Vector
Geographic features are recorded as sets of co-ordinates using a Cartesian co-ordinate system. The location of the feature is described through a series of x,y co-ordinates.

Vertex
One of a set of ordered x,y co-ordinates that compose a line feature.
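
As an illustration of the Generalisation and Vertex entries above, the "reduction in the number of points in a line" can be sketched in a few lines of Python. The sketch uses the Douglas-Peucker algorithm, which is a common choice for this task but is not named in these guidelines and is not claimed to be the method used by any particular GIS package. The function names and the tolerance parameter are assumptions made for this example.

```python
import math


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide, so measure the distance to that single point
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p along a-b, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(vertices, tolerance):
    """Return a reduced list of vertices (Douglas-Peucker).

    A vertex is dropped when it lies within `tolerance` of the straight
    line joining the retained vertices on either side of it.
    """
    if len(vertices) < 3:
        return list(vertices)
    first, last = vertices[0], vertices[-1]
    # Find the intermediate vertex farthest from the first-last segment
    index, farthest = 0, 0.0
    for i in range(1, len(vertices) - 1):
        d = _distance_to_segment(vertices[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively
    left = simplify(vertices[: index + 1], tolerance)
    right = simplify(vertices[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
    print(simplify(line, 0.1))
```

The tolerance is expressed in the units of the co-ordinate system, so a larger tolerance corresponds to generalisation for a smaller map scale.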

APPENDIX III - Glossary of Abbreviations

API: Application Program Interface
CORINE: CoORdination of INformation on the Environment
DBMS: DataBase Management System
DEM: Digital Elevation Model
ETRS: European Terrestrial Reference System
EU25: The 25 Countries of the European Union
FDD: Frequency Distribution Diagram
GIF: Graphics Interchange Format
GISCO: Geographic Information System of the European Commission
GML: Geography Markup Language
HDV: Histogram of Data Values
HTML: HyperText Markup Language
IATA: International Air Transport Association
ICAO: International Civil Aviation Organisation
INSPIRE: INfrastructure for SPatial InfoRmation in Europe
INTERREG: An EU-funded programme that helps Europe’s regions form partnerships to work together on common projects
ISO: International Organization for Standardization
JPEG: Joint Photographic Experts Group
LAEA: Lambert Azimuthal Equal Area
LCC: Lambert Conformal Conic
LZW: Unisys patented Lempel Ziv Welch data compression and decompression technology
MARS: Monitoring Agriculture with Remote Sensing
NMA: National Mapping Agencies
NSI: National Statistical Institutes
NUTS: Nomenclature of territorial units for statistics. This abbreviation is only applicable to EU members.
OGC: Open Geospatial Consortium
PNG: Portable Network Graphics
RDBMS: Relational DataBase Management System
ROI: Return On Investment
SDI: Spatial Data Infrastructures
SOAP: Simple Object Access Protocol
TIFF: Tagged Image File Format
TM: Transverse Mercator
UDDI: Universal Description, Discovery, and Integration
UML: Unified Modeling Language
UN/ECE: United Nations Economic Commission for Europe
WFS: Web Feature Service
WMS: Web Map Service
XML: eXtensible Markup Language

APPENDIX IV - List of Document Sources

This document is a combination of new material and material collated from other sources, which has been edited and/or updated. Below is a list of the sources that have been used as the base material for various chapters.

1.1 Naming Conventions
  GISCO Database Manual
  SADL, K.U.Leuven R&D
  Copyright © 2005 Eurostat

1.3 Spatial Reference System
  Map Projections for Europe
  Joint Research Centre
  EuroGeographics
  Bundesamt für Landestopographie, Switzerland
  Bundesamt für Kartographie und Geodäsie, Germany

1.4 Grid Creation Standards
  1st Workshop on European Reference Grids
  Ispra, 27-29 October 2003
  JRC-IES-LMU-ESDI

1.5 Generalisation Parameters
  ArcGIS Help
  Copyright © 2004 ESRI

1.6 Spatial Data Standards and GIS Interoperability
  Spatial Data Standards and GIS Interoperability
  An ESRI White Paper - January 2003
  Copyright © 2003 ESRI
  OGC Website http://www.opengeospatial.org

2.4 Making GISCO Maps
  GISCO Desktop Mapping guide
  Desktop Mapping guide, version 2
  Geographical Information System for the Commission
  Directorate E - Unit E4
  Structural Funds Mapping Tool User Guide
  Lovell Johns Ltd, Oxfordshire, UK.
  Copyright © 2005 Eurostat

3.1-3.6 Data Quality
  Quality Assessment and Quality Control related to GISCO data
  Copyright © 1998 Eurostat

3.7 Generalisation
  ArcGIS Help
  Copyright © 2004 ESRI