Guidelines for Geographic Data
Intended for the GISCO Reference Database
Lovell Johns Ltd, Witney, UK
Copyright © 2005 Eurostat
3rd November 2005
Contents
Introduction
  i    Introduction to Eurostat
  ii   What is GISCO?
  iii  INSPIRE and the GISCO project
  iv   GISCO Reference Database
  v    Contact Information
  vi   Purpose of guidelines
  vii  Intended audience
1 Geographic Guidelines: Normative Section
  1.1 Naming Conventions
    1.1.1 Introduction
    1.1.2 Naming Conventions Description
  1.2 Metadata
    1.2.1 The Metadata Standard
    1.2.2 The GISCO Profile
    1.2.3 The mandatory elements of the GISCO profile
    1.2.4 The Metadata Editor
    1.2.5 Metadata Editor Use (Within ArcCatalog)
    1.2.6 Validation of compliance of metadata to GISCO profile
    1.2.7 Metadata Server
  1.3 Spatial Reference System
    1.3.1 Introduction
    1.3.2 ETRS89 Ellipsoidal Coordinate Reference System (ETRS89)
    1.3.3 ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn)
    1.3.4 ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC)
    1.3.5 ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA)
  1.4 Grid Creation Standards
    1.4.1 Introduction
    1.4.2 European Grid Coding System
  1.5 Generalisation and Generalisation Parameters
    1.5.1 Introduction
    1.5.2 Generalisation in ArcGIS
    1.5.3 GISCO Reference Database Generalisation Parameters
  1.6 Database Interoperability
    1.6.1 Overview
    1.6.2 Spatial Data Standards and GIS Interoperability
2 Geographic Guidelines: Non-Normative Section
  2.1 What are good practices and what are bad practices for creation of data
    2.1.1 Data delivery characterset
    2.1.2 Data including names
    2.1.3 Avoid specific Geodatabase features
  2.2 Some tips and tricks to make the dataflow more fluent
    2.2.1 Delivering data in (personal) geodatabase
    2.2.2 Use GISCO standard database projection, extent and precision
  2.3 Experienced difficulties in loading data in the new GISCO structure
    2.3.1 Delivering in shapefile format
    2.3.2 Problems with charactersets
  2.4 Making GISCO Maps
    2.4.1 Introduction
    2.4.2 Why do we use maps?
    2.4.3 Statistical Analysis in GIS
    2.4.4 What is a thematic map?
    2.4.5 Introduction to mapping concepts
    2.4.6 Creating a statistical map using the ‘Mapping Tool’
3 Geographic Guidelines: Data Quality Section
  3.1 Quality Assurance Principles
  3.2 Geometric Quality
    3.2.1 Scale and Resolution
    3.2.2 Positional Accuracy
  3.3 Topological Quality
    3.3.1 Dangle Nodes
    3.3.2 Polygon Topology
  3.4 Attribute Consistency
    3.4.1 Limited code lists
    3.4.2 Unlimited code lists
    3.4.3 Continuous values
  3.5 Topological Consistency
    3.5.1 Relationships between datasets
  3.6 Completeness
    3.6.1 Geographical Completeness
    3.6.2 Attribute Completeness
  3.7 Generalisation
    3.7.1 Introduction
    3.7.2 Point Remove
    3.7.3 Bend Simplify
    3.7.4 Choosing a suitable tolerance
    3.7.5 Analysing and improving the results
  3.8 Data Formats
    3.8.1 Popular Vector Formats
    3.8.2 Vector Format Limitations
    3.8.3 Preferred Vector Formats
    3.8.4 Popular Raster Formats
    3.8.5 Raster Format Limitations
    3.8.6 Preferred Raster Formats
  3.9 Documentation of Data Quality
    3.9.1 Data Quality Overview
    3.9.2 Data Quality Elements
    3.9.3 Descriptors of the Data Quality Sub-elements
    3.9.4 Documenting Quality Information
APPENDICES
  APPENDIX I - The elements of the GISCO profile
  APPENDIX II - Glossary of GIS Terms
  APPENDIX III - Glossary of Abbreviations
  APPENDIX IV - List of Document Sources
Introduction
i. Introduction to Eurostat
A consistent European policy requires appropriate data, upon which well-founded decisions can be
taken and a solid policy can be built. The collection and maintenance of such suitable data is one
of the tasks of Eurostat, the body responsible for statistics within the European Commission.
The initial impetus for the development of GISCO came from the Unit for Environment Statistics
within the Directorate also responsible for agricultural statistics. The Directorate had already been
involved in the MARS Programme (Monitoring Agriculture with Remote Sensing) and the need to
integrate environmental information with more classical statistical indicators encouraged the use of
a geographical reference frame as a mechanism to relate such information. A significant influence
in these developments was the CORINE Programme (CoORdination of INformation on the
Environment), established in 1985, which assembled environmental and non-environmental
geographical data in a GIS.
As a result, the GISCO project was initiated. It has resulted in a broad range of GIS-related services
for Eurostat, the European Commission and European organisations.
ii. What is GISCO?
ii(a). Organisation
GISCO is the Geographic Information System of the European Commission. Originally conceived
as a prototype GIS cell that would serve a wide spectrum of users and uses, the GISCO project
has developed a service-oriented dimension, namely in geographical database development,
thematic mapping, desktop mapping and dissemination of data. Providing these types of services
is directly related to key parts of the GISCO mandate.
The GISCO team consists of four distinct modules with the following tasks:
• GISCO Reference Database;
• Mapping and Spatial Analysis;
• Contact with users, producers and COGI;
• INSPIRE.
ii(b). Mandate
The mandate of GISCO can be subdivided according to the different actors that are involved.
GISCO takes initiatives at the level of the Directorate General, Eurostat, at the level of the
Commission, within the European statistical system and on the international level. The mandate of
GISCO comprises the following tasks:
1. Within Eurostat
• Raise awareness concerning ways to combine geographical and statistical information;
• Promote and stimulate the use of GIS;
• Manage the GISCO reference database;
• Ensure quality control of Eurostat GIS products;
• Act as a consultancy and reference centre for map production and spatial analysis;
• Oversee GIS related projects.
2. Within the Commission
• Manage the GISCO reference database including database structure, architecture, data
transfer, ...;
• Chair the GISCO user committee and technical committee, organise and chair the
meetings of the COGI;
• Express and support users' needs in terms of software and hardware;
• Promote and actively participate in the co-ordination of Commission activities in the area of
GI and GIS.
3. Within the European statistical system
• Promote geo-referencing of statistics and encourage the integration of GIS in the national
statistical offices;
• Promote collaboration between national statistical institutes (NSI) and national mapping
agencies (NMA);
• Promote harmonisation and co-ordination of the GI management systems used in
statistical organisations;
• Ensure standardisation and harmonisation in the exchange of geographical information
between Member States and Eurostat;
• Co-ordinate the participation of European statisticians in GI and GIS activities, promote
their know-how in standardisation processes and ensure that their needs are taken into
account in market developments.
4. International co-operation
• Promote co-operation between NMA at European level and harmonise their approaches
on technical matters and commercial policies, including pricing and copyright;
• Pursue the harmonisation of EU and broader international initiatives in GI;
• Participate in GIS-related projects in other statistical international organisations, for
example, the United Nations Economic Commission for Europe (UN/ECE).
iii. INSPIRE and the GISCO project
The Commission is currently preparing legislation aimed at improving the integration of spatial data
in Europe. The initiative is known as INSPIRE (INfrastructure for SPatial InfoRmation in Europe). A
spatial data infrastructure is considered as an interacting system of basic geographical data,
spatial information services, technical standards and specifications and an institutional framework.
The initiative is based on the following principles that guide its activities:
1. Data should be collected once and maintained at the level where this can be done most
effectively;
2. It should be possible to combine seamless spatial information from different sources
across Europe and share it between many users and applications;
3. It should be possible for information collected at one level to be shared between all the
different levels, detailed for detailed investigations, general for strategic purposes;
4. Geographic information needed for good governance at all levels should be abundantly
available under conditions that do not restrain its extensive use;
5. It should be easy to discover which geographic information is available, fits the needs for a
particular use and under which conditions it can be acquired and used;
6. Geographic data should become easy to understand and interpret because it can be
visualised within the appropriate context and selected in a user-friendly way.
The INSPIRE initiative was developed with the active collaboration of the main stakeholders
concerned. During 2002, six working groups helped to draw up the various components of the
infrastructure:
• joint reference data and metadata;
• environmental data;
• data policy and legal aspects;
• architecture and standards;
• financing and implementation structures;
• impact analysis.
The reports produced by these groups and other documents on the INSPIRE initiative can be
found at the following address: http://www.ec-gis.org/inspire.
The proposal for an INSPIRE Directive has a framework structure, which needs further technical
refinement through Implementing Rules. Five Implementing Rules have been identified in the
proposal for a Directive, respectively dealing with:
• Creation and updating of metadata for spatial data and spatial data services;
• Harmonised spatial data specifications;
• Network services and interoperability;
• Rules governing access and rights of use to spatial data sets and services for Community
institutions and bodies;
• Monitoring and reporting of the implementation of the Directive.
The future Directive will address the Member States who will subsequently transpose the Directive
into national/regional legislation. Following good governance practices, the Commission, as the
initiator and facilitator of INSPIRE, is also engaged to comply with future INSPIRE measures for all
spatial data and services held and managed by the Commission itself. In line with Member States'
expectations, GISCO and related SDI components within the Commission will have to
progressively become the EU node in a distributed EU-wide SDI architecture. GISCO has to
ensure INSPIRE compliant development of GI service components as part of the Commission’s
internal GI infrastructure.
More information on INSPIRE can be found at http://inspire.jrc.it.
The new GISCO database should follow the INSPIRE principles. The technology used will have to
comply with the INSPIRE standards. The applications should be compatible with the applications of a
future INSPIRE network.
iv. GISCO Reference Database
Within the framework of the GISCO project, an extensive geo-referenced database has been
developed. One of the main topics of the GISCO mandate is to extend, maintain and update this
database.
The numerous data sets offered by GISCO include:
Topographical data:
• hydrography (water patterns, lakes, ...);
• altimetry (digital elevation model);
• infrastructure data (ports, airports, roads, rail networks, ...);
• administrative entities (countries, regions, ...).
Thematic data:
• land resources (land cover, soil data, vegetation, climatic conditions, ...);
• Community support frameworks (structural funds, INTERREG, ...);
• environmental data (coastal erosion, soil erosion, ...);
• industrial themes (energy transport networks, location of nuclear power stations, ...).
The GISCO Reference Database is the central database of the GISCO System Architecture. This
system will be extended to contain other spatial databases, such as IMAGE 2000, that are
regarded as complementary data. A user will be able to connect to more than one database via
different interfaces: by connecting directly to another spatial database, or by using an Internet
Map Server connection to retrieve and view web maps in a web browser, view the data in a
web browser application, or access web services.
v. Contact Information
Points of contact for further information on how to access the GISCO service and GISCO
database:
Eurostat
GISCO project
Rue Alcide Gasperi
Batiment Bech D3/704
L-2920 LUXEMBOURG
Tel: (352) 4301 - 32076
Fax: (352) 4301 - 34029
E-mail functional mailbox for GISCO: [email protected]
Intranet website: www.cc.cec/gisco_eurostat
vi. Purpose of guidelines
These guidelines have been created in order to assure the compatibility of newly generated or
converted geographic data with the GISCO standards.
The guidelines are divided into three main sections:
The normative section addresses particular standards that must be met such as naming
conventions, geographic reference systems, grids and metadata. Some of these chapters are
already contained in the first section of the GISCO database manual.
The non-normative section discusses difficulties in data loading, good and bad practices in the
creation of data and some tips and tricks to make data flow more fluent. In-depth guidelines are
also included on how to create GISCO maps, presenting some of the main principles of cartography
and thematic mapping.
The final section of the guidelines addresses data quality and consistency. This section discusses
the quality criteria that GIS data should meet before being used with, or
included in, the GISCO Reference Database.
There are appendices containing a glossary of common GIS terms and abbreviations used in the
document. This document is a combination of new material and material collated from other
sources which have been edited and/or updated. The appendices also contain a list of sources that
have been used for the base material of various chapters.
vii. Intended audience
The main audience for the guidelines is the data integrators of the GISCO Reference Database. The
users of the GISCO database should have a good knowledge of GIS and geographic data. These
guidelines do not explain what GIS is, nor do they introduce basic GIS principles such as projections or
metadata. They do not explain how to work with ArcMap or any other GIS software component.
The guidelines are also relevant for the GISCO Management, giving an understanding of the
procedures used for the integration of geographic data. Other developers wanting to work
in conformity with GISCO standardised procedures may also find these guidelines useful.
1 Geographic Guidelines: Normative Section
This section addresses particular standards that must be met when creating data intended for the
GISCO Reference Database such as naming conventions, geographic reference systems, grids
and metadata. Some of these sections are already contained in the first section of the GISCO
database manual, others are collated from other sources or unique to this document.
The non-normative section discusses difficulties, tips, and guidelines. The final section of the
document addresses data quality and consistency.
1.1 Naming Conventions
1.1.1 Introduction
GISCO database features (e.g. feature classes, tables) are held in a relational database structure.
The aims of the naming conventions are:
• to reflect the contents of the feature in a standardised, concise way;
• to reflect the logical and physical location of the feature within the database;
• to assure uniqueness of the feature name within the database.
A sequence of abbreviations is therefore used to describe the contents of a database feature.
The codes are grouped into code lists according to their meaning. Syntax rules define the
sequence and the reading of the codes. The names of features, tables and attributes are
composed according to the following categories:
• Topic
  Feature data themes, feature classes, object classes and subtypes are named according to
  their topic category.
• Entity
  The type of a feature or object, e.g. region, boundary, point.
• Scale, Accuracy, Precision
• Time stamp or Version
• Source
The naming conventions describe naming rules for the following database features:
• Feature data themes
• Feature, Object classes and subtypes
• Relationships
• Domains
• Attributes
The attribute and class names cannot exceed 30 characters in length. This restriction is due to the
limitation in length for the names of tables and attributes in ORACLE.
Long names are self-explanatory, but become unwieldy in programs, scripts,
table headings, etc. Sensible and well-defined contractions in the attribute and table names can improve
the readability of documentation and programming code.
The names of the features and objects in the geodatabase are not meant to be a subset of the
metadata. These names must contain the minimum information required to uniquely identify the
entity they represent.
1.1.2 Naming Conventions Description
1.1.2.1 Feature data themes
Alphanumeric data (object classes) and geometry (feature classes) will be conceptually grouped in
feature data themes. This concept replaces the former “layers” and is not part of the
geodatabase structure. Feature data themes are hierarchically independent of feature datasets, i.e.
one feature dataset could contain several feature data themes or vice versa.
The first step in defining the names of the different classes and attributes is to identify the
geographical entity modelled in the feature data theme. The datatype name can be as long as
desired. It represents an abstract concept and is not bound by the limitation of name length in
databases.
Every datatype will have an associated short name. The short name will have a maximum of 4
characters. The generic words “area”, “zones”, “location”, “patterns”, etc. will be disregarded when
choosing the short name.
Table 1: Examples of short names
Feature Type (long name)                                        Short name
Territorial Units for Statistics (NUTS + Statistical Regions)   NUTS
Communes                                                        COMM
Structural Funds Zones                                          STFD
Urban Audit Areas                                               URAU
Designated Areas                                                DSIG
Inland water                                                    INWA
A feature data theme will comprise at least one feature class or object class.
1.1.2.2 Feature/object class & subtypes names:
The class name must contain only the information needed to identify the class uniquely.
Depending on the final model, this may include the version, scale, time stamp or source. It must not
include the extent (EU, EC, WD): the former “extent” segment of the coverage name does not appear
in the new naming conventions. The projection is also dropped, as vector data will be stored in
geographical coordinates. Raster data will be stored in one of the standard coordinate reference systems.
1.1.2.3 Class identification
In order to identify feature classes within a feature data theme, their names will be extended by a class
identification. The class identification is conditional: it is needed only if the feature data theme name
and entity type name do not uniquely identify a feature or object class.
The class identification must be based exclusively on concepts essential to the class. Scale or time
stamps are not essential to any class.
- NUTS-1
The class identification will always be a singular noun.
Every class identifier will have an associated short name. This short name will have a maximum of 4
characters.
1.1.2.4 Entity type
The entity type describes the concept used for modelling a certain feature class. The table below gives an
overview of the keywords that should be used for describing the type of entity. The description of
the entity type is mandatory. The entity type is abbreviated to 2 characters.
Table 2: Keywords for describing entity types
Short name  Long name   Description                                 Example
PO          Polygon     A closed, two-dimensional figure with at    Lake polygon
                        least three sides that represents an
                        area. It is used in GIS to describe spatial
                        elements with a discrete area, such as
                        parcels, political districts.
RG          Region      Area feature that can represent a single    NUTS regions
                        area feature as more than one polygon
                        (multipart polygons).
BN          Boundary    Line feature separating polygon features    NUTS boundary
LI          Line        Line feature representing a geographical
                        entity
NW          Network     An interconnected set of lines              Road network
                        representing possible paths from one
                        location to another (routing aspect)
LB          Label       Point feature, used as a reference for a    Centroid of NUTS region
                        polygon
PT          Point       Feature modelled as a point                 Settlement
ND          Node        End point of a line feature                 Road junctions
AN          Annotation  Text feature for annotating a map           Ocean names
RT          Route       Linear feature specifying a path through
                        a network
GR          Grid        A data format for storing raster data that  Digital elevation model
                        defines geographic space as an array of
                        equally sized square cells arranged in
                        rows and columns. Each cell stores a
                        numeric value that represents a
                        geographic attribute (such as elevation)
                        for that unit of space.
IM          Image       A raster-based representation or            Satellite image
                        description of a scene, typically
                        produced by an optical or electronic
                        device, such as a camera or a scanning
                        radiometer.
Examples:
The datatype “NUTS” models only NUTS regions. The class identification can be dropped. The
entity type is a generic one: NUTS-RG, NUTS-BN, NUTS-LB.
The datatype “Urban Audit” models 3 different entities: Cities, Kernels and Large Urban Zones.
The generic class identifications cannot be used. Instead, specific class identifications are
needed: URAU-CITY-PO, URAU-CITY-LB, URAU-LUZO-PO
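The two-character entity type codes of Table 2 lend themselves to a simple lookup table. The following is a minimal Python sketch, not part of the GISCO tooling: the names ENTITY_TYPES and entity_long_name are hypothetical and chosen for illustration only, while the codes and long names are copied from Table 2.

```python
# Entity type codes and long names as listed in Table 2.
# ENTITY_TYPES and entity_long_name are illustrative names, not GISCO identifiers.
ENTITY_TYPES = {
    "PO": "Polygon",
    "RG": "Region",
    "BN": "Boundary",
    "LI": "Line",
    "NW": "Network",
    "LB": "Label",
    "PT": "Point",
    "ND": "Node",
    "AN": "Annotation",
    "RT": "Route",
    "GR": "Grid",
    "IM": "Image",
}


def entity_long_name(code):
    """Return the long name for a 2-character entity type code."""
    try:
        return ENTITY_TYPES[code]
    except KeyError:
        # Codes are upper case in Table 2, so "rg" is rejected as unknown.
        raise ValueError(f"unknown entity type code: {code!r}") from None


if __name__ == "__main__":
    for code in ("RG", "BN", "LB"):
        print(code, "=", entity_long_name(code))
```

Keeping the codes in one table makes it straightforward to reject names such as NUTS-XX that use an entity type not defined in these guidelines.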
1.1.2.5 Additional identifiers
Additional identifiers have to be used in order to uniquely name a feature or table. Additional
identifiers are scale or precision, version or time stamp and the source.
1.1.2.6 Syntax rules
The class name will be built up by adding the following strings, in this order:
- Feature data theme short name         (compulsory)
- Class identification short name       (conditional)
- Entity type                           (compulsory)
- Scale or precision: 100K, 1M, 200M    (conditional)
  K stands for “thousand”
  M stands for “million” or “metres” (no lower case allowed)
- Version: Vxx                          (If needed)
- Time stamp                            (If applicable and needed)
- Source                                (If needed)
No spaces are allowed. The different segments will be joined by a “-”.
To obtain a stable and logical sort order of feature and object class names, the use of leading zeroes
in the scale, version and time stamp segments should be considered.
Examples:
NUTS-RG-01M-2003 (Feature data theme – entity type – scale – time stamp)
If the NUTS levels are separated in different feature classes, the feature class identification
expressing the NUTS level should be added: NUTS-1-RG-01M-2003
If all the generalised versions are hosted in the same feature class, then the scale should be
omitted in the feature class: NUTS-RG-2003 although it could appear again in the subtypes that
host separately the different scales.
NUTS-LB-2003 (Feature data theme – feature class identification – time stamp)
In this case the NUTS level should be an attribute of the feature class. If the labels are separated
in different feature classes by NUTS level then the NUTS level should be part of the feature class
identification: NUTS-3LAB-2003. Neither scale nor precision are applicable
- STFD-PO-2000_2006 (Feature data theme – Feature class identification – time stamp. If only one
scale is available, then “1M” is not needed. This information appears in the metadata)
- URAU-CITY-PO (Urban audit cities)
- URAU-LUZO-PO (Urban Audit Large Urban Zones)
- URAU-CITY-LB (Urban Audit City labels)
In the current design of the Urban Audit dataset, the geometries of Urban Audit I, Urban Audit II
and the French National Urban Audit share the same coverages. Consequently, neither
version nor time stamp should form part of the feature class name.
In general, version and time stamp will only exceptionally appear at the same time. Time stamp is
preferable to version. The version number does not give much information: “V9” only means that
there were 8 versions before it, while “2003” gives a more precise idea of the validity of the feature
class.
The source name will rarely be needed.
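The syntax rules and examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_class_name` and `pad_scale` are hypothetical, and upper-casing every segment is an assumption (the text only forbids lower case for the K and M suffixes).

```python
def pad_scale(scale, width=3):
    """Left-pad the numeric part of a scale segment such as '1M' with zeroes."""
    number, unit = scale[:-1], scale[-1].upper()
    return number.zfill(width - 1) + unit


def build_class_name(theme, entity_type, class_id=None, scale=None,
                     version=None, time_stamp=None, source=None):
    """Join the name segments with '-' in the order given by the syntax rules."""
    ordered = [theme, class_id, entity_type, scale, version, time_stamp, source]
    # Conditional segments that were not supplied are simply skipped.
    parts = [str(segment).strip().upper() for segment in ordered
             if segment is not None]
    for part in parts:
        # The rules forbid spaces; empty segments would produce a double '-'.
        if not part or " " in part:
            raise ValueError(f"empty segment or space in segment: {part!r}")
    return "-".join(parts)


print(build_class_name("NUTS", "RG", scale=pad_scale("1M"), time_stamp=2003))
print(build_class_name("NUTS", "RG", class_id=1, scale="01M", time_stamp=2003))
print(build_class_name("URAU", "PO", class_id="CITY"))
```

The three calls reproduce the names NUTS-RG-01M-2003, NUTS-1-RG-01M-2003 and URAU-CITY-PO used in the examples.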
A list of “class identifications” and “class identification short names” must be defined and carefully
updated. Before defining a new “class identification”, it must be verified that none of the existing
ones is suitable for the new class. Names that are defined for entity types (e.g. label, regions)
must not be used for class identifiers.
Table 3: Example of class identification short names
Feature/object class identification | Short name
Airport | AIRP
City | CITY
Commune | COMM
Condominium | COND
Country | CTRY
1.1.2.7 Relationship names and role names in ArcGIS (Forward/Backward path label)
The name of a relationship in ArcGIS will be a noun (when possible a “-ship” noun) that gives a
general description of the relationship.
The role of the entities involved in a relationship (called in ArcGIS terminology “Forward /
Backward path label”) will be a verb. The forward and backward path label can be labelled with
different verbs, but it is a common practice to label one with an active form and the other with the
same verb in passive form. Example:
Relationship name: drainage
Forward path label: drains to
Backward path label: is drained by
On the one hand, these names will have no counterpart in the Oracle database. On the other
hand, these relationships will be implemented as tables or attributes in Oracle, as described in the
following paragraphs.
1.1.2.8 1-to-1 relationships between feature class and object class
In a correct design, the attributes of an object class having a 1-to-1 relationship to a feature class
should be integrated in the feature class, i.e. the object class should not exist.
This is not applicable when the object class has a 1-to-1 relationship to at least two feature classes
(for instance, different generalisations of the same entity). In this case redundancy should be
avoided and the attributes will be either integrated in one of the feature classes or simply
separated in an object class. In this last case the name of the object class will be all the common
concepts identified in the attributed feature classes. For instance:
- How should the attributes of NUTS 2003 be named? Let us assume that the following feature classes are
defined:
- NUTS-3-RG-01M-2003
- NUTS-3-RG-03M-2003
- NUTS-3-RG-10M-2003
- NUTS-3-LB-2003
The object class that contains the attributes for all these feature classes would be “NUTS-3-RG-2003-ATTR”. The concept “Region” is included in the identifier “LB”, i.e. “Region label”.
Warning! This method does not guarantee that EVERY feature class whose name contains “NUTS”, “3”, “RG”
and “2003” is related to the object class “NUTS-3-RG-2003-ATTR”, but exceptions are very unlikely if the
feature types and feature classes are sensibly chosen.
1.1.2.9 1-to-many relationships
The 1-to-many relationships should be implemented as a foreign key in the “many” side of the
relationship. The foreign key name should follow the conventions described in the paragraph
“Attribute names”
1.1.2.10 Many-to-many relationship names
The many-to-many relationships are physically stored as an object class. The name of the object
class that represents a many-to-many relationship will be built up by the following segments:
- First end class/subclass name
- Second end class/subclass name
- A noun (when possible a “-ship” noun) that gives a general description of the relationship
When both ends belong to the same feature data theme:
- The datatype short name should be omitted.
- If both have the same time stamp and/or version, these should appear only once, in the second end.
The noun describing the relationship has a maximum length of 8 characters. If the resulting
table name exceeds 30 characters, the noun will be shortened to reach the maximum number of
characters allowed.
Example:
COMM-COND-2001-MANAGMNT
NUTS-RG-BN-2003-DELIMIT
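A minimal sketch of how these segments could be assembled is given below. The `End` type and the function `relationship_class_name` are hypothetical helpers, not part of any GISCO tool, and the sketch assumes that the noun is supplied already abbreviated (for instance MANAGMNT) and is only cut further when the 30-character limit requires it.

```python
from typing import NamedTuple, Optional, Tuple

MAX_TABLE_NAME = 30  # maximum table name length quoted in the text
MAX_NOUN = 8         # maximum noun length quoted in the text


class End(NamedTuple):
    """One end of the relationship (hypothetical helper type)."""
    theme: str                    # feature data theme short name, e.g. "NUTS"
    ids: Tuple[str, ...] = ()     # class identification / entity type segments
    stamp: Optional[str] = None   # time stamp or version, e.g. "2003"


def relationship_class_name(first: End, second: End, noun: str) -> str:
    first_parts = [first.theme, *first.ids]
    second_parts = list(second.ids)
    # Same theme on both ends: the theme short name is not repeated.
    if second.theme != first.theme:
        second_parts.insert(0, second.theme)
    # A shared time stamp appears only once, in the second end.
    if first.stamp and first.stamp != second.stamp:
        first_parts.append(first.stamp)
    if second.stamp:
        second_parts.append(second.stamp)
    prefix = "-".join(first_parts + second_parts)
    room = MAX_TABLE_NAME - len(prefix) - 1  # characters left for the noun
    if room < 1:
        raise ValueError("class names leave no room for the relationship noun")
    return f"{prefix}-{noun.upper()[:min(MAX_NOUN, room)]}"


print(relationship_class_name(End("COMM", (), "2001"),
                              End("COMM", ("COND",), "2001"), "MANAGMNT"))
print(relationship_class_name(End("NUTS", ("RG",), "2003"),
                              End("NUTS", ("BN",), "2003"), "DELIMIT"))
```

With these inputs the two calls give the example names COMM-COND-2001-MANAGMNT and NUTS-RG-BN-2003-DELIMIT.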
1.1.2.11 Domain names
In fact, all classes that are pointed to by a foreign key in another class are domains. In this
paragraph we will refer only to the “coded value domains”.
According to their scope, domains can be classified as general purpose domains (used in
several classes in different datatypes) or specific usage domains (strongly related to one single
datatype). In order to make the naming conventions easier to understand and to apply, both types
of domains will be treated equally.
The domain name will not contain any reference to the datatype it belongs to (if any). The name
will be as descriptive as possible. Sometimes it requires an effort to find the word that best
describes the criteria followed to define the domain. The forbidden words (“type”, “class” or
“status”) should not be used in the domain name. The name should be composed of the attribute
name and the extension DOM. If the domain name does not uniquely identify a table,
additional identifiers, such as time stamps, can be added to the name.
Example:
The name of the domain for the attribute “ISO-LANG-CODE” will be “ISO-LANG-CODE-DOM”.
The name “ISO-LANG-CODE-2001-DOM” identifies the domain for ISO language codes valid in
2001.
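The domain naming rule can be illustrated with a short sketch. The function name `domain_name` is hypothetical, and placing the time stamp directly before the DOM extension is inferred from the single example above rather than stated as a rule.

```python
FORBIDDEN_WORDS = {"TYPE", "CLASS", "STATUS"}  # forbidden words named in the text


def domain_name(attribute_name, time_stamp=None):
    """Build a coded value domain name from an attribute name."""
    segments = attribute_name.upper().split("-")
    forbidden = FORBIDDEN_WORDS.intersection(segments)
    if forbidden:
        raise ValueError(f"forbidden word(s) in domain name: {sorted(forbidden)}")
    if time_stamp is not None:
        # Additional identifier, used only when the plain name is not unique.
        segments.append(str(time_stamp))
    segments.append("DOM")
    return "-".join(segments)


print(domain_name("ISO-LANG-CODE"))
print(domain_name("ISO-LANG-CODE", 2001))
```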
1.1.2.12 Attribute names
The common object oriented nomenclature concatenates class and attribute to uniquely identify an
attribute (example: “NUTS-3-RG-2003.NUTS-ID”). This way, an attribute name can be repeated in
different feature or object classes. The attribute name will omit all references to the feature class
identification, scale or time stamp that are already included in the feature or object class name.
Example:
Good: NUTS-ID
Bad: NUTS-3-2003-ID
1.1.2.13 Primary key
The first step in naming attributes is to identify the conceptual “Primary key”. The
primary key will be named after the feature type identifier short name + “ID”. When this is not
sufficient, the name will be “Feature type name – Feature/object class identification – ID”.
Examples:
- “NUTS-ID”. We can have “NUTS-3-RG-2003.NUTS-ID” and “NUTS-2-LB-1999.NUTS-ID”
- “COMM-COND-ID”
Several IDs may be available (so-called “candidate keys”). In such a case we
recommend using an internal or ad hoc defined ID. The other candidate keys can be considered
(and named) as foreign keys to object classes (example: airports, where there are several codes
available: ICAO, IATA, and others).
1.1.2.14 Foreign keys
The attributes that are foreign keys will keep the same name as the attribute they point to, except for the
suffix: “ID” will be replaced by “CODE”.
This attribute name may not be sufficient to identify distinctly where the key is
pointing to. In such a case, a version or time stamp should be included:
- NUTS-3-RG-2003-CODE
- COMM-2001-CODE
This “extra” information (the level and the time stamp) must be included only if other attributes with
level and time stamps are available in the same feature/object class. The attribute name is not
meant to substitute the metadata or the data model! For instance: let us assume that Communes
2003 are available. The time stamp for the foreign key “NUTS CODE” should be sufficient, since
NUTS 2003 will be generated based on communes 2003. In the case of “airports”, for instance, the
level can be omitted:
- NUTS-2003-CODE
- NUTS-1999-CODE
- NUTS-1995-CODE
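The renaming rule for foreign keys can be sketched as follows. The function name `foreign_key_name` and its `qualifier` parameter are hypothetical; the qualifier stands for whatever level, version or time stamp is needed to make the key unambiguous.

```python
def foreign_key_name(primary_key, qualifier=None):
    """Derive a foreign key name from the primary key it points to."""
    suffix = "-ID"
    if not primary_key.endswith(suffix):
        raise ValueError(f"primary key name must end in {suffix!r}: {primary_key!r}")
    parts = [primary_key[:-len(suffix)]]
    if qualifier is not None:
        # Only added when several keys would otherwise share the same name.
        parts.append(str(qualifier))
    parts.append("CODE")
    return "-".join(parts)


print(foreign_key_name("NUTS-ID"))
print(foreign_key_name("NUTS-ID", 2003))
print(foreign_key_name("COMM-ID", 2001))
```

The last two calls give NUTS-2003-CODE and COMM-2001-CODE, matching the names listed above.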
1.1.2.15 Duplicated Foreign keys in the same class
It might happen that the same foreign key appears twice in the same class, each one as the result
of the implementation of different relationship or a different role (end-name). In this case the
foreign key name would be duplicated. This conflict must be solved somehow:
- When the foreign key implements a relationship to/from a subclass or subtype, then the name of
the subclass/subtype is privileged over the name of the more generic class.
- If the previous criterion is not applicable, the attribute name will be suffixed with the role name of
the relationship implemented by the foreign key
The many-to-many relationship COMM-COND-2001-MANAGMNT will have two foreign keys both
pointing to COMM-COMM-2001: one represents the condominium and the other one represents
the administrator of the condominium:
Example: A condominium (subclass or subtype of Commune) is co-administrated by several
communes and/or higher administrative local units.
Object class name: COMM-COND-2001-MANAGMNT
Attribute names:
COMM-COND-2001-MANAGMNT-ID (internal ID)
COMM-COND-CODE (foreign key to COMM-COMM-2001, subtype Condominium)
COMM-CODE (foreign key to COMM-COMM-2001)
NUTS-1999-CODE (foreign key to NUTS-RG-1999 and NUTS-LB-1999)
The time stamp “2001” has been omitted in “COMM-COND-CODE” and “COMM-CODE” since it is
coincident with the time stamp in the class name. On the other side, the time stamp is strictly
needed in the NUTS code since it does not refer to NUTS 2001 (which does not exist)
1.1.2.16 Classification attributes
“Type”, “Class”, “Classification”, “Status”: These words must be avoided in the attribute names.
Every classification is done according to well defined criteria. The name of the classification, type
or status attribute must be decided after these criteria.
Example:
Degree of urbanisation or DEGU. Never use TYPE or “Commune Type”
Eligibility (there is no need to add the word “Status”)
Classifications should never be grouped. Grouping classifications is a very bad database design
habit. Example:
BAD: Airport type:
- Military active
- Military inactive
- Civil active
- Civil and military inactive
- Others.
GOOD: Airport usage:
- Military
- Civil Public
- Civil Private
- Military and Civil
Airport operability:
- Active
- Inactive
- Not known
Types, classes and status will be related to a domain class. In such a case, these attributes must
be treated as foreign keys to their respective domain classes (Examples: “Airport management
code” and “airport operability code”)
1.1.2.17 Other attribute names:
Attributes that are not foreign keys are obviously related only to the feature/object class that hosts
them. Consequently, no prefix will be used. When the feature/object class must be identified
among other synonyms, the OO notation will be used (“Communes 2003.name”).
“NAME”: Word (or small number of words) by which individual person, animal, place or thing is
spoken of or to.
These attributes will be called simply “NAME”.
It might happen that several names are available for the same feature/object class. In this case the
attribute name can be distinguished by a suffix, usually the source. Example:
NAME-SABE
NAME-SIRE
“DESCRIPTION”: Verbal portrait or portraiture of person, object or event, more or less complete.
1.1.2.18 Definition.
The “description” attributes will be named “DESCRIPTION”. It is important not to mix up
“descriptions” and “names”: the description, by definition, is rather long. No attribute can be named
“definition”. The attributes named this way will be renamed as “description”
1.1.2.19 Keywords:
A list of keywords and short keywords will be developed and constantly updated. Before naming an
attribute, the list of forbidden words and keywords should be consulted. Examples:
Table 4: Examples of keywords
Keywords | Abbreviation
Name | NAME
Description | DESC
Longitude | LON
Latitude | LAT
Altitude | ALT
Objective | OBJ
Country | CNTR
Code | CODE
Identifier | ID
Table 5: Examples of forbidden words and alternatives
Forbidden words | Alternative
Type | Name the criteria used for the classification
Class | Name the criteria used for the classification
Definition | Description
Sequential number | ID or code
Nation | Country
State (political concept) | Country
NUTS 0 | Country
1.1.2.20 Topic Category Names
Table 6: Topic Category Names
ISO 19115 Topic Category | INSPIRE data theme | Feature Type (Long name) | Short name
01 - Farming | Agricultural and Aquaculture Facilities | Farm Accountancy Data Network (FADN) | FADN
02 - Biota | Habitats and biotopes | Natural Vegetation | VEGT
02 - Biota | Biogeographical Regions | Biogeographical Zones | BIOG
02 - Biota | Habitats and biotopes | Biotopes | BIOT
03 - Boundaries | Statistical Units | Territorial Units for Statistics (NUTS + Statistical Regions) | NUTS
03 - Boundaries | Administrative Units | Communes | COMM
03 - Boundaries | Administrative Units | Subcommunes | SCOM
04 - Climatology / Meteorology / Atmosphere | Meteorological Spatial Features | Climate | CLIM
05 - Economy (14 Oceans) | Area Management | Fishing Areas | FISH
06 - Elevation | Elevation | Digital Elevation Model | DEM
06 - Elevation | Oceanographic Features | Bathymetry | BATH
07 - Environment | Natural Risk Zones | Land Quality | LNQU
07 - Environment | Protected sites | Designated Areas | DSIG
08 - Geo-scientific Information | Natural Risk Zones | Soil Erosion Risk | SOER
08 - Geo-scientific Information | Natural Risk Zones | Erosion Trend | ERTR
08 - Geo-scientific Information | Soil | Soil | SOIL
08 - Geo-scientific Information | Geology Geomorphology | Geology Sediments Discharges | SDDS
08 - Geo-scientific Information (14 - Oceans) | Natural Risk Zones | Coastal Erosion | COER
10 - Imagery/Base maps/Earth cover | Land Cover | Land Cover | LCOV
12 - Inland waters | Hydrography | Water Patterns | WTPT
12 - Inland waters | Hydrography | Lakes | LAKE
12 - Inland waters (07 Elevation) | Hydrography | Watersheds | WTSH
13 - Locations | Geographical Grids | Geographical Grid | GGGR
13 - Locations (01 Farming) | Geographical Grids | LUCAS | LUCA
13 - Locations (16 Society) | Geographical Names | Settlements | STTL
13 - Locations (16 Society) | Geographical Names | Gazetteer | GAZZ
14 - Oceans | Sea Regions | Coastline boundaries | COAS
14 - Oceans | Oceanographic Features | Sea Level rise | SELV
15 - Planning/Cadastre | Zones and Reporting Units | Inter Regional | IREG
15 - Planning/Cadastre | Zones and Reporting Units | Leader Zones | LEAD
15 - Planning/Cadastre | Zones and Reporting Units | Less Favoured Areas | LFAV
15 - Planning/Cadastre | Zones and Reporting Units | National Support | NTSU
15 - Planning/Cadastre | Zones and Reporting Units | Structural Funds Zones | STFU
15 - Planning/Cadastre | Zones and Reporting Units | Urban Audit | URAU
16 - Society | Population distribution Demography | Population | POPU
16 - Society | Population distribution Demography | Degree of urbanisation | DGUR
18 - Transportation | Transport Networks | Airports | AIRP
18 - Transportation | Transport Networks | Ferry links | FERR
18 - Transportation | Transport Networks | Ports | PORT
18 - Transportation | Transport Networks | Road infrastructure | ROAD
18 - Transportation | Transport Networks | Railway infrastructure | RAIL
19 - Utilities / Communication | Production and industrial facilities | Nuclear Power | NUPW
19 - Utilities / Communication | Production and industrial facilities | Energy Production | ENPR
19 - Utilities / Communication | Production and industrial facilities | Energy Transport | ENTR
1.2 Metadata
This section describes the metadata standard that should be used for data intended for the GISCO
Reference Database, and gives an example of how to produce and validate metadata against that
standard using ArcGIS. Chapter 3.9 of this document discusses metadata in connection with data
quality and includes an example of data quality reporting.
1.2.1 The Metadata Standard
ISO 19115 consists of a comprehensive schema for describing geographic data. The schema
comprises more than 300 elements, of which 23 are core elements and 12
are mandatory for compliance with the international standard. The mandatory elements
focus on the discovery aspect of the metadata. Besides information on the metadata itself, they
provide information on the title, the category, the reference date, the geographic location, a short
description of the data and the data provider.
The core set expands the mandatory elements with additional information on the type, the scale,
the format, the reference system and the data lineage. These elements give rough information on
the potential usage of the data.
1.2.2 The GISCO Profile
For shared usage of spatial data within the Commission, additional information on the data might
be necessary. A metadata profile comprises at least the mandatory elements of the ISO 19115
standard and defines additional elements from the ISO 19115 standard. The standard even
describes a way for extending the profile to elements that are not part of the current standard. To
sum up, a profile contains elements of the standard plus elements that extend the standard in a
pre-defined way. The profile can change the obligation of elements from optional to mandatory.
Usually, a profile is developed if the core elements are not sufficient for describing the data
according to the needs of an organization.
The GISCO Profile is described in the document “D8.A.1Gisco Metadata Model Description”.
The starting point for the implementation of this profile is the “Eurosion Metadata Model V3” to
which two packages have been added: the “Metadata Attribute Information” and the “legislation
Information”.
This profile is also compatible with the INSPIRE model.
1.2.3 The mandatory elements of the GISCO profile
The exhaustive list of elements of the GISCO profile can be found in Appendix I. The following
elements in the metadata editor and the validation are mandatory for the GISCO profile:
• Dataset title
• Dataset date (date, date type [creation, revision, publication])
• Responsible Party (either individual or position name and role)
• Dataset language
• Topic Category
• Spatial Resolution (equivalent scale, distance)
• Abstract
• Spatial representation type
• Reference System
• Geographic Location
• Lineage
• Metadata Point of contact (either individual or position name and role)
• Metadata date stamp (date, date type [creation, revision, publication])
The following elements are conditional:
• Dataset character set
• Metadata language (if not defined by encoding)
• Metadata character set (if ISO 10646-1 not used)
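As an illustration, a completeness check against the mandatory elements could look like the sketch below. It assumes, purely for the example, that a metadata record is held as a dictionary keyed by the element names listed above; this is not the storage format of the GISCO profile, and `missing_mandatory` is a hypothetical helper.

```python
MANDATORY_ELEMENTS = [
    "Dataset title", "Dataset date", "Responsible Party", "Dataset language",
    "Topic Category", "Spatial Resolution", "Abstract",
    "Spatial representation type", "Reference System", "Geographic Location",
    "Lineage", "Metadata Point of contact", "Metadata date stamp",
]


def missing_mandatory(record):
    """Return the mandatory element names that are absent or empty in record."""
    missing = []
    for name in MANDATORY_ELEMENTS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record with made-up values; the Abstract is blank on purpose.
record = {
    "Dataset title": "NUTS regions 2003",
    "Topic Category": "03 - Boundaries",
    "Abstract": "   ",
}
for name in missing_mandatory(record):
    print("missing:", name)
```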
1.2.4 The Metadata Editor
To decide whether or not a data source is suitable to use in your map you often need more
information than its basic properties and a look at its features. You may need information about the
data's accuracy, or how a set of measurements was collected. An item's metadata may include this
type of documentation along with many properties that have been derived from the data
automatically. The Metadata tab presents this information in an easy-to-read format. You can view
the same set of metadata in many ways by choosing a different stylesheet from the dropdown list
on the Metadata toolbar. Style sheets are similar to queries that select and process some data
from a database and present the results as a report. Each stylesheet converts the metadata into a
different-looking HTML page. You can explore its content as you would any HTML page in a
browser. The metadata synchronisers assure the automatic update of the metadata when
geometric features properties change in the Geodatabase.
Metadata in ArcCatalog consists of properties and documentation. Properties, such as the extent
of a shapefile's features, are derived from the item itself. Documentation is descriptive information
supplied by a person.
By default, when you try to view an item's metadata, ArcCatalog will create it for you automatically
if it doesn't already exist; it will then add many of the item's properties to it. Once created, metadata
becomes part of the item itself. It is automatically moved, copied, and deleted along with the item.
Every time you view the metadata, ArcCatalog automatically updates the properties recorded in it
with current values. This ensures that the metadata is kept up to date with changes to the data
source. For example, the extent and count of a shapefile's features will be current when you
look at its metadata, even if new features were recently added.
Finally, metadata can be imported into or exported from the Geodatabase in the form of
XML files.
The current ESRI release of the ISO editor in ArcCatalog is only intended to support the "core"
elements as defined by ISO.
Therefore an ISO wizard editor has been developed in Visual Basic and ArcObjects. It manages
the metadata of the core elements of the ISO norm and the additional elements.
1.2.5 Metadata Editor Use (Within ArcCatalog)
This chapter first describes how to visualise GISCO XML files with the Metadata Editor, using the
GISCO metadata stylesheet made for this purpose. The second step describes the import of an XML file
compliant with the GISCO metadata scheme into another XML file using the ISO_GISCO metadata scheme,
which is compatible with the Standard ESRI Metadata Editor, so that metadata files can be modified, updated or
completed. The use of this Standard Metadata Editor Wizard is then shown. The last step
describes how to export the modified ISO_GISCO XML file into a GISCO XML file compliant with the
GISCO XML metadata scheme.
1.2.5.1 Visualisation of GISCO XML files
To view the delivered GISCO-compliant XML files, proceed as follows:
- Copy the Eurosion stylesheet Gisco_Metadata.xsl into the directory:
[location where ArcGIS is installed]…\Metadata\Stylesheets\
- Launch the ArcCatalog application and select the tab called Metadata.
- Activate the menu View -> Toolbars -> Metadata. A window called Stylesheet appears.
- Choose “Gisco_Metadata”, previously installed.
Metadata compliant with the GISCO metadata stylesheet can now be viewed with the ArcCatalog tool.
Note: Under Windows 2000, ArcCatalog may need to be closed and re-launched for
the modification to take effect.
1.2.5.2 Import of GISCO XML files
This function converts a GISCO XML file compliant with the GISCO XML SCHEMA
into an internal ESRI XML format. This operation is needed so that the metadata can be updated with the
ArcGIS Standard Editor Wizard.
The AdministratedArea XML metadata file is currently displayed with the GISCO_Metadata stylesheet.
The import is launched by FIRST selecting the file to import and then pushing the button
corresponding to Import of Metadata as shown below.
Browse to the XML file to be imported.
IMPORTANT: Disable the option “Enable automatic update of metadata” by unticking the box.
After confirming with 'OK', the file is imported.
Its visualisation now requires the use of the ISO_GISCO stylesheet.
Visualising the imported file with the ISO_GISCO stylesheet allows the imported file (in memory)
to be edited and modified with the Standard ArcGIS Metadata Editor Wizard. This is described in
the next paragraph.
1.2.5.3 Editing metadata
Once the imported XML file is displayed using the ISO_GISCO stylesheet, the Metadata Editor Wizard
is accessible through the button:
1.2.5.4 Export metadata to an XML format compliant with the GISCO XML SCHEMA
This function converts the previously modified ISO_GISCO XML file into a format compatible with
the GISCO XML SCHEMA.
The export is launched by FIRST selecting the file to export and then pushing the button
corresponding to Export of Metadata as shown below:
The metadata file exported into an XML file compliant with the GISCO XML SCHEMA can be
displayed by changing the stylesheet and selecting the GISCO_Metadata one.
The file has been correctly exported and is now modified and still compliant with GISCO Metadata
Model.
1.2.6 Validation of compliance of metadata to GISCO profile.
A Java tool is available to validate the compliance of newly created metadata with the GISCO profile.
The tool was built using the Xerces2 Java Parser 2.6.2 API. It is launched from the
command line and takes the path to the XML instance to be validated as a parameter. The
schema file to use is indicated within the XML instance.
The steps for using this tool are as follows:
1. First, unzip the JavaValidator.zip file to a folder (e.g. myFolder)
2. Open a command line window (command cmd.exe) and change to the
myFolder directory (e.g. c:>cd …\myFolder)
3. To run the tool use the command java -jar as follows:
Make sure to specify the jar file “giscoXmlValidator.jar” containing the Java classes and the
path to the XML instance to be validated (here xml/catalog.xml).
Make sure to put the XercesJar folder containing the needed API in the same directory as
giscoXmlValidator.jar.
Also make sure that the xsd schema file used for the validation is correctly indicated in the value
of the xsi:schemaLocation attribute of the root element in the XML instance file.
For example:
<MD_Metadata xmlns="http://www.giscoLN.org/metadataModel/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.giscoLN.org/metadataModel/
D:\GISCO\WorkingDirectory\GiscoXsd\xsd_LongNames\gisco_LN_20050426.xsd"
…
>
(NB: please use the backslash “\” symbol as separator in the absolute path to the xsd file)
Then run the tool as follows:
The program writes a result message to the standard output and also to the log.txt log
file located in the same directory as the giscoXmlValidator.jar file.
If validation succeeds, the following message is displayed:
The same message is also written to the log.txt file.
Otherwise, error messages are displayed.
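Before running the validator it can be useful to confirm which schema file an XML instance actually points to. The sketch below is not part of the GISCO Java tool; it is a separate, minimal Python check using only the standard library, and the function name `schema_locations` is hypothetical. It reads the xsi:schemaLocation attribute of the root element, which holds whitespace-separated pairs of namespace and schema location.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Map each namespace to the schema location declared on the root element."""
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


# Root element modelled on the example above, closed so that it parses alone.
sample = r'''<MD_Metadata xmlns="http://www.giscoLN.org/metadataModel/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.giscoLN.org/metadataModel/
  D:\GISCO\WorkingDirectory\GiscoXsd\xsd_LongNames\gisco_LN_20050426.xsd"/>'''

for namespace, location in schema_locations(sample).items():
    print(namespace, "->", location)
```

Note that a path containing spaces would break the pairing, since the attribute value is split on whitespace.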
1.2.7 Metadata Server
ArcIMS provides built-in functions to set up a metadata catalogue service. Metadata can be
published, i.e. transferred from feature classes to the metadata server, and be used for data
discovery. A web application running in a browser is able to search the metadata server for
information that is contained in the metadata and retrieve the correspondent data or map service.
Additionally metadata can be organised hierarchically and browsed in order to find the desired
spatial data.
A metadata server is part of the internet services provided by GISCO. GISCO has set up a
metadata server that contains information on the spatial data of the GISCO reference database as
it is visible through the Intranet service. The feature classes and datasets are grouped according to
the ISO 19115 Topic Category Codes. For the time being, the INSPIRE spatial data themes are
considered too unstable to be used as a basis for categorisation.
The metadata server is based on ArcIMS. Each metadata entry of spatial data is completed by a
map service that is available for other web services too. The server is managed by GISCO, the
content and the web map services are also managed by GISCO. Users are able to read and
search the metadata server. The server is configured in close cooperation with the INSPIRE
developments, i.e. the server will be used as a node of a broader spatial data infrastructure. In
addition to read-only access selected users have authoring rights, i.e. these users are able to
publish metadata on the GISCO server. In this case, there is no direct link from the metadata to the
data held by another DG. Users interested in utilising the spatial data have to approach the
responsible persons. This approach is chosen in order to assure that data can be
discovered, on the one hand, and to assure data integrity of the reference database on the other
hand. Technically, the metadata of the feature classes in the spatial database have been copied
(published) to the metadata server. Additionally, a map service has been created that assures the
link between metadata and data. Metadata authors outside GISCO will simply publish their data to
the metadata server.
The intranet address of the Metadata Server is www.gisco.eurostat.cec/metadatacatalog.
1.3 Spatial Reference System
1.3.1 Introduction
The European Terrestrial Reference System 1989 (ETRS89) is the geodetic datum for Pan-European spatial data collection, storage and analysis [1]. This is based on the GRS80 ellipsoid and
is the basis for a coordinate reference system using ellipsoidal coordinates. For many Pan-European purposes a plane coordinate system is preferred. But the mapping of ellipsoidal
coordinates to plane coordinates cannot be made without distortion in the plane coordinate system.
Distortion can be controlled, but not avoided.
Figure 1: ETRS89 geodetic datum
For many purposes the plane coordinate system should have minimum distortion of scale and
direction. This can be achieved through a conformal map projection. The ETRS89 Transverse
Mercator Coordinate Reference System (ETRS-TMzn) is recommended for conformal Pan-European mapping at scales larger than 1:500 000. For Pan-European conformal mapping at
scales smaller than or equal to 1:500 000 the ETRS89 Lambert Conformal Conic Coordinate Reference
System (ETRS-LCC) is recommended.
With conformal projection methods attributes such as area will not be free of distortion. For Pan-European statistical mapping at all scales or for other purposes where true area representation is
required, the ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA)
is recommended [2].
Figure 2: ETRS-LAEA Coordinate System
[1] See: Annoni, A., Luzet, C., (Eds) (2000) Proceedings of the workshop “Spatial Reference System for
Europe”, Marne la Vallée, 23-24 Nov. 1999, EUR19575/EN
[2] See: Annoni, A., Luzet, C., Gubler, E., Ihde, J. (Eds) (2001) Map Projections for Europe, EUR20120/EN
The ETRS89 datum and the above projections are described below.
1.3.2 ETRS89 Ellipsoidal Coordinate Reference System (ETRS89)
1.3.2.1 ETRS89 Description
The European Terrestrial Reference System 1989 (ETRS89) is the geodetic datum for Pan-European spatial data collection, storage and analysis. This is based on the GRS80 ellipsoid and is
the basis for a coordinate reference system using ellipsoidal coordinates. The ETRS89 Ellipsoidal
Coordinate Reference System (ETRS89) is recommended to express and to store positions, as far
as possible.
1.3.2.2 ETRS89 Definition
The next table contains the fully described ETRS89 Ellipsoidal Coordinate Reference System
(ETRS89) following ISO 19111 Spatial referencing by coordinates:
The coordinate lines of the Ellipsoidal Coordinate System are curvilinear lines on the surface of the
ellipsoid. They are called parallels for constant latitude (phi) and meridians for constant longitude
(lambda). When the ellipsoid is related to the shape of the Earth, the ellipsoidal coordinates are
named geodetic coordinates. The term geographic coordinate system usually
implies a geodetic coordinate system.
Table 7: ETRS89 Definition
Entity | Value
CRS ID | ETRS89
CRS alias | ETRS89 Ellipsoidal CRS
CRS valid area | Europe
CRS scope | Geodesy, Cartography, Geoinformation systems, Mapping
Datum ID | ETRS89
Datum alias | European Terrestrial Reference System 1989
Datum type | geodetic
Datum realization epoch | 1989
Datum valid area | Europe / EUREF
Datum scope | European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks | see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213, or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID | Greenwich
Prime meridian Greenwich longitude | 0°
Ellipsoid ID | GRS 80
Ellipsoid alias | New International
Ellipsoid semi-major axis | 6 378 137 m
Ellipsoid shape | TRUE
Ellipsoid inverse flattening | 298.2572221
Ellipsoid remarks | see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID | Ellipsoidal Coordinate System
Coordinate system type | geodetic
Coordinate system dimension | 3
Coordinate system axis name | geodetic latitude
Coordinate system axis direction | North
Coordinate system axis unit identifier | degree
Coordinate system axis name | geodetic longitude
Coordinate system axis direction | East
Coordinate system axis unit identifier | degree
Coordinate system axis name | ellipsoidal height
Coordinate system axis direction | up
Coordinate system axis unit identifier | metre
If the origin of a right-handed Cartesian coordinate system coincides with the centre of the
ellipsoid, the Cartesian Z-axis coincides with the axis of rotation of the ellipsoid and the positive X-axis passes
through the point "phi" = 0, "lambda" = 0.
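The relationship between geodetic coordinates and the Cartesian system described above can be illustrated with the standard closed-form conversion. The following is a minimal sketch, not part of the original guidelines: the function name is hypothetical, and the ellipsoid constants are taken from Table 7.

```python
import math

# GRS80 ellipsoid parameters as listed in Table 7.
SEMI_MAJOR_AXIS_M = 6378137.0
INVERSE_FLATTENING = 298.2572221


def geodetic_to_cartesian(lat_deg, lon_deg, height_m=0.0):
    """Convert geodetic latitude, longitude (degrees) and ellipsoidal
    height (metres) to geocentric Cartesian X, Y, Z in metres.

    Hypothetical helper for illustration only.
    """
    f = 1.0 / INVERSE_FLATTENING
    e2 = f * (2.0 - f)  # first eccentricity squared
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = SEMI_MAJOR_AXIS_M / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z
```

For phi = 0, lambda = 0 and zero height, the point lies on the positive X-axis at a distance equal to the semi-major axis, which matches the axis convention stated above.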
1.3.3 ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn)
1.3.3.1 ETRS-TMzn Description
The ETRS89 Transverse Mercator Coordinate Reference System (ETRS-TMzn) is identical to the
Universal Transverse Mercator grid system for the northern Hemisphere applied to the ETRS89
geodetic datum and the GRS80 ellipsoid. The UTM system was developed for worldwide
application between 80° S and 84° N with the following basic features:
1. 60 zones of 6° longitudinal extension numbered consecutively from 1 to 60, beginning with
number 1 for the zone between 180° W and 174° W and continuing eastward
2. central meridian scale factor of 0.9996 producing two lines of secancy approximately 180
000 m East and West of the central meridian
3. negative coordinates are avoided by assigning a false easting value of 500 000 m East at
the central meridian; and false northing values at the equator of 0 m for the northern
hemisphere and 10 000 000 m for the southern hemisphere
4. uniform conversion formulas from one zone to another
5. unique referencing for all zones in a plane rectangular coordinate system
6. meridional convergence (between the true and grid North) to be less than 5°
7. map distortion within the zones to be less than 1:2,500
ETRS-TMzn is a series of zones, where “zn” in the identifier is the zone number. Each zone runs
from the equator northwards to latitude 84º North and is 6-degrees wide in longitude reckoned from
the Greenwich prime meridian. Zone 31 is centred on 3º East and is used between 0º and 6º East,
zone 32 is centred on 9º East and is used between 6º and 12º East, etc.
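The zone numbering rule described above (zone 1 starting at 180° W, 6° per zone) can be expressed as a short calculation. The sketch below is illustrative only; the function names are hypothetical, and longitudes are assumed to be in decimal degrees, positive east of Greenwich.

```python
import math


def etrs_tm_zone(lon_deg):
    """Return the zone number (1 to 60) containing the given longitude.

    A longitude exactly on a zone boundary is assigned to the zone
    east of it (an assumption of this sketch).
    """
    if not -180.0 <= lon_deg < 180.0:
        raise ValueError("longitude must be in the range [-180, 180)")
    return int(math.floor((lon_deg + 180.0) / 6.0)) + 1


def central_meridian(zone):
    """Return the central meridian of a zone in degrees east of Greenwich."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must be between 1 and 60")
    return zone * 6 - 183
```

For example, a longitude of 3° East falls in zone 31, whose central meridian is 3° East, in agreement with the text above.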
Table 8: Zones of the ETRS-TMzn.
Zone number (zn) | Longitude of Origin (degrees) | West Limit (degrees) | East Limit (degrees) | South Limit (degrees) | North Limit (degrees)
26 | 27º West | 30º West | 24º West | 0º North | 84º North
27 | 21º West | 24º West | 18º West | 0º North | 84º North
28 | 15º West | 18º West | 12º West | 0º North | 84º North
29 | 9º West | 12º West | 6º West | 0º North | 84º North
30 | 3º West | 6º West | 0º West | 0º North | 84º North
31 | 3º East | 0º East | 6º East | 0º North | 84º North
32 | 9º East | 6º East | 12º East | 0º North | 84º North
33 | 15º East | 12º East | 18º East | 0º North | 84º North
34 | 21º East | 18º East | 24º East | 0º North | 84º North
35 | 27º East | 24º East | 30º East | 0º North | 84º North
36 | 33º East | 30º East | 36º East | 0º North | 84º North
37 | 39º East | 36º East | 42º East | 0º North | 84º North
38 | 45º East | 42º East | 48º East | 0º North | 84º North
39 | 51º East | 48º East | 54º East | 0º North | 84º North
1.3.3.2 ETRS-TMzn Definition
Table 9: ETRS-TMzn Definition
Entity | Value
CRS ID | ETRS-TMzn
CRS remarks | zn is the zone number, starting with 1 on the zone from 180° West to 174° West, increasing eastwards to 60 on the zone from 174° East to 180° East
CRS alias | ETRS89 Transverse Mercator CRS
CRS valid area | Europe
CRS scope | CRS for conformal pan-European mapping at scales larger than 1:500,000
Datum ID | ETRS89
Datum alias | European Terrestrial Reference System 1989
Datum type | geodetic
Datum realization epoch | 1989
Datum valid area | Europe/EUREF
Datum scope | European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks | see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213 or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID | Greenwich
Prime meridian Greenwich longitude | 0°
Ellipsoid ID | GRS 80
Ellipsoid alias | New International
Ellipsoid semi-major axis | 6 378 137 m
Ellipsoid shape | TRUE
Ellipsoid inverse flattening | 298.2572221
Ellipsoid remarks | see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics 115
Coordinate system ID | TMzn
Coordinate system type | Projected
Coordinate system dimension | 2
Coordinate system remarks | Projection: Transverse Mercator in zones, 6° width
Coordinate system axis name | N
Coordinate system axis direction | North
Coordinate system axis unit identifier | Metre
Coordinate system axis name | E
Coordinate system axis direction | East
Coordinate system axis unit identifier | Metre
Operation ID | TMzn
Operation valid area | Europe
Operation scope | for conformal pan-European mapping at scales larger than 1:500,000
Operation method name | Transverse Mercator Projection
Operation method name alias | TMzn
Operation method formula | Transverse Mercator Mapping Equations, in Hooijberg, Practical Geodesy, 1997, pages 81-84, 111-114
Operation method parameters number | 7
Operation parameter name | latitude of origin
Operation parameter value | 0°
Operation parameter remarks | 0°, the Equator
Operation parameter name | longitude of origin
Operation parameter value | central meridian (CM) of each zone
Operation parameter remarks | central meridians ..,3° W, 3° E, 9° E, 15° E, 21° E,...
Operation parameter name | false northing
Operation parameter value | 0 m
Operation parameter remarks |
Operation parameter name | false easting
Operation parameter value | 500 000 m
Operation parameter remarks |
Operation parameter name | scale factor at central meridian
Operation parameter value | 0.9996
Operation parameter remarks |
Operation parameter name | width of zones
Operation parameter value | 6°
Operation parameter remarks |
Operation parameter name | latitude limits of system
Operation parameter value | 0° N and 84° N
Operation parameter remarks |
Note that the axes abbreviations for ETRS-TMzn and ETRS-LCC are N and E
whilst for the ETRS-LAEA they are Y and X.
1.3.4 ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC)
1.3.4.1 ETRS-LCC Description
The ETRS89 Lambert Conformal Conic Coordinate Reference System (ETRS-LCC) is a
single projected coordinate reference system for all of the pan-European area applied to the
ETRS89 geodetic datum and the GRS80 ellipsoid. Because of the greater extent in longitude
than in latitude, a Lambert Conic Conformal projection with two standard parallels is utilised.
The scale factor is only a function of the latitudes of the standard parallels and the latitude of
the point where it is computed.
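The statement that the scale factor depends only on the standard parallels and the latitude of the point can be checked numerically. The sketch below assumes the usual ellipsoidal formulation of the Lambert Conformal Conic projection with two standard parallels; the helper names are hypothetical, and the standard parallels (35° N and 65° N) and ellipsoid constants are those given in Table 10.

```python
import math

# GRS80 ellipsoid and ETRS-LCC standard parallels (Table 10).
INVERSE_FLATTENING = 298.2572221
LOWER_PARALLEL_DEG = 35.0
UPPER_PARALLEL_DEG = 65.0

_F = 1.0 / INVERSE_FLATTENING
_E = math.sqrt(_F * (2.0 - _F))  # first eccentricity


def _m(lat):
    """Ratio of the parallel radius to the semi-major axis at latitude lat (radians)."""
    return math.cos(lat) / math.sqrt(1.0 - (_E * math.sin(lat)) ** 2)


def _t(lat):
    """Conformal latitude term of the ellipsoidal LCC formulas."""
    es = _E * math.sin(lat)
    return math.tan(math.pi / 4.0 - lat / 2.0) / ((1.0 - es) / (1.0 + es)) ** (_E / 2.0)


def lcc_scale_factor(lat_deg):
    """Point scale factor of ETRS-LCC at a latitude given in degrees.

    Note that longitude does not appear: the result depends only on
    the two standard parallels and the latitude of the point.
    """
    lat = math.radians(lat_deg)
    lat1 = math.radians(LOWER_PARALLEL_DEG)
    lat2 = math.radians(UPPER_PARALLEL_DEG)
    # Cone constant derived from the two standard parallels.
    n = (math.log(_m(lat1)) - math.log(_m(lat2))) / (
        math.log(_t(lat1)) - math.log(_t(lat2))
    )
    return _m(lat1) * _t(lat) ** n / (_m(lat) * _t(lat1) ** n)
```

Under this formulation the scale factor is 1 on both standard parallels, below 1 between them and above 1 outside them.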
1.3.4.2 ETRS-LCC Definition
Table 10: ETRS-LCC Definition
Entity | Value
CRS ID | ETRS-LCC
CRS alias | ETRS89 Lambert Conformal Conic CRS
CRS valid area | Europe
CRS scope | CRS for conformal pan-European mapping at scales smaller or equal 1:500,000
Datum ID | ETRS89
Datum alias | European Terrestrial Reference System 1989
Datum type | geodetic
Datum realization epoch | 1989
Datum valid area | Europe/EUREF
Datum scope | European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks | see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213 or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines/
Prime meridian ID | Greenwich
Prime meridian Greenwich longitude | 0°
Ellipsoid ID | GRS 80
Ellipsoid alias | New International
Ellipsoid semi-major axis | 6 378 137 m
Ellipsoid shape | TRUE
Ellipsoid inverse flattening | 298.2572221
Ellipsoid remarks | see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID | LCC
Coordinate system type | Projected
Coordinate system dimension | 2
Coordinate system axis name | N
Coordinate system axis direction | North
Coordinate system axis unit identifier | Metre
Coordinate system axis name | E
Coordinate system axis direction | East
Coordinate system axis unit identifier | metre
Operation ID | LCC
Operation valid area | Europe
Operation scope | for conformal pan-European mapping at scales smaller or equal 1:500,000
Operation method name | Lambert Conformal Conic Projection with 2 standard parallels
Operation method formula | Lambert Conformal Conic Projection, in Hooijberg, Practical Geodesy, 1997, pages 133-139
Operation method parameters number | 6
Operation parameter name | lower parallel
Operation parameter value | 35° N
Operation parameter remarks |
Operation parameter name | upper parallel
Operation parameter value | 65° N
Operation parameter remarks |
Operation parameter name | latitude grid origin
Operation parameter value | 52° N
Operation parameter remarks |
Operation parameter name | longitude grid origin
Operation parameter value | 10° E
Operation parameter remarks |
Operation parameter name | false northing
Operation parameter value | 2 800 000 m
Operation parameter remarks |
Operation parameter name | false easting
Operation parameter value | 4 000 000 m
Operation parameter remarks |
1.3.5 ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA)
1.3.5.1 ETRS-LAEA Description
The ETRS89 Lambert Azimuthal Equal Area Coordinate Reference System (ETRS-LAEA) is a
single projected coordinate reference system for all of the Pan-European area. It is based on the
ETRS89 geodetic datum and the GRS80 ellipsoid. Its defining parameters are given in the
following table according to ISO 19111 Spatial referencing by coordinates.
1.3.5.2 ETRS-LAEA Definition
Table 11: ETRS-LAEA Definition
Entity | Value
CRS ID | ETRS-LAEA
CRS alias | ETRS89 Lambert Azimuthal Equal Area CRS
CRS valid area | Europe
CRS scope | CRS for Pan-European statistical mapping at all scales or other purposes where true area representation is required
Datum ID | ETRS89
Datum alias | European Terrestrial Reference System 1989
Datum type | geodetic
Datum realization epoch | 1989
Datum valid area | Europe / EUREF
Datum scope | European datum consistent with ITRS at the epoch 1989.0 and fixed to the stable part of the Eurasian continental plate for georeferencing of GIS and geokinematic tasks
Datum remarks | see Boucher, C., Altamimi, Z. (1992): The EUREF Terrestrial Reference System and its First Realizations. Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung, Heft 52, München 1992, pages 205-213 or ftp://lareg.ensg.ign.fr/pub/euref/info/guidelines
Prime meridian ID | Greenwich
Prime meridian Greenwich longitude | 0°
Ellipsoid ID | GRS 80
Ellipsoid alias | New International
Ellipsoid semi-major axis | 6 378 137 m
Ellipsoid shape | TRUE
Ellipsoid inverse flattening | 298.2572221
Ellipsoid remarks | see Moritz, H. (1988): Geodetic Reference System 1980. Bulletin Geodesique, The Geodesists Handbook, 1988, Internat. Union of Geodesy and Geophysics
Coordinate system ID | LAEA
Coordinate system type | projected
Coordinate system dimension | 2
Coordinate system axis name | Y
Coordinate system axis direction | North
Coordinate system axis unit identifier | metre
Coordinate system axis name | X
Coordinate system axis direction | East
Coordinate system axis unit identifier | metre
Operation ID | LAEA
Operation valid area | Europe
Operation scope | for Pan-European statistical mapping at all scales or other purposes where true area representation is required
Operation method name | Lambert Azimuthal Equal Area Projection
Operation method formula | US Geological Survey Professional Publication 1395, "Map Projection - A Working Manual" by John P. Snyder.
Operation method parameters number | 4
Operation parameter name | latitude of origin
Operation parameter value | 52° N
Operation parameter name | longitude of origin
Operation parameter value | 10° E
Operation parameter remarks |
Operation parameter name | false northing
Operation parameter value | 3 210 000.0 m
Operation parameter remarks |
Operation parameter name | false easting
Operation parameter value | 4 321 000.0 m
Operation parameter remarks |
With these defining parameters, locations North of 25° have positive grid northing and locations
eastwards of 30° West longitude have positive grid easting. Note that the axes abbreviations for
ETRS-LAEA are Y and X whilst for the ETRS-LCC and ETRS-TMzn they are N and E.
Caution: All EU projections are based on the ETRS89 datum and therefore use ellipsoidal formulas. In
some GIS applications the Lambert Azimuthal Equal Area method is implemented only in spherical
form. Geodetic latitude and longitude must not be used in these spherical implementations. To do
so may cause significant error (up to 15 km!). Use the example conversions above to test whether
software uses appropriate formulas.
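To make the distinction between the spherical and ellipsoidal forms concrete, the following sketch implements the ellipsoidal forward equations for the Lambert Azimuthal Equal Area projection (oblique aspect, via the authalic latitude), using the defining parameters from Table 11. It is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, and the formulas are the ellipsoidal ones given in the Snyder manual cited in the table.

```python
import math

# ETRS-LAEA defining parameters (Table 11).
SEMI_MAJOR_AXIS_M = 6378137.0
INVERSE_FLATTENING = 298.2572221
LATITUDE_OF_ORIGIN_DEG = 52.0
LONGITUDE_OF_ORIGIN_DEG = 10.0
FALSE_NORTHING_M = 3210000.0
FALSE_EASTING_M = 4321000.0

_F = 1.0 / INVERSE_FLATTENING
_E2 = _F * (2.0 - _F)  # first eccentricity squared
_E = math.sqrt(_E2)


def _q(lat):
    """Authalic function q for a latitude in radians."""
    s = math.sin(lat)
    return (1.0 - _E2) * (
        s / (1.0 - _E2 * s * s)
        - (1.0 / (2.0 * _E)) * math.log((1.0 - _E * s) / (1.0 + _E * s))
    )


def etrs_laea_forward(lat_deg, lon_deg):
    """Project geodetic latitude and longitude (degrees) to ETRS-LAEA.

    Returns (X, Y), i.e. (easting, northing), in metres.
    """
    lat = math.radians(lat_deg)
    lat0 = math.radians(LATITUDE_OF_ORIGIN_DEG)
    dlon = math.radians(lon_deg - LONGITUDE_OF_ORIGIN_DEG)

    q_pole = _q(math.pi / 2.0)
    beta = math.asin(_q(lat) / q_pole)    # authalic latitude of the point
    beta0 = math.asin(_q(lat0) / q_pole)  # authalic latitude of the origin
    r_q = SEMI_MAJOR_AXIS_M * math.sqrt(q_pole / 2.0)  # authalic sphere radius
    m0 = math.cos(lat0) / math.sqrt(1.0 - _E2 * math.sin(lat0) ** 2)
    d = SEMI_MAJOR_AXIS_M * m0 / (r_q * math.cos(beta0))
    b = r_q * math.sqrt(
        2.0
        / (
            1.0
            + math.sin(beta0) * math.sin(beta)
            + math.cos(beta0) * math.cos(beta) * math.cos(dlon)
        )
    )
    x = FALSE_EASTING_M + b * d * math.cos(beta) * math.sin(dlon)
    y = FALSE_NORTHING_M + (b / d) * (
        math.cos(beta0) * math.sin(beta)
        - math.sin(beta0) * math.cos(beta) * math.cos(dlon)
    )
    return x, y
```

At the projection origin (52° N, 10° E) the function returns the false easting and false northing. A spherical implementation would use the geodetic latitude directly in place of the authalic latitude, which is the source of the error described in the caution above.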
1.4 Grid Creation Standards
1.4.1 Introduction
A grid for representing thematic information is a system of regular and geo-referenced cells, with a
specified shape and size, and an associated property.
The Workshop on European Reference Grids held in Ispra, October 2003 recommended adopting a
common European Grid Reference System for Reporting and Statistical Analysis. [3]
The proposal for a European grid coding system is based on the initial discussion during the
workshop that continued during the preparation of the short proceedings. A decimal grid Coding
System was initially proposed by Albrecht Wirthmann during the workshop. Subsequently, during
the consultation phase, a second proposal was submitted by Mark Greaves. The discussion
involved other experts (e.g. Lars Bernard, Andrus Meiner) and demonstrated the need to modify
the initial proposals by introducing additional levels alongside the different hierarchical levels of the
proposed decimal grid system.
After analysing advantages and disadvantages of different solutions, a new proposal was
formulated and proposed for adoption as European specification for INSPIRE. This system is
described in the following paragraphs.
1.4.2 European Grid Coding System
1.4.2.1 Basic assumptions and definitions
Coordinate Reference System
The geographical location of the grid points is based on the Lambert Azimuthal Equal Area
coordinate reference system (ETRS-LAEA) as defined by the Spatial Reference and the Map
Projections workshops in Marne la Vallee (1999, 2001) 22 . The cartographic projection is centred
on the point N 52°, E 10°. The coordinate system is metric.
1.4.2.2 Hierarchical Structure
The grid is defined as a hierarchical grid in metric coordinates in powers of 10. The hierarchical
structure determines the structure of the grid coding system.
Figure 3: Hierarchical grid structure for the first 3 levels
[3] See: Annoni, A., (Ed) (2005) European Reference Grids, workshop proceedings, EUR21494/EN
1.4.2.3 Code Definition
In agreement with the workshop recommendations the coding system must satisfy the following
principles:
· easy to manipulate,
· hierarchical,
· having a European Unique Code Identifier
For these reasons all systems proposed in the following sections are based on the coordinates of
the Grid. For clarity all examples refer to the same given pair of “raw” coordinates (5780354,
436102) that are given in meters.
1.4.2.4 Ordering of Axes
It is assumed that the first coordinate (in the example 5780354) identifies the Easting of the point,
i.e. the coordinate value along the west-to-east axis. The second coordinate (in the example
436102) identifies the Northing of the point, i.e. the coordinate value along the south-to-north axis.
Grid code identifies the south-western corner of a cell
To derive a code at a coarser resolution than that given by a pair of
coordinates, a truncation method is always used, i.e. the grid code coordinates for a coarser
resolution always describe the lower left corner of the cell that includes the given coordinates.
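The truncation rule can be sketched as an integer floor division. The function name below is hypothetical, and coordinates and cell size are assumed to be integers in the same unit (metres in the example).

```python
def cell_south_west_corner(easting, northing, cell_size):
    """Return the lower left (south-western) corner of the grid cell
    of the given size that contains the point.

    Coordinates are truncated, never rounded, so the result is always
    less than or equal to the input in both axes.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    return (easting // cell_size) * cell_size, (northing // cell_size) * cell_size
```

For the example coordinates (5780354, 436102), a 1 km cell has its corner at (5780000, 436000) and a 100 km cell at (5700000, 400000).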
1.4.2.5 Direct Coordinate Coding System
This coding system concatenates the coordinates of Easting and Northing of a grid point. The
length of the coordinates defines the precision of the grid. A grid with a precision of 1 m would
require a maximum of 7 digits for each dimension. The resulting code would have 14 digits. A grid
with a precision of 1 km would be defined by a code comprising 8 digits. Leading zeros are coded
in order to preserve the precision information.
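As an illustration of the direct coordinate coding described above, the sketch below concatenates the truncated Easting and Northing, padded with leading zeros. The function name is hypothetical, and the set of supported resolutions (powers of 10 from 1 m to 1000 km) is an assumption of this sketch.

```python
# Supported grid resolutions in metres (powers of 10); an assumption of this sketch.
RESOLUTIONS_M = (1, 10, 100, 1000, 10000, 100000, 1000000)


def direct_coordinate_code(easting_m, northing_m, resolution_m):
    """Build a direct coordinate code from integer metre coordinates.

    Each axis uses 7 digits at 1 m resolution and one digit fewer for
    every factor of 10 in resolution; leading zeros are kept.
    """
    if resolution_m not in RESOLUTIONS_M:
        raise ValueError("resolution must be a power of 10 between 1 m and 1000 km")
    digits = 7 - RESOLUTIONS_M.index(resolution_m)
    east = easting_m // resolution_m
    north = northing_m // resolution_m
    return f"{east:0{digits}d}{north:0{digits}d}"
```

With the example coordinates (5780354, 436102), the 1 m code has 14 digits (57803540436102) and the 1 km code has 8 digits (57800436).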
Figure 4: Direct Coordinate Coding System for 100 km & 1000 km resolution
1.4.2.6 Quad-tree Subdivision
The difference in resolution between the different hierarchical levels of the proposed decimal grid
system is rather large. For some applications, it might be necessary to insert additional levels in
between. This could be done by simply dividing a grid into 4 equally spaced sub cells. Thus, a grid
with a distance of 1 km could be divided into cells of 500 m length. A second level could be
introduced by dividing each sub cell again into 4 equally sized cells of 250 m length. A next sub
division would lead to grid cells of 125 m length. This is close to the next lower hierarchical level of
the decimal grid. Therefore, it is suggested to introduce a maximum of 2 sub divisions for each grid
level. This method of sub dividing a grid is called quad-tree, as each cell is divided into 4 quarters.
The graphical representation of the grid structure when traversing the grid from its root cell to its
smallest sub cell results in a tree structure with 4 branches at each level:
Figure 5: Grid tree structure
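The quad-tree subdivision can be sketched as follows. The function names are hypothetical; cell sizes are returned as floating point values so that odd sizes can still be halved.

```python
def quadtree_cell_sizes(primary_size, max_subdivisions=2):
    """Return the cell sizes of a primary grid level and its quad-tree
    sub-levels. The default of two subdivisions follows the suggestion
    in the text above.
    """
    return [primary_size / 2 ** level for level in range(max_subdivisions + 1)]


def subdivide(x, y, size):
    """Split the cell with lower left corner (x, y) into 4 equal sub cells.

    Each sub cell is returned as (x, y, size) of its own lower left corner.
    """
    half = size / 2
    return [
        (x, y, half),                # south-west quarter
        (x + half, y, half),         # south-east quarter
        (x, y + half, half),         # north-west quarter
        (x + half, y + half, half),  # north-east quarter
    ]
```

For a 1 km grid this gives cell sizes of 1000 m, 500 m and 250 m; a third subdivision would give 125 m, which is close to the next decimal level of 100 m.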
1.4.2.7 Explicit indication of resolution level using powers of 10 and 2
An explicit indication of hierarchical (resolution) level seems an asset, but all systems presented in
the previous sections hide the notion of primary and secondary level and require some
effort to remember the correspondence between level and precision. A new proposal is formulated
in this chapter (with two options) that seems to overcome most of the problems identified in
previous proposals.
It is suggested to use a coordinate coding system for constructing the grid code with the following
characteristics:
1. The system is based on a primary grid and on two additional sub-levels (secondary and
tertiary grid)
2. The coordinate values are expressed in decimetres.
3. The primary grid (metric) will have 7 primary levels (first column in Table 12)
4. Two additional sub-levels are authorised as quadtree subdivision of the primary grid
(second column in Table 12), except for level one, where only one sub-level is authorised
in order to avoid sub-decimetre resolution.
Table 12: Primary (power of 10) and quad-tree levels (power of 2) for explicit indication
Primary Level | Quadtree Level | Value | in dm | in m/km
1 | 0 | 10^1 / 2^0 | 10 | 1 m
1 | 1 | 10^1 / 2^1 | 5 | 0.5 m
2 | 0 | 10^2 / 2^0 | 100 | 10 m
2 | 1 | 10^2 / 2^1 | 50 | 5 m
2 | 2 | 10^2 / 2^2 | 25 | 2.5 m
3 | 0 | 10^3 / 2^0 | 1 000 | 100 m
3 | 1 | 10^3 / 2^1 | 500 | 50 m
3 | 2 | 10^3 / 2^2 | 250 | 25 m
4 | 0 | 10^4 / 2^0 | 10 000 | 1000 m / 1 km
4 | 1 | 10^4 / 2^1 | 5 000 | 500 m
4 | 2 | 10^4 / 2^2 | 2 500 | 250 m
5 | 0 | 10^5 / 2^0 | 100 000 | 10 km
5 | 1 | 10^5 / 2^1 | 50 000 | 5 km
5 | 2 | 10^5 / 2^2 | 25 000 | 2500 m / 2.5 km
6 | 0 | 10^6 / 2^0 | 1 000 000 | 100 km
6 | 1 | 10^6 / 2^1 | 500 000 | 50 km
6 | 2 | 10^6 / 2^2 | 250 000 | 25 km
7 | 0 | 10^7 / 2^0 | 10 000 000 | 1 000 km
7 | 1 | 10^7 / 2^1 | 5 000 000 | 500 km
7 | 2 | 10^7 / 2^2 | 2 500 000 | 250 km
Two ways to express the code are proposed:
1. A fixed length code (here a point is used as a delimiter, which increases readability but is
not necessary for automatic processing):
Code=Level.QuadtreeLevel.EastCoordinate.NorthCoordinate
2. A floating length code (again a point as delimiter) which makes the quad-tree level code
(the last part) optional, i.e. it has only to be indicated if a quad-tree level is used:
Code=EastCoordinate.NorthCoordinate.Level[.QuadtreeLevel]
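Both code forms can be derived mechanically from a pair of coordinates. The sketch below is illustrative only: the function name is hypothetical, input coordinates are assumed to be integer metres, and the rule for how many digits the floating code keeps (8 minus the primary level plus the quad-tree level) is inferred from the worked examples in Table 13 rather than stated in the text. The optional ".0" suffix is omitted from the floating code when no quad-tree level is used.

```python
def explicit_grid_codes(easting_m, northing_m, level, quadtree_level=0):
    """Return (fixed_length_code, floating_code) for a point.

    Coordinates are converted to decimetres and truncated to the lower
    left corner of the cell for the requested level.
    """
    if level not in range(1, 8) or quadtree_level not in range(0, 3):
        raise ValueError("level must be 1 to 7 and quadtree_level 0 to 2")
    if level == 1 and quadtree_level == 2:
        raise ValueError("level 1 allows only one quad-tree sub-level")

    cell_dm = 10 ** level // 2 ** quadtree_level  # cell size in decimetres
    east_dm = (easting_m * 10) // cell_dm * cell_dm
    north_dm = (northing_m * 10) // cell_dm * cell_dm
    east = f"{east_dm:08d}"
    north = f"{north_dm:08d}"

    fixed = f"{level}.{quadtree_level}.{east}.{north}"

    # Floating code: drop trailing digits that carry no information
    # at this level (rule inferred from Table 13).
    keep = 8 - (level - quadtree_level)
    floating = f"{east[:keep]}.{north[:keep]}.{level}"
    if quadtree_level:
        floating += f".{quadtree_level}"
    return fixed, floating
```

For the example coordinates (5780354, 436102) at primary level 4 and quad-tree level 2, this yields the fixed length code 4.2.57802500.04360000 and the floating code 578025.043600.4.2.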
Table 13: Two applications for the explicit indication coding for the example coordinates (5780354, 436102)
Primary Level | Quadtree level | m | East | North | Fixed Length Code | Floating Code
1 | 0 | 1 | 57803540 | 4361020 | 1.0.57803540.04361020 | 5780354.0436102.1[.0]
1 | 1 | 0.5 | 57803540 | 4361020 | 1.1.57803540.04361020 | 57803540.04361020.1.1
2 | 0 | 10 | 57803500 | 4361000 | 2.0.57803500.04361000 | 578035.043610.2[.0]
2 | 1 | 5 | 57803500 | 4361000 | 2.1.57803500.04361000 | 5780350.0436100.2.1
2 | 2 | 2.5 | 57803525 | 4361000 | 2.2.57803525.04361000 | 57803525.04361000.2.2
3 | 0 | 100 | 57803000 | 4361000 | 3.0.57803000.04361000 | 57803.04361.3[.0]
3 | 1 | 50 | 57803500 | 4361000 | 3.1.57803500.04361000 | 578035.043610.3.1
3 | 2 | 25 | 57803500 | 4361000 | 3.2.57803500.04361000 | 5780350.0436100.3.2
4 | 0 | 1000 | 57800000 | 4360000 | 4.0.57800000.04360000 | 5780.0436.4[.0]
4 | 1 | 500 | 57800000 | 4360000 | 4.1.57800000.04360000 | 57800.04360.4.1
4 | 2 | 250 | 57802500 | 4360000 | 4.2.57802500.04360000 | 578025.043600.4.2
5 | 0 | 10000 | 57800000 | 4300000 | 5.0.57800000.04300000 | 578.043.5[.0]
5 | 1 | 5000 | 57800000 | 4350000 | 5.1.57800000.04350000 | 5780.0435.5.1
5 | 2 | 2500 | 57800000 | 4350000 | 5.2.57800000.04350000 | 57800.04350.5.2
6 | 0 | 100000 | 57000000 | 4000000 | 6.0.57000000.04000000 | 57.04.6[.0]
6 | 1 | 50000 | 57500000 | 4000000 | 6.1.57500000.04000000 | 575.040.6.1
6 | 2 | 25000 | 57750000 | 4250000 | 6.2.57750000.04250000 | 5775.0425.6.2
7 | 0 | 1000000 | 50000000 | 0 | 7.0.50000000.00000000 | 5.0.7[.0]
7 | 1 | 500000 | 55000000 | 0 | 7.1.55000000.00000000 | 55.00.7.1
7 | 2 | 250000 | 57500000 | 2500000 | 7.2.57500000.02500000 | 575.025.7.2
Both coding systems:
1. can be easily derived from the coordinate values.
2. are easily understandable.
Clearly the fixed length code shows more redundancy and less flexibility – i.e. for a change of
precision - but the coding rules are straightforward and thus seem to be easier to handle by
computers.
1.5 Generalisation and Generalisation Parameters
1.5.1 Introduction
The subject of cartographic generalisation is the reduction of information in a map when scale is
reduced. Applied methods in this context are simplification, selection, deletion, exaggeration,
symbolisation. One aspect of generalisation is the simplification of lines and polygons. In this
context generalisation methods simplify lines by removing small fluctuations or extraneous bends
from them while preserving their essential shape. Generalisation allows you to create simplified datasets
for displaying or publishing at smaller scales based on your larger scale data.
1.5.2 Generalisation in ArcGIS
In ArcGIS the generalisation tool is called ‘Simplify Line’. The ‘Simplify Line’ tool offers two
different algorithms for generalisation:
Point Remove is a fast, simple algorithm that reduces a line quite effectively by removing
redundant points, such as over digitised vertices; however, the angularity of the resulting line will
increase significantly as the tolerance increases so the line may become aesthetically unpleasing.
Use Point Remove for data compression or a relatively low degree of simplification.
Bend Simplify applies advanced techniques to detect bends along a line, analyze their
characteristics, and eliminate insignificant ones. It takes longer to process than Point Remove, but
the resulting line is more faithful to the original and shows better aesthetic quality.
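The text does not specify how Point Remove works internally. It is commonly described as a variant of the Douglas-Peucker algorithm, so the following sketch shows the classic Douglas-Peucker procedure as an assumption-based illustration of tolerance-driven point removal. It is not the ArcGIS implementation, and the function names are hypothetical.

```python
import math


def _distance_to_baseline(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate baseline: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / math.hypot(dx, dy)


def simplify_line(points, tolerance):
    """Simplify a polyline given as a list of (x, y) tuples.

    Vertices closer to the current baseline than the tolerance are
    removed; the vertex farthest from the baseline is kept and the two
    halves are processed recursively.
    """
    if len(points) < 3:
        return list(points)
    farthest_index, farthest_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_baseline(points[i], points[0], points[-1])
        if d > farthest_distance:
            farthest_index, farthest_distance = i, d
    if farthest_distance <= tolerance:
        return [points[0], points[-1]]
    left = simplify_line(points[: farthest_index + 1], tolerance)
    right = simplify_line(points[farthest_index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex
```

As the tolerance grows, more vertices are dropped and the remaining segments become longer and more angular, which is the behaviour described for Point Remove above.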
Figure 6: Types of generalisation algorithms used in ArcGIS
Both the ‘Point Remove’ and ‘Bend Simplify’ algorithms are described in more detail in chapter 3.7
of this document.
Many feature types within the GISCO Reference database (e.g. Coastline) are also included at
different display scales. These scaled versions have been generalised. Generalisation is discussed
in more detail in section 3.7 of this document, along with tips on how to analyse and improve
generalisation results.
1.5.3 GISCO Reference Database Generalisation Parameters
The following table shows, for the common display scales, the threshold at which small polygons
can be deleted, along with the weed tolerance parameter used in ArcInfo. The ‘Bend Simplify’
algorithm should be used to generalise data intended for the GISCO Reference Database.
Table 14: Deletion thresholds and weed tolerance parameters
Scale | Max Area | Weed Tolerance
1:100 000 | 2500 m² | 50 m
1:1m | 0.25 km² | 500 m
1:3m | 2.25 km² | 1500 m
1:4m | 4.00 km² | 2000 m
1:10m | 25.00 km² | 4500 m
1:20m | 100.00 km² | 9000 m
It must be remembered that there may be some exceptions when deleting polygons. For instance, it
was found that the Austrian part of Lake Constance had disappeared in the NUTS 1:20 million
feature class. Although that polygon represents just 32 km², it should be shown at 1:20m. These
kinds of issues are politically sensitive.
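One way to apply the deletion thresholds while allowing for such exceptions is to keep an explicit list of protected polygons. The sketch below is a hypothetical illustration: the record layout (dictionaries with "id" and "area_m2" keys), the function name and the identifiers are assumptions, and only the threshold values come from Table 14.

```python
# Scale denominator -> (maximum deletable area in m², weed tolerance in m),
# transcribed from Table 14.
GENERALISATION_PARAMETERS = {
    100_000: (2_500.0, 50.0),
    1_000_000: (250_000.0, 500.0),
    3_000_000: (2_250_000.0, 1_500.0),
    4_000_000: (4_000_000.0, 2_000.0),
    10_000_000: (25_000_000.0, 4_500.0),
    20_000_000: (100_000_000.0, 9_000.0),
}


def polygons_to_keep(polygons, scale_denominator, protected_ids=()):
    """Filter out polygons below the deletion threshold for a scale.

    Polygons whose id is listed in protected_ids are always kept,
    whatever their area.
    """
    max_area, _weed_tolerance = GENERALISATION_PARAMETERS[scale_denominator]
    return [
        p for p in polygons
        if p["area_m2"] >= max_area or p["id"] in protected_ids
    ]
```

With a threshold of 100 km² at 1:20m, a 32 km² polygon is dropped unless its identifier is passed in protected_ids.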
1.6 Database Interoperability
1.6.1 Overview
Geographic information system (GIS) technology is evolving beyond the traditional GIS community
and becoming an integral part of the information infrastructure in many organizations. The unique
integration capabilities of a GIS allow disparate data sets to be brought together to create a
complete picture of a situation. GIS technology illustrates relationships, connections, and patterns
that are not necessarily obvious in any one data set, enabling organizations to make better
decisions based on all relevant factors. Organizations are able to share, coordinate, and
communicate key concepts between departments within an organization or between separate
organizations using GIS as the central spatial data infrastructure. GIS technology is also being
used to share crucial information across organizational boundaries via the Internet and the
emergence of Web Services.
To fully realise the capability and benefits of geographic information and GIS technology, spatial
data needs to be shared and systems need to be interoperable. GIS technology provides the
framework for a shared spatial data infrastructure and a distributed architecture.
Interoperability is one way in which the integration of spatial data in Europe can be improved. It can
aid the maintenance, availability and understanding of spatial data, and therefore contribute
towards the aims of the principles set out by the INSPIRE Initiative.
The paragraphs below discuss the value of being "open," the evolution of spatial standards with
the development of new technologies, including the future of Web Services, and provide an
overview of where efforts are being concentrated with regard to interoperability.
1.6.2 Spatial Data Standards and GIS Interoperability
1.6.2.2 What Does Being an "Open" GIS Mean?
To put this question into context, it is important to understand that during the past 20 years, the
concepts, standards, and technology for implementing GIS interoperability have evolved through
six stages.
1. Data converters (DLG, MOSS, GIRAS)
2. Standard interchange formats (SDTS, DXF™, GML)
3. Open file formats (VPF, shapefiles)
4. Direct read application programming interfaces (APIs) (ArcSDE® API, CAD Reader, ArcSDE
CAD Client)
5. Common features in a database management system (DBMS) (OGC Simple Feature
Specification for SQL™)
6. Integration of standardized GIS Web services (WMS, WFS)
All six of these approaches and related technologies are important and continue to play a
significant role in GIS interoperability today. Data sharing between organizations with different GIS
vendor systems was limited to data converters, transfer standards, and later open file formats.
Sharing spatial data with other core business applications was rarely achieved. Today, most GIS
products directly read and sometimes dynamically transform data with minimal time delay. The
point here is that the GIS community has been pursuing open interoperability for many years, and
the solutions to achieving this goal have changed with the development of new technologies.
Another factor to be considered is the still evolving view of the role that GIS plays in an
organization. In the early days of GIS, the focus, with rare exceptions, was on individual, isolated
projects. Today the focus is on the integration of spatial data and analysis in the mission-critical
business processes and work flows of the enterprise and on increasing the return on investment
(ROI) in GIS technology and databases by improving interoperability, decision making, and service
delivery.
Finally, it is worthwhile to remember why GIS system technology is implemented in the first place.
Even if we have specialized responsibility for gathering and managing geographic data, we need to
remember that a GIS is not an end in itself. A GIS must produce useful information products that
can be shared among multiple users, while at the same time provide a consistent infrastructure to
ensure data integrity. It is important not to get caught up in the technology and forget this basic
principle.
Interoperability enables the integration of data between organizations and across applications and
industries, resulting in the generation and sharing of more useful information.
1.6.2.3 The Value of Being Open
An open GIS system allows for the sharing of geographic data, integration among different GIS
technologies, and integration with other non-GIS applications. It is capable of operating on different
platforms and databases and can scale to support a wide range of implementation scenarios from
the individual consultant or mobile worker using GIS on a workstation or laptop to enterprise
implementations that support hundreds of users working across multiple regions and departments.
An open GIS also exposes objects that allow for the customization and extension of functional
capabilities using industry-standard development tools.
1.6.2.4 The Georelational Database
Gradually, GIS models evolved into georelational structures where related attribute data could be
stored in a relational database that was linked to the file-based spatial features. However, the
georelational format had limited scalability, and the dual data structure (spatial features stored in
proprietary file-based format with attributes stored in a relational database) meant that the GIS
could not take full advantage of relational database features such as backup and recovery,
replication, and fail-over. In addition, supporting large data layers required the use of complex tiling
structures to maintain performance, and sharing spatial information with other core business
applications was still not possible.
1.6.2.5 The Spatially Enabled Database
In the mid-'90s, new technology emerged that enabled spatial data to be stored in relational
databases (often referred to as spatially enabling the database), opening a new era of broad
scalability and the support of large, non-tiled, continuous data layers. When the new spatially
enabled databases were combined with client development environments that could be embedded
within core business applications, the sharing of spatial features with core business applications,
such as customer management systems, became possible. In addition, these spatially enabled
databases allowed organizations to take the first steps toward enterprise GIS and the elimination of
organizational "spatial data islands." Perhaps not coincidentally, the open GIS movement was
spawned shortly after the arrival of the first all-relational models capable of storing both spatial and
attribute data in a relational database, when standards organizations such as the Open Geospatial
Consortium (OGC) and the International Organization for Standardization (ISO) began
promoting the idea of data sharing through spatial data standards. The early work of these
organizations was focused on sharing simple spatial features in a relational database, thereby
enabling interoperability between the commercial GIS vendors. OGC, an international industry
consortium of private companies, government agencies, and universities, published an open
spatial standard called the Simple Features Specification.
1.6.2.6 The Future with Web Services
Much of the focus of GIS developers today is Web services, as these are seen as the best long-range
solutions for data sharing and interoperability.
Web services avoid the issues and complications of GIS applications being tied to the spatial
schema of a specific RDBMS vendor and allow GIS vendors to manage their own data using the
best methods and formats for their tools in whatever database environment they choose. In
addition, Web services allow server-to-server sharing of data and services, as opposed to
integration only happening at the client level as it does with standards that are focused on the
DBMS. Some vendors choose to use an RDBMS with schema and methods that perform optimally
for their tools. Others use file systems. Web services mean that each GIS vendor can build and
manage its own GIS data and readily provide GIS services (data, maps, and geoprocessing) to a
larger audience in a common environment.
1.6.2.7 Web Services Framework
Web services are a fundamentally new framework and set of standards for computing. Web
services envision a network of distributed computing nodes, which can include servers,
workstations, desktop clients, and lightweight "pervasive" clients (phones, PDAs, etc.). Web
services standards provide the glue by which these computers and devices interact to form a
greater computing whole, accessed from any other device on the network. It is also important to
recognize that Web services are not just for the Internet; they represent a powerful architecture for
all types of distributed computing. Web services provide a framework for fusing computing devices
via open networks (the Internet, wireless, and local networks). In Web services, computing nodes
have three roles: client, service, and broker. A client is any computer that accesses functions from
one or more other computing nodes on the network. Typical clients include desktop computers,
Web browsers, Java applets, and mobile devices. A client process makes a request of a computing
service and receives results for each request. A service is a computing process that awaits
requests, responds to each request, and returns a set of results. A broker is essentially a service
metadata portal for registering and discovering services. Any network client can search the portal
for an appropriate service. Server and broker technologies are typically used on UNIX, Linux, and
Windows platforms.
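The three roles can be illustrated with a minimal in-memory sketch. All class, function and service names below are hypothetical, and a real broker would be a networked registry rather than a Python object; the sketch only shows how registration, discovery and invocation relate to one another.

```python
from typing import Callable, Dict, List

# A service is modelled here as a function that takes a request and returns results.
Service = Callable[[dict], dict]


class Broker:
    """Hypothetical metadata registry: services register, clients discover."""

    def __init__(self) -> None:
        self._registry: Dict[str, Service] = {}
        self._keywords: Dict[str, List[str]] = {}

    def register(self, name: str, keywords: List[str], service: Service) -> None:
        self._registry[name] = service
        self._keywords[name] = [k.lower() for k in keywords]

    def search(self, keyword: str) -> List[str]:
        """Return the names of registered services that match a keyword."""
        key = keyword.lower()
        return sorted(n for n, kws in self._keywords.items() if key in kws)

    def lookup(self, name: str) -> Service:
        return self._registry[name]


def map_service(request: dict) -> dict:
    """Hypothetical service: receives a request and returns a set of results."""
    return {"layer": request["layer"], "status": "rendered"}


broker = Broker()
broker.register("map-service", ["map", "gis"], map_service)

# Client role: discover a service through the broker, then call it.
found = broker.search("map")
result = broker.lookup(found[0])({"layer": "nuts_regions"})
```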
Web services can support the integration of information and services that are maintained on a
distributed network. This is appealing in organizations, such as the Commission and the main
stakeholders concerned with the INSPIRE Initiative, that have departments that independently
collect and manage spatial data, and that require these data sets to be integrated. The use of Web
services (a connecting technology) coupled with GIS (an integrating technology) can efficiently
support this need. The result is that the various layers of information can be dynamically queried
and integrated, while at the same time the custodians of the data can maintain this information in a
distributed computing environment.
1.6.2.8 The Standards for Web Services
The key standards used for Web services are a series of protocols (i.e., XML; Simple Object
Access Protocol [SOAP]; Web Services Description Language [WSDL]; and Universal Description,
Discovery, and Integration [UDDI]) that support sophisticated communications between various
nodes in a network. They enable smarter communication and collaborative processing among
nodes built within any Web services-compliant architecture.
Web services can be accessed from browsers, mobile devices such as telephones, desktop clients,
and other information appliances. To discover these services, a broker is provided. The discovery
protocol is referred to as Universal Description, Discovery, and Integration (UDDI). In the GIS
context, the UDDI node plays the role of a metadata server of registered
Web services. A user can search a UDDI directory and find other distributed service providers or
services that exist on a network.
Web services interoperate (communicate) through an XML-based protocol known as Simple Object
Access Protocol. This is an XML API to the functions provided by a Web service. Each Web
service "advertises" its SOAP API using a mechanism called Web Services Description Language,
allowing easy discovery of any service's capabilities.
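As an illustration of what "wrapping" a call in SOAP looks like, the sketch below builds a SOAP 1.1 envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `GetRegionName` and the parameter `nutsCode` are hypothetical and stand in for whatever a real service would advertise in its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/gis"  # hypothetical service namespace


def build_soap_request(operation: str, params: dict) -> str:
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter, for illustration only.
request_xml = build_soap_request("GetRegionName", {"nutsCode": "LU000"})
```

The resulting string would be sent as the body of an HTTP POST to the service endpoint; the response comes back in an envelope of the same shape.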
Figure 7: Integration of standards-based web services
Web services provide an open, interoperable, and highly efficient framework for implementing
systems. They are interoperable because each piece of software communicates with each other
piece via the standard SOAP and XML protocols. This means that if a developer "wraps" an
application with a SOAP API, it can talk with (call/serve) other applications. Web services are
efficient because they build on the stateless (loosely coupled) environment of the Internet. A
number of nodes can be dynamically connected only when necessary to carry out a specific task
such as updating a database or providing a particular service.
While conceptually the basic computer components of a Web services system are still clients and
servers, it is important to recognize that the network connections are dynamically created "just in
time" and, therefore, do not require the overhead of "stateful" networks. These networks can be
implemented in open as well as secure environments.
1.6.2.9 Web Services and GIS
This loosely coupled architecture provides a new and promising solution for implementation of
complex collaborative applications such as a distributed GIS. In some ways, the integration of GIS
and Web services simply means that GIS can be more extensively implemented, and people will
be able to take mapping, data, and geoprocessing services from many servers and integrate them
into a common environment. Unique to GIS-based Web services is the ability to not only connect
and interoperate but to integrate data using the unique properties that are inherent within GIS itself
(i.e., data integration and fusion based on geographic location).
Web services enable the realization of some of the principles on which the INSPIRE Initiative is
based:
1. Data should be collected once and maintained at the level where this can be done most
effectively;
2. It should be possible to combine seamless spatial information from different sources
across Europe and share it between many users and applications;
3. It should be possible for information collected at one level to be shared between all the
different levels, detailed for detailed investigations, general for strategic purposes;
4. Geographic information needed for good governance at all levels should be abundant
under conditions that do not restrain its extensive use;
5. It should be easy to discover which geographic information is available, fits the needs for a
particular use and under which conditions it can be acquired and used;
6. Geographic data should become easy to understand and interpret because it can be
visualised within the appropriate context and selected in a user-friendly way.
GIS fundamentally involves the integration of data from multiple sources. The Web services
architecture establishes a particular type of relationship between service providers and consumers
of information that nicely supports the dynamic integration of data, key to creating a spatial data
infrastructure.
1.6.2.12 WMS and WFS
Integration of web services can be achieved by using the OGC Web Map Service (WMS) and Web
Feature Service (WFS) standards. A Web Map Service produces maps of spatially referenced data
dynamically from geographic information. This international standard defines a "map" to be a
portrayal of geographic information as a digital image file suitable for display on a computer
screen. A map is not the data itself. WMS-produced maps are generally rendered in a pictorial
format such as PNG, GIF or JPEG, or occasionally as vector-based graphical elements in Scalable
Vector Graphics (SVG) or Web Computer Graphics Metafile (WebCGM) formats.
This International Standard defines three operations:
• GetCapabilities: returns service-level metadata
• GetMap: returns a map whose geographic and dimensional parameters are well-defined
• GetFeatureInfo: returns information about particular features shown on a map (optional)
Web Map Service operations can be invoked using a standard web browser by submitting requests
in the form of URLs. The content of such URLs depends on which operation is requested. In
particular, when requesting a map the URL indicates what information is to be shown on the map,
what portion of the earth is to be mapped, the desired coordinate reference system, and the output
image width and height. When two or more maps are produced with the same geographic
parameters and output size, the results can be accurately overlaid to produce a composite map.
The use of image formats that support transparent backgrounds (e.g., GIF or PNG) allows
underlying maps to be visible. Furthermore, individual maps can be requested from different
servers. The Web Map Service thus enables the creation of a network of distributed map servers
from which clients can build customized maps.
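A GetMap request of the kind described above can be assembled as follows. This is a minimal sketch assuming WMS version 1.1.1 parameter names; the server address and the layer name are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getmap_url(base_url, layers, bbox, srs, width, height,
                     image_format="image/png", transparent=True):
    """Assemble a WMS 1.1.1 GetMap request as a URL."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),               # what information is shown
        "STYLES": "",                             # empty value asks for default styles
        "SRS": srs,                               # coordinate reference system
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "WIDTH": width,                           # output image width in pixels
        "HEIGHT": height,                         # output image height in pixels
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name.
url = build_getmap_url("http://example.org/wms", ["nuts_regions"],
                       (-10.0, 35.0, 30.0, 70.0), "EPSG:4326", 800, 700)
```

Calling the function again with another server's address but the same `bbox`, `srs`, `width` and `height` yields a second image that can be overlaid on the first, which is the composite-map case described above.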
The Web Feature Service is a highly interoperable interface that allows geographical features to be
requested across the web. It uses the XML-based GML for data exchange.
The WFS specification defines interfaces for describing data manipulation operations of
geographic features. Data manipulation operations include the ability to:
• Create a new feature instance
• Delete a feature instance
• Update a feature instance
• Get or Query features based on spatial and non-spatial constraints
A WFS describes discovery, query, or data transformation operations. The request is generated on
the client and is posted to a web feature server using HTTP. The web feature server then executes
the request. The WFS specification uses HTTP as the distributed computing platform, although this
is not a hard requirement.
There are two encodings defined for WFS operations:
• XML (amenable to HTTP POST/SOAP)
• Keyword-Value pairs (amenable to HTTP GET/REST)
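The two encodings can be compared side by side for a simple GetFeature request. The sketch below assumes WFS version 1.1.0 parameter and attribute names; the server address and the feature type name `nuts_regions` are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"


def getfeature_kvp(base_url, type_name, max_features=10):
    """Keyword-value pair encoding, sent with HTTP GET."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "MAXFEATURES": max_features,
    }
    return base_url + "?" + urlencode(params)


def getfeature_xml(type_name, max_features=10):
    """XML encoding of the same request, sent as the body of an HTTP POST."""
    ET.register_namespace("wfs", WFS_NS)
    root = ET.Element(
        f"{{{WFS_NS}}}GetFeature",
        {"service": "WFS", "version": "1.1.0", "maxFeatures": str(max_features)},
    )
    ET.SubElement(root, f"{{{WFS_NS}}}Query", {"typeName": type_name})
    return ET.tostring(root, encoding="unicode")


# Hypothetical server and feature type.
kvp_request = getfeature_kvp("http://example.org/wfs", "nuts_regions")
xml_request = getfeature_xml("nuts_regions")
```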
1.6.2.13 Standards Organizations
There are many international standards organizations associated with spatial interoperability:
• ISO - International Organization for Standardization
• OGC - Open Geospatial Consortium
• OGCE - Open Geospatial Consortium (Europe)
• W3C - World Wide Web Consortium
• ANSI - American National Standards Institute
• IHO - International Hydrographic Organization
• WS-I - Web Services Interoperability Organization
• LIF - Location Interoperability Forum
• WLIA - Wireless Location Industry Association
• FGDC - Federal Geographic Data Committee
• GSDI - Global Spatial Data Infrastructure
• CEN - European Committee for Standardization
• DGIWG - Digital Geographic Information Working Group
OGCE meets Europe's interoperability challenges and is actively involved in a number of European
Union Projects:
• ETeMII - European Territorial Management Information Infrastructure
• GETIS - Geoprocessing Networks in a European Territorial Interoperability Study
• GINIE - Geographic Information Network in Europe
• INSPIRE - INfrastructure for SPatial InfoRmation in Europe
2 Geographic Guidelines: Non-Normative Section
This section discusses difficulties in data loading, good and bad practices in the creation of data
and some tips and tricks to make data flow more fluent. In-depth guidelines are also included on
how to create GISCO maps, presenting some of the main principles of cartography and thematic
mapping.
The normative part (Section 1) addresses particular standards that must be met when creating
data intended for the GISCO Reference Database. The final part of the document addresses data
quality and consistency.
2.1 What are good practices and what are the bad practices for the
creation of data.
The paragraphs below describe some good practices when creating data intended for the GISCO
Reference database, and bad practices that should be avoided.
2.1.1 Data delivery characterset
ArcSDE supports Unicode, but with some limitations: it cannot load or display a feature class with
attributes in multiple languages at the same time. Full Unicode support is
available in the personal geodatabase.
The GISCO SDE database is an Oracle database created with character set UTF8, and thus CHAR
and VARCHAR columns can store characters in UTF8. Only one language can be loaded or displayed at
once. To load or display different languages, set the appropriate NLS_LANG value for each
language. Therefore, the character set for each dataset should be specified on delivery.
2.1.2 Data including names
Also include an attribute holding the names in ASCII. This provides an ‘always readable’ name, in
case something goes wrong with character conversion.
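One possible way to derive such an ASCII attribute is sketched below. It simply strips diacritics, which is an assumption on our part; the transliteration convention actually wanted (for example "ö" to "oe") should be agreed with GISCO, and the function name is hypothetical.

```python
import unicodedata


def ascii_name(name: str) -> str:
    """Return an ASCII-only fallback name by stripping diacritics."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters without a decomposition (e.g. "ø", "ß") are replaced by "?"
    # so that they can be spotted and transliterated by hand.
    return stripped.encode("ascii", errors="replace").decode("ascii")
```

Names that still contain "?" after conversion need a manual transliteration before delivery.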
2.1.3 Avoid specific Geodatabase features (only usable for ArcGIS clients)
These features are only recognized by ESRI ArcGIS clients. They can be useful in an ESRI client
environment, but they are not enforced at database level. Therefore, only use them if they are
really needed, and take appropriate action at database level.
2.1.3.1 Composite relationship classes
This type of relationship class implements a kind of referential integrity rule between two feature
classes (a child cannot exist without its parent), optionally with a cascading delete rule. This
restriction is not enforced at database level, allowing third party tools to bypass the rule.
If database integrity needs to be maintained, the corresponding database foreign key constraint should
also be created so that integrity is enforced at database level.
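The effect of a database-level foreign key with a cascading delete can be sketched as follows. SQLite is used here only as a runnable stand-in for the Oracle database behind GISCO SDE, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE commune (
        commune_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        region_id  INTEGER NOT NULL
                   REFERENCES region (region_id) ON DELETE CASCADE
    )
""")

conn.execute("INSERT INTO region VALUES (1, 'Region A')")
conn.execute("INSERT INTO commune VALUES (10, 'Commune X', 1)")


def insert_orphan() -> bool:
    """Try to insert a child without a parent; return True if the database rejects it."""
    try:
        conn.execute("INSERT INTO commune VALUES (11, 'Commune Y', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = insert_orphan()

# Deleting the parent removes its children through the cascading delete rule.
conn.execute("DELETE FROM region WHERE region_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM commune").fetchone()[0]
```

Because the rule lives in the database, it applies to every client, including third party tools that know nothing about the relationship class.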
2.1.3.2 Relationship Rules
Relationship rules control how parent records relate to child records. These rules can be validated
with ESRI ArcMap (but are not enforced). They are not enforced at database level at all and there
is no easy implementation at database level. Therefore, avoid the use of Relationship rules.
2.1.3.3 Domains
A domain is a declaration of acceptable attribute values. This restriction is not enforced at
database level, allowing third party tools to bypass the rule.
If needed to maintain database integrity:
• For a range domain, create a database check constraint.
• For a coded value domain, create a lookup table with a referential integrity constraint at
database level.
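Both kinds of constraint can be sketched in a few lines. As before, SQLite stands in for the Oracle database, and the table names, column names and the permitted range are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the lookup

# Coded value domain: a lookup table holding the permitted codes.
conn.execute("CREATE TABLE road_class (code INTEGER PRIMARY KEY, description TEXT NOT NULL)")
conn.executemany("INSERT INTO road_class VALUES (?, ?)",
                 [(1, "Motorway"), (2, "Main road")])

conn.execute("""
    CREATE TABLE road (
        road_id    INTEGER PRIMARY KEY,
        lanes      INTEGER NOT NULL CHECK (lanes BETWEEN 1 AND 8),  -- range domain
        class_code INTEGER NOT NULL REFERENCES road_class (code)    -- coded value domain
    )
""")


def try_insert(row) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO road VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, `try_insert((1, 2, 1))` is accepted, while a row with 12 lanes or with an unknown class code is rejected by the database itself.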
2.1.3.4 Subtypes
Subtypes provide a way to group features of one feature class into subsets using values in an
attribute. An integer value is added as attribute, and the geodatabase can translate this value to a
meaningful text. For third party products, a lookup table should be provided for translating the
integers.
2.2 Some tips and tricks to make the dataflow more fluent.
The paragraphs below outline some basic tips to help make the dataflow to GISCO more fluent.
2.2.1 Delivering data in (personal) geodatabase
• provide UML;
• if possible, provide SDE-export files with a specification of the character set used;
• if none of the above is possible, describe the relations between the data;
• never create relationships on the OBJECTID of feature classes!
2.2.2 Use GISCO standard database projection, extent and precision.
Use the same spatial reference parameters as the GISCO database for setting up the
geodatabase (personal or SDE) feature datasets and feature classes. This way rounding errors
are avoided when re-projecting or changing to a different precision.
Figure 8: GISCO Database spatial reference properties
2.3 Experienced difficulties in loading data into the new GISCO
structure.
The paragraphs below describe some of the common difficulties that have already been
discovered when loading data into the new GISCO Reference Database structure, and solutions to
help GISCO avoid such problems.
2.3.1 Delivering in shapefile format
Problem:
ArcSDE applies shape validation rules to any shape to be stored in the SDE database, while native
shapefiles do not enforce these rules. SDE will reject shapes that do not conform to the SDE
shape validation rules, which results in incomplete loading of data.
Solution:
Be sure to deliver a ‘clean shapefile’. Use the ‘Check Geometry’ tool in the ‘Features’ toolset of the
‘Data Management Tools’ of the ArcGIS toolbox to find feature geometry problems and correct
them.
More Information:
ArcSDE9.0 Developer help -> Getting Started -> Geometry
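Before delivery, a few basic conditions can also be pre-checked with a short script. The checks below are common polygon validity conditions chosen for illustration; they are an assumption and not the complete ArcSDE shape validation rule set, so the ‘Check Geometry’ tool remains the reference. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(ring: List[Point]) -> float:
    """Shoelace formula; works for rings stored closed or open."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)


def ring_problems(ring: List[Point]) -> List[str]:
    """Return a list of problems found in one polygon ring (empty if none)."""
    problems = []
    if len(ring) < 4:
        problems.append("fewer than 4 vertices")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    if any(a == b for a, b in zip(ring, ring[1:])):
        problems.append("repeated consecutive vertex")
    if len(ring) >= 4 and signed_area(ring) == 0:
        problems.append("zero area")
    return problems
```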
2.3.2 Problems with Charactersets
Problem:
GISCO SDE is using an Oracle database created with character set UTF8 to store multiple
European character sets. ArcSDE supports Unicode, but with some limitations. CHAR and
VARCHAR can store characters in UTF8, but only one language can be loaded or displayed at
once. Thus, you cannot load (or display) different character sets at the same time.
Solution:
To load or display different languages, you need to set the appropriate NLS_LANG value for each
language. For datasets delivered in sdeexport format, specify the NLS_LANG setting for each
dataset, so it can be applied when using sdeimport.
Footnote 4: ArcGIS assumes equal resolutions for the X and Y axes. Therefore, the maximum extent of the
two axes defines the spatial resolution of the dataset. As a side effect, ArcGIS automatically extends the
upper limit of one axis to the maximum extent of the dataset.
Full Unicode support is available in the personal geodatabase. However, when loading the
personal geodatabase into the SDE database, the correct NLS_LANG must be set. Therefore, if
different language sets are loaded in the personal geodatabase, provide information on which
subsets need to be loaded with which NLS_LANG settings.
More Information:
ESRI knowledge base: Article ID 27341
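The per-dataset NLS_LANG information could be delivered as a small manifest and applied when each dataset is loaded. The sketch below is an assumption about how this might be organised: the dataset names and the manifest itself are hypothetical, and only the general Oracle form LANGUAGE_TERRITORY.CHARACTERSET is checked.

```python
import os
import re

# General shape of an Oracle NLS_LANG value: LANGUAGE_TERRITORY.CHARACTERSET
NLS_LANG_PATTERN = re.compile(r"^[A-Z ]+_[A-Z ]+\.[A-Z0-9]+$")

# Hypothetical delivery manifest: dataset name -> NLS_LANG to use when loading it.
manifest = {
    "names_de": "GERMAN_GERMANY.WE8ISO8859P1",
    "names_el": "GREEK_GREECE.EL8ISO8859P7",
}


def environment_for(dataset: str) -> dict:
    """Return a copy of the current environment with NLS_LANG set for one dataset."""
    nls_lang = manifest[dataset]
    if not NLS_LANG_PATTERN.match(nls_lang):
        raise ValueError(f"malformed NLS_LANG for {dataset}: {nls_lang}")
    env = dict(os.environ)
    env["NLS_LANG"] = nls_lang
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when the loading command is started for that dataset; the command line itself is deliberately left out here.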
2.4 Making GISCO Maps
2.4.1 Introduction
2.4.1.1 Overview
This section describes good practices for making thematic maps based on the GISCO Reference
Database. GISCO aims to promote the appropriate use of maps for visualizing statistics but also to
provide non-experts with some basic guidelines on designing thematic statistical maps and how to
avoid the most common errors. The chapter is complemented by a more fundamental description of
cartographic mapping principles that can be found at http://www.gisco.eurostat.cec/mappingguide.
The PDF version of the Desktop Mapping guide can be downloaded at
http://www.gisco.eurostat.cec/gisco/cfm/reports_en.cfm.
2.4.1.2 Purpose of this chapter
This chapter is intended to give some principles on cartographic mapping and to introduce the
GISCO mapping tool. The mapping tool allows statistical maps to be created within ArcMap in a
semi-automated way.
2.4.2 Why do we use maps?
Maps are a great way of displaying statistical data.
They can present complex data clearly and compactly.
They can be a great help in spotting patterns within data.
They are accessible:
• people understand maps (or at least think they do)
• people like maps
• maps attract attention and brighten up presentation
But maps of statistics do present a number of problems.
A map always generalises and simplifies information.
Maps can end up as decoration: unless you are careful, the appearance of the map
can sometimes become more important than its value and validity for presenting statistics.
Information on a map is always interpreted information. Maps can mislead as well as provide
useful information. Bad design can provide completely the wrong impression of the data. There is
always the risk of unintentionally lying with maps.
Avoiding these problems and making sure that maps inform the reader, release new information
from the data and present the statistics in a valid way isn't difficult. It does, however, require that you
are logical, careful and think hard about what you are doing.
Economic, social and natural actions and phenomena all have a spatial component. By coupling
statistical information with geographical territories we enhance the effectiveness with which they
are presented or analysed.
2.4.4 What is a thematic map?
In the past, map production was rather exclusive, but today everyone with a PC and mapping
software can produce maps. We then use the map to communicate with other people, and we want
them to receive the message the way it was meant to be received.
We can distinguish between different types of maps: topographic, technical and thematic maps.
A road map or survey map is a good example of a topographic map. Technical maps are those you
receive from the technical division in the commune, they describe the border of your site, and
where to find technical equipment on your site.
In this guide we refer only to thematic maps. A thematic map connects non-geographical
data sets (e.g. economic, social, demographic or traffic data) that carry an indirect geographical
reference (e.g. region code, commune code, road number) to the map. This could be a starting point
for further analyses, where the producer and/or the reader want to increase insight into the data set
during a cartographic presentation.
Figure 9: Thematic maps
Thematic maps take their bases from existing topographic maps but they are distinguished further
by the subject matter which usually is not the physical earth or locations upon it. The subject may
be some distillation of physical phenomena, such as average annual temperature or precipitation
values. Commonly, though, the subjects mapped are both abstract and non-physical, like crude
birth rate per thousand inhabitants.
The concern of thematic mapping is for a sound presentation of the essence of some distribution.
We consider a thematic map as the primary component of any spatial analysis, presenting
statistical information on "how much" or "how many", but also "where" a phenomenon occurs.
A strong sense of "visual logic" is vital, and a knack for choosing the right words to accompany the
graphics is equally important. (Thematic Maps: Their Design and Production, by David J. Cuff and
Mark T. Mattson)
2.4.5 Introduction to mapping concepts
In order to create a complete map, several important mapping concepts should be followed, such
as:
• map features
• map characteristics
• structure of a thematic map
Besides the choice of the right symbol (point, line, area) to describe the theme analysed, the
map characteristics (projection, scale, …) are essential, as they form the base elements of a
map.
As far as the structure of a map is concerned, certain elements such as title, legend, … are
absolutely necessary to create a clear, elaborate map.
All these ingredients of a successful map are explained in more detail in the following
paragraphs.
2.4.5.1 Map features
The information conveyed by a map is represented graphically as a set of map components.
Location information is usually represented by points, lines and areas.
Point feature:
A point feature is represented by a single location. It defines a map object of which the boundary or
shape is too small to show as a line or area feature.
Line feature:
A line feature is a set of connected, ordered coordinates. It represents the linear shape of a map
object that may be too narrow to be displayed as an area, such as a road, or a feature that has no
width, such as a contour line.
Area feature:
An area feature is a closed figure whose boundary encloses a homogeneous area, such as a state,
country, soil type or lake.
2.4.5.2 Map characteristics
Map projection:
The map projection is the location framework of a thematic map. It is a systematic arrangement of
the earth’s meridians and parallels onto a plane surface. There are different types of projections,
but each inevitably generates some distortion of area, distance, shape and direction.
No transformation process can eliminate all these distortions simultaneously.
So the user has to select the most appropriate projection depending on the map’s
message.
Map scale:
Map scale is the extent of reduction required to display a portion of the Earth’s surface on a map. It
can be expressed as a representative fraction, which is a ratio of the distance on the map page to
distance on the ground. Larger scale maps show features in greater detail but represent less area.
Smaller-scale maps show larger area but represent less detail. It is important to remember that
only maps of the same scale should be used as overlays. Maps in different scales serve different
needs. A map at 1:3.000.000 should by no means be used to depict the location of wells in
a region. However, it is adequate for presenting the ports of France on an A0 format poster.
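The representative fraction can be turned into a simple conversion between map and ground distances. The sketch below is an illustration only, with hypothetical function names.

```python
def ground_distance_m(map_distance_mm: float, scale_denominator: float) -> float:
    """Ground distance in metres for a distance measured on the map in millimetres."""
    return map_distance_mm * scale_denominator / 1000.0


def map_distance_mm(ground_m: float, scale_denominator: float) -> float:
    """Distance on the map in millimetres for a ground distance in metres."""
    return ground_m * 1000.0 / scale_denominator
```

At 1:3.000.000 one millimetre on the map corresponds to 3 km on the ground, so individual wells cannot be told apart, whereas at 1:50 000 one millimetre corresponds to 50 m.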
Map resolution:
The resolution of a map is the accuracy with which the location and shape of map features can be
depicted for a given map scale. Scale affects resolution. In a larger-scale map, the resolution of
features more closely matches real-world features because the extent of reduction from ground to
map is less. When using a map we should always think about the scale and resolution.
Other map characteristics are the map accuracy and the map extent.
2.4.5.3 Structure of a thematic map
A complete map contains 5 elements: Title, legend, scale, textual information, and the actual map.
A map should be as self-explanatory as possible, so that a reader immediately sees what the map
is all about without consulting the legend (e.g. for quantitative data). This is obtainable if we follow
the visual rules according to the thematic information and the used visual variables.
Title:
The title should identify which theme variables are involved, what the map is all about. Very often
we need a long title, and in this case we should use a short main title and a subtitle. The subtitle
should contain information about the area the map covers and in the case of statistical information
the reference period of the data. An indication of the NUTS level showing the breakdown of the
regional data is obligatory.
Legend:
The legend should identify each of the theme variables used in the map as well as which visual
variable corresponds to which theme variable. In simple words the variable which has been used
for mapping should be explicitly stated and not mixed with the "Title" of the map, i.e. what does the
line or point show? What is the difference between the blue and the red line? The unit of
measurement of the variable is obligatory.
Scale:
The scale is one of the most important elements on a map, and scale selection has important
consequences for the map’s appearance and its potential as a communication device. On the map
itself, however, use a graphic scale bar rather than a numerical scale (1:50 000), because after any
reduction or enlargement of the map a numerical scale no longer corresponds to reality.
Textual information:
This could be subtitles or footnotes connected to the map as well as a declaration of the statistical
and geographical data sources, the date of production of the map itself and the geographic
orientation (N). Orientation need not always be shown by an arrow; you can also use the
graticule (parallels and meridians) or grid ticks. Statistical data that come from different sources,
are estimates, or have a different reference period should be explicitly declared. Any exceptions to
the NUTS nomenclature should be mentioned.
The actual map:
This is the thematic map produced from geographical information (e.g. NUTS boundaries,
commune boundaries etc.) and statistical information for the NUTS regions, communes etc.
2.4.6 Creating a statistical map using the ‘Mapping Tool’
2.4.6.1 Introduction
The “Mapping Tool” software has been created for GISCO for the production of statistical maps
based on NUTS regions from within the ArcMap environment. The tool can produce
one-off single maps when run in wizard mode. Alternatively, the software can be
run in batch mode with map spec files that define all the parameters used in creating the
maps.
A map spec file is also useful as a record of the maps created and enables users
to easily recreate maps to the same or an adjusted spec.
Software takes external input from:
• Pre-created ArcMap template files in which the NUTS data source is referenced.
• Statistical data in a suitable tabular format (txt files, csv files, dbf, or Personal geodatabase tables)
• Map spec files that define all the map parameters.
Output from the software is:
• ArcMap documents. These can be used in the future to recreate the mapping and export to
a suitable export format.
• Maps exported to a defined export format (EPS, JPEG, TIFF, PNG or PDF).
Figure 10: Example of map production process
The software is supplied in the form of an executable file (*.dll). This software can then be added to
the ArcMap environment as a custom button. For full instructions on how to use the ‘Mapping Tool’
(including installation), the Mapping Tool user guide should be obtained. An overview of the map
making process is described below, together with an example of the map output.
2.4.6.2 Map Making Process
The wizard allows the user to join statistical data to NUTS regions, and display the data in a variety
of ways by giving the user a number of options. The software uses predefined ArcMap templates
(*.mxt), which define the page layout and the source data, as the basis for creating the maps.
Templates can be found at ….
The software allows the use of pre-created files in which the map parameters can be stored, read
and used in the map creation process. This allows maps to be recreated exactly and processed as a
batch, and also provides an easier way of specifying the parameters.
The software can also read a style file of pre-defined styles to define the colour scheme for the
statistical mapping. The software will define the colour scheme as a “shade set”, which is in effect
a range of colours which are defined by ArcGIS as “Color Ramps”.
On running the software, the user has 3 choices:
• Create using wizard - This is the default option.
• Create using a map spec file - This enables the user to create a map to a previously
created specification, by loading a map spec file (*.msf) that defines all the map
parameters.
• Process as a batch - Batch process the map production by selecting multiple map spec
files.
The user can then run through a number of processes, giving options to customise the map display
and then create the map output:
1. Page Layout - Stored templates can be selected.
2. Language - It is necessary to specify a language for the maps. This is the language that
the standard labels (e.g. the copyright clause) will be printed in. The options in this dropdown
list are read from the standard labels table, so other languages can be added where
necessary.
3. Statistical Dataset - The user can navigate to the desired statistical data source. This data
source can be a comma or tab separated text file, a dbf file or a geodatabase feature class.
4. Join field - The ‘Join field’ is used to join the statistical data to the NUTS regions.
5. Map Number - This is a map reference to be entered by the user and is used to uniquely
identify the map. It need not actually be a numeric value.
6. Symbology - The main dialog window displays a brief description of the method that
the map production process will use to display the statistical data. This method can
be amended in a number of ways including: amending the symbology category, method of
classification, values to exclude from the classification and definition of the symbology.
7. Shade Sets - It is necessary to define how the statistical data is displayed (i.e. the fill
colour of each class). This is accomplished by defining a shade set to use. A “Shade set”
in this instance describes what ArcGIS defines as a “Color Ramp”, which is simply an
ordered set of predefined colours or graduated range of colours. This can then be applied
to a statistical classification of data.
8. Export the map - Define if the map is to be exported to any of the supported formats as
part of the map creation process.
9. Map Labels - Define labels to print to the map legend.
10. Other options - The user has the option to save the parameters as input on the dialog to a
map spec file, and to choose a filename.
11. Run the map creation process.
Figure 11: An example of the map output
3 Geographic Guidelines: Data Quality Section
3.1 Quality Assurance Principles
As the sources and amounts of digital spatial data increase, it becomes increasingly important to
enable the integration of heterogeneous data environments. One way to tackle this challenge is to
agree upon data quality standards. The availability of standards has many advantages to the data
collector, processor and user, especially when many different sources are used.
The GISCO Reference environment is a particularly heterogeneous data environment. The reason
for this is that GISCO does not create spatial data itself, nor does it ‘order’ data based on set
specifications. Until now, GISCO has used ‘the best sources available’ in order to create a
Europe-wide database. This means that for a particular data request, GISCO looks for suitable data already
available on the European market, in the first place in the public domain. As these data sets
already exist, GISCO has little influence on the data quality. GISCO can only evaluate and assess
the proposed data sets and then decide if a given data set is fit for integration in the GISCO
Reference environment.
Data sets that are proposed for integration in the GISCO database have to be assessed on their
quality before any further processing is carried out. The reason for this is to ensure that no
unnecessary work is done to process data that are not fit for integration. The supplied data sets
have to pass controls made on their overall quality, for example:
• is the source reliable?
• will necessary support be supplied?
• are updates foreseen?
• are data consistent and complete?
• is the coding consistent?
• does the data set cover the necessary geographical extent?
This section discusses different quality elements that should be adhered to before GIS data should
be used with, or included into the GISCO Reference Database.
Data quality information will need to be included in the metadata when data is submitted for use
within the GISCO Reference Database. Metadata provide information on data. They can be used
for searching and accessing related data. GISCO has developed a metadata profile based on ISO
19115 definition for geographic metadata for use within the Commission. Although some metadata
elements such as the dataset extent can be created automatically in software such as ArcGIS,
data quality elements such as scale and resolution should be recorded manually. Section 1.2 of
this document shows how metadata conforming to the GISCO metadata profile can be created
using a metadata editor as part of ArcCatalog.
3.2 Geometric quality
3.2.1 Scale and Resolution
3.2.1.1 Map Scale
Map scale specifies the amount of reduction between the real world and its graphic representation.
It is usually expressed as a ratio (e.g. 1:1.000.000), or equivalence (e.g. 1 cm = 1.000.000 cm).
When a map is printed it always has a fixed scale.
3.2.1.2 Display Scale
Even though a GIS allows you to zoom in on a map, the display scale should be the scale at which
the map ‘looks right’. The display scale influences two things about a map:
• The amount of detail. If a very detailed map is displayed at a small scale it can become
overwhelmed with detail and too crowded. Conversely, when a small scale map is
shown at a large scale, it will look over-generalised.
• The size and placement of text and symbols. These must be sized to be readable at the display
scale, and placed so that they do not overlap each other.
It must be remembered, however, that when the smaller scale of the range is used
(e.g. 1:50 000 000), areas of detail might be merged into black blobs.
If the map will be used only to illustrate a certain trend (e.g. statistical data), where the detail
of the geographical features is less important, it can be acceptable to make a map at a larger
scale. Therefore, the actual scale at which a particular data set is displayed or printed depends
greatly on the purpose of the map.
A GIS map’s annotation (text and symbols) must be designed for a certain display scale. If not, the
annotation will not look right compared to the displayed map.
3.2.1.3 Data Resolution
Data resolution is the smallest difference between adjacent positions that can be recorded and it
also limits the minimum size of features that can be stored. The resolution depends on the
method used to obtain the digital data and is tied to the map scale. A typical minimum distance
between co-ordinates that can be captured from a paper map is about ⅓ millimetre. The maximum
distance between co-ordinates should be 2 millimetres.
The table below gives the resolutions corresponding to a certain nominal scale. The resolutions
should not be taken as absolute figures, but be used as indications of what resolution ranges
should be expected. The maximum distance between co-ordinates is given in the third column.
More complex data need more co-ordinates. Therefore, for a complex data set such as a soil map, more
vertices are needed than for a data set such as a parcel map. The appreciation of the quality of a
data set thus remains subject to the expertise of the evaluator.
Table 15: Typical resolution and maximal distance between co-ordinates
Scale           Resolution in meters            Max distance in meters
                (= 0.0003 m * scale factor)     (= 0.002 m * scale factor)
1:100.000       30                              200
1:1.000.000     300                             2 000
1:3.000.000     900                             4 500
1:10.000.000    3 000                           20 000
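The two column formulas of Table 15 can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the results are indicative figures in the sense described above, not absolute limits.

```python
def resolution_m(scale_factor: int) -> float:
    """Typical resolution in meters: 0.3 mm on the map (0.0003 m * scale factor)."""
    # Written as 3 / 10000 so that round scale factors give exact results.
    return scale_factor * 3 / 10000


def max_vertex_distance_m(scale_factor: int) -> float:
    """Maximum distance between co-ordinates in meters: 2 mm on the map."""
    return scale_factor * 2 / 1000


for scale_factor in (100_000, 1_000_000, 10_000_000):
    print(f"1:{scale_factor}", resolution_m(scale_factor), max_vertex_distance_m(scale_factor))
```

A data set captured at 1:100 000 would therefore be expected to show vertex spacings between roughly 30 m and 200 m.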
Figure 12: Resolution sample testing on a map with a scale of 1:100 000
[Figure labels: ~ 30 m, Vertex, Node]
3.2.2 Positional Accuracy
Even if the resolution is of the required quality, the accuracy of the map might not match it. The
geometric accuracy refers to the degree to which information on a map or in a digital database
matches true or accepted values. Since digital geographic data is an attempt to model and
describe the real world, no map will ever be completely accurate. The accuracy depends on the
way the data was created.
The level of accuracy varies greatly with each data set. Distances between 0.3 - 1 mm are the
smallest distances that can be measured on a map. The distance depends on factors such as line
thickness. Accuracy can be expressed in map units (expressed in cm or mm) or real world units
(expressed in meters). The conversion between the two is done through the scale.
If 1 millimetre is said to be the accuracy for a certain data set, this means that any feature on a map
may be displaced by up to 1 mm compared to the reference material. This assumes that the digital
map has been printed at the appropriate scale and in the same projection system as the reference
material. The actual displacement (in meters) of features compared to the reference material used
depends on the map scale.
The maximum displacement allowed is 1 mm, which corresponds to the distances in the table
below.
Table 16: Displacements when an accuracy of maximum 1 mm is accepted
Scale           Max displacement in meters
                (0.001 m * scale factor)
1:100 000       100
1:1 000 000     1000
1:3 000 000     3000
1:10 000 000    10000
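As a rough sketch of how the 1 mm rule could be applied to a single sample point, the snippet below converts the tolerance to meters and compares it with the measured displacement. It assumes both points are in meters in the same projection; the function names are hypothetical.

```python
import math


def max_displacement_m(scale_factor: int, tolerance_mm: float = 1.0) -> float:
    """Allowed displacement in meters for a map tolerance given in millimetres."""
    return tolerance_mm * scale_factor / 1000


def within_accuracy(point, reference, scale_factor, tolerance_mm=1.0) -> bool:
    """True if the point lies within the allowed displacement of its reference position."""
    displacement = math.hypot(point[0] - reference[0], point[1] - reference[1])
    return displacement <= max_displacement_m(scale_factor, tolerance_mm)


# A feature displaced by 500 m passes at 1:1 000 000 but fails at 1:100 000.
print(within_accuracy((0, 0), (300, 400), 1_000_000))
print(within_accuracy((0, 0), (300, 400), 100_000))
```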
In order to control the accuracy, the data set should be compared to reference material of
appropriate scale. The difficulty within GISCO is that the maps used as source material for the
assessed data set are not available. In order to be able to measure accuracy, it is proposed to use
topographical maps as reference material, or maps of a similar quality. If the original maps are not
used, the measured accuracy might be different from the one given by the data providers.
Moreover, when maps of the same scale as those from which a data set is derived are not
available, reference maps of a more detailed scale must be used. It then becomes more difficult to
give a statement on accuracy.
Furthermore, differences might be expected due to data manipulation such as edge-matching and
projection conversions.
3.3 Topological quality
Topological quality is measured for vector data sets only. Topology defines the spatial
relationships between the features in a vector data set. To have a correct topology, some basic
quality measures must be fulfilled.
3.3.1 Dangle Nodes
Polygon datasets or line networks must be checked for dangle node errors. These are essentially end
point nodes which represent an overshoot or undershoot of an intersecting line.
Figure 13: A Dangle node
In a network, each dangle node should be snapped to the line it intersects. Where this intersection
occurs, the intersecting lines should be split, so the eventual set has three lines, all with coincident
end points.
Figure 14: Cleaning linework
Dangle nodes should normally not be present in a polygon coverage, as these can be an indication
of non-closed polygons.
Figure 15: Closing polygons
[Figure label: Dangle nodes]
There are however situations when they can be present without representing an error. This can for
example occur for so-called ‘cul-de-sacs’. Since it becomes difficult to evaluate which dangle
nodes are errors and which are not it is strongly recommended not to have dangle nodes in a
polygon coverage.
Figure 16: ‘Cul-de-sac’ dangle nodes
[Figure label: Dangle node]
Dangle nodes can be traced with GIS software that supports topology.
For line coverages that are said to contain arc/node topology it is important to verify this. The
topological data structure is used to represent connectivity between arcs and nodes. The
relationship is built up in the attribute table by specifying the ‘from’ and the ‘to’ node for the
different arcs.
Figure 17: Arc/node connectivity
Electricity line    From node    To node
1                   1            4
2                   4            3
3                   3            2
4                   2            1
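The from/to node table lends itself to a simple check for dangle node candidates: a node that is the end point of exactly one arc. The sketch below is illustrative only; the function name and the dictionary layout are assumptions, and flagged nodes still need review because cul-de-sacs can be legitimate.

```python
from collections import Counter


def dangle_nodes(arcs):
    """Return the nodes that terminate exactly one arc.

    arcs maps an arc id to its (from_node, to_node) pair.
    """
    degree = Counter()
    for from_node, to_node in arcs.values():
        degree[from_node] += 1
        degree[to_node] += 1
    return sorted(node for node, count in degree.items() if count == 1)


# Arc/node table in the style of Figure 17: the four arcs form a closed ring.
ring = {1: (1, 4), 2: (4, 3), 3: (3, 2), 4: (2, 1)}
print(dangle_nodes(ring))

# Dropping arc 4 leaves nodes 1 and 2 as loose ends.
open_line = {arc: nodes for arc, nodes in ring.items() if arc != 4}
print(dangle_nodes(open_line))
```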
3.3.2 Polygon Topology
All polygons should be labelled and a polygon should have one and only one label.
Figure 18: Polygon Labels
[Figure labels: A5433, A5432, A5434]
Polygons without a label can either be sliver polygons or polygons that are not correctly labelled. Sliver
polygons are small unwanted polygons that are created when polygons are created from non-coincident
lines. No sliver polygons should be present. Sliver polygons often occur when two or more
coverages are overlaid.
Figure 19: ‘Sliver’ polygons
[Figure labels: Coastline, Sliver Polygons, Land Use, Soils]
The presence of sliver polygons in the data set can often be detected through a control of the polygon
size. Very small polygons compared to the average polygon size are likely to be sliver polygons.
No dangle nodes are allowed to be present.
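The size control for sliver polygons can be sketched as follows. The area is computed with the shoelace formula, and the cut-off of 1% of the mean area is an arbitrary assumption for illustration, as are the function names; a suitable ratio depends on the data set.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def sliver_candidates(polygons, ratio=0.01):
    """Return ids of polygons whose area is below ratio * mean polygon area."""
    areas = {pid: polygon_area(ring) for pid, ring in polygons.items()}
    mean_area = sum(areas.values()) / len(areas)
    return sorted(pid for pid, area in areas.items() if area < ratio * mean_area)


polygons = {
    "A": [(0, 0), (100, 0), (100, 100), (0, 100)],
    "B": [(100, 0), (200, 0), (200, 100), (100, 100)],
    "S": [(0, 100), (100, 100), (50, 100.5)],  # thin triangle along a shared edge
}
print(sliver_candidates(polygons))
```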
3.4 Attribute Consistency
The attribute data in the data set have to be assessed on the thematic content.
Taking into account the number of different themes covered in the GISCO database, the wide EU and
pan-European coverage, the different definitions and nomenclatures and the limited expertise in a
number of these themes, it is not possible to make an exhaustive assessment of the quality of the
contents of a data set.
There are two general characteristics, consistency and completeness, which are discussed in the
paragraphs below. There are three different types of attribute data. These are limited code lists,
unlimited code lists and continuous values. Depending on the type of attribute the following controls
are carried out:
3.4.1 Limited code lists
For data sets with limited code lists, such as land use, soil nomenclature and road types, it has to be
controlled if each feature is coded with one of the values in the code list.
A consistent data set implies that the coding is exhaustive and no other codes than what is available
in the legend can be coded in the attribute table.
The result of the assessment is a statement of whether each feature in the data set is coded according to the
values in the code list. If this is not the case, it does not mean that the data set is refused for
integration. For consistency, errors could be corrected if reference material is available. Whether this is
worth doing will be decided after the overall quality assessment.
Figure 20: Example code list. The polygons that are coded with 127, 128 and 135 are not correctly
coded.
[Figure: polygons labelled 123, 126, 124, 125, 127, 128, 135]
Code    Definition
123     Densely populated area
124     Intermediately populated area
125     Sparsely populated area
126     No data
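The control for a limited code list amounts to a membership test. The sketch below uses the code list of Figure 20; the feature identifiers (p1 to p7) and the function name are hypothetical.

```python
CODE_LIST = {
    123: "Densely populated area",
    124: "Intermediately populated area",
    125: "Sparsely populated area",
    126: "No data",
}


def invalid_codes(features, code_list):
    """Return the features whose code does not appear in the code list."""
    return {fid: code for fid, code in features.items() if code not in code_list}


# Polygon codes as shown in Figure 20, keyed by made-up feature ids.
features = {"p1": 123, "p2": 126, "p3": 124, "p4": 125, "p5": 127, "p6": 128, "p7": 135}
print(invalid_codes(features, CODE_LIST))
```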
3.4.2 Unlimited code lists
For unlimited code lists, for example settlement lists and administrative region lists, it is difficult to
control that all data actually are included. One way to control this is to compare the data set with
suitable reference material. Furthermore, it can be verified that all geographic objects have a related
record in the attribute table. A map with the codes can be printed out and given to an expert or group
of experts who can give a statement of the quality.
If exhaustive reference material is not available, it is not possible to give an absolute statement about
the quality. A qualitative statement can be given.
Table 17: Code list; all the NUTS 1 regions have a related record in the attribute table.
Nuts region name (NUTS 1)
NORD-PAS-DE-CALAIS
ILE DE FRANCE
BASSIN PARISIEN
…
…
…
…
…
…
…
…
3.4.3 Continuous values
Continuous values, for example altitude and population number, can to a certain extent be controlled
by some statistical checks:
• control of the maximum and minimum values for the data set. In that way outliers can be detected and
controlled. If these values are unreasonable, for example if the maximum population number in a
Belgian settlement is 10 million, the thematic content of the data set is not correct.
• control of the sum, e.g. the sum of population figures in different communes. These figures should
then be comparable with the corresponding population figures for the country.
• control of the rank distribution, e.g. rank the settlements according to their size and verify that
the order is as expected.
Furthermore, the quality of the continuous values can be assessed through sample checking. The
attribute data should then be compared to reference material. This work should be done by experts in
the field.
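The three statistical checks can be sketched as small helper functions. Everything here is illustrative: the function names, the 1% tolerance on the sum and the population figures are invented for the example and do not come from any real data set.

```python
def out_of_range(values, low, high):
    """Return the records whose value falls outside the plausible range."""
    return {key: value for key, value in values.items() if not low <= value <= high}


def sum_matches(values, expected_total, rel_tol=0.01):
    """True if the sum of the values is within rel_tol of the expected total."""
    return abs(sum(values.values()) - expected_total) <= rel_tol * expected_total


def rank_order(values):
    """Return the record keys ordered from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)


# Invented population figures for three settlements.
population = {"A": 120_000, "B": 45_000, "C": 10_000_000}
print(out_of_range(population, 0, 2_000_000))  # flags the implausible value
print(rank_order(population))                  # order to compare with expectations
```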
3.5 Topological Consistency
3.5.1 Relationship between datasets
If a thematic dataset is based on the GISCO Reference Database, then there must be topological
consistency between the thematic dataset and the GISCO Reference Database.
For instance, if the thematic dataset uses NUTS regions, then these must overlay the NUTS data
exactly. There should be no sliver polygons if the two were to be merged together. Similarly,
coastline should be consistent between any thematic datasets and the GISCO Reference Database.
Note that the NUTS regions and coastline are already consistent within the database.
You may have a point layer that should be consistent with a polygon layer. For a basic example,
every country should have one capital city, but not more than one.
Basic spatial overlay tests can be performed in ArcMap and other GIS packages. For data intended
for the GISCO Reference Database, a more advanced script has been written to test the quality of the
topological consistency as part of the quality control procedures.
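The capital city example can be sketched as a point-in-polygon count. This is not the GISCO quality control script; it is a minimal illustration using a standard ray-casting test on simple polygons without holes, with hypothetical function names and made-up geometry.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the polygon ring."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def capitals_per_country(countries, capitals):
    """Count the capital points that fall inside each country polygon."""
    return {
        name: sum(point_in_polygon(p, ring) for p in capitals)
        for name, ring in countries.items()
    }


countries = {
    "X": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Y": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
counts = capitals_per_country(countries, [(5, 5), (6, 6)])
print({name: n for name, n in counts.items() if n != 1})  # countries failing the rule
```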
3.6 Completeness
3.6.1 Geographic completeness
Completeness measures the amount of spatial features included in a digital data set as a result of
data input and data conversion. Completeness of the data should be evaluated for the geographic
extent and for the geographic objects.
3.6.1.1 Geographic extent
The geographic extent should correspond to the metadata information. It is particularly important to
control that all parts of a country’s territory are available in the data set. Special attention should be
drawn to the presence of data on…
• the French DOMs: Reunion, Martinique, Guadeloupe, Guyane (France);
• the Canarias (Spain);
• Madeira and Açores (Portugal);
…because they are often forgotten.
3.6.1.2 Geographic objects
Geographic features that are included in the data set are mentioned and explained in the metadata.
Normally, the feature density is scale dependent since a larger scale can include more features
without losing its clarity.
For each data set the actual feature content has to correspond with the metadata information. For
example, if a data set is supposed to include all settlements with a population number of more than
100.000 but actually only includes the capitals of each country, then the data set is not complete.
To control the feature content the data set is compared to suitable thematic reference material
according to theme, for example road maps, soil maps rather than to national topographical maps.
3.6.2 Attribute Completeness
The completeness of the attribute tables is also important to control. Firstly, it has to be ensured that
all non-existing values are marked as no data. Secondly, there must not be too many no data
values, since that would make the attribute table incomplete. One way to get an idea of the
completeness is to calculate the ratio between the number of no data records and the total number of
records.
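The ratio can be computed directly from the attribute records. In the sketch below the field name POP, the use of None as the no data marker and the function name are assumptions; a real data set may use a different marker such as -9999.

```python
def no_data_ratio(records, field, no_data=None):
    """Share of records whose value for the field equals the no data marker.

    With the default marker None, a record lacking the field also counts as no data.
    """
    missing = sum(1 for record in records if record.get(field) == no_data)
    return missing / len(records)


records = [{"POP": 100}, {"POP": None}, {"POP": 250}, {"POP": 40}]
print(no_data_ratio(records, "POP"))
```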
3.7 Generalisation
3.7.1 Introduction
Section 1.5 of this document introduces Generalisation in ArcGIS and states parameters used for the
GISCO reference database. This section looks further into the ‘Point Removal’ and ‘Bend Simplify’
Algorithms and includes advice on how to analyse and improve generalisation results.
3.7.2 Point Remove
Point Remove applies a published algorithm with enhancements. It is a fast, simple line simplification
algorithm. It keeps the so-called critical points that depict the essential shape of a line and removes all
other points. The algorithm connects the endpoints of a line with a 'trend line'. The distance of each
vertex to the trend line is measured, perpendicularly. Vertices closer to the line than the tolerance are
eliminated. The line is then divided by the vertex farthest from the trend line, which makes two new
trend lines. The remaining vertices are measured against these lines, and the process continues until
all vertices within the tolerance are eliminated (see the diagram below).
Figure 21: Simplification process
Point Remove is efficient for data compression and for eliminating redundant details; however, the
line that results may contain unpleasant sharp angles and spikes which reduce the cartographic
quality of the line. Use Point Remove for relatively small amounts of data reduction or compression
and when you don't need high cartographic quality.
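The procedure described for Point Remove matches the classic Douglas-Peucker line simplification algorithm. The sketch below implements that plain algorithm under this assumption; it does not reproduce any ESRI enhancements, and the function names are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Perpendicular distance from point p to the trend line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:  # degenerate trend line: fall back to point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def point_remove(line, tolerance):
    """Simplify a list of (x, y) vertices, keeping only the critical points."""
    if len(line) < 3:
        return list(line)
    start, end = line[0], line[-1]
    farthest, max_distance = 0, 0.0
    for i in range(1, len(line) - 1):
        distance = perpendicular_distance(line[i], start, end)
        if distance > max_distance:
            farthest, max_distance = i, distance
    if max_distance <= tolerance:
        return [start, end]  # every intermediate vertex is within tolerance
    # Split at the farthest vertex and simplify both halves against new trend lines.
    left = point_remove(line[: farthest + 1], tolerance)
    right = point_remove(line[farthest:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(point_remove(line, 0.5))
```

In this example the small wiggle at (1, 0.1) is dropped while the spike at (3, 5) is kept, which also illustrates why the result can contain sharp angles.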
3.7.3 Bend Simplify
Bend Simplify applies shape recognition techniques that detect bends, analyse their characteristics,
and eliminate insignificant ones. A linear feature can be seen as composed of a series of bends,
each defined as having the same sign (positive or negative) for the inflection angles at its
consecutive vertices. Several geometrical properties of each bend are compared with those of a
reference half circle, the diameter of which equals the specified simplification tolerance. These
measures determine whether a bend is kept or eliminated, that is, replaced by its baseline
(the line connecting the endpoints of the bend). The simplification takes place iteratively such that the
smaller bends may "disappear" in the early rounds and, therefore, form bigger bends. The resulting
line follows the main shape of the original line more faithfully and shows better cartographic quality
than that from Point Remove.
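Only the first step of this process, splitting a line into bends, is sketched below. A bend is taken to be a run of consecutive vertices that all turn in the same direction, with the turn direction read from the sign of a cross product. The comparison with the reference half circle is not shown, and the function names are hypothetical rather than part of any ESRI interface.

```python
def turn_sign(a, b, c):
    """+1 for a left turn at b, -1 for a right turn, 0 if a, b, c are collinear."""
    cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
    return (cross > 0) - (cross < 0)


def split_into_bends(line):
    """Return (first_vertex, last_vertex, sign) for each run of same-sign turns."""
    bends = []
    first, last, current = None, None, 0
    for i in range(1, len(line) - 1):
        sign = turn_sign(line[i - 1], line[i], line[i + 1])
        if sign == 0:
            continue  # collinear vertices do not start or end a bend
        if sign != current:
            if first is not None:
                bends.append((first, last, current))
            first, current = i, sign
        last = i
    if first is not None:
        bends.append((first, last, current))
    return bends


line = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 0), (5, 1)]
print(split_into_bends(line))
```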
3.7.4 Choosing a suitable tolerance
The tolerance value determines the degree of simplification. To produce cartographic outputs, set the
tolerance equal to or greater than the threshold of separation (the minimum allowable spacing
between graphic elements). Since the tolerance is used for the entire input, trial and error may be
required to find a suitable tolerance for all features. Using the same tolerance, Point Remove
produces a rougher and more simplified result than Bend Simplify.
3.7.5 Analysing and improving the results
When the ‘Resolve Errors’ option is used in the command line or in script, or when the "Resolve
topological errors" checkbox is checked on the dialog, the process will check for topological errors,
line-crossing, coincident lines, or collapsed zero-length lines. If any of these errors are detected after
the first round of simplification, the involved line segments (not the entire lines) will be located and a
reduced tolerance, 50% of the previously used, will be applied to re-simplify these segments. This
iteration will repeat as many times as needed until no more topological errors are found. The output
feature class will contain two new attributes, MaxSimpTol and MinSimpTol, which show the range of
tolerances actually used in simplifying each line. The two fields, MaxSimpTol and MinSimpTol, will not
be added if no errors were found in the process.
If the output feature class contains MaxSimpTol and MinSimpTol fields, you can have an estimate on
how the specified tolerance worked for the data. If the values of the MaxSimpTol and MinSimpTol
fields for the majority of output lines are smaller than the specified tolerance, it means there are many
conflicts found during the process and you might want to lower the tolerance.
Lines whose MaxSimpTol and MinSimpTol values are smaller than the tolerance may
represent a narrow area, such as a narrow, double-line river or two very close boundary lines. In that
case, simplification of the lines may not be the right solution. The narrow features may need
to be represented differently, for example, by single lines.
The tool simplifies lines one by one and the longer a line runs, the more pleasing the result. Keep this
in mind when you collect or construct the source data. Also wherever possible, position endpoints of
lines on long, smooth sections of lines, rather than severely bent sections.
3.8 Data Formats
3.8.1 Popular Vector Formats
There are many vector formats within the GIS industry. As the GISCO Reference Database is based
on ESRI architecture, ESRI formats are generally preferred, although open standard formats such as
GML will also be accepted in the future. The following paragraphs describe the ESRI vector data
formats and GML.
3.8.1.1 Coverage (ESRI)
Coverages have been part of the ESRI ArcInfo software (now part of the ArcGIS suite) since the early
90s, when GIS was in its infancy, but are still a viable format for complex geo-processing and spatial
analysis.
Coverages are a collection of files located in multiple directories. Because of this layout, ArcCatalog
must be used to relocate, copy, rename, delete and reformat the data.
ESRI provides the export format (e00) which allows all spatial and descriptive information for a
coverage to be combined into a single ASCII file. The reverse operation of import recreates the
original coverage from the e00 file with no loss in accuracy or detail or topology.
Coverages can contain feature classes which are classified as either primary, composite or
secondary. Primary features include arcs (lines), nodes, polygons and label points. Composite
features such as routes and regions are built from primary features. Secondary features include
ground-registration TIC marks and annotation.
Multiple feature classes can be contained within the same coverage. For example, line and point and
route and annotation features could all exist in the same coverage.
Perhaps the most useful characteristic of a coverage is the ability to maintain and store topology.
Topology is the spatial relationships between vector features within the data structure. These
relationships include:
• Line Connectivity (i.e. to and from),
• Line Contiguity (i.e. adjacency and direction), and
• Area Definition
Topology is stored very efficiently in the ESRI coverage structure (i.e. no redundant coordinates).
3.8.1.2 Shapefile (ESRI)
Shapefiles were first introduced with ESRI’s GIS software ArcView 2. Although ArcView is now part of
the ArcGIS 9 suite and has changed dramatically, shapefiles can still be viewed, created and edited.
Shapefiles can contain either points, multipoints (multiple points linked to one record), lines or
polygons.
A shapefile stores non-topological geometry and attribute information for the spatial features in a data
set. The geometry for a feature is stored as a shape comprising a set of vector coordinates.
An ESRI shapefile consists of a main file, an index file, and a dBASE table. The main file is a direct
access, variable-record-length file in which each record describes a shape with a list of its vertices. In
the index file, each record contains the offset of the corresponding main file record from the beginning
of the main file. The dBASE table contains feature attributes with one record per feature. The one-to-one
relationship between geometry and attributes is based on record number. Attribute records in the
dBASE file must be in the same order as records in the main file.
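The index file structure described above can be read with a few lines of code. The sketch below relies on details taken from the ESRI Shapefile Technical Description rather than from this document: a 100-byte header starting with the file code 9994, followed by 8-byte records holding the offset and content length as big-endian 32-bit integers counted in 16-bit words. The function name is hypothetical.

```python
import struct


def read_shx_records(data: bytes):
    """Return (offset_bytes, content_length_bytes) for each record in a .shx index."""
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile index: unexpected file code")
    records = []
    for pos in range(100, len(data) - 7, 8):  # records start after the 100-byte header
        offset_words, length_words = struct.unpack(">ii", data[pos:pos + 8])
        records.append((offset_words * 2, length_words * 2))  # 16-bit words to bytes
    return records


# Synthetic index for two point records; a real file would be read with open(path, "rb").
header = struct.pack(">i", 9994) + bytes(96)
index = header + struct.pack(">ii", 50, 10) + struct.pack(">ii", 64, 10)
print(read_shx_records(index))
```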
In ArcView 3.2a onwards, a shapefile can also consist of a projection file (.prj).
Because shapefiles do not have the processing overhead of a topological data structure, they have
advantages over other data sources such as faster drawing speed and editability. Shapefiles handle
single features that overlap or that are non-contiguous. They also typically require less disk space and
are easier to read and write.
3.8.1.3 Geodatabase (ESRI)
The ESRI Geodatabase format was introduced with ArcGIS 8. It is designed to hold all types of
geographical data and support a relational database structure. It can hold vector and raster data. The
types of vector data a Geodatabase supports include lines, points and polygons, as well as
associated annotation. Geodatabases can also store topology.
Each geodatabase can have a number of feature datasets. Feature datasets exist in the geodatabase
to define a scope for a particular spatial reference. All feature classes that participate in topological
relationships with one another (for example, a geometric network or a topology) must have the same
spatial reference. Tables can also be linked to feature classes. As with coverages, ArcInfo is used to
view, create, copy and rename the different elements.
Figure 22: Geodatabase Structure
Geodatabases can be stored as a ‘personal’ Geodatabase or a ‘multi-user’ Geodatabase.
Personal Geodatabase support is built into ArcInfo and provides access to local data. It is directed
towards personal or small work-group use and can handle small to moderately sized datasets. It is
held in Microsoft Access format, although this is invisible to the user. Microsoft Access is not required.
Figure 23: Personal Geodatabase support
A ‘multi-user’ geodatabase (A Geodatabase served through ArcSDE) can manage very large datasets
and serve large numbers of viewers and editors. It can also be ‘versioned’. A versioned Geodatabase
allows editors to work concurrently and includes a framework for resolving edit conflicts.
A Geodatabase served through ArcSDE is supported in ArcIMS.
A multi-user Geodatabase can be stored as Oracle 8, SQL, Informix, DB2 or Sybase.
Figure 24: Multi-user geodatabase support
3.8.1.4 GML (Geography Markup Language)
GML or Geography Markup Language is an XML based encoding standard for geographic information
developed by the Open Geospatial Consortium (OGC), formerly the OpenGIS Consortium.
Like any XML encoding, GML represents geographic information in the form of text. Text has a certain
simplicity and visibility on its side. It is easy to inspect and easy to change. Add XML and it can also
be controlled.
GML encoding already allows for quite complex features. The geometry of a geographic feature can
also be composed of many geometry elements. A geometrically complex feature can consist of a mix
of geometry types including points, line strings and polygons. It can also hold topology and attribute
information.
GML is becoming more and more popular within the GIS industry. Below is an example of GML:
<Feature fid="142" featureType="school" >
<Description>Balmoral Middle School</Description>
<Property Name="NumFloors" type="Integer" value="3"/>
<Property Name="NumStudents" type="Integer" value="987"/>
<Polygon name="extent" srsName="epsg:27354">
<LineString name="extent" srsName="epsg:27354">
<CData>
491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
491908.999999462,5458064.99963358 491924.999999461,5458064.99963358
491925.999999462,5458079.99963359 491977.999999466,5458120.9996336
491953.999999466,5458017.99963357 </CData>
</LineString>
</Polygon>
</Feature>
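Because GML is plain XML, a feature like the one above can be read with a general-purpose XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the example (only the first two coordinate pairs are kept); the variable names are hypothetical, and real GML documents normally use namespaces that this simplified example omits.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example feature: only the first two coordinate pairs.
GML_SNIPPET = """<Feature fid="142" featureType="school">
<Description>Balmoral Middle School</Description>
<Property Name="NumFloors" type="Integer" value="3"/>
<Property Name="NumStudents" type="Integer" value="987"/>
<Polygon name="extent" srsName="epsg:27354">
<LineString name="extent" srsName="epsg:27354">
<CData>
491888.999999459,5458045.99963358 491904.999999458,5458044.99963358
</CData>
</LineString>
</Polygon>
</Feature>"""

feature = ET.fromstring(GML_SNIPPET)

# Attribute information: integer properties keyed by name.
properties = {p.get("Name"): int(p.get("value")) for p in feature.findall("Property")}

# Geometry: whitespace-separated "x,y" pairs inside the CData element.
coord_text = feature.find("Polygon/LineString/CData").text
coords = [tuple(float(v) for v in pair.split(",")) for pair in coord_text.split()]

print(feature.get("fid"), feature.findtext("Description"), properties, coords)
```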
3.8.2 Vector Format Limitations
3.8.2.1 Coverages
Coverage names are limited to a maximum of 13 characters and cannot contain spaces.
The options for naming the features are very limited, which can lead to very cryptic naming
conventions.
Coverages have a number of disadvantages compared with Geodatabases. They do not hold subtypes and
domains, or feature linked annotation.
The relationships in Geodatabase are also more intelligent than those of a coverage.
Coverages can no longer be edited in ArcInfo Desktop 8.3 onwards.
3.8.2.2 Shapefiles
Like coverages and Geodatabases, shapefiles can support point, line and area features. Unlike a
coverage or a Geodatabase however, only a single shape type can be contained per shapefile. More
importantly, a shapefile is a non-topological data structure which can limit spatial analysis since
connectivity and adjacency information is not explicitly recorded.
Shapefiles do not hold relationships.
ESRI does not provide an export format for shapefiles. Instead, ESRI recommends that you
package the shapefile (e.g. using WinZip) for transfers, archiving and internet access.
3.8.2.3 Geodatabases
To edit an advanced Geodatabase feature (e.g. topology), at least an ArcEditor license is required.
A personal Geodatabase holding vector data has a size limit of 2 GB.
Multi-user Geodatabases need commercial database software such as Oracle 8, SQL, Informix, DB2
or Sybase.
3.8.2.4 GML
GML is only now being adopted by the GIS industry. As the file format is text based and
uncompressed, data sizes can be large.
3.8.3 Preferred Vector Formats
Topology and relationships are important when it comes to integrating data within the GISCO
Reference Database. Therefore ESRI export format (e00) created from coverages, or Geodatabases
are preferred formats for vector data delivery. GML will be accepted in the future.
3.8.4 Popular Raster Formats
There are many raster formats available commercially, most of which can be used within the GIS
environment. Below are some of the popular formats that can be georeferenced and viewed in GIS in
their correct geographical location. Using GIS software they can also be projected into different
coordinate systems.
3.8.4.1 TIFF (Tagged Image File Format)
TIFF is a very common format. It can hold colour information at varying bit levels (e.g. a two colour or
1 bit image has a smaller filesize than a 16 colour or 4 bit image). It can have up to 16 million colours
(24 bit). A TIF file can hold colours in a particular order, making it a good format for holding sets of
raster tiles with a consistent palette (e.g. using an ESRI Image Catalogue). This also means that TIF
is a suitable format for holding DEM (Digital Elevation Model) data where each colour can represent a
height value.
You can store TIFF files as uncompressed or compressed (using LZW compression). Until recently
the LZW algorithm used to compress the file was owned by Unisys. It is now royalty free.
TIFF files display quickly in a GIS package when a user navigates around.
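The link between bit depth, number of colours and uncompressed file size can be made concrete with a short calculation. This is a simplified sketch with hypothetical function names: it ignores file headers, row padding and compression.

```python
def colour_count(bits_per_pixel: int) -> int:
    """Number of distinct colours that a given bit depth can encode."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Approximate size of the raw pixel data, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


for bits in (1, 8, 24):
    print(bits, colour_count(bits), uncompressed_bytes(1000, 1000, bits))
```

For a 1000 x 1000 pixel image, moving from 1 bit to 24 bit multiplies the raw pixel data by 24, which is why compression matters for full-colour rasters.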
3.8.4.2 JPEG (Joint Photographic Experts Group)
JPEG is a particularly popular format for photographs. It uses 16 million colours, which as a TIFF file would make
the file size large. However, JPEG uses a lossy compression algorithm that takes into account the similarity
of colour in adjacent pixels. This can greatly reduce file size without much visible loss of quality. The
quality of a JPEG file can be specified as a percentage.
3.8.4.3 GIF (Graphics Interchange Format)
GIF uses a palette of up to 256 colours and is popular with internet developers. It also uses a compression
algorithm to make the file size smaller, but with no loss of picture quality (for an image with 256 or fewer colours).
As with compressed TIFFs, the algorithm used to compress the file was patented by Unisys. It is now royalty free.
3.8.4.4 PNG (Portable Network Graphics)
PNG is an extensible file format for the lossless, portable, well-compressed storage of raster images.
Its compression algorithm is not patented and, like TIFF, it can store colours in a particular order,
making it a good format for holding sets of raster tiles with a consistent palette.
3.8.4.5 DEM (Digital Elevation Model)
There are a number of commercial raster formats designed specifically to hold geographical raster
data with height information. These include DEM, ESRI’s GRID format, IMG (ERDAS Image Format)
and BIL (Band Interleaved by Line Format). All of these are regularly used within GIS packages. Like
TIFF, they store a height value in each cell, and they can also hold projection information.
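To illustrate what "band interleaved by line" means, the sketch below computes where a single cell value sits in a BIL data stream: for each row, the values of the first band are stored for every column, followed by the values of the next band for the same row. The function names are hypothetical. The sample type (16-bit signed, little-endian) is an assumption; in practice it is described in an accompanying header file.

```python
import struct


def bil_offset(row, col, band, ncols, nbands, bytes_per_sample):
    """Byte offset of one sample in a band-interleaved-by-line stream.

    row, col and band are zero-based indices.
    """
    return ((row * nbands + band) * ncols + col) * bytes_per_sample


def read_int16_sample(buffer, row, col, band, ncols, nbands):
    """Read one 16-bit signed little-endian sample (assumed sample type)."""
    offset = bil_offset(row, col, band, ncols, nbands, 2)
    (value,) = struct.unpack_from("<h", buffer, offset)
    return value


# Build a tiny 2-row, 3-column, 2-band raster in BIL order for demonstration.
nrows, ncols, nbands = 2, 3, 2
data = bytearray()
for row in range(nrows):
    for band in range(nbands):
        for col in range(ncols):
            data += struct.pack("<h", 100 * band + 10 * row + col)

print(read_int16_sample(bytes(data), row=1, col=2, band=1, ncols=ncols, nbands=nbands))
```

A single-band elevation grid is the simplest case: with `nbands` equal to 1 the layout reduces to plain row-by-row storage.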
3.8.4.6 Geodatabase (ESRI)
Geodatabases can import most raster data. This allows raster data and vector data to be stored
within the same file.
3.8.5 Raster Format Limitations
A TIFF file can have a large filesize depending on the number of colours used, although LZW
compression can greatly reduce this.
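The reason LZW works well on map tiles is that it replaces repeated runs of bytes with short dictionary codes. The following is a minimal sketch of the LZW dictionary principle only. It returns a list of integer codes and omits the details of the TIFF variant, such as the packing of codes into bits, so it is not a TIFF encoder.

```python
def lzw_encode(data):
    """Encode bytes as a list of LZW codes (dictionary principle only)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the known sequence
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


# A row of 1000 pixels that all share one palette index, as in a flat map area.
row = bytes([7]) * 1000
codes = lzw_encode(row)
print(len(row), "bytes ->", len(codes), "codes")
```

Raster maps with large areas of flat colour behave like the example row, which is why the saving is greater for such tiles than for photographs.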
Many GIS packages have been slow to take up GIF and LZW compressed TIFF format because of
the algorithm license. Some packages may require an extra license. ESRI’s ArcGIS 9.1 (and a fully
patched version of 9.0) does not need a special license file.
JPEG always holds the maximum number of colours (16 million). This is not a consistent palette. It
is automatically created when the file is saved. Every time a JPEG is saved with compression it will
lose more quality.
Although PNG is a good all-round format, it is slower to display than TIFF as it has to be
uncompressed on-the-fly.
DEM formats are usually not read by raster graphics packages, but have to be created from other
raster formats in a GIS package. The same is true for Geodatabases.
3.8.6 Preferred raster formats
Generally, JPEG should be avoided as it is more suitable for photographs than for geographic
data. However, photographs may be part of your geographic database to aid the recognition of
landmarks. These can often be linked to geographic coordinates with the use of hyperlinks.
LZW compressed TIFF is the most suitable format for one-off images (such as hillshading) or sets of
raster tiles (such as topographic or street maps with a large coverage area).
For DEMs, IMG or BIL formats are preferred.
3.9 Documentation of data quality
The framework for applying quality assurance procedures and reporting the results is set by the draft
ISO standards on quality principles (19113), evaluation procedures (19114), and metadata (19115).
3.9.1 Data Quality Overview
According to the GISCO metadata profile, every GIS layer has to be complemented with overview
information on data quality. It consists of descriptions of the purpose, the usage and information on
the history (lineage) of the GIS layer. Purpose describes the original objectives for creating the GIS
layer; usage illustrates the actual usage(s) of the layer by describing related applications. The lineage
gives information on the history of the dataset. It covers the total life cycle of a dataset from initial
collection and processing to its current form. The lineage statement may contain the component
“source information” that describes the origin of the dataset and the component “process step” that
records the events of transformations in the lifetime of the dataset. Lineage also includes information
on the process and the intervals to maintain a dataset.
3.9.2 Data Quality Elements
In addition to the general statements on data quality in the overview elements, it is recommended that
the GIS layers include information on quantitative data quality elements. These are completeness,
logical consistency, positional accuracy and thematic accuracy.
Table 18: Selected data quality elements and sub-elements
Quality Element | Quality Sub-Element
Completeness | Commission
Completeness | Omission
Logical consistency | Conceptual consistency
Logical consistency | Domain consistency
Logical consistency | Topological consistency
Logical consistency | Format consistency
Positional accuracy | Absolute or external accuracy
Thematic accuracy | Classification correctness
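When quality results are recorded programmatically, the content of Table 18 can serve as a lookup so that every result is filed under a valid element. The sketch below is one possible representation; the names `QUALITY_ELEMENTS` and `element_for` are hypothetical.

```python
# Quality elements and their sub-elements, as listed in Table 18.
QUALITY_ELEMENTS = {
    "Completeness": ("Commission", "Omission"),
    "Logical consistency": (
        "Conceptual consistency",
        "Domain consistency",
        "Topological consistency",
        "Format consistency",
    ),
    "Positional accuracy": ("Absolute or external accuracy",),
    "Thematic accuracy": ("Classification correctness",),
}


def element_for(sub_element):
    """Return the quality element that a sub-element belongs to."""
    for element, sub_elements in QUALITY_ELEMENTS.items():
        if sub_element in sub_elements:
            return element
    raise KeyError("unknown quality sub-element: " + sub_element)


print(element_for("Topological consistency"))
```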
3.9.3 Descriptors of the Data Quality Sub-Elements
The results of the quality assurance for the above mentioned data quality sub-elements should be
described using seven descriptors. The descriptors comprise the
• scope,
• measure,
• evaluation procedure,
• result,
• value type,
• value unit and
• date
of the data quality sub-element.
Quality measurements are only valid for defined scopes. The scope can be a geographic or a
temporal extent, or a certain level of the data hierarchy (i.e. dataset series, dataset, features, or
attributes). The scope may even be different within a single dataset, e.g. if the dataset is merged from
different data providers.
The data quality measure describes briefly the test that is used for measuring the quality within the
defined scope. The evaluation procedure should be described or, alternatively, there should be a
reference to where a detailed description of the procedure can be found. This description is very
important because it is necessary to understand the result of the applied test. Each test yields a
certain result that is part of the data quality report. In order to understand the result, it is necessary to
give information on the type of the value and on the unit of measurement. The reporting is completed
with the date on which the quality test is performed.
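The seven descriptors map naturally onto a simple record. The sketch below shows one way to hold them in Python; the class name, field names and all example values are hypothetical and are not taken from the ISO standards or from any GISCO dataset.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityMeasurement:
    """One quality test result for a data quality sub-element."""

    sub_element: str            # e.g. "Omission" (see Table 18)
    scope: str                  # extent or hierarchy level the test is valid for
    measure: str                # brief description of the test
    evaluation_procedure: str   # description of, or reference to, the procedure
    result: float               # outcome of the test
    value_type: str             # type of the reported value
    value_unit: str             # unit of measurement of the value
    date: datetime.date         # date on which the test was performed

    def summary(self):
        """One-line summary for a data quality report."""
        return "%s (%s): %s %s on %s" % (
            self.sub_element, self.scope, self.result,
            self.value_unit, self.date.isoformat(),
        )


# Illustrative values only.
example = QualityMeasurement(
    sub_element="Omission",
    scope="dataset",
    measure="Rate of missing items",
    evaluation_procedure="Comparison against a reference dataset",
    result=2.5,
    value_type="percentage",
    value_unit="percent",
    date=datetime.date(2005, 11, 3),
)
print(example.summary())
```

Keeping the value type and value unit next to the result reflects the point made above: a bare number cannot be interpreted without them.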
3.9.4 Documenting Quality Information
The results of applied quality tests should be documented as part of the metadata. ISO 19115 provides a
defined structure that follows the logic of the data quality elements, sub-elements and descriptors described above. The
metadata standard distinguishes between data quality information as a report and as information of the history
(lineage) of the data. The report comprises information on quality measurements, grouped according to the data
quality sub-elements.
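The split between lineage and report can be sketched as a small data structure in which the measurements are grouped by sub-element. This is only an illustration of the grouping idea: the function name and dictionary keys are hypothetical and do not reproduce the ISO 19115 element names.

```python
from collections import defaultdict


def build_quality_section(lineage_statement, measurements):
    """Combine a lineage statement with measurements grouped by sub-element."""
    report = defaultdict(list)
    for measurement in measurements:
        report[measurement["sub_element"]].append(measurement)
    return {"lineage": lineage_statement, "report": dict(report)}


# Illustrative values only.
section = build_quality_section(
    "Digitised from paper maps and updated yearly.",
    [
        {"sub_element": "Omission", "result": 2.5, "value_unit": "percent"},
        {"sub_element": "Commission", "result": 0.4, "value_unit": "percent"},
        {"sub_element": "Omission", "result": 1.9, "value_unit": "percent"},
    ],
)
print(sorted(section["report"]))
```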
Figure 25: Conceptual model of metadata description on data quality
Example for reporting data quality according to ISO 19115:
APPENDICES
APPENDIX I - The elements of the GISCO profile
Legend:
Core metadata for geographic dataset within GISCO Metadata Editor (ref: ISO 19115:2003(E))
Elements produced by the editor that do not conform to GISCO Schema Profile
Section Name: General Information (Mandatory)
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Metadata author (D)
    Name | Yes
    Organization | No
    Position or role in the organization | No
    Function in relation to the metadata | Yes
    Address | No
Metadata Information (C)
    Character coding used for the metadata set | No
    File identifier of the metadata | No
    Scope to which the metadata applies | No
    Name of the hierarchy levels | No
    Name of the metadata standard used | No
    Version of the metadata standard used | No
    DataSet URI | No
    Date that the metadata was created | Yes
Title (D)
    Title of the dataset | Yes
    Alternative titles | No
    Edition or version number | No
    Date for the edition or version | No
Creation Date and Language (D)
    Date when the dataset was first created | Yes
    Language in the metadata | No
    Language used in the data | Yes
Point of contact overview (D)
    Used to specify the points of contact to add
Point of contact (D)
    Point of contact’s name | Yes if one of the page fields is encoded
    Point of contact’s organization | No
    Point of contact’s position or role | No
    Function in relation to the dataset | No
    Point of contact’s address | No
Themes or Categories (D)
    Themes or categories | Yes
Abstract (C)
    Narrative summary about the content of the dataset | Yes
Purpose (C)
    Summary of the intentions with which the resource was developed | No
Section Name: Data Information (Mandatory)
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Additional Characteristics (D)
    Used to add pages for: Keywords, Scale, Maintenance information, Restrictions on use of data
Keywords Overview (D)
    Used to specify keywords pages to add
Theme Keywords (D)
    Keywords | Yes if keyword info is encoded
    Thesaurus name | Yes if thesaurus info is encoded
    Thesaurus date | Yes if thesaurus info is encoded
    Thesaurus date type | Yes if thesaurus info is encoded
Place Keywords (D): same as Theme Keywords page
    Keywords
    Thesaurus name
    Thesaurus date
    Thesaurus date type
Temporal keywords (D): same as Theme Keywords page
    Keywords
    Thesaurus name
    Thesaurus date
    Thesaurus date type
Stratum keywords (D): same as Theme Keywords page
    Keywords
    Thesaurus name
    Thesaurus date
    Thesaurus date type
Discipline Keywords (D): same as Theme Keywords page
    Keywords
    Thesaurus name
    Thesaurus date
    Thesaurus date type
Scale (D)
    Single scale | Yes if scale info is added
    Scale range | Yes if scale info is added
    Resolution distance | No
    Resolution units of measure | No
Maintenance Information (D)
    How often is the dataset updated | Yes if following Next update date is encoded
    When was the dataset last revised | No
    When is the dataset next scheduled to be updated | No
Restrictions overview (D)
    Used to specify which restriction pages to add: Use restrictions, Legal restrictions, Security restriction
Use restrictions (D)
    Use restrictions |
Legal restrictions (D)
    Data access restrictions | No
    Data use restrictions | No
    Do any other restriction apply | No
Security Restriction (D)
    Which security restriction has been applied to the dataset | No
Status (C)
    Status of the resource | No
Section Name: Spatial Information
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Spatial Representation (C)
    Method used to spatially represent geographic information | No
Vector representation (C) (this page is added only if Spatial Representation value is set to 'vector')
    Code which identifies the degree of the complexity | No
    Name of point and vector spatial objects | Yes if 'Total number of points…' field is encoded
    Total number of the point or vector object | No
Grid representation (C) (this page is added only if Spatial Representation value is set to 'grid')
    Number of spatial-temporal axes | Yes if Grid representation info is added
    Identification of grid data as point or cell | Yes if Grid representation info is added
    Identification of whether or not parameters for transformation exist | Yes if Grid representation info is added
    Dimension name. Name of the axis | Yes if information about spatial-temporal properties is added
    Resolution. Degree of detail in the grid dataset | No
    Dimension size. Number of elements along the axis | Yes if information about spatial-temporal properties is added
Coordinate System (C)
    Coordinate system | No
Geographic bounding box (D)
    Northern-most coordinate | Yes
    Western-most coordinate | Yes
    Eastern-most coordinate | Yes
    Southern-most coordinate | Yes
Additional extent information (D)
    Page used to specify additional extent information (Temporal information, vertical information)
Temporal information (D)
    Single date | No
    Range of date / Start date | Yes if End date is added
    Range of date / End date | Yes if Start date is added
Vertical Information (D)
    Minimum height | Yes if vertical info is added
    Maximum height | Yes if vertical info is added
    Units of measure | Yes if vertical info is added
    Vertical datum |
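The conditional entries in the "Mandatory for Validation" column can be read as simple rules. The sketch below checks three of them for the Spatial Information section: the four bounding box coordinates, the start and end of a date range, and the vertical extent. It is a hedged illustration, not the logic of the GISCO Metadata Editor; the function name and the dictionary keys are hypothetical.

```python
BOUNDING_BOX_FIELDS = ("north", "west", "east", "south")
VERTICAL_REQUIRED = ("min_height", "max_height", "height_units")
# No requirement is listed for the vertical datum, so it only counts
# as a sign that vertical information has been added.
VERTICAL_ANY = VERTICAL_REQUIRED + ("vertical_datum",)


def _present(record, key):
    """True if the field has been filled in (0 counts as a value)."""
    return record.get(key) not in (None, "")


def validate_spatial_information(record):
    """Return a list of validation messages; an empty list means valid."""
    errors = []
    # The four bounding box coordinates are always mandatory.
    for key in BOUNDING_BOX_FIELDS:
        if not _present(record, key):
            errors.append("bounding box: '%s' is mandatory" % key)
    # Start and end of a date range require each other.
    has_start = _present(record, "start_date")
    has_end = _present(record, "end_date")
    if has_end and not has_start:
        errors.append("temporal: start_date is required when end_date is added")
    if has_start and not has_end:
        errors.append("temporal: end_date is required when start_date is added")
    # Height range and units are required once any vertical info is added.
    if any(_present(record, key) for key in VERTICAL_ANY):
        for key in VERTICAL_REQUIRED:
            if not _present(record, key):
                errors.append("vertical: '%s' is required when vertical info is added" % key)
    return errors


# Illustrative record: the southern coordinate and the start date are missing.
print(validate_spatial_information(
    {"north": 71.2, "west": -31.3, "east": 44.8, "end_date": "2005-11-03"}
))
```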
Section Name: Data Quality Information
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

General Quality Information (C)
    General explanation of the data producer's knowledge about the lineage of the dataset | No
Source Information (C)
    Source Information | No
Process step Information (C)
    Description | Yes if process step info is added
Data Quality Elements overview (C)
    Used to include Data Quality Elements Sections
Section Name:
Quality completeness commission.
Quality completeness omission.
Quality conceptual consistency.
Quality format consistency.
Quality topological consistency.
Quality absolute external positional accuracy.
Quality gridded data positional accuracy.
Quality relative internal positional accuracy.
Quality accuracy of a time measurement.
Quality temporal consistency.
Quality temporal validity.
Quality thematic classification correctness.
Quality non quantitative attribute accuracy.
Quality quantitative attribute accuracy.
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

[Completeness: Commission] (C)
    Name of the test applied to the data | No
    Code identifying a registered standard procedure / Authority | No
    Code identifying a registered standard procedure / Value Code | Yes if the above field is added (Authority info)
    Description of the measure | No
    Type of method used to evaluate quality | No
    Description of the evaluation method | No
    Reference to the procedure information | No
    Date on which a data quality measure was applied | No
    Specify a result type (radio button selection). This will generate either Conformance result page or Quantitative result page | No
Conformance result (C)
    Explanation | Yes if data is added in [Completeness: Commission] page
    Indication of the conformance result | Yes if data is added in [Completeness: Commission] page
Quantitative result (C)
    Value type for reporting a data quality result | No
    Value unit for reporting a data quality result | Yes if Quantitative results info is added
    Statistical method used | No
    Quantitative value determined by the evaluation procedure | Yes if Quantitative results info is added
Section Name: Distribution Information
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Introduction (D)
    Introduction page to specify the availability of the dataset to third parties
Publication Date (D)
    When was the dataset published | No
Distributor (D)
    Distributor’s name | No
    Distributor’s organization | No
    Distributor’s position or role | No
    Function in relation to the dataset | Yes if distributor info is added
    Distributor’s address | No
Digital publication (D)
    Page asking if the dataset is published in digital format
Publication format (D)
    Name of the format | Yes if Publication format info is added
    Version of the data format | Yes if Publication format info is added
Off-line delivery options (D)
    Medium on which the dataset is available | No
On-line delivery options (D)
    Where is the dataset located | Yes if 'On-line delivery options' info is added
    What connection protocol must be used to access this location | No
    What function is performed at this location | No
    What happens or what is available at this location | No
Ordering process (D)
    Ordering instructions | No
    Typical turnaround time for completing an order | No
    Fees and terms for purchasing the dataset | No
Section Name: Metadata Attribute Information (Optional)
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Table Description (C)
    Name | Yes if Metadata Attribute info is added
    Description | No
    Attribute Label to add | Yes if attribute info is added
Fields Description: ‘i attribute name’ (C)
    Label (read only field) | No
    Description | No
    Field Type | Yes if attribute info is added
    Source | No
    Binary Width | Yes if attribute info is added
    Precision | No
    Scale | No
Section Name: Legislation Information (Optional)
Page Name (Default/Custom)
    Encoded Elements: Field Name | Mandatory for Validation

Legislation option (C)
    Page used to specify whether or not to add legislation information
Legislation Information (C)
    Title | Yes if legislation info is added
    Reference date type | Yes if legislation info is added
    Reference date | Yes if legislation info is added
    Legislation type | Yes if legislation info is added
    Country or other entity to which the legislation corresponds | Yes if legislation info is added
    Internal reference | Yes if legislation info is added
APPENDIX II - Glossary of Technical GIS Terms
ArcCatalog
The file management application of ArcGIS.
ArcGIS
ESRI's GIS suite of software.
ArcInfo
The highest license available for ArcGIS, allowing full functionality.
ArcMap
ArcMap is the central application in ArcGIS for all map-based tasks.
ArcSDE
Server software that provides ArcSDE client applications (for example,
ArcGIS Desktop, ArcGIS Server, ArcIMS) a gateway for storing, managing
and using spatial data in one of the following commercial database
management systems: IBM DB2 UDB, IBM Informix, Microsoft SQL Server,
and Oracle.
Attribute
A property of a geographic feature described numerically or by characters.
Attributes are mostly stored in tabular format, and are linked to the feature
by a user-assigned identifier. In a geo-relational database model, it
describes a spatial feature (e.g. point, line, node or area).
Coordinate System
A fixed reference framework superimposed onto the surface of an area to
designate the position of a point within it; a reference system consisting of
a set of points, lines, and/or surfaces; and a set of rules, used to define the
positions of points in space in either two or three dimensions. The
Cartesian coordinate system and the geographic coordinate system used
on the earth's surface are common examples of coordinate systems.
Coverage
A data model for storing geographic features using ArcInfo software. A
coverage stores a set of thematically associated data considered to be a
unit. It usually represents a single layer, such as soils, streams, roads, or
land use. In a coverage, features are stored as both primary features
(points, arcs, polygons) and secondary features (tics, links, annotation).
Feature attributes are described and stored independently in feature
attribute tables. Coverages cannot be edited in ArcGIS.
Dangle Node
An end-point node that represents an overshoot or undershoot of an
intersecting line.
Dataset
Any organized collection of data with a common theme
Datum
In the most general sense, any set of numeric or geometric constants from
which other quantities, such as coordinate systems, can be defined. A
datum defines a reference surface. There are many types of datums, but
most fall into two categories: horizontal and vertical.
Domain
In a geodatabase, a mechanism for enforcing data integrity. Attribute
domains define what values are allowed in a field in a feature class or
nonspatial attribute table. If the features or nonspatial objects have been
grouped into subtypes, different attribute domains can be assigned to each
of the subtypes.
Ellipsoid
A three-dimensional, closed geometric shape, all planar sections of which
are ellipses or circles. An ellipsoid has three independent axes, and is
usually specified by the lengths a,b,c of the three semi-axes. If an ellipsoid
is made by rotating an ellipse about one of its axes, then two axes of the
ellipsoid are the same, and it is called an ellipsoid of revolution, or
spheroid. If the lengths of all three of its axes are the same, it is a sphere.
Feature Class
A collection of geographic features with the same geometry type (such as
point, line, or polygon), the same attributes, and the same spatial
reference. Feature classes can stand alone within a geodatabase or be
contained within shapefiles, coverages, or other feature datasets. Feature
classes allow homogeneous features to be grouped into a single unit for
data storage purposes.
Feature Dataset
A collection of feature classes stored together that share the same spatial
reference; that is, they have the same coordinate system, and their
features fall within a common geographic area. Feature classes with
different geometry types may be stored in a feature dataset.
Foreign Key
A column or combination of columns in one table whose values match the
primary key in another table. A value in the foreign key can only exist if
there is a corresponding value in the primary key, unless the value is
NULL. Foreign key–primary key relationships define a relational join.
Generalisation
Simplification of map information, so that information remains clear and
uncluttered when map scale is reduced. Usually involves a reduction in
detail, a resampling to larger spacing, or a reduction in the number of
points in a line. Traditionally this has been done manually by a
cartographer, but increasingly semi-automated and even automated
methods have been used, particularly in conjunction with a GIS.
Geodatabase
An object-oriented data model introduced by ESRI that represents
geographic features and attributes as objects and the relationships
between objects but is hosted inside a relational database management
system. A geodatabase can store objects, such as feature classes, feature
datasets, nonspatial tables, and relationship classes.
GIS
Geographic Information System: an organised set of computer hardware,
software, geographic data and personnel designed to capture, store,
maintain, analyse and present all kinds of spatially-referenced information
in the most efficient way.
Legend
The reference area on a map that lists and explains the colors, symbols,
line patterns, shadings, and annotation that have been used on the map to
code the various elements and data values. The legend includes a sample
of each symbol with text describing what it means. Legends often include
the map's scale, origin, and projection.
Map Projection
A mathematical model that transforms the locations of features on the
earth's surface to locations on a two-dimensional surface, normally using
Cartesian co-ordinates.
Meridian
A great circle on the earth that passes through the poles, often used
synonymously with longitude.
Metadata
Information about the content, quality, condition, and other characteristics
of data.
Node
A node is an endpoint of an arc. The from-node is the first vertex in the arc;
the to-node is the last vertex.
Polygon
A polygon is an area defined by arcs that make up its boundary, including
arcs defining any islands inside. It is a many-sided, closed figure, defined
by a series of arcs and by a label point positioned inside the polygon.
Primary Key
A column or set of columns in a database that uniquely identifies each
record. A primary key allows no duplicate values and cannot be NULL.
Raster
Spatial data recorded using cells. The location of the feature is described
through its position in a grid (consisting of rows and columns). The cell at
this particular position has a certain value, being the value for the
geographic phenomenon represented.
Schema
The structure or design of a database or database object such as a table.
In a relational database, the schema defines the tables, the fields in each
table, and the relationships between fields and tables.
Shapefile
A vector data storage format for storing the location, shape, and attributes
of geographic features. A shapefile is stored in a set of related files and
contains one feature class.
Subtype
In geodatabases, a subset of features in a feature class or objects in a
table that share the same attributes.
Thematic Data
Data in which non-geographical data sets (e.g. demographic data) are
connected to an indirect geographical reference (e.g. a region
code).
Topology
Procedure for the explicit definition of spatial relationships between
connecting or adjacent coverage features.
Vector
Geographic features are recorded as sets of co-ordinates using a
Cartesian co-ordinate system. The location of the feature is described
through a series of x,y co-ordinates.
Vertex
One of a set of ordered x,y co-ordinates that compose a line feature.
APPENDIX III - Glossary of Abbreviations
API
Application Program Interface
CORINE
CoORdination of INformation on the Environment
DBMS
DataBase Management System
DEM
Digital Elevation Model
ETRS
European Terrestrial Reference System
EU25
The 25 Countries of the European Union
FDD
Frequency Distribution Diagram
GIF
Graphics Interchange Format
GISCO
Geographic Information System of the European Commission
GML
Geography Markup Language
HDV
Histogram of Data Values
HTML
HyperText Markup Language
IATA
International Air Transport Association
ICAO
International Civil Aviation Organisation
INSPIRE
INfrastructure for SPatial InfoRmation in Europe
INTERREG
An EU-funded programme that helps Europe’s regions form partnerships to work
together on common projects
ISO
International Organization for Standardization
JPEG
Joint Photographic Experts Group
LAEA
Lambert Azimuthal Equal Area
LCC
Lambert Conformal Conic
LZW
Unisys patented Lempel Ziv Welch data compression and decompression
technology.
MARS
Monitoring Agriculture with Remote Sensing
NMA
National Mapping Agencies
NSI
National Statistical Institutes
NUTS
Nomenclature of territorial units for statistics. This abbreviation is only applicable to
EU members.
OGC
Open Geospatial Consortium
PNG
Portable Network Graphics
RDBMS
Relational DataBase Management System
ROI
Return On Investment
SDI
Spatial Data Infrastructures
SOAP
Simple Object Access Protocol
TIFF
Tagged Image File Format
TM
Transverse Mercator
UDDI
Universal Description, Discovery, and Integration
UML
Unified Modeling Language
UN/ECE
United Nations Economic Commission for Europe
WFS
Web Feature Service
WMS
Web Map Service
XML
eXtensible Markup Language
APPENDIX IV – List of Document Sources
This document is a combination of new material and material collated from other sources which have
been edited and/or updated. Below is a list of sources that have been used for the base material of
various chapters.
1.1 Naming Conventions
    GISCO Database Manual
    SADL, K.U.Leuven R&D
    Copyright © 2005 Eurostat
1.3 Spatial Reference System
    Map Projections for Europe
    Joint Research Centre
    EuroGeographics
    Bundesamt für Landestopographie, Switzerland
    Bundesamt für Kartographie und Geodäsie, Germany
1.4 Grid Creation Standards
    1st Workshop on European Reference Grids
    Ispra, 27-29 October 2003
    JRC-IES-LMU-ESDI
1.5 Generalisation Parameters
    ArcGIS Help
    Copyright © 2004 ESRI
1.6 Spatial Data Standards and GIS Interoperability
    Spatial Data Standards and GIS Interoperability
    An ESRI White Paper - January 2003
    Copyright © 2003 ESRI
    OGC Website
    http://www.opengeospatial.org
2.4 Making GISCO Maps
    GISCO Desktop Mapping guide
    Desktop Mapping guide, version 2
    Geographical Information System for the Commission
    Directorate E - Unit E4 Structural Funds
    Mapping Tool User Guide
    Lovell Johns Ltd, Oxfordshire, UK.
    Copyright © 2005 Eurostat
3.1-3.6 Data Quality
    Quality Assessment and Quality Control related to GISCO data
    Copyright © 1998 Eurostat
3.7 Generalisation
    ArcGIS Help
    Copyright © 2004 ESRI