INCO-CT2006-032539
CAMINAR
Catchment Management and Mining Impacts
in Arid and Semi-Arid South America
Instrument: Specific Targeted Research Project
Thematic Priority: A.2.3 Managing arid and semi-arid ecosystems
Deliverable D7
GIS environment database and report
Due date of deliverable: Month 12
Actual submission date: Month 14
Start date of project:
1 February 2007
Duration: 36 months
Organisation name of lead contractor for this deliverable: Instituto Superior Técnico - IST
(final)
Project co-funded by the European Commission within the Sixth Framework Programme
(2002-2006)
Dissemination Level
PU    Public
PP    Restricted to other programme participants (including the Commission Services)
RE    Restricted to a group specified by the consortium (including the Commission Services)
CO    Confidential, only for members of the consortium (including the Commission Services)
X
European Commission Sixth Framework Programme
Specific International Scientific Cooperation Activities (INCO)
Activity Area: A. Developing countries
A2. RATIONAL USE OF NATURAL RESOURCES
A.2.3. Managing arid and semi-arid ecosystems
Specific Targeted Research or Innovation Project CAMINAR
Contract No. INCO-CT2006-032539
Catchment Management and Mining Impacts
in Arid and Semi-Arid South America
D7
GIS environment database and report
January 2008
Authors:
Luis Ribeiro, Ana Buxo
Instituto Superior Técnico - IST
Lisbon, Portugal
IST – Instituto Superior Técnico – Avenida Rovisco Pais, 1049-001 Lisboa, Portugal – www.ist.utl.pt
EXECUTIVE SUMMARY
The focus of the CAMINAR project is the sustainable use of water resources and the
associated ecosystems under a specific context, where ecological, climatic, social and
economic factors have to be accounted for. This implies an increase in the complexity and
quantity of the information necessary for the water management process.
The growth and advances of information technologies and databases have provided the ability
to access and effectively manage large volumes of information, thus increasing the quality of
managerial decision support. To ensure that the decision making process is practical, usable
and reliable, all the data sets concerning each basin, whether they relate to ecology,
topography, climate or socio-economic factors, must be analysed together. The use of
Geographic Information Systems and a relational Database Management System adds value to
the original data sets by migrating and converting them into a normalized database that can
simultaneously accommodate both geographic and alphanumeric data. This process enhances
data quality through the application of validation procedures that detect and correct
inconsistencies in individual elements or entire data sets, and by imposing a framework for
structuring the information according to pre-defined topologic/geometric and alphanumeric
criteria.
Consequently, in this first year of the CAMINAR project, the activities of Workpackage 6
(Decision Support Tools) focused on the development of a spatial database that integrates
both geographic and non-geographic data. The present version of the GIS environment
database is the cumulative work of the teams from Workpackages 2-4, responsible for the data
collection, and of the team from Workpackage 6, responsible for the definition and
implementation of the procedures needed to generate a database that supports decision making
more effectively. Many types of information were collected by the teams from South America,
generating a very significant volume of potentially useful files and paper documents.
These data were then analysed, revised, processed and integrated into the database, using
the productivity and analytical tools of Geographic Information Systems.
By the end of the first year of the project, a structured data repository has been created, with
the quality and quantity of data required to proceed to the following stage, where these
data can be used to feed specialized modelling and hydrological analysis tools. This spatial
database is more easily managed than the scattered original data sets and can readily
accommodate continuous updates and upgrades of the information, thus improving the decision
process through the use of the most current and reliable information.
Also, the use of standard formats and systems allows this data repository to interact with a
variety of other applications, namely hydrologic and hydraulic modelling packages, such as
those that will be tested and evaluated during the second year of the
CAMINAR project.
Last but not least, the work has been developed taking into consideration the
necessary transfer of this database, and of the facts behind it, to the final users.
Therefore, standard formats, public domain data models and well documented procedures
have been used, aiming to generate reproducible results.
Index
EXECUTIVE SUMMARY
INDEX
ACRONYMS AND ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
LIST OF ANNEXES
1 INTRODUCTION
1.1 Objectives
1.2 Geographic Information Systems versus Decision Support Systems
2 DECISION SUPPORT TOOLS
2.1 Building the database
2.1.1 Data Collection
2.1.2 Data modelling
2.1.3 The ArcHydro data model
2.1.4 Data processing
2.1.5 Database implementation
3 CONCLUSIONS
4 REFERENCES
ANNEX I – Data Collection Guidelines
ANNEX II – Basic ArcHydro Data Model Framework
Acronyms and Abbreviations
DBMS     Database Management System
DEM      Digital Elevation Model
DSS      Decision Support System
GIS      Geographic Information System
ODBC     Open Database Connectivity
OLE      Object Linking and Embedding
RDBMS    Relational Database Management System
SDSS     Spatial Decision Support System
TIN      Triangular Irregular Network
WP       Workpackage
List of figures
Figure 1: Conceptual SDSS for watershed management and evaluation of environmental
impacts from mines
Figure 2: View of the metadatabase form for file retrieval
Figure 3: Example of data retrieval using the metadatabase: queries for all documents
and files related to a chosen category
Figure 4: Data Model Based on Inventory
Figure 5: Integrating Data Inventory using a Behavioural Model
Figure 6: Components of the ArcHydro data model
Figure 7: Geodatabase view of ArcHydro data model
Figure 8: Example of Bolivia, Lake Poopó: data processing of hydrography data
Figure 9: Example of Peru / Chili basin: The original river cartography base
Figure 10: Example of the original river’s cartography for the Chili River, before
processing
Figure 11: Example of the original river’s cartography for the Chili River, after
processing
Figure 12: Examples of some of the files and tables containing data for time series
integration
Figure 13: Example of the aspect of formatted time series table in ArcHydro data model
Figure 14: Aspect of the official website of the GLCN Land Cover Topic Centre, and
the software available to assist in classification and normalization of land cover
themes
Figure 15: Example of cross-population: layer “Superficies_riego_elqui231205” and
layer “Panos_cultivados_elqui” contained different types of information about
the same feature (land parcels)
Figure 16: The resulting layer contains the aggregated data from both data sources
Figure 17: Digital elevation model generated for the Elqui basin, with overlaying river
network
Figure 18: Derived DEM information for Elqui river basin: slope grid (left) and aspect
grid (right)
Figure 19: Digital elevation model generated for the Chili river basin, with overlaying
river network
Figure 20: Derived DEM information for Chili river basin: slope grid (left) and aspect
grid (right)
Figure 21: Base data: contour lines for the Lake Poopó area and overlaying
hydrographic network
Figure 22: Surface Reconditioning by the AGREE method (Hellweger, 1997)
Figure 23: Comparison of the flow accumulation grid and the river network for the Chili
river basin (Peru)
Figure 24: Comparison of the flow accumulation grid and the river network for the
Elqui river basin (Chile)
Figure 25: Catchment areas and watershed areas for the Chili river basin (Peru)
Figure 26: Catchment areas and watershed areas for the Elqui river basin (Chile)
Figure 27: Examples of network analyses: selecting lines upstream or downstream of a
certain point
Figure 28: Connecting MonitoringPoints to the river network
Figure 29: Geodatabase view of the connection between river network, monitoring
station and temporal data
List of tables
Table 1: General data collection statistics
Table 2: Data collection statistics: types of formats in which original data was received
Table 3: Coordinates systems identified in the original layers and used in the final
database for each basin
Table 4: Chile – Elqui River DEM statistics
Table 5: Peru – Chili River DEM statistics
Table 6: River network statistics for the three basin areas
List of annexes
Annex I     Data collection guidelines
Annex II    Basic ArcHydro data model framework
1 INTRODUCTION
1.1 Objectives
The objective of Workpackage 6 (WP6) in the CAMINAR project is the development of
decision support tools to support participatory water management planning in 3 demonstration
river-basins: Chili River, in Peru, Elqui River, in Chile and Lake Poopó in Bolivia.
During the first year of the CAMINAR project, the main activities related to the “River-basin
case studies data collection” task for the three study areas. This task involved the completion
of the following objectives:
o Development of a consistent format for lead partners of WP2-4 to use to pass
hydrological, sediment transport and hydrochemical data from three demonstration
river basins
o Reception, quality-checking and database entry of data provided by lead partners of
WP2-4, and presentation in a GIS environment.
o Obtaining and processing of relevant remote sensing data for the study river basins (directly
and/or from partners as appropriate) and integration of these data onto the same GIS platform.
1.2 Geographic Information Systems versus Decision Support Systems
Traditionally, geographic information was managed by cartographic systems, which later
developed into Geographic Information Systems (GIS). A GIS differs from mapping or
cartographic software in that it has a database of geographic data, allowing linkages
between different types of data and the ability to query these spatial data. Also, alphanumeric
attributes can be handled easily using an integrated database management system (DBMS).
An increasing number of examples indicate that an integrated decision making process can
deliver more realistic approaches to the management of arid zone ecosystems. This is
achieved by the use of assessment tools that can examine the
consequences of alternative management options. Systems such as DSS (Decision Support Systems)
focus on specific decisions and on supporting, rather than replacing, the user's decision making
processes. Definitions of DSS emphasise the need to support semi-structured and unstructured
decisions.
Many widely accepted definitions of Decision Support Systems (DSS) identify the need for a
combination of a database, an interface and a model component directed at a specific
problem. In terms of these definitions, a GIS would not be regarded as a DSS as it lacks
support for the use of specific models. However, there are today many techniques allowing
data to be shared and/or transferred between computer applications.
The growth and advances of information technologies and databases have provided the ability
to access and effectively manage large volumes of information, thus increasing the quality of
managerial decision-support. As in so many other problems dealt with by DSS, and
particularly in those related to natural resources, water management depends on
information that has a spatial component. Events do not occur independently of their
geographic location, and the concepts of vicinity, proximity and connectivity are inherent to
the understanding of the hydrologic system. For these reasons, and because such a system
depends so heavily on geographic data, we can refer to it as a
Spatial Decision Support System (SDSS). In an SDSS, the GIS component is employed as a
basic conceptual framework for the representation of the territorial domain, as well as the
interface with end-users.
In some areas where specific modelling and simulation are needed to fully support the
decision, GIS can make use of techniques for interaction between software
packages, such as object linking and embedding (OLE), dynamic data exchange and open
database connectivity (ODBC), which act as the basic medium for sharing data between
applications or modules. In such a work environment, the definitions of GIS and DSS converge,
since in both cases the interface, the database and the model components required to fully
support decisions are present.
To ensure that the decision making process is practical, usable and reliable, all the
data sets concerning each basin, whether they relate to ecology, topography, climate or
socio-economic factors, must be analysed together. The use of GIS and a relational DBMS adds
value to the original data sets, by migrating and converting them into a normalized database
that can simultaneously accommodate both geographic and alphanumeric data. This process
enhances data quality through the application of validation procedures that detect and correct
inconsistencies, on individual elements or entire data sets, and by imposing a framework for
structuring the information, according to pre-defined topologic/geometric and alphanumeric
criteria.
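The alphanumeric side of such validation can be pictured with a minimal Python sketch. It checks each record against a set of pre-defined criteria and reports the fields that fail. The field names, the basin codes and the pH range are hypothetical, chosen only for illustration; they are not taken from the project database.

```python
# Hypothetical validation rules: each field maps to a check function.
ALLOWED_BASINS = {"ELQUI", "CHILI", "POOPO"}  # illustrative codes only

RULES = {
    "basin": lambda v: v in ALLOWED_BASINS,
    "station_id": lambda v: isinstance(v, str) and v.strip() != "",
    "ph": lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 14.0,
}


def validate(record):
    """Return the list of field names that are missing or fail their rule."""
    errors = []
    for field, check in RULES.items():
        if field not in record or not check(record[field]):
            errors.append(field)
    return errors


def validate_all(records):
    """Map the index of each inconsistent record to its failing fields."""
    report = {}
    for index, record in enumerate(records):
        errors = validate(record)
        if errors:
            report[index] = errors
    return report
```

Keeping the rules in one table, separate from the checking loop, is one way of expressing "pre-defined criteria": new fields or domains can be added without touching the validation code.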
Presently, as the cost of technology decreases and the field of applications for GIS spreads,
many of the management science models used are being incorporated into the GIS
environment.
[Figure 1 diagram. Elements shown: Data Collection; GIS; Database (spatial and non-spatial
information); Models (surface water, groundwater, water quality); analysis and data access tools
for pre- and post-processing; data management; risks appraisal; vulnerability; alternatives
analysis; User Interface (maps, reports, graphics, statistics, tables, scenarios); decision making.]
Figure 1: Conceptual SDSS for watershed management and evaluation of environmental impacts from
mines
Geographic information systems are capable of integrating geographical data with other data
from various sources to provide the necessary information for effective decision making in a
watershed management system.
Typically a GIS system serves both as a tool box and as a database.
As a tool box, GIS allows planners to perform spatial analysis using its geoprocessing or
cartographic functions such as data retrieval, map overlay and connectivity.
Decision makers can also extract data from the GIS’s database and input it to other modelling
and analysis programs, together with data from other databases or specially conducted
surveys.
An example of the use of such GIS capabilities for hydrologic analysis would be a set of GIS
tools capable of managing and analysing vector or raster based data sets in order to determine
hydrologic elements and connectivity, to perform calculations of drainage parameters and to
prepare the input files for modelling/simulation applications. These tools make automated
model setup significantly faster and lead to reproducible results.
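To make the idea of hydrologic connectivity concrete, the following Python sketch traces a small river network upstream and downstream of a chosen reach. It assumes that each reach stores the identifier of the next reach downstream; the reach names and the `NEXT_DOWN` table are hypothetical and are not the tools or data used in the project.

```python
from collections import defaultdict

# Hypothetical network: each reach ID maps to the ID of the next reach
# downstream; None marks the basin outlet.
NEXT_DOWN = {
    "A": "C",
    "B": "C",
    "C": "E",
    "D": "E",
    "E": None,
}


def trace_downstream(reach, next_down):
    """Return the reaches crossed from `reach` down to the outlet, in order."""
    path = []
    seen = {reach}
    current = next_down[reach]
    while current is not None:
        if current in seen:
            raise ValueError("cycle in network at " + current)
        seen.add(current)
        path.append(current)
        current = next_down[current]
    return path


def trace_upstream(reach, next_down):
    """Return the set of all reaches that drain into `reach`."""
    # Invert the downstream links so each reach knows its direct tributaries.
    upstream_of = defaultdict(list)
    for up, down in next_down.items():
        if down is not None:
            upstream_of[down].append(up)
    found = set()
    stack = [reach]
    while stack:
        node = stack.pop()
        for up in upstream_of[node]:
            if up not in found:
                found.add(up)
                stack.append(up)
    return found
```

With the sample table, tracing downstream from reach "A" passes through "C" and then "E", while tracing upstream from "C" collects its tributaries "A" and "B".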
Another important aspect in the use of GIS as a DSS generator is related to the data
acquisition process. The impact of the data is of major importance in developing DSS projects.
The quality of the decision making process depends on the availability and reliability of
information. Decisions based on the DSS are only as good and reliable as the data stored in the
system’s database.
In this sense, the traditional role of GIS as a tool for speeding up the data acquisition process
and the processing of spatial data is also of great importance.
GIS was initially used as a means of completing activities which contributed directly to
productivity, dealing with the greater complexity of spatial information, with
resource-consuming procedures and with the automation of validation and error checking.
Normally, one of the main obstacles to the development of successful and usable DSS, as well
as GIS, is access to data in formats that can be combined in an integrated database.
But there are other constraints beyond the availability of pertinent data. These projects have
been hindered by difficulties such as the cost of accessing such data, the complicated technical
procedures necessary to process geographic information and the large volume of data, which
implies a significant investment of time and human resources.
2 DECISION SUPPORT TOOLS
Building decision tools for watershed data management requires a repository of valid, reliable
data that can be used to assist in the decision process.
Our focus is therefore on enhancing the population of DSS with reliable ecological,
environmental, climatic or socio-technical information.
As in many similar situations, a DSS (or a spatial DSS) does not start from scratch. We
benefit from being able to access some data sets or being able to establish contacts and
partnerships, with a variety of local data providers, using the expertise and previous work
done by the teams from WP2-4.
However, these data are dispersed, stored in the repositories of several entities, in many
different formats and with distinct requirements and technical specifications.
To be able to develop a DSS, we have to build a consistent, integrated data repository.
Given the previously evaluated variability of formats, the volume of data and its
geographic and non-geographic nature, it was considered that GIS applications
associated with relational database management systems would be the most convenient tools for
manipulating these data sets, as they simplify and aid the productive generation of structured
information.
2.1 Building the database
2.1.1 Data Collection
The initial work of WP6 was to define some lines of development and procedures related to
the data collection task.
The objective was to specify some criteria for the data collection that could be used by teams
from WP2-4 in their participation on this task.
In view of this objective, a document named “Data Collection Guidelines” (see Annex I) was
produced and sent to each of the three teams from South America.
This document defined some of the tasks to be performed, such as identifying possible data
sources, and enabled the teams to carry out some pre-analysis of the data. The purpose
was that, before being sent to WP6, the files and paper documents would undergo some type of
pre-screening to detect the more obvious problems.
Examples of this pre-analysis could be the detection of problems of a computing nature (e.g.
files that could not be opened).
Also, the teams were asked to add, where possible, any additional information required for GIS
integration (e.g. projection systems, classification systems, topologic aspects, unique ID or
specific user codes for cross-data analysis, etc.).
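A pre-screening step of this kind could be automated along the lines of the Python sketch below. It is only an illustration under stated assumptions: it supposes that layers arrive as ESRI shapefiles, where each `.shp` file needs `.shx` and `.dbf` companions and a `.prj` file carries the projection definition. The function name `prescreen` and the wording of the messages are hypothetical and do not come from the guidelines document.

```python
import os

# Assumption: layers arrive as ESRI shapefiles, where each .shp needs .shx
# and .dbf companions, and a .prj file carries the projection definition.
REQUIRED = (".shx", ".dbf")
RECOMMENDED = (".prj",)


def prescreen(folder):
    """Return a dict mapping each problematic file name to a list of problems."""
    problems = {}
    names = sorted(os.listdir(folder))
    present = {name.lower() for name in names}
    for name in names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        issues = []
        if os.path.getsize(path) == 0:
            issues.append("empty file")
        else:
            try:
                # Reading one byte is a cheap test that the file opens at all.
                with open(path, "rb") as handle:
                    handle.read(1)
            except OSError:
                issues.append("cannot be opened")
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".shp":
            for companion in REQUIRED:
                if (stem + companion).lower() not in present:
                    issues.append("missing " + companion)
            for companion in RECOMMENDED:
                if (stem + companion).lower() not in present:
                    issues.append("no projection file (" + companion + ")")
        if issues:
            problems[name] = issues
    return problems
```

Running such a check before a delivery would flag, for instance, a layer sent without its projection file, which is exactly the kind of additional information requested above.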
Teams from WP2-4 have done an extensive data compilation, from different sources, internal
and external to their work groups. It includes pre-existing GIS layers in several formats,
reports and articles, raw data in tables or lists, satellite images, photographs and general data,
both in paper and digital media.
Files and other documents were then sent to the WP6 team for evaluation and database
integration.
To facilitate data transfer, different platforms were made available, depending on the
choice of each team, the technological capabilities installed locally and the nature of the data
involved. Data could be sent by courier on CD-ROM or DVD media, by e-mail or FTP,
or delivered directly to the WP6 team during its visits to the project’s locations in South
America.
Of all the platforms available, FTP has proven to be the fastest and most effective, as
long as each team has access to the internet and sufficient bandwidth, since it allows
data packages to be transferred as they become available. In some circumstances, large volumes
of information can be sent, making response time very fast.
Also, on some occasions, FTP was used as a communication tool, for instance when
the help of local teams was important in the evaluation, completion or validation of certain
data sets, since it can be used in both directions (send and receive) by the teams from WP2-4
and WP6.
In the future, FTP will have an increasing importance in the development of the database,
since it can be used to send revised database versions and updates to the teams of WP2-4 for
their use, analysis, validation and completion, resulting in an interactive process of data
refinement and allowing the final product to be adjusted to the requirements of the potential
final users.
As a result of such work, a significantly large volume of data was gathered. By the end of
January, 2008 the following volumes of information were received:
≈ 3.2 Gbytes    of information
4 543           files or documents delivered
760             geographic layers or themes
27              different formats
≈ 18 000 km²    Basin area for Bolivia (Lake Poopó)
≈ 10 096 km²    Basin area for Chile (Elqui river + Quebrada Chacay + Pan de Azúcar)
≈ 14 475 km²    Basin area for Peru (Chili river + Colca)
Table 1: General data collection statistics
Each file received is catalogued in a metadatabase. The metadatabase contains information
about the data. Each file or paper document received is classified according to attributes such
as:
o Basin / case study
o Date of reception
o Media of delivery (FTP, CD, e-mail, etc.)
o Name
o Location on the computer
o Type
o Format
o Category
o General description and/or observations
Figure 2: View of the metadatabase form for file retrieval.
The generation of such a database was necessary due to the volume of files and documents
involved. The primary objectives of this metadatabase were to keep track of all the
information received and to access it rapidly.
It has also proven to be an efficient method of internally managing the flow of information,
since it allows potential users of such files and documents to find all the elements related to a
certain subject or basin.
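The structure of such a metadatabase can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not the project's actual implementation: the columns mirror the attributes listed above, while the table name `metadata`, the function names and any sample rows are hypothetical.

```python
import sqlite3

# Column names mirror the attributes listed above; the table name and any
# sample rows loaded into it are hypothetical.
SCHEMA = """
CREATE TABLE metadata (
    id          INTEGER PRIMARY KEY,
    basin       TEXT NOT NULL,
    received    TEXT NOT NULL,   -- date of reception, ISO format
    media       TEXT,            -- FTP, CD, e-mail, ...
    name        TEXT NOT NULL,
    location    TEXT,            -- path on the computer
    type        TEXT,
    format      TEXT,
    category    TEXT,
    description TEXT
)
"""


def build_catalogue(rows):
    """Create an in-memory catalogue and load the given rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO metadata (basin, received, media, name, location, "
        "type, format, category, description) VALUES (?,?,?,?,?,?,?,?,?)",
        rows,
    )
    conn.commit()
    return conn


def files_in_category(conn, category, basin=None):
    """Return the names of catalogued files for a category, optionally per basin."""
    sql = "SELECT name FROM metadata WHERE category = ?"
    params = [category]
    if basin is not None:
        sql += " AND basin = ?"
        params.append(basin)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]
```

A call such as `files_in_category(conn, "Hydrography", basin="Chili")` would then list every catalogued file for that theme and basin, which corresponds to the retrieval by subject or basin described above.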
Figure 3: Example of data retrieval using the metadatabase: queries for all documents and files related to
a chosen category.
Furthermore, it is used as one of the methods for data completion and validation, through
the cross analysis of documents and files related to the same theme. If more than one data
source for a certain theme can be compared, it is possible to detect inconsistencies in
the data or, in other cases, to obtain richer information by cross-populating the data sets.
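The cross-analysis of two sources describing the same features can be sketched in Python as follows. The sketch assumes that both sources share a common feature identifier and that, where the two disagree, the value from the first source is kept and the disagreement is reported; both the function `cross_populate` and this conflict rule are hypothetical choices made for the example.

```python
def cross_populate(source_a, source_b):
    """Merge two attribute tables keyed by feature ID.

    Returns (merged, conflicts): `merged` holds the union of attributes per
    feature; `conflicts` lists (feature_id, field, value_a, value_b) where
    the two sources give different values for the same field. On a conflict
    the value from `source_a` is kept (an arbitrary choice for this sketch).
    """
    merged = {}
    conflicts = []
    for fid in sorted(set(source_a) | set(source_b)):
        rec_a = source_a.get(fid, {})
        rec_b = source_b.get(fid, {})
        combined = dict(rec_a)
        for field, value in rec_b.items():
            if field in rec_a and rec_a[field] != value:
                conflicts.append((fid, field, rec_a[field], value))
            else:
                combined[field] = value
        merged[fid] = combined
    return merged, conflicts
```

The `merged` result corresponds to the richer, cross-populated data set, while the `conflicts` list gives the inconsistencies that would need to be checked against the original documents or with the local teams.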
2.1.2 Data modelling
Data modelling is the process by which a standard architecture for the project’s database is
defined and thus used to organize, manage and access information.
Basic goals of data modelling are:
o Simplifying the process of project implementation.
Data modelling is a complex technique and one which has a significant impact on the
success of the project’s implementation and adherence to the objectives.
Major risks in developing data models are oversimplification, which could truncate
future analytical capabilities required by the system, and over-complexity, which can
lead the project into dead-ends and never-ending tasks, losing track of the real
purpose.
This occurs because the data model must take into account all the phases of
development of the database: the nature of the problem, the industries or scientific
disciplines involved, the type and formats of data available, the data acquisition and
integration processes, as well as the end-users’ interface with the information and the
generation of final products (maps, reports or statistics).
This means that data modelling sometimes occurs before much of the necessary
information is available and while many aspects still remain to be specified, namely by
the end-users.
In this sense, previous experience in similar projects becomes a significant plus, as it
allows for faster project development by providing a starting point.
o Promoting and supporting standards to bring consistency and synergy between similar
systems.
More and more, GIS and spatial databases are used by projects and organizations as a
corporate system.
This means that the information served by the spatial database is used across
organizations, by different groups of users with various interests who share
the same data.
This also means that the information gathered in the spatial database is not only used
inside the graphical and cartographical environment of the GIS, but also by other
applications and software.
To do so, more standard and formal data modelling is required, since each system or
application needs to recognize the data contained in the database management system,
and needs to know what each attribute, each value and each classification system means,
in order to adapt it to its own specifications.
However, the choice of a data model should not prevent the
integration of case-specific design situations. Therefore it needs to be extensible,
flexible and adaptable to the user requirements.
o Facilitating the transmission of both knowledge and technology, through the use of well
documented processes.
The development of systems such as GIS or DSS has to deal with the existence of
different modules and components (database, models and interface), the large volume
of data processed and the distinct profiles of potential users involved.
The successful use of the end-products depends also on the capacity to describe and
justify their production process and the technological and/or data constraints
associated with their development.
This means that, when the system is finally concluded, all processes, procedures and
requirements have to be documented.
This is an important step, since these documents will be the basis for end-users to
understand its content, how it works, what the limitations are, how it can be expanded,
what functionalities are available, etc.
The generation of a spatial database according to a well defined, standard data model,
allows for a more extensive and clear documentation.
Another important aspect of the documentation process is giving the final users the
knowledge that will allow them to reproduce the process, if necessary, by identifying
the main steps and by clearly describing the why and how of each procedure.
Taking all these items into account, it was decided to use a pre-defined, third-party data
model named ArcHydro.
This data model was jointly developed by the Centre for Research in Water Resources of the
University of Texas at Austin, and ESRI (Environmental Systems Research Institute, Inc.).
It was designed taking into account the different needs of several types of potential users
dealing with surface and groundwater issues, from scientists dealing with natural resources to
engineers as well as planners and managers.
ArcHydro is a geospatial and temporal data model developed for the analysis and
management of water resources. It consists of two sub-models, one for surface water and
another for groundwater, allowing for integrated analysis of many subjects related to
hydrology.
In the development of ArcHydro, the question of the integration between a spatial
database and hydrologic simulation models was addressed in a very efficient way.
ArcHydro is not in itself a simulation model, but it has an associated set of tools
that allow for the definition of the framework of generic elements (hydrofeatures) and of the way
features from different data layers interconnect, as well as tools for populating the database
with the parameters used by a significant number of hydrologic models.
ArcHydro represents a basic data model that, if necessary, can be modified by adding other
data layers, new attributes for the existing features, more complex relations between elements,
or new rules or domains for the behaviour of elements, in order to adjust it to particular projects
or applications.
One of the reasons for this choice was the fact that ArcHydro is already a standard, uniform
data model with an extensive base of users, allowing it to be repeatedly tested, corrected and
improved over time.
A large number of articles, books and internet sites have abundant information about the
concepts, the templates, the implementation procedures and examples of applications that can
help end-users understand the principles behind its use.
There are also some discussion groups and forums which users can consult to solve particular
questions regarding their individual project, sharing their experience and knowledge with an
enlarged base of experienced users.
In addition, there is a clear tendency to transform the GIS work environment into a formal
DSS, by the direct integration of analysis tools and simulation models into the GIS interface.
In line with this tendency, some broadly used hydrologic models use the ArcHydro data
model as their own data model and, for others, pre-processors that link these external applications
to the ArcHydro model already exist and are freely available.
2.1.3 The ArcHydro data model
Traditionally, the data model design used in GIS was what can be described as an inventory data
model. This consisted of a set or list of geographic layers, each representing a specific
theme and group of elements. Each data layer was then stored in an individual file or set of
files that existed by itself.
In this type of data model, the relations between individual elements or distinct layers are
only deduced from the analysis that the user makes of the data: the user can observe the
geographic relations, the proximity and the vicinity and, through the use of basic GIS operations
such as unions, intersections and buffering, can identify and quantify these relations.
[Figure 4 labels: points of interest, line features and surface features in the hydrographic system (e.g. pump, dam, bridge); the model makes an inventory of all features of a given type in the region and answers "What is it?" and "Where is it?"]
Figure 4: Data Model Based on Inventory
However, our perception of a natural hydrologic system is quite different.
The elements that we can recognize are connected and we can even describe the relations
between them.
Thus our perception is more like a model based on behaviour: a drop of rain that falls at a
certain location will go downhill along the line of steepest slope, into a nearby stream that in
turn is connected to the next reach downstream and so on, until it reaches the sea.
ArcHydro therefore uses an approach where the inventory data model is integrated using a
behavioural model.
[Figure 5 label: relationships between objects linked by tracing the path of water movement]
Figure 5: Integrating Data Inventory using a Behavioural Model
In this case, the data model will not only store the data layers but also the relations between
the elements (that represent the movement of a drop of water) and defined rules.
To do so, it uses the advantages of a special spatial data structure in Relational Database
Management Systems (an RDBMS-based GIS System): the geodatabase model.
Conceptually, it is a combination of GIS objects enhanced with the capabilities of a relational
database to allow for relationships, topologies, and geometric networks to be stored in the
same place.
The ArcHydro data model is thus a geodatabase data model.
The geodatabase is an object-oriented data model that allows both physical objects and
behaviours to be stored in the same schema.
o A geodatabase can define general and arbitrary relationships between objects and
features;
o A geodatabase can enforce the integrity of attributes through domains and validation
rules;
o A geodatabase can present multiple versions so that many users can edit the same
data;
o A geodatabase can model topologically complex sets of features such as networks.
A geodatabase stores both geographic objects and non-geographic data in a commercial
relational database:
o A uniform repository of data where all geographic and non-geographic data can be
stored and centrally managed in one database file;
o This means that we can take advantage of the improved performance of spatial analysis
through the use of topological rules, as well as of all the advantages of data management, retrieval
and analysis associated with relational databases;
o Many applications need to extract information from the spatial database but not to
display or geographically process the data. Since geodatabases are implemented using
standard DBMS, these applications can access the data using standard languages,
protocols or technologies.
From the hydrologic point of view, the ArcHydro data model divides water resources into five
different components.
Each component is organized in the geodatabase as a feature data set, meaning a group of
interdependent geographic layers.
The five components of the complete ArcHydro data model are:
o Hydrography
It is the base data from topographic maps and tabular data inventories. It includes
layers such as monitoring stations, structures with a relevant impact on the hydrologic
system (dams, bridges, water discharge points, etc.) as well as other elements that are
relevant to the complete description of water-related features;
o Drainage
It includes features like drainage areas (such as catchments or watersheds), outlet
points and stream lines. These features describe the process by which water moves
from the point where it falls onto the landscape down to a stream, then to a river and
finally to the sea;
o Network
A set of lines and relevant points that define the path of water flow;
o Channel
A set of lines, transverse or parallel to the river network, that defines the river shape;
o Time series
A set of tabular data that describes the temporal variation of a certain water property
related to a hydrofeature.
The description of each of these components, from the geodatabase point of view is presented
in Annex II.
[Figure 6 labels: Drainage, Network, Channel, Hydrography, Time Series, hydrofeatures, Flow, Time]
Figure 6: Components of the ArcHydro data model.
Figure 7: Geodatabase view of ArcHydro data model
2.1.4 Data processing
Once we have defined what we want our database to look like, and some data is already
available, we have to start considering how we are going to migrate, convert or import the
original data set into the chosen data model.
Since the original data was produced by multiple data providers, with different purposes and
distinct technical specifications, the obvious starting situation that we have to be
prepared for is a huge diversity of formats, standards, files and dispersed elements.
Due to this expected diversity in the original documents and the predictably large volume of
data, we have locally installed the capability to process, integrate and edit many types of data
collected by teams from WP2-4.
This phase of work is designated as data processing.
Data processing consists of a series of operations, edits and manipulations of the original
data in view of its preparation for integration in the defined data model.
Also, the diversity of formats for both geographic and alphanumeric data has implied that
each single file or set of files had to be individually evaluated in order to define the necessary
procedures for its integration in the database.
Next we present a series of examples of data processing issues and how they were
handled in some practical situations.
o Format conversion
One of the major aspects that we had to deal with was the diversity of formats in
which data was available.
As a first level, we had to deal with the possibility of having data not only in digital
formats, allowing for more automated processing, but also data only available in paper
documents.
Fortunately, most of the data available was already in digital formats, thus potentially
“easier” to integrate.
However, data available only in paper formats has proven to be of major importance
and relevance for the type and complexity of the simulation models that could be
applied in the next stage of the project.
This was because most of the data available on paper corresponds to temporal data,
such as climatic series or historical data from hydrochemical monitoring campaigns,
which are essential for the calibration of such models.
The more detailed and extensive the temporal data we have, the more reliable the
simulation results potentially become.
Also, in a certain number of cases, additional tabular data was found to be stored in digital
formats not compatible with a straightforward process of integration in the database.
These situations occurred when tabular data was stored in static formats such as TIF
images, Corel Draw files, pictures pasted into Word documents, PDFs, etc.
In such situations, as well as in the case of paper documents, semi-automatic
procedures were applied to convert data into usable formats.
Applications such as OCR (Optical Character Recognition) were used and, with some
additional manual error checking, most of the information could be recovered into
digital format.
Although very time consuming, the fact that this tabular information was, in some
cases, more detailed than the available digital information indicates that the resources
used in this process were well justified.
Format                                       Extension    Nº of files
Digital formats                                           4 445
  Adobe Reader                               pdf          9
  ArcInfo Coverage                                        201
  ArcView legend                             avl          8
  ArcView Presentation                       apr          28
  ASCII                                                   10
  AutoCAD DWG                                dwg          63
  AutoCAD DXF                                dxf          10
  Compress file RAR                          rar          1
  Compressed file ZIP                        zip          7
  Corel Draw Drawing                         cdr          4
  DBASE table                                dbf          59
  Image BMP                                  bmp          2
  Image IMG (ArcGis)                         img          4
  Image JPEG                                 jpg          161
  Image MrSID                                sid          187
  Image TIF                                  tif          23
  Image WMF                                  wmf          14
  Mapinfo (at least .MAP+.TAB)                            425
  MsAccess                                   mdb          1
  MsExcel                                    xls          60
  MsPowerPoint                               ppt          1
  MsVisio                                    vsd          8
  MsWord                                     doc          196
  Print file                                 prt          30
  Shapefile (at least .shp+.shx+.dbf)                     2 924
  TIN (ArcGis)                                            9
Hardcopy – Paper: Reports/Thesis/Articles                 21
Unidentified / Incomplete data sets                       77
Table 2: Data collection statistics: types of formats in which original data was received
For the other digital data, of geographic or alphanumeric nature, multiple platforms
were made available: GIS applications, tools for translating and transforming
geographic data, spreadsheets, CAD, image processors, etc.
That way, each file could be effectively opened and analysed. Afterwards, if necessary, the
procedures for converting it into the appropriate format for database
integration would be specified.
o Projection system conversion
As the name Geographic Information System indicates, the information associated
with these systems has a geographic component.
It is this geographic information that distinguishes cartographic data and maps from
simple drawings or schematic representations.
Geographic information means that, for each element, there is a pair or group of
coordinates that uniquely positions it on the Earth’s surface.
Geographic coordinates are referenced to a certain projection system. Projection
systems are systematic transformations of the spheroid shape of the Earth so that the
curved, irregular, three-dimensional shape of a geographic area on the Earth can be
represented in two dimensions, as x,y coordinates.
Since the heterogeneous shape of the Earth cannot be exactly measured, a series of
approximations and models define the parameters to be considered, such as the
major and minor radii of the spheroid, the angles, etc.
These parameters are then used in the mathematical expressions that convert data from
a geographical location (latitude and longitude) on the Earth’s spheroid to a
representative location on the flat surface of a map.
The definition of the projection system is relevant whenever working on a project
in which data from different sources is involved. This is because, if we want the
spatial database to integrate and jointly analyse all these layers, they have to be in the
same projection system so that we can overlay them.
A specific situation in which the correct definition of the projection system is essential
is the calculation of length- and area-related values, as is the case for many variables
used in hydrologic modelling.
Another important aspect arises when working with complex topologic features such as a
hydrographic network. In this case, the GIS uses the coordinates to analyse the
connectivity of the elements.
So, one of the first tasks was to identify the projection systems in which
the original data layers were sent and then convert each data set into the project’s
projection system.
This was not always an easy task, because the projection system of many layers was
not known and, by overlaying them, it was clear that several systems were present.
By analysing the totality of the geographic layers sent, it was possible to identify
those where the projection system was indicated and, by comparison with the remaining layers,
it was possible to deduce this information for the rest.
In the future, the identification of the projection system will be simple
since the geodatabase format stores this information.
Basin     Coordinate systems found in original data    Type          Coordinate system for ArcHydro database
Bolivia   GCS WGS84                                    Geographic    UTM Zone 19S WGS84
          UTM Zone 19S WGS84                           Projected
Peru      GCS WGS84                                    Geographic    UTM Zone 18S WGS84
          UTM Zone 18S WGS84                           Projected
          UTM Zone 19S WGS84                           Projected
Chile     UTM Zone 19S WGS84                           Projected     UTM Zone 19S WGS84
          UTM Zone 19S PSAD 1956                       Projected
Table 3: Coordinates systems identified in the original layers and used in the final database for each
basin.
Also, in the GIS environment, it will be possible to easily convert data from one
projection system to another, because it can be done by data sets and not individually,
layer by layer, as in other GIS formats.
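As an illustration of how a target UTM system can be identified for a data set, the following minimal Python sketch computes the UTM zone number from a longitude and the corresponding EPSG code for WGS84 / UTM zones. The function names and the sample longitudes are illustrative assumptions and are not taken from the project data; the standard 6-degree zone rule is used, ignoring the special zones defined around Norway and Svalbard.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the UTM zone number (1-60) containing a longitude in degrees."""
    # UTM zones are 6 degrees wide and start at 180 degrees West.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    return min(max(zone, 1), 60)


def epsg_wgs84_utm(zone: int, southern: bool) -> int:
    """EPSG code of WGS84 / UTM: 326xx for northern zones, 327xx for southern zones."""
    return (32700 if southern else 32600) + zone


# Illustrative longitudes only (assumed, not taken from the project data).
for lon in (-67.0, -73.5):
    zone = utm_zone(lon)
    print(lon, zone, epsg_wgs84_utm(zone, southern=True))
```

A check of this kind only suggests a candidate system; the actual conversion of the layers still has to be carried out with the GIS projection tools.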
o Geometric / Topologic correction
Topology can be viewed as a spatial data structure used primarily to ensure that the
associated data forms a consistent and clean geometric fabric.
In geodatabases, the concept has evolved and considers an approach where topology is
a set of governing rules applied to feature classes that explicitly define the spatial
relationships that must exist between features.
Examples of the topologic and geometric corrections necessary to assure the
consistency with the conceptual definition of a certain feature and its geometric
representation are:
• For an area to be considered a polygonal feature, the line defining its
surrounding boundaries must be a perfectly closed line;
• The common boundary between two adjacent areas must be defined by lines
that match perfectly;
• When two lines, for instance two streams, intersect, the lines
representing them must be broken at the exact point of intersection.
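The first rule in the list above (a boundary must be a perfectly closed line) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: boundaries are plain lists of (x, y) tuples, and the snapping tolerance `snap_tol` is a hypothetical parameter; it does not reproduce the tools actually used in the project.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def is_closed(ring: List[Point]) -> bool:
    """True if the ring has at least four vertices and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def close_ring(ring: List[Point], snap_tol: float) -> List[Point]:
    """Return a closed copy of a boundary line.

    If the last vertex is a near miss (within snap_tol of the first vertex)
    it is replaced by the first vertex; otherwise a closing vertex is added.
    """
    if len(ring) < 3:
        raise ValueError("at least three vertices are needed to form an area")
    if is_closed(ring):
        return list(ring)
    (x0, y0), (x1, y1) = ring[0], ring[-1]
    near_miss = abs(x0 - x1) <= snap_tol and abs(y0 - y1) <= snap_tol
    body = ring[:-1] if near_miss else ring
    closed = list(body) + [ring[0]]
    if len(closed) < 4:
        raise ValueError("boundary is degenerate after closing")
    return closed


# Hypothetical lake bank digitized with a small gap at the end.
lake = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.01, 0.0)]
print(close_ring(lake, snap_tol=0.05))
```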
Other situations where topologic and/or geometric corrections are necessary are when
features are represented by a certain geometry that does not comply with the
specification of the selected data model. Examples of this situation are:
• Depending on the objectives, the original geometric precision and the
graphical display, sometimes a feature like a lake may be represented by a
simple line (not necessarily closed) that defines its banks. However, in the
ArcHydro data model, features such as lakes, swamps or water reservoirs
are represented as polygons;
• Many times, to produce better and more presentable printouts, features such
as climatic stations, gauge stations or production wells are represented as
small circular areas, when in fact, in the ArcHydro data model, they must be
converted into points.
As can be observed in the figure presented below, from Lake Poopó, in Bolivia, in the
original hydrography data file all the elements were represented by simple lines.
There were elements such as rivers, river banks, lakes, water reservoirs and water
springs.
Each line was afterwards classified and separated into different thematic layers; lines
representing water reservoirs and lakes were also corrected to form perfectly closed
areas.
Figure 8: Example of Bolivia, Lake Poopó: data processing of hydrography data.
Apart from these corrections and adjustments of topology, there was also a very
frequent situation where adjacent features and individual elements were present in
multiple, geographically adjacent files, which had to be “collated” together.
In these cases, some necessary procedures might be the merging of adjacent,
equivalent areas, or the elimination of pseudo-breaks in line segments that represent
the same element.
Figure 9 presents the original data for the river and stream features that were
available in the case of the Chili River (Peru). The 11 distinct files were collated
into a single, homogeneous base and pseudo-nodes were eliminated.
Figure 9: Example of Peru / Chili basin: The original river cartography base.
Also, since in this case the cartographic data was generated for map production and
printing, no intersections had been created where different river segments meet.
Those intersections had to be added to the lines.
Figure 10: Example of the original river’s cartography for the Chili River, before processing.
And finally, each line was geometrically corrected to ensure that the digitizing
directions would match water flow directions.
Figure 11: Example of the original river’s cartography for the Chili River, after processing.
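The direction correction described above can be sketched as follows. The report does not state how flow direction was determined for each line, so this minimal Python example assumes, purely for illustration, that an elevation is known at both ends of each line (for instance, sampled from a terrain model) and reverses the vertex order when the line was digitized uphill.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, elevation); elevation is an assumed input


def orient_downstream(line: List[Vertex]) -> List[Vertex]:
    """Return the line ordered from its higher end to its lower end."""
    if len(line) < 2:
        raise ValueError("a line needs at least two vertices")
    start_z, end_z = line[0][2], line[-1][2]
    # Water flows downhill: if the line was digitized uphill, reverse it.
    if start_z < end_z:
        return list(reversed(line))
    return list(line)


# Hypothetical river segment digitized from the mouth towards the source.
segment = [(0.0, 0.0, 100.0), (5.0, 2.0, 110.0), (9.0, 3.0, 120.0)]
print(orient_downstream(segment))
```

In flat reaches, where the two end elevations are equal or unreliable, a rule like this is not sufficient and the direction has to be checked manually.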
Topologic and geometric conformity of features to the data model is absolutely
required for database implementation.
Therefore, all the elements of the data layers integrated in the project’s database were
systematically analysed and all the necessary corrections were made.
In terms of priorities in data processing, and due to their obvious importance to the
construction of each basin’s database, the main efforts regarding geographic data
processing were focused on all files regarding themes such as river network, lakes,
shore lines, water reservoirs, irrigation channels, pipelines and other water transport
structures, water withdrawal and water discharge points, monitoring stations, drainage
features and topography (contour lines and spot heights).
o Table reformatting
Along with the geographic data representing the spatial distribution of features, there
are other types of information that are essential in a DSS for water resources
management.
One such case is time series: temporal and/or historical data of flow and water
quality measurements taken at gauges or monitoring stations, as well as climatic
variables.
Some of this data is measured and archived at regular intervals, such as annual,
monthly or daily series; others are collected at irregular intervals, like for instance,
water quality samples, that are collected occasionally.
One of the main aspects of working with temporal data is being capable of creating a
database that can be accessible to other independent water resources models.
Even if GIS interface and graphic capabilities have significantly improved, in most
cases the best work environment to analyse temporal data still remains the more usable
and friendly applications such as spreadsheets or specifically designed applications.
So, the TimeSeries component of the ArcHydro model is clearly a component that is
directed not so much to the GIS but to the more technical analysis that can be obtained
with hydrologic/hydraulic simulation models.
The main difficulties when working with time series result from the fact that we are
often dealing with very voluminous information. Also, there are frequently
problems related to data formats, such as units, data precision, detection limits and
date and time representation.
On the other hand, hydrologic models also use their own internal formats to work with
temporal data.
So, ArcHydro proposes a basic structure to store undifferentiated time series in a
unified system.
In the ArcHydro data model, each value stored in the table is characterised by three dimensions:
(i) the date/time at which the measurement was made; (ii) the type of variable being
measured and (iii) the spatial feature or location where it was measured.
In contrast with this very simple structure, the original data files that contain the
values to be migrated are normally present in many distinct formats.
In the case of the three basins considered, temporal data could be found in tables
inserted into reports, paper documents, in plain ASCII files or more structured
MsExcel files, in row-oriented or column-oriented tables, with a clear indication of the
station name or code at which the measurements were made or at locations that had to
be interpreted in order to relate them to the monitoring stations’ geographic location,
etc.
The work involved an initial analysis of many types of documents, to identify which
reports, which tables or which individual files contained this type of data.
Then, all the different tables, structured and organized differently, were systematically
imported into the ArcHydro’s time series tables, resulting in a uniform format for all
data, regardless of their origin, data provider or original format.
The automation of the procedures, when possible, was limited to groups of tables with
similar formats.
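The reformatting of a column-oriented source table into the uniform structure described above (one record per feature, variable type and date/time) can be sketched as follows. This is a minimal illustration and not the procedure actually used: the source layout, the station codes, the station-to-feature lookup and the variable code are hypothetical, and the field names FeatureID, TSTypeID, TSDateTime and TSValue are assumed here to be those of the ArcHydro time series table.

```python
import csv
import io
from datetime import datetime

# Hypothetical column-oriented source table: one row per date, one column per station.
SOURCE = """date,ST01,ST02
2006-01-01,1.2,0.8
2006-01-02,,0.9
"""

# Hypothetical lookup from station code to the identifier of the monitoring point.
STATION_TO_FEATURE = {"ST01": 1001, "ST02": 1002}
TS_TYPE_FLOW = 1  # hypothetical code for the variable "flow"


def wide_to_long(text, station_to_feature, ts_type_id):
    """Turn a date x station table into one record per (feature, type, date/time)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        when = datetime.strptime(row["date"], "%Y-%m-%d")
        for station, feature_id in station_to_feature.items():
            raw = (row.get(station) or "").strip()
            if not raw:  # skip missing measurements rather than storing a fake zero
                continue
            records.append({
                "FeatureID": feature_id,
                "TSTypeID": ts_type_id,
                "TSDateTime": when,
                "TSValue": float(raw),
            })
    return records


rows = wide_to_long(SOURCE, STATION_TO_FEATURE, TS_TYPE_FLOW)
for record in rows:
    print(record)
```

Each group of source tables with a similar layout would need its own small reader of this kind, which is consistent with the automation being limited to groups of tables with similar formats.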
Figure 12: Examples of some of the files and tables containing data for time series integration.
In this process of reformatting and integrating original tabular data into the standard
format, the files resulting from the paper-to-digital conversion by character
recognition applications were, of course, also included.
Figure 13: Example of the aspect of formatted time series table in ArcHydro data model
o Attributes normalization and data reclassification
As in many other projects of this type, the data providers can be of very
distinct nature.
Apart from some data generated by previous projects in which the teams from
CAMINAR participated, it was also possible to collect data produced by universities, consultant
companies and public entities such as institutes related to agriculture, water
management and cartography, among others.
Each of these entities had a specific purpose and the information they produce uses
criteria, data management systems and classification systems that are specific for their
own objectives.
It is common that a rivers and streams map produced by an entity related to
cartography and map production looks very different from a map produced by an
entity related to water management.
Also, classification systems are often very specific to a certain regional area
and need to be reworked so that they can be generalised and more easily integrated in
hydrologic simulation models that, in turn, use their own data classes and
scales.
This type of question is frequent when dealing with thematic information such as
soils, land use, land cover, geology, ecology, climatic regions, vegetation and others.
In many cases, although named differently, some data layers presented overlapping
objectives, but not uniform classification.
Such themes were produced with very varying spatial resolutions as well as using
classification systems that depended on the particular objectives and the nature of the
project they derived from and sometimes, with very specific local to regional
approaches.
Considering the future use of these types of information in hydraulic/hydrologic
modelling, some reviewing had to be done, so that we can achieve values and classes
that have a more meaningful and clear definition.
This work is currently under study and some normalization systems are
being analysed. The objective is to find some type of standard, well documented,
practical system that could be applied to the data.
An example under consideration is the case of thematic layers such as land cover, land
use and vegetation. From the preliminary analysis made of documents and files of
these data categories, we have concluded that the definitions and classes are unclear and
mixed.
For this particular situation, the definition of land use and land cover could be revised
using some standard systematization such as, for instance, that proposed by FAO. This
organization makes available on its website a guide (“Land Cover Classification
System Classification concepts and user manual Software version”) and freely
available software that aids the application of the proposed classification system.
Figure 14: Aspect of the official website of the GLCN Land Cover Topic Centre, and the software
available to assist in classification and normalization of land cover themes.
Other standard classification systems could also be analysed, such as the CORINE
Land Cover project. Also in this case, the proposed system is well described in public
documents, available through the internet, presenting the framework, objectives and
methodology.
The use of classes that can easily be related to FAO’s or CORINE’s systems is
frequent in modelling applications.
So, since the ArcHydro data model is flexible in the type of additional attributes
associated with each feature, we could have a multidimensional view of such
thematic layers, mixing local classification systems with clearer standard
classifications.
Nevertheless, at this moment, this reclassification requires more detailed work, in
documenting and understanding the definitions of the classes originally applied, so
that correct correspondences can be made.
Also, any proposed reclassification needs to be extensively discussed and validated by
local teams from WP2-4, using their knowledge of local reality to critically review
every option.
Another step in this data processing task is the data attributes normalization. This step
is intended to revise the original data structure applied in data production that was not
always adapted to the environment of relational databases.
Examples of this situation are cases where the classes of a certain thematic layer were described
by very long text attributes, directly associated with geographic elements.
In this situation, a lot of database space is allocated to store this
information, which is not a problem when dealing with small-scale projects
but has to be avoided in medium-to-large-scale projects.
Using relational database functions, the geodatabase model resolves this situation by
the use of domains. Attribute domains are rules that describe the legal values of a
certain field of the database and are used to constrain the values allowed in any
particular attribute for a table, feature class, or subtype.
Thus, the information regarding classes is not stored in the feature’s table but in other
tables, accessible through relational functionalities, significantly reducing the size of
the database and improving the time for data access.
Domains can simultaneously act as a legend and a validation tool for certain features
since only values registered in the feature’s domain will be accepted.
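The double role of a domain, as legend and as validation tool, can be illustrated with a small Python sketch. It only mimics the idea with a dictionary; the domain name, the codes, the class descriptions and the field name `LandCover` are hypothetical and do not come from the project database.

```python
# Hypothetical coded-value domain for a land cover attribute: code -> description.
LAND_COVER_DOMAIN = {
    1: "Irrigated cropland",
    2: "Shrubland",
    3: "Bare soil",
}


def invalid_features(features, field, domain):
    """Return the features whose value for `field` is not a legal domain code."""
    return [f for f in features if f.get(field) not in domain]


def describe(feature, field, domain):
    """Look up the legend text for a feature's code (the domain acts as a legend)."""
    return domain[feature[field]]


# Hypothetical features: only the short code is stored with each feature.
parcels = [
    {"HydroID": 10, "LandCover": 1},
    {"HydroID": 11, "LandCover": 7},  # 7 is not registered in the domain
]

print(describe(parcels[0], "LandCover", LAND_COVER_DOMAIN))
print(invalid_features(parcels, "LandCover", LAND_COVER_DOMAIN))
```

Storing the short code with each feature and the long description only once is what reduces the size of the feature tables.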
Another aspect in which this process of attribute normalization will be noticed is the
clarity of the information. Since much of the information was stored in long text
attributes, there were many variations in orthography, punctuation,
abbreviation and the use of accents. The use of domain values for identifying
these thematic features will eliminate the variability and errors in their description.
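Before text descriptions can be replaced by domain values, the variants of the same description have to be recognised as equal. A minimal sketch of such a clean-up step, using only the Python standard library, is given below; the sample label and the label-to-code mapping are hypothetical, and abbreviations are not handled here since they would still require a manually built correspondence table.

```python
import unicodedata


def normalise_label(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


# Hypothetical mapping from a normalised label to a domain code.
LABEL_TO_CODE = {"vegetacion arbustiva": 2}


def to_code(text: str):
    """Return the domain code for a free-text label, or None if it is unknown."""
    return LABEL_TO_CODE.get(normalise_label(text))


for variant in ("Vegetación  Arbustiva.", "VEGETACION-ARBUSTIVA", "vegetacion arbustiva"):
    print(repr(variant), "->", to_code(variant))
```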
o Cross-Populating - Mixing different sources
When building a spatial database using as a starting point a multiplicity of data layers
and files from different sources, it is probable that, in some cases, there will be more
than one file or layer about the same related subject.
Sometimes, these multiple files can be analysed together so that inconsistencies can be
detected and resolved.
Other times, since these data sets were produced by distinct data providers, they
contain different views of the same subject. They can then be combined to form a
much richer view of the theme.
When such situations occurred for a certain theme or layer, it was possible to
complement the more incomplete data sets with additional information from other
geographic layers or related tabular data.
In the example presented in figures 15 and 16, two data layers containing information
about land parcels were combined: one of the layers contained information about
owners and the other layer contained information about agricultural production.
Cross-population can be achieved by the use of common codes or unique identification
attributes, or by spatial analysis.
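Cross-population through a common code can be sketched as a simple attribute join. The example below is a minimal Python illustration, assuming that both layers share a hypothetical key field `parcel_id`; the field names and values are invented for the example, and the spatial-analysis alternative is not covered.

```python
def cross_populate(primary, secondary, key):
    """Left-join two lists of attribute records on a shared key field."""
    lookup = {rec[key]: rec for rec in secondary}
    merged = []
    for rec in primary:
        extra = lookup.get(rec[key], {})
        # Keep the primary values when both layers carry the same field name.
        merged.append({**extra, **rec})
    return merged


# Hypothetical attribute tables of two layers describing the same land parcels.
owners = [
    {"parcel_id": 1, "owner": "A"},
    {"parcel_id": 2, "owner": "B"},
]
production = [
    {"parcel_id": 1, "crop": "grapes"},
]

for record in cross_populate(owners, production, "parcel_id"):
    print(record)
```

Parcels without a match in the second layer keep only their original attributes, so incomplete coverage in one source does not remove features from the result.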
Figure 15: Example of cross-population: layer “Superficies_riego_elqui231205” and layer
“Panos_cultivados_elqui” contained different types of information about the same feature
(land parcels)
Figure 16: The resulting layer contains the aggregated data from both data sources.
2.1.5 Database implementation
Database implementation is a stage of the work through which data from different sources and
formats is put together in a single, uniform database structure, according to a predefined data
model and specified relations (geometric or alphanumeric) between the different hydro
features.
Through this stage of work, the final data repository is generated and prepared for its future
use with the several possible hydrologic modelling systems under analysis.
The choice of which type or which particular hydrologic model can be used is now possible in
an informed way, since we can effectively see if the required amount of data and parameters
is available for proper model implementation and calibration.
It is after this stage that we can establish a second phase of work together with teams from
WP2-4, in order to focus on model development according to their needs and expectations.
Next we describe some of the most important items addressed at this
implementation stage.
o The HydroID concept
The HydroID is an integer value that uniquely identifies every single feature in the ArcHydro
geodatabase.
As is usual in an RDBMS, each object is uniquely identified, in the table where it is stored, by a
code (generally numeric) that unequivocally identifies each object from the moment it
is created.
In the ArcHydro data model this concept is extended in such a way that each element in the
geodatabase is uniquely identified by a numeric code that is unique throughout the whole
database.
This means that, even if distinct hydrofeatures are stored in separate tables, the HydroID of an
element is never repeated in the database.
Because of the importance of this attribute within ArcHydro, it has to be managed very
carefully, through a pair of tables – the LayerKeyTable and the HydroIDTable.
Using the ArcHydro’s “Assign HydroID Tool”, each time a new element is created, a new
HydroID is assigned to it and the counter is updated so that this same code will never be
assigned to another element.
This unique feature code is used along the geoprocessing to identify the relationships between
elements, using parallel attributes. Here are some examples of this use of a relational
structure:
a) When the relations between distinct catchment areas are defined the attribute
NextDownID identifies, for each catchment, the HydroID of the next downstream
catchment;
b) In the HydroNetwork feature the NextDownID attribute indicates the HydroID of
the river segment that is downstream of the element, thus storing the direction of
water flow into the database itself;
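The idea of a database-wide counter and of relations stored as attributes can be sketched as follows. This is a simplified illustration, not the ArcHydro implementation: the real tool keeps its counters in the LayerKeyTable and HydroIDTable, whereas here a single in-memory counter is assumed, and the feature names are hypothetical.

```python
import itertools

# One counter for the whole database, so that codes never repeat across tables.
_hydro_ids = itertools.count(1)


def assign_hydro_id():
    """Return the next unused HydroID."""
    return next(_hydro_ids)


# Two separate feature tables draw their codes from the same counter.
catchments = {assign_hydro_id(): {"name": name, "NextDownID": None}
              for name in ("upper", "middle", "lower")}
river_segments = {assign_hydro_id(): {"name": "main stem", "NextDownID": None}}

# Store the downstream relation as data: each catchment points to the next one.
ids = sorted(catchments)
for upstream, downstream in zip(ids, ids[1:]):
    catchments[upstream]["NextDownID"] = downstream

# Follow the stored relation from the first catchment to its downstream neighbour.
first = ids[0]
downstream_name = catchments[catchments[first]["NextDownID"]]["name"]
```

Because the relation is held in the NextDownID attribute, any application that can read the table can follow it without repeating the spatial analysis.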
o Drainage analysis using digital elevation models
As rain falls on the Earth’s surface, the part that neither infiltrates into the soil nor
evaporates runs off the land surface, from the higher areas down to a stream, then into a
river and finally to the sea.
The shape of the land surface directs the flow of water to the valleys and is, in turn,
shaped by the erosive power of running water.
This intimate relation between the shape of the landscape and the stream network is expressed
by the drainage features.
Traditionally, drainage features such as dividing lines are therefore obtained by different
forms of analysis of topographic information.
In the ArcHydro data model, drainage areas are divided into three levels of representation:
a) Catchments – elementary drainage areas derived directly by applying physical rules to
the topographic model;
b) Watersheds – divisions of the basin into drainage areas defined for particular
purposes, where the outlet of a watershed is not necessarily coincident with a
catchment outlet point. These watersheds are generally water management units;
c) Basin – a set of drainage areas that can be approximately defined as the study
area, since its geographic definition is used as a basic water management unit.
The interest of using different definitions of the drainage system is directly related to
the interaction of ArcHydro with external modelling applications.
The concept of the catchment as the elementary drainage unit is easy to grasp because of its
direct relation to the shape and parameters of the landscape.
Nevertheless, the use of hydrologic models for water management and planning requires a
different geographical unit. In this situation hydrologic processes and water resources
issues are commonly analysed with distributed watershed models.
Watershed models require physiographic information such as the configuration of the channel
network, the location of drainage divides, channel length and slope, subcatchment geometry
and properties, etc.
Thus, the ArcHydro data model is adapted to multiple situations, since it already takes into
account different basic geographic units for modelling and analysis.
To generate drainage areas in an automated and reproducible way, digital elevation models
(DEM) are used.
To produce these models, topographic information is required, such as contour lines, spot
heights, natural sinks and sources, break lines, and others.
Although digital elevation models can be built as a triangulated irregular network (TIN),
for hydrologic analysis it is more common to use models formed by altitude values placed at
regularly spaced intervals, which define a grid or raster format.
The basic data required to generate these models are generally files produced for
topographic or cartographic purposes.
Normally this type of data is very voluminous, especially if we consider the average area of
the three basins being studied in this project.
The starting point is a file or set of files (representing adjacent map sheets of topographic
data that are merged together), which is edited in order to remove identifiable and/or
systematic errors. Such editing is very often needed, since the enormous number of contour
lines makes errors very likely, and some of them can only be identified after the DEM has
been produced.
One important parameter to define when building a raster DEM is the grid cell size. In the
vector data formed by the contour lines, the vertical interval between contour lines is
generally directly related to the map scale and to the geometric precision of the data.
When converting the vector data into raster data, the question of geometric precision is
transposed into the grid cell size.
Nevertheless, there is no unique relation between original map scale, contour interval and
grid cell size. Other factors may affect this transformation and must be considered.
The literature on the generation of digital elevation models includes some analysis of the
impact of the original topographic data, and of the purpose for which the DEM is intended,
when describing the constraints related to DEM spatial resolution.
However, it is very difficult to find literature that clearly proposes statistics or
parameters indicating the best grid cell size (or grid resolution) to use.
Authors like Hengl (2006) propose methods for defining the grid cell size that depend not
only on the spatial resolution and nature of the original data, but also on the morphology
and complexity of the terrain itself.
Other authors, for instance Maidment & Djokic (2000) or Wechsler (2006), refer to grid cell
sizes best suited for DEM resolution, but only from the hydrologic analysis point of view,
not taking into account that high resolution, reliable data is sometimes very difficult to
obtain or too expensive for some types of projects.
Nevertheless, there are some empirical methods that can be used. One possibility is to
generate several DEM with different grid cell sizes and then reverse the process, generating
contour lines from these DEM and comparing them with the original data.
In this way it is possible to arrive at a DEM that is neither so coarse that it produces low
resolution analysis, which would have a significant impact on the identification of major
drainage lines, nor so fine that altitude values are extrapolated and become unrealistic.
For the three basins considered for analysis, several types of topographic data were
available.
For instance, in the case of the Rio Elqui basin (Chile), there was a previously existing
TIN model, which was further extended to the Pan de Azúcar and Chacay areas by the inclusion
of contour lines.
The contour interval for this base information was, on average, 50 meters, and no other type
of topographic data was available. The resulting DEM was produced with a 50 meter
resolution, which, after several trials, proved to be the best achievable resolution.
CHILE – Elqui River DEM
Original topographic data: Contour lines
Contour lines vertical interval: 50 meters (some minor areas with 25 meters interval)
DEM cell size (x,y): 50 x 50 meters
DEM columns and rows: 3012, 2578
DEM Min value: 0 meters
DEM Max value: 6200 meters
Uncompressed file size: 29.62 MB
Table 4: Chile – Elqui River DEM statistics
Figure 17: Digital elevation model generated for the Elqui basin, with overlaying river network
Figure 18: Derived DEM information for Elqui river basin: slope grid (left) and aspect grid (right).
In the case of the Peru study area, in the Chili river basin, the original topographic data
was obtained from the national geographic institute and had very good resolution. It
consisted of 17 files of adjacent map sheets and themes containing contour lines, and 21
files with spot height information.
Thanks to the good quality of this data, it was possible to generate a DEM with a grid cell
size of 30 meters. This high resolution topographic model was very important for the
subsequent phases of work, namely the definition of drainage features from the DEM, because
the Peru area has a very complex topography, with very abrupt and narrow valleys, which
could only be represented in the model using a smaller cell size.
PERU – Chili River DEM
Original topographic data: Contour lines and spot heights
Contour lines vertical interval: 50 meters (some minor areas with 25 meters interval)
DEM cell size (x,y): 30 x 30 meters
DEM columns and rows: 7215, 5700
DEM Min value: 0 meters
DEM Max value: 6283.55 meters
Uncompressed file size: 156.88 MB
Table 5: Peru – Chili River DEM statistics
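The uncompressed file sizes in Tables 4 and 5 can be checked against the grid dimensions. The short calculation below assumes that each cell is stored as a 4-byte value and that 1 MB is 1024 x 1024 bytes; neither assumption is stated in the tables, but both reproduce the reported figures.

```python
def raster_size_mb(columns, rows, bytes_per_cell=4):
    """Uncompressed size of a raster in MB (1 MB taken as 1024 * 1024 bytes)."""
    return columns * rows * bytes_per_cell / 1024 ** 2


# Grid dimensions taken from Table 4 (Elqui) and Table 5 (Chili).
elqui_mb = raster_size_mb(3012, 2578)
chili_mb = raster_size_mb(7215, 5700)
print(f"Elqui: {elqui_mb:.2f} MB, Chili: {chili_mb:.2f} MB")
```

The same function shows how quickly storage grows when the cell size is reduced: halving the cell size over the same area multiplies the number of cells, and therefore the file size, by four.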
Figure 19: Digital elevation model generated for the Chili river basin, with overlaying river network
Figure 20: Derived DEM information for Chili river basin: slope grid (left) and aspect grid (right).
In the case of Lake Poopó, in Bolivia, some topographic data, namely contour lines, was
obtained. The possibility of obtaining new topographic data for a wider area is currently
being pursued, because this would make it possible to generate a DEM covering the total
geographic extent of the hydrographic network and thus to produce matching drainage
statistics.
Figure 21: Base data: contour lines for the Lake Poopó area and overlaying hydrographic network.
After producing the DEM for each basin, we can use this newly generated data layer to
produce other derived information layers, such as slope and aspect grids, and to obtain
physiographic parameters for the landscape.
We also need to use these layers to generate information and parameters about the drainage
system, both qualitatively and quantitatively.
In the hydrologic analysis of a certain area, we want to use the DEM to generate other types
of information such as:
a) Flow direction grid: a grid that indicates, for each cell in the grid model, the
direction of water flow along the direction of steepest descent;
b) Flow accumulation grid: a grid that records the number of cells that drain into
a particular cell, thus representing the area of accumulated flow at each point;
c) Stream definition: a grid that defines the individual streams, obtained by
considering that a drainage line becomes a stream when the contributing
drainage area exceeds a threshold.
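The three grids listed above can be illustrated on a very small DEM. The sketch below uses the common eight-neighbour steepest-descent rule (often called D8) in plain Python; it is a minimal illustration with hypothetical elevation values and is not the code used by the ArcHydro Tools.

```python
import math

# Offsets (row, column) of the eight neighbours of a cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_direction(dem):
    """For each cell, return the (row, col) of its steepest downslope neighbour, or None."""
    rows, cols = len(dem), len(dem[0])
    target = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are farther away than orthogonal ones.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        target[r][c] = (nr, nc)
    return target


def flow_accumulation(dem, target):
    """Count, for each cell, the number of cells that drain into it."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest, so upstream counts are complete
    # before they are passed on to the downstream cell.
    cells = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in cells:
        if target[r][c] is not None:
            tr, tc = target[r][c]
            acc[tr][tc] += acc[r][c] + 1
    return acc


def stream_grid(acc, threshold):
    """Mark as stream (1) every cell whose accumulation exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in acc]


# Hypothetical 3 x 3 DEM: a small valley sloping towards the bottom row.
dem = [[9, 8, 9],
       [7, 6, 7],
       [5, 4, 5]]
directions = flow_direction(dem)
accumulation = flow_accumulation(dem, directions)
streams = stream_grid(accumulation, threshold=2)
```

In this example all cells end up draining to the lowest cell of the bottom row, and only the two valley cells are classified as stream.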
However, to obtain realistic results in the generation of such grids, the digital elevation
model generated solely from the topographic data must sometimes be modified. This process
may be described as the generation of a Hydro DEM (DEM for hydrologic analysis).
Because of the particular configuration of the input data or of unidentified errors,
unrealistic features commonly appear in the DEM, thus affecting the derived drainage
features.
An example of this situation is the generation of artificial sinks (closed depressions in
the terrain) that do not actually exist. Consequently, the drainage features generated from
such a DEM would contain many endorheic drainage systems.
One type of processing of the original DEM towards a Hydro DEM is therefore the filling of
these artificial sinks. This procedure can be done in a semi-automated way, by first
identifying these features on the original DEM and then filling the artificial depressions
in the landscape.
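Sink filling can be sketched with a priority queue that floods the grid inwards from its border, raising every closed depression to the level of its lowest spill point. This is one well-known way of filling sinks, shown here on hypothetical values; the text does not say which algorithm the ArcHydro Tools apply, and the sketch fills every depression without distinguishing artificial sinks from real ones.

```python
import heapq


def fill_sinks(dem):
    """Return a copy of the DEM in which closed depressions are raised to their spill level."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    # Water can always leave through the grid border, so start from the edge cells.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A cell lower than the level it is reached from cannot drain: raise it.
                    filled[nr][nc] = max(filled[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


# Hypothetical DEM with a depression whose lowest outlet is the border cell of value 6.
dem = [[9, 9, 9, 9, 9],
       [9, 3, 2, 3, 9],
       [9, 3, 1, 3, 6],
       [9, 3, 3, 3, 9],
       [9, 9, 9, 9, 9]]
hydro_dem = fill_sinks(dem)
```

Comparing the filled grid with the original one gives the location and depth of each depression, which supports the identification step mentioned above.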
Another type of processing normally applied when generating a DEM for hydrologic analysis is
to force the imprinting of the river network onto the DEM.
Factors such as the positional quality of the topographic data, the spatial resolution and
the complexity of the landscape can produce a DEM that, when processed to generate the
stream grid (major drainage lines), gives poor results when compared to the vector river
cartography.
This situation can be solved by the use of methods such as the AGREE method (Hellweger,
1997), included in the ArcHydro Tools.
This method forces the effect of the river network on the drainage system by exaggerating
the river valleys in the DEM. As a result, this “excavated” DEM maximizes the slope defined
by the modified river channels, making drainage features converge to the actual river
network.
Figure 22: Surface Reconditioning by the AGREE method (Hellweger, 1997)
This type of processing produces very good results in areas where the resolution of the DEM
is unable to identify the river valley and also in flat areas, where the major drainage lines are
barely identified by the elevation models.
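The principle of imprinting the river network can be illustrated with plain "stream burning", in which the cells crossed by the mapped rivers are simply lowered by a fixed amount. This is a deliberately simplified stand-in: the AGREE method itself is more elaborate and is not reproduced here, and the grid, river cells and drop value below are hypothetical.

```python
def burn_streams(dem, stream_cells, drop):
    """Return a copy of the DEM with every mapped river cell lowered by `drop`."""
    burned = [row[:] for row in dem]
    for r, c in stream_cells:
        burned[r][c] -= drop
    return burned


# Hypothetical flat area where the DEM alone gives no drainage direction.
flat_dem = [[100.0] * 3 for _ in range(3)]
river = [(0, 1), (1, 1), (2, 1)]  # cells crossed by the vector river
hydro_dem = burn_streams(flat_dem, river, drop=10.0)
```

After burning, the cells next to the river have a lower neighbour on the river itself, so a steepest-descent rule now directs their flow towards the mapped channel.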
Figure 23: Comparison of the flow accumulation grid and the river network for the Chili river basin (Peru)
Figure 24: Comparison of the flow accumulation grid and the river network for the Elqui river basin (Chile)
This type of processing of the original DEM results in more accurate mapping of drainage
features and brings consistency to the various layers of information regarding the hydrologic
system.
Using this Hydro DEM grid, many drainage features can be derived from the landscape
model, and some parameters can be quantified:
- Catchment areas
- Drainage outlets and major drainage lines
- Watersheds
- Length of longest flow path inside each watershed
- Slope along major flow path
- Etc.
These are some of the features and values required as input by many hydrologic and hydraulic
simulation models.
Figure 25: Catchment areas and watershed areas for the Chili river basin (Peru).
Figure 26: Catchment areas and watershed areas for the Elqui river basin (Chile).
o Hydrographic network
As GIS evolved towards more elaborate data models, it became possible to use topological
networks in these systems.
A network is a system of interconnected elements, such as lines connecting points. Examples
of networks include highways connecting cities, streets interconnected at street
intersections, sewer and water lines that connect to houses, and also a river network, with
river segments connecting to each other in sequential order.
In a hydrographic network there are two types of elements: edges and junctions.
The edges are the elements through which the water flows (river segments).
Junctions are points that connect the edges (river intersections), or other points along the
system with a particular impact on the movement of water along the network (for example
water pumps, bridges, water discharge stations, hydrometric monitoring stations, etc.).
In the geodatabase view of ArcHydro’s structure, these features are named HydroEdge and
HydroJunction, respectively.
Connectivity is an inherent property, as it defines how the flow travels over the network.
Network elements, such as edges and junctions, must therefore be interconnected to allow
navigation over the network. Additionally, these elements have properties that control
navigation on the network.
In the case of a river network, one such property is the flow direction. Unlike traffic in a
street network, water flows in only one direction: downhill. This information must therefore
be added to the elements, for instance by defining that the digitizing direction of the
lines is the same as the flow direction.
The river network is the backbone of the system, describing the motion of the water through
the landscape. It connects catchments with their stream channels, describes the connectivity
of points along rivers and determines the ordering of flow as it passes from one river reach
to the next.
As previously mentioned, in the case of geometric network features, topologic and geometric
conformity of the elements to the data model is absolutely required. This means that, when
building the river network for the three river basins under study, each element of the
original files was corrected in order to resolve situations such as:
- Gaps (lines not exactly connected at intersections)
- Overshoots (small line segments that extend past the intersection point with another
river segment)
- Loops (lines that cross themselves, forming loops and thus generating circular,
undetermined flow patterns)
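A first screening for gaps and overshoots can be sketched by listing the line ends that do not coincide with the end of any other line. The function below is a hypothetical helper, not part of ArcHydro, and the coordinates are invented; it assumes that lines are already split at intersections. River sources and the outlet are also reported, so the result is a list of candidates to inspect rather than a list of errors.

```python
from collections import Counter


def dangling_endpoints(lines, ndigits=3):
    """Return the line ends that are not shared with the end of any other line."""
    ends = Counter()
    for line in lines:
        for x, y in (line[0], line[-1]):
            # Round the coordinates so that numerically equal ends compare as equal.
            ends[(round(x, ndigits), round(y, ndigits))] += 1
    return sorted(point for point, count in ends.items() if count == 1)


# Hypothetical river lines, each digitized in the direction of flow.
lines = [
    [(0.0, 0.0), (5.0, 0.0)],    # main river, upper reach
    [(5.0, 0.0), (10.0, 0.0)],   # main river, lower reach
    [(5.0, 5.0), (5.0, 0.2)],    # tributary that stops 0.2 units short: a gap
]
candidates = dangling_endpoints(lines)
```

Here the tributary end at (5.0, 0.2) is flagged together with the two sources and the outlet, while the properly connected node at (5.0, 0.0) is not.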
The original source data used to build these networks was very heterogeneous. In the case of
the Elqui river basin (Chile), there was only one file, with a considerable degree of
pre-processing, which had eliminated most of the editing errors and in which most of the
lines had already been built with the digitizing direction parallel to the water flow.
In the case of the Chili River (Peru), the original data consisted of 11 files, each
corresponding to a map sheet, produced by the national geographic institute. Those eleven
files had to be merged.
Also, since this information was produced for printing purposes, the cartographic base had
to be edited in its entirety in order to remove all the pseudo nodes at map sheet
boundaries, to correct gaps and overshoots, to create intersections where different line
segments met and to adjust the digitizing direction according to the water movement.
In the case of Lake Poopó, in Bolivia, the original data contained a mix of different line
types: rivers, lake boundaries, river banks, water springs, water reservoir boundaries,
irrigation channels, among others. Each line therefore had to be associated with its type,
and the remaining base, containing only river lines, was then processed in a way similar to
the Chili case.
River network statistics
Number of edges in the Elqui river (Chile) network: 937
Cumulated length of the Elqui river (Chile) network: 3 484 km
Number of edges in the Chili river (Peru) network: 3 906
Cumulated length of the Chili river (Peru) network: 9 898 km
Number of edges in the Lake Poopó (Bolivia) network: 1 091*
Cumulated length of the Lake Poopó (Bolivia) network: 4 930 km*
* - provisional version
Table 6: River network statistics for the three basin areas.
Once the network has been built, many types of specific analysis can be performed. In the
first place, we can perform analytical operations that graphically display the underlying
topological relations associated with it.
For instance, we can “navigate” in this network and, for a given point in the network,
identify which segments are upstream or downstream of that location:
Figure 27: Examples of network analyses: selecting lines upstream or downstream of a certain point.
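Upstream and downstream selection can be sketched directly from a table that stores, for each edge, the code of the next downstream edge. The network below is hypothetical and the functions are only an illustration of the principle; they are not the network analysis tools of the GIS.

```python
# Hypothetical network: HydroID of each edge -> HydroID of the next downstream edge.
# None marks the outlet of the network.
next_down = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}


def trace_downstream(next_down, hydro_id):
    """List the edges found by following the downstream relation from hydro_id."""
    path = []
    current = next_down[hydro_id]
    while current is not None:
        path.append(current)
        current = next_down[current]
    return path


def trace_upstream(next_down, hydro_id):
    """List every edge that eventually drains into hydro_id."""
    # Invert the relation once: downstream edge -> edges flowing into it.
    upstream_of = {}
    for edge, downstream in next_down.items():
        upstream_of.setdefault(downstream, []).append(edge)
    found, stack = [], [hydro_id]
    while stack:
        for edge in upstream_of.get(stack.pop(), []):
            found.append(edge)
            stack.append(edge)
    return sorted(found)


below_edge_1 = trace_downstream(next_down, 1)
above_edge_5 = trace_upstream(next_down, 5)
```

The downstream trace is a single chain because each edge has one downstream neighbour, whereas the upstream trace branches at every confluence.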
The advantages of network analysis are not limited to displaying the interconnectivity of
the elements on the screen.
The results of this type of analysis can also be stored in the database and used as input
for other types of applications.
In the HydroEdge feature, an attribute such as NextDownID stores, directly in the database,
the unique code of the connected downstream element. Other attributes hold consolidated
analysis results, such as the cumulated length of upstream or downstream segments, the
average slope of a certain portion of the hydrographic network, or others.
o Monitoring stations and temporal data
In the ArcHydro data model there is a group of features generically named HydroPoints.
These HydroPoints are divided into 7 types or classes.
a) First, three of these classes are Dams, Bridges and generic Structures. They
represent man-made or natural features whose presence affects the natural movement
of water.
b) Second, there are the WaterWithdrawal and WaterDischarge types. They represent
points on the river network where water is removed from or added to the system.
c) Third, there are the MonitoringPoints, which are locations on the network where the
water flow or quality is measured.
d) Finally, there are the UserPoints, which accommodate relevant points in the river
network that do not fit any of the other classes but are important for hydrologic or
purely cartographic use. An example is a point where the river crosses an aquifer
boundary.
The HydroPoints features thus provide a way to integrate our tabular, temporal data and
associate it with a specific location in the hydrologic system. This is achieved by relating
the time series tables of the ArcHydro data model to the MonitoringPoints.
As throughout the structure of ArcHydro, the relationship between a particular monitoring
station and the historical values measured at that location is made through the HydroID
unique code.
For instance, the flow values measured at a particular hydrometric station are identified in
the TimeSeries table by the FeatureID attribute which, in turn, corresponds to the HydroID
code of the geographic element representing the monitoring station.
This particular location and the corresponding set of flow values thus represent the
quantity of water that passes a specific point of the river network.
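The relation between a monitoring station and its measurements can be sketched as a lookup of TimeSeries rows by FeatureID. The field names FeatureID and HydroID follow the text; the other field names (Name, TSDateTime, TSValue) and all values are assumptions made for the illustration.

```python
# Hypothetical MonitoringPoints and TimeSeries rows.
monitoring_points = [
    {"HydroID": 501, "Name": "Station A"},
    {"HydroID": 502, "Name": "Station B"},
]
time_series = [
    {"FeatureID": 501, "TSDateTime": "2007-01-01", "TSValue": 3.2},
    {"FeatureID": 501, "TSDateTime": "2007-01-02", "TSValue": 2.9},
    {"FeatureID": 502, "TSDateTime": "2007-01-01", "TSValue": 0.7},
]


def values_for_station(hydro_id, time_series):
    """Return the (date, value) pairs whose FeatureID matches the station's HydroID."""
    return [(row["TSDateTime"], row["TSValue"])
            for row in time_series
            if row["FeatureID"] == hydro_id]


# Collect the measurements of every station, keyed by station name.
flows_by_station = {point["Name"]: values_for_station(point["HydroID"], time_series)
                    for point in monitoring_points}
```

Because the link is made only through the code, the measurements never need to carry coordinates of their own.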
However, there is a particular aspect to this situation.
In almost all GIS cases, when we plot a map with the location of monitoring stations and the
river network, we find that these points are generally not placed on the centre line of the
streams.
This common situation occurs because of two factors:
a) The coordinates used to locate a monitoring station, for instance a gauge station,
represent the location of the infrastructure or building in which the equipment is
installed, and these devices are installed on the river banks;
b) In other cases, the cartographic layers containing gauge positions and river lines
may have different geometric precision, so it is unlikely that they would match.
In many projects, one way of overcoming this problem has been to move the monitoring points
from their original location to a position on top of the river network.
The ArcHydro data model proposes another approach, which uses the relational properties of
the database and the possibility of permanently storing these relations in the database
itself.
This is achieved through the HydroJunction features of the river network.
As described before, these junctions are points along the river network that identify
particular features or events in the hydrologic system.
We can therefore add a junction to our network to identify the presence of a nearby
monitoring station.
That way, we can keep the MonitoringPoints at their original position and use a related
feature (HydroJunction) to account for its presence in the overall water movement.
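This approach can be sketched by computing the point of the river line closest to the station, creating a junction there and storing the junction's code on the station. The geometry helpers, coordinates and HydroID values below are hypothetical; only the use of a JunctionID attribute pointing to the HydroID of the junction follows the data model described here.

```python
def nearest_point_on_segment(p, a, b):
    """Return the point of segment a-b that is closest to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    # Position of the projection along the segment, clamped to its two ends.
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def nearest_point_on_line(p, line):
    """Return the point of a polyline that is closest to p."""
    candidates = [nearest_point_on_segment(p, a, b) for a, b in zip(line, line[1:])]
    return min(candidates, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


# Hypothetical river line and a gauge located on the river bank.
river = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
station = {"HydroID": 501, "xy": (4.0, 3.0), "JunctionID": None}

# The junction is placed on the river; the station itself is not moved.
junction = {"HydroID": 900, "xy": nearest_point_on_line(station["xy"], river)}
station["JunctionID"] = junction["HydroID"]
```

The station keeps its surveyed coordinates, while network analysis uses the junction, so neither positional accuracy nor connectivity is lost.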
Figure 28: Connecting MonitoringPoints to the river network (diagram labels: JunctionID, Relationship, HydroID).
Figure 29: Geodatabase view of the connection between river network, monitoring station and temporal
data (diagram labels: Network, Hydrography).
3 CONCLUSIONS
The present version of the GIS environment database is the combined work of the teams from
WP2-4, responsible for the data collection task, and of the team from WP6, responsible for
defining and implementing the procedures necessary to generate a database that supports
decision making more effectively.
Even though there was a considerable amount of supporting data about each of the three
basins under study, the original data sets came in an enormous variety of formats, media and
standards, which made it very difficult to have a global view of each case.
To ensure that the decision making process is practical, usable and reliable, all the data
sets concerning each basin, whether they concern ecology, topography, climate or
socio-economic factors, must be able to be analysed together.
The use of GIS and relational DBMS adds value to the original data sets, by migrating and
converting them into a normalized database that can simultaneously accommodate both
geographic and alphanumeric data.
This process enhances data quality through the application of validation procedures that detect
and correct inconsistencies, on individual elements or entire data sets, and by imposing a
framework for structuring the information, according to pre-defined topologic/geometric and
alphanumeric criteria.
This spatial database is more easily managed than the scattered original data sets and can
be easily used to promote continuous updates and upgrades of the information, thus improving
the decision process through the use of the most current and reliable information.
Also, the use of standard formats and systems allows this data repository to interact with a
variety of other applications, namely hydrologic and hydraulic modelling packages, such as
the various options that will be tested and evaluated during the second stage of our work
within the CAMINAR project.
Finally, and no less important, the work has been developed taking into consideration the
necessary transfer of this database, and of the facts behind it, to the final users.
Therefore, standard formats, public domain data models and well documented procedures
have been used, aiming to generate reproducible results.
4 REFERENCES
Felicísimo, Angel M. – 1994 – “Modelos Digitales del Terreno: Introducción y aplicaciones en las ciencias ambientales”. Pentalfa Ediciones, 122 pp.
Hellweger, F. – 1997 – “AGREE - DEM Surface Reconditioning System”. University of Texas, Website: http://www.ce.utexas.edu/prof/maidment/GISHYDRO/ferdi/research/agree/agree.html
Hengl, Tomislav – 2006 – “Finding the right pixel size”. Computers & Geosciences 32, 1283–1298
Keenan, P. – 2004 – “Using a GIS as a DSS Generator”. DSSResources.COM.
Maidment, D., Djokic, D. (ed.) – 2000 – “Hydrologic and hydraulic modelling support with geographic information systems”. ESRI Press, 216 pp.
Maidment, D. (ed.) – 2002 – “Arc Hydro: GIS for water resources”. ESRI, Redlands, CA
Wechsler, S. – 2006 – “Uncertainties associated with digital elevation models for hydrologic applications: a review”. Hydrology and Earth System Sciences Discussions 3, 2343–2384
ANNEX I
DATA COLLECTION GUIDELINES
1 OBJECTIVE
The purpose of this document is to define some guidelines for the collection of data and to
make data transfer between teams more efficient and clear.
In this initial phase of the project the objectives are to:
- Identify the types of information required to build a GIS database;
- Identify possible data sources for each type of information;
- Characterize and evaluate the data quality and availability;
- Identify the procedures necessary to integrate data from multiple types and sources
in a structured architecture.
In order to assess the various aspects of data quality and availability, the data collected,
in the form of paper maps, existing databases, digital cartography, lists, etc., should be
accompanied by a description of parameters that will help in the evaluation of its content
and of the steps and procedures necessary for its integration in the GIS.
To this end, a METADATA file should be filled in as completely as possible for each
document collected.
The type of information necessary to fill in the metadata file is described in the present
document.
The Metadata File (template) for each data type collected is presented in chapter 7 of the
present document.
2 SUMMARY
The first step in building the database is to collect as much information as possible on all
the natural or man-induced factors that can affect or be affected by the hydrologic systems,
for surface water or groundwater. This allows an extensive characterization of the river
basin, of the natural processes that define the way the water system works and of the
man-made interventions that might alter the natural processes and/or be affected by those
changes.
Following data collection, the second step is to use this information to build a database,
using a GIS (geographical information system) to integrate, validate and analyze many
different subjects and layers of information in a structured architecture.
Later, the information gathered and organized in the database can be used in simulation
models that will assist in the evaluation of parameters such as water resources, uses and
quality, and of the way different practices and factors will influence water availability
for natural or human use.
3 DATA TYPES
Helpful information for defining a database for watershed management may come from sources
that are very specific and well located in time and space, such as a time series of water
quality analyses collected at a monitoring station, or from information of a more general
type that is nevertheless useful for the characterization of the system, such as a
geological report containing pertinent information on the structural geology or a summary
statistical analysis of the climate of the basin.
Nevertheless, the more detailed and rich the information is, the more realistic the
simulation models can be.
4 DATA SUPPORTS
Data can be collected on both paper and digital media.
Depending on the nature of the data contained in each medium, it can later be converted or
integrated in the GIS database, or used as general parameters in the simulation models.
Examples of data collected in paper formats are:
- Reports containing pertinent information for watershed management, such as a
geological report or information regarding statistics for water use in the basin;
- Maps that do not exist in digital format and that can later be digitized for integration in
the GIS;
- Lists of values, for instance climate time series or water quality analyses that don’t
exist in digital format.
Whenever possible, documents that only exist in paper formats should be converted to digital
supports, locally. This is particularly critical for paper maps, which are difficult to copy
and lose a great deal of quality from copy to copy.
If possible, maps existing originally on paper should be converted to vector maps or, if
that is not possible, to images obtained with large-format, good resolution scanners.
Data in digital formats can range from very well structured data, already integrated in
existing information systems such as an existing GIS or database, to tables of values,
digital reports and text files with diverse formats and structures.
Examples of data collected in digital formats are:
- GIS layers/themes containing both geographical and alphanumerical information;
- MsExcel files with lists of values, for instance data collected at water wells;
- Database files, such as MsAccess or Dbase;
- Computer Assisted Design (CAD) files, containing cartographic information.
Below is a list of possible data formats that can be used for future integration in the
project’s database:
Paper documents
- Reports, theses, articles, etc.
- Maps
- Lists
Digital documents
• For data containing geographical and non-geographical information:
  - CAD files such as AutoCAD (*.dwg, *.dxf), Microstation 2D or 3D (*.dgn)
  - GIS formats such as ArcInfo coverage (*.e00), ArcView shapefiles (*.shp), ArcGis personal geodatabases (*.mdb), Mapinfo (MIF/MID), etc.
  - Databases such as MsAccess (*.mdb) or Dbase (*.dbf)
  - Tabular data containing X,Y coordinates associated with other values and information, such as MsExcel files (*.xls, *.csv), ASCII files (*.txt, *.prn), GPS files, etc.
  - Satellite images or orthophotomaps (*.Tiff, *.Tiff with World File, GeoTiff, *.ECW, *.Sid, BIL, BIC)
• For other data without information related to its geographical location:
  - ASCII lists, MsExcel files, MsAccess or Dbase database files, etc.
  - Other lists or tabular data contained in reports or articles in digital format.
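As an illustration of how the format list above might be used when files arrive, the following minimal Python sketch maps a file extension to the candidate formats named in the list. The dictionary and the function name `candidate_formats` are hypothetical helpers, not part of the project's tools. The example also shows why the extension alone is not enough: *.mdb is listed both as an ArcGis personal geodatabase and as an MsAccess database.

```python
import os

# Extension -> candidate formats, transcribed from the list in this section.
# The structure and names are illustrative assumptions only.
FORMAT_BY_EXTENSION = {
    ".dwg": ["AutoCAD"],
    ".dxf": ["AutoCAD"],
    ".dgn": ["Microstation 2D or 3D"],
    ".e00": ["ArcInfo coverage"],
    ".shp": ["ArcView shapefile"],
    ".mdb": ["ArcGis personal geodatabase", "MsAccess database"],
    ".dbf": ["Dbase"],
    ".xls": ["MsExcel"],
    ".csv": ["MsExcel (comma separated)"],
    ".txt": ["ASCII file"],
    ".prn": ["ASCII file"],
    ".ecw": ["ECW image"],
    ".sid": ["Sid image"],
}


def candidate_formats(filename):
    """Return the formats that a file name could correspond to (may be empty)."""
    extension = os.path.splitext(filename)[1].lower()
    return FORMAT_BY_EXTENSION.get(extension, [])


ambiguous = candidate_formats("wells.MDB")
unknown = candidate_formats("notes.xyz")
print(ambiguous)
print(unknown)
```

When more than one candidate is returned, the producer and software version recorded in the data documentation are needed to decide how the file should be processed.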
5 DATA DOCUMENTATION
To integrate each type of information into the database system, some additional
information on the nature, type and source of the data may be required.
Each document, layer or file collected should therefore come with a complete description
of its origin and of the nature of its content.
In general, the additional information about the collected data can be grouped under the
following main topics:
a. Data Source
It is fundamental to know where each piece of information or document came from, who
produced it, when and for what purpose.
For instance, a report on water quality produced by an agricultural institute would take a
different approach from a water quality study made to determine drinking water
requirements. Both sources can hold important information from the point of view of a
watershed management system, but they must be evaluated and processed differently before
being integrated into a single structured database for the whole basin.
The date on which the information was produced can also be relevant. An example would be a
very outdated land use map in which the areas under agricultural activity are much smaller
than at present. This must be known in the modelling environment so that the resulting
underestimation of impacts from agricultural use can be accounted for.
b. Data Content
Each file or document may contain different types of information with different uses in the
database structure.
The content of each file should therefore be described as completely as possible, so that the
steps needed to integrate it into the system can be correctly evaluated.
For instance, a Digital Elevation Model produced in a GIS, should be accompanied by a brief
description of its source, who produced it, when, using what kind of base information
(contour lines, radar information, etc.), an indication of its format, of the software version, of
the cell resolution, etc.
An MsExcel file containing a time series of values collected at a meteorological station
should be accompanied by a description of the number of columns present in the file and the
name of each column.
A CAD file should preferably be accompanied by a description of its content, the name and
number of layers present, the content of each layer, the scale or resolution, the version of the
software, the source or producer, the type and format of the information presented in each
layer, etc.
c. Software versions
Files produced with the same software but in different versions may need different
procedures in order to be integrated in the database, so it is very helpful to know the version
of the software used in each case.
For instance, a Microstation (CAD) file needs to be processed differently in the GIS
depending on whether it is of 3D or 2D type; personal geodatabases (*.mdb) produced with
ArcGis 8.3 and ArcGis 9 have substantial differences in their structure and therefore require
different approaches.
So, whenever possible, the software version should be indicated for every kind of digital theme.
d. Geographical Information
Most of the information required to build a database for watershed management depends on
the spatial position of each phenomenon, so most of it comes from documents, on paper or in
digital form, that record the geographical position of the information.
Such documents can be paper maps, GPS files with data collected in the field, lists of data
associated with X,Y coordinates, digital maps in GIS or CAD formats, etc.
X,Y information has its own standards, so a projection system or coordinate system must
be defined for the geographical database.
Since different data producers may not use the same coordinate or projection system, the
information gathered for the basin may arrive in different coordinate systems and require
subsequent conversion into a single system.
To make that conversion possible, the original coordinate system used to produce each digital
map, list of X,Y values or other item containing geographical information must be documented
and described.
The questions that need to be answered are:
• Are coordinates presented in rectangular (Cartesian X,Y) or geographical values
(Latitude, Longitude, in decimal values or in degrees)?
• What is the projection system? What are its parameters: datum, projection name,
ellipsoid, major and minor axes, false easting or false northing, etc.?
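The distinction between geographical coordinates "in decimal values or in degrees" matters in practice, because the two notations must be reconciled before any projection is applied. The short Python sketch below converts degrees, minutes and seconds to signed decimal degrees. The function name `dms_to_decimal` and the sample values are illustrative assumptions; the sketch does not perform any datum or projection conversion.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter (N, S, E, W)
    to signed decimal degrees. South and West are returned as negative values."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Illustrative values only, not taken from any project dataset.
latitude = dms_to_decimal(29, 54, 0, "S")
longitude = dms_to_decimal(71, 15, 0, "W")
print(round(latitude, 4), round(longitude, 4))
```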
e. Alphanumeric Information
Many different files containing dispersed information will probably be required to obtain a
rich characterization of the hydrologic system.
Each file or document may come from a different source, have been produced for a different
purpose and at a different date, so we must be prepared for an enormous diversity of ways of
representing the information.
In many of these files, tables, lists and documents, measurements and values are presented
in ways that are clear in their original context but may be difficult to interpret outside it.
Also, in many cases, people tend to work with short names, aliases or abbreviations, which
are normally friendlier and simpler to use.
Therefore, whenever a table, a list of values or the columns of a database associated with a
geographical theme is collected, a complete description of each column or heading name and
its meaning should be obtained as often as possible.
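One simple way to keep track of such descriptions is a column dictionary delivered alongside each table. The Python sketch below is a minimal illustration; the column names, descriptions and the helper `undocumented_columns` are hypothetical and are not taken from any project file.

```python
# Hypothetical column dictionary for a table of well measurements.
COLUMN_DICTIONARY = {
    "ID": "Well identifier used by the data producer",
    "X": "Easting in the documented coordinate system",
    "Y": "Northing in the documented coordinate system",
    "EC": "Electrical conductivity of the water sample",
}


def undocumented_columns(header, dictionary):
    """Return the column names in a table header that have no description yet."""
    return [name for name in header if name not in dictionary]


header = ["ID", "X", "Y", "EC", "NE"]
missing = undocumented_columns(header, COLUMN_DICTIONARY)
print(missing)
```

Any name returned by the check (here the abbreviation "NE") is one whose meaning should be requested from the data producer before integration.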
f. Data formats and units
For some kinds of data, additional information may prevent later misinterpretation or
misreading due to ambiguities.
For instance, the way dates are written can vary greatly, so each file or document containing
dates should indicate its format, for example Day-Month-Year, Year-Month-Day or another.
In the case of numeric values, the same parameter may be expressed in more than one unit
or measurement system, depending on purpose, analytical technique, time of data
collection, etc.
So, whenever possible, numerical fields or values present in lists, tables or databases should
indicate the units that apply in each case.
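The date ambiguity can be made concrete with a short Python sketch. The same text is parsed under two declared formats and yields two different dates, which is why the format must be documented with the file. The format labels and the helper `parse_date` are illustrative assumptions; "Month-Day-Year" is added here only as an example of the "other" formats that may be encountered.

```python
from datetime import datetime

# Declared format label -> strptime pattern (labels are illustrative).
DATE_FORMATS = {
    "Day-Month-Year": "%d-%m-%Y",
    "Month-Day-Year": "%m-%d-%Y",
    "Year-Month-Day": "%Y-%m-%d",
}


def parse_date(text, declared_format):
    """Parse a date string using the format declared in the file's documentation."""
    return datetime.strptime(text, DATE_FORMATS[declared_format]).date()


as_dmy = parse_date("03-04-2007", "Day-Month-Year")   # 3 April 2007
as_mdy = parse_date("03-04-2007", "Month-Day-Year")   # 4 March 2007
print(as_dmy.isoformat(), as_mdy.isoformat())
```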
g. Data classes, data codes, legends and data ranges
For many types of information it is common to use data classified according to some
kind of coding system.
In many cases, the classification follows a standard that is well documented and easily
accessible, but in others, locally defined classes are used.
An example of the first case is the land use classification defined by FAO, which can be
applied in many different areas of the world in a uniform and standardized manner. An
example of the second case is the lithological or stratigraphic classification used in
geological maps, which depends on legends developed independently by different countries.
So, in cases where information on a particular aspect of the study area is represented by a
classification system, by a pre-defined set of numeric or alphanumeric codes, by classes
ranges or by a particular legend, such information should be delivered along with the
documents or files regarding it.
Also, a description of the methodology or criteria used to define each class or code is of
importance.
6 DATA CATEGORIES
The following list is not intended to be an exhaustive list of all the data to be collected and integrated in the project’s database. It is a list of the
main categories of data that can be used to obtain a more realistic description and characterization of the basin’s model.
The list of categories and associated themes should therefore be read as “find as much information as possible related to…” each one of these subjects.
Depending on each basin’s data availability, currency, readiness, quality, etc., adaptations will have to be made to account for local conditions.
Topography
• Contour lines: 3D lines, 2D lines with associated elevation texts
• Spot Height: 3D points, 2D points with associated elevation texts, 2D texts, etc.
• Digital Elevation Models in raster (grid) format: With information on the methodology used in its production and cells size resolution and vertical scale (if applied)
• Digital Elevation Model in TIN format: With information on the methodology used in its production and vertical scale (if applied)
Administrative boundaries, Territory administration
• Administrative units or political boundaries: Polygon delimitation of “Provincias”, “Regions”, etc.
• Urban areas: Main urban areas delimitation as polygons or points of approximate location for main urban areas and respective identification (Ciudades, Localidades, pueblos, etc)
Demography
• Demographic information: Geographic distribution of people associated with the basin’s area, the administrative units or the urban areas.
Basin
• Basin delimitation: If exists, polygon delimitation of the basin
• Sub-basin delimitation: If exists, polygon delimitation of sub-basins.
Geology
• Geological maps: Digital geological maps with associated legend and class description; paper geological maps and related documents, such as lithological and structural description of the area
Soils maps
• Soils maps: Digital or paper maps of soils, with associated legend, and class’s description, characterization and properties or other parameters such as thickness, texture, composition, etc.
Land Use / Land Cover
• Land Use: Digital or paper maps describing arrangements, activities and inputs people undertake on the basin’s area, with associated legend, type of classification used and class description, characterization and properties
• Land Cover: Digital or paper maps of the observed biophysical cover of the territory in the basin’s area, with associated legend, type of classification used and class description, characterization and properties
Climate
• Meteorological stations location: Digital or paper maps or X,Y list with meteorological station location
• Meteorological time series: Time series of data collected at meteorological stations with or without known location (rainfall, temperature, evaporation, etc.)
• General meteorological data: General parameters or statistical records for meteorological characterization of the basin’s area, for example, cumulative precipitation on the basin area for past years, temperature information, actual or potential evaporation, etc.
• Precipitation grids or estimations: Models for average precipitation over the basin area that might have been produced, with information on the model’s generation.
Ecology, environment and natural heritage
• Natural parks and reserves: Polygon delimitation of natural parks
• Protected habitats and/or landscapes: Delimitation of areas of special interest for natural preservation, protected habitats, endangered species, etc.
Economic activities
• Agriculture Infrastructures: Irrigation areas and perimeters; irrigation channels
• Agriculture practices: Crop types and agriculture practices
• Industry: Location of main industrial areas with potential effect on dispersed contamination of soils and water, and possible list of components being discharged; location of isolated industries such as heavy industries, chemical industries and other industrial activities with known pollutant discharges
• Mining activities: Concession or permit area; mine’s plant or main features, water discharge points or areas, minerals extracted, etc.
• Energy Sources: Hydroelectric, thermal or other power stations
Surface hydrology
• Streams network: Stream line network for the water system in the basin
• Water surfaces: Lakes, natural or man-made water reservoirs, dams, broad river banks with associated information such as name, water volume (measured at a certain date or maximum capacity), water level (measured at a certain date or maximum level), etc.
• Coastal or shore line: Shore line
• Infrastructures: Bridges, dams, channels, pipelines; other associated information such as historical values of water discharges at each dam, volumes of water by use, etc.
• Water quality monitoring stations: With associated values measured at each location and, if possible, with historical values (time series) for chemical and physical parameters.
• River gauging / monitoring stations: With associated information such as mean annual level, minimum and maximum values and dates, river flow measurements, etc.
• Water supply systems: Intakes, volume, use, etc.
Groundwater
• Aquifers delimitation: Cartographic delimitation of aquifers or geological formations associated with aquifers.
• Aquifers geometry: Measured top and bottom of aquifer formations, estimations or models of aquifer geometry and information regarding the model’s generation, aquifer thickness, lithological logs, etc.
• Groundwater wells: Municipal, industrial or agricultural boreholes with associated information, if available, such as aquifer name, rock formation, total depth, transmissivity, measured water levels, etc.
• Groundwater sources: With associated information, if available, such as aquifer, geological formation of outcrop, flow, etc.
• Groundwater quality: Groundwater analyses or time series of analyses collected at groundwater wells and sources, with information regarding various chemical and physical parameters.
Water use activities, regulatory requirements
• Water quality / quantity: General or statistical information on water quality in the basin’s area or part of the area, if possible classified according to potential use (irrigation, drinking, etc.); general or statistical information on water use in the basin’s area, with estimations of volumes used by economic sectors or geographical regions, etc.
• Pollution sources: General or statistical information regarding diffuse or point pollution sources from industrial discharges, water treatment discharges, domestic discharges, agriculture runoff, chemical or effluent storage areas, etc.
• Water treatment systems: For waste or clean water, with information on capacity, mean or annual flow rates, etc.
7 METADATA (an MsExcel file with the metadata structure will be delivered for use)
Document Name / File Name:
Data Title:
Data Category:
Data Summary Description:
Data Source / Producer:
Purpose:
Year of production:
Year of last update:
Class of spatial representation (choose one of the following): NONE; VECTOR; TIN; IMAGE; GRID; LIST w/ X,Y
Product Format (choose one of the following):
• ArcInfo (*.e00); ArcView (*.shp); ArcGis (*.mdb); MapInfo (MIF/MID); Other GIS Format: ____________
• AutoCAD (*.dxf); AutoCAD (*.dwg); Microstation 2D (*.dgn); Microstation 3D (*.dgn)
• MsAccess (*.mdb); Dbase (*.dbf); MsExcel workbook (*.xls); MsExcel comma separated (*.csv); ASCII file
• TIFF Image (*.tif); Image w/ World file; Image GeoTiff (*.tif); Image ErMapper (*.ecw); Image Mr.Sid (*.sid); Other image format: ____________
• Other file format: ____________
Product format version: Distributor: ____________; Version (number or year):
Scale / Spatial Resolution:
Projection System
  Projection Name:
  Datum:
  Projection parameters:
Coordinate System (choose one of the following): Rectangular coordinates (X,Y); Geographical coordinates (Lat / Long)
  Ellipsoid:
  Ellipsoid parameters:
Themes or layers (for GIS or CAD files)
  Name | Description | Type of elements
Associated attributes (for GIS files, ASCII, MsExcel, MsAccess, Dbase or other lists and tabular data)
  Attribute Name | Description | Format | Units | Class or Domain
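For teams that prefer to check metadata programmatically before delivery, the fields of this section can be mirrored in a small record type. The Python sketch below is only an illustration under stated assumptions: the class `MetadataRecord`, its field names and the sample values are hypothetical, and only a subset of the form is modelled. The allowed classes of spatial representation are the six listed in the form.

```python
from dataclasses import dataclass
from typing import Optional

# The six classes of spatial representation listed in the metadata form.
SPATIAL_CLASSES = ("NONE", "VECTOR", "TIN", "IMAGE", "GRID", "LIST w/ X,Y")


@dataclass
class MetadataRecord:
    """Hypothetical subset of the metadata form, for illustration only."""
    file_name: str
    data_title: str
    data_category: str
    source_producer: str
    spatial_class: str
    product_format: str
    year_of_production: Optional[int] = None
    year_of_last_update: Optional[int] = None
    projection_name: Optional[str] = None
    datum: Optional[str] = None

    def __post_init__(self):
        # Reject values outside the list offered by the form.
        if self.spatial_class not in SPATIAL_CLASSES:
            raise ValueError(f"unknown class of spatial representation: {self.spatial_class}")

    def missing_fields(self):
        """Names of optional fields that have not been filled in yet."""
        return [name for name, value in vars(self).items() if value is None]


record = MetadataRecord(
    file_name="stations.xls",
    data_title="Meteorological stations",
    data_category="Climate",
    source_producer="(hypothetical producer)",
    spatial_class="LIST w/ X,Y",
    product_format="MsExcel workbook (*.xls)",
    year_of_production=2007,
)
print(record.missing_fields())
```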
8 DATA SHARING
A protocol for data transfer should be defined to minimize the time needed to access data,
costs and administrative procedures.
FTP
• Easy and fast for digital data
• User access through authentication (user name and password)
• Available all the time
• Limited to small and medium-sized files
• Dependent on the internet speeds available at each site
As soon as the procedures necessary for establishing the FTP area on the server are complete
and the user names and passwords are defined, they will be communicated to the person
responsible for each team.
CD-ROM or DVD
• For larger digital data, the use of FTP for data transfer may be too slow or even impossible.
• Large data files can be recorded on media such as CD-ROM or DVD.
PAPER DOCUMENTS
• In those cases where digital conversion of paper documents (maps, reports, listings
and others) cannot be performed locally, the paper documents, preferably good-quality
copies, can also be sent by normal courier services.
9 PROPOSED SYSTEM ARCHITECTURE FOR BUILDING THE DATABASE
In order to build the system’s database that will later interact with the modelling applications,
a GIS will be used to assist in the different tasks. The proposed architecture, with all
the required functions and tools, is presented in the following table:
Software / Application | Functionalities | Licence mode
ArcEditor | Building the database: data edition, conversion, integration, topological validation. | Single-user
Spatial Analyst | Raster-based spatial modelling and analysis | Single-user (ArcGis extension)
Geostatistical Analyst | Analysis of data’s spatial correlation, global trends, prediction tools and surface creation | Single-user (ArcGis extension)
ArcHydro for watershed management | Data processing and interface with basin analysis models | Public domain
ArcHydro for groundwater management | Data processing and interface with groundwater models | Public domain
MsAccess | Database management system | Single-user (integrated in MsOffice professional or later version)
ANNEX II
Basic ArcHydro data model framework
Component: Hydrography
Table: Bridge
Feature geometry: Point
Description: A bridge is a structure that allows passage over an obstacle.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -

Table: Dam
Feature geometry: Point
Description: Is a structure that creates an artificial lake or reservoir, by blocking a river or stream.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -

Table: HydroLine
Feature geometry: Line
Description: Line features important for cartographic representation, not contained in the Hydro Network feature.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
Shape_Length | Double | 8 | -
Annex I1 – ArcHydro data model framework
Page 1 of Annex II
GIS environment database and report
Table: HydroResponseUnit
Feature geometry: Polygon
Description: Describes the response of the terrain (geology, soil, land cover, etc.), for surface water balance accounting.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
AreaSqKm | Double | 8 | -
Shape_Length | Double | 8 | -
Shape_Area | Double | 8 | -

Table: MonitoringPoint
Feature geometry: Point
Description: Location of stations that measure water quantity or quality.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -

Table: Structure
Feature geometry: Point
Description: Point features that might have significant impact on the water flow but aren't included in the bridge or dam classes (e.g. a waterfall).
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -
Table: UserPoint
Feature geometry: Point
Description: Point of particular interest to the user, not included in any other feature classes (may include locations where rivers cross aquifer or political or administrative boundaries).
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -

Table: Waterbody
Feature geometry: Polygon
Description: Represents water bodies such as lakes, bays and estuaries, water reservoirs, swamps, etc.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
AreaSqKm | Double | 8 | -
JunctionID | Long Integer | 4 | -
Shape_Length | Double | 8 | -
Shape_Area | Double | 8 | -

Table: WaterDischarge
Feature geometry: Point
Description: Points where flow is added to the stream network.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -
Table: WaterWithdrawal
Feature geometry: Point
Description: Points where flow is diverted or pumped from the stream network.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FType | Text | 30 | -
Name | Text | 100 | -
JunctionID | Long Integer | 4 | -

Component: Drainage
Table: Basin
Feature geometry: Polygon
Description: Packaging unit; administrative unit for drainage analysis; ≈ study area.
Attribute | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
DrainID | Long Integer | 4 | -
AreaSqKm | Double | 8 | -
JunctionID | Long Integer | 4 | -
NextDownID | Long Integer | 4 | -
Shape_Length | Double | 8 | -
Shape_Area | Double | 8 | -
Table: Catchment
Feature geometry: Polygon
Description: Elementary drainage areas, defined by physical rules applied to the landscape’s configuration.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
DrainID | Long Integer | 4 | -
AreaSqKm | Double | 8 | -
JunctionID | Long Integer | 4 | -
NextDownID | Long Integer | 4 | -
Shape_Length | Double | 8 | -
Shape_Area | Double | 8 | -

Table: DrainageLine
Feature geometry: Line
Description: Line of major accumulated flow for each catchment; drainage path.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
DrainID | Long Integer | 4 | -
Shape_Length | Double | 8 | -

Table: DrainagePoint
Feature geometry: Point
Description: Point representing the most downstream location within a catchment; outlet of drainage area.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
DrainID | Long Integer | 4 | -
JunctionID | Long Integer | 4 | -
Table: Watershed
Feature geometry: Polygon
Description: Drainage areas defined by subdivision of the landscape according to management criteria
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
DrainID | Long Integer | 4 | -
AreaSqKm | Double | 8 | -
JunctionID | Long Integer | 4 | -
NextDownID | Long Integer | 4 | -
Shape_Length | Double | 8 | -
Shape_Area | Double | 8 | -
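The drainage tables above carry both a HydroID and a NextDownID. Assuming, as in the published Arc Hydro data model, that NextDownID holds the HydroID of the next feature downstream, the chain of areas between any feature and the outlet can be traced with a few lines of Python. This is a minimal sketch: the IDs are invented, the function name `downstream_path` is hypothetical, and using `None` to mark the outlet is a convention chosen for this example only.

```python
def downstream_path(start_id, next_down):
    """Follow NextDownID links from start_id until a feature with no downstream
    neighbour is reached. next_down maps HydroID -> NextDownID (None at the outlet)."""
    path = [start_id]
    visited = {start_id}
    current = next_down.get(start_id)
    while current is not None:
        if current in visited:
            raise ValueError("NextDownID links form a loop at HydroID %s" % current)
        path.append(current)
        visited.add(current)
        current = next_down.get(current)
    return path


# Invented HydroID -> NextDownID pairs for four drainage areas.
watersheds = {101: 102, 102: 104, 103: 104, 104: None}
path = downstream_path(101, watersheds)
print(path)
```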
Component: Network
Table: HydroEdge
Feature geometry: Line
Description: Lines representing water flow along the network.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
Enabled | Integer | 2 | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
ReachCode | Text | 30 | -
Name | Text | 100 | -
LengthKm | Double | 8 | -
LengthDown | Double | 8 | -
FlowDir | Long Integer | 4 | HydroFlowDirections
FType | Text | 30 | -
EdgeType | Long Integer | 4 | HydroEdgeType
Shape_Length | Double | 8 | -
Domain: HydroFlowDirections
Code | Description
0 | Uninitialized
1 | WithDigitized
2 | AgainstDigitized
3 | Indeterminate

Domain: HydroEdgeType
Code | Description
1 | Flowline
2 | Shoreline

Table: HydroJunction
Feature geometry: Point
Description: Points representing important features in the network with impact on the water movement (dam, waterfall, etc.) or a particular event (monitoring point)
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
AncillaryRole | Integer | 2 | AncillaryRoleDomain
Enabled | Integer | 2 | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
NextDownID | Long Integer | 4 | -
LengthDown | Double | 8 | -
DrainArea | Double | 8 | -
FType | Text | 30 | -

Domain: AncillaryRoleDomain
Code | Description
0 | None
1 | Source
2 | Sink
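When attribute tables are read outside the GIS, coded-value domains such as the three listed above arrive as bare integers. One possible way to keep the codes readable is to mirror each domain as an enumeration. The Python sketch below does this; the class and member names are illustrative choices, while the codes and their meanings are those given in the domain lists.

```python
from enum import IntEnum


class HydroFlowDirections(IntEnum):
    UNINITIALIZED = 0
    WITH_DIGITIZED = 1
    AGAINST_DIGITIZED = 2
    INDETERMINATE = 3


class HydroEdgeType(IntEnum):
    FLOWLINE = 1
    SHORELINE = 2


class AncillaryRole(IntEnum):
    NONE = 0
    SOURCE = 1
    SINK = 2


def decode(domain, code):
    """Return the member name for a stored code; raises ValueError for codes
    that are not part of the domain."""
    return domain(code).name


flow_label = decode(HydroFlowDirections, 2)
role_label = decode(AncillaryRole, 1)
print(flow_label, role_label)
```

A code outside the listed values raises an error instead of passing silently, which helps to detect files that were produced with a locally modified domain.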
Table: HydroNetwork_Junctions
Feature geometry: Point
Description: Generic junctions (points) in the network.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
SHAPE | OLE Object | - | -
Enabled | Integer | 2 | -

Table: SchematicLink
Feature geometry: Line
Description: Simplified version of the network, presenting the general connectivity among water features.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FromNodeID | Long Integer | 4 | -
ToNodeID | Long Integer | 4 | -
Shape_Length | Double | 8 | -

Table: SchematicNode
Feature geometry: Point
Description: Nodes in the schematic network, representing connection between SchematicLinks and particular events or features (drainage outlets, catchment centroids, etc.)
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
FeatureID | Long Integer | 4 | -
Component: Channel
Table: CrossSection
Feature geometry: Line
Description: Is a 3D polyline feature, where each vertex in the line is defined by coordinates (x, y, z)
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
ReachCode | Text | 30 | -
RiverCode | Text | 30 | -
CSCode | Text | 30 | -
JunctionID | Long Integer | 4 | -
CSOrigin | Text | 30 | -
ProfileM | Double | 8 | -
Shape_Length | Double | 8 | -

Table: ProfileLine
Feature geometry:
Description: Longitudinal view of a channel, using lines drawn parallel to the stream flow.
Name | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
Shape | OLE Object | - | -
HydroID | Long Integer | 4 | -
HydroCode | Text | 30 | -
ReachCode | Text | 30 | -
RiverCode | Text | 30 | -
FType | Text | 30 | ProfileLineType
ProfOrigin | Text | 30 | -
Shape_Length | Double | 8 | -

Domain: ProfileLineType
Code | Description
1 | Thalweg
2 | Bankline
3 | Streamline
Component: Time series
Table: TimeSeries
Feature geometry: NONE (Tabular)
Description: Tabular data (values) resulting from temporal, historical data, measured at regular or irregular intervals.
Attribute | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
FeatureID | Long Integer | 4 | -
TSTypeID | Long Integer | 4 | -
TSDateTime | Date/Time | 8 | -
TSValue | Double | 8 | -

Table: TSType
Feature geometry: NONE (Tabular)
Description: Characterization of time series types by regularity, interval, units of measurement, etc.
Attribute | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
TSTypeID | Long Integer | 4 | -
Variable | Text | 30 | -
Units | Text | 30 | -
IsRegular | Long Integer | 4 | -
TSInterval | Long Integer | 4 | TSIntervalType
DataType | Long Integer | 4 | TSDataType
Origin | Long Integer | 4 | TSOrigins

Domain: TSDataType
Code | Description
1 | Instantaneous
2 | Cumulative
3 | Incremental
4 | Average
5 | Maximum
6 | Minimum

Domain: TSOrigins
Code | Description
1 | Recorded
2 | Generated
Domain: TSIntervalType
Code | Description
1 | 1Minute
2 | 2Minute
3 | 3Minute
4 | 4Minute
5 | 5Minute
6 | 10Minute
7 | 15Minute
8 | 20Minute
9 | 30Minute
10 | 1Hour
11 | 2Hour
12 | 3Hour
13 | 4Hour
14 | 6Hour
15 | 8Hour
16 | 12Hour
17 | 1Day
18 | 1Week
19 | 1Month
20 | 1Year
99 | Other
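To show how the TimeSeries and TSType tables of this component work together, the following Python sketch builds both in an in-memory SQLite database and retrieves the values recorded for one feature together with their variable and units. It is a minimal illustration, not the project's MsAccess implementation: the sample rows are invented, and the reading of FeatureID as the HydroID of the feature being described (for example a MonitoringPoint) is an assumption based on the published Arc Hydro data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TSType (
    TSTypeID   INTEGER PRIMARY KEY,
    Variable   TEXT,
    Units      TEXT,
    IsRegular  INTEGER,
    TSInterval INTEGER,   -- code from domain TSIntervalType
    DataType   INTEGER,   -- code from domain TSDataType
    Origin     INTEGER    -- code from domain TSOrigins
);
CREATE TABLE TimeSeries (
    FeatureID  INTEGER,   -- assumed to hold the HydroID of the described feature
    TSTypeID   INTEGER REFERENCES TSType(TSTypeID),
    TSDateTime TEXT,
    TSValue    REAL
);
""")

# Invented example: daily (17), incremental (3), recorded (1) rainfall in mm.
conn.execute("INSERT INTO TSType VALUES (1, 'Rainfall', 'mm', 1, 17, 3, 1)")
conn.executemany(
    "INSERT INTO TimeSeries VALUES (?, ?, ?, ?)",
    [
        (5001, 1, "2007-02-01", 0.0),
        (5001, 1, "2007-02-02", 12.5),
        (5002, 1, "2007-02-01", 3.0),
    ],
)

rows = conn.execute(
    """
    SELECT ts.TSDateTime, ts.TSValue, t.Variable, t.Units
    FROM TimeSeries AS ts
    JOIN TSType AS t ON t.TSTypeID = ts.TSTypeID
    WHERE ts.FeatureID = ?
    ORDER BY ts.TSDateTime
    """,
    (5001,),
).fetchall()
print(rows)
```

Keeping the variable, units and interval in TSType rather than in every TimeSeries row means the units are stated once per series type, in line with the recommendation of the data collection guidelines that units always be documented.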
HydroID Management Tables
Table: LAYERKEYTABLE
Feature geometry: NONE (Tabular)
Attribute | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
LAYERNAME | Text | 35 | -
LAYERKEY | Text | 35 | -

Table: HYDROIDTABLE
Feature geometry: NONE (Tabular)
Attribute | Type | Size | Domain
OBJECTID | Long Integer | 4 | -
LAYERKEY | Text | 35 | -
HYDROID | Double | 8 | -