Baltic Marine Environment Protection Commission
Project on Development of a HELCOM Pollution Load User System
Helsinki, Finland, 26-27 February 2014
PLUS 5-2014, 2-3
Document title: Information on the OSPAR RID database development activities
Code: 2-3
Category: DEC
Agenda Item: 2 – Information by Secretariat and Contracting Parties
Submission date: 27.2.2014
Submitted by: Chair of HELCOM LOAD
Background
In summer 2013, OSPAR HASEC initiated a project to prepare requirements for, and proposals on how
to develop, a modernized OSPAR RID database and to solve some issues with the present RID
database. The enclosed Annex 1, “Developing the Database for the Comprehensive Study on Riverine Inputs
and Direct Discharges (RID)”, is the result of the consultant's (QUODATA) analysis and contains a proposal
for a solution for the RID database. Annex 1 also includes a description of the present RID data model and
functionalities, the reporting templates, and the parameters that are reported.
The analysis assumes that the requirements selected as “need to have” by the OSPAR INPUT
group should be part of a modernized RID database. The prioritized requirements are
described in Annex 2 and can be divided into the following topics:
a. Data access
b. Data structure
c. Data import
d. Validation of imported data
e. Data export
f. General functionalities (filter, sort and search entries; customize the range of geographic areas;
and flag individual data with a short comment)
QUODATA describes and evaluates five solution scenarios for the 2016 RID database:
S1: Improve existing solution. Only one organization hosts the one active (i.e. “production”) version of the
database. The host handles tasks like import into the database.
S2: Improve existing solution. All CPs can import data, which is synchronized into the web version.
S3: Database is re-developed by means of a pure web solution, so that all functions can be used from a web
browser.
S4: Close IT cooperation with HELCOM. One joint database and all data are shared.
S5: Close IT cooperation with HELCOM. There are two separate databases, data are not shared.
QUODATA evaluates these scenarios on the assumption that the same nine recommended work packages
should be carried out for each solution, and has estimated the cost of the five
solutions with a precision of -30 to +60 %. The recommended work packages are:
g. WP 1: Confirmation of database model
h. WP 2: Choice of the database platform
i. WP 3: Definition of workflow roles and user responsibilities
j. WP 4: Concept for IT data integrity and security
k. WP 5: Development of quality assurance procedures for data validation
l. WP 6: Implementation of the functionalities
m. WP 7: Initial data migration and production environment
n. WP 8: Database testing and preparation for approval by OSPAR
o. WP 9: Preparing documentation
QUODATA assumes that 1 man-month costs 13,600 € (160 hours per month at 85 € per hour). The total
cost is estimated as:
S1: 23.8-27.3 man-months + maintenance for 5 years (102,400 €), in total 426,080-473,680 €
S2: 24.4-28.0 man-months + maintenance for 5 years (100,050 €), in total 431,890-480,850 €
S3: 40.4-42.3 man-months + maintenance for 5 years (94,610 €), in total 644,050-675,330 €
S4-S5: Not estimated, but at least as high as S3; some cost sharing with HELCOM should be expected, also
regarding the maintenance cost.
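The totals for S1 to S3 can be rechecked with a few lines of Python. This is only a sketch of the arithmetic stated above (man-months priced at 160 hours times 85 € plus five years of maintenance); the names `SCENARIOS`, `total_cost` and `cost_range` are introduced here for illustration. Computed this way, S1, S2 and the lower bound of S3 match the quoted totals, while the upper bound of S3 comes out at 669,890 € rather than the quoted 675,330 €, which would correspond to 42.7 man-months.

```python
# Figures taken from the text: 160 hours per man-month at 85 EUR per hour.
HOURS_PER_MAN_MONTH = 160
RATE_EUR_PER_HOUR = 85
MAN_MONTH_COST = HOURS_PER_MAN_MONTH * RATE_EUR_PER_HOUR  # 13 600 EUR

# (low, high) effort in man-months and 5-year maintenance in EUR, as quoted.
SCENARIOS = {
    "S1": ((23.8, 27.3), 102_400),
    "S2": ((24.4, 28.0), 100_050),
    "S3": ((40.4, 42.3), 94_610),
}


def total_cost(man_months: float, maintenance_eur: int) -> int:
    """Development effort priced per man-month plus maintenance, in EUR."""
    return round(man_months * MAN_MONTH_COST) + maintenance_eur


def cost_range(scenario: str) -> tuple[int, int]:
    """Lower and upper total cost in EUR for one scenario."""
    (low, high), maintenance = SCENARIOS[scenario]
    return total_cost(low, maintenance), total_cost(high, maintenance)


if __name__ == "__main__":
    for name in SCENARIOS:
        low, high = cost_range(name)
        print(f"{name}: {low:,} - {high:,} EUR")
```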
In March 2014, OSPAR HASEC 2014 will consider how to proceed with the modernization of the RID
database. A data task group (RID DTG) following the project gives the following advice to HASEC:
a. HASEC should support investigating the possibilities for cooperation with HELCOM, with the aim
of a partly common solution.
b. HASEC should request RID DTG to describe how such an investigation could be implemented,
including a road map.
c. HASEC should indicate whether it finds it realistic to secure at least the expected 250 000 € for
developing a revised RID database solution, even when dividing the cost with HELCOM.
d. HASEC should not at present support scenarios S1 and S2.
Solutions S1 and S2 build on the existing RID system, will not change the data model and seem very expensive
compared with what is gained. Only S3 will start from scratch and provide a new system.
There is a strong need to search for an (at least partly) common solution with HELCOM.
The LOAD chair will inform further on progress with the RID modernization project.
Action required
The Meeting is invited to:
− take note of the provided information
− take into account the results and progress of the RID modernization project in relation to HELCOM PLUS
− consider possibilities for cooperation and common development between the HELCOM PLUS and OSPAR RID
modernization projects.
Annex 1:
Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Step 6 – Proposal for a solution for the RID Database
Draft report
Imprint
QuoData GmbH
Quality Management and Statistics
Prellerstr. 14
D-01309 Dresden
Germany
Phone: +49 (0) 351 40 28867 0
Fax: +49 (0) 351 40 28867 19
Email: [email protected]
Web: www.quodata.de
Authors
PD Dr. habil. Steffen Uhlig
Dipl.-Phys. Christian Bläul
Dipl.-Math. Henning Baldauf
Bertrand Colson
21.02.2014
Contents
1 Introduction
2 Description of data and workflow
3 Assessment of the 2008 RID database
  3.1 Database platform and software
  3.2 Database access
  3.3 Database structure
  3.4 Data import
  3.5 Data validation
  3.6 Data export
  3.7 Description of the database
4 Improvements to the RID database in 2014 – the 2014 RID database
  4.1 Background
  4.2 New tables
  4.3 Import module
  4.4 Export module
5 RID database user requirements
  5.1 General requirements
  5.2 Requirements of the OSPAR community
    5.2.1 Functionalities with high priority
    5.2.2 Functionalities with low priority
6 Recommended database model for the 2016 RID database
7 Solutions for the 2016 RID database
  7.1 Scenario overview
  7.2 Motivation for scenario choice and detailed scenario description
  7.3 Advantages and disadvantages of the scenarios
8 Recommended organisation of the implementation of the 2016 RID database
  8.1 WP 1: Confirmation of database model
  8.2 WP 2: Choice of the database platform
  8.3 WP 3: Definition of workflow roles and user responsibilities
  8.4 WP 4: IT data integrity and security
  8.5 WP 5: Development of quality assurance procedures for data validation
  8.6 WP 6: Implementation of the functionalities
  8.7 WP 7: Initial data migration and setup of production environment
  8.8 WP 8: Database testing and preparation for approval by OSPAR
  8.9 WP 9: Preparing documentation
  8.10 Project organisation
9 Budgeting of the implementation of the 2016 RID database
  9.1 Total costs
  9.2 Annual maintenance costs of the 2016 RID database
10 Time table of the implementation of the 2016 RID database
11 Recommended scenario
  11.1 Common solution with HELCOM PLC
  11.2 Independent, standalone RID database
A Annex
  A.1 Structure of 2008 RID database
  A.2 Description of 2008 RID database
    A.2.1 Structure of input tables
    A.2.2 Main Form
    A.2.3 Tables and Reports
    A.2.4 Export/Aggregation
    A.2.5 Import from Excel
    A.2.6 Log Book
1 Introduction
The Comprehensive Study on Riverine Inputs and Direct Discharges (RID) is an OSPAR monitoring
programme designed to provide annual information on the pressures and development of selected
pollutants in the OSPAR maritime area (North-East Atlantic). On a mandatory basis the
concentrations and loads of the determinands cadmium, copper, lead, mercury and zinc, nitrogen
and phosphorus species, and suspended particulate matter are to be monitored and reported by
each OSPAR Contracting Party.
RID data constitute an important part of the OSPAR Information System for the purpose of marine
environmental assessment. RID data are also important for the implementation of the EU Marine
Strategy Framework Directive (MSFD). The objectives of the RID study are set out in the RID
Principles.
As a basis for the assessment of the RID data, the RID database was designed and implemented by
QuoData in 2008. The aim of the RID database is the structured storage of the annual RID data and
the harmonization of the national RID data. An overview of the 2008 database is given in section 3.
However, there is a need to improve the database structure due to some shortcomings of the 2008
RID database. These shortcomings result from the fact that this first database was conceived as the
first step of a step-wise improvement process. While some improvements are currently carried out
(see section 4), further steps will be necessary. Suggestions can be found in this document. This
report describes and compares proposals regarding a long-term solution for the RID database.
Required features and functionalities were determined on the basis of the evaluation of a
questionnaire and prioritized by the RID Database Task Group (RID DTG). These required features
and functionalities are summarized in section 5. As completely new functionalities are desired, the
database model needs to be expanded. Section 6 presents the recommended revision of the 2008
RID database model, even though the changes are minimal.
Different solutions for the future RID database are compared in section 7. In this report it is
referred to as the “2016 RID database”, as it is assumed that its implementation will start in 2015
and be completed in mid-2016. Time and financial resources as well as
specifications covering data submission, procedures for comprehensive quality assurance, data
access, security and requirements for the web interface are taken into account. In section 8, a project
plan for implementing the 2016 RID database is presented. A budget draft and a road map are given
in sections 9 and 10.
2 Description of data and workflow
This section describes how RID data are generated and distributed from an organizational
perspective. For a technical description, please refer to section 3.
All Contracting Parties (CPs) monitor the pollutants in their rivers, as specified in the RID Principles.
For most CPs, monitoring is carried out at least on a yearly basis. The CPs calculate input into a
maritime area on the basis of individual water samples and flow measurements. Specifically, in the
OSPAR context, the word “input” refers to the mass of a substance carried by water into the
maritime area. All input is then aggregated, and an annual level is estimated and reported to the
OSPAR Secretariat. Each CP submits a Text report containing metadata and a Data report
consisting of Excel files with a fixed structure. These data are then imported into the RID database. This
process takes place every year.
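To make the step from individual samples to an annual input more concrete, the following sketch estimates an annual load as the mean of the instantaneous loads (concentration times flow) scaled to one year. This simple estimator is an assumption chosen for illustration and is not necessarily the calculation method set out in the RID Principles; the function name and the units are likewise hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31 536 000 s, non-leap year


def annual_load_tonnes(samples):
    """Estimate an annual load in tonnes from paired measurements.

    samples: iterable of (concentration in mg/L, flow in m3/s) pairs.
    Illustrative estimator only: mean instantaneous load times one year.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    # mg/L equals g/m3, so concentration * flow is an instantaneous load in g/s.
    mean_load_g_per_s = sum(c * q for c, q in samples) / len(samples)
    return mean_load_g_per_s * SECONDS_PER_YEAR / 1_000_000  # g -> tonnes


if __name__ == "__main__":
    # Two made-up samples: 2 mg/L at 10 m3/s and 4 mg/L at 20 m3/s.
    print(annual_load_tonnes([(2.0, 10.0), (4.0, 20.0)]))
```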
In order to fulfill data requests by so-called ‘data users’, data can be extracted from the database.
The same is done to prepare the annual Data Report, which contains the input for one year as well as
input time series.
3 Assessment of the 2008 RID database
The basic idea of the 2008 RID database was to create a database in order to manage the data from
all Excel files of riverine inputs and direct discharges since 1989 in a consistent way.
3.1 Database platform and software
The 2008 database was built with Microsoft Access and with functions programmed in Visual Basic
for Applications.
3.2 Database access
The data is only available upon request to the RID Data Centre. The database is operated by the
Norwegian Institute for Agricultural and Environmental Research (Bioforsk). The national RID data of
the OSPAR Contracting Parties are managed by Bioforsk, i.e. data import, quality assurance and data
export are carried out by Bioforsk [1].
A database directly accessible by the Contracting Parties was not within the scope in 2008.
3.3 Database structure
The data structure was conceived for storing data from the previously used Excel detailed sheets
(Tables 5a-c, 6a-c, 7, 8, 9) and the annual overview tables (AA Tables 1a – 4b).
An overview of the complete structure of the database can be found in the Annex A.1. A more
general data structure was not requested in 2008. Accordingly, it was accepted that data e.g. from
aquaculture discharges, urban runoffs as well as inputs from unmonitored areas would be taken into
consideration only at a later stage.
Some very important technical improvements to the 2008 RID database structure are currently being
implemented by QuoData. Detailed information on this work will be found in section 4.
3.4 Data import
Annual data can be imported to the Access database using Excel files. The database can create Excel
templates [2] for data entry. The Excel files filled out by the Contracting Parties are then uploaded to
the database.
3.5 Data validation
The OSPAR Commission’s intention was to implement quality assurance procedures at a later stage.
Accordingly, automated validation tests identifying possible problems in the imported data are not
part of the 2008 RID database. Thus, manual checks need to be carried out.
3.6 Data export
With the data export module, all tables can be exported. A selected set of raw [3] data for one OSPAR
Contracting Party can be exported into Excel. No further export options were requested.
[1] Contracting Parties don’t have direct access to the database, but infrequently, copies are sent to data providers.
[2] Files with the correct structure but no data content.
Depending on the different versions of Excel, Access and Windows used, substantial errors have
occurred, e.g. data export failed altogether or exported values were erroneous. These issues are
addressed in Step 4 of this project and described in section 4.
To visualize and analyse selected data, export to RTrend was also implemented in the database.
RTrend, developed by QuoData, is a software program that allows the adjustment and statistical
evaluation of riverine loads.
3.7 Description of the database
A detailed description of the 2008 database general functionalities is given in Annex A.2.
[3] Here raw means “as imported”, i.e. without further calculations.
4 Improvements to the RID database in 2014 – the 2014 RID database
4.1 Background
The RID Review Group recommended changing the 2008 RID database structure in order to
harmonize the database [4] and to improve data export possibilities.
QuoData was contracted by the OSPAR secretariat to implement the necessary changes in Step 4 of
this project. It must be noted that the start of the implementation was delayed by the OSPAR secretariat
until the meeting of the Working Group on Inputs to the Marine Environment on 28-30 January 2014
(“INPUT meeting”) in order to make decisions with respect to possible consequences. Currently,
QuoData is working on the implementation. The work will be completed at the end of March 2014.
The necessary changes include new tables and new import/export functionalities.
4.2 New tables
A set of new tables will be included in the RID database. In the first place, the restructuring addresses
the reporting of direct discharges. The data of aquaculture discharges and other direct discharges will
be added to the RID database. In addition, the restructuring addresses the reporting of riverine
inputs. Inputs of all monitored rivers will be pooled into one table. Therefore, main and tributary
rivers are no longer reported in separate tables.
The improved structure of the RID database is summarised in the following table.
Table  2008 RID database                          2014 RID database
5a     Sewage effluents                           Sewage effluents
5b     Industrial effluents                       Industrial effluents
5c     Total direct discharges                    Aquaculture discharges
5d     --                                         Other direct discharges
5e     --                                         Total direct discharges
6a     Main rivers                                Monitored rivers
6b     Tributary rivers                           Unmonitored areas
6c     Total riverine inputs                      Total riverine inputs (monitored + unmonitored)
7      Contaminant concentrations                 Contaminant concentrations
8      Limits of detection and/or quantification  Limits of detection and/or quantification
9      Catchment-dependent information            Catchment-dependent information
Table 4-1: Improvements of RID database structure revised in consultation with RID Database Task Group
[4] Removing the inconsistencies in data reporting across the CPs and increasing comparability of data from different countries.
In order to ensure consistency in the data series, it will be necessary to carry out some restructuring
and assignment adjustments in the historical data. Previously, data reporting was not 100%
harmonized. For instance, there were differences in the reporting of unmonitored areas. During the
INPUT meeting, several Contracting Parties preferred a resubmission of national RID data in order to
avoid wrong data assignments.
4.3 Import module
The import module will be adapted so that the reported subtotals and totals for all Contracting
Parties can be imported. This adaptation improves data comparability and ensures that the exported
data reflect the actual reported inputs into the sea.
4.4 Export module
The following export functions will be added or improved:
- Export of reported raw data of each Contracting Party
- Export of aggregated data per country or OSPAR Region
- Export of time series based on single data tables (e.g. for one constituent in one river for
  several years) or aggregated data (e.g. total loads to the sea from a Contracting Party or a
  maritime area)
- Export of annual overview tables generated on the basis of the actual reported totals.
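The aggregated and time-series exports listed above can be sketched as simple group-and-sum operations. The record layout, the field names (`cp`, `region`, `river`, `constituent`, `year`, `load`) and the sample values below are hypothetical and serve only to illustrate the idea; they are not taken from the RID database.

```python
from collections import defaultdict

# Hypothetical records: one reported load per Contracting Party (cp), river,
# constituent and year. All names and values are made up for this sketch.
RECORDS = [
    {"cp": "XX", "region": "I", "river": "River A", "constituent": "Cd", "year": 2011, "load": 1.5},
    {"cp": "XX", "region": "I", "river": "River A", "constituent": "Cd", "year": 2012, "load": 2.5},
    {"cp": "XX", "region": "I", "river": "River B", "constituent": "Cd", "year": 2011, "load": 0.5},
    {"cp": "YY", "region": "II", "river": "River C", "constituent": "Cd", "year": 2011, "load": 4.0},
]


def aggregate(records, *keys):
    """Sum loads over all records sharing the same values for `keys`."""
    totals = defaultdict(float)
    for record in records:
        totals[tuple(record[k] for k in keys)] += record["load"]
    return dict(totals)


def time_series(records, **filters):
    """Yearly totals for all records matching the given field values."""
    selected = [r for r in records if all(r[k] == v for k, v in filters.items())]
    totals = aggregate(selected, "year")
    return sorted((year, load) for (year,), load in totals.items())


if __name__ == "__main__":
    print(aggregate(RECORDS, "cp"))                                   # per country
    print(aggregate(RECORDS, "region"))                               # per region
    print(time_series(RECORDS, river="River A", constituent="Cd"))    # single river
    print(time_series(RECORDS, cp="XX"))                              # aggregated
```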
5 RID database user requirements
5.1 General requirements
There are implicit requirements common to the majority of software projects. In particular, even
though the following points were not explicitly mentioned in the questionnaire, it is understood that
the RID database should
- be easy and intuitive to use,
- be flexible, i.e. it is easy to carry out enhancements,
- provide detailed error messages,
- be robust, i.e. perform well not only under ordinary conditions,
- be secure,
- be quick, i.e. all functionalities should work fast.
5.2 Requirements of the OSPAR community
To find out about the requirements of the potential end users (‘data providers’ as well as ‘data
users’) a questionnaire was developed with the support of OSPAR in step 1 of this project. Experts
active in OSPAR Contracting Parties were asked to prioritize the different functionalities of an
improved RID database. (For a detailed analysis and evaluation of the questionnaire results, see
document INPUT 14/4/3-Add.1).
The online questionnaire was sent to a total of 86 respondents in October 2013. The final deadline
for filling it in was mid-November 2013. There were a total of 35 replies [5], reflecting the expectations
across the OSPAR community. The prioritized functionalities were graded into the following three
categories based on the frequency of answers in the questionnaire: "need to have", "nice to have"
and "minority wish" (for detailed information see document INPUT 14/4/3-Add.2a). However, the
RID Database Task Group (DTG) classified the "need to have" functionalities into basic and less basic
ones. This classification leads to the final proposed prioritisation.
Priority  What functionalities belong to this category?
high      All basic functions
medium    All other functions which are not considered basic, but fairly easy to implement.
low       Functions which are not within the scope of RID and might increase the report burden
          on each OSPAR Contracting Party.
Table 5-1: Priority criteria for the functionalities of the RID database
On the basis of the questionnaire results and the DTG prioritisation, the high-, medium- and
low-priority functionalities are summarized separately in the following subsections.
[5] Replies from Belgium, Denmark, France, Germany, Ireland, The Netherlands, Norway, Spain, Sweden, the United Kingdom,
and additionally from ICES and the OSPAR Secretariat.
5.2.1 Functionalities with high priority
5.2.1.1 Data access
All but one of the questionnaire respondents expressed the opinion that the database should be
accessible by a web browser. Hence, the 2016 RID database needs to provide web-based access to
the RID data. This web-based access does not imply a web-only solution, but it allows extending the
Access database and making its data available for web browsers.
5.2.1.2 Data structure
As extracted from the replies to the questionnaire, the improved data structure (see section 4.2)
needs to be extended only in two aspects. The first extension desired by the data users concerns the
storage of the measurement uncertainty for the annually aggregated loads and the second one
concerns the storage of the limit of detection and limit of quantification for all reported and
measured data (not only for the sewage effluents, industrial effluents and riverine inputs per year,
catchment area and Contracting Party as before).
5.2.1.3 Data import
No significant changes are required regarding data submission. The import of data will continue to
take place on the basis of Excel files. Contracting Parties can alternatively use the CSV format for
reporting.
However, a new requirement is the possibility of importing partial RID datasets. In the case of the
submission of such incomplete datasets, the same validation rules should be provided as in the case
of a complete submission.
5.2.1.4 Validation of imported data
Quality control procedures should be implemented in the 2016 RID database in order to reduce
import errors. After data import, automatic tests for the identification of
- missing values,
- invalid values (e.g. wrong units),
- suspicious values (e.g. outliers) and
- too many significant figures
should be carried out in the RID database. Subsequently, a status report of these quality control
procedures should be provided. The validation should only issue warnings about potential problems,
but not prevent users from importing values.
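The four checks above, together with the rule that validation only warns and never blocks an import, might look as follows. This is a minimal sketch under stated assumptions: the set of allowed units, the outlier rule (a factor of ten around the median of earlier years) and the limit of three significant figures are hypothetical placeholders, not criteria from the report.

```python
from decimal import Decimal, InvalidOperation
from statistics import median

ALLOWED_UNITS = {"t/a", "kt/a", "kg/a"}  # hypothetical list of accepted units
MAX_SIGNIFICANT_FIGURES = 3              # hypothetical limit
OUTLIER_FACTOR = 10                      # hypothetical outlier rule


def validate_record(record, history):
    """Return a list of warnings for one imported value.

    record: dict with "value" (as reported, ideally a string) and "unit".
    history: earlier values of the same series, used for the outlier check.
    The function never raises for bad data, so the import itself can proceed.
    """
    warnings = []
    raw = record.get("value")
    if raw is None or str(raw).strip() == "":
        return ["missing value"]
    try:
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        return [f"invalid value: {raw!r} is not a number"]
    if not value.is_finite():
        return [f"invalid value: {raw!r} is not finite"]
    if value < 0:
        warnings.append("invalid value: negative")
    if record.get("unit") not in ALLOWED_UNITS:
        warnings.append(f"invalid unit: {record.get('unit')!r}")
    if history:
        typical = median(history)
        if typical > 0 and not (
            typical / OUTLIER_FACTOR <= float(value) <= typical * OUTLIER_FACTOR
        ):
            warnings.append("suspicious value: far from the median of earlier years")
    # Count the digits of the value as reported, e.g. "12.50" has four.
    if len(value.as_tuple().digits) > MAX_SIGNIFICANT_FIGURES:
        warnings.append("too many significant figures")
    return warnings


if __name__ == "__main__":
    print(validate_record({"value": "1234.567", "unit": "mg"}, [10.0, 11.0, 14.0]))
```

A status report could then be assembled by collecting the returned lists for all imported rows.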
5.2.1.5 Data export
Data export to both Excel and CSV formats should be possible (the user chooses one of the formats).
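For the CSV half of this requirement, a sketch using Python's standard `csv` module is given below. The function name `export_csv` and the column names are assumptions made for illustration; Excel export would require a third-party library and is not shown.

```python
import csv
import io


def export_csv(rows, fieldnames):
    """Serialise a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up row; "cp" stands for Contracting Party.
    print(export_csv([{"cp": "XX", "year": 2012, "load": 1.5}], ["cp", "year", "load"]))
```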
5.2.1.6 General functionalities
There are three features in this questionnaire section that should be included in the 2016 RID
database.
First, it should be possible to filter, sort and search entries (e.g. determinands, years, catchment
areas) in the different parts of the RID tables.
Second, it should be possible to customize the range of geographic areas (e.g. across sub-regions)
from which data are to be aggregated in data products.
Third, it should be possible to flag an individual data item with a short comment or an external
reference.
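The three features can be illustrated on in-memory entries as follows. The `Entry` class, its fields, the helper names and the 200-character comment limit are all hypothetical; the sketch only shows the kind of operations meant, not how the 2016 RID database would implement them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One hypothetical data item of a RID table."""
    determinand: str
    year: int
    catchment: str
    load: float
    comment: Optional[str] = None  # short flag text or external reference


def filter_entries(entries, **criteria):
    """Keep entries whose fields equal all given values."""
    return [e for e in entries if all(getattr(e, k) == v for k, v in criteria.items())]


def sort_entries(entries, *fields):
    """Sort entries by the given fields, in order."""
    return sorted(entries, key=lambda e: tuple(getattr(e, f) for f in fields))


def search(entries, text):
    """Case-insensitive search in determinand and catchment names."""
    needle = text.lower()
    return [e for e in entries
            if needle in e.determinand.lower() or needle in e.catchment.lower()]


def aggregate_load(entries, catchments):
    """Total load over a user-chosen selection of geographic areas."""
    return sum(e.load for e in entries if e.catchment in catchments)


def flag(entry, comment, max_length=200):
    """Attach a short comment or external reference to one data item."""
    if len(comment) > max_length:
        raise ValueError("comment too long")
    entry.comment = comment
```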
5.2.1.7 Functionalities with medium priority
The RID Database Task Group considers two functionalities to be important but not essential:
(1) The storage of the measurement uncertainty at the level of load calculation should be
included in the 2016 RID database. It must be taken into account that the quantification of
measurement uncertainties for such environmental monitoring programs remains a difficult
scientific task.
(2) It would be a nice feature if the RID database were able to link to a GIS system to create
maps by means of unique georeferencing identifiers stored in the RID database.
5.2.2 Functionalities with low priority
Some functionalities requested by questionnaire respondents are not considered as prioritised tasks
by the RID Database Task Group, e.g.:
(1) The storage of monitoring data used for RID calculations, as this lies outside the scope of
RID.
(2) The possibility of data exchange between the RID database and other databases, as the
differences with regard to database structure and data format of the different databases are
too big.
(3) A separate module for trend analysis (e.g. RTrend software), as users should use their
preferred trend analysis software.
(4) The possibility to link with the streamlined text reporting format or a summary abstract.
6 Recommended database model for the 2016 RID database
The database model, also called database structure, defines which tables the database uses
internally, and which fields they have. In layman’s terms, a field is a table column. The table contents
are not part of the structure. Typically, the database structure is not directly visible to any user of the
database.
The pre-2014 database structure is given in annex A.1. There were no structure changes required for
Step 4 (immediate improvements to RID database, section 4) of this project.
There are 2 high priority requirements listed in section 5 that require minor database structure
changes:
1. Storage of the measurement uncertainty for the annual aggregated loads requires a new field
of type float in “Table 5 and 6”.
2. The storage of the limit of detection and limit of quantification for all reported and measured
data requires two new fields of type float in "Table 5 and 6".
There are also 2 medium priority requirements listed in section 5 that require straightforward
additions of new fields.
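The structure changes listed above amount to adding a few nullable float columns. The following is a minimal sketch of such a change, assuming an SQLite stand-in for the real backend; the table names `table5` and `table6` and all column names are hypothetical and are not taken from the RID data model.

```python
import sqlite3

# Hypothetical column names; the report only states that "Table 5 and 6"
# receive new fields of type float.
NEW_FLOAT_FIELDS = ("load_uncertainty", "limit_of_detection", "limit_of_quantification")


def add_float_fields(conn, table, fields=NEW_FLOAT_FIELDS):
    """Add each missing field as a nullable REAL column and return the names added."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for field in fields:
        if field not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {field} REAL")
            added.append(field)
    return added


conn = sqlite3.connect(":memory:")
for table in ("table5", "table6"):  # stand-ins for the existing input tables
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, determinand TEXT, load_value REAL)"
    )
    conn.execute(f"INSERT INTO {table} (determinand, load_value) VALUES ('Cd', 1.5)")
    add_float_fields(conn, table)
```

Existing rows keep their values and simply hold NULL in the new fields, which is why such additions can be considered minor.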
Later in this document, a scenario will be presented in which a re-development of the RID database is
recommended. However, QuoData suggests keeping the 2008 database structure or using a very
similar one (e.g. minor field type changes), as it has proved to be useful. Typically, a developer has
full control over the database structure. However, depending on the development tools or
frameworks used for implementation, in some cases considerable changes to the structure might be
necessary in order to speed up development and thus reduce costs. Accordingly, a call for tenders
should not make this structure mandatory, but rather put it forward as an example.
7 Solutions for the 2016 RID database
In the following section, the scenarios for the 2016 RID database are laid out. First, an overview is
given. Then the motivation for choosing these specific scenarios and excluding others is presented.
Finally, advantages and disadvantages are discussed.
7.1 Scenario overview
The following five scenarios are under consideration for the 2016 RID database.
S1. Improve existing solution. Only one organization hosts the one active (i.e. “production”)
version of the database. The host handles tasks like import into the database.
S2. Improve existing solution. All CPs can import data, which is synchronized into the web version.
S3. Database is re-developed by means of a pure web solution, so that all functions can be used
from a web browser.
S4. Close IT cooperation with HELCOM. One joint database and all data are shared.
S5. Close IT cooperation with HELCOM. There are two separate databases, data are not shared.
7.2 Motivation for scenario choice and detailed scenario description
Scenarios S1 and S2 are based on the existing solution. Because it already contains mechanisms for
data import and export, the amount of computer code to be written can be kept small if these
mechanisms continue to be used. This implies that as few functionalities as possible are made
available via a browser. Specifically, scenarios S1 and S2 do not allow the user to import data via the
web, but do contain a table view and Excel/CSV export functionality. This implies that on the web
there is only one user role: the ‘data user’. Users of this type cannot change the database.
S3, on the other hand, allows the user to perform all tasks within the web browser. This allows for
better accessibility, exposure to a wider audience and a longer expected lifetime (see footnote 6). In contrast to S1,
only open-source software is considered, in order to reduce OSPAR’s costs as well as its dependence on vendors.
In our opinion, the restriction to open source is not a factor in one-time costs, but closed-source
components reduce options for future enhancements and potentially give rise to license costs. As a
result, open source is cheaper to maintain.
Scenarios S4 and S5 are motivated by keeping compatibility high and costs low, as many outcomes of
the HELCOM PLC efforts are expected to be usable for OSPAR. At the moment of writing this
document, the HELCOM database structure is not defined yet. Neither the likelihood of cooperation
nor the resulting costs of cooperation between the two conventions can be assessed. In S4, only one
joint database for both HELCOM and OSPAR exists, meaning both conventions import their data into
the exact same database. Access rules could prevent read or write access by role. In S5, only
structure and computer code are shared, but not the data. This results in much less need for
harmonization of requirements and definitions. Functions specific to HELCOM would be hidden in the
OSPAR copy of the database.
Footnote 6: The expected lifetime is the time before the database is replaced for IT reasons or because of immense cost increases due to a scarcity of experts familiar with the IT environment.
7.3 Advantages and disadvantages of the scenarios
Table 7-1 lists the pros and cons for each of the five scenarios. The first column
contains two types of identifiers: rows whose identifier starts with the letter M concern
maintenance aspects, while those starting with the letter O concern one-time efforts.
The abbreviation WP stands for work package. The nine proposed work packages are described in
section 8. Some table cells contain Euro-based cost estimates, which will be broken down and
explained in section 9.
A recommendation on the scenario to choose is given in section 11.
| # | Requirements to be compared | S1 | S2 | S3 | S4 | S5 | WP |
|---|---|---|---|---|---|---|---|
| M1 | Implementation costs for future requirements | high | high | medium | highest | high | |
| M2 | Yearly data submission: reoccurring effort for CPs and OSPAR Secretariat to keep data up to date; waiting time for feedback from the data validation | high, due to email exchange and slow feedback; active host needed | high, due to need for synchronization; active host needed | small, direct online submission, no active host for most data requests | small, direct online submission, no active host for most data requests | small, direct online submission, no active host for most data requests | |
| O1 | Required effort for planning stage (everything that happens before the IT implementation starts) | medium (74'800 € - 80'240 €) | medium (76'160 € - 81'600 €) | high (106'080 € - 112'880 €) | low to highest (depends on cooperation and cost-sharing conditions) | low to highest (depends on cooperation and cost-sharing conditions) | 1-5 |
| O2 | Development and implementation work including documentation (web server, user interface, database management system) | high (180'880 € - 223'040 €) | high (187'680 € - 231'200 €) | very high (348'160 € - 372'640 €) | highest, but price shared with HELCOM | highest, but price shared with HELCOM | 6-9 |
| O3 | One-time network security efforts for the database | little effort, because of only one web-accessible role: data user | little effort, because of only one web-accessible role: data user | high | high, but shared | high | 4, 6 |
| O4 | Does reporting format for data suppliers change, do CPs need to adjust? | no | no | no | maybe | maybe | |
| O5 | Effort to plan and program workflows, user rights and privileges | very small | very small | medium | a little bit more than S3, but shared | a little bit more than S3, but shared | 3 |
| O6 | Administrative expenditure to harmonize DB model/backend/frontend choice with HELCOM | none | none | none | high | medium | 1, 2 |
| O7 | Existing Access application can partly be reused (smaller development, testing and documentation effort) | yes | yes | only data model | no | no | |
| O8 | Functionality available to remote users with only a web browser | little | little | all | all | all | |
| O9 | Expected effort for OSPAR to approve delivery, i.e. after IT implementation and testing by contractor, with the aim to switch to production | little (5'250 € - 10'500 €) | little (5'500 € - 10'850 €) | a little more (7'650 € - 15'050 €) | high, may be shared (depends on similarity of workflows) | high, may be shared (depends on similarity of workflows) | 8 |
| O10 | Effort for migration of existing data to the new database | little | little | medium | medium-high | medium-high | 7 |

Table 7-1: Advantages and disadvantages of the five scenarios for the 2016 RID database
8 Recommended organisation of the implementation of the 2016 RID database
The complete project of the implementation of the 2016 RID database needs to be split up into
different subprojects or work packages. Each work package (WP) covers a definite area of
responsibilities associated with specific tasks. Each work package provides an output for the
subsequent work package.
While the details of the tasks themselves will be different from one scenario to the next, the overall
list below is independent from the chosen scenario (see section 7). In order to program the 2016 RID
database, QuoData recommends the following nine work packages:
WP 1: Confirmation of database model
WP 2: Choice of the database platform
WP 3: Definition of workflow roles and user responsibilities
WP 4: Concept for IT data integrity and security
WP 5: Development of quality assurance procedures for data validation
WP 6: Implementation of the functionalities
WP 7: Initial data migration and production environment
WP 8: Database testing and preparation for approval by OSPAR
WP 9: Preparing documentation
In the following, the separate work packages are described.
8.1 WP 1: Confirmation of database model
The aim of the first work package is to design the final database model. It will be necessary to
compare the proposed solution for the RID database outlined in this report with the objectives at the
time of the project implementation. It is not unlikely that the requirements concerning the RID
program will change.
It will be necessary to check whether the user requirements have changed, whether new input tables
or additional information are of interest to OSPAR, whether there is obsolete information stored in
the RID database, whether import/export functions should be expanded or additional functionalities
are desired, among other things.
If there are no changes or additions, the database model proposed in section 6 may be used.
Otherwise, the database model needs to be adapted accordingly.
8.2 WP 2: Choice of the database platform
This work package will deliver the common environment for the RID database and the decision on
the frontend and backend software systems used for the implementation of the 2016 RID database.
The general term “database” can be split into the fields of data storage and user interaction. The first
is often called a backend or Database Management System (DBMS), and only requires attention
from IT administrators during initial setup and during emergencies. The latter is the frontend, and
is typically the only part any user experiences. A typical frontend can work together with several
backends and vice versa.
An appropriate backend and frontend need to be chosen on the basis of the requirements for the
2016 RID database, the specified database model (see WP 1), the scenario (S1 – S5), the available
budget and the experience of the software development team.
If scenario S1 or S2 (expansion of the 2008/2014 RID database) is chosen, the
MS Access backend can continue to be used. Otherwise, the most suitable backend has to be selected.
This could be, for example:
• MySQL (widely used open source)
• PostgreSQL (open source, released by a community of developers called the PostgreSQL Global
Development Group)
• MS SQL Server (powerful backend produced by Microsoft)
Additionally, a framework or RAD environment has to be chosen for the frontend.
For a detailed comparison of the different backend and frontend considerations, the reader is
referred to the Step 2 report of this project.
8.3 WP 3: Definition of workflow roles and user responsibilities
This work package focuses primarily on the data access rights, also called user roles. It is important to
have a clear understanding of all the tasks and processes that will play a role during the production
stage (i.e. once all one-time tasks such as implementation have been completed) of the RID database
and which persons will be responsible for each task. The workflow proceeds as follows:
1. The input tables are filled in by entering data into the Excel templates (see footnote 7).
2. The completed Excel templates are imported into the RID database.
3. The submitted RID data are released after a quality check of the data and corrections where
required.
4. The released RID data can be exported, e.g. for preparing the annual Data Report or for
assessing time series of inputs.
In this work package it is necessary to define:
• Who is allowed to import data?
• Who is responsible for the validation of the data to be imported? Who carries out the quality
assurance procedures of the data to be imported?
• What is the communication process, e.g. if there are inconsistencies in the data to be
imported? Who contacts the contracting parties, and in which situations?
• Who is responsible for the final release of imported data?
• Who is allowed to export which data?
• Who is allowed to read which data?
• Which data, if any, are public?
Footnote 7: It is acknowledged that other formats might need to be supported as well, but the authors of this document use
Excel as an example format in this discussion.
Based on the corresponding answers, user groups are assigned which have specific responsibilities
and privileges. Moreover, it is necessary to define:
• Who is responsible for the administration of content?
  o maintenance and management of the data tables,
  o adaptation of Excel templates,
  o data recovery etc.
• Who is responsible for the user management?
  o Who decides on “who can do what”?
  o Who sets up new users? Who changes user privileges?
• Who is responsible for the technical administration?
  o database maintenance (there are a lot of tools for an effective system and database
  maintenance),
  o data backup,
  o data security (protection from malicious actions),
  o emergency data recovery etc.
For example, there might be five user groups: OSPAR secretariat, RID Database Task Group, CPs, an
external contractor for the technical administration and the public.
The concrete responsibilities and tasks need to be set down in a written agreement.
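Once the questions above have been answered, the result can be written down as a simple mapping from user groups to permitted actions. The sketch below only illustrates the idea: the group identifiers follow the five example groups named above, but the actions and the assignment of actions to groups are purely hypothetical and would have to be agreed in WP 3.

```python
from enum import Enum, auto


class Action(Enum):
    IMPORT = auto()
    VALIDATE = auto()
    RELEASE = auto()
    EXPORT = auto()
    READ = auto()
    MANAGE_USERS = auto()
    BACKUP = auto()


# Hypothetical assignment for illustration only; the actual rights are an
# outcome of WP 3 and are not defined in this report.
PERMISSIONS = {
    "ospar_secretariat": {Action.READ, Action.EXPORT, Action.RELEASE, Action.MANAGE_USERS},
    "rid_database_task_group": {Action.READ, Action.EXPORT, Action.VALIDATE},
    "contracting_party": {Action.READ, Action.EXPORT, Action.IMPORT},
    "technical_contractor": {Action.BACKUP},
    "public": {Action.READ},
}


def is_allowed(group, action):
    """Return True if the given user group may perform the action."""
    return action in PERMISSIONS.get(group, set())
```

Keeping the assignment in one table-like structure makes it easy to review against the written agreement and to change later without touching the program logic.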
8.4 WP 4: IT data integrity and security
The RID data might be accidentally deleted, lost due to hardware or software problems, or subjected
to malicious actions. All of these pose a threat to data integrity and security. The objective of this
work package is to analyse the requirements and possible methods to ensure the security and
integrity of the RID data. It is necessary to define the procedures to maintain, back up, recover and
secure the data stored in the RID database in order to guard against data loss and data abuse.
This work package also includes reaching a decision as to where the production server will be hosted
(e.g. OSPAR secretariat, HELCOM secretariat in case of scenario S4 or S5, ICES, OSPAR working group,
EMEP, research institutes such as Bioforsk, or an external company).
8.5 WP 5: Development of quality assurance procedures for data validation
This work package deals with the validation of the data to be imported and the development of
suitable quality assurance (QA) procedures in order to guarantee assessment is carried out on the
basis of reliable data. After all checks have been completed, the data can be released. Data can be
read and aggregated results calculated only after the data has been released.
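The rule that data can only be read and aggregated after release can be pictured as a small status model. The following is a minimal sketch under stated assumptions: the status names, the `Submission` class and its fields are hypothetical and only illustrate the import, check and release sequence.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    IMPORTED = "imported"
    CHECKED = "checked"
    RELEASED = "released"


# Each status may only advance to the next one; RELEASED is final.
NEXT_STATUS = {Status.IMPORTED: Status.CHECKED, Status.CHECKED: Status.RELEASED}


@dataclass
class Submission:
    contracting_party: str
    year: int
    loads: list
    status: Status = Status.IMPORTED

    def advance(self):
        """Move the submission one step towards release."""
        if self.status not in NEXT_STATUS:
            raise ValueError("submission is already released")
        self.status = NEXT_STATUS[self.status]


def total_released_load(submissions):
    """Aggregate loads, ignoring everything that has not been released."""
    return sum(sum(s.loads) for s in submissions if s.status is Status.RELEASED)


released = Submission("DE", 2012, [1.0, 2.0])
released.advance()  # quality check passed
released.advance()  # released
pending = Submission("NL", 2012, [5.0])
```

With these two example submissions, only the released one contributes to the aggregate.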
The completed Excel spreadsheets are electronically imported into the RID database by the
responsible person. It goes without saying that the data to be imported is checked before importing
by the Contracting Parties. Nevertheless, automatic QA tests need to be performed in order to
identify possible problems and inconsistencies. These QA tests will include tests for the identification
of
• missing values,
  A test is performed to determine that a complete set of data has been imported.
  Gaps or missing data need to be identified. The user must provide information as to
  whether the missing data will be submitted later, or are definitively not available.
• invalid values,
  For example a test checking for wrong units, or a test checking for negative entries.
• suspicious values,
  For example several statistical outlier tests are performed to identify values which
  are too low or too high.
• values with too few or too many significant figures,
  A test is performed to identify results with too few or too many significant figures, as
  such values can indicate incorrect precision. More importantly, it is necessary to
  decide how such values will be treated. In particular, the treatment of too many
  significant figures is important as rounding rules are necessary for display. It should
  be mentioned that the import file format may play a role here, as e.g. Excel uses both
  a display format and an internal float value. They will probably have to be used
  together at all times, as the internal value always has a large number of decimal
  places.
• duplicate values,
  A test is performed to verify that the same data were not imported twice, including
  for different determinands or years. Procedures to determine when it is permitted to
  overwrite existing data need to be defined.
• out-of-range values,
  For example a QA test checks whether the range of the loads is too wide or
  implausible given the LOD, i.e. whether the range of the load (upper value minus
  lower value) coincides with the limit of detection based on the run-off.
• inconsistent values.
  For example a QA test checks whether the concentration and load figures given in the
  database are sufficiently consistent, i.e. whether there is a clear relationship between
  the mean concentration and mean load based on the run-off.
This work package requires the theoretical specification of the QA tests to be included. It is necessary
to define which procedures will be applied and how the procedures will work.
The output of all QA tests needs to be clarified, both in prose description and with examples. A status
report with detailed information on possible problems and inconsistencies needs to be drafted. This
status report will summarize the results of the QA tests and, if necessary, describe any further steps
necessary to eliminate the problems. It has to be noted that, for all QA tests, it will be necessary to
clarify whether the user is allowed to import despite failed tests.
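To illustrate what such a status report could contain, the sketch below implements three of the simpler tests (missing, negative and duplicate entries) and a rough consistency check between load and concentration. All record fields, units and the tolerance are assumptions made for this example; the actual test specifications are an outcome of this work package.

```python
from collections import Counter


def qa_report(records, expected_keys):
    """Return a status report for a list of imported records.

    records: dicts with the hypothetical keys cp, year, determinand, load
             (load is a float, or None if no value was reported).
    expected_keys: set of (cp, year, determinand) tuples that should be present.
    """
    keys = [(r["cp"], r["year"], r["determinand"]) for r in records]
    present = {k for k, r in zip(keys, records) if r["load"] is not None}
    return {
        "missing": sorted(expected_keys - present),
        "invalid": sorted(
            k for k, r in zip(keys, records) if r["load"] is not None and r["load"] < 0
        ),
        "duplicate": sorted(k for k, n in Counter(keys).items() if n > 1),
    }


def load_is_consistent(load_t, conc_mg_per_l, runoff_m3, rel_tol=0.5):
    """Compare a reported annual load (tonnes) with mean concentration x run-off.

    1 mg/L x 1 m3 = 1 g = 1e-6 t. The wide tolerance is an assumption: the
    product of mean values is only a rough estimate of the true load.
    """
    expected_t = conc_mg_per_l * runoff_m3 * 1e-6
    return abs(load_t - expected_t) <= rel_tol * expected_t


records = [
    {"cp": "DE", "year": 2012, "determinand": "Cd", "load": 1.2},
    {"cp": "DE", "year": 2012, "determinand": "Cd", "load": 1.2},   # imported twice
    {"cp": "DE", "year": 2012, "determinand": "Pb", "load": -3.0},  # negative entry
    {"cp": "DE", "year": 2012, "determinand": "Hg", "load": None},  # no value given
]
expected = {("DE", 2012, d) for d in ("Cd", "Pb", "Hg", "Zn")}
report = qa_report(records, expected)
```

A report structured by test type makes it straightforward to decide, per test, whether a failed check blocks the import or only produces a warning.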
8.6 WP 6: Implementation of the functionalities
This work package is concerned with the IT implementation of the 2016 RID database. Specifically,
the following functions need to be implemented:
• Database structure (e.g. schema, database tables, columns, keys and indexes)
• Data import
• QA tests
• Data export
• Functionalities desired by the users (e.g. searching, sorting, filtering)
• Warning and help messages
• Access facilities of the different user groups
• Automatic logging of the database accesses and changes
• Backup and recovery functions
Afterwards, the proper functioning must be extensively tested. Ideally, this is in part done via
automated tests that can later be re-run in case of a new functionality being implemented.
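As an illustration of the item on automatic logging of changes, many backends can record changes themselves by means of triggers. The following is a minimal sketch assuming an SQLite stand-in; the table and column names are hypothetical, and logging of read accesses is not covered here and would need to be handled in the frontend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE loads (id INTEGER PRIMARY KEY, determinand TEXT, load_value REAL);

    -- Every change to a load value is recorded with a timestamp.
    CREATE TABLE change_log (
        log_id     INTEGER PRIMARY KEY,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
        action     TEXT,
        row_id     INTEGER,
        old_value  REAL,
        new_value  REAL
    );

    CREATE TRIGGER log_update AFTER UPDATE OF load_value ON loads
    BEGIN
        INSERT INTO change_log (action, row_id, old_value, new_value)
        VALUES ('UPDATE', OLD.id, OLD.load_value, NEW.load_value);
    END;

    CREATE TRIGGER log_delete AFTER DELETE ON loads
    BEGIN
        INSERT INTO change_log (action, row_id, old_value, new_value)
        VALUES ('DELETE', OLD.id, OLD.load_value, NULL);
    END;
    """
)

conn.execute("INSERT INTO loads (determinand, load_value) VALUES ('Cd', 1.5)")
conn.execute("UPDATE loads SET load_value = 1.7 WHERE id = 1")
conn.execute("DELETE FROM loads WHERE id = 1")
log = conn.execute(
    "SELECT action, row_id, old_value, new_value FROM change_log ORDER BY log_id"
).fetchall()
```

A check that the log contains exactly the expected entries after such a sequence is also an example of an automated test that can be re-run whenever new functionality is added.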
8.7 WP 7: Initial data migration and setup of production environment
The main purpose of this work package is the transfer of all RID data (i.e. the database
content) to the new platform. This work package includes the installation of the production server,
the transfer of data from the development environment (local computers) to the production
server and the migration of the existing RID data from the 2008/2014 Access database to the newly
developed one. Possible data restructuring must be taken into account. Additionally, a backup of the
“old” database needs to be performed.
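The migration step can be sketched as "back up first, then copy table by table and verify". In the following example, in-memory SQLite databases stand in for both the existing Access database and the new platform, and the table name `table6a` with its columns is hypothetical; a real migration from Access would use a different driver and possibly restructure the data.

```python
import sqlite3


def migrate_table(source, target, table, columns):
    """Copy all rows of one table into an empty target table and verify the row count."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {cols} FROM {table}").fetchall()
    with target:  # commits on success, rolls back on error
        target.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
    copied = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if copied != len(rows):
        raise RuntimeError(f"{table}: expected {len(rows)} rows, found {copied}")
    return copied


SCHEMA = "CREATE TABLE table6a (cp TEXT, year INTEGER, determinand TEXT, load_value REAL)"

old_db = sqlite3.connect(":memory:")  # stand-in for the existing database
old_db.execute(SCHEMA)
old_db.executemany(
    "INSERT INTO table6a VALUES (?, ?, ?, ?)",
    [("DE", 2012, "Cd", 1.5), ("NL", 2012, "Cd", 0.9)],
)
old_db.commit()

backup_db = sqlite3.connect(":memory:")  # backup of the "old" database
old_db.backup(backup_db)

new_db = sqlite3.connect(":memory:")  # stand-in for the new platform
new_db.execute(SCHEMA)
migrated = migrate_table(old_db, new_db, "table6a", ["cp", "year", "determinand", "load_value"])
```

Comparing row counts is only the simplest possible verification; checksums or spot checks per contracting party and year could be added in the same place.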
8.8 WP 8: Database testing and preparation for approval by OSPAR
This work package aims to verify and validate the completely developed RID database. It will be
necessary to evaluate whether the features and functionalities which have been implemented in the
database meet the specified requirements as well as the users’ needs and whether they have been
implemented correctly. Whenever a shortcoming occurs it will need to be resolved by the software
development team. The database will not be delivered until all shortcomings have been eliminated.
However, the approval of the delivery of the software system also depends on the availability of
documentation, as described in the next work package.
8.9 WP 9: Preparing documentation
The final work package is concerned with the production of detailed documentation. The documentation
must include both a user’s manual and documentation of the program’s source
code. The former will describe comprehensively the usage of the 2016 RID database,
including all tasks and workflows defined in WP 3. The latter makes it possible to transfer
the responsibility for the technical part of the database to a contractor other than the original software
development team. It also makes it possible to modify and extend the RID database
over many years independently of the original developer.
8.10 Project organisation
It is a great advantage for the project’s success if there is a project manager overseeing the software
developer’s side. This person plans and manages all work packages which are the responsibility of the
software development team and coordinates the various project activities between OSPAR, the RID
Database Task Group, and the software development team to ensure that WPs are completed on
time, to specification and within budget.
The budget for the project manager was increased in comparison to the Step 2 report, as we consider
the RID DB to require slightly more managing than average software projects.
9 Budgeting of the implementation of the 2016 RID database
9.1 Total costs
For scenarios S1-S3, the costs based on the priorities suggested by the RID Database Task Group (RID
DTG) are presented in the following table. The costs are set out for each work package, for the
project management and for the maintenance. As a further classification, the costs are set out for
the case that
a. Only high priority requirements are implemented.
b. High and medium priority requirements are implemented, and the option for
adding low priority requirements at a later stage is considered.
Please note that the numbers below carry an uncertainty of -30% to +60%. The estimation was based
on man months. To derive Euro amounts, it is assumed that one man month corresponds to 160 h
at 85 €/hour, i.e. 13'600 €. The hourly rate is an average based on the different salaries of the fields of expertise
expected to be necessary, e.g. project management and IT expert oversight are more cost-intensive
than basic IT tasks, document copy-editing and layout.
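The conversion from man months to Euro and the stated uncertainty band can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given above (160 h per man month, 85 €/hour, -30% to +60%); the function names are chosen for this example.

```python
HOURS_PER_MAN_MONTH = 160
RATE_EUR_PER_HOUR = 85


def man_months_to_eur(man_months):
    """Convert an effort estimate in man months into Euro."""
    return round(man_months * HOURS_PER_MAN_MONTH * RATE_EUR_PER_HOUR)


def uncertainty_range(estimate_eur, low=-0.30, high=0.60):
    """Return the lower and upper bound of an estimate for the stated uncertainty."""
    return round(estimate_eur * (1 + low)), round(estimate_eur * (1 + high))


wp1_cost = man_months_to_eur(0.6)        # 8160, i.e. 8'160 €
wp1_range = uncertainty_range(wp1_cost)  # (5712, 13056)
```

For example, an effort of 0.6 man months corresponds to 8'160 €, which under the stated uncertainty could lie anywhere between roughly 5'700 € and 13'100 €.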
Scenarios S4 and S5 are not considered in the table. This is in part because of the temporal overlap
between the HELCOM PLC and OSPAR RID decision-making and the current uncertainties regarding
the HELCOM PLC project development. The costs of both S4 and S5 are expected to be at least as
high as those of S3, but there is a saving potential as costs can be split between HELCOM and OSPAR.
However, high harmonization costs are to be expected and their estimation carries a high
uncertainty.
The estimation uses a top-down approach based on experience of similar projects and
implementation tasks. The cost estimation presented here rests on the same foundation as
the Step 2 report calculation. However, in Step 2 the IT security planning was not considered. Also, a
different grouping was used in Step 2 than in this report. Specifically, to avoid misunderstandings,
the presentation in this report is based on work packages defined in much more detail than the
general categories used in Step 2.
| Cost item | S1: Improve existing Access solution and only one organization can import data | S2: Improve existing Access solution and all CPs can import data | S3: Database is re-developed by means of a pure web solution |
|---|---|---|---|
| WP 1: Confirmation of database model | a. 0.6 man months = 8'160 €; b. 0.6 man months = 8'160 € | a. 0.6 man months = 8'160 €; b. 0.6 man months = 8'160 € | a. 0.6 man months = 8'160 €; b. 0.6 man months = 8'160 € |
| WP 2: Choice of the database platform | 0.2 man months = 2'720 € | 0.2 man months = 2'720 € | 0.6 man months = 8'160 € |
| WP 3: Definition of workflow roles and user responsibilities | 0.5 man months = 6'800 € | 0.5 man months = 6'800 € | 0.9 man months = 12'240 € |
| WP 4: IT data integrity and security | 1.8 man months = 24'480 € | 1.9 man months = 25'840 € | 2.8 man months = 38'080 € |
| WP 5: Development of QA procedures for data validation | a. 2.4 man months = 32'640 €; b. 2.8 man months = 38'080 € | a. 2.4 man months = 32'640 €; b. 2.8 man months = 38'080 € | a. 2.9 man months = 39'440 €; b. 3.4 man months = 46'240 € |
| WP 6: Implementation of the functionalities | a. 8.1 man months = 110'160 €; b. 10 man months = 136'000 € | a. 8.4 man months = 114'240 €; b. 10.4 man months = 141'440 € | a. 16.1 man months = 218'960 €; b. 17 man months = 231'200 € |
| WP 7: Initial data migration and setup of production env. | 0.4 man months = 5'440 € | 0.4 man months = 5'440 € | 4 man months = 54'400 € |
| WP 8: Database testing and preparation for approval (contains both OSPAR and contractor costs) | a. 2.4 man months = 32'640 €; b. 3 man months = 40'800 € | a. 2.5 man months = 34'000 €; b. 3.1 man months = 42'160 € | a. 3.5 man months = 47'600 €; b. 4.3 man months = 58'480 € |
| WP 9: Preparing documentation | a. 2.4 man months = 32'640 €; b. 3 man months = 40'800 € | a. 2.5 man months = 34'000 €; b. 3.1 man months = 42'160 € | a. 2 man months = 27'200 €; b. 2.1 man months = 28'560 € |
| Project manager | 5 man months = 68'000 € | 5 man months = 68'000 € | 7 man months = 95'200 € |
| Maintenance for 5 years | 102'400 € | 100'050 € | 94'610 € |
| Total | a. 23.8 man months = 426'080 €; b. 27.3 man months = 473'680 € | a. 24.4 man months = 431'890 €; b. 28 man months = 480'850 € | a. 40.4 man months = 644'050 €; b. 42.7 man months = 675'330 € |

Table 9-1: Estimation of the costs of implementing the 2016 RID database
9.2 Annual maintenance costs of the 2016 RID database
In this section, a breakdown of the maintenance costs from the previous table is given. Costs are
given in Euros per year (€/y).
| Cost item | S1 | S2 | S3 |
|---|---|---|---|
| Backups (and in the unlikely case of a disaster: restore). In S1 and S3, this is done centrally. In S2, each CP holds a copy of the database and has to take care of backups | 1'320 €/y | 1'450 €/y | 2'380 €/y |
| Software security updates to all involved software systems | 3'300 €/y | 4'520 €/y | 5'490 €/y |
| Yearly data submission: reoccurring effort for CPs and OSPAR Secretariat to keep data up to date; in S1, this contains additional costs due to waiting time for feedback from the data validation; in S2, this contains the need for synchronization of the distributed copies | 13'210 €/y | 9'680 €/y | 7'940 €/y |
| Handle special data requests for data users. Done by the organization responsible for database content, e.g. OSPAR Secretariat or Bioforsk | 2'640 €/y | 2'640 €/y | 2'640 €/y |
| Database maintenance of user rights and privileges. In S3, this means create/remove users. This is likely done by CPs and/or OSPAR Secretariat | n/a | n/a | 472 €/y |
| Total | 20'480 €/y | 20'010 €/y | 18'922 €/y |

Table 9-2: Estimation of the annual costs of maintenance
10 Time table of the implementation of the 2016 RID database
The following table presents an example of which tasks can be carried out in parallel. Some tasks take less time than shown, but have to be done within
the period indicated. No specific scenario is targeted, as the durations needed for each task depend more on factors such as contractor experience, manpower and
requirement changes. However, in our opinion the call for tender should ask potential contractors for a more precise and date-bound schedule.
[Schedule chart with one column per month (months 1 to 18) and one row per task; the bars marking the duration of each task are not reproduced here. Rows, in order:]
• Preparation of call for tender
• Collection of offers
• Decision and contract
• WP 1: Confirmation of database model
• WP 2: Choice of the database platform
• WP 3: Definition of workflow roles and user responsibilities
• WP 4: IT data integrity and security
• WP 5: Development of QA procedures for data validation
• WP 6: Implementation of the functionalities
• WP 7: Initial data migration and setup of production env.
• WP 8: Database testing and preparation for approval
• WP 9: Preparing documentation
Table 10-1: Proposed road map for the project of the implementation of the 2016 RID database
11 Recommended scenario
The recommended scenario depends on whether a common solution with the HELCOM PLC database
(scenarios S4 or S5) or an independent, standalone RID database (S1 – S3) is preferred.
11.1 Common solution with HELCOM PLC
The OSPAR RID and HELCOM PLC databases have partly different requirements. If the requirements can be
harmonized, an integrated common database could make sense. The advantages of such a common system
for data users and data providers are obvious. As costs for developing and maintaining the database can be
split between HELCOM and OSPAR, considerable cost reductions would be achieved. However, there are
comparatively high harmonization costs. Agreements have to be reached, among other things, on
• Methodology of sampling and monitoring
• Definitions, e.g. unmonitored areas
• Methodology of verification of data
• Determination of measurement uncertainties
• Consideration of values below LOD/LOQ
• Methodology of the estimation of loads from unmonitored areas
• Methodology of trend analysis
• Principles of source apportionment
• Report structure
The administrative effort needed to ensure that neither OSPAR’s nor HELCOM’s
requirements are given preference must also be taken into account.
Thus, for small and customized applications, a scheme as in scenario S5 is recommended. For a large-scale
solution (see footnote 8), a common single database might be better suited, as in scenario S4.
11.2 Independent, standalone RID database
The complete re-development of the 2008/2014 RID database as a web-only solution (S3) gives the user the
most functionality in the most easily accessible way, but the costs are about 1.5 times higher than those
of improving the existing Access-based database (S1 and S2). The existing database
represents a solid foundation for the 2016 RID database. The basic 2008 and improved 2014 database can
be expanded with an acceptable amount of work to fulfil the users’ requirements such as providing
• an intuitive, flexible, robust, secure and quick application
• easy web-access to the RID data for data users
• easy import of RID data
• validated and quality-assured RID data, as well as
• flexible export of RID data.
Footnote 8: e.g. long-term collaboration, expansion of the scope of the RID Programme or strong integration into third-party databases.
Compared to scenario S1, scenario S2 offers the considerable advantage that the time-consuming import is
done by the data providers themselves, so errors are found more quickly and feedback is more direct, as
continuous communication between the central data host and the CPs is no longer required. Thus, QuoData
recommends scenario S2.
A Annex
A.1 Structure of 2008 RID database
Figure 10-1: Implemented structure of the 2008 RID database
For additional information, the reader is referred to the RID database documentation delivered by QuoData
in 2008.
A.2 Description of 2008 RID database
A.2.1 Structure of input tables
To cover the different input sources reported by the contracting parties, there are several input tables.
The tables 5a to 5c contain the direct discharges to the maritime area (sewage effluents in table 5a,
industrial effluents in table 5b and the totals of direct discharges in table 5c). The tables 6a to 6c contain
the riverine inputs to the maritime area (6a for main rivers, 6b for tributary rivers and 6c for the totals of
riverine inputs). There are three other tables. Table 7 contains information on the measured concentration
of a contaminant for a selected combination of contracting party – year – catchment area. Table 8 contains
the contaminant-specific detection limits of the sewage effluents, industrial effluents and riverine inputs
for a selected combination of contracting party – year – catchment area. Specific information on the
catchment area, e.g. flow rate, is contained in table 9.
A.2.2 Main Form
The whole data management of the 2008 RID database is conducted through the main form. This form
consists of four tabs:

Tables and Reports,

Export/Aggregation,

Import from Excel and

Log Book.
A.2.3 Tables and Reports
The tab Tables and Reports is used for displaying various tables according to the selected criteria. Five
possible overview reports can be created:

Plausibility check,

Annual summary,

Structure of water bodies,

Availability tables,

Annual data per year
Figure 10-2: Main Form – Tables and Reports
Plausibility Check
In this section the data from each contracting party can be checked for plausibility. For a given
contaminant, the plausibility check tests on the non-aggregated raw data (1) whether the range of the load
(upper value minus lower value) coincides with the limit of detection based on the runoff (so-called “Noise
Check”) and (2) whether there is a clear relationship between the mean concentration and the mean load
based on the runoff (so-called “Consistency Check”).
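The two checks can be illustrated with a short sketch. The report gives no formulas, tolerances or units, so everything below is an assumption: loads are taken to be concentration multiplied by runoff in consistent units, the noise check is read as "the load range should not exceed the limit of detection times the runoff", and the consistency check as "the mean load should match the mean concentration times the runoff within a relative tolerance". The function names and the 10 % tolerance are hypothetical and do not describe the actual RID database implementation.

```python
import math


def noise_check(lower_load, upper_load, lod, runoff, rel_tol=0.10):
    """Assumed reading of the noise check.

    The load range (upper minus lower estimate) should not exceed the
    load that corresponds to the limit of detection, i.e. lod * runoff,
    allowing for a relative tolerance.
    """
    max_range = lod * runoff
    return (upper_load - lower_load) <= max_range * (1 + rel_tol)


def consistency_check(mean_conc, mean_load, runoff, rel_tol=0.10):
    """Assumed reading of the consistency check.

    The reported mean load should be close to mean concentration
    multiplied by runoff.
    """
    expected_load = mean_conc * runoff
    return math.isclose(mean_load, expected_load, rel_tol=rel_tol)


# Invented example values (consistent units assumed).
print(noise_check(lower_load=95.0, upper_load=100.0, lod=0.5, runoff=20.0))
print(consistency_check(mean_conc=2.0, mean_load=41.0, runoff=20.0))
```

A record that fails either function would be flagged for manual review rather than rejected, since both readings are only approximations of the checks described above.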
Annual Summary
In this section the following summary tables can be created for one corresponding year.

Table 1a:
Information Received on Inputs to the Maritime Area of the OSPAR Convention

Table 1b:
Determinands Reported by Contracting Parties

Table 2:
Direct Discharges to the Maritime Area of the OSPAR Convention by Country

Table 3:
Riverine Inputs to the Maritime Area of the OSPAR Convention by Country

Table 4:
Sum of Direct Discharges (Table 2) and Riverine Inputs (Table 3) to the Maritime Area
of the OSPAR Convention by Country

Table 4b:
Sum of Direct Discharges and Riverine Inputs to the Maritime Area of the OSPAR
Convention by Sea Area
Figure 10-3: Example: Table 1a from Tables and Reports – Annual Summary
Structure of Water bodies
An overview of the structure of all water bodies for each contracting party can be exported in this section.
Figure 10-4: Structure of Water bodies for Germany
Availability Tables
In this section a separate Excel file containing an availability overview for the input tables 5a – 5c as well as
6a – 6c can be exported per each contracting party. This file indicates for which water body and which year
the specified contracting party submitted data.
Figure 10-5: Availability Tables for Germany – table 6a
Annual Data per Year
For each contracting party and year the data of the input tables (5a - 9) can be exported separately in an
Excel file. Available for selection is either an overview of the number of existing values or an overview of
the lower, upper and mean load/concentration.
Figure 10-6: Annual Data per Year – Existing Values
Figure 10-7: Annual Data per Year – Annual Data
A.2.4 Export/Aggregation
Via this tab RID data over several years can be exported to Excel or directly to RTrend to generate long-term trends.
Figure 10-8: Main Form – Export to RTrend
In order to guarantee a statistically reliable trend assessment there are certain rules on extrapolation and
interpolation of missing data and the aggregation of data.
For the data export, a contracting party, the time span and, if applicable, the determinand have to be
specified.
A.2.5 Import from Excel
Via this tab annual data can be imported to the RID database. For an easy import, Excel file templates can
be generated for each contracting party and year.
These templates correspond with the export files of the input tables (table 5a to 9). After entering the data
into the Excel files, these may be uploaded.
Figure 10-9: Main Form – Import from Excel
Figure 10-10: Template for data input – CP Germany, year 2009, table 6a
A.2.6 Log Book
For the purpose of traceability, actions on the database can be logged in a log book. All imports into the
database are automatically logged and the log book can be exported to an Excel file.
Figure 10-11: Main Form – Log Book
The following actions can be recorded:

Data delivery when data have been delivered;

E-mail when information has been sent or received via e-mail;

Telephone call when information has been sent or received via a telephone call.
Agenda Item 4 INPUT 14/4/3-Add.2a-E
English only
OSPAR Convention for the Protection of the Marine Environment of the North-East Atlantic
Meeting of the Working Group on Inputs to the Marine Environment (INPUT)
London (UK): 28 - 30 January 2014
Development of the RID Database - Step 2: Evaluation of database model and outline of overall requirements
Presented by the Secretariat and QuoData
Attached are the results of the work by QuoData on ‘Step 2’ of the project as reported on 10 January 2014.
Description of Step 2
Evaluate database model and outline overall requests on data submission, data access and database interface
(from the contract between OSPAR Commission and QuoData)
Under this task the Contractor will evaluate the existing database model (structure) and assess if it matches
the database outputs required by users. This will involve the Contractor to:
2A assess the existing database;
2B evaluate possible database platforms;
2C set minimum requirements;
2D identify pros and cons of different database designs.
2A Assessment of existing database
The Contractor will assess the existing database model. For this assessment, the summary of the
questionnaire (see step 1) will be used. The highly prioritized needs will be given focus. The Contractor will
consider all relevant wishes and questions for the optimal database structure. Based upon these
preconditions, the Contractor will evaluate where the existing database does not meet the user
requirements. This will happen through the contact to the RID DTG.
2B Evaluation of possible database platforms
Based on the user requirements evaluated in step 1 and the minimum requirements (see next section) as
well as other requirements (e.g. expenditures of time and costs), the Contractor will recommend the most
appropriate technique to create the database and for OSPAR running and maintaining it. The Contractor will
consider and offer a variety of platforms for the new database. Especially the database systems MS SQL,
MySQL and MS Access, but also SAP Advantage Database Server and PostgreSQL.
Regarding the frontend, the Contractor will also consider and offer a variety of options, such as Drupal,
DaDaBIK or DevExpress’ XAF in order to generate user-friendly web user interfaces and guarantee filtering
and sorting functions.
2C Minimum requirements
The Contractor will work towards a proposal that guarantees that the end user works with a user-friendly,
intuitive interface. This could include a web interface, making the system requirements for the end user as
minimal as possible and increasing the accessibility. To ensure the data is valid there are different scenarios
of user roles. These will be such that they can be implemented through a system of user groups with
different rights.
2D Pros and cons of different database designs
The Contractor will evaluate the advantages and disadvantages of:
1. one commonly used database for OSPAR and HELCOM;
2. two separate databases with a common structure;
3. retain the existing RID database and implement a new interface upon it.
The evaluation will be based on different categories concerning the user needs. These will be specified after
the results of the questionnaire are summarized. Other categories will be:
1. the security structure (different user roles and user groups or equivalent system);
2. the ability to change the database structure in the future;
3. the ability to generate reports and visualizations;
4. the security and consistency of the data;
and also expenditures of time and costs.
OSPAR Commission INPUT 14/4/3‐Add.2a‐E
Developing the Database for
the Comprehensive Study on
Riverine Inputs and Direct
Discharges (RID)
Evaluate database model and outline
overall requirements on data submission,
data access and database interface
Imprint
QuoData GmbH
Quality Management and Statistics
Kaitzer Str. 135
D-01187 Dresden
Germany
Phone: +49 (0) 351 40 28867 0
Fax: +49 (0) 351 40 28867 19
E-mail: [email protected]
Web: www.quodata.de
Authors
PD Dr. habil Steffen Uhlig
Dipl.-Phys. Christian Bläul
Dipl.-Math. Henning Baldauf
Dipl.-Math. Kirstin Frost
10.01.2014
Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface
Contents
1 Summary and introduction
2 Assessment of existing database (Step 2A)
2.1 Prioritization of tasks
2.2 Summary table
2.3 Requirements basic to all software systems
3 Evaluation of possible database platforms (Step 2B)
3.1 Considered backends
3.2 Frontend options
4 Approach comparison
4.1 Implementation effort
4.2 Summary table
4.3 Read-only web frontend for approach (A1) and (A2)
4.4 Synchronisation methods for the distributed copies in approach (A2)
4.5 Summary of effort by topic for the three approaches (Step 2D)
5 OSPAR and HELCOM: Pros and cons of different database designs (Step 2D)
A Appendices
A.1 DBMS backend possibilities comparison
A.1.1 Performance
A.1.2 License costs
A.1.3 Operating system requirements
A.1.4 Administration effort
A.1.5 Spatial Data (GIS)
A.2 Web database vs. local database software
A.2.1 Advantages of web applications
A.2.2 Disadvantages of web applications
A.3 Technical considerations on frontend frameworks using Drupal as an example
1 Summary and introduction
In this report, the current Access-based database as well as possible future web-based database
solutions are checked with regard to the wishes stated in the questionnaire of Step 1 of this project¹.
Additionally, some implicit requirements, such as ease-of-use, an intuitive user interface and
user-friendly error handling, are considered. It was found that the current RID DB supports 7 of the 28
functions classified as “Need to have”. None of the 8 “Nice to have” functions are currently available.
This classification was based on the frequency of answers in the questionnaire (see Section 2.1).
From a technical perspective, all functions requested by users could be implemented into the existing
Access-based solution.
While every system has its advantages and caveats, based on the information obtained by QuoData
until 6 January 2014, extending the Access database and making its data available for web browsers
seems slightly more cost-efficient (see Section 4.3). Three approaches are considered:
(A1) Access database with a read-only web frontend for data presentation and data downloads. Only
one copy exists, into which only one organisation² imports data using a non-web import
module. This approach costs about 18.4 ± 6 man months.
(A2) Access database with a read-only web frontend for data presentation and data downloads.
All data providers have a copy of the database, which they can use for importing, reporting
and GIS integration. All copies are synchronised by only one organisation². This approach
costs about 19.9 ± 6 man months.
(A3) Pure web solution. All functions are usable with a browser, including the data import by the
CPs. This approach costs about 24.3 ± 10 man months.
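To put the three estimates side by side, the following sketch restates the figures given for (A1) to (A3) and checks whether their ranges overlap. Reading "±" as a simple lower and upper bound is an assumption made here for illustration (the report does not say whether it denotes a range or a statistical uncertainty), and the helper names are hypothetical.

```python
# Effort estimates in man months as stated for approaches (A1) to (A3):
# (central estimate, stated +/- margin).
estimates = {
    "A1": (18.4, 6.0),
    "A2": (19.9, 6.0),
    "A3": (24.3, 10.0),
}


def interval(central, margin):
    """Return (lower, upper), treating the margin as a plain bound (assumption)."""
    return (central - margin, central + margin)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for name, (central, margin) in estimates.items():
    low, high = interval(central, margin)
    print(f"{name}: {low:.1f} to {high:.1f} man months")

a1 = interval(*estimates["A1"])
a3 = interval(*estimates["A3"])
print("A1 and A3 ranges overlap:", overlaps(a1, a3))
print("A3 / A1 central ratio:", round(estimates["A3"][0] / estimates["A1"][0], 2))
```

Under this reading the ranges of all three approaches overlap, which is consistent with the cautious wording "slightly more cost-efficient" above.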
While the web-based approach is more accessible and future-proof, the Access approaches pose
less risk. More differences and implications are discussed in Sections 3 and 4, as well as Appendix A.2.
The following sub-steps were defined in the contract between OSPAR and QuoData:
Step 2A – Assessment of the existing database
First of all, the existing database model is assessed against the responses to the questionnaire.
Based on all relevant wishes of data providers and data users, it is analysed where the existing
database does not meet the user requirements and, if possible, how the missing functionality can be
implemented in the current RID database structure.
Step 2B – Evaluation of possible database platforms
The decision about the database management system should be based on the user requirements
evaluated in step 1 and the resulting minimum requirements. Other requirements (e.g. expenditures of
time and costs) are also considered. To determine which technique is most appropriate to create the
database and for OSPAR to run and maintain the database, the different possible solutions are
assessed.
¹ To find out about the requirements of the end users ('data providers' and 'data users'), a questionnaire to prioritize new
features and improvements for the RID database was filled in by 35 participants, from 21st October to 12th November 2013. A
report containing all questions and answers has been provided by QuoData GmbH.
² e.g. OSPAR Secretariat or Bioforsk
There are a variety of platforms for a new database. Especially the database systems MS SQL,
MySQL and MS Access, but also PostgreSQL should be mentioned here. Regarding the frontend
there is another variety of options. Existing frameworks such as Drupal or DevExpress’ XAF could be
used to generate user-friendly web user interfaces and guarantee filtering and sorting functions. Each
of these solutions comes with a large number of well-tested plugins and modules to allow for speedy,
cost-efficient and reliable software development.
Step 2C – Minimum requirements
Whatever the database platform, it should always be guaranteed that the end user works with a
user-friendly, intuitive interface. This could include a web interface, making the system requirements
for the end user as minimal as possible and increasing the accessibility.
MS SQL, MySQL or PostgreSQL and even MS Access databases can be maintained within a server
infrastructure. This is an easy way to guarantee the availability for the end users, fulfil security
requirements, and ensure proper data validation.
Therefore in this step, the different options are compared with regard to the overall minimum
requirements for the
• data submission
• data validation
• database interface etc.
Step 2D – Pros and cons of different database designs
In this last step the advantages and disadvantages of
• one commonly used database for OSPAR and HELCOM,
• two separate databases with a common structure,
• keeping the existing RID database and implementing a new interface upon it
are evaluated.
The evaluation is based on different categories concerning the user needs. These are specified based
on the results of the questionnaire. Other categories are:
• the security structure (different user roles and user groups or equivalent system),
• the ability to change the database structure in the future,
• the ability to generate reports and visualizations,
• the security and consistency of the data, and also
• expenditures of time and costs.
2 Assessment of existing database (Step 2A)
In this section, the current MS Access-based RID database, abbreviated as RID DB from now on, is
examined. First, a summary table to ease decision-making is presented. For each wish stated, it
records whether the function is currently available to end users, in other words whether it is usable
without any IT implementation effort. The estimated efforts are spelled out in Section 4, where a table
lists them for realization within the current RID DB and within the alternative proposals.
2.1 Prioritization of tasks
To prioritize the questionnaire items, the scoring system explained below was used to harmonize the
different answer possibilities. It resulted in the following categories to be used in this report:
1. ‘Need to have’: score of 0 or larger. Also in this priority category are requirements basic to all
software systems: freedom from crashes, system availability, behaviour after the user makes an
error, and intuitiveness.
2. ‘Nice to have’: score between -1 and 0 (0 > score > -1)
3. ‘Minority wish’: most respondents don’t consider this important (score < -1)
To calculate the score, the possible answers were assigned a value. These values were averaged to
create the score.
Answer                 Value
very important         +2
fairly important       +1
important               0
slightly important     -1
not at all important   -2

Answer                 Value
yes                    +2
no                     -2
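The scoring rule can be written down in a few lines. The sketch below averages the answer values listed above and maps the score to the three categories; the function and variable names are hypothetical. The report leaves a score of exactly -1 unassigned, so treating it as ‘Minority wish’ is an assumption made here.

```python
# Answer values as defined in the two tables above.
FIVE_POINT = {
    "very important": 2,
    "fairly important": 1,
    "important": 0,
    "slightly important": -1,
    "not at all important": -2,
}
YES_NO = {"yes": 2, "no": -2}


def score(counts, values):
    """Average the answer values, weighted by the number of respondents."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no answers given")
    return sum(values[answer] * n for answer, n in counts.items()) / total


def prioritize(s):
    """Map a score to the category used in this report."""
    if s >= 0:
        return "Need to have"
    if s > -1:
        return "Nice to have"
    # Assumption: a score of exactly -1 is not covered by the report
    # and is treated as a minority wish here.
    return "Minority wish"


# Example using the answer counts of Part 1, question 1 (28 yes, 5 no).
s = score({"yes": 28, "no": 5}, YES_NO)
print(round(s, 2), prioritize(s))
```

Applied to the answer counts in the summary tables of Section 2.2, this calculation reproduces the listed scores, for example 28 "yes" and 5 "no" answers give (28·2 − 5·2)/33 ≈ 1.39.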
2.2 Summary table
2.2.1 Part 1: Tables of RID database
Question | What do potential users prefer? | Score | Prioritization | Available in existing RID DB
1 Is it preferred to pool all monitored rivers in one comprehensive table? | yes: 28 of 33; no: 5 of 33 | 1.39 | Need to have | no
2 Should the set of separate RID tables be expanded? | yes: 10 of 29; no: 19 of 29 | -0.62 | Nice to have | --
3 Which additional RID tables are needed? | many proposals, among them measurement uncertainty: 4 of 12; calculation methods: 3 of 12 | n/a | Nice to have | --
4 Should monitoring data used for RID calculations be stored in the RID database? | very important: 16 of 32; fairly important: 5 of 32; important: 7 of 32; slightly important: 3 of 32; not at all important: 1 of 32 | 1.00 | Need to have | no
Should RID load data at less than annual aggregation be stored in the RID database? | very important: 10 of 32; fairly important: 9 of 32; important: 7 of 32; slightly important: 1 of 32; not at all important: 5 of 32 | 0.56 | Need to have | no
5 If less-than annual values are to be stored in the RID database, what temporal aggregation should such data have? | monthly: 27 of 30; per quarter: 3 of 30; every six months: 0 of 30 | n/a | Need to have | no
2.2.2 Part 2: Access to the RID data
Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB
1 Should the RID database have the functionality to exchange data with other databases? | yes: 25 of 32; no: 7 of 32 | 1.13 | Need to have | no
With which database should a data exchange be possible? | universal, easily accessible format that can be exchanged with other databases: 7 of 22 | n/a | Need to have | --
 | WISE (EEA): 6 of 22 | | Nice to have |
 | HELCOM PLUS, EMEP, WFD, MSFD, CEMP, CAMP, ICES database: 2 of 22 | | Minority wish |
 | EMECO, Waterquality database NL, DONAR, GIS DB, GEMS/Water, DB on winter nutrient concentrations: 1 of 22 | | Minority wish |
2 Is there a need to allow easy access to data and charts by a web browser? | yes: 33 of 34; no: 1 of 34 | 1.88 | Need to have | no
3 Is there a need to allow easy access to data and charts by smartphone or tablet? | yes: 6 of 32; no: 26 of 32 | -1.25 | Minority wish | no
4 Which additional conditions should apply to access to the RID database? | data should be public after QA step: 2 of 5; original data should be attributed to original providers: 1 of 5; contact of data manager if a user finds unreasonable values: 1 of 5; password protected, users from OSPAR community only: 1 of 5 | n/a | Minority wish | no
2.2.3 Part 3: Submission of data into the RID database
Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB
1 Submitting data via importing the data file? | very important: 11 of 17; fairly important: 4 of 17; important: 2 of 17; slightly important: 0 of 17; not at all important: 0 of 17 | 1.53 | Need to have | yes
Submitting data via transferring the data by “copy & paste”? | very important: 1 of 11; fairly important: 1 of 11; important: 4 of 11; slightly important: 3 of 11; not at all important: 2 of 11 | -0.36 | Nice to have | no
2 Excel file format for submitting data? | very important: 10 of 18; fairly important: 4 of 18; important: 3 of 18; slightly important: 1 of 18; not at all important: 0 of 18 | 1.28 | Need to have | yes
Comma separated values (CSV) file format for submitting data? | very important: 3 of 13; fairly important: 2 of 13; important: 3 of 13; slightly important: 1 of 13; not at all important: 4 of 13 | -0.08 | Nice to have | no
Which other formats should be used for data submission? | ACCESS: 2 of 9; XML: 2 of 9; ASCII: 1 of 9 | n/a | Minority wish | no
3 Should be given the opportunity to submit partial RID datasets? | yes: 14 of 17; no: 3 of 17 | 1.29 | Need to have | yes
4 Should there be a possibility to add information on the ‘measure of uncertainty’? | very important: 6 of 17; fairly important: 7 of 17; important: 2 of 17; slightly important: 1 of 17; not at all important: 1 of 17 | 0.94 | Need to have | no
Should there be a possibility to add additional comments on data (e.g. flagging when data is missing or suspicious)? | very important: 6 of 18; fairly important: 8 of 18; important: 3 of 18; slightly important: 1 of 18; not at all important: 0 of 18 | 1.06 | Need to have | no
2.2.4 Part 4: Validation of imported data
Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB
1 Covering quality control procedure for missing values? | very important: 14 of 20; fairly important: 1 of 20; important: 1 of 20; slightly important: 2 of 20; not at all important: 2 of 20 | 1.15 | Need to have | no
Covering quality control procedure for invalid values (e.g. wrong units in concentration values)? | very important: 17 of 20; fairly important: 2 of 20; important: 0 of 20; slightly important: 1 of 20; not at all important: 0 of 20 | 1.75 | Need to have | no
Covering quality control procedure for suspicious values? | very important: 10 of 20; fairly important: 5 of 20; important: 2 of 20; slightly important: 3 of 20; not at all important: 0 of 20 | 1.10 | Need to have | no
Covering quality control procedure for too many significant figures? | very important: 4 of 18; fairly important: 5 of 18; important: 4 of 18; slightly important: 4 of 18; not at all important: 1 of 18 | 0.39 | Need to have | no
2 Which further automatic tests should be taken into account? | Double recordings: 3 of 12 | n/a | Nice to have | no
3 Who is responsible for the final approval of imported data? | Contracting Party only: 14 of 18; relevant OSPAR Committee: 2 of 18; somebody else: 2 of 18 | n/a | -- |
Who should be responsible for the final approval of imported data instead of Contracting Parties or relevant OSPAR committee? | RID input data group or data manager: 4 of 18 | n/a | -- |
2.2.5 Part 5: Functionality
Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB
1 Should it be possible to filter, sort and search entries (for example select parameters, years, areas) from the different parts of the RID tables? | very important: 22 of 31; fairly important: 7 of 31; important: 0 of 31; slightly important: 2 of 31; not at all important: 0 of 31 | 1.58 | Need to have | no
2 Should it be possible to customize e.g. the range of geographic areas (e.g. across sub-regions) from which data are to be aggregated in data products? | very important: 20 of 31; fairly important: 8 of 31; important: 1 of 31; slightly important: 1 of 31; not at all important: 1 of 31 | 1.45 | Need to have | no
3 For which particular features you would like a ‘customization’ option? | subregional assessment: 11 of 20; Total river basin inputs: 5 of 20; parts of sea area covered by OSPAR: 3 of 20 | n/a | Nice to have | no
4 Exporting data via Excel file? | very important: 17 of 30; fairly important: 8 of 30; important: 2 of 30; slightly important: 3 of 30; not at all important: 0 of 30 | 1.30 | Need to have | yes
Exporting data via CSV file? | very important: 10 of 23; fairly important: 7 of 23; important: 3 of 23; slightly important: 3 of 23; not at all important: 0 of 23 | 1.04 | Need to have | no
Exporting data via XML file? | very important: 5 of 26; fairly important: 3 of 26; important: 7 of 26; slightly important: 5 of 26; not at all important: 6 of 26 | -0.15 | Nice to have | no
Which other formats should be used for data table export? | ACCESS: 2 of 11; ASCII: 3 of 11; Shape file for graphs: 2 of 11; PDF: 1 of 11 | n/a | Minority wish | no
5 Should the RID database be able to link to a GIS system so as to create maps? | very important: 14 of 31; fairly important: 9 of 31; important: 5 of 31; slightly important: 3 of 31; not at all important: 0 of 31 | 1.10 | Need to have | no
2.2.6 Part 6: Additional features
Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB
2 Should LOD be included in the RID database? | very important: 14 of 29; fairly important: 6 of 29; important: 4 of 29; slightly important: 3 of 29; not at all important: 2 of 29 | 0.93 | Need to have | yes
Should LOQ be included in the RID database? | very important: 15 of 28; fairly important: 7 of 28; important: 5 of 28; slightly important: 1 of 28; not at all important: 0 of 28 | 1.29 | Need to have | no
Should flow measurements be included in the RID database? | very important: 22 of 29; fairly important: 5 of 29; important: 2 of 29; slightly important: 0 of 29; not at all important: 0 of 29 | 1.69 | Need to have | yes
Should proportions of measurements above/below LOD and LOQ be included in the RID database? | very important: 14 of 28; fairly important: 6 of 28; important: 4 of 28; slightly important: 3 of 28; not at all important: 1 of 28 | 1.04 | Need to have | no
Should level of the annual aggregated loads be included in the RID database? | very important: 11 of 27; fairly important: 7 of 27; important: 4 of 27; slightly important: 4 of 27; not at all important: 1 of 27 | 0.85 | Need to have | no
3 Should a separate module for trend analysis (e.g. RTrend software) be part of the RID database capabilities? | yes: 24 of 32; no: 8 of 32 | 1.00 | Need to have | yes
4 Should some of the fields of the ‘text reporting format’ be incorporated into the RID data submission and the RID database? | yes: 19 of 29; no: 10 of 29 | 0.62 | Need to have | no
2.3 Requirements basic to all software systems
The items mentioned below were not asked about in the questionnaire. Instead, they are implicit
requirements of almost all software projects:
• Ease-of-use: describes whether users can get the results they need quickly.
The current RID DB has a clearly defined feature set. The only part that consumes more user
time than necessary is the data import, which is part of the improvements of Step 4 of this
project. The actual functionality potential users would expect was discussed above.
• Intuitive user interface: describes how difficult it is for a user to guess or learn which functions
are where, and how they work. As the current RID DB consists of just 4 tabs in a single
window, as can be seen in Figure 1, learning to use it is easy.
• Error handling: describes what happens when the user tries something that is prevented by the
software or when the software needs more input than the user has given. Most error messages
are currently very short. It is recommended to provide a higher level of detail.
After an error occurs, the software remains usable. However, the RID database does not
clearly communicate which contents have been updated. Improving this is also part of Step 4.
Figure 1: Main window of current RID DB
3 Evaluation of possible database platforms (Step 2B)
In the previous section, the currently existing Access solution was assessed. This section will focus on
possible alternatives for an entirely new development of the RID database (without using existing VBA
source code). For a comparison between the two approaches (A1) and (A2) of enhancing the currently
existing database and creating a new solution (A3) from scratch, the reader is referred to Section 4.
The general term “database” can be split into the realms of data storage and user interaction. The first is often called a backend or Database Management System (DBMS), and only requires attention from IT administrators during initial setup and during emergencies. The latter is the frontend, and typically the only part any user experiences. A typical frontend can work together with several backends and vice versa.
3.1 Considered backends
In this section, four different DBMS are analysed from different perspectives. To begin with, these systems are presented briefly.
• MySQL: a widely used open source DBMS.
• MS SQL Server: a powerful DBMS produced by Microsoft.
• MS Access: a software package produced by Microsoft which combines a DBMS (the Microsoft Jet Database Engine) with an integrated development environment.
• PostgreSQL: an open source DBMS released by a community of developers called the PostgreSQL Global Development Group.
MS Access differs from the other 3 analysed DBMS in that its main focus lies on desktop database
applications. It offers a powerful graphical user interface which makes it comparatively easy to create
and maintain databases. Database applications with higher complexity can be created with the
development environment for Visual Basic for Applications (VBA), which is integrated in the software
package.
Access can serve as the data storage of a browser-based web application, but not as its programming environment. Alternatively, Access applications can be used over the web with the help of Microsoft Terminal Services and Remote Desktop Application in Windows Server 2008. This can be an appropriate solution if an existing local Access application is to be extended to remote users. Compared to a browser-based web application, however, it does not perform as well in a multi-user scenario and entails higher licence costs.
The maximum size of an Access database is 2 GB, which could be a limiting factor in the long term because of (a) the much faster growth of the database in multi-user environments due to its internal change tracking (see footnote 7), and (b) the possible onset of sub-annual data collection.
As 33 out of 34 respondents of the questionnaire expressed the opinion that the database should be accessible by a web browser, using MS Access as the data storage for an entirely new development is an inferior choice. It is therefore not included in the following analysis. Please note that this statement is not about extending the existing RID DB, but about selecting a backend for an alternative solution.
The other 3 database systems however perform equally well on the expected data amounts and user
numbers. The details that lead to this conclusion can be found in Appendix A.1. The following table
summarises the information given there.
                        MySQL                  MS SQL         PostgreSQL
Performance             good                   excellent      good
License costs           free                   free [3]       free
Operating System        Windows, Linux, UNIX   Windows        Windows, Linux, UNIX
Administration effort   low                    high           moderate
Spatial Data (GIS)      basic support          full support   full support
3.2 Frontend options
A frontend is what the user sees: it determines the graphical user interface, typically having buttons,
input boxes, information displayed in tables etc. In the following paragraphs, three options are
presented. They are not to be confused with the three approaches discussed elsewhere in this
document. Here are the general options for frontend creation:
(1) A native application, typically a Windows program
(2) A web application
(3) A mixture of both
(1) In the most common scenario, a Windows program needs to be installed locally. This means that every user of the new RID DB has his or her own installation and typically expects its data to be up to date. A Windows application could connect to a central server via the web to synchronize the local copy of the data with the central server. Well-designed native applications have speed benefits over pure web applications, but require more attention from the IT department within every institute the application is used in. This would limit the reach to users that depend on the database and might deter potential users.
(2) In a web application, everything is managed within a browser. Therefore, there is no installation effort for end-users. This option is considered in approach (A3). It makes the database very easy to access, but might feel slower for certain data-centred tasks.
[3] for SQL SERVER EXPRESS EDITION
(3) To alleviate this problem and avoid the problems of the native application, one can split the user requirements: a native application (ideally with better speed) for a core user group, and a web application for occasional users or users within restrictive IT environments.
Using this architecture, locally installed software could be used for importing and Windows-centric tasks like GIS integration. An additional web application would be specialized for data presentation, data downloads and reporting. The existing Access application could partly be reused and therefore the development effort could be reduced. On the other hand, two different systems would have to be maintained, which will increase the long-term costs. This option is used for approach (A1) and (A2).
For a detailed list of pros and cons of web applications, the reader is referred to Appendix A.2.
We believe that for a new development without the use of existing source code, option (2) carries the most benefits, the least risk and the smallest overall costs, especially when including the internal costs within the Contracting Party. If a fresh start is chosen, it is therefore recommended to develop the database user interface as a complete web application. For this task, a content management system like Drupal [4] or a framework like DevExpress eXpressApp Framework (XAF) [5] could be used instead of starting from scratch. Drupal is free of charge and supports the three DBMS backends mentioned above. More details on the development process of a future RID DB with Drupal can be found in Appendix A.3. XAF requires a one-time development licence (ca. 1604 €) and supports all of the database systems mentioned above.
Instead of using a framework, a web application could also be developed from scratch, with much more effort but also more possibilities for customization. The best choice of programming language and framework depends on the experience of the software development team, but the situation is similar for other available frameworks and programming languages.
[4] http://drupal.org
[5] https://www.devexpress.com/products/net/application_framework
4 Approach comparison
In this section, technical explanations and effort estimations are presented for each question of Step 1.
The two approaches (A1) and (A2) of extending the current Access database and approach (A3) of
creating a new web application are compared.
4.1 Implementation effort
In the tables below, the effort column lists the estimated technical effort (time) needed to implement and test [6] the feature. It does not take into account the effort to create end-user and IT documentation, because the time needed for documentation is similar for most functions. Also, the effort column does not include how much work the change entails for the IT department of the Contracting Party. Before the implementation can begin, it is recommended to craft a detailed requirements document. The effort for describing the feature in such a document is crucial for the success of the overall project, but is not part of the implementation estimate given below.
Here, only a qualitative statement is given due to the high uncertainty of individual tasks. The exact hours depend on factors such as past experience and organisational overhead (e.g. a project done by 1 person over 4 years cannot be sped up to finish in one year with 4 people; instead, an overhead of 50-100% for design, internal communication needs and quality assurance is to be expected). Nevertheless, here are the approximate man months for the qualitative terms:
Effort statement   Man months without overhead
small              0.0 – 0.3
medium             0.3 – 1.2
high               1.2 – 1.8
In Section 4.5, the efforts for the questions are quantified by topic.
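As a worked illustration of the ranges above, the following minimal Python sketch sums the man-month ranges of several effort statements and optionally applies the 50-100% overhead mentioned in the text. The function name `effort_range`, the example feature list and the way the overhead is applied to the lower and upper bound are assumptions made for illustration only; they are not part of the consultant's method.

```python
# Man-month ranges without overhead, taken from the effort statement table.
EFFORT_RANGES = {
    "small": (0.0, 0.3),
    "medium": (0.3, 1.2),
    "high": (1.2, 1.8),
}

# Overhead for design, internal communication and quality assurance (50-100%).
OVERHEAD_MIN = 0.5
OVERHEAD_MAX = 1.0


def effort_range(statements, with_overhead=True):
    """Sum the man-month ranges of several effort statements.

    Returns (low, high). With overhead, the low bound is increased by 50%
    and the high bound by 100% (an assumption of this sketch).
    """
    low = sum(EFFORT_RANGES[s][0] for s in statements)
    high = sum(EFFORT_RANGES[s][1] for s in statements)
    if with_overhead:
        low *= 1 + OVERHEAD_MIN
        high *= 1 + OVERHEAD_MAX
    return round(low, 2), round(high, 2)


if __name__ == "__main__":
    # Hypothetical feature list: two "small" tasks and one "medium" task.
    tasks = ["small", "small", "medium"]
    print(effort_range(tasks, with_overhead=False))
    print(effort_range(tasks))
```

Because every statement maps to a range rather than a single number, the result is itself a range; this mirrors the report's remark that the figures are rough estimates.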
[6] Here, test refers to the individual function being tested by the developer or someone with a similar mindset. It ensures that the function works as the developer intends it to work. Only if proper documentation or communication exists will this also ensure that the function works as the end user needs it to. It does not refer to integration tests (all functions coming together) or user experience tests.
4.2 Summary table
The following headings group the assessment by questionnaire topics. Note that approaches (A1) and (A2) are not evaluated separately, because from the
programming perspective, they are almost identical, with the exception of an additional synchronization method for approach (A2).
4.2.1 Part 1: Tables of RID database

Question 1: Is it preferred to pool all monitored rivers in one comprehensive table?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Within the RID DB, no distinction is made between main rivers, tributary rivers, areas or point sources. This distinction only exists for the import files. The import workflow including templates and its messages needs to be updated. The main work here would be for the data providers who have to change their reporting procedures and data collection systems.
Effort to extend Access DB: small
Effort for new web application: included in minimal design

Question 2: Should the set of separate RID tables be expanded?
Prioritization: Nice to have
Available in existing RID DB: --

Question 3: Which additional RID tables are needed?
Prioritization: Nice to have
Available in existing RID DB: --
What needs to be done to extend the Access RID DB? (shared entry for questions 2 and 3) Expanding the tables affects both the import and the export. It would also need comprehensive documentation for the data providers and split-up of existing tables. This task would likely also make it necessary to communicate with Bioforsk to make sure that the required accuracy is reached.
Effort to extend Access DB: medium
Effort for new web application: less than Access

Question 4: Should monitoring data used for RID calculations be stored in the RID database?
Prioritization: Need to have
Available in existing RID DB: no

Question 4: Should RID load data at less than annual aggregation be stored in the RID database?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? (shared entry for question 4) The current DB design is not meant for less-than-annual data. Import, export and data storage would need to be changed. This change also implies a larger planning step to make sure that the frequency is either flexible or fixed correctly.
Effort to extend Access DB: medium
Effort for new web application: about half of Access
4.2.2 Part 2: Access to the RID data

Question 1: Should the RID database have the functionality to exchange data with other databases?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? A connection to another database cannot be assessed without proper documentation. It is likely that a complete link would be very hard to create, based on the assumption that data is likely to be not fully compatible. Access can, in theory, connect to online databases to retrieve or submit data. This would require an internet connection for Access, which sometimes is not granted by the respective IT department. Currently, the RID DB doesn’t support any remote data sources or any generic import or export format. Indeed, for some data, no export functionality exists in the current RID DB.
Effort to extend Access DB: medium
Effort for new web application: medium

Question 2: Is there a need to allow easy access to data and charts by a web browser?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Access doesn’t provide an easy way to make its interface available online. While a web frontend could be built on top of the current database, it would mean a tremendous effort, because the existing VBA code is meant for a locally installed Access. It would be almost impossible to use more than the idea and the structure of the existing code, because a web frontend would not use VBA (see Step 2B). Also, Access doesn’t make its VBA code directly available to other programming languages. The only sensible way would be to use intermediate communication tables that store VBA input or output temporarily.
Effort to extend Access DB: high
Effort for new web application: included in minimal design

Question 3: Is there a need to allow easy access to data and charts by smartphone or tablet?
Prioritization: Minority wish
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Smartphones and tablets can read Excel files, but not Access files. Thus, a data export would be needed before copying the data to the mobile device. This is however very impractical because a desktop Access installation would be needed. Alternatively, the web interface of question 2 would be accessible by a mobile device. Unless the web interface is intended to be used on mobile devices, it would be cumbersome to use.
Effort to extend Access DB: medium
Effort for new web application: medium
4.2.3 Part 3: Submission of data into the RID database

Question 1: Submitting data via importing the data file?
Prioritization: Need to have
Available in existing RID DB: yes
What needs to be done to extend the Access RID DB? This is currently implemented.
Effort to extend Access DB: available
Effort for new web application: see below

Question 1: Submitting data via transferring the data by “copy & paste”?
Prioritization: Nice to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? The RID DB can produce empty template files. These files can be filled with copy and paste already, but no format detection or column assignment method exists at the moment. Implementing such a mechanism would reduce the import effort dramatically, but would not be sensible because it would imply that the data provider has a copy of the Access RID DB. This copy would then need to be synchronized with the main DB. The institution responsible for collecting data files (e.g. Bioforsk) already receives files in the correct Excel format, ready for import. It would probably refuse to accept undocumented formats via such a smart “copy & paste” mechanism, as the Excel template already serves as both documentation of the data meaning and as import format.
Effort to extend Access DB: small
Effort for new web application: small if file-based import exists, otherwise medium

Question 2: Excel file format for submitting data?
Prioritization: Need to have
Available in existing RID DB: yes
What needs to be done to extend the Access RID DB? Currently, the RID DB only accepts files opened in Excel. The file format itself is merely a container. Excel files increase the interoperability because they avoid decimal separator conflicts. Also, the RID DB produces native Excel templates (empty files). For the RID DB to support CSV of exactly the same format as the Excel file, the effort of implementing this task is estimated as 1. Another fundamental question is the content format detection and column assignment discussed above.
Effort to extend Access DB: available
Effort for new web application: medium

Question 2: Comma separated values (CSV) file format for submitting data?
Prioritization: Nice to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Some participants suggested using a tab-separated format. In terms of implementation effort, it is no different from supporting CSV files.
Effort to extend Access DB: small
Effort for new web application: small if file-based import exists, otherwise medium
Question 2: Which other formats should be used for data submission?
Prioritization: Minority wish
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Support of generic XML files would be very time-consuming, but supporting a well-specified format would be easily realizable. Some respondents asked for Access as an import format. Unless the DB format is well-specified, this would be effortful to implement.
Effort to extend Access DB: tab-separated: small; XML: high
Effort for new web application: small if file-based import exists, otherwise medium

Question 3: Should be given the opportunity to submit partial RID datasets?
Prioritization: Need to have
Available in existing RID DB: yes
What needs to be done to extend the Access RID DB? The current RID DB already supports partial imports. However, no feedback or warning mechanism exists for the user to see that not all areas, rivers, or substances have been imported.
Effort to extend Access DB: available
Effort for new web application: no additional effort

Question 4: Should there be a possibility to add information on the ‘measure of uncertainty’?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Supporting the uncertainty import would imply a change of the current reporting format, leading to time-consuming changes in the data provider’s data collection infrastructure and scientific underpinning/verification. From the IT perspective, a pure import and storage would not be a very difficult task. Measurement uncertainties should be importable as absolute and relative values. The reporting and export would also need to be changed, requiring a comprehensive planning stage.
Effort to extend Access DB: medium
Effort for new web application: medium

Question 4: Should there be a possibility to add additional comments on data (e.g. flagging when data is missing or suspicious)?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? The meta data is currently not stored in an accessible way. The main task would be defining which objects and data structures can be commented on, e.g. rivers, individual concentrations or loads, entire years or catchment areas. Another issue is whether comments have to be imported or can be inserted later. Who is allowed to comment? Where should the comments appear? After completion of the planning, the implementation is straight-forward.
Effort to extend Access DB: medium
Effort for new web application: medium
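The missing feedback for partial imports noted under question 3 can be illustrated with a small Python sketch. It lists which expected combinations of source and substance are absent from an import, so that the user is warned without the import being blocked. The function name `missing_combinations` and the example rivers and substances are hypothetical; the actual RID DB keys and templates are not described here.

```python
def missing_combinations(expected_sources, expected_substances, imported_rows):
    """List (source, substance) pairs that are expected but were not imported.

    imported_rows is an iterable of (source, substance) tuples taken from the
    import file. The import itself is not blocked; the result is a warning list.
    """
    imported = set(imported_rows)
    return [
        (source, substance)
        for source in expected_sources
        for substance in expected_substances
        if (source, substance) not in imported
    ]


if __name__ == "__main__":
    # Hypothetical partial import: one river/substance pair is missing.
    rows = [("River A", "Cd"), ("River A", "Hg"), ("River B", "Cd")]
    for source, substance in missing_combinations(
        ["River A", "River B"], ["Cd", "Hg"], rows
    ):
        print(f"Warning: no data imported for {substance} in {source}")
```

An empty result would mean that the import covers every expected combination.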
4.2.4 Part 4: Validation of imported data

Question 1: Covering quality control procedure for missing values?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Question 1: Covering quality control procedure for invalid values (e.g. wrong units in concentration values)?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Question 1: Covering quality control procedure for suspicious values?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Question 1: Covering quality control procedure for too many significant figures?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small-medium
Effort for new web application: small-medium

Question 2: Which further automatic tests should be taken into account?
Prioritization: Nice to have
Available in existing RID DB: no
Effort to extend Access DB: medium per test
Effort for new web application: medium per test

What needs to be done to extend the Access RID DB?
The detection of these values is straightforward.
To detect such values, one or more statistical models and tests are needed. Due to the nature of these tests, a probability of false alarm exists. Therefore, the test should merely be a flag, advising the user about the problem, but not stopping him from importing the data.
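The idea of a test that only flags values, rather than stopping the import, could be sketched as follows. The statistical model used here (a modified z-score based on the median and the median absolute deviation, with a threshold of 3.5) is an assumption chosen for illustration; the report does not specify which tests should be used. The function name `flag_suspicious` and the example loads are hypothetical.

```python
from statistics import median


def flag_suspicious(values, threshold=3.5):
    """Return a list of booleans, True where a value looks suspicious.

    Uses the modified z-score based on the median and the median absolute
    deviation (MAD). Flagged values are only marked, never rejected.
    Expects a non-empty list of numbers.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median; the score is
        # undefined, so nothing is flagged in this sketch.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


if __name__ == "__main__":
    # Hypothetical annual loads with one value that is ten times too large.
    loads = [12.1, 11.8, 12.5, 13.0, 11.9, 120.0]
    for value, suspicious in zip(loads, flag_suspicious(loads)):
        note = "  <- please check" if suspicious else ""
        print(f"{value}{note}")
```

Since any such test can raise false alarms, the flags are returned alongside the data and the decision to import remains with the user.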
4.2.5 Part 5: Functionality

Question 1: Should it be possible to filter, sort and search entries (for example select parameters, years, areas) from the different parts of the RID tables?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Access provides functionalities to filter, to sort and to search the data records. These are only available directly in the database view or in the development view. Therefore it is better to let the end user directly use the database and not just the form frontend.
Effort to extend Access DB: small-medium
Effort for new web application: small-medium

Question 2: Should it be possible to customize e.g. the range of geographic areas (e.g. across sub-regions) from which data are to be aggregated in data products?
Prioritization: Need to have
Available in existing RID DB: no

Question 3: For which particular features you would like a ‘customization’ option?
Prioritization: Nice to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? (shared entry for questions 2 and 3) Options for customizing data aggregation could be implemented in Access. For this, it would be necessary to alter the frontend and the data queries in the backend.
Effort to extend Access DB: medium
Effort for new web application: medium

Question 4: Exporting data via Excel file?
Prioritization: Need to have
Available in existing RID DB: yes
Effort to extend Access DB: available
Effort for new web application: small

Question 4: Exporting data via CSV file?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Question 4: Exporting data via XML file?
Prioritization: Nice to have
Available in existing RID DB: no
Effort to extend Access DB: medium
Effort for new web application: small

Question 4: Which other formats should be used for data table export?
Prioritization: Minority wish
Available in existing RID DB: no
Effort to extend Access DB: PDF: small
Effort for new web application: PDF: small
What needs to be done to extend the Access RID DB? (shared entry for question 4) Access provides the export capability for Excel files and also CSV files. Therefore it is easy to implement features regarding the export that are not yet in the RID DB. These export functions are the easiest to implement. Export to the formats XML and ASCII is possible too, but not demanded by a majority of users. PDF export can be established through report functions in Access and some open-source software.

Question 5: Should the RID database be able to link to a GIS system so as to create maps?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? By the use of an OLE DB driver, Access can connect to ArcGIS. This can be used to create maps out of the tables, but would lead to additional license costs. Please note that the RID DB does currently not store geographic shapes.
Effort to extend Access DB: ArcGIS: small-high
Effort for new web application: small-high
4.2.6 Part 6: Additional features

Should LOD be included in the RID database?
Prioritization: Need to have
Available in existing RID DB: yes
Effort to extend Access DB: available
Effort for new web application: small

Should LOQ be included in the RID database?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Should flow measurements be included in the RID database?
Prioritization: Need to have
Available in existing RID DB: yes
Effort to extend Access DB: available
Effort for new web application: small

Should proportions of measurements above/below LOD and LOQ be included in the RID database?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small

Question 2: Should level of the annual aggregated loads be included in the RID database?
Prioritization: Need to have
Available in existing RID DB: no
Effort to extend Access DB: small
Effort for new web application: small
What needs to be done to extend the Access RID DB? (shared entry for the questions above) The decision about the features also depends on the ability of the data providers to provide data for the requested parameters. Based upon this ability, these additional features can be implemented by expanding the data model of the RID DB and thus can be done within Access.

Question 3: Should a separate module for trend analysis (e.g. RTrend software) be part of the RID database capabilities?
Prioritization: Need to have
Available in existing RID DB: yes
What needs to be done to extend the Access RID DB? A separate module for trend analysis could be implemented in the RID DB. This functionality already exists with the RTrend software. However, there is only an interface with an indirect data exchange: an Excel file created beforehand in the RID DB is opened in RTrend.
Effort to extend Access DB: small for testing and improving existing interface
Effort for new web application: medium for RTrend or generic Excel file creation; medium for web trend analysis without external tools

Question 4: Should some of the fields of the ‘text reporting format’ be incorporated into the RID data submission and the RID database?
Prioritization: Need to have
Available in existing RID DB: no
What needs to be done to extend the Access RID DB? Adding meta data to the RID DB could be managed. To add meta data it is necessary to extend the data model of the RID DB. This can be done within Access. Type and coverage of the meta data stored in the RID DB have to be specified later.
Effort to extend Access DB: medium
Effort for new web application: medium
4.3 Read-only web frontend for approach (A1) and (A2)
97% of questionnaire respondents want browser-based access to “data and charts”. This is a vague statement. To be able to supply an effort estimate, the following functions are proposed:
• Table of loads or concentrations per source and year (filterable and searchable, e.g. only one substance for all years and sources, or all substances for one year)
• Table of inputs for an aggregate of sources (e.g. multiple rivers) or a catchment area
• Time series charts per substance, with a selectable source or aggregate of sources
• Chart and table of trend assessment and load adjustment (based on RTrend)
When choosing approach (A1) or (A2), we propose to limit the web capabilities to read-only access, because
• only one role is needed, which makes implementation and administration easier
• import and data validation already exist in the Access DB and can be extended more quickly
• it avoids rapid growth of the Access DB [7].
The web frontend may use an Access DB copy as its data source to avoid adding complexity.
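The first two proposed functions (a filterable table of loads and an aggregate over several sources) can be sketched as read-only queries. To keep the example self-contained, it uses an in-memory SQLite database as a stand-in for the Access DB copy; the table `loads`, its columns and the sample rows are hypothetical and do not reflect the actual RID data model.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory stand-in for the read-only copy of the data."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE loads (source TEXT, substance TEXT, year INTEGER, load REAL)"
    )
    con.executemany(
        "INSERT INTO loads VALUES (?, ?, ?, ?)",
        [
            ("River A", "Cd", 2011, 1.0),
            ("River A", "Cd", 2012, 1.5),
            ("River B", "Cd", 2011, 2.0),
            ("River B", "Hg", 2011, 0.5),
        ],
    )
    return con


def loads_table(con, substance=None, year=None):
    """Loads per source and year, optionally filtered by substance and/or year."""
    query = "SELECT source, substance, year, load FROM loads WHERE 1=1"
    params = []
    if substance is not None:
        query += " AND substance = ?"
        params.append(substance)
    if year is not None:
        query += " AND year = ?"
        params.append(year)
    query += " ORDER BY source, substance, year"
    return con.execute(query, params).fetchall()


def aggregate_series(con, substance, sources):
    """Yearly sum of loads of one substance over a selection of sources."""
    placeholders = ", ".join("?" for _ in sources)
    query = (
        "SELECT year, SUM(load) FROM loads "
        f"WHERE substance = ? AND source IN ({placeholders}) "
        "GROUP BY year ORDER BY year"
    )
    return con.execute(query, [substance, *sources]).fetchall()


if __name__ == "__main__":
    con = open_demo_db()
    print(loads_table(con, substance="Cd", year=2011))
    print(aggregate_series(con, "Cd", ["River A", "River B"]))
```

Both functions only issue SELECT statements, which matches the single read-only role proposed above; the yearly sums returned by `aggregate_series` would be the input for a time series chart.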
4.4 Synchronisation methods for the distributed copies in approach (A2)
The motivations for using approach (A2) are the following:
• Existing parts of the Access database can continue to be used (no data migration, re-programming or testing).
• Currently, only one organisation has access to all functions of the database. If approach (A1) is taken, this might remain true, whereas approach (A2) makes the database available to all Contracting Parties. Thus, more users could benefit from existing and future functions.
• The time-consuming import is done by the data providers themselves, so errors are found more quickly and feedback is more direct. This is done by distributing copies of the Access DB to the CPs.
The last point requires synchronisation of the database copies. Since old data is not modified and each CP only adds its own new data, the synchronisation can be programmed, for example, with Jet Replication. Additionally, the keeper of the master copy needs a way of checking which data has already been imported, to avoid forgetting the data of one CP. This check can be done by a “close year for import” function that also publishes the data on the read-only web frontend as official. This leads to the following workflow proposal:
1. Access database copies are sent out to the CPs, after final approval of the development.
2. Each CP imports their data using their copy of the Access DB.
3. The modified copy is made available to the keeper of the master copy, who also runs the web frontend.
4. The keeper imports data into the master copy.
5. The software will check if all data was imported when the keeper closes the year for import.
6. The master copy is again distributed to the CPs for usage and import of next year’s data.
[7] Since Access is a file-based solution, it handles possible concurrent write operations by reserving space, which in practice means it grows quicker than TCP/IP-based databases.
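Steps 4 and 5 of the workflow above can be sketched in a few lines of Python. This is not Jet Replication; it is a plain illustration of the two properties the text relies on: rows are only added, never modified, and a year can only be closed once every CP has delivered data. The key layout `(cp, year, source, substance)`, the function names and the country codes are assumptions made for this example.

```python
def merge_copy(master, cp_copy):
    """Add rows from a CP copy to the master; existing rows are never modified.

    Both arguments map a key (cp, year, source, substance) to a load value.
    Returns the keys that were newly added.
    """
    added = []
    for key, value in cp_copy.items():
        if key not in master:
            master[key] = value
            added.append(key)
    return added


def close_year(master, year, contracting_parties):
    """Return the CPs without data for the year; an empty list means the year can be closed."""
    delivered = {key[0] for key in master if key[1] == year}
    return [cp for cp in contracting_parties if cp not in delivered]


if __name__ == "__main__":
    # Hypothetical master copy and one returned CP copy.
    master = {("NO", 2012, "River A", "Cd"): 1.0}
    returned = {
        ("NO", 2012, "River A", "Cd"): 9.9,  # old row, must stay unchanged
        ("DE", 2013, "River B", "Cd"): 2.0,  # new row from this CP
    }
    print(merge_copy(master, returned))
    print(close_year(master, 2013, ["DE", "NO"]))
```

In this example the old 2012 row keeps its value, the new 2013 row is added, and the year 2013 cannot yet be closed because one CP has not delivered.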
4.5 Summary of effort by topic for the three approaches (Step 2D)
In the following tables, three approaches are compared:
(A1) Access database with a read-only web frontend for data presentation and data downloads. Only one copy exists, into which only one organisation [2] imports data using a non-web import module.
(A2) Access database with a read-only web frontend for data presentation and data downloads. All data providers have a copy of the database, which they can use for importing, reporting and GIS integration. All copies are synchronised by only one organisation [2].
(A3) Redevelopment for the web. All functions are usable with a browser, including the data import by the CPs.
When extending the Access DB, most user requirements would be implemented using VBA, which means they are not accessible from the Internet (see Section 3.1). Only the functions explicitly asking for web browser support [8] would be made available from the Internet using a non-VBA solution. For approach (A3), developing a new web-based application, the estimates are based on using the XAF framework. This implies that all functions are usable from any browser via the Internet. Experts in web programming are already easier to find than VBA experts. This and other factors lead us to the conclusion that approach (A3) is more suitable as a long-term solution that can be adapted to future decisions and currently unknown needs with less effort.
The most important difference is how the data providers experience the RID DB. In the case of approach (A1), the data import needs to be done by one host (that was the role of Bioforsk). Within approach (A2), a mechanism for synchronising distributed Access DB copies has to be tested, documented and executed by the CP. While the former will likely result in slightly higher recurring costs, the latter is associated with a one-time effort of about 1-2 man months. With the web-based approach (A3), the CPs can import the data themselves, with immediate feedback (e.g. flagging of problematic/missing data) and thus less communication effort for synchronisation or Excel file exchange.
[8] e.g. question 2.2 from Step 1: “Is there a need to allow easy access to data and charts by a web-browser?”
Criterion: Implementation costs for future requirements
(A1): higher; (A2): highest [9]; (A3): lower

Criterion: Functionality exposed to remote users
(A1): little; (A2): little; (A3): all

Criterion: Existing Access application could partly be reused (smaller development, testing and documentation effort)
(A1): yes; (A2): yes; (A3): no

Criterion: Recurring effort for CPs and OSPAR Secretariat to keep data up to date
(A1): high; (A2): medium; (A3): small
The man months below are rough estimates, not precise quantities. For the estimations, the HELCOM
efforts (e.g. cooperation or an interface) were ignored. The table below is based on the Step 1
questionnaire where indicated. Please note that the questionnaire was kept brief on purpose, and
there is a chance that not all user requirements have been captured. Minority wishes and the
comments of open (free-text) questions have not been included in the table below.
9 Several MS Access versions have to be supported, new software versions have to be distributed
Topic | (A1) | (A2) | (A3)
Detailed specification of data model
  Recommended (additional) IT documentation for status quo | 0.2 | 0.2 | 0.2
Implement data model, restructure database, move to new platform, move data to new database
  Initial implementation, migration of existing data | - | - | 2.5
  Synchronising distributed Access DB copies | - | 1.5 | -
  Need to have: questions 3.4a, 6.2, 6.4 | 1.2 | 1.2 | 1.3
  Nice to have: questions 1.2, 1.3 | 1.4 | 1.4 | 1.3
Review, design and implement new reporting form(s)
  General planning and specification of import workflow | 0.4 | 0.4 | 0.3
  Need to have: questions 3.1, 3.2a, 3.2c | 0.2 | 0.2 | 1.2
  Nice to have: question 3.2b | 0.1 | 0.1 | 0.3
Set up quality assurance system for data reporting
  General planning and design of data validation including mathematical details | 0.6 | 0.6 | 0.6
  Need to have: questions 3.4b, 4.1 | 1.0 | 1.0 | 1.3
  Nice to have: question 4.2 | 0.4 | 0.4 | 0.4
Operationalize the web application for reporting and quality-checking of national data
  Integration testing and adaptations | 2.0 | 2.0 | 2.8
  Hosting, setup, approval testing by OSPAR | 1.5 | 1.5 | 2.5
  Documentation: maintenance and end-user | 1.0 | 1.0 | 1.1
Set up public web application for users to view, graphically display and download data
  Need to have: questions 2.2 = Read-only web interface | 1.5 | 1.5 | -
  Need to have: questions 2.3, 5.1, 5.2, 5.4a-b, 5.4d, 6.3 | 1.7 | 1.7 | 1.5
  Nice to have: question 5.4c (XML) | 0.7 | 0.7 | 0.5
Other reporting and analysis tools not covered in the questionnaire | 2 | 2 | 4
Integration in OSPAR information system/GIS etc. Questions 2.1, 5.5: minimal well-defined data link (does not cover creating a GIS system or map-based user interface) | 0.5 | 0.5 | 0.5
Organisational overhead (meetings, web conferences) | 2 | 2 | 2
Σ | 18.4 | 19.9 | 24.3
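As a cross-check of the totals in the Σ row, the individual estimates can simply be added per approach. The sketch below does this with the figures from the table; the shortened row labels are ours, and "-" entries are represented as None and skipped.

```python
# Man-month estimates per row as (A1, A2, A3); None stands for "-" in the table.
ROWS = [
    ("IT documentation for status quo", (0.2, 0.2, 0.2)),
    ("Initial implementation, migration", (None, None, 2.5)),
    ("Synchronising distributed Access DB copies", (None, 1.5, None)),
    ("Need to have: 3.4a, 6.2, 6.4", (1.2, 1.2, 1.3)),
    ("Nice to have: 1.2, 1.3", (1.4, 1.4, 1.3)),
    ("Planning of import workflow", (0.4, 0.4, 0.3)),
    ("Need to have: 3.1, 3.2a, 3.2c", (0.2, 0.2, 1.2)),
    ("Nice to have: 3.2b", (0.1, 0.1, 0.3)),
    ("Planning and design of data validation", (0.6, 0.6, 0.6)),
    ("Need to have: 3.4b, 4.1", (1.0, 1.0, 1.3)),
    ("Nice to have: 4.2", (0.4, 0.4, 0.4)),
    ("Integration testing and adaptations", (2.0, 2.0, 2.8)),
    ("Hosting, setup, approval testing", (1.5, 1.5, 2.5)),
    ("Documentation", (1.0, 1.0, 1.1)),
    ("Need to have: 2.2 (read-only web interface)", (1.5, 1.5, None)),
    ("Need to have: 2.3, 5.1, 5.2, 5.4a-b, 5.4d, 6.3", (1.7, 1.7, 1.5)),
    ("Nice to have: 5.4c (XML)", (0.7, 0.7, 0.5)),
    ("Other reporting and analysis tools", (2, 2, 4)),
    ("Integration in OSPAR information system/GIS", (0.5, 0.5, 0.5)),
    ("Organisational overhead", (2, 2, 2)),
]


def column_totals(rows):
    """Sum the estimates per approach, ignoring entries marked as not applicable."""
    totals = [0.0, 0.0, 0.0]
    for _, efforts in rows:
        for i, effort in enumerate(efforts):
            if effort is not None:
                totals[i] += effort
    # Round to one decimal, the precision used in the table.
    return tuple(round(t, 1) for t in totals)


print(column_totals(ROWS))  # (18.4, 19.9, 24.3)
```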
5 OSPAR and HELCOM: Pros and cons of different database designs (Step 2D)
Below, advantages and disadvantages of the following database designs are discussed:
1. one commonly used database for OSPAR and HELCOM
2. two separate databases with a common structure
3. independent, standalone RID database, i.e. keeping the existing RID database and implementing
a new user interface on top of it.
Requirements to be compared: Database content (main tables for content items and metadata)
  One commonly used database for OSPAR and HELCOM: same
  Two separate databases with a common structure: same
  Applies to both of the above:
    + consistent content between HELCOM and OSPAR
    + consistent metadata between HELCOM and OSPAR
    - needs to be clarified (between HELCOM and OSPAR) which content and metadata shall be of interest
    - needs to be clarified (between HELCOM and OSPAR) which relations between the tables shall exist
  Independent, standalone RID DB: different
    - inconsistent content possible
    - inconsistent metadata possible
    + no administrative expenditure to harmonize the tables, their definitions and attributes
    + import format doesn’t change, CPs don’t need to adjust

Requirements to be compared: Backend (part of application which defines functionality and logic and which is not seen by user)
  One commonly used database for OSPAR and HELCOM: same
  Two separate databases with a common structure: same
  Applies to both of the above:
    + one-time development work for backend (one web server, one database management system)
    + consistent tables, relations and procedures
    - needs to be clarified (between HELCOM and OSPAR) which backend shall be chosen
  Independent, standalone RID DB: different
    - double development work for backend (two web servers, two database management systems)
    - inconsistent tables, relations and procedures possible
    + no administrative expenditure to harmonize backend choice

Requirements to be compared: Frontend (graphical user interface)
  One commonly used database for OSPAR and HELCOM: same
    + one-time development work for frontend
    + consistent forms and reports
    + same data query
    - needs to be clarified (between HELCOM and OSPAR) which frontend shall be chosen
  Two separate databases with a common structure: different
  Independent, standalone RID DB: different
  Applies to both of the above:
    - double development work for frontend
    - backend becomes costly if the functionality of the frontend is not clear
    - different forms and reports
    - different user interface
    + visual differentiation between the different areas of responsibility of HELCOM and OSPAR
    + no administrative expenditure to harmonize frontend choice
Requirements to be compared: Data submission/data entry
  One commonly used database for OSPAR and HELCOM: same
  Two separate databases with a common structure: same
  Applies to both of the above:
    + consistent import tables for HELCOM Plus and OSPAR RID
    + same procedure for Contracting Parties with Baltic coast and North-East Atlantic coast (no double expenditure due to only one data format)
    - needs to be clarified (between HELCOM and OSPAR) what the import tables and import designs look like
    - needs to be clarified (between HELCOM and OSPAR) what shall be imported (measurement uncertainty, monthly data, methodology of calculation, calculation factors, GIS data etc.)
    - needs to be clarified (between HELCOM and OSPAR) what exactly the different definitions (e.g. sub-regions, areal and point source definitions) mean
  Independent, standalone RID DB: different
    - inconsistent import tables possible
    - double expenditure for Contracting Parties
    + no administrative expenditure to harmonize import tables and content of import tables

Requirements to be compared: Data verification (quality assurance)
  One commonly used database for OSPAR and HELCOM: same
  Two separate databases with a common structure: same
  Applies to both of the above:
    + same automatic quality control
    + same verification tools (same algorithms for detection of format errors, missing values, suspicious values, invalid values or duplicates)
    + same statistical calculations (e.g. the detection of outliers highly depends on the statistical test applied)
    + same documentation (e.g. quality reports, messages, flagging of data)
    + same correction methods
    + comparability of HELCOM Plus data and OSPAR RID data
    - needs to be clarified (between HELCOM and OSPAR) which verification tools shall be used, which different kinds of documentation shall be inserted and which correction methods shall be carried out
  Independent, standalone RID DB: different
    - likely different verification tools
    - data providers have to consider two different QA steps (verification tools, documentation systems and correction methods)
    - comparability of data can be slightly limited due to different QA
    + no administrative expenditure to harmonize QA
Requirements to be compared: Data output (data analysis, graphs, trend charts, GIS maps, results tables, reports)
  One commonly used database for OSPAR and HELCOM: same
    + easy access to marine environmental data for institutes and authorities, Contracting Parties or scientists
    + compatibility of output data (same formats)
    + comparability of HELCOM Plus data and OSPAR RID data
    + corresponding tools need to be implemented just once
    - needs to be clarified (between HELCOM and OSPAR) which data shall be exported and how
  Two separate databases with a common structure: different
  Independent, standalone RID DB: different
  Applies to both of the above:
    - users need to access two different systems
    - different output formats
    - users have to learn to read different graphs, charts and tables (e.g. different axes, legends, displays, colours)
    - corresponding tools need to be developed twice
    + no administrative expenditure to harmonize data export
    + specific requirements can be taken into account (e.g. different reporting or different GIS maps are required or other trend charts are of interest)

Requirements to be compared: User rights
  One commonly used database for OSPAR and HELCOM: same
  Two separate databases with a common structure: same
  Applies to both of the above:
    + clear superior user groups with different rights, for instance data users, data providers, data managers and IT administrators
    + only one group for data managers and IT administrators
    + only one login for data users and data providers
    - needs to be clarified (between HELCOM and OSPAR) who is allowed to do what and who are data managers and who are IT administrators
  Independent, standalone RID DB: different
    - different user groups possible
    - data manager and IT administrators for each system
    - double login data for data users
    + clear separation between HELCOM Plus and OSPAR RID data

Requirements to be compared: Guidelines/Instructions
  One commonly used database for OSPAR and HELCOM: same
    + consistent documentation
    + only one documentation for users
    - consultations between HELCOM and OSPAR with regard to responsibilities in writing (administrative overhead)
  Two separate databases with a common structure: same/different
    + same backend manual
    - different frontend manual
  Independent, standalone RID DB: different
    - users need to read different instructions
    + no consultations necessary
    + specific features can be focused on (e.g. HELCOM features that are irrelevant to OSPAR)
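The data verification comparison above notes that the detection of outliers highly depends on the statistical test applied. A minimal sketch of this effect follows. The two tests (a classical z-score and a median-based modified z-score), their thresholds and the sample series are our own illustrative choices, not methods prescribed by OSPAR or HELCOM.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def mad_outliers(values, threshold=3.5):
    """Indices flagged by the modified z-score based on the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    # 0.6745 scales the MAD so that it is comparable to a standard deviation.
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical yearly loads with one suspicious value at the end.
loads = [10, 11, 9, 10, 12, 11, 10, 30]

print("z-score test:", zscore_outliers(loads))
print("median-based test:", mad_outliers(loads))
```

On this series the z-score test flags nothing, because the suspicious value inflates the standard deviation it is compared against, while the median-based test flags the last value. Two systems with different tests would therefore treat the same submission differently.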
OSPAR and HELCOM have partly different thematic focuses and also different requirements. This
also applies to the application of the OSPAR RID database and the HELCOM PLC database. If the
focuses and requirements could be harmonized, an integrated common database would be an
alternative. The advantages of such a common system for data users and data providers are obvious.
But the comparatively high costs for harmonization have to be taken into account. Agreements have to
be reached, among other things, on
• Methodology of sampling and monitoring
• Definitions, e.g. unmonitored areas
• Methodology of verification of data
• Consideration of LOD/LOQ, measurement uncertainty
• Methodology of the estimation of loads
• Methodology of the quantification of loads from unmonitored areas
• Methodology of trend analysis
• Principles of source apportionment
• Report structure
It also has to be taken into account what happens if the focuses of OSPAR and HELCOM diverge in
10 years. It always has to be assured that neither OSPAR nor HELCOM is given preference.
Thus, the administrative effort should not be underestimated. For small and customized applications,
integration is recommendable. However, for larger solutions an adequate examination has to be
carried out, which may result in two different databases with one common structure. The ICES Data
Centre could also be a possibility as a web-based database platform.
A Appendices
A.1 DBMS backend possibilities comparison
A.1.1 Performance
If data is entered monthly instead of yearly, the size of the database will consequently grow faster.
Therefore, the chosen DBMS should be able to handle several million datasets without major impact
on the performance as well as total database sizes of several Gigabytes.
All of the compared databases fulfil these requirements. While MS SQL exceeds both MySQL and
PostgreSQL in performance, this advantage only starts to become important with databases which are
much larger than the database in question.
A.1.2 License costs
DBMS | License costs
MySQL | free
MS SQL | Express Edition: free, but imposes minor restrictions
MS SQL | Standard Edition: ca. 1.300 € per Core, or ca. 650 € per Server + 150 € per CAL
MS SQL | Business Intelligence Edition: ca. 6.300 € per Server + 150 € per CAL
MS SQL | Enterprise Edition: ca. 5.100 € per Core
PostgreSQL | free
As the table shows, the licensing fees for the products differ considerably. Both MySQL and
PostgreSQL are completely free.
MS SQL offers different pricing options and editions. As noted above, the performance restrictions of
the free Express Edition are unlikely to be limiting for the project. The per-core pricing options
require the purchase of a minimum of 4 core licenses per physical processor. The pricing options which
include a CAL (Client Access License) mean that one has to pay additional fees according to the
number of users who access the system. Therefore, this option is unsuitable for web applications
which are accessed by more than a few dozen users.
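The "few dozen users" limit can be reproduced with a rough calculation for the Standard Edition, using the approximate prices from the table above. The sketch assumes one CAL per user and a single server; both are simplifications on our part, and actual licensing terms may differ.

```python
# Approximate Standard Edition prices in EUR, taken from the license cost table.
PRICE_PER_CORE = 1300
PRICE_PER_SERVER = 650
PRICE_PER_CAL = 150
MIN_CORE_LICENSES_PER_PROCESSOR = 4


def per_core_cost(cores_per_processor, processors=1):
    """License cost under per-core pricing (at least 4 core licenses per processor)."""
    licensed_cores = max(cores_per_processor, MIN_CORE_LICENSES_PER_PROCESSOR)
    return licensed_cores * processors * PRICE_PER_CORE


def server_cal_cost(users, servers=1):
    """License cost under server + CAL pricing, assuming one CAL per user."""
    return servers * PRICE_PER_SERVER + users * PRICE_PER_CAL


def max_users_before_per_core_is_cheaper(cores_per_processor, processors=1):
    """Largest user count for which server + CAL pricing does not exceed per-core pricing."""
    budget = per_core_cost(cores_per_processor, processors) - PRICE_PER_SERVER
    return budget // PRICE_PER_CAL


print(per_core_cost(4))                          # 5200
print(server_cal_cost(30), server_cal_cost(31))  # 5150 5300
print(max_users_before_per_core_is_cheaper(4))   # 30
```

For a single four-core processor the break-even lies at about 30 users, which is consistent with the statement that CAL-based pricing does not suit a web application with more than a few dozen users.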
A.1.3 Operating system requirements
While MS SQL only operates on Microsoft Windows systems, MySQL and PostgreSQL are available
for many common operating systems. The monthly hosting costs for a Windows server are slightly
higher than for a server working with Linux.
All DBMS have modest hardware requirements.
A.1.4 Administration effort
The configuration and maintenance of MySQL is rather simple and there are a lot of tools and
documentation available, as the system is commonly used.
The configuration of a PostgreSQL database is slightly more complex which increases the initial effort.
On the other hand, this provides more possibilities to optimize the performance.
MS SQL is the most complex of the analysed DBMS.
The administration effort is especially important for the long-term maintenance of a future RID DB,
since OSPAR would have to budget for backup and security update tasks. From that
perspective, MySQL should be the preferred DBMS.
A.1.5 Spatial Data (GIS)
Both MS SQL and PostgreSQL provide support for spatial data with a large number of built-in
functions. Current versions of MySQL also support spatial data to a certain degree, but the set of
available functions is much more limited. Therefore, the performance of requests which are dependent
on geographical information could be worse than on the other systems. Even if GIS data plays
an important role for a future RID DB, this is a very minor point because all functions can be
written manually during the web application development in case the DBMS doesn’t support them.
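As an illustration of a spatial function written in the application instead of the DBMS, the following sketch computes the great-circle distance between two points with the haversine formula. The function name and the choice of this particular function are ours; the report does not say which spatial functions a future RID DB would need.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

Such a function runs on the web server for each pair of points, so queries that filter many rows by distance will be slower than with a spatial index in the DBMS.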
A.2 Web database vs. local database software
In this section, the most important advantages and disadvantages of web applications in comparison
with classical MS Office desktop applications (such as the current RID DB) are listed. This overview
shall provide a basis for deciding which kind of frontend could be realized.
A.2.1 Advantages of web applications
• No installation of software on the personal computers
  - Outdated software on a personal computer is not possible
  - Updates can be installed centrally during operation
  - No effort for local CP IT administrators
• Low costs per workstation
  - No license costs per workstation for specific software
  - Lower hardware requirements as complex calculations are carried out by the web server
• High data security
  - The storage of all data on the web server allows the central backup and, if necessary, the
    central installation of a backup
  - Web server can be protected by means of firewalls, access controls etc. in order to secure
    sensitive data
  - Data can easily be encoded (encrypted). If the data is stored locally, an encoded infrastructure,
    e.g. for emails, is required. This would be technically much more complex.
  - Different user roles guarantee that each user can only edit data intended for him.
    Breaches are less likely and can be detected with log files.
• Higher data quality
  - Reports can be created based on the current data due to the storage on the web server
  - Error-prone data transmission via e-mail is dropped
  - Format conversions are not required as in the case of synchronization with locally stored
    data (master-client architecture)
• Easier maintenance of the master data
  - Master data are maintained and modified centrally in the web server database
• Location independence of users
  - Users can access the database from any computer
• Correct date and time specifications at all times
  - Due to the usage of the date and time information of the web server it is ensured that
    always correct and consistent values for date and time are used (record changes, log
    books)
• Low-priced development and maintenance
  - Know-how for web applications is currently less expensive compared to know-how for
    desktop applications
  - Simple changes can be implemented quicker as compilation of source code is usually not
    necessary
  - Integration with other web-based technologies is easier, e.g. links to other webpages
    (reference to help or external webpages) or links to other web servers or web-based
    databases
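Two of the points in the list above, user roles that restrict editing and log entries stamped with the server's date and time, can be sketched together. The role names follow the user groups mentioned in Section 5; the data structures, field names and sample users are assumptions for illustration only and not a proposal for the actual access model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    name: str
    role: str               # "data_user", "data_provider" or "data_manager"
    contracting_party: str


audit_log = []


def can_edit(user, record):
    """Data managers may edit everything, data providers only their own CP's records."""
    if user.role == "data_manager":
        return True
    if user.role == "data_provider":
        return record["contracting_party"] == user.contracting_party
    return False


def edit_value(user, record, new_value):
    """Apply an edit if permitted and log every attempt with the server's clock."""
    allowed = can_edit(user, record)
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),  # server time, not client time
        "user": user.name,
        "record_id": record["id"],
        "allowed": allowed,
    })
    if allowed:
        record["value"] = new_value
    return allowed


# Hypothetical user and records.
provider = User("provider_de", "data_provider", "DE")
record_de = {"id": 1, "contracting_party": "DE", "value": 10.0}
record_nl = {"id": 2, "contracting_party": "NL", "value": 7.0}

print(edit_value(provider, record_de, 12.5))  # own CP: edit is applied
print(edit_value(provider, record_nl, 99.0))  # other CP: edit is refused but logged
```

Because refused attempts are logged as well, the log file can be used to detect attempted breaches, as noted in the list.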
A.2.2 Disadvantages of web applications
• Internet connection is required for data exchange and data evaluation
  - An unreliable internet connection can affect the work. This can prove challenging during
    meetings at foreign/rented locations.
• An appropriate web server is required
  - The web server requires more computing power the more users use the application
  - The web server has stronger security requirements if it handles sensitive data
  - Licenses might be required for the server software unless Open-Source software
    packages (e.g. MySQL, Drupal, PHP) are used
• Introduction to working with the user interface is required (MS Office products are more familiar)
• Display of the application in the web browser depends on the browser used
  - The display of the application needs to be checked for each potential browser
  - Complex applications occasionally enforce the use of specific browsers
  - Possible difficulties with the IT administration of the user’s authority if specific browsers
    should be used
• Integration with local applications of the personal computer is more complex, for instance the data
  export to MS Outlook for mailing purposes
A.3 Technical considerations on frontend frameworks using Drupal as an example
Drupal supports the following database systems: MySQL and PostgreSQL as well as MSSQL and
Oracle with additional modules.
Like other frameworks, Drupal natively supports a comprehensive user management system, and it is
easy to present data in tables with sorting and filter options. There are also existing modules for
importing and exporting of data in different formats including the ones asked for in the questionnaire.
Common development tasks can be carried out without any programming. For more specialised functions,
custom modules have to be written in PHP.
The magnitude of the reduction of development time compared to creating a web application from
scratch depends on the complexity of the system and how much of its functionality is covered by
existing Drupal modules. For very complex web applications or very inexperienced programmers,
there is a small risk that the development time with Drupal exceeds the development time without it
(app from scratch). This applies to other web application frameworks as well.
If data is entered monthly instead of yearly, the size of the database will consequently grow faster.
Therefore, the chosen framework should be able to handle several million datasets without major
impact on the performance. Drupal can handle this amount of data, but it uses a complex database
structure which is highly flexible but impacts the performance negatively. To a certain degree, this can
be compensated with appropriate server hardware, which would increase the hosting costs. The
server hardware should be chosen based on a maximum acceptable response time of the user
interface and a sufficiently large amount of data.