BlueBRIDGE – 675680
www.bluebridge-vres.eu
Annex A: Proposal Template
Green highlighted areas to be completed by applicant
BlueBRIDGE Competitive Call – Data management services for SMEs
Building Research environments for fostering Innovation, Decision making,
Governance and Education to support Blue growth
March 2017
Full title of your project
Acronym of your proposal (optional)
BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680
Disclaimer
BlueBRIDGE (675680) is a Research and Innovation Action (RIA) co-funded by the
European Commission under the Horizon 2020 research and innovation
programme.
The goal of BlueBRIDGE, Building Research environments for fostering Innovation,
Decision making, Governance and Education to support Blue growth, is to support
capacity building in interdisciplinary research communities actively involved in
increasing the scientific knowledge of the marine environment, its living resources,
and its economy, with the aim of providing a better basis for informed advice to
competent authorities and of enlarging the spectrum of growth opportunities
addressed by the Blue Growth societal challenge.
This document is the application form for the BlueBRIDGE Open Call for SMEs.
The document has been produced with the funding of the European Commission.
The content of this publication is the sole responsibility of the BlueBRIDGE Consortium and its experts, and it
cannot be considered to reflect the views of the European Commission.
BlueBRIDGE Open Call Application Form
Page 2 of 20
Table of contents
1 Background and Qualifications
2 Problem Statement and Objectives
3 Business Impact and Sustainability
4 Requested BlueBRIDGE resources
5 Open Access
1 BACKGROUND AND QUALIFICATIONS
Provide a brief company profile including information on who your customers are, an overview of the
activities that you perform and your qualifications (including the technical expertise that you have
in-house).
Maximum 500 words.
Remark: The information in this section may be used in public documents and reports by the BlueBRIDGE
consortium.
2 PROBLEM STATEMENT AND OBJECTIVES
Outline the problem statement that BlueBRIDGE can help you to address and describe the objectives that you
want to achieve through this proposal. These objectives should be those achievable within your proposed
action, not through subsequent development. Preferably, they should be stated in a measurable and
verifiable form.
Maximum 500 words.
3 BUSINESS IMPACT AND SUSTAINABILITY
Describe how the setup of the proposed BlueBRIDGE collaborative environment may impact your
growth, innovation, business, service portfolio, research, etc. If you are planning to use the BlueBRIDGE
services/resources to improve a current process, please describe how this will change your process (a detailed
description of your current process would be useful for evaluators). Please also indicate whether you envisage
using BlueBRIDGE beyond the duration of the proposal.
Maximum 500 words.
4 REQUESTED BLUEBRIDGE RESOURCES
Please select from the given list which BlueBRIDGE data sources, services, technologies, data analytics and
algorithms will be required for your application.
Data Sources

Biological and Ecological List of Names
Description: Authoritative and comprehensive list of names of marine organisms, including information on synonymy.
Examples: Catalogue of Life, World Register of Marine Species (WoRMS), World Register of Deep-Sea Species (WoRDSS); taxonomic, trophic level and life history traits data from FishBase.
Do you need this resource?

Biological and Ecological Data
Description: Evidence about more than 1.6 million species, collected over three centuries of natural history exploration and including current observations from citizen scientists, researchers and automated monitoring programmes.
Examples: Global Biodiversity Information Facility (GBIF), Ocean Biogeographic Information System (OBIS).
Do you need this resource?

Chemical & physical variables with global geospatial coverage
Description: The following variables, with their temporal coverage where indicated:
- Apparent Oxygen Utilization: annual, seasonal, and monthly
- Dissolved Oxygen
- Oxygen Saturation
- Ice: concentration, thickness, velocity
- Chlorophyll: mass concentration of chlorophyll in sea water
- Mole concentration of dissolved oxygen in sea water, nitrate in sea water, phosphate in sea water, and phytoplankton expressed as carbon in sea water
- Carbon: net primary productivity of carbon
- Nitrate: annual, seasonal, and monthly
- Phosphate: annual, seasonal, and monthly
- Salinity: monthly average coverage
- Sea Surface Height: monthly average coverage
- Sea Water Salinity: annual, seasonal, and monthly
- Sea Water Temperature: annual, seasonal, and monthly
- Silicate: annual, seasonal, and monthly
- Temperature: monthly average coverage
- Wind Speed: monthly ASCAT global wind
- Wind Stress: monthly ASCAT global wind
- Zonal Velocity: monthly average coverage
Examples: World Ocean Atlas, EMODnet, Copernicus Marine Environmental Monitoring System, Planet OS, GEBCO.
Do you need this resource?
Services & Technologies

RStudio
Description: RStudio makes R easier to use. It includes a code editor and debugging and visualization tools.
Characteristic: The RStudio server is configured with 16 cores and 16 GB RAM.
Do you need this resource?

Data Miner
Description: DataMiner Manager is a computational engine for performing data analytics operations. Specifically, it offers a single point of access for performing data analytics on heterogeneous data, which may reside either on the client side, in the form of comma-separated values files, or be remotely hosted, possibly in a database.
Characteristic: The Data Miner cluster is configured to support high-throughput computing on 100 cores and 100 GB RAM.
Requirements: Mandatory if you need to run an algorithm or a model on the infrastructure.
Do you need this resource?

Spatial Data Infrastructure
Description: The Spatial Data Infrastructure includes:
- a GeoServer cluster to manage vector data accessible via the OGC WMS and WFS protocols;
- GeoNetwork to manage spatially referenced metadata accessible via the OGC CSW protocol;
- a THREDDS Data Server cluster to manage NetCDF, OPeNDAP, and HDF5 datasets accessible via the OPeNDAP protocol.
Characteristic: The Spatial Data Infrastructure is configured to support the storage of spatially referenced datasets up to 0.5 TB of disk space.
Do you need this resource?

Storage Infrastructure
Description: The Storage infrastructure supports the storage of files organized in directories. Access policies can be associated with directories: private to a single user, restricted to specified users, or shared with all users of the VRE.
Characteristic: The Storage infrastructure is configured to support the storage of files up to 2 TB of disk space.
Do you need this resource?

Relational Database
Description: Relational database with transactional replication.
Characteristic: The Relational Database cluster is configured to support storage of up to 0.5 TB of disk space.
Do you need this resource?

Social Framework
Description: All applications running on the infrastructure are made accessible through a portal. It includes facilities for managing users, for communicating with users via posts and notifications, for managing access policies, etc.
Characteristic: The Social Framework cluster is configured to manage up to 500 users and 5 VREs.
Do you need this resource?

SmartGears Framework
Description: The SmartGears framework makes your Tomcat-based application runnable on the infrastructure. It manages authentication, authorization, accounting, monitoring, and alerting on behalf of the application.
Requirements: Mandatory if Tomcat-based applications must be hosted on the infrastructure.
Do you need this resource?

Performance evaluation in aquaculture; techno-economic investment analysis and what-if analysis; news in aquaculture training VRE.

Data Harmonization
Description: The Data Harmonization facility supports the semi-automatic harmonization of time series with respect to code lists and controlled vocabularies. It provides a suite for human curators, who can define tailored templates for harmonizing series of time series.
Characteristic: The Data Harmonization facility can be used to harmonize time series of up to 1 M observations for each iteration of the harmonization process.
Do you need this resource?

Data Publication
Description: Species distribution maps generation; production of indicators; facilities for creating and managing enhanced documents; generation of standard ISO 19139 metadata for geospatial datasets.
Characteristic: The Data Publication facility allows publishing products in the VRE, making them available either to all members of the VRE or as open access.
Do you need this resource?
Data Analytics

Facilities for species occurrence and geospatial datasets processing
Description: Time Series Analysis, Time Geo Chart, XYExtractor, ZExtraction, Raster Data Publisher, ESRI-GRID Extraction, Maps Comparison.
Examples: The Scalable Data Mining VRE. Published examples:
- Coro, Gianpaolo, Pasquale Pagano, and Anton Ellenbroek. "Comparing heterogeneous distribution maps for marine species." GIScience & Remote Sensing 51.5 (2014): 593-611.
- Coro, Gianpaolo, et al. "Automatic classification of climate change effects on marine species distributions in 2050 using the AquaMaps model." Environmental and Ecological Statistics 23.1 (2016): 155-180.
Do you need this service?

Facilities for performing data mining tasks on tabular and computer science data
Description: Feed Forward Neural Network Regressor, Feed Forward Neural Network Trainer, Dbscan, Kmeans, Lof, Xmeans, WEB App Publisher, Quality Analysis, Generic Charts, Stat Val.
Examples: The Tabular Data Lab VRE. Published examples:
- Candela, Leonardo, et al. "Species distribution modeling in the cloud." Concurrency and Computation: Practice and Experience (2013).
- Coro, Gianpaolo, et al. "Parallelizing the execution of native data mining algorithms for computational biology." Concurrency and Computation: Practice and Experience 27.17 (2015): 4630-4644.
Do you need this service?

Facilities for the management and supervision of ecosystems
Description: Absence Cells from AquaMaps, HRS, Absence Generation from Obis, Estimate Monthly Fishing Effort, Ecopath with Ecosim, Estimate Fishing Activity, SEADATANET Interpolator, Species Maps from Points, BiOnym, Whole Steps Vpa Iccat Bft E.
Examples: The Biodiversity Lab VRE. Published examples:
- Coro, Gianpaolo, et al. "Improving data quality to build a robust distribution model for Architeuthis dux." Ecological Modelling 305 (2015): 29-39.
- Coro, Gianpaolo, Luigi Fortunati, and Pasquale Pagano. "Deriving fishing monthly effort and caught species from vessel trajectories." OCEANS-Bergen, 2013 MTS/IEEE. IEEE, 2013.
Do you need this service?

Facilities for the development of optimized feeding and growth models
Description: Simulfishkpis.
Examples: Performance and Evaluation in Aquaculture VRE.
Do you need this service?

Facilities for supporting decision making, strategic investment analysis and better planning in the aquaculture domain
Description: Mpa Intersect V2.
Examples: Protected Area Impact Maps VRE, Aquaculture Atlas Generation VRE.
Do you need this service?
Algorithms

Feed Forward Neural Network Regressor
Description: The algorithm simulates a real-valued vector function using a trained Feed Forward Artificial Neural Network and returns a table containing the function's actual inputs and the predicted outputs.
Requirements: DataMiner Cluster
Do you need this resource?

Feed Forward Neural Network Trainer
Description: The algorithm trains a Feed Forward Artificial Neural Network using an online Back-Propagation procedure and returns the training error and a binary file containing the trained network.
Requirements: DataMiner Cluster
Do you need this resource?

Dbscan
Description: A clustering algorithm for real-valued vectors that relies on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. A maximum of 4000 points is allowed.
Requirements: DataMiner Cluster
Do you need this resource?
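To illustrate what the Dbscan entry computes, here is a minimal sketch of textbook DBSCAN in plain Python. It is not the DataMiner implementation: the function name `dbscan` and the parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) are our own assumptions, and the brute-force neighbour search is only meant for small inputs such as the 4000-point limit above.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1

def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) or NOISE for every point.

    Hypothetical helper (not the DataMiner interface): textbook DBSCAN with a
    brute-force O(n^2) neighbour search.
    """
    def region(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels
```

For example, `dbscan([(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)], eps=0.5, min_pts=3)` yields two clusters and marks the isolated point as noise: `[0, 0, 0, 1, 1, 1, -1]`.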
Kmeans
Description: A clustering algorithm for real-valued vectors that relies on the k-means algorithm, i.e. a method aiming to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. A maximum of 4000 points is allowed.
Requirements: DataMiner Cluster
Do you need this resource?
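The partition described in the Kmeans entry is usually computed with Lloyd's iteration: assign each observation to the nearest mean, then move each mean to the centroid of its members, until memberships stop changing. The sketch below assumes that procedure; the function name `kmeans` is hypothetical, and seeding with the first k points is our simplification for determinism (real implementations typically seed randomly or with k-means++).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm. Returns (cluster index of each point, centroids).

    Hypothetical helper: seeds the centroids with the first k points so the result
    is deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the nearest mean.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # memberships are stable, so the centroids are already their means
        assignment = new_assignment
        # Update step: each centroid becomes the mean (prototype) of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids
```

With the points `(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)` and `k=2`, the alternating points fall into clusters `[0, 1, 0, 1, 0, 1]`, with centroids at one third and 31/3 on each axis.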
Lof
Description: Local Outlier Factor (LOF). An outlier detection algorithm for real-valued vectors that relies on the Local Outlier Factor algorithm, i.e. an algorithm for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours. A maximum of 4000 points is allowed.
Requirements: DataMiner Cluster
Do you need this resource?
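The "local deviation with respect to its neighbours" in the Lof entry is measured by comparing a point's local reachability density with that of its k nearest neighbours. The following brute-force sketch follows the standard LOF definition; the function name and the parameter `k` are our own assumptions, and for simplicity it assumes distinct points and ignores ties beyond the k-th neighbour.

```python
import math
from typing import List, Sequence, Tuple

def local_outlier_factor(points: Sequence[Tuple[float, ...]], k: int) -> List[float]:
    """LOF score per point: about 1 for inliers, clearly above 1 for outliers.

    Hypothetical helper: brute-force O(n^2) version that assumes distinct points
    and ignores ties beyond the k-th neighbour.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k] for i in range(n)]
    k_distance = [dist[i][knn[i][-1]] for i in range(n)]

    def reach_dist(a: int, b: int) -> float:
        # Reachability distance of a from its neighbour b.
        return max(k_distance[b], dist[a][b])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]
```

For the four corners of a unit square plus the point `(10, 10)` and `k=2`, the corners score exactly 1.0 while the distant point scores about 13, flagging it as anomalous.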
Xmeans
Description: A clustering algorithm for occurrence points that relies on the X-Means algorithm, i.e. an extended version of the K-Means algorithm improved by an Improve-Structure step. A maximum of 4000 points is allowed.
Requirements: DataMiner Cluster
Do you need this resource?
Time Series Analysis
Description: An algorithm applying signal processing to a non-uniform time series. A maximum of 10000 distinct points in time is allowed to be processed. The process uniformly samples the series, then extracts hidden periodicities and signal properties. The sampling period is the shortest time difference between two points. Finally, the algorithm forecasts the time series using Caterpillar-SSA. The output shows the detected periodicity, the forecasted signal and the spectrogram.
Requirements: DataMiner Cluster
Do you need this resource?

Time Geo Chart
Description: An algorithm producing an animated GIF displaying quantities as colors over time. The color indicates the sum of the values recorded in a country.
Requirements: DataMiner Cluster
Do you need this resource?

XYExtractor
Description: An algorithm to extract values associated with an environmental feature repository (e.g. NetCDF, ASC, GeoTIFF files, etc.). A grid of points at a certain resolution is specified by the user, and values from the environmental repository are associated with the points. It accepts as input either one geospatial repository ID (via its UUID in the infrastructure spatial data repository, recoverable through the GeoExplorer portlet) or a direct link to a file, together with the time and space specification. The algorithm produces one table containing the values associated with the selected bounding box.
Requirements: DataMiner Cluster
Do you need this resource?
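The resampling step of the Time Series Analysis entry (sampling period equal to the shortest time difference between two points) can be sketched as follows. This is a minimal illustration under our own assumptions: gaps are filled by linear interpolation, and a plain discrete Fourier transform stands in for the periodicity extraction. The Caterpillar-SSA forecast is not reproduced, and both function names are hypothetical.

```python
import bisect
import cmath
import math
from typing import List, Sequence, Tuple

def uniform_resample(times: Sequence[float], values: Sequence[float]) -> Tuple[float, List[float]]:
    """Resample an irregular series on a uniform grid.

    The sampling period is the shortest positive gap between consecutive time points;
    gaps are filled by linear interpolation (an assumption of this sketch).
    """
    pairs = sorted(zip(times, values))
    ts = [t for t, _ in pairs]
    vs = [v for _, v in pairs]
    gaps = [b - a for a, b in zip(ts, ts[1:]) if b > a]
    if not gaps:
        raise ValueError("need at least two distinct time points")
    period = min(gaps)
    count = int(math.floor((ts[-1] - ts[0]) / period + 1e-9)) + 1
    samples = []
    for i in range(count):
        t = ts[0] + i * period
        j = bisect.bisect_right(ts, t)  # ts[j - 1] <= t < ts[j]
        if j >= len(ts):
            samples.append(vs[-1])  # at (or numerically past) the last observation
        else:
            t0, t1, v0, v1 = ts[j - 1], ts[j], vs[j - 1], vs[j]
            samples.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return period, samples

def dominant_period(samples: Sequence[float], period: float) -> float:
    """Period (in time units) of the strongest non-zero frequency of a plain DFT."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return n * period / best_k
```

For instance, observations at times 0, 2, 3 and 7 have a shortest gap of 1, so the series is resampled at eight uniform instants; a signal repeating `0, 1, 0, -1` has a dominant period of 4 sampling units.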
ZExtraction
Description: An algorithm to extract the Z values from a geospatial features repository (e.g. NetCDF, ASC, GeoTIFF files, etc.). The algorithm analyses the repository and automatically extracts the Z values according to the resolution chosen by the user. It produces one chart of the Z values and one table containing the values.
Requirements: DataMiner Cluster
Do you need this resource?

Absence Cells from AquaMaps
Description: An algorithm producing cells and features (HCAF) for a species, containing absence points taken from an AquaMaps distribution.
Requirements: DataMiner Cluster
Do you need this resource?

HRS
Description: An algorithm that calculates the Habitat Representativeness Score, i.e. an indicator assessing whether a specific survey coverage, or another environmental features dataset, contains data that are representative of all available habitat variable combinations in an area.
Requirements: DataMiner Cluster
Do you need this resource?

Absence Generation from Obis
Description: An algorithm to estimate absence records from survey data in OBIS. Based on the work in Coro, G., Magliozzi, C., Berghe, E. V., Bailly, N., Ellenbroek, A., & Pagano, P. (2016). Estimating absence locations of marine species from data of scientific surveys in OBIS. Ecological Modelling, 323, 61-76.
Requirements: DataMiner Cluster
Do you need this resource?

Raster Data Publisher
Description: This algorithm publishes a raster file as a map or dataset in the e-Infrastructure. NetCDF-CF files are encouraged, as WMS and WCS maps will be produced using this format. For other types of files (GeoTIFF, ASC, etc.) only the raw datasets will be published. The resulting map or dataset will be accessible to the VRE participants via the VRE GeoExplorer.
Requirements: DataMiner Cluster
Do you need this resource?

Estimate Monthly Fishing Effort
Description: An algorithm that estimates fishing exploitation at 0.5-degree resolution from activity-classified vessel trajectories. It produces a table with c-square codes, latitudes, longitudes, resolution and the associated overall fishing hours in the time frame of the vessels' activity. It requires each activity point to be classified as Fishing or other. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM
Requirements: DataMiner Cluster
Do you need this resource?
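The aggregation behind the Estimate Monthly Fishing Effort entry (fishing hours summed per 0.5-degree cell from activity-classified points) can be sketched as below. This is a simplified stand-in under stated assumptions: the `Ping` record layout and the function name are hypothetical, each point's activity is assumed to last until the same vessel's next point, and cells are keyed by their south-west corner rather than by real c-square codes.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class Ping(NamedTuple):
    """One vessel position (hypothetical record layout)."""
    vessel: str
    time_h: float  # timestamp, in hours
    lat: float
    lon: float
    activity: str  # "Fishing" or any other label

def fishing_hours_per_cell(pings: List[Ping], resolution: float = 0.5) -> Dict[Tuple[float, float], float]:
    """Sum fishing hours per grid cell, keyed by the cell's south-west corner.

    Hypothetical helper: real c-square codes are not computed here.
    """
    tracks: Dict[str, List[Ping]] = defaultdict(list)
    for p in pings:
        tracks[p.vessel].append(p)
    effort: Dict[Tuple[float, float], float] = defaultdict(float)
    for track in tracks.values():
        track.sort(key=lambda p: p.time_h)
        # Assumption: a ping's activity lasts until the same vessel's next ping,
        # so the last ping of each track contributes no hours.
        for cur, nxt in zip(track, track[1:]):
            if cur.activity == "Fishing":
                cell = (math.floor(cur.lat / resolution) * resolution,
                        math.floor(cur.lon / resolution) * resolution)
                effort[cell] += nxt.time_h - cur.time_h
    return dict(effort)
```

Using `math.floor` keeps the cell assignment correct for negative (southern or western) coordinates as well.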
Ecopath with Ecosim
Description: Ecopath with Ecosim (EwE) is a free ecological/ecosystem modeling software suite. This algorithm implementation expects a model and a configuration file as inputs; the result of the analysis is returned as a zip archive. References: Christensen, V., & Walters, C. J. (2004). Ecopath with Ecosim: methods, capabilities and limitations. Ecological Modelling, 172(2), 109-139.
Requirements: DataMiner Cluster
Do you need this resource?

Estimate Fishing Activity
Description: An algorithm that estimates activity hours (fishing or other) from vessel trajectories, adds bathymetry information to the table and classifies, point by point, the fishing activity of the involved vessels according to two algorithms: one based on speed (activity_class_speed output column) and the other based on speed and bathymetry (activity_class_speed_bath output column). The algorithm produces new columns containing this information. This algorithm is based on the paper 'Deriving Fishing Monthly Effort and Caught Species' (Coro et al. 2013, in proc. of OCEANS - Bergen, 2013 MTS/IEEE). Example of input table (NAFO anonymised data): http://goo.gl/3auJkM
Requirements: DataMiner Cluster
Do you need this resource?
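A speed-based classification like the one feeding the `activity_class_speed` column can be sketched as follows: compute each point's speed to the next point from great-circle distance, then label the point as Fishing when that speed falls in a given band. This is only an illustration: the 2 to 5 knot band, the function names and the `(time, lat, lon)` tuple layout are hypothetical assumptions, not the thresholds of the cited paper, and the bathymetry-based variant is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def haversine_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in nautical miles (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

def classify_by_speed(track: Sequence[Tuple[float, float, float]],
                      min_knots: float = 2.0, max_knots: float = 5.0) -> List[str]:
    """track: (time in hours, lat, lon) tuples sorted by time; one label per point.

    Hypothetical thresholds: a point is 'Fishing' when its speed to the next point
    lies in [min_knots, max_knots]; the last point reuses the previous label.
    """
    labels = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        speed = haversine_nm(la0, lo0, la1, lo1) / dt if dt > 0 else 0.0  # knots
        labels.append("Fishing" if min_knots <= speed <= max_knots else "Other")
    if track:
        labels.append(labels[-1] if labels else "Other")
    return labels
```

As a sanity check, one degree of latitude is about 60 nautical miles, so moving 0.05 degrees north in one hour (about 3 knots) is labelled Fishing, while 0.2 degrees per hour (about 12 knots) is labelled Other.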
ESRI-GRID Extraction
Description: An algorithm to extract values associated with an environmental feature repository (e.g. NetCDF, ASC, GeoTIFF files, etc.). A grid of points at a certain resolution is specified by the user, and values from the environmental repository are associated with the points. It accepts as input either one geospatial repository ID (via its UUID in the infrastructure spatial data repository, recoverable through the GeoExplorer portlet) or a direct link to a file, together with the time and space specification. The algorithm produces one ESRI GRID ASCII file containing the values associated with the selected bounding box.
Requirements: DataMiner Cluster
Do you need this resource?
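The output of the ESRI-GRID Extraction entry is an ESRI GRID ASCII file: a short header (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`) followed by one line of values per row, northernmost row first. The sketch below builds the user-specified grid over a bounding box and writes it in that layout. The callable `sample(lon, lat)` is a hypothetical stand-in for the lookup in the environmental repository, and sampling at cell centres is our assumption.

```python
import math
from typing import Callable, Optional

def esri_ascii_grid(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                    cellsize: float, sample: Callable[[float, float], Optional[float]],
                    nodata: float = -9999.0) -> str:
    """Render sample(lon, lat) over a bounding box as ESRI ASCII grid text.

    `sample` is a hypothetical stand-in for the environmental repository lookup;
    it returns None where no value is available.
    """
    ncols = int(math.ceil((max_lon - min_lon) / cellsize - 1e-9))
    nrows = int(math.ceil((max_lat - min_lat) / cellsize - 1e-9))
    lines = [
        f"ncols {ncols}", f"nrows {nrows}",
        f"xllcorner {min_lon}", f"yllcorner {min_lat}",
        f"cellsize {cellsize}", f"NODATA_value {nodata}",
    ]
    for r in range(nrows):  # the first data row is the northernmost one
        lat = min_lat + (nrows - r - 0.5) * cellsize  # cell-centre latitude
        row = []
        for c in range(ncols):
            lon = min_lon + (c + 0.5) * cellsize  # cell-centre longitude
            value = sample(lon, lat)
            row.append(str(nodata if value is None else value))
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"
```

The same grid of cell centres, emitted as rows of (longitude, latitude, value) instead of a raster, corresponds to the tabular output described for XYExtractor.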
SEADATANET Interpolator
Description: A connector for the SeaDataNet infrastructure. This algorithm invokes the Data-Interpolating Variational Analysis (DIVA) SeaDataNet service to interpolate spatial data. The model uses GEBCO bathymetry data and requires, among other parameters, an estimate of the maximum spatial span of the correlation between points and the signal-to-noise ratio. It can interpolate up to 10,000 points randomly taken from the input table. As output, it produces a NetCDF file with a uniform grid of values. This interpolation model is described in Troupin et al. 2012, 'Generation of analysis and consistent error fields using the Data Interpolating Variational Analysis (Diva)', Ocean Modelling, 52-53, 90-101.
Requirements: DataMiner Cluster
Do you need this resource?

WEB App Publisher
Description: This algorithm publishes a zip file containing a website, based on HTML and JavaScript, in the e-Infrastructure. It generates a public URL to the application that can be shared.
Requirements: DataMiner Cluster
Do you need this resource?

Maps Comparison
Description: An algorithm for comparing two OGC/NetCDF maps in a way that is seamless to the user. The algorithm assesses the similarities between two geospatial maps by comparing them point by point. It accepts as input the two geospatial maps (via their UUIDs in the infrastructure spatial data repository, recoverable through the GeoExplorer portlet) and some parameters affecting the comparison, such as the z-index, the time index and the comparison threshold. Note: in the case of WFS layers, it compares the last feature column.
Requirements: DataMiner Cluster
Do you need this resource?

Quality Analysis
Description: An evaluator algorithm that assesses the effectiveness of a distribution model by computing the Receiver Operating Characteristic (ROC), the Area Under the Curve (AUC) and the accuracy of the model.
Requirements: DataMiner Cluster
Do you need this resource?
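The three measures named in the Quality Analysis entry can be computed as follows for a distribution model that outputs a score per site, with observed presences labelled 1 and absences 0. This is a generic sketch, not the service's code: the ROC curve is traced by lowering the score threshold, the AUC is integrated with the trapezoidal rule, and the accuracy threshold of 0.5 is our assumption. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) points, from the strictest threshold down."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both presences (1) and absences (0)")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all cases tied at this score together, so ties give a diagonal segment.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of sites whose thresholded prediction matches the observed label."""
    return sum(1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1)) / len(labels)
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.4, 0.2]`, three of the four presence/absence pairs are ranked correctly, so the AUC is 0.75, while the accuracy at threshold 0.5 is 0.5.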
Species Maps from Points
Description: An algorithm to produce a GIS map from a probability distribution made up of x,y coordinates and a certain resolution.
Requirements: DataMiner Cluster
Do you need this resource?

Generic Charts
Description: An algorithm producing generic charts of attributes vs. quantities. Charts are displayed per quantity column. Histograms, scatter charts and radar charts are produced for the top ten quantities. A Gaussian distribution reports overall statistics for the quantities.
Requirements: DataMiner Cluster
Do you need this resource?

BiOnym
Description: An algorithm implementing BiOnym, a flexible workflow approach to taxon name matching. The workflow allows several taxon name matching algorithms to be activated and returns the list of possible transcriptions for a list of raw input species names, possibly including authorship.
Requirements: DataMiner Cluster
Do you need this resource?
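As a rough illustration of the kind of step a taxon name matching workflow such as BiOnym chains together, the sketch below strips authorship from a raw name and then retrieves close candidates from a reference list. BiOnym's own matchers are not reproduced: Python's standard `difflib.get_close_matches` similarity is swapped in, the "genus plus epithet" normalisation is a simplifying heuristic that ignores subspecies, and the function names, `n` and `cutoff` values are hypothetical.

```python
import difflib
import re
from typing import List

def normalise(raw: str) -> str:
    """Keep genus + species epithet, dropping authorship such as '(Linnaeus, 1758)'.

    Simplifying heuristic for this sketch: subspecies and non-ASCII letters are ignored.
    """
    tokens = re.findall(r"[A-Za-z]+", raw)
    if not tokens:
        return ""
    genus = tokens[0].capitalize()
    epithet = tokens[1].lower() if len(tokens) > 1 else ""
    return f"{genus} {epithet}".strip()

def match_names(raw: str, reference: List[str], n: int = 3, cutoff: float = 0.8) -> List[str]:
    """Candidate transcriptions of `raw` from a reference list, best match first."""
    return difflib.get_close_matches(normalise(raw), reference, n=n, cutoff=cutoff)
```

For example, the misspelled `"gadus morua Linnaeus, 1758"` is normalised to `"Gadus morua"`, whose closest entry in a reference list containing `"Gadus morhua"` is that name.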
Stat Val
Description: Statistical validation of a bipartite weighted network.
Requirements: DataMiner Cluster
Do you need this resource?

Simulfishkpis
Description: Creates simulation models for KPIs of fish production in aquaculture. It imports data from the SimulFish Growth database via URLs. The calculated KPIs are FCR, SFR and Mortality, using regression models generated by the GAM and MAR methodologies.
Requirements: DataMiner Cluster
Do you need this resource?
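For readers unfamiliar with the KPIs named in the Simulfishkpis entry, the sketch below computes them directly from the raw figures of one production period, using common aquaculture definitions that we assume here: FCR as feed given per kg of biomass gained, SFR as daily feed in percent of mean biomass, and mortality as the percentage of the initial stock lost. The service itself fits GAM/MAR regression models, which this sketch does not reproduce; the `BatchPeriod` record and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BatchPeriod:
    """Raw figures for one production period (hypothetical record layout)."""
    initial_biomass_kg: float
    final_biomass_kg: float
    feed_kg: float
    days: int
    initial_count: int
    dead_count: int

def fcr(p: BatchPeriod) -> float:
    """Feed Conversion Ratio: kg of feed given per kg of biomass gained."""
    gain = p.final_biomass_kg - p.initial_biomass_kg
    if gain <= 0:
        raise ValueError("FCR is undefined without biomass gain")
    return p.feed_kg / gain

def sfr(p: BatchPeriod) -> float:
    """Specific Feeding Rate: daily feed as % of the mean biomass over the period."""
    mean_biomass = (p.initial_biomass_kg + p.final_biomass_kg) / 2
    return 100 * p.feed_kg / (mean_biomass * p.days)

def mortality(p: BatchPeriod) -> float:
    """Mortality as % of the initial number of fish."""
    return 100 * p.dead_count / p.initial_count
```

For a 30-day period in which biomass grows from 1000 kg to 1500 kg on 800 kg of feed, with 200 of 10000 fish lost, this gives FCR = 800 / 500 = 1.6, SFR = 100 × 800 / (1250 × 30) ≈ 2.13 % per day and a mortality of 2 %.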
Whole Steps Vpa Iccat Bft E
Description: ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute a whole stock assessment workflow.
Requirements: DataMiner Cluster
Do you need this resource?

Mpa Intersect V2
Description: An algorithm to compute areas of geomorphic features in an EEZ or ECOREGION area and in its intersecting Marine Protected Areas (MPAs).
Requirements: DataMiner Cluster
Do you need this resource?
If you need any other specific resource which is not part of the current list, or you need to integrate your own
software/application (for example R, Java, JavaScript, Python, etc.) in the virtual environment, please
describe below what you need and why (max. 1 page).
5 OPEN ACCESS
Indicate your willingness to share your data. Identify the type of data that you will be sharing and an overall
percentage. The more data you share on the BlueBRIDGE infrastructure, the more access to resources we will
provide. Open access is not mandatory but strongly encouraged.
Maximum 500 words.