User scenario on Marine Biodiversity
AquaMaps
Pasquale Pagano
[email protected]
National Research Council (CNR) – ISTI
Italy
www.venus-c.eu
Marine Biodiversity
AquaMaps
• AquaMaps VRE is a virtual environment
providing a set of services for the generation,
standardized dissemination, and mapped
visualization of model-based, large-scale
predictions of the currently known occurrence of
marine species
Standardized range maps of
marine species
• Ecological Niche Modelling
– extrapolation of known species occurrence data to determine
environmental envelopes (species tolerances)
– prediction of future distributions by matching species tolerances against
local environmental conditions (e.g. climate change and sea pollution)
• AquaMaps*
– Maps large-scale species distributions based on existing but
fragmented and potentially non-representative occurrence data
– Uses knowledge of the geographic extents of commercial species
available from FAO
– Uses information about habitat usage available from online species
databases
* Initially defined to predict global distributions of marine mammals (Kaschner et al., 2006) and then
generalised to marine species.
AquaMaps is the only species distribution modelling approach that combines numerical algorithms
with expert knowledge
Species Range Maps
Production Workflow
[Workflow diagram. Steps: Defining Environmental Envelopes; Generating Species Occurrences Probability; Plotting Range Maps. Data labels: Biological Species, Good Cells, HCAF, HSPEN, HSPEC]
• Color-coded species range maps on a half-degree latitude and longitude grid
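The envelope-matching idea behind this workflow can be sketched in Python. This is a minimal illustration and not the AquaMaps implementation: the trapezoidal response curve, the multiplication of per-parameter scores, and every name and number (`Envelope`, `suitability`, `cell_probability`, the temperature and depth values) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Hypothetical tolerance range of one species for one environmental parameter."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float

    def suitability(self, value: float) -> float:
        """Return a score in [0, 1] for a local value (assumed trapezoidal response)."""
        if value < self.minimum or value > self.maximum:
            return 0.0  # outside the tolerated range
        if value < self.preferred_min:
            # rising edge between absolute minimum and preferred minimum
            return (value - self.minimum) / (self.preferred_min - self.minimum)
        if value > self.preferred_max:
            # falling edge between preferred maximum and absolute maximum
            return (self.maximum - value) / (self.maximum - self.preferred_max)
        return 1.0  # inside the preferred range


def cell_probability(envelopes: dict, conditions: dict) -> float:
    """Multiply per-parameter scores into one occurrence score for a grid cell."""
    probability = 1.0
    for name, envelope in envelopes.items():
        probability *= envelope.suitability(conditions[name])
    return probability


# Illustrative values only, not taken from any species database.
envelopes = {
    "temperature": Envelope(5.0, 10.0, 20.0, 25.0),
    "depth": Envelope(0.0, 10.0, 100.0, 200.0),
}
print(cell_probability(envelopes, {"temperature": 7.5, "depth": 50.0}))  # 0.5
```

Multiplying the scores means a single unsuitable parameter drives the cell's score to zero, which is one simple way to express "tolerances must all be met".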
Species Range Maps
Complexity
0.5° latitude and longitude cells
(about 55 km per side at the equator), > 170k marine cells
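For orientation, a half-degree grid over the whole globe has 720 × 360 = 259,200 cells, of which the slide counts more than 170k as marine. The sketch below shows one way to map a coordinate to a grid cell; the (row, column) indexing scheme and the function name `cell_index` are hypothetical and are not the cell codes used by AquaMaps.

```python
CELL_SIZE_DEG = 0.5  # half-degree resolution, as stated on the slide

COLUMNS = round(360 / CELL_SIZE_DEG)  # 720 cells around the globe
ROWS = round(180 / CELL_SIZE_DEG)     # 360 cells from pole to pole
TOTAL_CELLS = ROWS * COLUMNS          # 259,200 cells over land and sea


def cell_index(lat: float, lon: float) -> tuple[int, int]:
    """Map a coordinate to a hypothetical (row, column) index, row 0 at the north."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinate out of range")
    # Clamp so that lat = -90 and lon = 180 fall into the last row and column.
    row = min(int((90 - lat) / CELL_SIZE_DEG), ROWS - 1)
    col = min(int((lon + 180) / CELL_SIZE_DEG), COLUMNS - 1)
    return row, col


print(TOTAL_CELLS)           # 259200
print(cell_index(0.0, 0.0))  # (180, 360)
```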
Large volume of input and output data
• Less than 7,000 species:
– native range = 56,468,301
– suitable range = 114,989,360
• Estimation for 50,000 species:
– native range = 350,000,000
– suitable range = 715,000,000
[Eli E. Agbayani, FishBase Project/INCOFISH]
Very large number of computations
• One multispecies map computed on 3.5% of the marine areas and 25% of the
species requires 125 million computations
• One global map (extended to all species and marine cells around the
world) requires about 400 billion computations
[N. Bailly, WorldFish Center]
AquaMaps is a production
environment
• Produces range maps, multispecies maps, and a
variety of thematic maps: taxonomic,
climatological, invasiveness, …
• Allows researchers to evaluate the impact of
climate change (e.g. 2050)
• Distributes maps via different services
– FishBase, the most widely used biological information
system with over one million visitors per month;
– Species information systems such as SeaLifeBase and
GBIF
Current Limitations
• To evaluate marine
biodiversity changes in
the presence of man-made
disasters
• To provide useful
information to mitigate
the impact of natural
disasters on marine
biodiversity
0.5 latitude and longitude cells
Increasing the precision
means …
Less than 13,000 species
• Increase resolution 0.5 deg -> 0.1 deg
– Environment DBs => 4.5 million rows
– Iterations => 55 billion iterations
– Species Prediction DBs => 2.5 billion rows
• Scientists may tweak parameters
– Species Prediction up to 55 billion rows (870 GB)
Roadmap to increase the number of species 13,000 -> 50,000
– 4-times more rows
• Change the current technologies
– Relational databases
– Execution models
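The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the slides (more than 170k marine cells at 0.5°, fewer than 13,000 species, a target of 50,000 species); treating those bounds as exact values is an assumption, so the results are approximations of the slide's rounded numbers.

```python
MARINE_CELLS_HALF_DEG = 170_000  # lower bound quoted on the slides (> 170k)
SPECIES_NOW = 13_000             # upper bound quoted on the slides
SPECIES_TARGET = 50_000          # roadmap target

# Each 0.5 degree cell splits into (0.5 / 0.1)^2 = 25 cells of 0.1 degree.
refinement = round(0.5 / 0.1) ** 2
cells_fine = MARINE_CELLS_HALF_DEG * refinement  # rows in the environment DB
iterations = cells_fine * SPECIES_NOW            # one evaluation per cell and species
species_growth = SPECIES_TARGET / SPECIES_NOW    # growth factor of the roadmap

print(f"{cells_fine:,} cells at 0.1 degree")           # 4,250,000
print(f"{iterations:,} cell-species evaluations")      # 55,250,000,000
print(f"{species_growth:.2f}x more species")           # 3.85x
```

These values line up with the roughly 4.5 million environment rows, 55 billion iterations, and "4-times more rows" quoted above.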
Expected benefits of VENUS-C
• Execution models
– Sequential
– Multithreaded
– Batch
+ MapReduce: the overall problem is split into smaller chunks which can be processed in parallel. All
partial solutions are then combined / reduced into the overall result
+ COMPSs: COMPSs runtime is responsible for processing a single application as remote tasks, i.e.
checking their data dependencies and scheduling their concurrent execution on
distributed parallel resources
+ Generic Worker: distribute the load over multiple machines
• IO
– Relational
+ Hadoop VFS (local fs, http, ftp, s3, kfs, hdfs)
+ Cassandra
+ CDMI
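The MapReduce pattern described on this slide (split the problem into chunks, process them in parallel, combine the partial results) can be sketched with the Python standard library. This is only an illustration of the pattern, not Hadoop, COMPSs, or the Generic Worker: the function names, the 0.5 threshold, and the chunk size are assumptions, and a thread pool is used purely to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(cell_scores):
    """Map step: count the cells in one chunk whose score passes a threshold."""
    return sum(1 for score in cell_scores if score >= 0.5)  # threshold is illustrative


def reduce_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_suitable_cells(scores, chunk_size=1000, workers=4):
    """Count suitable cells by mapping over chunks and reducing the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks(scores, chunk_size)))
    return reduce(reduce_counts, partial_counts, 0)


scores = [0.1, 0.5, 0.9, 0.4] * 500  # made-up per-cell scores
print(count_suitable_cells(scores, chunk_size=300))  # 1000
```

Because each chunk is processed independently and the reduce step is associative, the same structure carries over to frameworks that run the map step on separate machines.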
Conclusions
• VENUS-C makes it possible to
– Perform analyses that otherwise could not be
undertaken
– Increase the efficiency of research
– Share results within a community in real time
– Maximize data production
– Accelerate research