User scenario on Marine Biodiversity AquaMaps
Pasquale Pagano
[email protected]
National Research Council (CNR) – ISTI, Italy
www.venus-c.eu

Marine Biodiversity AquaMaps
• AquaMaps VRE is a virtual environment providing a set of services for the generation, standardized dissemination, and mapped visualization of model-based, large-scale predictions of the currently known occurrence of marine species

Standardized range maps of marine species
• Ecological Niche Modelling
  – extrapolates known species occurrence data to determine environmental envelopes (species tolerances)
  – predicts future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)
• AquaMaps*
  – Maps large-scale species distributions based on existing but fragmented and potentially non-representative occurrence data
  – Uses knowledge of the geographic extents of commercial species available from FAO
  – Uses information about habitat usage available from online species databases
* Initially defined to predict global distributions of marine mammals (Kaschner et al., 2006) and then generalised to marine species.
AquaMaps is the only species distribution modelling approach that combines numerical algorithms with expert knowledge

Species Range Maps Production Workflow
[Workflow diagram. Step labels: Defining Environmental Envelopes, Generating Species Occurrences Probability, Plotting Range Maps. Data labels: Good Cells, HCAF, Biological Species, HSPEN, HSPEC]
• Color-coded species range map, using half-degree latitude and longitude cells

Species Range Maps Complexity
0.5-degree latitude and longitude cells (35 km2 at the equator), > 170k marine cells
Large volume of input and output data
• Less than 7,000 species:
  – native range = 56,468,301
  – suitable range = 114,989,360
• Estimation for 50,000 species:
  – native range = 350,000,000
  – suitable range = 715,000,000
[Eli E. Agbayani, FishBase Project/INCOFISH]
Very large number of computations
• One multispecies map computed on 3.5% of the marine areas and 25% of the species requires 125 million computations
• One global map (extended to all species and marine cells around the world) requires about 400 billion computations
[N. Bailly, WorldFish Center]

AquaMaps is a production environment
• Produces range maps, multispecies maps, plus a variety of thematic maps: taxonomic, climatologic, invasiveness, …
• Allows researchers to evaluate the impact of climate change (e.g. 2050)
• Distributes maps via different services
  – FishBase, the most widely used biological information system, with over one million visitors per month
  – Species information systems such as SeaLifeBase and GBIF

Current Limitations
• To evaluate marine biodiversity changes in the presence of human disasters
• To provide useful information to mitigate the impact of natural disasters on marine biodiversity
0.5-degree latitude and longitude cells

Increasing the precision means …
Less than 13,000 species
• Increase resolution 0.5 deg -> 0.1 deg
  – Environment DBs => 4.5 million rows
  – Iterations => 55 billion iterations
  – Species Prediction DBs => 2.5 billion rows
• Scientists may tweak parameters
  – Species Prediction up to 55 billion rows (870 Gb)
Roadmap to increase the number of species 13,000 -> 50,000
  – 4 times more rows
Change the current technologies
  – Relational databases
  – Execution models

Expected benefits of VENUS-C
• Execution models
  Sequential, Multithreaded, Batch
  + MapReduce
    • The overall problem is split into smaller chunks which can be processed in parallel. All partial solutions are then combined / reduced into the overall result
  + COMPSs
    • The COMPSs runtime is responsible for processing a single application as remote tasks, i.e. checking their data dependencies and scheduling their concurrent execution on distributed parallel resources
  + Generic Worker
    • Distributes the load over multiple machines
• IO
  Relational
  + Hadoop VFS (local fs, http, ftp, s3, kfs, hdfs)
  + Cassandra
  + CDMI

Conclusions
• VENUS-C allows researchers to
  – Perform analyses that otherwise could not be undertaken
  – Increase the efficiency of research
  – Share results in a community in real time
  – Maximize the data production
  – Accelerate the research
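
Illustration: MapReduce-style split and combine
The MapReduce execution model listed under the expected benefits (split the problem into chunks, process them in parallel, reduce the partial solutions) can be sketched as follows. This is a minimal sketch, not the AquaMaps or VENUS-C implementation: the cell temperatures, the species envelopes, and the plain in-range test are hypothetical stand-ins for the HCAF data, the HSPEN envelopes, and the real probability model, and a local thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy data, not real HCAF or HSPEN records.
CELLS = {"c1": 4.0, "c2": 12.0, "c3": 20.0, "c4": 27.0}    # cell id -> temperature (deg C)
ENVELOPES = {"sp_a": (10.0, 22.0), "sp_b": (18.0, 30.0)}   # species -> (min, max) tolerance


def is_suitable(envelope, temperature):
    """Stand-in for the envelope match: true if the value lies within the tolerance range."""
    low, high = envelope
    return low <= temperature <= high


def split(items, size):
    """Split step: cut the list of cell ids into chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(chunk):
    """Map step: count the suitable cells per species within one chunk."""
    counts = {}
    for cell_id in chunk:
        for species, envelope in ENVELOPES.items():
            if is_suitable(envelope, CELLS[cell_id]):
                counts[species] = counts.get(species, 0) + 1
    return counts


def merge(left, right):
    """Reduce step: add two partial per-species counts together."""
    merged = dict(left)
    for species, count in right.items():
        merged[species] = merged.get(species, 0) + count
    return merged


def suitable_cell_counts(chunk_size=2, workers=2):
    """Run split, parallel map, and reduce over all cells."""
    chunks = split(sorted(CELLS), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, {})


if __name__ == "__main__":
    print(suitable_cell_counts())
```

Because the reduce step only adds counts, the result does not depend on the chunk size or on the order in which chunks finish, which is what allows the chunks to be spread over several machines.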