Toward a distributed information system for marine biology and limnology (aka PAKT project)

Presenting: Karen Stocks, Amarnath Gupta, Chris Condit
Peter Arzberger (PI), Paul Brewin, Li Chen, Heasoo Hwang, Yannis Papakonstantinou, Xufei Qian, Simone Santini, Reza Wahadj, Ilya Zaslavsky
+ Rutgers University, University of Auckland, U. Wisconsin
Funding from the Gordon and Betty Moore Foundation

The Big Challenge
Integrating distributed and heterogeneous data resources to advance marine ecology and limnology

Opening the “Data Closet”
• OBIS
• Seamounts
• Lakes Testbed
• CalCOFI
• Marine Testbed
• Information Technology Development

Seamounts (undersea mountains)
Seamounts are
- biologically unique
- heavily fished habitats
SeamountsOnline: Centralized relational database

Seamount Science Example
Can seamount diversity be predicted from seamount depth, distance from continental margin, geological age, surface productivity, etc.?
Does endemism follow the predictions of Island Biogeography Theory?

Seamount Challenges
Combine multiple, distributed datatypes:
• relational species distributions data in SeamountsOnline (seamounts.sdsc.edu)
• bathymetry data and seamount morphology data in the Seamount Catalog (earthref.org)
• raster physical data from World Ocean Atlas, satellite imagery, etc.

Users
Research: CenSeam
– Data Analysis Working Group
– Expedition Planning
Management
– United Nations: IUCN-sponsored workshop on deepwater corals on Seamount
– International Seabed Authority workshop
Seamount Research Coordination Network, NSF

OBIS: Ocean Biogeographic Information System (www.iobis.org)

OBIS
• The Ocean Biogeographic Information System is an international federation of 50+ distributed data providers (7 million data records) sharing species distribution data
• OBIS has a well-established community (secretariat funding, 10 regional node centers, etc.)
but limited resources to build infrastructure
• The current DiGIR client-server system allows ~70 fields of data to be transferred (an extended Darwin Core) (www.iobis.org)

OBIS Science Examples
• Evaluating biogeographic provinces with real data
• Predicting the spread of invasive species
• Identifying diversity hotspots/siting marine protected areas
• Evaluating our state of knowledge

OBIS Challenges
• integrate OBIS biological data with emerging physical data resources
• hierarchical data
• allow habitat-specific data exploration
• extend query functionality (e.g. to complex spatial queries)
• capture more data when registering new data providers/serve specific communities better

Integrate OBIS biological data with emerging physical data resources

CalCOFI
- CalCOFI (the California Cooperative Ocean Fisheries Investigations) is a 50+ year long monitoring study off of Southern California
- 4 times per year a regular grid of stations is sampled for larval fish, zooplankton, and physical ocean parameters

CalCOFI Science Examples
• Determining scales of variability in biological components in space and time
• Correlating fluctuations in larval fish abundance with physical parameters over time
• Developing ecosystem models for habitat-based management

Technical Challenges
• Multiple data types: relational, hierarchical, raster, point, voxel, etc.
• Geospatial data operations
• Ontologies
• Higher knowledge sources

Integrating Physical and Biological Oceanographic Data
The Information Systems Viewpoint

What are we integrating and why?
• The Science Goals
– Explain biodiversity
  • Of a species
  • Of any taxonomic grouping of species
  • Around a habitat
  • By correlating distribution of a taxonomic group with the spatial (temporal) distribution of physical phenomena
  • By creating groupings of physical and biological parameters that correlate with the distribution and abundance of species
    – Perhaps for specific habitats
– Create predictive models
  • Given physical parameters or habitat characteristics, predict species distribution and abundance
  • Given species distribution, predict physical parameters
  • …

A Conceptual Framework for a Global Biodiversity Schema
[Diagram. Recoverable labels: Studies, collected-for, Samples, taken-from, collected-from, Collection Method, Collection System, Collection Target, Loc. Classes, Intra-class-relationships (parameterized), observed-at, Time/Frequency, Organism Classes, Observations, Partial-mapping, occur-at, Location, Generic Locational Reference Of Organisms, Organism Properties, Referred Object, Organism-Class Existence, Organism-Class Abundance, Organism-Class Rel. Abundance, Individual Organism Contributions, Point-in-space, Surface-in-space, Spatial-Volume (solid, annular), spatial relationships, Organisms, enviro-location relationships, associated-with, Environ. Ontology-1, Environ. Ontology-k, Environmental Parameters, Generic Environ. Reference Of Organisms, Environmental Region Properties]

A Conceptual Framework for a Global Physical Oceanography Schema
[Diagram. Recoverable labels: collection metadata, Measurement (data/function), resolution, spatial collection pattern, point, surface, volume, time/frequency, parameters, coverage, dense, sparse, value, scalar, prob., vector, view-definition, name, properties, Referred Object, Point-in-space, Surface-in-space, Spatial-Volume (solid, annular), Phenomena]

What are we integrating and why?
• Data elements
– The central elements
  • Distribution of biological and physical variables
    – Point distributions
    – Field distributions
    – Object-bound distributions
  • Grouping of biological and physical variables
    – Hierarchical groupings
    – Hypergraph groupings
– Additional elements
  • Geographic boundaries
  • Details of observations
  • Details of habitats and objects therein
  • …

Point, Field & Object-bound Distributions
• Distributions
– Point distributions are sparse
  • Continuous distributions
– Field distributions are dense
  • Often discrete
– Object-bound distributions are sparse
  • Around objects
  • Associated with other object-related properties
• Modeling field distributions as arrays
– Can be modeled using nested-relational calculus (algebra) + indices + counting (Libkin 95)
  • Special access functions can be useful (Marathe 98)
– Non-uniform field (NUF) distributions: aligned arrays with nulls
  • NRC + indices + counting + list operations
  • Dimension transformation + interpolation
– Containment vs.
overlap semantics

We have yet to show the relationship between Map Algebra and Array Algebra

Integration of Point with NUF Distribution Data Sources
• Some issues
– Value AT POINT queries
– Neighborhood queries
• Two possible “join” semantics
– “snap” points to array-cells
– “regrid” arrays to point resolution with interpolation
• Planning the joins in a mediator
– Scenario
  • A prior subquery selects a set of points P
  • Another prior subquery selects a set of array cells by condition C
  • Find value of function F for the points at the corresponding cells
– Solutions
  • Get P and C-result at the mediator and compute F at the mediator
  • Collect the set P at the mediator, call function F on array with condition C for each element of P
  • Send an array indexing function to point source and return indexes, and perform an indexed selection from array source
    – Not implemented yet

The General Integration Problem
• Sources need to export different data models
– Different algebras
– Semantics of structures
– Semantics of values
– Constraints among values and domains
• How do we register this information?
• What combined algebra does the mediator support?
• How do we control addition of newer sources?
• How does this work in the GAV or GLAV integration framework?
• How do we include type and structure transformations, and domain-specific value-association as part of the mediation process?
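
A minimal sketch of the “snap” join semantics from the point/NUF integration slide above, assuming a regular latitude/longitude grid stored as an aligned array with nulls. The `Grid` class, the `snap_join` function, the cell size, and the sample values are all hypothetical and chosen for illustration only; they are not taken from the PAKT mediator. The sketch follows the first listed solution (points P and the array both available at the mediator), with C as a predicate on cell values and F as a function applied to the matched value.

```python
from math import floor

class Grid:
    """Hypothetical NUF distribution: an aligned array where None marks a null cell."""

    def __init__(self, lon0, lat0, cell_deg, values):
        self.lon0 = lon0          # west edge of column 0 (degrees)
        self.lat0 = lat0          # south edge of row 0 (degrees)
        self.cell_deg = cell_deg  # cell size (degrees), assumed square
        self.values = values      # list of rows; None = no measurement

    def index_of(self, lon, lat):
        """'Snap' a point to the (row, col) of the cell containing it, or None."""
        col = floor((lon - self.lon0) / self.cell_deg)
        row = floor((lat - self.lat0) / self.cell_deg)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return (row, col)
        return None

def snap_join(points, grid, condition, f=lambda v: v):
    """Join points P to cells satisfying condition C and apply F to each match.

    points: iterable of (point_id, lon, lat) tuples.
    Returns a list of (point_id, f(cell_value)) pairs; points that fall
    outside the grid, on a null cell, or on a cell failing C are dropped.
    """
    result = []
    for point_id, lon, lat in points:
        idx = grid.index_of(lon, lat)
        if idx is None:
            continue
        value = grid.values[idx[0]][idx[1]]
        if value is not None and condition(value):
            result.append((point_id, f(value)))
    return result

# Illustrative data only: a 2 x 2 grid with one null cell.
grid = Grid(lon0=-120.0, lat0=30.0, cell_deg=1.0,
            values=[[10.0, None],
                    [12.5, 14.0]])
points = [("a", -119.5, 30.5),   # cell (0, 0)
          ("b", -118.5, 30.5),   # cell (0, 1), null
          ("c", -119.2, 31.7),   # cell (1, 0)
          ("d", -117.0, 31.0)]   # outside the grid

all_matches = snap_join(points, grid, condition=lambda v: True)
warm_matches = snap_join(points, grid, condition=lambda v: v > 11.0)
```

The third listed solution would differ mainly in where `index_of` runs: the point source would evaluate it and return only cell indexes, leaving the array source to perform the indexed selection. The “regrid” semantics would instead interpolate array values to each point location, which this sketch does not attempt.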
The Current Integration Framework
• Some Decisions
– All data are “relationalized”
– Algebraic operations are implemented on top of relational sources as functions
– Functions are modeled in the BIRN mediator as relations with binding patterns
– Popular native formats like OpenDAP are semantically too heterogeneous and have poor query capabilities
  • Value-based queries are disallowed
  • We need to augment the registration mechanism to (semi-automatically) ingest all metadata
  • We will ingest the data and store it relationally in a network-accessible relational system
– Will consider the problems of adding vector data and unaligned array data as a next step

The Demonstration
• The global schema
The marked tables are augmented with physical parameters from the World Ocean Atlas, over two different grids

Technology Overview
• Microsoft ASP.NET
• Asynchronous JavaScript and XML (AJAX)
• Google Maps

Google Maps
• Pros
– Intuitive U.I.
– Bathymetry
– Simple JavaScript API
– Speed
– Cost
• Cons
– Google dependent
– Data volume limitation
• Alternatives Under Consideration
– ESRI ArcGIS Server
– 3D Client (ArcGlobe, GoogleEarth, WorldWind)
– Some combination

Data Sources
• SeamountsOnline
– Biological Oceanography Information
• World Ocean Atlas
– Physical Oceanography Information
• Biological and Physical Combination

Next Steps
• Interface Refinement
• Apply learning to OBIS
• Questions?

Contact Information
• Amarnath Gupta ([email protected])
• Karen Stocks ([email protected])
• Chris Condit ([email protected])