Long Term Ecological Research Network Office

Trends Project
Spaghetti & Linguine (aka Trends Data Store)

Mark Servilla
[email protected]
14 September 2006

Table of Contents

• Background
• System Architecture
• System Workflow and Architecture Details
• Demonstration Screen Examples

LNO NIS

Message from IMExec - Feb 2006

• “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
• “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
• “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
• “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”

Prerequisites

• Site data is documented with “rich” and “complete” EML
• Time-series data must be captured as “snap shots” for EML temporal coverage (i.e., no “continuous end date”)
• Site data is open and accessible through a standard protocol such as HTTP
• Site EML documents are harvested on a regular basis into the LTER Metacat

What is EML?

Ecological Metadata Language is…

• An ecological metadata standard
• Very extensible; it can be used to describe many different types of data
• Comprehensive and supports a rich set of constructs to fully describe data, including
  – how to access distributed data
  – its logical and physical structure
• Defined by an XML Schema
• For further information:
  – http://knb.ecoinformatics.org/software/eml/

What is Metacat?
Metacat is…

• A storage system for metadata and data (optimized for use with EML)
• Built on top of a relational database system using Java servlets
• Requires metadata to be in XML format
• Provides a customizable web interface
• Supports point-to-point replication
• For further information:
  – http://knb.ecoinformatics.org/software/metacat/

Trends Data Store Architecture

[Architecture diagram. Components shown: Sources A, B, and C; EML; Metacat/Harvester; EML Parser/Loader; Dataset Registry; Primary Database (1°, source data); Data Integration/Transformation, f(x); Secondary Database (2°, derived data); Trends Data Warehouse; EML Factory with Trends Metadata and Derived Metadata (source provenance, integration methods, Trends contact); EML.xml; Store Front (HTML, SOAP)]

Generalized Workflow

1. Sites collect and document time-series data (e.g., climate, socioeconomics, …)
2. Sites update EML with a new revision
3. EML is harvested into Metacat
4. EML Loader/Parser loads new/updated dataset into primary database
5. Data integration/transformation converts “raw” data into “derived” data
6. Derived data is stored in secondary database
7. EML is generated for derived data and is stored in Metacat
8. Derived data is made available to store front

Decomposed Workflow

LTER Site Data Collection

• Time-series data
  – Physical environment (e.g., climate, …)
  – Human population and economy
  – Biogeochemistry
  – Biotic structure
• Data/metadata
  – Relational database
  – Spreadsheet
  – Text file
  – HTML/XML
EML, Metacat, and the Harvester

[Diagram. Components shown: Sources A, B, and C; EML; Metacat/Harvester; labeled “independent of the Trends Project”]

• EML Package ID
  – knb-lter-site.XX.YY
  – knb-lter-sev.354.1
  – knb-lter-sev.354.2
  – knb-lter-sev.354.3
• Metacat stores the XML of EML; new revisions take precedence (old revisions are deprecated, but not deleted)
• The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat

EML Loader/Parser

[Diagram. Components shown: Sources A, B, and C; EML; Metacat/Harvester; EML Parser/Loader; Dataset Registry; 1°]

• Dataset registry identifies Trends data in Metacat
• New revisions assert a “new” data load. The EML parser/loader
  – Translates the site EML into the RDBMS DDL
  – Creates a new DB table in the primary database based on the revision
  – Loads the new data into the primary database
  – Triggers the workflow to continue
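The loader steps above (package ID, DDL, new table per revision) can be sketched roughly as follows. This is an illustration only, not the project's actual loader: the table-naming convention (hyphens and dots replaced by underscores), the `SQL_TYPES` mapping, the attribute list, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
from typing import NamedTuple


class PackageId(NamedTuple):
    scope: str
    identifier: int
    revision: int


def parse_package_id(package_id: str) -> PackageId:
    """Split an ID such as 'knb-lter-sev.354.2' into scope, identifier, revision."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))


def table_name(pid: PackageId) -> str:
    # Assumed convention: one table per revision, with '-' and '.' mapped to '_'.
    return f"{pid.scope.replace('-', '_')}_{pid.identifier}_{pid.revision}"


# Hypothetical mapping from attribute storage types to SQL column types.
SQL_TYPES = {"dateTime": "TIMESTAMP", "float": "REAL", "integer": "INTEGER", "string": "TEXT"}


def create_table_ddl(pid: PackageId, attributes: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (attribute name, storage type) pairs."""
    columns = ", ".join(
        f'"{name}" {SQL_TYPES.get(kind, "TEXT")}' for name, kind in attributes
    )
    return f'CREATE TABLE "{table_name(pid)}" ({columns})'


# Hypothetical attribute list, standing in for what a real EML parser would extract.
pid = parse_package_id("knb-lter-sev.354.2")
attributes = [("date_time", "dateTime"), ("air_temp", "float"), ("station", "string")]
ddl = create_table_ddl(pid, attributes)

# An in-memory SQLite database stands in for the primary database.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```

Creating a fresh table per revision, rather than altering an existing one, keeps earlier loads intact, which matches the way Metacat deprecates old revisions without deleting them.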
Data Transformation

• Primary DB (1°) stores site data in native schema
• Transformation module reads native schema, performs transformation/integration, and writes to global schema
• Secondary DB (2°) stores derived data in consistent global schema

[Diagram: 1° → f(x) → 2°, “triggered by data load”]

Native schema example: MCM Canada Glacier Wind

date_time   Timestamp of observation (15 min interval)
wdir        Wind direction (azimuth)
wdirstd     Standard deviation of wind direction
wspd        Wind speed (meters/second)
wspdmax     Maximum wind speed (meters/second)
wpsdmin     Minimum wind speed (meters/second)

Global Schema

Wind direction (knb-eco-trends.1.1)           Timestamp (daily), value
Wind direction std dev (knb-eco-trends.2.1)   Timestamp (daily), value
Wind speed max (knb-eco-trends.5.1)           Timestamp (daily), value
…

Table name: knb_eco_trends_1_1 (scope, identifier, revision)
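As a rough illustration of the f(x) step, the sketch below reduces 15-minute observations to one value per day, in the spirit of the “Wind speed max” product. The aggregation rule (daily maximum, skipping missing values), the function name `daily_max`, and the sample readings are assumptions for the example; the slides do not state how each derived value is computed.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Optional


def daily_max(
    observations: list[tuple[datetime, Optional[float]]]
) -> dict[date, float]:
    """Reduce (timestamp, value) pairs to one maximum value per calendar day."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for timestamp, value in observations:
        if value is not None:  # skip missing readings
            by_day[timestamp.date()].append(value)
    return {day: max(values) for day, values in sorted(by_day.items())}


# Made-up 15-minute wspdmax readings in meters/second.
raw = [
    (datetime(2006, 9, 14, 0, 0), 3.2),
    (datetime(2006, 9, 14, 0, 15), 4.8),
    (datetime(2006, 9, 14, 0, 30), None),
    (datetime(2006, 9, 15, 0, 0), 2.1),
]
derived = daily_max(raw)
```

Each derived row then has the same two-column shape (daily timestamp, value) regardless of the site's native schema. Wind direction would need a different reduction, since azimuths cannot be averaged arithmetically.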
EML for the “derived”

• EML Factory generates EML metadata for the derived data and inserts it into Metacat
• Derived data is now accessible through the Metacat user interface

[Diagram. Components shown: 2°; EML Factory with Trends Metadata and Derived Metadata (source provenance, integration methods, Trends contact); EML.xml; Metacat/Harvester]

Store Front

• Store Front provides API to derived data products in secondary DB
• HTML – today
• Web service – tomorrow
• Issues:
  – Authentication
  – Authorization
  – Provenance
  – Quality
  – Interactive Plots

http://fire.lternet.edu/Trends (beta site location)

[Diagram. Components shown: 2°; Store Front (HTML, SOAP)]

HTML Store Front (evolution in progress)

Animated Workflow

[Animated diagram: the Trends Data Store architecture components, annotated with Step 1 through Step 6]

Thank You – The End