PowerPoint document describing the Trends data store - GCE-LTER

Long Term Ecological Research
Network Office
Trends Project
Spaghetti & Linguine
(aka Trends Data Store)
Mark Servilla
14 September 2006
Table of Contents
• Background
• System Architecture
• System Workflow and Architecture Details
• Demonstration Screen Examples
LNO NIS
Message from IMExec - Feb 2006
• “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”
• “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”
• “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”
• “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”
Prerequisites
• Site data is documented with “rich” and “complete” EML
• Time-series data must be captured as “snapshots” for EML temporal coverage, i.e., with a fixed end date rather than a “continuous end date”
• Site data is open and accessible through a standard protocol such as HTTP
• Site EML documents are harvested on a regular basis into the LTER Metacat
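The snapshot prerequisite can be checked mechanically. A minimal sketch, assuming the EML `temporalCoverage` element is available as a standalone fragment (the sample dates are made up) and that only a full `YYYY-MM-DD` calendar date counts as a fixed end date; `snapshot_end_date` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

# Hypothetical coverage fragment; in a full EML document this element
# sits under dataset/coverage.
COVERAGE = """
<temporalCoverage>
  <rangeOfDates>
    <beginDate><calendarDate>1995-01-01</calendarDate></beginDate>
    <endDate><calendarDate>2005-12-31</calendarDate></endDate>
  </rangeOfDates>
</temporalCoverage>
"""

def snapshot_end_date(coverage_xml: str) -> Optional[date]:
    """Return the fixed end date of a temporal coverage, or None if it is open-ended."""
    root = ET.fromstring(coverage_xml.strip())
    text = root.findtext("rangeOfDates/endDate/calendarDate")
    if text is None:
        return None  # no end date at all: not a snapshot
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None  # year-only or non-date values are rejected by this sketch
```

A dataset whose coverage lacks an `endDate` would be flagged (the function returns `None`) before it enters the Trends workflow.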
What is EML?
Ecological Metadata Language is…
• An ecological metadata standard
• Very extensible; it can be used to describe many different types of data
• Comprehensive, supporting a rich set of constructs to fully describe data, including
– how to access distributed data
– its logical and physical structure
• Defined by an XML Schema
• For further information:
– http://knb.ecoinformatics.org/software/eml/
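Because EML is plain XML, a loader can read an entity's logical structure with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a hypothetical, heavily trimmed EML 2.0.1 document; the `packageId`, entity name, and attribute list are invented for illustration, and the fragment is not schema-valid:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EML document: only the elements read below are shown.
# Children of the namespaced eml:eml root element are unqualified.
EML_DOC = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
         packageId="knb-lter-mcm.1.1" system="knb">
  <dataset>
    <title>MCM Canada Glacier Wind</title>
    <dataTable>
      <entityName>canada_glacier_wind</entityName>
      <attributeList>
        <attribute>
          <attributeName>wdir</attributeName>
          <attributeDefinition>Wind direction (azimuth)</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>wspd</attributeName>
          <attributeDefinition>Wind speed meters/second</attributeDefinition>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""

def describe_entities(eml_xml):
    """Map each dataTable's entityName to its (attributeName, definition) pairs."""
    root = ET.fromstring(eml_xml)
    entities = {}
    for table in root.findall("dataset/dataTable"):
        entities[table.findtext("entityName")] = [
            (attr.findtext("attributeName"), attr.findtext("attributeDefinition"))
            for attr in table.findall("attributeList/attribute")
        ]
    return entities
```

The same traversal extends to physical details (e.g., the distribution URL) that a loader would need to fetch the data itself.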
What is Metacat?
Metacat is…
• A storage system for metadata and data (optimized for use with EML)
• Built on top of a relational database system using Java servlets
• Requires metadata to be in XML format
• Provides a customizable web interface
• Supports point-to-point replication
• For further information:
– http://knb.ecoinformatics.org/software/metacat/
Trends Data Store Architecture
[Figure: architecture diagram. Components: Sources A, B, and C (site EML); Metacat/Harvester; Dataset Registry; EML Parser/Loader; Primary Database (1°, source data); Data Integration/Transformation, f(x); Secondary Database (2°, derived data); EML Factory producing Trends metadata (EML.xml: derived metadata, source provenance, integration methods, Trends contact); Store Front (HTML, SOAP). The databases and transformation are grouped as the Trends Data Warehouse.]
Generalized Workflow
1. Sites collect and document time-series data (e.g., climate, socio-economics, …)
2. Sites update EML with a new revision
3. EML is harvested into Metacat
4. The EML Loader/Parser loads the new/updated dataset into the primary database
5. Data integration/transformation converts “raw” data into “derived” data
6. Derived data is stored in the secondary database
7. EML is generated for the derived data and stored in Metacat
8. Derived data is made available to the store front
Decomposed Workflow
LTER Site Data Collection
• Time-series data
– Physical environment (e.g., climate, …)
– Human population and economy
– Biogeochemistry
– Biotic structure
• Data/metadata
– Relational Database
– Spreadsheet
– Text file
– HTML/XML
EML, Metacat, and the Harvester
• EML Package ID: knb-lter-site.XX.YY (e.g., knb-lter-sev.354.1, knb-lter-sev.354.2, knb-lter-sev.354.3)
• Metacat stores the XML of EML; new revisions take precedence, and old revisions are deprecated but not deleted
• The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat
[Figure: Sources A, B, and C supply EML to the Metacat/Harvester, annotated as “independent of the Trends Project”]
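The package ID convention and the “newest revision takes precedence” rule can be sketched as follows. This assumes every ID has exactly the three dot-separated parts shown above (scope, identifier, revision); `parse_package_id` and `current_revisions` are hypothetical helpers, not Metacat functions:

```python
from typing import NamedTuple

class PackageId(NamedTuple):
    scope: str       # e.g. "knb-lter-sev"
    identifier: int  # e.g. 354
    revision: int    # e.g. 3

def parse_package_id(package_id: str) -> PackageId:
    """Split an ID of the form scope.identifier.revision (knb-lter-site.XX.YY)."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))

def current_revisions(package_ids):
    """Keep the newest revision of each (scope, identifier) pair.

    Older revisions are not removed from storage; they are simply not
    returned, mirroring "deprecated, but not deleted".
    """
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest
```

For example, `current_revisions(["knb-lter-sev.354.1", "knb-lter-sev.354.3", "knb-lter-sev.354.2"])` keeps only `knb-lter-sev.354.3`. Revisions are compared as integers, so revision 10 correctly outranks revision 9.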
EML Loader/Parser
• The dataset registry identifies Trends data in Metacat
• New revisions assert a “new” data load. The EML parser/loader:
– Translates the site EML into RDBMS DDL
– Creates a new DB table in the primary database based on the revision
– Loads the new data into the primary database
– Triggers the next step of the workflow
[Figure: Sources A, B, and C, the Metacat/Harvester, the Dataset Registry, the EML Parser/Loader, and the primary database (1°)]
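A minimal sketch of the translate, create, and load steps, using Python's built-in `sqlite3` as a stand-in for the primary RDBMS. The measurement-scale-to-SQL-type mapping, the table naming (modeled on the `knb_eco_trends_1_1` name in the global schema figure), and the helper names are assumptions for illustration:

```python
import sqlite3

# Hypothetical mapping from an EML measurement scale to a SQL column type;
# a real loader would also consult storage types, units, and date formats.
SQL_TYPES = {"ratio": "REAL", "interval": "REAL", "nominal": "TEXT",
             "ordinal": "TEXT", "dateTime": "TEXT"}

def table_name(package_id):
    """One table per revision, e.g. knb-eco-trends.1.1 -> knb_eco_trends_1_1."""
    return package_id.replace("-", "_").replace(".", "_")

def create_table_ddl(package_id, attributes):
    """Build CREATE TABLE DDL from (attributeName, measurementScale) pairs."""
    columns = ", ".join(f'"{name}" {SQL_TYPES[scale]}' for name, scale in attributes)
    return f'CREATE TABLE "{table_name(package_id)}" ({columns})'

def load_revision(conn, package_id, attributes, rows):
    """Create the revision's table in the primary database and insert its rows."""
    conn.execute(create_table_ddl(package_id, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(
        f'INSERT INTO "{table_name(package_id)}" VALUES ({placeholders})', rows)
    conn.commit()
```

Because each revision gets its own table, loading `knb-lter-mcm.1.2` would leave the `knb_lter_mcm_1_1` table untouched, in keeping with the “new revision, new table” rule above.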
Data Transformation
• The primary DB (1°) stores site data in its native schema
• The transformation module reads the native schema, performs transformation/integration, and writes to the global schema
• The secondary DB (2°) stores derived data in a consistent global schema
[Figure: 1° → f(x) → 2°, with the transformation “triggered by data load”. Source table “MCM Canada Glacier Wind” (15 min interval): timestamp of observation; wdir, wind direction (azimuth); wdirstd, standard deviation of wind direction; wspd, wind speed (meters/second); wspdmax, maximum wind speed (meters/second); wpsdmin, minimum wind speed (meters/second). Derived daily series, each with a daily timestamp and a value: Wind direction (knb-eco-trends.1.1), Wind direction std dev (knb-eco-trends.2.1), Wind speed max (knb-eco-trends.5.1), … Global schema: each derived table (e.g., knb_eco_trends_1_1, named from the package ID's scope, identifier, and revision) has the columns date_time and value.]
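One possible f(x), sketched with `sqlite3`: the primary and secondary databases are two SQLite databases on one connection (the secondary one attached under the schema name `secondary`), and the transformation reduces 15-minute `wspdmax` readings to one daily maximum in the global (date_time, value) schema. The daily-maximum rule, the source column `date_time`, and the table names are assumptions; the slides do not say how each derived series is computed.

```python
import sqlite3

def daily_max_transform(conn, source_table, column, target_table):
    """Hypothetical f(x): collapse sub-daily readings in main.<source_table>
    into one daily maximum per day in secondary.<target_table>, using the
    global (date_time, value) schema."""
    conn.execute(f'CREATE TABLE IF NOT EXISTS secondary."{target_table}" '
                 '(date_time TEXT, value REAL)')
    conn.execute(f'INSERT INTO secondary."{target_table}" (date_time, value) '
                 f'SELECT date(date_time), MAX("{column}") FROM main."{source_table}" '
                 'GROUP BY date(date_time) ORDER BY date(date_time)')
    conn.commit()
```

Before calling it, attach the secondary database, e.g. `conn.execute("ATTACH DATABASE ':memory:' AS secondary")`. A daily maximum is used here because it is unambiguous; averaging wind direction would need a circular (vector) mean rather than an arithmetic one.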
EML for the “derived” data
• The EML Factory generates EML metadata for the derived data and inserts it into Metacat
• Derived data is then accessible through the Metacat user interface
[Figure: the EML Factory reads the secondary database (2°) and produces Trends metadata (EML.xml: derived metadata, source provenance, integration methods, Trends contact) for the Metacat/Harvester]
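The EML Factory step might look roughly like this. It is a hypothetical sketch that emits only a few EML elements (title, contact, and a method step whose paragraphs record the integration method and the source provenance); the output is not schema-valid EML, and the function name and arguments are invented:

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.0.1"

def derived_eml(package_id, title, source_ids, method, contact_surname):
    """Hypothetical EML Factory: build a minimal EML record for a derived dataset."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "knb"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # Trends contact
    person = ET.SubElement(ET.SubElement(dataset, "contact"), "individualName")
    ET.SubElement(person, "surName").text = contact_surname
    # Integration method, described in free text
    description = ET.SubElement(
        ET.SubElement(ET.SubElement(dataset, "methods"), "methodStep"), "description")
    ET.SubElement(description, "para").text = method
    for source_id in source_ids:
        # Source provenance: one paragraph per primary package used
        ET.SubElement(description, "para").text = f"Derived from {source_id}"
    return ET.tostring(root, encoding="unicode")
```

The returned string is what would be inserted into Metacat; recording the source package IDs keeps each derived product traceable to the site revisions it came from.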
Store Front
• The Store Front provides an API to the derived data products in the secondary DB
– HTML: today
– Web service: tomorrow
• Issues:
– Authentication
– Authorization
– Provenance
– Quality
– Interactive Plots
• http://fire.lternet.edu/Trends (beta site location)
[Figure: the secondary database (2°) feeds the Store Front through HTML and SOAP interfaces]
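A store-front query against the secondary database could be as small as the function below, which returns one derived series as JSON, the kind of payload a future web service might serve. It is a hypothetical sketch using `sqlite3` and `json`; the table naming follows the global schema example, and `package_id` is assumed to come from the dataset registry rather than raw user input, because it is interpolated into the SQL text:

```python
import json
import sqlite3

def series_as_json(conn, package_id, start=None, end=None):
    """Return one derived series (date_time, value) as JSON, optionally
    limited to the inclusive date range [start, end]."""
    # package_id must be a registry-validated ID; it becomes a table name.
    table = package_id.replace("-", "_").replace(".", "_")
    sql = f'SELECT date_time, value FROM "{table}"'
    conditions, params = [], []
    if start is not None:
        conditions.append("date_time >= ?")
        params.append(start)
    if end is not None:
        conditions.append("date_time <= ?")
        params.append(end)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY date_time"
    rows = conn.execute(sql, params).fetchall()
    return json.dumps({"packageId": package_id,
                       "data": [{"date_time": d, "value": v} for d, v in rows]})
```

An HTML page and a SOAP or other web-service endpoint could both sit on top of the same query, which keeps the open issues (authentication, authorization, provenance, quality) in one place.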
HTML Store Front (evolution in progress)
Animated Workflow
[Figure: the Trends Data Store architecture diagram animated in Steps 1 through 6]
Thank You – The End