Database access and data retrieval (a user's view)
R. Coelho
Associação EURATOM/IST, Instituto de Plasmas e Fusão Nuclear
Outline
1 – General overview of fusion databases
2 – Data storage/retrieval methods and data structures
3 – SDAS at ISTTOK
Database access and data retrieval
Lisbon 18/02/09
R. Coelho
I - General overview of fusion databases

• Databases play a fundamental role in fusion plasma research
– Essential for storage of seminal/standard benchmarking discharges.
– Assist the construction/deduction of elementary scaling laws and the
design phase of fusion devices (what to expect on confinement,
MHD, transport, …)
– Assist the modeling effort by providing a validated set of input
experimental data (cross sections, machine dependent data, …)
and experimental plasma data on which to validate the codes.
• Databases offer a clear display of community achievements
Fusion databases: 3 notable examples
• International Multi-Tokamak Profile Database (ITPA)
• Atomic Data and Analysis Structure (ADAS)
• Experimental Nuclear Reaction Data (EXFOR)
International Multi-Tokamak Profile Database (ITPA)
• Objectives
– To provide all the information required for transport codes to
simulate discharges from a variety of tokamaks.
– Provide data to be compared against the predicted outputs from
the codes.
– Provide data and the modelling results to be used as part of the
ITER physics basis.
• Coverage
– Released publicly in 1998.
– Built from 201 shots from 21 devices. Recent data have been added
to secondary but remain accessible to the “working group” only.
International Multi-Tokamak Profile Database (ITPA)
• Storage/accessing
– MDS+ server, data stored as MDS+ trees.
– Relational database with comments, 0D and 1/2D metadata
assists the database queries.
http://tokamak-profiledb.ukaea.org.uk/
C. M. Roach, M. Walters, R. V. Budny, F. Imbeaux,
T. W. Fredian et al., Nucl. Fusion 48, 125001 (2008)
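The role of the relational metadata can be illustrated with a minimal sketch: a query on 0D quantities selects the discharges of interest, and only those are then fetched from the MDS+ trees. The table layout, column names, device labels and values below are all hypothetical and made up for illustration; they are not the schema of the profile database.

```python
import sqlite3

# Hypothetical 0D metadata, one row per discharge (all values are made up):
# (device label, shot number, plasma current in MA, toroidal field in T)
ROWS = [
    ("tok_a", 101, 2.5, 3.4),
    ("tok_a", 102, 1.0, 2.0),
    ("tok_b", 7, 1.3, 2.1),
]


def build_metadata_db():
    """Create an in-memory stand-in for the relational metadata database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shots (tokamak TEXT, shot INTEGER, ip_ma REAL, bt_t REAL)"
    )
    db.executemany("INSERT INTO shots VALUES (?, ?, ?, ?)", ROWS)
    return db


def select_shots(db, min_ip_ma):
    """Return (tokamak, shot) pairs whose plasma current exceeds min_ip_ma."""
    cursor = db.execute(
        "SELECT tokamak, shot FROM shots WHERE ip_ma > ? ORDER BY tokamak, shot",
        (min_ip_ma,),
    )
    return cursor.fetchall()


db = build_metadata_db()
selected = select_shots(db, 1.2)
print(selected)
```

Each selected (device, shot) pair would then be used to open the corresponding tree on the data server.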
Atomic Data and Analysis Structure (ADAS)
• Objectives
– Provide an interconnected set of computer codes and data collections
for modelling the radiating properties of ions and atoms in
plasmas.
– Assist in the analysis and interpretation of spectral emission and
support detailed plasma models (crucial in the plasma edge).
• Coverage
– Plasmas ranging from the interstellar medium through the solar
atmosphere and laboratory thermonuclear fusion devices to
technological plasmas.
Atomic Data and Analysis Structure (ADAS)
• Accessing
– A key range of routines for accessing the database and delivering
data to user codes is included. FORTRAN, C, C++, IDL and
MATLAB are supported.
http://open.adas.ac.uk/index.php
Assisting fusion since JET was born…(1983)
Experimental Nuclear Reaction Data (EXFOR)
• Objectives
– Provide an extensive compilation of experimental nuclear reaction
data.
• Coverage
– Neutron induced reactions have been compiled systematically
since the discovery of the neutron.
– Charged particle and photon reactions have been covered less
extensively.
– Data from 17700 experiments, with their bibliographic information as
well as experimental information about the data. The status (e.g.,
the source of the data) and history (e.g., date of last update) of each
data set are also included.
Experimental Nuclear Reaction Data (EXFOR)
• Repository
– Stored at the International Network of Nuclear Reaction Data Centres
(NRDC). http://www-nds.iaea.org/exfor/exfor.htm
II - Data storage/retrieval methods

• MDS+
• HDF5
• Universal Access Layer method
• Paradigm for data retrieval methodologies
MDSplus (MDS+)
http://www.mdsplus.org
MDSplus (MDS+)
SOME CONCEPTS
• The Data Hierarchy - Trees, Nodes, and Models. A self-descriptive
hierarchy called a TREE, consisting of large numbers of named
NODES which make up the branches (structure) and leaves (data) of
each tree.
– MDSplus SHOTS are trees created from a special type of tree called a
MODEL, a template which contains all of the structure and setup data for
an experiment or code.
• Node Characteristics - Self Description: metadata including the
data type, array dimensions, data length, units, independent axes, the
parents and children of the node, tag names, the date when the data
was stored, the name of the user who wrote data, and so forth.
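The concepts above can be sketched as a small data structure. This is a toy model only, assuming a plain Python class: the names `Node`, `create_shot` and `describe`, and the dot-separated path notation, are hypothetical and are not the MDSplus API or its path syntax.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(eq=False)
class Node:
    """One named node; its metadata travels with the data (self-description)."""
    name: str
    data: Any = None
    units: str = ""
    children: Dict[str, "Node"] = field(default_factory=dict)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it."""
        child.parent = self
        self.children[child.name] = child
        return child

    def path(self) -> str:
        """Walk up through the parents to build the full node path."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ".".join(reversed(parts))

    def describe(self) -> dict:
        """Return the metadata a reader needs to interpret the node."""
        has_length = hasattr(self.data, "__len__")
        return {
            "path": self.path(),
            "dtype": type(self.data).__name__,
            "length": len(self.data) if has_length else None,
            "units": self.units,
            "children": sorted(self.children),
        }


def create_shot(model: Node, shot_number: int) -> dict:
    """A shot is a copy of the model (template) tree, ready to be filled."""
    return {"shot": shot_number, "top": copy.deepcopy(model)}


# Model tree: structure only, no data yet.
model = Node("TOP")
magnetics = model.add(Node("MAGNETICS"))
magnetics.add(Node("IP", units="A"))

# Shot tree: created from the model, then filled with (made-up) data.
shot = create_shot(model, 1)
ip = shot["top"].children["MAGNETICS"].children["IP"]
ip.data = [0.0, 1.0e5, 2.0e5]
info = ip.describe()
print(info)
```

Because the shot is a deep copy, writing data into it leaves the model untouched, which is the point of keeping a template.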
MDSplus (MDS+)
TREE EXAMPLE
• The node on the far right, "Ip", is an example of a MEMBER, a type of
node used to contain data.
• Child and member nodes are analogous to the directories and files on a
typical operating system.
MDSplus (MDS+)
DETAILS ON THE API
• The basic calls as they would be ordered in an application are, in
generic syntax:
mdsconnect,'server_name'
mdsopen,'tree_name',shot_number
result = mdsvalue('expression')
mdsput,'node_name','expression'
mdsclose,['tree_name',shot]
mdsdisconnect
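The ordering of these calls can be sketched in Python with a stand-in session object. `StubMdsSession` is hypothetical and is not the MDSplus client: its method names merely mirror the generic calls above, and its `mdsvalue` only looks up a stored node instead of evaluating an expression on a server.

```python
class StubMdsSession:
    """Hypothetical in-memory stand-in; it is NOT the MDSplus client."""

    def __init__(self):
        self.log = []           # records the order of the calls
        self.nodes = {}         # node name -> stored value
        self.connected = False
        self.open_tree = None

    def mdsconnect(self, server):
        self.connected = True
        self.log.append("connect")

    def mdsopen(self, tree, shot):
        if not self.connected:
            raise RuntimeError("connect before opening a tree")
        self.open_tree = (tree, shot)
        self.log.append("open")

    def mdsput(self, node, value):
        self._require_tree()
        self.nodes[node] = value
        self.log.append("put")

    def mdsvalue(self, node):
        self._require_tree()
        self.log.append("value")
        return self.nodes[node]

    def mdsclose(self):
        self.open_tree = None
        self.log.append("close")

    def mdsdisconnect(self):
        self.connected = False
        self.log.append("disconnect")

    def _require_tree(self):
        if self.open_tree is None:
            raise RuntimeError("open a tree before reading or writing")


def write_then_read(session, server, tree, shot, node, value):
    """Run the calls in the order listed above, always releasing resources."""
    session.mdsconnect(server)
    try:
        session.mdsopen(tree, shot)
        try:
            session.mdsput(node, value)
            return session.mdsvalue(node)
        finally:
            session.mdsclose()
    finally:
        session.mdsdisconnect()


session = StubMdsSession()
result = write_then_read(
    session, "server_name", "tree_name", 1, "node_name", [1, 2, 3]
)
print(result, session.log)
```

The nested `try`/`finally` blocks are a design choice of this sketch: the tree is closed and the connection dropped even if a read or write fails.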
MDSplus (MDS+)
ACCESSING JET DATA
(workaround since not a native MDS+ server storage)
• MATLAB
>> mdsconnect('mdsplus.jet.efda.org')
>> [y,status]=mdsvalue('_sig=jet("ppf/magn/ipla",40573)')
>> [x,status]=mdsvalue('dim_of(_sig)')
>> mdsdisconnect
• IDL
IDL> mdsconnect,'mdsplus.jet.efda.org'
IDL> y=mdsvalue('_sig=jet("ppf/magn/ipla",40573)')
IDL> x=mdsvalue('dim_of(_sig)')
IDL> plot,x,y
IDL> mdsdisconnect
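The two expressions passed to mdsvalue in the sessions above are plain strings, so they can be assembled programmatically when looping over several signals or pulses. A minimal sketch follows; the helper names `jet_signal_expression` and `dimension_expression` are hypothetical.

```python
def jet_signal_expression(signal_path, pulse, variable="_sig"):
    """Build the expression string used above, e.g. _sig=jet("ppf/magn/ipla",40573)."""
    if '"' in signal_path:
        raise ValueError("signal path must not contain double quotes")
    return f'{variable}=jet("{signal_path}",{int(pulse)})'


def dimension_expression(variable="_sig"):
    """Build the expression that returns the axis of the stored variable."""
    return f"dim_of({variable})"


signal_expr = jet_signal_expression("ppf/magn/ipla", 40573)
axis_expr = dimension_expression()
print(signal_expr)
print(axis_expr)
```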
HDF5
http://www.hdfgroup.org/index.html
• HDF5 is a self-describing file format and library for storing
scientific data.
• A versatile data model that can represent very complex data
objects and a wide variety of metadata (different datatypes
on the same tree) with direct access to parts of the file
without parsing the entire file.
• A completely portable file format with no limit on the number
or size of data objects in the collection.
• A software library that runs on a range of computational
platforms, from laptops to massively parallel systems, and
implements a high-level API with C, C++, Fortran 90, and
Java interfaces.
Universal Access Layer (UAL)
MOTIVATION
• HDF5 and MDSplus represent successful tools for a common data
format and organization, thus allowing effective data sharing among
different applications.
• But will these standards survive the lifespan of ITER? A more generic
approach is envisaged and is being implemented in the ITM-TF.
• Consistent Physical Objects (CPO) - a generic view of the data
organization in trees and subtrees, transparent to the actual method used
for data storage.
G. Manduchi et al, Fusion Engineering and Design 83, 462-466 (2008)
Universal Access Layer (UAL)
DATA STRUCTURE
Universal Access Layer (UAL)
DATA STRUCTURE
MSE CPO
PARSING THE DATA STRUCTURE
• The CPO tree-like hierarchical structure is defined through
language-independent XML schemas. These can be easily parsed in each
programming language.
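As an illustration of how such a schema can be turned into a language-level structure, the sketch below parses a small XML Schema fragment into a nested Python dictionary. The fragment and its element names (`example_cpo`, `measurement`, ...) are hypothetical and are not the real CPO definitions; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; the element names are illustrative only.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="example_cpo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="time" type="xs:double"/>
        <xs:element name="measurement">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="value" type="xs:double"/>
              <xs:element name="source" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def element_to_tree(element):
    """Map one xs:element to its type name (leaf) or to a dict (branch)."""
    sequence = element.find(f"{XS}complexType/{XS}sequence")
    if sequence is None:
        return element.get("type")
    return {
        child.get("name"): element_to_tree(child)
        for child in sequence.findall(f"{XS}element")
    }


def schema_to_tree(schema_text):
    """Convert every top-level element of the schema into a nested dict."""
    root = ET.fromstring(schema_text)
    return {
        top.get("name"): element_to_tree(top)
        for top in root.findall(f"{XS}element")
    }


tree = schema_to_tree(SCHEMA)
print(tree)
```

The same traversal could emit type declarations for another language instead of a dictionary, which is what makes a schema-based definition language independent.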
Universal Access Layer (UAL)
DATA FLOW
• The multi-level UAL manages the CPO I/O between codes as a
common data bus and the data retrieval (MDS+ or HDF5 stored data).
(Figure: D. Coster)
Universal Access Layer (UAL)
CPO I/O
• euitm_open(name,shot,run)
• euitm_get(path, output_structure)
– the location of the CPO is specified by the string argument “path”
– output_structure is language dependent and will hold the output data.
• euitm_put(path, input_structure)
– the location of the CPO is specified by the string argument “path”
– input_structure is language dependent and holds the input data.
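The open/put/get pattern can be sketched with an in-memory stand-in. `InMemoryCpoStore` is hypothetical and is not the ITM-TF library: it only mimics the shape of the three calls above, with a Python dictionary playing the role of the language-dependent structure and the return value of `get` replacing the output argument.

```python
import copy


class InMemoryCpoStore:
    """Hypothetical stand-in for a CPO store; not the ITM-TF library."""

    def __init__(self):
        self._entries = {}    # (name, shot, run) -> {path: structure}
        self._current = None  # entry selected by the last open()

    def open(self, name, shot, run):
        """Select the database entry identified by name, shot and run."""
        self._current = self._entries.setdefault((name, shot, run), {})

    def put(self, path, input_structure):
        """Store a copy of the structure at the location given by path."""
        self._require_open()
        self._current[path] = copy.deepcopy(input_structure)

    def get(self, path):
        """Return a copy of the structure stored at the given path."""
        self._require_open()
        return copy.deepcopy(self._current[path])

    def _require_open(self):
        if self._current is None:
            raise RuntimeError("call open(name, shot, run) first")


store = InMemoryCpoStore()
store.open("example_db", 1, 1)
store.put("example_cpo", {"time": 0.1, "value": [1.0, 2.0]})
result = store.get("example_cpo")
print(result)
```

Copying on both `put` and `get` keeps the caller's structure and the stored one independent, so a code that modifies its local copy does not silently change what other codes read.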
Universal Access Layer (UAL)
ACCESSING EXPERIMENTAL DATA
Courtesy of J. Signoret and F. Imbeaux
Methodologies for data retrieval
WHAT IS A SIGNAL ?
“any kind of data that describes a particular measurement during a
discharge and contains some information about plasma
properties”, e.g. 2D/3D data, time-series data, contour maps, images…
OUTPUT PER SHOT ?
Diagnostics at JET top 10 Gbytes/shot, still much smaller than the
values expected for ITER!
WHAT IS MEASURED ?
Physical properties manifest as patterns, with a direct parallel between
the physical behaviour and the structural shapes that are generated
(spikes in Dα emission during edge localised modes (ELMs), soft X-ray
and ECE emission during a sawtooth (ST) crash).
Methodologies for data retrieval
Traditional approach
• Query based on shot/signal
• Manual inspection of structural shapes/features
• Very tedious and long process
Methodologies for data retrieval
Pattern recognition approach
• Data retrieval guided by technical and scientific criteria.
• “Pattern oriented”, in the same way people behave when they analyze
data.
• Relies on the following techniques for data retrieval:
– Feature extraction
   – single entity (temporal segment inside a waveform or a set of
   pixels within an image)
   – compound entity (more than one segment/signal)
– Classification system (supervised/unsupervised)
– Similarity measure (metric/proximity measure)
J. Vega et al., Fusion Eng. and Design 83, 382 (2008)
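Two of the ingredients listed above, feature extraction on a single temporal segment and a similarity measure, can be sketched as follows. This is a simplified illustration, assuming offset/amplitude normalisation as the feature and Euclidean distance as the metric; it is not the method of the cited paper, it omits the classification step, and the waveforms are made up.

```python
import math


def normalise(segment):
    """Feature extraction: remove offset and scale so only the shape remains."""
    mean = sum(segment) / len(segment)
    centred = [x - mean for x in segment]
    norm = math.sqrt(sum(x * x for x in centred))
    if norm == 0.0:
        return [0.0] * len(segment)  # flat segment: no shape information
    return [x / norm for x in centred]


def distance(a, b):
    """Similarity measure: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(pattern, waveform):
    """Slide the pattern over one waveform; return (distance, start index)."""
    target = normalise(pattern)
    width = len(pattern)
    best = (math.inf, -1)
    for start in range(len(waveform) - width + 1):
        candidate = normalise(waveform[start:start + width])
        best = min(best, (distance(target, candidate), start))
    return best


def rank_signals(pattern, signals):
    """Order stored signals by how closely any segment matches the pattern."""
    scored = [(best_match(pattern, wave), name) for name, wave in signals.items()]
    return [(name, start, dist) for (dist, start), name in sorted(scored)]


# Made-up data: a spike-shaped query pattern and two stored waveforms.
pattern = [0.0, 1.0, 4.0, 1.0, 0.0]
signals = {
    "sig_a": [0.0, 0.0, 0.0, 2.0, 8.0, 2.0, 0.0, 0.0],  # contains a scaled spike
    "sig_b": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],  # plain ramp, no spike
}
ranking = rank_signals(pattern, signals)
print(ranking)
```

The query is the shape itself rather than a shot or signal name, which is the difference from the traditional approach.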
III – Shared Data Access System (SDAS)
Why another Data Retrieval Software?
The problem
• Scientists need to access data from different laboratories;
– Each laboratory has its own way of retrieving data;
– Scientists have to spend time and effort learning how the different
data access schemes work, change their analysis code for each
experiment and manage updated versions for each different
program and library required;
• This does not mean that every association must store and retrieve
data in the same way.
• The main data index is changing from shot number to time and events,
where the pulse number is just one among the most relevant events
against which data is catalogued.
Shared Data Access System (SDAS)
Why another Data Retrieval Software?
The solution
• Hide all complexity from end-users;
• Scientists only have to learn once how to access data;
• Users do not request data directly from the
association's database but from a software layer;
• The software layer provides the same data access
functions in all associations;
• Data blocks are tagged against specific events which
happen during the life cycle of a discharge
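Indexing by events and time, rather than by shot number alone, can be sketched as below. The classes `Event` and `DataBlock`, the event name "pulse_start" and the parameter name are hypothetical and are not taken from SDAS; the time stamps and samples are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    name: str        # kind of event, e.g. a hypothetical "pulse_start"
    number: int      # occurrence counter (a pulse number is one such counter)
    time: datetime   # absolute time stamp of the event


@dataclass(frozen=True)
class DataBlock:
    parameter: str
    event: Event
    offset: timedelta  # start of the block relative to its event
    samples: tuple

    @property
    def start(self):
        """Absolute start time, derived from the event the block is tagged to."""
        return self.event.time + self.offset


def find_by_event(blocks, parameter, event_name, number):
    """Retrieve the blocks of a parameter tagged against one event occurrence."""
    return [
        b for b in blocks
        if b.parameter == parameter
        and b.event.name == event_name
        and b.event.number == number
    ]


def find_by_time(blocks, parameter, begin, end):
    """Retrieve the blocks of a parameter that start inside a time window."""
    return [
        b for b in blocks
        if b.parameter == parameter and begin <= b.start < end
    ]


t0 = datetime(2009, 2, 18, 10, 0, 0)
first = Event("pulse_start", 1, t0)
second = Event("pulse_start", 2, t0 + timedelta(minutes=10))
blocks = [
    DataBlock("example.current", first, timedelta(milliseconds=5), (0.0, 1.0)),
    DataBlock("example.current", second, timedelta(0), (2.0,)),
]

by_event = find_by_event(blocks, "example.current", "pulse_start", 1)
by_time = find_by_time(
    blocks, "example.current", t0 + timedelta(minutes=5), t0 + timedelta(minutes=15)
)
print(by_event[0].samples, by_time[0].samples)
```

With this layout a query by pulse number is just one special case of a query by event.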
[Figure: data access without SDAS vs. with SDAS]
SDAS Technology
• SDAS is based on Remote Procedure Calls (RPC);
• The SDAS server is formed by an XML-RPC server
and by a connector to the storage mechanism;
• Data is indexed by time and events;
• SDAS server and libraries available in Python,
Java and C++;
• Read and Write support (for post processed data)
• Supported in several data analysis programs:
– Matlab, IDL, Octave, Mathematica
• Documentation in wiki: http://cdaq.cfn.ist.utl.pt:8085/
• Currently being used in ISTTOK/PT, Compass/CZ and TJ-II/ES
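What travels between a client library and an XML-RPC server can be shown without a network connection, using the marshalling functions of Python's standard `xmlrpc.client` module. The remote method name and arguments below are hypothetical; the slides do not list the actual SDAS method names.

```python
import xmlrpc.client

# Hypothetical remote method and arguments (not the real SDAS names).
METHOD = "example.getData"
ARGS = ("EXAMPLE.PARAMETER", "example_event", 1)

# Client side: the call is serialised into an XML request body.
request_xml = xmlrpc.client.dumps(ARGS, methodname=METHOD)

# Server side: the request is decoded back into a method name and parameters.
params, method = xmlrpc.client.loads(request_xml)

# Server side: the (made-up) result is serialised into an XML response.
response_xml = xmlrpc.client.dumps(([0.0, 0.5, 1.0],), methodresponse=True)

# Client side: the response is decoded into native values.
(result,), _ = xmlrpc.client.loads(response_xml)

print(method, params, result)
```

In a live setting the same exchange would go through `xmlrpc.client.ServerProxy` pointed at the server's URL. Because the payload is plain XML over HTTP, client libraries can be written in any language with an XML-RPC implementation.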
Data access
• SDAS libraries are easily integrated in programs such as MatLab,
Mathematica and IDL;
• SDAS provides over 20 functions which allow users to:
– Search parameters and events;
– Retrieve single and multiple data