Data Management in Geodise
Zhuoan Jiao, Jasmin Wason & Marc Molinari
{ z.jiao, j.l.wason, m.molinari } @soton.ac.uk
© Geodise Project, University of Southampton, 2003.
http://www.geodise.org/
Providing Data Management Services for Engineering
- Engineering design and optimisation is a computationally intensive process.
- Large quantities of data may be generated at different locations with different characteristics.
- Engineering data is traditionally stored in flat files, with little descriptive metadata provided by the file system.
- Our focus is on leveraging existing database tools not commonly used in engineering, and making them accessible to users of the system.
Tools and Services (1)
File storage
- Applications can archive data, transferred over GridFTP, in file systems, which offers:
  - Accessibility by a larger community (subject to authorisation)
  - Greater storage capacity
  - Additional metadata storage and query facilities
Metadata management service
- Data can be stored with additional descriptive information: standard metadata (e.g. file format, description) and application-domain-specific metadata (e.g. grids, flux_order).
- An XML database is used, as it is flexible enough to store nested, complex engineering data.
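As an illustration of how standard and domain-specific metadata might sit together in one nested XML record, the following Python sketch builds such a record with the standard library's ElementTree. The root element name `metadata`, the `standard` wrapper element and the sample values are assumptions for illustration, not the Geodise schema.

```python
import xml.etree.ElementTree as ET


def add_fields(parent, fields):
    """Append one child element per key; nested dicts become nested elements."""
    for key, value in fields.items():
        child = ET.SubElement(parent, key)
        if isinstance(value, dict):
            add_fields(child, value)
        else:
            child.text = str(value)


def build_metadata_xml(standard, domain):
    """Build a record with a <standard> section plus domain-specific fields."""
    root = ET.Element("metadata")  # hypothetical root element name
    add_fields(root, {"standard": standard})
    add_fields(root, domain)
    return root


record = build_metadata_xml(
    {"format": "dat", "description": "solver input"},
    {"grids": 1, "flux_order": 2},
)
print(ET.tostring(record, encoding="unicode"))
```

Because nesting is preserved, a structured field such as a component with its own parameters would simply become a nested element rather than being flattened.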
Tools and Services (2)
Query service
- Queries can be performed over the metadata database to help the user locate required data intuitively and efficiently.
Authorisation service
- Access rights to data can be granted to an authenticated user based on information stored in an authorisation database.
Location service
- Files are referenced by a unique handle.
- The location service provides access to a database of file locations mapped to handles.
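A minimal sketch of the handle-to-location mapping, assuming an in-memory dictionary in place of the location database. The handle format (file name with dots replaced by underscores, followed by a UUID) is inferred from the example handles in the query slide and may not match Geodise exactly; the class name `LocationService` and the `gsiftp://example.org/...` location are hypothetical.

```python
import uuid


class LocationService:
    """In-memory stand-in for the location database (hypothetical)."""

    def __init__(self):
        self._locations = {}  # handle -> file location

    def register(self, file_name, location):
        # Assumed handle scheme: "input.dat" -> "input_dat_<uuid>"
        handle = f"{file_name.replace('.', '_')}_{uuid.uuid4()}"
        self._locations[handle] = location
        return handle

    def locate(self, handle):
        # Unknown handles raise KeyError, like a failed database lookup
        return self._locations[handle]


service = LocationService()
handle = service.register("input.dat", "gsiftp://example.org/archive/input.dat")
print(handle, "->", service.locate(handle))
```

Appending a UUID keeps handles unique even when many users archive files with the same name.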
Data Management Implementation for MATLAB
To increase the usability of the file and metadata management services for engineers, we have implemented a MATLAB toolkit for archiving, querying and retrieving data to and from a Geodise repository.
[Figure: Geodise data management architecture. Client side: MATLAB functions (Geodise Database Toolkit), Java clients, .NET, browser. Grid side: Globus server (GridFTP, Java CoG); Apache SOAP services: Location Service with Location Database, Authorisation Service with Authorisation Database, Metadata Archive & Query Services with Metadata Database.]
Geodise Database Toolkit for MATLAB – Archive
- gd_archive – Store a file with some metadata.
- gd_datagroup – A datagroup is a collection of related files that may be logically grouped together; it can also have associated metadata.
Syntax:
groupID = gd_datagroup(<group_name>, [<metadata>])
fileID = gd_archive(<file_name>, [<metadata>], [<groupID>])

Examples:
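% Metadata for the datagroup
m.dimension = '2D';
m.component.gamma = 1.4;
groupID = gd_datagroup('2D-LP turbine rotor job9', m)
% Archive a file with its own metadata into the group
meta.grids = 1
meta.flux_order = 2
fileID = gd_archive('input.dat', meta, groupID)
% Archive a file into the group without metadata
fileID = gd_archive('mesh_ns.grid.1.adf', [], groupID)
% Archive a standalone file
fileID = gd_archive('airfoil.msh')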
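The relationship between datagroups, files and metadata in these examples can be modelled roughly as follows. This is a hypothetical in-memory Python sketch, not the toolkit's implementation: `Repository`, `datagroup` and `archive` are illustrative stand-ins for `gd_datagroup` and `gd_archive`, `None` plays the role of MATLAB's `[]` for omitted arguments, and the ID format is assumed.

```python
import uuid


def make_id(name):
    # Assumed ID scheme: dots and spaces become underscores, followed by a UUID
    return f"{name.replace('.', '_').replace(' ', '_')}_{uuid.uuid4()}"


class Repository:
    """Toy model of archived files, datagroups and their metadata (hypothetical)."""

    def __init__(self):
        self.groups = {}  # groupID -> {"name", "metadata", "files"}
        self.files = {}   # fileID -> {"name", "metadata"}

    def datagroup(self, group_name, metadata=None):
        group_id = make_id(group_name)
        self.groups[group_id] = {"name": group_name, "metadata": metadata or {}, "files": []}
        return group_id

    def archive(self, file_name, metadata=None, group_id=None):
        file_id = make_id(file_name)
        self.files[file_id] = {"name": file_name, "metadata": metadata or {}}
        if group_id is not None:
            self.groups[group_id]["files"].append(file_id)  # record group membership
        return file_id


repo = Repository()
group_id = repo.datagroup("2D-LP turbine rotor job9", {"dimension": "2D", "component": {"gamma": 1.4}})
input_id = repo.archive("input.dat", {"grids": 1, "flux_order": 2}, group_id)
mesh_id = repo.archive("mesh_ns.grid.1.adf", None, group_id)
airfoil_id = repo.archive("airfoil.msh")
```

Keeping group metadata separate from file metadata mirrors the examples: the group carries job-level values such as `dimension`, while each file carries its own solver settings.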
XML Toolbox for MATLAB
Marc Molinari – GEM project.
- xml_format(): Convert a MATLAB variable into an XML string.
- xml_parse(): Convert an XML string into a MATLAB variable.
Example:


>> A.b = 'Hello World';
>> A.c.aa = [1 2; 3 4; 5 6];
>> X = xml_format(A)
X =
<struct idx="0" size="1 1" fields="b c">
<char idx="1" name="b" size="1 11">Hello World</char>
<struct idx="1" name="c" size="1 1" fields="aa ">
<double idx="1" name="aa" size="3 2"> 1 3 5 2 4 6
</double>
</struct>
</struct>
>> Y = xml_parse(X);
>> str = Y.b
str =
Hello World
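To show where the values in the type-based XML above come from, here is a simplified Python sketch of the same idea. It is not the XML Toolbox: it omits the `idx` attribute, supports only nested dicts, strings and rectangular numeric matrices (given as lists of rows), and the function name `to_type_xml` is hypothetical. Note the column-major ordering, which is why `[1 2; 3 4; 5 6]` is written as `1 3 5 2 4 6`.

```python
import xml.etree.ElementTree as ET


def to_type_xml(value, name=None):
    """Serialise nested dicts, strings and numeric matrices as type-based XML."""
    if isinstance(value, dict):
        elem = ET.Element("struct", size="1 1", fields=" ".join(value))
        for key, child in value.items():
            elem.append(to_type_xml(child, key))
    elif isinstance(value, str):
        elem = ET.Element("char", size=f"1 {len(value)}")
        elem.text = value
    elif isinstance(value, list):
        # A rectangular matrix given as a list of rows
        rows, cols = len(value), len(value[0])
        elem = ET.Element("double", size=f"{rows} {cols}")
        # MATLAB stores matrices column by column, so emit values column-major
        elem.text = " ".join(str(value[r][c]) for c in range(cols) for r in range(rows))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    if name is not None:
        elem.set("name", name)  # field name within the parent struct
    return elem


A = {"b": "Hello World", "c": {"aa": [[1, 2], [3, 4], [5, 6]]}}
root = to_type_xml(A)
print(ET.tostring(root, encoding="unicode"))
```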
Application of XML Toolbox for MATLAB
- Metadata is set by the user as a MATLAB structure.
- MATLAB structure → type-based XML
  - More natural format for the MATLAB user.
  - Element names = variable types (e.g. <struct>, <double>).
  - Easier for conversion to and from structures.
- Type-based XML → name-based XML
  - Element names = variable names (e.g. <grids>, <turb_model>).
  - Easier for database queries.
[Figure: MATLAB → xml_format.m → type-based XML → XSLT type2name → name-based XML; in reverse, name-based XML → XSLT name2type → type-based XML → xml_parse.m → MATLAB.]
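The type2name step is implemented in XSLT; the Python sketch below only mimics its effect on a small type-based document, renaming each element after its `name` attribute. The root tag `metadata`, the function name `type_to_name` and the `turb_model` value are assumptions. For brevity it drops the `size` attributes, which a real round trip through name2type would need to preserve.

```python
import xml.etree.ElementTree as ET


def type_to_name(type_elem, tag="metadata"):
    """Rename type-based elements after their 'name' attribute (type2name-style)."""
    out = ET.Element(tag)
    if type_elem.tag == "struct":
        # Each struct field becomes a child element named after the field
        for child in type_elem:
            out.append(type_to_name(child, child.get("name")))
    else:
        # Leaf values (<char>, <double>) keep their text content
        out.text = type_elem.text
    return out


type_xml = (
    '<struct size="1 1" fields="grids turb_model">'
    '<double name="grids" size="1 1">1</double>'
    '<char name="turb_model" size="1 2">SA</char>'
    "</struct>"
)
name_xml = type_to_name(ET.fromstring(type_xml))
print(ET.tostring(name_xml, encoding="unicode"))
# <metadata><grids>1</grids><turb_model>SA</turb_model></metadata>
```

In the name-based form, a query can address `grids` directly as an element path, which is what makes it easier to query.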
Geodise Database Toolkit for MATLAB – Query
- gd_query
  - Text-based query expressed over MATLAB variables, for use in MATLAB scripts.
  - Converted to XPath to query the XML database.
  - The XML Toolbox is used to convert results into a list of metadata structures.
Syntax:
Results = gd_query(<query string>, ['file'|'datagroup'])

Example 1: datagroup
Results = gd_query('dimension = 2D', 'datagroup')
Results{1}.standard.files.fileID
ans =
input_dat_632d05be-ba26-479b-9607-d1845f3c78ff
ans =
mesh_ns_cs_adf_ce875805-47b7-4e25-a5f7-9a8adf8f21b6

Example 2: file
r = gd_query('standard.userID = me & grids < 2');
r{1}.grids
ans =
1
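A rough sketch of how such a query string might be translated to XPath, assuming conditions joined by `&`, dotted field names mapping to nested elements, and non-numeric values compared as strings. The exact grammar and generated XPath used by `gd_query` are not given here, so `query_to_xpath` and the root element `metadata` are hypothetical.

```python
import re

# One condition: a dotted field name, a comparison operator, and a value
CONDITION = re.compile(r"\s*([\w.]+)\s*(<=|>=|!=|=|<|>)\s*(\S+)\s*$")


def query_to_xpath(query, root="metadata"):
    """Translate 'a.b = x & c < 2' into an XPath expression (hypothetical mapping)."""
    predicates = []
    for clause in query.split("&"):
        match = CONDITION.match(clause)
        if match is None:
            raise ValueError(f"cannot parse condition: {clause!r}")
        field, op, value = match.groups()
        path = field.replace(".", "/")  # nested structure fields become child steps
        try:
            float(value)
            literal = value  # numeric comparison
        except ValueError:
            literal = f"'{value}'"  # string comparison
        predicates.append(f"{path} {op} {literal}")
    return f"//{root}[{' and '.join(predicates)}]"


print(query_to_xpath("standard.userID = me & grids < 2"))
# //metadata[standard/userID = 'me' and grids < 2]
```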
Geodise Database Toolkit for MATLAB – Retrieve
- gd_retrieve
  - Retrieves a file from the repository using its unique handle.
  - Asks the Authorisation service whether the user has permission to retrieve the file.
  - Asks the Location service where the file is.
  - Transfers the file back to the local file system using GridFTP.
Syntax:
newFileLocation = gd_retrieve(<fileID>, <localPath>)

Examples:
gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp')
ans =
E:\tmp\input.dat
gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp\control42.dat')
ans =
E:\tmp\control42.dat
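The three retrieval steps can be sketched in Python as below. `can_read` and `locate` are hypothetical callables standing in for the Authorisation and Location services, and a local file copy stands in for the GridFTP transfer. The destination rule (a directory keeps the archived file name, any other path becomes the new file name) is inferred from the two examples above.

```python
import os
import shutil
import tempfile


def retrieve(file_id, local_path, user, can_read, locate):
    """Authorise, locate and copy a file; returns the new local file location."""
    # Step 1: ask the (stand-in) authorisation service
    if not can_read(user, file_id):
        raise PermissionError(f"{user} may not retrieve {file_id}")
    # Step 2: ask the (stand-in) location service where the file is
    source = locate(file_id)
    # A directory keeps the archived file name; any other path is the new file name
    if os.path.isdir(local_path):
        destination = os.path.join(local_path, os.path.basename(source))
    else:
        destination = local_path
    # Step 3: a local copy stands in for the GridFTP transfer
    shutil.copyfile(source, destination)
    return destination


# Demonstration with temporary directories and a hypothetical handle
with tempfile.TemporaryDirectory() as archive, tempfile.TemporaryDirectory() as work:
    stored = os.path.join(archive, "input.dat")
    with open(stored, "w") as fh:
        fh.write("grids 1\n")
    locations = {"input_dat_example": stored}
    grants = {("userA", "input_dat_example")}

    def can_read(user, file_id):
        return (user, file_id) in grants

    first = retrieve("input_dat_example", work, "userA", can_read, locations.__getitem__)
    second = retrieve("input_dat_example", os.path.join(work, "control42.dat"),
                      "userA", can_read, locations.__getitem__)
    names = (os.path.basename(first), os.path.basename(second))
    with open(second) as fh:
        copied = fh.read()
    try:
        retrieve("input_dat_example", work, "userB", can_read, locations.__getitem__)
        denied = False
    except PermissionError:
        denied = True

print(names, denied)
```

Checking permission before the location lookup means an unauthorised user learns nothing about where a file is stored.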
Authorisation
Data Authorisation
- The Globus certificate subject is mapped to a user ID.
- Users set access rights for the data they archive, so that it can be queried and retrieved by others.
- Access rights are stored in a relational database, accessed through the Authorisation web service.
- Grant users and groups access rights by including their user IDs or group IDs in the metadata structure.
Example:
m.grids = 1
m.access.users = {'userA', 'userB'}
m.access.groups = {'groupC'}
gd_archive('input.dat', m)
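A minimal sketch of the authorisation check with an in-memory SQLite database, assuming a hypothetical schema: a table mapping certificate subjects to user IDs, a group membership table, and a table of per-file user or group grants mirroring `m.access.users` and `m.access.groups`. The actual Geodise schema and web service interface are not described here, and the certificate subjects are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE subjects (subject TEXT PRIMARY KEY, user_id TEXT NOT NULL);
    CREATE TABLE memberships (user_id TEXT NOT NULL, group_id TEXT NOT NULL);
    CREATE TABLE file_access (file_id TEXT NOT NULL, user_id TEXT, group_id TEXT);
    """
)


def grant(file_id, users=(), groups=()):
    """Record rights as given by m.access.users and m.access.groups."""
    db.executemany("INSERT INTO file_access VALUES (?, ?, NULL)", [(file_id, u) for u in users])
    db.executemany("INSERT INTO file_access VALUES (?, NULL, ?)", [(file_id, g) for g in groups])


def may_access(subject, file_id):
    """Map a certificate subject to a user ID, then check user or group grants."""
    row = db.execute("SELECT user_id FROM subjects WHERE subject = ?", (subject,)).fetchone()
    if row is None:
        return False  # unknown certificate subject
    user_id = row[0]
    hit = db.execute(
        """
        SELECT 1 FROM file_access
        WHERE file_id = ?
          AND (user_id = ?
               OR group_id IN (SELECT group_id FROM memberships WHERE user_id = ?))
        """,
        (file_id, user_id, user_id),
    ).fetchone()
    return hit is not None


# Hypothetical certificate subjects, users and group membership
db.executemany(
    "INSERT INTO subjects VALUES (?, ?)",
    [("/O=Grid/CN=User A", "userA"), ("/O=Grid/CN=User D", "userD"), ("/O=Grid/CN=User E", "userE")],
)
db.execute("INSERT INTO memberships VALUES ('userD', 'groupC')")
grant("input_dat_example", users=["userA", "userB"], groups=["groupC"])
print(may_access("/O=Grid/CN=User D", "input_dat_example"))
```

Here userD gains access only through membership of groupC, while userE, who has neither a direct nor a group grant, is refused.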
Future Work
Archive structures as XML
- Currently, the contents of archived files cannot be queried.
- Archive MATLAB structures as XML and query them.
OGSA-DAI integration
- Replace and enhance some of our functionality with that provided by OGSA-DAI.
- E.g. a name mapping interface for mapping authenticated Grid credentials to local IDs (system and relational database IDs).
Change database system
- Xindice XML database: flexible and good for prototyping, but not scalable and without security.
- We will choose a relational database with XML capabilities, such as Oracle, DB2 or SQL Server.