Integrating Data Management into Engineering Applications

Zhuoan Jiao, Jasmin Wason, Marc Molinari, Steven Johnston & Simon Cox
School of Engineering Sciences, University of Southampton, UK
{z.jiao, j.l.wason, m.molinari, s.j.johnston, sjc}@soton.ac.uk

Geodise Project: Grid Enabled Optimisation and Design Search for Engineering
© Geodise Project, University of Southampton, 2003. http://www.geodise.org/

Challenges

- Large quantities of data are generated at different locations with different characteristics.
- Engineering data is traditionally stored in flat files with little descriptive metadata, which makes it hard to search and share.
- Our focus is to leverage existing database tools not commonly used in engineering applications, and to provide them in an environment familiar to engineers.

Geodise Database Toolkit Overview

- Store data with additional descriptive information:
  - Standard metadata (file name, size, …)
  - User-defined, application-specific metadata
- Query over metadata to locate required data more easily.
- Retrieve data based on logical data identities.
- Provide a familiar interface for engineers: wrap database services as Matlab functions, and metadata as Matlab structures and variables.
- Tools can be used in scripts running locally or on a remote compute resource.

Architecture

[Diagram. Labels: Client; Grid; Geodise Database Toolkit; Matlab Functions; GUI; Java clients; CoG; Globus Server; GridFTP; SOAP; Geodise Database Web Services (Apache SOAP); Location Service; Authorisation Service; Metadata Archive & Query Services; Metadata Database.]

Services (1)

- Storage
  - Archive data in file systems, sent over GridFTP.
  - Archive Matlab structures and variables as XML documents in a database.
  - Support a datagroup concept to aggregate related data.
- Location service
  - Maps logical data identities to physical storage locations.
- Authorisation service
  - Access rights to data can be granted to authenticated users from Matlab.

Services (2)

- Metadata archive service: descriptive information can be added to data.
  - Standard technical metadata (e.g. file size, format, date): mostly auto-generated.
  - Application-specific metadata: a user-defined Matlab structure.
- Query service
  - Query over metadata to efficiently locate required data.
  - Client-side command-line and GUI interfaces.

Client Tools (1)

- Archive data: gd_archive
  Store files or structures in the archive with some metadata.

      % Describe the data with user-defined metadata
      metadata.model = 'pgb_design'
      metadata.result.bandgap = 20
      % Archive a file together with its metadata
      fileID = gd_archive('C:\input.dat', metadata)
      % Archive a Matlab structure in the same way
      var.a = [1.4, 5.32, 4.98]
      structID = gd_archive(var, metadata)

- Group data: gd_datagroup, gd_datagroupadd
  Logically group together related data.

      % Create a datagroup, then add the archived file to it
      groupID = gd_datagroup('my datagroup', group_metadata)
      gd_datagroupadd(groupID, fileID)

Client Tools (2)

- Query data: gd_query
  Query the archive from a script or the GUI.

      gd_query('file.userID = me & result.bandgap < 40', 'file.*')

- Retrieve data: gd_retrieve
  Retrieve archived data to the local machine.

      % Retrieve a file to a local path
      gd_retrieve(fileID, 'E:\files\control.dat')
      % Retrieve a structure into a Matlab variable
      var = gd_retrieve(structID)

XML Toolbox for Matlab

- Converts between Matlab variables/structures and XML: the proprietary format is turned into an XML description and back again, transparently and with easy-to-use functions.
- Benefit: XML can be transferred, stored, queried and retrieved across the Grid.
- Two functions are used in the database toolkit:
  - xml_format(): converts a MATLAB variable to an XML string.
  - xml_parse(): converts an XML string to a MATLAB variable.
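As a rough illustration of the round trip that xml_format() and xml_parse() provide, the Python sketch below converts a nested dictionary (standing in for a Matlab structure) to an XML string and back. The function names struct_to_xml and xml_to_struct, the root element name, and the rule that numeric text is read back as a float are all assumptions made for this example; they do not reproduce the XML Toolbox's actual output format.

```python
import xml.etree.ElementTree as ET


def _build(parent, value):
    """Append child elements for a dict, or set the text for a leaf value."""
    if isinstance(value, dict):
        for key, child in value.items():
            _build(ET.SubElement(parent, key), child)
    else:
        parent.text = str(value)


def struct_to_xml(struct, root_name="metadata"):
    """Serialise a nested dict to an XML string (hypothetical analogue of xml_format)."""
    root = ET.Element(root_name)
    _build(root, struct)
    return ET.tostring(root, encoding="unicode")


def _parse_leaf(text):
    """Return a float when the text is numeric, otherwise the stripped string."""
    text = (text or "").strip()
    try:
        return float(text)
    except ValueError:
        return text


def xml_to_struct(xml_string):
    """Parse an XML string into a nested dict (hypothetical analogue of xml_parse)."""
    def walk(element):
        children = list(element)
        if not children:
            return _parse_leaf(element.text)
        return {child.tag: walk(child) for child in children}
    return walk(ET.fromstring(xml_string))


# Metadata taken from the gd_archive example in the slides.
metadata = {"model": "pgb_design", "result": {"bandgap": 20}}
xml_string = struct_to_xml(metadata)
restored = xml_to_struct(xml_string)
print(xml_string)
print(restored)
```

Because the XML is plain text, the same string could be stored in a database or sent to a remote service, which is the benefit the slide points to.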
GEM project: http://www.soton.ac.uk/~gridem/Pages/xmltoolbox.htm

Application of XML Toolbox in Database Toolkit

[Diagram: a Matlab workflow against the file archive and the metadata database.
 (A) Generate file.
 (B) Archive: local file path and structure in; the data file goes to the file archive and the structure is stored as XML; a filehandle is returned.
 (C) Query: query string in; matching XML is returned as structures.
 (D) Retrieve: filehandle and local file path in; the data file is returned.]

Geodise Database Toolkit used in GENIE

Current Work: XML Schema Generation and Evolution

- XML Schemas describing metadata can be used for:
  - Automatically generating graphical query interfaces.
  - Improving query performance.
  - Categorisation of user-defined metadata.
- Schemas are automatically generated from XML
  - Using a modified tool from the Castor project (http://castor.exolab.org).
- Schemas evolve over time
  - Changes are made to user-defined metadata as a design is developed.
  - The SchemaEvolver tool compares a generated schema with those previously stored.
  - Depending on the similarity weighting of the closest match, the outcome is:
    - Exact match: the metadata conforms to an existing XML Schema.
    - Similar: the existing XML Schema is modified to include the differences.
    - No match: the metadata conforms to a new XML Schema, which must be stored.

Example: Similar XML Schemas

1. Generate XML from the user metadata structure and store it in the database.
2. Generate an XML Schema from the XML.
3. Compare with previously stored schemas.
4. If a similar schema is found, merge the two to create a new, evolved schema.
5. Add this schema to the database and associate the XML with it.

XML generated from the metadata (step 1):

    <metadata>
      <a> 5.342 </a>
      <b> 2D </b>
      <c> new info </c>
    </metadata>

Schema generated from the XML (step 2):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="metadata">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:float"/>
            <xs:element name="b" type="xs:string"/>
            <xs:element name="c" type="xs:string"/>
            …

Previously stored schema in the database (step 3, Compare):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="metadata">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:float"/>
            <xs:element name="b" type="xs:string"/>
            …

Evolved schema (step 4, Evolve):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="metadata">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="a" type="xs:float"/>
            <xs:element name="b" type="xs:string"/>
            <xs:element name="c" type="xs:string" minOccurs="0"/>
            …

Future Work

- Metadata and XML Schemas
  - More work on producing an evolved schema with the SchemaEvolver.
  - GUI tool for user input to assist XML schema modification.
  - Further research into XML Schema versioning.
- Web query interface
  - Dynamic generation based on database and XML Schemas.
- Database support for Geodise graphical workflow construction
- Infrastructure
  - Improved Web Service security and use of OGSA-DAI.
  - Jython implementation of client toolkit.
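The compare-and-evolve steps from the similar-schemas example can be sketched roughly as follows. This Python sketch represents each schema as a mapping from element name to a type and an optional flag, uses a Jaccard overlap of (name, type) pairs as the similarity weighting, and applies a 0.5 threshold. All of these choices, and every function name, are assumptions for illustration: the slides do not say how SchemaEvolver computes similarity or merges schemas.

```python
def infer_schema(metadata):
    """Map each top-level field to an XML Schema type name (float or string only)."""
    schema = {}
    for name, value in metadata.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        schema[name] = {"type": "xs:float" if is_number else "xs:string",
                        "optional": False}
    return schema


def similarity(new, stored):
    """Jaccard overlap of (name, type) pairs; an assumed similarity weighting."""
    a = {(name, field["type"]) for name, field in new.items()}
    b = {(name, field["type"]) for name, field in stored.items()}
    return len(a & b) / len(a | b) if a | b else 1.0


def evolve(new, stored):
    """Merge two schemas; a field missing from either side becomes optional.

    Simplification: on a type conflict the stored schema's type is kept.
    """
    merged = {}
    for name in list(stored) + [n for n in new if n not in stored]:
        field = dict(stored.get(name) or new[name])
        if name not in stored or name not in new:
            field["optional"] = True  # corresponds to minOccurs="0"
        merged[name] = field
    return merged


def classify(metadata, stored_schemas, threshold=0.5):
    """Return (outcome, schema) for new metadata against the stored schemas."""
    new = infer_schema(metadata)
    if not stored_schemas:
        return "no match", new
    best = max(stored_schemas, key=lambda s: similarity(new, s))
    score = similarity(new, best)
    if score == 1.0:
        return "exact match", best
    if score >= threshold:
        return "similar", evolve(new, best)
    return "no match", new


# Stored schema with fields a and b; new metadata adds field c, as in the slides.
stored = infer_schema({"a": 5.342, "b": "2D"})
outcome, schema = classify({"a": 5.342, "b": "2D", "c": "new info"}, [stored])
print(outcome)
print(schema)
```

In this sketch the added field c ends up marked optional in the merged schema, mirroring the minOccurs="0" attribute in the evolved schema, so metadata written before the change still conforms.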