Mirror, Mirror on the Wall, What is the Best <XML> Database Solution of All?
Akmal B Chaudhri
Senior Architect, Informix Labs

Copyright © 2001 Informix

Disclaimer
  Any opinions expressed are mine and not necessarily those of my employer

Acknowledgements
  All trademarks are acknowledged
  Various people at Informix for some of the presentation material
  Robert Sutor, Lee Kheng Joo and Eve Maler

Abstract
  Managing XML documents becomes problematic as document collections grow. How do we successfully store and query document collections? One solution is a database. We will discuss the problem of integrating XML with databases and examine the choices, such as relational databases, object databases, object-relational databases and native XML servers.

Speaker Biography
  The speaker has been working in the area of Object Databases for 10 years
  He has previously worked for Reuters, Logica and Computer Associates, as well as on OODB research at City University

Life at Informix Labs
  I build linear accelerators!
  Technology Evangelist
  Technology Pragmatist

Agenda
  The Importance of XML
  Architectural View of XML
  Demonstrations
    XML to ORDB using “roll-your-own”
    XML to ORDB using Mapping Tool

The Importance of XML

Waves of Technology
  eCommerce
  Internet
  Web Computing
  Client Server
  Departmental Servers
  Mainframes

The Importance of XML
  By 2003, more than 75% of e-business applications will include XML, regardless of which language the application has been written in.

Tasks/Roles Assumed by XML
  Data transfer between applications and systems
  As middleware between an RDBMS and an e-commerce front end
  As a document repository, possibly replacing SGML repositories
  As a centralized database
  Other
  [Bar chart: % for each role, axis 0 to 60]
  Source: [Walker00]

Vendor Market Share 1999
  Vendor        Product   Revenue (US$ Million)   Market Share (%)
  Sterling/CA   Vision    3                       26.8
  SAG           Tamino    2.4                     21.4
  Poet          CMS       2.1                     18.8
  eXcelon       eXcelon   1.5                     13.4
  All Others              2.2                     19.6
  Source: [IDC00]

XML DBs Predicted Growth
  [Chart: US$ Million (axis 0 to 800) by year, 1999 to 2004]
  Source: [IDC00]

Architectural View of XML

Jump Gate Ready …

XML Persistence Options
  Indexed File System
  Database System
    Relational
    Object
    Native
  Dynamic Hashing Libraries
  Hybrid
  Source: [Edwards01]

XML Database Products
  Type
    Middleware
    XML-Enabled DBs
    Native XML DBs
    XML Servers
    XML App Servers
    CMS
    Persistent DOM
  [Table: each type classified as Data-Centric and/or Document-Centric]
  Source: [Bourret00]

Data-Centric
  Fine-grained data
  Order of elements not significant
  Examples
    Sales Order
    Flight Schedule
    Restaurant Menu
    …
  Machine consumption
  Source: [Bourret00]

Document-Centric
  Large-grained data
  Order of elements is significant
  Examples
    Book
    Email
    Advertisement
    …
  Human consumption
  Source: [Bourret00]

Three Types of XML DBs
  XML Generating Database
  XML Document Database
  XML Component Database
  Source: [Chelsom00]

XML Generating Database
  XML is generated from the database
  [Figure labels: XML Formatter, XML Document]

XML Document Database
  Database stores complete XML documents or document fragments
  [Figure: several XML Documents]

XML Component Database
  Full XML awareness
  [Figure: XML Document; nested <A> and <B> element components]

XML Persistence Options
  Database System
    Relational
    Object
    Native

Object Databases
  eXtensible Markup Language
  Enterprise Java Beans
  Java Language
  World Wide Web

IEEE Computer, August 2000

Cattell vs. Stonebraker
  Cattell
    Object-oriented databases are doing just fine, and the news of their demise is highly exaggerated.
  Stonebraker
    ODBMSs occupy a small niche market that has no broad appeal. The technology is in semi-rigor mortis, …
  Source: [Leavitt00]

DB Sales Revenue
              1999 (US$)      2001* (US$)
  RDB/ORDB    11.1 Billion    15.6 Billion
  OODB        211 Million     265 Million
  *Predicted
  Source: [IDC00]

XML and OO
  XML is not OO
    No inheritance
    No encapsulation
    No behaviour
    ...
  OODB is overkill for structured text
  Some Content Management Systems are built on top of OODBs

XML Persistence Options
  Database System
    Relational
    Object
    Native

Native Databases
  Many vendors developing “Native” XML databases
  Documents needed in original form
  Structural information is maintained
  Storage, query and retrieval of structure and content
  Good for point solutions
  Support for non-XML data?

XML Persistence Options
  Database System
    Relational
    Object
    Native

Relational Databases
  RDB products scale well
  Traditional and semi-structured data can co-exist and be used by multiple applications
  RDBs can process complex XML queries on large databases within seconds
  Source: [Florescu99]

Three Things We Need To Do
  Get XML into Database (storage)
  Get XML out of Database (retrieval)
  Query XML (processing)

XML Storage
  Sample document
    <?xml version='1.0'?>
    <ORDER id="abc123" date="27 Oct 1999">
      <PERSON age="50" gender="Male">
        <NAME>
          <FAMILY>Doe</FAMILY>
          <GIVEN>John</GIVEN>
        </NAME>
        <ADDRESS> ... </ADDRESS>
      </PERSON>
      <ITEM id="s1">Shirt</ITEM>
      <ITEM id="j2">Jacket</ITEM>
    </ORDER>
  BLOB/CLOB Storage
  Multiple Relational Table Mapping
    Tables: purchase_order, customer, items
  XML DataPort
  Hierarchical Storage
    1.0      ORDER (id, date)
    1.1      PERSON (gender, age)
    1.1.1    NAME
    1.1.1.1  FAMILY
    1.1.1.2  GIVEN
    1.1.2    ADDRESS
    1.2      ITEM (id)
    1.3      ITEM (id)

Choosing RDB Storage Model
  If Relational schema already exists
    Consider mapping to multiple tables
  If no Relational schema exists
    Consider BLOB/CLOB model
  If documents needed in original form
    Consider hierarchical model
    Structural information is maintained
    Storage, query and retrieval of structure and content

XML Processing
  XML is SGML derivative
  HTML is SGML derivative
  Therefore …
    Tools used for HTML can be reworked for XML
  DTDs/XML Schema
  SELECT query results formatted as XML

XML Storage/Retrieval
  Multiple Relational Tables
  Roll-your-own
  Mapping Tool
  JDBC
  BLOB/CLOB
  Verity/Excalibur
  Hierarchical Storage

BLOB/CLOB
  BLOB storage for semi-structured data
  This is the usual approach
  Indexing is key to efficient query processing
  Full-text indexing for semi-structured data
  Advanced indexing for path queries

BLOB/CLOB
  [Figure: XML Document]
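
The numbering used on the Hierarchical Storage diagram (1.0 ORDER, 1.1 PERSON, 1.1.1 NAME, and so on) can be reproduced with a short DOM walk. The following is only a minimal sketch under stated assumptions: the class name HierarchicalNumbering and its methods are hypothetical and are not part of any Informix product, the elided ADDRESS content is replaced by an empty element, and the standard JAXP DocumentBuilderFactory shipped with current JDKs is used for parsing. Each row it produces could be stored as one record keyed by its path identifier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: assigns slide-style hierarchical IDs to XML elements.
class HierarchicalNumbering {

    // Returns one "id tagName" row per element, in document order.
    static List<String> number(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> rows = new ArrayList<>();
        Element root = doc.getDocumentElement();
        rows.add("1.0 " + root.getTagName());   // the slide labels the root 1.0
        walk(root, "1", rows);
        return rows;
    }

    // Child elements of a node with id P get P.1, P.2, ... (text nodes are skipped).
    static void walk(Element parent, String prefix, List<String> rows) {
        int position = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            position++;
            String id = prefix + "." + position;
            rows.add(id + " " + ((Element) n).getTagName());
            walk((Element) n, id, rows);
        }
    }

    public static void main(String[] args) throws Exception {
        // The order document from the slide; ADDRESS is left empty here.
        String xml = "<ORDER id=\"abc123\" date=\"27 Oct 1999\">"
                + "<PERSON age=\"50\" gender=\"Male\">"
                + "<NAME><FAMILY>Doe</FAMILY><GIVEN>John</GIVEN></NAME>"
                + "<ADDRESS/>"
                + "</PERSON>"
                + "<ITEM id=\"s1\">Shirt</ITEM>"
                + "<ITEM id=\"j2\">Jacket</ITEM>"
                + "</ORDER>";
        for (String row : number(xml)) {
            System.out.println(row);
        }
    }
}
```

For the sample order, the rows come out in document order with the same identifiers as the diagram (1.0 ORDER, 1.1 PERSON, 1.1.1 NAME, 1.1.1.1 FAMILY, 1.1.1.2 GIVEN, 1.1.2 ADDRESS, 1.2 ITEM, 1.3 ITEM). Because a parent's identifier is a prefix of its children's, a prefix comparison on the stored key is enough to fetch a whole subtree.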

Indexing Example
    create table docs (id serial, xml_doc clob);

    insert into docs values (0, FileToClob('d:\xml\order_abc123.xml', 'server'));

    create index idx1 on docs (xml_doc vts_clob_ops) using vts in sbspace;

    select * from docs where vts_contains(xml_doc, '(John) <IN> GIVEN');

XML Storage/Retrieval
  Multiple Relational Tables
  Roll-your-own
  Mapping Tool
  JDBC
  BLOB/CLOB
  Verity/Excalibur
  Hierarchical Storage

JAXP Overview
  Java API for XML Parsing (JAXP) is currently available for programmatically accessing XML documents
  JAXP can be divided into three sets
    Simple API for XML (SAX)
    Document Object Model (DOM)
    Pluggability Layer

JAXP Glossary
  SAX - event-driven protocol, with the programmer providing callback methods that the parser invokes when parsing a document
  DOM - random-access protocol, which converts an XML document into a collection of in-memory objects
  Pluggability Layer - standardizes access to SAX/DOM by providing “Factory” methods for creating and configuring SAX parsers and creating DOM objects (type “Document”)

XML in JDBC 2.20
  We would like to support users who use JAXP in their JDBC applications without putting code that is specifically related to JDBC in the driver
  New static methods to facilitate storage and retrieval of XML data in database columns
  These methods not only support users of XML but also provide flexibility regarding which JAXP package the user is using

Storing XML Data
  The methods used during data storage will assist in
    Parsing the XML data
    Verifying that well-formed and/or valid XML data are stored
    Invalid XML data are rejected

XMLtoString() Example
    -- Example of inserting an XML file into an lvarchar column
    create table tab1 (col1 lvarchar);

    try {
        String cmd = "insert into tab1 values(?)";
        PreparedStatement pstmt = conn.prepareStatement(cmd);
        // Convert the XML file to a String for the lvarchar column
        pstmt.setString(1, UtilXML.XMLtoString("/tmp/x.xml"));
        pstmt.execute();
        pstmt.close();
    } catch (SQLException e) {
        ...
    }

Retrieving XML Data
  The methods used during data retrieval will assist in
    Converting data to type “InputSource”, which is the standard input type for both SAX and DOM methods
    XML data to DOM

getInputSource() Example (1)
    // Fetch XML data from an lvarchar column into an InputSource
    // for (SAX) parsing
    try {
        String sql = "select col1 from tab1";
        Statement stmt = conn.createStatement();
        ResultSet r = stmt.executeQuery(sql);

        // Other SAX parsers can go here if desired
        Parser p = ParserFactory.makeParser("com.sun.xml.parser.Parser");
        p.setDocumentHandler(new myHandler());
        p.setErrorHandler(new errHandler());

getInputSource() Example (2)
        // Parse each fetched row with the SAX parser configured above
        while (r.next()) {
            InputSource i = UtilXML.getInputSource(r.getString(1));
            p.parse(i);
        }
        r.close();
    } catch (SQLException e) {
        ...
    }

DOM Support
  The DOM specification does not provide a standard way to create a DOM object
  JAXP provides factory methods that provide a standard way of creating DOM objects

InputStreamtoDOM() Example
    -- Fetch XML data from a text column into a DOM object
    create table tab2 (col1 text);

    try {
        String sql = "select col1 from tab2";
        Statement stmt = conn.createStatement();
        ResultSet r = stmt.executeQuery(sql);
        while (r.next()) {
            Document doc = UtilXML.InputStreamtoDOM(r.getAsciiStream(1));
        }
        r.close();
    } catch (SQLException e) {
        ...
    }

XML Parser
  JDBC driver uses Sun’s JAXP API and by default a non-validating XML Parser
  The default can be changed in two ways, where <new parser> is the alternative parser
    % java -Dorg.xml.sax.parser=<new parser>
    System.setProperty("org.xml.sax.parser", "<new parser>");

JAXP Summary
  JDBC 2.20 XML support makes it easy to store/retrieve XML documents to/from an Informix Database using Sun’s JAXP 1.0 API
  Ensures valid or well-formed XML document during insertion because of XML parsing using the SAX protocol
  Sun’s non-validating parser is used by default, but the ability to specify and use any parser is provided

Demonstrations

Architecture for Demos
  [Figure: demo architecture]
  Source: Derived from [Plummer99]

Cloudscape
  Cloudscape can store Java objects in table columns
  Not just blobs – objects have structure
  Java code can accept different data and store as XML
  Embed XML formatter into Cloudscape
  Extend server

Cloudscape Demo: Tables
    create table xml_objects (
        dtd_name char(20) constraint dtd_name_primary_key primary key,
        xml serialize(xmlobject));

    create table dtd_nodes (
        nodename char(20) constraint nodename_primary_key primary key,
        contains_elements varchar(20),
        node_root boolean,
        contains_attributes varchar(20),
        attribute_required boolean,
        contains_data boolean,
        data_required boolean);

Cloudscape Demo: Java (1)
    import ...

    public class XMLObject implements Serializable {
        public Vector elementNames;
        public Vector elementValues;
        public String rtnString;

        // Constructor (no return type, so that new XMLObject(names, values) compiles)
        public XMLObject(Vector names, Vector values) {
            this.elementNames = names;
            this.elementValues = values;
        }
        ...

Cloudscape Demo: Java (2)
        ...
        public String returnXMLFormat(String DTD) {
            genFromDTD dtd = new genFromDTD(); // XML Formatter
            rtnString = genFromDTD.returnXMLFormat(this, DTD);
            return rtnString;
        }

        public String toString() {
            return "XMLObject Class";
        }
    }

Cloudscape Demo: SQL
    select xml.returnXMLFormat('BOOKS')
    from xml_objects
    where dtd_name = 'BOOKS';

Cloudscape XML Demo
  1. Start Cloudview
  2. View XML
  3. Compile Java Files
  4. Start Cloudview
  5. Start Web Server
  6. Start Browser
  7. Stop Web Server

Object Translator
  Provides an object view of a database
  Supports Java™/EJB (and VB/MTS)
  Builds an object model from a relational schema
  DBA can focus on the schema, developers focus on Java
  Outputs components
  Supports Cloudscape, Informix and other JDBC sources

Mapping/Modelling Process
  [Figure labels: Compile-time, SQL, UML, OR Maps, Runtime Database Access, Object Model, Forward Engineer, Data Model, Reverse Engineer, Object Translator Solution]

Object Translator 1.1
  Developer maps XML documents to map objects
  Generated Java objects become XML document handlers
  Store and restore the XML document data in the database
  XML markup is not stored or restored
  Allows applications to use existing schemas for incoming XML documents

Object Translator XML Demo
  Use an existing XML document
  Create links between elements of XML document and attributes of map object
  Generate Java files and servlet from map object
  Compile and run

Object Translator XML Demo
  1. Start Cloudview
  2. View XML
  3. Start OT
  4. Copy Files
  5. Start Web Server
  6. Start Browser
  7. Stop Web Server

What about Performance?
  A couple of independent benchmarks are being developed
    XMach-1
    XML Store
    ...

Example Performance Results
  We conclude DTD approach is the best strategy among the six approaches we studied and there is no clear need to build an “XML-specific” database system.
  Source: [Tian]

Final Thoughts ...
  Technology is moving fast
  Vendor marketing ahead of product capabilities
  Many “Beta” products available

Software Downloads
  Cloudscape
    http://www.cloudscape.com/
  Object Translator
    http://www.informix.com/idn-secure/webtools/ot/

Resources
  http://www.oasis-open.org/cover/xmlAndDatabases.html
  http://www.rpbourret.com/xml/
  http://www.sees.bangor.ac.uk/~rich/research.html
  http://www.xml-und-datenbanken.de/
  http://www.soi.city.ac.uk/~akmal/html.dir/benchmarks.html