Comparison of XML Support in IBM DB2 9, Microsoft SQL Server 2005, Oracle 10g

O. Beza¹, M. Patsala², E. Keramopoulos³

¹Dept. of Information Technology, Alexander Technology Educational Institute (ATEI), Thessaloniki, Greece, E-mail: [email protected]
²Dept. of Information Technology, Alexander Technology Educational Institute (ATEI), Thessaloniki, Greece, E-mail: [email protected]
³Dept. of Information Technology, Alexander Technology Educational Institute (ATEI), Thessaloniki, Greece, E-mail: [email protected]

Abstract

In this paper we present the relation between XML (Extensible Markup Language) documents and three Relational Database Management Systems (RDBMSs): IBM DB2 9, Microsoft SQL Server 2005 and Oracle 10g. The research aims to describe the ways in which we can manipulate this type of document using these three XML-enabled databases and to perform a comparative analysis of their XML support. The paper discusses the basic characteristics and concepts of XML and presents the structure of XML documents, the related technologies (DTDs, XML Schemata, etc.) and two of the most important XML query languages, XPath and XQuery. Moreover, we outline the basic concepts of database systems and how they can benefit from using XML. The emphasis of the paper is on the comparative analysis, which is based on a list of basic XML features that an RDBMS should support. We introduce these XML features and illustrate the comparison with examples of using XML with IBM DB2 9, Microsoft SQL Server 2005 and Oracle 10g. Finally, we summarize our conclusions in a comparison table which contains all the XML operations supported by the three RDBMSs.

Keywords: XML, XML-enabled Databases, DTD, XML Schema

1. Introduction

XML (Extensible Markup Language) [1, 2] is a markup language developed by the World Wide Web Consortium (W3C) [3] to deliver structured content over the web.
XML was originally developed as an application profile of SGML [4], but it quickly proved successful in a variety of other application domains. That is because XML provides many advantages as a data format over others, including:

1. Built-in support for internationalization, due to the fact that it uses Unicode.
2. Platform independence.
3. A human-readable format, in which it is easier for developers to locate and fix errors than in other data storage formats.
4. Extensibility, in a manner that allows developers to add extra information to a format without breaking applications that are based on older versions of that format.
5. A large number of off-the-shelf tools for processing existing XML documents.

XML databases have become widely accepted for all applications where the storage of XML data is necessary. There are three different types of XML databases [5], namely:

XML Enabled Database: A database that holds data in some format other than XML. An interface is provided which presents XML to the application even though the data is stored in another format. An XML-enabled database might be a relational database, an object-relational database, or an object-oriented database.

Native XML Database: This type of database allows XML data to be stored directly. It also defines a (logical) model for an XML document and stores and retrieves documents according to that model. Native XML databases are likely to perform better than XML-enabled databases since there is little need for converting the data. The data conversion in an enabled database is almost always going to be more significant and time consuming than in a native database.

Hybrid XML Database: A database that has characteristics of both Native XML Databases and XML Enabled Databases.

IBM DB2 9, Microsoft SQL Server 2005 and Oracle 10g are XML-enabled relational database management systems (RDBMSs). DB2 offers two ways of processing XML documents: XML Extender [6, 7] and PureXML [8, 9].
In this paper, we present the XML characteristics that the three RDBMSs should support. In particular, in Section 2 the three DBMSs are examined against some XML technologies, such as DTDs, XML Schema, XPath, XQuery and XSL. The method that the three DBMSs use to store an XML document is given in Section 3. In Section 4, we focus on the mapping between XML documents and DBMSs and on XML indexes. In Section 5, we examine how the three DBMSs compose XML documents from, and decompose them into, relational table columns. In Section 6, we study the way that the three DBMSs use SQL/XML, the extension of SQL3 [10] for working with XML. Finally, in Section 7 we conclude with a comparison table which contains all the described XML features that an RDBMS should support.

2. XML Technologies

2.1 DTD (Document Type Definition) Documents

A DTD [11] is a document that defines a set of markup rules as a vocabulary. DTDs have a different syntax from XML and are generally used to specify the order and occurrence of elements in an XML document. DTDs have become less popular since XML Schemata were introduced. However, some programmers still prefer DTDs, mainly because they are easier to code and validate than XML Schemata. SQL Server does not support the use of DTD files. DB2 PureXML does not support DTD validation, but it permits the insertion of documents that contain a DOCTYPE that refers to a DTD. On the other hand, DB2 XML Extender and Oracle fully support DTD validation.

2.2 XML Schemata

An XML Schema [12] is a mechanism introduced by the W3C that can be used in place of a DTD to define the specifications for the content of XML documents. All three DBMSs allow an XML schema to be registered in the database. Oracle and DB2 provide a repository that contains all the registered XML technologies used for validation and stores them in their hierarchical structure; it is named XML DB Repository in Oracle and XML Schema Repository in DB2.
SQL Server does not provide such a tool, but it provides a method for modifying an XML Schema, namely alter xml schema collection. Similarly, DB2 provides the method add xmlschema document to. In Oracle, once we register the schema in the database we cannot modify it. Moreover, Oracle provides two methods: isSchemaBased, which checks whether the inserted XML document conforms to a schema, and isSchemaValidated, which checks whether the document inserted in a column is valid. Finally, in Oracle and SQL Server we can create XML schemata in the database.

One of the most important reasons that DBMSs use XML Schemata is to validate XML documents before they are inserted in the columns of a table. In the case of SQL Server, we have to define at table creation whether the XML column will contain XML documents that conform to an XML Schema. Thus, if we create a typed table, the XML document should contain all the tags, with the same names, that the schema defines. In Oracle, on the contrary, we just have to name the root element and the rest of the document may differ. In the case of DB2, we have to define in the insert command whether the document will be validated against a schema or not, as we can see in the following example.

    insert into PurchaseOrder(poid, info)
    values (2002,
      XMLVALIDATE(
        XMLPARSE(DOCUMENT '<purchaseOrder poid="2002" orderDate="1999-10-20" status=""> … </purchaseOrder>')
        ACCORDING TO XMLSCHEMA ID migrate.po));

2.3 XPath-XQuery-XSL

XPath [13] is a query language that conforms to a data model (DTD, XML Schema) and provides a hierarchical representation of XML documents. All three DBMSs use it to navigate through elements and attributes in an XML document that is stored in a column of XML type. XQuery [14] is a W3C Recommendation and conforms to the same data model as XPath. XQuery is used for finding and extracting elements and attributes from XML documents.
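The kind of navigation that XPath expresses can be illustrated outside any of the three DBMSs. The following is a minimal, vendor-neutral sketch in Python, assuming only the standard xml.etree.ElementTree module, which supports a limited subset of XPath and no XQuery. The sample purchaseOrder document and its values are invented for illustration; the element names merely follow the examples used in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element names follow the paper's examples,
# the values are made up for illustration.
PO_XML = """
<purchaseOrder poid="2002" orderDate="1999-10-20">
  <shipTo><name>Alice Smith</name></shipTo>
  <items>
    <item pid="100-100-01"><quantity>2</quantity></item>
    <item pid="100-103-01"><quantity>1</quantity></item>
    <item><quantity>5</quantity></item>
  </items>
</purchaseOrder>
"""

root = ET.fromstring(PO_XML)

# Child-axis path, the analogue of /purchaseOrder/shipTo/name
ship_to_name = root.findtext("shipTo/name")

# Attribute-existence predicate, the analogue of /purchaseOrder/items/item[@pid]
items_with_pid = root.findall("items/item[@pid]")
pids = [item.get("pid") for item in items_with_pid]

# Attribute access on the root element, the analogue of /purchaseOrder/@poid
poid = root.get("poid")

print(ship_to_name, pids, poid)
```

The predicate item[@pid] keeps only the item elements that carry a pid attribute, which is the same filter that appears in the SQL Server query of this section.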
According to our research, DB2's support of XQuery is superior to that of the other two, since it treats XQuery as a first-class language. Only DB2 XML Extender does not support XQuery.

DB2 PureXML's XQuery (embedded in SQL):

    select xmlquery('$cinfo/purchaseOrder/shipTo/name' passing info as "cinfo")
    from purchaseOrder;

DB2 PureXML's XQuery (stand-alone):

    xquery for $y in db2fn:xmlcolumn('PURCHASEORDER.INFO')/purchaseOrder/items
    return $y

SQL Server's XQuery:

    Select poid,
      info.query('for $y in /purchaseOrder/items return <topic>{$y/item[@pid]}</topic>')
    from purchaseorder

Oracle's XQuery:

    Select XMLQuery('$cinfo/product/description/name[ora:contains(.,"Roll")>0]'
      passing info as "cinfo" returning content)
    from product;

XSL (Extensible Stylesheet Language) [15] is a language for expressing style sheets. In other words, it defines how an XML document should be presented. All three DBMSs fully support the use of XSL.

3. Storage Methods

In this section we examine the method that the three DBMSs use to store an XML document in a database. In particular, SQL Server [16, 17] stores XML documents in table columns of XML type, like BLOBs (Binary Large Objects). If the XML document is stored in an untyped¹ column, it is stored as Unicode (UTF-16), whereas if it is stored in a typed² column, it is stored with the types that the XML schema defines. For example,

    Create table Product (
      pid varchar(10) not null primary key,
      name varchar(128),
      category varchar(32),
      price decimal(30,2),
      info xml);

    Create table purchaseorder (
      POid bigint not null primary key,
      Status varchar(10) not null default 'New',
      Info XML(content PO));

Oracle [18, 19] stores XML documents either as intact documents in xmltype columns of tables, like CLOBs (Character Large Objects) or BLOBs (Binary Large Objects), or as a distinct xmltype table.
For example,

    Create table purchaseOrder (
      POid number(19) not null primary key,
      status varchar(10) not null,
      info xmltype)
      xmltype column info xmlschema "http://www.w3.org/2001/XMLSchema" element "purchaseOrder";

    Create table purchaseOrder of xmltype
      XMLSCHEMA "http://www.w3.org/2001/XMLSchema" element "purchaseOrder";

¹ Untyped is the SQL Server term for columns of XML type that are not bound to an XML Schema.
² Typed is the SQL Server term for columns of XML type that conform to an XML Schema.

DB2 XML Extender stores the XML document in a single column as character data, extracting values into "side tables". For example,

    Create table PurchaseOrder (
      POid bigint not null primary key,
      Status varchar(10) not null with default 'New',
      Info db2xml.xmlclob not logged not null);

In the case of DB2 PureXML, the XML document is stored in a column of XML type. It is worth mentioning that PureXML does not store documents as plain text and does not map XML to relational or object-relational tables. Instead, it stores XML in its inherent hierarchical format, which matches the XML data model. Any XML document is a well-defined tree of elements and attributes, and XML queries are expressed in terms of tree traversal. An example of storing an XML document in DB2 PureXML is given below:

    Create table PurchaseOrder (
      POid bigint not null primary key,
      Status varchar(10) not null with default 'shipped',
      Info xml not null);

4. XML Mapping and Indexing

The concept of mapping [20] is of the greatest importance for XML Enabled Databases, because the data transfer between the XML document and the database is based on the mapping between them. Using DB2's XML Extender, the mapping between the tables of the database and the structure of the XML document is defined by a document called a DAD (Data Access Definition). This document maps the elements of the XML document to the columns of the table.
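The role that such a mapping document plays can be sketched without any vendor tooling. The example below is not DAD syntax: it assumes a hypothetical Python dictionary (PO_MAPPING) that pairs paths in an XML document with column names, and an invented helper (map_to_row) that applies the mapping to produce one relational row. The sample document and the column names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: column name -> (path relative to the root, attribute or None).
# It only mimics the purpose of a DAD file; it is not DAD syntax.
PO_MAPPING = {
    "poid": (".", "poid"),
    "status": (".", "status"),
    "ship_name": ("shipTo/name", None),
    "ship_city": ("shipTo/city", None),
}

def map_to_row(xml_text, mapping):
    """Return a dict of column values extracted from xml_text per the mapping."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, (path, attribute) in mapping.items():
        node = root if path == "." else root.find(path)
        if node is None:
            row[column] = None              # a missing element maps to NULL
        elif attribute is not None:
            row[column] = node.get(attribute)
        else:
            row[column] = node.text
    return row

sample = (
    '<purchaseOrder poid="2002" status="New">'
    "<shipTo><name>Alice Smith</name></shipTo>"
    "</purchaseOrder>"
)
row = map_to_row(sample, PO_MAPPING)
print(row)
```

Keeping the mapping as data rather than as code is the design point shared with a mapping document: the same extraction routine can serve any document type once a new mapping is supplied.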
DB2 PureXML, by contrast, uses annotated XML Schemata [12] instead of DAD files. In general, annotated Schemata, which are also referred to as mapping schemata, are used by all three RDBMSs for mapping. Annotations can be defined on tables (sql:relation annotation), on fields (sql:field annotation) and on referential integrity relationships (sql:relationship annotation). If no XML schema is registered in the database, each of the DBMSs uses a default mapping. SQL Server also uses FOR XML clauses [21], which define how select clauses are mapped to XML documents.

A common characteristic of all the XML Enabled Databases is the support of XML indexes [17, 22], which are built on elements and attributes of XML documents. Just like relational indexes, XML indexes are used to improve the performance of queries. Users should create indexes over frequently accessed data, which results in much better performance for the select statements executed over the indexed data.

5. Managing XML Documents and Relational Data

When working with XML documents and databases, we can either store the documents intact in columns of XML type or decompose the XML data into relational tables. Another operation we can perform is to compose XML documents from existing relational data.

For decomposition, DB2 XML Extender provides the XML Collection method [23]. An XML Collection is defined by a DAD document that determines how the elements and the attributes are to be mapped to one or more relational tables. After we enable the database for XML Collection (dxxadm enable_collection database_name name "path"), we insert the XML data in the tables using the DAD and the XML documents. DB2 PureXML, on the other hand, uses an annotated XML Schema that we have registered in the database and enabled for decomposition; with the command decompose xml document, the desired XML data are inserted into tables.
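The two directions described in this section, decomposition (shredding) and composition, can be illustrated in a vendor-neutral way. The sketch below assumes only Python's standard sqlite3 and xml.etree.ElementTree modules; it does not reproduce XML Collection, decompose xml document, or any other command of the three DBMSs. The table names (purchase_order, po_item), the helper names (decompose, compose) and the sample values are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; values are made up for illustration.
PO_XML = (
    '<purchaseOrder poid="2002" status="New">'
    "<items>"
    '<item pid="100-100-01"><quantity>2</quantity></item>'
    '<item pid="100-103-01"><quantity>1</quantity></item>'
    "</items>"
    "</purchaseOrder>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_order (poid INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE po_item (poid INTEGER, pid TEXT, quantity INTEGER)")

def decompose(xml_text):
    """Shred one purchaseOrder document into the two relational tables."""
    root = ET.fromstring(xml_text)
    poid = int(root.get("poid"))
    conn.execute("INSERT INTO purchase_order VALUES (?, ?)", (poid, root.get("status")))
    for item in root.findall("items/item"):
        conn.execute(
            "INSERT INTO po_item VALUES (?, ?, ?)",
            (poid, item.get("pid"), int(item.findtext("quantity"))),
        )

def compose(poid):
    """Rebuild a purchaseOrder document from the relational rows."""
    (status,) = conn.execute(
        "SELECT status FROM purchase_order WHERE poid = ?", (poid,)
    ).fetchone()
    root = ET.Element("purchaseOrder", {"poid": str(poid), "status": status})
    items = ET.SubElement(root, "items")
    rows = conn.execute(
        "SELECT pid, quantity FROM po_item WHERE poid = ? ORDER BY rowid", (poid,)
    )
    for pid, quantity in rows:
        item = ET.SubElement(items, "item", {"pid": pid})
        ET.SubElement(item, "quantity").text = str(quantity)
    return ET.tostring(root, encoding="unicode")

decompose(PO_XML)
item_rows = conn.execute("SELECT pid, quantity FROM po_item ORDER BY rowid").fetchall()
rebuilt = compose(2002)
print(item_rows)
print(rebuilt)
```

Note that one document feeds two tables here, which is the multi-table case that distinguishes the mapping-driven approaches from a single-rowset extraction.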
SQL Server uses the stored procedure sp_xml_preparedocument and OPENXML clauses [16] for decomposing XML data into relational tables. This procedure does not require an XML Schema; we just have to supply the XML document. This approach is not automated like DB2's, and it does not support insertion into more than one table. Oracle performs a similar procedure using dbms_xmlgen. By defining an XML document, the names of the tables we want to create and the columns they contain, we can insert XML data into them. This approach is far more complicated than DB2's, especially when we want to update many tables or insert more than one XML document.

Apart from these functions, we can create XML documents and Schemata from existing relational data. Oracle and DB2 use SQL/XML methods to produce XML documents, whereas SQL Server uses FOR XML clauses. Finally, DB2 does not support XML Schema creation.

6. SQL/XML

SQL/XML [24] is an extension of SQL that is part of ANSI/ISO SQL 2003. SQL/XML was developed by INCITS H2.3 [25], with participation from Oracle, IBM, Microsoft (which does not plan to implement SQL/XML), Sybase [26], and DataDirect Technologies [27]. Its extensions include the following:

• Mapping SQL tables, schemas, and catalogs to XML documents.
• Generation of an XML schema corresponding to an XML document generated from SQL data.
• An XML data type to allow columns of SQL tables to contain XML data.
• Publishing functions that allow SQL queries to create XML structures, including XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLCONCAT, XMLAGG, and XMLGEN.

7. Conclusions

To sum up, we report some observations we made during our research. Oracle was rather slow on Windows XP and less user-friendly than the other two DBMSs. What we found quite convenient when working with SQL Server and DB2 was the existence of hyperlinks in the columns that contain XML documents and in query results.
One limitation of DB2 9 is that working with XML Extender requires a number of steps in order to enable the database, while in the case of PureXML we have to work in a database with a Unicode (UTF-8) codeset. The comparison table below lists all the functions and tools that the DBMSs use for XML support.

[Table columns: DB2 PureXML, DB2 XML Extender, Oracle, SQL Server. Feature rows: XML Technologies (DTD, XML Schemas, XPath, XQuery, XSL); Storage Methods (BLOB, CLOB, VARCHAR, Native); XML Data Type (XMLType, XML, XMLVarchar, XMLClob, XMLFile); Columns of XML Type; Tables of XML Type; XML Validation (DTD, XML Schema); XML Shredding; Composition of XML Documents; Composition of XML Schema; XML Mapping; XML Indexing; SQL/XML; XML Repository. The per-cell check marks are not recoverable from the source.]

Figure 1: Comparative Table

As we can see from the above table, DB2 XML Extender does not support the use of XML Schemata and XQuery, whereas SQL Server falls short in the use of DTDs. One big advantage of DB2 PureXML is the native storage of XML data. This approach contributes to faster query performance and data access. One of the most interesting functions that SQL Server and Oracle offer is the creation of XML Schemata, something that DB2 does not support. Finally, DB2 PureXML and Oracle provide a very helpful tool for managing XML schemata and validation technologies, the XML Repository.

REFERENCES

[1] Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 29 September 2006. Available at: http://www.w3.org/TR/xml/
[2] Extensible Markup Language (XML) Tutorial. Available at: http://www.w3schools.com/xml/
[3] World Wide Web Consortium (W3C). Available at: http://www.w3.org/
[4] SGML. Available at: http://www.w3.org/TR/html4/intro/sgmltut.html
[5] What is an XML database? Available at: http://xmldb-org.sourceforge.net/faqs.html
[6] IBM Redbooks, XML Guide, db2xge90
[7] IBM Redbooks, XML Extender Administration and Programming, Version 8.2, db2sxe81
[8] IBM Redbooks, DB2 9: pureXML Overview and Fast Start, sg247298
[9] IBM Redbooks, DB2 9 pureXML Guide, sg247315
[10] ISO/IEC 9075-1:2003. Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework).
[11] Kelvin Williams, Professional XML Databases, Wrox Press, Ltd, 2000
[12] Introduction to Annotated XSD Schemas (SQLXML 4.0). Available at: http://technet.microsoft.com/en-us/library/ms171870.aspx
[13] XML Path Language (XPath) Version 1.0, W3C Recommendation, 16 November 1999. Available at: http://www.w3.org/TR/xpath
[14] XQuery 1.0: An XML Query Language, W3C Recommendation. Available at: http://www.w3.org/TR/xquery
[15] Extensible Stylesheet Language (XSL) Version 1.1. Available at: http://www.w3.org/TR/xsl/
[16] Scott Klein, Professional SQL Server 2005 XML, Wiley Publishing, 2006
[17] Mitch Ruebush, Comparing SQL Server 2005 and Oracle 10g as a Database Platform for Microsoft .NET Developers, April 2005
[18] Shelley Higgins, Oracle Application Developer's Guide - XML, 10g (9.0.4), Part No. B12099-01, Oracle Corporation, 2003
[19] Geoff Lee, Mastering XML DB Queries in Oracle 10g, Release 2, Oracle Corporation, March 2006
[20] Igor Dayen, Storing XML in Relational Databases, June 20, 2001
[21] Srinivas Sampath, Beginning SQL Server 2005 XML Programming, 21 February 2006
[22] IBM Redbooks, DB2 9: Indexing XML documents with DB2 9 pureXML
[23] IBM Redbooks, XML for DB2 Information Integration, SG24-6994
[24] SQL/XML. Available at: http://www.stylusstudio.com/sqlxml_tutorial.html
[25] INCITS H2.3. Available at: http://www.incits.org/
[26] Sybase, see also: www.sybase.com/
[27] DataDirect Technologies. Available at: www.datadirect.com/