Habiba Skalli
ID: 001C543860
XML and Data Management
For: Dr. Haddouti
Fall 2003
Storing XML using Relational Model
Outline:
1. Introduction:
a. Why Store XML?
b. Where to Store XML?
2. XML Storage Requirements.
3. Strategies for Storing XML.
4. Mapping XML to Relational Model.
5. XML Enabled Databases:
a. Microsoft SQL Server 2000.
b. Oracle 8i/9i.
c. IBM DB2.
6. Conclusion.
7. References.
1. Introduction:
XML is becoming the standard for business-to-business information exchange, and the
amount of information exchanged as XML is growing very fast. But why do we need to
store XML?
There are several reasons:
- XML documents are conceived as a transitory form of data.
- XML is not designed to facilitate efficient data storage and retrieval.
- Processing and accessing data in large XML files is time consuming.
- XML data should be stored in a consistent and efficient manner to enforce data
integrity. [1, 5]
XML documents can be stored in a relational database (RDBMS), an object database
(ODBMS) or a native XML database.
The focus of this paper is on the relational database model.
2. XML Storage Requirements:
XML documents are either data centric, document centric or semi-structured.
a. Data Centric Documents:
An XML document is data centric if it has a regular structure and contains
updateable data (e.g. invoice documents, purchase orders, flight schedules).
“Traditional relational databases are typically better at dealing with data centric
requirements.”
The technique that is appropriate for storing data centric XML documents is
mapping the structure of the XML document to the database: store the XML
document contents into relational tables. [2]
An Example of data centric document:
<order>
<customer>Meyer</customer>
<position>
<isbn>1-234-56789-0</isbn>
<number>2</number>
<price currency="Euro">30.00</price>
</position>
</order> [17]
b. Document Centric Documents:
Document centric data tends to be more unpredictable in size and content (e.g.
newspaper content, articles, and advertisements).
Native XML databases and content management systems are typically better at
storing document centric data.
A content management system is an application designed to manage documents
and it can be built on top of a native XML database. [2]
c. Semi Structured Documents:
The difference between data centric and document centric documents is not always
easy to identify. Semi-structured documents are a combination of the two: regular,
data-like fields mixed with free text. [2]
An example of a semi-structured document:
<movie>
<title>Insomnia</title>
<year>2002</year>
<company>Warner Bros</company>
<description>Sent to a small Alaska town to investigate the murder of a
teenage girl, a veteran police officer (Al Pacino) is forced…</description>
</movie>
3. Strategies for Storing XML:
The storage of XML can be achieved in three different ways: structure based storing,
model based storing and text based storing.
a. Structure Based Storing (e.g. STORED, POET):
In structure based storing, the database schemas represent the logical structure of the
XML document (or DTDs if they are available). Therefore, a relation or a class is
created for each element type in the XML documents. [14]
Take the example of STORED (Semi-structured TO Relational Data). STORED is a
query language that does the mapping between XML (semi-structured data) and
relational data. This mapping is generated automatically using data mining techniques.
Mapping with STORED is lossless because an overflow graph is used to store the data
that does not fit into the schema. After the STORED process is finished, the system
becomes ready for insert and update in the database. [19]
b. Model Based Storing (e.g. Excelon’s Xis, Infonyte):
In model based storing, a fixed database schema is used to store the structure of all
XML documents. [14]
Take the example of Xis. Xis is a native XML database that stores XML data directly
as Document Object Model (DOM) trees. It makes changes in the structure and data of
XML documents dynamically and in real time by processing only those XML
elements or sub-elements needed for a particular business process or transaction. [20]
c. Text Based Storing:
Text based storing / method 1:
Store the entire document as text in the relational database, for example in a
Binary Large OBject (BLOB) column. This strategy is appropriate for document centric data.
All leading RDBMS vendors support storing the entire document (Microsoft SQL
Server 2000, Oracle8i, IBM DB2). This method is simple, but it does not allow flexible
indexing and searching of the data, because the database sees the document as one
opaque value. [2]
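The trade-off of method 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: SQLite stands in for the relational database, and the table name xml_docs and both function names are hypothetical. The order document is the data centric example from section 2.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The data centric order document from section 2, kept as one string.
ORDER_XML = (
    "<order><customer>Meyer</customer><position>"
    "<isbn>1-234-56789-0</isbn><number>2</number>"
    '<price currency="Euro">30.00</price></position></order>'
)


def store_whole_document(conn, doc_id, xml_text):
    """Store the unparsed document in a single text column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xml_docs "
        "(doc_id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT INTO xml_docs (doc_id, body) VALUES (?, ?)", (doc_id, xml_text)
    )


def customers_by_scanning(conn):
    """The database cannot see inside `body`, so every document is
    fetched and parsed by the application to answer a simple question."""
    return [
        ET.fromstring(body).findtext("customer")
        for (body,) in conn.execute("SELECT body FROM xml_docs ORDER BY doc_id")
    ]


conn = sqlite3.connect(":memory:")
store_whole_document(conn, 1, ORDER_XML)
print(customers_by_scanning(conn))
```

Storing is a single INSERT, which is the simplicity the text mentions. The cost shows up in customers_by_scanning: no index on the customer name is possible without parsing every stored document.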
Text based storing / method 2:
Store the entire document in the file system, with a pointer to that file stored in the
database. This method is useful if the XML documents are few and infrequently
updated, and it is supported by the leading database vendors. But it has two
problems: (1) it is not very flexible for storing and retrieving data, and (2) it raises
security concerns, since the files are stored outside the database. [2]
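A minimal Python sketch of method 2 follows. It assumes SQLite as the database and a temporary directory as the file store. The table name xml_files and the function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_with_pointer(conn, doc_dir, name, xml_text):
    """Write the document to the file system and record only its path."""
    path = Path(doc_dir) / name
    path.write_text(xml_text, encoding="utf-8")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xml_files (name TEXT PRIMARY KEY, path TEXT)"
    )
    conn.execute(
        "INSERT INTO xml_files (name, path) VALUES (?, ?)", (name, str(path))
    )
    return path


def load_by_name(conn, name):
    """Follow the stored pointer; returns None if the row or file is missing."""
    row = conn.execute(
        "SELECT path FROM xml_files WHERE name = ?", (name,)
    ).fetchone()
    if row is None or not Path(row[0]).exists():
        return None
    return Path(row[0]).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as doc_dir:
    conn = sqlite3.connect(":memory:")
    store_with_pointer(
        conn, doc_dir, "movie.xml", "<movie><title>Insomnia</title></movie>"
    )
    print(load_by_name(conn, "movie.xml"))
```

The check for a missing file in load_by_name reflects the weakness named in the text: the file lives outside the database, so it can be changed or deleted without the database noticing.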
4. Mapping XML to Relational Model:
The most widely used strategy for storing XML in a relational database is to map
the XML document structure to the database, which means that the XML data is stored
in relational tables (called side tables). This method allows easy search, update and
retrieval of the data stored in the database.
The problem is how to do the mapping between XML and the database, given that
the relational model is based on tables, rows, columns, primary keys and foreign keys,
whereas XML is built on a different principle: a hierarchy of nested elements.
The differences between XML and relational databases can be summarised as follows:
[1]
Therefore the mapping between XML and relational model can be achieved as
follows:
[2]
Mapping Example: [15]
<SalesOrder>
<Number>1234</Number>
<Customer>Gallagher Industries</Customer>
<Date>29.10.00</Date>
<Item Number="1">
<Part>A-10</Part>
<Quantity>12</Quantity>
<Price>10.95</Price>
</Item>
<Item Number="2">
<Part>B-43</Part>
<Quantity>600</Quantity>
<Price>3.99</Price>
</Item>
</SalesOrder>
This XML document can be mapped into the following tables:
SaleOrders
----------
Number   Customer               Date
------   --------------------   --------
1234     Gallagher Industries   29.10.00
...      ...                    ...

Items
-----
SONumber   Item   Part   Quantity   Price
--------   ----   ----   --------   -----
1234       1      A-10   12         10.95
1234       2      B-43   600        3.99
...        ...    ...    ...        ...
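The mapping in this example can be sketched in a few lines of Python. This is an illustration under assumptions, not the method of [15]: SQLite stands in for the relational database, the column types are guesses, and the function name shred_sales_order is hypothetical. The table and column names follow the tables above.

```python
import sqlite3
import xml.etree.ElementTree as ET

SALES_ORDER = """
<SalesOrder>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</SalesOrder>
"""


def shred_sales_order(conn, xml_text):
    """Cut one SalesOrder document into a parent row and its child rows."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS SaleOrders (
            Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT);
        CREATE TABLE IF NOT EXISTS Items (
            SONumber INTEGER REFERENCES SaleOrders(Number),
            Item INTEGER, Part TEXT, Quantity INTEGER, Price REAL,
            PRIMARY KEY (SONumber, Item));
        """
    )
    order = ET.fromstring(xml_text)
    number = int(order.findtext("Number"))
    # The root element becomes one row of the parent table.
    conn.execute(
        "INSERT INTO SaleOrders VALUES (?, ?, ?)",
        (number, order.findtext("Customer"), order.findtext("Date")),
    )
    # Each repeated <Item> becomes one child row; the order number is
    # copied down as the foreign key that replaces the XML nesting.
    for item in order.findall("Item"):
        conn.execute(
            "INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
            (
                number,
                int(item.get("Number")),
                item.findtext("Part"),
                int(item.findtext("Quantity")),
                float(item.findtext("Price")),
            ),
        )


conn = sqlite3.connect(":memory:")
shred_sales_order(conn, SALES_ORDER)
print(conn.execute("SELECT * FROM Items ORDER BY Item").fetchall())
```

The parent/child nesting of the document is replaced by the SONumber foreign key, which is the core of the element-to-table mapping.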
Problems with mapping:
There are several problems associated with the mapping between semi-structured
documents and relational tables. First, the mapping between XML and the database is
not a natural one. Second, the XML documents have to be cut into pieces before they
can be inserted into the relational tables. Third, to query the database, tables have to
be joined, and with large tables this becomes time consuming. Finally, it takes time to
reassemble the entire XML document. [18]
The way to avoid these problems of the relational model is to use a native XML
database.
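The cost of restoring a document can be seen in a small Python sketch that rebuilds the SalesOrder document of the mapping example from its two tables. The assumptions are SQLite as the database, Price stored as text so that it round-trips unchanged, and hypothetical function names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_sample_db():
    """Create the two tables of the mapping example with their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SaleOrders (Number INTEGER PRIMARY KEY,
                                 Customer TEXT, Date TEXT);
        CREATE TABLE Items (SONumber INTEGER, Item INTEGER, Part TEXT,
                            Quantity INTEGER, Price TEXT);
        INSERT INTO SaleOrders VALUES (1234, 'Gallagher Industries', '29.10.00');
        INSERT INTO Items VALUES (1234, 1, 'A-10', 12, '10.95');
        INSERT INTO Items VALUES (1234, 2, 'B-43', 600, '3.99');
        """
    )
    return conn


def rebuild_sales_order(conn, number):
    """Join parent and child rows, then re-nest them as XML."""
    rows = conn.execute(
        "SELECT o.Customer, o.Date, i.Item, i.Part, i.Quantity, i.Price "
        "FROM SaleOrders o LEFT JOIN Items i ON i.SONumber = o.Number "
        "WHERE o.Number = ? ORDER BY i.Item",
        (number,),
    ).fetchall()
    if not rows:
        return None  # no such order
    order = ET.Element("SalesOrder")
    ET.SubElement(order, "Number").text = str(number)
    # Parent columns repeat on every joined row; take them once.
    ET.SubElement(order, "Customer").text = rows[0][0]
    ET.SubElement(order, "Date").text = rows[0][1]
    for _customer, _date, item_no, part, quantity, price in rows:
        if item_no is None:  # order without items (LEFT JOIN padding)
            continue
        item = ET.SubElement(order, "Item", Number=str(item_no))
        ET.SubElement(item, "Part").text = part
        ET.SubElement(item, "Quantity").text = str(quantity)
        ET.SubElement(item, "Price").text = price
    return ET.tostring(order, encoding="unicode")


print(rebuild_sales_order(make_sample_db(), 1234))
```

Even for this two-level document, one join and one re-nesting pass are needed. Each further level of nesting in the XML adds another table to the join.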
5. XML Enabled Databases:
Various Database vendors have developed efficient solutions for automatic conversions of
XML into and out of relational databases: Microsoft's SQL Server 2000, Oracle 8i, Oracle
9i and IBM DB2.
The tools and techniques offered to achieve retrieval and storage of the XML data vary.
a. Microsoft SQL Server 2000:
SQL Server 2000 introduced several features for the storage and retrieval of
XML documents:
- SELECT statement result sets can be returned as XML documents by using the
FOR XML clause.
- XML documents can be stored into the database using OPENXML. [2]
Retrieving XML documents from the database using FOR XML:
In general, the SQL code that generates an XML document from the database is:
SELECT select_list
FROM table_source
WHERE search_condition
FOR XML AUTO | RAW | EXPLICIT [, XMLDATA] [, ELEMENTS] [, BINARY BASE64]
FOR XML – example: [12]
Assume we have these two tables in the SQL database:
Orders:
OrderID   OrderDate
-------   -------------------
10248     1996-07-04T00:00:00
10249     1997-07-04T00:00:00
Order Details:
OrderID   ProductID   UnitPrice   Quantity
-------   ---------   ---------   --------
10248     11          14          12
10248     42          9.8         10
10248     72          34.8        5
10249     15          5           2
Using this SELECT statement we will construct an XML document that will contain
information about an order (create an invoice):
SELECT Invoice.OrderID InvoiceNo,
OrderDate,
ProductID,
UnitPrice Price,
Quantity
FROM Orders Invoice JOIN [Order Details] Item
ON Invoice.OrderID = Item.OrderID
WHERE Invoice.OrderID = 10248
FOR XML AUTO, ELEMENTS
The SELECT statement gives the following result:
<Invoice>
<InvoiceNo>10248</InvoiceNo>
<OrderDate>1996-07-04T00:00:00</OrderDate>
<Item>
<ProductID>11</ProductID>
<Price>14</Price>
<Quantity>12</Quantity>
</Item>
<Item>
<ProductID>42</ProductID>
<Price>9.8</Price>
<Quantity>10</Quantity>
</Item>
<Item>
<ProductID>72</ProductID>
<Price>34.8</Price>
<Quantity>5</Quantity>
</Item>
</Invoice>
- AUTO returns the result as nested elements. Each table in the FROM clause becomes an
element named after the table, or after its alias if one is given (this is how Orders
becomes Invoice in the example).
- RAW returns one XML element for each row in the resulting rowset, with the columns
as attributes.
- EXPLICIT lets the query itself define the shape of the XML tree, so elements and
attributes can be mixed freely.
- XMLDATA is optional and adds an inline XDR schema to the result.
- ELEMENTS, used with AUTO, returns the columns as sub-elements instead of attributes.
- BINARY BASE64 returns binary data (such as an image) in base64 encoding.
[12]
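The difference between the flat RAW shape and the nested AUTO, ELEMENTS shape can be imitated in Python. The sketch below only reproduces the shape of the output for the joined rows of order 10248. It is not SQL Server's algorithm, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

COLUMNS = ("InvoiceNo", "OrderDate", "ProductID", "Price", "Quantity")
# Joined rows for order 10248, as the SELECT in the example returns them.
ROWS = [
    ("10248", "1996-07-04T00:00:00", "11", "14", "12"),
    ("10248", "1996-07-04T00:00:00", "42", "9.8", "10"),
    ("10248", "1996-07-04T00:00:00", "72", "34.8", "5"),
]


def raw_shape(columns, rows):
    """RAW-like: one flat <row> element per row, columns as attributes."""
    return [ET.Element("row", dict(zip(columns, row))) for row in rows]


def auto_elements_shape(rows):
    """AUTO, ELEMENTS-like: parent columns appear once per invoice and
    each joined row adds one nested <Item> with sub-elements."""
    invoices = {}
    for invoice_no, order_date, product_id, price, quantity in rows:
        invoice = invoices.get(invoice_no)
        if invoice is None:
            invoice = ET.Element("Invoice")
            ET.SubElement(invoice, "InvoiceNo").text = invoice_no
            ET.SubElement(invoice, "OrderDate").text = order_date
            invoices[invoice_no] = invoice
        item = ET.SubElement(invoice, "Item")
        ET.SubElement(item, "ProductID").text = product_id
        ET.SubElement(item, "Price").text = price
        ET.SubElement(item, "Quantity").text = quantity
    return list(invoices.values())


for element in raw_shape(COLUMNS, ROWS) + auto_elements_shape(ROWS):
    print(ET.tostring(element, encoding="unicode"))
```

The RAW-like output repeats the invoice number and date on all three rows, while the nested output states them once, which is why the nested form reads like an invoice document.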
Storing XML documents into the database using OPENXML:
The storing is done in three steps:
1. Compiling the XML document into an internal DOM representation with the stored
procedure sp_xml_preparedocument, which returns an XML document handle.
2. Using OPENXML with that handle to expose the document as a rowset, which
ordinary SQL statements (SELECT, INSERT, UPDATE) can then read.
3. Removing the compiled XML document from memory using the stored procedure
sp_xml_removedocument. [3]
OPENXML – Example: [9]
DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
<OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
<OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
</Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
..
</Customer>
</ROOT>'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT statement that uses the OPENXML rowset provider.
-- The flag value 1 requests attribute-centric mapping.
SELECT *
FROM OPENXML (@idoc, '/ROOT/Customer', 1)
WITH (CustomerID varchar(10), ContactName varchar(20))
-- Release the memory held by the parsed document.
EXEC sp_xml_removedocument @idoc
The result set is:
CustomerID   ContactName
----------   --------------
VINET        Paul Henriot
LILAS        Carlos Gonzlez
OPENXML is used to store XML documents in the SQL database (for insert and
update).
OPENXML is an extension to Transact-SQL that provides a rowset view of an XML document.
The syntax:
OPENXML(idoc int [in], rowpattern nvarchar [in], [flags byte [in]])
[WITH (SchemaDeclaration | TableName)]
- idoc is the document handle of the internal representation of an XML document
created by calling sp_xml_preparedocument.
- rowpattern is the XPath pattern used to identify the nodes to be processed as rows.
- flags indicates the mapping that should be used between the XML data and the
relational rowset (1 for attribute-centric, 2 for element-centric). [9]
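The roles of the three OPENXML parameters can be imitated with a short Python sketch. It is an emulation under assumptions, not the SQL Server implementation. The function open_xml is hypothetical, it supports only the mapping flags 1 (attributes) and 2 (child elements), and the row pattern is written relative to the root (./Customer) because Python's ElementTree does not accept absolute paths on an element. The second customer is shortened to an empty element.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<ROOT>
  <Customer CustomerID="VINET" ContactName="Paul Henriot">
    <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
    </Order>
  </Customer>
  <Customer CustomerID="LILAS" ContactName="Carlos Gonzlez"/>
</ROOT>"""


def open_xml(root, row_pattern, columns, flags=1):
    """Return one tuple per node matched by row_pattern.

    flags=1 reads each column from an attribute of the node,
    flags=2 reads it from a child element of the node.
    """
    if flags not in (1, 2):
        raise ValueError("this sketch only supports flags 1 and 2")
    rows = []
    for node in root.findall(row_pattern):
        if flags == 1:
            rows.append(tuple(node.get(name) for name in columns))
        else:
            rows.append(tuple(node.findtext(name) for name in columns))
    return rows


# Parse once (the role of sp_xml_preparedocument), then read rows from it.
root = ET.fromstring(DOC)
rows = open_xml(root, "./Customer", ("CustomerID", "ContactName"))

# The rowset can feed an ordinary INSERT, which is the storing step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, ContactName TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows)
print(conn.execute("SELECT * FROM Customers").fetchall())
```

Changing the row pattern to ./Customer/Order/OrderDetail would produce one row per order detail from the same parsed document, which is why the document is parsed once and queried through a handle.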
SQL: mapping XML schema into DB schema:

One way to map an XML schema to a relational schema (and the opposite) is to
transform the XML schema into a mapping schema.
It is the mapping schema that allows the mapping of elements and attributes to tables
and columns and the retrieval of relational data as XML documents.
The XML View Mapper is a utility for SQL Server 2000 that allows the creation of
an XDR (XML-Data Reduced) schema from relational tables. It also allows the
creation of a mapping schema corresponding to the XML schema. XML View Mapper
can also generate an XDR schema from an XML document or from a DTD.
[21, 22]
NB: XDR schemas were created by Microsoft to be used with their products.
Example of mapping schema:
<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<ElementType name="Employee" sql:relation="Employees">
<AttributeType name="EmpID" />
<AttributeType name="FName" />
<AttributeType name="LName" />
<attribute type="EmpID" sql:field="EmployeeID" />
<attribute type="FName" sql:field="FirstName" />
<attribute type="LName" sql:field="LastName" />
</ElementType>
</Schema>
[21]
The annotations in the sql namespace (sql:relation and sql:field) are what make this
schema a mapping schema.
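How such annotations drive the mapping can be sketched in Python. The sketch reads the example schema, extracts the table name and the attribute-to-column pairs, and uses them to render table rows as XML. It is an illustration only: SQLite stands in for SQL Server, the function names are hypothetical, and the sample employee row is invented for the demonstration.

```python
import sqlite3
import xml.etree.ElementTree as ET

XDR = "{urn:schemas-microsoft-com:xml-data}"
SQL = "{urn:schemas-microsoft-com:xml-sql}"

MAPPING_SCHEMA = """<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <ElementType name="Employee" sql:relation="Employees">
    <AttributeType name="EmpID" />
    <AttributeType name="FName" />
    <AttributeType name="LName" />
    <attribute type="EmpID" sql:field="EmployeeID" />
    <attribute type="FName" sql:field="FirstName" />
    <attribute type="LName" sql:field="LastName" />
  </ElementType>
</Schema>"""


def read_mapping(schema_text):
    """Return (element name, table name, [(xml attribute, column), ...])."""
    element_type = ET.fromstring(schema_text).find(XDR + "ElementType")
    pairs = [
        (attr.get("type"), attr.get(SQL + "field"))
        for attr in element_type.findall(XDR + "attribute")
    ]
    return element_type.get("name"), element_type.get(SQL + "relation"), pairs


def table_as_xml(conn, schema_text):
    """Render every row of the mapped table as one XML element."""
    element, table, pairs = read_mapping(schema_text)
    column_list = ", ".join(column for _, column in pairs)
    # Identifiers come from the schema, so the schema must be trusted input.
    cursor = conn.execute(f"SELECT {column_list} FROM {table}")
    return [
        ET.tostring(
            ET.Element(element, {n: str(v) for (n, _), v in zip(pairs, row)}),
            encoding="unicode",
        )
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT)"
)
# Invented sample row, for illustration only.
conn.execute("INSERT INTO Employees VALUES (1, 'Nancy', 'Davolio')")
print(read_mapping(MAPPING_SCHEMA))
print(table_as_xml(conn, MAPPING_SCHEMA))
```

The XML names (Employee, EmpID, FName, LName) and the relational names (Employees, EmployeeID, FirstName, LastName) differ, and only the schema connects them. Neither the table nor the XML consumer has to change when the other side is renamed.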
b. Oracle 8i - 9i:
Oracle offers the XML SQL Utility (XSU), a set of Java classes that:
– model XML document elements as a collection of nested tables and allow
insert, update and delete.
– generate XML documents from SQL query results or a JDBC result set
object. [8]
[2]
Extracting XML documents from the database:
We want to get the result set of the following SQL statement into an XML document:
SELECT Title, Author, Publisher, Year, ISBN
FROM BookList WHERE BookID = 1234
The XML SQL Utility can also generate a DTD based on the schema of the underlying
table being queried.
The following simple code creates an XML document that contains the
result set of the query:
import java.sql.*;
import java.math.*;
import oracle.xml.sql.query.*;
import oracle.jdbc.*;
import oracle.jdbc.driver.*;

public class read_sample
{
    public static void main(String args[]) throws SQLException
    {
        String user = "scott/tiger";
        DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
        // init a JDBC connection by passing in the user
        Connection conn =
            DriverManager.getConnection("jdbc:oracle:oci8:" + user + "@");
        // init the OracleXMLQuery by using the initialized JDBC connection
        // and passing in the query on the Booklist table
        OracleXMLQuery qry = new OracleXMLQuery(conn,
            "select * from Booklist WHERE BookID = 1234");
        // get the XML document in the string format which allows
        // us to print it
        String xmlString = qry.getXMLString();
        // print out the result to the screen
        System.out.println(" OUTPUT IS:\n" + xmlString);
        // close the query object and the JDBC connection
        qry.close();
        conn.close();
    }
}
[13]
The query results in the creation of the following XML document:
<?xml version="1.0"?>
<ROWSET>
<ROW id="1">
<TITLE>Oracle 9i</TITLE>
<AUTHOR>Mike Wilson</AUTHOR>
<PUBLISHER>William Morrow and Co.</PUBLISHER>
<YEAR>1997</YEAR>
<ISBN>0688149251</ISBN>
</ROW>
</ROWSET>
[13]
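The ROWSET/ROW layout of this output can be produced from any query result with a generic loop. The Python sketch below imitates only the layout shown above. It does not use XSU, SQLite stands in for Oracle, and the function name query_to_rowset is hypothetical. The book data is taken from the output above.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_rowset(conn, sql, params=()):
    """Wrap a query result as <ROWSET><ROW id="n"><COLUMN>value..."""
    cursor = conn.execute(sql, params)
    # Column names of the result set become the element names.
    names = [description[0].upper() for description in cursor.description]
    rowset = ET.Element("ROWSET")
    for number, row in enumerate(cursor, start=1):
        row_element = ET.SubElement(rowset, "ROW", id=str(number))
        for name, value in zip(names, row):
            if value is not None:  # NULL columns are left out in this sketch
                ET.SubElement(row_element, name).text = str(value)
    return ET.tostring(rowset, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BookList (BookID INTEGER, Title TEXT, Author TEXT, "
    "Publisher TEXT, Year INTEGER, ISBN TEXT)"
)
conn.execute(
    "INSERT INTO BookList VALUES (1234, 'Oracle 9i', 'Mike Wilson', "
    "'William Morrow and Co.', 1997, '0688149251')"
)
print(
    query_to_rowset(
        conn,
        "SELECT Title, Author, Publisher, Year, ISBN "
        "FROM BookList WHERE BookID = ?",
        (1234,),
    )
)
```

Because the element names come from the result set metadata, the same function works for any table, which matches the generic nature of the mapping: no per-table code is needed.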
Storing XML documents into the database:
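Storing the XML document into the database uses the following instructions:
// Init the OracleXMLSave class with the connection and the target table name:
OracleXMLSave sav = new OracleXMLSave(conn, "Booklist");
// Insert the rows described by the XML file, then release the resources:
sav.insertXML(sav.getURL("simpledoc.xml"));
sav.close();
[13]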
Remark: the XML SQL Utility (XSU) does not allow the storage of attributes. These have
to be transformed into elements first. [3]
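The transformation mentioned in this remark can be done before the document is handed to the storage tool. The following Python sketch shows one way to do it. The function name is hypothetical, and the sketch assumes that no attribute shares its name with an existing child element.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(element):
    """Replace every attribute in the tree by a leading child element."""
    # Recurse first, so that the children created below are not revisited.
    for child in list(element):
        attributes_to_elements(child)
    for position, (name, value) in enumerate(list(element.attrib.items())):
        new_child = ET.Element(name)
        new_child.text = value
        element.insert(position, new_child)
    element.attrib.clear()
    return element


item = ET.fromstring('<Item Number="1"><Part>A-10</Part></Item>')
print(ET.tostring(attributes_to_elements(item), encoding="unicode"))
# prints <Item><Number>1</Number><Part>A-10</Part></Item>
```

After this step the Number value is an ordinary element and can be mapped to a column like any other. The information that it was originally an attribute is lost, so the original document cannot be restored exactly.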
Oracle: mapping XML schema into DB schema:
Oracle9i Release 2 has a new feature called the XML DB Repository. This repository
allows XML documents to be stored directly in the Oracle9i Database, provided that an XML
schema is available. Once the XML schema is registered, the XML-to-relational mapping
is done automatically. After the XML data has been stored in the relational database, an
XML document can be restored with the same DOM representation as the original.
[23]
Oracle 9i – XML Type: [24]
Oracle9i Database implements a number of standards-based functions to query
relational data and return XML documents: XMLType, XMLELEMENT…
XMLType is a datatype that holds XML data.
An XMLType view contains the result of an SQL query in the form of an XML document.

Syntax:
CREATE OR REPLACE VIEW STUDENT
OF XMLType WITH …
c. IBM DB2: [11]
IBM DB2 offers the DB2 XML Extender. It allows the storage of XML documents in two
ways:
– XML Column: allows the storage and retrieval of the entire XML document
as column data.
– XML Collection: decomposes/composes the XML document into/from a
collection of relational tables.
Document Access Definition:
DB2 XML Extender provides a mapping scheme called a Document Access Definition
(DAD). A DAD is a file that specifies how an XML document is mapped to relational data,
using either XML columns or XML collections, based on the DTDs.
A unique feature of DB2 is the ability to manage and index XML documents located
within the file system, a single column, or spread across multiple tables and columns.
6. Conclusion:
Storing XML in the relational model is a large and very important topic. We saw that
there are several techniques for storing XML documents, among them text based
storage and the mapping between XML and the relational model.
The text based approach is simple, but it does not allow very flexible search and update.
Mapping XML into the relational model is the most popular way to store XML. It allows
flexible manipulation and search of data, but it has a few problems due to the differences
between the structure of XML and that of relational data. Despite these problems, I think that
this method is the most appropriate for XML storage, especially for data centric
documents.
Oracle 9i, Oracle 8i, Microsoft SQL Server 2000 and IBM DB2 all support storage of
XML documents.
Oracle uses the XML SQL Utility (XSU), SQL Server uses OPENXML and the FOR XML
clause in SQL statements, and DB2 uses the XML Extender with DAD files.
I think the leading database is Oracle 9i, which has added very important features that
make the mapping easier.
7. References:
1. www.eaijournal.com/PDF/StoringXMLChampion.pdf
2. www.acm.org/crossroads/xrds8-4/XML_RDBMS.html
3. http://www.xml.com/pub/a/2001/06/20/databases.html
4. http://www.w3.org/XML/RDB.html
5. http://www.infoloom.com/gcaconfs/WEB/granada99/noe.HTM
6. http://www.hitsw.com/products_services/whitepapers/integrating_xml_rdb/
7. http://www.utdallas.edu/~lkhan/papers/APESXDRD_ProcACM3rdWIDM2001.pdf
8. http://www.xml.com/pub/r/846
9. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_oaoz_5c89.asp
10. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsql/ac_openxml_94mk.asp
11. Db2XMLeXtender.pdf
12. www.microsoft.com/mspress/books/sampchap/5178a.asp
13. www.devx.com/assets/download/4513.pdf
14. http://nike.psu.edu/classes/ist597/2003-fall/papers/TOIT2001-authorCopy.pdf
15. http://www.xml.com/pub/a/2001/05/09/dtdtodbs.html?page=1
16. http://www.rpbourret.com/xml/ProdsNative.htm
17. Course slides.
18. www.csis.hku.hk/~dbgroup/seminar/wlian020308.ppt
19. http://nike.psu.edu/classes/ist597/2003-fall/papers/deutsch98storing.pdf
20. http://www.nhm.ac.uk/science/rco/enhsin/ENHSIN_Caching.pdf
21. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsql/ac_mschema_5cfn.asp
22. http://www.databasejournal.com/features/mssql/article.php/10894_2235451_2
23. http://otn.oracle.com/oramag/webcolumns/2003/techarticles/scardina_xmldb.html
24. http://otn.oracle.com/oramag/oracle/03-may/o33xml.html