XML Today
assoc. prof. Vladimir Dimitrov
Faculty of Mathematics and Informatics, Sofia University
POB 1829, Sofia 1000, Bulgaria
[email protected]
Abstract
The state of the art of XML usage in information
systems is discussed. XML products are
classified into the following categories:
Middleware, XML-Enabled Databases, Native
XML Databases, XML Servers, Wrappers,
Content Management Systems, XML Query
Engines, and XML Data Binding. DBMS-based
ways of incorporating XML are
analyzed.
Introduction
XML is designed for:
• Document interchange between different
systems (local or remote);
• Structured data export/import from
databases;
• Textual databases (semi-structured and
unstructured documents) used for full text
retrieval.
XML products can be classified into the following categories:

Middleware. This software is used by data-centric applications to transfer data between
XML documents and databases;

XML-Enabled Databases. Database systems extended for data transfer between XML
documents and the database. They are used in data-centric applications;

Native XML Databases. Database systems that store XML in "native" form, which can be
some variant of the DOM mapped to an underlying persistent data store. They are used in
data- and document-centric applications;

XML Servers. These are XML-aware J2EE servers, Web application servers, integration
engines, and custom servers. These servers can be used for distributed applications or
simply for publishing XML documents on the Web. They are used in data- and
document-centric applications;

Wrappers. This kind of software treats XML documents as a source of relational data.
Typically, wrappers support SQL for querying XML documents. They are used in data-centric
applications;

Content Management Systems. These are applications that support content/document
management. They can be implemented with XML databases or directly on the file system.
Usually they include features such as check-in/check-out, versioning, and editors. They are
used in document-centric applications;

XML Query Engines. These are standalone engines that support querying of XML
documents. They are used in data- and document-centric applications;

XML Data Binding. Products that bind XML documents to objects. They can also
persist objects to the database. They are used in data-centric applications.
Middleware
Middleware software is used by data-centric applications to transfer data between XML documents and
databases. It usually runs in the process space of the application and usually accesses data in relational
databases using ODBC, JDBC, or OLE DB.
Some examples of this kind of software are:

ADO from Microsoft. ADO can save Recordset objects as XML documents and can restore Recordset
objects from XML documents. In this case, Recordset objects are used to transfer data between XML
documents and databases. An ADO XML document is divided into two parts: the first maps the XML data in
the second part into the Recordset. This mapping is described with an annotated version of XML-Data
Reduced. In ADO, an XML tree of nested elements is represented as a tree of nested Recordsets and vice
versa. Updates, deletes, or inserts are flagged in the XML document with ADO-specific tags.

Delphi from Borland. It is an application development tool that supports the transfer of data between XML
documents and databases through the use of client data sets. All of the data in a client data set is local to
the client. Client data sets can be bound to databases or XML documents, so they act as a
mediator between XML documents and databases. A client data set is mapped to an XML document, and
vice versa, via tables. With client data sets it is possible to emulate an object-relational mapping.

XML SQL Utility for Java, XSQL Servlet from Oracle. XML SQL Utility for Java is a set of Java classes
for transferring data between a relational database and an XML document. They can be used through the
provided front ends or in a user-written application. If the database system supports SQL 3 object views,
the product uses an object-relational mapping; otherwise it uses a table-based mapping for a single
table. The XML SQL Utility for Java accepts XML documents or DOM Documents. It returns results as
XML documents, DOM Documents, or SAX2 events and may include inline XML Schemas. The XML
SQL Utility for Java supports updates and deletes. XSQL Servlet is a Java servlet that uses the XML SQL
Utility for Java.
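The table-based mapping mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the products listed; it assumes an in-memory SQLite database, a hypothetical table named book, and a simple convention in which the root element carries the table name, every row becomes a row element, and every column becomes a child element. The helper names table_to_xml and xml_to_table are illustrative only.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export every row of `table` as <table><row><column>value</column></row></table>."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        row_elem = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            if value is not None:  # NULL columns are simply omitted
                ET.SubElement(row_elem, name).text = str(value)
    return root


def xml_to_table(conn, root):
    """Insert each <row> into the table named after the root element."""
    for row_elem in root.findall("row"):
        names = [child.tag for child in row_elem]
        if not names:
            continue  # nothing to insert for an empty row element
        column_list = ", ".join(f'"{name}"' for name in names)
        marks = ", ".join("?" for _ in names)
        conn.execute(
            f'INSERT INTO "{root.tag}" ({column_list}) VALUES ({marks})',
            [child.text for child in row_elem],
        )


# Hypothetical sample data: export from one database, import into another.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE book (id INTEGER, title TEXT)")
source.executemany("INSERT INTO book VALUES (?, ?)",
                   [(1, "XML Basics"), (2, "SQL Basics")])
exported = table_to_xml(source, "book")
xml_text = ET.tostring(exported, encoding="unicode")

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE book (id INTEGER, title TEXT)")
xml_to_table(target, ET.fromstring(xml_text))
rows = target.execute("SELECT id, title FROM book ORDER BY id").fetchall()
```

The sketch handles a single flat table only. Nested elements would require an object-relational mapping of the kind the products above provide.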
XML-Enabled Databases
XML-enabled database systems have extensions for transferring data between XML documents and their own databases. They are used by
data-centric applications.
Some examples of this kind of software are:

Access 2003 from Microsoft. It transfers data to/from XML documents using a table-based mapping. Individual data values must be in
child elements (attributes are ignored) and table/column names must match element names. Access 2003 can create an XML Schema
document describing exported data.

DB2 from IBM. DB2 supports XML in the base DB2 product (as publishing functions in SQL/XML), in the XML Extender and Text
Extender, and in its Web services framework. The DB2 XML Extender (DB2 UDB Extender) can store XML documents in columns of type
VARCHAR, CLOB, or files using the XMLVARCHAR, XMLCLOB, or XMLFILE user-defined types or in XML collections. A Data Access
Definition (DAD) file allows one or more elements/attributes to be indexed. XML collections map non-XML data to an XML document
according to a DAD document. There are two different mappings: SQL mapping and RDB node mapping. SQL mapping uses templates to
specify where the results should be placed. RDB node mapping is an object-relational mapping and can be used to transfer data both to
and from the database. A visual tool is provided for constructing DAD documents - that is, mapping elements and attributes to tables and
columns. Applications use stored procedures to invoke the XML Extender. The XML Extender manages DAD documents and DTDs in its
own tables. The XML Extender can send XML documents to and retrieve XML documents from MQSeries message queues, validate XML
documents against XML Schemas or DTDs, transform XML documents with XSLT, copy XML documents between files and the
database, and extract values from XML documents. The DB2 Text Extender supports many search technologies, such as fuzzy searches,
synonym searches, and searches by sentence or paragraph. DB2 WORF uses DADX documents to define Web services. DADX
documents extend the functionality of DAD documents and describe how a Web service accesses data in the database. Supported
functionality includes storing and retrieving documents with the XML Extender, executing SQL queries, and calling stored procedures.
DB2 WORF can also generate WSDL documents from DADX documents.

FoxPro from Microsoft. Visual FoxPro transfers data between an XML document and a FoxPro table with three functions: CURSORTOXML,
XMLTOCURSOR, and XMLUPDATEGRAM. CURSORTOXML and XMLTOCURSOR use a table-based mapping. Column data can be
represented either as attributes or as child elements. FoxPro can use an XML Schema to determine the mapping. If no XML Schema is
present, FoxPro analyzes the XML document to determine its structure and to construct the mapping. FoxPro
generates an inline schema when transferring data from the database to XML.

Informix from IBM. Informix supports XML through its Object Translator and through the Web DataBlade. In Object Translator, XML
support is provided through generated methods that transfer data between objects and XML documents. A GUI tool can be used to create
object-relational mappings from XML documents to the database. The Web DataBlade is an application that creates XML documents from
templates containing embedded SQL statements and other scripting language commands.

Oracle 8i, Oracle 9i XDB from Oracle. Oracle 9i XDB supports both XML-enabled and native storage of XML data. It blurs the
boundaries between relational data and XML data by providing SQL features (implemented at the engine level) that allow users to view
relational data as XML and XML data as relational. The main feature is the XMLType data type. This is a predefined object type that can
store an XML document. Like any object type, XMLType can be used as the data type of a column in a table or view. The latter usage is
important, as it means that an XML "view" - a virtual XML document - can be constructed over any data, regardless of whether it is
relational data or XML data. A number of operators have been added to SQL to help view XML data as relational data and vice versa.
XMLType data can be stored in either of two ways: with object-relational storage or as a CLOB. The storage options are interchangeable
and XML applications use the same code regardless of which option is chosen. XMLType data can be accessed in several ways. Java
Beans (which can be generated from an XML Schema) can be used when the data uses object-relational storage. The DOM can be used
regardless of the storage option. (The DOM implementation populates nodes lazily for better concurrency.) Both methods can cache
changes and store them later with a call to XMLType.save(). In addition, data can be accessed by executing SQL statements that use the
operators mentioned earlier. The other major feature of XDB is the XML Repository. This provides a file system-like view of XMLType
objects in the database. That is, XMLType objects (regardless of whether they actually contain XML data or are just XML views over
relational data) can be assigned a path and corresponding URL in the repository hierarchy. These can then be accessed via WebDAV,
FTP, JNDI, and SQL; the latter has special operators for this purpose. In addition, the repository maintains properties for each object,
such as owner, modification date, version, and access control.
SQL Server 2000 from Microsoft. Microsoft SQL Server 2000 supports XML in three ways: the FOR XML clause in SELECT
statements, XPath queries that use annotated XML-Data Reduced schemas, and the OpenXML function in stored procedures. SELECT
statements and XPath queries can be submitted via HTTP, either directly or in a template file. The FOR XML clause has three options
(RAW, AUTO, and EXPLICIT), which specify how the SELECT statement is mapped to XML. RAW models the result set as a table, with one element (named "row")
returned for each row. Columns can be returned either as attributes or child elements. AUTO is the same as RAW, except that: 1) the row
elements are named after the table, and 2) the resulting XML is nested in a linear hierarchy in the order in which tables appear
in the select list. EXPLICIT lets the query itself define the shape of the XML tree. Annotated XML-Data Reduced schemas contain extra attributes that map elements and attributes to tables and columns.
These specify an object-relational mapping between the XML document and the database, and are used to query the database using a
subset of XPath. A tool exists to construct mapping schemas graphically. The OpenXML function uses a table-based mapping to extract
any part of an XML document as a table and use it in most places a table name can be used, such as the FROM clause of a SELECT
statement. This can be used in conjunction with an INSERT statement to transfer data from an XML document to the database. An XPath
expression identifies the element or attribute that represents a row of data. Additional XPath expressions identify the related elements,
attributes, or PCDATA that comprise the columns in each row, such as the children of the row element.
Sybase ASE 12.5 from Sybase. Sybase supports XML in two ways. First, the ResultSetXml class can transfer data between an XML
document and the database. A ResultSetXml object can be created from an XML document or a SELECT statement. Among other things,
applications can modify the data in a ResultSetXml object, serialize the data to an XML document, or create an SQL script to create a
table for the data and store the data in the database. The XML document used by ResultSetXml has a proprietary format that contains a
set of ColumnMetaData elements followed by a set of Row and Column elements. Sybase also has native XML capabilities. It can store
XML documents in a pre-parsed, indexed form in BLOB columns. These can then be queried with XQL.
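The nesting rule described above for the AUTO option of FOR XML in SQL Server 2000 (row elements named after the table, nested in the order in which tables appear in the select list) can be approximated with the following Python sketch. It is a rough emulation under stated assumptions, not SQL Server's actual algorithm: the tables customer and orders, the helper nest_rows, and the wrapping root element are all hypothetical, and columns are always emitted as attributes.

```python
import sqlite3
import xml.etree.ElementTree as ET


def nest_rows(rows, levels):
    """Nest a flat, ordered join result into a linear element hierarchy.

    `levels` lists (table_name, [(column_name, index_in_row), ...]) in
    select-list order. Adjacent rows that share the same values at an outer
    level reuse the element already created for that level.
    """
    root = ET.Element("root")
    open_path = []  # (key, element) per depth for the most recent row
    for row in rows:
        parent = root
        for depth, (table, columns) in enumerate(levels):
            key = tuple(row[index] for _, index in columns)
            is_leaf = depth == len(levels) - 1
            if (not is_leaf and depth < len(open_path)
                    and open_path[depth][0] == key):
                parent = open_path[depth][1]  # same outer row: reuse it
                continue
            del open_path[depth:]  # outer value changed: close deeper levels
            attributes = {name: str(row[index]) for name, index in columns}
            parent = ET.SubElement(parent, table, attributes)
            open_path.append((key, parent))
    return root


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 5), (3, 2, 7)])
joined_rows = conn.execute(
    "SELECT customer.name, orders.id, orders.total "
    "FROM customer JOIN orders ON orders.customer_id = customer.id "
    "ORDER BY customer.name, orders.id"
).fetchall()

auto_root = nest_rows(
    joined_rows,
    [("customer", [("name", 0)]), ("orders", [("id", 1), ("total", 2)])],
)
auto_xml = ET.tostring(auto_root, encoding="unicode")
```

Each customer element ends up containing one orders element per joined row, which is the linear hierarchy the text describes. The result depends on the rows arriving grouped by the outer table, hence the ORDER BY.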
Native XML Databases
A native XML database is one that:
• Defines a (logical) model for an XML document - as opposed to
the data in that document - and stores and retrieves documents
according to that model. At a minimum, the model must include
elements, attributes, PCDATA, and document order. Examples of
such models are the XPath data model, the XML Infoset, and the
models implied by the DOM and the events in SAX 1.0.
• Has an XML document as its fundamental unit of (logical)
storage, just as a relational database has a row in a table as its
fundamental unit of (logical) storage.
• Is not required to have any particular underlying physical storage
model. For example, it can be built on a relational, hierarchical,
or object-oriented database, or use a proprietary storage format
such as indexed, compressed files.
Native XML databases fall into two broad categories:
Text-based storage. Store the entire document in text form and
provide some sort of database functionality in accessing the
document. A simple strategy for this might store the document as
a BLOB in a relational database or as a file in a file system and
provide XML-aware indexes over the document. A more
sophisticated strategy might store the document in a custom,
optimized data store with indexes, transaction support, and so
on.
Model-based storage. Store a binary model of the document
(such as the DOM or a variant thereof) in an existing or custom
data store. For example, this might map the DOM to relational
tables such as Elements, Attributes, and Entities or store the
DOM in pre-parsed form in a data store written specifically for
this task. This includes the category formerly known as
"Persistent DOM Implementations".
There are two major differences between the two strategies. First, text-based storage can exactly round-trip the document, down to such
trivialities as whether single or double quotes surround attribute values.
Model-based storage can only round-trip documents at the level of the
underlying document model. This should be adequate for most
applications but applications with special needs in this area should
check to see exactly what the model supports.
The second major difference is speed. Text-based storage obviously has
the advantage in returning entire documents or fragments in text form.
Model-based storage probably has the advantage in combining
fragments from different documents, although this does depend on
factors such as document size, parsing speed (for text-based storage),
and retrieval speed (for model-based storage). Whether it is faster to
return an entire document as a DOM tree or SAX events probably
depends on the individual database, again with parsing speed
competing against retrieval speed.
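The difference in round-tripping between the two strategies can be made concrete with a small Python sketch. It assumes an in-memory SQLite database with two hypothetical tables, elements and attributes (the Entities table mentioned above is omitted), and it uses Python's ElementTree as the document model. ElementTree's default parser discards comments, which is a property of this particular model rather than of model-based storage in general.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE elements (id INTEGER PRIMARY KEY, parent_id INTEGER,
                       position INTEGER, tag TEXT, text TEXT, tail TEXT);
CREATE TABLE attributes (element_id INTEGER, name TEXT, value TEXT);
"""


def store(conn, elem, parent_id=None, position=0):
    """Store one element, its attributes, and its children (in document order)."""
    cursor = conn.execute(
        "INSERT INTO elements (parent_id, position, tag, text, tail) "
        "VALUES (?, ?, ?, ?, ?)",
        (parent_id, position, elem.tag, elem.text, elem.tail),
    )
    elem_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO attributes VALUES (?, ?, ?)",
        [(elem_id, name, value) for name, value in elem.attrib.items()],
    )
    for index, child in enumerate(elem):
        store(conn, child, elem_id, index)
    return elem_id


def load(conn, elem_id):
    """Rebuild an element tree from the two tables."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM elements WHERE id = ?", (elem_id,)
    ).fetchone()
    attrs = dict(conn.execute(
        "SELECT name, value FROM attributes WHERE element_id = ? ORDER BY rowid",
        (elem_id,),
    ).fetchall())
    elem = ET.Element(tag, attrs)
    elem.text, elem.tail = text, tail
    children = conn.execute(
        "SELECT id FROM elements WHERE parent_id = ? ORDER BY position",
        (elem_id,),
    ).fetchall()
    for (child_id,) in children:
        elem.append(load(conn, child_id))
    return elem


# Hypothetical document with single-quoted attributes and a comment.
source = ("<order id='7'><!-- rush --><item sku='A1'>pen</item>"
          "<item sku='B2'>ink</item></order>")

text_copy = source  # text-based storage keeps the characters as they are

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
root_id = store(conn, ET.fromstring(source))
model_copy = ET.tostring(load(conn, root_id), encoding="unicode")
```

Here text_copy is identical to the input, while model_copy contains the same elements, attributes, and text but not the original quote characters or the comment. That is the model-level round trip the text refers to.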
Native XML databases differ from XML-enabled databases in three
main ways:
• Native XML databases can preserve physical structure (entity
usage, CDATA sections, etc.) as well as comments, PIs, DTDs,
etc. While XML-enabled databases can do this in theory, this is
generally not done in practice.
• Native XML databases can store XML documents without
knowing their schema (DTD), assuming one even exists.
Although XML-enabled databases could generate schemas on
the fly, this is impractical, especially when dealing with
schema-less documents.
• The only interface to the data in native XML databases is XML
and related technologies, such as XPath, the DOM, or an XML-specific
API. XML-enabled databases, on the other hand, offer
direct access to the data, such as through ODBC.
Some examples of this kind of software are:

Berkeley DB XML from Sleepycat Software. Berkeley DB XML is an application-specific native XML data manager built on
Berkeley DB. Berkeley DB XML provides storage and retrieval for native XML data and semi-structured data. Berkeley DB
XML is supplied as a library that links directly into the application's address space. This eliminates bottlenecks that occur in
client-server systems. APIs are available in a number of languages, including C++, Java, Python, Perl, Ruby, and Tcl.
Berkeley DB XML stores XML documents in collections. A single application may operate on many collections at the same
time. A single application may also combine data from different collections easily. Non-XML data may be included by
creating standard Berkeley DB tables. Tables and collections may be used together, with full support for Berkeley DB
transactions and recovery services, by multiple users simultaneously. Berkeley DB XML enables fast look up by allowing
individual collections to be indexed differently. This allows Berkeley DB XML to speed up the common queries over particular
collections. Each collection supports multiple indexes. A wide variety of available indexing schemes support different XPath
queries efficiently. Berkeley DB XML's Query Processor implements XPath 1.0. A cost-based query optimizer considers the
indices that exist, the data volume that a query is likely to produce and the cost of computation and disk I/O to select a query
plan with the lowest run-time cost.

Lore from Stanford University. Semi-structured data is data with more structure than a conversation, but less structure
than a telephone book. A good example is a resume (curriculum vitae). While virtually all resumes include a name, address,
and telephone number, only some will include an email address, Web site, or FAX number. Most will include a list of
previous jobs, but others might include only a list of university courses. Depending on the profession, there might be a list of
software used or licenses held. XML is well-suited to storing semi-structured data and shares a feature common to many
semi-structured data models: it is self-describing. That is, it carries a certain amount of metadata with the data. In the case of
XML, this is in the form of element type and attribute names. The legality of well-formed documents mirrors another feature
found in many semi-structured data models: the data model is not required to have a definitive schema, and the model can
be extended at will by the addition of new fields. Lore is a database designed for storing semi-structured data. Although it
predates XML, it has recently been migrated for use as an XML database. It includes a query language (Lorel), multiple
indexing techniques, a cost-based query optimizer, multi-user support, logging, and recovery, as well as the ability to import
external data. Because Lore is designed for use with semi-structured data, XML documents without DTDs can be easily
stored. An interesting feature of Lore is a DataGuide, which is a "structural summary of all paths in the database". Unlike
structured databases, in which the structure is specified first and data is added according to that structure, data is entered
first into Lore and the structure is then summarized. The resulting information is useful for query processing. The Lore
executables are "available for public use". Source code may be available in some circumstances.
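The idea of summarizing structure after the data has been entered can be sketched in Python as follows. This is a heavily simplified stand-in for a DataGuide, not Lore's implementation: it only collects the set of distinct element and attribute paths found in a collection of documents. The sample resumes and the helper collect_paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def collect_paths(elem, prefix="", paths=None):
    """Add every distinct label path below `elem` to the set `paths`."""
    if paths is None:
        paths = set()
    path = f"{prefix}/{elem.tag}"
    paths.add(path)
    for name in elem.attrib:
        paths.add(f"{path}/@{name}")
    for child in elem:
        collect_paths(child, path, paths)
    return paths


# Two schema-less documents with different optional fields.
resumes = [
    "<resume><name>Ann</name><email>ann@example.org</email>"
    "<job><title>Editor</title></job></resume>",
    "<resume><name>Bob</name><course>Databases</course></resume>",
]

summary = set()
for document in resumes:
    collect_paths(ET.fromstring(document), paths=summary)
```

After both documents are loaded, summary lists every path that occurs anywhere in the collection. A query processor could consult it, for instance, to discard a query for a path that does not exist without touching the data.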

Tamino from Software AG. Tamino XML Server is a suite of products built in three layers - core
services, enabling services, and solutions (third-party applications) - which may be purchased in a variety
of combinations. Core services include a native XML database, an integrated relational database,
schema services, security, administration tools, and Tamino X-Tension, a service that allows users to
write extensions that customize server functionality. The XML engine uses the Data Map, which
describes where the data in a given XML document is stored. This allows individual XML documents to
be composed of data from multiple, heterogeneous sources, such as the native XML data store,
relational databases, and the file system. Since the connections to external data (made through the X-Node module) are live and bidirectional, Tamino may thus be used to perform heterogeneous joins and
updates. Tamino's XML support includes the DOM, JDOM, SAX, and XML:DB APIs, an extended XPath
implementation called X-Query (not to be confused with W3C XQuery, which it predates), full-text
retrieval, processing of XML documents with server-side XSL and CSS, and limited support for SOAP. It
can store schema-less documents and can use schema information (including a subset of XML
Schemas) if it is available. The internal SQL engine is directly addressable through ODBC, JDBC, and
OLE DB. However, when addressed via these APIs, it cannot integrate data from the internal XML data
store or from external data sources. Enabling services include X-Port, X-Plorer, X-Application, various
APIs, X-Node, and the WebDAV Server. X-Port provides URL-based data transfer through various
standard HTTP servers, X-Plorer is a browser-based navigation tool for documents stored in Tamino, and
X-Application is a set of JSP tags for accessing Tamino through Web pages. The WebDAV Server adds
namespace management, additional properties, and overwrite protection to the existing Tamino XML
Server functionality. This allows Tamino to serve as a virtual file system where the information can be
stored and retrieved using a standard Web browser and the common drag and drop metaphor. Tamino is
not built on top of Adabas, a hierarchical database from Software AG. Instead, the Tamino data store was
built from the ground up as a native XML database, obviously drawing on the knowledge gained from
developing Adabas.
XML Servers
XML servers are XML-aware J2EE servers, Web application
servers, integration engines, and custom servers. Unlike
middleware, XML servers usually run in a separate process
space from the application. Some XML servers are used to build
distributed applications, such as e-commerce and business-to-business applications, where XML serves as the data transport.
Others are used simply to publish XML documents to the Web.
XML servers often contain complete application development
environments and may provide access to data in a variety of data
stores, including legacy databases, email messages, and
application data.
Net.Data from IBM is a Web server add-on for transferring data
from a database to XML (or any text-based format). The product
uses templates with a Net.Data-specific macro language. This is
quite flexible, including variables, function definitions, loops, and
if statements, as well as being able to parameterize SQL
statements for nested queries.
Wrappers
Wrappers are systems that treat XML documents as a source of relational data.
(The term comes from federated database systems, where a wrapper is a
component that "wraps" a source system so its data uses the model (usually
relational) of a target system.) You can think of wrappers as the opposite of
XML-enabled databases. That is, with wrappers, XML data is treated as
relational data, while with XML-enabled databases, relational data is treated as
XML data.
Wrappers can be used in a variety of situations. One common use is so that data
from an XML document can be included in a heterogeneous join - that is, a
SELECT statement that joins data from different systems. Another common use
is for editing XML documents. Although this latter use might seem surprising, it
provides developers an easy and familiar way to modify XML documents that
are structured like a table.
Wrappers typically implement an SQL query engine, use an object-relational or
table-based mapping, and work only with data-centric documents.
DB2 Information Integrator from IBM and the OpenXML function in SQL Server from
Microsoft are examples of this kind of software.
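The use of a wrapper in a join can be illustrated with a short Python sketch. It is not based on either product named above. It assumes an in-memory SQLite database, a hypothetical stock table, and a hypothetical helper expose_as_table that copies a table-shaped XML document into a temporary table so that ordinary SQL can join it with relational data. A real wrapper would typically expose the document without copying it and could join across separate systems.

```python
import sqlite3
import xml.etree.ElementTree as ET


def expose_as_table(conn, table, root, row_path, columns):
    """Make the rows found at `row_path` queryable as a temporary SQL table."""
    column_list = ", ".join(columns)
    conn.execute(f"CREATE TEMP TABLE {table} ({column_list})")
    marks = ", ".join("?" for _ in columns)
    for row in root.findall(row_path):
        # Each column value is the text of the child element of the same name.
        conn.execute(
            f"INSERT INTO {table} VALUES ({marks})",
            [row.findtext(column) for column in columns],
        )


# Relational side: hypothetical stock levels.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("A1", 3), ("B2", 0)])

# XML side: a data-centric document structured like a table.
price_doc = ET.fromstring(
    "<prices>"
    "<item><sku>A1</sku><price>2.50</price></item>"
    "<item><sku>B2</sku><price>4.00</price></item>"
    "</prices>"
)
expose_as_table(conn, "price_list", price_doc, "item", ["sku", "price"])

joined = conn.execute(
    "SELECT s.sku, s.quantity, CAST(p.price AS REAL) "
    "FROM stock s JOIN price_list p ON p.sku = s.sku "
    "ORDER BY s.sku"
).fetchall()
```

The SELECT statement never mentions XML. From the point of view of the query, price_list is just another table, which is the role a wrapper plays.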
Content Management Systems
Content management systems are systems for storing, retrieving, and assembling documents
from document fragments (content). They generally include such features as editors,
version control, and multi-user access. Although they are usually built on top of a database
(some are built on top of the file system), this is generally hidden from the user.
SiberSafe from SiberLogic is one example. SiberSafe is a 100% Java, TCP/IP, HTTP and
WebDAV-enabled multithreaded, load-balanced XML repository server that provides XML
content management functionality in the following areas:

User-defined fragmentation of XML documents;

Fragment-level storage, locking and retrieval of XML documents;

Fragment-level versioning of XML documents;

Fragment-level indexing and search of XML documents;

XML document dependency tracking and external entity management;

Publishing XML documents into various formats, including PDF, RTF, HTML etc;

Fully integrated workflow control, including tracking of tasks and assignments;

Project-wide fragment-level branching and merging;

Multi-language translation automation;

Windows NT, WebDAV and HTTP (browser) clients available out-of-the-box.
SiberSafe can also store documents other than XML, such as images and DTDs. SiberSafe is
DTD neutral and can work with any user-defined DTD. It can run on any JDBC-compatible
database, but is configured by default for Microsoft Access.
Conclusion
The information presented here was collected from the Web. There are
many software products in the categories defined above. Only the
most important ones, from a commercial and scientific point of view,
are presented here as examples.
There are two main problems with XML in database systems:
• How to store and retrieve XML documents?
• How to merge older technologies (relational and object-relational
ones) with XML?
These problems come into focus in the implementation of the quickly
evolving XQuery standard: only a few database systems
support it.
Whether XML will be merged into current database systems, as
happened with object-oriented and relational
databases in object-relational ones, the future will show, but
undoubtedly XML is a new challenge for database systems.