Download XML.com: Mapping DTDs to Databases [May. 09, 2001]

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

Operational transformation wikipedia , lookup

Data vault modeling wikipedia , lookup

Versant Object Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Mapping DTDs to Databases
by Ronald Bourret
May 09, 2001
1. Overview
A common question in the XML community is how to
map XML to databases. This article discusses two
mappings: a table-based mapping and an object-relational
(object-based) mapping. Both mappings model the data
in XML documents rather than the documents
themselves. This makes the mappings a good choice for
data-centric documents and a poor choice for documentcentric documents. The table-based mapping can't handle
mixed content at all, and the object-relational mapping of
mixed content is extremely inefficient.
Both mappings are commonly used as the basis for
software that transfers data between XML documents and
databases, especially relational databases. An important
characteristic in this respect is that they are bidirectional.
That is, they can be used to transfer data both from XML
documents to the database and from the database to XML
documents. One consequence is that they are likely to be
used as canonical mappings on top of which XML query
languages can be built over non-XML databases. The
canonical mappings will define virtual XML documents
that can be queried with something like XQuery.
In addition to being used to transfer data between XML
documents and databases, the first part of the objectrelational mapping is used in "data binding", the
marshalling and unmarshalling of data between XML
documents and objects.
2. Table-based Mapping
There is an obvious mapping
between the following XML
document and table:
<A>
<B>
<C>ccc</C>
Table A
Table of Contents
•Overview
•Table-based
Mapping
•Object-Relational
Mapping
•The Basic Mapping
•Mapping Complex
<D>ddd</D>
Content Models
•Mapping Mixed
D
E
</B>
-- Content
----•Mapping Order
<B>
<=>
•Mapping Attributes
...
...
...
•Alternate Mappings
<C>fff</C>
ccc
ddd
eee
•Generating Schema
<D>ggg</D>
•Generating
fff
ggg
hhh
<E>hhh</E>
Relational Database
...
...
...
Schema from DTDs
</B>
</A>
•Generating DTDs
from Database
Schema
It's called a table-based mapping.
•Mapping XML
It views the document as a single
Schemas to
table or a set of tables. The
Databases
structure of the document must
•Related Topics
be either
-------
<E>eee</E>
C
<Table>
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
...
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
</Table>
or
<Tables>
<Table_1>
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
...
</Table_1>
...
<Table_n>
<Row>
<Column_1>...</Column_1>
...
<Column_m>...</Column_m>
</Row>
...
</Table_n>
</Tables>
with the caveat that column data can be represented either
as PCDATA-only elements (shown) or attributes.
The obvious advantage of this mapping is its simplicity.
Because it matches the structure of tables and result sets
in a relational database, it is easy to write code based on
this mapping; code which is fast, scales well, and is quite
useful for certain applications, such as transferring data
between databases one table at a time.
The mapping has several disadvantages; primarily, it only
works with a very small subset of XML documents. In
addition, it does not preserve physical structure (such as
character and entity references, CDATA sections,
character encodings, or the standalone declaration) or
document information (such as the document type or
DTD), comments, or processing instructions.
The table-based mapping is commmonly used by
middleware to transfer data between XML documents
and relational databases. It is also used in some Web
application servers to return result set data as XML.
3. Object-Relational Mapping
Because table-based mappings only work with a limited
subset of XML documents, some middleware tools, most
XML-enabled relational databases, and most XMLenabled object servers use a more sophisticated mapping,
called an object-relational mapping. This models the
XML document as a tree of objects that are specific to the
data in the document, then maps these objects to the
database.
(The name "object-relational" is actually a misnomer -- a
better name would be object-based mapping. This is
because the objects can be mapped to non-relational
databases, such as object-oriented or hierarchical
databases, or simply left alone, as is done in data binding
systems. However, because object-relational is a wellknown term and this mapping is most commonly used
with relational databases, that term will be used here. In
addition, all examples use relational tables.)
To understand the object-relational mapping, it is easiest
to first look at some simple examples. To start, notice that
there is an obvious mapping between the following XML
document, object, and row in a table:
XML
Tables
=============
===============
Table A
<A>
------<B>bbb</B>
C
D
<C>ccc</C>
----<D>ddd</D>
...
...
</A>
ccc
ddd
Objects
============
object A {
B = "bbb"
<=>
C = "ccc"
B
<=>
D = "ddd"
}
--...
bbb
...
...
...
Similarly, there is an obvious mapping between the
following element type definition, class, and table
schema:
DTD
Table Schema
======================
====================
Classes
============
class A {
CREATE TABLE A
String B;
B VARCHAR(10) NOT NULL,
<!ELEMENT A (B, C, D)>
<=>
<=>
C VARCHAR(10) NOT NULL,
String C;
String D;
D VARCHAR(10) NOT NULL
}
)
As a more complex example, consider the following
XML document:
<SalesOrder>
<Number>1234</Number>
<Customer>Gallagher Industries</Customer>
<Date>29.10.00</Date>
<Item Number="1">
<Part>A-10</Part>
<Quantity>12</Quantity>
<Price>10.95</Price>
</Item>
<Item Number="2">
<Part>B-43</Part>
<Quantity>600</Quantity>
<Price>3.99</Price>
</Item>
</SalesOrder>
which maps to the following objects:
object SalesOrder {
number = 1234;
customer = "Gallagher Industries";
date = 29.10.00;
items = {ptrs to Item objects};
}
/
\
/
\
/
\
object Item {
object Item {
number = 1;
number = 2;
part = "A-10";
part = "B-43";
quantity = 12;
quantity = 600;
price = 10.95;
price = 3.95;
}
}
and then to rows in the following tables:
SaleOrders
---------Number
Customer
------------------------1234
Gallagher Industries
...
...
...
...
Items
----SONumber
-------1234
1234
...
Item
---1
2
...
Part
---A-10
B-43
...
Date
-------29.10.00
...
...
Quantity
-------12
600
...
Price
----10.95
3.99
...
All of these are examples of object-relational mappings
from DTDs to relational databases.
Next Page
Mapping DTDs to Databases
by Ronald Bourret | Pages: 1, 2, 3, 4, 5
3.1. The Basic Mapping
An object-relational mapping is performed in two steps.
First, an XML schema (a DTD in this case) is mapped to
an object schema, then the object schema is mapped to
the database schema. The two mappings can optionally be
combined for a direct DTD-to-database mapping, as is
done in most software today.
In considering this mapping, it is important to understand
that the objects involved are specific to each DTD and
not objects from the DOM. In particular, these objects
model the data in the XML document, while the DOM
models the structure of the XML document. For example,
the following shows the object trees for data-specific
objects and the DOM that would be created for the XML
document in the example above:
SalesOrder
/
\
Item
Item
Document
|
Element_______
______/ / \ \______
\_________
/
/
\
\
\
Element
\
Element
Element
Element
|
|
|
Element
/
|
\
etc.
Text
Text
Text
/
|
\ \_______
/
\
\
Element
Attr
|
Element Element
|
|
Text
Text
|
|
Text
Text
This distinction is particularly important when you
consider how the data is stored in the database. To store
the data-specific objects, you will need SalesOrders and
Items tables; to store the DOM objects, you will need
Document, Element, Text, and Attr tables. The most
important difference is that non-XML applications can
use the data-specific tables but can't use the DOMspecific tables.
3.1.1. Mapping DTDs to Object Schemas
The mapping starts by recognizing that element types are
data types. Element types that have PCDATA-only
content are called simple element types; the term is
borrowed from W3C XML Schema. These hold a single
data value and are equivalent to scalar data types in an
object-oriented programming language. (Note that the
word "scalar" is used here to mean "consisting of a single
data value". In some languages, "scalar" data types -- in
this sense of the word -- are represented by objects. The
most notable example of this is the String data type in
Java.) Attribute types are also simple types.
Element types that have element or mixed content, or that
have attributes, are called complex data types; the term is
again borrowed from XML Schema. These hold a
structured value and are equivalent to classes in an
object-oriented programming language or structs in C.
Note that an element type that has empty content and
attributes is still "complex". The reason for this is that the
attributes also provide structure and are roughly
equivalent to child PCDATA-only elements.
The object-relational mapping first maps simple types to
scalar data types. For example, the element type Title
might be mapped to a String, and the element type Price
might be mapped to a float. It then maps complex types
to classes, with each element type in the content model of
the complex type mapped to a property of that class. The
data type of each property is the data type to which the
referenced element type is mapped. For example, a
reference to a Title element would be mapped to a String
property, and a reference to a Price element would be
mapped to a float property. References to complex
element types are mapped to pointers/references to an
object of the class to which the complex element type is
mapped.
The last part of the mapping maps attributes to properties,
with the data type of the property determined from the
data type of the attribute. Note that attributes are
equivalent to references to element types in a content
model. This is because, like references in a content
model, they are local to a given element type. The only
conceptual difference is that the attribute type is defined
locally, rather than at a global (DTD-wide) level, as is the
case with element types.
For example, in the following the simple element types B,
D, E, and the attribute F are all mapped to Strings and the
complex element types A and C are mapped to classes A
and C. The content models and attributes of A and C are
mapped to properties of classes A and C. The references
to B, D, and E in the content models of A and C are
mapped to String properties (because the types are
mapped to Strings) and the attribute F is also mapped to a
String property. The reference to C in the content model
of A is mapped to a property with the type
pointer/reference to an object of class C because element
type C is mapped to class C.
DTD
=========================
Classes
=============
<!ELEMENT A (B, C)>
<!ELEMENT B (#PCDATA)>
<!ATTLIST A
F CDATA #REQUIRED>
class A {
String b;
C
c;
String f;
}
<!ELEMENT C (D, E)>
<!ELEMENT D (#PCDATA)>
<!ELEMENT E (#PCDATA)>
==>
==>
class C {
String d;
String e;
}
One important point to reiterate here is that mapping
references to element types in a content model is different
than mapping the element types themselves. Element
types map to data types, while references to element
types map to properties in structured data types (classes).
The difference is clear when you consider an element
type that is referenced in two different content models. In
this case, each reference must be mapped separately, with
the data types of the resulting properties determined by
the data types to which the element types themselves (not
the references) are mapped.
For example, consider the Title and Section element types
in the following. Each of these element types is
referenced in the content models of both Chapter and
Appendix. Each reference is mapped separately for each
parent element type. The data type of the properties to
which the references to Title are mapped is String, since
Title contains only PCDATA and is mapped to a String.
The data type of the properties to which the references to
Section are mapped is a pointer/reference to a Section
object, since the Section element type is complex and is
mapped to a Section class.
DTD
Classes
=========================================
=============
<!ELEMENT Chapter (Title, Section+)>
class Chapter {
<!ELEMENT Appendix (Title, Section+)>
String
title;
<!ELEMENT Title (#PCDATA)>
Section[] sections;
<!ELEMENT Section (#PCDATA | p | b | i)*>
}
==>
class Appendix {
String
title;
Section[] sections;
}
The other important point to reiterate is that simple
element types and attributes can be mapped to data types
other than String. For example, an element type named
Quantity might be mapped to an integer. When mapping
from a DTD, this requires human intervention, since there
is no way to predict the target data type from a PCDATAonly element type. However, when mapping from an
XML Schema, the target type is known since XML
Schemas have data types.
3.1.2. Mapping Object Schemas to Database Schemas
In the second part of the object-relational mapping,
classes are mapped to tables (known as class tables),
scalar properties are mapped to columns, and
pointer/reference properties are mapped to primary
key/foreign key relationships. For example:
Classes
============
class A {
String b;
C
c;
String f;
}
class C {
String d;
String e;
}
==>
==>
Tables
=================
Table A:
Column b
Column c_fk
Column f
Table C:
Column d
Column e
Column c_pk
Note that the tables are joined by a primary key (C.c_pk)
and a foreign key (A.c_fk). Because the relationship
between the parent and child elements is one-to-one, the
primary key can be in either table. If the relationship is
one-to-many, the primary key must be on the "one" side
of the relationship, regardless of whether this is the parent
or child. For example, if a SalesOrder element contains
multiple Item elements, then the primary key must be in
the SalesOrder table (the parent). But if each Item
element contains a Part element, the primary key must be
in the Part table (the child), since one part can occur in
many items.
A primary key column can be created as part of the
mapping, as is the case for the column c_pk, or an
existing column or columns can be used as the primary
key. For example, if a SalesOrder element type has a
Number child, this might be mapped to the primary key
column.
If a primary key column is created as part of the mapping,
its value must be generated by the database or the data
transfer software. While this is generally considered
better database design than using data columns as
primary keys, it has a disadvantage when used with
XML, and this is that the generated key is meaningless
outside the source database. Thus, when data with a
generated key is transferred to an XML document, it will
either contain a meaningless primary key (if the primary
key value is transferred) or no primary key at all (if the
key is not transferred). In the latter case, it may be
impossible to re-identify the source of the data, which is a
problem if the data is modified and returned to the
database as an XML document.
3.1.3. Miscellany
Before we continue with the more complicated parts of
the mapping, we need to mention two things. First, names
can be changed during the mapping. For example, the
DTD, object schema, and relational schema can all use
different names. For example, the following DTD uses
different names than the following class:
DTD
Classes
===============================
=====================
<!ELEMENT Part (Number, Price)>
PartClass {
<!ELEMENT Number (#PCDATA)>
String numberProp;
<!ELEMENT Price (#PCDATA)>
float priceProp;
class
==>
}
which uses different names than the following table:
Classes
=====================
==================
class PartClass {
String numberProp;
float priceProp;
PRTPRICE
}
Tables
==>
Table PRT
Column PRTNUM
Column
Second, the objects involved in the mapping are
conceptual. That is, there is no need to actually instantiate
them when transferring data between an XML document
and a relational database. (This is not to say that the
objects can't be instantiated. Whether it is useful to
instantiate the objects depends on the actual application.)
3.2. Mapping Complex Content Models
The content models we have seen so far are relatively
simple. What happens with more complex content
models, such as the following?
<!ELEMENT A (B?, (C | ((D | E | F | G)*, (H | I)+,
J?)))>
In this section, we will consider the various parts of the
content model. Mapping the above example will be left as
an exercise to the reader. (I've always wanted to say that.)
3.2.1. Mapping Sequences
As has already been seen, each element type referenced
in a sequence is mapped to a property, which is then
mapped either to a column or to a primary key, foreign
key relationship. For example:
DTD
Tables
======================
==============
Classes
============
<!ELEMENT A (B, C)>
Table A
<!ELEMENT B (#PCDATA)>
==>
Column b
class A {
==>
String b;
C
c;
Column c_fk
}
<!ELEMENT C (D, E)>
Table C
<!ELEMENT D (#PCDATA)>
==>
Column d
<!ELEMENT E (#PCDATA)>
Column e
class C {
==>
String d;
String e;
}
Column c_pk
3.2.2. Mapping Choices
As with sequences, each element type referenced in a
choice is also mapped to a property, then either to a
column or a primary key, foreign key relationship. The
only difference from the way sequences are mapped is
that the properties and columns can be null. For example,
suppose we changed the sequence in the content model of
A to a choice. The mapping from DTD to object schema
would then be
DTD
Classes
======================
========================
<!ELEMENT A (B | C)>
<!ELEMENT B (#PCDATA)>
// Nullable
==>
class A {
String b;
C
c;
// Nullable
<!ELEMENT C (D, E)>
<!ELEMENT D (#PCDATA)>
<!ELEMENT E (#PCDATA)>
==>
}
class C {
String d;
String e;
}
and the mapping from object schema to database schema
would be
Classes
Tables
========================
===========================
class A {
String b; // Nullable
// Nullable
C
c; // Nullable
c_fk // Nullable
}
class C {
String d;
// Not nullable
String e;
// Not nullable
}
c_pk // Not nullable
==>
Table A (
Column b
Column
==>
Table C (
Column d
Column e
Column
To see why this is true, consider the following XML
document, which conforms to the DTD above. Because
the choice in the content model of A requires that either B
or C (but not both) be present as a child, one of the two
corresponding properties (and columns) will always be
null.
XML
Tables
=============
===========
Objects
============
Table A
--------<A>
b
c_fk
<B>bbb</B>
-----</A>
bbb
NULL
object a {
==>
b = "bbb"
==>
c = null
}
...
...
Note that if the primary key used to join the tables was in
table A, the corresponding foreign key column in C
would not be nullable. If A did have a C child, the
column would have to have a value to join it to the
correct row in table A. If A did not have a C child, there
would simply be no row in table C.
3.2.3. Mapping Repeated Children
Children that can occur multiple times in their parent,
which are known as repeated children, are mapped to
multi-valued properties and then either to multiple
columns in a table or to a separate table, known as a
property table.
If a content model contains repeated references to an
element type, the references are mapped to a single
property, which is an array of known size. This can be
mapped either to multiple columns in a table or to a
property table. For example, the following shows how to
map a repeated reference to multiple columns in a table:
DTD
Classes
Tables
=========================
==============
===========
<!ELEMENT A (B, B, B, C)>
Table A
<!ELEMENT B (#PCDATA)>
==>
String[] b;
==>
Column b1
<!ELEMENT C (#PCDATA)>
String
c;
Column b2
class A
{
}
Column b3
Column c
If the + or * operator is applied to a reference, the
reference is again mapped to a single property, which this
time is an array of unknown size. Since the number of
values can be arbitrarily large, the property must be
mapped to a property table, which will contain one row
for each value. The property table is linked to the class
table by a primary key, foreign key relationship, where
the primary key is in the class table. For example,
DTD
Tables
======================
==============
Classes
==============
<!ELEMENT A (B+, C)>
Table A
<!ELEMENT B (#PCDATA)>
==>
b;
==>
Column a_pk
<!ELEMENT C (#PCDATA)>
c;
Column c
class A {
String[]
String
}
Table B
Column a_fk
Column b
3.2.4. Mapping Optional Children
Children that are optional in their parent are mapped to
nullable properties, then to nullable columns. This has
already seen for children that appear in a choice group, as
in the following mapping from DTD to object schema:
DTD
Classes
=========================
========================
<!ELEMENT A (B | C | D)>
<!ELEMENT B (#PCDATA)>
// Nullable
<!ELEMENT C (#PCDATA)>
// Nullable
==>
class A {
String b;
String c;
D
d;
// Nullable
<!ELEMENT D (E, F)>
<!ELEMENT E (#PCDATA)>
<!ELEMENT F (#PCDATA)>
==>
}
class D {
String e;
String f;
}
and object schema to database schema:
Classes
Tables
========================
========================
class A {
String b; // Nullable
// Nullable
String c; // Nullable
// Nullable
D
d; // Nullable
d_fk // Nullable
==>
Table A
Column b
Column c
Column
}
class D {
String e;
// Not nullable
String f;
// Not nullable
}
d_pk // Not nullable
Table D
Column e
==>
Column f
Column
It also applies when the ? or * operator is applied to a
reference, as in the following mapping from DTD to
object schema:
DTD
Classes
======================
==========================
<!ELEMENT A (B?, C*)>
<!ELEMENT B (#PCDATA)>
// Nullable
<!ELEMENT C (#PCDATA)>
c[]; // Nullable
class A {
String b;
==>
String
}
and object schema to database schema:
Classes
Tables
==========================
===============================
class A {
String b;
// Nullable
b
// Nullable
String c[]; // Nullable
a_pk // Not nullable
}
==>
Table A
Column
Column
Table C
Column
a_fk
// Not nullable
c
// Not nullable
Column
Note that the column used to store C (in property table C)
is not nullable. This is because, if no C children are
present in A, there are simply no rows in table C.
3.2.5. Mapping Subgroups
References in subgroups are mapped to properties of the
parent class, then to columns in the class table, as in the
following mapping from DTD to object schema:
DTD
Classes
=========================
============================
<!ELEMENT A (B, (C | D))>
<!ELEMENT B (#PCDATA)>
b; // Not nullable
<!ELEMENT C (#PCDATA)>
c; // Nullable
<!ELEMENT D (E, F)>
d; // Nullable
<!ELEMENT D (E, F)>
<!ELEMENT E (#PCDATA)>
e;
==>
class A {
String
String
D
==>
}
class D {
String
<!ELEMENT F (#PCDATA)>
String
f;
}
and object schema to database schema:
Classes
Tables
============================
==========================
class A {
String b; // Not nullable
Column b
// Not nullable
String c; // Nullable
Column c
// Nullable
D
d; // Nullable
Column d_fk // Nullable
}
class D {
(
String e;
Column e
String f;
Column f
}
Column d_pk
Table A
==>
Table D
==>
You might be wondering how this is possible. What
happened to the structure imposed by the subgroup? In
fact, this structure appears only in the content model, not
in instance documents. For example, both of the
following documents conform to the above content
model:
<A>
<B>bbbbbb</B>
<C>cccccc</C>
</A>
<A>
<B>bbbbbb</B>
<D>
<E>eee</E>
<F>fff</F>
</D>
</A>
There's no way to determine the presence of the subgroup
from these documents. Structurally, C and D are
indistinguishable from B; they are just children of A.
Thus, they can be mapped like children of A that are not
in a subgroup.
One consequence of mapping references in subgroups
directly to properties of the parent class is that
repeatability and optionality can be indirect. For example,
in the following content model C, D, and E are both
optional and repeatable. They are repeatable because the
+ operator applies to them indirectly. C is optional
because it is directly in a choice group, but D and E are
optional because they are indirectly in a choice group.
DTD
Classes
===============================
=============================
<!ELEMENT A (B, (C | (D, E))+)>
class A
{
<!ELEMENT B (#PCDATA)>
String
b;
<!ELEMENT C (#PCDATA)>
String[] c; // May be null
<!ELEMENT D (E, F)>
d; // May be null
<!ELEMENT E (#PCDATA)>
String[] e; // May be null
==>
D[]
}
Mapping DTDs to Databases
by Ronald Bourret | Pages: 1, 2, 3, 4, 5
3.3. Mapping Mixed Content
Mixed content is just a choice group to which the *
operator applies indirectly, except that it can contain
PCDATA mixed between child elements. Thus, the
element type references in mixed content can be mapped
first to properties that are nullable arrays of unknown
size, then to property tables.
To see how to map mixed content, consider the following
XML document:
<A>
This text <c>cc</c> makes
<b>bbbb</b> no sense
<c>cccc</c> except as
<b>bb</b> an example.
</A>
and then notice that it is essentially the same as the
following document, in which PCDATA has been
wrapped in <pcdata> elements:
<A>
<pcdata>This text </pcdata><c>cc</c><pcdata> makes
</pcdata><b>bbbb</b><pcdata> no sense
</pcdata><c>cccc</c><pcdata> except as
</pcdata><b>bb</b><pcdata> an example.</pcdata>
</A>
From this, it is easy to see that PCDATA can be treated
like any other child element. Thus, PCDATA in mixed
content is mapped to a nullable array of unknown size,
then to a property table. The following shows how to map
mixed content from a DTD to an object schema:
DTD
Classes
===============================
===================
class A
{
<!ELEMENT A (#PCDATA | B | C)*>
String[] pcdata;
<!ELEMENT B (#PCDATA)>
String[] b;
<!ELEMENT C (#PCDATA)>
String[] c;
==>
}
and an object schema to a database schema:
Classes
Tables
===================
===================================
Table PCDATA
-----Column a_fk
class A {
Column pcdata
String[] pcdata;
Table B
String[] b;
-----Column a_fk
String[] c;
Column b
}
Table C
/
Table A
==>
/
Column a_pk--\
\
\-----Column a_fk
Column c
To see what is actually stored in the database, consider
the document shown at the start of this section, which is
mapped to the following object, then to rows in the
following tables. (We assume that the system generates a
primary key of value 1 for the row in the table for A. This
is used to link the row in table A to the rows in the other
tables.)
Objects
Tables
============================
===============================
Table PCDATA
a_fk
pcdata
----
-----------
1
This text
1
makes
object a {
1
no sense
pcdata = {"This text ",
1
except as
" makes ",
1
an example.
" no sense ",
" except as",
Table B
Table A
==>
a_pk
----
" an example."}
b
b
---- ---c
1
bbbb
}
1
bb
1
a_fk
= {"bbbb", "bb"}
= {"cc", "cccc"}
Table C
a_fk
----
c
----
1
cc
1
cccc
One of the things that should be readily obvious from this
example is that the object-relational mapping is not very
efficient at storing mixed content. Because of this, it is
more commonly used in data-centric applications, which
tend to have little mixed content.
There are two ways to solve this problem. The first is to
use a mapping other than the object-relational mapping.
For example, if the document is modeled using the DOM
or a similar structure, and this is mapped to the database
with an object-relational mapping, there are far fewer
tables in the database -- Document, Element, Attr, Text,
etc. -- although a similar number of joins are required to
retrieve a document. The second strategy is to not break
documents into their smallest possible components but
instead to break them into larger pieces, such as chapters
or sections. This strategy can be used with the objectrelational mapping; for more information, see section
3.6.1, "Mapping Complex Element Types to Scalar
Types".
3.4. Mapping Order
This section discusses how the object-relational mapping
handles order.
3.4.1. Sibling Order, Hierarchical Order, and Document
Order
Sibling means "brother or sister". Thus, sibling elements
or PCDATA are elements or PCDATA that have the
same parent. In other words, they appear in the same
content model. For example, if the document from the
previous section is represented as a tree, it is readily
apparent which elements are siblings: those elements at
the second level of the hierarchy, which all have A as
their parent.
A
___________________________|______________________
|
|
|
|
|
|
|
|
This text C makes B no sense C except as
an example
|
|
|
cc
bbbb
cccc
|
B
|
bb
Note that the elements at the third level of the hierarchy
are not siblings because they don't share the same parent.
This also points out the difference between sibling order,
which is the order in which children occur in their parent,
and hierarchical order, which is the level at which
children appear in a tree representing the document.
Different still is document order, which is the order in
which elements and text appear in an XML document.
For example:
Sibling order (order not shown where there is only one
sibling):
A
___________________________|______________________
|
|
|
|
|
|
|
|
This text C makes B no sense C except as
an example
1
2
3
4
5
6
7
9
|
|
|
cc
bbbb
cccc
|
B
8
|
bb
Hierarchical order:
1
A
___________________________|______________________
|
|
|
|
|
|
|
|
|
2 This text C makes B no sense C except as
B an example
|
|
|
|
3
cc
bbbb
cccc
bb
Document order:
A
1
___________________________|______________________
|
|
|
|
|
|
|
|
This text C makes B no sense C except as
an example
2
3
5
6
8
9
11
14
|
|
|
cc
bbbb
cccc
4
7
10
According to the XML specification, sibling order is
significant. In practice, this depends on the application.
For example, in a data-centric application, where an
|
B
12
|
bb
13
XML document is used to populate an object or a table,
sibling order usually does not matter because objectoriented languages have no concept of order among their
properties. Similarly, relational databases have no
concept of order among their columns. Thus, the sibling
order is not significant in either of the following
documents:
<Part>
<Number>123</Number>
<Desc>Turkey wrench</Desc>
<Price>10.95</Price>
</Part>
<Part>
<Price>10.95</Price>
<Desc>Turkey wrench</Desc>
<Number>123</Number>
</Part>
both of which can be mapped to the following object and
row in a table:
Objects
Tables
=========================
===================================
Table Parts
object part {
--------------------number = 123
Desc
Price
desc = "Turkey wrench"
----------- ----price = 10.95
Turkey wrench 10.95
---------==>
Number
------
--
123
(A major exception to this is when a data-centric
document must match a specific DTD. This occurs when
an application must validate documents, such as when
they come from an unknown or untrusted source.
Although "all groups" in XML Schemas help in this
situation by allowing a set of children to appear in any
order, they do not support repeated children.)
On the other hand, in document-centric applications, in
which documents are generally designed for human
consumption, sibling order is very important. For
example, I am likely to like the first review and not the
second:
<Review>
<p>Ronald Bourret is an
<b>excellent writer</b>.
Only an <b>idiot</b>
wouldn't read his work.</p>
</Review>
<Review>
<p>Ronald Bourret is an
<b>idiot</b>. Only an
<b>excellent writer</b>
wouldn't read his work.</p>
</Review>
The object-relational mapping can preserve sibling order,
as will be seen below, although in practice few products
support this. It inherently preserves hierarchical order by
mapping references to simple element types to columns
in a table and by mapping references to complex element
types to primary key, foreign key relationships. It
preserves document order when both hierarchical and
sibling order are preserved.
3.4.2. Mapping Sibling Order
Because object-oriented languages have no concept of
order among their properties, and relational databases
have no concept of order among their columns, it is
necessary to store sibling order values separately from
data values. One way to do this is to introduce separate
properties and columns in which to store order values.
Another way to do this is to store the order values in the
mapping itself.
3.4.2.1. Order Properties and Columns
Order properties and order columns are used to store
order values. They are separate from data properties and
data columns. One property or column is needed for each
referenced element type or PCDATA for which order is
deemed important. For example, consider the above
mixed content example. The following maps the sibling
order in a DTD to order properties:
DTD
Classes
===============================
========================
class A {
String[] pcdata;
int[]
pcdataOrder;
<!ELEMENT A (#PCDATA | B | C)*>
String[] b;
<!ELEMENT B (#PCDATA)>
int[]
bOrder;
<!ELEMENT C (#PCDATA)>
String[] c;
int[]
==>
cOrder;
}
and then to order columns:
Classes
Tables
========================
========================================
Table PCDATA
class A {
-----Column a_fk
String[] pcdata;
/
Column pcdata
int[]
pcdataOrder;
/
Column pcdataOrder
String[] b;
/
Table B
int[]
bOrder;
a_pk--------Column a_fk
String[] c;
\
Column b
int[]
cOrder;
\
Column bOrder
}
\ Table C
Table A
==>
Column
\----Column a_fk
Column c
Column cOrder
Notice that the order properties are stored in tables
parallel to the properties that they order.
The following example shows order properties being used
to preserve sibling order in the "makes-no-sense"
example. One important thing to notice here is that all the
order properties share the same order space. An order
value that appears in one order property won't appear in
another order property.
Classes
Tables
=================================
=====================================
Table PCDATA
a_fk pcdata
pcdataOrder
---- ----------- ----------1
1
1
This text
object a {
makes
pcdata
no sense
1
except as
7
1
3
= {"This text ",
5
" makes ",
Table A
1
" no sense ",
an example. 9
" except as",
a_pk
" an example."}
Table B
pcdataOrder = {1, 3, 5, 7, 9}
a_fk b
bOrder
b
= {"bbbb", "bb"}
---- ---- -----bOrder
= {4, 8}
1
bbbb 4
==>
--
--
1
1
bb
c
= {"cc", "cccc"}
8
cOrder
= {2, 6}
}
Table C
a_fk
c
cOrder
---- ---- -----1
cc
2
1
cccc 6
Although order properties are most commonly used to
maintain order in mixed content, they can be used with
element content as well. For example, consider the
following element type definition. Because B can appear
an arbitrary number of times in A, it is stored in a
separate property table. Without order properties, there
would be no way to determine how to order the B
children. (Note that row order cannot be used here, as
relational databases are not guaranteed to return rows in
any particular order.)
<!ELEMENT A (B*, C)>
3.4.2.2. Storing Order in the Mapping
In many cases, sibling order is important only because of
validation; the application itself does not care about
sibling order except to be able to validate a document.
This is especially true of element content in data-centric
documents. In such cases, it may be sufficient to store
order information in the mapping itself.
For example, given the following content model, the
mapping could store the information that the children of
A are ordered B, then C, then D:
<!ELEMENT A (B, C, D)>
In practice, there are limitations to storing order
information in a mapping. For example, consider the
following content model:
<!ELEMENT A (B?, C, B)>
Constructing a document that matches this content model
requires software to decide first how much data is
available for constructing B elements. If there is only
enough data to construct one B element, it won't construct
the first B element, since the second B element is
required.
It is unlikely that most software will go to the trouble of
doing this. Instead, a reasonable limitation is to support
only those content models that group all siblings of the
same element type together. This is sufficient for many
data-centric content models and can be implemented by
storing the position of each element in the content model
in the mapping.
For example, order of siblings in the following content
models can be mapped this way. Note that in the third
content model, Author and Editor can both be assigned
the same order value or different values; if they are
assigned different values, all elements of one type one
will occur before any elements of the other type.
<!ELEMENT Part (Number, Description, Price)>
<!ELEMENT Order (Number, CustNum, Date, Item*)>
<!ELEMENT Book (Title, (Author | Editor)+, Price,
Review*)>
When order information is stored only in the mapping,
round-tripping of documents is not possible whenever the
content model contains more than one element of the
same type. For example, consider the following content
model:
<!ELEMENT A (B+, C)>
Although the mapping can tell the software that all B
elements must occur before the C element, it cannot
specify the order of the B elements. Thus, if data is
transferred from a document containing this content
model to the database and back again, there is no
guarantee that the B elements will occur in the same
order as in the original document. Fortunately, this is not
often a problem for data-centric documents.
Mapping DTDs to Databases
by Ronald Bourret | Pages: 1, 2, 3, 4, 5
3.5. Mapping Attributes
As was seen earlier, attributes are mapped to scalar
properties. This section will discuss the details of that
mapping, along with other issues.
3.5.1. Mapping Single- and Multi-Valued Attributes
There are two different classes of attributes: single-valued
(CDATA, ID, IDREF, NMTOKEN, ENTITY,
NOTATION, and enumerated) and multi-valued
(IDREFS, NMTOKENS, and ENTITIES). As might be
expected, these map to single-valued properties (and then
to columns) and multi-valued properties (and then to
property tables). For example:
DTD
Classes
Tables
============================
============
===========
<!ELEMENT A
A {
<!ATTLIST A
String b;
D
String c;
<!ELEMENT B
String d;
<!ELEMENT C
(B, C)>
Table A
class
Column B
CDATA #REQUIRED>
==>
Column C
(#PCDATA)>
Column D
(#PCDATA)>
==>
}
and:
DTD
Tables
========================
==============
Classes
==============
<!ELEMENT A (B, C)>
Table A
<!ATTLIST A
b;
Column a_pk
D IDREFS #IMPLIED>
c;
==>
Column b
<!ELEMENT B (#PCDATA)>
d;
Column c
<!ELEMENT C (#PCDATA)>
Table D
class A {
String
==>
String
String[]
}
Column a_fk
Column d
The order in which attributes occur is not significant
according to the XML Information Set. For example, the
following two XML documents are considered identical.
Because of this, there is no need to use order properties to
maintain the order in which attributes occur, although it
would certainly be possible to do so.
<A B="bbb"
C="ccc"
D="ddd"/>
<A C="ccc"
B="bbb"
D="ddd"/>
On the other hand, the order in which values occur in
multi-valued attributes is considered significant. As was
the case with sibling elements and PCDATA, order
properties can be used to maintain information about the
order in which values in a multi-valued attribute occur.
However, there is one important difference between the
order properties used for sibling elements and PCDATA
and those used for multi-valued attributes: each order
property used for a multi-valued attribute has its own
order space. This can be seen in the following example:
XML
===============
=========================
<A B="dd ee ff"
"ff"}
Objects
object a {
b = {"dd", "ee",
C="gg hh"/>
==>
bOrder = {1, 2, 3}
c = {"gg", "hh"}
cOrder = {1, 2}
}
Alert readers will note that order properties are not
strictly necessary at the object level; array order could be
used instead. However, they are needed in relational
databases, since there is no concept of row order there.
3.5.2. Mapping ID/IDREF(S) Attributes
ID attributes are used to uniquely identify elements in an
XML document. IDREF and IDREFS attributes are used
to associate one element with another by referring to the
latter element's ID. This is usually done when it is not
possible to form this association by nesting one element
inside another. For example, consider the following
directed graph:
A
/ \
B
C
\ /
D
This can be represented in an XML document as:
<A>
<B ref_d="1">
...
</B>
<C ref_d="1">
...
</C>
<D id="1">
...
</D>
</A>
ID/IDREF(S) attributes map to primary key, foreign key
relationships. For example, the above document could be
stored in the database with the following tables and
columns:
Table A
Column a_pk
...
/
\
/
\
Table B
Table C
Column a_fk
Column a_fk
Column ref_d
Column ref_d
...
...
\
/
\
/
Table D
Column a_fk
Column id
...
One thing that data transfer software needs to be careful
of when storing ID/IDREF(S) attributes in a database is
that IDs are only guaranteed to be unique inside a given
XML document. Thus, if the data from more than one
document is stored in the same table, there is no
guarantee that the IDs will be unique. The solution to this
problem is to somehow "decorate" the IDs. This could be
done by mapping the attributes to two columns, one of
which contains a value that is unique to each document
and the other of which contains the ID, or by decorating
the ID itself, such as prefixing it with a unique value.
A similar problem exists when transferring data from the
database to a XML document. If the retrieved data
originates from more than one document, the data transfer
software needs to ensure that the ID values are unique.
This might involve changing one or more values, along
with the values of any IDREF(S) attributes that reference
them.
Currently, most products do not support ID/IDREF
attributes as distinct from other attributes.
3.5.3. Mapping Notation Attributes
Notations are used in XML to alert the application of how
an element or unparsed entity is to be treated. For
example, the following "xhtml" notation might tell the
application that the element contains XHTML and should
be displayed by a browser:
<Desc type="xhtml">
<html>
<head><title>Turkey
wrench</title></title></head>
<body>
<p>A very <b>big</b> turkey wrench.</p>
</body>
</html>
</Desc>
Notation attributes and their values are generally of no
interest to the object-relational mapping; they're treated
as simply another attribute.
The only exception to this occurs when the notation
indicates the data type of the contained text. For example,
the notation "base64" might tell the application that an
element contains binary data encoded as Base64 (a
MIME encoding that maps binary data to a subset of USASCII). In most cases, this information will only be of
interest to software that generates mappings. It could use
the information to map the element type to a binaryvalued property and then to a BLOB (Binary Large
OBject). In such cases, the mapping itself does not use
this information. The mapping from element to BLOB is
independent of the fact that a notation contains data type
information.
The only exception to this is when the data transfer
software is sophisticated enough to switch mappings at
run time based on the notation value. In this case, each
possible notation is mapped to a data type, which is then
used to convert the data.
3.5.4. Mapping ENTITY/ENTITIES Attributes
ENTITY and ENTITIES attributes are used to associate
unparsed, external data (such as a binary file) with an
XML document. These are mapped the same as any other
attribute, except that, when the data is transferred, the
entity may be substituted for the attribute value (when
transferring data from XML to the database) or a new
entity may be created and its identifier stored as the
attribute value (when transferring data from the database
to XML). Because unparsed entity values can be
generated dynamically, it is a good idea for the mapping
to specify whether the value or the entity URI is to be
stored in the database.
Because unparsed entities always have associated
notations, it is possible to use those notations when
determining the data type of the entity (either at map time
or run time).
3.6. Alternate Mappings
In the previous sections, we have described how to map
DTDs to databases. In fact, this description has not been
complete, as there are a number of other ways we could
have done the mapping. In this section, we will discuss
two of the most significant alternatives.
3.6.1. Mapping Complex Element Types to Scalar Types
Although complex element types are normally mapped to
classes and then to tables, it is possible to map them to
scalar types. In other words, references to complex
element types can be mapped to scalar properties. The
value of such properties is generally the element content,
serialized as XML. This is useful when the element's
value makes sense only as a whole and shouldn't be
broken into smaller parts.
For example, consider an XML document that gives
information about a part. If one of the child elements is a
part description in XHTML, it probably makes no sense
to fragment it further. As we have already seen, this will
result in the data being scattered across numerous tables;
one for italic words, one for bold words, one for words
used in hyperlinks, etc. Therefore, it is better to store this
description in a single column:
DTD
Classes
Tables
===========================
===============
==============
<!ELEMENT Part (Num, Desc)>
Part {
Table Part
<!ELEMENT Number (#PCDATA)>
num;
==>
Column num
<!-- Use Inline entity
desc;
Column desc
from XHTML -->
<!ELEMENT Desc (%Inline;)>
class
==>
String
String
}
For example, the following description is stored as
follows:
<Part>
<Number>127</Number>
<Desc>
A very <b>big</b>
turkey wrench.
</Desc>
</Part>
=>
Table Part
Num Desc
--- ---------------127 A very <b>big</>
turkey wrench.
Note that storing data as XML does cause problems for
data transfer software. In particular, the software cannot
distinguish between markup and data. For example, how
does the application determine if the <b> in the following
text is a <b> element or text?
An example of the <b> element is <b>this
element</b>.
One solution to this is to store actual elements in the
database using tags and character data using entity
references:
An example of the &lt;b&gt; element is <b>this
element</b>.
The problem with this is that non-XML applications can't
then search the database as they might expect.
3.6.2. Mapping Scalar Properties to Property Tables
Although single, scalar-valued properties are usually
mapped to columns, they can be mapped to property
tables. This is useful, for example, for storing BLOBs or
infrequently used properties in a separate table from the
main table. For example:
Classes
===============
class Part {
String num;
String desc;
}
Tables
==================
==>
Table Parts
Column num
Table Descriptions
Column num
Column desc
3.6. Conclusions
The object-relational mapping works for all XML
documents, maps well to objects, and allows non-XML
applications to use the data in the database. Because of
this, it is a good choice for storing data-centric documents
and (not surprisingly) is used as the underlying model in
some middleware, most XML-enabled relational
databases, and most XML-enabled object servers.
It should be noted that all of these products implement
slightly different versions of the object-relational
mapping and none implements all possibilities in the
mapping. Of the more common parts of the mapping, all
map complex element types to classes and references to
element types to properties, as well as using primary key,
foreign key pairs to join tables. However, some map
columns only to PCDATA-only elements, others map
columns only to attributes, and still others allow both.
Similarly, most of these products do not support sibling
order or mixed content, and many do not allow users to
change names during the mapping.
The object-relational mapping is not a good choice for
ordinary documents. First, it is inefficient when used with
mixed content. Second, like the table-based mapping, it
does not preserve physical structure, comments, or
processing instructions.
4. Generating Schema
We now consider how to generate relational database
schema from a DTD according to the object-relational
mapping and vice versa. Since there are a number of
possible paths through the object-relational mapping, the
algorithms here simply choose the most commonly used
branch each time there is a choice. For example, single
references to PCDATA-only elements can be mapped to
a column or a separate property table. Since the most
common choice is to map them to a column, the
following algorithm generates a column from such
references.
For object-oriented databases, the generation process is
similar.
4.1. Generating Relational Database Schema from
DTDs
Relational schemas are generated by reading through the
DTD and processing each element type:

Complex element types generate class tables with
primary key columns.

Simple element types are ignored except when
processing content models.
To process a content model:





Single references to simple element types
generate columns; if the reference is optional (?
operator), the column is nullable.
Repeated references to simple element types
generate property tables with foreign keys.
References to complex element types generate
foreign keys in remote class tables.
PCDATA in mixed content generates a property
table with a foreign key.
Optionally generate order columns for all
referenced element types and PCDATA.
To process attributes:



Single-valued attributes generate columns; if the
attribute is optional, the column is nullable.
Multi-valued attributes generate property tables
with foreign keys.
If an attribute has a default, it is used as the
column default.
The following example shows how this process works.
Consider the following DTD:
DTD
Tables
=================================================
=================
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Order (OrderNum, Date, CustNum, Item*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
<!ELEMENT Item (ItemNum, Quantity, Part)>
<!ELEMENT ItemNum (#PCDATA)>
<!ELEMENT Quantity (#PCDATA)>
<!ELEMENT Part (PartNum, Price)>
<!ELEMENT PartNum (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
In the first step, we generate tables for complex element
types and primary keys for these tables:
DTD
Tables
=================================================
=================
<!ELEMENT Order (OrderNum, Date, CustNum, Item*)>
==>
Table Order
<!ELEMENT OrderNum (#PCDATA)>
Column OrderPK
<!ELEMENT Date (#PCDATA)>
<!ELEMENT CustNum (#PCDATA)>
<!ELEMENT Item (ItemNum, Quantity, Part)>
==>
Table Item
<!ELEMENT ItemNum (#PCDATA)>
Column ItemPK
<!ELEMENT Quantity (#PCDATA)>
<!ELEMENT Part (PartNum, Price)>
==>
Table Part
<!ELEMENT PartNum (#PCDATA)>
Column PartPK
<!ELEMENT Price (#PCDATA)>
In the second step, we generate columns for references to
simple element types:
DTD
Tables
=================================================
=================
<!ELEMENT Order (OrderNum, Date, CustNum, Item*)>
==>
Table Order
<!ELEMENT OrderNum (#PCDATA)>
Column OrderPK
<!ELEMENT Date (#PCDATA)>
Column OrderNum
<!ELEMENT CustNum (#PCDATA)>
Column Date
Column CustNum
<!ELEMENT Item (ItemNum, Quantity, Part)>
==>
Table Item
<!ELEMENT ItemNum (#PCDATA)>
Column ItemPK
<!ELEMENT Quantity (#PCDATA)>
Column ItemNum
Column Quantity
<!ELEMENT Part (PartNum, Price)>
==>
Table Part
<!ELEMENT PartNum (#PCDATA)>
Column PartPK
<!ELEMENT Price (#PCDATA)>
Column PartNum
Column Price
In the final step, we generate foreign keys for references
to complex element types:
DTD
Tables
=================================================
=================
<!ELEMENT Order (OrderNum, Date, CustNum, Item*)>
==>
Table Order
<!ELEMENT OrderNum (#PCDATA)>
Column OrderPK
<!ELEMENT Date (#PCDATA)>
Column OrderNum
<!ELEMENT CustNum (#PCDATA)>
Column Date
Column CustNum
<!ELEMENT Item (ItemNum, Quantity, Part)>
==>
Table Item
<!ELEMENT ItemNum (#PCDATA)>
Column ItemPK
<!ELEMENT Quantity (#PCDATA)>
Column ItemNum
Column Quantity
Column OrderFK
<!ELEMENT Part (PartNum, Price)>
==>
Table Part
<!ELEMENT PartNum (#PCDATA)>
Column PartPK
<!ELEMENT Price (#PCDATA)>
Column PartNum
Column Price
Column PartFK
A generated schema isn't going to be the same as a
human would have written. In addition to naming
problems (for example, a person might have called the
tables Orders, Items, and Parts), the generation algorithm
was unable to determine that the relationship between
Items and Parts was many-to-one. The algorithm was also
unable to recognize that OrderNum and PartNum could
be used as primary keys, and it could not determine data
types and column lengths, although XML Schemas will
solve the latter problem. Although no naming collisions
occurred or illegal names were generated, both are
possible.
4.2. Generating DTDs from Database Schema
DTDs are generated by starting from a single "root" table
or set of root tables and processing each:


Each root table generates an element type with
element content in the form of a single sequence.
Each data (non-key) column in the table generates
an element type with PCDATA-only content and
a reference in the sequence; nullable columns
generate optional references.
Primary and foreign keys are generated as follows:





The remote table is processed in the same manner
as a root table.
A reference to the element type for the remote
table is added to the sequence.
If the key is the primary key, the reference is
optional and repeated (* operator). This is because
there is no guarantee that a row will exist in the
foreign table, nor is there a guarantee that only
one row will exist.
If the key is the primary key, PCDATA-only
element types are optionally generated for each
column in the key. If these are generated,
references to these element types are added to the
sequence. This is useful only if primary keys
contain data.
If the key is a foreign key and is nullable, the
reference is optional (? operator).
During this process, the key used to reach the table (if
any) is not processed. This saves the algorithm from
generating element types that duplicate those created in
the parent table.
The following example shows how this process works.
Consider the following database schema:
Table Orders
Column OrderNum
Column Date
Column CustNum
Table Items
Column OrderNum
Column ItemNum
Column Quantity
Column PartNum
Table Parts
Column PartNum
Column Price
In our first step, we generate an element type for the root
table (Orders):
Tables
DTD
==================
===================================================
Table Orders
()>
Column OrderNum
Column Date
Column CustNum
Table Items
Column OrderNum
Column ItemNum
==>
<!ELEMENT Orders
Column Quantity
Column PartNum
Table Parts
Column PartNum
Column Price
Next, we generate PCDATA-only elements for the data
columns (Date and CustNum) and add references to these
elements to the content model of the Orders element:
Tables
DTD
==================
===================================================
Table Orders
(Date, CustNum)>
Column OrderNum
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
==>
<!ELEMENT Orders
<!ELEMENT Date
<!ELEMENT CustNum
Table Items
Column OrderNum
Column ItemNum
Column Quantity
Column PartNum
Table Parts
Column PartNum
Column Price
Now we generate a PCDATA-only element for the
primary key (OrderNum) and add a reference to it to the
content model:
Tables
DTD
==================
===================================================
Table Orders
==>
(Date, CustNum, OrderNum)>
Column OrderNum
OrderNum (#PCDATA)>
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
<!ELEMENT Orders
<!ELEMENT
<!ELEMENT Date
<!ELEMENT CustNum
Table Items
Column OrderNum
Column ItemNum
Column Quantity
Column PartNum
Table Parts
Column PartNum
Column Price
And then add an element for the table (Items) to which
the primary key is exported, as well as a reference to it in
the content model:
Tables
DTD
==================
===================================================
Table Orders
(Date, CustNum, OrderNum, Items*)>
Column OrderNum
OrderNum (#PCDATA)>
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
<!ELEMENT Orders
Table Items
Items()>
Column OrderNum
Column ItemNum
Column Quantity
Column PartNum
<!ELEMENT
==>
<!ELEMENT
<!ELEMENT Date
<!ELEMENT CustNum
Table Parts
Column PartNum
Column Price
We process the data and primary key columns in the
remote (Items) table in the same way:
Tables
DTD
==================
===================================================
Table Orders
(Date, CustNum, OrderNum, Items*)>
Column OrderNum
OrderNum (#PCDATA)>
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
<!ELEMENT Orders
Table Items
Items(ItemNum, Quantity)>
Column OrderNum
Column ItemNum
(#PCDATA)>
Column Quantity
Quantity (#PCDATA)>
Column PartNum
<!ELEMENT
==>
<!ELEMENT
<!ELEMENT Date
<!ELEMENT CustNum
<!ELEMENT ItemNum
<!ELEMENT
Table Parts
Column PartNum
Column Price
And then add an element for the table (Parts) to which the
foreign key corresponds:
Tables
DTD
==================
===================================================
Table Orders
(Date, CustNum, OrderNum, Items*)>
Column OrderNum
OrderNum (#PCDATA)>
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
<!ELEMENT Orders
<!ELEMENT
<!ELEMENT Date
<!ELEMENT CustNum
Table Items
Items(ItemNum, Quantity, Parts)>
Column OrderNum
Column ItemNum
(#PCDATA)>
Column Quantity
Quantity (#PCDATA)>
Column PartNum
<!ELEMENT
Table Parts
Parts()>
Column PartNum
Column Price
<!ELEMENT
==>
<!ELEMENT ItemNum
<!ELEMENT
Finally, we process the foreign key table (Parts):
Tables
DTD
==================
===================================================
Table Orders
(Date, CustNum, OrderNum, Items*)>
Column OrderNum
OrderNum (#PCDATA)>
Column Date
(#PCDATA)>
Column CustNum
(#PCDATA)>
<!ELEMENT Orders
Table Items
(ItemNum, Quantity, Parts)>
Column OrderNum
Column ItemNum
(#PCDATA)>
Column Quantity
Quantity (#PCDATA)>
Column PartNum
<!ELEMENT Items
Table Parts
Parts(PartNum, Price)>
Column PartNum
(#PCDATA)>
Column Price
(#PCDATA)>
<!ELEMENT
==>
<!ELEMENT
<!ELEMENT Date
<!ELEMENT CustNum
<!ELEMENT ItemNum
<!ELEMENT
<!ELEMENT PartNum
<!ELEMENT Price
As was the case in the previous section, the generated
DTD is not what a human would have created. Although
the only problems here are with names, the algorithm
cannot recognize order columns or property tables.
5. Mapping XML Schemas to Databases
Most XML schema languages can be mapped to
databases with an object-relational mapping. The exact
mappings depend on the language. DDML, DCD, and
XML Data Reduced schemas can be mapped in a manner
almost identical to DTDs. The mappings for W3C
Schemas, Relax, TREX, and SOX appear to be somewhat
more complex. It is not clear to me that Schematron can
be mapped.
In the case of W3C Schemas, a complete mapping to
object schemas and then to database schemas is available.
Briefly, this maps complex types to classes (with
complex type extension mapped to inheritance) and maps
simple types to scalar data types (although many facets
are lost). "All" groups are treated like unordered
sequences and substitution groups are treated like
choices. Finally, most identity constraints are mapped to
keys. For complete details, see
http://www.rpbourret.com/xml/SchemaMap.htm.
6. Related Topics
For a wider discussion of XML and databases, see:
XML and Databases
(http://www.rpbourret.com/xml/XMLAndDatabases.htm)
For a reasonably updated list of XML database products,
see:
XML Database Products
(http://www.rpbourret.com/xml/XMLDatabaseProds.htm)
If you have comments or questions about this paper, you
can reach me at:
[email protected]