Tamino –
a DBMS Designed for XML
Dr. Harald Schoning
Presenter: Wenhui Li
University of Ottawa
Instructed by:
Dr. Mengchi Liu
Carleton University
Abstract



Who? Software AG
What? An XML database management system
When?



1999: first unveiled
June 2004: Tamino XML Server 4.2
Why?


management and transfer of structured and
unstructured data
completely designed for XML
Industry Background


XML is becoming the prevailing format for data
processing on the Internet.
Early goals of Tamino


Easy data exchange
Evolution trend


Storing, managing, publishing and
exchanging XML documents
Business modeling
Industry Background cont’
XML support in databases



Oracle XML Developer’s Kit
SQL Server 2000
DB2 XML Extender
Limitations of XML support via
traditional RDBMS or ORDB


XML data is not as rigidly structured as data
in an RDB, ORDB or OODB
Storing and querying XML is possible
but not practical in these DB systems
Two Modeling approaches

Data-centric documents




Regular structure
Order does not matter
No mixed content
Document-centric documents



less regular structure
significance of the order
mixed content
Why not use a relational DB?


XML documents can have schema
information (a DTD), but they are not
required to.
Classical database handling, which deals with
objects of a predefined type, cannot be applied
to XML
Why not use XML itself?


XML is just a markup language; it does
not contain processing facilities of its
own
Querying a set of XML documents is
outside the scope of the XML
recommendation
Hence Tamino!
What does Tamino do?



What’s Tamino (the 1st slide)
Store XML documents, HTML files and
GIF images, etc.
Retrieve them in a set-oriented manner,
with sophisticated query facilities
Tamino’s architecture
The schema of XML
documents




XML supports schema information, but
it differs from that of classical databases
DTDs have a couple of deficiencies (e.g.
lack of data types)
A W3C working group is developing an
XML schema description language
However, the DTD is the only standard
schema language at present
XML schema vs.
RDB and OODB schema




In RDB or OODB, the schema is created
before the instances can be stored
Instances must conform to the declared
schema
In an XML database, each instance can declare a
schema of its own.
For XML documents, grouping objects of
homogeneous structure into (pre-defined)
tables or classes doesn’t work
Queries, indexes and the XML
schema




Queries operate on sets
Indexes are defined on the basis of a
common schema
For the purpose of querying, arbitrary objects
could be grouped into sets
Index definition also requires at least a
common subset in the structure
Schema handling in Tamino





Grouping documents by open content model
+ user-directed document grouping
Documents grouped into collections
Within a collection, declare several document
types
For each document type define a common
schema (open content model)
For each document, Tamino assigns one of
the document types
Type Assignment




Assignment is based on the root element type
Document must match the schema of the
document type assigned, but might have
additional elements/attributes
In a document type, documents might differ
considerably
If no appropriate document type exists, the document is
stored without any schema checking
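The assignment rules above can be sketched in a few lines of Python. This is an illustration only, not Tamino's implementation: the DOCTYPES table, the function assign_doctype, and the reduction of a schema to a set of required child elements are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: document type -> child elements its schema requires.
# With an open content model, anything beyond these is tolerated.
DOCTYPES = {"City": {"Name"}}

def assign_doctype(xml_text):
    """Return (doctype, accepted) for a document, keyed on its root element type."""
    root = ET.fromstring(xml_text)
    required = DOCTYPES.get(root.tag)
    if required is None:
        # No appropriate document type: store without any schema checking.
        return None, True
    present = {child.tag for child in root}
    # Additional elements are allowed; only the declared parts must be present.
    return root.tag, required <= present

city = '<City Inhabitants="138000"><Name>Darmstadt</Name><Monument/></City>'
print(assign_doctype(city))                        # ('City', True)
print(assign_doctype("<City><Monument/></City>"))  # ('City', False)
print(assign_doctype("<Note>free form</Note>"))    # (None, True)
```

The undeclared Monument element does not cause a rejection, which is the point of the open content model; only the missing Name does.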
Tamino schema example
Document accepted by Tamino
<City Inhabitants="138000">
<Name>Darmstadt</Name>
<Addition>The city of art nouveau</Addition>
<Monument Height="39m">
<Name>Langer Ludwig</Name>
<Location>
<Name>Luisenplatz</Name>
<MapIndex>M5</MapIndex>
</Location>
</Monument>
</City>
When should an element/attribute be
modeled?





an index will be defined on this
element/attribute
the element/attribute is to be mapped to an
external data source or to a server extension
dedicated access rights will be defined on the
element/attribute
the presence / multiplicity of the element is to
be enforced
one of the above conditions holds for a child of
the element
Indexing in Tamino

value-based indexes




well known from traditional database
systems
used to accelerate the search
must address the data object exactly, because
names need not be unique within a DTD
Example of value-based index

value-based indexes
data-centric view
<!ELEMENT City (Name, Inhabitants, Monument+)>
<!ELEMENT Monument (Name, Description)>
<!ELEMENT Inhabitants (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
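In the DTD above, Name occurs under both City and Monument, so an index keyed on the element name alone would be ambiguous. The Python sketch below keys each entry on the full path instead. It is only an illustration of the idea: build_value_index and the sample document (including the Description text) are made up, and nothing here reflects how Tamino stores its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_value_index(docs):
    """Map (path, value) -> set of document ids, for leaf elements with text."""
    index = defaultdict(set)

    def walk(elem, path, doc_id):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            index[(here, text)].add(doc_id)
        for child in elem:
            walk(child, here, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index

# Made-up sample document that follows the DTD above.
docs = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name>"
       "<Description>Column</Description></Monument></City>",
}
index = build_value_index(docs)

# The path tells the city name apart from the monument name.
print(index.get(("/City/Name", "Darmstadt"), set()))               # {1}
print(index.get(("/City/Monument/Name", "Langer Ludwig"), set()))  # {1}
print(index.get(("/City/Name", "Langer Ludwig"), set()))           # set()
```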

Indexing in Tamino (cont’)

text indexing



document-centric view
limit the scope to a specific part of the
document
the scope might span element content
Example of text index

text indexing
document-centric view
<statement>
<author>
<firstname>Harald</firstname>
<lastname>Schoning</lastname>
</author>
<text>
X<italic>M</italic>L and X<italic>S</italic>L
are <stressed>very</stressed> important
</text>
</statement>
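A text index for this example has to see "XML" as one word even though markup splits it, and it should be limited to the text element. The Python sketch below shows one way to get both properties. It is a hedged illustration: build_text_index and its scope parameter are hypothetical names, not Tamino's interface.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_text_index(docs, scope):
    """Map lower-cased word -> set of doc ids, limited to elements named `scope`."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter(scope):
            # itertext() concatenates text across nested markup, so
            # X<italic>M</italic>L is indexed as the single word "xml".
            text = "".join(elem.itertext())
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(doc_id)
    return index

statement = (
    "<statement><author><firstname>Harald</firstname>"
    "<lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L "
    "are <stressed>very</stressed> important</text></statement>"
)
index = build_text_index({1: statement}, scope="text")

print("xml" in index, "xsl" in index)  # True True
print("harald" in index)               # False: author is outside the scope
```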

Indexing in Tamino (cont’)

structural index



Useful if multiplicity permits the omission of
elements
or if no DTD is known
Example


in a database of all European cities
search all those cities which have an
element called “beach”
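The beach query can be answered without touching the documents if the database records which element names occur in which document. A minimal Python sketch of that idea follows; build_structure_index and the two sample cities are made up for illustration and say nothing about Tamino's internal structures.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_structure_index(docs):
    """Map element name -> set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index[elem.tag].add(doc_id)
    return index

# Made-up sample data: only one of the two cities has a beach element.
cities = {
    "Nice": "<City><Name>Nice</Name><beach/></City>",
    "Darmstadt": "<City><Name>Darmstadt</Name></City>",
}
index = build_structure_index(cities)

print(sorted(index["beach"]))  # ['Nice']
print(sorted(index["Name"]))   # ['Darmstadt', 'Nice']
```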
Querying XML documents





Currently, there is no standardized query
language
XPath allows positioning within a single
document
XPath fits the needs of retrieval in data-centric environments well
document-centric environments need a more
content-based retrieval facility
Tamino also supports full text search
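To make "positioning within a single document" concrete, the Python sketch below runs a few path expressions against the City document from the schema example. It uses the limited XPath subset of Python's xml.etree.ElementTree, which is an assumption of this sketch and not Tamino's query interface.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring(
    '<City Inhabitants="138000"><Name>Darmstadt</Name>'
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    '<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>'
    '</Monument></City>'
)

# A path positions on specific nodes: only the monument's own Name.
monument_names = [e.text for e in city.findall("./Monument/Name")]

# A predicate selects by value: monuments whose Name child is "Langer Ludwig".
matches = city.findall("./Monument[Name='Langer Ludwig']")

# Without a path, every Name in the document is returned, in document order.
all_names = [e.text for e in city.findall(".//Name")]

print(monument_names)            # ['Langer Ludwig']
print(matches[0].get("Height"))  # 39m
print(all_names)                 # ['Darmstadt', 'Langer Ludwig', 'Luisenplatz']
```

These are exact, value-based conditions on one document. They do not cover "documents about X", which is where the content-based retrieval and full text search come in.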
Expectation for XML processor



W3C: the XML recommendation specifies the
handling of entities, comments and
processing instructions.
User: Tamino should leave comments intact, evaluate
no processing instructions and leave entity
references unresolved.
User: the output of a Tamino query should
match the specification of an XML processor.
Why not leave entities
unresolved?




The result may be a set of (parts of) matching
documents
The result DTD must include all the different
entity declarations of the original documents
The definition of an entity might differ from
document to document
So, where the same entity name has different
definitions, entities are renamed, and the entity
references are changed accordingly.
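The renaming step can be sketched in Python as follows. This only illustrates the bookkeeping involved: merge_entities, the "_2" suffix scheme and the sample entity values are hypothetical, and the slides do not say how Tamino actually chooses new names.

```python
import re

def merge_entities(docs):
    """docs: list of (entity_declarations, text).
    Returns (merged_declarations, rewritten_texts) for one combined result."""
    merged = {}
    texts = []
    for decls, text in docs:
        rename = {}
        for name, value in decls.items():
            new_name, n = name, 1
            # Pick a fresh name only if this name is already bound
            # to a different definition.
            while new_name in merged and merged[new_name] != value:
                n += 1
                new_name = f"{name}_{n}"
            merged[new_name] = value
            rename[name] = new_name
        # Rewrite all references in a single pass so renames cannot cascade.
        texts.append(re.sub(
            r"&([\w.:-]+);",
            lambda m: "&" + rename.get(m.group(1), m.group(1)) + ";",
            text,
        ))
    return merged, texts

# Made-up sample: two documents define the entity "co" differently.
docs = [
    ({"co": "Software AG"}, "written by &co;"),
    ({"co": "W3C"}, "published by &co;"),
]
merged, texts = merge_entities(docs)
print(merged)  # {'co': 'Software AG', 'co_2': 'W3C'}
print(texts)   # ['written by &co;', 'published by &co_2;']
```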
Problems of external entities



These entities can change without the database
system knowing about this
Thus, the values of external entities must not be
included in indexes
Example:
<!ENTITY mysubject SYSTEM
"http://www.softwareag.com/hottopic.xml">
...
<ticker>Today's hot topic: &mysubject;</ticker>

Checking the current contents of the external entity
would lead to unacceptable response times.
Relational Databases and XML




Major (object-)relational database
systems include some form of XML
support
The simplest form is to generate XML
documents for existing relational data.
But, real database handling of XML
requires that XML data can be stored
and retrieved
Two approaches
XML support approach(1)



Map the XML document to relational
tables and their columns
Markup is ignored on storage, and
reconstructed on retrieval
advantage of this approach:

the contents of an XML document can be
handled with traditional SQL
XML support approach(1) cont’

Shortcomings:

The sequence information is lost
<Order CustomerId="567" Date="12-12-2000">
<Item ProductID="17" Quantity="2"/>
<Item ProductID="16" Quantity="9"/>
<Item ProductID="19" Quantity="8"/>
</Order>
The retrieval of the order:
<Order CustomerId="567" Date="12-12-2000">
<Item ProductID="16" Quantity="9"/>
<Item ProductID="17" Quantity="2"/>
<Item ProductID="19" Quantity="8"/>
</Order>
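The reordering can be reproduced with any relational store. The Python sketch below uses the standard sqlite3 module with hypothetical table and column names. The second table, with an explicit position column, is a common workaround added here as an assumption; it is not something the slides propose.

```python
import sqlite3

# Items in document order, as in the Order example.
items = [(17, 2), (16, 9), (19, 8)]

con = sqlite3.connect(":memory:")

# Plain mapping: a table is a set of rows, so any order on retrieval
# has to come from a column value, here the product id.
con.execute("CREATE TABLE item (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
con.executemany("INSERT INTO item VALUES (?, ?)", items)
by_key = [row[0] for row in
          con.execute("SELECT product_id FROM item ORDER BY product_id")]

# Workaround (assumption): store the position explicitly.
con.execute("CREATE TABLE item_seq (pos INTEGER, product_id INTEGER, quantity INTEGER)")
con.executemany(
    "INSERT INTO item_seq VALUES (?, ?, ?)",
    [(pos, pid, qty) for pos, (pid, qty) in enumerate(items)],
)
in_doc_order = [row[0] for row in
                con.execute("SELECT product_id FROM item_seq ORDER BY pos")]

print(by_key)        # [16, 17, 19]
print(in_doc_order)  # [17, 16, 19]
```

The extra column restores the order but has to be maintained by the mapping layer on every insert or move, which is part of the cost of this approach.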
XML support approach(1) cont’



For data-centric documents sequence might
not matter, but it does for document-centric ones
This approach loses all comments and
processing instructions
Mixed content cannot be stored easily
in this model
XML support approach(2)




Leaves the XML document intact and
stores it in a large text field (“BLOB”)
Or even outside the database
Text search is possible
A text-based condition can be limited to certain parts of the document
XML support approach(2) cont’

Limitations:


no structure-aware combinations are
possible
Value-based search is not supported on
these text fields



IBM solution: side tables
But, direct manipulation of side tables destroys
the consistency of the database
Security can be defined on document level
only, but not on elements or attributes
Summary






Tamino was designed with particular attention to
XML
Schema handling for XML differs from schema handling
in relational databases
In schema handling, external entities cause
conceptual problems
value-based indexes are useful for XML, as well as
text index and structural index
Comments and processing instructions should be
preserved when documents are stored
The result of a query against an XML database
should be XML
Q&A
Thanks!