Ernestina Menasalvas
Facultad de Informática.
Universidad Politécnica de Madrid
Sources
The slides are taken from:
• Database System Concepts © Silberschatz, Korth and Sudarshan. See www.db-book.com for conditions on re-use
• XML tutorial at http://www.w3schools.com/xml/default.asp (the slides without a footnote)
AGENDA
• Introduction
• Structure of XML Data
• XML Document Schema
• Querying and Transformation
• Application Program Interfaces to XML
• Storage of XML Data
• XML Applications
Introduction
Introduction
What is XML?
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• Defined by the WWW Consortium (W3C)
• Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
• Documents have tags giving extra information about sections of
the document
– E.g. <title> XML </title> <slide> Introduction …</slide>
• XML uses a Document Type Definition (DTD) or an XML
Schema to describe the data
• XML with a DTD or XML Schema is designed to be self-descriptive
The Main Difference Between XML and HTML
• XML was designed to carry data.
• XML is not a replacement for HTML.
XML and HTML were designed with different goals:
– XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
– HTML is about displaying information, while XML is about describing
information.
• XML is a Complement to HTML
• It is important to understand that XML is not a replacement for
HTML. In future Web development it is most likely that XML will be
used to describe the data, while HTML will be used to format and
display the same data.
• XML is a cross-platform, software and hardware independent
tool for transmitting information.
How can XML be used
• It is important to understand that XML was designed to store, carry,
and exchange data. XML was not designed to display data.
• XML can Separate Data from HTML
• With XML, your data is stored outside your HTML.
• When HTML is used to display data, the data is stored inside your
HTML. With XML, data can be stored in separate XML files. This
way you can concentrate on using HTML for data layout and
display, and be sure that changes in the underlying data will not
require any changes to your HTML.
• XML data can also be stored inside HTML pages as "Data
Islands". You can still concentrate on using HTML only for
formatting and displaying the data.
Why Use XML?
• XML is Used to Exchange Data
– With XML, data can be exchanged between incompatible systems.
• XML and B2B
– With XML, financial information can be exchanged over the Internet.
• XML Can be Used to Share Data
– With XML, plain text files can be used to share data.
• XML Can be Used to Store Data
– With XML, plain text files can be used to store data.
• XML Can Make your Data More Useful
– With XML, your data is available to more users.
• XML Can be Used to Create New Languages
– For example, WML (Wireless Markup Language), the markup language used by WAP devices, is defined in XML.
• If Developers Have Sense
– If they DO have sense, all future applications will exchange their data in XML.
XML: applications
• Data interchange is critical in today’s networked world
– Examples:
• Banking: funds transfer
• Order processing (especially inter-company orders)
• Scientific data
– Chemistry: ChemML, …
– Genetics: BSML (Bio-Sequence Markup Language),
…
– Paper flow of information between organizations is being replaced
by electronic flow of information
• Each application area has its own set of standards for
representing information
• XML has become the basis for all new generation data
interchange formats
Database System Concepts - 5th Edition, Aug 22, 2005.
10.9
©Silberschatz, Korth and Sudarshan
XML applications (Cont.)
• Earlier generation formats were based on plain text with line
headers indicating the meaning of fields
– Similar in concept to email headers
– Does not allow for nested structures, no standard “type” language
– Tied too closely to low level document structure (lines, spaces, etc)
• Each XML based standard defines what are valid elements, using
– XML type specification languages to specify the syntax
• DTD (Document Type Definition)
• XML Schema
– Plus textual descriptions of the semantics
• XML allows new tags to be defined as required
– However, this may be constrained by DTDs
• A wide variety of tools is available for parsing, browsing and
querying XML documents/data
XML Does not DO Anything
• XML was not designed to DO anything.
• Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store, and send information.
• The following example is a note to Tove from Jani, stored as XML:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The note has a header and a message body. It also has sender
and receiver information. But still, this XML document does not DO
anything. It is just pure information wrapped in XML tags.
• Someone must write a piece of software to send,
receive or display it.
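As a minimal sketch of such a piece of software, the Python standard library's `xml.etree.ElementTree` module can parse the note and pull out its fields by tag name. The helper name `display_note` and the output layout are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def display_note(xml_text):
    """Parse a <note> document and format it as a short message."""
    note = ET.fromstring(xml_text)      # returns the root element, <note>
    to = note.findtext("to")            # text of the first <to> child
    sender = note.findtext("from")      # "from" is a Python keyword, hence "sender"
    body = note.findtext("body")
    return f"To: {to}\nFrom: {sender}\n{body}"


print(display_note(NOTE))
```

The XML itself stays passive; all behaviour (what to show, in which order) lives in the program that reads it.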
Comparison with Relational Data
• Inefficient: tags, which in effect represent schema information, are
repeated
• Better than relational tuples as a data-exchange format
– Unlike relational tuples, XML data is self-documenting due to presence
of tags
– Non-rigid format: tags can be added
– Allows nested structures
– Wide acceptance, not only in database systems, but also in browsers,
tools, and applications
Syntax
XML syntax rules
• The rules are very simple and very strict: easy to learn, and very easy to use.
• XML documents use a self-describing and simple syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The first line in the document, the XML declaration, defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
• The next line describes the root element of the document (as if it were saying: "this document is a note"): <note>
• The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
• And finally the last line defines the end of the root element: </note>
XML syntax rules
• All XML Elements Must Have a Closing Tag
– With XML, it is illegal to omit the closing tag.
• In HTML some elements do not have to have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
• In XML the closing tag is required:
<p>This is a paragraph</p>
• The XML declaration does not have a closing tag. This is not an error: the declaration is not part of the XML document itself. It is not an XML element, and it should not have a closing tag.
• XML Tags are Case Sensitive
– Unlike HTML, with XML the tag <Letter> is different from the tag <letter>.
• XML Elements Must be Properly Nested
<b><i>This text is bold and italic</i></b>
• Comments in XML are written as in HTML:
<!-- This is a comment -->
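One way to see these rules in action is to hand small documents to a real XML parser, which rejects anything that is not well-formed. A sketch using Python's `xml.etree.ElementTree`; the helper `is_well_formed` is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, i.e. it is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


cases = {
    "<p>This is a paragraph</p>": True,              # closing tag present
    "<p>This is a paragraph": False,                 # closing tag missing
    "<Letter>Hi</letter>": False,                    # tags are case sensitive
    "<b><i>bold and italic</i></b>": True,           # properly nested
    "<b><i>bold and italic</b></i>": False,          # improperly nested
    "<p><!-- This is a comment -->text</p>": True,   # comments are allowed
}
for text, expected in cases.items():
    assert is_well_formed(text) == expected, text
```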
XML syntax rules
• All XML documents must contain a single tag pair to define a root element.
– All other elements must be within this root element.
– All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
<root> <child> <subchild>.....</subchild> </child> </root>
• XML Attribute Values Must be Quoted
– With XML, it is illegal to omit quotation marks around attribute values.
– XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. The document below is correct; writing the start tag as <note date=12/11/2002>, without the quotes, would make it incorrect:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>
• With XML, the white space in your document is not truncated. This is unlike HTML, which collapses a run of white-space characters into a single space.
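The same parser-based check can illustrate the single-root, quoting and white-space rules. This is a sketch with a hypothetical `parses` helper built on Python's `xml.etree.ElementTree`; it tests syntax only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


assert parses('<note date="12/11/2002"><to>Tove</to></note>')      # quoted value
assert not parses('<note date=12/11/2002><to>Tove</to></note>')    # unquoted value
assert not parses("<to>Tove</to><from>Jani</from>")                 # two root elements

# White space inside element content is passed through unchanged.
body = ET.fromstring("<body>Hello     Tove</body>")
assert body.text == "Hello     Tove"
```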
XML is Free and Extensible
• XML tags are not predefined. You must "invent" your
own tags.
• The tags used to mark up HTML documents and the
structure of HTML documents are predefined. The
author of HTML documents can only use tags that are
defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his
own document structure.
• The tags in the example above (like <to> and <from>)
are not defined in any XML standard. These tags are
"invented" by the author of the XML document.
XML tags
• The ability to specify new tags, and to create nested tag structures
make XML a great way to exchange data, not just documents.
– Much of the use of XML has been in data exchange applications, not as a
replacement for HTML
• Tags make data (relatively) self-documenting
– E.g.
<bank>
  <account>
    <account_number> A-101 </account_number>
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account_number> A-101 </account_number>
    <customer_name> Johnson </customer_name>
  </depositor>
</bank>
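Because the tags name each field, a program can locate data by tag rather than by position. A minimal sketch with Python's `xml.etree.ElementTree` that joins each `depositor` to its `account` on `account_number`, much like a relational join; `balances_by_customer` is a hypothetical helper, and it assumes every referenced account exists in the document.

```python
import xml.etree.ElementTree as ET

BANK = """<bank>
  <account>
    <account_number> A-101 </account_number>
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account_number> A-101 </account_number>
    <customer_name> Johnson </customer_name>
  </depositor>
</bank>"""


def balances_by_customer(xml_text):
    """Join <depositor> and <account> elements on account_number."""
    bank = ET.fromstring(xml_text)
    # The parser keeps the blanks around " A-101 ", so strip them explicitly.
    balance = {a.findtext("account_number").strip(): int(a.findtext("balance").strip())
               for a in bank.findall("account")}
    return {d.findtext("customer_name").strip(): balance[d.findtext("account_number").strip()]
            for d in bank.findall("depositor")}


print(balances_by_customer(BANK))   # {'Johnson': 500}
```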
Structure of XML Data
Structure of XML Data
• Tag: label for a section of data
• Element: section of data beginning with <tagname>
and ending with matching </tagname>
• Elements must be properly nested
– Proper nesting
• <account> … <balance> …. </balance> </account>
– Improper nesting
• <account> … <balance> …. </account> </balance>
– Formally: every start tag must have a unique matching end
tag, that is in the context of the same parent element.
• Every document must have a single top-level element
Elements are extensible
• XML documents can be extended to carry more information.
• Look at the following XML NOTE example:
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
• Let's imagine that we created an application to produce this output:
MESSAGE To: Tove
From: Jani
Don't forget me this weekend!
Elements are extensible
• Imagine that the author of the XML document added some extra
information to it:
<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
• Should the application break or crash?
• No.
• The application should still be able to find the <to>, <from>, and
<body> elements in the XML document and produce the same
output.
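A sketch of such an application in Python, assuming it looks up children by tag name (as `findtext` does) rather than by position. The names `render`, `OLD` and `NEW` are illustrative.

```python
import xml.etree.ElementTree as ET

OLD = """<note><to>Tove</to><from>Jani</from>
<body>Don't forget me this weekend!</body></note>"""

NEW = """<note><date>2002-08-01</date><to>Tove</to><from>Jani</from>
<body>Don't forget me this weekend!</body></note>"""


def render(xml_text):
    """Find children by name, so unknown extra elements are simply ignored."""
    note = ET.fromstring(xml_text)
    return "MESSAGE To: {}\nFrom: {}\n{}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body"))


assert render(OLD) == render(NEW)   # the added <date> does not change the output

# Positional access is what would break: child [2] is <body> in OLD but <from> in NEW.
assert ET.fromstring(OLD)[2].tag == "body"
assert ET.fromstring(NEW)[2].tag == "from"
print(render(NEW))
```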
Elements have Content
• An XML element is everything from (including) the
element's start tag to (including) the element's end tag.
• An element can have:
– element content,
– mixed content,
– simple content, or
– empty content.
• An element can also have attributes.
Elements have Content
<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
• book has element content, because it contains other elements.
• prod has empty content, because it carries no information.
• chapter has mixed content, because it contains both text and other elements.
• para has simple content (or text content), because it contains only text.
In the example above only the prod element has attributes.
The attribute named id has the value "33-657".
The attribute named media has the value "paper".
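The four content categories can be told apart mechanically. The sketch below, using Python's `xml.etree.ElementTree`, classifies an element by whether it has child elements and non-blank text of its own. Treating white-space-only text as insignificant is an assumption made here (without a DTD, a parser cannot know whether that white space matters), and `content_kind` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
</book>"""


def content_kind(elem):
    """Classify an element as 'element', 'mixed', 'simple' or 'empty' content."""
    children = list(elem)
    # Text directly inside elem: before the first child, plus text after each child.
    own_text = (elem.text or "") + "".join(child.tail or "" for child in children)
    has_text = own_text.strip() != ""
    if children and has_text:
        return "mixed"
    if children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK)
for elem in [book, book.find("title"), book.find("prod"), book.find("chapter")]:
    print(elem.tag, content_kind(elem), elem.attrib)
```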
Element Naming
• XML elements must follow these naming rules:
– Names can contain letters, numbers, and other characters
– Names must not start with a number or punctuation character
– Names must not start with the letters xml (or XML, or Xml, etc)
– Names cannot contain spaces
• Some practical advice:
– Any name can be used, no words are reserved, but the idea is to make names descriptive.
– Avoid "-" and "." in names: some software may read first-name as a subtraction, or first.name as the property name of an object first.
– Element names can be as long as you like.
– XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. Use the naming rules of your database for the elements in the XML documents.
– Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.
– The ":" should not be used in element names because it is reserved for namespaces.
XML Element Attributes
• XML elements can have attributes in the start tag, just like HTML.
• Attributes are used to provide additional information about elements.
• From HTML:
<IMG SRC="computer.gif">
The SRC attribute provides additional information about the IMG element.
• In HTML (and in XML) attributes provide additional information about elements.
• Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:
<file type="gif">computer.gif</file>
• Attribute values must always be enclosed in quotes, but either single or double quotes can be used:
<person sex="female">
<person sex='female'>
Attributes
• Elements can have attributes
<account acct-type="checking">
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>
• Attributes are specified by name=value pairs inside the starting
tag of an element
• An element may have several attributes, but each attribute name
can only occur once
<account acct-type="checking" monthly-fee="5">
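In Python's `xml.etree.ElementTree`, the name=value pairs of a start tag are exposed as a dictionary, and the parser itself enforces the rule that an attribute name occurs only once per element. A brief sketch (the `overdraft` attribute is a made-up name used only to show the lookup of a missing attribute):

```python
import xml.etree.ElementTree as ET

ACCOUNT = """<account acct-type="checking" monthly-fee="5">
<account_number> A-102 </account_number>
<branch_name> Perryridge </branch_name>
<balance> 400 </balance>
</account>"""

account = ET.fromstring(ACCOUNT)
print(account.attrib)                      # all name=value pairs from the start tag
print(account.get("acct-type"))            # look up one attribute by name
print(account.get("overdraft", "none"))    # absent attribute: the supplied fallback is returned

# Repeating an attribute name in one start tag is a well-formedness error.
try:
    ET.fromstring('<account acct-type="checking" acct-type="savings"/>')
except ET.ParseError as err:
    print("rejected:", err)
```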
Use of Elements vs. Attributes
• Data can be stored in child elements or in attributes.
• Take a look at these examples:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
• In the first example sex is an attribute. In the last, sex is a child
element. Both examples provide the same information.
• There are no rules about when to use attributes, and when to use
child elements.
• In XML use child elements if the information feels like data.
Attributes vs. Subelements
• Distinction between subelement and attribute
– In the context of documents, attributes are part of markup, while
subelement contents are part of the basic document contents
– In the context of data representation, the difference is unclear and
may be confusing
• Same information can be represented in two ways
– <account account_number="A-101"> …. </account>
– <account>
<account_number>A-101</account_number> …
</account>
– Suggestion: use attributes for identifiers of elements, and use
subelements for contents
Avoid using attributes?
• Some of the problems with using attributes are:
– attributes cannot contain multiple values (child
elements can)
– attributes are not easily expandable (for future
changes)
– attributes cannot describe structures (child elements
can)
– attributes are more difficult to manipulate by program
code
– attribute values are not easy to test against a
Document Type Definition (DTD) - which is used to
define the legal elements of an XML document
Avoid using attributes?
• If you use attributes as containers for data, you end up with
documents that are difficult to read and maintain.
• Try to use elements to describe data.
• Use attributes only to provide information that is not relevant to the
data.
• Don't end up like this (this is not how XML should be used):
<note day="12" month="11" year="2002" to="Tove" from="Jani"
heading="Reminder" body="Don't forget me this weekend!"> </note>
• Metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.
Elements can be nested
<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <customer_street> Main </customer_street>
    <customer_city> Harrison </customer_city>
    <account>
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
    <account>
      …
    </account>
  </customer>
  …
</bank-1>
Motivation for Nesting
• Nesting of data is useful in data transfer
– Example: elements representing customer_id, customer_name, and
address nested within an order element
• Nesting is not supported, or discouraged, in relational databases
– With multiple orders, customer name and address are stored
redundantly
– normalization replaces nested structures in each order by foreign key
into table storing customer name and address information
– Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
– External application does not have direct access to data referenced
by a foreign key
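The normalization described above can be sketched by flattening the nested `bank-1` document into three lists of tuples, one per relation. The relation layout (customer, account, and a depositor relation of (customer_name, account_number) pairs) is an assumption modelled on the bank schema used in these slides, and `normalize` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

BANK1 = """<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <customer_street> Main </customer_street>
    <customer_city> Harrison </customer_city>
    <account>
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
  </customer>
</bank-1>"""


def normalize(xml_text):
    """Flatten nested customer/account elements into three relations (lists of tuples)."""
    customers, accounts, depositors = [], [], []
    for cust in ET.fromstring(xml_text).findall("customer"):
        name = cust.findtext("customer_name").strip()
        customers.append((name,
                          cust.findtext("customer_street").strip(),
                          cust.findtext("customer_city").strip()))
        for acct in cust.findall("account"):
            number = acct.findtext("account_number").strip()
            accounts.append((number,
                             acct.findtext("branch_name").strip(),
                             int(acct.findtext("balance"))))   # int() ignores surrounding blanks
            # The nesting is replaced by a pair of keys linking the two relations.
            depositors.append((name, number))
    return customers, accounts, depositors


customers, accounts, depositors = normalize(BANK1)
print(customers)    # [('Hayes', 'Main', 'Harrison')]
print(accounts)     # [('A-102', 'Perryridge', 400)]
print(depositors)   # [('Hayes', 'A-102')]
```

Each customer tuple is stored once, however many accounts the customer has; the nested form sent to an external application repeats nothing because the structure itself carries the link.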
Elements and text
• Mixture of text with sub-elements is legal in
XML.
– Example:
<account>
This account is seldom used any more.
<account_number> A-102</account_number>
<branch_name> Perryridge</branch_name>
<balance>400 </balance>
</account>
– Useful for document markup, but discouraged for
data representation
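Mixed content is awkward for data partly because a program has to look in two places for text. In Python's `xml.etree.ElementTree`, text before the first child is stored in the parent's `.text`, and text after a child's end tag is stored in that child's `.tail`. A short sketch:

```python
import xml.etree.ElementTree as ET

ACCOUNT = """<account>
This account is seldom used any more.
<account_number> A-102</account_number>
<branch_name> Perryridge</branch_name>
<balance>400 </balance>
</account>"""

account = ET.fromstring(ACCOUNT)
# The free-standing sentence sits in account.text, before the first child element.
print(repr(account.text.strip()))
# Iterating over an element yields only the sub-elements, not the loose text.
print([child.tag for child in account])
```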
Namespaces
• XML data has to be exchanged between organizations
• Same tag name may have different meaning in different
organizations, causing confusion on exchanged documents
• Specifying a unique string as an element name avoids confusion
• Better solution: use unique-name:element-name
• Avoid using long unique names all over document by using XML
Namespaces
<bank xmlns:FB="http://www.FirstBank.com">
…
<FB:branch>
<FB:branchname>Downtown</FB:branchname>
<FB:branchcity> Brooklyn </FB:branchcity>
</FB:branch>
…
</bank>
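A namespace-aware parser replaces each prefix with the URI it is bound to, so the prefix itself is only a local abbreviation. A sketch with Python's `xml.etree.ElementTree`, which reports qualified names in the `{uri}localname` form; the query prefix `first` is an arbitrary choice, to show that it need not match the document's `FB`.

```python
import xml.etree.ElementTree as ET

DOC = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity> Brooklyn </FB:branchcity>
  </FB:branch>
</bank>"""

bank = ET.fromstring(DOC)
branch = bank[0]
# The prefix FB has been replaced by the full namespace URI.
print(branch.tag)   # {http://www.FirstBank.com}branch

# Queries use their own prefix-to-URI map.
ns = {"first": "http://www.FirstBank.com"}
print(bank.find("first:branch/first:branchname", ns).text)   # Downtown
```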
More on XML Syntax
• Elements without subelements or text content can be abbreviated
by ending the start tag with a /> and deleting the end tag
– <account number="A-101" branch="Perryridge" balance="200"/>
• To store string data that may contain tags, without the tags being
interpreted as subelements, use CDATA as below
– <![CDATA[<account> … </account>]]>
Here, <account> and </account> are treated as just strings
CDATA stands for “character data”
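Both constructs can be checked with Python's `xml.etree.ElementTree`: an element written with `/>` parses to an element with attributes but no children and no text, and a CDATA section comes back as plain text. A short sketch; the wrapping `<note>` element is an illustrative assumption, needed because a CDATA section can only appear inside element content.

```python
import xml.etree.ElementTree as ET

# An empty element written with the /> abbreviation: attributes only, no content.
acct = ET.fromstring('<account number="A-101" branch="Perryridge" balance="200"/>')
print(acct.attrib, len(acct), acct.text)   # attributes, 0 children, text is None

# Inside CDATA, <account> is ordinary character data, not a sub-element.
note = ET.fromstring("<note><![CDATA[<account> ... </account>]]></note>")
print(len(note), note.text)                # 0 children; the tags survive as a string
```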
DTD
Well Formed XML Documents
• A "Well Formed" XML document has correct XML syntax.
• A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous slides:
– XML documents must have a root element
– XML elements must have a closing tag
– XML tags are case sensitive
– XML elements must be properly nested
– XML attribute values must always be quoted
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Valid XML Documents
• A "Valid" XML document also conforms to a DTD.
• A "Valid" XML document is a "Well Formed" XML document, which
also conforms to the rules of a Document Type Definition (DTD):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
DTD vs SCHEMA
• A DTD defines the legal elements of an XML document.
• The purpose of a DTD is to define the legal building blocks of an
XML document. It defines the document structure with a list of legal
elements.
• Why Use a DTD?
– With a DTD, each of your XML files can carry a description of its own
format.
– With a DTD, independent groups of people can agree to use a
standard DTD for interchanging data.
– Your application can use a standard DTD to verify that the data you
receive from the outside world is valid.
– You can also use a DTD to verify your own data.
• W3C supports an XML based alternative to DTD called XML
Schema
Document Type Definition (DTD)
• The type of an XML document can be specified using a DTD
• A DTD constrains the structure of XML data
– What elements can occur
– What attributes can/must an element have
– What subelements can/must occur inside each element, and how
many times.
• DTD does not constrain data types
– All values represented as strings in XML
• DTD syntax
– <!ELEMENT element (subelements-specification) >
– <!ATTLIST element (attributes) >
DTD introduction
• A DTD can be declared inline inside an XML document, or as an external reference.
• If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
• Example XML document with an internal DTD:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
DTD introduction (cont)
• External DTD Declaration
– If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element SYSTEM "filename">
• This is the same XML document as above, but with an external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• And this is the file "note.dtd", which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD - XML Building Blocks
• Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
– Elements
– Attributes
– Entities
– PCDATA
– CDATA
Elements and attributes
• Elements are the main building blocks of both XML and HTML documents.
• Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
Examples:
<body>some text</body>
<message>some text</message>
• Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed by " />".
Entities
• Some characters have a special meaning in XML, like the less
than sign (<) that defines the start of an XML tag.
• Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
• The following entities are predefined in XML:
– &lt; for < (less than)
– &gt; for > (greater than)
– &amp; for & (ampersand)
– &quot; for " (quotation mark)
– &apos; for ' (apostrophe)
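Python's standard library has a helper for the three characters that most often need escaping in element text: `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with `&amp;`, `&lt;` and `&gt;`. A sketch showing that a parser turns the entity references back into the original characters; the sample sentence is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if salary < 1000 & age > 30 then say "hi"'
# escape() handles &, < and >; quotes are left alone by default (fine in element text).
escaped = escape(raw)
print(escaped)   # if salary &lt; 1000 &amp; age &gt; 30 then say "hi"

# The parser expands the entity references back into the original characters.
elem = ET.fromstring("<message>" + escaped + "</message>")
assert elem.text == raw

# &quot; and &apos; are predefined too and are expanded the same way.
assert ET.fromstring("<q>&quot;x&quot; &apos;y&apos;</q>").text == "\"x\" 'y'"
```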
Entities
• Entities are variables used to define shortcuts to standard text or special characters.
• Entity references are references to entities.
• Entities can be declared internal or external.
An Internal Entity Declaration
• Syntax:
<!ENTITY entity-name "entity-value">
• Example
DTD Example:
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
XML example:
<author>&writer;&copyright;</author>
• Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
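Python's `xml.etree.ElementTree` (built on the expat parser) expands entities declared in an internal DTD subset, so the example above can be tried directly. A sketch follows; note that this parser deliberately does not fetch external entities like those on the next slide, and raises an error when it meets one.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOC)
# Each entity reference has been replaced by its declared value.
print(author.text)   # Donald Duck.Copyright W3Schools.
```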
An External Entity Declaration
• Syntax
• <!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM
"http://www.w3schools.com/entities.dtd">
• XML example:
<author>&writer;&copyright;</author>
PCDATA and CDATA
• PCDATA means parsed character data.
– Think of character data as the text found between the start tag and the
end tag of an XML element.
– PCDATA is text that WILL be parsed by a parser. The text will be
examined by the parser for entities and markup.
– Tags inside the text will be treated as markup and entities will be
expanded.
– However, parsed character data must not contain raw & or < characters; these must be written as the &amp; and &lt; entities (and > may likewise be written as &gt;).
• CDATA means character data.
– CDATA is text that will NOT be parsed by a parser. Tags inside the
text will NOT be treated as markup and entities will not be expanded.
Empty Elements and PCDATA
• Empty elements are declared with the category keyword EMPTY:
<!ELEMENT element-name EMPTY>
• Example:
<!ELEMENT br EMPTY>
• XML example:
<br />
• Elements with Parsed Character Data
• Elements with only parsed character data are declared with
#PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
• Example:
• <!ELEMENT from (#PCDATA)>
More on elements
• Elements declared with the category keyword ANY can contain any combination of parsable data:
<!ELEMENT element-name ANY>
Example:
<!ELEMENT note ANY>
• Elements with one or more children (sequences) are declared with the name of the children elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
• Example:
<!ELEMENT note (to,from,heading,body)>
• When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the "note" element is:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Elements (cont)
• Declaring Only One Occurrence of an Element
<!ELEMENT element-name (child-name)>
<!ELEMENT note (message)>
• The example above declares that the child element "message" must occur once, and only once, inside the "note" element.
• Declaring Minimum One Occurrence of an Element
<!ELEMENT element-name (child-name+)>
<!ELEMENT note (message+)>
• The + sign in the example above declares that the child element "message" must occur one or more times inside the "note" element.
• Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
<!ELEMENT note (message*)>
• The * sign in the example above declares that the child element "message" can occur zero or more times inside the "note" element.
More on elements
• Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)>
<!ELEMENT note (message?)>
• The ? sign in the example above declares that the child element "message" can occur zero or one time inside the "note" element.
• Declaring either/or Content
<!ELEMENT note (to,from,header,(message|body))>
The example above declares that the "note" element must contain a "to" element, a "from" element, a "header" element, and either a "message" or a "body" element.
• Declaring Mixed Content
<!ELEMENT note (#PCDATA|to|from|header|message)*>
• The example above declares that the "note" element can contain zero or more occurrences of parsed character data, "to", "from", "header", or "message" elements.
Bank DTD
<!DOCTYPE bank [
<!ELEMENT bank ( ( account | customer | depositor)+)>
<!ELEMENT account (account_number, branch_name, balance)>
<!ELEMENT customer (customer_name, customer_street, customer_city)>
<!ELEMENT depositor (customer_name, account_number)>
<!ELEMENT account_number (#PCDATA)>
<!ELEMENT branch_name (#PCDATA)>
<!ELEMENT balance (#PCDATA)>
<!ELEMENT customer_name (#PCDATA)>
<!ELEMENT customer_street (#PCDATA)>
<!ELEMENT customer_city (#PCDATA)>
]>
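A DTD children content model is essentially a regular expression over the sequence of child element names: a comma means "followed by", `|` means "or", and `?`, `*` and `+` mean zero-or-one, zero-or-more and one-or-more. The sketch below translates such a model into a Python regular expression and checks an element's children against it. It is a simplification, not a validating parser: it handles only element-only models (no `#PCDATA`, `ANY` or `EMPTY`), ignores attributes, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    The regex matches a string of child names, each followed by a comma,
    e.g. "to,from,heading,body,".
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")                 # non-capturing group
        elif token == ",":
            continue                            # a sequence is plain concatenation
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)                 # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + ",)")   # one child element
    return re.compile("".join(parts))


def children_match(elem, model):
    """Return True if elem's child element names conform to the content model."""
    names = "".join(child.tag + "," for child in elem)
    return model_to_regex(model).fullmatch(names) is not None


note = ET.fromstring("<note><to>Tove</to><from>Jani</from>"
                     "<header>Hi</header><body>See you</body></note>")
print(children_match(note, "(to,from,header,(message|body))"))   # True
print(children_match(note, "(to,from,heading,body)"))            # False: header is not heading

bank = ET.fromstring("<bank><account/><customer/><depositor/><account/></bank>")
print(children_match(bank, "( ( account | customer | depositor)+)"))   # True
```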
Declaring Attributes
• An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
• DTD example:
<!ATTLIST payment type CDATA "check">
• XML example:
<payment type="check" />
Attribute Specification in DTD
• Attribute specification : for each attribute
– Name
– Type of attribute
• CDATA
• ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
– more on this later
– Whether
• mandatory (#REQUIRED)
• has a default value (value),
• or neither (#IMPLIED)
• Examples
– <!ATTLIST account acct-type CDATA "checking">
– <!ATTLIST customer
customer_id ID #REQUIRED
accounts IDREFS #REQUIRED >
The attribute-type can be one of the following
• CDATA: The value is character data
• (en1|en2|..): The value must be one from an enumerated list
• ID: The value is a unique id
• IDREF: The value is the id of another element
• IDREFS: The value is a list of other ids
• NMTOKEN: The value is a valid XML name
• NMTOKENS: The value is a list of valid XML names
• ENTITY: The value is an entity
• ENTITIES: The value is a list of entities
• NOTATION: The value is a name of a notation
• xml: The value is a predefined xml value
Attributes
• The default-value can be one of the following:
– value: The default value of the attribute
– #REQUIRED: The attribute is required
– #IMPLIED: The attribute is not required
– #FIXED value: The attribute value is fixed
• DTD:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
• Valid XML:
<square width="100" />
In the example above, the "square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.
attributes
• #REQUIRED
• Syntax:
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
• DTD:
<!ATTLIST person number CDATA #REQUIRED>
• Valid XML:
<person number="5677" />
• Invalid XML:
<person />
• Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.
attributes
• #IMPLIED
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
• DTD:
<!ATTLIST contact fax CDATA #IMPLIED>
• Valid XML:
<contact fax="555-667788" />
• Valid XML:
<contact />
• Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you don't have an option for a default value.
• #FIXED
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
• DTD:
<!ATTLIST sender company CDATA #FIXED "Microsoft">
• Valid XML:
<sender company="Microsoft" />
• Invalid XML:
<sender company="W3Schools" />
• Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, a validating XML parser will report an error.
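The four default-value forms can be summarised as one small rule applied to an element's attributes. The sketch below is a hypothetical helper, not part of any parser API: it takes the attributes as a Python dictionary and the default declaration as written in the DTD, and either fills in a value or raises an error where a validating parser would report one.

```python
def apply_default(attrs, name, declaration):
    """Apply one DTD attribute default declaration to an element's attributes.

    attrs       -- attributes found in the start tag (a dict)
    name        -- the declared attribute, e.g. "width"
    declaration -- '#REQUIRED', '#IMPLIED', '#FIXED "value"' or a quoted default
    Returns a new dict; raises ValueError for an invalid element.
    """
    result = dict(attrs)
    if declaration == "#REQUIRED":
        if name not in result:
            raise ValueError(f"missing required attribute {name!r}")
    elif declaration == "#IMPLIED":
        pass                                        # optional, and no value is supplied
    elif declaration.startswith("#FIXED"):
        fixed = declaration.split(None, 1)[1].strip('"')
        # An omitted #FIXED attribute takes the fixed value; any other value is an error.
        if result.setdefault(name, fixed) != fixed:
            raise ValueError(f"attribute {name!r} must be {fixed!r}")
    else:
        result.setdefault(name, declaration.strip('"'))   # plain default value
    return result


print(apply_default({}, "width", '"0"'))                    # {'width': '0'}
print(apply_default({"width": "100"}, "width", '"0"'))      # {'width': '100'}
print(apply_default({}, "fax", "#IMPLIED"))                 # {}
print(apply_default({}, "company", '#FIXED "Microsoft"'))   # {'company': 'Microsoft'}
```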
attributes
• Enumerated Attribute Values
• Syntax:
<!ATTLIST element-name attribute-name (en1|en2|..) default-value>
• Example
– DTD:
<!ATTLIST payment type (check|cash) "cash">
– XML example:
<payment type="check" />
or
<payment type="cash" />
• Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.
IDs and IDREFs
• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML document must
be distinct
– Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element
in the same document
• An attribute of type IDREFS contains a set of one or more ID
values separated by spaces; each value must match the ID value of an element in
the same document
Database System Concepts - 5th Edition, Aug 22, 2005.
10.64
©Silberschatz, Korth and Sudarshan
Bank DTD with Attributes
• Bank DTD with ID and IDREF attribute types.
<!DOCTYPE bank-2 [
<!ELEMENT account (branch_name, balance)>
<!ATTLIST account
account_number ID #REQUIRED
owners IDREFS #REQUIRED>
<!ELEMENT customer (customer_name, customer_street, customer_city)>
<!ATTLIST customer
customer_id ID #REQUIRED
accounts IDREFS #REQUIRED>
… declarations for branch_name, balance, customer_name,
customer_street and customer_city
]>
XML data with ID and IDREF attributes
<bank-2>
<account account_number="A-401" owners="C100 C102">
<branch_name>Downtown</branch_name>
<balance>500</balance>
</account>
<customer customer_id="C100" accounts="A-401">
<customer_name>Joe</customer_name>
<customer_street>Monroe</customer_street>
<customer_city>Madison</customer_city>
</customer>
<customer customer_id="C102" accounts="A-401 A-402">
<customer_name>Mary</customer_name>
<customer_street>Erin</customer_street>
<customer_city>Newark</customer_city>
</customer>
</bank-2>
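To make the ID/IDREF rules concrete, here is a small Python sketch. Assumption: ElementTree does not read the DTD, so the hypothetical ID_ATTR and IDREFS_ATTR tables restate which attributes are declared ID and IDREFS. It checks that ID values are unique across the whole document and that every IDREFS token names an existing ID. Run on the fragment above, it flags A-402, because no account with that number appears in this excerpt.

```python
import xml.etree.ElementTree as ET

BANK2 = """<bank-2>
<account account_number="A-401" owners="C100 C102">
<branch_name>Downtown</branch_name><balance>500</balance>
</account>
<customer customer_id="C100" accounts="A-401">
<customer_name>Joe</customer_name><customer_street>Monroe</customer_street>
<customer_city>Madison</customer_city>
</customer>
<customer customer_id="C102" accounts="A-401 A-402">
<customer_name>Mary</customer_name><customer_street>Erin</customer_street>
<customer_city>Newark</customer_city>
</customer>
</bank-2>"""

# ElementTree ignores DTDs, so these hypothetical tables restate the declarations.
ID_ATTR = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}


def check_ids(root):
    """Return (ID -> element map, list of ID/IDREF errors)."""
    ids, errors = {}, []
    for elem in root.iter():
        attr = ID_ATTR.get(elem.tag)
        if attr is None or elem.get(attr) is None:
            continue
        value = elem.get(attr)
        if value in ids:  # IDs must be unique across ALL element types
            errors.append(f"duplicate ID {value}")
        ids[value] = elem
    for elem in root.iter():
        attr = IDREFS_ATTR.get(elem.tag)
        if attr is None:
            continue
        for ref in elem.get(attr, "").split():  # IDREFS is space-separated
            if ref not in ids:
                errors.append(f"{elem.tag} {elem.get(ID_ATTR[elem.tag])}: dangling IDREF {ref}")
    return ids, errors


ids, errors = check_ids(ET.fromstring(BANK2))
print(sorted(ids))  # ['A-401', 'C100', 'C102']
print(errors)       # ['customer C102: dangling IDREF A-402']
```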
Example: TV Schedule DTD
• By David Moisan. Copied from http://www.davidmoisan.org/
<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER,DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
Newspaper Article DTD
• Copied from http://www.vervet.com/
<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
Limitations of DTDs
• No typing of text elements and attributes
– All values are strings, no integers, reals, etc.
• Difficult to specify unordered sets of subelements
– Order is usually irrelevant in databases (unlike in the document-layout
environment from which XML evolved)
– (A | B)* allows specification of an unordered set, but
• Cannot ensure that each of A and B occurs only once
• IDs and IDREFs are untyped
– The owners attribute of an account may contain a reference to another
account, which is meaningless
• owners attribute should ideally be constrained to refer to customer
elements
XML Schema
• XML Schema is a more sophisticated schema language which
addresses the drawbacks of DTDs. Supports
– Typing of values
• E.g. integer, string, etc
• Also, constraints on min/max values
– User-defined, complex types
– Many more features, including
• uniqueness and foreign key constraints, inheritance
• XML Schema is itself specified in XML syntax, unlike DTDs
– More-standard representation, but verbose
• XML Schema is integrated with namespaces
• BUT: XML Schema is significantly more complicated than DTDs.
Schemas
• XML Schema is an XML-based alternative to DTD.
• An XML schema describes the structure of an XML document.
• The XML Schema language is also referred to as XML Schema
Definition (XSD).
• The purpose of an XML Schema is to define the legal building
blocks of an XML document, just like a DTD.
• An XML Schema:
– defines elements that can appear in a document
– defines attributes that can appear in a document
– defines which elements are child elements
– defines the order of child elements
– defines the number of child elements
– defines whether an element is empty or can include text
– defines data types for elements and attributes
– defines default and fixed values for elements and attributes
An XML Schema
The following XML Schema file, called "note.xsd", defines the elements of the XML document
"note.xml" (the instance document shown next, which references this schema):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
A Reference to an XML Schema
• This XML document has a reference to an XML Schema:
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Simple elements
• XML Schemas define the elements of your XML files.
• A simple element is an XML element that can contain only text. It
cannot contain any other elements or attributes.
• However, the "only text" restriction is quite misleading. The text
can be of many different types. It can be one of the types included
in the XML Schema definition (boolean, string, date, etc.), or it can
be a custom type that you can define yourself.
• You can also add restrictions (facets) to a data type in order to limit
its content, or you can require the data to match a specific pattern.
Defining a Simple Element
• The syntax for defining a simple element is:
<xs:element name="xxx" type="yyy"/>
where xxx is the name of the element and yyy is the data type of the
element.
• XML Schema has a lot of built-in data types. The most common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
Example
• Here are some XML elements:
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>
• And here are the corresponding simple element definitions:
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
Default and Fixed Values for Simple Elements
• Simple elements may have a default value OR a fixed value
specified.
• A default value is automatically assigned to the element when no
other value is specified.
• In the following example the default value is "red":
<xs:element name="color" type="xs:string" default="red"/>
• A fixed value is also automatically assigned to the element, and
you cannot specify another value.
• In the following example the fixed value is "red":
<xs:element name="color" type="xs:string" fixed="red"/>
Attributes
• Simple elements cannot have attributes. If an element has
attributes, it is considered to be of a complex type. But the attribute
itself is always declared as a simple type.
• The syntax for defining an attribute is:
<xs:attribute name="xxx" type="yyy"/>
where xxx is the name of the attribute and yyy specifies the data type of
the attribute.
• The most common types are:
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
Attributes (cont.)
• Here is an XML element with an attribute:
<lastname lang="EN">Smith</lastname>
• And here is the corresponding attribute definition:
<xs:attribute name="lang" type="xs:string"/>
• Attributes can have default and fixed values, specified in the same way as for simple elements
Restrictions (facets)
• Restrictions are used to define acceptable values for XML elements or
attributes. The individual restrictions (such as minInclusive or enumeration)
are called facets.
• Restrictions on Values
• The following example defines an element called "age" with a restriction.
The value of age cannot be lower than 0 or greater than 120:
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Restrictions on a Set of Values
• To limit the content of an XML element to a set of acceptable values, we would use
the enumeration constraint.
• The example below defines an element called "car" with a restriction. The only
acceptable values are: Audi, Golf, BMW:
<xs:element name="car">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
• Many kinds of restriction can be placed on data types (length, value ranges, sets of
values, patterns)
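As a rough illustration of what a validator does with these two restrictions, the following Python sketch hard-codes the "age" and "car" facets. valid_age and valid_car are hypothetical helper names, and the integer check only approximates the xs:integer lexical rules (optional sign, ASCII digits, surrounding whitespace collapsed).

```python
import re

# xs:integer lexical form: optional sign, then ASCII digits.
INTEGER = re.compile(r"[+-]?[0-9]+")
CARS = ("Audi", "Golf", "BMW")  # the xs:enumeration values


def valid_age(text):
    """xs:integer restricted by minInclusive=0 and maxInclusive=120."""
    text = text.strip()  # xs:integer collapses surrounding whitespace
    return bool(INTEGER.fullmatch(text)) and 0 <= int(text) <= 120


def valid_car(text):
    """xs:string restricted by enumeration: exact, case-sensitive match."""
    return text in CARS


print([valid_age(v) for v in ["36", "0", "120", "121", "-1", "abc"]])
# [True, True, True, False, False, False]
print(valid_car("BMW"), valid_car("bmw"))  # True False
```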
What is a Complex Element?
• A complex element is an XML element that contains other
elements and/or attributes.
• There are four kinds of complex elements:
– empty elements
– elements that contain only other elements
– elements that contain only text
– elements that contain both other elements and text
• Note: Each of these elements may contain attributes as well!
How to Define a Complex Element
• Look at this complex XML element, "employee", which contains only other
elements:
<employee>
<firstname>John</firstname>
<lastname>Smith</lastname>
</employee>
• We can define a complex element in an XML Schema two different ways:
1. The "employee" element can be declared directly by naming the element, like this:
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Complex element definition
2. Alternatively, the "employee" element can have a type attribute that refers to the
name of the complex type to use:
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Complex elements
• If you use the method described above, several elements can refer
to the same complex type, like this:
<xs:element name="employee" type="personinfo"/>
<xs:element name="student" type="personinfo"/>
<xs:element name="member" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
More features of XML Schema
• Attributes specified by xs:attribute tag:
– <xs:attribute name="account_number"/>
– adding the attribute use="required" means a value must be specified
• Key constraint: "account numbers form a key for account elements
under the root bank element". The constraint is declared inside the declaration of
the bank element, so the selector and field paths are relative to bank:
<xs:key name="accountKey">
<xs:selector xpath="account"/>
<xs:field xpath="account_number"/>
</xs:key>
• Foreign key constraint from depositor to account:
<xs:keyref name="depositorAccountKey" refer="accountKey">
<xs:selector xpath="depositor"/>
<xs:field xpath="account_number"/>
</xs:keyref>
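The effect of xs:key and xs:keyref can be sketched in Python on a small, hypothetical flat bank document (the account numbers and customer names are made up for illustration). The key check mirrors selector "account" and field "account_number" (values must be present and unique); the keyref check mirrors selector "depositor" (every value must match some key value).

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank document; A-999 deliberately has no matching account.
BANK = """<bank>
<account><account_number>A-101</account_number><balance>500</balance></account>
<account><account_number>A-102</account_number><balance>400</balance></account>
<depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-999</account_number><customer_name>Hayes</customer_name></depositor>
</bank>"""


def check_key_and_keyref(bank):
    errors, keys = [], set()
    # xs:key "accountKey": selector "account", field "account_number"
    for account in bank.findall("account"):
        number = account.findtext("account_number")
        if number is None:
            errors.append("account without account_number")  # key fields must exist
        elif number in keys:
            errors.append(f"duplicate key {number}")
        else:
            keys.add(number)
    # xs:keyref refer="accountKey": selector "depositor", field "account_number"
    for depositor in bank.findall("depositor"):
        number = depositor.findtext("account_number")
        if number is not None and number not in keys:
            errors.append(f"depositor refers to unknown account {number}")
    return errors


print(check_key_and_keyref(ET.fromstring(BANK)))
# ['depositor refers to unknown account A-999']
```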
An XSD example
<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="shiporder.xsd">
<orderperson>John Smith</orderperson>
<shipto>
<name>Ola Nordmann</name>
<address>Langgt 23</address>
<city>4000 Stavanger</city>
<country>Norway</country>
</shipto>
<item>
<title>Empire Burlesque</title>
<note>Special Edition</note>
<quantity>1</quantity>
<price>10.90</price>
</item>
<item>
<title>Hide your heart</title>
<quantity>1</quantity>
<price>9.90</price>
</item>
</shiporder>
Create an XML Schema of the example
• Now we want to create a schema for the XML document above.
• We start by opening a new file that we will call "shiporder.xsd". To create
the schema we could simply follow the structure in the XML document and
define each element as we find it. We will start with the standard XML
declaration followed by the xs:schema element that defines a schema:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
...
...
</xs:schema>
• In the schema above we use the standard namespace prefix (xs), and the URI
associated with this namespace is the Schema language definition, which
has the standard value of http://www.w3.org/2001/XMLSchema.
Example cont.
• Next, we have to define the "shiporder" element. This element has
an attribute and it contains other elements, therefore we define it
as a complex type. The child elements of the "shiporder" element
are surrounded by an xs:sequence element that defines an ordered
sequence of subelements:
<xs:element name="shiporder">
<xs:complexType>
<xs:sequence>
...
...
</xs:sequence>
...
</xs:complexType>
</xs:element>
Example (cont.)
• Then we have to define the "orderperson" element as a simple type
(because it does not contain any attributes or other elements). The type
(xs:string) is prefixed with the namespace prefix associated with XML
Schema that indicates a predefined schema data type:
<xs:element name="orderperson" type="xs:string"/>
• Next, we have to define two elements that are of complex type:
"shipto" and "item". We start by defining the "shipto" element:
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Example (cont.)
• Now we can define the "item" element. This element can appear multiple
times inside a "shiporder" element. This is specified by setting the
maxOccurs attribute of the "item" element to "unbounded", which means
that there can be as many occurrences of the "item" element as the author
wishes. Notice that the "note" element is optional. We have specified this
by setting the minOccurs attribute to zero:
<xs:element name="item" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string" minOccurs="0"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Example (cont.)
• We can now declare the attribute of the "shiporder" element. Since
this is a required attribute we specify use="required".
• Note: Attribute declarations must come after the content model (the
xs:sequence) inside xs:complexType:
<xs:attribute name="orderid" type="xs:string" use="required"/>
Example (cont.)
The complete listing of the schema file called "shiporder.xsd":
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shiporder">
<xs:complexType>
<xs:sequence>
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="item" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string" minOccurs="0"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="orderid" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
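Python's standard library has no XML Schema validator (third-party libraries such as lxml provide one), so the following is only a hand-written sketch of the checks that shiporder.xsd expresses: the xs:sequence order with minOccurs/maxOccurs, the required orderid attribute, and the xs:positiveInteger and xs:decimal lexical forms. The tables and function names are assumptions for illustration, and the sketch does not reject undeclared attributes.

```python
import re
import xml.etree.ElementTree as ET

# Lexical forms of the two non-string types used in shiporder.xsd.
POSITIVE_INTEGER = re.compile(r"\+?0*[1-9][0-9]*")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# Content models restated by hand: (child tag, minOccurs, maxOccurs; None = unbounded).
SHIPORDER = [("orderperson", 1, 1), ("shipto", 1, 1), ("item", 1, None)]
SHIPTO = [("name", 1, 1), ("address", 1, 1), ("city", 1, 1), ("country", 1, 1)]
ITEM = [("title", 1, 1), ("note", 0, 1), ("quantity", 1, 1), ("price", 1, 1)]


def check_sequence(elem, model):
    """Match elem's child elements, in order, against an xs:sequence model."""
    children = list(elem)
    i = 0
    for tag, min_occurs, max_occurs in model:
        count = 0
        while (i < len(children) and children[i].tag == tag
               and (max_occurs is None or count < max_occurs)):
            i += 1
            count += 1
        if count < min_occurs:
            return [f"{elem.tag}: expected <{tag}>"]
    if i < len(children):
        return [f"{elem.tag}: unexpected <{children[i].tag}>"]
    return []


def validate_shiporder(root):
    """Hand-written subset of shiporder.xsd."""
    if root.tag != "shiporder":
        return ["root element must be shiporder"]
    errors = [] if root.get("orderid") is not None else ["shiporder: orderid is required"]
    errors += check_sequence(root, SHIPORDER)
    for shipto in root.findall("shipto"):
        errors += check_sequence(shipto, SHIPTO)
    for item in root.findall("item"):
        errors += check_sequence(item, ITEM)
        quantity = (item.findtext("quantity") or "").strip()
        price = (item.findtext("price") or "").strip()
        if not POSITIVE_INTEGER.fullmatch(quantity):
            errors.append(f"item: quantity {quantity!r} is not an xs:positiveInteger")
        if not DECIMAL.fullmatch(price):
            errors.append(f"item: price {price!r} is not an xs:decimal")
    return errors


ORDER = """<shiporder orderid="889923">
<orderperson>John Smith</orderperson>
<shipto><name>Ola Nordmann</name><address>Langgt 23</address>
<city>4000 Stavanger</city><country>Norway</country></shipto>
<item><title>Empire Burlesque</title><note>Special Edition</note>
<quantity>1</quantity><price>10.90</price></item>
<item><title>Hide your heart</title><quantity>1</quantity><price>9.90</price></item>
</shiporder>"""

print(validate_shiporder(ET.fromstring(ORDER)))  # []
```

The greedy matching in check_sequence is enough here because consecutive particles in these models have different element names.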
Divide the Schema
• The previous design method is very simple, but can be difficult to read and maintain
when documents are complex. The next design method is based on defining all
elements and attributes first, and then referring to them using the ref attribute.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- definition of simple elements -->
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>
<!-- definition of attributes -->
<xs:attribute name="orderid" type="xs:string"/>
<!-- definition of complex elements -->
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element ref="name"/>
<xs:element ref="address"/>
<xs:element ref="city"/>
<xs:element ref="country"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="item">
<xs:complexType>
<xs:sequence>
<xs:element ref="title"/>
<xs:element ref="note" minOccurs="0"/>
<xs:element ref="quantity"/>
<xs:element ref="price"/>
</xs:sequence>
</xs:complexType>
</xs:element>
…
…
Querying and transforming
Querying and Transforming XML Data
• Translation of information from one XML schema to another
• Querying on XML data
• The two tasks above are closely related, and are handled by the same tools
• Standard XML querying/translation languages
– XPath
• Simple language consisting of path expressions
– XSLT
• Simple language designed for translation from XML to XML and
XML to HTML
– XQuery
• An XML query language with a rich set of features
Tree Model of XML Data
• Query and transformation languages are based on a tree model of
XML data
• An XML document is modeled as a tree, with nodes corresponding
to elements and attributes
– Element nodes have child nodes, which can be attributes or
subelements
– Text in an element is modeled as a text node child of the element
– Children of a node are ordered according to their order in the XML
document
– Element and attribute nodes (except for the root node) have a single
parent, which is an element node
– The root node has a single child, which is the root element of the
document
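The tree model can be inspected with Python's xml.dom.minidom, which keeps text as separate text-node children. One difference from the model above: DOM stores attributes in a separate attributes map rather than among the children, so the sketch lists them under their element explicitly. The document is a trimmed account element from the bank-2 example, and walk is a hypothetical helper.

```python
from xml.dom import minidom

XML = ('<account account_number="A-401">'
       '<branch_name>Downtown</branch_name><balance>500</balance></account>')


def walk(node, depth=0, out=None):
    """Collect (depth, kind, label) for each node, visiting children in document order."""
    out = [] if out is None else out
    if node.nodeType == node.DOCUMENT_NODE:
        out.append((depth, "document", ""))
    elif node.nodeType == node.ELEMENT_NODE:
        out.append((depth, "element", node.tagName))
        # DOM keeps attributes out of childNodes, so list them under their element.
        for name, value in node.attributes.items():
            out.append((depth + 1, "attribute", f"{name}={value}"))
    elif node.nodeType == node.TEXT_NODE:
        out.append((depth, "text", node.data))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString(XML)
for depth, kind, label in walk(doc):
    print("  " * depth + f"{kind} {label}".rstrip())
```

The document node has exactly one child here, the account element, matching the last point above.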
XPath
• XPath is used to address (select) parts of documents using
path expressions
• A path expression is a sequence of steps separated by “/”
– Think of file names in a directory hierarchy
• The result of a path expression is the set of values that, along with their
containing elements/attributes, match the specified path
• E.g.
/bank-2/customer/customer_name evaluated on the
bank-2 data we saw earlier returns
<customer_name>Joe</customer_name>
<customer_name>Mary</customer_name>
• E.g.
/bank-2/customer/customer_name/text( )
returns the same names, but without the enclosing tags
What is XPath?
• XPath is a syntax for defining parts of an XML document
• XPath uses path expressions to navigate in XML documents
• XPath contains a library of standard functions
• XPath is a major element in XSLT
• XPath is a W3C Standard
• XPath Path Expressions
– XPath uses path expressions to select nodes or node-sets in an XML
document. These path expressions look very much like the
expressions you see when you work with a traditional computer file
system.
• XPath Standard Functions
– XPath includes over 100 built-in functions. There are functions for
string values, numeric values, date and time comparison, node and
QName manipulation, sequence manipulation, Boolean values, and
more.
XPath Terminology
• Nodes: In XPath, there are seven kinds of nodes: element,
attribute, text, namespace, processing-instruction, comment,
and document (root) nodes. XML documents are treated as
trees of nodes.
• The root of the tree is called the document node (or root
node).
• Look at the following XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Example of nodes in the XML document above
• <bookstore> (root element node; the document node is its parent)
• <author>J K. Rowling</author> (element node)
• lang="en" (attribute node)
• Atomic values
• Atomic values are values with no children or parent (they are not nodes).
• Example of atomic values:
• J K. Rowling
• "en"
Relationship of Nodes
• Parent
• Each element and attribute has one parent.
• In the following example, the book element is the parent of the title,
author, year, and price elements:
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
Relationship of Nodes
• Children
• Element nodes may have zero, one or more children.
• In the following example, the title, author, year, and price elements
are all children of the book element:
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
Relationship of Nodes
• Siblings
– Nodes that have the same parent.
– In the following example, the title, author, year, and price elements are all siblings:
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
• Ancestors
– A node's parent, parent's parent, etc.
– In the following example, the ancestors of the title element are the book element and the
bookstore element:
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Relationship of Nodes
• Descendants
– A node's children, children's children, etc.
– In the following example, the descendants of the bookstore element are the
book, title, author, year, and price elements:
<bookstore>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
XPath Syntax
• The XML Example Document
• We will use the following XML document in the examples below.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="eng">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="eng">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
Selecting Nodes
• XPath uses path expressions to select nodes in an XML document.
The node is selected by following a path or steps. The most useful
path expressions are listed below:
Expression   Description
nodename     Selects all child elements named nodename of the current node
/            Selects from the root node
//           Selects nodes in the document from the current node that match
             the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
Selecting nodes examples
Path Expression   Result
bookstore         Selects all bookstore elements that are children of the current node
/bookstore        Selects the root element bookstore
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the document
bookstore//book   Selects all book elements that are descendants of the bookstore
                  element, no matter where they are under the bookstore element
//@lang           Selects all attributes that are named lang
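Python's xml.etree.ElementTree implements a limited XPath subset that can approximate several of these expressions. Paths are evaluated relative to an element, so the sketch below uses the bookstore element as the starting point; "//@lang", which ElementTree does not support, is emulated by iterating over all elements.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""<bookstore>
<book><title lang="eng">Harry Potter</title><price>29.99</price></book>
<book><title lang="eng">Learning XML</title><price>39.95</price></book>
</bookstore>""")

# ElementTree evaluates paths relative to an element, here the bookstore element.
books = bookstore.findall("book")               # like bookstore/book
all_books = bookstore.findall(".//book")        # like bookstore//book
titles = [t.text for t in bookstore.findall("book/title")]
english = bookstore.findall(".//title[@lang='eng']")  # attribute-value predicate
# "//@lang" is not supported; collect matching attribute values by iterating.
langs = [e.get("lang") for e in bookstore.iter() if "lang" in e.attrib]

print(len(books), len(all_books), titles, langs)
# 2 2 ['Harry Potter', 'Learning XML'] ['eng', 'eng']
```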
XPath (Cont.)
• The initial “/” denotes root of the document (above the top-level
tag)
• Path expressions are evaluated left to right
– Each step operates on the set of instances produced by the previous
step
• Selection predicates may follow any step in a path, in [ ]
– E.g. /bank-2/account[balance > 400]
• returns account elements with a balance value greater than 400
• /bank-2/account[balance] returns account elements containing a
balance subelement
• Attributes are accessed using “@”
– E.g. /bank-2/account[balance > 400]/@account_number
• returns the account numbers of accounts with balance > 400
– IDREF attributes are not dereferenced automatically (more on this
later)
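ElementTree's XPath subset supports existence tests such as [balance] but not numeric comparisons, so the sketch below filters in Python to emulate /bank-2/account[balance > 400]/@account_number. The three accounts are hypothetical sample data, and to_number is a rough, assumed stand-in for XPath's string-to-number conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical accounts; A-403 has no balance subelement at all.
BANK2 = """<bank-2>
<account account_number="A-401" owners="C100 C102"><balance>500</balance></account>
<account account_number="A-402" owners="C102"><balance>300</balance></account>
<account account_number="A-403" owners="C100"></account>
</bank-2>"""


def to_number(text):
    """Rough stand-in for XPath number(): non-numeric or missing text becomes NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return float("nan")


def balance_over(account, limit):
    # "balance > 400" is true if ANY balance child compares true;
    # NaN comparisons and an empty set of balance children give false.
    return any(to_number(b.text) > limit for b in account.findall("balance"))


root = ET.fromstring(BANK2)
# /bank-2/account[balance > 400]/@account_number
rich = [a.get("account_number") for a in root.findall("account") if balance_over(a, 400)]
# /bank-2/account[balance]: ElementTree supports this existence test directly
with_balance = [a.get("account_number") for a in root.findall("account[balance]")]
print(rich, with_balance)  # ['A-401'] ['A-401', 'A-402']
```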
Functions in XPath
• XPath provides several functions
– The function count() at the end of a path counts the number of
elements in the set generated by the path
• E.g. /bank-2/account[count(./customer) > 2]
– Returns accounts with > 2 customers
– Also function for testing position (1, 2, ..) of node w.r.t. siblings
• Boolean connectives and and or and function not() can be used in
predicates
• IDREFs can be dereferenced using the function id()
– id() can also be applied to sets of references such as IDREFS and
even to strings containing multiple references separated by blanks
– E.g. /bank-2/account/id(@owners)
• returns all customers referred to from the owners attribute of
account elements.
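Since ElementTree does not know which attributes are IDs, dereferencing has to be done by hand. This Python sketch builds an ID index and approximates id(); build_id_index, xpath_id and the sample data are hypothetical. It evaluates /bank-2/account/id(@owners) and, as an extra example, finds the accounts with more than one owner.

```python
import xml.etree.ElementTree as ET

BANK2 = """<bank-2>
<account account_number="A-401" owners="C100 C102"><balance>500</balance></account>
<account account_number="A-402" owners="C102"><balance>300</balance></account>
<customer customer_id="C100" accounts="A-401"><customer_name>Joe</customer_name></customer>
<customer customer_id="C102" accounts="A-401 A-402"><customer_name>Mary</customer_name></customer>
</bank-2>"""

# ElementTree ignores DTDs, so the ID-typed attributes are restated here (assumption).
ID_ATTRS = ("account_number", "customer_id")


def build_id_index(root):
    """Map each ID value to the element that carries it."""
    return {e.get(a): e for e in root.iter() for a in ID_ATTRS if e.get(a) is not None}


def xpath_id(index, refs):
    """Approximate id(): split on whitespace, skip unknown IDs, drop duplicates.
    Unlike XPath, results come back in reference order, not document order."""
    found = []
    for ref in refs.split():
        elem = index.get(ref)
        if elem is not None and elem not in found:
            found.append(elem)
    return found


root = ET.fromstring(BANK2)
index = build_id_index(root)

# /bank-2/account/id(@owners): union of the owners of every account
owners = []
for account in root.findall("account"):
    for customer in xpath_id(index, account.get("owners", "")):
        if customer not in owners:
            owners.append(customer)
print([c.findtext("customer_name") for c in owners])  # ['Joe', 'Mary']

# For example, /bank-2/account[count(id(@owners)) > 1]/@account_number
shared = [a.get("account_number") for a in root.findall("account")
          if len(xpath_id(index, a.get("owners", ""))) > 1]
print(shared)  # ['A-401']
```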
More XPath Features
• Operator “|” used to implement union
– E.g. /bank-2/account/id(@owners) | /bank-2/loan/id(@borrower)
• Gives customers with either accounts or loans
• However, “|” cannot be nested inside other operators.
• “//” can be used to skip multiple levels of nodes
– E.g. /bank-2//customer_name
• finds any customer_name element anywhere under the
/bank-2 element, regardless of the element in which it is
contained.
• A step in the path can go to parents, siblings, ancestors and
descendants of the nodes generated by the previous step,
not just to the children
– “//”, described above, is a short form for specifying “all
descendants”
– “..” specifies the parent.
• doc(name) returns the root of a named document
What is XQuery?
• XQuery is the language for querying XML data
• XQuery for XML is like SQL for databases
• XQuery is built on XPath expressions
• XQuery is supported, at least in part, by the major database engines
(IBM, Oracle, Microsoft, etc.)
• XQuery is a W3C Recommendation
• XQuery can be used to:
– Extract information to use in a Web Service
– Generate summary reports
– Transform XML data to XHTML
– Search Web documents for relevant information
XQuery
• XQuery is a general purpose query language for XML data
• Standardized by the World Wide Web Consortium (W3C); XQuery 1.0
became a W3C Recommendation in January 2007
– The textbook description is based on a January 2005 draft of the
standard; the major features it describes carried over to the final version.
• XQuery is derived from the Quilt query language, which itself
borrows from SQL, XQL and XML-QL
• XQuery uses a
for … let … where … order by … return … (FLWOR)
syntax:
for ⇔ SQL from
where ⇔ SQL where
order by ⇔ SQL order by
return ⇔ SQL select
let allows temporary variables, and has no equivalent in SQL
FLWOR Syntax in XQuery
• For clause uses XPath expressions, and variable in for clause
ranges over values in the set returned by XPath
• Simple FLWOR expression in XQuery
– find all accounts with balance > 400, with each result enclosed in an
<account_number> .. </account_number> tag
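for $x in /bank-2/account
let $acctno := $x/@account_number
where $x/balance > 400
return <account_number> { data($acctno) } </account_number>
– Items in the return clause are XML text unless enclosed in {}, in which
case they are evaluated
– data() extracts the attribute's value; without it, the attribute node itself
would be copied onto the new element as an attribute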
FLWOR expression
• The let clause is not really needed in this query, and the selection
can be done in XPath. The query can be written as:
for $x in /bank-2/account[balance > 400]
return <account_number> { data($x/@account_number) } </account_number>
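The FLWOR clauses map directly onto a Python loop. The sketch below mirrors for, let, where and return line by line on hypothetical sample data and builds the <account_number> result elements with ElementTree; it is an illustration of the evaluation order, not an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 accounts
BANK2 = """<bank-2>
<account account_number="A-401" owners="C100 C102"><balance>500</balance></account>
<account account_number="A-402" owners="C102"><balance>300</balance></account>
</bank-2>"""

results = []
for x in ET.fromstring(BANK2).findall("account"):                # for $x in /bank-2/account
    acctno = x.get("account_number")                              # let $acctno := $x/@account_number
    if any(float(b.text) > 400 for b in x.findall("balance")):    # where $x/balance > 400
        out = ET.Element("account_number")                        # return <account_number> ...
        out.text = acctno                                         # ... with the number as text
        results.append(out)

print([ET.tostring(e, encoding="unicode") for e in results])
# ['<account_number>A-401</account_number>']
```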
Joins
• Joins are specified in a manner very similar to SQL
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor
where $a/account_number = $d/account_number
and $c/customer_name = $d/customer_name
return <cust_acct> { $c, $a } </cust_acct>
• The same query can be expressed with the selections specified
as XPath selections:
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor[
account_number = $a/account_number and
customer_name = $c/customer_name]
return <cust_acct> { $c, $a } </cust_acct>
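Evaluated naively, the three for clauses form a nested loop over the cross product, filtered by the where clause. The Python sketch below does exactly that on a small, hypothetical flat bank document (names and numbers are invented for illustration); copy.deepcopy stands in for XQuery copying $c and $a into the new element.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank data: account, customer and depositor as siblings.
BANK = """<bank>
<account><account_number>A-101</account_number><balance>500</balance></account>
<account><account_number>A-102</account_number><balance>400</balance></account>
<customer><customer_name>Johnson</customer_name><customer_city>Palo Alto</customer_city></customer>
<customer><customer_name>Hayes</customer_name><customer_city>Harrison</customer_city></customer>
<depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
</bank>"""

bank = ET.fromstring(BANK)
results = []
# for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor
for a in bank.findall("account"):
    for c in bank.findall("customer"):
        for d in bank.findall("depositor"):
            # where $a/account_number = $d/account_number
            #   and $c/customer_name = $d/customer_name
            if (a.findtext("account_number") == d.findtext("account_number")
                    and c.findtext("customer_name") == d.findtext("customer_name")):
                # return <cust_acct> { $c, $a } </cust_acct>
                pair = ET.Element("cust_acct")
                pair.extend([copy.deepcopy(c), copy.deepcopy(a)])
                results.append(pair)

pairs = [(r[0].findtext("customer_name"), r[1].findtext("account_number")) for r in results]
print(pairs)  # [('Johnson', 'A-101'), ('Johnson', 'A-102'), ('Hayes', 'A-102')]
```

A query processor would normally avoid the full cross product (for example with a hash join on account_number); the nested loop just makes the semantics visible.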
Nested Queries
• The following query converts data from the flat structure for
bank information into the nested structure used in bank-1
<bank-1> {
for $c in /bank/customer
return
<customer>
{ $c/* }
{ for $d in /bank/depositor[customer_name =
$c/customer_name],
$a in
/bank/account[account_number=$d/account_number]
return $a }
</customer>
} </bank-1>
• $c/* denotes all the children of the node to which $c is bound, without
the enclosing top-level tag
• $c/text() gives the text nodes directly inside an element, without the text
of any subelements / tags
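The nesting query can be mimicked in Python by building the bank-1 tree while iterating, again on a hypothetical flat bank document. copy.deepcopy stands in for XQuery's copying of $c/* and $a into the newly constructed elements.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank data: account, customer and depositor as siblings.
BANK = """<bank>
<account><account_number>A-101</account_number><balance>500</balance></account>
<account><account_number>A-102</account_number><balance>400</balance></account>
<customer><customer_name>Johnson</customer_name><customer_city>Palo Alto</customer_city></customer>
<customer><customer_name>Hayes</customer_name><customer_city>Harrison</customer_city></customer>
<depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
</bank>"""

bank = ET.fromstring(BANK)
bank1 = ET.Element("bank-1")
for c in bank.findall("customer"):                           # for $c in /bank/customer
    customer = ET.SubElement(bank1, "customer")
    customer.extend([copy.deepcopy(child) for child in c])   # { $c/* }
    for d in bank.findall("depositor"):                      # $d in /bank/depositor[...]
        if d.findtext("customer_name") != c.findtext("customer_name"):
            continue
        for a in bank.findall("account"):                    # $a in /bank/account[...]
            if a.findtext("account_number") == d.findtext("account_number"):
                customer.append(copy.deepcopy(a))            # return $a

for customer in bank1:
    print(customer.findtext("customer_name"),
          [a.findtext("account_number") for a in customer.findall("account")])
# Johnson ['A-101', 'A-102']
# Hayes ['A-102']
```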
Sorting in XQuery
• The order by clause of a FLWOR expression (placed before return) sorts
the results. E.g. to return customers sorted by name:
for $c in /bank/customer
order by $c/customer_name
return <customer> { $c/* } </customer>
• Use order by $c/customer_name descending to sort in descending order
• Can sort at multiple levels of nesting (sort by customer_name, and by
account_number within each customer)
<bank-1> {
for $c in /bank/customer
order by $c/customer_name
return
<customer>
{ $c/* }
{ for $d in /bank/depositor[customer_name = $c/customer_name],
$a in /bank/account[account_number = $d/account_number]
order by $a/account_number
return <account> { $a/* } </account> }
</customer>
} </bank-1>
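Sorting at two levels corresponds to two sorted() calls in the Python sketch below. The sample data is hypothetical and deliberately lists Johnson's depositor rows out of account-number order; bank1_sorted is an illustrative name, and a dictionary lookup replaces the inner for over accounts (equivalent here because account numbers are unique).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical data; Johnson's depositor rows are deliberately out of order.
BANK = """<bank>
<account><account_number>A-101</account_number><balance>500</balance></account>
<account><account_number>A-102</account_number><balance>400</balance></account>
<customer><customer_name>Johnson</customer_name></customer>
<customer><customer_name>Hayes</customer_name></customer>
<depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
</bank>"""


def bank1_sorted(bank, descending=False):
    accounts = {a.findtext("account_number"): a for a in bank.findall("account")}
    out = ET.Element("bank-1")
    # order by $c/customer_name [descending]
    for c in sorted(bank.findall("customer"),
                    key=lambda cust: cust.findtext("customer_name"), reverse=descending):
        customer = ET.SubElement(out, "customer")
        customer.extend([copy.deepcopy(child) for child in c])   # { $c/* }
        numbers = [d.findtext("account_number") for d in bank.findall("depositor")
                   if d.findtext("customer_name") == c.findtext("customer_name")]
        for number in sorted(numbers):                            # order by $a/account_number
            if number in accounts:
                account = ET.SubElement(customer, "account")     # <account> { $a/* } </account>
                account.extend([copy.deepcopy(child) for child in accounts[number]])
    return out


bank = ET.fromstring(BANK)
for customer in bank1_sorted(bank):
    print(customer.findtext("customer_name"),
          [a.findtext("account_number") for a in customer.findall("account")])
# Hayes ['A-102']
# Johnson ['A-101', 'A-102']
```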
Functions and Other XQuery Features
• User-defined functions, using the type system of XML Schema
(shown in the syntax of the final XQuery 1.0 Recommendation):
declare function local:balances($c as xs:string) as xs:decimal* {
for $d in /bank/depositor[customer_name = $c],
$a in /bank/account[account_number = $d/account_number]
return $a/balance
};
• Types are optional for function parameters and return values
• The * (as in decimal*) indicates a sequence of zero or more values of that type
• Universal and existential quantification in where clause predicates
– some $e in path satisfies P
– every $e in path satisfies P
• XQuery also supports if-then-else expressions
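The balances function and the two quantifiers translate naturally to Python: any() behaves like some … satisfies and all() like every … satisfies, including on an empty sequence (false and true respectively). The bank data is hypothetical, and Decimal is used to mirror the xs:decimal return type.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical flat bank data
BANK = """<bank>
<account><account_number>A-101</account_number><balance>500</balance></account>
<account><account_number>A-102</account_number><balance>400</balance></account>
<depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
<depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
</bank>"""


def balances(bank, c):
    """Like the balances function: every balance of every account held by customer c."""
    return [Decimal(a.findtext("balance").strip())
            for d in bank.findall("depositor") if d.findtext("customer_name") == c
            for a in bank.findall("account")
            if a.findtext("account_number") == d.findtext("account_number")]


bank = ET.fromstring(BANK)
johnson = balances(bank, "Johnson")
print(johnson)                                         # [Decimal('500'), Decimal('400')]
print(any(b > 450 for b in johnson))                   # some ... satisfies $b > 450 -> True
print(all(b > 450 for b in johnson))                   # every ... satisfies $b > 450 -> False
print(all(b > 450 for b in balances(bank, "Nobody")))  # every over an empty sequence -> True
```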
XQuery Conditional Expressions
• "If-Then-Else" expressions are allowed in XQuery.
• Look at the following example:
for $x in doc("books.xml")/bookstore/book
return
if ($x/@category="CHILDREN")
then <child>{data($x/title)}</child>
else <adult>{data($x/title)}</adult>
XQuery Conditional Expressions
• The result of the example above will be:
<adult>Everyday Italian</adult>
<child>Harry Potter</child>
<adult>Learning XML</adult>
<adult>XQuery Kick Start</adult>
Present the Result In an HTML List
• Look at the following XQuery FLWOR expression:
for $x in doc("books.xml")/bookstore/book/title
order by $x
return $x
• The expression above will select all the title elements under the
book elements that are under the bookstore element, and return
the title elements in alphabetical order.
Present the Result In an HTML List
•
Now we want to list all the book-titles in our bookstore in an HTML list. We
add <ul> and <li> tags to the FLWOR expression:
<ul>
{
for $x in doc("books.xml")/bookstore/book/title
order by $x
return <li>{$x}</li>
}
</ul>
• The result of the above will be:
<ul>
<li><title lang="en">Everyday Italian</title></li>
<li><title lang="en">Harry Potter</title></li>
<li><title lang="en">Learning XML</title></li>
<li><title lang="en">XQuery Kick Start</title></li>
</ul>
Presenting results as HTML
• Now we want to eliminate the title element and show only the data inside
the title element:
<ul>
{
for $x in doc("books.xml")/bookstore/book/title
order by $x
return <li>{data($x)}</li>
}
</ul>
• The result will be (an HTML list):
<ul>
<li>Everyday Italian</li>
<li>Harry Potter</li>
<li>Learning XML</li>
<li>XQuery Kick Start</li>
</ul>
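Both HTML lists can be reproduced with ElementTree, which makes the difference between returning the title node ({$x}) and only its string value ({data($x)}) visible. The bookstore below is a hypothetical stand-in for books.xml, and sorting by the title text is assumed to match order by $x on untyped data under a codepoint collation.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for books.xml (document order is not alphabetical).
BOOKSTORE = ET.fromstring("""<bookstore>
  <book><title lang="en">XQuery Kick Start</title></book>
  <book><title lang="en">Everyday Italian</title></book>
  <book><title lang="en">Harry Potter</title></book>
</bookstore>""")


def html_list(bookstore, data_only):
    """<ul>{ for $x in .../book/title order by $x return <li>...</li> }</ul>"""
    ul = ET.Element("ul")
    titles = sorted(bookstore.findall("book/title"), key=lambda t: t.text)
    for title in titles:
        li = ET.SubElement(ul, "li")
        if data_only:
            li.text = title.text      # <li>{data($x)}</li>: just the string
        else:
            li.append(title)          # <li>{$x}</li>: the whole <title> element
    return ET.tostring(ul, encoding="unicode")


print(html_list(BOOKSTORE, data_only=False))
print(html_list(BOOKSTORE, data_only=True))
# <ul><li>Everyday Italian</li><li>Harry Potter</li><li>XQuery Kick Start</li></ul>
```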
Accessing and Storing XML
Application Program Interface
• There are two standard application program interfaces to XML
data:
– SAX (Simple API for XML)
• Based on an event-driven parser model: the user provides event handlers for
parsing events
– E.g. start of element, end of element
– Not suitable for database applications
– DOM (Document Object Model)
• XML data is parsed into a tree representation
• Variety of functions provided for traversing the DOM tree
• E.g.: the Java DOM API provides a Node interface with methods
getParentNode( ), getFirstChild( ), getNextSibling( ); its subinterfaces add
getAttribute( ) (Element), getData( ) (for text nodes),
getElementsByTagName( ), …
• Also provides functions for updating DOM tree
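Python's standard library ships both styles of interface, so the contrast can be shown directly: xml.sax calls handler methods (startElement, characters, endElement) as parsing proceeds, while xml.dom.minidom builds the whole tree first. minidom exposes the W3C DOM navigation names as Python attributes (parentNode, firstChild, nextSibling, data) instead of Java-style getters, while getAttribute and getElementsByTagName keep their names. The small document and the handler class are hypothetical examples.

```python
import xml.dom.minidom
import xml.sax

DOC = """<bank><account number="A-101"><balance>500</balance></account>
<account number="A-102"><balance>400</balance></account></bank>"""


class BalanceHandler(xml.sax.ContentHandler):
    """SAX: react to start/end-of-element events; no tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.in_balance = False
        self.text = []
        self.total = 0

    def startElement(self, name, attrs):
        if name == "balance":
            self.in_balance = True
            self.text = []

    def characters(self, content):
        if self.in_balance:
            self.text.append(content)      # text may arrive in several chunks

    def endElement(self, name):
        if name == "balance":
            self.total += int("".join(self.text))
            self.in_balance = False


handler = BalanceHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print("SAX total:", handler.total)         # SAX total: 900

# DOM: parse into a tree, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
for acc in dom.getElementsByTagName("account"):
    balance = acc.getElementsByTagName("balance")[0]
    text_node = balance.firstChild         # a Text node
    print(acc.getAttribute("number"), text_node.data,
          balance.parentNode.tagName)      # first line: A-101 500 account
```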
Storage of XML Data
• XML data can be stored in
– Non-relational data stores
• Flat files
– Natural for storing XML
– But has all problems of files (no concurrency, no
recovery, …)
• XML database
– Database built specifically for storing XML data,
supporting the DOM model and declarative querying
– No commercial-grade systems at the time these slides were written (2005)
– Relational databases
• Data must be translated into relational form
• Advantage: mature database systems
• Disadvantages: overhead of translating data and queries
Storage of XML in Relational Databases
• Alternatives:
– String Representation
– Tree Representation
– Map to relations
String Representation
• Store each top-level element as a string field of a tuple in a
relational database
– Use a single relation to store all elements, or
– Use a separate relation for each top-level element type
• E.g. account, customer, depositor relations
– Each with a string-valued attribute to store the
element
• Indexing:
– Store values of subelements/attributes to be indexed as extra fields
of the relation, and build indices on these fields
• E.g. customer_name or account_number
– Some database systems support function indices, which use the
result of a function as the key value.
• The function should return the value of the required
subelement/attribute
String Representation (Cont.)
• Benefits:
– Can store any XML data even without DTD
– As long as there are many top-level elements in a document, strings
are small compared to the full document
• Allows fast access to individual elements.
• Drawback: Need to parse strings to access values inside the
elements
– Parsing is slow.
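A minimal sketch of the string representation using SQLite through Python's sqlite3 module: one relation for a top-level element type, a text column holding the serialized element, and an extra, indexed account_number column copied out of the element so that lookups need not parse every string. The table, index and column names are assumptions chosen to match the bank example; reading the balance still means parsing the stored string.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One relation for <account> elements: the element as a string, plus an
# extracted copy of account_number so that it can be indexed.
conn.execute("CREATE TABLE account_xml (account_number TEXT, xml TEXT)")
conn.execute("CREATE INDEX account_number_idx ON account_xml(account_number)")

ACCOUNTS = [  # hypothetical top-level <account> elements
    "<account><account_number>A-101</account_number>"
    "<branch_name>Downtown</branch_name><balance>500</balance></account>",
    "<account><account_number>A-102</account_number>"
    "<branch_name>Perryridge</branch_name><balance>400</balance></account>",
]


def store(xml_text):
    """Insert one element string, copying the indexed subelement into its own column."""
    key = ET.fromstring(xml_text).findtext("account_number")
    conn.execute("INSERT INTO account_xml VALUES (?, ?)", (key, xml_text))


for a in ACCOUNTS:
    store(a)


def balance_of(account_number):
    """Index lookup on the extra column, then parse the string to reach <balance>."""
    row = conn.execute(
        "SELECT xml FROM account_xml WHERE account_number = ?", (account_number,)
    ).fetchone()
    return None if row is None else int(ET.fromstring(row[0]).findtext("balance"))


print(balance_of("A-102"))   # 400
```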
Tree Representation
• Tree representation: model XML data as a tree and store it using relations
nodes(id, type, label, value)
child(child_id, parent_id)
[Figure: example tree with nodes bank (id: 1), customer (id: 2), customer_name (id: 3), account (id: 5), account_number (id: 7)]
• Each element/attribute is given a unique identifier
• Type indicates element/attribute
• Label specifies the tag name of the element/name of attribute
• Value is the text value of the element/attribute
• The relation child notes the parent-child relationships in the tree
– Can add an extra attribute to child to record ordering of children
Tree Representation (Cont.)
• Benefit: Can store any XML data, even without DTD
• Drawbacks:
– Data is broken up into too many pieces, increasing space overheads
– Even simple queries require a large number of joins, which can be
slow
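The tree representation and its join overhead can be sketched with SQLite through Python's sqlite3 module: each element or attribute becomes a nodes row, each parent-child edge a child row (with the optional position column recording child order), and even a simple lookup such as the balance of one account needs several joins. Assumptions: ids are assigned in document order, only an element's leading text is stored as its value, and the data is hypothetical.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT);
CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER);
""")


def shred(root):
    """One nodes row per element/attribute, one child row per parent-child edge."""
    ids = itertools.count(1)

    def visit(elem, parent_id, position):
        node_id = next(ids)
        text = (elem.text or "").strip() or None    # leading text content only
        conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)",
                     (node_id, elem.tag, text))
        if parent_id is not None:
            conn.execute("INSERT INTO child VALUES (?, ?, ?)",
                         (node_id, parent_id, position))
        for name, value in elem.attrib.items():
            attr_id = next(ids)
            conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)",
                         (attr_id, name, value))
            conn.execute("INSERT INTO child VALUES (?, ?, NULL)", (attr_id, node_id))
        for pos, sub in enumerate(elem):            # position records child order
            visit(sub, node_id, pos)

    visit(root, None, None)


# Hypothetical data.
shred(ET.fromstring("<bank><account><account_number>A-101</account_number>"
                    "<balance>500</balance></account></bank>"))

# Balance of account A-101: up to the parent, then down to its <balance> child.
query = """
SELECT bal.value
FROM nodes AS num
JOIN child AS c1 ON c1.child_id = num.id
JOIN child AS c2 ON c2.parent_id = c1.parent_id
JOIN nodes AS bal ON bal.id = c2.child_id
WHERE num.label = 'account_number' AND num.value = 'A-101'
  AND bal.label = 'balance'
"""
print(conn.execute(query).fetchone()[0])   # 500
```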
Mapping XML Data to Relations
• A relation is created for each element type whose schema is known:
– An id attribute to store a unique id for each element
– A relation attribute corresponding to each element attribute
– A parent_id attribute to keep track of the parent element
• As in the tree representation
• Position information (ith child) can be stored too
• All subelements that occur only once can become relation
attributes
– For text-valued subelements, store the text as attribute value
– For complex subelements, can store the id of the subelement
• Subelements that can occur multiple times are represented in a
separate table
– Similar to handling of multivalued attributes when converting ER
diagrams to tables
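For a known schema the mapping yields one relation per element type. The sketch below shreds hypothetical customer elements whose customer_name and customer_street occur once (so they become columns) and whose account_number subelement may repeat (so it goes to a separate table keyed by the parent's id, with its position). The table and column names, and the id 1 assigned to the root bank element, are assumptions; SQLite via sqlite3 is used only for illustration.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, parent_id INTEGER,
                       customer_name TEXT, customer_street TEXT);
-- account_number may occur several times per customer: separate table
CREATE TABLE customer_account_number (parent_id INTEGER, position INTEGER,
                                      account_number TEXT);
""")

DOC = """<bank>
  <customer><customer_name>Johnson</customer_name><customer_street>Alma</customer_street>
    <account_number>A-101</account_number><account_number>A-201</account_number></customer>
  <customer><customer_name>Hayes</customer_name><customer_street>Main</customer_street>
    <account_number>A-102</account_number></customer>
</bank>"""


def shred(root, bank_id=1):
    """Shred <customer> elements; the root <bank> is assumed to have id bank_id."""
    ids = itertools.count(bank_id + 1)
    for cust in root.findall("customer"):
        cid = next(ids)
        # Single-occurrence, text-valued subelements become columns.
        conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)",
                     (cid, bank_id, cust.findtext("customer_name"),
                      cust.findtext("customer_street")))
        # Multi-valued subelement: one row per occurrence, with its position.
        for pos, acc in enumerate(cust.findall("account_number")):
            conn.execute("INSERT INTO customer_account_number VALUES (?, ?, ?)",
                         (cid, pos, acc.text))


shred(ET.fromstring(DOC))
rows = conn.execute("""
SELECT c.customer_name, a.account_number
FROM customer c JOIN customer_account_number a ON a.parent_id = c.id
ORDER BY c.customer_name, a.position""").fetchall()
print(rows)  # [('Hayes', 'A-102'), ('Johnson', 'A-101'), ('Johnson', 'A-201')]
```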
Storing XML Data in Relational Systems
• Publishing: process of converting relational data to an XML format
• Shredding: process of converting an XML document into a set of
tuples to be inserted into one or more relations
• XML-enabled database systems support automated publishing and
shredding
• Some systems offer native storage of XML data using the xml data
type. Special internal data structures and indices are used for
efficiency
SQL/XML
• New standard SQL extension that allows creation of nested XML
output
– Each output tuple is mapped to an XML element named row
<bank>
<account>
<row>
<account_number> A-101 </account_number>
<branch_name> Downtown </branch_name>
<balance> 500 </balance>
</row>
…. more rows if there are more output tuples …
</account>
</bank>
SQL Extensions
• xmlelement creates XML elements
• xmlattributes creates attributes
select xmlelement (name "account",
         xmlattributes (account_number as account_number),
         xmlelement (name "branch_name", branch_name),
         xmlelement (name "balance", balance))
from account
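SQLite, used here only for illustration, has no SQL/XML functions, so this sketch emulates what the query above publishes: it selects the account rows and builds, for each tuple, an account element with an account_number attribute (the xmlattributes part) and branch_name and balance subelements (the nested xmlelement calls) using ElementTree. The rows are hypothetical, an ORDER BY is added only to make the output deterministic, and a real SQL/XML engine's exact serialization may differ.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)])  # hypothetical rows


def publish_accounts():
    """Emulate xmlelement(name "account", xmlattributes(...), xmlelement(...), ...)."""
    results = []
    for number, branch, balance in conn.execute(
            "SELECT account_number, branch_name, balance FROM account ORDER BY account_number"):
        acc = ET.Element("account", {"account_number": number})   # xmlattributes
        ET.SubElement(acc, "branch_name").text = branch           # nested xmlelement
        ET.SubElement(acc, "balance").text = str(balance)
        results.append(ET.tostring(acc, encoding="unicode"))
    return results


for xml_text in publish_accounts():
    print(xml_text)
# First line: <account account_number="A-101"><branch_name>Downtown</branch_name><balance>500</balance></account>
```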
Ernestina Menasalvas
Facultad de Informática.
Universidad Politécnica de Madrid
[email protected]