Using SQL Queries to Generate XML-Formatted Data
Joline Morrison
Mike Morrison
Department of Computer Science
University of Wisconsin-Eau Claire
Eau Claire, WI 54702
[email protected]
[email protected]
Abstract
XML provides a convenient and portable way to store text data in a structured
format. The SQL/XML portion of the ISO SQL-2003 standard specifies requirements for
forming SQL queries that retrieve XML-formatted data. Developers can then place this
formatted data in XML documents and share it across diverse applications and hardware
and/or software platforms. In this paper, we demonstrate how the Oracle and SQL Server
database management systems provide functions to generate XML-formatted data within
SQL queries. We address the problem of capturing data relationships within XML
structures, and discuss the differences between the XML implementations in the two
database systems. We also provide ideas for related student projects.
Introduction
eXtensible Markup Language (XML) uses tags and attributes to structure data. XML
allows developers to create custom tags to define data values and their associated
relationships and properties in a text format. Over time, XML-formatted documents have
become the de facto standard for sharing data across applications and across diverse
hardware and software platforms. XML documents have also become an accepted means
of storing program configuration information.
The ISO SQL-2003 standard specifies requirements for forming queries that retrieve XML-formatted data.
In this paper, we present an overview of XML documents, and show how XML
documents capture data relationships and semantics. We also show how to form SQL
queries to retrieve XML-formatted data using both Oracle and SQL Server. Finally, we
provide ideas for student labs and projects that use XML-formatted data and queries.
XML Overview
The Standard Generalized Markup Language (SGML) is a high-level language for
defining markup languages that was developed in the 1960s and 1970s. A markup
language is a language that uses tags or other symbols that are embedded in a document,
and defines how the document content appears or acts. HTML is a markup language that
uses embedded tags to control the appearance and functions of Web pages.
Early developers used SGML to create document type definitions (DTDs) to define
standard formatting notations. For example, the DTD for HTML specifies that characters
enclosed in <b> tags appear in a boldface font, and that <form> tags enclose HTML form
input elements.
XML is a subset of SGML that defines structured data using markup language notation.
Structured data is data that is defined in a specific and unambiguous format, such as the
data that a relational database table stores. As with HTML, XML uses tags and
attributes. However, while the HTML DTD specifies the exact meaning of each tag and
attribute, XML allows developers to create custom tags to define data items and
relationships. You can use XML documents to store relational database data in text files,
and then share these files with other systems that don’t know anything about your
relational database, but do understand XML.
Figure 1 shows an example XML-formatted data file, and Figure 2 shows the associated
entity relationship model that illustrates the data attributes and relationships. Figure 1
shows data for BOOK and AUTHOR entities, and illustrates how data values can be
represented either as XML elements or as element attribute name/value pairs. It also
shows how a data element such as <book> can contain the aggregated element <author>,
which contains sub-elements.
<?xml version="1.0" encoding="UTF-8" ?>
<books>
<book isbn="99999-99999">
<title>CS 365: A Visual History</title> <!-- Element value -->
<authors>
<author id="100"> <!-- Aggregated data element -->
<firstname>Tom</firstname>
<lastname>Moore</lastname>
</author>
<author id="101"> <!-- Attribute value -->
<firstname>Leonard</firstname>
<lastname>Larsen</lastname>
</author>
</authors>
<publisher>UWEC-CS Press</publisher>
<publishyear>2000</publishyear>
<price type="USD">10.00</price>
</book>
<book isbn="99999-99998">
<title>Teaching Java to Non-Majors: They Can Do It When I Type the Code</title>
<authors>
<author id="109">
<firstname>Mike</firstname>
<lastname>Morrison</lastname>
</author>
</authors>
<publisher>UWEC-CS Press</publisher>
<publishyear>2006</publishyear>
<price type="USD">7.00</price>
</book>
</books>
Figure 1: XML-formatted data
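As a minimal sketch of how a program might read the two kinds of values, the following Python snippet parses an abbreviated copy of the first book in Figure 1 with the standard library's xml.etree.ElementTree module. The variable names are illustrative assumptions. Attribute values are read with get(), and element values are read with findtext().

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the first <book> in Figure 1.
FIGURE_1 = """
<books>
  <book isbn="99999-99999">
    <title>CS 365: A Visual History</title>
    <authors>
      <author id="100">
        <firstname>Tom</firstname>
        <lastname>Moore</lastname>
      </author>
      <author id="101">
        <firstname>Leonard</firstname>
        <lastname>Larsen</lastname>
      </author>
    </authors>
    <price type="USD">10.00</price>
  </book>
</books>
"""

root = ET.fromstring(FIGURE_1)
book = root.find("book")

isbn = book.get("isbn")                    # value stored as an attribute
title = book.findtext("title")             # value stored as element text
currency = book.find("price").get("type")  # attribute on a child element

# Aggregated <author> elements: one entry per author of this book.
last_names = [a.findtext("lastname") for a in book.iter("author")]

print(isbn, title, currency, last_names)
```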
BOOK (PK isbn, title, publisher, publishyear, price)
AUTHOR (PK authorid, firstname, lastname)
Figure 2: ER model showing data relationships
The ER model representation shows a many-to-many relationship (a book can be written
by multiple authors, and an author can write multiple books). Note that the XML
representation is limited to a hierarchical one-to-many relationship: a book can have
multiple authors, but in the XML document, the data for each author must be repeated for
each book.
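The repetition described above can be sketched in Python. The data below is purely illustrative (it is not taken from the paper's database), and the function name books_to_xml is hypothetical. A junction list models the many-to-many relationship, and flattening it into a book-centered hierarchy copies a shared author under every book that author wrote.

```python
import xml.etree.ElementTree as ET

# Illustrative data only (an assumption, not the paper's database).
books = {"1111": "Book A", "2222": "Book B"}                   # isbn -> title
authors = {100: ("Tom", "Moore"), 109: ("Mike", "Morrison")}   # id -> name
book_author = [("1111", 100), ("1111", 109), ("2222", 109)]    # junction table

def books_to_xml(books, authors, book_author):
    """Nest each book's authors under the book, copying shared authors."""
    root = ET.Element("books")
    for isbn, title in books.items():
        book = ET.SubElement(root, "book", isbn=isbn)
        ET.SubElement(book, "title").text = title
        authors_el = ET.SubElement(book, "authors")
        for linked_isbn, author_id in book_author:
            if linked_isbn == isbn:
                first, last = authors[author_id]
                author = ET.SubElement(authors_el, "author", id=str(author_id))
                ET.SubElement(author, "firstname").text = first
                ET.SubElement(author, "lastname").text = last
    return root

root = books_to_xml(books, authors, book_author)

# Author 109 exists once in the relational data but once per book in the XML.
copies = root.findall(".//author[@id='109']")
print(len(copies))
```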
The XML document’s underlying structure and semantics are captured in its XML
Schema Definition (XSD), which defines the document in terms of element names, data
types, relationships, order and grouping mechanisms, and constraints. Developers use
XSDs to validate XML documents, or to describe an existing relational database structure
and then migrate its data to an XML document. The XSD stores all of the database
structure information from the original database while the XML document stores the
actual data. Figure 3 shows the XSD for the XML file in Figure 1.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="books">
<xs:complexType>
<xs:sequence>
<xs:element name="book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="title" maxOccurs="1" type="xs:string" />
<xs:element name="authors">
<xs:complexType>
<xs:sequence>
<xs:element name="author" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" maxOccurs="1" type="xs:string" />
<xs:element name="lastname" maxOccurs="1" type="xs:string" />
</xs:sequence>
<xs:attribute name="id" type="xs:integer" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="publisher" maxOccurs="1" type="xs:string" />
<xs:element name="publishyear" maxOccurs="1" type="xs:gYear" />
<xs:element name="price" maxOccurs="1">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="type" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="isbn" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Figure 3: XML schema definition (XSD)
The schema element contains type definitions (simpleType and complexType elements),
and associated attribute and element declarations. It also specifies relationship
cardinalities using the minOccurs and maxOccurs attributes, and attribute data types
(integer, string, dateTime, and so forth).
Creating Database Queries To Retrieve XML-Formatted Data
The ISO SQL-2003 standard includes recommendations that specify how SQL can be
used in conjunction with XML. The general approach is to structure SELECT queries
that instruct the DBMS to return retrieved data values in XML format. These queries
retrieve data as a text stream that includes data values and associated formatting
information. You could write a program that dynamically retrieves data from a database,
writes it to an XML document, and then uses this file to share the data across different
applications and hardware and software platforms.
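A minimal sketch of such a program follows, written in Python. It makes several assumptions: an in-memory SQLite database stands in for Oracle or SQL Server, the table and column names are modeled on the university_department examples used in this paper, and the <departments> root element and the function name departments_to_xml are hypothetical. Here the XML is assembled in application code rather than by the DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# SQLite stands in for the DBMS; table and column names are assumptions
# modeled on the paper's university_department examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE university_department "
    "(department_id INTEGER PRIMARY KEY, department_name TEXT)"
)
conn.executemany(
    "INSERT INTO university_department VALUES (?, ?)",
    [(2, "Accounting"), (5, "Chemistry"), (4, "Computer Science")],
)

def departments_to_xml(connection):
    """Build an XML document string from the rows of a SELECT query."""
    root = ET.Element("departments")  # hypothetical root element name
    query = (
        "SELECT department_id, department_name "
        "FROM university_department ORDER BY department_name"
    )
    for dept_id, name in connection.execute(query):
        dept = ET.SubElement(root, "department", id=str(dept_id))
        dept.text = name
    return ET.tostring(root, encoding="unicode")

xml_text = departments_to_xml(conn)
print(xml_text)
```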
The ISO 2003 standards do not prescribe query syntax but instead specify required
functionality, such as providing both element and attribute data values, aggregate data
elements, and data sorting. The following subsections provide examples for how the
Oracle and Microsoft SQL Server relational databases support these standards.
Oracle XML Queries
Oracle provides a series of XML functions that return data values using different XML
formatting approaches. The following subsections describe the primary Oracle XML
functions.
XMLElement
To create Oracle queries that format data as XML elements, you use the following
general syntax:
SELECT XMLElement("element_name", column_name)
FROM ...
This syntax returns a series of data values enclosed in nodes specified by element_name.
For example, the following query retrieves XML-formatted data elements for university
departments, sorted by department names:
SELECT XMLElement("department", department_name)
FROM university_department
ORDER BY department_name;
<department>Accounting</department>
<department>Chemistry</department>
<department>Computer Science</department>
…
XMLAttributes
The XMLAttributes function retrieves data values as attributes. This function
must be nested within an XMLElement function call, as shown in the following example:
SELECT XMLElement("department",
XMLAttributes(department_id AS "id", department_name AS "name"))
FROM university_department
ORDER BY department_name;
<department id="2" name="Accounting"></department>
<department id="5" name="Chemistry"></department>
<department id="4" name="Computer Science"></department>
…
XMLAgg
To create aggregate elements that specify data relationships, you use the Oracle
XMLAgg function. This function must be nested within an XMLElement function that
retrieves the parent data values. The following example expresses the relationship that a
department offers multiple courses:
SELECT XMLElement("department",
XMLAgg(XMLElement("course", course_name)))
FROM university_department a INNER JOIN university_course b
ON a.department_id = b.department_id
GROUP BY department_name;
<department><course>ACCT 201</course>
<course>ACCT 312</course></department>
<department><course>CHEM 205</course></department>
<department><course>CS 245</course></department>
You could modify the above query to create a container element named <courses>
to enclose each department's courses:
SELECT XMLElement("department", XMLElement("courses",
(XMLAgg(XMLElement("course", course_name)))))
FROM university_department a INNER JOIN university_course b
ON a.department_id = b.department_id
GROUP BY department_name
<department><courses>
<course>ACCT 201</course>
<course>ACCT 312</course></courses></department>
<department><courses><course>CHEM 205</course></courses></department>
<department><courses><course>CS 245</course></courses></department>
…
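To make the nesting order concrete, here is a rough Python analogue of the query above. It is not Oracle's implementation: the helper names xml_element and xml_agg are hypothetical, and the rows are the sample values shown in the output. Rows are grouped by department (the GROUP BY step), each group's course elements are concatenated (the XMLAgg step), and the result is wrapped in the container and parent elements.

```python
from itertools import groupby
from xml.sax.saxutils import escape

def xml_element(name, *children):
    """Rough analogue of XMLElement: wrap formatted children in a tag."""
    return f"<{name}>{''.join(children)}</{name}>"

def xml_agg(fragments):
    """Rough analogue of XMLAgg: concatenate the fragments of one group."""
    return "".join(fragments)

# (department_name, course_name) rows, already sorted by department.
rows = [
    ("Accounting", "ACCT 201"),
    ("Accounting", "ACCT 312"),
    ("Chemistry", "CHEM 205"),
    ("Computer Science", "CS 245"),
]

result = []
for _, group in groupby(rows, key=lambda r: r[0]):   # GROUP BY department_name
    courses = xml_agg(xml_element("course", escape(c)) for _, c in group)
    result.append(xml_element("department", xml_element("courses", courses)))

for line in result:
    print(line)
```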
XMLForest
The previous examples show that you must call the XMLElement function for each
retrieved data element. Oracle's XMLForest function simplifies this by creating a
"forest" of XML elements in a single function call:
SELECT XMLElement("department", XMLForest(department_name AS "name",
instructor_last_name as "chair"))
FROM university_department a INNER JOIN university_instructor b
ON a.department_chair_id = b.instructor_id;
<department>
<name>Management Information Systems</name>
<chair>Morrison</chair>
</department>
<department>
<name>Accounting</name>
<chair>Dutton</chair>
</department>
…
Combining Oracle XML Functions
Our final example shows how you can combine the XMLElement, XMLAttributes, and
XMLAgg functions to create aggregated XML-formatted data in which
some values appear as attributes and others appear as elements:
SELECT XMLElement("department", XMLAttributes(department_name AS "name"),
XMLElement("courses", (XMLAgg(XMLElement("course", course_name)))))
FROM university_department a INNER JOIN university_course b
ON a.department_id = b.department_id
WHERE a.department_id = 1
GROUP BY department_name;
<department name="Management Information Systems">
<courses>
<course>MIS 240</course>
<course>MIS 310</course>
<course>MIS 344</course>
<course>MIS 345</course>
</courses>
</department>
…
SQL Server XML Queries
To create a SQL Server SELECT query that retrieves data values in XML format, you
use the following general syntax:
SELECT Query
FOR XML Mode[, ELEMENTS]
In this syntax, SELECT Query represents any valid SQL SELECT query. The FOR XML
clause says to wrap the data in XML tags. Mode specifies how the DBMS wraps the
data, and can have the following values:

RAW, which returns each row as a single XML element node, with the
data stored as attribute node values within each row’s element node.

AUTO, which is used with multiple-table queries and wraps each
returned row within an element node, hierarchically nesting child nodes under
their parent element nodes based on the order in which the tables' columns
appear in the SELECT clause.

EXPLICIT, which lets the developer control the output format, so that the data
can be stored in a combination of attribute and text nodes.
The ELEMENTS option specifies to format data as text nodes rather than attribute values.
The following subsections describe the XML outputs using different modes and options.
FOR XML RAW
The FOR XML RAW mode returns each row as a single XML element, and stores the data as
attribute node values. The following query retrieves data about departments and their
associated courses using the FOR XML RAW mode.
SELECT DepartmentName, CourseName
FROM UniversityDepartment a INNER JOIN UniversityCourse b
ON a.DepartmentID = b.DepartmentID
ORDER BY DepartmentName, CourseName
FOR XML RAW
<row DepartmentName="Accounting" CourseName="ACCT 201" />
<row DepartmentName="Accounting" CourseName="ACCT 312" />
<row DepartmentName="Chemistry" CourseName="CHEM 205" />
...
Note that the DBMS wraps each row in a <row> tag, and specifies the field names as
attributes and the associated data values as attribute values enclosed in quotation marks.
The ELEMENTS option instructs the DBMS to place the data values within XML
elements. The first three retrieved records appear as follows using the FOR XML RAW,
ELEMENTS option:
SELECT DepartmentName, CourseName
FROM UniversityDepartment a INNER JOIN UniversityCourse b
ON a.DepartmentID = b.DepartmentID
ORDER BY DepartmentName, CourseName
FOR XML RAW, ELEMENTS
<row>
<DepartmentName>Accounting</DepartmentName>
<CourseName>ACCT 201</CourseName>
</row>
<row>
<DepartmentName>Accounting</DepartmentName>
<CourseName>ACCT 312</CourseName>
</row>
<row>
<DepartmentName>Chemistry</DepartmentName>
<CourseName>CHEM 205</CourseName>
</row>
...
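The difference between the two RAW output shapes can be sketched in Python. This is only an emulation of the formatting, not SQL Server's implementation, and the function name for_xml_raw is hypothetical. With elements=False each column becomes an attribute of <row>; with elements=True each column becomes a child element.

```python
from xml.sax.saxutils import escape, quoteattr

def for_xml_raw(columns, rows, elements=False):
    """Format each row as a <row> node, attribute-centric or element-centric."""
    out = []
    for row in rows:
        pairs = list(zip(columns, row))
        if elements:
            body = "".join(f"<{c}>{escape(str(v))}</{c}>" for c, v in pairs)
            out.append(f"<row>{body}</row>")
        else:
            attrs = " ".join(f"{c}={quoteattr(str(v))}" for c, v in pairs)
            out.append(f"<row {attrs} />")
    return out

columns = ["DepartmentName", "CourseName"]
rows = [("Accounting", "ACCT 201"), ("Chemistry", "CHEM 205")]

as_attributes = for_xml_raw(columns, rows)
as_elements = for_xml_raw(columns, rows, elements=True)
print(as_attributes[0])
print(as_elements[0])
```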
FOR XML AUTO
The FOR XML AUTO mode retrieves data as attribute values, and automatically nests
element values when queries join multiple tables. This mode retrieves the following
values for the department/course query:
SELECT DepartmentName, CourseName
FROM UniversityDepartment dept INNER JOIN UniversityCourse course
ON dept.DepartmentID = course.DepartmentID
ORDER BY DepartmentName, CourseName
FOR XML AUTO
<dept DepartmentName="Accounting">
<course CourseName="ACCT 201" />
<course CourseName="ACCT 312" />
</dept>
<dept DepartmentName="Chemistry">
<course CourseName="CHEM 205" />
</dept>
<dept DepartmentName="Computer Science">
<course CourseName="CS 245" />
</dept>
...
Note that the DBMS wraps each row in a tag that reflects the table name or table name
alias (such as <dept> and <course>), specifies the field names as attributes, and shows the
associated data values as attribute values enclosed in quotation marks. The child table
(<course>) values are enclosed within the parent table (<dept>) tags.
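The nesting behavior described above can be approximated in Python. This is a simplified model under stated assumptions, not SQL Server's algorithm: the function name for_xml_auto is hypothetical, the element names are fixed to the dept and course aliases, and the rows are assumed to arrive sorted by department. A new <dept> element opens whenever the department value changes from the previous row, and each row adds one <course> child.

```python
from xml.sax.saxutils import quoteattr

def for_xml_auto(rows):
    """rows: (DepartmentName, CourseName) tuples sorted by DepartmentName."""
    out = []
    current = None
    for dept, course in rows:
        if dept != current:
            if current is not None:
                out.append("</dept>")          # close the previous parent
            out.append(f"<dept DepartmentName={quoteattr(dept)}>")
            current = dept
        out.append(f"<course CourseName={quoteattr(course)} />")
    if current is not None:
        out.append("</dept>")
    return out

rows = [
    ("Accounting", "ACCT 201"),
    ("Accounting", "ACCT 312"),
    ("Chemistry", "CHEM 205"),
]
lines = for_xml_auto(rows)
print("\n".join(lines))
```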
The FOR XML AUTO, ELEMENTS option formats data values as elements rather than
as attribute values, as follows:
SELECT DepartmentName, CourseName
FROM UniversityDepartment dept INNER JOIN UniversityCourse course
ON dept.DepartmentID = course.DepartmentID
ORDER BY DepartmentName, CourseName
FOR XML AUTO, ELEMENTS
<dept>
<DepartmentName>Accounting</DepartmentName>
<course>
<CourseName>ACCT 201</CourseName>
</course>
<course>
<CourseName>ACCT 312</CourseName>
</course>
</dept>
...
Note that you can create aliases for both columns and tables to change the names
of the associated XML output nodes. For example, the following query assigns aliases to
both the retrieved tables and columns. The XML-formatted output for the first department that
this query retrieves appears after the query, with node names changed to match the alias
names.
SELECT DepartmentName AS dname, CourseName AS cname
FROM UniversityDepartment dept
INNER JOIN UniversityCourse course
ON dept.DepartmentID = course.DepartmentID
ORDER BY dname, cname
FOR XML AUTO, ELEMENTS
<dept>
<dname>Accounting</dname>
<course>
<cname>ACCT 201</cname>
</course>
<course>
<cname>ACCT 312</cname>
</course>
</dept>
...
FOR XML EXPLICIT
The FOR XML EXPLICIT option provides precise control over the XML output format.
It allows developers to create XML output that retains hierarchical relationships, and to
specify whether data appears as attribute node values, text node values, or a combination
of both. It also provides an alternate way to allow developers to specify node names.
This option makes queries much more difficult to configure, but provides a much higher
degree of control over the query output.
With the EXPLICIT option, each SELECT query includes a tag field as the first field in
the SELECT clause, and a parent field as the second field in the SELECT clause. The
query assigns numeric values, such as 1 or 2, to the tag and parent fields. These values
identify the corresponding levels in the resulting XML hierarchy.
The simplest case for using the EXPLICIT option involves a SELECT query which
retrieves data values from a single table, displays the values as an XML tree with a single
level of output, and displays the data values as attribute values. In this situation, the tag
field is 1, because this query defines the top level, and the parent field is NULL, because
the top node has no parent. Consider the following query and its associated output:
SELECT 1 as tag,
NULL as parent,
DepartmentName AS [dept!1!name]
FROM
UniversityDepartment
ORDER BY DepartmentName
FOR XML EXPLICIT
<dept name="Accounting" />
<dept name="Chemistry" />
<dept name="Computer Science" />
…
In this query, the first and second fields in the SELECT clause assign the tag field value
as 1 and the parent field value as NULL.
The third field in the SELECT clause, DepartmentName AS [dept!1!name], has the following
general syntax:
DatabaseFieldName AS
[XMLNodeName!XMLNodeLevel!DataName]
In this syntax, DatabaseFieldName specifies the name of the database field that the query
retrieves, which in the example is DepartmentName. The exclamation point (!) serves as
a delimiter. XMLNodeName specifies the name of the node that wraps the retrieved data.
XMLNodeLevel specifies the level of the corresponding XML node, which in this case is
1. DataName specifies the name of the attribute associated with each data value.
If you want to wrap data values in tags rather than express them as attributes, you modify
the third field in the SELECT clause so it includes the !element option as follows:
SELECT 1 as tag,
NULL as parent,
DepartmentName AS [dept!1!name!element]
FROM
UniversityDepartment
ORDER BY DepartmentName
FOR XML EXPLICIT
<dept>
<name>Accounting</name>
</dept>
<dept>
<name>Chemistry</name>
</dept>
…
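The column-alias syntax described above can be made concrete with a small parser. This Python sketch is illustrative only: the names ExplicitColumn and parse_explicit_column are hypothetical, and it handles just the forms used in this paper (node name, level, an optional data name, and an optional directive such as element).

```python
from typing import NamedTuple, Optional

class ExplicitColumn(NamedTuple):
    node_name: str             # XMLNodeName
    level: int                 # XMLNodeLevel (matches the tag field)
    data_name: Optional[str]   # DataName, if present
    directive: Optional[str]   # e.g. "element", if present

def parse_explicit_column(alias: str) -> ExplicitColumn:
    """Split an alias such as [dept!1!name!element] on the ! delimiter."""
    parts = alias.strip("[]").split("!")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"unexpected alias format: {alias!r}")
    data_name = parts[2] if len(parts) > 2 and parts[2] else None
    directive = parts[3] if len(parts) > 3 else None
    return ExplicitColumn(parts[0], int(parts[1]), data_name, directive)

as_attribute = parse_explicit_column("[dept!1!name]")
as_element = parse_explicit_column("[dept!1!name!element]")
print(as_attribute)
print(as_element)
```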
Sometimes you need to join multiple tables to create single-level XML outputs. For
example, suppose you want to include department chair names (which are stored in the
UniversityInstructor table) along with department names in the previous query. The next
example joins two tables and shows the output in a single XML level:
SELECT 1 as tag,
NULL as parent,
DepartmentName As [dept!1!name!element],
InstructorLastName As [dept!1!chair!element]
FROM UniversityDepartment a INNER JOIN UniversityInstructor b
ON a.DepartmentChairID = b.InstructorID
ORDER BY DepartmentName
FOR XML EXPLICIT
<dept>
<name>Accounting</name>
<chair>Dutton</chair>
</dept>
<dept>
<name>Chemistry</name>
<chair>Harrison</chair>
</dept>
...
An important feature of the EXPLICIT option is that it allows you to format data using a
combination of attribute values and text node values. If you delete the !element
option in the third query line above, the output appears as follows:
<dept name="Accounting">
<chair>Dutton</chair>
</dept>
<dept name="Chemistry">
<chair>Harrison</chair>
</dept>
...
So far, all of these queries have had a single level of output. To create multiple XML
output levels using the EXPLICIT option, you create a separate SELECT query for each
level, and join the levels using the UNION operator. Recall that the UNION operator
combines the output of two separate SELECT queries into a single result set. Every SELECT
query in a UNION operation must return exactly the same number of fields, and corresponding
fields must have compatible data types. Therefore, the initial query must specify
all fields that the query ultimately retrieves, and fields that appear in lower-level queries
are designated as NULL values.
Consider the following query, which retrieves two levels of XML output consisting of
department names and the associated courses for each department:
-- Level 1: one row per department (parent nodes)
SELECT 1 As tag,
NULL As parent,
DepartmentName As [dept!1!dname],
NULL As [course!2!cname!element],
NULL As [course!2!title!element]
FROM UniversityDepartment
UNION
-- Level 2: one row per course, nested under its department
SELECT 2 As tag,
1 As parent,
DepartmentName, CourseName, CourseTitle
FROM UniversityDepartment a INNER JOIN UniversityCourse b
ON a.DepartmentID = b.DepartmentID
-- Sort so that each department row is followed by its course rows
ORDER BY [dept!1!dname], [course!2!cname!element]
FOR XML EXPLICIT
<dept dname="Accounting">
<course>
<cname>ACCT 201</cname>
<title>Principles of Accounting</title>
</course>
<course>
<cname>ACCT 312</cname>
<title>Managerial Accounting</title>
</course>
</dept>
<dept dname="Chemistry">
<course>
<cname>CHEM 205</cname>
<title>Applied Physical Chemistry</title>
</course>
</dept>
...
The query joins two SELECT queries using the UNION operator. The first query
retrieves the data for the parent nodes, and the second query retrieves the data for both
the parent and child nodes using an INNER JOIN operation. The first query includes
NULL placeholders for the child data fields (CourseName and CourseTitle). These
placeholders don't retrieve data, but are responsible for formatting the XML output. The
second query has a tag field value of 2 and a parent field value of 1, which indicates that
it is hierarchically subordinate to the first query.
The ORDER BY clause is very important, because it specifies how the XML output
structures the data hierarchy. If you omit the ORDER BY clause, the output appears as
follows:
<dept dname="Accounting" />
<dept dname="Chemistry" />
<dept dname="Computer Science" />
<dept dname="Foreign Languages" />
<dept dname="Geology" />
<dept dname="Management Information Systems" />
<dept dname="Physics">
<course>
<cname>ACCT 201</cname>
<title>Principles of Accounting</title>
</course>
<course>
<cname>ACCT 312</cname>
<title>Managerial Accounting</title>
</course>
...
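The effect of row order can be illustrated with a simplified model of how the combined rows are turned into a tree. This Python sketch is an assumption-laden approximation, not SQL Server's implementation: the function names are hypothetical, and it assumes each row is attached to the most recently created node whose tag matches the row's parent value. When the rows are sorted, each course follows its own department. When all department rows come first, every course is attached to the last department.

```python
import xml.etree.ElementTree as ET

def build_tree(rows):
    """rows: (tag, parent, dname, cname) tuples, in the order returned."""
    root = ET.Element("root")   # wrapper so the fragment fits in one tree
    last_by_tag = {}            # most recent node created for each tag value
    for tag, parent, dname, cname in rows:
        if parent is None:
            node = ET.SubElement(root, "dept", dname=dname)
        else:
            node = ET.SubElement(last_by_tag[parent], "course")
            ET.SubElement(node, "cname").text = cname
        last_by_tag[tag] = node
    return root

def courses_per_dept(root):
    return [(d.get("dname"), len(d.findall("course"))) for d in root.findall("dept")]

sorted_rows = [
    (1, None, "Accounting", None),
    (2, 1, "Accounting", "ACCT 201"),
    (2, 1, "Accounting", "ACCT 312"),
    (1, None, "Chemistry", None),
    (2, 1, "Chemistry", "CHEM 205"),
]
# Same rows, but with every department row ahead of every course row.
unsorted_rows = sorted(sorted_rows, key=lambda r: r[0])

with_order = courses_per_dept(build_tree(sorted_rows))
without_order = courses_per_dept(build_tree(unsorted_rows))
print(with_order)
print(without_order)
```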
To create lower hierarchical levels, you create additional SELECT queries and join them
using the UNION operator. To create container elements, add levels that retrieve NULL
values. For example, the following query creates a <courses> container element to group
each department's courses:
SELECT 1 As tag,
NULL As parent,
DepartmentName As [dept!1!dname],
NULL As [courses!2],
NULL As [course!3!cname!element],
NULL As [course!3!ctitle!element]
FROM
UniversityDepartment
UNION
SELECT 2 As tag,
1 As parent,
DepartmentName,
NULL AS [courses!2],
NULL As [course!3!cname!element],
NULL As [course!3!ctitle!element]
FROM UniversityDepartment
UNION
SELECT 3 As tag,
2 As parent,
DepartmentName,
NULL As [courses!2],
CourseName, CourseTitle
FROM
UniversityDepartment a INNER JOIN UniversityCourse b
ON a.DepartmentID = b.DepartmentID
ORDER BY [dept!1!dname], [course!3!cname!element]
FOR XML EXPLICIT
<dept dname="Accounting">
<courses>
<course>
<cname>ACCT 201</cname>
<ctitle>Principles of Accounting</ctitle>
</course>
<course>
<cname>ACCT 312</cname>
<ctitle>Managerial Accounting</ctitle>
</course>
</courses>
</dept>
…
Conclusions and Potential Student Projects
Presenting XML representations of relational database data helps students understand the
relationships between XML and relational databases, as well as XML's limitations as a
data storage mechanism. The contrast between how different DBMSs support the same
end functionality provides fertile ground for discussion of syntax differences and the
driving forces of ISO standards. The contrast between how Oracle and SQL Server
implement the ISO standards highlights how different vendors take radically different
approaches to achieve the same result. Interesting student projects for illustrating
connections between relational databases and XML include:

Provide a series of relational database tables with complex foreign key
relationships, and ask students to manually generate XML-formatted data to show
the data relationships in a variety of ways.

Require students to create SQL queries to retrieve identically XML-formatted
data using both Oracle and SQL Server DBMSs, and to then consider the
differences, advantages, and disadvantages of each approach.

Ask students to determine if Oracle and SQL Server provide identical
functionality in retrieving XML-formatted data, or if either platform has
capabilities that the other does not.

Require students to write queries that generate XML-formatted data and display
it in a browser.

Require students to write XSLTs to transform XML-formatted data into formatted
Web pages.
Resources
Script files to generate the database tables used in the example queries in this paper can
be obtained at http://www.cs.uwec.edu/~morrisjp/Public/Conferences/MICS. This
directory also contains an electronic version of this paper and associated PowerPoint.