Using SQL Queries to Generate XML-Formatted Data

Joline Morrison
Mike Morrison
Department of Computer Science
University of Wisconsin-Eau Claire
Eau Claire, WI 54702

Abstract

XML provides a convenient and portable way to store text data in a structured format. The SQL/XML portion of the ISO SQL:2003 standard specifies requirements for forming SQL queries that retrieve XML-formatted data. Developers can then place this formatted data in XML documents and share it across diverse applications and across hardware and software platforms. In this paper, we demonstrate how the Oracle and SQL Server database management systems provide functions to generate XML-formatted data within SQL queries. We address the problem of capturing data relationships within XML structures, and discuss the differences between the XML implementations in the two database systems. We also provide ideas for related student projects.

Introduction

eXtensible Markup Language (XML) uses tags and attributes to structure data. XML allows developers to create custom tags to define data values and their associated relationships and properties in a text format. Over time, XML-formatted documents have become the de facto standard for sharing data across applications and across diverse hardware and software platforms. XML documents have also become an accepted means of storing program configuration information. The ISO SQL:2003 standard specifies requirements for forming queries that retrieve XML-formatted data.

In this paper, we present an overview of XML documents, and show how XML documents capture data relationships and semantics. We also show how to form SQL queries to retrieve XML-formatted data using both Oracle and SQL Server. Finally, we provide ideas for student labs and projects that use XML-formatted data and queries.

XML Overview

The Standard Generalized Markup Language (SGML) is a high-level language for defining markup languages that was developed in the 1960s and 1970s.
A markup language is a language that uses tags or other symbols that are embedded in a document, and defines how the document content appears or acts. HTML is a markup language that uses embedded tags to control the appearance and functions of Web pages. Early developers used SGML to create document type definitions (DTDs) to define standard formatting notations. For example, the DTD for HTML specifies that characters enclosed in <b> tags appear in a boldface font, and that <form> tags enclose HTML form input elements.

XML is a subset of SGML that defines structured data using markup language notation. Structured data is data that is defined in a specific and unambiguous format, such as the data that a relational database table stores. As with HTML, XML uses tags and attributes. However, while the HTML DTD specifies the exact meaning of each tag and attribute, XML allows developers to create custom tags to define data items and relationships. You can use XML documents to store relational database data in text files, and then share these files with other systems that don't know anything about your relational database, but do understand XML.

Figure 1 shows an example XML-formatted data file, and Figure 2 shows the associated entity relationship model that illustrates the data attributes and relationships. Figure 1 shows data for BOOK and AUTHOR entities, and illustrates how data values can be represented either as XML elements or as element attribute name/value pairs. It also shows how a data element such as <book> can contain the aggregated element <author>, which contains sub-elements.

    <?xml version="1.0" encoding="UTF-8" ?>
    <books>
      <book isbn="99999-99999">
        <title>CS 365: A Visual History</title>   <!-- element value -->
        <authors>
          <author id="100">                       <!-- aggregated data element -->
            <firstname>Tom</firstname>
            <lastname>Moore</lastname>
          </author>
          <author id="101">                       <!-- attribute value -->
            <firstname>Leonard</firstname>
            <lastname>Larsen</lastname>
          </author>
        </authors>
        <publisher>UWEC-CS Press</publisher>
        <publishyear>2000</publishyear>
        <price type="USD">10.00</price>
      </book>
      <book isbn="99999-99998">
        <title>Teaching Java to Non-Majors: They Can Do It When I Type the Code</title>
        <authors>
          <author id="109">
            <firstname>Mike</firstname>
            <lastname>Morrison</lastname>
          </author>
        </authors>
        <publisher>UWEC-CS Press</publisher>
        <publishyear>2006</publishyear>
        <price type="USD">7.00</price>
      </book>
    </books>

Figure 1: XML-formatted data

    BOOK                AUTHOR
    PK isbn             PK authorid
       title               firstname
       publisher           lastname
       publishyear
       price

Figure 2: ER model showing data relationships

The ER model representation shows a many-to-many relationship (a book can be written by multiple authors, and an author can write multiple books). Note that the XML representation is limited to a hierarchical one-to-many relationship: a book can have multiple authors, but in the XML document, the data for each author must be repeated for each book.

The XML document's underlying structure and semantics are captured in its XML Schema Definition (XSD), which defines the document in terms of element names, data types, relationships, order and grouping mechanisms, and constraints. Developers use XSDs to validate XML documents, or to describe an existing relational database structure and then migrate its data to an XML document. The XSD stores all of the database structure information from the original database while the XML document stores the actual data. Figure 3 shows the XSD for the XML file in Figure 1.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="books">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="book" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="title" maxOccurs="1" type="xs:string" />
                  <xs:element name="authors">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="author" maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:sequence>
                              <xs:element name="firstname" maxOccurs="1" type="xs:string" />
                              <xs:element name="lastname" maxOccurs="1" type="xs:string" />
                            </xs:sequence>
                            <xs:attribute name="id" type="xs:integer" />
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                  <xs:element name="publisher" maxOccurs="1" type="xs:string" />
                  <xs:element name="publishyear" maxOccurs="1" type="xs:gYear" />
                  <xs:element name="price" maxOccurs="1">
                    <xs:complexType>
                      <xs:simpleContent>
                        <xs:extension base="xs:decimal">
                          <xs:attribute name="type" type="xs:string" />
                        </xs:extension>
                      </xs:simpleContent>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="isbn" type="xs:string" />
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Figure 3: XML schema definition (XSD)

A schema contains type definitions (simpleType and complexType elements) and the associated attribute and element declarations. It also specifies relationship cardinalities using the minOccurs and maxOccurs attributes, and the data types of elements and attributes (integer, string, decimal, and so forth).

Creating Database Queries To Retrieve XML-Formatted Data

The ISO SQL:2003 standard specifies how SQL can be used in conjunction with XML. The general approach is to structure SELECT queries that instruct the DBMS to return retrieved data values in XML format. These queries retrieve data as a text stream that includes data values and associated formatting information. You could write a program that creates an XML document that dynamically retrieves its contents from a database, and then use this file to share data across different applications and hardware and software platforms.
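
As a rough illustration of such a program, the following Python sketch builds an XML tree from the rows of an ordinary SELECT query. It is not one of the DBMS features discussed in this paper: it uses an in-memory SQLite database as a stand-in, and the helper names (make_sample_db, build_department_xml) and the <departments> root element are assumptions made for the example. The table names, column names, and sample values are borrowed from the Oracle examples in this paper.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_sample_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE university_department (
            department_id INTEGER PRIMARY KEY,
            department_name TEXT NOT NULL
        );
        CREATE TABLE university_course (
            course_name TEXT NOT NULL,
            department_id INTEGER REFERENCES university_department
        );
        INSERT INTO university_department VALUES
            (2, 'Accounting'), (5, 'Chemistry'), (4, 'Computer Science');
        INSERT INTO university_course VALUES
            ('ACCT 201', 2), ('ACCT 312', 2), ('CHEM 205', 5), ('CS 245', 4);
        """
    )
    return conn


def build_department_xml(conn: sqlite3.Connection) -> ET.Element:
    """Return a <departments> tree with each department's courses nested inside it."""
    rows = conn.execute(
        """
        SELECT d.department_id, d.department_name, c.course_name
        FROM university_department d
        INNER JOIN university_course c ON d.department_id = c.department_id
        ORDER BY d.department_name, c.course_name
        """
    )
    root = ET.Element("departments")
    current_id = None
    courses = None
    for dept_id, dept_name, course_name in rows:
        if dept_id != current_id:
            # First row for a new department: open a new parent element.
            dept = ET.SubElement(
                root, "department", {"id": str(dept_id), "name": dept_name}
            )
            courses = ET.SubElement(dept, "courses")
            current_id = dept_id
        # Every row contributes one child <course> element.
        ET.SubElement(courses, "course").text = course_name
    return root


if __name__ == "__main__":
    tree = build_department_xml(make_sample_db())
    print(ET.tostring(tree, encoding="unicode"))
```

The program has to do the grouping itself, which is why the query is sorted by department name. The XML functions described in the following subsections move this formatting work into the query.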
The ISO SQL:2003 standard does not prescribe query syntax but instead specifies required functionality, such as providing both element and attribute data values, aggregate data elements, and data sorting. The following subsections provide examples of how the Oracle and Microsoft SQL Server relational databases support this standard.

Oracle XML Queries

Oracle provides a series of XML functions that return data values using different XML formatting approaches. The following subsections describe the primary Oracle XML functions.

XMLElement

To create Oracle queries that format data as XML elements, you use the following general syntax:

    SELECT XMLElement("element_name", column_name)
    FROM ...

This syntax returns a series of data values enclosed in nodes specified by element_name. For example, the following query retrieves XML-formatted data elements for university departments, sorted by department names:

    SELECT XMLElement("department", department_name)
    FROM university_department
    ORDER BY department_name;

    <department>Accounting</department>
    <department>Chemistry</department>
    <department>Computer Science</department>
    ...

XMLAttributes

The XMLAttributes function returns data values as attributes. This function must be nested within an XMLElement function call, as shown in the following example:

    SELECT XMLElement("department",
             XMLAttributes(department_id AS "id",
                           department_name AS "name"))
    FROM university_department
    ORDER BY department_name;

    <department id="2" name="Accounting"></department>
    <department id="5" name="Chemistry"></department>
    <department id="4" name="Computer Science"></department>
    ...

XMLAgg

To create aggregate elements that specify data relationships, you use the Oracle XMLAgg function. This function must be nested within an XMLElement function that retrieves the parent data values.
The following example expresses the relationship that a department offers multiple courses:

    SELECT XMLElement("department",
             XMLAgg(XMLElement("course", course_name)))
    FROM university_department a
    INNER JOIN university_course b
      ON a.department_id = b.department_id
    GROUP BY department_name;

    <department><course>ACCT 201</course><course>ACCT 312</course></department>
    <department><course>CHEM 205</course></department>
    <department><course>CS 245</course></department>

You could modify the above query to create a container element named <courses> to enclose each department's courses:

    SELECT XMLElement("department",
             XMLElement("courses",
               (XMLAgg(XMLElement("course", course_name)))))
    FROM university_department a
    INNER JOIN university_course b
      ON a.department_id = b.department_id
    GROUP BY department_name;

    <department><courses><course>ACCT 201</course><course>ACCT 312</course></courses></department>
    <department><courses><course>CHEM 205</course></courses></department>
    <department><courses><course>CS 245</course></courses></department>
    ...

XMLForest

The previous examples show that you must call the XMLElement function for each retrieved data element.
Oracle's XMLForest function simplifies this by creating a "forest" of XML elements in a single function call:

    SELECT XMLElement("department",
             XMLForest(department_name AS "name",
                       instructor_last_name AS "chair"))
    FROM university_department a
    INNER JOIN university_instructor b
      ON a.department_chair_id = b.instructor_id;

    <department>
      <name>Management Information Systems</name>
      <chair>Morrison</chair>
    </department>
    <department>
      <name>Accounting</name>
      <chair>Dutton</chair>
    </department>
    ...

Combining Oracle XML Functions

Our final example shows how you can combine the XMLElement, XMLAttributes, and XMLAgg functions to create aggregated XML-formatted data in which some values appear as attributes and others appear as elements:

    SELECT XMLElement("department",
             XMLAttributes(department_name AS "name"),
             XMLElement("courses",
               (XMLAgg(XMLElement("course", course_name)))))
    FROM university_department a
    INNER JOIN university_course b
      ON a.department_id = b.department_id
    WHERE a.department_id = 1
    GROUP BY department_name;

    <department name="Management Information Systems">
      <courses>
        <course>MIS 240</course>
        <course>MIS 310</course>
        <course>MIS 344</course>
        <course>MIS 345</course>
      </courses>
    </department>
    ...

SQL Server XML Queries

To create a SQL Server SELECT query that retrieves data values in XML format, you use the following general syntax:

    SELECT Query
    FOR XML Mode[, ELEMENTS]

In this syntax, SELECT Query represents any valid SQL SELECT query. The FOR XML clause instructs the DBMS to wrap the data in XML tags. Mode specifies how the DBMS wraps the data, and can have the following values:

- RAW, which returns each row as a single XML element node, with the data stored as attribute node values within each row's element node.
- AUTO, which is used with multiple-table queries, wraps each returned row in an element node and hierarchically nests child nodes under their parent element node, following the tables that the query joins.
- EXPLICIT, which lets the query specify the output shape directly, so that data can be stored in a combination of attribute and text nodes.

The ELEMENTS option specifies to format data as text nodes rather than attribute values. The following subsections describe the XML outputs using different modes and options.

FOR XML RAW

The FOR XML RAW mode returns each row as a single XML element, and stores the data as attribute node values. The following query retrieves data about departments and their associated courses using the FOR XML RAW mode.

    SELECT DepartmentName, CourseName
    FROM UniversityDepartment a
    INNER JOIN UniversityCourse b
      ON a.DepartmentID = b.DepartmentID
    ORDER BY DepartmentName, CourseName
    FOR XML RAW

    <row DepartmentName="Accounting" CourseName="ACCT 201" />
    <row DepartmentName="Accounting" CourseName="ACCT 312" />
    <row DepartmentName="Chemistry" CourseName="CHEM 205" />
    ...

Note that the DBMS wraps each row in a <row> tag, and specifies the field names as attributes and the associated data values as attribute values enclosed in quotation marks.

The ELEMENTS option instructs the DBMS to place the data values within XML elements. The first three retrieved records appear as follows using the FOR XML RAW, ELEMENTS option:

    SELECT DepartmentName, CourseName
    FROM UniversityDepartment a
    INNER JOIN UniversityCourse b
      ON a.DepartmentID = b.DepartmentID
    ORDER BY DepartmentName, CourseName
    FOR XML RAW, ELEMENTS

    <row>
      <DepartmentName>Accounting</DepartmentName>
      <CourseName>ACCT 201</CourseName>
    </row>
    <row>
      <DepartmentName>Accounting</DepartmentName>
      <CourseName>ACCT 312</CourseName>
    </row>
    <row>
      <DepartmentName>Chemistry</DepartmentName>
      <CourseName>CHEM 205</CourseName>
    </row>
    ...

FOR XML AUTO

The FOR XML AUTO mode retrieves data as attribute values, and automatically nests element values when queries join multiple tables.
This mode retrieves the following values for the department/course query:

    SELECT DepartmentName, CourseName
    FROM UniversityDepartment dept
    INNER JOIN UniversityCourse course
      ON dept.DepartmentID = course.DepartmentID
    ORDER BY DepartmentName, CourseName
    FOR XML AUTO

    <dept DepartmentName="Accounting">
      <course CourseName="ACCT 201" />
      <course CourseName="ACCT 312" />
    </dept>
    <dept DepartmentName="Chemistry">
      <course CourseName="CHEM 205" />
    </dept>
    <dept DepartmentName="Computer Science">
      <course CourseName="CS 245" />
    </dept>
    ...

Note that the DBMS wraps each value in a tag that reflects the table name or table name alias (such as <dept> and <course>), specifies the field names as attributes, and shows the associated data values as attribute values enclosed in quotation marks. The child table (<course>) values are enclosed within the parent table (<dept>) tags.

The FOR XML AUTO, ELEMENTS option formats data values as elements rather than as attribute values, as follows:

    SELECT DepartmentName, CourseName
    FROM UniversityDepartment dept
    INNER JOIN UniversityCourse course
      ON dept.DepartmentID = course.DepartmentID
    ORDER BY DepartmentName, CourseName
    FOR XML AUTO, ELEMENTS

    <dept>
      <DepartmentName>Accounting</DepartmentName>
      <course>
        <CourseName>ACCT 201</CourseName>
      </course>
      <course>
        <CourseName>ACCT 312</CourseName>
      </course>
    </dept>
    ...

Note that you can create aliases for both columns and tables to change the names of the associated XML output nodes. For example, the following query assigns aliases to both the retrieved tables and the retrieved columns. The XML-formatted output for the first record that this query retrieves appears as follows, with node names changed to match the alias names.

    SELECT DepartmentName AS dname, CourseName AS cname
    FROM UniversityDepartment dept
    INNER JOIN UniversityCourse course
      ON dept.DepartmentID = course.DepartmentID
    ORDER BY dname, cname
    FOR XML AUTO, ELEMENTS

    <dept>
      <dname>Accounting</dname>
      <course>
        <cname>ACCT 201</cname>
      </course>
      <course>
        <cname>ACCT 312</cname>
      </course>
    </dept>
    ...

FOR XML EXPLICIT

The FOR XML EXPLICIT option provides precise control over the XML output format. It allows developers to create XML output that retains hierarchical relationships, and to specify whether data appears as attribute node values, text node values, or a combination of both. It also provides an alternate way for developers to specify node names. This option makes queries much more difficult to configure, but provides a much higher degree of control over the query output.

With the EXPLICIT option, each SELECT query includes a tag field as the first field in the SELECT clause, and a parent field as the second field in the SELECT clause. The query assigns numeric values, such as 1 or 2, to the tag and parent fields. These values identify the corresponding levels in the resulting XML hierarchy.

The simplest case for using the EXPLICIT option involves a SELECT query that retrieves data values from a single table, displays the values as an XML tree with a single level of output, and displays the data values as attribute values. In this situation, the tag field is 1, because this query defines the top level, and the parent field is NULL, because the top node has no parent. Consider the following query and its associated output:

    SELECT 1 AS tag, NULL AS parent,
           DepartmentName AS [dept!1!name]
    FROM UniversityDepartment
    ORDER BY DepartmentName
    FOR XML EXPLICIT

    <dept name="Accounting" />
    <dept name="Chemistry" />
    <dept name="Computer Science" />
    ...

In this query, the first and second fields in the SELECT clause assign the tag field value as 1 and the parent field value as NULL.
The third field in the SELECT clause has the following general syntax:

    DatabaseFieldName AS [XMLNodeName!XMLNodeLevel!DataName]

In this syntax, DatabaseFieldName specifies the name of the database field that the query retrieves, which in the example is DepartmentName. The exclamation point (!) serves as a delimiter. XMLNodeName specifies the name of the node that wraps the retrieved data. XMLNodeLevel specifies the level of the corresponding XML node, which in this case is 1. DataName specifies the name of the attribute associated with each data value.

If you want to wrap data values in tags rather than express them as attributes, you modify the third field in the SELECT clause so it includes the !element option as follows:

    SELECT 1 AS tag, NULL AS parent,
           DepartmentName AS [dept!1!name!element]
    FROM UniversityDepartment
    ORDER BY DepartmentName
    FOR XML EXPLICIT

    <dept>
      <name>Accounting</name>
    </dept>
    <dept>
      <name>Chemistry</name>
    </dept>
    ...

Sometimes you need to join multiple tables to create single-level XML outputs. For example, suppose you want to include department chair names (which are stored in the UniversityInstructor table) along with department names in the previous query. The next example joins two tables and shows the output in a single XML level:

    SELECT 1 AS tag, NULL AS parent,
           DepartmentName AS [dept!1!name!element],
           InstructorLastName AS [dept!1!chair!element]
    FROM UniversityDepartment a
    INNER JOIN UniversityInstructor b
      ON a.DepartmentChairID = b.InstructorID
    ORDER BY DepartmentName
    FOR XML EXPLICIT

    <dept>
      <name>Accounting</name>
      <chair>Dutton</chair>
    </dept>
    <dept>
      <name>Chemistry</name>
      <chair>Harrison</chair>
    </dept>
    ...

An important feature of the EXPLICIT option is that it allows you to format data using a combination of attribute values and text node values.
If you delete the !element option from the DepartmentName field (the third line of the query above), the output appears as follows:

    <dept name="Accounting">
      <chair>Dutton</chair>
    </dept>
    <dept name="Chemistry">
      <chair>Harrison</chair>
    </dept>
    ...

So far, all of these queries have had a single level of output. To create multiple XML output levels using the EXPLICIT option, you create a separate SELECT query for each level, and join the levels using the UNION operator. Recall that the UNION operator combines the output of two separate SELECT queries into a single output. Every SELECT query in a UNION operation must return exactly the same number of fields, and they must be of the same corresponding data types. Therefore, the initial query must specify all fields that the query ultimately retrieves, and fields that appear in lower-level queries are designated as NULL values. Consider the following query, which retrieves two levels of XML output consisting of department names and the associated courses for each department:

    SELECT 1 AS tag, NULL AS parent,
           DepartmentName AS [dept!1!dname],
           NULL AS [course!2!cname!element],
           NULL AS [course!2!title!element]
    FROM UniversityDepartment
    UNION
    SELECT 2 AS tag, 1 AS parent,
           DepartmentName, CourseName, CourseTitle
    FROM UniversityDepartment a
    INNER JOIN UniversityCourse b
      ON a.DepartmentID = b.DepartmentID
    ORDER BY [dept!1!dname], [course!2!cname!element]
    FOR XML EXPLICIT

    <dept dname="Accounting">
      <course>
        <cname>ACCT 201</cname>
        <title>Principles of Accounting</title>
      </course>
      <course>
        <cname>ACCT 312</cname>
        <title>Managerial Accounting</title>
      </course>
    </dept>
    <dept dname="Chemistry">
      <course>
        <cname>CHEM 205</cname>
        <title>Applied Physical Chemistry</title>
      </course>
    </dept>
    ...

The query joins two SELECT queries using the UNION operator. The first query retrieves the data for the parent nodes, and the second query retrieves the data for both the parent and child nodes using an INNER JOIN operation.
The first query includes NULL placeholders for the child data fields (CourseName and CourseTitle). These placeholders don't retrieve data, but are responsible for formatting the XML output. The second query has a tag field value of 2 and a parent field value of 1, which indicates that it is hierarchically subordinate to the first query.

The ORDER BY clause is very important, because it specifies how the XML output structures the data hierarchy. If you omit the ORDER BY clause, the output appears as follows:

    <dept dname="Accounting" />
    <dept dname="Chemistry" />
    <dept dname="Computer Science" />
    <dept dname="Foreign Languages" />
    <dept dname="Geology" />
    <dept dname="Management Information Systems" />
    <dept dname="Physics">
      <course>
        <cname>ACCT 201</cname>
        <title>Principles of Accounting</title>
      </course>
      <course>
        <cname>ACCT 312</cname>
        <title>Managerial Accounting</title>
      </course>
      ...

To create lower hierarchical levels, you create additional SELECT queries and join them using the UNION operator. To create container elements, add levels that retrieve NULL values.
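
The effect of row order described above can be reproduced with a small Python model. This is only a simplified sketch, not SQL Server's actual implementation: it assumes that each row is attached to the most recently created node whose tag equals the row's parent value. The names ROWS, build_tree, sort_key, and courses_by_dept, and the <result> wrapper element, are hypothetical. The sample rows mirror the tag, parent, and data fields of the two-level query above.

```python
import xml.etree.ElementTree as ET

# Each tuple is (tag, parent, dname, cname, title), listed in the order the
# two SELECT queries would produce them without an ORDER BY clause.
ROWS = [
    (1, None, "Accounting", None, None),
    (1, None, "Chemistry", None, None),
    (2, 1, "Accounting", "ACCT 201", "Principles of Accounting"),
    (2, 1, "Accounting", "ACCT 312", "Managerial Accounting"),
    (2, 1, "Chemistry", "CHEM 205", "Applied Physical Chemistry"),
]


def build_tree(rows):
    """Nest each row under the most recent node at its parent level."""
    root = ET.Element("result")
    latest = {}  # tag level -> most recently created element at that level
    for tag, parent, dname, cname, title in rows:
        container = root if parent is None else latest[parent]
        if tag == 1:
            elem = ET.SubElement(container, "dept", {"dname": dname})
        else:
            elem = ET.SubElement(container, "course")
            ET.SubElement(elem, "cname").text = cname
            ET.SubElement(elem, "title").text = title
        latest[tag] = elem
    return root


def sort_key(row):
    """Emulate ORDER BY dname, cname, with NULL course names sorting first."""
    _tag, _parent, dname, cname, _title = row
    return (dname, cname or "")


def courses_by_dept(root):
    """Summarize which course names ended up under which department."""
    return {
        dept.get("dname"): [c.findtext("cname") for c in dept.findall("course")]
        for dept in root.findall("dept")
    }


if __name__ == "__main__":
    # Unsorted: every course lands under the last department that was opened.
    print(courses_by_dept(build_tree(ROWS)))
    # Sorted: each department row is followed immediately by its own courses.
    print(courses_by_dept(build_tree(sorted(ROWS, key=sort_key))))
```

In the unsorted case all three courses fall under Chemistry, the last department row, which matches the pattern in the output above where the Accounting courses appear under Physics. Sorting places each department row directly before its course rows, so the nesting comes out as intended.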
The following query, for example, creates a <courses> container element to group each department's courses by adding a level that retrieves only NULL values:

    SELECT 1 AS tag, NULL AS parent,
           DepartmentName AS [dept!1!dname],
           NULL AS [courses!2],
           NULL AS [course!3!cname!element],
           NULL AS [course!3!ctitle!element]
    FROM UniversityDepartment
    UNION
    SELECT 2 AS tag, 1 AS parent,
           DepartmentName,
           NULL AS [courses!2],
           NULL AS [course!3!cname!element],
           NULL AS [course!3!ctitle!element]
    FROM UniversityDepartment
    UNION
    SELECT 3 AS tag, 2 AS parent,
           DepartmentName,
           NULL AS [courses!2],
           CourseName, CourseTitle
    FROM UniversityDepartment a
    INNER JOIN UniversityCourse b
      ON a.DepartmentID = b.DepartmentID
    ORDER BY [dept!1!dname], [course!3!cname!element]
    FOR XML EXPLICIT

    <dept dname="Accounting">
      <courses>
        <course>
          <cname>ACCT 201</cname>
          <ctitle>Principles of Accounting</ctitle>
        </course>
        <course>
          <cname>ACCT 312</cname>
          <ctitle>Managerial Accounting</ctitle>
        </course>
      </courses>
    </dept>
    ...

Conclusions and Potential Student Projects

Presenting XML representations of relational database data helps students understand the relationship between XML and relational databases, as well as XML's limitations as a data storage mechanism. The contrast between how Oracle and SQL Server support the same end functionality provides fertile ground for discussing syntax differences and the driving forces behind ISO standards, and highlights how different vendors take radically different approaches to achieve the same result.

Interesting student projects for illustrating connections between relational databases and XML include:

- Provide a series of relational database tables with complex foreign key relationships, and ask students to manually generate XML-formatted data to show the data relationships in a variety of ways.
- Require students to create SQL queries that retrieve identically formatted XML data using both the Oracle and SQL Server DBMSs, and then to consider the differences, advantages, and disadvantages of each approach.
- Ask students to determine whether Oracle and SQL Server provide identical functionality in retrieving XML-formatted data, or whether either platform has capabilities that the other does not.
- Require students to write queries that generate XML-formatted data and display it in a browser.
- Require students to write XSLTs to transform XML-formatted data into formatted Web pages.

Resources

Script files to generate the database tables used in the example queries in this paper can be obtained at http://www.cs.uwec.edu/~morrisjp/Public/Conferences/MICS. This directory also contains an electronic version of this paper and an associated PowerPoint presentation.