Query Processing with XML
CSE 350 – Advanced Database Topics
Jeffrey R. Ellis

Query Processing Topics
- Why?
- Java and other programming languages
- XPath/XSLT
- XQuery (W3C-sponsored query language)
- Current research
  - Other query languages
  - XISS (XML Indexing and Storage System)

First: Distinction between XML and HTML/Web Technologies
- The XML spotlight is analogous to Java's
  - Immediate benefits applied to the World Wide Web
  - Long-range, more exciting benefits in applications
- XML IS NOT AN HTML REPLACEMENT
  - HTML marks pages up for presentation on the Web
  - XML marks up text with semantic information
  - XML can encode HTML pages, but HTML works well on the Web

XML Data Storage
- XML documents
  - Data is delineated semantically
  - Schemas/DTDs control the contents of elements
  - A semi-structured approach allows flexibility
  - Text is human-readable and machine-parsable
  - Open standards work with common tools
  - File-based data storage allows easy sharing
  - Can queries control access to the data?

Traditional Database Storage
- Databases
  - Data is delineated semantically
  - Schemas control the contents of rows
  - No flexibility from semi-structured storage
  - Data is not human-readable, only machine-parsable
  - Proprietary standards prevent interoperability
  - Proprietary storage prevents data sharing
  - Queries control access to data

XML for Query Processing
- If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.
Sample Application
- Employee database document
- XML Schema assumed to exist
- Employee information queried as per standard HR processing

    <?xml version="1.0"?>
    <employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="employee.xsd">
      <emp gender='m'>
        <name>
          <last>Bissell</last>
          <first>Brian</first>
        </name>
        <position>IT Specialist</position>
        <salary>35000</salary>
        <location>CT</location>
      </emp>
      <emp gender='m'>
        <name>
          <last>Pham</last>
          <first>Hung</first>
          <mi>Q</mi>
        </name>
        <position>Senior IT Specialist</position>
        <salary>45000</salary>
        <location>CT</location>
      </emp>
      …
    </employees>

- Salaries are written without thousands separators. XPath converts "35,000" to NaN, so sum() and numeric comparisons would fail on it.

Tree Structure of XML Document
- Remember that XML documents are trees:

    emp
      @gender
      name
        last
        first
        mi
      position
      salary
      location

Query Processing – Programming Languages
- XML documents are flat files
- Any language with file I/O can read an XML document
- Any language with string-parsing capabilities can use XML data
- Query processing is done through the language's own syntax
- This is the "obvious" approach, but it differs from how traditional databases are queried

Query Processing – Programming Languages (cont.)
- Strategy
  - Basic file I/O through the language
  - Basic string matching to identify elements
  - Processing is possible, but not necessarily efficient
- Languages have gathered XML processing tools in libraries
  - xerces: Apache library for Java and C++
- Two methods for parsing XML data
  - DOM
  - SAX

DOM
- Document Object Model
- Defined by the W3C for XML, HTML, and stylesheets
- Provides a hierarchical, object view of the document
- DOMParser parses the file, then provides access to its nodes
- Key: every item in an XML document is a node

DOM Example (diagram of the first employee as DOM nodes)
- Element node "emp" has an attribute node (Attr) gender = "m" and a first child element node "name"
- "name" has a first child element node "last"
- "last" has a first child text node with value "Bissell"
- Each node links back to its parent (for the attribute node, to its owner element "emp")

SAX
- Simple API for XML
- Defined by the XML-DEV mailing list
- Provides event-driven processing of the document
- XMLReader parses the file and calls different methods and functions based on the elements it encounters
- Key: the methods are defined in an interface and implemented in user code

DOM versus SAX
- SAX is primarily Java-based; DOM is defined for most languages
- DOM requires the entire document to be stored in memory; SAX processes the document as it reads
- DOM mirrors a document that can be revisited, so it suits document processing
- SAX mirrors object lifecycles, so it suits data processing

Query Processing – XPath/XSLT
- The standard XML technologies XPath and XSLT provide a ready-made querying infrastructure
- XPath identifies the location of various document elements
- XSL stylesheets provide methods for transforming data from one format to another
- Combining XPath and XSLT makes it easy to generate result sets based on queries

XPath
- Provides element, value, and attribute identification (results over the full four-employee document):

    /employees/emp/name/first  = "Brian", "Hung", "Sara", "Brian"
    //salary                   = "35000", "45000", "35000", "60000"
    count(/employees/emp)      = 4
    //mi                       = "Q"

XSLT
- A stylesheet transforms data from one form into another:

    <xsl:template match="name">
      <xsl:value-of select="first"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="last"/>
    </xsl:template>

  Result (one per name element): Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos

Combine XPath and XSLT for Queries
- Query: find the last name and position of each employee named Brian

    <xsl:template match='employees'>
      <xsl:for-each select='emp'>
        <xsl:if test='name/first="Brian"'>
          <xsl:value-of select='name/last'/>
          <xsl:text>:</xsl:text>
          <xsl:value-of select='position'/>
          <xsl:text>; </xsl:text>
        </xsl:if>
      </xsl:for-each>
    </xsl:template>

Combine XPath and XSLT for Queries (cont.)
- Query: find the average salary of all non-managers

    <xsl:template match='employees'>
      <xsl:variable name='running_sum'>
        <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
      </xsl:variable>
      <xsl:variable name='running_count'>
        <xsl:value-of select='count(emp[position!="Manager"])'/>
      </xsl:variable>
      <xsl:value-of select='$running_sum div $running_count'/>
    </xsl:template>

Results: XSLT/XPath
- Many SQL queries can be accomplished
  - XPath provides element (data) access
  - XPath provides basic functions (e.g., sum())
  - XPath provides WHERE functionality
  - XSLT provides SELECT functionality
  - XSLT provides ORDER BY functionality (sort)
  - XSLT provides result-set formatting
  - UNION functionality provided?

Querying with XPath and XSLT
- Important questions
  - Is it sufficient?
  - Is it efficient?
  - Is there a better way?
- The XML community needs to design a full query language
  - XQuery: working draft published 7 June 2001

Query Processing – XQuery
- XML provides flexibility in representing many kinds of information
- A good query language must be likewise flexible
  - Pre-XQuery languages are good for specific types of data
- Goal: "[S]mall, easily implementable language in which queries are concise and easily understood."

XQuery Forms
1. Path expressions
2. Element constructors
3. FLWR expressions
4. Operator/function expressions
5. Conditional expressions
6. Quantified expressions
7. Data type expressions

XQuery – Path Expressions
- Contribution of XPath
- XQuery 1.0 and XPath 2.0 Data Model

    document("sample1.xml")//emp/salary
    /employees/emp/name[../@gender='f']
    //emp[1 TO 3]/name/first

XQuery – Element Constructors
- Queries can generate new elements
- Similar to XSLT abilities ($name and $position are bound by an enclosing expression)

    <worker> {$name/last} {$position} </worker>

XQuery – FLWR Expressions
- For clause / Let clause / Where clause / Return clause
- Similar to SQL

    FOR $e IN document("sample1.xml")//emp
    WHERE $e/salary > 38000 AND $e/@gender = 'f'
    RETURN $e/name

XQuery – Operator/Function Expressions
- Predefined and user-defined operators and functions
- Still under development: union, intersect, except

    FOR $e IN //employees/emp
    WHERE not(empty($e//mi))
    RETURN $e/name

XQuery – Conditional Expressions
- The condition in an if-then-else expression is not yet restricted to boolean values (ongoing discussion)

    FOR $e IN /employees/emp
    RETURN
      <worker>
        {$e/name}
        {IF ($e/position = "Manager") THEN <manager/> ELSE ()}
      </worker>

XQuery – Quantified Expressions
- Some/Every conditions
- A Some/Every expression evaluates to true or false

    FOR $e IN //employees
    WHERE SOME $p IN $e/emp/position SATISFIES $p = "Manager"
    RETURN $e

XQuery – Data Types
- Data types are based on those available in XML Schema
- Typed values can be literals ("Brian"), results of constructor functions (date("2001-10-11")), or results of casting (CAST AS xsd:integer(24))
- User-defined data types are also allowed and parsable

XQuery
- More choices than the XSLT/XPath combination
- Work in progress
  - The current W3C effort toward a query language
  - Influencing the future design of the core XML technologies (XPath)
  - Intended to be fully flexible for all future XML applications

Query Processing – Research
- The XQuery specification continues to undergo review and change
  - 6 of 7 specification documents released since June
  - All specifications released in 2001
- Other avenues of research
  - Other query languages
  - Indexing strategies
  - Implementation

Query Processing – Other Query Languages
- Many query languages exist
  - Quilt (basis for XQuery)
  - Early W3C languages (XML-QL, XQL)
  - Adapted traditional languages (OQL, XSQL)
  - Research papers (XML-GL, YATL, Lorel)
- Other query languages are often optimized for a particular subset of XML documents
- The query language field *MAY* be standardizing on XQuery

Query Processing – Indexing Strategy
- The query language matters less; better indexing techniques lead to efficiency
- XISS (XML Indexing and Storage System)
  - Published September 19, 2001
  - Builds sets of indexes on XML elements and attributes during the initial parse of the XML document
  - Lookups become constant-time through the various built indexes
  - Demonstrated successes in test runs

Query Processing – Implementation
- XML is currently in a state of flux
  - Standards are still being revised
  - Industry is cautious about embracing a new technology
  - The economic slowdown may prevent new research and development efforts
- XML is still waiting for its "killer app", an application that forces immediate acceptance

XML Query Processing
- XML is a functional database storage language
- An efficient query language is needed to turn XML into a viable database
- Query language solutions are being developed
  - Java/C++ hooks first developed: OK
  - XSLT/XPath implemented: GOOD
  - XQuery being designed: GREAT?
  - Future additions: ????
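The DOM approach from the Programming Languages slides can be sketched in Java. This minimal sketch uses the JAXP classes bundled with the JDK (javax.xml.parsers, org.w3c.dom) rather than calling Xerces directly; the JDK's built-in parser is itself derived from Xerces. The class name DomQuery and the method lastNamesEarningOver are illustrative names, not part of any library. The sample holds only the two employees shown in the deck, with salaries written without commas so that Integer.parseInt accepts them. The whole document is parsed into memory first, and the "query" is then ordinary Java code walking the nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomQuery {
    // The two employees shown in the sample document (salaries without
    // thousands separators so they parse as integers).
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35000</salary><location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45000</salary><location>CT</location></emp>"
        + "</employees>";

    // Parses the entire document into an in-memory tree of nodes.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Trimmed text content of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Hypothetical hand-written "query": last names of employees earning more than minSalary.
    static List<String> lastNamesEarningOver(String xml, int minSalary) {
        Document doc = parse(xml);
        List<String> result = new ArrayList<>();
        NodeList emps = doc.getElementsByTagName("emp");
        for (int i = 0; i < emps.getLength(); i++) {
            Element emp = (Element) emps.item(i);
            if (Integer.parseInt(text(emp, "salary")) > minSalary) {
                result.add(text(emp, "last"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element emp = (Element) parse(SAMPLE).getElementsByTagName("emp").item(0);
        // The attribute is an Attr node attached to its owner element <emp>.
        System.out.println(emp.getAttributeNode("gender").getValue());   // m
        // Even the string "Bissell" is a node: a Text child of <last>.
        Node bissell = emp.getElementsByTagName("last").item(0).getFirstChild();
        System.out.println(bissell.getNodeType() == Node.TEXT_NODE);     // true
        System.out.println(lastNamesEarningOver(SAMPLE, 38000));         // [Pham]
    }
}
```

Every filter, join or aggregate must be hand-coded this way, which is the "possible, but not necessarily efficient" point the deck makes.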
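The SAX slides describe callback methods defined in an interface and implemented in user code. A minimal Java sketch of that idea extends org.xml.sax.helpers.DefaultHandler and uses the JDK's SAXParser, the JAXP convenience wrapper around an XMLReader. The class names SaxQuery and SalaryHandler are hypothetical. The handler keeps only running totals while the parser streams through the document, so the tree is never held in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxQuery {
    // The two employees shown in the sample document (numeric salaries).
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35000</salary><location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45000</salary><location>CT</location></emp>"
        + "</employees>";

    // User code implements the callbacks; the parser invokes them as it reads.
    static class SalaryHandler extends DefaultHandler {
        int employees = 0;
        long totalSalary = 0;
        private final StringBuilder text = new StringBuilder();
        private boolean inSalary = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("emp")) {
                employees++;
            } else if (qName.equals("salary")) {
                inSalary = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data may arrive in several chunks, so accumulate it.
            if (inSalary) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("salary")) {
                totalSalary += Long.parseLong(text.toString().trim());
                inSalary = false;
            }
        }
    }

    // Streams the document through the handler and returns its accumulated totals.
    static SalaryHandler run(String xml) {
        SalaryHandler handler = new SalaryHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        SalaryHandler h = run(SAMPLE);
        System.out.println(h.employees + " employees, average salary " + (h.totalSalary / h.employees));
    }
}
```

Running main prints "2 employees, average salary 40000". The handler never needs to revisit earlier elements, which fits the "data processing" use the DOM versus SAX slide assigns to SAX.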
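The XPath slide's expressions, the FLWR-style salary filter and the non-manager average can be tried with the JDK's javax.xml.xpath API, which implements XPath 1.0. This is a sketch under the same two-employee sample, so counts and lists differ from the four-employee results on the slide. XPathQuery, select and number are illustrative names. The last line in main shows why the sample stores salaries without commas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQuery {
    // The two employees shown in the sample document (numeric salaries).
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35000</salary><location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45000</salary><location>CT</location></emp>"
        + "</employees>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Evaluates a node-selecting expression and returns each node's string value.
    // (Re-parses the document on every call; acceptable for a small sketch.)
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    // Evaluates an expression that yields a number, e.g. count(), sum(), div.
    static double number(String xml, String expr) {
        try {
            return (Double) XPATH.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, "/employees/emp/name/first"));        // [Brian, Hung]
        System.out.println(number(SAMPLE, "count(/employees/emp)"));            // 2.0
        System.out.println(select(SAMPLE, "//mi"));                             // [Q]
        // WHERE-style filtering with a predicate, in the spirit of the FLWR example.
        System.out.println(select(SAMPLE, "//emp[salary > 38000]/name/last"));  // [Pham]
        // Average salary of non-managers, as in the XSLT query.
        System.out.println(number(SAMPLE,
            "sum(//emp[position != 'Manager']/salary) div count(//emp[position != 'Manager'])")); // 40000.0
        // With a thousands separator the salary is not a number.
        System.out.println(number("<salary>35,000</salary>", "sum(/salary)"));  // NaN
    }
}
```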
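The "employees named Brian" query from the Combine XPath and XSLT slides can be run with the JDK's XSLT 1.0 processor (javax.xml.transform). The slide shows only the template, so this sketch assumes a stylesheet wrapper around it, including an xsl:output method='text' setting. XsltQuery, BRIAN_QUERY and transform are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltQuery {
    // The two employees shown in the sample document.
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35000</salary><location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45000</salary><location>CT</location></emp>"
        + "</employees>";

    // The slide's template, wrapped in a complete XSLT 1.0 stylesheet
    // (assumed wrapper) that emits plain text instead of XML.
    static final String BRIAN_QUERY =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='employees'>"
        + "<xsl:for-each select='emp'>"
        + "<xsl:if test='name/first=\"Brian\"'>"
        + "<xsl:value-of select='name/last'/>"
        + "<xsl:text>:</xsl:text>"
        + "<xsl:value-of select='position'/>"
        + "<xsl:text>; </xsl:text>"
        + "</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the document, and returns the output text.
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, BRIAN_QUERY)); // Bissell:IT Specialist;
    }
}
```

The XPath test inside xsl:if acts as the WHERE clause, and the xsl:value-of selections act as the SELECT list, matching the deck's XSLT/XPath results slide.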