Download XML Publishing - Computer Science, NMSU

XML Publishing
• Introduction
• General approach
• XPERRANTO
• SilkRoute
• Microsoft SQL 2000
• Summary

Introduction: What is XML Publishing?
• XML publishing is the task of transforming relational data into XML for the purpose of exchange over the Internet.
• More specifically, publishing XML data involves joining tables, selecting and projecting the data that needs to be exported, creating XML hierarchies, and processing values in an application-specific manner.

Introduction: Why do we need XML Publishing?
- Most business data is stored in relational database systems.
- XML is a standard for exchanging business data on the web.
- XML is a simple, platform-independent, Unicode-based syntax for which simple and efficient parsers are widely available.
- XML can represent structured data, and it also provides a uniform syntax for semi-structured data and marked-up content.

Introduction: Two data models
• Relational data
  - fragmented into many flat relations
  - normalized
  - proprietary
• XML data
  - nested
  - un-normalized
  - public (450 schemas at www.biztalk.org)

General Approach
• Create XML views over relational data. Each of these XML views can provide an alternative, application-specific view of the underlying relational data.
• Through these XML views, business partners can access existing relational data as though it were in some industry-standard XML format.

Virtual vs. Materialized
• Materialized XML publishing: materialize the entire XML view on request and return the resulting XML document.
• Virtual XML publishing: support queries over XML views and return only what the user application actually wants.

Virtual vs. Materialized
• Materialized XML publishing
  - applications can access all the data without interfering with the relational engine
  - the XML view needs to be refreshed periodically
  - inefficient in some cases
• Virtual XML publishing
  - guarantees data freshness
  - leverages the processing power of relational engines
  - translating an XML query over an XML view into SQL may be complex

Middleware System
• Interface between the relational database and the user application
  - defines and manages XML views
  - translates incoming XML queries into SQL and submits them to the database system
  - receives the query results, then translates them back into XML terms

[Figure 1: A high-level architecture of middleware system. Labels: Applications; Web/Intranet User; XML Queries; Result XML Documents; View Definition; View Description; Middleware System (XML Query Processor, XML Views Manager, XML Tagger); SQL Queries; Tuples Streams; RDBMS.]

XPERRANTO vs. SilkRoute
• IBM XPERRANTO
  - pure XML, single query language approach. XML views are defined in an XML query language that uses the type system of XML Schema.
• SilkRoute
  - XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).

XPERRANTO vs. SilkRoute
• XPERRANTO
  - users only need to be familiar with XML
  - both relational data and metadata can be represented and queried in the same framework
  - can publish object-relational structures
  - pushes all relational logic down to the database engine

[Figure 2: XPERRANTO Architecture. Labels: Query Translation (XML-QL Parser, XQGM, Query Rewrite, XQGM, SQL Translation); XML View Services (View Definition, View Description, XML Schema Generator, XML Schema, Catalog Info.); XML Tagger (Data Tuples, XML Result); O-R Database (SQL Query Processor, SQL Queries, Stored Tables, System Catalog).]

Example 1: Relational Schema vs. XML View Schema
• DDL (Data Definition Language) for the O-R schema in SQL99 terms

    CREATE TABLE book AS (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))
    CREATE TABLE publisher AS (name VARCHAR(30), address VARCHAR(255))
    CREATE TYPE author_type AS (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))
    CREATE TABLE author OF author_type (REF IS ssn USER GENERATED)

• XML view schema over the example O-R database

    <simpleType name="string255" source="string">
      <maxlength value="255"/>
    </simpleType>
    <simpleType name="string30" source="string">
      <maxlength value="30"/>
    </simpleType>

    <!-- CREATE TABLE book AS ... -->
    <complexType name="bookTupleType">
      <element name="bookID" type="string30"/>
      <element name="name" type="string255"/>
      <element name="publisher" type="string30"/>
    </complexType>
    <complexType name="bookSetType">
      <element name="bookTuple" type="bookTupleType" maxOccurs="*"/>
    </complexType>
    <element name="book" type="bookSetType"/>

    <!-- CREATE TYPE author_type AS ... -->
    <complexType name="author_type">
      <element name="bookID" type="string30"/>
      <element name="first" type="string30"/>
      <element name="last" type="string30"/>
    </complexType>

    <!-- CREATE TABLE author OF ... -->
    <complexType name="authorTupleType" source="author_type" derivedBy="extension">
      <attribute name="ssn" type="ID"/>
    </complexType>
    <complexType name="authorSetType">
      <element name="authTuple" type="authorTupleType" maxOccurs="*"/>
    </complexType>
    <element name="author" type="authorSetType"/>

Default XML View over Example O-R database

    <db>
      <book>
        <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
        <row><bookID>…</bookID><name>…</name><publisher>…</publisher></row>
        …
      </book>
      <author>
        <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
        <row><ssn>…</ssn><bookID>…</bookID><first>…</first><last>…</last></row>
        …
      </author>
      <publisher> …similar to <book> and <author> </publisher>
    </db>

Example 2: From XQuery to SQL

[Figure: XPERRANTO Query Engine. Labels: XQuery; XQuery Parser; XQGM; Query Rewrite & View Composition; XQGM; Computational Pushdown; SQL Query; Tagger Graph; Tagger Runtime; Tuples; RDBMS; Query Result.]

A Purchase Order Database and its Default View

    order
      id   custname             custnum
      10   Smith Construction   7734
      9    Western Builders     7725

    item
      oid  desc       cost
      10   generator  8000
      10   backhoe    24000

    payment
      oid  due      amt
      10   1/10/01  20000
      10   6/10/01  12000

    <db>
      <order>
        <row><id>10</id><custname>Smith Construction</custname><custnum>7734</custnum></row>
        <row><id>9</id><custname>Western Builders</custname><custnum>7725</custnum></row>
      </order>
      <item>
        <row><oid>10</oid><desc>generator</desc><cost>8000</cost></row>
        <row><oid>10</oid><desc>backhoe</desc><cost>24000</cost></row>
      </item>
      <payment> …similar to <order> and <item> </payment>
    </db>

XML Purchase Order

    <order id="10">
      <customer>Smith Construction</customer>
      <items>
        <item description="generator">
          <cost>8000</cost>
        </item>
        <item description="backhoe">
          <cost>24000</cost>
        </item>
      </items>
      <payments>
        <payment due="1/10/01">
          <amount>20000</amount>
        </payment>
        <payment due="6/10/01">
          <amount>12000</amount>
        </payment>
      </payments>
    </order>
    <order id="9"> … </order>

User-defined XML "orders" view

    create view orders as (
      for $order in view("default")/order/row
      return
        <order id=$order/id>
          <customer>$order/custname</customer>
          <items>
            for $item in view("default")/item/row
            where $order/id = $item/oid
            return
              <item description=$item/desc>
                <cost>$item/cost</cost>
              </item>
          </items>
          <payments>
            for $payment in view("default")/payment/row
            where $order/id = $payment/oid
            return
              <payment due=$payment/due>
                <amount>$payment/amt</amount>
              </payment>
              sortby(@due)
          </payments>
        </order>)

XQuery over "orders" view

    for $order in view("orders")
    where $order/customer/text() like "Smith%"
    return $order

XQuery Parser: Query Parsing
• XQGM (XML Query Graph Model)
  - an extension of a SQL internal query representation called the Query Graph Model (QGM)
  - consists of a set of operators and functions designed to capture the semantics of an XML query

    OPERATOR   DESCRIPTION
    Table      Represents a table in a relational database
    Project    Computes results based on its input
    Select     Restricts its input
    Join       Joins two or more inputs
    Groupby    Applies aggregate functions and grouping
    Orderby    Sorts input based on column values
    Union      Unions two or more inputs
    Unnest     Applies super-scalar functions to input
    View       Represents a view
    Function   Represents an XQuery function

Part of the XML Functions and Operators in XQGM

        XML FUNCTION                 OPERATORS        DESCRIPTION
    1   cr8Elem(Tag, Atts, Clist)    Project          Creates an element with tag name Tag, attribute list Atts, and contents Clist
    2   cr8AttList(A1,…,An)          Project          Creates a list of attributes from the attributes passed as parameters
    3   cr8XMLFragList(C1,…,Cn)      Project          Creates an XML fragment list from the content (element/text) parameters
    4   aggXMLFrags(C)               Groupby          Aggregate function that creates an XML fragment list from content inputs
    5   getTagName(Elem)             Project, Select  Returns the element name of Elem
    6   getAttributes(Elem)          Project, Select  Returns the list of attributes of Elem
    7   getAttName(Att)              Project, Select  Returns the name of attribute Att
    8   isElement(E)                 Select           Returns true if E is an element, false otherwise
    9   isText(T)                    Select           Returns true if T is text, false otherwise
    10  unnest(List)                 Unnest           Superscalar function that unnests a list

[Figure: XQGM for the XML Orders View. Operators, bottom-up: table: order ($id, $custname); table: item ($oid, $desc, $cost), select: $oid = $id, project: $item = <item> …, groupby: $items = aggXMLFrags($item); table: payment ($oid, $due, $amt), select: $oid = $id, project: $pmt = <payment> …, orderby (on $due), groupby: $pmts = aggXMLFrags($pmt); two correlated joins (correlation on order.id) bring $items and $pmts together with $id and $custname; project: $order = <order id=$id> <custname>$custname</custname> <items>$items</items> <payments>$pmts</payments> </order>; view result.]

[Figure: XQGM for the Query over Orders View, for the query
    for $order in view("orders") where $order/customer/text() like "Smith%" return $order
Operators, bottom-up: (1) View: orders ($order); (2) project: $elems = getContents($order); (3) Unnest: $elem = unnest($elems); (4) select: isElement($elem) and getTagName($elem) = "customer"; (5) project: $vals = getContents($elem); (6) Unnest: $val = unnest($vals); (7) select: isText($val) and $val like "Smith%"; (8) join (correlated, correlation on $order), returning $order.]

View Composition
• The XQGM produced by the query parsing stage is composed with the views it references (here, the orders view), and rewrite optimizations are performed to eliminate the construction of intermediate XML fragments and to push down predicates.
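The idea of eliminating intermediate XML fragments can be pictured with a toy rewriter. The sketch below is only an illustration under assumptions: the `Call` class and `reduce_expr` function are hypothetical names, not part of XPERRANTO, and only a few of the composition rules (an accessor applied directly to the constructor that built its argument) are covered.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Call:
    """A function application in a toy expression tree (hypothetical type)."""
    fn: str
    args: tuple


def reduce_expr(e: Any) -> Any:
    """Rewrite bottom-up, cancelling accessors against constructors."""
    if not isinstance(e, Call):
        return e  # leaf: a column reference such as "$id", or a literal
    args = tuple(reduce_expr(a) for a in e.args)
    if len(args) == 1 and isinstance(args[0], Call):
        inner = args[0]
        if inner.fn == "cr8Elem":
            tag, atts, clist = inner.args
            if e.fn == "getTagName":
                return tag      # getTagName(cr8Elem(Tag, Atts, Clist)) -> Tag
            if e.fn == "getAttributes":
                return atts     # getAttributes(cr8Elem(...)) -> Atts
            if e.fn == "getContents":
                return clist    # getContents(cr8Elem(...)) -> Clist
            if e.fn == "isElement":
                return True     # isElement(cr8Elem(...)) -> True
        if e.fn == "unnest" and inner.fn == "aggXMLFrags":
            return inner.args[0]  # unnest(aggXMLFrags(C)) -> C
    return Call(e.fn, args)     # no rule applies: keep the node


# The <order> element built by the orders view, as a constructor expression.
order = Call("cr8Elem", (
    "order",
    Call("cr8AttList", ("$id",)),
    Call("cr8XMLFragList", ("$custname", "$items", "$pmts")),
))

# The element is never built: the accessor reduces straight to the tag name.
print(reduce_expr(Call("getTagName", (order,))))
```

Because each rule replaces an accessor-over-constructor pair with one of the constructor's arguments, the rewritten graph refers directly to relational columns, which is what makes predicate pushdown possible afterwards.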
View Composition: Composition Rules

        FUNCTION        COMPOSES WITH               REDUCTION
    1   getTagName      cr8Elem(Tag, Atts, Clist)   Tag
    2   getAttributes   cr8Elem(Tag, Atts, Clist)   Atts
    3   getContents     cr8Elem(Tag, Atts, Clist)   Clist
    4   getAttName      cr8Att(Name, Val)           Name
    5   getAttValue     cr8Att(Name, Val)           Val
    6   isElement       cr8Elem(Tag, Atts, Clist)   True
    7   isElement       Other than cr8Elem          False
    8   isText          PCDATA                      True
    9   isText          Other than PCDATA           False
    10  unnest          aggXMLFrags(C)              C
    11  unnest          cr8XMLFragList(C1,…,Cn)     C1 ∪ … ∪ Cn
    12  unnest          cr8AttList(A1,…,An)         A1 ∪ … ∪ An

[Figure: Predicate pushdown. The Query graph is composed with the View graph for the orders view. The query's predicate becomes select: $custname like "Smith%" and is pushed down onto table: order ($id, $custname), below the correlated joins that combine $items and $pmts into project: $order = <order id=$id>…]

Computation Pushdown
• The goal in this phase of query processing is to push all data-intensive and memory-intensive operations down to the relational engine as an efficient SQL query. Two techniques are available:
  1. Query Decorrelation
  2. Tagger Pull-up

Query Decorrelation
• Complex expressions in XQuery can be represented using correlations. However, earlier work has shown that executing correlated XML queries over a relational database leads to poor performance, so query decorrelation is a necessary step for efficient XML query execution.
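The difference between correlated and decorrelated evaluation can be seen on the purchase order data. The sketch below is an illustration only, not XPERRANTO's rewrite: it uses Python's built-in `sqlite3` module, the helper names `make_db`, `correlated`, and `decorrelated` are hypothetical, and the `order` table and `desc` column are quoted because they are reserved words in SQLite.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Load the order and item tables from the purchase order example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE "order" (id INTEGER, custname TEXT);
        CREATE TABLE item (oid INTEGER, "desc" TEXT, cost INTEGER);
        INSERT INTO "order" VALUES (10, 'Smith Construction');
        INSERT INTO "order" VALUES (9, 'Western Builders');
        INSERT INTO item VALUES (10, 'generator', 8000);
        INSERT INTO item VALUES (10, 'backhoe', 24000);
    ''')
    return conn


def correlated(conn, pattern="Smith%"):
    """One inner query per outer row: mirrors the nested for-loops of the view."""
    result = {}
    outer = conn.execute(
        'SELECT id FROM "order" WHERE custname LIKE ? ORDER BY id', (pattern,)
    ).fetchall()
    for (oid,) in outer:
        result[oid] = conn.execute(
            'SELECT "desc", cost FROM item WHERE oid = ? ORDER BY "desc"', (oid,)
        ).fetchall()
    return result


def decorrelated(conn, pattern="Smith%"):
    """A single outer-join query; rows are regrouped by order id afterwards."""
    sql = '''SELECT o.id, i."desc", i.cost
             FROM "order" o LEFT OUTER JOIN item i ON i.oid = o.id
             WHERE o.custname LIKE ?
             ORDER BY o.id, i."desc"'''
    result = {}
    for oid, desc, cost in conn.execute(sql, (pattern,)):
        items = result.setdefault(oid, [])
        if desc is not None:  # an order without items yields one NULL-padded row
            items.append((desc, cost))
    return result


conn = make_db()
print(correlated(conn) == decorrelated(conn))
```

The correlated version issues one query per qualifying order, while the decorrelated version lets the relational engine do the join in a single pass. The outer join keeps orders that have no items, which a plain inner join would silently drop.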
[Figure: XQGM after Decorrelation. Operators, bottom-up: table: order ($id, $custname), Select: $custname like "Smith%"; table: item ($oid, $desc, $cost), join: $oid = $id, project: $item = <item> …, Groupby (on $id): $items = aggXMLFrags($item); table: payment ($oid, $due, $amt), join: $oid = $id, project: $pmt = <payment> …, orderby (on $due), groupby: $pmts = aggXMLFrags($pmt); right outer join: $id = $id; left outer join: $id = $id; project: $order = <order>…]

Tagger Pull-up
• This step comes right after query decorrelation. It separates the tagger operations from the SQL operations before the SQL queries are generated.
• Relational operations are pushed to the bottom of the graph. SQL statements are generated and sent to the relational engine for execution.
• XML construction functions are pulled up to the top of the query graph and transformed into a "tagger run-time" graph, which produces the result XML documents.
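The tagger side of this split can be sketched as follows. This is a simplified illustration under assumptions, not XPERRANTO's tagger runtime: `tag_orders` is a hypothetical name, the `<orders>` wrapper element is added only so that the result is a single tree, and the tuples are regrouped in memory rather than merged as true streams.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter


def tag_orders(orders, items, payments):
    """Build <order> elements from three tuple lists, each sorted by order id.

    orders:   (id, custname)
    items:    (oid, desc, cost)
    payments: (oid, due, amt)
    """
    # Regroup the child tuples by order id; sorted input makes groupby sufficient.
    items_by_oid = {k: list(g) for k, g in groupby(items, key=itemgetter(0))}
    pays_by_oid = {k: list(g) for k, g in groupby(payments, key=itemgetter(0))}

    root = ET.Element("orders")  # hypothetical wrapper element
    for oid, custname in orders:
        order = ET.SubElement(root, "order", id=str(oid))
        ET.SubElement(order, "customer").text = custname
        items_el = ET.SubElement(order, "items")
        for _, desc, cost in items_by_oid.get(oid, []):
            item = ET.SubElement(items_el, "item", description=desc)
            ET.SubElement(item, "cost").text = str(cost)
        pays_el = ET.SubElement(order, "payments")
        for _, due, amt in pays_by_oid.get(oid, []):
            pay = ET.SubElement(pays_el, "payment", due=due)
            ET.SubElement(pay, "amount").text = str(amt)
    return root


root = tag_orders(
    orders=[(10, "Smith Construction")],
    items=[(10, "generator", 8000), (10, "backhoe", 24000)],
    payments=[(10, "1/10/01", 20000), (10, "6/10/01", 12000)],
)
print(ET.tostring(root, encoding="unicode"))
```

All joining, filtering, and sorting is assumed to have happened in the database; the tagger only wraps already-ordered tuples in tags, which is why it can stay cheap in terms of memory and computation.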
[Figure: XQGM after Tagger Pull-up. Tagger run-time operators, bottom-up: three input operators fed by SQL queries; merge: $item = <item> …, aggregate: $items = aggXMLFrags($item); merge: $pmt = <payment> …, aggregate: $pmts = aggXMLFrags($pmt); Merge: $order = <order>… (correlation on id).]

SQL queries feeding the three input operators:

    select o.id, o.custname
    from order o
    where o.custname like 'Smith%'
    order by o.id

    select i.oid, i.desc, i.cost
    from item i, order o
    where o.custname like 'Smith%' and i.oid = o.id
    order by o.id

    select p.oid, p.due, p.amt
    from payment p, order o
    where o.custname like 'Smith%' and p.oid = o.id
    order by o.id, p.due

SilkRoute Approach

[Figure: SilkRoute's Architecture. Labels: Applications; Web/Intranet User; XML Queries; Virtual View or Materialized View (RXL); Result XML Documents; SilkRoute (Query Composer, Plan Generator, XML Tagger); RXL Query; XML Template; SQL Queries; Tuples Streams; Source Description (XML); RDBMS.]

SilkRoute Approach
• The database administrator starts by writing an RXL query that defines the XML view of the database. It is called the view query.
• A materialized view is fed directly into the Plan Generator, which generates a set of SQL queries and one XML template.
• A virtual view is first composed with a user query by the Query Composer, resulting in another RXL query, which is then fed into the Plan Generator.
• The SQL queries are sent to the RDBMS server, which returns one sorted tuple stream per SQL query.
• The XML Tagger merges the tuple streams and produces the XML document, which is returned to the application.

Query Composer
• This component takes a user XML-QL query and composes it with the RXL view query, resulting in a new RXL query. It combines fragments of the view query and the user query. It works in a similar way to the Query Parser and Query Rewrite components in XPERRANTO.

Plan Generator
• This component in SilkRoute uses a greedy optimization algorithm to choose an optimal set of SQL queries for a given RXL view definition.
• The algorithm bases its decisions on query cost estimates provided by the relational engine and can return more than one plan, which will be integrated with additional optimization algorithms that optimize specific parameters, such as network traffic or server load.
• Details of the greedy algorithm can be found in: Efficient Evaluation of XML Middle-ware Queries, M. Fernandez et al.

XML Publishing: SQL Server
Two approaches
• SQL-centric approach: extend SQL queries to perform the transformation. The extended form of the SQL query is called "FOR XML".
• Virtual XML views approach: use the XML-based XDR (XML-Data Reduced) schema language to define virtual XML views over the relational database, then query them with XPath.

XML Publishing: SQL Server SQL-centric approach
Three modes
• RAW mode
• AUTO mode
• EXPLICIT mode

XML Publishing: SQL Server, RAW Mode

    SELECT Customers.CustomerID, OrderID
    FROM Customers LEFT OUTER JOIN Orders
      ON Customers.CustomerID = Orders.CustomerID
    FOR XML RAW

    <row CustomerID="ALFKI" OrderID="10643"/>
    <row CustomerID="ALFKI" OrderID="10692"/>
    <row CustomerID="ANATR" OrderID="10308"/>
    ....

• flat XML
• default tag and attribute names

XML Publishing: SQL Server, AUTO Mode

    SELECT Customers.CustomerID, OrderID
    FROM Customers LEFT OUTER JOIN Orders
      ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID
    FOR XML AUTO

    <Customers CustomerID="ALFKI">
      <Orders OrderID="10643"/>
      <Orders OrderID="10692"/>
    </Customers>
    <Customers CustomerID="ANATR">
      <Orders OrderID="10308"/>
    </Customers>
    ....

• default tag and attribute names
• no differently typed sibling elements

XML Publishing: SQL Server, EXPLICIT Mode
• Nested XML
• User-defined tags and attributes
• Idea: write SQL queries with complex column names
• Ad hoc, order-dependent semantics

XML Publishing: SQL Server Virtual XML Views
• The core mechanism for providing XML views over relational data is the annotated schema, which consists of a schema description of the XML view and annotations that describe the mapping of the XML schema constructs to the relational schema constructs.
• The XPath query, together with the annotated schema, is then translated into a FOR XML query that returns only the data required by the query.

Summary
• IBM XPERRANTO
  - pure XML, single query language approach. XML views are defined in an XML query language that uses the type system of XML Schema.
• SilkRoute
  - XML views are defined using a declarative query language called RXL (Relational to XML Transformation Language).
• Microsoft SQL 2000
  - supports queries over XML views, but the support is very limited because queries are specified using XPath, which is a subset of XQuery.

Future Work
• IBM XPERRANTO
  - provide support for insertable and updateable XML views
  - push tagging inside the database system
• SilkRoute
  - look for better algorithms for translating RXL into efficient SQL and for minimizing composed RXL views
• Microsoft SQL 2000
  - find out whether query composition and decomposition is possible for the complete XQuery language or only for a subset of the language
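
As a concrete footnote to the SQL Server RAW and AUTO modes discussed earlier, the difference in output shape can be mimicked in a few lines. This is only a sketch of the result shapes shown on those slides, not of how SQL Server implements FOR XML; `for_xml_raw` and `for_xml_auto` are hypothetical helper names, and rows with a NULL OrderID (customers without orders) are not handled.

```python
from itertools import groupby
from xml.sax.saxutils import quoteattr

# (CustomerID, OrderID) rows as returned by the join, sorted by CustomerID.
ROWS = [("ALFKI", "10643"), ("ALFKI", "10692"), ("ANATR", "10308")]


def for_xml_raw(rows):
    """RAW shape: one flat <row/> element per tuple, columns as attributes."""
    return [
        f"<row CustomerID={quoteattr(c)} OrderID={quoteattr(o)}/>"
        for c, o in rows
    ]


def for_xml_auto(rows):
    """AUTO shape: nest <Orders/> under the <Customers> element it joins to."""
    out = []
    for cust, group in groupby(rows, key=lambda r: r[0]):
        out.append(f"<Customers CustomerID={quoteattr(cust)}>")
        out.extend(f"  <Orders OrderID={quoteattr(o)}/>" for _, o in group)
        out.append("</Customers>")
    return out


print("\n".join(for_xml_raw(ROWS)))
print("\n".join(for_xml_auto(ROWS)))
```

The nesting in the AUTO shape depends on rows for the same customer arriving next to each other, which is why the sketch assumes sorted input.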
