Recursive XML Schemas, Recursive XML Queries, and Relational Storage: XML-to-SQL Query Translation
Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., & Naughton, J.F. (2004). Proceedings of the 20th International Conference on Data Engineering (ICDE'04)
Presenter: Jochen Stoesser
April 5, 2005 Advanced Database Systems, Jochen Stoesser

Motivation (1)
• General topic: XML-to-SQL query translation
• Three XML-to-relational scenarios:
  - schema-oblivious shredding
  - XML publishing
  - schema-based shredding

Motivation (2): Schema Graph
• Tree schema graph (see example)
• Directed acyclic (DAG) schema graph: nodes may have multiple incoming edges
• Recursive schema graph: contains cycles
[Figure: example schema graph; source: Krishnamurthy et al. (2004)]

Motivation (3): Schema-based Shredding
• For each non-leaf node, create a separate relation:
    Book(id, title)
    Author(id, parentid)
    Section(id, parentid, parentcode, title)
    Para(id, parentid)
    Figure(id, parentid, caption, image)
• Each leaf node is mapped to a column
• parentid and parentcode preserve the document structure (parentcode records which kind of parent node a tuple belongs to)

Motivation (4)
• Focus of the paper: XML-to-SQL query translation with schema-based shredding, especially in the presence of
  - recursive XML queries, e.g. /book//section/title, where // abbreviates /descendant-or-self::node()/
  - recursive XML schemas (see Appendix: Recursive XML Schema)
• This case is not solved by existing schema-based shredding algorithms, schema-oblivious shredding algorithms, or XML publishing algorithms

Queries: SQL(p)
• Example: retrieve nodes 9 (titles of subsections, i.e. of nodes 7): /book/section/section/title
• Path p = <book, section(4), section(7), title>
• SQL(p):
    select S2.title
    from Book B, Section S1, Section S2
    where B.id = S1.parentid and S1.parentcode = 1
      and S1.id = S2.parentid and S2.parentcode = 2;
• (see Appendix: SQL(path p))

Queries: RtoL(l)
• Root-to-leaf SQL query
• There may be multiple root-to-leaf paths p1, ..., pm to leaf l:
  $\mathrm{RtoL}(l) := \bigcup_{i=1}^{m} \mathrm{SQL}(p_i)$
• Retrieves all information that would be retrieved by traversing all paths from the root to leaf l
• Problem: a recursive schema may have infinitely many root-to-leaf paths, so the RtoL query becomes a union of infinitely many queries

Query Translation
• Two stages:
  1. PathId stage: identify the paths in the XML schema graph that satisfy the query
  2. SQLGen stage: use the annotations of the XML-to-relational mapping (mapping schema) to construct an equivalent relational query
[Figure: translation pipeline with XML query, XML schema, PathId, mapping schema, SQLGen, SQL query]

PathId Stage
• Problems:
  - recursive schema: the number of paths may be infinite
  - DAG schema: the number of paths may be exponential
• General idea:
  - execute the query on the schema graph and identify the satisfying paths
  - represent the matching paths as a graph instead of enumerating them, so that information shared across multiple paths is preserved (this becomes important in the SQLGen stage)

PathId Stage: Algorithm
• Take the automaton A_S representing schema graph S
• Translate the query into a DFA A_Q
• Create the cross-product automaton A_SQ from A_S and A_Q and eliminate all dead states
• A_SQ contains all matching paths
• View A_SQ as a mapping schema S_SQ
• (see Appendix: PathId(Q,S))

PathId Stage: XPath Semantics
• Query: //section//title
[Figure: cross-product example for //section//title; source: Krishnamurthy et al. (2004)]

SQLGen Stage
• Informally: the union of all RtoL queries in mapping schema S_SQ corresponds to the query result
• Problem: DAG and recursive graphs and queries
  - DAG: the number of paths can be exponential in the size of the component
  - recursive graphs and queries: infinite number of matching paths

SQLGen Stage: Solution
• SQL:1999 WITH clause, used to create temporary relations for
  - nodes in DAG components: shared computation reflects the information shared by the paths through a DAG component and reduces exponential to polynomial complexity
  - recursive components
• (see Appendix: SQLGen(S_SQ))

SQLGen: Algorithm (1)
• Query: /E0//E10
• Find mapping schema S_SQ (here S = S_SQ)
• Strongly connected, non-recursive components (node i stands for element Ei): {0}, {1}, {2}, {3}, {4}, {5}, {6}, {10}, {11}
• Merge adjacent components ci and cj if ci dominates cj: c1 = {0, 1, 2, 3, 4, 5, 6}, c3 = {10, 11}
• Recursive component: c2 = {7, 8, 9}
[Figure: mapping schema with elements E0 to E11 and components c1, c2, c3; source: Krishnamurthy et al. (2004)]

SQLGen: Algorithm (2), DAG components
• Process the components in top-down topological order
• c1 = {0, 1, 2, 3, 4, 5, 6} is a non-recursive DAG component
• Create a temporary relation for each non-root node that
  - is a leaf node,
  - has a child or parent in a different component, or
  - has multiple incoming/outgoing edges (shared computation)
• N1 = {2, 3, 6}
• (see Appendix: SQLFromDAG(c))

SQLGen: Algorithm (3), DAG components
• Example: for node E6 ∈ N1, shared computation with
    T6 as (
      select R6.* from R6, R4, T3
      where R4.id = R6.parentid and T3.id = R4.parentid and R6.parentcode = 1
      union all
      select R6.* from R6, R5, T3
      where R5.id = R6.parentid and T3.id = R5.parentid and R6.parentcode = 2
    )

SQLGen: Algorithm (4), recursive components
• c2 = {7, 8, 9} is a recursive component
• Temporary relation TR, whose schema is the union of the schemas of the nodes in c2
• Two parts:
  1. initialization part: captures all incoming edges
  2. recursive part: captures the recursion
• (see Appendix: SQLFromRecursive(c))

SQLGen: Algorithm (5), initialization
• Incoming edges from other components: (2, 8) and (3, 7); both reuse the shared temporary relations T2 and T3
• Q1 = select R8.*, id(8) as schemanode from T2, R8 where R8.parentcode = 2 and R8.parentid = T2.id
• Q2 = select R7.*, id(7) as schemanode from T3, R7 where R7.parentcode = 3 and R7.parentid = T3.id
• $Q_{init} = Q_1 \cup Q_2$

SQLGen: Algorithm (6), recursion
• Edges within the recursive component c2: (7, 9), (8, 7), (8, 9), (9, 8)
• Construct a recursive query for each of these edges, e.g. $Q_{e_1}$ for $e_1 = (7, 9)$:
    select R9.*, id(9) as schemanode from TR, R9
    where TR.schemanode = id(7) and R9.parentid = TR.id and R9.parentcode = 1
• Recursive part: $Q_R = \bigcup_j Q_{e_j}$, and $TR = Q_{init} \cup Q_R$
• For each n ∈ c2: temporary relation T(n) = select * from TR where schemanode = id(n)
• (see Appendix: TR example)

SQLGen: Final Query
• Result of the SQLGen stage: temporary relations
  - T2, T3, T6 for c1
  - T7, T8, T9, and TR for c2
  - T10 and T11 for c3
• Final query for /E0//E10: select elemid from T11

Empirical Evaluation (1)
• XMark benchmark schema
• Translation of the query fragments that appear in the benchmark's query suite
• The XML-to-relational mapping schema has 101 nodes
• The cross-product schema has fewer than 100 nodes in all cases
• Result: translation takes less than 6 ms per query

Empirical Evaluation (2)
• Test under extreme conditions:
  - all transitions on a single label x
  - query //x//x//x//x//x
  - mapping schema is a complete graph (clique) of n nodes
• Runtime of the translation algorithm:
[Figure: translation runtime under these conditions; source: Krishnamurthy et al. (2004)]

Contributions
• Claim: "first to present a generic algorithm that translates path expression queries to SQL in the presence of recursion in the schema in the context of schema-based XML storage (shredding of XML into relations)"
• The algorithm translates a path expression query into a single SQL query, irrespective of the complexity of the XML schema
• The size of the SQL query is polynomial in the size of the input XML-to-relational mapping and the XML query

Limitations
• The generated SQL query is highly complex even for relatively simple XML queries
• Although the running time may be small, memory requirements may be high due to the many temporary relations
• Although the authors indicate solutions for XPath semantics and branching path expression queries (e.g. p1[p2 op value]), they do not analyze the resulting increase in complexity (runtime, memory requirements, etc.)

Q&A

References
• Krishnamurthy, R. (2004). XML-to-SQL Query Translation. Dissertation, University of Wisconsin-Madison. Retrieved March 18 from http://www.cs.wisc.edu/~sekar/research/main.pdf
• W3C. XML Path Language (XPath) Version 1.0. Retrieved March 18 from http://www.w3.org/TR/1999/REC-xpath-19991116.html
• Tian, F., DeWitt, D.J., Chen, J., & Zhang, C. The Design and Performance Evaluation of Alternative XML Storage Strategies. Retrieved April 4 from http://www.cs.wisc.edu/~czhang/doc/publications/feng6page.pdf

Appendix: Recursive XML Schema
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
      <xs:element name="element">
        <xs:complexType>
          <xs:sequence>
            <xs:any processContents="skip" minOccurs="0" />
            <xs:element ref="element" minOccurs="0" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Appendix: SQL(path p)

Appendix: PathId(Q,S)

Appendix: SQLGen(S_SQ)

Appendix: SQLFromDAG(c)

Appendix: SQLFromRecursive(c) (1)

Appendix: SQLFromRecursive(c) (2)

Appendix: TR example
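To make the example from "Queries: SQL(p)" concrete, the following minimal sketch shreds a tiny hypothetical document into the Book and Section relations of "Motivation (3)" and runs SQL(p) for /book/section/section/title with Python's built-in sqlite3 module. The data values and the helper name build_store are illustrative assumptions. Only the parentcode convention (1 = parent is a book, 2 = parent is a section) is read off the slide's query.

```python
import sqlite3

def build_store():
    """Shred a tiny hypothetical document into Book/Section relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Book(id INTEGER PRIMARY KEY, title TEXT);
        -- parentcode tells which schema node the parent is: 1 = book, 2 = section
        CREATE TABLE Section(id INTEGER PRIMARY KEY, parentid INTEGER,
                             parentcode INTEGER, title TEXT);
        INSERT INTO Book VALUES (1, 'XML Storage');
        INSERT INTO Section VALUES (10, 1, 1, 'Intro'), (11, 10, 2, 'Motivation'),
                                   (12, 11, 2, 'Details'), (13, 1, 1, 'Summary');
    """)
    return conn

# SQL(p) for p = <book, section, section, title>: one join per step of the path.
SQL_P = """
    SELECT S2.title
    FROM Book B, Section S1, Section S2
    WHERE B.id = S1.parentid AND S1.parentcode = 1
      AND S1.id = S2.parentid AND S2.parentcode = 2
"""

conn = build_store()
print([row[0] for row in conn.execute(SQL_P)])  # ['Motivation']
```

Only "Motivation" is returned: "Details" is nested one level deeper and would need a different path p. This is why a recursive query such as /book//section/title cannot be expressed as one fixed chain of joins.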
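The PathId outline (cross product of a schema automaton and a query automaton, followed by dead-state elimination) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the paper's algorithm. The schema graph is a small hypothetical one loosely modeled on the Book/Section example. The query is a list of child/descendant steps run as a nondeterministic automaton rather than the DFA A_Q. All names (LABEL, EDGES, query_moves, path_id, the "#doc" node) are made up for the example.

```python
from collections import defaultdict

# Hypothetical schema graph: schema node id -> element label, and parent -> children edges.
LABEL = {"book": "book", "book.title": "title", "section": "section",
         "section.title": "title", "para": "para"}
EDGES = {
    "#doc": ["book"],                                  # virtual node above the root element
    "book": ["book.title", "section"],
    "section": ["section.title", "section", "para"],  # section -> section makes it recursive
    "book.title": [], "section.title": [], "para": [],
}

def query_moves(steps, q, label):
    """Query automaton: state q = number of (axis, label) steps matched so far."""
    if q == len(steps):
        return set()                                   # accepting state: nothing left to match
    axis, want = steps[q]
    moves = {q} if axis == "desc" else set()           # '//' may skip over any element
    if want == label:
        moves.add(q + 1)
    return moves

def path_id(steps):
    """Cross product of schema graph and query automaton, with dead states removed."""
    start, final = ("#doc", 0), len(steps)
    edges, seen, stack = defaultdict(set), {start}, [start]
    while stack:                                       # forward: reachable product states
        node, q = stack.pop()
        for child in EDGES[node]:
            for q2 in query_moves(steps, q, LABEL[child]):
                edges[(node, q)].add((child, q2))
                if (child, q2) not in seen:
                    seen.add((child, q2))
                    stack.append((child, q2))
    live = {s for s in seen if s[1] == final}          # backward: states that reach acceptance
    changed = True
    while changed:
        changed = False
        for s in seen - live:
            if edges[s] & live:
                live.add(s)
                changed = True
    return {s: edges[s] & live for s in live}

g = path_id([("desc", "section"), ("desc", "title")])  # //section//title
print(sorted(n for (n, q) in g if q == 2))             # ['section.title']
```

For //section//title the result keeps the self-loop on ("section", 1) instead of unrolling it, so infinitely many matching paths are represented by a finite graph, while ("book.title", 0) is pruned as a dead state.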
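The TR construction in "SQLGen: Algorithm (4)" to "(6)" relies on a recursive WITH clause. As a minimal sketch, assuming a single recursive schema node (a section nested in a section) instead of the three-node component c2, the query /book//section/title can be evaluated in SQLite through Python's sqlite3 module. The first SELECT plays the role of the initialization part Q_init, the second the recursive part Q_R, and the schemanode column imitates the slides' id(n) tag. Table contents and the names make_store, RECURSIVE_SQL, and section_titles are hypothetical.

```python
import sqlite3

def make_store():
    """Hypothetical shredded data: Book(id, title), Section(id, parentid, parentcode, title)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Book(id INTEGER PRIMARY KEY, title TEXT);
        -- parentcode: 1 = parent is a book, 2 = parent is a section
        CREATE TABLE Section(id INTEGER PRIMARY KEY, parentid INTEGER,
                             parentcode INTEGER, title TEXT);
        INSERT INTO Book VALUES (1, 'XML Storage');
        INSERT INTO Section VALUES (10, 1, 1, 'Intro'), (11, 10, 2, 'Motivation'),
                                   (12, 11, 2, 'Details'), (13, 1, 1, 'Summary');
    """)
    return conn

# /book//section/title: TR collects every section reachable from the book, tagged by schema node.
RECURSIVE_SQL = """
    WITH RECURSIVE TR(id, title, schemanode) AS (
        -- initialization part: incoming edge book -> section
        SELECT S.id, S.title, 'section'
        FROM Book B JOIN Section S ON S.parentid = B.id AND S.parentcode = 1
        UNION ALL
        -- recursive part: edge section -> section inside the recursive component
        SELECT S.id, S.title, 'section'
        FROM TR JOIN Section S ON S.parentid = TR.id AND S.parentcode = 2
        WHERE TR.schemanode = 'section'
    )
    SELECT title FROM TR WHERE schemanode = 'section' ORDER BY id
"""

def section_titles(conn):
    return [row[0] for row in conn.execute(RECURSIVE_SQL)]

print(section_titles(make_store()))  # ['Intro', 'Motivation', 'Details', 'Summary']
```

With several schema nodes in the component, each intra-component edge would contribute its own recursive branch filtered on TR.schemanode, as $Q_{e_1}$ does on the slides. Older SQLite releases allow only a single recursive SELECT per CTE, which is one reason this sketch keeps the component to one node.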