Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational model wikipedia , lookup
Recursive XML Schemas, Recursive XML Queries, and Relational Storage: XML-to-SQL Query Translation Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., & Naughton, J.F. (2004) Proceedings of the 20th International Conference on Data Engineering (ICDE’04) Presenter: Jochen Stoesser mailto: [email protected] Motivation (1) • General topic: XML-to-SQL query translation • 3 scenarios of XML shredding schema-oblivious shredding XML Publishing schema-based shredding April 5, 2005 Advanced Database Systems, Jochen Stoesser 2 Motivation (2): Schema Graph • Tree schema graph (see example) • directed acyclic (DAG) schema graph multiple incoming edges • recursive schema graph Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 3 Motivation (3): Schema-based Shredding • for each non-leaf node create separate relation Book(id, title) Author(id, parentid) Section(id, parentid, parentcode, title) Para(id, parentid) Figure(id, parentid, caption, image) • each leaf node is associated with a column name • parentid and parentcode to preserve structure April 5, 2005 Advanced Database Systems, Jochen Stoesser 4 Motivation (4) • Focus of the paper: XML-to-SQL query translation with schema-based shredding, especially in the presence of recursive XML query, e.g. /book//section/title // /descendant-or-self::node()/ recursive XML schemas • (>> example) not solved by existing schema-based shredding algorithms, schema-oblivious shredding algorithms, XML Publishing algorithms April 5, 2005 Advanced Database Systems, Jochen Stoesser 5 Queries: SQL(p) • example: retrieve nodes 9 (titles of subsections, i.e. nodes 7) /book/section/section/title • path p = <book, section(4), section(7), title> • SQL(p): select from where S2.title Book B, Section S1, Section S2 B.id=S1.parentid and S1.parentcode=1 and S1.id=S2.parentid and S2.parentcode=2; >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 6 Queries: RtoL(l) • root-to-leaf SQL query • possibly multiple root-to-leaf paths p1, ..., pm to leaf l • RtoL(l) := m i = 1SQL(pi ) • retrieves all information that would be retrieved traversing all paths from the root to leaf l • Problem: recursive schema possibly infinite number of root-to-leaf paths RtoL query: union of infinitely many queries April 5, 2005 Advanced Database Systems, Jochen Stoesser * 7 Query Translation • two stages: 1. Identify the paths in the XML schema graph that satisfy the query: PathId stage 2. Use annotations (schemas) from XML-to-Relational mapping to construct equivalent relational query: SQLGen stage XML schema PathId Mapping schema SQLGen SQL query XML query April 5, 2005 Advanced Database Systems, Jochen Stoesser 8 PathId Stage • Problems: recursive schema: number of paths possibly infinite DAG graph: exponential number of paths • General idea: Represent matching paths as graph instead of enumerating to reflect shared information across multiple paths (will become important for SQLGen stage) execute query on a schema graph and identify statisfying paths April 5, 2005 Advanced Database Systems, Jochen Stoesser 9 PathId Stage • Algorithm outline: Take automaton AS representing schema graph S Translate Query into DFA AQ Create cross-product automaton ASQ from AS and AQ, eliminate all dead states ASQ contains all matching paths View ASQ as mapping schema SSQ >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 10 PathId Stage: XPath Semantics Query: //section//title Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 11 SQLGen Stage XML schema PathId Mapping schema SQLGen XML query • informally: union of all RtoL in mapping schema SSQ corresponds to query result • problem: DAG and recursive graphs & queries DAG: number of paths can be exponential in the size of the component recursive graphs & queries: infinite number of matching paths April 5, 2005 Advanced Database Systems, Jochen Stoesser 12 SQLGen Stage • Solution: SQL99 with-clause • Used to create temporary relations for • Nodes in DAG components shared computation, reflects shared information contained in paths through DAG components decrease exponential to polynomial complexity! • Recursive components >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 13 SQLGen: Algorithm (1) • Query: /E0//E10 c1 • Find mapping schema SSQ (S=SSQ) c2 • strongly connected, non-recursive components (i = Ei): {0}, {1}, {2},{3}, {4}, {5}, {6}, {10}, {11} • merge adjacent components ci and cj if ci dominates cj c3 c1 = {0, 1, 2, 3, 4, 5, 6}, c3 = {10, 11} E11 • recursive component c2 = {7, 8, 9} Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 14 SQLGen: Algorithm (2), DAG components • further process in top-down topological order • c1 = {0, 1, 2, 3, 4, 5, 6} non-recursive DAG component • create temporary relation for each non-root node that is leaf node has child or parent in different component multiple incoming/outgoing edges ( shared computation) • N1 = {2, 3, 6} >> April 5, 2005 c1 Advanced Database Systems, Jochen Stoesser 15 SQLGen: Algorithm (3), DAG components • for example for node E6 N1: shared computation with T6 as ( select R6.* from R4, T3 where R4.id=R6.parentid and T3.id=R4.parentid and R6.parentcode=1 union all select R6.* from R5, T3 where R5.id=R6.parentid and T3.id=R5.parentid and R6.parentcode=2 ) April 5, 2005 Advanced Database Systems, Jochen Stoesser 16 SQLGen: Algorithm (4), recursive components • c2 = {7, 8, 9} recursive component • temporary relation TR, schema is union of the schemas of nodes in TR • two parts: 1. Initialization part captures all incoming edges 2. Recursive part captures recursion >> April 5, 2005 Advanced Database Systems, Jochen Stoesser 17 SQLGen: Algorithm (5), initialization • incoming edges from other components: (2, 8) and (3, 7) shared computation • Q1 = select R8.*, id(8) as schemanode from T2, R8 where R8.parentcode=2 and R8.parentid=T2.id • Q2 = select R7.*, id(7) as schemanode from T3, R7 where R7.parentcode=3 and R7.parentid=T3.id • Qinit = Q1 Q2 April 5, 2005 Advanced Database Systems, Jochen Stoesser 18 SQLGen: Algorithm (6), recursion • edges within the recursive component c2: (7, 9), (8, 7), (8, 9), (9, 8) • construct recursive query for each of these edges, e.g. Qe1=(7,9) = select R9.*, id(9) as schemanode from TR, R9 where TR.schemanode=id(7) and R9.parentid=TR.id and R9.parentcode=1 • recursive part QR = jQ ej TR = Qinit QR • for each n c2: temporary relation T(n) = select * from TR where schemanode=id(n) (>> example) April 5, 2005 Advanced Database Systems, Jochen Stoesser 19 SQLGen: Final Query • result of SQLGen stage: temporary relations - T2, T3, T6 for c1 - T7, T8, T9, and TR for c2 - T10 and T11 for c3 Final query: /E0//E10 algorithm select elemid from T11 April 5, 2005 Advanced Database Systems, Jochen Stoesser 20 Empirical Evaluation (1) • XMark Benchmark schema • Translation of query fragments that appear in query suite • XML-to-Relational mapping schema has 101 nodes • Size of cross-product schema in all cases < 100 nodes • Result: Translation processes for each query < 6ms April 5, 2005 Advanced Database Systems, Jochen Stoesser 21 Empirical Evaluation (2) • test under extreme conditions • all transitions on single label x • query //x//x//x//x//x • mapping schema complete graph (clique) of n nodes • runtime of translation algorithm: Source: Krishnamurthy et al. (2004) April 5, 2005 Advanced Database Systems, Jochen Stoesser 22 Contributions • Claim: „first to present a generic algorithm that translates path expression queries to SQL in the presence of recursion in the schema in the context of schema-based XML storage shredding of XML into relations“ • Algorithm translates a path expression query into a single SQL query, irrespective of the XML schema‘s complexity • SQL query‘s size polynomial in size of the input XML-toRelational mapping and the XML query April 5, 2005 Advanced Database Systems, Jochen Stoesser 23 Limitations • High complexity of SQL query even for relatively easy XML queries • Although running time may be small, memory requirements may be high due to many temporary relations • Furthermore, although authors indicate solutions for XPath semantics and branching path expression queries (e.g. p1[p2 op value]), there is no proposition about increase in complexity regarding runtime, memory requirements etc. April 5, 2005 Advanced Database Systems, Jochen Stoesser 24 Q&A ? April 5, 2005 Advanced Database Systems, Jochen Stoesser 25 References • Krishnamurthy, R. (2004) XML-to-SQL Query Translation. Dissertation at the University of Wisconsin-Madison. Retrieved at March 18 from http://www.cs.wisc.edu/~sekar/research/main.pdf • XPath v1.0. W3C. Retrieved at March 18 from http://www.w3.org/TR/1999/REC-xpath-19991116.html • Tian, F., DeWitt, D.J., Chen, J., & Zhang, C. The Design and Performance Evaluation of Alternative XML Storage Strategies. Retrieved at April 4 from http://www.cs.wisc.edu/~czhang/doc/publications/feng6page.pdf April 5, 2005 Advanced Database Systems, Jochen Stoesser 26 Appendix: Recursive XML Schema <?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0"> <xs:element name="element"> <xs:complexType> <xs:sequence> <xs:any processContents="skip" minOccurs="0" /> <xs:element ref="element" minOccurs="0" /> </xs:sequence> * </xs:complexType> </xs:element> element </xs:schema> << April 5, 2005 Advanced Database Systems, Jochen Stoesser 27 Appendix: SQL(path p) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 28 Appendix: PathId(Q,S) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 29 Appendix: SQLGen(SSQ) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 30 Appendix: SQLFromDAG(c) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 31 Appendix: SQLFromRecursive(c) (1) April 5, 2005 Advanced Database Systems, Jochen Stoesser 32 Appendix: SQLFromRecursive(c) (2) << April 5, 2005 Advanced Database Systems, Jochen Stoesser 33 Appendix: TR example << April 5, 2005 Advanced Database Systems, Jochen Stoesser 34