Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Business intelligence wikipedia , lookup
Data vault modeling wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Versant Object Database wikipedia , lookup
Clusterpoint wikipedia , lookup
Open data in the United Kingdom wikipedia , lookup
Relational model wikipedia , lookup
Relational-Style XML Query Taro L. Saito, Shinichi Morishita University of Tokyo June 10th, SIGMOD 2008 Intelligent Database Systems Lab School of Computer Science & Engineering Seoul National University, Seoul, Korea Vancouver, Canada Presented by Sangkeun-Lee Reference slides: www.xerial.org/pub/paper/2008/leo-sigmod2008-slides.pdf If your Manager Says… It’s a tragedy… Center for E-Business Technology Copyright 2009 by CEBT Migration to XML Database Benefits of using XML XML is a portable text-data format Tree-structured XML can reduce redundancy of relational data Co Company Employee Office 1 e1 NY 1 e2 NY Emp Emp e1 Relational Data Office e2 Office NY XML Data Center for E-Business Technology Copyright 2009 by CEBT NY Problem Querying relational data translated into XML Q: Retrieve a node tuple (Co, Emp, Office) from the XML data E.g. Xpath, a path expression query /Co/Emp/Office Co Company Employee Office 1 e1 NY 1 e2 NY Emp Emp e1 Relational Data Office e2 Office NY XML Data Center for E-Business Technology Copyright 2009 by CEBT NY Structure Variations Tree-representation of relational data is not unique Company Employee Office 1 e1 NY 1 e2 NY Co Emp Office Co Emp e1 NY Office Office Co NY Emp Emp e1 Emp Emp e1 Center for E-Business Technology e2 Copyright 2009 by CEBT Office NY e2 e2 NY Inconvenience of Xpath Query User must know the entire XML structures to produce correct path queries Co Co Office Office Emp NY NY Co Emp e1 Emp Emp e1 e2 /Co/Office/Emp Center for E-Business Technology Emp Emp e1 e2 Office Office NY //Office[Co]/Emp Copyright 2009 by CEBT e2 NY //Co/Emp[Office] Relational-Style XML Query Query relations in XML With an SQL-like syntax SELECT Co, Emp, Office from (XML Data) Co Co Office Office Emp e1 Emp NY Co Emp Input XML NY Emp Emp e1 e2 Emp e1 e2 Office e2 Office NY NY Result Center for E-Business Technology Copyright 2009 by CEBT Company Employee Office 1 e1 NY 1 e2 NY To Retrieve Relations in XML Center for E-Business Technology Copyright 2009 by CEBT Problem Definition Convert an SQL query, SELECT A,B,C, into an XML structure query There can be many structural variations of (A,B,C) For N nodes, there exists N^(N-1) structural variations – 3^2 = 9 … Center for E-Business Technology Copyright 2009 by CEBT An Example This example involves various tree structures that denote data in the same relation. Center for E-Business Technology Copyright 2009 by CEBT Amoeba A node tuple (A,B,C) is an amoeba iff one of the A,B and C is a common ancestor of the others … Amoeba join retrieves all amoeba structures in the XML data Center for E-Business Technology Copyright 2009 by CEBT Relation in XML A Key observation Relation is simply embedded in XML Company Employee Office 1 e1 NY 1 e2 NY Co Co Office Office NY Emp NY Co Emp Emp e1 Emp Emp e1 e1 e2 e2 /Co/Office/Emp Center for E-Business Technology Emp Office e2 Office NY //Office[Co]/Emp Copyright 2009 by CEBT //Co/Emp[Office] NY Hidden Semantics in XML Some amoeba structure may not form a relation Why this structure is not allowed? Because there are functional dependencies (FD) implied in the XML structure Company 1 M Company Office Emp Office 1 Office Emp Emp Center for E-Business Technology N Employee Emp Copyright 2009 by CEBT Functional Dependencies (FD) FD: X->Y (From a given X,Y is uniquely determined) Employee -> office (Each employee belongs to an office) Office -> company (Each office belongs to a company) Relation in XML must have an amoeba structure corresponding to each FD Relations and FDs are sufficient to describe a schema of XML Company 1 M Invalid structure!! Company Office Emp N Employee Office Emp Emp Office 1 Emp Center for E-Business Technology Copyright 2009 by CEBT Examples of FDs Center for E-Business Technology Copyright 2009 by CEBT Detecting FDs A type of FDs required to determine XML structures to query is one-to-many(or one-to-one) relationships: FD: Emp -> Office – Each employee belongs to an office – An office may have several employees (one-to-many) We can observe these relationships by counting node occurrences or directory from the ER-diagram Company Company Office Emp M Office 1 Office Emp Center for E-Business Technology Emp 1 N Employee Emp Copyright 2009 by CEBT If FDs are ignored… The company has M offices, and each office has N employees: # of (company, office, employee) tuples: When M = 100, N=5 100x(100x5) While, # of correct answers is only M*N = 500 Company 1 Company Office Emp Emp Office Emp Center for E-Business Technology Emp M Office Emp Emp Copyright 2009 by CEBT Office 1 N Employee Avoiding Invalid Relations However, an amoeba structure itself is a connected component of tree nodes, and thus invalid nodes may be connected, as illustrated in Figure 9. To avoid these irrelevant node connections, while allowing various tree structures in describing XML data, we introduce a restricted class of XML structures, called a tree relation Center for E-Business Technology Copyright 2009 by CEBT FD-Aware Amoeba Join Tree Relation <<Emp,Office,Company>>&<<Emp,Office>>&<<Office,Company>> FDs: Emp-> Office, Office -> Company Bottom-up construction of query results Amoeba Join (Employee, Office) Amoeba Join (Office, Company) FD-aware amoeba join avoids invalid XML structures Company Company 1 M Office Emp Emp Center for E-Business Technology Office Emp Emp Office Emp Emp Copyright 2009 by CEBT Office 1 N Employee Query Performance FD-aware amoeba join scales well For various sizes of XML data Center for E-Business Technology Copyright 2009 by CEBT Experiments – Query Set Center for E-Business Technology Copyright 2009 by CEBT Think in Relational-Style First, consider XML := Relations + their annotations Steps Detect relational part from XML data Detect one-to-many(one) relationships (FDs) Write relational queries – SELECT Co, Emp, Office Things more in the paper XML Algebra Data Integration Pushing Structural Constraints for better performance Query Incomplete Relations Other experiments Center for E-Business Technology Copyright 2009 by CEBT Contributions & Conclusions Relation in XML Defined using amoeba structure and FDs XML Algebra Relational-Style XML Query Retrieves relations in XML with a SQL-like query syntax (SQL over XML) Allows structural variations of XML data Departure from path expression queries Target XML structures are automatically determined Applications Database integration Managing relational data enhanced with XML syntax “It’s Just SQL” A large number of XML data and queries are still relational Center for E-Business Technology Copyright 2009 by CEBT Contributions & Conclusions It’s not tragedy anymore… Center for E-Business Technology Copyright 2009 by CEBT My thought on Really? Center for E-Business Technology Thoughts on The Paper Good Points Generally, I think it’s a very good paper The paper shows us very interesting motivation and clearly define the problem The approach in the paper is intuitive and well-explained The authors performed well-designed experiments The paper proposes many future works – Maybe, one of us can work on them However, I doubt if it is really useful – Can we use this for an important business project? Although the author insists that their algorithm scales well, some queries caused out of memory and didn’t work – It’s a crucial problem to be used in real-life business Isn’t Xpath good enough? – People who use XML docs usually know the structure of XML docs Center for E-Business Technology Copyright 2009 by CEBT