Download ppt - Intelligent Data Systems Laboratory

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Business intelligence wikipedia , lookup

Data vault modeling wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Copyright wikipedia , lookup

Versant Object Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

XML wikipedia , lookup

Transcript
Relational-Style XML Query
Taro L. Saito, Shinichi Morishita
University of Tokyo
June 10th, SIGMOD 2008
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
Vancouver, Canada
Presented by Sangkeun-Lee
Reference slides: www.xerial.org/pub/paper/2008/leo-sigmod2008-slides.pdf
If your Manager Says…
It’s a tragedy…
Center for E-Business Technology
Copyright  2009 by CEBT
Migration to XML Database
 Benefits of using XML

XML is a portable text-data format

Tree-structured XML can reduce redundancy of relational data
Co
Company
Employee Office
1
e1
NY
1
e2
NY
Emp
Emp
e1
Relational Data
Office
e2
Office
NY
XML Data
Center for E-Business Technology
Copyright  2009 by CEBT
NY
Problem
 Querying relational data translated into XML
 Q: Retrieve a node tuple (Co, Emp, Office) from the XML data

E.g. Xpath, a path expression query
/Co/Emp/Office
Co
Company
Employee Office
1
e1
NY
1
e2
NY
Emp
Emp
e1
Relational Data
Office
e2
Office
NY
XML Data
Center for E-Business Technology
Copyright  2009 by CEBT
NY
Structure Variations
 Tree-representation of relational data is not unique
Company
Employee Office
1
e1
NY
1
e2
NY
Co
Emp
Office
Co
Emp
e1
NY
Office
Office
Co
NY
Emp
Emp
e1
Emp
Emp
e1
Center for E-Business Technology
e2
Copyright  2009 by CEBT
Office
NY
e2
e2
NY
Inconvenience of Xpath Query
 User must know the entire XML structures to produce correct
path queries
Co
Co
Office
Office
Emp
NY
NY
Co
Emp
e1
Emp
Emp
e1
e2
/Co/Office/Emp
Center for E-Business Technology
Emp
Emp
e1
e2
Office
Office
NY
//Office[Co]/Emp
Copyright  2009 by CEBT
e2
NY
//Co/Emp[Office]
Relational-Style XML Query
 Query relations in XML

With an SQL-like syntax
 SELECT Co, Emp, Office from (XML Data)
Co
Co
Office
Office
Emp
e1
Emp
NY
Co
Emp
Input XML
NY
Emp
Emp
e1
e2
Emp
e1
e2
Office
e2
Office
NY
NY
Result
Center for E-Business Technology
Copyright  2009 by CEBT
Company
Employee
Office
1
e1
NY
1
e2
NY
To Retrieve Relations in XML
Center for E-Business Technology
Copyright  2009 by CEBT
Problem Definition
 Convert an SQL query, SELECT A,B,C, into an XML structure
query

There can be many structural variations of (A,B,C)

For N nodes, there exists N^(N-1) structural variations
–
3^2 = 9
…
Center for E-Business Technology
Copyright  2009 by CEBT
An Example
This example involves various tree
structures that denote data in the
same relation.
Center for E-Business Technology
Copyright  2009 by CEBT
Amoeba
 A node tuple (A,B,C) is an amoeba iff one of the A,B and C is a
common ancestor of the others
…
 Amoeba join retrieves all amoeba structures in the XML data
Center for E-Business Technology
Copyright  2009 by CEBT
Relation in XML
 A Key observation
Relation is simply embedded in XML

Company
Employee
Office
1
e1
NY
1
e2
NY
Co
Co
Office
Office
NY
Emp
NY
Co
Emp
Emp
e1
Emp
Emp
e1
e1
e2
e2
/Co/Office/Emp
Center for E-Business Technology
Emp
Office
e2
Office
NY
//Office[Co]/Emp
Copyright  2009 by CEBT
//Co/Emp[Office]
NY
Hidden Semantics in XML
 Some amoeba structure may not form a relation

Why this structure is not allowed?
 Because there are functional dependencies (FD) implied in the
XML structure
Company
1
M
Company
Office
Emp
Office
1
Office
Emp
Emp
Center for E-Business Technology
N
Employee
Emp
Copyright  2009 by CEBT
Functional Dependencies (FD)

FD: X->Y (From a given X,Y is uniquely determined)

Employee -> office (Each employee belongs to an office)

Office -> company (Each office belongs to a company)

Relation in XML must have an amoeba structure corresponding to each FD

Relations and FDs are sufficient to describe a schema of XML
Company
1
M
Invalid
structure!!
Company
Office
Emp
N
Employee
Office
Emp
Emp
Office
1
Emp
Center for E-Business Technology
Copyright  2009 by CEBT
Examples of FDs
Center for E-Business Technology
Copyright  2009 by CEBT
Detecting FDs
 A type of FDs required to determine XML structures to query is
one-to-many(or one-to-one) relationships:

FD: Emp -> Office
–
Each employee belongs to an office
–
An office may have several employees (one-to-many)
 We can observe these relationships by counting node
occurrences or directory from the ER-diagram
Company
Company
Office
Emp
M
Office
1
Office
Emp
Center for E-Business Technology
Emp
1
N
Employee
Emp
Copyright  2009 by CEBT
If FDs are ignored…
 The company has M offices, and each office has N employees:
 # of (company, office, employee) tuples:
When M = 100, N=5 100x(100x5)

 While, # of correct answers is only M*N = 500
Company
1
Company
Office
Emp
Emp
Office
Emp
Center for E-Business Technology
Emp
M
Office
Emp
Emp
Copyright  2009 by CEBT
Office
1
N
Employee
Avoiding Invalid Relations
 However, an amoeba structure itself is a connected component
of tree nodes, and thus invalid nodes may be connected, as
illustrated in Figure 9.
 To avoid these irrelevant node connections, while allowing
various tree structures in describing XML data, we introduce a
restricted class of XML structures, called a tree relation
Center for E-Business Technology
Copyright  2009 by CEBT
FD-Aware Amoeba Join

Tree Relation

<<Emp,Office,Company>>&<<Emp,Office>>&<<Office,Company>>

FDs: Emp-> Office, Office -> Company

Bottom-up construction of query results


Amoeba Join (Employee, Office)

Amoeba Join (Office, Company)
FD-aware amoeba join avoids invalid XML structures
Company
Company
1
M
Office
Emp
Emp
Center for E-Business Technology
Office
Emp
Emp
Office
Emp
Emp
Copyright  2009 by CEBT
Office
1
N
Employee
Query Performance
 FD-aware amoeba join scales well

For various sizes of XML data
Center for E-Business Technology
Copyright  2009 by CEBT
Experiments – Query Set
Center for E-Business Technology
Copyright  2009 by CEBT
Think in Relational-Style

First, consider


XML := Relations + their annotations
Steps

Detect relational part from XML data

Detect one-to-many(one) relationships (FDs)

Write relational queries
–

SELECT Co, Emp, Office
Things more in the paper

XML Algebra

Data Integration

Pushing Structural Constraints for better performance

Query Incomplete Relations

Other experiments
Center for E-Business Technology
Copyright  2009 by CEBT
Contributions & Conclusions



Relation in XML

Defined using amoeba structure and FDs

XML Algebra
Relational-Style XML Query

Retrieves relations in XML with a SQL-like query syntax (SQL over XML)

Allows structural variations of XML data
Departure from path expression queries



Target XML structures are automatically determined
Applications

Database integration

Managing relational data enhanced with XML syntax
“It’s Just SQL”

A large number of XML data and queries are still relational
Center for E-Business Technology
Copyright  2009 by CEBT
Contributions & Conclusions
It’s not tragedy
anymore…
Center for E-Business Technology
Copyright  2009 by CEBT
My thought on
Really?
Center for E-Business Technology
Thoughts on The Paper

Good Points

Generally, I think it’s a very good paper

The paper shows us very interesting motivation and clearly define the problem

The approach in the paper is intuitive and well-explained

The authors performed well-designed experiments

The paper proposes many future works
–

Maybe, one of us can work on them
However,

I doubt if it is really useful – Can we use this for an important business project?

Although the author insists that their algorithm scales well, some queries caused out of
memory and didn’t work
–

It’s a crucial problem to be used in real-life business
Isn’t Xpath good enough?
–
People who use XML docs usually know the structure of XML docs
Center for E-Business Technology
Copyright  2009 by CEBT