Management of XML and Semistructured Data
Lecture 13: Keys for XML and Advanced Query Analysis
Wednesday, May 9th, 2001

Outline
• Keys in XML
• Query analysis
  – Query pruning
  – Query containment (next time)

Keys in XML Schema
XML:
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>

Keys in XML Schema
• In general, two flavors:
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>
<unique name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</unique>
Note: all XPath expressions “start” at the element currently being defined.
The fields must identify a single node.

Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
  – /a/b | /a/c OK for selector
  – //a/b/*/c OK for field
  – To “help the implementors” (???)
• Note: better than DTD’s ID mechanism

Keys in XML Schema
• Examples
Recall: must have a single forename and a single surname
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="forename"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>

Foreign Keys in XML Schema
• Examples
<keyref name="personRef" refer="fullName">
  <selector xpath=".//personPointer"/>
  <field xpath="@first"/>
  <field xpath="@last"/>
</keyref>

Another Proposal for Keys
• Keys for XML, Buneman, Davidson, Fan, Hara, Tan, in WWW’10, May 2001.
• Cleaner definition
• Extends with relative keys
• Addresses satisfiability problem

Another Proposal for Keys
• A key is q{p1, …, pk}
• An instance I satisfies the key if:
  – ∀x1, x2 ∈ q(root): ((∃z1 ∈ p1(x1). ∃z2 ∈ p1(x2). z1 = z2) ∧ … ∧ (∃z1 ∈ pk(x1). ∃z2 ∈ pk(x2). z1 = z2)) ⇒ x1 = x2

Another Proposal for Keys
Examples:
• //person {@id}
• //person {name}
• //person {firstname, lastname}
  – What happens with multiple names?
• //person {ε}
• //person {}
  – What is the difference between these two?
• //* {id}
  – What happens if an element doesn’t have an id child?

Another Proposal for Keys
Intuition for q{p1, …, pk}
If I have k values, z1, …, zk, then there exists at most one x ∈ q(root) s.t.
z1 ∈ p1(x), …, zk ∈ pk(x)
Think of retrieving x from z1, …, zk, using a hash table

Another Proposal for Keys
• Some inference rules for keys
• q{p1, …, pk} is a key ⇒ q{p1, …, pn} is a key, for k ≤ n
• q.q’{p} is a key ⇒ q{q’.p} is a key

Another Proposal for Keys
Relative key: q: q’{p1, …, pk}
An instance I satisfies the relative key if, ∀x ∈ q(I), q’{p1, …, pk} is a key for the instance rooted at x

Another Proposal for Keys
Examples
• /bible/book/chapter: verse {number}
• /bible/book: chapter {number}
• /bible: book {name}

Another Proposal for Keys
• No relative keys in XML-Schema
• But could work around:
<key name="dummyName">
  <selector xpath="/bible/book/chapter"/>
  <field xpath="number"/>
  <field xpath="../number"/>
  <field xpath="../../name"/>
</key>

Combining Keys and Schemas
• On XML Integrity Constraints in the Presence of DTDs, Fan and Libkin, PODS’2001
• Keys + DTDs sometimes imply unexpected facts
• Main story: implication is undecidable

Combining Keys and Schemas
<teachers>
  <teacher name="Joe">
    <subject expert="Jim"> DB </subject>
    <subject expert="Karl"> Graphics </subject>
  </teacher>
  <teacher name="Jim">
    <subject expert="Joe"> AI </subject>
    <subject expert="Fred"> OS </subject>
  </teacher>
  ....
</teachers>
<!ELEMENT teachers (teacher+)>
<!ELEMENT teacher (subject,subject)>

Combining Keys and Schemas
Keys and foreign keys:
• Keys:
  – //teacher {@name}
  – //subject {@expert}
• Foreign keys:
  – //@expert ⊆ //teacher/@name
• But this is impossible!
  – With n teachers there are 2n subjects, whose 2n expert values must all be distinct, yet each must be one of only n teacher names
• In general: undecidable to check if it is possible

Query Analysis
Generic term to describe:
• Query rewriting based on schema information
• Query containment and minimization

Query Rewriting
Problem:
• Given a query Q
  – Regular path expression
  – Or more complex XQuery expression
• Given a schema S
  – Graph schema
  – DTD
  – XML-Schema
• Rewrite Q to some QS s.t.
  – Q is equivalent to QS over databases conforming to S
  – QS is more efficient than Q

Query Rewriting
Optimizing Regular Path Expressions Using Graph Schemas, M. Fernandez and D. Suciu, Data Engineering ’98
Simplest setting:
• Regular path expression
• Graph schemas

Example of Query Rewriting
Q = //Department//Project
• Naive evaluation: need to traverse entire graph (or tree)

Example of Query Rewriting
Graph Schema:
[Figure: graph schema S with states s1, s2, s3, s4; edges labeled other, Org, “Project”, “Member”]
Org = “Department” | “College” | “School”
other = any label except Org, “Project”, “Member”

Example of Query Rewriting
• Schema says: “there can be at most one Department edge; below, there can be at most one Project edge”
Q = //Department//Project
QS = (other)*/Department/(other)*/Project
other = any label except “Department”, “College”, “School”, “Project”, “Member”
• QS can be evaluated more efficiently than Q
  – Why?

Example of Query Rewriting
• How to construct QS systematically from Q and S?
• Step 1: build the automaton A for Q
• Step 2: build the product automaton S × A
• Step 3: QS = expression of S × A

Example of Query Rewriting
[Figure: automaton A for Q (states a1, a2, a3; edges labeled true, Dept, Project), graph schema S (states s1 to s4), and the product automaton S × A, where edges whose labels cannot match are marked false]
QS = (other)*/Department/(other)*/Project

Query Rewriting
Correctness:
Proposition: If the instance I conforms to S, then Q(I) = QS(I)
That is, Q and QS are equivalent over databases conforming to S

Query Rewriting
Efficiency
• Given query Q, instance I, define:
  cost(Q, I) = |{w(I) | w ∈ prefix(Lang(Q))}|
Proposition: If Q and Q’ are equivalent over all databases conforming to S, and if I conforms to S, then cost(QS, I) ≤ cost(Q’, I)
Hence, QS is optimal (in a certain sense)

Query Rewriting
Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition, K. Bohm, K. Gayer, K. Aberer, T.
Özsu
More complex settings:
• Schema = DTD
• Query = region algebra (think: XPath)
Problem is more complex; this work proposes some solutions

Query Rewriting
Idea: analyze DTD and extract 3 relations:
Exclusivity
Element E1 is exclusively contained in E2 if every path from the root to E1 goes through E2
XPath simplification:
E1[ancestor-or-self::E2] → E1

Query Rewriting
Obligation
E1 obligatorily contains E2 if every E1 element must have a child of type E2
E1[E2] → E1

Query Rewriting
Entrance Location
E is an entrance location for E1, E2 if every path from E1 to E2 goes through some E
E1[ancestor-or-self::E2] → E1[ancestor-or-self::E[ancestor-or-self::E2]]

Query Rewriting
Add these rules, plus variations, to a rule-based optimizer
• HyperStorM – a Structured Document Database
• On top of VODAK – an OO database system
Open question: does this approach exploit all the information in a DTD/XML-Schema? How can we exploit what is not used?
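
To make the exclusivity relation above concrete, here is a minimal Python sketch. It assumes the DTD has already been reduced to a graph mapping each element type to the element types allowed as its children. The function names and the sample graph are hypothetical illustrations, not taken from the HyperStorM work. The idea is that E1 is exclusively contained in E2 exactly when E1 becomes unreachable from the root once E2 is removed from the graph.

```python
from collections import deque

# Hypothetical DTD, reduced to "element type -> allowed child element types".
DTD_GRAPH = {
    "article": ["abstract", "section"],
    "abstract": ["paragraph"],
    "section": ["title", "paragraph", "section"],
    "title": [],
    "paragraph": [],
}


def reachable(graph, start, removed=None):
    """Element types reachable from `start` without ever visiting `removed`."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child != removed and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def exclusively_contained(graph, root, e1, e2):
    """True if every path from `root` to `e1` goes through `e2`.

    Matches the ancestor-or-self reading used on the slide: an element type
    is exclusively contained in itself. If `e1` is not reachable from the
    root at all, the condition holds vacuously.
    """
    return e1 not in reachable(graph, root, removed=e2)
```

With this sample graph, `exclusively_contained(DTD_GRAPH, "article", "title", "section")` holds, so an optimizer could drop the predicate in `title[ancestor-or-self::section]`. The same call with `"paragraph"` does not hold, because a paragraph can also occur under `abstract`.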