Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Functional Database Model wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational algebra wikipedia , lookup
Versant Object Database wikipedia , lookup
15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 24 – Part 2 XML Query Processing Phil Gibbons April 15, 2003 Lecture 24 XML Query Processing: Outline • XML vs. Relational • XML on Relational DB: • Shanmugasundaram et al, “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow on papers • LegoDB, STORED, Edge (2 slides) • To be continued in my next lecture… 04-15-03 Lecture 24 2 XML vs. SQL for Sensor Databases • IrisNet represents data in XML (semi-structured model) • Hierarchical documents, Queries in XPATH • TinyDB represents data in the relational model • Tables, Queries in SQL What are the pros and cons for each approach? How does it depend on the sensing context? 04-15-03 Lecture 24 3 Why IrisNet Uses XML • Rich, heterogeneous data • Hard to capture in a rigid data model • Self-describing tags useful • Schema evolution • XML supports on-the-fly schema changes • Wide area sensing => Hierarchical organization • Good match for XML, bad for relational • Standard data exchange format 04-15-03 Lecture 24 4 Disadvantages of XML • Query languages are lacking • Some minimal features: e.g., aggregates, updates • Query processors not available for XQuery • Query processing is SLOW Key research question: Can we store XML in a relational DB, and use a relational database system to process queries? 04-15-03 Lecture 24 5 Why use Relational DB Systems? • Highly reliable, scalable, optimized for performance, advanced functionality • Result of 30+ years of Research & Development • XML database systems are not “industrial strength” … and not expected to be in the foreseeable future • Existing data and applications • XML applications have to inter-operate with existing relational data and applications • Not enough incentive to move all existing business applications to XML database systems • Lessons from object-oriented database systems? 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 6 XML Query Processing: Outline • XML vs. Relational • XML on Relational DB: • Shanmugasundaram et al, “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow on papers • LegoDB, STORED, Edge (2 slides) • To be continued… 04-15-03 Lecture 24 7 Storing and Querying XML Documents XML Schema XML Documents XML Query XML Result XML Translation Layer Relational Schema SQL Query Tuples Relational Result Translation Information Relational Database System 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 8 Relational Data PurchaseOrder Id Customer 200I Cars R Us Day Month Year 10 300I Bikes R Us null June 1999 July Item Payment Quantity Cost 50 2000.00 200I 1 40% 300I Schwinn Tire 100 300I Trek Tire 20 200I Goodyear Tire 200 2500.00 400.00 8000.00 200I 300I 2 1 60% 100% Pid Name 1999 200I Firestone Tire 04-15-03 Lecture 24 Pid Installment Percentage Adapted from slides ©Jayavel Shanmugasundaram 9 SQL Query PurchaseOrder Id Customer Day 200I 300I Cars R Us Bikes R Us 10 null Payment Item Pid 200I 300I 300I 200I Name Firestone Tire Schwinn Tire Trek Tire Goodyear Tire Month Year 1999 June 1999 July Cost Quantity 50 2000.00 100 2500.00 400.00 20 200 8000.00 Pid Installment 1 200I 200I 2 300I 1 Percentage 40% 60% 100% Find all the items bought by “Cars R Us” in the year 1999 Select it.name From PurchaseOrder po, Item it Where po.customer = “Cars R Us” and po.year = 1999 and po.id = it.pid 04-15-03 Lecture 24 Predicates Join Adapted from slides ©Jayavel Shanmugasundaram 10 XML Document Self-describing tags Nested structure Nested sets Order 04-15-03 <PurchaseOrder id=“200I” customer=“Cars R Us”> <Date> <Day> 10 </Day> <Month> June </Month> <Year> 1999 </Year> </Date> <Item name=“Firestone Tire” cost=“2000.00”> <Quantity> 50 </Quantity> </Item> <Item name=“Goodyear Tire” cost=“8000.00”> <Quantity> 200 </Quantity> </Item> <Payment> 40% </Payment> <Payment> 60% </Payment> </PurchaseOrder> Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 11 XML Schema PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date (Item)* (Payment)* </PurchaseOrder> Date <Date> Day? Month Year </Date> Day <Day> {integer} </Day> Month <Month> {string} </Month> Year <Year> {integer} </Year> Item <Item name={string} cost={float}> Quantity </Item> … and so on 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 12 Schemas to Relations: Issues • Complex schema specifications • Two level nature of relational schema (tuples and attributes) vs. arbitrary nesting of XML Schema • Recursion 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 13 Naïve Approach PurchaseOrder Id (200I) Customer (Cars R Us) Day (10) Date Month (June) Element Node 04-15-03 Lecture 24 Item Item Payment (40%) … Year (1999) Attribute Node Adapted from slides ©Jayavel Shanmugasundaram 14 Naïve Approach (Contd.) Edges Id Type Name Value 0 Element PurchaseOrder null 1 Attribute Id 200I 2 Attribute Customer Cars R Us 3 Element Date null 4 Element Day 10 5 Element Month June 6 Element Year 1999 … … … … Ordinal ParentId null 0 null 0 1 2 0 1 0 0 3 3 2 3 … … Problem: Many joins for queries (one per hop) eg. PurchaseOrder/Date/Year 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 15 Desired Properties of Generated Relational Schema R • All XML documents conforming to XML schema should be “mappable” to tuples in R • All queries over XML documents should be “mappable” to SQL queries over R • Not Required: Ability to re-generate XML schema from R 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 16 XML Schema: Further Examples PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date? (Item | Payment)* </PurchaseOrder> PurchaseOrder <PurchaseOrder id={integer} customer={string}> (Date | Payment*) (Item (Item Item)* Payment)* </PurchaseOrder> PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date Item (PurchaseOrder)* Payment </PurchaseOrder> 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 17 Simplifying XML Schemas • XML schemas can be “simplified” for translation purposes PurchaseOrder <PurchaseOrder id={integer} customer={string}> (Date | (Payment)*) (Item (Item Item)* Payment)* </PurchaseOrder> PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date? (Item)* (Payment)* </PurchaseOrder> Without undermining storage and query functionality 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 18 Simplification Desiderata • Simplify structure, but preserve differences that matter in relational model • Single occurrence (attribute) • Zero or one occurrences (nullable attribute) • Zero or more occurrences (relation) PurchaseOrder <PurchaseOrder id={integer} customer={string}> (Date | (Payment)*) (Item (Item Item)* Payment)* </PurchaseOrder> PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date? (Item)* (Payment)* </PurchaseOrder> 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 19 Simplification Rules • Flattening transformations • (e1 e2)* -> e1* e2* • (e1 e2)? -> e1? e2? • (e1 | e2) -> e1? e2? • Simplification transformations • e** -> e* • e*? -> e* • e?* -> e* • e?? -> e? • Grouping transformations • e1* e2* e1* -> e1* e2* …etc What is lost? • e+ -> e* 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 20 Result: Translation Normal Form • An XML schema production is either of the form: P <P attr1={type1} … attrm={typem}> a1 … ap ap+1? … aq? aq+1*… ar* </P> where ai aj • … or of the form: P 04-15-03 <P attr1={type1} … attrm={typem}> {type} </P> Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 21 Simplified XML Schema PurchaseOrder <PurchaseOrder id={integer} customer={string}> Date (Item)* (Payment)* </PurchaseOrder> Date <Date> Day? Month Year </Date> Day <Day> {integer} </Day> Month <Month> {string} </Month> Year <Year> {integer} </Year> Item <Item name={string} cost={float}> Quantity </Item> … and so on 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 22 Relational Schema Generation PurchaseOrder (id, customer) 1 Date ? Day 1 Month * * Item (name, cost) 1 Year Payment 1 Quantity Satisfy: Tables are normalized Minimize: Number of joins for simple path expressions (of form /a/b/c) 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 23 Generated Relational Schema and Shredded XML Document PurchaseOrder Id Customer Day Month Year 200I Cars R Us 10 June 1999 Payment Item Pid Order Name Cost Quantity Pid Order Value 200I 1 Firestone Tire 2000.00 50 200I 2 40% 200I 3 Goodyear Tire 8000.00 200 200I 4 60% 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 24 Example Schema Graph book monogra ph article ? booktitle * title conta cta uthor editor authorID * author name name address authorid ? firstnam e lastname Not just a tree 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 25 Shared Inlining Technique • Thus far, works well for trees only • Intuition: • Inline as many sub-elements as possible • Do not inline only if it is a shared, recursive or set subelement. • Technique: • Necessary and Sufficient Condition for shared/ recursive element: In-degree >= 2 in (simplified) schema graph 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 26 Relational Schema Generation and XML Document Shredding • Any XML Schema X can be mapped to a relational schema R, and … • Any XML document XD conforming to X can be converted to tuples in R • Further, XD can be recovered from the tuples in R What do you think of the approach, for IrisNet? Exercise: What would the Parking Space Finder relational schema look like? Would there be many or few joins in queries? 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 27 04-15-03 Hybrid 2 2 1.5 1.5 0.5 0.5 0 0 Shared Lecture 24 re sid pif en tia l sa ej sm il vr m l am bi l ps 14 m at h ni tf of - x x1 51 6 1 Queries Shared re sid pif en tia l sa ej sm il vr m l bi l ps 14 m at h ni tf of - x x1 51 6 am Total Joins re sid pif en tia l sa ej sm il vr m l am bi l ps 14 m at h ni tf of - x x1 51 6 Joins/Query Path Expression with Length 3 Shared Hybrid 1 Hybrid 2.5 2 1.5 1 0.5 0 Adapted from slides ©Jayavel Shanmugasundaram 28 Varying Path Expression Length Hybrid Shared 11 10 9 8 7 6 5 4 3 2 1 0 Total Joins Total Joins Shared 1 2 3 4 5 6 7 8 9 10 11 Path Length 11 10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 Path Length Group 1 DTD 04-15-03 Hybrid Group 3 DTD Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 29 Storing and Querying XML Documents XML Schema XML Documents XML Query XML Result XML Translation Layer Relational Schema SQL Query Tuples Relational Result Translation Information Relational Database System 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 30 XPERANTO Create XML Document Repository Relational Schema Generator Store XML Documents Query over Stored XML Documents XML view over tables to reconstruct shredded XML documents Relational Schema Information Query Processor for XML views of Relational Data XML Document Shredder Create tables Store rows in tables Table 1 Query over tables Table n Relational Database System 04-15-03 Lecture 24 31 XML Query Processing: Outline • XML vs. Relational • XML on Relational DB: • Shanmugasundaram et al, “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow on papers • LegoDB, STORED, Edge (2 slides) • To be continued… 04-15-03 Lecture 24 32 LegoDB [Bohannon et al, ICDE’02] • An optimization approach: • automatically explores a space of possible mappings • selects the mapping which has the lowest cost for a given application • Important features: • Application-driven: takes into account schema, data statistics, and query workload • Logical/physical independence: interface is XML-based (XML Schema, XQuery, XML data statistics) • Leverage existing technology: XML standards; XML-specific operations for generating space of mappings; relational optimizer for evaluating configurations 04-15-03 Lecture 24 Adapted from slides ©Juliana Freire 33 But What If There’s No Schema? • Revert to one row per edge Id 0 1 2 3 4 5 6 … Type Name Value Ordinal ParentId Element PurchaseOrder null null null Attribute Id 200I 0 0 Attribute Customer Cars R Us 1 0 Element Date null 2 0 Element Day 10 0 3 Element Month June 1 3 Element Year 1999 2 3 … … … … … • STORED [Deutsch, Fernandez, Suciu, Sigmod’99] • Looks at data, finds highly supported patterns for tables • [Florescu, Kossman, Data Engineering Bulletin, 1999] 04-15-03 Lecture 24 Adapted from slides ©Jayavel Shanmugasundaram 34 XML Query Processing: Outline • XML vs. Relational • XML on Relational DB: • Shanmugasundaram et al, “Relational Databases for Querying XML Documents: Limitations and Opportunities”, VLDB’99, plus follow on papers • LegoDB, STORED, Edge (2 slides) • To be continued… (Thurs) • Updates, Native XML DBMS • Also in next lecture: Historical queries 04-15-03 Lecture 24 35