Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Relational algebra wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Clusterpoint wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Mapping XML to a Wide Sparse Table ABSTRACT XML is commonly supported by SQL database systems. However, existing mappings of XML to tables can only deliver satisfactory query performance for limited use cases. In this paper, we propose a novel mapping of XML data into one wide table whose columns are sparsely populated. This mapping provides good performance for document types and queries that are observed in enterprise applications but are not supported efficiently by existing work. XML queries are evaluated by translating them into SQL queries over the wide sparsely-populated table. We show how to translate full XPath 1.0 into SQL. Based on the characteristics of the new mapping, we present rewriting optimizations that dramatically reduce the number of joins. Experiments demonstrate that query evaluation over the new mapping delivers considerable improvements over existing techniques for the target use cases. Mapping nested elements to flattened tables is the key problem for supporting XML on SQL databases. Many mappingschemes have been proposed to decompose nested structuresinto normalized tables. XML schemas, on the other hand, reduce the complexityof some expensive query constructs specifically designed forXML. One example is the wild card (*) in path expressions, which represents all elements in a document. The expressionA/* is implemented as a structural join between the A elementsand all the element in the database, which can be prohibitively slow. Existing system: XML is commonly supported by SQL database systems. However, existing mappings of XML to tables can only deliver satisfactory query performance for limited use cases. Mapping nested elements to flattened tables is the key problem for supporting XML on SQL databases. Many mapping schemes have been proposed to decompose nested structures into normalized tables. Disadvantages of Existing System: Possibly extended to handle special XML data and query constructs. Mapping XML to a Wide Sparse Table Mapping nested elements to flattened tables is the key problem for supporting XML on SQL databases. . The nested structure is captured by primary-foreign key relationships, so hierarchical navigation in XML queries is evaluated by primary-foreign key joins. Proposed System we propose a novel mapping of XML data into one wide table whose columns are sparsely populated. This mapping provides good performance for document types and queries that are observed in enterprise applications but are not supported efficiently by existing work. XML queries are evaluated by translating them into SQL queries over the wide sparsely-populated table. We show how to translate full XPath 1.0 into SQL. Based on the characteristics of the new mapping, we present rewriting optimizations that dramatically reduce the number of joins. Experiments demonstrate that query evaluation over the new mapping delivers considerable improvements over existing techniques for the target use cases. Advantages of Proposed System: XML query languages contain unconventional constructs that bring more challenges. XML’s schema-less property is one of its major advantages. Though normalization mappings provide schema- and normalization-based optimizations, they can only support a limited number of use cases. IMPLEMENTATION Mapping XML to a Wide Sparse Table Modules 1. 2. 3. 4. 5. Data Guide and Node Encoding Mapping rules Rewriting queries using indexes Applicability for XML with a static schema Extending to XQuery Data Guide and Node Encoding: we describe the mapping from XML to awide, sparsely-populated table. We start by introducing DataGuides and a node encoding scheme. Then we elaborate themapping rules and physical representation of the mappingtable. Finally, we discuss optimizations on the mapping table. Mapping rules: Given the DG of an XML corpus, each DG node is mappedto a column. Each XML element (including attribute and textnodes) is stored in the column to which its correspondingDG node is mapped. Elements from the same document arestored in one or more rows consecutively. For an element v,its children are mapped as follows: MAPPING RULE 1. The children that are not collection elements are stored in the same row as v.MAPPING RULE 2. For the children that belong to acollection, only store the first element in the same row as v, Storethe other elements in separate rows. MAPPING RULE 3. The children that are exception nodes arealways stored in separate rows.Table I shows the result of applying the mapping rules tothe XML document in Figure b1; b2 are two elementsof collection B. Since b1 is the first element under a1, it isstored in the same row as a1 and b2 is stored in a separaterow. Similarly, since c2 is the first element in collection C(under b2), c2 is stored in the same row as its parent b2, andc3 is stored in a separate row, as is the exception node c4.This sparse mapping resembles normalization in the sensethat if two elements are in the same row of a table afternormalization, they are in the same row in this sparse-mappingtable as well. In Table I, dashed lines highlight three logicalsub-tables, T1(A); T2(B; D); T3(C), which are the tables wewould get by normalizing the schema. Unlike conventionalnormalization, tuples in different sub-tables are connected bystructural joins of the elements’ ordpaths, rather than primaryforeign keys. Furthermore, tuples that are in the same sub-tablebut from different documents are not physically consecutive.An important feature of this sparse mapping is that it mimicspaths (i.e., DG nodes) and themapping rules are applied to Mapping XML to a Wide Sparse Table individual elements. Moreover,updates and schema evolution do not require table repartitioning. For example, if a new element d2 is inserted as d1’ssibling, the link between B and D evolves to a one-to-manyrelationship. The sparse mapping only needs to insert a newrow, whereas conventional relational storage would partitionT2(B; D) into two tables T4(B); T5(D). (Inserting a new rowdoes change logical sub-tables. As we will see in SectionIV-A, logical sub-tables are expressed as metadata, whosemaintenance cost is usually cheaper than table repartitioning.)Appendix I gives details of mapping maintenance due toupdates. Rewriting queries using indexes There is only one tablein the sparse mapping. An index on one column may alsoprovide fast access for another column. For example, considera selection on one column, e.g., A=6 null(T ), a frequentoperation in translated algebra expressions. If there is no indexon column A, the selection requires a full table scan. If thetable has an index on column B and the DG indicates thatevery row with an a value must have a b value, then rewritingthe query into A=6 null^B=6 null(T ) utilizes the index on B tofirst filter out unrelated rows and avoid a full scan of the table.In general, DG nodes with the same alias can use the samefiltered index for column selections. Since these columns forma logical sub-table, such a filtered index essentially providesfast accesses to the sub-table, even though tuples from thesub-table are not consecutive in physical storage. Applicability for XML with a static schema While thesparse mapping was designed for the target use cases, it alsohas great potential for XML data with static schema. Physicalstorage of documents in the sparse mapping follows the sameprinciple of normalization. With the annotated DG, queryrewriting can reduce all unnecessary joins. Therefore, for XMLdata with static schema, theexecution plan of a query in thesparse mapping has at most the same number of joins as anormalization mapping (the Rewriting Rule may cause fewerjoins). For this reason, we believe that the sparse mappingwould be competitive on standard benchmarks—a few largeXML documents conforming to a static schema; further testingwill be required to confirm this conclusion.The extra cost of the sparse mapping, compared withnormalization mappings, is the storage overhead of each row,because interpreted storage maintains schema information forindividual rows. In addition, rows have variable length in thesparse mapping. Some may even be very long. This increasesthe row-access cost. Mapping XML to a Wide Sparse Table Extending to XQuery Until now we have discussed XPathtranslation over the sparse mapping. The translation algorithmcan be extended to XQuery as well. In the literature, severalpapers have studied comprehensive translation from XQueryto SQL in which relational algebra is basedon the nodeencoding mapping table. In our mapping, theoutput table of a translated algebra expression has the samesemantics as the algebra expressions under the node-encodingmapping. That is: for an XPath expression e that represents anode set, R(e) represents a binary relation hdocid; ordpathi,which is the same as algebra expressions in the node-encodingmapping. After XPath expressions are translated into algebraexpressions under the sparse mapping, The PATH index is built on (path, value) of the mappingtable. It locates an element first by its path, then by its value. The VALUE index is build on (value, path), which indexesthe same columns as the PATH index, but in reverse order. The PROPERTY index is built on (docid, ordpath, path,value). Since it contains the primary key of the mappingtable, it helps search multi-valued properties in the same XML document