Mapping XML to a Wide Sparse Table
ABSTRACT
XML is commonly supported by SQL database systems. However, existing
mappings of XML to tables can only deliver satisfactory query performance for
limited use cases. In this paper, we propose a novel mapping of XML data into one
wide table whose columns are sparsely populated. This mapping provides good
performance for document types and queries that are observed in enterprise
applications but are not supported efficiently by existing work. XML queries are
evaluated by translating them into SQL queries over the wide sparsely-populated
table. We show how to translate full XPath 1.0 into SQL. Based on the
characteristics of the new mapping, we present rewriting optimizations that
dramatically reduce the number of joins. Experiments demonstrate that query
evaluation over the new mapping delivers considerable improvements over
existing techniques for the target use cases.

Mapping nested elements to flattened tables is the key problem for supporting
XML on SQL databases. Many mapping schemes have been proposed to decompose
nested structures into normalized tables. XML schemas, on the other hand, reduce
the complexity of some expensive query constructs specifically designed for XML.
One example is the wild card (*) in path expressions, which represents all
elements in a document. The expression A/* is implemented as a structural join
between the A elements and all the elements in the database, which can be
prohibitively slow.
Existing system:
XML is commonly supported by SQL database systems. However, existing
mappings of XML to tables can only deliver satisfactory query performance for
limited use cases. Mapping nested elements to flattened tables is the key problem
for supporting XML on SQL databases. Many mapping schemes have been
proposed to decompose nested structures into normalized tables.
Disadvantages of Existing System:

Possibly extended to handle special XML data and query constructs.

Mapping nested elements to flattened tables is the key problem for supporting XML on
SQL databases.

The nested structure is captured by primary-foreign key relationships, so hierarchical
navigation in XML queries is evaluated by primary-foreign key joins.
Proposed System
We propose a novel mapping of XML data into one wide table whose columns are
sparsely populated. This mapping provides good performance for document types and
queries that are observed in enterprise applications but are not supported efficiently by
existing work. XML queries are evaluated by translating them into SQL queries over the
wide sparsely-populated table. We show how to translate full XPath 1.0 into SQL.
Based on the characteristics of the new mapping, we present rewriting optimizations
that dramatically reduce the number of joins. Experiments demonstrate that query
evaluation over the new mapping delivers considerable improvements over existing
techniques for the target use cases.
Advantages of Proposed System:

XML query languages contain unconventional constructs that bring more challenges.

XML’s schema-less property is one of its major advantages.

Though normalization mappings provide schema- and normalization-based optimizations,
they can only support a limited number of use cases.
IMPLEMENTATION
Modules
1. Data Guide and Node Encoding
2. Mapping rules
3. Rewriting queries using indexes
4. Applicability for XML with a static schema
5. Extending to XQuery
Data Guide and Node Encoding:
We describe the mapping from XML to a wide, sparsely-populated table. We start by introducing
DataGuides (DGs) and a node encoding scheme. Then we elaborate the mapping rules and physical
representation of the mapping table. Finally, we discuss optimizations on the mapping table.
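The two ingredients named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a DataGuide is taken to be the list of distinct root-to-element paths in the corpus, and the node encoding is simplified to Dewey-style tuples that stand in for ordpaths (the property used is that an ancestor's label is a proper prefix of each descendant's label). The function names `encode` and `is_descendant` and the sample document are hypothetical, and attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def encode(xml_text, docid):
    """Return (dataguide, nodes) for one document.

    dataguide: distinct element paths in first-seen order.
    nodes: (docid, label, path, text) tuples; label is a Dewey-style
    tuple used here as a simplified stand-in for an ordpath.
    """
    root = ET.fromstring(xml_text)
    dataguide = []
    nodes = []

    def visit(elem, parent_path, label):
        path = parent_path + "/" + elem.tag
        if path not in dataguide:
            dataguide.append(path)
        text = (elem.text or "").strip() or None
        nodes.append((docid, label, path, text))
        # Children are numbered 1, 2, 3, ... under the parent's label.
        for i, child in enumerate(elem, start=1):
            visit(child, path, label + (i,))

    visit(root, "", (1,))
    return dataguide, nodes


def is_descendant(label, ancestor):
    """Structural test: the ancestor label is a proper prefix of label."""
    return len(label) > len(ancestor) and label[:len(ancestor)] == ancestor


doc = "<A><B><C>x</C></B><B><C>y</C><C>z</C></B><D>w</D></A>"
dataguide, nodes = encode(doc, docid=1)
```

With labels of this kind, an ancestor-descendant check between two elements of the same document reduces to a prefix comparison, which is the basis of a structural join.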
Mapping rules:
Given the DG of an XML corpus, each DG node is mapped to a column. Each XML element
(including attribute and text nodes) is stored in the column to which its corresponding DG node is
mapped. Elements from the same document are stored in one or more rows consecutively. For an
element v, its children are mapped as follows:

MAPPING RULE 1. The children that are not collection elements are stored in the same row as v.
MAPPING RULE 2. For the children that belong to a collection, only the first element is stored in
the same row as v; the other elements are stored in separate rows.
MAPPING RULE 3. The children that are exception nodes are always stored in separate rows.

Table I shows the result of applying the mapping rules to the XML document in the figure. b1 and
b2 are two elements of collection B. Since b1 is the first element under a1, it is stored in the same
row as a1, and b2 is stored in a separate row. Similarly, since c2 is the first element in collection
C (under b2), c2 is stored in the same row as its parent b2, and c3 is stored in a separate row, as is
the exception node c4.

This sparse mapping resembles normalization in the sense that if two elements are in the same row
of a table after normalization, they are in the same row in this sparse-mapping table as well. In
Table I, dashed lines highlight three logical sub-tables, T1(A), T2(B, D), and T3(C), which are the
tables we would get by normalizing the schema. Unlike conventional normalization, tuples in
different sub-tables are connected by structural joins of the elements' ordpaths, rather than by
primary-foreign keys. Furthermore, tuples that are in the same sub-table but from different
documents are not physically consecutive.

An important feature of this sparse mapping is that it mimics paths (i.e., DG nodes) and the
mapping rules are applied to individual elements. Moreover, updates and schema evolution do not
require table repartitioning. For example, if a new element d2 is inserted as d1's sibling, the link
between B and D evolves to a one-to-many relationship. The sparse mapping only needs to insert a
new row, whereas conventional relational storage would partition T2(B, D) into two tables, T4(B)
and T5(D). (Inserting a new row does change logical sub-tables. As we will see in Section IV-A,
logical sub-tables are expressed as metadata, whose maintenance cost is usually cheaper than table
repartitioning.) Appendix I gives details of mapping maintenance due to updates.
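The three mapping rules can be illustrated with a small Python sketch. It is not the paper's implementation: the `Node` class, the `map_rows` function, and the sample document are hypothetical, the set of collection paths and the set of exception nodes are assumed to be supplied by the caller (in the paper they come from the annotated DG), and each row is modelled as a dict from column (DG path) to element, with absent keys playing the role of nulls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str      # element identity, e.g. "b1"
    path: str      # DG path of the element, which is also its column
    children: list = field(default_factory=list)


def map_rows(root, collections, exceptions=frozenset()):
    """Apply the three mapping rules and return the rows in storage order."""
    rows = []

    def new_row(node):
        row = {}
        rows.append(row)
        place(node, row)

    def place(node, row):
        row[node.path] = node.name
        seen = set()  # collection paths already given a first element
        for child in node.children:
            if child.name in exceptions:
                new_row(child)            # Rule 3: exception node
            elif child.path in collections and child.path in seen:
                new_row(child)            # Rule 2: later collection element
            else:
                seen.add(child.path)
                place(child, row)         # Rule 1, or first element (Rule 2)

    new_row(root)
    return rows


C = "/A/B/C"
tree = Node("a1", "/A", [
    Node("b1", "/A/B", [Node("c1", C), Node("d1", "/A/B/D")]),
    Node("b2", "/A/B", [Node("c2", C), Node("c3", C), Node("c4", C)]),
])
rows = map_rows(tree, collections={"/A/B", C}, exceptions={"c4"})
for row in rows:
    print(row)
```

In this hypothetical document, a1, b1, c1 and d1 share the first row, b2 and c2 share the second, and c3 and c4 each get a row of their own, which matches the placement the text describes for b1, b2, c2, c3 and c4.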
Rewriting queries using indexes
There is only one table in the sparse mapping. An index on one column may also provide fast
access for another column. For example, consider a selection on one column, e.g.,
$\sigma_{A \neq \mathrm{null}}(T)$, a frequent operation in translated algebra expressions. If there
is no index on column A, the selection requires a full table scan. If the table has an index on
column B and the DG indicates that every row with an a value must have a b value, then rewriting
the query into $\sigma_{A \neq \mathrm{null} \wedge B \neq \mathrm{null}}(T)$ utilizes the index on
B to first filter out unrelated rows and avoid a full scan of the table.

In general, DG nodes with the same alias can use the same filtered index for column selections.
Since these columns form a logical sub-table, such a filtered index essentially provides fast
access to the sub-table, even though tuples from the sub-table are not consecutive in physical
storage.
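The rewriting can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not the system used in the paper: SQLite's partial index stands in for a filtered index, the table T, its contents, the index name `idx_b`, and the helper `selection_sql` are hypothetical, and the sample rows are chosen so that every row with an A value also has a B value, as the DG is assumed to guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A TEXT, B TEXT, C TEXT)")
conn.executemany("INSERT INTO T (A, B, C) VALUES (?, ?, ?)", [
    ("a1", "b1", None),
    (None, "b2", "c2"),
    (None, None, "c3"),
    ("a2", "b3", None),
])
# Partial index on B: only rows with a non-null B are indexed.
conn.execute("CREATE INDEX idx_b ON T (B) WHERE B IS NOT NULL")


def selection_sql(column, implied=()):
    """Build the not-null selection on `column`, adding the not-null
    conditions on columns the DG says are implied by it."""
    conditions = [f"{column} IS NOT NULL"]
    conditions += [f"{c} IS NOT NULL" for c in implied]
    return f"SELECT {column} FROM T WHERE " + " AND ".join(conditions) \
        + f" ORDER BY {column}"


original = selection_sql("A")
rewritten = selection_sql("A", implied=["B"])

result_original = conn.execute(original).fetchall()
result_rewritten = conn.execute(rewritten).fetchall()
```

Both queries return the same rows because the extra condition is implied by the data. The rewritten form only makes the index on B eligible; whether a given optimizer actually picks it depends on its cost model and the table size.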
Applicability for XML with a static schema
While the sparse mapping was designed for the target use cases, it also has great potential for
XML data with a static schema. Physical storage of documents in the sparse mapping follows the
same principle as normalization. With the annotated DG, query rewriting can remove all
unnecessary joins. Therefore, for XML data with a static schema, the execution plan of a query in
the sparse mapping has at most the same number of joins as in a normalization mapping (the
Rewriting Rule may result in fewer joins). For this reason, we believe that the sparse mapping
would be competitive on standard benchmarks (a few large XML documents conforming to a static
schema); further testing will be required to confirm this conclusion.

The extra cost of the sparse mapping, compared with normalization mappings, is the storage
overhead of each row, because interpreted storage maintains schema information for individual
rows. In addition, rows have variable length in the sparse mapping. Some may even be very long.
This increases the row-access cost.
Extending to XQuery
Until now we have discussed XPath translation over the sparse mapping. The translation
algorithm can be extended to XQuery as well. In the literature, several papers have studied
comprehensive translation from XQuery to SQL in which relational algebra is based on the
node-encoding mapping table. In our mapping, the output table of a translated algebra expression
has the same semantics as the algebra expressions under the node-encoding mapping. That is, for an
XPath expression e that represents a node set, R(e) represents a binary relation
⟨docid, ordpath⟩, which is the same as algebra expressions in the node-encoding mapping. After XPath
expressions are translated into algebra expressions under the sparse mapping,



The PATH index is built on (path, value) of the mapping table. It locates an element first
by its path, then by its value.
The VALUE index is built on (value, path), which indexes the same columns as the
PATH index, but in reverse order.
The PROPERTY index is built on (docid, ordpath, path, value). Since it contains the
primary key of the mapping table, it helps search multi-valued properties in the same
XML document.
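The column orders of the three indexes can be made concrete with a small sqlite3 sketch in Python. Everything here is an assumption made for illustration: the table name `nodes`, its contents, the index names, and the use of ordinary SQLite B-tree indexes in place of whatever index structures the original system provides. Ordpaths are stored as dotted strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        docid   INTEGER,
        ordpath TEXT,
        path    TEXT,
        value   TEXT,
        PRIMARY KEY (docid, ordpath)
    )
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "1",   "/A",   None),
    (1, "1.1", "/A/B", "x"),
    (1, "1.3", "/A/B", "y"),
    (2, "1",   "/A",   None),
    (2, "1.1", "/A/B", "x"),
])

# Same columns, different leading keys, as described in the text.
conn.execute("CREATE INDEX path_idx ON nodes (path, value)")
conn.execute("CREATE INDEX value_idx ON nodes (value, path)")
conn.execute("CREATE INDEX property_idx ON nodes (docid, ordpath, path, value)")

# PATH order: the path is known, then the value narrows the search.
by_path = conn.execute(
    "SELECT docid, ordpath FROM nodes "
    "WHERE path = '/A/B' AND value = 'x' ORDER BY docid, ordpath"
).fetchall()

# VALUE order: the value is known, the path is not.
by_value = conn.execute(
    "SELECT DISTINCT path FROM nodes WHERE value = 'y'"
).fetchall()

# PROPERTY order: all values of one property within one document.
by_property = conn.execute(
    "SELECT value FROM nodes "
    "WHERE docid = 1 AND path = '/A/B' ORDER BY ordpath"
).fetchall()
```

The three queries differ only in which column is fixed first, which is the reason for keeping the same columns in different key orders.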