7. XML_Native Storage

Storing XML using native storage
Presented by Molato Badr
Supervised by Dr. H. Haddouti

Introduction
• XML is used more and more frequently, which drives the development of systems that store and query XML data efficiently.
• Research to improve system performance covers:
– Indexing paths
– Optimizing XML queries
– Storage configuration of XML data on disk, which affects the efficiency of an XML data management system

Outline
I. Definition of native storage
II. Several native storage strategies
III. Comparison to DBMS storage

Native storage?
• Based on XML data models such as the Document Object Model (DOM).
• NXDs: a native XML database is simply a database for storing and accessing XML using XML.

NXDs
• An NXD defines a (logical) model for an XML document, and stores and retrieves documents according to that model.
• It has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
• Documents go in and documents come out. Thus an NXD may not actually be a standalone database at all.
• An NXD is intended to help developers by providing robust storage and manipulation of XML documents.
• NXDs manage collections of documents, allowing you to query and manipulate those documents as a set.

Native storage strategies
• Schema independent:
– Subtree-based strategy (Natix)
– Document-based strategy (Apache Xindice system)
– Element-based strategy (TIMBER): each element node is a record.
• Schema guided (OrientStore), with two storage strategies:
– Element-Based Clustering (EBC)
– Logical Partition-Based Clustering (LPC)
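
As a rough illustration of the difference in granularity between the document-based and element-based strategies listed above, the following Python sketch stores the same document both ways. The record layout (`id`, `parent`, `tag`, `text`) and the function names are assumptions made for this example; they are not the actual on-disk formats of Xindice or TIMBER.

```python
import xml.etree.ElementTree as ET

# Sample document, loosely modeled on the vendor/book example used later in the slides.
XML = (
    "<vendor><name>n</name>"
    "<book><title>t1</title><publisher>p1</publisher></book>"
    "</vendor>"
)


def document_records(xml_text):
    """Document-based granularity: the whole document is a single record."""
    return [xml_text]


def element_records(xml_text):
    """Element-based granularity: one record per element node.

    The dict layout is hypothetical; it only shows that each element
    becomes its own unit of storage, linked to its parent by an id.
    """
    root = ET.fromstring(xml_text)
    records = []

    def visit(elem, parent_id):
        node_id = len(records)  # ids are assigned in document order
        records.append({
            "id": node_id,
            "parent": parent_id,
            "tag": elem.tag,
            "text": (elem.text or "").strip(),
        })
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return records


if __name__ == "__main__":
    print(len(document_records(XML)), "record for the document-based layout")
    for record in element_records(XML):
        print(record)
```

With one record per document, retrieving the whole document is a single read, while the element-based layout needs the parent links to rebuild the tree but can address each element individually.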
Subtree-based strategy (Natix)
• Natix (University of Mannheim, Germany):
– Semantically partitions a large document into subtrees based on its tree structure
– Stores each subtree in one record (the unit of storage), which is atomic
– Proxy nodes connect subtrees stored in different records
– Provides primitives to read, write, insert, and delete elements
– Record size need not be statically configured; it can be a dynamic value that adapts to the size and structure of the document at runtime
– The original tree is reconstructed by replacing proxies with their subtrees

Document-based strategy (Apache Xindice system)
• No mapping to a relational schema is required
• Stores documents in tokenized form
• Provides quick fragment retrieval
• Supports optimized XML querying

Document-based strategy (Apache Xindice system), cont.
• The basic unit of data is a Document
• Sets of Documents are Collections
• Collections may contain Collections
• Think of it as a file system for XML
• Collections may be indexed
• Collections may maintain XMLObjects
• XMLObjects are like stored procedures

Element-based strategy (TIMBER)
• Built on Shore (responsible for disk management)
• Takes an XML document as input and produces a parse tree as output
• Takes each node of this parse tree as it is produced and transforms it into an internal representation
• Stores it in Shore as an atomic unit of storage
• Each node corresponds to an element; child nodes correspond to subelements
• All attributes of an element node are clubbed into a single node, stored as a child node of that element
• The content of an element node is pulled out into a child node
• Mixed content: each piece is pulled out into a separate child node

Schema-guided strategy (OrientStore)
• EBC (Element-Based Clustering): similar to the element-based strategy, but clusters the element records so that records with the same schemaNodeID are stored together.
• LPC (Logical Partition-Based Clustering): partitions the schema graph into semantic blocks.
• A semantic block describes a relatively integrated logical unit.

EBC (Element-Based Clustering)
• Clusters all the title elements together, along with all their text values.

LPC (Logical Partition-Based Clustering)
• book and its children title and publisher form a semantic block.
• Records are instances of the formed semantic blocks: v (n, b1, b2) is an instance of vendor (name, book).

Logical Partition-Based Clustering
• All the instances of the same semantic block are clustered together. Thus the records b1 (p1, t1) and b2 (p2, t2) in Figure 2(b) will be stored in one physical page,
• while v (n, b1, b2) may be stored in another physical page.
• N.B.: LPC lies between the subtree-based strategy and the element-based strategy.

Comparison with DBMS
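
Looking back at the two OrientStore strategies, the difference between EBC and LPC can be sketched in Python using the vendor/book records from the slides. The "pages" here are plain in-memory lists and the function names are hypothetical; this is a minimal illustration of the clustering idea under those assumptions, not OrientStore's implementation.

```python
from collections import defaultdict

# Element records as (schemaNodeID, value). The schema node names are used as
# ids for readability; this encoding is an assumption for the example.
ELEMENTS = [
    ("vendor", "v"), ("name", "n"),
    ("book", "b1"), ("title", "t1"), ("publisher", "p1"),
    ("book", "b2"), ("title", "t2"), ("publisher", "p2"),
]

# Semantic-block instances as (block name, record), following the slides:
# v (n, b1, b2) is an instance of vendor (name, book).
BLOCK_INSTANCES = [
    ("vendor", ("v", "n", "b1", "b2")),
    ("book", ("b1", "p1", "t1")),
    ("book", ("b2", "p2", "t2")),
]


def cluster_ebc(elements):
    """EBC: group element records that share the same schemaNodeID."""
    pages = defaultdict(list)
    for schema_node_id, value in elements:
        pages[schema_node_id].append(value)
    return dict(pages)


def cluster_lpc(block_instances):
    """LPC: group whole instances of the same semantic block."""
    pages = defaultdict(list)
    for block_name, record in block_instances:
        pages[block_name].append(record)
    return dict(pages)


if __name__ == "__main__":
    print(cluster_ebc(ELEMENTS))         # all title values end up in one group
    print(cluster_lpc(BLOCK_INSTANCES))  # b1 and b2 share a group, v is separate
```

Under EBC every title value lands in the same group regardless of which book it belongs to, whereas under LPC each book instance stays intact and is grouped with the other book instances.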
