Download XML Data Storage

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data analysis wikipedia , lookup

Data model wikipedia , lookup

Information privacy law wikipedia , lookup

Database wikipedia , lookup

SAP IQ wikipedia , lookup

Business intelligence wikipedia , lookup

3D optical data storage wikipedia , lookup

Versant Object Database wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

B-tree wikipedia , lookup

Data vault modeling wikipedia , lookup

Clusterpoint wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational algebra wikipedia , lookup

XML wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Storage of XML Data
• XML data can be stored in
– Non-relational data stores
• Flat files
– Natural for storing XML
– But has all problems discussed in Chapter 1 (no
concurrency, no recovery, …)
• XML database
– Database built specifically for storing XML data, supporting
DOM model and declarative querying
– Currently no commercial-grade systems
– Relational databases
• Data must be translated into relational form
• Advantage: mature database systems
• Disadvantages: overhead of translating data and queries
Storing XML as Relations
• Tree representation: model XML data as tree and store
using relations
nodes(id, type, label, value)
child (child-id, parent-id)
– Each element/attribute is given a unique identifier
– Type indicates element/attribute
– Label specifies the tag name of the element/name of
attribute
– Value is the text value of the element/attribute
– The relation child notes the parent-child relationships
in the tree
• Can add an extra attribute to child to record ordering
of children
Storing XML as Relations
– Benefit: Can store any XML data, even without DTD
– Drawbacks:
• Data is broken up into too many pieces, increasing
space overheads
• Even simple queries require a large number of joins,
which can be slow
Root
SigmodRecord
1
issue
2
volume
articles
number
3
4
11
1
5
6
title
7
Annotated Bibliography on Data Design
initPage
authors
endPage
8
9
45
77
10
author
author
11
position
13
position
Anthony I. Wasserman
12
Karen Botnich
14
Src
0
1
2
2
2
5
6
6
6
6
Ord
1
1
1
2
3
1
1
2
3
4
Tgt
1
2
3
4
5
6
7
8
9
10
Label
SigmodRecord
issue
volume
number
articles
article
title
initPage
endPage
authors
Flag
ref
ref
val
val
ref
ref
val
val
val
ref
Value
11
1
Annotated
45
77
Storing XML in Relations
• Map to relations
– If DTD of document is known, can map data to relations
– Bottom-level elements and attributes are mapped to attributes of
relations
– A relation is created for each element type
• An id attribute to store a unique id for each element
• all element attributes become relation attributes
• All subelements that occur only once become attributes
– For text-valued subelements, store the text as attribute value
– For complex subelements, store the id of the subelement
• Subelements that can occur multiple times represented in a
separate table
– Similar to handling of multivalued attributes when
converting ER diagrams to tables
Storing XML in Relations
•
Benefits:
• Efficient storage
• Can translate XML queries into SQL, execute
efficiently, and then translate SQL results back to
XML
– Drawbacks: need to know DTD, translation overheads
still present
Start mapping – Annotated DTD graph
DTD graph represent the structure of XML document.
Node: Element / Attribute
Edge: The parent-child relationship with operators
Patient
?
Name
*
DOB
Case
Hospital
Details
Inlining
The process that put the nodes in DTD graph into
attributes of relational schema.
Patient (PatientID, Name, DOB)
Patient
?
Name
Name and DOB is inlined into
The relation Patient
*
DOB
Case
Hospital
Details
Basic Inlining
Inline as many descendants as possible
Rules:
1. Relation is made for each node
2. For each node, inlined all decendents except those
with incoming edge with + and *, in-degree>1
3. New relation is created for node with incoming
edge with + and *, in-degree>1, add ParentID.
Acknowledgment:
Jayavel Shanmugasundaram, Kristin Tufte, Gang He, Chun Zhang, David DeWitt,
Jeffrey Naughton,
Relational Databases for Querying XML Documents: Limitations and Opportunities
, 25th VLDB Conference Edinburgh, Scotland, 1999.
Basic Inlining
Relations:
Patient (PatientID, Name, DOB)
Case (CaseID,Hospital, Details, PatientID: foreign key)
Name (NameID, Name)
Hospital (HospitalID, Hospital)
Details (DetailsID, Details)
Problems:
Create many relations
Shared Inlining
Share some nodes such as Name.
Rules:
1. Relation is made for the node with in-degree=0 or
>1, incoming edge + or *. Add ParentID if any.
2. For each node, inlined all descendants with indegree=1
Relations:
Patient (PatientID, Name, DOB)
Case (CaseID,Hospital, Details, PatientID: foreign key)