Storing XML in ORDBMS
Amine Kaddara
Supervisor: Dr Haddouti
XML and Data Management
Outline
Motivation
Benefits of using ORDBMS for storing XML
Storage techniques using the XORator algorithm
JDOM API (JavaDOM)
JDOM Examples
JDO API (Java Data Objects)
JDO Examples
Motivation
First, most database vendors today offer
universal database products that combine their
relational DBMS and ORDBMS offerings into a
single product.
Second, an ORDBMS has a more expressive
type system than an RDBMS.
Third, an ORDBMS is better suited for storing
and querying XML documents that may use a
richer set of data types.
Motivation: Applications
Computer-Aided Design (CAD)
Computer-Aided Manufacturing (CAM)
Computer-Aided Software Engineering (CASE)
Network Management Systems
Office Information Systems (OIS) and Multimedia
Systems
Digital Publishing
Geographic Information Systems (GIS)
Interactive and Dynamic Web sites
Other applications with complex and interrelated
objects and procedural data.
Motivation: RDBMS weaknesses
Poor Representation of “Real World” Entities

Normalization leads to relations that do not correspond to entities
in “real world”.
Semantic Overloading


Relational model has only one construct for representing data and
data relationships: the relation.
Relational model is semantically overloaded.
Difficulty Handling Recursive Queries
RDBMSs are poor at navigational access to data.
Limited Operations

RDBMSs only have a fixed set of operations, which are difficult to
extend.
Motivation: ORDBMS Advantages
Add object storage facilities to relational
database



Greater flexibility than strict relational
Easier to introduce into organisation than full OO
Backwards compatible with strict relational
applications, SQL etc
Relational paradigm retained


Tables with rows of values
But attributes can contain objects, sets, arrays, tuples
etc
Motivation: ORDBMS Advantages
Code held within database, as functions,
procedures or methods

common functionality can be centralised rather
than re-implemented by every application that
uses the data
BLOBs (Binary Large Objects) and
CLOBs (Character Large Objects) are used to
store large unstructured values within the
database

allows storage of complex data e.g. multimedia
Motivation: ORDBMS Advantages
ORDBMS
The ability to directly manipulate data
stored in a relational database using an
object programming language is called
transparent persistence
Object-relational mapping means less
code to write
Higher performance than embedded
SQL or a call-level interface (JDBC, ODBC)
XML and ORDBMS
XORator mapping
The XORator (XML to OR Translator) algorithm
is a practical demonstration of the use of XML
data types.
It takes advantage of using an ORDBMS over
an RDBMS.
XORator uses Document Type Definitions
(DTDs) to map XML documents to tables in an
ORDBMS.
An important part of this mapping is the
assignment of a fragment of an XML document to
a new XML data type, called XADT (XML
Abstract Data Type).
XORator: DTD -> OR schema
Reducing the DTD complexity
Building DTD graph
Mapping DTD to OR schema
Defining XADT (XML Abstract Data Types)
XORator: DTD -> OR schema
<!ELEMENT PLAY (INDUCT?, ACT+)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE+)>
<!ELEMENT ACT (SCENE+, TITLE, SUBTITLE*, SPEECH+,
PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, (SPEECH |
SUBHEAD)+)>
<!ELEMENT SPEECH (SPEAKER, LINE)+>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>
XORator: DTD complexity
Simplify the DTD information to a form that
makes the mapping process easier.
A set of transformations reduces the number of
nested expressions and the number of element
items:
Flattening (to convert a nested definition into a flat
representation): (e1, e2)* -> e1*, e2*
Simplification (to reduce multiple unary operators into
a single unary operator): e1** -> e1*
Grouping (to group subelements that have the same
name): e0, e1*, e1*, e2 -> e0, e1*, e2
In addition, e+ is transformed to e*.
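The transformations above can be sketched in a few lines of Java. This is only an illustration, not the XORator implementation: the class and method names are hypothetical, content models are represented as plain strings such as "e1*", and treating any repeated name as e* during grouping is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DtdSimplifier {

    /** Simplification plus the e+ -> e* rule: "**" or "+" becomes "*"; "?" and "" are kept. */
    public static String simplifyOperator(String ops) {
        if (ops.contains("*") || ops.contains("+")) {
            return "*";
        }
        return ops.isEmpty() ? "" : "?";
    }

    /** Flattening: (e1, e2)op -> e1 op, e2 op, by appending the outer operator to each member. */
    public static List<String> flatten(List<String> groupMembers, String outerOp) {
        List<String> flat = new ArrayList<>();
        for (String member : groupMembers) {
            flat.add(member + outerOp);
        }
        return flat;
    }

    /** Applies operator simplification to each item, then groups items with the same name. */
    public static List<String> simplify(List<String> items) {
        Map<String, String> byName = new LinkedHashMap<>();
        for (String item : items) {
            String name = item.replaceAll("[*+?]+$", "");
            String op = simplifyOperator(item.substring(name.length()));
            // Assumption: a name that occurs more than once is merged into a single e* entry.
            byName.merge(name, op, (oldOp, newOp) -> "*");
        }
        List<String> result = new ArrayList<>();
        byName.forEach((name, op) -> result.add(name + op));
        return result;
    }

    public static void main(String[] args) {
        // SPEECH (SPEAKER, LINE)+ from the example DTD
        System.out.println(simplify(flatten(List.of("SPEAKER", "LINE"), "+"))); // [SPEAKER*, LINE*]
        // e0, e1*, e1*, e2 from the grouping rule
        System.out.println(simplify(List.of("e0", "e1*", "e1*", "e2")));        // [e0, e1*, e2]
    }
}
```

A real simplifier would parse nested parentheses and choice groups; the sketch only covers the three rules listed on the slide.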
XORator: DTD -> OR schema
The simplified version of the previous DTD:
<!ELEMENT PLAY (INDUCT?, ACT*)>
<!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE*)>
<!ELEMENT ACT (SCENE*, TITLE, SUBTITLE*, SPEECH*,
PROLOGUE?)>
<!ELEMENT SCENE (TITLE, SUBTITLE*, SPEECH*, SUBHEAD*)>
<!ELEMENT SPEECH (SPEAKER*, LINE*)>
<!ELEMENT PROLOGUE (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT SUBHEAD (#PCDATA)>
<!ELEMENT SPEAKER (#PCDATA)>
<!ELEMENT LINE (#PCDATA)>
XORator: DTD -> OR schema
We build a DTD graph to represent the
structure of the DTD.
Nodes in the DTD graph are elements,
attributes, and operators.
In the DTD graph, elements that contain
characters are duplicated to eliminate the
sharing.
XORator: DTD -> OR schema
Given a DTD graph, a relation is created for each node that
satisfies any of the following conditions:
1) nodes that have an in-degree of zero
2) recursive nodes with in-degree greater than one
3) one node among mutually recursive nodes with in-degree one.
All remaining nodes (nodes not mapped to a relation)
are inlined as attributes under the relation created for
their closest ancestor nodes (in the DTD graph).
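As a rough illustration of conditions 1 and 2, the Java sketch below stores a DTD graph as an adjacency map, computes in-degrees, and selects the nodes that would get a relation. It is a simplification under stated assumptions: operator nodes and the duplicated #PCDATA leaves are left out of the graph, condition 3 (mutual recursion) and the inlining step are not implemented, and all class and method names are hypothetical.

```java
import java.util.*;

public class DtdGraphRelations {

    /** In-degree of every node: the number of edges pointing at it. */
    public static Map<String, Integer> inDegrees(Map<String, List<String>> graph) {
        Map<String, Integer> degrees = new TreeMap<>();
        for (String node : graph.keySet()) {
            degrees.putIfAbsent(node, 0);
        }
        for (List<String> children : graph.values()) {
            for (String child : children) {
                degrees.merge(child, 1, Integer::sum);
            }
        }
        return degrees;
    }

    /** True if the node can reach itself by following one or more edges. */
    public static boolean isRecursive(Map<String, List<String>> graph, String start) {
        Deque<String> pending = new ArrayDeque<>(graph.getOrDefault(start, List.of()));
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (node.equals(start)) {
                return true;
            }
            if (seen.add(node)) {
                pending.addAll(graph.getOrDefault(node, List.of()));
            }
        }
        return false;
    }

    /** Nodes that satisfy condition 1 (in-degree zero) or condition 2 (recursive, in-degree > 1). */
    public static Set<String> relationNodes(Map<String, List<String>> graph) {
        Set<String> relations = new TreeSet<>();
        inDegrees(graph).forEach((node, degree) -> {
            if (degree == 0 || (degree > 1 && isRecursive(graph, node))) {
                relations.add(node);
            }
        });
        return relations;
    }

    public static void main(String[] args) {
        // Non-leaf elements of the simplified PLAY DTD; edges point from parent to child.
        Map<String, List<String>> play = Map.of(
                "PLAY", List.of("INDUCT", "ACT"),
                "INDUCT", List.of("SCENE"),
                "ACT", List.of("SCENE", "SPEECH"),
                "SCENE", List.of("SPEECH"),
                "SPEECH", List.of());
        System.out.println(relationNodes(play)); // [PLAY]
    }
}
```

Under these two conditions alone, only PLAY gets its own relation for the example DTD, because SCENE and SPEECH are shared but not recursive.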
XORator: DTD -> OR schema
An XADT attribute can store a fragment of
an XML document.
The XORator algorithm allows mapping an
entire subtree of the DTD graph to an
attribute of type XADT.
XORator: XADT
One storage option is a compressed
representation for each XML fragment.
Element tag names are mapped to integer codes,
and the tags in the fragment are replaced by these
integer codes.
A small dictionary is stored along with the XML
fragment to record the mapping between the
integer codes and the actual element tag
names.
There are two implementations of the XADT: one
that uses compression, and one that
does not.
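The dictionary idea can be illustrated with a short Java sketch. The exact storage format of the XADT is not given here, so the encoded form below (tag names simply replaced by integers inside the angle brackets) is an assumption, as are the class and method names; tags with attributes are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagDictionaryEncoder {

    /** Matches a start tag <NAME> or an end tag </NAME> without attributes. */
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z_][\\w.-]*)>");

    /** Replaces each tag name by an integer code and records name -> code in the dictionary. */
    public static String encode(String fragment, Map<String, Integer> dictionary) {
        Matcher matcher = TAG.matcher(fragment);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(2);
            Integer code = dictionary.get(name);
            if (code == null) {
                // First time this tag name is seen: assign the next free code.
                code = dictionary.size() + 1;
                dictionary.put(name, code);
            }
            matcher.appendReplacement(out, "<" + matcher.group(1) + code + ">");
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        String fragment = "<SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>My friend</LINE></SPEECH>";
        System.out.println(encode(fragment, dictionary)); // <1><2>HAMLET</2><3>My friend</3></1>
        System.out.println(dictionary);                   // {SPEECH=1, SPEAKER=2, LINE=3}
    }
}
```

The saving grows with the length and repetition of tag names, which is why a fragment with few or short tags may not benefit from the compressed form.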
XORator: XADT
The decision of which implementation of the
XADT to use is made
during the document transformation
process by monitoring the effectiveness of
the compression technique.
Compression is used only if the space
efficiency is above a certain threshold
value.
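A minimal sketch of such a threshold test is shown below. The slides do not define how space efficiency is measured or what the threshold is, so the sketch assumes it is the fraction of space saved by the compressed form (including its dictionary), and the threshold of 0.2 in the example is purely hypothetical.

```java
public class XadtFormatChooser {

    /** Fraction of space saved by compression; 0.25 means the compressed form is 25% smaller. */
    public static double spaceSaving(int rawLength, int compressedLength) {
        if (rawLength <= 0) {
            return 0.0; // nothing to save on an empty fragment
        }
        return 1.0 - (double) compressedLength / rawLength;
    }

    /** Chooses the compressed XADT implementation only when the saving exceeds the threshold. */
    public static boolean useCompression(int rawLength, int compressedLength, double threshold) {
        return spaceSaving(rawLength, compressedLength) > threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.2; // hypothetical value, not taken from the slides
        System.out.println(useCompression(100, 60, threshold)); // true
        System.out.println(useCompression(100, 95, threshold)); // false
    }
}
```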
XORator: XADT
XADT getElm(XADT inXML, VARCHAR rootElm, VARCHAR searchElm, VARCHAR searchKey, INTEGER level):
This method returns all rootElm elements that have searchElm
within a depth of level from the rootElm.
INTEGER findKeyInElm(XADT inXML, VARCHAR searchElm, VARCHAR searchKey):
This method examines all elements with the tag name
searchElm in inXML and returns 1 if the content of any of them
matches the searchKey keyword.
XADT getElmIndex(XADT inXML, VARCHAR parentElm, VARCHAR childElm, INTEGER startPos, INTEGER endPos):
This method returns all childElm elements that are children of
the parentElm elements and whose sibling order is between the startPos
and endPos positions.
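To make the behaviour of findKeyInElm concrete, here is a Java sketch that mimics it on a plain string using the JDK's built-in DOM parser. It is not the database function itself: the XADT is stood in for by a String holding a single-rooted fragment, "matches" is interpreted as substring containment, and returning 0 when nothing matches is an assumption (the slide only states the return value 1).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XadtFunctionsSketch {

    /** Returns 1 if any searchElm element in the fragment has text containing searchKey, else 0. */
    public static int findKeyInElm(String inXML, String searchElm, String searchKey) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(inXML)));
            NodeList candidates = doc.getElementsByTagName(searchElm);
            for (int i = 0; i < candidates.getLength(); i++) {
                if (candidates.item(i).getTextContent().contains(searchKey)) {
                    return 1;
                }
            }
            return 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String speech = "<SPEECH><SPEAKER>HAMLET</SPEAKER>"
                + "<LINE>Farewell, my friend</LINE></SPEECH>";
        System.out.println(findKeyInElm(speech, "LINE", "friend"));    // 1
        System.out.println(findKeyInElm(speech, "SPEAKER", "friend")); // 0
    }
}
```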
XORator: XADT
Example query: retrieve the lines that are spoken
in acts by the speaker 'HAMLET' and
contain the keyword 'friend'.
JDOM
JDOM is an open source, tree-based (DOM-like),
pure Java API for parsing, creating,
manipulating, and serializing XML documents.
JDOM represents an XML document as a tree
composed of elements, attributes, comments,
processing instructions, text nodes, CDATA
sections, etc.
JDOM is written in and for Java. It consistently
uses the Java coding conventions and the class
library, and its classes implement the Cloneable and
Serializable interfaces.
JDOM
Xerces 1.4.4 is bundled with JDOM to
parse XML documents.
A JDOM tree is fully read-write. All parts of
the tree can be moved, deleted, and
added to, subject to the usual restrictions
of XML.
Unlike DOM, there are no read-only sections of the tree.
JDOM Example
<person>
<name>Michael Owen</name>
<address>222 Bazza Lane, Liverpool, MN</address>
<ssn>111-222-3333</ssn>
<email>[email protected]</email>
<home-phone>720.111.2222</home-phone>
<work-phone>111.222.3333</work-phone>
</person>
JDOM Example
public class Person {
    private String name;
    private String address;
    private String ssn;
    private String email;
    private String homePhone;
    private String workPhone;

    // -- allows us to create a Person
    public Person(String name, String address, String ssn,
                  String email, String homePhone, String workPhone) {
        this.name = name;
        this.address = address;
        this.ssn = ssn;
        this.email = email;
        this.homePhone = homePhone;
        this.workPhone = workPhone;
    }

    // -- used by the data-binding
JDOM Example
    public Person() { }

    // -- accessors
    public String getName() { return name; }
    public String getAddress() { return address; }
    public String getSsn() { return ssn; }
    public String getEmail() { return email; }
    public String getHomePhone() { return homePhone; }
    public String getWorkPhone() { return workPhone; }

    // -- mutators
    public void setName(String name) { this.name = name; }
    public void setAddress(String address) { this.address = address; }
    public void setSsn(String ssn) { this.ssn = ssn; }
    public void setEmail(String email) { this.email = email; }
    public void setHomePhone(String homePhone) { this.homePhone = homePhone; }
    public void setWorkPhone(String workPhone) { this.workPhone = workPhone; }
}
JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileReader;

public class ReadPerson {
    public static void main(String args[]) {
        try {
            // -- Unmarshaller is the Castor XML data-binding class (org.exolab.castor.xml)
            Person person = (Person) Unmarshaller.unmarshal(Person.class,
                    new FileReader("person.xml"));
            System.out.println("Person Attributes");
            System.out.println("-----------------");
            System.out.println("Name: " + person.getName() );
            System.out.println("Address: " + person.getAddress() );
            System.out.println("SSN: " + person.getSsn() );
            System.out.println("Email: " + person.getEmail() );
            System.out.println("Home Phone: " + person.getHomePhone() );
            System.out.println("Work Phone: " + person.getWorkPhone() );
        } catch (Exception e) {
            System.out.println( e );
        }
    }
}
JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileWriter;

public class CreatePerson {
    public static void main(String args[]) {
        try {
            // -- create a person to work with
            Person person = new Person("Bob Harris", "123 Foo Street", "222-2222222",
                    "[email protected]", "(123) 123-1234", "(123) 123-1234");
            // -- marshal the person object out as a <person>
            FileWriter file = new FileWriter("bob_person.xml");
            Marshaller.marshal(person, file);
            file.close();
        } catch (Exception e) {
            System.out.println( e );
        }
    }
}
JDOM Example
import org.exolab.castor.xml.*;
import java.io.FileWriter;
import java.io.FileReader;

public class ModifyPerson {
    public static void main(String args[]) {
        try {
            // -- read in the person
            Person person = (Person) Unmarshaller.unmarshal(Person.class,
                    new FileReader("person.xml"));
            // -- change the name
            person.setName("David Beckham");
            // -- marshal the changed person back to disk
            FileWriter file = new FileWriter("person.xml");
            Marshaller.marshal(person, file);
            file.close();
        } catch (Exception e) {
            System.out.println( e );
        }
    }
}
JDO
Java Data Objects (JDO) is a Sun standard
that allows you to persist Java objects.
It supports transactions and multiple users. It
differs from JDBC in that you don't have to think
about SQL and "all that database stuff."
It differs from serialization as it allows multiple
users and transactions.
It allows Java developers to use their object
model as a data model. There is no need to
spend time going between the "data" side and
the "object" side.
JDO: Example
package addressbook;

import java.util.*;
import javax.jdo.*;
import com.prismt.j2ee.connector.jdbc.ManagedConnectionFactoryImpl;

public class PersonPersist {
    private final static int SIZE = 3;
    private PersistenceManagerFactory pmf = null;
    private PersistenceManager pm = null;
    private Transaction transaction = null;
    private Person[] people;
    // Vector of current object identifiers
    private Vector id = new Vector(SIZE);

    public PersonPersist() {
        try {
            Properties props = new Properties();
            props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                    "com.prismt.j2ee.jdo.PersistenceManagerFactoryImpl");
            pmf = JDOHelper.getPersistenceManagerFactory(props);
            pmf.setConnectionFactory( createConnectionFactory() );
        } catch(Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
JDO: Example
    public static Object createConnectionFactory() {
        ManagedConnectionFactoryImpl mcfi = new ManagedConnectionFactoryImpl();
        Object connectionFactory = null;
        try {
            mcfi.setUserName("scott");
            mcfi.setPassword("tiger");
            mcfi.setConnectionURL("jdbc:oracle:thin:@localhost:1521:thedb");
            mcfi.setDBDriver("oracle.jdbc.driver.OracleDriver");
            connectionFactory = mcfi.createConnectionFactory();
        } catch(Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
        return connectionFactory;
    }
JDO: Example
    public void persistPeople() {
        // create an array of Person's
        people = new Person[SIZE];
        // create three people
        people[0] = new Person("Gary Segal", "123 Foobar Lane", "123-123-1234",
                "[email protected]", "(608) 294-0192", "(608) 029-4059");
        people[1] = new Person("Michael Owen", "222 Bazza Lane, Liverpool, MN",
                "111-222-3333", "[email protected]", "(720) 111-2222", "(303) 222-3333");
        people[2] = new Person("Roy Keane", "222 Trafford Ave, Manchester, MN",
                "234-235-3830", "[email protected]", "(720) 940-9049", "(303) 309-7599");
        // persist the array of people inside a transaction
        pm = pmf.getPersistenceManager();
        transaction = pm.currentTransaction();
        transaction.begin();
        pm.makePersistentAll(people);
        transaction.commit();
        // retrieve the object ids for the persisted objects
        for (int i = 0; i < people.length; i++) {
            id.add(pm.getObjectId(people[i]));
        }
        // close current persistence manager to ensure that objects are read
        // from the db, not the persistence manager's memory cache.
        pm.close();
    }
JDO: Example
    public void change() {
        Person person;
        // retrieve objects from datastore
        pm = pmf.getPersistenceManager();
        transaction = pm.currentTransaction();
        transaction.begin();
        // change the name of the second persisted object
        person = (Person) pm.getObjectById(id.elementAt(1), false);
        person.setName("Steve Gerrard");
        // commit the change and close the persistence manager
        transaction.commit();
        pm.close();
    }
JDOM Example

<addressbook name="Manchester United Address Book">
<person name="Roy Keane">
<address>23 Whistlestop Ave</address>
<ssn>111-222-3333</ssn>
<email>[email protected]</email>
<home-phone>720.111.2222</home-phone>
<work-phone>111.222.3333</work-phone>
</person>
<person name="Juan Sebastian Veron">
<address>123 Foobar Lane</address>
<ssn>222-333-444</ssn>
<email>[email protected]</email>
<home-phone>720.111.2222</home-phone>
<work-phone>111.222.3333</work-phone>
</person>
</addressbook>
JDOM: Example
import java.util.List;
import java.util.ArrayList;

public class Addressbook {
    private String addressBookName;
    private List persons = new ArrayList();

    public Addressbook() { }

    // -- manipulate the List of Person
    public void addPerson(Person person) {
        persons.add(person);
    }
    public List getPersons() { return persons; }

    // -- manipulate the name of the address book
    public String getName() { return addressBookName; }
    public void setName(String name) {
        this.addressBookName = name;
    }
}
JDOM Example
<?xml version="1.0"?>
<mapping>
<description>A mapping file for our Address Book application</description>
<class name="Person">
<field name="name" type="string">
<bind-xml name="name" node="attribute" />
</field>
<field name="address" type="string" />
<field name="ssn" type="string" />
<field name="email" type="string" />
<field name="homePhone" type="string" />
<field name="workPhone" type="string" />
</class>
<class name="Addressbook">
<field name="name" type="string">
<bind-xml name="name" node="attribute" />
</field>
<field name="persons" type="Person" collection="collection" />
</class>
</mapping>
JDOM Example
import org.exolab.castor.xml.*;
import org.exolab.castor.mapping.*;
import java.io.FileReader;
import java.util.List;
import java.util.Iterator;

public class ViewAddressbook {
    public static void main(String args[]) {
        try {
            // -- Load a mapping file
            Mapping mapping = new Mapping();
            mapping.loadMapping("mapping.xml");
            Unmarshaller un = new Unmarshaller(Addressbook.class);
            un.setMapping( mapping );
            // -- Read in the Addressbook using the mapping
            FileReader in = new FileReader("addressbook.xml");
            Addressbook book = (Addressbook) un.unmarshal(in);
            in.close();
JDOM Example
            // -- Display the addressbook
            System.out.println( book.getName() );
            List persons = book.getPersons();
            Iterator iter = persons.iterator();
            while ( iter.hasNext() ) {
                Person person = (Person) iter.next();
                System.out.println("\n" + person.getName() );
                System.out.println("-----------------------------");
                System.out.println("Address = " + person.getAddress() );
                System.out.println("SSN = " + person.getSsn() );
                System.out.println("Home Phone = " + person.getHomePhone() );
            }
        } catch (Exception e) {
            System.out.println( e );
        }
    }
}
The End