Storing XML in ORDBMS

Amine Kaddara
Supervisor: Dr Haddouti
XML and Data Management

Outline
- Motivation
- Benefits of using ORDBMS for storing XML
- Storage techniques using the XORator algorithm
- JDOM API (JavaDOM)
- JDOM Examples
- JDO API (Java Data Objects)
- JDO Examples

Motivation
- First, most database vendors today offer universal database products that combine their relational DBMS and ORDBMS offerings into a single product.
- Second, an ORDBMS has a more expressive type system than an RDBMS.
- Third, an ORDBMS is better suited for storing and querying XML documents that may use a richer set of data types.

Motivation: Applications
- Computer-Aided Design (CAD)
- Computer-Aided Manufacturing (CAM)
- Computer-Aided Software Engineering (CASE)
- Network Management Systems
- Office Information Systems (OIS) and Multimedia Systems
- Digital Publishing
- Geographic Information Systems (GIS)
- Interactive and Dynamic Web sites
- Other applications with complex and interrelated objects and procedural data

Motivation: RDBMS weaknesses
- Poor representation of "real world" entities: normalization leads to relations that do not correspond to entities in the "real world".
- Semantic overloading: the relational model has only one construct for representing data and data relationships, the relation, so it is semantically overloaded.
- Difficulty handling recursive queries: RDBMSs are poor at navigational access to data.
- Limited operations: RDBMSs have only a fixed set of operations, which are difficult to extend.

Motivation: ORDBMS Advantages
- Add object storage facilities to a relational database
- Greater flexibility than strict relational
- Easier to introduce into an organisation than full OO
- Backwards compatible with strict relational applications, SQL, etc.
- Relational paradigm retained: tables with rows of values, but attributes can contain objects, sets, arrays, tuples, etc.

Motivation: ORDBMS Advantages (continued)
- Code is held within the database as functions, procedures, or methods, so common functionality can be centralised rather than re-implemented by every application that uses the data
- BLOBs (Binary Large Objects) and CLOBs (Character Large Objects) are used to store large unstructured values within the database, which allows storage of complex data, e.g. multimedia

Motivation: ORDBMS Advantages (continued)
- The ability to directly manipulate data stored in a relational database using an object programming language is called transparent persistence
- Object-relational mapping means less code to write
- Higher performance than embedded SQL or a call-level interface (JDBC, ODBC)

XML and ORDBMS

XORator mapping
- The XORator (XML to OR Translator) algorithm is a practical demonstration of the use of XML data types.
- It takes advantage of using an ORDBMS over an RDBMS.
- XORator uses Document Type Definitions (DTDs) to map XML documents to tables in an ORDBMS.
- An important part of this mapping is the assignment of a fragment of an XML document to a new XML data type, called XADT (XML Abstract Data Type).

XORator: DTD -> OR schema
- Reducing the DTD complexity
- Building the DTD graph
- Mapping the DTD to an OR schema
- Defining XADT (XML Abstract Data Type)

XORator: DTD -> OR schema

    <!ELEMENT PLAY (INDUCT?, ACT+)>
    <!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE+)>
    <!ELEMENT ACT (SCENE+, TITLE, SUBTITLE*, SPEECH+, PROLOGUE?)>
    <!ELEMENT SCENE (TITLE, SUBTITLE*, (SPEECH | SUBHEAD)+)>
    <!ELEMENT SPEECH (SPEAKER, LINE)+>
    <!ELEMENT PROLOGUE (#PCDATA)>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT SUBTITLE (#PCDATA)>
    <!ELEMENT SUBHEAD (#PCDATA)>
    <!ELEMENT SPEAKER (#PCDATA)>
    <!ELEMENT LINE (#PCDATA)>

XORator: DTD complexity
- Simplify the DTD information to a form that makes the mapping process easier.
- A set of transformations reduces the number of nested expressions and the number of element items:
  - Flattening (converts a nested definition into a flat representation): (e1, e2)* -> e1*, e2*
  - Simplification (reduces multiple unary operators to a single unary operator): e1** -> e1*
  - Grouping (groups subelements that have the same name): e0, e1*, e1*, e2 -> e0, e1*, e2
- In addition, e+ is transformed to e*.

XORator: DTD -> OR schema
The simplified version of the previous DTD:

    <!ELEMENT PLAY (INDUCT?, ACT*)>
    <!ELEMENT INDUCT (TITLE, SUBTITLE*, SCENE*)>
    <!ELEMENT ACT (SCENE*, TITLE, SUBTITLE*, SPEECH*, PROLOGUE?)>
    <!ELEMENT SCENE (TITLE, SUBTITLE*, SPEECH*, SUBHEAD*)>
    <!ELEMENT SPEECH (SPEAKER*, LINE*)>
    <!ELEMENT PROLOGUE (#PCDATA)>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT SUBTITLE (#PCDATA)>
    <!ELEMENT SUBHEAD (#PCDATA)>
    <!ELEMENT SPEAKER (#PCDATA)>
    <!ELEMENT LINE (#PCDATA)>

XORator: DTD -> OR schema
- We build a DTD graph to represent the structure of the DTD.
- Nodes in the DTD graph are elements, attributes, and operators.
- In the DTD graph, elements that contain characters are duplicated to eliminate sharing.

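The simplification and grouping rules from the "DTD complexity" slide can be illustrated with a small sketch. This is not the XORator implementation. It assumes the content model has already been flattened into a list of items such as "SPEECH+", and it assumes that a name occurring more than once is kept once and marked with "*". The class and method names (DtdSimplifier, simplify, normalizeOperator) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the DTD simplification rules on a flattened content model. */
final class DtdSimplifier {

    /** Reduces any run of occurrence operators to "", "?" or "*". */
    static String normalizeOperator(String ops) {
        if (ops.contains("*") || ops.contains("+")) {
            return "*"; // covers e+ -> e* and e** -> e*
        }
        return ops.isEmpty() ? "" : "?"; // e?? -> e?
    }

    /** Applies simplification and grouping to items such as "TITLE", "SUBTITLE*", "SPEECH+". */
    static List<String> simplify(List<String> items) {
        // LinkedHashMap keeps the order in which element names first appear.
        Map<String, String> grouped = new LinkedHashMap<>();
        for (String item : items) {
            // Split the item into the element name and its trailing operators.
            int end = item.length();
            while (end > 0 && "?*+".indexOf(item.charAt(end - 1)) >= 0) {
                end--;
            }
            String name = item.substring(0, end);
            String op = normalizeOperator(item.substring(end));
            // Grouping (assumption): a repeated name is kept once and becomes repeatable.
            grouped.merge(name, op, (first, second) -> "*");
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : grouped.entrySet()) {
            result.add(entry.getKey() + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Flattened content model of SCENE from the example DTD.
        System.out.println(simplify(List.of("TITLE", "SUBTITLE*", "SPEECH+", "SUBHEAD+")));
        // The grouping example from the slide.
        System.out.println(simplify(List.of("e0", "e1*", "e1*", "e2")));
    }
}
```

For the SCENE element the result matches the simplified DTD shown earlier: TITLE, SUBTITLE*, SPEECH*, SUBHEAD*.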
XORator: DTD -> OR schema
Given a DTD graph, a relation is created for each node that satisfies any of the following conditions:
1) nodes that have an in-degree of zero
2) recursive nodes with an in-degree greater than one
3) one node among mutually recursive nodes with an in-degree of one
All remaining nodes (nodes not mapped to a relation) are inlined as attributes under the relation created for their closest ancestor node in the DTD graph.

XORator: DTD -> OR schema
- An XADT attribute can store a fragment of an XML document.
- The XORator algorithm allows mapping an entire subtree of the DTD graph to an attribute of type XADT.

XORator: XADT
- One storage representation is a compressed form of each XML fragment.
- Element tag names are mapped to integer codes, and the tags in the fragment are replaced by these codes.
- A small dictionary is stored along with the XML fragment to record the mapping between the integer codes and the actual element tag names.
- There are two implementations of the XADT: one that uses compression and one that does not.

XORator: XADT
- The decision about which implementation of the XADT to use is made during the document transformation process by monitoring the effectiveness of the compression technique.
- Compression is used only if the space efficiency is above a certain threshold value.

XORator: XADT
XADT getElm(XADT inXML, VARCHAR rootElm, VARCHAR searchElm, VARCHAR searchKey, INTEGER level)
    Returns all rootElm elements that have searchElm within a depth of level from the rootElm.
INTEGER findKeyInElm(XADT inXML, VARCHAR searchElm, VARCHAR searchKey)
    Examines all elements with the tag name searchElm in inXML and returns 1 if the content of any of them matches the searchKey keyword.
XADT getElmIndex(XADT inXML, VARCHAR parentElm, VARCHAR childElm, INTEGER startPos, INTEGER endPos)
    Returns all childElm elements that are children of parentElm elements and whose sibling position lies between startPos and endPos.

XORator: XADT
This query retrieves lines that are spoken in acts by the SPEAKER 'HAMLET' and contain the keyword 'friend'.

JDOM
- JDOM is an open source, tree-based (DOM-like), pure Java API for parsing, creating, manipulating, and serializing XML documents.
- JDOM represents an XML document as a tree composed of elements, attributes, comments, processing instructions, text nodes, CDATA sections, etc.
- JDOM is written in and for Java. It consistently uses the Java coding conventions and the class library, and it implements the Cloneable and Serializable interfaces.

JDOM
- Xerces 1.4.4 is bundled with JDOM to parse XML documents.
- A JDOM tree is fully read-write. All parts of the tree can be moved, deleted, and added to, subject to the usual restrictions of XML.
- Unlike DOM, there are no annoying read-only sections of the tree that one can't change.

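The read-write tree idea described for JDOM can be tried without extra libraries. The sketch below deliberately uses the W3C DOM API that ships with the JDK as a stand-in, because JDOM is a separate download, so the class and method names are not JDOM's. It parses a small person document, changes one element, removes another, and appends a new one. The names TreeEditSketch and editPerson and the added "note" element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Stand-in sketch: editing an XML tree in memory with the JDK's built-in DOM API. */
final class TreeEditSketch {

    /** Renames the person, drops the ssn element, appends a note, and returns the XML text. */
    static String editPerson(String xml, String newName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element person = doc.getDocumentElement();

            // Change the text of an existing element.
            Element name = (Element) person.getElementsByTagName("name").item(0);
            name.setTextContent(newName);

            // Delete an element from the tree.
            Element ssn = (Element) person.getElementsByTagName("ssn").item(0);
            ssn.getParentNode().removeChild(ssn);

            // Add a new element (hypothetical field, for illustration only).
            Element note = doc.createElement("note");
            note.setTextContent("edited");
            person.appendChild(note);

            // Serialize the modified tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person><name>Michael Owen</name><ssn>111-222-3333</ssn></person>";
        System.out.println(editPerson(xml, "David Beckham"));
    }
}
```

With JDOM itself the same steps would go through its own Document and Element classes rather than the org.w3c.dom interfaces used here.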
JDOM Example

    <person>
        <name>Michael Owen</name>
        <address>222 Bazza Lane, Liverpool, MN</address>
        <ssn>111-222-3333</ssn>
        <email>[email protected]</email>
        <home-phone>720.111.2222</home-phone>
        <work-phone>111.222.3333</work-phone>
    </person>

JDOM Example

    public class Person {
        private String name;
        private String address;
        private String ssn;
        private String email;
        private String homePhone;
        private String workPhone;

        // -- allows us to create a Person
        public Person(String name, String address, String ssn,
                      String email, String homePhone, String workPhone) {
            this.name = name;
            this.address = address;
            this.ssn = ssn;
            this.email = email;
            this.homePhone = homePhone;
            this.workPhone = workPhone;
        }

        // -- used by the data-binding
        public Person() { }

        // -- accessors
        public String getName() { return name; }
        public String getAddress() { return address; }
        public String getSsn() { return ssn; }
        public String getEmail() { return email; }
        public String getHomePhone() { return homePhone; }
        public String getWorkPhone() { return workPhone; }

        // -- mutators
        public void setName(String name) { this.name = name; }
        public void setAddress(String address) { this.address = address; }
        public void setSsn(String ssn) { this.ssn = ssn; }
        public void setEmail(String email) { this.email = email; }
        public void setHomePhone(String homePhone) { this.homePhone = homePhone; }
        public void setWorkPhone(String workPhone) { this.workPhone = workPhone; }
    }

JDOM Example

    // Unmarshaller is part of the Castor XML data-binding library (org.exolab.castor.xml)
    import org.exolab.castor.xml.*;
    import java.io.FileReader;

    public class ReadPerson {
        public static void main(String args[]) {
            try {
                // -- read person.xml into a Person object
                Person person = (Person) Unmarshaller.unmarshal(Person.class,
                        new FileReader("person.xml"));
                System.out.println("Person Attributes");
                System.out.println("-----------------");
                System.out.println("Name: " + person.getName());
                System.out.println("Address: " + person.getAddress());
                System.out.println("SSN: " + person.getSsn());
                System.out.println("Email: " + person.getEmail());
                System.out.println("Home Phone: " + person.getHomePhone());
                System.out.println("Work Phone: " + person.getWorkPhone());
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

JDOM Example

    import org.exolab.castor.xml.*;
    import java.io.FileWriter;

    public class CreatePerson {
        public static void main(String args[]) {
            try {
                // -- create a person to work with
                Person person = new Person("Bob Harris", "123 Foo Street", "222-2222222",
                        "[email protected]", "(123) 123-1234", "(123) 123-1234");
                // -- marshal the person object out as a <person>
                FileWriter file = new FileWriter("bob_person.xml");
                Marshaller.marshal(person, file);
                file.close();
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

JDOM Example

    import org.exolab.castor.xml.*;
    import java.io.FileWriter;
    import java.io.FileReader;

    public class ModifyPerson {
        public static void main(String args[]) {
            try {
                // -- read in the person
                Person person = (Person) Unmarshaller.unmarshal(Person.class,
                        new FileReader("person.xml"));
                // -- change the name
                person.setName("David Beckham");
                // -- marshal the changed person back to disk
                FileWriter file = new FileWriter("person.xml");
                Marshaller.marshal(person, file);
                file.close();
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

JDO
- Sun's Java Data Objects (JDO) standard allows you to persist Java objects.
- It supports transactions and multiple users.
- It differs from JDBC in that you don't have to think about SQL and "all that database stuff."
- It differs from serialization in that it allows multiple users and transactions.
- It allows Java developers to use their object model as a data model, so there is no need to spend time going between the "data" side and the "object" side.

JDO: Example

    package addressbook;

    import java.util.*; // OF
    import javax.jdo.*;
    import com.prismt.j2ee.connector.jdbc.ManagedConnectionFactoryImpl;

    public class PersonPersist {
        private final static int SIZE = 3;
        private PersistenceManagerFactory pmf = null;
        private PersistenceManager pm = null;
        private Transaction transaction = null;
        private Person[] people;
        // Vector of current object identifiers
        private Vector id = new Vector(SIZE);

        public PersonPersist() {
            try {
                Properties props = new Properties();
                props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                        "com.prismt.j2ee.jdo.PersistenceManagerFactoryImpl");
                pmf = JDOHelper.getPersistenceManagerFactory(props);
                pmf.setConnectionFactory(createConnectionFactory());
            } catch (Exception ex) {
                ex.printStackTrace();
                System.exit(1);
            }
        }

JDO: Example

        public static Object createConnectionFactory() {
            ManagedConnectionFactoryImpl mcfi = new ManagedConnectionFactoryImpl();
            Object connectionFactory = null;
            try {
                mcfi.setUserName("scott");
                mcfi.setPassword("tiger");
                mcfi.setConnectionURL("jdbc:oracle:thin:@localhost:1521:thedb");
                mcfi.setDBDriver("oracle.jdbc.driver.OracleDriver");
                connectionFactory = mcfi.createConnectionFactory();
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
            return connectionFactory;
        }

JDO: Example

        public void persistPeople() {
            // create an array of Person objects
            people = new Person[SIZE];
            // create three people
            people[0] = new Person("Gary Segal", "123 Foobar Lane", "123-123-1234",
                    "[email protected]", "(608) 294-0192", "(608) 029-4059");
            people[1] = new Person("Michael Owen", "222 Bazza Lane, Liverpool, MN", "111-222-3333",
                    "[email protected]", "(720) 111-2222", "(303) 222-3333");
            people[2] = new Person("Roy Keane", "222 Trafford Ave, Manchester, MN", "234-235-3830",
                    "[email protected]", "(720) 940-9049", "(303) 309-7599");
            // persist the array of people inside a transaction
            pm = pmf.getPersistenceManager();
            transaction = pm.currentTransaction();
            transaction.begin();
            pm.makePersistentAll(people);
            transaction.commit();
            // retrieve the object ids for the persisted objects
            for (int i = 0; i < people.length; i++) {
                id.add(pm.getObjectId(people[i]));
            }
            // close current persistence manager to ensure that
            // objects are read from the db not the persistence
            // manager's memory cache.
            pm.close();
        }

JDO: Example

        public void change() {
            Person person;
            // retrieve objects from datastore
            pm = pmf.getPersistenceManager();
            transaction = pm.currentTransaction();
            transaction.begin();
            // change the name of the second persisted object
            person = (Person) pm.getObjectById(id.elementAt(1), false);
            person.setName("Steve Gerrard");
            // commit the change and close the persistence manager
            transaction.commit();
            pm.close();
        }
    }

JDOM Example

    <addressbook name="Manchester United Address Book">
        <person name="Roy Keane">
            <address>23 Whistlestop Ave</address>
            <ssn>111-222-3333</ssn>
            <email>[email protected]</email>
            <home-phone>720.111.2222</home-phone>
            <work-phone>111.222.3333</work-phone>
        </person>
        <person name="Juan Sebastian Veron">
            <address>123 Foobar Lane</address>
            <ssn>222-333-444</ssn>
            <email>[email protected]</email>
            <home-phone>720.111.2222</home-phone>
            <work-phone>111.222.3333</work-phone>
        </person>
    </addressbook>

JDOM Example

    import java.util.List;
    import java.util.ArrayList;

    public class Addressbook {
        private String addressBookName;
        private List persons = new ArrayList();

        public Addressbook() { }

        // -- manipulate the List of Person
        public void addPerson(Person person) { persons.add(person); }
        public List getPersons() { return persons; }

        // -- manipulate the name of the address book
        public String getName() { return addressBookName; }
        public void setName(String name) { this.addressBookName = name; }
    }

JDOM Example

    <?xml version="1.0"?>
    <mapping>
        <description>A mapping file for our Address Book application</description>
        <class name="Person">
            <field name="name" type="string">
                <bind-xml name="name" node="attribute" />
            </field>
            <field name="address" type="string" />
            <field name="ssn" type="string" />
            <field name="email" type="string" />
            <field name="homePhone" type="string" />
            <field name="workPhone" type="string" />
        </class>
        <class name="Addressbook">
            <field name="name" type="string">
                <bind-xml name="name" node="attribute" />
            </field>
            <field name="persons" type="Person" collection="collection" />
        </class>
    </mapping>

JDOM Example

    import org.exolab.castor.xml.*;
    import org.exolab.castor.mapping.*;
    import java.io.FileReader;
    import java.util.List;
    import java.util.Iterator;

    public class ViewAddressbook {
        public static void main(String args[]) {
            try {
                // -- Load a mapping file
                Mapping mapping = new Mapping();
                mapping.loadMapping("mapping.xml");
                Unmarshaller un = new Unmarshaller(Addressbook.class);
                un.setMapping(mapping);
                // -- Read in the Addressbook using the mapping
                FileReader in = new FileReader("addressbook.xml");
                Addressbook book = (Addressbook) un.unmarshal(in);
                in.close();
                // -- Display the addressbook
                System.out.println(book.getName());
                List persons = book.getPersons();
                Iterator iter = persons.iterator();
                while (iter.hasNext()) {
                    Person person = (Person) iter.next();
                    System.out.println("\n" + person.getName());
                    System.out.println("-----------------------------");
                    System.out.println("Address = " + person.getAddress());
                    System.out.println("SSN = " + person.getSsn());
                    System.out.println("Home Phone = " + person.getHomePhone());
                }
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }

The End