Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Scheduling of Transactions on XML Documents Author: Stijin Dekeyser Jan Hidders Reviewed by Jason Chen, Glenn, Steven, Christian Native XML Database • It’s a database that stores and retrieves XML documents efficiently • The query language should be based on XML, and the structure of XML • The indexes should ensure that the fast execution of queries against XML documents G.Dobbie 2004 Pros and cons of NXD compare to relational Database Pros Cons • NXD is more efficient • NXD is slow to in terms of documents retrieve a view retrieval compare to relational database • NXD requires less system space compare to relational DB since it store less nulls and tables compare to relational DB G.Dobbie 2004 Motivation for conflict schduler • NXD becomes more popular for companies to use • Currently there is no NXD scheduler that can guarantee high concurrency and serializability • State of the art of concurrency control in NXD is currently locking the entire document Data Model of xml document in Conflict scheduler • Similar to Xpath data model representation n 1 <document id = “0”> <person id = “1”, age = “22”> <name>Jason</name> <address>33 Colmar Rd </address> <occupation>Student</occupation> <father> <person id = 3, age = “54”> Roo t n 2 n 3 n 4 <name>Ming</name> </person> Id = 1 Name = “Jason” Figure 1 elemen tattribut e tex t Occupation= “Research Assistant” </father> </person> <person id = “2”, age = “23”> <name>Tanya</name> <occupation>Office Assistant</occupation> </person> </document> figure 1 Figure 2 Data Manipulation language in conflict scheduler • DML is used to query a resultset in a DB, in this case it is a NXD DB • Also similar to Xpath DML • P ::= F | P/ F | P// F • F ::= E | * • Where f can be root or an element of a xml document Dekeyser, S. & Hidders, J. (2004) Operations in Conflict Scheduler • A(n, a) • D(n) • Q(n, p) This is the update operation which add a new node a to an existing node n The new node a is returned as the result of the operation The operation fails if the document can not represent as a tree structure. The update operation delete an existing node n from the document tree This operation does not return any result This operation fails if the document structure is not a tree after the updates This query operation return all the nodes which is connected from node n by path p Dekeyser, S. & Hidders, J. (2004) Locking Scheme in Conflict Scheduler • Locking scheme is used to lock the node when one user is accessing it. In this way it prevents other user to modify it at the same time to protect data integrity • There are 2 locking scheme involved in the conflict scheduler – Path Lock Propagation Scheme – Path Lock Satisfiability Scheme Dekeyser, S. & Hidders, J. (2004) Path Lock Propagation Scheme • In this locking scheme, if node n is locked, all nodes which connect from n with path expression “p” is also locked <a1(Q(, document/person/father/person), t1)> will perform the locking of this node rl(t1, r, document/person/father/person) the propagation process will also perform the following locking rl(t1, n1, person/father/person) rl(t1, n2, father/person) rl(t1, n3, father/person) rl(t1, n8, person) Dekeyser, S. & Hidders, J. (2004) Path Lock Satisfiability Scheme • This locking generates less locks than Path Lock Propagation Scheme but is hard to complex in terms of testing conflict therefore can also effect the efficiency in terms of processing speed of the database system. <a1(Q(, document/person/father/person), t1)> will perform the locking of this node rl(t1, r, document/person/father/person) Dekeyser, S. & Hidders, J. (2004) The Conflict Scheduler • The conflict scheduler is the automaton whose state consists of a schedule S of actions that it has previously accepted and processed, a set of locks L, a dependency graph G which is directed graph whose nodes are transaction identifiers, and an instance graph I. Dekeyser, S. & Hidders, J. (2004) The Conflict Scheduler (cont) • Consider the following schedule S in given the instance graph in figure 1 S in = < a1( Q(r,document/person/occupation, t1), a2 (Q(r, document), t2), a3 (A (n1, person), t2), a4( A(n28, occupation), a5(D ( n22)> The Conflict Scheduler (cont) • Initially : • S: empty, G empty • First operation ( Q(r, document/person/occupation, t1) dependency graph G1 t1 with only 1 node The Conflict Scheduler (cont) • Second operation: document), t2) • Dependency graph G2 t1 a2 (Q(r, t2 • two nodes without any edge between them The Conflict Scheduler (cont) • Third operation: a3 (A (n1, person), t2) Root n1 new person node n2 n3 n28 n4 n20 n5 Id = 1 n9 n21 Name = “Jason” element attribute n22 n23 Occupation= “Research Assistant” text • Dependency graph G3 is still the same to the G2 from the second operation The Conflict Scheduler (cont) • Forth operation: a4( A(n28, occupation) t1 • This operation conflicts with operation a1. ie the node a4 required (occupation is locked by a1 ) thus, G4 now contains an edge from t1 to t2, after a4, no further conflicts appear. The conflict scheduler finishes and accepts Sin as its output schedule • Dependency graph G4: t1 t2 The Conflict Scheduler (cont) • So now a serial schedule equivalent to S in is the schedule obtained by first taking all actions of transaction t1 and then all actions of transaction t2. The order is based on the dependency graph. Conclusion • Concurrency becomes important as NXD is gaining popularity • Conflict scheduler has improved concurrency compare to commit scheduler • The serializability of conflict scheduler has also been proven • Future development will aim to realize it in the industry