EFFICIENT SOLUTIONS TO THE REPLICATED LOG AND DICTIONARY PROBLEMS
(Gene T.J. Wuu & Arthur J. Bernstein)
Presented By: Megha Priyanka
Overview
• Need For Data Replication.
• Consistency Constraints For Replicated Data.
• Model Of The Distributed Environment.
• Dictionary And Log Structure.
• Dictionary Problem.
• Prior Work.
• Proposed Solution.
• Comparison With Other Work.
• 2DTT Data Structure Improvement.
• Extending The Proposed Solution.
• Conclusion.
Need for replicated data?
• Many applications share data objects.
• Reliability and fast access are in demand.
• First step toward a comprehensive disaster recovery plan.
• Availability of data even when an individual node fails.
Consistency constraints for replicated data
• Serializable transactions ensure correctness of the database.
• Serial consistency is harder to achieve in an unreliable distributed system.
Why?
-> Availability conflicts with serial consistency.
-> Concurrency and serializability are compatible when concurrent
   transactions access disjoint databases.
So,
• Lower the consistency bar.
• Use a weaker consistency constraint with additional information about the
  distributed transaction.
Event Model
[Figure: six nodes n1-n6 exchanging messages. Shown are a send event, a receive event for the message (m, T6, 6), a crashed node whose local data remains intact, and a non-communication (local) event.]
The model uses Lamport's total ordering and the happened-before concept.
Distributed Dictionary
Data replication needs an efficient data structure: scalable, available, and recoverable.
The solution is a replicated dictionary maintained using a log.
Dictionary: an abstraction of a data object such as a file directory, a resource management table, or an electronic appointment calendar.
[Figure: a dictionary indexed by key, supporting insert and delete operations.]
[Figure: A Dictionary Snapshot.]
Distributed Log
Data structure:

    type Event =
        record
            op   : OperationType;
            time : TimeType;
            node : NodeId;
        end

Example:
1. <delete, Ti, 3>
2. <add, Ti+4, 6>
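A minimal Python sketch of this record (my illustration, not from the paper): ordering events by (time, node) gives exactly the Lamport total order the event model relies on, with ties on timestamps broken by node id.

    from dataclasses import dataclass

    @dataclass(frozen=True, order=True)
    class Event:
        time: int   # timestamp when the event occurred (TimeType)
        node: int   # id of the node where it occurred (NodeId)
        op: str     # operation, e.g. "delete" or "add" (OperationType)

    # e < f  iff  e.time < f.time, or e.time == f.time and e.node < f.node.
    e = Event(time=6, node=3, op="delete")
    f = Event(time=6, node=6, op="add")
    assert e < f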
DICTIONARY PROBLEM
Notation:
• Each node Ni has a local, fully replicated dictionary copy Vi.
• V(e) = contents of the dictionary copy at the node where event e occurred.
• X = a dictionary entry.
• CX = the event that inserts X.
• X-delete event = an event that deletes X.
[Figure: a timeline where insert(x) and a later delete(x) both precede event e.]
Dictionary Problem Restrictions:
R1) X ∈ V(e) iff CX -> e with no X-delete event g, g -> e.
R2) Delete(X) can be invoked on Ni only if X ∈ Vi immediately prior to execution.
R3) For each dictionary entry X, there is at most one event insert(X) in the dictionary.
Dictionary Problem:
The problem of finding a distributed algorithm on n nodes such that each node can perform insert/delete/send/receive, subject to restrictions R1, R2, and R3.
Prior Work
[Figure: nodes P1, P2, P3, each storing the entire log (L1, L2, ...) alongside its dictionary copy; messages carry whole logs of insert X / insert Y records.]
• The whole log is sent in every message: excessive communication.
• The entire log is stored at every node: excessive storage cost.
• The stored log is used to calculate each dictionary entry:
      Y ∈ V(e) iff CY -> e with no Y-delete event g, g -> e.
  This is excessive calculation.
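To see why this is expensive, here is a minimal sketch (my own illustration; the helper happened_before and the record layout are assumptions, not the paper's) of the naive rule, which rescans the entire log for every entry:

    # Naive prior-work rule: Y is in the view at event e iff insert(Y)
    # happened before e and no delete(Y) happened before e.
    def view_at(e, log, happened_before):
        view = set()
        for r in log:
            if r.op == "insert" and happened_before(r, e):
                deleted = any(d.op == "delete" and d.key == r.key
                              and happened_before(d, e) for d in log)
                if not deleted:
                    view.add(r.key)
        return view
    # Every call rescans the whole log: O(|log|^2) happened-before checks.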
Proposed Solution
Data structures used:
Log data structure:
• 2-D Time Table Ti (a matrix timestamp).
• Partial log PLi.
Dictionary data structure:
• Vi: the set of dictionary entries.
Algorithm
Initialization:
    Vi = ∅; PLi = ∅; Ti[j,k] = 0 for all (j,k).
Insert(X) / Delete(X):
• Ti[i,i] = Clocki.
• PLi = PLi ∪ {<op, Ti[i,i], i>}.
• If op = Insert(X): Vi = Vi ∪ {X}.
• If op = Delete(X): Vi = Vi − {X}.
Send(m) to Nk:
• NP = {eR ∈ PLi | Ni cannot tell from its 2DTT Ti that Nk already knows about eR}.
• Send <NP, Ti> to Nk.
Receive(m) from Nk:
• m = <NPk, Tk>.
• NE = the records in NPk that Ni is not yet aware of.
• Vi = {X | (X ∈ Vi or an insert(X) record ∈ NE) and no delete(X) record ∈ NE}.
• Update Ti using the matrix-timestamp rule: take the element-wise maximum with Tk, and fold Tk's row k into Ti's row i.
• PLi = {eR ∈ PLi ∪ NE | at least one node, as far as Ni can tell, does not yet know about eR}.
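A minimal runnable sketch of the algorithm in Python (my own illustration: the class name Node, the helper _has_recvd, and the message format are assumptions, not the paper's). The key test is that node j is known to have received record r iff Ti[j][r.node] >= r.time.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rec:
        op: str     # "insert" or "delete"
        key: str    # the dictionary entry X
        time: int   # clock value at the originating node
        node: int   # id of the originating node

    class Node:
        def __init__(self, i, n):
            self.i = i                              # this node's id
            self.V = set()                          # dictionary copy Vi
            self.PL = set()                         # partial log PLi
            self.T = [[0] * n for _ in range(n)]    # 2-D time table Ti
            self.clock = 0

        def _has_recvd(self, T, j, r):
            # Node j is known (via time table T) to have received record r.
            return T[j][r.node] >= r.time

        def _log_local(self, op, key):
            self.clock += 1
            self.T[self.i][self.i] = self.clock
            self.PL.add(Rec(op, key, self.clock, self.i))

        def insert(self, key):
            self._log_local("insert", key)
            self.V.add(key)

        def delete(self, key):
            assert key in self.V                    # restriction R2
            self._log_local("delete", key)
            self.V.discard(key)

        def send(self, k):
            # Ship only the records Nk may not know about, plus Ti.
            NP = {r for r in self.PL if not self._has_recvd(self.T, k, r)}
            return NP, [row[:] for row in self.T]

        def receive(self, msg, k):
            NPk, Tk = msg
            NE = {r for r in NPk if not self._has_recvd(self.T, self.i, r)}
            # Apply new inserts, then drop keys with a new delete record.
            new_keys = {r.key for r in NE if r.op == "insert"}
            dels = {r.key for r in NE if r.op == "delete"}
            self.V = (self.V | new_keys) - dels
            n = len(self.T)
            for j in range(n):                      # matrix-timestamp merge
                self.T[self.i][j] = max(self.T[self.i][j], Tk[k][j])
                for m in range(n):
                    self.T[j][m] = max(self.T[j][m], Tk[j][m])
            # Keep a record only while some node may still not know it.
            self.PL = {r for r in self.PL | NE
                       if not all(self._has_recvd(self.T, j, r)
                                  for j in range(n))}

send returns the message instead of transmitting, so any transport can be used; receive takes the sender's id k to locate the sender's row in Tk.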
[Figure sequence, slides 11-17: a walkthrough on three nodes n1, n2, n3, showing each node's dictionary, partial log, and 2-D time table after every event. n1 executes Insert(X) and sends <insert x, T1> to n2 and n3, which add x to their copies. n2 then executes Insert(Y), and later messages carry batches such as <(insert x, insert y), T2> and <(insert z, insert y), T3> as z is inserted as well. Finally, <del x> propagates (e.g. <(del x, insert z), T1>) and x disappears from every dictionary copy. As the time tables fill in, event records that all nodes are known to have received are dropped from the partial logs.]
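The opening and closing steps of this walkthrough can be replayed against the sketch above (hypothetical driver code; node ids are 0-based here):

    # n1 inserts x and gossips to n2 and n3 (ids 0, 1, 2).
    nodes = [Node(i, 3) for i in range(3)]
    nodes[0].insert("x")
    for k in (1, 2):
        nodes[k].receive(nodes[0].send(k), 0)
        assert "x" in nodes[k].V

    # n1 deletes x; once the record reaches n2 and n3, x is gone everywhere.
    nodes[0].delete("x")
    for k in (1, 2):
        nodes[k].receive(nodes[0].send(k), 0)
    assert all("x" not in nd.V for nd in nodes)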
Comparison with other work

Proposed by         | Data structure used                                     | Disadvantage
Fischer and Michael | Dictionary data structures                              | An entire copy of the dictionary is sent in each message.
Allchin             | Synchronization set (SS ≈ partial log) and 1-D time table | The SS grows unboundedly.
Wuu & Bernstein     | Dictionary, log, and 2-D time table                     | The 2-DTT, of size O(n^2), is sent in every message.
Improving 2-DTT Message Complexity

Strategy 0: Store the complete 2DTT at each node, O(n^2); send the complete 2DTT in each message, O(n^2).
Message complexity is as high as O(n^2), since an n x n matrix is both sent and stored.

Strategy 1: Store the complete 2DTT at each node, O(n^2); a node sends only its own row, O(n).
Requires direct messages to update each row, and messages need to include more event records.

Strategy 2: Store the node's own row and its neighbors' rows, O(nk); send the corresponding row to each neighbor, O(n).
A node cannot determine when all nodes have come to know about an event; an event record is discarded once all neighbors know about it.

Strategy 3: Store all entries (rows & columns) corresponding to neighbors, O(k^2); send row info through the gateway nodes, O(k).
Better when the network is large and connectivity and communication are low.
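As an illustrative sketch of Strategy 1 (the merge rule below is the natural one implied by the slide, not spelled out on it), the receiver, which still stores the full table, folds the sender's row into both the sender's row and its own:

    # Strategy 1: sender k ships only its own row of the 2DTT.
    def merge_row(T, i, k, row_k):
        """Fold sender k's row into receiver i's full time table T."""
        for j in range(len(T)):
            T[k][j] = max(T[k][j], row_k[j])   # what k claims to know
            T[i][j] = max(T[i][j], row_k[j])   # i has now seen it too
        return T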
Extending The Proposed Solution
Replicated Numeric Data:
Numeric data supports add-to and subtract-from operations, which are commutative. The log/2DTT solution ensures that no matter in what order the operations are applied, the result is consistent.
So, result1 = b + a - c; result2 = b - c + a; result1 = result2.
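A tiny sketch of this property (hypothetical values, my illustration): applying the same set of logged operations in every possible delivery order yields a single result.

    from itertools import permutations

    # Logged operations on a replicated counter starting at b = 10.
    ops = [("+", 4), ("-", 7), ("+", 1)]

    def apply_all(start, ordered_ops):
        value = start
        for sign, amount in ordered_ops:
            value = value + amount if sign == "+" else value - amount
        return value

    # Every delivery order converges to the same value.
    results = {apply_all(10, perm) for perm in permutations(ops)}
    assert len(results) == 1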
Detection of Failure:
To distinguish node failure from communication failure, a log is used to collect records of communication events. Suppose node N1 has the 2DTT

    T1 = | 1 0 0 |
         | 0 0 0 |
         | 1 0 3 |

The column for node N2 is all zeros, so N1 knows that no node has received any info from N2. So N2 might be down.
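A minimal check in the same vein (my sketch; it assumes the column convention above, where column m records what each node has received from node m):

    def maybe_down(T, m):
        """Node m may be down if no node is known to have
        received anything from m (column m is all zeros)."""
        return all(row[m] == 0 for row in T)

    T1 = [[1, 0, 0],
          [0, 0, 0],
          [1, 0, 3]]
    assert maybe_down(T1, 1)   # 0-based index 1 is node N2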
Conclusion
• Mutual consistency of replicated data is achieved.
• The algorithm works well in an unreliable network:
  - Link failure / lost message: the info can be obtained from other nodes.
  - Node failure: the info is stored in the log and dictionary, which are stable storage.
• Excessive communication, computation, and storage costs are reduced:
  - Reduction of communication/storage cost: only the partial log is sent and stored.
  - Reduction of computation cost: the replicated log is used to compute other nodes' views of the data; only partial entries in the dictionary are re-calculated.
• A weaker consistency constraint is used.