EFFICIENT SOLUTIONS TO THE REPLICATED LOG AND DICTIONARY PROBLEMS
(Gene T.J. Wuu & Arthur J. Bernstein)
Presented By: Megha Priyanka
Overview
• Need For Data Replication.
• Consistency Constraints For Replicated Data.
• Model Of The Distributed Environment.
• Dictionary And Log Structure.
• Dictionary Problem.
• Prior Work.
• Proposed Solution.
• Comparison With Other Work.
• 2DTT Data Structure Improvement.
• Extending The Proposed Solution.
• Conclusion.
Need for replicated data?
• Many applications share data objects.
• Reliability and fast access are in demand.
• First step toward a comprehensive disaster recovery plan.
• Availability of data even when an individual node fails.
Consistency constraints for replicated data
• Serializable transactions ensure correctness of the database.
• Serial consistency is harder to achieve in an unreliable distributed system.
Why?
-> Availability conflicts with serial consistency.
-> Concurrency and serializability are compatible when concurrent
   transactions access disjoint databases.
So,
• Lower the consistency bar.
• Use a weaker consistency constraint with additional information about the
  distributed transaction.
Event Model
[Figure: six nodes n1-n6 exchanging messages. Shown are a send event, a receive event for the message (m, T6, 6), a crashed node whose local data remains intact, and a non-communication (local) event.]
The model uses Lamport's total ordering and the happened-before concept.
Distributed Dictionary
Data replication needs an efficient data structure: scalable, available, and recoverable.
The solution is a replicated dictionary maintained using a log.
Dictionary: an abstraction of a data object such as a file directory, a resource management table, or an electronic appointment calendar.
[Figure: a dictionary indexed by key, supporting insert and delete operations.]
[Figure: A Dictionary Snapshot.]
Distributed Log
Data structure:

    type Event =
        record
            op   : OperationType;
            time : TimeType;
            node : NodeId;
        end

Example:
1. <delete, Ti, 3>
2. <add, Ti+4, 6>
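A minimal Python sketch of this record (my illustration, not from the paper): ordering events by (time, node) gives exactly the Lamport total order the event model relies on, with ties on timestamps broken by node id.

    from dataclasses import dataclass

    @dataclass(frozen=True, order=True)
    class Event:
        time: int   # timestamp when the event occurred (TimeType)
        node: int   # id of the node where it occurred (NodeId)
        op: str     # operation, e.g. "delete" or "add" (OperationType)

    # e < f  iff  e.time < f.time, or e.time == f.time and e.node < f.node.
    e = Event(time=6, node=3, op="delete")
    f = Event(time=6, node=6, op="add")
    assert e < f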
DICTIONARY PROBLEM
Notation:
• Each node Ni has a local, fully replicated dictionary copy Vi.
• V(e) = contents of the dictionary copy at the node where event e occurred.
• X = a dictionary entry.
• CX = the event that inserts X.
• X-delete event = an event that deletes X.
[Figure: a timeline where insert(x) and a later delete(x) both precede event e.]
Dictionary Problem Restrictions:
R1) X ∈ V(e) iff CX -> e with no X-delete event g, g -> e.
R2) Delete(X) can be invoked on Ni only if X ∈ Vi immediately prior to execution.
R3) For each dictionary entry X, there is at most one event insert(X) in the dictionary.
Dictionary Problem:
The problem of finding a distributed algorithm on n nodes such that each node can perform insert/delete/send/receive, subject to restrictions R1, R2, and R3.
Prior Work
[Figure: nodes P1, P2, P3, each storing the entire log (L1, L2, ...) alongside its dictionary copy; messages carry whole logs of insert X / insert Y records.]
• The whole log is sent in every message: excessive communication.
• The entire log is stored at every node: excessive storage cost.
• The stored log is used to calculate each dictionary entry:
      Y ∈ V(e) iff CY -> e with no Y-delete event g, g -> e.
  This is excessive calculation.
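To see why this is expensive, here is a minimal sketch (my own illustration; the helper happened_before and the record layout are assumptions, not the paper's) of the naive rule, which rescans the entire log for every entry:

    # Naive prior-work rule: Y is in the view at event e iff insert(Y)
    # happened before e and no delete(Y) happened before e.
    def view_at(e, log, happened_before):
        view = set()
        for r in log:
            if r.op == "insert" and happened_before(r, e):
                deleted = any(d.op == "delete" and d.key == r.key
                              and happened_before(d, e) for d in log)
                if not deleted:
                    view.add(r.key)
        return view
    # Every call rescans the whole log: O(|log|^2) happened-before checks.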
Proposed Solution
Data structures used:
Log data structure:
• 2-D Time Table Ti (a matrix timestamp).
• Partial log PLi.
Dictionary data structure:
• Vi: the set of dictionary entries.
Algorithm
Initialization:
    Vi = ∅; PLi = ∅; Ti[j,k] = 0 for all (j,k).
Insert(X) / Delete(X):
• Ti[i,i] = Clocki.
• PLi = PLi ∪ {<op, Ti[i,i], i>}.
• If op = Insert(X): Vi = Vi ∪ {X}.
• If op = Delete(X): Vi = Vi − {X}.
Send(m) to Nk:
• NP = {eR ∈ PLi | Ni cannot tell from its 2DTT Ti that Nk already knows about eR}.
• Send <NP, Ti> to Nk.
Receive(m) from Nk:
• m = <NPk, Tk>.
• NE = the records in NPk that Ni is not yet aware of.
• Vi = {X | (X ∈ Vi or an insert(X) record ∈ NE) and no delete(X) record ∈ NE}.
• Update Ti using the matrix-timestamp rule: take the element-wise maximum with Tk, and fold Tk's row k into Ti's row i.
• PLi = {eR ∈ PLi ∪ NE | at least one node, as far as Ni can tell, does not yet know about eR}.
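A minimal runnable sketch of the algorithm in Python (my own illustration: the class name Node, the helper _has_recvd, and the message format are assumptions, not the paper's). The key test is that node j is known to have received record r iff Ti[j][r.node] >= r.time.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rec:
        op: str     # "insert" or "delete"
        key: str    # the dictionary entry X
        time: int   # clock value at the originating node
        node: int   # id of the originating node

    class Node:
        def __init__(self, i, n):
            self.i = i                              # this node's id
            self.V = set()                          # dictionary copy Vi
            self.PL = set()                         # partial log PLi
            self.T = [[0] * n for _ in range(n)]    # 2-D time table Ti
            self.clock = 0

        def _has_recvd(self, T, j, r):
            # Node j is known (via time table T) to have received record r.
            return T[j][r.node] >= r.time

        def _log_local(self, op, key):
            self.clock += 1
            self.T[self.i][self.i] = self.clock
            self.PL.add(Rec(op, key, self.clock, self.i))

        def insert(self, key):
            self._log_local("insert", key)
            self.V.add(key)

        def delete(self, key):
            assert key in self.V                    # restriction R2
            self._log_local("delete", key)
            self.V.discard(key)

        def send(self, k):
            # Ship only the records Nk may not know about, plus Ti.
            NP = {r for r in self.PL if not self._has_recvd(self.T, k, r)}
            return NP, [row[:] for row in self.T]

        def receive(self, msg, k):
            NPk, Tk = msg
            NE = {r for r in NPk if not self._has_recvd(self.T, self.i, r)}
            # Apply new inserts, then drop keys with a new delete record.
            new_keys = {r.key for r in NE if r.op == "insert"}
            dels = {r.key for r in NE if r.op == "delete"}
            self.V = (self.V | new_keys) - dels
            n = len(self.T)
            for j in range(n):                      # matrix-timestamp merge
                self.T[self.i][j] = max(self.T[self.i][j], Tk[k][j])
                for m in range(n):
                    self.T[j][m] = max(self.T[j][m], Tk[j][m])
            # Keep a record only while some node may still not know it.
            self.PL = {r for r in self.PL | NE
                       if not all(self._has_recvd(self.T, j, r)
                                  for j in range(n))}

send returns the message instead of transmitting, so any transport can be used; receive takes the sender's id k to locate the sender's row in Tk.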
[Figure sequence, slides 11-17: a walkthrough on three nodes n1, n2, n3, showing each node's dictionary, partial log, and 2-D time table after every event. n1 executes Insert(X) and sends <insert x, T1> to n2 and n3, which add x to their copies. n2 then executes Insert(Y), and later messages carry batches such as <(insert x, insert y), T2> and <(insert z, insert y), T3> as z is inserted as well. Finally, <del x> propagates (e.g. <(del x, insert z), T1>) and x disappears from every dictionary copy. As the time tables fill in, event records that all nodes are known to have received are dropped from the partial logs.]
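The opening and closing steps of this walkthrough can be replayed against the sketch above (hypothetical driver code; node ids are 0-based here):

    # n1 inserts x and gossips to n2 and n3 (ids 0, 1, 2).
    nodes = [Node(i, 3) for i in range(3)]
    nodes[0].insert("x")
    for k in (1, 2):
        nodes[k].receive(nodes[0].send(k), 0)
        assert "x" in nodes[k].V

    # n1 deletes x; once the record reaches n2 and n3, x is gone everywhere.
    nodes[0].delete("x")
    for k in (1, 2):
        nodes[k].receive(nodes[0].send(k), 0)
    assert all("x" not in nd.V for nd in nodes)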
Comparison with other work

Proposed by         | Data structure used                                     | Disadvantage
Fischer and Michael | Dictionary data structures                              | An entire copy of the dictionary is sent in each message.
Allchin             | Synchronization set (SS ≈ partial log) and 1-D time table | The SS grows unboundedly.
Wuu & Bernstein     | Dictionary, log, and 2-D time table                     | The 2-DTT, of size O(n^2), is sent in every message.
Improving 2-DTT Message Complexity

Strategy 0: Store the complete 2DTT at each node, O(n^2); send the complete 2DTT in each message, O(n^2).
Message complexity is as high as O(n^2), since an n x n matrix is both sent and stored.

Strategy 1: Store the complete 2DTT at each node, O(n^2); a node sends only its own row, O(n).
Requires direct messages to update each row, and messages need to include more event records.

Strategy 2: Store the node's own row and its neighbors' rows, O(nk); send the corresponding row to each neighbor, O(n).
A node cannot determine when all nodes have come to know about an event; an event record is discarded once all neighbors know about it.

Strategy 3: Store all entries (rows & columns) corresponding to neighbors, O(k^2); send row info through the gateway nodes, O(k).
Better when the network is large and connectivity and communication are low.
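As an illustrative sketch of Strategy 1 (the merge rule below is the natural one implied by the slide, not spelled out on it), the receiver, which still stores the full table, folds the sender's row into both the sender's row and its own:

    # Strategy 1: sender k ships only its own row of the 2DTT.
    def merge_row(T, i, k, row_k):
        """Fold sender k's row into receiver i's full time table T."""
        for j in range(len(T)):
            T[k][j] = max(T[k][j], row_k[j])   # what k claims to know
            T[i][j] = max(T[i][j], row_k[j])   # i has now seen it too
        return T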
Extending The Proposed Solution
Replicated Numeric Data:
Numeric data supports add-to and subtract-from operations, which are commutative. The log/2DTT solution ensures that no matter in what order the operations are applied, the result is consistent.
So, result1 = b + a - c; result2 = b - c + a; result1 = result2.
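A tiny sketch of this property (hypothetical values, my illustration): applying the same set of logged operations in every possible delivery order yields a single result.

    from itertools import permutations

    # Logged operations on a replicated counter starting at b = 10.
    ops = [("+", 4), ("-", 7), ("+", 1)]

    def apply_all(start, ordered_ops):
        value = start
        for sign, amount in ordered_ops:
            value = value + amount if sign == "+" else value - amount
        return value

    # Every delivery order converges to the same value.
    results = {apply_all(10, perm) for perm in permutations(ops)}
    assert len(results) == 1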
Detection of Failure:
To distinguish node failure from communication failure, a log is used to collect records of communication events. Suppose node N1 has the 2DTT

    T1 = | 1 0 0 |
         | 0 0 0 |
         | 1 0 3 |

The column for node N2 is all zeros, so N1 knows that no node has received any info from N2. So N2 might be down.
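A minimal check in the same vein (my sketch; it assumes the column convention above, where column m records what each node has received from node m):

    def maybe_down(T, m):
        """Node m may be down if no node is known to have
        received anything from m (column m is all zeros)."""
        return all(row[m] == 0 for row in T)

    T1 = [[1, 0, 0],
          [0, 0, 0],
          [1, 0, 3]]
    assert maybe_down(T1, 1)   # 0-based index 1 is node N2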
Conclusion
• Mutual consistency of replicated data is achieved.
• The algorithm works well in an unreliable network:
  - Link failure / lost message: the info can be obtained from other nodes.
  - Node failure: the info is stored in the log and dictionary, which are stable storage.
• Excessive communication, computation, and storage costs are reduced:
  - Reduction of communication/storage cost: only the partial log is sent and stored.
  - Reduction of computation cost: the replicated log is used to compute other nodes' views of the data; only partial entries in the dictionary are re-calculated.
• A weaker consistency constraint is used.