The OceanStore Regenerative
Wide-area Location Mechanism
Ben Zhao
John Kubiatowicz
Anthony Joseph
Endeavor Retreat, June 2000
Wide-area Location
 Increasing scale makes location more vital to distributed systems
 Existing wide-area location mechanisms:
– SLP - WASRV extension
– LDAP centroids
– Berkeley SDS
– Globe location system
 Unresolved issues:
– True Scalability
– Fault tolerance against:
• Single/multiple node failures
• Network partitions
• Data corruption and malicious users
• Denial of Service attacks
– Support for high update rates / mobile data
 OceanStore Approach
– Wide-area location using Plaxton trees
Previous Work: Plaxton Trees
 Distributed tree structure where every node is the root of a tree
 Simple mapping from object ID to root ID of the tree it belongs to
 Nodes keep “nearest” neighbor pointers differing in 1 ID digit
– Along with a referrer map of closest referrer nodes
 Insertion:
– Insert object at local node
– Proceed to root node hop by hop
– Leave back-pointers to object at each hop
 Query:
– Proceed to root node hop by hop
– If intersect node w/ desired back-pointer, follow it
– If root or best approximate node reached and no pointer to object, then it has not been inserted
 Benefits:
– Decouples tree traversal from any single node
– Exploits locality with short-cutting back-pointers
– Guaranteed # of hops O(Log(N)) where N = size of namespace
[Figure: inserting Obj #62942 and searching for Obj #62942 – hop-by-hop routes from the search client toward the root node, through nodes labeled 116, 675, 479, 529, 109, and 629]
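A minimal sketch, not the authors' code, of the digit-resolution step behind the hop-by-hop routing above: each hop moves to a neighbor matching one more trailing digit of the object ID, stopping at the root (or at the best approximation when no closer neighbor is known). The suffix-matching convention, decimal IDs, and the neighbor_map layout are assumptions made for illustration.

from typing import Dict, List, Optional

DIGITS = 5   # fixed ID length assumed for this sketch

def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits the two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-(n + 1)] == b[-(n + 1)]:
        n += 1
    return n

def next_hop(node_id: str, target_id: str,
             neighbor_map: Dict[int, Dict[str, str]]) -> Optional[str]:
    """Neighbor matching one more trailing digit of target_id, or None."""
    level = shared_suffix_len(node_id, target_id)
    if level == DIGITS:
        return None                           # this node is the root for target_id
    wanted_digit = target_id[-(level + 1)]
    # neighbor_map[level][digit]: closest known node sharing the current
    # suffix and having `digit` in the next position
    return neighbor_map.get(level, {}).get(wanted_digit)

def route(start_id: str, target_id: str,
          nodes: Dict[str, Dict[int, Dict[str, str]]]) -> List[str]:
    """Follow neighbor pointers from start_id toward target_id's root."""
    path, current = [start_id], start_id
    while True:
        nxt = next_hop(current, target_id, nodes[current])
        if nxt is None:                       # root or best approximate node reached
            return path
        path.append(nxt)
        current = nxt

A query walks this path and stops early if it meets a node holding a back-pointer for the object; an insertion walks the same path and leaves back-pointers at every hop.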
Introducing Sibling Meshes
 Set of meta-tree structures that assist node-insertion and
fault-recovery
– Every node keeps n ptrs to “similar” nodes w.r.t. 1 property
– Example: all nodes ending in 629 belong to a single mesh
– Each node belongs to Log(N) meshes,
where N = number of unique IDs in namespace
– Meshes decrease in size as granularity becomes finer
[Figure: mesh hierarchy for nodes ending in 629 – Plaxton trees at the ground level, the 9 level and 29 level meshes with sibling pointers and single hops to the root, and the OceanStore canopy with a single path to the root]
• Each mesh represents a single hop on the route to a given root.
• Sibling nodes maintain pointers to each other.
• (optional) Each referrer has pointers to the desired node’s siblings
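As a concrete illustration of the suffix grouping above, a node's sibling meshes can be named by its ID suffixes, one mesh per suffix length, so a D-digit ID belongs to D (≈ Log(N)) meshes. The helper name mesh_keys and the decimal IDs are assumptions for illustration only.

from typing import List

def mesh_keys(node_id: str) -> List[str]:
    """Suffixes of node_id, from the largest mesh (1 trailing digit)
    down to the smallest, finest-grained mesh (the full ID)."""
    return [node_id[-k:] for k in range(1, len(node_id) + 1)]

# Example: mesh_keys("10629") -> ["9", "29", "629", "0629", "10629"],
# i.e. this node is a sibling of every other node ending in 9, 29, 629, ...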
Building a Plaxton Grove
 Incremental node addition algorithm
For new node Nn to be added with nodeID D:
– Do a hop by hop search for D
– At each hop, visit X closest nodes
– For each node Ni in set X:
• Integrate next neighbor map from Ni’s neighbor map
• Take referrer list from Ni
• Measure distance between each referrer and Nn
• If new distance shorter, notify referrer to point to Nn (see the sketch below)
– Stop when no exact match for the next digit of the ID is found
[Figure: domains of influence by ID granularity – at each hop the new node Nn aggregates neighbor and referrer maps from the closest nodes, and referrers looking for IDs closer to Nn are redirected to point to Nn]
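A rough, self-contained sketch of the referrer-redirection step in the list above: referrers currently pointing at a visited node Ni are switched to the new node whenever the new node is strictly closer. The dict-based referrer lists and the latency table standing in for network distance are illustrative assumptions, not the system's actual structures.

from typing import Dict, List, Tuple

def redirect_referrers(new_id: str, visited_id: str,
                       referrers: Dict[str, List[str]],
                       latency: Dict[Tuple[str, str], float]) -> List[str]:
    """Move referrers of visited_id over to new_id when new_id is closer."""
    redirected = []
    for ref_id in list(referrers.get(visited_id, [])):   # copy: we mutate below
        if latency[(ref_id, new_id)] < latency[(ref_id, visited_id)]:
            referrers[visited_id].remove(ref_id)
            referrers.setdefault(new_id, []).append(ref_id)
            redirected.append(ref_id)
    return redirected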
Node Removal / Failure
 Simple detection of pointer corruption and node failure
 Recovery from node failure and data corruption
– Mark node pointer with invalid tag
– Use next closest sibling of failed node
 Invalid pointers have a second-chance time-to-live
– Failures expected to recover within finite timeframe
– Entries marked invalid with countdown timer
– Each request has some chance of being forwarded to invalid
node, in order to check if recovery has been completed
– Referrer tracks traffic to failed node and assigns each
packet a “validation” probability
– Restarted node notifies referrers to remove invalid tag
– Nodes which fail to recover within timer period must reinsert
as new nodes
 Node removals = intentional exits from system
– Actively announce removal to referrers; invalidation is skipped
– Referrers maintain backups by requesting another sibling ptr
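A minimal sketch, under assumed data structures, of the second-chance handling above: an invalidated pointer carries a countdown timer, each request has a small probability of being forwarded to the failed node to probe for recovery, and a pointer that never recovers simply expires. The class name, TTL, and probe probability are illustrative choices, not values from the system.

import random
import time

class PointerEntry:
    """One neighbor/referrer pointer with second-chance invalidation."""

    def __init__(self, node_id: str, ttl_seconds: float = 300.0,
                 probe_probability: float = 0.05):
        self.node_id = node_id
        self.invalid_until = None          # None means the pointer is valid
        self.ttl = ttl_seconds
        self.probe_probability = probe_probability

    def mark_invalid(self) -> None:
        """Tag the pointer invalid and start the countdown timer."""
        self.invalid_until = time.time() + self.ttl

    def mark_recovered(self) -> None:
        """Restarted node notified us: clear the invalid tag."""
        self.invalid_until = None

    def expired(self) -> bool:
        """True once the failed node has missed its recovery window."""
        return self.invalid_until is not None and time.time() > self.invalid_until

    def should_forward(self) -> bool:
        """Forward normally if valid; probe occasionally while invalid."""
        if self.invalid_until is None:
            return True
        if self.expired():
            return False                   # node must reinsert as a new node
        return random.random() < self.probe_probability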
Self-tuning and Stability
Self-maintenance built into searches:
 Self tuning of non-optimal routes
– Keep running totals of subpath distances
– Inform nodes of better routes
 Stability
– Multiple secure hashing algorithms applied to incoming query
– Secure hash prevents targeted server attacks
– Multiple hashes multiplex the query into parallel mappings
• Simulates multiple mappings of nodeIDs onto physical nodes
• Overlap provides additional security against single/multiple server failures, network partitions, and corrupted data
• Provides this without additional pointer storage overhead
 Temporary map pointers
– Referrer and Neighbor entries have time-to-live fields
– Renewal by usage (implicit) or explicit renewal messages
– Implicit priority queue where least often used paths can be
“forgotten” in favor of more vital paths
– Natural node recovery, wait for messages to renew maps
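A sketch of how one incoming query might be multiplexed into parallel mappings, as described above. The slide only says "multiple secure hashing algorithms"; salted SHA-256 and truncation to a fixed decimal ID length are assumptions made for illustration.

import hashlib
from typing import List

def query_roots(object_id: str, num_mappings: int = 3, digits: int = 5) -> List[str]:
    """Derive one root ID per salted secure hash of the object ID."""
    roots = []
    for salt in range(num_mappings):
        digest = hashlib.sha256(f"{salt}:{object_id}".encode()).hexdigest()
        # truncate the digest to the ID length used by the routing layer
        roots.append(str(int(digest, 16) % 10 ** digits).zfill(digits))
    return roots

# The query is then routed toward each root in parallel; any surviving
# mapping can answer even if another root fails or is partitioned away.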
Node Replication
 Remove single node bottleneck
– Critical for load balancing
– Adds fault-tolerance at Root nodes
 Node replication
– Copy Neighbor-mapping, then regenerate
– Redirect referrer traffic
 Replicate Groups
– Share referrer mappings
– Use peer monitoring to detect node failure and redirect traffic as necessary
[Figure: node 629 replicated into a group of replicas, alongside nodes 116]
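A small sketch of how a referrer might spread traffic across a replica group and skip replicas that peer monitoring has flagged as failed. The round-robin policy and the class below are illustrative assumptions, not the system's actual mechanism.

import itertools
from typing import Iterable

class ReplicaGroup:
    """Referrer-side view of a group of replicated nodes."""

    def __init__(self, replica_ids: Iterable[str]):
        self.replicas = list(replica_ids)
        self.alive = set(self.replicas)
        self._cycle = itertools.cycle(self.replicas)

    def mark_failed(self, replica_id: str) -> None:
        """Peer monitoring reported this replica as down."""
        self.alive.discard(replica_id)

    def mark_recovered(self, replica_id: str) -> None:
        self.alive.add(replica_id)

    def next_target(self) -> str:
        """Next live replica, balancing load round-robin across the group."""
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if candidate in self.alive:
                return candidate
        raise RuntimeError("no live replicas in group")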
Malicious Users and DoS (Ongoing...)
 Misleading/malicious advertisement
– Source validation at storage nodes
– Orthogonal mechanism ensures association between advertisement and principal of trust
 Denial of Service attacks
– Overload of infrastructure nodes
• Routing and storage load distribution via node replication
– DoS source identification
• Probabilistic source packet stamping (Savage et al.)
 Invalidation propagation
– Invalidations can be given by authoritative servers
– Can propagate as datagram to referrers
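A simplified sketch in the spirit of probabilistic packet marking (Savage et al.): each forwarding node overwrites a packet's stamp with its own identity with small probability p, otherwise incrementing a hop-distance counter, so that a victim receiving many packets can reconstruct the path of a sustained attack. Field names and the marking probability are illustrative.

import random

MARK_PROBABILITY = 0.04   # small constant probability, as in the traceback literature

def maybe_stamp(packet: dict, router_id: str, p: float = MARK_PROBABILITY) -> dict:
    """Return a copy of the packet, possibly stamped with this router's identity."""
    if random.random() < p:
        return dict(packet, stamp=router_id, stamp_distance=0)
    if "stamp" in packet:
        return dict(packet, stamp_distance=packet["stamp_distance"] + 1)
    return packet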
More Ongoing Issues
 Support for mobility
– Mobile data
• All back-pointers point to initial node (Ninit)
• Location updates sent to previous node and Ninit
• All back-pointers other than from the root can be updated to the new position by the current query
• Traveled node pointers time out via pointer expiration
• On failure, revert to the root, then to Ninit for the current position (see the sketch below)
– Mobile clients and asynchronous operations
• Chain location updates on visited nodes
• When expected asynchronous requests are satisfied, the node leaves the chain and informs the previous node of the forward link
 OceanStore location as routing infrastructure
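The mobility items above are ongoing design notes rather than a finished protocol, but the lookup order they imply can be sketched as follows. The dict-based tables, function names, and fallback order (cached forwarding pointer, then the root, then Ninit) are illustrative assumptions.

from typing import Dict, Optional

def move_object(object_id: str, new_position: str,
                previous_table: Dict[str, str], init_table: Dict[str, str]) -> None:
    """On a move, send location updates to the previous node and to Ninit."""
    previous_table[object_id] = new_position
    init_table[object_id] = new_position

def locate_mobile(object_id: str, forwarding_ptrs: Dict[str, str],
                  root_table: Dict[str, str], init_table: Dict[str, str]) -> Optional[str]:
    """Return the first known position, preferring fresher local pointers."""
    for table in (forwarding_ptrs, root_table, init_table):
        position = table.get(object_id)
        if position is not None:
            forwarding_ptrs[object_id] = position   # refresh the local back-pointer
            return position
    return None                                     # object not inserted anywhere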