Download Decentralized Location Services

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

IEEE 1355 wikipedia , lookup

Airborne Networking wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Brocade Communications Systems wikipedia , lookup

Routing wikipedia , lookup

Routing in delay-tolerant networking wikipedia , lookup

Transcript
Tapestry
Architecture and status
UCB ROC / Sahara Retreat
January 2002
Ben Y. Zhao
[email protected]
Why Tapestry
Today’s Internet

Route failures not uncommon


IPv4 constrains deployment of new protocols


BGP too slow to recover, redundant routes unexploited
IP multicast, security protocols (DDoS traceback), …
Wide-area applications straining existing systems

Scalable management of large scale resources
Our goals

Wide-area scalable network overlay





Highly fault-tolerant routing / location
Introspective / self-tuning platform
Support application-specific protocols
Efficient (b/w, latency) data delivery
Pass on wide-area solutions to application layer
ROC/Sahara Retreats, 1/2002
2
What is Tapestry?
A prototype of a decentralized, fault-tolerant, adaptive
overlay infrastructure
(Zhao, Kubiatowicz, Joseph et al. 2000)
Network substrate of OceanStore


Routing: Suffix-based hypercube
Similar to Plaxton, Rajamaran, Richa (SPAA97)
Decentralized location:
Virtual hierarchy per object with cached location references
Dynamic algorithms using local information
Core API:



publishObject(ObjectID)
routeMsgToObject(ObjectID)
routeMsgToNode(NodeID)
ROC/Sahara Retreats, 1/2002
3
Routing and Location
Namespace (nodes and objects)


160 bits length  280 names before name collision
Each object has its own hierarchy rooted at Root
f (ObjectID) = RootID, via a dynamic mapping function
Suffix routing from A to B


At hth hop, arrive at nearest node hop(h) such that:
hop(h) shares suffix with B of length h digits
Example: 5324 routes to 0629 via
5324  2349  1429  7629  0629
Object location:


Root responsible for storing object’s location
Publish / search both route incrementally to root
ROC/Sahara Retreats, 1/2002
4
Tapestry Mesh
Incremental suffix-based routing
3
4
NodeID
0x79FE
NodeID
0x23FE
NodeID
0x993E
4
NodeID
0x035E
3
NodeID
0x43FE
4
NodeID
0x73FE
3
NodeID
0x44FE
2
2
4
3
1
1
3
NodeID
0xF990
2
3
NodeID
0x555E
2
NodeID
0xABFE
1
NodeID
0x73FF
ROC/Sahara Retreats, 1/2002
NodeID
0x04FE
NodeID
0x13FE
4
NodeID
0x9990
1
2
2
NodeID
0x423E
3
NodeID
0x239E
1
NodeID
0x1290
5
Object Location
Randomization and Locality
ROC/Sahara Retreats, 1/2002
6
Fault-tolerant Routing
Strategy:


Detect failures via soft-state probe packets
Route around problematic hop via backup pointers
Handling:



3 forward pointers per outgoing route
(2 backups)
2nd chance algorithm for intermittent failures
Upgrade backup pointers and replace
Protocols:


First Reachable Link Selection (FRLS)
Proactive Duplicate Packet Routing
ROC/Sahara Retreats, 1/2002
7
Talk Outline
Tapestry overview
Architecture
Evaluation
Brocade
Conclude
ROC/Sahara Retreats, 1/2002
8
Architecture Background
OceanStore implementation


Java with asynchronous I/O
Event-based, stage driven architecture
(Sandstorm – M. Welsh)
Applications
OceanStore
Tapestry
Sandstorm (async I/O, event arch.)
Java Virtual Machine
Operating System
ROC/Sahara Retreats, 1/2002
9
Key Stages
StaticTClient / Federation

Uses config files to bootstrap initial Tapestry
DynamicTClient

Integrates new nodes into static Tapestry
Router

Primary handler of routing and location
Patchwork

Introspective monitoring and fault-detection
Applications
Dynamic TClient
Patch OceanStore
Static TClient
work
Router
Sandstorm (async I/O, event arch.)
ROC/Sahara Retreats, 1/2002
10
Static TClient
Federation used as rendezvous point
Pair-wise pings to generate route tables
Federation used as global barrier to begin
S
S
F
S
ROC/Sahara Retreats, 1/2002
1. Si says hello to F
2. F informs group of Si
S
3. Nodes do pair-wise pings
4. Nodes signal readiness
5. Barrier reached at F,
signals start
11
Dynamic TClient
Node Integration
1. Hill-climb to find nearest Gateway
2. Route to surrogate / copy routes
3. Move relevant objects to new root
4. Directed multicast notifies nearby nodes
Routes Request
F
Routes Response
G
S
Moving Object Pointers
Directed Multicast
ROC/Sahara Retreats, 1/2002
?
12
Routing / Location
Router class
Maintains:


RoutingTable:
[ ][ ] of RouteEntries
ObjectPointers:
Hash(Guid)PublishInfo
Hash(Guid)LastHop
Handles:


Object publication / unpublication / mobile objects
Route / location message handling
ROC/Sahara Retreats, 1/2002
13
Patchwork
Fault-handling / introspective stage


Granulated periodic beacons measure loss and
network latency to entries in routing table
Promote/demote routes in single RouteEntry
A
BB
CA
C
B
A
X
C
network
ROC/Sahara Retreats, 1/2002
Router
14
Deployment Status
 Object Location
 Publish / unpublish / route to object
 Mobile objects (backtracking unpublish)
 Active deletes, confirmation of non-existence
 General Routing
 Route to node, redundant routes
 Soft-state fault-detection, limited optimization
 Advanced policies for fault recovery
 Dynamic Integration
 Integration w/ limited optimizations
 Best effort fault-resilient integration mechanisms
 Background threads for optimization / refresh
ROC/Sahara Retreats, 1/2002
15
Talk Outline
Tapestry overview
Architecture
Evaluation
Brocade
Conclude
ROC/Sahara Retreats, 1/2002
16
Generalized Results
Cached object pointers
Efficient lookup for nearby objects
 Reasonable storage overhead

Multiple object roots
Improves availability under attack
 Improves performance and perf. stability

Reliable packet delivery
Redundant pointers approximate optimal
reachability
 FRLS, a simple fault-tolerant UDP protocol

ROC/Sahara Retreats, 1/2002
17
First Reachable Link Selection
Use periodic UDP packets to
gauge link condition
Packets routed to shortest
“good” link
Assumes IP cannot correct
routing table in time for
packet delivery
IP
A
B
C
D
E
ROC/Sahara Retreats, 1/2002
Tapestry
No path exists to dest.
18
Some Numbers
Measurements





PIII 800, L2.2.18, IBM JDK 1.3
Simulating 6 nodes
(4 staticTC, 1 federation, 1 dynamicTC)
Publishing / locating ~10 objects
PublishMsg, RouteMsg: ~ 0-2 ms
Integration: ~2600ms (w/ pings)
Integration messages:


Assuming latency data available
2 x n (routing and objects)
16M (directed multicast notification) (M  3)
ROC/Sahara Retreats, 1/2002
19
Talk Outline
Tapestry overview
Architecture
Evaluation
Brocade
Conclude
ROC/Sahara Retreats, 1/2002
20
Landmark Routing on P2P
Brocade


Exploit non-uniformity
Minimize wide-area routing hops / bandwidth
Secondary overlay on top of Tapestry

Select super-nodes by admin. domain


Super-nodes form secondary Tapestry


Divide network into cover sets
Advertise cover set as local objects
Routing (AB) uses brocade to route directly into
B’s local network
ROC/Sahara Retreats, 1/2002
21
Brocade Mechanisms
Selective utilization


Nodes cache local cover set
Only utilize brocade if dest. not in cache
Forwarding messages to supernodes
1.
2.
Super-node does IP-snooping
Direct: cover set caches supernode
Inter-domain routing: AB
1.
2.
3.
ASN(A) via IP
SN(A) finds SN(B) via Tapestry location
SN(B)B via Tapestry/Chord/Pastry/CAN
ROC/Sahara Retreats, 1/2002
22
Brocade Routing RDP
Brocade Latency RDP 3:1
Relative Delay Penalty
Original Tapestry
IP Snooping Brocade
Directed Brocade
5
4.5
4
3.5
3
2.5
2
1.5
1
0.5
0
2
4
6
8
10
12
14
16
18
20
22
24
26
Interdomain-adjusted Latency on Optimal Route
Local cover set cache on; interdomain:intradomain = 3:1
Packet simulator, Transit-stub 4096 T nodes, 16 SuperN
ROC/Sahara Retreats, 1/2002
23
Brocade Bandwidth Usage
Brocade Aggregate Bandwidth Usage
Original Tapestry
IP Snooping Brocade
Directed Brocade
Approx. BW per Message
60
50
40
30
20
10
0
2
4
6
8
10
12
14
Physical Hops in Optimal Route
Local cover set cache on
B/W unit: (sizeof (Msg) * Hops)
ROC/Sahara Retreats, 1/2002
24
Ongoing / Future Work
Fill in full functionality

Fault-handling policies, introspection, self-repair
More realistic experiments


Artificial topologies on SOSS simulator
Larger scale dynamic integration experiments
Code development

External deployment / Code release




Sprint programmable routers
Academic networks
Introspective measurement platform
Implementing applications (Bayeux, Brocade … )
ROC/Sahara Retreats, 1/2002
25
For More Information
Tapestry and related projects (and these slides):
http://www.cs.berkeley.edu/~ravenben/tapestry
OceanStore:
http://oceanstore.cs.berkeley.edu
Related papers:
http://oceanstore.cs.berkeley.edu/publications
http://www.cs.berkeley.edu/~ravenben/publications
[email protected]
ROC/Sahara Retreats, 1/2002
26