EXPLOITING SYSTEM DIVERSITY IN PEER-TO-PEER PUBLISH-SUBSCRIBE SYSTEMS
Final Exam of Jay A. Patel (April 1, 2009)
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Computing Today
 Continued growth of Internet services
  search engines, webmail systems, social networks, blogging platforms, file sharing systems, etc.
 Many use the publish-subscribe paradigm
  Facebook, Twitter, YouTube
 Scaling via geographic distribution of infrastructure means moving around large sets of data
  content distribution, log collection, archiving data
System Diversity
 Successful distributed systems must provide high reliability, availability, performance, and scalability
 System diversity arises from variations in either
  resources or characteristics available to end hosts
   e.g., network diversity
  requirements of individual end hosts themselves, usually arising due to end user demands
   e.g., interest diversity
Many Forms of System Diversity
 Interest
  Subscription heterogeneity
 Network
  Bandwidth, location, connectivity, packet loss rates
 Platform
  Hardware and software
 Workload
  Time of day effects, flash crowds
 Other
  Availability
Taxonomies of Publish-Subscribe Systems
 Subscription Model
  Subject based
  Content based
 Architecture
  Client-Server
  Peer-to-Peer
 Delivery Mechanisms
  Unicast ("1-to-1")
  Single source multicast ("1-to-n")
  Convergecast ("n-to-1")
  General purpose multicast ("m-to-n")
 Example systems across these categories: RSS (contemporary delivery mechanism), Gryphon [Bhola03], Siena [Carzaniga01], FeedTree [Sandler05], Scribe [Castro02], SplitStream [Castro03], SpiderCast [Chockler07], AnySee [Liao06], IP Multicast, ESM [Chu02], RMTP [Paul96], Bayeux [Zhuang01], Gossip [Demers88], Bimodal Multicast [Birman99], e-mail listservs, Corona [Ramasubramanian06], Cobra [Rose07], Sub-2-Sub [Voulgaris05]
Thesis Focus
 We choose to focus on the convergecast and general purpose multicast paradigms as
  these paradigms remain the least explored of the lot with respect to system diversity
   Unicast is meant for simplicity of use
   Multicast is well studied
  these paradigms are most relevant in today's computing
   Multi-site cloud computing, grid computing, etc.
   Improved delivery for RSS
Thesis Statement
"We show that by directly addressing interest and network diversity as a first class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."
Thesis Contributions
 Confluence (n-to-1)
  A system for lossless collection of data from multiple sources to a single sink
  Exploits both spatial and temporal network diversity
 RANS Framework
  Provides realistic and deterministic simulation at large scales
  Helps model network, interest, and availability diversity
 Rappel (m-to-n)
  A system that leverages interest and network locality to improve fairness
  Exploits network and interest diversity
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Confluence
A System for Lossless Multi-Source Data Collection
Introduction
 New emerging paradigm: fetching large files from multiple sources to a central clearing house
  Multiple data centers (cloud computing)
  PlanetLab servers
  Tele-immersive video gateways
  Grid computing
 An n-to-1 publish-subscribe system
  Multiple sources = multiple publishers
  Single sink = single subscriber
The Objective
"To minimize the total time required to transfer the necessary files from the source nodes to the sink node."
 Currently, there are no known systems that optimize for this goal
  End users generally use the direct transfer (1-to-1) strategy to fetch files
Key Observation
 The diversity of connections amongst Internet hosts has been widely observed and falls into two categories
  Spatial diversity refers to the fact that different links have different bandwidth availabilities
  Temporal diversity refers to the variation over time of the available bandwidth at a single link
Motivating Example
[Figure: hosts x and y each hold 5000 MB destined for host t; the x–t link runs at 2 MBps (2500 secs), the y–t link at 1 MBps (5000 secs), and a 5 MBps link (1000 secs) between the two hosts lets the transfer exploit natural parallelism]
The transfer process can be sped up by routing data via intermediate nodes
• 37% of PlanetLab node pairs achieve better throughput by leveraging a third node
System Assumptions
 Files can be subdivided into blocks
 All files (file blocks) are unique
  no a priori replicas present in the system
 Source node failures do not occur
  if a source node fails, Confluence provides no resiliency guarantees
  acceptable, as the same problem exists with Direct Transfer
Network Graph Model
[Figure: each host x is modeled as nodes x+, x0, and x−, with ISP connectivity capacities Cx+ and Cx− and pairwise link capacities such as cxy, cyx, cxt, and cyt]
 The network graph G supports both
  asymmetric link capacities
  asymmetric ISP connectivity limits
Theoretical Solution (Spatial Only)
 Graph translation: G^T, where T is the transfer duration (each per-second capacity C becomes T·C)
 Add super source s
 Connect super source to nodes via file edges
  b_x capacity equals file size held
  total capacity of s: B = Σ_i b_i
 The maximum s→t0 flow in G^T
  corresponding flow: f^T
 Find the smallest time T* that can move B blocks from s to t0
  binary search on [0, Tmax)
  corresponding flow: f^(T*)
  optimal flow rates: f* = f^(T*)/T*
 Time complexity: O(log(VBC) · VE log(V²/E))
[Figure: the translated graph G^T, with file edges b_x and b_y out of s, and every ISP capacity (T·Cx+, T·Cx−, …) and link capacity (T·cxy, T·cxt, …) scaled by T]
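The binary search over max-flow feasibility can be sketched in a few lines of Python. This is a simplified stand-in, not Confluence's actual code: it models only file edges and per-link capacities (assumption: the ISP up/down node splitting is omitted), and the instance at the bottom uses the motivating example's numbers.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp on a dict-of-dicts of residual capacities.
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-9 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t                   # walk back to find the bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:                 # push flow, update residuals
            cap[u][v] -= push
            cap[v][u] += push
        total += push

def feasible(T, files, links, sink):
    # Build G^T: file edges carry b_x blocks; link edges carry T * c_xy.
    cap = defaultdict(lambda: defaultdict(float))
    for node, b in files.items():
        cap["s"][node] += b
    for (u, v), rate in links.items():
        cap[u][v] += T * rate
    return max_flow(cap, "s", sink) >= sum(files.values()) - 1e-6

def min_transfer_time(files, links, sink, t_max=1e6):
    # Binary search for the smallest feasible duration T*.
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = (lo + hi) / 2
        if feasible(mid, files, links, sink):
            hi = mid
        else:
            lo = mid
    return hi
```

On the motivating example, `min_transfer_time({"x": 5000, "y": 5000}, {("x", "t"): 2, ("y", "t"): 1, ("y", "x"): 5}, "t")` converges to 10000/3 ≈ 3333 secs, versus 5000 secs for direct transfer alone, because y reroutes part of its file through x's faster uplink.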
System Design
 Maintaining network graph G
  via a coordinator
 Link information
  staleness: large gap between probes (round robin)
  cost: must send data packets to measure
  our solution: k << n
   each participating node has k peers
   chosen uniformly at random by the coordinator
 Measuring cxy and Cx
  PathChirp [Ribeiro2003]
  PathBlast – brute force
   needed only once, during bootstrapping
Transfer Plan
 Coordinator calculates based on the theoretical solution
  Sends directives to nodes: how many blocks, to whom
  What about the optimal flow rate?
 First approach: Flow Control System [Mehra2003]
  Subpar performance with multiple flows
  Temporal diversity galore
 Better approach: dynamic adaptation at the application level
Dynamic Adaptation
 Data is pushed from senders to all receivers simultaneously
 Periodic recomputation
  Update network graph G with new state information
   the number of residual file blocks: b′x
   if rxy > f*xy × (1 − slack) then cxy = max(cxy, rxy), else cxy = max(cxy/2, rxy)
  Coordinator recalculates the transfer plan periodically
  State inconsistency: final_computation flag
 rxy – measured flow rate; f*xy – optimal flow rate; slack – to prevent hysteresis
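The capacity-update rule above transcribes directly into code (the `slack` default here is an arbitrary illustrative value, not one taken from the thesis):

```python
def update_capacity(c_xy, r_xy, f_star_xy, slack=0.1):
    """Refresh the capacity estimate c_xy for link x->y from the measured
    rate r_xy and the planned optimal rate f*_xy. A flow running close to
    (or above) plan confirms the estimate, raising it if the link did even
    better; a flow running well under plan suggests the estimate is stale,
    so halve it, but never below what was actually observed."""
    if r_xy > f_star_xy * (1 - slack):
        return max(c_xy, r_xy)
    return max(c_xy / 2, r_xy)
```

The `slack` term is what prevents hysteresis: without it, a flow measured a hair under its planned rate would trigger a halving, the next plan would under-allocate the link, and the estimate would oscillate.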
Exploiting Block Replication
 Replication naturally occurs during the transfer process
  e.g., node y holds some blocks originating at node x
 Conservation of blocks
  "Tag" blocks with origin node and unique ID
 File replication edge
  If the solution uses a file replication edge, the REPLICATED_BLOCKS directive lets the origin node retransmit
  purge-immediately policy
  Note: threshold for replication
[Figure: the translated graph with residual file edges b′x and b′y + byx, where byx is the replication edge for blocks of x already held at y]
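Tagging every block with its origin node and a unique ID is what makes conservation of blocks checkable. A minimal sketch of the idea (the `Block` type and `residual_blocks` helper are hypothetical, not Confluence's actual structures):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    origin: str    # node at which this block's file originated
    block_id: int  # unique ID within the origin's file

def residual_blocks(expected, received):
    """Blocks the sink has not yet seen. Because blocks are tagged with
    (origin, id), any node holding a replica can satisfy them, and the
    origin can retransmit exactly the survivors after an intermediate
    node purges its copies (purge-immediately policy)."""
    return expected - received
```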
Experimental Methodology
 Implemented using the ns2 network simulator
  TCP CUBIC as the transport layer protocol
 PlanetLab topology
  trace measurements collected via S3 [Yalagandula06]
 Up to 100 source nodes
  100 MB file per source node
 Confluence parameters
  Number of neighbors, k = 10
  Recomputation interval, p = 15
  Replication enabled with purge-immediately policy
Direct Transfer: Microbenchmark
[Figure: completion times for source pools of 49 and 10 nodes]
 Maximizing the number of parallel connections is the best approach
  Poorly-connected connections start earlier
Direct Transfer: Macrobenchmark
 100 total nodes
  50 runs: in run #i, node #i acts as the sink node
 First result: fetch from the first 50 nodes (49 source nodes)
 Second result: fetch from all nodes (99 source nodes)
 y-axis = ratio of second result to first result
 Direct Transfer scales well with an increasing number of source nodes
  Well-connected nodes are better able to exploit parallelism
Confluence vs. Direct Transfer
 50 participating nodes, 50 different runs
  in run #i, node #i acts as the sink node; other nodes act as source nodes
  each sink node fetches from all other participating nodes
 Confluence performs better than Direct Transfer
  70% perform better with Confluence on a planetary scale
  90% perform better with Confluence on a continental scale
Overheads in Confluence
 Measuring network graph G
  may be inaccurate, stale, or both
  k peers may not be able to saturate the sink
 Delayed start
  metadata must be collected by the coordinator, the solution calculated, and transfer plan directives sent
 State inconsistency
  The final set of blocks, sent directly to the coordinator, may take some time to finish
 Tracking each block
Confluence vs. Direct Transfer II
 Confluence performs well on different planetary scale topologies
  All nodes perform better in 3 topologies, 90% in one, and 70% in the other
 Confluence excels in scenarios with a small number of source nodes
  100% under 50 nodes, 80% with 75 nodes, 70% with 100 nodes
Related Work
 CoBlitz
  uses PlanetLab nodes to improve 1-to-1 transfers
 GridFTP
  tool to move around large sets of data on the grid
 BitTorrent [Cohen03], CDNs [Akamai]
  Multiple replicas of data available
 Miscellaneous
  Estimating bandwidth: pathChirp [Ribeiro03], PTR/IGI [Hu03], Pathload [Jain02], pathrate [Dovrolis04], Pathneck [Hu04], etc.
  TCP for LFNs: High-Speed TCP [Chase01], TCP BIC [Xu04], TCP CUBIC [Rhee05], FAST TCP [Wei06], TCP-Illinois [Liu06], etc.
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
RANS Framework
Realistic and Deterministic Simulation at Large Scales
Motivation
 Need to study system diversity at large scales
  network diversity: end-to-end latency fluctuations
  interest diversity: subscription correlation
 System deployment is labor-intensive and limited
  PlanetLab usually only has about 400 accessible nodes
  Experiments are not replayable
 Simulations provide an alternative, but are not realistic
Introduction
 RANS objectives
  Realism
   simulation results should match deployment observations
   simulation should be run using the same code as an actual implementation
  Deterministic replay
   an unmodified application should yield the same result when provided with identical input as a previous execution
  Large scale
   ability to simulate several thousand end nodes
   selective granularity simulation
Application Programming Interface
 An application interfaces with the EventManager and the TransportManager
 An application can run multiple protocols
Events
Messages
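A minimal sketch of what the deterministic event core behind such an API might look like (the class and method names here are illustrative, not RANS's actual interface): deterministic replay falls out of firing callbacks in a total order, timestamp first and insertion sequence as tie-breaker.

```python
import heapq

class EventManager:
    """Discrete-event core sketch: applications schedule callbacks, and
    the manager fires them in deterministic order. Because ties on the
    timestamp are broken by a monotonically increasing sequence number,
    an unmodified application replays identically on identical input."""
    def __init__(self):
        self._queue = []
        self._seq = 0     # tie-breaker so equal timestamps replay identically
        self.now = 0.0

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
```

A TransportManager would sit on top of the same queue, scheduling message-delivery events with latencies drawn from the topology model below.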
Implementation
 Sockets implementation
  single-threaded, uses boost::asio
 Trace-based Simulator
  Overnet churn trace
Topology Manager
 Goal: realistic end-to-end latency fluctuations
 PlanetLab RTT trace with fluctuations
  226 end hosts, over 4 hours
  continuous fluctuations between node pairs (median latency)
 Internet AS topology
  20062 stub networks, 175 transit networks, and 8279 transit-and-stub networks
 Problem: artificial topology, limited scale of trace data
 Solution: topology fitting via trial and error
  Match simulator-generated latencies with PlanetLab median latencies
   the first 10% of inter-AS links: latency between 0 ms and 4 ms
   the next 30% of inter-AS links: latency between 4 ms and 30 ms
   the final 60% of inter-AS links: latency between 30 ms and 115 ms
  Map each simulator node pair to a random PlanetLab node pair (for fluctuations)
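The bucketed latency assignment can be sketched as follows. This is a simplified stand-in: `assign_link_latencies` is an illustrative name, and the within-bucket draw is assumed uniform, which the slides do not specify.

```python
import random

def assign_link_latencies(n_links, seed=0):
    """Assign per-link base latencies (ms) by the fitted bucket scheme:
    first 10% of inter-AS links in [0, 4), next 30% in [4, 30),
    final 60% in [30, 115). A fixed seed keeps the topology replayable."""
    rng = random.Random(seed)
    latencies = []
    for i in range(n_links):
        frac = i / n_links          # link's position in the ordered link list
        if frac < 0.10:
            lo_ms, hi_ms = 0, 4
        elif frac < 0.40:
            lo_ms, hi_ms = 4, 30
        else:
            lo_ms, hi_ms = 30, 115
        latencies.append(rng.uniform(lo_ms, hi_ms))
    return latencies
```

These base latencies give the static topology; the per-pair fluctuations then come from mapping each simulated node pair onto a traced PlanetLab pair, as the last bullet above describes.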
Topology Fitting Results
 The latencies modeled by RANS closely match the latencies experienced within PlanetLab
Simulation vs. PlanetLab: Rappel Per-Feed Dissemination Tree
Experiment: 1 publisher, 250 subscribers, 4 hours, 1 update every minute
Close match of results validates the RANS framework
95% of nodes consistently receive updates in under 0.5 seconds
Simulation vs. PlanetLab II
Close match of results validates the RANS framework
Stretch ratio w.r.t. the underlying coordinate space is lower
Related Work
 Network simulators
  ns2, QualNet, OPNET
 Emulators
  ModelNet, EmuLab
 Testbed
  PlanetLab
 Application-level network simulators
  p2psim, GnutellaSim
 Programming libraries/languages for rapid deployment
  Macedon, P2
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Rappel
Using Locality to Improve Fairness in Publish-Subscribe Systems
System Goals
 Peer-to-peer delivery for RSS
  Low publisher and subscriber overhead
  Optimizations due to the m-to-n delivery paradigm
 Client fairness
  Zero noise
  Load should scale with the number of subscriptions
  Exploit interest diversity
 Real-time update dissemination
  Via a low stretch ratio
  Exploit network diversity
Design Overview
[Figure: a friends overlay connecting nodes A–H, alongside per-feed dissemination trees rooted at publishers P1…Pi]
 A global control plane
  Used to locate other nodes ("friends") close in interest- and network-proximity
  The "friends overlay"
 A multicast dissemination tree per feed
  Can be joined by contacting any active node
  Only subscribers join a feed's dissemination tree
   Eliminates noise
Techniques Utilized in Rappel
 Friends Overlay
  Utility-based selection of friends
   Exploit interest and network locality using Bloom filters and network coordinates
  Discovering new nodes via gossip
  Periodic audits
 Per-feed Dissemination Trees
  Primitives based on network coordinates
  Periodic rejoin
  Push-pull of updates
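The two locality signals behind utility-based friend selection can be sketched together. This is illustrative only: `bloom` and `candidate_utility` are hypothetical helpers, and Rappel's actual utility function weighs the signals differently than this simple ratio.

```python
import hashlib
import math

def bloom(subscriptions, m=256, k=4):
    """Bloom filter (as an m-bit integer) of a node's feed subscriptions,
    so peers can estimate interest overlap without exchanging full lists."""
    bits = 0
    for sub in subscriptions:
        for i in range(k):
            h = int(hashlib.sha256(f"{i}:{sub}".encode()).hexdigest(), 16) % m
            bits |= 1 << h
    return bits

def candidate_utility(my_bloom, peer_bloom, my_coord, peer_coord, alpha=1.0):
    """Score a candidate friend: shared filter bits proxy for interest
    locality, and network-coordinate distance penalizes far-away peers."""
    shared = bin(my_bloom & peer_bloom).count("1")
    dist = math.dist(my_coord, peer_coord)  # e.g., Vivaldi-style coordinates
    return shared / (1.0 + alpha * dist)
```

A node would score candidates learned via gossip with such a function and, during periodic audits, keep only the top-utility peers as friends.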
Exploiting Network Diversity
Experiment: 1 publisher (at UIUC), 25 subscribers (distributed across the USA)
Rappel's per-feed dissemination trees exploit network diversity
Exploiting Interest Diversity
Experiment: 250 feeds, 5582 subscribers (simulation only)
91% of clients perfectly covered; no "wasted" friends
Rappel's friendship overlay effectively exploits interest diversity
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Concluding Remarks
"We show that by directly addressing interest and network diversity as a first class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."
 Confluence exploits spatial and temporal network diversity for a convergecast delivery based publish-subscribe system
 The RANS framework provides realistic and deterministic simulation to study system diversity at large scales
 Rappel exploits both interest and network diversity via the use of locality to improve fairness and system performance