EXPLOITING SYSTEM DIVERSITY IN PEER-TO-PEER PUBLISH-SUBSCRIBE SYSTEMS
Final Exam of Jay A. Patel (April 1, 2009)
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Computing Today
 Continued growth of Internet services
  search engines, webmail systems, social networks, blogging platforms, file sharing systems, etc.
 Many use the publish-subscribe paradigm
  Facebook, Twitter, YouTube
 Scaling via geographic distribution of infrastructure means moving around large sets of data
  content distribution, log collection, archiving data
System Diversity
 Successful distributed systems must provide high reliability, availability, performance, and scalability
 System diversity arises from variations in either
  resources or characteristics available to end hosts
   e.g., network diversity
  requirements of individual end hosts themselves, usually arising due to end user demands
   e.g., interest diversity
Many Forms of System Diversity
 Interest
  Subscription heterogeneity
 Network
  Bandwidth, location, connectivity, packet loss rates
 Platform
  Hardware and software
 Workload
  Time of day effects, flash crowds
 Other
  Availability
Taxonomies of Publish-Subscribe Systems
 Subscription Model
  Subject based
  Content based
 Architecture
  Client-Server
  Peer-to-Peer
 Delivery Mechanisms
  Unicast ("1-to-1")
  Single source multicast ("1-to-n")
  Convergecast ("n-to-1")
  General purpose multicast ("m-to-n")
 Example systems across these categories: RSS (contemporary delivery mechanism), Gryphon [Bhola03], Siena [Carzaniga01], FeedTree [Sandler05], Scribe [Castro02], SplitStream [Castro03], SpiderCast [Chockler07], AnySee [Liao06], IP Multicast, ESM [Chu02], RMTP [Paul96], Bayeux [Zhuang01], Gossip [Demers88], Bimodal Multicast [Birman99], e-mail listservs, Corona [Ramasubramanian06], Cobra [Rose07], Sub-2-Sub [Voulgaris05]
Thesis Focus
 We choose to focus on the convergecast and general purpose multicast paradigms as
  these paradigms remain the least explored of the lot with respect to system diversity
   Unicast is meant for simplicity of use
   Multicast is well studied
  these paradigms are most relevant in today's computing
   Multi-site cloud computing, grid computing, etc.
   Improved delivery for RSS
Thesis Statement
"We show that by directly addressing interest and network diversity as a first class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."
Thesis Contributions
 Confluence (n-to-1)
  A system for lossless collection of data from multiple sources to a single sink
  Exploits both spatial and temporal network diversity
 RANS Framework
  Provides realistic and deterministic simulation at large scales
  Helps model network, interest, and availability diversity
 Rappel (m-to-n)
  A system that leverages interest and network locality to improve fairness
  Exploits network and interest diversity
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Confluence
A System for Lossless Multi-Source Data Collection
Introduction
 New emerging paradigm: fetching large files from multiple sources to a central clearing house
  Multiple data centers (cloud computing)
  PlanetLab servers
  Tele-immersive video gateways
  Grid computing
 An n-to-1 publish-subscribe system
  Multiple sources = multiple publishers
  Single sink = single subscriber
The Objective
"To minimize the total time required to transfer the necessary files from the source nodes to the sink node."
 Currently, there are no known systems that optimize for this goal
  End users generally use the direct transfer (1-to-1) strategy to fetch files
Key Observation
 The diversity of connections amongst Internet hosts has been widely observed and falls into two categories
  Spatial diversity refers to the fact that different links have different bandwidth availabilities
  Temporal diversity refers to the variation over time of the available bandwidth at a single link
Motivating Example
[Figure: hosts x and y each hold 5000 MB destined for host t; the x–t link runs at 2 MBps (2500 secs), the y–t link at 1 MBps (5000 secs), and a 5 MBps link (1000 secs) between the two hosts lets the transfer exploit natural parallelism]
The transfer process can be sped up by routing data via intermediate nodes
• 37% of PlanetLab node pairs achieve better throughput by leveraging a third node
System Assumptions
 Files can be subdivided into blocks
 All files (file blocks) are unique
  no a priori replicas present in the system
 Source node failures do not occur
  if a source node fails, Confluence provides no resiliency guarantees
  acceptable, as the same problem exists with Direct Transfer
Network Graph Model
[Figure: each host x is modeled as nodes x+, x0, and x−, with ISP connectivity capacities Cx+ and Cx− and pairwise link capacities such as cxy, cyx, cxt, and cyt]
 The network graph G supports both
  asymmetric link capacities
  asymmetric ISP connectivity limits
Theoretical Solution (Spatial Only)
 Graph translation: G^T, where T is the transfer duration (each per-second capacity C becomes T·C)
 Add super source s
 Connect super source to nodes via file edges
  b_x capacity equals file size held
  total capacity of s: B = Σ_i b_i
 The maximum s→t0 flow in G^T
  corresponding flow: f^T
 Find the smallest time T* that can move B blocks from s to t0
  binary search on [0, Tmax)
  corresponding flow: f^(T*)
  optimal flow rates: f* = f^(T*)/T*
 Time complexity: O(log(VBC) · VE log(V²/E))
[Figure: the translated graph G^T, with file edges b_x and b_y out of s, and every ISP capacity (T·Cx+, T·Cx−, …) and link capacity (T·cxy, T·cxt, …) scaled by T]
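The binary search over max-flow feasibility can be sketched in a few lines of Python. This is a simplified stand-in, not Confluence's actual code: it models only file edges and per-link capacities (assumption: the ISP up/down node splitting is omitted), and the instance at the bottom uses the motivating example's numbers.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp on a dict-of-dicts of residual capacities.
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-9 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t                   # walk back to find the bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:                 # push flow, update residuals
            cap[u][v] -= push
            cap[v][u] += push
        total += push

def feasible(T, files, links, sink):
    # Build G^T: file edges carry b_x blocks; link edges carry T * c_xy.
    cap = defaultdict(lambda: defaultdict(float))
    for node, b in files.items():
        cap["s"][node] += b
    for (u, v), rate in links.items():
        cap[u][v] += T * rate
    return max_flow(cap, "s", sink) >= sum(files.values()) - 1e-6

def min_transfer_time(files, links, sink, t_max=1e6):
    # Binary search for the smallest feasible duration T*.
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = (lo + hi) / 2
        if feasible(mid, files, links, sink):
            hi = mid
        else:
            lo = mid
    return hi
```

On the motivating example, `min_transfer_time({"x": 5000, "y": 5000}, {("x", "t"): 2, ("y", "t"): 1, ("y", "x"): 5}, "t")` converges to 10000/3 ≈ 3333 secs, versus 5000 secs for direct transfer alone, because y reroutes part of its file through x's faster uplink.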
System Design
 Maintaining network graph G
  via a coordinator
 Link information
  staleness: large gap between probes (round robin)
  cost: must send data packets to measure
  our solution: k << n
   each participating node has k peers
   chosen uniformly at random by the coordinator
 Measuring cxy and Cx
  PathChirp [Ribeiro2003]
  PathBlast – brute force
   needed only once, during bootstrapping
Transfer Plan
 Coordinator calculates based on the theoretical solution
  Sends directives to nodes: how many blocks, to whom
  What about the optimal flow rate?
 First approach: Flow Control System [Mehra2003]
  Subpar performance with multiple flows
  Temporal diversity galore
 Better approach: dynamic adaptation at the application level
Dynamic Adaptation
 Data is pushed from senders to all receivers simultaneously
 Periodic recomputation
  Update network graph G with new state information
   the number of residual file blocks: b′x
   if rxy > f*xy × (1 − slack) then cxy = max(cxy, rxy), else cxy = max(cxy/2, rxy)
  Coordinator recalculates the transfer plan periodically
  State inconsistency: final_computation flag
 rxy – measured flow rate; f*xy – optimal flow rate; slack – to prevent hysteresis
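The capacity-update rule above transcribes directly into code (the `slack` default here is an arbitrary illustrative value, not one taken from the thesis):

```python
def update_capacity(c_xy, r_xy, f_star_xy, slack=0.1):
    """Refresh the capacity estimate c_xy for link x->y from the measured
    rate r_xy and the planned optimal rate f*_xy. A flow running close to
    (or above) plan confirms the estimate, raising it if the link did even
    better; a flow running well under plan suggests the estimate is stale,
    so halve it, but never below what was actually observed."""
    if r_xy > f_star_xy * (1 - slack):
        return max(c_xy, r_xy)
    return max(c_xy / 2, r_xy)
```

The `slack` term is what prevents hysteresis: without it, a flow measured a hair under its planned rate would trigger a halving, the next plan would under-allocate the link, and the estimate would oscillate.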
Exploiting Block Replication
 Replication naturally occurs during the transfer process
  e.g., node y holds some blocks originating at node x
 Conservation of blocks
  "Tag" blocks with origin node and unique ID
 File replication edge
  If the solution uses a file replication edge, the REPLICATED_BLOCKS directive lets the origin node retransmit
  purge-immediately policy
  Note: threshold for replication
[Figure: the translated graph with residual file edges b′x and b′y + byx, where byx is the replication edge for blocks of x already held at y]
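Tagging every block with its origin node and a unique ID is what makes conservation of blocks checkable. A minimal sketch of the idea (the `Block` type and `residual_blocks` helper are hypothetical, not Confluence's actual structures):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    origin: str    # node at which this block's file originated
    block_id: int  # unique ID within the origin's file

def residual_blocks(expected, received):
    """Blocks the sink has not yet seen. Because blocks are tagged with
    (origin, id), any node holding a replica can satisfy them, and the
    origin can retransmit exactly the survivors after an intermediate
    node purges its copies (purge-immediately policy)."""
    return expected - received
```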
Experimental Methodology
 Implemented using the ns2 network simulator
  TCP CUBIC as the transport layer protocol
 PlanetLab topology
  trace measurements collected via S3 [Yalagandula06]
 Up to 100 source nodes
  100 MB file per source node
 Confluence parameters
  Number of neighbors, k = 10
  Recomputation interval, p = 15
  Replication enabled with purge-immediately policy
Direct Transfer: Microbenchmark
[Figure: completion times for source pools of 49 and 10 nodes]
 Maximizing the number of parallel connections is the best approach
  Poorly-connected connections start earlier
Direct Transfer: Macrobenchmark
 100 total nodes
  50 runs: in run #i, node #i acts as the sink node
 First result: fetch from the first 50 nodes (49 source nodes)
 Second result: fetch from all nodes (99 source nodes)
 y-axis = ratio of second result to first result
 Direct Transfer scales well with an increasing number of source nodes
  Well-connected nodes are better able to exploit parallelism
Confluence vs. Direct Transfer
 50 participating nodes, 50 different runs
  in run #i, node #i acts as the sink node; other nodes act as source nodes
  each sink node fetches from all other participating nodes
 Confluence performs better than Direct Transfer
  70% perform better with Confluence on a planetary scale
  90% perform better with Confluence on a continental scale
Overheads in Confluence
 Measuring network graph G
  may be inaccurate, stale, or both
  k peers may not be able to saturate the sink
 Delayed start
  metadata must be collected by the coordinator, the solution calculated, and transfer plan directives sent
 State inconsistency
  The final set of blocks, sent directly to the coordinator, may take some time to finish
 Tracking each block
Confluence vs. Direct Transfer II
 Confluence performs well on different planetary scale topologies
  All nodes perform better in 3 topologies, 90% in one, and 70% in the other
 Confluence excels in scenarios with a small number of source nodes
  100% under 50 nodes, 80% with 75 nodes, 70% with 100 nodes
Related Work
 CoBlitz
  uses PlanetLab nodes to improve 1-to-1 transfers
 GridFTP
  tool to move around large sets of data on the grid
 BitTorrent [Cohen03], CDNs [Akamai]
  Multiple replicas of data available
 Miscellaneous
  Estimating bandwidth: pathChirp [Ribeiro03], PTR/IGI [Hu03], Pathload [Jain02], pathrate [Dovrolis04], Pathneck [Hu04], etc.
  TCP for LFNs: High-Speed TCP [Chase01], TCP BIC [Xu04], TCP CUBIC [Rhee05], FAST TCP [Wei06], TCP-Illinois [Liu06], etc.
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
RANS Framework
Realistic and Deterministic Simulation at Large Scales
Motivation
 Need to study system diversity at large scales
  network diversity: end-to-end latency fluctuations
  interest diversity: subscription correlation
 System deployment is labor-intensive and limited
  PlanetLab usually only has about 400 accessible nodes
  Experiments are not replayable
 Simulations provide an alternative, but are not realistic
Introduction
 RANS objectives
  Realism
   simulation results should match deployment observations
   simulation should be run using the same code as an actual implementation
  Deterministic replay
   an unmodified application should yield the same result when provided with identical input as a previous execution
  Large scale
   ability to simulate several thousand end nodes
   selective granularity simulation
Application Programming Interface
 An application interfaces with the EventManager and the TransportManager
 An application can run multiple protocols
Events
Messages
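A minimal sketch of what the deterministic event core behind such an API might look like (the class and method names here are illustrative, not RANS's actual interface): deterministic replay falls out of firing callbacks in a total order, timestamp first and insertion sequence as tie-breaker.

```python
import heapq

class EventManager:
    """Discrete-event core sketch: applications schedule callbacks, and
    the manager fires them in deterministic order. Because ties on the
    timestamp are broken by a monotonically increasing sequence number,
    an unmodified application replays identically on identical input."""
    def __init__(self):
        self._queue = []
        self._seq = 0     # tie-breaker so equal timestamps replay identically
        self.now = 0.0

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
```

A TransportManager would sit on top of the same queue, scheduling message-delivery events with latencies drawn from the topology model below.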
Implementation
 Sockets implementation
  single-threaded, uses boost::asio
 Trace-based Simulator
  Overnet churn trace
Topology Manager
 Goal: realistic end-to-end latency fluctuations
 PlanetLab RTT trace with fluctuations
  226 end hosts, over 4 hours
  continuous fluctuations between node pairs (median latency)
 Internet AS topology
  20062 stub networks, 175 transit networks, and 8279 transit-and-stub networks
 Problem: artificial topology, limited scale of trace data
 Solution: topology fitting via trial and error
  Match simulator-generated latencies with PlanetLab median latencies
   the first 10% of inter-AS links: latency between 0 ms and 4 ms
   the next 30% of inter-AS links: latency between 4 ms and 30 ms
   the final 60% of inter-AS links: latency between 30 ms and 115 ms
  Map each simulator node pair to a random PlanetLab node pair (for fluctuations)
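The bucketed latency assignment can be sketched as follows. This is a simplified stand-in: `assign_link_latencies` is an illustrative name, and the within-bucket draw is assumed uniform, which the slides do not specify.

```python
import random

def assign_link_latencies(n_links, seed=0):
    """Assign per-link base latencies (ms) by the fitted bucket scheme:
    first 10% of inter-AS links in [0, 4), next 30% in [4, 30),
    final 60% in [30, 115). A fixed seed keeps the topology replayable."""
    rng = random.Random(seed)
    latencies = []
    for i in range(n_links):
        frac = i / n_links          # link's position in the ordered link list
        if frac < 0.10:
            lo_ms, hi_ms = 0, 4
        elif frac < 0.40:
            lo_ms, hi_ms = 4, 30
        else:
            lo_ms, hi_ms = 30, 115
        latencies.append(rng.uniform(lo_ms, hi_ms))
    return latencies
```

These base latencies give the static topology; the per-pair fluctuations then come from mapping each simulated node pair onto a traced PlanetLab pair, as the last bullet above describes.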
Topology Fitting Results
 The latencies modeled by RANS closely match the latencies experienced within PlanetLab
Simulation vs. PlanetLab: Rappel Per-Feed Dissemination Tree
Experiment: 1 publisher, 250 subscribers, 4 hours, 1 update every minute
Close match of results validates the RANS framework
95% of nodes consistently receive updates in under 0.5 seconds
Simulation vs. PlanetLab II
Close match of results validates the RANS framework
Stretch ratio w.r.t. the underlying coordinate space is lower
Related Work
 Network simulators
  ns2, QualNet, OPNET
 Emulators
  ModelNet, EmuLab
 Testbed
  PlanetLab
 Application-level network simulators
  p2psim, GnutellaSim
 Programming libraries/languages for rapid deployment
  Macedon, P2
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Rappel
Using Locality to Improve Fairness in Publish-Subscribe Systems
System Goals
 Peer-to-peer delivery for RSS
  Low publisher and subscriber overhead
  Optimizations due to the m-to-n delivery paradigm
 Client fairness
  Zero noise
  Load should scale with the number of subscriptions
  Exploit interest diversity
 Real-time update dissemination
  Via a low stretch ratio
  Exploit network diversity
Design Overview
[Figure: a friends overlay connecting nodes A–H, alongside per-feed dissemination trees rooted at publishers P1…Pi]
 A global control plane
  Used to locate other nodes ("friends") close in interest- and network-proximity
  The "friends overlay"
 A multicast dissemination tree per feed
  Can be joined by contacting any active node
  Only subscribers join a feed's dissemination tree
   Eliminates noise
Techniques Utilized in Rappel
 Friends Overlay
  Utility-based selection of friends
   Exploit interest and network locality using Bloom filters and network coordinates
  Discovering new nodes via gossip
  Periodic audits
 Per-feed Dissemination Trees
  Primitives based on network coordinates
  Periodic rejoin
  Push-pull of updates
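The two locality signals behind utility-based friend selection can be sketched together. This is illustrative only: `bloom` and `candidate_utility` are hypothetical helpers, and Rappel's actual utility function weighs the signals differently than this simple ratio.

```python
import hashlib
import math

def bloom(subscriptions, m=256, k=4):
    """Bloom filter (as an m-bit integer) of a node's feed subscriptions,
    so peers can estimate interest overlap without exchanging full lists."""
    bits = 0
    for sub in subscriptions:
        for i in range(k):
            h = int(hashlib.sha256(f"{i}:{sub}".encode()).hexdigest(), 16) % m
            bits |= 1 << h
    return bits

def candidate_utility(my_bloom, peer_bloom, my_coord, peer_coord, alpha=1.0):
    """Score a candidate friend: shared filter bits proxy for interest
    locality, and network-coordinate distance penalizes far-away peers."""
    shared = bin(my_bloom & peer_bloom).count("1")
    dist = math.dist(my_coord, peer_coord)  # e.g., Vivaldi-style coordinates
    return shared / (1.0 + alpha * dist)
```

A node would score candidates learned via gossip with such a function and, during periodic audits, keep only the top-utility peers as friends.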
Exploiting Network Diversity
Experiment: 1 publisher (at UIUC), 25 subscribers (distributed across the USA)
Rappel's per-feed dissemination trees exploit network diversity
Exploiting Interest Diversity
Experiment: 250 feeds, 5582 subscribers (simulation only)
91% of clients perfectly covered; no "wasted" friends
Rappel's friendship overlay effectively exploits interest diversity
Outline
• Introduction
  • Computing Today
  • System Diversity
  • Publish-Subscribe Systems
  • Thesis Statement
• Contributions
  • Confluence
  • RANS Framework
  • Rappel
• Conclusion
  • Concluding Remarks
Concluding Remarks
"We show that by directly addressing interest and network diversity as a first class design principle, the scale and performance of peer-to-peer publish-subscribe systems can be improved."
 Confluence exploits spatial and temporal network diversity for a convergecast delivery based publish-subscribe system
 The RANS framework provides realistic and deterministic simulation to study system diversity at large scales
 Rappel exploits both interest and network diversity via the use of locality to improve fairness and system performance