Peer-to-peer systems for autonomic VoIP and web
hotspot handling
Kundan Singh, Weibin Zhao and Henning Schulzrinne
Internet Real Time Laboratory
Computer Science Dept., Columbia University, New York
http://www.cs.columbia.edu/IRT/p2p-sip
http://www.cs.columbia.edu/IRT/dotslash
P2P for autonomic computing
• Autonomic at the application layer:
  – Robust against partial network faults
  – Resources grow as user population grows
  – Self-configuring
• Traditional p2p systems
  – file storage
    • motivation is often legal, not technical, efficiency
  – usually unstructured, optimized for Zipf-like popularity
• Other p2p applications:
  – Skype demonstrates usefulness for VoIP
    • identifier lookup
    • NAT traversal for media
  – OpenDHT (and similar) as emerging common infrastructure?
  – Non-DHT systems with smaller scope → web hotspot rescue
  – Network management (see our IRTF slides)
Aside: middle services instead of middleware
• Common & successful network services
– identifier lookup: ARP, DNS
– network storage: proprietary (Yahoo, .mac, …)
– storage + computation: CDNs
• Emerging network services
– peer-to-peer identifier lookup
– network storage
– network computation (“utility”)
• maybe programmable
• already found as web hosts and grid computing
What is P2P?
• Share the resources of individual peers
  – CPU, disk, bandwidth, information, …
(taxonomy figure: computer systems split into centralized (mainframes,
workstations) and distributed; distributed splits into client-server, either
flat (RPC, HTTP) or hierarchical (DNS, mount), and peer-to-peer, either pure
(Gnutella, Chord, Freenet, Overnet) or hybrid (Napster, Groove, Kazaa);
inset diagrams contrast a client-server star with a peer-to-peer mesh)
• Example p2p application domains:
  – Communication and collaboration: Magi, Groove, Skype
  – File sharing: Napster, Gnutella, Kazaa
  – Distributed computing: SETI@Home, folding@Home
Distributed Hash Table (DHT)
• Types of search
– Central index (Napster)
– Distributed index with flooding (Gnutella)
– Distributed index with hashing (Chord, Bamboo, …)
• Basic operations (see the sketch below):
  find(key), insert(key, value), delete(key), but no search(*)
• Properties/types:

  Property                  Every peer has       Chord         Every peer has
                            the complete table                 one key/value
  Search time or messages   O(1)                 O(log N)      O(N)
  Join/leave messages       O(N)                 O((log N)²)   O(1)
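A minimal sketch of that interface in C++, where a single in-process map stands
in for the distributed table; names are illustrative, not from SIPpeer or
OpenDHT:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for the DHT interface above: exact-match find/insert/delete,
// but no search("*") over keys or values.
class Dht {
public:
    void insert(const std::string& key, const std::string& value) {
        table_[key].push_back(value);                 // keys may be multi-valued
    }
    std::optional<std::vector<std::string>> find(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;  // not stored anywhere
        return it->second;
    }
    void erase(const std::string& key) { table_.erase(key); }

private:
    // A real DHT partitions this table across nodes by hashing the key;
    // a single map stands in for one node's shard here.
    std::map<std::string, std::vector<std::string>> table_;
};
```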
CAN
Content Addressable Network
• Divide the coordinate space into zones
• Each key maps to one point in the d-dimensional space
• Each node is responsible for all the keys in its zone
(figure: the unit square [0,1]×[0,1] partitioned into zones owned by nodes A–E)
CAN
(figure: 2-d coordinate space with zones owned by A–E; node X routes toward the
point (x,y) = (.3,.1) through neighboring zones; node Z joins by splitting X's zone)
• Node X locates (x,y) = (.3,.1)
• Node Z joins
• State = 2d neighbors per node
• Search = d·N^(1/d) hops (worked example below)
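To see the state/search tradeoff concretely (illustrative numbers, not from the
talk):

```latex
d = 2,\ N = 10^4:\quad \text{state} = 2d = 4 \text{ neighbors},\quad
\text{search} = d \cdot N^{1/d} = 2\sqrt{10^4} = 200 \text{ hops}
\\
d = 4,\ N = 10^4:\quad \text{state} = 2d = 8 \text{ neighbors},\quad
\text{search} = 4 \cdot 10^{4/4} = 40 \text{ hops}
```

Raising d buys shorter routes at the cost of more per-node state.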
Chord
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes
(figure: identifier circle with nodes at 1, 8, 14, 21, 32, 38, 42, 47, 54 and 58;
each key, e.g. 10, 24, 30 or 38, is stored at the first node that follows it on
the circle)
Chord
(figure: the same identifier circle, showing node 8's fingers)
• Finger table: log N entries (sketch below)
  – the ith finger points to the first node that succeeds n by at least 2^(i-1)
  – stabilization after join/leave

  Finger of node 8    Successor node
  8+1  = 9            14
  8+2  = 10           14
  8+4  = 12           14
  8+8  = 16           21
  8+16 = 24           32
  8+32 = 40           42
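A minimal C++ sketch of this rule, assuming a 6-bit identifier space and the
node set from the figure (illustrative, not SIPpeer code):

```cpp
#include <iostream>
#include <set>

// First live node whose ID is >= id, wrapping around the identifier circle.
int successor(const std::set<int>& nodes, int id) {
    auto it = nodes.lower_bound(id);
    return it != nodes.end() ? *it : *nodes.begin();  // wrap past the top
}

int main() {
    const int m = 6, space = 1 << m;   // 6-bit IDs, as in the figure
    std::set<int> nodes = {1, 8, 14, 21, 32, 38, 42, 47, 54, 58};
    int n = 8;
    // ith finger of node n: successor of (n + 2^(i-1)) mod 2^m, i = 1..m.
    for (int i = 1; i <= m; ++i) {
        int start = (n + (1 << (i - 1))) % space;
        std::cout << n << "+" << (1 << (i - 1)) << " = " << start
                  << " -> node " << successor(nodes, start) << "\n";
    }
}
```

Running this reproduces the table above: node 8's fingers land on 14, 14, 14,
21, 32 and 42.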
Tapestry
• IDs with base B = 2^b
• Route to the node numerically closest to the given key
• Routing table has O(B) columns, one per digit of the node ID
• Similar to CIDR, but suffix-based: each hop matches one more trailing digit,
  e.g. **4 => *64 => 364 (sketch below)
(figure: overlay with nodes 123, 135, 324, 364, 365, 427, 564, 763; node 364's
routing table has one column per digit position: level-0 entries ??0…??6,
level-1 entries ?04…?64, level-2 entries 064…664)
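One routing step of that suffix-matching scheme, sketched in C++ (helper names
are illustrative, not Tapestry's API):

```cpp
#include <string>
#include <vector>

// Number of trailing digits two IDs share, counted from the right.
int sharedSuffix(const std::string& a, const std::string& b) {
    int n = 0;
    while (n < (int)a.size() && n < (int)b.size() &&
           a[a.size() - 1 - n] == b[b.size() - 1 - n]) ++n;
    return n;
}

// One hop: among known nodes, pick one that shares at least one more
// trailing digit with the key than we do -- e.g. **4 => *64 => 364.
std::string nextHop(const std::string& self, const std::string& key,
                    const std::vector<std::string>& known) {
    int have = sharedSuffix(self, key);
    for (const auto& node : known)
        if (sharedSuffix(node, key) > have) return node;  // one digit closer
    return self;  // no better node: we are the root for this key
}
```

For instance, nextHop("123", "364", {"564", "364"}) returns "564", which shares
the suffix "64" with the key, matching the *64 step in the figure.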
Pastry
• Prefix-based
• Route to a node whose ID shares at least one more prefix digit with the key
  than this node's ID does
• Each node keeps a neighbor set, a leaf set and a routing table
(figure: Route(d46a1c) starting at node 65a1fc and reaching d46a1c via d13da3,
d4213f, d462ba and d467c4, matching one more hex digit of the prefix per hop)
Other schemes
• Distributed trie
• Viceroy
• Kademlia
• SkipGraph
• Symphony
• …
DHT Comparison
  Property /    Un-structured    CAN         Chord       Tapestry   Pastry      Viceroy
  scheme
  Routing       O(N) or no       d·N^(1/d)   log N       log_B N    log_B N     log N
                guarantee
  State         constant         2d          log N       log_B N    B·log_B N   log N
  Join/leave    constant         2d          (log N)²    log_B N    log_B N     log N

• Reliability and fault resilience:
  – Un-structured: data at multiple locations; retry on failure; finding popular
    content is efficient
  – CAN: multiple peers for each data item; retry on failure; multiple paths to
    the destination
  – Chord: replicate data on consecutive peers; retry on failure
  – Tapestry/Pastry: replicate data on multiple peers; keep multiple paths to
    each peer
  – Viceroy: routing load is evenly distributed among participating lookup servers
Server-based vs peer-to-peer
• Reliability, failover latency
  – server-based: DNS-based; depends on client retry timeout, DB replication
    latency and registration refresh interval
  – P2P: DHT self-organization and periodic registration refresh; depends on
    client timeout and registration refresh interval
• Scalability, number of users
  – server-based: depends on the number of servers in the two stages
  – P2P: depends on refresh rate, join/leave rate and uptime
• Call setup latency
  – server-based: one or two steps
  – P2P: O(log N) steps
• Security
  – server-based: TLS, digest authentication, S/MIME
  – P2P: additionally needs a reputation system and a way to work around spy nodes
• Maintenance, configuration
  – server-based: administrator handles DNS, database, middle-boxes
  – P2P: automatic; only one-time bootstrap node addresses
• PSTN interoperability
  – server-based: gateways, TRIP, ENUM
  – P2P: interact with server-based infrastructure, or co-locate a peer node
    with the gateway
The basic SIP service
• HTTP: retrieve resource identified by URI
• SIP: translate an address-of-record SIP URI (e.g., sip:alice@example.com)
  to one or more contacts (hosts or other AORs)
  – single user → multiple hosts
    • e.g., home, office, mobile, secretary
    • contacts can be equal or ordered sequentially
• Thus, SIP is (also) a binding protocol
  – similar, in spirit, to mobile IP, but at the application layer and without
    some of the related issues
• Function performed by the SIP proxy for the AOR's domain
  – delegated logically to a location server
• This function is being replaced by p2p approaches (key mapping sketched below)
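In the p2p replacement, the AOR is hashed into the DHT's identifier space; a toy
C++ sketch (std::hash for brevity, where Chord-style deployments would typically
use SHA-1's 160-bit space):

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Map an address-of-record to a DHT key: hash the AOR into the identifier
// space (2^32 here for brevity). The node whose ID succeeds this key on the
// circle stores the AOR's contact bindings.
uint32_t aorToKey(const std::string& aor) {
    return static_cast<uint32_t>(std::hash<std::string>{}(aor));
}
// e.g., aorToKey("sip:alice@example.com") yields the key under which
// Alice's contacts (hosts/IPs) are inserted and looked up.
```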
What is SIP? Why P2P-SIP?
• Server-based SIP (figure):
  (1) REGISTER alice@columbia.edu => 128.59.19.194, from Alice's host to the
      columbia.edu server
  (2) INVITE alice@columbia.edu, from Bob's host
  (3) Contact: 128.59.19.194
  – Problem in client-server: maintenance, configuration, controlled infrastructure
• P2P-SIP (figure):
  (1) REGISTER stores Alice → 128.59.19.194 in the peer-to-peer network
  (2) INVITE alice
  (3) 128.59.19.194
  – No central server, but more lookup latency
How to combine SIP + P2P?
• SIP-using-P2P
  – Replace the SIP location service by a P2P protocol
• P2P-over-SIP
  – Additionally, implement the P2P maintenance itself using SIP messaging

                SIP-using-P2P   P2P SIP proxies   P2P-over-SIP
  Maintenance   P2P             P2P               SIP
  Lookup        P2P             SIP               SIP

(figures: in SIP-using-P2P, REGISTER becomes INSERT and "INVITE alice" becomes
FIND on the P2P network; in P2P-over-SIP, REGISTER and INVITE
sip:alice@columbia.edu are routed within the P2P-SIP overlay to Alice at
128.59.19.194)
Design alternatives
(figures: a Chord ring inside a server farm with attached clients; a Chord ring
spanning all clients; a Pastry-style overlay among super-nodes)
• Use DHT in a server farm
• Use DHT for all clients, but some clients are resource-limited
• Use DHT among super-nodes
  1. hierarchy
  2. dynamically adapt
Deployment scenarios
(figure: three clouds of peers P)
• P2P clients
  – Plug and play; may use adaptors; untrusted peers
• P2P proxies
  – Zero-conf server farm; trusted servers and user identities
• P2P database
  – Global, e.g., OpenDHT; clients or proxies can use it; trusted deployed peers
• Interoperate among these!
Hybrid architecture
• Cross register, or
• Locate during call setup
– DNS, or
– P2P-SIP hierarchy
What else can be P2P?
• Rendezvous/signaling (SIP)
• Configuration storage
• Media storage (e.g., voice mail)
• Identity assertion (?)
• PSTN gateway (?)
• NAT/media relay (find the best one)
Trust models are different for different components!
What is our P2P-SIP?
• Unlike server-based SIP architecture
• Unlike proprietary Skype architecture
  – Robust and efficient lookup using DHT
  – Interoperability
    • DHT algorithm uses SIP communication
  – Hybrid architecture
    • Lookup in SIP+P2P
• Unlike file-sharing applications
  – Data storage, caching, delay, reliability
• Disadvantages
  – Lookup delay and security
Implementation: SIPpeer
• Platform: Unix (Linux), C++
• Modes:
– Chord: using SIP for P2P maintenance
– OpenDHT: using external P2P data storage
• based on Bamboo DHT, running on PlanetLab nodes
• Scenarios:
– P2P client, P2P proxies
– Adaptor for existing phones
• Cisco, X-lite, Windows Messenger, SIPc
– Server farm
P2P-SIP: identifier lookup
• P2P serves as the SIP location server:
  – address-of-record → contacts
  – e.g., alice@example.com → 128.59.16.1, 128.72.50.13
• Multi-valued: (key_n, value_1), (key_n, value_2)
• With limited TTL
• Variant: point to a SIP proxy server
  – either operated by a supernode or a traditional server
    • allows registration of non-p2p SIP domains (*@example.com)
  – easier to provide call routing services (e.g., CPL)
(figure: DHT storing alice → 128.59.16.1 and alice → 128.72.50.13; a sketch of
this binding store follows)
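A minimal C++ sketch of a multi-valued, TTL-limited binding store as described
above (structure and names are illustrative, not SIPpeer's):

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// One AOR key maps to several contacts; each binding expires after its TTL
// unless refreshed by a new REGISTER.
struct Binding { std::string contact; Clock::time_point expires; };

class LocationService {
public:
    void insert(const std::string& aor, const std::string& contact,
                std::chrono::seconds ttl) {
        bindings_[aor].push_back({contact, Clock::now() + ttl});
    }
    std::vector<std::string> find(const std::string& aor) {
        std::vector<std::string> out;
        auto now = Clock::now();
        auto& v = bindings_[aor];
        // Drop expired bindings lazily on lookup (std::erase_if is C++20).
        std::erase_if(v, [&](const Binding& b) { return b.expires <= now; });
        for (const auto& b : v) out.push_back(b.contact);
        return out;
    }
private:
    std::map<std::string, std::vector<Binding>> bindings_;
};
```

Inserting two contacts under the same AOR yields a multi-valued lookup, as in
the figure; expired bindings vanish unless refreshed.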
Background: DHT (Chord)
• Identifier circle; keys assigned to successor
• Evenly distributed keys and nodes
• Finger table: log N entries
  – the ith finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization for join/leave
(figure: the identifier circle and node-8 finger table from the earlier Chord
slides)
Implementation: SIPpeer
(block diagram: a user interface (buddy list, etc.) sits on top of a
user-location module; on startup the node discovers peers via multicast
REGISTER, detects NATs using ICE, and then either joins the DHT (Chord) or signs
up and finds buddies; on reset it signs out, transfers its registrations and
leaves; the SIP layer carries REGISTER, INVITE and MESSAGE for both SIP-over-P2P
and P2P-using-SIP; the media path uses audio devices, codecs and RTP/RTCP)
P2P vs. server-based SIP
• Prediction:
– P2P for smaller &
quick setup
scenarios
– Server-based for
corporate and
carrier
• Need federated system
– multiple p2p
systems, identified
by DNS domain
name
– with gateway nodes
(measurement: 2000 requests/second ≈ 7 million registered users)
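That equivalence presumably assumes the common one-hour SIP registration
refresh:

```latex
\frac{7 \times 10^{6}\ \text{users}}{3600\ \text{s refresh interval}}
  \approx 1944 \approx 2000\ \text{REGISTER requests/s}
```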
Open issues
• Presence and IM
– where to store presence information: need access authorization
• Performance
– how many supernodes are needed? (Skype: ~1000)
• Reliability
– P2P nodes generally replicate data
– if proxy or presence agent at leaf, need proxy data replication
• Security
– Sybil attacks: blackholing supernodes
– Identifier protection: protect first registrant against identity theft
– Anonymity, encryption
– Protecting voicemails on storage nodes
• Optimization
– Locality, proximity, media routing
• Deployment
– SIP-using-P2P vs P2P-over-SIP; intranets; ISP servers
• Motivation
– Why should I run as super-node?
Comparison of P2P and server-based systems
                server-based                      P2P
  scaling       grows with server count           scales with user count, but
                                                  limited by supernode count
  efficiency    most efficient                    DHT maintenance = O((log N)²)
  security      trust the server provider;        trust most supernodes;
                binary                            probabilistic
  reliability   server redundancy; catastrophic   unreliable supernodes;
                failure possible                  catastrophic failure unlikely
Using P2P for binding updates
• Proxies do more than just plain identifier translation:
  – translation may depend on who's asking, time of day, …
    • e.g., based on script output
    • hide the full range of contacts from the caller
  – sequential and parallel forking
  – disconnected services: e.g., forward to voicemail if no answer
• Using a DHT as a location service →
  – use only plain translation
  – run services on end systems (the Skype approach)
  – run proxy services on supernode(s) and use the proxy as the contact →
    needs replication for reliability
Reliability and scalability
Two-stage architecture for CINEMA
(figure: first stage of stateless proxies s1, s2, s3 plus off-site backup ex for
example.com; second stage of clusters, each with a master and a slave: a1/a2
serve a*@example.com, b1/b2 serve b*@example.com; incoming requests such as
sip:alice@example.com and sip:bob@example.com are spread over the first stage
and routed to the right cluster)

DNS SRV records:
  example.com    _sip._udp   SRV 0 40 s1.example.com
                             SRV 0 40 s2.example.com
                             SRV 0 20 s3.example.com
                             SRV 1 0  ex.backup.com
  a.example.com  _sip._udp   SRV 0 0  a1.example.com
                             SRV 1 0  a2.example.com
  b.example.com  _sip._udp   SRV 0 0  b1.example.com
                             SRV 1 0  b2.example.com

• Request-rate = f(#stateless, #groups)
• Bottleneck: CPU, memory, bandwidth?
• Failover latency: ?
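How clients spread load over these records is standard DNS SRV selection
(RFC 2782), sketched here in C++; this is not CINEMA code:

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

struct SrvRecord { int priority; int weight; std::string target; };

// RFC 2782 selection (sketch): keep only the lowest-priority records, then
// pick one at random with probability proportional to its weight.
// Assumes recs is non-empty.
std::string pickSrv(std::vector<SrvRecord> recs, std::mt19937& rng) {
    int best = std::min_element(recs.begin(), recs.end(),
        [](const SrvRecord& a, const SrvRecord& b) {
            return a.priority < b.priority;
        })->priority;
    std::erase_if(recs, [&](const SrvRecord& r) { return r.priority != best; });

    int total = 0;
    for (const auto& r : recs) total += r.weight;
    if (total == 0) return recs.front().target;   // all weights zero

    std::uniform_int_distribution<int> dist(0, total - 1);
    int pick = dist(rng);
    for (const auto& r : recs)
        if ((pick -= r.weight) < 0) return r.target;
    return recs.back().target;                    // not reached
}
```

With the example.com records above, s1 and s2 each receive about 40% of new
requests and s3 about 20%; ex.backup.com (priority 1) is used only when every
priority-0 target fails.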
SIP p2p summary
• Advantages
  – Out-of-box experience
  – Robust
    • catastrophic failure unlikely
  – Inherently scalable
    • more resources with more nodes
• Status
  – IETF involvement
  – Columbia SIPpeer
• Security issues
  – Trust, reputation
  – Malicious nodes, Sybil attacks
  – SPAM, DDoS
  – Privacy, anonymity (?)
• Other issues
  – Lookup latency, proximity
  – P2P-SIP vs SIP-using-P2P
  – Why should I run as a supernode?
http://www.p2psip.org and http://www.cs.columbia.edu/IRT/p2p-sip
DotSlash: An Automated Web Hotspot Rescue System
Weibin Zhao and Henning Schulzrinne
The problem
• Web hotspots
– Also known as flash crowds or the Slashdot effect
– Short-term dramatic load spikes at web servers
• Existing mechanisms are not sufficient
– Over-provisioning
• Inefficient for rare events
• Difficult because the peak load is hard to predict
– CDNs
• Expensive for small web sites that experience the
Slashdot effect
The challenges
• Automate hotspot handling
– Eliminate human intervention to react quickly
– Improve availability during critical periods (“15
minutes of fame”)
• Allocate resources dynamically
– Static configuration is insufficient for unexpected
dramatic load spikes
• Address different bottlenecks
– Access network, web server, application server, and
database server
Our approach
• DotSlash
  – An automated web hotspot rescue system that builds an adaptive,
    distributed web server system on the fly
• Advantages
  – Fully self-configuring: no manual configuration
    • service discovery, adaptive control, dynamic virtual hosting
  – Scalable, easy to use
  – Works for static & LAMP applications
    • handles network, CPU and database server bottlenecks
  – Transparent to clients
    • cf. CoralCache
DotSlash overview
• Rescue model
– Mutual aid community using spare capacity
– Potential usage by web hosting companies
• DotSlash components
– Workload monitoring
– Rescue server discovery
– Load migration (request redirection)
– Dynamic virtual hosting
– Adaptive rescue and overload control
Handling load spikes
• Request redirection
– DNS-RR: reduce arrival rate
– HTTP redirect: increase service rate
• Handle different bottlenecks
  Technique                        Bottleneck addressed
  Cache static content             Network, web server
  Replicate scripts dynamically    Application server
  Cache query results on demand    Database server
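Before any of these techniques apply, the origin decides per request whether to
hand it off at all; a toy C++ sketch of HTTP-redirect-based load migration
(hypothetical class, not the mod_dots API):

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Redirect a fraction p of requests to a rescue server chosen round-robin;
// serve the rest locally. Illustrative only.
class Redirector {
public:
    void setProbability(double p) { p_ = p; }
    void addRescue(const std::string& host) { rescues_.push_back(host); }
    // Returns a Location target for an HTTP 302, or "" to serve locally.
    std::string route(std::mt19937& rng) {
        if (rescues_.empty()) return "";
        std::bernoulli_distribution redirect(p_);
        if (!redirect(rng)) return "";
        return rescues_[next_++ % rescues_.size()];
    }
private:
    double p_ = 0.0;
    std::vector<std::string> rescues_;
    std::size_t next_ = 0;
};
```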
Rescue example
• Cache static content
(figure: client1 sends request (1) to the origin server, receives an HTTP
redirect (2), and fetches the content (3)/(4) from a rescue server acting as a
reverse proxy for the origin; client2 is steered (1)/(2) by the DNS server via
DNS round robin and served (3)/(4) directly by the rescue server)
Rescue example (2)
• Replicate scripts dynamically
(figure: the client's request (1) reaches the origin Apache server and is
redirected (2) to the rescue server; the rescue server's Apache fetches the PHP
script from the origin (3)/(4), runs it locally with its own PHP (5)/(6),
queries the origin's MySQL database server (7), and returns the response (8))
Rescue example (3)
• Cache query results on demand
(figure: the client talks to the origin or a rescue server; each runs a data
driver with a query result cache in front of the database server, so the rescue
server can answer repeated queries from its cache instead of reaching the
origin's database)
Server states
(state diagram: a server normally sits in the normal state; as an origin server
it gets help from others by allocating a rescue server, entering the SOS state,
and returns to normal by releasing all rescues; as a rescue server it provides
help to others by accepting an SOS request, entering the rescue state, and
returns to normal by shutting down all rescues)
Handling load spikes
• Load migration
– DNS-RR: reduce arrival rate
– HTTP redirect: increase service rate
– Both: increase throughput
• Benefits
– Reduce origin server network load by caching
static content at rescue servers
– Reduce origin web server CPU load by
replicating scripts dynamically to rescue
servers
Adaptive overload control
• Objective
– CPU and network load in desired load region
• Origin server
– Allocate/release rescue servers
– Adjust redirect probability
• Rescue server
– Accept SOS requests
– Shutdown rescues
– Adjust allowed redirect rate
Self-configuring
• Rescue server discovery via SLP and DNS SRV
• Dynamic virtual hosting:
– Serving content of a new site on the fly
– use “pre-positioned” Apache virtual hosts
• Workload monitoring: network and CPU
– take headers and responses into account
• Adaptive rescue control (sketched below)
  – precise load-handling capacity of rescue servers is unknown
    • particularly for active content
  – establish a desired load region (typically ~70%)
  – periodically measure load and adjust the redirect probability
    • conveyed via the rescue protocol
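A minimal sketch of such a controller, assuming utilization is measured per
interval in [0,1] and a fixed adjustment step; the actual DotSlash controller
and its protocol encoding are described in the papers cited at the end:

```cpp
#include <algorithm>

// Toy redirect-probability controller (illustrative, not DotSlash's code):
// keep measured CPU/network utilization inside the desired load region by
// redirecting a larger or smaller fraction of requests to rescue servers.
class RedirectController {
public:
    explicit RedirectController(double lo = 0.6, double hi = 0.8)
        : lo_(lo), hi_(hi) {}
    // Called once per measurement interval with current utilization in [0,1].
    double update(double util) {
        if (util > hi_)      p_ = std::min(1.0, p_ + kStep);  // overloaded: shed more
        else if (util < lo_) p_ = std::max(0.0, p_ - kStep);  // underloaded: keep more
        return p_;  // fraction of requests to redirect
    }
private:
    static constexpr double kStep = 0.05;
    double lo_, hi_;
    double p_ = 0.0;
};
```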
Implementation
• Based on LAMP (Linux, Apache, MySQL, PHP)
• Apache module (mod_dots), DotSlash daemon (dotsd),
DotSlash rescue protocol (DSRP)
• Dynamic DNS using BIND with dot-slash.net
• Service discovery using enhanced SLP
(block diagram: an HTTP client talks to Apache with mod_dots loaded; mod_dots
shares state with the dotsd daemon via shared memory; dotsd speaks DSRP to other
dotsd instances, updates BIND through dynamic DNS, and performs service
discovery via SLP/mSLP)
Handling File Inclusions
• The problem
– A replicated script may include files that are
located at the origin server
– Assume: included files under DocumentRoot
• Approaches
  – Renaming inclusion statements
    • needs to parse scripts: heavyweight
  – Customized error handler
    • catch inclusion errors: lightweight
Evaluation
• Workload generation
– httperf for static content
– RUBBoS (bulletin board) for dynamic content
• Testbed
– LAN cluster and WAN (PlanetLab) nodes
– Linux Redhat 9.0, Apache 2.0.49, MySQL 4.0.18, PHP
4.3.6
• Metrics
– Max request rate and max data rate supported
Results in LANs
(plots: request rate, redirect rate and rescue rate over time; data rate over time)
Handling worst-case workload
(plot: settling time of 24 seconds; timeouts: 921 out of 113,565 requests)
Results for dynamic content
• Configuration: origin server (high capacity), nine rescue servers (low
  capacity), database server (high capacity)
• No rescue: R = 118 requests/s; CPU: origin = 100%, DB = 45%
• With rescue (9 rescue servers): R = 245 requests/s; CPU: origin = 55%,
  DB = 100%
• 245/118 > 2: throughput more than doubles
Caching TTL and Hit Ratio (Read-Only)
(plot: cache hit ratio (%), ranging from about 60 to 100, vs. caching TTL from
1 to 1000 seconds on a log scale)
CPU Utilization (Read-Only)
(plot: CPU utilization (%) vs. number of clients from 500 to 4000, comparing the
database server under READ3 (with rescue, no cache), READ4 (with rescue,
co-located cache) and READ5 (with rescue, shared cache), plus READ5's shared
cache server)
Request Rate (Read-Only)
(plot: requests per second, roughly 100 to 550, vs. number of clients from 500
to 4000, for READ3 (with rescue, no cache), READ4 (with rescue, co-located
cache) and READ5 (with rescue, shared cache))
CPU Utilization (Submission)
(plot: origin database server CPU utilization (%) vs. number of clients from
3000 to 7000, for SUB4 (with rescue, no cache), SUB5 (with rescue, cache, no
invalidation) and SUB6 (with rescue, cache, with invalidation))
Request Rate (Submission)
(plot: requests per second, roughly 400 to 900, vs. number of clients from 3000
to 7000, for SUB4 (with rescue, no cache), SUB5 (with rescue, cache, no
invalidation) and SUB6 (with rescue, cache, with invalidation))
Performance
• Static content (httperf)
– 10-fold improvement
– Relieve network and web server bottlenecks
• Dynamic content (RUBBoS)
– Completely remove web/application server
bottleneck
– Relieve database server bottleneck
– Overall improvement: 10 times for read-only mix, 5
times for submission mix
Conclusion
• DotSlash prototype
– Applicable to both static and dynamic content
– Promising performance improvement
– Released as open-source software
• On-going work
– Address security issues in deployment
– Extensible to SIP servers? Web services?
• For further information
– http://www.cs.columbia.edu/IRT/dotslash
– DotSlash framework: WCW 2004
– Dynamic script replication: Global Internet 2005
– On-demand query result cache: TR CUCS-035-05
(under submission)