Reliability and Relay Selection in Peer-to-Peer Communication Systems
Salman A. Baset and Henning Schulzrinne
Internet Real-time Laboratory
Department of Computer Science
Columbia University
August 3rd, 2010

Background

Peer-to-peer communication system
[Figure: overlay of nodes A to E, some behind NATs / firewalls, with a media relay (or relay) and a P2P / PSTN gateway to PSTN / mobile; numbered steps show the lookup of a node's network address, signaling, and media paths]
• node = user agent
• nodes form an overlay
• share responsibilities for message routing, signaling, media relaying
• super nodes, ordinary nodes
• Reliability of p2p comm. systems?
• Relay selection techniques?

Outline
Reliability and Relay Selection
• Motivation
  – Sources of unreliability in p2p comm. systems?
  – How many relays per call to achieve 99.9% success rate?
• Improving reliability of relayed calls
  – How to improve the reliability of relayed calls?
• How to quantify the interference of relayed calls with other applications?
• How to find a relay in O(1) hop that minimizes latency and user annoyance?

Reliability framework
• Reliability = proportion of completed calls (99.9%)
• Goal
  – understand reasons for call failure
  – devise techniques to improve them
• Reasons for call failure
  – (1) distributed search fails to find online callee
    • DHT lookup
  – (2) distributed search fails to find a suitable relay
    • DHT lookup or any appropriate relay selection scheme
  – (3) relay fails during voice/video session
• understand and improve reliability for relayed calls
• devise techniques for finding a relay

Outline
Motivation
Reliability and Relay Selection
  How many relays per call to achieve 99.9% success rate?
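
As a rough illustration of the question "how many relays per call to achieve 99.9% success rate?", the sketch below assumes that relay residual lifetimes and call durations are independent and exponentially distributed. Under that assumption a single relay fails before the call ends with probability λ/(λ+v), where 1/λ is the mean node lifetime and 1/v the mean call duration, and a call that may use up to k relays in sequence completes with probability 1 − (λ/(λ+v))^k. The function name `min_relays` and its parameters are illustrative only; they do not come from the talk.

```python
def min_relays(mean_lifetime, mean_call, target=0.999):
    """Smallest k such that 1 - (lam / (lam + v))**k >= target.

    Assumes exponential relay lifetimes and call durations (an assumption
    of this sketch); both means must use the same time unit.
    """
    lam = 1.0 / mean_lifetime      # relay failure rate
    v = 1.0 / mean_call            # call completion rate
    p_fail = lam / (lam + v)       # P(a relay fails before the call ends)
    k = 1
    while 1.0 - p_fail ** k < target:
        k += 1
    return k


if __name__ == "__main__":
    # Ratio of mean node lifetime to mean call duration: 1, 3, 6
    for ratio in (1, 3, 6):
        print(ratio, min_relays(ratio, 1))
```

The loop is used instead of a closed-form logarithm so that the stopping condition reads exactly like the reliability inequality. Heavy-tailed lifetimes (for example, Pareto) would need a different calculation.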

Understanding reliability of relayed calls
• Percentage of VoIP calls that need relaying
  – the provider knows
  – 15-20% of calls for a commercial client-server IM / VoIP application
  – 341 relays in 20 days for Skype [Suh05Infocom]
    • 17 per day for a super node (~50K super nodes)
  – some client-server providers relay all calls
  – NAT studies

Understanding reliability of relayed calls
For a desired reliability, what is the minimum number of relays per call?
  – let $X_i$ and $R_i$ be the lifetime and residual lifetime of a relay candidate (i.i.d.)
  – let $D$ denote the call duration
  – when the $i$th relay fails, the call is switched to the $(i+1)$st relay, which is instantly selected from the global pool of all relays
[Figure: timeline of a call of duration $D$ covered by the residual lifetimes $R_1, \ldots, R_k$ of relays 1 through k]
Desired reliability: $P\left(\sum_{i=1}^{k} R_i \ge D\right) \ge 99.9\%$
• smallest k such that the call completion probability is greater than or equal to the desired reliability
• k depends on the relationship between node lifetime and call duration

Understanding reliability of relayed calls
Exponential node lifetimes
  $99.9\% \le 1 - \left(\lambda / (\lambda + v)\right)^k$, with mean node lifetime $1/\lambda$ and mean call duration $1/v$

  Mean node lifetime / mean call duration    Min # of relays k
  1                                          10
  3                                          5
  6                                          4

Skype node lifetimes
  – lifetimes approximated as Pareto: 12 hours (mean), 4 hours (median)
  – min # of relays k = 3 (mean call holding time = one hour)
  – 95% of Skype relay calls last less than 60 mins
  – 95% of Skype relayed call durations: minimum of 3 relays to maintain 99.9% success rate
What if the system does not have enough relays?

Outline
Motivation
Reliability and Relay Selection
Improving reliability of relayed calls
  How to improve the reliability of relayed calls?

Improving reliability of relayed calls
• Approach 1 -- no-replacement
  – select k relays at the beginning of a call
  – do not replace failed relays
  – pure death process
  – $P(R_{NR} \ge D) = 1 - P(\max(R_1, \ldots, R_k) < D)$
• Approach 2 -- with-replacement
  – select k relays at the beginning of a call
  – replace failed relays at repair rate $\mu$
  – no failure during switch over
  – Skype uses a 2-relay with-replacement scheme
  [Figure: Markov chain over the number of working relays (states 2, 1, 0); transitions 2 → 1 with probability $2\lambda$, 1 → 0 with $\lambda$, 1 → 2 with $\mu$; self-loops $1 - 2\lambda$ at state 2 and $1 - (\lambda + \mu)$ at state 1]
  – $1/\mathrm{MTTF} = 2\lambda^2 / (3\lambda + \mu)$ [Bir04]
  – lifetime $R_{WR}$ approximated as exponential with mean MTTF, i.e. survival function $e^{-t/\mathrm{MTTF}}$
  – for exponential call durations, $P(R_{WR} \ge D) = v / (1/\mathrm{MTTF} + v)$

Improving reliability of relayed calls
• No-replacement
  – add more relays?
  – diminishing returns in MTTF (exponential lifetimes)
    • 1 vs. 2 relays: 50%
    • 2 vs. 3 relays: 22%
    • 3 vs. 4 relays: 13%
• No-replacement (NR) vs. with-replacement (WR)
  – depends on mean lifetime, call duration, repair time
[Figure: NR vs. WR comparison including 2-relay with-replacement; Skype lifetimes (mean = 12 hours, median = 4 hours); search time = 60 s]

Outline
Motivation
Reliability and Relay Selection
Improving reliability of relayed calls
How to quantify the interference of the relayed call with other applications?

User annoyance
• Interference of relayed call with other applications running on the relay machine
• File sharing = mutually beneficial (tit-for-tat)
• Relaying = altruistic
• Provide incentives or minimize user annoyance
• How to quantify user annoyance?
  – automatically?
  – spare network capacity
• Issues in measuring spare capacity?
  – bandwidth tests, ALTO

Outline
How to find a relay in O(1) hop that minimizes latency and user annoyance?
Motivation
Reliability and Relay Selection
Improving reliability of relayed calls

Distributed relay selection
• Goal: O(1) hop
• 2-level hierarchical network
• 1-relay local-random scheme
[Figure: a node behind a NAT asks "Give me a relay" and receives "Here is a randomly selected relay"; relay candidates are kept in tables with columns IP address, RTT, Bandwidth; labels: search performance, dropped calls, close-by NAT]

Distributed relay selection
• Delay
• User annoyance
  – interference with user applications
  – file sharing (draft idle peers)
  – spare capacity
• random
• mindelay
  – select relay with minimum delay
• netmax
  – select relay with maximum spare bw
• threshold
  – select relays with delay < 150 ms and maximum spare capacity
• Results
  – strategies perform similarly near the system collapse point
  – minimizing latency increases annoyance and the number of jobs per relay, and vice versa
  – threshold approach performs reasonably well

Related work
• Modeling
  – On lifetime-based node failure and stochastic resilience of decentralized peer-to-peer networks [Leonard09ToN]
• Minimizing churn
  – Minimizing churn in distributed systems [Godfrey06Sigcom]
• Relay selection
  – ASAP: an AS-aware peer relay protocol for high quality VoIP [Ren06ICDCS]
$ diff this related_work
  – focus on node isolation
  – minimizing churn is not sufficient
  – reliability, relay selection, user annoyance

Conclusion
• Framework for analyzing reliability in p2p communication systems
• A model for reliability of relayed calls
• Reliability improvement schemes
• User annoyance
• Distributed relay selection
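
The four relay selection strategies named on the "Distributed relay selection" slide (random, mindelay, netmax, threshold) can be sketched as a single function over a candidate list. This is a minimal sketch, not the authors' implementation: the `Relay` record, its field names, the `select_relay` function, and the choice to return `None` when no candidate qualifies are all assumptions made for illustration. Only the strategy names and the 150 ms delay bound come from the slides.

```python
import random
from dataclasses import dataclass


@dataclass
class Relay:
    # Hypothetical candidate record: address, measured RTT, spare bandwidth.
    address: str
    rtt_ms: float
    spare_bw_kbps: float


def select_relay(candidates, strategy, max_delay_ms=150.0, rng=random):
    """Pick one relay from `candidates`, or None if none qualifies."""
    if not candidates:
        return None
    if strategy == "random":
        return rng.choice(candidates)
    if strategy == "mindelay":
        # Relay with minimum delay.
        return min(candidates, key=lambda r: r.rtt_ms)
    if strategy == "netmax":
        # Relay with maximum spare bandwidth.
        return max(candidates, key=lambda r: r.spare_bw_kbps)
    if strategy == "threshold":
        # Among relays under the delay bound, take the most spare capacity.
        close = [r for r in candidates if r.rtt_ms < max_delay_ms]
        if not close:
            return None
        return max(close, key=lambda r: r.spare_bw_kbps)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    pool = [
        Relay("relay-a", 40.0, 100.0),
        Relay("relay-b", 120.0, 900.0),
        Relay("relay-c", 300.0, 2000.0),
    ]
    for name in ("mindelay", "netmax", "threshold"):
        print(name, select_relay(pool, name).address)
```

In this toy pool the three deterministic strategies each pick a different relay, which mirrors the trade-off reported in the results: the lowest-delay relay is not the one with the most spare capacity, and the threshold rule sits between the two.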