Reliability and Relay Selection in Peer-to-Peer Communication Systems
Salman A. Baset and Henning Schulzrinne
Internet Real-time Laboratory
Department of Computer Science
Columbia University
August 3rd, 2010
Background
Peer-to-peer communication system
[Figure: overlay of nodes A, B, C, D, and E, some behind a NAT / firewall, with a media relay (or relay) and a P2P / PSTN gateway to PSTN / mobile. Labeled steps: (1) look up the network address of node B or node E, (2) signaling, (3) signaling or media, (4) media.]
• node = user agent
• nodes form an overlay; share responsibilities for message routing, signaling, media relaying
• super nodes, ordinary nodes
Reliability of p2p comm. systems?
Relay selection techniques?
Outline
Reliability and Relay Selection
• Motivation: sources of unreliability in p2p comm. systems?
• How many relays per call to achieve 99.9% success rate?
• Improving reliability of relayed calls: how to improve the reliability of relayed calls?
• How to quantify the interference of relayed calls with other applications?
• How to find a relay in O(1) hop that minimizes latency and user annoyance?
Reliability framework
• Reliability = proportion of completed calls (99.9%)
• Goal
– understand reasons for call failure
– devise techniques to address them
• Reasons for call failure
– (1) distributed search fails to find online callee
• DHT lookup
– (2) distributed search fails to find a suitable relay
• DHT lookup or any appropriate relay selection scheme
– (3) relay fails during voice/video session
• understand and improve reliability for relayed calls
• devise techniques for finding a relay
Outline
Reliability and Relay Selection
• Motivation
• How many relays per call to achieve 99.9% success rate?
Understanding reliability of relayed calls
• Percentage of VoIP calls that need relaying
– the provider knows
– 15-20% of calls for a commercial client-server IM / VoIP application
– 341 relays in 20 days for Skype [Suh05Infocom]
• 17 per day for a super node (~50K super nodes)
– Some client-server providers relay all calls
– NAT studies
Understanding reliability of relayed calls
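For a desired reliability, what is the minimum number of relays per call?
– let X_i and R_i denote the lifetime and residual lifetime of the i-th relay candidate (i.i.d.)
– let D denote the call duration
– when the i-th relay fails, the call is switched to the (i+1)st relay, which is instantly selected from the global pool of all relays
[Figure: timeline of a call of duration D covered by the residual lifetimes R_1, ..., R_{k-1}, R_k of relays 1, ..., k-1, k]
$$\text{desired reliability (99.9\%)} \le P\left(\sum_{i=1}^{k} R_i > D\right)$$
– k is the smallest number of relays such that the call completion probability is greater than or equal to the desired reliability
– k depends on the relationship between node lifetime and call duration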
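The condition above can be checked numerically for any lifetime and call-duration distribution. The following is a minimal Monte Carlo sketch, not the authors' method: the function names, the `exponential` helper, and the choice of exponential samplers in the usage note are assumptions made for illustration.

```python
import random


def exponential(mean):
    """Return a sampler for an exponential variable with the given mean (illustrative helper)."""
    return lambda rng: rng.expovariate(1.0 / mean)


def completion_probability(k, sample_residual, sample_duration,
                           trials=100_000, seed=0):
    """Estimate P(R_1 + ... + R_k > D) by simulation.

    sample_residual(rng) draws one residual relay lifetime R_i,
    sample_duration(rng) draws one call duration D.
    """
    rng = random.Random(seed)
    completed = 0
    for _ in range(trials):
        duration = sample_duration(rng)
        covered = sum(sample_residual(rng) for _ in range(k))
        if covered > duration:
            completed += 1
    return completed / trials


def min_relays(target, sample_residual, sample_duration, max_k=50, **kwargs):
    """Smallest k whose estimated completion probability reaches the target, or None."""
    for k in range(1, max_k + 1):
        p = completion_probability(k, sample_residual, sample_duration, **kwargs)
        if p >= target:
            return k
    return None
```

For example, `min_relays(0.999, exponential(12.0), exponential(1.0))` models residual lifetimes with a 12-hour mean and calls with a 1-hour mean, both assumed exponential. Heavier-tailed lifetimes can be tried by passing a different sampler.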
Understanding reliability of relayed calls
Exponential node lifetimes
$$99.9\% \le 1 - \left(\frac{\lambda}{\lambda + v}\right)^{k}$$
[Plot: min # of relays k vs. mean node lifetime / mean call duration]
Skype node lifetimes
– 12 hours (mean), 4 hours (median); lifetimes approximated as Pareto
– mean call holding time = one hour
– 95% of Skype relayed calls last less than 60 mins
[Plot: min # of relays k]
For 95% of Skype relayed call durations: a minimum of 3 relays to maintain a 99.9% success rate
What if the system does not have enough relays?
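For the exponential case, the minimum k can be computed directly from the bound 99.9% ≤ 1 − (λ/(λ+v))^k. The sketch below assumes λ is the relay failure rate (1 / mean node lifetime) and v is the call completion rate (1 / mean call duration); the function name and parameters are illustrative.

```python
def min_relays_exponential(mean_lifetime, mean_call_duration, target=0.999):
    """Smallest k with 1 - (lam / (lam + v))**k >= target.

    Assumes exponential relay lifetimes and exponential call durations.
    """
    lam = 1.0 / mean_lifetime        # relay failure rate (assumed meaning of lambda)
    v = 1.0 / mean_call_duration     # call completion rate (assumed meaning of v)
    q = lam / (lam + v)              # probability a relay fails before the call ends
    k = 1
    while 1.0 - q ** k < target:
        k += 1
    return k
```

With a mean lifetime of 12 hours and a mean call duration of 1 hour, `min_relays_exponential(12.0, 1.0)` returns 3. When the two means are equal, `min_relays_exponential(1.0, 1.0)` returns 10, which shows how quickly the requirement grows as node lifetimes approach call durations.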
Outline
Reliability and Relay Selection
• Motivation
• Improving reliability of relayed calls: how to improve the reliability of relayed calls?
Improving reliability of relayed calls
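• Approach 1 -- no-replacement
– select k relays at the beginning of a call
– do not replace failed relays
– pure death process
$$P(R_{NR} > D) = 1 - P(\max(R_1, \ldots, R_k) < D)$$
• Approach 2 -- with-replacement
– select k relays at the beginning of a call
– replace failed relays (repair rate μ)
– no failure during switch-over
– Skype uses a 2-relay with-replacement scheme
[Figure: Markov chain for 2 relays with states 2, 1, 0 (working relays); transitions 2 → 1 at rate 2λ, 1 → 0 at rate λ, 1 → 2 at rate μ; self-loops 1-2λ and 1-(λ + μ)]
$$\frac{1}{\mathrm{MTTF}} = \frac{2\lambda^{2}}{3\lambda + \mu} \quad \text{[Bir04]}$$
$$P(R_{WR} > t) \approx e^{-t/\mathrm{MTTF}} \quad \text{for } \mu \gg \lambda$$
$$P(R_{WR} > D) \approx \frac{v}{1/\mathrm{MTTF} + v}$$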
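The two approaches can be compared numerically under exponential assumptions. This is a sketch, not the authors' code: it assumes relay lifetimes are exponential with rate λ (`lam`), call durations are exponential with rate `v`, and, for with-replacement, that the time to lose both relays is approximately exponential with mean MTTF (reasonable when μ is much larger than λ). All function names are illustrative.

```python
def p_complete_no_replacement(k, lam, v):
    """P(R_NR > D) for k relays and no replacement (pure death process).

    With j relays alive, the next relay failure (rate j*lam) precedes the
    end of the call (rate v) with probability j*lam / (j*lam + v). The call
    fails only if all k relays fail before it ends.
    """
    p_all_fail = 1.0
    for j in range(1, k + 1):
        p_all_fail *= (j * lam) / (j * lam + v)
    return 1.0 - p_all_fail


def mttf_two_relays_with_replacement(lam, mu):
    """MTTF of the 2-relay with-replacement chain: (3*lam + mu) / (2*lam**2)."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def p_complete_with_replacement(lam, mu, v):
    """Approximate P(R_WR > D), treating R_WR as exponential with mean MTTF."""
    failure_rate = 1.0 / mttf_two_relays_with_replacement(lam, mu)
    return v / (v + failure_rate)
```

As an example in units of hours, take `lam = 1/12` (12-hour mean lifetime), `mu = 60` (a 60-second search time), and `v = 1` (one-hour mean call). Two relays without replacement give 1 − 1/91 ≈ 0.989, while the 2-relay with-replacement scheme gives an MTTF of 4338 hours and a completion probability of about 0.9998.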
Improving reliability of relayed calls
• No-replacement – add more relays?
– diminishing returns
– MTTF gain (exp): 1 vs. 2 relays: 50%; 2 vs. 3: 22%; 3 vs. 4: 13%
• No-replacement (NR) vs. with-replacement (WR)
– depends on mean lifetime, call duration, repair time
[Plot: 2-relay with-replacement; Skype lifetimes with mean = 12 hours, median = 4 hours; search time = 60 s]
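The diminishing returns of the no-replacement scheme follow from a standard result for exponential lifetimes: with k relays in parallel and no repair, the mean time until all have failed is (1/λ)(1 + 1/2 + ... + 1/k). The sketch below uses illustrative function names and computes the relative MTTF gain of each added relay.

```python
def mttf_no_replacement(k, lam):
    """Mean time until all k exponential(lam) relays have failed, without repair."""
    return sum(1.0 / (j * lam) for j in range(1, k + 1))


def relative_gains(max_k, lam=1.0):
    """Relative MTTF gain of going from k-1 to k relays, for k = 2..max_k."""
    gains = []
    for k in range(2, max_k + 1):
        previous = mttf_no_replacement(k - 1, lam)
        gains.append((mttf_no_replacement(k, lam) - previous) / previous)
    return gains
```

`relative_gains(4)` yields gains of 50%, about 22.2%, and about 13.6%, in line with the 50%, 22%, and 13% figures quoted for the exponential case. The gains do not depend on λ.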
Outline
Reliability and Relay Selection
• Motivation
• Improving reliability of relayed calls
• How to quantify the interference of the relayed call with other applications?
User annoyance
• Interference of relayed call with other applications
running on the relay machine
• File sharing = mutually beneficial (tit-for-tat)
• Relaying = altruistic
• Provide incentives or minimize user annoyance
• How to quantify user annoyance?
– automatically?
– spare network capacity
• Issues in measuring spare capacity?
– bandwidth tests, ALTO
Outline
Reliability and Relay Selection
• Motivation
• Improving reliability of relayed calls
• How to find a relay in O(1) hop that minimizes latency and user annoyance?
Distributed relay selection
• Goal: O(1) hop
• 2-level hierarchical network
[Figure: a node behind a NAT asks "Give me a relay" and is told "Here is a randomly selected relay"; relay candidates are described by IP address, RTT, and bandwidth. Labels: search performance, dropped calls, close-by, 1-relay local-random scheme]
Distributed relay selection
• Delay
• User annoyance
– interference with user applications
– file sharing (draft idle peers)
– spare capacity
• random
• mindelay
– select relay with minimum delay
• netmax
– select relay with maximum spare bw
• threshold
– select relays with delay < 150 ms and maximum spare capacity
• Results
– strategies perform similarly near system collapse point
– minimizing latency increases annoyance and number of jobs per relay, and vice versa
– threshold approach performs reasonably well
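The four selection strategies can be sketched as simple functions over a list of relay candidates. This is an illustration only: the `Relay` record, its field names, and returning `None` when no candidate meets the delay threshold are assumptions, since the slides do not specify them.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relay:
    """Hypothetical relay candidate record (field names are assumptions)."""
    address: str
    delay_ms: float        # measured delay to the candidate
    spare_bw_kbps: float   # estimated spare network capacity


def select_random(relays: List[Relay], rng=random) -> Relay:
    """random: pick any candidate uniformly at random."""
    return rng.choice(relays)


def select_mindelay(relays: List[Relay]) -> Relay:
    """mindelay: pick the candidate with minimum delay."""
    return min(relays, key=lambda r: r.delay_ms)


def select_netmax(relays: List[Relay]) -> Relay:
    """netmax: pick the candidate with maximum spare bandwidth."""
    return max(relays, key=lambda r: r.spare_bw_kbps)


def select_threshold(relays: List[Relay],
                     max_delay_ms: float = 150.0) -> Optional[Relay]:
    """threshold: among candidates with delay below the limit, pick max spare capacity.

    Returns None if no candidate is below the limit (assumed fallback).
    """
    eligible = [r for r in relays if r.delay_ms < max_delay_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.spare_bw_kbps)
```

The threshold strategy shows the trade-off directly: it accepts any relay that is close enough and then spends the remaining freedom on reducing annoyance, rather than optimizing either metric alone.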
Related work
• Modeling
– On lifetime-based node failure and stochastic resilience of
decentralized peer-to-peer networks [Leonard09ToN]
• Minimizing churn
– Minimizing churn in distributed systems [Godfrey06Sigcom]
• Relay selection
– ASAP: an AS-aware peer relay protocol for high quality VoIP
[Ren06ICDCS]
$ diff this related_work
– focus on node isolation
– minimizing churn is not sufficient
– reliability, relay selection, user annoyance
Conclusion
• Framework for analyzing reliability in p2p
communication systems
• A model for reliability of relayed calls
• Reliability improvement schemes
• User annoyance
• Distributed relay selection