Peer-to-peer systems for autonomic VoIP and web
hotspot handling
Kundan Singh, Weibin Zhao and Henning Schulzrinne
Internet Real Time Laboratory
Computer Science Dept., Columbia University, New York
http://www.cs.columbia.edu/IRT/p2p-sip
http://www.cs.columbia.edu/IRT/dotslash
P2P for autonomic computing
• Autonomic at the application layer:
  – Robust against partial network faults
  – Resources grow as user population grows
  – Self-configuring
• Traditional p2p systems
  – file storage
    • motivation is often legal, not technical, efficiency
  – usually unstructured, optimized for Zipf-like popularity
• Other p2p applications:
  – Skype demonstrates usefulness for VoIP
    • identifier lookup
    • NAT traversal for media
  – OpenDHT (and similar) as emerging common infrastructure?
  – Non-DHT systems with smaller scope → web hotspot rescue
  – Network management (see our IRTF slides)
Aside: middle services instead of middleware
• Common & successful network services
– identifier lookup: ARP, DNS
– network storage: proprietary (Yahoo, .mac, …)
– storage + computation: CDNs
• Emerging network services
– peer-to-peer identifier lookup
– network storage
– network computation (“utility”)
• maybe programmable
• already found as web hosts and grid computing
What is P2P?
• Share the resources of individual peers
  – CPU, disk, bandwidth, information, …
(taxonomy figure: computer systems split into centralized (mainframes,
workstations) and distributed; distributed splits into client-server, either
flat (RPC, HTTP) or hierarchical (DNS, mount), and peer-to-peer, either pure
(Gnutella, Chord, Freenet, Overnet) or hybrid (Napster, Groove, Kazaa);
inset diagrams contrast a client-server star with a peer-to-peer mesh)
• Example p2p application domains:
  – Communication and collaboration: Magi, Groove, Skype
  – File sharing: Napster, Gnutella, Kazaa
  – Distributed computing: SETI@Home, folding@Home
Distributed Hash Table (DHT)
• Types of search
– Central index (Napster)
– Distributed index with flooding (Gnutella)
– Distributed index with hashing (Chord, Bamboo, …)
• Basic operations (see the sketch below):
  find(key), insert(key, value), delete(key), but no search(*)
• Properties/types:

  Property                  Every peer has       Chord         Every peer has
                            the complete table                 one key/value
  Search time or messages   O(1)                 O(log N)      O(N)
  Join/leave messages       O(N)                 O((log N)²)   O(1)
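A minimal sketch of that interface in C++, where a single in-process map stands
in for the distributed table; names are illustrative, not from SIPpeer or
OpenDHT:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for the DHT interface above: exact-match find/insert/delete,
// but no search("*") over keys or values.
class Dht {
public:
    void insert(const std::string& key, const std::string& value) {
        table_[key].push_back(value);                 // keys may be multi-valued
    }
    std::optional<std::vector<std::string>> find(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;  // not stored anywhere
        return it->second;
    }
    void erase(const std::string& key) { table_.erase(key); }

private:
    // A real DHT partitions this table across nodes by hashing the key;
    // a single map stands in for one node's shard here.
    std::map<std::string, std::vector<std::string>> table_;
};
```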
CAN
Content Addressable Network
• Divide the coordinate space into zones
• Each key maps to one point in the d-dimensional space
• Each node is responsible for all the keys in its zone
(figure: the unit square [0,1]×[0,1] partitioned into zones owned by nodes A–E)
CAN
(figure: 2-d coordinate space with zones owned by A–E; node X routes toward the
point (x,y) = (.3,.1) through neighboring zones; node Z joins by splitting X's zone)
• Node X locates (x,y) = (.3,.1)
• Node Z joins
• State = 2d neighbors per node
• Search = d·N^(1/d) hops (worked example below)
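To see the state/search tradeoff concretely (illustrative numbers, not from the
talk):

```latex
d = 2,\ N = 10^4:\quad \text{state} = 2d = 4 \text{ neighbors},\quad
\text{search} = d \cdot N^{1/d} = 2\sqrt{10^4} = 200 \text{ hops}
\\
d = 4,\ N = 10^4:\quad \text{state} = 2d = 8 \text{ neighbors},\quad
\text{search} = 4 \cdot 10^{4/4} = 40 \text{ hops}
```

Raising d buys shorter routes at the cost of more per-node state.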
Chord
• Identifier circle
• Keys assigned to successor
• Evenly distributed keys and nodes
(figure: identifier circle with nodes at 1, 8, 14, 21, 32, 38, 42, 47, 54 and 58;
each key, e.g. 10, 24, 30 or 38, is stored at the first node that follows it on
the circle)
Chord
(figure: the same identifier circle, showing node 8's fingers)
• Finger table: log N entries (sketch below)
  – the ith finger points to the first node that succeeds n by at least 2^(i-1)
  – stabilization after join/leave

  Finger of node 8    Successor node
  8+1  = 9            14
  8+2  = 10           14
  8+4  = 12           14
  8+8  = 16           21
  8+16 = 24           32
  8+32 = 40           42
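A minimal C++ sketch of this rule, assuming a 6-bit identifier space and the
node set from the figure (illustrative, not SIPpeer code):

```cpp
#include <iostream>
#include <set>

// First live node whose ID is >= id, wrapping around the identifier circle.
int successor(const std::set<int>& nodes, int id) {
    auto it = nodes.lower_bound(id);
    return it != nodes.end() ? *it : *nodes.begin();  // wrap past the top
}

int main() {
    const int m = 6, space = 1 << m;   // 6-bit IDs, as in the figure
    std::set<int> nodes = {1, 8, 14, 21, 32, 38, 42, 47, 54, 58};
    int n = 8;
    // ith finger of node n: successor of (n + 2^(i-1)) mod 2^m, i = 1..m.
    for (int i = 1; i <= m; ++i) {
        int start = (n + (1 << (i - 1))) % space;
        std::cout << n << "+" << (1 << (i - 1)) << " = " << start
                  << " -> node " << successor(nodes, start) << "\n";
    }
}
```

Running this reproduces the table above: node 8's fingers land on 14, 14, 14,
21, 32 and 42.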
Tapestry
• IDs with base B = 2^b
• Route to the node numerically closest to the given key
• Routing table has O(B) columns, one per digit of the node ID
• Similar to CIDR, but suffix-based: each hop matches one more trailing digit,
  e.g. **4 => *64 => 364 (sketch below)
(figure: overlay with nodes 123, 135, 324, 364, 365, 427, 564, 763; node 364's
routing table has one column per digit position: level-0 entries ??0…??6,
level-1 entries ?04…?64, level-2 entries 064…664)
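One routing step of that suffix-matching scheme, sketched in C++ (helper names
are illustrative, not Tapestry's API):

```cpp
#include <string>
#include <vector>

// Number of trailing digits two IDs share, counted from the right.
int sharedSuffix(const std::string& a, const std::string& b) {
    int n = 0;
    while (n < (int)a.size() && n < (int)b.size() &&
           a[a.size() - 1 - n] == b[b.size() - 1 - n]) ++n;
    return n;
}

// One hop: among known nodes, pick one that shares at least one more
// trailing digit with the key than we do -- e.g. **4 => *64 => 364.
std::string nextHop(const std::string& self, const std::string& key,
                    const std::vector<std::string>& known) {
    int have = sharedSuffix(self, key);
    for (const auto& node : known)
        if (sharedSuffix(node, key) > have) return node;  // one digit closer
    return self;  // no better node: we are the root for this key
}
```

For instance, nextHop("123", "364", {"564", "364"}) returns "564", which shares
the suffix "64" with the key, matching the *64 step in the figure.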
Pastry
• Prefix-based
• Route to a node whose ID shares at least one more prefix digit with the key
  than this node's ID does
• Each node keeps a neighbor set, a leaf set and a routing table
(figure: Route(d46a1c) starting at node 65a1fc and reaching d46a1c via d13da3,
d4213f, d462ba and d467c4, matching one more hex digit of the prefix per hop)
Other schemes
• Distributed trie
• Viceroy
• Kademlia
• SkipGraph
• Symphony
• …
DHT Comparison
  Property /    Un-structured    CAN         Chord       Tapestry   Pastry      Viceroy
  scheme
  Routing       O(N) or no       d·N^(1/d)   log N       log_B N    log_B N     log N
                guarantee
  State         constant         2d          log N       log_B N    B·log_B N   log N
  Join/leave    constant         2d          (log N)²    log_B N    log_B N     log N

• Reliability and fault resilience:
  – Un-structured: data at multiple locations; retry on failure; finding popular
    content is efficient
  – CAN: multiple peers for each data item; retry on failure; multiple paths to
    the destination
  – Chord: replicate data on consecutive peers; retry on failure
  – Tapestry/Pastry: replicate data on multiple peers; keep multiple paths to
    each peer
  – Viceroy: routing load is evenly distributed among participating lookup servers
Server-based vs peer-to-peer
• Reliability, failover latency
  – server-based: DNS-based; depends on client retry timeout, DB replication
    latency and registration refresh interval
  – P2P: DHT self-organization and periodic registration refresh; depends on
    client timeout and registration refresh interval
• Scalability, number of users
  – server-based: depends on the number of servers in the two stages
  – P2P: depends on refresh rate, join/leave rate and uptime
• Call setup latency
  – server-based: one or two steps
  – P2P: O(log N) steps
• Security
  – server-based: TLS, digest authentication, S/MIME
  – P2P: additionally needs a reputation system and a way to work around spy nodes
• Maintenance, configuration
  – server-based: administrator handles DNS, database, middle-boxes
  – P2P: automatic; only one-time bootstrap node addresses
• PSTN interoperability
  – server-based: gateways, TRIP, ENUM
  – P2P: interact with server-based infrastructure, or co-locate a peer node
    with the gateway
The basic SIP service
• HTTP: retrieve resource identified by URI
• SIP: translate an address-of-record SIP URI (e.g., sip:alice@example.com)
  to one or more contacts (hosts or other AORs)
  – single user → multiple hosts
    • e.g., home, office, mobile, secretary
    • contacts can be equal or ordered sequentially
• Thus, SIP is (also) a binding protocol
  – similar, in spirit, to mobile IP, but at the application layer and without
    some of the related issues
• Function performed by the SIP proxy for the AOR's domain
  – delegated logically to a location server
• This function is being replaced by p2p approaches (key mapping sketched below)
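In the p2p replacement, the AOR is hashed into the DHT's identifier space; a toy
C++ sketch (std::hash for brevity, where Chord-style deployments would typically
use SHA-1's 160-bit space):

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Map an address-of-record to a DHT key: hash the AOR into the identifier
// space (2^32 here for brevity). The node whose ID succeeds this key on the
// circle stores the AOR's contact bindings.
uint32_t aorToKey(const std::string& aor) {
    return static_cast<uint32_t>(std::hash<std::string>{}(aor));
}
// e.g., aorToKey("sip:alice@example.com") yields the key under which
// Alice's contacts (hosts/IPs) are inserted and looked up.
```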
What is SIP? Why P2P-SIP?
• Server-based SIP (figure):
  (1) REGISTER alice@columbia.edu => 128.59.19.194, from Alice's host to the
      columbia.edu server
  (2) INVITE alice@columbia.edu, from Bob's host
  (3) Contact: 128.59.19.194
  – Problem in client-server: maintenance, configuration, controlled infrastructure
• P2P-SIP (figure):
  (1) REGISTER stores Alice → 128.59.19.194 in the peer-to-peer network
  (2) INVITE alice
  (3) 128.59.19.194
  – No central server, but more lookup latency
How to combine SIP + P2P?
• SIP-using-P2P
  – Replace the SIP location service by a P2P protocol
• P2P-over-SIP
  – Additionally, implement the P2P maintenance itself using SIP messaging

                SIP-using-P2P   P2P SIP proxies   P2P-over-SIP
  Maintenance   P2P             P2P               SIP
  Lookup        P2P             SIP               SIP

(figures: in SIP-using-P2P, REGISTER becomes INSERT and "INVITE alice" becomes
FIND on the P2P network; in P2P-over-SIP, REGISTER and INVITE
sip:alice@columbia.edu are routed within the P2P-SIP overlay to Alice at
128.59.19.194)
Design alternatives
(figures: a Chord ring inside a server farm with attached clients; a Chord ring
spanning all clients; a Pastry-style overlay among super-nodes)
• Use DHT in a server farm
• Use DHT for all clients, but some clients are resource-limited
• Use DHT among super-nodes
  1. hierarchy
  2. dynamically adapt
Deployment scenarios
(figure: three clouds of peers P)
• P2P clients
  – Plug and play; may use adaptors; untrusted peers
• P2P proxies
  – Zero-conf server farm; trusted servers and user identities
• P2P database
  – Global, e.g., OpenDHT; clients or proxies can use it; trusted deployed peers
• Interoperate among these!
Hybrid architecture
• Cross register, or
• Locate during call setup
– DNS, or
– P2P-SIP hierarchy
What else can be P2P?
• Rendezvous/signaling (SIP)
• Configuration storage
• Media storage (e.g., voice mail)
• Identity assertion (?)
• PSTN gateway (?)
• NAT/media relay (find the best one)
Trust models are different for different components!
What is our P2P-SIP?
• Unlike server-based SIP architecture
• Unlike proprietary Skype architecture
  – Robust and efficient lookup using DHT
  – Interoperability
    • DHT algorithm uses SIP communication
  – Hybrid architecture
    • Lookup in SIP+P2P
• Unlike file-sharing applications
  – Data storage, caching, delay, reliability
• Disadvantages
  – Lookup delay and security
Implementation: SIPpeer
• Platform: Unix (Linux), C++
• Modes:
– Chord: using SIP for P2P maintenance
– OpenDHT: using external P2P data storage
• based on Bamboo DHT, running on PlanetLab nodes
• Scenarios:
– P2P client, P2P proxies
– Adaptor for existing phones
• Cisco, X-lite, Windows Messenger, SIPc
– Server farm
P2P-SIP: identifier lookup
• P2P serves as the SIP location server:
  – address-of-record → contacts
  – e.g., alice@example.com → 128.59.16.1, 128.72.50.13
• Multi-valued: (key_n, value_1), (key_n, value_2)
• With limited TTL
• Variant: point to a SIP proxy server
  – either operated by a supernode or a traditional server
    • allows registration of non-p2p SIP domains (*@example.com)
  – easier to provide call routing services (e.g., CPL)
(figure: DHT storing alice → 128.59.16.1 and alice → 128.72.50.13; a sketch of
this binding store follows)
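A minimal C++ sketch of a multi-valued, TTL-limited binding store as described
above (structure and names are illustrative, not SIPpeer's):

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// One AOR key maps to several contacts; each binding expires after its TTL
// unless refreshed by a new REGISTER.
struct Binding { std::string contact; Clock::time_point expires; };

class LocationService {
public:
    void insert(const std::string& aor, const std::string& contact,
                std::chrono::seconds ttl) {
        bindings_[aor].push_back({contact, Clock::now() + ttl});
    }
    std::vector<std::string> find(const std::string& aor) {
        std::vector<std::string> out;
        auto now = Clock::now();
        auto& v = bindings_[aor];
        // Drop expired bindings lazily on lookup (std::erase_if is C++20).
        std::erase_if(v, [&](const Binding& b) { return b.expires <= now; });
        for (const auto& b : v) out.push_back(b.contact);
        return out;
    }
private:
    std::map<std::string, std::vector<Binding>> bindings_;
};
```

Inserting two contacts under the same AOR yields a multi-valued lookup, as in
the figure; expired bindings vanish unless refreshed.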
Background: DHT (Chord)
• Identifier circle; keys assigned to successor
• Evenly distributed keys and nodes
• Finger table: log N entries
  – the ith finger points to the first node that succeeds n by at least 2^(i-1)
• Stabilization for join/leave
(figure: the identifier circle and node-8 finger table from the earlier Chord
slides)
Implementation: SIPpeer
(block diagram: a user interface (buddy list, etc.) sits on top of a
user-location module; on startup the node discovers peers via multicast
REGISTER, detects NATs using ICE, and then either joins the DHT (Chord) or signs
up and finds buddies; on reset it signs out, transfers its registrations and
leaves; the SIP layer carries REGISTER, INVITE and MESSAGE for both SIP-over-P2P
and P2P-using-SIP; the media path uses audio devices, codecs and RTP/RTCP)
P2P vs. server-based SIP
• Prediction:
– P2P for smaller &
quick setup
scenarios
– Server-based for
corporate and
carrier
• Need federated system
– multiple p2p
systems, identified
by DNS domain
name
– with gateway nodes
(measurement: 2000 requests/second ≈ 7 million registered users)
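That equivalence presumably assumes the common one-hour SIP registration
refresh:

```latex
\frac{7 \times 10^{6}\ \text{users}}{3600\ \text{s refresh interval}}
  \approx 1944 \approx 2000\ \text{REGISTER requests/s}
```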
Open issues
• Presence and IM
– where to store presence information: need access authorization
• Performance
– how many supernodes are needed? (Skype: ~1000)
• Reliability
– P2P nodes generally replicate data
– if proxy or presence agent at leaf, need proxy data replication
• Security
– Sybil attacks: blackholing supernodes
– Identifier protection: protect first registrant against identity theft
– Anonymity, encryption
– Protecting voicemails on storage nodes
• Optimization
– Locality, proximity, media routing
• Deployment
– SIP-using-P2P vs P2P-over-SIP; intranets; ISP servers
• Motivation
– Why should I run as super-node?
Comparison of P2P and server-based systems
                server-based                      P2P
  scaling       grows with server count           scales with user count, but
                                                  limited by supernode count
  efficiency    most efficient                    DHT maintenance = O((log N)²)
  security      trust the server provider;        trust most supernodes;
                binary                            probabilistic
  reliability   server redundancy; catastrophic   unreliable supernodes;
                failure possible                  catastrophic failure unlikely
Using P2P for binding updates
• Proxies do more than just plain identifier translation:
  – translation may depend on who's asking, time of day, …
    • e.g., based on script output
    • hide the full range of contacts from the caller
  – sequential and parallel forking
  – disconnected services: e.g., forward to voicemail if no answer
• Using a DHT as a location service →
  – use only plain translation
  – run services on end systems (the Skype approach)
  – run proxy services on supernode(s) and use the proxy as the contact →
    needs replication for reliability
Reliability and scalability
Two-stage architecture for CINEMA
(figure: first stage of stateless proxies s1, s2, s3 plus off-site backup ex for
example.com; second stage of clusters, each with a master and a slave: a1/a2
serve a*@example.com, b1/b2 serve b*@example.com; incoming requests such as
sip:alice@example.com and sip:bob@example.com are spread over the first stage
and routed to the right cluster)

DNS SRV records:
  example.com    _sip._udp   SRV 0 40 s1.example.com
                             SRV 0 40 s2.example.com
                             SRV 0 20 s3.example.com
                             SRV 1 0  ex.backup.com
  a.example.com  _sip._udp   SRV 0 0  a1.example.com
                             SRV 1 0  a2.example.com
  b.example.com  _sip._udp   SRV 0 0  b1.example.com
                             SRV 1 0  b2.example.com

• Request-rate = f(#stateless, #groups)
• Bottleneck: CPU, memory, bandwidth?
• Failover latency: ?
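How clients spread load over these records is standard DNS SRV selection
(RFC 2782), sketched here in C++; this is not CINEMA code:

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

struct SrvRecord { int priority; int weight; std::string target; };

// RFC 2782 selection (sketch): keep only the lowest-priority records, then
// pick one at random with probability proportional to its weight.
// Assumes recs is non-empty.
std::string pickSrv(std::vector<SrvRecord> recs, std::mt19937& rng) {
    int best = std::min_element(recs.begin(), recs.end(),
        [](const SrvRecord& a, const SrvRecord& b) {
            return a.priority < b.priority;
        })->priority;
    std::erase_if(recs, [&](const SrvRecord& r) { return r.priority != best; });

    int total = 0;
    for (const auto& r : recs) total += r.weight;
    if (total == 0) return recs.front().target;   // all weights zero

    std::uniform_int_distribution<int> dist(0, total - 1);
    int pick = dist(rng);
    for (const auto& r : recs)
        if ((pick -= r.weight) < 0) return r.target;
    return recs.back().target;                    // not reached
}
```

With the example.com records above, s1 and s2 each receive about 40% of new
requests and s3 about 20%; ex.backup.com (priority 1) is used only when every
priority-0 target fails.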
SIP p2p summary
• Advantages
  – Out-of-box experience
  – Robust
    • catastrophic failure unlikely
  – Inherently scalable
    • more resources with more nodes
• Status
  – IETF involvement
  – Columbia SIPpeer
• Security issues
  – Trust, reputation
  – Malicious nodes, Sybil attacks
  – SPAM, DDoS
  – Privacy, anonymity (?)
• Other issues
  – Lookup latency, proximity
  – P2P-SIP vs SIP-using-P2P
  – Why should I run as a supernode?
http://www.p2psip.org and http://www.cs.columbia.edu/IRT/p2p-sip
DotSlash: An Automated Web Hotspot Rescue System
Weibin Zhao and Henning Schulzrinne
The problem
• Web hotspots
– Also known as flash crowds or the Slashdot effect
– Short-term dramatic load spikes at web servers
• Existing mechanisms are not sufficient
– Over-provisioning
• Inefficient for rare events
• Difficult because the peak load is hard to predict
– CDNs
• Expensive for small web sites that experience the
Slashdot effect
The challenges
• Automate hotspot handling
– Eliminate human intervention to react quickly
– Improve availability during critical periods (“15
minutes of fame”)
• Allocate resources dynamically
– Static configuration is insufficient for unexpected
dramatic load spikes
• Address different bottlenecks
– Access network, web server, application server, and
database server
Our approach
• DotSlash
  – An automated web hotspot rescue system that builds an adaptive,
    distributed web server system on the fly
• Advantages
  – Fully self-configuring: no manual configuration
    • service discovery, adaptive control, dynamic virtual hosting
  – Scalable, easy to use
  – Works for static & LAMP applications
    • handles network, CPU and database server bottlenecks
  – Transparent to clients
    • cf. CoralCache
DotSlash overview
• Rescue model
– Mutual aid community using spare capacity
– Potential usage by web hosting companies
• DotSlash components
– Workload monitoring
– Rescue server discovery
– Load migration (request redirection)
– Dynamic virtual hosting
– Adaptive rescue and overload control
Handling load spikes
• Request redirection
– DNS-RR: reduce arrival rate
– HTTP redirect: increase service rate
• Handle different bottlenecks
  Technique                        Bottleneck addressed
  Cache static content             Network, web server
  Replicate scripts dynamically    Application server
  Cache query results on demand    Database server
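Before any of these techniques apply, the origin decides per request whether to
hand it off at all; a toy C++ sketch of HTTP-redirect-based load migration
(hypothetical class, not the mod_dots API):

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Redirect a fraction p of requests to a rescue server chosen round-robin;
// serve the rest locally. Illustrative only.
class Redirector {
public:
    void setProbability(double p) { p_ = p; }
    void addRescue(const std::string& host) { rescues_.push_back(host); }
    // Returns a Location target for an HTTP 302, or "" to serve locally.
    std::string route(std::mt19937& rng) {
        if (rescues_.empty()) return "";
        std::bernoulli_distribution redirect(p_);
        if (!redirect(rng)) return "";
        return rescues_[next_++ % rescues_.size()];
    }
private:
    double p_ = 0.0;
    std::vector<std::string> rescues_;
    std::size_t next_ = 0;
};
```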
Rescue example
• Cache static content
(figure: client1 sends request (1) to the origin server, receives an HTTP
redirect (2), and fetches the content (3)/(4) from a rescue server acting as a
reverse proxy for the origin; client2 is steered (1)/(2) by the DNS server via
DNS round robin and served (3)/(4) directly by the rescue server)
Rescue example (2)
• Replicate scripts dynamically
(figure: the client's request (1) reaches the origin Apache server and is
redirected (2) to the rescue server; the rescue server's Apache fetches the PHP
script from the origin (3)/(4), runs it locally with its own PHP (5)/(6),
queries the origin's MySQL database server (7), and returns the response (8))
Rescue example (3)
• Cache query results on demand
(figure: the client talks to the origin or a rescue server; each runs a data
driver with a query result cache in front of the database server, so the rescue
server can answer repeated queries from its cache instead of reaching the
origin's database)
Server states
(state diagram: a server normally sits in the normal state; as an origin server
it gets help from others by allocating a rescue server, entering the SOS state,
and returns to normal by releasing all rescues; as a rescue server it provides
help to others by accepting an SOS request, entering the rescue state, and
returns to normal by shutting down all rescues)
Handling load spikes
• Load migration
– DNS-RR: reduce arrival rate
– HTTP redirect: increase service rate
– Both: increase throughput
• Benefits
– Reduce origin server network load by caching
static content at rescue servers
– Reduce origin web server CPU load by
replicating scripts dynamically to rescue
servers
Adaptive overload control
• Objective
– CPU and network load in desired load region
• Origin server
– Allocate/release rescue servers
– Adjust redirect probability
• Rescue server
– Accept SOS requests
– Shutdown rescues
– Adjust allowed redirect rate
Self-configuring
• Rescue server discovery via SLP and DNS SRV
• Dynamic virtual hosting:
– Serving content of a new site on the fly
– use “pre-positioned” Apache virtual hosts
• Workload monitoring: network and CPU
– take headers and responses into account
• Adaptive rescue control (sketched below)
  – precise load-handling capacity of rescue servers is unknown
    • particularly for active content
  – establish a desired load region (typically ~70%)
  – periodically measure load and adjust the redirect probability
    • conveyed via the rescue protocol
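A minimal sketch of such a controller, assuming utilization is measured per
interval in [0,1] and a fixed adjustment step; the actual DotSlash controller
and its protocol encoding are described in the papers cited at the end:

```cpp
#include <algorithm>

// Toy redirect-probability controller (illustrative, not DotSlash's code):
// keep measured CPU/network utilization inside the desired load region by
// redirecting a larger or smaller fraction of requests to rescue servers.
class RedirectController {
public:
    explicit RedirectController(double lo = 0.6, double hi = 0.8)
        : lo_(lo), hi_(hi) {}
    // Called once per measurement interval with current utilization in [0,1].
    double update(double util) {
        if (util > hi_)      p_ = std::min(1.0, p_ + kStep);  // overloaded: shed more
        else if (util < lo_) p_ = std::max(0.0, p_ - kStep);  // underloaded: keep more
        return p_;  // fraction of requests to redirect
    }
private:
    static constexpr double kStep = 0.05;
    double lo_, hi_;
    double p_ = 0.0;
};
```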
Implementation
• Based on LAMP (Linux, Apache, MySQL, PHP)
• Apache module (mod_dots), DotSlash daemon (dotsd),
DotSlash rescue protocol (DSRP)
• Dynamic DNS using BIND with dot-slash.net
• Service discovery using enhanced SLP
(block diagram: an HTTP client talks to Apache with mod_dots loaded; mod_dots
shares state with the dotsd daemon via shared memory; dotsd speaks DSRP to other
dotsd instances, updates BIND through dynamic DNS, and performs service
discovery via SLP/mSLP)
Handling File Inclusions
• The problem
– A replicated script may include files that are
located at the origin server
– Assume: included files under DocumentRoot
• Approaches
  – Renaming inclusion statements
    • needs to parse scripts: heavyweight
  – Customized error handler
    • catch inclusion errors: lightweight
Evaluation
• Workload generation
– httperf for static content
– RUBBoS (bulletin board) for dynamic content
• Testbed
– LAN cluster and WAN (PlanetLab) nodes
– Linux Redhat 9.0, Apache 2.0.49, MySQL 4.0.18, PHP
4.3.6
• Metrics
– Max request rate and max data rate supported
Results in LANs
(plots: request rate, redirect rate and rescue rate over time; data rate over time)
Handling worst-case workload
(plot: settling time of 24 seconds; timeouts: 921 out of 113,565 requests)
Results for dynamic content
• Configuration: origin server (high capacity), nine rescue servers (low
  capacity), database server (high capacity)
• No rescue: R = 118 requests/s; CPU: origin = 100%, DB = 45%
• With rescue (9 rescue servers): R = 245 requests/s; CPU: origin = 55%,
  DB = 100%
• 245/118 > 2: throughput more than doubles
Caching TTL and Hit Ratio (Read-Only)
(plot: cache hit ratio (%), ranging from about 60 to 100, vs. caching TTL from
1 to 1000 seconds on a log scale)
CPU Utilization (Read-Only)
(plot: CPU utilization (%) vs. number of clients from 500 to 4000, comparing the
database server under READ3 (with rescue, no cache), READ4 (with rescue,
co-located cache) and READ5 (with rescue, shared cache), plus READ5's shared
cache server)
Request Rate (Read-Only)
(plot: requests per second, roughly 100 to 550, vs. number of clients from 500
to 4000, for READ3 (with rescue, no cache), READ4 (with rescue, co-located
cache) and READ5 (with rescue, shared cache))
CPU Utilization (Submission)
(plot: origin database server CPU utilization (%) vs. number of clients from
3000 to 7000, for SUB4 (with rescue, no cache), SUB5 (with rescue, cache, no
invalidation) and SUB6 (with rescue, cache, with invalidation))
Request Rate (Submission)
(plot: requests per second, roughly 400 to 900, vs. number of clients from 3000
to 7000, for SUB4 (with rescue, no cache), SUB5 (with rescue, cache, no
invalidation) and SUB6 (with rescue, cache, with invalidation))
Performance
• Static content (httperf)
– 10-fold improvement
– Relieve network and web server bottlenecks
• Dynamic content (RUBBoS)
– Completely remove web/application server
bottleneck
– Relieve database server bottleneck
– Overall improvement: 10 times for read-only mix, 5
times for submission mix
Conclusion
• DotSlash prototype
– Applicable to both static and dynamic content
– Promising performance improvement
– Released as open-source software
• On-going work
– Address security issues in deployment
– Extensible to SIP servers? Web services?
• For further information
– http://www.cs.columbia.edu/IRT/dotslash
– DotSlash framework: WCW 2004
– Dynamic script replication: Global Internet 2005
– On-demand query result cache: TR CUCS-035-05
(under submission)