SIP as infrastructure
Henning Schulzrinne
Dept. of Computer Science, Columbia University, New York
[email protected]
SIP 2007 (upperside.fr)
Paris, France
February 2007
Outline
• Scaling SIP to the real world: emergency calling
• Scaling SIP to very large deployments
– some measurements for designing large servers
– congestion control and dealing with avalanche restart
– P2P SIP
– failure discovery
• The state of SIP standardization, year 11
– developments in 2006 & upcoming highlights
– trouble in standards land

Roadmap
• Introduction
• Emergency calling
• Server scaling
• P2P SIP
• End-to-end management
• Standardization and interoperability

Evolution of VoIP
• 1996-2000: “amazing – the phone rings” (long-distance calling, ca. 1930)
• 2000-2003: “does it do call transfer?” (catching up with the digital PBX)
• 2004-2005: “Can it really replace the phone system?” (replacing the global phone system)
• 2006-: “How can I make it stop ringing?” (going beyond the black phone)

IETF VoIP efforts
• IETF RAI area working groups:
– SIP (protocol)
– SIPPING (usage, requirements)
– SIMPLE (presence)
– ECRIT (emergency calling)
– GEOPRIV (geo + privacy)
– ENUM (E.164 translation)
– SPEERMINT (peering)
– XCON (conf. control)
– IPTEL (tel URL)
– SPEECHSC (speech services)
– AVT (RTP, SRTP, media)
– MMUSIC (SDP, RTSP, ICE)
– SIGTRAN (signaling transport)
• [Figure: the groups are linked by edges labeled “uses”, “may use”, “provides” and “usually used with”]

VoIP emergency communications
• Types: emergency call, emergency alert (“inverse 911”), dispatch, civic coordination
• Contact well-known number or identifier
– now: 112, 911
– transition: 112, 911
– all IP: 112, 911, urn:service:sos
• Route call to location-appropriate PSAP
– now: SR
– transition: VPC
– all IP: LoST
• Deliver precise location to call taker to dispatch emergency help
– now: phone number --> location (ALI lookup)
– transition: in-band --> key --> location
– all IP: in-band

IETF ECRIT working group
• Emergency Contact Resolution with Internet Technologies
• Solve four major pieces of the puzzle:
– location conveyance (with SIP & GEOPRIV)
– emergency call identification
– mapping geo and civic caller locations to PSAP URI
– discovery of local and visited emergency dial string
• Not solving
– location discovery --> GEOPRIV
– inter-PSAP communication and coordination
– citizen notification
• Current status:
– finishing general and security requirements
– agreement on mapping protocol (LoST) and identifier (sos URN)
– working on overall architecture and UA requirements

ECRIT: Options for location delivery
• GPS
• L2: LLDP-MED (standardized version of CDP + location data)
– periodic per-port broadcast of configuration information
– currently implementing CDP
• L3: DHCP for
– geospatial (RFC 3825)
– civic (RFC 4676)
• L7: proposals for retrievals: HELD, RELO, LCP, SIP, …
– for own IP address or by third party (e.g., ISP to infrastructure provider)
– by IP address
– by MAC address
– by identifier (conveyed by DHCP or PPP)
– HELD, RELO: both HTTP-based

ECRIT: Finding the correct PSAP
• Which PSAP should the e-call go to?
– Usually to the PSAP that serves the geographic area
– Sometimes to a backup PSAP
– If no location, then ‘default’ PSAP
– solved by LoST
• Mapping client --> mapping server: I am at "Otto-Hahn-Ring 6, 81739 München". I need to contact the ambulance. (Emergency Identifier)
• Mapping server --> mapping client: Contact URI [email protected]

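
The three routing cases listed above can be sketched as a small selection function. This is only an illustration: the coverage table, the URIs, the `primary_available` flag and the choice to treat an unknown region like a missing location are all assumptions made for the example, not part of the talk or of LoST.

```python
from typing import Optional, Tuple

# Hypothetical coverage table: civic region -> (primary PSAP URI, backup PSAP URI).
# The URIs are placeholders, not real PSAP addresses.
COVERAGE = {
    ("DE", "Bavaria", "Munich"): (
        "sip:psap-munich@example.org",
        "sip:psap-backup@example.org",
    ),
}
DEFAULT_PSAP = "sip:default-psap@example.org"


def select_psap(region: Optional[Tuple[str, ...]], primary_available: bool = True) -> str:
    """Pick a PSAP URI following the cases on the slide."""
    if region is None or region not in COVERAGE:
        # No usable location: fall back to the 'default' PSAP.
        # (Treating an unknown region the same way is an assumption.)
        return DEFAULT_PSAP
    primary, backup = COVERAGE[region]
    # Usually the PSAP serving the area, sometimes a backup PSAP.
    return primary if primary_available else backup
```

A caller without location information would simply call `select_psap(None)` and be routed to the default PSAP.
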
ECRIT: LoST Functionality
• Civic as well as geospatial queries
– civic address validation
• Recursive and iterative resolution
• Fully distributed and hierarchical deployment
– can be split by any geographic or civic boundary
– same civic region can span multiple LoST servers
• Indicates errors in civic location data --> debugging
– but provides best-effort resolution
• Can be used for non-emergency services:
– directory and information services
– pizza delivery services, towing companies, …
• Example query:
<findService xmlns="urn:…:lost1">
  <location profile="basic-civic">
    <civicAddress>
      <country>Germany</country>
      <A1>Bavaria</A1>
      <A3>Munich</A3>
      <A6>Neu Perlach</A6>
      <HNO>96</HNO>
    </civicAddress>
  </location>
  <service>urn:service:sos.police</service>
</findService>

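
A query like the one above could be assembled programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The slide elides the real namespace ("urn:…:lost1"), so `LOST_NS` below is a made-up placeholder, and `build_find_service` is a hypothetical helper name rather than part of any LoST library.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the slide elides the real value ("urn:…:lost1"),
# so an invented one is used here purely for illustration.
LOST_NS = "urn:example:lost1"


def build_find_service(civic: dict, service: str, ns: str = LOST_NS) -> str:
    """Serialize a findService query with a basic-civic location."""
    root = ET.Element("findService", {"xmlns": ns})
    location = ET.SubElement(root, "location", {"profile": "basic-civic"})
    address = ET.SubElement(location, "civicAddress")
    # Civic elements are emitted in the order given (country, A1, A3, ...).
    for tag, value in civic.items():
        ET.SubElement(address, tag).text = value
    ET.SubElement(root, "service").text = service
    return ET.tostring(root, encoding="unicode")


query = build_find_service(
    {"country": "Germany", "A1": "Bavaria", "A3": "Munich",
     "A6": "Neu Perlach", "HNO": "96"},
    "urn:service:sos.police",
)
```

The resulting string carries the same elements as the slide's example and could be sent to a mapping server over whatever transport the deployment uses.
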
LoST: Location-to-URL Mapping
• Query: 123 Broad Ave, Leonia, Bergen County, NJ US --> LoST --> sip:[email protected]
• [Figure: clusters serving VSP1 and VSP2 replicate root information; root nodes (NJ US, NY US) answer a search with a referral down the tree: NJ US --> Bergen County NJ US --> Leonia NJ US]

LoST Architecture
• Seeker --> resolver: 313 Westview, Leonia, NJ US
• Tree guides (G) exchange tree information by broadcast (gossip)
• Trees: T1 (.us), T2 (.de), T3 (.dk)
• Result: Leonia, NJ --> sip:[email protected]

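
The search-and-referral walk through a tree of LoST servers can be sketched as follows. This is a toy model under stated assumptions: the nested dictionary stands in for separate servers, the `resolve` function and the PSAP URI are invented for the example, and real LoST resolution involves network queries rather than in-memory lookups.

```python
from typing import List, Optional

# Toy tree keyed by civic hierarchy (country, state, county, town).
# A dict value models a referral to a more specific server;
# a str value models a final answer. All names and URIs are illustrative.
TREE = {
    "US": {
        "NJ": {
            "Bergen County": {
                "Leonia": "sip:psap@leonia.example",
            },
        },
        "NY": {},
    },
}


def resolve(location: List[str], tree: dict = TREE) -> Optional[str]:
    """Follow referrals from the root until a URI is found or the walk fails."""
    node = tree
    for part in location:
        nxt = node.get(part)
        if nxt is None:
            return None          # no server covers this part of the address
        if isinstance(nxt, str):
            return nxt           # final answer
        node = nxt               # referral: continue at the more specific server
    return None                  # location was not specific enough
```

Calling `resolve(["US", "NJ", "Bergen County", "Leonia"])` walks three referrals before reaching the town-level entry.
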
SIP server overload
• Triggers: Springsteen tickets!!, earthquake, vote for your favorite…
• [Figure: an INVITE reaches an overloaded proxy, which answers 503; the alternative proxies are overloaded as well]
• Proxies will return 503 --> retry elsewhere
• Just adds more load
• Retransmissions exacerbate the problem

Avalanche restart
• Large number of terminals all start at once
• Typically, after power outage
• Overwhelms registrar
• Possible loss of registrations due to retransmission time-out
• [Figure: terminals #1 through #300,000 reboot after a power outage and all send REGISTER]

Overload control
• Current discussion in design team
• Feedback control: rate-based or window-based
• Avoid congestion collapse
• Deal with multiple upstream sources
• [Figure: goodput vs. offered load, with capacity marked; topology of UAs and servers S1-S5]

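
One way to picture rate-based feedback with multiple upstream sources is a server that tells each upstream neighbor how many requests per second it may send. The sketch below is an assumption-laden illustration, not the design team's mechanism: proportional sharing is just one possible policy, and `allocate_rates` is a hypothetical name.

```python
from typing import Dict


def allocate_rates(capacity: float, offered: Dict[str, float]) -> Dict[str, float]:
    """Return the rate (req/sec) each upstream source is allowed to send.

    If the total offered load fits within capacity, every source keeps
    its offered rate; otherwise all rates are scaled down proportionally
    so that the sum equals the capacity.
    """
    total = sum(offered.values())
    if total <= capacity:
        return dict(offered)
    scale = capacity / total
    return {source: load * scale for source, load in offered.items()}


# Three upstream servers offer 200 req/sec to a server that can handle 100.
limits = allocate_rates(100.0, {"S1": 100.0, "S2": 50.0, "S3": 50.0})
```

Keeping the admitted load at or below capacity is what prevents goodput from collapsing as offered load grows; a window-based scheme would limit outstanding requests instead of the sending rate.
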
Scaling servers & TCP
• Need TCP
– TLS support: customer privacy, theft of service, …
• particularly for WiFi
– many SIP messages now exceed reasonable UDP size (fragmentation)
• e.g., INVITE for IMS: 1182 bytes
• Concern: UA support
– improving: 82% of systems at recent SIPit’19 had TCP support
– only 45% support TLS
• Concern: TCP (and TLS) much less efficient than UDP
– running series of tests to identify differences
– difference mainly in
• connection setup cost
• message splitting (may need pre-parsing or incremental parsers)
• thread count (one per socket?)
• Our model:
– 300,000 customers/servers
• 0.1 Erlang, 180 sec/call
– 600,000 BHCA --> 167 req/sec
– 300,000 registrations --> 83 req/sec
– $0.001/subscriber

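
The request rates in the model follow from the traffic parameters by simple arithmetic, worked through below. The one-hour re-registration interval is an inference from the slide's "300,000 registrations --> 83 req/sec" and is not stated explicitly; the function names are chosen for the example.

```python
SUBSCRIBERS = 300_000
ERLANG_PER_SUBSCRIBER = 0.1   # average fraction of time each subscriber is on a call
CALL_DURATION_S = 180         # seconds per call
SECONDS_PER_HOUR = 3600


def busy_hour_call_attempts() -> float:
    # 300,000 x 0.1 Erlang = 30,000 calls in progress on average;
    # each lasts 180 s, so 3600 / 180 = 20 calls per "slot" per hour.
    simultaneous_calls = SUBSCRIBERS * ERLANG_PER_SUBSCRIBER
    return simultaneous_calls * SECONDS_PER_HOUR / CALL_DURATION_S


def call_requests_per_second() -> float:
    return busy_hour_call_attempts() / SECONDS_PER_HOUR


def registrations_per_second(interval_s: int = SECONDS_PER_HOUR) -> float:
    # Assumption: every subscriber re-registers once per interval (one hour).
    return SUBSCRIBERS / interval_s
```

Rounded, this gives 600,000 BHCA, about 167 call requests per second and about 83 registrations per second, matching the figures on the slide.
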
Performance evaluation results
• Pentium 4 server, 3 GHz
– 4 GB memory
– Linux 2.6.16
• [Figure: echo server; response time (ms) and CPU (%) at 2,500 req/sec and 14,800 req/sec, compared across transaction, persistent w/setup, persistent w/o setup, and UDP]
• Measurements: Kumiko Ono

SIP server measurements
• Initial INVITE measurements
– OpenSER
– 400 calls/sec for TCP
– roughly 260 calls/sec for TLS
• [Figure: sipd REGISTER test, TCP]
• Measurements: Kumiko Ono, Charles Shen, Erich Nahum

P2P SIP
• Why?
– no infrastructure available: emergency coordination
– don’t want to set up infrastructure: small companies
– Skype envy :-)
• P2P technology for
– user location
• only modest impact on expenses
• but makes signaling encryption cheap
– NAT traversal
• matters for relaying
– services (conferencing, …)
• how prevalent?
• New IETF working group just formed
– likely, multiple DHTs
– common control and look-up protocol?
• [Figure: generic DHT service and p2p network connecting P2P provider A, P2P provider B, a traditional provider (via DNS) and a zeroconf LAN]

P2P SIP -- components
• Multicast-DNS (zeroconf) SIP enhancements for LAN
– announce UAs and their capabilities
• Client-P2P protocol
– GET, PUT mappings
– mapping: proxy or UA
• P2P protocol
– get routing table, join, leave, …
– independent of DHT?
– replaces DNS for SIP, not proxy

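
The GET and PUT mappings of the client-P2P protocol can be illustrated with a toy distributed hash table. The sketch below assumes a consistent-hashing ring with SHA-1 identifiers, which is one common DHT layout and not necessarily what the working group will choose; `ToyDHT`, the peer names and the SIP addresses are all hypothetical, and everything runs in one process instead of over a network.

```python
import bisect
import hashlib
from typing import Dict, List, Optional


def key_id(value: str) -> int:
    """Map a string to a point on the identifier ring (SHA-1, as an assumption)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class ToyDHT:
    """In-process stand-in for a DHT that stores user-location mappings."""

    def __init__(self, node_names: List[str]) -> None:
        if not node_names:
            raise ValueError("at least one node is required")
        self.ring = sorted((key_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.store: Dict[str, Dict[str, str]] = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        # The first node clockwise from the key's position owns the key.
        index = bisect.bisect_left(self.ids, key_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, address_of_record: str, contact: str) -> None:
        self.store[self.responsible_node(address_of_record)][address_of_record] = contact

    def get(self, address_of_record: str) -> Optional[str]:
        return self.store[self.responsible_node(address_of_record)].get(address_of_record)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("sip:alice@example.com", "sip:alice@192.0.2.10:5060")
```

Here the DHT plays the role the slide describes, replacing the DNS-style lookup of where a user can be reached while leaving proxy behavior untouched.
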
VoIP user experience
• Only 95-99.5% call attempt success
– “Keynote was able to complete VoIP calls 96.9% of the time, compared with 99.9% for calls made over the public network. Voice quality for VoIP calls on average was rated at 3.5 out of 5, compared with 3.9 for public-network calls and 3.6 for cellular phone calls. And the amount of delay the audio signals experienced was 295 milliseconds for VoIP calls, compared with 139 milliseconds for public-network calls.” (InformationWeek, July 11, 2005)
• Mid-call disruptions common
• Lots of knobs to turn
– Separate problem: manual configuration

Open issues: Configuration
• Ideally, should only need a user name and some credential
– password, USB key, host identity (MAC address), …
• More than DHCP: device needs to get
– SIP-level information (outbound proxy, timers)
– policy information (“sorry, no video”)
• Multiple sources of configuration information
– local network (hotel proxy)
– voice service provider (off-network)
• Configuration information may change
• Needs to allow no-touch deployment of thousands of devices
• SIP configuration framework
– has been languishing for years
– currently being rewritten to reduce complexity

Circle of blame
• Parties: ISP, VSP, OS, app vendor
• “probably packet loss in your Internet connection --> reboot your DSL modem”
• “must be a Windows registry problem --> re-install Windows”
• “probably a gateway fault --> choose us as provider”
• “must be your software --> upgrade”

Traditional network management model
• SNMP: “management from the center”

Old assumptions, now wrong
• Single provider (enterprise, carrier)
– has access to most path elements
– professionally managed
• Problems are hard failures & elements operate correctly
– element failures (“link dead”)
– substantial packet loss
• Mostly L2 and L3 elements
– switches, routers
– rarely 802.11 APs
• Problems are specific to a protocol
– “IP is not working”
• Indirect detection
– MIB variable vs. actual protocol performance
• End systems don’t need management
– DMI & SNMP never succeeded
– each application does its own updates

Management
• [Figure: management tasks: element inspection, configuration, fault location, network understanding; annotations: “we’ve only succeeded here” (next to element inspection) and “what causes the most trouble?”]

Managing the protocol stack
• media: echo, gain problems, VAD action
• RTP: protocol problem, playout errors
• SIP: protocol problem, authorization, asymmetric conn (NAT)
• UDP/TCP: TCP neg. failure, NAT time-out, firewall policy
• IP: no route, packet loss

Proposal: “Do You See What I See?”
• Each node has a set of active and passive measurement tools
• Use intercept (NDIS, pcap)
– to detect problems automatically
• e.g., no response to HTTP or DNS request
– gather performance statistics (packet jitter)
– capture RTCP and similar measurement packets
• Nodes can ask others for their view
– possibly also dedicated “weather stations”
• Iterative process, leading to:
– user indication of cause of failure
– in some cases, work-around (application-layer routing) --> TURN server, use remote DNS servers
• Nodes collect statistical information on failures and their likely causes

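
The step where a node asks others for their view can be sketched as a comparison between the local observation and peer observations. The classification rules and labels below are assumptions made for illustration; the proposal itself does not prescribe them, and `diagnose` is a hypothetical function name.

```python
from typing import Dict


def diagnose(local_ok: bool, peer_views: Dict[str, bool]) -> str:
    """Classify a failure by comparing the local result with peers' results.

    peer_views maps a peer name to whether that peer could complete the
    same request (for example, a DNS query to the same resolver).
    """
    if local_ok:
        return "no fault observed"
    if not peer_views:
        return "unknown: no peers to compare with"
    failures = sum(1 for ok in peer_views.values() if not ok)
    if failures == 0:
        # Everyone else succeeds: the problem is probably on this node or its link.
        return "likely local problem"
    if failures == len(peer_views):
        # Nobody succeeds: the service itself is probably down.
        return "likely remote service problem"
    return "likely partial or path-specific problem"
```

A node whose DNS query fails while both of its buddies succeed would call `diagnose(False, {"buddy1": True, "buddy2": True})` and report a local problem to the user.
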
Management architecture
• “not working” (notification)
• inspect protocol requests (DNS, HTTP, RTCP, …)
• orchestrate tests
– ping 127.0.0.1
• contact others
– can buddy reach our resolver?
• request diagnostics
• “DNS failure for 15m”
• notify admin (email, IM, SIP events, …)

SIP, SIPPING & SIMPLE –00 drafts
• [Figure: number of -00 drafts per year, 1999-2006, for SIP, SIPPING and SIMPLE; y-axis 0-80]
• includes draft-ietf-*-00 and draft-personal-*-00

RFC publication
• [Figure: number of RFCs published per year, 2001-2006, for SIP, SIPPING and SIMPLE; y-axis 0-14]

IETF WG: SIP in 2006 & 2007
• ~ 44 SIP-related RFCs published in 2006
– BFCP, conferencing
– SDP revision
– rich presence
• Activities:
– hitchhiker’s guide
– infrastructure:
• GRUUs (random identifiers)
• URI lists
• XCAP configuration
• SIP MIB
– services:
• rejecting anonymous requests
• consent framework
• location conveyance
• session policy
– security:
• end-to-middle security
• certificates
• SAML
• sips clarification
– NAT:
• connection re-use
• SIP outbound
• ICE (in MMUSIC)
• see http://tools.ietf.org/wg/sip/

IETF WG: SIPPING
• 31 RFCs published in 2006
• Policy
– media policy
– SBC functions
• Services
– service examples
– call transfer
– configuration framework
– spam and spit
– text-over-IP
– transcoding
• Testing and operations
– IPv6 transition
– race condition examples
– IPv6 torture tests
– SIP offer-answer examples
– overload requirements
– configuration
– voice quality reporting

Interoperability
• Generally no interoperability problems for basic SIP functionality
– basic call, digest registration, call transfer, voice mail
• Weaker in advanced scenarios and backward compatibility
– handling TCP, TLS
– NAT support (symmetric RTP, ICE, STUN, ...)
– multipart bodies
– SIP torture tests
– call transfer, call pick-up
– video and voice codec interoperability (H.264, anything beyond G.711)
• SIPit useful, but no equivalent of WiFi certification
– most implementations still single-vendor (enterprise, carrier) or vendor-supplied (VSP)
– SFTF (test framework) still limited
• Need profiles to guide implementers

Trouble in Standards Land
• Proliferation of transition standards: 2.5G, 2.6G, 3.5G, …
– true even for emergency calling…
• Splintering of standardization efforts across SDOs
– primary:
• IEEE, IETF, W3C, OASIS, ISO
– architectural:
• PacketCable, ETSI, 3GPP, 3GPP2, OMA, UMA, ATIS, …
– specialized:
• NENA
– operational, marketing:
• SIP Forum, IPCC, …
• [Figure: SDOs by scope: IEEE (L1-L2); IETF (L2.5-L7 protocols); OASIS, W3C, ISO (MPEG) (data exchange, data formats); 3GPP; PacketCable]

IETF issues
• SIP WGs: small number (dozen?) of core authors (80/20)
– some now becoming managers…
– or moving to other topics
– often from core equipment vendors, not software vendors or carriers
• IETF: research --> engineering --> maintenance
– many groups are essentially maintaining standards written a decade (or two) ago
• DNS, IPv4, IPv6, BGP, DHCP; RTP, SIP, RTSP
• constrained by design choices made long ago
– often dealing with transition to hostile & “random” network
– network ossification
• Stale IETF leadership
• fair amount of not-invented-here syndrome
– late to recognize wide usage of XML and web standards
– late to deal with NATs
• security tends to be per-protocol (silo)
– some efforts such as SAML and SASL
• tendency to re-invent the wheel in each group

IETF issues: timeliness
• Most drafts spend lots of time in 90%-complete state
– lack of energy (moved on to new -00)
– optimizers vs. satisfiers
• multiple choices that have non-commensurate trade-offs
• Notorious examples:
– SIP request history: Feb. 2002 – May 2005 (RFC 4244)
– Session timers: Feb. 1999 – May 2005 (RFC 4028)
– Resource priority: Feb. 2001 – Feb. 2006 (RFC 4412)
• New framework/requirements phase adds 1-2 years of delay
• Three bursts of activity/year, with silence in-between
– occasional interim meetings
• IETF meetings are often not productive
– most topics get 5-10 minutes --> lack context, focus on minutiae
– no background --> same people as on mailing list
– 5 people discuss, 195 people read email
• No formal issue tracking
– some WGs use tools, haphazardly
• Gets worse over time:
– dependencies increase, sometimes undiscovered
– backwards compatibility issues
– more background needed to contribute

IETF issues: timeliness
• WG chairs run meetings, but are not managing WG
progress
– very little control of deadlines
• e.g., all SIMPLE deadlines are probably a year behind
– little push to come to working group last call (WGLC)
– limited timeliness accountability of authors and editors
– chairs often provide limited editorial feedback
• IESG review can get stuck in long feedback loop
– author – AD – WG chairs
– sometimes lack of accountability (AD-authored documents)
• RFC editor often takes 6+ months to process document
– dependencies; IANA; editor queue; author delays
– e.g., session timer: Aug. 2004 – May 2005
Conclusion
• Moving from lab and trials to large-scale deployments
• Planning horizon includes turning off circuit-switched phones
– in large enterprises
– in some carriers
• From emphasis on features to global scale:
– interoperation
– configuration
– peer-to-peer systems
– emergency services
– overload behavior
– failure detection across networks and protocol layers
• Integration of advanced features (IM, presence, video, programmable services) still lacking
• Current standardization processes slow and complexity-inducing