SIP as infrastructure

Henning Schulzrinne
Dept. of Computer Science, Columbia University, New York
[email protected]
SIP 2007 (upperside.fr), Paris, France
February 2007


Outline

• Scaling SIP to the real world: emergency calling
• Scaling SIP to very large deployments
  – some measurements for designing large servers
  – congestion control and dealing with avalanche restart
  – P2P SIP
  – failure discovery
• The state of SIP standardization, year 11
  – developments in 2006 & upcoming highlights
  – trouble in standards land


Roadmap

• Introduction
• Emergency calling
• Server scaling
• P2P SIP
• End-to-end management
• Standardization and interoperability


Evolution of VoIP

• 1996-2000: “amazing – the phone rings” (long-distance calling, ca. 1930)
• 2000-2003: “does it do call transfer?” (catching up with the digital PBX)
• 2004-2005: “Can it really replace the phone system?” (replacing the global phone system)
• 2006-: “How can I make it stop ringing?” (going beyond the black phone)


IETF VoIP efforts

[Diagram: working groups of the IETF RAI area, linked by “uses”, “may use”, “provides” and “usually used with” relationships]

• ECRIT (emergency calling)
• ENUM (E.164 translation)
• SIMPLE (presence)
• GEOPRIV (geo + privacy)
• SPEERMINT (peering)
• XCON (conf. control)
• SIP (protocol)
• IPTEL (tel URL)
• SIPPING (usage, requirements)
• SPEECHSC (speech services)
• AVT (RTP, SRTP, media)
• MMUSIC (SDP, RTSP, ICE)
• SIGTRAN (signaling transport)


Roadmap: Emergency calling


VoIP emergency communications

[Diagram: emergency call; dispatch; emergency alert (“inverse 911”); civic coordination]

• Contact well-known number or identifier
  – now: 112, 911
  – transition: 112, 911
  – all IP: 112, 911, urn:service:sos
• Route call to location-appropriate PSAP
  – now: SR
  – transition: VPC
  – all IP: LoST
• Deliver precise location to call taker to dispatch emergency help
  – now: phone number --> location (ALI lookup)
  – transition: in-band key --> location
  – all IP: in-band


IETF ECRIT working group

• Emergency Contact Resolution with Internet Technologies
• Solve four major pieces of the puzzle:
  – location conveyance (with SIP & GEOPRIV)
  – emergency call identification
  – mapping geo and civic caller locations to PSAP URI
  – discovery of local and visited emergency dial string
• Not solving
  – location discovery --> GEOPRIV
  – inter-PSAP communication and coordination
  – citizen notification
• Current status:
  – finishing general and security requirements
  – agreement on mapping protocol (LoST) and identifier (sos URN)
  – working on overall architecture and UA requirements


ECRIT: Options for location delivery

• GPS
• L2: LLDP-MED (standardized version of CDP + location data)
  – periodic per-port broadcast of configuration information
  – currently implementing CDP
• L3: DHCP for
  – geospatial (RFC 3825)
  – civic (RFC 4676)
• L7: proposals for retrievals: HELD, RELO, LCP, SIP, …
  – for own IP address or by third party (e.g., ISP to infrastructure provider)
  – by IP address
  – by MAC address
  – by identifier (conveyed by DHCP or PPP)
  – HELD, RELO: both HTTP-based


ECRIT: Finding the correct PSAP

• Which PSAP should the e-call go to?
  – Usually to the PSAP that serves the geographic area
  – Sometimes to a backup PSAP
  – If no location, then ‘default’ PSAP
  – solved by LoST

[Diagram: Mapping Client to Mapping Server: “I am at "Otto-Hahn-Ring 6, 81739 München". I need to contact the ambulance.” (Emergency Identifier). Mapping Server to Mapping Client: Contact URI [email protected]]


ECRIT: LoST Functionality

• Civic as well as geospatial queries
  – civic address validation
• Recursive and iterative resolution
• Fully distributed and hierarchical deployment
  – can be split by any geographic or civic boundary
  – same civic region can span multiple LoST servers
• Indicates errors in civic location data --> debugging
  – but provides best-effort resolution
• Can be used for non-emergency services:
  – directory and information services
  – pizza delivery services, towing companies, …

  <findService xmlns="urn:…:lost1">
    <location profile="basic-civic">
      <civicAddress>
        <country>Germany</country>
        <A1>Bavaria</A1>
        <A3>Munich</A3>
        <A6>Neu Perlach</A6>
        <HNO>96</HNO>
      </civicAddress>
    </location>
    <service>urn:service:sos.police</service>
  </findService>


LoST: Location-to-URL Mapping

[Diagram: a cluster serving VSP1 and a cluster serving VSP2 replicate root information from the root nodes. A query for “123 Broad Ave, Leonia, Bergen County, NJ, US” is resolved by search and referral from the root nodes through the NJ US, Bergen County NJ US and Leonia NJ US servers (NY US shown as a sibling of NJ US) and returns sip:[email protected]]


LoST Architecture

[Diagram: seeker, resolver, tree guides (G) connected by broadcast (gossip), and trees T1 (.us), T2 (.de), T3 (.dk). A query for “313 Westview, Leonia, NJ US” is resolved to the Leonia, NJ mapping sip:[email protected]]


Roadmap: Server scaling


SIP server overload

[Diagram: load surges (“Springsteen tickets!!”, “earthquake”, “vote for your favorite…”) send INVITEs to overloaded proxies, which answer with 503]

• Proxies will return 503 --> retry elsewhere
• Just adds more load
• Retransmissions exacerbate the problem


Avalanche restart

• Large number of terminals all start at once
• Typically, after power outage
• Overwhelms registrar
• Possible loss of registrations due to retransmission time-out

[Diagram: REGISTER #1 … #300,000, reboot after power outage]


Overload control

• Current discussion in design team
• Feedback control: rate-based or window-based
• Avoid congestion collapse
• Deal with multiple upstream sources

[Diagram: goodput vs. offered load, with capacity marked; topology of servers S1–S5 and UAs]


Scaling servers & TCP

• Need TCP
  – TLS support: customer privacy, theft of service, …
    • particularly for WiFi
  – many SIP messages now exceed reasonable UDP size (fragmentation)
    • e.g., INVITE for IMS: 1182 bytes
• Concern: UA support
  – improving: 82% of systems at recent SIPit’19 had TCP support
  – only 45% support TLS
• Concern: TCP (and TLS) much less efficient than UDP
  – running series of tests to identify differences
  – difference mainly in
    • connection setup cost
    • message splitting (may need preparsing or incremental parsers)
    • thread count (one per socket?)
• Our model:
  – 300,000 customers/servers
    • 0.1 Erlang, 180 sec/call
  – 600,000 BHCA --> 167 req/sec
  – 300,000 registrations --> 83 req/sec
  – $0.001/subscriber


Performance evaluation results

• Pentium 4 server, 3 GHz
  – 4 GB memory
  – Linux 2.6.16

[Chart: echo server. Response time (ms, axis 0 to 0.5) and CPU (%, axis 0 to 100) at 2,500 req/sec and 14,800 req/sec for four transport modes: transaction, persistent w/setup, persistent w/o setup, UDP]

Kumiko Ono


SIP server measurements

• Initial INVITE measurements
  – OpenSER
  – 400 calls/sec for TCP
  – roughly 260 calls/sec for TLS

[Chart: sipd REGISTER test, TCP]

Kumiko Ono, Charles Shen, Erich Nahum


Roadmap: P2P SIP


P2P SIP

• Why?
  – no infrastructure available: emergency coordination
  – don’t want to set up infrastructure: small companies
  – Skype envy :-)
• P2P technology for
  – user location
    • only modest impact on expenses
    • but makes signaling encryption cheap
  – NAT traversal
    • matters for relaying
  – services (conferencing, …)
    • how prevalent?
• New IETF working group just formed
  – likely, multiple DHTs
  – common control and look-up protocol?

[Diagram: generic DHT service, p2p network, P2P provider A, P2P provider B, DNS, traditional provider, zeroconf, LAN]


P2P SIP -- components

• Multicast-DNS (zeroconf) SIP enhancements for LAN
  – announce UAs and their capabilities
• Client-P2P protocol
  – GET, PUT mappings
  – mapping: proxy or UA
• P2P protocol
  – get routing table, join, leave, …
  – independent of DHT?
  – replaces DNS for SIP, not proxy


Roadmap: End-to-end management


VoIP user experience

• Only 95-99.5% call attempt success
  – “Keynote was able to complete VoIP calls 96.9% of the time, compared with 99.9% for calls made over the public network. Voice quality for VoIP calls on average was rated at 3.5 out of 5, compared with 3.9 for public-network calls and 3.6 for cellular phone calls. And the amount of delay the audio signals experienced was 295 milliseconds for VoIP calls, compared with 139 milliseconds for public-network calls.” (InformationWeek, July 11, 2005)
• Mid-call disruptions common
• Lots of knobs to turn
  – Separate problem: manual configuration


Open issues: Configuration

• Ideally, should only need a user name and some credential
  – password, USB key, host identity (MAC address), …
• More than DHCP: device needs to get
  – SIP-level information (outbound proxy, timers)
  – policy information (“sorry, no video”)
• Multiple sources of configuration information
  – local network (hotel proxy)
  – voice service provider (off-network)
• Configuration information may change
• Needs to allow no-touch deployment of thousands of devices
• SIP configuration framework
  – has been languishing for years
  – currently being rewritten to reduce complexity


Circle of blame

[Diagram: ISP, VSP, OS and app vendor arranged in a circle, each blaming another]

• “probably packet loss in your Internet connection” --> reboot your DSL modem
• “must be a Windows registry problem” --> re-install Windows
• “probably a gateway fault” --> choose us as provider
• “must be your software” --> upgrade


Traditional network management model

[Diagram: SNMP-based “management from the center”, with a failure (X) in the network]


Old assumptions, now wrong

• Single provider (enterprise, carrier)
  – has access to most path elements
  – professionally managed
• Problems are hard failures & elements operate correctly
  – element failures (“link dead”)
  – substantial packet loss
• Mostly L2 and L3 elements
  – switches, routers
  – rarely 802.11 APs
• Problems are specific to a protocol
  – “IP is not working”
• Indirect detection
  – MIB variable vs. actual protocol performance
• End systems don’t need management
  – DMI & SNMP never succeeded
  – each application does its own updates


Management

[Diagram: what causes the most trouble? network understanding, fault location, configuration. We’ve only succeeded here: element inspection]


Managing the protocol stack

• media: echo, gain problems, VAD action
• RTP: protocol problem, playout errors
• SIP: protocol problem, authorization, asymmetric conn (NAT)
• UDP/TCP: TCP neg. failure, NAT time-out, firewall policy
• IP: no route, packet loss


Proposal: “Do You See What I See?”

• Each node has a set of active and passive measurement tools
• Use intercept (NDIS, pcap)
  – to detect problems automatically
    • e.g., no response to HTTP or DNS request
  – gather performance statistics (packet jitter)
  – capture RTCP and similar measurement packets
• Nodes can ask others for their view
  – possibly also dedicated “weather stations”
• Iterative process, leading to:
  – user indication of cause of failure
  – in some cases, work-around (application-layer routing) --> TURN server, use remote DNS servers
• Nodes collect statistical information on failures and their likely causes


Management architecture

[Diagram: a “not working” notification triggers the node to inspect protocol requests (DNS, HTTP, RTCP, …), orchestrate tests (ping 127.0.0.1), and contact others to request diagnostics (can buddy reach our resolver?). The outcome, e.g. “DNS failure for 15m”, is used to notify the admin (email, IM, SIP events, …)]


Roadmap: Standardization and interoperability


SIP, SIPPING & SIMPLE –00 drafts

[Chart: number of -00 drafts per year, 1999 to 2006, for SIP, SIPPING and SIMPLE; vertical axis 0 to 80]

includes draft-ietf-*-00 and draft-personal-*-00


RFC publication

[Chart: number of RFCs published per year, 2001 to 2006, for SIP, SIPPING and SIMPLE; vertical axis 0 to 14]


IETF WG: SIP in 2006 & 2007

• ~44 SIP-related RFCs published in 2006
  – BFCP, conferencing
  – SDP revision
  – rich presence
• Activities:
  – hitchhiker’s guide
  – infrastructure:
    • GRUUs (random identifiers)
    • URI lists
    • XCAP configuration
    • SIP MIB
  – services:
    • rejecting anonymous requests
    • consent framework
    • location conveyance
    • session policy
  – security:
    • end-to-middle security
    • certificates
    • SAML
    • sips clarification
  – NAT:
    • connection re-use
    • SIP outbound
    • ICE (in MMUSIC)

see http://tools.ietf.org/wg/sip/


IETF WG: SIPPING

• 31 RFCs published in 2006
• Policy
  – media policy
  – SBC functions
• Services
  – service examples
  – call transfer
  – configuration framework
  – spam and spit
  – text-over-IP
  – transcoding
• Testing and operations
  – IPv6 transition
  – race condition examples
  – IPv6 torture tests
  – SIP offer-answer examples
  – overload requirements
  – configuration
  – voice quality reporting


Interoperability

• Generally no interoperability problems for basic SIP functionality
  – basic call, digest registration, call transfer, voice mail
• Weaker in advanced scenarios and backward compatibility
  – handling TCP, TLS
  – NAT support (symmetric RTP, ICE, STUN, ...)
  – multipart bodies
  – SIP torture tests
  – call transfer, call pick-up
  – video and voice codec interoperability (H.264, anything beyond G.711)
• SIPit useful, but no equivalent of WiFi certification
  – most implementations still single-vendor (enterprise, carrier) or vendor-supplied (VSP)
  – SFTF (test framework) still limited
• Need profiles to guide implementers


Trouble in Standards Land

• Proliferation of transition standards: 2.5G, 2.6G, 3.5G, …
  – true even for emergency calling…
• Splintering of standardization efforts across SDOs
  – primary:
    • IEEE, IETF, W3C, OASIS, ISO
  – architectural:
    • PacketCable, ETSI, 3GPP, 3GPP2, OMA, UMA, ATIS, …
  – specialized:
    • NENA
  – operational, marketing:
    • SIP Forum, IPCC, …

[Diagram: OASIS, W3C, ISO (MPEG): data exchange, data formats; IETF: L2.5-L7 protocols; IEEE: L1-L2; 3GPP; PacketCable]


IETF issues

• SIP WGs: small number (dozen?) of core authors (80/20)
  – some now becoming managers…
  – or moving to other topics
• IETF: research --> engineering --> maintenance
  – many groups are essentially maintaining standards written a decade (or two) ago
    • DNS, IPv4, IPv6, BGP, DHCP; RTP, SIP, RTSP
    • constrained by design choices made long ago
  – often dealing with transition to hostile & “random” network
  – network ossification
• Stale IETF leadership
  – often from core equipment vendors, not software vendors or carriers
  – fair amount of not-invented-here syndrome
    • late to recognize wide usage of XML and web standards
    • late to deal with NATs
  – security tends to be per-protocol (silo)
    • some efforts such as SAML and SASL
    • tendency to re-invent the wheel in each group


IETF issue: timeliness

• Most drafts spend lots of time in 90%-complete state
  – lack of energy (moved on to new -00)
  – optimizers vs. satisficers
    • multiple choices that have non-commensurate trade-offs
  – Notorious examples:
    • SIP request history: Feb. 2002 – May 2005 (RFC 4244)
    • Session timers: Feb. 1999 – May 2005 (RFC 4028)
    • Resource priority: Feb. 2001 – Feb. 2006 (RFC 4412)
• New framework/requirements phase adds 1-2 years of delay
• Three bursts of activity/year, with silence in-between
  – occasional interim meetings
• IETF meetings are often not productive
  – most topics get 5-10 minutes --> lack context, focus on minutiae
  – no background
  – same people as on mailing list
  – 5 people discuss, 195 people read email
• No formal issue tracking
  – some WGs use tools, haphazardly
• Gets worse over time:
  – dependencies increase, sometimes undiscovered
  – backwards compatibility issues
  – more background needed to contribute


IETF issues: timeliness

• WG chairs run meetings, but are not managing WG progress
  – very little control of deadlines
    • e.g., all SIMPLE deadlines are probably a year behind
  – little push to come to working group last call (WGLC)
  – limited timeliness accountability of authors and editors
  – chairs often provide limited editorial feedback
• IESG review can get stuck in long feedback loop
  – author – AD – WG chairs
  – sometimes lack of accountability (AD-authored documents)
• RFC editor often takes 6+ months to process document
  – dependencies; IANA; editor queue; author delays
  – e.g., session timer: Aug. 2004 – May 2005


Conclusion

• Moving from lab and trials to large-scale deployments
• Planning horizon includes turning off circuit-switched phones
  – in large enterprises
  – in some carriers
• From emphasis on features to global scale:
  – interoperation
  – configuration
  – peer-to-peer systems
  – emergency services
  – overload behavior
  – failure detection across networks and protocol layers
• Integration of advanced features (IM, presence, video, programmable services) still lacking
• Current standardization processes slow and complexity-inducing
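
Worked example: the sizing model on the "Scaling servers & TCP" slide (300,000 customers, 0.1 Erlang, 180 sec/call) can be checked with a few lines of arithmetic. The sketch below is only an illustration. The function names are made up for this note, and the figure of one registration per subscriber per hour is an assumption inferred from the slide's "300,000 registrations --> 83 req/sec", not something the slide states.

```python
# Reproduces the slide's sizing arithmetic. Function names are illustrative.

def busy_hour_call_attempts(subscribers: int, erlang: float, call_seconds: float) -> float:
    """BHCA = subscribers * calls per subscriber per busy hour.

    A subscriber offering `erlang` of traffic is busy erlang * 3600 seconds
    per hour; dividing by the mean call duration gives calls per hour.
    """
    return subscribers * erlang * 3600 / call_seconds


def per_second(events_per_hour: float) -> float:
    """Convert an hourly event count into an average rate per second."""
    return events_per_hour / 3600


SUBSCRIBERS = 300_000
bhca = busy_hour_call_attempts(SUBSCRIBERS, erlang=0.1, call_seconds=180)
call_rate = per_second(bhca)

# Assumption: each subscriber re-registers once per hour.
registration_rate = per_second(SUBSCRIBERS * 1)

print(round(bhca), round(call_rate), round(registration_rate))
```

These are average rates for the busy hour. They say nothing about bursts such as the avalanche restart case, where all 300,000 registrations arrive at once.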
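
Worked example: the "SIP server overload" and "Avalanche restart" slides note that retransmissions exacerbate overload and that registrations can be lost to retransmission time-out. A rough way to quantify this is to count how often a UDP client sends one unanswered non-INVITE request, such as a REGISTER, before giving up. The sketch assumes the default RFC 3261 timer values (T1 = 0.5 s, T2 = 4 s, transaction time-out after 64*T1), with the retransmission interval doubling from T1 up to T2. The function name and the worst-case estimate are illustrative and not taken from the talk.

```python
# Counts transmissions of one unanswered non-INVITE request over UDP,
# assuming RFC 3261 default timers. The function name is illustrative.

def transmissions_before_timeout(t1: float = 0.5, t2: float = 4.0) -> int:
    """Return how many times a request is sent before the transaction times out."""
    timeout = 64 * t1          # transaction time-out (32 s with the defaults)
    elapsed = 0.0
    interval = t1              # first retransmission fires after T1
    count = 1                  # the original transmission
    while elapsed + interval < timeout:
        elapsed += interval
        count += 1
        interval = min(2 * interval, t2)   # double the interval, capped at T2
    return count


copies = transmissions_before_timeout()

# Worst case for the slide's avalanche restart: 300,000 terminals reboot
# together and none of them gets an answer before timing out.
TERMINALS = 300_000
worst_case_messages = TERMINALS * copies
print(copies, worst_case_messages)
```

Under these assumptions an overloaded registrar that answers nothing sees an order of magnitude more REGISTER messages than there are terminals, which is one reason rejecting requests alone does not shed load.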
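
Worked example: the "Scaling servers & TCP" slide lists message splitting as a cost of TCP ("may need preparsing or incremental parsers"). Over a byte stream, a SIP message boundary has to be found from the blank line that ends the headers plus the Content-Length value. The following is a minimal sketch of such an incremental framer. The class name is hypothetical, header line folding and malformed input are not handled, and it does not represent how sipd or OpenSER do it.

```python
# Minimal incremental framer for SIP messages arriving over a TCP stream.
# Hypothetical helper for illustration; not taken from any SIP server.

from typing import List


class SipStreamFramer:
    def __init__(self) -> None:
        self._buf = b""

    def feed(self, data: bytes) -> List[bytes]:
        """Append received bytes and return every complete message now buffered."""
        self._buf += data
        messages: List[bytes] = []
        while True:
            head_end = self._buf.find(b"\r\n\r\n")
            if head_end < 0:
                break                      # headers not complete yet
            head = self._buf[:head_end].decode("utf-8", "replace")
            length = 0
            for line in head.split("\r\n")[1:]:   # skip the start line
                name, _, value = line.partition(":")
                # "l" is the compact form of Content-Length
                if name.strip().lower() in ("content-length", "l"):
                    length = int(value.strip())
            total = head_end + 4 + length
            if len(self._buf) < total:
                break                      # body not complete yet
            messages.append(self._buf[:total])
            self._buf = self._buf[total:]
        return messages


framer = SipStreamFramer()
msg = b"OPTIONS sip:a@example.com SIP/2.0\r\nContent-Length: 4\r\n\r\nbody"
first = framer.feed(msg[:20])              # partial headers only
second = framer.feed(msg[20:] + msg)       # rest of message 1 plus all of message 2
```

With UDP each datagram carries exactly one message, so none of this buffering or header pre-scan is needed, which is part of the efficiency gap the slide refers to.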
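
Worked example: the LoST slides describe hierarchical resolution by search and referral (root nodes, then NJ US, then Bergen County, then Leonia) with best-effort answers when the civic data is incomplete. The sketch below walks a small in-memory tree to illustrate that idea only. It is not the LoST protocol: the `Node` type, the `resolve` function and every URI are hypothetical placeholders, and a real deployment spreads the tree across separate servers.

```python
# Toy model of hierarchical civic-location-to-PSAP mapping with best-effort
# fallback. All names and URIs are hypothetical placeholders.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    uri: Optional[str] = None                       # PSAP URI for this region, if any
    children: Dict[str, "Node"] = field(default_factory=dict)


def resolve(root: Node, civic_path: List[str]) -> Tuple[Optional[str], int]:
    """Follow the civic labels downward, returning the most specific URI found
    and the number of referrals (levels descended)."""
    node, best, referrals = root, root.uri, 0
    for label in civic_path:
        child = node.children.get(label)
        if child is None:
            break                                   # unknown region: keep best so far
        node = child
        referrals += 1
        if node.uri is not None:
            best = node.uri
    return best, referrals


ROOT = Node(children={
    "US": Node(children={
        "NJ": Node(uri="sip:psap@nj.example", children={
            "Bergen County": Node(uri="sip:psap@bergen.example", children={
                "Leonia": Node(uri="sip:psap@leonia.example"),
            }),
        }),
        "NY": Node(uri="sip:psap@ny.example"),
    }),
})

exact = resolve(ROOT, ["US", "NJ", "Bergen County", "Leonia"])
fallback = resolve(ROOT, ["US", "NJ", "Bergen County", "Unknown Town"])
```

The fallback case mirrors the slide's point that errors in civic location data are indicated but a best-effort result is still returned: an unrecognized town falls back to the county-level entry.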