QoS Measurement and Management for VoIP
Wenyu Jiang
IRT Lab
March 5, 2003

Introduction to VoIP & IP Telephony
• Transport of voice packets over IP networks
• Cost savings
  – Consolidates voice and data networks
  – Avoids leased lines, long-distance toll calls
• Smart and new services
  – Call management (filtering, TOD forwarding): CPL
  – Better than PSTN quality: wide-band codecs
• Protocols and Standards
  – Signaling: SIP (IETF), H.323 (ITU-T)
  – Transport: RTP/RTCP (IETF)

Practical Issues in VoIP
• Quality of Service (QoS)
  – Internet is a best-effort network
    • Loss, delay and jitter
    • Users expect at least PSTN quality for VoIP!
• Ease of deployment
  – Requires seamless integration with legacy networks (PSTN/PBX)
  – Security is a must
• High yardstick of service availability
  – Can your network achieve 99.999% up time?

Outline
• QoS measurement
  – Objective vs. subjective metrics
  – Automated measurement of subjective quality
• QoS management: improving your quality
  – End-to-End: FEC, LBR, PLC
  – Network provisioning: voice traffic aggregation
• Reality check
  – Performance of end-points (IP phones, …)
  – Deployment issues in VoIP
  – Evaluation of VoIP service availability through Internet measurement

Workings of a VoIP Client
• Audio is packetized, encoded and transmitted
• Forward error correction (FEC) may be used to recover lost packets
• Playout control smoothes out jitter to minimize late losses; coupled with FEC
• Packet loss concealment (PLC)
  – Last line of “defense” after FEC and playout
[Figure: VoIP client pipeline. Multimedia packets with FEC → Internet (added loss, jitter) → FEC recovery (unrecoverable losses by FEC) → playout delay control (added late losses; FEC affects playout control) → loss concealment & decoding]

LBR: An Alternative to FEC
• An (n,k) block FEC code can recover up to n-k losses
• Low Bit-rate Redundancy (LBR)
  – Transmit a lower bit-rate version of original audio
  – No notion of “blocks”
  – Not bit-exact recovery
[Figure: FEC transmission timeline. Packets A–F grouped into FEC block 1 and FEC block 2, each with FEC data, along the transmission-time axis]
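The recovery property of an (n,k) block code stated on the LBR slide can be illustrated with the simplest possible case. The sketch below is not the code used in the talk: it assumes a single XOR parity packet per block (so n = k + 1 and at most one loss is repairable) and equal-length payloads, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_block(data: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet to k data packets: a (k+1, k) block."""
    parity = reduce(xor_bytes, data)
    return data + [parity]


def decode_block(received: List[Optional[bytes]]) -> Optional[List[bytes]]:
    """Recover the k data packets from n = k + 1 slots; None marks a loss.

    Returns None when more than n - k = 1 packet is missing.
    """
    missing = [i for i, pkt in enumerate(received) if pkt is None]
    if len(missing) > 1:
        return None
    if missing:
        # XOR of all surviving packets equals the single missing one.
        present = [pkt for pkt in received if pkt is not None]
        rebuilt = reduce(xor_bytes, present)
        received = received[:missing[0]] + [rebuilt] + received[missing[0] + 1:]
    return received[:-1]  # drop the parity packet


block = encode_block([b"AAAA", b"BBBB"])            # (3,2) block: A, B, parity
one_loss = decode_block([block[0], None, block[2]])  # B is rebuilt bit-exactly
two_losses = decode_block([None, None, block[2]])    # beyond n - k: unrecoverable
```

Unlike LBR, the repaired packet here is bit-exact, but repair has to wait until the rest of the block has arrived.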
[Figure: LBR transmission timeline. Packets A–F, with low bit-rate copies a'–d' sent as LBR data, along the transmission-time axis]

Objective QoS Metrics: Loss
• Internet packet loss is often bursty
  – May worsen voice quality more than random (Bernoulli) loss
• Characterization of packet loss
  – 2-state Markov (Gilbert) model: states 0 (non-loss) and 1 (loss); transition probabilities 0→1: p, 0→0: 1-p, 1→0: q, 1→1: 1-q = p_c (conditional loss probability)
  – More detailed models, but more states!
    • Extended Gilbert model, nth order Markov model
    • Hidden Markov model, Gilbert-Elliot model, inter-loss distance
  – More states ⇒ larger test set, loss of big picture
    • Adaptive applications can trade off model accuracy for fast feedback
    • Gilbert model provides an acceptable compromise

Effect of Gilbert Loss Model
• Loss burst distribution of a packet trace
  – Roughly, though not exactly, exponential
• Loss burstiness on FEC performance
  – FEC less efficient under bursty loss
[Figure, left: number of occurrences (log scale) vs. loss burst length, packet trace vs. Gilbert model. Right: p_f, final loss % after FEC, vs. conditional loss p_c (%), Gilbert vs. Bernoulli]

Objective QoS Metrics: Delay
• Complementary Conditional CDF (C3DF): $f(t) = P[d_i > t \mid d_{i-l} > t]$, lag $l = 1, 2, 3, \ldots$, where $d_i$ is the delay of packet $i$
  – More descriptive than auto-correlation function (ACF)
  – Delay correlation rises rapidly beyond a threshold
  – Approximates conditional late loss probability
[Figure: C3DF curves. x: delay (sec), y: probability; unconditional and lag = 1, 2, 3, 5, 10, 20]

Subjective QoS Metrics
• Perceived quality
  – Mean Opinion Score (MOS)
    • ITU-T P.800/830
    • Obtained via listening tests
  – MOS variations
    • DMOS (Degradation)
    • CMOS (Comparison)
    • MOSc (Conversational): considers delay
    • A/B preference
• Pros: more meaningful to end users
• Cons: time consuming, labor intensive

  MOS Grade   Score
  Excellent   5
  Good        4
  Fair        3
  Poor        2
  Bad         1

Effect of Loss Model on Perceived Quality
• Codec: G.729 (8 kb/s ITU standard)
• Random (Bernoulli) vs. bursty (Gilbert) loss
  – Bursty ⇒ lower MOS
  – True even when FEC or LBR is used
[Figure, left: effect of random vs. bursty loss on MOS quality. Right: random vs. bursty loss on FEC (G.723.1) quality, FEC (3,2) under Gilbert vs. Bernoulli loss. Both panels: MOS vs. loss probability]

Going Further: Bridging Objective and Subjective Metrics
• The E-model (ITU-T G.107/108)
  – Originally for telephone network planning
  – Considers various impairments
  – Reduces to delay and loss impairment when adapted for VoIP
• Objective quality estimation algorithms
  – Suitable when network statistics are not available, e.g., phone-to-phone service with IP in between
  – Speech recognition performance may be used as a quality predictor, by comparing with original text

The E-model
• Map from loss and delay to impairment scores (Ie, Id)
• Compute a gross score (R value) and map to MOSc
• Limited number of codec loss impairment mappings
[Figure, three panels: Ie (loss impairment) vs. average loss probability, G.729, T=20ms, random loss; E-model Id (delay impairment) vs. delay (ms); R to MOS mapping, MOS vs. R value]

Using Speech Recognition to Predict MOS
• Evaluation of automatic speech recognition (ASR) based MOS prediction
  – IBM ViaVoice Linux version
  – Codec used: G.729
  – Performance metric
    • absolute word recognition ratio: $R_{abs} = \frac{\#\text{ of correctly recognized words}}{\text{total } \#\text{ of spoken words}}$
    • relative word recognition ratio: $R_{rel}(p) = \frac{R_{abs}(p)}{R_{abs}(0\%)}$, where $p$ is the loss probability

Recognition Ratio vs. MOS
• Both MOS and R_abs decrease w.r.t. loss
• Then, eliminate middle variable p
[Figure, three panels, G.729 codec: impact of packet loss on audio quality, MOS vs. loss rate (%); impact of packet loss on automatic speech recognition, word recognition ratio (%) vs. loss rate (%); mapping from speech recognition performance to MOS, MOS vs. word recognition ratio (%)]

Speaker Dependency
• Absolute performance is speaker-dependent
• But relative word recognition ratio is not
• Suitable for MOS prediction
[Figure, three panels, Speakers A, B, C: word recognition ratio vs. packet loss probability p (%); relative word recognition ratio R_rel vs. packet loss probability p (%); MOS vs. relative word recognition ratio R_rel]

Summary of QoS Measurement
• Loss burstiness:
  – Affects (generally worsens) perceived quality as well as FEC performance
  – May be described with, e.g., a Gilbert model
• Delay correlation:
  – Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C3DF)
  – Late losses are also bursty
• Perceived quality (MOS) estimation
  – Analytical: the E-model
  – If network statistics are not available: relative word recognition ratio can provide speaker-independent MOS prediction

Outline
• QoS measurement
  – Objective vs. subjective metrics
  – Automated measurement of subjective quality
• QoS management: improving your quality
  – End-to-End: FEC, LBR, PLC
  – Network provisioning: voice traffic aggregation
• Reality check
  – Performance of VoIP end-points (IP phones, …)
  – Deployment issues in VoIP
  – Evaluation of VoIP service availability through Internet measurement

Quality of FEC vs. LBR
• FEC is substantially and consistently better
  – At comparable bandwidth overhead
  – Across all codec configurations tested
[Figure, left: FEC vs. LBR based on G.723.1; curves J: FEC (2,1) and I: G.723.1 LBR (labelled G.729+G.723.1 LBR). Right: FEC vs. LBR based on AMR; curves N: AMR12.2+FEC (3,2) and M: AMR12.2+6.7 LBR (labelled AMR LBR). Both panels: MOS vs. loss probability]

Quality of FEC under Bursty Loss
• Packet interval T has a stronger effect on MOS with FEC than without FEC
[Figure: MOS (Mean Opinion Score) vs. p_u (overall loss rate) at conditional loss probability p_c = 30%; curves for T=20ms and T=40ms, with and without FEC; annotation "0.5-0.6 MOS"]

FEC MOS Optimization Considering Delay Effect
• Larger T ⇒ higher FEC efficiency, but higher delay
• Optimizing T with the E-model
  – Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc
• Prediction close to FEC MOS test results
  – Suitable for analytical perceived quality prediction
[Figure, left: FEC MOS optimization, Id != 0, d=3*T; MOS_c vs. packet interval T (ms) for p_u = 4%, 8%, 12%, 16%. Right: FEC MOS prediction, p_c=30%; MOS_c vs. original loss rate (%), E-model prediction T=40ms vs. real MOS test T=40ms]

Trade-off Analysis between Codec Robustness and FEC
• 3 loss repair options
  – FEC, LBR, PLC
• Loss-resilient codec
  – Better PLC
    • iLBC (IETF)
  – But higher bit-rate
  – Better than FEC?
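The prediction procedure from the FEC MOS optimization slide (final loss probability after FEC, then an E-model mapping to MOS) can be sketched numerically. The following is a hedged illustration, not the talk's method: it assumes a Gilbert loss process parameterized by the overall loss rate p_u and conditional loss probability p_c, an idealized (n,k) block code that repairs any block with at most n-k losses, and Monte Carlo simulation with a fixed seed. The R-to-MOS conversion is the standard one from ITU-T G.107; the codec-specific Ie and Id curves shown in the talk's figures are not reproduced, so the loss-to-R step is left out. All function names are hypothetical.

```python
import random
from typing import List


def gilbert_losses(p_u: float, p_c: float, count: int, rng: random.Random) -> List[bool]:
    """Simulate a 2-state Gilbert loss trace; True marks a lost packet.

    p_u is the overall (unconditional) loss rate and p_c the conditional
    loss probability P[loss | previous packet lost].
    """
    p = p_u * (1.0 - p_c) / (1.0 - p_u)  # P[loss | previous packet received]
    lost = rng.random() < p_u            # start in the stationary distribution
    trace = []
    for _ in range(count):
        trace.append(lost)
        lost = rng.random() < (p_c if lost else p)
    return trace


def residual_loss(trace: List[bool], n: int, k: int) -> float:
    """Data-packet loss rate after an idealized (n,k) block FEC.

    Each block holds k data packets followed by n - k FEC packets. A block
    with at most n - k losses is fully repaired; otherwise its lost data
    packets stay lost.
    """
    lost_data = total_data = 0
    for start in range(0, len(trace) - n + 1, n):
        block = trace[start:start + n]
        total_data += k
        if sum(block) > n - k:
            lost_data += sum(block[:k])
    return lost_data / total_data


def r_to_mos(r: float) -> float:
    """ITU-T G.107 conversion from the E-model R value to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


rng = random.Random(2003)
bursty = gilbert_losses(p_u=0.08, p_c=0.30, count=300_000, rng=rng)
random_loss = gilbert_losses(p_u=0.08, p_c=0.08, count=300_000, rng=rng)  # p_c = p_u: Bernoulli
p_f_bursty = residual_loss(bursty, n=3, k=2)
p_f_random = residual_loss(random_loss, n=3, k=2)
```

Under these assumptions, a (3,2) code at p_u = 8% leaves roughly 3.4% residual loss when p_c = 30% but only about 1.2% under Bernoulli loss, which matches the qualitative claim that FEC is less efficient under bursty loss.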
[Figure: MOS vs. average loss probability for iLBC 14 kb/s, G.729 8 kb/s and G.723.1 6.3 kb/s]

Observations and Results
• When considering delay:
  – iLBC is usually preferred in low loss conditions
  – G.729 or G.723.1 + FEC better for high loss
• Example: max bandwidth 14 kb/s
  – Consider delay impairment (use MOSc)
[Figure, two panels, max BW 14 kb/s: MOS_c vs. average loss probability for iLBC (no FEC), G.729+(5,3), and G.723.1+(2,1) with T=60ms]

Effect of Max Bandwidth on Achievable Quality
• 14 to 21 kb/s: significant improvement in MOSc
• From 21 to 28 kb/s: marginal change due to increasing delay impairment by FEC
[Figure: MOS_c vs. average loss probability for max BW 14, 21 and 28 kb/s]

Provisioning a VoIP Network
• Silence detection/suppression
  – Transmit only during On period, saves bandwidth
  – Allows traffic aggregation through statistical multiplexing
• Characteristics of On/Off patterns in VoIP
  – Traditionally found to be exponentially distributed
  – Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns
[Figure, two panels: talk-spurt/gap distribution for G.729B VAD and for NeVoT SD (default setting); complementary CDF vs. spurt/gap duration (in 10 ms frames); real vs. exponential spurt CDF and gap CDF]

Traffic Aggregation Simulation
• Token bucket filter with N sources, R: reserved to peak BW ratio
• CDF model resembles trace model in most cases
• Exponential (traditional) model
  – Under-predicts out-of-profile packet probability
  – Under-prediction ratio grows as token buffer size B grows
• Similar results for NeVoT SD

Summary of QoS Management
• End-to-End
  – FEC is superior in quality to LBR
  – Codec robustness is better than FEC in low loss conditions
    • Combining both schemes brings the best of both sides
• Network provisioning
  – Observation: new silence detectors (G.729B, NeVoT SD) ⇒ non-exponential voice On/Off patterns
  – Result: performance of voice traffic aggregation under new On/Off patterns
  – Important in traffic engineering and Service Level Agreement (SLA) validation

Outline
• QoS measurement
  – Objective vs. subjective metrics
  – Automated measurement of subjective quality
• QoS management: improving your quality
  – End-to-End: FEC, LBR, PLC
  – Network provisioning: voice traffic aggregation
• Reality check
  – Performance of end-points (IP phones, …)
  – Deployment issues in VoIP
  – Assessment of VoIP service availability through Internet measurement

Mouth-to-ear Delay of VoIP End-points
• All receivers can adjust M2E delay adaptively whenever it is too low or too high
• M2E delay depends mainly on receiver (esp. RAT)
• HW phones have relatively low delay (~45-90ms)
[Figure, left: M2E delay (ms) vs. time (sec), experiments 1-1 and 1-2, with silence gaps marked. Right: Effect of Sender and Receiver; M2E delay (ms) by receiver (3Com, Cisco, Mediatrix, Pingtel, RAT) for senders 3Com, Cisco, Mediatrix, Pingtel, RAT]

But Adaptiveness ≠ Perfection
• Symptom of playout buffer underflow
• Waveforms are dropped
• Occurred at point of delay adjustment
• Bugs in software?
• Does a LAN mean perfect quality?

Major Observations
• Overall: end-points matter a lot!
• HW IP phones: 45-90ms average M2E delay
• SW clients:
  – Messenger 2000 lowest (68ms), XP (96-120ms)
    • cf. GSM↔PSTN: 110ms either direction
  – NetMeeting very bad (> 400ms)
• PLC robustness
  – Acceptable in all 3 IP phones tested, Cisco phone more robust
• Silence detection/suppression
  – Works for speech input
  – Often fails for non-speech (e.g., music) input
    • Generates many unnatural gaps
    • Not good for customer support center (on-hold music)!
• Acoustic echo cancellation (AEC):
  – Good on most IP phones (Echo Return Loss > 40 dB)
  – But some do not implement AEC at all

Reality Check #2: IP Telephony Deployment
• Localized deployment at Columbia Univ.
[Figure: deployment architecture. Regular phone, telephone switch/PBX, T1/E1, SIP/PSTN gateway, RTP/SIP, IP phones, sipd (SIP proxy, redirect server), SQL database, conference server, voicemail server, web server (web based configuration), core server, server status monitoring]

Issues and Lessons Learned
• PSTN/PBX integration
  – Requires full understanding of legacy networks
    • Lower layer (e.g., T1 line configuration)
      – Parameters must match on both PSTN/PBX and gateway!
    • PBX access configurations
      – To ensure calls go through in both directions
    • Address translation (dial-plan) in both directions
  – Previous lessons/experiences can help greatly
    • E.g., second gateway installed in weeks instead of months
• Security
  – Issue: SIP/PSTN gateway has no authentication feature
  – Solution:
    • Use gateway’s access control lists to block direct calls
    • SIP proxy server handles authentication using record-route

Reality Check #3: VoIP Service Availability
• Focus on availability rather than traditional QoS
  – Delay is a minor issue; FEC recovers most isolated losses
  – Ability to make a call is vital, especially in emergency
• Internet measurement sites:
  – 14 nodes worldwide, not just Internet2 and alike
• Definitions:
  – Availability = MTBF / (MTBF + MTTR)
  – Availability = successful calls / first call attempts
• Equipment availability: 99.999% (“5 nines”) ⇒ 5 minutes/year of downtime
• AT&T: 99.98% availability (1997)
• IP frame relay SLA: 99.9%
• UK mobile phone survey: 97.1-98.8%

First Look at Availability
• Call success probability:
  – 62,027 calls succeeded, 292 failed ⇒ 99.53% availability
  – Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58%
• Overall network loss
  – PSTN: once connected, call usually of good quality
    • exception: mobile phones
  – Compute % time below loss threshold
    • 5% loss causes degradation for many codecs
    • others acceptable till 20% loss

  % time below loss threshold, by path group:
  Paths      0%     5%     10%    20%
  All        82.3   97.48  99.16  99.75
  ISP        78.6   96.72  99.04  99.74
  I2         97.7   99.67  99.77  99.79
  I2+        86.8   98.41  99.32  99.76
  US         83.6   96.95  99.27  99.79
  Int.       81.7   97.73  99.11  99.73
  US ISP     73.6   95.03  98.92  99.79
  Int. ISP   81.2   97.60  99.10  99.71

Network Outages
• Sustained packet losses
  – arbitrarily defined at 8 packets
  – far beyond recoverable (FEC, interpolation)
• 23% of packet losses are outages
• Make up significant part of 0.25% unavailability
• Symmetric: an outage on A→B tends to coincide with one on B→A
• Spatially correlated: an outage on A→B is correlated with outages on other paths A→X
• Not correlated across networks (e.g., I2 and commercial)
• Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time
[Figure, two panels: complementary CDF vs. outage duration (sec); US domestic vs. international paths; all paths vs. Internet2]

Outage-induced Call Abortion Probability
• Long interruption ⇒ user likely to abandon call
• From E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
  – ⇒ half the users will abandon call after 12s
• 2,566 calls have at least one outage
• 946 of 2,566 expected to be dropped ⇒ 1.53% of all calls

  Expected call abortion rate, by path group:
  Paths      Rate
  all        1.53%
  I2         1.16%
  I2+        1.15%
  ISP        1.82%
  US         0.99%
  Int.       1.78%
  US ISP     0.86%
  Int. ISP   2.30%

Summary of Service Availability
• Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone)
• Current results show availability far below five 9’s, but comparable to mobile telephony
  – Outage statistics are similar in research and ISP networks
• Working on identifying fault sources and locations
• Additional measurement sites are welcome

Conclusions
• Measuring QoS
  – Loss burstiness and delay correlation affect (generally worsen) perceived quality
  – Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction
  – Performance of real products: IP phones and soft clients
• Ensuring/improving QoS
  – Network provisioning (voice traffic aggregation)
    • Efficient, but may be expensive to deploy and manage
  – End-to-End (FEC > LBR, PLC)
    • Easier to deploy, but must control overhead of FEC
• Reality Check
  – Good implementation at the end-point (e.g., IP phones) is vital
  – VoIP deployment requires PSTN integration and security
  – Service availability is crucial for VoIP, but still far from 99.999% over the Internet

Ongoing and Future Work
• Sampling Internet performance
  – Where do the problems reside?
    • Access networks (Cable, DSL), or
    • International paths?
  – How can we solve these problems?
    • Can adaptive FEC react fast enough to changes in network conditions?
• Playout delay behaviors of VoIP end-points
  – How well do they react to jitter, delay spikes?
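The availability arithmetic quoted in the Reality Check #3 slides can be checked with a few lines. This is a minimal sketch using only the definitions and numbers given in the talk (the two availability definitions, the call counts, and the E.855 holding-time fit); the function names are hypothetical and a 365-day year is assumed.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def availability_from_calls(succeeded: int, failed: int) -> float:
    """Availability = successful calls / first call attempts."""
    return succeeded / (succeeded + failed)


def availability_from_mtbf(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def holding_probability(t: float, mean: float = 17.26) -> float:
    """P[holding] = exp(-t / 17.26), t in seconds (the fit quoted in the talk)."""
    return math.exp(-t / mean)


call_availability = availability_from_calls(62_027, 292)
five_nines_downtime = downtime_minutes_per_year(0.99999)
half_abandon_time = 17.26 * math.log(2)  # time at which P[holding] = 0.5
```

With the talk's numbers this gives about 99.53% call availability, about 5.3 minutes of downtime per year at five nines, and a half-abandonment time of about 12 seconds.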
