Wideband Codecs for Enhanced Voice Quality
Ensuring optimum wideband speech quality in converged VoIP/mobile applications/services

Claude Gravel
VP Engineering
VoiceAge Corporation

Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo

VoiceAge Corporation – who are we?
Business: Low bit rate audio compression technologies research, IPR licensing and optimized implementations development
Headquarters: Montreal, Canada
Technologies:
  AMR: 3GPP, CableLabs narrowband voice codec
  AMR-WB: 3GPP, ITU-T, CableLabs wideband voice codec
  VMR-WB: 3GPP2, CableLabs wideband voice codec
  AMR-WB+: 3GPP, DVB-H audio codec
Achievements: Won every international audio compression standard for which VoiceAge competed in the last 10 years at 3GPP, 3GPP2, ITU, ETSI, TIA, CableLabs
Implementations: World-class optimized implementations and proprietary solutions on multiple O/S and processors/platforms (including TI- and ARM-based systems)
Deployment: More than 2B mobile phones and over 500M PCs currently use VoiceAge’s technologies

International Standards Using ACELP®

Speech Synthesis Model Used in CELP/ACELP® Speech Coding
1 = air from lungs
2 = vocal cords (periodicity)
3 = vocal tract articulators (including jaw, lips, tongue, velum)
[Figure: synthesis model. Innovative excitation c(n) (1), long-term prediction (2), v(n), short-term prediction (3), synthesized speech ŝ(n)]

Speech Signal
– Basically, same synthesis model for everyone
– So, speech has a “universal” structure or signature
[Figure: waveform of the word “voiceage” (1.25 sec), with zoomed segments of 45 to 180 ms illustrating four segment classes]
Voiced fricative
• quasi periodic + noise
• lower energy
Purely voiced
• quasi periodic
• high energy
• more low frequency energy
• strongly correlated
Unvoiced
• non periodic
• low energy
• uncorrelated
• more high frequency energy
Transient
• variable energy
• fast spectral evolution

What is Wideband Communication?
• Delivers double the audio signal bandwidth
• Enables digital end-to-end packet-based services to deliver much better speech communication quality than traditional PSTN circuit-switched telephony
• VoIP quality differentiator
• Substantially increases captured speech information
[Figure: signal power vs. frequency range]
An Emerging Opportunity to Deliver Vastly Improved Speech Quality

Signal Bandwidth
Wideband Speech:
• Below 200 Hz: increased naturalness, presence, and comfort
• Above 3400 Hz: increased intelligibility and fricative differentiation
[Figure: voiced segment and unvoiced segment]

Typical Speech Signal Acoustics
[Figure: spectrogram of “Everyone looked extremely confused about the news”, Frequency [Hz] (0 to 7000) vs. Time [s] (0 to 3.0), with the 200 - 3400 Hz and 50 - 7000 Hz bands marked. Annotations: improved voice quality & intelligibility (e.g., s & f differentiation); improved speech naturalness, presence and comfort]
Wideband telephony covers much more speech signal information

Why Wideband Speech Now?
• Improved intelligibility, naturalness and presence
  – Reduces listener fatigue
  – Improved hands-free/speakerphone sound quality
  – Improves speaker and speech recognition
• High-quality low-bit-rate wideband codecs
  – G.722.2/AMR-WB at ~7–24 kbps
  – No need to increase network capacity to deliver better quality sound
• Wideband capable devices are available now
  – Wideband audio microphones and device acoustics more affordable
• Rising user awareness of enhanced sound quality
  – Wideband teleconferencing
  – Wideband enterprise/ASP IP telephony
  – Wireless/VoIP multimedia services
Speech Coding Technology, Network/Device Capabilities and Market Demand are Converging Towards Pervasive Wideband Communications

Voice Processing -- Key for Speech Quality Control & Management
[Figure: voice processing block diagram. Voice processing (digital communications domain): PCM I/F, echo canceller, speech codec, noise suppressor, VAD, CNG, DTX, PLC, variable/multi-rate switching, jitter buffer, packetization/de-packetization [RTP]. Other blocks: voice MIBs, system MIBs, call processing, SNMP, signaling protocol, UDP, TCP, IP, MAC layer, physical layer, analog domain]
Codec choice impacts network cost and interoperability + A major contributor to the listener quality experience

Speech Coding Attributes
As required by specific applications
Bit rate
• As low as possible
Delay
• As little as possible
Quality
• As high as possible
Complexity
• As algorithmically simple as possible to constrain platform processing and memory requirements and reduce battery consumption in mobile devices
Robustness
• Effective operation under background noise and channel impairment conditions
Standards compliance
• Open, tested and interoperable solutions
Difficult to attain all of these often divergent objectives at the same time

VoIP Speech Quality Challenges
• Missing packets
  – Due to network congestion or transmission errors
  – Wireless networks are more prone to losing packets
• Packet delay
  – Due to network congestion or transmission errors
  – Real-time communication can’t wait too long for packets or retransmission
• Transcoding
  – Needed when end-devices and network equipment support incompatible speech/audio coding technologies – traversing diverse networks such as across fixed/mobile environments
  – Increases system costs, adds delays and introduces audio quality impairments
• Background noise
  – Reduces intelligibility and comfort level of conversations
  – Ambient office/workplace/household noise
  – Street/car noise in mobile applications

Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through…
  – Sending additional data to help preserve information
    • FEC/Repetition of frames
    • Works well for sporadic packet losses but not so well for bursts of lost packets
    • Increases transmitted bit rate to send redundant information
[Figure: frames f(n-2) … f(n+4) over time, carried in packets p(n-1) … p(n+4). A simple forward error correction scheme based on repeating the previous frame in each packet]

Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through… (cont’d)
  – Packet loss concealment (PLC)
    • Techniques used by the decoder to estimate parameter values for missing frames based on the characteristics of preceding frames
    • Can be improved by classifying frames and repeating or adjusting parameters based on heuristics driven by the classes of the frames preceding the missing frame(s)
      – Extrapolate missing frame parameters as a function of the expected frame class (e.g., voiced/unvoiced, stops, nasals, …)
      – E.g., for voiced frames, repeat the pitch parameters
      – Objective: limit abrupt changes in energy that can cause annoying clicks
    • Late packet arrival processing can also be leveraged to benefit from some of the information in a packet that arrives too late
      – Can benefit PLC methods as applied to subsequent delayed or lost packets

Speech Processing Techniques for Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through… (cont’d)
  – Frame Interleaving
    • Each packet contains non-contiguous frames to lower the overall impact on the reconstructed speech signal of a lost packet
    • Introduces delays which may make it unsuitable for real-time speech communication
    • Works well for audio streaming
[Figure: frames f0 … f8 over time, interleaved into packet 1 (f0, f3, f6), packet 2 (f1, f4, f7) and packet 3 (f2, f5, f8). I.e., loss of packet 2 leads to non-contiguous missing frames which are easier to compensate for in the decoder through PLC]

Speech Processing Techniques for Improving VoIP Voice Quality
• Network congestion, which can lead to delayed or dropped packets, can be alleviated by lowering the average communication bit rate …
  – VAD/DTX/CNG
    • Using Voice Activity Detection (VAD), Discontinuous Transmission (DTX) and Comfort Noise Generation (CNG) capabilities to limit consumed bandwidth during periods of silence during a conversation
  – Adaptive codecs
    – Source controlled
      » Optimal selection of the bit rate and coding scheme based on active speech
    – Network controlled
      » Adapt the bit rate to make best use of varying available bandwidth

Transcoder-Free Network Design for Fixed/Mobile Convergence

Improving VoIP Speech Quality
Mitigating the main issues impacting VoIP speech quality
Issues:
• Missing packets
• Delayed packets
• Transcoding
• Background noise
Mitigations:
• Proper network engineering with integrated QoS mechanisms (in closed systems)
• Choosing the best speech coding/processing technology (adaptive, enhanced voice quality, robust and extensible)
• Improved packet loss concealment
• Late packet arrival processing
• Time scale modification
• Adaptive jitter buffering
• Transcoder-free network design to avoid increased system costs, delays and audio quality impairments
• Leverage seamlessly interoperable
  standards-proven codecs
• Choose codecs that can readily accommodate background noise suppression algorithms
• Proven noise suppression in standards selection & characterization testing results

Why AMR-WB/G.722.2
• AMR-WB/G.722.2 is the right wideband codec for network convergence
  – Very robust
    • Supports dynamic adaptation to mobile network conditions
    • Includes built-in efficient packet loss concealment
    • Performs well even with high bit error rates
  – Multi-rate codec delivers very good quality even at bit rates comparable to those of narrowband (~12 kbps)
    • No need for potentially costly and time-consuming network capacity upgrades
  – Supports VAD/DTX/CNG for enhanced efficiency
  – Low-complexity encoder and decoder
  – Standardized in 3GPP, ITU-T & CableLabs PacketCable 2.0
  – Can interoperate transcoder free across mobile/IP networks
    • Eliminates latency, impairments, costs

Subjective NB-WB Quality Comparison
[Figure: NB-WB Voice Quality as a Function of Bit Rate. Source: Ericsson Review, No. 3, 2006]
AMR-WB/G.722.2 Greatly Improves Perceived Voice Quality

AMR-WB Subjective Testing Results
[Figure: AMR-WB/G.722.2 Characterization Test, Clean Condition Test (English Language). MOS (1.0 to 5.0) for G.722 @ 64 kbps, G.722 @ 48 kbps, and G.722.2 @ 8.85, 12.65, 18.25 and 23.05 kbps, under No Tandem (-26 dBov) and Self-Tandem (-26 dBov) conditions]
AMR-WB/G.722.2 Delivers Excellent Wideband Speech Quality Even at Low Bit Rates (e.g., MOS at 8.85 kbps exceeds G.722 at 48 kbps)

AMR-WB CPU efficiency
• AMR-WB/G.722.2 performance on widely deployed communications device processors shows the codec’s relatively low complexity

Mode  Bit rate   ARM 9E (MHz)       TI C55x (MIPS)     TI C64x (MIPS)
      (kbps)     Encoder  Decoder   Encoder  Decoder   Encoder  Decoder
0     6.6        39       11        19.67    4.88      22.15    5.94
1     8.85       34       9         21.24    4.35      23.75    5.00
2     12.65      39       8         24.64    4.20      26.98    4.81
3     14.25      41       8         27.02    4.30      29.36    4.85
4     15.85      41       8         27.23    4.39      29.58    4.88
5     18.25      42       8         28.20    4.55      30.68    4.95
6     19.85      43       8         29.33    4.61      32.10    4.98
7     23.05      43       8         29.13    4.83      31.76    5.05
8     23.85      43       9         26.64    5.21      29.97    5.40

Supported by most commonly used communications processors

The Standard Solution Advantage
• Open, collaborative and competitive process
• Requirements specifically address target applications
• Published algorithms and source code
  – Permits wider and more effective scrutiny
  – Clearer intellectual property ownership
• Rigorous comparative testing under diverse conditions
  – Background noise types and levels
  – Spoken languages
  – Speaker types
  – Various network impairments
Interoperable, Open and Fully Tested
Ensures that the best technologies are chosen

Interoperability between Fixed/Mobile Network Services
Transcoder-free Interoperability in Fixed/Mobile Convergence
• 3GPP – Wi-Fi/WiMAX – ITU-T interoperability
• AMR-WB / G.722.2 end-to-end across networks
• No need for transcoding at media gateways
• Improves on service quality end to end
• Reduces network delays and equipment complexity
• Lowers
  network costs (equipment costs and licensing)

Growing Market Momentum
Chipset / Silicon Vendors
• VeriSilicon
• Texas Inst.
• Freescale
• Renesas
• ST Micro
• …
Terminal Device Manufacturers
• Nokia
• Sony-Ericsson
• Motorola
• Samsung
• Panasonic
• NEC
• CounterPath
• Polycom
• Mobiles, Softphones, VoIP terminals, Conferencing terminals…
Network Equipment Vendors
• Nokia
• Ericsson
• AudioCodes
• Gateways, ATA/MTA, Softswitches, …
Codec Developers
• VoiceAge
• Others…
Network Operators / Service Providers
• T-Mobile Trial
• Wireless Operators
• Cablecos
• VoIP ASPs
• …
Test Set Vendors
• Ixia
• Tektronix
• GL Comms
• NetHawk
• Many others
Accelerating Adoption of AMR-WB/G.722.2 leads to Happy Consumers and a Wealthy Telecom Service Value Chain

Successful Ericsson/T-Mobile Trial
[Figure: participant ratings, > 90% positive. Extremely Good 35%, Good 36%, Quite Good 11%, Nice to Have 11%, Quite Bad 4%, Bad 2%, Extremely Bad 3%. Source: Ericsson Review, No. 3, 2006]
• 150 consumers participated for 4 weeks in Germany, April/May 2006 – confirmed earlier lab MUSHRA tests
  – More than 90% perceived better voice quality & clarity
  – Felt a greater sense of privacy, discretion & comfort due to improved voice quality & intelligibility
  – Could more easily place & complete calls in environments with high background noise
  – Business users highly valued voice quality for improving communication, reducing expenses & giving a positive impression
• Ericsson anticipates positive outcomes for operators
  – More mobile traffic, i.e., more calls for longer durations
  – Can offer enhanced services for conferencing, personalized ringback signals, automatic voice recognition, voice mail …
  – Can cut costs, e.g., by reducing cost of acquiring new subscribers, reducing helpdesk costs

Wideband Speech Communications
An Evolutionary Migration
• Wideband speech coding is consistent with narrowband codecs
  – Bit rates comparable to narrowband codecs
  – Similar robustness techniques to handle packet losses and delays can be used
  – Low-complexity implementations available for all popular communications processor types
  – While vastly improving perceived voice quality
• Strategically deploying wideband capability in terminal and network equipment enables evolution to wideband speech communications
  – Compatible with existing network infrastructure
  – No forklift replacements needed
… a graceful evolutionary migration, not a disruptive revolution

Conclusions
• Speech communications are rapidly moving to end-to-end digital packets over all networks – wired and wireless – towards fixed/mobile convergence
• Provides an opportunity to vastly improve communications quality through widescale deployment of wideband speech
  – Efficient codecs, devices with wideband acoustics and processing are already available
• Many benefits but also some challenges to consistently delivering high-quality voice end to end in real-world deployments
• Enhanced speech coding and processing techniques have been developed to help overcome these challenges
• The selection of standards-based advanced wideband speech coding technologies such as AMR-WB/G.722.2 is one of the fundamental steps towards improving voice quality between diverse devices and converging networks
• Adoption of AMR-WB/G.722.2 in the telecom service delivery value chain is growing – wideband speech quality has been shown to be highly preferred by consumers
Are your devices, systems, solutions, services ready?

Hear the rich sound of wideband
Wideband Demo

Wideband Codecs for Enhanced Voice Quality
Thank you!
[email protected]
www.voiceage.com
Come and talk to VoiceAge at Booth #107
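
Illustration: the two packetization schemes described under "Speech Processing Techniques for Improving VoIP Voice Quality" (repeating the previous frame in each packet, and frame interleaving) can be sketched roughly as follows. This is only a sketch under stated assumptions: frames are plain strings rather than encoded speech, and the function names (pack_with_repetition, recover_repetition, interleave, frames_lost) are hypothetical and not part of AMR-WB/G.722.2 or any VoiceAge implementation.

```python
def pack_with_repetition(frames):
    """Packet n carries frame n plus a redundant copy of frame n-1.

    The first packet has no previous frame, so its redundant slot is None.
    """
    return [(frames[i], frames[i - 1] if i > 0 else None)
            for i in range(len(frames))]


def recover_repetition(packets, lost):
    """Rebuild the frame sequence when the packet indices in `lost` never arrive.

    A missing frame is recovered from the redundant copy in the next packet
    if that packet arrived; otherwise it stays None (left to PLC).
    """
    out = []
    for i in range(len(packets)):
        if i not in lost:
            out.append(packets[i][0])          # primary copy arrived
        elif i + 1 < len(packets) and (i + 1) not in lost:
            out.append(packets[i + 1][1])      # redundant copy in next packet
        else:
            out.append(None)                   # burst loss: nothing to recover
    return out


def interleave(frames, depth):
    """Packet k carries frames k, k+depth, k+2*depth, ..."""
    return [frames[k::depth] for k in range(depth)]


def frames_lost(frames, depth, lost_packet):
    """Frames that go missing when one interleaved packet is dropped."""
    return interleave(frames, depth)[lost_packet]


if __name__ == "__main__":
    frames = [f"f{i}" for i in range(9)]
    packets = pack_with_repetition(frames)
    print(recover_repetition(packets, {2}))      # sporadic loss
    print(recover_repetition(packets, {2, 3}))   # burst of two lost packets
    print(interleave(frames, 3))
    print(frames_lost(frames, 3, 1))
```

With repetition, a single lost packet is fully recovered while a burst of two leaves one gap, which matches the slide's remark that the scheme suits sporadic rather than burst losses. With interleaving, dropping one packet removes frames that are not adjacent in time, at the cost of the extra buffering delay noted on the slide.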