Download File: atrac

Properties of Musical Sound

Subjective            Objective
Pitch                 Frequency
Volume                Amplitude/power/intensity
Timbre                Overtone content/spectrum
Duration in beats     Duration in time

Direct Sound
Sound waves that travel directly from the source to the listener. Direct sound intensity attenuates with distance according to the inverse square law:

$$I = \frac{K}{\text{Dist}^2}$$

For example, doubling the distance attenuates the intensity by a factor of 4, or $10\log_{10} 4 \approx 6$ dB.

Early (first-order) Reflection
Sound waves that reach the listener after reflecting once from the environment (mainly walls). Early reflections arriving within 35 ms of the direct sound reinforce it. According to Beranek, who studied 54 concert halls, an "intimate" effect was felt with early reflections of less than 20 ms. In large halls, suspended reflectors are employed to provide early reflections to the center seats.

Reverberation (second- and higher-order reflection)
Sound waves that reach the listener after further reflection of the first-order reflections. Reverberation decays with time as sound energy is absorbed by the environment. Reverberation time is the time taken for the sound pressure level to drop by 60 dB from its initial level. In general, for the frequency range of 500-1000 Hz,

$$T_R \propto \frac{\text{Room volume (m}^3\text{)}}{\text{Total room surface (m}^2\text{)}}$$

High-frequency signals are absorbed more quickly in air than low-frequency ones, so their reverberation time is shorter.

Microphone: Dynamic
Principle: magnetic induction. Figure labels: ribbon diaphragm, coil, magnet.
• Simple
• Economical
• Robust
Figure: A Classical Ribbon Microphone

Microphone: Condenser
Principle: capacitor transducer. Figure label: current.
• Complicated
• Expensive
• Sharper transient
• Phantom power
Figure: A Condenser Microphone

Microphone polar pattern
The magnitude of a microphone's response to pressure changes arriving from different directions.
Polar patterns, written as a response $\rho(\theta)$ (each polar plot runs from 0° to 330° in 30° steps, with a radial scale from 0.25 to 1.0):

Pattern                         Response
Omnidirectional                 $\rho = 1$
Bidirectional (figure-eight)    $\rho = \cos\theta$
Standard cardioid               $\rho = 0.5 + 0.5\cos\theta$
Supercardioid                   $\rho = 0.37 + 0.63\cos\theta$
Subcardioid                     $\rho = 0.75 + 0.25\cos\theta$

XY (coincident pair) Microphone Recording
Figure: top view and front view, microphones angled 90°-135°.
Two identical cardioids aimed across each other at 90° to 135°, 12 inches or less apart.
• Extremely mono-compatible, moderate stereo effect.
• Localization of the sound source is based on the difference in amplitude, e.g., if L > R, the source seems to be closer to the left side.

Blumlein coincident Microphone Recording
Figure: top view and front view, microphones at 90°, channels L and R.
Two identical "figure 8" microphones placed at 90°, one directly on top of the other. Created by Alan Blumlein, this arrangement provides precise stereo imaging of sound sources at the front and reverberation from the rear.

Near coincident Microphone Recording
Figure: top view, microphones angled 90°-135°.
• ORTF (Office de Radiodiffusion-Télévision Française): 2 cardioids spaced 17 cm apart, angled 110° apart.
• NOS (Nederlandse Omroep Stichting): 2 cardioids spaced 12 cm apart, angled 90° apart.

MS Microphone Recording
The recording and playback configurations can be different: the decoded signals simulate an equivalent microphone pair at playback.
M (main/mono) is a microphone of any polar pattern, e.g. a cardioid. S (side) is a bidirectional microphone.

$$L = M + S, \qquad R = M - S, \qquad L + R = 2M$$

• Preserves monophonic compatibility.
• Flexible stereoscopic perspectives.
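The polar-pattern formulas and the MS decoding relations above can be checked with a few lines of code. This is only a sketch: the names `polar_response`, `PATTERNS` and `ms_decode` are hypothetical, and the single parameter `a` assumes every pattern has the form a + (1 - a) cos θ, which matches the coefficients listed in the slides.

```python
import math

# Coefficient a in rho(theta) = a + (1 - a) * cos(theta), taken from the slide values.
PATTERNS = {
    "omnidirectional": 1.0,
    "subcardioid": 0.75,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "bidirectional": 0.0,
}

def polar_response(a: float, theta_deg: float) -> float:
    """Response of a first-order microphone pattern at angle theta (degrees)."""
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))

def ms_decode(m: float, s: float) -> tuple[float, float]:
    """Decode one mid/side sample pair into left/right: L = M + S, R = M - S."""
    return m + s, m - s

if __name__ == "__main__":
    for name, a in PATTERNS.items():
        front = polar_response(a, 0.0)
        rear = polar_response(a, 180.0)
        print(f"{name:16s} front={front:+.2f} rear={rear:+.2f}")
    left, right = ms_decode(0.6, 0.2)
    print("L, R, L+R:", left, right, left + right)
```

Summing the decoded channels returns 2M regardless of S, which is why the MS technique preserves mono compatibility.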
Optimized Cardioid Triangle (OCT)
Figure labels: C (Center), LF (Left Front), RF (Right Front); dimensions 8 cm and 4-100 cm.

INA 5 (Ideale Nierenanordnung, "ideal cardioid")
Five cardioid microphones oriented in five directions to supply the five channels.
Figure labels: C (Center), L, R, LS, RS; dimensions 17.5 cm (three times) and 60 cm (twice); angle 60°.

Fukada Tree
Developed by NHK. Uses INA 5 as the basis, plus two omnidirectional microphones to expand the spatial impression.
Figure labels: C (Center), L, R, LL, RR, LS, RS.

Pair-wise pan-pot
A pair-wise pan-pot permits positioning of a sound source.
• Non-zero gain is applied only to the two speakers adjacent to the phantom image location.
• Even if there are more than two speakers, only the pair that encloses the phantom image is considered.

Constant Gain Optimization
With the two speakers at angles $\theta_1$ and $\theta_2$ and the phantom image at $\theta_P$, assume the gain decreases linearly in one channel and increases linearly in the other:

$$g_1 = \frac{\theta_2 - \theta_P}{\theta_2 - \theta_1}, \qquad g_2 = \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}$$

$$\text{Total gain} = \sum_{i=1}^{2} g_i, \qquad \text{Total power} = \sum_{i=1}^{2} g_i^2$$

Ideal case: the total is independent of the image position.
Let $\theta_2 = 45^\circ$, $\theta_1 = 315^\circ$ ($-45^\circ$).

Figure: Linear Panning: Total Gain (channel one, channel two and total gain against image angle from 45° to -45°).
Figure: Linear Panning: Total Power (channel one, channel two and total power against image angle from 45° to -45°).
Loudness is proportional to power rather than gain. With linear panning the total gain stays at 1, but the total power falls to 0.5 when the image is midway between the speakers.

Constant Power Optimization
Let

$$\theta_m = 90^\circ \cdot \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}, \qquad g_1 = \cos\theta_m, \qquad g_2 = \sin\theta_m$$

so that $g_1^2 + g_2^2 = 1$ for every image position.
Figure: Constant Power Panning: Total Gain (channel one, channel two and total gain against image angle from 45° to -45°).
Figure: Constant Power Panning: Total Power (channel one, channel two and total power against image angle from 45° to -45°).

Time domain Digitization (e.g. CD)
x(t) -> Sampling -> x(n) -> Quantization -> y(n) -> ...01001010...

Bit-rate = Sampling rate ($f_s$) × Bits per sample × Number of channels
Example: the bit-rate of a 16-bit, 44.1 kHz stereo signal = 44,100 × 16 × 2 = 1,411,200 bits per second = 176,400 bytes per second.

After sampling, the maximum frequency of the signal is restricted to half the sampling frequency. The reason is that the fastest repetitive pattern that can be obtained with a sampling interval of T alternates on every sample, so its period is 2T:

$$f_s = \frac{1}{T}, \qquad \text{Minimum period} = 2T \;\Rightarrow\; f_{max} = \frac{1}{2T} = \frac{f_s}{2}$$

A common convention: normalize the digital frequencies to the range $[0, 2\pi]$.

f              w
0              0
fs/8           pi/4
fs/6           pi/3
fs/4           pi/2
fs/2 (fmax)    pi

Figure: Frequency spectrum of a digitized audio signal (axis w, marks at fs/2 and fs).
Figure: Increasing the sampling rate by two times (axis w, marks at fs/4 and fs/2).
Figure: Increasing the sampling rate by N times (axis w, marks at fs/2N and fs).
Figure: Increasing the sampling rate by N times, with quantization noise (axis w, marks at fs/2N and fs/2).
Relocating the quantization errors to the high-frequency end reduces their effect on the signal.

Quantization noise
Figure: quantization step q between levels $p_k$ and $p_{k+1}$; the error lies within ±q/2.
If the signal is random (white), the probability distribution of the quantization noise is uniform, and the noise power (mean square quantization error) is

$$N_Q = \frac{1}{q}\int_{-q/2}^{q/2} x^2\,dx = \frac{q^2}{12}$$

Whenever q is halved, the noise power is reduced by a factor of 4, i.e. 6 dB.

Noise shaping
Block diagram: the filtered error is subtracted from x(n) to give u(n); noise d(n) is added and the sum is quantized by Q to give y(n); the error y(n) - u(n) is fed back through the filter h(n).

$$u(n) = x(n) - [y(n) - u(n)] * h(n) \quad (1)$$
$$y(n) = Q[u(n) + d(n)] \quad (2)$$

The combined noise addition and quantization can be represented by an overall noise term e(n):

$$u(n) = x(n) - [y(n) - u(n)] * h(n) \quad \text{(as before)}$$
$$y(n) = u(n) + e(n) \quad (3)$$

Applying the Fourier transform gives

$$U(w) = X(w) - [Y(w) - U(w)]\,H(w) \quad (4)$$
$$Y(w) = U(w) + E(w) \quad (5)$$
$$\therefore\; Y(w) = X(w) + E(w)\,[1 - H(w)] = X(w) + E(w)\,N(w), \qquad N(w) = 1 - H(w) \quad (6)$$

If H(w) = 1, the quantization error is eliminated. However, this kind of filter cannot be implemented in practice. Alternatively, a different transfer function can be selected so that the noise is attenuated more at the low-frequency end.
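The feedback loop of equations (1) to (3) can be sketched in the time domain. The example below is illustrative only and rests on stated assumptions: no added noise (d(n) = 0), a uniform rounding quantizer with step q, and a one-sample delay as one possible feedback filter h(n). The function name `noise_shape` is hypothetical.

```python
import math

def noise_shape(x, q):
    """Error-feedback quantizer, assuming h(n) is a one-sample delay and d(n) = 0.

    Returns the quantized output y and the quantization error e = y - u.
    """
    y, e = [], []
    e_prev = 0.0
    for xn in x:
        u = xn - e_prev          # eq. (1): subtract the delayed error
        yn = q * round(u / q)    # eq. (2): uniform quantizer, no added noise
        e_prev = yn - u          # eq. (3): e(n) = y(n) - u(n)
        y.append(yn)
        e.append(e_prev)
    return y, e

if __name__ == "__main__":
    q = 0.1
    x = [0.3 * math.sin(0.05 * n) for n in range(500)]
    y, e = noise_shape(x, q)
    # Output error is the first difference of e(n): y(n) - x(n) = e(n) - e(n-1).
    worst = max(abs((y[n] - x[n]) - (e[n] - (e[n - 1] if n else 0.0)))
                for n in range(len(x)))
    print("max deviation from e(n) - e(n-1):", worst)
    # The differences telescope, so the accumulated (DC) error stays within q/2.
    print("accumulated error:", sum(y) - sum(x))
```

Because the output error is a first difference of e(n), its sum over time telescopes to a single bounded term, which is the time-domain view of the noise being suppressed at low frequencies.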
Choosing a one-sample delay as the filter,

$$H(w) = e^{-jw} \quad (7)$$
$$Y(w) = X(w) + E(w)\left(1 - e^{-jw}\right) = X(w) + E(w)\,N(w) \quad (8)$$

f       w       |N(w)|^2    dB
0       0       0           -infinity
fs/8    pi/4    0.59        -2.3
fs/6    pi/3    1           0
fs/4    pi/2    2           3
fs/2    pi      4           6

Note that the noise is attenuated more at the low-frequency end than at the high-frequency end.

Noise power gain:

$$N_{PG} = \frac{1}{2\pi}\int_0^{2\pi} \left|1 - H(w)\right|^2 dw = 2 \quad (9)$$

Hence the noise shaper increases the total noise power by 3 dB.

Time-Frequency domain Digitization (e.g. MD)
Block diagram: x(t) -> Sampling -> x(n) -> Block 1, Block 2, ..., Block N. Each Block M (1 <= M <= N) is split into Band 1 -> Quantizer 1, Band 2 -> Quantizer 2, Band 3 -> Quantizer 3, ..., Band K -> Quantizer K, followed by a Freq. to Time Converter -> y(n).

1. The time signal is chopped into segments or blocks.
2. Each block is transformed into its frequency spectrum.
3. The frequency spectrum is partitioned into bands.
4. Each band is digitized and quantized.
5. In the player, each digitized band is converted back to analogue form.
6. The frequency bands are combined to reconstruct the frequency spectrum.
7. The frequency spectrum is transformed back to the time domain to reproduce the time segment.

If each frequency band is quantized with the same number of levels, no compression is achieved, and the extra, complicated effort is wasted.
Compression is attained if certain bands can be quantized with fewer levels.
However, those bands will be subject to more distortion, in the form of "quantization noise".
Is there a solution that satisfies both requirements?

Key researchers in the study of the Human Auditory System (HAS)
1. G. von Bekesy
2. J.B. Allen
3. H. Fletcher
4. B. Scharf
5. D.D. Greenwood
Quantization noise is less audible at some frequencies than at others.
Important findings: Hearing Sensitivity, Tone-Masking-Noise, Critical Bands

"The brain interprets signal received via the auditory system rather than its objective representation."
Figure (left and right channels, L and R): listeners grouped tones by frequency proximity, rather than by the actual representation.
Author: Diana Deutsch
Source: http://psy.ucsd.edu/~ddeutsch/psychology/figures/fig3.jpg
Copyright: Diana Deutsch

"When two identical but delayed audio sources are heard, the first one will inhibit the other if the delay is within 25 to 35 ms."
This is true even if the second sound is 10 dB above the first one. The result is that the sound seems to originate from the first source only, and the loudness is increased.

• The frequency response of human ears is non-uniform.
• The ear operates like a spectrum analyser.
• It analyses with frequency (critical) bands: 100 Hz wide below 500 Hz, and 1/6 to 1/3 of an octave wide above 500 Hz.
• High energy in one band may inhibit neighboring bands.
• Masking also occurs shortly before the masking tone starts and after it ends: backward and forward masking.

Hearing sensitivity experiment
• Place a listener in a quiet room.
• Raise a 1 kHz tone until it is just audible and record the amplitude.
• Repeat with other frequencies.
Figure: threshold level (dB) against frequency (2 to 10 kHz).

Masking
The hiding of one signal at a given frequency by another signal at or near that frequency. Masking involves two signals: a Masker (M) and a Probe (P).
Figure: M and P presented to the HAS; P is masked by M.
The level at which P is just audible is known as the "just noticeable difference" (JND).

Figure: Masking by a 1 kHz tone (dB against frequency, 2 to 10 kHz). Note: two types of masking.
Figure: Masking of multiple tones at 0.25, 1, 4 and 8 kHz (dB against frequency, 2 to 10 kHz). Note: two types of masking.

Using masking in a coder
• Divide the signal into spectral bands.
• Determine the masking envelope.
• Determine the masked noise region.
Figure: masking tone and the region of noise that can be masked.

Quantization is a kind of noise. Quantizing a signal S gives $S_Q$, and with the residual $R_Q = S_Q - S$,

$$S_Q = S + R_Q = S + \text{Noise}$$

The coarser the quantization, the smaller the bit-rate. The effect, however, is negligible if the noise can be masked.
Figure: masking tones and the noise that can be masked, for wide and narrow bands.
The narrower the bandwidth of each band, the better the noise masking effect.

Frequency resolution is best at low frequencies: it is easier to discriminate different frequencies.
Time resolution is best at higher frequencies: it is easier to locate the instant of a particular tone.
This suggests non-uniform partitioning of the audio frequency spectrum.

Partitioning of the frequency spectrum into Critical Bands according to the Psychoacoustic model
Standard: the Bark scale (after Barkhausen)

$$\text{Bark}(f) = \begin{cases} \dfrac{f}{100} & \text{for } f < 500\ \text{Hz} \\[2mm] 9 + 4\log_2\!\left(\dfrac{f}{1000}\right) & \text{otherwise} \end{cases}$$

Figure: Masking of multiple tones (dB against Bark), with maskers at 0.25 kHz (2.5 Bark), 0.5 kHz (5 Bark), 1 kHz (9 Bark), 2 kHz (13 Bark) and 4 kHz (17 Bark).
Figure: level (dB) against time (ms) for a mask tone followed by a test tone. A test tone shortly after the mask is not audible.

The sensitivity of the ear varies with frequency.
• Most sensitive: around 4 kHz
• Less sensitive: at higher frequencies
Quantization noise is less audible at some frequencies than at others.
Simultaneous masking: a softer sound is less audible in the presence of a louder sound.
Quantization noise is less audible at frequencies on, or close to, loud tones.

Analyzing time windows
Figure: segments of the input signal x(n) are each transformed by the DCT into coefficients Y0, Y1, ..., YN-1 for time slots t0, t1, .... A single spectral component Yi is shown for each time slot; the others are computed in the same way.

The MDCT blends one frame into the next to avoid inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added.
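The Bark-scale mapping given earlier in this section can be written as a small function. This is a minimal sketch with a hypothetical name, `hz_to_bark`, and it assumes the logarithm is base 2, which reproduces the Bark values marked on the masking figure (1 kHz at 9 Bark, 2 kHz at 13 Bark, 4 kHz at 17 Bark).

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Piecewise Bark mapping from the slides (log base 2 is an assumption)."""
    if f_hz < 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

if __name__ == "__main__":
    # Frequencies labelled on the "masking of multiple tones" figure.
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.0f} Hz -> {hz_to_bark(f):5.2f} Bark")
```

Both branches give 5 Bark at 500 Hz, so the mapping is continuous at the changeover frequency.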
MDCT window overlap, as shown in the figures:
Case 1: equal-sized windows
Case 2: non-equal-sized windows

A single window
Figure: N samples of x(n) -> DCT -> coefficients Y0, Y1, ..., Yi, ..., YN-1.

$$Y_i = \sum_{k=0}^{N-1} x_k \cos\!\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$$

Overlapping window (analysis)
Figure: N samples of x(n) are multiplied by the window w(n) to give x'(n), which is transformed into Y0, Y1, ..., Yi, ..., YN-1.

$$x'(n) = x(n)\,w(n)$$

$$Y_i = \sum_{k=0}^{N-1} x'_k \cos\!\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$$

Overlapping window (synthesis)
Figure: coefficients Y0, Y1, ..., Yi, ..., YN-1 -> IDCT -> x'(n).

$$x'_k = \sum_{i=0}^{N/2-1} Y_i \cos\!\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right], \qquad k = 0, 1, \ldots, N-1$$

Discard frequency bands that are less essential to the HAS
• Number of data samples: N, with x(n) = [x(0), x(1), ..., x(N-1)].
• Decompose x(n) into N MDCT coefficients.
• Select the K coefficients (K < N) to which the HAS is sensitive and discard the rest.
Disadvantage: noticeable distortion in the discarded bands.

A better approach
Assign a different quantization step size to each coefficient according to its tolerance to quantization noise, based on the HAS.
• Number of data samples: N, with x(n) = [x(0), x(1), ..., x(N-1)].
• Decompose x(n) into N MDCT coefficients.
• Quantize each coefficient so that the noise is below the masking threshold (1 bit = 6 dB).
Total number of bits: $\sum_{j=0}^{N-1} q_j$, where presumably $q_j \ge 1$.

Source bit-rate: 1.4 Mb/s
Target bit-rate: 292 kb/s
A total of 52 critical bands:
• 20 bands for lower frequencies
• 16 bands for middle frequencies
• 16 bands for higher frequencies
Number of time windows: 8
Smallest time window: 1.45 ms
Longest time window: 11.60 ms

The different noise masking responses of the bands can be exploited to quantize frequency components adaptively.
Figure: masking level (dB) against Bark (5 to 20), marking regions where coarse quantization is allowed and where fine quantization is required. No inter-band masking.

Figure: encoder block diagram. Input at 1.4 Mbps -> Frequency Range Analyser -> three bands, H (11-22 kHz), M (5.5-11 kHz) and L (0-5.5 kHz), each followed by an MDCT -> Block size decision and Bit Allocation -> output at 292 kbps.
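The windowed MDCT analysis and overlap-add synthesis described above can be tried out on a short signal. This is a sketch under assumptions that do not come from the slides: a sine window applied at both analysis and synthesis, equal-sized windows only (Case 1), and a synthesis scale factor of 4/N chosen so that the 50% overlap-add returns the input. The function names are hypothetical, and the direct O(N²) sums are for clarity, not speed.

```python
import math

def sine_window(N):
    """Sine window; satisfies w(n)^2 + w(n + N/2)^2 = 1 (an assumed choice)."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def mdct(frame):
    """N windowed samples -> N/2 coefficients, using the cosine kernel from the slides."""
    N = len(frame)
    return [sum(frame[k] * math.cos(math.pi / (2 * N) * (2 * k + 1 + N / 2) * (2 * i + 1))
                for k in range(N))
            for i in range(N // 2)]

def imdct(coeffs):
    """N/2 coefficients -> N time-aliased samples; the 4/N scale is an assumption."""
    N = 2 * len(coeffs)
    return [4.0 / N * sum(coeffs[i] * math.cos(math.pi / (2 * N) * (2 * k + 1 + N / 2) * (2 * i + 1))
                          for i in range(N // 2))
            for k in range(N)]

def analyse_synthesise(x, N):
    """Window, transform, inverse-transform, window again, then overlap-add by 50%."""
    hop = N // 2
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - N + 1, hop):
        frame = [x[start + n] * w[n] for n in range(N)]
        y = imdct(mdct(frame))
        for n in range(N):
            out[start + n] += y[n] * w[n]
    return out

if __name__ == "__main__":
    x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
    out = analyse_synthesise(x, 16)
    # Only samples covered by two overlapping frames are reconstructed.
    err = max(abs(out[n] - x[n]) for n in range(8, 56))
    print("max reconstruction error (interior samples):", err)
```

Each frame on its own contains time-domain aliasing. It cancels only when adjacent frames are overlapped and added, which is why the first and last half-frames of the signal are left out of the comparison.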

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download File: atrac