Download File: atrac

Properties of Musical Sound
Subjective            Objective
Pitch                 Frequency
Volume                Amplitude/power/intensity
Timbre                Overtone content/spectrum
Duration in beats     Duration in time
Direct Sound
Sound waves that travel directly from the source to the listener.
Direct sound intensity attenuates with distance according to the inverse square law:
$I = \frac{K}{\mathrm{Dist}^2}$
For example, doubling the distance reduces the intensity to one quarter, an attenuation of about 6 dB.
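The 6 dB figure can be checked numerically. The following is a minimal sketch; the function name `attenuation_db` and the choice of a reference distance are assumptions made for this illustration, not part of the original slides.

```python
import math

def attenuation_db(d_ref, d):
    """Change in direct-sound level (dB) when moving from d_ref to d.

    Intensity follows I = K / d**2, so the constant K cancels in the ratio.
    """
    intensity_ratio = (d_ref / d) ** 2
    return 10 * math.log10(intensity_ratio)

# Doubling the distance quarters the intensity.
print(round(attenuation_db(1.0, 2.0), 2))
```

A negative result means the sound is quieter at the new distance.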
Early (first order) Reflection
Sound waves that reach the listener after reflecting once from the environment (mainly walls).
Early reflections arriving within 35 ms of the direct sound reinforce it.
According to Beranek, who studied 54 concert halls, an “intimate” effect was felt with early reflections of less than 20 ms.
In large halls, suspended reflectors are employed to provide early reflections to the center seats.
Reverberation (second to higher order reflection)
Sound waves that reach the listener after a first-order reflection has itself been reflected again.
Reverberation decays with time as sound energy is absorbed by the environment.
Reverberation time is the time taken for the sound pressure level to drop by 60 dB from its initial level, generally quoted for the frequency range 500-1000 Hz:
$T_R \propto \frac{\text{Room volume (m}^3\text{)}}{\text{Total room surface (m}^2\text{)}}$
High-frequency signals are absorbed more quickly in air than low-frequency ones, so their reverberation time is shorter.
Microphone: Dynamic
Principle: magnetic induction
• Simple
• Economical
• Robust
[Figure: A Classical Ribbon Microphone; labels: ribbon diaphragm, coil, magnet]
Microphone: Condenser
Principle: capacitor transducer
• Complicated
• Expensive
• Sharper transient
• Phantom power
[Figure: A Condenser Microphone; label: current]
Polar pattern: the magnitude of a microphone’s response to pressure changes arriving from different directions.
[Figures: one polar plot per pattern, angles 0° to 330° in 30° steps, radial scale 0.25 to 1.0]
Omnidirectional: $\rho = 1$
Bidirectional (figure-eight): $\rho = \cos\theta$
Standard cardioid: $\rho = 0.5 + 0.5\cos\theta$
Supercardioid: $\rho = 0.37 + 0.63\cos\theta$
Subcardioid: $\rho = 0.75 + 0.25\cos\theta$
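The polar patterns all have the form a + b·cos(θ), so they can be tabulated with one small function. This is an illustrative sketch: the dictionary name `PATTERNS` and the function `response` are assumptions, and the coefficients are the ones listed for each pattern.

```python
import math

# (a, b) in rho(theta) = a + b*cos(theta) for each named pattern.
PATTERNS = {
    "omnidirectional": (1.0, 0.0),
    "bidirectional": (0.0, 1.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.37, 0.63),
    "subcardioid": (0.75, 0.25),
}

def response(pattern, angle_deg):
    """Magnitude of the response for sound arriving angle_deg off axis."""
    a, b = PATTERNS[pattern]
    return abs(a + b * math.cos(math.radians(angle_deg)))

# Front (0 deg), side (90 deg) and rear (180 deg) response of each pattern.
for name in PATTERNS:
    print(name, [round(response(name, ang), 2) for ang in (0, 90, 180)])
```

The cardioid has its null directly behind the microphone, while the supercardioid trades a small rear lobe for stronger rejection towards the sides.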
XY (coincident pair) Microphone Recording
[Figure: top view and front view, microphones angled 90° to 135°]
Two identical cardioids aimed across each other at 90° to 135°, 12 inches or less apart.
Extremely mono-compatible, moderate stereo effect.
Localization of the sound source is based on the difference in amplitude: e.g., if L > R, the source seems to be closer to the left side.
Blumlein coincident Microphone Recording
[Figure: top view and front view, microphones crossed at 90°, channels L and R]
Two identical “figure 8” microphones placed at 90° to each other, one directly on top of the other.
Created by Alan Blumlein; provides precise stereo imaging of sound sources at the front and reverberation from the rear.
Near coincident Microphone Recording
[Figure: top view, microphones angled 90° to 135°]
ORTF (Office de Radiodiffusion-Télévision Française): two cardioids spaced 17 cm apart, angled 110° apart.
NOS (Nederlandse Omroep Stichting): two cardioids spaced 30 cm apart, angled 90° apart.
MS Microphone Recording: Recording and playback configuration can be different
[Figure: S (side) and M (main/mono) microphones; the decoded signals simulate an equivalent microphone pair at playback]
M is a microphone of any polar pattern (e.g. a cardioid); S is a bidirectional microphone.
$L = M + S$
$R = M - S$
$L + R = 2M$
Preserves monophonic compatibility.
Flexible stereoscopic perspectives.
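The sum and difference decoding can be sketched in a few lines. The function name `ms_decode` and the `width` parameter are hypothetical additions for illustration; `width` simply scales S before decoding, which is one way to picture the flexible stereo perspective.

```python
def ms_decode(mid, side, width=1.0):
    """Decode M/S sample lists into left/right using L = M + S, R = M - S.

    `width` scales S before decoding (hypothetical parameter: 0 gives mono,
    1 gives the plain sum/difference).
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right

mid = [0.5, 0.25, -0.5]
side = [0.25, -0.5, 0.125]
left, right = ms_decode(mid, side)
# Summing the decoded channels cancels S and leaves 2*M (mono compatibility).
mono = [l + r for l, r in zip(left, right)]
print(left, right, mono)
```

Because S cancels in L + R for any `width`, the mono sum depends only on the M microphone.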
Optimized Cardioid Triangle (OCT)
[Figure: C (Center), LF (Left Front) and RF (Right Front) microphones; dimensions marked 8 cm and 4-100 cm]
INA 5
Ideale Nierenanordnung (ideal cardioid arrangement)
Five cardioid microphones oriented in 5 directions to supply the five channels
[Figure: C (Center), L, R, LS and RS microphones; dimensions marked 17.5 cm (three times), 60 cm (twice) and 60°]
Fukada Tree
Developed by NHK
INA 5 as basis, plus two omnidirectional microphones to expand the spatial impression
[Figure: C (Center), L, R, LL, RR, LS and RS microphones]
A pair-wise pan-pot permits positioning of a sound source.
Non-zero gain is applied only to the two speakers adjacent to the phantom image location.
Even if there are more than two speakers, only the pair which encloses the phantom image is considered.
Constant Gain Optimization
Assuming the gain decreases linearly in one channel and increases linearly in the other, for a phantom image at angle $\theta_P$ with $\theta_2 \ge \theta_P \ge \theta_1$ we have
$g_1 = \frac{\theta_2 - \theta_P}{\theta_2 - \theta_1}$
$g_2 = \frac{\theta_P - \theta_1}{\theta_2 - \theta_1}$
Total gain $= \sum_{i=1}^{2} g_i$
Total power $= \sum_{i=1}^{2} g_i^2$
Ideal case: independent of the image position
Let $\theta_2 = 45^\circ$, $\theta_1 = 315^\circ$ ($-45^\circ$)
[Figure: Linear Panning: Total Gain. Channel one, channel two and total gain (0 to 1.2) against image position from 45° to -45°]
Loudness is proportional to power instead of gain
[Figure: Linear Panning: Total Power. Channel one, channel two and total power (0 to 1.2) against image position from 45° to -45°]
Constant Power Optimization
Let $\theta_m = 90^\circ \left(\frac{\theta_P - \theta_1}{\theta_2 - \theta_1}\right)$
$g_1 = \cos\theta_m$
$g_2 = \sin\theta_m$
[Figure: Constant Power Panning: Total Gain. Channel one, channel two and total gain (-0.2 to 1.6) against image position from 45° to -45°]
[Figure: Constant Power Panning: Total Power. Channel one, channel two and total power (0 to 1.2) against image position from 45° to -45°]
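The two panning laws can be compared directly. The sketch below is illustrative only: the function names `linear_pan` and `constant_power_pan` are assumptions, and the default speaker angles are the 45° and -45° positions used in the plots.

```python
import math

def linear_pan(theta_p, theta_1=-45.0, theta_2=45.0):
    """Constant-gain (linear) panning: g1 + g2 is always 1."""
    g1 = (theta_2 - theta_p) / (theta_2 - theta_1)
    g2 = (theta_p - theta_1) / (theta_2 - theta_1)
    return g1, g2

def constant_power_pan(theta_p, theta_1=-45.0, theta_2=45.0):
    """Constant-power panning: g1**2 + g2**2 is always 1."""
    theta_m = math.radians(90.0 * (theta_p - theta_1) / (theta_2 - theta_1))
    return math.cos(theta_m), math.sin(theta_m)

# Total gain and total power for a few image positions (degrees).
for theta_p in (-45.0, -22.5, 0.0, 22.5, 45.0):
    for law in (linear_pan, constant_power_pan):
        g1, g2 = law(theta_p)
        print(law.__name__, theta_p, round(g1 + g2, 3), round(g1**2 + g2**2, 3))
```

With linear panning the total power dips to 0.5 at the centre, which is why a centred image sounds quieter; constant-power panning keeps the power at 1 at the cost of a total gain that peaks near 1.41.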
Time domain Digitization (e.g. CD)
[Block diagram: x(t) → Sampling → x(n) → Quantization → y(n) → …01001010...]
Bit-rate = Sampling rate ($f_s$) × Bits per sample × Number of channels
Example: bit-rate of a 16-bit, 44.1 kHz stereo signal
= 44,100 × 16 × 2
= 1,411,200 bits per second
= 176,400 bytes per second
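The same arithmetic as a small helper, useful for trying other formats. The function name `pcm_bit_rate` is an assumption made for this sketch.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit-rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_bits_per_second = pcm_bit_rate(44_100, 16, 2)
cd_bytes_per_second = cd_bits_per_second // 8
print(cd_bits_per_second, cd_bytes_per_second)
```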
After sampling, the maximum frequency of the signal is restricted to half the sampling frequency. The reason is that the fastest repetitive pattern that can be represented with a sampling interval of T alternates on every sample, so its period is 2T:
[Figure: samples alternating between two levels at interval T, giving a period of 2T]
Minimum period $= 2T \Rightarrow f_{max} = \frac{1}{2T} = \frac{f_s}{2}$, where $f_s = \frac{1}{T}$
A common convention: normalize the digital frequencies to the range $[0, 2\pi]$, so that $f_s$ maps to $2\pi$.
f               w
0               0
fs/8            pi/4
fs/6            pi/3
fs/4            pi/2
fs/2 (fmax)     pi
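The mapping in the table is w = 2π·f/fs. A minimal sketch, with the function name `normalized_frequency` and the 44.1 kHz example rate chosen here only for illustration:

```python
import math

def normalized_frequency(f_hz, fs_hz):
    """Map a frequency in Hz to normalized radian frequency w = 2*pi*f/fs."""
    return 2 * math.pi * f_hz / fs_hz

fs = 44_100  # example sampling rate (assumption)
for divisor in (8, 6, 4, 2):
    w = normalized_frequency(fs / divisor, fs)
    print(f"fs/{divisor} -> {w / math.pi:.4f} * pi")
```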
[Figure: Frequency spectrum of a digitized audio signal; axis w marked at fs/2 and fs]
[Figure: Increasing the sampling rate by two times; axis w marked at fs/4 and fs/2]
[Figure: Increasing the sampling rate by N times; axis w marked at fs/2N and fs/2, with the quantization noise shown across the band]
Relocating the quantization errors to the high-frequency end reduces their effect on the signal.
[Figure: quantization levels $p_k$ and $p_{k+1}$ separated by the step size q, with q/2 marked]
If the signal is random (white), the probability distribution of the quantization noise is uniform, and the noise power (mean square quantization error) is
$N_Q = \frac{1}{q}\int_{-q/2}^{q/2} x^2\,dx = \frac{q^2}{12}$
Whenever q is halved, the noise power is reduced by a factor of 4, i.e. 6 dB.
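The q²/12 result can be checked by simulation. The sketch below is a hedged illustration: it assumes a uniform random input in [-1, 1] and a rounding quantizer, and the names `quantize`, `measured_noise_power` and `theoretical_noise_power` are invented for this example.

```python
import math
import random

def quantize(x, q):
    """Round x to the nearest multiple of the step size q."""
    return q * round(x / q)

def theoretical_noise_power(q):
    """Mean square error of a uniform quantizer with step q."""
    return q * q / 12

def measured_noise_power(q, n=100_000, seed=0):
    """Estimate the mean square quantization error on random input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        total += (quantize(x, q) - x) ** 2
    return total / n

q = 1 / 128
print(measured_noise_power(q), theoretical_noise_power(q))
# Halving the step size lowers the noise power by 10*log10(4) dB.
halving_gain_db = 10 * math.log10(
    theoretical_noise_power(q) / theoretical_noise_power(q / 2)
)
print(halving_gain_db)
```

Since each extra bit halves q, this is the origin of the "6 dB per bit" rule of thumb.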
[Block diagram: noise shaper with input x(n), intermediate signal u(n), dither d(n), quantizer Q, output y(n) and feedback filter h(n)]
$u(n) = x(n) - [y(n) - u(n)] * h(n)$ (1)
$y(n) = Q[u(n) + d(n)]$ (2)
The combined noise addition and quantization can be represented by an overall noise term e(n):
[Block diagram: as above, with Q and d(n) replaced by the additive noise e(n)]
$u(n) = x(n) - [y(n) - u(n)] * h(n)$ (as before)
$y(n) = u(n) + e(n)$ (3)
Applying the Fourier transform gives
[Block diagram: as above, with X(w), U(w), E(w), Y(w) and H(w)]
$U(w) = X(w) - [Y(w) - U(w)]\,H(w)$ (4)
$Y(w) = U(w) + E(w)$ (5)
$\Rightarrow Y(w) = X(w) + E(w)[1 - H(w)] = X(w) + E(w)N(w)$, where $N(w) = 1 - H(w)$ (6)
If H(w) = 1 the quantization error is eliminated. However, such a filter cannot be implemented in practice. Instead, a different transfer function can be selected so that the noise is attenuated more at the low-frequency end.
$H(w) = e^{-jw}$ (7)
$Y(w) = X(w) + E(w)[1 - e^{-jw}] = X(w) + E(w)N(w)$ (8)
f        w        |N(w)|^2    dB
0        0        0           -infinity
fs/8     pi/4     0.59        -2.3
fs/6     pi/3     1           0
fs/4     pi/2     2           3
fs/2     pi       4           6
Note that the noise is attenuated more at the low-frequency end than at the high-frequency end.
Noise power gain
$N_{PG} = \frac{1}{2\pi}\int_0^{2\pi} |1 - H(w)|^2\,dw = 2$ (9)
Hence the noise shaper has increased the total noise power by 3 dB.
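The shaped-noise gain for H(w) = e^{-jw} and the overall noise power gain can be evaluated numerically. This is a sketch under the assumption of a simple midpoint-rule integration; the function names `noise_gain_sq` and `noise_power_gain` are made up for this example.

```python
import cmath
import math

def noise_gain_sq(w):
    """|1 - H(w)|^2 for the first-order shaper H(w) = exp(-jw)."""
    return abs(1 - cmath.exp(-1j * w)) ** 2

def noise_power_gain(steps=10_000):
    """(1 / 2pi) * integral of |1 - H(w)|^2 over [0, 2pi], midpoint rule."""
    dw = 2 * math.pi / steps
    total = sum(noise_gain_sq((i + 0.5) * dw) for i in range(steps))
    return total * dw / (2 * math.pi)

for label, w in (("fs/8", math.pi / 4), ("fs/6", math.pi / 3),
                 ("fs/4", math.pi / 2), ("fs/2", math.pi)):
    g = noise_gain_sq(w)
    print(label, round(g, 3), round(10 * math.log10(g), 2), "dB")

gain = noise_power_gain()
print(gain, round(10 * math.log10(gain), 2), "dB")
```

The noise is pushed towards fs/2, so the total noise rises by about 3 dB while the low-frequency part of the band becomes cleaner.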
Time-Frequency domain Digitization (e.g. MD)
[Block diagram: x(t) → Sampling → x(n), split into Block 1, Block 2, ..., Block N. Each Block M (1 ≤ M ≤ N) is split into Band 1, Band 2, Band 3, ..., Band K, each with its own quantizer (Quantizer 1 to Quantizer K), followed by a Freq. To Time Converter giving y(n)]
1. The time signal is chopped into segments or blocks
2. Each block is transformed into its frequency spectrum
3. The frequency spectrum is partitioned into bands
4. Each band is digitized and quantized
5. In the player, each digitized band is converted back to analogue form
6. The frequency bands are combined to reconstruct the frequency spectrum
7. The frequency spectrum is transformed back to the time domain to reproduce the time segment
If each frequency band is quantized with the same number of levels, no compression is achieved, and the extra, complicated effort is wasted.
Compression is attained if certain bands can be quantized with fewer levels.
However, those bands will be subject to more distortion.
The distortion is in the form of “Quantization Noise”.
Is there a solution that satisfies both requirements?
Key researchers in the study of the HAS (human auditory system)
1. G. von Bekesy
2. J.B. Allen
3. H. Fletcher
4. B. Scharf
5. D.D. Greenwood
Quantization Noise is less audible at some frequencies than at others
Important Findings: Hearing Sensitivity, Tone-Masking-Noise, Critical Bands
“The brain interprets signal received via the auditory system rather than its objective
representation.”
[Figure: tone sequences at the right (R) and left (L) ears]
Listeners grouped tones by frequency proximity, rather than by the actual representation.
Author: Diana Deutsch
Source: http://psy.ucsd.edu/~ddeutsch/psychology/figures/fig3.jpg
Copyright: Diana Deutsch
“When two identical but delayed audio sources are heard, the first one will inhibit the other if the delay is within 25 to 35 ms.”
This is true even if the second sound is 10 dB above the first one.
As a result, the sound seems to originate from the first source only, and its loudness is increased.
The frequency response of the human ear is non-uniform.
The ear operates like a spectrum analyser.
It analyses sound in frequency (critical) bands:
about 100 Hz wide below 500 Hz
1/6 to 1/3 of an octave wide above 500 Hz
High energy in one band may inhibit neighboring bands.
Masking also occurs shortly before the masking tone starts and after it ends: backward and forward masking.
• Place a listener in a quiet room
• Raise a 1 kHz tone until it is just audible and record the amplitude
• Repeat with other frequencies
[Figure: just-audible level in dB (axis marked 10 and 20) against frequency from 2 to 10 kHz]
Masking: the hiding of one signal at a given frequency by another signal at or near that frequency.
Masking involves two signals: a Masker (M) and a Probe (P).
[Figure: M and P presented to the HAS; P is masked by M, so only M is perceived]
The level at which P is just audible is known as the “just noticeable difference” (JND).
[Figure: Masking by a 1 kHz tone; level in dB (20 to 60) against frequency from 2 to 10 kHz]
[Figure: Masking of multiple tones at 0.25, 1, 4 and 8 kHz; level in dB (20 to 60) against frequency from 2 to 10 kHz]
Note: Two types of masking
Divide the signal into spectral bands
Determine the masking envelope
Determine the masked noise region
[Figure: a masking tone, its masking envelope and the region of noise that can be masked]
Quantization is a kind of noise:
$S = S_Q + R_Q \Rightarrow S_Q = S - R_Q = S + \text{Noise}$
The coarser the quantization, the smaller the bit-rate. The effect, however, is negligible if the noise can be masked.
[Figure: masking tones in bands of different width, each with the noise that can be masked]
The narrower the bandwidth of each band, the better the noise masking effect.
Frequency resolution is best at low frequencies:
Easier to discriminate different frequencies
Time resolution is best at higher frequencies:
Easier to locate the instance of a particular tone
This suggests non-uniform partitioning of the audio frequency spectrum.
Partitioning of the frequency spectrum into Critical Bands according to the psychoacoustic model
Standard: The Bark Scale (after Barkhausen)
$\text{Bark}(f) = \begin{cases} \dfrac{f}{100} & \text{for } f \le 500\ \text{Hz} \\ 9 + 4\log_2\left(\dfrac{f}{1000}\right) & \text{otherwise} \end{cases}$
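[Figure: Masking of multiple tones on the Bark scale; level in dB (20 to 60) against Bark (5 to 20), with tones at 0.25 kHz (2.5 Bark), 0.5 kHz (5 Bark), 1 kHz (9 Bark), 2 kHz (13 Bark) and 4 kHz (17 Bark)]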
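The Bark values attached to the tones in the plot follow from the two-piece Bark formula: linear below 500 Hz, logarithmic (base 2) above. A minimal sketch, where the function name `hz_to_bark` is an assumption for illustration:

```python
import math

def hz_to_bark(f_hz):
    """Approximate Bark number: f/100 up to 500 Hz, 9 + 4*log2(f/1000) above."""
    if f_hz <= 500:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

for f in (250, 500, 1000, 2000, 4000):
    print(f, "Hz ->", hz_to_bark(f), "Bark")
```

Above 500 Hz each octave spans 4 Bark, so the bands widen in proportion to frequency.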
[Figure: a mask tone followed by a test tone; level in dB (0 to 60) against time from 5 to 20 ms]
A test tone shortly after the mask is not audible.
The sensitivity of the ear varies with frequency.
Most sensitive: around 4 kHz
Less sensitive: at higher frequencies
Quantization noise is less audible at some frequencies than at others.
Simultaneous masking: a softer sound is less audible in the presence of a louder sound.
Quantization noise is less audible at frequencies on, or close to, loud tones.
[Figure: segments of the input signal x(n); each segment passes through a DCT to give spectral components Y0, Y1, ..., YN-1 for time slots t0, t1, ...]
A single spectral component is shown for each time slot; the others are computed in the same way.
Analyzing time windows
The MDCT blends one frame into the next to avoid inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added.
[Figure: Case 1: equal-sized windows]
[Figure: Case 2: non-equal-sized windows]
A single window
[Figure: N samples of x(n) → DCT → Y0, Y1, ..., Yi, ..., YN-1]
$Y_i = \sum_{k=0}^{N-1} x(k)\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$
Overlapping window w(n)
[Figure: N samples of x'(n) → DCT → Y0, Y1, ..., Yi, ..., YN-1]
$x'(n) = x(n)\,w(n)$
$Y_i = \sum_{k=0}^{N-1} x'(k)\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right]$
Overlapping window
[Figure: Y0, Y1, ..., Yi, ..., YN-1 → IDCT → x'(n)]
$x'(k) = \sum_{i=0}^{N/2-1} Y_i\cos\left[\frac{\pi}{2N}\left(2k + 1 + \frac{N}{2}\right)(2i + 1)\right], \quad k = 0, 1, \ldots, N-1$
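The windowed transform, inverse transform and 50% overlap-add can be tried end to end. The following is a sketch under stated assumptions, not the slides' implementation: it uses a sine window (one common choice that satisfies the overlap requirement), keeps N/2 coefficients per block of N samples, and scales the inverse by 4/N so that overlap-add reproduces the input. All function names are invented for this example.

```python
import math

def mdct(block):
    """Forward transform of N windowed samples into N/2 coefficients."""
    n = len(block)
    return [
        sum(block[k] * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * i + 1))
            for k in range(n))
        for i in range(n // 2)
    ]

def imdct(coeffs):
    """Inverse transform back to N aliased samples (assumed scale: 4/N)."""
    n = 2 * len(coeffs)
    return [
        (4.0 / n) * sum(coeffs[i] * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * i + 1))
                        for i in range(n // 2))
        for k in range(n)
    ]

def sine_window(n):
    """Window with w[k]**2 + w[k + n/2]**2 == 1 (assumed choice)."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]

def analyse_synthesise(signal, n):
    """Window, transform, invert, window again and overlap-add at 50%."""
    hop = n // 2
    w = sine_window(n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - n + 1, hop):
        block = [signal[start + k] * w[k] for k in range(n)]
        rec = imdct(mdct(block))
        for k in range(n):
            out[start + k] += rec[k] * w[k]
    return out

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analyse_synthesise(signal, 8)
# The first and last half-blocks have no overlapping partner, so compare the interior.
print(max(abs(rebuilt[i] - signal[i]) for i in range(4, 28)))
```

Each block on its own comes back with time-domain aliasing; it is the addition of the overlapping halves of neighbouring blocks that cancels it.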
Discard frequency bands that are less essential to the HAS
Number of data samples: N
x(n) = [x(0), x(1), ....., x(N-1)]
Decompose x(n) into N MDCT coefficients
Select the coefficients to which the HAS is sensitive and discard the rest (e.g. select K bands where K < N)
Disadvantage: noticeable distortion in the discarded bands
A better approach: assign a different quantization step-size to each coefficient according to its tolerance to quantization noise, based on the HAS
Number of data samples: N
x(n) = [x(0), x(1), ....., x(N-1)]
Decompose x(n) into N MDCT coefficients
Quantize each coefficient so that the noise is below the masking threshold (1 bit = 6 dB)
$\sum_{j=0}^{N-1} q_j$, where $q_j \ge 1$
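The "1 bit = 6 dB" rule gives a simple way to size each quantizer. The sketch below is a hypothetical illustration: the function `bits_for_band`, the band levels and the masking thresholds are all made-up values, not figures from the slides.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb: each bit lowers quantization noise by about 6 dB

def bits_for_band(signal_db, mask_db):
    """Smallest bit count that keeps quantization noise at or below the mask.

    The noise sits roughly DB_PER_BIT * bits below the signal level, so the
    signal-to-mask ratio decides how many bits are needed.
    """
    signal_to_mask = signal_db - mask_db
    return max(0, math.ceil(signal_to_mask / DB_PER_BIT))

# Hypothetical (signal level, masking threshold) pairs in dB for three bands.
bands = [(60.0, 30.0), (40.0, 45.0), (55.0, 20.0)]
allocation = [bits_for_band(s, m) for s, m in bands]
print(allocation, sum(allocation))
```

A band whose signal already lies below the masking threshold needs no bits in this simplified model, which is where the saving in bit-rate comes from.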
Source bit-rate: 1.4 Mb/s
Target bit-rate: 292 kb/s
A total of 52 critical bands
20 bands for lower frequencies
16 bands for middle frequencies
16 bands for higher frequencies
Number of time windows: 8
Smallest time window: 1.45 ms
Longest time window: 11.60 ms
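These figures can be related to each other with a little arithmetic. The sketch assumes a 44.1 kHz sampling rate, as in the CD example earlier in the deck; the constant and function names are invented for illustration.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: CD-rate source
SOURCE_BPS = 1_411_200    # 44,100 x 16 x 2
TARGET_BPS = 292_000

def window_samples(duration_ms):
    """Nearest whole number of samples in a window of the given duration."""
    return round(duration_ms * SAMPLE_RATE_HZ / 1000)

compression_ratio = SOURCE_BPS / TARGET_BPS
shortest = window_samples(1.45)
longest = window_samples(11.60)
print(round(compression_ratio, 2), shortest, longest, longest // shortest)
```

Under that assumption the window lengths come out as powers of two, and the longest window holds eight of the shortest.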
The different noise masking responses of the bands can be used to quantize the frequency components adaptively.
[Figure: level in dB against Bark (5 to 20), marking regions where coarse quantization is allowed and where fine quantization is required]
No inter-band masking
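[Block diagram: 1.4 Mbps input → Frequency Range Analyser, which splits the signal into H (11-22k), M (5.5-11k) and L (0.5-5.5k) ranges → one MDCT per range, with a Block size decision → Bit Allocation → 292 kbps output]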