CS 591 S1 – Computational Audio
Wayne Snyder
Computer Science Department
Boston University
Lecture 16
Convolution Reverb Concluded
Computing Spectrograms
Time and Pitch Shifting by Interpolation
Lecture 17
Phase Vocoder
Unlocking time and pitch using the phase vocoder
Digital Audio Fundamentals: The Discrete Fourier Transform
Manipulation of spectra
The spectrum produced by the FFT gives the absolute amplitudes relative to the WAV file:
X = readWaveFile("Bach.Brandenburg.2.3.wav")  # course helper: returns the samples
S = spectrumFFT( X )                          # course helper: list of (freq, amp, phase) triples
(F, A, phi) = zip(*S)
plt.plot(F, A)
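readWaveFile and spectrumFFT are course-specific helpers, not library functions. As an illustration only (the exact scaling conventions are an assumption), spectrumFFT might be sketched like this:

```python
import numpy as np

# Hypothetical sketch of spectrumFFT: return (frequency, amplitude, phase)
# triples for a real signal X sampled at fs Hz.
def spectrumFFT(X, fs=44100):
    C = np.fft.rfft(X)
    F = np.fft.rfftfreq(len(X), d=1 / fs)   # bin center frequencies in Hz
    A = np.abs(C) * 2 / len(X)              # amplitudes on the signal's own scale
    phi = np.angle(C)                       # phases in radians
    return list(zip(F, A, phi))

# A pure 440 Hz tone peaks at the 440 Hz bin with amplitude ~1:
X = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
(F, A, phi) = zip(*spectrumFFT(X))
print(F[int(np.argmax(A))])   # 440.0
```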
Manipulation of spectra: logarithmic scales
Often it is more useful to view this on a logarithmic scale on the frequency axis; this
corresponds to human perception of pitch (e.g., the piano key frequencies are log scale):
S = spectrumFFT( readWaveFile("Bach.Brandenburg.2.3.wav") )
(F, A, phi) = zip(*S)
ax = plt.axes()
ax.set_xscale('log')
plt.plot(F, A)
Manipulation of spectra: logarithmic amplitude scale
You can also view the spectrum with a log scale on the amplitude axis; this does not correspond to
the scale used for WAV files, but it DOES correspond to human perception of loudness:
S = spectrumFFT( readWaveFile("Bach.Brandenburg.2.3.wav") )
(F, A, phi) = zip(*S)
ax = plt.axes()
ax.set_xscale('log')
ax.set_yscale('log')
plt.plot(F, A)
Human perception of loudness is complicated, because it depends on:
- Frequency
- Duration
- Power of the sound (not the amplitude)

The human ear averages the amplitude (sound pressure level) over a 0.6–1.0 sec interval;
short sounds are perceived as less loud than long sounds, up to 1.0 sec.

Sound Intensity and Power:
- Intensity is proportional to amplitude squared
- Power is proportional to amplitude squared

Punchline: spectra are often presented as squared amplitudes to give a sense of human
perception. This is necessary only when perceptual factors are important (not so far…).
Here is the same spectrum with squared amplitudes:
Squaring hardly affects the log-scale frequency view at all (since log(A²) = 2 log(A)).
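The reason is a one-line identity: on a log amplitude axis, squaring just doubles every value, since log(A²) = 2 log(A). A quick numerical check:

```python
import math

# log(A**2) = 2*log(A): squaring amplitudes only rescales a log-amplitude axis,
# so the shape of the plot is unchanged.
for A in (0.5, 1.0, 2.0, 10.0):
    assert math.isclose(math.log(A ** 2), 2 * math.log(A))
```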
Conclusion:
From now on, we will usually view the spectrum on a log scale on both axes, with
squared amplitudes, since this corresponds more closely to human perception; but
remember that when constructing a signal from a spectrum, you can NOT use squared
amplitudes. Also keep in mind that log scales are only for viewing convenience, and do not
change the data.
Spectrogram:
A spectrogram is a 2D array of spectral data over time; hence it is a 3D object over
frequency, amplitude, and time. [Figure: successive W-sample windows along the time axis,
each producing one spectrum along the frequency axis.]
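The definition above can be sketched directly: slide a W-sample window across the signal and take the amplitude spectrum of each window (a minimal numpy sketch; real spectrogram code typically overlaps and tapers the windows):

```python
import numpy as np

def spectrogram(X, W=2048):
    # One column of |rfft| amplitudes per non-overlapping W-sample window:
    # rows = frequency bins, columns = time (window index).
    cols = [np.abs(np.fft.rfft(X[k * W:(k + 1) * W])) for k in range(len(X) // W)]
    return np.array(cols).T   # shape: (W // 2 + 1, number of windows)

# 1 second of a 440 Hz test tone at 44100 Hz:
X = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
SG = spectrogram(X)
print(SG.shape)   # (1025, 21)
```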
Spectrogram:
Viewing 2D data can be done using faux-3D plots:
Spectrogram:
But it is more commonly done using “heat maps”, where the amplitude is indicated by greyscale or
color:
Spectrogram:
In matplotlib, there is a library function to create spectrograms from a signal directly:

(spectrum, freqs, t, im) = plt.specgram(X, NFFT=2048, Fs=44100, noverlap=0)

Here spectrum is the 2D matrix of the spectrogram amplitudes (one column per window),
freqs and t are the frequency and time axis values, and im is the plotted image.
Spectrogram:
By adding a few bells and whistles, we can get log scale and proper axis measurements:
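For example (a sketch, using a stand-in test tone rather than the Bach excerpt): plt.specgram plots amplitudes in dB by default, so adding a log frequency axis and labeled axes gives the perceptual view:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # draw off-screen for this sketch
import matplotlib.pyplot as plt

fs = 44100
X = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # stand-in test tone
(spectrum, freqs, t, im) = plt.specgram(X, NFFT=2048, Fs=fs, noverlap=0)
plt.yscale('log')              # log scale on the frequency axis
plt.ylim(20, fs / 2)           # audible range, up to the Nyquist frequency
plt.xlabel('Time (sec)')
plt.ylabel('Frequency (Hz)')
```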
Time and Pitch Scale Modification
Review of last time:
A signal can be stretched or contracted in the time domain using various kinds of interpolation
techniques:
Polynomial Interpolation using the interpolation library from scipy:
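As a whole-signal sketch of the idea (the slides apply it window by window; X here is a stand-in test tone, and the interpolant is scipy's interp1d):

```python
import numpy as np
from scipy.interpolate import interp1d

def stretch(X, P, kind='cubic'):
    # Build an interpolation function over the sample indices, then sample it
    # at len(X)*P evenly spaced points: the result is P times as long.
    f = interp1d(np.arange(len(X)), X, kind=kind)
    return f(np.linspace(0, len(X) - 1, int(len(X) * P)))

X = np.sin(2 * np.pi * 440 * np.arange(4410) / 44100)   # 0.1 sec of A440
Y = stretch(X, 1.7)                                     # 170% as long
print(len(Y))   # 7497
```

Played back at the original rate, the stretched signal also drops in pitch; with plain interpolation, time and pitch are locked together.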
To change the time/pitch scale of a signal, you would need to slide a window across the signal,
calculate the interpolation function f in each window, and apply it; one messy detail is that the
interpolation function f needs “extra” values on each side of the output range [A, B], so it is
built over a slightly wider window [lo, hi] of X:

f = interp1d(range(lo, hi), X[lo:hi])
for k in range(A, B):
    Y[k] = f(k / P)   # P is the stretch factor; maps output index k to an input position

This works reasonably well! [BachStretchA.wav 170% cubic]
It took ~18 seconds to process a 4.14-second signal.
Listening Examples:
- BachStretchAlinear.wav (170%, linear)
- BachStretchedA2.wav (580%, cubic)
- BachStretchedA2linear.wav (580%, linear)
- BachStretchedAAudition.wav (580%)
Fourier Interpolation:
Another approach is to take the Fourier Transform of the entire signal:

S = spectrumFFT( X )   # version of the FFT which returns a list of (freq, amp, phase) triples

This is a representation of the entire signal in the frequency domain. Now we simply use
the spectrum S to create a new signal at any sample rate we wish:

Y = [0] * int(len(X) * P)   # P is the expansion factor: 1.7 = 170%
for i in range(len(Y)):
    Y[i] = signal( S, i, 44100 * P )

Two for loops! WAY inefficient: one hour to process a 4.14-second signal!
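The helper signal(S, i, sr) is course-specific; as an illustration only (an assumption about its exact conventions), it might compute sample i at rate sr as a sum of sinusoids over the spectrum's triples:

```python
import math

# Hypothetical sketch of the course's signal() helper: sample i, at sample
# rate sr, of the signal whose spectrum is S = [(freq, amp, phase), ...].
# One inner loop over S per output sample -- hence the "two for loops" cost.
def signal(S, i, sr):
    return sum(A * math.cos(2 * math.pi * F * i / sr + phi) for (F, A, phi) in S)

S = [(440.0, 1.0, -math.pi / 2)]   # one 440 Hz component, phase -pi/2 (a sine)
print(signal(S, 1, 1760))          # sr = 4*440, so sample 1 is sin(pi/2) = 1.0
```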
Fourier Interpolation:
Of course there is a better way to do this, using the Inverse FFT to do the resynthesis, which is
the basis for the resample function from the scipy.signal library. Note that resample's second
argument is not the new sample rate, but the new total number of samples.
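A minimal sketch of the resample approach (X here is a stand-in test tone, not the Bach excerpt):

```python
import numpy as np
from scipy.signal import resample

# Fourier interpolation via scipy.signal.resample: the signal is transformed
# with the FFT and resynthesized with the inverse FFT at a new length. The
# second argument is the new TOTAL NUMBER OF SAMPLES, not a sample rate.
P = 1.7                                              # expansion factor: 170%
fs = 44100
X = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)     # 1 second of A440
Y = resample(X, int(len(X) * P))                     # 1.7x as many samples
print(len(Y))                                        # 74970
```

Played back at the original 44100 Hz, Y lasts 1.7 times longer and sounds correspondingly lower in pitch; decoupling the two is the job of the phase vocoder in Lecture 17.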