CS 591 S1 – Computational Audio
Wayne Snyder
Computer Science Department
Boston University

Lecture 16
    Convolution Reverb Concluded
    Computing Spectrograms
    Time and Pitch Shifting by Interpolation

Lecture 17
    Phase Vocoder
    Unlocking time and pitch using the phase vocoder

Digital Audio Fundamentals: The Discrete Fourier Transform

Manipulation of spectra

The spectrum produced by the FFT gives the absolute amplitudes relative to the WAV file:

    X = readWaveFile("Bach.Brandenburg.2.3.wav")
    S = spectrumFFT( X )
    (F,A,phi) = zip(*S)
    plt.plot(F,A)

Manipulation of spectra: logarithmic scales

Often it is more useful to view this on a logarithmic scale on the frequency axis; this corresponds to human perception of pitch (e.g., the piano key frequencies are log scale):

    S = spectrumFFT( readWaveFile("Bach.Brandenburg.2.3.wav") )
    (F,A,phi) = zip(*S)
    ax = plt.axes()
    ax.set_xscale('log')
    plt.plot(F,A)

Manipulation of spectra: squaring the amplitudes

You can also view the spectrum with a log scale on the amplitude; this does not correspond to the scale used for WAV files, but it DOES correspond to human perception of loudness:

    S = spectrumFFT( readWaveFile("Bach.Brandenburg.2.3.wav") )
    (F,A,phi) = zip(*S)
    ax = plt.axes()
    ax.set_xscale('log')
    ax.set_yscale('log')
    plt.plot(F,A)

Human perception of loudness is complicated, because it depends on:

    Frequency
    Duration
    Power of the sound (not the amplitude)

Frequency: [figure not recovered]

Duration: The human ear averages the amplitude (sound pressure level) over a 0.6 – 1.0 sec interval. Short sounds are perceived as less loud than long sounds up to 1.0 sec.
Sound Intensity and Power

    Intensity = amplitude squared
    Power is proportional to amplitude squared

Punchline: Spectra are often presented as squared amplitudes to give a sense for human perception. This is necessary only when perceptual factors are important (not so far...).

Here is the same spectrum with squared amplitudes:

[Figure: the spectrum with squared amplitudes]

Squaring hardly affects the log-scale view at all, since log(A^2) = 2 log(A): on a log amplitude axis the curve keeps its shape and is only stretched vertically by a factor of 2.

Conclusion: From now on, we will usually view the spectrum in log scale on both axes, with squared amplitudes, since this corresponds to human perception more closely; but remember that when constructing a signal from a spectrum, you can NOT use squared amplitudes. Also keep in mind that log scales are only for viewing convenience, and do not change the data.
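The conclusion above can be tried out with a small sketch. It does not use the course helpers readWaveFile and spectrumFFT; instead it assumes NumPy's real FFT and a synthetic two-tone test signal, and the function names amplitude_spectrum and plot_power_spectrum are made up for this illustration. Plotting is kept in a separate function so the numeric part runs without matplotlib.

```python
import numpy as np

def amplitude_spectrum(x, sample_rate):
    """Return (freqs, amps) for a real signal, using NumPy's real FFT.

    Illustrative stand-in for the course's spectrumFFT (an assumption,
    not the course code). Phases are not returned here.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    amps = 2.0 * np.abs(spectrum) / n   # a sine of amplitude a shows up as a
    amps[0] /= 2.0                      # the DC bin is not doubled
    if n % 2 == 0:
        amps[-1] /= 2.0                 # nor is the Nyquist bin when n is even
    return freqs, amps

def plot_power_spectrum(freqs, amps):
    """Plot squared amplitudes with log scales on both axes (viewing only)."""
    import matplotlib.pyplot as plt
    ax = plt.axes()
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.plot(freqs[1:], amps[1:] ** 2)   # skip 0 Hz, which has no log position
    ax.set_xlabel('Frequency (Hz)')
    ax.set_ylabel('Amplitude squared')
    plt.show()

# Synthetic test signal: one second of 440 Hz plus a quieter 880 Hz tone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.25 * np.sin(2 * np.pi * 880 * t)

freqs, amps = amplitude_spectrum(x, sample_rate)
power = amps ** 2                       # for viewing only
peak_bin = int(np.argmax(amps))         # strongest component

# To rebuild a signal, use the unsquared amplitudes (and the phases);
# the squared values are only a display convenience.
```

Calling plot_power_spectrum(freqs, amps) shows the log-log view; the arrays freqs and amps themselves are unchanged by the choice of axis scale.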
Spectrogram

A spectrogram is a 2D array of spectral data over time; hence it is a 3D object over frequency, amplitude, and time.

[Figure: the signal is cut into windows of W samples along the time axis; each window contributes one frequency spectrum]

Viewing 2D data can be done using faux-3D plots, but is more commonly done by "heat maps" where the amplitude is indicated by greyscale or color.

[Figures: faux-3D plot and heat-map views of a spectrogram]

In matplotlib, we have a library function to create spectrograms from a signal directly:

    (spectrum, freqs, t, im) = plt.specgram(X, NFFT=2048, Fs=44100, noverlap=0)

Here spectrum is the 2D matrix of the spectrogram values, one column per window. (With the default mode these are power spectral density values; pass mode='magnitude' to get amplitudes.)
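To make the "2D array of spectra over time" picture concrete, here is a minimal sketch that builds a spectrogram by hand with NumPy. It is not what plt.specgram does internally (specgram applies a Hanning window and returns power spectral density by default); the function names spectrogram and plot_spectrogram, the rectangular non-overlapping windows, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def spectrogram(x, window_size, sample_rate):
    """Magnitude spectrogram from non-overlapping windows of W samples.

    Returns (mags, freqs, times); mags has one row per frequency bin and
    one column per window. Leftover samples at the end are dropped.
    """
    x = np.asarray(x, dtype=float)
    num_windows = len(x) // window_size
    frames = x[:num_windows * window_size].reshape(num_windows, window_size)
    mags = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = (np.arange(num_windows) + 0.5) * window_size / sample_rate
    return mags, freqs, times

def plot_spectrogram(mags, freqs, times):
    """Heat-map view with a log frequency axis (viewing only)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    # Skip the 0 Hz row, which cannot be placed on a log axis.
    mesh = ax.pcolormesh(times, freqs[1:], mags[1:], shading='auto')
    ax.set_yscale('log')
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Frequency (Hz)')
    fig.colorbar(mesh, ax=ax, label='Magnitude')
    plt.show()

# Synthetic test signal: 440 Hz for half a second, then 1760 Hz.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
x = np.where(t < 0.5,
             np.sin(2 * np.pi * 440 * t),
             np.sin(2 * np.pi * 1760 * t))

mags, freqs, times = spectrogram(x, 2048, sample_rate)
```

With W = 2048 the frequency bins are about 21.5 Hz apart, so the strongest bin in each column lands near, not exactly on, the tone's frequency. Larger windows give finer frequency resolution at the cost of coarser time resolution.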
Spectrogram

By adding a few bells and whistles, we can get log scale and proper axis measurements:

[Figure: spectrogram with log frequency scale and labeled axes]

Time and Pitch Scale Modification

Review of last time: A signal can be stretched or contracted in the time domain using various kinds of interpolation techniques.

Polynomial interpolation using the interpolation library from scipy:

[Code and plots not recovered]

To change the time/pitch scale of a signal, you would need to slide a window across the signal, calculate the interpolation function f, and apply it in each window. One messy detail is that the interpolation function f needs "extra" values on each side of the window in order to properly calculate the interpolated values:

[Figure: signal X with the window from A to B inside the wider range from lo to hi]

    # P is the expansion factor (1.7 = 170%); lo < A and B < hi supply the extra values
    f = interp1d(range(lo, hi), X[lo:hi])
    for k in range(int(A * P), int(B * P)):   # output samples covering X[A:B]
        Y[k] = f(k / P)                        # read X at fractional position k / P

This works reasonably well! [BachStretchA.wav 170% cubic] Took ~18 seconds to do a 4.14 second signal.

Listening Examples:

    BachStretchAlinear.wav           170% linear
    BachStretchedA2.wav              580% cubic
    BachStretchedA2linear.wav        580% linear
    BachStretchedAAudition.wav       580%
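The sliding-window procedure described above might be sketched as follows. This is one possible reading of the slide, not the code used to produce the listening examples: the function name stretch_by_interpolation, the window and pad sizes, and the synthetic test tone are assumptions. It uses interp1d from scipy.interpolate, as the slides do.

```python
import numpy as np
from scipy.interpolate import interp1d

def stretch_by_interpolation(x, factor, window=1024, pad=4, kind='cubic'):
    """Resample x to int(len(x) * factor) samples, one window at a time.

    Each window [a, b) of x is interpolated from the wider range [lo, hi),
    so the interpolant has extra values on both sides of the window.
    """
    x = np.asarray(x, dtype=float)
    n_out = int(len(x) * factor)
    positions = np.arange(n_out) / factor   # where each output sample reads x
    y = np.empty(n_out)
    for a in range(0, len(x), window):
        b = min(a + window, len(x))
        lo = max(a - pad, 0)
        hi = min(b + pad, len(x))
        f = interp1d(np.arange(lo, hi), x[lo:hi], kind=kind)
        # Output samples whose source position falls inside this window.
        mask = (positions >= a) & (positions < b)
        # Clip positions past the last sample so f is never asked to extrapolate.
        y[mask] = f(np.minimum(positions[mask], hi - 1))
    return y

# Synthetic test signal: 0.1 s of a 440 Hz tone, stretched to 170%.
sample_rate = 44100
factor = 1.7
t = np.arange(sample_rate // 10) / sample_rate
x = np.sin(2 * np.pi * 440 * t)
y = stretch_by_interpolation(x, factor)
```

Played back at the original sample rate, y lasts 1.7 times as long and sounds correspondingly lower in pitch, since time and pitch are still locked together with this method. Computing the mask over all positions for every window is simple but wasteful; a faster version would compute the output index range for each window directly.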
Fourier Interpolation

Another approach is to take the Fourier Transform of the entire signal:

    S = spectrumFFT( X )     # version of fft which returns list of triples

This is a representation of the entire signal in the frequency domain. Now we simply use the spectrum S to create a new signal at any sample rate we wish:

    Y = [0] * int(len(X) * P)     # P is expansion factor: 1.7 = 170%
    for i in range(len(Y)):
        Y[i] = signal( S, i, 44100 * P )

Two for loops! WAY inefficient! One hour to process a 4.14 second signal!

Of course there is a better way to do this using the Inverse FFT to do the resynthesis, which is the basis for the resample function from the scipy.signal library:

[Code and plots not recovered]

Note that the second argument to resample is not the new sample rate, but the new total number of samples.
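Since the slide's own resample code was not recovered, here is a minimal sketch of how scipy.signal.resample could be used for the same 170% stretch. The synthetic test tone and the variable names are assumptions; only the call resample(x, num), where num is the new number of samples, comes from SciPy itself.

```python
import numpy as np
from scipy.signal import resample

sample_rate = 44100
P = 1.7                                  # expansion factor: 1.7 = 170%

# Synthetic test signal: 0.1 s of a 440 Hz tone (exactly 44 cycles).
t = np.arange(sample_rate // 10) / sample_rate
X = np.sin(2 * np.pi * 440 * t)

# The second argument is the new total number of samples,
# not the new sample rate.
num_samples = int(len(X) * P)
Y = resample(X, num_samples)

# Treating Y as audio at the original sample rate, find its strongest
# frequency component.
freqs = np.fft.rfftfreq(len(Y), d=1.0 / sample_rate)
peak_freq = freqs[np.argmax(np.abs(np.fft.rfft(Y)))]
```

Here peak_freq comes out close to 440 / P, which shows that the stretched signal is also lower in pitch when played at the original rate. Because resample uses a Fourier method, it treats the signal as periodic; the test tone was chosen to contain a whole number of cycles so that this assumption holds, and real recordings may show artifacts at the ends.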