Pitch Recognition with Wavelets
1.130 Final Presentation
by Stephen Geiger

What is pitch recognition?
Well, what is pitch? . . .
- How HIGH or LOW a sound is
- Which note?
- Perceived frequency

Relationship Between Pitch and Frequency
Pitch corresponds to fundamental frequency.

For Example:
For Middle C: Frequency = 262 Hz
MATLAB CODE:
    fs = 22050;                  % Sampling frequency.
    f  = 262;                    % Fundamental freq of middle C.
    t  = 0:1/fs:1;               % Time range of 0 to 1 seconds.
    sound(cos(2*pi*f*t)/2, fs);  % Make some noise!

For an A Scale:
    A  = 220*2^(0/12)  = 220 Hz
    A# = 220*2^(1/12)  = 233 Hz
    B  = 220*2^(2/12)  = 247 Hz
    C  = 220*2^(3/12)  = 262 Hz
    C# = 220*2^(4/12)  = 277 Hz
    D  = 220*2^(5/12)  = 294 Hz
    D# = 220*2^(6/12)  = 311 Hz
    E  = 220*2^(7/12)  = 330 Hz
    F  = 220*2^(8/12)  = 349 Hz
    F# = 220*2^(9/12)  = 370 Hz
    G  = 220*2^(10/12) = 392 Hz
    G# = 220*2^(11/12) = 415 Hz
    A  = 220*2^(12/12) = 440 Hz

An Octave Up:
For C5: Frequency = 524 Hz
MATLAB CODE:
    fs = 22050;                  % Sampling frequency.
    f  = 524;                    % Fundamental freq of C5.
    t  = 0:1/fs:1;               % Time range of 0 to 1 seconds.
    sound(cos(2*pi*f*t)/2, fs);  % Make some noise!

A Sum with 2 Frequencies:
Frequency = 262 Hz and Frequency = 524 Hz
MATLAB CODE:
    fs = 22050;                  % Sampling frequency.
    f1 = 262;                    % Fundamental freq of middle C.
    f2 = 524;                    % Fundamental freq of C5.
    t  = 0:1/fs:1;               % Time range of 0 to 1 seconds.
    sound((cos(2*pi*f1*t) + ...
           0.25*cos(2*pi*f2*t))/2, fs);

Freq in a Piano - Middle C
[Figure: spectrum plot; x-axis: Frequency, Hz]

FFT of an Oboe Middle C
[Figure: spectrum plot; x-axis: Frequency, Hz]

Mono vs. Poly
- Monophonic: one note at a time (e.g. trumpet)
- Polyphonic: multiple notes at a time (e.g. piano, orchestra)
  Creates a problem for pitch recognition (especially octaves!).

Some Existing Methods
- Time Domain: pitch period estimation
    - With wavelets.
    - With auto-correlation function.
- Freq. Domain: find the fundamental
- Auditory Scene Analysis
- Blackboard Systems
- Neural Networks
- Perceptual Models

What applications are there?
- Transcription of Music
- Modeling of Musical Instruments
- Speech Analysis
- Besides, it's an interesting problem

My Work . . .
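As a point of reference for the auto-correlation method listed under Some Existing Methods, here is a minimal sketch of a time-domain pitch-period estimate applied to the two-frequency test tone from the slides. It is not taken from the presentation: the frame length `N` and the search range `fmin`/`fmax` are assumed values, and the auto-correlation is computed with a plain loop so that no toolbox is needed.

```matlab
% Sketch only: auto-correlation pitch estimate on the 262 Hz + 524 Hz tone.
fs = 22050;                  % Sampling frequency (from the slides).
f1 = 262;  f2 = 524;         % Middle C and C5 (from the slides).
N  = 4096;                   % Frame length in samples (assumed).
t  = (0:N-1)/fs;
x  = (cos(2*pi*f1*t) + 0.25*cos(2*pi*f2*t))/2;

fmin = 100;  fmax = 1000;    % Pitch search range in Hz (assumed).
lags = floor(fs/fmax):ceil(fs/fmin);

r = zeros(size(lags));
for k = 1:numel(lags)
    L = lags(k);
    r(k) = sum(x(1:N-L) .* x(1+L:N));   % Auto-correlation at lag L.
end

% The lag with the largest auto-correlation is taken as the pitch period.
[~, kmax] = max(r);
f_est = fs / lags(kmax);
fprintf('Estimated fundamental: %.1f Hz\n', f_est);
```

Because the lag is an integer number of samples, the estimate can only land on frequencies of the form fs/lag, so it should come out within a few Hz of 262 rather than exactly on it. The 524 Hz partial does not pull the estimate up an octave here, since half a period of the fundamental gives a strongly negative correlation.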
A Novel Wavelet Approach
Based on an observation made by Jeremy Todd, that:
For a piano playing these notes, a CWT could be used to identify a ‘G’ with certain scale/wavelet combinations.
Even with some polyphony!

Finding a G in a C Scale
[Figure: two panels, "Original Signal" and "CWT @ Specific “Scale”"]

The Continuous Wavelet Transform
Definition of a CWT:
\[ C_{a,b} = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) dt \]
Where:
    a    = scaling factor
    b    = shift factor
    f(t) = function we start with
    ψ(t) = mother wavelet

What is Scale?
- LOW SCALE: compressed wavelet, lots of detail, high frequency (You are here)
- HIGH SCALE: stretched wavelet, coarse features, low frequency (And here)

Gaussian 2nd Order Wavelet
[Figure: plot of the wavelet]

Initial Work
- Took an empirical approach.
- Ran a number of CWTs at varying scale, and looked at the results.
- Picked out a CWT scale for each note in the C scale.

Finding Notes in a C Scale
[Figure: panels labelled "Original" and "Scale: 594, 530, 472, 446, 394, 722, 642, 606"]

Finding Notes w/ Polyphony
[Figure: panels labelled "Original" and "Scale: 594, 530, 472, 446, 394, 722, 642, 606"]

More Complex Polyphony
[Figure: panels labelled "Original" and "Scale: 594, 530, 472, 446, 394, 722, 642, 606"]

Testing with different timbre
[Figure: panels labelled "Original" and "Scale: 594, 530, 472, 446, 394, 722, 642, 606"]

Why does this work?
- The scale parameter in the CWT affects frequency response.
- However, our “scales” that work don’t seem to follow a clear pattern.

Training Algorithm
- Again, took an empirical approach.
- Ran CWTs at varying scales, on sample files containing one note.
- Picked out scales where the maximum of the CWT for one note >> the maximum for the other notes (and collected results).

Results of Training Algorithm ...

Longer C Scale - Trained on 3 Octaves of Notes

A Fragment by Chopin*
*From Right Hand of Prelude in C, Op. 28 No. 1

Training on a ‘Real’ Guitar
- Only able to find 5 of 8 pitches for the C scale training case (with limited attempt).
- Results on a test file were not completely accurate.
- Expected to be a more difficult case than a piano.
- Could merit a more thorough try.

Entire 88 Keys on a Piano
- Work in progress.
- It takes a long time to run many CWTs on 88 different sound files.
- Initial results: able to identify notes 70-88.

Frequency Response Revisited
[Figure: Frequency Response of a 2nd Order Gaussian Wavelet]

Resulting Scales for 22 Piano Notes
[Figure: SCALE (0 to 2500) versus NOTE NUMBER (1 to 22)]

Resulting Scales for 8 Sinusoidal Notes
[Figure: SCALE (0 to 14000) versus NOTE NUMBER (1 to 8)]

Conclusions
- The novel wavelet approach isn’t perfect.
    - Requiring “training” is a handicap.
    - Most likely not suited to sources with varying timbre (e.g. guitar, voice).
- Some interesting results.
- The mechanism of detection could be further investigated and better understood.
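To make the CWT definition and the rule from the Training Algorithm slide concrete, the following is a minimal sketch, not the code behind the presentation. It assumes a hand-written wavelet `mother(u) = (1 - 2u^2) exp(-u^2)`, which is proportional to the second derivative of a Gaussian but is not guaranteed to match the wavelet or normalisation used in the slides. The test notes, the frame length, and the candidate `scales` range are likewise assumptions; pure cosines stand in for the recorded single-note files.

```matlab
% Sketch only: CWT at fixed scales by direct convolution, followed by
% "pick the scale where one note's maximum dominates the others".
fs     = 22050;                       % Sampling frequency (from the slides).
notes  = 220 * 2.^([3 5 7]/12);       % C, D, E from the A-scale table, in Hz.
t      = (0:4095)/fs;                 % Short frame (assumed length).
scales = 5:60;                        % Candidate scales in samples (assumed).

% Assumed mother wavelet, proportional to the 2nd derivative of exp(-u^2).
mother = @(u) (1 - 2*u.^2) .* exp(-u.^2);

peak = zeros(numel(notes), numel(scales));    % max |C(a,b)| over b
for n = 1:numel(notes)
    x = cos(2*pi*notes(n)*t);                 % One "sample file" per note.
    for s = 1:numel(scales)
        a = scales(s);
        k = -ceil(5*a):ceil(5*a);             % Support of +/- 5a samples.
        w = mother(k/a) / sqrt(a);            % (1/sqrt(a)) * psi((t-b)/a).
        % The wavelet is even, so convolution equals the CWT correlation.
        c = conv(x, w, 'valid');              % One coefficient per shift b.
        peak(n, s) = max(abs(c));
    end
end

% For each note, keep the scale with the largest ratio of its own peak
% to the strongest competing note's peak.
best = zeros(1, numel(notes));
for n = 1:numel(notes)
    others = max(peak(setdiff(1:numel(notes), n), :), [], 1);
    [ratio, idx] = max(peak(n, :) ./ others);
    best(n) = scales(idx);
    fprintf('%.0f Hz: scale %d, ratio %.2f\n', notes(n), best(n), ratio);
end
```

With ideal sinusoids and this smooth wavelet, the chosen scale falls as the note frequency rises, and the ratio for a note that has neighbours on both sides stays close to 1. That is well short of the ">>" separation the slides report, and this sketch is not expected to reproduce the scales shown in the figures.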