International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) - 2016

Raaga Identification using Clustering Algorithm

Krishna Pawan Kumar
Department of Electronics and Communication Engineering, NMAM Institute of Technology, Nitte (Autonomous College under VTU, Belagavi), Karkala, Udupi-574110
[email protected]

Sukesh Rao M.
Department of Electronics and Communication Engineering, NMAM Institute of Technology, Nitte (Autonomous College under VTU, Belagavi), Karkala, Udupi-574110
[email protected]

Abstract— Music information retrieval is an important emerging area in the field of speech signal processing. Any music rendition comprises a set of Swaras, which lie at different pitch frequencies, so pitch detection plays an important role in identifying the Raaga. The Pitch Detection Algorithm used here is based on the Subharmonic-to-Harmonic Ratio, and the frequencies of the different Swaras are calculated with it. The pitch frequencies of the Swaras obtained from this Pitch Detection Algorithm are compared against those from a standard music analysis tool. A Clustering Algorithm is then proposed to identify the nearest matching Swaras and thereby the Raaga. The Clustering Algorithm is used to test several Raagas; the results of these tests are tabulated, and the Raagas identified are found to be accurate.

Keywords—Pitch Detection Algorithm, Raaga Identification, Music Information Retrieval, Subharmonic-to-Harmonic Ratio, Clustering Algorithm

I. INTRODUCTION

Raagas are the melodic framework, or formalization of melodies, found in Indian classical music (Hindusthani and Carnatic). In Carnatic music there are 72 Melakartha Raagas, also called Janaka Raagas, and numerous Janya Raagas derived from them by taking various combinations of Swaras. Each Raaga therefore comprises a sequence of Swaras depicting a mood and sentiment.
Melakartha (Janaka) Raagas have all seven Swaras (Sa, Ri, Ga, Ma, Pa, Da, Ni) in sequential order. These Swaras, or notes, are called the Saptha Swaras (Saptha means seven in Sanskrit), and the seven notes form an octave. The Melakartha Raagas are formed by permutation and combination of the 16 theoretical Swaras used in the musical system. The sixteen Swaras, in order of increasing frequency for a given pitch, are:

1. Shadja (the fundamental frequency, which sets the pitch of the performer)
2. Shudhdha Rishabha
3. Chathushruthi Rishabha
4. Shatshruthi Rishabha
5. Shudhdha Gaandhaara
6. Saadhaarana Gaandhaara
7. Antara Gaandhaara
8. Shudhdha Madhyama
9. Prati Madhyama
10. Panchama
11. Shudhdha Daivatha
12. Chathushruthi Daivatha
13. Shatshruthi Daivatha
14. Shudhdha Nishada
15. Kaishiki Nishada
16. Kaakali Nishada

978-1-4673-9939-5/16/$31.00 ©2016 IEEE

Of these sixteen theoretical Swaras, the following pairs lie on the same frequency:

1. Chathushruthi Rishabha and Shudhdha Gaandhaara
2. Shatshruthi Rishabha and Saadhaarana Gaandhaara
3. Shudhdha Nishada and Chathushruthi Daivatha
4. Shatshruthi Daivatha and Kaishiki Nishada

Since each of these pairs lies on the same frequency, only one member of a pair can appear when concluding a Raaga: a Raaga having Chathushruthi Rishabha will not have Shudhdha Gaandhaara, and the same condition applies to the other three pairs. The 16 theoretical Swaras are therefore reduced to 12 practical Swaras:

1. Shadja
2. Shudhdha Rishabha
3. Chathushruthi Rishabha
4. Saadhaarana Gaandhaara
5. Antara Gaandhaara
6. Shudhdha Madhyama
7. Prati Madhyama
8. Panchama
9. Shudhdha Daivatha
10. Chathushruthi Daivatha
11. Kaishiki Nishada
12. Kaakali Nishada

These 12 Swaras are used to identify the Raagas with a suitable Clustering Algorithm. Some music scholars, however, speak of the 22-Swara system used during the Vedic era. These 22 Swaras are called the 22 pitches, or Dwavimshathi Shruthis. The 22 Shruthis incorporate the concept of Gamakas to differentiate between Raagas; to understand them, a novice has to learn music for several years.

Raagas are defined by a set of ascending and descending Swaras, called the Aarohana and Avarohana. Only in Melakartha Raagas does the Aarohana-Avarohana contain the seven notes in sequence. A student of music must acquire a good knowledge of the various Raagas and their Aarohana-Avarohana patterns, and can typically identify Raagas only after several years of dedicated practice. A computer, however, can identify Raagas using principles of digital signal processing with the aid of a suitable Clustering Algorithm.

II. LITERATURE SURVEY

Raaga identification is the process of listening to a piece of music, synthesizing it into a sequence of notes, analyzing that sequence, and comparing the result with the Aarohana-Avarohana of various Raagas to identify the Raaga being followed. Support Vector Machines have been used to perform this classification, with the resulting system found to be 88% accurate [1].

Pitch, i.e., fundamental frequency, is an important feature in speech research, and pitch determination remains one of the most challenging problems in speech analysis. The most common errors are pitch doubling and pitch halving due to the appearance of Alternate Pulse Cycles (APCs). Pitch detection using the Subharmonic-to-Harmonic Ratio (SHR) is a perception-oriented Pitch Detection Algorithm (PDA).
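The core SHR decision can be illustrated with a small self-contained sketch, written here in Python although the paper's implementation uses MATLAB. The harmonic count, the 0.2 threshold, and the single-frame peak-amplitude summation are simplifying assumptions for illustration; Sun's algorithm [2] works on a log-frequency spectrum with a more careful summation scheme.

```python
import numpy as np

def shr_octave_decision(frame, fs, f0, n_harm=3, threshold=0.2):
    """Toy SHR check: compare spectral amplitude summed at subharmonic
    locations (odd multiples of f0/2) against amplitude summed at the
    harmonics of f0. A large ratio suggests the perceived pitch is an
    octave below the candidate f0."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

    def amp_at(f):  # amplitude of the spectral bin nearest to f
        return spec[np.argmin(np.abs(freqs - f))]

    sh = sum(amp_at(k * f0) for k in range(1, n_harm + 1))
    ss = sum(amp_at((2 * k - 1) * f0 / 2) for k in range(1, n_harm + 1))
    return (f0 / 2 if ss / sh > threshold else f0), ss / sh

# A 200 Hz tone, clean vs. with a strong 100 Hz subharmonic added:
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
clean = np.sin(2 * np.pi * 200.0 * t)
with_sub = clean + 0.8 * np.sin(2 * np.pi * 100.0 * t)

print(shr_octave_decision(clean, fs, 200)[0])     # → 200
print(shr_octave_decision(with_sub, fs, 200)[0])  # → 100.0
```

On the synthetic example, the strong 100 Hz component pushes the ratio above the threshold, so the decision drops an octave, mimicking how a strong subharmonic lowers the perceived pitch by one octave.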
The SHR-based algorithm effectively reduces the gross error rate caused by subharmonics [2].

The identification of Raagas is cognitive and comes only after an adequate amount of exposure to classical music. One can identify a Raaga by finding its most prominent Swara, counting either the number of occurrences or the duration of each Swara. Gamakas, the variations of pitch around a note, can also be used to identify a Raaga, since only certain types of variation are allowed in each Raaga [3].

Pitch candidates or probabilities can be connected into pitch contours using dynamic programming or Hidden Markov Models (HMMs). Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs) have also been used to estimate the posterior probabilities of pitch states for pitch tracking in highly noisy speech; this supervised-learning-based approach produces strong pitch tracking results even in noisy conditions [4].

(Footnotes: Dwavimshathi means 22 in Sanskrit; Shruthi is the musical terminology for pitch; the Aarohana is the set of ascending Swaras, as in Sa Ri Ga Ma Pa Da Ni Sa' for Melakartha Raagas; the Avarohana is the set of descending Swaras, as in Sa' Ni Da Pa Ma Ga Ri Sa.)

III. IMPLEMENTATION

A pitch detector is an essential component in speech processing systems. Music is a highly structured system dictated by specific rules for time, beat, rhythm, pitch and harmony. A scale is a sequence of pitches, or notes, with a specific spacing in frequency; the smallest interval between two notes is a semitone. Fundamental frequency estimation is referred to as pitch detection. In a music rendition, the fundamental frequency is the frequency of the note Shadja, which is the lowest frequency among all the Swaras in the Aarohana-Avarohana of a Raaga. A number of methods have already been proposed for pitch detection of music signals.
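Several of the classical pitch detectors are variants of autocorrelation analysis. As a minimal illustration (in Python here; the paper's tooling is MATLAB, and the actual AUTOC method adds center clipping, omitted in this sketch), the fundamental can be estimated by locating the autocorrelation peak within the lag window allowed by the expected pitch range:

```python
import numpy as np

def autocorr_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate f0 by locating the autocorrelation peak within the lag
    range corresponding to [fmin, fmax] (defaults mirror the 50-500 Hz
    male-voice range used in the implementation)."""
    x = signal - np.mean(signal)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)            # valid lag window
    lag = lo + np.argmax(ac[lo:hi + 1])
    return fs / lag

# A 100 ms frame of a pure 200 Hz tone sampled at 8 kHz:
fs = 8000
t = np.arange(0, 0.1, 1.0 / fs)
tone = np.sin(2 * np.pi * 200.0 * t)
print(round(autocorr_pitch(tone, fs)))  # → 200
```

Restricting the search to the valid lag window is what keeps such estimators from picking lag 0 or a multiple of the true period.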
These include the modified autocorrelation method using clipping (AUTOC), cepstrum-based pitch detection (CEP), the simplified inverse filtering technique (SIFT), the data reduction method (DARD), and the parallel processing method (PPROC). Here, a Pitch Detection Algorithm based on the Subharmonic-to-Harmonic Ratio (SHR) is used to detect the frequencies of the Swaras rendered in a recording. The Subharmonic-to-Harmonic Ratio is the amplitude ratio between subharmonics and harmonics. When the ratio is small, the perceived pitch remains the same; as the ratio rises above a certain threshold, the subharmonics become clearly visible in the spectrum and the perceived pitch becomes one octave lower than the original pitch.

Praat, developed by Paul Boersma and David Weenink of the University of Amsterdam, is a free scientific software package for speech analysis and a very flexible analysis tool. Praat is used as the standard music analysis tool for this implementation: the results of the SHR-based Pitch Detection Algorithm are compared with those of Praat, and the settings that bring the inbuilt Pitch Detection Algorithm closest to the ideal are chosen for evaluation. shrp.m, a MATLAB® File Exchange function, is used for pitch detection of voice signals with and without APCs. This inbuilt Pitch Detection Algorithm takes the following parameters that are crucial for the analysis: input data, sampling frequency, frequency range (differing for male and female voices in the case of vocal signals), frame length, and time step. The input data is the music signal itself. The sampling frequency is extracted from the input music signal, and the same value is used for the analysis. The frequency ranges vary for male and female voices: the default range is 50 Hz to 500 Hz for a male voice and 120 Hz to 400 Hz for a female voice.

Frame length is the smallest time duration (in milliseconds) over which pitch estimation is done at once; this duration is converted into the number of samples subjected to analysis. The default frame length is 40 ms. Time step is the interval, in milliseconds, at which the short-term analysis is updated; its default value in the shrp.m function is 10 ms. By varying the frame length and time step while keeping the frequency ranges constant, the frequency values of the notes obtained are compared with those from the standard music analysis tool and tabulated.

Figure 1 shows the frequency plot for the Raaga "Maayamaalavagaula" using the Praat tool. The male voice signal provided as input is a .wav file stored in the database, with a sampling frequency of 8 kHz and a frequency range of 50 Hz to 500 Hz. By tabulating the frequency values obtained from the standard music analysis tool and from the inbuilt SHR-based Pitch Detection Algorithm, the frame length and time step that give near-ideal frequency values are determined. Both male and female audio voice clippings are used for the analysis.

Figure 1. Frequency plot for the Raaga "Maayamaalavagaula" using the Praat tool

Figure 2 shows the time and frequency domain plot for Raaga Maayamaalavagaula obtained with the inbuilt Pitch Detection Algorithm. The frequencies obtained here are compared with those from the standard music analysis tool and the frequency error is computed; the choice of frame length and time step depends on this frequency error.

Figure 2. Plot for Raaga "Maayamaalavagaula" using the inbuilt Pitch Detection Algorithm

Table I. Frequency values obtained for a frame length of 40 ms and a time step of 10 ms using the inbuilt pitch detection (male voice), compared with Praat.

Swara | Frequency (Hz), Praat | Frequency (Hz), SHR | Error (Hz) | % error
Sa    | 133    | 133.0614 | 0.0614 | 0.4616
Re1   | 134.8  | 137.4639 | 0.3361 | 0.2493
Ga3   | 164.1  | 164.5236 | 0.4236 | 0.2581
Ma1   | 175.7  | 174.4046 | 1.2954 | 0.7372
Pa    | 197    | 197.9144 | 0.9144 | 0.4620
Da1   | 205.1  | 203.3223 | 1.7777 | 0.866
Ni3   | 252.5  | 250.0679 | 2.4321 | 0.96

Table II gives the frequency values obtained for a frame length of 30 ms and a time step of 10 ms using the inbuilt Pitch Detection Algorithm compared with Praat, for a male voice input. When the frequency values are compared, the error percentage is found to be significant for Nishada under this condition.

Table II. Frequency values obtained for a frame length of 30 ms and a time step of 10 ms using the inbuilt pitch detection, compared with Praat.

Swara | Frequency (Hz), Praat | Frequency (Hz), SHR | Error (Hz) | % error
Sa    | 133    | 132.76   | 0.24   | 0.1805
Re1   | 134.8  | 136.0014 | 1.0089 | 0.7484
Ga3   | 164.1  | 164.2820 | 0.182  | 0.1109
Ma1   | 175.7  | 174.1835 | 1.5165 | 0.8631
Pa    | 197    | 196.9135 | 0.0865 | 0.439
Da1   | 205.1  | 206.8970 | 1.797  | 0.876
Ni3   | 252.5  | 265.42   | 12.92  | 5.11

Table III gives the frequency values obtained for a frame length of 40 ms and a time step of 10 ms using the inbuilt Pitch Detection Algorithm compared with Praat, for a female voice input.

Table III. Frequency values obtained for a frame length of 40 ms and a time step of 10 ms using the inbuilt pitch detection, compared with Praat.

Swara | Frequency (Hz), Praat | Frequency (Hz), SHR | Error (Hz) | % error
Sa    | 206.2  | 206.3572 | 0.1572 | 0.0762
Re1   | 210.8  | 210.3020 | 0.498  | 0.2362
Ga3   | 256.3  | 256.4283 | 0.1283 | 0.05
Ma1   | 272.8  | 272.9942 | 0.1942 | 0.0712
Pa    | 310.8  | 309.4255 | 1.3745 | 0.4422
Da1   | 323.8  | 316.6327 | 7.1673 | 2.213
Ni3   | 387.3  | 384.8597 | 2.4403 | 0.63

Table IV gives the frequency values obtained for a frame length of 30 ms and a time step of 10 ms using the inbuilt Pitch Detection Algorithm compared with Praat, for a female voice input.

Table IV. Frequency values obtained for a frame length of 30 ms and a time step of 10 ms using the inbuilt Pitch Detection Algorithm, compared with Praat.

Swara | Frequency (Hz), Praat | Frequency (Hz), SHR | Error (Hz) | % error
Sa    | 206.2  | 205.2972 | 0.9028 | 0.4378
Re1   | 210.8  | 210.0223 | 0.7777 | 0.3689
Ga3   | 256.3  | 255.2144 | 1.0856 | 0.4235
Ma1   | 272.8  | 272.6618 | 0.1382 | 0.051
Pa    | 310.8  | 311.4716 | 0.6716 | 0.2161
Da1   | 323.8  | 323.1433 | 1.0020 | 0.3095
Ni3   | 387.3  | 384.0283 | 3.2717 | 0.8447

From these tabulations, the frequency values obtained from the inbuilt Pitch Detection Algorithm match the standard Praat tool most closely when the frame length is 40 ms and the time step is 10 ms. Hence, for further analysis, these default values of frame length and time step are used to evaluate the frequencies of the Swaras and thereby identify the Raagas.

IV. CLUSTERING ALGORITHM

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). The Raagas are identified using a Clustering Algorithm; Figure 2 presents the flowchart of the Clustering Algorithm program. Initially, a music file is read using the audioread function of MATLAB®, which also provides the sampling rate of the recording. Using the standard Praat tool, the fundamental frequency (the frequency of the note Sa) is obtained and fed as input to the program. From this frequency, the frequency values of the other practical notes among the twelve used in Carnatic music are computed. A frequency range is computed for each Swara, and these ranges are compared against the frequencies obtained from the inbuilt Pitch Detection Algorithm; when a frequency matches, that note is selected for clustering. Once all seven notes are known, a procedure checks whether the notes are repeated in the music input.
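The note-grid computation and range matching described in this section can be sketched as follows, in Python rather than the paper's MATLAB. The equal-tempered semitone ratio 2**(1/12) and the ±50-cent tolerance are illustrative assumptions — the paper derives its own frequency range per Swara — and the short labels (Re1, Ga3, …) follow the tables above.

```python
import numpy as np

# The 12 practical Swaras in ascending order within one octave,
# labelled as in the frequency tables (Re1 = Shudhdha Rishabha, etc.).
SWARAS = ["Sa", "Re1", "Re2", "Ga2", "Ga3", "Ma1", "Ma2",
          "Pa", "Da1", "Da2", "Ni2", "Ni3"]

def swara_frequencies(sa_hz):
    """Frequencies of the 12 Swaras from the tonic Sa, assuming
    equal-tempered semitone ratios (an approximation; Carnatic
    intonation is not exactly 12-TET)."""
    return {name: sa_hz * 2 ** (i / 12) for i, name in enumerate(SWARAS)}

def nearest_swara(f_hz, sa_hz, tol_cents=50):
    """Map a detected pitch to the nearest Swara on the grid,
    or None if it is more than tol_cents away from every Swara."""
    best, best_cents = None, float("inf")
    for name, f_ref in swara_frequencies(sa_hz).items():
        cents = abs(1200 * np.log2(f_hz / f_ref))
        if cents < best_cents:
            best, best_cents = name, cents
    return best if best_cents <= tol_cents else None

# The detected Pa frequency from Table I, with Sa = 133 Hz:
print(nearest_swara(197.9, 133))  # → Pa
```

This mirrors the step of deriving all twelve note frequencies from the Praat-supplied Sa and selecting a note whenever a detected frequency falls within its range.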
After recognizing the notes a second time, the detected notes are printed again on the console. The Clustering Algorithm then has to map the Swaras to a Raaga. To achieve this, an algorithm is implemented to find the nearest neighbours between the two sets of frequencies that were computed. The Raaga whose Swaras match the output of the Clustering Algorithm most closely is selected, and its name is printed.

Figure 2: Flowchart of the Clustering Algorithm

V. RESULTS AND DISCUSSION

With the aid of the Clustering Algorithm and the inbuilt Pitch Detection Algorithm, the computer accurately detects the Raagas tested. The computer clusters the Swaras based on the frequencies plotted using the inbuilt Pitch Detection Algorithm, and the Swaras of the Raagas are given in the database. The system computes the nearest match by comparing the frequencies and displays the Raaga that was rendered.

Figure 3 represents the time domain plot of Raaga Pantuvarali.

Figure 3: Time domain plot of Raaga Pantuvarali

Figure 4 represents the frequency domain plot of Raaga Pantuvarali, obtained with the inbuilt Pitch Detection Algorithm using a frame length of 40 ms and a time step of 10 ms.

Figure 4: Frequency domain plot of Raaga Pantuvarali

Table V. Proficiency of the system in identifying Raagas using the Clustering Algorithm

Raaga tested      | Type of Raaga | No. of times tested | No. of times identified correctly
Pantuvarali       | Janaka        | 10 | 10
Shankaraabharana  | Janaka        | 10 | 8
Vachaspathi       | Janaka        | 10 | 9
Natabhairavi      | Janaka        | 10 | 5
Abhogi            | Janya         | 10 | 1
Kalyani           | Janaka        | 10 | 8

VI. CONCLUSION

The implementation displays two prompt windows for the user interface. Once the audio input is fed to the system, a prompt message is displayed requesting the user to wait until the system has computed all the frequencies.
After all the computation, the Swaras detected in the audio input are displayed on the command window. Using the detected Swaras, the nearest matching Raaga is identified and displayed; this is achieved by computing the distances between the two vector sequences using predecessors. The same algorithm is used to re-verify the Swara identification step when detecting the Raaga. Figure 5 shows the results displayed on the command window. After the results appear on the command window, one more prompt window is used for user interaction, in which the identified Raaga is displayed.

A few Janaka Raagas and Janya Raagas were tested a number of times using audio inputs sung at different pitches; Table V tabulates the proficiency of the system in identifying these Raagas using the Clustering Algorithm. Repeated tests carried out on the system identified Raagas with reasonably good accuracy. In some cases, however, the Swaras detected by the system were not consistent: some Swaras in the rendered Raaga were detected wrongly, hindering proper identification. One potential cause is the use of Gamakas in the Swaras rendered. When a Janya Raaga such as Abhogi was tested, the system most often identified it as Kharaharapriya, which happens to be the Janaka Raaga of Abhogi, and similar behaviour was observed for other Janya Raagas. Hence the code needs calibration when the scenario changes from Janaka Raagas to Janya Raagas.

VII. FUTURE SCOPE

The implementation needs to be calibrated for both Janaka Raagas and all types of Janya Raagas, to enhance the versatility of the system and improve its consistency in giving accurate results.
Scope also exists to improve the system so that the Swaras rendered in a Raaga are detected more accurately, and the displayed Swaras consistently and exactly match the theoretical Aarohana-Avarohana of the Raaga.

Figure 5: Results displayed on the command window

References

[1] Arvindh Krishnaswamy, "Application of Pitch Tracking to South Indian Music", 2003 IEEE International Conference on Acoustics, Speech & Signal Processing, April 6-10, 2003, Hong Kong, pp. 389-392.
[2] Xuejing Sun, "A Pitch Detection Algorithm based on Subharmonic-to-Harmonic Ratio", Communication Sciences and Disorders, 1998, pp. 395-399.
[3] Vijay Kumar, Harit Pandya and C. V. Jawahar, "Identifying Ragas in Indian Music", IEEE Computer Society 22nd International Conference on Pattern Recognition, 2014, pp. 767-773.
[4] Kun Han and DeLiang Wang, "Neural Network Based Pitch Tracking in Very Noisy Speech", IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 12, December 2014, pp. 2158-2167.