International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) - 2016
Raaga Identification using Clustering Algorithm
Krishna Pawan Kumar
Sukesh Rao M.
Department of Electronics and Communication Engineering
NMAM Institute of Technology, Nitte
Autonomous College under VTU, Belagavi
Karkala, Udupi-574110
[email protected]
Department of Electronics and Communication Engineering
NMAM Institute of Technology, Nitte
Autonomous College under VTU, Belagavi
Karkala, Udupi- 574110
[email protected]
Abstract— Music information retrieval is one of the important
emerging trends in the field of speech signal processing. Any
music rendition comprises a set of Swaras, which lie at
different pitch frequencies; pitch detection therefore plays an
important role in identifying the Raaga. The Pitch Detection
Algorithm used here is based on the Subharmonics-to-Harmonics
Ratio. The frequencies of the different Swaras are calculated
using the Pitch Detection Algorithm, and a comparison between
this inbuilt Pitch Detection Algorithm and a standard music
analysis tool is used to obtain the pitch frequencies of the
Swaras. A Clustering Algorithm is proposed to identify the
nearest matching Swaras and thereby identify the Raaga. The
Clustering Algorithm is used to test several Raagas; the results
of these tests are tabulated, and the Raagas identified are
found to be accurate.
Keywords—Pitch Detection Algorithm, Raaga Identification,
Music Information Retrieval, Subharmonics-to-Harmonics Ratio,
Clustering Algorithm
I. INTRODUCTION
Raagas are the melodic frameworks or formalizations of
melody found in Indian Classical Music (Hindusthani and
Carnatic). In Carnatic music, there are 72 Melakartha Raagas¹
or Janaka Raagas² and numerous Janya Raagas³ derived from
the Janaka Raagas by considering various combinations of
Swaras. Each Raaga therefore comprises a sequence of
Swaras depicting a mood and sentiment. Melakartha Raagas
or Janaka Raagas have all seven Swaras⁴ (Sa, Ri, Ga, Ma, Pa,
Da, Ni) in sequential order. These Swaras or notes are called
Saptha⁵ Swaras. These seven notes form an octave. The
Melakartha Raagas are formed by permutation and
combination of the 16 theoretical Swaras used in the musical
system.
The sixteen Swaras are named as:
1. Shadja⁶
2. Shudhdha Rishabha
3. Chathushruthi Rishabha
4. Shatshruthi Rishabha
5. Shudhdha Gaandhaara
6. Saadhaarana Gaandhaara
7. Antara Gaandhaara
8. Shudhdha Madhyama
9. Prati Madhyama
10. Panchama
11. Shudhdha Daivatha
12. Chathushruthi Daivatha
13. Shatshruthi Daivatha
14. Shudhdha Nishada
15. Kaishiki Nishada
16. Kaakali Nishada

¹ Raagas having the notes Sa, Ri, Ga, Ma, Pa, Da, Ni in sequential order
² Synonym of Melakartha Raagas
³ Raagas derived from the Melakartha Raagas
⁴ Notes in music
⁵ Saptha means seven in Sanskrit
⁶ The fundamental frequency, which sets the pitch of the performer

978-1-4673-9939-5/16/$31.00 ©2016 IEEE
The Swaras listed here are in order of increasing frequency
for a given pitch of Shadja. Further, it is to be noted that,
of the sixteen theoretical Swaras, the following Swara pairs
lie at the same frequency:
1. Chathushruthi Rishabha and Shudhdha Gaandhaara
2. Shatshruthi Rishabha and Saadhaarana Gaandhaara
3. Shudhdha Nishada and Chathushruthi Daivatha, and
4. Shatshruthi Daivatha and Kaishiki Nishada
Since the above-mentioned pairs of Swaras lie at the same
frequencies, either of the two is selected to define a Raaga.
This means that a Raaga having Chathushruthi Rishabha will
not have Shudhdha Gaandhaara, and the same condition applies
to the other three pairs. The 16 theoretical Swaras therefore
reduce to 12 practical Swaras:
1. Shadja
2. Shudhdha Rishabha
3. Chathushruthi Rishabha
4. Saadhaarana Gaandhaara
5. Antara Gaandhaara
6. Shudhdha Madhyama
7. Prati Madhyama
8. Panchama
9. Shudhdha Daivatha
10. Chathushruthi Daivatha
11. Kaishiki Nishada
12. Kaakali Nishada
These 12 Swaras are used to identify the Raagas using a
suitable Clustering Algorithm. However, some music scholars
speak of the 22-Swara system that was used during the Vedic
era. These 22 Swaras are called the 22 pitches or
Dwavimshathi⁷ Shruthis⁸. The 22 Shruthis incorporate the
concept of Gamakas to differentiate between Raagas; to
understand the concept of 22 Shruthis, a novice has to learn
music for several years.
Raagas are defined by a set of ascending and descending
Swaras, called the Aarohana⁹ and Avarohana¹⁰. Only in
Melakartha Raagas does the Aarohana-Avarohana contain the
seven notes in sequential order.
It is essential for a student of music to be able to identify
Raagas, which requires a good knowledge of the various Raagas
and their Aarohana-Avarohana patterns; students typically
acquire this ability only after several years of dedicated
practice. A computer, however, can identify these Raagas
readily using the principles of digital signal processing and
a suitable Clustering Algorithm.
II. LITERATURE SURVEY
Raaga identification is the process of listening to a piece of
music, transcribing it into a sequence of notes, analyzing that
sequence, and comparing the result with the
Aarohana-Avarohana of various Raagas in order to identify the
Raaga it follows. Support Vector Machines have been used to
perform this classification, and such a system is found to be
88% accurate [1].
Pitch, i.e., fundamental frequency, is an important feature in
speech research areas. Pitch determination remains one of the
most challenging problems in speech analysis. The most
common errors are pitch doubling and pitch halving due to the
appearance of Alternate Pulse Cycles (APC). Pitch detection
using Subharmonic-to-Harmonic Ratio (SHR) is a perception
oriented Pitch Detection Algorithm (PDA). This algorithm
effectively reduces the gross error rate resulting from
Subharmonics [2].
The identification of Raagas is cognitive and comes only
after an adequate amount of exposure to classical music. One
can identify a Raaga by finding its most prominent Swara, by
counting the number of occurrences or the duration of each
Swara. Gamakas, the variations of pitch around a note, can
also be used to identify a Raaga, as only certain types of
variation are allowed in each Raaga [3].
The pitch candidates or probabilities are connected into pitch
contours using dynamic programming or Hidden Markov
Models (HMMs). Deep Neural Network (DNN) and
Recurrent Neural Network (RNN) are also used to estimate
the posterior probabilities of pitch states for pitch tracking in
highly noisy speech. The supervised-learning-based approach
produces strong pitch-tracking results even in noisy
conditions [4].

⁷ Dwavimshathi means 22 in Sanskrit
⁸ Musical terminology for pitch
⁹ Set of ascending Swaras, as in Sa Ri Ga Ma Pa Da Ni Sa' (in the case of Melakartha Raagas)
¹⁰ Set of descending Swaras, as in Sa' Ni Da Pa Ma Ga Ri Sa (in the case of Melakartha Raagas)
III. IMPLEMENTATION
A pitch detector is an essential component of speech
processing systems. Music is a highly structured system
dictated by specific rules for time, beat, rhythm, pitch and
harmony. A scale is a sequence of pitches or notes with a
specific spacing in frequency. Pitch is the smallest interval
between two notes. Fundamental frequency estimation is
referred to as pitch detection. In a music rendition, the
fundamental frequency is the frequency of the note Shadja,
which is the lowest frequency compared with all other Swaras
included in the Aarohana-Avarohana of a Raaga.
A number of methods have already been proposed for pitch
detection of music signals. They are: Modified autocorrelation
method using clipping (AUTOC), Cepstrum based pitch
detection (CEP), Simplified inverse filtering technique
(SIFT), Data reduction method (DARD), Parallel processing
method (PPROC) etc. Pitch Detection Algorithm using
Subharmonic-to-Harmonic Ratio (SHR) is used to detect
frequencies of Swaras rendered in the recording.
The Subharmonic-to-Harmonic Ratio is the amplitude ratio
between subharmonics and harmonics. When the ratio is
small, the perceived pitch remains the same. As the ratio
increases above a certain threshold, the subharmonics become
clearly visible in the spectrum, and the perceived pitch
becomes one octave lower than the original pitch.
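The octave decision described above can be sketched as follows. This is an illustrative toy, not the actual shrp.m algorithm of [2]: the threshold of 0.2 and the count of five harmonics are arbitrary assumptions, and the amplitudes are summed at the bins nearest the harmonic and half-harmonic frequencies.

```python
import numpy as np

def shr_octave_decision(signal, fs, f0_candidate, n_harm=5, threshold=0.2):
    """Toy subharmonic-to-harmonic ratio check (illustrative only).

    Sums spectral amplitude at the harmonics of f0_candidate and at the
    odd half-harmonics (0.5*f0, 1.5*f0, ...); when the ratio of the two
    sums exceeds the threshold, the perceived pitch is taken to be one
    octave below f0_candidate.
    """
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)

    def amp_at(f):
        # amplitude of the spectral bin nearest to frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]

    harmonic = sum(amp_at(k * f0_candidate) for k in range(1, n_harm + 1))
    subharmonic = sum(amp_at((k - 0.5) * f0_candidate)
                      for k in range(1, n_harm + 1))
    shr = subharmonic / (harmonic + 1e-12)
    return f0_candidate / 2 if shr > threshold else f0_candidate
```

For a tone whose true fundamental is half the candidate frequency, the half-harmonic bins carry real energy, the ratio grows large, and the octave-lower pitch is reported.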
Praat is a free scientific software package for speech
analysis, developed by Paul Boersma and David Weenink of the
University of Amsterdam, and is a very flexible speech
analysis tool. Praat is used as the standard music analysis
tool for this implementation: the results obtained from the
Pitch Detection Algorithm using SHR are compared with the
results of Praat, and the configuration of the inbuilt Pitch
Detection Algorithm that comes closest to the ideal result is
taken forward for evaluation.
shrp.m is a MATLAB® File Exchange function used for pitch
detection of voice signals with and without APCs. The inbuilt
Pitch Detection Algorithm obtained from the MATLAB® File
Exchange has the following parameters that are crucial for
analysis: input data, sampling frequency, frequency range (for
male and female voices in the case of vocal signals), frame
length and time step.
The input data is the music signal itself. The sampling
frequency is extracted from the input music signal, and the
same sampling frequency is used for analysis. The frequency
ranges differ for male and female voices: for a male voice,
the default range is 50 Hz to 500 Hz; for a female voice, the
default range is 120 Hz to 400 Hz.
Table I. Frequency values obtained for a frame length of 40 ms and time
step of 10 ms using inbuilt pitch detection as compared with Praat.

Swaras   Praat tool (Hz)   SHR tool (Hz)   Error frequency (Hz)   % error
Sa       133               133.0614        0.0614                 0.4616
Re1      134.8             137.4639        0.3361                 0.2493
Ga3      164.1             164.5236        0.4236                 0.2581
Ma1      175.7             174.4046        1.2954                 0.7372
Pa       197               197.9144        0.9144                 0.4620
Da1      205.1             203.3223        1.7777                 0.866
Ni3      252.5             250.0679        2.4321                 0.96
The frame length is the smallest time duration (in
milliseconds) for which pitch estimation is performed at once;
this duration is converted into the number of samples to be
analysed. The default frame length is 40 ms. The time step is
the interval, in milliseconds, at which the short-term
analysis is updated; the default time step in the shrp.m
function is 10 ms.
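The conversion of these two parameters from milliseconds to samples, and the resulting framing of the signal, can be sketched as below (a minimal sketch; the function name is illustrative and only the default values are taken from shrp.m):

```python
import numpy as np

def frame_signal(x, fs, frame_len_ms=40, time_step_ms=10):
    """Split a signal into overlapping analysis frames; the defaults
    mirror the shrp.m defaults of a 40 ms frame updated every 10 ms."""
    frame_len = int(fs * frame_len_ms / 1000)  # frame length in samples
    hop = int(fs * time_step_ms / 1000)        # time step in samples
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])
```

At the 8 kHz sampling rate used here, a 40 ms frame is 320 samples and a 10 ms step is 80 samples, so one second of audio yields 97 frames.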
By varying the frame length and time step while keeping the
frequency ranges constant for the music signals, the obtained
note frequencies are compared with those from the standard
music analysis tool and tabulated.
Figure 1 represents the frequency plot for the Raaga
“Maayamaalavagaula” using the Praat tool. The male voice
signal provided as input is a .wav file stored in the database,
with a sampling frequency of 8 kHz and a frequency range of
50 Hz to 500 Hz.
By tabulating the frequency values obtained using the
standard music analysis tool and the inbuilt Pitch Detection
Algorithm using the Subharmonics-to-Harmonics Ratio, the
values of frame length and time step that give frequencies
nearest to the ideal are obtained. Both male and female voice
clippings are used for the analysis. The frequencies obtained
are compared with those from the standard music analysis
tool and the frequency error is computed; the choice of frame
length and time step depends on this frequency error.
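The error columns of the comparison tables follow directly from the two frequency readings; a minimal sketch of that computation (the function and variable names are illustrative):

```python
def frequency_error(f_praat, f_shr):
    """Absolute error in Hz, and percentage error relative to the
    Praat reading, as used in the comparison tables."""
    error_hz = abs(f_praat - f_shr)
    return error_hz, 100.0 * error_hz / f_praat
```

For Sa in Table I (133 Hz from Praat, 133.0614 Hz from SHR), this gives an error of 0.0614 Hz.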
Figure 2. Plot for Raaga “Maayamaalavagaula” using the inbuilt Pitch
Detection Algorithm
Table II gives the frequency values obtained for a frame
length of 30 ms and time step of 10 ms using the inbuilt Pitch
Detection Algorithm as compared with Praat, taking a male
voice as the input signal. When the frequency values are
compared, the error percentage is found to be significant for
Nishada under this condition.
Table II. Frequency values obtained for a frame length of 30 ms and time
step of 10 ms using inbuilt pitch detection as compared with Praat.

Swaras   Praat tool (Hz)   SHR (Hz)    Error frequency (Hz)   % error
Sa       133               132.76      0.24                   0.1805
Re1      134.8             136.0014    1.0089                 0.7484
Ga3      164.1             164.2820    0.182                  0.1109
Ma1      175.7             174.1835    1.5165                 0.8631
Pa       197               196.9135    0.0865                 0.439
Da1      205.1             206.8970    1.797                  0.876
Ni3      252.5             265.42      12.92                  5.11
Table III represents the frequency values obtained for frame
length of 40ms and time step of 10ms using inbuilt Pitch
Detection Algorithm as compared with Praat, by taking a
female voice as an input signal.
Figure 1. Frequency plot for the Raaga “Maayamaalavagaula” using the
Praat tool
Figure 2 represents the time- and frequency-domain plot for
Raaga Maayamaalavagaula. The frequencies obtained here are
compared with those obtained from the standard music
analysis tool.
Table III. Frequency values obtained for a frame length of 40 ms and time
step of 10 ms using inbuilt pitch detection as compared with Praat.

Swaras   Praat tool (Hz)   SHR (Hz)    Error frequency (Hz)   % error
Sa       206.2             206.3572    0.1572                 0.0762
Re1      210.8             210.3020    0.498                  0.2362
Ga3      256.3             256.4283    0.1283                 0.05
Ma1      272.8             272.9942    0.1942                 0.0712
Pa       310.8             309.4255    1.3745                 0.4422
Da1      323.8             316.6327    7.1673                 2.213
Ni3      387.3             384.8597    2.4403                 0.63
Table IV represents the frequency values obtained for a frame
length of 30 ms and time step of 10 ms using the inbuilt Pitch
Detection Algorithm as compared with Praat, taking a female
voice as the input signal.
Table IV. Frequency values obtained for a frame length of 30 ms and time
step of 10 ms using the inbuilt Pitch Detection Algorithm as compared with
Praat.

Swaras   Praat tool (Hz)   SHR (Hz)    Error frequency (Hz)   % error
Sa       206.2             205.2972    0.9028                 0.4378
Re1      210.8             210.0223    0.7777                 0.3689
Ga3      256.3             255.2144    1.0856                 0.4235
Ma1      272.8             272.6618    0.1382                 0.051
Pa       310.8             311.4716    0.6716                 0.2161
Da1      323.8             323.1433    1.0020                 0.3095
Ni3      387.3             384.0283    3.2717                 0.8447
From the tabulations, the frequency values obtained from the
inbuilt Pitch Detection Algorithm match the standard Praat
tool most closely for a frame length of 40 ms and a time step
of 10 ms. Hence, for further analysis, these default values of
frame length and time step are used to evaluate the
frequencies of the Swaras and thereby identify the Raagas.
IV. CLUSTERING ALGORITHM
Cluster analysis or clustering is the task of grouping a set of
objects in such a way that objects in the same group (called a
cluster) are more similar to each other than to those in other
groups (clusters).
The Raagas are identified using a Clustering Algorithm.
Figure 2 represents the flowchart of the Clustering Algorithm
program. Initially a music file is read using audioread function
of MATLAB®. The audioread function is also used to extract
the sampling rate of the recording.
Using the standard Praat tool, the fundamental frequency (the
frequency of the note Sa) is obtained and fed as input to the
program. From this frequency, the frequencies of all twelve
practical notes used in Carnatic music are computed. A range
is computed for each of the Swaras, and these ranges are
compared with the frequencies obtained from the inbuilt Pitch
Detection Algorithm. If a frequency falls within a range, the
corresponding note is selected for clustering.
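The derivation of the twelve note frequencies from the Shadja frequency can be sketched as follows. The paper does not state the exact ratios used, so 12-tone equal temperament is an illustrative assumption (Carnatic practice often favours just-intonation ratios), and the short Swara names and the matching tolerance are likewise assumptions:

```python
# The twelve practical Swaras in ascending order within one octave;
# the short names (Ri1, Ga2, ...) follow common Carnatic shorthand.
SWARA_NAMES = ["Sa", "Ri1", "Ri2", "Ga2", "Ga3", "Ma1",
               "Ma2", "Pa", "Da1", "Da2", "Ni2", "Ni3"]

def swara_frequencies(sa_hz):
    """Frequencies of the 12 practical Swaras derived from the Shadja
    frequency, assuming 12-tone equal temperament."""
    return {name: sa_hz * 2 ** (k / 12) for k, name in enumerate(SWARA_NAMES)}

def match_swara(freq_hz, sa_hz, tolerance=0.015):
    """Return the Swara whose reference frequency is nearest to freq_hz
    if it lies within the relative tolerance, otherwise None."""
    table = swara_frequencies(sa_hz)
    name, ref = min(table.items(), key=lambda kv: abs(kv[1] - freq_hz))
    return name if abs(ref - freq_hz) / ref <= tolerance else None
```

With Sa = 133 Hz (as in Table I), Pa works out to 133 × 2^(7/12) ≈ 199.3 Hz, so a detected frame at 197.9 Hz maps to Pa.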
Once all seven notes are known, a routine checks whether the
notes are repeated in the music input. After the notes are
recognized a second time, the detected notes are printed again
on the console. The Clustering Algorithm then has to map the
Swaras to a Raaga. To achieve this, an algorithm finds the
nearest neighbours between the two sets of computed
frequencies. The Raaga whose Swaras match the output of the
Clustering Algorithm most closely is selected, and its name is
printed.
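The final mapping from detected Swaras to a Raaga can be sketched as a nearest-match lookup against a database of Aarohana patterns. The mini-database below and the symmetric-difference distance are illustrative stand-ins for the paper's actual database and nearest-neighbour step:

```python
# Hypothetical mini-database mapping a Raaga to the Swaras of its
# Aarohana (illustrative entries, not the paper's full database).
RAAGA_DB = {
    "Maayamaalavagaula": {"Sa", "Ri1", "Ga3", "Ma1", "Pa", "Da1", "Ni3"},
    "Shankaraabharana":  {"Sa", "Ri2", "Ga3", "Ma1", "Pa", "Da2", "Ni3"},
    "Pantuvarali":       {"Sa", "Ri1", "Ga3", "Ma2", "Pa", "Da1", "Ni3"},
}

def identify_raaga(detected_swaras):
    """Return the Raaga whose Swara set differs least from the detected
    set; the symmetric set difference is used here as the distance, an
    illustrative choice for the nearest-neighbour comparison."""
    return min(RAAGA_DB,
               key=lambda name: len(RAAGA_DB[name] ^ set(detected_swaras)))
```

A detected set containing Ma2 instead of Ma1 thus resolves to Pantuvarali rather than Maayamaalavagaula, since only one Swara differs.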
Figure 2: Flowchart of the Clustering Algorithm
V. RESULTS AND DISCUSSION
The computer detects few Raagas tested accurately with the
aid of the Clustering Algorithm and the inbuilt Pitch
Detection Algorithm. The computer clusters the Swaras based
on frequencies that are plotted using the inbuilt Pitch
Detection Algorithm. Swaras of the Raagas are given in the
database. The system computes the nearest match by
comparing the frequencies and displays the Raaga that was
rendered. Figure 3 represents the time domain plot of Raaga
Pantuvarali.
Figure 3: Time domain plot of Raaga Pantuvarali
Figure 4 represents the frequency-domain plot of Raaga
Pantuvarali, obtained using the inbuilt Pitch Detection
Algorithm with a frame length of 40 ms and a time step of
10 ms.
Table V. Proficiency of the system in identifying Raagas using the
Clustering Algorithm.

Raaga tested        Type of Raaga   No. of times tested   No. of times identified correctly
Pantuvarali         Janaka          10                    10
Shankaraabharana    Janaka          10                    8
Vachaspathi         Janaka          10                    9
Natabhairavi        Janaka          10                    5
Abhogi              Janya           10                    1
Kalyani             Janaka          10                    8
Figure 4: Frequency domain plot of Raaga Pantuvarali

VI. CONCLUSION
The implementation displays two prompt windows for the user
interface. Once the audio input is fed to the system, a prompt
message requesting the user to wait is displayed until the
system has computed all the frequencies.
After the computation, the Swaras detected in the audio input
are displayed on the command window. Using the detected
Swaras, the nearest matching Raaga is identified and
displayed. This is achieved by computing the distances between
the two vector sequences using predecessors. The same
algorithm is used to re-verify the Swara identification step
before the Raaga is reported. Figure 5 represents the results
displayed on the command window.
After the results are shown on the command window, one more
prompt window is used for user interaction, in which the
identified Raaga is displayed. A few Janaka Raagas and Janya
Raagas were tested a number of times using audio inputs sung
at different pitches. Table V tabulates the proficiency of the
system in identifying Raagas using the Clustering Algorithm.
Repeated tests carried out using the system identified Raagas
with reasonably good accuracy. However, in some cases the
Swaras detected by the system were found to be inconsistent:
some of the Swaras in the rendered Raaga were detected
wrongly, hindering proper identification of the Raaga. One
potential cause is the use of Gamakas in the Swaras that were
rendered. When a Janya Raaga such as Abhogi was tested, the
system most often identified the Raaga as Kharaharapriya,
which happens to be the Janaka Raaga of Abhogi; similar
behaviour was observed for other Janya Raagas. Hence the code
needs calibration when the scenario changes from Janaka
Raagas to Janya Raagas.
VII. FUTURE SCOPE
The implementation needs to be calibrated for both Janaka
Raagas and all types of Janya Raagas so as to enhance the
versatility of the system and improve its consistency in
giving accurate results. Scope also exists for improving the
system so that the Swaras rendered in a Raaga are detected
more accurately, and the displayed Swaras consistently and
exactly match the theoretical Aarohana-Avarohana of the Raaga.
Figure 5: Results displayed on the command window

References
[1] Arvindh Krishnaswamy, “Application of Pitch Tracking to South Indian Music”, 2003 IEEE International Conference on Acoustics, Speech & Signal Processing, April 6-10, 2003, Hong Kong, pp. 389-392.
[2] Xuejing Sun, “A Pitch Detection Algorithm based on Subharmonic-to-Harmonic Ratio”, Communication Sciences and Disorders, 1998, pp. 395-399.
[3] Vijay Kumar, Harit Pandya and C. V. Jawahar, “Identifying Ragas in Indian Music”, IEEE Computer Society 22nd International Conference on Pattern Recognition, 2014, pp. 767-773.
[4] Kun Han and DeLiang Wang, “Neural Network Based Pitch Tracking in Very Noisy Speech”, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 22, No. 12, December 2014, pp. 2158-2167.