Automatic Mood Classification of Indian
Popular Music
Dissertation
Submitted in partial fulfillment of the requirements
for the degree of
Master of Technology, Computer Engineering
by
Aniruddha M. Ujlambkar
Roll No: 121022001
under the guidance of
Prof. V. Z. Attar
Department of Computer Engineering and Information Technology
College of Engineering, Pune
Pune - 411005.
June 2012
Dedicated to
my mother, Smt. Manasi M. Ujlambkar, who has always emphasized the
importance of education, discipline, integrity and has been a constant source of
inspiration for me, my entire life
and
my father, Shri. Mukund K. Ujlambkar, who has always been my role model
for hard work, persistence, patience and always supported me open heartedly in
all my endeavors.
DEPARTMENT OF COMPUTER ENGINEERING AND
INFORMATION TECHNOLOGY,
COLLEGE OF ENGINEERING, PUNE
CERTIFICATE
This is to certify that the dissertation titled
Automatic Mood Classification of Indian
Popular Music
has been successfully completed
By
Aniruddha M. Ujlambkar
(121022001)
and is approved for the degree of
Master of Technology, Computer Engineering.
Prof. V. Z. Attar,
Guide,
Department of Computer Engineering
and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.
Date :
Dr. Jibi Abraham,
Head,
Department of Computer Engineering
and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.
Abstract
Music has been an inherent part of human life when it comes to recreation and entertainment, and more recently even as a therapeutic medium. The way music is
composed, played and listened to has witnessed an enormous transition from the
age of magnetic tape recorders to the recent age of digital music players streaming
music from the cloud. What has remained intact is the special relation that music
shares with human emotions. We most often choose to listen to a song or piece of music
which best fits our mood at that instant. In spite of this strong correlation, most
of the music software available today is still devoid of the facility of
mood-aware play-list generation. This increases the time music listeners take in
manually choosing a list of songs suiting a particular mood or occasion, which can
be avoided by annotating songs with the relevant emotion category they convey.
The problem, however, lies in the overhead of manual annotation of music with
its corresponding mood, and the challenge is to identify this aspect automatically
and intelligently.
The study of mood recognition in the field of music has gained a lot of momentum in recent years, with machine learning and data mining techniques
contributing considerably to analyzing and identifying the relation of mood with music. We take the same inspiration forward and contribute by making an effort to
build a system for automatic identification of the mood underlying audio songs by
mining their spectral and temporal audio features. Our focus is specifically on Indian
Popular Hindi songs. We have analyzed various data classification algorithms in
order to learn, train and test the model representing the moods of these audio
songs, and have developed an open source framework for the same. We have been able to achieve a satisfactory precision of 70% to 75% in identifying the mood
underlying Indian popular music by introducing a bagging (ensemble) of
random forests approach, experimented over a list of 4600 audio clips.
Acknowledgments
I express my deepest gratitude towards my guide Prof. V. Z. Attar for
her constant help and encouragement throughout the project work. I have been
fortunate to have a guide who gave me the freedom to explore on my own and at
the same time helped me plan the project with timely reviews and constructive
comments, suggestions wherever required. A big thanks to her for having faith
in me throughout the project and helping me walk through the new avenues of
research papers and publications.
I would like to thank Prof. A. A. Sawant, for the continuous support and
encouragement he extended through the enthusiastic discussions he used to have
very often with us and the insightful thoughts and ideas he used to share. I also
take this opportunity to thank all those teachers, staff and colleagues who
have constantly helped me grow, learn and mature both personally and professionally throughout the process.
A BIG thanks goes to my dearest friends who have always supported, guided
and even criticized me, always for the right reasons and have helped me stay sane
throughout this and every other chapter of my life. I greatly value their friendship
and deeply appreciate their belief in me. Special thanks to all the new friends from
M.Tech. I have made without whom the journey wouldn’t have been so interesting
and memorable!
Most importantly, none of this would have happened without the love and
patience of my family - my parents, to whom this dissertation is dedicated. I
would like to express my heart-felt gratitude to my family.
Aniruddha M. Ujlambkar
College of Engineering, Pune
June 2012
Contents

Abstract
Acknowledgments
List of Figures
List of Tables

1 Introduction
  1.1 Music and Mood
  1.2 Introduction to music features
  1.3 Music and Data Mining
  1.4 Motivation
  1.5 Thesis Objective and Scope
  1.6 Thesis Outline

2 Literature Survey
  2.1 Music Mood Model and Audio Features
  2.2 Music classification
  2.3 Summary

3 Music Mood Model
  3.1 Music Mood Relation
  3.2 Mood(Emotion) Models
    3.2.1 Hevner's experiment
    3.2.2 Russell's model
    3.2.3 Thayer's model
    3.2.4 Indian Classical model: Navras

4 Audio Features
  4.1 Low level Audio Features
  4.2 Feature List

5 Mining Mood from Audio Features
  5.1 Overview of Data Mining
  5.2 Overview of Data Mining functionalities
  5.3 Classification
    5.3.1 Classification using Decision-tree
    5.3.2 Random Forest Classification
  5.4 Random Forest Highlights
  5.5 Our Approach: Bagging of Random Forests
    5.5.1 Algorithm

6 Mood Identification System
  6.1 Mood Model Selection
  6.2 System Overview
  6.3 System Design and Components
    6.3.1 Audio Pre-processor
    6.3.2 Audio Feature Extractor
    6.3.3 Mood Identification System

7 Experiments and Results
  7.1 Experimental Setup
    7.1.1 Data Collection
    7.1.2 Data pre-processing
    7.1.3 Training and Testing
  7.2 Results
    7.2.1 Evaluation Metrics

8 Applications
  8.1 Music Therapy Applications
  8.2 Music Information Retrieval
  8.3 Intelligent Automatic Music Composition

9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work

10 Project Milestones
  10.1 Project Schedule
  10.2 Publications' status

Bibliography

List of Figures

3.1 Hevner's Mood Model
3.2 Russell's Mood Model
3.3 Thayer's Mood Model
3.4 Navras: Indian Classical emotion model
4.1 Audio Features Taxonomy
5.1 Data Mining in Knowledge discovery
5.2 Data Mining disciplines
5.3 Classification process
5.4 Classification using Decision Tree
6.1 Mood Recognition System
6.2 Mood Detection System: Detailed Design
7.1 Area under ROC statistics
7.2 Recall statistics
7.3 Precision statistics
7.4 F-measure statistics

List of Tables

6.1 Mood Model: Indian popular Hindi music
7.1 Experimental Results on Test Dataset of 2938 music clips
7.2 Experimental Results on Test Dataset of 2938 music clips
10.1 Weekly Schedule of Project Starting 1 July, 2011
10.2 Paper publications' status
Chapter 1
Introduction
1.1 Music and Mood
The well-known German philosopher Friedrich Nietzsche once quoted a famous
line: “Without music, life would be a mistake”. Music has always been an inherent
part of recreation in human life. Music is not just useful for entertainment:
studies have shown that listening to the right music plays an important role in
healing, rejuvenating and even inspiring the human mind in difficult conditions, as
is widely studied and demonstrated by the field of Music Therapy [27]. With
rapidly advancing technology and the advent of the latest multimedia gadgets,
music has reached almost every individual’s personal device, be it a laptop,
music player or a cell phone. The music which in the olden days was limited
to live concerts, performances or radio broadcasts is now available at everyone’s
fingertips within a few clicks. Music has thus become very easily accessible and
available. However, the music database is ever increasing, so much so that it
would not be wrong to say that we might hear a couple or more
completely new and never-heard music pieces every single day. Today, the overall
music collection worldwide counts a few million records and still
continues to increase every day. With so much variety of music easily available,
we humans do not always listen to the same type of music all the time. We have
our interests, favorite artists, albums and music types. To put it simply, we have our
personal choices and, more importantly, even our choice might differ from time to
time. This choice is very naturally governed by our emotional state at that
particular instant. The relation between musical sounds and their influence on the
listeners’ emotion has been well studied and is evident from the much celebrated
papers such as that of Hevner [18] and Farnsworth [13]. The papers described
experiments which eventually substantiated a hypothesis that music inherently
carries emotional meaning.
Currently we can store, sort and retrieve our digital music files based on various
traditional music classification tags like Artist, Band(Group), Album, Movie and
Genre. However, choosing a song or music piece suiting our mood from a large
database is difficult and time consuming, since none of the mentioned parameters
can sufficiently convey the emotional aspect associated with the song. What
we need is an additional parameter or rather search filter here - Mood - which
signifies the emotion of that particular music piece. However, classifying music as
per its mood is a much harder task. The main reasons being:
• First, the emotion or mood of music is very subjective. Human mood, surrounding environment, personality and cultural background can have an influence on
the perceived emotion of a particular piece of music.
• Second, the adjectives describing emotion can be ambiguous. For instance
happy and refreshing may refer to the same song.
• Third, it is inexplicable as to how music arouses emotion. What intrinsic
quality of music, if any, creates a specific emotional response is still far from
well-understood [6].
1.2 Introduction to music features
As it is a well-established fact that music indeed has an emotional quotient attached to it, it is essential to know the intrinsic factors present
in music which associate it with a particular mood or emotion. A lot of research
has been done, and is still going on, in capturing various features from an audio file
based on which we can analyze and classify a list of audio files. Audio features
are nothing but mathematical functions calculated over the audio data in order
to describe some unique aspect of that data. In the last decades a huge number
of features have been developed for the analysis of audio content. Dalibor Mitrovic and
team [7] have analyzed various state-of-the-art audio features useful for content-based audio retrieval.
Audio features were initially studied and explored for application domains like
speech recognition [29]. With upcoming novel application areas, the analysis of
music and general purpose environmental sounds gained importance. Different
research fields evolved, such as audio segmentation, music information retrieval
(MIR), and environmental sound recognition (ESR). Each of these areas developed its specific description techniques (features). Many audio features have been
proposed in the literature for music classification. Different taxonomies exist for
the categorization of audio features. Weihs et al. [40] have categorized the audio
features into four subcategories, namely short-term features, long-term features,
semantic features, and compositional features. Scaringella [33] followed a more
standard taxonomy by dividing audio features used for genre classification into
three groups based on timbre, rhythm, and pitch information, respectively. Each
taxonomy attempts to capture audio features from certain perspective. Instead
of a single-level taxonomy, Zhouyu Fu and team [42] unify the two taxonomies
and present a hierarchical taxonomy that characterizes audio features from different perspectives and levels. From the perspective of music understanding, we
can divide audio features into two levels, low-level and mid-level features along
with top-level labels. Low-level features can be further divided into two classes
of timbre and temporal features. Timbre features capture the tonal quality of
sound that is related to different instrumentation, whereas temporal features capture the variation and evolution of timbre over time. Low-level features are the
basic description of the audio data, for instance, tempo, beats per minute and so
on. On the contrary, mid-level features are derived by using these basic features
to provide music-related technical understanding such as rhythm and pitch, which
in turn is perceived by humans as genre and mood, which form the top level of
the taxonomy.
A wide range of audio features have been studied by specialists and experts
over the past many years, and many of the features have even been standardized, for
instance in the MPEG-7 standard [28], which provides a list of low level audio descriptors
(features) and techniques and tools to extract the same. The audio feature extraction
process involves a lot of complex mathematical and signal processing to convert
the digital audio data into meaningful features represented by numbers (fixed
or variable dimensions). To name a few, the following are examples of some low-level
audio features: zero-crossing rate, magnitude spectrum, spectral centroid, spectral
roll-off and many more which will be discussed in detail in coming chapters.
With the increasing standardization and research in audio features, an effort
is always made to obtain features which are orthogonal and which provide
descriptions with a high variance for the underlying data.
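As a small concrete illustration of the idea that an audio feature is simply a mathematical function computed over the raw samples, the following is a minimal Python/NumPy sketch of one such feature, the zero-crossing rate of a single window. This sketch is ours and not part of the thesis toolchain; the window length and the synthetic signal are hypothetical.

import numpy as np

def zero_crossing_rate(window: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(window)
    # Count positions where the signal crosses zero between samples.
    crossings = np.count_nonzero(signs[:-1] != signs[1:])
    return crossings / (len(window) - 1)

# Hypothetical 512-sample window: a noisy sine wave.
t = np.arange(512)
window = np.sin(2 * np.pi * t / 64) + 0.1 * np.random.randn(512)
print(zero_crossing_rate(window))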
1.3 Music and Data Mining
Data Mining is a relatively young and interdisciplinary field in computer science
which deals with analyzing and discovering interesting and useful patterns from
large data sets. This field involves various tasks for analyzing data, out of which
the most important task relevant in the context of our work is Classification.
The classification task involves generalizing a known structure or pattern among the
available data already assigned some specific class or label. This generalized pattern can then be used to predict the class of new and unknown data. On the
contrary, Clustering is the data mining task of discovering groups and structures of data which are in some way similar to each other and differ in a similar
way from other groups, without any prior knowledge of the structure of the data.
Music mood detection fits the criteria for a data mining problem as we have a huge
number of music pieces, each with a few hundred audio features associated with it.
Music emotion detection and classification has been studied and researched
before. Initially, most studies adopted a pattern recognition approach. Wang et
al. [39] extracted features from MIDI files and used a support vector machine
(SVM) to classify music into six classes: joyous, robust, restless, lyrical, sober,
and gloomy. High classification accuracy was reported; however, one cannot easily
transcribe real world music into symbolic form, as done in MIDI files. Li et
al. [21] divided emotion into thirteen categories and combined them into six classes.
Then, they adopted MARSYAS [25] in their system to extract music features from
acoustic data and used SVM to train and recognize music emotion. Liu et al. [22]
presented a hierarchical mood recognition system, which uses a Gaussian mixture
model (GMM) to represent the feature dataset and a Bayesian classifier to classify
music clips. Byeong-jun Han proposed a music emotion recognition system using
support vector regression and Thayer's emotion model [37].
Overall, be it genre classification, instrument classification or even mood
classification, data mining techniques, especially classification techniques, have
proved very effective in analyzing music and categorizing it.
1.4 Motivation
The way we choose a song to listen to is very much restricted by what search options
we are provided with by the underlying software of the music device. Today we
can search a song by tags like Name, Artist, Album, and Genre. After years of
research and study, it was an established fact that human emotions are influenced
by music. Hence, it was high time to introduce a parameter Mood for annotating
an audio file so that users can search the list of songs relating to their mood at that
instant. The idea is great, but the question is, how will the music be annotated
with this Mood tag? It has been observed that search tags (like Genre/Mood)
are most of the time edited manually. This is prone to a lot of human errors and
needs a better solution.
This is where we intend to contribute so that the mood can be automatically
detected for a given audio file, with no manual intervention needed to annotate
a song with a particular mood. This can not only reduce manual effort, but also
minimize to a certain extent the human errors associated with it. This in turn
can thus help users to organize their music according to their moods instead of
remembering and searching for a particular artist or album name that contained
the song of that particular mood. A more interesting fact observed was that a great deal
of work has been done on non-Indian music so far. We therefore find it a challenging
task to see how we can utilize the relevant work done till now and take it further
to Indian music mood recognition, with the intention of not only providing a much enriched
user experience but also contributing to the good of the digital music community.
1.5 Thesis Objective and Scope
Most of the experimentation done in the field of music mood recognition has been
observed with respect to non-Indian music. Music being subjective to cultural
backgrounds, it is but natural that Indian Music might need a different treatment
as compared to non-Indian music. Our goal is to develop a music emotion recognition system for Indian popular music by analyzing the relation of timbral, spectral
and temporal features of an audio file with the emotion represented by that file.
The main goals of this thesis can be stated as:
a. To build an automatic mood recognition system for Indian popular songs.
b. To develop an open-source framework that can help analyze and experiment with
music data using various machine learning and data mining algorithms.
In order to achieve these ultimate goals, we have laid down a list of sub-goals
that will together help achieve the end objective:
a. Identifying the various moods associated with Indian popular music and
finalizing a mood model.
b. Identifying and finalizing the set of audio features important from a mood
perspective.
c. Identifying and developing tools required for extracting audio features.
d. Identifying and developing the Data Mining technique to construct the mood
classification model.
e. Design, implementation and testing of the framework integrating the whole
process of Mood classification and prediction.
The scope of the work is limited to Indian popular Hindi music.
1.6 Thesis Outline
The rest of the thesis is organized as follows: In Chapter 2 we give a brief description of the important papers and literature that we have studied or utilized as a
part of our literature survey. In Chapter 3, we discuss our mood model.
Chapter 4 explains the various features associated with music and the feature set
important from the perspective of this project. In Chapter 5, Data mining and
its use in mining mood from music is discussed. Chapter 6 puts forth the overall
design of the Music Mood identification system followed by Chapter 7 discussing
the experiments and corresponding results obtained for the performance of the
system. Chapter 8 gives an overview of the possible applications and uses of this
project. Chapter 9 outlines the conclusion and future work related to this project.
Finally, Chapter 10 lists the various project milestones and the publications’ status
that resulted during the course of the project.
Chapter 2
Literature Survey
The research and study behind this topic can be subdivided into three different sub-fields:
• First: Mood model, which involves identifying and defining the list of
adjectives precisely describing all possible moods.
• Second: Audio features identification and extraction, which involves
identifying and extracting the essential features from an audio file applying
signal processing algorithms and techniques in order to analyze the file.
• Third: Mining, machine learning algorithm, which involves learning
and choosing the appropriate algorithm(s) that help to efficiently mine the
music datasets with substantial accuracy.
2.1 Music Mood Model and Audio Features
Various experts in the fields of psychology and musicology have come up with models
describing human emotions. One of the earliest experiments, done by
Hevner [18], helped in categorizing various adjectives into 8 different groups, each
representing a class of mood. The model was more of a categorical model wherein
a list of adjectives representing the same emotion were grouped together. Russell [31]
later came up with the circumplex model representing human emotions on a circle
with each mood category plotted within the circle separated from other categories along the polar co-ordinates. Thayer [37] too came up with a dimensional
model plotted along two axes (Stress versus energy) with mood represented by
a two-dimensional co-ordinate system and lying on either of the two axes or the
four quadrants formed by the two-dimensional plot. Details of these models are
discussed in further chapters.
JungHyun Kim and team [20] proposed an Arousal-Valence (A-V) based mood
classification model for a music recommendation system. The collected music mood
tags and A-V values from 20 subjects were analyzed, and the A-V plane was classified into 8 regions depicting mood by using the k-means clustering algorithm. Their
work shows that some regions on the A-V plane can be identified by representative mood tags as in previous mood models, but some mood tags overlap in
almost all regions.
Akase and group [1] discuss an approach to feature extraction for audio
mood classification. In this task timbral information has been widely used;
however, many musical moods are characterized not only by timbral information
but also by musical scale and temporal features such as rhythm patterns and
bass-line patterns. Their paper proposed the extraction of rhythm and bass-line
patterns, and this unit pattern analysis is combined with statistical feature
extraction for mood classification. In combination with statistical features including MFCCs and a musical scale feature, the effectiveness of the features was verified
experimentally.
McKay and team [24] developed “jAudio”, an open-source audio feature extraction framework that includes implementations of 26 core features, including both
features proven in MIR research and more experimental, perceptually motivated
features. jAudio places an even greater emphasis on implementations of meta-features and aggregators that can be used to automatically generate many more
features from core features (for instance, standard deviation, derivative, running
mean etc.) that can be useful for music analysis. The tool has been quite useful
and widely accepted for music analysis research.
Dalibor Mitrovic and team’s work [7] deals with the statistical data analysis of
a broad set of state-of-the-art audio features and low-level MPEG-7 audio descriptors. The investigation comprises data analysis to reveal redundancies between
state-of-the-art audio features and MPEG-7 audio descriptors. The work employs
Principal Components Analysis, which reveals low redundancy between most of
the MPEG-7 descriptor groups. However, there is high redundancy within some
groups of descriptors such as the BasicSpectral group and the TimbralSpectral
group. Redundant features capture similar properties of the media objects and
should not be used in conjunction. The paper provides a good insight on the
choice of audio features for analysis.
2.2 Music classification
Doris Baum and team introduce EmoMusic [3] in their paper which presents a user
study on the usefulness of the “PANAS-X” emotion descriptors as mood labels
for music. It describes the attempt to organize and categorize music according
to emotions with the help of different machine learning methods, namely Self-organizing Maps and Naive Bayes, Random Forest and Support Vector Machine
classifiers. The study showed that emotions may very well be derivable in an
automatic way, although the procedure certainly can be refined further. Naive
Bayes and Random Forest classifiers can, for instance, be used to predict the
emotion of a piece of music with reasonable success.
Z. Fu and team [14] provide a comprehensive review of audio-based classification in their paper. It systematically summarizes the state-of-the-art techniques
for music classification along with recent progress in this field.
The survey emphasizes recent developments of the techniques and discusses
several open issues for future research. It provides an up-to-date
discussion of audio features and classification techniques used in the literature. In
addition, the individual tasks for music classification and annotation are also reviewed
and task-specific issues identified.
K. C. Dewi and A. Harjoko [10] put forth a music classification system based
on mood parameters using the K-Nearest Neighbor classification method and Self-Organizing Maps. The mood parameters used are based on Robert Thayer’s energy-stress model. The features used are the rhythm patterns of the music. Classification of music based on mood parameters by the K-Nearest Neighbor
and Self-Organizing Map methods with 30 songs reached an accuracy of 66.67%. Classification of music based on mood parameters with 120 songs reached 73.33% accuracy
for the K-Nearest Neighbor method and 86.67% for the Self-Organizing Map method.
B. Han and group [16] proposed SMERS, a music emotion recognition system
using Support Vector Regression. In their paper, automatic emotion
recognition of music has been evaluated using various machine learning classification algorithms such as SVM, SVR and GMM, with a remarkable increase in accuracy using SVR as compared to GMM. For further research, the paper suggests that
more perceptual features should be considered, as well as other classification algorithms
such as fuzzy methods and kNN (k-Nearest Neighbor). The paper also suggests comparing
the result of machine learning based emotion recognition with human performed
arousal/valence data.
Chia-Chu Liu and team [6] presented an emotion detection and classification
system for pop music. The system extracts feature values from the training music
files by PsySound2 and generates a music model from the resulting feature dataset
by a classification algorithm. The system is designed using a hierarchical framework followed by an accuracy enhancement mechanism. The experimental results
show that the system gives satisfactory performance. Furthermore, the system
aims at popular music, so it can be applied to public music database software to
provide emotion-based search. The features that affect the perception of emotion
are associated with frequency centroid, spectral dissonance and pure tonalness.
The paper suggests finding out the deeper relation between these features and
music emotion in order to have a more accurate music mood classification.
T. Li and M. Ogihara [21] discussed an SVM-based multi-label classification method
for two problems: classification into the thirteen adjective groups and classification
into the six super-groups. The experiments showed an overall low performance,
which can be attributed to the fact that there were numerous borderline cases
for which the labeler found it difficult to make a decision. The experiments show that
emotion detection is a rather difficult problem and improvement of performance
is the immediate issue. This can be addressed by expanding the sound data sets and
collecting labels in multiple rounds.
Trung-Thanh Dang and Kiyoaki Shirai [8] studied the classification of moods
of songs based on lyrics and meta-data, and proposed several methods for supervised learning of classifiers. The training data was collected from a LiveJournal
blog site in which each blog entry is tagged with a mood and a song. Then three
kinds of machine learning algorithms were applied for training classifiers: SVM,
Naive Bayes and graph-based methods. The results showed that the accuracy of the mood
classification methods is not good enough to apply to a real music search engine
system. There are two main reasons: mood is subjective meta-data, and lyrics are short
and contain many metaphors which only humans can understand. The authors
hence planned to integrate audio information with lyrics for further improvement.
As per the contribution of Atin Das and Pritha Das [9], explained in their
paper, some of the prevailing classifications of Indian songs were quantified by
measuring their fractal dimension. Samples were collected from three categories:
Classical, Semi-classical, and Light. After appropriate processing, the samples
were converted into time series datasets and their fractal dimension was computed.
The analysis presented here can be generalized to categorize different types of
songs. Samples can be chosen from playing a prerecorded song or directly from the
recorder device. Samples would be filtered to remove sounds from accompanying
musical instruments to get only the sound of the voice. In the present case this was
done manually and the length of music pieces used was not sufficient to accurately
classify the songs.
2.3 Summary
The literature survey helped us gain a better insight with reference to the mood
analysis of music, various techniques used for the same along with their current
performance limitations and corresponding improvement suggestions given by the respective authors. It is clearly evident that a lot of serious work has been going
on for automatic mood identification and music analysis. Observing the work
done so far, it is seen that Data Mining and Machine Learning techniques have
played a significant part in learning from music data. The fact still remains that the
accuracy achieved so far needs more improvement from a learning and identification
perspective, which calls for better algorithms and techniques. It is also seen that
classification techniques have been much more prevalent and have performed well in mining
music data as compared to clustering techniques, and we too prefer the former.
The striking and most important finding from the survey, which is worth noting,
is that much of the music research has been done on non-Indian music. Although
some work has been done on Indian Classical music, it has not been explored,
with respect to mood, to the extent that non-Indian music has. Indian
Popular Music accounts for almost 72% of the music sales in India, which shows
its immense popularity among the people. Identifying the lack of mood-based
categorizers and the growing popularity and use of Indian popular music, we take
this opportunity to develop an automatic mood recognition system for Indian popular music by analyzing existing classification mining techniques and developing a
novel approach to automatically categorize the songs belonging to Indian Popular
Music, according to their underlying mood.
Chapter 3
Music Mood Model
Most of the literature dealing with music and psychology tells us that music mood
is subjective and the mood of the same music piece can be interpreted differently
by different individuals. However, it is seen that there is considerable agreement
about the moods underlying music belonging to a similar cultural context [30].
Thus music belonging to a similar cultural background has a better chance of consensus among individuals in interpreting the mood of the song. Our work
limits the scope to Indian popular music, which falls under a common cultural context, thus increasing the chances of similar interpretations of the music among
individuals, when it comes to understanding mood.
In order to classify songs according to their mood, it is essential to identify
the list of moods which a song can be categorized into. This chapter explores
the various mood models that have been proposed and proven constructive in
categorizing music as per the emotions.
3.1 Music Mood Relation
Music psychology studies on music mood have a number of fundamental generalizations that can benefit MIR research as mentioned below:
• There does exist a mood effect in music, and studies have confirmed the existence of functions of music which can change people’s mood [5]. Also, it
comes naturally to listeners to associate mood labels with the music they listen
to [36].
• Not all moods are equally likely to be aroused by listening to music. For
instance, emotions like sadness, happiness and peace have a very high probability
of getting induced through music as compared to that of anger or disgust.
• There do exist uniform mood effects among different people. Sloboda and
Juslin [36] summarized that listeners are often consistent in their judgment
about the emotional expression of music.
• There is definitely some correspondence between listeners’ judgment regarding mood and musical parameters such as tempo, rhythm, dynamics,
pitch, mode, beats, harmony etc. [36]. People do relate to the tune or
rhythm of a song and most of the time hum the tune.
3.2 Mood(Emotion) Models
Mood models are generally studied by two approaches:
• Categorical approach: This introduces distinct classes of moods which form
the basis for all other possible emotional variations.
• Dimensional approach: This classifies emotions along several axes such as
valence (pleasure), arousal (activity), potency (dominance) and so on. This
is generally the most commonly used approach in music applications.
Human psychologists have done a great deal of work and proposed a number
of models of human emotions. Musicologists too have adopted and extended a
few of the influential models, which we will now go through. The six universal
emotions defined by Ekman [12]: anger, disgust, fear, happiness, sadness, and
surprise, are well known in psychology. However, since they were designed for
encoding facial expressions, some of them may not be suitable for music (for
instance, disgust), and some common music moods are missing (for instance, calm
or soothing).
3.2.1 Hevner's experiment
In music psychology, the earliest and still best known systematic attempt at creating a music mood taxonomy was by Kate Hevner [18]. Hevner examined the affective
value of six musical features, namely tempo, mode, rhythm, pitch, harmony and
melody, and studied how they relate to mood. Based on the study, 67 adjectives
were categorized into eight different groups, each containing similar emotions.
Figure 3.1 shows the emotional groups with the adjectives belonging to each group.
Figure 3.1: Hevner’s Mood Model
3.2.2 Russell's model
Both Ekman’s and Hevner’s models belong to the categorical approach because the mood
spaces consist of a set of discrete mood categories. On the contrary, James Russell [31] came up with a circumplex model of emotions arranging 28 adjectives in
a circle on a two-dimensional bipolar space (arousal - valence). This model helped
in separating and keeping apart the opposite emotions. Figure 3.2 depicts
Russell’s model, which has been adopted in a considerable number of musical psychology studies [31] [35] [38].
Figure 3.2: Russell’s Mood Model
3.2.3 Thayer's model
Yet another well-known dimensional model was proposed by Thayer [37]. It describes the mood with two factors: Stress dimension (happy/anxious) and Energy
dimension (calm/energetic), and divides music mood into four clusters according
to the four quadrants in the two-dimensional space: Contentment, Depression, Exuberance and Anxious (Frantic). In this model, Contentment refers to happy and
calm music; Depression refers to calm and anxious music; Exuberance refers to
happy and energetic music; and Anxious (Frantic) refers to anxious and energetic
music. Such definitions of the four clusters are clear and have high discriminatory
power. Such a dimensional mood model which divides the whole music emotion
space into four meaningful quadrants facilitates rough music mood categorization
and thus is also widely adopted in mood recognition studies. Figure 3.3 depicts
Thayer’s model for mood.
Figure 3.3: Thayer’s Mood Model (Energy/Stress plane with quadrants Anxious, Exuberance, Depression and Contentment)
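To make the dimensional idea concrete, the following is a small illustrative Python sketch (ours, not from the thesis) that maps a point on a stress/energy plane to one of Thayer's four clusters; the sign conventions are assumptions made only for illustration.

def thayer_quadrant(stress: float, energy: float) -> str:
    """Map a (stress, energy) coordinate to one of Thayer's four mood clusters.

    Assumed sign convention: positive stress means 'anxious', negative stress
    means 'happy'; positive energy means 'energetic', negative energy means 'calm'.
    """
    if energy >= 0:
        return "Anxious (Frantic)" if stress >= 0 else "Exuberance"
    return "Depression" if stress >= 0 else "Contentment"

print(thayer_quadrant(-0.4, 0.7))  # happy and energetic -> Exuberance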
3.2.4 Indian Classical model: Navras
Since we are considering the analysis of Indian Music, we need to have a look at
the traditional mood model that has been prevalent since ancient times in Indian
Classical Music, which forms the base for Indian music. Navras, as it is termed in
Sanskrit, means nine sentiments. This model sums up all the major categories of
emotions that a human can exhibit into a total of nine classes. These nine sentiments
are depicted in Figure 3.4.
Figure 3.4: Navras: Indian Classical emotion model (Karuna/Pathos, Shringar/Love, Veer/Valor, Hasya/Happy, Raudra/Angry, Bhayanak/Horrific, Vibhatsa/Disgust, Adbhut/Surprise, Shaanti/Peace)
Studying the advantages and short-comings of the various models discussed so far, and
taking into consideration the Indian popular music scenario, deriving a
mood model exactly from the existing ones mentioned in the literature cannot do
justice in selecting the mood categories. Hence, we try to put forth a simple mood
model covering the majority of mood aspects after careful study and experiments, as
will be witnessed in the coming chapters.
Chapter 4
Audio Features
4.1 Low level Audio Features
The key components of a classification system are feature extraction and classifier
learning [11]. Feature extraction addresses the problem of how to represent the
music pieces to be classified in terms of feature vectors or pair-wise similarities.
Many audio features have been proposed in the literature for music classification. Different taxonomies exist for the categorization of audio features. Weihs et
al. [40] have categorized the audio features into four subcategories, namely short-term features, long-term features, semantic features, and compositional features.
Scaringella [33] followed a more standard taxonomy by dividing audio features
used for genre classification into three groups based on timbre, rhythm, and pitch
information, respectively. Each taxonomy attempts to capture audio features from
certain perspective. Zhouyu Fu [42] characterizes the audio features into two levels: low-level and middle-level features as seen in Figure 4.1. Our audio feature
selection is inspired by this two-tier taxonomy of audio features.
Low-level features, although not closely related to the intrinsic properties of
music as perceived by human listeners, form the basic features which can be used
to derive the mid-level features providing a closer relationship; the mid-level features mainly include
three classes, namely rhythm, pitch, and harmony, as seen in Figure
4.1. In our work we focus only on the low-level audio features, which can be further
categorized as:
• Timbral features: These capture the tonal quality of sound that is related
to different instrumentation. “Timbre” is the quality of a musical note or
sound or tone that distinguishes different types of sound production, such as
voices and musical instruments, string instruments, wind instruments, and
percussion instruments. The physical characteristics of sound that determine
the perception of timbre include spectrum and envelope.
• Temporal features: These capture the variation and evolution of timbre over
time. In this work, more focus is laid on the instantaneous timbre values
rather than their temporal variation, although not completely ignored.
Figure 4.1: Audio Features Taxonomy (top-level labels such as genre, mood, instrument and artist; mid-level pitch, rhythm and harmony features; low-level timbral and temporal features)
These low-level features are extracted using various signal processing techniques like Fourier transform, Spectral/Cepstral analysis, autoregressive modeling
and similar computations. We follow the MPEG-7 standardization [28] and make
use of the jAudio [24] and Marsyas [25] open source tools for extracting selective
timbral, spectral and temporal audio features from the music pieces. The features are extracted and consolidated for each music piece in the standard Attribute-Relation File Format (ARFF) [2] so as to make it easy to mine the relations
between these features with respect to the corresponding mood of the audio files.
After a careful study and survey of various experts’ papers and publications, our
current consolidated list of selected and extracted features is as mentioned below.
The list below names each form of audio feature, but the feature vector
comprises its actual value as well as corresponding meta-features such as standard deviation, mean and logarithm wherever required, as identified by McKay and
team [24].
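As an illustration of how such a consolidated feature file might look, the following is a hypothetical ARFF fragment; the relation name, attribute names and mood labels are invented for this sketch and are not the exact ones used in the thesis.

% Hypothetical ARFF fragment (attribute names and mood labels are illustrative)
@relation indian_music_mood
@attribute rms_mean numeric
@attribute rms_stddev numeric
@attribute spectral_centroid_mean numeric
@attribute zero_crossings_mean numeric
@attribute mood {happy, sad, calm, excited}
@data
0.182,0.044,0.317,0.091,happy
0.074,0.021,0.158,0.036,sad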
4.2 Feature List
• Root Mean Square (RMS): RMS is calculated on a per window basis.
It is defined by the equation:
$$\mathrm{R.M.S.} = \sqrt{\frac{\sum_{n=1}^{N} x_n^2}{N}}$$
where N is the total number of samples provided in the time domain. RMS
is used to calculate the amplitude of a window.
• Magnitude Spectrum: This feature extracts the FFT (Fast Fourier Transform) magnitude spectrum from a set of audio samples. It gives a good idea
about the magnitude of different frequency components within a window.
The magnitude spectrum is found by first calculating the FFT with a Hanning window [give ref]. The magnitude spectrum value for each bin is found
by first summing the squares of the real and imaginary components. The
square root of this is then found and the result is divided by the number of
bins.
• Power Spectrum: This feature extracts the FFT power from a set of audio samples. It gives a good idea about the power of different frequency
components within a window.
• Spectral Roll-off Point [15]: The spectral roll-off point is the fraction
of bins in the power spectrum at which 85% of the power is at lower frequencies. It denotes the amount of the right-skewness of the power spectrum
• Spectral Centroid [15]: This is a measure of the ”center of mass” of the
power spectrum. It is obtained by calculating the mean bin of the power
spectrum. The result returned is a number from 0 to 1 that represents at
what fraction of the total number of bins this central frequency is.
• Spectral Flux: It measures the amount of spectral change of a signal by
calculating the difference between the current value of each magnitude spectral bin in current window and the corresponding value of the magnitude
19
4.2 Feature List
spectrum of the previous window. Each of these differences is then squared,
and the result is the sum of the squares.
• Spectral Variability: It stands for the standard deviation of the magnitude spectrum of the audio signal.
• Fraction of low energy windows [26]: This measures the quietness of
the signal relative to the rest of a signal and is calculated by taking the mean
of the RMS (Root Mean Square) of the last 100 windows and finding what
fraction of these 100 windows are below the mean.
• Zero Crossings [26]: This feature helps identify the pitch as well as the
noisiness of a signal. It is calculated by finding the number of times the
signal changes sign from one sample to another crossing the zero value.
• Strongest Beat: This feature finds the strongest beat in a signal.
• Beat sum: It is calculated by summing up the beat values of a signal and
gives a measure of how important a role regular beats play in a piece of music.
• Beat histogram: This feature helps to identify the strength of different
rhythmic periodicities in a signal. This is calculated by taking the RMS of
256 windows and then taking the FFT of the result.
• Strongest frequency via Zero Crossings: It denotes the highest frequency of the signal present at the Zero crossing point. This is found by
mapping the fraction in the zero-crossings to a frequency in Hertz.
• Mel Frequency Cepstral Coefficients (MFCC): This feature constitutes the coefficients
derived from the cepstral representation of the audio signal such that the frequency bands are equally spaced on the Mel scale, approximating the human
auditory system’s response more closely. MFCCs are most commonly and
widely used as features in speech recognition systems. In recent times these
20
4.2 Feature List
features are increasingly finding uses in Music information retrieval, audio
similarity measures, genre classification.
• Linear Predictive Coding (LPC) coefficients: This feature helps in
representing the spectral envelope of an audio or speech signal.
• Spectral smoothness: This feature is calculated by evaluating the log of a
partial minus the average of the log of the surrounding partials and is based
upon Stephan McAdam’s Spectral Smoothness [23]. This feature helps in
identifying the peak based calculation of the smoothness of an audio signal.
• Relative difference function: It represents the onset detection and is
calculated as the log of the derivative of the Root Mean Square value.
• Mood: This is the class attribute that needs to be populated during the
training and which is detected automatically while testing the classifier
against a new audio file.
In the given list of audio features some features have a single dimension - for
instance, Strongest Beat, which has just one value - and on the contrary, some
features have variable dimensions - for instance Beat Histogram, which has a series
of values exhibiting the histogram. The variable dimension, however, depends on
the window size, which in the case of this work has been kept constant at “32” for
every 30-second audio clip in the data-set. Hence, including the class attribute,
the data-set consists of a total of 330 feature vectors. A minimal sketch of how a few of the per-window features listed above can be computed is given below.
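The following illustrative Python/NumPy sketch is our own and not the thesis's jAudio/Marsyas pipeline; it computes a handful of the per-window features defined above (RMS, zero crossings, spectral centroid and the 85% spectral roll-off point). The window length and the synthetic signal are hypothetical.

import numpy as np

def window_features(window: np.ndarray) -> dict:
    """Compute a few of the low-level per-window features described above."""
    n = len(window)

    # Root Mean Square: amplitude of the window.
    rms = float(np.sqrt(np.mean(window ** 2)))

    # Zero crossings: number of sign changes between consecutive samples.
    zero_crossings = int(np.count_nonzero(np.sign(window[:-1]) != np.sign(window[1:])))

    # Power spectrum of the Hann-windowed frame.
    power = np.abs(np.fft.rfft(window * np.hanning(n))) ** 2
    bins = np.arange(len(power))

    # Spectral centroid, expressed as a fraction of the total number of bins.
    centroid = float(np.sum(bins * power) / np.sum(power) / len(power))

    # Spectral roll-off: fraction of bins below which 85% of the power lies.
    cumulative = np.cumsum(power)
    rolloff = float(np.searchsorted(cumulative, 0.85 * cumulative[-1]) / len(power))

    return {"rms": rms, "zero_crossings": zero_crossings,
            "spectral_centroid": centroid, "spectral_rolloff": rolloff}

# Hypothetical usage on one 1024-sample window of a synthetic 440 Hz tone.
t = np.arange(1024)
window = 0.5 * np.sin(2 * np.pi * 440 * t / 44100.0)
print(window_features(window))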
Chapter 5
Mining Mood from Audio Features
5.1 Overview of Data Mining
Data mining is the field which deals with the extraction of interesting, non-trivial,
implicit, previously unknown and potentially useful patterns or knowledge from
huge amounts of data. Data mining is often termed knowledge discovery in
databases (KDD). A typical knowledge discovery process is depicted
in Figure 5.1.
Data Mining can be considered as a confluence of various disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques
from other disciplines may be applied, such as neural networks, fuzzy and/or
rough set theory, knowledge representation, inductive logic programming, or high-performance computing. Depending on the kinds of data to be mined or on the
given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image
analysis, signal processing, computer graphics, Web technology, economics, business, bio-informatics, or psychology. Figure 5.2 depicts a few of these prominent
disciplines closely associated with Data Mining.
5.2 Overview of Data Mining functionalities
Data mining functionalities are used to specify the kind of patterns to be found
in data mining tasks. In general, data mining tasks can be classified into two
categories: descriptive and predictive. Descriptive mining tasks characterize the
general properties of the data in the database. Predictive mining tasks perform
inference on the current data in order to make predictions. Since this work is
related to predicting the mood underlying a particular music piece, the focus of
the work is directed towards exploring the “predictive” mining tasks rather than
“descriptive” mining. Following are the various Data Mining functionalities that
exist formally, with a short description of each:
• Characterization and discrimination: Data characterization is a summarization of the general characteristics or features of a target class of data.
The data corresponding to the user-specified class are typically collected
by a database query. For example, to study the characteristics of software
products whose sales increased by 10% in the last year, the data related to
such products can be collected by executing an SQL query. Data discrimination is a comparison of the general features of target class data objects
with the general features of objects from one or a set of contrasting classes.
The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries. The output in
both cases can be in the form of pie-charts, bar graphs and similar constructs
for the analyst to study the data.
• Frequent patterns, Association, Correlation: Frequent patterns, as
the name suggests, are patterns that occur frequently in data. There are
many kinds of frequent patterns, including item-sets, subsequences, and substructures. The data under consideration can be analyzed for such frequently
occurring patterns of data attributes which leads to the discovery of interesting associations and correlations within data.
• Classification and prediction: Classification is the process of finding
a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class
of objects whose class label is unknown. The derived model is based on
the analysis of a set of training data (i.e., data objects whose class label
is known). Whereas classification predicts categorical (discrete, unordered)
labels, prediction models continuous-valued functions. That is, it is used to
predict missing or unavailable numerical data values rather than class labels.
• Cluster analysis:
Unlike classification and prediction, which analyze
class-labeled data objects, clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the
training data simply because they are not known to begin with. Clustering
can be used to generate such labels. The objects are clustered or grouped
based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. That is, clusters of objects are formed so that
objects within a cluster have high similarity in comparison to one another,
but are very dissimilar to objects in other clusters.
• Outlier analysis: A database may contain data objects that do not
comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or
exceptions. However, in some applications such as fraud detection, the rare
events can be more interesting than the more regularly occurring ones. The
analysis of outlier data is referred to as outlier mining.
• Trend and evolution analysis: Data evolution analysis describes and
models regularities or trends for objects whose behavior changes over time.
Although this may include characterization, discrimination, association and
correlation analysis, classification, prediction, or clustering of time related
data, distinct features of such an analysis include time-series data analysis,
sequence or periodicity pattern matching, and similarity-based data analysis.
5.3 Classification
This work involves learning the mood aspect of music by analyzing the various
feature vectors extracted from each audio file. The learning thus done can facilitate
identifying which specific category of mood a particular audio file belongs to,
provided its fixed set of audio features are available. Of all the functionalities of
data mining just described, “classification” and “cluster analysis” seem to be the
best methods of discovering the mood information from the music feature dataset. Also, as witnessed in most of the literature survey, classification algorithms
have always proved to be quite effective as compared to others in analyzing the
mood or genre aspect of music data-sets so far. Our own experimentation too has
revealed a quite higher performance of classification algorithms as compared to
clustering algorithms. Hence, we opt for the classification techniques of mining
this music feature data-set with a supervised learning approach.
Figure 5.3 shows the general process of classification.
Figure 5.3: Classification process
It is a two-step process:
• First: This step, also called the "learning step" or "training phase", involves learning a mapping or function y = f(X) that can predict the associated class label y of a given tuple X. In this view, we wish to learn a mapping or function that separates the data classes. Typically, this mapping is represented in the form of classification rules, decision trees, or mathematical formulae. This mapping or function is generally termed the "Classification Model". As seen in step 1 of Figure 5.3, each row of
the table represents a tuple X. The function f(X) is learnt through a training process using classification algorithms, and the corresponding rule is stored in the classifier model. This rule helps in predicting whether the person represented by the tuple X is tenured (yes) or not (no), depending upon the values of the various attributes of the tuple.
• Second: The model is now used for classification. It is evaluated against the test dataset in order to predict the class label of each data instance using what has been learned. The results are compared with the actual classes of the test data, and it is accordingly decided whether the model is accurate enough to classify the test data. If the model is acceptable, it can be used further for classifying data with unknown classes. As seen in step 2 of Figure 5.3, the classifier model evaluates an unknown tuple X by applying the learnt function f(X) in order to predict its outcome.
5.3.1 Classification using Decision-tree
Classification of data can be achieved by various methods, to name a few:
• Classification by Decision Tree Induction
• Bayesian Classification
• Artificial Neural Networks
• Rule-Based Classification
• Classification by Back-propagation
• Support Vector Machines
• Associative Classification
Figure 5.4: Classification using Decision Tree
Since this work focuses on a decision-tree based classification approach, the description of the remaining methods is outside the scope of this document, although relevant information can be found in the book by Han and Kamber [17]. Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. A typical example of a decision tree is shown in Figure 5.4.
Given a tuple X whose associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules. Figure 5.4 represents a decision tree that predicts whether a credit should be sanctioned (class values: yes, no) depending upon various credit risk assessment parameters like age, current credit rating and profession. For instance, a senior with an excellent credit rating has a much higher chance of the credit being sanctioned than one with only a fair credit rating.
Why Decision trees?
Following are a few strong reasons why decision trees are considered so often when it comes to classification techniques [17]:
• The construction of decision tree classifiers does not require any domain
knowledge or parameter setting, and therefore is appropriate for exploratory
knowledge discovery.
• Decision trees can handle high dimensional data.
• Their representation of acquired knowledge in tree form is intuitive and
generally easy to assimilate by humans.
• The learning and classification steps of decision tree induction are simple
and fast.
• In general, decision tree classifiers have good accuracy.
• Decision tree induction algorithms have been successfully used for classification in many application areas, such as medicine, manufacturing and
production, financial analysis, astronomy, and molecular biology.
5.3.2 Random Forest Classification
In order to improve classification accuracy, ensemble methods like bagging and boosting have proved quite productive. Ensemble methods combine a series of k learned classification models, M1, M2, ..., Mk, with the aim of creating an improved model in terms of classification accuracy. Our work makes use of the "Bagging" approach, also called "Bootstrap aggregation". In this method, bootstrap samples of the dataset are created by randomly sampling features and data instances from the given training set with replacement. These samples are then independently and simultaneously used to train and learn a separate classifier model for each sample. Finally, classification is done by taking the majority of the votes from all the learnt models. The Random Forest approach learns such an ensemble consisting of a bagging of un-pruned decision tree learners with a randomized selection of features at each split. This is done by randomly sampling a feature subset for each decision tree (as in Random Subspaces [19]), and/or by randomly sampling a training data subset for each decision tree (as in Bagging [4]).
Random Forests Algorithm
Following is a simplified algorithm explaining the Random Forest approach:

Data: Training set, Ntrees = number of trees
Result: Majority vote of classification
initialization;
for i ← 1 to Ntrees do
    Select a new bootstrap sample from the training set;
    Grow an un-pruned tree on this bootstrap sample;
    for each internal node do
        Mtry ← random subset of predictors;
        Choose the best split among these Mtry predictors;
    end
    Save the un-pruned tree built;
    Record the vote of classification for each class;
end
Return the majority vote;
Algorithm 1: Random forest
CART (Classification and Regression Trees) is chosen for building the randomly generated trees, as it is evident from the literature [4] that Random Forests built from CART trees yield better results than other tree algorithms in most cases.
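As a concrete illustration of Algorithm 1, the sketch below shows how a single Random Forest could be trained on the extracted ARFF feature data using the Weka API referenced in this work. It is only a minimal sketch under stated assumptions: the file name features.arff and the number of trees (100) are illustrative rather than values prescribed by this project, Mtry is left at Weka's default, and the setNumTrees method name corresponds to Weka 3.6/3.7 (newer releases rename it to setNumIterations).

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestBaseline {
    public static void main(String[] args) throws Exception {
        // Load the music feature dataset (hypothetical file name).
        Instances data = new DataSource("features.arff").getDataSet();
        // The "Mood" attribute is assumed to be the last attribute in the ARFF file.
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest forest = new RandomForest();
        forest.setNumTrees(100);   // Ntrees in Algorithm 1 (Weka 3.6/3.7 method name)
        // Mtry (features tried per split) is left at Weka's default here.

        // 10-fold cross-validated estimate of the forest's performance.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(forest, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```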
5.4 Random Forest Highlights
Random Forests have time and again proven useful and effective in many classification problem scenarios. Here are a few highlights of this approach that make it appropriate and suitable for the purpose of mood classification of high-dimensional music datasets:
• Random forests readily handle a large number of input features.
• They are faster to train and evaluate as compared to other comparable approaches.
• Random forests exhibit stronger resistance to over-training and thus to over-fitting.
• Separate cross-validation is unnecessary in the case of Random forests, since it is already taken care of at the time of forest building.
• Random forests generally have accuracy similar to Support Vector Machines and Neural Networks, although Random forests have shown much better performance on huge, high-dimensional datasets.
5.5 Our Approach: Bagging of Random Forests
In this work we present an additional level of ensemble by generating an ensemble of Random Forests using bootstrap aggregation, also known as Bagging. The algorithm for the same is given below.
5.5.1 Algorithm

Data: Training set, Nforests = number of forests
Result: Majority vote of classification
initialization;
for i ← 1 to Nforests do
    Select a new bootstrap sample from the training set;
    Generate a Random Forest on this bootstrap with un-pruned random trees as mentioned in Algorithm 1;
    Save the Random Forest built;
    Record the majority vote of classification among the trees;
end
Return the majority vote of classification among the Random Forests;
Algorithm 2: Bagging of Random Forests
For growing the random trees, the randomly sampled data attributes are split on the basis of the "Gini Index", which has shown better results when working with CART trees. The Gini Index measures the impurity of the dataset D at hand and is given by the formula:

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2

where p_i is the probability that a tuple in D belongs to class C_i. The sum is computed over m classes. The Gini index considers a binary split for each attribute. For each split, a weighted sum of the impurity of each resulting partition is calculated. Suppose dataset D is split into partitions D1 and D2 on the basis of attribute A1; then the Gini Index of attribute A1 for splitting dataset D is given by:
Gini_{A1}(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
The Gini Index is computed for all the eligible splitting attributes, and the reduction in impurity for each attribute is calculated by the formula:

\Delta Gini_{A1} = Gini(D) - Gini_{A1}(D)

The attribute maximizing the above-mentioned reduction in impurity is selected as the splitting attribute.
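To make the split-selection arithmetic concrete, the small Java helper below computes Gini(D), Gini_A(D) for a binary split, and the resulting reduction in impurity from raw class counts. It is an illustrative sketch, not part of the project code, and the example counts quoted after the listing are made up.

```java
public final class GiniIndex {

    // Gini(D) = 1 - sum_i p_i^2, computed from per-class counts of a partition.
    public static double gini(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        if (total == 0) return 0.0;
        double sumSquares = 0.0;
        for (int c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Gini_A(D) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2) for a binary split on A.
    public static double giniOfSplit(int[] d1Counts, int[] d2Counts) {
        int n1 = 0, n2 = 0;
        for (int c : d1Counts) n1 += c;
        for (int c : d2Counts) n2 += c;
        int n = n1 + n2;
        return ((double) n1 / n) * gini(d1Counts) + ((double) n2 / n) * gini(d2Counts);
    }

    // Delta Gini_A = Gini(D) - Gini_A(D); the attribute maximizing this is chosen.
    public static double giniReduction(int[] dCounts, int[] d1Counts, int[] d2Counts) {
        return gini(dCounts) - giniOfSplit(d1Counts, d2Counts);
    }
}
```

For example, a parent node with class counts {6, 4} has Gini(D) = 0.48; splitting it into partitions {5, 1} and {1, 3} gives Gini_A(D) of roughly 0.317 and hence a reduction in impurity of about 0.16.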
The approach mentioned in Algorithm 2 has not only shown a rise in classification accuracy on music datasets compared to the traditional Random Forest approach, but has also shown consistently better performance than other classification techniques, as will be discussed in the coming chapters.
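A minimal sketch of how the bagging-of-Random-Forests ensemble of Algorithm 2 could be assembled with Weka's meta-classifiers is given below. It assumes Weka 3.x; the file name mood_train.arff, the number of forests (10) and the bag size are illustrative choices, not the exact configuration used in the reported experiments.

```java
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggedRandomForests {
    public static void main(String[] args) throws Exception {
        // Training data: one feature vector per 30-second clip, "Mood" as class attribute.
        Instances train = new DataSource("mood_train.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        // Nforests bootstrap samples, each used to grow one Random Forest (Algorithm 2).
        Bagging ensemble = new Bagging();
        ensemble.setClassifier(new RandomForest()); // base learner = one forest per bag
        ensemble.setNumIterations(10);              // Nforests (illustrative value)
        ensemble.setBagSizePercent(100);            // bootstrap samples of full training size
        ensemble.buildClassifier(train);

        // Persist the learnt model so the Mood Detector can reuse it later.
        SerializationHelper.write("bagged_rf.model", ensemble);
    }
}
```

At prediction time Weka's Bagging combines the class-probability estimates of the individual forests by averaging rather than by a strict hard vote, which is a close practical stand-in for the majority voting described in Algorithm 2.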
Chapter 6
Mood Identification System
6.1 Mood Model Selection
As seen in the literature, the mood models studied were proposed mostly from the perspective of psychology. If we map musical emotion onto dimensional models like Thayer's model [37] or Russell's model [31], the different emotions are plotted as coordinates on a two-dimensional plane, which then have to be grouped together to obtain distinct categories of emotions. There is always a trade-off in the number of emotions a mood model can portray: a very large number of moods can be confusing and frustrating for an end-user who has to choose a song belonging to one of them, while a very small number will be too general to isolate the basic emotions. Since most of the mood models in the literature have been evaluated on non-Indian music, in this work we also consider the Indian perspective expressed by the nine sentiments (Navras) mentioned in section 3.2.4.
Out of these nine emotions, however, emotions like anger, horror and surprise are very rarely conveyed by music alone. These emotions are a combined effect of music, expression and act. Also, some emotions like Hasya (happiness) need a further subdivision, for instance into happy and excited. Hence this model cannot be used as-is for analyzing the mood aspect of Indian popular songs; it needs further changes reflecting how people interpret these songs.
A list of 2500 well-known Indian popular songs was chosen by surveying the songs most liked by the majority of people. A short experiment similar to Hevner's [18] was conducted with the help of a panel of 5 music listeners, wherein each member of the panel was independently supposed to listen to a 30-second clip of each of the 2500 songs and note down the best adjective(s) that they thought described the song's emotion aptly. The panel consisted of one music
expert, two avid music listeners and two common music listeners. The adjectives collected were then grouped together depending on their similarity and the music clip under consideration. A total of five mood groups were categorized, each covering a group of adjectives for songs with a similar emotional quotient. These five
categories of moods form our mood model, as shown in Table 6.1.

Table 6.1: Mood Model: Indian popular Hindi music

Mood Category | Adjectives represented
Happy         | cheerful, funny, comic, happy, jovial
Sad           | depressed, frustrated, angry, betrayal, withdrawal, serious
Silent        | peaceful, calm, silent, nostalgia, slow-paced
Excited       | danceable, celebration, fast-track, excited, motivational, inspirational
Romantic      | love, romantic, playful

6.2 System Overview
The Mood Identification system is the main engine that helps identify the mood of given music or audio files. The system is designed as an open source software system. It would generally be part of the back-end in most applications, and its result can be used by the application layer on top to utilize the information in the required way. The system has the two-fold objectives mentioned below:
1. The system should have a provision for analyzing music files and learning the classifier models associated with them.
2. It should be able to predict the class of mood that a particular audio file or piece of music belongs to.
An abstract view of the Mood Identification system as seen from a user's perspective is shown in Figure 6.1. The system accepts music files as input from the user and returns the respective mood associated with each file to the end-user.
Figure 6.1: Mood Recognition System
6.3 System Design and Components
The system can be divided into several components, each dedicated to a particular task, as shown in Figure 6.2 and explained in the following subsections.

Figure 6.2: Mood Detection System: Detailed Design
6.3.1 Audio Pre-processor
This component, as the name signifies, has the main task of preprocessing the audio files that are fed by the user to the system. The preprocessing task involves:
a. Audio file splitting: Each input music file is split into consecutive clips, each of 30 seconds duration. A song is generally at least a couple of minutes long, which makes it difficult to analyze due to the enormous data content within this duration. Moreover, 30 seconds has proven to be a good duration from an analysis point of view, as it is not too short to lose any important content and not too long to increase the processing time. It is very much possible to relate a particular mood to a song by listening to just a 30-second excerpt of it.
b. Audio format conversion: Each 30-second music clip is converted to a standard WAV format (PCM signed 16-bit, stereo) with a sampling rate of
44.1 kHz. Currently the system supports MP3 and WAV formats, which are widely used for audio. Conversion of other formats can also be supported by extending the existing code interfaces. Conversion to a single format is necessary to ensure that the files to be processed and analyzed are consistent in structure and format, thereby ensuring uniform treatment and processing, unlike the case where the formats differ.
This component thus makes sure that the input music files provided by the user are transformed so that they are ready for further processing and analysis; a small sketch of the conversion step follows.
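The sketch below illustrates, under stated assumptions, how the format-conversion step could be realized with the standard javax.sound.sampled API: it converts an input clip to PCM signed 16-bit stereo at 44.1 kHz and writes it as WAV. Plain Java SE only decodes WAV/AIFF/AU, so handling MP3 input as described in this chapter would additionally require an MP3 decoder plugin on the classpath, and whether a given conversion (e.g., a sample-rate change) succeeds depends on the installed audio service providers. File names are hypothetical.

```java
import java.io.File;

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WavConverter {
    public static void toStandardWav(File in, File out) throws Exception {
        // Target format used throughout this work: PCM signed 16-bit, stereo, 44.1 kHz.
        AudioFormat target = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                44100f,  // sample rate
                16,      // bits per sample
                2,       // channels (stereo)
                4,       // frame size in bytes (2 channels * 2 bytes)
                44100f,  // frame rate
                false);  // little-endian

        AudioInputStream source = AudioSystem.getAudioInputStream(in);
        AudioInputStream converted = AudioSystem.getAudioInputStream(target, source);
        AudioSystem.write(converted, AudioFileFormat.Type.WAVE, out);
        converted.close();
        source.close();
    }

    public static void main(String[] args) throws Exception {
        toStandardWav(new File("clip_30s.wav"), new File("clip_30s_std.wav"));
    }
}
```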
6.3.2 Audio Feature Extractor
This module revolves around the audio signal features associated with the music clips obtained from the Audio Pre-processor. The module performs two main tasks:
a. Feature Extraction: Each 30-second music clip received as input is processed by applying mathematical signal computations like Fourier transforms, logarithms and integrals, to name a few, and their variants and combinations. These mathematical functions represent each of the features mentioned in the list discussed in section 4.2. The module implements the computations and functions involved in calculating each of the
features mentioned. Most of the module implementation is inspired by and extended from the well-known open source tool jAudio [24], with some variations and customizations as required for this work. The features extracted are either fixed-dimensional or multi-dimensional. A feature vector comprising all of these features is extracted for each music clip.
b. Dataset generation: The feature vectors thus extracted form the attributes of each music clip, which can then be called a data instance. The feature vectors computed in memory are stored in a flat file following the standard ARFF file format understood by most data mining tools like Weka [41]. In addition to the features extracted, another attribute called "Mood" is appended to each data instance. This attribute is manually updated in the case of a training set and can hold any dummy value in real scenarios of mood prediction. A sketch of this step using the Weka API follows.
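A minimal sketch of the dataset-generation step using the Weka API is shown below, assuming Weka 3.7 or later (where DenseInstance and the List-based Attribute constructor are available). The feature names, the two-feature vector and the file name are placeholders standing in for the real feature set described in section 4.2.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;

public class ArffWriterSketch {
    public static void main(String[] args) throws Exception {
        // Numeric audio features plus the nominal "Mood" class attribute.
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("spectral_centroid"));   // placeholder feature
        attrs.add(new Attribute("zero_crossing_rate"));  // placeholder feature
        attrs.add(new Attribute("Mood", Arrays.asList(
                "happy", "sad", "silent", "excited", "romantic")));

        Instances data = new Instances("mood_features", attrs, 0);
        data.setClassIndex(data.numAttributes() - 1);

        // One row per 30-second clip; dummy values stand in for real extracted features.
        double[] row = new double[data.numAttributes()];
        row[0] = 1523.7;
        row[1] = 0.084;
        row[2] = data.attribute("Mood").indexOfValue("romantic"); // label, or dummy at test time
        data.add(new DenseInstance(1.0, row));

        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("mood_features.arff"));
        saver.writeBatch();
    }
}
```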
6.3.3 Mood Identification System
This is the main processing unit of the whole system and is responsible for mining the mood from the music dataset obtained as input from the audio feature extractor module. It comprises the actual implementation of the algorithms mentioned in section 5.5. The module has two important roles to perform, as mentioned below:
a. Mood Learner: In this case, the input received is a training dataset of
music features with the "Mood" attribute manually updated by domain experts for training purposes. The Mood Learner can make use of existing mining algorithms or newly written ones, provided they follow the convention and framework laid down by the Weka tool [41]. Thus, this module can serve as an experimenter, so that an analyst or researcher can utilize it to try various algorithms to mine the mood aspect of the underlying music dataset. The classifier model learnt can be saved so that it can be utilized later for evaluation purposes. The output of this part of the module is generally useful to end-users who are analysts or researchers, keen to understand and tune the machine learning aspect of this whole process. Using this module, the classifier model for the bagging of Random Forests approach was trained and stored after careful evaluation and comparison with other comparable models. Mood learning is generally a one-time activity. Once done, the model is saved and can be re-used for evaluations any number of times. However, depending upon the user's preference, the learning can be made iterative to improve accuracy with the most up-to-date music data, which
evolves over time to a great extent. This change, however, might require a few code changes, which are currently out of the scope of this project.
b. Mood Detector: In this case, the music dataset received as input has some dummy data in the "Mood" attribute, as this feature is not known and is expected to be predicted by this module. The Mood Detector then evaluates the dataset under consideration against the mood classifier model that has been saved. The evaluation results in predicting the mood for every 30-second music clip that was fed to the system by the user. In case a whole song was fed instead, the system returns the maximum-voted mood among the moods predicted for all of the clips derived from that song. The output of this module is generally used by an end-user application, such as a mood annotator or any Music Information Retrieval application, or even by the end-user himself/herself. Although the module helps in detecting the mood of the music under consideration, the whole and sole control of accepting or rejecting this decision can always be given to the end-user with some minor enhancements to the code. A minimal sketch of this detection step follows.
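The detector side can be sketched in a few lines of Weka code: load the saved classifier, classify each 30-second clip's feature vector, and take the most frequent predicted mood as the song-level label. Again this is a hedged sketch; the model and ARFF file names are hypothetical, and the real module additionally handles user interaction and file management.

```java
import java.util.HashMap;
import java.util.Map;

import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class MoodDetectorSketch {
    public static void main(String[] args) throws Exception {
        Classifier model = (Classifier) SerializationHelper.read("bagged_rf.model");

        // Feature vectors of the clips of one song; "Mood" holds dummy values here.
        Instances clips = new DataSource("song_clips.arff").getDataSet();
        clips.setClassIndex(clips.numAttributes() - 1);

        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < clips.numInstances(); i++) {
            double pred = model.classifyInstance(clips.instance(i));
            String mood = clips.classAttribute().value((int) pred);
            votes.merge(mood, 1, Integer::sum);
        }

        // Song-level mood = the clip-level mood with the maximum number of votes.
        String songMood = votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
        System.out.println("Predicted song mood: " + songMood);
    }
}
```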
Chapter 7
Experiments and Results
The project involved a lot of rigorous experimentation from a data mining point of view. In addition, the preparation and pre-processing involved in carrying out the experimentation are also worth mentioning. This section describes the experimental apparatus, the flow, and the results obtained during the experimentation for the music mood identification process.
7.1 Experimental Setup
The apparatus included:
• A huge, diverse personal music collection of Indian popular music in MP3 or WAV format.
• Open-source tools and libraries for audio processing.
• Open-source data mining framework - Weka [41].
• Music Mood Identification System.
• Panel of five people - one Music expert, two avid Music listeners, two common music listeners
• One workstation for software development, assembly and execution
7.1.1 Data Collection
The data collection involved a personal music collection of Indian popular Hindi songs. Only those songs which are generally popular and famous among people were selected, and care was taken to ensure a good mix of songs spanning each of the five mood classes. Only songs in MP3 or WAV format were shortlisted, in alignment with the scope of the project.
7.1.2 Data pre-processing
Dataset generation was carried out in three stages. The first stage consisted of 490 songs, the second of 2200 songs, and by the third stage a total of 2300 audio songs, all popular and belonging to Indian Hindi films, had been processed to generate the dataset. All the songs were trimmed to 30-second clips. Their low-level features were extracted and consolidated into an ARFF file dataset. Each entry was annotated with its most probable mood, from the data collected by consulting the panel of five people, in order to recreate a real scenario for supervised training.
7.1.3 Training and Testing
The datasets in each stage were subjected to a range of existing classification algorithms over numerous runs and folds. Those algorithms showing a bias towards only specific class labels or performing very poorly were discarded, and the dataset was then subjected to a 66%-34% training-testing split for learning and evaluation with all the remaining algorithms. Following are the top 11 algorithms which showed comparable top results during this experimentation (a sketch of the percentage-split evaluation, using the Weka API, follows this list):
• NaiveBayes
• Support Vector Machines
• J48 (ID3 algorithm implementation)
• Random Tree
• Random Forest
• REPTree
• Simple CART (Classification and Regression Trees)
• Bagging of Random Trees
• Bagging of Random Forests
• Bagging of simple CART
• Bagging of REPTree
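For reference, a hedged sketch of the 66%-34% percentage-split evaluation used for comparing these algorithms, expressed with the Weka API, is given below; the dataset file name and the random seed are illustrative, and any of the 11 algorithms above can be swapped in as the classifier under test.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SplitEvaluationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("mood_features.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(42));

        // 66% of the instances for training, the remaining 34% for testing.
        int trainSize = (int) Math.round(data.numInstances() * 0.66);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        Bagging classifier = new Bagging();
        classifier.setClassifier(new RandomForest()); // swap in any of the 11 algorithms here
        classifier.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(classifier, test);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString()); // per-class precision, recall, F, ROC
        System.out.println(eval.toMatrixString());       // confusion matrix
    }
}
```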
7.2 Results
7.2.1 Evaluation Metrics
The 11 classification algorithms were evaluated with respect to the following evaluation measures for each of the datasets generated:
• Receiver Operating Characteristic (ROC): It shows the trade-off between the true positive rate and the false positive rate. It is a two-dimensional plot with the vertical axis representing the true positive rate and the horizontal axis representing the false positive rate. A model with perfect accuracy will have an area of 1. The area under the ROC curve is a measure of the accuracy of the model. It ranks the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list. The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model. Area under ROC was mainly used in signal detection theory and the medical domain, wherein it was defined as the plot of Sensitivity versus 1 - Specificity, which is a similar plot to the one defined earlier. For each of the five classes of the mood model, the area under ROC is calculated, and the closer the value is to 1, the more accurate the classification.
• Confusion Matrix: The columns of the confusion matrix represent the predictions, and the rows represent the actual classes. Correct predictions always lie on the diagonal of the matrix. Equation 7.1 shows the general structure of a confusion matrix for the two-class case:

[ TP  FN ]
[ FP  TN ]          (7.1)

wherein True Positives (TP) indicate the number of instances of a class that were correctly predicted, True Negatives (TN) indicate the number of instances NOT of a particular class that were correctly predicted NOT to belong to that class, False Positives (FP) indicate the number of instances NOT belonging to a class that were incorrectly predicted as belonging to that class, and False Negatives (FN) indicate the number of instances that were incorrectly predicted as belonging to another class. Though the confusion matrix gives a better outlook on how the classifier performed than accuracy alone, a more detailed analysis is preferable, which is provided by the further metrics. Since in this case we have five mood classes, the confusion matrix will be a 5 x 5 matrix with the diagonal elements representing the True Positives.
• Recall: Recall is a metric that gives the percentage of actual class members that the classifier correctly identified. (TP + FN) represents the total number of actual members of the class. Recall is given by equation 7.2:

Recall = TP / (TP + FN)          (7.2)
• Precision: It gives the percentage of instances assigned to a particular class by the model or classifier that actually belong to that class. (TP + FP) represents the total number of positive predictions by the classifier. Precision is given by equation 7.3:

Precision = TP / (TP + FP)          (7.3)
Thus, in general, it is said that Recall is a Completeness Measure and Precision is an Exactness Measure. The ideal classifier would give a value of 1 for both Recall and Precision, but if a classifier gives a higher value (closer to one) for one of these metrics and a lower value for the other, choosing the classifier is a difficult task. In such cases other metrics, as discussed further, are suggested in the literature.
• F-Measure: It is the harmonic mean of Precision and Recall. We can say that it is essentially an average of the two percentages, and it greatly simplifies the comparison between classifiers (a small worked example follows this list). It is given by equation 7.4:

F-Measure = 2 / (1/Recall + 1/Precision)          (7.4)
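As a worked illustration of equations 7.2-7.4, the helper below derives per-class Recall, Precision and F-Measure directly from one row and column of a multi-class confusion matrix. The small 2 x 2 example in main() is made up and is not taken from the experimental results.

```java
public final class ClassMetrics {

    // Recall = TP / (TP + FN), Precision = TP / (TP + FP),
    // F-Measure = 2 / (1/Recall + 1/Precision), for class index c
    // of a confusion matrix whose rows are actual and columns are predicted classes.
    public static double[] metricsForClass(int[][] confusion, int c) {
        double tp = confusion[c][c];
        double fn = 0, fp = 0;
        for (int j = 0; j < confusion.length; j++) {
            if (j != c) {
                fn += confusion[c][j]; // actually c, predicted as something else
                fp += confusion[j][c]; // predicted as c, actually something else
            }
        }
        double recall = tp / (tp + fn);
        double precision = tp / (tp + fp);
        double fMeasure = 2.0 / (1.0 / recall + 1.0 / precision);
        return new double[] { recall, precision, fMeasure };
    }

    public static void main(String[] args) {
        int[][] confusion = { { 40, 10 }, { 5, 45 } }; // made-up two-class example
        double[] m = metricsForClass(confusion, 0);
        System.out.printf("Recall=%.3f Precision=%.3f F=%.3f%n", m[0], m[1], m[2]);
    }
}
```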
Figures 7.1, 7.2, 7.3 and 7.4 depict the performance of the algorithms with reference to the four measures, namely AUROC, Recall, Precision and F-measure. From each of the results, it can be seen that the Bagging (ensemble) approach applied to classification tree algorithms like RandomForest, RandomTree and SimpleCART showed better results than the other algorithms, and Bagging of Random Forests consistently topped them all.
Figure 7.1: Area under ROC statistics
Figure 7.2: Recall statistics
Figure 7.3: Precision statistics
Figure 7.4: F-measure statistics
Table 7.1: Experimental Results on Test Dataset of 2938 music clips

TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Mood Class
0.964   | 0.106   | 0.751     | 0.964  | 0.845     | 0.991    | excited
0.805   | 0.021   | 0.914     | 0.805  | 0.856     | 0.978    | happy
0.77    | 0.006   | 0.971     | 0.77   | 0.859     | 0.967    | romantic
0.822   | 0.019   | 0.867     | 0.822  | 0.844     | 0.977    | sad
0.871   | 0.038   | 0.849     | 0.871  | 0.86      | 0.983    | silent
0.853   | 0.042   | 0.867     | 0.853  | 0.853     | 0.98     | Weighted Avg.
Table 7.2: Confusion Matrix for the Test Dataset of 2938 music clips

   a     b     c     d     e   <-- Classified As
 704    16     1     0     9 | a = excited
  69   511     7    15    33 | b = happy
  94    16   470    10    20 | c = romantic
  34     5     1   314    28 | d = sad
  36    11     5    23   506 | e = silent
Table 7.1 shows the evaluation results obtained for the said metrics after performing a test run on a dataset of 2938 music clips belonging to Indian popular Hindi music. The table shows the classification performance for each mood category defined in the mood model. The last row represents the metric values as a weighted average taken over all the classes.
Table 7.2 displays the confusion matrix for the evaluation of the test dataset of 2938 music clips belonging to Indian popular music. As seen from the matrix, the diagonal elements are the correctly identified data instances and denote the True Positives. From the data in the matrix, the following can be inferred:
Total number of instances: 2938
Number of correctly classified instances: 2505 (85.26%)
Number of incorrectly classified instances: 433 (14.74%)
Chapter 8
Applications
Our work, we believe, can contribute substantially to a variety of real-world applications involving music. Following are a few of the many fields that can reap the benefits of this system:
8.1 Music Therapy Applications
The field of Music Therapy involves clinical use of music in a therapeutic way
to treat individuals by addressing their physical, emotional, social and cognitive
needs. As a result of the tremendous research and successful experiments, Music
Therapy has emerged as an important field using music as medium to improve
the quality of life of the people in spite of diversity, disability or illness. Receptive
Music Therapy is one of the many important streams of this field wherein, after examining the condition of the individual, the Music Therapy expert plans and recommends a routine involving listening to a particular type of music. Since this therapy is closely tied to the emotional and psychological needs of the individual, the mood
underlying the music plays an important role in the choice of music. Automatic
mood recognition of music can help to reduce efforts of the expert to manage,
search and recommend the appropriate music relevant for the individual. This
can also be extended to online self-therapy applications wherein the individuals
can themselves choose the appropriate music accurately as directed by the expert
and without much search efforts.
8.2 Music Information Retrieval
MIR systems aim at extracting information from music. This information can
be used for various music applications like Recommender systems, Instrument
recognition and separation applications, Automatic categorization systems and
many more. Our system can contribute to the Automatic categorization systems
wherein the music can be categorized by its corresponding mood recognized by
our system automatically. This will not only help to organize the music in a much
better way but also reduce the overhead on users for selecting a list of songs suiting
the current mood or occasion. With this system in place, the user can just choose a mood and the system gives the list of all songs belonging to that mood. The user then selects the songs to listen to from this subset, which is very small compared to the whole collection, whereas with traditional techniques the user selects songs by name, album or artist and then searches for the one that matches the mood. The system can also find application in recommender systems, to recommend songs matching the mood along with other traditional parameters, which can give better results.
8.3 Intelligent Automatic Music Composition
Music in today’s world is created and composed by highly skilled and trained
musicians. With increasing innovations in technology, many software tools and devices have also proved beneficial in assisting musicians, easing the effort needed to compose music from various instruments and singers and to merge or process it. A lot of research is going on in all parts of the world with the aim of building a system which can compose music automatically and intelligently enough to sound as interesting as human compositions. Building such an application will not only require a lot of music signal processing, pattern recognition and matching, but also a great deal of information and data about the music in order to produce a novel music composition. The mood of the music pieces can form one of the important parameters in searching for music pieces to be put together to generate new music.
Our system can help at this stage by automatically recognizing and annotating
music pieces.
Chapter 9
Conclusion and Future Work
9.1 Conclusion
We successfully experimented with the task of mapping audio features of Indian Popular Music to respective moods, with the top performance ranging between 75% and 81% with respect to F-measure and 70% to 75% with respect to the precision measure. The best accuracy w.r.t. area under ROC was observed in the range 0.91 to 0.94, which seems quite satisfactory. The Bagging of Random Forests approach thus performed much better compared not just to other decision tree based algorithms but to other classification algorithms as well. This was a new observation in the analysis of Indian popular music, unlike western music where SVM and neural network algorithms have dominated classifier accuracy. The classification performance achieved so far seems satisfactory, making the system useful in real applications. The open source framework developed as a part of the project also serves as a common framework for music data mining analysis in terms of an end-to-end solution. Although the current approach has produced satisfactory results, we consider this just a first step in exploring Indian popular music, and it opens avenues for further research and development in this area to bring more efficient results.
9.2 Future Work
The path forward involves a further cycle of experimentation and refinement of the audio features and, if required, the mood categories, so as to enrich the dataset, in addition to an increased number and variety of songs from which to extract further valuable information for mood learning. During this development cycle the mood model is also likely to undergo some changes to best suit Indian song scenarios. The current system is capable of recognizing the mood of 30-second song clips. This can further be extended to derive the mood of an entire song by collectively recognizing and weighing the moods recognized for the 30-second trimmed clips of the song. In future, this system can be extended to other genres of Indian music like Hindustani classical and Carnatic music, with changes involving audio features and classification techniques. Customization of this system to non-Indian songs cannot be ruled out either, after thorough experimentation. Since some of the moods represented by Indian popular music are very much governed by expressions, which are very well conveyed through lyrics, lyrics analysis in combination with audio features can make the system much stronger, with better accuracy.
Chapter 10
Project Milestones
10.1 Project Schedule
Table 10.1 outlines the project schedule and the major milestones achieved towards the completion of the project. The project was on schedule and completed in the planned time with respect to the assigned scope.
Table 10.1: Weekly Schedule of Project Starting 1 July, 2011
Week     | Task                                          | Status
1 to 4   | Problem Statement Identification              | Completed
5 to 6   | Problem Statement Finalization                | Completed
7        | Project Synopsis, Literature Survey - MIR     | Completed
8 to 9   | Literature Survey - MIR and Music Analysis    | Completed
10 to 11 | Literature Survey - Mood Classification       | Completed
12       | Literature Survey - Audio Features            | Completed
13 to 14 | Literature Survey - Feature Extraction Tools  | Completed
15 to 16 | Literature Survey - Data Mining for Music     | Completed
17       | Data Collection and Preprocessing             | Completed
18 to 19 | Detailed System Design, Data Processing       | Completed
20 to 23 | Existing mining algorithms training           | Completed
24       | Algorithm performance analysis                | Completed
25 to 26 | Algorithm refinement                          | Completed
27 to 29 | Feature selection refinement                  | Completed
30 to 31 | Mood Model refinement                         | Completed
32       | Feature and Mood model finalization           | Completed
33       | Data collection and Dataset re-structuring    | Completed
34 to 35 | Re-analysis and evaluation of model learnt    | Completed
36 to 40 | Code integration and testing                  | Completed
41 to 44 | Bug solving and fixing                        | Completed
Completed
10.2 Publications’ status
10.2 Publications' status
Right from the conception of the project till its completion for the said scope, the project has been through various stages wherein we have received very good responses, suggestions and critiques regarding our work as presented through our research papers. A few of our papers related to the work done, which have been reviewed, accepted and appreciated by notable international conferences, are listed below.
Table 10.2: Paper publications' status

Title | Conference | Status
Automatic mood classification model for Indian Popular Music | Asia Modelling Symposium 2012, Proceedings to be published in IEEE Computer Society Digital Library (CSDL) and I-Xplore, http://ams2012.info | Published
Mood based classification of Indian Popular Music | CUBE 2012, Proceedings to be published in ACM, http://www.thecubeconf.com/academic/ | Accepted
Music Mood Identification: A Data Mining Approach | 5th International conference on Psychology of Music and Mental Health, Bangalore, http://www.nada.in/ | Accepted
Bibliography
[1] Tsunoo, E., Akase, T., Ono, N., Sagayama S., (2010), “Musical mood classification by rhythm and bass-line unit pattern analysis”, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Attribute-Relation File Format, http://www.cs.waikato.ac.nz/ml/weka/arff.html
[3] Doris Baum, (2006), “Emomusic - Classifying music according to emotion”, Proceedings of the 7th Workshop on Data Analysis (WDA), Kosice, Slovakia.
[4] Breiman, L., (1996), “Bagging predictors”, Machine Learning, 24(2), 123-140.
[5] Capurso, A., Fisichelli, V. R., Gilman, L., Gutheil, E. A., Wright, J. T., (1952),
“Music and Your Emotions”, Liveright Publishing Corporation.
[6] Chia-Chu Liu, Yi-Hsuan Yang, Ping-Hao Wu, Homer H. Chen, (2006), “Detecting and classifying emotions in popular music”, JCIS Proceedings’.
[7] Dalibor Mitrovic, Matthias Zeppelzauer, Horst Eidenberger, (2007), “Analysis
of the Data Quality of Audio Descriptions of Environmental Sounds”, Journal
of Digital Information Management, 5(2):48.
[8] Trung-Thanh Dang, Kiyoaki Shirai, (2009), “Machine Learning Approaches
for Mood Classification of Songs toward Music Search Engine”, International
Conference on Knowledge and Systems Engineering.
[9] Atin Das, Pritha Das, (2005), “Classification of Different Indian Songs Based
on Fractal Analysis”, Complex Systems Publications.
[10] Dewi K.C., Harjoko. A., (2010), “Kid’s Song Classification Based on Mood
Parameters Using K-Nearest Neighbor Classification Method and Self Organizing Map.”, International Conference on Distributed Frameworks for Multimedia Applications (DFmA).
[11] Duda, R. O., Hart P. E., (2000)“Pattern Classification”,New York Press:
Wiley.
[12] Ekman P., (1982), “Emotion in the Human Face”, Cambridge University
Press, Second ed.
[13] Paul R. Farnsworth, (1958), “The social psychology of music”, The Dryden Press.
[14] Fu, Z., Lu, G., Ting, K. M., Zhang D., (2010), “A survey of audio-based
music classification and annotation”, IEEE Trans. Multimedia.
[15] Geroge Tzanetakis, Perry Cook, (2002), “Musical Genre Classification of Audio Signals”, IEEE Transaction on Speech and Audio Processing.
[16] Han, B., Rho, S., Dannenberg, R. B., Hwang E., (2009), “Smers: Music
emotion recognition using support vector regression”, Proceedings of the 10th
Intl. Society for Music Information Conf., Kobe, Japan.
[17] Han J., Kamber M., Pei J., (2011), “Data Mining: Concepts and Techniques,
3rd Edition”, Morgan Kauffman publications, ISBN: 9780123814791.
[18] Kate Hevner, (1936), “Experimental studies of the elements of expression in music”, American Journal of Psychology, 48:246-268.
[19] Ho, T., (1998), “The random subspace method for constructing decision forests”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.
[20] JungHyun Kim, Seungjae Lee, SungMin Kim, WonYoung Yoo, (2011), “Music Mood Classification Model Based on Arousal-Valence Values”, ICACT2011,
ISBN 978-89-5519-155-4.
[21] Li T., Ogihara, M., (2003), “Detecting emotion in music”, Proceedings of the
International Symposium on Music Information Retrieval, Washington D.C.,
USA.
[22] Liu, D., Lu, L., Zhang H. J., (2003), “Automatic Mood Detection from Acoustic Music Data”, Johns Hopkins University.
[23] McAdams, S., (1999), “Perspectives on the contribution of timbre to musical structure”, Computer Music Journal, 23:85-102.
[24] McEnnis, D., McKay, C., Fujinaga, I., Depalle P., (2005), “jAudio: A feature extraction library”, Proceedings of the International Conference on Music
Information Retrieval. 6003.
[25] Marsyas, http://opihi.cs.uvic.ca/marsyas
[26] Masato Miyoshi, Satoru Tsuge, Hillary Kipsang Choge, Tadahiro Oyama,
Momoyo Ito, Minoru Fukumi, (2010), “Music Impression Detection Method for
User Independent Music Retrieval System”, Proc. of KES2010, pages 612-621.
[27] Mirenkov, N., Kanev, K., Takezawa, H., (2008), “Quality of Life Supporters
Employing Music Therapy”, Advanced Information Networking and Applications - Workshops(AINAW).
[28] MPEG-7 Overview, http://mpeg.chiariglione.org/standards/mpeg-7/mpeg7.htm
[29] Rabiner, L., Juang, B., (1993), “Fundamentals of speech recognition.”, New
York: Prentice-Hall.
[30] Radocy E., Boyle J. D., (1988), Psychological foundations of musical behavior, Springfield, IL: Charles C. Thomas.
[31] Russell J. A., (1980), “A circumplex model of affect”,Journal of Personality
and Social Psychology, 39: 1161-1178.
[32] Ryo Hirae, Takashi Nishi, (2008), “Mood Classification of Music Audio Signals”, The Journal of the Acoustical Society of Japan.
[33] Scaringella, N., Zoia, G., Mlynek, D., (2006), “Automatic genre classification
of music content: A survey”, IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 133-141.
[34] Schoen M., Gatewood E. L., (1999), “The mood effects of music”, International Library of Psychology, Routledge.
[35] Schubert E., (1996), “Continuous response to music using a two dimensional
emotion space”, Proceedings of the 4th International Conference of Music Perception and Cognition.
[36] Sloboda J. A., Juslin P. N., (2001), “Music and Emotion: Theory and Research”, New York: Oxford University Press.
[37] Thayer, R. E., (1989), “The Bio-psychology of Mood and Arousal”, New
York: Oxford University Press.
[38] Tyler P., (1996), “Developing A Two-Dimensional Continuous Response
Space for Emotions Perceived in Music”, Doctoral dissertation, Florida State
University.
[39] Wang, M., Zhang, N., Zhu, H., (2004), “User-adaptive Music Emotion Recognition”, IEEE International Conference on Signal Processing, pp. 1352-1355.
[40] Weihs, C., Ligges, U., Morchen, F., Mullensiefen, D., (2007), “Classification
in music research”, Advances in Data Analysis and Classification, vol. 1, no. 3, pp. 255-291.
[41] Weka, http://www.cs.waikato.ac.nz/ml/weka/
[42] Zhouyu Fu, Guojun Lu, Kai Ming Ting, Dengsheng Zhang, (2011), “A Survey
of Audio-Based Music Classification and Annotation”, IEEE Transactions on
multimedia, Vol. 13, No. 2.