Automatic Mood Classification of Indian Popular Music

Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Technology, Computer Engineering

by Aniruddha M. Ujlambkar (Roll No: 121022001)

under the guidance of Prof. V. Z. Attar

Department of Computer Engineering and Information Technology, College of Engineering, Pune, Pune - 411005. June 2012

Dedicated to my mother, Smt. Manasi M. Ujlambkar, who has always emphasized the importance of education, discipline and integrity and has been a constant source of inspiration for me my entire life, and my father, Shri. Mukund K. Ujlambkar, who has always been my role model for hard work, persistence and patience and has always supported me open-heartedly in all my endeavors.

DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATION TECHNOLOGY, COLLEGE OF ENGINEERING, PUNE

CERTIFICATE

This is to certify that the dissertation titled Automatic Mood Classification of Indian Popular Music has been successfully completed by Aniruddha M. Ujlambkar (121022001) and is approved for the degree of Master of Technology, Computer Engineering.

Prof. V. Z. Attar, Guide, Department of Computer Engineering and Information Technology, College of Engineering, Pune, Shivaji Nagar, Pune-411005.

Dr. Jibi Abraham, Head, Department of Computer Engineering and Information Technology, College of Engineering, Pune, Shivaji Nagar, Pune-411005.

Date:

Abstract

Music has been an inherent part of human life when it comes to recreation, entertainment and, more recently, even therapy. The way music is composed, played and listened to has witnessed an enormous transition from the age of magnetic tape recorders to the recent age of digital music players streaming music from the cloud. What has remained intact is the special relation that music shares with human emotions. We most often choose to listen to a song or piece of music that best fits our mood at that instant. In spite of this strong correlation, most music software available today still lacks the facility of mood-aware play-list generation. This increases the time music listeners spend manually choosing a list of songs suiting a particular mood or occasion, which could be avoided by annotating songs with the relevant emotion category they convey. The problem, however, lies in the overhead of manually annotating music with its corresponding mood, and the challenge is to identify this aspect automatically and intelligently. The study of mood recognition in music has gained a lot of momentum in recent years, with machine learning and data mining techniques contributing considerably to analyzing and identifying the relation of mood with music. We take the same inspiration forward and contribute by building a system for automatic identification of the mood underlying audio songs by mining their spectral and temporal audio features. Our focus is specifically on Indian popular Hindi songs. We have analyzed various data classification algorithms in order to learn, train and test a model representing the moods of these audio songs, and have developed an open-source framework for the same. We achieved a satisfactory precision of 70% to 75% in identifying the mood underlying Indian popular music by introducing a bagging (ensemble) of random forests, evaluated over 4600 audio clips.

Acknowledgments

I express my deepest gratitude towards my guide Prof. V. Z.
Attar for her constant help and encouragement throughout the project work. I have been fortunate to have a guide who gave me the freedom to explore on my own and at the same time helped me plan the project with timely reviews and constructive comments and suggestions wherever required. A big thanks to her for having faith in me throughout the project and helping me walk through the new avenues of research papers and publications. I would like to thank Prof. A. A. Sawant for the continuous support and encouragement he extended through the enthusiastic discussions he used to have very often with us and the insightful thoughts and ideas he used to share. I also take this opportunity to thank all those teachers, staff and colleagues who have constantly helped me grow, learn and mature, both personally and professionally, throughout the process. A BIG thanks goes to my dearest friends who have always supported, guided and even criticized me, always for the right reasons, and have helped me stay sane throughout this and every other chapter of my life. I greatly value their friendship and deeply appreciate their belief in me. Special thanks to all the new friends from M.Tech. I have made, without whom the journey wouldn't have been so interesting and memorable! Most importantly, none of this would have happened without the love and patience of my family - my parents, to whom this dissertation is dedicated. I would like to express my heart-felt gratitude to my family.

Aniruddha M. Ujlambkar
College of Engineering, Pune
June 2012

Contents

Abstract
Acknowledgments
List of Figures
List of Tables
1 Introduction
  1.1 Music and Mood
  1.2 Introduction to music features
  1.3 Music and Data Mining
  1.4 Motivation
  1.5 Thesis Objective and Scope
  1.6 Thesis Outline
2 Literature Survey
  2.1 Music Mood Model and Audio Features
  2.2 Music classification
  2.3 Summary
3 Music Mood Model
  3.1 Music Mood Relation
  3.2 Mood (Emotion) Models
    3.2.1 Hevner's experiment
    3.2.2 Russell's model
    3.2.3 Thayer's model
    3.2.4 Indian Classical model: Navras
4 Audio Features
  4.1 Low level Audio Features
  4.2 Feature List
5 Mining Mood from Audio Features
  5.1 Overview of Data Mining
  5.2 Overview of Data Mining functionalities
  5.3 Classification
    5.3.1 Classification using Decision-tree
    5.3.2 Random Forest Classification
  5.4 Random Forest Highlights
  5.5 Our Approach: Bagging of Random Forests
    5.5.1 Algorithm
6 Mood Identification System
  6.1 Mood Model Selection
  6.2 System Overview
  6.3 System Design and Components
    6.3.1 Audio Pre-processor
    6.3.2 Audio Feature Extractor
    6.3.3 Mood Identification System
7 Experiments and Results
  7.1 Experimental Setup
    7.1.1 Data Collection
    7.1.2 Data pre-processing
    7.1.3 Training and Testing
  7.2 Results
    7.2.1 Evaluation Metrics
8 Applications
  8.1 Music Therapy Applications
  8.2 Music Information Retrieval
  8.3 Intelligent Automatic Music Composition
9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work
10 Project Milestones
  10.1 Project Schedule
  10.2 Publications' status
Bibliography

List of Figures

3.1 Hevner's Mood Model
3.2 Russell's Mood Model
3.3 Thayer's Mood Model
3.4 Navras: Indian Classical emotion model
4.1 Audio Features Taxonomy
5.1 Data Mining in Knowledge discovery
5.2 Data Mining disciplines
5.3 Classification process
5.4 Classification using Decision Tree
6.1 Mood Recognition System
6.2 Mood Detection System: Detailed Design
7.1 Area under ROC statistics
7.2 Recall statistics
7.3 Precision statistics
7.4 F-measure statistics

List of Tables

6.1 Mood Model: Indian popular Hindi music
7.1 Experimental Results on Test Dataset of 2938 music clips
7.2 Experimental Results on Test Dataset of 2938 music clips
10.1 Weekly Schedule of Project Starting 1 July, 2011
10.2 Paper publications' status
Chapter 1  Introduction

1.1 Music and Mood

The well-known German philosopher Friedrich Nietzsche once quoted a famous line: "Without music, life would be a mistake".
Music has always been an inherent part of human recreation. Music is not just useful for entertainment: studies have shown that listening to the right music plays an important role in healing, rejuvenating and even inspiring the human mind in difficult conditions, as is widely studied and demonstrated by the field of Music Therapy [27]. With rapidly advancing technology and the advent of the latest multimedia gadgets, music has reached almost every individual's personal device, be it a laptop, music player or cell phone. The music which in the olden days was limited to live concerts, performances or radio broadcasts is now available at everyone's finger tips within a few clicks. Music has thus become very easily accessible and available. However, the music database is ever increasing, and it would not be wrong to say that we might hear several completely new, never-heard music pieces every single day. Today, the overall music collection worldwide counts a few million recordings and still continues to increase every day. With so much variety of music easily available, we humans do not always listen to the same type of music all the time. We have our interests, favorite artists, albums and music types. Put simply, we have our personal choices and, more importantly, even our choice might differ from time to time. This choice is quite naturally governed by our emotional state at that particular instant. The relation between musical sounds and their influence on listeners' emotions has been well studied and is evident from much-celebrated papers such as those of Hevner [18] and Farnsworth [13]. These papers described experiments which eventually substantiated the hypothesis that music inherently carries emotional meaning. Currently we can store, sort and retrieve our digital music files based on various traditional music classification tags like Artist, Band (Group), Album, Movie and Genre. However, choosing a song or music piece suiting our mood from a large database is difficult and time consuming, since none of the mentioned parameters can sufficiently convey the emotional aspect associated with the song. What we need is an additional parameter, or rather search filter, here - Mood - which signifies the emotion of that particular music piece. However, classifying music by its mood is a much harder task. The main reasons are:

• First, the emotion or mood of music is very subjective. Human mood, surrounding environment, personality and cultural background can all influence the perceived emotion of a particular piece of music.

• Second, the adjectives describing emotion can be ambiguous. For instance, happy and refreshing may refer to the same song.

• Third, it is inexplicable how music arouses emotion. What intrinsic quality of music, if any, creates a specific emotional response is still far from well understood [6].

1.2 Introduction to music features

As it is a well-established fact that music indeed has an emotional quotient attached to it, it is essential to know what intrinsic factors present in music associate it with a particular mood or emotion. A lot of research has been done, and is still going on, in capturing various features from audio files based on which we can analyze and classify a list of audio files.
Audio features are nothing but mathematical functions calculated over the audio data in order to describe some unique aspect of that data. Over the last decades a huge number of features has been developed for the analysis of audio content. Dalibor Mitrovic and team [7] have analyzed various state-of-the-art audio features useful for content-based audio retrieval. Audio features were initially studied and explored for application domains like speech recognition [29]. With upcoming novel application areas, the analysis of music and general-purpose environmental sounds gained importance. Different research fields evolved, such as audio segmentation, music information retrieval (MIR), and environmental sound recognition (ESR). Each of these areas developed its specific description techniques (features). Many audio features have been proposed in the literature for music classification, and different taxonomies exist for their categorization. Weihs et al. [40] have categorized audio features into four subcategories, namely short-term features, long-term features, semantic features, and compositional features. Scaringella [33] followed a more standard taxonomy by dividing audio features used for genre classification into three groups based on timbre, rhythm, and pitch information, respectively. Each taxonomy attempts to capture audio features from a certain perspective. Instead of a single-level taxonomy, Zhouyu Fu and team [42] unify the two taxonomies and present a hierarchical taxonomy that characterizes audio features from different perspectives and levels. From the perspective of music understanding, we can divide audio features into two levels, low-level and mid-level features, along with top-level labels. Low-level features can be further divided into two classes, timbre and temporal features. Timbre features capture the tonal quality of sound that is related to different instrumentation, whereas temporal features capture the variation and evolution of timbre over time. Low-level features are the basic description of the audio data, for instance tempo, beats per minute and so on. On the contrary, mid-level features are derived from these basic features to provide music-related technical understanding, such as rhythm and pitch, which in turn is perceived by humans as genre or mood; these perceptual labels form the top level of the taxonomy. A wide range of audio features has been studied by specialists and experts over the past many years, and many features have even been standardized, for instance in the MPEG-7 standards [28], which provide a list of low-level audio descriptors (features) along with techniques and tools to extract them. Audio feature extraction involves a lot of complex mathematics and signal processing to convert digital audio data into meaningful features represented by numbers (of fixed or variable dimensions). To name a few, the following are examples of some low-level audio features: zero-crossing rate, magnitude spectrum, spectral centroid, spectral roll-off and many more, which will be discussed in detail in coming chapters. With the increasing standardization of and research in audio features, an effort is always made to obtain features which are orthogonal and which provide descriptions with high variance for the underlying data.
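As a concrete illustration of how such low-level descriptors are computed from raw samples, the following is a minimal sketch in Python with numpy (our own illustration; the actual extraction in this work uses the dedicated tools discussed in Chapter 4, whose windowing and normalization conventions may differ) of two of the features just named, the zero-crossing rate and the spectral centroid, each computed over a single analysis window:

    import numpy as np

    def zero_crossing_rate(window: np.ndarray) -> float:
        """Fraction of consecutive sample pairs whose signs differ."""
        signs = np.sign(window)
        return float(np.mean(signs[:-1] != signs[1:]))

    def spectral_centroid(window: np.ndarray, sample_rate: int) -> float:
        """Magnitude-weighted mean frequency (in Hz) of the window's spectrum."""
        magnitudes = np.abs(np.fft.rfft(window * np.hanning(len(window))))
        freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
        return float(np.sum(freqs * magnitudes) / (np.sum(magnitudes) + 1e-12))

    # Illustrative usage on a stand-in 1024-sample window of 44.1 kHz audio
    window = np.random.randn(1024)  # placeholder for real audio samples
    print(zero_crossing_rate(window), spectral_centroid(window, 44100))

A full extractor simply slides such a window over the clip and aggregates the per-window values (mean, standard deviation and so on) into a feature vector.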
1.3 Music and Data Mining

Data Mining is a relatively young, interdisciplinary field of computer science which deals with analyzing and discovering interesting and useful patterns from large data sets. The field involves various tasks for analyzing data, of which the most important task in the context of our work is classification. The classification task involves generalizing a known structure or pattern among available data already assigned some specific class or label. This generalized pattern can then be used to predict the class of new, unknown data. By contrast, clustering is the data mining task of discovering groups and structures in the data which are in some way similar to each other and differ in a similar way from other groups, without any prior knowledge of the structure of the data. Music mood detection fits the criteria for a data mining problem, as we have a huge number of music pieces, each with a few hundred audio features associated with it. Music emotion detection and classification has been studied and researched before. Initially most studies adopted a pattern recognition approach. Wang et al. [39] extracted features from MIDI files and used a support vector machine (SVM) to classify music into six classes: joyous, robust, restless, lyrical, sober, and gloomy. High classification accuracy was reported; however, one cannot easily transcribe real-world music into symbolic form, as done in MIDI files. Li et al. [21] divided emotion into thirteen categories and combined them into six classes. Then, they adopted MARSYAS [25] in their system to extract music features from acoustic data and used SVM to train and recognize music emotion. Liu et al. [22] presented a hierarchical mood recognition system, which uses a Gaussian mixture model (GMM) to represent the feature dataset and a Bayesian classifier to classify music clips. Byeong-jun Han proposed a music emotion recognition system using support vector regression and Thayer's emotion model [37]. Overall, be it genre classification, instrument classification or even mood classification, data mining techniques, especially classification techniques, have proved very effective in analyzing music and categorizing it.

1.4 Motivation

The way we choose a song to listen to is very much restricted by the search options we are provided with by the underlying software of the music device. Today we can search for a song by tags like Name, Artist, Album, and Genre. After years of research and study, it is an established fact that human emotions are influenced by music. Hence, it is high time to introduce a parameter, Mood, for annotating an audio file so that users can search for the list of songs relating to their mood at that instant. The idea is great, but the question is, how will the music be annotated with this Mood tag? It has been observed that such search tags (like Genre/Mood) are most of the time edited manually. This is prone to a lot of human error and needs a better solution. This is where we intend to contribute, so that the mood can be automatically detected for a given audio file and no manual intervention is needed to annotate a song with a particular mood. This can not only reduce manual effort, but also minimize, to a certain extent, the human errors associated with it.
This in turn can help users organize their music according to their moods instead of remembering and searching for a particular artist or album name that contained the song of that particular mood. A more interesting observation is that a great deal of work has been done on non-Indian music so far. We therefore find it a challenging task to see how we can utilize the relevant work done till now and take it further to Indian music mood recognition, with the intention of not only providing a much enriched user experience but also contributing to the good of the digital music community.

1.5 Thesis Objective and Scope

Most of the experimentation in the field of music mood recognition has been done on non-Indian music. Music being subjective and tied to cultural background, it is only natural that Indian music might need a different treatment compared to non-Indian music. Our goal is to develop a music emotion recognition system for Indian popular music by analyzing the relation of the timbral, spectral and temporal features of an audio file with the emotion it represents. The main goals of this thesis can be stated as:

a. To build an automatic mood recognition system for Indian popular songs.

b. To develop an open-source framework that can help analyze and experiment with music data using various machine learning and data mining algorithms.

In order to achieve these ultimate goals, we have laid down a list of sub-goals that together help achieve the end objective:

a. Identifying the various moods associated with Indian popular music and finalizing a mood model.

b. Identifying and finalizing the set of audio features important from the mood perspective.

c. Identifying and developing the tools required for extracting audio features.

d. Identifying and developing the data mining technique to construct the mood classification model.

e. Design, implementation and testing of the framework integrating the whole process of mood classification and prediction.

The scope of the work is limited to Indian popular Hindi music.

1.6 Thesis Outline

The rest of the thesis is organized as follows. In Chapter 2 we give a brief description of the important papers and literature that we have studied or utilized as part of our literature survey. In Chapter 3, we discuss our mood model. Chapter 4 explains the various features associated with music and the feature set important from the perspective of this project. In Chapter 5, data mining and its use in mining mood from music is discussed. Chapter 6 puts forth the overall design of the music mood identification system, followed by Chapter 7 discussing the experiments and the corresponding results obtained for the performance of the system. Chapter 8 gives an overview of the possible applications and uses of this project. Chapter 9 outlines the conclusion and future work related to this project. Finally, Chapter 10 lists the various project milestones and the publications' status that resulted during the course of the project.

Chapter 2  Literature Survey

The research and study behind this topic can be subdivided into three different subfields:

• First: the mood model, which involves identifying and defining the list of adjectives precisely describing all possible moods.

• Second: audio feature identification and extraction, which involves identifying and extracting the essential features from an audio file by applying signal processing algorithms and techniques in order to analyze the file.
• Third: the mining (machine learning) algorithm, which involves learning and choosing the appropriate algorithm(s) that help to mine the music datasets efficiently and with substantial accuracy.

2.1 Music Mood Model and Audio Features

Various experts in the fields of psychology and musicology have come up with models describing human emotions. One of the earliest experiments, by Hevner [18], helped in categorizing various adjectives into 8 different groups, each representing a class of mood. The model was a categorical model wherein adjectives representing the same emotion were grouped together. Russell [31] later came up with the circumplex model, representing human emotions on a circle with each mood category plotted within the circle, separated from the other categories along the polar co-ordinates. Thayer [37] too came up with a dimensional model plotted along two axes (stress versus energy), with mood represented in a two-dimensional co-ordinate system and lying on either of the two axes or in the four quadrants formed by the two-dimensional plot. Details of these models are discussed in further chapters. JungHyun Kim and team [20] proposed an Arousal-Valence (A-V) based mood classification model for a music recommendation system. The music mood tags and A-V values collected from 20 subjects were analyzed, and the A-V plane was divided into 8 mood regions using the k-means clustering algorithm. Their work shows that some regions on the A-V plane can be identified by representative mood tags, as in previous mood models, but some mood tags overlap in almost all regions. Akase and group [1] discuss an approach to feature extraction for audio mood classification. In this task timbral information has been widely used; however, many musical moods are characterized not only by timbral information but also by musical scale and temporal features such as rhythm patterns and bass-line patterns. Their paper proposed the extraction of rhythm and bass-line patterns, and these unit-pattern analyses are combined with statistical feature extraction for mood classification. In combination with statistical features including MFCCs and a musical scale feature, the effectiveness of the features was verified experimentally. McKay and team [24] developed "jAudio", an open-source audio feature extraction framework that includes implementations of 26 core features, including both features proven in MIR research and more experimental, perceptually motivated features. jAudio places an even greater emphasis on implementations of metafeatures and aggregators that can be used to automatically generate many more features from the core features (for instance, standard deviation, derivative, running mean etc.) that can be useful for music analysis. The tool has been quite useful and widely accepted in music analysis research. Dalibor Mitrovic and team's work [7] deals with the statistical analysis of a broad set of state-of-the-art audio features and low-level MPEG-7 audio descriptors. The investigation comprises data analysis to reveal redundancies between state-of-the-art audio features and MPEG-7 audio descriptors. The work employs Principal Components Analysis, which reveals low redundancy between most of the MPEG-7 descriptor groups. However, there is high redundancy within some groups of descriptors, such as the BasicSpectral group and the TimbralSpectral group.
Redundant features capture similar properties of the media objects and should not be used in conjunction. The paper provides good insight into the choice of audio features for analysis.

2.2 Music classification

Doris Baum and team introduce EmoMusic [3] in their paper, which presents a user study on the usefulness of the "PANAS-X" emotion descriptors as mood labels for music. It describes an attempt to organize and categorize music according to emotions with the help of different machine learning methods, namely Self-organizing Maps and Naive Bayes, Random Forest and Support Vector Machine classifiers. The study showed that emotions may very well be derivable in an automatic way, although the procedure can certainly be refined further. Naive Bayes and Random Forest classifiers can, for instance, be used to predict the emotion of a piece of music with reasonable success. Z. Fu and team [14] provide a comprehensive review of audio-based classification in their paper. It systematically summarizes the state-of-the-art techniques for music classification along with recent progress in the field. The survey emphasizes recent developments in the techniques and discusses several open issues for future research. It provides an up-to-date discussion of the audio features and classification techniques used in the literature; in addition, it reviews the individual tasks for music classification and annotation and identifies task-specific issues. K. C. Dewi and A. Harjoko [10] put forth a music classification system based on mood parameters using the K-Nearest Neighbor classification method and Self-Organizing Maps. The mood parameters used are based on Robert Thayer's energy-stress model, and the features used are the rhythm patterns of the music. Classification of music based on mood parameters with the K-Nearest Neighbor and Self-Organizing Map methods on 30 songs reached an accuracy of 66.67%. Classification with 120 songs reached 73.33% accuracy for the K-Nearest Neighbor method and 86.67% for the Self-Organizing Map method. B. Han and group [16] proposed SMERS, a music emotion recognition system using Support Vector Regression. In their paper, automatic emotion recognition of music is evaluated using various machine learning algorithms such as SVM, SVR and GMM, with a remarkable increase in accuracy using SVR as compared to GMM. For further research, the paper suggests that more perceptual features should be considered, along with other classification algorithms such as fuzzy methods and kNN (k-Nearest Neighbor). The paper also suggests comparing the results of machine-learning-based emotion recognition against human-annotated arousal/valence data. Chia-Chu Liu and team [6] presented an emotion detection and classification system for pop music. The system extracts feature values from the training music files using PsySound2 and generates a music model from the resulting feature dataset with a classification algorithm. The system is designed using a hierarchical framework followed by an accuracy enhancement mechanism. The experimental results show that the system gives satisfactory performance. Furthermore, the system aims at popular music, so it can be applied to public music database software to provide emotion-based search. The features that affect the perception of emotion are associated with frequency centroid, spectral dissonance and pure tonalness.
The paper suggests finding out the deeper relation between these features and music emotion in order to achieve more accurate music mood classification. T. Li and M. Ogihara [21] discussed an SVM-based multi-label classification method for two problems: classification into the thirteen adjective groups and classification into the six super-groups. The experiments showed an overall low performance, which can be attributed to the fact that there were numerous borderline cases for which the labeler found it difficult to make a decision. The experiments show that emotion detection is a rather difficult problem and that improvement of performance is the immediate issue. This can be addressed by expanding the sound data sets and collecting labels in multiple rounds. Trung-Thanh Dang and Kiyoaki Shirai [8] proposed the classification of moods of songs based on lyrics and meta-data, and proposed several methods for supervised learning of classifiers. The training data was collected from a LiveJournal blog site in which each blog entry is tagged with a mood and a song. Three kinds of machine learning algorithms were then applied for training classifiers: SVM, Naive Bayes and graph-based methods. The results showed that the accuracy of the mood classification methods is not good enough to apply in a real music search engine system. There are two main reasons: mood is subjective meta-data, and lyrics are short and contain many metaphors which only humans can understand. The authors hence planned to integrate audio information with lyrics for further improvement. As per the contribution of Atin Das and Pritha Das [9], explained in their paper, some of the prevailing classifications of Indian songs were quantified by measuring their fractal dimension. Samples were collected from three categories: Classical, Semi-classical, and Light. After appropriate processing, the samples were converted into time series datasets and their fractal dimension was computed. The analysis presented there can be generalized to categorize different types of songs. Samples can be chosen from playing a prerecorded song or taken directly from the recording device. Samples would be filtered to remove sounds from accompanying musical instruments, to obtain only the sound of the voice. In the present case this was done manually, and the length of the music pieces used was not sufficient to accurately classify the songs.

2.3 Summary

The literature survey helped us gain better insight into the mood analysis of music and the various techniques used for it, along with their current performance limitations and the corresponding improvement suggestions given by the respective authors. It is clearly evident that a lot of serious work has been going on in automatic mood identification and music analysis. Observing the work done so far, it is seen that data mining and machine learning techniques have played a substantial part in learning music data. The fact still remains that the accuracy achieved so far needs improvement from the learning and identification perspective, which calls for better algorithms and techniques. It is also seen that classification techniques have been much more prevalent, and have performed better in mining music data, than clustering techniques, and we too prefer the former. The striking and most important finding from the survey, which is worth noting, is that much of the music research has been done on non-Indian music.
Although some work has been done on Indian Classical music, it has not been explored to the same extent as non-Indian music with respect to mood. Indian Popular Music accounts for almost 72% of the music sales in India, which shows its immense popularity among the people. Identifying the lack of mood-based categorizers and the growing popularity and use of Indian popular music, we take this opportunity to develop an automatic mood recognition system for Indian popular music by analyzing existing classification mining techniques and developing a novel approach to automatically categorize the songs belonging to Indian Popular Music according to their underlying mood.

Chapter 3  Music Mood Model

Most of the literature dealing with music and psychology tells us that music mood is subjective and that the mood of the same music piece can be interpreted differently by different individuals. However, it is seen that there is considerable agreement about the moods underlying music belonging to a similar cultural context [30]. Thus music belonging to a similar cultural background has a better chance of consensus among individuals in interpreting the mood of a song. Our work limits its scope to Indian popular music, which falls under a common cultural context, thus increasing the chances of similar interpretations of the music among individuals when it comes to understanding mood. In order to classify songs according to their mood, it is essential to identify the list of moods which a song can be categorized into. This chapter explores the various mood models that have been proposed and proven constructive in categorizing music by emotion.

3.1 Music Mood Relation

Music psychology studies on music mood offer a number of fundamental generalizations that can benefit MIR research, as mentioned below:

• There does exist a mood effect in music, and studies have confirmed the existence of functions of music which can change people's mood [5]. Also, it comes naturally to listeners to associate mood labels with the music they listen to [36].

• Not all moods are equally likely to be aroused by listening to music. For instance, emotions like sadness, happiness and peace have a very high probability of being induced through music, as compared to anger or disgust.

• There do exist uniform mood effects among different people. Sloboda and Juslin [36] summarized that listeners are often consistent in their judgment about the emotional expression of music.

• There is definitely some correspondence between listeners' judgment regarding mood and musical parameters such as tempo, rhythm, dynamics, pitch, mode, beats, harmony etc. [36]. People do relate to the tune or rhythm of a song and most of the time hum the tune.

3.2 Mood (Emotion) Models

Mood models are generally studied by two approaches:

• Categorical approach: This introduces distinct classes of moods which form the basis for all other possible emotional variations.

• Dimensional approach: This classifies emotions along several axes such as valence (pleasure), arousal (activity), potency (dominance) and so on. This is generally the most commonly used approach in music applications.

Human psychologists have done a great deal of work and proposed a number of models of human emotions. Musicologists too have adopted and extended a few of the influential models that we will be navigating through.
The six universal emotions defined by Ekman [12] - anger, disgust, fear, happiness, sadness, and surprise - are well known in psychology. However, since they were designed for encoding facial expressions, some of them may not be suitable for music (for instance, disgust), and some common music moods are missing (for instance, calm or soothing).

3.2.1 Hevner's experiment

In music psychology, the earliest and still best known systematic attempt at creating a music mood taxonomy was by Kate Hevner [18]. Hevner examined the affective value of six musical features - tempo, mode, rhythm, pitch, harmony and melody - and studied how they relate to mood. Based on this study, 67 adjectives were categorized into eight different groups of similar emotions. Figure 3.1 shows the emotional groups with the adjectives belonging to each group.

Figure 3.1: Hevner's Mood Model (eight adjective groups, ranging from merry/joyous/gay through dreamy/sentimental, sad/mournful and spiritual/solemn to vigorous/robust and exhilarated/triumphant)

3.2.2 Russell's model

Both Ekman's and Hevner's models belong to the categorical kind, because their mood spaces consist of a set of discrete mood categories. By contrast, James Russell [31] came up with a circumplex model of emotions, arranging 28 adjectives in a circle on a two-dimensional bipolar space (arousal - valence). This model helped in separating and keeping apart the opposite emotions. Figure 3.2 depicts Russell's model, which has been adopted in a considerable number of music psychology studies [31] [35] [38].

Figure 3.2: Russell's Mood Model (adjectives such as alarmed, aroused, excited, happy, content, relaxed, tired, bored, depressed, sad, afraid and angry arranged around the arousal-valence circle)

3.2.3 Thayer's model

Yet another well-known dimensional model was proposed by Thayer [37]. It describes mood with two factors, a stress dimension (happy/anxious) and an energy dimension (calm/energetic), and divides music mood into four clusters according to the four quadrants of the two-dimensional space: Contentment, Depression, Exuberance and Anxious (Frantic). In this model, Contentment refers to happy and calm music; Depression refers to calm and anxious music; Exuberance refers to happy and energetic music; and Anxious (Frantic) refers to anxious and energetic music. Such definitions of the four clusters are clear and have high discriminatory power. Such a dimensional mood model, which divides the whole music emotion space into four meaningful quadrants, facilitates rough music mood categorization and thus is also widely adopted in mood recognition studies. Figure 3.3 depicts Thayer's model.

Figure 3.3: Thayer's Mood Model (energy on the vertical axis, stress on the horizontal axis, with the quadrants Anxious, Exuberance, Depression and Contentment)
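Since Thayer's is the dimensional model most relevant to the rest of this work, the following minimal sketch (in Python; the sign conventions and the assumption that both scores are normalized to [-1, 1] are ours for illustration, not part of Thayer's paper) shows how a (stress, energy) pair maps onto the four quadrants:

    def thayer_quadrant(stress: float, energy: float) -> str:
        """Map a (stress, energy) pair to one of Thayer's four mood clusters.

        Assumes both scores lie in [-1, 1]; positive stress is taken here to
        mean the 'happy' end of the happy/anxious axis, and positive energy
        the 'energetic' end of the calm/energetic axis.
        """
        if energy >= 0:  # energetic half-plane
            return "Exuberance" if stress >= 0 else "Anxious"
        return "Contentment" if stress >= 0 else "Depression"  # calm half-plane

    print(thayer_quadrant(0.7, 0.8))   # happy + energetic -> Exuberance
    print(thayer_quadrant(0.7, -0.5))  # happy + calm      -> Contentment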
3.2.4 Indian Classical model: Navras

Since we are considering the analysis of Indian music, we need to look at the traditional mood model that has been prevalent since ancient times in Indian Classical Music, which forms the base of Indian music. Navras, as it is termed in Sanskrit, means "nine sentiments". This model sums up all the major categories of emotions that a human can exhibit into nine classes. These nine sentiments are depicted in Figure 3.4.

Figure 3.4: Navras: Indian Classical emotion model (Karuna/Pathos, Shringar/Love, Veer/Valor, Hasya/Happy, Raudra/Angry, Bhayanak/Horrific, Vibhatsa/Disgust, Adbhut/Surprise, Shaanti/Peace)

Studying the advantages and shortcomings of the various models discussed so far, and taking the Indian popular music scenario into consideration, deriving a mood model directly from the existing ones mentioned in the literature cannot do justice in selecting the mood categories. Hence, we put forth a simple mood model covering the majority of mood aspects after careful study and experiments, as will be seen in coming chapters.

Chapter 4  Audio Features

4.1 Low level Audio Features

The key components of a classification system are feature extraction and classifier learning [11]. Feature extraction addresses the problem of how to represent the music pieces to be classified in terms of feature vectors or pair-wise similarities. Many audio features have been proposed in the literature for music classification, and different taxonomies exist for their categorization. Weihs et al. [40] have categorized audio features into four subcategories, namely short-term features, long-term features, semantic features, and compositional features. Scaringella [33] followed a more standard taxonomy by dividing audio features used for genre classification into three groups based on timbre, rhythm, and pitch information, respectively. Each taxonomy attempts to capture audio features from a certain perspective. Zhouyu Fu [42] characterizes audio features into two levels, low-level and mid-level features, as seen in Figure 4.1. Our audio feature selection is inspired by this two-tier taxonomy. Low-level features, although not closely related to the intrinsic properties of music as perceived by human listeners, form the basic features from which the mid-level features can be derived; the mid-level features provide a closer relationship to perception and mainly comprise three classes, namely rhythm, pitch, and harmony, as seen in Figure 4.1.

Figure 4.1: Audio Features Taxonomy (top-level labels such as genre, mood, instrument and artist from the user perspective; mid-level pitch, rhythm and harmony features such as pitch class profiles, beats per minute and chord patterns; low-level timbre and temporal features such as ZCR, spectral centroid, spectral roll-off and MFCC)

In our work we focus only on the low-level audio features, which can be further categorized as:

• Timbral features: These capture the tonal quality of sound that is related to different instrumentation. "Timbre" is the quality of a musical note, sound or tone that distinguishes different types of sound production, such as voices and different families of musical instruments: string instruments, wind instruments, and percussion instruments. The physical characteristics of sound that determine the perception of timbre include spectrum and envelope.

• Temporal features: These capture the variation and evolution of timbre over time. In this work, more focus is laid on the instantaneous timbre values than on their temporal variation, although the latter is not completely ignored.
These low-level features are extracted using various signal processing techniques such as the Fourier transform, spectral/cepstral analysis, autoregressive modeling and similar computations. We follow the MPEG-7 standardization [28] and make use of the jAudio [24] and Marsyas [25] open-source tools for extracting selected timbral, spectral and temporal audio features from the music pieces. The features are extracted and consolidated for each music piece in the standard Attribute-Relation File Format (ARFF) [2], so as to make it easy to mine the relations between these features and the corresponding moods of the audio files. After a careful study and survey of various experts' papers and publications, our current consolidated list of selected and extracted features is as mentioned below. The list names each audio feature; the feature vector comprises its actual value as well as corresponding meta-features such as the standard deviation, mean and logarithm wherever required, as identified by McKay and team [24].

4.2 Feature List

• Root Mean Square (RMS): RMS is calculated on a per-window basis. It is defined by the equation

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^2}$$

where $N$ is the total number of samples in the time domain and $x_n$ is the $n$-th sample. RMS is used to calculate the amplitude of a window.

• Magnitude Spectrum: This feature extracts the FFT (Fast Fourier Transform) magnitude spectrum from a set of audio samples. It gives a good idea of the magnitude of different frequency components within a window. The magnitude spectrum is found by first calculating the FFT with a Hanning window [give ref]. The magnitude spectrum value for each bin is found by summing the squares of the real and imaginary components, taking the square root, and dividing the result by the number of bins.

• Power Spectrum: This feature extracts the FFT power from a set of audio samples. It gives a good idea of the power of different frequency components within a window.

• Spectral Roll-off Point [15]: The spectral roll-off point is the fraction of bins in the power spectrum at which 85% of the power is at lower frequencies. It denotes the amount of right-skewness of the power spectrum.

• Spectral Centroid [15]: This is a measure of the "center of mass" of the power spectrum. It is obtained by calculating the mean bin of the power spectrum. The result returned is a number from 0 to 1 that represents at what fraction of the total number of bins this central frequency lies.

• Spectral Flux: This measures the amount of spectral change in a signal by calculating the difference between the current value of each magnitude spectral bin in the current window and the corresponding value in the magnitude spectrum of the previous window. Each of these differences is then squared, and the result is the sum of the squares.

• Spectral Variability: This is the standard deviation of the magnitude spectrum of the audio signal.

• Fraction of Low Energy Windows [26]: This measures the quietness of the signal relative to the rest of the signal and is calculated by taking the mean of the RMS of the last 100 windows and finding what fraction of these 100 windows are below the mean.

• Zero Crossings [26]: This feature helps identify the pitch as well as the noisiness of a signal. It is calculated by finding the number of times the signal changes sign from one sample to another, crossing the zero value.

• Strongest Beat: This feature finds the strongest beat in a signal.

• Beat Sum: This is calculated by summing up the beat values of a signal and gives a measure of how important a role regular beats play in a piece of music.

• Beat Histogram: This feature helps to identify the strength of different rhythmic periodicities in a signal. It is calculated by taking the RMS of 256 windows and then taking the FFT of the result.

• Strongest Frequency via Zero Crossings: This denotes the highest frequency of the signal present at the zero-crossing point. It is found by mapping the fraction in the zero-crossings to a frequency in Hertz.

• Mel-Frequency Cepstral Coefficients (MFCC): This feature constitutes the coefficients derived from the cepstral representation of the audio signal, such that the frequency bands are equally spaced on the Mel scale, approximating the human auditory system's response more closely. MFCCs are most commonly and widely used as features in speech recognition systems. In recent times these features are increasingly finding uses in music information retrieval, audio similarity measures and genre classification.

• Linear Predictive Coding (LPC) coefficients: This feature helps in representing the spectral envelope of an audio or speech signal.

• Spectral Smoothness: This feature is calculated by evaluating the log of a partial minus the average of the log of the surrounding partials, and is based upon Stephen McAdams' spectral smoothness [23]. It helps in the peak-based calculation of the smoothness of an audio signal.

• Relative Difference Function: This represents onset detection and is calculated as the log of the derivative of the Root Mean Square value.

• Mood: This is the class attribute that is populated during training and is predicted automatically when the classifier is tested against a new audio file.

In the given list of audio features, some features have a single dimension - for instance Strongest Beat, which has just one value - while others have variable dimensions - for instance Beat Histogram, which has a series of values exhibiting the histogram. The variable dimension depends on the window size, which in this work has been kept constant at 32 for every 30-second audio clip in the data-set. Hence, including the class attribute, each clip in the data-set is described by a total of 330 feature values.
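To make the textual definitions above concrete, here is a minimal Python/numpy sketch (our own illustration, following the definitions given in this list rather than jAudio's or Marsyas' exact implementations, whose windowing and normalization conventions may differ) of three of the listed features:

    import numpy as np

    def rms(window: np.ndarray) -> float:
        """Root Mean Square amplitude of one analysis window."""
        return float(np.sqrt(np.mean(window ** 2)))

    def spectral_rolloff(power_spectrum: np.ndarray, fraction: float = 0.85) -> float:
        """Fraction of bins below which `fraction` of the spectral power lies."""
        cumulative = np.cumsum(power_spectrum)
        threshold = fraction * cumulative[-1]
        return float(np.searchsorted(cumulative, threshold)) / len(power_spectrum)

    def spectral_flux(mag_current: np.ndarray, mag_previous: np.ndarray) -> float:
        """Sum of squared bin-wise differences between consecutive windows."""
        return float(np.sum((mag_current - mag_previous) ** 2))

In practice each such per-window value is then aggregated over a clip (via the mean, standard deviation and similar meta-features) before being written to the ARFF file.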
Chapter 5  Mining Mood from Audio Features

5.1 Overview of Data Mining

Data mining is the field which deals with the extraction of interesting, non-trivial, implicit, previously unknown and potentially useful patterns or knowledge from huge amounts of data. Data mining is often termed knowledge discovery in databases (KDD). A typical knowledge discovery process is depicted in Figure 5.1.

Figure 5.1: Data Mining in Knowledge discovery

Data mining can be considered a confluence of various disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high-performance computing.
Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, business, bio-informatics, or psychology. Figure 5.2 depicts a few of the prominent disciplines closely associated with data mining.

Figure 5.2: Data Mining disciplines

5.2 Overview of Data Mining functionalities

Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions. Since this work is related to predicting the mood underlying a particular music piece, the focus of the work is directed towards exploring the "predictive" mining tasks rather than the "descriptive" ones. The following are the data mining functionalities that formally exist, with a short description of each:

• Characterization and discrimination: Data characterization is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a database query. For example, to study the characteristics of software products whose sales increased by 10% in the last year, the data related to such products can be collected by executing an SQL query. Data discrimination is a comparison of the general features of target-class data objects with the general features of objects from one or a set of contrasting classes. The target and contrasting classes can be specified by the user, and the corresponding data objects retrieved through database queries. The output in both cases can be in the form of pie charts, bar graphs and similar constructs for the analyst to study the data.

• Frequent patterns, Association, Correlation: Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are many kinds of frequent patterns, including item-sets, subsequences, and substructures. The data under consideration can be analyzed for such frequently occurring patterns of data attributes, which leads to the discovery of interesting associations and correlations within the data.

• Classification and prediction: Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known). Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. That is, prediction is used to estimate missing or unavailable numerical data values rather than class labels.

• Cluster analysis: Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels.
The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity. That is, clusters of objects are formed so that objects within a cluster have high similarity to one another, but are very dissimilar to objects in other clusters.

• Outlier analysis: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications, such as fraud detection, the rare events can be more interesting than the more regularly occurring ones. The analysis of outlier data is referred to as outlier mining.

• Trend and evolution analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.

5.3 Classification

This work involves learning the mood aspect of music by analyzing the various feature vectors extracted from each audio file. The learning done thus can facilitate identifying which specific category of mood a particular audio file belongs to, provided its fixed set of audio features is available. Of all the data mining functionalities just described, classification and cluster analysis seem to be the best methods for discovering the mood information from the music feature dataset. Also, as witnessed in most of the literature survey, classification algorithms have so far proved quite effective, compared to others, in analyzing the mood or genre aspect of music data-sets. Our own experimentation too has revealed considerably higher performance of classification algorithms as compared to clustering algorithms. Hence, we opt for classification techniques to mine this music feature data-set with a supervised learning approach. Figure 5.3 shows the general process of classification.

Figure 5.3: Classification process

It is a two-step process:

• First: This step is also called the "learning step" or "training phase" and involves learning a mapping or function y = f(X) that can predict the associated class label y of a given tuple X. In this view, we wish to learn a mapping or function that separates the data classes. Typically, this mapping is represented in the form of classification rules, decision trees, or mathematical formulae. This mapping or function is generally termed the "classification model". As seen in step 1 of Figure 5.3, each row of the table represents a tuple X. The function f(X) is learnt as a process of training by using classification algorithms, and the corresponding rule is stored in the classifier model. This rule helps in predicting whether the person represented by the tuple X is tenured (yes) or not (no), depending upon the values of the various attributes of the tuple.

• Second: The model is used for classification. The model is evaluated against the test data-set in order to predict the class label of each data instance, as has been learned from the model.
The results are compared with the actual classes of the test data, and accordingly it is decided whether the model is accurate enough to classify the test data. If the model is acceptable, it can be used further for classifying data with unknown classes. As seen in step 2 of Figure 5.3, the classifier model evaluates an unknown tuple X by applying the learnt function f(X) in order to predict its outcome.

5.3.1 Classification using Decision-tree

Classification of data can be achieved by various methods, to name a few:

• Classification by Decision Tree Induction
• Bayesian Classification
• Artificial Neural Networks
• Rule-Based Classification
• Classification by Back-propagation
• Support Vector Machines
• Associative Classification

Since this work focuses on a decision-tree based classification approach, the description of the rest of the methods is outside the scope of this document, although relevant information can be found in the book by Han and Kamber [17].

Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. A typical example of a decision tree is shown in Figure 5.4 (Classification using Decision Tree). Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. Decision trees can easily be converted to classification rules. Figure 5.4 represents a decision tree to predict the class for sanctioning a credit (class values: yes, no) depending upon various parameters of credit risk assessment like age, current credit rating and profession. For instance, a senior with an excellent credit rating definitely has higher chances of a sanction as compared to one with a comparatively fair credit rating.

Why Decision trees?

Following are a few strong reasons why decision trees have been considered so often when it comes to classification techniques [17]:

• The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
• Decision trees can handle high-dimensional data.
• Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate.
• The learning and classification steps of decision tree induction are simple and fast.
• In general, decision tree classifiers have good accuracy.
• Decision tree induction algorithms have been successfully used for classification in many application areas, such as medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.

A short sketch of the two-step training-and-prediction process using one such decision tree learner follows below.
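Since this work's framework follows the conventions of the Weka tool [41], the two-step process can be made concrete with a minimal sketch against the Weka API. J48 (Weka's C4.5 decision tree learner, one of the algorithms evaluated in Chapter 7) stands in here for any classifier; the file names are hypothetical, and the class label is assumed to be the last attribute:

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DecisionTreeDemo {
        public static void main(String[] args) throws Exception {
            // Step 1 (training phase): learn the mapping y = f(X) from labeled tuples
            Instances train = DataSource.read("train.arff"); // hypothetical file
            train.setClassIndex(train.numAttributes() - 1);  // class label is last
            J48 tree = new J48();
            tree.buildClassifier(train);
            System.out.println(tree); // the learned tree, convertible to rules

            // Step 2 (classification): predict the class of a tuple with unknown label
            Instances unseen = DataSource.read("unseen.arff"); // hypothetical file
            unseen.setClassIndex(unseen.numAttributes() - 1);
            double y = tree.classifyInstance(unseen.instance(0));
            System.out.println("Predicted: " + unseen.classAttribute().value((int) y));
        }
    }

The same two calls, buildClassifier and classifyInstance, apply unchanged to every classifier discussed in the rest of this chapter.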
5.3.2 Random Forest Classification

In order to improve classification accuracy, ensemble methods like bagging and boosting have proved quite productive. Ensemble methods use a combination of a series of k learned classification models, M1, M2, ..., Mk, with the aim of creating an improved model in terms of classification accuracy. Our work makes use of the "Bagging" approach, also called "Bootstrap aggregation". In this method, bootstrap samples of the data-set are created by randomly sampling features and data instances from the given training set with replacement. These samples are then independently and simultaneously used to train and learn a separate classifier model for each sample. Finally, the classification is done by taking the majority of the votes from all the models learnt.

The Random forest approach involves learning such an ensemble, consisting of a bagging of un-pruned decision tree learners with a randomized selection of features at each split. This is done by randomly sampling a feature subset for each decision tree (as in Random Subspaces [19]), and/or by randomly sampling a training data subset for each decision tree (as in Bagging [4]).

Random Forests Algorithm

Following is a simplified algorithm explaining the Random forest approach:

Algorithm 1: Random forest
    Data: Training set, Ntrees = number of trees
    Result: Majority vote of classification
    initialization;
    for i <- 1 to Ntrees do
        Select a new bootstrap sample from the training set;
        Grow an un-pruned tree on this bootstrap sample;
        for each internal node do
            Mtry <- random subset of predictors;
            Choose the best split among these Mtry predictors;
        end
        Save the un-pruned tree built;
        Record the tree's vote of classification for each class;
    end
    Return the majority vote;

CART (Classification and Regression Trees) is chosen for building the randomly generated trees, as it is evident from the literature [4] that Random forests generated from CART yield better results than other tree algorithms in most cases.

5.4 Random Forest Highlights

Random forests have time and again proven useful and effective in many classification problem scenarios. Here are a few highlights of this approach that make it appropriate and suitable for the purpose of mood classification of high-dimensional music data-sets:

• Random forests readily handle a large number of input features.
• They are faster to train and evaluate as compared to other comparable approaches.
• Random forests exhibit stronger resistance to over-training and thus over-fitting.
• Separate cross-validation is unnecessary in the case of Random forests, since it is already taken care of at the time of forest building (via the out-of-bag samples).
• Random forests generally have accuracy similar to Support Vector Machines and Neural Networks, although Random forests have shown much better performance in the case of huge, high-dimensional data-sets.

5.5 Our Approach: Bagging of Random Forests

In this work we present an additional hierarchy of ensemble by generating an ensemble of Random Forests using bootstrap aggregation, also known as Bagging. The algorithm, and a sketch of one way to realize it on top of Weka, are given below.

5.5.1 Algorithm

Algorithm 2: Bagging of Random forests
    Data: Training set, Nforests = number of forests
    Result: Majority vote of classification
    initialization;
    for i <- 1 to Nforests do
        Select a new bootstrap sample from the training set;
        Generate a Random forest on this bootstrap with un-pruned random trees, as mentioned in Algorithm 1;
        Save the Random forest built;
        Record the majority vote of classification among the trees;
    end
    Return the majority vote of classification among the Random forests;
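One plausible realization of Algorithm 2, sketched here as an assumption rather than the project's actual code, is to wrap Weka's RandomForest learner inside its Bagging meta-classifier, which supplies the outer bootstrap sampling and majority vote. The file name and parameter values are illustrative, and the API shown is the Weka 3.x one of the time:

    import weka.classifiers.meta.Bagging;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BaggedRandomForests {
        public static void main(String[] args) throws Exception {
            Instances train = DataSource.read("mood_train.arff"); // hypothetical
            train.setClassIndex(train.numAttributes() - 1);

            RandomForest forest = new RandomForest();
            forest.setNumTrees(100);        // Ntrees per forest (illustrative)

            Bagging bagger = new Bagging(); // outer bootstrap-aggregation layer
            bagger.setClassifier(forest);   // base learner = one Random forest
            bagger.setNumIterations(10);    // Nforests (illustrative)
            bagger.buildClassifier(train);  // majority vote across forests
        }
    }

Each Bagging iteration draws its own bootstrap sample and grows a complete Random forest on it, giving exactly the two-level ensemble of Algorithm 2.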
For growing the random trees, the randomly sampled data attributes are split on the basis of the "Gini Index", which has shown better results when working with CART trees. The Gini index measures the impurity of the data set D at hand and is given by the formula:

$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2$

where $p_i$ is the probability that a tuple in D belongs to class $C_i$, and the sum is computed over the m classes. For example, a node containing 6 tuples of one class and 4 of another has $Gini = 1 - (0.6^2 + 0.4^2) = 0.48$, whereas a pure node has a Gini index of 0. The Gini index considers a binary split for each attribute. For each split, a weighted sum of the impurity of each resulting partition is calculated. Suppose dataset D is split into partitions $D_1$ and $D_2$ on the basis of attribute $A_1$; then the Gini index of attribute $A_1$ for splitting dataset D is given by:

$Gini_{A_1}(D) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)$

The Gini index is computed for all the eligible splitting attributes, and the reduction in impurity for each attribute is calculated by the formula:

$\Delta Gini_{A_1} = Gini(D) - Gini_{A_1}(D)$

The attribute maximizing this reduction in impurity is selected as the splitting attribute. The approach mentioned in Algorithm 2 has not only shown a rise in the accuracy of classification of music data-sets as compared to the traditional Random forest approach, but has also shown consistently better performance than other classification techniques, as will be discussed in the coming chapters.

Chapter 6 Mood Identification System

6.1 Mood Model Selection

As seen in the literature, the mood models studied were mostly from the perspective of psychology. Given the way dimensional models like Thayer's model [37] and Russell's model [31] were proposed, if we map music emotion onto any of these dimensional models, the different emotions would be plotted at different coordinates on the two-dimensional plot, which would then have to be grouped together to obtain different categories of emotions. There is always a trade-off in the number of emotions a mood model can portray: a very large number of different moods can be confusing and frustrating for an end-user choosing a song belonging to one of these moods, while a very small number will be too general to isolate the basic emotions. Since most of the mood models in the literature have been evaluated on non-Indian music, in this work we consider the Indian perspective explained by the nine sentiments (Navras) mentioned in section 3.2.4. Out of these nine emotions, however, emotions like anger, horror and surprise are very rarely described by music alone; these emotions are a combined effect of music, expression and act. Also, some emotions like Hasya (Happiness) need a further subdivision, for instance into happy and excited. Hence this model cannot be used as-is for analyzing the mood aspect of Indian popular songs; it needs further changes reflecting how people interpret these songs.

A list of 2500 well-known Indian popular songs was compiled by surveying the songs most liked by the majority of people. A short experiment similar to Hevner's [18] was conducted with the help of a panel of 5 music listeners, wherein each member of the panel independently listened to a 30-second clip of each of the 2500 songs and noted down the adjective(s) that they felt best described the song's emotion. The panel constituted one music expert, two avid music listeners and two common music listeners. The adjectives collected were then grouped together depending on their similarity and the music clip under consideration. A total of five groups of moods were categorized, each covering a group of adjectives of songs with a similar emotional quotient.
These five categories of moods form our mood model, as shown in Table 6.1:

Table 6.1: Mood Model: Indian popular Hindi music

    Mood Category   Adjectives represented
    Happy           cheerful, funny, comic, happy, jovial
    Sad             depressed, frustrated, angry, betrayal, withdrawal, serious
    Silent          peaceful, calm, silent, nostalgia, slow-paced
    Excited         danceable, celebration, fast-track, excited, motivational, inspirational
    Romantic        love, romantic, playful

6.2 System Overview

The Mood Identification system is the main engine which helps identify the mood of given music or audio files. The system is designed as an open source software system. It would generally form part of the back-end in most applications, with its result consumed by the application layer on top in whatever way is required. The system has two-fold objectives:

1. The system should provide the facility to analyze music files and learn the classifier models associated with them.
2. It should be able to predict the class of mood that a particular audio file or piece of music belongs to.

An abstract view of the Mood Identification system from a user's perspective is shown in Figure 6.1 (Mood Recognition System). The system accepts music files as input from the user and returns the respective mood associated with each file to the end-user.

6.3 System Design and Components

The system can be divided into several components, each dedicated to a particular task, as shown in Figure 6.2 (Mood Detection System: Detailed Design) and as explained below.

6.3.1 Audio Pre-processor

This component, as the name signifies, has the main task of pre-processing the audio files fed to the system by the user. The pre-processing task involves:

a. Audio file splitting: Each input music file is split into consecutive clips of 30 seconds each. A song generally lasts at least a couple of minutes, which makes it difficult to analyze due to the enormous data content within that duration. Moreover, 30 seconds has proven to be a good duration from the analysis point of view, as it is not so short as to lose any important content and not so long as to increase the processing time. It is very much possible to relate a particular mood to a song by listening to just a 30-second excerpt of it.

b. Audio format conversion: Each 30-second music clip is converted to a standard WAV format (PCM signed 16-bit, stereo) with a sampling rate of 44.1 kHz. Currently the system supports the MP3 and WAV formats, which are widely used for audio; conversion of other formats can be provided for by extending the existing code interfaces. Conversion to a single format is necessary to ensure that the files to be processed and analyzed are consistent in structure and format, thereby ensuring similar treatment and processing, unlike the case where the formats differ.

This component thus makes sure that the input music files provided by the user are transformed so that they are ready for further processing and analysis. A sketch of these two pre-processing steps is given below.
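As a minimal sketch of these two steps, assuming WAV input and the standard javax.sound.sampled API (the project's actual implementation may differ, and MP3 input would additionally need a decoder library such as mp3spi on the classpath):

    import javax.sound.sampled.*;
    import java.io.File;

    public class AudioPreprocessor {
        // Split a song into consecutive 30-second clips converted to
        // PCM signed 16-bit, stereo, 44.1 kHz WAV.
        public static void splitToClips(File song, File outDir) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(song);
            AudioFormat target = new AudioFormat(44100f, 16, 2, true, false);
            AudioInputStream pcm = AudioSystem.getAudioInputStream(target, in);

            long clipFrames = (long) (30 * target.getFrameRate()); // 30 s of frames
            long total = pcm.getFrameLength(); // may be unspecified for some codecs
            for (long start = 0, n = 0; start < total; start += clipFrames, n++) {
                long len = Math.min(clipFrames, total - start);
                // Each wrapper reads the next 'len' frames of the underlying stream,
                // so the clips come out consecutively.
                AudioInputStream clip = new AudioInputStream(pcm, target, len);
                AudioSystem.write(clip, AudioFileFormat.Type.WAVE,
                                  new File(outDir, "clip_" + n + ".wav"));
            }
        }
    }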
6.3.2 Audio Feature Extractor

This module revolves around the audio signal features associated with the music clips obtained from the Audio Pre-processor. The module performs two main tasks:

a. Feature Extraction: Each 30-second music clip received as input is processed by applying mathematical signal computations such as Fourier transforms, logarithms and integrals, to name a few, along with their variants and combinations. These mathematical functions are representative of each of the features discussed in section 4.2, and the module implements the computations involved in calculating each of them. Most of the module implementation is inspired by and extended from the well-known open source tool jAudio [24], with some variations and customizations as required for this work. The features extracted are either of fixed dimension or multi-dimensional. A feature vector comprising all of these features is extracted for each music clip.

b. Data-set generation: The feature vectors thus extracted form the attributes of each music clip, which can be called a data instance. The feature vectors computed in memory are stored in a flat file following the standard ARFF file format understood by most data mining tools, such as Weka [41]. In addition to the features extracted, another attribute called "Mood" is appended to each data instance; this attribute is manually updated in the case of a training set and can hold any dummy value in real scenarios of mood prediction. An illustrative excerpt of such an ARFF file is shown below.
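For illustration, a hypothetical excerpt of such an ARFF file follows; the numeric attribute names and values are invented stand-ins for the real feature set of section 4.2, while the five nominal values of the "Mood" attribute follow the mood model of Table 6.1:

    @relation music_mood

    @attribute spectral_centroid_avg numeric
    @attribute zero_crossing_rate_avg numeric
    @attribute rms_avg numeric
    @attribute mood {happy, sad, silent, excited, romantic}

    @data
    1523.4, 0.081, 0.212, excited
    880.2,  0.034, 0.096, silent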
6.3.3 Mood Identification System

This is the main processing unit of the whole system and is responsible for mining the mood from the music data-set obtained as input from the Audio Feature Extractor module. It comprises the actual implementation of the algorithms mentioned in section 5.5. The module has two important roles to perform:

a. Mood Learner: In this case, the input received is a training data-set of music features with the "Mood" attribute manually updated by the domain experts for the purpose of training. The Mood Learner can make use of existing mining algorithms or newly written algorithms, provided they follow the convention and framework laid down by the Weka tool [41]. Thus, this module can serve as the experimenter, so that the user (an analyst or researcher) can utilize it to try various algorithms for mining the mood aspect of the underlying music data-set. The classifier model learnt can be saved for further evaluation. The output of this part of the module generally serves end-users who are analysts or researchers, keen to understand and tune the machine learning aspect of the whole process. Using this module, the classifier model for the bagging of Random forests approach was trained and stored after careful evaluation and comparison with other comparable models. Mood learning is generally a one-time activity: once done, the model is saved and can be re-used for evaluations any number of times. However, depending upon user preference, the learning can be made iterative to improve accuracy with the most up-to-date music data, which evolves considerably over time. This change, however, might require a few code changes, which are currently out of the scope of this project.

b. Mood Detector: In this case, the music data-set received as input has some dummy data in the "Mood" attribute, as this feature is not known and is expected to be predicted by this module. The Mood Detector evaluates the data-set under consideration against the mood classifier model that has been saved. The evaluation results in a predicted mood for every 30-second music clip that was fed to the system by the user. In case a whole song was fed instead, the system returns the maximum-voted mood among the moods predicted for all of the clips derived from that song. The output of this module is generally used by an end-user application, such as a mood annotator or any Music Information Retrieval application, or even by the end-user himself/herself. Although the module helps in detecting the mood of the music under consideration, the sole control of accepting or rejecting this decision can always be given to the end-user with some minor enhancements to the code.

Chapter 7 Experiments and Results

The project involved a great deal of rigorous experimentation from the data mining point of view. In addition, the preparation and pre-processing involved in carrying out the experimentation are also worth mentioning. This section describes the experimental apparatus, flow and results obtained during the experimentation for the music mood identification process.

7.1 Experimental Setup

The apparatus included:

• A large, diverse personal music collection of Indian popular music in MP3 or WAV format.
• Open-source tools and libraries for audio processing.
• The open-source data mining framework Weka [41].
• The Music Mood Identification System.
• A panel of five people: one music expert, two avid music listeners and two common music listeners.
• One workstation for software development, assembly and execution.

7.1.1 Data Collection

The data collection involved a personal music collection of Indian popular Hindi songs. Only those songs which are generally popular and famous among people were selected, and care was taken to ensure a good mix of songs spanning each of the five mood classes. Only songs in MP3 or WAV format were shortlisted, in alignment with the scope of the project.

7.1.2 Data pre-processing

Dataset generation was carried out in three stages. The first stage consisted of 490 songs, the second of 2200 songs, and by the third stage a total of 2300 popular audio songs belonging to Indian Hindi films had been processed to generate the dataset. All the songs were trimmed to 30-second clips. Their low-level features were extracted and consolidated into an ARFF dataset. Each entry was annotated with the most probable mood from the data collected by consulting the panel of five people, in order to recreate a real scenario for supervised training.

7.1.3 Training and Testing

The datasets in each stage were subjected to a range of existing classification algorithms under numerous runs and folds. Algorithms showing a bias towards only specific class labels or performing very poorly were discarded. The dataset was then subjected to a 66%-34% training-testing split for learning and evaluation of all the remaining algorithms, as sketched below.
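A hedged sketch of this 66%-34% split evaluation, using Weka's Evaluation class, could look as follows; the file name and the random seed are illustrative, and any of the classifiers listed next could be substituted for the Random forest shown:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import java.util.Random;

    public class SplitEvaluation {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mood_features.arff"); // hypothetical
            data.setClassIndex(data.numAttributes() - 1);
            data.randomize(new Random(1));                // illustrative seed

            int trainSize = (int) Math.round(data.numInstances() * 0.66);
            Instances train = new Instances(data, 0, trainSize);
            Instances test  = new Instances(data, trainSize,
                                            data.numInstances() - trainSize);

            RandomForest rf = new RandomForest();
            rf.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(rf, test);
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toClassDetailsString()); // per-class P, R, F, ROC
            System.out.println(eval.toMatrixString());       // confusion matrix
        }
    }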
Following are the 11 algorithms which showed the top comparable results during this experimentation:

• NaiveBayes
• Support Vector Machines
• J48 (C4.5 algorithm implementation)
• Random Tree
• Random Forest
• REPTree
• Simple CART (Classification and Regression Trees)
• Bagging of Random Trees
• Bagging of Random Forests
• Bagging of Simple CART
• Bagging of REPTree

7.2 Results

7.2.1 Evaluation Metrics

The 11 classification algorithms were evaluated with respect to four evaluation measures for each of the generated datasets:

• Receiver Operating Characteristic (ROC): It shows the trade-off between the true positive rate and the false positive rate. It is a two-dimensional plot with the vertical axis representing the true positive rate and the horizontal axis representing the false positive rate. The area under the ROC curve is a measure of the accuracy of the model; a model with perfect accuracy has an area of 1. It ranks the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list. The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model. The area under the ROC was originally used mainly in signal detection theory and the medical domain, where it is described as the plot of Sensitivity versus 1 - Specificity, a plot similar to the one defined above. The area under the ROC is calculated for each of the five classes of the mood model; the closer the value is to 1, the more accurate the classification.

• Confusion Matrix: The columns of the confusion matrix represent the predictions, and the rows represent the actual class. Correct predictions always lie on the diagonal of the matrix. Equation 7.1 shows the general structure of a two-class confusion matrix:

$\begin{bmatrix} TP & FN \\ FP & TN \end{bmatrix}$  (7.1)

wherein True Positives (TP) indicate the number of instances of a class that were correctly predicted; True Negatives (TN) indicate the number of instances NOT of a particular class that were correctly predicted NOT to belong to that class; False Positives (FP) indicate the number of instances NOT belonging to a class that were incorrectly predicted as belonging to that class; and False Negatives (FN) indicate the number of instances of a class that were incorrectly predicted as belonging to another class. Though the confusion matrix gives a better outlook on how the classifier performed than accuracy alone, a more detailed analysis is preferable, which the further metrics provide. Since in this case we have five mood classes, the confusion matrix is a 5 x 5 matrix, with each diagonal element representing the True Positives of the corresponding class.

• Recall: Recall gives the percentage of the actual class members that the classifier correctly identified; (TP + FN) represents the total number of actual members of the class. Recall is given by equation 7.2:

$Recall = \frac{TP}{TP + FN}$  (7.2)

• Precision: It gives the percentage of the instances assigned to a particular class by the model or classifier that actually belong to that class; (TP + FP) represents the total number of positive predictions by the classifier. Precision is given by equation 7.3:

$Precision = \frac{TP}{TP + FP}$  (7.3)

Thus, in general, it is said that Recall is a Completeness measure and Precision is an Exactness measure. The ideal classifier would give a value of 1 for both Recall and Precision, but if a classifier gives a higher value (closer to one) for one of these metrics and a lower value for the other, choosing between classifiers becomes a difficult task.
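Since this work reports these metrics per mood class over a 5 x 5 confusion matrix, the following minimal sketch (with illustrative names, assuming rows hold the actual classes and columns the predictions as described above) shows how per-class Precision and Recall, along with the F-Measure defined next, can be read off such a matrix:

    public class MetricsFromConfusionMatrix {
        // cm[i][j] = count of instances of actual class i predicted as class j
        public static void perClassMetrics(int[][] cm, String[] labels) {
            for (int c = 0; c < cm.length; c++) {
                int tp = cm[c][c];
                int fn = 0, fp = 0;
                for (int j = 0; j < cm.length; j++) {
                    if (j == c) continue;
                    fn += cm[c][j]; // actual class c, predicted otherwise
                    fp += cm[j][c]; // predicted class c, actually otherwise
                }
                double recall = tp / (double) (tp + fn);           // eq. 7.2
                double precision = tp / (double) (tp + fp);        // eq. 7.3
                double f = 2.0 / (1.0 / recall + 1.0 / precision); // eq. 7.4
                System.out.printf("%-10s P=%.3f R=%.3f F=%.3f%n",
                                  labels[c], precision, recall, f);
            }
        }
    }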
In such cases, where Precision and Recall disagree, other metrics suggested in the literature are used; the chief one is discussed next.

• F-Measure: It is the harmonic mean of Precision and Recall. We can say that it is essentially an average of the two percentages, and it greatly simplifies the comparison between classifiers. It is given by equation 7.4:

$F\text{-}Measure = \frac{2}{\frac{1}{Recall} + \frac{1}{Precision}}$  (7.4)

Figures 7.1 (Area under ROC statistics), 7.2 (Recall statistics), 7.3 (Precision statistics) and 7.4 (F-measure statistics) depict the performance of the algorithms with reference to the four measures, namely AUROC, Recall, Precision and F-measure. From each of the results, it can be seen that the Bagging (ensemble) approach applied to classification tree algorithms like Random Forest, Random Tree and Simple CART showed better results than the other algorithms, and Bagging of Random Forests consistently topped them all.

Table 7.1: Experimental Results on Test Dataset of 2938 music clips

    Mood Class   TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area
    excited      0.964     0.106     0.751       0.964    0.845       0.991
    happy        0.805     0.021     0.914       0.805    0.856       0.978
    romantic     0.77      0.006     0.971       0.77     0.859       0.967
    sad          0.822     0.019     0.867       0.822    0.844       0.977
    silent       0.871     0.038     0.849       0.871    0.86        0.983
    Wtd. Avg.    0.853     0.042     0.867       0.853    0.853       0.98

Table 7.2: Confusion matrix for the Test Dataset of 2938 music clips

    a     b     c     d     e     <- Classified As
    704   69    94    34    36    a = excited
    16    511   16    5     11    b = happy
    1     7     470   1     5     c = romantic
    0     15    10    314   23    d = sad
    9     33    20    28    506   e = silent

Table 7.1 shows the evaluation results obtained for the said metrics after performing a test run on a dataset of 2938 music clips belonging to Indian popular Hindi music. The table shows the classification performance for each of the mood categories defined in the mood model; the last row gives the metric values as a weighted average over all the classes. Table 7.2 displays the confusion matrix for the evaluation of the same test dataset. As seen from the matrix, the diagonal elements are the correctly identified data instances and denote the True Positives. From the data in the matrix, the following can be inferred:

Total number of instances: 2938
Number of correctly classified instances: 2505 (85.26%)
Number of incorrectly classified instances: 433 (14.74%)

Chapter 8 Applications

We believe our work can contribute substantially to a variety of real-world applications involving music. Following are a few of the many fields that can reap the benefits of this system.

8.1 Music Therapy Applications

The field of Music Therapy involves the clinical use of music in a therapeutic way to treat individuals by addressing their physical, emotional, social and cognitive needs. As a result of tremendous research and successful experiments, Music Therapy has emerged as an important field using music as a medium to improve the quality of life of people in spite of diversity, disability or illness. Receptive Music Therapy is one of the many important streams of this field, wherein, after examining the condition of the individual, the Music Therapy expert plans and recommends a routine involving listening to a particular type of music. Since this therapy is closely tied to the emotional and psychological needs of the individual, the mood underlying the music plays an important role in the choice of music.
Automatic mood recognition of music can help reduce the expert's effort in managing, searching and recommending the appropriate music for the individual. This can also be extended to online self-therapy applications, wherein individuals can themselves choose the appropriate music accurately, as directed by the expert, without much search effort.

8.2 Music Information Retrieval

MIR systems aim at extracting information from music. This information can be used for various music applications, such as recommender systems, instrument recognition and separation applications, automatic categorization systems and many more. Our system can contribute to automatic categorization systems, wherein music is categorized automatically by the mood our system recognizes. This will not only help organize music in a much better way but also reduce the overhead on users in selecting a list of songs suiting the current mood or occasion. With this system in place, a user can simply choose a mood and the system can return the list of all songs belonging to that mood; the user then selects the songs he or she wishes to listen to from this subset, which is very small compared to the whole collection. With traditional techniques, by contrast, the user selects songs by name, album or artist and then searches for a song matching the mood. The system can also find application in recommender systems, to recommend songs matching the mood along with other traditional parameters, which can definitely give better results.

8.3 Intelligent Automatic Music Composition

Music in today's world is created and composed by highly skilled and trained musicians. With increasing innovation in technology, many software packages and devices have also proved beneficial in assisting musicians by easing the effort needed to compose music from various instruments and singers, and to merge or process it. A lot of research is going on all over the world with the aim of building a system which can compose music automatically, and intelligently enough to sound as interesting as music composed by humans. Building such an application will require not only a lot of music signal processing, pattern recognition and matching, but also a great deal of information and data about the music, in order to produce a novel composition. The mood of music pieces can form one of the important parameters in searching for music pieces to be put together to generate new music. Our system can help at this stage by automatically recognizing and annotating music pieces.

Chapter 9 Conclusion and Future Work

9.1 Conclusion

We successfully experimented with the task of mapping audio features of Indian Popular Music to the respective moods, with the top results ranging between 75% and 81% with respect to F-measure and between 70% and 75% with respect to the precision measure. The best accuracy with respect to the area under the ROC curve was observed in the range 0.91 to 0.94, which seems quite satisfactory. The Bagging of Random Forests approach thus performed much better than not just other decision-tree based algorithms but other classification algorithms as well. This was a new observation in the analysis of Indian popular music, unlike western music, where SVM and neural network algorithms have dominated classifier accuracy. The classification performance achieved seems satisfactory so far, making it useful for real applications.
The open source framework developed as a part of the project also serves as a common framework for music data mining analysis in terms of an end-to-end solution. Although the current approach has produced satisfactory results, we consider this just a first step in exploring Indian popular music, and it opens avenues for further research and development in this area to bring more efficient results.

9.2 Future Work

The path forward involves a further cycle of experimentation and refinement of the audio features and, if required, the mood categories, so as to enrich the dataset, in addition to an increased number and variety of songs from which to extract further valuable information for mood learning. During this development cycle the mood model is also likely to undergo some changes to best suit Indian song scenarios. The current system is capable of recognizing the mood of song clips of 30-second duration; this can further be extended to derive the mood of an entire song by collectively recognizing and weighing the moods recognized for the 30-second trimmed clips of the song. In future, this system can be extended to other genres of Indian music, such as Hindustani classical and Carnatic music, with changes involving the audio features and classification techniques. Customization of this system to non-Indian songs cannot be ruled out either, after thorough experimentation. Since some of the moods represented by Indian popular music are very much governed by expressions, which are very well conveyed through lyrics, lyrics analysis in combination with audio features can make the system much stronger, with better accuracy.

Chapter 10 Project Milestones

10.1 Project Schedule

Table 10.1 outlines the project schedule and the major milestones achieved towards completion of the project. The project was on schedule and completed in the time planned with respect to the scope assigned.

Table 10.1: Weekly Schedule of Project Starting 1 July, 2011

    Week       Task                                            Status
    1 to 4     Problem Statement Identification                Completed
    5 to 6     Problem Statement Finalization                  Completed
    7          Project Synopsis, Literature Survey - MIR       Completed
    8 to 9     Literature Survey - MIR and Music Analysis      Completed
    10 to 11   Literature Survey - Mood Classification         Completed
    12         Literature Survey - Audio Features              Completed
    13 to 14   Literature Survey - Feature Extraction Tools    Completed
    15 to 16   Literature Survey - Data Mining for Music       Completed
    17         Data Collection and Preprocessing               Completed
    18 to 19   Detailed System Design, Data Processing         Completed
    20 to 23   Existing mining algorithms training             Completed
    24         Algorithm performance analysis                  Completed
    25 to 26   Algorithm refinement                            Completed
    27 to 29   Feature selection refinement                    Completed
    30 to 31   Mood Model refinement                           Completed
    32         Feature and Mood model finalization             Completed
    33         Data collection and Dataset re-structuring      Completed
    34 to 35   Re-analysis and evaluation of model learnt      Completed
    36 to 40   Code integration and testing                    Completed
    41 to 44   Bug solving and fixing                          Completed

10.2 Publications' status

From the conception of the project through its completion for the said scope, the project has passed through various stages, and we have received very good responses, suggestions and critiques regarding the work presented in our research papers.
A few of our papers related to the work done, which have been reviewed, accepted and appreciated by notable international conferences, are listed in Table 10.2.

Table 10.2: Paper publications' status

    Title: Automatic mood classification model for Indian Popular Music
    Conference: Asia Modelling Symposium 2012; proceedings to be published in IEEE Computer Society Digital Library (CSDL) and IEEE Xplore, http://ams2012.info
    Status: Published

    Title: Mood based classification of Indian Popular Music
    Conference: CUBE 2012; proceedings to be published in ACM, http://www.thecubeconf.com/academic/
    Status: Accepted

    Title: Music Mood Identification: A Data Mining Approach
    Conference: 5th International Conference on Psychology of Music and Mental Health, Bangalore, http://www.nada.in/
    Status: Accepted

Bibliography

[1] Tsunoo, E., Akase, T., Ono, N., Sagayama, S., (2010), "Musical mood classification by rhythm and bass-line unit pattern analysis", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Attribute-Relation File Format, http://www.cs.waikato.ac.nz/ml/weka/arff.html
[3] Doris Baum, (2006), "Emomusic - Classifying music according to emotion", Proceedings of the 7th Workshop on Data Analysis (WDA), Kosice, Slovakia.
[4] Breiman, L., (1996), "Bagging predictors", Machine Learning, 24(2), 123-140.
[5] Capurso, A., Fisichelli, V. R., Gilman, L., Gutheil, E. A., Wright, J. T., (1952), "Music and Your Emotions", Liveright Publishing Corporation.
[6] Chia-Chu Liu, Yi-Hsuan Yang, Ping-Hao Wu, Homer H. Chen, (2006), "Detecting and classifying emotions in popular music", JCIS Proceedings.
[7] Dalibor Mitrovic, Matthias Zeppelzauer, Horst Eidenberger, (2007), "Analysis of the Data Quality of Audio Descriptions of Environmental Sounds", Journal of Digital Information Management, 5(2):48.
[8] Trung-Thanh Dang, Kiyoaki Shirai, (2009), "Machine Learning Approaches for Mood Classification of Songs toward Music Search Engine", International Conference on Knowledge and Systems Engineering.
[9] Atin Das, Pritha Das, (2005), "Classification of Different Indian Songs Based on Fractal Analysis", Complex Systems Publications.
[10] Dewi, K. C., Harjoko, A., (2010), "Kid's Song Classification Based on Mood Parameters Using K-Nearest Neighbor Classification Method and Self Organizing Map", International Conference on Distributed Frameworks for Multimedia Applications (DFmA).
[11] Duda, R. O., Hart, P. E., (2000), "Pattern Classification", New York: Wiley.
[12] Ekman, P., (1982), "Emotion in the Human Face", Cambridge University Press, second ed.
[13] Paul R. Farnsworth, (1958), "The social psychology of music", The Dryden Press.
[14] Fu, Z., Lu, G., Ting, K. M., Zhang, D., (2010), "A survey of audio-based music classification and annotation", IEEE Trans. Multimedia.
[15] George Tzanetakis, Perry Cook, (2002), "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing.
[16] Han, B., Rho, S., Dannenberg, R. B., Hwang, E., (2009), "SMERS: Music emotion recognition using support vector regression", Proceedings of the 10th Intl. Society for Music Information Retrieval Conf., Kobe, Japan.
[17] Han, J., Kamber, M., Pei, J., (2011), "Data Mining: Concepts and Techniques, 3rd Edition", Morgan Kaufmann publications, ISBN: 9780123814791.
[18] Kate Hevner, (1936), "Experimental studies of the elements of expression in music", American Journal of Psychology, 48:246-268.
[19] Ho, T., (1998), "The random subspace method for constructing decision forests", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.
[20] JungHyun Kim, Seungjae Lee, SungMin Kim, WonYoung Yoo, (2011), "Music Mood Classification Model Based on Arousal-Valence Values", ICACT 2011, ISBN 978-89-5519-155-4.
[21] Li, T., Ogihara, M., (2003), "Detecting emotion in music", Proceedings of the International Symposium on Music Information Retrieval, Washington D.C., USA.
[22] Liu, D., Lu, L., Zhang, H. J., (2003), "Automatic Mood Detection from Acoustic Music Data", Johns Hopkins University.
[23] McAdams, S., (1999), "Perspectives on the contribution of timbre to musical structure", Computer Music Journal, 23:85-102.
[24] McEnnis, D., McKay, C., Fujinaga, I., Depalle, P., (2005), "jAudio: A feature extraction library", Proceedings of the International Conference on Music Information Retrieval, 600-603.
[25] Marsyas, http://opihi.cs.uvic.ca/marsyas
[26] Masato Miyoshi, Satoru Tsuge, Hillary Kipsang Choge, Tadahiro Oyama, Momoyo Ito, Minoru Fukumi, (2010), "Music Impression Detection Method for User Independent Music Retrieval System", Proc. of KES 2010, pages 612-621.
[27] Mirenkov, N., Kanev, K., Takezawa, H., (2008), "Quality of Life Supporters Employing Music Therapy", Advanced Information Networking and Applications - Workshops (AINAW).
[28] MPEG-7 Overview, http://mpeg.chiariglione.org/standards/mpeg-7/mpeg7.htm
[29] Rabiner, L., Juang, B., (1993), "Fundamentals of speech recognition", New York: Prentice-Hall.
[30] Radocy, E., Boyle, J. D., (1988), "Psychological foundations of musical behavior", Springfield, IL: Charles C. Thomas.
[31] Russell, J. A., (1980), "A circumplex model of affect", Journal of Personality and Social Psychology, 39:1161-1178.
[32] Ryo Hirae, Takashi Nishi, (2008), "Mood Classification of Music Audio Signals", The Journal of the Acoustical Society of Japan.
[33] Scaringella, N., Zoia, G., Mlynek, D., (2006), "Automatic genre classification of music content: a survey", IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 133-141.
[34] Schoen, M., Gatewood, E. L., (1999), "The mood effects of music", International Library of Psychology, Routledge.
[35] Schubert, E., (1996), "Continuous response to music using a two dimensional emotion space", Proceedings of the 4th International Conference of Music Perception and Cognition.
[36] Sloboda, J. A., Juslin, P. N., (2001), "Music and Emotion: Theory and Research", New York: Oxford University Press.
[37] Thayer, R. E., (1989), "The Bio-psychology of Mood and Arousal", New York: Oxford University Press.
[38] Tyler, P., (1996), "Developing A Two-Dimensional Continuous Response Space for Emotions Perceived in Music", Doctoral dissertation, Florida State University.
[39] Wang, M., Zhang, N., Zhu, H., (2004), "User-adaptive Music Emotion Recognition", IEEE International Conference on Signal Processing, pp. 1352-1355.
[40] Weihs, C., Ligges, U., Morchen, F., Mullensiefen, D., (2007), "Classification in music research", Advances in Data Analysis and Classification, vol. 1, no. 3, pp. 255-291.
[41] Weka, http://www.cs.waikato.ac.nz/ml/weka/
[42] Zhouyu Fu, Guojun Lu, Kai Ming Ting, Dengsheng Zhang, (2011), "A Survey of Audio-Based Music Classification and Annotation", IEEE Transactions on Multimedia, Vol. 13, No. 2.