Music Mood Classification Using Intro and Refrain Parts of Lyrics

Seungwon Oh and Minsoo Hahn
Digital Media Lab, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
{swoh, mshahn1}@kaist.ac.kr

Jinsul Kim
Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea
[email protected]

Abstract—In this paper, we propose a lyrics-based classification approach that estimates the mood of a song from only the intro and refrain parts of its lyrics. In general, the intro establishes the atmosphere of a song, and the refrain is its strongest part. The proposed method extracts features strongly associated with mood from these two parts. By calculating the similarity between the terms of these parts and eight basic emotions, it classifies songs according to mood.

Keywords—mood; classification; lyrics

I. INTRODUCTION

We live in a constant flow of media, and music is one of the most popular media. Songs are closely related to emotion and mood, and people often select the songs they want to hear according to how they feel. However, finding such music manually is not easy because there are so many songs, so a music classification and recommendation system is needed. In the iTunes software, users can attach tags to songs and then easily find the music they want [1]. However, tagging is entirely manual, and users must already know a song in order to tag it. Automatic approaches are therefore needed.

Some methods classify songs automatically. Saunders extracts features from the speech and audio signals of music [2]. However, singing differs considerably from normal speech, so recognizing the vocal signal in music is very hard. An alternative is to analyze the lyrics, which is much simpler than analyzing the audio signal.

This paper presents a new lyrics-based mood classification method that uses only the intro and refrain parts of lyrics. The intro carries the information that creates the atmosphere of a song, and the refrain contains its most important keywords. By disregarding less meaningful words elsewhere in the lyrics, the proposed approach can improve classification accuracy.

II. PLUTCHIK'S EMOTION MODEL

A. Decision of Mood

Plutchik's basic emotion model defines human emotion in terms of eight basic emotions, as shown in Fig. 1 [3]. Plutchik argues that emotions not displayed in Fig. 1 can be represented as combinations of the basic emotions. The proposed approach uses the eight basic emotions Joy, Acceptance, Anticipation, Anger, Disgust, Sadness, Surprise, and Fear, and it analyzes the similarity between lyrics and emotions by considering the correlation between vocabulary and these emotions.

Figure 1. Plutchik's basic emotion model

B. Music Recommendation Based on the Mood of a User

To recommend songs, the proposed approach defines three user states, happy, angry, and sad, as shown in Table I. When a user feels happy, it plays songs whose mood is joy, acceptance, or anticipation. When a user feels sad, there are two kinds of results: a collection of sad songs that lets the user feel sympathy, or a collection of happy and encouraging songs that helps the user overcome the sadness. Users in a sad mood can choose between the two.

TABLE I. MUSIC RECOMMENDATION TABLE

User's mood     Recommended mood of songs
Happy           joy + acceptance + anticipation
Angry           anger + disgust
Sad (case 1)    sadness + fear + surprise
Sad (case 2)    joy + acceptance + anticipation

III. FEATURE SELECTION

A. Feature Set Selection

Feature selection is a key step in pattern classification, often more important than the choice of learning algorithm. Text classification usually relies on word frequency. The proposed method is based on song lyrics and therefore uses features that differ from those of ordinary text classification. First, we employ the term count as a feature, as shown in Fig. 2, and we calculate the similarity between each word and each emotion. Second, we focus on the intro and refrain parts for two reasons: 1) the intro opens the song and sets its atmosphere, so it should contain the more important keywords; 2) the refrain contains the repeated words of a song, which matter because a songwriter repeats the essential keywords.

Figure 2. Term count plot

B. Feature: Term Count

We use the ANEW (Affective Norms for English Words) database to collect training samples, as shown in Fig. 3 [4]. ANEW rates each word on three factors: Valence, Arousal, and Dominance. Our model, however, needs eight factors to match Plutchik's emotion model, so we expand ANEW: for each word, we calculate the three-dimensional Euclidean distance from the word's rating to the ratings of the eight emotion terms Joy, Acceptance, Anticipation, Anger, Disgust, Sadness, Surprise, and Fear. A term that expresses joy should lie closer to the term "Joy" than to the other seven emotion terms.

Figure 3. 3D feature plot for ANEW (Affective Norms for English Words)
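To make the expansion concrete, the following Java sketch computes the eight distance features for a single word. It is a minimal illustration of the procedure above, not the paper's implementation: the class name EmotionFeatures is ours, and the (valence, arousal, dominance) anchor coordinates are hypothetical placeholders rather than actual ANEW ratings.

```java
/**
 * Sketch of the ANEW expansion in Sec. III.B: a word's (valence, arousal,
 * dominance) rating is turned into eight features, one per basic emotion,
 * via 3D Euclidean distance to the ratings of the eight emotion terms.
 * The anchor coordinates below are illustrative, not actual ANEW values.
 */
public class EmotionFeatures {

    static final String[] EMOTIONS = {
        "Joy", "Acceptance", "Anticipation", "Anger",
        "Disgust", "Sadness", "Surprise", "Fear"
    };

    // Hypothetical (valence, arousal, dominance) anchors on a 1-9 scale.
    static final double[][] ANCHORS = {
        {8.2, 5.6, 6.5}, {7.0, 4.5, 5.5}, {6.3, 5.9, 6.0}, {2.3, 7.2, 5.6},
        {2.5, 5.4, 4.9}, {1.8, 4.1, 3.4}, {7.0, 7.5, 5.2}, {2.1, 6.9, 3.2}
    };

    /** Euclidean distance between two (V, A, D) points. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < 3; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Eight distances for one word; a smaller value means more similar. */
    static double[] emotionDistances(double[] vad) {
        double[] dist = new double[EMOTIONS.length];
        for (int e = 0; e < EMOTIONS.length; e++) {
            dist[e] = distance(vad, ANCHORS[e]);
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical ANEW-style rating for the word "smile".
        double[] smile = {7.9, 5.9, 6.6};
        double[] dist = emotionDistances(smile);
        for (int e = 0; e < EMOTIONS.length; e++) {
            System.out.printf("%-13s %.3f%n", EMOTIONS[e], dist[e]);
        }
    }
}
```

Smaller distance means stronger association with that emotion; in the proposed method, these per-term features are then weighted by term frequency in the intro and refrain parts.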
C. Feature: Intro and Refrain Parts

Handling only the important parts of the lyrics may be more effective than handling all of them. We define the two parts as follows.

1) Intro part: Intuitively, the intro of a song uses more emotionally intensive words in order to convey the mood of the song to listeners. As an initial setting, we take the first two sentences of the lyrics as the intro; this length can be adjusted for optimization.

2) Refrain part: To find the refrain of a song, we need to find repeating sentences. In general, the refrain repeats the same sentences, although parts of it may change within a song. To detect the refrain correctly, we therefore allow for small changes in it. If changes within the refrain are ignored, the problem simplifies to a 'longest common repeat problem'.

IV. EXPERIMENT

A. Setup

To evaluate the proposed approach, we use a support vector machine. We implemented a Java-based mood classification application with LIBSVM, an open-source library, as shown in Fig. 4.

Figure 4. Music mood classifier application

B. Building the Training Set

We use eight kinds of moods. For each term, we calculate the distances between the term and the eight mood classes. For a mood 'A', we assume two classes: class 'A', consisting of songs in mood 'A', and class 'Ā', consisting of songs not in mood 'A'. We then count the term frequency in each feature part of a song's lyrics and use it as a weight. The intro and refrain parts serve as the features for classification.
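As a rough sketch of the setup described above, the following Java fragment trains one binary classifier per mood with the LIBSVM Java API. Everything not stated in the paper is an assumption: the class name MoodSvm, the stubbed-out feature encoding, and the kernel and parameter values are illustrative choices, not the authors' reported settings.

```java
import libsvm.*;

/**
 * Sketch of the per-mood training in Sec. IV.B using the LIBSVM Java API.
 * For each mood 'A', a binary C-SVC model separates class 'A' (label +1)
 * from class 'Ā' (label -1). Each song is assumed to be encoded as the
 * emotion-distance features of its intro and refrain terms, weighted by
 * term frequency; that encoding is stubbed out here.
 */
public class MoodSvm {

    /** Wrap a dense feature vector in LIBSVM's sparse node format. */
    static svm_node[] toNodes(double[] features) {
        svm_node[] nodes = new svm_node[features.length];
        for (int i = 0; i < features.length; i++) {
            nodes[i] = new svm_node();
            nodes[i].index = i + 1;   // LIBSVM indices are 1-based
            nodes[i].value = features[i];
        }
        return nodes;
    }

    /** Train one binary model for a single mood. */
    static svm_model trainForMood(double[][] songFeatures, double[] labels) {
        svm_problem prob = new svm_problem();
        prob.l = songFeatures.length;
        prob.y = labels;                       // +1 = mood A, -1 = not A
        prob.x = new svm_node[prob.l][];
        for (int i = 0; i < prob.l; i++) {
            prob.x[i] = toNodes(songFeatures[i]);
        }

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.C = 1.0;                         // illustrative values
        param.gamma = 0.125;
        param.cache_size = 100;
        param.eps = 1e-3;

        return svm.svm_train(prob, param);
    }

    /** Predict whether a new song belongs to the mood. */
    static boolean isInMood(svm_model model, double[] features) {
        return svm.svm_predict(model, toNodes(features)) > 0;
    }
}
```

Running this once per mood yields eight models; a song's predicted mood labels can then be combined according to the recommendation rules of Table I.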
C. Building the Testing Set

We randomly select one hundred songs from various music collections and evaluate the classification application on them.

D. Results

Table II shows the experimental results. The accuracy for Fear and Anticipation is higher than for the other moods because many ANEW terms are located near them. Conversely, the terms related to Joy, Acceptance, and Sadness carry ambiguous meanings, so classification accuracy for those moods is low. Nevertheless, user evaluations were rather positive because, as the recommendation table shows, recommendations are decided by combinations of the basic emotions.

TABLE II. EXPERIMENT RESULTS

Mood           # of songs    # of correct classifications    Accuracy
Joy            33            14                              42.4%
Acceptance     36            12                              33.3%
Fear           15            11                              73.3%
Surprise       3             0                               0%
Sadness        35            12                              34.3%
Disgust        30            11                              36.7%
Anger          28            16                              57.1%
Anticipation   57            40                              70.1%

V. CONCLUSION

We proposed a method that uses Plutchik's emotion model to classify the mood of a song. The approach can support automatic music classification in commercial music download services or Internet radio broadcasting services, and it can provide recommendations according to a user's mood. With additional information about songs, such as genre, accent, and speed, it could yield better classifier and recommender services.

ACKNOWLEDGMENT

This research is supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program.

REFERENCES

[1] http://www.apple.com/itunes/
[2] J. Saunders, "Real-time discrimination of broadcast speech/music," in Proc. ICASSP 96, vol. 2, Atlanta, GA, 1996, pp. 993-996.
[3] A. Ortony and T. J. Turner, "What's basic about basic emotions?" Psychological Review, 1990.
[4] M. M. Bradley, B. N. Cuthbert, and P. J. Lang, "Affective Norms for English Words (ANEW): Technical Manual and Affective Ratings," Gainesville, FL: The Center for Research in Psychophysiology, University of Florida, 1998.