A Comparative Study of the Various Methods Used for the Prediction of Hit Songs

Aishwarya Harne, Mihir Borkar, Abhinav Garg, Prof. Abhijit Patil
Dept. of Computer Engineering, D.J. Sanghvi College of Engineering, Mumbai University, Mumbai

ABSTRACT
The music industry today is filled with scores of struggling composers and music producers churning out a plethora of songs. Hardly 5% of these tracks are actually released, and even fewer become popular amongst the masses. This project aims to help the 20-billion-dollar music industry cater to those who patronize it by helping musicians understand how commercially popular a new soundtrack is likely to be. We propose to do this by using logistic regression to determine the popular features of a track encoded in a numerical format. We extract the frequency, tempo and pitch of the sound from the MIDI file. The MIDO parser in Python encodes these characteristics in a numerical format. The MIDO file associates every instrument with a number (1 for Drums, 2 for Guitar, etc.), separating consecutive notes with a colon (:) as a delimiter. We proceed to increase the accuracy of the training by using 50 soundtracks as positive examples and 50 as negative examples.

Keywords
Naïve Bayes, logistic regression, security protocols, support vector machine

1. INTRODUCTION
In this paper, we seek to establish the best way to determine the commercial success of a song. After decoding the MIDI files into a numerical format, there are a variety of probabilistic models that can be applied to analyze and determine the success of the soundtrack. We are considering three models, namely Logistic Regression, Naive Bayes and the SVM (Support Vector Machine). All three methods help us work on a dichotomous result. We proceed to look at each of these methods in depth and then conclude which method is most applicable to the project.

2. TYPE OF MUSICAL FEATURES
The characteristics of a particular track can be identified by three basic components, which make it unique compared to other musical tracks and contribute to its popularity [2]:

1. Timbre/instrument
Also known as tone color or tone quality in psychoacoustics, timbre is the quality of a musical note or sound that differentiates the various types of sound production, including voices, string instruments, wind instruments and percussion.

2. Melody
A melody is a sequence of musical notes played as part of a composition. The arrangement of single notes forms an important aspect of the melody.

3. Beats
A beat is a periodic variation of amplitude or sound. It is a regular rhythmic sound or movement that forms the backbone of any composition.

3. LOGISTIC CLASSIFICATION
A logistic regression model can be used to classify inputs into binary or multiple categories. Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution. Thus, it treats the same set of problems as probit regression using similar techniques, with the latter using a cumulative normal distribution curve instead. Equivalently, in the latent variable interpretations of these two methods, the first assumes a standard logistic distribution of errors and the second a standard normal distribution of errors.

Logistic regression can be seen as a special case of the generalized linear model and is thus analogous to linear regression. The model of logistic regression, however, is based on quite different assumptions (about the relationship between dependent and independent variables) from those of linear regression. The key differences between the two models can be seen in two features of logistic regression. First, the conditional distribution is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the predicted values are probabilities and are therefore restricted to (0, 1) through the logistic distribution function, because logistic regression predicts the probability of particular outcomes [1].

Logistic regression can be used to predict songs by classifying them as hit or not, using a decision boundary based on labels such as whether the song has consistently ranked in the top 40 of the Billboard charts or in the top 100 of music websites such as Spotify.

The formulae used while performing logistic classification involve the following symbols:
$J(\theta)$ stands for the cost function
$h_\theta(x)$ stands for the hypothesis

4. SUPPORT VECTOR MACHINES
In machine learning, support vector machines (SVMs, also support vector networks [1]) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces [6].

SVMs belong to the family of linear classifiers. A linear classifier uses an object's characteristics to identify the group or class it belongs to; the decision is based on the value of a linear combination of those characteristics. The SVM algorithm has been widely used in text and hypertext classification. Handwritten characters can also be recognized efficiently by an SVM model. SVMs are also useful in medical science to classify proteins, with up to 90% of the compounds classified correctly.

5. NAIVE-BAYES

5.1 Concept of Conditional Probability
Conditional probability determines the likelihood of a particular event occurring given that another event has already occurred. For example, in a box that consists of apples and oranges, some of the fruits are spoiled. What is the probability that a fruit picked at random is spoiled, given that it is an apple? Questions of this type can be solved using conditional probability.

5.2 Bayes Rule
Bayes rule quantifies the conditional probability of an event by relating it to its reverse form [7]. Taking the above example:

P(A) = Probability that the fruit is spoiled
P(B) = Probability that the fruit is an apple
P(B|A) = Probability that the fruit is an apple, given that it is spoiled

Let's assume P(A) = 0.3, P(B) = 0.5 and P(B|A) = 0.2. According to Bayes theorem,

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{0.2 \times 0.3}{0.5} = 0.12$$

5.3 Naive Bayes
Naive Bayes is generally used when trying to predict an outcome in the presence of more than one piece of evidence. We decompose the Bayes rule in such a scenario and apply it to every piece of evidence individually.

$$\text{Posterior Probability} = \frac{\text{Prior Probability} \times \text{Likelihood}}{\text{Evidence}}$$

6. CONCLUSION
We have tried to train our data sets using the Logistic Regression, SVM and Naive Bayes methods. However, we conclude that Logistic Regression is the most effective method in this case.
SVM is generally used for cases where time efficiency is critical [2], but this is not needed when working on a relatively small dataset. Naive Bayes is effective when one feature does not depend on another [2], but in the case of music tracks, popularity is determined by the succession of notes, so the features are not independent. Hence this method is not very useful. Logistic Regression is well suited since it also allows us to impose a stricter criterion on popular songs by choosing an appropriate probability threshold [2].

REFERENCES
[1] Logistic Regression for Classification, available at: https://en.wikipedia.org/wiki/Logistic_regression
[2] Wang, Keven Kedao. "Predicting Hit Songs with MIDI Musical Features." cs229.stanford.edu
[3] Koenigstein, Noam, Yuval Shavitt, and Noa Zilberman. "Predicting billboard success using data-mining in p2p networks." Multimedia, 2009. ISM'09. 11th IEEE International Symposium on. IEEE, 2009.
[4] Shapiro, Heather. "A Bayesian Approach to Understanding Music Popularity." (2015).
[5] Dhanaraj, Ruth, and Beth Logan. "Automatic Prediction of Hit Songs." ISMIR. 2005.
[6] Support Vector Machines, available at: https://en.wikipedia.org/wiki/Support_vector_machine
[7] Naïve Bayes Classifier, available at: https://en.wikipedia.org/wiki/Naive_Bayes_classifier
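As a supplementary numerical check of the fruit example in Section 5.2 and of the decomposition described in Section 5.3, the following Python sketch may help. The function names and the two-feature likelihood values are invented for illustration and do not come from the study's data. The second function also assumes the pieces of evidence are conditionally independent, which is the naive Bayes assumption.

```python
import math


def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


def naive_bayes_posteriors(priors, likelihoods):
    """Posterior per class, assuming independent pieces of evidence.

    priors:      {class: P(class)}
    likelihoods: {class: [P(evidence_i | class), ...]}
    """
    # Numerator of Bayes rule: prior times the product of the likelihoods.
    scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    # The evidence term is the same for every class, so normalising suffices.
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


# Values from the fruit example: P(B|A) = 0.2, P(A) = 0.3, P(B) = 0.5.
p_spoiled_given_apple = bayes_posterior(0.2, 0.3, 0.5)
print(round(p_spoiled_given_apple, 2))  # 0.12

# Hypothetical two-feature example; these numbers are invented.
priors = {"hit": 0.5, "not_hit": 0.5}
likelihoods = {"hit": [0.8, 0.6], "not_hit": [0.3, 0.4]}
posteriors = naive_bayes_posteriors(priors, likelihoods)
print({c: round(p, 2) for c, p in posteriors.items()})
```

Normalising over the classes avoids having to estimate the evidence term separately, since it is the same for every class.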
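The logistic classification of Section 3, together with the idea from the conclusion of imposing a stricter criterion on hits, can be sketched as follows. This is a minimal illustration, assuming the standard sigmoid hypothesis, the cross-entropy cost and batch gradient descent. The single feature, the toy data, the learning rate and the 0.9 threshold are all hypothetical and are not taken from the paper's experiments.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta . x); x[0] is a constant 1 for the bias."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y):
    """Average cross-entropy cost J(theta) over the training set."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += -(yi * math.log(h) + (1 - yi) * math.log(1 - h))
    return total / len(X)


def train(X, y, alpha=0.1, steps=2000):
    """Batch gradient descent on J(theta); alpha is the learning rate."""
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(steps):
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
                for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def is_hit(theta, x, threshold=0.5):
    """Label a track a hit when its predicted probability meets the threshold."""
    return hypothesis(theta, x) >= threshold


# Invented toy data: [bias, one hypothetical normalised feature]; 1 = hit.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.4], [1.0, 0.6], [1.0, 0.8], [1.0, 0.9]]
y = [0, 0, 0, 1, 1, 1]

theta = train(X, y)
default_hits = sum(is_hit(theta, x) for x in X)
strict_hits = sum(is_hit(theta, x, threshold=0.9) for x in X)
print(default_hits, strict_hits)
```

Raising the threshold can only keep or reduce the number of tracks labelled as hits, which is one way to read the stricter criterion mentioned in the conclusion.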