Data Mining on Big Data for Music Recommender Systems

Advisor(s): Fabrice Muhlenbach (UJM, LaHC), Pierre-René Lhérisson (UJM, LaHC / 1Dlab), and Pierre Maret (UJM, LaHC)
Mail: [email protected], [email protected], [email protected]
Location: Laboratoire Hubert Curien
Team: Connected Intelligence

Summary

In today’s world, many goods and services are provided to consumers through web applications. Given the plethora of information available in e-commerce, this information must be filtered to keep only the items that might be relevant to the user. Recommender systems are automated systems of this kind: they can be defined as software tools and techniques that provide suggestions for items that are most likely of interest to a particular user [1]. In the cultural field, for example in music recommendation [2], using such systems raises the question of diversity, novelty, and discovery [3]. People are fond of stability, but they are not against breaking their routine and exploring things outside their comfort zone. In this context, it is relevant to propose new items that are not too similar to the items users have already consumed or bought, in order to expand and enrich their cultural knowledge. This can be done with a dissimilarity measure computed between cultural items. Moreover, a few years ago, with the emergence of the word embedding paradigm [4] and the ability to run algorithms on big data, content-based approaches [5] experienced a resurgence of interest for recommender systems [6, 7].
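As an illustration of the dissimilarity idea described in the summary, the following minimal Python sketch represents each artist by the average of the word vectors of its description and keeps only the candidates whose cosine dissimilarity to the listening profile falls in an intermediate band. It is a sketch under stated assumptions, not the method of the project: the toy vectors, the artist names, the helper names (`embed`, `cosine_dissimilarity`, `recommend`), and the thresholds `low` and `high` are all hypothetical, and a real system would use embeddings trained on a large corpus.

```python
import math

# Toy 3-dimensional word vectors. These values are invented for illustration;
# they are NOT trained embeddings.
TOY_VECTORS = {
    "folk": [0.9, 0.1, 0.0],
    "acoustic": [0.8, 0.2, 0.1],
    "guitar": [0.7, 0.3, 0.2],
    "techno": [0.0, 0.9, 0.3],
    "synth": [0.1, 0.8, 0.4],
    "jazz": [0.3, 0.2, 0.9],
}


def embed(description):
    """Average the vectors of the known words in a text description."""
    vectors = [TOY_VECTORS[w] for w in description.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("description contains no known word")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recommend(listened, candidates, low=0.05, high=0.6):
    """Keep candidates that are neither too similar nor too dissimilar.

    `low` and `high` are hypothetical thresholds chosen for this toy example.
    Results are sorted from the closest to the farthest candidate.
    """
    profile = embed(" ".join(listened.values()))
    scored = []
    for name, description in candidates.items():
        d = cosine_dissimilarity(profile, embed(description))
        if low <= d <= high:
            scored.append((d, name))
    return [name for d, name in sorted(scored)]


# Hypothetical artists and descriptions.
listened = {"Artist A": "acoustic folk guitar"}
candidates = {
    "Artist B": "folk guitar",    # almost identical to the profile
    "Artist C": "techno synth",   # very far from the profile
    "Artist D": "folk jazz",      # related, but not a near-duplicate
}

print(recommend(listened, candidates))
```

With these toy values, only "Artist D" falls inside the band: "Artist B" is rejected as too similar to what the user already listens to, and "Artist C" as too distant. Choosing the band in practice is an open design question that would have to be settled empirically.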
The objective of this master thesis is to study how data mining techniques can help improve the quality of music recommendations on a streaming platform when the content associated with music artists is not well structured. This is the case for emerging music artists from independent record labels (“indie labels”): the artists are not publicly known, they have little or no publicity, most of them do not have a web site with useful structured content to exploit (e.g., a Wikipedia page), there is very little chance of finding articles about these artists in the specialized music press, etc. This project will be carried out at the Hubert Curien Laboratory (Saint-Etienne, France) on the data of a real music streaming platform called “1D lab”, developed by the social start-up company 1D Lab (http://en.1d-lab.eu/).

Expected results

• Theoretical: Data mining techniques for improving the quality of a content-based music recommender system model.
• Practical:
  – Implementation of word embedding techniques (in R/Python) on artist descriptions to improve the similarity measure between music artists.
  – Application of this similarity measure between a list of listened-to music artists and candidate items for recommendation.
  – Evaluation of recommendation relevance with a top-N recommendation protocol when the items are not very popular, as with indie music artists [8].

Keywords: data mining, recommender system, music recommendation, content-based recommendation, word embedding, top-N recommendations

References

[1] F. Ricci, L. Rokach, and B. Shapira, “Recommender systems: Introduction and challenges,” in Recommender Systems Handbook, F. Ricci, L. Rokach, and B. Shapira, Eds. Springer, 2015, pp. 1–34. [Online]. Available: http://dx.doi.org/10.1007/978-1-4899-7637-6_1
[2] M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, “Music recommender systems,” in Recommender Systems Handbook, F. Ricci, L. Rokach, and B. Shapira, Eds. Springer, 2015, pp. 453–492. [Online]. Available: http://dx.doi.org/10.1007/978-1-4899-7637-6_13
[3] E. Pariser, The Filter Bubble: What the Internet Is Hiding from You. New York: Penguin Press, 2011.
[4] T. Mikolov, Q. V. Le, and I. Sutskever, “Exploiting similarities among languages for machine translation,” CoRR, vol. abs/1309.4168, 2013. [Online]. Available: http://arxiv.org/abs/1309.4168
[5] P. Lops, M. de Gemmis, and G. Semeraro, “Content-based recommender systems: State of the art and trends,” in Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds. Springer, 2011, pp. 73–105. [Online]. Available: http://dx.doi.org/10.1007/978-0-387-85820-3_3
[6] J. Manotumruksa, C. MacDonald, and I. Ounis, “Modelling user preferences using word embeddings for context-aware venue recommendation,” CoRR, vol. abs/1606.07828, 2016. [Online]. Available: http://arxiv.org/abs/1606.07828
[7] M. G. Ozsoy, “From word embeddings to item recommendation,” CoRR, vol. abs/1601.01356, 2016. [Online]. Available: http://arxiv.org/abs/1601.01356
[8] P. Cremonesi, P. Garza, E. Quintarelli, and R. Turrin, “Top-N recommendations on unpopular items with contextual knowledge,” in Proceedings of the 3rd International Workshop on Context-Aware Recommender Systems, CARS-2011, October 23, 2011, Chicago, Illinois, USA, 2011, 5 pages. [Online]. Available: http://ceur-ws.org/Vol-791/paper1.pdf