Introduction to Recommender Systems
Guo, Guangming
2012-12-19 Lab of Semantic Computing and Data Mining

Outline
• Background & Definition
  – Related areas
  – Challenges
  – Paradigms
• Some history worth noting
  – Netflix Prize
• Various applications
• Mainstream approaches
  – Content-based
  – Collaborative filtering
• Evaluation
• Some resources

Getting clear on basic concepts
• The first step of learning
• The building blocks of new ideas
• They define the rules we play by
• A prerequisite for communication

Definition of Recommender Systems
• Also called recommendation systems
• A subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches). --http://en.wikipedia.org/wiki/Recommender

More facts
• An important vertical technique in data mining
• One of the most successful data-mining solutions in industry
• Became an independent research area in the 1990s
  – Many highly reputed conferences, such as SIGIR, KDD, ICML, WWW and EMNLP, include it among their topics
  – The RecSys conference is devoted entirely to this area
• Two data-mining/machine-learning approaches:
  – 1) specify heuristics that define the utility function, then validate their performance empirically
  – 2) estimate the utility function that optimizes a performance criterion, such as the mean squared error

Challenges
• Cold start
• Long tail
• Data sparsity
• Scalability
• Social & temporal information
• Context-aware recommendation
• Personality-aware recommendation
• Accuracy alone is not enough

Related Research Areas
• Cognitive science
• Text mining
• Natural language processing
• Information retrieval
• Machine learning
• Association mining
• Approximation theory
• Management science
• Consumer choice in marketing

Paradigms of Recommender Systems
• Content-based recommendations:
  – recommend items similar to the ones the user preferred in the past
• Collaborative recommendations:
  – recommend items that people with similar tastes and preferences liked in the past
• Knowledge-based recommendations:
  – recommend items based on existing knowledge models that fit the user's needs
• Hybrid approaches:
  – combine different kinds of input data and/or different recommendation mechanisms

Background
• A universal problem of the information age
  – Information overload
  – From search engines to recommender systems
  – Pull vs. push
  – Web 1.0 vs. Web 2.0
• Leverage existing user-generated data
  – User profiles
  – Behavior history on the web, ratings
  – Click-through data, browsing data
• Great benefits (win-win)
  – Help users find valuable information
  – Help businesses make more profit

A peak in the history
• Research on collaborative filtering algorithms peaked during the Netflix movie recommendation competition
• October 2, 2006 to September 21, 2009
• Metric: RMSE
  – An entry had to outperform the baseline by 10%

The Million Dollar Programming Prize
• The Netflix Prize
  – Greatly energized research on recommender systems
  – Lasted from 2006 to 2009
• Winner: BellKor's Pragmatic Chaos
  – A joint team
  – Andreas Töscher and Michael Jahrer (Commendo Research & Consulting GmbH), originally team BigChaos
  – Robert Bell and Chris Volinsky (AT&T), Yehuda Koren (Yahoo), originally team BellKor
  – Martin Piotte and Martin Chabbert, originally team Pragmatic Theory
• The Ensemble team
  – The most accurate algorithm in 2007 was an ensemble of 107 different algorithmic approaches

Existing applications
• News/article recommendation
• Targeted advertising
• Tag recommendation
• Mobile recommendation
• E-commerce
  – Books, movies, music…

Benefits
• An alternative to search engines
• Boost profits
  – Amazon et al.
• Better user experience

Content-based
• Compute similarity directly
  – Cosine similarity or Pearson correlation coefficient
  – TF-IDF
• Use dimensionality reduction
  – LDA

Collaborative filtering
• Association mining
• Memory-based
  – Nearest neighbors
• Model-based
  – Latent factor models
• Points of comparison
  – Space & time cost
  – Theoretical foundation and interpretability

Latent factor model
• Related models: LSI, pLSA, LDA, latent class models, topic models, etc.
• A method based on matrix factorization/decomposition:
  $R' = P^T Q$, i.e. $r'_{ui} = \sum_f p_{uf} \, q_{if}$
  where $R$ is the rating matrix and $P$ and $Q$ are the user-factor and item-factor matrices obtained by dimension reduction; $R'$ is a low-rank approximation of $R$.

Computations
• Traditional SVD
  – Requires a simple method to fill in the missing entries (complete the matrix) first
  – Computing the SVD of the completed dense matrix is very expensive
• The situation changed in 2006 with the Netflix Prize
  – Simon Funk
  – Defined a cost function on the observed training ratings only:
    $C(p,q) = \sum_{(u,i) \in \mathrm{train}} \left( r_{ui} - r'_{ui} \right)^2$
  – To avoid overfitting, add a regularization term:
    $C(p,q) = \sum_{(u,i) \in \mathrm{train}} \left[ \left( r_{ui} - r'_{ui} \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 \right) \right]$
  – Minimize $C(p,q)$ by gradient descent

Evaluation Criteria
• User satisfaction, measured by questionnaire
• Accuracy
  – RMSE
  – Top-k
• Coverage
• Diversity
• Novelty
• Serendipity
  – the user at first thinks the recommendation makes no sense, yet finds it valuable
• …
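The Content-based slide names TF-IDF and cosine similarity. The sketch below, under stated assumptions, shows one way to combine them: each item is a short text description (the `docs` data, the function names and the "sum of liked-item vectors" profile are all hypothetical choices for illustration), items are turned into sparse TF-IDF vectors, and unseen items are ranked by cosine similarity to the user's profile.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each item id to a sparse TF-IDF vector {term: weight}."""
    n = len(docs)
    # Document frequency: in how many item descriptions each term occurs.
    df = Counter(t for text in docs.values() for t in set(text.split()))
    vectors = {}
    for item, text in docs.items():
        tf = Counter(text.split())
        total = sum(tf.values())
        # Length-normalized term frequency times inverse document frequency.
        vectors[item] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_content_based(docs, liked, k=2):
    """Rank unseen items by cosine similarity to a profile built from liked items."""
    vecs = tfidf_vectors(docs)
    # Hypothetical user profile: the sum of the TF-IDF vectors of liked items.
    profile = {}
    for item in liked:
        for t, w in vecs[item].items():
            profile[t] = profile.get(t, 0.0) + w
    scores = {i: cosine(profile, v) for i, v in vecs.items() if i not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy catalog (hypothetical data).
docs = {
    "m1": "space alien war ship",
    "m2": "alien space invasion",
    "m3": "romantic comedy wedding",
    "m4": "wedding drama family",
}
print(recommend_content_based(docs, liked=["m1"], k=1))  # ['m2']
```

Pearson correlation could replace `cosine` here; it is the same idea applied to mean-centered vectors.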
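For the memory-based (nearest-neighbor) branch of collaborative filtering, a minimal user-based sketch follows. It assumes ratings stored as `{user: {item: rating}}`, Pearson correlation over co-rated items as the user similarity, and a mean-centered weighted average of the top-k positively correlated neighbors as the prediction rule. These are common textbook choices, not necessarily the ones the slides had in mind, and the names and data are hypothetical.

```python
import math

def pearson(ra, rb):
    """Pearson correlation of two users' ratings over co-rated items (0.0 if undefined)."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def mean(r):
    return sum(r.values()) / len(r)

def predict_user_knn(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k most similar users who rated item."""
    candidates = [(pearson(ratings[user], r), v)
                  for v, r in ratings.items() if v != user and item in r]
    candidates.sort(reverse=True)
    # Keep only positively correlated neighbors among the top k.
    neighbors = [(s, v) for s, v in candidates[:k] if s > 0]
    if not neighbors:
        return mean(ratings[user])  # fall back to the user's average rating
    num = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in neighbors)
    den = sum(s for s, _ in neighbors)
    return mean(ratings[user]) + num / den

# Toy rating data (hypothetical).
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 4},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
print(round(predict_user_knn(ratings, "alice", "d"), 2))  # 4.75
```

Here bob's tastes track alice's exactly while carol's run opposite, so only bob contributes, and his above-average rating of `d` lifts alice's prediction above her own mean of 4. Note the cost: each prediction scans all users, which is the "space & time" trade-off listed against model-based methods.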
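The Latent factor model and Computations slides define $r'_{ui} = \sum_f p_{uf} q_{if}$ and a regularized cost $C(p,q)$ over observed training ratings only. The sketch below minimizes that cost with stochastic (per-rating) gradient descent, the variant commonly associated with Funk's approach. The hyperparameters (`n_factors`, `lr`, `lam`, `epochs`), the random initialization and the toy data are assumptions chosen for illustration, and no bias terms are modeled.

```python
import math
import random

def predict(p, q, u, i):
    """r'_ui = sum over f of p_uf * q_if."""
    return sum(pf * qf for pf, qf in zip(p[u], q[i]))

def train_funk_svd(train, n_factors=2, lr=0.02, lam=0.02, epochs=500, seed=0):
    """Learn factors p[u], q[i] from observed (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in train})
    items = sorted({i for _, i, _ in train})
    # Small random initialization (an assumption; breaks symmetry between factors).
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(train)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in a random order each epoch
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            pu, qi = p[u], q[i]
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                # Gradient step on (r - r')^2 + lam * (|p_u|^2 + |q_i|^2);
                # the constant factor 2 is absorbed into lr.
                pu[f] += lr * (err * qif - lam * puf)
                qi[f] += lr * (err * puf - lam * qif)
    return p, q

# Toy training ratings (hypothetical); missing cells are simply absent.
train = [
    ("u1", "i1", 5), ("u1", "i2", 4), ("u1", "i3", 1),
    ("u2", "i1", 4), ("u2", "i3", 1),
    ("u3", "i2", 1), ("u3", "i3", 5),
    ("u4", "i1", 1), ("u4", "i3", 4),
]
p, q = train_funk_svd(train)
train_rmse = math.sqrt(sum((r - predict(p, q, u, i)) ** 2 for u, i, r in train) / len(train))
print(f"training RMSE: {train_rmse:.3f}")
print(f"predicted r'(u2, i2): {predict(p, q, 'u2', 'i2'):.2f}")  # an unobserved cell
```

Unlike traditional SVD, nothing here fills in the missing cells: each update touches only one observed rating, which is what makes the approach practical on sparse data.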
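Three of the evaluation criteria can be computed offline in a few lines. This sketch assumes RMSE over (true, predicted) rating pairs, precision@k as one concrete reading of "Top-k", and catalog coverage as the share of items recommended to at least one user. The function names and example values are hypothetical; questionnaire-based satisfaction, diversity, novelty and serendipity need additional data and are not shown.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def precision_at_k(recommended, relevant, k):
    """Share of the first k recommended items that are in the user's relevant set."""
    return sum(1 for i in recommended[:k] if i in relevant) / k

def coverage(rec_lists, catalog):
    """Share of the catalog that appears in at least one recommendation list."""
    shown = set().union(*rec_lists)
    return len(shown & set(catalog)) / len(catalog)

print(round(rmse([(4, 3), (2, 2), (5, 4)]), 4))                     # 0.8165
print(precision_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=3))   # 0.3333333333333333
print(coverage([["a", "b"], ["b", "c"]], ["a", "b", "c", "d", "e"]))  # 0.6
```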

葫芦项亮 (Xiang Liang, Hulu)

Resources
• www.recsyswiki.com
• A roundup of materials on the major recommendation engines (各大推荐引擎资料汇总), by 大魁
  – http://blog.csdn.net/lzt1983/article/details/7914536