Download Movie Rating Prediction System

Movie Rating Prediction System Group members: Shu Zhang and Yue Xu Abstract In this paper, we build a movie rating prediction system on selected training sets provided by MovieLens. We applied two different dimensionality reduction algorithms: K-means and Stochastic Gradient Descent. It can be used to predict the rating of a user based on an unrated movie. RMSE (Root-Mean-Squared-Error) is applied as the evaluating criteria to analyze the performance of these two algorithms. Also, different parameters of the Stochastic Gradient Descent method are applied to analyze the effects of each parameter on the rating results. Finally, the performance of these two different algorithms are compared and analyzed. Introduction Movie is becoming a more and more important entertainment in people’s life. There are many choice of movies, and what would a person may like is an interesting topic. In this project, we are building a movie rating predicting system. That is, when a certain user has reviewed some movies, this system would predict ratings of other movies that he/she has not reviewed. The result may be used for recommendation on some video website. We use a 100k MovieLens dataset collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 1997 to April 1998. The data includes 100,000 ratings among 943 users and 1682 movies. 90% of the data is considered as the training data to build this movie rating prediction model using the dimensionality reduction algorithm. The other 10% dataset is used as test data set to evaluate the model using RMSE criteria. This dataset records different users’ ratings for different movies (rating score from 1 to 5). This data set consists of: • 100,000 ratings (1 - 5) from 943 users on 1682 movies (90% would be 90000 ratings) • The genres of the 1682 movies 1 The structure of these matrices are shown below: Denoted as R Denoted as M The problem of this paper here comes, as how to predict a particular user’s rating on a movie that he/she may has not seen yet. This paper aims to find out these missing ratings so that it can be used to recommend personalized movies to different users. Method Data Pre-processing and Problem Formalized As mentioned before, the whole data has 100,000 ratings, which is split into a 90% training set and 10% test set. The training and testing data set are pre-processed as follows: Users are assigned as user ID from 1 to 943, while movies are assigned to have a item ID be 1 to 1682. So the data can be transformed, as a 1682 * 943 matrix with each corresponding rating is the value in the matrix, which is a very sparse matrix with lots of missing ratings. 𝑅!,! = 𝑟!,! , 𝑒𝑥𝑖𝑠𝑡𝑖𝑛𝑔 𝑟𝑎𝑡𝑖𝑛𝑔𝑠 0, 𝑚𝑖𝑠𝑠𝑖𝑛𝑔 𝑟𝑎𝑡𝑖𝑛𝑔𝑠 Thus our work is to fill all the zero-entries in the matrix based on the training set data. After all the zero-entries are filled out, the widely used RMSE (Root-Mean-Squared-Error) criteria are used on the test data set to evaluate the performance of our training algorithms. To be clear, the RMSE is calculated using the test data set by comparing the real rating and our predicted rating using the following formula: 2 ! RMSE = ! !!!(𝑌! ! - 𝑌! )! K-means clustering Algorithm K-means clustering algorithm can be applied to predict the missing ratings in the matrix A in the following steps. 1. We use k-means clustering to group the movies based on their genre affiliation. All the movies and their corresponding genre affiliation are denoted in M as mentioned before. Based on this matrix, K-means is applied to cluster the movies. As a decision criterion, the optimal cluster number is determined by minimizing the SSE (Sum Squared Error) as the following formula: ! ||𝑋! − 𝜇! ||! argmin ! !!! !"∈! After clustering, a matrix, denoted as D, is obtained to show the corresponding cluster where each movie belongs. 2. This step and next step aim to calculate average rating of each user for each cluster, which can be achieved as follows. First, based on matrix R and D, we can obtain a matrix shown below: Ø 1st column: user ID Ø 2nd column: movie ID Ø 3rd column: rating Ø 4th column: cluster number 3. After that, we group the rows of this matrix based on user ID, then we can calculate the user’s average rating for each cluster. The rows are user ID and the columns are cluster numbers. Entries are corresponding average ratings. 3 4. Once the average rating of each user for each cluster is generated, the rating matrix R can be filled out as follows. For a particular user with user ID u and a movie with movie ID m, user u has not watched movie m before, the predicted rating Rum is the average rating of u on the cluster where movie m belongs. 5. The final step is to apply the RMSE formula mentioned before to calculate the RMSE to evaluate the result. Stochastic Gradient Descent Algorithm Stochastic Gradient Descent is an extension of the basic SVD algorithm. SVD is known to give the minimum SSE (Sum of Squared Errors). SSE and RMSE are monotonically related: 1 𝑆𝑆𝐸 𝑐 So SVD is also minimizing RMSE. But back to our problem, SVD cannot be defined as 𝑅𝑀𝑆𝐸 = matrix R has lots of missing ratings. The aim is to minimize SSE for the unknown test ratings. So the goal can be described as to find P and Q such that Where R = Q * PT Ø Where rxi is the ith entry of the xth row in the matrix rating Ø Ø Ø qi the ith row of matrix Q px the xth column of matrix P k is the number of columns in matrix Q The first idea is to apply SVD to get P and Q based on the training matrix R. This will minimize the SSE of the training data set. But the model trained based on minimizing the training error will be over-fitted and not to generalize well to the test data. R = U∑VT = Q * PT , so that Q = U; PT = ∑VT To solve the over-fitting problem, intuitively, we should not use a large number of k (k is the number of columns of matrix Q) as the training matrix R has lots of missing ratings. So we need to introduce regularization to the problem to consider the number of k. The formula is shown below, it can be seen that the number of k should shrink sharply when 4 lots of data is missing in the training data matrix. 𝜆 is the regularization parameter and μ is the learning rate. 𝑚𝑖𝑛 (𝑟!" − 𝑞! 𝑝! )2 + μ𝜆( ||𝑞! ||! + ||𝑝! ||! ) In order to achieve minimum objective, Stochastic Gradient Descent method is applied: 1. Initialize P and Q by apply the SVD of rating matrix (with missing values) R = U∑VT = Q * PT Q=U PT = ∑VT 2. Then iterate over the training rating matrix R and update all P and Q 𝜀!" = 2 * (rxi – qi*px) qi = qi + µ (𝜀!" *px –𝜆*qi) pi = px + µ (𝜀!" * qi –𝜆* px) The learning rate needs to be chosen carefully in order to get an optimal solution for this problem. In order to choose a good learning rate, we tried the values from 0.1 to 0.01, and calculate the SSE based on the training data for each learning rate. After this, we find the optimal value for learning rate and then the SGD model is applied based on the training data and get the P, Q to reconstruct the matrix R and calculate the RMSE using the method described above. Finally, another interesting thing is to apply different 𝜆 𝑎𝑛𝑑 𝑘 values to calculate both the training and test data set SSE. By examining the errors, it can be shown clearly that introducing regularization into the problem decrease over-fitting issue, which is also in consistent with our intuition mentioned before. Results K-means clustering Algorithm 3 clusters are the best. RMSE is 1.15. Stochastic Gradient Descent Algorithm To find the optimal learning rate μ, we tried values from μ= 0.1 to μ= 0.01 and calculate the training set SSE for each learning rate μ. It is found that when μ = 0.03, 5 the training error has a minimum. After we get the optimal learning rate, we can apply the stochastic gradient descent methods mentioned above to get the matrix P and Q and reconstruct the matrix R. After that, we calculate RMSE is 0.95 after 40 iterations. Finally the effect of regularization on over-fitting is also analyzed by applying different values of 𝜆 𝑎𝑛𝑑 𝑘. From the above graph, when 𝜆 = 0, which means no regularization is introduced, it is clear that although training set error decrease a lot with number of k increases, the test set error will increase dramatically when we have larger number of k. But if we use 𝜆 = 0.2, the training set error and test set error will both decrease if we have larger number of k. The comparison proves that if the regularization solves the over-fitting problem. 6 Comparison Clearly, the stochastic gradient descent method gave a better value of RMSE, which can be analyzed in the following two aspects. First of all, k-means clustering only considered the movie genre to cluster all the movies. Each user has the same rating on all the movies belonging to the same cluster. However, the Stochastic Gradient Descent method took the relationship of both user and movie into consideration. Additionally, comparing to number of clusters in k-means clustering method, in the Stochastic Gradient Descent method, the parameter k, which is the number of columns of matrix Q, denoted the number of factors considered for each movie. Therefore, in comparison with the k-means cluster with only 3 clusters, the SGD method considered 20 factors for each movie. Conclusions To conclude, this project has successfully computed the missing values for a very sparse matrix by dimensional reduction algorithms. K-means clustering and stochastic gradient descent algorithm were applied, and the second one got the better result. Results of this work can be used to recommend movies to users based on predicting user’s rating on a particular movie. In future, the SGD (Stochastic Gradient Descent) can be explored more by applying different parameters and optimize the results. 7

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Movie Rating Prediction System