
International Journal of Computer Application (2250-1797) Volume 5, No. 6, October 2015

Recommending Services using Description Similarity Based Clustering and Collaborative Filtering

Ms. Sheetal Thokal, Prof. Vrunda Bhusari
Department of Computer Engineering, JSPM's Bhivrabai Sawant Institute & Research, Wagholi, Pune, India

Abstract-The challenge for Big Data applications is to handle large volumes of data and extract useful information for future use. A growing number of services are being added to the Internet through service computing and cloud computing. As a result, service-relevant data is becoming too large and too difficult to handle effectively with traditional database management approaches such as DBMS and RDBMS. To address this challenge, this paper proposes an approach based on description-based clustering and collaborative filtering, which collects similar services into the same clusters in order to recommend services collaboratively. Technically, the approach is divided into two stages. In the first stage, the available services are divided into small-scale clusters using the agglomerative hierarchical clustering algorithm and the Nearest Neighbor algorithm. Clustering reduces the data size by grouping similar services together. In the second stage, a collaborative filtering algorithm is applied to one of the clusters, and the services it yields are presented in descending order of rating. Because the number of services in a cluster is smaller than the total number of services available on the web, this reduces the online execution time of collaborative filtering.

I. INTRODUCTION

In Big Data applications, data collection has grown tremendously, and commonly used software tools cannot capture, manage, and process the data within an acceptable time [2]. The most important challenge for Big Data applications is to handle large volumes of data and extract useful information for future use [3]. Nowadays, more and more services are being added to cloud infrastructures to provide better functionality, but service users have difficulty finding ideal services among them. Recommender systems (RSs) are intelligent applications that assist users in a decision-making process in which they want to choose some items from a set of alternative products or services. RSs face two main challenges in big data applications: 1) making decisions within an acceptable time; and 2) generating ideal recommendations from a very large number of services.

Clustering is an important task in data analysis and data mining applications. It assigns objects to groups such that objects in the same group are more closely related to each other than to objects in other groups. A cluster is a group of data objects that share similar characteristics. Cluster analysis calculates the similarities between data objects according to their characteristics and groups similar objects into clusters.

Collaborative filtering methods are based on collecting and analyzing a large amount of information on users' behaviors, activities, or preferences. They predict what users will like based on their similarity to other users, and they can recommend items accurately without any understanding of the items themselves. Collaborative filtering (CF) is of two types: item-based and user-based [4]. User-based CF assumes that people who agreed in the past will agree again in the future. Item-based CF recommends to a user the items that are similar to those the user has preferred before [5].
We propose a description-based clustering and collaborative filtering approach that is divided into two stages: clustering and collaborative filtering. Clustering is the first step and separates big data services into manageable clusters, so the number of services in a cluster is much smaller than the total number of services. The computation time of the collaborative filtering algorithm used in the second step can therefore be reduced. Because the ratings of similar services within a cluster are more relevant than those of dissimilar services, recommendation accuracy based on users' ratings may also increase.

II. RELATED WORK

Mai et al. [6] proposed a neural-network-based clustering collaborative filtering algorithm for e-commerce recommendation systems. Mittal et al. [7] proposed a recommender system framework that generates predictions for a user by first reducing the size of the item set the user needs to explore. In this system, the K-means clustering algorithm was applied to cluster movies based on the user's request. Li et al. [8] incorporated multidimensional clustering into a collaborative filtering recommendation model. In the first stage, data in the form of user and item profiles was clustered using the proposed algorithm. Then the poor clusters were deleted and the appropriate clusters were further selected by cluster pruning. In the third stage, an item prediction was made by computing a weighted average of deviations from the neighbors' mean. Zhou et al. [9] represented data providing services as vectors by considering the composite relation between input, output, and the semantic relations between them; the fuzzy C-means algorithm was used to cluster the vectors. Pham et al. [10] proposed a system that applies a network clustering technique to the users' social network to identify their neighborhoods, and then uses traditional CF algorithms to generate recommendations.
This work depends on social relationships between users. Simon et al. [11] proposed a system in which a high-dimensional, parameter-free, divisive hierarchical clustering algorithm that requires only implicit feedback is used to discover relationships among users. Based on the clustering results, products of high interest were recommended to the users.

III. PROPOSED SYSTEM

a. Preliminary Knowledge

A service can be expressed as a triple $s = (D, F, R)$, where $D$ is a set of words that describes service $s$, $F$ is the set of functionalities of $s$, and $R$ is the set of ratings that users gave to $s$. Three kinds of service similarity are computed from $D$, $F$ and $R$ during processing. Suppose $s_t = \langle D_t, F_t, R_t \rangle$ and $s_j = \langle D_j, F_j, R_j \rangle$ are two services. The similarity between $s_t$ and $s_j$ is considered in three dimensions: description similarity $D\_sim(s_t, s_j)$, rating similarity $R\_sim(s_t, s_j)$, and enhanced rating similarity $R\_sim'(s_t, s_j)$. On this basis, the proposed approach for Big Data applications is presented.

b. System Architecture

The proposed system consists of two stages, a clustering stage and a collaborative filtering stage, as shown in Fig. 1. In the first stage, services are clustered according to their description similarities. In the second stage, a collaborative filtering algorithm is applied within the cluster that contains the target service.

Fig. 1. Architecture of Proposed System

A) Clustering

1) Stem Word Finding: In this step, stem words are extracted from the service descriptions stored in the database. Different developers use different words to describe similar services, and these differences can directly influence the measurement of description similarity. Therefore, description words are normalized before use by grouping morphologically similar words together, on the assumption that they are semantically similar.
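As an illustration of the stem word finding step, the following Python sketch normalizes description words with a deliberately naive suffix stripper. The suffix list and the function names `naive_stem` and `stem_description` are assumptions made for this example; the paper does not specify which stemmer is used, and a real system would more likely rely on an established one such as the Porter stemmer.

```python
import re

# Hypothetical suffix list, chosen only for illustration.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_description(description: str) -> set[str]:
    """Return the set of stemmed words for a free-text service description."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return {naive_stem(tok) for tok in tokens}


if __name__ == "__main__":
    # Two differently worded descriptions collapse to the same stem set.
    print(stem_description("Booking hotels"))
    print(stem_description("booked hotel"))
```

With this normalization, morphological variants such as "booking" and "booked" map to the same stem, so two descriptions that differ only in word form produce identical word sets.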
2) Description Similarity Computation: The description similarity $D\_sim(s_t, s_j)$ is calculated as

$$D\_sim(s_t, s_j) = \frac{|D_t' \cap D_j'|}{|D_t' \cup D_j'|}$$

where $D_t'$ and $D_j'$ are the stemmed description word sets of $s_t$ and $s_j$. The larger $|D_t' \cap D_j'|$ is, the more similar the two services are.

3) Clustering Services: Services are clustered according to their description similarity using the agglomerative hierarchical clustering algorithm and the Nearest Neighbor algorithm, which are discussed in section c.

B) Collaborative Filtering

1) Rating Similarity Computation: If $s_t$ and $s_j$ belong to the same cluster, the rating similarity $R\_sim(s_t, s_j)$ is computed using the Pearson correlation coefficient, and the enhanced rating similarity $R\_sim'(s_t, s_j)$ is computed by weighting $R\_sim(s_t, s_j)$.

2) Select Neighbors: Neighbors of the target service are selected based on enhanced rating similarity. A rating similarity threshold is used for the selection: a service whose enhanced rating similarity to the target service is greater than the threshold is selected as a neighbor of the target service.

3) Predicted Rating Computation: For an active user, the predicted rating of $s_t$ is computed. If the predicted rating exceeds a recommendation threshold, $s_t$ is recommended to the active user. All recommendable services are ranked in descending order of predicted rating so that users can find valuable services in less time.

c. Algorithms

1. Agglomerative Hierarchical Clustering Algorithm (AHC)

Input: A set of services $S = \{s_1, \dots, s_n\}$, a description similarity matrix $D = [d_{i,j}]_{n \times n}$, and the number of required clusters $K$.
Output: $Dendrogram_k$ for $k = 1$ to $|S|$.

 1. $C_i = \{s_i\}, \forall i$;
 2. $d_{C_i,C_j} = d_{i,j}, \forall i, j$;
 3. for $k = |S|$ down to $K$
 4.     $Dendrogram_k = \{C_1, \dots, C_k\}$;
 5.     $l, m = \arg\max_{i,j} d_{C_i,C_j}$;
 6.     $C_l = Join(C_l, C_m)$;
 7.     for each $C_h \in S$
 8.         if $C_h \neq C_l$ and $C_h \neq C_m$
 9.             $d_{C_l,C_h} = Average(d_{C_l,C_h}, d_{C_m,C_h})$;
10.         end if
11.     end for
12.     $S = S - \{C_m\}$;
13. end for

The input to the above algorithm is the description similarity matrix.
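The AHC steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `jaccard` and `ahc` and the sample services are hypothetical, the similarity matrix is built from the description similarity formula, and only the final K clusters are returned rather than the full dendrogram.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Description similarity: |a ∩ b| / |a ∪ b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ahc(descriptions: dict[str, set[str]], k: int) -> list[set[str]]:
    """Merge the two most similar clusters until k clusters remain."""
    k = max(k, 1)  # assumption: at least one cluster is requested
    names = list(descriptions)
    clusters = {i: {name} for i, name in enumerate(names)}
    # d[(i, j)] is the similarity between clusters i and j, with i < j.
    d = {
        (i, j): jaccard(descriptions[names[i]], descriptions[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    while len(clusters) > k:
        l, m = max(d, key=d.get)        # most similar pair (argmax)
        clusters[l] |= clusters.pop(m)  # C_l = Join(C_l, C_m)
        for h in clusters:
            if h != l:
                pair_l = (min(l, h), max(l, h))
                pair_m = (min(m, h), max(m, h))
                # Average(d_{C_l,C_h}, d_{C_m,C_h})
                d[pair_l] = (d[pair_l] + d[pair_m]) / 2
        # S = S - {C_m}: drop every entry that involves the removed cluster.
        d = {pair: sim for pair, sim in d.items() if m not in pair}
    return list(clusters.values())


if __name__ == "__main__":
    services = {
        "a": {"book", "hotel", "room"},
        "b": {"book", "hotel", "price"},
        "c": {"weather", "forecast"},
        "d": {"weather", "forecast", "city"},
    }
    print(ahc(services, k=2))
```

Because the matrix holds similarities rather than distances, the pair to merge is the one with the largest entry, which matches the argmax in step 5.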
The AHC algorithm compares the similarity weights of services with the threshold selected by the user. According to the threshold value, services are grouped into clusters by following the AHC steps above.

2. Nearest Neighbor Algorithm (NN)

1. begin
2. Initially, each service is its own cluster.
3. Find the two clusters closest to each other, i.e. those with the shortest distance.
4. Merge these two clusters.
5. Update the dendrogram.
6. Repeat steps 3 to 5 until K = 1.
7. end

For analysis purposes, the other algorithm considered is the Nearest Neighbor algorithm, a hierarchical clustering algorithm that repeatedly merges the two closest clusters. The algorithm compares the similarity weights of services against the threshold value, and services are grouped into clusters by following the NN steps above.

IV. EXPERIMENTAL RESULTS

Fig. 2. Description Similarity Matrix
Fig. 3. Threshold selection for AHC Algorithm
Fig. 4. Clustering using AHC Algorithm
Fig. 5. Threshold selection for NN Algorithm
Fig. 6. Clustering using NN Algorithm
Fig. 7. Rating Similarity Matrix
Fig. 8. Neighbors of the services in cluster
Fig. 9. Recommended Services for Users
Fig. 10. Analysis Using Mean Absolute Error

V. ANALYSIS

To evaluate the accuracy of the proposed system, this paper uses the Mean Absolute Error (MAE), a measure of the deviation of recommendations from their true user-specified ratings. MAE is computed as follows:

$$MAE = \frac{\sum_{i=1}^{n} |r_{a,t} - P(u_a, s_t)|}{n}$$

where $n$ is the number of rating-prediction pairs, $r_{a,t}$ is the rating that an active user $u_a$ gives to a target service $s_t$, and $P(u_a, s_t)$ denotes the predicted rating of $s_t$ for $u_a$.
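As a worked illustration of the MAE definition above, the following Python sketch averages the absolute differences over a list of rating-prediction pairs. The function name `mean_absolute_error` and the sample ratings are hypothetical and are not taken from the paper's experiments.

```python
def mean_absolute_error(pairs):
    """Average |actual - predicted| over (actual, predicted) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one rating-prediction pair is required")
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical (actual rating, predicted rating) pairs.
    sample = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
    # Absolute errors are 0.5, 0.5 and 1.0, so the mean is 2.0 / 3.
    print(mean_absolute_error(sample))
```

A lower MAE means the predicted ratings lie closer to the ratings users actually gave.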
The Nearest Neighbor algorithm proves more effective than the Agglomerative Hierarchical Clustering algorithm, as it does not require a predefined number of clusters, which makes it well suited to clustering services. Because the number of services in a cluster is much smaller than the number in the whole system, this approach requires less online computation time.

Fig. 11 and Fig. 12 compare the AHC and NN algorithms in terms of Mean Absolute Error (Y-axis) for different clustering threshold values (X-axis). Fig. 11 shows the comparative analysis for recommendation threshold $\gamma = 0.1$, where the NN algorithm gives better results than the AHC algorithm. Fig. 12 shows the comparative analysis for recommendation threshold $\gamma = 0.2$, where the NN algorithm again gives better results than the AHC algorithm and is also more stable.

Fig. 11. Comparative Analysis for $\gamma = 0.1$
Fig. 12. Comparative Analysis for $\gamma = 0.2$

VI. CONCLUSION

The proposed approach for big data applications is relevant to service recommendation. Before the collaborative filtering technique is applied, services are merged into clusters using the Agglomerative Hierarchical Clustering and Nearest Neighbor algorithms. Then the rating similarities between services within the same cluster are computed. Because the number of services in a cluster is smaller than the number of services in the whole system, the proposed system requires less online computation time. Because the ratings of services in the same cluster are more relevant to each other than to those of services in other clusters, prediction based on the ratings of services in the same cluster is expected to be more accurate than prediction based on the ratings of all similar or dissimilar services in all clusters.

ACKNOWLEDGMENT

The author wishes to thank Prof. A. C. Lomte, Prof. G. M. Bhandari and Dr. T. K.
Nagaraj for valuable guidance and encouragement.

REFERENCES

[1] R. Hu, W. Dou, and J. Liu, “ClubCF: A Clustering-based Collaborative Filtering Approach for Big Data Application,” IEEE Transactions on Emerging Topics in Computing, 2014.
[2] X. Wu, X. Zhu, G. Q. Wu, et al., “Data mining with big data,” IEEE Trans. on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, January 2014.
[3] A. Rajaraman and J. D. Ullman, “Mining of massive datasets,” Cambridge University Press, 2012.
[4] A. Bellogín, I. Cantador, F. Díez, et al., “An empirical comparison of social, collaborative filtering, and hybrid recommenders,” ACM Trans. on Intelligent Systems and Technology, vol. 4, no. 1, pp. 1-37, January 2013.
[5] W. Zeng, M. S. Shang, Q. M. Zhang, et al., “Can Dissimilar Users Contribute to Accuracy and Diversity of Personalized Recommendation?,” International Journal of Modern Physics C, vol. 21, no. 10, pp. 1217-1227, June 2010.
[6] J. Mai, Y. Fan, and Y. Shen, “A Neural Networks-Based Clustering Collaborative Filtering Algorithm in E-Commerce Recommendation System,” in Proc. 2009 Int’l Conf. on Web Information Systems and Mining, pp. 616-619, June 2009.
[7] N. Mittal, R. Nayak, M. C. Govil, et al., “Recommender System Framework using Clustering and Collaborative Filtering,” in Proc. 3rd Int’l Conf. on Emerging Trends in Engineering and Technology, pp. 555-558, November 2010.
[8] X. Li and T. Murata, “Using Multidimensional Clustering Based Collaborative Filtering Approach Improving Recommendation Diversity,” in Proc. 2012 IEEE/WIC/ACM Int’l Joint Conf. on Web Intelligence and Intelligent Agent Technology, pp. 169-174, December 2012.
[9] Z. Zhou, M. Sellami, W. Gaaloul, et al., “Data Providing Services Clustering and Management for Facilitating Service Discovery and Replacement,” IEEE Trans. on Automation Science and Engineering, vol. 10, no. 4, pp. 1-16, October 2013.
[10] M. C. Pham, Y. Cao, R. Klamma, et al., “A Clustering Approach for Collaborative Filtering Recommendation Using Social Network Analysis,” Journal of Universal Computer Science, vol. 17, no. 4, pp. 583-604, April 2011.
[11] R. D. Simon, X. Tengke, and W. Shengrui, “Combining collaborative filtering and clustering for implicit recommender system,” in Proc. 2013 IEEE 27th Int’l Conf. on Advanced Information Networking and Applications, pp. 748-755, March 2013.
[12] X. Liu, G. Huang, and H. Mei, “Discovering homogeneous web service community in the user-centric web environment,” IEEE Trans. on Services Computing, vol. 2, no. 2, pp. 167-181, April-June 2009.
[13] H. H. Li, X. Y. Du, and X. Tian, “A review-based reputation evaluation approach for Web services,” Journal of Computer Science and Technology.
[14] G. Thilagavathi, D. Srivaishnavi, N. Aparna, et al., “A Survey on Efficient Hierarchical Algorithm used in Clustering,” International Journal of Engineering, vol. 2, no. 9, September 2013.
[15] A. Yamashita, H. Kawamura, and K. Suzuki, “Adaptive Fusion Method for User-based and Item-based Collaborative Filtering,” Advances in Complex Systems, vol. 14, no. 2, pp. 133-149, May 2011.
