International Journal of Computer Application (2250-1797)
Volume 5– No. 6, October 2015
Recommending Services using Description Similarity Based
Clustering and Collaborative Filtering
Ms. Sheetal Thokal
Department of Computer Engineering, JSPM’s Bhivrabai Sawant Institute & Research, Wagholi, Pune, India
Prof. Vrunda Bhusari
Department of Computer Engineering, JSPM’s Bhivrabai Sawant Institute & Research, Wagholi, Pune, India
[email protected]@gmail.com
Abstract- The challenge for Big Data applications is to handle large volumes of data and extract useful information for future use. A growing number of services are being added to the Internet through service computing and cloud computing, so service-relevant data is becoming too large to handle effectively with traditional database management approaches such as DBMS and RDBMS. To address this challenge, this paper proposes an approach based on description-based clustering and collaborative filtering, which collects similar services into the same cluster and recommends services collaboratively. Technically, the approach is divided into two stages. In the first stage, the available services are divided into small-scale clusters using an agglomerative hierarchical clustering algorithm and a Nearest Neighbor algorithm. Clustering reduces the data size by grouping similar services together. In the second stage, a collaborative filtering algorithm is applied to one of the clusters, and the services are ranked in descending order of rating. This reduces the online execution time of collaborative filtering, because the number of services in a cluster is smaller than the total number of services available on the web.
I. INTRODUCTION
In Big Data applications, data collection has grown tremendously, and commonly used software tools are unable to capture, manage, and process the data within an acceptable time [2]. The most important challenge for Big Data applications is to handle large volumes of data and extract useful information for future use [3]. Nowadays, more and more services are added to cloud infrastructures to provide better functionality, but service users have difficulty finding ideal services among them. Recommender systems (RSs) are intelligent applications that assist users in a decision-making process in which they want to choose some items from a set of alternative products or services. RSs must handle two main challenges in Big Data applications: 1) making decisions within an acceptable time; and 2) generating ideal recommendations from so many services.
Clustering is an important task in data analysis and data mining applications. It assigns objects to groups such that the objects in the same group are more related to each other than to the objects in other groups. A cluster is a group of data objects that share similar characteristics. Cluster analysis computes similarities between data objects according to their characteristics and groups similar objects into clusters. Collaborative filtering methods are based on collecting and analyzing a large amount of information on users' behaviors, activities, or preferences. They predict what users will like based on their similarity to other users, and they can recommend items accurately without any understanding of the items themselves. Collaborative filtering (CF) is of two types: item-based and user-based methods [4]. User-based CF assumes that people who agreed in the past will agree again in the future. Item-based CF recommends to a user the items that are similar to what the user has preferred before [5].
We propose a description-based clustering and collaborative filtering approach that is divided into two stages: clustering and collaborative filtering. Clustering is the first step and separates big data services into manageable clusters, so the number of services in a cluster is much smaller than the total number of services. The computation time of the collaborative filtering algorithm used in the second step can therefore be reduced. Because the ratings of similar services within a cluster are more relevant than those of dissimilar services, recommendation accuracy based on users' ratings may also increase.
II. RELATED WORK
Mai et al. [6] proposed a neural networks-based clustering collaborative filtering algorithm for e-commerce recommendation systems. Mittal et al. [7] proposed a recommender system framework that generates predictions for a user by first reducing the size of the item set the user needs to explore. In this system, the K-means clustering algorithm was applied to cluster movies based on the user's request. Li et al. [8] incorporated multidimensional clustering into a collaborative filtering recommendation model. In the first stage, data in the form of user and item profiles was clustered using the proposed algorithm. Then the poor clusters were deleted and the appropriate clusters were further selected through cluster pruning. In the third stage, an item prediction was made by computing a weighted average of deviations from the neighbors' mean. Zhou et al. [9] represented data providing services as vectors by considering the composite relation between input, output, and the semantic relations between them; the fuzzy C-means algorithm was used to cluster the vectors. Pham et al. [10] proposed a system that applies a network clustering technique to the users' social network to identify their neighborhoods and then uses traditional CF algorithms to generate recommendations. This work depends on social relationships between users. Simon et al. [11] proposed a system that uses a high-dimensional, parameter-free, divisive hierarchical clustering algorithm requiring only implicit feedback to discover relationships among users. Based on the clustering results, products of high interest were recommended to the users.
III. PROPOSED SYSTEM
a. Preliminary Knowledge
A service can be expressed as a triple $s = (D, F, R)$, where $D$ is a set of words that describes service $s$, $F$ is a set of functionalities of service $s$, and $R$ is a set of ratings that users gave to service $s$. Three kinds of service similarity are computed from $D$, $F$, and $R$ while the system runs. Suppose $s_t = \langle D_t, F_t, R_t \rangle$ and $s_j = \langle D_j, F_j, R_j \rangle$ are two services. The similarity between $s_t$ and $s_j$ is considered in three dimensions: description similarity $D\_sim(s_t, s_j)$, rating similarity $R\_sim(s_t, s_j)$, and enhanced rating similarity $R\_sim'(s_t, s_j)$. The proposed approach for Big Data applications is presented on this basis.
b. System Architecture
Technically, the proposed system consists of two stages, a clustering stage and a collaborative filtering stage, as shown in Fig. 1. In the first stage, services are clustered according to their description similarities. In the second stage, a collaborative filtering algorithm is applied within the cluster that contains the target service.
Fig. 1. Architecture of Proposed System
A) Clustering
1) Stem Word Finding: In this step, stem words are extracted from the service descriptions stored in the database. Different developers use different words to describe similar services, and these differences directly influence the measurement of description similarity. Description words are therefore normalized before use by grouping morphologically similar words together, on the assumption that they are semantically similar.
2) Description Similarity Computation: Description similarity $D\_sim(s_t, s_j)$ is calculated using the formula
$$D\_sim(s_t, s_j) = \frac{|D_t' \cap D_j'|}{|D_t' \cup D_j'|}$$
where $D_t'$ and $D_j'$ are the stemmed description word sets of $s_t$ and $s_j$. The larger the overlap $|D_t' \cap D_j'|$ relative to the union, the more similar the two services are.
3) Clustering Services: Services are clustered according to their description similarity using the agglomerative hierarchical clustering algorithm and the Nearest Neighbor algorithm, which are discussed in section c.
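The stemming and description similarity steps can be illustrated with a minimal Python sketch. The `crude_stem` helper is a hypothetical stand-in for a real stemming algorithm (the paper does not name one), and the function names are illustrative assumptions rather than part of the proposed system.

```python
from __future__ import annotations


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    word = word.lower()
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def description_similarity(d_t: set[str], d_j: set[str]) -> float:
    """Jaccard similarity of the stemmed description word sets."""
    stems_t = {crude_stem(w) for w in d_t}
    stems_j = {crude_stem(w) for w in d_j}
    union = stems_t | stems_j
    if not union:
        return 0.0  # two empty descriptions share nothing measurable
    return len(stems_t & stems_j) / len(union)


# "maps"/"map" and "location"/"locations" collapse to the same stem,
# while this crude stemmer still treats "routing" and "route" as different.
similarity = description_similarity(
    {"maps", "routing", "location"}, {"map", "route", "locations"}
)
```

With these inputs the two stemmed sets share two of four distinct stems, so the similarity is 0.5. A proper stemmer would likely also merge "routing" and "route" and raise the score.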
B) Collaborative Filtering
1) Rating Similarity Computation: If $s_t$ and $s_j$ belong to the same cluster, the rating similarity $R\_sim(s_t, s_j)$ is computed using the Pearson correlation coefficient, and the enhanced rating similarity $R\_sim'(s_t, s_j)$ is obtained by weighting $R\_sim(s_t, s_j)$.
2) Select Neighbors: Neighbors of the target service are selected according to enhanced rating similarity, using a rating similarity threshold. A service whose enhanced rating similarity to the target service is greater than the threshold is selected as a neighbor of the target service.
3) Predicted Rating Computation: The predicted rating of $s_t$ is computed for an active user. If the predicted rating exceeds a recommending threshold, $s_t$ is recommended to the active user. All recommendable services are ranked in descending order of predicted rating so that users can find valuable services in less time.
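The three collaborative filtering steps can be sketched in Python as follows. The paper does not state how the rating similarity is weighted or how the predicted rating is formed, so this sketch assumes a co-rater overlap weight and a mean-centered weighted average over the selected neighbors. Both are common choices, not the authors' confirmed formulas. The data layout (a dict mapping each service in a cluster to its user ratings) and all function names are likewise assumptions.

```python
from math import sqrt


def pearson(r_t: dict, r_j: dict) -> float:
    """Rating similarity: Pearson correlation over users who rated both services."""
    common = set(r_t) & set(r_j)
    if len(common) < 2:
        return 0.0
    mean_t = sum(r_t[u] for u in common) / len(common)
    mean_j = sum(r_j[u] for u in common) / len(common)
    num = sum((r_t[u] - mean_t) * (r_j[u] - mean_j) for u in common)
    den = sqrt(sum((r_t[u] - mean_t) ** 2 for u in common)) * sqrt(
        sum((r_j[u] - mean_j) ** 2 for u in common)
    )
    return num / den if den else 0.0


def enhanced(r_t: dict, r_j: dict) -> float:
    """Enhanced rating similarity: Pearson scaled by co-rater overlap (assumed weight)."""
    total = len(r_t) + len(r_j)
    if total == 0:
        return 0.0
    return 2 * len(set(r_t) & set(r_j)) / total * pearson(r_t, r_j)


def predict(user, target, cluster_ratings: dict, sim_threshold: float) -> float:
    """Predicted rating of `target` for `user`; sim_threshold should be >= 0."""
    r_t = cluster_ratings[target]
    mean_t = sum(r_t.values()) / len(r_t) if r_t else 0.0
    num = den = 0.0
    for name, r_j in cluster_ratings.items():
        if name == target or user not in r_j:
            continue
        sim = enhanced(r_t, r_j)
        if sim > sim_threshold:  # neighbor selection step
            mean_j = sum(r_j.values()) / len(r_j)
            num += (r_j[user] - mean_j) * sim
            den += sim
    return mean_t + num / den if den else mean_t


cluster = {
    "s1": {"u1": 4, "u2": 2, "u3": 5},
    "s2": {"u1": 5, "u2": 1, "u3": 4, "u4": 4},
    "s3": {"u1": 1, "u2": 5, "u4": 2},
}
predicted = predict("u4", "s1", cluster, sim_threshold=0.0)
```

In this toy cluster only `s2` passes the similarity threshold for target `s1`, because `s3` is negatively correlated with it. The predicted value would then be compared against the recommending threshold, and the services that pass would be sorted by predicted rating.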
c. Algorithms
1. Agglomerative Hierarchical Clustering Algorithm (AHC)
Input: A set of services S = {s_1, …, s_n}, a description similarity matrix D = [d_{i,j}]_{n×n}, the number of required clusters K.
Output: Dendrogram_k for k = 1 to |S|.
1. C_i = {s_i}, ∀i;
2. d_{C_i,C_j} = d_{i,j}, ∀i,j;
3. for k = |S| down to K
4.     Dendrogram_k = {C_1, …, C_k};
5.     l, m = argmax_{i,j} d_{C_i,C_j};
6.     C_l = Join(C_l, C_m);
7.     for each C_h ∈ S
8.         if C_h ≠ C_l and C_h ≠ C_m
9.             d_{C_l,C_h} = Average(d_{C_l,C_h}, d_{C_m,C_h});
10.        end if
11.    end for
12.    S = S − {C_m};
13. end for
The input to the above algorithm is the description similarity matrix. The algorithm compares the similarity values of services with the threshold selected by the user, and the services are grouped into clusters accordingly by following the AHC steps above.
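The AHC steps can be turned into a short runnable Python sketch. It follows the pseudocode's merge rule (join the most similar pair, then average the two old similarities to every other cluster) but stops as soon as K clusters remain and returns only the final partition rather than every dendrogram level. The function name `ahc` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def ahc(sim, k):
    """Agglomerative clustering on a symmetric similarity matrix.

    sim: n x n list of lists of description similarities.
    k:   number of clusters to stop at.
    Returns the clusters as sorted lists of service indices.
    """
    n = len(sim)
    clusters = {i: [i] for i in range(n)}  # C_i = {s_i}
    d = {(i, j): sim[i][j] for i in range(n) for j in range(n) if i != j}
    while len(clusters) > k:
        # l, m = argmax over all pairs of live clusters
        l, m = max(
            ((a, b) for a in clusters for b in clusters if a < b),
            key=lambda pair: d[pair],
        )
        clusters[l] = clusters[l] + clusters[m]  # C_l = Join(C_l, C_m)
        del clusters[m]                          # S = S - {C_m}
        for h in clusters:
            if h != l:
                avg = (d[(l, h)] + d[(m, h)]) / 2  # Average(d_lh, d_mh)
                d[(l, h)] = d[(h, l)] = avg
    return sorted(sorted(c) for c in clusters.values())


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
two_clusters = ahc(sim, 2)
```

For this toy matrix, services 0 and 1 merge first (similarity 0.9), then services 2 and 3 (similarity 0.8), giving the two clusters `[0, 1]` and `[2, 3]`.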
2. Nearest Neighbor Algorithm (NN)
1. begin
2. Initially, each service is its own cluster.
3. Find the two clusters closest to each other (shortest distance).
4. Merge those two clusters.
5. Update the dendrogram.
6. Repeat steps 3 to 5 until the number of clusters K = 1.
7. end
For analysis purposes, the second algorithm considered is the Nearest Neighbor algorithm, a hierarchical clustering algorithm. The algorithm compares the similarity values of services with the threshold value, and the services are grouped into clusters accordingly by following the NN steps above.
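A threshold-driven version of the NN steps might look like the following Python sketch. It assumes that the closeness of two clusters is the best similarity between any pair of their members (single linkage) and that merging stops once no pair of clusters reaches the threshold. The paper does not spell out either detail, so both are assumptions, as is the function name `nn_cluster`.

```python
def nn_cluster(sim, threshold):
    """Merge the two closest clusters until no pair is similar enough.

    sim:       symmetric n x n list of lists of similarities in [0, 1].
    threshold: minimum similarity required to merge two clusters.
    """
    clusters = [[i] for i in range(len(sim))]  # each service starts alone
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the clusters.
                link = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        if best < threshold:
            break  # no remaining pair is close enough to merge
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters_at_half = nn_cluster(sim, 0.5)
```

Because the stopping rule depends only on the threshold, the number of clusters does not have to be fixed in advance. With a threshold of 0.5 the toy matrix yields `[0, 1]` and `[2, 3]`, while a threshold of 0 merges everything into one cluster.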
IV. EXPERIMENTAL RESULTS
Fig. 2. Description Similarity Matrix
Fig. 3. Threshold selection for AHC Algorithm
Fig. 4. Clustering using AHC Algorithm
Fig. 5. Threshold selection for NN Algorithm
Fig. 6. Clustering using NN Algorithm
Fig. 7. Rating Similarity Matrix
Fig. 8. Neighbors of the services in cluster
Fig. 9. Recommended Services for Users
Fig. 10. Analysis Using Mean Absolute Error
V. ANALYSIS
To evaluate the accuracy of the proposed system, this paper uses Mean Absolute Error (MAE), a measure of the deviation of recommendations from their true user-specified ratings.
MAE is computed as follows:
$$MAE = \frac{\sum_{i=1}^{n} \left| r_{a,t} - P(u_a, s_t) \right|}{n}$$
where $n$ is the number of rating-prediction pairs, $r_{a,t}$ is the rating that an active user $u_a$ gives to a target service $s_t$, and $P(u_a, s_t)$ denotes the predicted rating of $s_t$ for $u_a$.
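As a small worked example of the MAE computation, the following Python sketch averages the absolute differences over a list of (actual rating, predicted rating) pairs. The function name and the sample values are illustrative assumptions, not data from the experiments.

```python
def mean_absolute_error(pairs):
    """Average absolute gap between actual and predicted ratings.

    pairs: iterable of (actual_rating, predicted_rating) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one rating-prediction pair")
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


# Absolute errors are 0.5, 1.0 and 0.0, so the mean is 1.5 / 3.
mae = mean_absolute_error([(4, 3.5), (2, 3.0), (5, 5.0)])
```

A lower value means the predicted ratings sit closer to the ratings users actually gave.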
The Nearest Neighbor algorithm proves more effective than the agglomerative hierarchical algorithm because it does not require a predefined number of clusters, and it is therefore well suited to clustering services. As the number of services in a cluster is much smaller than the number in the whole system, this approach requires less online computation time. Fig. 11 and Fig. 12 compare the AHC and NN algorithms in terms of Mean Absolute Error (Y-axis) for different clustering threshold values (X-axis). Fig. 11 shows the comparative analysis for recommendation threshold value 𝛾 = 0.1, where the NN algorithm gives better results than the AHC algorithm. Fig. 12 shows the comparative analysis for recommendation threshold value 𝛾 = 0.2, where the NN algorithm again gives better results than the AHC algorithm and is also more stable.
Fig. 11. Comparative Analysis for 𝛾 = 0.1
Fig. 12. Comparative Analysis for 𝛾 = 0.2
VI. CONCLUSION
The proposed approach for Big Data applications addresses service recommendation. Before the collaborative filtering technique is applied, services are merged into clusters using the agglomerative hierarchical clustering and Nearest Neighbor algorithms. The rating similarities between services within the same cluster are then computed. As the number of services in a cluster is smaller than the number of services in the whole system, the proposed system requires less online computation time. Because the ratings of services in the same cluster are more relevant to each other than to those of services in other clusters, predictions based on the ratings of services in the same cluster are expected to be more accurate than predictions based on the ratings of all services, similar or dissimilar, across all clusters.
ACKNOWLEDGMENT
The author wishes to thank Prof. A. C. Lomte, Prof. G. M. Bhandari and Dr. T. K. Nagaraj for valuable guidance
and encouragement.
REFERENCES
[1] Rong Hu, Wanchun Dou, Jianxun Liu, “ClubCF: A Clustering-based Collaborative Filtering Approach for Big Data Application,” IEEE
Transactions on Emerging Topics in Computing, 2014.
[2] X. Wu, X. Zhu, G. Q. Wu, et al., “Data mining with big data,” IEEE Trans. on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97-107, January 2014.
[3] A. Rajaraman and J. D. Ullman, “Mining of massive datasets,” Cambridge University Press, 2012.
[4] A. Bellogín, I. Cantador, F. Díez, et al., “An empirical comparison of social, collaborative filtering, and hybrid recommenders,” ACM
Trans. on Intelligent Systems and Technology, vol. 4, no. 1, pp. 1-37, January 2013.
[5] W. Zeng, M. S. Shang, Q. M. Zhang, et al., “Can Dissimilar Users Contribute to Accuracy and Diversity of Personalized
Recommendation?,” International Journal of Modern Physics C, vol. 21, no. 10, pp. 1217-1227, June 2010.
[6] J. Mai, Y. Fan, and Y. Shen, “A Neural Networks-Based Clustering Collaborative Filtering Algorithm in E-Commerce Recommendation
System,” in Proc. 2009 Int’l Conf. on Web Information Systems and Mining, pp. 616-619, June 2009.
[7] N. Mittal, R. Nayak, M. C. Govil, et al., “Recommender System Framework using Clustering and Collaborative Filtering,” in Proc. 3rd
Int’l Conf. on Emerging Trends in Engineering and Technology, pp. 555- 558, November 2010.
[8] X. Li, and T. Murata. “Using Multidimensional Clustering Based Collaborative Filtering Approach Improving Recommendation
Diversity,” in Proc. 2012 IEEE/WIC/ACM Int’l Joint Conf. on Web Intelligence and Intelligent Agent Technology, pp. 169-174,
December 2012.
[9] Z. Zhou, M. Sellami, W. Gaaloul, et al., “Data Providing Services Clustering and Management for Facilitating Service Discovery and
Replacement,” IEEE Trans. on Automation Science and Engineering, vol. 10, no. 4, pp. 1-16, October 2013.
[10] M. C. Pham, Y. Cao, R. Klamma, et al., “A Clustering Approach for Collaborative Filtering Recommendation Using Social Network
Analysis,” Journal of Universal Computer Science, vol. 17, no. 4, pp. 583-604, April 2011.
[11] R. D. Simon, X. Tengke, and W. Shengrui, “Combining collaborative filtering and clustering for implicit recommender system,” in Proc.
2013 IEEE 27th Int’l Conf. on. Advanced Information Networking and Applications, pp. 748-755, March 2013.
[12] X. Liu, G. Huang, and H. Mei, “Discovering homogeneous web service community in the user-centric web environment,” IEEE Trans. on
Services Computing, vol. 2, no. 2, pp. 167-181, April-June 2009.
[13] H. H. Li, X. Y. Du, and X. Tian, “A review-based reputation evaluation approach for Web services,” Journal of Computer Science and
Technology.
[14] G. Thilagavathi, D. Srivaishnavi, N. Aparna, et al., “A Survey on Efficient Hierarchical Algorithm used in Clustering,” International
Journal of Engineering, vol. 2, no. 9, September 2013.
[15] A. Yamashita, H. Kawamura, and K. Suzuki, “Adaptive Fusion Method for User-based and Item-based Collaborative Filtering,” Advances
in Complex Systems, vol. 14, no. 2, pp. 133-149, May 2011.