International Journal of Multidisciplinary Research and Development
Online ISSN: 2349-4182 Print ISSN: 2349-5979
www.allsubjectjournal.com
Volume 3; Issue 3; March 2016; Page No. 82-83; (Special Issue)

A study on web page prediction using Markov models and page rank algorithm

1 C Thavamani, 2 Dr. A Rengarajan
1 Research Scholar, Bharathiar University, Coimbatore
2 Professor, Veltech Multi Tech SRS Engineering, Avadi, Chennai

Abstract
The web is a large source of information that can be turned into knowledge. Web mining is the application of data mining techniques to discover patterns from the web. Web logs contain information about web server requests and responses. The purpose of this paper is to explore ways to exploit the information in web logs for predicting users' web page accesses. The Markov model is the most commonly used prediction model for identifying patterns based on the sequence of previously accessed pages, because of its high accuracy. To predict the next page access, we apply a Markov model to the web session. If the result is ambiguous, the Page Rank algorithm is used to decide on the desired page.

Keywords: Web mining, prediction model, Markov models, page rank algorithm

Introduction
Web usage mining is the application of data mining techniques to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. It consists of three phases, namely pre-processing, pattern discovery, and pattern analysis. Web servers, proxies, and client applications can quite easily capture data about Web usage. Various attempts have been made to exploit web page access prediction by preprocessing web server log files and analyzing web users' navigational patterns. The Markov model calculates the probability of each page the user may visit next after visiting a sequence of web pages in the same session.
Markov model implementations have been hindered by the fact that low-order Markov models do not use enough history and therefore lack accuracy, whereas high-order Markov models incur high state-space complexity.

Literature Study
A number of researchers have attempted to improve the precision or coverage of web page access prediction by combining different recommendation frameworks. For instance, many papers combined clustering with association rules (Lai and Yang 2000, Liu et al. 2001) [1]. Lai & Yang (2000) introduced an approach to customized marketing on the Web that combines clustering and association rules. The authors collected information about customers using forms, Web server log files and cookies, and categorized customers according to the information collected. Since the k-means clustering algorithm works only with numerical data, the authors used the PAM (Partitioning Around Medoids) algorithm to cluster data measured on categorical scales. They then applied association rule techniques to each cluster. They showed through experiments that applying association rules to clusters achieves better results than applying them to non-clustered data for customizing the customers' marketing preferences.

Markov Models
Markov models [2] have been extensively used for predicting the action a user will take next, given the sequence of actions he or she has already performed. For this type of problem, Markov models are represented by three parameters $\langle X, Y, T \rangle$, where $X$ is the set of all possible actions that can be performed by the user; $Y$ is the set of all possible states for which the Markov model is built; and $T$ is a $|Y| \times |X|$ Transition Probability Matrix (TPM), where each entry $t_{ij}$ corresponds to the probability of performing action $j$ when the process is in state $i$. The state-space of the Markov model depends on the number of previous actions used in predicting the next action.
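As a rough illustration of the three-parameter representation above, the following Python sketch estimates the transition probabilities from a list of sessions, with each state being the tuple of the last k actions. The function name `build_markov_model`, the dictionary layout, and the toy sessions are assumptions made for illustration only; they do not come from the paper.

```python
from collections import Counter, defaultdict


def build_markov_model(sessions, k=1):
    """Estimate a k-th order transition table from observed sessions.

    A state is the tuple of the last k actions. The result maps each
    observed state to {next_action: estimated probability}, i.e. one
    row of the transition probability matrix per observed state.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            next_action = session[i + k]
            counts[state][next_action] += 1

    tpm = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        tpm[state] = {action: c / total for action, c in nexts.items()}
    return tpm


# Hypothetical toy sessions (not taken from the paper).
toy_sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "a"]]

first_order = build_markov_model(toy_sessions, k=1)
second_order = build_markov_model(toy_sessions, k=2)

for state, row in sorted(first_order.items()):
    print(state, row)
```

Storing only the observed states keeps the table sparse, which matters as k grows and the number of possible states increases.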
The simplest Markov model predicts the next action by looking only at the last action performed by the user. In this model, also known as the first-order Markov model, each action that can be performed by a user corresponds to a state in the model. A somewhat more complicated model computes the predictions by looking at the last two actions performed by the user. This is called the second-order Markov model, and its states correspond to all possible pairs of actions that can be performed in sequence. This approach is generalized to the Kth-order Markov model, which computes the predictions by looking at the last K actions performed by the user, leading to a state-space that contains all possible sequences of K actions.

Example: Suppose a user session A = <1,2,3,4,5,6,7> is the sequence of pages a user has visited, and that we use a sliding window of size 5. Applying feature extraction to A yields the following user sessions of length 5: B = <1,2,3,4,5>, C = <2,3,4,5,6> and D = <3,4,5,6,7>. The outcome, or label, of each session is its last page, so the labels of A, B, C and D are 7, 5, 6 and 7, respectively. This way, we end up with four user sessions: A, B, C and D. In general, the number of sessions extracted with a sliding window of size w from an original session A is $|A| - w + 1$.

To extract more knowledge from the user sessions, we use what we call a frequency matrix:

        1           2           ….            N
1       0           Freq(2,1)   Freq(….,1)    Freq(N,1)
2       Freq(1,2)   0           Freq(….,2)    Freq(N,2)
….      ….          ….          ….            ….
N       Freq(1,N)   Freq(2,N)   Freq(….,N)    0

Sample web sessions for the example:

WS1 : {3,2,1,4,5,6,3}
WS2 : {3,2,1}
WS3 : {4,5,2,1,5,4}
WS4 : {3,5,2,1,4,6,7,5}
WS5 : {1,4,2,5,4,6}

Table 1: First order Transition Probability Matrix

1st Order   S1={1}  S2={2}  S3={3}  S4={4}  S5={5}  S6={6}  S7={7}
1           0       4       0       0       0       0       0
2           0       0       2       1       2       0       0
3           0       0       0       0       0       1       0
4           3       0       0       0       2       0       0
5           1       1       1       2       0       0       1
6           0       0       0       2       1       0       0
7           0       0       0       0       0       1       0

Each entry is the number of times the page in that row was requested immediately after the page of that column's state across WS1 to WS5 (for example, page 1 follows page 2 four times). Dividing each column by its total gives the corresponding transition probabilities.

Page Rank Algorithm
The importance of a page is proportional to the sum of the importance scores of the pages linking to it. The justification for using Page Rank for ranking web pages comes from the random surfer model. Page Rank models the behavior of a web surfer who browses the Web. The surfer starts from a random node on the graph and keeps clicking on hyperlinks forever, picking a link uniformly at random on each page to move on to the next page. The number of times the surfer has visited each page is counted. The Page Rank of a given page is this number divided by the total number of pages the surfer has browsed. Page Rank is a static ranking of web pages in the sense that a Page Rank value is computed for each page offline and does not depend on search queries [3]. The Web is treated as a directed graph G = (V, E), where V is the set of vertices or nodes, i.e., the set of all pages, and E is the set of directed edges in the graph, i.e., hyperlinks. In Page Rank calculation, especially for larger systems, an iterative calculation method is used.

Proposed System
The proposed system focuses on improving the prediction of web page access. The process is as follows:

Prediction:
Begin
  For each incoming session
    Use the Markov model to make a prediction
    If the prediction is ambiguous
      Use the Page Rank algorithm to make the prediction
    End If
  End For
End

Conclusion and Future Work
The Markov model is the most commonly used prediction model because of its high accuracy. Low-order Markov models have lower accuracy but higher coverage. Higher-order Markov models and hidden Markov models are more accurate for predicting navigational paths. Page Rank algorithms and Markov models are commonly used for next page prediction. In addition, the popularity of pages can be taken into account in Page Rank. However, the similarity of pages is not yet considered by page ranking algorithms, and the popularity factor may depend on the concept of the page. In future work, we plan to propose a new hybrid technique that integrates a higher-order Markov model with Popularity and Similarity based Page Rank (PSPR) models for next page prediction, which we expect to be more promising than previous similar models.

References
1. Faten Khalil, Jiuyong Li, Hua Wang. Integrating Recommendation Models for Improved Web Page Prediction Accuracy.
2. Deshpande M, Karypis G. Selective Markov Models for Predicting Web Page Accesses. ACM Transactions on Internet Technology, 2004; 4(2):163-184.
3. Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer-Verlag Berlin Heidelberg, 2007.
4. Tasawar Hussain, Dr. Sohail Asghar, Dr. Nayyer Masood. Web Usage Mining: A Survey on Preprocessing of Web Log File.
5. Payal Gulati. A Novel Approach for Determining Next Page Access. IEEE, 2008.
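The prediction procedure from the Proposed System section (Markov model first, Page Rank when the result is ambiguous) could be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not stated in the paper: "ambiguous" is taken to mean a tie in the highest first-order transition count, the link graph is approximated by the observed page-to-page transitions because no hyperlink data is given, and Page Rank uses a damping factor of 0.85 with power iteration. All function names are hypothetical. The sessions are WS1 to WS5 from the example.

```python
from collections import Counter, defaultdict

# Sample web sessions WS1..WS5 from the example in the text.
SESSIONS = [
    [3, 2, 1, 4, 5, 6, 3],
    [3, 2, 1],
    [4, 5, 2, 1, 5, 4],
    [3, 5, 2, 1, 4, 6, 7, 5],
    [1, 4, 2, 5, 4, 6],
]


def transition_counts(sessions):
    """First-order counts: counts[current][next] = number of occurrences."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over {page: set of linked pages}.

    The damping factor is an assumption; the text describes only the
    plain random-surfer model.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without out-links spreads its rank over all pages.
            targets = links.get(p) or pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank


def predict_next(page, counts, rank):
    """Markov prediction, falling back to Page Rank on a tie."""
    candidates = counts.get(page)
    if not candidates:
        return None  # state never observed: no prediction
    best = max(candidates.values())
    tied = [p for p, c in candidates.items() if c == best]
    if len(tied) == 1:
        return tied[0]
    # Ambiguous result: choose the tied page with the highest Page Rank.
    return max(tied, key=lambda p: rank[p])


counts = transition_counts(SESSIONS)
# Assumption: observed transitions stand in for the hyperlink graph.
links = {page: set(nexts) for page, nexts in counts.items()}
rank = pagerank(links)

for page in sorted(counts):
    print(page, "->", predict_next(page, counts, rank))
```

With these sessions, pages 1, 2, 3 and 7 have a single most frequent successor, while pages 4, 5 and 6 each have two successors with equal counts, so only the latter are resolved by Page Rank.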
