WEB MINING
by NINI P SURESH
PROJECT CO-ORDINATOR: Kavitha Murugeshan

OUTLINE
  Introduction
  Data mining vs. Web mining
  Web mining subtasks
  Challenges
  Taxonomy
  Web content mining
  Web structure mining
  Web usage mining
  Applications

INTRODUCTION
  Users now need automated tools to find, extract, filter & evaluate desired information & resources.
  Search engines aim only to discover resources on the Web.
  Need for Web Mining
    Narrow search scope
    Low precision
  Other Approaches
    Database approach (DB)
    Information retrieval
    Natural language processing (NLP)
    Web document community

WEB MINING DEFINITION
  Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from Web data.

DATA MINING VS. WEB MINING
  Data mining: extraction of useful patterns from data sources like databases, texts, web, images etc.
  Web mining: extracting relevant information hidden in Web-related data, like hypertext documents on the web.

WEB MINING SUBTASKS
  Resource finding
  Information selection & preprocessing
  Generalization
  Analysis

CHALLENGES
  Search relevant information on web
  Create knowledge
  Personalization of information
  Learn patterns
  Uniformity & standardisation
  Redundant information
  Noisy web
  Monitoring changes
  Sites providing services
  Privacy

TAXONOMY
  Web Mining
    Web Content Mining
      Web Text Mining
      Web Multimedia Mining
    Web Structure Mining
      Link Mining
      Internal Structure Mining
      URL Mining
    Web Usage Mining
      General Access Pattern Tracking
      Personalized Usage Tracking

WEB CONTENT MINING
  Discovering useful information & analysing the content
  Automatic process beyond keyword extraction
  Approaches to restructure document content
  Two groups of mining strategies
  Agent based Approach
    Intelligent search agents
    Information filtering/categorization
    Personalized web agents
  Database Approach
    Multilevel databases
    Web query system

WEB STRUCTURE MINING
  Discovering structure information from the web
  Web graph: web pages as nodes & hyperlinks as edges
  Two algorithms for handling links
    PageRank
    HITS
  PageRank
    Metric for ranking hypertext documents
    Rank of a page depends on the rank of the pages pointing to it
    Iterative process
    Symbols used in the PageRank formula
      n: number of nodes in the graph
      Outdegree(q): number of hyperlinks on page q
      d: damping factor
  HITS
    Iterative algorithm
    Identifies topic hubs & authorities
    Input: search results returned by a traditional text indexing technique
    Assigns weights to hubs based on authoritativeness
    Outputs pages with the largest hub & authority weights

WEB USAGE MINING
  Extracting information from server logs
  Discover user access patterns of Web pages
  Decomposed into 3 subtasks
  Process flow: Site files, raw logs -> Preprocessing -> User session file -> Mining algorithms -> Rules, patterns & statistics -> Pattern analysis -> Interesting rules, patterns & statistics
  Preprocessing
    Data cleaning
    User identification
    User session identification
    Access path supplement
    Transaction identification
  Pattern discovery
    Statistical analysis
    Association rules
    Clustering analysis
    Classification analysis
    Sequential patterns
    Dependency modeling
  Pattern analysis
    Eliminates irrelevant rules or patterns
    Extracts interesting patterns

APPLICATIONS
  Personalized services
  Improve website design
  System improvement
  Predicting trends
  Carry out intelligent business

PROS
  High trade volumes
  Classify threats & fight against terrorism
  Establish better customer relationships
  Increase profitability

CONS
  Invasion of privacy
  Discrimination by controversial attributes

CONCLUSION
  Rapidly growing area
  Promising area of future research

REFERENCE
  [1] http://en.wikipedia.org/wiki/Web_mining
  [2] http://www.galeas.de/webimining.html
  [3] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, ACM SIGKDD, Jan 2000.
  [4] Miguel Gomes da Costa Júnior, Zhiguo Gong, Web Structure Mining: An Introduction, Proceedings of the 2005 IEEE International Conference on Information Acquisition.
  [5] R. Cooley, B. Mobasher, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, ICTAI 97.
  [6] Brijendra Singh, Hemant Kumar Singh, Web Data Mining Research: A Survey, 2010 IEEE.
  [7] Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data, Part 2, 2003 edition.
  [8] Anthony Scime, Web Mining: Applications and Techniques.

WEB MINING
  Thank You
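
As a supplement to the PageRank slides: the slide text defines n, Outdegree(q) and d, but the formula itself is not preserved here. The sketch below assumes the commonly used form PR(p) = (1 - d)/n + d * (sum of PR(q)/Outdegree(q) over pages q linking to p). Some papers swap the roles of d and 1 - d, so treat this as one possible reading rather than the deck's own formula. The function name `pagerank`, the toy graph `toy_web`, the value d = 0.85, the fixed iteration count and the handling of pages without outgoing links are all illustrative assumptions.

```python
def pagerank(graph, d=0.85, iterations=50):
    """Iteratively estimate PageRank.

    graph maps each page to the list of pages it links to.
    d is the damping factor (0.85 is an assumed, commonly quoted value).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the (1 - d) / n base share.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for q in pages:
            targets = graph.get(q, [])
            if targets:
                # q passes rank[q] / Outdegree(q) to each page it links to.
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = d * rank[q] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page C is pointed to by both A and B, so it ends up with the highest rank, which matches the slide's statement that a page's rank depends on the rank of the pages pointing to it.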
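
As a supplement to the HITS slides: the iterative hub and authority update can be sketched as follows. In practice the input graph would be built from search results returned by a text index, as the slides say; here a small hand-written graph stands in for that result set. The function name `hits`, the page names, the iteration count and the use of Euclidean normalisation are assumptions made for illustration.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) score dictionaries for a link graph.

    graph maps each page to the list of pages it links to.
    """
    pages = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for q, targets in graph.items():
            for p in targets:
                auth[p] += hub[q]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}

        # Hub update: sum the authority scores of the pages linked to.
        hub = {q: sum(auth[p] for p in graph.get(q, [])) for q in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {q: v / norm for q, v in hub.items()}

    return hub, auth


# Hypothetical result set: two portal pages linking to two content pages.
toy_results = {
    "portal1": ["paperA"],
    "portal2": ["paperA", "paperB"],
    "paperA": [],
    "paperB": [],
}
hub_scores, auth_scores = hits(toy_results)
best_hub = max(hub_scores, key=hub_scores.get)
best_authority = max(auth_scores, key=auth_scores.get)
print("top hub:", best_hub)
print("top authority:", best_authority)
```

Here `portal2` links to both content pages and becomes the strongest hub, while `paperA`, linked from both portals, becomes the strongest authority.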
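
As a supplement to the web usage mining preprocessing slides: user identification and user session identification can be illustrated on already-cleaned log records. This is a minimal sketch under simplifying assumptions: each record is a hypothetical `(client_ip, timestamp, url)` tuple, a client IP is treated as one user, and a gap of more than 30 minutes between requests starts a new session. The 30-minute timeout is a common heuristic, not something stated in the slides.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group log records into per-user sessions.

    records: iterable of (user_id, timestamp, url) tuples.
    Returns a list of (user_id, [urls in visit order]) tuples.
    """
    # User identification: here simply group requests by user_id.
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user in sorted(by_user):
        requests = sorted(by_user[user])  # order each user's requests by time
        current = [requests[0][1]]
        for (prev_ts, _), (ts, url) in zip(requests, requests[1:]):
            if ts - prev_ts > timeout:
                # Long gap: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Hypothetical cleaned log entries (image and robot requests already removed).
log = [
    ("10.0.0.1", datetime(2024, 1, 1, 9, 0), "/home"),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 5), "/products"),
    ("10.0.0.2", datetime(2024, 1, 1, 9, 6), "/home"),
    ("10.0.0.1", datetime(2024, 1, 1, 10, 30), "/home"),
]
user_sessions = sessionize(log)
for user, urls in user_sessions:
    print(user, urls)
```

The output of this step corresponds to the "user session file" in the process flow, which the mining algorithms then consume.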
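
As a supplement to the pattern discovery slides: association rules over user sessions can be sketched with plain support and confidence counting. The example only considers rules between pairs of pages and uses brute-force counting rather than a full algorithm such as Apriori. The function name `pair_rules`, the thresholds and the sample sessions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Find rules "lhs -> rhs" between pairs of pages visited together.

    sessions: list of lists of URLs.
    Returns a sorted list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the sessions containing lhs, how many contain rhs.
            confidence = count / page_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sessions, for example produced by a sessionization step.
sample_sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sample_sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

With these thresholds the only surviving rule is `/cart -> /products`: every session that contains the cart page also contains the products page. Filtering by thresholds like this is a simple form of the pattern analysis step, which eliminates uninteresting rules.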