In customer relationship management (CRM), Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. (Mining means extracting something useful or valuable from a baser substance, such as mining gold from the earth.) Web mining is used to understand customer behavior, evaluate the effectiveness of a particular Web site, and help quantify the success of a marketing campaign.

Web mining allows you to look for patterns in data through content mining, structure mining, and usage mining. Content mining examines data collected by search engines and Web spiders. Structure mining examines data related to the structure of a particular Web site. Usage mining examines data related to a particular user's browser, as well as data gathered by forms the user may have submitted during Web transactions. The information gathered through Web mining is evaluated (sometimes with the aid of software graphing applications) using traditional data mining techniques such as clustering and classification, association, and the examination of sequential patterns.

More generally, Web mining is the application of data mining techniques to extract knowledge from Web data. Web data includes:
- Web content: text, images, records, etc.
- Web structure: hyperlinks, tags, etc.
- Web usage: HTTP logs, application server logs, etc.
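The following example shows the association technique applied to usage data. It is a minimal Python sketch that derives pairwise association rules, with support and confidence, from user sessions. It assumes sessions have already been reconstructed as sets of visited page URLs. The function name `pair_rules` and the default thresholds are illustrative assumptions. Considering only page pairs is a simplification: full algorithms such as Apriori also grow larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine rules lhs -> rhs between pairs of pages.

    sessions: list of sets of page URLs, one set per reconstructed session
    (assumed input format). Thresholds are illustrative defaults.
    Returns sorted (lhs, rhs, support, confidence) tuples.
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        page_counts.update(pages)
        # Each unordered pair is counted at most once per session.
        pair_counts.update(combinations(sorted(pages), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]  # P(rhs in session | lhs in session)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

Suppose most sessions that visit `/cart` also visit `/products`. The rule `/cart` → `/products` then gets a high confidence, even when the pair itself appears in only a modest share of all sessions.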
Web Mining – History
- The term was first used in [E1996], defined in a task-oriented manner.
- An alternate, ‘data oriented’ definition was also given.
- 1st panel discussion at ICTAI 1997 [SM1997].
- Continuing forum:
  - WebKDD workshops with ACM SIGKDD, 1999, 2000, 2001, 2002, …; 60–90 attendees
  - SIAM Web analytics workshop 2001, 2002, …
  - Special issues of DMKD journal, SIGKDD Explorations
  - Papers in various data mining conferences & journals
  - Surveys [MBNL 1999, BL 1999, KB2000]

Pre-processing Web Data
- Web content: extract “snippets” from a Web document that represent the Web document.
- Web structure: identify interesting graph patterns, or preprocess the whole Web graph to come up with metrics such as PageRank.
- Web usage: user identification, session creation, robot detection and filtering, and extraction of usage path patterns.

Common Mining Techniques
The more basic and popular data mining techniques include:
- Classification
- Clustering
- Associations
Other significant ideas include:
- Topic identification, tracking, and drift analysis
- Concept hierarchy creation
- Relevance of content

Web Content Mining Applications
- Identify the topics represented by Web documents
- Categorize Web documents
- Find Web pages across different servers that are similar
- Applications related to relevance:
  - Queries: enhance standard query relevance with user, role, and/or task based relevance
  - Recommendations: list the top “n” relevant documents in a collection or portion of a collection
  - Filters: show/hide documents based on relevance score
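The Web content pre-processing step extracts representative “snippets”, and this can be approximated in many ways. One simple approach is a frequency-based extractive heuristic, assumed here rather than prescribed by the text. It scores each sentence by how frequent its words are across the whole document and keeps the best-scoring sentences. The stopword list and the name `extract_snippet` are hypothetical.

```python
import re
from collections import Counter

# Small, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def _words(text):
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def extract_snippet(text, max_sentences=2):
    """Return the sentences whose words are most frequent across the whole document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_words(text))

    def score(sentence):
        tokens = _words(sentence)
        # Average rather than total frequency, so long sentences are not favoured for length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    best = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))[:max_sentences]
    # Restore document order so the snippet reads naturally.
    return " ".join(sentences[i] for i in sorted(best))
```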
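In the Web structure step, PageRank is a metric computed over the whole Web graph. Below is a minimal power-iteration sketch. It assumes the graph is given as a dictionary mapping each page to the pages it links to. The damping factor of 0.85 is the commonly cited default. The iteration cap, the tolerance, and spreading rank from pages without out-links uniformly over all pages are choices made for this sketch.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """links: {page: [pages it links to]} (assumed representation).

    Returns {page: score}; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop early once the scores have settled
            break
    return rank
```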
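The Web usage pre-processing steps are user identification, robot detection and filtering, and session creation. They can be sketched with common heuristics. This minimal sketch assumes each request is an `(ip, user_agent, timestamp_seconds, url)` tuple. It also relies on three assumed heuristics, which are not the only options:

- treating one IP/user-agent pair as one user;
- flagging robots by user-agent keywords or by a request for `/robots.txt`;
- closing a session after 30 minutes of inactivity.

```python
from collections import defaultdict

ROBOT_HINTS = ("bot", "crawler", "spider")  # assumed keyword list, far from exhaustive
SESSION_TIMEOUT = 30 * 60  # assumed inactivity threshold, in seconds


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group requests into per-user sessions after dropping likely robots.

    requests: iterable of (ip, user_agent, timestamp_seconds, url) tuples (assumed format).
    Returns {(ip, user_agent): [[url, ...], ...]}, one inner list per session.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in requests:
        # User identification heuristic: same IP address and same user agent = same user.
        by_user[(ip, agent)].append((ts, url))

    sessions = {}
    for user, visits in by_user.items():
        agent = user[1].lower()
        # Robot detection: crawler keywords in the agent string, or a fetch of /robots.txt.
        if any(hint in agent for hint in ROBOT_HINTS) or any(url == "/robots.txt" for _, url in visits):
            continue
        visits.sort()  # chronological order
        user_sessions = [[visits[0][1]]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                user_sessions.append([])  # inactivity gap too long: start a new session
            user_sessions[-1].append(url)
        sessions[user] = user_sessions
    return sessions
```

The resulting sessions are the raw material for extracting usage path patterns.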
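The relevance-related applications need a document similarity measure, and so does finding similar pages across servers. Those applications are recommending the top “n” documents and filtering by relevance score. The sketch below assumes TF-IDF weighting with cosine similarity, which is one common choice among many. The tokenizer, the smoothed IDF formula, and the names `rank_documents` and `min_score` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: {doc_id: text}. Returns ({doc_id: {term: weight}}, {term: idf})."""
    counts = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: each term counted once per document
    # Smoothed IDF (assumed variant): always positive, so common terms keep a little weight.
    idf = {term: math.log((1 + n) / (1 + f)) + 1 for term, f in df.items()}
    vectors = {d: {t: tf * idf[t] for t, tf in c.items()} for d, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def rank_documents(query, docs, top_n=3, min_score=0.0):
    """Return up to top_n (score, doc_id) pairs whose relevance exceeds min_score."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, vec), d) for d, vec in vectors.items()]
    kept = [(s, d) for s, d in scored if s > min_score]  # filter: hide low-relevance documents
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # most relevant first, ties by id
    return kept[:top_n]  # recommendation list: top n
```

How the pieces map to the applications above:

- `min_score` acts as the show/hide filter.
- `top_n` sets the length of the recommendation list.
- Calling `cosine` on two document vectors gives a similarity score for comparing pages held on different servers.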
A Web is a collection of inter-related files on one or more Web servers.

Web Usage Mining
- Discovery of meaningful patterns from data generated by client-server transactions on one or more Web localities.

Typical Sources of Data
- Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
- User profiles
- Metadata: page attributes, content attributes, usage data

Conclusions
Web structure is a useful source for extracting information such as:
- Quality of a Web page
  - The authority of a page on a topic
  - Ranking of Web pages
- Interesting Web structures
  - Graph patterns such as co-citation, social choice, complete bipartite graphs, etc.
- Web page classification
  - Classifying Web pages according to various topics
- Which pages to crawl
  - Deciding which Web pages to add to the collection of Web pages
- Finding related pages
  - Given one relevant page, find all related pages
- Detection of duplicated pages
  - Detection of near-mirror sites to eliminate duplication
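The typical sources of usage data include server access, referrer, and agent logs. These are often written in the Apache Combined Log Format, which adds the referrer and user-agent fields to the Common Log Format. The minimal parsing sketch below assumes that format; other servers and custom formats need a different pattern. `parse_log_line` and its output field names are hypothetical.

```python
import re
from datetime import datetime

# Common Log Format fields, optionally followed by the Combined format's referrer and agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match the assumed format."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()  # referrer/agent are None for plain Common Log Format lines
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    parts = rec["request"].split()
    rec["method"], rec["url"] = (parts[0], parts[1]) if len(parts) >= 2 else (None, None)
    return rec
```

This simple pattern does not handle quoted fields that themselves contain escaped quotes.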
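The authority of a page on a topic can be estimated with link analysis. One well-known algorithm is Kleinberg's HITS, which alternately scores pages as hubs (pages that point to good authorities) and authorities (pages that good hubs point to). This minimal sketch assumes the link graph is a dictionary mapping each page to the pages it links to. It also assumes a fixed iteration count with L2 normalization.

```python
import math


def hits(links, iterations=50):
    """links: {page: [pages it links to]} (assumed representation). Returns (hub, auth)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: a page is a good authority if good hubs point to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: a page is a good hub if it points to good authorities.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth
```

Take `{"h1": ["a1", "a2"], "h2": ["a1", "a2"]}` as an example. The pages h1, h2 and a1, a2 form a complete bipartite graph, one of the graph patterns mentioned above, and HITS gives a1 and a2 equal authority scores.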
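Co-citation is one of the graph patterns listed above. It counts how many pages link to both members of a pair, and frequently co-cited pairs are likely to be related. That gives a simple way to find related pages starting from one relevant page. This minimal sketch assumes the link graph is a dictionary mapping each page to the pages it links to. Ranking related pages purely by co-citation count is an assumed heuristic, and `related_pages` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(links):
    """Return Counter{(a, b): number of pages linking to both a and b}, with a < b."""
    counts = Counter()
    for src, targets in links.items():
        # Each citing page contributes one co-citation to every pair of pages it links to.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


def related_pages(page, links, top_n=3):
    """Rank other pages by how often they are co-cited with `page`."""
    scores = Counter()
    for (a, b), c in cocitation_counts(links).items():
        if a == page:
            scores[b] += c
        elif b == page:
            scores[a] += c
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # highest count first
    return [p for p, _ in ranked[:top_n]]
```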
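Web page classification by topic can use any standard text classifier. The sketch below assumes multinomial naive Bayes with add-one (Laplace) smoothing, chosen here only because it is short. The class name `NaiveBayesPageClassifier`, the tokenizer, and the toy topic labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes over page text, with add-one smoothing."""

    def fit(self, pages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(pages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the topic
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```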
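Duplicated pages and near-mirror sites are often detected by comparing sets of word shingles (overlapping runs of k words) with the Jaccard coefficient. This minimal sketch assumes k = 3 and a similarity threshold of 0.8, both illustrative choices. It also assumes the page text has already been extracted from the HTML.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, k=3, threshold=0.8):
    """pages: {url: text}. Return sorted url pairs whose shingle sets overlap >= threshold."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    urls = sorted(sets)
    pairs = []
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if jaccard(sets[u], sets[v]) >= threshold:
                pairs.append((u, v))
    return pairs
```

The pairwise loop is quadratic in the number of pages. At Web scale, MinHash signatures are commonly used to estimate Jaccard similarity without comparing every pair.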