In customer relationship management (CRM), Web mining is the integration of
information gathered by traditional data mining methodologies and techniques with
information gathered over the World Wide Web. (Mining means extracting
something useful or valuable from a baser substance, such as mining gold from the
earth.) Web mining is used to understand customer behavior, evaluate the
effectiveness of a particular Web site, and help quantify the success of a marketing
campaign.
Web mining allows you to look for patterns in data through content mining, structure
mining, and usage mining. Content mining is used to examine data collected by search
engines and Web spiders. Structure mining is used to examine data related to the
structure of a particular Web site, and usage mining is used to examine data related to
a particular user's browser as well as data gathered by forms the user may have
submitted during Web transactions.
The information gathered through Web mining is evaluated (sometimes with the aid
of software graphing applications) by using traditional data mining techniques such as
clustering and classification, association, and examination of sequential patterns.
Web Mining
A Web is a collection of inter-related files on one or more
Web servers.
Web mining is
the application of data mining techniques to extract knowledge
from Web data.
Web data is
-Web content: text, image, records, etc.
-Web structure: hyperlinks, tags, etc.
-Web usage: http logs, app server logs, etc.
Web Mining – history
-Term first used in [E1996], defined in a task oriented manner
-Alternate ‘data oriented’ definition
-1st panel discussion at ICTAI 1997 [SM1997]
-Continuing forum
  -WebKDD workshops with ACM SIGKDD, 1999, 2000, 2001, 2002, …; 60–90 attendees
  -SIAM Web analytics workshop 2001, 2002, …
-Special issues of DMKD journal, SIGKDD Explorations
-Papers in various data mining conferences & journals
-Surveys [MBNL 1999, BL 1999, KB2000]
Pre-processing Web Data
-Web Content
Extract “snippets” from a Web document that
represent the Web document
-Web Structure
Identify interesting graph patterns, or pre-process the whole Web graph to come up with
metrics such as PageRank
-Web Usage
User identification, session creation, robot detection
and filtering, and extraction of usage path patterns
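The Web structure step above mentions computing metrics such as PageRank over the whole Web graph. The following is a minimal sketch of that idea using plain power iteration. The toy graph, the page names, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration, not values taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site used only to exercise the function.
toy_graph = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

A page that nothing links to, such as the hypothetical "orphan" page here, keeps only the baseline share of rank, which is why link structure can serve as a rough quality signal.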
Common Mining Techniques
The more basic and popular data mining
techniques include:
-Classification
-Clustering
-Associations
Other significant ideas:
-Topic identification, tracking and drift analysis
-Concept hierarchy creation
-Relevance of content
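As one illustration of the "Associations" technique listed above, the sketch below computes support and confidence for page-pair rules over a handful of user sessions. The session data, the function name `pair_rules`, and the 0.5 support threshold are assumptions for the example. Real association mining (for instance Apriori) also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules "a implies b"."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # each page counts once per session
        item_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence is the share of sessions with the left-hand page
            # that also contain the right-hand page.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical sessions: each list is the set of pages one visit touched.
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "about"],
    ["products", "cart"],
]
rules = pair_rules(sessions)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```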
Web Content Mining Applications
-Identify the topics represented by a Web document
-Categorize Web documents
-Find Web pages across different servers that are similar
-Applications related to relevance
  -Queries: enhance standard query relevance with user, role, and/or task based relevance
  -Recommendations: list of top “n” relevant documents in a collection or portion of a collection
  -Filters: show/hide documents based on relevance score
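The relevance applications above (a top “n” list, and filtering by relevance score) can be sketched with TF-IDF weighting and cosine similarity. This is one common way to score relevance, not necessarily the one the presenters had in mind. The documents, the query, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms found in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def top_n(query, docs, n=2):
    """Return up to n (document index, score) pairs, most relevant first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, vec), i) for i, vec in enumerate(vectors)), reverse=True)
    # Dropping zero scores plays the role of the show/hide filter.
    return [(i, round(score, 3)) for score, i in scored[:n] if score > 0]


docs = [
    "Web usage mining analyzes server access logs",
    "PageRank ranks pages using the hyperlink structure of the Web",
    "Recipes for baking sourdough bread",
]
result = top_n("mining web server logs", docs, n=2)
print(result)
```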
What is Web Usage Mining?
A Web is a collection of inter-related files on one or more
Web servers.
Web Usage Mining
Discovery of meaningful patterns from data generated by
client-server transactions on one or more Web localities
Typical Sources of Data
-automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
-user profiles
-meta data: page attributes, content attributes, usage data
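Before patterns can be discovered in server log data, the raw requests usually have to be grouped into user sessions, with robot traffic removed. The sketch below shows one simple way to do that. It assumes that a user is identified by the (IP address, user agent) pair, that a 30-minute gap ends a session, and that robots can be spotted by substrings in the user agent. All three are common heuristics chosen for illustration, and the record layout is hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)
ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent substrings


def is_robot(user_agent):
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (ip, user_agent, timestamp_seconds, url) records into sessions."""
    by_user = defaultdict(list)
    for ip, agent, timestamp, url in records:
        if is_robot(agent):
            continue  # robot detection and filtering
        by_user[(ip, agent)].append((timestamp, url))  # user identification

    sessions = []
    for user, hits in by_user.items():
        hits.sort()
        current = [hits[0][1]]
        last_seen = hits[0][0]
        for timestamp, url in hits[1:]:
            if timestamp - last_seen > timeout:
                sessions.append((user, current))  # close the idle session
                current = []
            current.append(url)
            last_seen = timestamp
        sessions.append((user, current))
    return sessions


# Hypothetical, already-parsed access log records.
records = [
    ("10.0.0.1", "Mozilla/5.0", 0, "/home"),
    ("10.0.0.1", "Mozilla/5.0", 120, "/products"),
    ("10.0.0.1", "Mozilla/5.0", 4000, "/home"),
    ("10.0.0.2", "ExampleBot/1.0", 10, "/home"),
    ("10.0.0.3", "Mozilla/5.0", 50, "/about"),
]
sessions = build_sessions(records)
for user, path in sessions:
    print(user[0], path)
```

Each resulting path is a usage path in the sense used earlier, and can feed the pattern discovery step.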
Conclusions
Web structure is a useful source for extracting
information such as:
-Quality of a Web page
  -The authority of a page on a topic
  -Ranking of Web pages
-Interesting Web structures
  -Graph patterns like co-citation, social choice, complete bipartite graphs, etc.
-Web page classification
  -Classifying Web pages according to various topics
-Which pages to crawl
  -Deciding which Web pages to add to the collection of Web pages
-Finding related pages
  -Given one relevant page, find all related pages
-Detection of duplicated pages
  -Detection of near-mirror sites to eliminate duplication
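Detection of duplicated or near-mirror pages, the last item above, is often approached by comparing overlapping word sequences ("shingles") between pages. The sketch below uses three-word shingles and Jaccard similarity. The slides do not say which method was intended, so this technique, the 0.6 threshold, and the example URLs and texts are all assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.6, k=3):
    """Return URL pairs whose shingle sets overlap at or above the threshold."""
    sets = {url: shingles(text, k) for url, text in pages.items()}
    urls = sorted(sets)
    return [
        (u, v)
        for i, u in enumerate(urls)
        for v in urls[i + 1:]
        if jaccard(sets[u], sets[v]) >= threshold
    ]


# Hypothetical pages; the second is a lightly edited mirror of the first.
pages = {
    "a.example/page": "Web mining applies data mining techniques to extract knowledge from Web data",
    "mirror.example/page": "Web mining applies data mining techniques to extract knowledge from Web data today",
    "b.example/other": "PageRank scores a page by the pages that link to it",
}
duplicates = near_duplicates(pages)
print(duplicates)
```

Comparing every pair of pages is quadratic in the number of pages, so at Web scale this exact comparison is usually replaced by an approximate scheme such as MinHash.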