Web Mining
4/17/2007
Hye Seon Yi
Why Web mining?
• To mine semi-structured data with
hyperlinks and HTML tags
– Traditional data mining, information
retrieval, and machine learning systems
use well-structured data (e.g., tabular
data)
• To find information systematically
from the vast collection of Web
documents
Data for Web mining
• Web pages
– Written in HTML and XML
• Web logs
– Generated and stored on Web servers
• Hyperlink structures
• Other related data
– User profiles and registration
information
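As an illustration of the first and third data sources above, the following minimal sketch pulls hyperlinks and visible text out of an HTML page using only Python's standard-library `html.parser`. The class name `LinkExtractor`, the helper `extract`, and the sample page are hypothetical and chosen for this example; they are not part of the original presentation.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []       # hyperlink structure: outgoing link targets
        self.text_parts = []  # page content: non-empty text fragments

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract(html):
    """Return (links, text) for one HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


# Hypothetical sample page, invented for this sketch.
SAMPLE_PAGE = (
    "<html><body><h1>Web Mining</h1>"
    '<p>See <a href="/usage.html">usage</a> and '
    '<a href="/structure.html">structure</a>.</p>'
    "</body></html>"
)

links, text = extract(SAMPLE_PAGE)
print(links)
print(text)
```

The extracted links would feed structure mining, while the extracted text would feed content mining.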
Tasks of Web Mining
• Resource finding
– Collect data
• Information selection and preprocessing
– Convert data into a desired format
• Generalization
– Discover patterns
• Analysis
– Validate and/or interpret the patterns
Categories of Web Mining
• Web Usage Mining
– Find usage patterns of users
• Web Content Mining
– Knowledge discovery in Web data
• Web Structure Mining
– Analyze the node and connection
structure of a Web site
Web Usage Mining
• Find usage patterns of users
• Mining Objects
– Web log records
• Visits, clicks, accessed pages, and so on
– User input from forms and surveys
• Utilization at e-commerce sites
– Suggesting pages and resources according to
the user’s browsing trends
(e.g., Amazon.com, Netflix.com)
– Active research area
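A minimal sketch of usage mining on Web log records might look like the following. It assumes logs in Common Log Format, treats each client IP address as one user (a simplification), and suggests pages that were visited by the same users. The function names and the sample log lines are hypothetical; this is not the method used by any particular site.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$'
)


def parse_log(lines):
    """Map each client IP to the list of pages it fetched successfully."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group(5) == "200":  # keep successful requests only
            visits[match.group(1)].append(match.group(4))
    return visits


def co_visit_counts(visits):
    """Count, for every page, how many users also visited each other page."""
    pairs = defaultdict(Counter)
    for pages in visits.values():
        unique = set(pages)
        for a in unique:
            for b in unique:
                if a != b:
                    pairs[a][b] += 1
    return pairs


def suggest(pairs, page, k=2):
    """Return up to k pages most often co-visited with the given page."""
    return [p for p, _ in pairs.get(page, Counter()).most_common(k)]


# Hypothetical log records, invented for this sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [17/Apr/2007:10:00:00 -0500] "GET /books HTTP/1.0" 200 512',
    '10.0.0.1 - - [17/Apr/2007:10:01:00 -0500] "GET /dvds HTTP/1.0" 200 734',
    '10.0.0.2 - - [17/Apr/2007:10:02:00 -0500] "GET /books HTTP/1.0" 200 512',
    '10.0.0.2 - - [17/Apr/2007:10:03:00 -0500] "GET /dvds HTTP/1.0" 200 734',
    '10.0.0.2 - - [17/Apr/2007:10:04:00 -0500] "GET /music HTTP/1.0" 200 120',
    '10.0.0.3 - - [17/Apr/2007:10:05:00 -0500] "GET /missing HTTP/1.0" 404 0',
]

visits = parse_log(SAMPLE_LOG)
pairs = co_visit_counts(visits)
print(suggest(pairs, "/books", k=1))
```

A real system would identify users by sessions or logins rather than by IP address alone, and would weight recent behaviour more heavily.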
Web Content Mining
• Knowledge discovery in Web data
• Mining Objects
– Textual data (unstructured, semi-structured,
well-structured)
– Multimedia data (image, audio, video, etc.)
• IR view (bag of words) vs. DB view
(structured) of Web content
• Agent-based approach (AI agent) vs. DB
approach
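The IR view above can be sketched as follows: each document is reduced to a bag of words (term counts, with word order discarded), and two documents are compared by the cosine of the angle between their count vectors. The tokenization rule and the function names are assumptions made for this example; the DB view is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens; word order is lost."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical snippets of page text, invented for this sketch.
doc1 = bag_of_words("Web mining mines Web data")
doc2 = bag_of_words("Mining data from the Web")
doc3 = bag_of_words("Cooking pasta at home")

print(doc1)
print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

Documents that share many terms score close to 1, and documents with no terms in common score 0.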
Web Structure Mining
• Mining Objects
– Hyperlink Structure
• Utilization for search engines
– Rank search results
(e.g., the PageRank algorithm)
– Developed and used by Google
– Each link counts as a vote
– Each link is weighted
by the importance of the
page that casts it
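The "link as a weighted vote" idea can be sketched with a simplified PageRank computed by repeated iteration. This is an illustrative version only, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the four-page sample graph are all assumptions made for this example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank.

    graph maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}  # start from a uniform rank
    for _ in range(iterations):
        # every page gets a small base share regardless of its inlinks
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # a page splits its vote evenly among the pages it links to,
                # so a vote from an important page is worth more
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # page without outgoing links: spread its rank over all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph, invented for this sketch.
SAMPLE_GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(SAMPLE_GRAPH)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this graph, page C collects votes from three pages and ends up with the highest rank, while page D, which nothing links to, keeps only the base share.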
Summary
• Web Mining: Mining Web documents
• 3 sub-categories
– Web Usage Mining
– Web Content Mining
– Web Structure Mining