Web Mining
Web mining is the application of data
mining techniques to find interesting and
potentially useful knowledge from web
data.
What is Web Data?
Web data is
Web content – text, images, records, etc.
Web structure – hyperlinks, tags, etc.
Web usage – HTTP logs, app server logs, etc.
Web Mining Taxonomy
Web Mining
•Web Content Mining
•Web Structure Mining
•Web Usage Mining
Web Content Mining
Discovery of useful information from web
contents / data / documents
Web data contents:
1. text,
2. image,
3. audio,
4. video,
5. metadata and
6. hyperlinks
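As a minimal sketch of content mining on a single page, the snippet below pulls text, metadata, and hyperlinks out of an HTML string using Python's standard html.parser module. The class name ContentExtractor and the function extract_content are illustrative names, not part of the original slides, and image, audio, and video content are not handled.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Hypothetical helper: collects text, links and <meta> tags from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self.metadata = {}
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name"):
            self.metadata[attrs["name"]] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return the text, hyperlinks and metadata found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "links": parser.links,
        "metadata": parser.metadata,
    }
```

Calling extract_content on a page's HTML returns a dictionary with "text", "links", and "metadata" keys that later mining steps could consume.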
Web Structure Mining
•It deals with discovering and modeling the link structure of the web.
•Work has been carried out to model the web based on the topology of the hyperlinks.
Helps in
•Discovering similarities between sites
•Discovering important sites for a particular topic
•Discovering web communities
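One simple way to illustrate "similarities between sites" is to compare the sets of pages they link to. The sketch below uses Jaccard similarity over out-link sets. This measure is an assumption chosen for illustration, not a method named in the slides, and the function names and the outlinks dictionary are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def site_similarities(outlinks):
    """Score every pair of sites by the overlap of the pages they link to.

    outlinks maps a site name to the list of URLs it links to
    (a hypothetical input format).
    """
    sites = sorted(outlinks)
    return {
        (s, t): jaccard(outlinks[s], outlinks[t])
        for i, s in enumerate(sites)
        for t in sites[i + 1:]
    }
```

Pairs with a high score link to many of the same pages, which is one rough signal that two sites cover related topics.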
Web Usage Mining
It deals with understanding user
behavior in interacting with the web or
with a website.
Aim
To obtain information that may assist
web sites for reorganization or
adaptation to better suit the user.
To understand the user’s behaviour:
•Clicking pattern
•Browsing time
•Transaction
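Clicking patterns and browsing time can be approximated from web server logs. The sketch below assumes log lines in the Common Log Format, identifies a user by host address, and starts a new session after 30 minutes of inactivity. All three choices are assumptions made for illustration, as are the names parse_line, sessionize, and summarize.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median

# Assumed input: Common Log Format, e.g.
# 1.1.1.1 - - [10/Oct/2000:13:00:00 -0700] "GET /index.html HTTP/1.0" 200 100
LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" \d{3} \S+')
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold


def parse_line(line):
    """Return (host, timestamp, url), or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, stamp, url = match.groups()
    return host, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), url


def sessionize(lines):
    """Group hits by host and split them into sessions at long gaps."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record:
            host, when, url = record
            by_host[host].append((when, url))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] > SESSION_GAP:
                sessions.append((host, current))
                current = [hit]
            else:
                current.append(hit)
        sessions.append((host, current))
    return sessions


def summarize(sessions):
    """Browsing time per session (first hit to last hit), in seconds."""
    durations = [(hits[-1][0] - hits[0][0]).total_seconds() for _, hits in sessions]
    if not durations:
        return {"sessions": 0, "mean_seconds": 0.0, "median_seconds": 0.0}
    return {
        "sessions": len(durations),
        "mean_seconds": mean(durations),
        "median_seconds": median(durations),
    }
```

The ordered list of URLs inside each session is the clicking pattern. The time between the first and last hit is only a lower bound on browsing time, since the log does not record how long the last page was viewed.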
Applications
1. Target potential customers for electronic commerce
2. Enhance the quality and delivery of Internet information services to the end user
3. Improve Web server system performance
4. Identify potential prime advertisement locations
5. Facilitate personalization/adaptive sites
6. Improve site design
7. Fraud/intrusion detection
8. Predict user’s actions (allows prefetching)
Web Mining Taxonomy
Web Mining
•Web Content Mining
  Text
  Image
  Audio
  Video
  Structured records
•Web Structure Mining
  Hyperlinks
    Intra-Document Hyperlinks
    Inter-Document Hyperlinks
  Document Structure
•Web Usage Mining
  Web-Server Logs
  Application Server Logs
  Application Level Logs
Web usage mining and E-Commerce
E-commerce is the killer application of web mining
•Keep former customers and attract new customers
•Provide better service and be more interactive
Web usage mining is the best way to analyse customers’ behaviour.
•Discover customers’ needs or interests
•Analyse customers’ behaviour
The KDD Process for E-commerce
1. Data collection and pre-processing
2. Mining: pattern discovery and analysis
3. Recommendations
4. Action, after which the cycle begins again
Pattern Discovery and Analysis
Pattern Discovery
Using the mining algorithms to discover the
pattern
Pattern Analysis
To filter out uninteresting/meaningless rules or
patterns from the set found in the pattern
discovery phase
•Information filter
•OLAP (On-line analytical processing)
•Visualization
•Knowledge query mechanism (SQL)
Technologies For Web Usage Mining
Web usage mining technologies
•Statistical analysis
Most common method, such as frequency, mean (average),
median, etc.
•Classification
Mapping a data item into one of several predefined classes
•Clustering
To group together a set of items having similar characteristics
•Association rule
Can be used to relate pages or products that are most often
referenced or purchased together
•Sequential patterns
A set of items is followed by another item in time-order
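To make the last two techniques concrete, here is a small sketch that scores pairwise association rules by support and confidence, and counts which page directly follows which within a session. It is a simplified illustration: only item pairs are considered, sequential patterns are reduced to immediately consecutive pages, and the thresholds and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items.

    support    = share of transactions containing both items
    confidence = share of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    if n == 0:
        return []
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def sequential_pairs(sessions):
    """Count how often one page is immediately followed by another."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts
```

Here each transaction is the set of pages or products seen in one visit, while each session keeps the pages in time order, which is what distinguishes a sequential pattern from an association rule.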
E-commerce Business Objectives
•Personalization
Web site personalization (content or layout)
Personalized advertisement
Personalized product recommendation
•Marketing strategy
Marketing rule
Changing the marketing strategy
•Web site design
Web site evaluation
Reorganize
Improve the hypertext structure
Optimization
Web usage mining for e-commerce
•Many applications in different areas of E-commerce have already been proposed
•However, most research focuses only on the first two steps of the KDD process
•Data mining is meaningless if we do not take action in E-commerce
Possible work
In the area of web usage mining for
E-commerce. Also in the area of web search,
employing web crawlers or algorithms like
HITS (Hypertext Induced Topic Search), Web
Warehousing, etc.
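As a rough sketch of the HITS idea mentioned above, the code below repeatedly updates hub and authority scores over a small link graph. The graph format, the fixed iteration count, and the function names are assumptions for illustration. A real system would first build a topic-specific subgraph from search results, which is omitted here.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so that their squares sum to 1 (no change if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(graph, iterations=50):
    """Return (hubs, authorities) for a graph given as {page: [linked pages]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub score: sum of the authority scores of the pages it links to.
        new_hubs = {n: sum(new_auths[d] for d in graph.get(n, ())) for n in nodes}
        auths = normalize(new_auths)
        hubs = normalize(new_hubs)
    return hubs, auths
```

Pages with high authority scores are candidates for "important sites for a particular topic", and pages with high hub scores are good link collections pointing to them.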