WEB MINING
by
NINI P SURESH
PROJECT CO-ORDINATOR
Kavitha Murugeshan
Page 1
OUTLINE
Introduction
Data mining Vs Web mining
Web mining subtasks
Challenges
Taxonomy
Web content mining
Web structure mining
Web usage mining
Applications
Page 2
INTRODUCTION
Users now need automated tools to find, extract, filter &
evaluate the information & resources they want.
Search engines aim only to discover resources on the web.
Page 3
INTRODUCTION
Need for Web Mining
Narrow search scope
Low precision
Page 4
INTRODUCTION
Other Approaches
 Database approach (DB)
 Information retrieval
 Natural language processing (NLP)
 Web document community
Page 5
WEB MINING
DEFINITION
Web mining refers to the overall process of
discovering potentially useful and
previously unknown information or
knowledge from the Web data.
Page 6
DATA MINING
 Extraction of useful patterns from data sources like databases, texts, web, images etc.
WEB MINING
 Extracting relevant information hidden in Web-related data, like hypertext documents on the web
Page 7
WEB MINING SUBTASKS
 Resource finding
 Information selection & preprocessing
 Generalization
 Analysis
Page 8
CHALLENGES
 Searching for relevant information on the web
 Create knowledge
 Personalization of Information
 Learn patterns
 Uniformity & standardisation
Page 9
CHALLENGES
 Redundant Information
 Noisy web
 Monitoring changes
 Sites providing Services
 Privacy
Page 10
TAXONOMY
Web Mining
    Web Content Mining
        Web Text Mining
        Web Multimedia Mining
    Web Structure Mining
        Link Mining
        Internal Structure Mining
        URL Mining
    Web Usage Mining
        General Access Pattern Track
        Personalized Usages Track
Page 11
WEB CONTENT MINING
Discovering useful information by analysing
the content of web documents
Automatic process beyond keyword
extraction
Approaches to restructure document content
Two groups of mining strategies
Page 12
WEB CONTENT MINING
Agent based Approach
Intelligent search agents
Information filtering/categorization
Personalized web agents
Page 13
WEB CONTENT MINING
Database Approach
Multilevel databases
Web query system
Page 14
WEB STRUCTURE MINING
Discovering structure information from web
Web graph : web pages as nodes & hyperlinks
as edges
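As a small illustration of the web graph idea, the sketch below builds an adjacency list from a few made-up (source, target) hyperlink pairs and counts links in and out of each page. The page names and the helper build_web_graph are hypothetical and only serve the example.

```python
from collections import defaultdict


def build_web_graph(links):
    """Return {page: set of pages it links to} from (source, target) pairs."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target]  # touch the key so pages without outgoing links still become nodes
    return dict(graph)


# Hypothetical hyperlinks between four pages.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
graph = build_web_graph(links)

# Out-degree: hyperlinks on the page. In-degree: hyperlinks pointing to the page.
out_degree = {page: len(targets) for page, targets in graph.items()}
in_degree = {page: sum(page in targets for targets in graph.values()) for page in graph}

print(out_degree)
print(in_degree)
```

In this toy graph, page C is pointed to by three other pages, which is the kind of signal the link-analysis algorithms use.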
Page 15
WEB STRUCTURE MINING
Two algorithms for handling of links
 PageRank
 HITS
Page 16
WEB STRUCTURE MINING
PageRank
 Metric for ranking hypertext documents
 Depends on the rank of the pages pointing to it
 Iterative process
Page 17
WEB STRUCTURE MINING
n : Number of nodes in graph
Outdegree(q) : Number of hyperlinks on page q
d : damping factor
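These three symbols fit the commonly cited form of the PageRank equation, PR(p) = (1 - d)/n + d * (sum over pages q linking to p of PR(q) / Outdegree(q)). The sketch below iterates that form on a toy graph. The function name pagerank, the default d = 0.85, the fixed iteration count and the handling of pages without outgoing links are all assumptions made for the example, not details taken from the slides.

```python
def pagerank(graph, d=0.85, iterations=50):
    """graph: {page: list of pages it links to}; every linked page must be a key.

    Returns {page: rank}, with ranks summing to 1.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform rank
    for _ in range(iterations):
        # Every page receives the (1 - d) / n base term.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for q in pages:
            targets = graph[q]
            if targets:
                # q passes d * PR(q) / Outdegree(q) to each page it links to.
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += d * rank[q] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page D has no incoming links, so it keeps only the base term (1 - d)/n, while C, which three pages point to, ends up with the highest rank.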
Page 18
WEB STRUCTURE MINING
HITS
 Iterative algorithm
 Identify topic hubs & authorities
 Input : search results returned by traditional text
indexing technique
Page 19
WEB STRUCTURE MINING
 Assigns each hub a weight based on the authoritativeness of the pages it links to
 Outputs pages with largest hub & authority
weights
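The iteration can be sketched as follows, assuming the usual HITS update rules: a page's authority weight is the sum of the hub weights of the pages linking to it, and its hub weight is the sum of the authority weights of the pages it links to. The function name hits, the page names and the fixed iteration count are hypothetical; the input graph stands in for the link structure among pages returned by a text search.

```python
import math


def hits(graph, iterations=20):
    """graph: {page: list of pages it links to}; every linked page must be a key.

    Returns (hub, authority) dictionaries, each normalised to unit length.
    """
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub weights of pages pointing to p.
        auth = {p: 0.0 for p in pages}
        for q in pages:
            for p in graph[q]:
                auth[p] += hub[q]
        # Hub update: sum the authority weights of pages that q points to.
        hub = {q: sum(auth[p] for p in graph[q]) for q in pages}
        # Normalise so the weights do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link structure among five search results.
graph = {
    "portal1": ["siteA", "siteB", "siteC"],
    "portal2": ["siteA", "siteB"],
    "siteA": [],
    "siteB": [],
    "siteC": ["siteA"],
}
hub, auth = hits(graph)
print("top hub:", max(hub, key=hub.get))
print("top authority:", max(auth, key=auth.get))
```

Here portal1 links to every authoritative page and comes out as the strongest hub, while siteA, linked from three pages, is the strongest authority.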
Page 20
WEB USAGE MINING
Extracting information from server logs
Discover user access patterns of Web pages
Decomposed into 3 subtasks
Site files + Raw logs -> Preprocessing -> User session file
User session file -> Mining algorithms -> Rules, patterns & statistics
Rules, patterns & statistics -> Pattern analysis -> Interesting rules, patterns & statistics
Page 21
WEB USAGE MINING
Preprocessing
 Data cleaning
 User identification
 User sessions identification
 Access path supplement
 Transaction identification
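The first three preprocessing steps can be sketched on a few made-up server log lines. The sketch assumes logs in Common Log Format, treats each IP address as one user, and starts a new session after 30 minutes of inactivity; these choices, the sample lines and the helper names parse_and_clean and sessionize are assumptions for illustration. Access path supplement and transaction identification are not covered.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format: ip ident user [time] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse_and_clean(lines):
    """Data cleaning: keep successful page requests, drop images, styles and errors."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        url = match.group("url")
        if match.group("status") != "200" or url.lower().endswith(IGNORED_SUFFIXES):
            continue
        timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        records.append((match.group("ip"), timestamp, url))
    return records


def sessionize(records):
    """User and session identification: group requests by IP, split on long gaps."""
    sessions = []
    prev_ip, prev_time = None, None
    for ip, timestamp, url in sorted(records):
        if ip != prev_ip or timestamp - prev_time > SESSION_TIMEOUT:
            sessions.append((ip, [url]))  # new user or long gap: new session
        else:
            sessions[-1][1].append(url)
        prev_ip, prev_time = ip, timestamp
    return sessions


# Hypothetical raw log lines.
raw_logs = [
    '10.0.0.1 - - [10/Oct/2010:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2010:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2010:13:58:02 +0000] "GET /products.html HTTP/1.0" 200 4100',
    '10.0.0.2 - - [10/Oct/2010:14:01:10 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2010:15:10:00 +0000] "GET /contact.html HTTP/1.0" 200 900',
    '10.0.0.2 - - [10/Oct/2010:14:02:00 +0000] "GET /missing.html HTTP/1.0" 404 0',
]
sessions = sessionize(parse_and_clean(raw_logs))
for ip, pages in sessions:
    print(ip, pages)
```

Identifying users by IP address alone is a simplification: proxies and shared machines can merge several users into one, which is one reason real preprocessing is harder than this sketch suggests.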
Page 22
WEB USAGE MINING
Pattern discovery
 Statistical Analysis
 Association Rules
 Clustering analysis
Page 23
WEB USAGE MINING
 Classification analysis
 Sequential Pattern
 Dependency Modeling
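Of the techniques listed, association rules are the easiest to show in a few lines. The sketch below counts how often pairs of pages occur in the same session and keeps rules "visitors of X also visit Y" that pass a support and a confidence threshold. The sessions, the thresholds and the helper name pair_rules are made up for the example, and only two-page rules are considered.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) rules over pairs of pages.

    support    = share of sessions containing both pages
    confidence = sessions containing both pages / sessions containing lhs
    """
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = sorted(set(session))  # each page counts once per session
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical user sessions (lists of visited pages).
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html"],
    ["/index.html", "/contact.html"],
    ["/products.html", "/cart.html"],
    ["/index.html", "/products.html", "/cart.html"],
]
rules = pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

With these thresholds only one rule survives: every session that reaches /cart.html also contains /products.html. Lowering min_confidence lets weaker rules through, which is where the later pattern analysis step comes in.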
Page 24
WEB USAGE MINING
Pattern Analysis
 Eliminates irrelevant rules or patterns
 Extracts interesting patterns
Page 25
APPLICATIONS
Personalized Services
Improve website design
System Improvement
Predicting trends
Carry out intelligent business
Page 26
PROS
 High trade volumes
 Classify threats & fight against Terrorism
 Establish better customer relationship
 Increase profitability
Page 27
CONS
Invasion of Privacy
Discrimination by controversial attributes
Page 28
CONCLUSION
 Rapidly growing area
 Promising area of future research
Page 29