Mining real world data
Web data
World Wide Web
• Hypertext documents
– Text
– Links
• Web
– billions of documents
– authored by millions of diverse people
– edited by no one in particular
– distributed over millions of computers, connected by a variety of media
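A hypertext document, as described above, is text plus links. The following is a minimal sketch of splitting one page into those two parts, using only Python's standard `html.parser`. The class name, function name, and sample page are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextAndLinkParser(HTMLParser):
    """Collects visible text fragments and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # text fragments in document order
        self.links = []       # href values found on <a> tags

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the hyperlinks we care about here.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def split_text_and_links(html):
    """Return (text, links) for a single HTML string."""
    parser = TextAndLinkParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.text_parts), parser.links


# Hypothetical sample page, not taken from the slides.
SAMPLE_PAGE = (
    "<html><body><p>Web mining "
    '<a href="http://example.org/hits">hubs</a>'
    " and authorities.</p></body></html>"
)
page_text, page_links = split_text_and_links(SAMPLE_PAGE)
```

The text part would feed content mining, while the list of links would feed structure mining.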
Structured vs. Web data mining
• traditional data mining
– data is structured and relational
– well-defined tables, columns, rows, keys, and
constraints.
• Web data
– readily available data rich in features and patterns
– spontaneous formation and evolution of
• topic-induced graph clusters
• hyperlink-induced communities
History of Hypertext
• Citation
– Hyperlinking
• Ramayana, Mahabharata, Talmud
– branching, non-linear discourse, nested commentary
• Dictionary, encyclopedia
– self-contained networks of textual nodes
– joined by referential links
Three Broad Categories of Web Mining
• Web content mining
– Application of data-mining techniques
• Web structure mining
– Operates on the Web’s hyperlink structure
• Web usage mining
– Analyzes user interaction with a Web server
– Includes logs, database transactions, …
– Privacy concern
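As one illustration of Web usage mining, the sketch below counts successful page requests in a server log. It assumes the log follows the Common Log Format. The regular expression, function name, and sample lines are hypothetical. The client host is parsed but deliberately not stored in the result, which is one simple way to limit the privacy concern noted above.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def count_page_hits(log_lines):
    """Count successful (status 200) requests per path; hosts are not kept."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]
page_hits = count_page_hits(SAMPLE_LOG)
```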
Web Content and Structure Mining
• Web as a Database
• Document Classification
• Hubs and Authorities
• Clever: Ranking by Content
• Identifying Web Communities
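The slides name hubs and authorities without detail. A common way to compute such scores is the iterative HITS scheme: a page's authority score sums the hub scores of the pages linking to it, and its hub score sums the authority scores of the pages it links to. The sketch below assumes that scheme. The function names, the fixed iteration count, and the toy graph are hypothetical.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: add up hub scores of pages that link in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: add up authority scores of pages linked to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, [])) for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical four-page graph: "a" and "b" point at "c"; "a" also points at "d".
TOY_GRAPH = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub_scores, auth_scores = hits(TOY_GRAPH)
best_hub = max(hub_scores, key=hub_scores.get)
best_authority = max(auth_scores, key=auth_scores.get)
```

In the toy graph, the page with the most incoming links from good hubs ends up as the top authority, and the page linking to the most authorities ends up as the top hub.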
Web as a Database
• Placing a layer of abstraction containing some semantic information on top of the semistructured Web
• Query the Web as a database
– Topic, author, creation date, and so on
• WebLog and WebSQL
• Recent work: Semantic Web
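To make the idea of querying the Web as a database concrete, the sketch below stores page metadata (topic, author, creation date) in an in-memory SQLite table and filters it with ordinary SQL. This is not WebSQL or WebLog syntax. The table layout, function name, and sample rows are hypothetical.

```python
import sqlite3


def build_page_index(pages):
    """Load (url, topic, author, created) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "url TEXT PRIMARY KEY, topic TEXT, author TEXT, created TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", pages)
    conn.commit()
    return conn


# Hypothetical metadata; dates are ISO strings so they sort correctly as text.
SAMPLE_PAGES = [
    ("http://example.org/hits", "web mining", "alice", "2001-03-14"),
    ("http://example.org/olap", "databases", "bob", "1999-11-02"),
    ("http://example.org/crawl", "web mining", "carol", "2002-07-30"),
]

conn = build_page_index(SAMPLE_PAGES)
# Query by topic and creation date, as the abstraction layer would allow.
rows = conn.execute(
    "SELECT url, author FROM pages "
    "WHERE topic = ? AND created >= ? ORDER BY created",
    ("web mining", "2001-01-01"),
).fetchall()
```

The hard part in practice is filling such a table from semistructured pages. The query itself is routine once the semantic layer exists.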
Document Classification
• Roots
– Machine learning
– Pattern Recognition
– Text Analysis
• Topic Aggregation
• Google News
– http://news.google.com
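As a small example of the machine-learning roots of document classification, the sketch below trains a multinomial Naive Bayes classifier with Laplace smoothing on a handful of labeled headlines. It is a swapped-in textbook technique, not a description of how Google News aggregates topics. All names and training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocabulary = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing avoids zero probabilities for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training headlines.
TRAINING = [
    ("stocks fall as markets slide", "business"),
    ("central bank raises interest rates", "business"),
    ("team wins final match", "sports"),
    ("striker scores late goal in match", "sports"),
]
model = train_naive_bayes(TRAINING)
business_guess = classify("bank rates and markets", model)
sports_guess = classify("late goal wins match", model)
```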
Semantic Web Mining
• Semantic Web
– Next generation Web
– Semantically rich language
• Web Ontology Language
– More complex than Web-as-database
– Fits Web mining
– More and more benefits