Mining real-world data: Web data

World Wide Web
• Hypertext documents
  – Text
  – Links
• Web
  – billions of documents
  – authored by millions of diverse people
  – edited by no one in particular
  – distributed over millions of computers, connected by a variety of media

Structured vs. Web data mining
• Traditional data mining
  – data is structured and relational
  – well-defined tables, columns, rows, keys, and constraints
• Web data
  – readily available data, rich in features and patterns
  – spontaneous formation and evolution of
    • topic-induced graph clusters
    • hyperlink-induced communities

History of Hypertext
• Citation
  – hyperlinking
• Ramayana, Mahabharata, Talmud
  – branching, non-linear discourse, nested commentary
• Dictionary, encyclopedia
  – self-contained networks of textual nodes
  – joined by referential links

Three Broad Categories of Web Mining
• Web content mining
  – Applies data-mining techniques to the content of Web pages
• Web structure mining
  – Operates on the Web’s hyperlink structure
• Web usage mining
  – Analyzes user interaction with a Web server
  – Data include server logs, database transactions, …
  – Raises privacy concerns

Web Content and Structure Mining
• Web as a Database
• Document Classification
• Hubs and Authorities
• Clever: Ranking by Content
• Identifying Web Communities

Web as a Database
• Place a layer of abstraction containing some semantic information on top of the semistructured Web
• Query the Web as a database
  – by topic, author, creation date, and so on
• WebLog and WebSQL
• Recent work: Semantic Web

Document Classification
• Roots
  – Machine learning
  – Pattern recognition
  – Text analysis
• Topic aggregation
• Google News
  – http://news.google.com

Semantic Web Mining
• Semantic Web
  – Next-generation Web
  – Semantically rich language
    • Web Ontology Language (OWL)
  – More complex than Web-as-database
  – Well suited to Web mining
  – Growing benefits
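The Web usage mining slide names server logs as a data source. As a minimal sketch, assuming the logs use the Common Log Format and that a "page view" means a successful GET (status 200), the code below counts views per path and reconstructs each host's click path. The function names and sample log lines are hypothetical; real usage mining also needs session splitting, robot filtering, and handling of proxies.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident authuser [timestamp] "method path protocol" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')


def parse(lines):
    """Yield (host, method, path, status) for each well-formed log line; skip the rest."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            host, method, path, status = m.groups()
            yield host, method, path, int(status)


def usage_summary(lines):
    """Count successful GET requests per path and record each host's ordered click path."""
    views = Counter()
    paths = defaultdict(list)
    for host, method, path, status in parse(lines):
        if method == "GET" and status == 200:
            views[path] += 1
            paths[host].append(path)
    return views, dict(paths)


if __name__ == "__main__":
    # Hypothetical log excerpt
    log = [
        '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.1 - - [10/Oct/2000:13:56:02 -0700] "GET /papers.html HTTP/1.0" 200 1043',
        '10.0.0.2 - - [10/Oct/2000:13:57:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:57:40 -0700] "GET /missing.html HTTP/1.0" 404 -',
    ]
    views, paths = usage_summary(log)
    print(views.most_common(1))  # → [('/index.html', 2)]
    print(paths["10.0.0.1"])     # → ['/index.html', '/papers.html']
```

Grouping by host is only a rough proxy for a user, and raw IP addresses are personal data, which is where the privacy concern on the slide comes in; a real pipeline would at least pseudonymize them before analysis.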
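The "Web as a Database" slide describes querying pages by topic, author, and creation date once a semantic layer has extracted that metadata. The sketch below assumes such metadata already exists and simply loads it into an in-memory SQLite table queried with ordinary SQL. It is not WebLog or WebSQL syntax; the `pages` schema, the function names, and the sample URLs are illustrative assumptions.

```python
import sqlite3


def build_page_db(pages):
    """Load hypothetical (url, topic, author, created) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, topic TEXT, author TEXT, created TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", pages)
    return conn


def pages_by_topic_since(conn, topic, since):
    """Return URLs on `topic` created on or after `since`.

    Dates are stored as ISO-8601 strings, so string comparison orders them correctly.
    """
    rows = conn.execute(
        "SELECT url FROM pages WHERE topic = ? AND created >= ? ORDER BY created",
        (topic, since),
    )
    return [url for (url,) in rows]


if __name__ == "__main__":
    # Hypothetical metadata, as if produced by the semantic layer
    sample = [
        ("http://example.org/a", "data mining", "alice", "2004-03-01"),
        ("http://example.org/b", "data mining", "bob", "2006-07-15"),
        ("http://example.org/c", "hypertext", "carol", "2005-01-20"),
    ]
    conn = build_page_db(sample)
    print(pages_by_topic_since(conn, "data mining", "2005-01-01"))
    # → ['http://example.org/b']
```

The hard part in practice is the semantic layer that fills the table; once metadata is explicit, querying it is ordinary database work.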
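The Document Classification slide traces the field to machine learning and text analysis. One classic learner for this task is multinomial naive Bayes; the sketch below uses it with add-one (Laplace) smoothing and naive whitespace tokenization. It illustrates supervised topic classification in general and is not the method behind Google News or any other system named on the slides; the class name and training documents are made up.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)        # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.doc_counts.items():
            # log P(label) + sum over tokens of log P(token | label), smoothed
            score = math.log(count / n_docs)
            for w in tokenize(doc):
                score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    docs = ["stock market shares fall", "bank profits rise",
            "team wins final match", "player scores goal"]
    labels = ["business", "business", "sports", "sports"]
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("shares rise at bank"))  # → business
    print(clf.predict("team scores"))          # → sports
```

Working in log space avoids floating-point underflow when documents are long.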
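The outline lists Hubs and Authorities among the structure-mining topics. The standard formulation is Kleinberg's HITS iteration, sketched below under the assumption that the link graph fits in memory as a dictionary from each page to the pages it links to: a page's authority score sums the hub scores of pages pointing to it, its hub score sums the authority scores of pages it points to, and both vectors are renormalized each round. The fixed iteration count is an arbitrary choice, and the sketch omits the text-based link weighting that Clever layers on top of HITS.

```python
import math


def hits(links, iterations=50):
    """HITS on a directed graph given as {page: [pages it links to]}.

    Returns (hub, authority) score dicts, each L2-normalized.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: a(p) = sum of h(q) over pages q that link to p
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: h(p) = sum of a(q) over pages q that p links to
        hub = {n: sum(auth[d] for d in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: a portal and a blog both point at page A; only the portal points at B
    graph = {"portal": ["A", "B"], "blog": ["A"]}
    hub, auth = hits(graph)
    print(max(auth, key=auth.get))  # → A
    print(max(hub, key=hub.get))    # → portal
```

Power iteration like this converges toward the principal eigenvectors of the products of the adjacency matrix with its transpose; a production version would stop on a convergence threshold rather than a fixed count.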