Web Mining
4/17/2007
Hye Seon Yi

Why Web mining?
• To mine semi-structured data with hyperlinks and HTML tags
  – Traditional data mining, information retrieval, and machine learning systems use well-structured data (e.g., tabular data)
• To find information systematically in the vast collection of Web documents

Data for Web mining
• Web pages
  – Written in HTML and XML
• Web logs
  – Generated and kept at Web servers
• Hyperlink structures
• Other related data
  – User profiles and registration information

Tasks of Web Mining
• Resource finding
  – Collect data
• Information selection and preprocessing
  – Convert data into a desired format
• Generalization
  – Discover patterns
• Analysis
  – Validate and/or interpret the patterns

Categories of Web Mining
• Web Usage Mining
  – Find usage patterns of users
• Web Content Mining
  – Knowledge discovery in Web data
• Web Structure Mining
  – Analyze the node and connection structure of a Web site

Web Usage Mining
• Find usage patterns of users
• Mining objects
  – Web log records
    • Visits, clicks, accessed pages, and so on
  – User input from forms and surveys
• Utilization at e-commerce sites
  – Suggesting pages and resources according to users' browsing trends (e.g., Amazon.com, Netflix.com)
  – Active research area

Web Content Mining
• Knowledge discovery in Web data
• Mining objects
  – Textual data (unstructured, semi-structured, well-structured)
  – Multimedia data (images, audio, video, etc.)
• IR view (bag of words) vs. DB view (structured) of Web content
• Agent-based approach (AI agents) vs. DB approach

Web Structure Mining
• Mining objects
  – Hyperlink structure
• Utilization for search engines
  – Ranking search results (e.g., the PageRank algorithm)
    • Developed and used by Google
    • Each link counts as a vote
    • Each link's vote is weighted by the importance of the page that contains the link

Summary
• Web mining: mining Web documents
• Three subcategories
  – Web Usage Mining
  – Web Content Mining
  – Web Structure Mining
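The Web Structure Mining slide describes PageRank only as "each link as a vote, weighted by the importance of the linking page." The Python below is a minimal power-iteration sketch of that idea. It is not Google's implementation. It makes three assumptions that the slides do not state: a toy link graph whose page names A to D are hypothetical, a damping factor of 0.85 (a commonly cited value), and an even redistribution of rank from pages that have no outgoing links.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """Power-iteration PageRank sketch (illustrative, not a reference implementation).

    links: dict mapping each page to the list of pages it links to.
    damping: assumed probability of following a link instead of jumping
    to a random page.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Rank held by pages with no outgoing links is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each link is a vote worth the linking page's rank,
        # split evenly across that page's outgoing links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page site: C is linked from A, B and D.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, C collects votes from three pages and ends up ranked highest. A ranks second because its single in-link comes from the high-ranking C, which shows that a vote counts for more when the voting page is itself important. D has no in-links, so it keeps only the random-jump share. The scores always sum to 1, so they can be read as the probability that a random surfer is on each page.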