CS315: Web Search & Data Mining

A Semester in 50 minutes or less
    The Web
        History, key technologies and developments
        Its future
    Information Retrieval (IR), incl. on the Web
        How do you find the information you need, fast?
        Web crawling and indexing
        Link analysis, quality of information
    Data Mining and Machine Learning
        How do you cluster and classify information (semi)automatically?
    Introduction to “The Social Web”
        Blogs, Twitter, FB, …
        Social networks

Web Search Engines
    What are they?
    How did they start?
    How do they work?
    How do they make money?
    Should I care about privacy?
    How high is the quality of their results? Can they be improved?
    [Figure labels: paid results, organic results]

Problems of Search and Mining
    The Web poses a number of difficulties:
        A populist medium
        The information abundance and authority problem
        Uniform access
        Data with little structure

The Web: A populist medium
    Anyone can be an author!
    # of writers ~= # of readers
    Because ~= online members

Anyone can be an author!
    The evolution of memes
        Memes: ideas, theories, etc., that spread from person to person by imitation
        Now more easily spread via the Web
    Easier to connect to people with similar interests
        Gave rise to a plethora of online social networks

Info Abundance and Authority
    Liberal and informal culture of content generation and dissemination
    Redundancy
    Non-standard form and content
    Millions of qualifying pages for broad queries
        E.g.: java, kayaking, panther
    No authoritative information about the reliability or trustworthiness of content on a site
        Your favorite urban legend?

Problems from uniform access
    Little support for adapting to the background of specific users
        Does your grandfather surf and search the Web as easily as you do?
        Personalized search might help (somewhat)
    Commercial interests routinely influence the operation of Web search
        “Search Engine Optimization”
        AdSense

(Lack of) Structured Information
    Hypertext refers to the ability to click and link, not to the structure of data
    Semi-structured or unstructured
        No schema (precise description of data)
    Large number of attributes
        Each word is a potential feature

Major topics to cover
    History of the Web
    Relevant network protocols
    Search Engines and Directories
    Hyperlink analysis
    Measuring and Modeling the Web
    Quality of information
    Clustering and classification
    Social networks
    The Future of the Web

Reading for next time
    Vannevar Bush: “As We May Think”
    Tim Berners-Lee: Chapters 1 (Enquire within) & 2 (Tangles, Bits, Webs)
    Berners-Lee et al.: “The Information Universe”