Challenges in Web Search
Amit Singhal

Web Search
• Crawl, Index, Search
  – Crawl and Index
    • freshness
    • coverage (page selection, deep web)
  – Search
    • adversarial IR, trust
    • evaluation
    • partitioning the query space

Crawl and Index
• Freshness
  – pages are deleted, created, changed
  – How to keep the index fresh?
• Coverage
  – which 2.5B pages to index?
  – lot of useful information in databases
  – How to index “hidden” content?

Search
• Adversarial IR
  – all useful signals are spammed

Search
• Trust
  – how much can we trust a site
    • an article hosted at BBC is much more trustworthy than the same article hosted at yet-another-news-company.com
  – How trustworthy is a site, and how to use this information in ranking?

Search
• Evaluation
  – the collection changes continuously
    • rel. pages become non-rel., and vice-versa
  – can’t easily freeze a copy
    • relevance is a function of rendering
      – need all images, all redirects, CSS, …
    • linkage characteristics change over time
  – query space is huge (over 150M/day)
    • most popular query: 0.037%, 10th most popular: 0.011%
    • need a very large query set, expensive
  – How to evaluate given changing collection and a very big query space?

Search
• Ranking in a huge query space
  – specific methods work well for specific query types
    • e.g., strong proximity helps for people names
  – identify query type and use type-specific ranking algorithms
  – How to partition the query space into meaningful and useful partitions?

Web Search
  – How to keep the index fresh?
  – How to index “hidden” content?
  – How trustworthy is a site, and how to use this information in ranking?
  – How to evaluate given changing collection and a very big query space?
  – How to partition the query space into meaningful and useful partitions?

It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle (1859 - 1930)
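
The query-space figures on the Evaluation slide (over 150M queries/day, most popular query 0.037%, 10th most popular 0.011%) can be turned into absolute counts to see why a very large query set is needed. The sketch below is only illustrative arithmetic: it assumes exactly 150M queries per day (the slide says "over"), and the function names are hypothetical. The bounds on the top-10 share rely only on the fact that queries ranked 2 through 9 must fall between the two quoted percentages.

```python
# Figures quoted on the Evaluation slide.
QUERIES_PER_DAY = 150_000_000      # "over 150M/day"; assumed to be exactly 150M here
TOP_QUERY_SHARE = 0.037 / 100      # most popular query: 0.037% of daily traffic
TENTH_QUERY_SHARE = 0.011 / 100    # 10th most popular query: 0.011% of daily traffic


def daily_count(share, total=QUERIES_PER_DAY):
    """Approximate number of times per day a query with the given share is issued."""
    return round(share * total)


def top10_share_bounds(top=TOP_QUERY_SHARE, tenth=TENTH_QUERY_SHARE):
    """Lower and upper bounds on the combined traffic share of the top 10 queries.

    Ranks 2..9 are unknown, but each lies between the 10th and the 1st share.
    """
    lower = top + 9 * tenth        # ranks 2..10 all as rare as the 10th
    upper = 9 * top + tenth        # ranks 1..9 all as common as the 1st
    return lower, upper


if __name__ == "__main__":
    print(daily_count(TOP_QUERY_SHARE))      # occurrences/day of the top query
    print(daily_count(TENTH_QUERY_SHARE))    # occurrences/day of the 10th query
    low, high = top10_share_bounds()
    print(f"top-10 share between {low:.3%} and {high:.3%}")
```

Under these assumptions the ten most popular queries together account for well under 1% of daily traffic, so an evaluation set built only from head queries would say little about the rest of the query space.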