Challenges in Web Search
Amit Singhal
Web Search
• Crawl, Index, Search
– Crawl and Index
• freshness
• coverage (page selection, deep web)
– Search
• adversarial IR, trust
• evaluation
• partitioning the query space
Crawl and Index
• Freshness
– pages are deleted, created, changed
– How to keep the index fresh?
• Coverage
– which 2.5B pages to index?
– a lot of useful information is in databases
– How to index “hidden” content?
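The freshness bullets above note that pages are deleted, created, and changed between crawls. A minimal sketch of detecting those three cases, assuming two crawl snapshots are available as URL-to-content mappings. The function names, the snapshot shape, and the choice of SHA-256 fingerprints are illustrative assumptions, not something the slides specify:

```python
import hashlib
from typing import Dict, Set, Tuple


def fingerprint(content: str) -> str:
    """Hash page content so two versions can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_crawls(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (created, deleted, changed) URL sets between two snapshots.

    Assumption: each snapshot maps a URL to the page content fetched
    at crawl time. This is a hypothetical data shape for illustration.
    """
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    changed = {
        url
        for url in set(old) & set(new)
        if fingerprint(old[url]) != fingerprint(new[url])
    }
    return created, deleted, changed


# Hypothetical snapshots from two consecutive crawls.
old_snapshot = {
    "http://example.com/a": "hello",
    "http://example.com/b": "world",
}
new_snapshot = {
    "http://example.com/a": "hello, updated",
    "http://example.com/c": "new page",
}
created, deleted, changed = diff_crawls(old_snapshot, new_snapshot)
print("created:", sorted(created))
print("deleted:", sorted(deleted))
print("changed:", sorted(changed))
```

Exact-hash comparison flags any byte-level difference as a change. It only detects changes after a recrawl, so it does not answer the harder question of how often to recrawl each page.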
Search
• Adversarial IR
– all useful signals are spammed
Search
• Trust
– how much can we trust a site
• an article hosted at BBC is much more trustworthy than the same article hosted at yet-another-news-company.com
– How trustworthy is a site, and how to use this information in ranking?
Search
• Evaluation
– the collection changes continuously
• relevant pages become non-relevant, and vice versa
– can’t easily freeze a copy
• relevance is a function of rendering
– need all images, all redirects, CSS, …
• linkage characteristics change over time
– query space is huge (over 150M queries/day)
• most popular query: 0.037%, 10th most popular: 0.011%
• need a very large query set, expensive
– How to evaluate given changing collection and a very big query space?
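The query-volume figures above can be turned into absolute daily counts with a little arithmetic. The sketch below treats 150M as an exact daily total, which is an assumption (the slide says "over 150M"), and the helper name `daily_count` is illustrative:

```python
# Rough arithmetic on the query-volume figures from the slide.
# Assumption: exactly 150M queries/day (the slide says "over 150M").
QUERIES_PER_DAY = 150_000_000


def daily_count(share_percent: float, total: int = QUERIES_PER_DAY) -> float:
    """Convert a traffic share given in percent into queries per day."""
    return total * share_percent / 100.0


top_query = daily_count(0.037)    # most popular query
tenth_query = daily_count(0.011)  # 10th most popular query

print(f"most popular query: ~{top_query:,.0f} per day")
print(f"10th most popular:  ~{tenth_query:,.0f} per day")
```

Under that assumption the most popular query is issued roughly 55,500 times a day and the 10th most popular roughly 16,500 times, out of 150M. Even the head of the distribution covers a tiny fraction of traffic, which is why a representative query set has to be very large.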
Search
• Ranking in a huge query space
– specific methods work well for specific query types
• e.g., strong proximity helps for people's names
– identify the query type and use type-specific ranking algorithms
– How to partition the query space into meaningful and useful partitions?
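The idea of identifying the query type and dispatching to a type-specific ranker can be sketched as follows. Everything here is a toy assumption for illustration: the capitalization heuristic in `classify_query`, the two query types, and both scoring functions are hypothetical and are not the method used by any real search engine:

```python
import re
from typing import Callable, Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_count_score(query: str, doc: str) -> float:
    """Generic fallback: count occurrences of query terms in the document."""
    terms = set(tokenize(query))
    return float(sum(1 for tok in tokenize(doc) if tok in terms))


def proximity_score(query: str, doc: str) -> float:
    """Return 1 / (smallest window containing every query term), else 0.0."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    best: Optional[int] = None
    last_seen: Dict[str, int] = {}
    for i, tok in enumerate(tokenize(doc)):
        if tok in terms:
            last_seen[tok] = i
            if len(last_seen) == len(terms):
                # Smallest window ending at i starts at the oldest
                # "most recent occurrence" among the query terms.
                window = i - min(last_seen.values()) + 1
                if best is None or window < best:
                    best = window
    return 1.0 / best if best else 0.0


def classify_query(query: str) -> str:
    """Toy heuristic (assumption): 2-3 capitalized words look like a name.

    This misfires on queries such as place names; a real classifier
    would need far better signals.
    """
    words = query.split()
    if 2 <= len(words) <= 3 and all(w.isalpha() and w.istitle() for w in words):
        return "person_name"
    return "generic"


# One ranker per query type; the partition itself is the open question.
RANKERS: Dict[str, Callable[[str, str], float]] = {
    "person_name": proximity_score,
    "generic": term_count_score,
}


def rank(query: str, docs: List[str]) -> List[str]:
    """Pick the scorer for the query's type and sort documents by it."""
    scorer = RANKERS[classify_query(query)]
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)


docs = [
    "Amit wrote a long report that was later cited by Singhal",
    "A talk by Amit Singhal on web search",
]
for doc in rank("Amit Singhal", docs):
    print(doc)
```

The dispatch table keeps classification separate from scoring, so adding a query type means adding one classifier rule and one ranker. How to choose the partitions in the first place remains the open question posed on the slide.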
Web Search
– How to keep the index fresh?
– How to index “hidden” content?
– How trustworthy is a site, and how to use this information in ranking?
– How to evaluate given changing collection and a very big query space?
– How to partition the query space into meaningful and useful partitions?
It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle (1859 - 1930)