Slide 2.1 Chapter 2: The Web and the Problem of Search
• The size of the web, and how it is measured.
• Search engine usage statistics.
• The bow-tie structure of the web.
• The small-world web.
• Web information seeking strategies.
• A taxonomy of web searches.
• Web search versus Information Retrieval.
• Differences between global and local search.
• Differences between search and navigation.
Mark Levene, An Introduction to Search Engines and Web Navigation © Pearson Education Limited 2005

Slide 2.2 Web size statistics
• Number of accessible web pages – latest estimate, May 2005, 11.5 billion.
• The deep (or hidden, or invisible) web contains 400-550 times more information.
• Coverage (i.e. the proportion of the web indexed) is crucial for search engines.

Slide 2.3 Measuring the size of the web
• Capture-recapture method:
  – SE1 is the number of pages indexed by the first search engine.
  – QSE2 is the number of pages returned by the second search engine for typical queries.
  – OVR is the number of pages returned by both search engines for typical queries.
• Estimate = (SE1 × QSE2) / OVR
• Estimate of 64.81 million web sites as of June 2005.

Slide 2.4 Web usage statistics
• Over 10% of the world’s population were online as of late 2004.
• The number of broadband users is growing (over 50% of connected Americans use broadband).
• Search engine usage as of June 2004:
  – Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%)
• 200 million hits per day to Google (mid 2004).
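The capture-recapture estimate on Slide 2.3 can be illustrated with a minimal Python sketch. The reasoning behind the formula is that OVR / QSE2 approximates the fraction of the web covered by the first engine, so dividing SE1 by that fraction scales it up to the whole web; this assumes the two engines sample pages independently of each other. The function name and the figures in the usage line are hypothetical, chosen only for easy arithmetic, and are not measurements from the slides.

```python
def capture_recapture_estimate(se1: int, qse2: int, ovr: int) -> float:
    """Estimate the size of the web as (SE1 x QSE2) / OVR.

    se1  -- number of pages indexed by the first search engine
    qse2 -- number of pages returned by the second search engine for typical queries
    ovr  -- number of pages returned by both search engines for those queries
    """
    if ovr <= 0:
        # With no overlap the coverage ratio OVR / QSE2 is zero and the estimate is undefined.
        raise ValueError("OVR must be positive")
    # OVR / QSE2 estimates the fraction of the web indexed by SE1, so
    # SE1 / (OVR / QSE2) = (SE1 x QSE2) / OVR scales SE1 up to the whole web.
    return se1 * qse2 / ovr


# Hypothetical figures, for illustration only.
print(capture_recapture_estimate(se1=4_000_000_000, qse2=1_000, ovr=400))  # prints 10000000000.0
```

Because OVR sits in the denominator, a small overlap makes the estimate very sensitive to sampling noise. Correlation between the two engines' indexes (for example, both favouring popular pages) tends to inflate OVR and so underestimate the size of the web.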

Slide 2.5 Tabular Data versus Web Data
Figure 2.1: A database table versus a web site

Slide 2.6 Structure of the web
Figure 2.2: Map of the Internet (1998)

Slide 2.7 Structure of the web
Figure 2.3: Web pages related to dcs.bbk.ac.uk (see www.touchgraph.com)

Slide 2.8 Structure of the web
Figure 2.4: Bow-tie shape of the web

Slide 2.9 The small-world web
• Over 75% of the time there is no directed path from one random web page to another.
• When a directed path exists, its average length is 16 clicks.
• When an undirected path exists, its average length is 7 clicks.
• A short average path length between pairs of nodes is characteristic of a small-world network.

Slide 2.10 Web information seeking strategies
• Direct navigation – enter the URL directly into the browser.
• Navigation within a directory – use a web portal as an entry point to the web.
• Information seeking on the web is problematic, and more users are turning to search engines.

Slide 2.11 Navigation using a search engine
Figure 2.5: Information seeking

Slide 2.12 A taxonomy of web searches
• Informational – acquire some information about a topic from web pages.
• Navigational – find a site to start navigation from.
• Transactional – perform some activity mediated by a web site.

Slide 2.13 Web search versus Information Retrieval
• The scale of web search is far beyond that of traditional information retrieval.
• The web is very dynamic.
• The web contains an enormous amount of duplication.
• The quality of web pages is not uniform.
• The range of topics on the web is open.
• The web is globally distributed.
• Users’ typical habits are different (short queries, inspecting only the top-10 pages).
• The web is hypertextual.

Slide 2.14 Information retrieval evaluation
Figure 2.6: Recall versus precision

Slide 2.15 Differences between global and local search
• Local search engines on web sites have a bad reputation.
• Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the site’s local search engine.
• Many companies do not invest in local search.
• Content management is a problem.
• Language may be a problem.
• Information needs on web sites may be different.

Slide 2.16 Differences between search and navigation
• Search – employing a search engine to find information.
• Navigation (or surfing) – employing a link-following strategy to find information.
• The web encourages a combination of search, navigation and browsing.
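Path statistics like those on Slide 2.9 can be computed with breadth-first search over the link graph. The minimal Python sketch below assumes a web graph held as a dictionary from each page to the pages it links to, and computes three quantities on a made-up five-page graph: the fraction of ordered page pairs with no directed path, the mean directed path length in clicks, and the mean path length when link direction is ignored. The graph and all function names are hypothetical; this illustrates the idea and is not the procedure behind the figures on the slide.

```python
from collections import deque

# Hypothetical five-page link graph: page -> pages it links to.
links = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "e": ["a"]}
nodes = set(links) | {v for targets in links.values() for v in targets}


def bfs_distances(adj, source):
    """Shortest path length, in clicks, from source to every page reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in adj.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def undirected(adj):
    """Copy of the graph in which every link can be followed in both directions."""
    und = {}
    for u, targets in adj.items():
        for v in targets:
            und.setdefault(u, set()).add(v)
            und.setdefault(v, set()).add(u)
    return und


def path_stats(adj, pages):
    """Return (fraction of ordered pairs with no path, mean length of existing paths)."""
    missing, lengths = 0, []
    for s in pages:
        dist = bfs_distances(adj, s)
        for t in pages:
            if t == s:
                continue
            if t in dist:
                lengths.append(dist[t])
            else:
                missing += 1
    pairs = len(pages) * (len(pages) - 1)
    mean = sum(lengths) / len(lengths) if lengths else float("nan")
    return missing / pairs, mean


print(path_stats(links, nodes))              # directed links
print(path_stats(undirected(links), nodes))  # link direction ignored
```

On this toy graph, 7 of the 20 ordered pairs have no directed path (a fraction of 0.35) and directed paths average about 1.92 clicks, while ignoring direction connects every pair with an average of 1.6 clicks. As on the slide, dropping link direction makes paths both more common and shorter.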
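Figure 2.6 on Slide 2.14 plots recall against precision, the two standard measures of retrieval effectiveness. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. A minimal Python sketch of both can be written as follows, together with precision over only the top k results, which connects to the observation on Slide 2.13 that users inspect only the top-10 pages. The function names and document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|; recall = |retrieved ∩ relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k ranked results that are relevant (divides by k even if fewer are returned)."""
    relevant = set(relevant)
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical query: four pages retrieved in rank order, three pages actually relevant.
ranking = ["p1", "p2", "p3", "p4"]
relevant = ["p2", "p4", "p7"]
print(precision_recall(ranking, relevant))     # 2 hits: precision 2/4, recall 2/3
print(precision_at_k(ranking, relevant, k=2))  # 1 relevant page in the top 2
```

Returning more results can only keep or raise recall but usually lowers precision, which is the trade-off a recall-versus-precision curve captures.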