The Search for Quality: productive Web searching
John Cox
James Hardiman Library, NUI, Galway

The Problem
• 7.3 million new Web pages daily
• Quality varies, mainly due to ease of publication and lack of checks
• Quality is in the eye of the beholder
• Over-dependence on general search engines
• Simplistic use of search tools

Some Usage Findings
• NUI, Galway Library survey, March 2000:
  • Search engines cited by 79 out of 167 respondents
  • Exclusively used for, eg Nazism, defamation law, hepatitis C
  • Less than 50% satisfied
• Other surveys show very simplistic use:
  • 33% users enter one word only
  • Further 33% users enter two words only
• UK survey indicates 80% searchers waste some time
• US survey shows “search rage” within 12 minutes

Key Question
“How much better than users are information staff at finding high-quality information on the Web and what leadership do we provide?”
• 5 key actions needed

5 Key Actions
• Get the best from the search engines
• Go vertical: subject-specific sources
• Take time to experiment, eg helper software
• Exploit the invisible Web
• Actively promote quality searching

1. Get the Best from the Search Engines
• Understand how they work
• Know their limitations
• Use advanced features
• Search more than one
• Know when not to use them

Search Engine Components
• Crawler: follows links
• Indexer: builds database
• Query processor: lets us search

Common Limitations
• Profit-oriented
• Paid entries listed at top
• Out of date
• Partial site indexing
• Technically must exclude many sites, eg
  • Password-protected
  • Registration needed
  • Database-driven
• Hidden search facilities

Understanding Google
Strengths
• Coverage
• Cached pages
• File types, eg PDF, .doc, .ppt
• Relevance: link popularity
• Beyond pages: images, newsgroups
Weaknesses
• Poor Boolean support
• No truncation
• Limited date searching
• Invisible search facilities
• Two pages per site displayed by default

Google: coverage

Google: search modes
• Basic
• Advanced

Google: file types

Google: newsgroup search

Google: cached pages 1

Google: cached pages 2

Google: Boolean limitations 1
• Correct syntax: medline OR embase

Google: Boolean limitations 2
• Correct syntax: medline -embase (or use Advanced Search)

Google: no truncation
• Use clinton (tax OR taxes OR taxation)

Google: few date limits

Google: hidden features 1
• Discovered at www.searchengineshowdown.com (buried in Google help)

Google: hidden features 2
• Partial URL v Specific Site Search: Not possible on Advanced Search despite “Domains” limit

Other Search Engines
• Always worth searching more than one, eg
  • All the Web (FAST)
  • AltaVista
  • Lycos/HotBot
  • Northern Light (?)
• Overlap may be limited
• Different ranking criteria

2. Go Vertical: specific tools
Type          Example(s)
Region        Doras, Yahoo Australia & NZ
Domain        SearchEdu.com
Genre         Newsindex
Discipline    EEVL, LawCrawler
Subject       Politicalinformation.com

Horses for Courses 1

Horses for Courses 2

Horses for Courses 3

3. Experimentation
• Try out “add-on” search software, eg
  • BullsEye Pro
  • Copernic
  • Copernic Summariser

BullsEye Pro: searching

BullsEye Pro: Webliographies

Copernic

Copernic Summariser

4. Explore the “Invisible Web”
• Material, often of high quality, that general search engines can’t or won’t index
  • Unlinked pages
  • Non-HTML file types, eg audio, video, PDF
  • Authenticated sites
  • Databases
• Much greater in size than visible Web

invisibleweb.com

invisible-web.net

WebData

Librarians’ Index to the Internet

5. Promote Quality Searching
• Old sources
• Old habits
• New media

Old Sources

Old Habits
• Search strategy formulation
• Concept analysis
• Flexibility
• Patience
• Critical appraisal of search hits
• Critical source selection

New Media
• Library Web Site
• Enewsletter
• Weblog
• http://www.hw.ac.uk/libWWW/irn/irn.html

Towards a Brighter Future
• Automatically-generated, accurate metadata
• Smarter search engines
  • More quality-sensitive
  • More penetrative
• XML: structured data

References
• Sherman, Chris and Price, Gary. The invisible Web: uncovering information sources search engines can't see. Medford, N.J.: Information Today, 2001. ISBN 091096551X. (accompanying database at http://invisible-web.net)
• Search Engine Watch: http://www.searchenginewatch.com
• Search Engine Showdown: http://www.searchengineshowdown.com
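
Appendix: query-building sketch

The syntax workarounds shown on the "Google: Boolean limitations" and "Google: no truncation" slides (uppercase OR, a leading minus to exclude a term, and spelling out word variants in place of truncation) can be sketched as a small helper. This is only an illustration: the function names `expand_terms` and `build_query` are hypothetical and not part of any Google tool, and the sketch simply assembles the query strings quoted on the slides.

```python
def expand_terms(variants):
    """Join word variants with OR inside parentheses.

    Stands in for truncation (e.g. tax*), which the slides note
    Google does not support.
    """
    return "(" + " OR ".join(variants) + ")"


def build_query(required, any_of=(), excluded=()):
    """Assemble a query string from required terms, OR-ed variants,
    and terms to exclude with a leading minus sign (hypothetical helper)."""
    parts = list(required)
    if any_of:
        parts.append(expand_terms(any_of))
    # A plain hyphen-minus directly before the term marks it as excluded.
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)


# The two examples quoted on the slides:
print(build_query(["clinton"], any_of=["tax", "taxes", "taxation"]))
print(build_query(["medline"], excluded=["embase"]))
```

The first call prints `clinton (tax OR taxes OR taxation)` and the second prints `medline -embase`, matching the syntax given on the slides.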