Week 9
Search Engines and the Invisible Web

Resource Pages
• Collections of links
• Compiled by “experts”
• Sometimes annotated
• Targeted information for a specific user group
Examples:
• Voice of the Shuttle: http://vos.ucsb.edu/
• Computer Science Research Guide: http://guides.library.cmu.edu/SCS

Anatomy of a Search Engine
Basically, there are three parts to a search engine:
• “Spider” or “Crawler”
  - Finds the pages
  - Brings them home
• “Index” or “Database”
  - Storehouse of pages
  - Size matters, frequency of updates matters
• “Search Tool”
  - What we use to find the pages in the engine’s index
  - This is the user interface; the only part we see

How Search Engines Rank Pages
• Relevance retrieval
• Location of search terms
• Frequency of search terms
• Meta-tags (in the HTML source code of a Web page)

Other Ranking Methods
• Positions of words
• Term co-occurrence
• Proximity
• Pay for placement
• “Featured Web Sites!”
• Link analysis
Search Engine Showdown Chart: http://www.searchengineshowdown.com/features/

What Many Search Engines Cannot Find
• Some file types: some engines can index them, some cannot
• Dynamically-generated pages
• Pages locked behind firewalls or in fee-based online databases (such as Dialog)
• Lots of the “Deep Web” stuff: http://www.completeplanet.com

Differences Between the “Deep Web” and Search Engine Results
The Deep Web is another phrase for the Invisible Web.
Deep Web resources usually:
• Are subject specific / more focused
• Have less content, but it tends to be of higher quality
• Are updated more frequently
• Have specialized search interfaces
• Have a target audience in mind

Overview of the Deep Web
What is still invisible:
• Disconnected, loose pages
• Password-protected pages and sites
• Pages excluded by “robots.txt” files
• Dynamically-created pages: no static URLs
• Information bound in database structures that are uncrawlable by many search engines

When to Consider the Deep Web
• When you are familiar with a topic
• When you want authoritative information
• When you want specific information
• When you want timely information

Popular Deep Web Information
• Clinical trials
• Environmental information
• Grant information
• Historical documents and images
• Art collections
• Patents
• Demographic and economic data
• Government information

Look at Some Deep Web Resources
• Salary.com Database: http://www.ecomponline.com/
• U.S. Patent & Trademark Office: http://www.uspto.gov
• Los Angeles Municipal Code: http://www.municode.com/Library/clientCodePage.aspx?clientID=6662

How to Find the Deep Web
• Use a search engine: search “database” as a term
• Use a print directory: try OCLC WorldCat to find those specific to your subject need
• Ask your colleagues
• Take note of resources mentioned in the professional literature

How to Find the Deep Web (cont.)
Use alerting services:
• The Scout Report (Internet Scout Project): http://scout.wisc.edu/
• INFOMINE: http://infomine.ucr.edu/

Evaluation of Web-Based Information
Continuously evaluate as you look at “information” on the free Web. The key principles to look for are:
• Currency / timeliness
• Authenticity
• Objectivity
• Completeness and accuracy
• Verifiability
Example: Thinking Critically about Web 2.0 and Beyond
http://www2.library.ucla.edu/libraries/college/11605_12008.cfm

Staying Current
• Subscribe to alerting services for Deep Web resources
• Look at reviewing tools
• Research Buzz: http://www.researchbuzz.org/wp/
• Search Engine Watch and Search Engine Report: http://searchenginewatch.com/

Search Engines Don’t Find Information: People Do!
• Use the right combination of tools for the job, including offline (paper) resources
• Use the right tools the best way possible
• Sometimes a search engine, Deep Web resource or other Web finding tool is not appropriate to the information need
A “good” search engine is one that finds what you want.
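
Sketch: index, search tool, and frequency ranking
The “Index” and “Search Tool” parts of a search engine, together with ranking by frequency of search terms, can be illustrated with a toy example. This is only a minimal sketch, not how any real engine works: the pages in PAGES are made-up stand-ins for what a crawler would bring home, the example.com URLs are placeholders, and the function names (tokenize, build_index, search) are hypothetical. Real engines combine many more signals, such as location of terms, proximity, and link analysis.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for pages a "spider" has already fetched.
PAGES = {
    "http://example.com/a": "Deep web databases hold patents and clinical trials",
    "http://example.com/b": "Search engines crawl the web and index web pages",
    "http://example.com/c": "Art collections and historical documents",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> Counter of {url: occurrences}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def search(index, query):
    """Rank pages by how often the query terms occur in them."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "web"))
```

Here page “b” ranks above page “a” for the query “web” because the term appears twice in “b” and once in “a”. A page that was never crawled (for example, one behind a password) never enters the index, so no query can find it.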