Week 9
Search Engines and the Invisible Web
Resource Pages
• Collections of Links
• Compiled by “experts”
• Sometimes annotated
• Targeted Information for a Specific User Group
Examples:
Voice of the Shuttle: http://vos.ucsb.edu/
Computer Science Research Guide: http://guides.library.cmu.edu/SCS
Anatomy of a Search Engine
Basically, there are three parts to a search engine:
• “Spider” or “Crawler”
- Finds the pages
- Brings them home
• “Index” or “Database”
- Storehouse of pages
- Size matters, and so does how often it is updated
• “Search Tool”
- What we use to find the pages in the engine’s index
- This is the user interface, the only part we actually see
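The three parts can be sketched as a toy Python pipeline. This is a minimal illustration, not how any real engine is built: the in-memory `WEB` dictionary is a hypothetical stand-in for fetching pages over HTTP, and `crawl`, `build_index` and `search` are illustrative names. The spider follows links outward from a seed page, the index maps each word to the pages that contain it (an inverted index), and the search tool intersects those page sets for a query.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# It stands in for real HTTP fetches so the sketch runs offline.
WEB = {
    "a.html": ("search engines crawl the web", ["b.html", "c.html"]),
    "b.html": ("the invisible web is not crawled", ["a.html"]),
    "c.html": ("deep web databases need their own search", []),
    "orphan.html": ("a disconnected page nobody links to", []),
}

def crawl(seed):
    """Spider: follow links breadth-first from a seed page and bring pages home."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search tool: return the pages that contain every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = crawl("a.html")
index = build_index(pages)
print(sorted(search(index, "web")))   # ['a.html', 'b.html', 'c.html']
print(search(index, "disconnected"))  # set()
```

Because `orphan.html` has no incoming links, the crawl never reaches it, so no query can return it even though it exists.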
How Search Engines Rank Pages
• Relevance retrieval
• Location of search terms
• Frequency of search terms
• Meta-tags (in the HTML source code of a Web page)
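A toy scoring function makes frequency, location and meta-tags concrete. This is a simplified sketch under stated assumptions: the weights (`TITLE_WEIGHT`, `BODY_WEIGHT`, `META_WEIGHT`) and the page dictionaries are hypothetical, and real engines combine many more signals using weights they do not publish.

```python
# Hypothetical weights for illustration only; real engines do not publish theirs.
TITLE_WEIGHT = 3.0
META_WEIGHT = 2.0
BODY_WEIGHT = 1.0

def score(page, terms):
    """Add up simple relevance signals for each query term on one page."""
    title = page["title"].lower().split()
    body = page["body"].lower().split()
    # Treat meta keywords as a bag of words (commas become spaces).
    meta = page.get("meta_keywords", "").replace(",", " ").lower().split()
    total = 0.0
    for term in terms:
        t = term.lower()
        total += TITLE_WEIGHT * title.count(t)  # location: a title hit counts more
        total += BODY_WEIGHT * body.count(t)    # frequency: every body hit adds up
        if t in meta:                           # meta-tag keywords
            total += META_WEIGHT
    return total

def rank(pages, query):
    """Return page URLs ordered from highest to lowest score."""
    terms = query.split()
    ordered = sorted(pages, key=lambda p: score(p, terms), reverse=True)
    return [p["url"] for p in ordered]

pages = [
    {"url": "p2", "title": "Library news", "body": "the web and the deep web"},
    {"url": "p1", "title": "Deep Web guide", "body": "finding deep web databases",
     "meta_keywords": "deep web, databases"},
]
print(rank(pages, "deep web"))  # ['p1', 'p2']
```

Page `p2` mentions “web” more often in its body, but `p1` wins because the terms also appear in its title and meta keywords.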
Other Ranking Methods
• Positions of Words
• Term Co-Occurrence
• Proximity
• Pay for Placement (“Featured Web Sites!”)
• Link Analysis
Search Engine Showdown Chart:
http://www.searchengineshowdown.com/features/
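Two of these methods can be sketched in a few lines of Python. `min_distance` measures proximity, and because it returns `None` when either term is missing it also serves as a simple co-occurrence check. For link analysis the sketch swaps in PageRank, the algorithm published by Google's founders, as one well-known example; it is not necessarily what every engine uses. The damping factor of 0.85, the iteration count and the three-page link graph are assumptions chosen for illustration.

```python
def min_distance(words, a, b):
    """Proximity: smallest gap, in word positions, between two distinct terms.

    Returns None if either term is absent (i.e. the terms do not co-occur).
    """
    last_a = last_b = best = None
    for i, w in enumerate(words):
        if w == a:
            last_a = i
        elif w == b:
            last_b = i
        else:
            continue
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best

def pagerank(links, damping=0.85, iterations=50):
    """Link analysis: a page ranks higher when highly ranked pages link to it."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = [q for q in links[p] if q in links]
            if outs:
                # Each outgoing link passes on an equal share of p's rank.
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
            else:
                # A page with no outgoing links spreads its rank over every page.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

text = "invisible web resources hide from the web crawler".split()
print(min_distance(text, "web", "crawler"))  # 1

links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))  # ['b', 'c', 'a']
```

Page `b` comes out on top because both other pages link to it.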
What Many Search Engines Cannot Find
• Some file types: some engines can index them, some cannot
• Dynamically generated pages
• Pages locked behind firewalls or in fee-based online databases (such as Dialog)
• Much of the “Deep Web”:
http://www.completeplanet.com
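Why do dynamically generated pages stay hidden? A simple crawler follows `<a href>` links but does not type queries into search forms, so result pages that exist only after a query is submitted are never fetched. The sketch below uses Python's standard `html.parser` module on a hypothetical page (`PAGE`); `LinkExtractor` is an illustrative name, and real crawlers may behave differently.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, which is all a simple crawler follows."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            # Record the form but never submit it: its result pages only
            # exist once someone types in a query.
            self.forms.append(attrs.get("action"))

# Hypothetical front page of a searchable database.
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="/help.html">Help</a>
  <form action="/search" method="get">
    <input name="q"><input type="submit" value="Search patents">
  </form>
</body></html>
"""

parser = LinkExtractor()
parser.feed(PAGE)
print(parser.links)  # ['/about.html', '/help.html']
print(parser.forms)  # ['/search']
```

The crawler sees the About and Help pages, but everything behind `/search` remains invisible to it.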
Differences Between the “Deep Web” and Search Engine Results
The Deep Web is another name for the Invisible Web.
Deep Web resources are usually:
• Subject specific / more focused
• Smaller, but often of higher quality
• Updated more frequently
• Searchable through specialized search interfaces
• Aimed at a particular target audience
Overview of the Deep Web
What is Still Invisible:
• Disconnected, loose pages that no other page links to
• Password-protected pages and sites
• Pages excluded from crawling by “robots.txt” files
• Dynamically created pages that have no static URLs
• Information bound in database structures that many search engines cannot crawl
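A “robots.txt” file is a voluntary convention: a site lists paths that crawlers should skip, and well-behaved crawlers honor it. Python's standard `urllib.robotparser` module can check a URL against such rules. The robots.txt content, the `example.org` URLs and the `ExampleBot` user agent below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a members area
# and out of dynamically generated search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://example.org/index.html",
    "https://example.org/members/profile.html",
    "https://example.org/search?q=patents",
]:
    # can_fetch() reports whether a crawler with this user agent may fetch the URL.
    print(url, rp.can_fetch("ExampleBot", url))
```

Only the first URL is allowed; a crawler that respects the file never indexes the other two, so they stay invisible to that engine.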
When to Consider the Deep Web
• When you are familiar with a topic
• When you want authoritative information
• When you want specific information
• When you want timely information
Popular Deep Web Information
• Clinical Trials
• Environmental Information
• Grant Information
• Historical Documents and Images
• Art Collections
• Patents
• Demographic and Economic Data
• Government Information
Look at Some Deep Web Resources
• Salary.com Database
http://www.ecomponline.com/
• U.S. Patent & Trademark Office
http://www.uspto.gov
• Los Angeles Municipal Code
http://www.municode.com/Library/clientCodePage.aspx?clientID=6662
How to Find the Deep Web
• Use a search engine: add “database” as a search term
• Use a print directory: search OCLC WorldCat for directories specific to your subject
• Ask your colleagues
• Watch for mentions in the professional literature
How to Find the Deep Web (cont.)
Use Alerting Services:
• The Scout Report (Internet Scout Project)
http://scout.wisc.edu/
• INFOMINE
http://infomine.ucr.edu/
Evaluation of Web-Based Information
Continuously evaluate as you look at “information” on the free Web. The key principles to look for are:
• Currency / Timeliness
• Authenticity
• Objectivity
• Completeness and Accuracy
• Verifiability
Example: Thinking Critically about Web 2.0 and Beyond
http://www2.library.ucla.edu/libraries/college/11605_12008.cfm
Staying Current
• Subscribe to alerting services for Deep Web resources
• Look at reviewing tools
• Research Buzz:
http://www.researchbuzz.org/wp/
• Search Engine Watch and Search Engine Report
http://searchenginewatch.com/
Search Engines Don’t Find Information. People Do!
• Use the right combination of tools for the job, including offline (paper) resources
• Use the right tools in the best way possible
• Sometimes a search engine, Deep Web resource, or other Web finding tool is not the right fit for the information need
A “good” search engine is one that finds what you want.