Finding Information on the Information Highway
How to get around on the Internet
Finding information on the information highway
- The Internet vs. the World Wide Web
- Search engines
- Subject directories
- Online databases
- Boolean searches
Aren’t the Internet and the Web the same thing?

The Internet
Think of the Internet as the physical components necessary to build a [massive] computer network (nodes, cables, servers, gateways, routers, firewalls, etc.).

The World Wide Web
Think of the Web as one of the services available over the Internet, alongside email, file transfers, etc.; each service requires its own protocol (HTTP, SMTP, FTP, etc.).
The primary Internet protocols:
- TCP/IP: Transmission Control Protocol / Internet Protocol
- File transfers: File Transfer Protocol (FTP)
- Web pages: HyperText Transfer Protocol (HTTP)
- Email: Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP)
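
As a rough illustration of "each service has its own protocol," the sketch below reads the scheme at the front of a URL and names the matching protocol from the list above. The port numbers are the standard well-known defaults and are not part of the original slides; the `describe` helper and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Protocol names come from the list above; the default port numbers are
# an addition for illustration (standard well-known ports).
PROTOCOLS = {
    "ftp": ("File Transfer Protocol", 21),
    "smtp": ("Simple Mail Transfer Protocol", 25),
    "http": ("HyperText Transfer Protocol", 80),
    "pop3": ("Post Office Protocol", 110),
}

def describe(url: str) -> str:
    """Name the application protocol implied by a URL's scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in PROTOCOLS:
        return f"{scheme or '(none)'}: not in this table"
    name, port = PROTOCOLS[scheme]
    return f"{scheme}: {name}, default port {port}"

print(describe("http://www.example.com/index.html"))
print(describe("ftp://ftp.example.com/pub/readme.txt"))
```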
Finding information on the Internet:
Search Engines
Search engines consist of 3 basic components:
- A spider (aka crawler/bot): a program that crawls across the Web collecting info
- A database: organized by an indexer program
- Search engine software: pulls hits based on your inquiry
The search process:
1. The user enters key words or phrases.
2. The search engine software searches the database index to find matching items.
3. The software returns “hits” (results).
[The hits are prioritized according to multiple factors.]
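
The components and the search process can be sketched with a toy example. This is a minimal illustration, not how any real search engine is built: the `PAGES` dictionary stands in for what a spider would have collected, and the function names and URLs are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider would collect.
PAGES = {
    "example.com/cars": "ford builds a car and a truck",
    "example.com/ford": "gerald ford was a president",
    "example.com/bikes": "a bike is not a car",
}

def build_index(pages):
    """Indexer: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search software: return the pages that contain every keyword."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(PAGES)
print(search(index, "car"))
print(search(index, "car ford"))
```

Note that the query never touches the pages themselves: it is answered entirely from the index built beforehand.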
A search engine is just a program: nothing validates or authenticates the results, and no human review takes place.
Why does the same search in different search engines get different results?

- Each search engine uses different algorithms or “spiders.”
- Each engine has a different method for ranking or relevance. These might be based on factors such as:
    - Frequency: How many times do the words occur in the website?
    - Location: Are keywords contained in the URL or the site name?
- The hits are dependent on database content. Each engine may search different sites.
    - Is the search being conducted across the entire web?
    - Is this a specialty search?
Bottom line? Use more than one
search engine to perform research!
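
To see why two engines can order the same pages differently, here is a small sketch that scores pages using only the frequency and location factors mentioned above. The weighting (`location_bonus`) is an assumption made up for this example; real engines do not publish their formulas, and the pages are hypothetical.

```python
# Hypothetical pages; the text is what an engine has stored for each URL.
PAGES = {
    "example.com/ford": "gerald ford was a president",
    "example.com/cars": "ford ford ford sells a ford car",
    "example.com/bikes": "a bike is not a car",
}

def score(url, text, keyword, location_bonus):
    """Frequency: count of the keyword in the text.
    Location: extra points if the keyword appears in the URL."""
    keyword = keyword.lower()
    frequency = text.lower().split().count(keyword)
    bonus = location_bonus if keyword in url.lower() else 0
    return frequency + bonus

def rank(pages, keyword, location_bonus):
    """Return matching URLs, best score first (ties broken by URL)."""
    scored = [(score(u, t, keyword, location_bonus), u) for u, t in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for points, url in ordered if points > 0]

# "Engine A" values keywords in the URL highly; "Engine B" ignores the URL.
print(rank(PAGES, "ford", location_bonus=5))
print(rank(PAGES, "ford", location_bonus=0))
```

The two calls search the same pages for the same word, yet the top hit differs because the two hypothetical engines weigh location differently.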
Some of the factors search engines may use to rank results:

Factors based on the site itself:
- Frequency
- Location
- Page count
- Website structure

Factors based on external criteria:
- Link popularity
- Click popularity
- Demographics
- Alliances
- $$$ (who pays the most to have their sites shown)
What are “metasearch” engines?
- Search engines that search other search engines instead of individual websites
- Think how much wider the search area is!
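
The idea can be sketched as a function that forwards one query to several engines and merges what comes back. The two "engines" here are hypothetical stand-ins that return fixed lists; ordering the merged hits by how many engines returned them is one plausible choice for this example, not a description of any particular metasearch product.

```python
from collections import Counter

# Hypothetical engines: each takes a query and returns a list of URLs.
def engine_a(query):
    return ["example.com/cars", "example.com/ford"]

def engine_b(query):
    return ["example.com/ford", "example.org/autos"]

def metasearch(query, engines):
    """Send the query to every engine and merge the hits.
    URLs returned by more engines come first; ties keep first-seen order."""
    votes = Counter()
    first_seen = {}
    for engine in engines:
        for url in dict.fromkeys(engine(query)):  # drop repeats within one engine
            votes[url] += 1
            first_seen.setdefault(url, len(first_seen))
    return sorted(votes, key=lambda url: (-votes[url], first_seen[url]))

print(metasearch("ford car", [engine_a, engine_b]))
```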
Finding information on the Internet:
Subject Directories
How do subject directories differ from search engines?
- Utilize the human element to categorize
- Typically more commercial/consumer oriented
- “Drill-down” search by subject, not keywords
- Hierarchical organization: topics, then subtopics
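
A drill-down search can be pictured as walking a tree of topics and subtopics. The sketch below uses a nested dictionary as a made-up directory; the category names and URLs are hypothetical and only illustrate the hierarchy.

```python
# A hypothetical, hand-built directory: topics contain subtopics,
# and the innermost level holds the listed sites.
DIRECTORY = {
    "Recreation": {
        "Autos": {
            "Classic Cars": ["example.com/classic-fords"],
        },
        "Cycling": ["example.com/bikes"],
    },
    "Science": {
        "Biology": ["example.org/biology-intro"],
    },
}

def drill_down(directory, path):
    """Follow a list of topic names from the top of the directory."""
    node = directory
    for topic in path:
        node = node[topic]
    return node

# Browse by subject: pick a topic, look at its subtopics, pick again.
print(sorted(drill_down(DIRECTORY, ["Recreation"])))
print(drill_down(DIRECTORY, ["Recreation", "Autos", "Classic Cars"]))
```

No keywords are involved at any step: the user only chooses among the categories an editor has already defined.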
A great resource on subject directories:
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SubjDirectories.html
Finding information on the Internet:
Online Databases
Online databases are referred to as the hidden Internet or deep web.

Online databases provide access to resources outside the reach of web crawlers or search spiders:
- Newspapers, journals, periodicals
- Academic papers, white papers
- Corporate data and specialty data
[Figure: About, Yahoo!, Library Index]
A great resource for databases:
http://www.itc.nl/Pub/Home/library/Library-generalinformation/more-info-databases/lii_info.html
Making the most of your search
Do your searches end up returning an
overwhelming number of hits?
Use Boolean operators to “tweak” them!
The basic Boolean operators: AND, OR, NOT
Examples of how Boolean operators affect your search:

Example: car AND ford
Returns: Documents containing BOTH the words car and ford
         (AND is assumed when 2 words are used)

Example: car OR ford
Returns: Documents containing either word or both words
         (OR results in the greatest number of hits)

Example: car NOT ford
Returns: Documents containing car that do NOT contain ford
         (NOT generally returns the smallest number of hits)
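
The three operators behave like set operations on the documents that contain each word. The following sketch shows this with a handful of made-up documents; the `DOCS` data and the `containing` helper are hypothetical.

```python
# Hypothetical documents, keyed by an ID.
DOCS = {
    1: "ford builds a car",
    2: "my car is red",
    3: "gerald ford was a president",
    4: "a bike has two wheels",
}

def containing(word):
    """IDs of the documents that contain the given word."""
    word = word.lower()
    return {doc_id for doc_id, text in DOCS.items() if word in text.lower().split()}

car = containing("car")
ford = containing("ford")

car_and_ford = car & ford   # AND: intersection
car_or_ford = car | ford    # OR: union
car_not_ford = car - ford   # NOT: difference

print(sorted(car_and_ford), sorted(car_or_ford), sorted(car_not_ford))
```

Because a union can never be smaller than either of its inputs, OR always returns at least as many hits as AND or NOT on the same two words.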
Ways to further refine your search:

Combinations
Example: (car AND ford) NOT Gerald
Returns: Documents containing BOTH the words car and ford but nothing about President Gerald Ford

Quotes
Example: “Men In Black”
Returns: Documents containing the exact string of words within the quotes, not any occurrence of any of the words

Wildcard (*)
Example: Bio*
Returns: Documents that contain any words that begin with the letters bio (biology, biography, biotech, etc.)

Wildcard (%, ?)
Example: Smithw%ck
Returns: % stands for any letter; great when a word may be spelled in a variety of ways (Smithwick, Smithwyck, Smithweck)
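
Quoted phrases and wildcards can be approximated with regular expressions. This is only a sketch of the matching behavior described above: the helper names are hypothetical, the exact wildcard symbols vary by search engine, and `\w` here also matches digits and underscores, not just letters.

```python
import re

def phrase_match(phrase, text):
    """Quotes: the words must appear together, in order."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def wildcard_to_regex(term):
    """Translate * (any run of word characters) and % or ?
    (exactly one word character) into a regular expression."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch in "%?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def wildcard_match(term, text):
    """Return every word in the text that fits the wildcard term."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]

print(phrase_match("Men In Black", "I saw Men in Black yesterday"))
print(wildcard_match("Bio*", "Biology and biography, not symbiosis"))
print(wildcard_match("Smithw%ck", "Smithwick Smithwyck Smithweck Smithwiick"))
```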
More considerations
(these may vary based on the search engine used)

Stopwords
Ignored by the search engine (a, an, the, of, by, with, for, to, etc.)

Keys (not concepts)
Break phrases down into keywords
Example: (TQM in manufacturing assembly lines)
Becomes: (total quality management, TQM, production, manufacturing, assembly line production, etc.)

Proximity operators
Designate how close keywords should be

Variety
Change spellings; try abbreviations, singular/plural forms, related terms, synonyms
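
Two of these considerations, stopwords and proximity, are easy to sketch. The stopword list below is the one given above; the `near` function and its `max_gap` parameter are hypothetical, since each search engine defines its own proximity operator (if it has one at all).

```python
# Stopwords taken from the list above.
STOPWORDS = {"a", "an", "the", "of", "by", "with", "for", "to"}

def keywords(query):
    """Drop stopwords, keeping the words an engine would actually search on."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

def near(text, first, second, max_gap):
    """Proximity: True if the two words occur within max_gap words of each other."""
    words = text.lower().split()
    first, second = first.lower(), second.lower()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap for i in pos_first for j in pos_second)

print(keywords("the history of the Internet"))

text = "total quality management in manufacturing"
print(near(text, "quality", "manufacturing", max_gap=3))
print(near(text, "quality", "manufacturing", max_gap=2))
```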