Basics
Computer
Internet
Search
Strategy
Computer Basics




• IP address: Internet Protocol address
• An identifier for a computer or device on a network
• The format of an IP (version 4) address is four numbers separated by periods. Each number can range from 0 to 255. For example, 134.140.112.9
• Can be static or assigned dynamically ("on the fly")
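
The four-number format described above can be checked mechanically. The following is a minimal sketch, assuming Python and its standard `ipaddress` module; the helper name `is_valid_ipv4` is hypothetical and does not come from the slides.

```python
import ipaddress

def is_valid_ipv4(text: str) -> bool:
    """Return True if text is four dot-separated numbers, each from 0 to 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False

print(is_valid_ipv4("134.140.112.9"))    # True: the example address from the slide
print(is_valid_ipv4("134.140.112.256"))  # False: 256 is out of range
```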
Internet Basics
The Internet vs. the World Wide Web
• The Internet is not synonymous with the World Wide Web

Internet Basics




• The Internet is a network of computer networks
• Computers are connected so that each can communicate with any other computer also connected, or networked, to the Internet
• Used for communicating many kinds of information using protocols, including SMTP for email and HTTP for web pages
• The World Wide Web is only one part of the Internet, which also includes email, newsgroups, and instant messaging
Internet Basics





• The World Wide Web is a “web” of documents, called web pages, connected via hyperlinks
• One way of communicating on the Internet
• Uses the HTTP protocol
• Accessed via browsers, such as Internet Explorer or Netscape
• Web pages can include graphics, audio, text, and video
Internet Basics
Web pages vs. websites
• A web page is a document on the World Wide Web
• A website is a collection of web pages: a home page (the main page of the site and usually the first to be viewed) plus additional, related, hyperlinked pages

Internet Basics
URL: Uniform Resource Locator
• The unique address of a web page
• Can be persistent or dynamic
• Format: http://web.simmons.edu/~krajewsk
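
To see the parts that make up the URL format above, the example address can be split with Python's standard `urllib.parse` module. This is only an illustrative sketch; the variable name `parts` is an assumption.

```python
from urllib.parse import urlparse

# Split the example URL from the slide into its components.
parts = urlparse("http://web.simmons.edu/~krajewsk")

print(parts.scheme)    # http
print(parts.hostname)  # web.simmons.edu
print(parts.path)      # /~krajewsk
```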

Internet Basics
A hyperlink is an element in a web page that links to another place in the same page or to an entirely different web page
• Click on the hyperlink to access the linked web page

Internet Basics
A domain is a group of computers that share a common name on the Internet and often a part of an IP address
• Often consists of a range of IP addresses
• Members share the same base URL
• www.simmons.edu/libraries and www.simmons.edu/gslis are both part of the simmons.edu domain
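
Whether an address belongs to a domain can be tested by comparing host names. The sketch below assumes Python's standard `urllib.parse`; the helper `in_domain` is a hypothetical name introduced only for illustration.

```python
from urllib.parse import urlparse

def in_domain(url: str, domain: str) -> bool:
    """Return True if the URL's host is the domain itself or a subdomain of it."""
    # urlparse only recognizes the host when a scheme or "//" is present,
    # so bare addresses like "www.simmons.edu/gslis" get "//" prepended.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)

print(in_domain("www.simmons.edu/libraries", "simmons.edu"))  # True
print(in_domain("www.simmons.edu/gslis", "simmons.edu"))      # True
print(in_domain("www.example.com/gslis", "simmons.edu"))      # False
```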

Cache
Copies of frequently used data stored on a local hard drive
• Allows information to be accessed more quickly because it does not have to be retrieved from the Internet each time it is called

Browser
A web browser is a software application used to locate and display web pages.
• Most browsers can display text, graphics, audio, and video

Internet Basics
On the web vs. access via the web
• On the web: online, free, available to everyone
• Via the web: online, but in a special, restricted database requiring a login and/or subscription fee

Internet Basics

A search engine is a program that
searches the web for specified keywords
and returns a list of the web pages where
the keywords were found
Search Basics

Searching is the process of querying a
database—a library catalog, periodical
index, or search engine—to find relevant
information
Search Basics




• Each item in a database is called a RECORD
• All records are INDEXED by specialists or computers, which pull out key pieces of information
• Each indexed piece of information belongs to a specific FIELD, which is generally searchable (author, title, or fields specific to the subject)
• HITS are the records in the entire database that match your search terms
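
Records, fields, and hits can be pictured with a tiny in-memory database. This is a sketch under stated assumptions: the three records and the helper `search_field` are invented for illustration and do not represent any real catalog.

```python
# Each dictionary is one RECORD; each key is a searchable FIELD.
records = [
    {"author": "Smith", "title": "Caring for Dogs", "subject": "pets"},
    {"author": "Jones", "title": "Cats and Dogs", "subject": "pets"},
    {"author": "Smith", "title": "Garden Basics", "subject": "gardening"},
]

def search_field(field: str, term: str) -> list:
    """Return the records whose given field contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r[field].lower()]

hits = search_field("title", "dogs")
print(len(hits))  # 2
```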
Search Basics
Syntax – The “language” of the database you are searching
• HOW you translate your information need into a query

Search Basics



Boolean operators are connectors used to define the relationship between or among your search terms:
• OR – Either Term A or Term B must be present on a web page for it to be included in your results list
• AND – Both Term A and Term B must be present for the web page to appear on the results list
• NOT – Term A must be present and Term B must not be present for the web page to appear on the results list
[Venn diagrams: two overlapping circles labeled Dogs and Cats]
• Dogs OR Cats: gets both; there might be overlap
• Dogs AND Cats: only gets records where both appear
• Dogs NOT Cats: eliminates records where Cats appears
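
The three Boolean operators correspond to set union, intersection, and difference. The following minimal sketch assumes Python sets; the four sample pages and the helper `pages_with` are hypothetical.

```python
pages = {
    1: "dogs make loyal pets",
    2: "cats and dogs can share a home",
    3: "cats sleep most of the day",
    4: "goldfish need clean water",
}

def pages_with(term: str) -> set:
    """Return the ids of pages containing the term as a whole word."""
    return {pid for pid, text in pages.items() if term in text.split()}

dogs, cats = pages_with("dogs"), pages_with("cats")
print(sorted(dogs | cats))  # OR  -> [1, 2, 3]
print(sorted(dogs & cats))  # AND -> [2]
print(sorted(dogs - cats))  # NOT -> [1]
```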
Parentheses: Nesting
(Dog? or Pupp?) and (Cat? or Kitten?)
• First set: Dog or Dogs or Puppy or Puppies
• Second set: Cat or Cats or Kitten or Kittens
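
The nested query can be read as two OR groups joined by AND. Here is a small sketch, assuming Python and with the truncated forms written out by hand; the function name `matches` is hypothetical.

```python
dog_terms = {"dog", "dogs", "puppy", "puppies"}
cat_terms = {"cat", "cats", "kitten", "kittens"}

def matches(text: str) -> bool:
    """True if the text has at least one word from each parenthesized set."""
    words = set(text.lower().split())
    # (dog OR dogs OR ...) AND (cat OR cats OR ...)
    return bool(words & dog_terms) and bool(words & cat_terms)

print(matches("Our puppy chases the cat"))   # True
print(matches("Our puppy chases the ball"))  # False
```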
Search Basics




Proximity operators specify how close search terms must appear together in a web page to be included in the results list:
• Next to – Term A and Term B must appear right next to each other for the web page to appear in the results list
• Near – Term A and Term B must be near each other for the web page to appear in the results list
• Within # – Term A and Term B must appear within a certain number of words of each other for the web page to appear in the results list
• Same paragraph – Term A and Term B must appear in the same paragraph for the web page to appear in the results list
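
A "within #" test can be sketched by comparing word positions. This is only one possible reading: how distance is counted varies between databases, and the function `within` and the sample sentence are assumptions made for illustration.

```python
def within(text: str, a: str, b: str, n: int) -> bool:
    """True if words a and b occur at most n word positions apart."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

text = "the dog sat quietly beside the sleeping cat"
print(within(text, "dog", "cat", 1))  # False: the words are not next to each other
print(within(text, "dog", "cat", 6))  # True: "cat" is six positions after "dog"
```

With `n` set to 1 the same function behaves like a "next to" operator that ignores word order.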
Search Basics
Truncation is the use of a symbol to stand for any possible ending of a root
• Eliminates the need for long searches with similar words separated by the Boolean operator OR
• Example: Child*. The asterisk (*) can stand for any possible ending of the root child, such as child, children, childhood, child’s, children’s
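
Truncation amounts to matching on the start of a word. A minimal sketch, assuming Python; the word list and the helper `truncate_search` are hypothetical.

```python
vocabulary = ["child", "children", "childhood", "child's", "chill", "adult"]

def truncate_search(pattern: str, words: list) -> list:
    """Expand a pattern like 'child*' to every word that starts with the root."""
    root = pattern.rstrip("*").lower()
    return [w for w in words if w.lower().startswith(root)]

print(truncate_search("Child*", vocabulary))
# ['child', 'children', 'childhood', "child's"]
```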
Search Basics
Wildcard symbols can stand for any character or characters within a word
• Useful for roots that have many unrelated endings
• Example: wom?n can stand for woman, women, womon, womyn
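
Python's standard `fnmatch` module happens to use the same `?` symbol, so the example can be tried directly. Note one assumption: in `fnmatch`, `?` stands for exactly one character, whereas the wildcard symbol and its meaning differ from database to database. The helper `wildcard_search` is a hypothetical name.

```python
from fnmatch import fnmatchcase

def wildcard_search(pattern: str, words: list) -> list:
    """Return the words matching the pattern, where '?' is any single character."""
    return [w for w in words if fnmatchcase(w, pattern)]

words = ["woman", "women", "womyn", "wombat", "womn"]
print(wildcard_search("wom?n", words))
# ['woman', 'women', 'womyn']
```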

Search Basics




• Searching terms as a phrase dictates that they appear in the order specified, right next to each other, in the web page
• Sometimes automatic
• Useful in searching for short quotations
• Example: “hot cross bun” finds only web pages with that exact phrase, eliminating those that have the words hot, cross, and bun unrelated to one another
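
Phrase searching can be sketched as looking for the words in sequence. The function `contains_phrase` and the sample sentences are assumptions; the sketch also ignores punctuation, which a real search engine would have to handle.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of phrase occur in text consecutively and in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("she baked a hot cross bun today", "hot cross bun"))  # True
print(contains_phrase("a cross baker dropped a hot bun", "hot cross bun"))  # False
```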
Search Basics




• Limits restrict what part of the web page is searched
• Search engines offer only limited limiting capabilities
• Usually searches metadata, information that cannot be seen on the web page itself
• Example: Language:English finds only web pages with English-language text
Search Basics
Search Index syntax varies
• Usually no field searching
• Limited truncation and wildcards
• Boolean “AND” may be assumed
• Phrase syntax important
• Effectiveness of limits depends on the metadata included by web page creators
Search Basics



• Natural language searching is common with search engines
• No connectors (no Boolean, proximity, etc.)
• Statistical algorithm for “relevance”
  • Term frequency
  • Term location
  • Proximity of terms to each other
  • Uniqueness of term
  • Possibly “popularity” of document
  • Build taxonomy “on the fly”
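
Two of the relevance factors listed above, term frequency and uniqueness of term, can be combined into a simple score. This is a rough sketch and not the algorithm of any particular search engine: the weighting formula, the three sample documents, and the function `score` are all assumptions made for illustration.

```python
import math

docs = {
    "a": "dogs dogs dogs and cats",
    "b": "cats and birds",
    "c": "dogs and birds and fish",
}

def score(query: str, doc_id: str) -> float:
    """Sum, over query terms, of term frequency times a rarity weight."""
    words = docs[doc_id].split()
    total = 0.0
    for term in query.lower().split():
        tf = words.count(term)                              # term frequency in this document
        df = sum(term in d.split() for d in docs.values())  # documents containing the term
        if tf and df:
            total += tf * math.log(1 + len(docs) / df)      # rarer terms weigh more
    return total

ranked = sorted(docs, key=lambda d: score("dogs fish", d), reverse=True)
print(ranked)  # ['a', 'c', 'b']
```

Document "a" ranks first because it repeats "dogs" three times; "c" follows because it contains the rarer term "fish"; "b" contains neither query term.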
Search Strategies
Key Factors for Successful Web Searching:
• Which search engines/resources you choose for the search
• How carefully you formulate & execute the search terms & search logic
• How much information is actually available

Search Strategies


• Precision & Recall are traditional measures of a successful search
• Recall: % of relevant records found, out of all the relevant records (possible hits) in the file
  • “How much of the good stuff did your search produce?”, measured against all the possible relevant hits
• Precision: % of relevant records within the search results
  • “How much of the bad stuff did your search produce?”, measured against what you actually retrieved
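
Both measures reduce to simple ratios. A minimal sketch, assuming Python sets of record numbers; the function `precision_recall` and the sample numbers are hypothetical.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) as fractions between 0 and 1."""
    good_hits = retrieved & relevant
    precision = len(good_hits) / len(retrieved) if retrieved else 0.0
    recall = len(good_hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {1, 2, 3, 4, 5}  # what the search returned
relevant = {2, 4, 6, 8}      # every relevant record in the file
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=40% recall=50%
```

Here two of the five retrieved records are relevant (precision 40%), and those two are half of the four relevant records in the file (recall 50%).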
Search Strategies
Precision vs. Recall
• Usually an inverse relationship: high recall tends to come with low precision, and high precision with low recall
Search Strategies
• Precision is assured by choosing enough appropriate concepts
• Recall is assured by choosing enough appropriate synonyms

Search Strategies
Choosing Search Words
• Make a list of concrete words that define the topic
• Identify alternatives
Search Strategies
Simplify words:
• Plurals: s, es, y-ies often will automatically be searched
• Truncate (often * or !): packag* or wrap*
Search Strategies
Eliminate general and assumed/implied terms
• Leave out “science” if you are searching a science search engine
• Consider whether or not to search “efforts”, a general term implied in most articles: a web page discussing an effort to do something is likely not to include the word “efforts”
Search Strategies
Be the most specific (narrowest) when:
• Sure of target document(s)
• Don’t care about recall (precision first)
• Don’t have time to plan
• “Quick & Dirty”
Search Strategies








To narrow a search:
• “AND” in a new concept
• Use fewer terms in concept sets
• Be more specific - use proximity rather than ANDs
• Go from free text to controlled vocabulary/fields
• Truncate further right
• Use narrower, more specific vocabulary terms
• Qualify search strategies to titles, abstracts, descriptors
• Limit by language, publication year, type
Search Strategies
Be less specific (broadest) when:
• Need comprehensive retrieval (high recall)
• “Feeling your way”
  • Unsure of terms
  • Unsure of database content
  • Fuzzy topic
Search Strategies
To broaden a search:
• Eliminate a concept set - the least crucial
• OR in more terms
• Be less specific - go from descriptors to free text
• Use broader ANDs instead of proximity
• Truncate further left
• Use broader controlled vocabulary terms
• Remove qualifiers; search full text
• Remove limitations