Web Content Development
Dr. Komlodi
Classes 20-21: Search systems
Web Searching
• Search within your site:
– Full site or subsites
– www.jhu.edu, www.umbc.edu
• Web search:
– Search indexes of web pages
– www.google.com
• Metasearch:
– Searching across multiple search engines
– clusty.com, www.dogpile.com, www.myriadsearch.com
• Web Search Engine Watch:
http://searchenginewatch.com/
Does your site need a search?
• Pp. 145-148
1. Sufficient content
2. Sufficient resources
3. Time and know-how to optimize system
4. Better alternatives?
5. Will users bother with it?
6. Too much information to browse
7. Fragmented site
8. Learning tool
9. User expectations
10. Dynamism
• Post bullets on Blackboard discussion board
Why should an IA worry about search?
• You know the users
• Many decisions should be user-centered
and not technology-centered
• It has an interface
How does search work?
©2004 Google Source: http://www.google.com/technology/pigeonrank.html
How does search work?
Documents
Searchers
Search Engine
Search Interface
Queries
Indexes
Matching
Results
Indexing
(manual or automatic)
How does web search work?
Documents = Web sites & pages
Searchers
Search Engine
Search Interface
Queries
Indexes
Matching = queries to search engine indexes
Results
Indexing =
Automatic, spiders & robots crawl
websites and index pages
according to their own rules. As a
result, they build large databases
containing the indexes.
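The crawl-and-index loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular search engine works: the `PAGES` dictionary stands in for the web (a real spider would fetch pages over HTTP and follow its own crawling rules), and the names `LinkParser`, `crawl`, and the example.org URLs are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch over HTTP.
PAGES = {
    "http://example.org/": '<a href="http://example.org/a">A</a> home page',
    "http://example.org/a": '<a href="http://example.org/b">B</a> search systems',
    "http://example.org/b": '<a href="http://example.org/">home</a> web indexes',
}

class LinkParser(HTMLParser):
    """Collects link targets and visible words from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl(start_url):
    """Breadth-first crawl; returns {url: list of words found on that page}."""
    index = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in index or url not in PAGES:
            continue  # already indexed, or outside the toy web
        parser = LinkParser()
        parser.feed(PAGES[url])
        parser.close()  # flush any buffered trailing text
        index[url] = parser.words
        queue.extend(parser.links)
    return index

crawled = crawl("http://example.org/")
```

Starting from one page, the crawler reaches the other two only because links lead to them, which is why linking structure matters for being indexed.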
What to search on…
• All the content?
• Determining search zones
• Site search:
– Subsite
– Type of document
• Web search:
– Multimedia and heterogeneous
• Full-text or metadata
• Types of indexes
Indexing by
• Navigation vs. destination pages
• Audience or Reading level
• Topic
• Date of update
• Author
• Title
• User task
What would the index look like?
Full Text Indexing
• Take out frequent words from documents
• List the rest of the words from each
document
• May add frequency numbers to each
word
• Search the lists of words
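The four steps above can be sketched in a few lines. This is a minimal sketch, not a production indexer: the `STOPWORDS` set is an assumed, very short list of frequent words, and `tokenize`, `build_index`, and `search` are hypothetical helper names.

```python
from collections import Counter

# Assumed stopword list for illustration; real systems use much longer lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()

def build_index(documents):
    """Map each remaining word to {doc_id: frequency}, skipping stopwords."""
    index = {}
    for doc_id, text in documents.items():
        counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
        for word, freq in counts.items():
            index.setdefault(word, {})[doc_id] = freq
    return index

def search(index, word):
    """Return ids of documents containing the word, most frequent first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=lambda d: (-postings[d], d))

docs = {
    "d1": "The design of search systems for the web",
    "d2": "Search engines index the web; search is everywhere",
}
index = build_index(docs)
print(search(index, "search"))  # ['d2', 'd1']
```

Searching the word lists is then a dictionary lookup rather than a scan of every document, and the stored frequencies give a simple way to order the hits.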
What would the index look like?
Indexing Languages
• An index is a systematic guide designed to
indicate topics or features of documents in
order to facilitate retrieval of documents or
parts of documents.
• An Indexing language is the set of terms
used in an index to represent topics or
features of documents, and the rules for
combining or using those terms.
Web Search Engine Indexes
• The larger a web search engine’s index is,
the more web pages it can return and the
more types of queries it can accommodate
• However, quantity is just one measure of
performance
• How to compare:
• http://www.google.com/help/indexsize.html
• Try this!
Search Interface
• Shneiderman, Byrd, Croft, Clarifying Search,
D-Lib Magazine, 1997
• Formulation:
– Sources
– Fields
– What to search for
– Variants
• Action
• Review of results
• Refinement
• Let’s see Google’s advanced search
Query Format
• Boolean:
– Good for advanced users
– Precise, and clear why you got the results back
– Need to understand syntax
• “Natural Language”
– Good for difficult questions when you can’t think of terms
– Or novice users
– Difficult to know why certain results come back
– Black box
• Relevance Feedback
– User selects relevant items from results
– Search engine considers these in reformulating the query
• Similarity Retrieval
– Similar to relevance feedback
– “I want more like this”
– Both are good if you don’t know what exactly you are looking for
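One simple way a system might use relevance feedback is to add frequent terms from the user-selected items to the query. The sketch below assumes that approach; `expand_query` and its `extra_terms` parameter are hypothetical, and real engines typically weight terms and filter stopwords rather than just counting words.

```python
from collections import Counter

def expand_query(query_terms, relevant_docs, extra_terms=2):
    """Add the most frequent new terms from user-selected documents to the query."""
    counts = Counter()
    for text in relevant_docs:
        for word in text.lower().split():
            if word not in query_terms:
                counts[word] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return list(query_terms) + ranked[:extra_terms]

query = ["jaguar"]
selected = ["jaguar cat habitat", "jaguar cat rainforest habitat"]
print(expand_query(query, selected))  # ['jaguar', 'cat', 'habitat']
```

The user never types "cat" or "habitat"; selecting two relevant results is enough to steer the query away from, say, the car.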
Boolean
Natural Language
Relevance Feedback
Source: http://nayana.ece.ucsb.edu/imsearch/imsearch.html Accessed January 2007.
Similarity Retrieval
Other Query Building Tools
• Citation networks:
– What does this page/paper cite or link to?
– What cites or links to this page/paper?
– What other papers/pages cite/link to the same
papers/pages?
– http://portal.acm.org/dl.cfm
• Spell checkers in queries
• Phonetic tools
• Stemming tools
• Controlled vocabularies
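Two of these tools, spell checking and stemming, can be illustrated with the Python standard library. This is a rough sketch under assumptions: `VOCABULARY` is a made-up list of index terms, `suggest_spelling` and `naive_stem` are hypothetical helpers, and the suffix stripping is far cruder than a real stemming algorithm.

```python
import difflib

VOCABULARY = ["search", "engine", "index", "query", "result"]  # assumed index terms

def suggest_spelling(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None

def naive_stem(word):
    """Strip a few common English suffixes so word variants share one form."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(suggest_spelling("serach"))
print(naive_stem("searching"), naive_stem("indexes"), naive_stem("results"))
```

A query for "searching" and a query for "search" would then hit the same index entry, which is helpful but can surprise users if the interface does not say it is happening.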
Matching
• Boolean
– AND, OR, NOT
• Probabilistic
• Vector model (calculate weights of
words)
• Natural Language
– process the query as well, match lists
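Boolean matching and the vector model can be contrasted on a toy collection. The sketch below makes simplifying assumptions: `DOCS` is invented sample data, the helper names are hypothetical, and the vector weights are raw term counts, whereas real systems usually use more refined weights such as tf-idf.

```python
import math
from collections import Counter

DOCS = {
    "d1": "web search engine",
    "d2": "site search interface design",
    "d3": "web design",
}

# Inverted index: term -> set of ids of the documents containing it.
INDEX = {}
for doc_id, text in DOCS.items():
    for term in text.split():
        INDEX.setdefault(term, set()).add(doc_id)

def docs_with(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return INDEX.get(term, set())

# Boolean matching maps directly onto set operations.
web_and_search = docs_with("web") & docs_with("search")   # AND -> {'d1'}
web_or_design = docs_with("web") | docs_with("design")    # OR  -> all three
search_not_web = docs_with("search") - docs_with("web")   # NOT -> {'d2'}

def cosine(query, text):
    """Cosine similarity between raw term-count vectors."""
    q, d = Counter(query.split()), Counter(text.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0

def rank(query):
    """Order matching documents by decreasing similarity to the query."""
    scored = sorted(
        ((cosine(query, text), doc_id) for doc_id, text in DOCS.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [doc_id for score, doc_id in scored if score > 0]

print(rank("web search"))  # ['d1', 'd3', 'd2']
```

Boolean matching returns an unordered set that either satisfies the query or does not, while the vector model returns every partially matching document in ranked order.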
Results Presentation
• How many?
• How much information about each item?
• What can users do with each item?
• Presenting results by categories
Evaluation of Search Engines
Your book is wrong on page 159!!!
Recall = (relevant retrieved documents) / (all relevant documents in collection)
Precision = (relevant retrieved documents) / (all retrieved documents)
Copyright Dr. David Grossman, Source: http://ir.iit.edu/~dagr/cs529/files/handouts/01Introduction-6per.PDF
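The two measures above can be computed directly from sets of document ids. The function name `precision_recall` and the sample ids below are hypothetical and only illustrate the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant retrieved documents
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    {"d1", "d2", "d3", "d4"},
    {"d1", "d2", "d3", "d7", "d8", "d9"},
)
print(p, r)  # 0.75 0.5
```

Here three of the four hits are useful (precision 0.75), but the search still missed half of what the collection had to offer (recall 0.5).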
Within-Site Search Bloopers 1
1. Baffling search controls. Search options
require knowledge of computer or industry-insider concepts.
2. Dueling search controls. Competing search
boxes on page, with no guidance.
3. Hits look alike. List of found items cannot be
easily distinguished by scanning.
4. Duplicate hits. List of found items contains
duplicates.
5. Search myopia: Missing relevant items.
Items that should be found are not.
http://www.web-bloopers.com/
Within-Site Search Bloopers 2
6. Needle in a haystack: Piles of irrelevant
hits. Many items don’t match search criteria.
7. Hits sorted uselessly. Sort-order of found
items doesn’t support user tasks.
8. Crazy search behavior. Modifying search
criteria yields unexpected results.
9. Search-terms not shown. Not showing
what search terms produced these results.
10. Number of hits not revealed. Not showing
how many items were found.
http://www.web-bloopers.com/
Search User Interface Design
Recommendations 1
• Put a simple, reasonably long search field on
every page of the site. (Nielsen: min. 27
characters long)
• Use simple words to explain the process:
remove all jargon and technical terms, and
make sure that any icons have labels.
• Avoid inventing a new interface, which will
confuse users: take the best of the formats of
the large public search engines
• Make the search forms and results pages fit
into the overall design of the web site: they
should use the same colors, fonts and so on.
http://www.searchtools.com/info/user-interface.html
Search User Interface Design
Recommendations 2
• Include site names and navigation links
into results pages, so users can see the
context and structure of the site.
• Set up a special page to be displayed
when the search does not find any
matches in the index
• Avoid surprises: clarify all automated
search features, such as stemming,
phonetic matching, thesaurus lookups and
stopwords
http://www.searchtools.com/info/user-interface.html
How Search Should Work
PWU Ch5
• Follow the standards of the large search engines:
– Search box (min. 27 characters) and a button in the top right
corner of the page
– Search box on every page
– Linear results in order of relevance
• Users expect search to be a keyword search and not
other types of searches (by types of clothing, size,
season, etc.)
• Advanced search should be a secondary option or
omitted
• Scoped search is useful if your site has distinct sections
• Do not default the search to a scope
Search Engine Results Pages
(PWU Ch5)
• Copy the design of major search engines
• List results in relevance order but no need
to show measure of relevance
• If appropriate, allow users to re-sort results
• Each result should start with a clickable
headline
• Follow headline by 2-3-line summary
• Include a search box with the user’s query
in it to make query reformulation easier
Design of No-Matches Pages
• Site Context and Navigation
– Instead of a bare page saying that the search failed, show the standard site
layout, including background colors, logos, text and link colors, and
navigation links.
– If you have a site map or Yahoo-style directory for your site, include it in the
no-matches page; otherwise you may want a statement of the site scope.
That provides a positive way to help people understand what is available,
and browse if they choose.
• Search Again Field
– Make sure there is a Search field, so people can try a different search. Don't
make them click a link or otherwise take an extra step to search again.
• Suggested Wording
– Include some text that explains why the search might have failed, and what
people can do next. Word it to be positive and helpful,
rather than blaming the user for the search failure. For example: Your
search returned no results. Try broadening your search (from heart attack to
heart disease) or adding additional terms (from high blood pressure to high
blood pressure or hypertension).
http://www.searchtools.com/guide/nomatches.html
Search UI Design Exercise
• Work in pairs
• Select an imaginary website
• Design on paper:
– A homepage with a search box
– A search results page
– A no-hit page
Search Engine Optimization
How do search engines find you?
• Search engine optimization:
– Changing your site to improve the site’s ranking in
search results
• Search engine submission:
– Submitting your site to search engines to make sure
the engines know about it
• Search engine marketing/promotion:
– The combined process of search engine submission (free or paid) and
search engine optimization
• http://blog.searchenginewatch.com/090402110851 (From 2:12)
Search Engine Submission
• Yahoo’s human-compiled directory listings
(http://help.yahoo.com/l/us/yahoo/ysm/ds/index.html):
– Crawlers look at those pages
– Free for normal review
– $299 for expedited review and commercial listing (no guarantee of
listing)
• Google:
– Free but not guaranteed
(http://www.google.com/addurl/?continue=/addurl)
– Or use AdWords for payment (http://www.google.com/ads/)
• Yahoo ads submission:
– Yahoo sponsored search
(http://searchmarketing.yahoo.com/arp/sponsoredsearch_ss.php?o=US1806&cmp=SYC&ctv=&s=Y&s2=S&b=25)
– Pay by the number of clicks
Search Engine Optimization
• Linguistic SEO:
– Research what words users use for your content:
• Search engine logs, user testing, support calls,
discussion forums
– Use those words to describe your content on your
pages and in the metadata
• Architectural SEO:
– Make sure your important content is text
– Make sure your linking structure leads search
engine indexing crawlers to important content
• Reputation SEO:
– Make sure other sites link to you
Search Engine Optimization
• Study your guideline
• Create a few bullet points to describe your
guideline and post them on the discussion board
• Sources:
• http://searchenginewatch.com/webmasters/article.php/2168021
• http://searchenginewatch.com/webmasters/article.php/2167931