Evaluation in IR in the Context of the Web
Evaluating IR (Web) Systems
• Study of Information Seeking & IR
• Pragmatics of IR experimentation
• The dynamic Web
• Cataloging & understanding Web docs
• Web site characteristics
Study of Info seeking & retrieval
- Well-known authors (useful for research papers)
• Real-life studies (not TREC)
- User context of questions
- Questions (structure & classification)
- Searcher (cognitive traits & decision making)
- Information Items
• Different searches with the same question
• Relevant items
• “models, measures, methods, procedures and statistical analyses” (p. 175)
• Beyond common sense and anecdotes
Study 2
• Is there ever enough user research?
• A good set of elements to include in an IR system evaluation
• How do you test for real life situations?
- Questions the users actually have
- Expertise in subject (or not)
- Intent
- User’s computers, desks & materials
• What’s a search strategy?
- Tactics, habits, previous knowledge
• How do you collect search data?
Study 3
• How do you ask questions?
- General knowledge test
- Specific search terms
• Learning Style Inventory
- NOT the best way to understand users
- Better than nothing
- Choose your questions like your users
• Let users choose their questions?
• Let users work together on searches
• Effectiveness Measures
- Recall, precision, relevance
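As a concrete illustration of the recall and precision measures above, here is a minimal Python sketch; the function names and the sample retrieved/relevant sets are invented for illustration and assume binary relevance judgments.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

retrieved = ["d1", "d3", "d5", "d7"]   # what the system returned (example data)
relevant = ["d1", "d2", "d3", "d4"]    # expert-judged relevant documents (example data)

print(precision(retrieved, relevant))   # 0.5
print(recall(retrieved, relevant))      # 0.5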
Study 4
• Measuring efficiency
- Time on tasks
- Task completion
• Correct answer
• Any answer?
- Worthwhile?
• Counting correct answers
• Statistics
- Clicks, commands, pages, results
- Not just computer time, but the overall process
- Start with the basics, then get advanced
- Regression analysis (dependencies for large studies)
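To make the efficiency measures above concrete (time on task, task completion, counts of clicks and pages), here is a small Python sketch that tabulates them before any advanced statistics; the session records and field names are invented for illustration.

from statistics import mean

# Illustrative session records, one per user/task; field names are assumptions.
sessions = [
    {"user": "u1", "seconds": 310, "clicks": 12, "pages": 5, "correct": True},
    {"user": "u2", "seconds": 540, "clicks": 25, "pages": 9, "correct": False},
    {"user": "u3", "seconds": 125, "clicks": 6,  "pages": 3, "correct": True},
]

completion_rate = sum(s["correct"] for s in sessions) / len(sessions)
mean_time = mean(s["seconds"] for s in sessions)
mean_clicks = mean(s["clicks"] for s in sessions)

print(f"task completion:   {completion_rate:.0%}")   # 67%
print(f"mean time on task: {mean_time:.0f} s")       # 325 s
print(f"mean clicks:       {mean_clicks:.1f}")       # 14.3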
Let’s design an experiment
• User Selection
- Searcher (cognitive traits & decision making)
- User context of questions
• Environment
• Questions (structure & classification)
• Information Items
- Successful answers
- Successful/Worthwhile sessions
• Measurement
Pragmatics of IR experimentation
• The entire IR evaluation must be planned
• Controls are essential
• Working with what you can get
- Expert defined questions & answers
- Specific systems
• Fast, cheap, informal tests
- Not always, but could be pre-tests
- Quick results for broad findings
Pragmatic Decision 1
• Testing at all?
- Purpose of test
- Pull data from previous tests
• Repeat old test
- Old test with new system
- Old test with new database
• Same test, many users
- Same system
- Same questions (data)
Pragmatic Decision 2
• What kind of test?
• Everything at once?
- System (help, no help?)
- Users (types of)
- Questions (open-ended?)
• Facts
- Answers with numbers
- Words the user knows
• General knowledge
- Found more easily
- Ambiguity goes both ways
Pragmatic Decision 3
• Understanding the Data
• What are your variables? (p. 207)
• Working with initial goals of study
• Study size determines measurement methods
- Lots of users
- Many questions
- All system features, competing system features
• What is acceptable/passable performance?
- Time, correct answers, clicks?
- Which are controlled?
Pragmatic Decision 4
• What database?
- The Web (no control)
- Smaller dataset (useful to user?)
• Very similar questions, small dataset
- Web site search vs. whole Web search
- Prior knowledge of subject
- Comprehensive survey of possible results beforehand
• Differences other than content?
Pragmatic Decision 5
• Where do queries/questions come from?
- Content itself
- User pre-interview (pre-tests)
- Other studies
• What are the search terms (used or given)?
- Single terms
- Advanced searching
- Results quantity
Pragmatic Decisions 6, 7, 8
• Analyzing queries
- Scoring system
- Logging use
• What’s a winning query (treatment of units)
- User success, expert answer
- Time, performance
- Different queries with the same answer?
• Collect the data
- Logging and asking users
- Consistency (software, questionnaires, scripts)
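One way to keep data collection consistent, as the last bullet above suggests, is to log every query event through the same small piece of software. The sketch below is a hypothetical Python example; the event fields and log file name are assumptions, not a prescribed format.

import json, time

LOG_PATH = "search_log.jsonl"   # hypothetical log file, one JSON event per line

def log_event(user_id, query, results_count, clicked_rank=None):
    """Append one query event so every session is recorded the same way."""
    event = {
        "ts": time.time(),            # when the query was issued
        "user": user_id,
        "query": query,
        "results": results_count,
        "clicked_rank": clicked_rank, # None if the user clicked nothing
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

log_event("u1", "web ir evaluation", results_count=40, clicked_rank=3)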
Pragmatic Decisions 9 & 10
• Analyzing Data
- Dependent on the dataset
- Compare to other studies
- Basic statistics first
• Presenting Results
- Work from plan
- Purpose
- Measurement
- Models
- Users
- Matching other studies
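A small illustration of "basic statistics first" from the bullets above: compare simple summaries (sample size, mean, standard deviation) across systems before reaching for anything fancier. The task times below are invented.

from statistics import mean, stdev

# Invented task times (seconds) for two systems being compared.
system_a = [310, 290, 405, 350, 280]
system_b = [260, 275, 300, 240, 310]

for name, times in (("A", system_a), ("B", system_b)):
    print(f"system {name}: n={len(times)}  mean={mean(times):.0f}s  stdev={stdev(times):.0f}s")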
Keeping Up with the Changing Web
• Building Indices is difficult enough in theory
• What about a huge, continuously changing volume of information?
• Is old information good?
• What does up-to-date mean anymore?
• Is Knowledge a depreciating commodity?
- Correctness + Value over time
• Different information changes at different rates
- Really it’s new information
• How do you update an index with constantly changing information?
Changing Web Properties
• Known distributions for information change
• Sites and pages may have easily identifiable patterns of update
- 4% change on every observation
- Some don’t ever change (links too)
• If you check and a page hasn’t changed, what is the probability it will ever change?
• Rate of change is related to rate of attention
- Machines vs. Users
- Measures can be compared along with information
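As a back-of-the-envelope illustration of reasoning about change rates, the sketch below treats each observation of a page as an independent trial with a fixed change probability. The 4% figure above is used only as an example, and the independence assumption is ours, not the readings'.

p = 0.04        # assumed chance the page changed since the last check
checks = 10     # how many future observations we plan

p_no_change = (1 - p) ** checks
p_some_change = 1 - p_no_change

print(f"P(no change in {checks} checks) = {p_no_change:.3f}")   # ~0.665
print(f"P(at least one change)         = {p_some_change:.3f}")  # ~0.335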
Dynamic Maintenance of Indexes with Landmarks
• Web Crawlers do the work in gathering pages
• Incremental crawling means incrementally updated indices
- Rebuild the whole index more frequently
- Devise a scheme for updates (and deletions)
- Use supplementary indices (i.e. date)
• New documents
• Changed documents
• 404 documents
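A minimal sketch of one possible update scheme for incremental crawling: re-index changed documents and drop postings for deleted (404) ones. The data structures and tokenization here are simplifications for illustration, not the scheme described in the reading.

from collections import defaultdict

index = defaultdict(set)   # term -> set of doc ids
doc_terms = {}             # doc id -> terms currently indexed for it

def add_or_update(doc_id, text):
    """Index a new document, or re-index a changed one."""
    remove(doc_id)                          # drop stale postings for a changed page
    terms = set(text.lower().split())
    for term in terms:
        index[term].add(doc_id)
    doc_terms[doc_id] = terms

def remove(doc_id):
    """Remove a document's postings, e.g. after the crawler got a 404."""
    for term in doc_terms.pop(doc_id, set()):
        index[term].discard(doc_id)

add_or_update("d1", "web evaluation of IR systems")
add_or_update("d1", "web site evaluation")   # changed document: re-index it
remove("d2")                                 # removing an unknown doc is a no-op
print(sorted(index["evaluation"]))           # ['d1']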
Landmarks for Indexing
• Difference-based method
• Documents that don’t change are landmarks
- Relative addressing
- Clarke: block-based
- Glimpse: chunking
• Only update pointers to pages
• Tags and document properties are landmarked
• Broader pointers mean fewer updates
• Faster indexing – Faster access?
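The toy example below illustrates the general landmark idea: term positions are stored relative to landmark offsets, so an edit only shifts the offsets of later landmarks rather than rewriting every posting. It is a simplification of the principle, not Clarke's block-based scheme or Glimpse's chunking; the offsets and postings are invented.

# Landmark -> absolute offset; postings store (landmark, relative position).
landmark_offsets = {"L1": 0, "L2": 1000, "L3": 2000}
postings = {"evaluation": [("L2", 40)]}

def absolute_positions(term):
    """Recover absolute positions from landmark-relative postings."""
    return [landmark_offsets[lm] + rel for lm, rel in postings[term]]

print(absolute_positions("evaluation"))    # [1040]

# An edit inserts 120 characters between L1 and L2: only later landmarks move.
for lm in ("L2", "L3"):
    landmark_offsets[lm] += 120

print(absolute_positions("evaluation"))    # [1160]; the posting itself was untouched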
Yahoo! Cataloging the Web
• How do information professionals build an “index” of the Web?
• Cataloging applies to the Web
• Indexing with synonyms
• Browsing indexes vs searching them
• Comprehensive index not the goal
- Quality
- Information Density
• Yahoo’s own ontology – points to site for full info
• Subject Trees with aliases (@) to other locations
• “More like this” comparisons as checksums
• Yahoo uses tools for indexing
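A toy sketch of a subject tree with “@” aliases that point to a category's canonical location elsewhere in the hierarchy, in the spirit of the directory described above; the categories and data layout are invented for illustration.

# Toy subject tree; an "@" entry is an alias holding the canonical category path.
tree = {
    "Computers": {
        "Internet": {"Search Engines": ["site1.example", "site2.example"]},
    },
    "Reference": {
        "Search Engines@": ("Computers", "Internet", "Search Engines"),  # alias
    },
}

def resolve(path):
    """Follow a path of category names, chasing an alias hop if one is found."""
    node = tree
    for name in path:
        node = node[name]
    if isinstance(node, tuple):            # alias: restart from the canonical path
        return resolve(node)
    return node

print(resolve(("Reference", "Search Engines@")))   # ['site1.example', 'site2.example']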
Investigation of Documents from the WWW
• What properties do Web documents have?
• What structure and formats do Web documents use?
• What properties do Web documents have?
- Size – 4K avg.
- Tags – ratio and popular tags
- MIME types (file extensions)
- URL properties and formats
- Links – internal and external
- Graphics
- Readability
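A few of these properties (size, tag counts, internal vs. external links) can be computed with very little code; the sketch below uses Python's standard-library HTML parser on an invented sample page.

from collections import Counter
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Count start tags and collect href values while parsing one page."""
    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.links = []
    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

page = '<html><body><p>Hello</p><a href="http://example.com/x">out</a><a href="/in">in</a></body></html>'
stats = PageStats()
stats.feed(page)

print("size (bytes):", len(page.encode("utf-8")))
print("most common tags:", stats.tags.most_common(3))
print("external links:", [u for u in stats.links if u.startswith("http")])
print("internal links:", [u for u in stats.links if not u.startswith("http")])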
WWW Documents Investigation
• How do you collect data like this?
- Web Crawler
• URL identifier, link follower
- Index-like processing
• Markup parser, keyword identifier
• Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
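A very small crawler in the spirit of these bullets: fetch a page, extract links, and follow them breadth-first up to a page budget. Politeness delays, robots.txt handling, and DNS caching are omitted, and the seed URL is a placeholder.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

def crawl(seed, max_pages=5):
    seen, queue, fetched = {seed}, deque([seed]), 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            page = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue                              # unreachable page: skip it
        fetched += 1
        parser = LinkParser()
        parser.feed(page)
        for link in parser.links:
            absolute = urljoin(url, link)         # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

print(crawl("http://example.com/", max_pages=3))  # placeholder seed URL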
Properties of Highly-Rated Web Sites
• What about whole Web sites?
• What is a Web site?
- Sub-sites?
- Specific contextual, subject-based parts of a Web site?
- Links from other Web pages: on the site and off
- Web site navigation effects
• Will experts (like Yahoo catalogers) like a site?
Properties
• Links & formatting
• Graphics – one, but not too many
• Text formatting – 9 pt. with normal style
• Page (layout) formatting – minimal colors
• Page performance (size and access)
• Site architecture (pages, nav elements)
- More links within and external
- Interactive (search boxes, menus)
• Consistency within a site is key
• How would a user or index builder make use of these?
Extra Discussion
• Little Words, Big Difference
- The difference that makes a difference
- Singular and plural noun identification can change indices and retrieval results
- Language use differences
• Decay and Failures
- Dead links
- Types of errors
- Huge number of dead links (PageRank effective)
• 28% in 1995-1999 Computer & CACM
• 41% in 2002 articles
• Better than the average Web page?
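Measuring link decay like the figures above requires a link checker; the sketch below is a minimal Python example that distinguishes only the basic error types and uses placeholder URLs.

from urllib.error import HTTPError
from urllib.request import urlopen

urls = ["http://example.com/", "http://example.com/no-such-page"]   # placeholders

for url in urls:
    try:
        status = urlopen(url, timeout=10).status
        print(url, "OK", status)
    except HTTPError as e:      # the server answered, but with an error code (e.g. 404)
        print(url, "HTTP error", e.code)
    except OSError as e:        # DNS failure, refused connection, timeout, ...
        print(url, "unreachable:", e)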
Break!
Topic Discussions Set
• Leading WIRED Topic Discussions
- About 20 minutes reviewing issues from the week’s readings
• Key ideas from the readings
• Questions you have about the readings
• Concepts from readings to expand on
- PowerPoint slides
- Handouts
- Extra readings (at least a few days before class) – send to the wired listserv
Web IR Evaluation
- 5-page written evaluation of a Web IR system
- technology overview (how it works)
• Not an eval of a standard search engine
• Only main determinable diff is content
- a brief overview of the development of this type of system (why it works better)
- intended uses for the system (who, when, why)
- (your) examples or case studies of the system in use and its overall effectiveness
Projects and/or Papers Overview
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Web documents sampling
• Web cataloging work
- Metadata & IR
- Who watches the catalogers?
• Scriptable applications
- Using existing IR systems in new ways
- RSS & IR
Project Ideas
• Searchable Personal Digital Library
• Browser hacks for searching
• Mozilla keeps all the pages you surf so you can search through them later
- Mozilla hack
- Local search engines
• Keeping track of searches
• Monitoring searches
Paper Ideas
• New datasets for IR
• Search on the Desktop – issues, previous research and ideas
• Collaborative searching – advantages and potential, but what about privacy?
• Collaborative Filtering literature review
• Open source and IR systems history & discussion