Drexel University
Amanda Spink: Analysis of Web Searching and Retrieval
Larry Reeve
INFO861 - Topics in Information Science
Dr. McCain - Winter 2004
Background
- Amanda Spink
  - Self-described areas of work:
    - Information Retrieval
    - Web Retrieval
    - Human Information Behavior / Information Seeking
    - Medical Informatics
  - Ph.D. 1993, Rutgers University
    - Thesis: Feedback in Information Retrieval
    - Studied under Tefko Saracevic

Background
- Amanda Spink
  - Over 140 papers published
    - 5th in journal article production and 18th in citation production among U.S. IS faculty
  - Most highly cited paper in Web Retrieval, according to the Institute for Scientific Information (ISI):
    - "Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web" (2000)

Background
- Amanda Spink
  - Associate Professor at University of Pittsburgh
    - School of Information Sciences
  - Prior faculty positions
    - Pennsylvania State University
      - School of Information Science & Technology
      - Web Research Group
    - University of North Texas
      - School of Library and Information Sciences

Background
- Tefko Saracevic
  - Associate Dean
    - School of Communication, Information and Library Studies, Rutgers University
  - Related research
    - Test and Evaluation of IR systems
    - Relevance in Information Science
    - Analysis of Web queries

Web Searching and Retrieval
- Analyze user queries
  - Important for building future IR systems on the Web
- Focus on search terms
  - Failure analysis in query construction
  - Term Relevance Feedback (TRF)
  - Topics / Classification
  - Use of language

Studies Conducted
- U.S.: Excite (www.excite.com)
  - “51K study”
    - 51,473 queries
    - 18,113 users
    - March 9, 1997
  - “1M study”
    - 1,025,910 queries
    - 211,063 users
    - September 16, 1997

Studies Conducted
- European: AllTheWeb.com
  - 1 million queries
  - 200,000 users
  - Logs from two days:
    - February 6, 2001
    - May 28, 2002
  - Most users from Norway and Germany

Studies Conducted
- Issues with Web transaction logs
  - Where does a session start and end?
    - Temporal boundary: Spink found a 15-minute average
      - Others found 5 mins, 12 mins, 32 mins, and 2 hours
    - Numerical boundary: 100 entries
  - How to eliminate non-individual users
    - Meta-search engines, other agents
  - Logs give no insight into the user's process

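A short Python sketch can show how the two session-boundary rules might be applied. It assumes each log row carries a user ID, a timestamp, and the query string. The `LogEntry` class and `split_sessions` function are hypothetical names. The sketch also reads the 15-minute figure as an inactivity timeout and the 100-entry figure as a cap on session length. Both readings are assumptions, not Spink's documented procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    """One hypothetical transaction-log row: who searched, when, and what."""
    user_id: str
    timestamp: datetime
    query: str


def split_sessions(entries, timeout=timedelta(minutes=15), max_entries=100):
    """Split one user's log entries into sessions.

    A new session starts when the gap since the previous entry exceeds
    `timeout` (temporal boundary) or when the current session already
    holds `max_entries` entries (numerical boundary). Both rules are
    illustrative assumptions.
    """
    sessions = []
    current = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        if current and (
            entry.timestamp - current[-1].timestamp > timeout
            or len(current) >= max_entries
        ):
            sessions.append(current)
            current = []
        current.append(entry)
    if current:
        sessions.append(current)
    return sessions


# Made-up entries for one user: a 5-minute gap, then a 35-minute gap.
t0 = datetime(1997, 9, 16, 10, 0)
log = [
    LogEntry("u1", t0, "weather"),
    LogEntry("u1", t0 + timedelta(minutes=5), "weather boston"),
    LogEntry("u1", t0 + timedelta(minutes=40), "red sox tickets"),
]
for session in split_sessions(log):
    print([e.query for e in session])
# ['weather', 'weather boston']
# ['red sox tickets']
```

You can pass `timeout=timedelta(minutes=5)` or `timeout=timedelta(hours=2)` instead. This shows how the cutoffs other researchers reported would change the session count for the same log.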
Findings
- Relevance Feedback
- Advanced Search Techniques
- Term Characteristics
- Query Classification
- American vs. European

Findings: Relevance Feedback
- Term Relevance Feedback (TRF) rarely used
  - 51K study
    - 1,597 queries from 823 users (<5% of queries)
    - Those using TRF had longer sessions
    - Successful 60% of the time
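The "<5%" figure can be checked against the 51K study totals listed under Studies Conducted. This Python snippet only does arithmetic on the reported counts. It shows TRF queries at about 3.1% of all queries and TRF users at about 4.5% of all users.

```python
# Counts reported for the Excite "51K study".
total_queries = 51_473
total_users = 18_113
trf_queries = 1_597
trf_users = 823

query_share = 100 * trf_queries / total_queries
user_share = 100 * trf_users / total_users

print(f"TRF queries: {query_share:.1f}% of all queries")  # 3.1%
print(f"TRF users:   {user_share:.1f}% of all users")     # 4.5%
```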
  - Implications:
    - A failure rate of 40% may be too high
    - IR designers could perform TRF automatically

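The slides do not say how automatic TRF would be implemented. One widely used technique that fits the idea is pseudo-relevance feedback. It assumes the top-ranked results are relevant and adds their most frequent new terms to the query. The Python sketch below illustrates that swapped-in technique. The stopword list, `k`, `n_terms`, and the function names are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expand_query(query, ranked_docs, k=3, n_terms=2):
    """Pseudo-relevance feedback: treat the top `k` documents as relevant
    and append the `n_terms` most frequent terms from them that are not
    already in the query."""
    query_terms = tokenize(query)
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in tokenize(doc) if t not in query_terms)
    # most_common orders equal counts by first occurrence.
    extra = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + extra


# Made-up ranked results for illustration.
docs = [
    "Boston weather forecast: rain and wind today",
    "Weather radar and forecast for Boston",
    "Hourly forecast and radar maps",
    "Red Sox schedule",
]
print(expand_query("boston weather", docs))
# ['boston', 'weather', 'forecast', 'radar']
```

Raising `k` draws expansion terms from more documents. That smooths out noise from any single result, but it risks pulling in off-topic terms when lower-ranked results are not relevant.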
Findings: Relevance Feedback
- Mediated searching
  - 11% of search terms came from TRF
  - 37% from users, 63% from mediators
  - 2/3 of TRF terms contributed positively

Findings: Relevance Feedback
- Identified 6 session states
  - Initial Query, Modified Query, Next Page, New Query, Relevance Feedback, Prev Query
- Identified 4 session patterns
  - Using the 6 session states
- Implication: IR designers should accommodate these states and patterns

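Counting session patterns first requires labeling each transaction with one of the six states. The slides do not give the labeling rules, so this Python sketch uses hypothetical heuristics:
- a `feedback` flag marks Relevance Feedback;
- a result page above 1 marks Next Page;
- a repeat of any earlier query marks Prev Query;
- term overlap with the previous query marks Modified Query;
- no overlap marks New Query.

The `Transaction` fields are also assumptions about the log format.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical log record for one action within a session."""
    query: str
    page: int = 1           # result page requested (1 = first page)
    feedback: bool = False  # True if issued via a relevance-feedback link


def label_states(session):
    """Assign one of the six session states to each transaction."""
    labels = []
    seen = []  # term sets of earlier queries in this session
    for i, t in enumerate(session):
        terms = frozenset(t.query.lower().split())
        if i == 0:
            state = "Initial Query"
        elif t.feedback:
            state = "Relevance Feedback"
        elif t.page > 1:
            state = "Next Page"
        elif terms in seen:
            state = "Prev Query"
        elif terms & seen[-1]:
            state = "Modified Query"
        else:
            state = "New Query"
        labels.append(state)
        seen.append(terms)
    return labels


# Made-up session for illustration.
session = [
    Transaction("weather"),
    Transaction("weather", page=2),
    Transaction("weather boston"),
    Transaction("red sox"),
    Transaction("weather"),
    Transaction("weather boston", feedback=True),
]
print(" -> ".join(label_states(session)))
```

Joining the labels into a string as above gives one pattern per session. Patterns can then be tallied across sessions.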
Findings: Relevance Feedback
Figure: Relevance Feedback Session Patterns

Findings: Advanced Search Techniques
- Includes:
  - Boolean operators
  - Modifiers (+, -)
  - Quotes (phrases)
- Not often used by Web users, but used more in mediated searching
  - Boolean <10%, modifiers 9%, phrases 6%
- Used incorrectly
  - Boolean: AND 50%, OR 28%, AND NOT 19%
  - Modifiers: 75% of the time
  - Phrases: 8%
- Users and advanced techniques do not get along!

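Usage rates like these come from scanning each query for operator syntax. This Python sketch assumes Excite-style syntax:
- uppercase AND, OR, NOT;
- + or - prefixed to a term;
- double-quoted phrases.

The regular expressions are illustrative heuristics, not the coding scheme used in the studies.

```python
import re

BOOLEAN = re.compile(r"\b(AND|OR|NOT)\b")  # uppercase operators only
MODIFIER = re.compile(r"(?:^|\s)[+-]\w")   # + or - directly before a term
PHRASE = re.compile(r'"[^"]+"')            # text enclosed in double quotes


def feature_usage(queries):
    """Return the percentage of queries using each advanced technique."""
    counts = {"boolean": 0, "modifier": 0, "phrase": 0}
    for q in queries:
        if BOOLEAN.search(q):
            counts["boolean"] += 1
        if MODIFIER.search(q):
            counts["modifier"] += 1
        if PHRASE.search(q):
            counts["phrase"] += 1
    n = len(queries) or 1  # avoid dividing by zero on an empty log
    return {k: 100 * v / n for k, v in counts.items()}


# Made-up queries; the lowercase "and" is not counted as Boolean.
sample = ["cats AND dogs", "+news +weather", '"new york" hotels', "science and technology"]
print(feature_usage(sample))
# {'boolean': 25.0, 'modifier': 25.0, 'phrase': 25.0}
```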
Findings: Advanced Search Techniques
- Boolean, most common problems:
  - Not capitalizing AND
  - Confusing the 'AND' operator with the 'and' conjunction
    - e.g., Science and Technology vs. Science AND Technology
- Modifiers, most common problems:
  - Using mathematical (infix) notation instead of the required prefix form
    - +news +weather rather than news+weather
  - No space between a modifier and its term (+news), unlike Boolean operators, which are separated from terms by spaces

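The most common error types can be flagged with simple token checks. This Python sketch is a heuristic under the same Excite-style assumptions: operators must be uppercase, and modifiers attach directly to the front of a term. `detect_errors` is a hypothetical name. It will also flag natural-language queries where a lowercase "and" was meant literally.

```python
import re

OPERATORS = {"and", "or", "not"}


def detect_errors(query):
    """Return a list of likely syntax errors in an Excite-style query."""
    errors = []
    tokens = query.split()
    # Boolean keyword between two terms but not written in uppercase.
    for i in range(1, len(tokens) - 1):
        tok = tokens[i]
        if tok.lower() in OPERATORS and tok != tok.upper():
            errors.append(f"non-capitalized operator: {tok}")
    for tok in tokens:
        # Infix form such as news+weather instead of +news +weather.
        if re.fullmatch(r"\w+[+]\w+", tok):
            errors.append(f"infix modifier: {tok}")
        # Modifier separated from its term by a space, e.g. "+ news".
        if tok in {"+", "-"}:
            errors.append(f"detached modifier: {tok}")
    return errors


print(detect_errors("Science and Technology"))  # ['non-capitalized operator: and']
print(detect_errors("news+weather"))            # ['infix modifier: news+weather']
print(detect_errors("+news +weather"))          # []
```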
Findings: Term Characteristics
- Terms per query
  - 1: 26.6%, 2: 31.5%, 3: 18.2%, >7: 1.8%
  - Mediated searching: 7-15 terms
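The Python sketch below computes the terms-per-query distribution. It also computes the share of queries with three or fewer terms, the measure used in the American and European comparison. It assumes whitespace tokenization, so an operator such as AND counts as a term. The original studies may have counted differently.

```python
from collections import Counter


def terms_per_query(queries):
    """Return (percent of queries by term count, mean terms per query,
    percent of queries with three or fewer terms)."""
    # Whitespace tokenization; blank queries are skipped.
    lengths = [len(q.split()) for q in queries if q.strip()]
    n = len(lengths)
    if n == 0:
        return {}, 0.0, 0.0
    dist = {k: 100 * v / n for k, v in sorted(Counter(lengths).items())}
    mean = sum(lengths) / n
    three_or_fewer = 100 * sum(1 for x in lengths if x <= 3) / n
    return dist, mean, three_or_fewer


# Made-up queries for illustration.
sample = ["weather", "news today", "cheap flights paris", "jobs", "red sox tickets boston"]
dist, mean, three_or_fewer = terms_per_query(sample)
print(dist)            # {1: 40.0, 2: 20.0, 3: 20.0, 4: 20.0}
print(mean)            # 2.2
print(three_or_fewer)  # 80.0
```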
- Term distribution does not quite follow Zipf's law:
  - Top terms account for 10% of all terms
  - Single-use terms account for 9% of all terms
  - It is not understood why this occurs

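Under Zipf's law, frequency falls roughly as 1/rank. One way to see how far a term distribution departs from it is to fit a line to log frequency against log rank; a slope near -1 is Zipf-like. This Python sketch also reports the share of term occurrences taken by the `top_n` most frequent terms and by terms used exactly once. Two choices are assumptions: reading "single-use terms account for 9% of all terms" as a share of occurrences, and the default `top_n`.

```python
import math
from collections import Counter


def term_stats(queries, top_n=10):
    """Return (percent of occurrences from the top_n terms,
    percent of occurrences from terms used once,
    least-squares slope of log(frequency) on log(rank))."""
    counts = Counter(t for q in queries for t in q.lower().split())
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct terms")
    total = sum(freqs)
    top_share = 100 * sum(freqs[:top_n]) / total
    single_share = 100 * sum(f for f in freqs if f == 1) / total
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return top_share, single_share, slope


# Synthetic, exactly Zipf-like data: term r appears 120 / r times.
zipf_like = [f"term{r}" for r in range(1, 7) for _ in range(120 // r)]
print(term_stats(zipf_like, top_n=1))
```

On the synthetic data, the slope comes out at -1 up to floating-point rounding. A real query log that is "not quite Zipf" would show a different slope or visible curvature in the log-log plot.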
Findings: Query Classification
Figure: Classification of queries based on Rutgers’ Web Classification

Findings: Query Classification
- What users are looking for is not what is on the Web:
  - Distribution of content: 83% Commercial, 6% Educational, 3% Health
  - Example: 10% of searches are for Health, versus 3% of content
- Searchers find the classifications understandable
  - Relevant to IR system presentation design

Findings: American & European Searching
- Commonalities:
  - Three or fewer terms per query
    - American: 80%, European: 85%
  - Predominantly use English terms
  - Relevance judgments: less than 15 minutes spent viewing retrieved documents
  - Information-seeking sessions are short

Findings: American & European Searching
- Differences
  - Query categories
    - American: Entertainment, Sex, Commerce
    - European: People-places-things, Computers, Commerce
  - American searchers spent more time searching e-commerce sites than their European counterparts
- Did not examine:
  - Use of advanced techniques
  - Relevance feedback
  - First in initial set of studies?

Findings: Summary
- Number of query terms is about 2
- TRF is not used often
- Boolean operators and modifiers are not used often, and users have difficulty using them correctly
- Users do not spend much time making relevance judgments
- Term frequency distribution: a few terms are used often, while many terms are used only once

Findings: Summary
- Most users submitted only a single query and did not follow up with successive queries
- Users viewed an average of 2 result pages
  - 50% did not go beyond the first page; more than 75% did not go beyond 2 pages

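Page-viewing figures like these come from the highest result page each searcher requested. This Python sketch takes one page count per user; whether the studies counted per user or per query is an assumption here. It returns the mean and the shares of users who stopped at one page and at two pages.

```python
def page_view_stats(pages_per_user):
    """Return (mean pages viewed, percent viewing only the first page,
    percent viewing at most two pages)."""
    n = len(pages_per_user)
    if n == 0:
        return 0.0, 0.0, 0.0
    mean = sum(pages_per_user) / n
    first_only = 100 * sum(1 for p in pages_per_user if p <= 1) / n
    at_most_two = 100 * sum(1 for p in pages_per_user if p <= 2) / n
    return mean, first_only, at_most_two


# Made-up page counts for six users.
mean, first_only, at_most_two = page_view_stats([1, 1, 2, 4, 1, 3])
print(f"mean {mean:.1f}, first page only {first_only:.1f}%, <=2 pages {at_most_two:.1f}%")
# mean 2.0, first page only 50.0%, <=2 pages 66.7%
```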
Implications / Further Research
- Improve use of advanced search techniques
  - UI changes, result overview
- Improve use of relevance feedback
  - Automatic generation of TRF results
- Improve understanding of language use
  - Adapt IR designs to language
- Improve classification of results
  - UI changes, Venn diagrams
- Examine cultural differences
  - TRF, advanced search techniques (same or different)

Amanda Spink Web Searching and Retrieval

Questions