Query Suggestion Using Hitting Time
Qiaozhu Mei †, Dengyong Zhou ‡, Kenneth Church ‡
† University of Illinois at Urbana-Champaign
‡ Microsoft Research, Redmond
Motivating Examples
Query: MSG (sports center? food additive?)
1. Difficult for a user to express the information need
2. Difficult for a search engine to infer the information need
Query suggestions: express the information need accurately; make the information need easy to infer
Motivating Examples (Cont.)
Query: Welcome to the hotel california
Suggestions:
hotel california
eagles hotel california
hotel california band
hotel california by the eagles
hotel california song
lyrics of hotel california
listen hotel california eagle
Motivating Examples: Personalization
MSR
Metropolis Street Racer
Magnetic Stripe Reader
Molten salt reactor
Mars Sample Return
…
Mountain safety research
Actually Looking for Microsoft Research…
Research Questions
• How can we generate query suggestions in a principled way?
• Can we generate personalized query suggestions using the same method?
• Can this method be generalized to other search-related tasks?
Rest of This Talk
• Random Walk, Hitting Time, and Bipartite Graph
• Generating Query Suggestion
• Personalized Query Suggestion
• Experiments
• Discussion and Summary
Random Walk and Hitting Time
[Figure: a random walk leaving vertex i, moving to j with P = 0.7 and to k with P = 0.3; A is the target set of vertices]
• Hitting Time
– $T^A$: the first time that the random walk is at a vertex in A
• Mean Hitting Time
– $h_i^A$: expectation of $T^A$ given that the walk starts from vertex i
Computing Hitting Time
Example from the figure: $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$, with $h = 0$ at A
• $T^A$: the first time that the random walk is at a vertex in A
$T^A = \min\{t : X_t \in A,\ t \ge 0\}$
• $h_i^A$: expectation of $T^A$ given that the walk starts from vertex i
• Apparently, $h_i^A = 0$ for those $i \in A$
• Iterative computation:
$h_i^A = \sum_{j \in V} p(i \to j)\, h_j^A + 1$, for $i \notin A$
$h_i^A = 0$, for $i \in A$
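The recurrence for the mean hitting time can be solved by repeated substitution, starting from h = 0 everywhere. The following is a minimal sketch of that iteration, not the authors' implementation; the function name, the iteration count, and the transitions out of j and k are assumptions added for illustration.

```python
def hitting_times(transitions, targets, iterations=100):
    """Iterate h_i = 1 + sum_j p(i -> j) * h_j for i outside A, h_i = 0 inside A.

    transitions: dict mapping every vertex -> {neighbor: probability}
    targets: set of vertices forming A
    """
    h = {v: 0.0 for v in transitions}
    for _ in range(iterations):
        new_h = {}
        for v, out in transitions.items():
            if v in targets:
                new_h[v] = 0.0
            else:
                new_h[v] = 1.0 + sum(p * h[u] for u, p in out.items())
        h = new_h
    return h


# Hypothetical graph: only the edges out of "i" come from the slide.
graph = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"A": 1.0},
    "k": {"A": 0.5, "i": 0.5},
    "A": {"A": 1.0},
}
h = hitting_times(graph, {"A"})
print(h["i"], h["j"], h["k"])
```

Stopping after a fixed number of iterations gives a truncated hitting time; on this small graph the values settle after a handful of passes.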
Bipartite Graph and Hitting Time
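[Figure: weighted bipartite graph between vertex groups V1 and V2, with example vertices i, j, k, and A; edge weights include w(i, j) = 3]
Bipartite Graph:
- Edges between V1 and V2
- No edge inside V1 or V2
- Edges are weighted
- e.g., V1 = query; V2 = Url
Transition probabilities:
$p(i \to j) = \frac{w(i, j)}{d_i}$, with $d_i = \sum_{j \in V_2} w(i, j)$; e.g., $p(i \to j) = \frac{3}{3 + 7}$
$p(j \to i) = \frac{w(i, j)}{d_j}$, with $d_j = \sum_{i \in V_1} w(i, j)$
$p(i \to k) = \sum_{j \in V_2} p(i \to j)\, p(j \to k) = \sum_{j \in V_2} \frac{w(i, j)}{d_i} \frac{w(k, j)}{d_j}$
Expected proximity of query i to the query A: hitting time of i → A, $h_i^A$
• Convert to a directed graph, even collapse one group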
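Collapsing the Url side of the bipartite graph gives query-to-query transition probabilities: one step from a query to a Url, then one step from that Url back to a query. The sketch below assumes the click log is available as a dict of (query, url) counts; the function name and the toy counts are hypothetical.

```python
from collections import defaultdict


def query_transition_probs(weights):
    """Two-step query -> url -> query transition probabilities.

    weights: dict mapping (query, url) -> edge weight, e.g. click count
    Returns a nested dict p[q][q2] = sum over urls of w(q,u)/d_q * w(q2,u)/d_u.
    """
    d_query = defaultdict(float)
    d_url = defaultdict(float)
    by_url = defaultdict(dict)
    for (q, u), w in weights.items():
        d_query[q] += w
        d_url[u] += w
        by_url[u][q] = w

    p = defaultdict(lambda: defaultdict(float))
    for (q, u), w in weights.items():
        for q2, w2 in by_url[u].items():
            p[q][q2] += (w / d_query[q]) * (w2 / d_url[u])
    return {q: dict(row) for q, row in p.items()}


# Toy counts, invented for illustration.
toy = {
    ("aa", "u1"): 3,
    ("aa", "u2"): 7,
    ("american airline", "u1"): 1,
}
probs = query_transition_probs(toy)
print(probs["aa"])
```

Each row of the result sums to 1, so the collapsed graph can be fed directly into a hitting-time computation over queries only.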
Generate Query Suggestion
• Construct a (kNN) subgraph from the query log data (of a predefined number of queries/urls)
• Compute transition probabilities p(i → j)
• Compute hitting time $h_i^A$
• Rank candidate queries using $h_i^A$
[Figure: Query-Url bipartite graph. Queries: aa, american airline, mexiana; Urls: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Mexicana; edge weights 300 and 15 are shown; vertex labels T and A]
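The first step, restricting the log to a neighborhood of the target query, can be sketched as a graph expansion that stops once a predefined number of queries has been collected. The slide does not say how the subgraph is grown, so the breadth-first strategy, the function name, and the click counts below are all assumptions.

```python
from collections import defaultdict, deque


def local_subgraph(clicks, target, max_queries=100):
    """Collect click edges near `target` by breadth-first expansion.

    clicks: dict mapping (query, url) -> click count
    Expansion stops once `max_queries` queries have been reached.
    """
    urls_of = defaultdict(set)
    queries_of = defaultdict(set)
    for q, u in clicks:
        urls_of[q].add(u)
        queries_of[u].add(q)

    seen = {target}
    frontier = deque([target])
    while frontier and len(seen) < max_queries:
        q = frontier.popleft()
        for u in sorted(urls_of[q]):
            for q2 in sorted(queries_of[u]):
                if q2 not in seen and len(seen) < max_queries:
                    seen.add(q2)
                    frontier.append(q2)
    # Keep only the edges whose query endpoint was reached.
    return {(q, u): w for (q, u), w in clicks.items() if q in seen}


# Hypothetical click counts; "weather" is unreachable from "aa".
clicks = {
    ("aa", "www.aa.com"): 300,
    ("american airline", "www.aa.com"): 15,
    ("aa", "www.theaa.com"): 50,
    ("aa route planner", "www.theaa.com"): 150,
    ("american airline", "en.wikipedia.org/wiki/Mexicana"): 5,
    ("mexicana", "en.wikipedia.org/wiki/Mexicana"): 10,
    ("weather", "weather.example"): 99,
}
sub = local_subgraph(clicks, "aa", max_queries=3)
print(sorted({q for q, _ in sub}))
```

The remaining steps (transition probabilities, hitting time, ranking) would then run on the returned edges only, which keeps the computation local to the target query.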
Intuition
• Why does it work?
– A url is close to a query if freq(q, url) dominates the number of clicks on this url (most people use q to access url)
– A query is close to the target query if it is close to many urls that are close to the target query
Personalized Query Suggestion
• Queries are ambiguous
• Different user → different information need → different query suggestions
• Simple approach: build the graph, compute hitting time solely based on the user’s history
• Data Sparseness
– E.g., you cannot see a query if you never used it
• Alternative: modify the bipartite graph instead of rebuilding all
Personalize the Bipartite Graph
[Figure: Query-Url bipartite graph. Queries: aa, alcoholics anonymous, american airline; Urls: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Alcoholics_Anonymous, www.alcoholics-anonymous.org; vertex labels T and P]
• Introduce a pseudo (personalized) query P: “aa” + user
• Reweight edges using personalized probabilities p(Url | User, Query) and p(Query | User, Url)
• Key: How to compute p(Url | User, Query)
– From w(url, user, query): sparse data!
– Compute a smoothed p(Url | User, Query)
Personalization with Backoff (Mei and Church 08)
• Full personalization: sparse data! We don’t have enough data for everyone
• No personalization: lose the opportunity
• Personalization with backoff: back off to classes of users (e.g., IP)
$P(Url \mid IP, Q) = \lambda_4 P(Url \mid IP_4, Q) + \lambda_3 P(Url \mid IP_3, Q) + \lambda_2 P(Url \mid IP_2, Q) + \lambda_1 P(Url \mid IP_1, Q) + \lambda_0 P(Url \mid IP_0, Q)$
$IP_4$ = 156.111.188.243, $IP_3$ = 156.111.188.*, $IP_2$ = 156.111.*.*, $IP_1$ = 156.*.*.*, $IP_0$ = *.*.*.*
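The backoff estimate is a weighted mix of click distributions computed at five levels of IP prefix, from the full address down to no address at all. The sketch below uses fixed, hypothetical weights and a toy log; how the weights are actually set is not stated on the slide, and the function names and the Url `example.com/msr-stoves` are invented for illustration.

```python
from collections import defaultdict


def ip_prefixes(ip):
    """Return the IP at backoff levels 4 (full address) down to 0 (all wildcards)."""
    parts = ip.split(".")
    return [".".join(parts[:k] + ["*"] * (4 - k)) for k in range(4, -1, -1)]


def backoff_prob(log, ip, query, url, lambdas=(0.1, 0.1, 0.2, 0.3, 0.3)):
    """Interpolated P(url | ip, query) over the five prefix levels.

    log: iterable of (ip, query, url) click records
    lambdas: lambdas[k] is the weight of level k; hypothetical values summing to 1
    """
    counts = defaultdict(float)  # (prefix, query, url) -> clicks
    totals = defaultdict(float)  # (prefix, query) -> clicks
    for rec_ip, rec_q, rec_url in log:
        for prefix in ip_prefixes(rec_ip):
            counts[(prefix, rec_q, rec_url)] += 1
            totals[(prefix, rec_q)] += 1

    prob = 0.0
    for level, prefix in zip(range(4, -1, -1), ip_prefixes(ip)):
        # A level with no data contributes nothing (its weight is not redistributed).
        if totals[(prefix, query)] > 0:
            prob += lambdas[level] * counts[(prefix, query, url)] / totals[(prefix, query)]
    return prob


# Toy log, invented for illustration.
log = [
    ("156.111.188.243", "msr", "research.microsoft.com"),
    ("156.111.188.7", "msr", "research.microsoft.com"),
    ("10.0.0.1", "msr", "example.com/msr-stoves"),
    ("10.0.0.2", "msr", "example.com/msr-stoves"),
    ("10.0.0.3", "msr", "example.com/msr-stoves"),
]
print(backoff_prob(log, "156.111.188.243", "msr", "research.microsoft.com"))
```

In this toy log the unpersonalized estimate for research.microsoft.com is 2/5, while the interpolated estimate for the 156.111.188.243 user is much higher because every narrower prefix level clicked only that Url.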
Experiments
• Query Suggestion using Query Logs
– commercial search engine log (1.5 years)
– 637 million queries; 585 million urls
– Query-click bipartite graph
• Author/keyword suggestion using DBLP
– titles and authors from DBLP
– 110k papers, 580k authors
– Coauthor graph, keyword graph, author-keyword bipartite graph
• Baselines: nearest neighbor; personalized pagerank
Result: Query Suggestion
Query = friends
Hitting time: wikipedia friends; friends tv show wikipedia; friends home page; friends warner bros; the friends series; friends official site; friends(1994)
Google: friendship; friends poem; friendster; friends episode guide; friends scripts; how to make friends; true friends
Yahoo: secret friends; friends reunited; hide friends; hi 5 friends; find friends; poems for friends; friends quotes
Result: Query Suggestion (II)
Query = aa
Yahoo
Hitting time
aa route planner
alcoholics anonymous
aa route finder Live
automobile association
aa airlines
aa route finder
theaa
aa meetings
aa route planner american airlines
aa autoroute
aa airlines
aa road map
american airlines american airline ticket
reservation
aa meeting
ndcg
aa road map
Chris burges
american air
Query = ranknet
Hitting Time
learning to rank
ndcg measure ir
lambdarank
pairwise test
Results: Personalized Query Suggestion
Query = msr
Personalized: Microsoft research; research; what is research; research website; microsoft research and development; yahoo research labs
No personalization: mountian safety research; msrcorp; msr outdoor equipment; msr camp stoves; msr snowshoes; msr racing
Result: Author Suggestion
Query = Jon Kleinberg
Nearest Neighbor (personalized Pagerank is similar): famous researchers + former students
Prabhakar Raghavan; Eva Tardos; Daniel P. Huttenlocher; David Kempe; Amit Kumar; Andrew Tomkins
Hitting time: favors students, especially current students
Aleksandrs Slivkins; Mark Sandler; Tom Wexler; Lars Backstrom; Elliot Anshelevich; Xiangyang Lan
Result: Keyword Suggestion
Query = social network: Knowledge collaboration; Community structure; Resource organization; Information kiosks; Efficient searching; Network extraction
Query = pagerank: Pagerank computation; Ranking systems; Pagerank approximation; Incremental computations; Web spam; Iterative computation
Query = olap: Dimension updates; OLAP data; OLAP cubes; OLAP queries; View size; Hierarchical cluster
Result: Keyword Suggestion for Author
Query = Michael I. Jordan
Baselines
mining
Baselines
data
learning
frequent
Hitting Time
statistical
Hitting time
Efficient
large databases
kernel
Dirichlet process
pattern
frequent pattern
markov
approximate inference
inference
dirichlet
data mining sequential pattern
pattern mining
model
mean field
frequent
supervised learning
multi dimensional
graphic models
Query = Jiawei Han
Discussions
• Hitting time effectively boosts infrequent queries
– Nearest neighbor and personalized pagerank favor frequent queries
• Fast convergence: a few iterations and a subgraph get most of the value
• No parameter to tune
• Can be generalized to many other tasks (on different graphs)
Ranking on Query Log Graph and Search Tasks
• Query → Query: query suggestion
• Url → Url: finding related pages
– www.cs.jhu.edu/~brill → "research.microsoft.com/users/brill"
• IP → IP: finding similar users
• Url → Query: Annotation, Summarization, ads term
• Query → Url: Search
• IP, Query → Url: Personalized Search
• IP, Query → Query: Personalized Query Suggestion
• Many other opportunities!
Summary
• Generate query suggestions using hitting time on query-click graph
• Personalized query suggestion
• Generalizable to other search tasks
• Future work:
– Different types of graphs: e.g., query sessions
– Combine with other features
– Large scale evaluation
Thanks!