Query Suggestion Using Hitting Time
Qiaozhu Mei†, Dengyong Zhou‡, Kenneth Church‡
† University of Illinois at Urbana-Champaign
‡ Microsoft Research, Redmond

Motivating Examples
Query: MSG (a sports center? a food additive?)
1. Difficult for a user to express the information need
2. Difficult for a search engine to infer the information need
Query suggestions: express the information need accurately and make it easy to infer

Motivating Examples (Cont.)
Query: Welcome to the hotel california
Suggestions: hotel california eagles; hotel california; hotel california band; hotel california by the eagles; hotel california song; lyrics of hotel california; listen hotel california; eagle

Motivating Examples: Personalization
Query: MSR
• Metropolis Street Racer
• Magnetic Stripe Reader
• Molten salt reactor
• Mars Sample Return
• …
• Mountain safety research
Actually looking for Microsoft Research…

Research Questions
• How can we generate query suggestions in a principled way?
• Can we generate personalized query suggestions using the same method?
• Can this method be generalized to other search-related tasks?
Rest of This Talk
• Random Walk, Hitting Time, and Bipartite Graph
• Generating Query Suggestion
• Personalized Query Suggestion
• Experiments
• Discussion and Summary

Random Walk and Hitting Time
[Figure: from vertex i, the walk moves to vertex j with probability 0.7 and to vertex k with probability 0.3; A is a target set of vertices.]
• Hitting time
  – $T_A$: the first time that the random walk is at a vertex in A
• Mean hitting time
  – $h_{iA}$: expectation of $T_A$ given that the walk starts from vertex i

Computing Hitting Time
[Figure: the same walk from vertex i; h = 0 on the target set A.]
Example: $h_{iA} = 0.7\,h_{jA} + 0.3\,h_{kA} + 1$
• $T_A = \min\{t : X_t \in A,\ t \ge 0\}$: the first time that the random walk is at a vertex in A
• $h_{iA}$: expectation of $T_A$ given that the walk starts from vertex i
• Clearly, $h_{iA} = 0$ for $i \in A$
• Iterative computation:
  $h_{iA} = \sum_{j \in V} p(i \to j)\, h_{jA} + 1$ for $i \notin A$
  $h_{iA} = 0$ for $i \in A$

Bipartite Graph and Hitting Time
[Figure: a weighted bipartite graph between vertex sets V1 and V2; e.g., vertex i has an edge of weight 3 to j and an edge of weight 7 to k.]
• Bipartite graph:
  – Edges between V1 and V2
  – No edge inside V1 or V2
  – Edges are weighted
  – e.g., V1 = queries; V2 = URLs
• Transition probabilities:
  $d_i = \sum_{j \in V_2} w(i, j)$, $p(i \to j) = w(i, j) / d_i$
  $d_j = \sum_{i \in V_1} w(i, j)$, $p(j \to i) = w(i, j) / d_j$
  e.g., $w(i, j) = 3$ and $w(i, k) = 7$ give $d_i = 3 + 7 = 10$, so $p(i \to j) = 0.3$ and $p(i \to k) = 0.7$
• Expected proximity of query i to the query A: the hitting time of i → A, $h_{iA}$
• Convert to a directed graph, or even collapse one vertex group

Generate Query Suggestion
• Construct a (kNN) query-URL subgraph from the query log data (of a predefined number of queries/URLs)
• Compute transition probabilities p(i → j)
• Compute hitting time $h_{iA}$
• Rank candidate queries using $h_{iA}$
[Figure: example query-URL subgraph around the query "aa", with queries aa, mexiana, and american airline; URLs www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, and en.wikipedia.org/wiki/Mexicana; click counts such as 300 and 15 on the edges.]

Intuition
• Why does it work?
  – A URL is close to a query if freq(q, url) dominates the number of clicks on this URL (most people use q to access the URL)
  – A query is close to the target query if it is close to many URLs that are close to the target query

Personalized Query Suggestion
• Queries are ambiguous
• Different users → different information needs → different query suggestions
• Simple approach: build the graph and compute hitting time based solely on the user's history
• Data sparseness
  – E.g., a query the user has never issued cannot appear in that graph
• Alternative: modify the bipartite graph instead of rebuilding it from scratch

Personalize the Bipartite Graph
• Introduce a pseudo-query (personalized query), e.g., P = "aa" + user
• Reweight edges using personalized probabilities p(Url | User, Query) and p(Query | User, Url)
[Figure: query-URL graph for the query "aa": queries aa, alcoholics anonymous, and american airline, plus the pseudo-query P; URLs www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Alcoholics_Anonymous, and www.alcoholics-anonymous.org.]
• Key: how to compute p(Url | User, Query)
  – From w(url, user, query)
  – Sparse data!
  – Compute a smoothed p(Url | User, Query)

Personalization with Backoff (Mei and Church 08)
• Full personalization, P(Url | IP, Q): sparse data!
• No personalization, P(Url | IP0, Q): loses the opportunity
• Personalization with backoff, from the full address down to no personalization:
  4: P(Url | IP4, Q), IP4 = 156.111.188.243
  3: P(Url | IP3, Q), IP3 = 156.111.188.*
  2: P(Url | IP2, Q), IP2 = 156.111.*.*
  1: P(Url | IP1, Q), IP1 = 156.*.*.*
  0: P(Url | IP0, Q), IP0 = *.*.*.*
• We don't have enough data for everyone!
  – Backoff to classes of users (e.g., IP)

Experiments
• Query suggestion using query logs
  – Commercial search engine log (1.5 years)
  – 637 million queries; 585 million URLs
  – Query-click bipartite graph
• Author/keyword suggestion using DBLP
  – Titles and authors from DBLP
  – 110k papers, 580k authors
  – Coauthor graph, keyword graph, author-keyword bipartite graph
• Baselines: nearest neighbor; personalized PageRank

Result: Query Suggestion
Query = friends; systems compared: Hitting time, Google, Yahoo
Suggestions: wikipedia friends; friendship; friends poem; friendster; friends episode guide; friends scripts; how to make friends; true friends; friends tv show wikipedia; secret friends; friends home page; friends reunited; friends warner bros; hide friends; the friends series; hi 5 friends; friends official site; find friends; friends(1994); poems for friends; friends quotes

Result: Query Suggestion (II)
Query = aa; systems compared: Yahoo, Hitting time, Live
Suggestions: aa route planner; alcoholics anonymous; aa route finder; automobile association; aa airlines; aa route finder; theaa; aa meetings; aa route planner; american airlines; aa autoroute; aa airlines; aa road map; american airlines; american airline ticket reservation; aa meeting; aa road map; american air
Query = ranknet (Hitting time)
Suggestions: ndcg; Chris burges; learning to rank; ndcg measure; ir; lambdarank; pairwise test

Results: Personalized Query Suggestion
Query = msr
• Personalized: Microsoft research; research msrcorp; what is research; research website; microsoft research and development; yahoo research labs
• No personalization: mountian safety research; msr outdoor equipment; msr camp stoves; msr snowshoes; msr racing

Result: Author Suggestion
Query = Jon Kleinberg
• Nearest neighbor (personalized PageRank is similar): famous researchers + former students
  – Prabhakar Raghavan; Eva Tardos; Daniel P. Huttenlocher; David Kempe; Amit Kumar; Andrew Tomkins
• Hitting time: favors students, especially current students
  – Aleksandrs Slivkins; Mark Sandler; Tom Wexler; Lars Backstrom; Elliot Anshelevich; Xiangyang Lan

Result: Keyword Suggestion
Queries = social network, pagerank, olap
Suggested keywords: Knowledge collaboration; Community structure; Resource organization; Pagerank computation; Information kiosks; Ranking systems; Efficient searching; Pagerank approximation; Network extraction; Incremental computations; Dimension updates; OLAP data; OLAP cubes; OLAP queries; View size; Hierarchical cluster; Web spam; Iterative computation

Result: Keyword Suggestion for Author
Queries = Michael I. Jordan, Jiawei Han; Baselines vs. Hitting time
Keywords: mining; data; learning; frequent; statistical; Efficient large databases; kernel; Dirichlet process; pattern; frequent pattern; markov; approximate inference; inference; dirichlet; data mining; sequential pattern; pattern mining; model; mean field; frequent; supervised learning; multi dimensional; graphic models

Discussions
• Hitting time effectively boosts infrequent queries
  – Nearest neighbor and personalized PageRank favor frequent queries
• Fast convergence: a few iterations on a subgraph already capture most of the value
• No parameters to tune
• Can be generalized to many other tasks (on different graphs)

Ranking on Query Log Graphs and Search Tasks
• Query → Query: query suggestion
• Url → Url: finding related pages (e.g., www.cs.jhu.edu/~brill and research.microsoft.com/users/brill)
• IP → IP: finding similar users
• Url → Query: annotation, summarization, ad terms
• Query → Url: search
• IP, Query → Url: personalized search
• IP, Query → Query: personalized query suggestion
• Many other opportunities!

Summary
• Generate query suggestions using hitting time on the query-click graph
• Personalized query suggestion
• Generalizable to other search tasks
• Future work:
  – Different types of graphs, e.g., query sessions
  – Combine with other features
  – Large-scale evaluation

Thanks!
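The steps on the Generate Query Suggestion slide (transition probabilities from click counts, iterative hitting time, ranking by $h_{iA}$) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the click dictionary format, the fixed iteration count, and all click counts are hypothetical. Starting from h = 0, each round applies $h_{iA} = \sum_j p(i \to j)\, h_{jA} + 1$ off the target set, so after t rounds the value is the expected hitting time truncated at t steps.

```python
from collections import defaultdict


def transition_probs(clicks):
    """Random-walk transitions on the query-URL bipartite graph.

    clicks maps (query, url) -> click count w(query, url).
    Nodes are tagged ("q", query) or ("u", url) so the two vertex sets stay apart.
    Returns p[i][j] = w(i, j) / d_i, where d_i is the weighted degree of i.
    """
    edges = defaultdict(dict)
    degree = defaultdict(float)
    for (query, url), w in clicks.items():
        q, u = ("q", query), ("u", url)
        edges[q][u] = w
        edges[u][q] = w
        degree[q] += w
        degree[u] += w
    return {i: {j: w / degree[i] for j, w in nbrs.items()} for i, nbrs in edges.items()}


def hitting_times(p, targets, iterations=100):
    """Iterate h_i = 1 + sum_j p(i->j) h_j off the targets; h_i = 0 on the targets.

    After t rounds from h = 0, h_i is the expected hitting time truncated at
    t steps; a node that can never reach the targets ends up at exactly t.
    """
    h = {i: 0.0 for i in p}
    for _ in range(iterations):
        h = {
            i: 0.0 if i in targets else 1.0 + sum(pij * h[j] for j, pij in p[i].items())
            for i in p
        }
    return h


def suggest(clicks, target_query, k=5, iterations=100):
    """Rank the other queries by hitting time to target_query (smaller = closer)."""
    p = transition_probs(clicks)
    target = ("q", target_query)
    h = hitting_times(p, {target}, iterations)
    ranked = sorted((h[n], n[1]) for n in p if n[0] == "q" and n != target)
    return [query for _, query in ranked[:k]]


# Hypothetical toy click counts, loosely modeled on the "aa" example.
clicks = {
    ("aa", "www.aa.com"): 300,
    ("american airline", "www.aa.com"): 15,
    ("aa", "www.theaa.com"): 20,
    ("automobile association", "www.theaa.com"): 10,
    ("mexicana", "en.wikipedia.org/wiki/Mexicana"): 5,
    ("american airline", "en.wikipedia.org/wiki/Mexicana"): 2,
}
print(suggest(clicks, "aa"))
# -> ['automobile association', 'american airline', 'mexicana']
```

With these made-up counts, "mexicana" ranks last because its only route to "aa" runs through "american airline"; a query with no path to the target at all would be assigned the iteration count and fall to the bottom.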
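The Personalization with Backoff slide lists P(Url | IP4, Q) down to P(Url | IP0, Q) but does not spell out how the levels are combined. One common way to back off is a linear interpolation over the five IP-prefix levels, sketched below. The mixture form, the renormalization over levels that have data, the weights, the function names, and the toy log are all assumptions made for illustration, not necessarily the estimator used in Mei and Church 08.

```python
from collections import Counter, defaultdict


def ip_prefix(ip, level):
    """Keep the first `level` octets: level 4 is the full address, level 0 is *.*.*.*."""
    octets = ip.split(".")
    return ".".join(octets[:level] + ["*"] * (4 - level))


def build_counts(log):
    """Count URL clicks per (level, IP prefix, query) from (ip, query, url) records."""
    counts = defaultdict(Counter)
    for ip, query, url in log:
        for level in range(5):
            counts[(level, ip_prefix(ip, level), query)][url] += 1
    return counts


def p_url_backoff(counts, ip, query, url, weights):
    """Interpolated P(url | IP, query) = sum_l weights[l] * P(url | IP_l, query).

    weights[l] is a hypothetical mixing weight for level l (l = 0..4).
    Levels with no clicks for this prefix and query are skipped, and the
    remaining weights are renormalized so the result is still a probability.
    """
    score, used = 0.0, 0.0
    for level, lam in enumerate(weights):
        c = counts.get((level, ip_prefix(ip, level), query))
        if not c:
            continue
        score += lam * c[url] / sum(c.values())
        used += lam
    return score / used if used else 0.0


# Hypothetical toy log and weights.
log = [
    ("156.111.188.243", "msr", "research-site"),
    ("156.111.188.243", "msr", "research-site"),
    ("10.0.0.1", "msr", "outdoor-gear-site"),
    ("10.0.0.2", "msr", "outdoor-gear-site"),
    ("10.0.0.3", "msr", "outdoor-gear-site"),
]
counts = build_counts(log)
weights = [0.1, 0.1, 0.2, 0.2, 0.4]  # levels 0..4
print(p_url_backoff(counts, "156.111.188.243", "msr", "research-site", weights))  # about 0.94
print(p_url_backoff(counts, "200.1.1.1", "msr", "research-site", weights))  # about 0.4
```

The user at 156.111.188.243 gets a strongly personalized estimate, while an unseen address backs off entirely to the global level-0 distribution, which is the trade-off between full personalization (sparse data) and no personalization (lost opportunity) that the slide describes.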