Ranking objects based on relationships
Computing Top-K over Aggregation
SIGMOD 2006
Kaushik Chakrabarti et al.

Outline
• Motivation
• The framework and problem definition
• Proposed solution
• Discussions
• Experiments

Heating up discussion
• We basically know how web search engines work.
  – Web crawlers collect web-page information, which is then indexed and ranked.
• How do we define searching in a relational database?
  – Free-style search vs. SQL + predicates?
  – What’s the expected outcome?
  – How do we rank results?

Motivation
• Searching over a relational database
  – Information is scattered across different relations

Motivation
• Full-text search and aggregation are already supported by RDBMSs
  – What else do we need in order to perform good searching?

Related work
• Information retrieval (full-text searching)
• Research in text databases
• Exploring databases via foreign key-primary key relationships
  – DBExplorer (ICDE 2002)
  – BANKS (ICDE 2002)
  – DISCOVER (VLDB 2002)
• What related work is missing
  – Target objects don’t contain keywords
  – Lack of a scoring function for query results
  – Aggregates are not used to put together search results for multiple keywords

Contributions
• Introduce an interesting problem domain
• Define “Object Finder” (OF) queries
• Propose scoring functions
• Propose a solution to process OF queries
  – Return top-K ranked results
  – Efficient early termination property

System Overview

Scoring functions
• Scoring matrices with row- and column-marginals

Scoring semantics
• All query keywords present in each document
  – can be too restrictive
• All query keywords present in the set of related documents
  – cannot use MIN as row-marginal scoring
• Pseudo-document approach:
  – enlarged search space

Problem definition
• Object finder problem: process an OF query as a top-K query
• Top-K query incorporates ranking.
Results are totally ordered if we process strong top-K
• A good algorithm can utilize early termination to avoid processing results that are not in the top-K

Top-K query processing
• General framework: Supporting Ad-hoc Ranking Aggregates, SIGMOD 2006 (presented in May)
  SELECT ga_1, ..., ga_n, F     ---- groups
  FROM R1, ..., Rh              ---- source rel.
  WHERE c1 AND ... AND cl       ---- join cond.
  GROUP BY ga_1, ..., ga_n      ---- group def.
  ORDER BY F                    ---- ordering func., an aggregate
  LIMIT k                       ---- top-k setting

Top-K query processing
• For an OF query, it is
  SELECT TOId, TOValue, score(TOId)
  FROM TargetTable T, R, L1, ..., LN
  WHERE R.TOId = T.TOId AND R.DocId = Li.DocId (i = 1..N)
  GROUP BY TOId, TOValue
  ORDER BY score(TOId)
  LIMIT k

My work is done (please try to recall my last talk)

Algorithm: Generate-Prune
• Phase I: Compute top-K candidates

Algorithm Overview

Algorithm
• Phase II: Compute exact top-K

Discussions
• In this work
  – Choice of aggregation function
  – Ranking function in general
  – What do you think of this work?
• Not limited
  – Impact of more complicated schema
  – Impact of selectivity of the query

Experiment Results
• Faster than SQL
• Faster than Generate-Only
• Robust to # of keywords and selections
• Intuitive results

Experiments
• Faster than SQL

Experiments
• Faster than Generate-Only

Experiments
• Robust to # of keywords and selections

Thank you
Questions to discuss?
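
Supplementary sketch (scoring semantics)
The scoring slides mention a scoring matrix with row- and column-marginals, and note that MIN cannot serve as the row-marginal score when keywords are spread over a set of related documents. The Python sketch below illustrates that point under stated assumptions: each target object has a matrix with one row per related document and one column per query keyword, MIN combines across keywords, and SUM aggregates across documents. The function names, the choice of SUM, and the toy scores are hypothetical and are not taken from the paper.

```python
import heapq

def row_marginal(matrix, combine, aggregate):
    # Combine keyword scores within each document (row) first,
    # then aggregate the per-document values across documents.
    return aggregate(combine(row) for row in matrix)

def column_marginal(matrix, combine, aggregate):
    # Aggregate each keyword (column) across documents first,
    # then combine the per-keyword values across keywords.
    return combine(aggregate(col) for col in zip(*matrix))

def top_k(scores, k):
    # Return the k (object, score) pairs with the highest scores.
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

# Toy data (assumed): rows are related documents, columns are two keywords.
matrices = {
    "A": [[1.0, 0.0],   # document 1 matches only keyword 1
          [0.0, 2.0]],  # document 2 matches only keyword 2
    "B": [[0.5, 0.5]],  # a single document matches both keywords
}

row_scores = {o: row_marginal(m, min, sum) for o, m in matrices.items()}
col_scores = {o: column_marginal(m, min, sum) for o, m in matrices.items()}

print(top_k(row_scores, 1))
print(top_k(col_scores, 1))
```

With MIN applied per row, object A scores zero even though its related documents jointly cover both keywords; applying MIN after the per-keyword aggregation ranks A first. This only illustrates the semantics; it does not perform the early termination that the Generate-Prune algorithm is designed for.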