Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computational complexity theory wikipedia , lookup
Genetic algorithm wikipedia , lookup
Factorization of polynomials over finite fields wikipedia , lookup
Lateral computing wikipedia , lookup
Simplex algorithm wikipedia , lookup
Algorithm characterizations wikipedia , lookup
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al. Outline • • • • Motivation The framework and problem definition Proposed solution Discussions • Experiments Heating up discussion • We basically know how web search engines work. – Having web crawlers collecting web-page information, index and rank them. • How do we define searching in a relational database – Free-style search v.s. SQL + predicates ? – What’s the expected outcome? – How do we rank results? Motivation • Searching over a relational database – information scattered in different relations Motivation • Full text search, aggregation already supported by RDBMS – What else do we need in order to perform good searching? Related work • Information Retrieval (full text searching) • Researches in Text Databases • Explore database via foreign key-primary key – DBExplorer (ICDE 2002) – BANKS (ICDE 2002) – DISCOVER (VLDB 2002) • What are related work missing – Target objects don’t contain keywords – Lack of scoring function for query results – Not utilizing aggregates to put together search results for multiple keywords Contributions • Introduce an interesting problem domain • Define “Object Finder” (OF) queries • Propose scoring functions • Propose a solution to process OF query – Return top K ranked results – Efficient early termination property System Overview Scoring functions • Scoring Matrixes and row- column- marginal's Scoring semantics • All Query Keywords Present in each document – can be too restrictive • All Query Keywords Present in Set of Related Documents – can not use MIN as row-marginal scoring • Pseudo-document Approach: – enlarged searching space Problem definition • Object finder problem: Process OF query as Top-K query • Top-K query incorporates ranking. Results are total ordered if we process strong topK • A good algorithm can utilize early termination to avoid processing of results that are not in top-K Top-K query processing • General framework: Supporting Ad-hoc Ranking Aggregates SIGMOD 2006 ( presented in May) *SELECT* ga_1,..ga_n , F ----groups *FROM* R1,...,Rh ----source rel *WHERE* c1 AND... cl ----join cond. *GROUP BY* ga_1,...ga_n ----group def. *ORDER BY* F ----ordering func. an aggregate *LIMIT* k ----Top-k setting Top-K query processing • For OF query, it is select TOId, TOValue, score(TOId) from TargetTable T, R, L1,...,LN where R.TOId = T.TOID and R.DocId=Li.DocID (i=1..N) group by TOId, TOValue order By score(TOId) limit k My work is done (please try to recall my last talk) Algorithm : Generate-Prune Phrase I : Compute top-K candidates Algorithm Overview Algorithm • Phrase II Compute exact top-K Discussions • In this work – Choice of aggregation function – ranking function in general – How do you think of this work • Not limited – Impact of more complicated schema – Impact of selectivity of the query Experiment Results • • • • Faster than SQL Faster than Generate-Only Robust to # of keywords and selections Intuitive Results Experiments • Faster than SQL Experiments • Faster than Generate-Only Experiments • Robust to # of keywords and selections Thank you Questions to discuss?