Download Enabling Cost-based Optimization for Top

“Artificial Intelligence” in my research
Seung-won Hwang
Department of CSE, POSTECH

Recap
- Bridging the gap between under-/over-specified user queries
- We went through various techniques to support intelligent querying, implicitly/automatically, from data, prior users, the specific user, and domain knowledge
- My research shares the same goal, with some AI techniques applied (e.g., search, machine learning)

The Context
[Diagram: top-3 houses → Rank Formulation → query: select * from houses order by [ranking function F] limit 3 → Rank Processing → ranked results. Example: realtor.com]

Overview
- Usability: Rank Formulation
- Efficiency: Processing Algorithms (Rank Processing)
[Diagram: same pipeline as in The Context]

Part I: Rank Processing
- Essentially a search problem (which you studied in AI)

Limitation of the Naïve Approach
F = min(new, cheap, large), k = 1
Algorithm: sort step, then merge step
- new (search predicate), x: a:0.90, b:0.80, c:0.70, d:0.60, e:0.50
- cheap (expensive predicate), pc: d:0.90, a:0.85, b:0.78, c:0.75, e:0.70
- large (expensive predicate), pl: b:0.90, d:0.90, e:0.80, a:0.75, c:0.20
Merge result: b:0.78
Our goal is to schedule the order of probes to minimize the number of probes.

Global schedule: H(pc, pl)

OID   x     pc    pl    min(x, pc, pl)
a     0.90  0.85  0.75  0.75
b     0.80  0.78  0.90  0.78
c     0.70
d     0.60
e     0.50

The empty pc and pl cells for c, d, and e are unnecessary probes.
[Search tree: initial state (a:0.9 b:0.8 c:0.7 d:0.6 e:0.5) → pr(a,pc) = 0.85 → (a:0.85 b:0.8 c:0.7 d:0.6 e:0.5) → pr(a,pl) = 0.75 → … → goal state]

Search Strategies?
- Depth-first
- Breadth-first
- Depth-limited / iterative deepening (try every depth limit)
- Bidirectional
- Iterative improvement (greedy/hill climbing)

Best First Search
- Determines which node to explore next, using an evaluation function
- Evaluation function: explore further the object with the highest “upper bound score”
- We can show that this evaluation function minimizes the number of evaluations, by evaluating only when “absolutely necessary”

Necessary Probes?
- Necessary probe: a probe pr(u,p) is necessary if we cannot determine the top-k answers until probing pr(u,p), where u is an object and p is a predicate
Let the global schedule be H(pc, pl).

OID   x     pc    pl    min(x, pc, pl)
a     0.90  0.85  0.75  0.75
b     0.80  0.78  0.90  0.78
c     0.70  0.75  0.20  0.20
d     0.60  0.90  0.90  0.60
e     0.50  0.70  0.80  0.50

Can we decide top-1 without probing pr(a,pc)?
- No: pr(a,pc) is necessary! Without it, a is only known to score ≤ 0.90, which could still beat the top-1 answer b (0.78).

Global schedule: H(pc, pl), probe trace

Probe             Objects ordered by upper bound score
(initial state)   a:0.9   b:0.8   c:0.7  d:0.6  e:0.5
pr(a,pc) = 0.85   a:0.85  b:0.8   c:0.7  d:0.6  e:0.5
pr(a,pl) = 0.75   b:0.8   a:0.75  c:0.7  d:0.6  e:0.5
pr(b,pc) = 0.78   b:0.78  a:0.75  c:0.7  d:0.6  e:0.5
pr(b,pl) = 0.90   b:0.78  a:0.75  c:0.7  d:0.6  e:0.5

Top-1: b:0.78. Probes on c, d, and e are unnecessary.

Generalization
- Sorted access cost: s = 1 (cheap), s = h (expensive), s = ∞ (impossible)
- Random access cost: r = 1 (cheap), r = h (expensive), r = ∞ (impossible)
- Existing algorithms, each covering part of the grid: FA, TA, QuickCombine; CA, SR-Combine; NRA, StreamCombine; MPro [SIGMOD02/TODS]
- Covering the whole grid: Unified Top-k Optimization [ICDE05a/TKDE]

Just for Laugh (adapted from Hyountaek Yong’s presentation)
Strong nuclear force, electromagnetic force, weak nuclear force, gravitational force → Unified field theory

FA, TA, NRA, CA, MPro → Unified Cost-based Approach

Generality
- Across a wide range of scenarios
- One algorithm for all

Adaptivity
- Optimal for the specific runtime scenario

Cost-based Approach
- Cost-based optimization: finding the optimal algorithm for the given scenario, with minimum cost, from a space $\Omega$

$M_{opt} = \arg\min_{M \in \Omega} \mathrm{Cost}(M)$

Evaluation: Unification and Contrast (v. TA)
- Unification: for a symmetric function, e.g., avg(p1, p2), framework NC behaves similarly to TA
- Contrast: for an asymmetric function, e.g., min(p1, p2), NC adapts with different behaviors and outperforms TA
[Two plots: cost over depth into p1 and depth into p2, with points labeled N and T]

Part II: Rank Formulation
- Usability: Rank Formulation
- Efficiency: Processing Algorithms (Rank Processing)
[Diagram: same pipeline as in The Context]

Learning F from implicit user interactions
Using machine learning techniques (that you will learn soon!) to combine a quantitative model for efficiency and a qualitative model for usability
Quantitative model
- Query condition is represented as a mapping F of objects into absolute numerical scores
- DB-friendly, by attaining the absolute score on each object
- Example: F( ) = 0.9, F( ) = 0.5 [object images not preserved]
Qualitative model
- Query condition is represented as a relative ordering of objects
- User-friendly, by relieving the user from specifying the absolute score on each object
- Example: ( ) > ( ) [object images not preserved]

A Solution: RankFP (RANK Formulation and Processing)
- For usability, a qualitative formulation front-end which enables rank formulation by ordering samples
- For efficiency, a quantitative ranking function F which can be efficiently processed
[Diagram, steps numbered 1 to 5 in the original:
Rank Formulation: sample S (unordered) → ranking R* over S → Function Learning: learn new F → Over S: R_F vs. R*? → no: Sample Selection: generate new S; yes: ranking function F
Rank Processing: Q: select * from houses order by F limit k → processing of Q → ranked results]

Task 1: Ranking → Classification
- Challenge: unlike a conventional learning problem of classifying objects into groups, we learn a desired ordering of all objects
[Diagram: learning algorithms: a binary classifier F separating + from -]
- Solution: we transform ranking into a classification on pairwise comparisons [Herbrich00]
Ranking view: c > b > d > e > a
Classification view (pairwise comparison classification): a-b, b-c, c-d, d-e, a-c, …, each pair labeled + or -
[Herbrich00] R. Herbrich et al. Large margin rank boundaries for ordinal regression. MIT Press, 2000.

Task 2: Classification → Ranking
- Challenge: with the pairwise classification function, we need to process ranking efficiently
[Diagram: pairwise evaluations F(a-b)?, F(a-c)?, F(a-d)?, … over objects a, b, c, d, e, contrasted with a per-object score such as F(a) = 0.7]
- Solution: developing a duality that connects F also as a global per-object ranking function
Suppose function F is linear.
Classification view: $F(u_i - u_j) > 0$
Ranking view: $F(u_i - u_j) > 0 \iff F(u_i) - F(u_j) > 0 \iff F(u_i) > F(u_j)$
Rank with F(.), e.g., F(c) > F(b) > F(d) > …

Task 3: Active Learning
- Finding samples maximizing learning effectiveness
- Selective sampling: resolving the ambiguity
- Top sampling: focusing on top results
- Achieving >90% accuracy in <=3 iterations (<=10 ms)

Using Categorization for Intelligent Retrieval
- Category structure created a priori (typically a manual process)
- At search time: each search result is placed under a pre-assigned category
- Susceptible to skew → information overload

Categorization: Cost-based Optimization
- Categorize results automatically/dynamically
  - Generate a labeled, hierarchical category structure dynamically, based on the contents of the tuples in the result set
  - Does not suffer from the problems of a priori categorization
- Contributions:
  - Exploration/cost models to quantify the information overload faced by a user during an exploration
  - Cost-driven search to find low-cost categorizations
  - Experiments to evaluate models/algorithms

Thank You!
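As a rough illustration of the Best First Search and Necessary Probes slides, the sketch below always probes the object with the highest upper bound score, following the global schedule H(pc, pl). The scores are the ones from the slides; the function name `top_k_min` and the constants `X`, `PROBE_VALUES`, and `SCHEDULE` are hypothetical names chosen for this example, and the code is only a minimal sketch in the spirit of the probe scheduling shown, not the published MPro implementation.

```python
import heapq

# Scores from the slides: x is known up front; pc and pl are expensive probes.
X = {"a": 0.90, "b": 0.80, "c": 0.70, "d": 0.60, "e": 0.50}
PROBE_VALUES = {
    "pc": {"a": 0.85, "b": 0.78, "c": 0.75, "d": 0.90, "e": 0.70},
    "pl": {"a": 0.75, "b": 0.90, "c": 0.20, "d": 0.90, "e": 0.80},
}
SCHEDULE = ["pc", "pl"]  # global schedule H(pc, pl); hypothetical constant name


def top_k_min(k=1):
    """Best-first top-k for F = min(x, pc, pl); returns (answers, probe log)."""
    # heapq is a min-heap, so upper bounds are negated to pop the largest first.
    # Entries are (-upper_bound, object id, number of probes already done).
    heap = [(-score, oid, 0) for oid, score in X.items()]
    heapq.heapify(heap)
    answers, probes = [], []
    while heap and len(answers) < k:
        neg_bound, oid, done = heapq.heappop(heap)
        if done == len(SCHEDULE):
            # Fully probed and still on top: the bound is the final score.
            answers.append((oid, -neg_bound))
            continue
        pred = SCHEDULE[done]
        value = PROBE_VALUES[pred][oid]  # stands in for the expensive probe
        probes.append((oid, pred, value))
        # For min, a probe can only lower (or keep) the upper bound.
        heapq.heappush(heap, (-min(-neg_bound, value), oid, done + 1))
    return answers, probes


answers, probes = top_k_min(k=1)
print(answers)  # [('b', 0.78)]
print(probes)   # four probes: a on pc, a on pl, b on pc, b on pl
```

Only the four probes on a and b are issued; c, d, and e are never probed because their upper bounds stay below the final score of b.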
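The two transformations in Task 1 and Task 2 can be sketched together: a ranking is turned into pairwise difference vectors, a linear F is fit on them, and the same F then scores each object individually. This is a minimal sketch under stated assumptions. The two-dimensional feature values in `FEATURES` are made up for illustration, and a plain perceptron update is swapped in for the large-margin learner of [Herbrich00] to keep the example dependency-free; only the ranking c > b > d > e > a comes from the slides.

```python
from itertools import combinations

# Hypothetical two-feature descriptions of the five objects (not from the slides).
FEATURES = {
    "a": (0.9, 0.1), "b": (0.8, 0.6), "c": (0.7, 0.9),
    "d": (0.6, 0.5), "e": (0.5, 0.4),
}
USER_RANKING = ["c", "b", "d", "e", "a"]  # ranking view from the slides


def dot(w, v):
    """Linear function F(v) = w . v."""
    return sum(wi * vi for wi, vi in zip(w, v))


def diff(u, v):
    """Pairwise comparison vector u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def learn_linear_f(ranking, features, max_epochs=1000):
    """Fit w so that w . (u_i - u_j) > 0 whenever u_i outranks u_j."""
    # combinations() preserves list order, so `hi` always outranks `lo`.
    pairs = [diff(features[hi], features[lo])
             for hi, lo in combinations(ranking, 2)]
    w = [0.0] * len(pairs[0])
    for _ in range(max_epochs):
        mistakes = 0
        for d in pairs:
            if dot(w, d) <= 0:  # pair falls on the wrong side
                w = [wi + di for wi, di in zip(w, d)]  # perceptron update
                mistakes += 1
        if mistakes == 0:  # every pairwise comparison is classified correctly
            break
    return w


def rank_with_f(w, features):
    """Ranking view: sort objects by their individual score F(u) = w . u."""
    return sorted(features, key=lambda oid: dot(w, features[oid]), reverse=True)


weights = learn_linear_f(USER_RANKING, FEATURES)
learned_ranking = rank_with_f(weights, FEATURES)
print(learned_ranking)  # ['c', 'b', 'd', 'e', 'a']
```

Because F is linear, classifying a pair costs the same as comparing two per-object scores, so the learned function can be handed to rank processing as an ordinary `order by F` expression.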
