Interactive SQL Query Suggestion: Making Databases User-Friendly
Ju Fan, Guoliang Li, and Lizhu Zhou
Database Research Group, Tsinghua University
ICDE 2011 – Apr. 13, Hanover

5/22/2017 SQL Suggestion, ICDE 2011

Outline
• Motivation
• Overview of SQL Query Suggestion
• Template Suggestion
• SQL Generation
• Experiments
• Conclusion

SQL: Powerful Yet Difficult
• SQL is powerful, but inexperienced users find it difficult to pose queries. They must:
  ▪ Be skilled in SQL syntax to express their query intent
  ▪ Have a thorough understanding of the schema

SQL Assistant Tools
• Target Users
  ▪ Novice users who struggle with basic SQL syntax or the structure of the schema
• Limitations
  ▪ Only support metadata and SQL syntax
  ▪ Require users to join multiple tables manually

Keyword Search over RDB
• Keyword Search over Relational DB
  ▪ Data: A database with multiple tables
  ▪ Query: Keywords
  ▪ Answer: Joined tuples containing the keywords
• Limitations
  ▪ Cannot precisely capture users’ query intent
  ▪ May involve irrelevant results
  ▪ Cannot support aggregate functions, range queries, etc.
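
To make these limitations concrete, here is a minimal sketch using Python's built-in sqlite3 module. The three-table schema (Paper, Author, Write), the sample rows, and the helper names build_db, keyword_search, and count_papers are hypothetical and chosen only for illustration; LIKE stands in for whatever keyword matching a real system would use. The sketch shows that keyword search returns joined tuples, while a count requires an aggregate SQL query that keywords alone do not express.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory bibliography database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Paper  (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Write  (aid INTEGER, pid INTEGER);
        INSERT INTO Paper  VALUES (1, 'Keyword search in databases'),
                                  (2, 'Query suggestion for SQL'),
                                  (3, 'Keyword query cleaning');
        INSERT INTO Author VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO Write  VALUES (1, 1), (1, 2), (2, 3);
    """)
    return conn


def keyword_search(conn, keyword):
    """Return joined (author, title) tuples that contain the keyword."""
    pattern = f"%{keyword}%"
    return conn.execute(
        """SELECT A.name, P.title
           FROM Paper P
           JOIN Write W ON P.id = W.pid
           JOIN Author A ON A.id = W.aid
           WHERE P.title LIKE ? OR A.name LIKE ?
           ORDER BY P.id""",
        (pattern, pattern),
    ).fetchall()


def count_papers(conn, keyword):
    """The aggregate a user may actually want: how many papers match."""
    return conn.execute(
        "SELECT COUNT(*) FROM Paper WHERE title LIKE ?",
        (f"%{keyword}%",),
    ).fetchone()[0]


conn = build_db()
tuples = keyword_search(conn, "keyword")  # a list of joined tuples
total = count_papers(conn, "keyword")     # a single aggregated number
print(tuples)
print(total)
```

The keyword interface can only hand back the matching tuples; getting the count means writing the aggregate query by hand, which is the gap SQL suggestion targets.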

SQL Suggestion from Keywords

Features of SQL Suggestion
• Objective: Assist users to formulate SQL queries using keywords
• Main Features
  ▪ Query intent prediction
  ▪ Answer grouping
  ▪ Aggregation queries
  ▪ Range queries

Comparison of Query Paradigms
[Figure: Keyword Search, SQL Suggestion, and SQL placed on two axes, Usability (easier) and Expressiveness (more powerful).]

Problem Definition
• Data: A database with multiple tables
• Query: Keywords (from the user)
• Answer: SQL queries

A Two-Step Framework
• User query: “count paper ir”
• Step 1, Template Suggestion: one of the relevant templates
• Step 2, SQL Generation: one of the generated SQL queries
    SELECT COUNT(P.id)
    FROM Paper P, Author A, Write W
    WHERE A.name CONTAINS “ir”
      AND A.id = W.aid AND P.id = W.pid

Template Suggestion
[Figure: the two-step framework diagram, repeated for Step 1, Template Suggestion.]

Queryable Template
• The skeleton of SQL queries that models the joined entities and their attributes.
• A template is an undirected graph

Template Generation
• Atom Entities
  ▪ E.g., Paper
• Expansion Rules
  ▪ E.g., P – W → P – W – A
• Combinatorial Explosion
  ▪ A ranking model avoids exploring all templates

ID   Template   Size
T1   P          1
T2   A          1
T3   W          1
T4   P–W        2
T5   A–W        2
T6   P–W–A      3
T7   W–P–W      3
T8   W–A–W      3
…    …          …

Template Ranking Model
  $P(Q,T) = P(T) \sum_{R \in T} \sum_{k \in Q} P(k \mid R)\, P(R \mid T)$
• Q: the query keywords Keyword1, Keyword2, …, Keywordn
• R: the entities in template T (e.g., Paper, Write, Author)
• P(k|R): relevance of R to k (TF-IDF)
• P(R|T): importance of R to T (PageRank)
• P(T): query ability of T

Top-k Suggestion Algorithm
  $P(Q,T) = P(T) \sum_{R \in T} w_R\, P(R \mid T)$, where $w_R = \sum_{k \in Q} P(k \mid R)$
• Fagin Algorithm [5]
  ▪ Lists of templates for all entities, ordered by P(R|T)
  [Figure: three sorted lists, P(P|T), P(A|T), and P(W|T), weighted by w_P, w_A, and w_W.]
• Indexing
  ▪ Inverted Index: Entity-to-Template
  ▪ Forward Index: Template-to-Entity

SQL Generation
[Figure: the two-step framework diagram, repeated for Step 2, SQL Generation.]

Match Keywords to Attributes
• A matching is a set of keyword-to-attribute mappings
• Mapping types: selection (σ), aggregation (Φ), projection (π)
[Figure: the keywords “database”, “count”, and “author” mapped to attributes of Paper (id, title, booktitle, year) and Author (id, name).]

SQL Generation Model
  $S(M) = \sum_{m \in M} \rho(k, A)\, I(A)$
• ρ(k,A): the degree of a mapping
• I(A): the importance of mapped attributes (Entropy)

Best SQL Query Generation
• Optimization problem: maximize
  $S(M) = \sum_{m \in M} \rho(k, A)\, I(A)$
• Weighted Set Covering Problem (NP-hard)
  ▪ A greedy approximation algorithm
• Extensions
  ▪ Find top-k matchings

Experiment Setup
• Data sets
  ▪ DBLP: More than one million publication records
  ▪ DBLife: Activity information about people in the database community
• Query sets, e.g.,
  ▪ count author mining (DBLP)
  ▪ database jim gray (DBLife)
• Baseline method: DISCOVER-II
• User study for effectiveness evaluation

Template Suggestion
[Figure: precision-recall curves on the DBLife data set.]

SQL Generation
• Query: database Jim Gray
[Figure: precisions on the DBLife data set.]

Record Retrieval
• Query: count author mining
• Advantages of SQL Suggestion
  ▪ Supports aggregation functions
  ▪ Supports metadata matching
[Figure: precisions on the DBLife data set.]

Efficiency
[Figure: efficiency comparison (DBLife).]

Scalability
[Figure: scalability (DBLP).]

Conclusion
• An effective and user-friendly keyword-based method
• Assists users in formulating SQL queries
• Suggests templates relevant to keyword queries
• Generates SQL queries from templates
• Extensive experiments

Future Work
• This study opens many new interesting and challenging problems
  ▪ Cardinality estimation of suggested SQL queries
  ▪ Personalized SQL suggestion

Thanks
Demo: http://tastier.cs.tsinghua.edu.cn/sqlsugg
My Homepage:
http://dbgroup.cs.tsinghua.edu/fanju

Comparison with Existing Work
• CN-Based Methods
  ▪ Better template ranking
  ▪ SQL Generation
  ▪ Aggregation functions, range queries, etc.
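
The template ranking formula from the Template Ranking Model slide, P(Q,T) = P(T) ∑R∈T ∑k∈Q P(k|R) P(R|T), can be sketched in Python as follows. All probability values are made-up toy numbers, and the names TEMPLATES, P_T, P_R_GIVEN_T, P_K_GIVEN_R, entity_weight, score, and rank_templates are hypothetical. In the slides P(k|R) comes from TF-IDF and P(R|T) from PageRank; neither is computed here. The sketch scores every template exhaustively rather than using the Fagin-style top-k algorithm the slides mention.

```python
from typing import Dict, List, Tuple

# Toy inputs; every value below is a hypothetical illustration.
TEMPLATES: Dict[str, List[str]] = {
    "T1": ["P"],
    "T4": ["P", "W"],
    "T6": ["P", "W", "A"],
}
# P(T): query ability of each template.
P_T: Dict[str, float] = {"T1": 0.5, "T4": 0.2, "T6": 0.3}
# P(R|T): importance of entity R within template T.
P_R_GIVEN_T: Dict[Tuple[str, str], float] = {
    ("P", "T1"): 1.0,
    ("P", "T4"): 0.7, ("W", "T4"): 0.3,
    ("P", "T6"): 0.4, ("W", "T6"): 0.2, ("A", "T6"): 0.4,
}
# P(k|R): relevance of entity R to keyword k; missing pairs count as 0.
P_K_GIVEN_R: Dict[Tuple[str, str], float] = {
    ("paper", "P"): 0.6,
    ("ir", "P"): 0.1,
    ("ir", "A"): 0.5,
}


def entity_weight(entity: str, keywords: List[str]) -> float:
    """Inner sum over keywords: sum of P(k|R) for k in Q."""
    return sum(P_K_GIVEN_R.get((k, entity), 0.0) for k in keywords)


def score(template: str, keywords: List[str]) -> float:
    """P(Q,T) = P(T) * sum over R in T of [sum_k P(k|R)] * P(R|T)."""
    total = sum(
        entity_weight(entity, keywords) * P_R_GIVEN_T[(entity, template)]
        for entity in TEMPLATES[template]
    )
    return P_T[template] * total


def rank_templates(keywords: List[str], k: int) -> List[Tuple[str, float]]:
    """Score all templates and return the k highest-scoring ones."""
    scored = [(t, score(t, keywords)) for t in TEMPLATES]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


ranking = rank_templates(["paper", "ir"], k=2)
for template_id, value in ranking:
    print(template_id, round(value, 3))
```

Because the inner keyword sum depends only on the entity, it can be computed once per entity and reused across templates, which is what makes per-entity sorted lists a natural fit for a threshold-style top-k algorithm.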