DBease: Making Databases User-Friendly and Easily Accessible
Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng
Database Group, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

How to Access Databases?
• Traditional database-access methods:
– SQL
  Select title, author, booktitle, year From dblp Where title Contains “search” And booktitle Contains “cidr”
– Query-by-example (Form)
  cidr
– Keyword Search
  “search cidr”
CIDR'11 - DBease (2)

Usability Comparison of Different Methods
[Figure: usability comparison of the access methods]

Keyword Search
• Is traditional keyword search good enough?
– Too many results!
– No result!

Form-based Search
• Form-based Search has the same problem.
– Complicated and still no result!

Our Solution
[Diagram labels: Type-Ahead Search, Usability, Type-Ahead Search in Forms, SQL Suggestion]

What is Type-Ahead Search?
[Figure: screenshot]

Type-Ahead Search
• Advantages
– Giving users instant feedback on the fly
– Helping users navigate the underlying data
– Tolerating inconsistencies between query and data
– Supporting synonyms
– Supporting XML data
– Supporting multiple tables

Problem Formulation
• Data: A set of records
• Query
– Q = {p1, p2, …, pl}: a set of prefixes
– δ: Edit-distance threshold
• Result
– A set of records having all query prefixes or their similar forms (conjunctive)
• Edit Distance
– The number of edit operations (insertion, deletion, substitution) needed to transform one string into another
– ed(string, stang) = 2

Indexing
• Trie Index
– Words: paths from the root to the leaves
– Inverted lists on leaves

Algorithm
• Step 1: Find similar prefixes incrementally
• Step 2: Retrieve the leaf nodes of similar prefixes
• Step 3: Compute union lists of inverted lists of leaf nodes
• Step 4: Intersect the union lists of query keywords

Type-Ahead Search in Forms
[Diagram labels: Type-Ahead Search, Usability, Type-Ahead Search in Forms]

What is Type-Ahead Search in Forms?
[Figure: screenshot]

Type-Ahead Search in Forms
• Problem Formulation
– Data: A relation with multiple attributes
– Query: A set of prefixes on attributes in a form interface
– Answers:
  • Local results of the focused attribute
  • Global results of the relation
• Advantages
– On-the-fly Faceted Search
– Supporting Aggregation

Data Partition
• Global Table
  ID  Title         Conf.   Author
  1   xml database  VLDB    albert
  2   xml database  SIGMOD  bob
  3   xml search    VLDB    albert
  4   xml security  VLDB    alice
  5   rdbms         SIGMOD  charlie
• Local Tables
  ID  Title
  T1  xml database
  T2  xml search
  T3  xml security
  T4  rdbms

  ID  Conf.
  C1  VLDB
  C2  SIGMOD

  ID  Author
  A1  albert
  A2  bob
  A3  alice
  A4  charlie

Indexing
• Each attribute
– Trie
– Mapping Tables
  • Local to Global (L-G)
  • Global to Local (G-L)
[Figure: trie rooted at Φ for the Title attribute, where T1: xml database, T2: xml search, T3: xml security]
  L-G Mapping Table
  Local  Global
  T1     1, 2
  T2     3
  T3     4
  T4     5

  G-L Mapping Table
  Global  Local
  1       T1
  2       T1
  3       T2
  4       T3
  5       T4

Our Solution
• Form input: Title: xml, Author: (empty)
– Local results (Title): xml database, xml search, xml security
– Global results: xml database (albert), xml database (bob), xml search (albert), xml security (alice)
[Figure: Title trie with the L-G and G-L mapping tables from the Indexing slide]

Our Solution
• Form input: Title: xml, Author: al
– Local results (Author): albert, alice
– Global results: xml database, albert; xml search, albert; xml security, alice
[Figure: Author trie (albert, alice) with the L-G and G-L mapping tables]

SQL Suggestion
[Diagram labels: Type-Ahead Search, Usability, Type-Ahead Search in Forms, SQL Suggestion]

What is SQL Suggestion?
[Figure: screenshot]
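Looking back at the Data Partition and Indexing slides, the form example (Title: xml, Author: al) can be reproduced with a minimal sketch. It uses the global table from the slides, but the function names (`build_local_to_global`, `matches`, `search_form`) are hypothetical, and it assumes exact prefix matching with a linear scan over attribute values where DBease uses a trie per attribute. It is an illustration of the L-G and G-L mapping idea, not the authors' implementation.

```python
from collections import defaultdict

# Global table from the Data Partition slide: global ID -> attribute values.
GLOBAL = {
    1: {"Title": "xml database", "Conf.": "VLDB", "Author": "albert"},
    2: {"Title": "xml database", "Conf.": "SIGMOD", "Author": "bob"},
    3: {"Title": "xml search", "Conf.": "VLDB", "Author": "albert"},
    4: {"Title": "xml security", "Conf.": "VLDB", "Author": "alice"},
    5: {"Title": "rdbms", "Conf.": "SIGMOD", "Author": "charlie"},
}


def build_local_to_global(attribute):
    """L-G mapping for one attribute: distinct local value -> global IDs."""
    mapping = defaultdict(list)
    for gid in sorted(GLOBAL):
        mapping[GLOBAL[gid][attribute]].append(gid)
    return dict(mapping)


def matches(value, prefix):
    """Assumption: exact (non-fuzzy) prefix match on any word of the value."""
    return any(word.startswith(prefix) for word in value.split())


def search_form(form, focused):
    """form maps attribute -> typed prefix ('' for an empty field).

    Returns (local results of the focused attribute, global result IDs).
    """
    global_ids = set(GLOBAL)
    for attribute, prefix in form.items():
        if not prefix:
            continue
        l2g = build_local_to_global(attribute)
        hits = {gid for value, gids in l2g.items()
                if matches(value, prefix) for gid in gids}
        global_ids &= hits  # conjunctive across form fields
    ordered = sorted(global_ids)
    # G-L direction: map surviving global IDs back to focused-attribute values.
    local = []
    for gid in ordered:
        value = GLOBAL[gid][focused]
        if value not in local:
            local.append(value)
    return local, ordered


print(search_form({"Title": "xml", "Author": ""}, "Title"))
# (['xml database', 'xml search', 'xml security'], [1, 2, 3, 4])
print(search_form({"Title": "xml", "Author": "al"}, "Author"))
# (['albert', 'alice'], [1, 3, 4])
```

The second call matches the slide: global records 1, 3, and 4 survive, and the focused Author field offers albert and alice.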
SQL Suggestion
• Problem Formulation
– Data: A database with multiple tables
– Query: A set of keywords
– Answers: Relevant SQL queries
• Advantages
– Suggest SQL queries based on keywords
– Help users formulate SQL queries to find accurate results
– Designed for both SQL programmers and Internet users
– Group answers based on SQL structures
– Support Aggregation
– Support Range queries

Our Solution
• Suggest Templates from Keywords
– A template is a structure in the databases
– Modeled as a graph
  • Nodes: entities (table names or attribute names)
  • Edges: foreign keys or membership
• Suggest SQL queries from Templates
– Mapping between keywords and templates
[Figure panels: (a) Query, (b) Template, (c) SQL; figure text: keyword, paper, ir]

Template Suggestion
• Template Generation
– Extension from basic entities (tables)
• Template Ranking
– Template weight
  • PageRank
– Relevancy between a keyword and an entity
  • tf*idf
• Algorithms
– Fagin's algorithms
– Threshold-based pruning techniques

SQL Suggestion
• SQL suggestion model
– Mapping from keywords to templates
– A matching is a set of mappings that covers all keywords
– Weighted set-covering problem (NP-hard)
• SQL ranking
– Relevancy between keywords and attributes
– Attribute weight
• Algorithms
– Greedy algorithms

Search: dbease
http://dbease.cs.tsinghua.edu.cn
Keyword Search:
http://dbease.cs.tsinghua.edu.cn/ipubmed/
http://dbease.cs.tsinghua.edu.cn/dblpsearch/
Form-based Search:
http://dbease.cs.tsinghua.edu.cn/seaform/
SQL:
http://dbease.cs.tsinghua.edu.cn/sqlsugg/

Differences to Google Instant Search
• Fuzzy prefix matching
• Google first predicts queries and then uses the top predicted queries to search the documents. Google may therefore miss answers (false negatives), while we can find the accurate top-k answers.
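Fuzzy prefix matching, listed above as a difference, can be illustrated with a small sketch of the type-ahead algorithm from the earlier slides: find trie nodes whose string is within edit distance δ of the typed prefix, union the inverted lists under those nodes, and intersect the unions across query keywords. The names `TrieNode`, `build_trie`, `fuzzy_prefix_records`, and `type_ahead` are hypothetical. The sketch also assumes a full trie traversal per keystroke, whereas the slides describe finding similar prefixes incrementally, so treat it as a simplified illustration and not the DBease implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.records = set()  # inverted list; non-empty only where a word ends


def build_trie(records):
    """records: record ID -> text. Each word is a root-to-node path."""
    root = TrieNode()
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.records.add(rid)
    return root


def collect(node):
    """Union of the inverted lists in the subtree rooted at node."""
    found = set(node.records)
    for child in node.children.values():
        found |= collect(child)
    return found


def fuzzy_prefix_records(root, prefix, delta):
    """Records with a word having some prefix within edit distance delta."""
    found = set()

    def visit(node, row):
        # row[j] = edit distance between this node's string and prefix[:j]
        if row[-1] <= delta:      # node is a similar prefix: take its subtree
            found.update(collect(node))
            return
        if min(row) > delta:      # no descendant can get back under delta
            return
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(prefix) + 1):
                cost = 0 if prefix[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # insertion
                                   row[j] + 1,           # deletion
                                   row[j - 1] + cost))   # substitution
            visit(child, new_row)

    visit(root, list(range(len(prefix) + 1)))
    return found


def type_ahead(root, query, delta=1):
    """Conjunctive search: intersect the per-keyword unions."""
    result = None
    for prefix in query.lower().split():
        ids = fuzzy_prefix_records(root, prefix, delta)
        result = ids if result is None else result & ids
    return result or set()


# Titles from the Data Partition slide, keyed by hypothetical record IDs.
TITLES = {1: "xml database", 2: "xml search", 3: "xml security", 4: "rdbms"}
root = build_trie(TITLES)
print(sorted(type_ahead(root, "xnl sea", delta=1)))  # [2, 3]
print(sorted(type_ahead(root, "xml sea", delta=0)))  # [2]
```

With δ = 1 the mistyped "xnl" still reaches "xml", and "sea" matches both "search" and "security" through the shared prefix "se".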

Differences to Complete Search
• Fuzzy prefix matching
• Different index structures
• More efficient

Differences to Keyword Search
• Effectiveness
– SQL Suggestion supports range queries and aggregation functions.
– SQL Suggestion can group answers.
– SQL Suggestion can help users express their query intent more accurately.
• Efficiency
– Faster
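The SQL Suggestion slides model matching keywords to a template as a weighted set-covering problem solved with greedy algorithms. The following is a sketch of the standard greedy heuristic for weighted set cover, under stated assumptions: each candidate mapping covers some keywords at a cost where lower is better, and the candidate names and costs are invented for illustration. The slides do not say how DBease derives its weights or which greedy variant it uses.

```python
def greedy_weighted_set_cover(universe, candidates):
    """Pick candidates until every element of universe is covered.

    candidates: name -> (covered elements, positive cost).
    Each round takes the candidate with the lowest cost per newly covered
    element, which is the classic greedy approximation for set cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (elements, cost) in candidates.items():
            gain = len(uncovered & set(elements))
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            raise ValueError("some keywords cannot be covered")
        chosen.append(best)
        uncovered -= set(candidates[best][0])
    return chosen


# Hypothetical mappings from keywords to template entities, with made-up costs.
KEYWORDS = {"keyword", "paper", "ir"}
CANDIDATES = {
    "paper_title": ({"keyword", "ir"}, 2.0),
    "paper_table": ({"paper"}, 0.5),
    "author_name": ({"ir"}, 1.5),
    "all_in_one": ({"keyword", "paper", "ir"}, 4.0),
}
print(greedy_weighted_set_cover(KEYWORDS, CANDIDATES))
# ['paper_table', 'paper_title']
```

The greedy choice costs 2.5 in total here, which beats the single candidate that covers everything at cost 4.0. In general the heuristic only approximates the optimum, which is consistent with the problem being NP-hard.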