DBease: Making Databases User-Friendly and Easily Accessible
Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng
Database Group, Department of Computer Science and Technology,
Tsinghua University, Beijing 100084, China
How to Access Databases?
• Traditional database-access methods:
– SQL
Select title, author, booktitle, year From dblp
Where title Contains “search” And booktitle Contains “cidr”
– Query-by-example (Form), e.g., entering “cidr” in a form field
– Keyword Search
“search cidr”
CIDR'11 - DBease (2)
Comparison of Different Methods
[Figure: SQL, query-by-example, and keyword search compared by usability]
Keyword Search
• Is traditional keyword search good enough?
– Too many results!
– No result!
Form-based Search
• Form-based search has the same problem.
– Complicated, and still no result!
Our Solution
• Type-Ahead Search
• Type-Ahead Search in Forms
• SQL Suggestion
[Figure: the three techniques compared by usability]
What is Type-Ahead Search?
Type-Ahead Search
• Advantages
– Gives users instant feedback on the fly, as they type
– Helps users navigate the underlying data
– Tolerates inconsistencies between the query and the data
– Supports synonyms
– Supports XML data
– Supports multiple tables
Problem Formulation
• Data: a set of records
• Query:
– $Q = \{p_1, p_2, \ldots, p_l\}$: a set of prefixes
– $\delta$: an edit-distance threshold
• Result:
– The records that contain every query prefix or a similar form of it (conjunctive semantics), where a similar form of $p_i$ is a prefix of a word whose edit distance to $p_i$ is at most $\delta$
• Edit distance: the minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another, e.g., $\mathrm{ed}(\text{string}, \text{stang}) = 2$
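The edit distance and the "similar form" test can be sketched with the standard dynamic program. This is a minimal illustration, not DBease's code; the names `edit_distance` and `fuzzy_prefix_match` are hypothetical. The second function puts the query prefix on the rows and the record word on the columns, so the minimum of the last row is the smallest edit distance between the query prefix and any prefix of the word.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # ed("", t[:j]) = j
    for i, cs in enumerate(s, start=1):
        cur = [i]                           # ed(s[:i], "") = i
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def fuzzy_prefix_match(word: str, prefix: str, delta: int) -> bool:
    """True if some prefix of `word` is within edit distance `delta` of `prefix`."""
    prev = list(range(len(word) + 1))       # row for the empty query prefix
    for i, cp in enumerate(prefix, start=1):
        cur = [i]
        for j, cw in enumerate(word, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cp != cw)))
        prev = cur
    # Now prev[j] = ed(prefix, word[:j]); keep the best prefix of `word`.
    return min(prev) <= delta


print(edit_distance("string", "stang"))           # → 2
print(fuzzy_prefix_match("search", "serc", 1))    # → True
print(fuzzy_prefix_match("database", "xml", 1))   # → False
```

The query prefix "serc" matches "search" because its prefix "searc" is one deletion away.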
Indexing
• Trie index
– Each word is a path from the root to a leaf
– Each leaf stores an inverted list of the records that contain its word
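A minimal Python sketch of the trie index, assuming records are lower-cased, whitespace-tokenized strings keyed by integer IDs (an assumption; the slides do not specify tokenization). Each word's inverted list is stored on the node where the word ends, which is a leaf unless the word is also a prefix of a longer word. The names `TrieNode`, `build_trie`, and `exact_prefix_search` are hypothetical, and this lookup is exact, without typo tolerance.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict = {}   # character -> TrieNode
        self.inverted: list = []   # IDs of records containing the word ending here


def build_trie(records: dict) -> TrieNode:
    """Insert every distinct word of every record; IDs are appended in sorted order."""
    root = TrieNode()
    for rid in sorted(records):
        for word in set(records[rid].lower().split()):
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.inverted.append(rid)
    return root


def leaf_lists(node: TrieNode):
    """Yield the inverted list of every word that ends in the subtree of `node`."""
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.inverted:
            yield cur.inverted
        stack.extend(cur.children.values())


def exact_prefix_search(root: TrieNode, prefix: str) -> set:
    """Records containing a word that starts with `prefix`."""
    node = root
    for ch in prefix.lower():
        if ch not in node.children:
            return set()
        node = node.children[ch]
    return set().union(*leaf_lists(node))


records = {1: "xml database", 2: "xml search", 3: "cidr search engine"}
root = build_trie(records)
print(sorted(exact_prefix_search(root, "se")))   # → [2, 3]
```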
Algorithm
• Step 1: Find similar prefixes incrementally
• Step 2: Retrieve the leaf nodes of the similar prefixes
• Step 3: For each query keyword, compute the union of the inverted lists of these leaf nodes
• Step 4: Intersect the union lists of the query keywords
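The four steps can be sketched as follows. This is a simplified, non-incremental illustration: Step 1 computes one edit-distance row per trie node from scratch for each query prefix, whereas the slides describe finding similar prefixes incrementally as the user types. A node counts as a similar prefix if its string is within δ of the query prefix; once a node qualifies, its descendants are skipped because their leaves are already covered. All names and the sample records are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}   # character -> Node
        self.inverted = []   # IDs of records containing the word ending here


def build(records):
    root = Node()
    for rid in sorted(records):
        for word in set(records[rid].lower().split()):
            node = root
            for ch in word:
                node = node.children.setdefault(ch, Node())
            node.inverted.append(rid)
    return root


def similar_nodes(root, p, delta):
    """Step 1: topmost trie nodes whose string is within edit distance delta of p."""
    found = []
    stack = [(root, list(range(len(p) + 1)))]   # row[j] = ed(node string, p[:j])
    while stack:
        node, row = stack.pop()
        if row[-1] <= delta:
            found.append(node)     # every word below has this similar prefix
            continue
        if min(row) > delta:
            continue               # no extension of this string can come within delta
        for ch, child in node.children.items():
            new = [row[0] + 1]
            for j in range(1, len(p) + 1):
                new.append(min(row[j] + 1, new[j - 1] + 1,
                               row[j - 1] + (p[j - 1] != ch)))
            stack.append((child, new))
    return found


def union_list(node):
    """Steps 2-3: union of the inverted lists of all word nodes below `node`."""
    ids, stack = set(), [node]
    while stack:
        cur = stack.pop()
        ids.update(cur.inverted)
        stack.extend(cur.children.values())
    return ids


def type_ahead(root, query, delta):
    """Step 4: intersect the per-keyword unions (conjunctive semantics)."""
    result = None
    for p in query.lower().split():
        ids = set()
        for node in similar_nodes(root, p, delta):
            ids |= union_list(node)
        result = ids if result is None else result & ids
    return result if result is not None else set()


root = build({1: "xml search engine", 2: "dbease search cidr", 3: "cidr demo"})
print(type_ahead(root, "serch cid", 1))   # → {2}
```

The misspelled "serch" still reaches records 1 and 2 through "search", and intersecting with the records matching "cid" leaves only record 2.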
Type-Ahead Search in Forms
What is Type-Ahead Search in Forms?
Type-Ahead Search in Forms
• Problem Formulation
– Data: a relation with multiple attributes
– Query: a set of prefixes typed into the attribute fields of a form interface
– Answers:
• Local results: values of the attribute currently being edited (the focused attribute)
• Global results: records of the relation that match the query
• Advantages
– On-the-fly faceted search
– Supports aggregation
Data Partition
• Global table → local tables (one local table per attribute)

Global table:
ID | Title | Conf. | Author
1 | xml database | VLDB | albert
2 | xml database | SIGMOD | bob
3 | xml search | VLDB | albert
4 | xml security | VLDB | alice
5 | rdbms | SIGMOD | charlie

Local tables:
ID | Title
T1 | xml database
T2 | xml search
T3 | xml security
T4 | rdbms

ID | Author
A1 | albert
A2 | bob
A3 | alice
A4 | charlie

ID | Conf.
C1 | VLDB
C2 | SIGMOD
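A minimal sketch of the partition step, assuming local IDs are assigned in order of first appearance with an attribute-specific letter (T, C, A). This reproduces the IDs in the example tables, but the naming scheme and the `partition` and `ATTRS` names are assumptions for illustration, not the paper's implementation.

```python
# Example global table from the slide: (ID, Title, Conf., Author)
GLOBAL = [
    (1, "xml database", "VLDB", "albert"),
    (2, "xml database", "SIGMOD", "bob"),
    (3, "xml search", "VLDB", "albert"),
    (4, "xml security", "VLDB", "alice"),
    (5, "rdbms", "SIGMOD", "charlie"),
]
# Hypothetical: attribute -> (column index, local-ID letter)
ATTRS = {"Title": (1, "T"), "Conf.": (2, "C"), "Author": (3, "A")}


def partition(rows, attrs):
    """One local table per attribute: distinct values with IDs in first-seen order."""
    local = {}
    for name, (col, tag) in attrs.items():
        ids = {}                                   # value -> local ID
        for row in rows:
            if row[col] not in ids:
                ids[row[col]] = f"{tag}{len(ids) + 1}"
        local[name] = {lid: value for value, lid in ids.items()}
    return local


for name, table in partition(GLOBAL, ATTRS).items():
    print(name, table)
```

Storing each distinct value once per attribute is what lets the per-attribute tries stay small even when values repeat across many records.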
Indexing
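• For each attribute:
– A trie over the attribute's local values
– Mapping tables:
• Local → Global (L-G)
• Global → Local (G-L)
Example for the Title attribute (T1: xml database, T2: xml search, T3: xml security, T4: rdbms):
[Figure: trie over the words of the Title attribute, rooted at Φ]
L-G Mapping Table:
T1 | 1, 2
T2 | 3
T3 | 4
T4 | 5
G-L Mapping Table:
1 | T1
2 | T1
3 | T2
4 | T3
5 | T4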
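Both mapping tables for one attribute can be built in a single pass over the global table. This sketch uses the Title column from the example; `mapping_tables` is a hypothetical name, and holding the L-G entries in Python lists is an illustrative choice rather than the paper's storage layout.

```python
from collections import defaultdict

# Title column of the example global table: global ID -> value
TITLES = {1: "xml database", 2: "xml database", 3: "xml search",
          4: "xml security", 5: "rdbms"}


def mapping_tables(column, tag="T"):
    """Return (local table, L-G table, G-L table) for one attribute column."""
    local_id = {}               # value -> local ID, assigned in first-seen order
    lg = defaultdict(list)      # L-G: local ID -> global IDs
    gl = {}                     # G-L: global ID -> local ID
    for gid in sorted(column):
        lid = local_id.setdefault(column[gid], f"{tag}{len(local_id) + 1}")
        lg[lid].append(gid)
        gl[gid] = lid
    local = {lid: value for value, lid in local_id.items()}
    return local, dict(lg), gl


local, lg, gl = mapping_tables(TITLES)
print(lg)   # → {'T1': [1, 2], 'T2': [3], 'T3': [4], 'T4': [5]}
print(gl)   # → {1: 'T1', 2: 'T1', 3: 'T2', 4: 'T3', 5: 'T4'}
```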
Our Solution
Title: xml
Author: (empty)
– Local results (Title): xml database, xml search, xml security
– Global results: xml database (albert); xml database (bob); xml search (albert); xml security (alice)
[Figure: the Title trie with the L-G and G-L mapping tables from the previous slide]
Our Solution
Title: xml
Author: al
– Local results (Author): albert, alice
– Global results: xml database, albert; xml search, albert; xml security, alice
[Figure: the Author trie, where albert and alice share the path a → l, with the L-G and G-L mapping tables]
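A form query can be sketched on top of the mapping tables: for each non-empty field, find the matching local values, map them to global IDs (L-G), intersect across fields, then map the surviving global IDs back to the focused attribute (G-L) to obtain the local results. Simplifying assumptions: matching is exact prefix matching on the words of a value (no edit-distance tolerance and no trie), and local tables are keyed by value rather than by T1/A1-style IDs. The names `build_index` and `form_search` are hypothetical.

```python
from collections import defaultdict

GLOBAL = {  # the example relation from the slides: global ID -> attribute values
    1: {"Title": "xml database", "Conf.": "VLDB", "Author": "albert"},
    2: {"Title": "xml database", "Conf.": "SIGMOD", "Author": "bob"},
    3: {"Title": "xml search", "Conf.": "VLDB", "Author": "albert"},
    4: {"Title": "xml security", "Conf.": "VLDB", "Author": "alice"},
    5: {"Title": "rdbms", "Conf.": "SIGMOD", "Author": "charlie"},
}


def build_index(rows):
    """Per attribute: L-G (value -> global IDs) and G-L (global ID -> value)."""
    lg = defaultdict(lambda: defaultdict(set))
    gl = defaultdict(dict)
    for gid, row in rows.items():
        for attr, value in row.items():
            lg[attr][value].add(gid)
            gl[attr][gid] = value
    return lg, gl


def form_search(lg, gl, prefixes, focus):
    """prefixes: attribute -> typed prefix. Returns (local results, global IDs)."""
    hits = None
    for attr, prefix in prefixes.items():
        if not prefix:
            continue                                   # empty field: no constraint
        ids = set()
        for value, gids in lg[attr].items():
            if any(w.startswith(prefix.lower()) for w in value.lower().split()):
                ids |= gids                            # L-G: local value -> global IDs
        hits = ids if hits is None else hits & ids
    if hits is None:
        hits = set(gl[focus])                          # no constraints: all records
    local = sorted({gl[focus][g] for g in hits})       # G-L: global IDs -> focused values
    return local, sorted(hits)


lg, gl = build_index(GLOBAL)
print(form_search(lg, gl, {"Title": "xml", "Author": "al"}, focus="Author"))
# → (['albert', 'alice'], [1, 3, 4])
```

With Title = "xml" and Author = "al", this reproduces the example: the focused Author field suggests albert and alice, and the global results are records 1, 3, and 4.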
SQL Suggestion
What is SQL Suggestion?
SQL Suggestion
• Problem Formulation
– Data: A database with multiple tables
– Query: A set of keywords
– Answers: Relevant SQL queries
• Advantages
– Suggests SQL queries based on keywords
– Helps users formulate SQL queries to find accurate results
– Designed for both SQL programmers and Internet users
– Groups answers based on SQL structures
– Supports aggregation
– Supports range queries
Our Solution
• Suggest templates from keywords
– A template is a structure in the database
– Modeled as a graph
• Nodes: entities (table names or attribute names)
• Edges: foreign keys or membership (an attribute belongs to a table)
• Suggest SQL queries from templates
– Map keywords to templates
[Figure: (a) a keyword query over “keyword”, “paper”, and “ir”; (b) the suggested template; (c) the suggested SQL query]
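A rough sketch of turning a template into SQL. The schema (`paper`, `author`, `write`), its foreign keys, and the `LIKE '%keyword%'` predicate are hypothetical choices for illustration; the slides do not say how DBease renders predicates. Membership edges are represented implicitly by each table's attribute list, and foreign-key edges inside the template become join conditions.

```python
# Hypothetical schema. Membership edges: table -> its attributes.
SCHEMA = {
    "paper": ["pid", "title", "year"],
    "author": ["aid", "name"],
    "write": ["aid", "pid"],
}
# Hypothetical foreign-key edges between tables: (table, column) -> (table, column)
FOREIGN_KEYS = [
    (("write", "aid"), ("author", "aid")),
    (("write", "pid"), ("paper", "pid")),
]


def template_to_sql(tables, selections):
    """Render a template (a set of FK-connected tables) plus keyword mappings as SQL.

    selections: (table, attribute, keyword) triples mapping each keyword to an
    attribute node of the template.
    """
    for table, attr, _ in selections:
        if table not in tables or attr not in SCHEMA[table]:
            raise ValueError(f"{table}.{attr} is not in the template")
    # Foreign-key edges with both endpoints in the template become join predicates.
    joins = [f"{a}.{ac} = {b}.{bc}" for (a, ac), (b, bc) in FOREIGN_KEYS
             if a in tables and b in tables]
    # Illustration only: keywords are pasted into the SQL text; real code
    # should use bound parameters to avoid SQL injection.
    preds = [f"{t}.{attr} LIKE '%{kw}%'" for t, attr, kw in selections]
    where = " AND ".join(joins + preds)
    sql = "SELECT * FROM " + ", ".join(sorted(tables))
    return f"{sql} WHERE {where}" if where else sql


print(template_to_sql({"author", "write", "paper"}, [("paper", "title", "ir")]))
```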
Template Suggestion
• Template generation
– Extend templates from basic entities (tables)
• Template ranking
– Template weight: PageRank
– Relevancy between a keyword and an entity: tf*idf
• Algorithms
– Fagin's algorithms
– Threshold-based pruning techniques
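Fagin-style algorithms find the top-k templates without scoring every candidate. Below is a sketch of the textbook Threshold Algorithm (TA), assuming one score list per keyword (e.g., tf*idf relevancy, possibly already weighted by a PageRank-style template weight) and a sum aggregate. The template identifiers and scores are made up, and this is not necessarily the exact variant used in the paper.

```python
import heapq


def threshold_algorithm(lists, k):
    """Fagin's Threshold Algorithm with a sum aggregate.

    lists: one list per query keyword of (item, score) pairs, sorted by
           descending score; an item missing from a list scores 0 there.
    Returns up to k (item, total score) pairs, best first.
    """
    lookup = [dict(lst) for lst in lists]              # random access
    totals = {}                                        # item -> full aggregated score
    for depth in range(max((len(lst) for lst in lists), default=0)):
        threshold = 0.0                                # best score an unseen item could have
        for lst in lists:
            if depth < len(lst):
                item, score = lst[depth]               # sorted access
                threshold += score
                if item not in totals:
                    totals[item] = sum(d.get(item, 0.0) for d in lookup)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                                 # no unseen item can beat the k-th
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


lists = [
    [("t1", 0.9), ("t2", 0.6), ("t3", 0.1)],   # hypothetical scores for keyword 1
    [("t2", 0.8), ("t1", 0.3), ("t3", 0.2)],   # hypothetical scores for keyword 2
]
print([item for item, _ in threshold_algorithm(lists, 2)])   # → ['t2', 't1']
```

Here the algorithm stops after two rounds of sorted access and never needs to finish scanning the lists, which is the kind of pruning the slide refers to.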
SQL Suggestion
• SQL suggestion model
– Map keywords to templates
– A matching is a set of mappings that covers all keywords
– Finding the best matching is a weighted set-covering problem (NP-hard)
• SQL ranking
– Relevancy between keywords and attributes
– Attribute weight
• Algorithms
– Greedy algorithms
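Because choosing the best matching is a weighted set-covering problem, a greedy algorithm can repeatedly pick the mapping with the lowest cost per newly covered keyword. This sketch assumes each mapping already carries a cost (lower is better); how DBease derives weights from keyword-attribute relevancy and attribute weight is not shown on the slides, and the mapping names and costs below are made up. Greedy is an approximation and may not return the cheapest cover.

```python
def greedy_matching(keywords, mappings):
    """Greedy approximation for weighted set cover.

    keywords: query keywords that must all be covered.
    mappings: name -> (keywords the mapping covers, cost); lower cost is better.
    Returns the chosen mapping names in pick order, or None if no cover exists.
    """
    uncovered = set(keywords)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (covers, cost) in mappings.items():
            gain = len(covers & uncovered)         # newly covered keywords
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            return None                            # some keyword has no mapping
        chosen.append(best)
        uncovered -= mappings[best][0]
    return chosen


mappings = {
    "m1": ({"paper"}, 1.0),
    "m2": ({"keyword", "ir"}, 1.5),
    "m3": ({"keyword", "paper", "ir"}, 4.0),
    "m4": ({"ir"}, 0.8),
}
print(greedy_matching(["keyword", "paper", "ir"], mappings))   # → ['m2', 'm1']
```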
Search: dbease
http://dbease.cs.tsinghua.edu.cn
Keyword Search:
http://dbease.cs.tsinghua.edu.cn/ipubmed/
http://dbease.cs.tsinghua.edu.cn/dblpsearch/
Form-based Search:
http://dbease.cs.tsinghua.edu.cn/seaform/
SQL:
http://dbease.cs.tsinghua.edu.cn/sqlsugg/
Differences to Google Instant Search
• Fuzzy prefix matching
• Google first predicts queries and then uses the top predicted queries to search the documents, so it may produce false negatives, while we can find the accurate top-k answers.
Differences to Complete Search
• Fuzzy prefix matching
• Different index structures
• More efficient
Differences to Keyword Search
• Effectiveness
– SQL Suggestion supports range queries and aggregation functions.
– SQL Suggestion can group answers.
– SQL Suggestion helps users express their query intent more accurately.
• Efficiency
– Faster