Download Study Abroad Program Search

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft Access wikipedia , lookup

IMDb wikipedia , lookup

Oracle Database wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Relational algebra wikipedia , lookup

SQL wikipedia , lookup

Concurrency control wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

PL/SQL wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Ingres (database) wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Join (SQL) wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
DBXplorer: A System for KeywordBased Search over Relational
Databases
Sanjay Agrawal
Surajit Chaudhuri
Gautam Das
Presented by
Bhushan Pachpande
Contents
1.
2.
3.
4.
5.
6.
Introduction
Overview of DBXplorer
Symbol Table Design - Publish
Keyword Search
Support for Generalized Matches
Conclusion
Introduction

Internet search engines have popularized keyword-based search.

Searching on traditional database management system is done
through customized applications which are closely tied to the
database schema.

Traditional database management systems do not support keywordbased search.


e.g. search the Microsoft intranet on ‘Jim Gray’ to obtain matched rows,
i.e., rows in the database where ‘Jim Gray’ occur.
In this paper, DBXplorer, an efficient and scalable keyword search
utility for relational databases, is described.

The goal is to enable such searches without necessarily requiring the
users to know the schema of the respective databases.
Example
Searching for
a book

Keywords “Programming” by “Ritchie”

Less probability of presence of both keywords in single
row of table

For result, rows need to be generated by joining tables
on the fly (all possible combinations)
Authors
AuthorsBooks
Books
BookStores
Store
Overview of DBXplorer

Objective - Given a set of query keywords, DBXplorer returns all
rows (either from single tables, or by joining tables connected by
foreign-key joins) such that the each row contains all keywords.

Applying IR techniques from the documents world to databases
is difficult, because of
 Database normalization – by which logical units of information
may be fragmented and scattered across several tables
 Matching row may be obtained by joining several tables on the fly

IR techniques use Inverted Lists = Symbol Table in databases

Symbol Table - Stores the information about keywords at
different granularities (column/row), i.e. for each keyword stores
the list of all rows
Overview of DBXplorer - Publish

Enabling keyword search in DBXplorer requires 2 steps
 Publish


Search


enables database for keywords by building Symbol table and
associated structures
retrieves matching rows from published database
Steps in Publish
 Step 1: A database is identified, along with the set of tables and
columns within the database to be published.

Step 2: Auxiliary tables [like Symbol table] are created for
supporting keyword searches.
Overview of DBXplorer - Search

Steps in Search

Looks up symbol table to find tables / columns which
contain keywords

Identify all possible joins (subsets of table if joined) whose
rows might contain required keywords

Generate SQL statements for each join (gives rows which
contain all keywords), rank rows and return
Architecture of DBXplorer
The publish component provides
interfaces to




select a database,
select tables/columns within the
database to publish, and
modify/remove/maintain the
publication.
For a given set of keywords, the
search component provides
interfaces to



retrieve matching databases from a
set of published databases, and
selectively identify tables,
columns/rows that need to be
searched within each database
identified in step (1).
Architecture of DBXplorer
Symbol Table Design

Exact match problem considered only

Symbol Table (S) - stores the information about keywords at
different granularities (column/row), i.e. for each keyword stores
the list of all rows

Column Level granularity (Pub-Col)
 For every keyword S maintains list of all database columns (i.e.
table.column)

Cell level granularity (Pub-cell)
 For every keyword S maintains list of all database cells (i.e.
table.column.rowid)
Symbol Table - Design factors



Space and Time Requirements
 Size: pub-col are smaller than pub-cell since repetition of
keyword in a column does not increases entries in case of pubcol
 Time: pub-col takes less time to build
Keyword search performance
 Depends on efficient generation and execution of SQL statement
(built from symbol table entries)
 pub-cell returns more number of SQL statements than pub-col as
for a keyword in column there are multiple entries
Ease of maintenance
 Insert/Update: required for insertion of distinct new value in case
of pub-col while pub-cell needs for every update/insert
 Same for delete
Storing Symbol Table


Store symbol tables (pub-col) in
database as (keyword hash ,
column Id)
FK Compression (Foreign Key)



v3
If there is foreign key relationship
between c1 and c2, store only c1
CP Compression


v2
v4
c1
x
c2
Uncompressed hash table
Partition H into a minimum number
of bipartite cliques (a bipartite clique
is any subgraph of H with a maximal
number of edges).
Compress each clique.
Stores symbol table (pub-cell) in
database as (keyword hash, list of
all cellids)
ColumnsMap table
Compressed hash table
Search - Enumerating Join Trees

Step1 - Looks up symbol table to find tables / columns which
contain keywords

Step2 - Enumerate join trees
 Identify and enumerate all potential subsets of tables in the
database that, if joined, might contain rows having all keywords.
 The resulting relation will contain all potential rows having all
keywords specified in the query.
keywords
If we view the schema graph G
as an undirected graph, this step
enumerates join trees, i.e., subtrees of G such that
(a)
(b)
the leaves belong to MatchedTables
together, the leaves contain all
keywords of the query
Join Trees
Search – Identify matching rows

The input to this final search step is the enumerated join trees.

Each join tree is then mapped to a single SQL statement that joins the
tables as specified in the tree, and selects those rows that contain all
keywords.

The retrieved rows are ranked before being output.

Rows ranked by number of joins involved (ties broken arbitrarily) (same as
keywords occurring close to one another in documents are ranked higher)
Join Trees
Generalized Matches – Token Matches

Token matches - the keyword in the query matches only a token or
sub-string of an attribute value (e.g., retrieve rows of address by
specifying only a street name).

Pub-Prefix method

B+ tree indexes can be used to retrieve rows whose cell matches a
given prefix string

This clause is of the form
WHERE T.C LIKE ‘P%K%’

During publishing of a database, for every keyword K, the entry
(hash(K), T.C, P) is kept in the symbol table if there exists a string in
column T.C which

contains a token K, and

has prefix P
Generalized Matches - Token Matches
Let the hash values of the searchable tokens i.e., ‘string’,
‘ball’ and ‘round’ be 1, 2 and 3 respectively
RowId
C
1
this is a string
2
this string
3
this is a ball
4
no string
5
any ball is round
Pub-Prefix table
Database table T
Consider searching keyword “string”
Pub-Prefix table returns prefixes “th” and “no” and subsequent SQL will
contain (T.C LIKE ‘th%string%’) OR (T.C LIKE ‘no%string%’)
Conclusion

This paper discusses DBXplorer, a system that enables
keyword-based search in relational databases.

DBXplorer uses symbol table alternatives to store the
location of keywords in database.

DBXplorer support exact matches and generalized
matches upto some extent.