SIGMOD 2006: Effective Keyword Search in Relational Databases

Effective Keyword Search in
Relational Databases
Fang Liu (University of Illinois at Chicago)
Clement Yu (University of Illinois at Chicago)
Weiyi Meng (Binghamton University)
Abdur Chowdhury (America Online, Inc.)
Effective Keyword Search in
Relational Databases





Introduction
IR ranking in text databases
Our ranking strategy in RDBs
Experiments
Conclusions and future work
SIGMOD 2006: Effective Keyword Search in Relational Database
Introduction
Why keyword search in relational
databases?



We want to search text data in relational
databases
SQL with the “contains” operator is not suitable
for non-expert users
Keyword search is tremendously successful in
text databases, where documents are ranked
by similarity to the query. It is suitable for non-expert users
Introduction

Text data in relational databases
Introduction
Suppose a user is looking for albums titled “off the wall”
Introduction

Keyword search is very successful in
text databases, where documents are ranked
by similarity to the query. Google, Yahoo
and MSN Search are examples.
So, let’s do keyword search in relational databases!
(DBXplorer, BANKS, DISCOVER & IR-style DISCOVER,
ObjectRank, Ranking Objects)
Introduction

Let’s do it, but how?


What are answers to be ranked?
How should we rank these answers?
Introduction -- an answer
An answer for a given
query Q: a tuple tree, in
which every leaf node
must have at least one
keyword in Q.
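
The leaf condition in this definition can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `is_answer`, the edge-list representation, the tuple ids, and the sample data are all assumptions made for the example.

```python
def is_answer(edges, text, keywords):
    """Check the leaf condition of a tuple tree (illustrative sketch).

    edges:    list of (tuple_id, tuple_id) pairs, assumed to form a tree
    text:     dict mapping tuple_id -> text stored in that tuple
    keywords: the keywords of the query Q
    """
    kws = {k.lower() for k in keywords}
    degree = {node: 0 for node in text}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # A leaf touches at most one edge; a single-tuple tree is its own leaf.
    leaves = [n for n, d in degree.items() if d <= 1]
    # Every leaf must contain at least one query keyword.
    return all(kws & set(text[n].lower().split()) for n in leaves)


# Hypothetical tuples: an artist joined to an album joined to a song.
edges = [("artist1", "album1"), ("album1", "song1")]
text = {
    "artist1": "michael jackson",
    "album1": "off the wall",
    "song1": "rock with you",
}
print(is_answer(edges, text, ["michael", "rock"]))  # internal node needs no keyword
print(is_answer(edges, text, ["wall"]))             # both leaves lack the keyword
```

Only leaves are checked, so an internal tuple that merely connects two matching tuples does not need to contain a keyword itself.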
Introduction

Use a slightly modified algorithm
[DISCOVER] to produce all answers
for a given query.
Introduction: Ranking

Our focus is on the effectiveness
problem of ranking answers: the more
relevant an answer is to the user query,
the higher it should be ranked.
Introduction: Contributions



We identify four new factors that are critical to
effective ranking and propose a new ranking
strategy
We design and conduct comprehensive experiments
for the effectiveness problem
Experimental results show our strategy is
significantly more effective than existing
work
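3.3 IR Ranking

Q = (k1, k2, ..., kn) is a query, D is a document, and Sim(Q, D) is
the ranking score of D.

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{weight}(k, D) = \frac{ntf}{ndl} \cdot idf$$

$$ntf = 1 + \ln(1 + \ln(tf))$$

$$idf = \ln \frac{N}{df + 1}$$

$$ndl = (1 - s) + s \cdot \frac{dl}{avgdl}$$

Example values:
ntf: tf = 2 gives ntf = 1.53; tf = 10 gives ntf = 2.2
idf: (df + 1)/N = 1/2 gives idf = 0.69; 1/100 gives idf = 4.6; 1/200,000 gives idf = 12
ndl (s = 0.2): dl/avgdl = 1 gives ndl = 1; 1/2 gives ndl = 0.9; 1/10 gives ndl = 0.8; 2 gives ndl = 1.2; 10 gives ndl = 2.8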
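
As a minimal sketch, the ranking formulas on this slide translate into Python as follows. The function and parameter names are chosen for illustration, and the query weights are passed in directly because the slide does not say how weight(k, Q) is computed. The default s = 0.2 is the value quoted in the slide's example values.

```python
import math


def ntf(tf):
    """Normalized term frequency: 1 + ln(1 + ln(tf)), for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))


def idf(n_docs, df):
    """Inverse document frequency: ln(N / (df + 1))."""
    return math.log(n_docs / (df + 1))


def ndl(dl, avgdl, s=0.2):
    """Normalized document length: (1 - s) + s * dl / avgdl."""
    return (1 - s) + s * dl / avgdl


def weight(tf, df, dl, n_docs, avgdl, s=0.2):
    """weight(k, D) = ntf / ndl * idf."""
    return ntf(tf) / ndl(dl, avgdl, s) * idf(n_docs, df)


def sim(query_weights, doc_tf, df, dl, n_docs, avgdl, s=0.2):
    """Sim(Q, D): sum over keywords that occur in both Q and D."""
    score = 0.0
    for k, wq in query_weights.items():
        tf = doc_tf.get(k, 0)
        if tf > 0:  # only keywords that appear in D contribute
            score += wq * weight(tf, df[k], dl, n_docs, avgdl, s)
    return score


print(round(ntf(2), 2), round(ntf(10), 2))
print(round(idf(100, 49), 2))
print(round(ndl(5, 10), 2), round(ndl(20, 10), 2))
```

The double logarithm in ntf dampens repeated occurrences of a keyword, and ndl penalizes documents that are longer than average.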
Our Ranking Strategy

T = (D1, D2, ..., Dm), so
Sim(Q, D) → Sim(Q, T)

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$
Our Ranking Strategy

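T = (D1, D2, ..., Dm), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^g}{ndl \cdot Nsize(T)}$$

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$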
Our Ranking Strategy

Tuple Tree Size Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^g}{ndl \cdot Nsize(T)}$$

$$Nsize(T) = (1 - s) + s \cdot \frac{size(T)}{avgsize}$$

size(T): number of tuples in the tuple tree T
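
To make the effect of the size normalization concrete, here is a small Python sketch of Nsize(T) and of the per-document weight that divides by it. The function names are illustrative, and s is left as a required argument because the slide does not state which value is used for tuple tree size.

```python
def nsize(size, avgsize, s):
    """Tuple tree size normalization: (1 - s) + s * size(T) / avgsize."""
    return (1 - s) + s * size / avgsize


def weight_in_tree(ntf, idf_g, ndl, nsize_t):
    """weight(k, Di) = (ntf * idf^g) / (ndl * Nsize(T))."""
    return (ntf * idf_g) / (ndl * nsize_t)


# Same keyword statistics, but the second tuple tree is twice the average size.
small = weight_in_tree(1.53, 4.6, 1.0, nsize(3, 3, s=0.2))
large = weight_in_tree(1.53, 4.6, 1.0, nsize(6, 3, s=0.2))
print(small > large)  # the larger tree gets the lower weight
```

With everything else equal, an answer that joins more tuples receives a lower weight, so compact tuple trees are ranked higher.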
Our Ranking Strategy

Document Length Normalization
Reconsidered

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^g}{ndl \cdot Nsize(T)}$$

$$ndl = \left( (1 - s) + s \cdot \frac{dl}{avgdl} \right) \cdot (1 + \ln(avgdl))$$

dl: document length of Di
avgdl: average document length of the text column of Di
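
The revised ndl can be sketched as below. The column lengths in the example (3 words for a short name-like column, 300 words for a long text column) are made-up numbers used only to show how the extra factor 1 + ln(avgdl) behaves.

```python
import math


def ndl_revised(dl, avgdl, s):
    """Revised length normalization: ((1 - s) + s * dl / avgdl) * (1 + ln(avgdl))."""
    return ((1 - s) + s * dl / avgdl) * (1 + math.log(avgdl))


# Two documents, each of exactly average length for its own text column.
short_column = ndl_revised(dl=3, avgdl=3, s=0.2)      # hypothetical short column
long_column = ndl_revised(dl=300, avgdl=300, s=0.2)   # hypothetical long column
print(short_column < long_column)
```

Under the original formula both documents would get ndl = 1 because each matches its own column average. The extra factor makes the normalization grow with the column's average length, so a match in a long text column is discounted more than a match in a short one.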
Our Ranking Strategy

Document Frequency Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^g}{ndl \cdot Nsize(T)}$$

$$idf^g = \ln \frac{N^g}{df^g + 1}$$
Our Ranking Strategy

T = (D1, D2, ..., Dm)

maxWgt is the maximum weight(k, Di)
sumWgt is the sum of weight(k, Di)

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{Comb}() = maxWgt \cdot \left( 1 + \ln\left( 1 + \ln \frac{sumWgt}{maxWgt} \right) \right)$$
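
A minimal Python sketch of the Comb() function follows. The handling of the all-zero case (returning 0 when the keyword appears in none of the documents) is an assumption added to avoid a division by zero; the slide does not cover it.

```python
import math


def comb(weights):
    """Combine weight(k, D1), ..., weight(k, Dm) into weight(k, T).

    Comb() = maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt)))
    """
    max_wgt = max(weights)
    sum_wgt = sum(weights)
    if max_wgt <= 0:
        return 0.0  # assumption: keyword absent from every document in T
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


print(comb([2.0]))                 # a single match keeps its own weight
print(round(comb([2.0, 2.0]), 3))  # a second equal match adds less than its full weight
```

When only one document in the tuple tree contains the keyword, sumWgt equals maxWgt and the result is simply maxWgt. Additional matches raise the score, but the double logarithm keeps the combined weight well below the plain sum.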




Our Ranking Strategy

T = (D1, D2, ..., Dm), so Sim(Q, D) → Sim(Q, T)

$$\mathrm{weight}(k, D) = \frac{ntf}{ndl} \cdot idf$$

$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^g}{ndl \cdot Nsize(T)}$$
Our Ranking Strategy

Schema Terms in Query



Phrase-based Ranking


lyrics for How come by D12
lusher the singer's lyrics to burn
Using position information to boost phrase matching
Concept-based Ranking


Can improve effectiveness
Can assign semantics to answers
Experiments – data set



A Lyrics Database
50 Queries from an AOL query log
Relevance Judgment: pooling + logs
Experiments: some queries






to me lyrics by lionel richie
inner smile texas lyrics
lionel richie lyrics
lionel richie lyrics you mean more to me
avril lavigne lyrics for the album under this skin
avril lavigne lyrics
Experiments – measure


Reciprocal rank: measures how well the system
returns the first relevant answer, as the reciprocal
of that answer's rank.
MAP (mean average precision): a precision value
is computed after each relevant answer is
retrieved. These precision values are averaged
to get a single number per query, and the mean
over all queries measures the overall effectiveness.
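
The two measures can be sketched in Python as follows. Each ranked answer list is represented as a list of booleans (True for a relevant answer), which is a representation chosen for this example. The sketch also assumes every relevant answer appears somewhere in the ranked list, so average precision is divided by the number of relevant answers retrieved.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant answer, or 0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance):
    """Average of the precision values taken at each relevant answer."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs):
    """Mean of average precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical results for two queries.
q1 = [False, True, False, True]  # relevant answers at ranks 2 and 4
q2 = [True, True]                # relevant answers at ranks 1 and 2
print(reciprocal_rank(q1))               # 0.5
print(average_precision(q1))             # (1/2 + 2/4) / 2 = 0.5
print(mean_average_precision([q1, q2]))  # (0.5 + 1.0) / 2 = 0.75
```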
Experiments – results

Our ranking strategy: the four
new factors.
Experiments – results

Comparison with related work
Conclusions



Effectiveness is as important
as efficiency
The four new factors are
critical to search effectiveness
Our strategy is significantly
more effective than related
work
Future Work




Utilize link analysis
Combine non-text columns
Efficiency Problem
More real world data sets
Questions ?