SIGMOD 2006: Effective Keyword Search in Relational Databases
Effective Keyword Search in Relational Databases
Fang Liu (University of Illinois at Chicago)
Clement Yu (University of Illinois at Chicago)
Weiyi Meng (Binghamton University)
Abdur Chowdhury (America Online, Inc.)

Effective Keyword Search in Relational Databases
Introduction
IR ranking in text databases
Our ranking strategy in RDBs
Experiments
Conclusions and future work

Introduction
Why keyword search in relational databases?
We want to search text data in relational databases.
SQL with the “contains” operator is not suited to non-expert users.
Keyword search is tremendously successful in text databases by ranking documents based on similarity. It is suited to non-expert users.

Introduction
Text data in relational databases

Introduction
Suppose a user is looking for albums titled “off the wall”

Introduction
Keyword search is very successful in text databases by ranking documents based on similarity. Google, Yahoo and MSN search are examples.
So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER & IR-style DISCOVER, ObjectRank, Ranking Objects)

Introduction
Let’s do it, but how?
What are the answers to be ranked?
How should we rank these answers?

Introduction -- an answer
An answer for a given query Q: a tuple tree in which every leaf node must have at least one keyword in Q.

Introduction
Use a slightly modified algorithm [DISCOVER] to produce all answers for a given query.
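
The answer definition above can be sketched in a few lines of Python. This is only an illustration of the leaf condition, not the DISCOVER algorithm: the function names, the id-to-text dictionary, the join list, and the whitespace tokenization are all assumptions made for the example, and the code does not verify that the joins actually form a tree.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    # Assumption: lowercase whitespace tokenization stands in for a real text index.
    return set(text.lower().split())


def leaf_ids(tuples: Dict[str, str], joins: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the ids of leaf tuples, i.e. tuples with at most one join edge."""
    degree = defaultdict(int)
    for a, b in joins:
        degree[a] += 1
        degree[b] += 1
    return [tid for tid in tuples if degree[tid] <= 1]


def is_answer(tuples: Dict[str, str], joins: Iterable[Tuple[str, str]], query: str) -> bool:
    """True if every leaf tuple contains at least one keyword of the query."""
    keywords = tokenize(query)
    return all(tokenize(tuples[tid]) & keywords for tid in leaf_ids(tuples, joins))


if __name__ == "__main__":
    # Hypothetical two-tuple tree: an artist tuple joined to an album tuple.
    tree = {"artist": "Michael Jackson", "album": "Off the Wall"}
    joins = [("artist", "album")]
    print(is_answer(tree, joins, "michael wall"))  # True: both leaves match a keyword
    print(is_answer(tree, joins, "wall"))          # False: the artist leaf matches nothing
```

A single tuple with no joins counts as its own leaf here, so a lone album tuple titled “Off the Wall” is an answer to the query “off the wall”.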

Introduction: Ranking
Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.

Introduction: Contributions
We identify four new factors that are critical to effective ranking, and we propose a new ranking strategy.
We design and conduct comprehensive experiments for the effectiveness problem.
Experimental results show that our strategy is significantly more effective than existing work.

Example values (s = 0.2):
tf = 2: ntf = 1.53; tf = 10: ntf = 2.2
df/N = 1/2: idf = 0.69; df/N = 1/100: idf = 4.6; df/N = 1/200,000: idf = 12
dl/avgdl = 1: ndl = 1; dl/avgdl = 1/2: ndl = 0.9; dl/avgdl = 1/10: ndl = 0.8; dl/avgdl = 2: ndl = 1.2; dl/avgdl = 10: ndl = 2.8

3.3 IR Ranking
Q = (k1, k2, ..., kn), D is a document, Sim(Q, D) is the ranking score of D.
$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$
$$\mathrm{weight}(k, D) = \frac{\mathit{ntf} \cdot \mathit{idf}}{\mathit{ndl}}$$
$$\mathit{ntf} = 1 + \ln(1 + \ln(\mathit{tf})), \qquad \mathit{idf} = \ln\frac{N}{\mathit{df} + 1}, \qquad \mathit{ndl} = (1 - s) + s \cdot \frac{\mathit{dl}}{\mathit{avgdl}}$$

Our Ranking Strategy
T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)
$$\mathrm{Sim}(Q, D) = \sum_{k \in Q, D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D)$$
$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$

Our Ranking Strategy
T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)
$$\mathrm{Sim}(Q, T) = \sum_{k \in Q, T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$
$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$
$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$

Our Ranking Strategy
Tuple Tree Size Normalization
$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$
$$\mathit{Nsize}(T) = (1 - s) + s \cdot \frac{\mathit{size}(T)}{\mathit{avgsize}}$$
size(T): # of tuples in a tuple tree T

Our Ranking Strategy
Document Length Normalization Reconsidered
$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$
$$\mathit{ndl} = \left((1 - s) + s \cdot \frac{\mathit{dl}}{\mathit{avgdl}}\right) \cdot (1 + \ln(\mathit{avgdl}))$$
dl: document length of Di
avgdl: average document length of the text column of Di

Our Ranking Strategy
Document Frequency Normalization
$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$
$$\mathit{idf}^{g} = \ln\frac{N^{g}}{\mathit{df}^{g} + 1}$$

Our Ranking Strategy
T = (D1, D2, ..., Dn)
maxWgt is the maximum weight(k, Di)
sumWgt is the sum of weight(k, Di)
$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$
$$\mathrm{Comb}() = \mathit{maxWgt} \cdot \left(1 + \ln\left(1 + \ln\frac{\mathit{sumWgt}}{\mathit{maxWgt}}\right)\right)$$

Our Ranking Strategy
T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)
$$\mathrm{weight}(k, D) = \frac{\mathit{ntf} \cdot \mathit{idf}}{\mathit{ndl}}$$
$$\mathrm{weight}(k, T) = \mathrm{Comb}(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_m))$$
$$\mathrm{weight}(k, D_i) = \frac{\mathit{ntf} \cdot \mathit{idf}^{g}}{\mathit{ndl} \cdot \mathit{Nsize}(T)}$$

Our Ranking Strategy
Schema Terms in Query
Phrase-based Ranking
lyrics for How come by D12
lusher the singer's lyrics to burn
Using position information to boost phrase matching
Concept-based Ranking
Can improve effectiveness
Can assign semantics to answers

Experiments – data set
A Lyrics Database
50 queries from an AOL query log
Relevance judgment: pooling + logs

Experiments: some queries
to me lyrics by lionel richie
inner smile texas lyrics
lionel richie lyrics
lionel richie lyrics you mean more to me
avril lavigne lyrics for the album under this skin
avril lavigne lyrics

Experiments – measure
Reciprocal rank: measures how good the system is at returning the first relevant answer.
MAP (mean average precision): a precision value is computed after each relevant answer is retrieved. These precision values are then averaged to get a single number that measures overall effectiveness.

Experiments – results
Our ranking strategy: the four new factors.
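
The two measures described on the "Experiments – measure" slide can be sketched as follows. This is a minimal illustration, not the evaluation code behind the reported results: the function names and the representation of a ranked answer list as a sequence of booleans (True for a relevant answer) are assumptions, and average precision is averaged only over the relevant answers that actually appear in the list, which matches the slide wording but assumes every relevant answer is retrieved.

```python
from typing import Sequence


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant answer, or 0.0 if no answer is relevant."""
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevant: Sequence[bool]) -> float:
    """Average of the precision values computed at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this cut-off
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical judgments for two queries (True = relevant answer at that rank).
    q1 = [False, True, False, True]
    q2 = [True, False, True]
    print(reciprocal_rank(q1), average_precision(q1))  # 0.5 0.5
    print(mean_average_precision([q1, q2]))
```

For the first hypothetical query the first relevant answer is at rank 2, so the reciprocal rank is 1/2, and the precisions at the two relevant answers are 1/2 and 2/4, which average to 0.5.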

Experiments – results
Comparison with related works

Conclusions
Effectiveness is as important as efficiency
The four new factors are critical to search effectiveness
Our strategy is significantly more effective than related works

Future Work
Utilize link analysis
Combine non-text columns
Efficiency problem
More real-world data sets

Questions?
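
As a closing illustration, the formulas from the "Our Ranking Strategy" slides can be put together in one small Python sketch. It follows the formulas as read from the slides: weight(k, Di) = ntf * idf^g / (ndl * Nsize(T)), ndl = ((1 - s) + s * dl / avgdl) * (1 + ln(avgdl)), Nsize(T) = (1 - s) + s * size(T) / avgsize, idf^g = ln(N^g / (df^g + 1)), and Comb() = maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))). Everything else is an assumption: the function and parameter names are hypothetical, s = 0.2 is borrowed from the example values in the notes, query weights are passed in by the caller because the slides do not define weight(k, Q), and the slides do not say how N^g and df^g are collected.

```python
import math
from typing import Dict, List


def ntf(tf: int) -> float:
    """Normalized term frequency: 1 + ln(1 + ln(tf)), defined for tf >= 1."""
    return 1.0 + math.log(1.0 + math.log(tf))


def idf_g(n_g: int, df_g: int) -> float:
    """Document frequency normalization: ln(N^g / (df^g + 1))."""
    return math.log(n_g / (df_g + 1))


def ndl(dl: float, avgdl: float, s: float = 0.2) -> float:
    """Reconsidered length normalization; assumes avgdl >= 1 so the log is non-negative."""
    return ((1.0 - s) + s * dl / avgdl) * (1.0 + math.log(avgdl))


def nsize(size: int, avgsize: float, s: float = 0.2) -> float:
    """Tuple tree size normalization: (1 - s) + s * size(T) / avgsize."""
    return (1.0 - s) + s * size / avgsize


def weight_in_tuple(tf: int, n_g: int, df_g: int, dl: float, avgdl: float,
                    size: int, avgsize: float, s: float = 0.2) -> float:
    """weight(k, Di) for one keyword in one tuple of the tuple tree T."""
    if tf < 1:
        return 0.0  # keyword absent from this tuple
    return ntf(tf) * idf_g(n_g, df_g) / (ndl(dl, avgdl, s) * nsize(size, avgsize, s))


def comb(weights: List[float]) -> float:
    """Comb(): maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))) over the positive weights."""
    positive = [w for w in weights if w > 0.0]
    if not positive:
        return 0.0
    max_wgt = max(positive)
    return max_wgt * (1.0 + math.log(1.0 + math.log(sum(positive) / max_wgt)))


def sim(query_weights: Dict[str, float], tuple_weights: Dict[str, List[float]]) -> float:
    """Sim(Q, T): sum of weight(k, Q) * weight(k, T) over keywords found in both."""
    return sum(wq * comb(tuple_weights[k])
               for k, wq in query_weights.items() if k in tuple_weights)


if __name__ == "__main__":
    # Hypothetical statistics for a two-tuple tree and the query "off wall".
    w_off = weight_in_tuple(tf=1, n_g=1000, df_g=9, dl=3, avgdl=4, size=2, avgsize=3)
    w_wall = weight_in_tuple(tf=1, n_g=1000, df_g=4, dl=3, avgdl=4, size=2, avgsize=3)
    print(sim({"off": 1.0, "wall": 1.0}, {"off": [w_off], "wall": [w_wall]}))
```

When a keyword occurs in only one tuple, sumWgt equals maxWgt and Comb() returns that weight unchanged; extra occurrences in other tuples raise the score only logarithmically, in the same way ntf dampens raw term frequency.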