Information Retrieval and Databases: Synergies and Syntheses
IDM Workshop Panel, 15 Sep 2003
Jayavel Shanmugasundaram, Cornell University

10000 foot view of Data Management
[Figure: diagram positioning Information Retrieval Systems and Database Systems along two axes: Queries (ranked keyword search vs. complex and structured) and Data (structured vs. unstructured)]

Applications
• Information discovery over structured databases
• Keyword search over relational databases
  – DBXplorer [Agrawal et al.]
  – DISCOVER [Hristidis et al.]
  – BANKS [Hulgeri et al.]

Applications
• Content management
  – Mix of structured and unstructured data
    • Database with date and time of accident (structured data) and accident description (unstructured data)
  – Semi-structured data
    • Scientific documents, Shakespeare’s plays, …
• Support flexible ranked keyword search interface over such data
  – XRANK [Guo et al., SIGMOD 2003]
  – XIRQL [Fuhr et al., SIGIR 2001]

XML Keyword Search
  <workshop date="28 July 2000">
    <title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
    <editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
    <proceedings>
      <paper id="1">
        <title> XQL and Proximal Nodes </title>
        <author> Ricardo Baeza-Yates </author>
        <author> Gonzalo Navarro </author>
        <abstract> We consider the recently proposed language … </abstract>
        <section name="Introduction">
          Searching on structured text is becoming more important with XML …
        </section>
        …
        <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql"> … </cite>
      </paper>
      …
• Most specific results (exploits structure!)
• Ranking at granularity of elements

Applications
• The Internet is enabling end-users to directly ask queries and explore results
  – E.g., Used car marketplace
  – Find all “bright red ford mustangs” that cost less than 20% of the average price of cars in its class
• Characteristics of queries
  – Keyword search (for ease of use)
  – Complex query operations (information synthesis)
  – Want to see ranked results!

Towards Unifying DB and IR
• No standard query language for both DB and IR
  – SQL and XQuery mostly “database” query languages
• Currently developing TeXQuery: a full-text search extension to XQuery
  – With S. Amer-Yahia, C. Botev, J. Robie
  – Full composability of database and IR primitives, ranking
  – Submitted to W3C committee on full-text extensions to XQuery

Summary
• Applications have mix of structured (DB domain) and unstructured (IR domain) data
  – Stark difference in how they can be processed
• Benefits of unifying DB & IR
  – Ranked keyword search (information discovery) over both structured and unstructured data
  – Complex queries over structured/semi-structured data
• A truly unified data store
  – Need to generalize DB and IR techniques
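
Illustrative sketch: most specific results in XML keyword search

As a rough illustration of the "most specific results" idea from the XML Keyword Search slide, the Python sketch below returns the deepest elements whose subtree contains every query keyword. It is a simplification written for this summary and should not be read as the XRANK or XIRQL algorithm. The names `tokens` and `most_specific` are hypothetical, the document is an abridged copy of the workshop fragment with the elided parts left out, and no ranking is attempted.

```python
import re
import xml.etree.ElementTree as ET

# Abridged fragment modeled on the workshop example; elided parts are omitted.
DOC = """
<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <author>Ricardo Baeza-Yates</author>
      <abstract>We consider the recently proposed language</abstract>
      <section name="Introduction">Searching on structured text is becoming more important with XML</section>
    </paper>
  </proceedings>
</workshop>
"""


def tokens(text):
    """Lower-case a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", (text or "").lower()))


def most_specific(root, query):
    """Return elements that contain all query keywords in their subtree
    while no single child subtree already contains all of them."""
    wanted = tokens(query)
    if not wanted:
        return []
    hits = []

    def visit(elem):
        # Keywords found directly in this element's leading text.
        found = tokens(elem.text) & wanted
        child_has_all = False
        for child in elem:
            sub = visit(child)
            child_has_all = child_has_all or sub == wanted
            found |= sub
            # Text that follows the child still belongs to this element.
            found |= tokens(child.tail) & wanted
        if found == wanted and not child_has_all:
            hits.append(elem)
        return found

    visit(root)
    return hits


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    for query in ("XQL language", "structured text", "XML"):
        print(query, "->", [e.tag for e in most_specific(root, query)])
```

A query whose keywords are spread over a paper's title and abstract resolves to the enclosing `paper` element, not to the whole `workshop`. This is the sense in which the structure is exploited. Ranking at element granularity would need a scoring function on top of this, which the sketch leaves out.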
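
Illustrative sketch: the used-car query with separate DB and IR steps

The used-car example combines a keyword condition, an aggregate over structured data, and ranking. The Python sketch below shows one way to answer it today by gluing two steps together: SQL (via the standard `sqlite3` module) for the "less than 20% of the class average" condition, and a hand-written keyword score for ranking. Everything here is an assumption made for illustration: the `cars` schema, the sample rows, the function names `build_db` and `search`, and the scoring rule (number of query keywords present in the description). It does not use TeXQuery or any full-text extension.

```python
import re
import sqlite3

# Hypothetical listings: (id, car_class, description, price). All values invented.
ROWS = [
    (1, "sports", "bright red ford mustang convertible", 3000),
    (2, "sports", "red ford mustang, minor scratches", 2500),
    (3, "sports", "silver porsche 911", 60000),
    (4, "sports", "bright red ford mustang, low mileage", 30000),
    (5, "sedan", "blue honda accord", 12000),
]


def build_db():
    """Create an in-memory database holding the sample listings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cars (id INTEGER PRIMARY KEY, car_class TEXT, "
        "description TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", ROWS)
    return conn


def search(conn, query, fraction=0.2):
    """Return (id, score) pairs for cars priced below `fraction` of their
    class average, ranked by how many query keywords the description has."""
    keywords = set(re.findall(r"[a-z0-9]+", query.lower()))
    # Structured part: compare each price with the average of its class.
    rows = conn.execute(
        """
        SELECT c.id, c.description, c.price
        FROM cars AS c
        JOIN (SELECT car_class, AVG(price) AS avg_price
              FROM cars GROUP BY car_class) AS a
          ON a.car_class = c.car_class
        WHERE c.price < ? * a.avg_price
        """,
        (fraction,),
    ).fetchall()
    # Unstructured part: score descriptions by keyword overlap.
    scored = []
    for car_id, description, price in rows:
        words = set(re.findall(r"[a-z0-9]+", description.lower()))
        score = len(keywords & words)
        if score:
            scored.append((score, car_id, price))
    # Highest score first; cheaper cars break ties.
    scored.sort(key=lambda t: (-t[0], t[2]))
    return [(car_id, score) for score, car_id, _price in scored]


if __name__ == "__main__":
    print(search(build_db(), "bright red ford mustang"))
```

The two halves run in different systems with different models of the data, so the application has to combine them by hand. That gap is what a query language with composable database and IR primitives would aim to close.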