Readings in Data Management
Spring 2008
Computer Science Department
Rutgers University

Seminar Information
* Web page: http://www.cs.rutgers.edu/~amelie/courses/dbseminar.html
* Meets Thursday 1-2:30pm in CoRE A

Organization
* Weekly presentation on a DB topic (30 minutes)
* We will select 2-3 topics to focus on over the course of the semester
* For each topic:
    * First week: overview paper (survey, influential work)
    * Subsequent weeks: more complex papers on the subject
* Possibly a few external presentations, such as:
    * Students preparing for DB conference talks or quals
    * Invited speakers
* Discussion on the paper

Topics
* First topic: Probabilistic Databases
* We will select the next topics from (non-exhaustive list):
    * Question answering
    * Web Search
    * Personal Information Spaces
    * Query Optimization
    * Data Cleaning
    * Data Integration
    * Data Mining
    * Query Processing Techniques
    * Adaptive, Automatic, Autonomic Systems
    * OLAP
    * Stream Aggregation
    * Storage, Indexing, and System Architecture
    * XML Processing
    * Preference functions
    * Spatial and High-Dimensional Data
    * Recovery
    * Privacy in DBMS
    * …

What I expect from you
* 1-2 presentations over the course of the semester
    * First-year students will be given “overview” presentation assignments at the beginning of each topic
    * More senior students will present more research-focused papers
    * Number of presentations depends on the number of students in the seminar
* Everyone should read the paper in advance and prepare 1-2 questions/discussion topics
* Participation in discussion
    * There are no “stupid” questions!
    * If you did not understand something, chances are others did not either

Presentations
* I will select a list of papers to present for each topic
    * Start with an introductory paper
    * Then papers that go deeper into one or more aspects of the problem
* You are welcome to suggest papers on the topic, as long as they are related (so that we can have more meaningful discussions)
    * Papers that I have overlooked
    * Papers on a different aspect of the topic that you would like to focus on

First topic: Probabilistic Databases
* Uncertainty/Imprecision in data
* Query Semantics
* Probabilistic Data Representation
* Next few slides from Dan Suciu’s tutorial, more at

Databases Today are Deterministic
* An item either is in the database or is not
* A tuple either is in the query answer or is not
* This applies to all varieties of data models: Relational, E/R, NF2, hierarchical, XML, …

What is a Probabilistic Database?
* “An item belongs to the database” is a probabilistic event
* “A tuple is an answer to the query” is a probabilistic event
* Can be extended to all data models

Two Types of Probabilistic Data
* Database is deterministic, query answers are probabilistic
* Database is probabilistic, query answers are probabilistic

Long History
* Probabilistic relational databases have been studied from the late 80’s until today:
    * Cavallo & Pittarelli: 1987
    * Barbara, Garcia-Molina, Porter: 1992
    * Lakshmanan, Leone, Ross & Subrahmanian: 1997
    * Fuhr & Roellke: 1997
    * Dalvi & S: 2004
    * Widom: 2005

So, Why Now?
* Application pull: The need to manage imprecisions in data
* Technology push: Advances in query processing techniques

Application Pull
* Need to manage imprecisions in data
* Many types: non-matching data values, imprecise queries, inconsistent data, misaligned schemas, etc.
* The quest to manage imprecisions = major driving force in the database community
* Ultimate cause for many research areas: data mining, semistructured data, schema matching, nearest neighbor

Technology Push
* Processing probabilistic data is fundamentally more complex than other data models
* Some previous approaches sidestepped complexity
* There exists a rich collection of powerful, non-trivial techniques and results, some old, some very recent, that could lead to practical management techniques for probabilistic databases

Suggested Papers to Discuss
* Nilesh Dalvi, Dan Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004.
* Minos Garofalakis et al.: Probabilistic Data Management for Pervasive Computing: The Data Furnace Project. IEEE Data Eng. Bull. 29(1) (2006).
* Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Jennifer Widom: An Introduction to ULDBs and the Trio System. IEEE Data Eng. Bull. 29(1) (2006).
* Prithviraj Sen, Amol Deshpande: Representing and Querying Correlated Tuples in Probabilistic Databases. ICDE 2007.
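
Illustration for discussion: the statement that “a tuple is an answer to the query” is a probabilistic event can be made concrete with a small sketch. The slides do not fix a representation, so this sketch assumes the simplest one: every tuple is present independently with its own probability, and a query answer's probability is the total probability of the possible worlds in which it appears. The relation `SIGHTINGS`, its probabilities, and the helper names are all hypothetical. Enumerating every world is exponential in the number of tuples, so this only illustrates the semantics and is not an evaluation strategy from any of the suggested papers.

```python
from itertools import product

# Hypothetical tuple-independent relation: each (person, room) tuple is
# present independently with the given probability (values are made up).
SIGHTINGS = [
    (("alice", "office"), 0.9),
    (("bob", "office"), 0.5),
    (("bob", "lab"), 0.4),
]


def possible_worlds(relation):
    """Yield (world, probability) for every subset of the tuples."""
    for mask in product([False, True], repeat=len(relation)):
        world = []
        prob = 1.0
        for (tup, p), present in zip(relation, mask):
            if present:
                world.append(tup)
                prob *= p
            else:
                prob *= 1.0 - p
        yield world, prob


def answer_probabilities(relation, query):
    """Map each answer to the total probability of the worlds that return it."""
    result = {}
    for world, prob in possible_worlds(relation):
        # set() so an answer is counted once per world, even if derived twice
        for answer in set(query(world)):
            result[answer] = result.get(answer, 0.0) + prob
    return result


def rooms_occupied(world):
    """Deterministic query over one world: project onto the room attribute."""
    return [room for (_person, room) in world]


probs = answer_probabilities(SIGHTINGS, rooms_occupied)
for room, p in sorted(probs.items()):
    print(f"{room}: {p:.2f}")
```

Under the independence assumption, "office" is an answer unless both office tuples are absent, so its probability is 1 - (0.1 × 0.5) = 0.95, while "lab" depends on a single tuple and gets 0.4.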