Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Operational transformation wikipedia , lookup
Clusterpoint wikipedia , lookup
Data analysis wikipedia , lookup
3D optical data storage wikipedia , lookup
Forecasting wikipedia , lookup
Information privacy law wikipedia , lookup
Data vault modeling wikipedia , lookup
Business intelligence wikipedia , lookup
What Happened to Theory Anyway? Role of Theory What Makes a Good Theory? Prospects 1 The 2 Most Annoying Things I Like to Say 1. Theoreticians: stop playing games and start worrying about what this stuff is good for. 2. Implementers: take a little time to appreciate the theory in the area where you are about to create a major hack. 2 Example: Universal Relation There is a theory about what happens when you imagine that all your relations are projections of a single relation. Some things become simpler (natural joins, e.g.); other things become harder. Ted Codd hated the idea. • Rule #217: Thou shalt not use the ideas of Maier-Ullman-Vardi. 3 UR --- (2) Whatever you may think of the idea, it keeps getting rediscovered. Moshe Vardi kept a file of over a dozen papers that thought they had discovered a “remarkable, new interface to databases.” Good example of why it pays to learn the theory before you proceed. 4 But Theory is no Less Guilty Confession: I’ve been guilty of at least the following: 1. Working on stuff just because it is fun. 2. Or so that I could get a paper out and maybe tenure. 3. Or to impress my friends. 5 What Makes a Good Theory? The first test is always whether it accomplishes something important. But theories take a long time to develop. “Incremental” work OK here. Decide where to put effort by “cleanliness”: theory should offer a high ratio of power to mechanism. 6 Example: Relational Model The relational model is the best example by far in our field. One basic construct. Good for anything. Perfect for almost nothing. 7 Example: Database logic We thought Datalog was going to take over for SQL; it didn’t. But it is natural and clean, and the study turned out to be worth the effort --- for reasons we didn’t anticipate. Prolog: baroque execution model. AI/logic: way too much mechanism. 8 Application: Information Integration Logical rules seem better tuned to data transformation than querying. name(X||Y) :- first(X) & last(Y) Work of Halevy, collaborators, many others. 9 Application: Data Exchange Work of Fagin, Kolaitis, &c. Data sharing, peer-to-peer. Uses not only logic, but related dependency theory and also some ideas from the universal-relation theory. 10 Application: Software Security Work of Monica Lam, &c. Datalog to express program analysis. More than rules --- stratified negation, e.g. Datalog compiled into BDD’s to exploit regularity in data. • Atypical for “normal” databases. 11 Application: Networks Work of Hellerstein &c. Describes network properties and constraints in Datalog. Clean, succinct descriptions of network protocols. Rapid prototyping. 12 Future 1. Funny data models. I mean XML. 2. Data with regularity. Is there more to BDD’s as a database implementation? 3. Data mining. Let’s get those terrorists. 13 Tree/Graph Data Models It seems very hard to switch from relations to semistructured data models. Which seem rather “clean” at first face. Is there a compromise that will give us the composability of relational algebra + clean queries in the Datalog (or even SQL) style? 14 Structure Within Data BDD’s work for apps when the data has regularity. Aside: BDD’s are a notation for boolean functions that are terrible except for the functions one actually cares about. BDD’s have good algorithms for relational algebra operations. Can the technique carry over to challenging DB apps, e.g., design DB’s? 15 Data Mining Database systems offers an underappreciated view of data mining: it’s queries, not building statistical models. Example: given a set of numbers, compute AVG, not the mean of the most likely Gaussian distribution. Move beyond association rules. 16 Data Mining on Steroids Biggest data-mining problem: track terrorism while protecting privacy. Petabytes of unstructured data. x phoned y; z bought ammonium nitrate with credit card c. Algorithms for multijoin queries that can’t possibly be evaluated fully. 17 Thanks While it’s terrific to receive an award like this, it really acknowledges the work of students and colleagues who developed the ideas and those who, in the fullness of time, put them into practice. 18