Download CS206 --- Electronic Commerce - The Stanford University InfoLab

What Happened to Theory
Role of Theory
What Makes a Good Theory?
The 2 Most Annoying Things I Like to Say
1. Theoreticians: stop playing games and
start worrying about what this stuff is
good for.
2. Implementers: take a little time to
appreciate the theory in the area
where you are about to create a major
Example: Universal Relation
There is a theory about what happens
when you imagine that all your relations
are projections of a single relation.
 Some things become simpler (natural joins,
e.g.); other things become harder.
 Ted Codd hated the idea.
• Rule #217: Thou shalt not use the ideas of
UR --- (2)
Whatever you may think of the idea, it
keeps getting rediscovered.
 Moshe Vardi kept a file of over a dozen
papers that thought they had discovered a
“remarkable, new interface to databases.”
Good example of why it pays to learn
the theory before you proceed.
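
To make the universal-relation idea concrete, here is a minimal sketch in Python. The attribute names (emp, dept, mgr) and the data are hypothetical and do not come from the talk. Two stored relations are treated as projections of one imagined relation, and the natural join needs no explicit join condition because it matches on shared attribute names.

```python
# Hypothetical universal relation over the attributes emp, dept, mgr.
UNIVERSAL = [
    {"emp": "ann", "dept": "toys", "mgr": "lee"},
    {"emp": "bob", "dept": "toys", "mgr": "lee"},
    {"emp": "cat", "dept": "shoes", "mgr": "max"},
]


def project(rows, attrs):
    """Projection: keep only the given attributes and drop duplicate tuples."""
    seen = {tuple((a, r[a]) for a in attrs) for r in rows}
    return [dict(t) for t in sorted(seen)]


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute name."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                out.append({**l, **r})
    return out


# The "real" relations are just projections of the universal one.
emp_dept = project(UNIVERSAL, ["emp", "dept"])
dept_mgr = project(UNIVERSAL, ["dept", "mgr"])

# Joining them back recovers the universal relation in this example,
# because each dept has exactly one mgr in the sample data.
rejoined = natural_join(emp_dept, dept_mgr)
```

The round trip is lossless here only because of that dependency in the sample data. In general a join of projections can produce tuples that were never in the original relation, which is one of the things that become harder.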
But Theory is no Less Guilty
 Confession: I’ve been guilty of at least
the following:
1. Working on stuff just because it is fun.
2. Or so that I could get a paper out and
maybe tenure.
3. Or to impress my friends.
What Makes a Good Theory?
 The first test is always whether it
accomplishes something important.
 But theories take a long time to
 “Incremental” work OK here.
 Decide where to put effort by
“cleanliness”: theory should offer a
high ratio of power to mechanism.
Example: Relational Model
The relational model is the best
example by far in our field.
 One basic construct.
 Good for anything.
 Perfect for almost nothing.
Example: Database logic
We thought Datalog was going to take
over for SQL; it didn’t.
But it is natural and clean, and the
study turned out to be worth the effort
--- for reasons we didn’t anticipate.
Prolog: baroque execution model.
AI/logic: way too much mechanism.
Application: Information
Logical rules seem better tuned to data
transformation than querying.
name(X||Y) :- first(X) & last(Y)
Work of Halevy, collaborators, many
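
The rule above can be read as a data transformation rather than a query. The following is a minimal sketch, assuming `first` and `last` are unary relations held as Python sets. The sample values and the function name `apply_name_rule` are made up for illustration. Read literally, the rule shares no variable between its two subgoals, so it pairs every first name with every last name.

```python
from itertools import product

# Hypothetical source relations first(X) and last(Y).
first = {"Ted", "Moshe"}
last = {"Codd", "Vardi"}


def apply_name_rule(first, last):
    """Evaluate name(X||Y) :- first(X) & last(Y), where || is concatenation."""
    return {x + y for x, y in product(first, last)}


name = apply_name_rule(first, last)
```

A realistic integration rule would presumably carry a shared key variable in both subgoals so that only matching pairs are concatenated. That is an assumption about intent, not something the slide states.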
Application: Data Exchange
Work of Fagin, Kolaitis, &c.
Data sharing, peer-to-peer.
Uses not only logic, but related
dependency theory and also some ideas
from the universal-relation theory.
Application: Software Security
Work of Monica Lam, &c.
Datalog to express program analysis.
 More than rules --- stratified negation, e.g.
 Datalog compiled into BDD’s to exploit
regularity in data.
• Atypical for “normal” databases.
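
As an illustration of Datalog with stratified negation in a program-analysis setting, here is a small Python sketch. The predicates (`calls`, `reachable`, `dead`) and the function names are hypothetical, and the evaluation uses plain sets rather than BDD's. The first stratum computes `reachable` as a fixpoint; only then is the negated subgoal in `dead` evaluated.

```python
def reachable_from(entry, calls):
    """Stratum 1, evaluated to a fixpoint:
         reachable(entry).
         reachable(G) :- reachable(F) & calls(F, G).
    """
    reachable = {entry}
    changed = True
    while changed:
        changed = False
        for f, g in calls:
            if f in reachable and g not in reachable:
                reachable.add(g)
                changed = True
    return reachable


# Hypothetical call graph of a small program.
functions = {"main", "parse", "eval", "legacy_dump"}
calls = {("main", "parse"), ("parse", "eval"), ("legacy_dump", "eval")}

reachable = reachable_from("main", calls)

# Stratum 2, evaluated only after reachable is complete:
#   dead(F) :- function(F) & NOT reachable(F).
dead = {f for f in functions if f not in reachable}
```

Evaluating the negation after the lower stratum has reached its fixpoint is what makes the result well defined.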
Application: Networks
Work of Hellerstein &c.
Describes network properties and
constraints in Datalog.
Clean, succinct descriptions of network
Rapid prototyping.
1. Funny data models.
I mean XML.
2. Data with regularity.
Is there more to BDD’s as a database
3. Data mining.
Let’s get those terrorists.
Tree/Graph Data Models
It seems very hard to switch from
relations to semistructured data models.
 Which seem rather “clean” at first glance.
Is there a compromise that will give us
the composability of relational algebra
+ clean queries in the Datalog (or even
SQL) style?
Structure Within Data
BDD’s work for apps when the data has
 Aside: BDD’s are a notation for boolean
functions that are terrible except for the
functions one actually cares about.
 BDD’s have good algorithms for relational
algebra operations.
Can the technique carry over to
challenging DB apps, e.g., design DB’s?
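
The aside about BDD's can be illustrated with a toy construction. This is a sketch for intuition only, not how a real BDD package works: it enumerates every assignment, so it is exponential in the number of variables. The names `build_robdd` and `evaluate` are made up for the example. Sharing identical subgraphs is what keeps the diagram small for regular functions.

```python
from itertools import product


def build_robdd(func, num_vars):
    """Build a reduced ordered BDD for func(bits) by Shannon expansion.

    Terminals are the ints 0 and 1. Internal nodes have ids >= 2 and are
    stored in `nodes` as id -> (var, low_child, high_child).
    """
    unique = {}  # (var, low, high) -> node id, so equal subgraphs are shared
    nodes = {}

    def make(var, low, high):
        if low == high:  # the variable does not matter here: skip the test
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    def expand(var, bits):
        if var == num_vars:
            return 1 if func(bits) else 0
        low = expand(var + 1, bits + (0,))
        high = expand(var + 1, bits + (1,))
        return make(var, low, high)

    return expand(0, ()), nodes


def evaluate(root, nodes, bits):
    """Follow one path from the root to a terminal for the given assignment."""
    node = root
    while node >= 2:
        var, low, high = nodes[node]
        node = high if bits[var] else low
    return node == 1


def parity(bits):
    return sum(bits) % 2 == 1


parity_root, parity_nodes = build_robdd(parity, 4)
and_root, and_nodes = build_robdd(lambda b: bool(b[0] and b[1]), 4)
```

For parity over 4 variables the diagram has 7 internal nodes, against 16 rows in the truth table, and the function that only tests the first two variables needs 2 nodes.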
Data Mining
Database systems offer an underappreciated view of data mining: it’s
queries, not building statistical models.
Example: given a set of numbers,
compute AVG, not the mean of the
most likely Gaussian distribution.
Move beyond association rules.
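
The "queries" view in the AVG example can be sketched with Python's built-in sqlite3 module. The table name `readings` and the values are hypothetical. The answer is a direct aggregate over the stored data, with no model-fitting step.

```python
import sqlite3

# In-memory database holding a hypothetical set of numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(2.0,), (4.0,), (9.0,)],
)

# The data-mining "result" is just the answer to a query.
(avg,) = conn.execute("SELECT AVG(value) FROM readings").fetchone()
conn.close()
```

For a Gaussian the maximum-likelihood estimate of the mean happens to coincide with this average, so the two views give the same number here. The difference is in what is being claimed: a fact about the data versus a parameter of an assumed model.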
Data Mining on Steroids
Biggest data-mining problem: track
terrorism while protecting privacy.
Petabytes of unstructured data.
 x phoned y; z bought ammonium nitrate
with credit card c.
Algorithms for multijoin queries that
can’t possibly be evaluated fully.
While it’s terrific to receive an award
like this, it really acknowledges the
work of students and colleagues who
developed the ideas and those who, in
the fullness of time, put them into