MCDB: THE MONTE CARLO DATABASE SYSTEM
Chris Jermaine
Rice U.
Joint work with:
Subi Arumugam
Peter Haas
Luis Perez
Foula Vagena
Cai Zhuhua
...and others...
1
MCDB
• MCDB is a database system being developed at Rice U.
• Extensive support for the declarative subset of SQL
• Full-blown, cost-based query optimizer
• Compiles queries to MapReduce jobs, run on Hadoop
— So it scales to very large data sizes
— Maybe not well, but it scales!
• But that’s not what’s interesting about MCDB
— It’s the native support for stochastic analytics
2
Example
• Say you have archived billions of sales records and want to know:
“What would my profits have been in ’08 if I’d cut all of my margins by 10%?”
• How might we use a data warehouse to guess this answer?
— Need to “guess” each customer’s demand at the new price
— Use those guesses to build a hypothetical, revised version of the sales table
— Finally, join this table with others (prices, supply costs, etc.) to compute profits
• Here’s how an analyst might use MCDB to do this...
3
A Stochastic Demand Model
[Figure: quantity purchased (y-axis) vs. price of part A (x-axis)]
For a given customer, demand is a linear function
(I know, could do better than linear...)
4
A Stochastic Demand Model
[Figure: quantity purchased (y-axis) vs. price of part A (x-axis)]
To allow for variability, uncertainty, model is “probabilistic”
5
A Stochastic Demand Model
[Figure: quantity purchased (y-axis) vs. price of part A (x-axis)]
$D_0 \sim \mathrm{Gamma}(k_D, \theta_D)$
$P_0 \sim \mathrm{Gamma}(k_P, \theta_P)$
Demand curve is generated via samples from twin Gamma distributions
• This defines what’s known as a “prior” over customer demand curves
7
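A minimal sketch of drawing demand curves from a twin Gamma prior. The slides do not define D_0 and P_0; the code assumes D_0 is the demand at price zero and P_0 is the price at which demand falls to zero, so each sample gives the line through (0, D_0) and (P_0, 0). The function names and the shape/scale values are made up for illustration.

```python
import random

def sample_prior_curve(k_d, theta_d, k_p, theta_p, rng):
    """Draw one linear demand curve (D0, P0) from the twin Gamma prior.

    Assumption: D0 = demand at price zero, P0 = price where demand hits zero.
    """
    d0 = rng.gammavariate(k_d, theta_d)  # shape k_d, scale theta_d
    p0 = rng.gammavariate(k_p, theta_p)  # shape k_p, scale theta_p
    return d0, p0

def demand_at(price, d0, p0):
    """Quantity on the line through (0, d0) and (p0, 0), floored at zero."""
    return max(0.0, d0 * (1.0 - price / p0))

rng = random.Random(0)  # fixed seed so the draws are repeatable
# Illustrative parameter values only; real ones would come from the data.
curves = [sample_prior_curve(2.0, 50.0, 2.0, 10.0, rng) for _ in range(5)]
for d0, p0 in curves:
    print(f"D0={d0:.1f}  P0={p0:.1f}  demand at price 5: {demand_at(5.0, d0, p0):.1f}")
```

Each run of the loop is one candidate demand curve; no purchase history has been used yet.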
A Stochastic Demand Model
[Figure: quantity purchased (y-axis) vs. price of part A (x-axis), with the observed purchase marked at the point (p, d)]
Demand curves must go through the point (p, d)
The allowable set of demand curves is a “posterior dist” to a Bayesian
• Taking into account what the customer actually purchased...
— Restricts the set of possible demand curves
8
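One way to picture the restriction to curves through (p, d) is rejection sampling: draw curves from the prior and keep only those that pass close to the observed purchase. This is a sketch under assumptions, not MCDB's method. It reuses the assumed reading of D_0 and P_0 as the two intercepts, and it accepts curves within a tolerance `tol` because an exact hit has probability zero under a continuous prior. All names and numbers are hypothetical.

```python
import random

def sample_posterior_curves(p, d, n, tol, prior, rng, max_tries=1_000_000):
    """Approximate posterior sampling by rejection.

    prior = (k_d, theta_d, k_p, theta_p), Gamma shape/scale pairs.
    A curve (D0, P0) is kept only if its predicted demand at price p
    is within `tol` of the observed quantity d.
    """
    k_d, theta_d, k_p, theta_p = prior
    kept = []
    for _ in range(max_tries):
        d0 = rng.gammavariate(k_d, theta_d)
        p0 = rng.gammavariate(k_p, theta_p)
        if p0 <= p:
            continue  # this curve predicts zero demand at price p
        if abs(d0 * (1.0 - p / p0) - d) <= tol:
            kept.append((d0, p0))
            if len(kept) == n:
                break
    return kept

# Illustrative observation: the customer bought 40 units at price 10.
posterior = sample_posterior_curves(
    p=10.0, d=40.0, n=20, tol=2.0,
    prior=(2.0, 50.0, 2.0, 10.0), rng=random.Random(1),
)
print(len(posterior), "curves accepted")
```

Shrinking `tol` brings the accepted curves closer to passing exactly through (p, d), at the cost of more rejected draws.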
This Is Where MCDB Comes In
• In MCDB, easy to associate a posterior dist of demand curves...
— with every one of the 100M customers in a large database
— And then use those curves to generate stochastic DB instances
[Diagram labels: warehoused data, parameterization, stochastic model, simulation, database instance]
Done by implementing a “VG Function” which performs the simulation
9
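To make the role of a VG function concrete, here is a toy stand-in written as a plain Python callable. It is not MCDB's VG function interface, which the slides do not show. The parameter row layout, the names `demand_vg` and `make_instance`, and the simplification used (P_0 is drawn from its Gamma prior, D_0 is solved so the line passes through the observed point, and the prior on D_0 is ignored) are all assumptions for illustration.

```python
import random

def demand_vg(params, rng):
    """Hypothetical VG function: simulate one customer's demand at a new price.

    params holds the observed 'price' and 'qty', the candidate 'new_price',
    and Gamma shape/scale 'k_p', 'theta_p' for the zero-demand price P0.
    """
    p, d = params["price"], params["qty"]
    while True:  # redraw until the curve allows positive demand at price p
        p0 = rng.gammavariate(params["k_p"], params["theta_p"])
        if p0 > p:
            break
    d0 = d / (1.0 - p / p0)  # forces the line through the observed (p, d)
    return max(0.0, d0 * (1.0 - params["new_price"] / p0))

def make_instance(warehouse_rows, vg, rng):
    """Apply the VG function to every row to build one database instance."""
    return [{"cust": r["cust"], "new_qty": vg(r, rng)} for r in warehouse_rows]

# Made-up warehoused data, already joined with per-customer parameters.
warehouse = [
    {"cust": 1, "price": 10.0, "qty": 40.0, "new_price": 9.0, "k_p": 2.0, "theta_p": 10.0},
    {"cust": 2, "price": 10.0, "qty": 5.0, "new_price": 9.0, "k_p": 2.0, "theta_p": 10.0},
]
instance = make_instance(warehouse, demand_vg, random.Random(7))
for row in instance:
    print(row)
```

Calling `make_instance` again with the same generator produces a different instance, since each call draws fresh demand curves.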
This Is Where MCDB Comes In
[Diagram labels: warehoused data, parameterization, stochastic model, simulation, database instances]
Logically, MCDB generates many database instances (“possible worlds”)
10
This Is Where MCDB Comes In
[Diagram labels: warehoused data, parameterization, stochastic model, simulation, database instances, SQL]
Then a user-issued SQL query is simultaneously evaluated over all instances
12
This Is Where MCDB Comes In
[Diagram labels: warehoused data, parameterization, stochastic model, simulation, n database instances, SQL, n query results]
Gives an empirical distribution of result sets
13
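The idea of n instances giving n query results can be sketched in a few lines. This is only a logical picture: it loops over instances one at a time, whereas the slides say MCDB evaluates the query over all instances simultaneously. The table layout, the Gamma-distributed quantities, and every name below are hypothetical.

```python
import random
import statistics

def simulate_instance(customers, rng):
    """Stand-in for the simulation step: one random 'sales' table."""
    return [
        {"cust": c["cust"], "qty": rng.gammavariate(c["k"], c["theta"]), "margin": c["margin"]}
        for c in customers
    ]

def total_profit(sales):
    """The 'query', roughly SELECT SUM(qty * margin) FROM sales."""
    return sum(row["qty"] * row["margin"] for row in sales)

def monte_carlo_query(customers, query, n, seed=0):
    """Run the query once per simulated instance and collect the n results."""
    rng = random.Random(seed)
    return [query(simulate_instance(customers, rng)) for _ in range(n)]

# Made-up per-customer parameters (Gamma shape k, scale theta, unit margin).
customers = [
    {"cust": 1, "k": 2.0, "theta": 20.0, "margin": 1.5},
    {"cust": 2, "k": 3.0, "theta": 5.0, "margin": 4.0},
]
results = monte_carlo_query(customers, total_profit, n=1000)
mean = statistics.mean(results)
q = statistics.quantiles(results, n=20)  # cut points at 5%, 10%, ..., 95%
print(f"mean profit {mean:.1f}, 5%-95% range [{q[0]:.1f}, {q[-1]:.1f}]")
```

The list `results` is the empirical distribution: the analyst can report its mean, quantiles, or a histogram instead of a single number. With these made-up parameters the expected total is 2·20·1.5 + 3·5·4 = 120, so the sample mean should land near that value.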
This Is Where MCDB Comes In
[Diagram labels: warehoused data, parameterization, stochastic model, simulation, n database instances, SQL, n query results, MCDB]
Process happens entirely within MCDB
Download it today!
14