* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download MCDB: T M C D
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
MCDB: THE MONTE CARLO DATABASE SYSTEM Chris Jermaine Rice U. Joint work with: Subi Arumugam Peter Haas Luis Perez Foula Vagena Cai Zhuhua ...and others... 1 MCDB • MCDB is a database system being developed at Rice U. • Extensive support for the declarative subset of SQL • Full-blown, cost-based query optimizer • Compiles queries to MapReduce jobs, run on Hadoop — So it scales to very large data sizes — Maybe not well, but it scales! • But that’s not what’s interesting about MCDB — It’s the native support for stochastic analytics 2 Example • Say you have archived billions of sales records and want to know: “What would my profits have been in ‘08 if I’d cut all of my margins by 10%?” • How might we use a data warehouse to guess this answer? — Need to “guess” each customer’s demand at new price — Use to build a hypothetical, revised version of sales table — Finally, join this table with others (prices, supply costs, etc.) to compute profits • Here’s how an analyst might use MCDB to do this... 3 Quantity purchased A Stochastic Demand Model For a given customer, demand is a linear function (I know, could do better than linear...) Price of part A 4 Quantity purchased A Stochastic Demand Model To allow for variability, uncertainty, model is “probabilistic” Price of part A 5 Quantity purchased A Stochastic Demand Model D 0 ∼ Gamma ( k D, θ D ) P 0 ∼ Gamma ( k P, θ P ) Price of part A Demand curve is generated via samples from twin Gamma distributions 6 Quantity purchased A Stochastic Demand Model D 0 ∼ Gamma ( k D, θ D ) P 0 ∼ Gamma ( k P, θ P ) Price of part A Demand curve is generated via samples from twin Gamma distributions • This defines what’s known as a “prior” over cust demand curves 7 Quantity purchased A Stochastic Demand Model (p, d) They must go thru this point Price of part A Allowable set of demand curves is a “posterior dist” to a Bayesian • This defines what’s known as a “prior” over cust demand curves • Taking into account what the cust actually purchased... — Restricts the set of possible demand curves 8 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances database instance warehoused data parameterization simulation stochastic model Done by implementing a “VG Function” which performs the simulation 9 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances database instances warehoused data parameterization simulation stochastic model Logically, MCDB generates many database instances (“possible worlds”) 10 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances database instances warehoused data parameterization simulation stochastic model Logically, MCDB generates many database instances (“possible worlds”) 11 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances warehoused data SQL database instances parameterization simulation stochastic model Then a user-issued SQL query is simultaneously evaluated over all instances 12 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances warehoused data SQL n query results n database instances parameterization simulation stochastic model Gives an empirical distribution of result sets 13 This Is Where MCDB Comes In • In MCDB, easy to associate a posterior dist of demand curves... — with every one of the 100M customers in a large database — And then use those curves to generate stochastic DB instances warehoused data SQL n query results n database instances parameterization simulation stochastic model MCDB Process happens entirely within MCDB Download it today! 14