* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Monte Carlo algorithm is a randomized algorithm whose running
Microsoft Access wikipedia , lookup
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Ingres (database) wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Clusterpoint wikipedia , lookup
: A Monte Carlo Approach to Managing Uncertain Data • Authors : • • • • • • Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Christopher M. Jermaine, Peter J. Haas. What are we going to talk about today : A brief reminder what is DATABASE SYSTEM . SQL . The MCDB : monte Carlo database system . An example. EXPERIMENTS what is DATABASE SYSTEM ? Definition A database Is An Organized Collection Of Data. It is the collection of schemas, tables, queries, reports , views, and other objects. A database management system (DBMS) is a computer software application that interacts with the user, SQL stands for Structured Query Language. lets you access and manipulate databases. What Can SQL do? • • • • • • • • • • SQL can execute queries against a database SQL can retrieve data from a database SQL can insert records in a database SQL can update records in a database SQL can delete records from a database SQL can create new databases SQL can create new tables in a database SQL can create views in a database SQL can set permissions on tables, procedures, and views … SELECT A1,...,Ak FROM R1,...,Rn WHERE Condition(B1,...,Bm) Attributes List of relation names Condition(B1,...,Bm) == Condition over attributes The result of an SQL query is a list of tuples . Example Courses Student sid course 2 111 DB Zohar 1 111 PL Rami 2 222 PL sid name year 111 Alan 222 333 The query : SELECT DISTINCT name FROM Student, courses WHERE Student.sid = Courses.sid The query result name Alan Zohar CREATE DATABASE dbname; CREATE TABLE table_name ( column_name1 data_type(size), column_name2 data_type(size), column_name3 data_type(size), .... ); Aggregation : Aggregate Functions SQL aggregate functions return a single value, calculated from values in a column. Useful aggregate functions: AVG() - Returns the average value COUNT() - Returns the number of rows FIRST() - Returns the first value LAST() - Returns the last value MAX() - Returns the largest value MIN() - Returns the smallest value SUM() - Returns the sum example : SELECT AVG(column_name) FROM table_name Aggregate Functions CREATE TABLE SBP _DATA (PID ,GENDER, SBP) AS FOR EACH p in PATIENTS WITH SBP AS Normal ( SELECT s.MEAN, s.STD FROM SPB_PARAM s ) SELECT p.PID, p.GENDER ,b.VALUE FROM SBP b All the data All the data parametrization Stochastic model CREATE TABLE SBP _DATA (PID ,GENDER, SBP) AS FOR EACH p in PATIENTS WITH SBP AS Normal ( SELECT s.MEAN , s.STD FROM SPB _PARAM s ) SELECT p.PID, p.GENDER ,b.VALUE FROM SBP b. How it work : 1) MCDB calls the Initialize(SEED) method. 2) MCDB executes the queries that specify the input parameter tables to the VG function. 3) The result ,made available to the VG function in the form of a sequence of arrays called parameter vectors. 4) The parameter vectors are fed into the VG function via a sequence of calls to TakeParams() 5) MCDB then executes the first Monte Carlo iteration by repeatedly calling OutputVals() to produce the rows of the VG function’s output table, 6) MCDB knows that the last output row has been generated when OutputVals() returns a NULL result . 7) MCDB invokes the VG function’s Finalize() method, which deletes any internal VG-function data structures Implementation : Example implement DiscreteChoice function for strings: Accepts x1; x2; : : : ; xn and “weights” w1;w2; : : : ;wn then normalizes the weights into a vector of probabilities P = (p1; p2; : : : ; pn) with pi = wi/ wj , returns a random string X distributed according to P. The TakeParams() function simply adds the incoming parameter vector v to the list L. The Finalize() method de-allocates the storage for L and destroys myRandGen. VG functions can be arbitrary, it is very difficult to analytically compute the effect on the query result of the uncertainty that they embody. The solution MCDB avoids this problem by, 1. using the VG functions to generate a large number of independent and identically distributed. 2. realizations of the random database . running the query on each of them. 3. summarizes the effect of the underlying uncertainty in the form of probability distribution over the possible query results . All the data parametrization All the data VG F Tuple bundles n database instances Stochastic model TUPLE BUNDLES IN DETAIL : • A tuple bundle T with schema S : Simply an array of N tuples • Tuple bundles are manipulated using the new operators and all SQL operators . • Realized attribute values for a random table R can be bundled • With requirement on set of TUPLE BUNDLES ti for each i the set ri = Ùj tj[i] . • MCDB tries to bundle tuples so as to maximize the number of “constant” attributes. • MCDB allows the implementer of a VG function to specify attributes as constant as a hint to the system . • MCDB creates one tuple bundle for every distinct combination of constant-attribute values encountered. • Non-constant attributes values compressed form by Storing only the seed . All the data parametrization Stochastic model All the data VG F n query results SQL n database instances Query Answer : exact computation imposes strong restrictions both on the class of queries that can be handled and on the characteristics of the query answer that can be evaluated . The Monte Carlo approach is that the same general purpose methods apply to any uncertainty model. CREATE TABLE SBP _DATA (PID ,GENDER, SBP) AS FOR EACH p in PATIENTS WITH SBP AS Normal ( SELECT s.MEAN, s.STD FROM SPB_PARAM s ) SELECT p.PID , p.GENDER , b.VALUE FROM SBP b. EXPERIMENTS : Our experimental study is similarly focused, and has two goals: 1. To demonstrate examples of non trivial and “what-if” analyses that are made possible by MCDB. 2. If this analysis is actually practical in realistic application environment. CREATE VIEW from_japan AS SELECT * FROM nation , supplier , lineitem , partsupp WHERE n_name=’JAPAN’ AND s_suppkey=ps_suppkey AND ps_partkey=l_partkey AND ps_suppkey=l_suppkey AND n_nationkey = s_nationkey CREATE VIEW increase_per_cust AS SELECT o_custkey AS custkey, SUM(yr(o_orderdate)-1994.0) /SUM(1995.0-yr(o_orderdate)) AS incr FROM ORDERS WHERE yr(o_orderdate)=1994 OR yr(_orderdate)=1995 GROUP BY o_custkey CREATE TABLE order_increase AS FOR EACH o in ORDERS WITH temptable AS Poisson ( SELECT incr FROM increase_per_cust WHERE o_custkey=custkey AND yr(o_orderdate)=1995 ) SELECT t.value AS new_cnt, o_orderkey FROM temptable_t SELECT SUM(newRev-oldRev) FROM ( SELECT l extendedprice*(1.0-l_discount)*new_cnt AS newRev, (l_extendedprice*(1.0-l_discount)) AS oldRev FROM increase_per_cust, from_japan WHERE l_orderkey=o_orderkey) Q3 : CREATE TABLE prc_hist( ph_month, ph_year, ph_prc, ph_partkey) AS FOR EACH ps in partsupp WITH time series AS Random Walk ( VALUES (ps_supplycost,12,"Dec",1995,(-0.02,0.04)) SELECT month, year, value, ps_partkey FROM time_series_ts SELECT MIN(ph_prc) AS min_prc, ph_month, ph_year, ph_partkey FROM prc_hist GROUP BY ph_month, ph_year, ph_partkey SELECT SUM(min_prc*l_quantity) FROM prc_hist, lineitem, orders WHERE ph_month=month(o_orderdate) AND orderkey=o_orderkey AND yr(o_orderdate)=1995 AND ph_partkey=l_partkey Query Answer : The results obtained by running the four queries The MCDB New Operators The Seed Operator. the Seed operator appends to each tuple created by R’s FOR EACH statement an integer unique to the (tuple, VG function) The Instantiate Operator this operator uses a VG function to generate a set of attribute values corresponding to a Monte Carlo iteration. The Split Operator. takes as input a tuple bundle, with a set of attributes Atts. splits the tuple bundle into multiple tuple bundles, for ech change to constant att . The Inference Operator. and unbundled tuples . The output is distinct tuples Question ??????