* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Slides
Survey
Document related concepts
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Access wikipedia , lookup
Serializability wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Ingres (database) wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Encyclopedia of World Problems and Human Potential wikipedia , lookup
Concurrency control wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
Transcript
Dependency Problems for Future Database Systems: Ingres Project D Adrian Hudnott Warwick Postgraduate Colloquium in Computer Science 2008 Research Problems • The Third Manifesto is the subtitle of a book written by Chris Date and Hugh Darwen that defines what a relational database system should and should not do • The Third Manifesto and related publications do not include any implementation techniques • There is no scalable implementation of any part of The Manifesto currently in existence • There are concerns that some of the demands of The Manifesto cannot be met using extensions of techniques in use by SQL DBMS • When these were catalogued all of the difficult ones had a common theme: constraints • All the problems involve the DBMS efficiently deducing a fact using known properties about the data source • Little or no access to the actual data Research Problems • Constraint Enforcement • Semantic Query Optimization • Multiple Simultaneous Assignment • Updates to Views • Interface with SQL Databases • Determination of Type (compile- and runtime) Database Representation • Use Datalog (like Prolog) • E.g. supplier(s1, supplier(s2, supplier(s3, supplier(s4, supplier(s5, smith, 20, london). jones, 10, paris). blake, 30, paris). clarke, 20, london). adams, 30, athens). Views • Expressed using Datalog rules • E.g. supplier_city(C) :supplier(SN, N, ST, C). Notation • Extensional Database: EDB • Intensional Database: IDB • Statistics: Stat • Constraints (invariant): Inv • Assignments: Ri := Ui • Queries: Qi(t) • Boolean expressions: Bi(t) • Cost function: c(exp) Constraint Enforcement Inv IDB U / R Stat EDB (Prove Safety) Inv U / R Or, alternatively: Inv IDB U / R Stat EDB ¬Inv U / R (Prove Failure) N.B. Minimize access to the database (EDB) Updates in the Presence of Views IDB U / R • Example insertion: supplier(s6, carter, 30, oxford). • R is “supplier” • U is <previous definition> OR <tuple is the one above> • supplier_city(C) :- supplier(SN, N, ST, C); (SN=s6, N=carter, ST=30, C=oxford). Materialized Views • Materialized View: contents of a view cached to avoid re-calculation. • Can be modelled using constraints CREATE TABLE supplier (City PRIMARY KEY REFERENCES supplier_city,... CREATE TABLE supplier_city (City PRIMARY KEY REFERENCES supplier); • And vice versa: A constraint is a view that is always empty. Multiple Assignment: Naive Implementation • Example: X := Y, Y := X; • Internal representation: atomic { Z := X; X := Y; Y := Z; } Multiple Assignment .... R’:= U’, R1 := U1, …, Rn := Un; Stat EDB IDB U / R U ' U / Rt Stat EDB IDB U ' t IIF U’ is the first assignment: Inv ( Stat EDB IDB U / R U ' U / Rt Stat EDB IDB U ' t ) Multiple Assignment Idea 1) Semantically optimize RHS expressions 2) Place RHS expressions into canonical form and find subexpressions equal to other RHS expressions 3) Build a dependency graph 4) Depth first search to find cycles 5) Break cycles by creating simulated copies 6) Schedule and execute assignments Transaction Repair • Transaction repair is modifying an update request so that it conforms with the constraints • Compensating actions – E.g. ON DELETE CASCADE • Research on more advanced repairs – Generates more than one repair View Update = Transaction Repair + Disambiguation DELETE FROM supplier_city WHERE City = ‘London’; Transaction repair needed: DELETE FROM supplier WHERE city=‘London’; For more complex cases, disambiguation is needed – Default values, rules, etc. Semantic Optimization Inv IDB Stat Q1 t Q2 t (Equivalence) and: CQ1 < CQ2 No access to the data (Lower Cost) Missing Information • NULLs are flawed for many reasons – Simple example: B OR NOT B is NULL if B is NULL • Can consider missing data as free variables • Problem is to efficiently determine if a formula is true for all possible values of the free variable Inv IDB Stat EDB Q(t ) B1 (t ) B2 (t ) and x dom( a ) : B2 (t ) B2 (t[ x / a ]) Current Progress • Catalogued all known issues requiring investigation • Outlined solutions or literature for the lesser problems – E.g. comparison of large values • Devised formal semantics for type checking Tutorial D programs • Elaborated on the problems requiring original research in the context of dependency • Resolved perceived ambiguities in The Manifesto • Detailed ideas for future development into a solution to the multiple assignment problem • Completed a literature review on constraint checking techniques Project D Roadmap Further literature review & code familiarization Address multiple assignment problem Summer 2008 Oct 2008 – Jan 2009 Development Phase 1 Feb – Aug 2009 Development Phase 2 Aug 2009 – Mar 2010 Thesis write-up Apr – Sep 2010 First complete build Mid – Late 2011 Product released Early 2012 Summing up Project D • Essentially about relieving the industry of the burdens that have been imposed by SQL implementations in the past 30 years • We think that the relational model can be properly implemented without SQL’s restrictions and ad hoc additions – … but requires innovative research • Project D is still in very early development • However, some small successes already