CAREER: Generating Provably Correct Query Optimizers
Project Award Number: IIS-9984960
Principal Investigator
Mitch Cherniack
Brandeis University, Department of Computer Science
415 South St., Mailstop 018,
Waltham, MA, 02454
Phone: (781) 736-2738
Fax: (781) 736-2741
Email: [email protected]
URL: http://www.cs.brandeis.edu/~mfc/
Keywords
query optimization, formal methods, order optimization, access methods
Project Summary
The central goal of this project is to assist researchers and developers in building query optimizers that are
"provably correct". Specifically, this research group is building a framework that accepts specifications of
optimizer components and their interactions, and generates optimizers that can be shown to satisfy the property
that the plans they construct always return the data specified in the user's query. The group's approach separates the
components of the optimizer that require correctness proofs (the safety-critical components) from those
that do not. Languages are under design for formally specifying these components, and tools are under
construction that both generate the components according to their specifications and generate the proof obligations
needed to verify them with an automated theorem prover. The experimental research is linked to the
educational goal of training students in the application of formal methods to building large software systems. The
results of this project will provide a sandbox in which database researchers in both academia and industry can introduce
new optimizer techniques and products while providing tangible guarantees that they are free of errors. A
peripheral goal of this project is to exploit the semantic specifications of queries, which are required to formally specify
optimizer behavior, in developing new semantic query optimization techniques.
Publications and Products
[WC03b]
Xiaoyu Wang and Mitch Cherniack, "Avoiding Sorting and Grouping During Query Processing",
Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), September,
2003. To Appear.
[WC03a]
Xiaoyu Wang and Mitch Cherniack, "Order Property Optimization", Brandeis University Technical
Report, available from http://www.cs.brandeis.edu/~mfc/papers/opopttr.pdf, August 2003.
[AC02]
Daniel Abadi and Mitch Cherniack. Visual COKO: A Debugger for Query Optimizer Development.
Demonstration paper. Proceedings of the ACM SIGMOD Conference, 2002, Madison, WI, June,
2002.
Visual COKO Optimizer Development Debugger. Available for Linux from
http://www.cs.brandeis.edu/~cokokola/dist/cokokola-pre-oxed.tar.gz
COKO-KOLA Query Rewriter Generator. Available for Linux and Solaris from
http://www.cs.brandeis.edu/~cokokola/dist/cokokola-pre-oxed.tar.gz.
Project Impact
The results of this project will have an impact on query optimizer technology and development. Specifically, the project
will provide a methodology and associated tools for query optimizer development that ensure provable
correctness. In addition, the semantic specifications of query representations will influence semantic query
optimization techniques, as we demonstrated in [CZ98b] and [WC03b]. Now in its 3rd year, this project has
supported one Ph.D. student (Xiaoyu Wang, 4th year), one Masters student (Antonella Di Lillo, now graduated),
and (with the assistance of REU supplements) two undergraduate students (Daniel Abadi, now a Ph.D. student at
MIT, and Marina Zlatkina). The PI introduced 3 new project-related courses into the Brandeis Computer Science
curriculum: an introductory database course (COSI 127b), an advanced course in database
implementation (COSI 128b), and a graduate seminar course in query optimization (COSI 227b, taught in Spring
2001).
Goals, Objectives and Targeted Activities
The first two years of the grant were used to design and build a framework for specifying query rewriters and plan
generators in a manner permitting their automatic generation and formal verification. To this end, we ported the
query rewriter generator code base (built over Solaris while the PI was a graduate student at Brown University) to
Linux and modified it to support semantic query rewriting, as described in [CZ98b]. We built a graphical
debugger that gives query optimizer developers a development environment for specifying query rewriting
components: it provides visual traces of queries as they are transformed during rewriting, as well as standard
debugger features such as breakpoints and "stepping". We designed a generic plan generator algebra (GPA) and
formally specified it using the Larch specification tool. Finally, we used this formal specification and the Larch theorem
prover to verify several translation rules (rules that translate KOLA queries into their plan algebra equivalents)
and refinement rules (rules that refine plans into equivalent plans based on semantic conditions). This past year,
we have exploited this framework to develop new strategies for refining the query plans produced during plan
generation. Specifically, we have studied techniques for inferring organization properties (e.g., ordering,
grouping) of intermediate query results for the purpose of avoiding costly "enforcing" operations that would
reorganize such results to guarantee satisfaction of these properties. This work exploits the semantic properties
known to hold of stored data as well as the semantic properties known to be preserved or enforced by query plan
operators. Our current emphasis is on generalizing this work to infer partial orderings and groupings in processing
queries.
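The idea of inferring an ordering to avoid an enforcing operation can be illustrated with a minimal sketch. It is not the project's algorithm or the GPA: the operator classes (Scan, Filter, Sort) and the functions inferred_order, satisfies and enforce_order are hypothetical names introduced only for this illustration, and only total orderings on a prefix of the sort keys are handled.

```python
from dataclasses import dataclass
from typing import Tuple

Order = Tuple[str, ...]  # columns the rows are sorted on, major key first


@dataclass(frozen=True)
class Scan:
    table: str
    stored_order: Order = ()  # ordering known to hold of the stored data


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str  # kept opaque here; filtering never reorders rows


@dataclass(frozen=True)
class Sort:
    child: object
    keys: Order


def inferred_order(plan) -> Order:
    """Return the ordering the plan's output is known to satisfy."""
    if isinstance(plan, Scan):
        return plan.stored_order
    if isinstance(plan, Filter):
        return inferred_order(plan.child)  # order-preserving operator
    if isinstance(plan, Sort):
        return plan.keys  # order-enforcing operator
    raise TypeError(f"unknown operator: {plan!r}")


def satisfies(have: Order, want: Order) -> bool:
    """Rows sorted on `have` are also sorted on any prefix of `have`."""
    return have[:len(want)] == want


def enforce_order(plan, want: Order):
    """Add a Sort only when the inferred order does not already satisfy `want`."""
    if satisfies(inferred_order(plan), want):
        return plan
    return Sort(plan, want)
```

Under these assumptions, a filter over a table stored in (dept, name) order already satisfies a request for ordering on dept, so enforce_order returns the plan unchanged; a request for ordering on name alone still receives a Sort.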
Area Background
Database systems are amongst the most complex software systems in existence, and query optimizers (optimizers)
are amongst their most complex components. Optimizers map declarative descriptions of data (queries) to
algorithmic plans that retrieve the data that the queries describe. An optimizer is correct if the plan it returns is
guaranteed to produce exactly the data specified in the original query.
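Stated a little more formally, the correctness condition can be written as below. The notation is an assumption of this sketch rather than the project's own: Q is the set of queries, D the set of database states, opt the optimizer, and the semantic brackets denote the data described by a query or produced by a plan.

```latex
% Requires the stmaryrd package for \llbracket and \rrbracket.
\forall q \in Q,\ \forall d \in D:\quad
  \llbracket \mathit{opt}(q) \rrbracket_{\mathrm{plan}}(d)
  \;=\;
  \llbracket q \rrbracket_{\mathrm{query}}(d)
```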
Formal methods are mathematically based languages, techniques and tools for specifying and verifying complex
software systems. These tools include a formal specification language in which the behavior of a system can be
formally expressed, and an automated theorem prover that assists developers in proving certain properties about
their formally specified system (e.g., that the implementation is faithful to the specification). Formal methods
have been successfully applied to the development of such systems as transaction processing systems, compilers,
real-time systems and network management systems. Query optimizers represent another natural application of
this technology.
Our work distinguishes between the two main phases of query optimization: query rewriting and
plan generation. During query rewriting, a posed query is transformed into an equivalent query that is, in some
way, easier to generate a plan for or cheaper to execute. During plan generation, the query or queries produced by
query rewriting are mapped to algorithms (plans) that efficiently extract and generate the specified data.
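The two phases can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not COKO-KOLA or the project's plan generator: the query nodes (Table, Select), the single rewrite rule (merging nested selections) and the functions rewrite, generate_plan and optimize are all hypothetical, and a plan is simply a Python function from a database (a dict of row lists) to a list of rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Pred = Callable[[Row], bool]
Database = Dict[str, List[Row]]


class Table:
    def __init__(self, name: str):
        self.name = name


class Select:
    def __init__(self, pred: Pred, child):
        self.pred, self.child = pred, child


def rewrite(q):
    """Query rewriting: merge Select(p, Select(r, x)) into a single Select."""
    if isinstance(q, Select):
        child = rewrite(q.child)
        if isinstance(child, Select):
            p, r = q.pred, child.pred
            # Apply the inner predicate first, as the original query would.
            return Select(lambda row: r(row) and p(row), child.child)
        return Select(q.pred, child)
    return q


def generate_plan(q) -> Callable[[Database], List[Row]]:
    """Plan generation: map each query node to an executable algorithm."""
    if isinstance(q, Table):
        return lambda db: list(db[q.name])  # full table scan
    if isinstance(q, Select):
        source = generate_plan(q.child)
        return lambda db: [row for row in source(db) if q.pred(row)]
    raise TypeError(f"unknown query node: {q!r}")


def optimize(q) -> Callable[[Database], List[Row]]:
    """Run both phases: rewrite the query, then generate a plan for it."""
    return generate_plan(rewrite(q))
```

Comparing optimize(q)(db) with generate_plan(q)(db) on sample data only spot-checks that the rewrite preserved the query's meaning. Such a test covers one database state, whereas a correctness proof of the kind this project targets must cover all of them.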
Area References
[Cha98]
Chaudhuri, S., An Overview of Query Optimization in Relational Systems. Proceedings of the
Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, WA, June 1-3, 1998.
[CZ98b]
Cherniack, M. and Zdonik, S., Inferring Function Semantics to Optimize Queries. Proceedings of the
24th International Conference on Very Large Data Bases (VLDB), New York, NY, August, 1998.
[CZ98a]
Cherniack, M. and Zdonik, S., Changing the Rules: Transformations for Rule-Based Optimizers.
Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA,
June, 1998.
[CZ96]
Cherniack, M. and Zdonik, S., Rule Languages and Internal Algebras for Rule-Based Optimizers.
Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Qc,
June, 1996.
[GHG+92]
Guttag, J.V., Horning, J.J., Garland, S.G., Jones, K.D., Modet, A., and Wing, J.M., Larch: Languages
and Tools for Formal Specifications. Springer-Verlag, 1992.
[Gra92]
Graefe, G., Query Evaluation Techniques for Large Databases. ACM Computing Surveys, Volume 25,
Number 2, 1993, pp. 73-170.
[PHH92]
Pirahesh, Hamid, Hellerstein, Joseph M., and Hasan, Waqar, Extensible/Rule-Based Query Rewrite
Optimization in Starburst. Proceedings of the ACM SIGMOD International Conference on Management
of Data, pages 39-48, San Diego, CA, June, 1992.
[SAC+79]
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., and Price, T.G., Access Path Selection in a
Relational Database Management System. Proceedings of the ACM SIGMOD International Conference on
Management of Data, pages 23-34, 1979.
[SSM96]
Simmen, David E., Shekita, Eugene J., and Malkemus, Timothy, Fundamental Techniques for Order
Optimization. Proceedings of the ACM SIGMOD International Conference on Management of Data,
Montreal, Qc, June, 1996.
Potential Related Projects
Projects that specifically address the formal verification of query optimizers include Leonidas Fegaras' LambdaDB work (which was also supported by a CAREER award), Grant Weddell's Universal Data Representation work,
and Andreas Heuer's work on the CROQUE query optimizer project. Semantic query optimization has been
addressed in Gryz's Semantic Query Caching Project, as well as in the SQO project of Minker, Raschid et al.
Project Websites
http://www.cs.brandeis.edu/~cokokola
Online Software
COKO-KOLA v0.9 with Visual Debugger. Available at http://www.cs.brandeis.edu/~cokokola/dist/cokokola0.9-preoxed.tar.gz.