CAREER: Generating Provably Correct Query Optimizers

Project Award Number: IIS-9984960

Principal Investigator

Mitch Cherniack
Brandeis University, Department of Computer Science
415 South St., Mailstop 018, Waltham, MA, 02454
Phone: (781) 736-2738
Fax: (781) 736-2741
Email: [email protected]
URL: http://www.cs.brandeis.edu/~mfc/

Keywords

query optimization, formal methods, order optimization, access methods

Project Summary

The central goal of this project is to assist researchers and developers in building query optimizers that are "provably correct". Specifically, this research group is building a framework that accepts specifications of optimizer components and their interactions, and generates optimizers that can be shown to satisfy the property that the plans they construct always return the data specified in the user's query. The group's approach separates the components of the optimizer that require correctness proofs (the safety-critical components) from those that do not. Languages are under design for formally specifying those components, and tools are under construction that both generate these components according to the specifications and generate proof obligations enabling their verification with an automated theorem prover.

The experimental research is linked to the educational goal of training students in the application of formal methods in building large software systems. The results of this project will provide a sandbox for database researchers in both academia and industry to introduce new optimizer techniques and products while providing tangible guarantees that they are free of errors. A peripheral goal of this project is to exploit the semantic specifications of queries, which are required to formally specify optimizer behavior, in developing new semantic query optimization techniques.
Publications and Products

[WC03b] Xiaoyu Wang and Mitch Cherniack, "Avoiding Sorting and Grouping During Query Processing", Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), September 2003. To appear.

[WC03a] Xiaoyu Wang and Mitch Cherniack, "Order Property Optimization", Brandeis University Technical Report, August 2003. Available from http://www.cs.brandeis.edu/~mfc/papers/opopttr.pdf

[AC02] Daniel Abadi and Mitch Cherniack, "Visual COKO: A Debugger for Query Optimizer Development", demonstration paper, Proceedings of the ACM SIGMOD Conference, Madison, WI, June 2002.

Visual COKO Optimizer Development Debugger. Available for Linux from http://www.cs.brandeis.edu/~cokokola/dist/cokokola-pre-oxed.tar.gz

COKO-KOLA Query Rewriter Generator. Available for Linux and Solaris from http://www.cs.brandeis.edu/~cokokola/dist/cokokola-pre-oxed.tar.gz

Project Impact

The results of this project will have an impact on query optimizer technology and development. Specifically, the project will provide a methodology and associated tools for query optimizer development that ensure provable correctness. In addition, the semantic specifications of query representations will influence semantic query optimization techniques, as we demonstrated in [CZ98b] and [WC03b].

Now in its third year, this project has supported one Ph.D. student (Xiaoyu Wang, 4th year), one Masters student (Antonella Di Lillo, now graduated), and (with the assistance of REU supplements) two undergraduate students (Daniel Abadi, now a Ph.D. student at MIT, and Marina Zlatkina).

The PI introduced three new project-related courses into the Brandeis Computer Science curriculum: an introductory database course (COSI 127b), an advanced course in database implementation (COSI 128b), and a graduate seminar course in query optimization (COSI 227b, taught in Spring 2001).
Goals, Objectives and Targeted Activities

The first two years of the grant were used to design and build a framework for specifying query rewriters and plan generators in a manner permitting their automatic generation and formal verification. To this end, we ported the query rewriter generator code base (built on Solaris while the PI was a graduate student at Brown University) to Linux and modified it to support semantic query rewriting, as described in [CZ98b]. We built a graphical debugger that gives query optimizer developers a development environment for specifying query rewriting components; it provides visual traces of queries as they are transformed during rewriting, as well as standard debugger features such as breakpoints and "stepping". We designed a generic plan generator algebra (GPA) and formally specified it using the Larch specification tool. Finally, we used this formal specification and the Larch theorem prover to verify several translation rules (rules that translate KOLA queries into their plan algebra equivalents) and refinement rules (rules that refine plans into equivalent plans based on semantic conditions).

This past year, we have exploited this framework to develop new strategies for refining query plans generated during plan generation. Specifically, we have studied techniques for inferring organization properties (e.g., ordering, grouping) of intermediate query results for the purpose of avoiding costly “enforcing” operations that would reorganize such results to guarantee satisfaction of these properties. This work exploits the semantic properties known of stored data as well as the semantic properties known to be preserved or enforced by query plan operators. Our current emphasis is on generalizing this work to infer partial orderings and groupings in processing queries.

Area Background

Database systems are amongst the most complex software systems in existence, and query optimizers (optimizers) are amongst their most complex components.
Optimizers map declarative descriptions of data (queries) to algorithmic plans that retrieve the data that the queries describe. An optimizer is correct if the plan it returns is guaranteed to produce exactly the data specified in the original query.

Formal methods are mathematically based languages, techniques and tools for specifying and verifying complex software systems. These tools include a formal specification language in which the behavior of a system can be formally expressed, and an automated theorem prover that assists developers in proving certain properties about their formally specified system (e.g., that the implementation is faithful to the specification). Formal methods have been successfully applied to the development of such systems as transaction processing systems, compilers, real-time systems and network management systems. Query optimizers represent another natural application of this technology.

Our work distinguishes between the two main phases performed during query optimization: query rewriting and plan generation. During query rewriting, a posed query is transformed into an equivalent query that is in some way easier to generate a plan for or cheaper to execute. During plan generation, the query or queries produced by query rewriting are mapped to algorithms (plans) that efficiently extract and generate the specified data.

Area References

[Cha98] Chaudhuri, S., An Overview of Query Optimization in Relational Systems. Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 13, Seattle, WA, June, 1998.

[CZ98b] Cherniack, M. and Zdonik, S., Inferring Function Semantics to Optimize Queries. Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), New York, NY, August, 1998.

[CZ98a] Cherniack, M. and Zdonik, S., Changing the Rules: Transformations for Rule-Based Optimizers. Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, WA, June, 1998.

[CZ96] Cherniack, M. and Zdonik, S., Rule Languages and Internal Algebras for Rule-Based Optimizers. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Qc, June, 1996.

[GHG+92] Guttag, J.V., Horning, J.J., Garland, S.G., Jones, K.D., Modet, A., and Wing, J.M., Larch: Languages and Tools for Formal Specifications. Springer-Verlag, 1992.

[Gra92] Graefe, G., Query Evaluation Techniques for Large Databases. ACM Computing Surveys, Volume 25, Number 2, pp. 73-170, 1993.

[PHH92] Pirahesh, H., Hellerstein, J.M., and Hasan, W., Extensible/Rule-Based Query Rewrite Optimization in Starburst. Proceedings of the SIGMOD International Conference on Management of Data, pp. 39-48, San Diego, CA, June, 1992.

[SAC+79] Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., and Price, T.G., Access Path Selection in a Relational Database Management System. Proceedings of the SIGMOD International Conference on Management of Data, pp. 23-34, 1979.

[SSM96] Simmen, D.E., Shekita, E.J., and Malkemus, T., Fundamental Techniques for Order Optimization. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Qc, June, 1996.

Potential Related Projects

Projects that specifically address the formal verification of query optimizers include Leonidas Fegaras' LambdaDB work (which was also supported by a CAREER award), Grant Weddell's Universal Data Representation work, and Andreas Heuer's work on the CROQUE query optimizer project. Semantic query optimization has been addressed in Gryz’s Semantic Query Caching Project, as well as the SQO project of Minker, Raschid et al.

Project Websites

http://www.cs.brandeis.edu/~cokokola

Online Software

COKO-KOLA v0.9 with Visual Debugger. Available at http://www.cs.brandeis.edu/~cokokola/dist/cokokola0.9-preoxed.tar.gz
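
Illustrative Sketch: Avoiding a Redundant Sort

The order-inference idea described under Goals, Objectives and Targeted Activities can be illustrated with a minimal Python sketch. It is not the algorithm of [WC03b] and does not use KOLA, GPA or Larch; it assumes a toy plan algebra with three hypothetical operators (Scan, Filter, Sort) and treats an ordering as satisfied only when the required sort keys are a prefix of the inferred ones. All class, function, table and column names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# An ordering is a tuple of column names, major sort key first.
Order = Tuple[str, ...]


@dataclass(frozen=True)
class Scan:
    table: str
    stored_order: Order = ()  # ordering the stored data is known to have


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    predicate: str  # opaque here; filtering removes rows but never reorders them


@dataclass(frozen=True)
class Sort:
    child: "Plan"
    keys: Order  # the "enforcing" operator: guarantees this ordering


Plan = Union[Scan, Filter, Sort]


def inferred_order(plan: Plan) -> Order:
    """Infer the ordering of a plan's output from its operators."""
    if isinstance(plan, Scan):
        return plan.stored_order          # property of the stored data
    if isinstance(plan, Filter):
        return inferred_order(plan.child)  # property preserved by the operator
    if isinstance(plan, Sort):
        return plan.keys                  # property enforced by the operator
    raise TypeError(f"unknown plan node: {plan!r}")


def satisfies(have: Order, want: Order) -> bool:
    """Simplifying assumption: 'want' is satisfied if it is a prefix of 'have'."""
    return have[:len(want)] == want


def drop_redundant_sorts(plan: Plan) -> Plan:
    """Refine a plan by removing Sort nodes whose input is already ordered."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Filter):
        return Filter(drop_redundant_sorts(plan.child), plan.predicate)
    child = drop_redundant_sorts(plan.child)
    if satisfies(inferred_order(child), plan.keys):
        return child                      # ordering already holds; skip the sort
    return Sort(child, plan.keys)


# Hypothetical table "emp", stored in (dept, salary) order.
plan = Sort(Filter(Scan("emp", ("dept", "salary")), "salary > 50000"), ("dept",))
refined = drop_redundant_sorts(plan)
```

Here the outer Sort on dept is dropped, because the scan is known to deliver rows ordered by (dept, salary) and the filter preserves that ordering. A sort on salary alone would be kept, since salary is not a prefix of the inferred ordering. In the project's setting, each such refinement rule would carry a proof obligation to be discharged with a theorem prover; this sketch does not attempt that step.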