An evaluation of the integrity of MySQL and Oracle database management systems

By: Tonia S Stakemire
Supervised by: John Ebden

Submitted in partial fulfilment of requirements for the degree of Bachelor of Science (Honours)
Department of Computer Science
Rhodes University
Grahamstown
November 2000

Abstract

In order for information in a database to be accurate, the integrity of the data needs to be enforced. A Database Management System (DBMS) is available to ensure this. This project evaluates the integrity of two DBMSs, namely Oracle and MySQL. By testing these DBMSs, the results help to highlight the problems of each and allow for improving current databases or selecting a new one. Oracle is a commercial product with all the standard features such as an SQL interface and user rights, as well as additional functionality such as stored procedures and row-level locking. While MySQL has the edge in terms of speed, it lacks certain advanced features found in Oracle. This project compares the features of the two DBMSs, and how these potentially contribute to the integrity of the data.

It was found that Oracle did not violate the integrity of the data, while MySQL did on several occasions. MySQL did not support check constraints, triggers, foreign keys, transactions or row-level locking. For integrity enforcement MySQL would not be the best choice and Oracle would be more suitable. Oracle is also well suited to an E-commerce environment because it enforces integrity, has a means to write business rules and is web orientated. MySQL, on the other hand, suits an environment where the operations consist mainly of SELECT statements and where speed is essential, and would therefore be suitable for a search engine.

Acknowledgements

I would like to thank the following people for helping me throughout this year and for their assistance with my project in particular:

My supervisor, John Ebden: For his guidance and motivation all through the year.
The Calnet lab: For many hours of entertainment and their assistance on many occasions.
The staff of the Computer Science Department: For all the support.
Patsi: For the great times during the year and for proofreading my thesis for me.
Peter: For his encouragement and beneficial suggestions throughout the year. Also for proofreading my final document.

Table of Contents

Chapter 1: Introduction
  1.1 Aim
  1.2 Overview of Project
  1.3 Motivation
  1.4 Overview of Chapters
  1.5 Summary of Chapter
Chapter 2: Background
  2.1 Possible Criteria for Evaluating a DBMS
  2.2 Why Integrity is Used
  2.3 Overview of Oracle
  2.4 Overview of MySQL
  2.5 Summary of Chapter
Chapter 3: Considerations and Design
  3.1 Implementation Considerations
    3.1.1 External Influences
    3.1.2 Operating System
    3.1.3 Software
    3.1.4 Sufficient Tests
    3.1.5 Accurate Tests
  3.2 Features that MySQL Lacks which Influence the Experiment
    3.2.1 Transactions
    3.2.2 Stored Procedures/Triggers
    3.2.3 Foreign Key
  3.3 Design of Experiment
    3.3.1 Stage 1: Install and Setup DBMS
    3.3.2 Stage 2: Modify Definitions and Create Database
    3.3.3 Stage 3: Design Experiment
    3.3.4 Stage 4: Implement Tests
    3.3.5 Stage 5: Verifying Actions
    3.3.6 Stage 6: Analyse and Write-up Results
  3.4 Summary of Chapter
Chapter 4: Prerequisite for Implementation
  4.1 Overview of Perl
    4.1.1 DBI Modules and their Functions
  4.2 Overview of PL/SQL
    4.2.1 Benefits of PL/SQL
  4.3 Summary of Chapter
Chapter 5: Integrity Constraints
  5.1 Different Types of Integrity
    5.1.1 Domain Constraints
    5.1.2 Entity Integrity
    5.1.3 Referential Integrity
    5.1.4 Triggers
  5.2 Advantages of Integrity Constraints
    5.2.1 Declarative Ease
    5.2.2 Centralized Rules
    5.2.3 Maximum Application Development Productivity
    5.2.4 Immediate User Feedback
    5.2.5 Superior Performance
    5.2.6 Flexibility
  5.3 Domain Constraint Experiments
    5.3.1 Data Type Tests and Results
      5.3.1.1 String/Character Tests and Results
        5.3.1.1.1 Analysis of Error Messages (String Tests)
        5.3.1.1.2 Analysis of Actions (String Tests)
      5.3.1.2 Integer Tests and Results
        5.3.1.2.1 Analysis of Error Messages (Integer Tests)
        5.3.1.2.2 Analysis of Actions (Integer Tests)
      5.3.1.3 Float Tests and Results
        5.3.1.3.1 Analysis of Error Messages (Float Tests)
        5.3.1.3.2 Analysis of Actions (Float Tests)
      5.3.1.4 Enumeration Data Type Tests and Results
        5.3.1.4.1 Analysis of Error Messages (Enumeration Tests)
        5.3.1.4.2 Analysis of Actions (Enumeration Tests)
    5.3.2 Check Constraint Tests and Results
      5.3.2.1 Analysis of Error Messages (Check Constraints)
      5.3.2.2 Analysis of Actions (Check Constraints)
    5.3.3 Not NULL Tests and Results
      5.3.3.1 Analysis of Error Messages
      5.3.3.2 Analysis of Actions (Null Tests)
    5.3.4 Overall Evaluation of the Domain Experiments
  5.4 Entity Integrity
    5.4.1 Primary Key Tests and Results
      5.4.1.1 Analysis of Error Message (Primary Key Tests)
      5.4.1.2 Analysis of Action (Primary Key Tests)
    5.4.2 Unique Key Tests and Results
      5.4.2.1 Analysis of Error Message (Unique Key Tests)
      5.4.2.2 Analysis of Action (Unique Key Tests)
  5.5 Referential Integrity
    5.5.1 Foreign Key Tests and Results
      5.5.1.1 Analysis of Error Message (Foreign Key Tests)
      5.5.1.2 Analysis of Action (Foreign Key Tests)
    5.5.2 Example of Violation
  5.6 Triggers
  5.7 Summary of Chapter
Chapter 6: Transactions
  6.1 Overview of Transactions
    6.1.1 Properties of Transactions
    6.1.2 SQL Transaction Statements
    6.1.3 Operations and their Relevance
    6.1.4 Transaction States
    6.1.5 Problems with Transactions
    6.1.6 Importance of Transactions
  6.2 Design of Experiment
    6.2.1 Differences in the Design
    6.2.2 Methods of Simulating a Crash
    6.2.3 Atomicity Test
    6.2.4 Consistency Test
  6.3 Significant Findings
  6.4 Summary of Chapter
Chapter 7: Concurrency Control
  7.1 Problems With Concurrency
    7.1.1 Lost Updates
    7.1.2 Uncommitted Data
    7.1.3 Inconsistent Retrievals
  7.2 Issues Concerning Serialization
    7.2.1 Conflict Serialization
    7.2.2 View Serialization
  7.3 Different Lock Types and Granularity
    7.3.1 Oracle’s Locking System
    7.3.2 MySQL’s Locking System
  7.4 Design of the Experiment
  7.5 Results and Evaluation
  7.6 Summary of Chapter
Chapter 8: Conclusion
  8.1 Evaluation of MySQL
  8.2 Evaluation of Oracle
  8.3 Overall Evaluation of the Integrity
  8.4 Suitable Environment for Each
    8.4.1 Advantages of each DBMS
    8.4.2 Environment Suitable for MySQL
    8.4.3 Environment Suitable for Oracle
  8.5 Possible Future Extensions
    8.5.1 Further Research on this Project
    8.5.2 Use Different DBMS with Similar Projects
    8.5.3 Investigate Other Aspects of a DBMS
    8.5.4 Attempt to Solve Problems Found with MySQL
Appendix A

List of Figures

Figure 2.1 Diagram showing Market Share held by Oracle
Figure 2.2 Figure showing the results of the crash-me-test
Figure 5.1 An Example of where MySQL violates integrity
Figure 5.2 An example where a string is entered in place of a number
Figure 5.3 Example where data exceeds the maximum size of the number
Figure 5.4 An example situation where integrity is violated
Figure 5.5 Example where the value is not in the list
Figure 5.6 Example where it’s better not to have case sensitivity
Figure 5.7 A few examples where check constraints are necessary
Figure 5.8 Graph showing a comparison of the number of error messages
Figure 5.9 Charts showing the results of the Entity integrity experiments
Figure 5.9 Diagram showing how foreign keys link tables
Figure 5.10 Code showing the schema of a child entity
Figure 5.11 Code to check no violation of the foreign key constraint
Figure 5.12 Delete code to test cascade rule
Figure 5.13 Code to reveal changes after deleting
Figure 5.14 Example of violation of foreign key when inserting
Figure 5.15 Example where foreign key constraint violated when deleting
Figure 5.16 Example where foreign key constraint violated when deleting
Figure 6.1 Figure showing the states of a transaction
Figure 6.2 An example of a transaction that leads to deadlock
Figure 6.3 Code for the schema of the state table
Figure 6.4 Code for the schema for the airport table
Figure 6.5 Code for the schema for the city table
Figure 6.6 The procedure to handle an update of the state table
Figure 6.7 The procedure used when updating the airport table

List of Tables

Table 5.1 Table showing the data types supported by MySQL and Oracle
Table 5.2 Table showing the results of the string data type experiments
Table 5.3 Table showing the results of the integer data type experiments
Table 5.4 Table showing the results of the float data type experiments
Table 5.5 Table showing the results of the enum data type experiments
Table 5.6 Table showing the results of the check constraint experiments
Table 5.7 Table showing the results of the null constraint experiment
Table 5.8 Table showing the results of the primary key experiments
Table 5.9 Table showing the results of the unique key constraint experiment
Table 5.10 Table showing the rules for updating/deleting
Table 5.11 Table showing the results of the foreign key experiments
Table 7.1 Table showing possible situations and resulting conditions

CHAPTER 1
Introduction

1.1 Aim

The main aim of this project is to evaluate the integrity of two Relational Database Management Systems (RDBMS), namely Oracle and MySQL. This evaluation will highlight the problems of each DBMS, indicating how the integrity of the data could be lost. The evaluation is beneficial because it can assist a Database Administrator (DBA) in deciding which DBMS fulfils their specific requirements.
The aim is not to prove which is superior, as each DBMS has its respective advantages, but to determine which is best employed depending on the needs and circumstances of the application.

1.2 Overview of Project

This project is divided into three sections: integrity constraints, transactions and concurrency control. The first of these sections, integrity constraints, investigates domain constraints, entity integrity, referential integrity and triggers. The second section tests transactions in a single-user environment. Finally, a multi-user environment is simulated so that concurrency control can be evaluated.

Integrity constraints can be placed on the data’s attributes to ensure that the data is stored in a consistent and accurate manner. The first type of integrity constraint is the domain constraint, for which different data types, check constraints and null constraints were tested. The second set of experiments covered the entity constraints; for these tests primary keys and unique keys were evaluated. The third set of experiments tested referential constraints, evaluating foreign keys and demonstrating their importance. Finally, triggers were tested in Oracle to illustrate how the integrity of the data might be violated if they are not implemented.

Transactions are used where it is essential that a group of operations be performed as a single task in which either all of them are executed or none are. Because MySQL does not support transactions, these experiments had to be designed in a different manner. The approach was to use situations where transactions are necessary to demonstrate how they ensure integrity. This made it possible to show that transactions are necessary to maintain valid data.

In a multi-user environment, where numerous users are accessing the database and several transactions need to be executed concurrently, there is a problem with upholding the isolation property.
The isolation property requires that data used by one transaction cannot be accessed by a second transaction until the first one is complete. To enforce the isolation property, locks can be implemented. This experiment was based around the fact that MySQL only implements table-level locking while Oracle also supports row-level locking.

1.3 Motivation

There are many DBMSs available, making the selection process difficult. It is therefore important to know when to use which DBMS and what problems might be associated with it. Oracle and MySQL were selected because they are both widely used, making it beneficial to find the problems of each. Another reason for selecting these two DBMSs is that they come from very different backgrounds: MySQL is from an open source background whereas Oracle is from a commercial environment. This added an interesting additional aspect to the project.

This project will be useful in assisting a Database Administrator (DBA) to determine the best DBMS to use by making them aware of the respective advantages and shortcomings of each. The project does this by highlighting the problems that occurred with each DBMS, so that the DBA is aware of the potential problems.

This project will also assist a DBA in selecting the DBMS that best fits their requirements. A section is included that outlines the merits and faults of both DBMSs and suggests a suitable environment for each.

1.4 Overview of Chapters

Chapter 2: This chapter gives the background information necessary for the project. It explains why integrity was chosen and introduces the DBMSs.
Chapter 3: Considerations and design issues are discussed in this chapter. Then the design of the experiment is stated step-by-step.
Chapter 4: Technical information about Perl and PL/SQL is given to help the reader understand the experiments better.
Chapter 5: This chapter discusses the integrity constraint experiments, giving their design, their results and the evaluation.
Chapter 6: Transactions are introduced in this chapter with examples of where transactions are necessary.
Chapter 7: This chapter investigates concurrency and the effect on integrity when concurrency control fails.
Chapter 8: This chapter is the conclusion. It summarises the results of all the experiments so that the strengths of each DBMS can be identified. It then suggests suitable environments for each. It also proposes possible future extensions, which include extensions to this project and other similar projects.

1.5 Summary of Chapter

This chapter gives the aim of the project and the motivation for it. It also has an overview of the contents of each of the following chapters. The next chapter gives the background for the project.

CHAPTER 2
Background

2.1 Possible Criteria for Evaluating a DBMS

DBMS selection can be a complex process, which depends on what is required and on an element of personal preference. The choice is usually based on one of, or a combination of, the following criteria [Hansen, 1996: 432]:

Application requirements: This is dependent on who the user is and what their information requirements are. For example, a database with a search engine as its front-end would need to support multiple users performing many select queries on it, and so it must be able to handle many queries simultaneously.
Maintaining data consistency: This includes integrity, security and recovery. An example is a web-based database, which has a high security risk and therefore must be secure. If the database needs to be available at all times then it is essential that the recovery time be fast.
Response-time requirements: The response time could be critical in certain cases, especially in web-oriented databases, whereas it is less important under different circumstances.

This project focused on the second of these criteria, evaluating the integrity of the DBMS. Response time is briefly discussed to add to the overview of MySQL.

2.2 Why Integrity is Used

When a user accesses a database they expect to receive accurate information. If the database is corrupt and the integrity is violated this will no longer be possible. Integrity control prevents this from happening by ensuring that the data is always valid. It achieves this by not allowing users to enter information that is incorrect according to the field definitions.

The importance of integrity is highlighted by the definition of a Database Management System (DBMS) [Online Dictionary, 2000]:

    A database management system (DBMS) can be an extremely complex set of software programs that controls the organisation, storage and retrieval of data (fields, records and files) in a database. It also controls the security and integrity of the database. The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

Although integrity is not the only concern when choosing a DBMS, it was chosen because it is one of the most important issues. Other choices could have been security, recoverability, performance, cost, ease of use and scalability.

2.3 Overview of Oracle

Oracle is a commercial, proprietary database manufactured by Oracle Corporation, which is widely used in industry, especially for E-business. The Oracle web page [Oracle Homepage, 2000] makes the following claim:

    Oracle Corporation is the world's largest supplier of database software. In fact, the very latest independent research shows that Oracle now has more than twice the market share of our closest competitor, IBM DB2.
    Oracle is even more popular on the UNIX and Windows NT computers that dominate the Internet. All 10 of the world's largest Web sites from Amazon.com to Yahoo use Oracle. Sixty-five percent of the Fortune 100 use Oracle for e-business. The Oracle database is the software that powers the Internet.

From this extract, and from a chart by Dataquest in May 1999, it can be seen that Oracle is very popular and holds a large percentage of the market share. The chart below is a replication of the Dataquest chart, which shows Oracle holding the majority of the market share [Oracle Homepage, 2000].

[Figure 2.1 Diagram showing Market Share held by Oracle. Chart title: "Database Market Share - Linux". Values shown: Oracle 61%, IBM 16%, Other 29%.]

Does this mean that it is always preferable to use Oracle? A lot of users and companies must think so! But why use MySQL then?

2.4 Overview of MySQL

MySQL is an open source database that, although not as widely implemented as Oracle, is still very popular. Open source means that anyone can use the database because it is available to download off the Internet. It also means that the source code is available and can be modified by the user. Being open source might influence its popularity, since the open source community is fairly large and supports open source products well. Furthermore, its popularity is probably increased by the fact that MySQL is free (if not used commercially).

A third reason why it might be widely used is its performance. The performance can be tested by the crash-me-test. These are benchmark tests that are installed with MySQL, and they test the performance of different operations performed on the database. The graph below shows a few of the results, which suggest that Oracle is slower than MySQL.
[Figure 2.2 Figure showing the results of the crash-me-test. Chart showing benchmark tests: time in seconds (axis from 0 to 2.5) for MySQL and Oracle, by test type (Alter_table_add, create_key+drop, delete_key, insert, wisc_benchmark).]

2.5 Summary of Chapter

This chapter helps the reader understand why this project was chosen. It starts by explaining the possible criteria for evaluating a DBMS. It then explains why integrity is important and therefore why it is used as the point of comparison for this project. The other choice justified in this chapter is the use of Oracle and MySQL. The next chapter discusses the considerations made prior to the evaluation and some implementation details.

CHAPTER 3
Considerations and Design

If the results of the experiments are to be valid, it is essential that the experiments be performed correctly. This chapter discusses the considerations that have been made to ensure this. It also looks at the important factors that influence the experiment. The final section outlines the steps taken in designing the experiment.

3.1 Implementation Considerations

There are six issues that are considered when designing the experiments. These issues are discussed in this section with a brief explanation of how each was handled.

3.1.1 External Influences

To improve the accuracy of the experiment, the external variables were kept as constant as possible. The experiment was standardised by using the same operating system, the same hardware, the same database design and the same data. Both database servers are relational databases that use the same formal language (SQL) as their data-definition and data-manipulation language.

So as not to introduce unnecessary external influences on the consistency of the database, a pre-designed database was used. The database used was the MySQL 'test' database with a few minor adjustments. This database consisted of 28 interrelated tables with data to be inserted into each table. The amount of data varies from 2998 records in the largest table to 7 records in the smallest. This database was then exported to Oracle so that the databases were identical.

3.1.2 Operating System

The operating system used was Red Hat Linux version 6.1. This choice was made because databases are widely implemented on Unix systems and Linux is one of the most common Unix-oriented systems at present. This is supported by the following statement [Chorafas, 1998:158]:

    by the mid-1990's more than 75 percent of the Oracle DBMS business was coming from Unix environment.

3.1.3 Software

Oracle and MySQL were chosen because both are extensively implemented in industry at present, as stated previously. The version of Oracle used was 8.1.5 and the version of MySQL was 3.22, which was the latest stable release.

3.1.4 Sufficient Tests

A substantial number of tests were performed to obtain accurate results. To achieve this, all the experiments were run several times. Also, to ensure that the results were not due to random chance, the experiments were executed on many different tables with different data. It was verified that the same result was obtained for each experiment.

3.1.5 Accurate Tests

It is important that each experiment tests the intended operation. Analysing the error messages verifies this. If an error message comes from some other erroneous operation, as opposed to the task that was supposed to be tested, then the experiment has to be altered. The main reasons for having to alter an experiment were the 'not null' constraints and the foreign key constraints.
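The check described in section 3.1.5 can be illustrated with a minimal sketch. It is not the project's Perl script: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the `person` table and the `violated_constraint` helper are hypothetical names introduced only for illustration. The idea is to read the error message and confirm that the constraint that rejected the statement is the one the test was meant to exercise.

```python
import sqlite3

# Stand-in database (assumption: SQLite instead of Oracle/MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " code TEXT NOT NULL,"
    " age INTEGER CHECK (age >= 0))"
)

def violated_constraint(sql, params=()):
    """Run one statement and return the kind of constraint that rejected it."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError as exc:
        # SQLite messages begin with the constraint kind,
        # e.g. "NOT NULL constraint failed: person.code".
        return str(exc).split(" constraint failed")[0]
    return None  # the statement was accepted

# Intended to test the CHECK constraint, but 'code' was left out, so the
# NOT NULL constraint is reported instead and the test has to be corrected.
flawed = violated_constraint("INSERT INTO person (age) VALUES (?)", (-1,))

# Corrected test: every other column is valid, so only CHECK can fail.
corrected = violated_constraint(
    "INSERT INTO person (code, age) VALUES (?, ?)", ("A01", -1)
)
print(flawed, "/", corrected)
```

Supplying valid values for every column except the one under test is what allows the error message to confirm that the experiment ran correctly.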
3.2 Features that MySQL Lacks which Influence the Experiment

It appears that MySQL achieves its speed at the expense of important features. This influences the results of this project, which is why it is explained here. The functionality missing from MySQL includes these features [MySQL Documentation, 2000]:

- Sub-selects
- Select into table
- Transactions
- Stored procedures, triggers and 'check' assertions
- Foreign keys
- Views

Only those features that influence integrity are discussed here.

3.2.1 Transactions

A transaction is a group of operations that must be performed on a database as a single logical unit. Transactions can be enforced using the SQL COMMIT and ROLLBACK commands, which ensure that unfinished updates or corrupted activities are not committed to the database. The problem with MySQL is that it does not implement either of these commands. This is because MySQL originally supported 'atomic operations' as opposed to transactions, since these offered better performance. More details about transactions are given in chapter 6.

3.2.2 Stored Procedure/Triggers

A stored procedure is a set of SQL commands that can be compiled and stored in the server. Once it is compiled, clients do not need to keep re-issuing the entire query but can refer to the stored procedure. This provides better performance because the query has to be parsed only once and less information needs to be sent between the server and the client. A trigger is a type of stored procedure: a statement that is executed automatically by the system when a certain modification takes place. The experiments related to triggers are discussed in chapter 5.

3.2.3 Foreign key

A foreign key is a set of attributes in a relation schema that forms a primary key for another schema. Although the syntax for foreign keys is included in MySQL, this is only for compatibility. In MySQL the referential integrity is not actually maintained. In chapter 5, assessments are performed on the database to highlight the effect of not having foreign key constraints.

3.3 Design of Experiment

Database integrity can be divided into three main sections: integrity constraints, concurrency and transactions. Experiments were written for each of these aspects and the results were then analysed. Finally a summary was made of the merits and faults of each DBMS. The steps followed are outlined below.

3.3.1 Stage 1: Install and Setup DBMS

The first step was to install the software and set up the database correctly. To do this, two different user accounts had to be created in Linux so that each DBMS could be installed separately. Accounts were then set up [1] in each DBMS. These were minimal-privilege accounts, so that the applications were not executed with administration-level rights. This minimizes security problems.

3.3.2 Stage 2: Modify Definitions and Create Database

As stated before, the MySQL 'test' database was used and had to be adapted to suit the experiments. Since it did not originally include any enumeration data types, foreign key constraints, unique constraints or check constraints, these had to be added. An ER-diagram was constructed so that suitable positions could be found for these constraints. This enabled the new schemas to be constructed. The new table schemas were then installed on both MySQL and Oracle. Finally the data was inserted using Perl scripts.

3.3.3 Stage 3: Design the Experiment

The experiments were then designed for the three sections of integrity constraints, transactions and concurrency. The first step in designing an experiment was to decide which operations should be tested, e.g. foreign key constraints. The details of the tests then had to be decided, such as trying to alter an attribute that is referenced by a foreign key.
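A test of the kind mentioned in Stage 3 might look like the following sketch. It is only an illustration under stated assumptions: Python's sqlite3 module stands in for the DBMS under test, the `airline` and `flight` tables are hypothetical, and `attempt` is a helper invented here. Each statement tries to break referential integrity, and the outcome is recorded in the same way as the project's scripts recorded error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE airline (code TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE flight ("
    " flight_no INTEGER PRIMARY KEY,"
    " airline_code TEXT REFERENCES airline(code))"
)
conn.execute("INSERT INTO airline VALUES ('AC', 'AIR CANADA')")
conn.execute("INSERT INTO flight VALUES (1, 'AC')")

def attempt(sql):
    """Execute one statement and record whether the DBMS permitted it."""
    try:
        conn.execute(sql)
        return "permitted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = {
    # Alter an attribute that is referenced by a foreign key.
    "update referenced key": attempt(
        "UPDATE airline SET code = 'XX' WHERE code = 'AC'"),
    # Remove a row that another table still refers to.
    "delete referenced row": attempt(
        "DELETE FROM airline WHERE code = 'AC'"),
    # Insert a row that refers to a non-existent airline.
    "insert dangling reference": attempt(
        "INSERT INTO flight VALUES (2, 'ZZ')"),
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

A DBMS that maintains referential integrity rejects all three statements; one that only parses the foreign key syntax would report them as permitted.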
[1] Instructions for creating accounts are in the appendix.

3.3.4 Stage 4: Implement Tests

Once the experiments were designed, they had to be executed on many tables to get accurate results. These experiments were implemented using Perl scripts and SQL commands. This was quite a complex process because the existing data and table definitions had to be considered to obtain the correct values to be used. When the values had been decided on, the queries had to be adapted to run on both DBMS. These queries were then run on each DBMS, and the results and error messages were sent to a file. The error messages were analysed and checked for irregularities. If an unexpected error message was found, it had to be investigated, because an error message is the only method of confirming that the experiment ran correctly. The experiment then had to be corrected to take this into account.

3.3.5 Stage 5: Verifying Actions

The actual changes made to the database were found by querying the database from the DBMS command line. These were recorded for analysis.

3.3.6 Stage 6: Analyse Results

The results were analysed, which led to the conclusion, where the two DBMS were evaluated and a suggestion was made for where each could be used.

3.4 Summary of Chapter

This chapter covers the considerations made prior to the experiment and the design of the project. It describes the considerations needed to ensure accurate results and how these potential problems were handled in this project. The features that MySQL does not support and that influence integrity are listed. Finally the design of the experiment is given in a step-by-step manner. The next chapter gives the details of the integrity constraint tests and an evaluation of the results found.
CHAPTER 4
Prerequisite for Implementation

This chapter introduces the software tools before the experiments that use them are described. Perl scripts were used for the integrity constraint tests, PL/SQL was used for the transaction tests, and both Perl and PL/SQL were used for the concurrency tests.

4.1 Overview of Perl

Perl scripts are used with the Database Interface (DBI) module because it has built-in functions for connecting to a database and querying it. The definition of the DBI module is [Perl Homepage, 2000]:

    The DBI is a database interface module for Perl. It defines a set of methods, variables and conventions that provide a consistent database interface independent of the actual database being used.

The advantage of using scripts is that it is easy to run multiple queries consecutively or simultaneously, simulating a multi-user environment, which would be difficult to observe otherwise. The scripts can then be ported from MySQL to Oracle so that the tests are identical. Another advantage of using Perl scripts is that it is easy to recreate a table and insert the original data. The benchmark scripts supplied with MySQL were adapted to create the tables and to connect to the database.

4.1.1 DBI Modules and their functions

The functions that these modules support are listed below, with a brief description of their actions.

    use DBI;

This statement loads the interface that is used by the other functions. Subsequently, multiple Oracle or MySQL database servers can be connected to, and numerous queries can be sent to any of them, via a simple object-oriented interface. Two types of objects are available: database handles and statement handles.
The initial database handle is obtained with the statement:

    # $dsn names the driver and database to connect to
    $dbh = DBI->connect($dsn, undef, undef);

The connect method returns a database handle, as shown above. A connection to the database has now been made, so the database can be queried in many different ways. If results need to be retrieved, a statement handle can be created and executed as follows:

    $sth = $dbh->prepare("SELECT * FROM foo WHERE blah='123'");
    $sth->execute();    # run the prepared query before fetching rows

An important statement handle method retrieves a row of data:

    my $ref = $sth->fetchrow_hashref();

To print a header followed by all the rows returned, the following must be executed:

    my $names = $sth->{'NAME'};
    my $numFields = $sth->{'NUM_OF_FIELDS'};
    # header: column names separated by commas
    for (my $i = 0; $i < $numFields; $i++) {
        printf("%s%s", $i ? "," : "", $$names[$i]);
    }
    print "\n";
    # one line per row, values separated by commas
    while (my $ref = $sth->fetchrow_arrayref) {
        for (my $i = 0; $i < $numFields; $i++) {
            printf("%s%s", $i ? "," : "", $$ref[$i]);
        }
        print "\n";
    }

This was used to retrieve and display the results when checking the foreign key constraints. After this was done, the operations were completed with the following statement:

    $sth->finish();

Once connected to a database, SQL statements that return no rows can be executed with the following statements:

    $dbh->do("DROP TABLE foo");
    $dbh->do("CREATE TABLE foo (id INTEGER, name VARCHAR(20))");
    $dbh->do("INSERT INTO foo VALUES (1, " . $dbh->quote("Tim") . ")");

Finally, to disconnect from the database, the following statement is executed:

    $dbh->disconnect();

4.2 Overview of PL/SQL

PL/SQL is Oracle's extension to SQL, which gives users more functionality. The Oracle Technology web page [OTN, 2000] describes PL/SQL as follows:

    With PL/SQL, you can use SQL statements to manipulate Oracle data and flow-of-control statements to process the data. Moreover, you can declare constants and variables, define procedures and functions, and trap runtime errors. Thus, PL/SQL combines the data manipulating power of SQL with the data processing power of procedural languages.

PL/SQL has a block structure, which starts with a BEGIN statement and ends with an END statement. Inside a block there can be one or more transactions, and a transaction can extend over more than one block. For the experiments in this project each block encapsulates one experiment and so may have a single transaction or many transactions within it.

4.2.1 Benefits of PL/SQL

With SQL only one statement can be executed at a time. PL/SQL enables the user to execute several statements without passing control back to the user after each [Jones, 1997:181]. This means that PL/SQL enables the use of procedures, which add functionality to the DBMS. An example of this extra functionality is triggers. Other complex applications can also be written which are not possible without PL/SQL. These include stored procedures and PL/SQL functions [Rob et al, 2000:150].

4.3 Summary of Chapter

Some technical information is given prior to the discussion of the experiments to enhance the reader's understanding. This chapter gives a brief introduction to Perl and PL/SQL. The next chapter discusses the integrity constraint experiments and their results.

CHAPTER 5
Integrity Constraints

An integrity constraint is a restriction applied to a set of data, used to minimise data inconsistencies and errors when entering or updating data within the database [Hansen et al, 1996: 368]. These semantic restrictions and integrity constraints attempt to ensure that the data within the database is always in a consistent state. An integrity constraint should be enforced automatically by the DBMS but unfortunately very few do so.
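What automatic enforcement by the DBMS means in practice can be shown with a small sketch. It assumes Python's sqlite3 module as a stand-in DBMS and a hypothetical `employee` table; neither comes from the project. The constraints are declared once in the table definition, and every invalid row is then rejected by the database itself without any checking code in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_code TEXT PRIMARY KEY NOT NULL,"                 # entity integrity
    " email TEXT UNIQUE,"                                  # no duplicates
    " age INTEGER NOT NULL CHECK (age BETWEEN 16 AND 100))"  # domain
)
conn.execute("INSERT INTO employee VALUES ('E01', 'a@example.com', 30)")

bad_rows = [
    ("E01", "b@example.com", 40),    # duplicate primary key
    ("E02", "a@example.com", 40),    # duplicate unique key
    ("E03", "c@example.com", None),  # missing mandatory value
    ("E04", "d@example.com", 7),     # value outside the allowed domain
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))  # keep the message for later analysis

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(len(rejected), "rows rejected;", row_count, "row stored")
```

Only the first, valid row remains in the table, so the data stays in a consistent state regardless of which application issued the statements.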
5.1 Different Types of Integrity

The main aspects of integrity considered for this project are:

- Domain Constraints
- Entity Constraints
- Referential Integrity
- Triggers

5.1.1 Domain Constraints

Domain constraints are the most elementary of the integrity constraints, and every new item entered into the database must obey them. A domain constraint defines a domain, which is the set of possible values that an attribute can have. The tests for these constraints include inserting or replacing data with a domain type different from that specified, updating a value so that it violates the domain constraint, and altering the data type.

A non-standard data type is the enumerated data type, where a list of the valid entries is given. MySQL supports this data type but Oracle does not. To solve this problem it was implemented on Oracle by using 'check constraints' to select the values in the domain.

5.1.2 Entity Integrity

Entity constraints consist of primary keys, which are the chosen minimal super keys, and unique keys. A primary key is a set of one or more attributes that, taken collectively, make it possible to uniquely identify an entity in the entity set. Unique keys, on the other hand, are used to disallow duplicate entries. They differ in that primary keys must be not null, whereas unique keys may be null unless specified as not null.

5.1.3 Referential Integrity

Referential integrity is used to ensure that a value that appears in one relation for a given set of attributes also appears for the same set of attributes in another relation [Silberschatz et al, 1997:195]. Foreign keys are used to enforce referential integrity. A foreign key is a set of attributes in a relation schema that forms a primary key for another schema. Database modifications can cause violations of the foreign key restriction. An example of this is where an entry in the referencing table does not exist in the referenced table. This is called a dangling tuple and is usually undesirable. To reduce the chances of dangling tuples, action must be taken when updating or deleting data in the table. This is also investigated in the experiments.

5.1.4 Triggers

Triggers are not essential but they are useful to alert the user to a problem or to perform a specified action automatically. The definition of a trigger is [Rob et al, 2000:150]:

    A trigger is procedural SQL code that is automatically invoked by the RDBMS upon the occurrence of a data manipulation event.

It is already known that MySQL does not support stored procedures, which means it does not support triggers either. Therefore the experiment had to be designed differently, to show why triggers are useful and how integrity can be violated if they are not implemented.

5.2 Advantages of Integrity Constraints

Integrity of the data is essential. Integrity constraints have many advantages over alternative methods, such as enforcing integrity in the applications. This section briefly describes several advantages of using integrity constraints [OTN, 1999]. The main advantages are declarative ease, centralized rules, maximum application development productivity, immediate user feedback, superior performance and flexibility.

5.2.1 Declarative ease

It is easy to define and alter tables with integrity constraints because the SQL statements are simple and the DBMS enforces them. Without integrity constraints, extra program code has to be written.

5.2.2 Centralized Rules

The rules are enforced no matter which database application accesses the data, because integrity constraints are associated with tables and are therefore stored in the data dictionary. This means that the rules are centralized.

5.2.3 Maximum Application Development Productivity

If a business rule is modified, only the constraint has to be changed, and the effect cascades throughout the database.
5.2.4 Immediate User Feedback

Violations can be checked immediately, as the integrity constraints are stored in the data dictionary.

5.2.5 Superior Performance

The query optimiser can use these rules to understand more about the data. It then utilizes this to improve the overall performance.

5.2.6 Flexibility

This applies both to data loads and to the identification of integrity violations. It is possible to disable the integrity constraints while loading large amounts of data and then re-enable them afterwards to ensure that new data abides by the rules. Note that this was not done in this project, as the original data had to conform to the integrity constraints as well.

5.3 Domain Constraint Experiments

The experiments for the domain constraints included testing the data type constraints, check constraints and null/not null constraints. Oracle and MySQL support different data types. Those used for the experiments are listed in the table below.

[Table 5.1 Table showing the data types supported by MySQL and Oracle. Data types compared: char(size), varchar(size), tinyint, smallint, mediumint, int, integer, bigint, number, float, double, real, enum. The marks showing which DBMS supports each type are not reproduced here.]

MySQL supports more data types than Oracle does, as can be seen above. This does not mean that Oracle is missing important data types. MySQL just has more variation for certain data types, e.g. many different int data types. This is advantageous since it gives the user more flexibility.

There are other data types that were not examined, including image data types (BLOB, CLOB, tinyblob, longblob, blob), various date data types, and binary data types such as raw and bfile. These data types were not examined for several reasons.
The reasons included the following: the original database did not have data of these types, these types of data take a lot of space to store, and there are many issues to consider when looking at the integrity of these types of data, so this would be a whole project in itself. It is recommended that these data types be investigated in future studies (see section 8.6).

5.3.1 Data Type Tests and Results

The data type experiments are divided into five sections: string, integer, float, enumeration and check constraint. For each of these experiments the error messages as well as the actions were evaluated. The error messages are given because they verify that the correct operations were tested. Many tests were performed, but the following results show only the most significant findings in order to highlight the main problems.

5.3.1.1 String/Character Tests and Results

The significant experiments for the character data types (the attribute is specified to be a char) were performed as follows:

- The INSERT command: The values tested were a number with and without a decimal, and a string with more characters than specified.
- The UPDATE command: The tests included changing the values to a char, a number with and without a decimal, and too many characters.
- The ALTER command: The data types were modified to an integer and a float.

Below is the list of MySQL's and Oracle's error messages:

1. ErrM1: DBD::Oracle::db do failed: ORA-01401: inserted value too large for column (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
2. ErrM2: DBD::Oracle::db do failed: ORA-01439: column to be modified must be empty to change datatype (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.
3. ErrM3: DBD::Oracle::db do failed: ORA-00907: missing right parenthesis (DBD ERROR: OCIStmtExecute) at ./alter_table line 51
4. ErrM4: You have and error with query 4: Duplicate entry 'COACH ECONOMY CLASS DISCOUNTED' for key 3

Below is a table displaying the error messages and the actions resulting from these experiments.

Value used for Tests | Error Message by MySQL | Action by MySQL | Error Message by Oracle | Action by Oracle
Insert Int | None | Inserted Int | None | Inserted Int
Insert Float | None | Inserted Float | None | Inserted Float
Insert more Chars | None | Inserted up to specified length | ErrM1 | Not Inserted
Update to Int | None | Updated to Int | None | Updated to Int
Update to Float | None | Updated Float | None | Updated Float
Update to more char | None | Updated up to specified length | ErrM1 | Not Updated
Alter to Int | None | Set all to 0 | ErrM2 | Not Altered
Alter to Float | ErrM4 | Not Altered | ErrM3 | Not Altered

Table 5.2 Table showing the results of the string data type experiments

These results are analysed in terms of error messages and actions.

5.3.1.1.1 Analysis of Error Messages (String Tests)

It is evident from the table that there are only a few significant differences between the results for MySQL and Oracle. The results show that MySQL permitted all actions except altering the data type of the column to a float. The reason for this error message was that altering the data type produced a duplicate value. It was not because of a restriction on changing the data type. Oracle also returned an error message for this action, in its case because it does not allow a data type to be altered when there is data in that column.

Oracle also gave an error message when too many characters were inserted. This meant that Oracle prevented a string with more characters than allowed from being placed into the table, for either an insert or an update. The example below shows where this restriction is beneficial and how integrity can be violated without it:

    An attribute specified to be an employee code is three characters long. The user accidentally mistypes the entry and appends an additional character to the code entry. If this insertion is permitted, the incorrect code will be inserted in the database. If there is no warning the user will not be aware of the error. There will thus be an incorrect value in the table without the user knowing.

Figure 5.1 An Example of Where MySQL Violates Integrity

If a situation similar to this example occurs, the integrity of the database has been violated. MySQL would allow this situation to occur and so could lose the integrity of its data. Oracle handles the situation differently: integrity would not be lost because Oracle does not allow values with additional characters to be inserted into the database.

5.3.1.1.2 Analysis of Actions (String Tests)

There were two important differences between the actions of MySQL and Oracle, as mentioned in the previous section. The first occurred when the inserted value had more characters or digits than the column's defined length. In this case MySQL inserted the value up to the length permitted, truncating any additional input. This could be a problem because any value with more than the specified number of characters is not likely to be valid. Oracle did not have this problem because it prevented the action whenever the value was longer than specified. An example of this is given above in figure 5.1.

The second difference occurred when there was an attempt to alter the data type of a column, which only MySQL performed. The problem with allowing this action is that data is lost: once the column's data type has been altered, the data in that column can no longer remain the same. MySQL handles this by setting all the values to the default for the new data type, which means that the original data is lost. Oracle handles the situation differently.
It does not permit any data type to be altered while there is data in the table. The downside is that this is not very flexible. A DBA might not mind losing the data, in which case Oracle would be too restrictive.

5.3.1.2 Integer Tests and Results

The experiments for attributes specified as integers were performed using these commands:

- The INSERT command: The values were an int, a string, a number with decimal places, and a number larger than the maximum for the data type.
- The UPDATE command: The values were a number with decimal places, and a string.
- The ALTER command: A column was altered to char and float.

Tests were run on several different integer types, which varied slightly for each DBMS because MySQL supports more integer data types. Below are the error messages for MySQL and Oracle:

1. ErrM1: DBD::Oracle::db do failed: ORA-01722: invalid number (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
2. ErrM2: DBD::Oracle::db do failed: ORA-00001: unique constraint (TONIA.SYS_C0011410) violated (DBD ERROR: OCIStmtExecute) at ./update_integrity_test line 53.
3. ErrM3: DBD::Oracle::db do failed: ORA-01439: column to be modified must be empty to change datatype (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.
4. ErrM4: DBD::Oracle::db do failed: ORA-00907: missing right parenthesis (DBD ERROR: OCIS
5. ErrM5: You have and error with query 1: Duplicate entry 'AIR CANADA' for key 2

Below is the table of results. In the table 'd.p.' stands for decimal place.

Tests run | Message by MySQL | Action by MySQL | Message by Oracle | Action by Oracle
Insert Float | None | Rounded off | None | Rounded off
Insert large no. | None | Inserted Value Differently | ErrM1 | Not Inserted
Insert String | None | Inserted 0 | ErrM1 | Not Inserted
Insert char | None | Inserted 0 | ErrM1 | Not Inserted
Update to String | ErrM5 | None | ErrM1 | Not Inserted
Update to Float | ErrM5 | None | ErrM2 | Not Inserted
Alter to Char | None | Not Changed | ErrM3 | Not Inserted
Alter to Float | None | Added d.p. | ErrM4 | Not Inserted

Table 5.3 Table showing the results of the integer data type experiments

5.3.1.2.1 Analysis of Error Messages (Integer Tests)

The table shows many error messages for Oracle and a couple for MySQL. This indicates that there are noteworthy differences between their restrictions. One of these differences occurred when inserting strings into the integer column. For this action Oracle issues an error message while MySQL does not restrict the action. An example of where this would not be desirable is as follows:

    Assume an integer attribute is designated to be used for a person's age. If a user enters the age in words instead of digits then the stored value is a string instead of a number. When the DBA subsequently attempts to find all the people older than a certain age, even if this person is over that age it will not be returned in the query results. Invalid data exists in the database.

Figure 5.2 An example where a string is entered in place of a number

Because MySQL allowed this without warning the user, there might be invalid data in the table if a similar situation occurs. Oracle, on the other hand, will not allow the value to be inserted in the first place, so this situation will not occur.

The other significant difference was when a number larger than the maximum value was inserted into the database. A situation where this would be undesirable is shown below in figure 5.3.

5.3.1.2.2 Analysis of Actions (Integer Tests)

There was a problem when a number larger than the data type's maximum was inserted.
This tests the results when a number larger than the data type’s defined capacity was inserted. With MySQL, the wrong value was inserted and no error message was given. MySQL inserted the value 2147483647 instead of 80000000000. Oracle did not permit this action at all. An example of where this situation could be detrimental is given below: A scientist is inserting data for experiments they have performed. Some of the experiments have very large values for their results. One of these values exceeds the maximum value that the numerical data type can store. When this value is inserted, if no error message is given and an invalid number is inserted then the integrity has been corrupted. Figure 5.3 Example where data exceeds the maximum size of the number -25- Tonia Stakemire Chapter 5- Integrity Constraints As stated above, MySQL inserted an incorrect value. It appears that the reason for this is that when MySQL reaches the maximum value, it overflows. This means that it loops around and starts from the initial value again. Instead of warning the user that the data value is too big, it inserts an incorrect value. Because there was no warning, the user would be unaware of an error. This means that calculations using this value will give incorrect results. In both DBMS the float values are rounded off. That means that if the decimal part was greater than .5 it was rounded up to the next whole number else it remained at the same whole number. Therefore the reason for error messages when updating was that there were duplicate values because the float was rounded off (or because the string was inserted as a 0). Again it was not possible to alter the table with data already in it. This is explained earlier in section 5.3.1.1. 5.3.1.3 Float Tests and Results The experiments were performed on float data types using the following commands: INSERT command: The value was a number without decimal places, a string, and a number with more decimal places than specified. 
The UPDATE command: This was used to change the values to a number without decimal places, a number with too many decimal places, and a string.
The ALTER command: Changed the column data type to int and char.

There were no error messages for MySQL; those for Oracle were:

1. ErrM1: DBD::Oracle::db do failed: ORA-01722: invalid number (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
2. ErrM2: DBD::Oracle::db do failed: ORA-01400: cannot insert NULL into ("TONIA"."AIRLINE"."AIRLINE_NAME") (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
3. ErrM3: DBD::Oracle::db do failed: ORA-01439: column to be modified must be empty to change datatype (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.

The next table shows the results of the tests performed on attributes specified as floats.

Tests run                        Message by MySQL   Action by MySQL          Message by Oracle   Action by Oracle
Insert Integer                   None               Added d.p.               None                Inserted Int
Insert String                    None               Inserted 0.00            ErrM1               Not Inserted
Insert more d.p.                 None               Rounded Off              None                Rounded up
Insert char                      None               Inserted 0.00            ErrM1               Not Inserted
Update String                    None               Updated to 0.0           ErrM1               Not Inserted
Update Int                       None               Updated Int              None                Inserted Int
Update to more decimal places    None               Rounded Off              None                Rounded Off
Alter to Int                     None               Changed Values to Int    ErrM3               Not Inserted
Alter to Char                    None               Not Changed              ErrM3               Not Inserted

Table 5.4 Table showing the results of the float data type experiments

5.3.1.3.1 Analysis of Error Messages (Float Tests)

The findings for this experiment were similar to those of the previous experiments. There were no error messages for MySQL, whereas Oracle issued them in two situations: when a string was inserted or used as the update value, and when the data type was altered. It was also found that both DBMSs permitted a value with more decimal places than specified to be inserted.
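The difference between silently substituting a default and rejecting a non-numeric value can be illustrated outside either DBMS. The following is a minimal sketch in plain Python, not the conversion code of MySQL or Oracle; the function names and the sample weights are hypothetical, and the two-decimal rounding is an assumption about the column definition.

```python
def coerce_lenient(value):
    """Lenient policy (as reported for MySQL): bad input is stored as 0.0."""
    try:
        return round(float(value), 2)
    except (TypeError, ValueError):
        return 0.0


def coerce_strict(value):
    """Strict policy (as reported for Oracle): bad input is rejected."""
    try:
        return round(float(value), 2)
    except (TypeError, ValueError):
        raise ValueError(f"invalid number: {value!r}") from None


# Hypothetical input: the third entry is a name typed into a numeric column.
weights = ["70.5", "82", "Alice"]

# The lenient policy accepts every entry, so the bad one skews the average.
lenient = [coerce_lenient(w) for w in weights]
lenient_average = sum(lenient) / len(lenient)
```

Under the lenient policy the bad entry is counted as 0.0 and pulls the average down without any warning, whereas `coerce_strict("Alice")` raises an error before anything is stored.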
5.3.1.3.2 Analysis of Actions (Float Tests)

The action that caused the most significant problems was inserting or updating to a string, because MySQL simply inserted a default value whereas Oracle gave an error message. An example of how this could cause a problem is given below:

A user enters the values in the incorrect columns and so inserts their name where they should insert their weight. If this is allowed without an error message then there is incorrect data in the table. The dietician then wants to know the average weight of the people to determine whether they are eating correctly. The incorrect entries will be included in this average and so the result will not be accurate.

Figure 5.4 An example situation where integrity is violated

MySQL might suffer from a problem similar to the situation stated above, because it inserted the default value (see footnote 2) when the data was a string. Oracle did not have the same problem, as it did not permit the action.

For both DBMSs the values were rounded off if they had too many decimal places, e.g. 10.999 became 11.00. This is similar to what happened for the integer values.

Finally, when attempting to alter the table, Oracle restricted the action while MySQL did not. When the column was altered to int in MySQL, all the values were rounded to integers. Only minimal data was lost here, which is not too big a problem, so MySQL is more flexible than Oracle in this case. When the data type was changed to char, however, the situation was slightly different because none of the data was changed, as it was no longer in the domain.

5.3.1.4 Enumeration Data Type Tests and Results

The enum test was not a strictly accurate comparison because only MySQL supports enumeration data types, whereas Oracle has to use 'check constraints'. The comparison still worked, because an enumeration data type restricts the data entered to the values in a list and a 'check constraint' can enforce the same restriction.
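The way a check constraint can stand in for an enumeration type can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module rather than Oracle, and the table layout and the permitted values 'YES' and 'NO' are assumptions, not the schema used in the experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause restricts the column to a fixed list, like an enum would.
conn.execute(
    "CREATE TABLE compound_class ("
    " fare_class TEXT PRIMARY KEY,"
    " discounted TEXT NOT NULL CHECK (discounted IN ('YES', 'NO')))"
)


def try_insert(fare_class, discounted):
    """Return 'inserted' or a 'rejected: ...' message for one row."""
    try:
        conn.execute(
            "INSERT INTO compound_class VALUES (?, ?)", (fare_class, discounted)
        )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A listed value, a value not in the list, and a listed value in another case.
results = {
    value: try_insert(f"T{i}", value)
    for i, value in enumerate(["NO", "MAYBE", "yes"])
}
```

In this sketch the value in a different case is rejected as well, because the comparison in the CHECK clause is case sensitive by default in SQLite.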
Footnote 2: The default value for float is 0.00, correct to the number of decimal places.

The experiments were performed using the following commands:

INSERT command: The values were an empty string, null, omitted, a value not in the enum type (with a default set and not set), a value spelt wrong and a value in a different case (to check if the DBMS is case sensitive).
The UPDATE command: Was used to change the entries to a value not in the enum list and to null.
The ALTER command: A column that was not previously an enum was changed to an enum data type, both where all the values in the table matched the enum list and where they did not.

The next table shows the results of the tests performed on attributes of the data type enum. Examples of the error messages were as follows:

1. ErrM1: DBD::Oracle::db do failed: ORA-02290: check constraint (TONIA.SYS_C0011257) violated (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
2. ErrM2: DBD::Oracle::db do failed: ORA-01400: cannot insert NULL into ("TONIA"."COMPOUND_CLASS"."DISCOUNTED") (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
3. ErrM3: DBD::Oracle::db do failed: ORA-01439: column to be modified must be empty to change datatype (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.

In the table the symbol represents a blank.
Tests run              Message by MySQL   Action by MySQL     Message by Oracle   Action by Oracle
Insert “ “             None               Insert “ “          ErrM2               Not Inserted
Omit Insert (df)       None               Insert Default      None                Insert Default
Omit Insert (no df)    None               Insert 1st Entry    None                Insert
Insert Not in Enum     None               Insert              ErrM1               Not Inserted
Spelt Wrong            None               Insert              ErrM1               Not Inserted
Different case         None               Inserts fine        ErrM1               Not Inserted
Update not in enum     None               Update to           ErrM1               Not Updated
Alter Enum (in)        None               No Change           None                Altered
Alter Enum (not in)    None               Inserted “ “        ErrM3               Not Altered

Table 5.5 Table showing the results of the enum data type experiments

5.3.1.4.1 Analysis of Error Messages (Enumeration Tests)

'Check constraints' were stricter than the enumeration types. This is shown by the fact that MySQL did not issue any error messages while Oracle did. One of the instances where Oracle issued an error message and MySQL did not was when an empty string was inserted, which supports the idea that 'check constraints' are stricter. When a value was not in the list, MySQL once again did not issue an error message. This is unsuitable because MySQL did not notify the user that the value was not in the enumeration data type. An example of a situation where this could cause a problem is given below:

A user is entering the towns that people live in. In a rush they misspell some of the towns when entering the data. When they try to calculate the population for each town the results will not be accurate, because some people will not be registered in any town since the town name was misspelt.

Figure 5.5 Example where the value is not in the list

For MySQL the integrity would have been broken in this situation; for Oracle it would not have been, because Oracle restricted the action. Another case where only Oracle gives an error message is when changing a data type into an enumeration data type.
If all the values were in the list then the change executed correctly for both DBMSs. However, if the list did not include all the data already inserted in the table then only Oracle issued an error message. This means that MySQL violated the integrity of the database in this instance.

5.3.1.4.2 Analysis of Actions (Enumeration Tests)

As stated before, Oracle enforced these constraints better than MySQL. An example of this is where MySQL allowed an empty string to be inserted whereas Oracle did not. In this case MySQL actually inserted a blank, which is not in the list and therefore should not be allowed. A similar situation occurred when the value was omitted. For this test, however, Oracle inserted a blank but MySQL did not. MySQL handled this situation more appropriately by inserting the default value instead, which was the more desirable action because it did not violate the integrity.

For MySQL it was found that when a value was not in the list, the default value (see footnote 3) was inserted. A special case was where the entry was the same as a listed value but in a different case: Oracle viewed the value as invalid while MySQL did not. This was because Oracle is case sensitive while MySQL is not. Both approaches have distinct advantages and disadvantages: MySQL is more flexible than Oracle, whereas Oracle can enforce rules more strictly. Below is an example of when it is better not to have case sensitivity:

The column accepts only the values 'yes' or 'no'. Whether the user types these in lower case, upper case or mixed case should not make a difference. If the DBMS were case sensitive, all the possible variations would have to be included in the enumeration list.

Figure 5.6 Example where it's better not to have case sensitivity

Footnote 3: If a default is not specified then the first value in the list was used.

For MySQL this would not be a problem since all variations will have the same meaning.
For Oracle, however, either the definition will have to list every possible option or the user will be given an error message and will have to retype the value in the correct case. If there is an error message the user might not know why, since they might not consider the case sensitivity. Although this is not an integrity issue as such, MySQL is often more suitable here due to its convenience.

5.3.2 Check Constraint Tests and Results

It was found that check constraints could only be used on a single table and on a single column. The experiments were performed using the following commands:

< or > tests: These tests included inserting a value larger or smaller than allowed. Values were also updated to values that were < or > the limit.
Both entries may not be 0: The values were both 0, null and 0, omitted and 0, or both omitted.
Where one value must be greater than the other: The value was bigger or the values were the same.

The next table shows the results of the tests performed on attributes with 'check constraints'. There was only one error message, as follows:

1. ErrM1: DBD::Oracle::db do failed: ORA-02290: check constraint (TONIA.SYS_C0011257) violated (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
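The 'both entries may not be 0' test listed above can be sketched with a check constraint that compares two columns of the same row. In SQL a check constraint is violated only when its condition evaluates to false, so a condition that evaluates to unknown because one operand is NULL is accepted. The example below is a hedged illustration using Python's sqlite3 module rather than Oracle or MySQL; the table name, the column names and the use of a table-level constraint are assumptions, not the schema used in the experiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table-level CHECK: at least one of the two fares must be non-zero.
conn.execute(
    "CREATE TABLE fare ("
    " one_way REAL,"
    " round_trip REAL,"
    " CHECK (one_way <> 0 OR round_trip <> 0))"
)


def accepted(one_way, round_trip):
    """Return True if the row passes the check constraint."""
    try:
        conn.execute("INSERT INTO fare VALUES (?, ?)", (one_way, round_trip))
        return True
    except sqlite3.IntegrityError:
        return False


both_zero = accepted(0, 0)           # condition is false: rejected
zero_and_null = accepted(0, None)    # condition is unknown: accepted
both_null = accepted(None, None)     # condition is unknown: accepted
one_non_zero = accepted(100, 0)      # condition is true: accepted
```

If a missing value should also be rejected, the usual remedy is to combine the check with a NOT NULL constraint on the columns concerned.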
Represents a blank

Tests run           Message by MySQL   Action by MySQL       Message by Oracle   Action by Oracle
Insert > or <       None               Inserted Value        ErrM1               Not Inserted
Insert Both 0       None               2 * Inserted 0.0      ErrM1               Not Inserted
Insert 0 & NULL     None               Inserted NULL & 0     None                Insert 0 & “ “
Insert 0 & Omit     None               2 * Inserted 0.0      None                Insert 0 & “ “
Insert Omit Both    None               2 * Inserted 0.0      None                Inserted
If one > other      None               Inserted Value        ErrM1               Not Inserted
Insert = value      None               Inserted Value        ErrM1               Not Inserted
Update >            None               Inserted Value        ErrM1               Not Updated
Update <            None               Inserted Value        ErrM1               Not Updated

Table 5.6 Table showing the results of the check constraint experiments

5.3.2.1 Analysis of Error Messages (Check Constraints)

The most significant finding was that MySQL did not support 'check constraints' at all. It accepts the syntax but does not enforce the rules, as shown by the fact that there were no error messages for MySQL. In contrast, Oracle enforced 'check constraints' effectively and would not allow a value that violated the constraint to be entered. The only situation that might be a problem for Oracle occurs where it is specified that at least one of two values is not 0.0. In this situation Oracle allowed one of the values to be null or left blank. This is not ideal, as the DBA may have wanted at least one of the entries to have a value.

5.3.2.2 Analysis of Actions (Check Constraints)

The difference between the actions was that MySQL did not support check constraints at all whereas Oracle did. Examples are given to demonstrate why it is important that 'check constraints' are implemented:

There are many cases where constraints are needed. The start time of an experiment cannot be later than the end time. When entering the day of the week it should not be possible to enter anything greater than 7 as this is the upper limit.
When a passenger buys a ticket they have to book either one way or both ways; it should not be possible to leave both blank.

Figure 5.7 A few examples where check constraints are necessary

Another difference was found when one of the values was omitted or null. Oracle left the values blank while MySQL inserted 0.00s, which actually violated the constraint that both values cannot be 0.00.

5.3.3 Not NULL Tests and Results

The documentation for both MySQL and Oracle states that in SQL a null represents a missing, unknown, or inapplicable column value, and that it equates neither to zero nor to a blank. Null is therefore used when a value is not known or an attribute does not have a value. The experiments were performed on the not NULL constraint using the following commands:

INSERT command: The values were an empty string, null and omitted.
The UPDATE command: Was used to change the values to an empty string and null.
The ALTER command: Changed a column type from 'not null' to 'null' and also altered a column to 'not null' which was not previously so.

The table below shows the results of the tests performed on attributes with the 'not null' constraint. The error messages were as shown below:

1. ErrM1: DBD::Oracle::db do failed: ORA-01400: cannot insert NULL into ("TONIA"."AIRPORT"."AIRPORT_NAME") (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
2. ErrM2: You have and error with query 3: ORA-01407: cannot update ("TONIA"."CITY"."CITY_NAME") to NULL (DBD ERROR: OCIStmtExecute)
3. ErrM3: DBD::Oracle::db do failed: ORA-01449: column contains NULL values; cannot alter to NOT NULL (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.
4.
ErrM4: You have and error with query 5: Column 'rank' cannot be null

Tests run         Message by MySQL   Action by MySQL   Message by Oracle   Action by Oracle
Insert NULL       ErrM4              Not Inserted      ErrM1               Not Inserted
Insert “ “        None               Inserted          ErrM1               Not Inserted
Omit Insert       None               Inserted          ErrM1               Not Inserted
Update to NULL    None               Updated           ErrM2               Not Updated
Update to “ “     None               Updated 0.0       ErrM4               Not Updated
Alter to NULL     None               Nothing           ErrM3               Not Altered
Add not NULL      None               Nothing           None or ErrM5       (Not) Altered

Table 5.7 Table showing the results of the not null constraint experiments

5.3.3.1 Analysis of Error Messages

The only restriction that MySQL enforced was that null was not inserted. Oracle enforced this and additionally did not allow the value to be omitted or to be an empty string. Which of these restrictions is correct depends on how strictly the DBA wants to implement this constraint. Depending on the definition of null and what is desired, the MySQL behaviour could be considered to violate the integrity constraint.

5.3.3.2 Analysis of Actions (Null Tests)

In Oracle, when 'not null' is specified a valid value must be inserted in that field. MySQL, on the other hand, only enforces that the value is not null. Sometimes a user will not want anything but a valid entry to be entered in that field, in which case Oracle is more suitable. Otherwise, if the only values that must be kept out are null values, then MySQL is better.

Oracle does not differentiate between a null value and an empty string: an empty string is still viewed as a null value by the system, it is just not displayed as such. From the definition given earlier, it is hard to judge whether it is important that a blank and a null value should be treated as different values.

5.3.4 Overall Evaluation of the Domain Experiments

It was found that Oracle enforced integrity better than MySQL. This is because MySQL violated the integrity constraint on several occasions.
One of these is where there is a restriction on the number of characters, yet MySQL permits a user to insert more. A serious integrity violation was found with the integer data type. Every integer data type has a maximum value that it can hold, and when a number larger than this maximum was used MySQL inserted an invalid number instead of the original value. MySQL implements a data type called enumeration, yet it does not restrict the user in the manner it should. This deceives DBAs, since they assume it is enforcing rules that it is not. It was known beforehand that MySQL did not support 'check constraints', hence these experiments simply confirmed that.

MySQL was found to be more flexible in certain circumstances. An example of this is the fact that MySQL uses the default when an incorrect insertion is accepted. This included the use of the first value in the list for enum when the value was omitted. It is also not case sensitive, which is often more suitable. Another situation is where it allowed the value to be omitted or set to “ “ when the column is specified to be 'not null'.

To get an overall idea of the results a chart of the error messages was constructed. The heights of the two bars indicate how many times an error message was given by Oracle and MySQL. The difference between the bars shows the number of times that MySQL violated the integrity. Below is the graph showing an overview of this experiment:

[Bar chart: Comparison of restrictions of MySQL and Oracle. Vertical axis: Number Of Error Messages (0 to 8). Horizontal axis: Data Types (String, Integer, Float, Enumeration, Check Constraint, Null). Series: Oracle, MySQL.]

Figure 5.8 Graph showing a comparison of the number of error messages

5.4 Entity Integrity

The entity integrity experiments investigated how well the primary key constraints and the unique key constraints protected the integrity of the data.
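The two constraints examined in this section can be sketched as follows: a composite primary key rejects a row only when the whole key is duplicated, and a unique constraint rejects a repeated value in a non-key column. This is an illustration using Python's sqlite3 module, not a reproduction of the MySQL or Oracle experiments; the airline table layout and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_class ("
    " flight_code INTEGER NOT NULL,"
    " fare_class TEXT NOT NULL,"
    " PRIMARY KEY (flight_code, fare_class))"
)
conn.execute(
    "CREATE TABLE airline ("
    " airline_code TEXT PRIMARY KEY,"
    " airline_name TEXT UNIQUE)"
)


def try_sql(sql, params):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


INSERT_CLASS = "INSERT INTO flight_class VALUES (?, ?)"
first = try_sql(INSERT_CLASS, (101, "Y"))
part_duplicate = try_sql(INSERT_CLASS, (101, "F"))   # only flight_code repeats
full_duplicate = try_sql(INSERT_CLASS, (101, "Y"))   # whole key repeats

INSERT_AIRLINE = "INSERT INTO airline VALUES (?, ?)"
unique_first = try_sql(INSERT_AIRLINE, ("AC", "AIR CANADA"))
unique_duplicate = try_sql(INSERT_AIRLINE, ("XX", "AIR CANADA"))
```

Only the row that repeats the whole composite key and the row that repeats the airline name are rejected; repeating part of a composite key is allowed.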
5.4.1 Primary Key Tests and Results

To test the primary key, tests were performed as follows:

INSERT command: Tests were run on tables with single and with composite primary keys. The values inserted in the place of the primary key included a null, an empty string, an omitted value and a duplicate value.
The UPDATE command: The tests were to update a single key to an empty string or a duplicate value, and to update a table with a composite key to an empty string and a duplicate value.
The ALTER command: The tests included dropping the primary key for a table and adding new primary keys to a table.

The table below shows the error messages and actions for the primary key.

SK: Represents the test being performed on a table with a single key
CK: Represents the test being performed on a table with a composite key

The error messages found were as follows:

1. ErrM1: You have and error with query 1: Column 'aircraft_code' cannot be null
2. ErrM2: You have and error with query 1: Duplicate entry '' for key 1
3. ErrM3: DBD::Oracle::db do failed: ORA-01400: cannot insert NULL into ("TONIA"."COMPOUND_CLASS"."FARE_CLASS") (DBD ERROR: OCIStmtExecute) at ./primary_key_test line 56.
4. ErrM4: DBD::Oracle::db do failed: ORA-00001: unique constraint (TONIA.SYS_C006733) violated (DBD ERROR: OCIStmtExecute) at ./primary_key_test line 56.
5. ErrM5: DBD::Oracle::db do failed: ORA-01449: column contains NULL values; cannot alter to NOT NULL (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.
Test performed           MySQL Error Message   MySQL Action                 Oracle Error Message   Oracle Action
SK - Insert NULL         ErrM1                 Not Inserted                 ErrM3                  Not Inserted
SK - Insert “ “          ErrM1                 Insert                       ErrM3                  Not Inserted
SK - Omitted             None or ErrM2         1: Insert 2: Not Inserted    ErrM3                  Not Inserted
SK - Insert Duplicate    ErrM2                 Not Inserted                 ErrM4                  Not Inserted
CK - Part “ “            None                  Inserted                     ErrM3                  Not Inserted
CK - Part omitted        None                  Insert Default               ErrM3                  Not Inserted
CK - Part duplicate      None                  Inserted Fine                None                   Inserted Fine
CK - All “ “             None                  Inserted Fine                ErrM3                  Not Inserted
CK - All omitted         None                  Inserted Fine                ErrM3                  Not Inserted
CK - All duplicate       ErrM2                 Not Inserted                 ErrM4                  Not Inserted
SK - Update to “ “       None                  Inserted Fine                ErrM3                  Not Updated
SK - Update to Dup       ErrM2                 Not Updated                  ErrM4                  Not Updated
CK - Update to “ “       None                  Updated Fine                 ErrM3                  Not Updated
CK - Update to Dup       ErrM2                 Not Updated                  ErrM4                  Not Updated
Drop Primary Key         None                  Dropped                      None                   Dropped PK
Add Primary Key          None or ErrM2         Added/Not Added              ErrM5                  Not added

Table 5.8 Table showing the results of the primary key experiments

5.4.1.1 Analysis of Error Message (Primary Key Tests)

Neither DBMS violated the most important restriction for primary keys, which is that a primary key cannot have the same value as another entry. This can be seen from the fact that an error message was given when inserting or updating to a duplicate value. When only part of the key was duplicated an error message was not given, which is correct. This means that neither DBMS violated the integrity of the data.

However, the table shows that Oracle enforced the primary key constraint more strictly than MySQL did. An example of this is when the value for the primary key was left out for an insert or update. Oracle did not allow this, while MySQL did as long as it did not result in a duplicate entry. The other case where Oracle was stricter was when inserting an empty string.
Similar results were found for the composite key. Oracle was stricter than MySQL because it did not allow an empty string or an omitted value for part of, or the entire, primary key. MySQL, on the other hand, did allow this. This was not essentially a violation of the integrity because the primary keys were still unique and it was possible to differentiate between them.

5.4.1.2 Analysis of Action (Primary Key Tests)

Both MySQL and Oracle strictly enforced that no duplicate entries were ever in the table. MySQL allowed an empty string and an omitted value to be inserted, and the action was to insert a blank. This did not violate the integrity of the database and the primary key constraint was upheld. MySQL also allowed a primary key to be added when there was data in the table, which Oracle did not permit.

5.4.2 Unique Key Tests and Results

The unique key tests were as follows: insert or update a value to null, “ “ (empty string), omitted and a duplicate value. The alter tests were adding a unique key constraint when data already existed in the table and dropping a unique key constraint.

Sometimes it is desirable for a value to appear in the table only once. Such a column is a candidate key, and this is where the 'unique constraint' is utilized. The next experiments tested the 'unique constraint', and the error messages found were as follows:

1. ErrM1: You have and error with query 1: Duplicate entry 'AIR CANADA' for key 2
2. ErrM2: You have and error with query 4: Column 'city_name' cannot be null
3. ErrM3: DBD::Oracle::db do failed: ORA-00001: unique constraint (TONIA.AIRLINE_INDEX) violated (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
4. ErrM4: DBD::Oracle::db do failed: ORA-01400: cannot insert NULL into ("TONIA"."TRANSPORT"."TRANSPORT_DESC") (DBD ERROR: OCIStmtExecute) at ./insert_integrity_test line 54.
5.
ErrM5: DBD::Oracle::db do failed: ORA-02442: Cannot drop nonexistent unique key (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.

Test Performed                  MySQL Error Message   MySQL Action      Oracle Error Message   Oracle Action
Insert NULL                     ErrM2                 Not Inserted      ErrM4                  Not Inserted
Insert “ “                      None                  Value Inserted    ErrM4                  Not Inserted
Omit in Insert                  None                  Value Inserted    ErrM4                  Not Inserted
Insert Duplicate                ErrM1                 Not Inserted      ErrM3                  Not Inserted
Duplicate but different case    None                  Inserted Fine     None                   Inserted Fine
Update NULL                     ErrM2                 Not Updated       ErrM4                  Not Updated
Update to “ “                   None                  Inserted          ErrM4                  Not Updated
Omit in Update                  None                  Inserted          ErrM4                  Not Updated
Update to duplicate             ErrM1                 Not Updated       ErrM3                  Not Updated
Drop unique key                 None                  Dropped key       ErrM5 or none          (not) Altered

Table 5.9 Table showing the results of the unique key constraint experiments

When there is a unique constraint, neither Oracle nor MySQL will insert a duplicate entry.

5.4.2.1 Analysis of Error Message (Unique Key Tests)

The most important question in this test was whether the DBMS would allow a duplicate value. Neither permitted an update or insert of a duplicate value, which shows that neither violated the integrity. However, only MySQL gave an error message when a duplicate entry in a different case was inserted. This was discussed previously for the enumeration tests: MySQL was not case sensitive whereas Oracle was. There was also an error message for both DBMSs when a unique constraint was added and there were already duplicate entries in those columns. As found for the primary key experiments, Oracle had extra constraints. These occurred when an empty string or no value was inserted.

5.4.2.2 Analysis of Action (Unique Key Tests)

There were no significant findings here since neither DBMS violated the constraint. Oracle enforced an extra restriction that was not always necessary.
This was consistent with the findings that Oracle is stricter while MySQL is more flexible.

5.4.3 Overall Evaluation of Entity Integrity

The graph below gives a general overview of the results:

[Bar chart: Graph Showing the Entity Constraint Results. Vertical axis: Number of Error Messages (0 to 10). Horizontal axis: Constraints Tested (Single key, composite key, unique key). Series: Oracle, MySQL.]

Figure 5.9 Chart showing the results of the Entity integrity experiments

This graph was constructed in the same manner as the one for the domain constraints. As can be seen from the chart, there were only slight differences for the single primary keys. The results for the composite key and the unique key, on the other hand, differ because Oracle was more restrictive than MySQL. Even so, for the composite primary keys and the unique key constraints neither DBMS violated the integrity of the database. The variation was caused by MySQL allowing empty strings and omitted values for part of the primary key and for unique keys.

5.5 Referential Integrity

Referential integrity constraints are more complex than domain constraints but just as important. Foreign keys have extra rules to specify what should be done on updating and deleting. The different actions that can be performed when a foreign key constraint has been broken include:

Rule                         Action
Restrict                     This does not allow the action at all when trying to update or delete.
Set to NULL                  All associated dependent data is set to NULL when updated or deleted.
Set to default               All associated dependent data is set to the default value when updated or deleted.
Cascade                      When updated, all associated dependent data is updated accordingly. When deleted, all associated dependent data is deleted accordingly.
No action / not specified    Similar to restrict except that it checks at the end of the statement, or at the end of the transaction if the constraint is deferred.
Table 5.10 Table showing the rules for updating/deleting

MySQL supports the syntax for all of these, but it did not actually enforce foreign key constraints. Oracle, on the other hand, enforced only 'no action' on update, and 'no action', 'cascade' and 'set to NULL' on delete. These were the only rules tested. The default for Oracle was 'no action'; otherwise one of the other operations had to be specified.

Below is a diagram that demonstrates how foreign keys work:

[Diagram: the tables City, Airport_service and Airport. Airport_service is the child (referencing) table and is linked through City_code and Airport_code to the parent (referenced) tables City and Airport.]

Figure 5.9 Diagram showing how foreign keys link tables

5.5.1 Foreign Key Tests and Results

The foreign key tests were a little more complicated and had to include a script to ensure that, where there was a foreign key constraint, every entry in the child table did actually exist in the parent table. This was run before and after the tests to confirm that the constraint was not violated. An example is the table with the foreign key constraint shown below:

    @flight_class=$server->create(
        "flight_class",
        ["flight_code integer(8)",
         "fare_class char(3) NOT NULL",
         "PRIMARY KEY (flight_code, fare_class)",
         "FOREIGN KEY (flight_code) REFERENCES flight"]);

Figure 5.10 Code showing the schema of a child entity

The easiest way to test this on Oracle was using sub queries as follows:

    # The subquery is correlated with the outer row, so only flight codes
    # that have no match in the parent table flight are returned.
    $query[0]= "select flight_class.flight_code
                from flight_class
                where not exists(
                    select flight.flight_code from flight
                    where flight.flight_code = flight_class.flight_code)";

Figure 5.11 Code to check no violation of the foreign key constraint

Scripts were also written to reveal the relationship between the tables. These were also run prior and subsequent to the tests to establish any changes that occurred.
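The before-and-after check described above can be sketched as a small runnable example. It is a hedged illustration using Python's sqlite3 module, which does not enforce foreign keys unless asked to, so a row without a parent can be inserted and then found by the query. The sample flight codes are invented, and the query is a correlated NOT EXISTS, which is an assumption about how such a check is normally written rather than a copy of the thesis script.

```python
import sqlite3

# Foreign keys are declared but not enforced by default in SQLite,
# so the orphan row (999, 'Y') below is accepted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight (flight_code INTEGER PRIMARY KEY);
CREATE TABLE flight_class (
    flight_code INTEGER,
    fare_class TEXT NOT NULL,
    PRIMARY KEY (flight_code, fare_class),
    FOREIGN KEY (flight_code) REFERENCES flight
);
INSERT INTO flight VALUES (101), (102);
INSERT INTO flight_class VALUES (101, 'Y'), (102, 'F'), (999, 'Y');
""")

# A child row is dangling when no parent row carries the same flight_code.
DANGLING_QUERY = """
SELECT fc.flight_code
FROM flight_class AS fc
WHERE NOT EXISTS (
    SELECT 1 FROM flight AS f WHERE f.flight_code = fc.flight_code
)
"""
dangling = [row[0] for row in conn.execute(DANGLING_QUERY)]
```

An empty result means every child row has a parent. Without the condition that ties the inner query to the outer row, the check would return nothing whenever the parent table holds any row at all.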
An example of the code for performing a delete test was:

    #cascade
    $query[0]= "delete from fare where restrict_code='AP/68'";

Figure 5.12 Delete code to test cascade rule

The test to show whether any differences were found was as follows:

    $query[0]= "select distinct flight_fare.fare_code
                from flight_fare, fare
                where flight_fare.fare_code=fare.fare_code
                and fare.restrict_code='AP/68'";

Figure 5.13 Code to reveal changes after deleting

For each of the rules stated above the following tests were performed:

On the table with the child attribute:
    Insert a value that does not exist in the parent table
    Update a value to a value that does not exist in the parent table
On the table with the parent attribute:
    Delete where rule is set NULL
    Delete where rule is cascade
    Delete where rule is not specified
    Update where rule is not specified
    The column was dropped

The results for the foreign key tests are shown in the table below. The symbols used in the table are as follows:

ChildT: The test is performed on the child table, that is, the referencing table that depends on another table's attribute. If not specified then the test is performed on the parent table.

Rules specified for updating and deleting:
NA: No action specified for update or delete
Cascade: Cascade on delete
SN: Set Null on delete

The error messages were as follows:

1. ErrM1: DBD::Oracle::db do failed: ORA-02291: integrity constraint (TONIA.SYS_C0012416) violated - parent key not found (DBD ERROR: OCIStmtExecute)
2. ErrM2: DBD::Oracle::db do failed: ORA-02292: integrity constraint (TONIA.SYS_C0012416) violated - child record found (DBD ERROR: OCIStmtExecute) at ./foreign_key_test line 52.
3. ErrM3: DBD::Oracle::db do failed: ORA-02273: this unique/primary key is referenced by some foreign keys (DBD ERROR: OCIStmtExecute) at ./alter_table_test line 51.
Test performed     MySQL Error Message   MySQL Action      Oracle Error Message   Oracle Action
Insert ChildT      None                  Value Inserted    ErrM1                  Nothing Inserted
Update ChildT      None                  Value updated     ErrM1                  Nothing Updated
Update NA          None                  Value Updated     ErrM2                  Nothing Updated
Delete NA          None                  Value Deleted     ErrM2                  Nothing Updated
Delete Cascade     None                  Value Deleted     None                   Cascaded deleted
Delete SN          None                  Value Deleted     None or ErrM2          Nothing/set Null
Drop Referenced    None                  Dropped           ErrM3                  Nothing Altered

Table 5.11 Table showing the results of the foreign key experiments

Although it was known beforehand that foreign key constraints are not upheld in MySQL, the focus of this experiment was to demonstrate how important they are. As seen from the results, there were dangling tuples. A dangling tuple is an entry that exists in the child table but not in the parent table. This led to problems because the data might not be accessible when executing queries, and consequently incorrect results are returned.

5.5.1.1 Analysis of Error Message (Foreign Key Tests)

The lack of error messages for MySQL clearly shows that MySQL did not enforce foreign key constraints. Oracle restricted everything except a delete where the rule was set to cascade and, in some cases, a delete where the rule was set NULL. All other actions were not permitted.

5.5.1.2 Analysis of Action (Foreign Key Tests)

In this section the incorrect actions are outlined, and in the next section specific results and their problems are given. The resulting action is more important for these experiments than the error messages, because if the appropriate action is not taken the database is left in an inconsistent state and integrity is violated. The integrity is violated because dangling tuples are created. For each of the tables, code was run before and afterwards to check that there were no dangling tuples.
A dangling tuple exists if a column references another table's column and some of its entries do not exist in the parent table. The code in figure 5.11 for Oracle or figure 5.12 for MySQL checked that dangling tuples did not exist by returning an empty set.

When a value that did not exist in the parent table was inserted into the child column (the column that references another column), MySQL inserted the value without an error message while Oracle did not permit the action. Allowing this violates the integrity, so the MySQL database was no longer consistent.

For MySQL the check also returned violations after each of the delete and update operations, although it had returned none beforehand. Oracle had no violations. The values that were returned were dangling tuples. This meant that MySQL allowed queries to be executed that left the database in an inconsistent state, violating the integrity.

Finally, to check whether Oracle had performed the operation correctly where it was permitted, the code in figure 5.13 was executed. The original results and the results after executing the code were compared so that the changes could be seen. Oracle altered the table correctly, setting the appropriate values to null or deleting the linked values; MySQL did not.

5.5.2 Example of Violation

Flight_class references flight through the column flight_code, and therefore an insert into flight_class was used for one of the insert examples. All the possible values for flight_code in the flight table were found. Then a new value, which was not in the list of values returned before, was inserted into the flight_class table.

Figure 5.14 Example of violation of foreign key when inserting

Oracle did not allow this insert while MySQL did. Therefore there was a value in the flight_class table that violated the integrity, as it was a dangling tuple.
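The insert test can be reproduced in miniature. The sketch below is an assumption-laden stand-in rather than the original experiment: it uses Python's sqlite3 module, where foreign key enforcement can be switched on (rejecting the insert, as Oracle did) or off (accepting it, as MySQL did). The column class_type, the sample values and the helper try_insert are invented for illustration.

```python
import sqlite3


def try_insert(enforce):
    """Insert a flight_class row whose flight_code has no parent in flight."""
    conn = sqlite3.connect(":memory:")
    # The pragma must be set outside a transaction, so it is issued first.
    conn.execute("PRAGMA foreign_keys = ON" if enforce else "PRAGMA foreign_keys = OFF")
    conn.executescript(
        """
        CREATE TABLE flight (flight_code TEXT PRIMARY KEY);
        CREATE TABLE flight_class (
            flight_code TEXT REFERENCES flight(flight_code),
            class_type  TEXT
        );
        INSERT INTO flight VALUES ('BA100');
        """
    )
    try:
        conn.execute("INSERT INTO flight_class VALUES ('ZZ999', 'Y')")
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    finally:
        conn.close()


print(try_insert(enforce=False))  # constraint declared but not enforced
print(try_insert(enforce=True))   # constraint enforced
```

With enforcement off, the row is stored and becomes a dangling tuple; with enforcement on, the DBMS raises an integrity error and nothing is inserted.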
Tests were performed in which the action was specified as cascade, as set NULL, or not stated. MySQL accepts the syntax for all of these; however, the result was the same in each case because the syntax does not perform any action. The steps taken in performing this test were as follows:
- A parent table, city, was selected. This table is referenced by airport_service as shown in figure 5.9.
- An entry was then deleted from the table. This should cause an action in the child table, according to the specified rule.
- The results of this action were then tested.

Figure 5.15 Example where foreign key constraint violated when deleting

If the rule was not specified, Oracle allowed the delete only when the entry was not present in the child entity. When the rule was 'set null', Oracle permitted the action as long as the corresponding column allowed null values. Oracle set the appropriate values to null as specified, while MySQL left the values unchanged. If the rule was cascade then the delete was always permitted. This resulted in the value being deleted in all the child tables for Oracle but not for MySQL.

When updating, a similar method to the delete experiments was used:
- A parent table, airport, was selected. This table is referenced by airport_service as shown in figure 5.9.
- An entry was then updated in the table. This should cause a reaction in the child table according to the specified rule, if the constraint is upheld.

Figure 5.16 Example where foreign key constraint violated when deleting

Oracle did not permit any of the actions whereas MySQL allowed all of them. This meant that MySQL violated the integrity because there were dangling tuples.

5.6 Triggers

A trigger is a statement that is executed automatically by the DBMS when certain modifications are made to the database. Triggers are a type of stored procedure that is parsed once and invoked every time a user performs a certain action.
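To make the definition concrete, the following minimal sketch shows a row-level AFTER UPDATE trigger. It is written for SQLite through Python's sqlite3 module, not for Oracle's PL/SQL, so the syntax differs from what Oracle would accept. The trigger name sync_airport_country, the reduced table definitions and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state   (state_code TEXT PRIMARY KEY, country_name TEXT);
    CREATE TABLE airport (airport_code TEXT PRIMARY KEY,
                          state_code TEXT, country_name TEXT);

    -- Fires once for every state row whose country_name is updated.
    CREATE TRIGGER sync_airport_country
    AFTER UPDATE OF country_name ON state
    FOR EACH ROW
    BEGIN
        UPDATE airport
        SET country_name = NEW.country_name
        WHERE state_code = NEW.state_code;
    END;

    INSERT INTO state   VALUES ('EC', 'SA');
    INSERT INTO airport VALUES ('PLZ', 'EC', 'SA');
    """
)

# The user updates only the state table; the trigger updates airport implicitly.
conn.execute("UPDATE state SET country_name = 'ZA' WHERE state_code = 'EC'")
airport_country = conn.execute(
    "SELECT country_name FROM airport WHERE airport_code = 'PLZ'"
).fetchone()[0]
print(airport_country)
```

The user never touches the airport table directly, which is the sense in which a trigger makes a critical action happen implicitly.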
It is already known that MySQL does not support stored procedures. This includes triggers because they are a type of stored procedure. MySQL does not plan to support triggers in the future. This is shown by the claim in its to-do list [MySQL Homepage, 2000]:

    Stored procedures. This is currently not regarded to be very important as stored procedures are not very standardized yet. Another problem is that true stored procedures make it much harder for the optimizer and in many cases the result is slower than before.

As stated here, stored procedures slow down the processing of the data. Another problem is that they can cause unexpected results if they fire off other triggers, causing a cascading effect. Nevertheless they are valuable, as they add extra functionality so that certain critical actions happen implicitly. However, the SQL-92 standard does not include triggers, so they can be thought of as an advanced feature.

Oracle supports triggers, which are written in PL/SQL, Java or C. The types of triggers include:
- Row triggers and statement triggers
- AFTER and BEFORE triggers
- INSTEAD OF triggers
- Triggers on system events and user events

Triggers are a non-standard feature that does not necessarily need to be implemented. They add extra functionality and allow the DBA to restrict more actions or to update the database automatically. This feature was not investigated in any detail because it is not a standard feature (the trigger code can be viewed in the related code for this project).

5.7 Summary of Chapter

Oracle enforces all of the integrity constraints. It was found, though, that Oracle does not support the enumeration data type, so this had to be implemented using 'check constraints'. MySQL, on the other hand, does not support 'check constraints', foreign keys or triggers. MySQL violated the integrity on several occasions. The next chapter discusses transactions.
CHAPTER 6 Transactions

If a collection of several operations on a database must be performed as a single unit then these are considered to be a transaction. This might be a single SQL statement or several statements that must be executed consecutively, and the operations may involve I/O activities, CPU activities or both. In this chapter a single-user environment is simulated, whereas the next chapter investigates a multi-user environment. A transaction begins with the first executable statement and ends with a commit statement or the termination of the transaction.

On the MySQL web page [MySQL Homepage, 2000] it is stated that the problem of transactions can be worked around as follows:

    MySQL, in almost all cases, allows you to solve for potential problems by including simple checks before updates and by running simple scripts that check the databases for inconsistencies and automatically repair or warn if such occurs. Note that just by using the MySQL log or even adding one extra log, one can normally fix tables perfectly with no data integrity loss.

This is not investigated in this project but could be explored in future projects. The tests here simply demonstrate how important transactions are.

There are different views about MySQL not supporting transactions. MySQL states:

    Moreover, fatal transactional updates can be rewritten to be atomic. In fact, we will go so far as to say that all integrity problems that transactions solve can be done with LOCK TABLES or atomic updates, ensuring that you never will get an automatic abort from the database, which is a common problem with transactional databases.

But on the Openacs page they argue:

    Furthermore, the MySQL manual claims that MySQL will soon implement "atomic operations" through the use of table locks, but without rollback. This is a blatant misuse of the term "atomic," which implies that either none or all operations will complete. A hardware or power failure in the middle of a set of statements will break the atomicity of the block if there is no rollback capability.

6.1 Overview of Transactions

6.1.1 Properties of Transactions

To ensure integrity, the database must not violate the following properties:
- Atomicity: Either all of a transaction is executed or none of it.
- Consistency: A condition that holds in the database before a transaction must still hold after it, so that the database moves from one consistent state to another.
- Isolation: Each transaction is unaware that other transactions are executing concurrently. Concurrent transactions are treated as though they were executed serially, which matters especially in a multi-user environment.
- Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

This is supported by Openacs [Openacs, 2000] as stated:

    An enterprise-level system will never compromise certain features for speed. The ACID properties of an RDBMS are an absolute necessity for any critical data. Critical web sites that run on non-ACID-compliant systems are asking for trouble.

For these experiments a single-user environment is used, which already ensures that isolation and durability are not violated; therefore only atomicity and consistency are analysed here.

6.1.2 SQL Transaction Statements

One SQL transaction statement is COMMIT, or COMMIT WORK, which performs the same operation. By default on Oracle, the autocommit option is turned off, as shown by the command show autocommit;. This means that nothing is committed until a commit statement is reached. When a commit statement is reached, the changes are made permanent and the transaction is ended. Comments can be added to the commit statement so that the user will understand what is going on.

Another important SQL statement related to transactions is ROLLBACK, or ROLLBACK WORK. If a rollback statement is reached, all the alterations are aborted and the database will be in the same state as it was before the transaction.
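The effect of the two statements can be sketched with Python's sqlite3 module, used here only as a convenient stand-in for an Oracle session. The reduced state table and the sample values are assumptions; conn.rollback() and conn.commit() play the roles of ROLLBACK and COMMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (state_code TEXT PRIMARY KEY, country_name TEXT)")
conn.execute("INSERT INTO state VALUES ('EC', 'SA')")
conn.commit()  # make the starting row permanent


def current_country(connection):
    """Read the single country_name stored in the state table."""
    return connection.execute("SELECT country_name FROM state").fetchone()[0]


# First attempt: the update is rolled back, so the old value is restored.
conn.execute("UPDATE state SET country_name = 'ZA' WHERE state_code = 'EC'")
conn.rollback()
after_rollback = current_country(conn)

# Second attempt: the update is committed, so the change becomes permanent.
conn.execute("UPDATE state SET country_name = 'ZA' WHERE state_code = 'EC'")
conn.commit()
after_commit = current_country(conn)

print(after_rollback, after_commit)
```

As in the Oracle default described above, the update is not committed automatically in this sketch; it stays pending until one of the two statements ends the transaction.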
If a failure causes an abnormal abort then this should be equivalent to a rollback statement, and none of the changes should be permitted. The optional WORK keyword can be used with ROLLBACK, although it does not add any extra functionality. A specific savepoint can also be specified so that only part of the transaction rolls back.

The command SAVEPOINT can be used to subdivide a transaction by marking certain points within it. When there is an error, the transaction can then be rolled back to a marked point instead of being rolled back as a whole.

The command SET TRANSACTION can also be used, but it is not implemented in these experiments. It is used when the transaction consists of read-only operations.

Oracle supports statement-level rollbacks: if there is a problem after a single statement, that statement can be rolled back.

6.1.3 Operations and their Relevance

There are two operations, namely:
- Read(X), where data is transferred from the database to a buffer
- Write(X), where data is transferred from the buffer to the database

The read operation can be thought of as a select statement and is not used for these experiments. The write operation can be thought of as any update or modification to the data and is therefore tested here, since updates are used. Write operations also include inserts and deletes.

6.1.4 Transaction States

There are five basic transaction states, which are related as shown in the diagram below:

[Figure 6.1 Figure showing the states of a transaction. States shown: Active, Partially Committed, Committed, Failed, Aborted]

Initially the transaction is in the active state and it stays in this state while executing. It is partially committed after the final statement and is only properly committed after successful completion. The case that causes inconsistencies is the failed state, which leads to the aborted state.
The transaction goes into the failed state when it can no longer execute as planned, and the transaction must then be rolled back to the state prior to the transaction.

6.1.5 Problems with Transactions

The problem with transactions is that either all of a transaction must be performed or none of it, which means that inconsistencies may be introduced when there is a failure during a transaction.

Deadlock is another problem. It occurs when two transactions access the same schema objects and each is waiting for the other to release a resource. This could happen when two transactions each try to update a row that the other has already locked. The code below shows an example where deadlock occurred:

    SQL> create procedure ua(name in char) as
      state char(3);
    begin
      update airport
        set country_name=name;
      select state_code into state
        from city
        where name=country_name;
      update airport
        set state_code=state
        where country_name=name;
      commit;
    end;
    /

    Procedure created.

Figure 6.2 An example of a transaction that leads to deadlock

The procedure was created successfully, but when it was run it hung and neither of the updates was executed. Oracle breaks a deadlock by signalling an error to the last participating transaction.

6.1.6 Importance of Transactions

It is essential that when data in a database is changed, it is changed in a consistent manner, and transactions help ensure this. This is, however, dependent on the transaction being designed correctly and logically. As stated by Openacs [Openacs, 2000]:

    Rollback is not just a convenient feature, it is a critical basis for solid data storage.

This highlights the importance of transactions.

6.2 Design of Experiment

This experiment was designed differently from the previous experiments. This is because MySQL does not support transactions.
6.2.1 Differences in the Design

The design of the transaction experiments differs from that previously used for integrity constraints. The main differences are:
- The experiments were only implemented on Oracle.
- Perl scripts were not used; PL/SQL blocks and procedures were used instead.
- The aim was not to compare but to identify the application of transactions.
- Specific examples were investigated as opposed to general cases.

MySQL does not support transactions at present, and therefore it is not possible to evaluate its implementation of them. Instead, the use of transactions on Oracle is investigated to outline why transactions are so important. The best way to illustrate their relevance was to demonstrate them with example situations. As stated above, these experiments were not done using Perl because they did not need to be ported, and it was therefore more efficient to implement them directly on Oracle.

6.2.2 Methods of Simulating a Crash

It is important to establish that Oracle enforced transactions properly. The only way to do this was to simulate a crash and then check the results. Three methods were used to simulate a crash.

The first was a method that Oracle itself provides. Here a specific comment was inserted into the procedure, which forced the transaction to abort. This function is present in Oracle for this exact purpose.

The second method did not directly simulate a crash but still assisted in the testing of Oracle's transactions. With this approach an incorrect statement was placed inside the procedure, causing it to fail and abort. With both of these methods it was known when the transaction would abort, from the position of the command or the incorrect statement.

Finally, the terminal was closed during the execution of the procedure to simulate a random crash. This was the best method since it was random, which was more realistic.

Oracle enforced atomicity and consistency in all cases.
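The second method, a failure at a known position inside the transaction, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PL/SQL used in the project: it uses Python's sqlite3 module, a raised exception stands in for the incorrect statement, and the helper names, the crash_after parameter, the reduced tables and the sample rows are all invented.

```python
import sqlite3

TABLES = ("state", "airport", "city")


def make_db():
    """Create three small tables that each hold a country_name for state 'EC'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE state   (state_code TEXT PRIMARY KEY, country_name TEXT);
        CREATE TABLE airport (airport_code TEXT PRIMARY KEY,
                              state_code TEXT, country_name TEXT);
        CREATE TABLE city    (city_code TEXT PRIMARY KEY,
                              state_code TEXT, country_name TEXT);
        INSERT INTO state   VALUES ('EC', 'SA');
        INSERT INTO airport VALUES ('PLZ', 'EC', 'SA');
        INSERT INTO city    VALUES ('PELZ', 'EC', 'SA');
        """
    )
    return conn


def update_country(conn, state_code, new_name, crash_after=None):
    """Update all three tables as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for done, table in enumerate(TABLES, start=1):
                conn.execute(
                    f"UPDATE {table} SET country_name = ? WHERE state_code = ?",
                    (new_name, state_code),
                )
                if crash_after == done:
                    raise RuntimeError("simulated crash")
        return True
    except RuntimeError:
        return False


def country_names(conn):
    """Return the country_name stored in each of the three tables."""
    return [
        conn.execute(f"SELECT country_name FROM {table}").fetchone()[0]
        for table in TABLES
    ]


conn = make_db()
committed_first = update_country(conn, "EC", "ZA", crash_after=2)
names_after_crash = country_names(conn)
committed_second = update_country(conn, "EC", "ZA")
names_after_commit = country_names(conn)
print(names_after_crash, names_after_commit)
```

Because the failure position is fixed by crash_after, the expected state after the rollback is known in advance, which is what makes this kind of test easy to check.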
6.2.3 Atomicity Test

This experiment tested the property that ensures that either all of the statements in a transaction are executed or none of them. For this example the following tables were used:

    @state= $server->create("state",
        ["state_code char(2) primary key",
         "state_name char(25) NOT NULL",
         "country_name char(25) default 'SA'"]);

Figure 6.3 Code of the schema of the state table

This is the table that was initially updated by the user when they wanted to alter the state_code or country_name. After this UPDATE statement was executed, the state_code or country_name needed to be updated in the airport and city tables. The tables that also needed to be updated are shown below.

    @airport= $server->create("airport",
        ["airport_code char(3) NOT NULL primary key",
         "airport_name char(40) NOT NULL",
         "location char(36) NOT NULL",
         "state_code char(2)",
         "country_name char(25) default 'SA' ",
         "time_zone_code char(3) default 'EST'"]);

Figure 6.4 Code for the schema for the airport table

This table has the attributes state_code and country_name, which must be updated as well. The code for the schema of the other table affected is shown below:

    @city= $server->create("city",
        ["city_code char(4) primary key",
         "city_name char(25) NOT NULL unique",
         "state_code char(2) NOT NULL",
         "country_name char(25) default 'SA'",
         "time_zone_code char(3) default 'EST'"]);

Figure 6.5 Code for the schema for the city table

When a user updated country_name in the state table, the equivalent value for country_name in the other two tables (airport and city) also had to be updated.
The code used to ensure that all of the tables were updated is as follows:

    SQL> create procedure update_state (name in char) as
      state1 char(3);
    begin
      update state
        set country_name=name;
      select state_code into state1
        from state
        where country_name=name;
      -- propagate the new country name to the airport table
      update airport
        set country_name=name
        where state_code=state1;
      -- propagate the new country name to the city table
      update city
        set country_name=name
        where state_code=state1;
      commit;
    end;
    /

    Procedure created.

Figure 6.6 The procedure to handle an update of the state table

6.2.4 Consistency Test

Consistency is the property that if a certain condition holds, it must still hold after the execution of the statements. The example used for this test uses the same tables as the atomicity test, except that this time either the airport or the city table was updated, not the state table.

For every state_code in the state table there is a specific country_name associated with it, and this state_code/country_name combination must be kept in the airport and city tables for the database to be in a consistent state.

When the state_code in the airport table was changed, it was checked that the new state_code existed in the state table. This is similar to a foreign key reference except that the parent entity attribute is not a primary key. If the state_code did not exist then the transaction was rolled back and aborted; otherwise a SAVEPOINT was used to mark that this part of the transaction was complete.

Subsequently, it was checked that the new value for the state_code was still a code in the same country. This was achieved by querying the state table again to find the country_name for the new state_code. If it was no longer in the same country then the country_name was updated to the correct value. The transaction can only be finally completed once all the updates have been partially committed.
The code to implement this was as follows:

    SQL> create procedure update_airport (name in char, code in char) as
      state2 char(3);
      name2 char(25);
    begin
      select country_name into name2
        from state
        where country_name = name;
      update airport
        set country_name=name2;
      savepoint airport_updated;  -- savepoint name assumed; none was given in the listing
      select state_code into state2
        from state
        where country_name=name;
      update city
        set state_code=state2
        where country_name=name;
      commit;
    end;
    /

    Procedure created.

Figure 6.7 The procedure used when updating the airport table

6.3 Significant Findings

This experiment was subdivided into two sections. First, by simulating a crash it was shown that Oracle handled transactions correctly. If the transaction had not completed then it was rolled back and no changes were made; otherwise the whole transaction was committed.

Secondly, it was shown that transactions are important. To do this it was demonstrated that atomicity will be violated if transactions are not supported. If the crash occurred after only part of the transaction had executed then it was necessary to roll back. For MySQL, rollback is not implemented, and therefore the database would have performed only part of the transaction. Depending on when the crash occurred, only the state table, or only the state table and the airport table, would have been updated. This violates atomicity.

A similar result was found for the consistency test. If there were a crash in the middle of the transaction, then only part of the update would have been executed. The database may then no longer be in a consistent state, because the country_name/state_code pair in the airport table might not be consistent with that of the state table.

6.4 Summary of Chapter

This chapter shows that it is important that transactions are supported. If they are not, then a method must be implemented to ensure that atomicity and consistency are not violated when updating tables.
The next chapter investigates a multi-user environment and looks at concurrency control.

CHAPTER 7 Concurrency Control

A DBMS should not limit the system to executing a single transaction at a time, but should allow multiple transactions to run concurrently. The benefits of allowing concurrency are:
- Increase in throughput
- Reduction in average response time

The problem with permitting concurrent updates of data is that there is a risk of violating the ACID properties. The order in which the operations of multiple transactions are executed is known as a schedule, and it influences how the transactions must be implemented. The DBMS must have a concurrency-control system to determine the schedule instead of leaving it up to the operating system, which could allow inconsistencies to be introduced.

When unrelated data is being accessed concurrently for reads or writes there is no problem. However, if more than one transaction accesses the same data concurrently, the order of the transactions will affect the results, and so this must be controlled. The DBA should not have to worry about this, as an internal scheduler should automatically decide the order of the transactions and execute them accordingly. Serialization (which is defined in section 7.2) must not be violated.

7.1 Problems With Concurrency

There are three main problems that can occur if concurrency is not managed properly. These are lost updates, uncommitted data and inconsistent retrievals [Rob et al, 2000].

7.1.1 Lost Updates

Assume two transactions, T1 and T2, need to update the same field concurrently, and each update transaction consists of a read followed by a write operation. If T2 reads the value of the field before T1 has written its value, the value that T2 starts from will not be correct. This is because it will not be using the updated value from T1 but the original value prior to the update.
The update by T1 will therefore be lost, as its altered value is overwritten straight away without ever being used.

7.1.2 Uncommitted Data

This violates the isolation property. To illustrate it, an example is again given with two concurrent update transactions, T1 and T2. Suppose T1 writes its value without committing it and T2 then reads that value. If T1 subsequently rolls back, there is a problem: both transactions should be rolled back, since T2 has used a value that was never committed. Instead, the result of T2 is committed as if the rollback had never occurred.

7.1.3 Inconsistent Retrievals

This occurs if a transaction reads data from a table concurrently with an update transaction. Depending on whether the updated value or the original value is read, the results will differ and could be inaccurate.

7.2 Issues Concerning Serialization

The schedule is the order of execution of the instructions of one or more transactions. In the previous chapter the two types of operation, read and write, were given; these are now used to describe serialization. There are two main types of serialization: conflict serialization and view serialization [Silberschatz et al, 1997].

7.2.1 Conflict Serialization

This depends on whether two transactions access the same data or different data. If the same data is accessed then the order of the transactions may matter. If there are two transactions, T1 and T2, with instructions I1 and I2 respectively, then the table below describes the four possible situations:

Transaction 1   Transaction 2   Result
Read            Read            Order does not matter
Read            Write           Depending on which comes first, the result will change. Order matters.
Write           Read            Same as above
Write           Write           The transactions will not be affected but the final result will be different

Table 7.1 Table showing possible situations and resulting conditions

The first of these cases is not a problem because the order does not matter, whereas the other three situations could be in conflict. A conflict occurs where different operations access the same data and at least one of them is a write operation.

If a schedule can be turned into a different schedule by a number of swaps of non-conflicting instructions then the two schedules are said to be conflict equivalent. A schedule is conflict serializable if it is conflict equivalent to a serial schedule.

7.2.2 View Serialization

This is similar to conflict serialization but it is less strict. Every conflict-serializable schedule is view serializable, but the reverse is not always true. It is also based on the read and write operations only. The three conditions necessary for view equivalence are:
1. If a transaction T1 reads the initial value of the data in the original schedule, then in the view-equivalent schedule T1 must also read that same initial value.
2. If a transaction T1 reads a value produced by T2 in the original schedule, then in the view-equivalent schedule T1 must still read the value produced by T2.
3. The transaction that performs the final write must not change for view equivalence.

7.3 Different Lock Types and Granularity

Generally the lock manager automatically controls the locking procedures; however, the DBA can override the default settings if necessary. Depending on how much of the data is at risk of being corrupted, there are different levels of locking as well as different restrictions. These levels, or granularities, of locking can be database, table, row or field. At the top level, database locking prevents transaction T2 from accessing the database until the lock is released by T1.
This restriction enforces the integrity well but is unsuitable for a multi-user environment, and therefore it was not tested in this project. The next level down is table-level locking, where a whole table is locked, preventing access to its data by another transaction. This level of lock is tested even though it is still not very efficient. The next level is row-level locking, where only a single row is locked. Finally, with field-level locking the same row can be accessed as long as the same field is not being accessed concurrently.

A lock is either an exclusive lock, which restricts access altogether, or a shared lock. A shared lock can only be granted for a read operation and only when no exclusive lock is already held. An exclusive lock is used when writing to the database, so that no other transaction is allowed access at all.

7.3.1 Oracle's Locking System

The lock specifications that Oracle supports are either share or exclusive. As stated in the Oracle documentation [OTN, 2000]:

    You need never explicitly lock a resource, because default locking mechanisms protect table data and structures. However, you can request data locks on tables or rows when it is to your advantage to override default locking. You can choose from several modes of locking such as row share and exclusive.

The experiments look at overriding the default and trying different locking types. The parameters for Oracle are: ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE, SHARE, SHARE ROW EXCLUSIVE, or EXCLUSIVE.

7.3.2 MySQL's Locking System

MySQL supports table-level locking. Row-level locking is not yet implemented, although it is in the TODO list. In the MySQL documentation it is stated:

    All tables that are locked by the current thread are automatically unlocked when the thread issues another LOCK TABLES, or when the connection to the server is closed. If a thread obtains a READ lock on a table, that thread (and all other threads) can only read from the table. If a thread obtains a WRITE lock on a table, then only the thread holding the lock can READ from or WRITE to the table.

MySQL supports two types of locks, a read lock and a write lock, as stated above. There is a READ LOCAL lock as well as a plain READ lock; READ LOCAL differs from the plain READ in that it allows non-conflicting statements to be executed while the lock is held. Read locks wait for write locks, as write locks have a higher priority so that updating is finished as soon as possible. MySQL uses the atomic concept to handle multiple queries, so locking should not be necessary: each update thread is atomic and so cannot interfere with another SQL statement.

7.4 Design of the Experiment

To simulate a multi-user environment, Perl scripts were used again, as they have a convenient command for this purpose: the fork command. Using this command, each DBMS was tested separately because the two support different types of locking commands. Many queries were executed concurrently and various clashes for resources were simulated. Afterwards, each of the ACID properties was tested with table-level locks only.

Only this type of lock was used because the experiment uses a single database, and therefore it would not be beneficial to test database-level locks. Field locks were not tested either, as neither system supports them. Row-level locks are only supported by Oracle and therefore could not be tested on MySQL. This could, however, be tested in a future project, especially as NuSphere [NuSphere, 2000] is implementing row-level locking in MySQL.

The first tests were done without specifying a lock type, to investigate how each system handles locking itself. The table-level locks were then tested to see if there were any differences. Each of the transactions consisted of a single statement.
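The idea behind checking for lost updates under concurrent writers can be sketched without a DBMS at all. In the illustration below, which is an assumption-based analogy rather than the Perl fork scripts used in the project, Python threads stand in for forked clients, a dictionary stands in for one field of one row, and a threading.Lock stands in for an exclusive table-level lock. The function name run_writers and the thread counts are invented.

```python
import threading


def run_writers(n_threads, n_updates):
    """Let several writers increment one shared field under an exclusive lock."""
    row = {"value": 0}             # stands in for a single field of a single row
    table_lock = threading.Lock()  # stands in for an exclusive table-level lock

    def writer():
        for _ in range(n_updates):
            with table_lock:            # only one writer may hold the lock
                current = row["value"]  # read(X)
                row["value"] = current + 1  # write(X)

    threads = [threading.Thread(target=writer) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return row["value"]


final_value = run_writers(n_threads=4, n_updates=1000)
print(final_value)
```

If no update is lost, the final value equals the number of writers multiplied by the number of updates each performed; a smaller value would indicate that one write had overwritten another.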
Multiple queries were run concurrently to test the following:
- Writing to separate tables
- Writing to the same table but different rows
- Reading and writing the same row concurrently but a different field
- Writing to the same row but a different field concurrently
- Reading the same row and field at the time that it is being written to
- Writing to the same row and field in a single table concurrently

Where there were write operations in the experiment, the results were tested to check whether there was any uncommitted data or any lost updates. Where there were read operations, the results were tested for inconsistent retrievals. All of the results were tested for atomicity, consistency, isolation and durability.

7.5 Results and Evaluation

It was found that both systems handled concurrency correctly. All of the updates were performed correctly and left the database in a consistent state. There was no difference between the cases where the lock type was specified and those where the scheduler decided which lock type to use. This implied that both DBMSs implement a suitable locking mechanism. For Oracle it was not known whether this was table-level locking or row-level locking. Both types of locks enforce integrity, but the advantage of row-level locking is that it has better performance.

These tests were not extensive, so more in-depth experiments might reveal integrity violations. It is already known that MySQL does not support multiple-statement transactions, so this might have caused problems if it had been tested. This could be tested further in a future project.

7.6 Summary of Chapter

This chapter evaluated the concurrency control of the DBMSs. No violation of the integrity was found. The final chapter is the conclusion to this project.

CHAPTER 8 Conclusion

This chapter summarises the findings and suggests suitable environments for each DBMS.
Some future extensions are also recommended at the end of this chapter.

8.1 Evaluation of MySQL

It was found that there were situations where MySQL did not enforce the integrity of its data. This was because MySQL does not support certain features that are important for maintaining the integrity of the database.

For the domain experiments the significant findings were:

- It permitted more characters than specified to be inserted
- It did not disallow the insertion of a number larger than the maximum value
- It did not enforce the enumeration constraint properly
- It did not enforce 'check constraints' at all
- It allowed data types to be altered while data was already in the table

The integrity of the database was not violated for the 'not null' constraint, although it was found that MySQL was less restrictive than Oracle.

For the entity integrity experiments there were no significant findings for either the primary key or unique key constraints. MySQL was less restrictive than Oracle but it was not found to violate the integrity.

For the referential integrity constraints the significant findings were:

- It did not enforce foreign key constraints at all when inserting data in a referencing column
- It did not perform the correct operations when updating referenced or referencing columns
- It did not perform the correct operations when deleting referenced or referencing columns

It was known beforehand that MySQL does not yet support transactions. The transaction experiments were therefore only performed on Oracle, to highlight where transactions are essential. The aim of this section was to outline the problems that may arise from MySQL not supporting transactions so that the DBA is made aware of them. It only investigated a single-user environment and showed that both atomicity and consistency could be violated with MySQL.

The final experiment investigated concurrency control.
MySQL supports table-level locking only. Only concurrent transactions consisting of single queries were evaluated, and therefore limited results were found. As stated in chapter 7, a more extensive evaluation in this direction could be carried out.

8.2 Evaluation of Oracle

Oracle was found to support all the features except the enumeration data type, although this was easily implemented using 'check constraints'. Oracle also had additional functionality, such as stored procedures, which enable a DBA to create complex applications. Oracle was also found not to violate the integrity. Every DBA would find this ideal, because it is advantageous to have a DBMS that enforces integrity.

8.3 Overall Evaluation of the Integrity

From the experiments performed in this project it was found that Oracle did not violate the integrity at all, whereas MySQL did on several occasions. This suggests that Oracle enforces integrity much better than MySQL does. Oracle supports all the integrity constraints, transactions and concurrency control, and does not appear to allow the violation of the integrity of the database at all. In-depth experiments would have to be performed to obtain conclusive results. If it is essential that integrity be enforced, then Oracle appears to be the suitable choice of DBMS, while MySQL is less suitable.

8.4 Suitable Environment for each DBMS

Below is a list of the advantages of each DBMS that were found in this project.

8.4.1 Advantages of each DBMS

The advantages of MySQL are:

- Free (if not used commercially)
- Fast (from the benchmark results and from using it in this project)
- Less restrictive, e.g. not case sensitive
- Easy to use, e.g. easier to install
- Code is available and so can be altered

The advantages of Oracle are:

- Enforces integrity
- Supports additional integrity constraints, e.g. check constraints
- Supports transactions
- Has additional locking mechanisms, e.g. row-level locks
- Has extra functionality, e.g. PL/SQL
- Has a good support system, e.g. online documentation system

8.4.2 Environment Suitable for MySQL

Because MySQL is fast but does not enforce integrity effectively, an environment where the queries consist mainly of select operations rather than updates will be more suitable. An example of this would be where the database is used for a search engine. If a transaction-orientated environment is required, there will most likely be a problem with a multi-user system, as MySQL locks the entire table to try to enforce atomicity. It is not advisable to use MySQL in a transaction-orientated environment unless it is implemented on a stable system, because in the case of a crash it is not known what the state of the system will be afterwards. In the search engine situation described above, a crash would not leave the database in an inconsistent state because only selects are being performed, whereas a crash during updates might, because transactions are not supported. Concurrency control would also pose little problem in this situation, because it is not needed where only select statements are executed, as these are read operations. If a simple, easy-to-use, free database is required, then MySQL is definitely a suitable option.

8.4.3 Environment Suitable for Oracle

Oracle has additional functionality and enforces integrity well. Oracle is well suited to an e-commerce database where integrity is essential. It is designed to be a web-orientated database, which makes it even more applicable for this type of environment. It is slower than MySQL, which could cause a bit of a problem, but Internet users are used to delays. It is well suited to a business environment as it has extra functionality with PL/SQL, so it is easy to implement business rules.

8.5 Possible Future Extensions

Below is a list of four possible extensions to this project.
Two of them are extensions to this project and the other two are similar projects.

8.5.1 Further Research on This Project

Different data types could be investigated. One of these, which is becoming increasingly important, is multimedia data. Further research into concurrency control could be attempted. Only single-query transactions were tested in this experiment, so multiple-query transactions could be tested. This includes investigating timestamp protocols. A bigger database could be used. This project only investigates integrity violations at a high level. More complex experiments could be designed.

8.5.2 Use Different DBMS with a Similar Project

There are many different DBMSs available, so any of them could be selected. There is a lot of discussion presently as to which of MySQL and PostgreSQL is superior. They are both open source DBMSs aimed at a similar market, and so this would make an interesting project. Another platform could affect the results, although it should not.

8.5.3 Investigate Other Aspects of a DBMS

There are many aspects other than integrity that can be investigated. These include scalability, the recovery system, security and performance, to name a few.

8.5.4 Attempt to Solve Problems Found with MySQL

Unfortunately, due to a time constraint, it was not possible to attempt this. It would be very interesting to attempt to solve the problems found with the integrity of MySQL in particular. There are many methods of working around the problems. These could be implemented and then compared in terms of ease of implementation, efficiency and effectiveness. An example would be testing whether table locking works for ensuring integrity with transactions, as suggested on the MySQL homepage. Recently there have been discussions at NuSphere (footnote 4) [NuSphere, 2000] about designing software to provide row-level locking for MySQL.
It might be interesting to see how efficiently this works and how much it slows down the system.

Footnote 4: Supporters of open source software

References

[Rob et al, 2000] Rob, P. and Coronel, C., Database Systems: Design, Implementation, & Management, Thomson Learning - Course Technology, 2000.
[Silberschatz et al, 1997] Silberschatz, A., Korth, H.F., and Sudarshan, S., Database System Concepts, McGraw-Hill Companies, 1997.
[Wall et al, 1996] Wall, L., Christiansen, T., and Schwartz, R.L., Programming Perl, O'Reilly & Associates, Inc., 1996.
[Wall et al, 1990] Wall, L., Christiansen, T., and Schwartz, R.L., Programming Perl - Unix Programming, O'Reilly & Associates, Inc., 1990.
[Koch et al, 1997] Koch, G. and Loney, K., Oracle 8: The Complete Reference, Osborne/McGraw-Hill, 1997.
[Hansen et al, 1996] Hansen, G.W. and Hansen, J.V., Database Management and Design, Prentice Hall, 1996.
[Jones et al, 1997] Jones, J. and Monk, S., Databases in Theory and Practice, International Thomson Computer Press, 1997.
[Dorsey et al, 1997] Dorsey, P. and Koletzke, P., Oracle: Designer/2000 Handbook, Osborne/McGraw-Hill, 1997.
[Chorafas, 1998] Chorafas, D.N., Transaction Management: Managing Complex Transactions and Sharing Distributed Database, St. Martin's Press, Inc., 1998.
[Post, 1999] Post, G.V., Database Management Systems: Designing and Building Business Application, Irwin/McGraw-Hill, 1999.

Online References:

[Oracle Homepage, 2000] The Oracle Homepage, Oracle Corporation, www.oracle.com, 2000
[OTN, 1999] The Oracle Technology Network, Oracle Corporation, Documentation, http://otn.oracle.com/, 1999
[MySQL Homepage, 2000] The MySQL Homepage, MySQL AB, Documentation, www.mysql.com, 1995 - 2000
[Perl Homepage, 2000] The Perl Homepage, O'Reilly & Associates, Inc., Documentation, www.perl.org, 1998 - 2000
[Earthweb, 2000] Earthweb, Inc., "Cross-platform Perl/CGI tips and tricks", http://developer.earthweb.com, 2000
[Greatbridge, 2000] Greatbridge, "Open Source Database Routs Competition in New Benchmark Tests", http://www.greatbridge.com/about/press.php?content_id=4, 2000
[NuSphere, 2000] NuSphere Corporation, "NuSphere to Contribute Row-Level Locking to MySQL™ Database", http://www.nusphere.com/releases/103000.htm, 2000
[Openacs, 2000] Adida, B., Open ACS, "Why not MySQL", http://openacs.org/philosophy/why-notmysql.html, 2000
[Devshed, 2000] Widenius, M., "MySQL Developer Contests PostgreSQL Benchmarks", http://www.devshed.com/BrainDump/MySQL_Benchmarks/, 2000
[Symbolstone, 2000] Bunce, T., "DBI", http://www.symbolstone.org/technology/perl/DBI/index.html, 2000

Appendix

Appendix A Perl Scripts

Subroutines to create the tables:

# An array of all the table names
@tables = (\@aircraft, \@airline, \@airport, \@city, \@airport_service,
           \@class_of_service, \@code_description, \@compound_class,
           \@connection, \@day_name, \@dual_carrier, \@flight, \@connect_leg,
           \@flight_day, \@food_service, \@time_interval, \@month_name,
           \@restriction, \@restrict_carrier, \@fare, \@flight_fare,
           \@flight_class, \@restrict_class, \@state, \@stop, \@time_zone,
           \@transport, \@ground_service);

#------------SUBROUTINE TO QUERY THE DATABASE------------
sub insert_into_table {
    my(@query) = @_;
    my(@results, $num);
    $num = 1;
    foreach $query (@query) {
        print "Query $num: $query \n ";
        # Report a failed statement but carry on with the remaining queries
        $results = $dbh->do($query) || warn "You have an error with query $num: $DBI::errstr \n\n";
        $num++;
    }
}

# SUBROUTINE TO INSERT DATA INTO THE TABLES
sub insert_data {
    print "Inserting data\n";
    $row_count = 0;
    $double_quotes = $server->{'double_quotes'};
    for ($ti = 0; $ti <= $#table_names; $ti++) {
        my $table_name = $table_names[$ti];
        my $array_ref = $tables[$ti];
        my @table = @$array_ref;
        my $insert_start = "insert into $table_name values (";
        open(DATA, "$pwd/data/${table_name}.txt") || die "Can't open text file: $pwd/data/${table_name}.txt\n";
        while (<DATA>) {
            chomp;
            next unless ( $_ =~ /\w/ );    # skip blank lines
            my $command = $insert_start . $_ . ")";
            # turn empty strings ('') into a single space (' ') for these servers
            $command =~ s/\'\'/\' \'/g if ($opt_server =~ /empress/i || $opt_server =~ /oracle/i);
            print "$command\n" if ($opt_debug);
            # turn \' escapes into '' when the server expects doubled quotes
            $command =~ s/\\'/\'\'/g if ($double_quotes);
            $sth = $dbh->do($command) or die "Got error: $DBI::errstr when executing '$command'\n";
            $row_count++;
        }
        close(DATA);    # close each data file before opening the next one
    }
}

List of all the Domain tests:

insert_into_table(String_test(@query));
insert_into_table(Integer_test(@query));
insert_into_table(Float_test(@query));
insert_into_table(Enum_test(@query));
insert_into_table(Check_Constraint_test(@query));
insert_into_table(Not_NULL_test(@query));
insert_into_table(Update_Domain_test(@query));
insert_into_table(Update_Check_test(@query));
insert_into_table(Update_NULL_test(@query));

Examples where integrity was violated:

Character tests:

Example 1

@city = $server->create("city",
    ["city_code char(4) primary key",
     "city_name char(25) NOT NULL unique",
     "state_code char(2) NOT NULL",
     "country_name char(25) default 'SA'",
     "time_zone_code char(3) default 'EST'"]);

# more characters than the column allows (state_code is char(2))
$query[4]="insert into city(city_code, city_name, state_code) values('YYYY', 'Cape', 'UCLA')";

Example 2

@airport = $server->create("airport",
    ["airport_code char(3) NOT NULL primary key",
     "airport_name char(40) NOT NULL",
     "location char(36) NOT NULL",
     "state_code char(2)",
     "country_name char(25) default 'SA' ",
     "time_zone_code char(3) default 'EST'"]);

# too many characters
$query[10]="update connection set from_airport='ABCDEFG' where connect_code='305280'";

Example 3 (see the table definition for airport above)

# float value in a character column
$query[7]="update connection set from_airport='1.2' where departure_time='1000'";

Integer Tests:

Example 1

Significant column for test:
    "flight_code integer(8) primary key",

# bigger than integer
$query[4]="insert into flight(flight_code, flight_days, from_airport, to_airport, departure_time, arrival_time, airline_code, flight_number,class_string, aircraft_code, dual_carrier, time_elapsed) values(80000000000, '1234567', 'SFO', 'PHI',333,500,'A4','Y','FBHKY', 'YY', 'Y', 45)";

Example 2

Significant column for test:
    "pay_load integer",

# string in an integer column
$query[1]="insert into aircraft(aircraft_code, engines,aircraft_type, pay_load) values('ZZ2',2,'BOEING','HELLO')";

Float Test:

Example 1

@fare = $server->create("fare",
    ["fare_code char(8) primary key",
     "from_airport char(3) NOT NULL",
     "to_airport char(3) NOT NULL",
     "fare_class char(3) NOT NULL",
     "fare_airline char(2)",
     "restrict_code char(5) ",
     "one_way_cost float(7,2) DEFAULT '0.00'",
     "rnd_trip_cost float(8,2)",
     "FOREIGN KEY (fare_class) REFERENCES compound_class",
     "FOREIGN KEY (restrict_code) REFERENCES restriction",
     "CHECK(one_way_cost>0 or rnd_trip_cost>0)"]);

# float test: character string in a float column
$query[3]="update fare set one_way_cost='HELLO' where fare_code='7100018'";

Enumeration Tests:

Example 1

Significant column for test:
    "economy char(3) NOT NULL check( economy='YES' or economy='NO')",

# is the check case sensitive?
$query[4]="insert into compound_class(fare_class,base_class, class_type, premium, economy,discounted, night, season_fare) values('A3','Y', 'COACH', 'Y', 'yes', 'NO', 'NO','LOW')";

Example 2

@restriction = $server->create("restriction",
    ["restrict_code char(5) primary key",
     "application char(80) NOT NULL",
     "no_discounts char(80) DEFAULT 'NO-ONE'",
     "reserve_ticket smallint(3) NOT NULL",
     "stopovers char(2) check(stopovers='Y' or stopovers='N')",
     "return_min smallint(3)",
     "return_max smallint(3)",
     "CHECK(return_max >= return_min)"]);

# no value assigned for the enum type - what is the default?
$query[0]="insert into restriction(restrict_code, application, reserve_ticket) values('AP/15', 'How far', 1)";

Example 3

Significant column for test:
    "class_type char(20) NOT NULL check( class_type='FIRST' or class_type='COACH' or class_type='BUSINESS' or class_type='THRIFT' or class_type='STANDARD' or class_type= 'SUPERSONIC')",

# spelt wrong - letter missing
$query[3]="insert into compound_class(fare_class, base_class, class_type, premium, economy, discounted, night, season_fare) values('A1','Y','BUSINES', 'NO', 'YES','NO', 'NO','LOW')";

Example 4 (see column above in example 3)

# enum test: value not in the enumeration
$query[0]="update compound_class set class_type='LAST' where fare_class='CHW'";

Check Constraint Tests:

Example 1

@airport_service = $server->create("airport_service",
    ["city_code char(4) ",
     "airport_code char(3)",
     "miles_distant number(5,2) CHECK (miles_distant > 0.00)",
     "direction char(3) check(direction='N' or direction='S' or direction='E' or direction='W' or direction='NE' or direction='SW' or direction='SE' or direction='NW')",
     "minutes_distant smallint NOT NULL CHECK (minutes_distant between 0 and 360)",
     "primary key(city_code, airport_code)",
     "foreign key (city_code) references city on delete cascade",
     "foreign key (airport_code) references airport"]);

# miles_distant must be greater than 0, so try a negative value
$query[0]="insert into airport_service(city_code, airport_code, miles_distant, minutes_distant) values('AABB','AAB', '-1', 23)";

Example 2

@time_interval = $server->create("time_interval",
    ["period char(20) NOT NULL",
     "begin_time smallint NOT NULL",
     "end_time smallint NOT NULL",
     "PRIMARY KEY (period, begin_time)",
     "CHECK (end_time > begin_time)"]);

# begin_time > end_time
$query[7]="insert into time_interval(period, begin_time, end_time) values('midnight',1500, 100)";

Example 3 (see float test example 1)

# leave both one_way_cost and rnd_trip_cost blank
$query[6]="insert into fare(fare_code, from_airport, to_airport, fare_class) values('7001003', 'PHI', 'SFO', 'YN')";

Not Null Tests:

Example 1

@state = $server->create("state",
    ["state_code char(2) primary key",
     "state_name char(25) NOT NULL",
     "country_name char(25) default 'SA'"]);

# insert empty string
$query[2]="insert into state(state_code, state_name) values('T',' ')";

Example 2 (see example above for table definition)

# omit the NOT NULL column state_name
$query[5]="insert into state(state_code, country_name) values('AT', 'SA')";

Example 3

Significant column for test:
    "wing_span float(6,2) NOT NULL",

# update a NOT NULL float column with a blank string
$query[2]="update aircraft set wing_span=' ' where aircraft_code='763'";