ICT Standards and Guidelines
Segment 104
Database Systems
Main Document (Version 2.0)

Disclaimer

The Office of the Minister of State for Administrative Reform (OMSAR) provides the contents of the ICT Standards and Guidelines documents, including any component or part thereof, submission, segment, form or specification, on an 'as-is' basis without additional representation or warranty, either expressed or implied. OMSAR does not accept responsibility and will not be liable for any use or misuse, decision, modification, conduct, procurement or any other activity of any nature whatsoever undertaken by any individual, party or entity in relation to or in reliance upon the ICT Standards and Guidelines or any part thereof. Use of or reliance upon the ICT Standards and Guidelines is, will be and will remain the responsibility of the using or relying individual, party or entity.

The ICT Standards and Guidelines are works in progress and are constantly being updated. The documentation should be revisited regularly to have access to the most recent versions.

The last date of update for this document was June 2003.

Table of Contents - Database Systems

1.0 Executive Summary for Database Systems
2.0 The Background of Database Systems
    2.1 The Scope of Database Systems
    2.2 The Benefits of Standardization
    2.3 Policies to Follow for Database Systems
    2.4 Risks Resulting from the Standardization Activities
    2.5 Related Documents
    2.6 How to Use This Document?
    2.7 Related Terms and Acronyms
    2.8 Related Segments and Cross References
    2.9 Related International Standards
    2.10 All Segments in the ICT Standards and Guidelines
3.0 Roles and Responsibilities
4.0 Selecting Database Systems Components
    4.1 Selecting Database Management Systems
    4.2 Selecting the Interface Layer
        4.2.1 Open Database Connectivity (ODBC)
        4.2.2 Implementing Multi-Tier Architecture
5.0 Mandatory Conditions for Database Management Systems
    5.1 Condition 1: Arabic Support
    5.2 Condition 2: ODBC Support
    5.3 Condition 3: Integrity Constraints
    5.4 Condition 4: Transaction Management
    5.5 Condition 5: Multi Users Access
    5.6 Condition 6: Recovery from System Crashes
    5.7 Condition 7: Access Control
6.0 Selecting the Architecture of a Database System
    6.1 Centralized Databases
    6.2 Distributed Databases
        6.2.1 Data Fragmentation
        6.2.2 Available Network
        6.2.3 Transaction Management
        6.2.4 Replication
    6.3 Parallel Databases
    6.4 Web Based Databases
7.0 Good Practices for Designing the Database
    7.1 Data Modeling
    7.2 Schema Representation
    7.3 The Use of CASE Tools
    7.4 Naming Conventions
    7.5 Decomposition and Normalization
    7.6 Using Database Triggers
    7.7 Primary Key Selection
8.0 Good Practices for Operating and Maintaining Database Systems
    8.1 Data Access and Security
    8.2 Data Storage and Space Allocation
        8.2.1 Disk Storage
        8.2.2 Index Selection
    8.3 Tuning the Database
    8.4 Auditing the Database
    8.5 Backup, Disaster Recovery and Contingency Planning
    8.6 Documenting the Database System
    8.7 Roles and Responsibilities During Operation and Maintenance
    8.8 Training

Figures - Database Systems

Figure 1: How to Use the Database Systems Segment
Figure 2: The Components of a Database System
Figure 3: Architecture of a Web Based Database System
Figure 4: The Normalization Process
Figure 5: Database Systems Documentation Checklist
Figure 6: Roles and Responsibilities
Figure 7: Human Resources Involved in the Operation

1.0 Executive Summary for Database Systems

The objective of this segment is to present guidelines that can be used during the acquisition, development and maintenance of database systems.

The segment defines the basic components of a database system to be the database, the Database Management System (DBMS), the interface layer and the software application. Good practices for the design, operation, and maintenance of the database are presented in the second part of the segment.

The segment establishes Relational Database Management Systems (RDBMS) as the standard for database management systems and presents the mandatory conditions for their selection. For the interface layer, the segment suggests using the multi-tier architecture design to decouple the data from the software application.
Alternatively, ODBC may be used as the interface layer.

This segment proposes different architectures for database systems. The centralized, distributed, parallel, and web-based designs are presented, each with its strengths and weaknesses and the concrete reasons for its selection. The segment recommends using web-enabled centralized databases unless there is a specific need for another architecture.

The second part of the segment presents a set of good practices and guidelines. Good practices for data modeling, schema representation, using CASE tools, data field naming conventions, normalization to third normal form, using database triggers, and primary key selection are presented for a good design of the database system. In a later section, this segment presents good practices for the operation and maintenance of database systems: practices for data access and security, data storage and space allocation, tuning the database, auditing the database, and training on its use.

The segment proposes a scheme for dividing the roles and responsibilities for operating and maintaining the database system among the Ministry or Agency staff concerned. Finally, the segment presents a list of the documentation that the DBA must keep and update about the database system.

A separate and comprehensive segment covers Software Applications while another one covers the Evaluation and Selection Framework for ICT products and services. This segment provides input to these two segments and does not deal with their content. Both these segments can be downloaded from OMSAR's website for ICT Standards and Guidelines at www.omsar.gov.lb/ICTSG.

2.0 The Background of Database Systems

This segment is concerned with database systems. Database systems are information technology solutions built around databases. The basic components of a database system are databases, database management systems and software applications.
The objective of this segment is to present and discuss guidelines that can be used during the acquisition, development and maintenance of Relational Database Management Systems (RDBMS). Topics to be covered are the selection, administration, organization, storage, security and efficiency of database systems. The guidelines are applicable to any database system, whether it is part of an internally developed application, one that is being developed by an external vendor, or one that is part of a Commercial Off the Shelf Software (COTS) product.

2.1 The Scope of Database Systems

The scope of database systems covered in this segment is limited to:

- The rules governing the organization of the data within the database
- DBMSs
- Data storage
- Data access and security
- Backup, recovery and contingency planning
- Auditing
- Basic database administration requirements
- Human resources involvement and training
- Selection criteria where applicable
- Programming issues
- Maintenance of databases
- Tuning

This segment is not concerned with:

- The software applications that manipulate the data stored in the database according to the business rules of the organization hosting the data. These are presented in the Software Applications segment.
- Data warehouses, which are multi-dimensional repositories of data gathered from multiple sources.
- Data mining, which is the technique of finding relevant information in a large volume of data such as a data warehouse.

2.2 The Benefits of Standardization

The objective of this segment is to provide a common framework for the acquisition, development and maintenance of database systems.

Database systems make up the largest segment of business oriented software applications. Whether choosing Commercial Off the Shelf ERP applications or developing a custom made application for a specific government ministry or agency, chances are that a database is needed to store the information.
The requisitioning and the selection of future systems as well as the maintenance of existing ones are a burden. If these database systems have no common framework and no standard documentation, maintenance by itself becomes very difficult. Such maintenance becomes a major issue about five years after the inception of such systems. At that time, they would usually be exposed to disk crashes and database performance deterioration.

2.3 Policies to Follow for Database Systems

The following policies are proposed:

- Database systems should be used with the mandatory criteria presented in this segment.
- The Good Practices presented in this segment should be observed.
- Standardized practices should be used as these would reduce training and troubleshooting costs.

2.4 Risks Resulting from the Standardization Activities

When standardization is implemented, the following risks may arise:

- The mandatory criteria are not observed while acquiring database systems
- The recommended Good Practices are not observed
- Incompetent persons are assigned roles related to the design, maintenance or operation of database systems

2.5 Related Documents

One document is related to this segment: a Check List, discussed in Section 8.6.

2.6 How to Use This Document?

There is one main document. It defines database systems and defines selection criteria for each of their components and for the architecture of the system as a whole. The rest of the document presents good practices for the design of databases and for the operation and maintenance of database systems.

Figure 1 depicts the road map to navigate through the segment.

[Figure 1 is a flowchart with the following boxes: Select DBMS (Sections 4.1, 5.0); Select Interface Layer (Section 4.2); Select the Architecture (Section 6.0); Develop or Acquire the System? (branches: Develop, Acquire); Design the Database (Section 7.0); Design Application (Software Application Segment); Operate and Maintain the System (Section 8.0).]

Figure 1: How to Use the Database Systems Segment

2.7 Related Terms and Acronyms

ADO      ActiveX Data Objects
API      Application Programming Interface
ASP      Active Server Pages
CASE     Computer Assisted Software Engineering
COTS     Commercial Off the Shelf Software
CRUD     Create, Read, Update and Delete
DBMS     Database Management System
ERD      Entity Relationship Diagram
ERP      Enterprise Resource Planning
Field    Attribute or column (interchangeable terms that mean the same thing)
ODBC     Open Database Connectivity
ORM      Object Role Modeling
RDBMS    Relational Database Management System
Record   Row (interchangeable)
Schema   The description of data in terms of the data model
SQL      Structured Query Language
UML      Unified Modeling Language

2.8 Related Segments and Cross References

The following segments have Standards and Guidelines that relate to this segment:

105   Operating Systems                    www.omsar.gov.lb/ICTSG/OS
201   Quality Management                   www.omsar.gov.lb/ICTSG/QM
202   Software Applications                www.omsar.gov.lb/ICTSG/SW
203   Evaluation + Selection Framework     www.omsar.gov.lb/ICTSG/EV
204   Information Integrity and Security   www.omsar.gov.lb/ICTSG/SC

Each page contains the main document and supplementary forms, templates and articles for the specific subject.

2.9 Related International Standards

There are no related standards for database systems as such. However, the Structured Query Language (SQL) is a standardized language, although its implementation may differ from one vendor to the other. The following organizations define standards for SQL:

- ANSI - American National Standard for Information Systems - Database Language
- ISO Database Language SQL - Part 2: Foundation

2.10 All Segments in the ICT Standards and Guidelines

OMSAR's website for ICT Standards and Guidelines is found at www.omsar.gov.lb/ICTSG and it points to one page for each segment.
The following pages will take you to the home pages for the three main project documents and the 13 segments:

      Global Policy Document               www.omsar.gov.lb/ICTSG/Global
      Cover Document for 13 segments       www.omsar.gov.lb/ICTSG/Cover
      Legal Recommendations Framework      www.omsar.gov.lb/ICTSG/Legal
101   Hardware Systems                     www.omsar.gov.lb/ICTSG/HW
102   Networks                             www.omsar.gov.lb/ICTSG/NW
103   Telecommunications                   www.omsar.gov.lb/ICTSG/TC
104   Database Systems                     www.omsar.gov.lb/ICTSG/DB
105   Operating Systems                    www.omsar.gov.lb/ICTSG/OS
106   Buildings, Rooms and Environment     www.omsar.gov.lb/ICTSG/EN
201   Quality Management                   www.omsar.gov.lb/ICTSG/QM
202   Software Applications                www.omsar.gov.lb/ICTSG/SW
203   Evaluation + Selection Framework     www.omsar.gov.lb/ICTSG/EV
204   Information Integrity and Security   www.omsar.gov.lb/ICTSG/SC
205   Data Definition and Exchange         www.omsar.gov.lb/ICTSG/DE
206   Risk Management                      www.omsar.gov.lb/ICTSG/RM
207   Configuration Management             www.omsar.gov.lb/ICTSG/CM

Each page contains the main document and supplementary forms, templates and articles for the specific subject.

3.0 Roles and Responsibilities

The Database Administrator (DBA) is responsible for the design, operation and maintenance of the database and the DBMS. The DBA is usually a staff member of the IT Department of the Ministry or Agency. The IT Department Head or IT Manager is usually responsible for the software application side of the database system.

When it comes to defining roles and responsibilities for the entire database system, other staff from the IT department and from the Ministry or Agency are involved, with limited responsibilities. This is especially true when acquiring a database system or selecting the appropriate architecture. The IT Manager and DBA are asked for their input, but they might or might not be the final decision makers.
Section 8.7 describes in detail the roles and responsibilities of the staff involved in the operation and maintenance of database systems.

4.0 Selecting Database Systems Components

This section addresses features that should be looked for when selecting a database system. Several components are discussed. The basic components of a database system are databases, database management systems and software applications.

Figure 2 represents the components that define a database system. The minimum configuration for a database system is depicted in solid lines. The dotted lines are optional additional components.

[Figure 2 is a diagram with the following boxes: Software Application; DBMS Interface; DBMS1, DBMS2, ... DBMSn; Database 1, Database 2, ... Database n.]

Figure 2: The Components of a Database System

Databases are collections of related data. The names, ages and topics of interest of all children attending a particular school form one collection. The makes, engine numbers and colors of all cars driving on the roads of Lebanon can be considered as another collection.

The software needed to control and maintain these collections of data is called a Database Management System (DBMS).

Software Applications are special purpose programs that manipulate the data stored in the databases through the DBMS according to the business rules and procedures of the organization requesting the software application. The guidelines for selecting software applications are discussed in a separate segment and will not be discussed in this segment. The Software Applications segment can be downloaded from OMSAR's web site for ICT Standards and Guidelines at www.omsar.gov.lb/ICTSG/SW.
4.1 Selecting Database Management Systems

The following is a list showing the various database technologies available today:

- Network databases
- Hierarchical databases
- Relational databases
- Object oriented databases

Even though some existing database systems still operate on hierarchical and network databases, these technologies are almost obsolete and are not used for new systems.

Standard: This segment proposes the use of Relational Database Management Systems (RDBMS) as the database management system of choice.

Exception: An Object Oriented Database Management System (ODBMS) may be used instead of an RDBMS in the following cases:

- An RDBMS cannot be used because the nature of the application is highly complex, such as in Artificial Intelligence (for example, designing expert systems), modeling (CASE tools, routing, workflow), or engineering applications.
- An object oriented programming language such as C++ or SmallTalk is used to code the functionality of the desired system. In this case, the ODBMS is a natural extension of the programming language.

It is worth noting that even though the term “relational” applies to the data model underlying the structure of the database, the data model is not a selection criterion by itself. For example, the popular FoxPro product uses a relational database model, allowing the organization of data in rows and columns and the definition of relations between them. However, it uses its own command language, and not SQL, to create and manipulate the operating system files that form the database.

Hence, Microsoft FoxPro™ systems are file processing systems. Permanent records are stored in various files supported by the operating system, and the application programs extract records from and add records to the appropriate files.

File processing systems have major disadvantages, such as allowing data redundancy and inconsistency between files. Moreover, they have difficulty accessing data and managing concurrent multiple users.
Finally, they have integrity problems, atomicity problems (i.e., transaction management) and security problems. An example of the latter is that the database can be accessed directly through the operating system.

Therefore, Microsoft FoxPro™ and Microsoft Access™ based systems are file processing systems and are not database management systems. They are excluded from use by this standard.

4.2 Selecting the Interface Layer

This layer is an optional component. If it is to be used, then one of the following must be chosen:

- ODBC (Open Database Connectivity)
- Multi-tier architecture

These are presented in the next two sections.

4.2.1 Open Database Connectivity (ODBC)

Through ODBC, a single executable (source code) can access different DBMSs without recompilation. In Figure 2 shown earlier, the ODBC drivers take the place of the DBMS Interface box.

Even though the segment requires that the DBMS of choice support ODBC (see Section 5.2), the use of ODBC is not recommended unless needed. The reason is the additional resource cost in performance and query time. ODBC solves portability problems when multi-tier architecture is not implemented.

4.2.2 Implementing Multi-Tier Architecture

This segment emphasizes the importance of this architecture in the context of database systems.

For database systems to be portable, reusable and easily maintained, all calls directed to the database should be grouped in a DBMS interface class. This allows the separation of calls to the database from the rest of the source code.

The term “class” refers to separately compiled modules making up libraries. Such modules are often referred to as components and usually follow one of several industry standards such as COM or CORBA.

The first benefit of this separation is data independence, achieved by decoupling the calls to the database from the application code. Secondly, distributing the deployment of the components on several servers results in an improvement of performance.
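As a rough illustration of grouping all database calls in one interface class, the following Python sketch uses the standard-library sqlite3 module as a stand-in for the DBMS. The class name DbmsInterface, the citizen table and its columns are assumptions made for this example and do not come from the segment; a real system would use the selected RDBMS and its own driver.

```python
import sqlite3


class DbmsInterface:
    """Hypothetical interface class: the only place where SQL is written."""

    def __init__(self, connection):
        self._conn = connection

    def get_personal_data(self, citizen_id):
        """Return the citizen's personal data as a dict, or None if not found."""
        row = self._conn.execute(
            "SELECT full_name, date_of_birth FROM citizen WHERE citizen_id = ?",
            (citizen_id,),
        ).fetchone()
        if row is None:
            return None
        return {"full_name": row[0], "date_of_birth": row[1]}


def build_demo_database():
    """Create an in-memory database holding one illustrative record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE citizen ("
        "citizen_id INTEGER PRIMARY KEY, "
        "full_name TEXT NOT NULL, "
        "date_of_birth TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO citizen VALUES (1, 'Sample Citizen', '1970-01-31')")
    conn.commit()
    return conn


# Application code calls the interface and never contains SQL itself.
db = DbmsInterface(build_demo_database())
personal_data = db.get_personal_data(1)
```

Because the application depends only on get_personal_data, a change of DBMS or of the way dates are stored would, under these assumptions, be confined to the interface class.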
For example, assume that a screen displays the personal data of a citizen, including the date of birth. Within the application, the following pseudo code appears:

    DBMS_Interface.Get_personal_data(citizen_id)

In the DBMS_Interface class (or library, package, etc.), the actual SQL (Structured Query Language) or other calls are issued. Such layering of the code would have reduced the impact of the Year 2000 problem, as all references to retrieving and storing two-byte representations of dates could have been identified and corrected in the DBMS interface class, as opposed to searching program after program for references to dates.

Other benefits of data independence are:

- Decoupling the front end from the back end of the database system. An example would be the use of Object Oriented applications with a relational database.
- Portability: the ability to migrate the front end to another language or to change the DBMS.
- Reusability: the ability to reuse all or parts of the system.

Therefore, it is strongly recommended that multi-tier architecture be implemented for the DBMS interface layer of database systems.

5.0 Mandatory Conditions for Database Management Systems

Having established that a Relational Database Management System (RDBMS) should be used, this section defines the minimum requirements that the RDBMS must satisfy without human intervention or external software programs.

Note that these features are mandatory and that, when using the Evaluation and Selection Framework, they represent pass or fail conditions for the selection of RDBMSs.

5.1 Condition 1: Arabic Support

Any DBMS being purchased must comply with ANSI standards to support national language character sets, including Arabic. The database must be able to store Arabic characters, regardless of whether the operating system hosting it supports Arabic or not.
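A simple acceptance check for Condition 1 is a round trip: store an Arabic string and confirm that the value read back is identical. The sketch below shows the idea in Python with the standard-library sqlite3 module standing in for the candidate DBMS; the table name sample and the function round_trip are hypothetical, and an actual evaluation would run the same check against the product under consideration.

```python
import sqlite3


def round_trip(text):
    """Store a string in a throwaway table and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("INSERT INTO sample (label) VALUES (?)", (text,))
    stored = conn.execute("SELECT label FROM sample").fetchone()[0]
    conn.close()
    return stored


arabic_text = "قاعدة بيانات"  # "database" in Arabic
# True only if the stored value is identical to the original string.
unchanged = round_trip(arabic_text) == arabic_text
```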
5.2 Condition 2: ODBC Support

ODBC is considered the primary standard for open systems. Most major software vendors support it. It allows a single executable (source code) to access different DBMSs without recompilation. Applications using ODBC are independent of which DBMS is being used, both at the source code level and at the executable level. In addition, using ODBC allows the application to access more than one DBMS simultaneously.

This independence is achieved by adding an extra layer, a DBMS-specific driver, between the application and the DBMS(s). The driver intercepts the SQL call issued from the application (as specified by the ODBC API) and translates it into a DBMS-specific call (refer to Figure 2 in Section 4.0). The architecture of ODBC has four components:

- The application
- The driver manager
- The DBMS-specific driver(s)
- The data source (i.e., the corresponding DBMS)

5.3 Condition 3: Integrity Constraints

An integrity constraint is a condition that is enforced automatically by the DBMS and whose violation prevents the data from being stored in the database. The DBMS enforces integrity constraints in that it only permits legal instances to be stored in the database.

The key constraint and the referential integrity constraint are identified as the two minimum constraints that must be enforced by the DBMS.

1. Key constraint or unique constraint: Every record must have one unique identifier called the primary key that has a unique value within the table or collection. Primary keys can be concatenated, which means that the unique identifier can be made up of one or more fields.

2. Referential integrity constraint or foreign key constraint: This constraint asserts that a reference in one data item indeed leads to another data item. A foreign key is a field that is a primary key in another table.
Referential integrity consists of:

- Not inserting a record if the value of the foreign key being inserted does not match an existing record in another table with the primary key having the same value,
- Not deleting a record whose primary key is defined as a foreign key in child records, and
- Not modifying the value of primary keys.

Most DBMSs enforce other types of constraints that have to do with the data content of a field and are usually called check constraints. Examples are limiting the values of a field to a list of values or to a range of values, validating dates, and checking the format of the data (e.g., no alphabetic characters allowed in a numeric field).

5.4 Condition 4: Transaction Management

The DBMS must have a way to differentiate between a single command or SQL statement and a set of commands or SQL statements that form one transaction.

For example, a money transfer transaction that transfers money from account A to account B is not complete before the first account has been debited and the second account credited.

5.5 Condition 5: Multi Users Access

The DBMS must allow multiple users to access the database simultaneously. Additionally, a locking protocol must be available to ensure that concurrently executing transactions do not make conflicting changes to the same resource (row or table). Before a transaction reads or writes a resource, the DBMS locks that resource in shared or exclusive mode respectively. These and additional locks can be further manipulated programmatically through the application.

5.6 Condition 6: Recovery from System Crashes

Transactions can be interrupted before completion for a variety of reasons. Examples are:

- System crashes
- Operating system crashes
- User session crashes
- Forced disconnection
- Etc.

The DBMS must ensure that the changes made by such incomplete transactions are automatically removed from the database, without the intervention of the database administrator.
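The behaviour required by Conditions 4 and 6 can be sketched with the money transfer example. The Python code below uses the standard-library sqlite3 module as a stand-in DBMS; the account table, the function names and the fail_midway flag are assumptions made for illustration. The interruption is simulated by an exception inside the application, which is only an approximation of a real system crash.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two illustrative accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount, fail_midway=False):
    """Debit source and credit target as a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        if fail_midway:
            raise RuntimeError("simulated interruption after the debit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = open_bank()
transfer(conn, "A", "B", 30)
after_success = balances(conn)

try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass  # the incomplete transaction has been rolled back
after_failure = balances(conn)
```

In this sketch the interrupted transfer leaves both balances exactly as they were after the last completed transaction, which is the outcome the condition asks the DBMS to guarantee on its own.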
Likewise, the DBMS must ensure that changes made by transactions completed before the crash remain in the database.

So the DBMS must bring the database to a consistent state after a system crash by ensuring that the effects of all transactions that completed prior to the crash are restored and that the effects of incomplete transactions are undone.

5.7 Condition 7: Access Control

Management of different security levels for accessing and/or manipulating the database must be available within the DBMS. Furthermore, since users with similar access rights are to be grouped together, the DBMS should allow these groups or roles to be given the same privileges.

This section reviews the minimum expected from the DBMS in order to provide access control to the database. However, these security levels are tools that must be put to use within the framework of a security plan or security policy before they begin to guarantee the security of the database. Section 8.1 describes the guidelines for establishing security plans and implementing them.

1. The DBMS must ensure that unauthenticated access to the database is not permitted, regardless of the network or operating system security enforced. Authentication means that the DBMS should always request a valid username and password from any application, session or user accessing the database.

2. The security levels are formed by coupling privileges with database objects. Database objects of interest are tables and views. Security at the column level is not always available; this feature is desirable in an RDBMS. The privileges of interest that can be granted on database objects are listed below and are commonly referred to as the CRUD privileges.
CREATE   The right to insert rows
READ     The right to select rows and read data
UPDATE   The right to update the contents of a row
DELETE   The right to delete rows

Additionally, the right to define REFERENCES means that foreign keys (in other tables) that refer to the specified column or to all columns can also be defined. Other privileges such as creating, dropping and altering tables are usually given to the system administrator or to the user who owns the schema.

3. Privileges are assigned to individual users or to a group of users. The group of users represents divisions in the real world, where people having the same role or job within an organization perform similar job functions.

DBMSs allow discretionary access control, which means that users with privileges on objects may pass on these privileges to other users. Discretionary access control is less secure than mandatory access control, which assigns security classes to database objects and clearances to users. However, commercial DBMSs generally do not support mandatory access control. Section 8.0 shows how discretionary access control can be strengthened to achieve more secure levels.

6.0 Selecting the Architecture of a Database System

Choosing the database system architecture relies on two factors:

- The underlying network on which the database system will run
- The need to distribute data across multiple databases because of geographical or administrative constraints

The following sections define different architectures. They provide the reader with the criteria to be used when selecting the architecture of a database system. When selecting databases, such criteria can be used in the Evaluation and Selection Framework presented as a separate Standards and Guidelines Segment.

6.1 Centralized Databases

A centralized database system is a system that keeps the data in one single database at one single location.
In a centralized database system, a single machine called a database server hosts the DBMS and the database. Multiple users or client workstations can work simultaneously on a centralized database system using the Client/Server configuration, or the Intranet configuration, if one of the following is available:

- An underlying LAN (Local Area Network); LANs can span one or a few adjacent buildings
- An underlying WAN (Wide Area Network); WANs can span all Lebanon

The client/server architecture is a very successful and popular one as it balances the processing load between the client machine and the server machine.

The ongoing growth of Internet and intranet applications has refocused attention on centralized databases. In such a configuration, the bulk of the processing does not lie on the client machine, but rather on the machine hosting the Application Server and on the database server machine.

The main disadvantage of centralized database systems is the single point of failure. When the database fails, the work of all users is interrupted. However, when 24x7 operations are needed, there are solutions to minimize the risk of failure of the database server, such as the use of a cluster server. Also, where WANs are used, failure of part of the network means the interruption of work at the remote location.

Nevertheless, centralized databases are easier to manage, maintain and control for security purposes. They should be the architecture of choice if there is no need for a more complex architecture.

6.2 Distributed Databases

The main difference between centralized and distributed database systems is that, in the former, the data resides in one single location, whereas in the latter, the data resides in several locations or on multiple servers at the same location.

The distribution of the data across locations should be transparent to the user who continues to use the software application interface from his/her computer.
Distributed database systems involve many complex issues such as transparency, transaction management, optimization, data fragmentation and replication. Their design requires a high level of sophistication and competence from the supplier and their management requires an experienced Database Administrator. The issues summarized below must be assessed during the selection process.

It is recommended that distributed architectures be used strictly on an as-needed basis because of the complexity of their design and maintenance.

6.2.1 Data Fragmentation

In order to assess the need for a distributed database system, the required partitioning of the data, or fragmentation, must first be studied.

Horizontal partitioning means that whole records are stored at different locations, with each location holding the records that belong to it. For example, every branch stores the records of its customers. In the broader sense, data of the Lebanese government is horizontally partitioned, i.e., the records of people are stored in their respective Muhafazat and citizens conduct their business at the government agency of their Muhafazat.

Vertical partitioning means that the parts of the record are stored in different locations. For example, the customer data is stored at the Customer Relation department in Dbayeh, the loans data is available in the Business Expansion department in Baabda, etc.

The distributed database can involve both horizontal and vertical partitioning. For example, each branch identified above keeps the data of the accounts that are opened in it.

6.2.2 Available Network

The design of distributed database systems is strongly influenced by the type of underlying WAN or LAN. Distributed database systems involving vertical partitioning can run only on those networks that are connected continuously, at least during the hours when the distributed database is operational.

Networks that are not continuously connected typically do not allow transactions across sites, but may keep local copies of remote data and refresh the copies periodically.
For example, a nightly backup might be taken. For applications where consistency is not critical, this is acceptable. This is also acceptable for systems involving horizontal partitioning of the data.

6.2.3 Transaction Management

Distributed transaction management is needed when vertical partitioning is used. Special techniques must be applied to ensure that a transaction spanning two different databases is applied in both or in neither, so as not to cause inconsistency. This technique is called the two-phase commit.

It is recommended that the DBMS vendor provide the distributed transaction management software. The supplier should not attempt to write transaction management code nor buy a third party product for such a purpose.

6.2.4 Replication

Replication is the process of synchronizing several copies of the same records or record fragments located at different sites. It is used to increase the availability of data and to speed up query evaluation.

The supplier must lay out a detailed Replication Plan including:

- The partitioning of the data and how to select data field names and key values so as not to cause conflicts between sites
- The timing of the replication (i.e., synchronous vs. asynchronous)
- Resolution of potentially conflicting updates at different sites and ways of detecting them

Note that suppliers often feel that they can handle replication themselves, especially asynchronous replication (i.e., copying numerous records from one database to the other). Unless such activities are labeled remote backups, it is recommended that the DBMS vendor provide the replication software. The supplier should not attempt to write replication code nor buy a third party product for such a purpose.

6.3 Parallel Databases

Parallel database systems make use of multiple processors, such as a cluster server, to host the DBMS. The use of multiple CPUs allows database system activities to be sped up, allowing faster response to transactions as well as more transactions per second.
Parallel database systems can be selected when a very high volume of transactions per second is expected from the system or when more than 100 users are expected to log into the system at a given time. Examples might be filing taxes online, renewing vehicle registrations, etc.

It is recommended that the DBMS vendor provide the software programs that ensure that the DBMS takes advantage of multiple processors. Under no circumstances should the supplier write such code or buy a third party product for such a purpose.

Note that clustered servers are not exclusively specified for improving the performance of the database system through parallel processing. Rather, they might be specified for ensuring 24x7 availability of the database. In this case, DBMS vendors should also provide software programs that ensure that the DBMS can switch automatically from one node of the cluster server to the other in the case of node failure.

6.4 Web Based Databases

With the advent of the web, the trend is towards using the internet and the intranet for internal and external applications. It follows that all database systems should be fully web enabled or at least contain Web based components.

The architecture of a web based database system involves the following software components:

- Web Browser connecting the user to the Internet
- Web Server receiving requests from a remote Web Client. The Web Server simply retrieves the page defined by its URL and sends it to the Web Browser. Often, the Web Server has to execute a program in order to assemble a dynamic page. In this case, the Web Server has to access data in a DBMS
- Application Servers, which are optional but are recommended for the Web architecture. They facilitate the execution of programs and include security, session management, etc.
- Software application, which interfaces with the Application Server to provide the pages
- DBMS, which interfaces with the Application Server to provide the data needed

The above is depicted in Figure 3.
The Web Server is connected to the Internet and handles all requests from remote Web Clients. The Web Server communicates with the web enabled database system Application Server to retrieve the page requested. The Application Server contains the web-enabled program that dynamically accesses the data from the Database Server and the layout screen from the Application Server to send the page requested. Application servers may be DBMS vendor specific.

Hosting the different software components on different physical server machines is recommended. At a minimum, the DBMS and the database should always be hosted together (Database server), separate from the Web or Application servers.

The field is still evolving, so a standard cannot be recommended at this point for the programming languages of software applications. However, it is recommended that XML pages instead of plain HTML be generated by the various Web based software applications. (XML is a document description standard that allows the description of the content and structure of the document in addition to giving display directives.)

Document Type Definitions (DTDs) are being developed for various application areas. In very simplistic terms, DTDs are templates, comparable to a relational database schema, in which specific fields have placeholders. XML query languages are already being developed, and commercial DBMS vendors have little choice but to follow and fully support the XML standard.

There are other issues concerning Web based database systems that do not fall under the scope of this segment. For example, keyword search - which is the most common kind of query on the Web today - is not suited for databases. In all databases, it is not possible at the native level to search on the content of a field without knowing the field name (or the object name). However, there are full-text search engines that can compile an index of content by searching all fields and report search results with a percentage accuracy for each hit.
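The recommendation above to generate XML rather than plain HTML can be illustrated with a minimal sketch. It is not taken from this segment: the `citizen` table, the element names and the helper `rows_to_xml` are hypothetical, and an in-memory SQLite database stands in for the Database Server. Only Python's standard `sqlite3` and `xml.etree.ElementTree` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, root_tag, row_tag):
    """Serialize every row of `table` as an XML string (hypothetical helper)."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        element = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            child = ET.SubElement(element, name)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# In-memory database standing in for the Database Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizen (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO citizen (name, city) VALUES ('Rima', 'Beirut')")

xml_page = rows_to_xml(conn, "citizen", "citizens", "citizen")
print(xml_page)
```

Because the element names carry the field names, a client receiving such a page can locate the `city` value by name, which plain HTML display markup does not allow.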
[Figure 3: Architecture of a Web Based Database System. Components shown: Web Browser, HTTP, Internet, Web Server, Application Server, JDBC/ODBC, Database Server.]

7.0 Good Practices for Designing the Database

Design is the process of translating the abstract users' needs into a model that can be a workable foundation for constructing the database system. The following sections present some Good Practices to be followed in the design of database systems.

7.1 Data Modeling

Data modeling is the process of defining the structure of the database. Mainly it is concerned with laying out the data stores of interest (objects, entities, or tables) and defining the relations between them. The relations are the means of navigating from one table to another and so must be uncovered and accounted for in the design phase in order to ensure proper execution of the application later during the development phase.

The data modeling activity ultimately results in generating a schema, i.e., the organization of columns across tables and the definition of the relations between them.

Several models exist for the modeling of data: Entity Relationship (ER), Object Role Modeling (ORM), ODMG Object Model, etc. Other methods that include data modeling exist, such as Yourdon, Merise and the UML.

The most popular data modeling technique is Entity Relationship (ER) modeling, and most database data modeling currently uses some variant of it. Such models focus on the objects of interest and the business relations between them so as to keep some link between the model and the real world. Relations can be modeled with real world business rules and terminology, for example, one customer orders many products, one customer may open many bank accounts, many work orders are combined with many sales orders, every citizen must have a unique identifier, etc.

ERs are adaptable to object modeling.
In this sense, objects can be modeled as entities (persons, cars, bank accounts, etc.).

Finally, ER models have a special importance because several CASE tools are built around them. CASE tools provide a way to generate a relational schema directly from them.

It is recommended that ER modeling be used. Furthermore, the ER diagram should be a required part of the standard documentation of a database system.

7.2 Schema Representation

The database schema is the representation of how the data is organized within the database and of which database objects are available in the database.

It is a recommendation of this segment that the data dictionary be the official representation of a database and a required part of the standard documentation of any database system. The data dictionary format and content will be discussed in the Data Exchange Segment.

7.3 The Use of CASE Tools

CASE tools are Computer-Aided Software Engineering tools. These are now available to generate both back ends (the database) and front ends (the software application) for database systems.

The use of CASE tools is recommended because they:

- Maximize the consistency between the design and the implementation
- Minimize human error incurred by translating the design into source code
- Shorten the time spent on the development of source code
- Facilitate the implementation of change control procedures

Some CASE tools even integrate configuration management and allow the release of multiple versions of the database. The designer focuses on implementing the change at the highest level (the design level) and the CASE tool takes care of propagating the change down to the physical database level. This is of utmost importance when new releases occur after the start of operation of the database system. The possibility of down time because of a version upgrade is minimized.
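The idea of generating the back end from the design can be sketched in a few lines. This is only a toy illustration under stated assumptions, not how any particular CASE tool works: the `MODEL` dictionary is a hypothetical, much simplified stand-in for an ER design, `generate_ddl` is a hypothetical helper, and SQLite is used only to check that the generated scripts are valid.

```python
import sqlite3

# Hypothetical, simplified design model: entity -> (column, SQL type, constraint).
MODEL = {
    "customer": [
        ("id", "INTEGER", "PRIMARY KEY"),
        ("name", "TEXT", "NOT NULL"),
    ],
    "account": [
        ("id", "INTEGER", "PRIMARY KEY"),
        # "One customer may open many bank accounts" becomes a foreign key.
        ("customer_id", "INTEGER", "NOT NULL REFERENCES customer(id)"),
        ("balance", "INTEGER", "NOT NULL DEFAULT 0"),
    ],
}


def generate_ddl(model):
    """Translate the design model into CREATE TABLE statements."""
    statements = []
    for table, columns in model.items():
        body = ", ".join(
            f"{name} {sql_type} {constraint}"
            for name, sql_type, constraint in columns
        )
        statements.append(f"CREATE TABLE {table} ({body});")
    return statements


ddl = generate_ddl(MODEL)
conn = sqlite3.connect(":memory:")
for statement in ddl:
    print(statement)
    conn.execute(statement)  # the generated script must be accepted by the DBMS

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

A change such as adding a column is made once in the model, and the scripts are regenerated rather than edited by hand.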
Other CASE tools can be used for modeling ERDs and would hence be able to generate scripts for creating databases for different platforms.

It is recommended to use CASE tools to generate both the back end (the database) and front end (the software application) of database systems. At a minimum, CASE tools should generate the back end, i.e., the database.

7.4 Naming Conventions

Historically, names could be at most eight characters long and programmers used abbreviations and coded names to represent the data content of the field. Furthermore, some software engineers have adopted the practice of coding the name of the table into the names of the fields belonging to that table.

It is recommended that field names be as natural as possible. It is also recommended that the name of the table not be included in the field name, because the table name can be queried from the database data dictionary at any time. This recommendation leads to better readability and a more universal understanding of the data content.

The most suitable practice would be to issue internal standards for the Naming Convention of database elements.

7.5 Decomposition and Normalization

Normalization is the process of organizing the data at hand into tables. Normalization is at the heart of the relational model theory. It states that any subset of the data can be accessed if the database is in the Third Normal Form. Normalization ensures that a query can be written to retrieve any information from the database.

The process of normalization applies to relational databases only. Figure 4 depicts this process.

As a first step, all the data fields necessary to conduct a business are organized into one long record. The resulting table is called the Un-normalized Form.

In order to move to the First Normal Form, repeating groups must be removed and put into different tables. The primary key must be identified at this stage too.
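The first step described above, removing repeating groups, can be sketched on a small example. The order data, the field names and the helper `to_first_normal_form` are hypothetical and serve only to illustrate the idea; plain Python lists stand in for tables.

```python
# Un-normalized records: each order carries a repeating group of items.
unnormalized = [
    {"order_id": 1, "customer": "Rima", "items": [("pen", 2), ("ink", 1)]},
    {"order_id": 2, "customer": "Omar", "items": [("pen", 5)]},
]


def to_first_normal_form(records):
    """Split the repeating group into its own table (hypothetical helper).

    Returns two lists of rows:
      orders      - primary key: order_id
      order_items - primary key: (order_id, line_no)
    """
    orders = []
    order_items = []
    for record in records:
        orders.append(
            {"order_id": record["order_id"], "customer": record["customer"]}
        )
        for line_no, (product, quantity) in enumerate(record["items"], start=1):
            order_items.append(
                {
                    "order_id": record["order_id"],  # links back to orders
                    "line_no": line_no,
                    "product": product,
                    "quantity": quantity,
                }
            )
    return orders, order_items


orders, order_items = to_first_normal_form(unnormalized)
for row in order_items:
    print(row)
```

Each row now holds a single value per field, and the concatenated key (`order_id`, `line_no`) identifies every item row uniquely.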
[Figure 4: The Normalization Process. Unnormalized User View -> (remove repeating groups) -> First Normal Form (1NF) -> (remove partial dependencies) -> Second Normal Form (2NF) -> (remove transitive dependencies) -> Third Normal Form (3NF).]

In order to move to the Second Normal Form, partial dependencies must be removed so that all non-key attributes become fully dependent on the primary key, even if the primary key is concatenated (made up of many fields).

In order to move to the Third Normal Form, transitive dependencies must be removed. Transitive dependencies occur when non-key attributes are dependent on a non-key attribute.

7.6 Using Database Triggers

A database trigger is a procedure that is automatically invoked by the DBMS in response to a change event against the database. A database that has triggers attached to it is called an active database.

The change events and the timing of the firing are specified within the trigger code and they are:

- Before Insert
- After Insert
- Before Update
- After Update
- Before Delete
- After Delete

The importance of triggers emanates from the fact that they are fired according to their set-up regardless of the source that is requesting the change. For example, when database triggers are fired in response to an Update operation, the trigger code is executed whether application 1, application 2, or the database administrator through the SQL interface of the DBMS is performing the operation.

The frequency of firing the database trigger can also be controlled from within the code of the trigger. The trigger can fire once for each row being affected by the operation (Row-level trigger) or it can fire only once regardless of the number of rows being affected by the operation (Statement-level trigger).

While database triggers are especially suited for auditing and statistical gathering, there is neither a need nor a way to write a standard for limiting the scope of their use by application programmers.
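A row-level After Update trigger used for auditing can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `employee` and `salary_audit` tables and the trigger name are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used only because it runs without a server. SQLite supports row-level triggers only, so a statement-level trigger cannot be shown with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);

    CREATE TABLE salary_audit (
        employee_id INTEGER,
        old_salary  INTEGER,
        new_salary  INTEGER,
        changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Row-level trigger: fires once for every row changed by the UPDATE,
    -- whichever application or SQL session issued the statement.
    CREATE TRIGGER trg_salary_audit
    AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO salary_audit (employee_id, old_salary, new_salary)
        VALUES (OLD.id, OLD.salary, NEW.salary);
    END;

    INSERT INTO employee (name, salary) VALUES ('Rima', 1000), ('Omar', 1200);
    """
)

# One statement touching two rows produces two audit rows.
conn.execute("UPDATE employee SET salary = salary + 100")

audit_rows = conn.execute(
    "SELECT employee_id, old_salary, new_salary "
    "FROM salary_audit ORDER BY employee_id"
).fetchall()
print(audit_rows)
```

The application code never mentions the audit table; the DBMS records the before and after values itself, which is why such triggers need to be documented for maintenance personnel.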
The standard however recommends that application suppliers and database administrators document database triggers because the maintenance of active databases is very difficult. The documentation is necessary because maintenance personnel must trace an error condition to either application code or database trigger action. Finally, some conditions may cause database triggers to fire in a chain reaction and this needs to be documented.

The database trigger documentation should include:

- The name of the trigger
- The firing event or operation
- The frequency of the firing
- The name of the source table
- The name of any table(s) affected by the trigger
- A short description of the actions of the trigger

7.7 Primary Key Selection

Each record (row) must be uniquely identified within the table. Primary keys cannot be modified or updated throughout the life of the database. If the primary key must be updated because it was not selected properly, the entire row must be deleted and recreated. If the record has dependent records or children in other tables, which in turn have children in other tables, deleting the record is a complicated undertaking in itself.

So encoding meanings in the value of the primary key is a dangerous practice that must be discontinued. For example, the following candidate for identifying a citizen is not valid: the first three characters encode the city of residence, the next three digits encode the religion and the last seven digits are the unique home telephone number. All pieces of this candidate key are subject to change.

The practice of encoding meaning into the primary key value is inherited from the paper bureaucracy and from the years when memory was scarce. Now that powerful database systems are available, it is possible to search on any combination of fields and to create indexes on any combination of fields.

Primary keys must be able to be quickly and easily generated and should not depend on other data for their generation.
Because a record cannot be inserted in the database without a primary key, any failure in the generation of the primary key can result in losing the data and not being able to insert the record.

Therefore, the use of unique, automatically generated serial numbers is recommended for primary keys.

8.0 Good Practices for Operating and Maintaining Database Systems

The following sections present some good practices to be followed in the operation and maintenance of database systems.

8.1 Data Access and Security

As seen in section 5.7, the recommended standard requires that any DBMS in use by the government must have the ability to assign different security levels to different users or groups of users (roles). However, even with the best intentions, this ability does not by itself enforce security. A clear and consistent security policy or security plan must be developed around these abilities.

A security plan must include the following:

- Identification of the objects that must be protected, i.e., tables, views and columns, and the reason for the protection
- Identification of the privileges associated with the protected objects (CRUD)
- Identification of the users or group of users who get access to the protected objects and to all objects
- Definition of the procedures to be followed in the handling of protected objects. For example, this table needs a journal, updating a field in this table causes the before image to be dumped in a history table, etc.

In parallel to the security plan, the following security policies must be followed:

- Assign privileges to users (or roles) at the database level and not at the application level
- Rely on the audit trail to identify user activities
- Every user must have his own username and password
- Educate users not to give their own username and password to others

8.2 Data Storage and Space Allocation

8.2.1 Disk Storage

Suppliers must deliver a plan describing how they intend to store the database files on the physical hard disk.
The principle is simple and consists of "not putting all the eggs in the same basket" in order to minimize the risk of physical disk failure and to improve performance (as disk read/write operations are still to date much slower than memory access).

The details depend on the hardware configuration at hand, which cannot be completely covered by this segment. However, the standard offers the following guidelines:

- Identify tables with similar functionality, hence similar access frequency, and balance them across the physical disks so as not to use one physical disk more than the others. The different table groups are: frequently accessed lists of values tables, indexes, audit, journal and other read-only tables, etc.
- Identify the initial and expected size of tables after three years of operation and plan the initial storage allocation accordingly
- Remember that the disk mean time between failures (MTBF) is considered to be about 5.7 years by the industry. Therefore, increase vigilance with time as opposed to getting comfortable with the system with time.

A note must be added about RAID technology. (Please refer to the Hardware Segment for a review of this hard disk technology.) Redundant Arrays of Independent Disks (RAID) is a technology that arranges several disks, controlled by software, to simulate having one large disk. RAID comes in different levels. Depending on its level, RAID mirrors the data across the disks with the result that only a percentage of the total disk space is available to the user. RAID uses the rest of the disk space to store mirror data and to automatically restore it upon the failure of one or more disks.

With RAID, the database administrator may either let the RAID control the storage of the data files on individual disks or he may partition the logical disks himself.
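The statement that only a percentage of the total disk space is available with RAID can be made concrete with a small calculation. The sketch below is not part of this segment; it uses the commonly quoted textbook formulas for a few RAID levels with identical disks, and the function name `usable_capacity_gb` is hypothetical. Real controllers may reserve additional space, so the figures are indicative only.

```python
def usable_capacity_gb(level, disk_count, disk_size_gb):
    """Return the usable capacity in GB for an array of identical disks.

    Textbook formulas (assumption: no spare disks, no controller overhead).
    """
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disk_count < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 10 and disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks")

    if level == 0:
        data_disks = disk_count        # striping only, no redundancy
    elif level == 1:
        data_disks = 1                 # every disk holds the same copy
    elif level == 5:
        data_disks = disk_count - 1    # one disk's worth of parity
    elif level == 6:
        data_disks = disk_count - 2    # two disks' worth of parity
    else:
        data_disks = disk_count // 2   # mirrored pairs, then striped
    return data_disks * disk_size_gb


total_gb = 4 * 500
for level in (0, 1, 5, 6, 10):
    usable = usable_capacity_gb(level, 4, 500)
    print(f"RAID {level}: {usable} GB usable of {total_gb} GB")
```

Such a calculation can feed the storage allocation plan: the expected size of the tables after three years must fit within the usable capacity, not the raw capacity.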
8.2.2 Index Selection

The following guidelines for index selection are recommended:

- All primary keys and foreign keys must be indexed as a minimum
- Additional indexes can be added according to specific query needs. Attributes used in the WHERE clause of a SQL statement are good candidates for indexing.
- Additional indexes add to the DBMS workload. The benefits of indexing usually outweigh the overhead cost associated with its maintenance. Adding indexes is one of the first actions to be considered to improve query and update performance. However, reevaluating indexes and dropping some should also be considered in the optimization of a particular update operation if it is taking too long.
- Hash versus tree index: a B+ tree index is usually preferable because of its versatility with both range queries and equality queries. It is usually the default indexing mechanism in a DBMS. However, hash clustering can be used for lists of values tables (e.g., gender, title, etc.)
- Clustering indexes can sometimes lead to performance benefits. It is recommended that the supplier not cluster any indexes because this task is best suited for the database administrator in charge of operating the database.

8.3 Tuning the Database

The purpose of tuning is to adjust the parameters of the database engine in the light of the changes that are affecting it during operation. Some known causes for the decay of performance during operation are the increase in the number of users and the increase in the number of operations with time.

The database is tuned during the design phase for the anticipated operation phase. However, no matter how careful the designers are, the tuning must be reevaluated periodically during the operation phase. Preventive routine tuning is the best guard against performance degradation.

The Database administrator (DBA) is responsible for tuning the database system. The DBA should establish measurable tuning goals.
Recommended measurable tuning goals are:

- Response times: Response times address how long it takes for a user to receive data from a request, i.e., the result set of a query, or the time it takes to update a table, or generate a report.
- Database availability: Backups, changing tuning parameters and other housekeeping should be done as fast as possible.
- Memory utilization: Excessive paging and swapping can impact database and operating system performance.
- Disk utilization: Contention for disks should be kept to a minimum. The distribution of the data on disks shall be monitored for early detection of lack of free space in disks and table spaces.

The specificity of the tuning performed is bound to the database engine and vendor so more cannot be elaborated on the subject by this segment. However, the guidelines specify that routine preventive tuning be performed on database systems twice a year as part of the maintenance agreement. The database vendor providing the maintenance is the best candidate to perform tuning on the database with the coordination of the DBA.

8.4 Auditing the Database

Auditing is the ability to trace user access and manipulation of data. There is no magic solution to this problem. Usually, the DBA sets programs to trace a specific problem before finding the culprit.

The programs that the DBA can use to audit access to the database either come with the DBMS package, or are custom developed by the DBA.

The Audit Trail utilities that come with the DBMS can be parameterized to trace specific problems, such as tracing a user session, activity (UPDATE, DELETE), or tables (Who is accessing this particular table). Anything beyond these choices leads to the development of programs and database triggers.

The guidelines for auditing are:

- Define the tables that need to be traced and the activities of interest on these tables
- Define specific fields that need to be traced. For example, change in the value of the salary.
- Add username, date and time stamp fields in every table. Up to four fields can be used: the user who created the record, date and time of creation, the last user who modified the record, date and time of last modification. These fields by themselves are not particularly helpful as they record only the last modification action. In order to obtain a log of any and all modifications on a table, the Audit Trail must be used in conjunction with these fields.
- Records of sensitive tables are never deleted; rather they are moved into a history table with the same name as the original table along with the name of the user who deleted the record and the date and time of deletion. The same activity can be made to take place for updates, i.e., the before image is moved into a history table
- Use database triggers and not application procedures for recording records in the history tables
- The DBA must clean up the Audit Trail and the History tables periodically.

8.5 Backup, Disaster Recovery and Contingency Planning

These issues are discussed in the Segment on Security. However, here is a list of issues that are Database Systems specific:

- A Contingency Plan shall be developed by the DBA to document how many days of work are lost due to a crash when the best recovery is possible and how the data lost can be recovered. To minimize the department's down time, the contingency plan must describe the methods used to revert to manual operations and the methods of re-entering the manual data into the database once it has been recovered. Usually in database systems a paper record exists for transactions being entered (application forms, change of address forms, etc.). Such records must be identified by the plan, as well as the methods for reverting to manual operation and for going back to the automated system.
- Logical Backups are not acceptable if they are the only method of backup being performed. Logical backups are software specific copies of the database.
Logical backups are performed using a tool available in the DBMS, which is why they cannot be depended upon on their own. If the version of the database is upgraded, or if the DBMS is not available, the logical backup is also unavailable and all hope of recovery is lost. The Export and Import utilities are examples of logical backups.
- Physical Backups (disk to disk copy) must be available for the whole database even if that means shutting the database down periodically. The recommended frequency is weekly and always before upgrades.
- The DBA must be trained on the DBMS with regard to Disaster Recovery. DBMSs offer sophisticated methods for recovery. Some DBMSs allow recovery of the database from log files and archive files, so that it is not necessary to restore the database from other media at every failure.
- The database scripts, commands and programs used to create the database and populate it, or used to upgrade the database system, must be secured and made available. Clear documentation of the installation and upgrade procedures must also be available. This is in case the database is unrecoverable and needs to be built from scratch.

8.6 Documenting the Database System

From its inception in the design phase of the software development process to its daily operation requirements, the database system needs a set of complementary documents to explain it and manage it. The various sections of this standard name and describe the relevant documentation needed for the section.

The following table summarizes the list of the documentation needed along with the section number of this segment where the document is discussed.
Document Name                                               Section
Auditing Requirements                                       8.2
Back Up and Recovery Plan                                   8.5
Data Dictionary                                             7.2
Database System Architecture                                6.0
Database Tuning Log                                         8.7
Disk Storage and Allocation Plan                            8.2.1
ER Diagram (under Data Modeling)                            7.1
Errors Log (under Roles and Responsibilities)               8.7
Installation Log (under Decomposition and Normalization)    7.5
List of Database Triggers                                   7.6
List of functions accessing the database                    4.2.2
Replication Plan                                            6.2.4
Security Plan                                               8.1

Figure 5: Database Systems Documentation Checklist

Kindly use the template for this checklist which can be downloaded from OMSAR's website for ICT Standards and Guidelines at www.omsar.gov.lb/ICTSG/DB.

8.7 Roles and Responsibilities During Operation and Maintenance

Software systems cannot operate without human intervention. They need operators who know how they work.

Consider an application that records employee attendance through the use of a hardware interface (hand or fingerprint machine). The inputs and outputs of such an application are known and become limited with time (i.e., who is late, who is absent, is Friday different from other week days, etc.).

Now consider a database system that is used as a financial ERP system. As work procedures evolve and awareness is spread about the contents of the database, more and more expectations will be generated about the system. Management will play WHAT IF scenarios and need new reports to confirm their suppositions. Through time, modifications in the business rules of the organization will require resetting parameter values at best and modifying the application source code itself in the worst case. Finally, user error may cause data corruption and require the intervention of professionals.

Therefore, because of their nature, database systems need on site human resources for their successful operation, administration and maintenance. Without thorough understanding of how the system functions, no one can use the database system.
Most importantly, a clear statement of who owns and is responsible for the maintenance of the database is crucial for the successful operation of the database system. The following roles for the successful operation and maintenance of a database system are identified:

Role: System Owner
Description: Department head (not from the ICT Unit) in the organization. The functions performed by the system are the functions performed by his/her department. If the database system performs the functions of more than one department, then each department head is a Module Owner.
Duties:
- Understands the full functionality of the system
- Reviews error logs with the ICT Manager and decides on the action to follow
- Requests new reports
- Requests new functionality

Role: Super User
Description: A user appointed by the System Owner who has in-depth functional knowledge of one or more modules. This user can teach others.
Duties:
- Uses the system
- Reports errors to the ICT Unit
- Routine control/checking of data entered by fellow users

Role: User
Description: A regular department employee or a data entry clerk.
Duties:
- Uses the system
- Reports errors to the super user
- Reports errors to the ICT Unit

Role: ICT Manager
Description: Head of the ICT Unit
Duties:
- Manages the department
- Prepares a summary of all errors (errors log) reported for review and approval by the system owner

Role: Network Specialist
Duties:
- Fixes operating system problems
- Fixes hardware problems such as printers and monitors
- Fixes network problems

Role: Programmer Analyst
Duties:
- Develops new reports
- Fixes errors from the approved error log as assigned by the ICT Manager
- Updates the system documentation (User Guide)
- Assists users

Role: DBA
Duties:
- Understands the database schema
- Performs backups and recoveries
- Performs routine tuning
- Solves database performance problems
- Responsible for database security and user audit

Figure 6: Roles and Responsibilities

Figure 6 displays the distribution of the people involved within the hierarchy of the organization.
It is of vital importance that each player understands his or her role and responsibility. The role of the system owner should not be diminished because of computer phobia. The role of the ICT Unit should be limited to operating and supporting the system. The ICT Unit is not the owner of the database system. The database system owner should be the owner of the business process.

By analogy, the telephone company (which is technically responsible for the telephone lines) should not change telephone numbers or ring volumes without the prior knowledge and consent of its subscribers.

Figure 7: Human Resources Involved in the Operation

Note that when users report errors or request help, any of the following can be true:
- The user is misusing the system.
- The user is unaware that the functionality needed already exists in the system.
- The user has uncovered a bug (an anomaly in computing). It is important to teach users to try to repeat the error, because knowing the sequence that caused the error is the first step in correcting it.
- The user has a hardware, network, or operating system problem.
- The user is requesting new functionality.

It is especially important to have a clearly defined policy that identifies what is an error and what is a change request. This process is explained in the Segment on Change Management. The role of the ICT Unit staff is to keep an Error Log that differentiates between the types of errors, especially between bugs and new functionality. New functionality needs the approval and the planning of the system owner.

Note: Even though this section states how a change request originates, it does not concern itself with how to perform the requested change. The Configuration Management segment takes care of that.

8.8 Training

Two types of training are specified for database systems operators:
- Formal Technical Training
- Functional Training

Technical training mostly concerns the ICT Unit.
Topics should include:
- Database administration, tuning and especially backup and recovery
- Programming
- Reports development
- Decision support systems
- Network administration

Database system users can also attend technical training. For them, the topics should be limited to operating system and word processing skills. The object of this training is to increase the user's confidence in computing and to reward the user.

Functional training is administered by the supplier of the application or COTS product. It is essential that a group of super users be identified early on and trained thoroughly on the system. The group of super users may administer additional functional training. Functional training should be administered periodically (e.g., every year) to refresh awareness of the database system.
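As an illustration of the Error Log described in Section 8.7, the following is a minimal sketch in Python of how the ICT Unit might record user reports and separate bugs from requests for new functionality. The class and function names (ReportType, ErrorLogEntry, summarize, change_requests) and the sample entries are hypothetical and are not prescribed by this standard; the five report types simply mirror the five cases listed in Section 8.7.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReportType(Enum):
    """The five cases listed in Section 8.7 (names are illustrative)."""
    MISUSE = "user is misusing the system"
    UNKNOWN_FEATURE = "user is unaware of existing functionality"
    BUG = "bug"
    INFRASTRUCTURE = "hardware, network, or operating system problem"
    CHANGE_REQUEST = "request for new functionality"


@dataclass
class ErrorLogEntry:
    reported_by: str
    description: str
    report_type: ReportType
    # Sequence of actions that caused the error, if the user could repeat it.
    steps_to_reproduce: str = ""


def summarize(entries: List[ErrorLogEntry]) -> Counter:
    """Count entries per report type, e.g. for the ICT Manager's summary."""
    return Counter(entry.report_type for entry in entries)


def change_requests(entries: List[ErrorLogEntry]) -> List[ErrorLogEntry]:
    """Entries that are new functionality, to be planned by the system owner."""
    return [e for e in entries if e.report_type is ReportType.CHANGE_REQUEST]


# Hypothetical sample log.
log = [
    ErrorLogEntry("clerk1", "Error when saving an invoice", ReportType.BUG,
                  "Open invoice form, leave date empty, press Save"),
    ErrorLogEntry("clerk2", "Need a monthly overtime report",
                  ReportType.CHANGE_REQUEST),
    ErrorLogEntry("clerk3", "Printer does not respond",
                  ReportType.INFRASTRUCTURE),
]
summary = summarize(log)
```

Keeping the report type as an explicit field makes the distinction between an error and a change request visible in the log itself, so that the summary submitted to the system owner can list new functionality separately from bugs.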