Databases

Definition - Database

It is often said that we live in an information society and that information is a very valuable resource (or, as some people say, information is power). In this information society, the term database has become rather common, although its meaning seems to have become somewhat vague. Some people use the term database of an organisation to mean all the data in the organisation (whether computerised or not). Other people use the term to mean the software that manages the data. We use it to mean a collection of computerised information that is available to many people for various uses.

Some authors define a database as a collection of data that is organised so that its contents can easily be accessed, managed, and updated, or simply as a self-describing collection of integrated records. More encompassing definitions are as follows:

1. A database is a collection of interrelated data designed to meet the varied information needs of an organisation. It has two important properties: it is integrated and it is shared. Integration reduces data redundancy and facilitates data access, since previously distinct data files are logically and coherently organised. Sharing allows all potential users in an organisation to have access to the same data for use in a variety of activities. Databases contain aggregations of data records or files, such as sales transactions, product catalogs and inventories, and customer profiles.

2. A database is a well-organised collection of data that are related in a meaningful way, which can be accessed in different logical orders but are stored only once. The data in the database is therefore integrated, structured, and shared.

Some of the Database characteristics

- Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organisation.
- System catalog (metadata) provides a description of the data to enable program-data independence.
- Logically related data comprises entities, attributes, and relationships of an organisation's information.
- Hierarchy of data elements (bits => bytes => fields => records => files => database)
- A database is self-describing:
  - The database contains a description of its own structure
  - Data dictionary, data directory, metadata
- Why is self-describing important?
  - Promotes program/data independence
  - Changing the structure of data affects few programs
- A database is a model of a model:
  - A database is a model of the user's model, not a model of reality
- Databases vary in their level of detail: the degree of detail depends on the information desired => an important part of designing a database

The main features of data in a database therefore are:
- It is well organised
- It is related
- It is accessible in different orders without great difficulty
- It is stored only once

It is assumed that operations (update, insert, retrieve, etc.) on the database can be carried out in a simple and flexible way. Also, since a database tends to be a long-term resource of an organisation, it is expected that planned as well as unplanned applications can (in general) be carried out without great difficulty.

In any modern organisation, a large amount of data is generated about its operations. This data is sometimes called operational data. The operational data:
1. Includes the data an organisation must necessarily maintain about its operation.
2. Includes relationships linking basic entities.
3. Excludes input and output data, work queues, temporary results or any other transient information.

Since data is a valuable resource for any enterprise, often a great deal of money is spent collecting, storing and maintaining data. The running of an enterprise may depend on proper maintenance of its operational data.

For example, a university's operational data may include the following:
1. Student personal data (e.g. name, sex, current address, home address, date of birth, nationality)
2. Student academic data (e.g. school results, academic history, current enrolment)
3. Academic staff data (e.g. name, sex, current address, date of birth, nationality, academic qualifications, appointment history, current appointment, salary history, current salary, sabbatical leave information, recreational leave, sick leave, etc.)
4. Non-academic staff data (e.g. name, sex, current address, date of birth, nationality, trade qualifications and experience, appointment history, current appointment, salary history, current salary, recreational leave, sick leave, etc.)
5. Subjects offered data (e.g. subject name, department, syllabus, lecturer, quota if any)
6. Financial data (e.g. budget information, receipts, expenditure, etc.)

The above is certainly not a complete list of the data that a university generates (for example, data generated by the library is not included) and the reader will no doubt think of other information that should be included.

The above data would have a number of classes of users, for example:
1. Lecturers, who need information about enrolments in the subjects that they teach, and Heads of Departments, who need information about the finances of their departments. These are users who use the information in the database but do not develop their own software. They are often called end users.
2. Programmers in the NUST ICTS department who develop programs (called application programs) to produce reports that are needed regularly by the government, the university administration and the departments. These programmers are called applications programmers. They are also responsible for developing programs that assist the end users in retrieving the information that they need in their work.
3. The Registrar or some other person who is in charge of the database and makes decisions about what information is stored in the database and who can modify and access it.
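
As an illustration only, a small part of the university's operational data described above could be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and are not taken from any real university system. The sketch shows related data stored once, a query a lecturer (an end user) might need, and the database describing its own structure through its catalog.

```python
import sqlite3


def build_university_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (
            student_id    INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            date_of_birth TEXT
        );
        CREATE TABLE subject (
            subject_code TEXT PRIMARY KEY,
            subject_name TEXT NOT NULL,
            lecturer     TEXT
        );
        -- The relationship linking the two basic entities.
        CREATE TABLE enrolment (
            student_id   INTEGER REFERENCES student(student_id),
            subject_code TEXT REFERENCES subject(subject_code),
            PRIMARY KEY (student_id, subject_code)
        );
        """
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Student A", "2001-03-14"), (2, "Student B", "2000-11-02")],
    )
    conn.execute("INSERT INTO subject VALUES ('DB101', 'Databases', 'Lecturer X')")
    conn.executemany("INSERT INTO enrolment VALUES (?, ?)", [(1, "DB101"), (2, "DB101")])
    conn.commit()
    return conn


def enrolled_students(conn: sqlite3.Connection, subject_code: str) -> list:
    """Names of students enrolled in one subject (a typical end-user request)."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM student AS s
        JOIN enrolment AS e ON e.student_id = s.student_id
        WHERE e.subject_code = ?
        ORDER BY s.name
        """,
        (subject_code,),
    ).fetchall()
    return [row[0] for row in rows]


def table_names(conn: sqlite3.Connection) -> list:
    """Read the table names back from SQLite's own catalog (sqlite_master)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = build_university_db()
print(enrolled_students(conn, "DB101"))
print(table_names(conn))
```

Each student's name is stored once in the student table; the enrolment table only records the relationship, which is what allows the same data to be retrieved in different logical orders.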
To conclude we note that data:
- Is a valuable resource and an investment
- Is needed to manage other resources efficiently
- Should be managed like other resources (e.g. manpower)

Database application

An application program (or set of related programs) that is used to perform a series of database activities (create, read, update, and delete) on behalf of database users.

Characteristics of the database approach

A single repository of data is maintained that is defined once and then accessed by various users.

- Self-describing nature of a database system
  The database system contains not only the database itself but also a complete definition or description of the database. This definition is stored in the system catalog. The system catalog contains information such as the structure of each file, the type and storage format of each data item, etc. The information in the catalog is called meta-data.
- Insulation between programs and data, and data abstraction
  - In traditional file processing the structure of files is embedded in the access programs, so any change to the structure of a file may require changing all programs that access the file. In the database approach the structure of data files is stored in the DBMS catalog separately from the access programs. This is called program-data independence.
  - When we consider object-oriented databases we may also talk of program-operation independence.
  - Data abstraction allows program-data independence and program-operation independence.
- Support of multiple views of the data
- Sharing of data and multi-user transaction processing

Traditional File systems vs. database systems

File-based Systems

A collection of application programs that perform services for the end users (e.g. reports). Each program defines and manages its own data:
- each user defines and implements the files needed for each specific application;
- there is redundancy in defining and storing data, which in turn results in wasted storage space.

Why is a database system needed?
Limitations of File-based Approach

1. File processing systems store groups of records in separate files…
2. Separation and isolation of data
   - Each program maintains its own set of data.
   - Users of one program may be unaware of potentially useful data held by other programs.
3. Duplication of data (redundancy)
   - The same data is held by different programs.
   - Wasted space and potentially different values and/or different formats for the same item => data integrity problem: inconsistent results are produced.
4. Data dependence / application program dependency
   - File structure/format and records are defined in the program/application code.
   - Changes in formats must be reflected in the code.
   - These are time-consuming and error-prone tasks.
5. Incompatible file formats
   - Programs are written in different languages, and so cannot easily access each other's files; files produced by programs written in different languages cannot readily be combined or compared.
6. Fixed queries / proliferation of application programs
   - Programs are written to satisfy particular functions. Any new requirement needs a new program.
7. Difficulty of representing data from the user's viewpoint
   - Relationships among records are not readily represented or processed.

Database Approach

Arose because:
- Definition of data was embedded in application programs, rather than being stored separately and independently.
- No control over access and manipulation of data beyond that imposed by application programs.

Result:
- The database and Database Management System (DBMS).

Database Approach
- Integrated data
  - All the application data is stored in a database
  - Programmer is not responsible for co-ordinating files; DBMS will do it.
- Less duplication of data
  - Data is stored in only one place
  - Fewer data integrity problems
- Program/data independence
  - Record formats are stored in the DB itself, so they are accessed by the DBMS, not by application programs
  - Minimises the impact of data format changes on application programs
- Easier representation of the user's view of data
- Controlled access to database

Controlled access to database may include:
- A security system.
- An integrity system.
- A concurrency control system.
- A recovery control system.
- A user-accessible catalog.

Benefits of the database approach:

- Minimal data redundancy and improved data consistency
  The concept of normalisation ensures that there is reduced data redundancy in a database.
- Ease of access to data / improved data accessibility and responsiveness
  Data in a database is interrelated and is in the same format. This facilitates better data retrieval for general use.
- Increased development productivity / ease of application development and reduced program maintenance / reduced application development time
  New application programs to manipulate data can be written with ease because the data is integrated and is in the same format. Designing and implementing a new database from scratch may, however, take more time.
- Improved data sharing
  Database systems allow multiple access and update of data in a consistent manner. They also allow different views of the same data.
- Enforcement of standards / improved security and integrity
- Improved data quality
- Availability of up-to-date information
- Flexibility / program-data independence
  Data is independent from applications and shared by multiple users and applications. It should be possible to effect changes to an application program that accesses the data without having to change the structure of the data itself. Similarly it should be possible to change the structure of the data without affecting the application program that operates on it.
- Persistence
  It is possible to maintain data over long periods of time, independent of any program that accesses it.
- Resilience
  A database environment can provide for the ability of data to survive hardware and software failures without sustaining loss or becoming inconsistent.

To sum up, the potential benefits of the database approach are:
a. Program-data independence
b. Minimal data redundancy
c. Improved data consistency
d. Improved data sharing
e. Increased development productivity / reduced application development time
f. Enforcement of standards
g. Improved data quality
h. Improved data accessibility and responsiveness
i. Reduced program maintenance

The DBMS

Database management systems evolved from generalised routines for file processing, as users demanded more extensive and more flexible facilities for managing data.

A DBMS is a software system that enables users to define, create, and maintain the database and which provides controlled access to this database. Alternatively a DBMS can be defined as:
- A collection of software that allows the creation and maintenance of the database.
- Software that facilitates the process of defining, constructing and manipulating a database for various applications.
- A software application which allows the storage, retrieval, and manipulation of information in a prescribed format. In its purest form, a DBMS does not allow for unformatted data. This restriction allows quick indexing, sorting, and other data processing.

The DBMS interfaces with application programs, so that multiple applications and users can use the data contained in the database. It is an intermediate link between the physical database, the computer and the operating system on the one hand, and the users on the other.

A DBMS relieves the user from having to know about exact physical representations of data and having to specify detailed algorithms for storing, updating and retrieving data.
The DBMS allows users to deal with the data in abstract terms, rather than as the computer stores the data.

A DBMS can be thought of as a file manager that manages data in databases rather than files in file systems. The DBMS manages user requests (and requests from other programs) so that users and other programs are free from having to understand where the data is physically located on storage media and, in a multi-user system, who else may also be accessing the data. In handling user requests, the DBMS ensures the integrity of the data (that is, making sure it continues to be accessible and is consistently organised as intended) and security (making sure only those with access privileges can access the data).

The most typical DBMS is a relational database management system (RDBMS). A standard user and program interface is the Structured Query Language (SQL). A newer kind of DBMS is the object-oriented database management system (ODBMS).

A DBMS is usually an inherent part of a database product. On PCs, Microsoft Access is a popular example of a single-user or small-group DBMS. Microsoft's SQL Server is an example of a DBMS that serves database requests from multiple (client) users.

A database system consists of:
- The database (data)
- A DBMS (software)
- A DDL and a DML (part of the DBMS)
- Application programs

Defining a DB
This involves specifying the data types, structures and constraints for the data to be stored in the database.

Constructing the DB
The process of storing the data itself (populating the DB with data) on some storage medium that is controlled by the DBMS.

Manipulating the DB
This includes performing such functions as querying the DB to retrieve specific data/information, updating the DB, report generation, etc.

Advantages of using a DBMS

There are three main features of a database management system that make it attractive to use a DBMS in preference to more conventional software.
These features are centralized data management, data independence, and systems integration.

In a database system, the data is managed by the DBMS and all access to the data is through the DBMS, which is a key to effective data processing. This contrasts with conventional data processing systems where each application program has direct access to the data it reads or manipulates. In a conventional DP system, an organization is likely to have several files of related data that are processed by several different application programs.

In conventional data processing, the application programs are usually based on considerable knowledge of data structure and format. In such an environment any change of data structure or format would require appropriate changes to the application programs. These changes could be as small as the following:
1. The coding of some field is changed. For example, a null value that was coded as -1 is now coded as -9999.
2. A new field is added to the records.
3. The length of one of the fields is changed. For example, the maximum number of digits in a telephone number field or a postcode field needs to be changed.
4. The field on which the file is sorted is changed.

In a DBMS, all files are integrated into one system, thus reducing redundancies and making data management more efficient. In addition, the DBMS provides centralized control of the operational data. Some of the advantages of data independence, integration and centralized control are:

1. Redundancies and inconsistencies can be reduced
In conventional data systems, an organization often builds a collection of application programs created by different programmers and requiring different components of the operational data of the organization. The data in conventional data systems is often not centralized. Some applications may require data to be combined from several systems.
These several systems could well have data that is redundant as well as inconsistent (that is, different copies of the same data may have different values). Data inconsistencies are often encountered in everyday life. For example, we have all come across situations where, after a new address is communicated to an organization that we deal with (e.g. a bank, or Telecom, or a gas company), some of the communications from that organization are received at the new address while others continue to be mailed to the old address. Combining all the data in a database would reduce redundancy as well as inconsistency. It is also likely to reduce the costs of collecting, storing and updating data.

2. Better service to the users
A DBMS is often used to provide better service to the users. In conventional systems, availability of information is often poor since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and how up to date it is are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unforeseen information requests.

Centralizing the data in a database also often means that users can obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who do not know programming to interact with the data more easily.

The ability to quickly obtain new and combined information is becoming increasingly important in an environment where various levels of government are requiring organizations to provide more and more information about their activities. An organization running a conventional data processing system would require new programs to be written (or the information compiled manually) to meet every new demand.

3. Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These changes are more easily made in a database than in a conventional system, in that they need not have any impact on application programs.

4. Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data is centralized in a database than when it is stored in conventional file systems. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up a database and developing and maintaining application programs to be lower than for a similar service using conventional systems. This is because the productivity of programmers can be substantially higher when using the non-procedural languages that have been developed with modern DBMSs than when using procedural languages.

5. Standards can be enforced
Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of the data, the format of the data, the structure of the data, etc.

6. Security can be improved / security enforcement is possible
In conventional systems, applications are developed in an ad hoc manner. Often different systems of an organisation would access different components of the operational data. In such an environment, enforcing security can be quite difficult.
Setting up a database makes it easier to enforce security restrictions since the data is now centralized. It is easier to control who has access to what parts of the database. However, setting up a database can also make it easier for a determined person to breach security. We will discuss this in the next section.

7. Integrity can be improved
Since the data of an organization using a database approach is centralized and would be used by a number of users at a time, it is essential to enforce integrity controls.
Integrity may be compromised in many ways. For example, someone may make a mistake in data input and the salary of a full-time employee may be input as $4,000 rather than $40,000. A student may be shown to have borrowed books but to have no enrolment. The salary of a staff member in one department may be coming out of the budget of another department.

If a number of users are allowed to update the same data item at the same time, there is a possibility that the result of the updates is not quite what was intended. For example, in an airline DBMS we could have a situation where the number of bookings made is larger than the capacity of the aircraft that is to be used for the flight. Controls must therefore be introduced to prevent such errors from occurring because of concurrent updating activities. However, since all data is stored only once, it is often easier to maintain integrity than in conventional systems.

7.1 Availability of up-to-date information to all users

8. Enterprise requirements can be identified
All enterprises have sections and departments, and each of these units often considers the work of its unit the most important and therefore considers its needs the most important. Once a database has been set up with centralized control, it will be necessary to identify enterprise requirements and to balance the needs of competing units. It may become necessary to ignore some requests for information if they conflict with higher-priority needs of the enterprise.

9. Data model must be developed
Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the enterprise be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand. The overall view is often not considered. Building an overall view of the enterprise data, although often an expensive exercise, is usually very cost-effective in the long term.
Data independence allows dynamic changes and growth potential.

Disadvantages of using a DBMS

A database system generally provides on-line access to the database for many users. In contrast, a conventional system is often designed to meet a specific need and therefore generally provides access to only a small number of users. Because of the larger number of users accessing the data when a database is used, the enterprise may face additional risks, compared with a conventional data processing system, in the following areas:
1. Confidentiality, privacy and security.
2. Data quality.
3. Data integrity.
4. Enterprise vulnerability may be higher.
5. The cost of using a DBMS.

Confidentiality, Privacy and Security / problems associated with centralization
When information is centralised and is made available to users from remote locations, the possibilities of abuse are often greater than in a conventional data processing system. To reduce the chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative and, possibly, legal measures. Most databases store valuable information that must be protected against deliberate trespass and destruction.

Data Quality
Since the database is accessible to users remotely, adequate controls are needed to control users updating data and to control data quality. With an increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.

Data Integrity
Since a large number of users could be using a database concurrently, technical safeguards are necessary to ensure that the data remain correct during operation. The main threat to data integrity comes from several different users attempting to update the same data at the same time. The database therefore needs to be protected against inadvertent changes by the users.
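
The kind of safeguard described above can be sketched for the airline overbooking example mentioned earlier. This is a minimal illustration using Python's built-in sqlite3 module with a single in-memory connection; the table, the flight number XX100 and the function names are hypothetical. It does not simulate truly concurrent users (a multi-user DBMS relies on its own locking and transaction machinery for that). It only shows two controls: a declared constraint the DBMS enforces, and a booking made as one conditional update so that the capacity check and the change cannot be separated.

```python
import sqlite3


def make_flight_db(capacity: int) -> sqlite3.Connection:
    """Create one hypothetical flight with the given seat capacity."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE flight (
            flight_no TEXT PRIMARY KEY,
            capacity  INTEGER NOT NULL,
            booked    INTEGER NOT NULL DEFAULT 0,
            CHECK (booked <= capacity)  -- integrity rule enforced by the DBMS
        )
        """
    )
    conn.execute(
        "INSERT INTO flight (flight_no, capacity) VALUES ('XX100', ?)", (capacity,)
    )
    conn.commit()
    return conn


def book_seat(conn: sqlite3.Connection, flight_no: str) -> bool:
    """Book one seat if any is left; return True when the booking succeeded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE flight SET booked = booked + 1 "
            "WHERE flight_no = ? AND booked < capacity",
            (flight_no,),
        )
        return cur.rowcount == 1


conn = make_flight_db(capacity=2)
results = [book_seat(conn, "XX100") for _ in range(3)]
print(results)  # the third attempt is refused: [True, True, False]
```

Even if an application forgot the `booked < capacity` condition, the CHECK constraint would still make the DBMS reject an update that pushes bookings past capacity.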
Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database becomes an indispensable resource. The survival of the enterprise may depend on reliable information being available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification of the database.

The Cost of using a DBMS
Conventional data processing systems are typically designed to run a number of well-defined, preplanned processes. Such systems are often "tuned" to run efficiently for the processes that they were designed for. Although conventional systems are usually fairly inflexible, in that new applications may be difficult to implement and/or expensive to run, they are usually very efficient for the applications they are designed for.

The database approach, on the other hand, provides a flexible alternative where new applications can be developed relatively inexpensively. The flexible approach is not without its costs, and one of these is the additional cost of running the applications that the conventional system was designed for. Using standardized software is almost always less machine-efficient than using specialized software.

Other costs:
- Cost of software/hardware and migration
- Complexity of backup and recovery

Disadvantages in summary
- Complexity
- Size
- Cost of DBMS
- Additional hardware costs
- Cost of conversion
- Performance
- Higher impact of a failure

In addition, a DBMS provides facilities for:
1. Describing the database, when a database is being set up
2. Authorization specification and checking
3. Access path selection
4. Concurrency control
5. Logging and recovery

To provide all the facilities mentioned, a DBMS is built from a number of components. The main components of the DBMS therefore are:
1. A Query Language and a Data Description Language (DDL) to provide users with access to the database.
2. A translator for users' instructions in the query language and the DDL, including query optimisation.
3. A database manager
4. A file manager
5. The physical database
6. The metadata

The above listing of DBMS components does not include some very important components, e.g. the concurrency controller and the recovery manager. These components have been left out to keep the architecture relatively simple.

The DBMS Architecture

Several different frameworks for the DBMS architecture have been suggested over the years. For example, a framework may be developed based on the functions that the various components of a DBMS must provide to its users. It may also be based on the different views of data that are possible within a DBMS. A commonly used view-of-data approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee). The three levels of the architecture are three different views of the data:
1. External - individual user view
2. Conceptual - community user view
3. Internal - physical or storage view

The three-level (three-schema) database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is the data independence that we discussed earlier.

[Figure: three-level architecture. Users 1, 2 and 3 each see their own external view (View 1, View 2, View 3); the views map to the conceptual schema, which maps to the internal schema and the stored database.]

The view at each of these levels is described by a schema. A schema is an outline or a plan that describes the records and relationships existing in the view. A database schema is a description of the database; it is specified during database design and is not expected to change frequently.

The external level is the view that the individual user of the database has. This view is often a restricted view of the database, and the same database may provide a number of different views for different classes of users.
In general, the end users and even the applications programmers are only interested in a subset of the database. For example, a department head may only be interested in the departmental finances and student enrolments but not the library information. The librarian would not be expected to have any interest in the information about academic staff. The payroll office would have no interest in student enrolments.
- Users' view of the database.
- Describes that part of the database that is relevant to a particular user.

External Level
This is the level at which users interact with the system via applications programs, a host language or a data sub-language. The data definition language (DDL) and the data manipulation language (DML) are the most common interface tools used at this level.

This level describes that part of the database that is relevant to a particular user. It is usual for a user to require only certain tables (or parts of them) containing specific records and logical relationships between these records. Within these records the user may need access to only a few selected fields in order to perform the specified tasks. The external schema supplies the user with this limited window on the conceptual schema.

Different views may have different representations of the same data. E.g., user 1 may view dates as (day, month, year) whereas user 2 may view them as (year, month, day). Some views may include derived or calculated data, that is, data not actually stored in the database as such. E.g., ages of employees may be included in a view on an employee relation but are unlikely to be stored. Instead, their dates of birth would be stored and their ages calculated from them by the DBMS.

The external schema also contains the method of deriving the objects in the external view from the objects in the conceptual view. The objects include entities, attributes and relationships.

Conceptual Level
The conceptual view is a representation of the entire information content of the database.
This level describes what data is stored in the database and the relationships among the data. It contains the logical structure of the entire database as seen by the database administrator (DBA). The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.

This level mainly represents:
- all entities, their attributes and their relationships;
- security and integrity information.

This level must not contain any storage-dependent details (e.g., storage structure and access technique).

The schema can be regarded as derived from a model of the organization and should be designed with care, as it is usual for its structure to remain relatively unchanged over the life of the database.

Internal Level
The internal view is a low-level representation of the entire database. This level describes how the data is stored in the database and the access paths for the database.

The internal view is described by means of the internal schema, which defines the various stored record types, how stored fields are represented, what indexes exist, what physical sequence the stored records are in, and so on. It is concerned with storage details that are not part of a logical view of the database.

The internal view does not deal in terms of physical records (blocks or pages) nor with any device-specific considerations such as cylinder or track sizes. In other words the internal view effectively assumes an infinite linear address space; details of how that address space is mapped to physical storage are system-specific.

Hence, it is generally understood that, below the internal level, there is a physical level which is managed by the operating system under the direction of the DBMS.
The physical level below the DBMS consists of items only the operating system knows, such as exactly how the sequencing is implemented and whether the fields of internal records are stored as contiguous bytes on the disk.

The internal view is the view of the actual physical storage of data. It tells us what data is stored in the database and how. At least the following aspects are considered at this level:
1. Storage allocation, e.g. B-trees, hashing etc.
2. Access paths, e.g. specification of primary and secondary keys, indexes and pointers, and sequencing.
3. Miscellaneous, e.g. data compression and encryption techniques, optimisation of the internal structures.
Efficiency considerations are the most important at this level, and the data structures are chosen to provide an efficient database. The internal view does not deal with the physical devices directly. Instead it views a physical device as a collection of physical pages and allocates space in terms of logical pages.

Each user group refers to its own external schema, so the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. The data extracted from the stored database must then be reformatted to match the user’s external view. The processes of transforming requests and results between levels are called mappings.

Mappings
There are two levels of mapping in the architecture:
Conceptual/Internal Mapping - Defines the correspondence between the conceptual view and the stored database (internal view): it specifies how conceptual records and fields are represented at the internal level. This enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema, together with any constraints to be enforced on the operations for that logical record.
External/Conceptual Mapping - Defines the correspondence between a particular external view and the conceptual view. This enables the DBMS to map names in the user’s view onto the relevant part of the conceptual schema.

Objectives of Three-Level Architecture
• All users should be able to access the same data.
• A user’s view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users’ views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.

Views
The view mechanism provides users with only the data they want or need to use. A view allows each user to have his/her own view of the database. A view is essentially some subset of the database. Benefits of views include:
– Reduce complexity;
– Provide a level of security;
– Provide a mechanism to customize the appearance of the database;
– Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed.

Data Independence
Data independence can be thought of as the capacity to change the schema at one level of the database without having to change the schema at the next higher level. It hides implementation and storage details from the programs that use the data, and it protects application programs from changes in the underlying logical organisation and in physical access paths and storage structures. DBMSs such as Oracle provide physical and logical independence because data can be managed separately from the applications that use it. The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence.
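The view mechanism and physical data independence described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the employee table, the staff_directory view, the sample rows and the fixed reference date AS_OF are all hypothetical and do not come from the notes. The view plays the role of an external schema: it hides the salary column and derives each employee's age from the stored date of birth. Adding an index afterwards changes the internal level only, so the same query against the view returns the same rows.

```python
import sqlite3

# Hypothetical fixed reference date so the derived ages are reproducible;
# a live system would more likely use date('now') instead.
AS_OF = "2024-01-01"

conn = sqlite3.connect(":memory:")

# Conceptual level: the complete employee relation (illustrative schema).
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT NOT NULL,"  # stored as ISO text, YYYY-MM-DD
    " salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Moyo", "1980-05-17", 5200.0), (2, "Ncube", "1992-11-03", 4100.0)],
)

# External level: a view exposing only the name and a derived age.
# Age is the year difference, minus 1 if the birthday has not yet
# occurred in the reference year. Salary is not visible through the view.
conn.execute(
    f"""
    CREATE VIEW staff_directory AS
    SELECT name,
           CAST(strftime('%Y', '{AS_OF}') AS INTEGER)
           - CAST(strftime('%Y', date_of_birth) AS INTEGER)
           - (strftime('%m-%d', '{AS_OF}') < strftime('%m-%d', date_of_birth))
           AS age
    FROM employee
    """
)

query = "SELECT name, age FROM staff_directory ORDER BY name"
rows_before = conn.execute(query).fetchall()

# Internal level change: add an access path. Neither the view nor the
# query has to be rewritten, which is physical data independence.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
rows_after = conn.execute(query).fetchall()

view_columns = [d[0] for d in conn.execute("SELECT * FROM staff_directory").description]

print(rows_before)
print(view_columns)
```

Because the user only ever queries staff_directory, the underlying table could also gain extra columns without the user's query changing, which is the kind of insulation the external/conceptual mapping is meant to provide.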
• Logical Data Independence
Refers to the immunity of external schemas to changes in the conceptual schema, or simply the capacity to change the conceptual schema without having to change external schemas or application programs. The mapping between the external and conceptual levels absorbs the changes. The conceptual schema may be changed to expand the database, e.g. by adding a new record type or data item, or to reduce the database, e.g. by removing a record type or data item. Logical data independence insulates application programs from operations such as combining two records into one or splitting an existing record into two or more records. Such changes should not require changes to the external schema or rewrites of application programs.

• Physical Data Independence
Refers to the immunity of the conceptual schema to changes in the internal schema. Physical storage structures or devices used for storing the data could be changed without necessitating a change in the conceptual view or any of the external views. Internal schema changes may include, e.g., using different file organizations or storage structures/devices, or creating additional access structures to improve the performance of retrieval or update. The changes are absorbed by the mappings between the conceptual and internal levels and should not require changes to the conceptual or external schemas.

[Figure: external schemas mapped onto the conceptual schema, which is mapped onto the internal schema]

Functions of a DBMS
– Data Storage, Retrieval, and Update.
– A User-Accessible Catalog.
– Transaction Support.
– Concurrency Control Services.
– Recovery Services.
– Authorization Services.
– Support for Data Communication.
– Integrity Services.
– Services to Promote Data Independence.
– Utility Services.

Components that are part of the DBMS Environment
Hardware, software, data, procedures and people.
– Hardware: Can range from a PC to a network of computers.
– Software: DBMS, operating system, network software (if necessary) and also the application programs.
– Data: Used by the organization, together with a description of this data called the schema.
– Procedures: Instructions and rules that should be applied to the design and use of the database and DBMS.
– People

The Database Administrator (DBA)
The database will be able to meet the demands of various users in the organization effectively only if it is maintained and managed properly. Usually a person (or a group of persons) centrally located, with an overall view of the database, is needed to keep the database running smoothly. Such a person is called the Database Administrator (DBA). The DBA is the custodian of the data, controls the database structure, and administers the three levels of the database.

The DBA would normally have a large number of tasks related to maintaining and managing the database. These tasks would include the following:
1. Deciding and Loading the Database Contents - The DBA, in consultation with senior management, is normally responsible for defining the conceptual schema of the database. The DBA would also be responsible for making changes to the conceptual schema of the database if and when necessary.
2. Assisting and Approving Applications and Access - The DBA would normally provide assistance to end-users interested in writing application programs to access the database. The DBA would also approve or disapprove access to the various parts of the database by different users.
3. Deciding Data Structures - Once the database contents have been decided, the DBA would normally make decisions regarding how data is to be stored and what indexes need to be maintained. In addition, a DBA normally monitors the performance of the DBMS and makes changes to data structures if the performance justifies them. In some cases, radical changes to the data structures may be called for.
4. Backup and Recovery - Since the database is such a valuable asset, the DBA must make every effort to ensure that the asset is not damaged or lost.
This normally requires a DBA to ensure that regular backups of a database are carried out and that, in case of failure (or some other disaster like fire or flood), suitable recovery procedures are used to bring the database up with as little down time as possible.
5. Monitor Actual Usage - The DBA monitors actual usage to ensure that policies laid down regarding use of the database are being followed. The usage information is also used for performance tuning.

Database Languages
• Data Definition Language (DDL)
Allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, i.e. it is used to define the conceptual schema. The DDL is used to create and destroy databases and database objects. Database administrators will primarily use these commands during the setup and removal phases of a database project. The definition includes any associated integrity and security constraints that have to be maintained, such as constraints on the values assigned to a given attribute. These definitions are maintained in a compiled form (usually as a set of tables), and this compiled form is known as the data dictionary, directory or system catalog.
The internal schema is specified using a similar language called the storage definition language (SDL). There is also a third language that is used to specify user views and their mappings to the conceptual schema: the view definition language (VDL).

• Data Manipulation Language (DML)
Provides basic data manipulation operations on data held in the database. Typical manipulation operations include retrieval, insertion, deletion and modification of the data. There are two main types of DML:
1. Procedural or Low-level DML
Allows the user to indicate not only what to retrieve but also how to go about retrieving it. It must be embedded in a general-purpose programming language. It retrieves individual records from the database and processes each record separately.
It makes use of programming language constructs such as looping to retrieve and process each individual record from the set. Hence low-level DMLs are called record-at-a-time DMLs.
2. Non-Procedural or High-level DML, e.g. SQL
In this case the DML statements can either be entered interactively from a terminal or be embedded in a general-purpose programming language. A single statement can specify and retrieve many records at a time, hence these are called set-oriented or set-at-a-time DMLs. They allow the user to state what data is needed rather than how it is to be retrieved. Such languages are also called declarative.

• Fourth Generation Language (4GL)
– Query Languages
– Forms Generators
– Report Generators
– Graphics Generators
– Application Generators

DBMS Component Modules
– Data Definition Language Compiler
– Data Manager
– File Manager
– Disk Manager
– Query Processor
– Query Compiler
– Precompiler
– Communications facilities/Telecommunications system
– Data Dictionary
– Database Access

Data Models
• Models
– “Description or analogy used to visualize something that cannot be directly observed” (Webster’s Dictionary)
– “A model is a representation of the world in simplified terms, it is an abstraction of the real world”
• Data Model
– Relatively simple representation of complex real-world data structures

Data model
- A set of concepts that can be used to describe the structure of a database. The structure of the database is taken here to mean the data types, relationships and constraints that should hold for the data.
- An integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organization.
- Used to interpret, specify, and document requirements for database processing systems.
- Provides a language for expressing the user's data model (structure of data, data relationships).

A data model:
- A logical representation that defines the units of data, and specifies how each unit is related to others
- Communication tool for end users and database designers
- Tools for data models: entity-relationship model, semantic object model

Data model as inferencing
- Users cannot describe data models directly
- Developers infer structures and relationships from the user's statements about forms and reports
- Difficult and challenging in multi-user applications

A data model comprises:
– A structural part;
– A manipulative part;
– Possibly a set of integrity rules.

Purpose of a Data Model
– To represent data in an understandable way.

Many data models have been proposed. Data models can be categorized based on the types of concepts they provide to describe the database structure. Categories of data models include:
– Object-based
– Record-based
– Physical

Object-Based / High-level / Conceptual Data Models
These provide concepts that are close to the way many users perceive data. They use concepts such as entities, attributes and relationships.
o Entity-Relationship model (a popular high-level data model)
o Semantic model: influenced by semantic networks developed by artificial intelligence. Semantic networks were developed to organize and represent general knowledge.
o Functional model
o Object-Oriented model

Record-Based / Representational / Implementation / Traditional Data Models
These hide some details of data storage. They are the ones used most frequently in current commercial DBMSs. They represent data by using record structures.
o Relational Model
o Network Model
o Hierarchical Model
o Object Model
These four models reflect the historical development of database technology.
– Hierarchical model: stores data in the form of hierarchies. Not all systems fit into a hierarchy and this leads to redundancy. Main problem: inflexibility.
– Network model: stores data as a network of inter-linked sets. Main problem: complexity and inflexibility.
– Relational model: data represented as a set of tables. Advanced theoretical support, simplicity and elegance. Limitation: only suitable for relatively simple data structures.
– Object model: treats data as objects with methods, etc. Benefits with complex data structures.

Physical / Low-level Data Models
These provide concepts that describe the details of how the data is stored in the computer by representing information such as record formats, record orderings and access paths. The concepts provided are generally meant for computer specialists, not for end users.

Conceptual Modelling
The conceptual schema is the core of a system supporting all user views. Conceptual modelling is the process of describing the concepts and relationships of a domain that are to be stored in a database. The process takes place within a theoretical framework called a conceptual model. A conceptual model is a data model which formalizes the representation and manipulation of concepts and relationships. The conceptual model defines the language used to describe the domain. A conceptual model is used by a database developer to describe the aspects of a domain which are to be captured by a database. The description of a domain is called a conceptual schema. It should be a complete and accurate representation of an organization’s data requirements. Conceptual modelling is a process of developing a model of information use that is independent of implementation details. The result is a conceptual data model, e.g. E-R, object models etc.

Classification of Database Management Systems
The main criterion normally used to classify DBMSs is the data model on which the DBMS is based.
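The remark above that a hierarchy can lead to redundancy, while the relational model represents data as a set of tables, can be illustrated with a small sketch in plain Python. The department names, course codes and the helper departments_offering are hypothetical and chosen only for illustration; the sketch shows the shape of the data under each model and is not how any particular DBMS stores it.

```python
# Hierarchical style: every course record lives under exactly one parent
# department, so a course shared by two departments has to be repeated.
hierarchical = {
    "Computer Science": [
        {"code": "C101", "title": "Databases"},
        {"code": "S201", "title": "Statistics"},
    ],
    "Mathematics": [
        {"code": "S201", "title": "Statistics"},  # second copy of the same course
    ],
}

# Relational style: each fact is stored once in its own table (a list of
# tuples here), and a separate linking table records which department
# offers which course.
departments = [("CS", "Computer Science"), ("MA", "Mathematics")]
courses = [("C101", "Databases"), ("S201", "Statistics")]
offers = [("CS", "C101"), ("CS", "S201"), ("MA", "S201")]

# Count how many times the shared course is physically recorded.
copies_hierarchical = sum(
    1
    for course_list in hierarchical.values()
    for course in course_list
    if course["code"] == "S201"
)
copies_relational = sum(1 for code, _title in courses if code == "S201")


def departments_offering(course_code):
    """Join the offers table to the departments table for one course."""
    names = dict(departments)
    return sorted(names[dept] for dept, code in offers if code == course_code)


print(copies_hierarchical, copies_relational)
print(departments_offering("S201"))
```

In the hierarchical form, renaming the shared course means updating every copy; in the relational form the title is changed in one row and the relationship to each department is untouched.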
Factors that may drive an organization to switch to a DBMS
– Data complexity: As data relationships become more complex, the need for a DBMS is felt more strongly.
– Dynamically evolving or growing data: If the data changes constantly, it is easier to cope with these changes using a DBMS than using a file system.
– Sharing among applications: The greater the sharing among applications, the more the redundancy among files, and hence the greater the need for a DBMS to integrate the data.
– Frequency of ad hoc requests for data: File systems are not at all suitable for ad hoc retrieval of data.
– Data volume and need for control: The sheer volume of data and the need to control it sometimes demand a DBMS.

Economic and organizational factors that affect the choice of a DBMS
– Structure of the data: e.g. a hierarchical structure suggests a hierarchical DBMS, while a network or relational system may be more appropriate for data with many interrelationships.
– Familiarity of personnel with the system: Their familiarity with a particular DBMS may reduce training costs and learning time.
– Availability of vendor services: This matters for solving problems with the system and for getting assistance.
– Costs: Software and hardware acquisition costs, maintenance cost, database creation and conversion cost, personnel cost, operating cost and training costs.