National University of Science and Technology

Databases
Definition - Database
It is often said that we live in an information society and that information is a very
valuable resource (or, as some people say, information is power). In this information
society, the term database has become a rather common term although its meaning seems
to have become somewhat vague. Some people use the term database of an organisation
to mean all the data in the organisation (whether computerised or not). Other people use
the term to mean the software that manages the data. We use it to mean a
collection of computerised information that is available to many people for
various uses. Some authors define a database as a collection of data that is
organised so that its contents can easily be accessed, managed, and updated, or simply as
a self-describing collection of integrated records. More encompassing
definitions are as follows:
1. A database is a collection of interrelated data designed to meet the varied information
needs of an organisation. It has two most important properties: it is integrated and it is
shared.
Integration reduces data redundancy and facilitates data access, since previously distinct
data files will have been logically and coherently organised. Sharing allows all potential
users in an organisation to have access to the same data for use in a variety of ways and
activities.
Databases contain aggregations of data records or files, such as sales transactions,
product catalogs and inventories, and customer profiles.
2. A database is a well-organised collection of data that are related in a meaningful way,
which can be accessed in different logical orders but are stored only once. The data in the
database is therefore integrated, structured, and shared.
Some database characteristics
- Shared collection of logically related data (and a description of this data),
designed to meet the information needs of an organisation.
- System catalog (metadata) provides a description of the data to enable program-data
independence.
- Logically related data comprises the entities, attributes, and relationships of an
organisation's information.
Hierarchy of data elements
(bits => bytes => fields => records => files => database)
A database is self-describing:
- Database contains a description of its own structure
- Data dictionary, data directory, metadata
Why is self-describing important?
- Promotes program/data independence
- Changing the structure of data affects few programs
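As a minimal sketch of what "self-describing" can look like in practice, the Python below uses the standard sqlite3 module. SQLite is chosen here only for illustration (these notes do not name a product at this point), and the student table is a hypothetical example. The program asks the database for its own structure instead of hard-coding it.

```python
import sqlite3

# In-memory SQLite database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)"
)

def describe(conn):
    """Return {table name: [column names]} read from the system catalog."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        catalog[table] = [col[1] for col in columns]
    return catalog

print(describe(conn))
```

Because the description lives in the database, a generic tool such as describe() keeps working when tables are added or changed.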
A database is a model of a model
- A database is a model of the user's model, not a model of reality
- Databases vary in their level of detail:
the degree of detail depends on the information desired
=> important part of designing a database
The main features of data in a database therefore are:
- It is well organised
- It is related
- It is accessible in different orders without great difficulty
- It is stored only once
- It is assumed that operations (update, insert, retrieve, etc.) on the database can be
carried out in a simple and flexible way. Also, since a database tends to be a long-term
resource of an organisation, it is expected that planned as well as unplanned
applications can (in general) be carried out without great difficulty.
- In any modern organisation, a large amount of data is generated about its
operations. This data is sometimes called operational data. The operational data:
1. Includes the data an organisation must necessarily maintain about its operation.
2. Includes relationships linking basic entities.
3. Excludes input and output data, work queues, temporary results and any other transient
information.
Since data is a valuable resource for any enterprise, often a great deal of
money is spent collecting, storing and maintaining data. The running of an enterprise may
depend on proper maintenance of its operational data. For example, a university’s
operational data may include the following:
1. Student personal data (e.g. name, sex, current address, home address, date of birth,
nationality)
2. Student academic data (e.g. school results, academic history, current enrolment)
3. Academic staff data (e.g. name, sex, current address, date of birth, nationality,
academic qualifications, appointment history, current appointment, salary history, current
salary, sabbatical leave information, recreational leave, sick leave, etc.)
4. Non-academic staff data (e.g. name, sex, current address, date of birth, nationality,
trade qualifications and experience, appointment history, current appointment, salary
history, current salary, recreational leave, sick leave, etc.)
5. Subjects offered data (e.g. subject name, department, syllabus, lecturer, quota if any)
6. Financial data (e.g. budget information, receipts, expenditure etc.)
The above data is certainly not a complete list of data that a university generates (for
example, data generated by the library is not included) and the reader will no doubt think
of other information that should be included.
The above data would have a number of classes of users, for example
1. Lecturers, who need information about enrolments in the subjects that they teach and
Heads of Departments about finances of the departments. These are users that use the
information in the database but do not develop their own software. They are often called
end users.
2. Programmers in the NUST ICTS department who develop programs (called
application programs) to produce reports that are needed regularly by the governments,
university administration and the departments. These programmers are called
applications programmers. They are also responsible for developing programs that assist
the end users in retrieving the information that they need in their work.
3. The Registrar or some other person who is in charge of the database and makes
decisions about what information is stored in the database and who can modify and
access it.
To conclude, we note that data:
- Is a valuable resource and an investment
- Is needed to manage other resources efficiently
- Should be managed like other resources (e.g. manpower)
Database application. An application program (or set of related programs) that is used to
perform a series of database activities (create, read, update, and delete) on behalf of
database users.
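The four activities named in this definition can be sketched as a very small application program. This is an illustration only: it assumes SQLite through Python's standard sqlite3 module, and the subject table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subject (code TEXT PRIMARY KEY, name TEXT, quota INTEGER)")

def create_subject(code, name, quota):
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("INSERT INTO subject VALUES (?, ?, ?)", (code, name, quota))

def read_subject(code):
    return conn.execute(
        "SELECT code, name, quota FROM subject WHERE code = ?", (code,)
    ).fetchone()

def update_quota(code, quota):
    with conn:
        conn.execute("UPDATE subject SET quota = ? WHERE code = ?", (quota, code))

def delete_subject(code):
    with conn:
        conn.execute("DELETE FROM subject WHERE code = ?", (code,))

create_subject("DB101", "Databases", 60)
update_quota("DB101", 80)
print(read_subject("DB101"))   # ('DB101', 'Databases', 80)
delete_subject("DB101")
print(read_subject("DB101"))   # None
```

The end user only calls these functions; all access to the stored data goes through the DBMS.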
Characteristics of the database approach
- A single repository of data is maintained that is defined once and then accessed by
various users.
- Self-describing nature of a database system
- The database system contains not only the database itself but also a complete definition
or description of the database. This definition is stored in the system catalog. The
system catalog contains information such as the structure of each file, the type and
storage format of each data item, etc. The information in the catalog is called
metadata.
- Insulation between programs and data, and data abstraction
- In traditional file processing the structure of files is embedded in the access
programs, so any change to the structure of a file may require changing all
programs that access the file. In the database approach the structure of data files
is stored in the DBMS catalog, separately from the access programs. This is
called program-data independence.
- When we consider object-oriented databases we may also talk of program-operation
independence.
- Data abstraction allows program-data independence and program-operation
independence.
- Support of multiple views of the data
- Sharing of data and Multi-user transaction processing
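Program-data independence, as listed among the characteristics above, can be illustrated with a short sketch. It assumes SQLite via Python's sqlite3 module; the staff table and the payroll_report function are hypothetical names introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO staff (name, salary) VALUES ('A. Moyo', 40000)")

def payroll_report(conn):
    """An 'existing application': it names only the columns it needs."""
    return conn.execute("SELECT name, salary FROM staff ORDER BY name").fetchall()

before = payroll_report(conn)

# The structure of the stored data changes: a new field is added.
conn.execute("ALTER TABLE staff ADD COLUMN nationality TEXT")

after = payroll_report(conn)   # the application code above was not touched
print(before == after)
```

The record format is held in the catalog, so the program that does not use the new field needs no change.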
Traditional File systems vs. database systems
File-based Systems
Collection of application programs that perform services for the end users (e.g. reports).
Each program defines and manages its own data: each user defines and implements the
files needed for each specific application. There is redundancy in defining and storing
data, which in turn results in wasted storage space.
Why is a database system needed?
Limitations of File-based Approach
1. File processing systems store groups of records in separate files…
2. Separation and isolation of data
- Each program maintains its own set of data.
- Users of one program may be unaware of potentially useful data held by other
programs.
3. Duplication of data (redundancy)
- Same data is held by different programs.
- Wasted space and potentially different values and/or different formats for the same item
=> data integrity problem: produce inconsistent results
4. Data dependence/ application program dependency
– File structure/ format and records are defined in the program/application code.
– Changes in formats must be reflected in the code
– Time consuming and error prone tasks
5. Incompatible file formats
– Programs are written in different languages, and so cannot easily access each other’s
files; files produced by programs in different languages cannot readily be combined
or compared.
6. Fixed Queries/Proliferation of application programs
- Programs are written to satisfy particular functions. Any new requirement needs a new
program.
7. Difficulty of representing data from the user’s viewpoint
- Relationships among records are not readily represented or processed
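The duplication problem in point 3 can be reproduced in a few lines. The sketch below is an illustration only: two hypothetical applications (a registry and a library) each keep their own CSV file, simulated here with in-memory text buffers, and an address change reaches only one of them.

```python
import csv
import io

# Each application keeps its own file; both hold the student's address.
registry_file = io.StringIO("student_id,address\n1001,12 Old Road\n")
library_file = io.StringIO("student_id,address\n1001,12 Old Road\n")

def load(f):
    """Read a whole file into {student_id: address}."""
    f.seek(0)
    return {row["student_id"]: row["address"] for row in csv.DictReader(f)}

def save(f, records):
    """Overwrite the file with the given records."""
    f.seek(0)
    f.truncate()
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["student_id", "address"])
    for student_id, address in records.items():
        writer.writerow([student_id, address])

# The student reports a new address to the registry only.
registry = load(registry_file)
registry["1001"] = "7 New Street"
save(registry_file, registry)

inconsistent = load(registry_file)["1001"] != load(library_file)["1001"]
print(inconsistent)
```

The two copies now disagree, which is exactly the kind of inconsistent result that redundancy makes possible.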
Database Approach
Arose because:
– Definition of data was embedded in application programs, rather than being stored
separately and independently.
– No control over access and manipulation of data beyond that imposed by application
programs.
Result
– The database and Database Management System (DBMS).
Database Approach
- Integrated data
- All the application data is stored in a database
- Programmer is not responsible for co-ordinating files; DBMS will do it.
- Less duplication of data
- Data is stored in only one place
- Fewer data integrity problems
- Program/data independence
- Record formats are stored in the database itself, so they are accessed by the DBMS, not by
application programs
- Minimises the impact of data format changes on application programs
- Easier representation of user’s view of data.
- Controlled access to database
Controlled access to database may include:
– A security system.
– An integrity system.
– A concurrency control system.
– A recovery control system.
– A user-accessible catalog.
Benefits of the database approach:
- Minimal data redundancy and improved data consistency
The concept of normalisation ensures that there is reduced data redundancy in a database.
- Ease of access to data/ Improved data accessibility and responsiveness
Data in a database is interrelated and is in the same format. This facilitates better data
retrieval for general use.
- Increased development productivity/Ease of application development and reduced
program maintenance/Reduced application development time.
New application programs to manipulate data can be written with ease because the data is
integrated and is in the same format. Designing and implementing a new database from
scratch may take more time.
- Improved data sharing
Database systems allow multiple access and update of data in a consistent manner. They
also allow different views of the same data.
- Enforcement of standards/Improved security and integrity
- Improved data quality
- Availability of up to date information
- Flexibility/Program-data independence - Data is independent from applications and
shared by multiple users and applications. It should be possible to effect changes to an
application program that accesses the data without having to change the structure of the
data itself. Similarly it should be possible to change the structure of the data without
affecting the application program that operates on it.
- Persistence
It is possible to maintain data over long periods of time, independent of any program that
accesses it.
- Resilience
The ability of data to survive hardware and software failures without sustaining loss or
becoming inconsistent can be provided for in a DB environment.
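One ingredient of resilience is the all-or-nothing transaction. The sketch below only simulates a software failure between two related updates; it assumes SQLite via Python's sqlite3 module, and the budget table and transfer function are hypothetical. Surviving real hardware failures depends on the logging and recovery facilities of the particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budget (department TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO budget VALUES (?, ?)", [("Physics", 500), ("Chemistry", 300)])
conn.commit()

def transfer(conn, source, target, amount, fail_midway=False):
    """Move money between departments as one all-or-nothing transaction."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE budget SET balance = balance - ? WHERE department = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE budget SET balance = balance + ? WHERE department = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT department, balance FROM budget"))

transfer(conn, "Physics", "Chemistry", 100, fail_midway=True)
print(balances(conn))
```

After the simulated failure no money has left the first department, so the data has not become inconsistent.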
To sum up the potential benefits of the database approach are:
a. Program-data independence
b. Minimal data redundancy
c. Improved data consistency
d. Improved data sharing
e. Increased development productivity / reduced application development time
f. Enforcement of standards
g. Improved data quality
h. Improved data accessibility and responsiveness
i. Reduced program maintenance
The DBMS
Database management systems evolved from generalised routines for file processing, as
the users demanded more extensive and more flexible facilities for managing data.
A DBMS is a software system that enables users to define, create, and maintain the
database and which provides controlled access to this database.
Alternatively a DBMS can be defined as:
- A collection of software that allows the creation and maintenance of the database.
- Software that facilitates the process of defining, constructing and manipulating a
database for various applications.
- A software application, which allows the storage, retrieval, and manipulation of
information in a prescribed format.
In its purest form, a DBMS does not allow for unformatted data. This restriction allows
quick indexing, sorting, and other data processing.
The DBMS interfaces with application programs, so that multiple applications and users
can use the data contained in the database. It is an intermediate link between the physical
database, the computer and the operating system on the one hand, and the users on the other.
A DBMS relieves the user from having to know about exact physical representations of
data and having to specify detailed algorithms for storing, updating and retrieving data.
The DBMS allows users to deal with the data in abstract terms, rather than as the
computer stores the data.
A DBMS can be thought of as a file manager that manages data in databases rather than
files in file systems.
The DBMS manages user requests (and requests from other programs) so that users and
other programs are free from having to understand where the data is physically located on
storage media and, in a multi-user system, who else may also be accessing the data. In
handling user requests, the DBMS ensures the integrity of the data (that is, making sure
it continues to be accessible and is consistently organised as intended) and security
(making sure only those with access privileges can access the data). The most typical
DBMS is a relational database management system (RDBMS). A standard user and program
interface is the Structured Query Language (SQL). A newer kind of DBMS is the
object-oriented database management system (ODBMS).
A DBMS is usually an inherent part of a database product. On PCs, Microsoft Access is a
popular example of a single- or small-group user DBMS. Microsoft’s SQL Server is an
example of a DBMS that serves database requests from multiple (client) users.
A database system consists of
- The database (data)
- A DBMS (software)
- A DDL and a DML (Part of the DBMS)
- Application programs
Defining a DB
This involves specifying the data types, structures and constraints for the data to be stored
in the database.
Constructing the DB
The process of storing the data itself (populating the DB with data) on some storage
medium that is controlled by the DBMS.
Manipulating the DB
This includes performing such functions as querying the DB to retrieve specific
data/information, updating the DB, report generation, etc.
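The three activities just described (defining, constructing and manipulating) can be sketched in order. This is a hedged illustration using SQLite through Python's sqlite3 module; the enrolment table and its sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: data types, structure and constraints.
conn.execute("""
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL,
        subject    TEXT    NOT NULL,
        mark       INTEGER CHECK (mark BETWEEN 0 AND 100),
        PRIMARY KEY (student_id, subject)
    )
""")

# Constructing: populating the database with data.
rows = [(1, "Databases", 72), (2, "Databases", 58), (1, "Networks", 64)]
conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)", rows)
conn.commit()

# Manipulating: querying the database to produce a small report.
report = conn.execute(
    "SELECT subject, COUNT(*), AVG(mark) FROM enrolment "
    "GROUP BY subject ORDER BY subject"
).fetchall()
print(report)   # [('Databases', 2, 65.0), ('Networks', 1, 64.0)]
```

The CREATE TABLE statement plays the role of the DDL, while the INSERT and SELECT statements play the role of the DML.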
Advantages of using a DBMS
There are three main features of a database management system that make it attractive to
use a DBMS in preference to more conventional software. These features are centralized
data management, data independence, and systems integration.
In a database system, the data is managed by the DBMS and all access to the data is
through the DBMS, providing a key to effective data processing. This contrasts with
conventional data processing systems where each application program has direct access
to the data it reads or manipulates. In a conventional DP system, an organization is likely
to have several files of related data that are processed by several different
application programs.
In conventional data processing, the application programs are usually based
on a considerable knowledge of data structure and format. In such an environment any
change of data structure or format would require appropriate changes to the application
programs. These changes could be as small as the following:
1. Coding of some field is changed. For example, a null value that was coded as -1 is now
coded as -9999.
2. A new field is added to the records.
3. The length of one of the fields is changed. For example, the maximum number of
digits in a telephone number field or a postcode field needs to be changed.
4. The field on which the file is sorted is changed.
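Change 3 in the list above (a field length changes) can be illustrated with a conventional program whose record layout is embedded in the code. The layout, field widths and sample values below are hypothetical and chosen only for this sketch.

```python
# Record layout hard-coded in the program:
# name (10 characters), phone (7 characters), salary (6 characters).
NAME, PHONE, SALARY = slice(0, 10), slice(10, 17), slice(17, 23)

def parse_record(line):
    """Parse one fixed-width record using the layout embedded in the code."""
    return {
        "name": line[NAME].strip(),
        "phone": line[PHONE].strip(),
        "salary": int(line[SALARY]),
    }

old_record = "{:<10}{:<7}{:>6}".format("Moyo", "5551234", 40000)
print(parse_record(old_record))

# The phone field grows from 7 to 10 characters; the program was not updated.
new_record = "{:<10}{:<10}{:>6}".format("Moyo", "0775551234", 40000)
try:
    parse_record(new_record)
    still_works = True
except ValueError:
    still_works = False   # the salary slice now overlaps the phone number
print(still_works)
```

Every program that reads this file carries the same layout, so each of them has to be found and changed when the format changes.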
In a DBMS, all files are integrated into one system, thus reducing redundancies and making
data management more efficient. In addition, a DBMS provides centralized control of the
operational data. Some of the advantages of data independence, integration and
centralized control are:
1. Redundancies and inconsistencies can be reduced
In conventional data systems, an organization often builds a collection of application
programs often created by different programmers and requiring different components of
the operational data of the organization. The data in conventional data systems is often
not centralized. Some applications may require data to be combined from several
systems. These several systems could well have data that is redundant as well as
inconsistent (that is, different copies of the same data may have different values). Data
inconsistencies are often encountered in everyday life. For example, we have all come
across situations where, after a new address is communicated to an organization that we
deal with (e.g. a bank, or Telecom, or a gas company), some of the
communications from that organization are received at the new address while others
continue to be mailed to the old address. Combining all the data in a database would
reduce redundancy as well as inconsistency. It is also likely to reduce the
costs of collection, storage and updating of data.
2. Better service to the Users
A DBMS is often used to provide better service to the users. In conventional systems,
availability of information is often poor since it normally is difficult to obtain information
that the existing systems were not designed for. Once several conventional systems are
combined to form one centralized database, the availability of information and how up to
date it is are likely to improve, since the data can now be shared and the DBMS makes it
easy to respond to unforeseen information requests.
Centralizing the data in a database also often means that users can obtain new and
combined information that would have been impossible to obtain otherwise. Also, use of
a DBMS should allow users that do not know programming to interact with the data more
easily.
The ability to quickly obtain new and combined information is becoming increasingly
important in an environment where various levels of governments are requiring
organizations to provide more and more information about their activities. An
organization running a conventional data processing system would require new programs
to be written (or the information compiled manually) to meet every new demand.
3. Flexibility of the system is improved
Changes are often necessary to the contents of data stored in any system. These changes
are more easily made in a database than in a conventional system in that these changes do
not need to have any impact on application programs.
4. Cost of developing and maintaining systems is lower
As noted earlier, it is much easier to respond to unforeseen requests when the data is
centralized in a database than when it is stored in conventional file systems. Although the
initial cost of setting up of a database can be large, one normally expects the overall cost
of setting up a database and developing and maintaining application programs to be lower
than for similar service using conventional systems since the productivity of
programmers can be substantially higher in using non-procedural languages that have
been developed with modern DBMS than using procedural languages.
5. Standards can be enforced
Since all access to the database must be through the DBMS, standards are easier to
enforce. Standards may relate to the naming of the data, the format of the data, the
structure of the data etc.
6. Security can be improved /security enforcement possible
In conventional systems, applications are developed in an ad hoc manner. Often different
systems of an organisation would access different components of the operational data. In
such an environment, enforcing security can be quite difficult.
Setting up of a database makes it easier to enforce security restrictions since the data is
now centralized. It is easier to control who has access to what parts of the database.
However, setting up a database can also make it easier for a determined person to breach
security. We will discuss this in the next section.
7. Integrity can be improved
Since the data of the organization using a database approach is centralized and would be
used by a number of users at a time, it is essential to enforce integrity controls.
Integrity may be compromised in many ways. For example, someone may make a
mistake in data input and the salary of a full-time employee may be input as $4,000 rather
than $40,000. A student may be shown to have borrowed books but has no enrolment.
Salary of a staff member in one department may be coming out of the budget of another
department.
If a number of users are allowed to update the same data item at the same time, there is a
possibility that the result of the updates is not quite what was intended. For example, in
an airline DBMS we could have a situation where the number of bookings made is larger
than the capacity of the aircraft that is to be used for the flight. Controls must therefore be
introduced to prevent such errors from occurring because of concurrent updating activities.
However, since all data is stored only once, it is often easier to maintain integrity than in
conventional systems.
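The airline example above can be sketched with two kinds of control: an integrity rule stored with the data, and a booking step in which the capacity test and the update are one statement. This is an illustration under stated assumptions (SQLite via Python's sqlite3 module, a hypothetical flight table and flight number), not a full concurrency-control scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flight (
        flight_no TEXT PRIMARY KEY,
        capacity  INTEGER NOT NULL,
        booked    INTEGER NOT NULL DEFAULT 0,
        CHECK (booked <= capacity)   -- integrity rule stored with the data
    )
""")
conn.execute("INSERT INTO flight (flight_no, capacity) VALUES ('NU100', 2)")
conn.commit()

def book_seat(conn, flight_no):
    """Book one seat; the capacity test and the update form one atomic statement."""
    with conn:
        cur = conn.execute(
            "UPDATE flight SET booked = booked + 1 "
            "WHERE flight_no = ? AND booked < capacity",
            (flight_no,),
        )
    return cur.rowcount == 1   # False means the flight was already full

results = [book_seat(conn, "NU100") for _ in range(3)]
print(results)   # [True, True, False]
```

A program that first reads the number of bookings and then writes it back in a separate step would leave a gap in which another user could book the same seat; letting the DBMS do both in one statement closes that gap.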
7.1 Availability of up-to-date information to all users
8. Enterprise requirements can be identified
All enterprises have sections and departments and each of these units often consider the
work of their unit as the most important and therefore consider their needs as the most
important. Once a database has been set up with centralized control, it will be necessary
to identify enterprise requirements and to balance the needs of competing units. It may
become necessary to ignore some requests for information if they conflict with higher
priority needs of the enterprise.
9. Data model must be developed
Perhaps the most important advantage of setting up a database system is the requirement
that an overall data model for the enterprise be built. In conventional systems, it is more
likely that files will be designed as needs of particular applications demand. The overall
view is often not considered. Building an overall view of the enterprise data, although
often an expensive exercise, is usually very cost-effective in the long term.
Data independence allows dynamic changes and growth potential
Disadvantages of using a DBMS
A database system generally provides on-line access to the database for many users. In
contrast, a conventional system is often designed to meet a specific need and therefore
generally provides access to only a small number of users. Because of the larger number
of users accessing the data when a database is used, the enterprise may face additional
risks compared with a conventional data processing system in the following areas.
1. Confidentiality, privacy and security.
2. Data quality.
3. Data integrity.
4. Enterprise vulnerability may be higher.
5. The cost of using DBMS.
Confidentiality, Privacy and Security/problems associated with centralization
When information is centralised and is made available to users from remote locations, the
possibilities of abuse are often more than in a conventional data processing system. To
reduce the chances of unauthorized users accessing sensitive information, it is necessary
to take technical, administrative and, possibly, legal measures.
Most databases store valuable information that must be protected against deliberate
trespass and destruction.
Data Quality
Since the database is accessible to users remotely, adequate controls are needed to control
users updating data and to control data quality. With increased number of users accessing
data directly, there are enormous opportunities for users to damage the data. Unless there
are suitable controls, the data quality may be compromised.
Data Integrity
Since a large number of users could be using a database concurrently, technical
safeguards are necessary to ensure that the data remain correct during operation. The
main threat to data integrity comes from several different users attempting to update the
same data at the same time. The database therefore needs to be protected against
inadvertent changes by the users.
Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database
becomes an indispensable resource. The survival of the enterprise may depend on reliable
information being available from its database. The enterprise therefore becomes
vulnerable to the destruction of the database or to unauthorized modification of the
database.
The Cost of using a DBMS
Conventional data processing systems are typically designed to run a number of
well-defined, preplanned processes. Such systems are often "tuned" to run efficiently for the
processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new applications
may be difficult to implement and/or expensive to run, they are usually very efficient for
the applications they are designed for.
The database approach on the other hand provides a flexible alternative where new
applications can be developed relatively inexpensively. The flexible approach is not
without its costs and one of these costs is the additional cost of running applications that
the conventional system was designed for. Using standardized software is almost always
less machine efficient than specialized software.
Cost of software/hardware and migration
Complexity of backup and recovery
Disadvantages in summary
• Complexity
• Size
• Cost of DBMS
• Additional hardware costs
• Cost of conversion
• Performance
• Higher impact of a failure
In addition, a DBMS provides facilities for
1. Describing the database, when a database is being set up
2. Authorization specification and checking
3. Access path selection
4. Concurrency control
5. Logging and recovery
To provide all the facilities mentioned, a DBMS is usually built from several components.
The main components of the DBMS are:
1. A Query Language and a Data Description Language (DDL) to provide users the
access to the database.
2. A translator for users’ instructions in the query language and the DDL including query
optimisation.
3. A Database manager
4. A file manager
5. The physical database
6. The metadata
The above listing of DBMS components does not include some very important
components e.g. concurrency controller and recovery manager. These components have
been left out to keep the architecture relatively simple.
The DBMS Architecture
Several different frameworks of the DBMS architecture have been suggested over the last
several years.
For example, a framework may be developed based on the functions that the various
components of a DBMS must provide to its users. It may also be based on different views
of data that are possible within a DBMS.
A commonly used approach based on views of data is the three-level architecture suggested by
ANSI/SPARC (American National Standards Institute/Standards Planning and
Requirements Committee). The three levels of the architecture are three different views
of the data:
1. External - individual user view
2. Conceptual - community user view
3. Internal - physical or storage view
The three level/schema database architecture allows a clear separation of the information
meaning (conceptual view) from the external data representation and from the physical
data structure layout. A database system that is able to separate the three different views
of data is likely to be flexible and adaptable. This flexibility and adaptability is data
independence that we discussed earlier.
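[Figure: the three-level architecture. Users 1, 2 and 3 each work through their own
external view (View 1, View 2, View 3); the views map onto the conceptual schema, which
maps onto the internal schema, which in turn describes the stored database.]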
The view at each of these levels is described by a schema. A schema is an outline or a
plan that describes the records and relationships existing in the view. A database schema is
a description of the database; it is specified during database design and is not expected to
change frequently.
The external level is the view that the individual user of the database has. This view is
often a restricted view of the database and the same database may provide a number of
different views for different classes of users. In general, the end users and even the
applications programmers are only interested in a subset of the database. For example, a
department head may only be interested in the departmental finances and student
enrolments but not the library information. The librarian would not be expected to have
any interest in the information about academic staff. The payroll office would have no
interest in student enrolments.
–Users' view of the database.
–Describes that part of database that is relevant to a particular user
External Level
- This is the level at which users interact with the system via applications
programs, a host language or data sub language.
- The data definition language (DDL) and the data manipulation language (DML)
are the most common interface tools used in this schema.
- This level describes that part of the database that is relevant to a particular user. It
is usual for a user to require only certain tables (or parts of them) containing
specific records and logical relationships between these records.
- Within these records the user may need access to only a few selected fields in
order to perform the specified tasks. The external schema supplies the user with
this limited window on the conceptual schema.
- Different views may have different representations of the same data. E.g.,
user 1 views dates as (day, month, year) whereas user 2 may view them as
(year, month, day).
- Some views may include some derived or calculated data, data not actually
stored in the database as such. E.g., ages of employees may be included in a
view on an employee relation but are unlikely to be stored. Instead, their
dates of birth would be stored and their ages calculated from them by the
DBMS.
The external schema also contains the method of deriving the objects in the external view
from the objects in the conceptual view. The objects include entities, attributes and
relationships.
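The two examples given for the external level (different date representations, and ages derived from stored dates of birth) can be sketched as SQL views. The sketch assumes SQLite via Python's sqlite3 module; the employee table, the view names and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, date_of_birth TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES ('Moyo', '1990-07-20', 40000)")

# View for one class of user: dates shown as day/month/year, salary hidden.
conn.execute("""
    CREATE VIEW employee_dmy AS
    SELECT name, strftime('%d/%m/%Y', date_of_birth) AS born
    FROM employee
""")

# View for another class of user: age is derived from the stored date of birth
# each time the view is queried; it is never stored itself.
conn.execute("""
    CREATE VIEW employee_age AS
    SELECT name,
           (CAST(strftime('%Y%m%d', 'now') AS INTEGER)
            - CAST(strftime('%Y%m%d', date_of_birth) AS INTEGER)) / 10000 AS age
    FROM employee
""")

dmy_rows = conn.execute("SELECT * FROM employee_dmy").fetchall()
age_rows = conn.execute("SELECT * FROM employee_age").fetchall()
print(dmy_rows)   # [('Moyo', '20/07/1990')]
print(age_rows)
```

Each view definition is the "method of deriving the objects in the external view from the objects in the conceptual view": the stored table is the same, only the window onto it differs.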
Conceptual Level
- The conceptual view is a representation of the entire information content of the
database. This level describes what data is stored in the database and the
relationships among the data.
- This level contains the logical structure of the entire database as seen by the
database administrator (DBA). The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. This level mainly represents:
  - all entities, their attributes and their relationships.
  - security and integrity information.
- This level must not contain any storage-dependent details (e.g., storage structure
and access technique).
- The schema can be regarded as derived from a model of the organization and
should be designed with care as it is usual for its structure to remain relatively
unchanged in the life of the database.
Internal Level
 The internal view is a low-level representation of the entire database. This level
describes how the data is stored in the database and the access paths to it.
 The internal view is described by means of the internal schema which defines the
various stored record types, how stored fields are represented, what indexes exist,
what physical sequence the stored records are in, and so on. It is concerned with
storage details that are not part of a logical view of the database.
 The internal view does not deal in terms of physical records (blocks or pages) nor
with any device-specific considerations such as cylinder or track sizes. In other
words the internal view effectively assumes an infinite linear address space;
details of how that address space is mapped to physical storage are system-specific.
 Hence, it is generally understood that, below the internal level, there is a physical
level which is managed by the operating system under the direction of the
DBMS. The physical level below the DBMS consists of items only the operating
system knows, such as exactly how the sequencing is implemented and whether
the fields of internal records are stored as contiguous bytes on the disk.
The internal view describes the actual physical storage of the data. It tells us what
data is stored in the database and how. At least the following aspects are considered at
this level:
1. Storage allocation e.g. B-trees, hashing etc.
2. Access paths e.g. specification of primary and secondary keys, indexes and pointers
and sequencing.
3. Miscellaneous e.g. data compression and encryption techniques, optimisation of the
internal structures.
Efficiency considerations are the most important at this level and the data structures are
chosen to provide an efficient database. The internal view does not deal with the physical
devices directly. Instead it views a physical device as a collection of physical pages and
allocates space in terms of logical pages.
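A toy sketch of two of the aspects listed above: space allocated in logical pages, and a hash index used as an access path. The class StoredFile, the page size and the method names are all assumptions for illustration; a real DBMS storage engine is far more involved.

```python
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 2  # records per logical page; deliberately tiny for the demo


class StoredFile:
    """Toy stored file: records live in fixed-size logical pages, and a
    hash index maps each key to the (page number, slot) holding its record."""

    def __init__(self) -> None:
        self.pages: List[List[Tuple[int, str]]] = [[]]
        self.index: Dict[int, Tuple[int, int]] = {}

    def insert(self, key: int, value: str) -> None:
        if len(self.pages[-1]) == PAGE_SIZE:  # last page is full: allocate a new one
            self.pages.append([])
        page_no = len(self.pages) - 1
        self.pages[page_no].append((key, value))
        self.index[key] = (page_no, len(self.pages[page_no]) - 1)

    def scan_lookup(self, key: int) -> Optional[str]:
        """Sequential access path: read page after page until the key is found."""
        for page in self.pages:
            for k, v in page:
                if k == key:
                    return v
        return None

    def index_lookup(self, key: int) -> Optional[str]:
        """Indexed access path: go straight to the right page and slot."""
        location = self.index.get(key)
        if location is None:
            return None
        page_no, slot = location
        return self.pages[page_no][slot][1]


stored = StoredFile()
for key, name in [(10, "Ann"), (20, "Ben"), (30, "Cara"), (40, "Dan"), (50, "Eve")]:
    stored.insert(key, name)
```

Both lookups return the same record; they differ only in how many pages are touched, which is the kind of efficiency decision made at the internal level.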
Each user group refers to its own external schema, so the DBMS must transform a
request specified on an external schema into a request on the conceptual schema, and then
into a request on the internal schema for processing over the stored database. The
data extracted from the stored database must then be reformatted to match the user's
external view. The processes of transforming requests and results between levels are
called mappings.
Mappings
There are two levels of mapping in the architecture:
 Conceptual/Internal Mapping
-Defines the correspondence between the conceptual view and the stored database
(internal view): it specifies how conceptual records and fields are represented at the
internal level. This enables the DBMS to find the actual record or combination of
records in physical storage that constitute a logical record in the conceptual schema,
together with any constraints to be enforced on the operations for that logical record.
 External/Conceptual Mapping
- Defines the correspondence between a particular external view and the conceptual
view. This enables the DBMS to map names in the user’s view onto the relevant part of
the conceptual schema.
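The two mappings can be pictured as two lookup tables applied one after the other. The sketch below is a simplification under stated assumptions: the field names, the stored records and the function fetch are invented for the example, and a real DBMS maps whole requests rather than just field names.

```python
# External/conceptual mapping: names in one user's view -> conceptual attribute names
EXTERNAL_TO_CONCEPTUAL = {"staff_no": "emp_id", "full_name": "name"}

# Conceptual/internal mapping: conceptual attribute -> position in the stored record
CONCEPTUAL_TO_INTERNAL = {"emp_id": 0, "name": 1, "date_of_birth": 2}

# Stored database (internal level): plain tuples, no field names
STORED_RECORDS = [(1, "Tendai", "1990-03-15"), (2, "Rudo", "1985-11-02")]


def fetch(external_fields):
    """Answer a request phrased in external names by applying both mappings."""
    conceptual = [EXTERNAL_TO_CONCEPTUAL[f] for f in external_fields]  # external -> conceptual
    positions = [CONCEPTUAL_TO_INTERNAL[c] for c in conceptual]        # conceptual -> internal
    # Reformat the stored data back into the shape of the user's view
    return [
        {field: record[pos] for field, pos in zip(external_fields, positions)}
        for record in STORED_RECORDS
    ]


result = fetch(["full_name", "staff_no"])
```

If the stored record layout changes, only CONCEPTUAL_TO_INTERNAL needs updating; if the user's names change, only EXTERNAL_TO_CONCEPTUAL does.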
Objectives of Three-Level Architecture
• All users should be able to access the same data.
• A user’s view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• DBA should be able to change database storage structures without affecting the users’
views.
• Internal structure of database should be unaffected by changes to physical aspects of
storage.
• DBA should be able to change conceptual structure of database without affecting all
users.
Views
The view mechanism provides users with only the data they want or need to use.
A view allows each user to have his/her own view of the database.
A view is essentially some subset of the database.
Benefits of views include:
– Reduce complexity;
– Provide a level of security;
– Provide a mechanism to customize the appearance of the database;
– Present a consistent, unchanging picture of the structure of the database, even if the
underlying database is changed.
Data Independence
The concept of data independence can be thought of as the capacity to change the schema
at one level of the database without having to change the schema at the next higher level.
It hides implementation and storage details from the programs that use the data. DBMSs
such as Oracle provide physical and logical independence because data can be managed
separately from the applications that use it.
Data independence protects application programs from changes in the underlying logical
organisation and in physical access paths and storage structures. The separation of the
conceptual view from the internal view enables us to provide a logical description of the
database without the need to specify physical structures. This is often called physical
data independence.
• Logical Data Independence
 Refers to immunity of external schemas to changes in conceptual schema or
simply the capacity to change the conceptual schema without having to change
external schemas or application programs
 The mapping between the external and conceptual levels absorbs the changes.
 The conceptual schema may be changed to expand the database (e.g. by adding a new
record type or data item) or to reduce it (e.g. by removing a record type or data item).
 It insulates application programs from operations such as combining two records
into one or splitting an existing record into two or more records.
 Such changes should not require changes to external schemas or rewrites of
application programs.
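A minimal sketch of logical data independence using Python's sqlite3 module. The tables, the view student_contact_list and the function contact_list are assumptions for the demo. One record type is split into two, the view definition (the external/conceptual mapping) is rewritten, and the "application program" is left untouched.

```python
import sqlite3


def contact_list(conn):
    """The 'application program': it knows only the external view."""
    return conn.execute(
        "SELECT name, phone FROM student_contact_list ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT);
    INSERT INTO student VALUES (1, 'Ann', '111'), (2, 'Ben', '222');
    CREATE VIEW student_contact_list AS SELECT name, phone FROM student;
""")
before = contact_list(conn)

# Conceptual schema change: split the student record into two record types,
# then redefine the view so that the mapping absorbs the change.
conn.executescript("""
    CREATE TABLE student_core  (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student_phone (student_id INTEGER PRIMARY KEY
                                REFERENCES student_core(student_id), phone TEXT);
    INSERT INTO student_core  SELECT student_id, name  FROM student;
    INSERT INTO student_phone SELECT student_id, phone FROM student;
    DROP VIEW student_contact_list;
    DROP TABLE student;
    CREATE VIEW student_contact_list AS
        SELECT c.name, p.phone
        FROM student_core c JOIN student_phone p USING (student_id);
""")
after = contact_list(conn)  # same function, same query text, same answer
```

Only the view definition changed; contact_list was not rewritten, which is the immunity the bullets above describe.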
• Physical Data Independence
 Refers to immunity of conceptual schema to changes in the internal schema.
 Physical storage structures or devices used for storing the data could be changed
without necessitating a change in the conceptual view or any of the external
views.
 Internal schema changes may include e.g. using different file organizations or
storage structures/devices, or creating additional access structures to improve the
performance of retrieval or update.
 The changes are absorbed by the mappings between the conceptual and internal
levels.
 Should not require change to conceptual or external schemas.
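Physical data independence can be sketched the same way, again with sqlite3 and invented names (student, idx_student_programme, plan). An index is added as an extra access structure; the query text and its result stay the same, and only the access path chosen by the DBMS changes. EXPLAIN QUERY PLAN is SQLite-specific and its wording varies between versions, so the sketch only checks whether the index name appears in the plan.

```python
import sqlite3

QUERY = "SELECT name FROM student WHERE programme = ? ORDER BY name"


def plan(conn, sql, params):
    """Return SQLite's description of how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, programme TEXT);
    INSERT INTO student VALUES (1, 'Ann', 'CS'), (2, 'Ben', 'Maths'), (3, 'Cara', 'CS');
""")

rows_before = [r[0] for r in conn.execute(QUERY, ("CS",))]
plan_before = plan(conn, QUERY, ("CS",))

# Internal schema change: add an access structure. The table definition is untouched.
conn.execute("CREATE INDEX idx_student_programme ON student (programme)")

rows_after = [r[0] for r in conn.execute(QUERY, ("CS",))]
plan_after = plan(conn, QUERY, ("CS",))
```

Printing plan_before and plan_after shows the switch from a full scan to an index search, while rows_before and rows_after are identical.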
[Figure: the three-level architecture, with the external schemas at the top, the conceptual schema in the middle and the internal schema at the bottom]
Functions of a DBMS
 Data Storage, Retrieval, and Update.
 A User-Accessible Catalog.
 Transaction Support.
 Concurrency Control Services.
 Recovery Services.
 Authorization Services.
 Support for Data Communication.
 Integrity Services.
 Services to Promote Data Independence.
 Utility Services.
Components that are part of the DBMS Environment
 Hardware – Can range from a PC to a network of computers.
 Software – DBMS, operating system, network software (if necessary) and also the
application programs.
 Data – Used by the organization, together with a description of this data called the schema.
 Procedures – Instructions and rules that should be applied to the design and use of
the database and DBMS.
 People
The Database Administrator (DBA)
 The database will be able to meet the demands of various users in the organization
effectively only if it is maintained and managed properly. Usually a person (or a
group of persons) centrally located, with an overall view of the database, is
needed to keep the database running smoothly. Such a person is called the
Database Administrator (DBA).
 The DBA is the custodian of the data and controls the database structure; the DBA
administers all three levels of the database.
 The DBA would normally have a large number of tasks related to maintaining and
managing the database.
These tasks would include the following:
1. Deciding and Loading the Database Contents - The DBA in consultation with senior
management is normally responsible for defining the conceptual schema of the database.
The DBA would also be responsible for making changes to the conceptual schema of the
database if and when necessary.
2. Assisting and Approving Applications and Access - The DBA would normally provide
assistance to end-users interested in writing application programs to access the database.
The DBA would also approve or disapprove access to the various parts of the database by
different users.
3. Deciding Data Structures - Once the database contents have been decided, the DBA
would normally make decisions regarding how data is to be stored and what indexes need
to be maintained. In addition, a DBA normally monitors the performance of the DBMS
and makes changes to data structures if the performance justifies them. In some cases,
radical changes to the data structures may be called for.
4. Backup and Recovery - Since the database is such a valuable asset, the DBA must
make every possible effort to ensure that the asset is not damaged or lost. This normally
requires a DBA to ensure that regular backups of a database are carried out and in case of
failure (or some other disaster like fire or flood), suitable recovery procedures are used to
bring the database up with as little down time as possible.
5. Monitor Actual Usage - The DBA monitors actual usage to ensure that policies laid
down regarding use of the database are being followed. The usage information is also
used for performance tuning.
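As a small illustration of task 4, the sketch below takes a backup and restores from it using the backup method of Python's sqlite3 connections (available from Python 3.7). The table enrolment, the helper make_backup and the use of in-memory databases are assumptions made so that the example is self-contained; in practice the backup would be written to a file on separate storage.

```python
import sqlite3


def make_backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole source database into a fresh database and return it."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


live = sqlite3.connect(":memory:")
live.executescript("""
    CREATE TABLE enrolment (student_id INTEGER, course TEXT);
    INSERT INTO enrolment VALUES (1, 'Databases'), (2, 'Networks');
""")

backup_copy = make_backup(live)  # the regular backup

# Simulated failure: the live data is lost.
live.execute("DELETE FROM enrolment")
live.commit()

# Recovery: bring up a database from the most recent backup.
restored = sqlite3.connect(":memory:")
backup_copy.backup(restored)
restored_rows = restored.execute(
    "SELECT student_id, course FROM enrolment ORDER BY student_id"
).fetchall()
```

Anything changed after the last backup is lost with this approach, which is why backups are taken regularly and why a DBMS also keeps logs for its recovery services.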
Database Languages
• Data Definition Language (DDL)
 Allows the DBA or user to describe and name entities, attributes, and
relationships required for the application, i.e. it is used to define the conceptual
schema.
 The Data Definition Language (DDL) is used to create and destroy databases and
database objects. Database administrators will primarily use these commands
during the setup and removal phases of a database project.
 The definition includes any associated integrity and security constraints that have
to be maintained.
This may include constraints on the values assigned to a given attribute etc.
 These definitions are maintained in a compiled form (usually as a set of tables)
and this compiled form is known as the data dictionary, directory or system
catalog.
 The internal schema is specified using a similar language called the storage
definition language (SDL).
 There is also a third language that is used to specify user views and their
mappings to the conceptual schema: the view definition language (VDL).
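A short sketch of DDL in action, using SQLite through Python's sqlite3 module. The table staff and the helper try_insert are assumptions for the example. The CREATE TABLE statement carries its integrity constraints, the definition is recorded in SQLite's catalog table sqlite_master, and the DBMS then enforces the constraints on every insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The system catalog is itself queryable; it is empty before any DDL is run.
catalog_before = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# DDL: name the entity and its attributes, and state the integrity constraints.
conn.execute("""
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0)   -- constraint on attribute values
    )
""")
catalog_after = conn.execute("SELECT type, name FROM sqlite_master").fetchall()


def try_insert(staff_id, name, salary) -> bool:
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO staff VALUES (?, ?, ?)", (staff_id, name, salary))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, "Ann", 1000.0)
rejected = try_insert(2, "Ben", -5.0)  # violates the CHECK constraint
```

A matching DROP TABLE staff statement would remove both the table and its catalog entry.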
• Data Manipulation Language (DML)
 Provides basic data manipulation operations on data held in the database.
 Typical manipulation operations include retrieval, insertion, deletion and
modification of the data.
There are two main types of DMLs:
1. Procedural or Low level DML
 Allows user to indicate not only what to retrieve but how to go about retrieving it.
 Must be embedded in a general purpose programming language.
 Retrieves individual records from the DB and processes each record separately.
 Make use of programming language constructs such as looping to retrieve and
process each individual record from the set.
Hence low-level DMLs are called record-at-a-time DMLs.
2. Non-Procedural or High level DML e.g. SQL
 In this case the DML statements can be entered either interactively from a
terminal or embedded in a general purpose programming language. A
single statement can specify and retrieve many records at a time, hence they are
called set-oriented or set-at-a-time DMLs.
 Allows user to state what data is needed rather than how it is to be retrieved.
Such languages are also called declarative.
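The contrast between the two styles can be sketched in Python with sqlite3. The table staff and both function names are assumptions for the demo. Note that SQLite offers only SQL, so the first function merely imitates a record-at-a-time DML by fetching records one by one and testing each in the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO staff VALUES (1, 'Ann', 900.0), (2, 'Ben', 1500.0), (3, 'Cara', 2100.0);
""")


def high_earners_record_at_a_time(threshold):
    """Procedural style: the program says HOW, looping over individual records."""
    result = []
    cursor = conn.execute("SELECT name, salary FROM staff ORDER BY staff_id")
    for name, salary in cursor:      # host-language loop, one record per iteration
        if salary > threshold:       # the selection logic lives in the program
            result.append(name)
    return result


def high_earners_set_at_a_time(threshold):
    """Declarative style: one statement says WHAT is wanted; the DBMS decides how."""
    rows = conn.execute(
        "SELECT name FROM staff WHERE salary > ? ORDER BY staff_id", (threshold,)
    )
    return [row[0] for row in rows]
```

Both functions return the same names; in the second, the DBMS is free to choose an index or any other access path because the statement does not prescribe one.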
• Fourth Generation Language (4GL)
 Query Languages
 Forms Generators
 Report Generators
 Graphics Generators
 Application Generators
DBMS Component Modules
 Data Definition Language Compiler
 Data Manager
 File Manager
 Disk Manager
 Query Processor
 Query Compiler
 Precompiler
 Communications facilities/Telecommunications system
 Data Dictionary
Database Access
Data Models
•Models
–“Description or analogy used to visualize something that cannot be directly observed”
Webster’s Dictionary
–“A model is a representation of the world in simplified terms, it is an abstraction of the
real world”
•Data Model
–Relatively simple representation of complex real-world data structures
Data model - A set of concepts that can be used to describe the structure of a database.
The structure of the database is taken here to mean the data types, relationships and
constraints that should hold for the data.
- An integrated collection of concepts for describing data, relationships between data, and
constraints on the data in an organization
 Used to interpret, specify, and document requirements for database processing
systems
 Provide a language for expressing the user's data model (structure of data, data
relationships)
A data model:
- A logical representation that defines the units of data, and specifies how each
unit is related to others
- Communication tool for end users and DB designers
-Tools for data models: entity-relationship model, semantic object model
Data model as inferencing
- Users cannot describe data models directly
- Developers infer structures and relationships from the user's statement about
forms and reports
- Difficult and challenging in multi-user applications
Data Model comprises:
– A structural part;
– A manipulative part;
– Possibly a set of integrity rules.
Purpose of Data Model
– To represent data in an understandable way.
Many data models have been proposed. Data models can be categorized based on the
types of concepts they provide to describe the database structure.
Categories of data models include:
 Object-based
 Record-based
 Physical.
Object-Based/ High level / Conceptual Data Models
 These provide concepts that are close to the way many users perceive data. They
use concepts such as entities, attributes and relationships.
 Entity-Relationship model (a popular high level data model)
 Semantic model – influenced by semantic networks developed by artificial
intelligence. Semantic networks were developed to organize and represent general
knowledge.
 Functional model
 Object-Oriented model.
Record-Based/ Representational/ Implementation / Traditional Data Models
These hide some details of data storage. They are the ones used most frequently in
current commercial DBMSs. They represent data by using record structures.
o Relational Model
o Network Model
o Hierarchical Model.
o Object Model.
These four models reflect the historical development of database technology.
Hierarchical model: stores data in the form of hierarchies. Not all systems fit into a
hierarchy and this leads to redundancy. Main problem: inflexibility.
Network model: stores data as a network of inter-linked sets. Main problems: complexity
and inflexibility.
Relational model: data represented as a set of tables. Advanced theoretical support,
simplicity and elegance. Limitation: only suitable for relatively simple data structures.
Object model: treats data as objects with methods, etc. Benefits with complex data
structures.
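The redundancy that arises when data does not fit a hierarchy can be shown with plain Python structures. The projects, staff names and helper functions below are invented for the illustration and do not model any particular hierarchical or relational DBMS.

```python
# Hierarchical style: each project "owns" its staff records, so a person who
# works on two projects has to be stored under both parents.
hierarchy = {
    "Payroll": [{"staff_id": 1, "name": "Ann"}, {"staff_id": 2, "name": "Ben"}],
    "Library": [{"staff_id": 1, "name": "Ann"}],  # Ann is stored a second time
}

# Relational style: each fact is stored once, in tables (lists of tuples here);
# the link between staff and projects is itself just another table.
staff = [(1, "Ann"), (2, "Ben")]
assignment = [("Payroll", 1), ("Payroll", 2), ("Library", 1)]


def copies_in_hierarchy(staff_id):
    """Count how many times one person's record appears in the hierarchy."""
    return sum(
        1
        for members in hierarchy.values()
        for member in members
        if member["staff_id"] == staff_id
    )


def copies_in_relations(staff_id):
    """Count how many times one person's record appears in the staff table."""
    return sum(1 for sid, _ in staff if sid == staff_id)


def projects_of(staff_id):
    """Answer 'which projects is this person on?' from the assignment table."""
    return sorted(project for project, sid in assignment if sid == staff_id)
```

Renaming Ann means updating two records in the hierarchy but only one row in the staff table.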
Physical / Low level Data Models
 Provide concepts that describe the details of how the data is stored in the
computer by representing information such as record formats, record orderings
and access paths. The concepts provided are generally meant for computer
specialists not for end users.
Conceptual Modelling
 Conceptual schema is the core of a system supporting all user views.
 Conceptual modeling is the process of describing the concepts and relationships
of a domain that are to be stored in a database. The process takes place within a
theoretical framework called a conceptual model.
 A conceptual model is a data model which formalizes the representation and
manipulation of concepts and relationships.
 The conceptual model defines the language used to describe the domain.
 A conceptual model is used by a database developer to describe the aspects of a
domain which are to be captured by a database. The description of a domain is
called a conceptual schema
 Should be a complete and accurate representation of an organization’s data
requirements.
 Conceptual modelling is a process of developing a model of information use that
is independent of implementation details.
 Result is a conceptual data model e.g. E-R, Object models etc.
Classification of Database Management systems
The main criterion normally used to classify DBMSs is the data model on which the
DBMS is based.
Factors that may drive an organization to switch to a DBMS
 Data Complexity- As data relationships become more complex, the need for a
DBMS is felt more strongly
 Dynamically evolving or growing data- If the data changes constantly, it is
easier to cope with these changes using a DBMS than using a file system
 Sharing among applications- the greater the sharing among applications, the
more the redundancy among files, and hence the greater need for a DBMS to
integrate the data.
 Frequency of ad hoc requests for data – file systems are not at all suitable for
ad hoc retrieval of data.
 Data volume and need for control- The sheer volume of data and the need to
control it sometimes demands a DBMS.
Economic and organizational factors that affect the choice of a DBMS
 Structure of the data (e.g. hierarchically structured data suits a hierarchical DBMS,
while a network or relational system may be more appropriate for data with many
interrelationships)
 Familiarity of personnel with the system- their familiarity with a particular
DBMS may reduce training costs and learning time.
 Availability of vendor services- vendor support helps in solving problems with
the system and in getting assistance.
 Costs- software and hardware acquisition costs, maintenance costs, database creation
and conversion costs, personnel costs, operating costs and training costs.