Download DATABASES A database is a shared collection of logically related

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Open Database Connectivity wikipedia , lookup

Relational algebra wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
DATABASES
A database is a shared collection of logically related data with descriptions designed to satisfy the
information needs of an organisation. The key idea behind the database concept is separating the
data from the application program (data independence)
Some common uses of databases are in large shops and supermarkets. The databases can be
linked directly to the operations of the business. For example, bar codes on items for sale can be
linked to products in the company's database so that stock levels can be maintained and pricing
details can be applied centrally from the database rather than on each individual item. The
database approach was developed and adopted because of the problems associated with data
being stored within application programs or within file based systems. The problem with data
being stored within application programs was that it was hard to access from other places and
there were limits to what could be done with the data.
There are three main areas within the relational model that can be focused on:
1. Data Structure
2. Data Integrity
3. Data Manipulation
Databases and File Based Systems
A file based system is a collection of application programs that perform services for the users
wishing to access information. Each program within a file based system defines and manages its
own data. Because of this, there are limits as to how that data can be used or transported.
File based systems were developed as better alternatives to paper based filing systems. By having
files stored on computers, the data could be accessed more efficiently. It was common practice for
larger companies to have each of its departments looking after its own data.
The problems that arise with this type of file based system are listed below:
1. Data separation and isolation
2. Data dependence
3. Data duplication
4. Incompatible data (different file formats)
5. Lack of flexibility in organising and querying the data
6. Increased number of different application programs
Some advantages of database systems are outlined below:
 Sharing of data
 Consistency of data
 Integrity of data
 Security of data
 Data independence
 Allows for more analysis of the same amount of data
 Improved data access and system performance
 Potentially increased productivity
 Increased concurrency
 Improved data backups and recovery
Some potential disadvantages of database systems are the cost of implementing them, the
amount of effort needed to transfer data into the database from a current system, and also the
impact on the whole company if the database fails (even if only for a relatively short period).
DATABASE MANAGEMENT SYSTEM (DBMS)
A DBMS is a software system that enables users to define, create and maintain a database. The
DBMS also enforces necessary access restrictions and security measures in order to protect the
database.
The DBMS also has the job of controlling access to database. Various types of control systems
within the DBMS make sure that the database continues to function properly:
 Integrity system
 Security system
 Concurrency control system
 Recovery control system
Many DBMSs enable "views" of the database to be defined. A view is how the database appears to
a certain user. Views offer the benefit of only having to show relevant information to different
types of users and it increases security as certain users will not be able to see data which they are
not meant to see. Views can also decrease the perceived complexity of the database from the
user's point of view.
Data Definition and Manipulation
The DBMS makes use of a Data Definition Language (DDL) and a Data Manipulation Language
(DML). The DML enables the specification of data types, structures and constraints that should be
part of the database. All specifications defined by the DDL are stored in the database.
The DML enables those with access to the database to insert, update, delete and retrieve data
from it. Structured Query Language (SQL) is the standard DML used today. SQL is a nonprocedural language.
Why Use a DBMS?
•Data independence and efficient access.
•Reduced application development time.
•Data integrity and security.
•Uniform data administration.
•Concurrent access, recovery from crashes.
Relational Database Principles and Terminology
A relational database is a collection of 2-dimensional tables which consist of rows and columns.
Tables are known as "relations", columns are known as "attributes" and rows (or records) are
known as "tuples".
Tables / Relations are a logical structure therefore they are an abstract concept and they do not
represent how the data is stored on the physical computer system. Each column / attribute in a
relation represents an attribute of an entity. A single row / tuple contains all the information (in
the form of attributes) about a single entity.
The "cardinality" of a relation is the number of row / tuples it has. The "degree" of a relation
refers to the number of columns / attributes in that relation. The order of records and columns is
irrelevant. Relations and columns should always be uniquely named and therefore uniquely
indentifiable. No duplicate rows occur in a single relation.
Each cell of a relation contains a single value or element which is atomic. This means that arrays
or lists, for example, would not be stored under a single attribute. Multi-valued attributes are
possible though but this involves a technique of referring to another relation which holds these
multiple values.
Attribute Domains
Attribute domains define the set of all possible values an attribute within a relation can take. For
example, the attribute "height" of a person may only take integer values with a 4 digit maximum
(if measuring in millimetres). The attribute "gender" may only take a string of length 1 with the
only two characters accepted being "m" or "f" representing male or female. Defining attribute
domains allows the database to reject invalid inappropriate data.
An attribute may in some cases be permitted to contain Null values. A Null value makes it possible
to deal with missing or erroneous data. Null means that there is an absence of a value. It is
different from using a blank space or zero value in an attribute.
Keys (Primary, Candidate, and Foreign)
A key (also known as a candidate key) is an attribute which uniquely identifies a row / tuple
within a relation. The primary key is the key chosen by the database designer as the main
identifier for all records in that relation (even though other keys could be used as the primary
key). A foreign key is an attribute which provides a logical link between tables. A foreign key in
one relation will correspond to a key within another relation. Candidate keys can consist of a
single attribute or multiple attributes. Multiple attribute keys are known as composite keys.
Database Design / Application Lifecycle
Before a database design is set into motion, planning is an essential stage. This is typically the
responsibility of management within the organisation. Good planning will enable the design and
implementation to be successfully completed with the desired results. A general definition of the
desired system should also be drawn up at this stage. This would include information such as
what the database application should be able to do, to what areas it will be applied, and who will
be using it.
Three main factors that should be analysed in the planning are
•
What work will need to be done
•
The resources needed to complete the work
•
How much it will all cost
Requirements Analysis and Collection
Information can be gathered using various techniques. Some of these are listed below:
•
Interviewing individuals
•
Observation
•
Examining documents
•
Using questionnaires
•
Using expert knowledge and experience from other design work
Interviewing many individuals can be time consuming but can result in quality feedback which
can be used in the system design. Using questionnaires is a quicker and easier way to gain
feedback from relevant people. It is hard to ensure that the answers people give are accurate
though. Looking at the current documents and paperwork in circulation can be useful in
determining what input/output screens should look like for example.
Database Design
The database design stage will result in a database model that will hopefully support the
organisation's goals. The two design approaches that can be used are Bottom Up or Top Down.
The Top Down approach starts with a very general overview of the system and details are added
in an iterative manner. Bottom Up design involves getting all the details firs then constructing the
overall design from the smaller more detailed parts.
The aim of a database design should be to represent all relevant data and the relationships that
exist between data items. The database model should also allow for all necessary transactions on
the data to be possible.
The following stages follow the database design stage:
•
DBMS Selection
•
Application Design
•
Prototyping
•
Implementation
•
Data Conversion and Loading (if data from an existing system needs to be put into the new
database)
•
Testing
•
Operational Maintenance (follows the installation of the system)
DATABASE MODELS
Various techniques are used to model data structure. Most database systems are built around one
particular data model, although it is increasingly common for products to offer support for more
than one model. For any one logical model various physical implementations may be possible,
and most products will offer the user some level of control in tuning the physical implementation,
since the choices that are made have a significant effect on performance. An example of this is the
relational model: all serious implementations of the relational model allow the creation of
indexes which provide fast access to rows in a table if the values of certain columns are known.
A data model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. The relational model, for example, defines operations such as select,
project, and join. Although these operations may not be explicit in a particular query language,
they provide the foundation on which a query language is built.
Flat model
This may not strictly qualify as a data model, as defined above. The flat (or table) model consists of
a single, two-dimensional array of data elements, where all members of a given column are
assumed to be similar values, and all members of a row are assumed to be related to one another.
For instance, columns for name and password that might be used as a part of a system security
database. Each row would have the specific password associated with an individual user. Columns
of the table often have a type associated with them, defining them as character data, date or time
information, integers, or floating point numbers. This model is, incidentally, a basis of the
spreadsheet.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single upward link
in each record to describe the nesting, and a sort field to keep the records in a particular order in
each same-level list. Hierarchical structures were widely used in the early mainframe database
management systems, such as the Information Management System (IMS) by IBM, and now
describe the structure of XML documents. This structure allows one 1:N relationship between
two types of data. This structure is very efficient to describe many relationships in the real world;
recipes, table of contents, ordering of paragraphs/verses, any nested and sorted information.
However, the hierarchical structure is inefficient for certain database operations when a full path
(as opposed to upward link and sort field) is not also included for each record.
One limitation of the hierarchical model is its inability to efficiently represent redundancy in data.
Network model
The network model (defined by the CODASYL specification) organizes data using two
fundamental constructs, called records and sets. Records contain fields (which may be organized
hierarchically, as in the programming language COBOL). Sets (not to be confused with
mathematical sets) define one-to-many relationships between records: one owner, many
members. A record may be an owner in any number of sets, and a member in any number of sets.
The network model is a variation on the hierarchical model, to the extent that it is built on the
concept of multiple branches (lower-level structures) emanating from one or more nodes
(higher-level structures), while the model differs from the hierarchical model in that branches
can be connected to multiple nodes. The network model is able to represent redundancy in data
more efficiently than is the hierarchical model.
The operations of the network model are navigational in style: a program maintains a current
position, and navigates from one record to another by following the relationships in which the
record participates. Records can also be located by supplying key values.
Although it is not an essential feature of the model, network databases generally implement the
set relationships by means of pointers that directly address the location of a record on disk. This
gives excellent retrieval performance, at the expense of operations such as database loading and
reorganization.
Most Object databases use the navigational concept to provide fast navigation across networks of
objects, generally using Object Identifiers as "smart" pointers to related objects. Objectivity/DB,
for instance, implements named 1:1, 1:many, Many:1 and Many:Many named relationships that
can cross databases. Many object databases also support SQL, combining the strengths of both
models.
Relational model
The relational model was introduced in an academic paper by E. F. Codd in 1970 as a way to make
database management systems more independent of any particular application. It is a
mathematical model defined in terms of predicate logic and set theory.
The products that are generally referred to as relational databases in fact implement a model that
is only an approximation to the mathematical model defined by Codd. Three key terms are used
extensively in relational database models: relations, attributes, and domains. A relation is a table
with columns and rows. The named columns of the relation are called attributes, and the domain
is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a particular
entity (say, an employee) is represented in columns and rows (also called tuples). Thus, the
"relation" in "relational database" refers to the various tables in the database; a relation is a set of
tuples. The columns enumerate the various attributes of the entity (the employee's name, address
or phone number, for example), and a row is an actual instance of the entity (a specific employee)
that is represented by the relation. As a result, each tuple of the employee table represents
various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules to
qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be
identical tuples or rows in a table. And third, each tuple will contain a single value for each of its
attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database
model. One of the strengths of the relational model is that, in principle, any value occurring in two
different records (belonging to the same table or to different tables), implies a relationship
among those two records. Yet, in order to enforce explicit integrity constraints, relationships
between records in tables can also be defined explicitly, by identifying or non-identifying parentchild relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have
a designated single attribute or a set of attributes that can act as a "key", which can be used to
uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys are
commonly used to join or combine data from two or more tables. For example, an Employee table
may contain a column named Location which contains a value that matches the key of a Location
table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from
large tables. Any column can be a key, or multiple columns can be grouped together into a
compound key. It is not necessary to define all the keys in advance; a column can be used as a key
even if it was not originally intended to be one.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's
serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the many
people named Brown), an arbitrary or surrogate key can be assigned (such as by giving
employees ID numbers). In practice, most databases have both generated and natural keys,
because generated keys can be used internally to create links between rows that cannot break,
while natural keys can be used, less reliably, for searches and for integration with other
databases. (For example, records in two independently developed databases could be matched up
by social security number, except when the social security numbers are incorrect, missing, or
have changed.)