Download Views

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Entity–attribute–value model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Functional Database Model wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Introduction to
Database
Systems
1
© Michael Lang, National University of Ireland, Galway
Today’s session
• Some history
• Origins of DB
• Organisational structure
• Organisational information systems
• Data storage
• DBMS
• Data models
2
© Michael Lang, National University of Ireland, Galway
What is “Data”?
• ANSI definition:

Data
 A representation of facts, concepts, or instructions in a
formalized manner suitable for communication, interpretation,
or processing by humans or by automatic means.
 Any representation such as characters or analog quantities to
which meaning is or might be assigned. Generally, we
perform operations on data or data items to supply some
information about an entity.
• Volatile vs. persistent data

3
Our concern is primarily with persistent data
© Michael Lang, National University of Ireland, Galway
Early Data Management Ancient History
• Data are not stored on disk
• Programmer defines both logical data structure and physical
structure (storage structure, access methods, I/O modes,
etc)
• One data set per program. High data redundancy.
PROGRAM 1
Data
Management
DATA SET 1
PROGRAM 2
Data
Management
DATA SET 2
PROGRAM 3
Data
Management
4
DATA SET 3
© Michael Lang, National University of Ireland, Galway
Problems
• There is no persistence.

All data is transient and disappears when the
program terminates.
• Random access memory (RAM) is expensive and
limited

All data may not fit available memory
• Programmer productivity low

5
The programmer has to do a lot of tedious work.
© Michael Lang, National University of Ireland, Galway
•
File Processing Recent (and Current)
History
Data are stored in files with interface between programs and
files.
• Various access methods exist (e.g., sequential, indexed,
random)
• One file corresponds to one or several programs.
Data
Management
PROGRAM 2
Data
Management
PROGRAM 3
FILE 1
File
System
Services
FILE 2
Redundant Data
PROGRAM 1
Data
Management
6
© Michael Lang, National University of Ireland, Galway
File System Functions
• Mapping between logical files and physical files

Logical files: a file viewed by users and programs.
• Data may be viewed as a collection of bytes or as a collection
of records (collection of bytes with a particular structure)
• Programs manipulate logical files

Physical files: a file as it actually exists on a storage
device.
• Data usually viewed as a collection of bytes located at a
physical address on the device
• Operating systems manipulate physical files.
• A set of services and an interface (usually called
application independent interface – API)
7
© Michael Lang, National University of Ireland, Galway
Problems With File
Systems
• Data are highly redundant

sharing limited and at the file level
• Data is unstructured

“flat” files
• High maintenance costs

data dependence; accessing data is difficult

ensuring data consistency and controlling access
to data
• Sharing granularity is very coarse
• Difficulties in developing new applications
8
© Michael Lang, National University of Ireland, Galway
Database Approach
PROGRAM 1
DBMS
Query Processor
PROGRAM 1
Transaction Mgr
…
Integrated
Database
PROGRAM 2
9
© Michael Lang, National University of Ireland, Galway
Origins of Databases
• c1455: Gutenberg invents printing. Explosive interest
in publication of books (analogous with explosive
growth of Web in early 1990s) leads to public
libraries
• Libraries were first to introduce standards for
information storage and retrieval
• These paper-based systems were extended and
enhanced, and filing, indexing and classification
schemes were developed
10
• Second World War: accelerated R&D in computing
technologies spawned capability to computerise
maintenance of records
© Michael Lang, National University of Ireland, Galway
Computerised Data Storage
• Advantages of computerised data storage over
paper-based systems include:
11

ability to store data compactly (e.g Britannica CD)

enhanced data retrieval

ability to access data remotely, e.g. from a mobile
workstation, off-site location, or distant branch

ability to share data amongst multiple users with
concurrent access

facility to automate regular, speedy back-ups

enhanced data editing
• Most significant disadvantage is vulnerability e.g
system crash, corrupted data, viruses, hackers, etc.
© Michael Lang, National University of Ireland, Galway
Information: A Corporate
Asset
• Information is a vital corporate asset. Without
accurate, current, relevant information, mistakes and
misjudgements may be made
• Data management is an essential capability in the
modern business environment / information society
• A knowledge organisation is one in which the
primary asset is information; its competitive
advantage is derived from effective use of
documented knowledge. Examples: accounting
firms, marketing companies, software houses
12
• Organisational memory extends and amplifies
information / knowledge by capturing, organising,
disseminating, and reusing it
© Michael Lang, National University of Ireland, Galway
Organisational Memory
Organisational Memory
Database
(logical grouping of related files)
File / Table
(collection of related records)
Record
(collection of related fields, bound together as single units)
Field
(part of a record reserved for a particular data item)
Byte
(group of eight binary bits - can represent a single character)
13
© Michael Lang, National University of Ireland, Galway
Organisational Memory
Organisational
memory
Data
Informed
decisions
Improved
products and
services
14
© Michael Lang, National University of Ireland, Galway
Attributes of Organisational Memory
• Current
• Timely

real-time systems are commonplace in modern
business environment
• Relevant

data is only useful if relevant to task in hand
• Shareable
• Complete
• Accurate and consistent
15
© Michael Lang, National University of Ireland, Galway
Attributes of Organisational Memory
• Transportable

authorised personnel should have access to data
anywhere, anytime
• Secure against unauthorised access
16
© Michael Lang, National University of Ireland, Galway
Organisational Information Systems
• Generally, there is an alignment between business
units and core operational systems
• Typical core systems are:
17

Sales & Marketing Department: Customer
management system, Order processing system

Operations Unit: Purchasing system, Inventory
control system

Finance Department: Accounts payable and
receivable systems, Credit Management system
• There are interdependencies between these
systems; hence the need for an integrated data
management approach
© Michael Lang, National University of Ireland, Galway
Data Storage
• Information systems create, read, update and
delete data
• Data can be stored in conventional files or
databases

Files are collections of similar records

Databases are collections of interrelated files.
• Records can be linked through specified relationships to
records in other files
18
© Michael Lang, National University of Ireland, Galway
Conventional Files
• In the file environment, data
storage is built around the
applications that will use
the files
• Essentially, the file
“belongs” to a specific
application. This is termed
program-data dependence
File
19
• As applications are
developed, customised files
are created which may be
unusable by other
applications
File
Information
System
File
© Michael Lang, National University of Ireland, Galway
File
Conventional Files
• First attempts at computerised storage of records
followed traditional paper-based metaphors (“Flat
file” systems)
• Flat files were inefficient for data retrieval: it might be
necessary to search entire file for a record (which, it
may transpire, does not exist). Remedy: index files
• Indexing improved data retrieval, but conventional
files have other disadvantages:
20

Program-Data dependence

Proprietary file formats (closed systems)

Poor scalability
© Michael Lang, National University of Ireland, Galway
Conventional Files
• Disadvantages (Cont’d):

Duplicated and redundant data
• ambiguity: same thing being referred to by different names in
different places
• inconsistency: conflicting / unsynchronised data
• wasted effort

Separation and isolation of data
• data dispersed amongst many files complicates processing

Inflexibility
• cumbersome data structures and report layouts
• not responsive to ad hoc queries
• excessive program maintenance

21
Development environment
• procedural -v- non-procedural (3GL -v- 4GL)
© Michael Lang, National University of Ireland, Galway
Conventional Files
• Advantages:

Historically, conventional files have been faster to
process than DBMS applications
• As legacy file-based systems become candidates for
reengineering, the trend is to replace them with
database systems
22
© Michael Lang, National University of Ireland, Galway
Database Management
Systems
• A database is a large,
integrated collection of data
which models a real-world
enterprise
Information
System
• A Database
Management System
(DBMS) is a software
package designed to store
and manage databases
23
• In a DBMS environment,
applications are built around
an integrated adaptable
database
Information
System
Database
Information
System
Information
System
© Michael Lang, National University of Ireland, Galway
Database Management
Systems
• Advantages:
24

ability to share the same data across multiple
applications and systems

data independence

control of redundancy

enforced data integrity

improved data security

uniform data administration

concurrent access

improved backup and recovery facility

flexible data structures
© Michael Lang, National University of Ireland, Galway
Database Management
Systems
• Advantages (Cont’d):
25

databases allow the use of the data in ways not
originally specified by the end-users (ad hoc
queries)

database definition can be extended without
impacting existing programs that use it

economies of scale
© Michael Lang, National University of Ireland, Galway
Database Management
Systems
• Disadvantages:

database technology is more complex than file
technology
• requires more sophisticated hardware and software (DBMS)


DBMS’s can still be slower than file-based systems
database technology requires a significant
investment
• database administration
• operating costs and ongoing maintenance
• end-user training

26
higher impact of system failure
© Michael Lang, National University of Ireland, Galway
Database Management
Systems
• Roles in a DBMS environment
27

Data Administrator

Database Administrator

Database Designer

Application Programmer

End-User
© Michael Lang, National University of Ireland, Galway
Database Architectures
• Hierarchical Data Model
• Network Data Model
• Relational Data Model
• Object-Relational Model
28
© Michael Lang, National University of Ireland, Galway
Data Relationships
• One-to-One (1:1)

Example: Bank Manager manages one and only
one Bank Branch; Bank Branch is managed by
one and only one Bank Manager
• One-to-Many (1:M)

Example: An Employee works in a designated
Department; in any Department, there may be
many Employees
• Many-to-Many (M:M)

29
Example: A Student registers for one or more
Courses; for any Course, there may be one or
more registered Students
© Michael Lang, National University of Ireland, Galway
Hierarchical Data Model
30
© Michael Lang, National University of Ireland, Galway
Hierarchical Data Model
• Arose as a solution to a practical problem

Managing millions of parts for the space program
(standard “Bill of Materials” problem)
• The basic structure is a hierarchy or tree

Parent-child relationship
• Relationships are represented by pointers
• Restrictions:
31

Each segment has at most one parent

All relationships are 1:M
© Michael Lang, National University of Ireland, Galway
Hierarchical Data Model
• Problem: how to represent a M:M relationship ?


32
A hierarchical structure (tree) can only support
1:M relationships. Therefore, to represent M:M,
must create multiple hierarchies
… but, this means that records are duplicated in
different hierarchies

duplication gives rise to data anomalies

duplication can be eliminated using pointers;
must decide in which table to store data, and in
which table to store pointer. This is a very
awkward means of implementing M:M
© Michael Lang, National University of Ireland, Galway
Network Data Model
33
© Michael Lang, National University of Ireland, Galway
Network Data Model
• Objective was to overcome shortcomings of the
hierarchical data model, in particular, representation of
M:M relationship
• Like the hierarchical model, it can be likened to trees;
unlike hierarchical model, several trees can share
branches
• In practice, enjoyed little commercial success
 Too complex: suited to use by programmers as
opposed to end-users
34

Overtaken by the relational model

No clear theoretical base
© Michael Lang, National University of Ireland, Galway
Network Model: Structures
• Data item

a field or attribute
• Record

•
•
•
•
a collection of data items
Vectors (repeating structures) are permitted
Relationships are represented by sets
Sets have owners (parent) and members (children)
A member cannot have two parents in the one set

cannot directly represent M:M relationships
• Member records of a set can be ordered
35
© Michael Lang, National University of Ireland, Galway
Shortcomings of Early Models
• Languages of both hierarchical and network are
procedural and record-at-a-time
• To retrieve data …

you must navigate (find a path) to the required
record

issue multiple statements directing the system to
traverse that path
• It was necessary to issue multiple requests to the
DBMS to retrieve a data item
• It was necessary to have a detailed understanding of
how data was stored and structured

36
this is contrary to data independence principle
© Michael Lang, National University of Ireland, Galway
Relational Data Model
• Relational model, developed by E. F. Codd, has a strong
theoretical basis and overcomes shortcomings of network /
hierarchical models
• Relational model is non-procedural (What?, not How?), and
set-at-a-time
• Navigation is automatic
• Relations

2-dimensional data set consisting of N columns (fields /
attributes) and M rows (records)
• All rows in a relation are unique
• A relational database is a set of relations
37
© Michael Lang, National University of Ireland, Galway
Relational Model:
Structures
38
© Michael Lang, National University of Ireland, Galway
Relational Model:
Structures
39
© Michael Lang, National University of Ireland, Galway
Relational Model:
Structures
• Primary key
 A unique identifier of a row in a relation; can be
composite
• Candidate key
 An attribute that could be a primary key
• Alternative key
 A candidate key that is not selected as the primary key
• Foreign key
 An attribute of a relation that is the primary key of
another relation; can be composite
40
© Michael Lang, National University of Ireland, Galway
Relational Algebra and SQL
• A major strength of the relational model is that it
supports SQL, a flexible data retrieval language
which facilitates ad hoc queries
• Relational algebra is a standard for judging a data
retrieval language
Relational Algebra
Restrict
A where condition SELECT * FROM A
WHERE condition
Project
Product
A [X]
A times B
SELECT X FROM A
SELECT * FROM A, B
Union
A union B
SELECT * FROM A UNION
SELECT * FROM B
Difference
41
SQL
A minus B
SELECT * FROM A
WHERE NOT EXISTS
(SELECT * FROM B WHERE
A.X = B.X AND A.Y=B.Y AND ...
© Michael Lang, National University of Ireland, Galway
Relational Databases
• A truly relational database supports
 structures (domains and relations)

integrity rules

a manipulation language
• The word “relational” is sometimes used too freely; many
commercial systems are not fully relational because they do not
support domains and integrity rules
• E. F. Codd has set forward 12 rules that a database must satisfy
before it can be said to be truly relational
42
© Michael Lang, National University of Ireland, Galway
Codd’s 12 Rules
• Information representation

All data, including metadata (data definition, constraints,
user names, etc.), is represented solely and explicitly as
values in a table
• Guaranteed access

Every value in a database is accessed by specifying a
combination of table name, column name, and value of the
primary key of the row in which it is stored

No artificial paths, such as linked lists
• Systematic treatment of null values
43

There must be a distinct representation for unknown data,
irrespective of data type

Null values are not equivalent to zero or the empty string
© Michael Lang, National University of Ireland, Galway
Codd’s 12 Rules
• Dynamic on-line catalog based on the relational model

database description is represented at the logical level in the
same way as ordinary data

thus, only one manipulation language needs to be learned
• Comprehensive data sublanguage rule

A relational system may support several languages and
various modes of terminal use

However, there must be at least one language that supports
data definition, data manipulation, security and integrity
constraints, and transaction processing operations
• View updating

44
If the base tables of a view are updated, then the view itself
should be updated
© Michael Lang, National University of Ireland, Galway
Codd’s 12 Rules
• High-level insert, update, and delete

The system must support set-at-a-time operations; for
example, a set of rows can be deleted by a single statement
• Physical data independence

Changes to storage representation or access methods will
not affect application programs
• Logical data independence

45
Implementation of changes to base tables will not affect
application programs; for example, if a table is restructured
or split, applications should be immune to change (views are
beneficial here)
© Michael Lang, National University of Ireland, Galway
Codd’s 12 Rules
• Integrity independence

Integrity constraints should be part of a database's
definition rather than embedded within application programs

It must be possible to change integrity constraints without
affecting any existing application programs
• Distribution independence

Introduction of a distributed DBMS or redistributing existing
distributed data should have no impact on existing
applications
• Nonsubversion

46
It must not be possible to use low-level record-at-a-time
interface to by-pass high-level set-at-a-time security or
integrity constraints
© Michael Lang, National University of Ireland, Galway
Codd’s Rule 0
• A relational DBMS must be able to manage
databases entirely through its relational capacities
• A DBMS is either totally relational or it is not
relational
47
© Michael Lang, National University of Ireland, Galway
Tasks in the seminars 1 - 3
• In order to do MMDB you need to be able to use
SQL – self study sessions with tutor support
48
© Michael Lang, National University of Ireland, Galway