WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam
WFM 5201: Data Management and
Statistical Analysis
Lecture-07: Database Management System
Akm Saiful Islam
Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)
June, 2008
Slide 1
Outline
Database Management System
• Introduction to Databases
• File System vs. Databases
• Advantages of using databases
• Data Models: hierarchical, network, relational, object-oriented
• Overview of Relational Database
Slide 2
Introduction to Databases
• Information systems process and manage data.
• Data management involves the "capturing", "retrieval" and "storage" of data.
• Database Management Systems (DBMSs) are computer systems that manage data in databases.
• Today's DBMSs are based on sophisticated software and powerful computer hardware.
• Well-known DBMS software includes ORACLE, Microsoft SQL Server, Sybase and MySQL (free download), among others.
Slide 3
File Organisation
Sequential Files
- records are stored in a fixed sequence
- records can only be read in that sequence, starting from the first record
- records can only be added at the end of the file (append)
- sequential files are inefficient for random access, because reaching a record means reading every record before it
Indexed Files
- use an index to access records in a random fashion
- records can be sorted according to an attribute or preference (e.g. alphabetically, ascending, descending)
- indexed files are efficient and faster to access
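The difference between the two file organisations can be sketched in Python. This is only an illustration, not how any particular file system or DBMS builds its indexes: the record layout (one `key,name` record per line), the sample records, and the function names `sequential_lookup`, `build_index` and `indexed_lookup` are all assumptions made for the example.

```python
import io

# Hypothetical file of "key,name" records, one per line, held in memory.
data = io.BytesIO(b"101,Jones\n102,Smith\n103,Khan\n")

def sequential_lookup(f, key):
    """Read from the first record onward until the key is found."""
    f.seek(0)
    reads = 0
    while True:
        line = f.readline()
        if not line:
            return None, reads
        reads += 1
        record_key, name = line.decode().rstrip("\n").split(",")
        if record_key == key:
            return name, reads

def build_index(f):
    """Scan the file once and remember the byte offset of every record."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            return index
        index[line.decode().split(",")[0]] = offset

def indexed_lookup(f, index, key):
    """Jump straight to the record using the stored offset."""
    f.seek(index[key])
    return f.readline().decode().rstrip("\n").split(",")[1]

index = build_index(data)
print(sequential_lookup(data, "103"))      # ('Khan', 3): three records were read
print(indexed_lookup(data, index, "103"))  # Khan: one record was read
```

The index costs one full scan to build, after which any record is reached with a single seek.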
Slide 4
The File Systems Approach
[Figure: each application keeps its own file: General Ledger File, Personnel File, Production Planning File, Payroll File, Invoicing File, Inventory File, Despatch File, Order Entry File]
• Redundant data storage.
• One file is used in each application.
• No data sharing.
• Cross-application transfers are difficult to manage and achieve.
• File systems are rarely used for data processing anymore.
Slide 5
The Database Approach
[Figure: the applications General Ledger, Production Planning, Personnel, Payroll, Invoicing, Despatch, Inventory and Order Entry all share one database]
• Compactness. Data is stored in a single logical "place."
• Data can be shared and related between applications.
• Data transfer between applications is easier.
• Used for a wide range of applications.
Slide 6
Database Characteristics
• Amount
- Database size depends on the number of records or files it contains.
• Complexity
- Database complexity depends on the number of relations between the files.
• Volatility
- A measure of the changes typically required in a given period of time.
• Immediacy
- A measure of how rapidly changes must be made to data.
Slide 7
Advantages of using a Database Approach
• Flexible Data Access. DBMSs have various tools to manipulate, query, or report data, such as Structured Query Language (SQL) and report generators. Hence:
- Selected data is easily retrieved
- A DBMS can accommodate different data views for different users
• Improved Data Integrity. Modern DBMSs provide various tools and methods to:
- ensure that data is correct, consistent, and current
- verify data input and check whether data is 'reasonable'
Slide 8
Advantages of DBs (continued)
• Improved Data Security. Tools such as password access and encryption ensure that data is not:
- deliberately or accidentally damaged or changed
- accessed without proper authorisation
• Data Independence.
- Problems arising from the interdependence of data and programs are kept to a minimum.
• Reduced Data Redundancy.
- Single version of the truth.
- Efficient data storage.
- Efficient use of the time of hardware (CPU), programmers, analysts and users.
- Relational DBs use normalisation to reduce data redundancy.
Slide 9
Advantages of DBs (continued)
• Ability to Share and Relate Data.
- Different user groups can use the same data.
- Data in different (physical or logical) parts of the system can be related for a certain application.
• Standardisation of Data.
- In general, data items have common names and storage formats.
• Increased Productivity.
- The various tools reduce the complexity otherwise associated with DB maintenance when the system has to change, for example after changes in the law, the economy or the users.
Slide 10
Costs of Database Approach
The implementation and use of DBMSs normally involves various costs, such as:
- Initial expenses, including planning costs and consultancy fees.
- Computer hardware costs.
- Software costs.
- Database administrator costs and staff training costs.
- Conversion costs of an existing system.
- Various operational costs.
Slide 11
Data Models
1. Hierarchical
2. Network
3. Relational
4. Object
Slide 12
1. Hierarchical Model
• Stores data as records that are hierarchically related to each other. The records form a tree structure.
BUET
    Faculty of Civil Engineering
        CE
        WRE
    Faculty of Architecture
        URP
        Archit.
Slide 13
Hierarchical Database Model
Slide 14
Hierarchical Database Model
• Logically represented by an upside-down tree
- Each parent can have many children
- Each child has only one parent
Slide 15
Hierarchical Model
• Several records or files are hierarchically related to each other. For example, an organization has several departments, each of which has attributes such as name of director, number of staff, annual products, etc.
• Each department has several divisions, with attributes such as name of manager, number of staff, annual products, etc.
• Each division in turn has several sections, with attributes such as name of head, number of staff, number of PCs, etc.
Slide 16
Advantages and Disadvantages of Hierarchical Model
• Advantages
- High-speed access to large databases
- Easy to update (to add or delete nodes)
• Disadvantages
- Links are only possible in the vertical direction (from top to bottom), not horizontally or diagonally, unless the records share the same parent.
- For example, it is hard to find the relation between URP and DCE from this data model.
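The vertical-only links can be illustrated with a small Python sketch. It uses the BUET tree from the hierarchical model slide; storing the tree as a `parent` dictionary and the helper names `path_to_root` and `common_ancestor` are assumptions made for this example. Because every child holds exactly one upward link, two records in different branches can only be related by climbing until a shared ancestor is found.

```python
# Each child stores exactly one parent, as the hierarchical model requires.
parent = {
    "Faculty of Civil Engineering": "BUET",
    "Faculty of Architecture": "BUET",
    "CE": "Faculty of Civil Engineering",
    "WRE": "Faculty of Civil Engineering",
    "URP": "Faculty of Architecture",
    "Archit.": "Faculty of Architecture",
}

def path_to_root(node):
    """Follow the single parent link upward until the root is reached."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def common_ancestor(a, b):
    """Relate two records by finding the first ancestor they share."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None

print(common_ancestor("CE", "WRE"))  # Faculty of Civil Engineering (same parent)
print(common_ancestor("URP", "CE"))  # BUET (related only through the root)
```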
Slide 17
2. Network Database Model
• Doesn't force data into hierarchical levels
• Owner/Member relationships:
- Owner record type
- Member record type
• Each owner may have one or more member types
• Each member type and its corresponding owner record type form a set, which represents a relationship
Slide 18
Network Database Model
Slide 19
Network Database Model
- Each record can have multiple parents
- Composed of sets (relationships)
- Each set has an owner record and a member record
- A member may have several owners
- A set represents a 1:M relationship between the owner and the member
Figure 1.10
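A toy Python sketch of the owner/member idea follows. It is not modelled on any real network DBMS; the class `NetworkDB`, the set names `offers` and `teaches`, and the sample records are hypothetical. Each (set name, owner) pair owns a list of members, which gives the 1:M relationship, and the same member may appear under several owners.

```python
from collections import defaultdict

class NetworkDB:
    """Toy network model: every (set name, owner) pair owns a list of members."""

    def __init__(self):
        self.sets = defaultdict(list)

    def connect(self, set_name, owner, member):
        # One owner, many members: each set occurrence is a 1:M relationship.
        self.sets[(set_name, owner)].append(member)

    def members(self, set_name, owner):
        return list(self.sets.get((set_name, owner), []))

    def owners(self, member):
        # A member may sit in several sets, so it can have several owners.
        return sorted(owner for (_, owner), members in self.sets.items()
                      if member in members)

db = NetworkDB()
db.connect("offers", "WRE", "Hydrology")     # department owns course
db.connect("offers", "WRE", "Irrigation")
db.connect("teaches", "Dr. A", "Hydrology")  # teacher owns the same course

print(db.members("offers", "WRE"))  # ['Hydrology', 'Irrigation']
print(db.owners("Hydrology"))       # ['Dr. A', 'WRE']
```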
Slide 20
3. Relational Model
• Based on two important concepts:
- Key of relation: one to one, one to many, many to many
- Primary attribute: cannot be duplicated
Student Table * ---- * Course Table (many-to-many relationship)

Student table
StudentID  Name   CourseID
1          Mr. X  001
2          Mr. X  002
3          Mr. Y  003

Course table
CourseID  Title                Credit
001       RS & GIS in WM       3
002       Watershed Hydrology  3
003       Risk Management      3
Slide 21
Relational Database
• The relational database is the most popular model for GIS. For example, the following relational database software is widely used:
- INFO in ARC/INFO
- DBASE III for several PC-based GIS
- ORACLE for several GIS uses
• In a relational model, the following two important concepts should be defined.
• Key of relation: a subset of attributes that provides
- Unique identification: e.g. the key attributes of a phone directory are the set of last name, first name and address.
- Non-redundancy: no attribute can be dropped from the key without losing the key's uniqueness, e.g. address cannot be dropped from the telephone directory key, because there may be many people with the same name.
• Prime attribute: an attribute listed in at least one key.
• The most important point of relational database design is to build a set of key attributes with a prime attribute, so as to allow dependence between attributes as well as to avoid loss of general information when records are inserted or deleted.
Slide 22
Relational Database Model
Slide 23
Relational Database Model
Figure 1.11
Slide 24
SQL
What is it?
- Structured Query Language
- Used in ORACLE and other DB systems
- Non-procedural, i.e. specify what you want, not how to get it
- SQL (also pronounced SEQUEL) is directly related to the development of the RELATIONAL MODEL by E. F. Codd.
Slide 25
SQL
• SQL is used to perform queries on relational databases.
• For example, find the names of the students who took 6 or more credit hours in this term. Each course carries 3 credits, so the credits are summed per student:
SELECT Student.Name, SUM(Course.Credit) AS TotalCredit
FROM Student, Course
WHERE Student.CourseID = Course.CourseID
GROUP BY Student.Name
HAVING SUM(Course.Credit) >= 6
• The answer is:
Mr. X 6
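The query can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the rows are the ones shown in the Student and Course tables of the Relational Model slide, the column types and the in-memory database are choices made for the example, and the query totals the credits per student name with `GROUP BY` and `HAVING`, since every course in the sample data carries 3 credits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course  (CourseID TEXT PRIMARY KEY, Title TEXT, Credit INTEGER);
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, CourseID TEXT);
INSERT INTO Course VALUES
    ('001', 'RS & GIS in WM', 3),
    ('002', 'Watershed Hydrology', 3),
    ('003', 'Risk Management', 3);
INSERT INTO Student VALUES
    (1, 'Mr. X', '001'),
    (2, 'Mr. X', '002'),
    (3, 'Mr. Y', '003');
""")

# Sum the credits of all courses taken under each name, keep totals of 6 or more.
rows = conn.execute("""
    SELECT Student.Name, SUM(Course.Credit) AS TotalCredit
    FROM Student JOIN Course ON Student.CourseID = Course.CourseID
    GROUP BY Student.Name
    HAVING SUM(Course.Credit) >= 6
""").fetchall()

print(rows)  # [('Mr. X', 6)]
```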
Slide 26
Find the relationship between these two tables in the BUET Library

Book Table
ISBN  Title              Author
050   Applied Hydrology  David Maidment
060   Irrigation         Cheng

Borrow Table
ID  Name   ISBN
1   Mr. P  050
2   Mr. Q  060
3   Mr. R  070

One to one? Many to many? One to many?
Slide 27
Normalization of an Un-normalized
Table to relational database
Slide 28
Advantage of Relational Database
• Advantages
- There is no redundancy.
- The type of building of an owner can be changed without destroying the relation between type and rate.
- A new type of building, for example "Clay", can be inserted (row insert is easy).
• Disadvantages
- Requires a number of tables and relationships.
- It is difficult to add a new column to a table.
Slide 29
4. Object Databases
• Current-generation systems need to handle complex data for complex applications such as
- computer aided design
- computer aided software engineering
- geographic information systems
- interactive web sites
• Relational systems are inadequate for these systems
- Why do you think this is?
Slide 30
Object Database Types
• Object-oriented
- extend a programming language such as Java with persistency and a query language
• Object-relational
- extend a current RDBMS (e.g. Oracle) with object-oriented extensions
Slide 31
Object Oriented Model
[Figure: BUET at the top; Departments and Institutes are each "part of" BUET; CE, URP, DCE, IWFM, AIT and WRE are each linked by an "is a" relationship to Departments or Institutes]
Is a = inheritance
Part of = association
Attributes: Faculty, Staff, Students
Slide 32
Object Oriented Database
• An object-oriented model uses functions to model spatial and non-spatial relationships of geographic objects and their attributes.
• An object is an encapsulated unit which is characterized by attributes, a set of orientations and rules. An object-oriented model has the following characteristics.
- Generic properties: there should be an inheritance relationship.
- Abstraction: objects, classes and super classes are to be generated by classification, generalization, association and aggregation.
- Ad hoc queries: users can order spatial operations to obtain spatial relationships of geographic objects using a special language.
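Inheritance ("is a") and association ("part of") can be sketched with ordinary Python classes. The class names `AcademicUnit`, `Department`, `Institute` and `University`, the choice of a shared superclass, and all the head counts are hypothetical; only the attribute names (faculty, staff, students) and the unit names come from the slides.

```python
class AcademicUnit:
    """Hypothetical superclass: attributes shared by every unit."""

    def __init__(self, name, faculty, staff, students):
        self.name = name
        self.faculty = faculty
        self.staff = staff
        self.students = students

    def headcount(self):
        return self.faculty + self.staff + self.students

class Department(AcademicUnit):   # "is a": inherits attributes and methods
    kind = "Department"

class Institute(AcademicUnit):    # "is a": inherits attributes and methods
    kind = "Institute"

class University:
    def __init__(self, name):
        self.name = name
        self.units = []           # "part of": units associated with the university

    def add(self, unit):
        self.units.append(unit)

    def total_students(self):
        return sum(unit.students for unit in self.units)

buet = University("BUET")
buet.add(Department("WRE", faculty=10, staff=5, students=100))  # made-up figures
buet.add(Institute("IWFM", faculty=8, staff=4, students=40))    # made-up figures

print(buet.total_students())                     # 140
print([unit.kind for unit in buet.units])        # ['Department', 'Institute']
```

The data (attributes) and the behaviour (`headcount`) travel together in one encapsulated unit, which is the main contrast with a plain table row.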
Slide 33
Example of Object Oriented Model
Slide 34
5. Object-Relational Database Model
• Object-relational database management systems (ORDBMS):
- Combine:
    - Ability of object technology to handle advanced relationship types
    - Data integrity, reliability, and recovery features of relational models
- Most popular and powerful of modern database system applications
    - Oracle, Microsoft SQL Server
Slide 35
Object-Relational Database Table
Slide 36
Overview of
Relational Database
Slide 37
What is a Relational Database?
• A database is more than just a collection of information, such as student and course information, faculty and grades.
• A database is a representation of the people and things your business needs to operate, and the way those people and things relate to each other.
• A database system supports the business rules defined by the customer.
Slide 38
Logical to Physical Database Design
• The entities in the logical data model are translated into tables in the physical database design
• The entity attributes become columns of each table in the database
- Data type (numeric, character, date)
- Business rules for the legal values for the column (the domain of the column)
Slide 39
Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using the given data model.
• The relational model of data is the most widely used model today.
- Main concept: relation, basically a table with rows and columns.
- Every relation has a schema, which describes the columns, or fields.
Slide 40
Example: University Database
• Conceptual schema:
- Students(sid: string, name: string, login: string, age: integer, gpa: real)
- Courses(cid: string, cname: string, credits: integer)
- Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
- Relations stored as unordered files.
- Index on first column of Students.
• External schema (view):
- Course_info(cid: string, enrollment: integer)
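The three schema levels can be tried out with Python's `sqlite3` module. This is a sketch under assumptions: the mapping of `string` to `TEXT` and `real` to `REAL`, the index name `idx_students_sid`, the definition of `Course_info` as a count of enrolled students per course, and the sample `cid` and `grade` values are all choices made for the example rather than anything stated on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations.
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: an index on the first column of Students.
CREATE INDEX idx_students_sid ON Students (sid);

-- External schema: a view exposing only the enrollment count per course.
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical enrollments (course IDs and grades are made up).
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("53666", "C101", "A"), ("53688", "C101", "B"), ("53650", "C102", "A")],
)

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)  # [('C101', 2), ('C102', 1)]
```

A user of `Course_info` sees enrollment counts without seeing grades or student IDs, and the index can be dropped or changed without touching either the tables or the view.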
Slide 41
Instance of Students Relation
Students(sid: string, name: string, login: string, age: integer, gpa: real)

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@ee    18   3.2
53650  Smith  smith@math  19   3.8
Slide 42
Levels of Abstraction
• Many external schemata, single conceptual (logical) schema and physical schema.
- External schemata describe how users see the data.
- Conceptual schema defines logical structure.
- Physical schema describes the files and indexes used.
[Figure: External Schema 1, External Schema 2 and External Schema 3 sit above the Conceptual Schema, which sits above the Physical Schema]
* Schemas are defined using DDL; data is modified/queried using DML.
Slide 43
Database Terminology
• Tables within a relational database hold sets of data using rows and columns
• Rows (records) appear horizontally in a report, and contain one or more columns
• Columns (fields) are named data elements and appear vertically in a report
• Primary keys uniquely identify each row
• Indexes are created for faster access to the data in the database
Slide 44
Basic Database Concepts
• Table: a set of related records
• Record: a collection of data about an individual item
- Name: Barry Harris; College: Medicine; Tel: 392-5555
• Field: a single item of data common to all records
- Name: Barry Harris
Slide 45
An Example of a Table
(columns are fields, rows are records)

Name     GatorLink  Phone     College
Graff    rgraff     392-3900  Pharmacy
Harris   bharris    392-5555  Medicine
Ipswich  zipswich   846-5656  PHHP
Slide 46
Different parts of a database
- Fields: different types of data (number or text)
- Records
- Queries
- Reports
Slide 47
Concepts of Relational Database
• Based on two important concepts:
- Key of relation: one to one, one to many, many to many.
- Primary attribute: cannot be duplicated
Slide 48
Primary Key
• The column or set of columns that provides uniqueness for the row.
• A table can have only one primary key.
• Existing values in primary key columns may not be modified (insert the new value and then delete the old value).
• The table of a relationship containing the primary key is called the Parent Table.
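How a DBMS enforces primary key uniqueness can be seen with Python's `sqlite3` module. This is an illustrative sketch: the table name `Person`, the column name `UserName` and the choice of that column as the primary key are assumptions, and the second row is deliberately made up to collide with the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Person (
    UserName TEXT PRIMARY KEY NOT NULL,  -- unique and never blank
    Name     TEXT,
    Phone    TEXT,
    College  TEXT
)""")
conn.execute("INSERT INTO Person VALUES ('bharris', 'Harris', '392-5555', 'Medicine')")

try:
    # Same key value again: the DBMS refuses the row.
    conn.execute("INSERT INTO Person VALUES ('bharris', 'Harris', '000-0000', 'Pharmacy')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```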
Slide 49
Foreign Keys
• A column (or set of columns) in one table that references the primary key of another table is called a foreign key.
• For each foreign key value, there must be a row in the referenced table whose primary key has the same value.
• The foreign key can be made up of one or more columns of a table but must match the primary key it is referencing.
• A table can have any number of foreign keys.
Slide 50
Primary Keys & Foreign Keys

Name     User      Phone     College
Graff    rgraff    392-3900  Pharmacy
Harris   bharris   392-5555  Medicine
Ipswich  zipswich  846-5656  PHHP

• To ensure that each record is unique in each table, we can set one field to be a Primary Key field.
• A Primary Key is a field that will contain no duplicates and no blank values.
• Foreign Keys link to data in other tables.
Slide 51
Relationship Types
• One-to-One: relationship is single-valued in both directions
- A manager manages one department; a department has only one manager.
• One-to-Many: relationship is multi-valued in one direction; one row in the parent table is associated with many rows in the dependent table.
- One department has many employees.
• Many-to-Many: relationships are multi-valued in both directions. This type of relationship can be expressed in a table with a column for each entity (crosswalk table).
- An employee can work on more than one project, and a project can have more than one employee assigned. Employee, Project, and Employee/Project tables.
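The many-to-many case can be sketched with Python's `sqlite3` module. The column names (`emp_id`, `proj_id`, `name`, `title`), the table name `Employee_Project` and the sample rows are hypothetical; the point is that the crosswalk table holds one row per (employee, project) pair, so it can be read in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (emp_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE Project  (proj_id INTEGER PRIMARY KEY, title TEXT);

-- Crosswalk table: one column per entity, one row per assignment.
CREATE TABLE Employee_Project (
    emp_id  INTEGER REFERENCES Employee (emp_id),
    proj_id INTEGER REFERENCES Project (proj_id),
    PRIMARY KEY (emp_id, proj_id)
);

INSERT INTO Employee VALUES (1, 'A'), (2, 'B');
INSERT INTO Project  VALUES (10, 'P1'), (20, 'P2');
INSERT INTO Employee_Project VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many projects.
projects_of_a = [row[0] for row in conn.execute("""
    SELECT p.title FROM Employee_Project ep
    JOIN Project p ON p.proj_id = ep.proj_id
    WHERE ep.emp_id = 1 ORDER BY p.title""")]

# One project, many employees.
staff_on_p1 = [row[0] for row in conn.execute("""
    SELECT e.name FROM Employee_Project ep
    JOIN Employee e ON e.emp_id = ep.emp_id
    WHERE ep.proj_id = 10 ORDER BY e.name""")]

print(projects_of_a)  # ['P1', 'P2']
print(staff_on_p1)    # ['A', 'B']
```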
Slide 52
Data Integrity
• For a table to have Domain Integrity
- the value of each column of data is meaningful and acceptable in the business environment, and passes all the edits we impose on it.
• For a table to have Association Integrity
- the relationship between two or more columns in that table satisfies a pre-defined business association.
• For a table to have Referential Integrity
- referential constraints between tables must be enforced at all times by the Relational Database Management System
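Domain and association integrity can be approximated with `CHECK` constraints, shown here through Python's `sqlite3` module. Everything specific in this sketch is an assumption: the table `Offering`, its columns, the rule that credits lie between 1 and 6, and the rule that a course cannot end before it starts are invented business rules used only to show the mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Offering (
    course_id  TEXT PRIMARY KEY,
    -- Domain integrity: a single column must hold an acceptable value.
    credit     INTEGER NOT NULL CHECK (credit BETWEEN 1 AND 6),
    start_date TEXT NOT NULL,
    end_date   TEXT NOT NULL,
    -- Association integrity: two columns of the same row must agree.
    CHECK (start_date <= end_date)
)""")

def accepts(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Offering VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

valid_row = accepts(("WFM5201", 3, "2008-06-01", "2008-09-30"))
bad_credit = accepts(("BAD-CREDIT", 0, "2008-06-01", "2008-09-30"))
bad_dates = accepts(("BAD-DATES", 3, "2008-09-30", "2008-06-01"))

print(valid_row, bad_credit, bad_dates)  # True False False
```

Dates are stored as ISO-formatted text here so that plain string comparison orders them correctly.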
Slide 53
Relational Database Referential Integrity
Student table: *Student_ID
Student_Course table: *Student_ID (FK), *Course_Number, *Course_Ind
• For Referential Integrity, the foreign key must match a value in the primary key of the parent table, at all times.
• In this example, the Student table has a *Primary Key, Student_ID. The Student_Course table has a 3-column *Primary Key, and also has a Foreign Key (FK) of Student_ID that references the Student table.
• There must never be a Student_ID in the Student_Course table that does not exist in the Student table first.
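The Student / Student_Course example can be reproduced with Python's `sqlite3` module. The column types, the helper name `enroll` and the sample values (`WFM5201`, `A`, student IDs 1 and 99) are assumptions for illustration. Note that SQLite only checks foreign keys on connections where `PRAGMA foreign_keys = ON` has been issued; other DBMSs differ in their defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Student (Student_ID INTEGER PRIMARY KEY)")
conn.execute("""
CREATE TABLE Student_Course (
    Student_ID    INTEGER NOT NULL REFERENCES Student (Student_ID),
    Course_Number TEXT NOT NULL,
    Course_Ind    TEXT NOT NULL,
    PRIMARY KEY (Student_ID, Course_Number, Course_Ind)
)""")

def enroll(student_id, course_number, course_ind):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Student_Course VALUES (?, ?, ?)",
            (student_id, course_number, course_ind),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO Student VALUES (1)")
known_student = enroll(1, "WFM5201", "A")     # parent row exists
unknown_student = enroll(99, "WFM5201", "A")  # no Student row with ID 99

print(known_student, unknown_student)  # True False
```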
Slide 54
Database Options
Consumer
- Flat Files
- Microsoft Excel (limit of 65,536 rows in Excel 2003 and earlier)
- Microsoft Access
- FileMaker Pro
- MySQL (Open Source)
- Postgres (Open Source)
Enterprise RDBMS
- Oracle
- IBM/DB2
- MS SQL Server
- Sybase
- Informix
- Lotus Notes
- MySQL (Open Source)
- Postgres (Open Source)
Slide 55