Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
INFORMATION SYSTEMS @ X
Chapter 13
Database Design
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Learning Objectives
Describe the differences and similarities between
relational and object-oriented database management
systems
Design a relational database schema based on an
entity-relationship diagram
High Level
Detailed
Design an object database schema based on a class
diagram
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Learning Objectives (continued)
Design a relational schema to implement a hybrid
object-relational database
Describe the different architectural models for
distributed databases
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Overview
This chapter describes design of relational and OO
data models
Developers transform conceptual data models into
detailed database models
Entity-relationship diagrams (ERDs) for traditional analysis
Class diagrams for object-oriented (OO) analysis
Detailed database models are implemented with
database management system (DBMS)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Databases and Database
Management Systems
Databases (DB) – integrated collections of stored
data that are centrally managed and controlled
Database management system (DBMS) – system
software that manages and controls access to
database
Databases described by a schema – description of
structure, content, and access controls
XML: Data and Schema are integrated
INFO425: Systems Design
INFORMATION SYSTEMS @ X
XML Example
Elements
<PROPERTYLIST>
<PROPERTY>
<PROPERTYNO>PA14</PROPERTYNO>
<ADDRESS>
<STREET>16 Holland</STREET>
<CITY>Aberdeen</CITY>
<POSTCODE>AB7 5SU</POSTCODE>
</ADDRESS>
<RENT>650</RENT>
</PROPERTY>
</PROPERTYLIST>
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database Models
Impacted by technology changes since 1960s
Model types
Hierarchical
Network
Relational
Object-oriented
Most current systems use relational and (rarely)
object-oriented data models
INFO425: Systems Design
INFORMATION SYSTEMS @ X
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Network Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Multidimensional Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object Oriented Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Components of a DB and DBMS
Virtually all actions in a
relational DB managed
through Structured Query
Language
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Important DBMS Capabilities
Simultaneous access by multiple users and
applications
Access to data without application programs (via a
query language)
Organizational data management with uniform
access and content controls
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Objectives of SQL
Ideally, database language should allow user to:
create the database and relation structures;
perform insertion, modification, deletion of data;
perform simple and complex queries.
Should perform these tasks with minimal user effort and
command structure/syntax must be easy to learn.
Should be portable.
The relational model provides Structured Query Language
(SQL) to meet the above objectives
INFO425: Systems Design
INFORMATION SYSTEMS @ X
SQL Benefits
SQL is relatively easy to learn:
it is non-procedural - you specify what information you require, rather
than how to get it;
it is essentially free-format.
Relatively easy to learn and use – but also very powerful
Consists of standard English words:
CREATE TABLE Staff(staffNo VARCHAR(5),
lName VARCHAR(15),
salary DECIMAL(7,2));
INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
INFO425: Systems Design
INFORMATION SYSTEMS @ X
SQL
SQL has 3 major components:
A Data Definition Language (DDL) for defining database
structure.
A Data Manipulation Language (DML) for retrieving and updating
data.
Commands to control access to data (security)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational Databases
Relational database management system (RDBMS)
organizes data into tables or relations
Tables are two dimensional data structures
Tuples – rows or records – represents a specific occurrence
of an entity
Fields – columns or attributes
Tables have primary key field(s) that can be used
to identify unique records
Keys relate tables to each other… foreign keys
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Partial Display of Relational Database Table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database design
High Level – “logical”
Detailed – “physical”
INFO425: Systems Design
INFORMATION SYSTEMS @ X
High Level Design
Create table for each entity type
Choose or invent primary key for each table
Add foreign keys to represent one-to-many
relationships
Create new tables to represent many-to-many
relationships
INFO425: Systems Design
INFORMATION SYSTEMS @ X
High Level Design
Define referential integrity constraints
Evaluate schema quality and make necessary
improvements…. normalization
Choose appropriate data types and value
restrictions (if necessary) for each field
Valid values
Valid ranges
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RMO Entity-Relationship Diagram
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relationship Between Data in Two Tables
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships
Relational databases use foreign keys to represent
relationships
One-to-many relationship
Add primary key field of “one” entity type as foreign key in
table that represents “many” entity type
Many-to-many relationship
Use the primary key field(s) of both entity types
Use (or create) an associative entity table to represent
relationship
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Entity Tables with Primary Keys
INFO425: Systems Design
(Figure 12-7)
INFORMATION SYSTEMS @ X
Represent One-to-Many Relationships by
Adding Foreign Keys (in italics) (Figure 12-8)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Enforcing Referential Integrity
Consistent relational database state
Every foreign key value also exists as a primary
key value
DBMS enforces referential integrity automatically
after schema designer identifies primary and
foreign keys
INFO425: Systems Design
INFORMATION SYSTEMS @ X
DBMS Referential Integrity Enforcement
When rows containing foreign keys are created
DBMS ensures that value also exists as a primary key in a
related table
When row is deleted
DBMS ensures no foreign keys in related tables have same
value as primary key of deleted row
When primary key value is changed
DBMS ensures no foreign key values in related tables contain
the same value
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RI in DDL
RI is defined
when defining
the foreign keys
associated with
a table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Evaluating Schema Quality
High-quality data model has
Uniqueness of table rows and primary keys
Ease of implementing future data model changes (flexibility
and maintainability)
Lack of redundant data (database normalization)
Database design is not objective or quantitatively
measured; it is experience and judgment based
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Normalization
A good data model:
Is simple – all attributes in an entity describe only that entity.
Is non-redundant. All attributes other than foreign keys
describe one entity
Is flexible and adaptive.
Normalization is a data analysis technique that
organizes data attributes such that they are grouped
to form non-redundant, stable, flexible, and adaptive
entities.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database Normalization
Normal forms minimize data redundancy
First normal form (1NF) – no repeating fields or groups of
fields
Functional dependency – one-to-one relationship between the
values of two fields
2NF – in 1NF and if each non-key element is functionally
dependent on entire primary key
3NF – in 2NF and if no non-key element is functionally
dependent on any other non-key element
INFO425: Systems Design
INFORMATION SYSTEMS @ X
First Normal Form
The first normal form states
that each field is granular and
there are no repeating groups
AND…
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Second/Third Normal Form
INFO425: Systems Design
The second normal form states that
each field in a multiple field primary
key table must be directly related to the
entire primary key. Third normal form
is the same as second normal form
except that it only refers to tables that
have a single field as their primary
key.
INFORMATION SYSTEMS @ X
Exercise
Define what would need to be done place data in
first normal form (look carefully at the data)
Place this data in 3rd normal form
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Detailed Design
Design physical representation
Analyze transactions
Choose file organizations
Choose indexes
Estimate disk space requirements
Design user views
Design security mechanisms
Consider the introduction of controlled
redundancy
Monitor and tune the operational system
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Analyze Transactions
Assume that a transaction = use case
Use this information to identify the parts of the
database that may cause performance problems.
To select appropriate file organizations and indexes,
also need to know high-level functionality of the
transactions, such as:
attributes that are updated in an update transaction
criteria used to restrict rows that are retrieved in a query.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Analyze Transactions
Often not possible to analyze all expected
transactions, so investigate most ‘important’ ones –
use Paretto Principle
To help identify which transactions to investigate,
can use:
transaction/relation cross-reference matrix, showing relations
that each transaction accesses, and/or
Access frequency analysis.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Cross-referencing transactions and relations
Which relations require the most attention?
INFO425: Systems Design
INFORMATION SYSTEMS @ X
(2) Determine frequency information
We need to identify the key transactions that
access those relations
Key information includes:
Data volumes anticipated in each relation
Average and maximum number of times per period that a
transaction executes
Peak access times for each transaction
Why is peak access time important to know?
INFO425: Systems Design
INFORMATION SYSTEMS @ X
(2) Determine frequency information
Peak times are key to understanding if
transactions conflict; that is, have similar peak
times. If so, potential for performance issues is
greater.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Transaction usage map for some sample
transactions
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Choose File Organizations
Purpose: To determine an efficient file
organization for each table.
Determines how records physically ordered on disk
File organizations type: unordered, ordered, hash.
Rule of Thumb: Use DBMS default organization
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Choose Indexes
Purpose: To determine whether adding indexes will
improve the performance of the system.
Need to consider both primary and secondary
indexes
Primary Index – how data will be physically ordered on disk
Secondary Index – additional indexes created to speed access
to data
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Why do indexes improve performance?
Without an index, need to scan tables
Index = table with very few attributes.
Use the index to find the record(s) that meet query
criteria.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Estimate Disk Space Requirements
Purpose: To estimate the amount of disk space that
will be required by the database.
We need to do this step to ensure that we have sufficient
hardware and disk capacity both now and in the future to
manage the database.
To calculate storage, need to consider:
Amount of data required by each table (width of each row * count of rows)
Indexes to be maintained
Log files to be maintained
Growth over a defined time period (1 – 2 years)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Design User Views
Purpose: To design the user views that were
identified during the requirements stage of the
relational database application lifecycle.
On multi-user systems, views are often used to
manage security, reduce complexity for users
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Other Detailed Design Tasks
Design security mechanisms
Consider the introduction of controlled redundancy
Monitor and tune the operational system
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object-Oriented Databases
Direct extension of OO design and programming
paradigm
ODBMS stores data as objects
Direct support for method storage, inheritance,
nested objects, object linking, and programmerdefined data types
Object Definition Language (ODL)
Standard language for describing structure and content of an
object database
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Designing Object Databases
Determine which classes require persistent
storage
Define persistent classes
Represent relationships among persistent classes
Choose appropriate data types and value
restrictions (if necessary) for each field
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Classes
Transient classes
Objects exist only during lifetime of program or process
Examples: view layer window, pop-up menu
Persistent classes
Objects not destroyed when program or process ceases
execution. State must be remembered.
Exist independently of program or process
Examples: customer information, employee information
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships
Object identifiers
Used to identify objects uniquely
Physical storage address or reference
Relate objects of one class to another
ODBMS uses attributes containing object
identifiers to find objects that are related to other
objects
Similar to…..?
Keyword relationship can be used to declare
relationships between classes
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships (continued)
Advantages include
ODBMS assumes responsibility for determining connection
among objects
ODBMS assumes responsibility for maintaining referential
integrity
Type of relationships
1:1, 1:M, M:M (one-to-one, one-to-many, many-to-many)
Association class used with M:M
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RMO Domain
Model Class
Diagram
(Figure 12-15)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-One Relationship Represented with
Attributes Containing Object Identifiers
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-Many Relationship Between
Customer and Order Classes
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-Many Relationship Represented with Attributes
Containing Object Identifiers
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Generalization Hierarchy within
the RMO Class Diagram (Figure 12-21)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Realities (from Wikipedia)
Object databases based on persistent
programming acquired a niche in application areas
such as:
engineering and spatial databases, telecommunications, and
scientific areas such as high energy physics and molecular
biology.
They have made little impact on mainstream
commercial data processing,
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Realities (from Wikipedia)
ODBMS suggest that pointer-based techniques are
optimized for very specific "search routes" or
viewpoints. However, for general-purpose queries
on the same information, pointer-based techniques
will tend to be slower and more difficult to
formulate than relational.
Other things that work against ODBMS:
lack of interoperability with a great number of tools/features
that are taken for granted in the SQL world including but not
limited to industry standard connectivity, reporting tools,
OLAP tools and backup and recovery standards.
Lack a formal mathematical foundation, unlike the relational
model,
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Hybrid Object-Relational Database Design
RDBMS (hybrid DBMS) used to store object
attributes and relationships
Design complete relational schema and
simultaneously design equivalent set of classes
Mismatches between relational data and OO
Class methods cannot be directly stored or automatically
executed
Relationships are restricted compared to ODBMS
ODBMS can represent wider range of data types
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Classes and Attributes
Designers store classes and object attributes in
RDBMS by table definition
Relational schema can be designed based on class
diagram
Table is created for each class
Fields of each table same as attributes of class
Row holds attribute values of single object
Key field is chosen for each table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relationships
Relationships are represented with foreign keys
Foreign key values serve same purpose as object
identifiers in ODBMS
1:M relationship – add primary key field of class on
“one” side of the relationship to table representing
class on “many” side
M:M relationship – create new table that contains
primary key fields of related class tables and
attributes of the relationship itself
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Data Access Classes
OO design based on a three-layer architecture
Data access classes are implementation bridge
between data stored in program objects and data
in relational database
Methods add, update, find, and delete fields and
rows in table or tables that represent the class
Methods encapsulate logic needed to copy data
values from problem domain class to database and
vice versa
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Interaction
Among a
Domain Class,
a Data Access
Class, and the
DBMS
(Figure 12-25)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Data Types
Storage format and allowable content of program
variable, object state variable, or database field or
attribute
Primitive data types – directly implemented
Memory address (pointer), Boolean, integer, and so on
Complex data types – user-defined
Dates, times, audio streams, video images, URLs
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational DBMS Data Types
Designer must choose appropriate data type for
each field in relational database schema
Choice for many fields is straightforward
Names and addresses use a set of fixed- or variable-length
character arrays
Inventory quantities can use integers
Item prices can use real numbers
Complex data types (DATE, LONG, LONGRAW)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Subset of Oracle RDBMS Data Types
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Data Types
Use set of primitive and complex data types
comparable to RDBMS data types
Schema designer can create new data types and
associated constraints
Classes are complex user-defined data types that
combine traditional concept of data with processes
(methods) to manipulate data
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Distributed Databases
Rare for all organizational data to be stored in a
single database in one location
Different information systems in an organization
are developed at different times
Parts of an organization’s data may be owned and
managed by different units
System performance is improved when data is
near primary applications
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Single Database Server Architecture
(Figure 12-27)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Replicated Database Server Architecture
(Figure 12-28)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Partitioned Database Server Architecture
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Functions of a DDBMS
Expect DDBMS to have at least the functionality of a
DBMS.
Also to have following functionality:
Extended communication services.
Extended Data Dictionary.
Distributed query processing.
Extended concurrency control.
Extended recovery services.
INFO425: Systems Design