Download INFO425: Systems Design INFORMATION SYSTEMS @ X

Document related concepts

Database wikipedia , lookup

Concurrency control wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
INFORMATION SYSTEMS @ X
Chapter 13
Database Design
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Learning Objectives
 Describe the differences and similarities between
relational and object-oriented database management
systems
 Design a relational database schema based on an
entity-relationship diagram

High Level

Detailed
 Design an object database schema based on a class
diagram
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Learning Objectives (continued)
 Design a relational schema to implement a hybrid
object-relational database
 Describe the different architectural models for
distributed databases
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Overview
 This chapter describes design of relational and OO
data models
 Developers transform conceptual data models into
detailed database models


Entity-relationship diagrams (ERDs) for traditional analysis
Class diagrams for object-oriented (OO) analysis
 Detailed database models are implemented with
database management system (DBMS)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Databases and Database
Management Systems
 Databases (DB) – integrated collections of stored
data that are centrally managed and controlled
 Database management system (DBMS) – system
software that manages and controls access to
database
 Databases described by a schema – description of
structure, content, and access controls
 XML: Data and Schema are integrated
INFO425: Systems Design
INFORMATION SYSTEMS @ X
XML Example
Elements
<PROPERTYLIST>
<PROPERTY>
<PROPERTYNO>PA14</PROPERTYNO>
<ADDRESS>
<STREET>16 Holland</STREET>
<CITY>Aberdeen</CITY>
<POSTCODE>AB7 5SU</POSTCODE>
</ADDRESS>
<RENT>650</RENT>
</PROPERTY>
</PROPERTYLIST>
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database Models
 Impacted by technology changes since 1960s
 Model types




Hierarchical
Network
Relational
Object-oriented
 Most current systems use relational and (rarely)
object-oriented data models
INFO425: Systems Design
INFORMATION SYSTEMS @ X
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Network Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Multidimensional Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object Oriented Data Model
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Components of a DB and DBMS
Virtually all actions in a
relational DB managed
through Structured Query
Language
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Important DBMS Capabilities
 Simultaneous access by multiple users and
applications
 Access to data without application programs (via a
query language)
 Organizational data management with uniform
access and content controls
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Objectives of SQL
 Ideally, database language should allow user to:



create the database and relation structures;
perform insertion, modification, deletion of data;
perform simple and complex queries.
 Should perform these tasks with minimal user effort and
command structure/syntax must be easy to learn.
 Should be portable.
 The relational model provides Structured Query Language
(SQL) to meet the above objectives
INFO425: Systems Design
INFORMATION SYSTEMS @ X
SQL Benefits
 SQL is relatively easy to learn:


it is non-procedural - you specify what information you require, rather
than how to get it;
it is essentially free-format.
 Relatively easy to learn and use – but also very powerful
 Consists of standard English words:
CREATE TABLE Staff(staffNo VARCHAR(5),
lName VARCHAR(15),
salary DECIMAL(7,2));
INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
INFO425: Systems Design
INFORMATION SYSTEMS @ X
SQL
 SQL has 3 major components:



A Data Definition Language (DDL) for defining database
structure.
A Data Manipulation Language (DML) for retrieving and updating
data.
Commands to control access to data (security)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational Databases
 Relational database management system (RDBMS)
organizes data into tables or relations
 Tables are two dimensional data structures


Tuples – rows or records – represents a specific occurrence
of an entity
Fields – columns or attributes
 Tables have primary key field(s) that can be used
to identify unique records
 Keys relate tables to each other… foreign keys
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Partial Display of Relational Database Table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database design
 High Level – “logical”
 Detailed – “physical”
INFO425: Systems Design
INFORMATION SYSTEMS @ X
High Level Design
 Create table for each entity type
 Choose or invent primary key for each table
 Add foreign keys to represent one-to-many
relationships
 Create new tables to represent many-to-many
relationships
INFO425: Systems Design
INFORMATION SYSTEMS @ X
High Level Design
 Define referential integrity constraints
 Evaluate schema quality and make necessary
improvements…. normalization
 Choose appropriate data types and value
restrictions (if necessary) for each field

Valid values

Valid ranges
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RMO Entity-Relationship Diagram
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relationship Between Data in Two Tables
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships
 Relational databases use foreign keys to represent
relationships
 One-to-many relationship

Add primary key field of “one” entity type as foreign key in
table that represents “many” entity type
 Many-to-many relationship


Use the primary key field(s) of both entity types
Use (or create) an associative entity table to represent
relationship
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Entity Tables with Primary Keys
INFO425: Systems Design
(Figure 12-7)
INFORMATION SYSTEMS @ X
Represent One-to-Many Relationships by
Adding Foreign Keys (in italics) (Figure 12-8)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Enforcing Referential Integrity
 Consistent relational database state
 Every foreign key value also exists as a primary
key value
 DBMS enforces referential integrity automatically
after schema designer identifies primary and
foreign keys
INFO425: Systems Design
INFORMATION SYSTEMS @ X
DBMS Referential Integrity Enforcement
 When rows containing foreign keys are created

DBMS ensures that value also exists as a primary key in a
related table
 When row is deleted

DBMS ensures no foreign keys in related tables have same
value as primary key of deleted row
 When primary key value is changed

DBMS ensures no foreign key values in related tables contain
the same value
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RI in DDL
RI is defined
when defining
the foreign keys
associated with
a table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Evaluating Schema Quality
 High-quality data model has



Uniqueness of table rows and primary keys
Ease of implementing future data model changes (flexibility
and maintainability)
Lack of redundant data (database normalization)
 Database design is not objective or quantitatively
measured; it is experience and judgment based
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Normalization
A good data model:



Is simple – all attributes in an entity describe only that entity.
Is non-redundant. All attributes other than foreign keys
describe one entity
Is flexible and adaptive.
Normalization is a data analysis technique that
organizes data attributes such that they are grouped
to form non-redundant, stable, flexible, and adaptive
entities.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Database Normalization
 Normal forms minimize data redundancy




First normal form (1NF) – no repeating fields or groups of
fields
Functional dependency – one-to-one relationship between the
values of two fields
2NF – in 1NF and if each non-key element is functionally
dependent on entire primary key
3NF – in 2NF and if no non-key element is functionally
dependent on any other non-key element
INFO425: Systems Design
INFORMATION SYSTEMS @ X
First Normal Form
The first normal form states
that each field is granular and
there are no repeating groups
AND…
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Second/Third Normal Form
INFO425: Systems Design
The second normal form states that
each field in a multiple field primary
key table must be directly related to the
entire primary key. Third normal form
is the same as second normal form
except that it only refers to tables that
have a single field as their primary
key.
INFORMATION SYSTEMS @ X
Exercise
 Define what would need to be done place data in
first normal form (look carefully at the data)
 Place this data in 3rd normal form
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Detailed Design
 Design physical representation




Analyze transactions
Choose file organizations
Choose indexes
Estimate disk space requirements
 Design user views
 Design security mechanisms
 Consider the introduction of controlled
redundancy
 Monitor and tune the operational system
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Analyze Transactions
 Assume that a transaction = use case
 Use this information to identify the parts of the
database that may cause performance problems.
 To select appropriate file organizations and indexes,
also need to know high-level functionality of the
transactions, such as:


attributes that are updated in an update transaction
criteria used to restrict rows that are retrieved in a query.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Analyze Transactions
 Often not possible to analyze all expected
transactions, so investigate most ‘important’ ones –
use Paretto Principle
 To help identify which transactions to investigate,
can use:


transaction/relation cross-reference matrix, showing relations
that each transaction accesses, and/or
Access frequency analysis.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Cross-referencing transactions and relations
Which relations require the most attention?
INFO425: Systems Design
INFORMATION SYSTEMS @ X
(2) Determine frequency information
 We need to identify the key transactions that
access those relations
 Key information includes:



Data volumes anticipated in each relation
Average and maximum number of times per period that a
transaction executes
Peak access times for each transaction
Why is peak access time important to know?
INFO425: Systems Design
INFORMATION SYSTEMS @ X
(2) Determine frequency information
 Peak times are key to understanding if
transactions conflict; that is, have similar peak
times. If so, potential for performance issues is
greater.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Transaction usage map for some sample
transactions
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Choose File Organizations
Purpose: To determine an efficient file
organization for each table.
 Determines how records physically ordered on disk
 File organizations type: unordered, ordered, hash.
 Rule of Thumb: Use DBMS default organization
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Choose Indexes
Purpose: To determine whether adding indexes will
improve the performance of the system.
 Need to consider both primary and secondary
indexes


Primary Index – how data will be physically ordered on disk
Secondary Index – additional indexes created to speed access
to data
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Why do indexes improve performance?
 Without an index, need to scan tables
 Index = table with very few attributes.
 Use the index to find the record(s) that meet query
criteria.
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Estimate Disk Space Requirements
Purpose: To estimate the amount of disk space that
will be required by the database.
 We need to do this step to ensure that we have sufficient
hardware and disk capacity both now and in the future to
manage the database.
 To calculate storage, need to consider:




Amount of data required by each table (width of each row * count of rows)
Indexes to be maintained
Log files to be maintained
Growth over a defined time period (1 – 2 years)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Design User Views
Purpose: To design the user views that were
identified during the requirements stage of the
relational database application lifecycle.
 On multi-user systems, views are often used to
manage security, reduce complexity for users
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Other Detailed Design Tasks
 Design security mechanisms
 Consider the introduction of controlled redundancy
 Monitor and tune the operational system
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object-Oriented Databases
 Direct extension of OO design and programming
paradigm
 ODBMS stores data as objects
 Direct support for method storage, inheritance,
nested objects, object linking, and programmerdefined data types
 Object Definition Language (ODL)

Standard language for describing structure and content of an
object database
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Designing Object Databases
 Determine which classes require persistent
storage
 Define persistent classes
 Represent relationships among persistent classes
 Choose appropriate data types and value
restrictions (if necessary) for each field
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Classes
 Transient classes


Objects exist only during lifetime of program or process
Examples: view layer window, pop-up menu
 Persistent classes



Objects not destroyed when program or process ceases
execution. State must be remembered.
Exist independently of program or process
Examples: customer information, employee information
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships
 Object identifiers



Used to identify objects uniquely
Physical storage address or reference
Relate objects of one class to another
 ODBMS uses attributes containing object
identifiers to find objects that are related to other
objects

Similar to…..?
 Keyword relationship can be used to declare
relationships between classes
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Representing Relationships (continued)
 Advantages include


ODBMS assumes responsibility for determining connection
among objects
ODBMS assumes responsibility for maintaining referential
integrity
 Type of relationships


1:1, 1:M, M:M (one-to-one, one-to-many, many-to-many)
Association class used with M:M
INFO425: Systems Design
INFORMATION SYSTEMS @ X
RMO Domain
Model Class
Diagram
(Figure 12-15)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-One Relationship Represented with
Attributes Containing Object Identifiers
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-Many Relationship Between
Customer and Order Classes
INFO425: Systems Design
INFORMATION SYSTEMS @ X
One-to-Many Relationship Represented with Attributes
Containing Object Identifiers
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Generalization Hierarchy within
the RMO Class Diagram (Figure 12-21)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Realities (from Wikipedia)
 Object databases based on persistent
programming acquired a niche in application areas
such as:

engineering and spatial databases, telecommunications, and
scientific areas such as high energy physics and molecular
biology.
 They have made little impact on mainstream
commercial data processing,
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Realities (from Wikipedia)
 ODBMS suggest that pointer-based techniques are
optimized for very specific "search routes" or
viewpoints. However, for general-purpose queries
on the same information, pointer-based techniques
will tend to be slower and more difficult to
formulate than relational.
 Other things that work against ODBMS:


lack of interoperability with a great number of tools/features
that are taken for granted in the SQL world including but not
limited to industry standard connectivity, reporting tools,
OLAP tools and backup and recovery standards.
Lack a formal mathematical foundation, unlike the relational
model,
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Hybrid Object-Relational Database Design
 RDBMS (hybrid DBMS) used to store object
attributes and relationships
 Design complete relational schema and
simultaneously design equivalent set of classes
 Mismatches between relational data and OO



Class methods cannot be directly stored or automatically
executed
Relationships are restricted compared to ODBMS
ODBMS can represent wider range of data types
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Classes and Attributes
 Designers store classes and object attributes in
RDBMS by table definition
 Relational schema can be designed based on class
diagram
 Table is created for each class
 Fields of each table same as attributes of class
 Row holds attribute values of single object
 Key field is chosen for each table
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relationships
 Relationships are represented with foreign keys
 Foreign key values serve same purpose as object
identifiers in ODBMS
 1:M relationship – add primary key field of class on
“one” side of the relationship to table representing
class on “many” side
 M:M relationship – create new table that contains
primary key fields of related class tables and
attributes of the relationship itself
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Data Access Classes
 OO design based on a three-layer architecture
 Data access classes are implementation bridge
between data stored in program objects and data
in relational database
 Methods add, update, find, and delete fields and
rows in table or tables that represent the class
 Methods encapsulate logic needed to copy data
values from problem domain class to database and
vice versa
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Interaction
Among a
Domain Class,
a Data Access
Class, and the
DBMS
(Figure 12-25)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Data Types
 Storage format and allowable content of program
variable, object state variable, or database field or
attribute
 Primitive data types – directly implemented

Memory address (pointer), Boolean, integer, and so on
 Complex data types – user-defined

Dates, times, audio streams, video images, URLs
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Relational DBMS Data Types
 Designer must choose appropriate data type for
each field in relational database schema
 Choice for many fields is straightforward



Names and addresses use a set of fixed- or variable-length
character arrays
Inventory quantities can use integers
Item prices can use real numbers
 Complex data types (DATE, LONG, LONGRAW)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Subset of Oracle RDBMS Data Types
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Object DBMS Data Types
 Use set of primitive and complex data types
comparable to RDBMS data types
 Schema designer can create new data types and
associated constraints
 Classes are complex user-defined data types that
combine traditional concept of data with processes
(methods) to manipulate data
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Distributed Databases
 Rare for all organizational data to be stored in a
single database in one location
 Different information systems in an organization
are developed at different times
 Parts of an organization’s data may be owned and
managed by different units
 System performance is improved when data is
near primary applications
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Single Database Server Architecture
(Figure 12-27)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Replicated Database Server Architecture
(Figure 12-28)
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Partitioned Database Server Architecture
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Advantages of DDBMSs







Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Disadvantages of DDBMSs







Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex
INFO425: Systems Design
INFORMATION SYSTEMS @ X
Functions of a DDBMS
 Expect DDBMS to have at least the functionality of a
DBMS.
 Also to have following functionality:





Extended communication services.
Extended Data Dictionary.
Distributed query processing.
Extended concurrency control.
Extended recovery services.
INFO425: Systems Design