Download Systematic Approach for Information Systems Reengineering

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Extensible Storage Engine wikipedia , lookup

Relational algebra wikipedia , lookup

Clusterpoint wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Systematic Approach for
Information Systems
Reengineering
Shi-Ming Huang
References



T. Cheung, J. Fong, and B. Siu, “Database
Reengineering and Interoperability”, Plenum,
1995, ISBN 0-306-45288-X
R.S. Arnold, “Software Reengineering”,
IEEE Press 1993, ISBN 0-8186-3272-0
Fong and S. Huang, “Information Systems
Reengineering”, Springer Verlag, 1997,
ISBN 981-3083-15-8
DataBase Reengineering
Direct
Translation
Schema
Translation
Rewrite
Indirect
Translation
Bridge Program
Database
Re-engineering
Program
Conversion
Emulation
Data
Conversion
Decompilation
Physical
Conversion
Co-existence
Logical
Conversion
Bridge Program
Database Reengineering- Schema
Translation

Direct translation –
One can directly translate a nonrelational schema
to relational schema. However, such translation
may cause loss of information because of its
primitive method that cannot recover all the
original nonrelational schema semantics. Certain
advanced semantics are lost once they are
mapped from a conceptual schema such as an
entity-relationship model to a logical schema such
as hierarchical or network schema. Thus, users
input are needed to recover the lost semantics.
Database Reengineering- Schema
Translation

Indirect translation –
Indirect translations can be accomplished by mapping logical
hierarchical or network schema into a conceptual entityrelationship model schema in a reverse engineering. The
translated conceptual schema must have all the original logical
schema semantics. Users must provide information of
advanced semantics in the logical schema. Then the
conceptual schema can be automatically mapped to another
logical relational schema. Similarly, in order to translate a
relational schema to an object-oriented schema, we can map
the entity-relationship model, a conceptual model for relational
schema, to OMT, a conceptual model for object-oriented model
in a peer-to-peer translation. Then the OMT model can be
mapped automatically to an object-oriented model(database).
Database Reengineering-Data Conversion



Physical conversion –
The physical data of the nonrelational database is directly
converted to the physical data of the relational database. This can
be done in interpreter approach or generator approach. The
former is a direct translation from a data item to another. The latter
is to provide a generator that generates program to accomplish
the physical data conversion.
Logical conversion –
The logical approach is to unload the nonrelational database to
sequential file in the logical sequence similar to the relational
model. The sequential files can then be reloaded back to a target
relational database. This approach concerns with the logical
sequence of the data rather the physical attributes of each data
item.
Bridge Program –
Each nonrelational file requires a bridge program to convert it to a
relational file.
Database Reengineering-Database
Program Translation


Rewrite –
One can translate the nonrelational schema into relational,
map a nonrelational database into a relational database, and
rewrite all the application programs to run on the relational
database.
Bridge program –
One can map the nonrelational schema into a relational
schema, and then add relational interface software layer on the
top of nonrelational DBMS. The relational interface layer
translate the relational program DML into nonrelational program
DML statements to access an existing nonrelational database.
The relational interface is transparent to the users as a relation
DBMS, but actually, the physical database is still nonrelational.
Database Reengineering-Database
Program Translation


Emulation –
It is the technique of providing software or firmware in the
target system which map source program commands into
functionally equivalent commands in the target system. Each
nonrelational DML is substituted by relational DML statements
to access the converted relational database.
Decompilation –
One can translate schema from nonrelational to relational,
convert data from nonrelational to relational, and then convert
application programs from nonrelational to relational by
decompilation. Decompilation is the process of transforming a
program written in a low level language into an equivalent but
more abstract version and the implementation of the new
programs to meet the new environmental, database files and
DBMS requirements.
Data Model




A data model is a general structure for data
organization.
It enables us to capture, partially, the meaning of
data as related to the complete meaning of the world.
It is the primary tool for designing a database.
The basic components of such a data model include:
1.
a set of rules (i.e. schema description) to describe
the structure and meaning of data in a database
2.
and the atomic operations (i.e. data language) that
may be performed on the data in that database.
Data Model

Schema is a term used to represent the name of a
class together with the properties of that class.

The schema description includes two parts:
1.
one is the structure specification part which
represents objects, attributes, and the relationship
between objects;
2.
the other is rule specification for the inferences and
constraints.
Data Model: Hierarchy Data Model





There is a set of relationships connecting all record types in
one data structure diagram.
The relationships expressed in the data structure diagram form
a tree with all edges pointing towards the leaves.
Each relationship is 1:n and it is total. That is, if Ri is the parent
of Rj in the hierarchy then for every record occurrence of Rj
there is exactly one Ri record connected to it.
The linkage between record types is in automatic fixed set
membership.
The database access path of hierarchical database follows the
hierarchical path from parent to child record. The default path is
a hierarchical sequence of top-to-bottom, left-to-right and frontto-back.
Data Model: Hierarchy Data Model
Loan
Contracts
9
1
14
Loan
Drawdown
Loan
Interest
10
2
Loan
Repayment
11
3
Fixed
Rate
12
4
7
6
Index
Rate
13
5
Loan
Balence
15
8
Hierarchical Data Manipulation Language


Hierarchical data manipulation language(HDML) is a
record-at-a-time language for manipulating hierarchical
databases.
The commands of a HDML must be embedded in a
general-purpose programming language, called host
language.
Hierarchical Data Manipulation Language
The followings are the syntax of a hierarchical DML of IMS
(Information Management System, a hierarchical DBMS).
There are four parameters in IMS DML. They are:





Function Code, which defines the database access function;
Program Control Block, which defines the external subschema
access path;
I-O-Area, which is a target segment address; and
Segment Search Argument, which defines the target segment
selection criteria as follows:
CALL BLTDLI” USING FUNCTION-CODE
PCB-MASK
I-O-AREA
SSA-1 …
SSA-n.
Hierarchical Data Manipulation Language
Retrieval Command:
Modification Commands:
1.
2.
3.
Get Unique (GU)
1.
INSERT(ISRT)
Get Next (GN)
2.
REPLACE(REPL)
Get Next WITHIN
3.
DELETE (DELT)
PARENT(GNP)
Example:
CALL BLTDLI” USING GU
PCB-MASK
I-O-AREA
LOAN_CONTRACT# = 277988.
BALANCE_DATE = 19960722.
BALANCE_AMOUNT = 1000000.
CALL BLTDLI” USING ISRT
PCB-MASK
LOAN_BALANCE.
NETWORK (Codasyl) MODEL
SYSTEM
set
set
set
Course
Department
Student
Course#
coure-location
set
dept#
dept-name
set
set
Instructor
Grade
set
grade
inst-name
inst-addr
set
set
Prerequisite
Section
Prerequisite#
prerequisite-title
student#
s-name
section#
NETWORK (Codasyl) MODEL


Date Item –
It is an occurrence of the smallest unit of named
data. It is represented in the database by a value.
A data item may be used to build other more
complicated data constructs. This corresponds to
an attribute in the ER data model.
Data Aggregation –
It is an occurrence of a named collection of data
items within a record.
NETWORK (Codasyl) MODEL



Record - It is an occurrence of a named collection of data
items or data aggregates. This collection is in conformity
with the record type definition specified in the database
schema.
Set - It is an occurrence of a named collection of records. A
set occurrence is in conformity with the set type definition
specified in the database schema. Each set type consists of
one owner record type and at least one member record type.
Area - The notion of an area is used to identify the partition
of record occurrences. An area is a named collection of
records which need not preserve owner-member
relationships. An area may contain occurrences of one or
more record types and a record type may have occurrences
in more than one area.
Relational Model
A Publishing Company Relational Database Schema:
au_ord
royaltyper
title_id
title_id
au_id
authors
au_id
au_id
pub_id
au_lname
au_fname
phone
address
city
state
zip
cntract
publishers
pub_id
pub_name
city
state
country
logo
pr_info
title
type
pub_id(FK)
price
advance
royalty
ytd_sales
notes
pubdate
title_id
title_id
pub_id
pub_id
employee
emp_id
pub_id
pub_id
fname
minit
lname
job_id(FK)
job_lvl
pub_id(FK)
hire_date
title_id(FK)
lorange
hirange
royalty
sales
title_id
title_id
stor_id(FK)
ord_num
ord_date
qty
payterms
title_id(FK)
discounts
discounttype
stor_id(FK)
lowqty
highqty
discount
pub_id
pub_info
pub_id(FK)
roysched
titles
title_id
titleauthor
au_id (FK)
title_id (FK)
job_id
job_id
s tor _id
stores
stor_id
job_id
stor_name
stor_address
city
state
zip
job_id
job_id
jobs
job_id
job_desc
min_lvl
max_lvl
Primary key
A
B
Foreign key
(B refer to A)
Relational Model





Relational model is a logical schema in the form of
tables (relations) corresponding to the representation
of an entity type.
A column(attribute) of the table represents the
extension of an attribute in the entity.
A row(tuple) of the table represents an instance of the
entity.
Such table is commonly called a record type and
consists of a primary key as an attribute of non-null
value that can uniquely identify a tuple.
The parent child relationship of relations are
represented in the foreign key residing in the child
relation referencing the primary key of parent relation.
OBJECT-ORIENTED Model
Department
Dept#
Dept-name
hire
Inst-name
Inst-addr
.....
.....
......
......
Class Instructor
OID
Inst-name
Inst-addr
xxx
John Doe
1 Main St, HK
hired-by
Class Department
OID
Dep#
yyy
D01
Dept-name
hire
Marketing
Class defining object
OID
zzz
OIDs of Instructor
OBJECT-ORIENTED Model






an object is an instance value of a class. A collection of similar
objects forms a class. A class has attributes and methods. The
attributes of a class describe its properties. The methods of a class
describe its operations.
a class must support encapsulation (i.e. hiding operations from the
uses) such that object = data + program
data = values of attributes program = methods that operates on the
state
object attributes can be either simple or complex. The value of a
complex attribute is a reference to the instance of another class. In
other words, an object can be a nested object such that the value of
an object is another object.
Object attributes can be single-valued or mutli-valued.
Objects are uniquely identified by object identifier (OID) that are
assigned by the system.
Direct translation from a Network
Model to a Relational Model


Step 1 Derive relations
Map each Network record type to a relation in a one-to-one
manner.
Step 2 Derive relation keys
Map each record key of a Network schema to a primary key in
a Relational table. However, if the existing Network record key
is not unique, then it needs to concatenate with its owner
record key in order to be mapped as a primary key. The owner
record key is also mapped as a foreign key in the Relational
table to link between the parent and child records. If the set
membership in the logical Network schema is manual, then
its record key of member record will be mapped as a
candidate key in the relational table to to link between the
parent and child records. For instance, Figure 3-1 is the
network schema for a US President.
Direct translation from a Network
Model to a Relational Model
SYSTEM
sys
sys
Plname , pfname , party, collg
sys
sys
set
ADM#,iny,inm,ind
set
Eyear,winvotes
set
CNGR# ,HD,HR,SD,SR
set
SNAME,CAP,yad
set

PRESIDENT (Plname, Pfname, Party, Collg, *Sname)
ADMINISTRATION (Adm#, Iny, Inm, Ind, *Plname, *Pfname)
STATE (Sname, Cap, Pln, Pfn, Adm#, Yad)
ELECTION (Eyear, Winvotes, *Plname, *Pfname)
LINK (*Plname, *Pfname, Cngr#)
CONGRESS (Cngr#, Hd, Hr, Sd, Sr)
Direct translation from a hierarchical
model to a relational model

Step 1 Derive relations:
Map each record type to a relation.

Step 2 Derive relation keys
The record key of a hierarchical schema is mapped as a
primary key of a relation. However, if the record type of
the hierarchical schema is a child record, then the
primary key is derived by concatenating with its parent
record key. The parent record key is also mapped as a
foreign key in the child relation (Quizon, 1990).
Direct translation from a hierarchical
model to a relational model
Hierarchcial schema
acct#
name
Mapped relational schema
GAA
=
meter#
GAB
GAC
billmo
net_charge
Relations:
GAA ( acct#,name )
GAB ( acct#,meter# )
GAC ( acct#,billmo,net_charge)
Indirect translation from a network
model to a relational model
Hierarchcial
or network
schema
Reverse
Engineering
from logical
model to
conceptual
model
Conceptual
ERR Model
Forward
Engineering
from
conceptual
model to
logical model
Relational
schema
Reverse engineering from network
schema to conceptual EER model
Step 1 Derive implied relationships: The explicit
semantic implies a 1:n relationship if there is one
duplicate key in one record type, or 1:1 if there is a
duplicate key found in the record on both sides of the
relationships. User input is sought to confirm the
existence of such a semantic.
Reverse engineering from network
schema to conceptual EER model
Step 1 Derive implied relationships
Non-relational record types with one duplicate
key
LOAN
CUSTOMER
Customer#
(record key)
Loan#
(Duplicate key)
Loan#
(record key)
Non-relational record types with two duplicate
keys
LOAN
CUSTOMER
Customer#
(record key)
Loan#
(Duplicate key)
Loan#
(record key)
Customer#
(Duplicate key)
Implied relationship
Customer Loan
: 1
N
Implied relationship
Customer Loan
: 1
1
Reverse engineering from network
schema to conceptual EER model
Step 2 Derive multiple (alternative) relationships
In a network schema, a set of circuit loopy record
types may carry different navigational semantics. It is
thus up to user to confirm the original database
designer's idea on the function of alternative path. If the
user confirms the existence of a navigational semantic.
then the record types and Sets in the alternative path are
mapped to different Network subschema (one
subschema for each path) before translating to the
Relational schema.
Reverse engineering from network
schema to conceptual EER model
Step 2 Derive multiple (alternative) relationships
SYSTEM
set
CITIES
set
STORES
city
city-headquarter
store
store-address
set
set
ITEMS
item
qty
Reverse engineering from network
schema to conceptual EER model
Step 3 Derive unary relationships.
Record Employee
n
1
set
1
set
1
n
Entity Employee
1
manages
Corresponding EER model
Dummy Record
Network Schema
Figure Map unary 1:n relationship from network to EER model
Reverse engineering from network
schema to conceptual EER model
Step 4 Derive binary relationships
RECORD
DEPARTMENT
1
ENTITY
DEPARTMENT
set
N
1
HAS
N
ENTITY
EMPLOYEE
corresponding EER model
RECORD
EMPLOYEE
Network Schema
RECORD
SUPPLIER
RECORD
PARTS
1
1
set
N
ENTITY
SUPPLIER
N
SUPPLY
QTY
N
ENTITY
PARTS
set
N
corresponding EER model
RECORD QTY
Network Schema
Figure Map 1:n and m:n relationship from network to EER model
Reverse engineering from network
schema to conceptual EER model
Step 5 Derive entities of n-ary relationships
Employee
Project
Skill
Employee
m
set
set
set
:
n
Textbookused
:
n
Skill-used
Network schema
Project
m
n
:
m
Skill
corresponding EER model
Figure Map n-ary relationship to EER model
Reverse engineering from network
schema to conceptual EER model
Step 6 Derive aggregation, generalization and categorization
RECORD
CLASS
RECORD
LECTURER
1
ENTITY
CLASS
SECTION
M
N
ENTITY
LECTURER
1
set
set
N
1
N
RECORD
SECTION
ATTENDED BY
1
set
N
N
RECORD
STUDENT
Network Schema
ENTITY
STUDENT
Translated EER model
Figure Map set of relationships to aggregation in EER model
Map is a relationship to overlap generalization
EMPLOYEE
Person
Employee-flag
Alumnus-falg
Student-flag
o
set
set
set
Employee
Alumnus
Student
Employee
Network schema
Alumnus
Student
corresponding EER model
Map is a relationships to categorization in EER model
Bank
Person
Company
set
set
set
Bank
Person
Company
u
Owner
Network schema
Owner
corresponding EER model
Reverse engineering from network
schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
Customer
Customer#
Customer
Record identifier
(Customer#)
Loan
Record identifier
(Loan#)
Collateral
Record identifier
(Collateral#)
set
Loan
Loan#
set
Collateral
Collateral#
Figure Map network schema with fully internally identifier to relational
Reverse engineering from network
schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
Customer
Customer#
Customer
Record identifier
(Customer#)
Loan
Record identifier
(Customer#,
Loan#)
Collateral
Record identifier
(Customer#,
Loan#,
Collateral#)
set
Loan
Loan#
set
Collateral
Collateral#
Figure Map network schema with partially internally identifier to relational
Reverse engineering from network
schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
Customer
Customer#
Customer
Record identifier
(Customer#)
Loan
Record identifier
(Customer#,
Loan#)
Collateral
Record identifier
(Customer#,
Loan#,
Sequence#)
set
Loan
Loan#
set
Collateral
Collateral#
Figure Map network schema with internally unidentified to relational
Figure network schema dependency
relationship translation
FD: B.b -> A.a
Record
A
a
ENTITY
A
ID: B.a
 A.a
a
A UTOMA TIC-FIXED
A UTOMA TIC-MA NDA TORY
SET A B
Record
B
R
b
ENTITY
B
b
a
corresponding EER model
Record
A
SET A B
Record
B
a
MA NUA L-OPTIONA L/
MA NUA L-FIXED/
MA NUA L-MA NDA TORY /
A UTOMA TIC-OPTIONA L
b
Network Schema
ENTITY
A
a
R
ENTITY
B
b
a
corresponding EER model
Reverse engineering from relational
model to conceptual EER model
Step 1. Define each relation, key and field
• Primary relation. These relations describe entities.
• Primary relation - Type 1 (PR1). This is a relation whose primary
key does not contain a key of another relation.
• Primary relation - Type 2 (PR2). This is a relation whose primary
key does contain a key of another relation.
• Secondary relation. This is a relation whose primary key is full or
partially formed by concatenation of primary keys of other
relations.
Reverse engineering from relational
model to conceptual EER model
Step 1. Define each relation, key and field
•
•
•
•
•
•
Secondary relation - Type 1 (SR1). If the key of the secondary relation is
formed fully by concatenation of primary keys of primary relations, it is of
Type 1 or SR1
Secondary relation - Type 2 (SR2). Secondary relations that are not of
Type 1
Key attribute - Primary (KAP). This is an attribute in the primary key of a
secondary relation that is also a key of some primary relation.
Key attribute - General (KAG). These are all the other primary key
attributes in a secondary relation that are not of the KAP type.
Foreign key attribute (FKA). This is a non-primary key attribute of a
primary relation that is a foreign key.
Nonkey attribute (NKA). The rest of the non-primary-key attributes.
Reverse engineering from network
schema to conceptual EER model
Step 2. Map each PR1 into entity
Department
Dept#
Dept_name
Prerequisite
Student
Pre#
prer_title
Course
Student#
Student_name
Course#
Course_Location
Figure Map primary relations to entities
Step 3. Map each PR2 into weak entity.
Department
Dept#
Dept_name
hire
Instructor
Dept#
Inst_name
Inst_addr
Figure Map PR2 into EER model
Reverse engineering from network
schema to conceptual EER model
Step 4.
Map SR1 into binary/n-ary
relationship.
Step 5.
Map SR2 into binary/n-ary relationship
Instructor
Student
Student#
Student_name
grade
Section
Section#
Course
Dept#
Inst _name
Inst_addr
Course#
Course_Location
teach
has
Section
Figure Map SR1 into EER model
Dept#
Inst_Name
Course#
Section#
Figure Map SR1 into EER model
Reverse engineering from network
schema to conceptual EER model
Step 6. Map each FKA into relationship.
Course
pre-course
Course#
Course_Location
Prerequisite
Prer#
Prer_title
Figure Map FKA into EER model