Week1. Database Life Cycle. Data and data modeling
Terms to remember
Conceptual design, logical design, physical design
Hierarchical, network, relational data model, RDBMS
Accuracy, consistency, integrity, redundancy, accessibility of data
DBLC (Database Life Cycle) phases compared to SDLC (Systems Development Life Cycle) phases
DBLC
Database initial study
- list company objectives, operations, structure
- define problems and constraints
- define the database system objectives
- define the scope and boundaries of the project
Database design
- Conceptual design (ERD)
- DBMS software selection
- Logical design (RDB)
- Physical design
Implementation and loading
- install the DBMS, create the database, load the data
Testing and evaluation
- testing, debugging, installation, fine-tuning
Operation
Database maintenance and evolution
- evaluation, maintenance, enhancement, change
SDLC
Planning
- initial assessment and feasibility study
- system analysis
- the conclusion of this stage determines whether the database needs to be reassessed (the
database initial study)
Analysis
- user requirements, existing system evaluation, logical system design. This stage may involve
Data Flow Diagrams (DFD), Hierarchical Input Process Output (HIPO) diagrams or Use Case
diagrams
Detailed design
- detailed system specification made
- database design created
Implementation
- coding and installation
Testing and evaluation
- testing, debugging, installation, fine-tuning
Operation
Database maintenance and evolution
- evaluation, maintenance, enhancement, change
Within an information system we transform data into useful information. The information system
itself can be considered a model of business processes, including data, logic and algorithms.
A database is first of all a model of information, but logic and algorithms affect the
information model as well. This means that a database, once developed, does not remain in the
same state forever. It has to be developed and maintained along with the information system.
Data are raw facts. After some processing they become useful for operations and decision making:
they produce information.
To be collected for computing, data must be structured and arranged for storage, extraction and
processing. The specialized tool that supports data storage is known as a Database Management
System (DBMS), and the storage itself is called a database.
There are some conceptual requirements we have to remember:
- Accuracy. Data must be accurate to be useful: each individual raw fact must be correct
- Consistency. A change to any raw fact must drive the relevant changes to related raw facts,
so that the data remain useful
- Integrity. The relationships that exist in the data are preserved in its electronic version
- Accessibility. Data must be easily accessible to support real-life processes
- Shared resource. Data must be arranged as a shared resource that supports access by multiple
users and remains consistent
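As an illustration of the integrity requirement, here is a minimal sketch using Python's built-in
sqlite3 module. The department and employee tables are hypothetical and chosen only for this
example. It shows a DBMS refusing a row that would break a relationship existing in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 1)")

# An employee that points to a non-existent department violates integrity.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

In this sketch the relationship "every employee belongs to an existing department" is declared
once in the database, so every application sharing the data is held to it.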
To meet these requirements we have to resolve a number of problems.
Data redundancy. This is not only a matter of getting rid of data that the particular system does
not need. More serious problems appear when useful data are placed in more than one storage
unit. This causes data anomalies during:
- insertion
- deletion
- modification
Any data anomaly sets the stage for data inconsistency, which can make the data unusable.
Thus the goal of computerized data storage, known as a database, is to bring data redundancy
under control.
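A modification anomaly can be sketched in a few lines of Python. The order and customer records
below are invented for illustration. In the first layout the customer's city is repeated in every
order, and in the second it is stored once and referenced by key.

```python
# Redundant layout: the customer's city is repeated in every order row.
orders = [
    {"order_id": 1, "customer": "Ann", "city": "Riga"},
    {"order_id": 2, "customer": "Ann", "city": "Riga"},
]

# Modification anomaly: only one of the redundant copies gets updated.
orders[0]["city"] = "Tallinn"
cities_for_ann = {row["city"] for row in orders if row["customer"] == "Ann"}
inconsistent = len(cities_for_ann) > 1  # the data now disagree with themselves

# Controlled redundancy: the city is stored once and orders refer to it by key.
customers = {"Ann": {"city": "Riga"}}
orders_by_key = [
    {"order_id": 1, "customer": "Ann"},
    {"order_id": 2, "customer": "Ann"},
]
customers["Ann"]["city"] = "Tallinn"  # a single update reaches every order
cities_after = {customers[o["customer"]]["city"] for o in orders_by_key}
```

After the partial update the redundant layout holds two different cities for the same customer,
while the keyed layout can only ever hold one.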
The data accessibility requirement raises a number of questions. To achieve the goal:
- the information space must be structured so that we can see the data units and how they
interact with each other. These data units may be named differently: “entity” (usually in
relational database systems), “object” (in object-oriented systems) or “facts and
dimensions” (in data warehousing). In each case they represent something important for
operational support or decision making
- data units, whatever they are, must be identified
- data units must be described by a set of characteristics important for operations
- relationships between data units must be determined and described
- all of the above must be done efficiently to support adequate data access performance
Data as a shared resource creates specific data management problems.
A DBMS provides the tools to implement database management: effective data accessibility that
preserves data integrity, accuracy and consistency in a shared environment.
However, BEFORE being implemented, a database must be designed to meet the requirements of
effective data accessibility with the data in a consistent and accurate state. The biggest
challenge of database design is that there is no “cookbook” for it. We can provide the
methodology and the techniques, but effective data structuring remains a problem to solve for
each particular system. Database design is an art.
The process of data structuring and modeling can be described roughly as a sequence of
stages and steps.
Conceptual data modeling:
- Discover the data units (entities) on the basis of system and data analysis
- For each entity, define its descriptive characteristics (attributes)
- Find out how each entity can be uniquely identified
- Discover the relationships among entities needed to support business processes
- Determine the type of each relationship; there are several (dependency, association,
aggregation, recursive, etc.)
The result, a conceptual data model, gives us an understanding of the data that is independent
of implementation methods or tools. It relates to business processes and rules, and it is the
least formalized stage compared with the others, logical and physical design.
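The outcome of the conceptual steps above can be written down as plain data, independent of any
DBMS. The following is only a sketch: the Entity, Attribute and Relationship classes and the
Student/Course example are assumptions made for illustration, not a standard notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    is_identifier: bool = False  # True if the attribute is part of the unique identifier


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def identifier(self) -> List[str]:
        """Return the names of the attributes that uniquely identify the entity."""
        return [a.name for a in self.attributes if a.is_identifier]


@dataclass
class Relationship:
    name: str
    left: Entity
    right: Entity
    cardinality: str  # "1:1", "1:M" or "M:N"


# Hypothetical entities discovered during analysis.
student = Entity("Student", [Attribute("student_id", True), Attribute("name")])
course = Entity("Course", [Attribute("course_code", True), Attribute("title")])
enrolls = Relationship("enrolls_in", student, course, "M:N")
```

Nothing here says how the data will be stored. That decision belongs to logical and physical
design.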
Logical design assumes knowledge of the type of database model to be used. Historically, data
storage started from flat file systems, which were not really databases in the full sense. The
basic models incorporated into commercial DBMSs are hierarchical, network and relational.
This classification is based on how the model implements association relationships between
entities in terms of relationship type: 1:1, 1:M, M:N. The models differ first of all in their
data navigation mechanisms. Each model has its own advantages and disadvantages in
implementing relationships.
Hierarchical model. Data items are organized as trees (in other words, as parent-child
relationships). Good for supporting 1:M relationships that are fixed over time. In other cases,
for example when implementing M:N relationships, this model causes substantial data redundancy.
Application programming and use are complicated in any case. Data navigation is predetermined
by the hierarchical structure. It is very expensive to change the data structure. Example: IBM
IMS. (ADABAS, often listed alongside it, is usually classified as an inverted-list DBMS.)
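A rough Python sketch of the hierarchical idea follows. The departments, projects and people are
made up for the example. It shows navigation that follows the parent-to-child path, and the
redundancy that appears once an M:N relationship is forced into a tree.

```python
# 1:M fits naturally: each department (parent) owns its employees (children).
tree = {
    "Sales": {"employees": ["Ann", "Bob"]},
    "Support": {"employees": ["Cid"]},
}


def employees_of(department):
    """Navigate from the parent record down to its children."""
    return tree[department]["employees"]


# M:N does not fit: Ann works on two projects, so with the project as parent
# her whole record has to be stored under each of them.
projects = {
    "Alpha": {"members": [{"name": "Ann", "phone": "555-0100"}]},
    "Beta": {
        "members": [
            {"name": "Ann", "phone": "555-0100"},
            {"name": "Bob", "phone": "555-0101"},
        ]
    },
}
copies_of_ann = sum(
    1 for p in projects.values() for m in p["members"] if m["name"] == "Ann"
)
```

Every extra copy of a record is a place where a later update can be missed.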
Network model. Allows relationships between data items to be organized as a network. Navigation
is possible in any direction. More flexible, but still difficult to program and use.
Flexibility is achieved at the expense of the volume of meta-information needed to support the
networked pointers. Changes to the data structure are still difficult to implement; in practice
they mean restructuring the database. Example: CODASYL-based DBMSs.
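The pointer-based navigation of the network model can be imitated with linked Python objects.
This is a simplified sketch: the Record class and the connect helper are hypothetical and do not
reproduce the CODASYL set mechanism in detail.

```python
class Record:
    """A data item that keeps explicit pointers to related records."""

    def __init__(self, name):
        self.name = name
        self.links = []  # the stored pointers are the extra meta-information


def connect(a, b):
    """Link two records in both directions so navigation works either way."""
    a.links.append(b)
    b.links.append(a)


ann, bob = Record("Ann"), Record("Bob")
alpha, beta = Record("Alpha"), Record("Beta")
connect(ann, alpha)
connect(ann, beta)
connect(bob, beta)

# Navigate from an employee to projects, and from a project back to employees.
projects_of_ann = [r.name for r in ann.links]
members_of_beta = [r.name for r in beta.links]
```

Each record is stored once, so the M:N case causes no duplication, but every relationship costs
pointers that the application has to follow explicitly.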
Relational model. Data items are organized as rectangular tables. Data navigation is
implemented through SQL-based queries. The most effective among commercial DBMSs.
Supports any type of relationship, is flexible to change and easier to use. Disadvantages: an
RDBMS is complicated and places high demands on computer resources, which is not a problem
now. The real problem is that this model is more vulnerable to hidden design mistakes. Whereas
hierarchical and network data modeling mistakes often make it impossible to obtain a correct
data processing result at all, relational design mistakes in most cases will not prevent data
processing from running, so the problem stays behind the scenes. Mistakes discovered later can
be quite painful, especially if they concern data consistency or performance. That is why it is
very important to learn relational database design techniques. Examples of relational DBMSs:
Oracle, MS SQL Server, Informix, DB2.
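As a small illustration of tables plus SQL navigation, the sketch below uses Python's built-in
sqlite3 module. The student, course and enrollment tables are hypothetical. The M:N relationship
is implemented with a third table, and a declarative query replaces pointer navigation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The M:N relationship becomes a table of its own.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Navigation is expressed as a query; no access path is fixed in advance.
rows = conn.execute(
    """
    SELECT s.name, c.title
    FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    JOIN course c ON c.course_id = e.course_id
    ORDER BY s.name, c.title
    """
).fetchall()
```

Each student and each course is stored once. Only the key pairs in enrollment repeat, which is
the controlled form of redundancy.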
Other models
Flat file system. Relationships and data structure are supported by the application software;
there is no DBMS. It is practically impossible to implement the basic paradigm of a database:
SHARED data.
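The dependence on application software can be seen in a short Python sketch. The file contents
and field order are invented for the example, and io.StringIO stands in for a file on disk.

```python
import csv
import io

# A flat file holds only values; it carries no column names, types or relationships.
flat_file = io.StringIO("1,Ann,Sales\n2,Bob,Sales\n3,Cid,Support\n")

# The knowledge that the third field is the department lives in this program alone.
rows = list(csv.reader(flat_file))
sales_staff = [name for _emp_id, name, dept in rows if dept == "Sales"]
```

Any other program that wants to share the file has to repeat the same assumptions about its
layout, and nothing keeps the copies of that knowledge in step.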
Object-oriented model. Implements a different approach to computer system design. The
conventional approach treats data and data processing procedures as two distinct things and
separates them in the implementation. The object-oriented approach joins both into a single
unit, the object. The criteria for a pure object-oriented DBMS are set out in the
Object-Oriented Database System Manifesto.
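The idea of joining data and procedures into one unit can be sketched with an ordinary Python
class. The Account example is hypothetical and says nothing about how a particular
object-oriented DBMS stores its objects.

```python
class Account:
    """Data (owner, balance) and the procedures that process it, in one unit."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # The rule for changing the data travels together with the data.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance


account = Account("Ann", 100)
new_balance = account.deposit(50)
```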
Data warehousing brings a special data and data processing model, OLAP (OnLine Analytical
Processing), aimed at implementing Decision Support Systems. It incorporates special features
for collecting large volumes of data for analytical processing. Example of an OLAP system:
Cognos.
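The core OLAP operation, summarizing facts along a chosen dimension, can be sketched in plain
Python. The sales figures and the roll_up helper are invented for illustration. A real OLAP
system does the same over far larger volumes with dedicated storage.

```python
from collections import defaultdict

# Hypothetical fact records: one measure (sales) described by two dimensions.
facts = [
    {"region": "North", "year": 2023, "sales": 100},
    {"region": "North", "year": 2024, "sales": 150},
    {"region": "South", "year": 2023, "sales": 80},
    {"region": "South", "year": 2024, "sales": 120},
]


def roll_up(fact_rows, dimension, measure="sales"):
    """Total the measure for each value of the chosen dimension."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


by_region = roll_up(facts, "region")
by_year = roll_up(facts, "year")
```

The same facts answer different analytical questions depending on the dimension chosen for the
summary.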
LDAP (Lightweight Directory Access Protocol): a protocol with a specialized hierarchical data
model, used by directory services.