DATABASE DESIGN
Conventional Files versus the Database
File – a collection of similar records.
– Files are unrelated to each other except
in the code of an application program.
– Data storage is built around the
applications that use the files.
Database – a collection of interrelated files
– Records in one file (or table) are
physically related to records in another
file (or table).
– Applications are built around the
integrated database
Files Versus Database
Pros and Cons of Conventional Files
Pros
 Easy to design because of their single-application focus
 Excellent performance due to optimized
organization for a single application
Cons
 Harder to adapt to sharing across applications
 Harder to adapt to new requirements
 Need to duplicate attributes in several files.
Pros and Cons of Databases
Pros
 Data independence from applications
increases adaptability and flexibility
 Superior scalability
 Ability to share data across applications
 Less and controlled redundancy (total non-redundancy is not achievable)
Cons
 More complex than file technology
 Somewhat slower performance
 Investment in DBMS and database experts
 Need to adhere to design principles to
realize benefits
 Increased vulnerability due to consolidating data in a centralized database
 Previous file design methods required that
the analyst specify precisely how the
records in a file should be:
– Sequenced (File organization)
– Accessed (File access)
 Database technology usually predetermines
and/or limits this
– A trained database administrator may be given some control over organization, storage location, and access methods for performance tuning.
Data architecture – a definition of how:
– Files and databases are to be developed
and used to store data
– The file and/or database technology to
be used
– The administrative structure is set up to
manage the data resource
Data is stored in some combination of:
– Conventional files
– Operational databases – databases that support day-to-day operations and transactions for an information system. Also called transactional databases.
– Data warehouses – databases that store data extracted from operational databases.
• To support data mining
– Personal databases
– Work group databases
A Modern Data Architecture
Data administrator – a database specialist
responsible for data planning, definition,
architecture, and management.
Database administrator – a specialist
responsible for database technology,
database design, construction, security,
backup and recovery, and performance
tuning.
– A database administrator will administer
one or more databases
Why Use A Database?
Data overload is a common problem in business today. Corporations and individuals have plenty of raw data, but can't always find it or aren't aware that they even have it. Raw data must be filtered and organized to become useful information.
Databases are a primary tool for the task; a
tool which takes advantage of the speed
and power of modern computers.
Some terms in DB Design
Entity - the principal data object about
which information is to be collected.
 Entities are usually recognizable concepts, either concrete or abstract, such as persons, places, things, or events which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, INVOICES.
An entity is analogous to a table in the
relational model. Name your entities in
singular form and in ALL CAPS. For
example, an entity that contains data about
your company's employees would be
named EMPLOYEE.
Attribute - a descriptive or quantitative
characteristic of an entity. Attributes describe the
entity with which they are associated. The
physical counterpart of an attribute is a
database column (or field). Name your
attributes in singular form with either Initial
Capital Letters or in all lower case. For
example, some attribute names for your
EMPLOYEE entity might be: EmployeeId
(employee_id) and BirthDate (or birthdate).
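As a minimal sketch of these naming conventions, the snippet below uses Python's built-in `sqlite3` module (an assumption; any relational DBMS would do) to turn the EMPLOYEE entity into a table whose columns are its attributes. The FirstName column and the sample row are hypothetical.

```python
import sqlite3

# In-memory SQLite database (assumption: SQLite stands in for any relational DBMS).
conn = sqlite3.connect(":memory:")

# Entity name: singular, ALL CAPS. Attribute names: Initial Capital Letters.
# FirstName and the sample row are hypothetical.
conn.execute(
    """
    CREATE TABLE EMPLOYEE (
        EmployeeId INTEGER PRIMARY KEY,  -- identifies each employee
        FirstName  TEXT NOT NULL,        -- describes the employee
        BirthDate  TEXT                  -- ISO-8601 date string
    )
    """
)
conn.execute(
    "INSERT INTO EMPLOYEE (EmployeeId, FirstName, BirthDate) VALUES (?, ?, ?)",
    (1, "Amina", "1990-04-17"),
)

# Each attribute became a column; PRAGMA table_info lists them in order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(EMPLOYEE)")]
print(columns)  # ['EmployeeId', 'FirstName', 'BirthDate']
```

Each row of EMPLOYEE is then one instance of the entity.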
Relationship - is a logical link between two
entities; an association between two or
more entities.
 Represents a business rule and can be
expressed as a verb phrase. Most
relationships between entities are of the
"one-to-many" type in which one instance
of the parent entity relates to many
instances of the child entity.
 Examples of relationships are:
i) employees are assigned to projects
ii) projects have subtasks
iii) departments manage one or more
projects
 The second type of relationship is the "many-to-many" relationship. In a "many-to-many" relationship, many instances of one entity relate to many instances of the other entity. "Many-to-many" relationships need to be resolved, typically by introducing an associative (junction) entity, in order to avoid data redundancy.
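The sketch below illustrates both relationship types from the examples above, assuming SQLite via Python's `sqlite3` module. A one-to-many relationship (departments manage projects) is stored as a foreign key on the child table, and a many-to-many relationship (employees are assigned to projects) is resolved with an associative table. The ASSIGNMENT table, all column names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (DepartmentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE EMPLOYEE   (EmployeeId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- One-to-many: each PROJECT row (child) points to one DEPARTMENT (parent).
    CREATE TABLE PROJECT (
        ProjectId    INTEGER PRIMARY KEY,
        Title        TEXT NOT NULL,
        DepartmentId INTEGER NOT NULL REFERENCES DEPARTMENT(DepartmentId)
    );

    -- Many-to-many resolved by an associative table: one row per pairing.
    -- The composite primary key stops the same pairing being stored twice.
    CREATE TABLE ASSIGNMENT (
        EmployeeId INTEGER NOT NULL REFERENCES EMPLOYEE(EmployeeId),
        ProjectId  INTEGER NOT NULL REFERENCES PROJECT(ProjectId),
        PRIMARY KEY (EmployeeId, ProjectId)
    );

    INSERT INTO DEPARTMENT VALUES (10, 'Research');
    INSERT INTO EMPLOYEE   VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO PROJECT    VALUES (100, 'Survey', 10), (200, 'Prototype', 10);
    INSERT INTO ASSIGNMENT VALUES (1, 100), (1, 200), (2, 100);
    """
)

def projects_of_department(department_id):
    """One-to-many: all projects managed by one department."""
    rows = conn.execute(
        "SELECT Title FROM PROJECT WHERE DepartmentId = ? ORDER BY ProjectId",
        (department_id,),
    )
    return [title for (title,) in rows]

def staff_on(project_id):
    """Many-to-many: employees on a project, found through ASSIGNMENT."""
    rows = conn.execute(
        """
        SELECT e.Name
        FROM EMPLOYEE AS e
        JOIN ASSIGNMENT AS a ON a.EmployeeId = e.EmployeeId
        WHERE a.ProjectId = ?
        ORDER BY e.Name
        """,
        (project_id,),
    )
    return [name for (name,) in rows]

print(projects_of_department(10))  # ['Survey', 'Prototype']
print(staff_on(100))               # ['Amina', 'Brian']
```

Employee and project details are each stored once; only the pairings are repeated, in the narrow ASSIGNMENT table.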
Database Design
Database design has two parts:
Data Model
Focuses on what data should be stored in
the database. The data model is used to
design the relational tables.
Function Model
Deals with how the data is processed. The
functional model is used to design the
queries that will access and perform
operations on those tables designed at the
data model stage.
Planning and Analysis
Planning defines the goals of the database, explains why the goals are important, and sets out the path by which the goals will be reached.
Analysis involves determining the requirements of the database. This is typically done by examining existing documentation and interviewing users.
Data modeling is preceded by planning and
analysis.
The effort devoted to this stage is
proportional to the scope of the database.
The planning and analysis of a database
intended to serve the needs of an enterprise
will require more effort than one intended
to serve a small workgroup.
An accurate and up-to-date data model can
serve as an important reference tool for
DBAs, developers, and other members of a
JAD (joint application development) team.
Data Modeling
The process of creating a data model helps
the team uncover additional questions to
ask of end users.
Effective database design also allows the
team to develop applications that perform
well from the beginning.
By building quality into the project, the
team reduces the overall time it takes to
complete the project, which in turn reduces
project development costs.
An effective data model completely and
accurately represents the data requirements
of the end users. It is simple enough to be
understood by the end user yet detailed
enough to be used by a database designer
to build the database.
The model eliminates redundant data, is independent of any hardware and software constraints, and can be adapted to changing requirements with a minimum of effort.
Data modeling is a bottom up process. A
basic model, representing entities and
relationships, is developed first. Then
detail is added to the model by including
information about attributes and business
rules.
The information needed to build a data
model is gathered during the requirements
analysis.
The Requirements Analysis
Goals:
a) to determine the data requirements of the
database in terms of primitive objects
b) to classify and describe the information
about these objects
c) to identify and classify the relationships
among the objects
e) to determine the types of transactions that
will be executed on the DB and the
interactions between the data and the
transactions
f) to identify rules governing the integrity of
the data
g) the modeler works with the end users of
an organization to determine the data
The requirements analysis is usually done
at the same time as the data modeling.
As information is collected, data objects are identified and classified as either entities, attributes, or relationships; assigned names; and defined using terms familiar to the end users. The objects are then modeled and analyzed using an ER diagram.
The diagram can be reviewed to determine
its completeness and accuracy, and/or
modified.
The review and edit cycle continues until
the model is certified as correct.
Points to note
a) Talk to the end users about their data in
"real-world" terms. Users do not think in
terms of entities, attributes, and
relationships but about the actual people,
things, and activities they deal with daily.
b) Take the time to learn the basics about the organization and its activities that you want to model. Having an understanding of the processes will make it easier to build the model.
c) End users typically think about and view data in different ways according to their function within an organization. Therefore, it is important to interview as many people as time permits.
Steps In Building the Data Model
i. Identification of data objects and relationships
ii. Drafting the initial ER (Entity-Relationship) diagram with entities and relationships
iii. Refining the ER diagram
iv. Adding key attributes to the diagram
v. Adding non-key attributes
vi. Diagramming generalization hierarchies
vii. Validating the model through normalization
viii. Adding business and integrity rules to the model
N.B: In practice, model building is not a
strict linear process.
Identification of data objects and
relationships
In order to begin constructing the basic model, the modeler must analyze the information gathered during the requirements analysis for the purpose of:
– classifying data objects as either entities or attributes
– identifying and defining relationships between entities
– naming and defining identified entities, attributes, and relationships
– documenting this information in the data document
What makes an object an entity or
attribute?
 For example, given the statement "employees work on projects", should employees be classified as an entity or an attribute? Very often, the correct answer depends upon the requirements of the database. In some cases, employee would be an entity; in others, it would be an attribute.
Some commonly given guidelines are:
 entities contain descriptive information
 attributes either identify or describe entities
 relationships are associations between
entities
The Entity-Relationship Model
 Is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.
 The ER model views the real world as a construct of entities and associations between entities.
Achieving a Well-Designed
Database
 A table should have an identifier.
 A table should store only data for a single
type of entity.
 A table should avoid nullable columns.
 A table should not have repeating values or
columns.
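A minimal sketch of these four guidelines, assuming SQLite via Python's `sqlite3` module; the CUSTOMER and CUSTOMER_PHONE tables, their columns, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Identifier: every CUSTOMER row has a primary key.
    -- Single entity: the table holds facts about customers only.
    -- Avoid nullable columns: required attributes are declared NOT NULL.
    CREATE TABLE CUSTOMER (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Email      TEXT NOT NULL
    );
    -- No repeating columns: rather than Phone1, Phone2, Phone3 on CUSTOMER,
    -- each phone number is a separate row in its own table.
    CREATE TABLE CUSTOMER_PHONE (
        CustomerId INTEGER NOT NULL REFERENCES CUSTOMER(CustomerId),
        Phone      TEXT NOT NULL,
        PRIMARY KEY (CustomerId, Phone)
    );
    INSERT INTO CUSTOMER VALUES (1, 'Amina', 'amina@example.com');
    INSERT INTO CUSTOMER_PHONE VALUES (1, '555-0100'), (1, '555-0199');
    """
)

def try_insert_without_email():
    """Attempt to store a customer with no Email; return the DBMS error text."""
    try:
        conn.execute("INSERT INTO CUSTOMER (CustomerId, Name) VALUES (2, 'Brian')")
    except sqlite3.IntegrityError as err:
        return str(err)
    return None

print(try_insert_without_email())  # a NOT NULL constraint error, not a stored NULL
```

A customer with a third phone number simply gets a third CUSTOMER_PHONE row; the table structure does not change.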
Some Common Database Design
Mistakes
1. Poor design/planning
2. Ignoring normalization
3. Poor naming standards
4. Lack of documentation
5. Lack of testing
1. Poor Design/Planning
"If you don't know where you are going,
any road will take you there" –
George Harrison
2. Ignoring Normalization
 Normalization defines a set of methods to
break down tables to their constituent parts
until each table represents one and only
one "thing", and its columns serve to fully
describe only the one "thing" that the table
represents.
Normalization
 Normalization is a database design
approach that seeks the following four
objectives:
i. minimization of data redundancy,
ii. minimization of data restructuring,
iii. minimization of I/O by reduction of
transaction sizes, and
iv. enforcement of referential integrity.
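Referential integrity can be enforced by the DBMS itself through foreign keys. A minimal sketch, assuming SQLite via Python's `sqlite3` module (where foreign-key enforcement must be switched on per connection); the CUSTOMER and INVOICE tables and the `add_invoice` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is on (it is per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE CUSTOMER (CustomerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE INVOICE (
        InvoiceId  INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL REFERENCES CUSTOMER(CustomerId)
    );
    INSERT INTO CUSTOMER VALUES (1, 'Amina');
    """
)

def add_invoice(invoice_id, customer_id):
    """Try to insert an invoice; return True if the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO INVOICE VALUES (?, ?)", (invoice_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        # Raised ("FOREIGN KEY constraint failed") when the customer does not exist.
        return False

print(add_invoice(500, 1))   # True: customer 1 exists
print(add_invoice(501, 99))  # False: there is no customer 99
```

Because the check lives in the database rather than in each application, no program can create an invoice for a customer that does not exist.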
Consider the following example Customer table:
 A payment does not
describe a Customer and
should not be stored in the
Customer table.
 Details of payments should be stored in a Payment table, in which you could also record extra information about the payment, like when the payment was made and what the payment was for.
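A sketch of that split, assuming SQLite via Python's `sqlite3` module; the column names (PaidOn, Purpose, and so on) and the sample payments are hypothetical. Payment details move out of the Customer table into a Payment table that references the customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE CUSTOMER (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- One row per payment, linked back to the customer.
    -- PaidOn and Purpose are the "extra information" about each payment.
    CREATE TABLE PAYMENT (
        PaymentId  INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL REFERENCES CUSTOMER(CustomerId),
        Amount     NUMERIC NOT NULL,
        PaidOn     TEXT NOT NULL,
        Purpose    TEXT NOT NULL
    );
    INSERT INTO CUSTOMER VALUES (1, 'Amina');
    INSERT INTO PAYMENT VALUES
        (1, 1, 250, '2024-01-15', 'Deposit'),
        (2, 1, 100, '2024-02-15', 'Instalment');
    """
)

# A customer's total is derived with a query instead of being stored on CUSTOMER.
total = conn.execute(
    "SELECT SUM(Amount) FROM PAYMENT WHERE CustomerId = ?", (1,)
).fetchone()[0]
print(total)  # 350
```

The Customer table now describes only customers, and any number of payments can be recorded without changing its structure.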
3. Poor naming standards
Consistency. The names you choose are not
just to enable you to identify the purpose of
an object, but to allow all future
programmers, users, and so on to quickly and
easily understand how a component part of
your database was intended to be used, and
what data it stores.
Present to the users clear, simple, descriptive names, such as Customer and Address.
Avoid names such as:
- colVarcharAddress
- X304_DSCR
These mean nothing to the user.
The use of dashes, spaces, digits, and special characters is discouraged.
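Part of this standard can be checked mechanically. The hypothetical helper below, a sketch rather than an established tool, rejects names containing dashes, spaces, digits, or special characters. It allows underscores between words, assuming lower-case names such as employee_id remain acceptable.

```python
import re

# Hypothetical rule: one or more groups of letters, optionally joined by
# single underscores. Digits, dashes, spaces, and other symbols are rejected.
VALID_NAME = re.compile(r"[A-Za-z]+(?:_[A-Za-z]+)*")

def is_acceptable_name(name: str) -> bool:
    """Return True if the whole name matches the letters-only rule above."""
    return VALID_NAME.fullmatch(name) is not None

for candidate in ["Customer", "Address", "employee_id", "X304_DSCR", "Order Date", "cust-name"]:
    print(candidate, is_acceptable_name(candidate))
```

A check like this covers only the mechanical part of the rule: colVarcharAddress passes it yet still means nothing to the user, so names still need human review.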
4. Lack of Documentation
 Poorly documented code is a synonym for
"job security."
 Your goal should be to provide enough
information that when you turn the
database over to a support programmer,
they can figure out your minor bugs and fix
them.
 In many cases, you may want to include
sample values, where the need arose for the
object, and anything else that you may
want to know in a year or two when "future
you" has to go back and make changes to
the code.
5. Lack of Testing
 A proper test plan takes into consideration all possible types of failures, codes them into an automated test, and tries them over and over.
 Good testing won't find all of the bugs, but
it will get you to the point where most of
the issues that correspond to the original
design are ironed out.
DATABASE SECURITY
SECURITY CONCERNS
AND MEASURES
Classes of Vulnerabilities
Keep your confidential data secure from
(internal or external) intruders
 Vendor bugs – are programming errors that
result in users executing commands that
they are not allowed to execute.
Downloading and applying patches fixes
this problem
 Database worms – A worm is a self-replicating computer program. It uses a network to send copies of itself to other nodes (computer terminals on the network), and it may do so without any user intervention. Unlike a virus, it does not need to attach itself to an existing program. Worms almost always cause at least some harm to the network (if only by consuming bandwidth), whereas viruses almost always infect or corrupt files on a targeted computer.
 Misconfiguration – caused by not locking
down databases – setting configuration
options in a way that compromises security
 Poor Architecture – Not factoring security
into the design of how the application
works, e.g. use of a weak form of
encryption. These are the hardest to fix, since they
require major rework by the vendor.
Database Security Measures
Some security measures:
a) Encrypt Data and Packages
b) Audit Access to Sensitive Data
Regardless of Access – ensures that any
issues are dealt with in good time
Database Security Measures…
d) Server security - Use of firewalls
e) User-Authentication Security – Use of passwords
f) Physical security – location of the server
holding the Database
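Password authentication is only as strong as the way passwords are stored. A minimal sketch, assuming the application stores salted PBKDF2 hashes rather than plain-text passwords, using only Python's standard library (`hashlib.pbkdf2_hmac`, `hmac.compare_digest`). The function names and the iteration count are illustrative choices, not a recommendation for a specific system.

```python
import hashlib
import hmac
import os

# Illustrative work factor; higher values slow down brute-force guessing.
ITERATIONS = 200_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in the user table instead of the password."""
    salt = os.urandom(16)  # a fresh random salt for every user
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("guess", salt, key))                         # False
```

If the database is stolen, the attacker obtains only salts and hashes, and each password must be guessed separately.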
Database Security Summary
 Stay aware of data security holes
 Explore possible third-party options
 Perform audits and pen tests on your
databases regularly
 Encryption of data in motion
 Encryption of data at rest within the
database
 Monitor your log files
 Implement Intrusion Detection
P.S. Provide multiple levels of security.
 The data stored in a database is managed
by a Data Base Management System
(DBMS).
 The DBMS is responsible for adding,
modifying, and deleting data from the
database. The DBMS is also responsible
for providing access to the data for viewing
and reporting.
 Open source DBMSs include MySQL, Postgres, and Berkeley DB.
 Commercial DBMSs include Oracle, DB2, Sybase, Informix, and Microsoft SQL Server.
 Effective database design can help the development team reduce overall development time and costs. Undertaking the process of database design and creating a data model helps the team better understand the user's requirements and thus enables them to build a system that more closely reflects the user's requirements and business rules.
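To make the DBMS's responsibilities concrete, the sketch below assumes SQLite (via Python's `sqlite3` module) as the DBMS and performs the adding, modifying, deleting, and viewing operations described above on a hypothetical PRODUCT table.

```python
import sqlite3

# SQLite stands in for "the DBMS" here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (ProductId INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price NUMERIC NOT NULL)"
)

with conn:  # commits if the block succeeds, rolls back if it raises
    conn.execute("INSERT INTO PRODUCT VALUES (1, 'Pen', 20)")          # add
    conn.execute("INSERT INTO PRODUCT VALUES (2, 'Notebook', 80)")     # add
    conn.execute("UPDATE PRODUCT SET Price = 25 WHERE ProductId = 1")  # modify
    conn.execute("DELETE FROM PRODUCT WHERE ProductId = 2")            # delete

# Access to the data for viewing and reporting.
rows = conn.execute("SELECT ProductId, Name, Price FROM PRODUCT").fetchall()
print(rows)  # [(1, 'Pen', 25)]
```

The application states *what* to change; how the rows are stored and located is left to the DBMS.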
Data Warehousing
 A data warehouse is where information is organized for quick retrieval. Data is gathered from different sources (usually databases) set up for different purposes.
Differences from Traditional Databases
 Data is organized around major subjects
rather than individual transactions
 Summarized data is used rather than
detailed data
 Data is framed for long-term decision making
 They are organized for quick queries, not so much for efficient storage
 Optimized for complex queries known as OLAP (online analytical processing). OLAP allows managers to look at data in different dimensions
 Allows easy access via data mining software that searches for patterns and is able to identify relationships
 Include multiple databases that have been processed so that data is uniform (clean data)
 They include data from outside sources as well as data generated internally
 Building a warehouse is complex. An analyst gathers information from a variety of sources and translates it into a common form. For example, one database might record gender as “male” and “female”, another as “M” and “F”, and a third as “0” and “1”.
 Once the data is clean, the analyst has to decide how
to summarize data and predict the type of
queries that might be asked (details are
usually lost during summarization). The
warehouse is then designed both logically
and physically
 Note: the analyst must know a lot about the
business.
 Because of its size, a warehouse is expensive.
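A minimal sketch of the clean-then-summarize steps, in plain Python. The three source extracts are hypothetical, and so is the meaning of the numeric codes: the code assumes “0” means male and “1” means female, which in practice must come from the source system's documentation.

```python
from collections import Counter

# Hypothetical extracts from three source databases that encode gender differently.
SOURCE_A = [{"id": 1, "gender": "male"}, {"id": 2, "gender": "female"}]
SOURCE_B = [{"id": 3, "gender": "M"}, {"id": 4, "gender": "F"}, {"id": 5, "gender": "F"}]
SOURCE_C = [{"id": 6, "gender": "0"}, {"id": 7, "gender": "1"}]

# Translate every source code into one common form.
COMMON_FORM = {
    "male": "M", "female": "F",
    "M": "M", "F": "F",
    "0": "M", "1": "F",  # assumed meaning of the numeric codes
}

def clean(records):
    """Return copies of the records with gender rewritten to the common form."""
    return [{**r, "gender": COMMON_FORM[r["gender"].strip()]} for r in records]

warehouse = clean(SOURCE_A) + clean(SOURCE_B) + clean(SOURCE_C)

# Summarize: keep only counts per gender; per-person detail is not kept.
summary = Counter(r["gender"] for r in warehouse)
print(dict(summary))  # {'M': 3, 'F': 4}
```

The summary answers "how many of each?" quickly, but it can no longer say which person was which: the loss of detail that summarization brings.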
Data Mining
 Data mining can identify patterns that humans are unable to detect. Data mining algorithms search data warehouses for patterns. It is also known as Knowledge Discovery in Databases (KDD).
Software for Data Mining
Data mining software, also known as decision aids, includes:
 Statistical analysis software
 Neural networks
 Fuzzy networks
 Intelligent agents
 Logic and data visualization
 Patterns that decision makers try to identify
include:
 Associations: Patterns that occur together at the same time. For example, a person who buys milk usually buys bread.
 Sequences: Actions that take place over a period of time, e.g. if a family buys a house this year, they will most likely buy a fridge and cooker next year.
 Clustering: A pattern that develops among a group of people, e.g. customers who live in a particular area tend to buy a particular product.
 Trends: Patterns that are noticed over a period of time, e.g. customers may move from buying processed food to natural foods (herbal products) or African attire.
 Data mining is also used to target customers, on the assumption that past behavior is a good predictor of future behavior. A large amount of data is captured about individual people, and companies share this information. Credit companies have taken advantage of this to target customers.
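Association patterns such as the milk-and-bread example are commonly quantified with two measures from association-rule mining: support (how often the items appear together across all purchases) and confidence (how often the second item appears, given the first). A minimal sketch in plain Python, on hypothetical basket data:

```python
# Hypothetical market-basket data: each set is one customer's purchase.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "sugar"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def count(items, baskets):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)

def support(items, baskets):
    """Fraction of all baskets that contain every item in `items`."""
    return count(items, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"milk", "bread"}, baskets))       # 0.6
print(confidence({"milk"}, {"bread"}, baskets))  # 0.75
```

Here, 3 of the 5 baskets contain both milk and bread (support 0.6), and 3 of the 4 baskets containing milk also contain bread (confidence 0.75).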
Problems with Data Mining
 Cost could be too high to justify data mining
 Coordination of several customers or departments could be problematic
 Customers could resent their privacy being
invaded and reject the offers that are coming
their way
 Erroneous profiles could be made of people,
stored, and not deleted. The police could act
on these profiles without meeting the people.
Ethical Issues
 Analysts should take responsibility for considering the ethical aspects of any data mining projects that are proposed.
 Length of time the material is kept
 Privacy safeguards should be installed
 Confidentiality of the material
 The uses to which inferences are put should be asked about and considered with the client.
 The opportunities for abuse are apparent
and must be guarded against. For
consumers, data mining is a push
technology and if consumers do not want
to be pushed, data mining efforts could
backfire.
[Figure: Data Warehousing. Internal data sources (operational, accounting, customer, manufacturing, and historical databases) and external data sources (external databases) pass through data extraction and transformation (extract, filter, transform, classify, aggregate, summarize) into data warehouses (customer, product, and sales data; integrated, subject-oriented, time-variant, non-volatile), which support data access and analysis (OLAP, data mining, querying, reporting) for business intelligence.]