Data Modeling Overview
By: Dave Wentzel
What we will accomplish
Review of DBMS
Issues related to DBMS
Entity Relationship Modeling
– Process flow
– Model types
– Component definition
Selecting entities and attributes
Defining relationships
Defining cardinality
Selecting Primary Keys
Review of recursive relationships, weak
entities, and ternary relationships
Participation constraints
Erwin Notation
NULL issues
The Physical Model
Generalization / Specialization
Transaction processing
Normalization Rules
History issues
What is data?
Data
– Raw facts. Can be described, observed, and
measured.
Information
– Data organized in a form that is useful for
decision making. The meaning behind the data.
– New thing not previously observed that is
created based on the data.
Knowledge
– Information that is used for decision making.
What is a Database?
Collection of interrelated data
Data which can be visualized in a table
format
Contains relationships between data
Can be of any size and varying complexity
Can be maintained manually or by
computer
Database Management System (DBMS)
Collection of programs (software) that allows users to create and maintain a database
Supports data:
– Definition - specification of data types,
structures, and constraints
– Construction - storing of the data itself
– Manipulation - updating & querying of the data
Defines itself: contains a catalog which describes its data
Components of a DBMS
Catalog
– Maintains information about the data in the
database
– Considered data about data (metadata)
Databases
– Collection of related tables
Tables
– Rows and columns containing data
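To make the catalog (metadata) idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. SQLite records its catalog in the `sqlite_master` table; the `customer` table, its index, and the `catalog_entries` helper are hypothetical names chosen for illustration.

```python
import sqlite3

def catalog_entries(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (object type, object name) pairs recorded in SQLite's catalog."""
    rows = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name")
    return [(obj_type, name) for obj_type, name in rows]

# Hypothetical database: one table plus one index, defined through the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")

# The DBMS describes its own contents: data about data.
for obj_type, name in catalog_entries(conn):
    print(obj_type, name)  # index idx_customer_name, then table customer
```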
Issues in DBMS
Data independence
Query optimization
– Improve efficiency
– Faster responses
Transaction management
– Sequence of operations that are treated as a unit
– Once the 1st step is completed, the 2nd step must also be completed; otherwise the 1st step is undone (ROLLBACK mechanism)
Example: transferring bank funds
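The bank transfer can be sketched with Python's sqlite3 module, assuming a hypothetical `account` table whose CHECK constraint forbids negative balances. The credit runs first and the debit second, so an overdraft makes the second step fail after the first has succeeded; the connection's context manager then rolls back both steps. The `make_bank` and `transfer` helpers are illustrative, not part of any DBMS API.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    """Hypothetical bank: two accounts whose balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " acct_id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Credit dst and debit src as one unit; return False if it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Step 1: credit the destination account.
            conn.execute("UPDATE account SET balance = balance + ? WHERE acct_id = ?", (amount, dst))
            # Step 2: debit the source; the CHECK constraint fails on an overdraft.
            conn.execute("UPDATE account SET balance = balance - ? WHERE acct_id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # step 1 was undone together with step 2

conn = make_bank()
print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 1, 2, 500))  # False: the overdraft also cancels the credit
print(conn.execute("SELECT acct_id, balance FROM account ORDER BY acct_id").fetchall())
# [(1, 70), (2, 80)]
```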
Issues in DBMS continued
Transaction management
– Concurrency
– Recovery
Controlled redundancy
– Goal of database design is to minimize
redundancy (duplicate data)
Integrity constraints
– Includes business rules and data rules
Issues in DBMS continued
Security and privacy
– Protect against unauthorized access
Data / database administration
– Involves managing people, data, performance,
security, etc.
Entity Relationship Modeling
(Diagram: example entities Employee, Person, Account, and Transaction)
Data Model
Tool for describing data, its relationships, semantics, and integrity constraints
Provides for data abstraction
Hides details of data storage
Why use an ER Model?
Easy to use for modeling database design
Succinct representation of database layout
Good communication tool among project
team members
Most CASE tools support ER modeling
Implementation independent
Categories of Data Models
Logical model
– Conceptual data model
– High level model
– Closest view user has of the data
Physical model
– Low level model
– Defines how data is stored
Steps in Database Design
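(Diagram: the design process flows from the Mini World through DBMS-independent steps to DBMS-specific steps)
Mini World
Requirements Collection and Analysis
– Database Requirements
– Functional Requirements
DBMS Independent
– Logical Model (from the database requirements)
– Functional Analysis -> High Level Transaction Requirement (from the functional requirements)
DBMS Specific
– Data Model Mapping -> Physical Design -> Internal Schema
– Application Program Design (API) -> Transaction Implementation -> Application Programs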
ER Modeling is composed of
Entity (table)
Attribute (field)
Relationship
– Binary Relationships
– Cardinality of relationships
What is an entity?
Conceptual definition
– Distinguishable object that exists
Operational definition
– Business object that has properties we are
interested in storing
Physical definition
– Set of related data forming a table composed of
attributes (fields)
Entities
Primary THINGS of a business about which users need to record data
Objects about which the business is
interested in tracking information
When an ER Diagram is translated into a
relational model, the entities become the
tables.
Selecting Entities
Nouns are candidate entities
Possible classes of entities:
– People who carry out some function (employees, students, customers)
– Places (cities, offices, routes)
– Things which are tangible physical objects
(equipment, products, buildings)
– Organizations (teams, suppliers, departments)
Selecting Entities Continued
Events which occur at a given date/time or have steps (employee promotions, project phases, account payments)
Concepts which are intangible ideas used to keep
track of business activities (projects, accounts,
complaints)
Questions to ask...
What things do we need to keep data about?
What things are essential to the
organization?
What things do we talk about in the
organization?
What questions do we have that reports can
help answer?
What information should the reports
contain?
Naming entities
Use a SINGULAR noun
Meaningful but intuitive
Avoid names which may be misinterpreted
within the problem domain
Follow organizational / industry trends
Do not try to rename entities within an
organization
Avoid abused names such as Task, Form,
Operation, Schedule...
Is it an entity to worry about?
Decide if an entity is relevant to your problem domain by determining if it has attributes you need to track
If it does not have attributes you need to
track, it is NOT a valid entity for your
problem
Is it really an entity?
Can you define attributes for it? An attribute is a piece of information that we are interested in tracking about an entity. It is a property of an entity.
In general, if two objects differ by one
attribute, they are separate entities.
Does it participate in a relationship? Two
entities that are related somehow interact
with one another.
Attributes
Properties of an object (entity)
Each attribute has a data type (char, int,
datetime)
Each attribute in an RDBMS (relational
database management system) has only one
value at a time (atomic)
Categories of Attributes
Descriptive
– Property of the entity that helps describe the
entity
Identifying (key attributes)
– Property of the entity that helps uniquely
identify the entity
– Normally short
– If one does not exist it MUST be created
– If creating a key, use a numeric/integer data
type
Types of Attributes
Atomic
– Indivisible value
– Most desired state
Composite
– Can be divided into smaller parts
– Need to convert into atomic
Types of Attributes Continued
Multi-valued
– Multiple instances of an attribute
– Normally create another entity
Derived
– Can be determined by the value of another
attribute or attributes
– In most cases, do NOT store derived attributes
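The four attribute types can be sketched in Python with dataclasses. This is an illustrative model, assuming hypothetical `Customer` and `CustomerPhone` entities: the composite "Last, First" name is split into atomic parts, the multi-valued phone attribute becomes its own entity, and age is a derived attribute computed from the stored birth date rather than stored itself.

```python
from dataclasses import dataclass
from datetime import date

def split_name(full_name: str) -> tuple[str, str]:
    """Break a composite 'Last, First' name into atomic (first, last) parts."""
    last, first = full_name.split(", ", 1)
    return first, last

@dataclass
class Customer:
    customer_id: int   # identifying (key) attribute
    first_name: str    # atomic parts of the composite "name"
    last_name: str
    birth_date: date   # stored, atomic

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from birth_date instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.birth_date.month, self.birth_date.day)
        return today.year - self.birth_date.year - (0 if had_birthday else 1)

@dataclass
class CustomerPhone:
    """Multi-valued 'phone' attribute moved into its own entity: one row per number."""
    customer_id: int   # key of the owning Customer
    phone: str

c = Customer(1, *split_name("Smith, John"), date(1980, 6, 15))
phones = [CustomerPhone(1, "555-0100"), CustomerPhone(1, "555-0199")]
print(c.first_name, c.last_name, c.age_on(date(2020, 6, 14)))  # John Smith 39
```

Because age is derived on demand, it never goes stale the way a stored age column would.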
Naming Attributes
Use a noun, adjective, or adverb
Name should be unique database wide
Use attribute names consistently
Use singular names
Define a naming convention for the
organization
Rules for Entity Analysis
Every noun is a candidate for an entity
Every entity should be relevant to the problem
If an object has only one property of importance,
then it should be considered an attribute of
another entity
If an object has only one data instance (1 row)
then do not model as an entity
If an object needs a unique identifier then model
it as an entity
Relationships
The way entities interact with one another
An association between two or more
entities
Depicts business interactions between
entities
They DO NOT represent business flow
Relationships Continued
Number of entities associated through a relationship defines its degree (unary, binary, ternary, n-ary)
Cardinality defines the maximum number
of entities that can participate in the
relationship
How to Identify a Relationship
Ask what action or verb describes how one entity interacts with another
Three types of relations to consider:
– Existence (Employee HAS Children)
– Functional (Professor TEACHES Course)
– Event (Customer PLACES Order)
Ignore verbs not important to the organization
More on Relationships
Relationships and cardinality constraints represent business rules
When naming a relationship, use an active verb in the present tense
Relationships are read bi-directionally
Example notes:
Together the customer and account tables form a schema - the structure / layout of a logical database design
Note the attributes. Order DOES NOT
MATTER but convention puts primary key first.
No duplicates for attributes.
No duplicate tuples (rows)
Relationship - the same attribute name (or a different attribute name with the same meaning) in 2 tables.
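The notes above can be sketched as a tiny customer/account schema in Python's sqlite3 module. The table and column names and the sample rows are hypothetical; the point is that the shared `customer_id` attribute is what relates the two tables, and the primary key rules out duplicate tuples.

```python
import sqlite3

# Hypothetical schema: customer and account together form the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- primary key listed first by convention
        name        TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,     -- same attribute name as in customer: the relationship
        balance     INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Bob'), (2, 'Mary');
    INSERT INTO account  VALUES (10, 1, 500), (11, 1, 75), (12, 2, 300);
""")

# The shared customer_id attribute relates rows across the two tables.
rows = conn.execute("""
    SELECT c.name, a.account_id, a.balance
    FROM customer AS c JOIN account AS a ON a.customer_id = c.customer_id
    ORDER BY a.account_id
""").fetchall()
print(rows)  # [('Bob', 10, 500), ('Bob', 11, 75), ('Mary', 12, 300)]

# The primary key also rules out duplicate tuples.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Bob')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```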
Cardinality Constraints
Express the MAXIMUM number of entities that can be associated with another entity via a relationship
Also known as mapping constraints
Types:
– 1:1 (one to one)
– 1:N (one to many)
– N:M (many to many)
The Key to It All
Identifiers...
Attribute(s) which uniquely identify a record
An entity may have multiple identifiers
Every entity MUST have at least one
Can be made up of more than one attribute
Candidate vs. Primary Keys
Both are identifiers
Candidate keys are all the identifiers that uniquely identify the record, from which the primary key is chosen
Primary key is the one candidate key which
is selected to always uniquely identify the
record
Selecting the Primary Key
In general we create a primary key; however...
Choose the attribute most widely used in
the query
Select the shorter data type
If one does not exist, must create one
Select a MINIMUM key if using compound
attributes (not recommended)
Key Requirements and Preferences
Known at all times
Can NOT be null
Should not be changed
Shorter is better
Numeric / integer is better
Avoid keys containing letters O, I, Z, S - can be
confused with numbers
If key includes time, it should be in 24hr format
Avoid carrying meaning
With this all said...
It is difficult to come up with a primary key based on real attributes which will not change over time (phone numbers, SSN, addresses, driver’s license numbers…)
In most cases it is best to create the primary
key
In SQL Server you can use an IDENTITY column, which generates a sequential number
Primary Keys and Relationships
In a 1:1 relationship, the primary key of either one of the entities must migrate to the other entity
In a 1:N, the primary key of the 1 side must
migrate to the entity on the N side
In an M:N, the keys of both entities are used to identify a new entity which resolves the M:N into two 1:N relationships
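The M:N resolution can be sketched with Python's sqlite3 module using the EmployeeProject data from the normalization examples later in this deck. The `works_on` table name and the `employees_on` helper are hypothetical; the new entity's primary key combines both migrated keys, turning Employee M:N Project into two 1:N relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT NOT NULL);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY);
    -- New entity resolving the M:N: its key combines both migrated keys,
    -- giving Employee 1:N works_on and Project 1:N works_on.
    CREATE TABLE works_on (
        ssn     TEXT    NOT NULL REFERENCES employee (ssn),
        pnumber INTEGER NOT NULL REFERENCES project (pnumber),
        hours   REAL    NOT NULL,
        PRIMARY KEY (ssn, pnumber)
    );
    INSERT INTO employee VALUES ('123456789', 'Smith, John'),
                                ('666885555', 'Narayan, Ramesh'),
                                ('453223344', 'English, Joyce');
    INSERT INTO project VALUES (1), (2), (3);
    INSERT INTO works_on VALUES ('123456789', 1, 32.5), ('123456789', 2, 7.5),
                                ('666885555', 3, 40),   ('453223344', 1, 20),
                                ('453223344', 2, 20);
""")

def employees_on(pnumber: int) -> list[str]:
    """Walk Project -> works_on -> Employee through the two 1:N relationships."""
    rows = conn.execute(
        "SELECT e.ename FROM works_on AS w JOIN employee AS e ON e.ssn = w.ssn "
        "WHERE w.pnumber = ? ORDER BY e.ename",
        (pnumber,),
    )
    return [name for (name,) in rows]

print(employees_on(1))  # ['English, Joyce', 'Smith, John']
```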
Foreign Key
When a key migrates to another entity it is called a Foreign Key
A foreign key CAN BE null if it is not part
of an entity’s primary key
If the FK value is NOT null, then that value
MUST exist in the table in which it is the
primary key. This is called Referential
Integrity (RI)
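Referential integrity can be sketched with Python's sqlite3 module. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is issued on the connection. The `department`/`employee` tables and the `try_insert` helper are hypothetical; the FK is not part of the primary key, so NULL is accepted, while a non-null value must exist in the parent table.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when this is on
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE employee (
        ssn     TEXT PRIMARY KEY,
        ename   TEXT NOT NULL,
        dnumber INTEGER REFERENCES department (dnumber)  -- FK outside the PK, so NULL is allowed
    );
    INSERT INTO department VALUES (5, 'Research');
""")

def try_insert(ssn: str, dnumber: Optional[int]) -> bool:
    """Attempt an insert; return False if referential integrity rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (ssn, "Test", dnumber))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("111111111", 5))     # True: department 5 exists
print(try_insert("222222222", None))  # True: a NULL FK references nothing
print(try_insert("333333333", 9))     # False: no department 9, RI violated
```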
Recursive Relationships
An entity having a relationship with itself
Same entity participates more than once in
a relationship type in different roles
The same cardinality types (1:1, 1:N, M:N) apply to recursive relationships
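A 1:N recursive relationship (a supervisor has many reports) can be sketched in Python's sqlite3 module. The `employee` table, its `supervisor_id` column, and the hierarchy are hypothetical. The same entity is joined to itself, once per role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id        INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        supervisor_id INTEGER REFERENCES employee (emp_id)  -- same entity, "supervisor" role
    );
    INSERT INTO employee VALUES (1, 'Kelly', NULL),  -- top of the hierarchy
                                (2, 'Bob', 1),
                                (3, 'Jack', 1),
                                (4, 'Mary', 2);
""")

# The entity appears twice in one query: once as subordinate, once as supervisor.
pairs = conn.execute("""
    SELECT sub.name, sup.name
    FROM employee AS sub JOIN employee AS sup ON sub.supervisor_id = sup.emp_id
    ORDER BY sub.emp_id
""").fetchall()
print(pairs)  # [('Bob', 'Kelly'), ('Jack', 'Kelly'), ('Mary', 'Bob')]
```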
Weak Entity Type
Entity that does not have a key attribute of its own
Identified by its relationship with another
entity
Created for multi-valued attributes and time
dependent attributes
Weak entity has EXISTENCE dependence
on the parent. Only exists if the owner
entity exists.
Primary Keys of Weak Entities
Can use the primary key of the owner entity along with a qualifier such as sequence number or date/time
Can create a surrogate key but make sure
you migrate the key of the parent
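Here is a minimal weak-entity sketch in Python's sqlite3 module, assuming a hypothetical `dependent` table owned by `employee`. Its primary key is the owner's key plus a sequence number, and `ON DELETE CASCADE` is one way to express existence dependence: removing the owner removes its dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to fire
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT NOT NULL);
    -- Weak entity: no key of its own; its PK is the owner's key plus a qualifier.
    CREATE TABLE dependent (
        ssn    TEXT    NOT NULL REFERENCES employee (ssn) ON DELETE CASCADE,
        seq_no INTEGER NOT NULL,          -- qualifier: 1, 2, 3... within one employee
        name   TEXT    NOT NULL,
        PRIMARY KEY (ssn, seq_no)
    );
    INSERT INTO employee VALUES ('123456789', 'Smith, John');
    INSERT INTO dependent VALUES ('123456789', 1, 'Alice'), ('123456789', 2, 'Theodore');
""")

def dependent_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

print(dependent_count())  # 2
# Existence dependence: removing the owner removes its dependents.
with conn:
    conn.execute("DELETE FROM employee WHERE ssn = '123456789'")
print(dependent_count())  # 0
```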
Ternary Relationship
Relationship between 3 entities
Differs from 3 binary relationships
States that all three entities occur at the
same time
Must be converted to binary relationships
Creating Binary Relationships from a Ternary Relationship
Participation Constraints
Specifies whether the existence of an entity depends on its being related to another entity via a relationship
Notes the minimum cardinality
Total participation (mandatory)
Partial participation (optional)
Identifying Participation Constraints
Can entity A exist without entity B?
– If no, A has total participation in the
relationship
– If yes, entity A has partial participation in the
relationship
Identifying Relationships In Erwin
An identifying relationship is a relationship
between two tables in which an instance of
a child table is identified through its
association with a parent table, which
means the child table is dependent on the
parent table for its identity, and cannot exist
without it. In an identifying relationship,
one instance of the parent table is related to
multiple instances of the child.
Non-Identifying Relationship In Erwin
A non-identifying
relationship is a relationship
between two tables in which an instance of the
child table is not identified through its
association with a parent table, which means the
child table is not dependent on the parent table
for its identity, and can exist without it. In a
non-identifying relationship, one instance of the
parent table is related to multiple instances of
the child.
Optional Non-Identifying
In an optional non-identifying relationship,
the columns that are migrated into the non-key area of the child table are not required
in the child table. This means that nulls are
allowed in the foreign key. ERwin draws an
optional non-identifying relationship
differently depending on the notation for
your diagram.
Mandatory Non-Identifying
In a mandatory non-identifying relationship, the columns that are migrated
into the non-key area of the child table are
required in the child table. This means that
the foreign key cannot be null.
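The optional/mandatory distinction maps directly onto column nullability. This Python sqlite3 sketch (not ERwin output) assumes a hypothetical `orders` table that has its own key, so both relationships are non-identifying: `customer_id` is mandatory (NOT NULL) and `rep_id` is optional (nulls allowed).

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE sales_rep (rep_id      INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,                                -- own key: non-identifying
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),  -- mandatory: no nulls
        rep_id      INTEGER REFERENCES sales_rep (rep_id)                 -- optional: nulls allowed
    );
    INSERT INTO customer VALUES (1);
""")

def can_insert(order_id: int, customer_id: Optional[int], rep_id: Optional[int]) -> bool:
    """Return False if a constraint (here, NOT NULL) rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, rep_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(can_insert(100, 1, None))     # True: optional FK left null
print(can_insert(101, None, None))  # False: mandatory FK cannot be null
```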
Erwin Notation
(Figure: ERwin notation legend with Cardinality and Description columns, showing One to 0, 1, or M; Identifying vs. Non-Identifying relationships; and Nulls vs. No Nulls)
To Null or Not to Null….
NULL means no value
Two types of null values
– Unknown
– None (does not exist or not applicable)
Null Examples
Employee
e#  name   salary  spouse
1   Bob    10,000  Mary
2   Jack   20,000  Kate
3   Mary   30,000  NULL
4   Kelly  NULL    John
Questions:
• How many people make more than 15K?
• What is the average salary?
• Is Mary married?
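The questions above can be run against the same Employee data with Python's sqlite3 module (the `e#` column is renamed `e_no` because `#` is not valid in an unquoted SQL identifier). Kelly's NULL salary is counted neither in "more than 15K" nor in "not more than 15K", and AVG silently skips it; Mary's NULL spouse cannot tell us whether she is unmarried or her spouse is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (e_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER, spouse TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Bob", 10000, "Mary"), (2, "Jack", 20000, "Kate"),
     (3, "Mary", 30000, None), (4, "Kelly", None, "John")],
)

def scalar(sql: str):
    return conn.execute(sql).fetchone()[0]

over_15k = scalar("SELECT COUNT(*) FROM employee WHERE salary > 15000")            # 2
not_over_15k = scalar("SELECT COUNT(*) FROM employee WHERE NOT (salary > 15000)")  # 1, so 2 + 1 != 4 rows
avg_salary = scalar("SELECT AVG(salary) FROM employee")  # 20000.0: Kelly's NULL is ignored
mary_spouse = scalar("SELECT spouse FROM employee WHERE name = 'Mary'")  # None: unknown or none?
print(over_15k, not_over_15k, avg_salary, mary_spouse)  # 2 1 20000.0 None
```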
Problems with NULL
Null values are ambiguous
More programming is required to deal with
NULL values
Try to use UNKNOWN or NONE if
applicable
Getting Physical…
Converting the logical data model into the physical data model
Things to do when converting
Identify data type
– Is it a string (character field) or a number?
– Use of varchar() or char()?
– Dates are dates not strings
Identify data length
– Consider growth over time and maximum size
requirements
Identify value constraints (valid ranges, values, etc.)
Follow proper naming conventions
Determine indexes
Consider combining 1:1 relationship
entities
Roll-up generalization / specialization
hierarchies
Add organizational attributes if any
Indexes
An index is a physical access structure
Makes queries more efficient
Things to consider when creating
– Create an index for each PK
– Create an index for each FK
– Create an index for each AK which will be
used in queries
– Try to minimize number of indexes (update
overhead)
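The indexing guidelines can be checked with Python's sqlite3 module and SQLite's `EXPLAIN QUERY PLAN`. The schema, the index names, and the `plan` helper are hypothetical. In SQLite an `INTEGER PRIMARY KEY` is the table's rowid, so the PK is already indexed; the sketch adds an FK index and a unique index on an alternate key, and each plan should name the index it uses. The exact plan wording varies between SQLite versions, so only the index name is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,  -- PK: SQLite's rowid already serves as this index
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    );
    -- FK index: speeds up joins and lookups of a customer's accounts.
    CREATE INDEX idx_account_customer ON account (customer_id);
    -- Alternate key used in queries: a unique index also enforces uniqueness.
    CREATE UNIQUE INDEX ak_customer_email ON customer (email);
""")

def plan(sql: str) -> str:
    """Join the 'detail' column (last column) of each EXPLAIN QUERY PLAN row."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan("SELECT * FROM account WHERE customer_id = 7"))
print(plan("SELECT * FROM customer WHERE email = 'a@example.com'"))
```

Each extra index must be maintained on every insert, update, and delete, which is the update overhead the slide warns about.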
Specialization / Generalization
Inheritance / Abstraction
Subclasses / Superclasses
Specialization / Generalization
Two processes resulting in the same model
Specialization is top-down approach. Can a
high level entity be broken down?
Generalization is bottom-up approach. Can
entities be combined at a higher level?
Example
Notes on Generalization/Specialization
Key of subclass is always key of superclass
Subclasses can participate in their own
relationships
Participation in a subclass can either be inclusive
or exclusive
Exclusive subclasses should be defined by a type
Multiple inheritance not allowed in most
modeling tools
When converting to physical could combine into
one entity
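Here is a table-per-subclass sketch in Python's sqlite3 module, assuming a hypothetical `person` superclass with `employee` and `student` subclasses. Each subclass reuses the superclass key, and a type column marks which exclusive subclass a person belongs to. The closing LEFT JOIN shows what a rolled-up physical design would store as one table with nullable subclass columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Superclass holds shared attributes plus a type marking the exclusive subclass.
    -- (This sketch does not stop a mismatched subclass row from being inserted.)
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        person_type TEXT NOT NULL CHECK (person_type IN ('EMPLOYEE', 'STUDENT'))
    );
    -- Each subclass's key is the superclass's key.
    CREATE TABLE employee (person_id INTEGER PRIMARY KEY REFERENCES person (person_id), salary INTEGER);
    CREATE TABLE student  (person_id INTEGER PRIMARY KEY REFERENCES person (person_id), major  TEXT);
    INSERT INTO person   VALUES (1, 'Bob', 'EMPLOYEE'), (2, 'Mary', 'STUDENT');
    INSERT INTO employee VALUES (1, 10000);
    INSERT INTO student  VALUES (2, 'History');
""")

# The roll-up: one row per person, subclass attributes NULL where they do not apply.
rows = conn.execute("""
    SELECT p.name, p.person_type, e.salary, s.major
    FROM person AS p
    LEFT JOIN employee AS e ON e.person_id = p.person_id
    LEFT JOIN student  AS s ON s.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(rows)  # [('Bob', 'EMPLOYEE', 10000, None), ('Mary', 'STUDENT', None, 'History')]
```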
Database Operations
CRUD
– Create (Insert)
– Read
– Update (Modify)
– Delete
Transactions cannot violate any integrity constraints
Several may be grouped into a transaction
May propagate to maintain integrity constraints
If update violations occur
Cancel the operation (Restrict)
Perform additional updates / deletes so the
violation is corrected (Cascade)
Execute a user specified operation to
correct (Trigger)
Perform the operation but inform the user
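Three of these responses can be sketched with SQLite's declarative foreign key actions and a trigger, via Python's sqlite3 module. All table and trigger names are hypothetical. Deleting a customer who still has an account is cancelled (RESTRICT), deleting a customer with notes also deletes the notes (CASCADE), and a trigger runs a user-specified action (here, writing an audit row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK actions are ignored unless this is on
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id) ON DELETE RESTRICT
    );
    CREATE TABLE note (
        note_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id) ON DELETE CASCADE
    );
    CREATE TABLE audit_log (message TEXT);
    -- Trigger: a user-specified action that runs alongside the delete.
    CREATE TRIGGER log_customer_delete AFTER DELETE ON customer
    BEGIN
        INSERT INTO audit_log VALUES ('deleted customer ' || OLD.customer_id);
    END;
    INSERT INTO customer VALUES (1, 'Bob'), (2, 'Mary');
    INSERT INTO account  VALUES (10, 1);             -- Bob still has an account
    INSERT INTO note     VALUES (100, 2), (101, 2);  -- Mary has two notes
""")

def delete_customer(customer_id: int) -> bool:
    """Return False if the delete was cancelled by a RESTRICT violation."""
    try:
        with conn:
            conn.execute("DELETE FROM customer WHERE customer_id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

print(delete_customer(1))  # False: RESTRICT cancels the delete
print(delete_customer(2))  # True: CASCADE also removes Mary's notes
print(conn.execute("SELECT COUNT(*) FROM note").fetchone()[0])   # 0
print(conn.execute("SELECT message FROM audit_log").fetchall())  # [('deleted customer 2',)]
```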
Normalization - What’s normal...
Normalization
Process to design a highly desirable relational schema using functional dependencies
Guidelines for relational database design which
– Minimize redundancy
– Avoid potential inconsistency
– Help predict data behavior problems
– Avoid update anomalies
Update Anomalies
Insert extra values
Add redundant records
Delete records not intended
Change a fact more than once, possibly in multiple tables
Miss changing a fact which is repeated
multiple times
Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
(Diagram: the # of tables and the number of joins grow with each higher normal form)
First Normal Form
A relation is in 1NF if it contains only scalar (atomic) values
– One value for an attribute
– No repeating groups
– No composite attributes
– No multi-valued attributes
To convert to 1NF
– Create 1 table for each repeating group by adding the
PK of the original table
– Remove the repeating group from the original table
Example of Non-1NF w/ Conversion
Non-1NF
Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  {Stafford, Voorhees}
Headquarters    1        888665555  {Houston}

1NF (note redundancy)
Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  Bellaire
Research        5        333445555  Sugarland
Research        5        333445555  Houston
Administration  4        987654321  Stafford
Administration  4        987654321  Voorhees
Headquarters    1        888665555  Houston
Example of Non-1NF
EmployeeProject - NON-1NF
SSN        Ename            Pnumber  Hours
123456789  Smith, John      1        32.5
                            2        7.5
666885555  Narayan, Ramesh  3        40
453223344  English, Joyce   1        20
                            2        20

Conversion
SSN        Ename
123456789  Smith, John
666885555  Narayan, Ramesh
453223344  English, Joyce

SSN        Pnumber  Hours
123456789  1        32.5
123456789  2        7.5
666885555  3        40
453223344  1        20
453223344  2        20
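The 1NF conversion steps can be sketched in plain Python on the EmployeeProject data above. The record layout (a `projects` list per employee) and the `to_1nf` name are assumptions for illustration: the repeating group moves into its own table, each row carrying the original table's PK (SSN), and the group is removed from the original table.

```python
def to_1nf(records):
    """Split each record's repeating group into its own table, carrying the PK (SSN) along."""
    employees, employee_projects = [], []
    for rec in records:
        employees.append((rec["ssn"], rec["ename"]))    # original table minus the repeating group
        for pnumber, hours in rec["projects"]:           # one row per member of the group
            employee_projects.append((rec["ssn"], pnumber, hours))
    return employees, employee_projects

non_1nf = [
    {"ssn": "123456789", "ename": "Smith, John",     "projects": [(1, 32.5), (2, 7.5)]},
    {"ssn": "666885555", "ename": "Narayan, Ramesh", "projects": [(3, 40)]},
    {"ssn": "453223344", "ename": "English, Joyce",  "projects": [(1, 20), (2, 20)]},
]
employees, employee_projects = to_1nf(non_1nf)
print(len(employees), len(employee_projects))  # 3 5
```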
Second Normal Form
All non-key attributes in the relation are functionally dependent on the complete PK
Each non-key attribute is uniquely defined
by all components of the primary key
Example of Non-2NF w/ Conversion
EmployeeProject(SSN, Pnumber, Hours, Ename, Pname, Plocation)
FD1: {SSN, Pnumber} -> Hours
FD2: SSN -> Ename
FD3: Pnumber -> {Pname, Plocation}
Conversion to 2NF
EP1(SSN, Pnumber, Hours)
EP2(SSN, Ename)
EP3(Pnumber, Pname, Plocation)
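The 2NF decomposition can be sketched in plain Python. The `fd_holds` and `project` helpers are hypothetical, and the Pname/Plocation values are hypothetical placeholders (only SSN, Pnumber, Hours, and Ename come from the examples above). A sample of rows can only refute a functional dependency, never prove it; real FDs come from business rules. The sketch shows that Ename and the project attributes depend on only part of the key {SSN, Pnumber}, and that projecting onto EP1, EP2, and EP3 stores each fact once.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value combination maps to exactly one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# EmployeeProject rows; Pname/Plocation values are hypothetical placeholders.
cols = ("SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation")
data = [
    ("123456789", 1, 32.5, "Smith, John",     "ProductX", "Bellaire"),
    ("123456789", 2, 7.5,  "Smith, John",     "ProductY", "Sugarland"),
    ("666885555", 3, 40,   "Narayan, Ramesh", "ProductZ", "Houston"),
    ("453223344", 1, 20,   "English, Joyce",  "ProductX", "Bellaire"),
    ("453223344", 2, 20,   "English, Joyce",  "ProductY", "Sugarland"),
]
rows = [dict(zip(cols, r)) for r in data]

# FD2 and FD3 depend on only part of the key {SSN, Pnumber}: a 2NF violation.
print(fd_holds(rows, ["SSN"], ["Ename"]))                   # True
print(fd_holds(rows, ["Pnumber"], ["Pname", "Plocation"]))  # True

EP1 = project(rows, ["SSN", "Pnumber", "Hours"])
EP2 = project(rows, ["SSN", "Ename"])
EP3 = project(rows, ["Pnumber", "Pname", "Plocation"])
print(len(EP2), len(EP3))  # 3 3: each employee name and project now stored once
```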
Third Normal Form
Every non-key attribute (one that does not participate in the primary key) is mutually independent
Irreducibly dependent on the primary key
Example of Non-3NF w/ Conversion
Example
Lots(PropertyID#, CountyName, Lot#, Area, Price, TaxRate)
2NF
Lots1(PropertyID#, CountyName, Lot#, Area, Price)
Lots2(CountyName, TaxRate)
3NF
Lots1A(PropertyID#, CountyName, Lot#, Area)
Lots1B(Area, Price)
Lots2(CountyName, TaxRate)
Maintaining History
Maintaining history can serve one of two purposes:
– Tracking changes in the entity over time
– Tracking record history in order to maintain inactive
records over time and maintain RI
Tracking changes in an entity over time is very difficult and requires significant storage
Tracking inactive records is our standard here
and provides value to the end user
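Tracking inactive records can be sketched with an active flag in Python's sqlite3 module. The `product`/`order_line` tables, the `is_active` column, and `retire_product` are hypothetical. Instead of deleting a retired product, which would break the foreign key of historical order lines, the row is marked inactive: current lists filter it out, while history still joins to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        is_active  INTEGER NOT NULL DEFAULT 1 CHECK (is_active IN (0, 1))
    );
    CREATE TABLE order_line (
        line_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product (product_id)
    );
    INSERT INTO product (product_id, name) VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO order_line VALUES (500, 2);  -- historical order for the Gadget
""")

def retire_product(product_id: int) -> None:
    """Mark inactive instead of deleting, so old order lines keep a valid FK."""
    with conn:
        conn.execute("UPDATE product SET is_active = 0 WHERE product_id = ?", (product_id,))

retire_product(2)
active = [name for (name,) in conn.execute("SELECT name FROM product WHERE is_active = 1")]
history = conn.execute("""
    SELECT ol.line_id, p.name, p.is_active
    FROM order_line AS ol JOIN product AS p ON p.product_id = ol.product_id
""").fetchall()
print(active)   # ['Widget']
print(history)  # [(500, 'Gadget', 0)]
```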
Examples of History…