Property and Casualty Insurance Working Group
Logical Relational Data Modeling Standards
Version 1.0
June 16, 2008
Table of Contents
Introduction
  Purpose
  Document Maintenance
  Scope
Logical Relational Data Model Definition
ER Diagramming Conventions
  Modeling Syntax
  Diagramming Layout Guidelines
  Normal Forms
Writing Definitions of Logical Objects
  Logical Object Definition Guidelines
  Entity Definition Guidelines
  Attribute Definition Guidelines
Naming Logical Objects
  Logical Object Naming Guidelines
  Entity Naming Guidelines
  Attribute Naming Guidelines
  Relationship Naming Guidelines
Relationship Standards
Super-types and Sub-types
Entity Keys
Dimensional Data Modeling
Appendix
  Class Words
Introduction
Purpose
This document provides standards and guidance for the naming and use of objects in logical relational
data models. Logical objects are created and maintained to meet business requirements. Accurate
naming clarifies the specific nature of each logical object. Consistency allows the logical names to have
persistent value in differentiating data items. Name formation and the use of logical modeling objects
are independent of any particular data modeling tool, CASE tool, or relational database management
system (RDBMS) platform.
The intention of this standard is to establish an agreed-upon basis for developing logical relational data
models in order to promote greater quality and consistency across data models and enable objective
model reviews.
Document Maintenance
To suggest improvements, changes or additions to this standard, contact:
Gail Austin [email protected]
or
Harsh Sharma [email protected]
Scope
These standards apply to all logical relational data models that are developed by OMG
submission teams.
Logical Relational Data Model Definition
The relational model for database management is a database model based on predicate logic and set
theory. It was first formulated and proposed in 1969 by Edgar Codd with aims that included avoiding,
without loss of completeness, the need to write computer programs to express database queries and
enforce database integrity constraints. “Relation” is a mathematical term for “table”, and thus
“relational” roughly means “based on tables”. Contrary to popular belief, it does not refer to the
links or “keys” between tables.1
1 Wikipedia – relational model
A logical relational data model defines what an organization knows about things of interest to the
business and graphically shows how they relate to each other in an entity relationship (ER) diagram. An
entity relationship diagram is an abstract conceptual representation of structured data. It uses standard
symbols to denote the things of interest to the business (entities), the relationships between entities
and the cardinality and optionality of those relationships. The Logical Relational Data Model, in contrast
to the more abstract Conceptual Relational Data Model, contains detailed characteristics of the entities
(attributes) and their definitions. It generates the structure of a physical data model which in turn
generates a database following Model Driven Architecture principles. It is a result of detailed analysis of
the business requirements.
The following illustration shows how the logical model fits into the overall data modeling process:
Ultimately, the logical relational data model helps to solidify and validate business requirements and
delivers stable, flexible data structures that are easily navigated and can answer unanticipated
questions.
ER Diagramming Conventions
Modeling Syntax
The recommended notation for models is Information Engineering (IE), also called “Crow’s Feet”, because
it is easier for users to interpret than the Integration Definition for Information Modeling (IDEF1X)
notation.2
2 The choice of IE notation will be revisited when the Barker notation becomes more widely available in
the modeling tools.
Diagramming Layout Guidelines
Orient entities so that the “toes” of a relationship’s crow’s foot always point down. This puts
fundamental entities in the top area of the diagram, and positions associative and subtype entities in the
lower area of the diagram.
[Figure: Recommended crow’s feet down convention. Avoid dead crows!]
Keep the relationship lines as straight as possible. Avoid unnecessary bends. Too many symbols clutter
the diagram and make it confusing to the viewer.
Avoid crossing relationship lines. Crossed lines make it difficult to understand which entities are related.
Relationship names should be placed on the diagram so that the verbs or verb phrases are read in a
clockwise direction from one entity to the related entity.
Example:
Normal Forms
Normal Forms provide a way to structure data to eliminate undesirable redundancies, inconsistencies
and dependencies. Normalization is a formalized technique for creating the most desirable logical
model for the given data and business rules. Completed logical models should be in, at least,
Boyce/Codd Normal Form (BCNF)3. For a model to be in BCNF, every entity in the model must be in
BCNF. The normal forms are summarized below:
First Normal Form (1NF) identifies and eliminates repeating groups and establishes a primary key.
Second Normal Form (2NF) identifies and removes partial-key dependencies. This applies only to
tables with composite keys.
Third Normal Form (3NF) identifies and eliminates non-key attributes that are dependent on other
non-key attributes.
Boyce/Codd Normal Form (BCNF) identifies and eliminates key attributes that are dependent upon
other key attributes in an entity with a composite key. Put another way, in BCNF the determinant of
every non-trivial dependency is a candidate key.
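As a rough illustration of the BCNF test, the following Python sketch computes attribute closures and reports every dependency whose determinant does not identify the whole entity. The POLICY COVERAGE entity, its attributes and its dependencies are hypothetical and are not part of this standard.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency: determinant attributes -> dependent attributes.
Dependency = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attributes: Iterable[str], dependencies: List[Dependency]) -> Set[str]:
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in dependencies:
            if determinant <= result and not dependents <= result:
                result |= dependents
                changed = True
    return result


def bcnf_violations(entity_attributes: Set[str],
                    dependencies: List[Dependency]) -> List[Dependency]:
    """List the non-trivial dependencies whose determinant is not a superkey."""
    violations = []
    for determinant, dependents in dependencies:
        if dependents <= determinant:
            continue  # trivial dependency, always allowed
        if closure(determinant, dependencies) != entity_attributes:
            violations.append((determinant, dependents))
    return violations


# Hypothetical entity keyed by (Policy Number, Coverage Code).
attributes = {"Policy Number", "Coverage Code", "Coverage Name", "Premium Amount"}
dependencies = [
    (frozenset({"Policy Number", "Coverage Code"}), frozenset({"Premium Amount"})),
    (frozenset({"Coverage Code"}), frozenset({"Coverage Name"})),
]
violations = bcnf_violations(attributes, dependencies)
```

Here Coverage Name depends on only part of the composite key, so the second dependency is reported. One possible resolution is to move Coverage Name to a separate COVERAGE entity keyed by Coverage Code.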
Writing Definitions of Logical Objects
Good Logical Object names are important because they provide a persistent record of the unique nature
of each object. Good names cannot be developed unless the object first has a good business definition.
Logical Object Definition Guidelines:
• Use industry definitions where possible and appropriate.
• Describe what the entity or attribute is – not where, when or by whom it is used.
• Be clear and concise.
• Write as if the reader is unfamiliar with the business area.
• Use business terms rather than technical terms to express the meaning and importance to the
  business.
• Use mixed case according to standard business English conventions.
• Do not use jargon, abbreviations or acronyms.
• Do not include information that should be documented elsewhere, such as process descriptions.
Entity Definition Guidelines:
• Entity definitions should be robust and communicate the essential and unique business nature
  of the entity.
• Do not depend on or refer to the definition of another object in the model.
• Express one concept or idea – each entity should have a unique meaning.
Attribute Definition Guidelines:
• Attribute definitions should communicate the essential business nature and purpose of the
  attribute.
• Do not depend on or refer to the definition of another object in the model, except for derived
  attributes.
• Include the domain of allowed values and default value where appropriate.
3 See Wikipedia Database Normalization: http://en.wikipedia.org/wiki/Database_normalization
Naming Logical Objects
Logical Object Naming Guidelines
• Use one or more words which are formed using the 26 letters (A-Z), the 10 digits (0-9), and no
  special characters.
• Separate words in the name with one space.
• Spell out words completely using no abbreviations.
• Use the minimum set of words for the name that completely and uniquely captures the concept
  expressed in the business definition.
• Reflect the business nature of the object in its name.
• Review names and corresponding definitions with business subject matter experts and get their
  approval.
• Express a single idea or concept in the name that is clear and self-explanatory.
• Write in plain English, spelling out all terms in full using business terms as defined by the business
  client or as defined in a business or industry dictionary.
• Do not use the possessive form; the articles “a”, “an”, or “the”; conjunctions; verbs; or prepositions
  in the name.
• Do not use the names of organizations, departments, computer applications, reports, publications,
  forms or computer screens in the name.
• Exceptions
  o Acronyms – An acronym is a word formed from the initial letters of a name, as WAC for
    Women’s Army Corps, or by combining initial letters or parts of a series of words, as radar for
    radio detecting and ranging. When an acronym is widely known it may be an exception to the
    no abbreviation rule. A list of exceptions should be maintained as an appendix to this standard
    and subject to an approval and a governance process.
  o Abbreviations – If the object name is too long to fit in the space allotted by the data modeling
    tool and all non-essential words have been eliminated from the name, abbreviate the class
    word. If the name is still too long, find text in the name that can form acronyms. Starting with
    the right-most text, apply the acronym and repeat moving left in the name until the name fits.
  o Hyphen – use if the correct spelling of the word contains a hyphen (e.g. off-premises)
  o Slash – allowed if used in a business term (e.g. Actual/Expected)
  o Camel Case – allowed if the business term has an uppercase letter beyond the first letter –
    though rarely found in formal written English, it is sometimes found in product names or
    company names (e.g. NetQuote, SmartBrief)
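Some of the mechanical naming rules above lend themselves to automated checking. The following Python sketch is one possible approach, not a complete validator: the `check_logical_name` function and the `FORBIDDEN_WORDS` list are hypothetical, the word list is only a small illustrative sample, and rules such as "no verbs" or "no abbreviations" still need human review.

```python
import re
from typing import List

# A word is letters and digits, optionally joined by the hyphen or slash
# permitted under the Exceptions (e.g. off-premises, Actual/Expected).
WORD = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")

# Illustrative sample only: articles plus a few conjunctions and prepositions.
FORBIDDEN_WORDS = {"a", "an", "the", "and", "or", "of", "for", "to", "in", "by"}


def check_logical_name(name: str) -> List[str]:
    """Return a list of rule violations found in a logical object name."""
    problems = []
    if name != name.strip() or "  " in name:
        problems.append("words must be separated by exactly one space")
    for word in name.split():
        if "'" in word or "’" in word:
            problems.append(f"possessive form or apostrophe in {word!r}")
        elif not WORD.fullmatch(word):
            problems.append(f"special character in {word!r}")
        elif word.lower() in FORBIDDEN_WORDS:
            problems.append(f"article, conjunction or preposition {word!r}")
    return problems


ok = check_logical_name("Policy Effective Date")
hyphenated = check_logical_name("Off-Premises Theft Indicator")
preposition = check_logical_name("Date of Loss")
possessive = check_logical_name("Driver's License Number")
```

An empty list means no mechanical violation was found. It does not mean the name has been approved by a business subject matter expert.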
Entity Naming Guidelines
• Form a meaningful, concise, descriptive business name for the entity by extracting the important
  concepts from its business definition. The name should avoid confusion with similarly named but
  differently defined entities in other business areas.
• Use business terms as defined by a business subject matter expert or by a business dictionary.
• Make the entity name a singular noun or noun phrase with qualifying adjectives because each
  instantiation of the object represented by the entity is a single thing.
• Use UPPER CASE.
• Consider appending “LOOKUP” to reference entity names to make them easier to distinguish from
  fundamental entities.
• Do not use the words “Entity” or “Table” in the entity name unless they are part of common
  business terminology.
• Combine the names of the parent entities to form the name of the associative entity if that forms a
  meaningful business name. For example, PERSON SKILL describes the association between the
  PERSON and SKILL entities. In other cases, the noun form of the relationship verb may form the
  associative entity name, as in POLICY, which describes the association between PARTY and PARTY.
Attribute Naming Guidelines
• Form a meaningful, concise, descriptive business name for the attribute by extracting the
  important concepts from its business definition. Attributes in more than one model should have
  the same name and definition in all models.
• Use a singular noun or singular noun phrase with qualifying adjectives that are meaningful to
  the business.
• Use Title Case.
• Do not use a class word or its abbreviation by itself as an attribute name.
• Do not use the word “Attribute” in the attribute name unless it is part of common business
  terminology.
• Attribute Name Structure
  o An attribute name begins with at least one Qualifier followed by a Class Word. Note that
    conjunctions, verbs and other parts of speech are eliminated when they do not affect
    the meaning of the name.
  o Class words describe the type of data identified by the attribute name. Examples
    include: amount, code, date, indicator, name and number.
  o End the name with an approved class word that best categorizes the attribute. Class
    words may also give an indication of the data type and possible values of the attribute,
    e.g. an indicator is always a single alphanumeric character with only two possible values
    other than Null: ‘Y’ or ‘N’.4
  o Units of Measure describe the quantity that was measured such as height or volume.
  o Objects are used for program objects, images, sounds and videos.
4 See Appendix for details on Class Words.
Examples of logical attribute names and their components:
MODIFIER (qualifier) | PRIME WORD (qualifier) | KEY WORD (class word) | UNIT OF MEASURE (class word) | OBJECT (class word)
Automobile | Acquisition | Date | |
Insurance | Company | Name | |
Payment | Status | Code | |
Valid Driver | License | Indicator | |
Vehicle Engine | Capacity | | Cubic Centimeters |
Accident | Photograph | | | Image Jpeg
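The attribute name structure (one or more qualifiers followed by a class word) can be checked mechanically. The Python sketch below is one possible way to do so. The `split_attribute_name` function is hypothetical, and `CLASS_WORDS` holds only a subset of the class words listed in the Appendix.

```python
from typing import List, Tuple

# Subset of the approved class words from the Appendix (key words, units of
# measure and objects); a real check would load the full approved list.
CLASS_WORDS = (
    "Amount", "Code", "Count", "Date", "Description", "Identifier",
    "Indicator", "Line", "Name", "Number", "Percent", "Text", "Time",
    "Timestamp", "Centimeters", "Cubic Centimeters", "Miles Per Hour",
    "Image Jpeg",
)


def split_attribute_name(name: str) -> Tuple[List[str], str]:
    """Split an attribute name into its qualifier words and its class word."""
    words = name.split()
    # Try multi-word class words first so that "Cubic Centimeters" is
    # preferred over "Centimeters".
    for class_word in sorted(CLASS_WORDS, key=lambda w: -len(w.split())):
        parts = class_word.split()
        tail = words[-len(parts):]
        if [w.lower() for w in tail] == [p.lower() for p in parts]:
            qualifiers = words[:-len(parts)]
            if not qualifiers:
                raise ValueError("a class word may not be used by itself")
            return qualifiers, class_word
    raise ValueError(f"{name!r} does not end with an approved class word")


acquisition = split_attribute_name("Automobile Acquisition Date")
capacity = split_attribute_name("Vehicle Engine Capacity Cubic Centimeters")
```

The sketch returns all qualifier words together and does not try to tell the modifier from the prime word, since that distinction depends on business meaning.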
Relationship Naming Guidelines
• The relationship name should be a verb or a verb phrase in third person singular form, i.e. a verb
  form that is appropriate for a singular occurrence of the entity. This verb or verb phrase should
  be an active verb in the parent to child direction and a passive verb phrase in the child to parent
  direction. When used with the cardinality and optionality information, the verb or verb phrase
  allows the relationship to be read as bi-directional English sentences. For example: A POLICY
  covers zero, one or many EXPOSURE(S). An EXPOSURE is covered by one and only one
  POLICY.
• Do not include words that convey cardinality or optionality in the verb phrase – words such as
  ‘may’, ‘must’, ‘one and only one’ or ‘one or many’ are derived from the relationship symbols.
• Avoid using generic or vague words and phrases such as ‘is’, ‘has’, ‘consists of’, ‘relates to’,
  ‘associated with’, etc.
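To illustrate how a relationship name combines with cardinality and optionality to form bi-directional sentences, the following Python sketch assembles the two readings from their parts. The `read_relationship` function and its parameters are hypothetical. The cardinality phrases are passed in separately because they come from the relationship symbols, not from the verb phrase.

```python
from typing import Tuple


def article(noun: str) -> str:
    """Pick 'A' or 'An' from the first letter (a simplification of English usage)."""
    return "An" if noun[0].upper() in "AEIOU" else "A"


def read_relationship(parent: str, active_verb: str, child: str, passive_verb: str,
                      child_cardinality: str = "zero, one or many",
                      parent_cardinality: str = "one and only one") -> Tuple[str, str]:
    """Return the parent-to-child and child-to-parent readings of a relationship."""
    forward = f"{article(parent)} {parent} {active_verb} {child_cardinality} {child}(S)."
    backward = f"{article(child)} {child} {passive_verb} {parent_cardinality} {parent}."
    return forward, backward


forward, backward = read_relationship("POLICY", "covers", "EXPOSURE", "is covered by")
```

Reading both sentences aloud with a business subject matter expert is a quick way to confirm that the verb phrase and the cardinality are right.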
Relationship Standards
A relationship describes the precise business rules governing the association between two entities and
facilitates the identification of foreign keys and referential integrity rules that may be required in the
database design.
• The minimum components that must be specified for each relationship are:
  o Name – a verb or verb phrase from parent to child
  o Optionality rules
  o Cardinality rules
  o Qualification as an identifying or non-identifying relationship
• Many-to-many relationships are desirable in Conceptual Data Models but should always be
  resolved with an associative entity in a Logical Data Model even if the associative entity has no
  attributes other than the keys.
• Investigate all mandatory one-to-one relationships because usually the two entities are in fact
  one entity.
• Eliminate circular relationships because they cause problems establishing proper data
  dependency sequences. They usually result from an incorrect or misunderstood business rule.
• Eliminate redundant relationships that consist of two dependency paths between the same two
  entities. One of the paths is a direct relationship between the entities; the other uses a
  non-direct path that involves other entities. These redundant relationships may lead to problems
  with database consistency.
• Carefully review multiple relationships between the same two entities as they tend to represent
  process logic and may introduce conflicting cardinalities. If the multiple relationships are
  created to document roles, a better solution may be to create a role entity with appropriate
  subtypes.
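Circular and redundant relationships can be hard to spot by eye in a large diagram. The Python sketch below shows one possible way to find them by treating each relationship as a parent-to-child edge in a directed graph. The function names and the entity names in the example are hypothetical. Note that this sketch also flags a self-referencing (recursive) relationship as circular, so such cases would need separate review.

```python
from typing import Dict, List, Optional, Set, Tuple

Relationship = Tuple[str, str]  # (parent entity, child entity)


def build_graph(relationships: List[Relationship]) -> Dict[str, Set[str]]:
    """Map each entity to the set of its direct child entities."""
    graph: Dict[str, Set[str]] = {}
    for parent, child in relationships:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return graph


def reachable(graph: Dict[str, Set[str]], start: str,
              skip_edge: Optional[Relationship] = None) -> Set[str]:
    """Entities reachable from `start`, optionally ignoring one direct edge."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for child in graph[node]:
            if (node, child) == skip_edge or child in seen:
                continue
            seen.add(child)
            stack.append(child)
    return seen


def circular_entities(relationships: List[Relationship]) -> Set[str]:
    """Entities that depend, directly or indirectly, on themselves."""
    graph = build_graph(relationships)
    return {entity for entity in graph if entity in reachable(graph, entity)}


def redundant_relationships(relationships: List[Relationship]) -> List[Relationship]:
    """Direct relationships that duplicate a non-direct path."""
    graph = build_graph(relationships)
    return [(parent, child) for parent, child in relationships
            if child in reachable(graph, parent, skip_edge=(parent, child))]


model = [("POLICY", "COVERAGE"), ("COVERAGE", "CLAIM"), ("POLICY", "CLAIM")]
redundant = redundant_relationships(model)
circular = circular_entities(model + [("CLAIM", "POLICY")])
```

In the example, POLICY to CLAIM is reported as redundant because CLAIM is already reachable through COVERAGE. Whether the direct relationship should actually be dropped remains a business rule decision.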
Super-types and Sub-types
Super-types and sub-types can be the result of either a generalization process – bottom-up – or a
specialization process – top-down. The result is a super-type (parent) that contains attributes that are
shared by all sub-types and a sub-type (child) that inherits all the shared attributes from the super-type
but also has unique attributes of its own.
• A sub-type has an ‘is a’ relationship to its super-type. Sub-types are not ‘composed of’
  relationships.
• Super-types and sub-types clarify complex business rules and constraints between entities.
• The super-type and sub-type have an exclusive OR relationship. An instance of the super-type
  can be an instance of only one of the sub-type entities.
[Figure: an example of super-types and sub-types]
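The exclusive OR rule can be verified against sample data. The following Python sketch is a minimal illustration that reports every super-type instance claimed by more than one sub-type. The PERSON and ORGANIZATION sub-types and the integer keys are hypothetical.

```python
from typing import Dict, Set


def exclusive_or_violations(subtype_members: Dict[str, Set[int]]) -> Dict[int, Set[str]]:
    """Return the super-type keys that appear in more than one sub-type."""
    owners: Dict[int, Set[str]] = {}
    for subtype, keys in subtype_members.items():
        for key in keys:
            owners.setdefault(key, set()).add(subtype)
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical sub-types of a PARTY super-type, holding super-type key values.
members = {"PERSON": {1, 2}, "ORGANIZATION": {2, 3}}
violations = exclusive_or_violations(members)
```

Key 2 is reported because it appears in both sub-types, which the exclusive OR rule does not allow.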
Entity Keys
A key identifies specific occurrences of an entity. Keys can be simple, consisting of a single attribute, or
composite, consisting of two or more attributes.
• A Candidate Key uniquely identifies occurrences of an entity. There may be more than one
  candidate key for an entity. Candidate keys are not usually recorded in the logical data model
  because they become either a primary key or an alternate key.
• A Primary Key is a single candidate key selected as the ‘primary’ unique identifier for the
  entity.
  o The primary key must be stable for a relational data model. If the value were to change
    over time, the result could be either a non-unique key value or multiple key values for
    one instance of an entity. Either situation could cause ambiguous or lost data, system
    crashes or difficult update processes.
  o The primary key should be definitive because it uniquely identifies an instance of the
    entity and thus no instance can be added to the entity until its identity is fully known.
    The primary key cannot be nullable or contain nullable components.
  o The primary key should use the minimal number of attributes required to define a
    unique instance of the entity. A concise key has advantages in the physical database
    such as smaller indexes and foreign keys.
• An Alternate Key is any candidate key not selected as the primary key of an entity. Alternate
  keys are not usually recorded in the logical model but may become indexes in the physical
  model. Alternate keys are usually unique but are not required to be.
• A Surrogate Key consists of a single attribute created for the sole purpose of uniquely
  identifying an instance of an entity. Natural keys consist of attributes that ‘naturally’ belong to
  each occurrence of the entity. Surrogate keys are identifiers that contain no inherent,
  embedded data about the entity. That is to say, they are always non-intelligent keys. Surrogate
  keys are usually a numeric attribute whose value can be generated automatically either as a
  sequential number or a random number. Synonyms for a surrogate key include: artificial key,
  synthetic key, arbitrary key, and system-generated key.
• A Foreign Key is a primary key of one entity (the ‘parent’ or independent entity) that is
  duplicated in a separate, related entity (the ‘child’ or dependent entity). A foreign key is not
  required to be unique within the child entity. A foreign key that is part of a composite primary
  key in the child entity is known as an identifying or primary foreign key. Attributes in a
  non-identifying foreign key become non-key attributes in the child entity.
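To show how these key types might surface once a logical model is turned into a physical schema, the sketch below uses Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the policy and exposure tables, their columns and the sample values are hypothetical, and a real physical design would follow the target RDBMS conventions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE policy (
    policy_id     INTEGER PRIMARY KEY,   -- surrogate primary key
    policy_number TEXT NOT NULL UNIQUE   -- natural candidate key kept as an alternate key
);
CREATE TABLE exposure (
    policy_id                INTEGER NOT NULL REFERENCES policy (policy_id),
    exposure_sequence_number INTEGER NOT NULL,
    -- policy_id is an identifying foreign key: it is part of the composite primary key
    PRIMARY KEY (policy_id, exposure_sequence_number)
);
""")

conn.execute("INSERT INTO policy (policy_number) VALUES ('HO-1001')")
policy_id = conn.execute(
    "SELECT policy_id FROM policy WHERE policy_number = 'HO-1001'"
).fetchone()[0]
conn.execute("INSERT INTO exposure VALUES (?, 1)", (policy_id,))
conn.execute("INSERT INTO exposure VALUES (?, 2)", (policy_id,))


def rejected(sql: str, params: tuple = ()) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = rejected("INSERT INTO exposure VALUES (999, 1)")
duplicate_alternate_rejected = rejected("INSERT INTO policy (policy_number) VALUES ('HO-1001')")
duplicate_primary_rejected = rejected("INSERT INTO exposure VALUES (?, 1)", (policy_id,))
```

The foreign key value repeats across the two exposure rows, which shows that a foreign key need not be unique within the child entity, while the composite primary key still identifies each exposure.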
Dimensional Data Modeling
There are dimensional data modeling concepts such as the grain of the model, conformed dimensions,
and diagramming layouts that deserve coverage in a standards document dedicated to dimensional
modeling. The next few paragraphs describe which parts of the Relational Data Modeling Standard
apply to the Dimensional Modeling Standards and which do not.
Relational Data Models are designed to support operational databases that capture complex
information accurately. They deliver stable, flexible data structures that are easily navigated and can
answer unanticipated questions. Dimensional Data Models are designed to support reporting and
business analytics databases. They deliver simple, high-performance queries that answer a set of
anticipated questions.
Although Relational and Dimensional Data Models serve different purposes, they share many of the
same standards. Most importantly, they both use the Model Driven Architecture approach. Also, the
Logical Object, Entity, and Attribute Definition and Naming Guidelines apply to both styles of modeling.
They are both Entity Relationship diagrams and both use the same IE modeling syntax. The Relationship
Standards also apply to both though in practice relationship names are not used as often in Dimensional
models as they are in Relational.
Dimensional models are a denormalized design. Super-types and sub-types would be merged. Their
diagramming layouts often use a star schema design and occasionally a snowflake design, so their
Diagramming Layout Guidelines are different from those of the Relational model.
Appendix
Class Words
The three tables below enumerate approved class words which come in three flavors: key words,
units of measure and objects. Each class word has a standard abbreviation, definition, and associated
logical data type. The example is a typical column name and data value.
Key Word | Abbreviation | Definition | Logical Datatype | Example
Amount | AMT | A quantity of money. | NUMERIC | Policy Face Amount = 1,200.0
Code | CD | Letters and numbers used for brevity to identify something. | STRING | Sales Office Code = AR11
Count | CNT | A numeric count or calculated quantity of anything other than money, used when no unit of measure applies. | NUMERIC | Active Employee Count = 41,256
Date | DT | Time stated in terms of year, month and day. | DATE | Disability Date = 2002/4/5
Description | DSCR | A statement that represents something in words. | STRING | Policy Change Reason Description = “Match coverage to changed income”
Identifier, ID, Identification, Identity | ID | Data that serves to uniquely identify one item in a group. | STRING or NUMERIC | Employee ID = 0123456
Indicator | IND | Data that can have only one of two values other than NULL: Y(es) or N(o). | STRING (1 character) | Auditing Approval Indicator = Y
Line | LN | A set of characters normally printed or displayed as one horizontal row. | STRING | First Address Line = “451 MAIN ST”
Name | NM | A word or words by which a thing is designated and distinguished from others. | STRING | Person Full Name = “Sammy Somerset”
Number | NUM | Normally numeric data used to identify ordinal position or to distinguish between items in a set. When numeric, it must always be a whole number. | STRING or NUMERIC | Arrival Sequence Number = 5
Objects | See Object list below | Binary Objects, such as program objects, images, sounds, or videos. | STRING |
Percent, Percentage | PCT | Numeric data specifying a portion or share out of each 100 units (75 units out of 100 is 75 percent (%)). Percent values are multiplied by 0.01 in order to facilitate customary processing. In the example, 75 percent would be stored as 0.7500 but displayed as 75.00 %. | NUMERIC | Sales Closure Percentage = .7500
Text | TXT | Data having relatively undefined content and arrangement such as a note, comments or an explanation. | STRING | Audience Comment Text = “Enthusiastic and attentive”
Time | TM | Time stated in terms of hours, minutes and seconds. | TIME | Check-In Time = 8:45 AM
Timestamp | TS | Time stated in terms of year, month, day, hours, minutes, seconds and fractions of seconds. Identifies an instant in time. | TIMESTAMP | Transaction Timestamp = 20021203134516.872
Units of Measure | See Unit of Measure list below | All units of measure, e.g. Feet, Months, Miles, Centimeters. | NUMERIC |
Unit of Measure | Abbreviation
Beats per Minute | BPM
Centimeters (Centimetres) | CM
Cubic Centimeters (Centimetres) | CC
Days | DAY
Degrees | DEG
Feet | FT
Grams | G
Horsepower | HP
Hours | HR
Inches | IN
Kilograms | KG
Kilometers (Kilometres) | KM
Kilometers (Kilometres) per Hour | KMH
Liters (Litres) | L
Meters (Metres) | M
Miles | MILE
Miles Per Hour | MPH
Millimeters (Millimetres) | MM
Minutes | MIN
Months | MO
Ounces | OZ
Pounds | LB
Units. “Units” is a generic Unit of Measure (UOM) used when data with different UOM will be stored in a common column. In this case there must be a companion code column containing a UOM abbreviation indicating the UOM of the Units value. | UNIT
Weeks | WK
Years | YR

Object Type | Object Class | Abbreviation
C++ | Program Object | OBJ_C
PowerBuilder | Program Object | OBJ_PB
SmallTalk | Program Object | OBJ_ST
Bitmap | Image | IMG_BMP
Gif | Image | IMG_GIF
Jpeg | Image | IMG_JPG
Rav | Sound | SND_RAV