1 of 37
Lecture 6
Data Model Design
(continued)
Jeffery S. Horsburgh
Hydroinformatics
Fall 2013
This work was funded by National
Science Foundation Grants EPS 1135482
and EPS 1208732
2 of 37
Objectives
• Identify and describe important entities and
relationships to model data
• Develop data models to represent, organize,
and store data
• Design and use relational databases to
organize, store, and manipulate data
3 of 37
Present and discuss your
preliminary designs
4 of 37
Naming Database Objects
• Names should be:
– Unique
– Meaningful to the user
– Short
– Free of spaces and reserved characters
• Entity and Attribute names = nouns
• Relationship names = verbs
Example: Many Observations are made at a Site.
5 of 37
More on Attributes
• Attribute values should be atomic
– Present a single fact
• Atomicity allows for:
– Simpler programming
– Greater reusability of data
– Easier implementation of changes
6 of 37
Atomic Attribute Example
Instead of 1 overloaded attribute:
VariableName = “Dissolved Oxygen, mg/L, surface water”
You might use three:
VariableName = “Dissolved Oxygen”
Units = “mg/L”
SampleMedium = “surface water”
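As a rough illustration of the split above, the following Python sketch breaks the overloaded value into three atomic attributes. The `Variable` class and `split_overloaded` function are hypothetical names introduced here, and the parsing assumes exactly three comma-separated parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical record holding one atomic fact per attribute."""
    variable_name: str
    units: str
    sample_medium: str


def split_overloaded(value: str) -> Variable:
    """Split a comma-delimited, overloaded value into three atomic attributes.

    Assumes exactly three comma-separated parts; real data would need
    more careful parsing.
    """
    name, units, medium = (part.strip() for part in value.split(","))
    return Variable(name, units, medium)


variable = split_overloaded("Dissolved Oxygen, mg/L, surface water")
print(variable.units)  # each fact can now be read or changed on its own
```

With atomic attributes, a query such as "all variables measured in mg/L" becomes a simple equality test instead of a string search.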
7 of 37
Common Attribute Atomicity
Violations
• Simple aggregation: Address = “8200 Old Main
Hill, Logan, UT, 84322”
• Complex codes: VariableCode = “DO_mgL_Avg”
• Text fields: free-form text. Overreliance on text
fields may mean that the model does not meet the
data requirements.
• Mixed domains: the value of an attribute can have
different meanings under different conditions.
8 of 37
Primary Keys
• Attribute or set of attributes that uniquely
identify a specific instance of an entity (row in
the table)
• Primary keys must:
– Have a non-null value for each instance of an
entity
– Have a unique value for each instance of an entity
– Have values that do not change or become null
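A minimal sketch of the uniqueness requirement, assuming SQLite (through Python's built-in sqlite3 module) as the RDBMS. The `Sites` table name and the `try_insert` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sites ("
    " SiteID INTEGER PRIMARY KEY,"
    " SiteName TEXT NOT NULL)"
)
conn.execute("INSERT INTO Sites (SiteID, SiteName) VALUES (1, 'Logan River')")


def try_insert(site_id, site_name):
    """Return True if the row is accepted, False if the key is rejected."""
    try:
        conn.execute(
            "INSERT INTO Sites (SiteID, SiteName) VALUES (?, ?)",
            (site_id, site_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(1, "Spring Creek")  # SiteID 1 is already used
new_key_accepted = try_insert(2, "Spring Creek")    # SiteID 2 is unused
site_count = conn.execute("SELECT COUNT(*) FROM Sites").fetchone()[0]
print(duplicate_accepted, new_key_accepted, site_count)
```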
9 of 37
Normalization
• Organizing the fields and tables in a relational
database to minimize redundancy and
dependency
– Dividing large tables into smaller tables (with
relationships)
• Isolate data so that additions, deletions, and
modifications of a field or record can be made in
one place
• Reduce the need for restructuring the database
as new types of data are introduced
10 of 37
Unnormalized Data Example
SiteID  SiteName      VariableID  VariableName  DateTime  Value
1       Logan River   1           Temperature   1/1/2012  5
1       Logan River   1           Temperature   1/2/2012  5
1       Logan River   2           pH            1/1/2012  8
1       Logan River   2           pH            1/2/2012  8
2       Spring Creek  1           Temperature   1/1/2012  7
2       Spring Creek  1           Temperature   1/2/2012  7
2       Spring Creek  2           pH            1/1/2012  7.5
2       Spring Creek  2           pH            1/2/2012  7.5
11 of 37
Issues with Unnormalized Data
SiteID  SiteName     VariableID  VariableName  DateTime  Value
1       Logan River  1           Temperature   1/1/2012  5
1       Logan River  1           Temperature   1/2/2012  5
1       Logan River  2           pH            1/1/2012  8
1       Logan River  2           pH            1/2/2012  8
INSERT: The fact that a site or variable exists cannot be asserted
until a measurement has been made.
DELETE: If a row is deleted, information may be lost about not
only the measurement, but also the variable and the site.
UPDATE: If a SiteName or VariableName changes, multiple
records have to be updated with the new information
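The UPDATE and DELETE issues can be reproduced with a small sketch, assuming SQLite via Python's sqlite3 module. The table name `Flat` and the new site name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flat (SiteID INTEGER, SiteName TEXT, VariableID INTEGER,"
    " VariableName TEXT, DateTime TEXT, Value REAL)"
)
rows = [
    (1, "Logan River", 1, "Temperature", "1/1/2012", 5),
    (1, "Logan River", 1, "Temperature", "1/2/2012", 5),
    (1, "Logan River", 2, "pH", "1/1/2012", 8),
    (1, "Logan River", 2, "pH", "1/2/2012", 8),
]
conn.executemany("INSERT INTO Flat VALUES (?, ?, ?, ?, ?, ?)", rows)

# UPDATE issue: renaming one site touches every measurement row for it.
cursor = conn.execute(
    "UPDATE Flat SET SiteName = 'Logan River (renamed)' WHERE SiteID = 1"
)
rows_updated = cursor.rowcount

# DELETE issue: removing the pH measurements also removes the only
# record that a variable called pH exists.
conn.execute("DELETE FROM Flat WHERE VariableID = 2")
ph_rows_left = conn.execute(
    "SELECT COUNT(*) FROM Flat WHERE VariableName = 'pH'"
).fetchone()[0]
print(rows_updated, ph_rows_left)
```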
12 of 37
Normalization Example
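SiteID  SiteName
1       Logan River
2       Spring Creek

SiteID  VariableID  DateTime  Value
1       1           1/1/2012  5
1       1           1/2/2012  5
1       2           1/1/2012  8
1       2           1/2/2012  8
2       1           1/1/2012  7
2       1           1/2/2012  7
2       2           1/1/2012  7.5
2       2           1/2/2012  7.5

VariableID  VariableName
1           Temperature
2           pH

Relationships: one (1) site row relates to many (*) measurement rows
through SiteID; one (1) variable row relates to many (*) measurement
rows through VariableID.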
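The decomposition shown in this example can be sketched in plain Python. The variable names (`flat_rows`, `sites`, `variables`, `data_values`) are assumptions; the data are the eight rows of the unnormalized example.

```python
# Each tuple: SiteID, SiteName, VariableID, VariableName, DateTime, Value
flat_rows = [
    (1, "Logan River", 1, "Temperature", "1/1/2012", 5),
    (1, "Logan River", 1, "Temperature", "1/2/2012", 5),
    (1, "Logan River", 2, "pH", "1/1/2012", 8),
    (1, "Logan River", 2, "pH", "1/2/2012", 8),
    (2, "Spring Creek", 1, "Temperature", "1/1/2012", 7),
    (2, "Spring Creek", 1, "Temperature", "1/2/2012", 7),
    (2, "Spring Creek", 2, "pH", "1/1/2012", 7.5),
    (2, "Spring Creek", 2, "pH", "1/2/2012", 7.5),
]

sites = {}        # SiteID -> SiteName, stored once per site
variables = {}    # VariableID -> VariableName, stored once per variable
data_values = []  # measurements keep only the keys plus their own facts

for site_id, site_name, variable_id, variable_name, date_time, value in flat_rows:
    sites[site_id] = site_name
    variables[variable_id] = variable_name
    data_values.append((site_id, variable_id, date_time, value))

print(sites)
print(variables)
print(len(data_values))
```

Each site name and variable name is now stored exactly once, so a rename is a single change.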
13 of 37
Normalization Tradeoffs
• Pros:
– Eliminates redundant data
– Saves space and can improve storage efficiency
– Inserts and updates are done in one place
– Can improve efficiency
• Cons:
– May complicate the code of common queries
– Abstracts tables using keys – can be harder for a
human to “see” the data
14 of 37
Data Integrity Rules
• Entity Integrity
– Primary key must exist, be unique, and not null
SiteID  SiteName
1       Logan River
2       Spring Creek

VariableID  VariableName
1           Temperature
2           pH

ValueID  SiteID  VariableID  DateTime  Value
101      1       1           1/1/2012  5
102      1       1           1/2/2012  5
103      1       2           1/1/2012  8
104      1       2           1/2/2012  8
105      2       1           1/1/2012  7
106      2       1           1/2/2012  7
107      2       2           1/1/2012  7.5
108      2       2           1/2/2012  7.5
15 of 37
Data Integrity Rules
• Referential Integrity
– Every foreign key value must match a primary key
value in an associated table
– Ensures that we can navigate relationships
ValueID  SiteID  VariableID  DateTime  Value
101      1       1           1/1/2012  5
102      1       1           1/2/2012  5
103      1       2           1/1/2012  8
104      1       2           1/2/2012  8
105      2       1           1/1/2012  7
106      2       1           1/2/2012  7
107      2       2           1/1/2012  7.5
108      2       2           1/2/2012  7.5

SiteID  SiteName
1       Logan River
2       Spring Creek

VariableID  VariableName
1           Temperature
2           pH
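A minimal sketch of referential integrity, assuming SQLite via Python's sqlite3 module. The table names `Sites` and `Variables` are assumptions, and SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`; other RDBMS products enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT NOT NULL);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableName TEXT NOT NULL);
CREATE TABLE DataValues (
    ValueID    INTEGER PRIMARY KEY,
    SiteID     INTEGER NOT NULL REFERENCES Sites (SiteID),
    VariableID INTEGER NOT NULL REFERENCES Variables (VariableID),
    DateTime   TEXT NOT NULL,
    Value      REAL
);
INSERT INTO Sites VALUES (1, 'Logan River'), (2, 'Spring Creek');
INSERT INTO Variables VALUES (1, 'Temperature'), (2, 'pH');
INSERT INTO DataValues VALUES (101, 1, 1, '1/1/2012', 5);
""")

# SiteID 3 has no matching row in Sites, so this row would be an orphan.
try:
    conn.execute("INSERT INTO DataValues VALUES (102, 3, 1, '1/2/2012', 5)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

value_count = conn.execute("SELECT COUNT(*) FROM DataValues").fetchone()[0]
print(orphan_accepted, value_count)
```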
16 of 37
Data Integrity Rules
• Insert and Delete Rules
– What happens to a parent entity when child entities are
deleted?
– What happens to child entities when a parent is deleted?
ValueID  SiteID  VariableID  DateTime  Value
101      1       1           1/1/2012  5
102      1       1           1/2/2012  5
103      1       2           1/1/2012  8
104      1       2           1/2/2012  8
105      2       1           1/1/2012  7
106      2       1           1/2/2012  7
107      2       2           1/1/2012  7.5
108      2       2           1/2/2012  7.5

SiteID  SiteName
1       Logan River
2       Spring Creek

VariableID  VariableName
1           Temperature
2           pH
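One way to answer the second question is to state a delete rule on the foreign key. The sketch below, assuming SQLite via Python's sqlite3 module, compares two common rules: `RESTRICT` (refuse to delete a parent that still has children) and `CASCADE` (delete the children along with the parent). The helper names `build` and `delete_site` are hypothetical.

```python
import sqlite3


def build(on_delete):
    """Create a parent/child pair of tables with the given delete rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(f"""
    CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE DataValues (
        ValueID INTEGER PRIMARY KEY,
        SiteID  INTEGER NOT NULL
                REFERENCES Sites (SiteID) ON DELETE {on_delete},
        Value   REAL
    );
    INSERT INTO Sites VALUES (1, 'Logan River');
    INSERT INTO DataValues VALUES (101, 1, 5), (102, 1, 5);
    """)
    return conn


def delete_site(conn):
    """Try to delete the parent; report success and remaining child rows."""
    try:
        conn.execute("DELETE FROM Sites WHERE SiteID = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    remaining = conn.execute("SELECT COUNT(*) FROM DataValues").fetchone()[0]
    return deleted, remaining


restrict_result = delete_site(build("RESTRICT"))
cascade_result = delete_site(build("CASCADE"))
print(restrict_result, cascade_result)
```

Deleting child rows has no effect on the parent under either rule, which answers the first question for this design.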
17 of 37
Data Integrity Rules
• Value Domains
– Valid set of values for an attribute
– Controlled vocabulary, data type, length, range,
constraints, default value
ValueID  SiteID  VariableID  DateTime  Value
101      1       1           1/1/2012  5.5
102      1       1           1/2/2012  5.678
103      1       2           1/1/2012  8.0
104      1       2           1/2/2012  8.9

VariableID  VariableName
1           Temperature
2           pH

Domains labeled in the figure: Integer Fields, Controlled Domain,
Date Field, Double
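Value domains can be approximated with column types, CHECK constraints, and defaults. This is a sketch assuming SQLite via Python's sqlite3 module; the allowed VariableID list, the numeric range, and the `QualityFlag` column are hypothetical choices made only to show the mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataValues (
    ValueID     INTEGER PRIMARY KEY,
    -- stand-in for a controlled vocabulary (hypothetical list)
    VariableID  INTEGER NOT NULL CHECK (VariableID IN (1, 2)),
    DateTime    TEXT NOT NULL,
    -- hypothetical valid range
    Value       REAL NOT NULL CHECK (Value BETWEEN -50 AND 50),
    -- hypothetical column with a default value
    QualityFlag TEXT NOT NULL DEFAULT 'raw'
);
""")


def accepts(variable_id, date_time, value):
    """Return True if the row satisfies every domain rule."""
    try:
        conn.execute(
            "INSERT INTO DataValues (VariableID, DateTime, Value)"
            " VALUES (?, ?, ?)",
            (variable_id, date_time, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = accepts(1, "1/1/2012", 5.5)
bad_vocabulary = accepts(3, "1/1/2012", 5.5)  # VariableID not in the list
out_of_range = accepts(2, "1/2/2012", 999.0)  # Value outside the range
default_flag = conn.execute("SELECT QualityFlag FROM DataValues").fetchone()[0]
print(valid, bad_vocabulary, out_of_range, default_flag)
```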
18 of 37
Specialization
• Designating entity subgroups within a higher
level entity
• Entity subgroups have attributes or
relationships that do not apply to the higher
level entity
• Attributes are inherited
– A lower level entity inherits all of the attributes
and relationship participation of the higher level
entity to which it is linked
19 of 37
Specialization Example
• A car is a vehicle
• A truck is a vehicle
20 of 37
Generalization
• Combine a number of entities that share
features into a higher level entity
• Specialization and generalization are
inverses of each other
[Figure: arrows labeled Specialization and Generalization]
21 of 37
Constraints on
Specialization/Generalization
• Constraints on which entities can be members of
a given lower-level entity set
– Condition-defined – “all vehicles with a towing
capacity of more than 10,000 lbs are trucks”
• Constraints on whether entities can belong to
more than one lower-level entity set
– Disjoint – an entity can belong to only one
– Overlapping – an entity can belong to more than one
• Completeness constraint – must every higher
level entity belong to at least one lower level
entity set?
22 of 37
Mapping Specialization to Tables
• Option 1: Put everything in one table
• There will be NULL values where attributes
don’t apply
23 of 37
Mapping Specialization to Tables
• Option 2: Form tables
for the higher level
entity and the lower
level entities
• Each lower level entity
includes the primary
key of the higher level
entity set
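Option 2 might look like the following for the vehicle example, assuming SQLite via Python's sqlite3 module. The table names and the `Make`, `PassengerCapacity`, and `TowingCapacityLbs` columns are hypothetical attributes chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
-- Higher level entity: attributes shared by every vehicle
CREATE TABLE Vehicles (
    VehicleID INTEGER PRIMARY KEY,
    Make      TEXT NOT NULL
);
-- Lower level entities reuse the higher level primary key
CREATE TABLE Cars (
    VehicleID         INTEGER PRIMARY KEY REFERENCES Vehicles (VehicleID),
    PassengerCapacity INTEGER
);
CREATE TABLE Trucks (
    VehicleID         INTEGER PRIMARY KEY REFERENCES Vehicles (VehicleID),
    TowingCapacityLbs INTEGER
);
INSERT INTO Vehicles VALUES (1, 'Make A'), (2, 'Make B');
INSERT INTO Cars VALUES (1, 5);
INSERT INTO Trucks VALUES (2, 12000);
""")

# A truck's full set of attributes is recovered by joining on the shared key.
trucks = conn.execute(
    "SELECT v.VehicleID, v.Make, t.TowingCapacityLbs"
    " FROM Vehicles v JOIN Trucks t ON t.VehicleID = v.VehicleID"
).fetchall()
print(trucks)
```

Compared with Option 1, no NULL placeholders are needed, at the cost of a join whenever inherited and specific attributes are read together.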
24 of 37
Mapping Specialization to Tables
• Option 3: Model only the lower level entities
• Repeats attributes
25 of 37
Steps in Data Model Design
1. Identify entities
2. Identify relationships among entities
3. Determine the cardinality and participation
of relationships
4. Designate keys / identifiers for entities
5. List attributes of entities
6. Identify constraints and business rules
7. Map 1-6 to a physical implementation
26 of 37
Physical Data Model
• “Physical” means a specific implementation
of the data model
– Choice of hardware and operating system
– Choice of relational database management system
– Implementation of tables, relationships, constraints,
triggers, indices, data types
– Database access and security
– Performance
– Storage
27 of 37
Relational Database Management
Systems (RDBMS)
• File vs. server based
• Free vs. commercial
• Different data types
• Potentially different syntax for SQL queries
• Security models and concurrent users
28 of 37
Reduction of an ER Diagram to Tables
• Converting an ER diagram to table format is
the basis for deriving a relational database
– Primary keys allow entities to be expressed as
tables that contain data
– A database is a collection of tables
– Tables are assigned the same name as the entity
– Each table has columns that correspond to
attributes – each column has a unique name
– Each column must have a single data type
29 of 37
Advanced Database Objects
• Views
• Stored procedures
• Triggers
• Constraints
• Implementation of these objects may depend
on your choice of RDBMS software
30 of 37
Database Views
• A View is equivalent to a table, but is defined
by a SQL query
• Used to present a set of desired information,
independent of the underlying database
structure
• Can be used to hide complexities of the
underlying data model from the user
– One way to address the Cons of normalization
31 of 37
Stored Procedures
• A set of structured query language (SQL)
statements that are stored and executed on
the server
• Useful for repetitive tasks
• Encapsulate functionality and isolate users
from data tables
• Can provide a security layer – software
applications have no access to the database
directly, but can execute stored procedures
32 of 37
Triggers
• Special kind of stored procedure
• Automatically executes on a table or view
when an event occurs in the database
• Events include: CREATE, ALTER, INSERT,
UPDATE, DELETE
• Mostly used to maintain the integrity of
information in the database
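A minimal trigger sketch, assuming SQLite via Python's sqlite3 module. SQLite triggers fire only on INSERT, UPDATE, and DELETE; triggers on CREATE or ALTER depend on the RDBMS. The `SiteNameLog` table and `LogSiteRename` trigger are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteName TEXT NOT NULL);
CREATE TABLE SiteNameLog (SiteID INTEGER, OldName TEXT, NewName TEXT);

-- Runs automatically whenever a site's name is updated
CREATE TRIGGER LogSiteRename
AFTER UPDATE OF SiteName ON Sites
FOR EACH ROW
BEGIN
    INSERT INTO SiteNameLog VALUES (OLD.SiteID, OLD.SiteName, NEW.SiteName);
END;

INSERT INTO Sites VALUES (1, 'Logan River');
""")

# The application only issues the UPDATE; the log row is written by the trigger.
conn.execute("UPDATE Sites SET SiteName = 'Logan River (renamed)' WHERE SiteID = 1")
log = conn.execute("SELECT SiteID, OldName, NewName FROM SiteNameLog").fetchall()
print(log)
```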
33 of 37
Constraints
• Common way to enforce data integrity
• Examples:
– Not NULL – value in a column must not be NULL
– Unique – value(s) in specified column(s) must be
unique for each row in a table
– Primary Key – value(s) in the specified column(s) must
be unique for each row in the table and not be NULL
– Foreign Key – value(s) in the specified column(s)
must reference an existing record in another table via
its primary key
– Check – an expression that validates data and must
not be FALSE
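The Not NULL and Unique constraints can be sketched as follows, assuming SQLite via Python's sqlite3 module. Treating the combination of SiteID, VariableID, and DateTime as unique is an assumption made for this example, as is the `rejected` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataValues (
    ValueID    INTEGER PRIMARY KEY,
    SiteID     INTEGER NOT NULL,
    VariableID INTEGER NOT NULL,
    DateTime   TEXT NOT NULL,
    Value      REAL NOT NULL,
    -- assumed rule: one value per site, variable, and time
    UNIQUE (SiteID, VariableID, DateTime)
);
INSERT INTO DataValues VALUES (101, 1, 1, '1/1/2012', 5);
""")


def rejected(row):
    """Return True if the RDBMS refuses the row."""
    try:
        conn.execute("INSERT INTO DataValues VALUES (?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_rejected = rejected((102, 1, 1, "1/1/2012", 6))  # violates UNIQUE
null_rejected = rejected((103, 1, 1, "1/3/2012", None))    # violates NOT NULL
print(duplicate_rejected, null_rejected)
```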
34 of 37
Data Types
• Each attribute of an entity (column in a
database table) must have a single data type
• Data types are enforced by RDBMS software
Table: DataValues
Attribute   Data Type  Sample Data
ValueID     Integer    1
SiteID      Integer    5
VariableID  Integer    5
DateTime    Date/Time  8/15/2013 4:30 PM
DataValue   Double     4.567
35 of 37
Data Types
• Data types can be specific to RDBMS software
RDBMS: MS SQL Server
  Integer:        TINYINT, SMALLINT, INT, BIGINT
  Floating Point: FLOAT, REAL
  Decimal:        NUMERIC, DECIMAL, SMALLMONEY, MONEY
  String:         CHAR, VARCHAR, TEXT, NCHAR, NVARCHAR, NTEXT
  Date/Time:      DATE, DATETIMEOFFSET, DATETIME2, SMALLDATETIME,
                  DATETIME, TIME

RDBMS: MySQL
  Integer:        TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit),
                  INT (32-bit), BIGINT (64-bit)
  Floating Point: FLOAT (32-bit), DOUBLE (aka REAL) (64-bit)
  Decimal:        DECIMAL
  String:         CHAR, BINARY, VARCHAR, VARBINARY, TEXT, TINYTEXT,
                  MEDIUMTEXT, LONGTEXT
  Date/Time:      DATETIME, DATE, TIMESTAMP, YEAR

RDBMS: PostgreSQL
  Integer:        SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit)
  Floating Point: REAL (32-bit), DOUBLE PRECISION (64-bit)
  Decimal:        DECIMAL, NUMERIC
  String:         CHAR, VARCHAR, TEXT
  Date/Time:      DATE, TIME (with/without TIMEZONE), TIMESTAMP
                  (with/without TIMEZONE), INTERVAL
Quick summary from: http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
36 of 37
Summary of 3 Levels of Data Model Design
Feature               Conceptual  Logical  Physical
Entity Names          X           X
Entity Relationships  X           X
Attributes                        X
Primary Keys                      X        X
Foreign Keys                      X        X
Table Names                                X
Column Names                               X
Column Data Types                          X
Views                                      X
Stored Procedures                          X
Triggers                                   X
Constraints                                X
37 of 37
Summary
• Simple rules for naming objects and specifying
domains can help protect the integrity of data
• Normalization can help reduce redundancy, increase
storage efficiency, and protect data integrity – but
there are tradeoffs
• Data integrity rules include relationships and domains
and protect the integrity of data in the database
• Specialization and generalization require special
consideration in implementation
• A physical database implementation requires choices
about hardware, software, security, formats and
storage, and other factors