8. Integrity
This session looks at the many and various
aspects of ‘Integrity’.
Integrity processes are included to ensure that the
data in the database, and the information derived from
that data, is clear, complete and accurate.
We will also have another quick look at ‘Business Rules’.
PCLec08 / 1
But, before we do ...
Here is an answer to the puzzle which has been keeping you
awake for the past 3 nights.
Move                   Left at source   At target   Time   Progressive time
C and D move across    A,B              C,D         2      2
D returns              A,B,D            C           1      3
A and B move across    D                A,B,C       10     13
C returns              C,D              A,B         2      15
C and D move across    -                A,B,C,D     2      17
Integrity
Components of a Data Model
Structure
Operators
Integrity
Dangers to Data
A DBMS must protect data from several dangers.
- Accidents (programming errors and miskeying): these are
integrity issues.
- Malicious use: a security issue.
- Hardware and software failures: these are concurrency and
restart issues.
Definitions of Integrity
Data integrity requires the database to be an
accurate reflection of the real world.
Data should be valid and complete.
Integrity issues may previously have been handled outside the
database, in the application code and possibly in multiple
programs.
Codd (1985) states that integrity constraints
specific to a particular relational database must be
definable in the relational data sublanguage (e.g. SQL) and
stored in the database dictionary (catalog), not in the
application programs.
A DBMS Enforced Integrity
Employee Table
Empno, Emp_name, Age, Salary
EMPNO NUMBER(6,0) NOT NULL
Attempt to add an employee without an EMPNO value:
INSERT INTO EMPLOYEE
(EMP_NAME, AGE, SALARY)
VALUES ('Smith', 22, 35000)
This insert is rejected by the DBMS,
but what would happen if the user entered (35000, 'Smith', 22)?
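The NOT NULL rejection can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than Oracle, so the column types are SQLite's (an assumption made only to keep the example runnable), and the helper try_insert and the EMPNO value 1001 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " empno NUMERIC NOT NULL,"
    " emp_name TEXT, age INTEGER, salary NUMERIC)"
)

def try_insert(sql, params):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# No EMPNO supplied: the NOT NULL constraint refuses the row.
no_empno = try_insert(
    "INSERT INTO employee (emp_name, age, salary) VALUES (?, ?, ?)",
    ("Smith", 22, 35000),
)
# Same row with an EMPNO: accepted.
with_empno = try_insert(
    "INSERT INTO employee (empno, emp_name, age, salary) VALUES (?, ?, ?, ?)",
    (1001, "Smith", 22, 35000),
)
print(no_empno)
print(with_empno)
```

The application code contains no test for a missing EMPNO; the rule lives in the table definition and applies to every program that inserts rows.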
An Applications Forced Integrity
A further constraint:
IF AGE > 16 AND AGE < 99
THEN O-K
ELSE
REJECT 'Age Invalid'
This represents a segment of program code.
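A minimal sketch of such an application-side check, written in Python purely for illustration (the function names are hypothetical). The two bounds have to be joined with AND; if they are joined with OR, every age satisfies at least one of them and nothing is ever rejected.

```python
def check_age(age):
    """Application-side rule: accept only ages strictly between 16 and 99."""
    if age > 16 and age < 99:
        return "O-K"
    return "Age Invalid"

def or_version(age):
    """The same bounds joined with OR: true for every age, so it rejects nothing."""
    return age > 16 or age < 99

for candidate in (12, 22, 120):
    print(candidate, check_age(candidate), or_version(candidate))
```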
Integrity Enforcement
Integrity enforcement is usually split between the DBMS and
the application programs.
Using application programs for integrity assertions has
disadvantages.
Programming is more complex
Integrity constraints may be repeated
Change management is difficult
Constraints may contradict
Ad hoc operations may avoid the constraints
Integrity as a Role of the DBMS
Integrity rules must be considered at design time
Transactions must be monitored for violations
and appropriate actions taken
Rules should be few, without overlap and
should not impact performance too much
(this is not an invitation to exclude rules)
Classifications of Integrities
There are various possible classifications.
Date (Vol. 2):
- Domain Integrity (attribute based)
- Relational Integrity (table based)
Codd’s CURED or CRUDE
Type E - Entity Integrity
Type R - Referential Integrity
Type D - Domain Integrity: a user-defined datatype
Type C - Column Integrity: linked to Domain Integrity
Type U - User-Defined Integrity
Data Integrity
Some terms you will encounter:
Entity Integrity
Referential Integrity
Functional Dependency (constraints between determinants
and attributes. For each value of the determinant there is
only one value for each of the attributes it determines)
Multivalued Dependency
Join Dependency
Domain Constraints
Cardinality Constraint
User Defined Constraints
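The functional dependency definition in the list above (one value of each determined attribute per value of the determinant) can be tested against a set of rows. This is a sketch only: the function holds and the sample rows are hypothetical, and rows are represented as plain Python dictionaries.

```python
def holds(rows, determinant, dependents):
    """Return True if each determinant value maps to exactly one
    combination of dependent values across all rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependents)
        # setdefault stores the first value seen for this key and
        # returns the stored one on every later occurrence.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"empno": "e1", "ename": "red", "dept": "d1"},
    {"empno": "e2", "ename": "blue", "dept": None},
    {"empno": "e3", "ename": "brown", "dept": "d2"},
]
# A second, conflicting name for e1 breaks empno -> ename.
conflicting = rows + [{"empno": "e1", "ename": "green", "dept": "d1"}]

print(holds(rows, ["empno"], ["ename"]))
print(holds(conflicting, ["empno"], ["ename"]))
```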
Data Integrity
General Principle: Data compliance with a set of rules
Rules Location: Best embodied in the DBMS
If the rules are contained in an application, there is a danger of
saturating the network and degrading performance.
This is particularly so in client/server computing - but are
ALL the rules applicable to ALL users?
CONSTRAINTS: A declarative approach in which integrity
constraints are ‘declared’ as part of a table specification.
The ANSI SQL-89 and SQL-92 standards include
specifications for integrity constraint syntax and behaviour
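As a sketch of the declarative approach, the age rule can be declared once as part of the table specification. The example assumes SQLite (through Python's sqlite3 module) in place of Oracle, and the table layout is a hypothetical cut-down EMPLOYEE table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " empno INTEGER NOT NULL,"
    " age INTEGER CHECK (age > 16 AND age < 99))"
)
conn.execute("INSERT INTO employee VALUES (1, 22)")

# The rule is declared in the table, so an out-of-range age is
# refused no matter which program issues the INSERT.
try:
    conn.execute("INSERT INTO employee VALUES (2, 12)")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)
```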
Domain Integrity
A domain is a conceptual pool of values from which
one or more attributes draw their actual values.
Domain age range 0-127
Attribute employee_age 16-65
Two values can only be compared if they come from
the same domain.
Primary Key Integrity
(based on Oracle)
A primary key has these properties
– unique value (no duplicates permitted in table)
– not null
– may consist of multiple columns if required (a composite key)
– may be referenced by foreign key(s)
– may be limited to a small range of values (the check option)
– may be limited to a large range of values (the ‘exists’ option)
Foreign Key Properties
May be unique (1:1 relationship)
May be multiple keys
May be limited to a range of values (Domain - as for primary keys)
May be null (as required)
May be not null (as required)
Will reference a Primary Key (or keys)
May be subject to cascade update, delete, insert
A Domain Definition
DOMAIN GENDER
– Data Type: Character
– Length: 6 bytes
– Allowable Values: Male, Female, Null
– Storage Format: Uppercase
– Operations Allowed:
– Inherited Operators: String, Unstring, =
– Input Editing: Nil
– Extra Functions: Is_Male, Is_Female, What_Gender
Timing Constraints
When should an integrity constraint be checked?
TC - Test the constraint no later than the end of the
current relational request.
TT - Test at the end of the transaction (terminal test).
START TRANSACTION
UPDATE EMPLOYEE
SET SALARY = SALARY*1.1
    (TC constraints are tested here)
DELETE FROM EMPLOYEE WHERE SALARY > 1000
END TRANSACTION
    (TT constraints are tested here)
In Oracle, the timing of the check is determined by declaring a
deferrable constraint ‘Initially Immediate’ or ‘Initially Deferred’.
There is also a ‘set constraints ... immediate’ or ‘... deferred’ option.
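The difference between TC and TT timing can be seen in a small sketch. It assumes SQLite (via Python's sqlite3 module) rather than Oracle: an ordinary foreign key there is checked at the end of each statement, while one declared DEFERRABLE INITIALLY DEFERRED is checked at COMMIT. The tables emp_tc and emp_tt and the key values are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements below delimit the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (deptno TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE emp_tc (empno TEXT PRIMARY KEY,"
    " deptno TEXT REFERENCES dept (deptno))"
)
conn.execute(
    "CREATE TABLE emp_tt (empno TEXT PRIMARY KEY,"
    " deptno TEXT REFERENCES dept (deptno) DEFERRABLE INITIALLY DEFERRED)"
)

def child_before_parent(table):
    """Insert an employee before its department inside one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"INSERT INTO {table} VALUES ('e1', 'd9')")
        conn.execute("INSERT INTO dept VALUES ('d9')")
        conn.execute("COMMIT")
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "rejected"

tc_result = child_before_parent("emp_tc")  # checked at end of the INSERT
tt_result = child_before_parent("emp_tt")  # checked at COMMIT
print(tc_result, tt_result)
```

With the immediate check the first INSERT fails because the department does not exist yet; with the deferred check the same sequence is valid by the time the transaction ends.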
A Few Examples
A transaction is a unit of work
1. Single Table - The transaction affects 1 row only and does not
alter any domain setting.
2. Single Table - The transaction affects multiple rows and will
affect domain settings.
When should the domain integrity breach be reported?
At the first breach, the second - or at the end of the process?
When should the transaction be aborted?
Should a log be held of these occurrences/rows?
A Few Examples
3. Single Table - bulk loading.
Should the load process be stopped at the
detection of the first breach ?
Or should the load row be ‘diverted’ to a log file ?
Should there be a number count of failures ?
Should there be a limit over which the process
should be stopped ?
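One possible answer to the bulk loading questions is sketched below: rejected rows are diverted to a log, failures are counted, and the load stops once a limit is passed. This is an illustration only, using Python's sqlite3 module; the function bulk_load, the max_failures parameter and the sample rows are all hypothetical.

```python
import sqlite3

def bulk_load(conn, rows, max_failures):
    """Load rows one at a time, diverting rejected rows to a reject log.
    Stops early once the failure count exceeds max_failures."""
    rejects = []
    loaded = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejects.append((row, str(exc)))
            if len(rejects) > max_failures:
                break
    return loaded, rejects

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER NOT NULL, emp_name TEXT,"
    " age INTEGER CHECK (age > 16 AND age < 99))"
)
rows = [(1, "Smith", 22), (None, "Jones", 30), (3, "Brown", 12), (4, "Green", 40)]
loaded, rejects = bulk_load(conn, rows, max_failures=5)
print(loaded, len(rejects))
```

Setting max_failures to 0 gives the "stop at the first breach" policy; a large value gives the "divert and carry on" policy.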
Slightly More Complex
[Diagram: tables A, B and C, linked by two 1:M relationships,
with a single Transaction touching all three]
It is possible that multiple rows in
table A, table B and table C will be
affected by the transaction
An Algorithm for Integrity Checks
Determine constraints that apply to request.
Inspect timing types.
Before the end of the relational request run types TC.
Append types TT to the end of the transaction.
Before the end of transaction run types TT.
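The steps above can be sketched as a toy transaction manager. This is not how any particular DBMS implements the checks; the class Transaction, the constraint names and the dictionary used as database state are all hypothetical.

```python
class Transaction:
    """Toy transaction that applies the TC/TT checking order."""

    def __init__(self, constraints):
        # constraints: list of (name, timing, applies_to, check) tuples,
        # where timing is "TC" or "TT" and check(state) returns True/False.
        self.constraints = constraints
        self.pending_tt = []
        self.log = []

    def request(self, operation, state):
        # Determine the constraints that apply to this request.
        applicable = [c for c in self.constraints if operation in c[2]]
        # Inspect timing types: run TC now, append TT for later.
        for c in applicable:
            name, timing, _, check = c
            if timing == "TC":
                self._run(name, check, state)
            elif c not in self.pending_tt:
                self.pending_tt.append(c)

    def end(self, state):
        # Before the end of the transaction, run the queued TT types.
        for name, _, _, check in self.pending_tt:
            self._run(name, check, state)

    def _run(self, name, check, state):
        self.log.append(name)
        if not check(state):
            raise ValueError(f"integrity violation: {name}")

constraints = [
    ("salary_positive", "TC", {"UPDATE"},
     lambda s: all(x > 0 for x in s["salaries"])),
    ("payroll_within_budget", "TT", {"UPDATE", "DELETE"},
     lambda s: sum(s["salaries"]) <= s["budget"]),
]
txn = Transaction(constraints)
state = {"salaries": [600, 700], "budget": 1000}
txn.request("UPDATE", state)   # TC runs now; the budget test is deferred
state["salaries"] = [600]      # a later DELETE brings payroll under budget
txn.request("DELETE", state)
txn.end(state)                 # TT runs once, at the end of the transaction
print(txn.log)
```

The budget rule is temporarily broken after the UPDATE, which is exactly the situation TT timing is meant to tolerate.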
Foreign Key Rules
For each foreign key three questions need to be answered:
• Can the foreign key accept nulls?
• What should happen on an attempt to delete the
target of a foreign key reference?
• What should happen on an attempt to update the
target of a foreign key reference (its primary key)?
Employee
Empno   Ename   Worksfordept
e1      red     d1
e2      blue    (null)
e3      brown   d2
Dept
Dept    Dname
d1      Pay
d2      Tax
d3      Art
Foreign Key Rules
When should foreign key rules be checked?
Dept (Deptno, Dname, Budget)
Emp (Empno, Ename, Salary, WorksforDeptno)
    WorksforDeptno references Dept: delete cascades, update cascades
Depend (Empno, Dependname, Date-of-birth)
    Empno references Emp: delete cascades, update cascades
Foreign Keys and Referencing Action
e.g. SUPPLIER (Sno, Sname)
PART (Pno, Pname)
SUPPLIER-PART (Sno, Pno, Qty)
CREATE TABLE SUPPLIER etc.
    Primary Key (Sno)
CREATE TABLE PART etc.
    Primary Key (Pno)
CREATE TABLE SUPPLIER_PART (etc.
    Primary Key (Sno, Pno)
    Foreign Key (Sno) REFERENCES SUPPLIER (Sno)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
    Foreign Key (Pno) REFERENCES PART (Pno)
        ON DELETE RESTRICT
        ON UPDATE CASCADE)
Foreign Keys and Referencing Action
The relation each Foreign Key identifies is defined. The foreign key
clause also contains other information.
DELETE: when the target record of a foreign key reference is deleted,
one of these operations is performed -
CASCADE - all matching SUPPLIER-PART records are also deleted.
RESTRICT - the delete is permitted only if there are no matching
SUPPLIER-PART records.
SET NULL - the foreign key values are all set to NULL (only if nulls
are allowed)
Foreign Keys and Referencing Action
UPDATE when the Primary Key of the target record of a foreign key is
updated.
CASCADE
RESTRICT
SET NULL
These options are similar to delete.
Note that the design decisions embodied in pseudo SQL represent
constraint information which reflects the nature or business rules of the
organisation being modelled.
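The SUPPLIER / PART referencing actions can be exercised with a runnable sketch. It assumes SQLite (via Python's sqlite3 module) instead of the pseudo SQL on the slides, so the column types and the sample rows ('Acme', 'Bolt') are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT);
CREATE TABLE part (pno TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE supplier_part (
    sno TEXT, pno TEXT, qty INTEGER,
    PRIMARY KEY (sno, pno),
    FOREIGN KEY (sno) REFERENCES supplier (sno)
        ON DELETE RESTRICT ON UPDATE CASCADE,
    FOREIGN KEY (pno) REFERENCES part (pno)
        ON DELETE RESTRICT ON UPDATE CASCADE
);
INSERT INTO supplier VALUES ('s1', 'Acme');
INSERT INTO part VALUES ('p1', 'Bolt');
INSERT INTO supplier_part VALUES ('s1', 'p1', 100);
""")

# ON UPDATE CASCADE: renumbering the supplier renumbers the referencing row.
conn.execute("UPDATE supplier SET sno = 's9' WHERE sno = 's1'")
cascaded = conn.execute("SELECT sno FROM supplier_part").fetchall()

# ON DELETE RESTRICT: the supplier cannot be deleted while it is referenced.
try:
    conn.execute("DELETE FROM supplier WHERE sno = 's9'")
    delete_outcome = "deleted"
except sqlite3.IntegrityError:
    delete_outcome = "restricted"
print(cascaded, delete_outcome)
```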
Possible Referential Integrity Processes
1. Limited Insert : If an incoming Foreign Key DOES NOT EXIST
as a referenced table Primary Key:
ABORT TRANSACTION - REPORT
2. Limited Update : If an incoming Foreign Key DOES NOT EXIST
as a referenced table Primary Key
TERMINATE PROCESS
3. Restricted Delete : If there are referencing FOREIGN KEYS
in a referencing table
TERMINATE DELETE PROCESS ON REFERENCED
TABLE
Possible Referential Integrity Processes
4. Restricted Update : If there are referencing Foreign Keys in a
referencing table
INHIBIT UPDATE OPERATION ON THE REFERENCED KEY
5. Cascade Delete : If there are Referenced Keys
INITIATE DELETION OPERATION ON REFERENCED
TABLE BY DELETING ALL REFERENCING ROWS
6. Cascade Update : Commence an UPDATE on the
REFERENCED TABLE by UPDATING the Foreign Keys on all
Referencing Rows in the Referencing Table(s)
Possible Referential Integrity Processes
7. Nullify Delete : Commence a DELETE operation on the
REFERENCED table by setting ALL the FOREIGN KEYS on
the Referencing Table(s) to NULL (watch Data
Types)
8. Nullify Update : Set all of the Foreign Keys of the
Referencing Table to NULL. This will invalidate any
referencing of the Referenced Key (which must not be
NULL)
9. Default Update : Invalidate references to Updated
Referenced Keys by setting all Referencing Table
Foreign Keys to a DEFAULT value
Possible Referential Integrity Processes
10. Default Delete : Invalidate references to the deleted Referenced Key
Value(s) by setting all Referencing Foreign Key values to a DEFAULT
value
11. Warning Delete : Permit the deletion BUT Warn the User of the
Unattached Foreign Keys which are now present in the Referencing
Table(s)
12. Warning Update : Permit the Update BUT Warn the User of
Unattached Foreign Keys which are now present in the Referencing
Table(s)
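Two of the processes listed above, Nullify Delete and Default Delete, correspond to the ON DELETE SET NULL and ON DELETE SET DEFAULT actions. The sketch below assumes SQLite (via Python's sqlite3 module); the tables emp_nullify and emp_default and the placeholder department 'd0' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (deptno TEXT PRIMARY KEY, dname TEXT);
CREATE TABLE emp_nullify (
    empno TEXT PRIMARY KEY,
    deptno TEXT REFERENCES dept (deptno) ON DELETE SET NULL
);
CREATE TABLE emp_default (
    empno TEXT PRIMARY KEY,
    deptno TEXT DEFAULT 'd0' REFERENCES dept (deptno) ON DELETE SET DEFAULT
);
INSERT INTO dept VALUES ('d0', 'Unassigned'), ('d1', 'Pay');
INSERT INTO emp_nullify VALUES ('e1', 'd1');
INSERT INTO emp_default VALUES ('e1', 'd1');
DELETE FROM dept WHERE deptno = 'd1';
""")

# After the referenced department is deleted, one foreign key is NULL
# and the other points at the default department.
nullified = conn.execute("SELECT deptno FROM emp_nullify").fetchone()[0]
defaulted = conn.execute("SELECT deptno FROM emp_default").fetchone()[0]
print(nullified, defaulted)
```

Note that the default value must itself exist as a key in the referenced table, otherwise the delete would leave an unattached foreign key and be refused.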
Some Integrity Schema Examples
Create table monash1(
city varchar2(13) not null,
studydate date not null,
noonreading number(4,1),
midnightreading number(4,1),
rainfall number,
unique (city,studydate) );
Creates a table with the candidate key of city, studydate.
There may be a number of Unique constraints.
Some Integrity Schema Examples
Create table monash1(
city varchar2(13) not null,
studydate date not null,
noonreading number(4,1),
midnightreading number(4,1),
rainfall number,
primary key (city,studydate) );
Creates a table with the Primary Key of city, studydate,
so each (city, studydate) combination can occur only once in the table.
There may be a number of Unique constraints, but only one Primary Key.
Some Integrity Schema Examples
Create table monash1(
city varchar2(13) not null,
studydate date not null,
noonreading number(4,1),
midnightreading number(4,1),
rainfall number,
constraint pk_citystudy primary key (city,studydate) );
Creates a table with the Primary Key of city, studydate
and names the constraint pk_citystudy in the Constraints table.
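The effect of the named primary key can be checked with a short sketch. It assumes SQLite (via Python's sqlite3 module), so the Oracle types varchar2, date and number are replaced by SQLite's TEXT and NUMERIC, and the weather readings are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monash1 ("
    " city TEXT NOT NULL,"
    " studydate TEXT NOT NULL,"
    " noonreading NUMERIC, midnightreading NUMERIC, rainfall NUMERIC,"
    " CONSTRAINT pk_citystudy PRIMARY KEY (city, studydate))"
)
conn.execute("INSERT INTO monash1 VALUES ('Melbourne', '1999-03-01', 24.5, 15.1, 0)")
# Same city, different date: a different key value, so it is accepted.
conn.execute("INSERT INTO monash1 VALUES ('Melbourne', '1999-03-02', 22.0, 14.0, 3)")

# Same city and same date: a duplicate key value, so it is refused.
try:
    conn.execute("INSERT INTO monash1 VALUES ('Melbourne', '1999-03-01', 30.0, 18.0, 0)")
    duplicate = "accepted"
except sqlite3.IntegrityError as exc:
    duplicate = f"rejected: {exc}"
print(duplicate)
```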
Enable, Disable
There is a feature in Oracle which permits the
Disabling and Enabling of constraints.
e.g. alter table shipping add primary key (ship_no,
container_no)
This identifies the composite primary key as ship_no +
container_no, and ensures that no two rows have the same
pair of values.
The Disable option defines the constraint but does not enforce
it.
The Enable option restores the enforcement of the constraint.
Enable, Disable
The formats of the ‘disable’ and ‘enable’ commands are:
disable { { unique (column [, column] ...) |
            primary key |
            constraint constraint } [cascade] } |
        all triggers
enable  { { unique (column [, column] ...) |
            primary key }
          [using index [initrans integer]
                       [maxtrans integer]
                       [tablespace tablespace]
                       [storage storage] ] } |
        all triggers
Triggers
• Oracle triggers are used to add further processing
to the DBMS for events which affect a database.
• In the following example a trigger is set which ensures
that changes to employee records only take place during
business hours on working days (security?)
• See if you agree ...
Triggers
Create trigger emp_permit_change
before
delete or insert or update
on emp
declare dummy integer;
begin
/* if today is a Saturday or Sunday, then return an error */
if (to_char(sysdate, 'dy') = 'sat' or
    to_char(sysdate, 'dy') = 'sun')
then raise_application_error (-20501,
    'May not change employee table during the weekend');
end if;
Triggers
• Perhaps we need this as well:
if (to_char(sysdate, 'hh24') < 8
    or to_char(sysdate, 'hh24') >= 18)
then raise_application_error (-20502,
    'May only change employee table during working hours');
end if;
end;
which raises an interesting point - what happens with flexible
time and enterprise bargaining?
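The two tests made by the trigger can be mirrored outside the database to see which moments pass. This is a sketch in Python, not Oracle PL/SQL: the function change_permitted is hypothetical and takes the time as a parameter instead of reading sysdate, so it can be tried with any date.

```python
from datetime import datetime

def change_permitted(now):
    """Mirror the trigger's two tests: reject weekends, then reject
    times outside 08:00-17:59."""
    if now.weekday() >= 5:          # Monday is 0, so 5 and 6 are Sat/Sun
        return False, "May not change employee table during the weekend"
    if now.hour < 8 or now.hour >= 18:
        return False, "May only change employee table during working hours"
    return True, ""

saturday = change_permitted(datetime(2024, 1, 6, 10, 0))
monday_morning = change_permitted(datetime(2024, 1, 8, 10, 0))
monday_evening = change_permitted(datetime(2024, 1, 8, 19, 0))
print(saturday, monday_morning, monday_evening)
```

A rule like this is fixed to one working pattern, which is the flexible-time difficulty raised on the slide.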
Business Rules and Data Modelling
Business Rules are necessary to ensure that data in a
database reflects accurately those conditions which apply to
data in the real world environment
The following overheads introduce some additional material
on this subject
Business Rules and Data Modelling
Business Rules are at the core of commercial applications
If systems ‘obey’ the Business Rules, then
– data will be correct
– applications will function
– users and management will be happy
Which leads us to
– what is a business rule ?
– where are they declared ?
– where are they enforced ?
Business Rules and Data Modelling
4 Proposed Levels of Business Rules
1. Single attribute (column) format definitions enforced by the
database
The ‘payment’ column is an amount interpreted as dollars
and cents
The Surname column is a text field expressed in the ASCII
character set
The Amount_on_Hand column must never be less than 0
Business Rules and Data Modelling
2. Multiple key column relationships
The ‘Brand Name’ column in the Brand table has a many to
one relationship with the Manufacturer Name in the
Manufacturer table
The Product foreign key in the Sales table has a many to
one relationship to the Product primary key in the Product
table
Business Rules and Data Modelling
3. Relationships between Entities
This is declared on the entity-relationship diagram, but is not
directly enforced by the database because the relationship is
many-to-many
The employee is a sub-type of Person
Supplier supplies the Customer
Business Rules and Data Modelling
4. Complex Business Logic
This relates to Business processes
It may be enforced at data entry time by a complex
application, for example:
“When an insurance policy has been committed but has not
yet been approved by the underwriter, the administration
date can be NULL, but when the policy has been
underwritten, the administration date must be present (NOT
NULL) and must be more recent than the agreement date”
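The quoted insurance rule can be written as a small validation routine of the kind an application might run at data entry time. This is a sketch under assumptions: the function policy_dates_valid and its parameter names are hypothetical, NULL is represented by Python's None, and the dates are invented.

```python
from datetime import date

def policy_dates_valid(underwritten, agreement_date, administration_date):
    """Apply the quoted rule to one policy record."""
    if not underwritten:
        # Committed but not yet approved: a NULL (None) date is allowed.
        return True
    # Underwritten: the date must be present and later than the agreement.
    return (administration_date is not None
            and administration_date > agreement_date)

agreed = date(2000, 5, 1)
pending_ok = policy_dates_valid(False, agreed, None)
missing_date = policy_dates_valid(True, agreed, None)
too_early = policy_dates_valid(True, agreed, date(2000, 4, 1))
complete = policy_dates_valid(True, agreed, date(2000, 6, 1))
print(pending_ok, missing_date, too_early, complete)
```

The rule depends on the state of the policy as well as on two columns, which is why it sits at Level 4 and not in a single column definition.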
Business Rules and Data Modelling
From this it can be stated that:
The core database software manages the first 2 levels only:
the single column format definitions and the multiple column key
relationships
Level 3 (relationships between entities) and Level 4
(complex business logic) should also be enforced as there is
much valuable business content at this level (or should that
be essential?)
Business Rules and Data Modelling
Entity relationship modelling (E/R modelling) seems to be a
comprehensive language for mapping and describing
relationships between entities.
E/R modelling is a diagrammatic technique which specifies
one-to-one
many-to-one and
many-to-many relationships among data elements
It is a logical model
Business Rules and Data Modelling
Computer Associates’ Erwin converts an E/R diagram into
data definition language declarations
These declarations define key definitions and join constraints
You can follow this up, and use an Erwin example at
www.cai.com
Gershwin, which you have probably met, is a simpler E/R
modelling tool
Business Rules and Data Modelling
E/R modelling is a useful technique for beginning the
process of understanding and enforcing business rules
It does not provide a guarantee of completeness
E/R Modelling is incomplete in that the diagrams represent
only what the designer decided to stress, or was aware of.
There is no test of an E/R diagram to determine if the
designer has specified all possible one-to-one, one-to-many
or many-to-many relationships.
Business Rules and Data Modelling
E/R modelling is not unique
A given set of data relationships can be represented by a
number of E/R diagrams
Many real data relationships are many-to-many.
The E/R diagram model does not enforce the M:N situations,
which may involve various conditions and degrees of
correlation which it would be useful (and perhaps essential) to
include as business rules. E/R modelling provides no
extensions to the basic many-to-many declaration
Business Rules and Data Modelling
Many E/R models are ideal, not real
Many corporate models are based on ‘how things should be’
This is very useful in understanding the business
BUT the model must be populated with real data, and
E/R models are rarely models of real data
There aren’t any tools for trawling over real data sets
and developing E/R models
The E/R model is invariably constructed first and the data is
‘fitted’ into the model - and that means we need to clean data
before it becomes resident in the model.
Business Rules and Data modelling
E/R models lead to complex schemas which militate against
the objectives of Information Delivery
As an example, the E/R diagram which underpins Oracle
Financials (a current Applications Package) requires
approximately 2000 tables
SAP’s model can require 10,000 tables.
All of which tends to work against the objectives of easy to
understand models, and high performance.
Business Rules and Data Modelling
Chris Date (An Introduction to Database Systems, 7th
edition) has this to say :
‘the E/R model is incapable of dealing with integrity
constraints or ‘business rules’ except for a few special cases.
Declarative rules are too complex to be captured as part of
the business model and must be defined separately by the
analyst/developer’.