Lecture 2
This lecture will introduce some more terminology
- about primary keys, foreign keys, candidate keys,
access keys and some design concepts.
There will be a brief mention of table structures,
constraints and values.
There are some examples of outputs of queries
And we will also look at SQL
Relational Database Concepts
The relational data model was developed by Dr. E. F. Codd (a
mathematician) in the late 1960s and early 1970s.
The theory of normalisation of data is closely linked.
Databases based on the relational model should be easy to
use and understand.
There should be no need for the user to be aware of the
physical structure of the underlying files.
Most databases developed for commercial use are now
based on the relational model.
Data Models
Codd suggests that any data model has three components:
the data structures;
the integrity constraints;
the data manipulation operators.
The Relational Data Model
Data Structures - domain, attribute, relation, row (tuple),
primary key, degree, cardinality.
Integrity Constraints - entity integrity and referential integrity.
Data Manipulation Operations - defined through relational
algebra and equivalent relational calculus.
The Beginning of the Relational Model
In 1969, Dr. Edgar F. Codd published an original paper titled
‘Derivability, Redundancy, and Consistency of Relations
Stored in Large Data Banks’.
In 1970, there was a revised version titled ‘A Relational
Model of Data for Large Shared Data Banks’.
Dr. Codd’s Relational Theory
He also published
• ‘Relational Completeness of Data Base Sublanguages’
• ‘A Data Base Sublanguage Founded on the Relational
Calculus’
• ‘Further Normalisation of the Data Base Relational Model’
• ‘Interactive Support for Non-programmers : The Relational
and Network Approaches’
Dr. Codd’s Relational Theory
• And ‘Extending the Database Relational Model to Capture
More Meaning’
Dr. Codd also produced papers relating to
• Multiprogramming
• Natural Language processing
The Relational Model serves as the basis for the theory of
data - he instigated the ideas of predicate logic as a
foundation for database management and he defined both a
relational algebra and a relational calculus as a basis for
data in relational form.
Dr. Codd’s Relational Theory
• The original paper (‘Derivability, Redundancy, and
Consistency of Relations Stored in Large Data Banks’, 1969) contains references to these aspects:
1 A Relational View of Data
2 Some Linguistic Aspects
3 Operations on Relations
4 Expressible, Named and Stored Relations
5 Derivability, Redundancy and Consistency
6 Data Bank Control
Dr. Codd’s Relational Theory
• The relational model described ‘provides a means of
describing data in terms of its natural structure’ - no machine
representation details.
• The model also provided a basis for constructing a high-level
retrieval language with ‘maximal data independence’
which led to the development of SQL
Meet Dr. E.F.Codd
Relational Data Structure
Relation: EMPLOYEE
Attributes: Empno, Name, Gender, Mgr Empno
Empno   Name    Gender   Mgr Empno
E1      Jones   Male     E65
E6      Smith   Male     E28
E28     Jones   Female   -
Domains
Empno:  E1 to E125
Gender: Female, Male
And what about ‘expired’ Empnos ?
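[Diagram: the relation Employee, showing its Heading (the attribute names), its Body (the rows) and the Domains on which the attributes are defined]
Relation: Employee
Attributes and their values:
Empno: E1, E2, E3
Name:  Red, Brown, Black
Mgrno: E1, E1
Domains:
Person Name: Red, Brown, Black, Blue
Empno: E1, E2, E3, E4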
Value Sets and Domains
• Domains in a relational database can be extensive and
complex.
• A ‘domain’ (a restriction of value or expression) can be
applied to the result of a function or of a derived value.
For example, the multiplication of a person’s age by the
person’s I.D. would not lead to a realistic value
A domain constraint would ensure that this process, if
initiated, would not proceed and would result in an error
message being displayed
Value Sets and Domains
• The arithmetic addition of an I.D. and a date of birth would
also not give a realistic value
• Domains can be used to limit which attributes can be
associated with other attributes - this leads to interesting and
complex processes - Rules and Procedures (Ingres) and
Triggers and Constraints (Oracle).
• Access has the option of delving into Visual Basic
• Does anyone know what SQL Server has available ?
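The idea of a domain limiting the values an attribute may take can be sketched with a CHECK constraint. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle or Ingres, the table and column names are hypothetical, and a CHECK clause is a much weaker mechanism than the full domain support described above (it restricts stored values, not which attributes may be combined in an expression).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause plays the role of the Gender domain
# (Female, Male) from the earlier EMPLOYEE example.
conn.execute("""
    CREATE TABLE employee (
        empno  TEXT NOT NULL PRIMARY KEY,
        name   TEXT NOT NULL,
        gender TEXT NOT NULL CHECK (gender IN ('Female', 'Male'))
    )
""")
conn.execute("INSERT INTO employee VALUES ('E1', 'Jones', 'Male')")

# A value outside the domain is refused and an error is reported.
try:
    conn.execute("INSERT INTO employee VALUES ('E6', 'Smith', 'Unknown')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the row whose Gender lies inside the domain is stored; the other insert stops with an error, which is the behaviour the slide describes.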
Relational Data Structures
• The only structure available is a 2-dimensional file of data.
• This is known as a relation or table.
• Each entity corresponds to a table and each attribute to a
column (or field) in that table.
• Each entity occurrence corresponds to a row of the table.
Properties of Relations
• Data is held in tables
• There is no order of data in the tables - either in row
or attribute
• Primary Key - Foreign Key relationship
• Data Typing including NULLS
• Query Access - insert, update, delete, retrieval
• Indexing on Candidate (and Primary) keys
Some Concepts
A database system is a computerised record keeping system
A database is a collection of structured data files and
associated indexes
A database user must be able to add, retrieve, insert, update
and delete data and files
A set is any collection of definite distinguishable things.
Olympians for instance are a ‘set’ of people.
The term ‘distinguishable’ means that on inspection of any 2
things which fit into a set, it must be possible to
decide whether they are identical or different
Some Concepts
The term ‘definite’ means that if the set is known, and the
thing is known, a decision can be made that
(a) the thing belongs to the set or
(b) the thing does not belong to the set
For the set to be known, it is sufficient if the members are
known
A Relation
Relations exist between 2 or more things
There is a relation between Lleyton Hewitt and tennis
There is a relation between Steve Waugh and cricket
There is a relation between Tiger Woods and golf
We could present this as:
Name             Sport
Lleyton Hewitt   Tennis
Steve Waugh      Cricket
Tiger Woods      Golf
and we have a relation of degree 2. We can also have relations
of any required degree: 3, 4, 5, ...
A Relation
This is a table of ‘ordered pairs’ and the relationship is
directional: Lleyton Hewitt plays Tennis - Tennis doesn’t play
Lleyton Hewitt. This is a binary relation.
The order is horizontal, and is row limited.
The order of the rows in the table is immaterial to the data
In this example (and in any table) the relationship is the set
of all ordered pairs
(Question : what happens to this data if, for instance, Lleyton
Hewitt is unable to play tennis ?)
Another Relation
We could have this
Name        Activity     Residence   Date of Death
Smith, J    Doctor       Clayton     22-09-1998
Ellis, T    Blacksmith   Colac       12-10-1976
Werija, K   Lecturer     Caulfield   ???
Brack, S    Premier      Ballarat    ???
This is a relation, or table, of degree 4
Notice that each row has only 1 entry in each ‘column’ or
attribute - this is called the ‘atomic value’
Strictly Speaking
A ‘set’ in mathematics has no duplicates
A relation is a set, so a relation shouldn’t have duplicates either
A relational database consists of tables
A table is not a relation, but the only difference is that a table
may have duplicate row values (not a good idea)
Duplicate rows should be avoided and the duplicates erased
All relational databases should consist of relations
Relations must have unique names
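The difference between a table (duplicates possible) and a relation (a set, so no duplicates) can be sketched in a few lines of Python. This is an illustration only: the list and variable names are hypothetical, and the duplicated row is added here purely to show the effect.

```python
# Each row is a tuple of atomic values. A list can hold the same row twice,
# just as a table can.
rows_as_table = [
    ("Lleyton Hewitt", "Tennis"),
    ("Steve Waugh", "Cricket"),
    ("Tiger Woods", "Golf"),
    ("Steve Waugh", "Cricket"),  # duplicate row, added for illustration
]

# A relation is a set of rows, so the duplicate disappears.
relation = set(rows_as_table)

degree = len(next(iter(relation)))  # number of attributes in each row
cardinality = len(relation)         # number of distinct rows

# The order of the rows is immaterial: reversing the input gives the same relation.
same_relation = relation == set(reversed(rows_as_table))
```

Here the four-row table collapses to a relation of degree 2 with three rows, and the row order makes no difference.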
A Table
A Table :
Is a named set of rows - an ordered row of one or more
column names, together with zero or more unordered rows
of data values
Tables store data about a specific entity - each row in a Table
describes a single occurrence of that entity.
The SQL Standard defines 3 types of tables - Base tables,
Views, and Derived tables
More on Tables
Base tables are created and managed with the Create Table,
Alter Table and Drop Table statements.
Views are created and managed with the Create View and
Drop View statements
Derived tables are created when a query is executed.
Tables are dependent on a Schema or a Module.
More on Tables
Column :
A column is a named component of a table. A set of similar
data values describe the same attribute of an entity. A
column’s values all belong to the same data type or to the
same Domain, and may vary over time.
A Column value is the smallest unit of data which can be
selected from, or updated in, a table. Columns are dependent
on some table, and are created, altered, and dropped with
column definitions in the Create Table and Alter Table
statements
A Primary Key
• McFadden, Hoffer and Prescott define a Primary Key as :
An attribute (or combination of attributes) which uniquely
identifies each row in a relation (table).
• Richard T. Watson has this to say:
The primary key definition block specifies a set of column
values comprising the primary key. Once a Primary Key is
defined, the system enforces its uniqueness by checking that
the Primary Key of any new row does not already exist in the
table.
A Primary Key - What’s That ?
• A key - a unique identifier
‘A key is said to be nonredundant if every attribute it
contains is necessary for the purpose of unique identification
- if any attribute of the key were removed, the remaining
attributes would not be a unique identifier’
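The way a system enforces Primary Key uniqueness can be sketched as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle; the employee table and its rows are hypothetical, reusing values from the earlier EMPLOYEE example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno TEXT NOT NULL PRIMARY KEY, name TEXT)"
)

conn.execute("INSERT INTO employee VALUES ('E1', 'Jones')")
# Same name, different key value: accepted, because only the key must be unique.
conn.execute("INSERT INTO employee VALUES ('E28', 'Jones')")

# Re-using an existing key value is refused by the system.
try:
    conn.execute("INSERT INTO employee VALUES ('E1', 'Smith')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The second 'E1' row never reaches the table, so the table still holds exactly two rows.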
And a Foreign Key ?
• McFadden, Hoffer and Prescott’s definition:
An attribute (or attributes) in a relation (table) of a database
which serves as the Primary Key of another relation (table)
in the same database.
• Richard T. Watson says:
An attribute (or attributes) that is a Primary Key in the same
table, or another table. It is the method of recording relations
in a relational database.
And, both the Primary and Foreign Key(s) should be drawn
from the same Domain.
Other Keys
• Candidate Key(s) - an attribute, or minimal set of attributes, which
uniquely identifies each row and could therefore be chosen as the Primary Key
• Access Key - an attribute, or attributes, other than the
Primary (or Foreign) key on which data will be retrieved from
a table e.g. postcode as in your second tutorial example
SQL - An Introduction
• With SQL, the user does not ‘open’ nor ‘close’ tables
• A user normally has a subset of tables to which access is
allowed, and privileges are granted to allow the user to
perform some specific functions
• A query (an access to data in a table or tables) returns the
whole result set all at once. All of the required rows are
updated, inserted or deleted - or none of the rows are.
• The whole set involved in the ‘transaction’ works, OR the
whole ‘transaction’ fails
A Transaction
A transaction is a sequence of SQL statements which Oracle
treats as a single unit
The set of changes is made permanent with the Commit
statement
Part or all of a transaction can be undone with the Rollback
statement
A transaction starts with the execution of the first SQL
statement in the transaction and ends with either the Commit
or Rollback statement
SQL - An Introduction
Transaction Control
• A transaction in SQL is either completely finished OR it is not
done at all
• No partial results can be produced
• Work done can be committed - it becomes a permanent part
of the database or it can be rolled back - the database is
restored to the state prior to the transaction commencing
• SQL programmers need to be aware of the need for
concurrency control - that is the sharing of the database
contents among transactions (more about this later)
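Commit and Rollback can be sketched as below. This is an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Oracle, and the account table, the transfer function and the amounts are all hypothetical. With the module's default settings a transaction is opened implicitly before the first UPDATE, which mirrors the description above of a transaction starting with its first SQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT NOT NULL PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("INSERT INTO account VALUES ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()    # both changes become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # the credit already applied to dst is undone as well
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


too_much = transfer(conn, "A", "B", 500)  # would make A negative, so it fails
after_failure = balances(conn)
small = transfer(conn, "A", "B", 30)
after_success = balances(conn)
```

After the failed transfer both balances are unchanged, even though the first UPDATE had already run; after the successful one both changes are visible. No partial result survives in either case.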
A Transaction
Oracle guarantees that a transaction has statement-level
read consistency (the data stays the same while Oracle is
gathering and returning it)
If a transaction has multiple queries, then each query is
consistent in itself, but the queries are not necessarily consistent with each other
Transaction-level read consistency can be achieved with the
Set Transaction Read Only statement (queries only)
SQL - An Introduction
SQL has some very specific rules
One is that every table has a structure
Another rule is that a row can be inserted into, updated in or
deleted from a table only if it has the same
structure as the rest of the rows in the table
This reinforces the rule that
– A table is a set of rows of one particular type
SQL - An Introduction
A table has no ordering - data is not ‘in ascending or
descending’ order or ‘date’ order ….
Columns are referenced by name only, not by their relative
position in a table
The columns of a table can be re-arranged, BUT the SQL
statements referencing this or these tables are not affected
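The two points above (no inherent row order, and columns referenced by name) can be sketched like this. It is an illustration only, assuming Python's built-in sqlite3 module; the tables emp_a and emp_b are hypothetical and differ only in the order in which their columns were declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets result columns be read by name

# Same columns, declared in a different order.
conn.execute("CREATE TABLE emp_a (empnum INTEGER NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp_b (name TEXT, empnum INTEGER NOT NULL PRIMARY KEY)")

for table in ("emp_a", "emp_b"):
    # Columns are named explicitly, so their position in the table is irrelevant.
    conn.execute(f"INSERT INTO {table} (empnum, name) VALUES (7, 'ADAMS')")
    conn.execute(f"INSERT INTO {table} (empnum, name) VALUES (3, 'JONES')")


def names_by_empnum(table):
    # Any ordering the user wants must be requested with ORDER BY.
    query = f"SELECT name, empnum FROM {table} ORDER BY empnum"
    return [(row["empnum"], row["name"]) for row in conn.execute(query)]


result_a = names_by_empnum("emp_a")
result_b = names_by_empnum("emp_b")
```

The same statement gives the same answer against both tables, which is the sense in which re-arranging columns does not affect SQL that references them by name.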
Properties of Relations
Integrity Constraints included in the DBMS
– Attribute value ranges
– Referential Integrity
– Entity Integrity - No part of any Primary key may be
null
Set retention constraints
(how long to retain a set of data)
Domain constraints
User Defined Rules
Recovery Procedures (after failure)
Properties of Relations
No explicit linkage between tables - set up at run time
Linking or Embedding database operations in a procedural
language
The Database may be distributed across similar or
different DBMS’s
A Relational Database
Relation EMP
EMPNUM   NAME     Date of Birth   DEPTNUM
3        JONES    27/11/1967      650
7        ADAMS    14/10/1978      432
11       NGUYEN   9/05/1977       314
18       PHAN     30/06/1969      432
Relation Schema EMP(empnum, name, dateofbirth, deptnum)
Relation DEPT
DEPTNUM   DEPTNAME
650       PRODUCTION
432       INFOSYS
314       FINANCE
Relation Schema DEPT(deptnum, deptname)
More Terminology
The degree of a relation is the
number of attributes in that relation.
Degree   Name
1        unary
2        binary
3        ternary
.
n        n-ary
The cardinality is the number of rows in the relation (table).
Primary Keys
A candidate key of a relation is a set of attributes
that satisfies two time-independent properties:
Uniqueness - No two rows of the relation have
the same values for the set of attributes forming
the candidate key.
Minimality - No attributes can be discarded from
the candidate key without destroying the
uniqueness property.
Empnum   Surname   Given Name   Tax FileNo
E110     Parkes    John         100-100-232
E261     Kimball   John
E311     Hurwitz   Fred         101-111-222
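The uniqueness and minimality properties can be checked mechanically against the rows shown above. This is a rough sketch with hypothetical function names. Note the limitation: the two properties are meant to hold for all time, so a check against today's rows can only rule a key out, it can never prove that a set of attributes is a candidate key.

```python
from itertools import combinations

rows = [
    {"empnum": "E110", "surname": "Parkes", "given_name": "John",
     "tax_file_no": "100-100-232"},
    {"empnum": "E261", "surname": "Kimball", "given_name": "John",
     "tax_file_no": None},
    {"empnum": "E311", "surname": "Hurwitz", "given_name": "Fred",
     "tax_file_no": "101-111-222"},
]


def is_unique(rows, attrs):
    """Uniqueness: no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality, judged on the given rows only."""
    if not is_unique(rows, attrs):
        return False
    # Minimality: no smaller (proper, non-empty) subset may itself be unique.
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


empnum_ok = is_candidate_key(rows, ["empnum"])
given_name_ok = is_candidate_key(rows, ["given_name"])
too_wide_ok = is_candidate_key(rows, ["empnum", "surname"])
```

Empnum passes both tests on this data. Given Name fails uniqueness (two Johns), and Empnum with Surname is unique but fails minimality because Surname can be discarded.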
Entity Integrity
• No component of the Primary Key of a base relation is
allowed to accept nulls.
Surname   Given Name   Salary
Parkes    John         40,000
Kimball                50,000
Hurwitz   Fred         60,000
Ashton                 70,000
What is the Primary Key ?
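Entity integrity can be sketched as follows, supposing for the sake of illustration that Surname and Given Name together were chosen as the Primary Key. The sketch uses Python's built-in sqlite3 module and a hypothetical staff table. One assumption to flag: SQLite needs NOT NULL written explicitly on the key columns to get this behaviour, whereas Oracle does not allow nulls in Primary Key columns at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staff (
        surname    TEXT NOT NULL,
        given_name TEXT NOT NULL,
        salary     INTEGER,
        PRIMARY KEY (surname, given_name)
    )
""")

conn.execute("INSERT INTO staff VALUES ('Parkes', 'John', 40000)")

# Kimball has no given name, so part of the composite key would be null.
try:
    conn.execute("INSERT INTO staff VALUES ('Kimball', NULL, 50000)")
    null_key_accepted = True
except sqlite3.IntegrityError:
    null_key_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]
```

With this choice of key the Kimball and Ashton rows could not be stored, which is worth bearing in mind when answering the question above.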
Foreign Key
• A foreign key is an attribute or attribute combination
of one relation R2 whose values are required to
match those of the primary key of relation R1 where
R1 and R2 are not necessarily distinct. The foreign
key and the corresponding primary key should be
defined on the same domain(s).
Employee
Empnum   Surname   Worksfordept (foreign key)
E110     Parkes    d1
E261     Kimball   d3
E311     Hurwitz   d2
Dept
Dept   Dname
d1     Pay
d2     Tax
d3     Art
Referential Integrity
If base relation R2 includes a foreign key FK matching the
primary key PK of some base relation R1 then every value
of FK in R2 must either
(a) be equal to the value of PK in some row of R1, or
(b) be wholly null.
Note that PK and FK may comprise more than one attribute
and that R1 and R2 are not necessarily distinct.
( Stated more simply : a foreign key should associate to
a valid primary key value, or the foreign key should be
null.)
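The referential integrity rule can be sketched with the Employee and Dept tables from the Foreign Key slide. This is an illustration assuming Python's built-in sqlite3 module (where foreign key checking has to be switched on per connection); the rows for Ashton and for department 'd9' are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

conn.execute("CREATE TABLE dept (dept TEXT NOT NULL PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE employee (
        empnum       TEXT NOT NULL PRIMARY KEY,
        surname      TEXT,
        worksfordept TEXT REFERENCES dept(dept)
    )
""")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [("d1", "Pay"), ("d2", "Tax"), ("d3", "Art")],
)

# (a) The foreign key equals an existing primary key value: accepted.
conn.execute("INSERT INTO employee VALUES ('E110', 'Parkes', 'd1')")
# (b) The foreign key is wholly null: also accepted.
conn.execute("INSERT INTO employee VALUES ('E400', 'Ashton', NULL)")

# Neither (a) nor (b): there is no department 'd9', so the row is refused.
try:
    conn.execute("INSERT INTO employee VALUES ('E500', 'Nobody', 'd9')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Removing a department that is still referenced is refused as well.
try:
    conn.execute("DELETE FROM dept WHERE dept = 'd1'")
    delete_accepted = True
except sqlite3.IntegrityError:
    delete_accepted = False

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```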
Recording Design Decisions
Formal design decisions can be recorded in the same
graphical notation as an E-R diagram.
This is called a data structure diagram and is developed
from normalised relations using a few simple steps.
Recording Design Decisions
a) Treat each relation as an entity, represent it as a
rectangle and enter its name.
b) Primary and Foreign keys are used to establish the
relationships (Note: a foreign key can be part of a
composite primary key).
If the primary key in one relation exists as the foreign key
in another relation, then draw a line linking the
relationship between these two entities.
Some E-R Examples
DEPARTMENT(DeptNo,Dname)
EMPLOYEE(Empnum,Ename,Salary,DeptNo)
[Diagram: EMPLOYEE linked to DEPARTMENT]
STUDENT(StudentNo,Name)
UNIT(Unitcode,Title)
RESULT (StudentNo,Unitcode,Result)
[Diagram: STUDENT linked to RESULT, and RESULT linked to UNIT]
Open to Interpretation
[Diagram: Student, Course, Unit and Text entities]
There are a number of ‘rules’ in this model, which
determine the relationships.
They are known as Business Rules.
The Rules ?
• A student must be enrolled in 1 Course
• A Course may contain zero, or many students
• A student may be enrolled in many units, but at least 1
• A unit may attract many students (or no students ?)
• Each Unit has one prescribed text
• Each text is associated with one unit
Open to Interpretation
[Diagram: Customer, Invoice, Line and Product entities]
Each Customer may generate one or more Invoices
Each Invoice is generated by one Customer
Each Invoice contains one or more lines
Each line is contained in an Invoice
Each line references one product
Each Product may be referenced in one or more lines
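One possible reading of the Customer / Invoice / Line / Product rules can be sketched as tables with Primary and Foreign keys. This is only one interpretation: the column names are hypothetical, Python's built-in sqlite3 module stands in for the DBMS, and the rule that each Invoice contains at least one line is not enforced here, because key constraints alone cannot express it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (custno INTEGER NOT NULL PRIMARY KEY, name TEXT);
    CREATE TABLE product  (prodno INTEGER NOT NULL PRIMARY KEY, descr TEXT);
    -- Each Invoice is generated by one Customer.
    CREATE TABLE invoice (
        invno  INTEGER NOT NULL PRIMARY KEY,
        custno INTEGER NOT NULL REFERENCES customer(custno)
    );
    -- Each line is contained in an Invoice and references one Product.
    CREATE TABLE line (
        invno  INTEGER NOT NULL REFERENCES invoice(invno),
        lineno INTEGER NOT NULL,
        prodno INTEGER NOT NULL REFERENCES product(prodno),
        PRIMARY KEY (invno, lineno)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Smith')")
conn.execute("INSERT INTO product VALUES (10, 'Widget')")
conn.execute("INSERT INTO invoice VALUES (100, 1)")
conn.execute("INSERT INTO line VALUES (100, 1, 10)")

# An Invoice for a Customer who does not exist breaks the second rule.
try:
    conn.execute("INSERT INTO invoice VALUES (101, 99)")
    bad_invoice_accepted = True
except sqlite3.IntegrityError:
    bad_invoice_accepted = False

line_count = conn.execute("SELECT COUNT(*) FROM line").fetchone()[0]
```

Note that the foreign key invno is also part of the composite Primary Key of line, as allowed by the earlier design note.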
Modelling to Processing
So,
how do we convert the conceptual design details into
software which allows for the entry of data into the
appropriate tables,
and for further processing to allow for the use of this data to
respond to queries ?
Something Different - or, how do we make this happen ?
An Introduction to SQL
Some Comments Regarding SQL
In the next few overheads, there will be some terms and
explanations which should help you to make the transition
from the methods of data storage and file processing to that
of the relational database style of storage and processing of
data.
An Introduction to SQL
Firstly, some pluses for SQL.
1. SQL is the one industry standard for querying databases
2. Other ‘tools’ such as front ends don’t allow the developer
to use all of the features of a database
3. Tools provided invariably do not exploit the full functionality
of the underlying language
4. An SQL query in a client-server environment can be run in
any application language and the result will always be the
same
Some SQL Basics
SQL acts as a bridge between
– the user
– the database management system (DBMS)
– the data tables
– the transactions which involve the previous 3 items
SQL also allows the ‘system’ to be administered and
managed by a database administrator using the same format
: procedural commands and data in tables. (.net ?)
SQL can be embedded into source code from C to Pascal
Procedural and Non-Procedural Languages
SQL requires a different approach from that used in other
programming languages
C, Fortran, Basic, Cobol, Pascal, PL/1 are procedural
languages. They are characterised by statements which tell
the executing computer what to do, in a structured step-by-step way (even when loops are used).
SQL is a declarative language - the computer is told what the
user wants to achieve and the computer ‘decides’ on how to
achieve this requirement, and correctly.
The user sees the results.
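The contrast between the two approaches can be sketched side by side. This is an illustration only: Python plays the part of the procedural language, its built-in sqlite3 module runs the SQL, and the employee rows, salaries and the 45000 threshold are hypothetical.

```python
import sqlite3

employees = [
    ("E110", "Parkes", 40000),
    ("E261", "Kimball", 50000),
    ("E311", "Hurwitz", 60000),
]

# Procedural: the program spells out each step (loop, test, collect, sort).
high_paid_procedural = []
for empnum, surname, salary in employees:
    if salary > 45000:
        high_paid_procedural.append(surname)
high_paid_procedural.sort()

# Declarative: the statement says what is wanted; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " empnum TEXT NOT NULL PRIMARY KEY, surname TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
high_paid_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT surname FROM employee WHERE salary > 45000 ORDER BY surname"
    )
]
```

Both produce the same list of surnames, but only the first version says anything about how the rows are visited.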
SQL Sets
SQL is a set-oriented language.
Many programmers are used to file-oriented languages.
A set is an unordered collection of items, all of which have the
same type and structure
These sets become tables in SQL, and are made up of vertical
attributes (or columns) and horizontal rows
SQL - Data Manipulation
Data Retrieval (DML)
SELECT   retrieve data from table
Data Modification (DML)
INSERT   add a single row or copy rows from other table(s)
UPDATE   amend column values
DELETE   delete rows of data
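The four statements can be seen together in a short sketch. It assumes Python's built-in sqlite3 module rather than Oracle, reuses a few EMP rows from the earlier slide, and introduces a hypothetical second table, emp_432, to show rows being copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp ("
    " empnum INTEGER NOT NULL PRIMARY KEY, name TEXT, deptnum INTEGER)"
)
conn.execute(
    "CREATE TABLE emp_432 (empnum INTEGER NOT NULL PRIMARY KEY, name TEXT)"
)

# INSERT: add single rows ...
conn.execute("INSERT INTO emp VALUES (3, 'JONES', 650)")
conn.execute("INSERT INTO emp VALUES (7, 'ADAMS', 432)")
conn.execute("INSERT INTO emp VALUES (18, 'PHAN', 432)")
# ... or copy rows from another table.
conn.execute(
    "INSERT INTO emp_432 SELECT empnum, name FROM emp WHERE deptnum = 432"
)

# UPDATE: amend column values.
conn.execute("UPDATE emp SET deptnum = 314 WHERE empnum = 7")

# DELETE: delete rows of data.
conn.execute("DELETE FROM emp WHERE empnum = 3")

# SELECT: retrieve data from the tables.
remaining = conn.execute(
    "SELECT empnum, name, deptnum FROM emp ORDER BY empnum"
).fetchall()
copied = conn.execute("SELECT name FROM emp_432 ORDER BY empnum").fetchall()
```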
Data Definition - DDL (Oracle)
Creating Tables
create table emp
(empno number(6,0), name varchar2(20), salary
number(6,0), age number(3,0), deptno number(5,0));
A table is defined.
Space is reserved.
The system catalogue is updated. (also known as the Data
Dictionary)
Table and Column Names begin with alpha (A-Z) less than
or equal to 12 characters
Table names contain (A-Z, 0-9)
Column names contain (A-Z, 0-9,)
Data Definition (DDL)
Did you notice the entries such as
– Number(5,0)
– Varchar2(20)
– Number(6,0)
in the previous overhead ?
These are ‘data types’ and further assist integrity by defining
actual data values which can exist for each attribute
The size (or number of bytes) of each attribute is also
expressed (either explicitly or implicitly)
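What happens when a table is defined can be sketched by creating the emp table and then reading the system catalogue back. This is an illustration assuming Python's built-in sqlite3 module, not Oracle: SQLite accepts the Oracle-style type names, but it does not enforce the declared sizes the way Oracle does, and its catalogue (sqlite_master, PRAGMA table_info) is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same definition as on the Creating Tables slide.
conn.execute("""
    CREATE TABLE emp (
        empno  NUMBER(6,0),
        name   VARCHAR2(20),
        salary NUMBER(6,0),
        age    NUMBER(3,0),
        deptno NUMBER(5,0)
    )
""")

# The definition is now recorded in the catalogue (data dictionary).
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Each catalogue row describes one column; field 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(emp)")]
```

In Oracle the equivalent information is held in data dictionary views rather than in sqlite_master.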
Overview of SQL
Data Definition (DDL)
Create Table   define table and constraints
Create View    define user view of data
Alter Table    add new columns (Oracle)
Drop Table     delete table
Drop View      delete user view
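The five DDL statements can be run in sequence as a quick sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle; the view name emp_public and the salary figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create Table: define the table and its constraints.
conn.execute(
    "CREATE TABLE emp ("
    " empno INTEGER NOT NULL PRIMARY KEY,"
    " name TEXT, salary INTEGER, deptno INTEGER)"
)
conn.execute("INSERT INTO emp VALUES (3, 'JONES', 40000, 650)")
conn.execute("INSERT INTO emp VALUES (7, 'ADAMS', 50000, 432)")

# Create View: a user view of the data that leaves out the salary column.
conn.execute("CREATE VIEW emp_public AS SELECT empno, name, deptno FROM emp")

# Alter Table: add a new column to the base table.
conn.execute("ALTER TABLE emp ADD COLUMN age INTEGER")

# The view still presents only the columns it was defined with.
view_rows = conn.execute("SELECT * FROM emp_public ORDER BY empno").fetchall()

# Drop View and Drop Table: remove the view, then the table.
conn.execute("DROP VIEW emp_public")
conn.execute("DROP TABLE emp")
remaining_objects = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type IN ('table', 'view')"
).fetchone()[0]
```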
Overview of SQL
Data Control
Commit     commit changes to the database
Rollback   roll back previous changes
Data Security
Grant      grant access privileges to users
Revoke     revoke access privileges
Relational DBMS Products
IBM Relational Products
DB2/nn   MVS/370, MVS/XA
SQL/DS   VM/CMS, DOS/VSE
QMF      front-end to DB2 and SQL/DS
CSP      application development tool
Numerous other RDBMS
ORACLE 8, 8i, 9i
OPENINGRES from ASK Corp. (OSL, ABF)
AIM/RDB from Fujitsu
INFORMIX - now in DB2
VAXSQL/Rdb from DEC
NonStop SQL from Tandem
Microcomputer versions
SQL Server (as in MS 2000)
Quadbase-SQL
ORACLE
INGRES
dBASEV / Visual dBASE
microSQL
practically all micro DBMS
Other Oracle Products :
Designer2000, Developer2000,
Programmer2000,
Discoverer2000
Can you explain this ?
• 3 people agree to buy an item for $30 and hand over
$10 each.
• The salesperson discounts the item by $5, and
refunds $1 to each person. Each person has
therefore paid $9. (5/3 does not give an even
amount)
• He keeps the remaining $2 as a token of good will.
• Mathematically, 3 x $9 = $27 plus the additional
$2 = $29
• The question is, where is the other $1 ???
And , what are your views on this ?
These are quotations :
Eye Drops Off Shelf
Wild Cow Injures Farmer with Axe
and Cold Wave Linked to Temperatures
Relax - until the next session