Databases - Mr Fraser

A database is a collection of organised data.
Data has structure.
Databases can be paper-based, but it is more common to talk about electronic (computer-based) databases.


A flat file database is a simple database that stores all data in a single table. A flat file database can
be stored in a text file (such as a tab-delimited file), in a spreadsheet, or in a database file that contains
one or more unrelated tables.

Useful for simple lists:
Address book
CD collection
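As a minimal sketch (Python, standard library only), a tab-delimited flat file can be read with the `csv` module. The address book below is hypothetical: the whole database is one table, with a header line naming the fields and one record per line.

```python
import csv
import io

# Hypothetical address book stored as a tab-delimited flat file:
# a single table, header line first, one record per line.
ADDRESS_BOOK = (
    "Name\tPhone\tTown\n"
    "Ann Smith\t01234 567890\tInverness\n"
    "Raj Patel\t01234 111222\tPerth\n"
)

def load_flat_file(text):
    """Read every record in the single table into a list of dicts keyed by field name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

records = load_flat_file(ADDRESS_BOOK)
for record in records:
    print(record["Name"], "lives in", record["Town"])
```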








But flat files cause many problems...
Data redundancy: the same data is duplicated in many different files. This makes data entry slower and uses more disk space.
Data inconsistency: the same data held in different files has to be updated in each separate file when it changes; if one copy is missed, the files disagree.
Program-data dependence: every computer program has to specify exactly what data fields make up a record in the file being processed, so changes in data structure result in changes to programs.
Lack of flexibility: it is difficult and time-consuming to assemble the data from the various files and write new programs to produce the required non-routine reports.
Data was not shareable: if one department had data that was required by another department, it was awkward to obtain it.
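The redundancy and inconsistency problems can be illustrated with a small Python sketch. The two "files" are hypothetical lists of records kept by different departments, each holding its own copy of the same student's address.

```python
# Two hypothetical flat files kept by different departments.
# Both hold a copy of the same address (data redundancy).
office_file = [{"StudentID": 101, "Name": "Ann", "Address": "1 High St"}]
library_file = [{"StudentID": 101, "Name": "Ann", "Address": "1 High St", "BooksOut": 2}]

# Ann moves house, but only the office updates its file.
office_file[0]["Address"] = "9 Low Rd"

# The same fact now has two different values (data inconsistency).
inconsistent = office_file[0]["Address"] != library_file[0]["Address"]
print("Addresses disagree:", inconsistent)  # Addresses disagree: True
```

In a relational database the address would be held once, in one table, so there would be only one copy to update.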

A relational database is a more complex database that stores data in multiple interrelated tables.
RDBMS (Relational Database Management System) benefits:
Reduced redundancy
Improved data consistency
Improved data integrity
Better security
Program-data independence













A file is a collection of sets of similar data called records. Each piece of data within a record is
called an item. Items are stored in fields. A database consists of a series of related files called
tables.
Table: made up of records, each of which is made up of fields.
Imagine a file of student details in a school. Data held may include:
Name, address, emergency contact, form, form teacher, form room, subjects studied...
Each of these is a field, and the data placed in them is the item. So we have:
A file, containing all the data about the students in the school
Records, each with the same sort of contents and each relating to a specific student
Fields, which may or may not have an item in them:
the field 'Name' will always have an item in it;
'Additional notes' may be blank.
Blank fields can cause problems when interpreting the data at a later date. Does it mean that
you have no information, or you have forgotten to enter the information?
If information is unavailable, it is better to provide a standard response, for example N/A or Unknown.
Some fields may be unique. It is possible that all students have different names, but unlikely!
Stored fields can be used to mail merge letters to contact a particular group, e.g. all people
living in a specific village.
Some field items will be repeated in record after record, e.g. form name, room and teacher.






Field items can be of different lengths, which can cause problems...
A file where all the records are of the same length is said to have fixed length records.
Some fields are always, or nearly always, the same length
e.g. Postcode can be given a fixed 7 characters (UK postcodes are at most 7 characters, ignoring the space)
Other fields may need to be 'padded out' so they are the correct length
e.g. Surname: if 15 characters are stored, then Jenkins would be stored as 'JENKINS        ' (7 characters + 8 spaces)
Advantage: access is fast because the computer knows where each record starts.
Disadvantage: fixed length records are usually larger, so they need more storage space and are slower to transfer (load or save).
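A minimal Python sketch of fixed length records, assuming a hypothetical record with only two fields: Surname padded to 15 characters (as in the Jenkins example) and Postcode allocated 7 characters. Because every record is the same length, record n starts at position n × record length, so the program can jump straight to it.

```python
SURNAME_WIDTH = 15                              # every surname is padded to 15 characters
POSTCODE_WIDTH = 7                              # assumed fixed width for the postcode
RECORD_LENGTH = SURNAME_WIDTH + POSTCODE_WIDTH  # 22 characters per record

def pack(surname, postcode):
    """Build one fixed length record, padding each field with spaces."""
    return surname.upper().ljust(SURNAME_WIDTH) + postcode.ljust(POSTCODE_WIDTH)

def read_record(data, n):
    """Jump straight to record n (counting from 0): it starts at n * RECORD_LENGTH."""
    start = n * RECORD_LENGTH
    rec = data[start:start + RECORD_LENGTH]
    return rec[:SURNAME_WIDTH].rstrip(), rec[SURNAME_WIDTH:].rstrip()

# Hypothetical data: two records stored one after another.
data = pack("Jenkins", "AB123CD") + pack("Li", "EH11AA")
print(repr(data[:SURNAME_WIDTH]))  # 'JENKINS' followed by 8 spaces
print(read_record(data, 1))        # ('LI', 'EH11AA')
```

The padding that makes this direct jump possible is also the wasted space named as the disadvantage.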






In variable length records, one or more of the fields can be of differing lengths in each record.
Advantages:
the records will be smaller and will need less storage space
the records will load faster
Disadvantage:
the computer cannot calculate where each record starts, so it has to read through the records in turn and processing the records
will be slower.





When the record is stored, each field has a field terminator byte stored at the end of it, and
there is often a record terminator at the end of the whole record.
The first record of the example would be stored as.....
* is a field terminator
% is a record terminator
This record requires 33 bytes of storage... but each record will be a different size.

Each field starts with a byte showing the length of the field.
The whole record starts with a byte giving the size of the record.
The first record of the example would be stored as...

This record requires 34 bytes of storage... but again, each record would be a different size.
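Both storage methods can be sketched in Python. The field values are hypothetical, and for the length-byte method this sketch assumes the record-size byte counts itself; a real system may define it differently. Each length fits in one byte, so fields and records must be shorter than 256 bytes.

```python
def with_terminators(fields):
    """Method 1: '*' after each field and '%' after the whole record."""
    return b"".join(f.encode("ascii") + b"*" for f in fields) + b"%"

def with_length_bytes(fields):
    """Method 2: a length byte before each field and a size byte before the record.
    Assumption: the size byte counts itself."""
    body = b""
    for f in fields:
        data = f.encode("ascii")
        body += bytes([len(data)]) + data   # length byte, then the field itself
    return bytes([len(body) + 1]) + body

def read_length_prefixed(record):
    """Recover the fields by jumping from one length byte to the next."""
    fields, i = [], 1              # position 0 holds the record-size byte
    while i < record[0]:
        n = record[i]              # length of the next field
        fields.append(record[i + 1:i + 1 + n].decode("ascii"))
        i += 1 + n
    return fields

record = ["JENKINS", "ANN", "4B"]      # hypothetical surname, forename, form
print(with_terminators(record))        # b'JENKINS*ANN*4B*%'
print(len(with_length_bytes(record)))  # 16
print(read_length_prefixed(with_length_bytes(record)))  # ['JENKINS', 'ANN', '4B']
```

With terminators the computer must scan every byte looking for `*`; with length bytes it can skip straight over each field.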




Each record in a file must be identifiable, so one field must be unique.
This is known as the Primary Key or Key Field.
Terminology:
File = Table
Record = Tuple
Field = Attribute




Databases are collections of data arranged into related tables. There are lots of ways of
arranging the data in tables, and each arrangement can be given a label according to how it
has been arranged.
These labels are called Normal Forms.
Normalisation is the process undertaken to make sure a database has no redundant data or
inconsistencies.
Tables should be organised so that:
no data is unnecessarily duplicated
data is consistent throughout the database
the structure of each table is flexible enough to allow you to enter as many or as few items as you want to
the structure enables a user to make all kinds of complex queries relating data from different tables


"A table is in 1NF if it contains no repeating attributes or groups of attributes"
e.g. A student can take several courses.
Each course has several students attending.
The relationship can be represented by an ER diagram:

The attributes in these tables would look something like this:
STUDENT (StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID, LecturerName)
Consider the problems of creating a relationship between these 2 tables...
A link has to be made between common fields... but there are no common fields! We could link the tables
by copying an attribute from one into the other, but whichever attribute we pick, repeating attributes
will always be created (which is unacceptable in 1NF!), as shown below...




STUDENT (StudentID, StudentName, DoB, Gender, CourseNumber)
No good - The student takes several courses, which one would be mentioned?

COURSE (CourseNumber, CourseName, LecturerID, LecturerName, StudentID)
No good - Each course has more than one student taking it.

How about allowing space for 3 courses on each student record?
STUDENT (StudentID, StudentName, DoB, Gender, Course1, Course2, Course3)





This is no good either: we have created a repeating attribute! The field CourseNumber is repeated 3
times, so the table is not in first normal form.
In standard notation, this would be represented by a line above the repeating attribute.
To achieve 1NF we must make CourseNumber part of the Primary Key of the STUDENT table...
STUDENT (StudentID, CourseNumber, StudentName, DoB, Gender)
By grouping StudentID and CourseNumber together as a Composite Key, we can uniquely identify each
student and the courses they are taking without having any repeating attributes = 1NF.
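A minimal Python sketch of the same step, using a hypothetical student record whose courses are held as a repeating group. Only StudentName is carried across (DoB and Gender are omitted for brevity), and each output row is identified by the Composite Key (StudentID, CourseNumber).

```python
# Hypothetical unnormalised record: Course1..Course3 held as a repeating group.
unnormalised = {"StudentID": 101, "StudentName": "Ann", "Courses": ["C1", "C4", "C7"]}

def to_1nf(record):
    """Produce one row per course, so no attribute repeats within a row."""
    return [
        {"StudentID": record["StudentID"], "CourseNumber": course,
         "StudentName": record["StudentName"]}
        for course in record["Courses"]
    ]

rows = to_1nf(unnormalised)
keys = [(r["StudentID"], r["CourseNumber"]) for r in rows]
assert len(keys) == len(set(keys))  # the composite key is unique for every row
for r in rows:
    print(r)
```

Notice that "Ann" is now stored three times; removing that duplication is the job of 2NF.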









2NF only applies to tables that have a Composite Key! A table in 1NF whose Primary Key is a single attribute is automatically in 2NF.
"A table is in 2NF if it is in 1NF and it contains no partial dependencies"
At the moment, our tables are not in 2NF because they contain attributes that are only partially
dependent upon the Primary Key... to be in 2NF, all non-key attributes need to be wholly dependent on
the Primary Key.
All very well... but what does it mean?
The Primary Key of our STUDENT table is a Composite Key, made up of both StudentID and
CourseNumber. The attribute StudentName is dependent upon StudentID (One
specific StudentID will refer to only 1 student) but it is in no way dependent upon
CourseNumber (We cannot identify an individual student by a CourseNumber). This makes
it only partially dependent on the Primary Key, and therefore not in 2NF.
To achieve 2NF we need to introduce a 3rd table to link the two entities:
STUDENT(StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID, LecturerName)
STUDENT_TAKES (StudentID, CourseNumber)
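The partial dependency can be checked against sample rows with a small Python sketch. The helper `determines` is hypothetical: it reports whether, in the rows given, each value of some columns always comes with the same value of another column. Sample data can only suggest a dependency, not prove it; the real rules come from what the data means.

```python
# Hypothetical 1NF rows: (StudentID, CourseNumber, StudentName)
rows_1nf = [(101, "C1", "Ann"), (101, "C4", "Ann"), (102, "C1", "Raj")]

def determines(rows, lhs, rhs):
    """True if, in these rows, the columns at positions lhs always fix column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# StudentName depends on StudentID alone, i.e. on only part of the composite key...
print(determines(rows_1nf, [0], 2))  # True
# ...and not on CourseNumber.
print(determines(rows_1nf, [1], 2))  # False

# Splitting the table removes the partial dependency.
student = {sid: name for sid, _, name in rows_1nf}                      # STUDENT
student_takes = sorted({(sid, course) for sid, course, _ in rows_1nf})  # STUDENT_TAKES
print(student)        # {101: 'Ann', 102: 'Raj'}
print(student_takes)  # [(101, 'C1'), (101, 'C4'), (102, 'C1')]
```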







"A table is in 3NF if it is in 2NF and it contains no non-key dependencies"
The COURSE table contains an attribute for LecturerID and also one for
LecturerName. LecturerName is dependent on LecturerID (not on CourseNumber)... We need
a new table for the entity LECTURER!
STUDENT(StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID)
STUDENT_TAKES (StudentID, CourseNumber)
LECTURER (LecturerID, LecturerName)
This is the optimum way of holding this data without any duplication.
All tables in a Relational Database should be in 3NF!
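As a sketch, the 3NF design can be built with Python's built-in `sqlite3` module. The column types and sample rows are assumptions for illustration. Joining the four tables rebuilds the "who studies what, taught by whom" view, while each lecturer's name is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE STUDENT  (StudentID INTEGER PRIMARY KEY, StudentName TEXT, DoB TEXT, Gender TEXT);
CREATE TABLE LECTURER (LecturerID TEXT PRIMARY KEY, LecturerName TEXT);
CREATE TABLE COURSE   (CourseNumber TEXT PRIMARY KEY, CourseName TEXT,
                       LecturerID TEXT REFERENCES LECTURER (LecturerID));
CREATE TABLE STUDENT_TAKES (
    StudentID    INTEGER REFERENCES STUDENT (StudentID),
    CourseNumber TEXT    REFERENCES COURSE (CourseNumber),
    PRIMARY KEY (StudentID, CourseNumber)   -- composite key
);
""")

# Hypothetical sample data: each fact is stored once.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                 [(101, "Ann", "2007-03-01", "F"), (102, "Raj", "2007-11-20", "M")])
conn.execute("INSERT INTO LECTURER VALUES ('L1', 'Mr Brown')")
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)",
                 [("C1", "Computing", "L1"), ("C4", "Maths", "L1")])
conn.executemany("INSERT INTO STUDENT_TAKES VALUES (?, ?)",
                 [(101, "C1"), (101, "C4"), (102, "C1")])

# Join the four tables back together through their primary and foreign keys.
rows = conn.execute("""
    SELECT s.StudentName, c.CourseName, l.LecturerName
    FROM STUDENT_TAKES t
    JOIN STUDENT  s ON s.StudentID    = t.StudentID
    JOIN COURSE   c ON c.CourseNumber = t.CourseNumber
    JOIN LECTURER l ON l.LecturerID   = c.LecturerID
    ORDER BY s.StudentName, c.CourseName
""").fetchall()
for row in rows:
    print(row)
```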



Each table contains one special attribute (or combination of attributes) by which tuples can be
identified, because it is unique.
Primary key – shown by underlining its reference within the bracket of attributes
A Primary Key is one or more attributes which uniquely identify an entity occurrence.
Sometimes a single attribute is not sufficient to identify each occurrence of an entity uniquely.
In these instances we must combine two or more attributes to create a Composite Key. For
example, a person's name by itself will not necessarily be enough to identify an individual. A
person's name combined with their address may be more appropriate.


An attribute in one table that is the Primary Key of another table is called a Foreign Key. It is used
to link the 2 tables together.
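A minimal `sqlite3` sketch of a Composite Key and a Foreign Key, using the name-plus-address example. The PERSON and MEMBERSHIP tables and their rows are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key checks are off by default in SQLite

# Composite Key: neither Name nor Address alone identifies a person.
conn.execute("CREATE TABLE PERSON (Name TEXT, Address TEXT, PRIMARY KEY (Name, Address))")
# Foreign Key: MEMBERSHIP links to PERSON through the same two attributes.
conn.execute("""CREATE TABLE MEMBERSHIP (Name TEXT, Address TEXT, Club TEXT,
                FOREIGN KEY (Name, Address) REFERENCES PERSON (Name, Address))""")

conn.execute("INSERT INTO PERSON VALUES ('Ann Smith', '1 High St')")
conn.execute("INSERT INTO PERSON VALUES ('Ann Smith', '9 Low Rd')")  # same name, different person

def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

dup = try_insert("INSERT INTO PERSON VALUES ('Ann Smith', '1 High St')")                   # key already used
orphan = try_insert("INSERT INTO MEMBERSHIP VALUES ('Bob Jones', '2 Mill Ln', 'Chess')")  # no such person
ok = try_insert("INSERT INTO MEMBERSHIP VALUES ('Ann Smith', '9 Low Rd', 'Chess')")
print(dup, orphan, ok)  # rejected rejected accepted
```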

Every entity has a name and a Primary Key. Most entities will also have a number of non-identifying attributes. The convention used for defining attributes is shown below.

EntityName (Identifying Attribute1, NonIdentifying Attribute1, .....)

The name of the entity is followed by a list of its attributes in brackets. The identifying attribute(s)
(Primary / Composite Key) come first and are underlined. When all the attributes are not yet
known, this can be shown by a row of dots.

An Entity-Relationship (ER) diagram shows what information is stored and how it is
related, i.e. it models the structure of the data.

There are 3 main concepts in an ER Diagram:


Entities - Things, usually nouns e.g. 'Student'
Attributes - Properties of things e.g. 'Name', 'StudentID'
Relationships - Connections between things e.g. Student 'studies' Course

An entity is a real world object about which data is to be recorded.

Attributes are properties, or characteristics, of entities.

Relationships are associations between entities.

An ER model consists of:
a diagram showing entities and the relationships between them;
formal descriptions of each entity in terms of its attributes;
descriptions of the meanings of relationships;
descriptions of any constraints on the system and of any assumptions made.

Diagram Conventions for an ER Diagram

Entities are shown as rectangles with the name of the entity inside.

When choosing an entity name:
Use singular nouns e.g. 'Student' not 'Students'.
Start with an upper case letter and concatenate words e.g. DegreeScheme.
Choose distinct names.

Three degrees of relationship can be represented:
A one-to-one relationship (1:1)
A one-to-many relationship (1:n)
A many-to-many relationship (m:n)













A database can very quickly become complicated. It requires software to control it and to
control access to it. This software must control the amending of data to ensure that all the rules
remain unbroken. The addition and deletion of data must also be controlled.
This software is called a Database Management System (DBMS).
Database Management System
Data storage, retrieval and update: the DBMS must allow users to store, retrieve and update
information as easily as possible, without having to be aware of the internal structure of the
database.
Creation and maintenance of the data dictionary.
Managing the facilities for sharing the database: ensuring that problems do not arise when two
people simultaneously access a record and try to update it.
Backup and recovery: providing the ability to recover the database in the event of system
failure.
Security: handling password allocation and checking, and the ‘view’ of the database that a
given user is allowed.
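One of these functions, backup and recovery, can be sketched with the `Connection.backup` method of Python's `sqlite3` module (available from Python 3.7). The table and the "failure" are simulated for illustration; a real system would keep its backup on separate storage.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, StudentName TEXT)")
live.execute("INSERT INTO STUDENT VALUES (101, 'Ann')")
live.commit()

# Backup: copy the whole live database into a separate database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated failure: the STUDENT table is lost from the live database.
live.execute("DROP TABLE STUDENT")
live.commit()

# Recovery: copy the backup over the damaged live database.
backup_copy.backup(live)
restored = live.execute("SELECT * FROM STUDENT").fetchall()
print(restored)  # [(101, 'Ann')]
```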

The DBMS includes a Data Description Language (DDL), also known as a Data Definition Language. DDL is used to define the
tables in the database, including:
data types
data structures within the database
any constraints on the data
The design that is created is called a schema.
Each user of the database will use it for different things and will be allowed to see different parts, so each is given their own
subschema setting out the rules for how they see the data.
Users of the database will be given different rights:
The Database Administrator (DBA) allocates users to groups of one or more and assigns each group a set of privileges or permissions.
Permissions determine whether a user can view / modify / execute / update.
Each user / group has its own username.
Each user has an individual password (which can and should be changed regularly).
Some users will need to manipulate data (amend, delete or insert new data). This is done using a Data Manipulation Language (DML).
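A minimal sketch of DDL and DML statements, run through Python's `sqlite3` module. The YearGroup column and its CHECK rule are hypothetical examples of a data type and a constraint. SQLite has no user accounts, so usernames and permissions are not shown; server DBMSs such as PostgreSQL or MySQL handle those with `GRANT` and `REVOKE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table, its data types and constraints on the data.
conn.execute("""
CREATE TABLE STUDENT (
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL,
    YearGroup   INTEGER CHECK (YearGroup BETWEEN 1 AND 6)
)""")

# DML: insert, amend and delete data.
conn.execute("INSERT INTO STUDENT VALUES (101, 'Ann', 4)")
conn.execute("INSERT INTO STUDENT VALUES (102, 'Raj', 5)")
conn.execute("UPDATE STUDENT SET StudentName = 'Anne' WHERE StudentID = 101")
conn.execute("DELETE FROM STUDENT WHERE StudentID = 102")

# The DBMS rejects data that breaks a constraint defined in the DDL.
try:
    conn.execute("INSERT INTO STUDENT VALUES (103, 'Sam', 9)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)

remaining = conn.execute("SELECT * FROM STUDENT").fetchall()
print(remaining)  # [(101, 'Anne', 4)]
```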













The manipulation techniques of a DBMS, such as Query By Example (QBE), can simplify the use of the DDL and DML.
The DBMS maintains a file of descriptions of the data and the structure of storage known as the
data dictionary.
Data Dictionary
The data dictionary is a ‘database about the database’. It will contain information such as:
What tables and columns are included in the present structure;
The names of the current tables and columns;
The characteristics of each item of data, such as its length and data type;
Any restrictions on the value of certain columns;
The meaning of any data fields that are not self-evident, e.g. a field such as ‘course type’;
The relationships between items of data;
Which programs access which items of data, and whether they merely read the data or
change it.
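SQLite keeps its own small data dictionary, which can be queried like any other data. In this sketch, the `sqlite_master` table lists the tables, and `PRAGMA table_info` reports each column's name, declared data type, NOT NULL restriction and whether it is part of the Primary Key. The COURSE table is taken from the normalisation example; its column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (CourseNumber TEXT PRIMARY KEY, "
             "CourseName TEXT NOT NULL, LecturerID TEXT)")

# Which tables are included in the present structure?
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['COURSE']

# Name, data type, NOT NULL restriction and key membership of each column.
columns = conn.execute("PRAGMA table_info(COURSE)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    flags = [label for label, on in (("NOT NULL", notnull), ("PK", pk)) if on]
    print(name, col_type, *flags)
```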
Various tools allow the DBMS to present differing views of the data held within the database.









Internal level (1st level)
A view of the entire database as it is stored in the system
The level at which data is organised according to its file organisation, e.g. random access, indexed or sequential
It is hidden from the user by the DBMS
Conceptual level (2nd level)
Gives a single, usable view of all the data in the database
External level (3rd level)
Where data is arranged according to user requirements and rights
Different users will get different views of the data
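At the external level, a view gives one group of users its own arrangement of the data. A minimal `sqlite3` sketch, assuming a hypothetical user group that should only see student names: the view STUDENT_NAMES hides DoB and Gender. In a multi-user DBMS the administrator would then give that group access to the view rather than to the base table; SQLite itself has no user permissions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (StudentID INTEGER PRIMARY KEY, StudentName TEXT, DoB TEXT, Gender TEXT);
INSERT INTO STUDENT VALUES (101, 'Ann', '2007-03-01', 'F');
INSERT INTO STUDENT VALUES (102, 'Raj', '2007-11-20', 'M');

-- External view for a hypothetical group of users who may only see names.
CREATE VIEW STUDENT_NAMES AS SELECT StudentID, StudentName FROM STUDENT;
""")

cur = conn.execute("SELECT * FROM STUDENT_NAMES ORDER BY StudentID")
visible_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(visible_columns)  # ['StudentID', 'StudentName']
print(rows)             # [(101, 'Ann'), (102, 'Raj')]
```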